cs.LG: 143 papers today
Large Language Models (10 papers)
【1】De Jure: Iterative LLM Self-Refinement for Structured Extraction of Regulatory Rules
Link: https://arxiv.org/abs/2604.02276
Authors: Keerat Guliani, Deepkamal Gill, David Landsman, Nima Eshraghi, Krishna Kumar, Lovedeep Gondara
Abstract: Regulatory documents encode legally binding obligations that LLM-based systems must respect. Yet converting dense, hierarchically structured legal text into machine-readable rules remains a costly, expert-intensive process. We present De Jure, a fully automated, domain-agnostic pipeline for extracting structured regulatory rules from raw documents, requiring no human annotation, domain-specific prompting, or annotated gold data. De Jure operates through four sequential stages: normalization of source documents into structured Markdown; LLM-driven semantic decomposition into structured rule units; multi-criteria LLM-as-a-judge evaluation across 19 dimensions spanning metadata, definitions, and rule semantics; and iterative repair of low-scoring extractions within a bounded regeneration budget, where upstream components are repaired before rule units are evaluated. We evaluate De Jure across four models on three regulatory corpora spanning finance, healthcare, and AI governance. On the finance domain, De Jure yields consistent and monotonic improvement in extraction quality, reaching peak performance within three judge-guided iterations. De Jure generalizes effectively to healthcare and AI governance, maintaining high performance across both open- and closed-source models. In a downstream compliance question-answering evaluation via RAG, responses grounded in De Jure-extracted rules are preferred over prior work in 73.8% of cases at single-rule retrieval depth, rising to 84.0% under broader retrieval, confirming that extraction fidelity translates directly into downstream utility. These results demonstrate that explicit, interpretable evaluation criteria can substitute for human annotation in complex regulatory domains, offering a scalable and auditable path toward regulation-grounded LLM alignment.
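A minimal sketch of the four-stage, judge-guided repair loop described above. All callables (normalize, extract, judge, repair) are hypothetical stand-ins for the paper's LLM stages, and the 0.8 acceptance threshold is an assumption; only the loop structure and the three-iteration budget come from the abstract.

```python
from typing import Callable

def judge_guided_extraction(
    document: str,
    normalize: Callable[[str], str],        # stage 1: raw document -> Markdown
    extract: Callable[[str], list],         # stage 2: Markdown -> rule units
    judge: Callable[[dict], dict],          # stage 3: rule -> {criterion: score}
    repair: Callable[[dict, dict], dict],   # stage 4: (rule, scores) -> fixed rule
    max_iters: int = 3,                     # peak quality within three iterations
    threshold: float = 0.8,                 # assumed per-criterion acceptance bar
) -> list:
    rules = extract(normalize(document))
    for _ in range(max_iters):              # bounded regeneration budget
        scored = [(rule, judge(rule)) for rule in rules]
        if all(min(s.values()) >= threshold for _, s in scored):
            break                           # every rule passes all criteria
        rules = [repair(r, s) if min(s.values()) < threshold else r
                 for r, s in scored]
    return rules
```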
【2】The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level
Link: https://arxiv.org/abs/2604.02178
Authors: Jeremy Herbst, Jae Hee Lee, Stefan Wermter
Abstract: Mixture-of-Experts (MoE) architectures have become the dominant choice for scaling Large Language Models (LLMs), activating only a subset of parameters per token. While MoE architectures are primarily adopted for computational efficiency, it remains an open question whether their sparsity makes them inherently easier to interpret than dense feed-forward networks (FFNs). We compare MoE experts and dense FFNs using $k$-sparse probing and find that expert neurons are consistently less polysemantic, with the gap widening as routing becomes sparser. This suggests that sparsity pressures both individual neurons and entire experts toward monosemanticity. Leveraging this finding, we zoom out from the neuron to the expert level as a more effective unit of analysis. We validate this approach by automatically interpreting hundreds of experts. This analysis allows us to resolve the debate on specialization: experts are neither broad domain specialists (e.g., biology) nor simple token-level processors. Instead, they function as fine-grained task experts, specializing in linguistic operations or semantic tasks (e.g., closing brackets in LaTeX). Our findings suggest that MoEs are inherently interpretable at the expert level, providing a clearer path toward large-scale model interpretability. Code is available at: https://github.com/jerryy33/MoE_analysis
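A minimal sketch of $k$-sparse probing, the diagnostic used above: restrict a linear probe for a binary concept to the $k$ most informative neurons and see how well it separates the concept at small $k$. The mean-difference ranking and logistic probe are common choices assumed here, not necessarily the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def k_sparse_probe_accuracy(acts: np.ndarray, labels: np.ndarray, k: int) -> float:
    """acts: (n_tokens, n_neurons) activations; labels: (n_tokens,) in {0, 1}."""
    # Rank neurons by how much their mean activation differs across the concept.
    mean_diff = np.abs(acts[labels == 1].mean(0) - acts[labels == 0].mean(0))
    top_k = np.argsort(mean_diff)[-k:]                 # probe sees only k neurons
    probe = LogisticRegression(max_iter=1000).fit(acts[:, top_k], labels)
    return probe.score(acts[:, top_k], labels)

# If accuracy is already high at k=1, the concept is carried by a single (more
# monosemantic) neuron; comparing small-k curves between MoE experts and dense
# FFNs quantifies the polysemanticity gap discussed above.
```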
【3】AA-SVD : Anchored and Adaptive SVD for Large Language Model Compression
Link: https://arxiv.org/abs/2604.02119
Authors: Atul Kumar Sinha, François Fleuret
Abstract: We introduce a fast low-rank factorization-based framework for compressing large language models that enables rapid compression of billion-parameter models without retraining. Unlike existing factorization-based approaches that optimize only on the original inputs, ignoring distribution shifts from upstream compression and thus propagating errors forward, or those that rely only on shifted inputs and risk drifting away from the original outputs, our approach accounts for both. Beyond individual layer compression, we further refine each transformer block end-to-end, minimizing block-level output distortion and allowing compressed layers to jointly compensate for accumulated errors. By anchoring each compressed layer to the original outputs while explicitly modeling input distribution shifts, our method finds a low-rank approximation that maintains functional equivalence with the original model. Experiments on large language models show that our method consistently outperforms existing SVD-based baselines across compression ratios, with the advantage becoming increasingly pronounced at aggressive compression budgets, where competing methods degrade substantially or collapse entirely, offering a practical solution for efficient, large-scale model deployment.
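A minimal numpy sketch of the anchoring idea described above: fit a map from shifted calibration inputs (outputs of already-compressed upstream layers) to the original layer outputs, then truncate to rank $r$. This fit-then-truncate scheme and the ridge term are simplifications for illustration, not the paper's exact procedure.

```python
import numpy as np

def anchored_low_rank(W, X_orig, X_shift, rank, lam=1e-3):
    """W: (d_out, d_in) original weights; X_orig/X_shift: (d_in, n) activations."""
    Y = W @ X_orig                                    # anchor: original outputs
    G = X_shift @ X_shift.T + lam * np.eye(W.shape[1])
    M = (Y @ X_shift.T) @ np.linalg.inv(G)            # ridge LS on shifted inputs
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    A = U[:, :rank] * s[:rank]                        # factor 1: (d_out, r)
    B = Vt[:rank]                                     # factor 2: (r, d_in)
    return A, B                                       # A @ B replaces W
```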
【4】FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models
Link: https://arxiv.org/abs/2604.01762
Authors: Juyong Jiang, Fan Wang, Hong Qi, Sunghun Kim, Jing Tang
Comments: The first two authors contributed equally to this work; listing order is random
Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as a crucial paradigm for adapting large language models (LLMs) under constrained computational budgets. However, standard PEFT methods often struggle in multi-task fine-tuning settings, where diverse optimization objectives induce task interference and limited parameter budgets lead to representational deficiency. While recent approaches incorporate mixture-of-experts (MoE) to alleviate these issues, they predominantly operate in the spatial domain, which may introduce structural redundancy and parameter overhead. To overcome these limitations, we reformulate adaptation in the spectral domain. Our spectral analysis reveals that different tasks exhibit distinct frequency energy distributions, and that LLM layers display heterogeneous frequency sensitivities. Motivated by these insights, we propose FourierMoE, which integrates the MoE architecture with the inverse discrete Fourier transform (IDFT) for frequency-aware adaptation. Specifically, FourierMoE employs a frequency-adaptive router to dispatch tokens to experts specialized in distinct frequency bands. Each expert learns a set of conjugate-symmetric complex coefficients, preserving complete phase and amplitude information while theoretically guaranteeing lossless IDFT reconstruction into real-valued spatial weights. Extensive evaluations across 28 benchmarks, multiple model architectures, and scales demonstrate that FourierMoE consistently outperforms competitive baselines in both single-task and multi-task settings while using significantly fewer trainable parameters. These results highlight the promise of spectral-domain expert adaptation as an effective and parameter-efficient paradigm for LLM fine-tuning.
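A minimal PyTorch sketch of the frequency-domain expert parameterization described above: learn complex half-spectrum coefficients and recover a real-valued weight matrix with the inverse real FFT, which builds in conjugate symmetry and hence a lossless real-valued reconstruction. The router and the full FourierMoE expert are richer than this stand-in.

```python
import torch
import torch.nn as nn

class FourierExpert(nn.Module):
    """Expert whose spatial weight is generated from complex Fourier coefficients."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        n_freq = d_in // 2 + 1                 # rfft bins for a length-d_in signal
        # Zero init so the adapter starts as a no-op (a common LoRA-style choice).
        self.coeffs = nn.Parameter(torch.zeros(d_out, n_freq, dtype=torch.cfloat))
        self.d_in = d_in

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # irfft assumes conjugate symmetry, so the recovered weights are real.
        W = torch.fft.irfft(self.coeffs, n=self.d_in, dim=-1)  # (d_out, d_in)
        return x @ W.T
```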
【5】Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models
Link: https://arxiv.org/abs/2604.01622
Authors: Shuibai Zhang, Caspian Zhuang, Chihan Cui, Zhihan Yang, Fred Zhangzhi Peng, Yanxin Zhang, Haoyue Bai, Zack Jia, Yang Zhou, Guanhua Chen, Ming Liu
Comments: 26 pages
Abstract: Diffusion language models (DLMs) enable parallel, non-autoregressive text generation, yet existing DLM mixture-of-experts (MoE) models inherit token-choice (TC) routing from autoregressive systems, leading to load imbalance and rigid computation allocation. We show that expert-choice (EC) routing is a better fit for DLMs: it provides deterministic load balancing by design, yielding higher throughput and faster convergence than TC. Building on the property that EC capacity is externally controllable, we introduce timestep-dependent expert capacity, which varies expert allocation according to the denoising step. We find that allocating more capacity to low-mask-ratio steps consistently achieves the best performance under matched FLOPs, and provide a mechanistic explanation: tokens in low-mask-ratio contexts exhibit an order-of-magnitude higher learning efficiency, so concentrating compute on these steps yields the largest marginal return. Finally, we show that existing pretrained TC DLMs can be retrofitted to EC by replacing only the router, achieving faster convergence and improved accuracy across diverse downstream tasks. Together, these results establish EC routing as a superior paradigm for DLM MoE models and demonstrate that computation in DLMs can be treated as an adaptive policy rather than a fixed architectural constant. Code is available at https://github.com/zhangshuibai/EC-DLM.
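A minimal sketch of expert-choice routing: each expert selects its top-$C$ tokens rather than each token selecting experts, so per-expert load is balanced by construction; making the capacity $C$ a function of the denoising step gives the timestep-dependent variant proposed above. The token-axis softmax is one common normalization choice, assumed here.

```python
import torch

def expert_choice_route(logits: torch.Tensor, capacity: int):
    """logits: (n_tokens, n_experts) router scores.

    Returns gate weights and token indices, both (capacity, n_experts):
    every expert processes exactly `capacity` tokens, so load is deterministic.
    """
    affinity = logits.softmax(dim=0)                   # normalize over tokens
    weights, token_idx = affinity.topk(capacity, dim=0)  # each expert picks top-C
    return weights, token_idx
```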
【6】Learning from the Right Rollouts: Data Attribution for PPO-based LLM Post-Training
Link: https://arxiv.org/abs/2604.01597
Authors: Dong Shu, Denghui Zhang, Jessica Hullman
Abstract: Traditional RL algorithms like Proximal Policy Optimization (PPO) typically train on the entire rollout buffer, operating under the assumption that all generated episodes provide a beneficial optimization signal. However, these episodes frequently contain noisy or unfaithful reasoning, which can degrade model performance and slow down training. In this paper, we propose \textbf{Influence-Guided PPO (I-PPO)}, a novel framework that integrates data attribution into the RL post-training loop. By calculating an influence score for each episode using a gradient-based approximation, I-PPO identifies and eliminates episodes that are anti-aligned with a validation gradient. Our experiments demonstrate that I-PPO consistently outperforms SFT and PPO baselines. We show that our filtering process acts as an intrinsic early stopping mechanism, accelerating training efficiency while effectively reducing unfaithful CoT reasoning.
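A minimal sketch of the influence filter described above: score each rollout episode by the alignment of its policy gradient with a validation gradient and drop anti-aligned episodes before the PPO update. Using flattened per-episode gradients and a simple dot product is a simplification of the paper's gradient-based approximation.

```python
import torch

def filter_episodes(episode_grads: list, val_grad: torch.Tensor) -> list:
    """Keep indices of episodes whose gradient aligns with the validation gradient."""
    keep = []
    for i, g in enumerate(episode_grads):
        influence = torch.dot(g.flatten(), val_grad.flatten())
        if influence > 0:        # anti-aligned (negative influence) episodes dropped
            keep.append(i)
    return keep
```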
【7】Does Your Optimizer Care How You Normalize? Normalization-Optimizer Coupling in LLM Training
Link: https://arxiv.org/abs/2604.01563
Authors: Abdelrahman Abouzeid
Comments: 16 pages, 8 figures. Preprint. Under review
Abstract: In LLM training, normalization layers and optimizers are typically treated as independent design choices. In a 3x2 factorial at 1B parameters and 1000 training steps, we show this assumption can fail: Dynamic Erf (Derf; Chen & Liu, 2025) suffers a large negative interaction with Muon (Jordan, 2024), with its gap to RMSNorm growing from +0.31 nats under AdamW to +0.97 under Muon, approximately three times larger. Dynamic Tanh (DyT; Zhu et al., 2025), included as a bounded-normalizer control, shows no such penalty. Our evidence points to two failure modes of erf under Muon's faster spectral-norm growth: saturation (lossy compression) and scale blindness (discarding activation magnitude). An EMA-blend that reintroduces running scale estimates recovers ~84% of the gap. Separately, reducing Derf's alpha from its published default of 0.5 to 0.3 recovers ~80% by keeping erf in its near-linear regime, where it approximately preserves relative scale; this setting is not the published default of Chen & Liu (2025). Using Derf's published default alpha with Muon incurs a 0.66-nat interaction penalty without producing NaNs or divergence, making the failure easy to miss in short pilot runs.
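A minimal sketch of the bounded dynamic-normalizer family at issue: DyT is defined as $\gamma \cdot \tanh(\alpha x) + \beta$ in Zhu et al. (2025), and Derf is assumed here to be the erf analogue. Under this reading, the abstract's mitigation amounts to initializing alpha at 0.3 rather than the published 0.5 so that erf stays in its near-linear regime.

```python
import torch
import torch.nn as nn

class DynamicBoundedNorm(nn.Module):
    """gamma * f(alpha * x) + beta, with f = erf (Derf-style) or tanh (DyT)."""
    def __init__(self, dim: int, alpha: float = 0.5, fn: str = "erf"):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha))   # smaller alpha -> more linear
        self.gamma = nn.Parameter(torch.ones(dim))       # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))       # per-channel shift
        self.fn = torch.erf if fn == "erf" else torch.tanh

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * self.fn(self.alpha * x) + self.beta
```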
【8】Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-Training
Link: https://arxiv.org/abs/2604.01499
Authors: William Hoy, Binxu Wang, Xu Pan
Abstract: Evolution Strategies (ES) have emerged as a scalable gradient-free alternative to reinforcement learning based LLM fine-tuning, but it remains unclear whether comparable task performance implies comparable solutions in parameter space. We compare ES and Group Relative Policy Optimization (GRPO) across four tasks in both single-task and sequential continual-learning settings. ES matches or exceeds GRPO in single-task accuracy and remains competitive sequentially when its iteration budget is controlled. Despite this similarity in task performance, the two methods produce markedly different model updates: ES makes much larger changes and induces broader off-task KL drift, whereas GRPO makes smaller, more localized updates. Strikingly, the ES and GRPO solutions are linearly connected with no loss barrier, even though their update directions are nearly orthogonal. We develop an analytical theory of ES that explains all these phenomena within a unified framework, showing how ES can accumulate large off-task movement on weakly informative directions while still making enough progress on the task to match gradient-based RL in downstream accuracy. These results show that gradient-free and gradient-based fine-tuning can reach similarly accurate yet geometrically distinct solutions, with important consequences for forgetting and knowledge preservation. The source code is publicly available: https://github.com/Bhoy1/ESvsGRPO.
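A minimal sketch of the linear-connectivity check behind the "no loss barrier" finding: evaluate the loss along the straight line between the two solutions and measure the worst-case rise above the endpoints. Here loss_fn is a hypothetical evaluator that runs the model with the interpolated parameters.

```python
import torch

@torch.no_grad()
def loss_barrier(theta_a, theta_b, loss_fn, n_points: int = 11) -> float:
    """theta_a/theta_b: lists of parameter tensors from the two solutions."""
    losses = []
    for t in torch.linspace(0, 1, n_points):
        interp = [(1 - t) * a + t * b for a, b in zip(theta_a, theta_b)]
        losses.append(float(loss_fn(interp)))
    # Barrier height: worst interpolated loss minus the worse endpoint loss.
    return max(losses) - max(losses[0], losses[-1])
```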
【9】CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe
Link: https://arxiv.org/abs/2604.01489
Authors: Tara Saba, Anne Ouyang, Xujie Si, Fan Long
Abstract: High-performance GPU kernels are critical to modern machine learning systems, yet developing efficient implementations remains a challenging, expert-driven process due to the tight coupling between algorithmic structure, memory hierarchy usage, and hardware-specific optimizations. Recent work has explored using large language models (LLMs) to generate GPU kernels automatically, but generated implementations often struggle to maintain correctness and achieve competitive performance across iterative refinements. We present CuTeGen, an agentic framework for automated generation and optimization of GPU kernels that treats kernel development as a structured generate--test--refine workflow. Unlike approaches that rely on one-shot generation or large-scale search over candidate implementations, CuTeGen focuses on progressive refinement of a single evolving kernel through execution-based validation, structured debugging, and staged optimization. A key design choice is to generate kernels using the CuTe abstraction layer, which exposes performance-critical structures such as tiling and data movement while providing a more stable representation for iterative modification. To guide performance improvement, CuTeGen incorporates workload-aware optimization prompts and delayed integration of profiling feedback. Experimental results on matrix multiplication and activation workloads demonstrate that the framework produces functionally correct kernels and achieves competitive performance relative to optimized library implementations.
【10】Infeasibility Aware Large Language Models for Combinatorial Optimization
Link: https://arxiv.org/abs/2604.01455
Authors: Yakun Wang, Min Chen, Zeguan Wu, Junyu Liu, Sitao Zhang, Zhenwen Shao
Abstract: Large language models (LLMs) are increasingly explored for NP-hard combinatorial optimization problems, but most existing methods emphasize feasible-instance solution generation and do not explicitly address infeasibility detection. We propose an infeasibility-aware framework that combines certifiable dataset construction, supervised fine-tuning, and LLM-assisted downstream search. For the minor-embedding problem, we introduce a new mathematical programming formulation together with provable zero-phase infeasibility screening, which enables scalable construction of training instances labeled either as feasible with structured certificates or as certifiably infeasible. Using training data generated through this exact optimization pipeline, we show that an 8B-parameter LLM can be fine-tuned to jointly perform solution generation and infeasibility detection. We further utilize LLM outputs as warm starts for downstream local search, providing a practical way to accelerate optimization even when the LLM outputs are imperfect. Experiments show that our fine-tuned model improves overall accuracy by up to 30% over GPT-5.2; meanwhile LLM-guided warm starts provide up to $2\times$ speedup compared with starting from scratch in downstream local search.
Graphs (graph learning | graph neural networks | graph optimization, etc.) (6 papers)
【1】LEO: Graph Attention Network based Hybrid Multi Sensor Extended Object Fusion and Tracking for Autonomous Driving Applications
Link: https://arxiv.org/abs/2604.02206
Authors: Mayank Mayank, Bharanidhar Duraisamy, Florian Geiss
Comments: 10 pages, 6 figures
Abstract: Accurate shape and trajectory estimation of dynamic objects is essential for reliable automated driving. Classical Bayesian extended-object models offer theoretical robustness and efficiency but depend on completeness of a-priori and update-likelihood functions, while deep learning methods bring adaptability at the cost of dense annotations and high compute. We bridge these strengths with LEO (Learned Extension of Objects), a spatio-temporal Graph Attention Network that fuses multi-modal production-grade sensor tracks to learn adaptive fusion weights, ensure temporal consistency, and represent multi-scale shapes. Using a task-specific parallelogram ground-truth formulation, LEO models complex geometries (e.g. articulated trucks and trailers) and generalizes across sensor types, configurations, object classes, and regions, remaining robust for challenging and long-range targets. Evaluations on the Mercedes-Benz DRIVE PILOT SAE L3 dataset demonstrate real-time computational efficiency suitable for production systems; additional validation on public datasets such as View of Delft (VoD) further confirms cross-dataset generalization.
【2】Robust Graph Representation Learning via Adaptive Spectral Contrast
Link: https://arxiv.org/abs/2604.01878
Authors: Zhuolong Li, Boxue Yang, Haopeng Chen
Abstract: Spectral graph contrastive learning has emerged as a unified paradigm for handling both homophilic and heterophilic graphs by leveraging high-frequency components. However, we identify a fundamental spectral dilemma: while high-frequency signals are indispensable for encoding heterophily, our theoretical analysis proves they exhibit significantly higher variance under spectrally concentrated perturbations. We derive a regret lower bound showing that existing global (node-agnostic) spectral fusion is provably sub-optimal: on mixed graphs with separated node-wise frequency preferences, any global fusion strategy incurs non-vanishing regret relative to a node-wise oracle. To escape this bound, we propose ASPECT, a framework that resolves this dilemma through a reliability-aware spectral gating mechanism. Formulated as a minimax game, ASPECT employs a node-wise gate that dynamically re-weights frequency channels based on their stability against a purpose-built adversary, which explicitly targets spectral energy distributions via a Rayleigh quotient penalty. This design forces the encoder to learn representations that are both structurally discriminative and spectrally robust. Empirical results show that ASPECT achieves new state-of-the-art performance on 8 out of 9 benchmarks, effectively decoupling meaningful structural heterophily from incidental noise.
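A minimal sketch of the Rayleigh quotient the adversary above is described as penalizing: $x^\top L x / x^\top x$ measures how much of a node signal's energy sits in high graph frequencies (with $L$ the graph Laplacian), so controlling it concentrates perturbation energy spectrally. Only this standard quantity is shown; ASPECT's full minimax objective is not reproduced here.

```python
import torch

def rayleigh_quotient(L: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """L: (n, n) graph Laplacian; x: (n,) node signal.

    Returns a value in [0, lambda_max]; larger means more high-frequency energy.
    """
    return (x @ L @ x) / (x @ x).clamp_min(1e-12)
```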
【3】Graph Neural Operator Towards Edge Deployability and Portability for Sparse-to-Dense, Real-Time Virtual Sensing on Irregular Grids
Link: https://arxiv.org/abs/2604.01802
Authors: William Howes, Jason Yoo, Kazuma Kobayashi, Subhankar Sarkar, Farid Ahmed, Souvik Chakraborty, Syed Bahauddin Alam
Comments: 34 pages, 5 figures, 16 tables
Abstract: Accurate sensing of spatially distributed physical fields typically requires dense instrumentation, which is often infeasible in real-world systems due to cost, accessibility, and environmental constraints. Physics-based solvers address this through direct numerical integration of governing equations, but their computational latency and power requirements preclude real-time use in resource-constrained monitoring and control systems. Here we introduce VIRSO (Virtual Irregular Real-Time Sparse Operator), a graph-based neural operator for sparse-to-dense reconstruction on irregular geometries, and a variable-connectivity algorithm, Variable KNN (V-KNN), for mesh-informed graph construction. Unlike prior neural operators that treat hardware deployability as secondary, VIRSO reframes inference as measurement: the combination of both spectral and spatial analysis provides accurate reconstruction without the high latency and power consumption of previous graph-based methodologies with poor scalability, presenting VIRSO as a potential candidate for edge-constrained, real-time virtual sensing. We evaluate VIRSO on three nuclear thermal-hydraulic benchmarks of increasing geometric and multiphysics complexity, across reconstruction ratios from 47:1 to 156:1. VIRSO achieves mean relative $L_2$ errors below 1%, outperforming other benchmark operators while using fewer parameters. The full 10-layer configuration reduces the energy-delay product (EDP) from ${\approx}206$ J$\cdot$ms for the graph operator baseline to $10.1$ J$\cdot$ms on an NVIDIA H200. Implemented on an NVIDIA Jetson Orin Nano, all configurations of VIRSO provide sub-10 W power consumption and sub-second latency. These results establish the edge-feasibility and hardware-portability of VIRSO and present compute-aware operator learning as a new paradigm for real-time sensing in inaccessible and resource-constrained environments.
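A minimal sketch of a variable-connectivity KNN graph builder in the spirit of V-KNN: the neighbor count varies per node instead of using one global $k$. The density-driven rule below is purely an illustrative assumption; the paper's mesh-informed construction may differ.

```python
import numpy as np
from scipy.spatial import cKDTree

def variable_knn_edges(points: np.ndarray, k_min: int = 3, k_max: int = 12):
    """points: (n, d) node coordinates -> list of directed edges (i, j)."""
    tree = cKDTree(points)
    # Local density proxy: distance to the k_min-th neighbor (larger = sparser).
    d, _ = tree.query(points, k=k_min + 1)
    scale = (d[:, -1] - d[:, -1].min()) / (np.ptp(d[:, -1]) + 1e-12)
    ks = (k_min + scale * (k_max - k_min)).astype(int)  # per-node neighbor count
    edges = []
    for i, k in enumerate(ks):
        _, idx = tree.query(points[i], k=k + 1)         # +1 skips the node itself
        edges.extend((i, int(j)) for j in idx[1:])
    return edges
```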
【4】CRIT: Graph-Based Automatic Data Synthesis to Enhance Cross-Modal Multi-Hop Reasoning
Link: https://arxiv.org/abs/2604.01634
Authors: Junyoung Sung, Seungwoo Lyu, Minjun Kim, Sumin An, Arsha Nagrani, Paul Hongsuck Seo
Comments: Accepted to CVPR 2026
Abstract: Real-world reasoning often requires combining information across modalities, connecting textual context with visual cues in a multi-hop process. Yet, most multimodal benchmarks fail to capture this ability: they typically rely on single images or sets of images, where answers can be inferred from a single modality alone. This limitation is mirrored in the training data, where interleaved image-text content rarely enforces complementary, multi-hop reasoning. As a result, Vision-Language Models (VLMs) frequently hallucinate and produce reasoning traces poorly grounded in visual evidence. To address this gap, we introduce CRIT, a new dataset and benchmark built with a graph-based automatic pipeline for generating complex cross-modal reasoning tasks. CRIT consists of diverse domains ranging from natural images, videos, and text-rich sources, and includes a manually verified test set for reliable evaluation. Experiments on this benchmark reveal that even state-of-the-art models struggle on such reasoning tasks. Models trained on CRIT show significant gains in cross-modal multi-hop reasoning, including strong improvements on SPIQA and other standard multimodal benchmarks.
【5】Optimizing EEG Graph Structure for Seizure Detection: An Information Bottleneck and Self-Supervised Learning Approach
Link: https://arxiv.org/abs/2604.01595
Authors: Lincan Li, Rikuto Kotoge, Xihao Piao, Zheng Chen, Yushun Dong
Comments: Accepted by IEEE 14th International Conference on Healthcare Informatics (ICHI)
Abstract: Seizure detection from EEG signals is highly challenging due to complex spatiotemporal dynamics and extreme inter-patient variability. To model them, recent methods construct dynamic graphs via statistical correlations, predefined similarity measures, or implicit learning, yet rarely account for EEG's noisy nature. Consequently, these graphs usually contain redundant or task-irrelevant connections, undermining model performance even with state-of-the-art architectures. In this paper, we present a new perspective for EEG seizure detection: jointly learning denoised dynamic graph structures and informative spatial-temporal representations guided by the Information Bottleneck (IB). Unlike prior approaches, our graph constructor explicitly accounts for the noisy characteristics of EEG data, producing compact and reliable connectivity patterns that better support downstream seizure detection. To further enhance representation learning, we employ a self-supervised Graph Masked AutoEncoder that reconstructs masked EEG signals based on dynamic graph context, promoting structure-aware and compact representations aligned with the IB principle. Bringing things together, we introduce Information Bottleneck-guided EEG SeizuRE DetectioN via SElf-Supervised Learning (IRENE), which explicitly learns dynamic graph structures and interpretable spatial-temporal EEG representations. IRENE addresses three core challenges: (i) Identifying the most informative nodes and edges; (ii) Explaining seizure propagation in the brain network; and (iii) Enhancing robustness against label scarcity and inter-patient variability. Extensive experiments on benchmark EEG datasets demonstrate that our method outperforms state-of-the-art baselines in seizure detection and provides clinically meaningful insights into seizure dynamics. The source code is available at https://github.com/LabRAI/IRENE.
【6】Detecting Complex Money Laundering Patterns with Incremental and Distributed Graph Modeling
Link: https://arxiv.org/abs/2604.01315
Authors: Haseeb Tariq, Alen Kaja, Marwan Hassani
Abstract: Money launderers take advantage of limitations in existing detection approaches by hiding their financial footprints in a deceitful manner. They manage this by replicating transaction patterns that the monitoring systems cannot easily distinguish. As a result, criminally gained assets are pushed into legitimate financial channels without drawing attention. Algorithms developed to monitor money flows often struggle with scale and complexity. The difficulty of identifying such activities is further intensified by the (persistent) inability of current solutions to control the excessive number of false positive signals produced by rigid, risk-based rules systems. We propose a framework called ReDiRect (REduce, DIstribute, and RECTify), specifically designed to overcome these challenges. The primary contribution of our work is a novel framing of this problem in an unsupervised setting; where a large transaction graph is fuzzily partitioned into smaller, manageable components to enable fast processing in a distributed manner. In addition, we define a refined evaluation metric that better captures the effectiveness of exposed money laundering patterns. Through comprehensive experimentation, we demonstrate that our framework achieves superior performance compared to existing and state-of-the-art techniques, particularly in terms of efficiency and real-world applicability. For validation, we used the real (open source) Libra dataset and the recently released synthetic datasets by IBM Watson. Our code and datasets are available at https://github.com/mhaseebtariq/redirect.
Transformers (7 papers)
【1】Crystalite: A Lightweight Transformer for Efficient Crystal Modeling
Link: https://arxiv.org/abs/2604.02270
Authors: Tin Hadži Veljković, Joshua Rosenthal, Ivor Lončarić, Jan-Willem van de Meent
Comments: 39 pages, 13 figures. Code available at: https://github.com/joshrosie/crystalite
Abstract: Generative models for crystalline materials often rely on equivariant graph neural networks, which capture geometric structure well but are costly to train and slow to sample. We present Crystalite, a lightweight diffusion Transformer for crystal modeling built around two simple inductive biases. The first is Subatomic Tokenization, a compact chemically structured atom representation that replaces high-dimensional one-hot encodings and is better suited to continuous diffusion. The second is the Geometry Enhancement Module (GEM), which injects periodic minimum-image pair geometry directly into attention through additive geometric biases. Together, these components preserve the simplicity and efficiency of a standard Transformer while making it better matched to the structure of crystalline materials. Crystalite achieves state-of-the-art results on crystal structure prediction benchmarks, and de novo generation performance, attaining the best S.U.N. discovery score among the evaluated baselines while sampling substantially faster than geometry-heavy alternatives.
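A minimal single-head sketch of attention with an additive geometric bias, the mechanism GEM is described as using: periodic minimum-image pair distances are featurized and added to the attention logits. The orthorhombic-cell distance computation and the small MLP featurizer are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometricBiasAttention(nn.Module):
    """Self-attention whose logits receive an additive pairwise-geometry bias."""
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.bias_mlp = nn.Sequential(nn.Linear(1, 16), nn.SiLU(), nn.Linear(16, 1))
        self.scale = dim ** -0.5

    def forward(self, x, frac_coords, cell_lengths):
        # Periodic minimum-image pair distances (orthorhombic cell assumed).
        diff = frac_coords[:, :, None] - frac_coords[:, None, :]   # (B, N, N, 3)
        diff = diff - diff.round()                                 # wrap to [-0.5, 0.5)
        dist = (diff * cell_lengths[:, None, None]).norm(dim=-1)   # (B, N, N)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = q @ k.transpose(-2, -1) * self.scale
        logits = logits + self.bias_mlp(dist[..., None]).squeeze(-1)  # additive bias
        return F.softmax(logits, dim=-1) @ v
```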
【2】Ouroboros: Dynamic Weight Generation for Recursive Transformers via Input-Conditioned LoRA Modulation
Link: https://arxiv.org/abs/2604.02051
Authors: Jaber Jaber, Osama Jaber
Comments: 10 pages, 5 tables, 1 figure, 1 algorithm. Code: https://github.com/RightNow-AI/ouroboros
Abstract: Recursive transformers reuse a shared weight block across multiple depth steps, trading parameters for compute. A core limitation: every step applies the same transformation, preventing the model from composing distinct operations across depth. We present Ouroboros, a system that attaches a compact Controller hypernetwork to a recursive transformer block. The Controller observes the current hidden state, produces a per-step diagonal modulation vector, and applies it to frozen SVD-initialized LoRA bases, making each recurrence step input-dependent. We combine this with gated recurrence (bias-initialized to 88% retention) and per-step LayerNorm for stable deep iteration. On Qwen2.5-3B split into a Prelude/Recurrent/Coda architecture (17 of 36 layers retained), Ouroboros reduces training loss by 43.4% over the unmodified 17-layer baseline, recovering 51.3% of the performance gap caused by layer removal. The full system adds only 9.2M trainable parameters (Controller, gate, and per-step norms) yet outperforms equivalently-sized static per-step LoRA by 1.44 loss points at depth 1 and remains ahead across all tested depths (1, 4, 8, 16) and ranks (8, 32, 64). We also find that gated recurrence is essential: without it, recursive layer application makes the model strictly worse. These gains are measured on the training distribution; on held-out text, the Controller does not yet improve over the baseline, a limitation we attribute to frozen downstream layers and discuss in detail. Code: https://github.com/RightNow-AI/ouroboros
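A minimal sketch of the controller mechanism described above: a per-step diagonal modulation, produced from the hidden state, is applied between frozen SVD-initialized LoRA factors, and a gate whose bias is initialized so that its sigmoid gives roughly 88% retention blends each recurrence step. The shared block is passed in; per-step LayerNorm and all training details are omitted.

```python
import torch
import torch.nn as nn

class ModulatedRecurrentBlock(nn.Module):
    def __init__(self, d_model: int, rank: int, block: nn.Module):
        super().__init__()
        self.block = block                              # shared recursive block
        U, _, Vh = torch.linalg.svd(torch.randn(d_model, d_model))
        self.register_buffer("lora_down", Vh[:rank])    # frozen base, (r, d)
        self.register_buffer("lora_up", U[:, :rank])    # frozen base, (d, r)
        self.controller = nn.Linear(d_model, rank)      # emits diagonal modulation
        self.gate = nn.Linear(d_model, 1)
        nn.init.constant_(self.gate.bias, 2.0)          # sigmoid(2.0) ~ 0.88 retention

    def step(self, h: torch.Tensor) -> torch.Tensor:
        m = self.controller(h)                                   # (B, T, r)
        delta = (h @ self.lora_down.T * m) @ self.lora_up.T      # modulated LoRA path
        out = self.block(h) + delta
        g = torch.sigmoid(self.gate(h))                          # retention gate
        return g * h + (1 - g) * out
```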
【3】Physics-Informed Transformer for Multi-Band Channel Frequency Response Reconstruction
Link: https://arxiv.org/abs/2604.01944
Authors: Anatolij Zubow, Joana Angjo, Sigrid Dimce, Falko Dressler
Comments: 6 pages, 6 figures
Abstract: Wideband channel frequency response (CFR) estimation is challenging in multi-band wireless systems, especially when one or more sub-bands are temporarily blocked by co-channel interference. We present a physics-informed complex Transformer that reconstructs the full wideband CFR from such fragmented, partially observed spectrum snapshots. The interference pattern in each sub-band is modeled as an independent two-state discrete-time Markov chain, capturing realistic bursty occupancy behavior. Our model operates on the joint time-frequency grid of $T$ snapshots and $F$ frequency bins and uses a factored self-attention mechanism that separately attends along both axes, reducing the computational complexity to $O(TF^2 + FT^2)$. Complex-valued inputs and outputs are processed through a holomorphic linear layer that preserves phase relationships. Training uses a composite physics-informed loss combining spectral fidelity, power delay profile (PDP) reconstruction, channel impulse response (CIR) sparsity, and temporal smoothness. Mobility effects are incorporated through per-sample velocity randomization, enabling generalization across different mobility regimes. Evaluation against three classical baselines, namely, last-observation-carry-forward, zero-fill, and cubic-spline interpolation, shows that our approach achieves the highest PDP similarity with respect to the ground truth, reaching $\rho \geq 0.82$ compared to $\rho \geq 0.62$ for the best baseline at interference occupancy levels up to 50%. Furthermore, the model degrades smoothly across the full velocity range, consistently outperforming all other baselines.
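A minimal sketch of the factored self-attention described above: one attention pass along the frequency axis within each snapshot and one along the time axis within each bin, giving $O(TF^2 + FT^2)$ cost instead of $O(T^2F^2)$ for joint attention over the whole grid. The complex-valued (holomorphic) layers are omitted for brevity.

```python
import torch
import torch.nn as nn

class FactoredTimeFreqAttention(nn.Module):
    """Attend along frequency, then along time, over a (T, F) grid of features."""
    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.freq_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.time_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, Fq, D = x.shape
        xf = x.reshape(B * T, Fq, D)                    # frequency axis per snapshot
        xf = xf + self.freq_attn(xf, xf, xf)[0]
        xt = xf.reshape(B, T, Fq, D).permute(0, 2, 1, 3).reshape(B * Fq, T, D)
        xt = xt + self.time_attn(xt, xt, xt)[0]         # time axis per frequency bin
        return xt.reshape(B, Fq, T, D).permute(0, 2, 1, 3)
```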
【4】DDCL-INCRT: A Self-Organising Transformer with Hierarchical Prototype Structure (Theoretical Foundations)
Link: https://arxiv.org/abs/2604.01880
Authors: Giansalvo Cirrincione
Comments: 30 pages, 5 figures. Submitted to Neural Networks (Elsevier)
Abstract: Modern neural networks of the transformer family require the practitioner to decide, before training begins, how many attention heads to use, how deep the network should be, and how wide each component should be. These decisions are made without knowledge of the task, producing architectures that are systematically larger than necessary: empirical studies find that a substantial fraction of heads and layers can be removed after training without performance loss. This paper introduces DDCL-INCRT, an architecture that determines its own structure during training. Two complementary ideas are combined. The first, DDCL (Deep Dual Competitive Learning), replaces the feedforward block with a dictionary of learned prototype vectors representing the most informative directions in the data. The prototypes spread apart automatically, driven by the training objective, without explicit regularisation. The second, INCRT (Incremental Transformer), controls the number of heads: starting from one, it adds a new head only when the directional information uncaptured by existing heads exceeds a threshold. The main theoretical finding is that these two mechanisms reinforce each other: each new head amplifies prototype separation, which in turn raises the signal triggering the next addition. At convergence, the network self-organises into a hierarchy of heads ordered by representational granularity. This hierarchical structure is proved to be unique and minimal, the smallest architecture sufficient for the task, under the stated conditions. Formal guarantees of stability, convergence, and pruning safety are established throughout. The architecture is not something one designs. It is something one derives.
【5】Transformer self-attention encoder-decoder with multimodal deep learning for response time series forecasting and digital twin support in wind structural health monitoring
Link: https://arxiv.org/abs/2604.01712
Authors: Feiyu Zhou, Marios Impraimakis
Comments: 21 pages, 22 figures, 9 tables. This version corresponds to the published article in Computers & Structures. https://doi.org/10.1016/j.compstruc.2026.108216
Abstract: The wind-induced structural response forecasting capabilities of a novel transformer methodology are examined here. The model also provides a digital twin component for bridge structural health monitoring. Firstly, the approach uses the temporal characteristics of the system to train a forecasting model. Secondly, the vibration predictions are compared to the measured ones to detect large deviations. Finally, the identified cases are used as an early-warning indicator of structural change. The artificial intelligence-based model outperforms approaches for response forecasting as no assumption on wind stationarity or on structural normal vibration behavior is needed. Specifically, wind-excited dynamic behavior suffers from uncertainty related to obtaining poor predictions when the environmental or traffic conditions change. This results in a hard distinction of what constitutes normal vibration behavior. To this end, a framework is rigorously examined on real-world measurements from the Hardanger Bridge monitored by the Norwegian University of Science and Technology. The approach captures accurate structural behavior in realistic conditions, and with respect to the changes in the system excitation. The results, importantly, highlight the potential of transformer-based digital twin components to serve as next-generation tools for resilient infrastructure management, continuous learning, and adaptive monitoring over the system's lifecycle with respect to temporal characteristics.
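A minimal sketch of the forecast-deviation early-warning step described above: standardize the forecast residuals against a healthy baseline window and flag sustained exceedances. The window length and z-score threshold are assumptions for illustration.

```python
import numpy as np

def early_warning(y_pred, y_meas, window: int = 256, z_thresh: float = 4.0):
    """Flag time steps where measured response deviates strongly from forecast."""
    resid = np.asarray(y_meas) - np.asarray(y_pred)
    mu, sigma = resid[:window].mean(), resid[:window].std() + 1e-12
    z = np.abs(resid - mu) / sigma          # normalize by baseline residual stats
    return z > z_thresh                     # boolean alarm per time step
```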
【6】Efficient Equivariant Transformer for Self-Driving Agent Modeling
Link: https://arxiv.org/abs/2604.01466
Authors: Scott Xu, Dian Chen, Kelvin Wong, Chris Zhang, Kion Fallah, Raquel Urtasun
Comments: CVPR 2026
Abstract: Accurately modeling agent behaviors is an important task in self-driving. It is also a task with many symmetries, such as equivariance to the order of agents and objects in the scene or equivariance to arbitrary roto-translations of the entire scene as a whole; i.e., SE(2)-equivariance. The transformer architecture is a ubiquitous tool for modeling these symmetries. While standard self-attention is inherently permutation equivariant, explicit pairwise relative positional encodings have been the standard for introducing SE(2)-equivariance. However, this approach introduces an additional cost that is quadratic in the number of agents, limiting its scalability to larger scenes and batch sizes. In this work, we propose DriveGATr, a novel transformer-based architecture for agent modeling that achieves SE(2)-equivariance without the computational cost of existing methods. Inspired by recent advances in geometric deep learning, DriveGATr encodes scene elements as multivectors in the 2D projective geometric algebra $\mathbb{R}^*_{2,0,1}$ and processes them with a stack of equivariant transformer blocks. Crucially, DriveGATr models geometric relationships using standard attention between multivectors, eliminating the need for costly explicit pairwise relative positional encodings. Experiments on the Waymo Open Motion Dataset demonstrate that DriveGATr is comparable to the state-of-the-art in traffic simulation and establishes a superior Pareto front for performance vs computational cost.
【7】Homogenized Transformers
Link: https://arxiv.org/abs/2604.01978
Authors: Hugo Koubbi, Borjan Geshkovski, Philippe Rigollet
Abstract: We study a random model of deep multi-head self-attention in which the weights are resampled independently across layers and heads, as at initialization of training. Viewing depth as a time variable, the residual stream defines a discrete-time interacting particle system on the unit sphere. We prove that, under suitable joint scalings of the depth, the residual step size, and the number of heads, this dynamics admits a nontrivial homogenized limit. Depending on the scaling, the limit is either deterministic or stochastic with common noise; in the mean-field regime, the latter leads to a stochastic nonlinear Fokker--Planck equation for the conditional law of a representative token. In the Gaussian setting, the limiting drift vanishes, making the homogenized dynamics explicit enough to study representation collapse. This yields quantitative trade-offs between dimension, context length, and temperature, and identifies regimes in which clustering can be mitigated.
GANs | Adversarial | Attacks | Generation (4 papers)
【1】AEGIS: Adversarial Entropy-Guided Immune System -- Thermodynamic State Space Models for Zero-Day Network Evasion Detection
Link: https://arxiv.org/abs/2604.02149
Authors: Vickson Ferrel
Comments: 10 pages, 3 figures, 3 tables
Abstract: As TLS 1.3 encryption limits traditional Deep Packet Inspection (DPI), the security community has pivoted to Euclidean Transformer-based classifiers (e.g., ET-BERT) for encrypted traffic analysis. However, these models remain vulnerable to byte-level adversarial morphing -- recent pre-padding attacks reduced ET-BERT accuracy to 25.68%, while VLESS Reality bypasses certificate-based detection entirely. We introduce AEGIS: an Adversarial Entropy-Guided Immune System powered by a Thermodynamic Variance-Guided Hyperbolic Liquid State Space Model (TVD-HL-SSM). Rather than competing in the Euclidean payload-reading domain, AEGIS discards payload bytes in favor of 6-dimensional continuous-time flow physics projected into a non-Euclidean Poincare manifold. Liquid Time-Constants measure microsecond IAT decay, and a Thermodynamic Variance Detector computes sequence-wide Shannon Entropy to expose automated C2 tunnel anomalies. A pure C++ eBPF Harvester with zero-copy IPC bypasses the Python GIL, enabling a linear-time O(N) Mamba-3 core to process 64,000-packet swarms at line-rate. Evaluated on a 400GB, 4-tier adversarial corpus spanning backbone traffic, IoT botnets, zero-days, and proprietary VLESS Reality tunnels, AEGIS achieves an F1-score of 0.9952 and 99.50% True Positive Rate at 262 us inference latency on an RTX 4090, establishing a new state-of-the-art for physics-based adversarial network defense.
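A minimal sketch of a sequence-wide Shannon entropy measure over packet inter-arrival times (IATs), in the spirit of the Thermodynamic Variance Detector described above: automated C2 beaconing tends to produce unusually regular, low-entropy IAT distributions. The histogram binning and the downstream alarm threshold are assumptions.

```python
import numpy as np

def iat_entropy(timestamps: np.ndarray, n_bins: int = 32) -> float:
    """Shannon entropy (bits) of the inter-arrival-time distribution of a flow."""
    iats = np.diff(np.sort(timestamps))
    hist, _ = np.histogram(iats, bins=n_bins)
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Flows whose IAT entropy falls below a calibrated threshold are flagged as
# candidate automated tunnels.
```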
【2】CASHG: Context-Aware Stylized Online Handwriting Generation
Link: https://arxiv.org/abs/2604.02103
Authors: Jinsu Shin, Sungeun Hong, Jin Yeong Bak
Comments: 42 pages, 19 figures
Abstract: Online handwriting represents strokes as time-ordered trajectories, which makes handwritten content easier to transform and reuse in a wide range of applications. However, generating natural sentence-level online handwriting that faithfully reflects a writer's style remains challenging, since sentence synthesis demands context-dependent characters with stroke continuity and spacing. Prior methods treat these boundary properties as implicit outcomes of sequence modeling, which becomes unreliable at the sentence scale and under limited compositional diversity. We propose CASHG, a context-aware stylized online handwriting generator that explicitly models inter-character connectivity for style-consistent sentence-level trajectory synthesis. CASHG uses a Character Context Encoder to obtain character identity and sentence-dependent context memory and fuses them in a bigram-aware sliding-window Transformer decoder that emphasizes local predecessor--current transitions, complemented by gated context fusion for sentence-level context. Training proceeds through a three-stage curriculum from isolated glyphs to full sentences, improving robustness under sparse transition coverage. We further introduce Connectivity and Spacing Metrics (CSM), a boundary-aware evaluation suite that quantifies cursive connectivity and spacing similarity. Under benchmark-matched evaluation protocols, CASHG consistently improves CSM over comparison methods while remaining competitive in DTW-based trajectory similarity, with gains corroborated by a human evaluation.
【3】RuleForge: Automated Generation and Validation for Web Vulnerability Detection at Scale
Link: https://arxiv.org/abs/2604.01977
Authors: Ayush Garg, Sophia Hager, Jacob Montiel, Aditya Tiwari, Michael Gentile, Zach Reavis, David Magnotti, Wayne Fullen
Comments: 11 pages, 10 figures. To be submitted to CAMLIS 2026
Abstract: Security teams face a challenge: the volume of newly disclosed Common Vulnerabilities and Exposures (CVEs) far exceeds the capacity to manually develop detection mechanisms. In 2025, the National Vulnerability Database published over 48,000 new vulnerabilities, motivating the need for automation. We present RuleForge, an AWS internal system that automatically generates detection rules--JSON-based patterns that identify malicious HTTP requests exploiting specific vulnerabilities--from structured Nuclei templates describing CVE details. Nuclei templates provide standardized, YAML-based vulnerability descriptions that serve as the structured input for our rule generation process. This paper focuses on RuleForge's architecture and operational deployment for CVE-related threat detection, with particular emphasis on our novel LLM-as-a-judge (Large Language Model as judge) confidence validation system and systematic feedback integration mechanism. This validation approach evaluates candidate rules across two dimensions--sensitivity (avoiding false negatives) and specificity (avoiding false positives)--achieving AUROC of 0.75 and reducing false positives by 67% compared to synthetic-test-only validation in production. Our 5x5 generation strategy (five parallel candidates with up to five refinement attempts each) combined with continuous feedback loops enables systematic quality improvement. We also present extensions enabling rule generation from unstructured data sources and demonstrate a proof-of-concept agentic workflow for multi-event-type detection. Our lessons learned highlight critical considerations for applying LLMs to cybersecurity tasks, including overconfidence mitigation and the importance of domain expertise in both prompt design and quality review of generated rules through human-in-the-loop validation.
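A minimal sketch of the 5x5 generation strategy with two-dimensional judging described above. The callables and the 0.7 acceptance bar are hypothetical stand-ins; only the candidate/refinement counts and the sensitivity/specificity dimensions come from the abstract.

```python
def rule_forge(template, generate, refine, judge,
               n_candidates: int = 5, max_refine: int = 5, threshold: float = 0.7):
    """generate(template) -> rule; refine(rule, verdict) -> rule;
    judge(rule) -> {'sensitivity': float, 'specificity': float}."""
    accepted = []
    for _ in range(n_candidates):              # five parallel candidate rules
        rule = generate(template)
        for _ in range(max_refine):            # up to five refinement attempts each
            verdict = judge(rule)
            if min(verdict.values()) >= threshold:   # both dimensions must pass
                accepted.append((rule, verdict))
                break
            rule = refine(rule, verdict)       # feed judge feedback into refinement
    return accepted
```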
【4】Know Your Streams: On the Conceptualization, Characterization, and Generation of Intentional Event Streams
标题:了解你的流:关于有意事件流的概念化、描述和生成
链接:https://arxiv.org/abs/2604.01440
作者:Andrea Maldonado,Christian Imenkamp,Hendrik Reiter,Thomas Seidl,Wilhelm Hasselbring,Martin Werner,Agnes Koschmider
摘要:向支持物联网的传感器驱动系统的转变改变了运营数据的生成方式,使得连续的实时事件流(ES)取代了静态事件日志。这种演变给流过程挖掘(SPM)带来了新的挑战:它必须处理乱序事件、并发活动、不完整案例和概念漂移。然而,SPM算法的评估仍然植根于过时的实践,依赖于静态日志或人工流化的数据,无法反映现实世界流的复杂性。为了解决这一差距,我们首先对数据流文献进行了全面综述,以确定目前尚未被SPM社区关注的流特性。接下来,我们利用这些信息扩展了ES的概念基础。最后,我们提出了意图流(Stream of Intent),一个用于生成具有特定特征的ES的原型生成器。评估表明,该生成器能够出色地生成可复现、有意图的ES,用于SPM中的定向基准测试和自适应算法开发。
摘要:The shift toward IoT-enabled, sensor-driven systems has transformed how operational data is generated, favoring continuous, real-time event streams (ES) over static event logs. This evolution presents new challenges for Streaming Process Mining (SPM), which must cope with out-of-order events, concurrent activities, incomplete cases, and concept drifts. Yet, the evaluation of SPM algorithms remains rooted in outdated practices, relying on static logs or artificially streamified data that fail to reflect the complexities of real-world streams. To address this gap, we first perform a comprehensive review of data stream literature to identify stream characteristics currently not reflected in the SPM community. Next, we use this information to extend the conceptual foundation for ES. Finally, we propose Stream of Intent, a prototype generator to produce ES with specific features. Our evaluation shows excellence in producing reproducible, intentional ES for targeted benchmarking and adaptive algorithm development in SPM.
半/弱/无/有监督|不确定性|主动学习(9篇)
【1】When to ASK: Uncertainty-Gated Language Assistance for Reinforcement Learning
标题:何时询问:强化学习的不确定门控语言辅助
链接:https://arxiv.org/abs/2604.02226
作者:Juarez Monteiro,Nathan Gavenski,Gianlucca Zuin,Adriano Veloso
备注:In Proceedings of International Joint Conference on Neural Networks (IJCNN)
摘要:强化学习(RL)代理经常与分布外(OOD)场景作斗争,导致高度不确定性和随机行为。虽然语言模型(LM)包含有价值的世界知识,但较大的语言模型会产生较高的计算成本,阻碍实时使用,并在自主规划中表现出局限性。我们引入了知识自适应安全(ASK),它将较小的LM与经过训练的RL策略相结合,以增强OOD的泛化能力,而无需重新训练。ASK采用蒙特卡罗丢弃法来评估不确定性,并仅在不确定性超过设定阈值时才向LM查询行动建议。这种选择性使用保留了现有策略的效率,同时在不确定的情况下利用语言模型的推理。在FrozenLake环境的实验中,ASK在域内没有表现出任何改进,但在转移任务中表现出鲁棒的导航,获得了0.95的奖励。我们的研究结果表明,有效的神经符号整合需要仔细的编排,而不是简单的组合,强调需要足够的模型规模和有效的杂交机制,成功的OOD推广。
摘要:Reinforcement learning (RL) agents often struggle with out-of-distribution (OOD) scenarios, leading to high uncertainty and random behavior. While language models (LMs) contain valuable world knowledge, larger ones incur high computational costs, hindering real-time use, and exhibit limitations in autonomous planning. We introduce Adaptive Safety through Knowledge (ASK), which combines smaller LMs with trained RL policies to enhance OOD generalization without retraining. ASK employs Monte Carlo Dropout to assess uncertainty and queries the LM for action suggestions only when uncertainty exceeds a set threshold. This selective use preserves the efficiency of existing policies while leveraging the language model's reasoning in uncertain situations. In experiments on the FrozenLake environment, ASK shows no improvement in-domain, but demonstrates robust navigation in transfer tasks, achieving a reward of 0.95. Our findings indicate that effective neuro-symbolic integration requires careful orchestration rather than simple combination, highlighting the need for sufficient model scale and effective hybridization mechanisms for successful OOD generalization.
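下面给出摘要中"不确定性门控"机制的一个极简示意(基于摘要理解的草图,非论文官方实现):用蒙特卡洛dropout做多次随机前向以估计预测熵,超过阈值才调用LM;其中 query_language_model、网络结构与阈值均为示例性假设。

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, obs_dim=16, n_actions=4, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def mc_dropout_uncertainty(policy, obs, n_samples=20):
    """推理时保持dropout激活,多次随机前向,返回平均策略与预测熵。"""
    policy.train()  # 关键:推理阶段不切换到eval,保留dropout的随机性
    with torch.no_grad():
        probs = torch.stack([torch.softmax(policy(obs), dim=-1)
                             for _ in range(n_samples)])
    mean_probs = probs.mean(dim=0)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-8).log()).sum(dim=-1)
    return mean_probs, entropy.item()

def query_language_model(obs):
    # 假设的LM查询接口:实际系统中应把状态序列化为文本提示再解析回动作
    return 0

def act(policy, obs, threshold=1.0):
    mean_probs, unc = mc_dropout_uncertainty(policy, obs)
    if unc > threshold:                 # 不确定性超过阈值时才求助LM
        return query_language_model(obs)
    return int(mean_probs.argmax(dim=-1).item())

print(act(PolicyNet(), torch.randn(1, 16)))
```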
【2】Feature Weighting Improves Pool-Based Sequential Active Learning for Regression
标题:特征加权改进了基于池的回归顺序主动学习
链接:https://arxiv.org/abs/2604.02019
作者:Dongrui Wu
摘要:基于池的序贯主动学习回归算法(ALR)从大量未标记样本池中依次选择少量样本进行标记,从而在给定的标记预算下构造出更精确的回归模型。代表性和多样性,涉及计算不同样本之间的距离,是ALR的重要考虑因素。然而,以前的ALR方法没有将不同的功能在样本间距离计算的重要性,导致次优的样本选择。本文提出了三种特征加权的单任务ALR方法和两种特征加权的多任务ALR方法,其中,从少量先前标记的样本中训练的岭回归系数用于在样本间距离计算中加权相应的特征。实验表明,这种易于实现的增强几乎总是提高现有的四个ALR方法的性能,在单任务和多任务回归问题。特征加权策略还可以容易地扩展到基于流的ALR和分类算法。
摘要:Pool-based sequential active learning for regression (ALR) optimally selects a small number of samples sequentially from a large pool of unlabeled samples to label, so that a more accurate regression model can be constructed under a given labeling budget. Representativeness and diversity, which involve computing the distances among different samples, are important considerations in ALR. However, previous ALR approaches do not incorporate the importance of different features in inter-sample distance computation, resulting in sub-optimal sample selection. This paper proposes three feature weighted single-task ALR approaches and two feature weighted multi-task ALR approaches, where the ridge regression coefficients trained from a small amount of previously labeled samples are used to weight the corresponding features in inter-sample distance computation. Experiments showed that this easy-to-implement enhancement almost always improves the performance of four existing ALR approaches, in both single-task and multi-task regression problems. The feature weighting strategy may also be easily extended to stream-based ALR and classification algorithms.
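按摘要思路的一个极简示意(非论文官方实现):用少量已标注样本拟合岭回归,以系数绝对值作为特征权重参与样本间距离计算,再做一次基于代表性的贪心选样;权重形式与选样准则均为示例性假设。

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([2.0, 0.5, 0.0, 1.0, 0.1]) + 0.1 * rng.normal(size=200)

labeled = list(range(10))                       # 少量已标注样本
ridge = Ridge(alpha=1.0).fit(X[labeled], y[labeled])
w = np.abs(ridge.coef_)                         # 系数绝对值作为特征权重(示例设定)

def weighted_dist(a, b, w):
    """特征加权的欧氏距离。"""
    return np.sqrt(np.sum(w * (a - b) ** 2))

# 代表性选择:挑选与未标注池平均加权距离最小的样本
pool = [i for i in range(len(X)) if i not in labeled]
dists = [np.mean([weighted_dist(X[i], X[j], w) for j in pool]) for i in pool]
print("next sample to label:", pool[int(np.argmin(dists))])
```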
【3】Curia-2: Scaling Self-Supervised Learning for Radiology Foundation Models
标题:Curia-2:放射学基金会模型的自我监督学习规模
链接:https://arxiv.org/abs/2604.01987
作者:Antoine Saporta,Baptiste Callard,Corentin Dancette,Julien Khlaut,Charles Corbière,Leo Butsanets,Amaury Prat,Pierre Manceron
摘要:医学成像的快速发展推动了基础模型(FM)的发展,以缓解放射科医生日益增长且不可持续的工作量。虽然最近的FM已经展示了大规模预训练对CT和MRI分析的作用,但在优化这些模型如何从复杂的放射学体数据中学习方面仍有很大空间。在Curia框架的基础上,这项工作引入了Curia-2,它显著改进了原始的预训练策略和表示质量,以更好地捕捉放射学数据的特性。所提出的方法能够将架构扩展到十亿参数的Vision Transformer,这在多模态CT和MRI FM中尚属首次。此外,我们将CuriaBench扩展并重组为两个不同的赛道,以规范化这些模型的评估:面向基于切片的视觉模型的2D赛道和面向体数据基准测试的3D赛道。我们的结果表明,Curia-2在以视觉为中心的任务上优于所有FM,并且在临床复杂任务(如影像所见检测,finding detection)上与视觉语言模型相比具有竞争力。权重将公开提供,以促进进一步的研究。
摘要:The rapid growth of medical imaging has fueled the development of Foundation Models (FMs) to reduce the growing, unsustainable workload on radiologists. While recent FMs have shown the power of large-scale pre-training for CT and MRI analysis, there remains significant room to optimize how these models learn from complex radiological volumes. Building upon the Curia framework, this work introduces Curia-2, which significantly improves the original pre-training strategy and representation quality to better capture the specificities of radiological data. The proposed methodology enables scaling the architecture up to billion-parameter Vision Transformers, marking a first for multi-modal CT and MRI FMs. Furthermore, we formalize the evaluation of these models by extending and restructuring CuriaBench into two distinct tracks: a 2D track tailored for slice-based vision models and a 3D track for volumetric benchmarking. Our results demonstrate that Curia-2 outperforms all FMs on vision-focused tasks and fares competitively with vision-language models on clinically complex tasks such as finding detection. Weights will be made publicly available to foster further research.
【4】Learning Spatial Structure from Pre-Beamforming Per-Antenna Range-Doppler Radar Data via Visibility-Aware Cross-Modal Supervision
标题:通过可见性感知跨模态监督从波束成形前逐天线距离-多普勒雷达数据中学习空间结构
链接:https://arxiv.org/abs/2604.01921
作者:George Sebastian,Philipp Berthold,Bianca Forkel,Leon Pohl,Mirko Maehlisch
摘要:汽车雷达感知管道通常在应用基于学习的模型之前通过波束成形来构建角度域表示。这项工作转而研究一个表示层面的问题:能否直接从波束成形前的逐天线距离-多普勒(RD)测量中学习有意义的空间结构?实验在一部6-TX x 8-RX(48个虚拟天线)商用汽车雷达上进行,该雷达采用A/B线性调频序列调频连续波(CS-FMCW)发射方案,其中有效发射孔径在不同线性调频脉冲之间(单TX与多TX)变化,从而能够对与线性调频相关的发射配置进行受控分析。我们使用以端到端、完全数据驱动方式训练的双线性调频共享权重编码器对波束成形前的逐天线RD张量进行处理,并以鸟瞰图(BEV)占用作为几何探针而非性能驱动的目标来评估空间可恢复性。监督是可见性感知且跨模态的,源自LiDAR,通过基于射线的可见性对雷达视场和遮挡感知的LiDAR可观测性进行显式建模。通过线性调频消融(仅A、仅B、A+B)、距离分段分析和物理对齐的基线,我们评估了发射配置如何影响几何可恢复性。结果表明,无需显式的角度域构造或手工设计的信号处理阶段,即可直接从波束成形前的逐天线RD张量中学习空间结构。
摘要:Automotive radar perception pipelines commonly construct angle-domain representations via beamforming before applying learning-based models. This work instead investigates a representational question: can meaningful spatial structure be learned directly from pre-beamforming per-antenna range-Doppler (RD) measurements? Experiments are conducted on a 6-TX x 8-RX (48 virtual antennas) commodity automotive radar employing an A/B chirp-sequence frequency-modulated continuous-wave (CS-FMCW) transmit scheme, in which the effective transmit aperture varies between chirps (single-TX vs. multi-TX), enabling controlled analysis of chirp-dependent transmit configurations. We operate on pre-beamforming per-antenna RD tensors using a dual-chirp shared-weight encoder trained in an end-to-end, fully data-driven manner, and evaluate spatial recoverability using bird's-eye-view (BEV) occupancy as a geometric probe rather than a performance-driven objective. Supervision is visibility-aware and cross-modal, derived from LiDAR with explicit modeling of the radar field-of-view and occlusion-aware LiDAR observability via ray-based visibility. Through chirp ablations (A-only, B-only, A+B), range-band analysis, and physics-aligned baselines, we assess how transmit configurations affect geometric recoverability. The results indicate that spatial structure can be learned directly from pre-beamforming per-antenna RD tensors without explicit angle-domain construction or hand-crafted signal-processing stages.
【5】Enhancing the Reliability of Medical AI through Expert-guided Uncertainty Modeling
标题:通过专家指导的不确定性建模提高医疗人工智能的可靠性
链接:https://arxiv.org/abs/2604.01898
作者:Aleksei Khalin,Ekaterina Zaychenkova,Aleksandr Yugay,Andrey Goncharov,Sergey Korchagin,Alexey Zaytsev,Egor Ershov
摘要:人工智能(AI)系统加速了医疗工作流程,提高了医疗保健的诊断准确性,可作为第二意见系统。然而,人工智能错误的不可预测性构成了一个重大挑战,特别是在医疗保健领域,错误可能会造成严重后果。一个被广泛采用的保障措施是将预测与不确定性估计配对,使人类专家能够专注于高风险案例,同时简化常规验证。然而,目前的不确定性估计方法仍然有限,特别是在量化由数据模糊性和噪声引起的偶然(aleatoric)不确定性方面。为了解决这个问题,我们提出了一种新的方法,利用专家回答中的分歧来生成训练机器学习模型的目标。这些目标与标准数据标签一起使用,通过双集成方法及其轻量级变体,分别估计全方差定律给出的两个不确定性分量。我们在二值图像分类、二值与多类图像分割以及多项选择问答任务上验证了我们的方法。我们的实验表明,根据任务的不同,结合专家知识可以将不确定性估计质量提高9%至50%,使得这种信息来源对于构建医疗保健应用中的风险感知AI系统非常宝贵。
摘要:Artificial intelligence (AI) systems accelerate medical workflows and improve diagnostic accuracy in healthcare, serving as second-opinion systems. However, the unpredictability of AI errors poses a significant challenge, particularly in healthcare contexts, where mistakes can have severe consequences. A widely adopted safeguard is to pair predictions with uncertainty estimation, enabling human experts to focus on high-risk cases while streamlining routine verification. Current uncertainty estimation methods, however, remain limited, particularly in quantifying aleatoric uncertainty, which arises from data ambiguity and noise. To address this, we propose a novel approach that leverages disagreement in expert responses to generate targets for training machine learning models. These targets are used in conjunction with standard data labels to estimate two components of uncertainty separately, as given by the law of total variance, via a two-ensemble approach, as well as its lightweight variant. We validate our method on binary image classification, binary and multi-class image segmentation, and multiple-choice question answering. Our experiments demonstrate that incorporating expert knowledge can enhance uncertainty estimation quality by $9\%$ to $50\%$ depending on the task, making this source of information invaluable for the construction of risk-aware AI systems in healthcare applications.
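摘要提到的全方差定律分解可写成如下通用形式(记法为示意性补充,θ表示集成成员或模型参数;偶然项可由专家分歧生成的目标来监督,认知项由集成间分歧估计):

```latex
% 全方差定律:总预测方差 = 偶然不确定性 + 认知不确定性
\[
  \operatorname{Var}[y \mid x]
  = \underbrace{\mathbb{E}_{\theta}\big[\operatorname{Var}[y \mid x,\theta]\big]}_{\text{偶然(aleatoric)}}
  + \underbrace{\operatorname{Var}_{\theta}\big[\mathbb{E}[y \mid x,\theta]\big]}_{\text{认知(epistemic)}}
\]
```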
【6】Towards Intrinsically Calibrated Uncertainty Quantification in Industrial Data-Driven Models via Diffusion Sampler
标题:通过扩散采样器实现工业数据驱动模型中的内在校准不确定性量化
链接:https://arxiv.org/abs/2604.01870
作者:Yiran Ma,Jerome Le Ny,Zhichao Chen,Zhihuan Song
备注:This manuscript has been accepted for publication in IEEE Transactions on Industrial Informatics. Copyright has been transferred to IEEE. Reuse of this material is subject to IEEE copyright restrictions
摘要:在现代流程工业中,当关键性能指标难以直接测量时,数据驱动模型是实时监控的重要工具。虽然准确的预测必不可少,但可靠的不确定性量化(UQ)对于安全性、可靠性和决策同样关键,却仍是当前数据驱动方法面临的主要挑战。在这项工作中,我们引入了一个基于扩散的后验采样框架,通过忠实的后验采样内在地产生校准良好的预测不确定性,免除了事后校准的需要。在合成分布、基于拉曼光谱的苯乙酸软测量基准以及一个真实氨合成案例研究上的广泛评估中,我们的方法在不确定度校准和预测精度两方面均较现有UQ技术取得实际改进。这些结果凸显了扩散采样器作为一种有原则且可扩展的范式,可推进工业应用中的不确定性感知建模。
摘要:In modern process industries, data-driven models are important tools for real-time monitoring when key performance indicators are difficult to measure directly. While accurate predictions are essential, reliable uncertainty quantification (UQ) is equally critical for safety, reliability, and decision-making, but remains a major challenge in current data-driven approaches. In this work, we introduce a diffusion-based posterior sampling framework that inherently produces well-calibrated predictive uncertainty via faithful posterior sampling, eliminating the need for post-hoc calibration. In extensive evaluations on synthetic distributions, the Raman-based phenylacetic acid soft sensor benchmark, and a real ammonia synthesis case study, our method achieves practical improvements over existing UQ techniques in both uncertainty calibration and predictive accuracy. These results highlight diffusion samplers as a principled and scalable paradigm for advancing uncertainty-aware modeling in industrial applications.
【7】DDCL: Deep Dual Competitive Learning: A Differentiable End-to-End Framework for Unsupervised Prototype-Based Representation Learning
标题:DDCL:深度双重竞争学习:无监督基于原型的表示学习的差异化端到端框架
链接:https://arxiv.org/abs/2604.01740
作者:Giansalvo Cirrincione
摘要:深度聚类中一个持续存在的结构性弱点是特征学习和聚类分配之间的脱节。大多数架构调用外部聚类步骤(通常是k-means)来产生指导训练的伪标签,使主干无法直接针对聚类质量进行优化。本文介绍了深度双重竞争学习(DDCL),这是第一个完全可微的端到端无监督基于原型的表示学习框架。核心贡献是架构性的:外部k-means被内部双竞争层(DCL)取代,该层将原型作为网络的原生可微输出来生成。这一单一的反转使得从主干特征提取、到原型生成、再到软聚类分配的完整管道可以通过单一统一损失进行反向传播训练,没有Lloyd迭代,没有伪标签离散化,也没有外部聚类步骤。为了在理论上夯实该框架,本文推导出软量化损失的精确代数分解:一个单纯形约束的重建误差项加上一个非负加权原型方差项。该恒等式揭示了内建于损失几何中的自调节机制:方差项的梯度充当隐式分离力,在没有任何辅助目标的情况下抵抗原型坍缩,并导出了简化冻结编码器系统的全局李雅普诺夫稳定性定理。六组对照实验验证了每一项结构性预测:在超过十万个训练epoch中,分解恒等式保持零违例;负反馈循环以Pearson相关系数-0.98得到证实;在联合训练主干网络的设置下,DDCL的聚类准确率比其不可微消融版本高65%,比端到端的DeepCluster高122%。
摘要:A persistent structural weakness in deep clustering is the disconnect between feature learning and cluster assignment. Most architectures invoke an external clustering step, typically k-means, to produce pseudo-labels that guide training, preventing the backbone from directly optimising for cluster quality. This paper introduces Deep Dual Competitive Learning (DDCL), the first fully differentiable end-to-end framework for unsupervised prototype-based representation learning. The core contribution is architectural: the external k-means is replaced by an internal Dual Competitive Layer (DCL) that generates prototypes as native differentiable outputs of the network. This single inversion makes the complete pipeline, from backbone feature extraction through prototype generation to soft cluster assignment, trainable by backpropagation through a single unified loss, with no Lloyd iterations, no pseudo-label discretisation, and no external clustering step. To ground the framework theoretically, the paper derives an exact algebraic decomposition of the soft quantisation loss into a simplex-constrained reconstruction error and a non-negative weighted prototype variance term. This identity reveals a self-regulating mechanism built into the loss geometry: the gradient of the variance term acts as an implicit separation force that resists prototype collapse without any auxiliary objective, and leads to a global Lyapunov stability theorem for the reduced frozen-encoder system. Six blocks of controlled experiments validate each structural prediction. The decomposition identity holds with zero violations across more than one hundred thousand training epochs; the negative feedback cycle is confirmed with Pearson -0.98; with a jointly trained backbone, DDCL outperforms its non-differentiable ablation by 65% in clustering accuracy and DeepCluster end-to-end by 122%.
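摘要所述"软量化损失 = 单纯形约束重建误差 + 非负加权原型方差"的恒等式,可用如下示意记法写出(a_{ik}为软分配权重,p_k为原型;记号为我们补充,非论文原文):

```latex
\[
  \sum_{k} a_{ik}\,\lVert x_i - p_k \rVert^2
  = \underbrace{\lVert x_i - \bar p_i \rVert^2}_{\text{重建误差}}
  + \underbrace{\sum_{k} a_{ik}\,\lVert p_k - \bar p_i \rVert^2}_{\text{加权原型方差}\,\ge\,0},
  \quad \bar p_i = \sum_k a_{ik}\, p_k,\ \ \sum_k a_{ik} = 1,\ a_{ik}\ge 0 .
\]
```

该分解与均值-方差分解同源:交叉项因 \(\sum_k a_{ik}(p̄_i - p_k) = 0\) 而消失,故方差项恒非负,其梯度即摘要所述的隐式分离力。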
【8】Variational LSTM with Augmented Inputs: Nonlinear Response History Metamodeling with Aleatoric and Epistemic Uncertainty
标题:具有增强输入的变分LSTM:具有偶然与认知不确定性的非线性响应历史元建模
链接:https://arxiv.org/abs/2604.01587
作者:Manisha Sapkota,Min Li,Bowei Li
备注:22 pages, 10 figures
摘要:高维非线性动力结构系统中的不确定性传播是最先进的基于性能的设计和风险评估的关键,其中来自激励和结构两方面的不确定性,即偶然(aleatoric)不确定性,必须予以考虑。由于计算需求巨大,这构成了一个重大挑战。因此,机器学习技术被引入作为元模型来减轻这一负担。然而,机器学习模型的"黑箱"性质强调了避免过度自信预测的必要性,特别是在数据和训练不足的情况下。这就要求基于机器学习的元模型在考虑偶然不确定性之外,还要估计与预测置信度相关的不确定性,即认知(epistemic)不确定性。我们开发了一种基于带增强输入的变分长短期记忆(LSTM)网络的概率元建模技术,以同时捕捉偶然与认知不确定性。关键随机系统参数与携带记录间变异性的激励序列一起作为增强输入,以捕捉偶然不确定性的全部范围;同时,认知不确定性通过蒙特卡洛dropout方案得到有效近似。与计算昂贵的全贝叶斯方法不同,该方法在实现几乎零成本的不确定性模拟的同时,仅产生可忽略的额外训练成本。所提出的技术通过多个涉及随机地震或风激励的案例研究得到验证。结果表明,校准后的元模型准确地再现了非线性响应时程,并给出了反映相应认知不确定性的置信区间。
摘要:Uncertainty propagation in high-dimensional nonlinear dynamic structural systems is pivotal in state-of-the-art performance-based design and risk assessment, where uncertainties from both excitations and structures, i.e., the aleatoric uncertainty, must be considered. This poses a significant challenge due to heavy computational demands. Machine learning techniques are thus introduced as metamodels to alleviate this burden. However, the "black box" nature of Machine learning models underscores the necessity of avoiding overly confident predictions, particularly when data and training efforts are insufficient. This creates a need, in addition to considering the aleatoric uncertainty, of estimating the uncertainty related to the prediction confidence, i.e., epistemic uncertainty, for machine learning-based metamodels. We developed a probabilistic metamodeling technique based on a variational long short-term memory (LSTM) with augmented inputs to simultaneously capture aleatoric and epistemic uncertainties. Key random system parameters are treated as augmented inputs alongside excitation series carrying record-to-record variability to capture the full range of aleatoric uncertainty. Meanwhile, epistemic uncertainty is effectively approximated via the Monte Carlo dropout scheme. Unlike computationally expensive full Bayesian approaches, this method incurs negligible additional training costs while enabling nearly cost-free uncertainty simulation. The proposed technique is demonstrated through multiple case studies involving stochastic seismic or wind excitations. Results show that the calibrated metamodels accurately reproduce nonlinear response time histories and provide confidence bounds indicating the associated epistemic uncertainty.
【9】UQ-SHRED: uncertainty quantification of shallow recurrent decoder networks for sparse sensing via engression
标题:UQ-SHRED:通过engression对用于稀疏感知的浅循环解码器网络进行不确定性量化
链接:https://arxiv.org/abs/2604.01305
作者:Mars Liyao Gao,Yuxuan Bao,Amy S. Rude,Xinwei Shen,J. Nathan Kutz
摘要:从稀疏传感器测量数据重建高维时空场在广泛的科学应用中至关重要。SHRED(SHallow REcurrent Decoder)架构是近期的先进架构,可从超稀疏传感器测量流重建高质量的空间场。SHRED的一个重要局限在于:在复杂、数据稀缺、高频或随机的系统中,时空场的部分区域必须在有效的不确定性估计下进行建模。我们提出了UQ-SHRED,一个面向稀疏感知问题的分布学习框架,通过一种称为engression的基于神经网络的分布回归提供不确定性量化。UQ-SHRED通过学习以传感器历史为条件的空间状态预测分布来对不确定性建模。通过向传感器输入中注入随机噪声并使用能量得分损失进行训练,UQ-SHRED以极小的计算开销产生预测分布:仅需在输入处注入噪声并通过单个架构进行重采样,而无需重新训练或额外的网络结构。在包括湍流、大气动力学、神经科学和天体物理学在内的复杂合成与真实数据集上,UQ-SHRED给出了具有良好校准置信区间的分布近似。我们进一步进行消融研究,以了解各模型设置如何影响UQ-SHRED的性能,以及其不确定性量化在一组不同实验设置上的有效性。
摘要:Reconstructing high-dimensional spatiotemporal fields from sparse sensor measurements is critical in a wide range of scientific applications. The SHallow REcurrent Decoder (SHRED) architecture is a recent state-of-the-art architecture that reconstructs high-quality spatial domain from hyper-sparse sensor measurement streams. An important limitation of SHRED is that in complex, data-scarce, high-frequency, or stochastic systems, portions of the spatiotemporal field must be modeled with valid uncertainty estimation. We introduce UQ-SHRED, a distributional learning framework for sparse sensing problems that provides uncertainty quantification through a neural network-based distributional regression called engression. UQ-SHRED models the uncertainty by learning the predictive distribution of the spatial state conditioned on the sensor history. By injecting stochastic noise into sensor inputs and training with an energy score loss, UQ-SHRED produces predictive distributions with minimal computational overhead, requiring only noise injection at the input and resampling through a single architecture without retraining or additional network structures. On complicated synthetic and real-life datasets including turbulent flow, atmospheric dynamics, neuroscience and astrophysics, UQ-SHRED provides a distributional approximation with well-calibrated confidence intervals. We further conduct ablation studies to understand how each model setting affects the quality of the UQ-SHRED performance, and its validity on uncertainty quantification over a set of different experimental setups.
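能量得分(energy score)损失的一个极简示意(非UQ-SHRED官方实现;样本维度约定与估计方式为示例设定):对m个噪声注入下的预测样本,损失为 E||Y−y|| − 0.5·E||Y−Y′|| 的样本估计,最小化它会促使预测分布既贴近观测又保持分散。

```python
import torch

def energy_score_loss(samples, target):
    """samples: (m, B, D) 噪声注入下的 m 个预测样本;target: (B, D) 观测。"""
    m = samples.shape[0]
    # 第一项:E||Y - y||,预测样本到观测的平均距离
    term1 = (samples - target.unsqueeze(0)).norm(dim=-1).mean()
    # 第二项:E||Y - Y'||,用不同样本两两相减估计(i==j 的项为 0,不影响求和)
    diff = samples.unsqueeze(0) - samples.unsqueeze(1)        # (m, m, B, D)
    term2 = diff.norm(dim=-1).sum() / (m * (m - 1) * samples.shape[1])
    return term1 - 0.5 * term2

samples = torch.randn(8, 4, 16)   # 8 个随机前向样本,batch=4,状态维度=16
target = torch.randn(4, 16)
print(energy_score_loss(samples, target))
```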
迁移|Zero/Few/One-Shot|自适应(5篇)
【1】Auction-Based Online Policy Adaptation for Evolving Objectives
标题:基于拍卖的在线政策适应不断变化的目标
链接:https://arxiv.org/abs/2604.02151
作者:Guruprerana Shabadi,Kaushik Mallik
备注:17 pages, 6 figures
摘要:我们考虑多目标强化学习问题,其中目标来自同一个家族-例如可达性目标类-并且可能在运行时出现或消失。我们的目标是设计自适应的政策,可以有效地调整他们的行为,作为一套积极的目标的变化。为了解决这个问题,我们提出了一个模块化的框架,其中每个目标都由一个自私的本地政策支持,并通过一种新的基于拍卖的机制实现协调:政策投标执行其行动的权利,投标反映了当前状态的紧迫性。出价最高者选择行动,从而实现目标之间的动态和可解释的权衡。回到最初的适应问题,当目标改变时,系统通过简单地添加或删除相应的策略来适应。此外,由于目标来自相同的家族,因此可以部署参数化策略的相同副本,从而便于在运行时立即进行调整。我们展示了如何计算自私的本地政策,把问题变成一个一般和游戏,其中的政策相互竞争,以实现自己的目标。为了取得成功,每项政策不仅必须优化自己的目标,还必须考虑其他目标的存在,并学会制定反映相对优先级的校准出价。在我们的实现中,使用邻近策略优化(PPO)同时训练策略。我们评估Atari Assault和基于网格世界的动态目标路径规划任务。我们的方法比使用PPO训练的单片策略实现了更好的性能。
摘要:We consider multi-objective reinforcement learning problems where objectives come from an identical family -- such as the class of reachability objectives -- and may appear or disappear at runtime. Our goal is to design adaptive policies that can efficiently adjust their behaviors as the set of active objectives changes. To solve this problem, we propose a modular framework where each objective is supported by a selfish local policy, and coordination is achieved through a novel auction-based mechanism: policies bid for the right to execute their actions, with bids reflecting the urgency of the current state. The highest bidder selects the action, enabling a dynamic and interpretable trade-off among objectives. Going back to the original adaptation problem, when objectives change, the system adapts by simply adding or removing the corresponding policies. Moreover, as objectives arise from the same family, identical copies of a parameterized policy can be deployed, facilitating immediate adaptation at runtime. We show how the selfish local policies can be computed by turning the problem into a general-sum game, where the policies compete against each other to fulfill their own objectives. To succeed, each policy must not only optimize its own objective, but also reason about the presence of other goals and learn to produce calibrated bids that reflect relative priority. In our implementation, the policies are trained concurrently using proximal policy optimization (PPO). We evaluate on Atari Assault and a gridworld-based path-planning task with dynamic targets. Our method achieves substantially better performance than monolithic policies trained with PPO.
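摘要中拍卖机制的骨架示意(非官方实现):每个自利局部策略对当前状态给出(出价, 动作),最高出价者的动作被执行;此处用随机打分代替论文中经PPO训练得到的价值/出价函数,目标集合的增删即对应策略列表的增删。

```python
import numpy as np

class LocalPolicy:
    def __init__(self, goal, seed=0):
        self.goal = goal
        self.rng = np.random.default_rng(seed)

    def bid_and_action(self, state):
        # 出价应反映当前状态下本目标的紧迫程度;
        # 此处用随机打分代替已训练的校准出价函数(示例占位)
        urgency = float(self.rng.random())
        action = int(self.rng.integers(4))
        return urgency, action

def auction_step(policies, state):
    """各策略出价,最高出价者的动作被执行。"""
    bids = [p.bid_and_action(state) for p in policies]
    winner = int(np.argmax([b[0] for b in bids]))
    return bids[winner][1], winner

# 目标集合变化时,只需增删对应的策略实例
policies = [LocalPolicy(goal=g, seed=g) for g in range(3)]
action, winner = auction_step(policies, state=None)
print(f"policy {winner} wins the auction and takes action {action}")
```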
【2】CANDI: Curated Test-Time Adaptation for Multivariate Time-Series Anomaly Detection Under Distribution Shift
标题:CANDI:分布漂移下多元时间序列异常检测的精心策划的测试时间自适应
链接:https://arxiv.org/abs/2604.01845
作者:HyunGi Kim,Jisoo Mok,Hyungyu Lee,Juhyeon Shin,Sungroh Yoon
备注:AAAI 2026
摘要:多变量时间序列异常检测(MTSAD)旨在识别多变量时间序列中偏离正常模式的行为,在现实世界应用中至关重要。然而,在现实世界的部署中,分布偏移无处不在,并导致预训练异常检测器的性能严重下降。测试时自适应(TTA)仅使用未标注的测试数据实时更新预训练模型,使其有望解决这一挑战。在这项研究中,我们提出了CANDI(Curated test-time adaptation for multivariate time-series ANomaly detection under DIstribution shift),这是一种新颖的TTA框架,可以选择性地识别并适应潜在的误报,同时保留预训练知识。CANDI引入了假阳性挖掘(FPM)策略,根据异常分数和潜在空间相似性来筛选适应样本,并结合了一个即插即用的时空感知常态适应(SANA)模块,用于结构化的模型更新。大量实验表明,CANDI显著提升了分布偏移下MTSAD的性能,在使用更少适应样本的同时将AUROC提高多达14%。
摘要:Multivariate time-series anomaly detection (MTSAD) aims to identify deviations from normality in multivariate time-series and is critical in real-world applications. However, in real-world deployments, distribution shifts are ubiquitous and cause severe performance degradation in pre-trained anomaly detector. Test-time adaptation (TTA) updates a pre-trained model on-the-fly using only unlabeled test data, making it promising for addressing this challenge. In this study, we propose CANDI (Curated test-time adaptation for multivariate time-series ANomaly detection under DIstribution shift), a novel TTA framework that selectively identifies and adapts to potential false positives while preserving pre-trained knowledge. CANDI introduces a False Positive Mining (FPM) strategy to curate adaptation samples based on anomaly scores and latent similarity, and incorporates a plug-and-play Spatiotemporally-Aware Normality Adaptation (SANA) module for structurally informed model updates. Extensive experiments demonstrate that CANDI significantly improves the performance of MTSAD under distribution shift, improving AUROC up to 14% while using fewer adaptation samples.
【3】Koopman-Based Nonlinear Identification and Adaptive Control of a Turbofan Engine
标题:基于Koopman的涡轮风扇发动机非线性辨识与自适应控制
链接:https://arxiv.org/abs/2604.01730
作者:David Grasev
备注:21 pages, 23 figures
摘要:本文研究了基于Koopman算子的双轴涡扇发动机多变量控制方法。开发了一个基于物理的部件级模型,用于生成训练数据并验证控制器。提出了一种元启发式扩展动态模式分解,其代价函数被设计为精确捕捉转子转速(spool speed)动态和发动机压比(EPR),从而能够构建适用于多个控制目标的单一Koopman模型。基于所辨识的时变Koopman模型,开发了两个控制器:一个带扰动观测器的自适应Koopman模型预测控制器(AKMPC),以及作为基准的Koopman反馈线性化控制器(K-FBLC)。针对两种控制策略(即转子转速与EPR的配置组合),在海平面与变化飞行条件下对控制器进行了评估。结果表明,所提出的辨识方法能够准确预测两个转子转速和EPR,使Koopman模型可以灵活地在不同控制方案之间复用。虽然两种控制策略在稳态条件下性能相当,但由于AKMPC能够补偿模型失配,它在变化飞行条件下比K-FBLC表现出更强的鲁棒性。此外,EPR控制策略改善了推力响应。该研究突出了基于Koopman的控制的适用性,并展示了基于AKMPC的框架在鲁棒涡扇发动机控制上的优势。
摘要:This paper investigates Koopman operator-based approaches for multivariable control of a two-spool turbofan engine. A physics-based component-level model is developed to generate training data and validate the controllers. A meta-heuristic extended dynamic mode decomposition is developed, with a cost function designed to accurately capture both spool-speed dynamics and the engine pressure ratio (EPR), enabling the construction of a single Koopman model suitable for multiple control objectives. Using the identified time-varying Koopman model, two controllers are developed: an adaptive Koopman-based model predictive controller (AKMPC) with a disturbance observer and a Koopman-based feedback linearization controller (K-FBLC), which serves as a benchmark. The controllers are evaluated for two control strategies, namely configurations of spool speeds and EPR, under both sea-level and varying flight conditions. The results demonstrate that the proposed identification approach enables accurate predictions of both spool speeds and EPR, allowing the Koopman model to be reused flexibly across different control formulations. While both control strategies achieve comparable performance in steady conditions, the AKMPC exhibits superior robustness compared with the K-FBLC under varying flight conditions due to its ability to compensate for model mismatch. Moreover, the EPR control strategy improves the thrust response. The study highlights the applicability of Koopman-based control and demonstrates the advantages of the AKMPC-based framework for robust turbofan engine control.
【4】Prime Once, then Reprogram Locally: An Efficient Alternative to Black-Box Service Model Adaptation
标题:先启动一次,然后在本地重新编程:黑匣子服务模型适应的有效替代方案
链接:https://arxiv.org/abs/2604.01474
作者:Yunbei Zhang,Chengyi Cai,Feng Liu,Jihun Hamm
备注:CVPR 2026
摘要:为目标任务调整封闭式服务模型(即API)通常依赖于通过零阶优化(ZOO)进行的重编程。然而,这一标准策略以大量且昂贵的API调用著称,并常常陷入缓慢、不稳定的优化。此外,我们观察到该范式在现代API(例如GPT-4o)上面临新的挑战:这些模型可能对ZOO所依赖的输入扰动不够敏感,从而阻碍性能提升。为了解决这些限制,我们提出了一种面向服务模型的高效替代重编程方法(AReS)。AReS不进行直接、连续的闭盒优化,而是与服务API进行单趟交互,以启动(prime)一个易于适配的本地预训练编码器。这一启动阶段只训练本地编码器顶部的一个轻量级层,使其对随后直接在本地模型上执行的玻璃盒(白盒)重编程阶段高度易于接受。因此,所有后续的适配和推理都只依赖这个本地代理,从而免除了所有进一步的API成本。实验证明了AReS在先前基于ZOO的方法难以奏效之处的有效性:在GPT-4o上,AReS相对zero-shot基线取得了+27.8%的增益,而基于ZOO的方法在该任务上几乎没有改进。总体而言,在10个不同的数据集上,AReS的性能优于最先进的方法(VLM为+2.5%,标准VM为+15.6%),同时将API调用减少超过99.99%。因此,AReS为适配现代闭盒模型提供了一个强大而实用的解决方案。
摘要:Adapting closed-box service models (i.e., APIs) for target tasks typically relies on reprogramming via Zeroth-Order Optimization (ZOO). However, this standard strategy is known for extensive, costly API calls and often suffers from slow, unstable optimization. Furthermore, we observe that this paradigm faces new challenges with modern APIs (e.g., GPT-4o). These models can be less sensitive to the input perturbations ZOO relies on, thereby hindering performance gains. To address these limitations, we propose an Alternative efficient Reprogramming approach for Service models (AReS). Instead of direct, continuous closed-box optimization, AReS initiates a single-pass interaction with the service API to prime an amenable local pre-trained encoder. This priming stage trains only a lightweight layer on top of the local encoder, making it highly receptive to the subsequent glass-box (white-box) reprogramming stage performed directly on the local model. Consequently, all subsequent adaptation and inference rely solely on this local proxy, eliminating all further API costs. Experiments demonstrate AReS's effectiveness where prior ZOO-based methods struggle: on GPT-4o, AReS achieves a +27.8% gain over the zero-shot baseline, a task where ZOO-based methods provide little to no improvement. Broadly, across ten diverse datasets, AReS outperforms state-of-the-art methods (+2.5% for VLMs, +15.6% for standard VMs) while reducing API calls by over 99.99%. AReS thus provides a robust and practical solution for adapting modern closed-box models.
【5】Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning
标题:自适应反向强化学习中反事实梯度估计的Malliavin演算
链接:https://arxiv.org/abs/2604.01345
作者:Vikram Krishnamurthy,Luke Snow
摘要:逆强化学习(IRL)根据观察到的响应恢复前向学习者的损失函数;自适应IRL旨在通过在前向学习者执行强化学习(RL)时被动观察其梯度来重建其损失函数。本文提出了一种实现自适应IRL的新型被动Langevin算法。自适应IRL的关键困难在于被动算法中所需的梯度是反事实的,也就是说,它们以前向学习者轨迹下概率为零的事件为条件。因此,朴素蒙特卡罗估计极其低效,而核平滑虽然常用,却收敛缓慢。我们通过采用Malliavin演算来高效估计所需的反事实梯度以克服这一点。我们将反事实条件化重新表述为涉及Malliavin量的无条件期望之比,从而恢复标准的估计速率。我们针对一般Langevin结构推导了所需的Malliavin导数及其伴随Skorohod积分表述,并给出了利用它们进行反事实梯度估计的具体算法方法。
摘要:Inverse reinforcement learning (IRL) recovers the loss function of a forward learner from its observed responses; adaptive IRL aims to reconstruct the loss function of a forward learner by passively observing its gradients as it performs reinforcement learning (RL). This paper proposes a novel passive Langevin-based algorithm that achieves adaptive IRL. The key difficulty in adaptive IRL is that the required gradients in the passive algorithm are counterfactual, that is, they are conditioned on events of probability zero under the forward learner's trajectory. Therefore, naive Monte Carlo estimators are prohibitively inefficient, and kernel smoothing, though common, suffers from slow convergence. We overcome this by employing Malliavin calculus to efficiently estimate the required counterfactual gradients. We reformulate the counterfactual conditioning as a ratio of unconditioned expectations involving Malliavin quantities, thus recovering standard estimation rates. We derive the necessary Malliavin derivatives and their adjoint Skorohod integral formulations for a general Langevin structure, and provide a concrete algorithmic approach which exploits these for counterfactual gradient estimation.
强化学习(8篇)
【1】SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
标题:SKILL 0:用于技能内化的上下文强化学习
链接:https://arxiv.org/abs/2604.02268
作者:Zhengxi Lu,Zhiyuan Yao,Jinyang Wu,Chengcheng Han,Qi Gu,Xunliang Cai,Weiming Lu,Jun Xiao,Yueting Zhuang,Yongliang Shen
摘要:代理的技能,即结构化的程序性知识包和可执行资源,由代理在推理时动态加载,已成为增强LLM代理的可靠机制。然而,推理时的技能增强从根本上是受限的:检索噪声会引入不相关的指导,注入的技能内容会带来大量的令牌开销,而模型从未真正习得它只是照做的知识。我们追问:技能能否被内化到模型参数中,从而在没有任何运行时技能检索的情况下实现zero-shot自主行为?我们介绍SKILL0,这是一个为技能内化而设计的上下文强化学习框架。SKILL0引入了一个训练时课程,从完整的技能上下文开始并逐渐撤出。技能按类别离线分组,并与交互历史一起渲染为紧凑的视觉上下文,教会模型工具调用和多轮任务完成。随后,动态课程评估每个技能文件在当前策略下的有用性,在线性衰减的预算内只保留当前策略仍能从中受益的技能文件,直到代理在完全zero-shot的设置下运行。广泛的代理实验表明,SKILL0在标准RL基线上取得了实质性改进(ALFWorld为+9.7%,Search-QA为+6.6%),同时保持每步不到0.5k令牌的高效上下文。我们的代码可在https://github.com/ZJU-REAL/SkillZero上获得。
摘要:Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill content imposes substantial token overhead, and the model never truly acquires the knowledge it merely follows. We ask whether skills can instead be internalized into model parameters, enabling zero-shot autonomous behavior without any runtime skill retrieval. We introduce SKILL0, an in-context reinforcement learning framework designed for skill internalization. SKILL0 introduces a training-time curriculum that begins with full skill context and progressively withdraws it. Skills are grouped offline by category and rendered with interaction history into a compact visual context, teaching the model tool invocation and multi-turn task completion. A Dynamic Curriculum then evaluates each skill file's on-policy helpfulness, retaining only those from which the current policy still benefits within a linearly decaying budget, until the agent operates in a fully zero-shot setting. Extensive agentic experiments demonstrate that SKILL0 achieves substantial improvements over the standard RL baseline (+9.7\% for ALFWorld and +6.6\% for Search-QA), while maintaining a highly efficient context of fewer than 0.5k tokens per step. Our code is available at https://github.com/ZJU-REAL/SkillZero.
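摘要中"技能上下文线性衰减预算"的一个示意(非官方实现;以字符数近似令牌数,helpfulness评分为假设的占位输入):训练推进时预算线性降为0,从而平滑过渡到完全zero-shot设置。

```python
def skill_token_budget(step, total_steps, max_tokens=512):
    """从满预算线性衰减到 0,使训练末期进入完全 zero-shot 设置。"""
    frac = max(0.0, 1.0 - step / total_steps)
    return int(max_tokens * frac)

def build_context(skill_files, helpfulness, budget):
    """按策略上有用性降序,仅保留预算内的技能文件(此处以字符数近似令牌数)。"""
    ranked = sorted(zip(skill_files, helpfulness), key=lambda x: -x[1])
    ctx, used = [], 0
    for text, _ in ranked:
        if used + len(text) > budget:
            break
        ctx.append(text)
        used += len(text)
    return "\n".join(ctx)

skills = ["skill A: how to search the web ...", "skill B: tool call format ..."]
scores = [0.9, 0.4]   # 假设由动态课程评估得到的 on-policy 有用性
print(build_context(skills, scores, skill_token_budget(step=300, total_steps=1000)))
```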
【2】Model-Based Reinforcement Learning for Control under Time-Varying Dynamics
标题:时变动态下基于模型的强化学习控制
链接:https://arxiv.org/abs/2604.02260
作者:Klemens Iten,Bruce Lee,Chenhao Li,Lenart Treven,Andreas Krause,Bhavya Sukhija
备注:15 pages, 5 figues, 2 tables. This work has been submitted to the IEEE for possible publication
摘要:基于学习的控制方法通常假设系统动态是平稳的,而由于漂移、磨损或运行条件变化,这一假设在现实世界系统中经常被违反。我们研究时变动态下用于控制的强化学习。我们考虑一个持续的基于模型的强化学习设定,其中代理反复学习并控制一个转移动态随回合演化的动力系统。我们在频率学派的变差预算(variation-budget)假设下,使用高斯过程动力学模型分析该问题。我们的分析表明,持续的非平稳性要求显式限制过时数据的影响,以保持校准的不确定性和有意义的动态遗憾保证。受这些见解的启发,我们提出了一种实用的、带自适应数据缓冲机制的乐观基于模型的强化学习算法,并在具有非平稳动态的连续控制基准上展示了性能提升。
摘要:Learning-based control methods typically assume stationary system dynamics, an assumption often violated in real-world systems due to drift, wear, or changing operating conditions. We study reinforcement learning for control under time-varying dynamics. We consider a continual model-based reinforcement learning setting in which an agent repeatedly learns and controls a dynamical system whose transition dynamics evolve across episodes. We analyze the problem using Gaussian process dynamics models under frequentist variation-budget assumptions. Our analysis shows that persistent non-stationarity requires explicitly limiting the influence of outdated data to maintain calibrated uncertainty and meaningful dynamic regret guarantees. Motivated by these insights, we propose a practical optimistic model-based reinforcement learning algorithm with adaptive data buffer mechanisms and demonstrate improved performance on continuous control benchmarks with non-stationary dynamics.
【3】Systematic Analyses of Reinforcement Learning Controllers in Signalized Urban Corridors
标题:信号城市走廊中强化学习控制器的系统分析
链接:https://arxiv.org/abs/2604.02025
作者:Xiaofei Song,Kerstin Eder,Jonathan Lawry,R. Eddie Wilson
摘要:在这项工作中,我们将系统化的容量区域视角扩展到多路口交通网络,并专注于城市走廊网络这一特殊情况。特别是,我们训练并评估了集中式、完全分散式和参数共享分散式RL控制器,并将它们的容量区域和ATT与经典的MaxPressure基线控制器进行比较。此外,我们展示了参数共享控制器如何推广部署到比其原始训练网络更大的网络上。在这种设置下,一些初步结果表明,即使路口之间没有正式协调,交通也可能自组织成"绿波"。
摘要:In this work, we extend our systematic capacity region perspective to multi-junction traffic networks, focussing on the special case of an urban corridor network. In particular, we train and evaluate centralized, fully decentralized, and parameter-sharing decentralized RL controllers, and compare their capacity regions and ATTs together with a classical baseline MaxPressure controller. Further, we show how the parameter-sharing controller may be generalised to be deployed on a larger network than it was originally trained on. In this setting, we show some initial findings that suggest that even though the junctions are not formally coordinated, traffic may self organise into `green waves'.
【4】The Rank and Gradient Lost in Non-stationarity: Sample Weight Decay for Mitigating Plasticity Loss in Reinforcement Learning
标题:非平稳性中的等级和梯度损失:缓解强化学习中可塑性损失的样本权重衰减
链接:https://arxiv.org/abs/2604.01913
作者:Zihao Wu,Hongyao Tang,Yi Ma,Jiashun Liu,Yan Zheng,Jianye Hao
备注:ICLR
摘要:深度强化学习(RL)由于其非平稳性的本质,严重丧失了可塑性,这削弱了适应新数据和持续学习的能力。遗憾的是,我们对可塑性损失如何产生、消散和消除的理解仍然局限于经验发现,理论层面的探索不足。为了解决这一差距,我们从网络优化的理论角度研究可塑性损失问题。通过形式化刻画在线强化学习过程中的两个罪魁祸首因素:数据分布的非平稳性和自举引起的目标的非平稳性,我们的理论将可塑性的损失归因于两种机制:神经正切核(NTK)Gram矩阵的秩坍缩和梯度幅度的$Θ(\frac{1}{k})$衰减。第一种机制从理论角度呼应了先前的经验发现,并揭示了现有方法(例如网络重置、神经元回收和噪声注入)的作用。在此背景下,我们主要关注第二种机制,旨在通过解决梯度衰减问题来减轻可塑性损失,这与现有方法正交。我们提出了样本权重衰减(Sample Weight Decay),一种恢复梯度幅度的轻量级方法,作为基于经验回放的深度RL方法可塑性损失的一般补救措施。在实验中,我们基于SimBa架构,在MuJoCo、ALE和DeepMind Control Suite任务中评估了样本权重衰减在TD3、Double DQN和SAC上的功效。结果表明,样本权重衰减有效缓解了可塑性损失,并在深度RL算法、UTD、网络架构和环境的各种配置中持续提升学习性能,在具有挑战性的DMC Humanoid任务上达到SOTA性能。
摘要:Deep reinforcement learning (RL) suffers from plasticity loss severely due to the nature of non-stationarity, which impairs the ability to adapt to new data and learn continually. Unfortunately, our understanding of how plasticity loss arises, dissipates, and can be dissolved remains limited to empirical findings, leaving the theoretical end underexplored. To address this gap, we study the plasticity loss problem from the theoretical perspective of network optimization. By formally characterizing the two culprit factors in the online RL process: the non-stationarity of data distributions and the non-stationarity of targets induced by bootstrapping, our theory attributes the loss of plasticity to two mechanisms: the rank collapse of the Neural Tangent Kernel (NTK) Gram matrix and the $Θ(\frac{1}{k})$ decay of gradient magnitude. The first mechanism echoes prior empirical findings from the theoretical perspective and sheds light on the effects of existing methods, e.g., network reset, neuron recycle, and noise injection. Against this backdrop, we focus primarily on the second mechanism and aim to alleviate plasticity loss by addressing the gradient attenuation issue, which is orthogonal to existing methods. We propose Sample Weight Decay -- a lightweight method to restore gradient magnitude, as a general remedy to plasticity loss for deep RL methods based on experience replay. In experiments, we evaluate the efficacy of Sample Weight Decay upon TD3, Double DQN and SAC with the SimBa architecture in MuJoCo, ALE and DeepMind Control Suite tasks. The results demonstrate that Sample Weight Decay effectively alleviates plasticity loss and consistently improves learning performance across various configurations of deep RL algorithms, UTD, network architectures, and environments, achieving SOTA performance on challenging DMC Humanoid tasks.
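摘要未给出Sample Weight Decay的具体权重形式;下面仅是按"对回放样本重加权损失以恢复梯度幅度"这一思想的假设性示意(指数形式的年龄权重与归一化方式均为我们的示例设定,并非论文方法本身)。

```python
import torch

def sample_weights(ages, decay=0.01):
    """假设性权重:越新的样本权重越大;归一化使权重均值为 1。"""
    w = torch.exp(-decay * ages.float())
    return w * (len(w) / w.sum())

def weighted_td_loss(q_pred, q_target, ages):
    """按样本年龄加权的 TD 误差,示意如何嵌入基于经验回放的深度 RL 损失。"""
    per_sample = (q_pred - q_target) ** 2
    return (sample_weights(ages) * per_sample).mean()

q_pred = torch.randn(32, requires_grad=True)
q_target = torch.randn(32)
ages = torch.randint(0, 1000, (32,))   # 样本进入回放池以来经过的步数
loss = weighted_td_loss(q_pred, q_target, ages)
loss.backward()
print(loss.item())
```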
【5】Physics Informed Reinforcement Learning with Gibbs Priors for Topology Control in Power Grids
标题:具有Gibbs先验的物理信息强化学习用于电网中的布局控制
链接:https://arxiv.org/abs/2604.01830
作者:Pantelis Dogoulis,Maxime Cordy
摘要:电网运行的拓扑控制是一个具有挑战性的序贯决策问题,因为动作空间随电网规模组合式增长,而通过仿真进行动作评估的计算代价高昂。我们提出了一个物理信息强化学习框架,该框架将半马尔可夫控制与在动作空间上编码系统物理的Gibbs先验相结合。仅当电网进入危险状态时才做出决策,同时由图神经网络替代模型预测可行拓扑动作的动作后过载风险。这些预测被用于构建物理信息Gibbs先验:既选出一个小规模的状态相关候选集,又在动作选择前对策略logits进行重加权。通过这种方式,我们的方法在保留所学策略灵活性的同时,降低了探索难度和在线仿真成本。我们在三个难度递增的现实基准环境中评估了该方法。在所有设置中,所提出的方法在控制质量与计算效率之间取得了良好平衡:在第一个基准上,它达到与oracle相当的性能,同时快约6倍;在第二个基准上,它以约200倍更低的决策时间达到了oracle奖励的94.6%;在最具挑战性的基准上,相对PPO基线,奖励提升高达255%、存活步数提升284%,同时仍比一个强大的专用工程基线快约2.5倍。这些结果表明,我们的方法为电网拓扑控制提供了一种有效机制。
摘要:Topology control for power grid operation is a challenging sequential decision making problem because the action space grows combinatorially with the size of the grid and action evaluation through simulation is computationally expensive. We propose a physics-informed Reinforcement Learning framework that combines semi-Markov control with a Gibbs prior, that encodes the system's physics, over the action space. The decision is only taken when the grid enters a hazardous regime, while a graph neural network surrogate predicts the post action overload risk of feasible topology actions. These predictions are used to construct a physics-informed Gibbs prior that both selects a small state-dependent candidate set and reweights policy logits before action selection. In this way, our method reduces exploration difficulty and online simulation cost while preserving the flexibility of a learned policy. We evaluate the approach in three realistic benchmark environments of increasing difficulty. Across all settings, the proposed method achieves a strong balance between control quality and computational efficiency: it matches oracle-level performance while being approximately $6\times$ faster on the first benchmark, reaches $94.6\%$ of oracle reward with roughly $200\times$ lower decision time on the second one, and on the most challenging benchmark improves over a PPO baseline by up to $255\%$ in reward and $284\%$ in survived steps while remaining about $2.5\times$ faster than a strong specialized engineering baseline. These results show that our method provides an effective mechanism for topology control in power grids.
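摘要中"Gibbs先验挑选候选集并重加权策略logits"的示意(非官方实现):先验取 exp(−β·risk),其中risk应由GNN替代模型给出,这里用随机数占位;温度β与候选集大小均为示例超参。

```python
import torch

def gibbs_reweight(logits, risk, beta=5.0, top_k=4):
    """用物理信息 Gibbs 先验选候选集并重加权策略 logits。"""
    log_prior = -beta * risk                    # Gibbs 先验:exp(-beta * risk)
    # 先选出先验最高的小规模状态相关候选集
    cand = torch.topk(log_prior, top_k).indices
    # 候选集外的动作被屏蔽;候选集内 logits 加上对数先验后归一化
    masked = torch.full_like(logits, float("-inf"))
    masked[cand] = logits[cand] + log_prior[cand]
    return torch.softmax(masked, dim=-1)

logits = torch.randn(20)    # 20 个候选拓扑动作的策略 logits
risk = torch.rand(20)       # GNN 预测的动作后过载风险(此处为占位随机数)
probs = gibbs_reweight(logits, risk)
print(probs.argmax().item(), probs.max().item())
```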
【6】DISCO-TAB: A Hierarchical Reinforcement Learning Framework for Privacy-Preserving Synthesis of Complex Clinical Data
标题:DISCO-Tab:一个用于复杂临床数据隐私保护合成的分层强化学习框架
链接:https://arxiv.org/abs/2604.01481
作者:Arshia Ilaty,Hossein Shirazi,Amir Rahmani,Hajar Homayouni
摘要:强大的临床决策支持系统的发展经常受到高保真、隐私保护的生物医学数据稀缺的阻碍。虽然生成式大语言模型(LLM)为合成数据生成提供了一条有前景的途径,但它们往往难以捕捉电子健康记录(EHR)中固有的复杂非线性依赖关系和严重的类别不平衡,导致统计上合理但临床上无效的记录。为了弥合这一差距,我们引入了DISCO-TAB(DIScriminator-guided COntrol for TABular synthesis),这是一种新颖的框架,它将一个微调的LLM与经强化学习优化的多目标判别器系统协同编排。与之前依赖标量反馈的方法不同,DISCO-TAB在四个粒度(令牌、句子、特征和行)上评估合成质量,同时集成自动约束发现和逆频率奖励整形,以自主保留潜在的医学逻辑并解决少数类坍缩。我们在多样化的基准上严格验证了我们的框架,包括高维、小样本的医疗数据集(例如心力衰竭、帕金森病)。我们的结果表明,分层反馈带来了最先进的性能:与GAN和扩散基线相比,下游临床分类器效用提升高达38.2%,同时确保了出色的统计保真度(JSD < 0.01)和对成员推理攻击的强大抵抗力。这项工作为敏感医疗应用生成可信、保留效用的合成表格数据建立了新标准。
摘要:The development of robust clinical decision support systems is frequently impeded by the scarcity of high-fidelity, privacy-preserving biomedical data. While Generative Large Language Models (LLMs) offer a promising avenue for synthetic data generation, they often struggle to capture the complex, non-linear dependencies and severe class imbalances inherent in Electronic Health Records (EHR), leading to statistically plausible but clinically invalid records. To bridge this gap, we introduce DISCO-TAB (DIScriminator-guided COntrol for TABular synthesis), a novel framework that orchestrates a fine-tuned LLM with a multi-objective discriminator system optimized via Reinforcement Learning. Unlike prior methods relying on scalar feedback, DISCO-TAB evaluates synthesis at four granularities, token, sentence, feature, and row, while integrating Automated Constraint Discovery and Inverse-Frequency Reward Shaping to autonomously preserve latent medical logic and resolve minority-class collapse. We rigorously validate our framework across diverse benchmarks, including high-dimensional, small-sample medical datasets (e.g., Heart Failure, Parkinson's). Our results demonstrate that hierarchical feedback yields state-of-the-art performance, achieving up to 38.2% improvement in downstream clinical classifier utility compared to GAN and Diffusion baselines, while ensuring exceptional statistical fidelity (JSD < 0.01) and robust resistance to membership inference attacks. This work establishes a new standard for generating trustworthy, utility-preserving synthetic tabular data for sensitive healthcare applications.
【7】Residuals-based Offline Reinforcement Learning
标题:基于剩余的离线强化学习
链接:https://arxiv.org/abs/2604.01378
作者:Qing Zhu,Xian Yu
摘要:离线强化学习(RL)越来越受到人们的关注,因为它可以从先前收集的数据中学习策略,而无需与真实环境进行交互,这在高风险应用中尤为重要。虽然越来越多的工作已经开发出离线RL算法,但这些方法通常依赖于对数据覆盖率的限制性假设,并受到分布偏移的影响。在本文中,我们提出了一个基于残差的离线RL框架一般状态和动作空间。具体来说,我们定义了一个基于残差的贝尔曼最优算子,明确地将估计误差学习过渡动态到政策优化,利用经验残差。我们证明了这个Bellman算子是一个压缩映射,并确定了其不动点是渐近最优的条件,并具有有限样本保证。我们进一步开发了基于残差的离线深度Q学习(DQN)算法。使用随机CartPole环境,我们证明了我们的残差为基础的离线DQN算法的有效性。
摘要:Offline reinforcement learning (RL) has received increasing attention for learning policies from previously collected data without interaction with the real environment, which is particularly important in high-stakes applications. While a growing body of work has developed offline RL algorithms, these methods often rely on restrictive assumptions about data coverage and suffer from distribution shift. In this paper, we propose a residuals-based offline RL framework for general state and action spaces. Specifically, we define a residuals-based Bellman optimality operator that explicitly incorporates estimation error in learning transition dynamics into policy optimization by leveraging empirical residuals. We show that this Bellman operator is a contraction mapping and identify conditions under which its fixed point is asymptotically optimal and possesses finite-sample guarantees. We further develop a residuals-based offline deep Q-learning (DQN) algorithm. Using a stochastic CartPole environment, we demonstrate the effectiveness of our residuals-based offline DQN algorithm.
【8】Reinforcement Learning for Speculative Trading under Exploratory Framework
标题:探索性框架下的投机交易强化学习
链接:https://arxiv.org/abs/2604.02035
作者:Yun Zhao,Alex S. L. Tse,Harry Zheng
备注:37 pages, 14 figures
摘要:我们在Wang et al. [2020]的探索性强化学习(RL)框架内研究了一个投机交易问题。在一般效用函数和价格过程下,该问题被表述为关于进入和退出时间的序贯最优停时问题。我们首先考虑该问题的一个松弛版本,其中停时由有界、非随机化强度控制驱动的Cox过程的跳跃时间来建模。在探索性表述下,智能体的随机化控制通过跳跃强度上的概率测度来刻画,其目标函数由Shannon微分熵进行正则化。这产生了一组探索性HJB方程,并以封闭形式的Gibbs分布作为最优策略。我们建立了误差估计以及RL目标向原问题值函数的收敛性。最后,我们设计了一个强化学习算法,并在一个配对交易应用中展示了其实现。
摘要:We study a speculative trading problem within the exploratory reinforcement learning (RL) framework of Wang et al. [2020]. The problem is formulated as a sequential optimal stopping problem over entry and exit times under general utility function and price process. We first consider a relaxed version of the problem in which the stopping times are modeled by the jump times of Cox processes driven by bounded, non-randomized intensity controls. Under the exploratory formulation, the agent's randomized control is characterized via the probability measure over the jump intensities, and their objective function is regularized by Shannon's differential entropy. This yields a system of the exploratory HJB equations and Gibbs distributions in closed-form as the optimal policy. Error estimates and convergence of the RL objective to the value function of the original problem are established. Finally, an RL algorithm is designed, and its implementation is showcased in a pairs-trading application.
符号|符号学习(1篇)
【1】Bias Inheritance in Neural-Symbolic Discovery of Constitutive Closures Under Function-Class Mismatch
标题:功能类不匹配下神经符号结构性闭合发现中的偏差继承
链接:https://arxiv.org/abs/2604.01335
作者:Hanbing Liang,Ze Tao,Fujun Liu
摘要:我们研究在控制PDE结构已知的非线性反应扩散系统中本构闭包的数据驱动发现。我们的目标是从时空观测中稳健地恢复扩散和反应定律,同时避免将低残差或短时程预测与物理恢复相混淆这一常见陷阱。我们提出了一个三阶段神经符号框架:(1)使用噪声鲁棒的弱形式驱动目标,在物理约束下学习数值替代模型;(2)将这些替代模型压缩到受限的可解释符号族(例如多项式、有理式和饱和形式);(3)通过在未见初始条件上的显式前向重仿真来验证符号闭包。大量数值实验揭示了两种不同的情形。在库匹配的设置下,弱形式多项式基线表现为正确设定的参考估计量,表明神经替代模型并不一致地优于经典基。相反,在函数类不匹配的情况下,神经替代模型提供了必要的灵活性,并且可以被压缩成紧凑的符号定律,且前向重仿真退化极小。然而,我们发现了一个关键的"偏差继承"机制:符号压缩并不会自动修复本构偏差。在不同的观测设置下,符号闭包的真实误差紧密跟随神经替代模型的误差,产生接近1的偏差继承比。这些发现表明,神经符号建模的主要瓶颈在于最初的数值逆问题,而非随后的符号压缩。我们强调,本构断言必须由前向验证严格支持,而不能仅靠残差最小化。
摘要:We investigate the data-driven discovery of constitutive closures in nonlinear reaction-diffusion systems with known governing PDE structures. Our objective is to robustly recover diffusion and reaction laws from spatiotemporal observations while avoiding the common pitfall where low residuals or short-horizon predictions are conflated with physical recovery. We propose a three-stage neural-symbolic framework: (1) learning numerical surrogates under physical constraints using a noise-robust weak-form-driven objective; (2) compressing these surrogates into restricted interpretable symbolic families (e.g., polynomial, rational, and saturation forms); and (3) validating the symbolic closures through explicit forward re-simulation on unseen initial conditions. Extensive numerical experiments reveal two distinct regimes. Under matched-library settings, weak polynomial baselines behave as correctly specified reference estimators, showing that neural surrogates do not uniformly outperform classical bases. Conversely, under function-class mismatch, neural surrogates provide necessary flexibility and can be compressed into compact symbolic laws with minimal rollout degradation. However, we identify a critical "bias inheritance" mechanism where symbolic compression does not automatically repair constitutive bias. Across various observation regimes, the true error of the symbolic closure closely tracks that of the neural surrogate, yielding a bias inheritance ratio near one. These findings demonstrate that the primary bottleneck in neural-symbolic modeling lies in the initial numerical inverse problem rather than the subsequent symbolic compression. We underscore that constitutive claims must be rigorously supported by forward validation rather than residual minimization alone.
医学相关(2篇)
【1】Learning ECG Image Representations via Dual Physiological-Aware Alignments
标题:通过双重生理感知对齐学习心电图图像表示
链接:https://arxiv.org/abs/2604.01526
作者:Hung Manh Pham,Jialu Tang,Aaqib Saeed,Dong Ma,Bin Zhu,Pan Zhou
摘要:心电图(ECG)是心血管疾病最广泛使用的诊断工具之一,全球范围内的大量ECG数据仅以图像形式存在。然而,大多数现有的自动化ECG分析方法依赖于对原始信号记录的访问,限制了它们在现实世界和资源受限环境中的适用性。在本文中,我们提出了ECG-Scan,一个通过双重生理感知对齐从ECG图像学习临床泛化表示的自监督框架:1)我们的方法使用图像与金标准信号-文本模态之间的多模态对比对齐来优化图像表示学习。2)我们进一步通过软导联(soft-lead)约束整合领域知识,正则化重建过程并提高信号导联间的一致性。跨多个数据集和下游任务的广泛基准测试表明,与现有图像基线相比,我们基于图像的模型实现了卓越的性能,并显著缩小了ECG图像分析与信号分析之间的差距。这些结果突显了自监督图像建模在解锁大规模存量ECG数据、扩大自动心血管诊断可及性方面的潜力。
摘要:Electrocardiograms (ECGs) are among the most widely used diagnostic tools for cardiovascular diseases, and a large amount of ECG data worldwide appears only in image form. However, most existing automated ECG analysis methods rely on access to raw signal recordings, limiting their applicability in real-world and resource-constrained settings. In this paper, we present ECG-Scan, a self-supervised framework for learning clinically generalized representations from ECG images through dual physiological-aware alignments: 1) Our approach optimizes image representation learning using multimodal contrastive alignment between image and gold-standard signal-text modalities. 2) We further integrate domain knowledge via soft-lead constraints, regularizing the reconstruction process and improving signal lead inter-consistency. Extensive benchmarking across multiple datasets and downstream tasks demonstrates that our image-based model achieves superior performance compared to existing image baselines and notably narrows the gap between ECG image and signal analysis. These results highlight the potential of self-supervised image modeling to unlock large-scale legacy ECG data and broaden access to automated cardiovascular diagnostics.
【2】OkanNet: A Lightweight Deep Learning Architecture for Classification of Brain Tumor from MRI Images
标题:OkanNet:用于从MRI图像中分类脑肿瘤的轻量级深度学习架构
链接:https://arxiv.org/abs/2604.01264
作者:Okan Uçar,Murat Kurt
备注:7 pages, 3 figures, 1 table
摘要:医学成像技术,特别是磁共振成像(MRI),被认为是神经系统疾病诊断和治疗计划的金标准。然而,MRI图像的手动分析对于放射科医生来说是一个耗时的过程,并且由于疲劳而容易出现人为错误。在这项研究中,开发了两种不同的深度学习方法,并对它们进行了比较分析,用于脑肿瘤(胶质瘤、脑膜瘤、垂体瘤和无肿瘤)的自动检测和分类。在第一种方法中,从头开始设计了一个名为"OkanNet"的自定义卷积神经网络(CNN)架构,该架构具有低计算成本和快速训练时间。在第二种方法中,采用在ImageNet数据集上预训练的50层ResNet-50 [1]架构进行迁移学习。在Masoud Nickparvar编译的扩展数据集(包含总计7,023张MRI图像)上进行的实验中,基于迁移学习的ResNet-50模型表现出卓越的分类性能,实现了96.49%的准确率和0.963的精确率。相比之下,自定义OkanNet架构达到了88.10%的准确率;然而,它在训练时间上比ResNet-50快约3.2倍(311秒),证明了它是计算能力有限的移动和嵌入式系统的有力替代方案。本研究通过实验数据展示了医学图像分析中模型深度与计算效率之间的权衡。
摘要:Medical imaging techniques, especially Magnetic Resonance Imaging (MRI), are accepted as the gold standard in the diagnosis and treatment planning of neurological diseases. However, the manual analysis of MRI images is a time-consuming process for radiologists and is prone to human error due to fatigue. In this study, two different Deep Learning approaches were developed and analyzed comparatively for the automatic detection and classification of brain tumors (Glioma, Meningioma, Pituitary, and No Tumor). In the first approach, a custom Convolutional Neural Network (CNN) architecture named "OkanNet", which has a low computational cost and fast training time, was designed from scratch. In the second approach, the Transfer Learning method was applied using the 50-layer ResNet-50 [1] architecture, pre-trained on the ImageNet dataset. In experiments conducted on an extended dataset compiled by Masoud Nickparvar containing a total of $7,023$ MRI images, the Transfer Learning-based ResNet-50 model exhibited superior classification performance, achieving $96.49\%$ Accuracy and $0.963$ Precision. In contrast, the custom OkanNet architecture reached an accuracy rate of $88.10\%$; however, it proved to be a strong alternative for mobile and embedded systems with limited computational power by yielding results approximately $3.2$ times faster ($311$ seconds) than ResNet-50 in terms of training time. This study demonstrates the trade-off between model depth and computational efficiency in medical image analysis through experimental data.
蒸馏|知识提取(1篇)
【1】Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
标题:通过样本路由统一组相关和自蒸馏策略优化
链接:https://arxiv.org/abs/2604.02288
作者:Gengsheng Li,Tianyu Yang,Junfeng Fang,Mingyang Song,Mao Zheng,Haiyun Guo,Dan Zhang,Jinqiao Wang,Tat-Seng Chua
摘要:带有可验证奖励的强化学习(RLVR)已成为大型语言模型后训练的标准范式。虽然组相对策略优化(GRPO)被广泛采用,但其粗粒度的信用分配会对失败的rollout施加一致的惩罚,缺乏高效纠正特定偏差所需的令牌级关注。自蒸馏策略优化(SDPO)通过提供更密集、更有针对性的logit级监督来解决这一问题,从而带来快速的早期改进,但在长时间训练中它经常崩溃。我们将这种后期不稳定性追溯到两个内在缺陷:对已经正确的样本进行自蒸馏会引入优化歧义,且自教师信号的可靠性会逐步退化。为了解决这些问题,我们提出了样本路由策略优化(SRPO),一个统一的同策略框架,将正确样本路由到GRPO的奖励对齐强化,将失败样本路由到SDPO的定向logit级校正。SRPO还结合了熵感知的动态加权机制,以抑制高熵、不可靠的蒸馏目标,同时强调有信心的目标。在五个基准和两个模型规模上的评估表明,SRPO兼具SDPO的快速早期改进和GRPO的长期稳定性。它始终超过两个基线的峰值性能,将Qwen3-8B上五个基准的平均值较GRPO提高3.4%、较SDPO提高6.3%,同时产生适度的响应长度,并将每步计算成本降低多达17.2%。
摘要:Reinforcement learning with verifiable rewards (RLVR) has become a standard paradigm for post-training large language models. While Group Relative Policy Optimization (GRPO) is widely adopted, its coarse credit assignment uniformly penalizes failed rollouts, lacking the token-level focus needed to efficiently address specific deviations. Self-Distillation Policy Optimization (SDPO) addresses this by providing denser, more targeted logit-level supervision that facilitates rapid early improvement, yet it frequently collapses during prolonged training. We trace this late-stage instability to two intrinsic flaws: self-distillation on already-correct samples introduces optimization ambiguity, and the self-teacher's signal reliability progressively degrades. To resolve these issues, we propose Sample-Routed Policy Optimization (SRPO), a unified on-policy framework that routes correct samples to GRPO's reward-aligned reinforcement and failed samples to SDPO's targeted logit-level correction. SRPO further incorporates an entropy-aware dynamic weighting mechanism to suppress high-entropy, unreliable distillation targets while emphasizing confident ones. Evaluated across five benchmarks and two model scales, SRPO achieves both the rapid early improvement of SDPO and the long-horizon stability of GRPO. It consistently surpasses the peak performance of both baselines, raising the five-benchmark average on Qwen3-8B by 3.4% over GRPO and 6.3% over SDPO, while simultaneously yielding moderate response lengths and lowering per-step compute cost by up to 17.2%.
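SRPO路由逻辑的一个示意(非官方实现;GRPO/SDPO两支损失的简化形式与熵感知权重 w=exp(−H) 均为示例设定):答对的rollout走奖励对齐项,答错的走教师logit蒸馏,并按教师熵压低不可靠目标。

```python
import torch
import torch.nn.functional as F

def srpo_loss(logp_new, logp_old, advantages, correct,
              student_logits, teacher_logits, tau=1.0):
    """correct: (B,) 布尔掩码,决定每个样本走哪条损失支路。"""
    ratio = torch.exp(logp_new - logp_old)
    # 正确样本:奖励对齐的策略梯度项(GRPO 支路的简化形式)
    grpo = -(ratio * advantages)[correct].mean() if correct.any() else logp_new.sum() * 0
    failed = ~correct
    if failed.any():
        t_probs = F.softmax(teacher_logits[failed] / tau, dim=-1)
        ent = -(t_probs * t_probs.clamp_min(1e-8).log()).sum(-1)
        w = torch.exp(-ent)                  # 熵感知权重:高熵(不可靠)目标权重小
        kl = F.kl_div(F.log_softmax(student_logits[failed], dim=-1),
                      t_probs, reduction="none").sum(-1)
        sdpo = (w * kl).mean()               # 失败样本:定向 logit 级蒸馏校正
    else:
        sdpo = student_logits.sum() * 0
    return grpo + sdpo

B, V = 8, 100
correct = torch.rand(B) > 0.5
loss = srpo_loss(torch.randn(B), torch.randn(B), torch.randn(B), correct,
                 torch.randn(B, V), torch.randn(B, V))
print(loss.item())
```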
推荐(1篇)
【1】Grounded Token Initialization for New Vocabulary in LMs for Generative Recommendation
标题:生成性推荐的LM中新词汇的固定代币分配
链接:https://arxiv.org/abs/2604.02324
作者:Daiwei Chen,Zhoutong Fu,Chengming Jiang,Haichao Zhang,Ran Zhou,Tan Wang,Chunnan Yao,Guoyao Li,Rui Cai,Yihan Cao,Ruijie Jiang,Fedor Borisyuk,Jianqiang Shen,Jingwei Wu,Ramya Korlakai Vinayak
摘要:语言模型(LM)越来越多地通过新的可学习词汇令牌来扩展以用于特定领域任务,例如生成式推荐中的Semantic-ID令牌。标准做法是将这些新令牌初始化为现有词表嵌入的均值,然后依靠监督微调来学习它们的表示。我们对这种策略进行了系统分析:通过谱分析与几何诊断,我们表明均值初始化会将所有新令牌坍缩到一个退化子空间中,抹去令牌间的区分性,而后续微调难以完全恢复。这些发现表明,当使用新词表扩展LM时,"令牌初始化"是一个关键瓶颈。受此诊断的启发,我们提出了扎根令牌初始化假设:在微调之前,在预训练嵌入空间中以语言学方式将新令牌落位,能更好地让模型将其通用知识用于新令牌领域。我们将这一假设操作化为GTI(Grounded Token Initialization),这是一个轻量级的落位阶段,在微调之前,仅使用成对的语言监督将新令牌映射到预训练嵌入空间中彼此不同且语义上有意义的位置。尽管简单,GTI在多个生成式推荐基准(包括工业规模和公开数据集)的大多数评估设置中优于均值初始化和现有的任务自适应方法。进一步的分析表明,落位后的嵌入产生了更丰富且在微调中得以保持的令牌间结构,印证了初始化质量是词表扩展关键瓶颈这一假设。
摘要:Language models (LMs) are increasingly extended with new learnable vocabulary tokens for domain-specific tasks, such as Semantic-ID tokens in generative recommendation. The standard practice initializes these new tokens as the mean of existing vocabulary embeddings, then relies on supervised fine-tuning to learn their representations. We present a systematic analysis of this strategy: through spectral and geometric diagnostics, we show that mean initialization collapses all new tokens into a degenerate subspace, erasing inter-token distinctions that subsequent fine-tuning struggles to fully recover. These findings suggest that token initialization is a key bottleneck when extending LMs with new vocabularies. Motivated by this diagnosis, we propose the Grounded Token Initialization Hypothesis: linguistically grounding novel tokens in the pretrained embedding space before fine-tuning better enables the model to leverage its general-purpose knowledge for novel-token domains. We operationalize this hypothesis as GTI (Grounded Token Initialization), a lightweight grounding stage that, prior to fine-tuning, maps new tokens to distinct, semantically meaningful locations in the pretrained embedding space using only paired linguistic supervision. Despite its simplicity, GTI outperforms both mean initialization and existing auxiliary-task adaptation methods in the majority of evaluation settings across multiple generative recommendation benchmarks, including industry-scale and public datasets. Further analyses show that grounded embeddings produce richer inter-token structure that persists through fine-tuning, corroborating the hypothesis that initialization quality is a key bottleneck in vocabulary extension.
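"以语言监督为新令牌落位"思想的极简示意(非GTI官方实现;用描述文本词元嵌入的均值作初始化仅为示例做法,GTI实际通过成对语言监督学习映射;描述文本与词元id均为假设):

```python
import torch

def grounded_init(embedding, description_token_ids):
    """embedding: (V, d) 预训练嵌入表;返回新令牌的落位初始向量。"""
    desc_vecs = embedding[description_token_ids]   # 描述文本中各词元的嵌入
    return desc_vecs.mean(dim=0)                   # 不同描述给出彼此不同的初始位置

V, d = 1000, 32
emb = torch.randn(V, d)
# 假设某 Semantic-ID 令牌对应描述 "running shoes for men",其词元 id 为 [17, 256, 42, 9]
new_vec = grounded_init(emb, torch.tensor([17, 256, 42, 9]))
emb = torch.cat([emb, new_vec.unsqueeze(0)], dim=0)   # 扩展词表
print(emb.shape)
```

与全词表均值初始化相比,这种做法让每个新令牌从一开始就占据语义上有区分度的位置,对应摘要中"避免坍缩到退化子空间"的动机。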
聚类(1篇)
【1】A Novel Theoretical Analysis for Clustering Heteroscedastic Gaussian Data without Knowledge of the Number of Clusters
标题:在不了解集群数量的情况下对异方差高斯数据进行集群的新理论分析
链接:https://arxiv.org/abs/2604.01943
作者:Dominique Pastor,Elsa Dupraz,Ismail Hbilou,Guillaume Ansel
备注:76 pages, submitted to JMLR
摘要:本文讨论异方差测量向量的聚类问题,这些向量可以具有不同的协方差矩阵。基于"给定簇内的测量向量围绕簇质心服从高斯分布,且协方差矩阵可能彼此不同且未知"的假设,我们引入了一种新的代价函数来估计质心。该代价函数梯度的零点恰为某个函数的不动点。因此,该方法推广了用于推导现有Mean-Shift算法的方法论。而与Mean-Shift相比,一个主要且新颖的理论结果是:本文证明,只要每簇的测量数量和质心之间的距离足够大,所识别函数的不动点就往往恰为各簇质心。作为第二个贡献,本文引入了用于聚类的Wald核。该核定义为检验高斯均值的Wald假设检验的p值。因此,Wald核度量了测量向量属于给定簇的合理性,并且它随测量向量维度的伸缩性优于常用的高斯核。最后,所提出的理论框架使我们能够导出一种称为CENTRE-X的新聚类算法,它通过估计所识别函数的不动点来工作。与Mean-Shift一样,CENTRE-X不需要事先知道簇的数量。它依靠Wald假设检验显著减少相对于Mean-Shift算法需要计算的不动点数量,从而在复杂度上获得明显收益。在合成和真实数据集上的仿真结果表明,即使协方差矩阵并非完全已知,CENTRE-X也具有与标准聚类算法K-means和Mean-Shift相当或更好的性能。
摘要:This paper addresses the problem of clustering measurement vectors that are heteroscedastic in that they can have different covariance matrices. From the assumption that the measurement vectors within a given cluster are Gaussian distributed with possibly different and unknown covariance matrices around the cluster centroid, we introduce a novel cost function to estimate the centroids. The zeros of the gradient of this cost function turn out to be the fixed-points of a certain function. As such, the approach generalizes the methodology employed to derive the existing Mean-Shift algorithm. But as a main and novel theoretical result compared to Mean-Shift, this paper shows that the sole fixed-points of the identified function tend to be the cluster centroids if both the number of measurements per cluster and the distances between centroids are large enough. As a second contribution, this paper introduces the Wald kernel for clustering. This kernel is defined as the p-value of the Wald hypothesis test for testing the mean of a Gaussian. As such, the Wald kernel measures the plausibility that a measurement vector belongs to a given cluster and it scales better with the dimension of the measurement vectors than the usual Gaussian kernel. Finally, the proposed theoretical framework allows us to derive a new clustering algorithm called CENTRE-X that works by estimating the fixed-points of the identified function. Like Mean-Shift, CENTRE-X requires no prior knowledge of the number of clusters. It relies on a Wald hypothesis test to significantly reduce the number of fixed points to calculate compared to the Mean-Shift algorithm, thus resulting in a clear gain in complexity. Simulation results on synthetic and real data sets show that CENTRE-X has comparable or better performance than standard clustering algorithms K-means and Mean-Shift, even when the covariance matrices are not perfectly known.
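下面给出Wald核的一个最小示意实现(依据摘要中"检验高斯均值的Wald检验的p值"这一定义;函数名与接口为示意,并非论文官方代码):Wald统计量 W = (x-c)^T Σ^{-1} (x-c) 在原假设下服从自由度为 d 的卡方分布,p值即其生存函数取值。

import numpy as np
from scipy.stats import chi2

def wald_kernel(x, centroid, cov):
    # 返回测量向量 x 属于以 centroid 为质心、协方差为 cov 的簇的合理性(p值)
    diff = x - centroid
    w = diff @ np.linalg.solve(cov, diff)   # Wald统计量
    return chi2.sf(w, df=x.shape[0])        # 卡方生存函数给出p值

# 用法示意
p = wald_kernel(np.array([1.0, 2.0]), centroid=np.zeros(2), cov=np.eye(2))

p值接近1表示该向量与簇质心高度相容,接近0则表示不太可能属于该簇。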
超分辨率|去噪|去模糊|去雾(1篇)
【1】Smoothing the Landscape: Causal Structure Learning via Diffusion Denoising Objectives
标题:平滑景观:通过扩散去噪目标进行因果结构学习
链接:https://arxiv.org/abs/2604.02250
作者:Hao Zhu,Di Zhou,Donna Slonim
备注:To appear in the Proceedings of the 5th Conference on Causal Learning and Reasoning (CLeaR 2026)
摘要:了解观测数据中的因果依赖关系对为决策提供信息至关重要。这些关系通常被建模为贝叶斯网络(BN)和有向无环图(DAG)。现有方法(如NOTEARS和DAG-GNN)在高维数据上经常面临可扩展性和稳定性问题,尤其是在存在特征-样本不平衡时。在此,我们证明扩散模型的去噪得分匹配目标可以平滑梯度,以实现更快、更稳定的收敛。我们还提出了一种自适应的k跳非循环约束,其运行时间优于需要矩阵求逆的现有方案。我们将这个框架命名为去噪扩散因果发现(DDCD)。与生成式扩散模型不同,DDCD利用反向去噪过程来推断参数化的因果结构,而非生成数据。我们在合成基准数据上展示了DDCD的有竞争力的性能,并通过对两个现实世界示例的定性分析表明我们的方法具有实际用途。代码见:https://github.com/haozhu233/ddcd.
摘要:Understanding causal dependencies in observational data is critical for informing decision-making. These relationships are often modeled as Bayesian Networks (BNs) and Directed Acyclic Graphs (DAGs). Existing methods, such as NOTEARS and DAG-GNN, often face issues with scalability and stability in high-dimensional data, especially when there is a feature-sample imbalance. Here, we show that the denoising score matching objective of diffusion models could smooth the gradients for faster, more stable convergence. We also propose an adaptive k-hop acyclicity constraint that improves runtime over existing solutions that require matrix inversion. We name this framework Denoising Diffusion Causal Discovery (DDCD). Unlike generative diffusion models, DDCD utilizes the reverse denoising process to infer a parameterized causal structure rather than to generate data. We demonstrate the competitive performance of DDCDs on synthetic benchmarking data. We also show that our methods are practically useful by conducting qualitative analyses on two real-world examples. Code is available at this url: https://github.com/haozhu233/ddcd.
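下面是k跳非循环约束的一个最小示意(基于摘要的假设性形式,未必与论文的精确定义一致):只惩罚长度不超过 k 的加权环,h_k(A) = Σ_{i=1..k} tr((A∘A)^i),从而避开 tr[(I - A∘A)^{-1}] 这类需要矩阵求逆的约束。

import numpy as np

def k_hop_acyclicity(A, k):
    W = A * A                      # 元素平方保证非负权重
    P = np.eye(A.shape[0])
    h = 0.0
    for _ in range(k):
        P = P @ W                  # P = W^i
        h += np.trace(P)           # tr(W^i) 统计长度为 i 的加权环
    return h                       # 不存在长度 ≤ k 的环时为 0

若对足够大的 k 有 h_k(A)=0,则 A 对应一个DAG;实际训练中可将其作为可微惩罚项加入损失。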
自动驾驶|车辆|车道检测等(2篇)
【1】SECURE: Stable Early Collision Understanding via Robust Embeddings in Autonomous Driving
标题:SECURE:通过自动驾驶中的鲁棒嵌入实现稳定的早期碰撞理解
链接:https://arxiv.org/abs/2604.01337
作者:Wenjing Wang,Wenxuan Wang,Songning Lai
备注:13 pages, 2 figures
摘要:虽然深度学习大大推进了事故预测,但这些安全关键系统对现实世界扰动的鲁棒性仍是一个重大挑战。我们发现,像CRASH这样的最先进模型尽管性能很高,但在面对轻微输入扰动时,其预测和潜在表示表现出显著的不稳定性,带来严重的可靠性风险。为了解决这一问题,我们引入了SECURE(Stable Early Collision Understanding via Robust Embeddings),一个正式定义并强制执行模型鲁棒性的框架。SECURE建立在四个关键属性之上:预测空间和潜在特征空间中的一致性与稳定性。我们提出了一种有原则的训练方法,使用多目标损失对基线模型进行微调,该损失最小化与参考模型的偏差,并惩罚对对抗性扰动的敏感性。在DAD和CCD数据集上的实验表明,我们的方法不仅显著增强了对各种扰动的鲁棒性,还提高了在干净数据上的性能,取得了新的最先进结果。
摘要:While deep learning has significantly advanced accident anticipation, the robustness of these safety-critical systems against real-world perturbations remains a major challenge. We reveal that state-of-the-art models like CRASH, despite their high performance, exhibit significant instability in predictions and latent representations when faced with minor input perturbations, posing serious reliability risks. To address this, we introduce SECURE - Stable Early Collision Understanding Robust Embeddings, a framework that formally defines and enforces model robustness. SECURE is founded on four key attributes: consistency and stability in both prediction space and latent feature space. We propose a principled training methodology that fine-tunes a baseline model using a multi-objective loss, which minimizes divergence from a reference model and penalizes sensitivity to adversarial perturbations. Experiments on DAD and CCD datasets demonstrate that our approach not only significantly enhances robustness against various perturbations but also improves performance on clean data, achieving new state-of-the-art results.
【2】Macroscopic transport patterns of UAV traffic in 3D anisotropic wind fields: A constraint-preserving hybrid PINN-FVM approach
标题:3D各向异性风场中无人机交通的宏观传输模式:约束保持混合PINN-FVM方法
链接:https://arxiv.org/abs/2604.01327
作者:Hanbing Liang,Fujun Liu
摘要:三维空域的宏观无人机交通组织面临着静态风场和复杂障碍物带来的重大挑战。一个关键困难在于:既要捕获风所引起的强各向异性,又要严格保持输运一致性和边界语义,而这些在标准的物理信息学习方法中往往难以兼顾。为了解决这个问题,我们提出了一个保持约束的混合求解器,它将求解各向异性Eikonal值问题的物理信息神经网络与求解稳态密度输运的保守有限体积法相结合。这些组件通过带欠松弛的外层Picard迭代耦合,其中目标条件被硬编码,并在输运步骤中严格强制执行保守的无通量边界。我们在可复现的归巢和点对点场景上评估该框架,有效地捕捉了价值切片、诱导运动模式以及频带和瓶颈等稳态密度结构。最后,我们的观点强调了由透明经验诊断支撑的可复现计算框架的价值,以实现对宏观交通现象的可追溯评估。
摘要:Macroscopic unmanned aerial vehicle (UAV) traffic organization in three-dimensional airspace faces significant challenges from static wind fields and complex obstacles. A critical difficulty lies in simultaneously capturing the strong anisotropy induced by wind while strictly preserving transport consistency and boundary semantics, which are often compromised in standard physics-informed learning approaches. To resolve this, we propose a constraint-preserving hybrid solver that integrates a physics-informed neural network for the anisotropic Eikonal value problem with a conservative finite-volume method for steady density transport. These components are coupled through an outer Picard iteration with under-relaxation, where the target condition is hard-encoded and strictly conservative no-flux boundaries are enforced during the transport step. We evaluate the framework on reproducible homing and point-to-point scenarios, effectively capturing value slices, induced-motion patterns, and steady density structures such as bands and bottlenecks. Ultimately, our perspective emphasizes the value of a reproducible computational framework supported by transparent empirical diagnostics to enable the traceable assessment of macroscopic traffic phenomena.
点云|SLAM|雷达|激光|深度RGBD相关(1篇)
【1】On the Role of Depth in the Expressivity of RNNs
标题:深度在RNN表达性中的作用
链接:https://arxiv.org/abs/2604.02201
作者:Maude Lizaire,Michael Rizvi-Martel,Éric Dupuis,Guillaume Rabusseau
摘要:深度在前馈神经网络中的好处是众所周知的:将多层线性变换与非线性激活组合起来可以实现复杂的计算。虽然在循环神经网络(RNN)中预计会有类似的效果,但深度如何与循环相互作用以塑造表达能力仍不清楚。在这里,我们形式化地证明,就参数数量而言,深度能高效地增加RNN的记忆容量,从而通过实现更复杂的输入变换和改善对过去信息的保留来增强表达能力。我们将分析扩展到2RNN,这是在输入和隐藏状态之间具有乘法交互的RNN推广。与在没有非线性激活时保持线性的RNN不同,2RNN执行的是多项式变换,其最高次数随深度增长。我们进一步证明,乘法交互在一般情况下不能被逐层非线性所替代。最后,我们在合成和现实世界的任务中通过实验验证了这些见解。
摘要:The benefits of depth in feedforward neural networks are well known: composing multiple layers of linear transformations with nonlinear activations enables complex computations. While similar effects are expected in recurrent neural networks (RNNs), it remains unclear how depth interacts with recurrence to shape expressive power. Here, we formally show that depth increases RNNs' memory capacity efficiently with respect to the number of parameters, thus enhancing expressivity both by enabling more complex input transformations and improving the retention of past information. We broaden our analysis to 2RNNs, a generalization of RNNs with multiplicative interactions between inputs and hidden states. Unlike RNNs, which remain linear without nonlinear activations, 2RNNs perform polynomial transformations whose maximal degree grows with depth. We further show that multiplicative interactions cannot, in general, be replaced by layerwise nonlinearities. Finally, we validate these insights empirically on synthetic and real-world tasks.
联邦学习|隐私保护|加密(1篇)
【1】BVFLMSP : Bayesian Vertical Federated Learning for Multimodal Survival with Privacy
标题:BVFLMSP:具有隐私保护的多模态生存分析贝叶斯垂直联邦学习
链接:https://arxiv.org/abs/2604.02248
作者:Abhilash Kar,Basisth Saha,Tanmay Sen,Biswabrata Pradhan
摘要:多模态事件时间预测通常需要整合分布在多方的敏感数据,由于隐私限制,集中式模型训练并不可行。与此同时,大多数现有的多模态生存模型只给出单一的确定性预测,而不表明模型对其估计的置信程度,这可能限制其在现实世界决策中的可靠性。为了解决这些挑战,我们提出了BVFLMSP,一个基于分裂神经网络架构、面向多模态事件时间分析的贝叶斯垂直联邦学习(VFL)框架。在BVFLMSP中,每个客户端使用贝叶斯神经网络独立地对特定数据模态建模,而中央服务器聚合中间表示来执行生存风险预测。为了增强隐私,我们集成了差分隐私机制,在传输之前扰动客户端表示,在联邦训练过程中针对信息泄露提供形式化的隐私保证。我们首先将我们的贝叶斯多模态生存模型与广泛使用的单模态生存基线和集中式多模态基线MultiSurv进行比较。在各种多模态设置中,所提出的方法在判别性能上表现出一致的改进,与MultiSurv相比C指数最高可提升0.02。然后,我们在不同隐私预算下、在不同的模态组合中比较了联邦学习与集中式学习,突出了预测性能与隐私之间的权衡。实验结果表明,BVFLMSP有效地整合了多模态数据,相对现有基线提升了生存预测,并在严格的隐私约束下保持稳健,同时提供不确定性估计。
摘要:Multimodal time-to-event prediction often requires integrating sensitive data distributed across multiple parties, making centralized model training impractical due to privacy constraints. At the same time, most existing multimodal survival models produce single deterministic predictions without indicating how confident the model is in its estimates, which can limit their reliability in real-world decision making. To address these challenges, we propose BVFLMSP, a Bayesian Vertical Federated Learning (VFL) framework for multimodal time-to-event analysis based on a Split Neural Network architecture. In BVFLMSP, each client independently models a specific data modality using a Bayesian neural network, while a central server aggregates intermediate representations to perform survival risk prediction. To enhance privacy, we integrate differential privacy mechanisms by perturbing client side representations before transmission, providing formal privacy guarantees against information leakage during federated training. We first evaluate our Bayesian multimodal survival model against widely used single modality survival baselines and the centralized multimodal baseline MultiSurv. Across multimodal settings, the proposed method shows consistent improvements in discrimination performance, with up to 0.02 higher C-index compared to MultiSurv. We then compare federated and centralized learning under varying privacy budgets across different modality combinations, highlighting the tradeoff between predictive performance and privacy. Experimental results show that BVFLMSP effectively includes multimodal data, improves survival prediction over existing baselines, and remains robust under strict privacy constraints while providing uncertainty estimates.
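下面给出摘要中"传输前扰动客户端表示"这一差分隐私机制的最小示意(裁剪阈值、噪声尺度与函数名均为示意性假设,并非论文实现):

import numpy as np

def privatize(h, clip_norm=1.0, sigma=0.5, rng=None):
    # h: (batch, dim) 的客户端中间表示
    if rng is None:
        rng = np.random.default_rng(0)
    norms = np.linalg.norm(h, axis=1, keepdims=True)
    h_clip = h * np.minimum(1.0, clip_norm / (norms + 1e-12))     # L2裁剪限制敏感度
    return h_clip + rng.normal(0.0, sigma * clip_norm, h.shape)   # 高斯机制加噪

裁剪先把每条表示的L2范数限制在 clip_norm 以内以约束敏感度,再按高斯机制加噪;sigma 的取值由目标 (ε, δ) 隐私预算反推。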
推理|分析|理解|解释(9篇)
【1】Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning
标题:批量上下文强化:高效推理的任务缩放定律
链接:https://arxiv.org/abs/2604.02322
作者:Bangji Yang,Hongbo Ma,Jiajun Fan,Ge Liu
备注:43 pages, 5 figures, 24 tables
摘要:采用思维链推理的大型语言模型性能强劲,但令牌消耗过多,从而推高了推理成本。现有的效率方法,如显式长度惩罚、难度估计器或多阶段课程,要么降低推理质量,要么需要复杂的训练管道。我们引入了批量上下文强化(BCR),一种极简的单阶段训练范式,通过简单的结构修改来解锁高效推理:训练模型在共享上下文窗口内同时解决N个问题,纯粹按每个实例的准确性给予奖励。这一表述创建了一个隐式的令牌预算,并带来几个关键发现:(1)我们确定了一个新的任务缩放定律:随着推理期间并发问题数N的增加,每个问题的令牌使用量单调下降,而准确率的下降远比基线平缓,这确立了N作为一个可控的吞吐量维度。(2)BCR通过在标准的单问题推理中展示"免费午餐"现象,挑战了传统的准确性-效率权衡。在1.5B和4B两个模型系列中,BCR将令牌使用量减少了15.8%至62.6%,同时在五个主要数学基准中保持或提高了准确率。(3)定性分析揭示了涌现的自我调节效率:模型在没有显式长度监督的情况下自主消除冗余的元认知循环。(4)至关重要的是,我们通过实验表明,隐式预算约束成功规避了显式长度惩罚所固有的对抗性梯度与灾难性优化崩溃,提供了一种高度稳定、基于约束的长度控制替代方案。这些结果证明了BCR的实用性,表明简单的结构性激励可以解锁LLM中潜在的高密度推理。
摘要:Large Language Models employing Chain-of-Thought reasoning achieve strong performance but suffer from excessive token consumption that inflates inference costs. Existing efficiency methods such as explicit length penalties, difficulty estimators, or multi-stage curricula either degrade reasoning quality or require complex training pipelines. We introduce Batched Contextual Reinforcement, a minimalist, single-stage training paradigm that unlocks efficient reasoning through a simple structural modification: training the model to solve N problems simultaneously within a shared context window, rewarded purely by per-instance accuracy. This formulation creates an implicit token budget that yields several key findings: (1) We identify a novel task-scaling law: as the number of concurrent problems N increases during inference, per-problem token usage decreases monotonically while accuracy degrades far more gracefully than baselines, establishing N as a controllable throughput dimension. (2) BCR challenges the traditional accuracy-efficiency trade-off by demonstrating a "free lunch" phenomenon at standard single-problem inference. Across both 1.5B and 4B model families, BCR reduces token usage by 15.8% to 62.6% while consistently maintaining or improving accuracy across five major mathematical benchmarks. (3) Qualitative analyses reveal emergent self-regulated efficiency, where models autonomously eliminate redundant metacognitive loops without explicit length supervision. (4) Crucially, we empirically demonstrate that implicit budget constraints successfully circumvent the adversarial gradients and catastrophic optimization collapse inherent to explicit length penalties, offering a highly stable, constraint-based alternative for length control. These results prove BCR practical, showing simple structural incentives unlock latent high-density reasoning in LLMs.
【2】Taming the Exponential: A Fast Softmax Surrogate for Integer-Native Edge Inference
标题:驯服指数:面向整数原生边缘推理的快速Softmax替代
链接:https://arxiv.org/abs/2604.02292
作者:Dimitrios Danopoulos,Enrico Lupi,Michael Kagan,Maurizio Pierini
摘要:Softmax可能成为Transformer模型多头注意力(MHA)块中的计算瓶颈,特别是在低精度推理下的小型模型中,其中求幂和归一化会产生显著开销。为此,我们建议使用头校准截断线性Softmax(HCCS),这是指数softmax函数的一个有界、单调替代,它对以最大值为中心的注意力logits做截断线性映射。这种近似产生稳定的概率分布,保持原始logits的顺序,且取值非负。HCCS与以往的softmax替代不同,它包含一组轻量级校准参数,这些参数基于代表性数据集离线优化,并针对每个注意力头分别校准,以保留各头的统计特性。我们描述了面向高吞吐场景、针对AMD Versal AI引擎的硬件驱动HCCS实现。AMD目前针对该平台的参考实现依赖bfloat16算术或LUT来执行指数运算,这可能限制平台的吞吐量,且无法利用AI引擎的高吞吐整数向量处理单元。相比之下,HCCS可自然映射到AI引擎的int8乘累加(MAC)单元。据我们所知,这是面向AMD AI引擎的第一个int8优化的softmax替代,其速度显著超过其他参考实现,同时在量化感知重训练后,在小型或重度量化的MHA工作负载上保持有竞争力的任务准确率。
摘要:Softmax can become a computational bottleneck in the Transformer model's Multi-Head Attention (MHA) block, particularly in small models under low-precision inference, where exponentiation and normalization incur significant overhead. As such, we suggest using Head-Calibrated Clipped-Linear Softmax (HCCS), a bounded, monotone surrogate to the exponential softmax function, which uses a clipped linear mapping of the max centered attention logits. This approximation produces a stable probability distribution, maintains the ordering of the original logits and has non-negative values. HCCS differs from previous softmax surrogates as it includes a set of lightweight calibration parameters that are optimized offline based on a representative dataset and calibrated for each individual attention head to preserve the statistical properties of the individual heads. We describe a hardware-motivated implementation of HCCS for high-throughput scenarios targeting the AMD Versal AI Engines. The current reference implementations from AMD for this platform rely upon either bfloat16 arithmetic or LUTs to perform the exponential operation, which might limit the throughput of the platform and fail to utilize the high-throughput integer vector processing units of the AI Engine. In contrast, HCCS provides a natural mapping to the AI Engines' int8 multiply accumulate (MAC) units. To the best of our knowledge, this is the first int8 optimized softmax surrogate for AMD AI engines that significantly exceeds the speed performance of other reference implementations while maintaining competitive task accuracy on small or heavily quantized MHA workloads after quantization-aware retraining.
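下面给出HCCS前向计算的一个最小示意(按摘要描述实现"对以最大值为中心的logits做截断线性映射";每头校准参数 a、b 的形式为假设,并非AMD官方代码):

import numpy as np

def hccs(logits, a, b):
    # logits: (heads, seq, seq);a, b: 每个注意力头的离线校准参数,形状 (heads, 1, 1),a > 0
    centered = logits - logits.max(axis=-1, keepdims=True)   # 最大值中心化,保持顺序与数值稳定
    y = np.clip(a * centered + b, 0.0, 1.0)                  # 截断线性映射:非负、有界、单调
    return y / (y.sum(axis=-1, keepdims=True) + 1e-9)        # 归一化为稳定的概率分布

由于只涉及减法、乘加、截断与一次除法,这一形式可以映射到整数乘累加单元,而无需指数运算或查找表。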
【3】LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model
标题:LatentUM:通过潜在空间统一模型释放交错跨模态推理的潜力
链接:https://arxiv.org/abs/2604.02097
作者:Jiachun Jin,Zetong Zhou,Xiao Yang,Hao Zhang,Pengfei Liu,Jun Zhu,Zhijie Deng
摘要:统一模型(UM)的前景在于其跨异构模态理解和生成内容的能力。与单纯生成视觉内容相比,使用UM进行交错的跨模态推理更有前景、更有价值,例如用于解决需要密集视觉思考的理解问题、通过自我反思改进视觉生成,或在逐步动作干预的引导下对物理世界的视觉动态进行建模。然而,现有UM由于其用于理解与生成的视觉表示彼此不相交,必须以像素解码作为桥梁,这既低效又效果不佳。在本文中,我们介绍了LatentUM,一种在共享语义潜在空间内表示所有模态的新型统一模型,消除了视觉理解和生成之间以像素空间作中介的需要。这种设计自然能够实现灵活的交错跨模态推理与生成。除了提高计算效率外,共享表示还大大缓解了编解码器偏差并加强了跨模态对齐,使LatentUM能够在视觉空间规划基准上实现最先进的性能,通过自我反思拓展视觉生成的极限,并通过在共享语义潜在空间内预测未来视觉状态来支持世界建模。
摘要:Unified models (UMs) hold promise for their ability to understand and generate content across heterogeneous modalities. Compared to merely generating visual content, the use of UMs for interleaved cross-modal reasoning is more promising and valuable, e.g., for solving understanding problems that require dense visual thinking, improving visual generation through self-reflection, or modeling visual dynamics of the physical world guided by stepwise action interventions. However, existing UMs necessitate pixel decoding as a bridge due to their disjoint visual representations for understanding and generation, which is both ineffective and inefficient. In this paper, we introduce LatentUM, a novel unified model that represents all modalities within a shared semantic latent space, eliminating the need for pixel-space mediation between visual understanding and generation. This design naturally enables flexible interleaved cross-modal reasoning and generation. Beyond improved computational efficiency, the shared representation substantially alleviates codec bias and strengthens cross-modal alignment, allowing LatentUM to achieve state-of-the-art performance on the Visual Spatial Planning benchmark, push the limits of visual generation through self-reflection, and support world modeling by predicting future visual states within the shared semantic latent space.
【4】Apriel-Reasoner: RL Post-Training for General-Purpose and Efficient Reasoning
标题:Apriel-Reasoner:用于通用和高效推理的RL后训练
链接:https://arxiv.org/abs/2604.02007
作者:Rafael Pardinas,Ehsan Kamalloo,David Vazquez,Alexandre Drouin
备注:20 pages, 4 tables, 6 figures, appendix included
摘要:使用具有可验证奖励的强化学习(RLVR)跨多个领域构建通用推理模型,已被前沿开放权重模型广泛采用。然而,它们的训练配方和领域混合比例通常不公开。跨领域联合优化带来重大挑战:各领域在推出(rollout)长度、问题难度和样本效率方面差异很大。此外,思维链轨迹很长的模型会增加推理成本和延迟,使效率对实际部署至关重要。我们提出了Apriel-Reasoner,它在Apriel-Base(一个15B参数的开放权重LLM)上、使用完全可复现的多领域RL后训练配方训练而成,基于公开数据集覆盖五个领域:数学、代码生成、指令遵循、逻辑谜题和函数调用。我们引入了一种自适应领域采样机制,在各领域推出动态各不相同的情况下仍能保持目标领域比例;以及对标准长度惩罚的难度感知扩展,它在不增加训练开销的前提下,鼓励对困难问题进行更长的推理、对简单问题生成更短的轨迹。在严格的16K令牌输出预算下训练后,Apriel-Reasoner在推理时可泛化到32K令牌,并在AIME 2025、GPQA、MMLU-Pro和LiveCodeBench上相对Apriel-Base有所提升,同时产生缩短30-50%的推理轨迹。它以更低的令牌成本比肩类似规模的强开放权重模型,从而推进了准确率对令牌预算的帕累托前沿。
摘要:Building general-purpose reasoning models using reinforcement learning with verifiable rewards (RLVR) across diverse domains has been widely adopted by frontier open-weight models. However, their training recipes and domain mixtures are often not disclosed. Joint optimization across domains poses significant challenges: domains vary widely in rollout length, problem difficulty and sample efficiency. Further, models with long chain-of-thought traces increase inference cost and latency, making efficiency critical for practical deployment. We present Apriel-Reasoner, trained with a fully reproducible multi-domain RL post-training recipe on Apriel-Base, a 15B-parameter open-weight LLM, across five domains using public datasets: mathematics, code generation, instruction following, logical puzzles and function calling. We introduce an adaptive domain sampling mechanism that preserves target domain ratios despite heterogeneous rollout dynamics, and a difficulty-aware extension of the standard length penalty that, with no additional training overhead, encourages longer reasoning for difficult problems and shorter traces for easy ones. Trained with a strict 16K-token output budget, Apriel-Reasoner generalizes to 32K tokens at inference and improves over Apriel-Base on AIME 2025, GPQA, MMLU-Pro, and LiveCodeBench while producing 30-50% shorter reasoning traces. It matches strong open-weight models of similar size at lower token cost, thereby pushing the Pareto frontier of accuracy versus token budget.
【5】LiveMathematicianBench: A Live Benchmark for Mathematician-Level Reasoning with Proof Sketches
标题:LiveMathematicianBench:具有证明草图的数学家级推理的实时基准
链接:https://arxiv.org/abs/2604.01754
作者:Linyang He,Qiyao Yu,Hanze Dong,Baohao Liao,Xinxing Xu,Micah Goldblum,Jiang Bian,Nima Mesgarani
备注:Project page: https://livemathematicianbench.github.io/
摘要:数学推理是人类智能的标志,大型语言模型(LLM)能否有意义地执行数学推理仍然是人工智能和认知科学的核心问题。随着LLM越来越多地融入科学工作流程,对其数学能力进行严格评估成为一种实际需要。现有基准受限于合成设置和数据污染。我们提出了LiveMathematicianBench,这是一个面向研究级数学推理的动态多项选择基准,由模型训练截止日期之后发表的最新arXiv论文构建而成。通过以新发表的定理为评估依据,它提供了一个超越记忆模式的现实测试平台。该基准引入了定理类型的十三类逻辑分类法(例如蕴涵、等价、存在、唯一),从而能够跨推理形式进行细粒度评估。它采用证明草图引导的干扰项管道,利用高层证明策略构建貌似合理但无效的答案选项以反映误导性的证明方向,提高了对真正理解(而非表面匹配)的敏感性。我们还引入了一种抗替代机制,以区分答案识别与实质性推理。评估显示该基准远未饱和:最佳模型Gemini-3.1-pro-preview也仅达到43.5%。在抗替代评估下,准确率急剧下降:GPT-5.4得分最高,为30.6%,而Gemini-3.1-pro-preview降至17.6%,低于20%的随机基线。双模式协议表明,可访问证明草图会带来一致的准确率增益,说明模型能够利用高层证明策略进行推理。总体而言,LiveMathematicianBench为研究LLM中的研究级数学推理提供了一个可扩展、抗污染的测试平台。
摘要:Mathematical reasoning is a hallmark of human intelligence, and whether large language models (LLMs) can meaningfully perform it remains a central question in artificial intelligence and cognitive science. As LLMs are increasingly integrated into scientific workflows, rigorous evaluation of their mathematical capabilities becomes a practical necessity. Existing benchmarks are limited by synthetic settings and data contamination. We present LiveMathematicianBench, a dynamic multiple-choice benchmark for research-level mathematical reasoning built from recent arXiv papers published after model training cutoffs. By grounding evaluation in newly published theorems, it provides a realistic testbed beyond memorized patterns. The benchmark introduces a thirteen-category logical taxonomy of theorem types (e.g., implication, equivalence, existence, uniqueness), enabling fine-grained evaluation across reasoning forms. It employs a proof-sketch-guided distractor pipeline that uses high-level proof strategies to construct plausible but invalid answer choices reflecting misleading proof directions, increasing sensitivity to genuine understanding over surface-level matching. We also introduce a substitution-resistant mechanism to distinguish answer recognition from substantive reasoning. Evaluation shows the benchmark is far from saturated: Gemini-3.1-pro-preview, the best model, achieves only 43.5%. Under substitution-resistant evaluation, accuracy drops sharply: GPT-5.4 scores highest at 30.6%, while Gemini-3.1-pro-preview falls to 17.6%, below the 20% random baseline. A dual-mode protocol reveals that proof-sketch access yields consistent accuracy gains, suggesting models can leverage high-level proof strategies for reasoning. Overall, LiveMathematicianBench offers a scalable, contamination-resistant testbed for studying research-level mathematical reasoning in LLMs.
【6】When Reward Hacking Rebounds: Understanding and Mitigating It with Representation-Level Signals
标题:当奖励黑客反弹时:用表示级信号理解并缓解它
链接:https://arxiv.org/abs/2604.01476
作者:Rui Wu,Ruixiang Tang
备注:15 pages, 8 figures
摘要:LLM的强化学习容易受到奖励黑客攻击:模型利用捷径来最大化奖励,而不解决预期任务。我们在编码任务中使用环境操纵设置系统地研究了这种现象——模型可以重写评估器代码,不解决任务就平凡地通过测试——以此作为受控测试平台。在两个被研究的模型上,我们都发现了一个可复现的三阶段反弹模式:模型首先尝试重写评估器但失败了,因为它们重写时嵌入的测试用例连自己的解决方案也无法通过;随后它们暂时退回到合法解题;当合法奖励仍然稀缺时,它们会以性质不同的策略反弹为成功的黑客行为。利用表示工程,我们从领域通用的对比对中提取"捷径""欺骗""评估意识"的概念方向,并发现捷径方向与黑客行为的关联最为紧密,使其成为检测黑客行为的有效表示层代理。基于这一发现,我们提出了优势修改(Advantage Modification),它将捷径概念得分整合到GRPO优势计算中,在策略更新之前惩罚黑客式推出。由于惩罚被内化到训练信号中,而非仅在推理时施加,与生成时的激活引导相比,优势修改对黑客行为提供了更稳健的抑制。
摘要:Reinforcement learning for LLMs is vulnerable to reward hacking, where models exploit shortcuts to maximize reward without solving the intended task. We systematically study this phenomenon in coding tasks using an environment-manipulation setting, where models can rewrite evaluator code to trivially pass tests without solving the task, as a controlled testbed. Across both studied models, we identify a reproducible three-phase rebound pattern: models first attempt to rewrite the evaluator but fail, as their rewrites embed test cases their own solutions cannot pass. They then temporarily retreat to legitimate solving. When legitimate reward remains scarce, they rebound into successful hacking with qualitatively different strategies. Using representation engineering, we extract concept directions for shortcut, deception, and evaluation awareness from domain-general contrastive pairs and find that the shortcut direction tracks hacking behavior most closely, making it an effective representational proxy for detection. Motivated by this finding, we propose Advantage Modification, which integrates shortcut concept scores into GRPO advantage computation to penalize hacking rollouts before policy updates. Because the penalty is internalized into the training signal rather than applied only at inference time, Advantage Modification provides more robust suppression of hacking compared with generation-time activation steering.
【7】Massively Parallel Exact Inference for Hawkes Processes
标题:Hawkes过程的大规模并行精确推理
链接:https://arxiv.org/abs/2604.01342
作者:Ahmer Raza,Hudson Smith
摘要:多变量Hawkes过程是一类广泛使用的自激点过程,但朴素的极大似然估计关于事件数的复杂度为$O(N^2)$。规范的线性指数Hawkes过程允许更快的$O(N)$递推,但此前的工作按顺序求值这一递推,没有利用现代GPU上的并行能力。我们证明,Hawkes过程的强度可以表示为一系列稀疏转移矩阵的乘积,这些矩阵允许线性时间的可结合乘法,从而可以通过并行前缀扫描进行计算。这产生了一个简单而又可大规模并行化的线性指数Hawkes过程极大似然估计算法。我们的方法使用$P$个并行处理器时,可将计算复杂度降至约$O(N/P)$,并自然产生一种分批方案以保持恒定的内存使用、避开GPU内存限制。重要的是,它在没有任何额外假设或近似的情况下计算精确的似然,保持了模型的简单性和可解释性。我们在模拟和真实数据集上展示了数量级的加速,可扩展到数千个节点和数千万个事件,大大超出先前工作中报告的规模。我们提供了一个实现这些优化的开源PyTorch库。
摘要:Multivariate Hawkes processes are a widely used class of self-exciting point processes, but maximum likelihood estimation naively scales as $O(N^2)$ in the number of events. The canonical linear exponential Hawkes process admits a faster $O(N)$ recurrence, but prior work evaluates this recurrence sequentially, without exploiting parallelization on modern GPUs. We show that the Hawkes process intensity can be expressed as a product of sparse transition matrices admitting a linear-time associative multiply, enabling computation via a parallel prefix scan. This yields a simple yet massively parallelizable algorithm for maximum likelihood estimation of linear exponential Hawkes processes. Our method reduces the computational complexity to approximately $O(N/P)$ with $P$ parallel processors, and naturally yields a batching scheme to maintain constant memory usage, avoiding GPU memory constraints. Importantly, it computes the exact likelihood without any additional assumptions or approximations, preserving the simplicity and interpretability of the model. We demonstrate orders-of-magnitude speedups on simulated and real datasets, scaling to thousands of nodes and tens of millions of events, substantially beyond scales reported in prior work. We provide an open-source PyTorch library implementing our optimizations.
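下面是"以可结合运算并行求解指数核递推"这一思路的最小示意(numpy版Hillis-Steele前缀扫描;递推形式 R_i = e^{-βΔt_i}(R_{i-1}+α) 为常见写法,论文的具体约定可能不同):

import numpy as np

def affine_scan(a, b):
    # 并行化地计算仿射递推 s[i] = a[i]*s[i-1] + b[i](约定 s[-1] = 0):
    # 每个元素视为映射 x -> A x + B,映射复合满足结合律,故可做前缀扫描
    A, B = a.copy(), b.copy()
    d, n = 1, len(a)
    while d < n:
        A_sh = np.concatenate([np.ones(d), A[:-d]])
        B_sh = np.concatenate([np.zeros(d), B[:-d]])
        B = A * B_sh + B          # 先应用较早的前缀映射,再应用当前映射
        A = A * A_sh
        d *= 2
    return B

# 用法示意:各事件时刻的递推量 R 与条件强度
t = np.sort(np.random.rand(1000)); mu, alpha, beta = 0.2, 0.5, 1.0
decay = np.exp(-beta * np.diff(t, prepend=t[0]))
R = affine_scan(decay, alpha * decay)   # R_i = e^{-βΔt_i} (R_{i-1} + α)
intensity = mu + R

在GPU上,同样的可结合运算可交给并行扫描原语(如 jax.lax.associative_scan)执行,这正是摘要所述并行化的核心。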
【8】An Online Machine Learning Multi-resolution Optimization Framework for Energy System Design Limit of Performance Analysis
标题:能源系统设计极限性能分析的在线机器学习多分辨率优化框架
链接:https://arxiv.org/abs/2604.01308
作者:Oluwamayowa O. Amusat,Luka Grbcic,Remi Patureau,M. Jibran S. Zuberi,Dan Gunter,Michael Wetter
摘要:为工业过程设计可靠的集成能源系统,需要跨多个保真度层级的优化与验证模型,从架构级容量配置到高保真动态运行。然而,不同保真度模型之间的失配会掩盖性能损失的来源,并使架构到运行性能差距的量化变得复杂。我们提出了一个在线的、机器学习加速的多分辨率优化框架,用于估计特定架构可实现性能的上界,同时尽量减少昂贵的高保真模型评估。我们在一个为1兆瓦工业热负荷供能的试点能源系统上演示了该方法。首先,我们求解一个多目标架构优化问题,以选择系统配置和组件容量。然后,我们开发了一种机器学习(ML)加速的多分辨率滚动时域最优控制策略,在考虑架构优化模型未捕获的额外控制量与动态特性的前提下,逼近给定架构的可实现性能上界。ML引导的控制器根据预测不确定性自适应地调度优化分辨率,并利用精英低保真解来热启动高保真求解。试点案例研究的结果表明,相对于基于规则的控制器,所提出的多分辨率策略将架构到运行的性能差距最多缩小42%,同时相对于不带ML引导的同一多保真度方法,将所需的高保真模型评估减少34%,从而实现更快、更可靠的设计验证。综合起来,这些增益使高保真验证变得可行,为可实现的运行性能提供了一个实用的上界。
摘要:Designing reliable integrated energy systems for industrial processes requires optimization and verification models across multiple fidelities, from architecture-level sizing to high-fidelity dynamic operation. However, model mismatch across fidelities obscures the sources of performance loss and complicates the quantification of architecture-to-operation performance gaps. We propose an online, machine-learning-accelerated multi-resolution optimization framework that estimates an architecture-specific upper bound on achievable performance while minimizing expensive high-fidelity model evaluations. We demonstrate the approach on a pilot energy system supplying a 1 MW industrial heat load. First, we solve a multi-objective architecture optimization to select the system configuration and component capacities. We then develop an machine learning (ML)-accelerated multi-resolution, receding-horizon optimal control strategy that approaches the achievable-performance bound for the specified architecture, given the additional controls and dynamics not captured by the architectural optimization model. The ML-guided controller adaptively schedules the optimization resolution based on predictive uncertainty and warm-starts high-fidelity solves using elite low-fidelity solutions. Our results on the pilot case study show that the proposed multi-resolution strategy reduces the architecture-to-operation performance gap by up to 42% relative to a rule-based controller, while reducing required high-fidelity model evaluations by 34% relative to the same multi-fidelity approach without ML guidance, enabling faster and more reliable design verification. Together, these gains make high-fidelity verification tractable, providing a practical upper bound on achievable operational performance.
【9】Gradient estimators for parameter inference in discrete stochastic kinetic models
标题:离散随机动力学模型参数推断的梯度估计
链接:https://arxiv.org/abs/2604.02121
作者:Ludwig Burger,Annalena Kofler,Lukas Heinrich,Ulrich Gerland
备注:13 pages, 6 figures
摘要:随机动力学模型在物理学中无处不在,但从实验数据推断其参数仍然具有挑战性。在确定性模型中,参数推断通常依赖于梯度,因为梯度可以通过自动微分高效获得。然而,这些工具不能直接应用于Gillespie算法等随机模拟算法(SSA),因为从离散的反应集合中采样引入了不可微的操作。在这项工作中,我们为Gillespie SSA引入了三种来自机器学习的梯度估计器:Gumbel-Softmax直通(GS-ST)估计器、得分函数估计器和替代路径估计器。我们在表现出弛豫或振荡动力学的两个代表性系统中比较了所有估计器的性质,其中后者需要对时间相关的目标函数进行梯度估计。我们发现,GS-ST估计器在多数情况下给出表现良好的梯度估计,但在具有挑战性的参数区间会出现方差发散,导致参数推断失败。在这些情况下,其他估计器提供更鲁棒、方差更低的梯度。我们的结果表明,基于梯度的参数推断可以有效地与Gillespie SSA集成,不同的估计器各具互补优势。
摘要:Stochastic kinetic models are ubiquitous in physics, yet inferring their parameters from experimental data remains challenging. In deterministic models, parameter inference often relies on gradients, as they can be obtained efficiently through automatic differentiation. However, these tools cannot be directly applied to stochastic simulation algorithms (SSA) such as the Gillespie algorithm, since sampling from a discrete set of reactions introduces non-differentiable operations. In this work, we adopt three gradient estimators from machine learning for the Gillespie SSA: the Gumbel-Softmax Straight-Through (GS-ST) estimator, the Score Function estimator, and the Alternative Path estimator. We compare the properties of all estimators in two representative systems exhibiting relaxation or oscillatory dynamics, where the latter requires gradient estimation of time-dependent objective functions. We find that the GS-ST estimator mostly yields well-behaved gradient estimates, but exhibits diverging variance in challenging parameter regimes, resulting in unsuccessful parameter inference. In these cases, the other estimators provide more robust, lower variance gradients. Our results demonstrate that gradient-based parameter inference can be integrated effectively with the Gillespie SSA, with different estimators offering complementary advantages.
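下面给出GS-ST估计器核心思想的最小示意(PyTorch;将其用于SSA反应选择的接口为示意):前向传递取one-hot的离散样本(对应选中某个反应),反向传递借用softmax松弛的梯度。

import torch
import torch.nn.functional as F

def gs_st_sample(logits, tau=1.0):
    # logits: 各反应倾向的对数;返回直通one-hot样本
    g = -torch.log(-torch.log(torch.rand_like(logits)))   # Gumbel(0,1)噪声
    soft = F.softmax((logits + g) / tau, dim=-1)          # 可微的软样本
    hard = F.one_hot(soft.argmax(-1), logits.shape[-1]).float()
    return hard + soft - soft.detach()                    # 前向值为hard,梯度经由soft传播

PyTorch也内置了等价的 F.gumbel_softmax(logits, tau=tau, hard=True);温度 tau 控制松弛带来的偏差-方差权衡。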
检测相关(2篇)
【1】Mining Instance-Centric Vision-Language Contexts for Human-Object Interaction Detection
标题:挖掘以实例为中心的视觉语言上下文用于人-物交互检测
链接:https://arxiv.org/abs/2604.02071
作者:Soo Won Seo,KyungChae Lee,Hyungchan Cho,Taein Son,Nam Ik Cho,Jun Won Choi
备注:Accepted to CVPR 2026. Code: https://github.com/nowuss/InCoM-Net
摘要:人-物交互(HOI)检测旨在定位人-物对并从单个图像中对它们的交互进行分类,这是一项需要强大的视觉理解和细致入微的上下文推理的任务。最近的方法利用视觉语言模型(VLM)引入语义先验,显着提高HOI检测性能。然而,现有的方法往往无法充分利用分布在整个场景中的各种上下文线索。为了克服这些限制,我们提出了以实例为中心的上下文挖掘网络(InCoM-Net)-一种新的框架,有效地集成了丰富的语义知识提取的VLMs与实例特定的功能产生的对象检测器。这种设计通过不仅在每个检测到的实例内而且在实例及其周围场景上下文之间建模关系来实现更深层次的交互推理。InCoM-Net包括两个核心组件:以实例为中心的上下文细化(ICR),它从VLM衍生的特征中分别提取实例内,实例间和全局上下文线索,以及渐进式上下文聚合(ProCA),它迭代地将这些多上下文特征与实例级检测器特征融合,以支持高级HOI推理。在HICO-DET和V-COCO基准上进行的大量实验表明,InCoM-Net实现了最先进的性能,超过了以前的HOI检测方法。代码可在https://github.com/nowuss/InCoM-Net上获得。
摘要:Human-Object Interaction (HOI) detection aims to localize human-object pairs and classify their interactions from a single image, a task that demands strong visual understanding and nuanced contextual reasoning. Recent approaches have leveraged Vision-Language Models (VLMs) to introduce semantic priors, significantly improving HOI detection performance. However, existing methods often fail to fully capitalize on the diverse contextual cues distributed across the entire scene. To overcome these limitations, we propose the Instance-centric Context Mining Network (InCoM-Net)-a novel framework that effectively integrates rich semantic knowledge extracted from VLMs with instance-specific features produced by an object detector. This design enables deeper interaction reasoning by modeling relationships not only within each detected instance but also across instances and their surrounding scene context. InCoM-Net comprises two core components: Instancecentric Context Refinement (ICR), which separately extracts intra-instance, inter-instance, and global contextual cues from VLM-derived features, and Progressive Context Aggregation (ProCA), which iteratively fuses these multicontext features with instance-level detector features to support high-level HOI reasoning. Extensive experiments on the HICO-DET and V-COCO benchmarks show that InCoM-Net achieves state-of-the-art performance, surpassing previous HOI detection methods. Code is available at https://github.com/nowuss/InCoM-Net.
【2】IndoorCrowd: A Multi-Scene Dataset for Human Detection, Segmentation, and Tracking with an Automated Annotation Pipeline
标题:IndoorCrowd:用于人体检测、分割和跟踪的多场景数据集,具有自动注释管道
链接:https://arxiv.org/abs/2604.02032
作者:Sebastian-Ion Nae,Radu Moldoveanu,Alexandra Stefania Ghita,Adina Magda Florea
备注:Accepted at Conference on Computer Vision and Pattern Recognition Workshops 2026
摘要:了解人类在拥挤室内环境中的行为是监控、智能建筑和人机交互的核心,但现有数据集很少能大规模捕捉真实世界的室内复杂性。我们介绍了IndoorCrowd,一个用于室内人体检测、实例分割和多目标跟踪的多场景数据集,采集自四个校园地点(ACS-EC、ACS-EG、IE-Central、R-Central)。它包含31个视频(9,913帧,5 fps),带有经人工校验的逐实例分割掩码。一个620帧的对照子集以人工标注为参照,使用Cohen's κ、AP、精确率、召回率和掩码IoU对三个基础模型自动标注器(SAM3、GroundingSAM和EfficientGroundingSAM)进行基准测试。另一个2,552帧子集支持多目标跟踪,提供MOTChallenge格式的连续身份轨迹。我们使用YOLOv8n、YOLOv26n和RT-DETR-L分别搭配ByteTrack、BoT-SORT和OC-SORT,建立了检测、分割和跟踪基线。逐场景分析揭示了由人群密度、尺度和遮挡驱动的显著难度差异:ACS-EC有79.3%的密集帧、平均实例尺度为60.8像素,是最具挑战性的场景。项目页面见https://sheepseb.github.io/IndoorCrowd/。
摘要:Understanding human behaviour in crowded indoor environments is central to surveillance, smart buildings, and human-robot interaction, yet existing datasets rarely capture real-world indoor complexity at scale. We introduce IndoorCrowd, a multi-scene dataset for indoor human detection, instance segmentation, and multi-object tracking, collected across four campus locations (ACS-EC, ACS-EG, IE-Central, R-Central). It comprises $31$ videos ($9{,}913$ frames at $5$fps) with human-verified, per-instance segmentation masks. A $620$-frame control subset benchmarks three foundation-model auto-annotators: SAM3, GroundingSAM, and EfficientGroundingSAM, against human labels using Cohen's $κ$, AP, precision, recall, and mask IoU. A further $2{,}552$-frame subset supports multi-object tracking with continuous identity tracks in MOTChallenge format. We establish detection, segmentation, and tracking baselines using YOLOv8n, YOLOv26n, and RT-DETR-L paired with ByteTrack, BoT-SORT, and OC-SORT. Per-scene analysis reveals substantial difficulty variation driven by crowd density, scale, and occlusion: ACS-EC, with $79.3\%$ dense frames and a mean instance scale of $60.8$px, is the most challenging scene. The project page is available at https://sheepseb.github.io/IndoorCrowd/.
分类|识别(3篇)
【1】Best-Arm Identification with Noisy Actuation
标题:噪声驱动下的最佳臂识别
链接:https://arxiv.org/abs/2604.02255
作者:Merve Karakas,Osama Hanna,Lin F. Yang,Christina Fragouli
摘要:在本文中,我们考虑一个多臂老虎机(MAB)实例,研究当臂指令需经由离散无记忆信道(DMC)从中央学习器传达给分布式代理时,如何识别最佳臂。根据代理的能力,我们给出了相应的通信方案及其分析,有趣的是,这些分析与底层DMC的零错误容量相关。
摘要:In this paper, we consider a multi-armed bandit (MAB) instance and study how to identify the best arm when arm commands are conveyed from a central learner to a distributed agent over a discrete memoryless channel (DMC). Depending on the agent capabilities, we provide communication schemes along with their analysis, which interestingly relate to the zero-error capacity of the underlying DMC.
【2】AstroConcepts: A Large-Scale Multi-Label Classification Corpus for Astrophysics
标题:AstroConcepts:天体物理学大规模多标签分类数据库
链接:https://arxiv.org/abs/2604.02156
作者:Atilla Kaan Alkan,Felix Grezes,Sergi Blanco-Cuaresma,Jennifer Lynn Bartlett,Daniel Chivvis,Anna Kelbert,Kelly Lockhart,Alberto Accomazzi
备注:9 pages, 2 figures
摘要:科学多标签文本分类遭受极端的类不平衡,其中专业术语表现出严重的幂律分布,挑战标准分类方法。现有的科学语料库缺乏全面的控制词汇,而是集中在广泛的类别和限制系统研究的极端不平衡。我们介绍AstroConcepts,这是一个英文摘要语料库,来自21,702篇已发表的天体物理学论文,标记有来自统一天文学主题词表的2,367个概念。语料库显示出严重的标签不平衡,76%的概念的训练示例少于50个。通过发布此资源,我们可以系统地研究科学领域中的极端类不平衡,并在传统,神经和词汇限制的LLM方法中建立强大的基线。我们的评估揭示了三个关键模式,为科学文本分类提供了新的见解。首先,词汇约束的LLM实现竞争力的性能相对于领域适应模型在天体物理学分类,这表明一个潜在的参数有效的方法。其次,领域适应产生相对较大的改进,罕见的,专门的术语,虽然绝对性能仍然有限,在所有的方法。第三,我们提出了频率分层评价,以揭示性能模式,隐藏的总分数,从而使鲁棒性评估的科学多标签评价的中心。这些结果为科学NLP提供了可操作的见解,并为极端不平衡的研究建立了基准。
摘要:Scientific multi-label text classification suffers from extreme class imbalance, where specialized terminology exhibits severe power-law distributions that challenge standard classification approaches. Existing scientific corpora lack comprehensive controlled vocabularies, focusing instead on broad categories and limiting systematic study of extreme imbalance. We introduce AstroConcepts, a corpus of English abstracts from 21,702 published astrophysics papers, labeled with 2,367 concepts from the Unified Astronomy Thesaurus. The corpus exhibits severe label imbalance, with 76% of concepts having fewer than 50 training examples. By releasing this resource, we enable systematic study of extreme class imbalance in scientific domains and establish strong baselines across traditional, neural, and vocabulary-constrained LLM methods. Our evaluation reveals three key patterns that provide new insights into scientific text classification. First, vocabulary-constrained LLMs achieve competitive performance relative to domain-adapted models in astrophysics classification, suggesting a potential for parameter-efficient approaches. Second, domain adaptation yields relatively larger improvements for rare, specialized terminology, although absolute performance remains limited across all methods. Third, we propose frequency-stratified evaluation to reveal performance patterns that are hidden by aggregate scores, thereby making robustness assessment central to scientific multi-label evaluation. These results offer actionable insights for scientific NLP and establish benchmarks for research on extreme imbalance.
【3】Probabilistic classification from possibilistic data: computing Kullback-Leibler projection with a possibility distribution
标题:可能性数据的概率分类:用可能性分布计算Kullback-Leibler投影
链接:https://arxiv.org/abs/2604.01939
作者:Ismaïl Baaj,Pierre Marquis
摘要:我们考虑面向多类分类、带有可能性(possibilistic)监督的学习。对于每个训练实例,监督信息是一个归一化的可能性分布,表达各类别的分级合理性。从这个可能性分布出发,我们结合两个要求构造了一个非空闭凸的容许概率分布集:其一是与该可能性分布诱导的可能性测度和必要性测度的概率相容性;其二是为保持可能性分布的定性结构而必须满足的线性形状约束。于是,具有相同可能性程度的类别获得相等的概率,而若一个类别的可能性程度严格大于另一个类别,则它获得严格更大的概率。给定模型对某实例输出的严格正概率向量,我们计算它在容许集上的Kullback-Leibler投影。该投影给出了Kullback-Leibler意义下最接近的容许概率分布。然后,我们可以通过最小化预测与其投影之间的散度来训练模型,它量化了满足所诱导的支配与形状约束所需的最小调整。投影用Dykstra算法结合与负熵相关联的Bregman投影来计算,我们为到每个约束集的投影提供了显式公式。在合成数据以及基于ChaosNLI数据集的真实世界自然语言推理任务上的实验表明,所提出的投影算法的效率足以实际使用,且由此产生的基于投影的学习目标可以提升预测性能。
摘要:We consider learning with possibilistic supervision for multi-class classification. For each training instance, the supervision is a normalized possibility distribution that expresses graded plausibility over the classes. From this possibility distribution, we construct a non-empty closed convex set of admissible probability distributions by combining two requirements: probabilistic compatibility with the possibility and necessity measures induced by the possibility distribution, and linear shape constraints that must be satisfied to preserve the qualitative structure of the possibility distribution. Thus, classes with the same possibility degree receive equal probabilities, and if a class has a strictly larger possibility degree than another class, then it receives a strictly larger probability. Given a strictly positive probability vector output by a model for an instance, we compute its Kullback-Leibler projection onto the admissible set. This projection yields the closest admissible probability distribution in Kullback-Leibler sense. We can then train the model by minimizing the divergence between the prediction and its projection, which quantifies the smallest adjustment needed to satisfy the induced dominance and shape constraints. The projection is computed with Dykstra's algorithm using Bregman projections associated with the negative entropy, and we provide explicit formulas for the projections onto each constraint set. Experiments conducted on synthetic data and on a real-world natural language inference task, based on the ChaosNLI dataset, show that the proposed projection algorithm is efficient enough for practical use, and that the resulting projection-based learning objective can improve predictive performance.
表征(2篇)
【1】Prosodic ABX: A Language-Agnostic Method for Measuring Prosodic Contrast in Speech Representations
标题:韵律ABX:一种测量语音表示中韵律对比的语言不可知方法
链接:https://arxiv.org/abs/2604.02102
作者:Haitong Sun,Stephen McIntosh,Kwanghee Choi,Eunjung Yeo,Daisuke Saito,Nobuaki Minematsu
备注:Submitted to Interspeech 2026; 6 pages, 4 figures
摘要:自监督语音模型(S3M)的语音表示已知对音位对比敏感,但它们对韵律对比的敏感性尚未被直接测量。ABX判别任务已被用于通过最小对来测量S3M表示中的音位对比。我们引入韵律ABX,它是该框架的扩展,仅用少量示例且无需显式标签即可评估韵律对比。此外,我们构建并发布了英语和日语最小对数据集,并将其与一个普通话数据集一起用于评估英语重音、日语音高重音和普通话声调的对比。最后,我们表明模型与层的排名在多种实验条件下往往得以保留,使该方法适用于低资源场景。
摘要:Speech representations from self-supervised speech models (S3Ms) are known to be sensitive to phonemic contrasts, but their sensitivity to prosodic contrasts has not been directly measured. The ABX discrimination task has been used to measure phonemic contrast in S3M representations via minimal pairs. We introduce prosodic ABX, an extension of this framework to evaluate prosodic contrast with only a handful of examples and no explicit labels. Also, we build and release a dataset of English and Japanese minimal pairs and use it along with a Mandarin dataset to evaluate contrast in English stress, Japanese pitch accent, and Mandarin tone. Finally, we show that model and layer rankings are often preserved across several experimental conditions, making it practical for low-resource settings.
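下面给出ABX判别得分的最小示意(通用定义;与论文的具体实现细节可能不同):给定类A的样本a、类B的样本b以及类A的另一样本x,若 d(a,x) < d(b,x) 则判对,得分为判对比例。

import numpy as np
from itertools import product

def abx_score(A, B, dist):
    # A, B: 两个韵律类别各自的表示列表;dist: 任意距离函数(如DTW或余弦距离)
    correct, total = 0, 0
    for i, j in product(range(len(A)), repeat=2):
        if i == j:
            continue
        for b in B:
            correct += dist(A[i], A[j]) < dist(b, A[j])   # x = A[j], a = A[i]
            total += 1
    return correct / total

得分0.5相当于随机水平,越接近1说明表示越能区分这两类韵律对比。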
【2】Investigating Permutation-Invariant Discrete Representation Learning for Spatially Aligned Images
标题:研究空间对齐图像的排列不变离散表示学习
链接:https://arxiv.org/abs/2604.01843
作者:Jamie S. J. Stirling,Noura Al-Moubayed,Hubert P. H. Shum
备注:15 pages plus references; 5 figures; supplementary appended; accepted to ICPR 2026
摘要:矢量量化方法(VQ-VAE、VQ-GAN)学习图像的离散神经表示,但这些表示本质上依赖于位置:代码在空间上排列并与上下文纠缠,需要自回归或基于扩散的先验来在采样时对其依赖关系建模。在这项工作中,我们追问:对于空间对齐数据的离散表示,位置信息是否必要?我们提出了排列不变矢量量化自动编码器(PI-VQ),其中潜在代码被约束为不携带位置信息。我们发现,这种约束鼓励代码捕捉全局性的语义特征,并使图像之间无需学习先验即可直接插值。为了解决排列不变表示信息容量降低的问题,我们引入了匹配量化,一种基于最优二分匹配的矢量量化算法,相对于朴素的最近邻量化,它将有效瓶颈容量提升3.5倍。学习到的代码的组合结构进一步支持基于插值的采样,允许在单次前向传播中合成新图像。我们在CelebA、CelebA-HQ和FFHQ上评估PI-VQ,对用我们的方法合成的图像获得了有竞争力的精确率、密度和覆盖率指标。我们讨论了无位置表示固有的权衡,包括潜在代码的可分离性与可解释性,并指出了未来工作的多个方向。
摘要:Vector quantization approaches (VQ-VAE, VQ-GAN) learn discrete neural representations of images, but these representations are inherently position-dependent: codes are spatially arranged and contextually entangled, requiring autoregressive or diffusion-based priors to model their dependencies at sample time. In this work, we ask whether positional information is necessary for discrete representations of spatially aligned data. We propose the permutation-invariant vector-quantized autoencoder (PI-VQ), in which latent codes are constrained to carry no positional information. We find that this constraint encourages codes to capture global, semantic features, and enables direct interpolation between images without a learned prior. To address the reduced information capacity of permutation-invariant representations, we introduce matching quantization, a vector quantization algorithm based on optimal bipartite matching that increases effective bottleneck capacity by $3.5\times$ relative to naive nearest-neighbour quantization. The compositional structure of the learned codes further enables interpolation-based sampling, allowing synthesis of novel images in a single forward pass. We evaluate PI-VQ on CelebA, CelebA-HQ and FFHQ, obtaining competitive precision, density and coverage metrics for images synthesised with our approach. We discuss the trade-offs inherent to position-free representations, including separability and interpretability of the latent codes, pointing to numerous directions for future work.
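下面给出"匹配量化"核心步骤的最小示意(基于摘要的"最优二分匹配"描述,函数接口为假设):把一幅图像的 n 个潜向量一一分配给码本中互不相同的条目,使总失真最小,从而避免多个位置坍缩到同一码字。

import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_quantize(z, codebook):
    # z: (n, d) 潜向量;codebook: (K, d),需 K >= n
    cost = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # 两两平方距离矩阵
    _, cols = linear_sum_assignment(cost)    # 匈牙利算法求最小总失真的一一匹配
    return codebook[cols], cols              # 量化结果与所用码字索引

与逐向量最近邻相比,一一匹配强制使用 n 个互不相同的码字,这正是有效瓶颈容量得以提升的来源。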
3D|3D重建等相关(1篇)
【1】Dual-Attention Based 3D Channel Estimation
标题:基于双重关注的3D通道估计
链接:https://arxiv.org/abs/2604.01769
作者:Xiangzhao Qin,Sha Hu
备注:5 pages, 6 figures
摘要:对于多输入多输出(MIMO)信道,基于线性最小均方误差(LMMSE)的最优信道估计(CE)需要三维(3D)滤波。然而,由于矩阵维度巨大,其复杂度往往令人望而却步。次优估计器通过将3DCE分解到时域、频域和空域来近似,但在相关MIMO信道下会产生明显的性能下降。另一方面,深度学习(DL)的最新进展可以通过注意力机制挖掘所有域中的信道相关性。基于这一能力,我们提出了一种基于双重注意力机制的3DCE网络(3DCENet),能够实现准确的估计。
摘要:For multi-input and multi-output (MIMO) channels, the optimal channel estimation (CE) based on linear minimum mean square error (LMMSE) requires three-dimensional (3D) filtering. However, the complexity is often prohibitive due to large matrix dimensions. Suboptimal estimators approximate 3DCE by decomposing it into time, frequency, and spatial domains, while yielding noticeable performance degradation under correlated MIMO channels. On the other hand, recent advances in deep learning (DL) can explore channel correlations in all domains via attention mechanisms. Building on this capability, we propose a dual attention mechanism based 3DCE network (3DCENet) that can achieve accurate estimates.
编码器(1篇)
【1】Application of parametric Shallow Recurrent Decoder Network to magnetohydrodynamic flows in liquid metal blankets of fusion reactors
标题:参数浅层回归解码器网络在聚变反应堆液态金属包层磁流体动力学中的应用
链接:https://arxiv.org/abs/2604.02139
作者:M. Lo Verso,C. Introini,E. Cervi,L. Savoldi,J. N. Kutz,A. Cammi
摘要:磁流体动力学(MHD)现象在核聚变系统的设计和运行中起着关键作用:导电流体(如反应堆包层中使用的液态金属或熔盐)与强度和方向各异的磁场相互作用,影响由此产生的流动动力学。MHD模型的数值求解需要解析高度非线性的多物理场方程组,其计算代价可能很高,尤其是在多查询、参数化或实时场景中。本研究考察了一个完全数据驱动的MHD状态重建框架,它将基于奇异值分解(SVD)的降维与浅层循环解码器(SHallow REcurrent Decoder, SHRED)神经网络架构相结合;SHRED旨在从选定观测量的稀疏时间序列测量中重建完整的时空状态,并涵盖此前未见过的参数配置。SHRED方法被应用于一个代表WCLL包层单元一部分的三维几何结构,其中铅锂在水冷管周围流动。我们考察了多种磁场配置,包括恒定环向场、环向-极向组合场以及随时间变化的磁场。在所有考虑的场景中,SHRED都实现了高重建精度、鲁棒性,并能泛化到训练期间未见过的磁场强度、方向和时间演化。值得注意的是,在存在时变磁场的情况下,该模型仅使用温度测量就能准确推断磁场本身的时间演化。总体而言,这些结果确立了SHRED作为一种计算高效、数据驱动且灵活的MHD状态重建方法,在聚变反应堆系统的实时监测、诊断和控制方面具有显著潜力。
摘要:Magnetohydrodynamic (MHD) phenomena play a pivotal role in the design and operation of nuclear fusion systems, where electrically conducting fluids (such as liquid metals or molten salts employed in reactor blankets) interact with magnetic fields of varying intensity and orientation, influencing the resulting flow dynamics. The numerical solution of MHD models entails the resolution of highly nonlinear, multiphysics systems of equations, which can become computationally demanding, particularly in multi-query, parametric, or real-time contexts. This study investigates a fully data-driven framework for MHD state reconstruction that integrates dimensionality reduction through Singular Value Decomposition (SVD) with the SHallow REcurrent Decoder (SHRED), a neural network architecture designed to reconstruct the full spatio-temporal state from sparse time-series measurements of selected observables, including previously unseen parametric configurations. The SHRED methodology is applied to a three-dimensional geometry representative of a portion of a WCLL blanket cell, in which lead-lithium flows around a water-cooled tube. Multiple magnetic field configurations are examined, including constant toroidal fields, combined toroidal-poloidal fields, and time-dependent magnetic fields. Across all considered scenarios, SHRED achieves high reconstruction accuracy, robustness, and generalization to magnetic field intensities, orientations, and temporal evolutions not seen during training. Notably, in the presence of time-varying magnetic fields, the model accurately infers the temporal evolution of the magnetic field itself using temperature measurements alone. Overall, the findings identify SHRED as a computationally efficient, data-driven, and flexible approach for MHD state reconstruction, with significant potential for real-time monitoring, diagnostics and control in fusion reactor systems.
优化|敛散性(6篇)
【1】Intelligent Cloud Orchestration: A Hybrid Predictive and Heuristic Framework for Cost Optimization
标题:智能云编排:面向成本优化的混合预测与启发式框架
链接:https://arxiv.org/abs/2604.02131
作者:Heet Nagoriya,Komal Rohit
备注:8 pages, 4 figures, 2 tables
摘要:云计算支持可扩展的资源配置,但动态的工作负载变化常常因过度配置而导致更高的成本。机器学习(ML)方法(如长短期记忆(LSTM)网络)能够在较宏观的层面上有效预测工作负载模式,但在流量突增时可能引入延迟。相比之下,博弈论等数学启发式方法能提供快速可靠的调度决策,却无法考虑未来的工作负载变化。为了解决这一权衡,本文提出了一个混合编排框架,将基于LSTM的预测性伸缩与启发式任务分配相结合。结果表明,该方法将基础设施成本降低到接近基于ML的模型的水平,同时保持与启发式方法相近的快速响应时间。这项工作为提高云资源管理的成本效率提供了一种实用方法。
摘要:Cloud computing allows scalable resource provisioning, but dynamic workload changes often lead to higher costs due to over-provisioning. Machine learning (ML) approaches, such as Long Short-Term Memory (LSTM) networks, are effective for predicting workload patterns at a higher level, but they can introduce delays during sudden traffic spikes. In contrast, mathematical heuristics like Game Theory provide fast and reliable scheduling decisions, but they do not account for future workload changes. To address this trade-off, this paper proposes a hybrid orchestration framework that combines LSTM-based predictive scaling with heuristic task allocation. The results show that this approach reduces infrastructure costs close to ML-based models while maintaining fast response times similar to heuristic methods. This work presents a practical approach for improving cost efficiency in cloud resource management.
【2】Test-Time Scaling Makes Overtraining Compute-Optimal
标题:测试时间扩展使过度训练成为计算最佳
链接:https://arxiv.org/abs/2604.01411
作者:Nicholas Roberts,Sungjun Cho,Zhiqi Gao,Tzu-Heng Huang,Albert Wu,Gabriel Orlanski,Avi Trost,Kelly Buchanan,Aws Albarghouthi,Frederic Sala
摘要:现代LLM在测试时进行扩展,例如通过重复采样,其推理成本随模型大小和样本数量增长。这产生了一个预训练缩放定律(如Chinchilla)并未涉及的权衡。我们提出了Train-to-Test($T^2$)缩放定律,在固定的端到端预算下联合优化模型大小、训练令牌数和推理样本数。$T^2$用测试时扩展所用的pass@$k$建模使预训练缩放定律现代化,进而联合优化预训练与测试时的决策。$T^2$的预测在不同建模方法下都是稳健的:既可度量联合缩放对任务损失的效应,也可建模其对任务准确率的影响。在八个下游任务中,我们发现当考虑推理成本时,最优的预训练决策从根本上转向过度训练机制,远远超出标准预训练缩放套件的范围。我们通过在$T^2$缩放定律预测的最优区域中预训练严重过度训练的模型来验证我们的结果,证实其性能显著强于仅靠预训练缩放所得模型。最后,由于前沿LLM都会经过后训练,我们表明这些发现在后训练阶段之后依然成立,使得$T^2$缩放在现代部署中具有意义。
摘要:Modern LLMs scale at test-time, e.g. via repeated sampling, where inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws, such as Chinchilla, do not address. We present Train-to-Test ($T^2$) scaling laws that jointly optimize model size, training tokens, and number of inference samples under fixed end-to-end budgets. $T^2$ modernizes pretraining scaling laws with pass@$k$ modeling used for test-time scaling, then jointly optimizes pretraining and test-time decisions. Forecasts from $T^2$ are robust over distinct modeling approaches: measuring joint scaling effect on the task loss and modeling impact on task accuracy. Across eight downstream tasks, we find that when accounting for inference cost, optimal pretraining decisions shift radically into the overtraining regime, well-outside of the range of standard pretraining scaling suites. We validate our results by pretraining heavily overtrained models in the optimal region that $T^2$ scaling forecasts, confirming their substantially stronger performance compared to pretraining scaling alone. Finally, as frontier LLMs are post-trained, we show that our findings survive the post-training stage, making $T^2$ scaling meaningful in modern deployments.
【3】Causal Optimal Coupling for Gaussian Input-Output Distributional Data
标题:高斯输入输出分布数据的因果最优耦合
链接:https://arxiv.org/abs/2604.01406
作者:Daran Xu,Amirhossein Taghvaei
摘要:我们研究确定由因果动力系统生成的输入-输出分布数据之间最优耦合的问题。该耦合需要满足给定的边际分布以及反映系统时间结构的因果性约束。我们将这个问题表述为一个薛定谔桥:在同时施加边际与因果约束的前提下,寻求在Kullback-Leibler散度意义下最接近给定先验的耦合。对于高斯边际和一般的时间相关二次成本函数,我们推导出对收敛到最优解的Sinkhorn迭代的完全可解析处理的刻画。除理论贡献外,所提出的框架还为将因果最优传输方法应用于基于分布数据的系统辨识提供了一个有原则的基础。
摘要:We study the problem of identifying an optimal coupling between input-output distributional data generated by a causal dynamical system. The coupling is required to satisfy prescribed marginal distributions and a causality constraint reflecting the temporal structure of the system. We formulate this problem as a Schrödinger Bridge, which seeks the coupling closest - in Kullback-Leibler divergence - to a given prior while enforcing both marginal and causality constraints. For the case of Gaussian marginals and general time-dependent quadratic cost functions, we derive a fully tractable characterization of the Sinkhorn iterations that converges to the optimal solution. Beyond its theoretical contribution, the proposed framework provides a principled foundation for applying causal optimal transport methods to system identification from distributional data.
【4】Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial
标题:通过贝叶斯优化实现高效且有原则的科学发现:教程
链接:https://arxiv.org/abs/2604.01328
作者:Zhongwei Yu,Rasul Tutunov,Alexandre Max Maraval,Zikai Xie,Zhenzhi Tan,Jiankang Wang,Zijing Li,Liangliang Xu,Qi Yang,Jun Jiang,Sanzhong Luo,Zhenxiao Guo,Haitham Bou-Ammar,Jun Wang
摘要:传统的科学发现依赖于迭代的"假设-实验-完善"循环,这一循环推动了数个世纪的进步,但其直觉化、临时性的实施往往浪费资源、产生低效的设计并错失关键洞见。本教程介绍贝叶斯优化(BO),一个将这一核心科学循环形式化并自动化的、有原则的概率驱动框架。BO使用代理模型(例如高斯过程)将经验观测建模为不断演化的假设,并使用采集函数指导实验选择,在利用已知知识与探索未知领域之间取得平衡,以消除猜测和手动试错。我们首先将科学发现表述为一个优化问题,然后通过催化、材料科学、有机合成和分子发现中的案例研究,逐一剖析BO的核心组件、端到端工作流及其真实世界功效。我们还涵盖了科学应用中的关键技术扩展,包括分批实验、异方差性、上下文优化以及人在回路集成。本教程面向广泛受众,将BO中的人工智能进展与实际的自然科学应用联系起来,提供分层内容,助力跨学科研究人员设计更高效的实验并加速有原则的科学发现。
摘要:Traditional scientific discovery relies on an iterative hypothesise-experiment-refine cycle that has driven progress for centuries, but its intuitive, ad-hoc implementation often wastes resources, yields inefficient designs, and misses critical insights. This tutorial presents Bayesian Optimisation (BO), a principled probability-driven framework that formalises and automates this core scientific cycle. BO uses surrogate models (e.g., Gaussian processes) to model empirical observations as evolving hypotheses, and acquisition functions to guide experiment selection, balancing exploitation of known knowledge and exploration of uncharted domains to eliminate guesswork and manual trial-and-error. We first frame scientific discovery as an optimisation problem, then unpack BO's core components, end-to-end workflows, and real-world efficacy via case studies in catalysis, materials science, organic synthesis, and molecule discovery. We also cover critical technical extensions for scientific applications, including batched experimentation, heteroscedasticity, contextual optimisation, and human-in-the-loop integration. Tailored for a broad audience, this tutorial bridges AI advances in BO with practical natural science applications, offering tiered content to empower cross-disciplinary researchers to design more efficient experiments and accelerate principled scientific discovery.
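作为补充,下面给出摘要所述"代理模型 + 采集函数"循环的最小示意(高斯过程代理加期望改进EI;目标函数与搜索范围仅作演示):

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(mu, sigma, best):
    # 最小化问题的EI:相对当前最优 best 的期望改进量
    z = (best - mu) / (sigma + 1e-9)
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

f = lambda x: np.sin(3 * x) + 0.1 * x ** 2        # 演示用"实验"(越小越好)
X = np.random.uniform(-2, 2, (4, 1)); y = f(X).ravel()
for _ in range(20):                               # 假设-实验-完善循环
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)   # 拟合代理模型
    cand = np.linspace(-2, 2, 400)[:, None]
    mu, sigma = gp.predict(cand, return_std=True)
    x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.vstack([X, x_next.reshape(1, 1)]); y = np.append(y, f(x_next))

每轮迭代都在"利用"(均值较低处)与"探索"(不确定性较高处)之间权衡;将 f 换成真实实验即可接入实际工作流。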
【5】JetPrism: diagnosing convergence for generative simulation and inverse problems in nuclear physics
标题:JetPrism:诊断生成式模拟和核物理反问题的收敛性
链接:https://arxiv.org/abs/2604.01313
作者:Zeyu Xia,Tyler Kim,Trevor Reed,Judy Fox,Geoffrey Fox,Adam Szczepaniak
备注:Submitted to AI4EIC 2025. 21 pages, 17 figures
摘要:高保真蒙特卡罗模拟和复杂的逆问题(如将模糊的实验观测映射到真实状态)是计算密集型的,但对于稳健的数据分析至关重要。条件流匹配(CFM)提供了一种数学上稳健的方法来加速这些任务,但我们证明了它的标准训练损失从根本上具有误导性。在严格的物理应用中,CFM的损失过早进入平台期,是真实收敛和物理保真度的不可靠指标。为了研究这种脱节,我们设计了JetPrism,一个可配置的CFM框架,作为评估无条件生成和条件探测器展开的高效生成代理。使用合成压力测试和与即将到来的电子离子对撞机(EIC)相关的杰斐逊实验室运动学数据集($\gamma p \to \rho^0 p \to \pi^+\pi^- p$),我们确证了物理信息指标在标准损失收敛后仍会持续显著改善。因此,我们提出了一个多指标评估协议,结合边缘和成对的$\chi^2$统计量、$W_1$距离、相关矩阵距离($D_{\mathrm{corr}}$)和最近邻距离比($R_{\mathrm{NN}}$)。通过证明特定领域的评估必须取代通用的损失指标,这项工作将JetPrism确立为一个可靠的生成代理,确保与真实数据的精确统计一致性,而无需记忆训练集。虽然是在核物理中得到验证,但这种诊断框架很容易扩展到参数生成和广泛领域的复杂逆问题。潜在的应用涵盖医学成像、天体物理学、半导体发现和定量金融,其中高保真模拟、严格反演和生成可靠性至关重要。
摘要:High-fidelity Monte Carlo simulations and complex inverse problems, such as mapping smeared experimental observations to ground-truth states, are computationally intensive yet essential for robust data analysis. Conditional Flow Matching (CFM) offers a mathematically robust approach to accelerating these tasks, but we demonstrate its standard training loss is fundamentally misleading. In rigorous physics applications, CFM loss plateaus prematurely, serving as an unreliable indicator of true convergence and physical fidelity. To investigate this disconnect, we designed JetPrism, a configurable CFM framework acting as an efficient generative surrogate for evaluating unconditional generation and conditional detector unfolding. Using synthetic stress tests and a Jefferson Lab kinematic dataset ($\gamma p \to \rho^0 p \to \pi^+\pi^- p$) relevant to the forthcoming Electron-Ion Collider (EIC), we establish that physics-informed metrics continue to improve significantly long after the standard loss converges. Consequently, we propose a multi-metric evaluation protocol incorporating marginal and pairwise $\chi^2$ statistics, $W_1$ distances, correlation matrix distances ($D_{\mathrm{corr}}$), and nearest-neighbor distance ratios ($R_{\mathrm{NN}}$). By demonstrating that domain-specific evaluations must supersede generic loss metrics, this work establishes JetPrism as a dependable generative surrogate that ensures precise statistical agreement with ground-truth data without memorizing the training set. While demonstrated in nuclear physics, this diagnostic framework is readily extensible to parameter generation and complex inverse problems across broad domains. Potential applications span medical imaging, astrophysics, semiconductor discovery, and quantitative finance, where high-fidelity simulation, rigorous inversion, and generative reliability are critical.
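A sketch of part of such a multi-metric protocol on toy data, computing per-feature $W_1$ distances, a correlation matrix distance, and a nearest-neighbor distance ratio. The specific $D_{\mathrm{corr}}$ and $R_{\mathrm{NN}}$ conventions below are common choices assumed for illustration and may differ from the paper's definitions.

    import numpy as np
    from scipy.stats import wasserstein_distance
    from sklearn.neighbors import NearestNeighbors

    def eval_generative(real, gen):
        """Compare per-feature marginals and joint structure (assumed metric
        definitions; the paper's exact conventions may differ)."""
        w1 = [wasserstein_distance(real[:, j], gen[:, j])
              for j in range(real.shape[1])]
        # Correlation matrix distance (Herdin et al. style definition).
        C1, C2 = np.corrcoef(real.T), np.corrcoef(gen.T)
        d_corr = 1 - np.trace(C1 @ C2) / (
            np.linalg.norm(C1, "fro") * np.linalg.norm(C2, "fro"))
        # NN distance ratio: gen->real spacing vs real->real spacing;
        # values near 1 suggest coverage without copying the training set.
        nn = NearestNeighbors(n_neighbors=2).fit(real)
        d_gr = nn.kneighbors(gen)[0][:, 0].mean()
        d_rr = nn.kneighbors(real)[0][:, 1].mean()   # skip self-match
        return {"W1": np.mean(w1), "D_corr": d_corr, "R_NN": d_gr / d_rr}

    rng = np.random.default_rng(1)
    real = rng.normal(size=(2000, 4))
    gen = rng.normal(size=(2000, 4))
    print(eval_generative(real, gen))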
【6】A Learning-Based Cooperative Coevolution Framework for Heterogeneous Large-Scale Global Optimization
标题:基于学习的异构大规模全局优化协同进化框架
链接:https://arxiv.org/abs/2604.01241
作者:Wenjie Qiu,Zixin Wang,Hongyu Fang,Zeyuan Ma,Yue-Jiao Gong
备注:13 pages, 5 figures, 3 tables. Accepted for publication in GECCO 2026
摘要:协同进化(CC)通过分解有效地解决了大规模全局优化(LSGO)问题,但却难以解决现实应用中出现的异构LSGO(H-LSGO)问题,其中子问题表现出不同的维度和不同的景观。流行的CC范例依赖于固定的低维优化器,通常无法驾驭这种异构性。为了解决这一问题,我们提出了基于学习的异构协同进化框架(LH-CC)。通过将优化过程表述为马尔可夫决策过程,LH-CC采用元代理自适应地为每个子问题选择最合适的优化器。我们还引入了一个灵活的基准套件,以产生不同的H-LSGO问题的实例。在具有复杂耦合关系的3000维问题上进行的大量实验表明,LH-CC与最先进的基线相比,具有更高的解质量和计算效率。此外,该框架在不同的问题实例、优化范围和优化器中具有鲁棒的泛化能力。我们的研究结果表明,动态优化器选择是解决复杂H-LSGO问题的关键策略。
摘要:Cooperative Coevolution (CC) effectively addresses Large-Scale Global Optimization (LSGO) via decomposition but struggles with the emerging class of Heterogeneous LSGO (H-LSGO) problems arising from real-world applications, where subproblems exhibit diverse dimensions and distinct landscapes. The prevailing CC paradigm, relying on a fixed low-dimensional optimizer, often fails to navigate this heterogeneity. To address this limitation, we propose the Learning-Based Heterogeneous Cooperative Coevolution Framework (LH-CC). By formulating the optimization process as a Markov Decision Process, LH-CC employs a meta-agent to adaptively select the most suitable optimizer for each subproblem. We also introduce a flexible benchmark suite to generate diverse H-LSGO problem instances. Extensive experiments on 3000-dimensional problems with complex coupling relationships demonstrate that LH-CC achieves superior solution quality and computational efficiency compared to state-of-the-art baselines. Furthermore, the framework exhibits robust generalization across varying problem instances, optimization horizons, and optimizers. Our findings reveal that dynamic optimizer selection is a pivotal strategy for solving complex H-LSGO problems.
预测|估计(9篇)
【1】A Practical Two-Stage Framework for GPU Resource and Power Prediction in Heterogeneous HPC Systems
标题:一种实用的用于异构高性能计算系统中的图形处理器资源和功耗预测的两阶段框架
链接:https://arxiv.org/abs/2604.02158
作者:Beste Oztop,Dhruva Kulkarni,Zhengji Zhao,Ayse Kivilcim Coskun,Kadidia Konate
备注:9 pages, 6 figures
摘要:随着高性能计算(HPC)对GPU需求的不断增长,GPU资源和功率的有效利用变得至关重要。在本文中,我们使用Slurm工作负载管理器历史日志和NVIDIA数据中心GPU管理器(DCGM)收集的GPU性能指标,分析了GPU利用率和GPU内存利用率,以及Vienna ab initio Simulation Package(VASP)的功耗。VASP是NERSC的Perlmutter上广泛使用的材料科学应用程序,这是一种基于NVIDIA A100 GPU的HPE Cray EX系统。使用我们的见解,从VASP应用程序的资源利用率分析,我们提出了一个资源预测框架来预测平均GPU功率,最大GPU利用率,最大GPU内存利用率值的异构HPC系统应用程序,使更有效的调度决策和功耗感知的系统操作。我们的预测框架包括两个阶段:1)仅使用Slurm会计日志作为训练数据,2)使用DCGM收集的历史GPU分析指标来增强训练数据。仅使用Slurm提交功能的最大GPU利用率预测可实现高达97%的准确度。此外,从GPU计算和内存活动指标设计的功能与平均功耗具有良好的相关性,我们的运行时功耗预测实验的预测准确率高达92%。这些发现证明了DCGM指标在捕获应用程序特性方面的有效性,并突出了它们在开发预测模型以支持HPC系统中的动态电源管理方面的潜力。
摘要:Efficient utilization of GPU resources and power has become critical with the growing demand for GPUs in high-performance computing (HPC). In this paper, we analyze GPU utilization and GPU memory utilization, as well as the power consumption of the Vienna ab initio Simulation Package (VASP), using the Slurm workload manager historical logs and GPU performance metrics collected by NVIDIA's Data Center GPU Manager (DCGM). VASP is a widely used materials science application on Perlmutter at NERSC, an HPE Cray EX system based on NVIDIA A100 GPUs. Using our insights from the resource utilization analysis of VASP applications, we propose a resource prediction framework to predict the average GPU power, maximum GPU utilization, and maximum GPU memory utilization values of heterogeneous HPC system applications to enable more efficient scheduling decisions and power-aware system operation. Our prediction framework consists of two stages: 1) using only the Slurm accounting logs as training data and 2) augmenting the training data with historical GPU profiling metrics collected with DCGM. The maximum GPU utilization predictions using only the Slurm submission features achieve up to 97% accuracy. Furthermore, features engineered from GPU-compute and memory activity metrics exhibit good correlations with average power utilization, and our runtime power usage prediction experiments result in up to 92% prediction accuracy. These findings demonstrate the effectiveness of DCGM metrics in capturing application characteristics and highlight their potential for developing predictive models to support dynamic power management in HPC systems.
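A schematic of the two-stage idea on synthetic data: a stage-1 regressor trained on Slurm submission features alone, and a stage-2 regressor augmented with historical DCGM profiling features. The feature layout and model choice are illustrative assumptions, not the paper's pipeline.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n = 1000
    # Hypothetical feature layout: Slurm submission features vs DCGM profiles.
    slurm = rng.normal(size=(n, 5))          # e.g. nodes, walltime, QOS, ...
    dcgm = rng.normal(size=(n, 8))           # e.g. SM activity, mem bandwidth, ...
    power = slurm @ rng.normal(size=5) + dcgm @ rng.normal(size=8)

    # Stage 1: Slurm-only predictor, available at submission time.
    stage1 = RandomForestRegressor(n_estimators=200).fit(slurm[:800], power[:800])
    # Stage 2: augment with historical DCGM metrics for repeat applications.
    X2 = np.hstack([slurm, dcgm])
    stage2 = RandomForestRegressor(n_estimators=200).fit(X2[:800], power[:800])

    for name, m, X in [("slurm-only", stage1, slurm), ("augmented", stage2, X2)]:
        print(name, "R^2 =", round(m.score(X[800:], power[800:]), 3))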
【2】Bridging Deep Learning and Integer Linear Programming: A Predictive-to-Prescriptive Framework for Supply Chain Analytics
标题:连接深度学习与整数线性规划:供应链分析的从预测到决策框架
链接:https://arxiv.org/abs/2604.01775
作者:Khai Banh Nghiep,Duc Nguyen Minh,Lan Hoang Thi
备注:12 pages, 4 figures, 4 tables
摘要:虽然需求预测是供应链规划的关键组成部分,但实际零售数据可能表现出难以调和的季节性、不规则的尖峰和噪声,使精确预测几乎无法实现。本文提出了一个结合预测和运营分析的三步分析框架。第一阶段是探索性数据分析,对来自180,519笔交易的配送跟踪数据进行划分,并检查长期趋势、季节性和配送相关属性。第二阶段比较了统计时间序列分解模型(MSTL)与近期深度学习架构N-BEATS和N-HiTS的预测性能,两种深度学习模型都在很大程度上优于统计基准。在第三也是最后一个阶段,预测误差最低的N-BEATS被选为最终模型,用于预测未来4周共1,918件的需求,并将这些预测值输入一个确定性整数线性规划,在预算、容量和服务约束下最小化总配送时间。求解得到的分配方案提供了一个可行且成本最优的运输计划。总的来说,该研究提供了一个有说服力的例子,说明精确预测与简单、高度可解释的模型优化在物流中的实际影响。
摘要:Although demand forecasting is a critical component of supply chain planning, actual retail data can exhibit irreconcilable seasonality, irregular spikes, and noise, rendering precise projections nearly unattainable. This paper proposes a three-step analytical framework that combines forecasting and operational analytics. The first stage consists of exploratory data analysis, where delivery-tracked data from 180,519 transactions are partitioned, and long-term trends, seasonality, and delivery-related attributes are examined. In the second stage, the forecasting performance of a statistical time series decomposition model (MSTL) is compared with that of two recent deep learning architectures, N-BEATS and N-HiTS, both of which outperform the statistical benchmark to a large extent. In the third and final stage, N-BEATS, the model with the lowest forecasting error, is selected to forecast the next 4 weeks of demand (1,918 units), and its forecasts feed a deterministic integer linear program that minimizes total delivery time under budget, capacity, and service constraints. The resulting allocation provides a feasible and cost-optimal shipping plan. Overall, the study provides a compelling example of the practical impact of precise forecasting and simple, highly interpretable model optimization in logistics.
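The final-stage optimization is a standard transportation-style integer linear program; a minimal sketch with made-up data (using the PuLP solver, assumed available) that minimizes total delivery time under capacity, service, and budget constraints follows.

    import pulp

    # Toy instance: ship forecast demand from warehouses to stores while
    # minimizing total delivery time (data and structure are illustrative).
    warehouses = {"W1": 1200, "W2": 900}                 # capacity (units)
    demand = {"S1": 700, "S2": 600, "S3": 618}           # forecast demand
    time = {("W1", "S1"): 2, ("W1", "S2"): 4, ("W1", "S3"): 5,
            ("W2", "S1"): 3, ("W2", "S2"): 1, ("W2", "S3"): 3}
    cost = {k: 0.5 * t for k, t in time.items()}         # shipping cost per unit
    budget = 2500

    prob = pulp.LpProblem("shipping", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("ship", time.keys(), lowBound=0, cat="Integer")
    prob += pulp.lpSum(time[k] * x[k] for k in time)                 # objective
    for s, d in demand.items():                                      # service
        prob += pulp.lpSum(x[w, s] for w in warehouses) >= d
    for w, cap in warehouses.items():                                # capacity
        prob += pulp.lpSum(x[w, s] for s in demand) <= cap
    prob += pulp.lpSum(cost[k] * x[k] for k in cost) <= budget       # budget

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print(pulp.LpStatus[prob.status],
          {k: int(x[k].value()) for k in time if x[k].value()})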
【3】MATA-Former & SIICU: Semantic Aware Temporal Alignment for High-Fidelity ICU Risk Prediction
标题:MATA-Former和SIICU:高保真ICU风险预测的语义感知时间对齐
链接:https://arxiv.org/abs/2604.01727
作者:Zhichong Zheng,Xiaohang Nie,Xueqi Wang,Yuanjin Zhao,Haitao Zhang,Yichao Tang
摘要:预测不断演变的临床风险依赖于内在的病理依赖性,而不仅仅是时间上的接近,但目前的方法受困于粗粒度的二元监督和物理时间戳。为了使预测建模与临床逻辑保持一致,我们提出了医疗语义感知时间ALiBi Transformer(MATA-Former),利用事件语义动态参数化注意力权重,使因果有效性优先于时间滞后。此外,我们引入了平台-高斯软标签(PSL),将二元分类重新表述为连续的多视野回归,用于全轨迹风险建模。在SIICU(一个新构建的数据集,包含超过506k个事件,带有严格的专家验证细粒度注释)和MIMIC-IV数据集上的评估表明,我们的框架在从文本密集、不规则的临床时间序列中捕获风险方面展现出卓越的效果和强大的泛化能力。
摘要:Forecasting evolving clinical risks relies on intrinsic pathological dependencies rather than mere chronological proximity, yet current methods struggle with coarse binary supervision and physical timestamps. To align predictive modeling with clinical logic, we propose the Medical-semantics Aware Time-ALiBi Transformer (MATA-Former), utilizing event semantics to dynamically parameterize attention weights to prioritize causal validity over time lags. Furthermore, we introduce Plateau-Gaussian Soft Labeling (PSL), reformulating binary classification into continuous multi-horizon regression for full-trajectory risk modeling. Evaluated on SIICU -- a newly constructed dataset featuring over 506k events with rigorous expert-verified, fine-grained annotations -- and the MIMIC-IV dataset, our framework demonstrates superior efficacy and robust generalization in capturing risks from text-intensive, irregular clinical time series.
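The abstract does not spell out the PSL shape, so the sketch below shows one plausible reading: targets equal to 1 on a plateau ending at the event time, with a Gaussian ramp beforehand; the plateau width and bandwidth are hypothetical parameters, not the paper's.

    import numpy as np

    def plateau_gaussian_labels(T, t_event, plateau=6, sigma=8.0):
        """Soft risk targets over T time steps: 1 on a plateau ending at the
        event, a Gaussian rise before it, 0 after (one plausible reading of
        PSL; the paper's exact parameterization may differ)."""
        t = np.arange(T, dtype=float)
        y = np.zeros(T)
        start = max(t_event - plateau, 0)
        y[start:t_event + 1] = 1.0                        # plateau around event
        pre = t < start
        y[pre] = np.exp(-((t[pre] - start) ** 2) / (2 * sigma ** 2))
        return y

    print(np.round(plateau_gaussian_labels(24, t_event=18), 2))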
【4】Label Shift Estimation With Incremental Prior Update
标题:基于增量先验更新的标签偏移估计
链接:https://arxiv.org/abs/2604.01651
作者:Yunrui Zhang,Gustavo Batista,Salil S. Kanhere
备注:SIAM SDM 2025
摘要:在监督学习中经常假设训练集和测试集具有相同的标签分布。然而,在现实场景中,这种假设很少成立。例如,医疗诊断结果分布会随着时间和地点的变化而变化;欺诈检测模型必须适应欺诈活动模式的变化;社交媒体帖子的类别分布会根据热门话题和用户人口统计数据而变化。在标签偏移估计任务中,目标是估计测试集中变化的标签分布$p_t(y)$,假设似然$p(x|y)$不变,即不存在概念漂移。在本文中,我们提出了一种新的事后标签偏移估计方法,不同于以往通过验证集估计混淆矩阵来进行矩匹配、或用期望最大化算法最大化新数据似然的方法。我们的目标是在每个样本上增量更新先验,并调整每个后验以获得更准确的标签偏移估计。所提出的方法基于对分类器的直观假设,而这些假设对现代概率分类器通常成立。与其他方法相比,所提出的方法依赖于更弱的校准概念。作为一种事后的标签偏移估计方法,该方法是通用的,可以应用于任何黑盒概率分类器。在CIFAR-10和MNIST上的实验表明,在不同的校准情况和不同强度的标签偏移下,该方法的性能始终优于当前最先进的基于最大似然的方法。
摘要:An assumption often made in supervised learning is that the training and testing sets have the same label distribution. However, in real-life scenarios, this assumption rarely holds. For example, medical diagnosis result distributions change over time and across locations; fraud detection models must adapt as patterns of fraudulent activity shift; the category distribution of social media posts changes based on trending topics and user demographics. In the task of label shift estimation, the goal is to estimate the changing label distribution $p_t(y)$ in the testing set, assuming the likelihood $p(x|y)$ does not change, implying no concept drift. In this paper, we propose a new approach for post-hoc label shift estimation, unlike previous methods that perform moment matching with confusion matrix estimated from a validation set or maximize the likelihood of the new data with an expectation-maximization algorithm. We aim to incrementally update the prior on each sample, adjusting each posterior for more accurate label shift estimation. The proposed method is based on intuitive assumptions on classifiers that are generally true for modern probabilistic classifiers. The proposed method relies on a weaker notion of calibration compared to other methods. As a post-hoc approach for label shift estimation, the proposed method is versatile and can be applied to any black-box probabilistic classifier. Experiments on CIFAR-10 and MNIST show that the proposed method consistently outperforms the current state-of-the-art maximum likelihood-based methods under different calibrations and varying intensities of label shift.
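A minimal sketch of the incremental idea: each classifier posterior is reweighted by the ratio of the running prior to the training prior, and the corrected posterior is folded back into the prior. The running-average schedule below is an assumption, not the paper's exact update.

    import numpy as np

    def incremental_label_shift(posteriors, train_prior):
        """posteriors: (n, K) classifier outputs trained under train_prior.
        Estimates the test-time label distribution by updating the prior
        one sample at a time (running-average update is assumed here)."""
        prior = train_prior.copy()
        for i, p in enumerate(posteriors, start=1):
            adj = p * prior / train_prior        # reweight posterior to new prior
            adj /= adj.sum()
            prior = ((i - 1) * prior + adj) / i  # incremental prior update
        return prior

    rng = np.random.default_rng(0)
    train_prior = np.array([0.5, 0.5])
    test_prior = np.array([0.8, 0.2])            # shifted test distribution
    y = rng.choice(2, size=5000, p=test_prior)
    # Simulate a roughly calibrated classifier's posteriors.
    post = np.where(y[:, None] == np.arange(2), 0.85, 0.15)
    print(incremental_label_shift(post, train_prior))  # approaches test_prior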
【5】Soft MPCritic: Amortized Model Predictive Value Iteration
标题:软MPCritic:摊销模型预测值迭代
链接:https://arxiv.org/abs/2604.01477
作者:Thomas Banker,Nathan P. Lawrence,Ali Mesbah
备注:submitted to CDC 2026
摘要:强化学习(RL)和模型预测控制(MPC)具有互补的优势,但将二者大规模结合在计算上仍然具有挑战性。我们提出了软MPCritic,这是一个在(软)值空间中学习的RL-MPC框架,同时将基于采样的规划用于在线控制和价值目标生成。软MPCritic通过模型预测路径积分控制(MPPI)实例化MPC,并用拟合值迭代训练终端Q函数,使学习到的值函数与规划器对齐,并隐式地扩展有效规划范围。我们引入了一种摊销的热启动策略,在计算批量的基于MPPI的价值目标时,复用从在线观测中得到的已规划开环动作序列。这使得软MPCritic在计算上实用,同时保持解的质量。软MPCritic以基于情景的方式,利用为下一步预测精度而训练的动态模型集合进行规划。综合起来,这些要素使软MPCritic能够在经典和复杂控制任务上通过鲁棒的短视野规划进行有效学习。这些结果确立了软MPCritic作为一个实用且可扩展的蓝图,用于在策略提取和直接的长视野规划可能失败的场景中合成MPC策略。
摘要:Reinforcement learning (RL) and model predictive control (MPC) offer complementary strengths, yet combining them at scale remains computationally challenging. We propose soft MPCritic, an RL-MPC framework that learns in (soft) value space while using sample-based planning for both online control and value target generation. soft MPCritic instantiates MPC through model predictive path integral control (MPPI) and trains a terminal Q-function with fitted value iteration, aligning the learned value function with the planner and implicitly extending the effective planning horizon. We introduce an amortized warm-start strategy that recycles planned open-loop action sequences from online observations when computing batched MPPI-based value targets. This makes soft MPCritic computationally practical, while preserving solution quality. soft MPCritic plans in a scenario-based fashion with an ensemble of dynamic models trained for next-step prediction accuracy. Together, these ingredients enable soft MPCritic to learn effectively through robust, short-horizon planning on classic and complex control tasks. These results establish soft MPCritic as a practical and scalable blueprint for synthesizing MPC policies in settings where policy extraction and direct, long-horizon planning may fail.
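A toy MPPI loop with the shift-based warm start the abstract describes; the dynamics, cost, and temperature are placeholders, and the terminal Q-function and model ensemble are omitted.

    import numpy as np

    rng = np.random.default_rng(0)
    H, K, dim = 20, 256, 1                     # horizon, samples, action dim

    def rollout_cost(x0, actions):             # placeholder dynamics + cost
        x, c = x0, 0.0
        for a in actions:
            x = 0.9 * x + a                    # toy linear system
            c += x**2 + 0.1 * a**2
        return c

    def mppi_step(x0, nominal, lam=1.0, sigma=0.3):
        noise = rng.normal(scale=sigma, size=(K, H, dim))
        seqs = nominal[None] + noise
        costs = np.array([rollout_cost(x0, s[:, 0]) for s in seqs])
        w = np.exp(-(costs - costs.min()) / lam)
        w /= w.sum()
        return (w[:, None, None] * seqs).sum(axis=0)

    x, nominal = np.array(5.0), np.zeros((H, dim))
    for t in range(30):
        nominal = mppi_step(x, nominal)
        x = 0.9 * x + nominal[0, 0]            # execute first planned action
        # Amortized warm start: shift the plan, reuse it for the next solve
        # (and, as in soft MPCritic, for batched value-target computation).
        nominal = np.vstack([nominal[1:], np.zeros((1, dim))])
    print("final state:", float(x))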
【6】PI-JEPA: Label-Free Surrogate Pretraining for Coupled Multiphysics Simulation via Operator-Split Latent Prediction
标题:PI-JEPA:通过操作员分裂潜在预测进行耦合多物理场模拟的无标签代理预训练
链接:https://arxiv.org/abs/2604.01349
作者:Brandon Yee,Pairie Koh
摘要:储层模拟工作流面临一个基本的数据不对称性:输入参数场(地质统计渗透率实现、孔隙度分布)可以自由地生成任意数量,但现有的神经算子代理模型需要大量昂贵的带标签模拟轨迹语料库,无法利用这种无标签结构。我们介绍了PI-JEPA(物理信息联合嵌入预测架构),这是一个无需任何完整PDE求解即可训练的代理预训练框架,它在逐子算子的PDE残差正则化下,对无标签参数场使用掩码潜在预测。预测器库在结构上与控制方程的Lie-Trotter算子分裂分解对齐,为每个子过程(压力、饱和度输运、反应)配备单独的物理约束潜在模块,从而只需少至100次带标签的模拟运行即可微调。在单相达西流上,PI-JEPA在$N_\ell{=}100$时实现了比FNO低$1.9\times$、比DeepONet低$2.4\times$的误差,在$N_\ell{=}500$时比纯监督训练提高了24%,这表明无标签代理预训练大幅降低了多物理场代理部署所需的模拟预算。
摘要:Reservoir simulation workflows face a fundamental data asymmetry: input parameter fields (geostatistical permeability realizations, porosity distributions) are free to generate in arbitrary quantities, yet existing neural operator surrogates require large corpora of expensive labeled simulation trajectories and cannot exploit this unlabeled structure. We introduce \textbf{PI-JEPA} (Physics-Informed Joint Embedding Predictive Architecture), a surrogate pretraining framework that trains \emph{without any completed PDE solves}, using masked latent prediction on unlabeled parameter fields under per-sub-operator PDE residual regularization. The predictor bank is structurally aligned with the Lie--Trotter operator-splitting decomposition of the governing equations, dedicating a separate physics-constrained latent module to each sub-process (pressure, saturation transport, reaction), enabling fine-tuning with as few as 100 labeled simulation runs. On single-phase Darcy flow, PI-JEPA achieves $1.9\times$ lower error than FNO and $2.4\times$ lower error than DeepONet at $N_\ell{=}100$, with 24\% improvement over supervised-only training at $N_\ell{=}500$, demonstrating that label-free surrogate pretraining substantially reduces the simulation budget required for multiphysics surrogate deployment.
【7】Model Merging via Data-Free Covariance Estimation
标题:通过无数据协方差估计进行模型合并
链接:https://arxiv.org/abs/2604.01329
作者:Marawan Gamal Abdel Hameed,Derek Tam,Pascal Jr Tikeng Notsawo,Colin Raffel,Guillaume Rabusseau
摘要:模型合并提供了一种廉价的方法来组合各个模型,以生成继承每个模型的能力的模型。虽然一些合并方法可以接近多任务训练的性能,但它们通常是出于实践动机,缺乏理论依据。一个原则性的替代方案是将模型合并作为一个逐层优化问题,直接最小化任务之间的干扰。然而,该公式需要根据数据估计每层协方差矩阵,而在执行合并时可能无法使用该矩阵。相比之下,许多基于实验动机的方法不需要辅助数据,这使得它们在实践中具有优势。在这项工作中,我们重新审视干扰最小化框架,并表明,在某些条件下,协方差矩阵可以直接从差分矩阵估计,消除了对数据的需要,同时也降低了计算成本。我们在从86 M参数到7 B参数的模型上跨视觉和语言基准验证了我们的方法,优于之前的无数据最先进的合并方法
摘要:Model merging provides a way of cheaply combining individual models to produce a model that inherits each individual's capabilities. While some merging methods can approach the performance of multitask training, they are often heuristically motivated and lack theoretical justification. A principled alternative is to pose model merging as a layer-wise optimization problem that directly minimizes interference between tasks. However, this formulation requires estimating per-layer covariance matrices from data, which may not be available when performing merging. In contrast, many of the heuristically-motivated methods do not require auxiliary data, making them practically advantageous. In this work, we revisit the interference minimization framework and show that, under certain conditions, covariance matrices can be estimated directly from difference matrices, eliminating the need for data while also reducing computational costs. We validate our approach across vision and language benchmarks on models ranging from 86M parameters to 7B parameters, outperforming previous data-free state-of-the-art merging methods
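A sketch of a layer-wise interference-minimizing merge in the RegMean-style closed form, with each per-layer Gram matrix estimated from the task's difference matrix rather than from data. The specific data-free estimator below is an assumption for illustration, not the paper's derivation.

    import numpy as np

    def merge_layer(W_base, W_tasks):
        """Merge task-specific weight matrices (out x in) without data.

        Closed-form layer-wise merge: W* = (sum_i W_i G_i)(sum_i G_i)^-1,
        with each Gram matrix G_i estimated from the task's difference
        matrix D_i = W_i - W_base (estimator choice is an assumption)."""
        d_in = W_base.shape[1]
        num = np.zeros_like(W_base)
        den = np.zeros((d_in, d_in))
        for W in W_tasks:
            D = W - W_base
            G = D.T @ D + 1e-3 * np.eye(d_in)   # data-free covariance proxy
            num += W @ G
            den += G
        return num @ np.linalg.inv(den)

    rng = np.random.default_rng(0)
    W0 = rng.normal(size=(8, 6))
    tasks = [W0 + 0.1 * rng.normal(size=(8, 6)) for _ in range(3)]
    print(np.round(merge_layer(W0, tasks) - W0, 3))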
【8】Forecasting Supply Chain Disruptions with Foresight Learning
标题:利用前瞻性学习预测供应链中断
链接:https://arxiv.org/abs/2604.01298
作者:Benjamin Turtel,Paul Wilczewski,Kris Skotheim
摘要:在供应链中断成为现实之前预测供应链中断是企业和政策制定者面临的核心挑战。一个关键的困难是学习如何从嘈杂和非结构化的输入中可靠地推理出不常见的高影响事件--在这种情况下,通用模型在没有特定任务适应的情况下很难做到。我们引入了一个端到端的框架,训练LLM使用已实现的中断结果作为监督来生成校准的概率预测。由此产生的模型在准确性、校准和精度方面大大优于强基线(包括GPT-5)。我们还表明,培训诱导更结构化和可靠的概率推理没有明确的提示。这些结果为训练产生决策准备信号的特定领域预测模型提供了一条通用途径。为了支持透明度,我们开源了本研究中使用的评估数据集。 数据集:https://huggingface.co/datasets/LightningRodLabs/supply-chain-predictions
摘要:Anticipating supply chain disruptions before they materialize is a core challenge for firms and policymakers alike. A key difficulty is learning to reason reliably about infrequent, high-impact events from noisy and unstructured inputs - a setting where general-purpose models struggle without task-specific adaptation. We introduce an end-to-end framework that trains LLMs to produce calibrated probabilistic forecasts using realized disruption outcomes as supervision. The resulting model substantially outperforms strong baselines - including GPT-5 - on accuracy, calibration, and precision. We also show that training induces more structured and reliable probabilistic reasoning without explicit prompting. These results suggest a general pathway for training domain-specific forecasting models that produce decision-ready signals. To support transparency we open-source the evaluation dataset used in this study. Dataset: https://huggingface.co/datasets/LightningRodLabs/supply-chain-predictions
【9】DySCo: Dynamic Semantic Compression for Effective Long-term Time Series Forecasting
标题:DySCo:动态语义压缩用于有效的长期时间序列预测
链接:https://arxiv.org/abs/2604.01261
作者:Xiang Ao,Yinyu Tan,Mengru Chen
备注:9 pages, 7 figures
摘要:时间序列预测(TSF)在金融、气象和能源等领域至关重要。虽然理论上扩展回顾窗口可以提供更丰富的历史背景,但在实践中,它通常会引入不相关的噪声和计算冗余,从而阻止模型有效地捕获复杂的长期依赖关系。为了应对这些挑战,我们提出了一个动态语义压缩(DySCo)框架。与依赖固定启发式规则的传统方法不同,DySCo引入了熵引导动态采样(EGDS)机制,可以自主识别和保留高熵段,同时压缩冗余趋势。此外,我们采用分层频率增强分解(HFED)策略将高频异常与低频模式分开,确保在稀疏采样期间保留关键细节。最后,设计了一个跨尺度交互混合器(CSIM),动态融合全局上下文与局部表示,取代简单的线性聚合。实验结果表明,DySCo可作为通用的即插即用模块,显著提升主流模型以更低计算成本捕捉长期相关性的能力。
摘要:Time series forecasting (TSF) is critical across domains such as finance, meteorology, and energy. While extending the lookback window theoretically provides richer historical context, in practice, it often introduces irrelevant noise and computational redundancy, preventing models from effectively capturing complex long-term dependencies. To address these challenges, we propose a Dynamic Semantic Compression (DySCo) framework. Unlike traditional methods that rely on fixed heuristics, DySCo introduces an Entropy-Guided Dynamic Sampling (EGDS) mechanism to autonomously identify and retain high-entropy segments while compressing redundant trends. Furthermore, we incorporate a Hierarchical Frequency-Enhanced Decomposition (HFED) strategy to separate high-frequency anomalies from low-frequency patterns, ensuring that critical details are preserved during sparse sampling. Finally, a Cross-Scale Interaction Mixer(CSIM) is designed to dynamically fuse global contexts with local representations, replacing simple linear aggregation. Experimental results demonstrate that DySCo serves as a universal plug-and-play module, significantly enhancing the ability of mainstream models to capture long-term correlations with reduced computational cost.
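An illustrative re-implementation of the entropy-guided sampling idea: lookback segments are scored by histogram entropy, high-entropy segments are kept at full resolution, and the rest are downsampled. Window size, bin count, and keep ratio are made-up parameters; DySCo's exact scoring may differ.

    import numpy as np

    def egds(series, win=32, keep_ratio=0.5, stride=4):
        """Entropy-guided compression of a long lookback window."""
        segs = [series[i:i + win] for i in range(0, len(series) - win + 1, win)]
        def entropy(s):
            p, _ = np.histogram(s, bins=16)
            p = p[p > 0] / p.sum()
            return -(p * np.log(p)).sum()
        scores = np.array([entropy(s) for s in segs])
        k = max(1, int(keep_ratio * len(segs)))
        keep = set(np.argsort(scores)[-k:])          # high-entropy segments
        out = [s if i in keep else s[::stride]       # downsample the rest
               for i, s in enumerate(segs)]
        return np.concatenate(out)

    t = np.arange(4096)
    x = np.sin(t / 50.0) + (t > 3000) * np.random.default_rng(0).normal(0, 1, t.size)
    print(len(x), "->", len(egds(x)))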
其他神经网络|深度学习|模型|建模(16篇)
【1】Universal Hypernetworks for Arbitrary Models
标题:适用于任意模型的通用超网络
链接:https://arxiv.org/abs/2604.02215
作者:Xuanfeng Zhou
摘要:传统的超网络通常围绕特定的基础模型参数化进行设计,因此更改目标架构通常需要重新设计超网络并从头开始重新训练。我们介绍了通用超网络(UHN),一个固定架构的生成器,它根据确定性的参数、架构和任务描述符来预测权重。这种基于描述符的公式化将生成器架构与目标网络参数化解耦,因此一个生成器可以在所测试的架构和任务族中实例化异构模型。我们的实证结论有三点:(1)一个固定的UHN在视觉、图、文本和公式回归基准上仍可与直接训练相竞争;(2)同一个UHN既支持族内的多模型泛化,也支持跨异构模型的多任务学习;(3)UHN能够在最终基础模型之前稳定地递归生成多达三个中间UHN。我们的代码可在https://github.com/Xuanfeng-Zhou/UHN上获得。
摘要:Conventional hypernetworks are typically engineered around a specific base-model parameterization, so changing the target architecture often entails redesigning the hypernetwork and retraining it from scratch. We introduce the \emph{Universal Hypernetwork} (UHN), a fixed-architecture generator that predicts weights from deterministic parameter, architecture, and task descriptors. This descriptor-based formulation decouples the generator architecture from target-network parameterization, so one generator can instantiate heterogeneous models across the tested architecture and task families. Our empirical claims are threefold: (1) one fixed UHN remains competitive with direct training across vision, graph, text, and formula-regression benchmarks; (2) the same UHN supports both multi-model generalization within a family and multi-task learning across heterogeneous models; and (3) UHN enables stable recursive generation with up to three intermediate generated UHNs before the final base model. Our code is available at https://github.com/Xuanfeng-Zhou/UHN.
【2】Neural network methods for two-dimensional finite-source reflector design
标题:二维有限源反射器设计的神经网络方法
链接:https://arxiv.org/abs/2604.02184
作者:Roel Hacking,Lisa Kusch,Koondanibha Mitra,Martijn Anthonissen,Wilbert IJzerman
备注:20 pages, 10 figures, 1 table. Submitted to Machine Learning: Science and Technology
摘要:我们解决了设计二维反射器的逆问题,将光从一个有限的,扩展的源到一个规定的远场分布。我们提出了反射器高度的神经网络参数化,并开发了两个可微目标函数:(i)直接变量变化损失,通过学习的逆映射推动源分布,以及(ii)基于网格的损失,将目标空间网格映射回源,在交叉点上积分,即使源不连续也保持连续。通过自动微分获得的导数,并使用鲁棒拟牛顿法进行优化。作为比较,我们制定了一个反卷积基线建立在一个简化的有限源近似:一维单调映射恢复通量平衡,产生一个常微分方程求解积分因子的形式,这个求解器是嵌入在一个修改后的Van Cittert迭代与非负裁剪和射线跟踪向前运营商。在四个基准-连续和不连续的源,并与/不具有最小高度的限制-我们评估的精度射线跟踪归一化平均绝对误差(NMAE)。我们的神经网络方法收敛速度更快,并且比反卷积方法实现了更低的NMAE,并且自然地处理高度约束。我们讨论了如何可以扩展到旋转对称和完整的三维设置,通过迭代校正方案的方法。
摘要:We address the inverse problem of designing two-dimensional reflectors that transform light from a finite, extended source into a prescribed far-field distribution. We propose a neural network parameterization of the reflector height and develop two differentiable objective functions: (i) a direct change-of-variables loss that pushes the source distribution through the learned inverse mapping, and (ii) a mesh-based loss that maps a target-space grid back to the source, integrates over intersections, and remains continuous even when the source is discontinuous. Gradients are obtained via automatic differentiation and optimized with a robust quasi-Newton method. As a comparison, we formulate a deconvolution baseline built on a simplified finite-source approximation: a 1D monotone mapping is recovered from flux balance, yielding an ordinary differential equation solved in integrating-factor form; this solver is embedded in a modified Van Cittert iteration with nonnegativity clipping and a ray-traced forward operator. Across four benchmarks -- continuous and discontinuous sources, and with/without minimum-height constraints -- we evaluate accuracy by ray-traced normalized mean absolute error (NMAE). Our neural network approach converges faster and achieves consistently lower NMAE than the deconvolution method, and handles height constraints naturally. We discuss how the method may be extended to rotationally symmetric and full three-dimensional settings via iterative correction schemes.
【3】World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry
标题:世界行动验证者:通过正向-反向不对称自我改进的世界模型
链接:https://arxiv.org/abs/2604.01985
作者:Yuejiang Liu,Fan Feng,Lingjing Kong,Weifeng Lu,Jinzhou Tang,Kun Zhang,Kevin Murphy,Chelsea Finn,Yilun Du
备注:Project Website: https://world-action-verifier.github.io
摘要:通用世界模型承诺可扩展的政策评估,优化和规划,但实现所需的鲁棒性仍然具有挑战性。与主要关注最优行为的策略学习不同,世界模型必须在更广泛的次优行为范围内是可靠的,而这些次优行为通常不能被行为标记的交互数据充分覆盖。为了应对这一挑战,我们提出了世界行动验证器(WAV),一个框架,使世界模型,以确定自己的预测错误和自我改进。关键思想是将动作条件状态预测分解为两个因素--状态可达性和动作可达性--并分别验证。我们表明,这些验证问题可以大大容易比预测未来的状态,由于两个潜在的不对称性:更广泛的可用性的行动自由的数据和行动相关的功能的低维。利用这些不对称性,我们用(i)从视频语料库获得的多样化子目标生成器和(ii)从状态特征子集推断动作的稀疏逆模型来增强世界模型。通过在生成的子目标、推断的动作和前向推出之间强制执行周期一致性,WAV在现有方法通常失败的探索不足的制度中提供了有效的验证机制。在横跨MiniGrid、RoboMimic和ManiSkill的九个任务中,我们的方法实现了2倍的样本效率,同时将下游策略性能提高了18%。
摘要:General-purpose world models promise scalable policy evaluation, optimization, and planning, yet achieving the required level of robustness remains challenging. Unlike policy learning, which primarily focuses on optimal actions, a world model must be reliable over a much broader range of suboptimal actions, which are often insufficiently covered by action-labeled interaction data. To address this challenge, we propose World Action Verifier (WAV), a framework that enables world models to identify their own prediction errors and self-improve. The key idea is to decompose action-conditioned state prediction into two factors -- state plausibility and action reachability -- and verify each separately. We show that these verification problems can be substantially easier than predicting future states due to two underlying asymmetries: the broader availability of action-free data and the lower dimensionality of action-relevant features. Leveraging these asymmetries, we augment a world model with (i) a diverse subgoal generator obtained from video corpora and (ii) a sparse inverse model that infers actions from a subset of state features. By enforcing cycle consistency among generated subgoals, inferred actions, and forward rollouts, WAV provides an effective verification mechanism in under-explored regimes, where existing methods typically fail. Across nine tasks spanning MiniGrid, RoboMimic, and ManiSkill, our method achieves 2x higher sample efficiency while improving downstream policy performance by 18%.
【4】Generalization Bounds and Statistical Guarantees for Multi-Task and Multiple Operator Learning with MNO Networks
标题:MNO网络上多任务与多算子学习的泛化界与统计保证
链接:https://arxiv.org/abs/2604.01961
作者:Adrien Weihs,Hayden Schaeffer
摘要:多算子学习涉及学习由算子描述符$α$索引的算子族$\{G[α]:U\to V\}_{α\in W}$。训练数据是分层收集的,先对运算符实例$α$进行采样,然后对每个实例的输入函数$u$进行采样,最后对每个输入的评估点$x$进行采样,从而产生噪声观测值$G[α][u](x)$。虽然最近的工作已经开发了表达多任务和多操作学习架构和近似理论的比例律,定量统计泛化保证仍然有限。我们为可分离模型提供了一个基于覆盖数的泛化分析,重点关注多神经运算符(MNO)架构:我们首先推导出由深度ReLU子网络的乘积的线性组合给出的假设类的显式度量熵界,然后将这些复杂性界限与MNO的近似保证相结合,以获得新(未见过)的预期测试误差的显式近似估计权衡三元组$(α,u,x)$。由此得到的界使得对分层采样预算$(n_α,n_u,n_x)$的依赖是透明的,并在算子采样预算$n_α$中产生一个显式的学习率声明,为算子实例间的推广提供了一个样本复杂度表征。结构和架构也可以被视为通用求解器或“小”PDE基础模型的示例,其中三元组是多模态的一种形式。
摘要:Multiple operator learning concerns learning operator families $\{G[α]:U\to V\}_{α\in W}$ indexed by an operator descriptor $α$. Training data are collected hierarchically by sampling operator instances $α$, then input functions $u$ per instance, and finally evaluation points $x$ per input, yielding noisy observations of $G[α][u](x)$. While recent work has developed expressive multi-task and multiple operator learning architectures and approximation-theoretic scaling laws, quantitative statistical generalization guarantees remain limited. We provide a covering-number-based generalization analysis for separable models, focusing on the Multiple Neural Operator (MNO) architecture: we first derive explicit metric-entropy bounds for hypothesis classes given by linear combinations of products of deep ReLU subnetworks, and then combine these complexity bounds with approximation guarantees for MNO to obtain an explicit approximation-estimation tradeoff for the expected test error on new (unseen) triples $(α,u,x)$. The resulting bound makes the dependence on the hierarchical sampling budgets $(n_α,n_u,n_x)$ transparent and yields an explicit learning-rate statement in the operator-sampling budget $n_α$, providing a sample-complexity characterization for generalization across operator instances. The structure and architecture can also be viewed as a general purpose solver or an example of a "small'' PDE foundation model, where the triples are one form of multi-modality.
【5】Learn by Surprise, Commit by Proof
标题:以惊奇学习,以证明承诺
链接:https://arxiv.org/abs/2604.01951
作者:Kang-Sin Choi
备注:24 pages, 3 figures
摘要:我们提出了LSCP,自主知识获取的自门控后训练框架:只学习模型还不知道的东西,根据它知道的东西进行验证,强度与信念成正比,没有外部预言机。当一个段落产生了过高的每令牌损失时,LSCP会标记它,生成一个Q&A链,迫使模型表达自己的知识并识别差距,然后通过$\beta_2 = 0.999 \cdot r^k$将AdamW的$\beta_2$与确信深度$k$(段落存活的自我验证步骤数)成比例地调整。整个学习强度由单个参数$r$控制。除了新知识之外,这个过程还强化了弱编码的现有知识,这是幻觉的主要来源。该框架是自熄的:随着模型的学习,学习段落上的每令牌损失朝着阈值下降,系统逐渐收敛到标准AdamW。这模拟了生物记忆巩固:上下文窗口中的临时信息被选择性地巩固为参数权重,即模型的长期记忆。在参考模型(Qwen3-14B)和六个模型(8B-32B,四个家族)上的实验表明,标准微调产生死记硬背(扰动间隙(释义与原始困惑度的比率)为$11.6 \pm 0.2\times$基线),而所有LSCP条件都是语义学习($2.7$-$3.0\times$)。r=1.0条件(相同的优化器,几乎相同的数据,只有问答格式不同)证实训练数据格式(而不是$\beta_2$门控)是防止记忆的主要机制;门控则保护邻近知识免受损坏内容的污染(r=0.98时相邻问题的准确率为$93 \pm 7\%$,而基线为90%)。
摘要:We propose LSCP, a self-gated post-training framework for autonomous knowledge acquisition: learning only what a model does not already know, verified against what it does know, at a strength proportional to conviction, with no external oracle. When a passage produces anomalously high per-token loss, LSCP flags it, generates a Q&A chain that forces the model to articulate its own knowledge and identify gaps, then adjusts AdamW's $\beta_2$ proportionally to conviction depth $k$ (the number of self-verification steps the passage survives) via $\beta_2 = 0.999 \cdot r^k$. The entire learning intensity is governed by a single parameter $r$. Beyond new knowledge, this process sharpens weakly encoded existing knowledge, which is a primary source of hallucination. The framework is self-extinguishing: as the model learns, per-token loss on learned passages decreases toward the surprisal threshold and the system progressively converges to standard AdamW. This models biological memory consolidation: temporary information in the context window is selectively consolidated into parametric weights, the model's long-term memory. Experiments on the reference model (Qwen3-14B) and across six models (8B--32B, four families) show that standard fine-tuning produces rote memorization (perturbation gap (the ratio of paraphrase to original perplexity) of $11.6 \pm 0.2\times$ baseline) while all LSCP conditions learn semantically ($2.7$--$3.0\times$). The r=1.0 condition (identical optimizer, nearly identical data, only Q&A format differs) confirms that the training data format, not $\beta_2$ gating, is the primary mechanism preventing memorization; gating instead protects neighboring knowledge from contamination by corrupt content ($93 \pm 7\%$ accuracy on adjacent questions at r=0.98 vs. 90% baseline).
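The $\beta_2$ gating rule is concrete enough to sketch in PyTorch; the surprisal threshold and the Q&A chain that produces the conviction depth $k$ are stubbed out with placeholder values below.

    import torch

    def set_conviction_gate(optimizer, k, r=0.98):
        """LSCP-style gating: beta2 = 0.999 * r**k, applied before the update
        for a flagged passage (surprisal detection and the Q&A chain that
        produces k are stubbed out here)."""
        for group in optimizer.param_groups:
            group["betas"] = (0.9, 0.999 * (r ** k))

    model = torch.nn.Linear(16, 16)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    per_token_loss = 9.2          # pretend this passage exceeded the threshold
    conviction_depth = 3          # survived 3 self-verification steps
    if per_token_loss > 6.0:      # assumed surprisal threshold
        set_conviction_gate(opt, conviction_depth)

    loss = model(torch.randn(4, 16)).pow(2).mean()
    loss.backward()
    opt.step()
    print(opt.param_groups[0]["betas"])   # (0.9, 0.999 * 0.98**3)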
【6】PAC-Bayesian Reward-Certified Outcome Weighted Learning
标题:PAC-Bayesian奖励认证的结果加权学习
链接:https://arxiv.org/abs/2604.01946
作者:Yuya Ishikawa,Shu Tamano
摘要:通过结果加权学习(OWL)估计最佳个体化治疗规则(ITR)通常依赖于观察到的奖励,这些奖励是真实潜在效用的嘈杂或乐观的代理。忽略这种奖励的不确定性,导致选择的政策与膨胀的明显表现,但现有的OWL框架缺乏有限样本的保证,需要系统地嵌入到学习目标的不确定性。为了解决这个问题,我们提出了PAC-Bayesian Reward-Certified Outcome Weighted Learning(PROWL)。给定一个单侧不确定性证书,PROWL构造一个保守的奖励和一个严格依赖于政策的真实期望值的下限。从理论上讲,我们证明了一个确切的认证减少,将强大的政策学习到一个统一的,分裂的成本敏感的分类任务。这种提法使推导出一个非渐近的PAC贝叶斯下界随机ITR,我们建立的最佳后验最大化这一界限的特点是完全由一般的贝叶斯更新。为了克服广义贝叶斯推理中固有的学习率选择问题,我们引入了一个完全自动化的,基于边界的校准程序,再加上Fisher一致认证的铰链代理有效的优化。我们的实验表明,PROWL实现了改进,估计强大的,高价值的治疗制度下严重的奖励不确定性相比,标准方法ITR估计。
摘要:Estimating optimal individualized treatment rules (ITRs) via outcome weighted learning (OWL) often relies on observed rewards that are noisy or optimistic proxies for the true latent utility. Ignoring this reward uncertainty leads to the selection of policies with inflated apparent performance, yet existing OWL frameworks lack the finite-sample guarantees required to systematically embed such uncertainty into the learning objective. To address this issue, we propose PAC-Bayesian Reward-Certified Outcome Weighted Learning (PROWL). Given a one-sided uncertainty certificate, PROWL constructs a conservative reward and a strictly policy-dependent lower bound on the true expected value. Theoretically, we prove an exact certified reduction that transforms robust policy learning into a unified, split-free cost-sensitive classification task. This formulation enables the derivation of a nonasymptotic PAC-Bayes lower bound for randomized ITRs, where we establish that the optimal posterior maximizing this bound is exactly characterized by a general Bayes update. To overcome the learning-rate selection problem inherent in generalized Bayesian inference, we introduce a fully automated, bounds-based calibration procedure, coupled with a Fisher-consistent certified hinge surrogate for efficient optimization. Our experiments demonstrate that PROWL achieves improvements in estimating robust, high-value treatment regimes under severe reward uncertainty compared to standard methods for ITR estimation.
【7】Woosh: A Sound Effects Foundation Model
标题:Woosh:音效基础模型
链接:https://arxiv.org/abs/2604.01929
作者:Gaëtan Hadjeres,Marc Ferras,Khaled Koutini,Benno Weck,Alexandre Bittar,Thomas Hummel,Zineb Lahrici,Hakim Missoum,Joan Serrà,Yuki Mitsufuji
摘要:音频研究社区依赖于开放生成模型作为构建新方法和建立基线的基础工具。在本报告中,我们介绍了索尼AI公开发布的音效基础模型Woosh,详细介绍了其架构,训练过程以及对其他流行开放模型的评估。针对声音效果进行了优化,我们提供了(1)高质量的音频编码器/解码器模型和(2)用于调节的文本-音频对齐模型,以及(3)文本到音频和(4)视频到音频生成模型。该版本还包括经过提炼的文本到音频和视频到音频模型,允许低资源操作和快速推理。我们对公共和私人数据的评估显示,与现有的开放替代方案(如StableAudio-Open和TangoFlux)相比,每个模块都具有竞争力或更好的性能。推断代码和模型权重可在https://github.com/SonyResearch/Woosh上获得。演示示例可以在https://sonyresearch.github.io/Woosh/上找到。
摘要:The audio research community depends on open generative models as foundational tools for building novel approaches and establishing baselines. In this report, we present Woosh, Sony AI's publicly released sound effect foundation model, detailing its architecture, training process, and an evaluation against other popular open models. Being optimized for sound effects, we provide (1) a high-quality audio encoder/decoder model and (2) a text-audio alignment model for conditioning, together with (3) text-to-audio and (4) video-to-audio generative models. Distilled text-to-audio and video-to-audio models are also included in the release, allowing for low-resource operation and fast inference. Our evaluation on both public and private data shows competitive or better performance for each module when compared to existing open alternatives like StableAudio-Open and TangoFlux. Inference code and model weights are available at https://github.com/SonyResearch/Woosh. Demo samples can be found at https://sonyresearch.github.io/Woosh/.
【8】LI-DSN: A Layer-wise Interactive Dual-Stream Network for EEG Decoding
标题:LI-DSN:用于脑电解码的分层交互式双流网络
链接:https://arxiv.org/abs/2604.01889
作者:Chenghao Yue,Zhiyuan Ma,Zhongye Xia,Xinche Zhang,Yisi Zhang,Xinke Shen,Sen Song
摘要:脑电图(EEG)提供了一个非侵入性的大脑活动窗口,提供了高时间分辨率,对于通过脑机接口(BCI)理解和与神经过程交互至关重要。目前的脑电双流神经网络通常通过并行分支独立地处理时间和空间特征,延迟它们的集成,直到最终的后期融合。这种设计固有地导致一个“信息筒仓”的问题,排除中间的交叉流细化和阻碍时空分解的充分利用功能。我们提出了LI-DSN,一个逐层交互式双流网络,促进渐进的,跨流通信在每一层,从而克服了后期融合范例的局限性。LI-DSN引入了一种新的时空整合注意力(TSIA)机制,该机制构建了空间亲和相关矩阵(SACM)来捕获电极间的空间结构关系,并构建了时间通道聚合矩阵(TCAM)来整合空间指导下的余弦门控时间动态。此外,我们采用了自适应融合策略与可学习的通道权重,以优化双流功能的整合。在八个不同的EEG数据集上进行的广泛实验,包括运动想象(MI)分类,情感识别和稳态视觉诱发电位(SSVEP),一致表明LI-DSN显着优于13个最先进的(SOTA)基线模型,展示了其卓越的鲁棒性和解码性能。该守则将在验收后予以公布。
摘要:Electroencephalography (EEG) provides a non-invasive window into brain activity, offering high temporal resolution crucial for understanding and interacting with neural processes through brain-computer interfaces (BCIs). Current dual-stream neural networks for EEG often process temporal and spatial features independently through parallel branches, delaying their integration until a final, late-stage fusion. This design inherently leads to an "information silo" problem, precluding intermediate cross-stream refinement and hindering spatial-temporal decompositions essential for full feature utilization. We propose LI-DSN, a layer-wise interactive dual-stream network that facilitates progressive, cross-stream communication at each layer, thereby overcoming the limitations of late-fusion paradigms. LI-DSN introduces a novel Temporal-Spatial Integration Attention (TSIA) mechanism, which constructs a Spatial Affinity Correlation Matrix (SACM) to capture inter-electrode spatial structural relationships and a Temporal Channel Aggregation Matrix (TCAM) to integrate cosine-gated temporal dynamics under spatial guidance. Furthermore, we employ an adaptive fusion strategy with learnable channel weights to optimize the integration of dual-stream features. Extensive experiments across eight diverse EEG datasets, encompassing motor imagery (MI) classification, emotion recognition, and steady-state visual evoked potentials (SSVEP), consistently demonstrate that LI-DSN significantly outperforms 13 state-of-the-art (SOTA) baseline models, showcasing its superior robustness and decoding performance. The code will be publicized after acceptance.
【9】LiteInception: A Lightweight and Interpretable Deep Learning Framework for General Aviation Fault Diagnosis
标题:LiteInception:用于通用航空故障诊断的轻量级且可解释的深度学习框架
链接:https://arxiv.org/abs/2604.01725
作者:Zhihuan Wei,Xinhang Chen,Danyang Han,Yang Hu,Jie Liu,Xuewen Miao,Guijiang Li
摘要:通用航空故障诊断和高效维护对飞行安全至关重要;然而,在资源受限的边缘设备上部署深度学习模型在计算能力和可解释性方面面临双重挑战。本文提出了LiteInception--一个为边缘部署设计的轻量级可解释故障诊断框架。该框架采用了与标准维护工作流程相一致的两级级联架构:第一阶段执行高召回故障检测,第二阶段对异常样本进行细粒度故障分类,从而解耦优化目标并实现计算资源的按需分配。对于模型压缩,提出了一种基于互信息、梯度分析和SE注意力权重的多方法融合策略,将输入传感器通道从23个减少到15个,并引入了1+1分支LiteInception架构,将InceptionTime参数压缩了70%,CPU推理速度提高了8倍以上,F1损失小于3%。此外,知识蒸馏被引入作为一种精确度-召回率调节机制,通过切换训练策略,使相同的轻量级模型能够适应不同的场景,例如安全关键和辅助诊断。最后,构建了一个集成四种属性方法的双层可解释性框架,提供了“哪个传感器x哪个时间段”的可追溯证据链。在NGAFID数据集上的实验表明,故障检测准确率为81.92%,召回率为83.24%,故障识别准确率为77.00%,验证了框架在效率,准确性和可解释性之间的良好平衡。
摘要:General aviation fault diagnosis and efficient maintenance are critical to flight safety; however, deploying deep learning models on resource-constrained edge devices poses dual challenges in computational capacity and interpretability. This paper proposes LiteInception--a lightweight interpretable fault diagnosis framework designed for edge deployment. The framework adopts a two-stage cascaded architecture aligned with standard maintenance workflows: Stage 1 performs high-recall fault detection, and Stage 2 conducts fine-grained fault classification on anomalous samples, thereby decoupling optimization objectives and enabling on-demand allocation of computational resources. For model compression, a multi-method fusion strategy based on mutual information, gradient analysis, and SE attention weights is proposed to reduce the input sensor channels from 23 to 15, and a 1+1 branch LiteInception architecture is introduced that compresses InceptionTime parameters by 70%, accelerates CPU inference by over 8x, with less than 3% F1 loss. Furthermore, knowledge distillation is introduced as a precision-recall regulation mechanism, enabling the same lightweight model to adapt to different scenarios--such as safety-critical and auxiliary diagnosis--by switching training strategies. Finally, a dual-layer interpretability framework integrating four attribution methods is constructed, providing traceable evidence chains of "which sensor x which time period." Experiments on the NGAFID dataset demonstrate a fault detection accuracy of 81.92% with 83.24% recall, and a fault identification accuracy of 77.00%, validating the framework's favorable balance among efficiency, accuracy, and interpretability.
【10】Cognitive Energy Modeling for Neuroadaptive Human-Machine Systems using EEG and WGAN-GP
标题:基于EEG和WGAN-GP的神经自适应人机系统认知能量建模
链接:https://arxiv.org/abs/2604.01653
作者:Sriram Sattiraju,Vaibhav Gollapalli,Aryan Shah,Timothy McMahan
摘要:脑电图(EEG)提供了一种非侵入性的洞察大脑的认知和情绪动力学。然而,对这些状态如何实时演变进行建模并量化这种转变所需的能量仍然是一个重大挑战。薛定谔桥问题(SBP)提供了一个原则性的概率框架来模拟大脑状态之间最有效的进化,解释为认知能量成本的度量。虽然生成模型(如GANs)已被广泛用于增强EEG数据,但仍不清楚合成EEG是否保留了基于转换的分析所需的潜在动力学结构。在这项工作中,我们通过使用SBP导出的传输成本作为度量来评估GAN生成的EEG是否保留了认知状态转换的基于能量的建模所需的分布几何来解决这一差距。我们比较了从Stroop任务中收集的真实和合成EEG中获得的过渡能量,并在组和参与者级别的分析中表现出很强的一致性。这些结果表明,合成EEG保留了基于SBP的建模所需的过渡结构,使其能够用于数据高效的神经适应系统。我们进一步提出了一个框架,其中SBP衍生的认知能量作为自适应人机系统的控制信号,支持响应于用户的认知和情感状态的系统行为的实时调整。
摘要:Electroencephalography (EEG) provides a non-invasive insight into the brain's cognitive and emotional dynamics. However, modeling how these states evolve in real time and quantifying the energy required for such transitions remains a major challenge. The Schrödinger Bridge Problem (SBP) offers a principled probabilistic framework to model the most efficient evolution between the brain states, interpreted as a measure of cognitive energy cost. While generative models such as GANs have been widely used to augment EEG data, it remains unclear whether synthetic EEG preserves the underlying dynamical structure required for transition-based analysis. In this work, we address this gap by using SBP-derived transport cost as a metric to evaluate whether GAN-generated EEG retains the distributional geometry necessary for energy-based modeling of cognitive state transitions. We compare transition energies derived from real and synthetic EEG collected during Stroop tasks and demonstrate strong agreement across group and participant-level analyses. These results indicate that synthetic EEG preserves the transition structure required for SBP-based modeling, enabling its use in data-efficient neuroadaptive systems. We further present a framework in which SBP-derived cognitive energy serves as a control signal for adaptive human-machine systems, supporting real-time adjustment of system behavior in response to user cognitive and affective state.
【11】Thinking While Listening: Fast-Slow Recurrence for Long-Horizon Sequential Modeling
标题:边听边思考:用于长视野序列建模的快-慢循环
链接:https://arxiv.org/abs/2604.01577
作者:Shota Takashiro,Masanori Koyama,Takeru Miyato,Yusuke Iwasawa,Yutaka Matsuo,Kohei Hayashi
摘要:我们将最近的潜在循环建模扩展到序列输入流。通过在缓慢的观测更新之间交错进行具有自组织能力的快速循环潜在更新,我们的方法促进了随输入一起演化的稳定内部结构的学习。这种机制使模型能够在长视野上保持连贯且聚类化的表示,与LSTM、状态空间模型和Transformer变体等序列基线相比,改进了强化学习和算法任务中的分布外泛化。
摘要:We extend the recent latent recurrent modeling to sequential input streams. By interleaving fast, recurrent latent updates with self-organizational ability between slow observation updates, our method facilitates the learning of stable internal structures that evolve alongside the input. This mechanism allows the model to maintain coherent and clustered representations over long horizons, improving out-of-distribution generalization in reinforcement learning and algorithmic tasks compared to sequential baselines such as LSTM, state space models, and Transformer variants.
【12】ZEUS: Accelerating Diffusion Models with Only Second-Order Predictor
标题:ZEUS:仅用二阶预测器加速扩散模型
链接:https://arxiv.org/abs/2604.01552
作者:Yixiao Wang,Ting Jiang,Zishan Shao,Hancheng Ye,Jingwei Sun,Mingyuan Ma,Jianyi Zhang,Yiran Chen,Hai Li
摘要:去噪生成模型能够提供高保真生成,但由于采样期间需要多次迭代调用去噪器,其推理延迟仍是瓶颈。免训练加速方法通过稀疏化模型架构或缩短采样轨迹来降低延迟。目前的免训练加速方法比必要的更复杂:高阶预测器在激进加速下会放大误差,而架构修改则阻碍部署。在超过2倍加速时,跳步会造成结构性稀缺(每个局部窗口最多一次新的评估),只剩下已计算的输出及其向后差分作为唯一有因果依据的信息。基于此,我们提出了ZEUS,一种使用二阶预测器来预测被省去的去噪器评估的加速方法,并通过避免背靠背外推的交错方案来稳定激进的连续跳步。ZEUS基本上不增加开销,没有特征缓存,也没有架构修改,并且它与不同的主干、预测目标和求解器选择兼容。在图像和视频生成中,ZEUS相对于最近的免训练基线持续改善速度-保真度表现,在保持感知质量的同时实现高达3.2倍的端到端加速。我们的代码可在https://github.com/Ting-Justin-Jiang/ZEUS上获得。
摘要:Denoising generative models deliver high-fidelity generation but remain bottlenecked by inference latency due to the many iterative denoiser calls required during sampling. Training-free acceleration methods reduce latency by either sparsifying the model architecture or shortening the sampling trajectory. Current training-free acceleration methods are more complex than necessary: higher-order predictors amplify error under aggressive speedups, and architectural modifications hinder deployment. Beyond 2x acceleration, step skipping creates structural scarcity -- at most one fresh evaluation per local window -- leaving the computed output and its backward difference as the only causally grounded information. Based on this, we propose ZEUS, an acceleration method that predicts reduced denoiser evaluations using a second-order predictor, and stabilizes aggressive consecutive skipping with an interleaved scheme that avoids back-to-back extrapolations. ZEUS adds essentially zero overhead, no feature caches, and no architectural modifications, and it is compatible with different backbones, prediction objectives, and solver choices. Across image and video generation, ZEUS consistently improves the speed-fidelity performance over recent training-free baselines, achieving up to 3.2x end-to-end speedup while maintaining perceptual quality. Our code is available at: https://github.com/Ting-Justin-Jiang/ZEUS.
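One plausible reading of the predictor, sketched below: skipped steps are filled with the last computed output plus its backward difference, and the interleaving never extrapolates twice in a row. The denoiser and update rule are placeholders, not ZEUS's actual components.

    import numpy as np

    def sample_with_skips(denoise, x, steps):
        """Alternate real denoiser calls with extrapolated ones (illustrative
        reading of ZEUS; the actual predictor and interleaving may differ).
        `denoise(x, t)` stands in for the network."""
        prev, prev2 = None, None
        for i, t in enumerate(steps):
            extrapolate = (i % 2 == 1) and prev is not None and prev2 is not None
            if extrapolate:
                eps = prev + (prev - prev2)      # output + backward difference
            else:
                eps = denoise(x, t)              # fresh, causally grounded eval
                prev2, prev = prev, eps
            x = x - 0.1 * eps                    # placeholder update rule
        return x

    rng = np.random.default_rng(0)
    fake_denoiser = lambda x, t: x * (0.05 * t)
    x0 = rng.normal(size=(4,))
    print(sample_with_skips(fake_denoiser, x0, steps=np.linspace(1, 0, 20)))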
【13】Benchmark Problems and Benchmark Datasets for the evaluation of Machine and Deep Learning methods on Photoplethysmography signals: the D4 report from the QUMPHY project
标题:用于评估光电容积描记(PPG)信号上机器学习与深度学习方法的基准问题和基准数据集:QUMPHY项目的D4报告
链接:https://arxiv.org/abs/2604.01398
作者:Urs Hackstein,Jordi Alastruey,Philip Aston,Ciaran Bench,Peter H. Charlton,Loic Coquelin,Nando Hegemann,Vaidotas Marozas,Mohammad Moulaeifard,Manasi Nandi,Andrius Petrenas,Oskar Pfeffer,Mantas Rinkevicius,Andrius Solosenko,Nils Strodthoff,Sara Vardanega
备注:28 pages
摘要:本报告是欧盟资助的Qumphy项目(22HLT01 Qumphy)的一部分,该项目致力于制定措施,以量化与应用于医疗问题的机器学习算法相关的不确定性,特别是光电容积描记(PPG)信号的分析和处理。在本报告中,列出了与PPG信号相关并作为基准问题的六个医疗问题。还描述了合适的基准数据集及其使用。
摘要:This report is part of the Qumphy project (22HLT01 Qumphy) that is funded by the European Union and is dedicated to the development of measures to quantify the uncertainties associated with Machine Learning algorithms applied to medical problems, in particular the analysis and processing of Photoplethysmography (PPG) signals. In this report, a list of six medical problems that are related to PPG signals and serve as Benchmark Problems is given. Suitable Benchmark datasets and their usage are described also.
【14】Safety, Security, and Cognitive Risks in World Models
标题:世界模型中的安全、保障和认知风险
链接:https://arxiv.org/abs/2604.01346
作者:Manoj Parmar
备注:26 pages, 1 figure (6 panels), 2 tables. Empirical proof-of-concept on GRU/RSSM/DreamerV3 architectures
摘要:世界模型(即习得的环境动态内部模拟器)正在迅速成为机器人、自动驾驶汽车和智能体AI中自主决策的基础。然而,这种预测能力带来了一系列独特的安全、安保和认知风险。攻击者可以破坏训练数据、毒化潜在表示,并利用不断累积的推演误差在安全关键型部署中造成灾难性故障。配备世界模型的智能体恰恰因为能够模拟自身行为的后果,更有能力进行目标误泛化、欺骗性对齐和奖励黑客攻击。权威的世界模型预测进一步助长了自动化偏见和失准的人类信任,而操作人员缺乏审计工具。 本文综述了世界模型的研究版图;引入了轨迹持久性和表示风险的形式化定义;提出了包含五种画像的攻击者能力分类;并开发了一个统一的威胁模型,将MITRE ATLAS和OWASP LLM Top 10扩展到世界模型栈。我们提供了一个关于轨迹持久性对抗攻击的经验概念验证(GRU-RSSM:A_1 = 2.26x放大,对抗微调下降低-59.5%;随机RSSM代理:A_1 = 0.65x;DreamerV3检查点:确认非零动作漂移)。我们通过四种部署场景说明风险,并提出跨学科的缓解措施,涵盖对抗加固、对齐工程、NIST AI RMF与欧盟AI法案治理以及人因设计。我们认为,世界模型必须被视为安全关键基础设施,需要与飞行控制软件或医疗设备同等的严格性。
摘要:World models -- learned internal simulators of environment dynamics -- are rapidly becoming foundational to autonomous decision-making in robotics, autonomous vehicles, and agentic AI. Yet this predictive power introduces a distinctive set of safety, security, and cognitive risks. Adversaries can corrupt training data, poison latent representations, and exploit compounding rollout errors to cause catastrophic failures in safety-critical deployments. World model-equipped agents are more capable of goal misgeneralisation, deceptive alignment, and reward hacking precisely because they can simulate the consequences of their own actions. Authoritative world model predictions further foster automation bias and miscalibrated human trust that operators lack the tools to audit. This paper surveys the world model landscape; introduces formal definitions of trajectory persistence and representational risk; presents a five-profile attacker capability taxonomy; and develops a unified threat model extending MITRE ATLAS and the OWASP LLM Top 10 to the world model stack. We provide an empirical proof-of-concept on trajectory-persistent adversarial attacks (GRU-RSSM: A_1 = 2.26x amplification, -59.5% reduction under adversarial fine-tuning; stochastic RSSM proxy: A_1 = 0.65x; DreamerV3 checkpoint: non-zero action drift confirmed). We illustrate risks through four deployment scenarios and propose interdisciplinary mitigations spanning adversarial hardening, alignment engineering, NIST AI RMF and EU AI Act governance, and human-factors design. We argue that world models must be treated as safety-critical infrastructure requiring the same rigour as flight-control software or medical devices.
【15】Topological Effects in Neural Network Field Theory
标题:神经网络场论中的拓扑效应
链接:https://arxiv.org/abs/2604.02313
作者:Christian Ferko,James Halverson,Vishnu Jejjala,Brandon Robinson
备注:55 pages, 8 figures
摘要:神经网络场论将场论表述为由网络架构及其参数上的密度所定义的场的统计系综。我们通过引入标记拓扑量子数的离散参数,将该构造扩展到拓扑场景。我们复现了Berezinskii-Kosterlitz-Thouless相变,包括自旋波临界线和高温下涡旋的增殖。我们还验证了玻色弦的T-对偶性,展示了在$S^1$上动量与绕数交换下的不变性、恒定环面背景下sigma模型耦合按照Buscher规则的变换、自对偶半径处流代数的增强,以及非几何T-折叠转移函数。
摘要:Neural network field theory formulates field theory as a statistical ensemble of fields defined by a network architecture and a density on its parameters. We extend the construction to topological settings via the inclusion of discrete parameters that label the topological quantum number. We recover the Berezinskii--Kosterlitz--Thouless transition, including the spin-wave critical line and the proliferation of vortices at high temperatures. We also verify the T-duality of the bosonic string, showing invariance under the exchange of momentum and winding on $S^1$, the transformation of the sigma model couplings according to the Buscher rules on constant toroidal backgrounds, the enhancement of the current algebra at self-dual radius, and non-geometric T-fold transition functions.
【16】Learning in Prophet Inequalities with Noisy Observations
标题:含噪观测下先知不等式中的学习
链接:https://arxiv.org/abs/2604.01789
作者:Jung-hun Kim,Vianney Perchet
备注:ICLR 2026
摘要:我们研究先知不等式,这是在线决策和最优停止中的一个基本问题,所处的实际设定是:奖励只能通过含噪实现来观测,且奖励分布未知。在每个阶段,决策者接收一个噪声奖励,其真实值遵循带未知潜在参数的线性模型,并观察到一个从某分布中抽取的特征向量。为了应对这一挑战,我们提出了通过置信下界(LCB)阈值化来整合学习与决策的算法。在i.i.d.设定下,我们证明在关于最优值的一个温和条件下,先探索后决策(Explore-then-Decide)策略和$\varepsilon$-贪婪变体都能达到$1-1/e$的紧竞争比。对于非同分布的情形,我们表明相对于一个放松的基准可以保证$1/2$的竞争比。此外,在只能有限窗口访问过去奖励的情况下,可达到相对于最优基准的$1/2$的紧比率。
摘要:We study the prophet inequality, a fundamental problem in online decision-making and optimal stopping, in a practical setting where rewards are observed only through noisy realizations and reward distributions are unknown. At each stage, the decision-maker receives a noisy reward whose true value follows a linear model with an unknown latent parameter, and observes a feature vector drawn from a distribution. To address this challenge, we propose algorithms that integrate learning and decision-making via lower-confidence-bound (LCB) thresholding. In the i.i.d.\ setting, we establish that both an Explore-then-Decide strategy and an $\varepsilon$-Greedy variant achieve the sharp competitive ratio of $1 - 1/e$, under a mild condition on the optimal value. For non-identical distributions, we show that a competitive ratio of $1/2$ can be guaranteed against a relaxed benchmark. Moreover, with limited window access to past rewards, the tight ratio of $1/2$ against the optimal benchmark is achieved.
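A toy sketch of LCB thresholding with online ridge estimation of the latent parameter: the decision-maker accepts the first stage whose lower confidence bound clears a threshold. The threshold, bonus coefficient, and noise scale are illustrative choices, not the paper's tuned constants.

    import numpy as np

    rng = np.random.default_rng(0)
    d, T = 5, 200
    theta = rng.normal(size=d)                       # unknown latent parameter
    A, b = np.eye(d), np.zeros(d)                    # ridge statistics
    tau, beta = 1.2, 0.5                             # threshold, bonus (assumed)

    accepted = None
    for t in range(T):
        x = rng.normal(size=d) / np.sqrt(d)          # feature vector
        r = theta @ x + 0.3 * rng.normal()           # noisy reward observation
        theta_hat = np.linalg.solve(A, b)
        A_inv = np.linalg.inv(A)
        lcb = theta_hat @ x - beta * np.sqrt(x @ A_inv @ x)
        if lcb >= tau:                               # stop: LCB clears threshold
            accepted = (t, theta @ x)
            break
        A += np.outer(x, x)                          # otherwise keep learning
        b += r * x
    print("accepted:", accepted)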
其他(34篇)
【1】ActionParty: Multi-Subject Action Binding in Generative Video Games
标题:ActionParty:生成式视频游戏中的多主体动作绑定
链接:https://arxiv.org/abs/2604.02330
作者:Alexander Pondaven,Ziyi Wu,Igor Gilitschenski,Philip Torr,Sergey Tulyakov,Fabio Pizzati,Aliaksandr Siarohin
备注:Project page: https://action-party.github.io/
摘要:视频扩散的最新进展使得能够开发可以模拟交互式环境的“世界模型”。然而,这些模型在很大程度上局限于单智能体设置,无法同时控制场景中的多个智能体。在这项工作中,我们解决了现有视频扩散模型中动作绑定这一基本问题,即这些模型难以将具体动作与其对应的主体关联起来。为此,我们提出了ActionParty,一个用于生成式视频游戏的动作可控多主体世界模型。它引入了主体状态令牌,即持久捕获场景中每个主体状态的潜在变量。通过用空间偏置机制联合建模状态令牌和视频潜变量,我们将全局视频帧渲染与各个主体的动作控制更新解耦。我们在Melting Pot基准上评估了ActionParty,展示了第一个能够在46个不同环境中同时控制多达7个玩家的视频世界模型。我们的结果显示了动作遵循准确性和身份一致性的显著改善,同时在复杂交互中实现了对主体的鲁棒自回归跟踪。
摘要:Recent advances in video diffusion have enabled the development of "world models" capable of simulating interactive environments. However, these models are largely restricted to single-agent settings, failing to control multiple agents simultaneously in a scene. In this work, we tackle a fundamental issue of action binding in existing video diffusion models, which struggle to associate specific actions with their corresponding subjects. For this purpose, we propose ActionParty, an action controllable multi-subject world model for generative video games. It introduces subject state tokens, i.e. latent variables that persistently capture the state of each subject in the scene. By jointly modeling state tokens and video latents with a spatial biasing mechanism, we disentangle global video frame rendering from individual action-controlled subject updates. We evaluate ActionParty on the Melting Pot benchmark, demonstrating the first video world model capable of controlling up to seven players simultaneously across 46 diverse environments. Our results show significant improvements in action-following accuracy and identity consistency, while enabling robust autoregressive tracking of subjects through complex interactions.
【2】go-$m$HC: Direct Parameterization of Manifold-Constrained Hyper-Connections via Generalized Orthostochastic Matrices
标题:go-$m$HC:通过广义正交随机矩阵对流形约束超连接进行直接参数化
链接:https://arxiv.org/abs/2604.02309
作者:Torque Dandachi,Sophia Diggs-Galligan
备注:29 pages, 30 figures, 9 tables. Includes supplementary material
摘要:双随机矩阵使得跨残差流的可学习混合成为可能,但如何精确而高效地参数化双随机矩阵的集合(Birkhoff多面体)仍是一个开放的挑战。现有的精确方法随流数量$d$呈阶乘级增长,而Kronecker因子分解的方法虽然高效,但表达力有限。我们基于广义正交随机矩阵理论引入了一种新的精确参数化,其复杂度为$\mathcal{O}(d^3)$,并暴露出一个单一超参数$s$,在计算高效的边界与完全表达的Birkhoff多面体之间连续插值。在流形约束超连接($m$HC,一个学习动态层间连接的框架)的基础上,我们在go-$m$HC中实例化了这一参数化。我们的方法可以与Kronecker因子分解方法自然组合,以相近的FLOP成本大幅恢复表达力。谱分析表明,go-$m$HC对Birkhoff多面体的填充远比Kronecker分解基线完整。在合成流混合任务中,go-$m$HC达到理论最小损失,同时收敛速度最多加快$10\times$。我们在一个30M参数的GPT风格语言模型上验证了我们的方法。go-$m$HC的表达力、效率和精确性为将$d$作为模型容量的新维度进行扩展提供了一条实用途径。
摘要:Doubly stochastic matrices enable learned mixing across residual streams, but parameterizing the set of doubly stochastic matrices (the Birkhoff polytope) exactly and efficiently remains an open challenge. Existing exact methods scale factorially with the number of streams ($d$), while Kronecker-factorized approaches are efficient but expressivity-limited. We introduce a novel exact parameterization grounded in the theory of generalized orthostochastic matrices, which scales as $\mathcal{O}(d^3)$ and exposes a single hyperparameter $s$ which continuously interpolates between a computationally efficient boundary and the fully expressive Birkhoff polytope. Building on Manifold-Constrained Hyper-Connections ($m$HC), a framework for learned dynamic layer connectivity, we instantiate this parameterization in go-$m$HC. Our method composes naturally with Kronecker-factorized methods, substantially recovering expressivity at similar FLOP costs. Spectral analysis indicates that go-$m$HC fills the Birkhoff polytope far more completely than Kronecker-factorized baselines. On synthetic stream-mixing tasks, go-$m$HC achieves the minimum theoretical loss while converging up to $10\times$ faster. We validate our approach in a 30M parameter GPT-style language model. The expressivity, efficiency, and exactness of go-$m$HC offer a practical avenue for scaling $d$ as a new dimension of model capacity.
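The classical construction behind orthostochastic parameterizations can be sketched directly: the entrywise square of an orthogonal matrix (obtained here as the matrix exponential of a skew-symmetric matrix) is doubly stochastic. The paper's generalized variant and its interpolation hyperparameter $s$ go beyond this sketch.

    import numpy as np
    from scipy.linalg import expm

    def orthostochastic(params, d):
        """Map d(d-1)/2 unconstrained params to a doubly stochastic matrix:
        Q = expm(S) is orthogonal for skew-symmetric S, and squaring its
        entries gives an orthostochastic matrix (the classical construction;
        go-mHC's generalized variant with hyperparameter s is not shown)."""
        S = np.zeros((d, d))
        iu = np.triu_indices(d, k=1)
        S[iu] = params
        S -= S.T
        Q = expm(S)                      # orthogonal: rows/cols have unit norm
        return Q ** 2                    # entrywise square -> doubly stochastic

    rng = np.random.default_rng(0)
    d = 4
    B = orthostochastic(rng.normal(size=d * (d - 1) // 2), d)
    print(np.round(B, 3))
    print(B.sum(axis=0), B.sum(axis=1))  # both ~ [1, 1, 1, 1]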
【3】(PAC-)Learning state machines from data streams: A generic strategy and an improved heuristic (Extended version)
标题:(PAC-)从数据流学习状态机:通用策略和改进的启发式(扩展版本)
链接:https://arxiv.org/abs/2604.02244
作者:Robert Baumgartner,Sicco Verwer
备注:Extended version of Learning state machines from data streams: A generic strategy and an improved heuristic, International Conference on Grammatical Inference (ICGI) 2023, Rabat, Morocco
摘要:这是我们的出版物《从数据流中学习状态机:通用策略和改进的启发式》的扩展版本,2023年国际语法推理会议(ICGI),摩洛哥拉巴特。它已经扩展了PAC边界的形式证明,并且类似方法的讨论和分析已经从附录中移走,现在是一个完整的部分。 状态机模型是模拟离散事件系统的行为的模型,能够表示诸如软件系统、网络交互和控制系统之类的系统,并且已经被广泛研究。然而,大多数学习算法的本质是假设所有数据在算法开始时都可用,并且很少有研究从流数据中学习状态机。在本文中,我们希望通过提出一种从数据流中学习状态机的通用方法,以及使用草图来解释不完整前缀树的合并启发式方法,进一步缩小这一差距。我们实现了我们的方法在一个开源的状态合并库,并与现有的方法进行比较。我们展示了我们的方法在运行时间,内存消耗和结果质量方面的有效性。此外,我们提供了一个正式的分析,我们的算法,表明它是能够学习的PAC框架内,并显示理论上的改进,以增加运行时间,而不会牺牲算法的正确性,在较大的样本量。
摘要:This is an extended version of our publication Learning state machines from data streams: A generic strategy and an improved heuristic, International Conference on Grammatical Inference (ICGI) 2023, Rabat, Morocco. It has been extended with a formal proof on PAC-bounds, and the discussion and analysis of a similar approach has been moved from the appendix and is now a full Section. State machine models are models that simulate the behavior of discrete event systems, capable of representing systems such as software systems, network interactions, and control systems, and have been researched extensively. The nature of most learning algorithms however is the assumption that all data be available at the beginning of the algorithm, and little research has been done in learning state machines from streaming data. In this paper, we want to close this gap further by presenting a generic method for learning state machines from data streams, as well as a merge heuristic that uses sketches to account for incomplete prefix trees. We implement our approach in an open-source state merging library and compare it with existing methods. We show the effectiveness of our approach with respect to run-time, memory consumption, and quality of results on a well known open dataset. Additionally, we provide a formal analysis of our algorithm, showing that it is capable of learning within the PAC framework, and show a theoretical improvement to increase run-time, without sacrificing correctness of the algorithm in larger sample sizes.
【4】From High-Dimensional Spaces to Verifiable ODD Coverage for Safety-Critical AI-based Systems
标题:从高维空间到安全关键人工智能系统的可验证ODD覆盖
链接:https://arxiv.org/abs/2604.02198
作者:Thomas Stefani,Johann Maximilian Christensen,Elena Hoemann,Frank Köster,Sven Hallerbach
摘要:虽然人工智能(AI)为运营绩效提供了变革潜力,但其在航空等安全关键领域的部署需要严格遵守严苛的认证标准。目前的EASA指南要求证明AI/ML组成部分的运营设计域(ODD)得到完整覆盖——这一要求需要证明在定义的运营边界内不存在关键差距。然而,由于系统在高维参数空间中运行,现有方法难以提供满足完整性标准所需的可扩展性和形式化基础。目前还没有标准化的工程方法来弥合抽象ODD定义与可验证证据之间的差距。本文提出一种方法来填补这一空白:将参数离散化、基于约束的过滤和基于临界性的降维集成到一个结构化的多步骤ODD覆盖验证过程中。基于先前关于基于AI的空中防撞研究中收集的模拟数据,这项工作展示了一种系统的工程方法,用于定义并实现满足EASA完整性要求的覆盖指标。最终,该方法能够在更高维度上验证ODD覆盖率,在符合EASA标准的同时推进安全设计(Safety-by-Design)方法。
摘要:While Artificial Intelligence (AI) offers transformative potential for operational performance, its deployment in safety-critical domains such as aviation requires strict adherence to rigorous certification standards. Current EASA guidelines mandate demonstrating complete coverage of the AI/ML constituent's Operational Design Domain (ODD) -- a requirement that demands proof that no critical gaps exist within defined operational boundaries. However, as systems operate within high-dimensional parameter spaces, existing methods struggle to provide the scalability and formal grounding necessary to satisfy the completeness criterion. Currently, no standardized engineering method exists to bridge the gap between abstract ODD definitions and verifiable evidence. This paper addresses this void by proposing a method that integrates parameter discretization, constraint-based filtering, and criticality-based dimension reduction into a structured, multi-step ODD coverage verification process. Grounded in gathered simulation data from prior research on AI-based mid-air collision avoidance, this work demonstrates a systematic engineering approach to defining and achieving coverage metrics that satisfy EASA's demand for completeness. Ultimately, this method enables the validation of ODD coverage in higher dimensions, advancing a Safety-by-Design approach while complying with EASA's standards.
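摘要中"参数离散化+基于约束的过滤"这两个步骤可用如下草图直观说明。注意:边界、步数与约束规则均为虚构示例,仅用于说明流程,并非论文的实际ODD定义或工具链。

```python
import numpy as np
from itertools import product

def discretize_and_filter(bounds, steps, constraints):
    # Sketch: grid-discretize an ODD parameter space, then drop grid
    # cells that violate operational constraints (hypothetical rules).
    axes = [np.linspace(lo, hi, n) for (lo, hi), n in zip(bounds, steps)]
    grid = np.array(list(product(*axes)))
    keep = np.array([all(c(p) for c in constraints) for p in grid])
    return grid[keep]

# made-up 2-D ODD: altitude (m) x closing speed (m/s)
cells = discretize_and_filter(
    bounds=[(0.0, 3000.0), (0.0, 250.0)],
    steps=[31, 26],
    constraints=[lambda p: p[0] > 150.0 or p[1] < 100.0],
)
print(len(cells), "feasible grid cells")
```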
【5】Computing the Exact Pareto Front in Average-Cost Multi-Objective Markov Decision Processes
标题:平均成本多目标Markov决策过程中精确帕累托前沿的计算
链接:https://arxiv.org/abs/2604.02196
作者:Jiping Luo,Nikolaos Pappas
摘要:许多通信与控制问题都可归结为多目标马尔可夫决策过程(MOMDP)。MOMDP的完整解是Pareto前沿。大部分文献通过标量化为单目标MDP来近似该前沿。最近的工作开始利用其几何结构,在折扣或简单双目标设置中刻画完整前沿。在这项工作中,我们刻画了平均成本MOMDP的精确前沿。我们证明该前沿是位于一个凸多面体边界上的连续分段线性曲面。每个顶点对应一个确定性策略,相邻顶点恰好在一个状态上不同。每条边由其端点策略的凸组合实现,混合系数以封闭形式给出。我们将这些结果应用于远程状态估计问题,其中前沿上的每个顶点对应一个阈值策略。无需显式求解任何MDP,即可得到精确的Pareto前沿以及某些非凸MDP的解。
摘要:Many communication and control problems are cast as multi-objective Markov decision processes (MOMDPs). The complete solution to an MOMDP is the Pareto front. Much of the literature approximates this front via scalarization into single-objective MDPs. Recent work has begun to characterize the full front in discounted or simple bi-objective settings by exploiting its geometry. In this work, we characterize the exact front in average-cost MOMDPs. We show that the front is a continuous, piecewise-linear surface lying on the boundary of a convex polytope. Each vertex corresponds to a deterministic policy, and adjacent vertices differ in exactly one state. Each edge is realized as a convex combination of the policies at its endpoints, with the mixing coefficient given in closed form. We apply these results to a remote state estimation problem, where each vertex on the front corresponds to a threshold policy. The exact Pareto front and solutions to certain non-convex MDPs can be obtained without explicitly solving any MDP.
【6】Do Lexical and Contextual Coreference Resolution Systems Degrade Differently under Mention Noise? An Empirical Study on Scientific Software Mentions
标题:词汇和上下文共指解析系统在提及噪音下是否会发生不同的退化?科学软件提及的实证研究
链接:https://arxiv.org/abs/2604.02171
作者:Atilla Kaan Alkan,Felix Grezes,Jennifer Lynn Bartlett,Anna Kelbert,Kelly Lockhart,Alberto Accomazzi
备注:8 pages
摘要:我们介绍了我们在SOMD 2026跨文档软件提及共指消解共享任务中的参与情况,我们的系统在全部三个子任务中均排名第二。我们比较了两种无需微调的方法:模糊匹配(FM),一种词汇字符串相似性方法;以及上下文感知表示(CAR),它结合了提及级和文档级嵌入。两者在所有子任务上都取得了有竞争力的性能(CoNLL F1为0.94-0.96),其中CAR在官方测试集上始终比FM高1分,这与软件名称的高表面规则性一致——这种规则性降低了对复杂语义推理的需求。受控噪声注入研究揭示了互补的失效模式:随着边界噪声增加,CAR从干净输入到完全损坏输入仅损失0.07个F1点,而FM损失0.20;而在提及替换下,FM的退化更为平缓(0.52对0.63)。我们的推理时间分析表明,FM随语料库规模超线性扩展,而CAR近似线性扩展,使CAR成为大规模场景下更高效的选择。这些发现表明,系统选择应同时考虑上游提及检测器的噪声特征和目标语料库的规模。我们发布代码,以支持在这一探索不足的任务上的未来工作。
摘要:We present our participation in the SOMD 2026 shared task on cross-document software mention coreference resolution, where our systems ranked second across all three subtasks. We compare two fine-tuning-free approaches: Fuzzy Matching (FM), a lexical string-similarity method, and Context Aware Representations (CAR), which combines mention-level and document-level embeddings. Both achieve competitive performance across all subtasks (CoNLL F1 of 0.94-0.96), with CAR consistently outperforming FM by 1 point on the official test set, consistent with the high surface regularity of software names, which reduces the need for complex semantic reasoning. A controlled noise-injection study reveals complementary failure modes: as boundary noise increases, CAR loses only 0.07 F1 points from clean to fully corrupted input, compared to 0.20 for FM, whereas under mention substitution, FM degrades more gracefully (0.52 vs. 0.63). Our inference-time analysis shows that FM scales superlinearly with corpus size, whereas CAR scales approximately linearly, making CAR the more efficient choice at large scale. These findings suggest that system selection should be informed by both the noise profile of the upstream mention detector and the scale of the target corpus. We release our code to support future work on this underexplored task.
【7】Cross-Modal Visuo-Tactile Object Perception
标题:跨模式视觉触觉物体感知
链接:https://arxiv.org/abs/2604.02108
作者:Anirvan Dutta,Simone Tasciotti,Claudia Cusseddu,Ang Li,Panayiota Poirazi,Julijana Gjorgjieva,Etienne Burdet,Patrick van der Smagt,Mohsen Kaboli
备注:23 pages, 8 figures, 1 table. Submitted for review to journal
摘要:估计物理特性对于安全和高效的自主机器人操作至关重要,特别是在接触丰富的交互期间。在这样的设置中,视觉和触觉感测提供关于对象几何形状、姿态、惯性、刚度和接触动力学(诸如粘滑行为)的补充信息。然而,这些属性仅是间接可观察的,并且不总是能够精确地建模(例如,与非线性接触摩擦耦合的非刚性物体中的变形),使得估计问题固有地复杂并且需要在动作期间持续利用视觉-触觉感觉信息。现有的视觉-触觉感知框架主要强调强有力的传感器融合或静态跨模态对齐,对物体属性的不确定性和信念如何随时间演变的考虑有限。受人类多感官感知和主动推理的启发,我们提出了跨模态潜在过滤器(CMLF)来学习物理对象属性的结构化的因果潜在状态空间。CMLF支持视觉和触觉之间跨模态先验的双向传输,并通过随时间推移而演变的贝叶斯推理过程整合感官证据。真实世界的机器人实验表明,CMLF提高了效率和鲁棒性的不确定性下的潜在物理特性估计相比,基线方法。除了性能增益之外,该模型还表现出与人类相似的感知耦合现象,包括对跨模态错觉的敏感性和学习跨感官关联的相似轨迹。总之,这些结果构成了机器人多感官感知的可推广的,强大的和物理上一致的跨模态集成的重要一步。
摘要:Estimating physical properties is critical for safe and efficient autonomous robotic manipulation, particularly during contact-rich interactions. In such settings, vision and tactile sensing provide complementary information about object geometry, pose, inertia, stiffness, and contact dynamics, such as stick-slip behavior. However, these properties are only indirectly observable and cannot always be modeled precisely (e.g., deformation in non-rigid objects coupled with nonlinear contact friction), making the estimation problem inherently complex and requiring sustained exploitation of visuo-tactile sensory information during action. Existing visuo-tactile perception frameworks have primarily emphasized forceful sensor fusion or static cross-modal alignment, with limited consideration of how uncertainty and beliefs about object properties evolve over time. Inspired by human multi-sensory perception and active inference, we propose the Cross-Modal Latent Filter (CMLF) to learn a structured, causal latent state-space of physical object properties. CMLF supports bidirectional transfer of cross-modal priors between vision and touch and integrates sensory evidence through a Bayesian inference process that evolves over time. Real-world robotic experiments demonstrate that CMLF improves the efficiency and robustness of latent physical properties estimation under uncertainty compared to baseline approaches. Beyond performance gains, the model exhibits perceptual coupling phenomena analogous to those observed in humans, including susceptibility to cross-modal illusions and similar trajectories in learning cross-sensory associations. Together, these results constitute a significant step toward generalizable, robust and physically consistent cross-modal integration for robotic multi-sensory perception.
【8】Abnormal Head Movements in Neurological Conditions: A Knowledge-Based Dataset with Application to Cervical Dystonia
标题:神经系统疾病中的异常头部运动:基于知识的数据集应用于颈性肌张力障碍
链接:https://arxiv.org/abs/2604.01962
作者:Saja Al-Dabet,Sherzod Turaev,Nazar Zaki
摘要:异常头部运动(AHM)表现在广泛的神经系统疾病中;然而,缺乏整合运动学测量,临床严重程度评分和患者人口统计学的多条件资源构成了AI驱动诊断工具开发的持续障碍。为了解决这一差距,本研究引入了NeuroPose-AHM,这是一个基于知识的神经诱导AHM数据集,通过应用于1,430篇同行评审出版物的多LLM提取框架构建。该数据集包含来自846篇AHM相关论文的2,756份患者组水平记录,涵盖57种神经系统疾病。LLM间可靠性分析证实了稳健的提取性能,研究级分类达到了高度一致(kappa = 0.822)。为了证明数据集的分析效用,一个四任务框架应用于颈部肌张力障碍(CD),最直接定义的病理头部运动的条件。首先,任务1执行多标签AHM类型分类(F1 = 0.856)。任务2构建了头颈部严重程度指数(HNSI),这是一个统一的度量标准,将异质性临床评级量表标准化。然后在任务3中评价该指数的临床相关性,其中HNSI根据真实世界CD患者数据进行验证,对齐的严重度带比例(6.7%)为高严重度范围内的指数校准提供了初步的可行性指示。最后,任务4在运动类型概率和HNSI分数之间进行桥接分析,产生显著相关性(p小于0.001)。这些结果表明,NeuroPose-AHM作为一个结构化的,基于知识的资源,神经AHM研究的分析工具。NeuroPose-AHM数据集可在Zenodo上公开获取(https://doi.org/10.5281/zenodo.19386862)。
摘要:Abnormal head movements (AHMs) manifest across a broad spectrum of neurological disorders; however, the absence of a multi-condition resource integrating kinematic measurements, clinical severity scores, and patient demographics constitutes a persistent barrier to the development of AI-driven diagnostic tools. To address this gap, this study introduces NeuroPose-AHM, a knowledge-based dataset of neurologically induced AHMs constructed through a multi-LLM extraction framework applied to 1,430 peer-reviewed publications. The dataset contains 2,756 patient-group-level records spanning 57 neurological conditions, derived from 846 AHM-relevant papers. Inter-LLM reliability analysis confirms robust extraction performance, with study-level classification achieving strong agreement (kappa = 0.822). To demonstrate the dataset's analytical utility, a four-task framework is applied to cervical dystonia (CD), the condition most directly defined by pathological head movement. First, Task 1 performs multi-label AHM type classification (F1 = 0.856). Task 2 constructs the Head-Neck Severity Index (HNSI), a unified metric that normalizes heterogeneous clinical rating scales. The clinical relevance of this index is then evaluated in Task 3, where HNSI is validated against real-world CD patient data, with aligned severe-band proportions (6.7%) providing a preliminary plausibility indication for index calibration within the high severity range. Finally, Task 4 performs bridge analysis between movement-type probabilities and HNSI scores, producing significant correlations (p less than 0.001). These results demonstrate the analytical utility of NeuroPose-AHM as a structured, knowledge-based resource for neurological AHM research. The NeuroPose-AHM dataset is publicly available on Zenodo (https://doi.org/10.5281/zenodo.19386862).
【9】annbatch unlocks terabyte-scale training of biological data in anndata
标题:annbatch解锁anndata中生物数据TB规模的训练
链接:https://arxiv.org/abs/2604.01949
作者:Ilan Gold,Felix Fischer,Lucas Arnoldt,F. Alexander Wolf,Fabian J. Theis
摘要:生物数据集的规模现在通常超过系统内存,使得数据访问而不是模型计算成为训练机器学习模型的主要瓶颈。这一瓶颈在生物学中尤为严重,在生物学中,广泛使用的社区数据格式必须支持异构元数据、稀疏和密集分析以及已建立的计算生态系统中的下游分析。在这里,我们介绍了annbatch,一个原生于anndata的小型批量加载器,它可以直接在磁盘支持的数据集上进行核外训练。在单细胞转录组学、显微镜和全基因组测序基准测试中,annbatch将加载吞吐量提高了一个数量级,并将训练时间从几天缩短到几小时,同时保持与scverse生态系统的完全兼容。Annbatch为可扩展的生物AI建立了一个实用的数据加载基础设施,允许在不放弃标准生物数据格式的情况下使用越来越大和多样化的数据集。Github:https://github.com/scverse/annbatch
摘要:The scale of biological datasets now routinely exceeds system memory, making data access rather than model computation the primary bottleneck in training machine-learning models. This bottleneck is particularly acute in biology, where widely used community data formats must support heterogeneous metadata, sparse and dense assays, and downstream analysis within established computational ecosystems. Here we present annbatch, a mini-batch loader native to anndata that enables out-of-core training directly on disk-backed datasets. Across single-cell transcriptomics, microscopy and whole-genome sequencing benchmarks, annbatch increases loading throughput by up to an order of magnitude and shortens training from days to hours, while remaining fully compatible with the scverse ecosystem. Annbatch establishes a practical data-loading infrastructure for scalable biological AI, allowing increasingly large and diverse datasets to be used without abandoning standard biological data formats. Github: https://github.com/scverse/annbatch
【10】Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks
标题:语言预训练诱导的偏差:通用视觉任务的坚实基础
链接:https://arxiv.org/abs/2604.01833
作者:Yaxin Luo,Zhiqiang Shen
摘要:语言预训练模型和视觉预训练模型中离群参数的比例存在显著差异,这使得跨模态(语言和视觉)本质上比跨域适应更具挑战性。因此,许多先前的研究都集中在跨域迁移上,而不是试图桥接语言和视觉模态,假设语言预训练模型由于不同的参数空间而不适合下游视觉任务。与此假设相反,我们表明,添加一个桥梁训练阶段作为模态适应学习器可以有效地调整大语言模型(LLM)参数与视觉任务。具体来说,我们提出了一个简单而强大的解决方案随机标签桥训练,不需要手动标记,并帮助LLM参数适应视觉基础任务。此外,我们的研究结果表明,部分桥接训练通常是有利的,因为LLM中的某些层表现出强大的基础属性,即使没有对视觉任务进行微调,这些属性也仍然是有益的。这一令人惊讶的发现为直接在视觉模型中利用语言预训练参数开辟了新的途径,并突出了部分桥接训练作为跨模态适应的实用途径的潜力。
摘要:The ratio of outlier parameters in language pre-training models and vision pre-training models differs significantly, making cross-modality (language and vision) inherently more challenging than cross-domain adaptation. As a result, many prior studies have focused on cross-domain transfer rather than attempting to bridge language and vision modalities, assuming that language pre-trained models are unsuitable for downstream visual tasks due to disparate parameter spaces. Contrary to this assumption, we show that adding a bridge training stage as a modality adaptation learner can effectively align Large Language Model (LLM) parameters with vision tasks. Specifically, we propose a simple yet powerful solution random label bridge training that requires no manual labeling and helps LLM parameters adapt to vision foundation tasks. Moreover, our findings reveal that partial bridge training is often advantageous, as certain layers in LLMs exhibit strong foundational properties that remain beneficial even without fine-tuning for visual tasks. This surprising discovery opens up new avenues for leveraging language pre-trained parameters directly within vision models and highlights the potential of partial bridge training as a practical pathway to cross-modality adaptation.
【11】MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning
标题:MiCA比LoRA和全量微调学到更多知识
链接:https://arxiv.org/abs/2604.01694
作者:Sten Rüdiger,Sebastian Raschka
摘要:次要成分适应(Minor Component Adaptation,MiCA)是一种面向大型语言模型的新型参数高效微调方法,其重点是适应模型表示中未被充分利用的子空间。与低秩适应(LoRA)等针对主导子空间的传统方法不同,MiCA利用奇异值分解识别由最不重要奇异值对应的次要奇异向量所张成的子空间,并将微调期间的参数更新限制在这些方向上。与LoRA相比,该策略在优化的训练超参数下将知识获取最多提高5.9倍,且参数占用仅为其6-60%。这些结果表明,将适应限制在次要奇异方向上,为把新知识整合进预训练语言模型提供了一种更高效、更稳定的机制。
摘要:Minor Component Adaptation (MiCA) is a novel parameter-efficient fine-tuning method for large language models that focuses on adapting underutilized subspaces of model representations. Unlike conventional methods such as Low-Rank Adaptation (LoRA), which target dominant subspaces, MiCA leverages Singular Value Decomposition to identify subspaces related to minor singular vectors associated with the least significant singular values and constrains the update of parameters during fine-tuning to those directions. This strategy leads to up to 5.9x improvement in knowledge acquisition under optimized training hyperparameters and a minimal parameter footprint of 6-60% compared to LoRA. These results suggest that constraining adaptation to minor singular directions provides a more efficient and stable mechanism for integrating new knowledge into pre-trained language models.
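"将微调更新限制在次要奇异方向"这一核心思想可以用一个投影草图说明:对冻结的预训练权重做SVD,取最小奇异值对应的奇异向量,把任务梯度投影到它们张成的子空间后再更新。以下仅为示意(MiCA的实际参数化细节以论文为准,函数名为假设):

```python
import numpy as np

def constrained_step(W, G, lr, k=8):
    # Illustrative sketch of a minor-subspace-constrained update.
    # Um / Vmt hold the k least-significant singular directions of the
    # frozen pre-trained W; the raw gradient G is projected onto them
    # so dominant directions stay untouched.
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    Um, Vmt = U[:, -k:], Vt[-k:, :]
    G_minor = Um @ (Um.T @ G @ Vmt.T) @ Vmt
    return W - lr * G_minor

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
G = rng.normal(size=(64, 32))        # stand-in for a task gradient
W_new = constrained_step(W, G, lr=1e-2, k=8)
```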
【12】Coupled Query-Key Dynamics for Attention
标题:用于注意力的耦合查询-键动态
链接:https://arxiv.org/abs/2604.01683
作者:Barak Gahtan,Alex M. Bronstein
摘要:标准的缩放点积注意力从输入的静态独立投影计算得分。我们表明,在打分之前通过共享的学习动态联合演化查询和键——我们称之为耦合QK动态——可以改善语言建模困惑度和训练稳定性。在WikiText-103上、60M参数规模下,耦合动态实现了22.55-22.62的困惑度,而标准注意力为24.22($-$6.6%至$-$6.9%),仅增加0.11%的参数(在两个实例间共享)。结构消融将耦合分离为有效成分:当Q和K均被耦合时,辛(Hamilton)积分器与非辛(Euler)积分器性能相同,而容量匹配的非耦合MLP基线仅达到23.81,且种子方差高出8倍。积分步数(1-7)同样无关紧要——单个耦合步就足够了。计算量匹配的比较表明,耦合是一种样本效率机制:标准注意力训练2.4倍时长(匹配挂钟时间)可达到相同的困惑度,但需要2.4倍的token。该优势可扩展到1.5亿参数($-$6.7%),但在3.5亿参数时缩小($-$1.0%),此时差分注意力(18.93)超过耦合动态(19.35)。收益依赖于语料:耦合有助于领域连贯文本(WikiText-103 $-$6.6%,PubMed $-$4.5%),但在异构网络文本上退化($+$10.3%),在GLUE上未见收益。我们刻画了耦合何时有帮助、何时没有,并提供了实用指南。
摘要:Standard scaled dot-product attention computes scores from static, independent projections of the input. We show that evolving queries and keys \emph{jointly} through shared learned dynamics before scoring - which we call \textbf{coupled QK dynamics} - improves language modeling perplexity and training stability. On WikiText-103 at 60M parameters, coupled dynamics achieves 22.55--22.62 perplexity vs.\ 24.22 for standard attention ($-$6.6--6.9\%), with only 0.11\% additional parameters (shared across both instantiations). A structural ablation isolates coupling as the active ingredient: a symplectic (Hamiltonian) and a non-symplectic (Euler) integrator perform identically when both couple Q and K, while an uncoupled MLP baseline of matched capacity reaches only 23.81 with 8$\times$ higher seed variance. The integration step count (1--7) is similarly irrelevant - a single coupled step suffices. A compute-matched comparison reveals that coupling is a \emph{sample-efficiency} mechanism: standard attention trained for 2.4$\times$ longer (matching wall-clock) reaches the same perplexity, but requires 2.4$\times$ more tokens. The advantage scales to 150M ($-$6.7\%) but narrows at 350M ($-$1.0\%), where Differential Attention (18.93) overtakes coupled dynamics (19.35). The benefit is corpus-dependent: coupling helps on domain-coherent text (WikiText-103 $-$6.6\%, PubMed $-$4.5\%) but degrades on heterogeneous web text ($+$10.3\%) and shows no benefit on GLUE. We characterize when coupling helps and when it does not, providing practical guidelines.
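"在打分前让Q与K经由共享动态联合演化"可以用一个Euler式单步的玩具实现说明(与摘要中的消融一致:耦合形式本身比积分器类型更关键)。共享映射f的具体结构为假设,并非论文实现:

```python
import torch

def coupled_qk_attention(x, Wq, Wk, Wv, f, eps=0.1):
    # Illustrative sketch, not the paper's exact integrator.
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    Q1 = Q + eps * f(K)      # each stream is driven by the other
    K1 = K + eps * f(Q)      # through the shared learned dynamics f
    d = Q.size(-1)
    attn = torch.softmax(Q1 @ K1.transpose(-2, -1) / d ** 0.5, dim=-1)
    return attn @ V

d = 16
f = torch.nn.Linear(d, d)                     # shared learned dynamics
x = torch.randn(2, 10, d)                     # (batch, tokens, dim)
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
out = coupled_qk_attention(x, Wq, Wk, Wv, f)  # (2, 10, 16)
```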
【13】Pseudo-Quantized Actor-Critic Algorithm for Robustness to Noisy Temporal Difference Error
标题:对噪声时间差误差具有鲁棒性的伪量化Actor-Critic算法
链接:https://arxiv.org/abs/2604.01613
作者:Taisuke Kobayashi
备注:38 pages, 12 figures
摘要:在强化学习(RL)中,时间差(TD)误差被广泛用于优化价值函数和策略函数。然而,由于TD误差由自举方法定义,其计算往往带有噪声,从而使学习不稳定。迄今已引入诸如目标网络和集成模型等启发式方法来提高TD误差的准确性。虽然这些是当前深度RL算法的基本手段,但它们会带来计算成本增加和学习效率降低等副作用。因此,本文重新审视了基于"控制即推理"(control as inference)的TD学习算法,推导出一种能够对噪声TD误差鲁棒学习的新算法。首先,最优性(一个二元随机变量)的分布模型由一个sigmoid函数表示。结合前向与反向Kullback-Leibler散度,这一新模型导出了一条鲁棒学习规则:当sigmoid函数因(可能由噪声导致的)大TD误差而饱和时,梯度消失,从而隐式地将其排除在学习之外。此外,这两种散度表现出不同的梯度消失特性。基于这些分析,最优性被分解为多个级别,以实现TD误差的伪量化,旨在进一步降低噪声。此外,还近似推导了一种基于Jensen-Shannon散度的方法,以继承两种散度的特性。这些优点通过RL基准得到验证,即使在启发式方法不足或奖励包含噪声的情况下,也能实现稳定学习。
摘要:In reinforcement learning (RL), temporal difference (TD) errors are widely adopted for optimizing value and policy functions. However, since the TD error is defined by a bootstrap method, its computation tends to be noisy and destabilize learning. Heuristics to improve the accuracy of TD errors, such as target networks and ensemble models, have been introduced so far. While these are essential approaches for the current deep RL algorithms, they cause side effects like increased computational cost and reduced learning efficiency. Therefore, this paper revisits the TD learning algorithm based on control as inference, deriving a novel algorithm capable of robust learning against noisy TD errors. First, the distribution model of optimality, a binary random variable, is represented by a sigmoid function. Alongside forward and reverse Kullback-Leibler divergences, this new model derives a robust learning rule: when the sigmoid function saturates with a large TD error probably due to noise, the gradient vanishes, implicitly excluding it from learning. Furthermore, the two divergences exhibit distinct gradient-vanishing characteristics. Building on these analyses, the optimality is decomposed into multiple levels to achieve pseudo-quantization of TD errors, aiming for further noise reduction. Additionally, a Jensen-Shannon divergence-based approach is approximately derived to inherit the characteristics of both divergences. These benefits are verified through RL benchmarks, demonstrating stable learning even when heuristics are insufficient or rewards contain noise.
【14】Training In-Context and In-Weights Mixtures Via Contrastive Context Sampling
标题:通过对比上下文采样训练上下文内与权重内混合
链接:https://arxiv.org/abs/2604.01601
作者:Deeptanshu Malu,Deevyanshu Malu,Aditya Nemiwal,Sunita Sarawagi
摘要:我们研究能够共同发展上下文内学习(ICL)与权重内学习(IWL)、并根据上下文相关性在二者之间切换的训练策略。虽然当前的LLM同时表现出这两种模式,但标准的任务特定微调往往会侵蚀ICL,这促使了IC-Train——带上下文示例的微调。先前的工作表明,IC-Train之后ICL的涌现取决于任务多样性和训练时长等因素。在本文中,我们表明目标输入与上下文示例之间的相似性结构同样起着重要作用。随机上下文会导致ICL丧失、IWL占主导;而上下文中只有相似示例则导致ICL退化为不顾相关性地复制标签。为解决这一问题,我们提出了简单的Contrastive-Context方法,它施加两类对比:(1)在单个上下文内混合相似示例与随机示例,以演化出正确形式的ICL;(2)在不同上下文间改变相似度等级,以演化出ICL-IWL混合。我们通过对一个最小模型的理论分析,给出了这种对比为何重要的见解。我们在四个LLM和多个任务上进行了广泛的实证评估。诊断探针证实,经过对比的上下文能产生稳定的ICL-IWL混合,避免坍缩为纯ICL、纯IWL或单纯复制。
摘要:We investigate training strategies that co-develop in-context learning (ICL) and in-weights learning (IWL), and the ability to switch between them based on context relevance. Although current LLMs exhibit both modes, standard task-specific fine-tuning often erodes ICL, motivating IC-Train - fine-tuning with in-context examples. Prior work has shown that emergence of ICL after IC-Train depends on factors such as task diversity and training duration. In this paper we show that the similarity structure between target inputs and context examples also plays an important role. Random context leads to loss of ICL and IWL dominance, while only similar examples in context causes ICL to degenerate to copying labels without regard to relevance. To address this, we propose a simple Contrastive-Context which enforces two types of contrasts: (1) mix of similar and random examples within a context to evolve a correct form of ICL, and (2) varying grades of similarity across contexts to evolve ICL-IWL mixtures. We present insights on the importance of such contrast with theoretical analysis of a minimal model. We validate with extensive empirical evaluation on four LLMs and several tasks. Diagnostic probes confirm that contrasted contexts yield stable ICL-IWL mixtures, avoiding collapse into pure ICL, IWL, or copying.
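摘要中第(1)类对比——"在单个上下文内混合相似示例与随机示例"——可用如下草图说明。检索方式(嵌入余弦相似度)与各参数均为假设,仅示意上下文的构造方式:

```python
import numpy as np

def build_contrastive_context(query, pool, embed, k_sim=2, k_rand=2, rng=None):
    # Sketch: compose an in-context example set mixing retrieved-similar
    # and uniformly random items (hypothetical retrieval scheme).
    rng = rng or np.random.default_rng()
    q = embed(query)
    E = np.stack([embed(x) for x in pool])
    sims = E @ q / (np.linalg.norm(E, axis=1) * np.linalg.norm(q) + 1e-9)
    sim_idx = np.argsort(-sims)[:k_sim]
    rest = np.setdiff1d(np.arange(len(pool)), sim_idx)
    rand_idx = rng.choice(rest, size=k_rand, replace=False)
    return [pool[i] for i in np.concatenate([sim_idx, rand_idx])]

pool = ["2+2=4", "3+5=8", "cat -> animal", "7+1=8", "dog -> animal", "9-4=5"]
embed = lambda s: np.array([s.count(c) for c in "0123456789+-=>acdglmnot"], float)
print(build_contrastive_context("6+3=9", pool, embed))
```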
【15】Care-Conditioned Neuromodulation for Autonomy-Preserving Supportive Dialogue Agents
标题:面向保持自主性的支持性对话代理的关怀条件神经调节
链接:https://arxiv.org/abs/2604.01576
作者:Shalima Binta Manir,Tim Oates
摘要:部署在支持性或咨询角色中的大型语言模型必须在有用性与维护用户自主性之间取得平衡,然而标准对齐方法主要针对有用性和无害性进行优化,没有显式建模诸如依赖强化、过度保护或强制性引导等关系风险。我们介绍关怀条件神经调节(Care-Conditioned Neuromodulation,CCN),这是一个状态依赖的控制框架,其中由结构化用户状态和对话上下文导出的学习标量信号调控响应生成与候选选择。我们将这一设置形式化为保持自主性的对齐问题,并定义了一个效用函数,它奖励自主性支持和有用性,同时惩罚依赖和强制。我们还构建了一个多轮对话中关系失效模式的基准,包括安慰依赖、操纵性关怀、过度保护和边界不一致。在该基准上,关怀条件候选生成与基于效用的重排序相结合,使保持自主性的效用比监督微调提高+0.25,比偏好优化基线提高+0.07,同时保持相当的支持性。初步人工评估以及向真实情感支持对话的zero-shot迁移显示出与自动化指标方向一致的结果。这些结果表明,状态依赖控制与基于效用的选择相结合,是对自主性敏感对话进行多目标对齐的一种实用方法。
摘要:Large language models deployed in supportive or advisory roles must balance helpfulness with preservation of user autonomy, yet standard alignment methods primarily optimize for helpfulness and harmlessness without explicitly modeling relational risks such as dependency reinforcement, overprotection, or coercive guidance. We introduce Care-Conditioned Neuromodulation (CCN), a state-dependent control framework in which a learned scalar signal derived from structured user state and dialogue context conditions response generation and candidate selection. We formalize this setting as an autonomy-preserving alignment problem and define a utility function that rewards autonomy support and helpfulness while penalizing dependency and coercion. We also construct a benchmark of relational failure modes in multi-turn dialogue, including reassurance dependence, manipulative care, overprotection, and boundary inconsistency. On this benchmark, care-conditioned candidate generation combined with utility-based reranking improves autonomy-preserving utility by +0.25 over supervised fine-tuning and +0.07 over preference optimization baselines while maintaining comparable supportiveness. Pilot human evaluation and zero-shot transfer to real emotional-support conversations show directional agreement with automated metrics. These results suggest that state-dependent control combined with utility-based selection is a practical approach to multi-objective alignment in autonomy-sensitive dialogue.
【16】EXHIB: A Benchmark for Realistic and Diverse Evaluation of Function Similarity in the Wild
标题:EXHIB:用于真实且多样的野外二进制函数相似性评估的基准
链接:https://arxiv.org/abs/2604.01554
作者:Yiming Fan,Jun Yeon Won,Ding Zhu,Melih Sirlanci,Mahdi Khalili,Carter Yagemann
备注:13 pages, 7 figures. This is a technical report for the EXHIB benchmark. Code and data are available at https://github.com/fan1192/bfsd-anon-artifact
摘要:二进制函数相似性检测(Binary Function Similarity Detection,BFSD)是软件安全领域的核心问题,支持漏洞分析、恶意软件分类和补丁溯源等任务。在过去几十年里,已经为这一应用开发了许多模型和工具;然而,由于该领域缺乏全面的通用基准,研究人员一直难以有效比较不同模型。现有数据集范围有限,往往只关注一组狭窄的变换或二进制文件类型,无法反映真实应用的全部多样性。我们介绍了EXHIB,一个由从野外收集的五个真实数据集组成的基准,每个数据集突出BFSD问题空间的一个不同侧面。我们在EXHIB上评估了9个跨越多种BFSD范式的代表性模型,观察到与标准设置相比,固件和语义数据集上的性能下降高达30%,揭示了显著的泛化差距。我们的结果表明,对低层和中层二进制变化的鲁棒性并不能推广到高层语义差异,凸显了当前BFSD评估实践中的一个关键盲点。
摘要:Binary Function Similarity Detection (BFSD) is a core problem in software security, supporting tasks such as vulnerability analysis, malware classification, and patch provenance. In the past few decades, numerous models and tools have been developed for this application; however, due to the lack of a comprehensive universal benchmark in this field, researchers have struggled to compare different models effectively. Existing datasets are limited in scope, often focusing on a narrow set of transformations or types of binaries, and fail to reflect the full diversity of real-world applications. We introduce EXHIB, a benchmark comprising five realistic datasets collected from the wild, each highlighting a distinct aspect of the BFSD problem space. We evaluate 9 representative models spanning multiple BFSD paradigms on EXHIB and observe performance degradations of up to 30% on firmware and semantic datasets compared to standard settings, revealing substantial generalization gaps. Our results show that robustness to low- and mid-level binary variations does not generalize to high-level semantic differences, underscoring a critical blind spot in current BFSD evaluation practices.
【17】ProdCodeBench: A Production-Derived Benchmark for Evaluating AI Coding Agents
标题:ProdCodeBench:用于评估人工智能编码代理的生产衍生基准
链接:https://arxiv.org/abs/2604.01527
作者:Smriti Jha,Matteo Paltenghi,Chandra Maddila,Vijayaraghavan Murali,Shubham Ugare,Satish Chandra
摘要:反映生产工作负载的基准更适合在工业环境中评估AI编码代理,但现有基准在编程语言分布、提示风格和代码库结构上与真实使用情况不同。本文提出了一种策划生产衍生基准的方法,并通过ProdCodeBench加以说明——这是一个基于生产级AI编码助手的真实会话构建的基准。我们详细介绍了数据收集与整理实践,包括基于LLM的任务分类、测试相关性验证和多次运行稳定性检查,这些实践解决了从monorepo环境构建可靠评估信号的挑战。每个精选样本都包含一个逐字提示、一次已提交的代码变更,以及覆盖七种编程语言的"从失败到通过"测试。我们对四个基础模型的系统分析得到53.2%到72.2%不等的解决率,表明更多使用工作验证工具(如执行测试和调用静态分析)的模型能取得更高的解决率。这表明迭代验证有助于实现有效的代理行为,并且暴露代码库特定的验证机制可以显著提高在不熟悉环境中运行的外部训练代理的性能。我们分享我们的方法和经验教训,使其他组织能够构建类似的生产衍生基准。
摘要:Benchmarks that reflect production workloads are better for evaluating AI coding agents in industrial settings, yet existing benchmarks differ from real usage in programming language distribution, prompt style and codebase structure. This paper presents a methodology for curating production-derived benchmarks, illustrated through ProdCodeBench - a benchmark built from real sessions with a production AI coding assistant. We detail our data collection and curation practices including LLM-based task classification, test relevance validation, and multi-run stability checks which address challenges in constructing reliable evaluation signals from monorepo environments. Each curated sample consists of a verbatim prompt, a committed code change and fail-to-pass tests spanning seven programming languages. Our systematic analysis of four foundation models yields solve rates from 53.2% to 72.2% revealing that models making greater use of work validation tools, such as executing tests and invoking static analysis, achieve higher solve rates. This suggests that iterative verification helps achieve effective agent behavior and that exposing codebase-specific verification mechanisms may significantly improve the performance of externally trained agents operating in unfamiliar environments. We share our methodology and lessons learned to enable other organizations to construct similar production-derived benchmarks.
【18】Beyond Logit Adjustment: A Residual Decomposition Framework for Long-Tailed Reranking
标题:超越Logit调整:长尾重新排名的剩余分解框架
链接:https://arxiv.org/abs/2604.01506
作者:Zhanliang Wang,Hongzhuo Chen,Quan Minh Nguyen,Mian Umair Ahsan,Kai Wang
备注:Preprint
摘要:长尾分类中少数高频类主导大量稀有类,由于模型在推理时系统性偏向高频类,该问题仍具挑战性。诸如logit调整等现有事后方法通过在基础模型logit上加一个固定的按类偏移来解决此问题。然而,恢复两个类别相对排序所需的校正未必在不同输入间保持恒定,固定偏移无法适应这种变化。我们通过在基础模型top-k候选名单上的贝叶斯最优重排序来研究这个问题。最优得分与基础得分之间的差距,即残差校正,可分解为在每个类内保持恒定的按类分量,以及依赖于输入和竞争标签的成对分量。当残差纯粹是按类的,固定偏移足以恢复贝叶斯最优排序。我们进一步证明,当同一标签对在不同上下文中诱导出不相容的排序约束时,任何固定偏移都无法实现这种恢复。这一分解给出了关于成对校正何时能、何时不能提高性能的可检验预测。我们开发了REPAIR(通过成对残差校正重排序),这是一个轻量级事后重排序器,它结合了经收缩稳定化的按类项和由候选名单上的竞争特征驱动的线性成对项。在涵盖图像分类、物种识别、场景识别和罕见病诊断的五个基准上的实验证实,该分解能解释成对校正在何处有益、何处仅按类校正即已足够。
摘要:Long-tailed classification, where a small number of frequent classes dominate many rare ones, remains challenging because models systematically favor frequent classes at inference time. Existing post-hoc methods such as logit adjustment address this by adding a fixed classwise offset to the base-model logits. However, the correction required to restore the relative ranking of two classes need not be constant across inputs, and a fixed offset cannot adapt to such variation. We study this problem through Bayes-optimal reranking on a base-model top-k shortlist. The gap between the optimal score and the base score, the residual correction, decomposes into a classwise component that is constant within each class, and a pairwise component that depends on the input and competing labels. When the residual is purely classwise, a fixed offset suffices to recover the Bayes-optimal ordering. We further show that when the same label pair induces incompatible ordering constraints across contexts, no fixed offset can achieve this recovery. This decomposition leads to testable predictions regarding when pairwise correction can improve performance and when cannot. We develop REPAIR (Reranking via Pairwise residual correction), a lightweight post-hoc reranker that combines a shrinkage-stabilized classwise term with a linear pairwise term driven by competition features on the shortlist. Experiments on five benchmarks spanning image classification, species recognition, scene recognition, and rare disease diagnosis confirm that the decomposition explains where pairwise correction helps and where classwise correction alone suffices.
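"按类偏移+成对校正"的重排序骨架可示意如下。其中class_offset可取经典logit调整式的先验项,pairwise_fn是假设的占位函数;REPAIR实际使用的收缩稳定化按类项与竞争特征以论文为准:

```python
import numpy as np

def rerank_scores(logits, class_offset, pairwise_fn, topk=5):
    # Sketch: rerank a top-k shortlist with classwise + pairwise terms.
    # pairwise_fn(i, j, logits) is a hypothetical input-dependent
    # correction of class i against competing class j.
    shortlist = np.argsort(-logits)[:topk]
    scores = {}
    for i in shortlist:
        pair = sum(pairwise_fn(i, j, logits) for j in shortlist if j != i)
        scores[i] = logits[i] + class_offset[i] + pair
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
logits = rng.normal(size=100)
prior = rng.dirichlet(np.ones(100))
offset = -np.log(prior)             # logit-adjustment-style classwise term
pred = rerank_scores(logits, offset, lambda i, j, z: 0.0)
```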
【19】When AI Gets it Wong: Reliability and Risk in AI-Assisted Medication Decision Systems
标题:当人工智能出错:人工智能辅助用药决策系统的可靠性与风险
链接:https://arxiv.org/abs/2604.01449
作者:Khalid Adnan Alsayed
备注:9 pages, 1 figure. Position paper with simulated experimental analysis of AI reliability in medication decision systems
摘要:人工智能(AI)系统越来越多地集成到医疗保健和药房工作流程中,支持药物推荐、剂量确定和药物相互作用检测等任务。虽然这些系统通常在标准评估指标下表现出强大的性能,但它们在现实世界决策中的可靠性仍然没有得到充分的理解。在药物管理等高风险领域,即使是一个错误的建议也可能导致严重的患者伤害。本文通过关注系统故障及其潜在的临床后果来研究人工智能辅助药物系统的可靠性。这项工作不是仅仅通过综合指标来评估性能,而是将注意力转移到错误是如何发生的以及当人工智能系统产生不正确的输出时会发生什么。通过一系列涉及药物相互作用和剂量决策的受控模拟场景,我们分析了不同类型的系统故障,包括错过的相互作用,不正确的风险标记和不适当的剂量建议。研究结果强调,与药物相关的人工智能错误可能导致药物不良反应、无效治疗或延迟护理,特别是在没有足够的人为监督的情况下使用系统时。此外,本文还讨论了过度依赖人工智能建议的风险以及决策过程透明度有限所带来的挑战。这项工作为医疗保健领域的AI评估提供了一个以可靠性为中心的视角,强调了理解失败行为和现实影响的重要性。它强调了用风险意识评估方法补充传统绩效指标的必要性,特别是在药房实践等安全关键领域。
摘要:Artificial intelligence (AI) systems are increasingly integrated into healthcare and pharmacy workflows, supporting tasks such as medication recommendations, dosage determination, and drug interaction detection. While these systems often demonstrate strong performance under standard evaluation metrics, their reliability in real-world decision-making remains insufficiently understood. In high-risk domains such as medication management, even a single incorrect recommendation can result in severe patient harm. This paper examines the reliability of AI-assisted medication systems by focusing on system failures and their potential clinical consequences. Rather than evaluating performance solely through aggregate metrics, this work shifts attention towards how errors occur and what happens when AI systems produce incorrect outputs. Through a series of controlled, simulated scenarios involving drug interactions and dosage decisions, we analyse different types of system failures, including missed interactions, incorrect risk flagging, and inappropriate dosage recommendations. The findings highlight that AI errors in medication-related contexts can lead to adverse drug reactions, ineffective treatment, or delayed care, particularly when systems are used without sufficient human oversight. Furthermore, the paper discusses the risks of over-reliance on AI recommendations and the challenges posed by limited transparency in decision-making processes. This work contributes a reliability-focused perspective on AI evaluation in healthcare, emphasising the importance of understanding failure behavior and real-world impact. It highlights the need to complement traditional performance metrics with risk-aware evaluation approaches, particularly in safety-critical domains such as pharmacy practice.
【20】Generative Profiling for Soft Real-Time Systems and its Applications to Resource Allocation
标题:软实时系统生成性剖析及其在资源分配中的应用
链接:https://arxiv.org/abs/2604.01441
作者:Georgiy A. Bondar,Abigail Eisenklam,Yifan Cai,Robert Gifford,Tushar Sial,Linh Thi Xuan Phan,Abhishek Halder
摘要:现代实时系统需要精确的任务定时行为特征,以确保可预测的性能,特别是在复杂的硬件架构上。现有的方法,例如最坏情况执行时间分析,通常不能捕获在变化的资源上下文(例如,高速缓存、存储器带宽和CPU频率的分配),这是实现有效资源利用所必需的。在本文中,我们介绍了一种新的生成分析方法,合成上下文相关的,细粒度的实时任务的时间配置文件,包括那些不可测量的资源分配。我们的方法利用非参数,有条件的多边缘薛定谔桥(MSB)制定生成准确的执行配置文件看不见的资源上下文,最大似然保证。我们通过真实世界的基准测试证明了我们的方法的效率和有效性,并在实时系统的自适应多核资源分配的代表性案例研究中展示了其实际效用。
摘要:Modern real-time systems require accurate characterization of task timing behavior to ensure predictable performance, particularly on complex hardware architectures. Existing methods, such as worst-case execution time analysis, often fail to capture the fine-grained timing behaviors of a task under varying resource contexts (e.g., an allocation of cache, memory bandwidth, and CPU frequency), which is necessary to achieve efficient resource utilization. In this paper, we introduce a novel generative profiling approach that synthesizes context-dependent, fine-grained timing profiles for real-time tasks, including those for unmeasured resource allocations. Our approach leverages a nonparametric, conditional multi-marginal Schrödinger Bridge (MSB) formulation to generate accurate execution profiles for unseen resource contexts, with maximum likelihood guarantees. We demonstrate the efficiency and effectiveness of our approach through real-world benchmarks, and showcase its practical utility in a representative case study of adaptive multicore resource allocation for real-time systems.
【21】Improving Latent Generalization Using Test-time Compute
标题:利用测试时计算改进潜在泛化
链接:https://arxiv.org/abs/2604.01430
作者:Arslan Chaudhry,Sridhar Thiagarajan,Andrew Lampinen
摘要:语言模型(LM)表现出两种不同的知识获取机制:权重内学习(即在模型权重中编码信息)和上下文内学习(ICL)。虽然这两种模式具有互补优势,但权重内学习往往难以支持对内化知识的演绎推理。我们将这种局限刻画为潜在泛化的缺陷,逆转诅咒便是其中一例。相反,上下文内学习表现出高度鲁棒的潜在泛化能力。为提高来自权重内知识的潜在泛化,先前的方法依赖训练时数据增强,但这些技术依赖特定任务、可扩展性差,且无法泛化到分布外知识。为克服这些缺点,本工作研究如何教会模型利用测试时计算,即"思考",来专门提高潜在泛化。我们使用来自正确性反馈的强化学习(RL)训练模型生成长思维链(CoT),以提高潜在泛化能力。我们的实验表明,这种思考方法不仅解决了分布内知识上许多潜在泛化失败的实例,而且与增强基线不同,它能泛化到未进行RL训练的新知识。然而,在纯粹的反转任务上,我们发现思考并不能解锁直接的知识反转,但思考模型的"生成并验证"能力使其表现远高于随机水平。事实自我验证的脆弱性意味着思考模型在该任务上仍然远低于上下文内学习的表现。总体而言,我们的结果确立了测试时思考是提高LM潜在泛化的一个灵活且有前景的方向。
摘要:Language Models (LMs) exhibit two distinct mechanisms for knowledge acquisition: in-weights learning (i.e., encoding information within the model weights) and in-context learning (ICL). Although these two modes offer complementary strengths, in-weights learning frequently struggles to facilitate deductive reasoning over the internalized knowledge. We characterize this limitation as a deficit in latent generalization, of which the reversal curse is one example. Conversely, in-context learning demonstrates highly robust latent generalization capabilities. To improve latent generalization from in-weights knowledge, prior approaches rely on train-time data augmentation, yet these techniques are task-specific, scale poorly, and fail to generalize to out-of-distribution knowledge. To overcome these shortcomings, this work studies how models can be taught to use test-time compute, or 'thinking', specifically to improve latent generalization. We use Reinforcement Learning (RL) from correctness feedback to train models to produce long chains-of-thought (CoTs) to improve latent generalization. Our experiments show that this thinking approach not only resolves many instances of latent generalization failures on in-distribution knowledge but also, unlike augmentation baselines, generalizes to new knowledge for which no RL was performed. Nevertheless, on pure reversal tasks, we find that thinking does not unlock direct knowledge inversion, but the generate-and-verify ability of thinking models enables them to get well above chance performance. The brittleness of factual self-verification means thinking models still remain well below the performance of in-context learning for this task. Overall, our results establish test-time thinking as a flexible and promising direction for improving the latent generalization of LMs.
【22】Regularizing Attention Scores with Bootstrapping
标题:利用自举法正则化注意力分数
链接:https://arxiv.org/abs/2604.01339
作者:Neo Christopher Chung,Maxim Laletin
摘要:视觉Transformer(ViT)依赖注意力机制对输入特征加权,因此注意力分数自然被视为其决策过程的解释。然而,注意力分数几乎总是非零的,导致注意力图带噪且弥散,限制了可解释性。我们能否量化注意力分数的不确定性度量并获得正则化的注意力分数?为此,我们在一个统计框架中考虑ViT的注意力分数:在该框架中,独立噪声会产生不显著但非零的分数。借助统计学习技术,我们引入了注意力分数的自举法,它通过对输入特征重采样来生成注意力分数的基线分布。随后利用该自助分布来估计注意力分数的显著性和后验概率。在自然图像和医学图像上,所提出的"注意力正则化"方法展示了一种直接去除噪声引起的虚假注意力的途径,显著改善了收缩性和稀疏性。我们使用模拟数据集和真实数据集进行了定量评估。我们的研究强调,当把注意力分数用作ViT的解释时,自举法是一个实用的正则化工具。代码可用:https://github.com/ncchung/AttentionRegularization
摘要:Vision transformers (ViT) rely on attention mechanism to weigh input features, and therefore attention scores have naturally been considered as explanations for its decision-making process. However, attention scores are almost always non-zero, resulting in noisy and diffused attention maps and limiting interpretability. Can we quantify uncertainty measures of attention scores and obtain regularized attention scores? To this end, we consider attention scores of ViT in a statistical framework where independent noise would lead to insignificant yet non-zero scores. Leveraging statistical learning techniques, we introduce the bootstrapping for attention scores which generates a baseline distribution of attention scores by resampling input features. Such a bootstrap distribution is then used to estimate significances and posterior probabilities of attention scores. In natural and medical images, the proposed \emph{Attention Regularization} approach demonstrates a straightforward removal of spurious attention arising from noise, drastically improving shrinkage and sparsity. Quantitative evaluations are conducted using both simulation and real-world datasets. Our study highlights bootstrapping as a practical regularization tool when using attention scores as explanations for ViT. Code available: https://github.com/ncchung/AttentionRegularization
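摘要中的自举基线分布可以用一个置换式草图说明:打乱键的顺序以破坏真实的查询-键关联,重复多次得到"仅由噪声产生"的分数的零分布,再据此阈值化。论文实际是对输入特征重采样并估计后验概率,此处为简化的假设性版本:

```python
import torch

def bootstrap_attention_null(Q, K, n_boot=100, q=0.95):
    # Sketch: permutation-based null distribution of attention scores.
    d = Q.shape[-1]
    scores = torch.softmax(Q @ K.transpose(-2, -1) / d ** 0.5, dim=-1)
    null = []
    for _ in range(n_boot):
        perm = torch.randperm(K.shape[-2])
        Kp = K[..., perm, :]
        null.append(torch.softmax(Q @ Kp.transpose(-2, -1) / d ** 0.5, dim=-1))
    thresh = torch.quantile(torch.stack(null), q)   # global null q-quantile
    return torch.where(scores > thresh, scores, torch.zeros_like(scores))

Q, K = torch.randn(8, 16), torch.randn(8, 16)
sparse_attn = bootstrap_attention_null(Q, K)   # noise-level scores zeroed out
```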
【23】Evolutionary Multi-Objective Fusion of Deepfake Speech Detectors
标题:Deepfake语音检测器的进化多目标融合
链接:https://arxiv.org/abs/2604.01330
作者:Vojtěch Staněk,Martin Perešíni,Lukáš Sekanina,Anton Firc,Kamil Malinka
备注:Accepted to WCCI CEC 2026
摘要:虽然构建在大型自监督学习(SSL)模型上的深度伪造语音检测器可以达到很高的准确率,但采用标准集成融合来进一步增强鲁棒性往往导致系统过于庞大且收益递减。为解决这一问题,我们提出了一个进化多目标得分融合框架,联合最小化检测错误和系统复杂度。我们探索了由NSGA-II优化的两种编码:用于得分平均的二进制编码检测器选择,以及为加权求和优化检测器权重的实值方案。在使用36个基于SSL的检测器的ASVspoof 5数据集上的实验表明,所得到的Pareto前沿优于简单平均和逻辑回归基线。实值变体实现了2.37%的EER(0.0684 minDCF),并找到了与最先进性能相当、同时显著降低系统复杂度的配置,仅需一半的参数。我们的方法还提供了一组多样的权衡解,使部署时能够在精度与计算成本之间进行选择。
摘要:While deepfake speech detectors built on large self-supervised learning (SSL) models achieve high accuracy, employing standard ensemble fusion to further enhance robustness often results in oversized systems with diminishing returns. To address this, we propose an evolutionary multi-objective score fusion framework that jointly minimizes detection error and system complexity. We explore two encodings optimized by NSGA-II: binary-coded detector selection for score averaging and a real-valued scheme that optimizes detector weights for a weighted sum. Experiments on the ASVspoof 5 dataset with 36 SSL-based detectors show that the obtained Pareto fronts outperform simple averaging and logistic regression baselines. The real-valued variant achieves 2.37% EER (0.0684 minDCF) and identifies configurations that match state-of-the-art performance while significantly reducing system complexity, requiring only half the parameters. Our method also provides a diverse set of trade-off solutions, enabling deployment choices that balance accuracy and computational cost.
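两个优化目标(检测错误与系统复杂度)在实值编码下可写成如下适应度草图;将w换成0/1掩码即得二进制"选择+平均"变体。此处用简单错误率代替论文中的EER/minDCF,属简化假设:

```python
import numpy as np

def fusion_objectives(w, scores, labels, thresh=0.5):
    # Sketch: two objectives (both minimized) for NSGA-II score fusion.
    # w: (M,) detector weights; scores: (M, N) per-detector scores.
    w = np.clip(w, 0.0, None)
    fused = (w / (w.sum() + 1e-12)) @ scores
    err = np.mean((fused > thresh).astype(int) != labels)    # error-rate proxy
    complexity = np.count_nonzero(w > 1e-3) / len(w)         # active detectors
    return err, complexity

rng = np.random.default_rng(0)
scores = rng.uniform(size=(36, 500))
labels = rng.integers(0, 2, size=500)
print(fusion_objectives(rng.uniform(size=36), scores, labels))
```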
【24】Sven: Singular Value Descent as a Computationally Efficient Natural Gradient Method
标题:斯文:奇异值下降作为一种计算高效的自然梯度方法
链接:https://arxiv.org/abs/2604.01279
作者:Samuel Bright-Thonney,Thomas R. Harvey,Andre Lukas,Jesse Thaler
摘要:我们介绍Sven(Singular Value dEsceNt),一种新的神经网络优化算法,它利用损失函数可自然分解为各数据点之和这一事实,而不是在计算参数更新之前将全部损失归约为单个标量。Sven将每个数据点的残差视为需要同时满足的单独条件,使用损失雅可比矩阵的Moore-Penrose伪逆找到最能同时满足所有条件的最小范数参数更新。在实践中,该伪逆通过截断奇异值分解来近似,只保留$k$个最重要的方向,相对于随机梯度下降仅产生$k$倍的计算开销。相比之下,传统自然梯度方法的开销随参数数量的平方增长。我们表明,Sven可以被理解为推广到过参数化机制的自然梯度方法,并在欠参数化极限下恢复自然梯度下降。在回归任务上,Sven显著优于包括Adam在内的标准一阶方法,收敛更快且最终损失更低,同时以一小部分挂钟时间成本与LBFGS保持竞争力。我们讨论了扩展的主要挑战,即内存开销,并提出了缓解策略。除了标准机器学习基准之外,我们预计Sven将在自定义损失函数可分解为若干条件的科学计算场景中找到天然应用。
摘要:We introduce Sven (Singular Value dEsceNt), a new optimization algorithm for neural networks that exploits the natural decomposition of loss functions into a sum over individual data points, rather than reducing the full loss to a single scalar before computing a parameter update. Sven treats each data point's residual as a separate condition to be satisfied simultaneously, using the Moore-Penrose pseudoinverse of the loss Jacobian to find the minimum-norm parameter update that best satisfies all conditions at once. In practice, this pseudoinverse is approximated via a truncated singular value decomposition, retaining only the $k$ most significant directions and incurring a computational overhead of only a factor of $k$ relative to stochastic gradient descent. This is in comparison to traditional natural gradient methods, which scale as the square of the number of parameters. We show that Sven can be understood as a natural gradient method generalized to the over-parametrized regime, recovering natural gradient descent in the under-parametrized limit. On regression tasks, Sven significantly outperforms standard first-order methods including Adam, converging faster and to a lower final loss, while remaining competitive with LBFGS at a fraction of the wall-time cost. We discuss the primary challenge to scaling, namely memory overhead, and propose mitigation strategies. Beyond standard machine learning benchmarks, we anticipate that Sven will find natural application in scientific computing settings where custom loss functions decompose into several conditions.
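Sven的单步更新可用如下草图说明:求解 $J\,Δθ \approx -r$ 的最小范数解,并用秩$k$截断SVD近似伪逆。为简洁起见此处直接做完整SVD再截断;实际大规模实现应使用迭代式top-$k$分解(如随机化SVD),这一点是本示意的简化假设:

```python
import numpy as np

def sven_step(theta, residuals, jacobian, k, lr=1.0):
    # Sketch: minimum-norm step solving J @ dtheta ~ -residuals,
    # with the pseudoinverse approximated by a rank-k truncated SVD.
    U, s, Vt = np.linalg.svd(jacobian, full_matrices=False)
    U, s, Vt = U[:, :k], s[:k], Vt[:k, :]
    dtheta = -Vt.T @ ((U.T @ residuals) / s)
    return theta + lr * dtheta

rng = np.random.default_rng(0)
theta = rng.normal(size=20)
X, y = rng.normal(size=(200, 20)), rng.normal(size=200)
for _ in range(5):
    r = X @ theta - y            # per-datum residuals of a linear model
    theta = sven_step(theta, r, X, k=8, lr=0.5)
```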
【25】Demographic Parity Tails for Regression
标题:用于回归的人口统计均等尾部
链接:https://arxiv.org/abs/2604.02017
作者:Naht Sinh Le,Christophe Denis,Mohamed Hebiri
摘要:人口统计均等(DP)是回归中被广泛研究的公平性准则,它强制预测与敏感属性之间相互独立。然而,约束整个分布可能降低预测精度,而且对许多应用来说并无必要,因为公平性问题往往局限于分布的特定区域。为克服这一问题,我们提出了DP约束下回归的新框架,其重点在于各敏感群体目标分布的尾部。我们的方法建立在最优传输理论之上。通过仅在分布的目标区域施加公平性约束,我们的方法实现了更细致、更贴合上下文的干预。利用最新进展,我们开发了一种可解释且灵活的算法,充分利用最优传输的几何结构。我们提供了包括风险界和公平性性质在内的理论保证,并通过回归设置中的实验验证了该方法。
摘要:Demographic parity (DP) is a widely studied fairness criterion in regression, enforcing independence between the predictions and sensitive attributes. However, constraining the entire distribution can degrade predictive accuracy and may be unnecessary for many applications, where fairness concerns are localized to specific regions of the distribution. To overcome this issue, we propose a new framework for regression under DP that focuses on the tails of target distribution across sensitive groups. Our methodology builds on optimal transport theory. By enforcing fairness constraints only over targeted regions of the distribution, our approach enables more nuanced and context-sensitive interventions. Leveraging recent advances, we develop an interpretable and flexible algorithm that leverages the geometric structure of optimal transport. We provide theoretical guarantees, including risk bounds and fairness properties, and validate the method through experiments in regression settings.
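一维情形下,"仅在尾部施加公平约束"的最优传输可退化为分位数匹配:把各敏感群体q分位以上的预测映射到合并(重心式)分位函数上,主体部分保持不变。以下仅是该思想的示意性草图,并非论文算法:

```python
import numpy as np

def repair_tail(preds, groups, q=0.9):
    # Sketch: align group-conditional prediction tails above the
    # q-quantile onto the pooled quantile function (1-D transport).
    out = preds.copy()
    levels = np.linspace(q, 1.0, 101)
    pooled = np.quantile(preds, levels)
    for g in np.unique(groups):
        mask = groups == g
        cut = np.quantile(preds[mask], q)
        tail = mask & (preds >= cut)
        ranks = np.argsort(np.argsort(preds[tail])) / max(tail.sum() - 1, 1)
        out[tail] = np.interp(q + ranks * (1.0 - q), levels, pooled)
    return out

rng = np.random.default_rng(0)
preds = np.concatenate([rng.normal(0, 1, 5000), rng.normal(0.3, 1.5, 5000)])
groups = np.repeat([0, 1], 5000)
fair = repair_tail(preds, groups)     # tails now share one distribution
```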
【26】Random Coordinate Descent on the Wasserstein Space of Probability Measures
标题:Wasserstein概率测度空间上的随机坐标下降
链接:https://arxiv.org/abs/2604.01606
作者:Yewei Xu,Qin Li
摘要:在赋予Wasserstein-2几何的概率测度空间上的优化是现代机器学习和平均场建模的核心。然而,依赖完整Wasserstein梯度的传统方法在高维或病态条件下往往承受高昂的计算开销。我们提出了一个专为Wasserstein流形设计的随机坐标下降框架,针对复合目标引入了随机Wasserstein坐标下降(RWCD)和随机Wasserstein坐标近端梯度(RWCP)。通过利用坐标结构,我们的方法能适应完整梯度方法通常难以应对的各向异性目标地形。我们在各种地形几何下给出了严格的收敛性分析,在非凸、Polyak-Łojasiewicz和测地凸条件下建立了保证。我们的理论结果与欧几里得空间中的经典收敛性质相呼应,揭示了向量上与概率测度上坐标下降之间引人注目的对称性。所发展的技术天然适应Wasserstein几何,并提供了一个可扩展到测度空间中其他优化求解器的稳健分析模板。病态能量上的数值实验表明,我们的框架相比传统完整梯度方法有显著加速。
摘要:Optimization over the space of probability measures endowed with the Wasserstein-2 geometry is central to modern machine learning and mean-field modeling. However, traditional methods relying on full Wasserstein gradients often suffer from high computational overhead in high-dimensional or ill-conditioned settings. We propose a randomized coordinate descent framework specifically designed for the Wasserstein manifold, introducing both Random Wasserstein Coordinate Descent (RWCD) and Random Wasserstein Coordinate Proximal-Gradient (RWCP) for composite objectives. By exploiting coordinate-wise structures, our methods adapt to anisotropic objective landscapes where full-gradient approaches typically struggle. We provide a rigorous convergence analysis across various landscape geometries, establishing guarantees under non-convex, Polyak-Łojasiewicz, and geodesically convex conditions. Our theoretical results mirror the classic convergence properties found in Euclidean space, revealing a compelling symmetry between coordinate descent on vectors and on probability measures. The developed techniques are inherently adaptive to the Wasserstein geometry and offer a robust analytical template that can be extended to other optimization solvers within the space of measures. Numerical experiments on ill-conditioned energies demonstrate that our framework offers significant speedups over conventional full-gradient methods.
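对势能型目标,Wasserstein梯度流的粒子离散化就是对每个粒子做梯度下降;其"随机坐标"版本每步只抽取一个坐标进行更新。以下粒子草图仅用于直观说明,收敛条件与复合目标的近端步骤以论文为准:

```python
import numpy as np

def rwcd_step(X, grad_V, step=0.01, rng=None):
    # Sketch: one random-coordinate step on particles X (n, d)
    # representing the measure; only coordinate j moves.
    rng = rng or np.random.default_rng()
    j = rng.integers(X.shape[1])
    X = X.copy()
    X[:, j] -= step * grad_V(X)[:, j]
    return X

c = np.array([1.0, 100.0])             # anisotropic quadratic potential
X = np.random.default_rng(0).normal(size=(500, 2))
for _ in range(2000):
    X = rwcd_step(X, lambda Y: Y * c)   # grad of V(x) = 0.5 * sum(c * x**2)
```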
【27】A Determinantal Approach to a Sharp $\ell^1-\ell^\infty-\ell^2$ Norm Inequality
标题:一个精确的$\ell^1-\ell^\infty-\ell^2$范数不等式的行列式方法
链接:https://arxiv.org/abs/2604.01525
作者:Jose Antonio Lara Benitez
摘要:本文给出不等式 \[\|x\|_1\,\|x\|_\infty \le \frac{1+\sqrt{p}}{2}\,\|x\|_2^2\](对每个$x\in\mathbb{R}^p$成立)的一个简短的线性代数证明。该不等式关联了有限维空间上的三个基本范数,并在优化和数值分析中有应用。我们的证明利用了一个参数化二次型族的行列式结构,并且我们证明了常数$(1+\sqrt{p})/2$是最优的。
摘要:We give a short linear--algebraic proof of the inequality \[ \|x\|_1\,\|x\|_\infty \le \frac{1+\sqrt{p}}{2}\,\|x\|_2^2, \] valid for every \(x\in\mathbb{R}^p\). This inequality relates three fundamental norms on finite-dimensional spaces and has applications in optimization and numerical analysis. Our proof exploits the determinantal structure of a parametrized family of quadratic forms, and we show the constant $(1+\sqrt{p})/2$ is optimal.
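该不等式可以数值验证,并且常数可达:取一个分量为1、其余$p-1$个分量均为$1/(\sqrt{p}+1)$的向量,比值恰为$(1+\sqrt{p})/2$(此极值向量由对两值向量的比值求极值推得,属本摘编的推算,读者可自行核验):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 50
bound = (1 + np.sqrt(p)) / 2

ratio = lambda x: np.linalg.norm(x, 1) * np.abs(x).max() / np.dot(x, x)

worst = max(ratio(rng.normal(size=p)) for _ in range(100_000))
print(worst <= bound)                 # True: random vectors never exceed it

x_star = np.full(p, 1 / (np.sqrt(p) + 1))
x_star[0] = 1.0
print(ratio(x_star), bound)           # equal: the constant is attained
```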
【28】Non-monotonicity in Conformal Risk Control
标题:保形风险控制的非单调性
链接:https://arxiv.org/abs/2604.01502
作者:Tareq Aldirawi,Yun Li,Wenge Guo
备注:38 pages, 6 figures, 3 tables
摘要:保形风险控制(CRC)为将期望损失控制在用户指定的水平提供了无分布保证。现有理论通常假设损失随控制预测集大小的调节参数单调递减。该假设在实践中经常被违反:由于覆盖率与效率等相互竞争的目标,损失可能呈非单调行为。我们研究当调节参数从有限网格中选取时(阈值化或离散化决策规则中的常见情形)非单调损失函数下的CRC。通过重新审视一个已知反例,我们表明无单调性时CRC的有效性取决于校准样本量与网格分辨率之间的关系。特别地,当校准样本相对于网格足够大时,风险控制仍可实现。对于大小为$m$的网格上的有界损失,我们给出有限样本保证,表明超过目标水平$α$的超额风险的阶为$\sqrt{\log(m)/n}$,其中$n$是校准样本量。一个匹配的下界表明该速率是极小极大最优的。我们还在包括Lipschitz连续性和单调性在内的额外结构条件下得到了细化的保证,并通过重要性加权将分析扩展到分布偏移的设置。在合成多标签分类和真实目标检测数据上的数值实验说明了非单调性的实际影响:考虑有限样本偏差的方法比基于单调化变换的方法实现了更稳定的风险控制,同时保持有竞争力的预测集大小。
摘要:Conformal risk control (CRC) provides distribution-free guarantees for controlling the expected loss at a user-specified level. Existing theory typically assumes that the loss decreases monotonically with a tuning parameter that governs the size of the prediction set. This assumption is often violated in practice, where losses may behave non-monotonically due to competing objectives such as coverage and efficiency. We study CRC under non-monotone loss functions when the tuning parameter is selected from a finite grid, a common scenario in thresholding or discretized decision rules. Revisiting a known counterexample, we show that the validity of CRC without monotonicity depends on the relationship between the calibration sample size and the grid resolution. In particular, risk control can still be achieved when the calibration sample is sufficiently large relative to the grid. We provide a finite-sample guarantee for bounded losses over a grid of size $m$, showing that the excess risk above the target level $α$ is of order $\sqrt{\log(m)/n}$, where $n$ is the calibration sample size. A matching lower bound shows that this rate is minimax optimal. We also derive refined guarantees under additional structural conditions, including Lipschitz continuity and monotonicity, and extend the analysis to settings with distribution shift via importance weighting. Numerical experiments on synthetic multilabel classification and real object detection data illustrate the practical impact of non-monotonicity. Methods that account for finite-sample deviations achieve more stable risk control than approaches based on monotonicity transformations, while maintaining competitive prediction-set sizes.
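有限网格上的CRC选择可以写成一个小函数:对每个网格值计算经有限样本修正的经验风险$(n\hat R(λ)+B)/(n+1)$,保留不超过$α$的那些。这是CRC文献中常见选择规则的直接转写;单调情形取可行集的边界值即可,非单调情形下该规则的有效性正是本文分析的对象:

```python
import numpy as np

def crc_select(losses, alpha, B=1.0):
    # losses: (n, m) loss of each calibration point at each of the m
    # grid values; B is an upper bound on the (bounded) loss.
    n, _ = losses.shape
    inflated = (n * losses.mean(axis=0) + B) / (n + 1)
    return np.flatnonzero(inflated <= alpha)   # feasible grid indices

rng = np.random.default_rng(0)
losses = rng.uniform(size=(2000, 50))          # stand-in calibration losses
print(crc_select(losses, alpha=0.55))
```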
【29】The topological gap at criticality: scaling exponent d + η, universality, and scope
标题:临界点的拓扑间隙:标度指数d + η、普适性与适用范围
链接:https://arxiv.org/abs/2604.01484
作者:Matthew Loftus
备注:7 pages, 4 figures, 4 tables
摘要:拓扑间隙 $Δ = TP_{H_1}^{real} - TP_{H_1}^{shuf}$——多数自旋alpha复形相对于密度匹配零假设的多余$H_1$总持久性——编码了自旋模型中的临界关联。我们建立了有限尺寸标度:$Δ(L,T) = A L^{d+η} G_-(L|t/T_c|)$,其中 $G_-(x) \sim (1+x/x_0)^{-(1+β/ν)}$。对于二维Ising模型,$α = 2.249 \pm 0.038$,与 $d+η = 9/4$ 相符至 $0.03σ$;$G_-$ 指数 $γ = 1.089 \pm 0.077$ 与 $1+β/ν = 9/8$ 一致($ΔR^2 < 10^{-5}$)。对于二维 Potts $q=3$($L$ 最高至1024),$α = 2.272 \pm 0.024$(与 $d+η = 2.267$ 相差 $0.2σ$),并带有两项标度修正($R^2 = 0.9999$)。$G_-$ 指数 $γ = 1.114$(68% CI $[1.053, 1.173]$)与 $1+β/ν = 17/15$ 相符。适用范围边界:该定律在二维 Potts $q=4$ 时失效($α = 2.347 \pm 0.017$,与 $d+η = 5/2$ 相差 $9.3σ$),其中对数修正阻止收敛;对原始三维 Ising 也失效(与 $d+η$ 相差 $4σ$),但密度归一化 $Δ/|M|^{1/2}$ 可恢复 $α = 3.06 \pm 0.04$($0.6σ$)。该框架对一级相变、BKT 和渗流失效。判据:当标度修正为代数型($ω > 0$)时 $α = d+η$ 成立,而为对数型($ω \to 0$)时失效。
摘要:The topological gap $Δ= TP_{H_1}^{real} - TP_{H_1}^{shuf}$ -- the excess $H_1$ total persistence of the majority-spin alpha complex over a density-matched null -- encodes critical correlations in spin models. We establish finite-size scaling: $Δ(L,T) = A L^{d+η} G_-(L|t/T_c|)$, with $G_-(x) \sim (1+x/x_0)^{-(1+β/ν)}$. For 2D Ising, $α= 2.249 \pm 0.038$, matching $d+η= 9/4$ to $0.03σ$; the $G_-$ exponent $γ= 1.089 \pm 0.077$ is consistent with $1+β/ν= 9/8$ ($ΔR^2 < 10^{-5}$). For 2D Potts $q=3$ with $L$ up to 1024, $α= 2.272 \pm 0.024$ ($0.2σ$ from $d+η= 2.267$), with two-term corrections to scaling ($R^2 = 0.9999$). The $G_-$ exponent $γ= 1.114$ (68% CI $[1.053, 1.173]$) matches $1+β/ν= 17/15$. Scope boundaries: the law fails for 2D Potts $q=4$ ($α= 2.347 \pm 0.017$, $9.3σ$ from $d+η= 5/2$) where logarithmic corrections prevent convergence, and for raw 3D Ising ($4σ$ from $d+η$), but density normalization $Δ/|M|^{1/2}$ recovers $α= 3.06 \pm 0.04$ ($0.6σ$). The framework fails for first-order, BKT, and percolation. The criterion: $α= d+η$ holds when corrections to scaling are algebraic ($ω> 0$) but fails when logarithmic ($ω\to 0$).
【30】The Newton-Muon Optimizer
标题:Newton-Muon优化器
链接:https://arxiv.org/abs/2604.01472
作者:Zhehang Du,Weijie Su
摘要:Muon优化器因其在训练大型语言模型方面的强大性能而受到广泛关注,但其矩阵梯度正交化背后的设计原理在很大程度上仍不明晰。在本文中,我们引入一个代理模型,它不仅为Muon的设计提供了新的理解,更重要的是导出了一个新的优化器。与牛顿法的推导一脉相承,该代理仅使用三个矩阵——梯度$G$、输出空间曲率矩阵$H$以及堆叠各层输入的数据矩阵$Z$——将损失近似为对权重矩阵$W$的扰动的二次函数。通过一步最小化该代理,并对权重采用某种各向同性假设,我们得到封闭形式的更新规则(不计动量与权重衰减)$W \leftarrow W - η\cdot \mathrm{msgn}(G(ZZ^\top)^{-1})$,其中$η$是学习率,且若$X=USV^\top$是紧致奇异值分解,则$\mathrm{msgn}(X)=UV^\top$。这一新的优化方法(我们称之为Newton-Muon)表明,标准Muon可以解释为一种忽略了输入二阶矩所诱导的右侧预处理的隐式牛顿型方法。经验上,在最早公开发布的、使用Muon进行GPT-2预训练的Modded-NanoGPT speedrun配置的复现中,Newton-Muon以少6%的迭代步数达到目标验证损失,并将挂钟训练时间减少约4%。
摘要:The Muon optimizer has received considerable attention for its strong performance in training large language models, yet the design principle behind its matrix-gradient orthogonalization remains largely elusive. In this paper, we introduce a surrogate model that not only sheds new light on the design of Muon, but more importantly leads to a new optimizer. In the same spirit as the derivation of Newton's method, the surrogate approximates the loss as a quadratic function of the perturbation to a weight matrix $W$ using only three matrices: the gradient $G$, an output-space curvature matrix $H$, and the data matrix $Z$ that stacks the layer inputs. By minimizing this surrogate in one step and adopting a certain isotropic assumption on the weights, we obtain the closed-form update rule (up to momentum and weight decay) $W \leftarrow W - η\cdot \mathrm{msgn}(G(ZZ^\top)^{-1})$, where $η$ is the learning rate and $\mathrm{msgn}(X)=UV^\top$ if $X=USV^\top$ is a compact singular value decomposition. This new optimization method, which we refer to as Newton-Muon, shows that standard Muon can be interpreted as an implicit Newton-type method that neglects the right preconditioning induced by the input second moment. Empirically, on a reproduction of the earliest publicly released Modded-NanoGPT speedrun configuration using Muon for GPT-2 pretraining, Newton-Muon reaches the target validation loss in 6\% fewer iteration steps and reduces wall-clock training time by about 4\%.
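摘要已给出封闭式更新 $W \leftarrow W - η\,\mathrm{msgn}(G(ZZ^\top)^{-1})$,可直接转写如下。其中用SVD实现msgn、并加入小阻尼项保证$ZZ^\top$可逆,均为本示意的实现选择(实践中Muon类方法常以Newton-Schulz迭代代替SVD):

```python
import torch

def msgn(X):
    # Matrix sign from the compact SVD X = U S V^T.
    U, _, Vh = torch.linalg.svd(X, full_matrices=False)
    return U @ Vh

def newton_muon_update(W, G, Z, lr, damping=1e-4):
    # W, G: (d_out, d_in); Z: (d_in, n) stacked layer inputs.
    # damping * I is an assumption added here to keep Z Z^T invertible.
    d = Z.shape[0]
    ZZt = Z @ Z.T + damping * torch.eye(d, dtype=Z.dtype, device=Z.device)
    return W - lr * msgn(G @ torch.linalg.inv(ZZt))

W, G = torch.randn(32, 16), torch.randn(32, 16)
Z = torch.randn(16, 128)
W = newton_muon_update(W, G, Z, lr=0.02)
```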
【31】VIANA: character Value-enhanced Intensity Assessment via domain-informed Neural Architecture
标题:VIANA:通过领域知情的神经架构进行气味特征值增强的强度评估
链接:https://arxiv.org/abs/2604.01365
作者:Luana P. Queiroz,Icaro S. C. Bernardes,Ana M. Ribeiro,Bernardo M. Aguilera-Mercado,Idelfonso B. R. Nogueira
摘要:预测气味剂的感知强度仍然是感官科学中的一个基本挑战:其响应行为复杂且非线性,而且难以将分子结构与人类感知相关联。虽然传统的深度学习模型(如图卷积网络,GCN)擅长捕捉分子拓扑结构,但它们往往无法考虑嗅觉的生物学与感知背景。本研究介绍了 VIANA,一个新颖的“三支柱”框架,集成了结构图理论、气味特征值嵌入和现象学行为。该方法系统地评估了三个不同领域的知识迁移:通过 GCN 的分子结构,通过主要气味图(POM)嵌入的语义气味特征值,以及通过 Hill 定律的生物剂量响应逻辑。我们证明,知识迁移并非天然有益;必须在提供给模型的信息量上保持平衡。原始语义数据会在领域知情模型中造成“信息过载”,而应用主成分分析(PCA)提炼出解释 95% 方差的最具影响力的语义成分,则产生了更优的“信号蒸馏”效果。结果表明,这三个知识迁移支柱的综合显著优于基线结构模型:VIANA 的峰值 $R^2$ 达 0.996,测试均方误差(MSE)为 0.19。在此背景下,VIANA 成功捕捉了饱和的物理上限、检测阈值的灵敏度以及气味特征值表达的细微差别,从而提供了基于领域知识的人类嗅觉体验模拟。这项研究为数字嗅觉提供了一个强大的框架,有效弥合了分子信息学与感官知觉之间的差距。
摘要:Predicting the perceived intensity of odorants remains a fundamental challenge in sensory science due to the complex, non-linear behavior of their response, as well as the difficulty in correlating molecular structure with human perception. While traditional deep learning models, such as Graph Convolutional Networks (GCNs), excel at capturing molecular topology, they often fail to account for the biological and perceptual context of olfaction. This study introduces VIANA, a novel "tri-pillar" framework that integrates structural graph theory, character value embeddings, and phenomenological behavior. This methodology systematically evaluates knowledge transfer across three distinct domains: molecular structure via GCNs, semantic odor character values via Principal Odor Map (POM) embeddings, and biological dose-response logic via Hill's law. We demonstrate that knowledge transfer is not inherently positive; rather, a balance must be maintained in the volume of information provided to the model. While raw semantic data led to "information overload" in domain-informed models, applying Principal Component Analysis (PCA) to distill the 95% most impactful semantic variance yielded a superior "signal distillation" effect. Results indicate that the synthesis of these three knowledge transfer pillars significantly outperforms baseline structural models, with VIANA achieving a peak R^2 of 0.996 and a test Mean Squared Error (MSE) of 0.19. In this context, VIANA successfully captures the physical ceiling of saturation, the sensitivity of detection thresholds, and the nuance of odor character value expression, providing a domain grounded simulation of the human olfactory experience. This research provides a robust framework for digital olfaction, effectively bridging the gap between molecular informatics and sensory perception.
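Two of the three pillars are easy to sketch in code. Below, hypothetically, PCA keeps the principal components covering 95% of the variance of the POM embeddings (the "signal distillation" step), and a Hill-type function models dose-response saturation; function and parameter names are illustrative, not VIANA's actual interface:

import numpy as np
from sklearn.decomposition import PCA

def distill_embeddings(pom_embeddings, variance=0.95):
    # "Signal distillation": retain only the principal components that
    # together explain 95% of the variance in the semantic embeddings.
    return PCA(n_components=variance).fit_transform(pom_embeddings)

def hill_response(conc, i_max, k, n):
    # Hill-type dose-response: intensity saturates at i_max, with k the
    # half-maximal concentration and n controlling threshold steepness.
    return i_max * conc**n / (k**n + conc**n)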
【32】Descending into the Modular Bootstrap
标题:深入模块化Bootstrap
链接:https://arxiv.org/abs/2604.01275
作者:Nathan Benjamin,A. Liam Fitzpatrick,Wei Li,Jesse Thaler
备注:57 pages, 23 figures, 4 tables; code available at http://github.com/jdthaler/modular-bootstrap
摘要:In this paper, we attempt to explore the landscape of two-dimensional conformal field theories (2d CFTs) by efficiently searching for numerical solutions to the modular bootstrap equation using machine-learning-style optimization. The torus partition function of a 2d CFT is fixed by the spectrum of its primary operators and its chiral algebra, which we take to be the Virasoro algebra with $c>1$. We translate the requirement that this partition function is modular invariant into a loss function, which we then minimize to identify possible primary spectra. Our approach involves two technical innovations that facilitate finding reliable candidate CFTs. The first is a strategy to estimate the uncertainty associated with truncating the spectrum to the lowest dimension operators. The second is the use of a new singular-value-based optimizer (Sven) that is more effective than gradient descent at navigating the hierarchical structure of the loss landscape. We numerically construct candidate truncated CFT partition functions with central charges between 1 and $\frac{8}{7}$, a range devoid of known examples, and argue that these candidates likely come from a continuous space of modular bootstrap solutions. We also provide evidence for a more stringent constraint on the spectral gap near $c = 1$ than the existing bound of $Δ_{\rm gap} \le \frac{c}{6} + \frac{1}{3}$.
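To make "modular invariance as a loss function" concrete, here is a deliberately simplified toy in Python: a truncated spectrum of scaling dimensions defines a reduced partition function on the imaginary-$τ$ axis, and the loss penalizes violations of the S-transform $β \to 1/β$. This omits Virasoro characters, spin, and the Sven optimizer entirely; it is our illustration, and the authors' actual code lives at the GitHub link above:

import numpy as np
from scipy.optimize import minimize

BETAS = np.linspace(0.3, 1.0, 20)  # sample points on the imaginary-tau axis

def modular_loss(deltas, c=1.05):
    # Toy reduced partition function Z(beta) = sum_i exp(-2*pi*beta*(Delta_i - c/12));
    # modular invariance under S demands Z(beta) = Z(1/beta) at every sample point.
    def Z(beta):
        return np.exp(-2.0 * np.pi * beta * (deltas - c / 12.0)).sum()
    return sum((Z(b) - Z(1.0 / b)) ** 2 for b in BETAS)

# Minimize over a small truncated spectrum: vacuum at 0 plus three primaries.
res = minimize(modular_loss, x0=np.array([0.0, 0.5, 1.2, 2.0]), method="Nelder-Mead")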
【33】Experimental Design for Missing Physics
标题:缺失物理的实验设计
链接:https://arxiv.org/abs/2604.01231
作者:Arno Strouwen,Sebastián Micluţa-Câmpeanu
摘要:对于大多数过程系统而言,关于模型结构的知识是不完整的。这些缺失的物理规律必须从实验数据中学习。最近,通用微分方程与符号回归的组合已成为发现这些缺失物理的流行工具:通用微分方程使用神经网络来表示模型结构中缺失的部分,符号回归则旨在使这些神经网络可解释。这些机器学习技术需要高质量的数据才能成功恢复真实的模型结构。为了收集这种信息丰富的数据,本文开发了一种序贯实验设计技术,其核心是在符号回归所提出的各个合理模型结构之间进行最优判别。随后将该技术应用于发现生物反应器中缺失的物理规律。
摘要:For most process systems, knowledge of the model structure is incomplete. This missing physics must then be learned from experimental data. Recently, a combination of universal differential equations and symbolic regression has become a popular tool to discover these missing physics. Universal differential equations employ neural networks to represent missing parts of the model structure, and symbolic regression aims to make these neural networks interpretable. These machine learning techniques require high-quality data to successfully recover the true model structure. To gather such informative data, a sequential experimental design technique is developed which is based on optimally discriminating between the plausible model structures suggested by symbolic regression. This technique is then applied to discovering the missing physics of a bioreactor.
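As a sketch of the universal-differential-equation idea (in Python for consistency with the other examples here, though such work commonly uses the Julia SciML stack), the right-hand side below combines a known dilution term with a tiny neural network standing in for the unknown bioreactor kinetics; all names and constants are illustrative:

import numpy as np
from scipy.integrate import solve_ivp

def nn_term(x, w):
    # One-hidden-unit network: the learnable stand-in for missing physics.
    return w[2] * np.tanh(w[0] * x + w[1])

def rhs(t, y, w, dilution=0.1):
    # Known physics (dilution) plus the neural correction; in practice w is
    # fit to data, then symbolic regression turns nn_term into a formula.
    biomass = y[0]
    return [nn_term(biomass, w) - dilution * biomass]

sol = solve_ivp(rhs, (0.0, 10.0), [0.5], args=([1.0, 0.0, 0.3],), dense_output=True)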
【34】Interpretable Battery Aging without Extra Tests via Neural-Assisted Physics-based Modelling
标题:通过神经辅助的基于物理的建模实现无需额外测试的可解释电池老化
链接:https://arxiv.org/abs/2604.01229
作者:Yuan Qiu,Wei Li,Wei Zhang,Yi Zhou,Fang Liu,Jianbiao Wang,Zhi Wei Seh
备注:Accepted to IEEE WCCI 2026 (IJCNN Special Session SS30: Computational Intelligence and AI Applications for Sustainable Energy Management in Smart Grids and Energy Communities, 2nd ed.). 8 pages, 4 figures, 2 tables
摘要:健康状态(SoH)被广泛用于电池管理,但它只是一个标量,可解释性有限。两个 SoH 相近的电池可能表现出非常不同的退化行为,而这种可解释性的缺乏阻碍了电池的最优运行。在本文中,我们提出 IBAM,一个用于可解释电池老化建模的神经辅助物理框架。IBAM 无需额外的诊断测试,仅使用电池管理系统的常规日志即可输出二维老化指纹。该指纹捕捉整条放电曲线上的极化电压损失以及放电终止附近的尾部损失,从而提供良好的可解释性。IBAM 首先基于分数阶等效电路模型建立基于物理的电池模型,然后使用两阶段最小二乘法从模型中提取每个循环的指纹。IBAM 进一步通过物理引导的回归将指纹锚定在 SoH 轴上,其中每个循环的 SoH 由一个以定制多通道电压特征为输入的双向门控循环单元估计。在短寿命、中寿命和长寿命的电池上,IBAM 在不同老化阶段始终取得最佳的物理模型保真度,并对不同寿命电池的退化机理和指纹模式给出清晰的解释。所得指纹支持可解释的电池健康评估,并可为电池控制决策提供依据。
摘要:State of health (SoH) is widely used for battery management, but it is a single scalar and offers limited interpretability. Two batteries with similar SoH can exhibit very different degradation behaviors and the lack of interpretability hinders optimal battery operation. In this paper, we propose IBAM for interpretable battery aging modelling with a neural-assisted physics-based framework. IBAM outputs a 2-D aging fingerprint without extra diagnostic tests and uses only routine logs from the battery management system. The fingerprint offers great interpretability by capturing a battery's curve-wide polarization voltage loss and the tail loss near the end-of-discharge. IBAM first creates a physics-based battery model based on a fractional-order equivalent circuit model, and then extracts per-cycle fingerprints from the model using a two-stage least-squares method. IBAM further anchors fingerprints on the SoH axis with physics-guided regression, where the per-cycle SoH is estimated via a bidirectional gated recurrent unit with customized multi-channel voltage features. Across batteries with short-, medium-, and long-lifespans, IBAM consistently yields the best physics model fidelity at different aging stages, and provides clear interpretations of degradation mechanisms and fingerprint patterns about batteries of different lifespans. The resulting fingerprints support interpretable battery health assessment and can inform battery control choices.
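Of the components named above, the SoH estimator is the most self-contained to sketch. This PyTorch stub shows a bidirectional GRU regressing per-cycle SoH from multi-channel voltage features; the channel count and layer sizes are our guesses, not IBAM's configuration:

import torch
import torch.nn as nn

class SoHEstimator(nn.Module):
    # Bidirectional GRU over multi-channel voltage features,
    # regressing a scalar per-cycle state of health (SoH).
    def __init__(self, n_channels=4, hidden=32):
        super().__init__()
        self.gru = nn.GRU(n_channels, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):  # x: (batch, seq_len, n_channels)
        out, _ = self.gru(x)
        return self.head(out[:, -1])  # read out the final time step

soh = SoHEstimator()(torch.randn(8, 200, 4))  # 8 cycles, 200 samples, 4 channels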
机器翻译由腾讯交互翻译提供,仅供参考