点击阅读原文访问arxivdaily.com,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏等功能!
cs.LG 方向,今日共计149篇
大模型相关(18篇)
【1】Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs
标题:从试错中学习:具身LLM的反思性测试时规划
链接:https://arxiv.org/abs/2602.21198
作者:Yining Hong,Huang Huang,Manling Li,Li Fei-Fei,Jiajun Wu,Yejin Choi
摘要:具身LLM赋予机器人高层任务推理能力,但它们无法反思哪里出了问题、为什么出问题,使部署沦为一系列相互独立的试验:错误不断重复,而不是积累为经验。借鉴人类的反思性实践者,我们提出反思性测试时规划(Reflective Test-Time Planning),它整合两种反思模式:reflection-in-action(行动中反思),代理利用测试时扩展在执行前生成多个候选动作,并用内部反思对其打分;reflection-on-action(行动后反思),代理利用测试时训练,基于执行后的外部反思更新其内部反思模型和动作策略。我们还引入回顾性反思,允许代理重新评估早期决策,并借助事后信息进行模型更新,以实现恰当的长程信用分配。在我们新设计的Long-Horizon Household基准和MuJoCo Cupboard Fitting基准上的实验显示出相对基线模型的显著提升,消融研究验证了行动中反思与行动后反思的互补作用。包括真实机器人试验在内的定性分析凸显了通过反思实现的行为矫正。
摘要:Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials where mistakes repeat rather than accumulate into experience. Drawing upon human reflective practitioners, we introduce Reflective Test-Time Planning, which integrates two modes of reflection: \textit{reflection-in-action}, where the agent uses test-time scaling to generate and score multiple candidate actions using internal reflections before execution; and \textit{reflection-on-action}, which uses test-time training to update both its internal reflection model and its action policy based on external reflections after execution. We also include retrospective reflection, allowing the agent to re-evaluate earlier decisions and perform model updates with hindsight for proper long-horizon credit assignment. Experiments on our newly-designed Long-Horizon Household benchmark and MuJoCo Cupboard Fitting benchmark show significant gains over baseline models, with ablative studies validating the complementary roles of reflection-in-action and reflection-on-action. Qualitative analyses, including real-robot trials, highlight behavioral correction through reflection.
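摘要中的两种反思模式可以用一个极简草图来示意(纯属示意,函数名与打分逻辑均为假设,并非论文的实际实现):行动前用内部反思模型为候选动作打分,行动后把外部反思写入历史,供后续打分参考。

```python
def reflective_step(candidates, reflect_score, history):
    """reflection-in-action sketch: score candidate actions with an
    internal reflection model (here an arbitrary callable conditioned
    on past outcomes) and execute the highest-scoring one."""
    return max(candidates, key=lambda a: reflect_score(a, history))

def reflection_on_action(history, action, succeeded, feedback):
    """reflection-on-action sketch: after execution, record an external
    reflection so later scoring (and, in the paper, test-time training)
    can avoid repeating the same mistake."""
    history.append({"action": action, "ok": succeeded, "note": feedback})
    return history
```

论文中的内部反思模型是经测试时训练更新的LLM组件,这里用一个普通可调用对象代替。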
【2】Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training
标题:为什么优化Pass@k会损害Pass@1:LLM后训练中的提示干扰
链接:https://arxiv.org/abs/2602.21189
作者:Anas Barakat,Souradip Chakraborty,Khushbu Pahwa,Amrit Singh Bedi
摘要:Pass@k是可验证大型语言模型任务(包括数学推理、代码生成和简答推理)中广泛使用的性能指标:只要$k$个独立采样的解中任意一个通过验证器即视为成功。这一多样本推理指标催生了直接优化pass@$k$的推理感知微调方法。然而,已有工作报告了一个反复出现的权衡:在这类方法下pass@k提升而pass@1下降。这一权衡在实践中十分重要,因为受延迟与成本预算、验证器覆盖不完善以及对可靠单次回退的需求所限,pass@1往往仍是硬性的运行约束。我们研究了这一权衡的根源,并给出理论刻画:pass@k策略优化何时会通过提示干扰引起的梯度冲突降低pass@1。我们证明,pass@$k$策略梯度可能与pass@1梯度冲突,因为pass@$k$优化会隐式地将权重向低成功率提示倾斜;当这些提示存在我们所称的负向干扰时,对它们的加权会使pass@k更新方向偏离pass@1方向。我们在可验证数学推理任务上的大型语言模型实验中印证了这些理论发现。
摘要:Pass@k is a widely used performance metric for verifiable large language model tasks, including mathematical reasoning, code generation, and short-answer reasoning. It defines success if any of $k$ independently sampled solutions passes a verifier. This multi-sample inference metric has motivated inference-aware fine-tuning methods that directly optimize pass@$k$. However, prior work reports a recurring trade-off: pass@k improves while pass@1 degrades under such methods. This trade-off is practically important because pass@1 often remains a hard operational constraint due to latency and cost budgets, imperfect verifier coverage, and the need for a reliable single-shot fallback. We study the origin of this trade-off and provide a theoretical characterization of when pass@k policy optimization can reduce pass@1 through gradient conflict induced by prompt interference. We show that pass@$k$ policy gradients can conflict with pass@1 gradients because pass@$k$ optimization implicitly reweights prompts toward low-success prompts; when these prompts are what we term negatively interfering, their upweighting can rotate the pass@k update direction away from the pass@1 direction. We illustrate our theoretical findings with large language model experiments on verifiable mathematical reasoning tasks.
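作为背景补充,pass@k通常用代码生成评测文献中的标准无偏估计量计算(这是社区惯例,并非本文提出):对每道题采样n个解、数出c个通过验证器的解,则pass@k = 1 - C(n-c, k)/C(n, k)。最小示意如下:

```python
import math

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: with c of n sampled solutions passing
    the verifier, the probability that at least one of k samples drawn
    without replacement passes is 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0          # not enough failures to fill all k draws
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)
```

例如 pass_at_k(4, 2, 2) = 1 - C(2,2)/C(4,2) = 5/6。pass@1即为c/n。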
【3】SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards
标题:SELAUR:通过不确定性感知奖励自我进化的LLM代理
链接:https://arxiv.org/abs/2602.21158
作者:Dengjia Zhang,Xiaoou Liu,Lu Cheng,Yaqing Wang,Kenton Murray,Hua Wei
摘要:大型语言模型(LLM)越来越多地被部署为多步决策代理,其中有效的奖励设计对于指导学习至关重要。尽管近期工作探索了多种奖励塑造和步骤级信用分配形式,但一个关键信号在很大程度上仍被忽视:LLM的内在不确定性。不确定性反映模型的置信度,揭示需要探索的地方,并且即使在失败的轨迹中也能提供有价值的学习线索。我们提出SELAUR(通过不确定性感知奖励自我进化的LLM代理),一个将不确定性直接纳入奖励设计的强化学习框架。SELAUR将基于熵、最小置信度和边缘(margin)的度量整合为组合的令牌级不确定性估计,提供密集的、与置信度对齐的监督,并采用故障感知的奖励重塑机制,将这些不确定性信号注入步骤级和轨迹级奖励,以提高探索效率和学习稳定性。在ALFWorld和WebShop两个基准上的实验表明,我们的方法在强基线之上持续提升成功率。消融研究进一步表明不确定性信号如何增强探索与鲁棒性。
摘要:Large language models (LLMs) are increasingly deployed as multi-step decision-making agents, where effective reward design is essential for guiding learning. Although recent work explores various forms of reward shaping and step-level credit assignment, a key signal remains largely overlooked: the intrinsic uncertainty of LLMs. Uncertainty reflects model confidence, reveals where exploration is needed, and offers valuable learning cues even in failed trajectories. We introduce SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards, a reinforcement learning framework that incorporates uncertainty directly into the reward design. SELAUR integrates entropy-, least-confidence-, and margin-based metrics into a combined token-level uncertainty estimate, providing dense confidence-aligned supervision, and employs a failure-aware reward reshaping mechanism that injects these uncertainty signals into step- and trajectory-level rewards to improve exploration efficiency and learning stability. Experiments on two benchmarks, ALFWorld and WebShop, show that our method consistently improves success rates over strong baselines. Ablation studies further demonstrate how uncertainty signals enhance exploration and robustness.
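摘要提到的三种令牌级不确定性度量(熵、最小置信度、边缘)都可以直接从下一令牌分布计算。下面是一个简化示意;组合权重取假设的等权平均,SELAUR实际的组合方式摘要中并未给出:

```python
import numpy as np

def token_uncertainty(probs):
    """Three classic uncertainty measures over a next-token distribution
    (probs: 1-D array summing to 1); all are larger when less confident."""
    p = np.asarray(probs, dtype=float)
    entropy = -np.sum(p * np.log(p + 1e-12))
    least_confidence = 1.0 - p.max()
    top2 = np.sort(p)[-2:]                      # two largest probabilities
    margin = 1.0 - (top2[1] - top2[0])          # small top-2 gap = uncertain
    return {"entropy": float(entropy),
            "least_confidence": float(least_confidence),
            "margin": float(margin)}

def combined_uncertainty(probs, weights=(1/3, 1/3, 1/3)):
    """Hypothetical equal-weight combination; SELAUR's actual mixing
    coefficients are not specified in the abstract."""
    u = token_uncertainty(probs)
    ent_norm = u["entropy"] / np.log(len(probs))   # normalize to [0, 1]
    parts = (ent_norm, u["least_confidence"], u["margin"])
    return float(sum(w * x for w, x in zip(weights, parts)))
```

均匀分布给出最大不确定性,尖峰分布给出接近0的不确定性,正是奖励重塑所需的密集信号。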
【4】SpatiaLQA: A Benchmark for Evaluating Spatial Logical Reasoning in Vision-Language Models
标题:SpatiaLQA:评估视觉语言模型中空间逻辑推理的基准
链接:https://arxiv.org/abs/2602.20901
作者:Yuechen Xie,Xiaoyan Zhang,Yicheng Shan,Hao Zhu,Rui Tang,Rong Wei,Mingli Song,Yuanyu Wan,Jie Song
备注:Accepted by CVPR 2026
摘要:视觉语言模型(VLM)凭借出色的理解与推理能力,越来越多地应用于现实世界场景。尽管VLM已在常见的视觉问答和逻辑推理上表现出令人印象深刻的能力,但它们仍缺乏在复杂现实环境中做出合理决策的能力。我们将这种能力定义为空间逻辑推理:它不仅需要理解复杂场景中物体之间的空间关系,还需要理解多步任务中各步骤之间的逻辑依赖。为弥合这一差距,我们提出空间逻辑问答(SpatiaLQA),一个旨在评估VLM空间逻辑推理能力的基准。SpatiaLQA由来自241个真实室内场景的9,605个问答对组成。我们在41个主流VLM上进行了大量实验,结果表明即使是最先进的模型仍难以进行空间逻辑推理。为解决该问题,我们提出一种称为递归场景图辅助推理的方法,利用视觉基础模型将复杂场景逐步分解为任务相关的场景图,从而增强VLM的空间逻辑推理能力,优于此前所有方法。代码和数据集可在https://github.com/xieyc99/SpatiaLQA获取。
摘要:Vision-Language Models (VLMs) have been increasingly applied in real-world scenarios due to their outstanding understanding and reasoning capabilities. Although VLMs have already demonstrated impressive capabilities in common visual question answering and logical reasoning, they still lack the ability to make reasonable decisions in complex real-world environments. We define this ability as spatial logical reasoning, which not only requires understanding the spatial relationships among objects in complex scenes, but also the logical dependencies between steps in multi-step tasks. To bridge this gap, we introduce Spatial Logical Question Answering (SpatiaLQA), a benchmark designed to evaluate the spatial logical reasoning capabilities of VLMs. SpatiaLQA consists of 9,605 question answer pairs derived from 241 real-world indoor scenes. We conduct extensive experiments on 41 mainstream VLMs, and the results show that even the most advanced models still struggle with spatial logical reasoning. To address this issue, we propose a method called recursive scene graph assisted reasoning, which leverages visual foundation models to progressively decompose complex scenes into task-relevant scene graphs, thereby enhancing the spatial logical reasoning ability of VLMs, outperforming all previous methods. Code and dataset are available at https://github.com/xieyc99/SpatiaLQA.
【5】Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation
标题:不要忽视尾部:解耦top-K概率以实现高效语言模型蒸馏
链接:https://arxiv.org/abs/2602.20816
作者:Sayantan Dasgupta,Trevor Cohn,Timothy Baldwin
摘要:语言模型蒸馏所用的核心学习信号是学生与教师分布之间的标准Kullback-Leibler(KL)散度。传统KL散度往往被概率最高的下一个令牌(即教师的众数)所主导,从而削弱输出分布中概率较低但可能富含信息的成分的影响。我们提出一种新的尾部感知散度,它将教师模型top-K预测概率的贡献与低概率预测的贡献解耦,同时保持与KL散度相同的计算开销。我们的解耦方法降低了教师众数的影响,从而增加了分布尾部的贡献。实验结果表明,改进后的蒸馏方法在多种数据集上的解码器模型预训练和监督蒸馏中均取得有竞争力的性能。此外,蒸馏过程效率高,可在适度的学术预算下处理大型数据集,无需工业级算力。
摘要:The core learning signal used in language model distillation is the standard Kullback-Leibler (KL) divergence between the student and teacher distributions. Traditional KL divergence tends to be dominated by the next tokens with the highest probabilities, i.e., the teacher's modes, thereby diminishing the influence of less probable yet potentially informative components of the output distribution. We propose a new tail-aware divergence that decouples the contribution of the teacher model's top-K predicted probabilities from that of lower-probability predictions, while maintaining the same computational profile as the KL Divergence. Our decoupled approach reduces the impact of the teacher modes and, consequently, increases the contribution of the tail of the distribution. Experimental results demonstrate that our modified distillation method yields competitive performance in both pre-training and supervised distillation of decoder models across various datasets. Furthermore, the distillation process is efficient and can be performed with a modest academic budget for large datasets, eliminating the need for industry-scale computing.
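尾部感知散度的一个可能形式,是把前向KL的逐项求和按教师top-K与尾部拆开并分别加权(alpha=beta=1时退化为标准KL)。以下草图仅示意这一思路,具体加权方案为假设,并非论文公式:

```python
import numpy as np

def decoupled_kl(p_teacher, p_student, k=2, alpha=0.5, beta=1.5):
    """Tail-aware divergence sketch: split the forward-KL summation into
    the teacher's top-k tokens and the tail, then reweight each part.
    With alpha = beta = 1 this reduces to the plain KL divergence;
    alpha < beta upweights the tail."""
    p = np.asarray(p_teacher, dtype=float)
    q = np.asarray(p_student, dtype=float)
    terms = p * (np.log(p + 1e-12) - np.log(q + 1e-12))  # per-token KL terms
    mask = np.zeros_like(p, dtype=bool)
    mask[np.argsort(p)[-k:]] = True                      # teacher's top-k ids
    return float(alpha * terms[mask].sum() + beta * terms[~mask].sum())
```

拆分只需一次top-K排序和同一组逐项KL,故计算开销与标准KL基本相同,与摘要的说法一致。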
【6】Vision-Language Models for Ergonomic Assessment of Manual Lifting Tasks: Estimating Horizontal and Vertical Hand Distances from RGB Video
标题:视觉-语言模型用于人工举升任务的工效学评估:从RGB视频估计水平和垂直手距离
链接:https://arxiv.org/abs/2602.20658
作者:Mohammad Sadra Rajabi,Aanuoluwapo Ojelade,Sunwook Kim,Maury A. Nussbaum
摘要:手动提升任务是与工作相关的肌肉骨骼疾病的主要成因,有效的工效学风险评估对于量化物理暴露和指导工效学干预至关重要。修订版NIOSH提升方程(RNLE)是一种广泛使用的提升任务工效学风险评估工具,依赖六个任务变量,包括水平(H)和垂直(V)手距;这些距离通常通过人工测量或专用传感系统获得,难以在现实环境中使用。我们评估了使用新型视觉语言模型(VLM)从RGB视频流非侵入式估计H和V的可行性。我们开发了两条多阶段的基于VLM的流水线:仅检测的文本引导流水线和检测加分割流水线。两条流水线都使用任务相关感兴趣区域的文本引导定位、对这些区域的视觉特征提取,以及基于transformer的时间回归,来估计提升起止时刻的H和V。针对一系列提升任务,我们在两条流水线和七种相机视角条件下,使用留一受试者(leave-one-subject-out)验证评估估计性能。结果在流水线和相机视角条件之间差异显著:基于分割的多视角流水线误差始终最小,估计H的平均绝对误差约为6-8 cm,估计V约为5-8 cm。相对于仅检测流水线,像素级分割将H的估计误差降低约20-30%,将V的估计误差降低约35-40%。这些发现支持基于VLM的流水线用于从视频估计RNLE距离参数的可行性。
摘要:Manual lifting tasks are a major contributor to work-related musculoskeletal disorders, and effective ergonomic risk assessment is essential for quantifying physical exposure and informing ergonomic interventions. The Revised NIOSH Lifting Equation (RNLE) is a widely used ergonomic risk assessment tool for lifting tasks that relies on six task variables, including horizontal (H) and vertical (V) hand distances; such distances are typically obtained through manual measurement or specialized sensing systems and are difficult to use in real-world environments. We evaluated the feasibility of using innovative vision-language models (VLMs) to non-invasively estimate H and V from RGB video streams. Two multi-stage VLM-based pipelines were developed: a text-guided detection-only pipeline and a detection-plus-segmentation pipeline. Both pipelines used text-guided localization of task-relevant regions of interest, visual feature extraction from those regions, and transformer-based temporal regression to estimate H and V at the start and end of a lift. For a range of lifting tasks, estimation performance was evaluated using leave-one-subject-out validation across the two pipelines and seven camera view conditions. Results varied significantly across pipelines and camera view conditions, with the segmentation-based, multi-view pipeline consistently yielding the smallest errors, achieving mean absolute errors of approximately 6-8 cm when estimating H and 5-8 cm when estimating V. Across pipelines and camera view configurations, pixel-level segmentation reduced estimation error by approximately 20-30% for H and 35-40% for V relative to the detection-only pipeline. These findings support the feasibility of VLM-based pipelines for video-based estimation of RNLE distance parameters.
【7】Personal Information Parroting in Language Models
标题:语言模型中的个人信息鹦鹉学舌
链接:https://arxiv.org/abs/2602.20580
作者:Nishant Subramani,Kshitish Ghate,Mona Diab
备注:EACL Findings 2026
摘要:现代语言模型(LM)在海量网络抓取数据上训练,其中包含数百万个个人信息(PI)实例,许多被LM记住,增加了隐私风险。在这项工作中,我们开发了正则与规则(R&R)检测器套件来检测电子邮件地址、电话号码和IP地址,其性能优于最好的基于正则表达式的PI检测器。在一组人工整理的483个PI实例上,我们测量了记忆程度:发现13.6%被Pythia-6.9b模型逐字复述,即当用原始文档中PI之前的令牌提示模型时,贪婪解码会精确生成整个PI片段。我们将该分析扩展到Pythia模型套件中不同规模(160M-6.9B)和不同预训练步数(70k-143k次迭代)的模型,发现模型规模和预训练量都与记忆程度正相关。即使是最小的模型Pythia-160m,也逐字复述了2.7%的实例。因此,我们强烈建议对预训练数据集进行积极的过滤和匿名化,以尽量减少PI复述。
摘要:Modern language models (LM) are trained on large scrapes of the Web, containing millions of personal information (PI) instances, many of which LMs memorize, increasing privacy risks. In this work, we develop the regexes and rules (R&R) detector suite to detect email addresses, phone numbers, and IP addresses, which outperforms the best regex-based PI detectors. On a manually curated set of 483 instances of PI, we measure memorization: finding that 13.6% are parroted verbatim by the Pythia-6.9b model, i.e., when the model is prompted with the tokens that precede the PI in the original document, greedy decoding generates the entire PI span exactly. We expand this analysis to study models of varying sizes (160M-6.9B) and pretraining time steps (70k-143k iterations) in the Pythia model suite and find that both model size and amount of pretraining are positively correlated with memorization. Even the smallest model, Pythia-160m, parrots 2.7% of the instances exactly. Consequently, we strongly recommend that pretraining datasets be aggressively filtered and anonymized to minimize PI parroting.
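下面用极简的正则模式和一个贪婪解码复述检查,示意论文的检测与记忆测量流程;这些模式仅作说明,远比论文的R&R检测器套件粗糙:

```python
import re

# Illustrative-only patterns; the paper's R&R suite adds validation rules
# and is considerably more precise than these regexes.
PI_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "ipv4":  re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def detect_pi(text):
    """Return all candidate PI spans found in `text`, grouped by type."""
    return {kind: pat.findall(text) for kind, pat in PI_PATTERNS.items()}

def is_parroted(generate, prefix, pi_span):
    """The paper's memorization test: prompt with the text preceding the
    PI and check whether greedy decoding reproduces the span verbatim.
    `generate` is any greedy continuation function (hypothetical API)."""
    continuation = generate(prefix, max_chars=len(pi_span))
    return continuation.startswith(pi_span)
```

论文以令牌为单位做前缀提示,这里为简明起见用字符近似。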
【8】Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA
标题:通过稀疏正交LoRA进行无线联邦多任务LLM微调
链接:https://arxiv.org/abs/2602.20492
作者:Nuocheng Yang,Sihua Wang,Ouwen Huan,Mingzhe Chen,Tony Q. S. Quek,Changchuan Yin
备注:13 pages, 5 figures
摘要:基于低秩自适应(LoRA)的去中心化联邦学习(DFL)使拥有多任务数据集的移动设备能够协作微调大语言模型(LLM):各设备通过无线连接与相邻设备的子集交换本地更新的参数以整合知识。然而,直接聚合在异构数据集上微调的参数会在DFL生命周期中引发三个主要问题:(i)微调过程中的灾难性知识遗忘,源于数据异构性造成的更新方向冲突;(ii)模型聚合过程中的低效通信与收敛,源于带宽密集的冗余模型传输;(iii)推理过程中的多任务知识干扰,源于推理期间不兼容的知识表示共存。为在完全去中心化的场景下解决这些问题,我们首先提出一种稀疏正交LoRA,保证模型更新之间的正交性,以消除微调过程中的方向冲突。随后,我们分析设备连接拓扑如何影响多任务性能,据此在聚合过程中设计基于聚类的拓扑。最后,我们提出一种隐式混合专家(MoE)机制,以避免推理过程中不兼容知识的共存。仿真结果表明,与传统LoRA方法相比,所提方法可将通信资源消耗降低多达73%,并将平均性能提高5%。
摘要:Decentralized federated learning (DFL) based on low-rank adaptation (LoRA) enables mobile devices with multi-task datasets to collaboratively fine-tune a large language model (LLM) by exchanging locally updated parameters with a subset of neighboring devices via wireless connections for knowledge integration. However, directly aggregating parameters fine-tuned on heterogeneous datasets induces three primary issues across the DFL life-cycle: (i) \textit{catastrophic knowledge forgetting during fine-tuning process}, arising from conflicting update directions caused by data heterogeneity; (ii) \textit{inefficient communication and convergence during model aggregation process}, due to bandwidth-intensive redundant model transmissions; and (iii) \textit{multi-task knowledge interference during inference process}, resulting from incompatible knowledge representations coexistence during inference. To address these issues in a fully decentralized scenario, we first propose a sparse-and-orthogonal LoRA that ensures orthogonality between model updates to eliminate direction conflicts during fine-tuning. Then, we analyze how device connection topology affects multi-task performance, prompting a cluster-based topology design during aggregation. Finally, we propose an implicit mixture of experts (MoE) mechanism to avoid the coexistence of incompatible knowledge during inference. Simulation results demonstrate that the proposed approach effectively reduces communication resource consumption by up to $73\%$ and enhances average performance by $5\%$ compared with the traditional LoRA method.
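摘要中"模型更新之间的正交性"可以用一个简单的正则项示意:惩罚不同任务LoRA更新(ΔW = BA)展平后两两方向的余弦平方。以下为假设性草图,并非论文的具体约束形式:

```python
import numpy as np

def orthogonality_penalty(updates):
    """Sum of squared cosines between flattened LoRA updates of different
    tasks; zero iff all pairwise updates are orthogonal. Squaring makes
    conflicts in either sign direction count."""
    vecs = [u.ravel() / (np.linalg.norm(u.ravel()) + 1e-12) for u in updates]
    penalty = 0.0
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            penalty += float(np.dot(vecs[i], vecs[j]) ** 2)
    return penalty
```

将这类惩罚加入各设备的本地微调损失,可以抑制异构任务之间的更新方向冲突。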
【9】Oracle-Robust Online Alignment for Large Language Models
标题:大型语言模型的预言机鲁棒在线对齐
链接:https://arxiv.org/abs/2602.20457
作者:Zimeng Li,Mudit Gaur,Vaneet Aggarwal
摘要:我们研究在错误设定的偏好反馈下大型语言模型的在线对齐,其中观测到的偏好预言机偏离理想但未知的真实预言机。由于数据收集与策略更新相互耦合,在线LLM对齐是一个双层强化学习问题。近期,SAIL(自我改进高效在线对齐)框架已将该问题化简为易处理的单层目标。本文在该问题中引入逐点预言机不确定性集,并将预言机鲁棒在线对齐目标表述为最坏情况优化问题。对于对数线性策略,我们证明该鲁棒目标可精确地闭式分解为原始损失函数加上一个显式的敏感度惩罚项。我们针对所得的弱凸目标设计了投影随机复合更新,并证明达到近似平稳点所需的预言机复杂度为$\widetilde{O}(\varepsilon^{-2})$。
摘要:We study online alignment of large language models under misspecified preference feedback, where the observed preference oracle deviates from an ideal but unknown ground-truth oracle. The online LLM alignment problem is a bi-level reinforcement problem due to the coupling between data collection and policy updates. Recently, the problem has been reduced to tractable single-level objective in the SAIL (Self-Improving Efficient Online Alignment) framework. In this paper, we introduce a pointwise oracle uncertainty set in this problem and formulate an oracle-robust online alignment objective as a worst-case optimization problem. For log-linear policies, we show that this robust objective admits an exact closed-form decomposition into the original loss function plus an explicit sensitivity penalty. We develop projected stochastic composite updates for the resulting weakly convex objective and prove $\widetilde{O}(\varepsilon^{-2})$ oracle complexity for reaching approximate stationarity.
【10】Protein Language Models Diverge from Natural Language: Comparative Analysis and Improved Inference
标题:蛋白质语言模型偏离自然语言:比较分析与改进的推理
链接:https://arxiv.org/abs/2602.20449
作者:Anna Hart,Chi Han,Jeonghwan Kim,Huimin Zhao,Heng Ji
摘要:现代蛋白质语言模型(PLM)将自然语言处理中基于transformer的模型架构应用于生物序列,预测各种蛋白质功能和特性。然而,蛋白质语言与自然语言存在关键差异,例如尽管词表只有20种氨基酸,其功能空间却十分丰富。这些差异促使我们研究基于transformer的架构在蛋白质领域中如何以不同方式运作,以及如何更好地利用PLM解决蛋白质相关任务。在这项工作中,我们首先直接比较蛋白质与自然语言领域中,信息在注意力头各层之间分布的差异。此外,我们改造了一种简单的早退(early-exit)技术(该技术最初在自然语言领域用于以性能换效率),让模型能针对具体任务和当前蛋白质自动从PLM的中间层选择蛋白质表示,从而在蛋白质非结构属性预测中同时获得更高的准确率和大幅的效率提升。我们取得了0.4到7.01个百分点的性能增益,同时在各模型和非结构预测任务上将效率提高10%以上。我们的工作开辟了直接比较语言模型进入蛋白质领域后行为如何变化的研究方向,并推进了生物领域的语言建模。
摘要:Modern Protein Language Models (PLMs) apply transformer-based model architectures from natural language processing to biological sequences, predicting a variety of protein functions and properties. However, protein language has key differences from natural language, such as a rich functional space despite a vocabulary of only 20 amino acids. These differences motivate research into how transformer-based architectures operate differently in the protein domain and how we can better leverage PLMs to solve protein-related tasks. In this work, we begin by directly comparing how the distribution of information stored across layers of attention heads differs between the protein and natural language domain. Furthermore, we adapt a simple early-exit technique-originally used in the natural language domain to improve efficiency at the cost of performance-to achieve both increased accuracy and substantial efficiency gains in protein non-structural property prediction by allowing the model to automatically select protein representations from the intermediate layers of the PLMs for the specific task and protein at hand. We achieve performance gains ranging from 0.4 to 7.01 percentage points while simultaneously improving efficiency by over 10 percent across models and non-structural prediction tasks. Our work opens up an area of research directly comparing how language models change behavior when moved into the protein domain and advances language modeling in biological domains.
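摘要所述的早退思路可以用一个基于置信度的最小示意说明:逐层检查中间表示对应的预测分布,一旦最大概率超过阈值就提前退出。论文的层选择准则未在摘要中给出,此处的阈值规则为假设:

```python
import numpy as np

def early_exit_predict(layer_probs, threshold=0.9):
    """Confidence-based early exit sketch: scan per-layer prediction
    distributions and stop at the first layer whose top probability
    exceeds `threshold`; otherwise fall back to the final layer.
    Returns (layer_index, predicted_class)."""
    for i, p in enumerate(layer_probs):
        p = np.asarray(p, dtype=float)
        if p.max() >= threshold:
            return i, int(p.argmax())
    p = np.asarray(layer_probs[-1], dtype=float)
    return len(layer_probs) - 1, int(p.argmax())
```

提前退出省去了后续层的前向计算,这就是摘要中效率增益的来源;准确率增益则来自某些任务在中间层表示上反而更易预测。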
【11】Emergent Manifold Separability during Reasoning in Large Language Models
标题:大型语言模型推理过程中涌现的流形可分离性
链接:https://arxiv.org/abs/2602.20338
作者:Alexandre Polo,Chanwoo Chun,SueYeon Chung
备注:Alexandre Polo and Chanwoo Chun contributed equally to this work
摘要:思维链(CoT)提示显著提升了大型语言模型的推理能力,但其底层表示几何的时间动态仍知之甚少。我们将流形容量理论(MCT)应用于组合式布尔逻辑任务来研究这些动态,从而在不受探针训练混杂因素影响的情况下量化潜在表示的线性可分性。我们的分析表明,推理表现为一个短暂的几何脉冲:概念流形在计算前一刻被解缠为线性可分的子空间,并在计算后迅速压缩。这一行为与标准线性探针准确率相背离(后者在计算结束后很长时间内仍保持高位),表明仅可检索的信息与在几何上已为处理做好准备的信息之间存在根本区别。我们将此现象解释为动态流形管理:模型动态调节表示容量,以在整条推理链中优化残差流的带宽。
摘要:Chain-of-Thought (CoT) prompting significantly improves reasoning in Large Language Models, yet the temporal dynamics of the underlying representation geometry remain poorly understood. We investigate these dynamics by applying Manifold Capacity Theory (MCT) to a compositional Boolean logic task, allowing us to quantify the linear separability of latent representations without the confounding factors of probe training. Our analysis reveals that reasoning manifests as a transient geometric pulse, where concept manifolds are untangled into linearly separable subspaces immediately prior to computation and rapidly compressed thereafter. This behavior diverges from standard linear probe accuracy, which remains high long after computation, suggesting a fundamental distinction between information that is merely retrievable and information that is geometrically prepared for processing. We interpret this phenomenon as \emph{Dynamic Manifold Management}, a mechanism where the model dynamically modulates representational capacity to optimize the bandwidth of the residual stream throughout the reasoning chain.
【12】Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking
标题:视觉语言模型中的电路追踪:了解多模式思维的内部机制
链接:https://arxiv.org/abs/2602.20330
作者:Jingcheng Yang,Tianhu Xiong,Shengyi Qian,Klara Nahrstedt,Mingyuan Wu
备注:To appear in the Findings of CVPR 2026
摘要:视觉语言模型(VLM)功能强大,但仍是难以透视的黑盒。我们提出首个面向VLM的透明电路追踪框架,用于系统分析多模态推理。通过利用transcoder、归因图和基于注意力的方法,我们揭示了VLM如何分层整合视觉与语义概念。我们发现,不同的视觉特征电路可以处理数学推理并支撑跨模态关联。经特征引导和电路修补验证,我们的框架证明这些电路是因果且可控的,为更可解释、更可靠的VLM奠定了基础。
摘要:Vision-language models (VLMs) are powerful but remain opaque black boxes. We introduce the first framework for transparent circuit tracing in VLMs to systematically analyze multimodal reasoning. By utilizing transcoders, attribution graphs, and attention-based methods, we uncover how VLMs hierarchically integrate visual and semantic concepts. We reveal that distinct visual feature circuits can handle mathematical reasoning and support cross-modal associations. Validated through feature steering and circuit patching, our framework proves these circuits are causal and controllable, laying the groundwork for more explainable and reliable VLMs.
【13】An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models
标题:使用大型语言模型从临床笔记中进行端到端罕见疾病表型分析的人工智能框架
链接:https://arxiv.org/abs/2602.20324
作者:Cathy Shyr,Yan Hu,Rory J. Tinker,Thomas A. Cassini,Kevin W. Byram,Rizwan Hamid,Daniel V. Fabbri,Adam Wright,Josh F. Peterson,Lisa Bastarache,Hua Xu
摘要:表型分析是罕见疾病诊断的基础,但从临床记录中手动处理结构化表型是劳动密集型的,难以扩展。现有的人工智能方法通常优化表型分析的各个组成部分,但不能实现从临床文本中提取特征、将其标准化为人类表型本体(HPO)术语以及对诊断信息HPO术语进行优先级排序的完整临床工作流程。我们开发了RARE-PHENIX,这是一个用于罕见疾病表型分析的端到端AI框架,它集成了基于大型语言模型的表型提取、基于本体的HPO术语标准化以及诊断信息表型的监督排名。我们使用来自11个未诊断疾病网络临床站点的2,671名患者的数据对RARE-PHENIX进行了训练,并在范德比尔特大学医学中心的16,357份真实临床记录上进行了外部验证。使用临床医生策划的HPO术语作为金标准,RARE-PHENIX在端到端评估中基于本体的相似性和精确度-召回-F1指标方面始终优于最先进的深度学习基线(PhenoBERT)(即,基于本体的相似性为0.70对0.58)。消融分析表明,在RARE-PHENIX中添加每个模块(提取、标准化和优先化)后,性能得到改善,支持对完整临床表型工作流程建模的价值。通过将表型建模为临床一致的工作流程,而不是单一的提取任务,RARE-PHENIX提供了结构化的,排名的表型,这些表型与临床医生的治疗更加一致,并有可能支持现实世界中的人类在环罕见疾病诊断。
摘要:Phenotyping is fundamental to rare disease diagnosis, but manual curation of structured phenotypes from clinical notes is labor-intensive and difficult to scale. Existing artificial intelligence approaches typically optimize individual components of phenotyping but do not operationalize the full clinical workflow of extracting features from clinical text, standardizing them to Human Phenotype Ontology (HPO) terms, and prioritizing diagnostically informative HPO terms. We developed RARE-PHENIX, an end-to-end AI framework for rare disease phenotyping that integrates large language model-based phenotype extraction, ontology-grounded standardization to HPO terms, and supervised ranking of diagnostically informative phenotypes. We trained RARE-PHENIX using data from 2,671 patients across 11 Undiagnosed Diseases Network clinical sites, and externally validated it on 16,357 real-world clinical notes from Vanderbilt University Medical Center. Using clinician-curated HPO terms as the gold standard, RARE-PHENIX consistently outperformed a state-of-the-art deep learning baseline (PhenoBERT) across ontology-based similarity and precision-recall-F1 metrics in end-to-end evaluation (i.e., ontology-based similarity of 0.70 vs. 0.58). Ablation analyses demonstrated performance improvements with the addition of each module in RARE-PHENIX (extraction, standardization, and prioritization), supporting the value of modeling the full clinical phenotyping workflow. By modeling phenotyping as a clinically aligned workflow rather than a single extraction task, RARE-PHENIX provides structured, ranked phenotypes that are more concordant with clinician curation and has the potential to support human-in-the-loop rare disease diagnosis in real-world settings.
【14】QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models
标题:QuantVLA:视觉-语言-动作模型的规模校准训练后量化
链接:https://arxiv.org/abs/2602.20309
作者:Jingxuan Zhang,Yunta Hsieh,Zhongwei Wang,Haokun Lin,Xin Wang,Ziqi Wang,Yingtie Lei,Mi Zhang
摘要:视觉-语言-动作(VLA)模型统一了具身代理的感知、语言与控制,但由于计算和内存需求快速增长(尤其是当模型扩展到更长的时间范围和更大的主干网络时),其实际部署面临重大挑战。为了解决这些瓶颈,我们提出QuantVLA,一个免训练的训练后量化(PTQ)框架;据我们所知,这是首个面向VLA系统的PTQ方法,也是首个成功量化扩散Transformer(DiT)动作头的方法。QuantVLA包含三个尺度校准组件:(1)选择性量化布局,将语言主干和DiT中的所有线性层整数化,同时将注意力投影保持为浮点,以保留原始算子调度;(2)注意力温度匹配,一种轻量级的逐头缩放机制,用于稳定注意力logits,并在推理时折叠进反量化尺度;(3)输出头平衡,一种逐层残差接口校准,用于缓解投影后的能量漂移。该框架无需额外训练,仅使用一个小型无标注校准缓冲区,在保持架构不变的同时支持低位宽权重与激活的整数内核。在LIBERO上的代表性VLA模型中,QuantVLA的任务成功率超过全精度基线,在量化组件上实现约70%的相对内存节省,并将端到端推理延迟加速1.22倍,为在严格的计算、内存和功耗限制下实现可扩展的低位宽具身智能提供了一条实用途径。
摘要:Vision-language-action (VLA) models unify perception, language, and control for embodied agents but face significant challenges in practical deployment due to rapidly increasing compute and memory demands, especially as models scale to longer horizons and larger backbones. To address these bottlenecks, we introduce QuantVLA, a training-free post-training quantization (PTQ) framework that, to our knowledge, is the first PTQ approach for VLA systems and the first to successfully quantize a diffusion transformer (DiT) action head. QuantVLA incorporates three scale-calibrated components: (1) a selective quantization layout that integerizes all linear layers in both the language backbone and the DiT while keeping attention projections in floating point to preserve the original operator schedule; (2) attention temperature matching, a lightweight per-head scaling mechanism that stabilizes attention logits and is folded into the dequantization scales at inference; and (3) output head balancing, a per-layer residual interface calibration that mitigates post-projection energy drift. The framework requires no additional training, uses only a small unlabeled calibration buffer, and supports integer kernels for low-bit weights and activations while leaving the architecture unchanged. Across representative VLA models on LIBERO, QuantVLA exceeds the task success rates of full-precision baselines, achieves about 70% relative memory savings on the quantized components, and delivers a 1.22x speedup in end-to-end inference latency, providing a practical pathway toward scalable low-bit embodied intelligence under strict compute, memory, and power constraints.
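QuantVLA建立在对称整数量化之上。下面的草图示意逐通道对称PTQ,以及如何把一个逐头温度因子折叠进反量化尺度(对应摘要中的注意力温度匹配);具体折叠方式为假设,并非论文实现:

```python
import numpy as np

def quantize_per_channel(w, bits=8):
    """Minimal symmetric per-channel PTQ: one scale per output row, as in
    weight-only integer kernels. Returns integer weights and scales."""
    qmax = 2 ** (bits - 1) - 1
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)      # avoid divide-by-zero
    q = np.clip(np.round(w / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales, temperature=1.0):
    """Dequantize. A per-head `temperature` (cf. attention temperature
    matching) can be folded in by dividing the stored scales, so it adds
    no extra work at inference time."""
    return q.astype(float) * (scales / temperature)
```

由于温度只改写尺度常数而不改写整数权重,推理时不引入任何额外算子,这正是"折叠进反量化尺度"的含义。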
【15】Exploring Anti-Aging Literature via ConvexTopics and Large Language Models
标题:通过ConvexTopics和大型语言模型探索抗衰老文献
链接:https://arxiv.org/abs/2602.20224
作者:Lana E. Yeganova,Won G. Kim,Shubo Tian,Natalie Xie,Donald C. Comeau,W. John Wilbur,Zhiyong Lu
摘要:生物医学出版物的快速扩张给组织知识和发现新兴趋势带来挑战,凸显了对可扩展且可解释方法的需求。常见的聚类与主题建模方法(如K-means或LDA)对初始化敏感,容易陷入局部最优,限制了可复现性和评估。我们提出一种凸优化聚类算法的重新表述,通过从数据中选择代表样本(exemplar)并保证全局最优,产生稳定的细粒度主题。应用于约12,000篇关于衰老与长寿的PubMed文章,我们的方法发掘出经医学专家验证的主题,涵盖从分子机制到膳食补充剂、体力活动和肠道微生物群等可解释主题。该方法表现出色;最重要的是,其可复现性与可解释性使其有别于K-means、LDA和BERTopic等常见聚类方法。这项工作为开发可扩展、可通过网络访问的知识发现工具奠定了基础。
摘要:The rapid expansion of biomedical publications creates challenges for organizing knowledge and detecting emerging trends, underscoring the need for scalable and interpretable methods. Common clustering and topic modeling approaches such as K-means or LDA remain sensitive to initialization and prone to local optima, limiting reproducibility and evaluation. We propose a reformulation of a convex optimization based clustering algorithm that produces stable, fine-grained topics by selecting exemplars from the data and guaranteeing a global optimum. Applied to about 12,000 PubMed articles on aging and longevity, our method uncovers topics validated by medical experts. It yields interpretable topics spanning from molecular mechanisms to dietary supplements, physical activity, and gut microbiota. The method performs favorably, and most importantly, its reproducibility and interpretability distinguish it from common clustering approaches, including K-means, LDA, and BERTopic. This work provides a basis for developing scalable, web-accessible tools for knowledge discovery.
【16】Golden Layers and Where to Find Them: Improved Knowledge Editing for Large Language Models Via Layer Gradient Analysis
标题:黄金层及其在哪里找到它们:通过层梯度分析改进大型语言模型的知识编辑
链接:https://arxiv.org/abs/2602.20207
作者:Shrestha Datta,Hongfu Liu,Anshuman Chhabra
摘要:大型语言模型(LLM)中的知识编辑旨在将模型对特定查询的预测更新为期望目标,同时保持其在所有其他输入上的行为。该过程通常包括两个阶段:确定要编辑的层和执行参数更新。直观上,不同查询可能将知识定位在模型的不同深度,导致固定编辑层在不同样本上的编辑性能各异。在这项工作中,我们假设存在固定的黄金层,其编辑性能可接近逐样本最优层。为验证这一假设,我们通过将黄金层与真实的逐样本最优层进行比较提供了实证证据。此外,我们表明黄金层可以用代理数据集可靠地识别,并能跨数据集有效泛化到未见的测试集查询。最后,我们提出一种新方法——层梯度分析(LGA),通过梯度归因高效地估计黄金层,避免了跨多次编辑运行的大量试错。在多个基准数据集上的大量实验证明了LGA方法在不同LLM类型和多种知识编辑方法上的有效性与鲁棒性。
摘要:Knowledge editing in Large Language Models (LLMs) aims to update the model's prediction for a specific query to a desired target while preserving its behavior on all other inputs. This process typically involves two stages: identifying the layer to edit and performing the parameter update. Intuitively, different queries may localize knowledge at different depths of the model, resulting in different sample-wise editing performance for a fixed editing layer. In this work, we hypothesize the existence of fixed golden layers that can achieve near-optimal editing performance similar to sample-wise optimal layers. To validate this hypothesis, we provide empirical evidence by comparing golden layers against ground-truth sample-wise optimal layers. Furthermore, we show that golden layers can be reliably identified using a proxy dataset and generalize effectively to unseen test set queries across datasets. Finally, we propose a novel method, namely Layer Gradient Analysis (LGA) that estimates golden layers efficiently via gradient-attribution, avoiding extensive trial-and-error across multiple editing runs. Extensive experiments on several benchmark datasets demonstrate the effectiveness and robustness of our LGA approach across different LLM types and various knowledge editing methods.
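梯度归因选层的一个最直接的示意:用编辑损失对各层参数的梯度范数给层打分,取得分最高者。评分函数为假设,摘要未给出LGA的具体公式:

```python
import numpy as np

def select_golden_layer(layer_grads):
    """Gradient-attribution sketch: score each layer by the norm of the
    edit-loss gradient w.r.t. its parameters and pick the highest-scoring
    layer. The scoring rule is an assumption; the abstract does not give
    LGA's exact formula."""
    scores = [float(np.linalg.norm(np.asarray(g).ravel())) for g in layer_grads]
    return int(np.argmax(scores)), scores
```

这样的打分只需一次反向传播,而逐层试编辑需要对每个候选层各做一次完整编辑运行,这正是摘要所述的效率优势。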
【17】MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Elastic LLMs
标题:MoBiQuant:令牌自适应弹性LLM的混合位量化
链接:https://arxiv.org/abs/2602.20191
作者:Dongwei Wang,Jinhee Kim,Seokho Han,Denis Gudovskiy,Yohei Nakata,Tomoyuki Okuno,KhayTze Peong,Kang Eun Jeon,Jong Hwan Ko,Yiran Chen,Huanrui Yang
备注:17 pages, 12 figures
摘要:云和边缘设备上不断变化的运行时复杂度要求弹性大语言模型(LLM)部署,即LLM可根据可用计算资源以不同量化精度进行推理。然而,已有观察表明,量化的校准参数通常与特定精度绑定,这给运行时的弹性精度校准和精度切换带来挑战。在这项工作中,我们将校准参数随精度变化的根源归因于一种随精度变化的离群值迁移现象所导致的不同令牌级敏感度。受此观察启发,我们提出\texttt{MoBiQuant},一种新颖的混合位(Mixture-of-Bits)量化框架,基于令牌敏感度为弹性LLM推理调整权重精度。具体而言,我们提出可迭代重建更高精度权重的多合一递归残差量化,以及动态选择残差位片数量的令牌感知路由器。MoBiQuant在实现平滑精度切换的同时,提升了对令牌离群值分布的泛化能力。实验结果表明,MoBiQuant具有很强的弹性,无需重复校准即可在LLaMA3-8B上达到与按位宽专门校准的PTQ相当的性能。
摘要:Changing runtime complexity on cloud and edge devices necessitates elastic large language model (LLM) deployment, where an LLM can be inferred with various quantization precisions based on available computational resources. However, it has been observed that the calibration parameters for quantization are typically linked to specific precisions, which presents challenges during elastic-precision calibration and precision switching at runtime. In this work, we attribute the source of varying calibration parameters to the varying token-level sensitivity caused by a precision-dependent outlier migration phenomenon. Motivated by this observation, we propose \texttt{MoBiQuant}, a novel Mixture-of-Bits quantization framework that adjusts weight precision for elastic LLM inference based on token sensitivity. Specifically, we propose the many-in-one recursive residual quantization that can iteratively reconstruct higher-precision weights and the token-aware router to dynamically select the number of residual bit slices. MoBiQuant enables smooth precision switching while improving generalization for the distribution of token outliers. Experimental results demonstrate that MoBiQuant exhibits strong elasticity, enabling it to match the performance of bit-specific calibrated PTQ on LLaMA3-8B without repeated calibration.
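An illustrative aside (not from the paper): the "recursive residual quantization" idea, where each extra bit slice quantizes the residual left by the previous slices so that summing more slices yields a higher-precision reconstruction, can be sketched as below. All function names, the grid-halving schedule, and the numbers are hypothetical; MoBiQuant's actual scheme is more involved.

```python
def quantize(values, step):
    """Uniform symmetric quantization: snap each value to the nearest grid point."""
    return [round(v / step) * step for v in values]

def residual_slices(weights, step, num_slices):
    """Recursively quantize the residual left by the previous slices.

    Returns a list of quantized slices; summing the first k slices gives a
    k-slice reconstruction whose error shrinks as k grows.
    """
    slices, residual = [], list(weights)
    for _ in range(num_slices):
        q = quantize(residual, step)
        slices.append(q)
        residual = [r - s for r, s in zip(residual, q)]
        step /= 2  # each extra slice refines the quantization grid
    return slices

def reconstruct(slices, k):
    """Sum the first k residual slices elementwise."""
    n = len(slices[0])
    return [sum(s[i] for s in slices[:k]) for i in range(n)]
```

A token-aware router in this picture would simply choose k per token, trading accuracy for compute without recalibrating each precision.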
【18】Benchmarking Distilled Language Models: Performance and Efficiency in Resource-Constrained Settings
标题:提炼语言模型基准:资源受限环境中的性能和效率
链接:https://arxiv.org/abs/2602.20164
作者:Sachin Gopal Wani,Eric Page,Ajay Dholakia,David Ellison
备注:16 pages, 5 figures, accepted at the the 2025 TPCTC Conference
摘要:知识蒸馏提供了一条变革性的途径,可以开发出适用于资源受限环境的强大而高效的小型语言模型(SLM)。在本文中,我们对蒸馏模型与其原始版本及专有模型的性能和计算成本进行了基准测试,对其效率给出了定量分析。我们的结果表明,蒸馏创造了一条更优越的性能-计算曲线。我们发现,得到一个蒸馏的8B模型的计算效率比从头训练其原始版本高2000倍以上,同时其推理能力与规模为其十倍的标准模型相当,甚至更强。这些发现证明,蒸馏不仅是一种压缩技术,更是构建最先进、可获取的AI的主要策略。
摘要:Knowledge distillation offers a transformative pathway to developing powerful, yet efficient, small language models (SLMs) suitable for resource-constrained environments. In this paper, we benchmark the performance and computational cost of distilled models against their vanilla and proprietary counterparts, providing a quantitative analysis of their efficiency. Our results demonstrate that distillation creates a superior performance-to-compute curve. We find that creating a distilled 8B model is over 2,000 times more compute-efficient than training its vanilla counterpart, while achieving reasoning capabilities on par with, or even exceeding, standard models ten times its size. These findings validate distillation not just as a compression technique, but as a primary strategy for building state-of-the-art, accessible AI.
Graph相关(图学习|图神经网络|图优化等)(5篇)
【1】Probing Graph Neural Network Activation Patterns Through Graph Topology
标题:通过图拓扑探测图神经网络激活模式
链接:https://arxiv.org/abs/2602.21092
作者:Floriano Tori,Lorenzo Bini,Marco Sorbi,Stéphane Marchand-Maillet,Vincent Ginis
摘要:图上的曲率概念提供了对图拓扑的理论刻画,能够突出瓶颈与连接更稠密的区域。图神经网络中消息传递范式的副作用,如过平滑和过挤压,通常被归因于这些区域。然而,图的拓扑如何与GNN学到的偏好相互作用仍不清楚。我们通过大规模激活(Massive Activations,即图Transformer中极端的边激活值)来探测这种对应关系。我们在合成图和分子基准上的研究结果显示,尽管曲率极值在理论上与信息流相关,大规模激活并不优先集中在这些位置。在长程图基准(Long Range Graph Benchmark)上,我们发现了一种系统性的曲率偏移:全局注意力机制加剧了拓扑瓶颈,大幅增加了负曲率的出现频率。我们的工作将曲率重新定位为一种诊断探针,用于理解图学习何时以及为何失败。
摘要:Curvature notions on graphs provide a theoretical description of graph topology, highlighting bottlenecks and denser connected regions. Artifacts of the message passing paradigm in Graph Neural Networks, such as oversmoothing and oversquashing, have been attributed to these regions. However, it remains unclear how the topology of a graph interacts with the learned preferences of GNNs. Through Massive Activations, which correspond to extreme edge activation values in Graph Transformers, we probe this correspondence. Our findings on synthetic graphs and molecular benchmarks reveal that MAs do not preferentially concentrate on curvature extremes, despite their theoretical link to information flow. On the Long Range Graph Benchmark, we identify a systemic \textit{curvature shift}: global attention mechanisms exacerbate topological bottlenecks, drastically increasing the prevalence of negative curvature. Our work reframes curvature as a diagnostic probe for understanding when and why graph learning fails.
【2】DRESS: A Continuous Framework for Structural Graph Refinement
标题:DRESS:结构图细化的连续框架
链接:https://arxiv.org/abs/2602.20833
作者:Eduar Castrillo Velilla
摘要:Weisfeiler-Lehman(WL)层次结构是图同构测试和结构分析的基石框架。然而,从1-WL扩展到3-WL及更高阶需要基于张量的操作,其复杂度为O(n^3)或O(n^4),这使得它们对大规模图在计算上不可行。在本文中,我们从原始DRESS方程(Castrillo, Leon和Gomez, 2018)出发,这是一个定义在边上的无参数连续动力系统,并证明它能将棱柱图与K_{3,3}区分开来,而这对图是1-WL可证明无法分离的。随后,我们将其推广为Motif-DRESS,用任意结构模体取代三角形邻域,并在三个充分条件下收敛到唯一不动点;并进一步推广为Generalized-DRESS,一个由邻域算子、聚合函数和范数的选择参数化的抽象模板。最后,我们引入Delta-DRESS,它在每个删点子图G\{v}上运行DRESS,将该框架与Kelly-Ulam重建猜想联系起来。Motif-DRESS和Delta-DRESS都能在经验上区分使3-WL混淆的强正则图(SRG),例如Rook图与Shrikhande图。我们的结果确立了DRESS家族作为一个高度可扩展的框架,在知名基准图上经验性地超越1-WL与3-WL,而无需令人望而却步的O(n^4)计算成本。
摘要:The Weisfeiler-Lehman (WL) hierarchy is a cornerstone framework for graph isomorphism testing and structural analysis. However, scaling beyond 1-WL to 3-WL and higher requires tensor-based operations that scale as O(n^3) or O(n^4), making them computationally prohibitive for large graphs. In this paper, we start from the Original-DRESS equation (Castrillo, Leon, and Gomez, 2018)--a parameter-free, continuous dynamical system on edges--and show that it distinguishes the prism graph from K_{3,3}, a pair that 1-WL provably cannot separate. We then generalize it to Motif-DRESS, which replaces triangle neighborhoods with arbitrary structural motifs and converges to a unique fixed point under three sufficient conditions, and further to Generalized-DRESS, an abstract template parameterized by the choice of neighborhood operator, aggregation function and norm. Finally, we introduce Delta-DRESS, which runs DRESS on each node-deleted subgraph G\{v}, connecting the framework to the Kelly-Ulam reconstruction conjecture. Both Motif-DRESS and Delta-DRESS empirically distinguish Strongly Regular Graphs (SRGs)--such as the Rook and Shrikhande graphs--that confound 3-WL. Our results establish the DRESS family as a highly scalable framework that empirically surpasses both 1-WL and 3-WL on well-known benchmark graphs, without the prohibitive O(n^4) computational cost.
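An illustrative aside (not from the paper): the prism vs. K_{3,3} pair cited above is a classic 1-WL failure case, since both graphs are 3-regular on six vertices, so color refinement never separates them. A minimal sketch of 1-WL color refinement demonstrating this, with hypothetical function names:

```python
def wl_colors(adj, rounds=5):
    """1-WL color refinement; returns the sorted final color multiset."""
    colors = {v: 0 for v in adj}  # uniform initial coloring
    for _ in range(rounds):
        # a node's signature is its color plus the multiset of neighbor colors
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: palette[sigs[v]] for v in adj}
    return sorted(colors.values())

# Triangular prism: triangles {0,1,2} and {3,4,5} joined by a perfect matching.
prism = {0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 5],
         3: [4, 5, 0], 4: [3, 5, 1], 5: [3, 4, 2]}
# K_{3,3}: complete bipartite graph on parts {0,1,2} and {3,4,5}.
k33 = {v: [u for u in range(3, 6)] for v in range(3)}
k33.update({v: [u for u in range(3)] for v in range(3, 6)})
```

Running `wl_colors` on both graphs yields identical color histograms, which is exactly the gap the DRESS dynamics are claimed to close.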
【3】Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs
标题:有向图上基于推和的分散优化的稳定性和推广
链接:https://arxiv.org/abs/2602.20567
作者:Yifei Liang,Yan Sun,Xiaochun Cao,Li Shen
备注:47 Pages
摘要:基于Push-Sum的分散式学习可以在信息交换可能不对称的有向通信网络上进行优化。虽然这些方法的收敛特性已被很好地理解,但由于列随机混合和不对称误差传播引起的结构偏差,其有限迭代的稳定性和泛化行为仍不清楚。在这项工作中,我们为随机梯度推(SGP)算法建立了一个统一的一致稳定性框架,以刻画有向拓扑结构的影响。一个关键的技术成分是Push-Sum的不平衡感知一致性界,它通过两个量控制共识偏差:稳定分布不平衡参数$δ$和控制混合速度的谱隙$(1-λ)$。这种分解使我们能够将统计效应与拓扑引起的偏差解耦。我们对凸目标以及满足Polyak--Łojasiewicz条件的非凸目标建立了有限迭代稳定性和优化保证。对于凸问题,在合适的步长调度下,SGP获得了阶为$\tilde{\mathcal{O}}\!\left(\frac{1}{\sqrt{mn}}+\frac{γ}{δ(1-λ)}+γ\right)$的超额泛化误差,并刻画了使该界最小化的最优提前停止时间。对于PŁ目标,我们得到了类凸的优化和泛化速率,其主导项正比于$κ\!\left(1+\frac{1}{δ(1-λ)}\right)$,揭示了问题条件数与有向通信拓扑之间的乘法耦合。我们的分析阐明了与标准分散式SGD相比何时需要Push-Sum校正,并量化了不平衡与混合如何共同塑造可达到的最佳学习性能。
摘要:Push-Sum-based decentralized learning enables optimization over directed communication networks, where information exchange may be asymmetric. While convergence properties of such methods are well understood, their finite-iteration stability and generalization behavior remain unclear due to structural bias induced by column-stochastic mixing and asymmetric error propagation. In this work, we develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm that captures the effect of directed topology. A key technical ingredient is an imbalance-aware consistency bound for Push-Sum, which controls consensus deviation through two quantities: the stationary distribution imbalance parameter $δ$ and the spectral gap $(1-λ)$ governing mixing speed. This decomposition enables us to disentangle statistical effects from topology-induced bias. We establish finite-iteration stability and optimization guarantees for both convex objectives and non-convex objectives satisfying the Polyak--Łojasiewicz condition. For convex problems, SGP attains excess generalization error of order $\tilde{\mathcal{O}}\!\left(\frac{1}{\sqrt{mn}}+\frac{γ}{δ(1-λ)}+γ\right)$ under step-size schedules, and we characterize the corresponding optimal early stopping time that minimizes this bound. For PŁ objectives, we obtain convex-like optimization and generalization rates with dominant dependence proportional to $κ\!\left(1+\frac{1}{δ(1-λ)}\right)$, revealing a multiplicative coupling between problem conditioning and directed communication topology. Our analysis clarifies when Push-Sum correction is necessary compared with standard decentralized SGD and quantifies how imbalance and mixing jointly shape the best attainable learning performance.
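An illustrative aside (not from the paper): the Push-Sum correction at the heart of SGP maintains, alongside each node's value, a scalar weight that absorbs the bias of column-stochastic (asymmetric) mixing; the de-biased ratio converges to the network average. A minimal consensus-only sketch, omitting the gradient step entirely and using a hypothetical 3-node directed network:

```python
def push_sum(x0, A, iters=200):
    """Push-Sum over a directed graph with column-stochastic mixing matrix A.

    Each node i keeps a value x_i and a weight w_i (initialized to 1);
    the ratio x_i / w_i converges to the network-wide average even though
    A is only column-stochastic, i.e. communication is asymmetric.
    """
    n = len(x0)
    x, w = list(x0), [1.0] * n
    for _ in range(iters):
        x = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        w = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    return [xi / wi for xi, wi in zip(x, w)]

# Column-stochastic but NOT doubly stochastic: plain averaging would be biased.
A = [[0.5, 0.0, 0.3],
     [0.5, 0.6, 0.0],
     [0.0, 0.4, 0.7]]
```

In the paper's notation, the imbalance of the stationary distribution this matrix induces is what the parameter $δ$ measures.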
【4】CGSTA: Cross-Scale Graph Contrast with Stability-Aware Alignment for Multivariate Time-Series Anomaly Detection
标题:CGSTA:跨尺度图对比与稳定性感知对齐,用于多元时间序列异常检测
链接:https://arxiv.org/abs/2602.20468
作者:Zhongpeng Qi,Jun Zhang,Wei Li,Zhuoxuan Liang
备注:Accepted by DASFAA'26
摘要:多变量时间序列异常检测对于可靠的工业控制、遥测和服务监控至关重要。然而,不断变化的变量间依赖性和不可避免的噪声使其具有挑战性。现有的方法通常使用单尺度图或实例级对比度。此外,学习的动态图可能在没有稳定锚的情况下过拟合噪声,导致误报或遗漏。为了应对这些挑战,我们提出了具有两个关键创新的CGSTA框架。首先,动态分层图构建(DLGC)为每个滑动窗口形成变量关系的局部,区域和全局视图;而不是对比整个窗口,跨尺度对比判别(CDS)对比每个视图中的图形表示,并在视图中对齐相同的窗口,以使学习结构感知。其次,稳定性感知对齐(SAA)保持从正常数据中学习的每尺度稳定参考,并将当前窗口的快速变化的图形引导到它以抑制噪声。我们融合了多尺度和时间特征,并使用条件密度估计器来产生每个时间步的异常分数。在四个基准测试中,CGSTA在PSM和WADI上提供了最佳性能,并且与SWaT和SMAP上的基线方法相当。
摘要:Multivariate time-series anomaly detection is essential for reliable industrial control, telemetry, and service monitoring. However, the evolving inter-variable dependencies and inevitable noise render it challenging. Existing methods often use single-scale graphs or instance-level contrast. Moreover, learned dynamic graphs can overfit noise without a stable anchor, causing false alarms or misses. To address these challenges, we propose the CGSTA framework with two key innovations. First, Dynamic Layered Graph Construction (DLGC) forms local, regional, and global views of variable relations for each sliding window; rather than contrasting whole windows, Contrastive Discrimination across Scales (CDS) contrasts graph representations within each view and aligns the same window across views to make learning structure-aware. Second, Stability-Aware Alignment (SAA) maintains a per-scale stable reference learned from normal data and guides the current window's fast-changing graphs toward it to suppress noise. We fuse the multi-scale and temporal features and use a conditional density estimator to produce per-time-step anomaly scores. Across four benchmarks, CGSTA delivers optimal performance on PSM and WADI, and is comparable to the baseline methods on SWaT and SMAP.
【5】OrgFlow: Generative Modeling of Organic Crystal Structures from Molecular Graphs
标题:OrgFlow:从分子图生成有机晶体结构建模
链接:https://arxiv.org/abs/2602.20195
作者:Mohammadmahdi Vahediahmar,Matthew A. McDonald,Feng Liu
备注:9 pages, 4 figures
摘要:晶体结构预测是材料科学中的一个长期挑战,大多数数据驱动的方法都是针对无机系统开发的。这为有机晶体留下了一个重要的空白,有机晶体是药物,聚合物和功能材料的核心,但存在独特的挑战,例如更大的晶胞和严格的化学连接性。本文介绍了一种直接从分子图预测有机晶体结构的流匹配模型。该体系结构将分子连接性与周期性边界条件相结合,同时保持晶体系统的对称性。一个键意识的损失引导模型走向现实的局部化学通过强制分布的键长和连接。为了支持可靠和有效的训练,我们构建了一个有机晶体的策划数据集,以及一个预处理管道,可以预先计算键和边,大大减少了训练和推理过程中的计算开销。实验表明,我们的方法实现了比现有基线高10倍以上的匹配率,同时需要更少的采样步骤进行推理。这些结果建立生成建模作为一个实用的和可扩展的框架,有机晶体结构预测。
摘要:Crystal structure prediction is a long-standing challenge in materials science, with most data-driven methods developed for inorganic systems. This leaves an important gap for organic crystals, which are central to pharmaceuticals, polymers, and functional materials, but present unique challenges, such as larger unit cells and strict chemical connectivity. We introduce a flow-matching model for predicting organic crystal structures directly from molecular graphs. The architecture integrates molecular connectivity with periodic boundary conditions while preserving the symmetries of crystalline systems. A bond-aware loss guides the model toward realistic local chemistry by enforcing distributions of bond lengths and connectivity. To support reliable and efficient training, we built a curated dataset of organic crystals, along with a preprocessing pipeline that precomputes bonds and edges, substantially reducing computational overhead during both training and inference. Experiments show that our method achieves a Match Rate more than 10 times higher than existing baselines while requiring fewer sampling steps for inference. These results establish generative modeling as a practical and scalable framework for organic crystal structure prediction.
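An illustrative aside (not from the paper): OrgFlow is a flow-matching model, and the core of that training objective is simple to state. Along a linear interpolation path $x_t = (1-t)x_0 + t x_1$ the target velocity is the constant $x_1 - x_0$, and a model $v(x_t, t)$ is regressed onto it. A 1-D sketch of this conditional flow-matching loss follows; the actual model operates on molecular graphs with periodic boundary conditions, and everything here (names, the Monte-Carlo setup) is hypothetical:

```python
import random

def cfm_loss(velocity, x0_samples, x1_samples, rng):
    """Monte-Carlo conditional flow-matching loss for paired 1-D samples.

    For each pair, draw t ~ U[0,1], form x_t on the straight-line path,
    and penalize the squared error to the constant target velocity x1 - x0.
    """
    total = 0.0
    for x0, x1 in zip(x0_samples, x1_samples):
        t = rng.random()
        xt = (1 - t) * x0 + t * x1
        total += (velocity(xt, t) - (x1 - x0)) ** 2
    return total / len(x0_samples)
```

A velocity field that exactly matches the conditional target drives the loss to zero, which is the sense in which the learned flow transports noise to data.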
Transformer(4篇)
【1】Scaling Vision Transformers: Evaluating DeepSpeed for Image-Centric Workloads
标题:扩展视觉Transformer:评估DeepSpeed在以图像为中心的工作负载上的表现
链接:https://arxiv.org/abs/2602.21081
作者:Huy Trinh,Rebecca Ma,Zeqi Yu,Tahsin Reza
摘要:Vision Transformers(ViTs)通过利用自我注意机制来捕获数据中的全局关系,在图像处理任务中表现出显着的潜力。然而,它们的可扩展性受到显着的计算和内存需求的阻碍,特别是对于具有许多参数的大规模模型。这项研究的目的是利用DeepSpeed,一个高效的分布式训练框架,通常用于语言模型,以提高ViTs的可扩展性和性能。我们在CIFAR-10和CIFAR-100等各种数据集上评估多个GPU配置的节点内和节点间训练效率,探索分布式数据并行性对训练速度、通信开销和整体可扩展性(强和弱扩展性)的影响。通过系统地改变软件参数,如批量大小和梯度累积,我们确定了影响分布式训练性能的关键因素。本研究中的实验为将DeepSpeed应用于图像相关任务提供了基础。未来的工作将扩展这些调查,以加深我们对DeepSpeed局限性的理解,并探索优化Vision Transformers分布式训练管道的策略。
摘要:Vision Transformers (ViTs) have demonstrated remarkable potential in image processing tasks by utilizing self-attention mechanisms to capture global relationships within data. However, their scalability is hindered by significant computational and memory demands, especially for large-scale models with many parameters. This study aims to leverage DeepSpeed, a highly efficient distributed training framework that is commonly used for language models, to enhance the scalability and performance of ViTs. We evaluate intra- and inter-node training efficiency across multiple GPU configurations on various datasets like CIFAR-10 and CIFAR-100, exploring the impact of distributed data parallelism on training speed, communication overhead, and overall scalability (strong and weak scaling). By systematically varying software parameters, such as batch size and gradient accumulation, we identify key factors influencing performance of distributed training. The experiments in this study provide a foundational basis for applying DeepSpeed to image-related tasks. Future work will extend these investigations to deepen our understanding of DeepSpeed's limitations and explore strategies for optimizing distributed training pipelines for Vision Transformers.
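An illustrative aside (not from the paper): the two knobs the abstract varies, batch size and gradient accumulation, combine with the GPU count into a single effective batch size, and the scaling results it reports are conventionally summarized by a strong-scaling efficiency. Minimal helper sketches follow; the names are hypothetical, though they mirror DeepSpeed config concepts such as per-GPU micro-batch and accumulation steps:

```python
def effective_batch_size(micro_batch, grad_accum_steps, world_size):
    """Global batch consumed per optimizer step in distributed data parallelism."""
    return micro_batch * grad_accum_steps * world_size

def strong_scaling_efficiency(t1, tn, n_gpus):
    """Speedup on n GPUs relative to ideal linear scaling (1.0 = perfect)."""
    return t1 / (n_gpus * tn)
```

For example, 8 samples per GPU with 4 accumulation steps on 16 GPUs yields a global batch of 512, and a run that takes 120 s on one GPU but 40 s on four achieves 75% strong-scaling efficiency.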
【2】TrajGPT-R: Generating Urban Mobility Trajectory with Reinforcement Learning-Enhanced Generative Pre-trained Transformer
标题:TrajGPT-R:利用强化学习增强型生成预训练Transformer生成城市移动轨迹
链接:https://arxiv.org/abs/2602.20643
作者:Jiawei Wang,Chuang Yang,Jiawei Yong,Xiaohang Xu,Hongjun Wang,Noboru Koshizuka,Shintaro Fukushima,Ryosuke Shibasaki,Renhe Jiang
备注:TrajGPT-R is a Reinforcement Learning-Enhanced Generative Pre-trained Transformer for Mobility Trajectory Generation
摘要:移动轨迹对于了解城市动态和加强城市规划至关重要,但对这些数据的访问经常受到隐私问题的阻碍。这项研究介绍了一个用于生成大规模城市移动轨迹的变革性框架,采用了一种基于transformer的模型的新应用,该模型通过两个阶段的过程进行了预训练和微调。最初,轨迹生成被概念化为离线强化学习(RL)问题,在标记化期间实现了词汇空间的显着减少。逆向强化学习(IRL)的集成允许捕获随机奖励信号,利用历史数据来推断个人移动偏好。随后,使用构建的奖励模型对预训练模型进行微调,有效地解决了传统基于RL的自回归方法中固有的挑战,例如长期信用分配和稀疏奖励环境的处理。对多个数据集的综合评估表明,我们的框架在可靠性和多样性方面明显优于现有模型。我们的研究结果不仅推进了城市交通建模领域,还为模拟城市数据提供了一种强大的方法,对交通管理和城市发展规划具有重要意义。该实现可在https://github.com/Wangjw6/TrajGPT_R上公开获取。
摘要:Mobility trajectories are essential for understanding urban dynamics and enhancing urban planning, yet access to such data is frequently hindered by privacy concerns. This research introduces a transformative framework for generating large-scale urban mobility trajectories, employing a novel application of a transformer-based model pre-trained and fine-tuned through a two-phase process. Initially, trajectory generation is conceptualized as an offline reinforcement learning (RL) problem, with a significant reduction in vocabulary space achieved during tokenization. The integration of Inverse Reinforcement Learning (IRL) allows for the capture of trajectory-wise reward signals, leveraging historical data to infer individual mobility preferences. Subsequently, the pre-trained model is fine-tuned using the constructed reward model, effectively addressing the challenges inherent in traditional RL-based autoregressive methods, such as long-term credit assignment and handling of sparse reward environments. Comprehensive evaluations on multiple datasets illustrate that our framework markedly surpasses existing models in terms of reliability and diversity. Our findings not only advance the field of urban mobility modeling but also provide a robust methodology for simulating urban data, with significant implications for traffic management and urban development planning. The implementation is publicly available at https://github.com/Wangjw6/TrajGPT_R.
【3】Standard Transformers Achieve the Minimax Rate in Nonparametric Regression with $C^{s,λ}$ Targets
标题:标准Transformer在以$C^{s,λ}$函数为目标的非参数回归中达到极小极大速率
链接:https://arxiv.org/abs/2602.20555
作者:Yanming Lai,Defeng Sun
备注:58 pages, 1 figure
摘要:Transformer模型在大型语言模型和计算机视觉等领域的巨大成功需要严格的理论研究。据我们所知,本文是第一个证明标准Transformers可以在$L^t$距离($t \in [1,\infty]$)下以任意精度逼近Hölder函数$ C^{s,λ}\left([0,1]^{d\times n}\right)$$(s\in\mathbb{N}_{\geq0},0
摘要:The tremendous success of Transformer models in fields such as large language models and computer vision necessitates a rigorous theoretical investigation. To the best of our knowledge, this paper is the first work proving that standard Transformers can approximate Hölder functions $ C^{s,λ}\left([0,1]^{d\times n}\right) $$ (s\in\mathbb{N}_{\geq0},0
【4】PhyGHT: Physics-Guided HyperGraph Transformer for Signal Purification at the HL-LHC
标题:PhyGHT:用于HL-LHC信号净化的物理引导超图Transformer
链接:https://arxiv.org/abs/2602.20475
作者:Mohammed Rakib,Luke Vaughan,Shivang Patel,Flera Rizatdinova,Alexander Khanov,Atriya Sen
备注:Under Review
摘要:欧洲核子研究中心的高亮度大型强子对撞机(HL-LHC)将产生前所未有的数据集,能够揭示宇宙的基本性质。然而,实现其发现潜力面临着一个重大挑战:从大约200个同时发生的堆积碰撞占主导地位的压倒性背景中提取小信号部分。这种极端的噪声严重扭曲了精确重建所需的物理观测值。为了解决这个问题,我们引入了物理引导超图Transformer(PhyGHT),一个混合架构,结合了距离感知的本地图形注意力与全球自我注意力,以反映质子-质子碰撞中形成的粒子簇的物理拓扑结构。至关重要的是,我们集成了Pileup Suppression Gate(PSG),这是一种可解释的物理约束机制,可以在超图聚合之前明确学习过滤软噪声。为了验证我们的方法,我们发布了一个新的模拟数据集的顶夸克对生产模型极端堆积条件。PhyGHT在预测信号的能量和质量校正因子方面优于ATLAS和CMS实验的最新基线。通过准确重建顶夸克的不变质量,我们展示了机器学习创新和跨学科合作如何直接推动实验物理学前沿的科学发现,并增强HL-LHC的发现潜力。数据集和代码可在https://github.com/rAIson-Lab/PhyGHT上获得
摘要:The High-Luminosity Large Hadron Collider (HL-LHC) at CERN will produce unprecedented datasets capable of revealing fundamental properties of the universe. However, realizing its discovery potential faces a significant challenge: extracting small signal fractions from overwhelming backgrounds dominated by approximately 200 simultaneous pileup collisions. This extreme noise severely distorts the physical observables required for accurate reconstruction. To address this, we introduce the Physics-Guided Hypergraph Transformer (PhyGHT), a hybrid architecture that combines distance-aware local graph attention with global self-attention to mirror the physical topology of particle showers formed in proton-proton collisions. Crucially, we integrate a Pileup Suppression Gate (PSG), an interpretable, physics-constrained mechanism that explicitly learns to filter soft noise prior to hypergraph aggregation. To validate our approach, we release a novel simulated dataset of top-quark pair production to model extreme pileup conditions. PhyGHT outperforms state-of-the-art baselines from the ATLAS and CMS experiments in predicting the signal's energy and mass correction factors. By accurately reconstructing the top quark's invariant mass, we demonstrate how machine learning innovation and interdisciplinary collaboration can directly advance scientific discovery at the frontiers of experimental physics and enhance the HL-LHC's discovery potential. The dataset and code are available at https://github.com/rAIson-Lab/PhyGHT
GAN|对抗|攻击|生成相关(5篇)
【1】Deep unfolding of MCMC kernels: scalable, modular & explainable GANs for high-dimensional posterior sampling
标题:MCMC核的深度展开:用于高维后验采样的可扩展、模块化且可解释的GAN
链接:https://arxiv.org/abs/2602.20758
作者:Jonathan Spence,Tobías I. Liaudat,Konstantinos Zygalakis,Marcelo Pereyra
备注:37 pages, 10 figures, 5 tables
摘要:马尔可夫链蒙特卡罗(MCMC)方法是贝叶斯计算的基础,但可能是计算密集型的,特别是在高维环境中。前推生成模型,如生成对抗网络(GANs),变分自动编码器和归一化流,为后验采样提供了一种计算效率高的替代方案。然而,前推模型是不透明的,因为它们缺乏贝叶斯定理的模块性,导致对似然函数变化的概括性较差。在这项工作中,我们介绍了一种新的方法,GAN架构设计,应用深度展开Langevin MCMC算法。这种范式将固定步长的迭代算法映射到模块化神经网络上,产生了灵活且易于解释的架构。至关重要的是,我们的设计允许在推理时指定关键模型参数,从而对似然参数的变化提供鲁棒性。我们使用监督正则化Wasserstein GAN框架进行后验采样,对这些展开的采样器进行端到端的训练。通过大量的贝叶斯成像实验,我们证明了我们提出的方法实现了高采样精度和出色的计算效率,同时保留了经典MCMC策略的物理一致性,适应性和可解释性。
摘要:Markov chain Monte Carlo (MCMC) methods are fundamental to Bayesian computation, but can be computationally intensive, especially in high-dimensional settings. Push-forward generative models, such as generative adversarial networks (GANs), variational auto-encoders and normalising flows offer a computationally efficient alternative for posterior sampling. However, push-forward models are opaque as they lack the modularity of Bayes Theorem, leading to poor generalisation with respect to changes in the likelihood function. In this work, we introduce a novel approach to GAN architecture design by applying deep unfolding to Langevin MCMC algorithms. This paradigm maps fixed-step iterative algorithms onto modular neural networks, yielding architectures that are both flexible and amenable to interpretation. Crucially, our design allows key model parameters to be specified at inference time, offering robustness to changes in the likelihood parameters. We train these unfolded samplers end-to-end using a supervised regularized Wasserstein GAN framework for posterior sampling. Through extensive Bayesian imaging experiments, we demonstrate that our proposed approach achieves high sampling accuracy and excellent computational efficiency, while retaining the physics consistency, adaptability and interpretability of classical MCMC strategies.
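An illustrative aside (not from the paper): deep unfolding maps a fixed number of iterations of an algorithm onto network layers, and the algorithm unfolded here is Langevin MCMC. The unadjusted Langevin step is $x_{k+1} = x_k - γ∇U(x_k) + \sqrt{2γ}\,ξ_k$. A minimal sketch with a fixed "depth" playing the role of layer count, targeting a standard Gaussian ($U(x) = x^2/2$); all names and constants are hypothetical, and the real architecture learns its parameters end-to-end:

```python
import random

def unfolded_langevin(grad_u, step, depth, x0, rng):
    """A fixed number of unadjusted Langevin steps, as unfolded into layers."""
    x = x0
    for _ in range(depth):
        x = x - step * grad_u(x) + (2 * step) ** 0.5 * rng.gauss(0.0, 1.0)
    return x

def sample_stats(n_chains=2000, step=0.1, depth=300, seed=0):
    """Run many independent chains targeting N(0, 1), i.e. grad_u(x) = x."""
    rng = random.Random(seed)
    xs = [unfolded_langevin(lambda x: x, step, depth, 0.0, rng) for _ in range(n_chains)]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var
```

The finite step size leaves a small bias in the stationary variance, which is one reason a learned, unfolded variant with tunable per-layer parameters can outperform the fixed-step iteration it mimics.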
【2】SibylSense: Adaptive Rubric Learning via Memory Tuning and Adversarial Probing
标题:SibylSense:通过记忆调整和对抗性探测进行自适应评分准则学习
链接:https://arxiv.org/abs/2602.20751
作者:Yifei Xu,Guilherme Potje,Shivam Shandilya,Tiancheng Yuan,Leonardo de Oliveira Nunes,Rakshanda Agarwal,Saeid Asgari,Adam Atkinson,Emre Kıcıman,Songwu Lu,Ranveer Chandra,Tusher Chakraborty
摘要:为开放式生成设计对齐且鲁棒的奖励仍然是RL后训练的关键障碍。评分准则(rubric)提供了结构化、可解释的监督,但规模化构建评分准则很困难:专家编写的准则成本高昂,提示生成的准则往往肤浅或不一致,而固定池的判别性准则可能饱和与漂移,从而导致奖励黑客行为。我们提出SibylSense,一种推理时学习方法,它通过一个由经过验证的准则条目组成的可调记忆库来适配冻结的准则生成器。记忆通过基于验证器的条目奖励进行更新,该奖励以少量示例上参考答案与候选答案之间的判别差距来衡量。SibylSense将记忆调整与准则对抗的策略更新交替进行:后者生成满足准则的候选答案,缩小判别差距,并驱动准则生成器捕获新的质量维度。在两个开放式任务上的实验表明,SibylSense产生更具判别性的准则,并使下游RL性能超越静态与非自适应基线。
摘要:Designing aligned and robust rewards for open-ended generation remains a key barrier to RL post-training. Rubrics provide structured, interpretable supervision, but scaling rubric construction is difficult: expert rubrics are costly, prompted rubrics are often superficial or inconsistent, and fixed-pool discriminative rubrics can saturate and drift, enabling reward hacking. We present SibylSense, an inference-time learning approach that adapts a frozen rubric generator through a tunable memory bank of validated rubric items. Memory is updated via verifier-based item rewards measured by reference-candidate answer discriminative gaps from a handful of examples. SibylSense alternates memory tuning with a rubric-adversarial policy update that produces rubric-satisfying candidate answers, shrinking discriminative gaps and driving the rubric generator to capture new quality dimensions. Experiments on two open-ended tasks show that SibylSense yields more discriminative rubrics and improves downstream RL performance over static and non-adaptive baselines.
【3】Is the Trigger Essential? A Feature-Based Triggerless Backdoor Attack in Vertical Federated Learning
标题:触发器是否必不可少?垂直联邦学习中基于特征的无触发后门攻击
链接:https://arxiv.org/abs/2602.20593
作者:Yige Liu,Yiwei Lou,Che Wang,Yongzhi Cao,Hanpin Wang
摘要:垂直联邦学习(Vertical Federated Learning,VFL)作为一种分布式协作机器学习范式,允许多个具有不同特征的被动方和一个具有标签的主动方协作训练模型。尽管VFL以其隐私保护能力而闻名,但它仍然面临着来自后门攻击的重大隐私和安全威胁。现有的后门攻击通常涉及攻击者在训练阶段将触发器植入模型中,并通过在推理阶段将触发器添加到样本中来执行攻击。然而,在本文中,我们发现触发器对VFL后门攻击而言并非必不可少。有鉴于此,我们通过引入一种基于特征的无触发后门攻击,揭示了VFL中一条新的后门攻击途径。这种攻击是在更严格的安全假设下进行的,攻击者在训练阶段是诚实但好奇的,而不是恶意的。它包括三个模块:针对目标后门攻击的标签推断、具有放大和扰动机制的毒药生成以及后门执行以实施攻击。在五个基准数据集上进行的大量实验表明,我们的攻击比三种基线后门攻击的性能高出2到50倍,同时对主要任务的影响最小。即使在有32个被动方和只有一组辅助数据的VFL场景中,我们的攻击也保持了很高的性能。此外,当面对不同的防御策略时,我们的攻击在很大程度上不受影响,并表现出很强的鲁棒性。我们希望这种无触发后门攻击途径的披露将鼓励社区重新审视VFL场景中的安全威胁,并激励研究人员开发更强大和实用的防御策略。
摘要:As a distributed collaborative machine learning paradigm, vertical federated learning (VFL) allows multiple passive parties with distinct features and one active party with labels to collaboratively train a model. Although it is known for the privacy-preserving capabilities, VFL still faces significant privacy and security threats from backdoor attacks. Existing backdoor attacks typically involve an attacker implanting a trigger into the model during the training phase and executing the attack by adding the trigger to the samples during the inference phase. However, in this paper, we find that triggers are not essential for backdoor attacks in VFL. In light of this, we disclose a new backdoor attack pathway in VFL by introducing a feature-based triggerless backdoor attack. This attack operates under a more stringent security assumption, where the attacker is honest-but-curious rather than malicious during the training phase. It comprises three modules: label inference for the targeted backdoor attack, poison generation with amplification and perturbation mechanisms, and backdoor execution to implement the attack. Extensive experiments on five benchmark datasets demonstrate that our attack outperforms three baseline backdoor attacks by 2 to 50 times while minimally impacting the main task. Even in VFL scenarios with 32 passive parties and only one set of auxiliary data, our attack maintains high performance. Moreover, when confronted with distinct defense strategies, our attack remains largely unaffected and exhibits strong robustness. We hope that the disclosure of this triggerless backdoor attack pathway will encourage the community to revisit security threats in VFL scenarios and inspire researchers to develop more robust and practical defense strategies.
【4】CREDIT: Certified Ownership Verification of Deep Neural Networks Against Model Extraction Attacks
标题:CREDIT:针对模型提取攻击的深度神经网络认证所有权验证
链接:https://arxiv.org/abs/2602.20419
作者:Bolin Shen,Zhan Cheng,Neil Zhenqiang Gong,Fan Yao,Yushun Dong
摘要:机器学习即服务(MLaaS)已经成为一种广泛采用的范式,用于提供对深度神经网络(DNN)模型的访问,使用户能够通过标准化的API方便地利用这些模型。然而,这样的服务极易受到模型提取攻击(MEA)的攻击,其中对手反复查询目标模型以收集输入输出对,并使用它们来训练紧密复制其功能的代理模型。虽然已经提出了许多防御策略,但在严格的理论保证下验证可疑模型的所有权仍然是一项具有挑战性的任务。为了解决这一差距,我们引入了CREDIT,这是一种针对多边环境协定的认证所有权验证。具体来说,我们采用互信息来量化DNN模型之间的相似性,提出了一个实用的验证阈值,并提供了严格的理论保证,所有权验证基于这个阈值。我们在不同领域和任务的几个主流数据集上广泛评估了我们的方法,实现了最先进的性能。我们的实现可在以下网址公开获取:https://github.com/LabRAI/CREDIT。
摘要:Machine Learning as a Service (MLaaS) has emerged as a widely adopted paradigm for providing access to deep neural network (DNN) models, enabling users to conveniently leverage these models through standardized APIs. However, such services are highly vulnerable to Model Extraction Attacks (MEAs), where an adversary repeatedly queries a target model to collect input-output pairs and uses them to train a surrogate model that closely replicates its functionality. While numerous defense strategies have been proposed, verifying the ownership of a suspicious model with strict theoretical guarantees remains a challenging task. To address this gap, we introduce CREDIT, a certified ownership verification against MEAs. Specifically, we employ mutual information to quantify the similarity between DNN models, propose a practical verification threshold, and provide rigorous theoretical guarantees for ownership verification based on this threshold. We extensively evaluate our approach on several mainstream datasets across different domains and tasks, achieving state-of-the-art performance. Our implementation is publicly available at: https://github.com/LabRAI/CREDIT.
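An illustrative aside (not from the paper): CREDIT quantifies model similarity via mutual information. For discrete predictions, the standard plug-in estimator computes MI from the joint histogram of the two models' outputs on a shared query set. A minimal sketch (hypothetical names; CREDIT's actual estimator and threshold are specified in the paper):

```python
from collections import Counter
from math import log2

def mutual_information(labels_a, labels_b):
    """Plug-in mutual information (in bits) between two label sequences."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n  # joint probability of the pair (a, b)
        mi += p_ab * log2(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi
```

Two models that answer identically attain MI equal to the label entropy, while statistically independent predictions give MI near zero; a surrogate extracted from the target would sit close to the former.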
【5】Multimodal Crystal Flow: Any-to-Any Modality Generation for Unified Crystal Modeling
标题:多峰Crystal Flow:统一晶体建模的任意对任意模式生成
链接:https://arxiv.org/abs/2602.20210
作者:Kiyoung Seong,Sungsoo Ahn,Sehui Han,Changyoung Park
摘要:晶体建模涵盖跨不同模态的一系列条件与无条件生成任务,包括晶体结构预测(CSP)和晶体从头生成(DNG)。虽然最近的深度生成模型表现出了很好的性能,但它们在很大程度上仍然是特定于任务的,缺乏一个能在不同生成任务之间共享晶体表示的统一框架。为了解决这一限制,我们提出了多模态晶体流(MCFlow),一个统一的多模态流模型,它通过原子类型和晶体结构各自独立的时间变量,将多个晶体生成任务实现为不同的推理轨迹。为了在标准Transformer模型中实现多模态流,我们引入了具有层次置换增强的、组合与对称性感知的原子排序,在无需显式结构模板的情况下注入强大的组合和晶体学先验。在MP-20和MPTS-52基准测试上的实验表明,MCFlow在多个晶体生成任务中取得了与特定于任务的基线相比具有竞争力的性能。
摘要:Crystal modeling spans a family of conditional and unconditional generation tasks across different modalities, including crystal structure prediction (CSP) and \emph{de novo} generation (DNG). While recent deep generative models have shown promising performance, they remain largely task-specific, lacking a unified framework that shares crystal representations across different generation tasks. To address this limitation, we propose \emph{Multimodal Crystal Flow (MCFlow)}, a unified multimodal flow model that realizes multiple crystal generation tasks as distinct inference trajectories via independent time variables for atom types and crystal structures. To enable multimodal flow in a standard transformer model, we introduce a composition- and symmetry-aware atom ordering with hierarchical permutation augmentation, injecting strong compositional and crystallographic priors without explicit structural templates. Experiments on the MP-20 and MPTS-52 benchmarks show that MCFlow achieves competitive performance against task-specific baselines across multiple crystal generation tasks.
半/弱/无/有监督|不确定性|主动学习(7篇)
【1】ProxyFL: A Proxy-Guided Framework for Federated Semi-Supervised Learning
标题:ProxyFL:一个用于联邦半监督学习的代理引导框架
链接:https://arxiv.org/abs/2602.21078
作者:Duowen Chen,Yan Wang
备注:CVPR 2026. code: https://github.com/DuowenC/FSSLlib
摘要:联邦半监督学习(FSSL)旨在通过以隐私保护的方式利用部分标注的本地数据来协作训练跨客户端的全局模型。在FSSL中,数据异构性是一个具有挑战性的问题,它既存在于客户端之间,也存在于客户端内部。外部异质性是指不同客户端之间的数据分布差异,而内部异质性则表示客户端内标记数据和未标记数据之间的不匹配。大多数FSSL方法通常设计固定或动态的参数聚合策略,以在服务器端(外部)收集客户端知识,和/或过滤掉低置信度的未标记样本,以减少本地客户端(内部)的错误。但是,前者很难通过直接的权重聚合精确拟合理想的全局分布,而后者导致参与FL训练的数据更少。为此,我们提出了一个名为ProxyFL的代理引导框架,致力于通过一个统一的代理同时缓解外部和内部异质性。也就是说,我们将分类器的可学习权重视为代理,用以模拟局部和全局的类别分布。对于外部异质性,我们针对离群值显式优化全局代理而非直接优化权重;对于内部异质性,我们通过正负代理池将被丢弃的样本重新纳入训练,以减轻潜在不正确伪标签的影响。深入的实验与理论分析表明,我们的方法在FSSL中具有显著的性能与收敛优势。
摘要:Federated Semi-Supervised Learning (FSSL) aims to collaboratively train a global model across clients by leveraging partially-annotated local data in a privacy-preserving manner. In FSSL, data heterogeneity is a challenging issue, which exists both across clients and within clients. External heterogeneity refers to the data distribution discrepancy across different clients, while internal heterogeneity represents the mismatch between labeled and unlabeled data within clients. Most FSSL methods typically design fixed or dynamic parameter aggregation strategies to collect client knowledge on the server (external) and / or filter out low-confidence unlabeled samples to reduce mistakes in local client (internal). But, the former is hard to precisely fit the ideal global distribution via direct weights, and the latter results in fewer data participation into FL training. To this end, we propose a proxy-guided framework called ProxyFL that focuses on simultaneously mitigating external and internal heterogeneity via a unified proxy. I.e., we consider the learnable weights of classifier as proxy to simulate the category distribution both locally and globally. For external, we explicitly optimize global proxy against outliers instead of direct weights; for internal, we re-include the discarded samples into training by a positive-negative proxy pool to mitigate the impact of potentially-incorrect pseudo-labels. Insight experiments & theoretical analysis show our significant performance and convergence in FSSL.
【2】Fuz-RL: A Fuzzy-Guided Robust Framework for Safe Reinforcement Learning under Uncertainty
标题:Fuz-RL:不确定性下安全强化学习的模糊引导鲁棒框架
链接:https://arxiv.org/abs/2602.20729
作者:Xu Wan,Chao Yang,Cheng Yang,Jie Song,Mingyang Sun
摘要:安全强化学习(RL)对于在确保现实世界应用安全性的同时实现高性能至关重要。然而,现实环境中多个不确定性源的复杂相互作用,给可解释的风险评估和稳健的决策带来了重大挑战。为了解决这些挑战,我们提出了Fuz-RL,一个模糊测度引导的安全RL鲁棒框架。具体来说,我们的框架开发了一种新的模糊Bellman算子,使用Choquet积分来估计鲁棒价值函数。理论上,我们证明了求解Fuz-RL问题(约束马尔可夫决策过程(CMDP)形式)等价于求解分布鲁棒安全RL问题(鲁棒CMDP形式),有效地避免了最小-最大优化。对safe-control-gym和safety-gymnasium场景的实证分析表明,Fuz-RL以无模型的方式与现有安全RL基线有效集成,在观测、动作和动态等各类不确定性下显著提高了安全与控制性能。
摘要:Safe Reinforcement Learning (RL) is crucial for achieving high performance while ensuring safety in real-world applications. However, the complex interplay of multiple uncertainty sources in real environments poses significant challenges for interpretable risk assessment and robust decision-making. To address these challenges, we propose Fuz-RL, a fuzzy measure-guided robust framework for safe RL. Specifically, our framework develops a novel fuzzy Bellman operator for estimating robust value functions using Choquet integrals. Theoretically, we prove that solving the Fuz-RL problem (in Constrained Markov Decision Process (CMDP) form) is equivalent to solving distributionally robust safe RL problems (in robust CMDP form), effectively avoiding min-max optimization. Empirical analyses on safe-control-gym and safety-gymnasium scenarios demonstrate that Fuz-RL effectively integrates with existing safe RL baselines in a model-free manner, significantly improving both safety and control performance under various types of uncertainties in observation, action, and dynamics.
【3】Three Concrete Challenges and Two Hopes for the Safety of Unsupervised Elicitation
标题:无监督启发安全性的三个具体挑战与两个希望
链接:https://arxiv.org/abs/2602.20400
作者:Callum Canavan,Aditya Shrivastava,Allison Qi,Jonathan Michala,Fabien Roger
备注:19 pages, 9 figures
摘要:为了将语言模型引导到超出人类能力的任务上的真实输出,以前的工作建议在简单任务上训练模型,以引导它们完成更难的任务(由易到难泛化),或者使用无监督训练算法在完全没有外部标签的情况下引导模型(无监督启发)。虽然这两种范式的技术已被证明可以提高各种任务上的模型准确性,但我们认为,用于这些评估的数据集可能导致过于乐观的评估结果。与许多现实世界的数据集不同,它们通常(1)没有比真实性更显著的特征,(2)具有平衡的训练集,(3)仅包含模型可以给出明确答案的数据点。我们构建了分别缺乏这些属性的数据集,以压力测试一系列标准的无监督启发和由易到难泛化技术。我们发现,没有一种技术能在这些挑战中可靠地表现良好。我们还研究了集成方法以及由易到难与无监督技术的组合,发现它们只能部分缓解这些挑战导致的性能下降。我们认为,克服这些挑战应该是未来无监督启发工作的优先事项。
摘要:To steer language models towards truthful outputs on tasks which are beyond human capability, previous work has suggested training models on easy tasks to steer them on harder ones (easy-to-hard generalization), or using unsupervised training algorithms to steer models with no external labels at all (unsupervised elicitation). Although techniques from both paradigms have been shown to improve model accuracy on a wide variety of tasks, we argue that the datasets used for these evaluations could cause overoptimistic evaluation results. Unlike many real-world datasets, they often (1) have no features with more salience than truthfulness, (2) have balanced training sets, and (3) contain only data points to which the model can give a well-defined answer. We construct datasets that lack each of these properties to stress-test a range of standard unsupervised elicitation and easy-to-hard generalization techniques. We find that no technique reliably performs well on any of these challenges. We also study ensembling and combining easy-to-hard and unsupervised techniques, and find they only partially mitigate performance degradation due to these challenges. We believe that overcoming these challenges should be a priority for future work on unsupervised elicitation.
【4】Hierarchical Molecular Representation Learning via Fragment-Based Self-Supervised Embedding Prediction
标题:基于片段的自监督嵌入预测的分层分子表示学习
链接:https://arxiv.org/abs/2602.20344
作者:Jiele Wu,Haozhe Ma,Zhihan Guo,Thanh Vinh Vo,Tze Yun Leong
备注:15 pages (8 pages main text),8 figures
摘要:图自监督学习(GSSL)已经证明了在不需要人工注释的情况下生成表达性图嵌入的强大潜力,这使得它在具有高标记成本的领域(如分子图分析)中特别有价值。然而,现有的GSSL方法大多集中在节点或边级别的信息,往往忽略了强烈影响分子性质的化学相关子结构。在这项工作中,我们提出了图语义预测网络(GraSPNet),一个层次化的自监督框架,显式地建模原子级和片段级的语义。GraSPNet将分子图分解为化学上有意义的片段,而无需预定义的词汇表,并通过多级消息传递学习节点和片段级表示,在这两个级别上进行掩码语义预测。这种分层语义监督使GraSPNet能够学习多分辨率的结构信息,这些信息既具有表达性又具有可转移性。对多个分子性质预测基准的广泛实验表明,GraSPNet学习化学上有意义的表示,并在迁移学习设置中始终优于最先进的GSSL方法。
摘要:Graph self-supervised learning (GSSL) has demonstrated strong potential for generating expressive graph embeddings without the need for human annotations, making it particularly valuable in domains with high labeling costs such as molecular graph analysis. However, existing GSSL methods mostly focus on node- or edge-level information, often ignoring chemically relevant substructures which strongly influence molecular properties. In this work, we propose Graph Semantic Predictive Network (GraSPNet), a hierarchical self-supervised framework that explicitly models both atomic-level and fragment-level semantics. GraSPNet decomposes molecular graphs into chemically meaningful fragments without predefined vocabularies and learns node- and fragment-level representations through multi-level message passing with masked semantic prediction at both levels. This hierarchical semantic supervision enables GraSPNet to learn multi-resolution structural information that is both expressive and transferable. Extensive experiments on multiple molecular property prediction benchmarks demonstrate that GraSPNet learns chemically meaningful representations and consistently outperforms state-of-the-art GSSL methods in transfer learning settings.
【5】Uncertainty-Aware Delivery Delay Duration Prediction via Multi-Task Deep Learning
标题:基于多任务深度学习的不确定性感知交付延迟持续时间预测
链接:https://arxiv.org/abs/2602.20271
作者:Stefan Faulkner,Reza Zandehshahvar,Vahid Eghbal Akhlaghi,Sebastien Ouellet,Carsten Jordan,Pascal Van Hentenryck
摘要:准确的交货延迟预测对于维持整个现代供应链的运营效率和客户满意度至关重要。然而,物流网络的日益复杂性,跨越多式联运,跨国路由,以及明显的区域变化,使这一预测任务具有内在的挑战性。本文介绍了一种多任务深度学习模型,用于在存在显著不平衡数据的情况下预测交付延迟持续时间,其中延迟发货很少,但在操作上是重要的。该模型将高维装运特征嵌入到表格数据的专用嵌入层中,然后使用分类回归策略来预测准时和延迟装运的交付延迟持续时间。与顺序管道不同,这种方法支持端到端训练,提高了延迟案例的检测能力,并支持概率预测,以实现不确定性感知决策。该方法是在工业合作伙伴的大规模真实数据集上进行评估的,该数据集包括具有不同区域特征的四个主要来源地的1000多万条历史货运记录。该模型与传统的机器学习方法进行了比较。实验结果表明,该方法实现了延迟发货预测的平均绝对误差为0.67-0.91天,优于基于树的一步回归基线的41-64%和基于树的两步分类回归模型的15- 35%。这些收益表明,该模型在高度不平衡和异构条件下的业务交付延迟预测的有效性。
摘要:Accurate delivery delay prediction is critical for maintaining operational efficiency and customer satisfaction across modern supply chains. Yet the increasing complexity of logistics networks, spanning multimodal transportation, cross-country routing, and pronounced regional variability, makes this prediction task inherently challenging. This paper introduces a multi-task deep learning model for delivery delay duration prediction in the presence of significant imbalanced data, where delayed shipments are rare but operationally consequential. The model embeds high-dimensional shipment features with dedicated embedding layers for tabular data, and then uses a classification-then-regression strategy to predict the delivery delay duration for on-time and delayed shipments. Unlike sequential pipelines, this approach enables end-to-end training, improves the detection of delayed cases, and supports probabilistic forecasting for uncertainty-aware decision making. The proposed approach is evaluated on a large-scale real-world dataset from an industrial partner, comprising more than 10 million historical shipment records across four major source locations with distinct regional characteristics. The proposed model is compared with traditional machine learning methods. Experimental results show that the proposed method achieves a mean absolute error of 0.67-0.91 days for delayed-shipment predictions, outperforming single-step tree-based regression baselines by 41-64% and two-step classify-then-regress tree-based models by 15-35%. These gains demonstrate the effectiveness of the proposed model in operational delivery delay forecasting under highly imbalanced and heterogeneous conditions.
【6】Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions
标题:不仅是多少,还有在哪里:将认知不确定性分解为逐类别贡献
链接:https://arxiv.org/abs/2602.21160
作者:Mame Diarra Toure,David A. Stephens
备注:8 pages, 17 figures
摘要:在安全关键分类中,故障的成本通常是不对称的,但贝叶斯深度学习用单个标量互信息(MI)总结认知不确定性,无法区分模型的无知涉及的是良性类还是安全关键类。我们将MI分解为每类向量$C_k(x)=σ_k^{2}/(2μ_k)$,其中$μ_k{=}\mathbb{E}[p_k]$和$σ_k^2{=}\mathrm{Var}[p_k]$在后验样本上计算。该分解源自熵的二阶泰勒展开;$1/μ_k$加权校正了边界抑制,并使$C_k$在稀有类和常见类之间具有可比性。通过构造,$\sum_k C_k \approx \mathrm{MI}$,且伴随的偏度诊断可标记近似退化的输入。在刻画了$C_k$的公理性质之后,我们在三个任务上验证了它:(i)糖尿病视网膜病变的选择性预测,其中临界类$C_k$使选择性风险比MI降低34.7%,比方差基线降低56.2%;(ii)临床和图像基准上的分布外检测,其中$\sum_k C_k$实现了最高的AUROC,且每类视图暴露了MI不可见的不对称偏移;以及(iii)受控标签噪声研究,其中在端到端贝叶斯训练下,$\sum_k C_k$对注入的偶然(aleatoric)噪声的敏感性低于MI,而这两个度量在迁移学习下都会退化。在所有任务中,后验近似的质量对不确定性的影响至少与度量选择同样强烈,这表明不确定性如何通过网络传播与如何测量同样重要。
摘要:In safety-critical classification, the cost of failure is often asymmetric, yet Bayesian deep learning summarises epistemic uncertainty with a single scalar, mutual information (MI), that cannot distinguish whether a model's ignorance involves a benign or safety-critical class. We decompose MI into a per-class vector $C_k(x)=σ_k^{2}/(2μ_k)$, with $μ_k{=}\mathbb{E}[p_k]$ and $σ_k^2{=}\mathrm{Var}[p_k]$ across posterior samples. The decomposition follows from a second-order Taylor expansion of the entropy; the $1/μ_k$ weighting corrects boundary suppression and makes $C_k$ comparable across rare and common classes. By construction $\sum_k C_k \approx \mathrm{MI}$, and a companion skewness diagnostic flags inputs where the approximation degrades. After characterising the axiomatic properties of $C_k$, we validate it on three tasks: (i) selective prediction for diabetic retinopathy, where critical-class $C_k$ reduces selective risk by 34.7\% over MI and 56.2\% over variance baselines; (ii) out-of-distribution detection on clinical and image benchmarks, where $\sum_k C_k$ achieves the highest AUROC and the per-class view exposes asymmetric shifts invisible to MI; and (iii) a controlled label-noise study in which $\sum_k C_k$ shows less sensitivity to injected aleatoric noise than MI under end-to-end Bayesian training, while both metrics degrade under transfer learning. Across all tasks, the quality of the posterior approximation shapes uncertainty at least as strongly as the choice of metric, suggesting that how uncertainty is propagated through the network matters as much as how it is measured.
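摘要中的逐类分解 $C_k(x)=σ_k^{2}/(2μ_k)$ 可以直接从后验样本计算出来。下面是一个最小的Python示意(假设性示例,并非作者实现),用以验证 $\sum_k C_k$ 是互信息 $\mathrm{MI}=H(\mathbb{E}[p])-\mathbb{E}[H(p)]$ 的二阶泰勒近似:

```python
import math

def per_class_epistemic(samples):
    """按类别分解认知不确定性:C[k] = sigma_k^2 / (2 * mu_k)。
    samples: 若干后验预测向量 p^(s),每个向量和为 1。
    同时返回精确的 BALD 互信息 MI = H(E[p]) - E[H(p)] 以便对照。"""
    S, K = len(samples), len(samples[0])
    mu = [sum(p[k] for p in samples) / S for k in range(K)]
    var = [sum((p[k] - mu[k]) ** 2 for p in samples) / S for k in range(K)]
    C = [var[k] / (2 * mu[k]) for k in range(K)]

    def entropy(p):
        return -sum(q * math.log(q) for q in p if q > 0)

    mi = entropy(mu) - sum(entropy(p) for p in samples) / S
    return C, mi

# 两个后验样本,仅在第 0、1 类上存在分歧:
samples = [[0.70, 0.20, 0.10], [0.50, 0.40, 0.10]]
C, mi = per_class_epistemic(samples)
# sum(C) 给出 MI 的二阶泰勒近似;第 2 类上两个样本完全一致,故 C[2] = 0
```

当各后验样本在某一类上完全一致时,该类的贡献恰好为零;$\sum_k C_k$ 与精确MI之差来自熵展开的高阶项。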
【7】Multimodal MRI Report Findings Supervised Brain Lesion Segmentation with Substructures
标题:多模态MRI报告发现监督的带子结构脑部病变分割
链接:https://arxiv.org/abs/2602.20994
作者:Yubin Ge,Yongsong Huang,Xiaofeng Liu
备注:IEEE International Symposium on Biomedical Imaging (ISBI) 2026
摘要:报告监督(RSuper)学习试图通过从放射学报告导出的约束(例如,体积、计数、大小、位置)来减轻对密集肿瘤体素标签的需要。然而,脑肿瘤的MRI研究通常涉及多参数扫描和子结构。在这里,细粒度的模态/参数级报告通常与全局发现一起提供,并与不同的子结构相关。此外,报告往往只描述最大的病变,并提供定性或不确定的线索(“轻度”、“可能”)。经典RSuper损失(例如,总体积一致性)可能在这种不完整性下过度约束或幻觉出未报告的发现,并且无法利用这些分层发现或利用合并数据集中各种病变类型的先验。我们显式解析全局定量发现和模态级定性发现,并引入了一个统一的、单侧的、不确定性感知的公式(MS-RSuper),该公式:(i)使用存在与不存在损失将模态特定的定性线索(例如,T1c增强,FLAIR水肿)与其相应的子结构对齐;(ii)对部分定量线索(例如,最大病变尺寸,最小多重性)施加单侧下界;以及(iii)增加轴外与轴内解剖先验以考虑队列差异。确定性令牌会缩放惩罚力度;缺失的线索则被降权。在1238个报告标记的BraTS-MET/MEN扫描中,我们的MS-RSuper在很大程度上优于稀疏监督基线和朴素RSuper方法。
摘要:Report-supervised (RSuper) learning seeks to alleviate the need for dense tumor voxel labels with constraints derived from radiology reports (e.g., volumes, counts, sizes, locations). In MRI studies of brain tumors, however, we often involve multi-parametric scans and substructures. Here, fine-grained modality/parameter-wise reports are usually provided along with global findings and are correlated with different substructures. Moreover, the reports often describe only the largest lesion and provide qualitative or uncertain cues (``mild,'' ``possible''). Classical RSuper losses (e.g., sum volume consistency) can over-constrain or hallucinate unreported findings under such incompleteness, and are unable to utilize these hierarchical findings or exploit the priors of varied lesion types in a merged dataset. We explicitly parse the global quantitative and modality-wise qualitative findings and introduce a unified, one-sided, uncertainty-aware formulation (MS-RSuper) that: (i) aligns modality-specific qualitative cues (e.g., T1c enhancement, FLAIR edema) with their corresponding substructures using existence and absence losses; (ii) enforces one-sided lower-bounds for partial quantitative cues (e.g., largest lesion size, minimal multiplicity); and (iii) adds extra- vs. intra-axial anatomical priors to respect cohort differences. Certainty tokens scale penalties; missing cues are down-weighted. On 1238 report-labeled BraTS-MET/MEN scans, our MS-RSuper largely outperforms both a sparsely-supervised baseline and a naive RSuper method.
迁移|Zero/Few/One-Shot|自适应(4篇)
【1】From Isolation to Integration: Building an Adaptive Expert Forest for Pre-Trained Model-based Class-Incremental Learning
标题:从孤立到整合:为基于预训练模型的类增量学习构建自适应专家森林
链接:https://arxiv.org/abs/2602.20911
作者:Ruiqi Liu,Boyu Diao,Hangda Liu,Zhulin An,Fei Wang,Yongjun Xu
摘要:类增量学习(CIL)要求模型学习新类而不忘记旧类。一种常见的方法是冻结一个预先训练好的模型,然后为每个任务训练一个新的轻量级适配器。虽然这可以防止遗忘,但它将学到的知识视为简单的,非结构化的集合,并且无法使用任务之间的关系。为此,我们提出了语义引导的自适应专家森林(SAEF),一种新的方法,组织适配器到一个结构化的层次结构,以更好的知识共享。SAEF首先根据任务的语义关系将其分为概念簇。然后,在每个集群中,通过合并相似任务的适配器来创建新的适配器,从而构建平衡的专家树。在推理时,SAEF从森林中为任何给定的输入找到并激活一组相关专家。最终的预测是通过结合这些被激活的专家的输出来做出的,并根据每个专家的自信程度进行加权。在多个基准数据集上的实验表明,SAEF达到了SOTA性能。
摘要:Class-Incremental Learning (CIL) requires models to learn new classes without forgetting old ones. A common method is to freeze a pre-trained model and train a new, lightweight adapter for each task. While this prevents forgetting, it treats the learned knowledge as a simple, unstructured collection and fails to use the relationships between tasks. To this end, we propose the Semantic-guided Adaptive Expert Forest (SAEF), a new method that organizes adapters into a structured hierarchy for better knowledge sharing. SAEF first groups tasks into conceptual clusters based on their semantic relationships. Then, within each cluster, it builds a balanced expert tree by creating new adapters from merging the adapters of similar tasks. At inference time, SAEF finds and activates a set of relevant experts from the forest for any given input. The final prediction is made by combining the outputs of these activated experts, weighted by how confident each expert is. Experiments on several benchmark datasets show that SAEF achieves SOTA performance.
【2】Actor-Curator: Co-adaptive Curriculum Learning via Policy-Improvement Bandits for RL Post-Training
标题:演员兼策展人:通过RL后训练的政策改进强盗进行协同适应课程学习
链接:https://arxiv.org/abs/2602.20532
作者:Zhengyao Gu,Jonathan Light,Raul Astudillo,Ziyu Ye,Langzhou He,Henry Peng Zou,Wei Cheng,Santiago Paternain,Philip S. Yu,Yisong Yue
备注:37 pages, 8 figures, 1 table. Preprint under review. Equal contribution by first two authors
摘要:使用强化学习对大型基础模型进行后训练通常依赖于海量且异构的数据集,这使得有效的课程学习既关键又具有挑战性。在这项工作中,我们提出了ACTOR-CURATOR,这是一个可扩展的全自动课程学习框架,用于大型语言模型(LLM)的强化学习后训练。ACTOR-CURATOR学习一个神经策展器,通过直接优化预期的策略性能改进,从大型问题库中动态选择训练问题。我们将问题选择建模为一个非平稳的随机老虎机问题,基于在线随机镜像下降推导出一个有原则的损失函数,并建立了部分反馈下的遗憾保证。从经验上讲,ACTOR-CURATOR在各种具有挑战性的推理基准中始终优于均匀采样和强大的课程基线,证明了训练稳定性和效率的提高。值得注意的是,它相对最强基线在AIME 2024上实现了28.6%的相对增益,在ARC-1D上实现了30.5%的相对增益,并实现了高达80%的加速。这些结果表明,ACTOR-CURATOR是一种强大而实用的可扩展LLM后训练方法。
摘要:Post-training large foundation models with reinforcement learning typically relies on massive and heterogeneous datasets, making effective curriculum learning both critical and challenging. In this work, we propose ACTOR-CURATOR, a scalable and fully automated curriculum learning framework for reinforcement learning post-training of large language models (LLMs). ACTOR-CURATOR learns a neural curator that dynamically selects training problems from large problem banks by directly optimizing for expected policy performance improvement. We formulate problem selection as a non-stationary stochastic bandit problem, derive a principled loss function based on online stochastic mirror descent, and establish regret guarantees under partial feedback. Empirically, ACTOR-CURATOR consistently outperforms uniform sampling and strong curriculum baselines across a wide range of challenging reasoning benchmarks, demonstrating improved training stability and efficiency. Notably, it achieves relative gains of 28.6% on AIME2024 and 30.5% on ARC-1D over the strongest baseline and up to 80% speedup. These results suggest that ACTOR-CURATOR is a powerful and practical approach for scalable LLM post-training.
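摘要将问题选择形式化为非平稳随机老虎机问题,并基于在线随机镜像下降推导损失。作为该算法族中的一个经典实例(并非论文的实际算法),下面用带均匀探索混合的EXP3给出一个最小示意;其中的奖励数值、步长与探索率均为假设性设置:

```python
import math
import random

def exp3_step(weights, rewards, eta, gamma, rng):
    """EXP3 的单轮更新:将权重分布与均匀探索混合后采样一个臂,
    只观测该臂的奖励,并做重要性加权的指数更新(部分反馈)。"""
    K, total = len(weights), sum(weights)
    probs = [(1 - gamma) * w / total + gamma / K for w in weights]
    arm = rng.choices(range(K), weights=probs)[0]
    weights[arm] *= math.exp(eta * rewards[arm] / probs[arm])
    return arm, weights

rng = random.Random(0)
weights = [1.0, 1.0, 1.0]
# 假设的每个"问题库"的策略改进奖励:第 2 个库每轮收益最大
for _ in range(200):
    _, weights = exp3_step(weights, [0.1, 0.2, 0.9], eta=0.05, gamma=0.1, rng=rng)
```

200 轮之后,权重质量集中到高收益臂上,对应于策展器逐渐偏向能带来最大预期策略改进的问题库。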
【3】KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem
标题:KnapSec:通过自适应层选择作为背包问题的自我推测解码
链接:https://arxiv.org/abs/2602.20217
作者:Seongjin Cha,Gyuwan Kim,Dongsu Han,Tao Yang,Insu Han
摘要:自推测解码(SSD)通过跳过层来创建高效的草稿模型,从而加速LLM推理,但现有方法通常依赖于静态启发式,忽略了长上下文场景中注意力的动态计算开销。我们提出了KnapSpec,一个免训练的框架,将草稿模型选择重新表述为背包问题,以最大化单位时间的令牌吞吐量。通过解耦Attention层和MLP层并将其硬件相关的延迟建模为上下文长度的函数,KnapSpec通过并行动态规划算法即时自适应地识别最佳草稿配置。此外,我们提供了第一个严格的理论分析,确立了隐藏状态之间的余弦相似度作为令牌接受率的数学上合理的代理。这一基础使我们的方法能够在应对现实硬件不断变化的瓶颈的同时保持较高的草稿忠实性。我们在Qwen3和Llama3上的实验表明,KnapSpec始终优于最先进的SSD基线,在各种基准测试中实现了高达1.47倍的挂钟加速。我们的即插即用方法无需额外训练,也不损害目标模型的输出分布,即可确保对长序列的高速推理。
摘要:Self-speculative decoding (SSD) accelerates LLM inference by skipping layers to create an efficient draft model, yet existing methods often rely on static heuristics that ignore the dynamic computational overhead of attention in long-context scenarios. We propose KnapSpec, a training-free framework that reformulates draft model selection as a knapsack problem to maximize tokens-per-time throughput. By decoupling Attention and MLP layers and modeling their hardware-specific latencies as functions of context length, KnapSpec adaptively identifies optimal draft configurations on the fly via a parallel dynamic programming algorithm. Furthermore, we provide the first rigorous theoretical analysis establishing cosine similarity between hidden states as a mathematically sound proxy for the token acceptance rate. This foundation allows our method to maintain high drafting faithfulness while navigating the shifting bottlenecks of real-world hardware. Our experiments on Qwen3 and Llama3 demonstrate that KnapSpec consistently outperforms state-of-the-art SSD baselines, achieving up to 1.47x wall-clock speedup across various benchmarks. Our plug-and-play approach ensures high-speed inference for long sequences without requiring additional training or compromising the target model's output distribution.
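摘要把草稿模型选择重新表述为时延预算下的背包问题。下面是一个假设性的0/1背包示意:选择在草稿模型中保留哪些层,使"收益"(例如余弦相似度这类接受率代理)在时延预算内最大化。各层的收益与成本均为虚构数值;实际的KnapSpec还会解耦Attention与MLP,并将时延建模为上下文长度的函数:

```python
def select_skips(benefits, costs, budget):
    """逐层 0/1 背包:在时延预算内选择保留的层集合,最大化总收益。
    benefits/costs: 每层的收益与整数时延成本;budget: 时延预算。
    返回 (最优总收益, 保留层的下标列表)。"""
    n = len(benefits)
    # dp[b] = (成本不超过 b 时的最优收益, 对应的层集合),逐层填表
    dp = [(0.0, [])] * (budget + 1)
    for i in range(n):
        new = dp[:]
        for b in range(costs[i], budget + 1):
            cand = dp[b - costs[i]][0] + benefits[i]
            if cand > new[b][0]:
                new[b] = (cand, dp[b - costs[i]][1] + [i])
        dp = new
    return dp[budget]

# 假设的每层接受率收益与时延成本(任意单位):
benefits = [0.9, 0.4, 0.8, 0.3]
costs    = [3,   2,   3,   1]
best, kept = select_skips(benefits, costs, budget=6)
# 预算 6 下保留第 0、2 层(收益 0.9 + 0.8),其余层在草稿阶段跳过
```

每层用旧表 `dp` 取候选、写入新表 `new`,保证了每层至多选一次的 0/1 语义。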
【4】DANCE: Doubly Adaptive Neighborhood Conformal Estimation
标题:双重自适应邻域共形估计
链接:https://arxiv.org/abs/2602.20652
作者:Brandon R. Feng,Brian J. Reich,Daniel Beaglehole,Xihaier Luo,David Keetae Park,Shinjae Yoo,Zhechao Huang,Xueyu Mao,Olcay Boz,Jungeum Kim
摘要:复杂深度学习模型的最新发展带来了跨多种数据表示类型进行准确预测的前所未有的能力。用于这些模型不确定性量化的保形预测已经越来越受欢迎,它提供自适应且统计上有效的预测集。对于分类任务,保形方法通常集中于利用logit分数。然而,对于预训练模型,当没有针对目标任务进行校准时,这可能导致低效、过于保守的集合大小。我们提出了DANCE,一种双重局部自适应、基于最近邻的保形算法,它结合了两个直接使用数据嵌入表示的新型不一致性分数。DANCE首先从嵌入层拟合一个任务自适应核回归模型,然后使用学习到的核空间来生成用于不确定性量化的最终预测集。我们与最先进的局部、任务自适应和zero-shot保形基线进行了对比测试,证明了DANCE在各种数据集上集合大小效率和鲁棒性的卓越结合。
摘要:The recent developments of complex deep learning models have led to unprecedented ability to accurately predict across multiple data representation types. Conformal prediction for uncertainty quantification of these models has risen in popularity, providing adaptive, statistically-valid prediction sets. For classification tasks, conformal methods have typically focused on utilizing logit scores. For pre-trained models, however, this can result in inefficient, overly conservative set sizes when not calibrated towards the target task. We propose DANCE, a doubly locally adaptive nearest-neighbor based conformal algorithm combining two novel nonconformity scores directly using the data's embedded representation. DANCE first fits a task-adaptive kernel regression model from the embedding layer before using the learned kernel space to produce the final prediction sets for uncertainty quantification. We test against state-of-the-art local, task-adapted and zero-shot conformal baselines, demonstrating DANCE's superior blend of set size efficiency and robustness across various datasets.
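DANCE属于保形预测(conformal prediction)家族。下面用一个与具体方法无关的分裂保形(split conformal)最小示意说明"校准分位数 -> 预测集"的基本流程;其中的校准分数与概率均为假设数值,DANCE实际使用的是基于嵌入的核回归不一致性分数:

```python
import math

def conformal_quantile(scores, alpha):
    """校准集不一致性分数的 (1 - alpha) 分位数,带有限样本修正。"""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 保证覆盖率所需的秩
    return sorted(scores)[min(k, n) - 1]

def prediction_set(probs, qhat):
    """所有不一致性分数 1 - p_k 不超过阈值的类进入预测集。"""
    return [k for k, p in enumerate(probs) if 1 - p <= qhat]

# 假设的校准不一致性分数(1 - 真实类的预测概率):
cal = [0.1, 0.3, 0.2, 0.5, 0.15, 0.25, 0.4, 0.35, 0.05, 0.45]
qhat = conformal_quantile(cal, alpha=0.2)
S = prediction_set([0.6, 0.25, 0.15], qhat)
# alpha=0.2 时,预测集以至少 80% 的概率覆盖真实类
```

预测集的大小随模型置信度自适应变化,这正是摘要所讨论的"集合大小效率"的来源:分数设计得越贴合目标任务,集合越紧凑。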
强化学习(3篇)
【1】Squint: Fast Visual Reinforcement Learning for Sim-to-Real Robotics
标题:Squint:模拟到真实机器人的快速视觉强化学习
链接:https://arxiv.org/abs/2602.21203
作者:Abdulaziz Almuzairee,Henrik I. Christensen
备注:For website and code, see https://aalmuzairee.github.io/squint
摘要:视觉强化学习对机器人技术很有吸引力,但代价高昂:非策略方法采样效率高但速度慢;策略方法并行化好但浪费样本。最近的工作表明,对于基于状态的控制,非策略方法可以在挂钟时间上比策略方法训练得更快。将其扩展到视觉仍然具有挑战性,其中高维输入图像使训练动态复杂化,并引入大量存储和编码开销。为了解决这些挑战,我们引入了Squint,这是一种视觉Soft Actor Critic方法,它比之前的视觉非策略和策略方法实现了更快的挂钟训练。Squint通过并行模拟、分布型评论家、分辨率斜视、层归一化、调优的更新数据比和优化的实现来实现这一点。我们在SO-101任务集上进行评估,这是ManiSkill3中带有大量域随机化的一套8个新操作任务,并演示了到真实SO-101机器人的模拟到真实迁移。我们在单个RTX 3090 GPU上对策略训练15分钟,大多数任务在6分钟内收敛。
摘要:Visual reinforcement learning is appealing for robotics but expensive -- off-policy methods are sample-efficient yet slow; on-policy methods parallelize well but waste samples. Recent work has shown that off-policy methods can train faster than on-policy methods in wall-clock time for state-based control. Extending this to vision remains challenging, where high-dimensional input images complicate training dynamics and introduce substantial storage and encoding overhead. To address these challenges, we introduce Squint, a visual Soft Actor Critic method that achieves faster wall-clock training than prior visual off-policy and on-policy methods. Squint achieves this via parallel simulation, a distributional critic, resolution squinting, layer normalization, a tuned update-to-data ratio, and an optimized implementation. We evaluate on the SO-101 Task Set, a new suite of eight manipulation tasks in ManiSkill3 with heavy domain randomization, and demonstrate sim-to-real transfer to a real SO-101 robot. We train policies for 15 minutes on a single RTX 3090 GPU, with most tasks converging in under 6 minutes.
【2】Localized Dynamics-Aware Domain Adaption for Off-Dynamics Offline Reinforcement Learning
标题:用于非动态离线强化学习的局部动态感知领域适应
链接:https://arxiv.org/abs/2602.21072
作者:Zhangjie Xia,Yu Yang,Pan Xu
备注:33 pages, 9 figures, 11 tables
摘要:非动态离线强化学习(RL)的目的是利用有限的目标数据和在不同的过渡动态下收集的丰富的源数据来学习目标域的策略。现有的方法通常解决动态不匹配的全局状态空间或通过逐点数据过滤,这些方法可能会错过本地化的跨域相似性或产生高的计算成本。我们提出了局部动态感知域自适应(LoDADA),它利用局部动态失配来更好地重用源数据。LoDADA聚类从源和目标数据集的转换,并通过域歧视估计集群级动态差异。来自差异小的聚类的源转换被保留,而来自差异大的聚类的源转换被过滤掉。这产生了细粒度和可扩展的数据选择策略,避免了过于粗糙的全局假设和昂贵的每样本过滤。我们提供理论见解和广泛的实验,在不同的全球和本地动态变化的环境。结果表明,LoDADA通过更好地利用局部分布失配,始终优于最先进的非动态离线RL方法。
摘要:Off-dynamics offline reinforcement learning (RL) aims to learn a policy for a target domain using limited target data and abundant source data collected under different transition dynamics. Existing methods typically address dynamics mismatch either globally over the state space or via pointwise data filtering; these approaches can miss localized cross-domain similarities or incur high computational cost. We propose Localized Dynamics-Aware Domain Adaptation (LoDADA), which exploits localized dynamics mismatch to better reuse source data. LoDADA clusters transitions from source and target datasets and estimates cluster-level dynamics discrepancy via domain discrimination. Source transitions from clusters with small discrepancy are retained, while those from clusters with large discrepancy are filtered out. This yields a fine-grained and scalable data selection strategy that avoids overly coarse global assumptions and expensive per-sample filtering. We provide theoretical insights and extensive experiments across environments with diverse global and local dynamics shifts. Results show that LoDADA consistently outperforms state-of-the-art off-dynamics offline RL methods by better leveraging localized distribution mismatch.
【3】Gap-Dependent Bounds for Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation
标题:线性函数逼近的近极小最优强化学习的间隙相关边界
链接:https://arxiv.org/abs/2602.20297
作者:Haochen Zhang,Zhong Zheng,Lingzhou Xue
摘要:我们研究了线性函数近似下强化学习中近似最小最优算法的间隙相关性能保证。虽然以前的作品已经建立了间隙相关的遗憾界在这种情况下,现有的分析不适用于算法,实现近最小最大最优的最坏情况下的遗憾界$\tilde{O}(d\sqrt{H^3K})$,其中$d$是特征尺寸,$H$是地平线的长度,和$K$是情节的数量。我们通过为近似最小最大最优算法LSVI-UCB++提供第一个间隙相关的遗憾界限来弥合这一差距(He et al.,2023年)。我们的分析产生了改善的依赖性$d$和$H$相比,以前的差距依赖的结果。此外,利用LSVI-UCB++的低策略切换属性,我们引入了一个并发变体,可以在多个代理之间进行有效的并行探索,并通过线性函数近似建立了在线多代理RL的第一个间隙相关样本复杂度上限,实现了相对于代理数量的线性加速。
摘要:We study gap-dependent performance guarantees for nearly minimax-optimal algorithms in reinforcement learning with linear function approximation. While prior works have established gap-dependent regret bounds in this setting, existing analyses do not apply to algorithms that achieve the nearly minimax-optimal worst-case regret bound $\tilde{O}(d\sqrt{H^3K})$, where $d$ is the feature dimension, $H$ is the horizon length, and $K$ is the number of episodes. We bridge this gap by providing the first gap-dependent regret bound for the nearly minimax-optimal algorithm LSVI-UCB++ (He et al., 2023). Our analysis yields improved dependencies on both $d$ and $H$ compared to previous gap-dependent results. Moreover, leveraging the low policy-switching property of LSVI-UCB++, we introduce a concurrent variant that enables efficient parallel exploration across multiple agents and establish the first gap-dependent sample complexity upper bound for online multi-agent RL with linear function approximation, achieving linear speedup with respect to the number of agents.
元学习(1篇)
【1】IMOVNO+: A Regional Partitioning and Meta-Heuristic Ensemble Framework for Imbalanced Multi-Class Learning
标题:IMOVNO+:用于不平衡多类别学习的区域划分和元启发式集成框架
链接:https://arxiv.org/abs/2602.20199
作者:Soufiane Bacha,Laouni Djafri,Sahraoui Dhelim,Huansheng Ning
备注:28 pages
摘要:类不平衡、重叠和噪声会降低数据质量、削弱模型可靠性并限制泛化。虽然在二元分类中得到了广泛研究,但这些问题在多类环境中仍未得到充分研究:复杂的类间关系使少数-多数结构不清晰,传统聚类也无法捕捉分布形状。仅依赖几何距离的方法有可能删除信息样本并生成低质量的合成数据,而二值化方法只在局部处理不平衡并忽略全局类间依赖性。在算法层面,集成很难整合弱分类器,导致鲁棒性有限。本文提出了IMOVNO+(IMbalance-OVerlap-NOise+算法级优化),这是一个两级框架,旨在共同提高二元和多类任务的数据质量和算法鲁棒性。在数据层面:第一,使用条件概率量化每个样本的信息量;第二,将数据集划分为核心区域、重叠区域和噪声区域;第三,引入一种将Z分数度量与大跳跃间隙距离相结合的重叠-清洗算法;第四,基于多重正则化的智能过采样算法控制合成样本的接近度,防止产生新的重叠。在算法层面,元启发式方法修剪集成分类器,以减少弱学习器的影响。IMOVNO+在35个数据集上进行了评估(13个多类,22个二元)。结果显示出相对最先进方法的一致优越性,在若干情况下接近100%。对于多类数据,IMOVNO+在G均值上获得37-57%的增益,在F1分数上获得25-44%,在精确率上获得25-39%,在召回率上获得26-43%。在二元任务中,它以14-39%的改进达到近乎完美的性能。该框架可处理由数据收集和隐私限制导致的数据稀缺与不平衡。
摘要:Class imbalance, overlap, and noise degrade data quality, reduce model reliability, and limit generalization. Although widely studied in binary classification, these issues remain underexplored in multi-class settings, where complex inter-class relationships make minority-majority structures unclear and traditional clustering fails to capture distribution shape. Approaches that rely only on geometric distances risk removing informative samples and generating low-quality synthetic data, while binarization approaches treat imbalance locally and ignore global inter-class dependencies. At the algorithmic level, ensembles struggle to integrate weak classifiers, leading to limited robustness. This paper proposes IMOVNO+ (IMbalance-OVerlap-NOise+ Algorithm-Level Optimization), a two-level framework designed to jointly enhance data quality and algorithmic robustness for binary and multi-class tasks. At the data level, first, conditional probability is used to quantify the informativeness of each sample. Second, the dataset is partitioned into core, overlapping, and noisy regions. Third, an overlapping-cleaning algorithm is introduced that combines Z-score metrics with a big-jump gap distance. Fourth, a smart oversampling algorithm based on multi-regularization controls synthetic sample proximity, preventing new overlaps. At the algorithmic level, a meta-heuristic prunes ensemble classifiers to reduce weak-learner influence. IMOVNO+ was evaluated on 35 datasets (13 multi-class, 22 binary). Results show consistent superiority over state-of-the-art methods, approaching 100% in several cases. For multi-class data, IMOVNO+ achieves gains of 37-57% in G-mean, 25-44% in F1-score, 25-39% in precision, and 26-43% in recall. In binary tasks, it attains near-perfect performance with improvements of 14-39%. The framework handles data scarcity and imbalance from collection and privacy limits.
符号|符号学习(1篇)
【1】GENSR: Symbolic Regression Based in Equation Generative Space
标题:GENSR:基于方程生成空间的符号回归
链接:https://arxiv.org/abs/2602.20557
作者:Qian Li,Yuxiao Hu,Juncheng Liu,Yuntian Chen
摘要:符号回归(Symbolic Regression,SR)试图揭示隐藏在观测数据背后的方程。然而,大多数方法在离散方程空间内搜索,其中方程的结构修改很少与它们的数值行为一致,使得拟合误差反馈噪声太大而无法指导探索。为了解决这一挑战,我们提出了GenSR,一个遵循“地图构建->粗定位->精细搜索”范式的基于生成式潜空间的SR框架。具体来说,GenSR首先预训练一个双分支条件变分自编码器(CVAE),将符号方程重参数化到一个具有符号连续性和局部数值平滑性的生成式潜空间中。该空间可以被视为方程空间的一张结构良好的“地图”,为搜索提供方向信号。在推理时,CVAE将输入数据粗略地定位到潜空间中有希望的区域,然后由改进的CMA-ES利用平滑的潜梯度细化候选区域。从贝叶斯的角度来看,GenSR将SR任务重新表述为最大化条件分布$p(\mathrm{Equ.} \mid \mathrm{Num.})$,CVAE训练通过证据下界(ELBO)实现这一目标。这一新视角为GenSR的有效性提供了理论保证。大量实验表明,GenSR联合优化预测精度、表达式简洁性和计算效率,同时在噪声下保持鲁棒性。
摘要:Symbolic Regression (SR) tries to reveal the hidden equations behind observed data. However, most methods search within a discrete equation space, where the structural modifications of equations rarely align with their numerical behavior, leaving fitting error feedback too noisy to guide exploration. To address this challenge, we propose GenSR, a generative latent space-based SR framework following the "map construction -> coarse localization -> fine search" paradigm. Specifically, GenSR first pretrains a dual-branch Conditional Variational Autoencoder (CVAE) to reparameterize symbolic equations into a generative latent space with symbolic continuity and local numerical smoothness. This space can be regarded as a well-structured "map" of the equation space, providing directional signals for search. At inference, the CVAE coarsely localizes the input data to promising regions in the latent space. Then, a modified CMA-ES refines the candidate region, leveraging smooth latent gradients. From a Bayesian perspective, GenSR reframes the SR task as maximizing the conditional distribution $p(\mathrm{Equ.} \mid \mathrm{Num.})$, with CVAE training achieving this objective through the Evidence Lower Bound (ELBO). This new perspective provides a theoretical guarantee for the effectiveness of GenSR. Extensive experiments show that GenSR jointly optimizes predictive accuracy, expression simplicity, and computational efficiency, while remaining robust under noise.
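摘要所说的"通过证据下界(ELBO)实现该目标"对应条件VAE的标准下界形式。以摘要的记号写出(具体的条件化方式是根据标准CVAE推测的,并非论文原文):

```latex
\log p(\mathrm{Equ.} \mid \mathrm{Num.})
\;\ge\;
\mathbb{E}_{q_\phi(z \mid \mathrm{Equ.},\, \mathrm{Num.})}
\big[\log p_\theta(\mathrm{Equ.} \mid z, \mathrm{Num.})\big]
\;-\;
\mathrm{KL}\big(q_\phi(z \mid \mathrm{Equ.},\, \mathrm{Num.}) \,\|\, p_\theta(z \mid \mathrm{Num.})\big)
```

最大化右侧即同时鼓励潜变量 $z$ 能重构出方程,并使近似后验贴近以数值数据为条件的先验,从而得到可用于"粗定位"的潜空间。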
医学相关(2篇)
【1】Sequential Counterfactual Inference for Temporal Clinical Data: Addressing the Time Traveler Dilemma
标题:时态临床数据的顺序反事实推理:解决时间旅行者困境
链接:https://arxiv.org/abs/2602.21168
作者:Jingya Cheng,Alaleh Azhir,Jiazi Tian,Hossein Estiri
摘要:反事实推理使临床医生能够就患者结果提出“如果”问题,但标准方法假设特征相互独立且可同时修改,而纵向临床数据违反了这些假设。我们引入了顺序反事实框架,它通过区分不可变特征(慢性诊断)和可控特征(实验室值)并对干预措施如何随时间传播进行建模,来尊重电子健康记录中的时间依赖性。应用于2,723名COVID-19患者(383名长期COVID心力衰竭病例,2,340名匹配对照),我们证明了38-67%的慢性疾病患者在朴素方法下需要生物学上不可能的反事实。我们识别出一个心肾级联(CKD -> AKI -> HF),每一步的相对风险分别为2.27和1.19,说明了顺序式(而非朴素)反事实可以捕获的时间传播。我们的框架将反事实解释从“如果这个特征不同会怎样?”转变为“如果我们早一点干预会怎样,以及它将如何向前传播?”,从而产生基于生物学合理性的临床可操作见解。
摘要:Counterfactual inference enables clinicians to ask "what if" questions about patient outcomes, but standard methods assume feature independence and simultaneous modifiability -- assumptions violated by longitudinal clinical data. We introduce the Sequential Counterfactual Framework, which respects temporal dependencies in electronic health records by distinguishing immutable features (chronic diagnoses) from controllable features (lab values) and modeling how interventions propagate through time. Applied to 2,723 COVID-19 patients (383 Long COVID heart failure cases, 2,340 matched controls), we demonstrate that 38-67% of patients with chronic conditions would require biologically impossible counterfactuals under naive methods. We identify a cardiorenal cascade (CKD -> AKI -> HF) with relative risks of 2.27 and 1.19 at each step, illustrating temporal propagation that sequential -- but not naive -- counterfactuals can capture. Our framework transforms counterfactual explanation from "what if this feature were different?" to "what if we had intervened earlier, and how would that propagate forward?" -- yielding clinically actionable insights grounded in biological plausibility.
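摘要报告了级联每一步的相对风险(2.27与1.19)。相对风险本身的定义很简单,下面用假设的计数给出示意(数值为虚构,仅用于说明定义,并非论文数据):

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """相对风险:P(结局 | 暴露) / P(结局 | 未暴露)。
    例如级联的一步:CKD 患者(暴露组)与非 CKD 患者中发生 AKI 的比例之比。"""
    return (exposed_events / exposed_total) / (unexposed_events / unexposed_total)

# 假设计数:暴露组 200 人中 60 人发生结局,对照组 300 人中 30 人发生结局
rr = relative_risk(60, 200, 30, 300)
# 0.30 / 0.10,约为 3.0:暴露组风险是对照组的三倍
```

RR 大于 1 表示上游状态增加下游事件的风险;沿级联逐步相乘的风险正是顺序反事实要显式建模的时间传播。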
【2】MIP Candy: A Modular PyTorch Framework for Medical Image Processing
标题:MIP Candy:用于医学图像处理的模块化PyTorch框架
链接:https://arxiv.org/abs/2602.21033
作者:Tianhao Fu,Yucheng Chen
摘要:医学图像处理需要专门的软件来处理高维体积数据、异构文件格式和特定领域的培训程序。现有的框架要么提供需要大量集成工作的低级组件,要么强加刚性的、单一的管道来抵制修改。我们提出了MIP Candy(MIPCandy),这是一个免费提供的,基于PyTorch的框架,专为医学图像处理而设计。MIPCandy提供了一个完整的模块化管道,涵盖数据加载,训练,推理和评估,允许研究人员通过实现单个方法$\texttt{build_network}$来获得功能齐全的流程工作流,同时保留对每个组件的细粒度控制。设计的核心是$\texttt{LayerT}$,这是一种延迟配置机制,可以在运行时替换卷积、规范化和激活模块,而无需子类化。该框架还提供了内置的$k$-fold交叉验证,具有自动感兴趣区域检测的数据集检查,深度监督,指数移动平均,多前端实验跟踪(权重和偏差,概念,MLflow),训练状态恢复,以及通过商回归进行验证分数预测。可扩展的bundle生态系统提供了遵循一致的训练器-预测器模式的预构建模型实现,并与核心框架集成而无需修改。MIPCandy是Apache-2.0许可下的开源,需要Python~3.12或更高版本。源代码和文档可在https://github.com/ProjectNeura/MIPCandy上获得。
摘要:Medical image processing demands specialized software that handles high-dimensional volumetric data, heterogeneous file formats, and domain-specific training procedures. Existing frameworks either provide low-level components that require substantial integration effort or impose rigid, monolithic pipelines that resist modification. We present MIP Candy (MIPCandy), a freely available, PyTorch-based framework designed specifically for medical image processing. MIPCandy provides a complete, modular pipeline spanning data loading, training, inference, and evaluation, allowing researchers to obtain a fully functional process workflow by implementing a single method, $\texttt{build_network}$, while retaining fine-grained control over every component. Central to the design is $\texttt{LayerT}$, a deferred configuration mechanism that enables runtime substitution of convolution, normalization, and activation modules without subclassing. The framework further offers built-in $k$-fold cross-validation, dataset inspection with automatic region-of-interest detection, deep supervision, exponential moving average, multi-frontend experiment tracking (Weights & Biases, Notion, MLflow), training state recovery, and validation score prediction via quotient regression. An extensible bundle ecosystem provides pre-built model implementations that follow a consistent trainer--predictor pattern and integrate with the core framework without modification. MIPCandy is open-source under the Apache-2.0 license and requires Python~3.12 or later. Source code and documentation are available at https://github.com/ProjectNeura/MIPCandy.
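To make the deferred-configuration idea behind $\texttt{LayerT}$ concrete, here is a hypothetical Python sketch (class, registry, and role names are our illustration, not MIPCandy's actual API): a block is declared against abstract layer roles, and the concrete module classes are bound only at build time, so convolution/normalization/activation types can be substituted at runtime without subclassing.

```python
class LayerT:
    """Placeholder for a layer whose concrete class is chosen at build time."""
    def __init__(self, role):
        self.role = role  # e.g. "conv", "norm", "act"

    def build(self, registry, *args):
        # Look up the concrete class currently registered for this role
        # and instantiate it with the call-site arguments.
        return registry[self.role](*args)

# Toy stand-ins for real modules such as nn.Conv3d / nn.InstanceNorm3d / nn.ReLU.
class Conv:
    def __init__(self, in_ch, out_ch):
        self.shape = (in_ch, out_ch)

class Norm:
    def __init__(self, ch):
        self.ch = ch

class ReLU:
    pass

# Swapping the entries of this registry reconfigures every block that was
# declared against the abstract roles, with no subclassing.
registry = {"conv": Conv, "norm": Norm, "act": ReLU}

block_spec = [(LayerT("conv"), (8, 16)), (LayerT("norm"), (16,)), (LayerT("act"), ())]
block = [layer_t.build(registry, *args) for layer_t, args in block_spec]
```

Rebuilding the same specification against a different registry is the kind of runtime substitution the abstract describes.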
蒸馏|知识提取(2篇)
【1】GATES: Self-Distillation under Privileged Context with Consensus Gating
标题:GATES:特权上下文下基于共识门控的自蒸馏
链接:https://arxiv.org/abs/2602.20574
作者:Alex Stein,Furong Huang,Tom Goldstein
备注:10 Pages of main text with an additional 7 pages of supplementary material
摘要:我们在监督不可靠的环境中研究自蒸馏:没有真实标签、可验证奖励或外部评分员来评估答案。我们专注于具有非对称上下文的基于文档的问答,其中单一模型既充当导师(训练期间可访问相关源文档),又充当学生(测试时仅凭问题作答)。我们并不假设导师的正确性,而是通过采样多条基于文档的推理轨迹,并利用它们之间的一致性来门控学习,从导师共识中在线获得监督。在此可靠性信号的条件下,我们通过完整的导师推理轨迹(而不仅仅是最终答案)蒸馏知识,提供密集而稳定的学习信号。实验表明,这种共识门控的轨迹蒸馏大大改善了向无文档学生的迁移:在非对称评估下,留出的域内准确率从46.0%提高到62.0%,在公开的无文档数学基准上的平均(maj@8)准确率从20.2%提高到35.4%。
摘要:We study self-distillation in settings where supervision is unreliable: there are no ground truth labels, verifiable rewards, or external graders to evaluate answers. We focus on document-grounded question answering with asymmetric context, where a single model serves as both tutor (with access to a relevant source document during training) and student (answering from the question alone at test time). Rather than assuming tutor correctness, we derive supervision online from tutor consensus by sampling multiple document-grounded reasoning traces and using agreement to gate learning. Conditioned on this reliability signal, we distill knowledge through full tutor reasoning trajectories (not just final answers), providing a dense and stable learning signal. Empirically, this consensus-gated trajectory distillation substantially improves transfer to the document-free student. Held-out in-domain accuracy under asymmetric evaluation improves from 46.0% to 62.0%, and average (maj@8) accuracy on public document-free math benchmarks improves from 20.2% to 35.4%.
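A minimal sketch of the consensus-gating step, under the simplifying assumption that each sampled tutor trace is reduced to its final answer (the paper distills full reasoning trajectories; the function name and threshold here are illustrative):

```python
from collections import Counter

def consensus_gate(sampled_answers, threshold=0.75):
    """Return (majority_answer, keep): keep is True only when agreement among
    the sampled document-grounded traces reaches the threshold."""
    counts = Counter(sampled_answers)
    answer, votes = counts.most_common(1)[0]
    agreement = votes / len(sampled_answers)
    return answer, agreement >= threshold

# Traces that agree pass the gate and may supervise the document-free student.
ans, keep = consensus_gate(["42", "42", "42", "17"])
# Disagreement signals unreliable supervision, so the example is dropped.
ans2, keep2 = consensus_gate(["a", "b", "c", "d"])
```

Only gated examples would then contribute their full tutor trajectories to the distillation loss.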
【2】CITED: A Decision Boundary-Aware Signature for GNNs Towards Model Extraction Defense
标题:CITED:GNN的决策边界感知签名,以实现模型提取防御
链接:https://arxiv.org/abs/2602.20418
作者:Bolin Shen,Md Shamim Seraj,Zhan Cheng,Shayok Chakraborty,Yushun Dong
摘要:图神经网络(GNN)在推荐系统和金融风险管理等各种应用中表现出卓越的性能。然而,在本地部署大规模GNN模型对用户来说尤其具有挑战性,因为这需要大量的计算资源和丰富的属性数据。因此,机器学习即服务(MLaaS)变得越来越流行,为部署和访问包括GNN在内的各种模型提供了便捷途径。然而,一种被称为模型提取攻击(MEA)的新兴威胁带来了巨大风险,因为攻击者可以轻易获得功能相似的替代GNN模型。具体而言,攻击者使用子图输入反复查询目标模型以收集相应的响应,随后利用这些输入-输出对以极小的成本训练自己的代理模型。已经提出了许多防御MEA的技术,但大多数仅限于特定的输出级别(例如,嵌入或标签),并存在固有的技术缺陷。为了解决这些限制,我们提出了一个新颖的所有权验证框架CITED,这是首个在嵌入和标签两个级别上实现所有权验证的方法。此外,CITED是一种新颖的基于签名的方法,既不损害下游性能,也不引入降低效率的辅助模型,同时仍然优于所有水印和指纹方法。大量实验证明了我们的CITED框架的有效性和鲁棒性。代码可从以下网址获得:https://github.com/LabRAI/CITED。
摘要:Graph neural networks (GNNs) have demonstrated superior performance in various applications, such as recommendation systems and financial risk management. However, deploying large-scale GNN models locally is particularly challenging for users, as it requires significant computational resources and extensive property data. Consequently, Machine Learning as a Service (MLaaS) has become increasingly popular, offering a convenient way to deploy and access various models, including GNNs. However, an emerging threat known as Model Extraction Attacks (MEAs) presents significant risks, as adversaries can readily obtain surrogate GNN models exhibiting similar functionality. Specifically, attackers repeatedly query the target model using subgraph inputs to collect corresponding responses. These input-output pairs are subsequently utilized to train their own surrogate models at minimal cost. Many techniques have been proposed to defend against MEAs, but most are limited to specific output levels (e.g., embedding or label) and suffer from inherent technical drawbacks. To address these limitations, we propose a novel ownership verification framework CITED which is a first-of-its-kind method to achieve ownership verification on both embedding and label levels. Moreover, CITED is a novel signature-based method that neither harms downstream performance nor introduces auxiliary models that reduce efficiency, while still outperforming all watermarking and fingerprinting approaches. Extensive experiments demonstrate the effectiveness and robustness of our CITED framework. Code is available at: https://github.com/LabRAI/CITED.
推荐(1篇)
【1】Position-Aware Sequential Attention for Accurate Next Item Recommendations
标题:用于准确下一项推荐的位置感知顺序注意力
链接:https://arxiv.org/abs/2602.21052
作者:Timur Nabiev,Evgeny Frolov
摘要:顺序自注意模型通常依赖加性位置嵌入,在输入处将位置信息注入项目表示。在没有位置信号的情况下,注意力块对序列位置是置换等变的,因此除因果掩蔽外没有内在的时间顺序概念。我们认为,加性位置嵌入只能让注意力机制对序列顺序表面上敏感:位置信息与项目嵌入语义纠缠在一起,在深层架构中传播微弱,并限制了捕捉丰富顺序模式的能力。为了解决这些限制,我们引入了一种核化自注意力机制,其中一个可学习的位置核纯粹在位置空间中运作,与语义相似性解耦,并直接调节注意力权重。当按注意力块应用时,该核支持自适应多尺度顺序建模。在标准下一项预测基准上的实验表明,我们的位置核注意力持续优于强有力的竞争基线。
摘要:Sequential self-attention models usually rely on additive positional embeddings, which inject positional information into item representations at the input. In the absence of positional signals, the attention block is permutation-equivariant over sequence positions and thus has no intrinsic notion of temporal order beyond causal masking. We argue that additive positional embeddings make the attention mechanism only superficially sensitive to sequence order: positional information is entangled with item embedding semantics, propagates weakly in deep architectures, and limits the ability to capture rich sequential patterns. To address these limitations, we introduce a kernelized self-attention mechanism, where a learnable positional kernel operates purely in the position space, disentangled from semantic similarity, and directly modulates attention weights. When applied per attention block, this kernel enables adaptive multi-scale sequential modeling. Experiments on standard next-item prediction benchmarks show that our positional kernel attention consistently improves over strong competing baselines.
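An illustrative NumPy sketch of the core mechanism, under our own simplifications (single head, a hand-fixed recency kernel instead of a learned one): the positional kernel is defined purely over position indices and added to the content logits, rather than mixing positional vectors into the item embeddings.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def positional_kernel_attention(Q, K, V, pos_kernel):
    """Q, K, V: (T, d) content projections; pos_kernel: (T, T) position-only term."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d) + pos_kernel               # content and position stay disentangled
    logits = logits + np.triu(np.full_like(logits, -1e9), 1)  # causal mask for next-item prediction
    return softmax(logits) @ V

rng = np.random.default_rng(0)
T, d = 4, 8
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
idx = np.arange(T)
pos_kernel = -0.5 * np.abs(idx[:, None] - idx[None, :])  # recency bias: nearer items weigh more
out = positional_kernel_attention(Q, K, V, pos_kernel)
```

In the paper's setting the kernel entries would be learnable per attention block, which is what enables the adaptive multi-scale behaviour described above.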
聚类(1篇)
【1】Coupled Cluster con MōLe: Molecular Orbital Learning for Neural Wavefunctions
标题:耦合集群con MōLe:神经波函数的分子轨道学习
链接:https://arxiv.org/abs/2602.20232
作者:Luca Thiede,Abdulrahman Aldossary,Andreas Burger,Jorge Arturo Campos-Gonzalez-Angulo,Ning Wang,Alexander Zook,Melisa Alkan,Kouhei Nakaji,Taylor Lee Patti,Jérôme Florian Gonthier,Mohammad Ghazi Vakili,Alán Aspuru-Guzik
摘要:密度泛函理论(DFT)是计算分子性质的最广泛使用的方法;然而,它的准确性往往不足以定量预测。耦合团簇(CC)理论是最成功的方法,可以实现DFT以外的精度,并预测与实验密切相关的性质。它被称为量子化学的“黄金标准”。不幸的是,CC的高计算成本限制了其广泛的适用性。在这项工作中,我们提出了分子轨道学习(MōLe)架构,这是一种等变机器学习模型,可以直接预测CC的核心数学对象,即从平均场Hartree-Fock分子轨道作为输入的激发振幅。我们测试了我们的模型的各个方面,并展示了其显着的数据效率和分布外的推广到更大的分子和非平衡几何形状,尽管只在小的平衡几何形状训练。最后,我们还研究了它的能力,以减少收敛CC计算所需的周期数。MōLe可以为基于波函数的高精度ML架构奠定基础,以加速分子设计和补充力场方法。
摘要:Density functional theory (DFT) is the most widely used method for calculating molecular properties; however, its accuracy is often insufficient for quantitative predictions. Coupled-cluster (CC) theory is the most successful method for achieving accuracy beyond DFT and for predicting properties that closely align with experiment. It is known as the "gold standard" of quantum chemistry. Unfortunately, the high computational cost of CC limits its widespread applicability. In this work, we present the Molecular Orbital Learning (MōLe) architecture, an equivariant machine learning model that directly predicts CC's core mathematical objects, the excitation amplitudes, from the mean-field Hartree-Fock molecular orbitals as inputs. We test various aspects of our model and demonstrate its remarkable data efficiency and out-of-distribution generalization to larger molecules and off-equilibrium geometries, despite being trained only on small equilibrium geometries. Finally, we also examine its ability to reduce the number of cycles required to converge CC calculations. MōLe can set the foundations for high-accuracy wavefunction-based ML architectures to accelerate molecular design and complement force-field approaches.
自动驾驶|车辆|车道检测等(1篇)
【1】On Electric Vehicle Energy Demand Forecasting and the Effect of Federated Learning
标题:电动汽车能源需求预测及联邦学习的效果
链接:https://arxiv.org/abs/2602.20782
作者:Andreas Tritsarolis,Gil Sampaio,Nikos Pelekis,Yannis Theodoridis
摘要:新能源、智能设备和需求侧管理策略的广泛应用推动了从基础设施负载建模到用户行为分析的多项分析操作。电动汽车供电设备(EVSE)的能源需求预测(EDF)是确保高效能源管理和可持续性的最关键操作之一,因为它使公用事业提供商能够预测能源/电力需求,优化资源分配,并实施积极措施以提高电网可靠性。然而,由于外部因素,准确的EDF是一个具有挑战性的问题,例如不同的用户例程,天气条件,驾驶行为,未知的充电状态等。此外,随着对隐私和可持续性的关注和限制的增加,训练数据变得越来越碎片化,导致分布式数据集分散在不同的数据孤岛和/或边缘设备上,需要联合学习解决方案。在本文中,我们研究了不同的成熟的时间序列预测方法来解决EDF问题,从统计方法(ARIMA系列)到传统的机器学习模型(如XGBoost)和深度神经网络(GRU和LSTM)。我们通过对四个真实世界的EVSE数据集进行性能比较来概述这些方法,这些数据集在集中式和联邦学习范式下进行评估,重点关注预测保真度,隐私保护和能源开销之间的权衡。我们的实验结果表明,一方面,梯度提升树(XGBoost)在预测精度和能源效率方面优于统计和基于NN的模型,另一方面,联邦学习模型平衡了这些因素,为分散式能源需求预测提供了一个有希望的方向。
摘要:The wide spread of new energy resources, smart devices, and demand side management strategies has motivated several analytics operations, from infrastructure load modeling to user behavior profiling. Energy Demand Forecasting (EDF) of Electric Vehicle Supply Equipments (EVSEs) is one of the most critical operations for ensuring efficient energy management and sustainability, since it enables utility providers to anticipate energy/power demand, optimize resource allocation, and implement proactive measures to improve grid reliability. However, accurate EDF is a challenging problem due to external factors, such as the varying user routines, weather conditions, driving behaviors, unknown state of charge, etc. Furthermore, as concerns and restrictions about privacy and sustainability have grown, training data has become increasingly fragmented, resulting in distributed datasets scattered across different data silos and/or edge devices, calling for federated learning solutions. In this paper, we investigate different well-established time series forecasting methodologies to address the EDF problem, from statistical methods (the ARIMA family) to traditional machine learning models (such as XGBoost) and deep neural networks (GRU and LSTM). We provide an overview of these methods through a performance comparison over four real-world EVSE datasets, evaluated under both centralized and federated learning paradigms, focusing on the trade-offs between forecasting fidelity, privacy preservation, and energy overheads. Our experimental results demonstrate, on the one hand, the superiority of gradient boosted trees (XGBoost) over statistical and NN-based models in both prediction accuracy and energy efficiency and, on the other hand, an insight that Federated Learning-enabled models balance these factors, offering a promising direction for decentralized energy demand forecasting.
联邦学习|隐私保护|加密(1篇)
【1】Heterogeneity-Aware Client Selection Methodology For Efficient Federated Learning
标题:用于高效联邦学习的异构感知客户端选择方法
链接:https://arxiv.org/abs/2602.20450
作者:Nihal Balivada,Shrey Gupta,Shashank Shreedhar Bhatt,Suyash Gupta
摘要:联邦学习(FL)支持分布式客户端-服务器架构,其中多个客户端协作训练全局机器学习(ML)模型,而无需共享敏感的本地数据。然而,由于客户端之间的统计异质性,FL通常会导致比传统ML算法更低的准确性。先前的工作试图通过使用来自客户端模型的模型更新(例如损失和偏差)来选择能够提高全局模型准确性的参与者,以解决这一问题。然而,这些更新既不能准确代表客户端的异质性,其选择方法也不是确定性的。我们通过引入Terraform来缓解这些限制,Terraform是一种新颖的客户端选择方法,它使用梯度更新和确定性选择算法来选择异构客户端进行重训练。这种双管齐下的方法使Terraform的准确率比以前的工作高出多达47%。我们通过全面的消融研究和训练时间分析进一步证明了其效率,为Terraform的鲁棒性提供了有力支撑。
摘要:Federated Learning (FL) enables a distributed client-server architecture where multiple clients collaboratively train a global Machine Learning (ML) model without sharing sensitive local data. However, FL often results in lower accuracy than traditional ML algorithms due to statistical heterogeneity across clients. Prior works attempt to address this by using model updates, such as loss and bias, from client models to select participants that can improve the global model's accuracy. However, these updates neither accurately represent a client's heterogeneity nor are their selection methods deterministic. We mitigate these limitations by introducing Terraform, a novel client selection methodology that uses gradient updates and a deterministic selection algorithm to select heterogeneous clients for retraining. This bi-pronged approach allows Terraform to achieve up to 47 percent higher accuracy over prior works. We further demonstrate its efficiency through comprehensive ablation studies and training time analyses, providing strong justification for the robustness of Terraform.
推理|分析|理解|解释(8篇)
【1】PIME: Prototype-based Interpretable MCTS-Enhanced Brain Network Analysis for Disorder Diagnosis
标题:PIME:基于原型的可解释MCTS增强脑网络分析用于疾病诊断
链接:https://arxiv.org/abs/2602.21046
作者:Kunyu Zhang,Yanwu Yang,Jing Zhang,Xiangjie Shi,Shujian Yu
摘要:最近用于基于fMRI诊断的深度学习方法通过对功能连接网络建模,取得了可观的准确性。然而,标准方法往往难以应对噪声交互,而传统的事后归因方法可能缺乏可靠性,可能会突出特定于数据集的伪影。为了解决这些挑战,我们引入了PIME,这是一个可解释框架,它通过在学习过程中将基于原型的分类、一致性训练与结构扰动相结合,把内在可解释性与最小充分子图优化联系起来。这鼓励形成结构化的潜在空间,并使蒙特卡洛树搜索(MCTS)能够在原型一致的目标下于训练后提取紧凑的最小充分解释子图。在三个基准fMRI数据集上的实验表明,PIME达到了最先进的性能。此外,通过以学习到的原型约束搜索空间,PIME识别出与既有神经影像学发现一致的关键脑区。稳定性分析显示了90%的重现性以及跨图谱的一致解释。
摘要:Recent deep learning methods for fMRI-based diagnosis have achieved promising accuracy by modeling functional connectivity networks. However, standard approaches often struggle with noisy interactions, and conventional post-hoc attribution methods may lack reliability, potentially highlighting dataset-specific artifacts. To address these challenges, we introduce PIME, an interpretable framework that bridges intrinsic interpretability with minimal-sufficient subgraph optimization by integrating prototype-based classification and consistency training with structural perturbations during learning. This encourages a structured latent space and enables Monte Carlo Tree Search (MCTS) under a prototype-consistent objective to extract compact minimal-sufficient explanatory subgraphs post-training. Experiments on three benchmark fMRI datasets demonstrate that PIME achieves state-of-the-art performance. Furthermore, by constraining the search space via learned prototypes, PIME identifies critical brain regions that are consistent with established neuroimaging findings. Stability analysis shows 90% reproducibility and consistent explanations across atlases.
【2】Transcoder Adapters for Reasoning-Model Diffing
标题:用于推理模型差异分析的Transcoder适配器
链接:https://arxiv.org/abs/2602.20904
作者:Nathan Hu,Jake Ward,Thomas Icard,Christopher Potts
备注:9 pages main, 27 pages total, 10 figures. Code and visualizations at https://transcoder-adapters.github.io/
摘要:虽然推理模型越来越普遍,但推理训练对模型内部机制的影响仍然知之甚少。在这项工作中,我们介绍了Transcoder适配器,这是一种用于学习微调前后MLP计算差异的可解释近似的技术。我们应用Transcoder适配器来刻画Qwen2.5-Math-7B及其推理蒸馏变体DeepSeek-R1-Distill-Qwen-7B之间的差异。学习到的适配器忠实于目标模型的内部计算和下一个词元预测。在推理基准上评估时,适配器与推理模型的响应长度相匹配,并且通常能恢复推理微调带来的50-90%的准确性增益。适配器特征稀疏激活且可解释。在检查适配器特征时,我们发现只有约8%的特征具有与推理行为直接相关的激活示例。我们深入研究了其中一种行为:犹豫词元(例如"等待")的产生。利用归因图,我们将犹豫追溯到仅约2.4%的适配器特征(共5.6k个),它们执行两种功能之一。这些特征对于产生犹豫词元既是必要的也是充分的;移除它们会缩短响应长度,且通常不影响准确性。总的来说,我们的结果提供了对推理训练的洞察,并表明Transcoder适配器可能有助于更广泛地研究微调。
摘要:While reasoning models are increasingly ubiquitous, the effects of reasoning training on a model's internal mechanisms remain poorly understood. In this work, we introduce transcoder adapters, a technique for learning an interpretable approximation of the difference in MLP computation before and after fine-tuning. We apply transcoder adapters to characterize the differences between Qwen2.5-Math-7B and its reasoning-distilled variant, DeepSeek-R1-Distill-Qwen-7B. Learned adapters are faithful to the target model's internal computation and next-token predictions. When evaluated on reasoning benchmarks, adapters match the reasoning model's response lengths and typically recover 50-90% of the accuracy gains from reasoning fine-tuning. Adapter features are sparsely activating and interpretable. When examining adapter features, we find that only ~8% have activating examples directly related to reasoning behaviors. We deeply study one such behavior -- the production of hesitation tokens (e.g., "wait"). Using attribution graphs, we trace hesitation to only ~2.4% of adapter features (5.6k total) performing one of two functions. These features are necessary and sufficient for producing hesitation tokens; removing them reduces response length, often without affecting accuracy. Overall, our results provide insight into reasoning training and suggest transcoder adapters may be useful for studying fine-tuning more broadly.
【3】Probing Dec-POMDP Reasoning in Cooperative MARL
标题:协作MARL中Dec-POMDP推理的探讨
链接:https://arxiv.org/abs/2602.20804
作者:Kale-ab Tessera,Leonard Hinckeldey,Riccardo Zamboni,David Abel,Amos Storkey
备注:To appear at the 25th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2026)
摘要:协作多智能体强化学习(MARL)通常被表述为去中心化部分可观察马尔可夫决策过程(Dec-POMDP),这一设定的难度源于两个关键挑战:部分可观察性和去中心化协调。真正解决这类任务需要Dec-POMDP推理,即智能体利用历史来推断隐藏状态,并基于局部信息进行协调。然而,目前尚不清楚流行的基准是否真的需要这种推理,还是允许通过更简单的策略取得成功。我们引入了一个诊断套件,结合统计上有依据的性能比较与信息论探测,用以审计基线策略(IPPO和MAPPO)在横跨MPE、SMAX、Overcooked、Hanabi和MaBrax的37个场景中的行为复杂性。我们的诊断表明,在这些基准上取得成功很少需要真正的Dec-POMDP推理。在超过一半的场景中,反应式策略与基于记忆的智能体性能相当,而涌现出的协调通常依赖于脆弱的同步动作耦合,而非稳健的时间影响。这些发现表明,在当前训练范式下,一些广泛使用的基准可能无法充分检验核心Dec-POMDP假设,可能导致对进展的过于乐观的评估。我们发布了诊断工具,以支持协作MARL中更严格的环境设计和评估。
摘要:Cooperative multi-agent reinforcement learning (MARL) is typically framed as a decentralised partially observable Markov decision process (Dec-POMDP), a setting whose hardness stems from two key challenges: partial observability and decentralised coordination. Genuinely solving such tasks requires Dec-POMDP reasoning, where agents use history to infer hidden states and coordinate based on local information. Yet it remains unclear whether popular benchmarks actually demand this reasoning or permit success via simpler strategies. We introduce a diagnostic suite combining statistically grounded performance comparisons and information-theoretic probes to audit the behavioural complexity of baseline policies (IPPO and MAPPO) across 37 scenarios spanning MPE, SMAX, Overcooked, Hanabi, and MaBrax. Our diagnostics reveal that success on these benchmarks rarely requires genuine Dec-POMDP reasoning. Reactive policies match the performance of memory-based agents in over half the scenarios, and emergent coordination frequently relies on brittle, synchronous action coupling rather than robust temporal influence. These findings suggest that some widely used benchmarks may not adequately test core Dec-POMDP assumptions under current training paradigms, potentially leading to over-optimistic assessments of progress. We release our diagnostic tooling to support more rigorous environment design and evaluation in cooperative MARL.
【4】Understanding the Role of Rehearsal Scale in Continual Learning under Varying Model Capacities
标题:理解不同模型容量下持续学习中排练规模的作用
链接:https://arxiv.org/abs/2602.20791
作者:JinLi He,Liang Bai,Xian Yang
摘要:排练是减轻灾难性遗忘的关键技术之一,由于其简单实用而被广泛应用于持续学习算法中。然而,理论上理解如何排练规模影响学习动力仍然有限。为了解决这一差距,我们制定排练为基础的持续学习作为一个多维的有效性驱动的迭代优化问题,提供了一个统一的表征在不同的性能指标。在这个框架内,我们得出了一个封闭的形式分析的适应性,记忆性和概括的角度来看,排练规模。我们的研究结果揭示了几个有趣和违反直觉的发现。首先,排练会损害模型的适应性,这与传统上公认的好处形成鲜明对比。第二,增加排练规模并不一定能提高记忆力。当任务相似且噪声水平较低时,记忆错误表现出一个递减的下限。最后,我们通过数值模拟和对多个真实世界数据集的深度神经网络的扩展分析来验证这些见解,揭示了持续学习中排练机制的统计模式。
摘要:Rehearsal is one of the key techniques for mitigating catastrophic forgetting and has been widely adopted in continual learning algorithms due to its simplicity and practicality. However, the theoretical understanding of how rehearsal scale influences learning dynamics remains limited. To address this gap, we formulate rehearsal-based continual learning as a multidimensional effectiveness-driven iterative optimization problem, providing a unified characterization across diverse performance metrics. Within this framework, we derive a closed-form analysis of adaptability, memorability, and generalization from the perspective of rehearsal scale. Our results uncover several intriguing and counterintuitive findings. First, rehearsal can impair model's adaptability, in sharp contrast to its traditionally recognized benefits. Second, increasing the rehearsal scale does not necessarily improve memory retention. When tasks are similar and noise levels are low, the memory error exhibits a diminishing lower bound. Finally, we validate these insights through numerical simulations and extended analyses on deep neural networks across multiple real-world datasets, revealing statistical patterns of rehearsal mechanisms in continual learning.
【5】Benchmarking GNN Models on Molecular Regression Tasks with CKA-Based Representation Analysis
标题:使用基于CKA的表示分析对分子回归任务的GNN模型进行基准测试
链接:https://arxiv.org/abs/2602.20573
作者:Rajan,Ishaan Gupta
备注:10 pages, 5 figures and 2 tables
摘要:分子通常表示为SMILES字符串,其可以容易地转换为固定大小的分子指纹。这些指纹作为特征向量来训练ML/DL模型,用于计算化学、药物发现、生物化学和材料科学领域中的分子性质预测任务。最近的研究表明,SMILES可以用来构建分子图,其中原子是节点($V$),键是边($E$)。这些图随后可用于训练GNN等几何DL模型。GNN学习分子内部的固有结构关系,而不是依赖于固定大小的指纹。虽然GNN是功能强大的聚合器,但它们在较小数据集上的有效性以及不同架构的归纳偏差研究较少。在我们目前的研究中,我们对不同数据集领域(物理化学、生物学和分析)的四种不同GNN架构进行了系统的基准测试。此外,我们还实现了一个分层融合(GNN+FP)框架用于目标预测。我们观察到,融合框架始终优于或匹配独立GNN(RMSE改进 > $7\%$)和基线模型的性能。此外,我们使用中心核对齐(CKA)研究了GNN和指纹嵌入之间的表示相似性,发现它们占据高度独立的潜在空间(CKA $\le0.46$)。跨架构CKA分数表明同位素模型(如GCN、GraphSAGE和GIN)之间高度收敛(CKA $\geq 0.88$),而GAT学习到中等程度独立的表示(CKA $0.55-0.80$)。
摘要:Molecules are commonly represented as SMILES strings, which can be readily converted to fixed-size molecular fingerprints. These fingerprints serve as feature vectors to train ML/DL models for molecular property prediction tasks in the field of computational chemistry, drug discovery, biochemistry, and materials science. Recent research has demonstrated that SMILES can be used to construct molecular graphs where atoms are nodes ($V$) and bonds are edges ($E$). These graphs can subsequently be used to train geometric DL models like GNN. GNN learns the inherent structural relationships within a molecule rather than depending on fixed-size fingerprints. Although GNN are powerful aggregators, their efficacy on smaller datasets and inductive biases across different architectures is less studied. In our present study, we performed a systematic benchmarking of four different GNN architectures across a diverse domain of datasets (physical chemistry, biological, and analytical). Additionally, we have also implemented a hierarchical fusion (GNN+FP) framework for target prediction. We observed that the fusion framework consistently outperforms or matches the performance of standalone GNN (RMSE improvement > $7\%$) and baseline models. Further, we investigated the representational similarity using centered kernel alignment (CKA) between GNN and fingerprint embeddings and found that they occupy highly independent latent spaces (CKA $\le0.46$). The cross-architectural CKA score suggests a high convergence between isotopic models like GCN, GraphSAGE and GIN (CKA $\geq0.88$), with GAT learning moderately independent representation (CKA $0.55-0.80$).
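For reference, the linear variant of CKA used in such comparisons has a compact closed form; this sketch (our code, not the paper's) computes it on centered activations and checks two standard properties:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activations X: (n, p1) and Y: (n, p2) of the same n molecules."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))            # e.g. GNN embeddings for 200 molecules
Qm, _ = np.linalg.qr(rng.normal(size=(8, 8)))
cka_rot = linear_cka(X, X @ Qm)          # invariant to orthogonal transforms -> 1.0
Y = rng.normal(size=(200, 8))            # independent embeddings, e.g. fingerprints
cka_indep = linear_cka(X, Y)             # near zero for unrelated representations
```

Scores like the reported CKA $\le0.46$ between GNN and fingerprint embeddings fall between these two extremes, indicating largely independent latent spaces.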
【6】Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
标题:用于多模态推理的混合策略RLVR中的可控探索
链接:https://arxiv.org/abs/2602.20197
作者:Zhuoxu Huang,Mengxi Jia,Hao Sun,Xuelong Li,Jungong Han
备注:Published as a conference paper at ICLR 2026
摘要:具有可验证奖励的强化学习(RLVR)已成为增强多模态大型语言模型(MLLM)推理能力的主要学习范式。然而,在RL训练过程中,MLLM的巨大状态空间和稀疏奖励往往会导致熵崩溃、策略退化或对次优行为的过度利用。这就需要一种探索策略,既保持有益的随机性,又避免不受控随机采样所带来的低效探索。在本文中,我们提出了CalibRL,一个在专家指导下支持可控探索的混合策略RLVR框架,由两个关键机制实现。首先,分布感知的优势加权按组稀有性缩放更新来校准分布,从而保持探索。同时,非对称激活函数(LeakyReLU)利用专家知识作为校准基线,以缓和过度自信的更新,同时保持其纠正方向。CalibRL以引导的方式增加策略熵,并通过在线采样估计同策略分布来澄清目标分布。更新由这些信息性行为驱动,避免收敛到错误的模式。重要的是,这些设计有助于缓解模型策略与专家轨迹之间的分布不匹配,从而在探索和利用之间实现更稳定的平衡。在八个基准测试(包括域内和域外设置)上的广泛实验证明了一致的改进,验证了我们可控的混合策略RLVR训练的有效性。代码可在https://github.com/zhh6425/CalibRL上获得。
摘要:Reinforcement Learning with verifiable rewards (RLVR) has emerged as a primary learning paradigm for enhancing the reasoning capabilities of multi-modal large language models (MLLMs). However, during RL training, the enormous state space of MLLM and sparse rewards often leads to entropy collapse, policy degradation, or over-exploitation of suboptimal behaviors. This necessitates an exploration strategy that maintains productive stochasticity while avoiding the drawbacks of uncontrolled random sampling, yielding inefficient exploration. In this paper, we propose CalibRL, a hybrid-policy RLVR framework that supports controllable exploration with expert guidance, enabled by two key mechanisms. First, a distribution-aware advantage weighting scales updates by group rareness to calibrate the distribution, therefore preserving exploration. Meanwhile, the asymmetric activation function (LeakyReLU) leverages the expert knowledge as a calibration baseline to moderate overconfident updates while preserving their corrective direction. CalibRL increases policy entropy in a guided manner and clarifies the target distribution by estimating the on-policy distribution through online sampling. Updates are driven by these informative behaviors, avoiding convergence to erroneous patterns. Importantly, these designs help alleviate the distributional mismatch between the model's policy and expert trajectories, thereby achieving a more stable balance between exploration and exploitation. Extensive experiments across eight benchmarks, including both in-domain and out-of-domain settings, demonstrate consistent improvements, validating the effectiveness of our controllable hybrid-policy RLVR training. Code is available at https://github.com/zhh6425/CalibRL.
【7】Amortized Bayesian inference for actigraph time sheet data from mobile devices
标题:来自移动设备的活动记录表数据的摊销Bayesian推断
链接:https://arxiv.org/abs/2602.20611
作者:Daniel Zhou,Sudipto Banerjee
备注:40 pages, 7 figures
摘要:移动数据技术使用"活动记录仪“提供关于健康变量的信息,这些变量是受试者移动的函数。可穿戴设备和相关技术的出现推动了由人体运动数据组成的健康数据库的创建,以进行对移动模式和健康结果的研究。分析高分辨率活动记录仪数据的统计方法取决于特定的推理背景,但人工智能(AI)框架的出现要求这些方法与转移学习和摊销一致。本文设计了活动记录仪时间表的摊销贝叶斯推断。我们追求贝叶斯方法,以确保充分传播的不确定性和量化使用分层动态线性模型。我们的分析围绕活动记录仪数据,这些数据来自洛杉矶可持续交通方法的身体活动(PASTA-LA)研究,该研究由加州大学洛杉矶分校菲尔丁公共卫生学院进行。除了实现活动记录仪时间表的概率插补外,我们还能够统计学地了解解释变量对受试者队列的加速度幅度(MAG)的时变影响。
摘要:Mobile data technologies use "actigraphs" to furnish information on health variables as a function of a subject's movement. The advent of wearable devices and related technologies has propelled the creation of health databases consisting of human movement data to conduct research on mobility patterns and health outcomes. Statistical methods for analyzing high-resolution actigraph data depend on the specific inferential context, but the advent of Artificial Intelligence (AI) frameworks require that the methods be congruent to transfer learning and amortization. This article devises amortized Bayesian inference for actigraph time sheets. We pursue a Bayesian approach to ensure full propagation of uncertainty and its quantification using a hierarchical dynamic linear model. We build our analysis around actigraph data from the Physical Activity through Sustainable Transport Approaches in Los Angeles (PASTA-LA) study conducted by the Fielding School of Public Health in the University of California, Los Angeles. Apart from achieving probabilistic imputation of actigraph time sheets, we are also able to statistically learn about the time-varying impact of explanatory variables on the magnitude of acceleration (MAG) for a cohort of subjects.
【8】Data-Driven Deep MIMO Detection:Network Architectures and Generalization Analysis
标题:数据驱动的深度MIMO检测:网络架构与泛化分析
链接:https://arxiv.org/abs/2602.20178
作者:Yongwei Yi,Xinping Yi,Wenjin Wang,Xiao Li,Shi Jin
备注:17 pages, 7 figures. Full version of a work prepared for submission to IEEE
摘要:在实际的多用户多输入多输出(MU-MIMO)系统中,由于严重的用户间干扰以及对信道状态信息(CSI)不确定性的敏感性,符号检测仍然具有挑战性。与研究最多但计算复杂度较高的置信传播类模型驱动方法相比,软干扰消除(SIC)在性能和复杂度之间取得了良好的平衡。为了进一步解决CSI失配和非线性效应,最近提出的数据驱动深度神经接收机(如DeepSIC)利用深度神经网络的优势进行干扰消除和符号检测,表现出强大的经验性能。然而,对于DeepSIC为什么以及在多大程度上能够随训练样本数量泛化,仍然缺乏理论基础。本文提出在MLP网络(Network-of-MLPs)架构中考察完全数据驱动的DeepSIC检测,该架构由多个通过外部和内部有向无环图(DAG)互连的MLP组成。在这样的架构中,DeepSIC可以升级为使用图神经网络(GNN)的基于图的消息传递过程,称为GNNSIC,并在用户和迭代之间共享模型参数。值得注意的是,GNNSIC以少得多的可训练参数实现了与DeepSIC相当的出色表达能力,从而提高了样本效率并增强了对用户的泛化能力。通过使用Rademacher复杂度进行基于范数的泛化分析,我们揭示了由于参数共享,DeepSIC对迭代次数的指数依赖在GNNSIC中可以被消除。仿真结果表明,GNNSIC在显著减少参数和训练样本的情况下,获得了与DeepSIC相当或更优的误符号率(SER)性能。
摘要:In practical Multiuser Multiple-Input Multiple-Output (MU-MIMO) systems, symbol detection remains challenging due to severe inter-user interference and sensitivity to Channel State Information (CSI) uncertainty. In contrast to the mostly studied belief propagation-type model-driven methods, which incur high computational complexity, Soft Interference Cancellation (SIC) strikes a good balance between performance and complexity. To further address CSI mismatch and nonlinear effects, the recently proposed data-driven deep neural receivers, such as DeepSIC, leverage the advantages of deep neural networks for interference cancellation and symbol detection, demonstrating strong empirical performance. However, there is still a lack of theoretical underpinning for why and to what extent DeepSIC could generalize with the number of training samples. This paper proposes inspecting the fully data-driven DeepSIC detection within a Network-of-MLPs architecture, which is composed of multiple interconnected MLPs via outer and inner Directed Acyclic Graphs (DAGs). Within such an architecture, DeepSIC can be upgraded as a graph-based message-passing process using Graph Neural Networks (GNNs), termed GNNSIC, with shared model parameters across users and iterations. Notably, GNNSIC achieves excellent expressivity comparable to DeepSIC with substantially fewer trainable parameters, resulting in improved sample efficiency and enhanced user generalization. By conducting a norm-based generalization analysis using Rademacher complexity, we reveal that an exponential dependence on the number of iterations for DeepSIC can be eliminated in GNNSIC due to parameter sharing. Simulation results demonstrate that GNNSIC attains comparable or improved Symbol Error Rate (SER) performance to DeepSIC with significantly fewer parameters and training samples.
检测相关(4篇)
【1】Assessing the Impact of Speaker Identity in Speech Spoofing Detection
标题:评估说话者身份在语音欺骗检测中的影响
链接:https://arxiv.org/abs/2602.20805
作者:Anh-Tuan Dao,Driss Matrouf,Nicholas Evans
摘要:欺骗检测系统通常使用来自多个说话人的多样化录音进行训练,并常常假设所得嵌入与说话人身份无关。然而,这一假设从未得到验证。在本文中,我们研究了说话人信息对欺骗检测系统的影响。我们在说话人不变多任务(SInMT)框架中提出了两种方法,一种在嵌入中对说话人身份建模,另一种将其去除。SInMT集成了用于联合说话人识别和欺骗检测的多任务学习,并引入了梯度反转层。在四个数据集上的评估表明,与基线相比,我们的说话人不变模型将平均等错误率降低了17%,对于最具挑战性的攻击(例如A11),降幅最高可达48%。
摘要:Spoofing detection systems are typically trained using diverse recordings from multiple speakers, often assuming that the resulting embeddings are independent of speaker identity. However, this assumption remains unverified. In this paper, we investigate the impact of speaker information on spoofing detection systems. We propose two approaches within our Speaker-Invariant Multi-Task framework, one that models speaker identity within the embeddings and another that removes it. SInMT integrates multi-task learning for joint speaker recognition and spoofing detection, incorporating a gradient reversal layer. Evaluated using four datasets, our speaker-invariant model reduces the average equal error rate by 17% compared to the baseline, with up to 48% reduction for the most challenging attacks (e.g., A11).
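下面用一个极简的 Python 草图说明 SInMT 所用的梯度反转层(GRL)机制:前向传播为恒等映射,反向传播把梯度乘以 -λ,使上游特征学习"对抗"说话人分类器,从而抑制嵌入中的说话人信息。这只是示意性的假设实现(类名与参数均为本文虚构),并非论文代码。

```python
import numpy as np

class GradientReversal:
    """梯度反转层(GRL)的极简示意(假设性草图,非论文实现)。"""

    def __init__(self, lam=1.0):
        self.lam = lam  # 反转强度 λ

    def forward(self, x):
        return x  # 前向传播:恒等映射

    def backward(self, grad_output):
        return -self.lam * grad_output  # 反向传播:梯度取反并按 λ 缩放

grl = GradientReversal(lam=0.5)
x = np.array([1.0, -2.0, 3.0])
y = grl.forward(x)                  # 与 x 相同
g = grl.backward(np.ones_like(x))   # 每个分量为 -0.5
```

在真实系统中,GRL 通常插在共享编码器与说话人分类头之间,由深度学习框架的自动微分机制在反向传播时完成取反。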
【2】Knowing the Unknown: Interpretable Open-World Object Detection via Concept Decomposition Model
标题:了解未知:通过概念分解模型的可解释开放世界对象检测
链接:https://arxiv.org/abs/2602.20616
作者:Xueqiang Lv,Shizhou Zhang,Yinghui Xing,Di Xu,Peng Wang,Yanning Zhang
摘要:开放世界目标检测(OWOD)需要在增量检测已知类别的同时可靠地识别未知对象。现有方法主要专注于提高未知召回率,却忽视了可解释性,往往导致已知-未知混淆和预测可靠性下降。本文旨在使整个OWOD框架具有可解释性,使检测器能够真正“知道未知”。为此,我们提出了一个概念驱动的可解释OWOD框架(IPOW),通过为OWOD引入概念分解模型(CDM),将Faster R-CNN中耦合的RoI特征显式分解为判别概念、共享概念和背景概念。判别概念识别最具判别性的特征以扩大已知类别之间的距离,而共享概念和背景概念由于较强的泛化能力,可以方便地迁移用于检测未知类别。借助这一可解释框架,我们发现当未知对象落入已知类的判别空间时,就会产生已知-未知混淆。为此,我们提出概念引导校正(CGR)来进一步消解这种混淆。大量实验表明,IPOW在缓解混淆的同时显著提高了未知召回率,并为已知和未知预测提供了概念级的可解释性。
摘要:Open-world object detection (OWOD) requires incrementally detecting known categories while reliably identifying unknown objects. Existing methods primarily focus on improving unknown recall, yet overlook interpretability, often leading to known-unknown confusion and reduced prediction reliability. This paper aims to make the entire OWOD framework interpretable, enabling the detector to truly "knowing the unknown". To this end, we propose a concept-driven InterPretable OWOD framework(IPOW) by introducing a Concept Decomposition Model (CDM) for OWOD, which explicitly decomposes the coupled RoI features in Faster R-CNN into discriminative, shared, and background concepts. Discriminative concepts identify the most discriminative features to enlarge the distances between known categories, while shared and background concepts, due to their strong generalization ability, can be readily transferred to detect unknown categories. Leveraging the interpretable framework, we identify that known-unknown confusion arises when unknown objects fall into the discriminative space of known classes. To address this, we propose Concept-Guided Rectification (CGR) to further resolve such confusion. Extensive experiments show that IPOW significantly improves unknown recall while mitigating confusion, and provides concept-level interpretability for both known and unknown predictions.
【3】Learning During Detection: Continual Learning for Neural OFDM Receivers via DMRS
标题:检测期间学习:通过DMRS实现神经OFDM接收机的持续学习
链接:https://arxiv.org/abs/2602.20361
作者:Mohanad Obeed,Ming Jian
摘要:深度神经网络(DNN)越来越多地被用于接收机设计,因为它们可以在不依赖显式信道模型的情况下处理复杂环境。然而,由于通信信道变化迅速,其分布可能随时间漂移,因此常常需要定期重新训练。本文为直接检测接收信号软比特的正交频分复用(OFDM)神经接收机提出了一种零开销的在线持续学习框架。与依赖专用训练间隔或完整资源网格的传统微调方法不同,我们的方法利用现有的解调参考信号(DMRS)同时实现信号解调和模型自适应。我们介绍了三种导频设计:完全随机导频、混合导频和额外导频,可灵活支持联合解调与学习。为适配这些导频设计,我们开发了两种接收机架构:(i)将推理与微调分离以保证不间断运行的并行设计,以及(ii)降低计算复杂度的前向传播复用设计。仿真结果表明,所提方法能够有效跟踪慢速和快速的信道分布变化,且在分布漂移下没有额外开销、业务中断或灾难性的性能下降。
摘要:Deep neural networks (DNNs) have been increasingly explored for receiver design because they can handle complex environments without relying on explicit channel models. Nevertheless, because communication channels change rapidly, their distributions can shift over time, often making periodic retraining necessary. This paper proposes a zero-overhead online and continual learning framework for orthogonal frequency-division multiplexing (OFDM) neural receivers that directly detect the soft bits of received signals. Unlike conventional fine-tuning methods that rely on dedicated training intervals or full resource grids, our approach leverages existing demodulation reference signals (DMRS) to simultaneously enable signal demodulation and model adaptation. We introduce three pilot designs: fully randomized, hybrid, and additional pilots that flexibly support joint demodulation and learning. To accommodate these pilot designs, we develop two receiver architectures: (i) a parallel design that separates inference and fine-tuning for uninterrupted operation, and (ii) a forward-pass reusing design that reduces computational complexity. Simulation results show that the proposed method effectively tracks both slow and fast channel distribution variations without additional overhead, service interruption, or catastrophic performance degradation under distribution shift.
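作为"只用已知导频做在线自适应"这一思路的玩具级示意(假设性草图:用单抽头 LMS 均衡器代替论文的神经接收机,信道模型与所有参数均为本文虚构),每个时隙仅用接收端已知的 DMRS 符号产生监督信号,在解调的同时零额外开销地跟踪缓慢漂移的信道:

```python
import numpy as np

rng = np.random.default_rng(0)
h = 0.8 + 0.3j                      # 真实信道(随时间缓慢旋转)
w = 0.0 + 0.0j                      # 单抽头均衡器系数,理想值约为 1/h
mu = 0.05                           # LMS 步长
errs = []
for slot in range(400):
    h *= np.exp(1j * 0.01)          # 信道缓慢漂移
    qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
    pilots = rng.choice(qpsk, size=8)   # 接收端已知的 DMRS 符号
    r = h * pilots + 0.05 * (rng.normal(size=8) + 1j * rng.normal(size=8))
    for x_p, y_r in zip(pilots, r):
        e = x_p - w * y_r           # 导频误差:无需额外开销的监督信号
        w += mu * e * np.conj(y_r)  # LMS 更新(在解调的同时完成自适应)
    errs.append(float(np.mean(np.abs(pilots - w * r) ** 2)))  # 本时隙均衡误差
```

运行后误差迅速下降并在漂移信道下保持低位,直观体现了"边检测边学习"的零开销自适应;论文中的神经接收机把同样的导频监督用于深度模型的在线微调。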
【4】Detecting and Mitigating Group Bias in Heterogeneous Treatment Effects
标题:检测和减轻异质性治疗效果中的群体偏差
链接:https://arxiv.org/abs/2602.20383
作者:Joel Persson,Jurriën Bakker,Dennis Bohle,Stefan Feuerriegel,Florian von Wangenheim
摘要:异质性治疗效果(HTE)越来越多地使用机器学习模型进行估计,这些模型可以产生高度个性化的治疗效果预测。然而在实践中,预测的治疗效果很少在个体水平上被解释、报告或审计,而是经常被汇总到更广泛的亚组,如人口统计学细分、风险分层或市场。我们证明,这种汇总可能在组水平的因果效应中引入系统性偏差:即使预测个体水平条件平均治疗效果(CATE)的模型设定正确,并使用随机实验数据训练,将预测的CATE汇总到组水平一般也无法恢复相应的组平均治疗效果(GATE)。我们开发了一个统一的统计框架,以检测和减轻随机实验中这种形式的群体偏差。我们首先将群体偏差定义为模型隐含的GATE与实验识别的GATE之间的差异,推导出一个渐近正态的估计量,并给出一个易于实施的统计检验。在偏差缓解方面,我们提出了一种基于收缩的偏差校正,并证明理论最优解和经验可行解都具有闭式表达。该框架完全通用,仅需最少假设,且只需计算样本矩。我们分析了缓解检测到的群体偏差对利润最大化个性化定向的经济影响,从而刻画偏差校正何时会改变定向决策和利润,以及其中的权衡。在主要数字平台大规模实验数据上的应用验证了我们的理论结果并展示了经验性能。
摘要:Heterogeneous treatment effects (HTEs) are increasingly estimated using machine learning models that produce highly personalized predictions of treatment effects. In practice, however, predicted treatment effects are rarely interpreted, reported, or audited at the individual level but, instead, are often aggregated to broader subgroups, such as demographic segments, risk strata, or markets. We show that such aggregation can induce systematic bias of the group-level causal effect: even when models for predicting the individual-level conditional average treatment effect (CATE) are correctly specified and trained on data from randomized experiments, aggregating the predicted CATEs up to the group level does not, in general, recover the corresponding group average treatment effect (GATE). We develop a unified statistical framework to detect and mitigate this form of group bias in randomized experiments. We first define group bias as the discrepancy between the model-implied and experimentally identified GATEs, derive an asymptotically normal estimator, and then provide a simple-to-implement statistical test. For mitigation, we propose a shrinkage-based bias-correction, and show that the theoretically optimal and empirically feasible solutions have closed-form expressions. The framework is fully general, imposes minimal assumptions, and only requires computing sample moments. We analyze the economic implications of mitigating detected group bias for profit-maximizing personalized targeting, thereby characterizing when bias correction alters targeting decisions and profits, and the trade-offs involved. Applications to large-scale experimental data at major digital platforms validate our theoretical results and demonstrate empirical performance.
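下面的假设性 NumPy 示例只演示"群体偏差的检测"这一步:在合成的随机实验数据上,当个体效应模型欠拟合(这里故意用常数效应模型)时,模型隐含的 GATE 与实验识别的 GATE 在两个组上出现符号相反的系统性偏差。数据生成过程与分组方式均为本文虚构,仅用于说明定义,并非论文的估计量、检验或收缩校正。

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
g = (x > 0).astype(int)                  # 事后汇总用的分组(示意)
t = rng.integers(0, 2, size=n)           # 随机化的处理分配
tau = 1.0 + 0.5 * x                      # 真实的个体水平 CATE
y = x + t * tau + rng.normal(scale=0.1, size=n)

# 欠拟合的"CATE 模型":用整体 ATE 当作每个人的效应预测
tau_hat = np.full(n, y[t == 1].mean() - y[t == 0].mean())

def group_bias(grp):
    m = g == grp
    gate_model = tau_hat[m].mean()                               # 模型隐含 GATE
    gate_exp = y[m & (t == 1)].mean() - y[m & (t == 0)].mean()   # 实验识别 GATE
    return gate_model - gate_exp                                 # 群体偏差

b0, b1 = group_bias(0), group_bias(1)    # 约 +0.4 与 -0.4:方向相反的系统偏差
```

由于随机化,组内"处理减对照"的均值差是该组 GATE 的无偏估计,因此两者之差即可作为群体偏差的检测量。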
分类|识别(4篇)
【1】Estimation of Confidence Bounds in Binary Classification using Wilson Score Kernel Density Estimation
标题:基于Wilson得分核密度估计的二元分类置信界估计
链接:https://arxiv.org/abs/2602.20947
作者:Thorbjørn Mosekjær Iversen,Zebin Duan,Frederik Hagelskjær
摘要:近年来,基于深度学习的二元分类器在性能和易用性上都有显著提升。这为关键检验任务的自动化提供了可能,而传统上这些任务只能由人工完成。然而,二元分类器在关键业务中的应用取决于可靠置信界的估计,使系统性能能够在给定统计显著性水平下得到保证。我们提出了威尔逊得分核密度分类,这是一种估计二元分类置信界的新型核方法。我们方法的核心是威尔逊得分核密度估计器,它是一个在成功概率随条件变化的二项式实验中估计置信界的函数估计器。我们在四个不同数据集上的选择性分类场景中评估了该方法,说明其可作为任何特征提取器(包括视觉基础模型)的分类头。所提出的方法表现出与高斯过程分类相近的性能,但计算复杂度更低。
摘要:The performance and ease of use of deep learning-based binary classifiers have improved significantly in recent years. This has opened up the potential for automating critical inspection tasks, which have traditionally only been trusted to be done manually. However, the application of binary classifiers in critical operations depends on the estimation of reliable confidence bounds such that system performance can be ensured up to a given statistical significance. We present Wilson Score Kernel Density Classification, which is a novel kernel-based method for estimating confidence bounds in binary classification. The core of our method is the Wilson Score Kernel Density Estimator, which is a function estimator for estimating confidence bounds in Binomial experiments with conditionally varying success probabilities. Our method is evaluated in the context of selective classification on four different datasets, illustrating its use as a classification head of any feature extractor, including vision foundation models. Our proposed method shows similar performance to Gaussian Process Classification, but at a lower computational complexity.
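Wilson 得分区间本身有标准闭式公式;下面的 Python 草图先实现它,再按"核权重给出有效成功数与有效试验数"的思路拼出一个核加权变体,作为对论文 Wilson Score Kernel Density Estimator 的假设性近似(带宽、高斯核形式与函数名均为本文假设,并非官方实现):

```python
import math

def wilson_interval(k, n, z=1.96):
    """二项实验中成功概率的标准 Wilson 得分置信区间(k 次成功 / n 次试验)。"""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return max(0.0, center - half), min(1.0, center + half)

def wsk_bound(scores, labels, s0, bandwidth=0.1, z=1.96):
    """假设性草图:用高斯核权重得到"有效"成功数与试验数,再套用 Wilson 区间,
    近似在分数 s0 附近条件化的成功概率置信界(非论文官方实现)。"""
    w = [math.exp(-0.5 * ((s - s0) / bandwidth) ** 2) for s in scores]
    n_eff = sum(w)                                     # 有效试验数
    k_eff = sum(wi * yi for wi, yi in zip(w, labels))  # 有效成功数
    return wilson_interval(k_eff, n_eff, z)
```

例如 `wilson_interval(8, 10)` 给出约 (0.49, 0.94) 的 95% 区间;核加权版本则把同样的区间构造局部化到分类器输出分数 s0 的邻域。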
【2】Memory-guided Prototypical Co-occurrence Learning for Mixed Emotion Recognition
标题:用于混合情绪识别的记忆引导原型共现学习
链接:https://arxiv.org/abs/2602.20530
作者:Ming Li,Yong-Jin Liu,Fang Liu,Huankun Sheng,Yeying Fan,Yixiang Wei,Minnan Luo,Weizhan Zhang,Wenping Wang
摘要:从多模态生理和行为信号中识别情绪在情感计算中起着关键作用,然而大多数现有模型仍局限于在受控实验室环境中预测单一情绪。相比之下,现实世界中的人类情绪体验往往以多种情感状态同时存在为特征,这激发了近来将混合情绪识别视为情绪分布学习问题的研究兴趣。然而,目前的方法往往忽视了共存情绪之间内在的效价一致性和结构化相关性。为解决这一局限,我们提出了一个显式建模情绪共现模式的记忆引导原型共现学习(MPCL)框架。具体来说,我们首先通过多尺度联想记忆机制融合多模态信号。为了捕捉跨模态语义关系,我们构建情绪特定的原型记忆库,产生丰富的生理和行为表示,并采用原型关系蒸馏以确保潜在原型空间中的跨模态对齐。此外,受人类认知记忆系统的启发,我们引入了一种记忆检索策略,以提取跨情绪类别的语义级共现关联。通过这种自下而上的分层抽象过程,我们的模型学习到富含情感信息的表示,从而实现准确的情绪分布预测。在两个公开数据集上的综合实验表明,MPCL在混合情绪识别上无论定量还是定性都始终优于最先进的方法。
摘要:Emotion recognition from multi-modal physiological and behavioral signals plays a pivotal role in affective computing, yet most existing models remain constrained to the prediction of singular emotions in controlled laboratory settings. Real-world human emotional experiences, by contrast, are often characterized by the simultaneous presence of multiple affective states, spurring recent interest in mixed emotion recognition as an emotion distribution learning problem. Current approaches, however, often neglect the valence consistency and structured correlations inherent among coexisting emotions. To address this limitation, we propose a Memory-guided Prototypical Co-occurrence Learning (MPCL) framework that explicitly models emotion co-occurrence patterns. Specifically, we first fuse multi-modal signals via a multi-scale associative memory mechanism. To capture cross-modal semantic relationships, we construct emotion-specific prototype memory banks, yielding rich physiological and behavioral representations, and employ prototype relation distillation to ensure cross-modal alignment in the latent prototype space. Furthermore, inspired by human cognitive memory systems, we introduce a memory retrieval strategy to extract semantic-level co-occurrence associations across emotion categories. Through this bottom-up hierarchical abstraction process, our model learns affectively informative representations for accurate emotion distribution prediction. Comprehensive experiments on two public datasets demonstrate that MPCL consistently outperforms state-of-the-art methods in mixed emotion recognition, both quantitatively and qualitatively.
【3】VISION-ICE: Video-based Interpretation and Spatial Identification of Arrhythmia Origins via Neural Networks in Intracardiac Echocardiography
标题:VISION-ICE:通过心内超声心动图中的神经网络对心律失常起源进行基于视频的解读与空间识别
链接:https://arxiv.org/abs/2602.20165
作者:Dorsa EPMoghaddam,Feng Gao,Drew Bernard,Kavya Sinha,Mehdi Razavi,Behnaam Aazhang
备注:8 pages, 3 figures, 3 tables
摘要:现代高密度标测技术和术前CT/MRI在定位心律失常方面仍然耗费大量时间和资源。AI已被验证为临床决策辅助工具,可对超声心动图图像进行准确、快速的实时分析。在此基础上,我们提出了一个AI支持的框架,该框架利用心内超声心动图(ICE,电生理手术的常规组成部分)引导临床医生定位心律失常的起源区域,并有望缩短手术时间。心律失常源定位被表述为三分类任务,基于ICE视频数据区分正常窦性心律、左侧心律失常和右侧心律失常。我们开发了一个3D卷积神经网络用于区分上述三类。在十折交叉验证中,该模型在4名先前未见过的患者上取得了66.2%的平均准确率(显著优于33.3%的随机基线)。这些结果证明了将ICE视频与深度学习相结合进行自动心律失常定位的可行性和临床前景。利用ICE成像可以实现更快、更有针对性的电生理干预,并减轻心脏消融手术的负担。未来的工作将集中在扩展数据集,以提高模型在不同患者人群中的鲁棒性和泛化能力。
摘要:Contemporary high-density mapping techniques and preoperative CT/MRI remain time and resource intensive in localizing arrhythmias. AI has been validated as a clinical decision aid in providing accurate, rapid real-time analysis of echocardiographic images. Building on this, we propose an AI-enabled framework that leverages intracardiac echocardiography (ICE), a routine part of electrophysiology procedures, to guide clinicians toward areas of arrhythmogenesis and potentially reduce procedural time. Arrhythmia source localization is formulated as a three-class classification task, distinguishing normal sinus rhythm, left-sided, and right-sided arrhythmias, based on ICE video data. We developed a 3D Convolutional Neural Network trained to discriminate among the three aforementioned classes. In ten-fold cross-validation, the model achieved a mean accuracy of 66.2% when evaluated on four previously unseen patients (substantially outperforming the 33.3% random baseline). These results demonstrate the feasibility and clinical promise of using ICE videos combined with deep learning for automated arrhythmia localization. Leveraging ICE imaging could enable faster, more targeted electrophysiological interventions and reduce the procedural burden of cardiac ablation. Future work will focus on expanding the dataset to improve model robustness and generalizability across diverse patient populations.
【4】An Enhanced Projection Pursuit Tree Classifier with Visual Methods for Assessing Algorithmic Improvements
标题:一种增强的投影寻踪树分类器及评估算法改进的可视化方法
链接:https://arxiv.org/abs/2602.21130
作者:Natalia da Silva,Dianne Cook,Eun-Kyung Lee
摘要:本文提出了对投影寻踪树分类器的增强,以及用于评估其在高维情形下效果的可视化诊断方法。原始算法在树结构中使用变量的线性组合,其中树深被限制为小于类别数,而事实证明这一限制对于复杂的分类问题过于严格。我们的扩展允许在投影寻踪计算中进行更多分裂和更灵活的类别分组,从而改善了具有不相等方差-协方差结构和非线性类别分离的多类场景下的性能。提出算法改进并不难;难的是证明其实际效用。因此,我们开发了两种可视化诊断方法来验证这些增强是否按预期工作。我们利用高维可视化技术检查模型在基准数据集上的拟合情况,以评估算法行为是否符合理论预期。一个交互式Web应用程序使用户能够在受控场景下探索原始分类器和增强分类器的行为。这些增强功能已在R包PPtreeExt中实现。
摘要:This paper presents enhancements to the projection pursuit tree classifier and visual diagnostic methods for assessing their impact in high dimensions. The original algorithm uses linear combinations of variables in a tree structure where depth is constrained to be less than the number of classes -- a limitation that proves too rigid for complex classification problems. Our extensions improve performance in multi-class settings with unequal variance-covariance structures and nonlinear class separations by allowing more splits and more flexible class groupings in the projection pursuit computation. Proposing algorithmic improvements is straightforward; demonstrating their actual utility is not. We therefore develop two visual diagnostic approaches to verify that the enhancements perform as intended. Using high-dimensional visualization techniques, we examine model fits on benchmark datasets to assess whether the algorithm behaves as theorized. An interactive web application enables users to explore the behavior of both the original and enhanced classifiers under controlled scenarios. The enhancements are implemented in the R package PPtreeExt.
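投影寻踪树的核心是在每个节点用变量的线性组合(一维投影)做分裂;下面是一个假设性的极简示意,用两类均值之差的方向代替真正优化的投影寻踪指数(真实算法,包括 PPtreeExt 中的增强,都比这复杂得多):

```python
import numpy as np

def pp_split(X, y):
    """投影寻踪式分裂的极简示意(假设性草图):
    沿两类均值之差的单位方向做一维投影,在两类投影均值的中点处分裂。"""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    a = (mu1 - mu0) / np.linalg.norm(mu1 - mu0)   # 变量的线性组合方向
    z = X @ a                                     # 一维投影
    cut = (z[y == 0].mean() + z[y == 1].mean()) / 2
    return a, cut

rng = np.random.default_rng(0)
X0 = rng.normal(0, 1, size=(200, 5))
X1 = rng.normal(0, 1, size=(200, 5)) + np.array([2, 2, 0, 0, 0])
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

a, cut = pp_split(X, y)
pred = (X @ a > cut).astype(int)
acc = (pred == y).mean()   # 单次一维投影分裂已能较好分开两类
```

真正的投影寻踪树会在每个节点优化一个投影寻踪指数(如 LDA 或 PDA 指数)来选方向,并递归地对子节点重复这一过程。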
表征(1篇)
【1】Communication-Inspired Tokenization for Structured Image Representations
标题:面向结构化图像表示的通信启发式令牌化
链接:https://arxiv.org/abs/2602.20731
作者:Aram Davtyan,Yusuf Sahin,Yasaman Haghighi,Sebastian Stapf,Pablo Acuaviva,Alexandre Alahi,Paolo Favaro
备注:Project website: https://araachie.github.io/comit/
摘要:离散图像令牌化器已成为现代视觉和多模态系统的关键组件,为基于Transformer的架构提供顺序接口。然而,大多数现有方法仍主要针对重建和压缩进行优化,产生的令牌往往捕捉局部纹理而非对象级语义结构。受人类交流的增量性和组合性启发,我们引入了通信启发式令牌化(COMiT),一个用于学习结构化离散视觉令牌序列的框架。COMiT通过迭代观察局部图像裁剪并循环更新其离散表示,在固定的令牌预算内构建潜在消息。在每一步中,模型在整合新视觉信息的同时细化并重组现有的令牌序列。经过若干次编码迭代后,最终的消息作为条件输入流匹配解码器,重建完整图像。编码和解码均在单个Transformer模型中实现,并结合流匹配重建损失与语义表示对齐损失进行端到端训练。我们的实验表明,虽然语义对齐提供了语义基础,但注意力驱动的顺序令牌化对于诱导可解释、以对象为中心的令牌结构至关重要,并且相比以往方法大幅提升了组合泛化和关系推理能力。
摘要:Discrete image tokenizers have emerged as a key component of modern vision and multimodal systems, providing a sequential interface for transformer-based architectures. However, most existing approaches remain primarily optimized for reconstruction and compression, often yielding tokens that capture local texture rather than object-level semantic structure. Inspired by the incremental and compositional nature of human communication, we introduce COMmunication inspired Tokenization (COMiT), a framework for learning structured discrete visual token sequences. COMiT constructs a latent message within a fixed token budget by iteratively observing localized image crops and recurrently updating its discrete representation. At each step, the model integrates new visual information while refining and reorganizing the existing token sequence. After several encoding iterations, the final message conditions a flow-matching decoder that reconstructs the full image. Both encoding and decoding are implemented within a single transformer model and trained end-to-end using a combination of flow-matching reconstruction and semantic representation alignment losses. Our experiments demonstrate that while semantic alignment provides grounding, attentive sequential tokenization is critical for inducing interpretable, object-centric token structure and substantially improving compositional generalization and relational reasoning over prior methods.
3D|3D重建等相关(1篇)
【1】WeirNet: A Large-Scale 3D CFD Benchmark for Geometric Surrogate Modeling of Piano Key Weirs
标题:WeirNet:面向钢琴键堰几何代理建模的大规模三维CFD基准
链接:https://arxiv.org/abs/2602.20714
作者:Lisa Lüddecke,Michael Hohmann,Sebastian Eilermann,Jan Tillmann-Mumm,Pezhman Pourabdollah,Mario Oertel,Oliver Niggemann
摘要:由于泄流能力取决于三维几何形状和运行条件,对钢琴键堰(PKW)设计的水力性能进行可靠预测颇具挑战。代理模型可以加速水工结构设计,但相关进展受限于缺少大型、文档完善、同时涵盖几何变化、运行条件和功能性能的数据集。本研究介绍了WeirNet,一个面向PKW几何代理建模的大型三维CFD基准数据集。WeirNet包含3,794个参数化、满足可行性约束的矩形和梯形PKW几何体,每个几何体都使用一致的自由表面OpenFOAM工作流在19种泄流工况下进行模拟,共完成71,387次模拟,构成该基准并带有完整的泄流系数标签。数据集以多种模态发布:紧凑的参数描述符、水密表面网格和高分辨率点云,并附带标准化任务以及分布内和分布外划分。文中对代表性的代理模型族进行了泄流系数预测基准测试。基于参数描述符的树模型回归器取得最佳整体精度,而基于点云和网格的模型仍具竞争力,并提供与参数化无关的推理。所有代理模型的单样本评估耗时仅为毫秒级,相比CFD运行时间提供了数量级的加速。分布外结果表明,与未见过的泄流值相比,几何形状偏移是主要失效模式;数据效率实验显示,训练数据超过大约60%后收益递减。通过公开发布数据集以及模拟设置和评估流程,WeirNet为数据驱动的水力建模建立了可复现的框架,并支持在水力规划早期更快地探索PKW设计。
摘要:Reliable prediction of hydraulic performance is challenging for Piano Key Weir (PKW) design because discharge capacity depends on three-dimensional geometry and operating conditions. Surrogate models can accelerate hydraulic-structure design, but progress is limited by scarce large, well-documented datasets that jointly capture geometric variation, operating conditions, and functional performance. This study presents WeirNet, a large 3D CFD benchmark dataset for geometric surrogate modeling of PKWs. WeirNet contains 3,794 parametric, feasibility-constrained rectangular and trapezoidal PKW geometries, each simulated at 19 discharge conditions using a consistent free-surface OpenFOAM workflow, resulting in 71,387 completed simulations that form the benchmark, with complete discharge coefficient labels. The dataset is released in multiple modalities: compact parametric descriptors, watertight surface meshes, and high-resolution point clouds, together with standardized tasks and in-distribution and out-of-distribution splits. Representative surrogate families are benchmarked for discharge coefficient prediction. Tree-based regressors on parametric descriptors achieve the best overall accuracy, while point- and mesh-based models remain competitive and offer parameterization-agnostic inference. All surrogates evaluate in milliseconds per sample, providing orders-of-magnitude speedups over CFD runtimes. Out-of-distribution results identify geometry shift as the dominant failure mode compared to unseen discharge values, and data-efficiency experiments show diminishing returns beyond roughly 60% of the training data. By publicly releasing the dataset together with simulation setups and evaluation pipelines, WeirNet establishes a reproducible framework for data-driven hydraulic modeling and enables faster exploration of PKW designs during the early stages of hydraulic planning.
编码器(1篇)
【1】Shape-informed cardiac mechanics surrogates in data-scarce regimes via geometric encoding and generative augmentation
标题:通过几何编码与生成式增强在数据稀缺场景下实现形状感知的心脏力学代理模型
链接:https://arxiv.org/abs/2602.20306
作者:Davide Carrara,Marc Hirschvogel,Francesca Bonizzoni,Stefano Pagani,Simone Pezzuto,Francesco Regazzoni
备注:39 pages, 19 figures
摘要:心脏力学的高保真计算模型可提供对心脏功能的机制性洞察,但其计算成本过高,难以用于常规临床。代理模型可以加速模拟,但在不同解剖结构之间的泛化颇具挑战,尤其是在数据稀缺的情况下。我们提出了一个两步框架,将几何表示与物理响应的学习解耦,从而在数据稀缺条件下实现形状感知的代理建模。首先,形状模型学习左心室几何形状的紧凑潜在表示。学习到的潜在空间有效地编码解剖结构,并能生成合成几何体用于数据增强。其次,以该几何编码为条件、基于神经场的代理模型被训练来预测外部载荷下的心室位移。所提出的架构通过使用通用心室坐标进行位置编码,提高了跨不同解剖结构的泛化能力。几何可变性使用两种替代策略进行编码,并对其进行了系统比较:一种适用于几何点云表示的基于PCA的方法,以及一种直接从点云学习的基于DeepSDF的隐式神经表示。总体而言,我们在理想化和患者特异性数据集上获得的结果表明,所提出的方法能够准确预测并泛化到未见过的几何形状,并对含噪或稀疏采样的输入具有鲁棒性。
摘要:High-fidelity computational models of cardiac mechanics provide mechanistic insight into the heart function but are computationally prohibitive for routine clinical use. Surrogate models can accelerate simulations, but generalization across diverse anatomies is challenging, particularly in data-scarce settings. We propose a two-step framework that decouples geometric representation from learning the physics response, to enable shape-informed surrogate modeling under data-scarce conditions. First, a shape model learns a compact latent representation of left ventricular geometries. The learned latent space effectively encodes anatomies and enables synthetic geometries generation for data augmentation. Second, a neural field-based surrogate model, conditioned on this geometric encoding, is trained to predict ventricular displacement under external loading. The proposed architecture performs positional encoding by using universal ventricular coordinates, which improves generalization across diverse anatomies. Geometric variability is encoded using two alternative strategies, which are systematically compared: a PCA-based approach suitable for working with point cloud representations of geometries, and a DeepSDF-based implicit neural representation learned directly from point clouds. Overall, our results, obtained on idealized and patient-specific datasets, show that the proposed approaches allow for accurate predictions and generalization to unseen geometries, and robustness to noisy or sparsely sampled inputs.
优化|敛散性(5篇)
【1】GauS: Differentiable Scheduling Optimization via Gaussian Reparameterization
标题:GauS:基于高斯重参数化的可微调度优化
链接:https://arxiv.org/abs/2602.20427
作者:Yaohui Cai,Vesal Bakhtazad,Cunxi Yu,Zhiru Zhang
摘要:高效的算子调度是软件编译和硬件综合中的一个基本挑战。虽然最近的可微方法试图用基于梯度的搜索取代精确求解器或启发式方法等传统手段,但它们通常依赖于类别分布,无法捕捉时间的有序性质,且参数空间的扩展性很差。在本文中,我们提出了一种新的可微框架GauS,它使用高斯分布将算子调度建模为随机松弛,从而充分利用GPU等现代并行计算设备。通过将调度表示为连续的高斯变量,我们成功地捕捉了时间的有序性,并将优化空间缩小了几个数量级。我们的方法在表示各种目标和约束方面高度灵活,并为复杂的流水线调度问题提供了首个可微形式化。我们在一系列基准上评估了该方法,证明GauS能够达到帕累托最优的结果。
摘要:Efficient operator scheduling is a fundamental challenge in software compilation and hardware synthesis. While recent differentiable approaches have sought to replace traditional ones like exact solvers or heuristics with gradient-based search, they typically rely on categorical distributions that fail to capture the ordinal nature of time and suffer from a parameter space that scales poorly. In this paper, we propose a novel differentiable framework, GauS, that models operator scheduling as a stochastic relaxation using Gaussian distributions, which fully utilize modern parallel computing devices like GPUs. By representing schedules as continuous Gaussian variables, we successfully capture the ordinal nature of time and reduce the optimization space by orders of magnitude. Our method is highly flexible to represent various objectives and constraints, which provides the first differentiable formulation for the complex pipelined scheduling problem. We evaluate our method on a range of benchmarks, demonstrating that GauS achieves Pareto-optimal results.
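把开始时间建模为高斯变量后,优先级约束的期望违反量有闭式且可微的表达:若违反量 D = t1 + d1 - t2 服从 N(m, s²),则 E[max(0, D)] = m·Φ(m/s) + s·φ(m/s),对 m 的导数恰为 Φ(m/s)。下面用纯 Python 在一个两算子玩具问题上演示这种高斯松弛的梯度下降;所有数值、正则项与收敛目标均为本文假设,仅示意"高斯重参数化使调度可微"的思想,并非论文实现。

```python
import math

def Phi(x):  # 标准正态 CDF
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# 玩具问题:算子1 在 mu1 开始、时长 d1;算子2 的开始时间建模为高斯均值 mu2。
mu1, d1, s = 0.0, 3.0, 0.5     # s:两开始时间之差的联合标准差(固定)
mu2, lr = 0.0, 0.5
for _ in range(2000):
    m = mu1 + d1 - mu2          # 违反量 D 的均值,D ~ N(m, s^2)
    # E[max(0, D)] 对 mu2 的梯度为 -Phi(m/s);再加一个"越早越好"的轻微正则
    grad = -Phi(m / s) + 0.01
    mu2 -= lr * grad            # 对高斯均值做普通梯度下降
# 收敛到 Phi((mu1+d1-mu2)/s) = 0.01,即约束几乎必然满足且不过分保守
```

这正是高斯重参数化的好处:离散的"时间步选择"被连续均值取代,约束违反概率及其梯度都有解析形式,可直接交给 GPU 上的批量梯度优化。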
【2】Tensor Network Generator-Enhanced Optimization for Traveling Salesman Problem
标题:旅行商问题的张量网络生成器增强优化
链接:https://arxiv.org/abs/2602.20175
作者:Ryo Sakai,Chen-Yu Liu
备注:11 pages, 7 figures
摘要:我们提出将张量网络生成器增强优化(TN-GEO)框架应用于旅行商问题(TSP)这一基本的组合优化挑战。我们的方法采用基于可自动微分的矩阵乘积态(MPS)的张量网络玻恩机作为生成模型,利用玻恩规则定义候选解上的概率分布。与基于二进制编码的方法不同(其需要$N^2$个变量和惩罚项来强制巡回合法性),我们采用基于置换的整数变量表述,并使用带掩码的自回归采样,从构造上保证每个生成的样本都是合法巡回。我们还引入了一种$k$-站点MPS变体,采用滑动窗口方式学习$k$-gram(连续城市子序列)上的分布,从而为更大规模的实例实现参数高效的建模。在多达52个城市的TSPLIB基准实例上的实验验证表明,TN-GEO的性能可以优于交换和2-opt爬山等经典启发式算法。更注重局部相关性的$k$-站点变体相比完整MPS情形显示出更好的结果。
摘要:We present an application of the tensor network generator-enhanced optimization (TN-GEO) framework to address the traveling salesman problem (TSP), a fundamental combinatorial optimization challenge. Our approach employs a tensor network Born machine based on automatically differentiable matrix product states (MPS) as the generative model, using the Born rule to define probability distributions over candidate solutions. Unlike approaches based on binary encoding, which require $N^2$ variables and penalty terms to enforce valid tour constraints, we adopt a permutation-based formulation with integer variables and use autoregressive sampling with masking to guarantee that every generated sample is a valid tour by construction. We also introduce a $k$-site MPS variant that learns distributions over $k$-grams (consecutive city subsequences) using a sliding window approach, enabling parameter-efficient modeling for larger instances. Experimental validation on TSPLIB benchmark instances with up to 52 cities demonstrates that TN-GEO can outperform classical heuristics including swap and 2-opt hill-climbing. The $k$-site variants, which put more focus on local correlations, show better results compared to the full-MPS case.
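"带掩码的自回归采样从构造上保证合法巡回"可以用几行 Python 说明:逐城采样时屏蔽已访问的城市,因此无论条件分布是什么(这里为简化用均匀分布代替 MPS 玻恩机给出的条件概率,属于本文的假设性替换),每个样本都必然是一个合法的置换:

```python
import random

def sample_tour(n, rng):
    """自回归地逐城采样并屏蔽已访问城市,保证每个样本天然是合法巡回。
    假设性示意:真实方法在每一步用 MPS 玻恩机的条件概率代替这里的均匀分布。"""
    visited = [False] * n
    tour = []
    for _ in range(n):
        candidates = [c for c in range(n) if not visited[c]]  # 掩码:只保留未访问城市
        city = rng.choice(candidates)                          # 按条件分布采样
        visited[city] = True
        tour.append(city)
    return tour

rng = random.Random(0)
tour = sample_tour(10, rng)   # 必为 0..9 的一个置换
```

相比之下,$N^2$ 个二进制变量的编码必须靠惩罚项"软性"排除非法解,而这里的掩码把非法解的概率直接置零。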
【3】F10.7 Index Prediction: A Multiscale Decomposition Strategy with Wavelet Transform for Performance Optimization
标题:F10.7指数预测:一种用于性能优化的基于小波变换的多尺度分解策略
链接:https://arxiv.org/abs/2602.20712
作者:Xuran Ma,Xuebao Li,Yanfang Zheng,Yongshang Lv,Xiaojia Ji,Jiancheng Xu,Hongwei Ye,Zixian Wu,Shuainan Yan,Liang Dong,Zamri Zainal Abidin,Xusheng Huang,Shunhuang Zhang,Honglei Jin,Tarik Abdul Latef,Noraisyah Mohamed Shah,Mohamadariff Othman,Kamarul Ariffin Noordin
摘要:在这项研究中,我们构建了用于训练、验证和测试的数据集A,以及用于评估泛化能力的数据集B。我们提出了一种基于小波分解的新型F10.7指数预测方法,将F10.7连同其分解得到的近似信号和细节信号一起输入iTransformer模型。我们还纳入国际太阳黑子数(ISN)及其小波分解信号,以评估其对预测性能的影响。随后将本文的最优方法与S. Yan等人(2025)的最新方法和三种业务模型(SWPC、BGS、CLS)进行比较。此外,我们将该方法迁移到H. Ye等人(2024)所用的PatchTST模型,并在数据集B上与其方法进行对比。主要发现包括:(1)基于小波的组合方法总体上优于仅使用F10.7指数的基线;随着更高层近似信号和细节信号的逐步加入,预测性能不断提高;将F10.7与其第一至第五层近似和细节信号相结合的组合6方法,优于仅使用近似信号或仅使用细节信号的方法。(2)加入ISN及其小波分解信号并不能提高预测性能。(3)组合6方法显著优于S. Yan等人(2025)的方法和三种业务模型,相较前者,RMSE、MAE和MAPE分别降低了18.22%、15.09%和8.57%;它在四种不同太阳活动条件下也均表现出色。(4)在所有预测时程上,我们的方法都比H. Ye等人(2024)的方法表现出更优的泛化和预测能力。据我们所知,这是小波分解在F10.7预测中的首次应用,大大提高了预测性能。
摘要:In this study, we construct Dataset A for training, validation, and testing, and Dataset B to evaluate generalization. We propose a novel F10.7 index forecasting method using wavelet decomposition, which feeds F10.7 together with its decomposed approximate and detail signals into the iTransformer model. We also incorporate the International Sunspot Number (ISN) and its wavelet-decomposed signals to assess their influence on prediction performance. Our optimal method is then compared with the latest method from S. Yan et al. (2025) and three operational models (SWPC, BGS, CLS). Additionally, we transfer our method to the PatchTST model used in H. Ye et al. (2024) and compare our method with theirs on Dataset B. Key findings include: (1) The wavelet-based combination methods overall outperform the baseline using only F10.7 index. The prediction performance improves as higher-level approximate and detail signals are incrementally added. The Combination 6 method integrating F10.7 with its first to fifth level approximate and detail signals outperforms methods using only approximate or detail signals. (2) Incorporating ISN and its wavelet-decomposed signals does not enhance prediction performance. (3) The Combination 6 method significantly surpasses S. Yan et al. (2025) and three operational models, with RMSE, MAE, and MAPE reduced by 18.22%, 15.09%, and 8.57%, respectively, against the former method. It also excels across four different conditions of solar activity. (4) Our method demonstrates superior generalization and prediction capability over the method of H. Ye et al. (2024) across all forecast horizons. To our knowledge, this is the first application of wavelet decomposition in F10.7 prediction, substantially improving forecast performance.
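多层小波分解得到的"近似+细节"输入可以用最简单的 Haar 小波演示(假设性示意:原文未指明所用小波基,实际工作常用 PyWavelets 等库):每一层把信号分成低频近似与高频细节,再对近似信号递归分解,变换的正交性保证能量守恒。

```python
import numpy as np

def haar_dwt(x):
    """单层 Haar 小波分解:返回近似信号与细节信号(长度减半)。"""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # 低频近似
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # 高频细节
    return a, d

def multilevel(x, levels):
    """多层分解:逐层对近似信号递归,收集各层细节。"""
    approx, details = x, []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    return approx, details

# 虚构的 F10.7 式序列:基线 + 约 27 天周期 + 噪声(仅作演示)
t = np.arange(1024)
f107 = 120 + 30 * np.sin(2 * np.pi * t / 27) \
       + np.random.default_rng(0).normal(0, 5, 1024)
a5, ds = multilevel(f107, 5)   # 第五层近似 + 五层细节,可与原序列一同作模型输入
```

正交变换下各层系数的平方和等于原序列的平方和,这一不变量可用来检验分解实现是否正确。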
【4】On the Convergence of Stochastic Gradient Descent with Perturbed Forward-Backward Passes
标题:关于前后向扰动的随机梯度下降的收敛性
链接:https://arxiv.org/abs/2602.20646
作者:Boao Kong,Hengrui Zhang,Kun Yuan
备注:34 pages
摘要:我们研究由$N$个串联算子构成的复合优化问题的随机梯度下降(SGD),其中前向和反向传播均受到扰动。与将梯度噪声视为加性和局部化的经典分析不同,对中间输出和梯度的扰动会沿计算图级联,并随算子数量呈几何级放大。我们给出了针对这一设定的首个全面理论分析。具体而言,我们刻画了前向和反向扰动在单个梯度步内如何传播和放大,为一般非凸目标以及满足Polyak-Łojasiewicz条件的函数推导了收敛保证,并确定了扰动不会恶化渐近收敛阶的条件。作为副产品,我们的分析为深度学习中广泛观察到的梯度尖峰现象提供了理论解释,精确刻画了训练能从尖峰中恢复或发散的条件。凸和非凸正则化逻辑回归的实验验证了我们的理论,展示了所预测的尖峰行为以及对前向与反向扰动的不对称敏感性。
摘要:We study stochastic gradient descent (SGD) for composite optimization problems with $N$ sequential operators subject to perturbations in both the forward and backward passes. Unlike classical analyses that treat gradient noise as additive and localized, perturbations to intermediate outputs and gradients cascade through the computational graph, compounding geometrically with the number of operators. We present the first comprehensive theoretical analysis of this setting. Specifically, we characterize how forward and backward perturbations propagate and amplify within a single gradient step, derive convergence guarantees for both general non-convex objectives and functions satisfying the Polyak--Łojasiewicz condition, and identify conditions under which perturbations do not deteriorate the asymptotic convergence order. As a byproduct, our analysis furnishes a theoretical explanation for the gradient spiking phenomenon widely observed in deep learning, precisely characterizing the conditions under which training recovers from spikes or diverges. Experiments on logistic regression with convex and non-convex regularization validate our theories, illustrating the predicted spike behavior and the asymmetric sensitivity to forward versus backward perturbations.
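下面的 NumPy 玩具实验示意"前向/反向扰动下的 SGD"这一问题设定(模型、噪声规模与数据均为本文虚构,仅演示设定本身,而非文中的理论结果):在逻辑回归上分别向前向输出与反向梯度注入噪声,适度扰动下训练仍能收敛到较高准确率。

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def perturbed_sgd(fwd_noise, bwd_noise, steps=8000, lr=0.1):
    """逻辑回归 SGD:分别在前向输出与反向梯度上注入高斯扰动(假设性示意)。"""
    w = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)
        z = X[i] @ w + fwd_noise * rng.normal()                  # 前向扰动
        p = 1 / (1 + np.exp(-z))
        g = (p - y[i]) * X[i] + bwd_noise * rng.normal(size=d)   # 反向扰动
        w -= lr * g
    return w

def acc(w):
    return ((X @ w > 0) == (y > 0.5)).mean()

w_clean = perturbed_sgd(0.0, 0.0)   # 无扰动基线
w_pert = perturbed_sgd(0.3, 0.3)    # 前向、反向同时加噪
```

在文中的多算子设定里,这类扰动还会沿计算图级联放大;这个单层玩具只呈现最简单的加性情形,用于建立直觉。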
【5】Selecting Optimal Variable Order in Autoregressive Ising Models
标题:自回归伊辛模型中的最优变量次序选择
链接:https://arxiv.org/abs/2602.20394
作者:Shiba Biswal,Marc Vuffray,Andrey Y. Lokhov
摘要:自回归模型能够从学习到的概率分布中进行易处理的采样,但其性能在很大程度上取决于因子分解时使用的变量次序,因为它决定了所得条件分布的复杂性。我们建议先学习描述底层数据的马尔可夫随机场,并利用推断出的图模型结构来构建优化的变量次序。我们在二维类图像模型上说明了该方法:结构感知的次序带来受限的条件集,从而降低模型复杂性。离散数据伊辛模型上的数值实验表明,与朴素的变量次序相比,图结构引导的次序能生成保真度更高的样本。
摘要:Autoregressive models enable tractable sampling from learned probability distributions, but their performance critically depends on the variable ordering used in the factorization via complexities of the resulting conditional distributions. We propose to learn the Markov random field describing the underlying data, and use the inferred graphical model structure to construct optimized variable orderings. We illustrate our approach on two-dimensional image-like models where a structure-aware ordering leads to restricted conditioning sets, thereby reducing model complexity. Numerical experiments on Ising models with discrete data demonstrate that graph-informed orderings yield higher-fidelity generated samples compared to naive variable orderings.
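"结构感知的变量次序"可以用网格图上的 BFS 直观说明(假设性示意,并非论文的排序构造方法):当变量按 BFS 次序排列时,每个变量在次序中位于其之前的图邻居至多两个,因此自回归模型真正需要条件化的集合始终很小。

```python
from collections import deque

def grid_neighbors(i, j, n):
    """n x n 网格图中 (i, j) 的四邻域。"""
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= i + di < n and 0 <= j + dj < n:
            yield (i + di, j + dj)

def bfs_ordering(n):
    """从角点出发做 BFS,得到一个"结构感知"的变量次序(假设性示意)。"""
    order, seen = [], {(0, 0)}
    q = deque([(0, 0)])
    while q:
        v = q.popleft()
        order.append(v)
        for u in grid_neighbors(*v, n):
            if u not in seen:
                seen.add(u)
                q.append(u)
    return order

order = bfs_ordering(4)
pos = {v: k for k, v in enumerate(order)}
# 每个变量的"受限条件集":在次序中位于其之前的图邻居数量
cond_sizes = [sum(1 for u in grid_neighbors(*v, 4) if pos[u] < pos[v])
              for v in order]
```

在网格图上,BFS 按与起点的曼哈顿距离分层,先序邻居恰是距离小 1 的至多两个节点,这正是"受限条件集降低模型复杂性"的直观来源。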
预测|估计(7篇)
【1】High-Dimensional Robust Mean Estimation with Untrusted Batches
标题:不可信批次下的高维鲁棒均值估计
链接:https://arxiv.org/abs/2602.20698
作者:Maryam Aliakbarpour,Vladimir Braverman,Yuhan Liu,Junze Yin
摘要:我们研究协作环境下的高维均值估计,其中数据由$N$个用户以大小为$n$的批次贡献。在这种环境中,学习器试图从统计上异质且潜在恶意的数据源集合中恢复真实分布$P$的均值$μ$。我们通过一个双重污染图景来形式化这一挑战:占比$\varepsilon$的用户是完全对抗性的,而其余“好”用户提供的数据来自与$P$相关、但偏离程度不超过邻近参数$α$的分布。与现有关于不可信批次模型的工作不同(它们通常在离散设置中用总变差距离度量这种偏离),我们处理连续、高维的情形,并考虑两种自然的偏离变体:(1)好批次来自均值偏移为$\sqrtα$的分布;或(2)每个好批次中占比$α$的样本被对抗性破坏。特别是,第二种模型带来了重大的新挑战:在高维情形下,与离散设置不同,即使一小部分样本级污染也能任意改变经验均值和协方差。我们提供了两种基于平方和(SoS)的算法来应对这种分层污染。我们的算法达到极小极大最优误差率$O(\sqrt{\varepsilon/n} + \sqrt{d/nN} + \sqrtα)$,这表明虽然异质性$α$代表固有的统计困难,但由于批次结构提供的内部平均,对抗性用户的影响被抑制了$1/\sqrt{n}$倍。
摘要:We study high-dimensional mean estimation in a collaborative setting where data is contributed by $N$ users in batches of size $n$. In this environment, a learner seeks to recover the mean $μ$ of a true distribution $P$ from a collection of sources that are both statistically heterogeneous and potentially malicious. We formalize this challenge through a double corruption landscape: an $\varepsilon$-fraction of users are entirely adversarial, while the remaining ``good'' users provide data from distributions that are related to $P$, but deviate by a proximity parameter $α$. Unlike existing work on the untrusted batch model, which typically measures this deviation via total variation distance in discrete settings, we address the continuous, high-dimensional regime under two natural variants for deviation: (1) good batches are drawn from distributions with a mean-shift of $\sqrtα$, or (2) an $α$-fraction of samples within each good batch are adversarially corrupted. In particular, the second model presents significant new challenges: in high dimensions, unlike discrete settings, even a small fraction of sample-level corruption can shift empirical means and covariances arbitrarily. We provide two Sum-of-Squares (SoS) based algorithms to navigate this tiered corruption. Our algorithms achieve the minimax-optimal error rate $O(\sqrt{\varepsilon/n} + \sqrt{d/nN} + \sqrtα)$, demonstrating that while heterogeneity $α$ represents an inherent statistical difficulty, the influence of adversarial users is suppressed by a factor of $1/\sqrt{n}$ due to the internal averaging afforded by the batch structure.
【2】Bikelution: Federated Gradient-Boosting for Scalable Shared Micro-Mobility Demand Forecasting
标题:Bikelution:面向可扩展共享微出行需求预测的联邦梯度提升
链接:https://arxiv.org/abs/2602.20671
作者:Antonios Tziorvas,Andreas Tritsarolis,Yannis Theodoridis
摘要:无桩自行车共享系统的快速增长产生了大量时空数据集,可用于车队分配、缓解拥堵和可持续出行。然而,自行车需求取决于多种外部因素,使传统时间序列模型力不从心。集中式机器学习(CML)可以产生高精度预测,但当数据分布在边缘设备上时会引发隐私和带宽问题。为克服这些限制,我们提出了Bikelution,一种基于梯度提升树的高效联邦学习(FL)解决方案,在保护隐私的同时提供准确的、最多提前六小时的中期需求预测。在三个真实BSS数据集上的实验表明,Bikelution与其基于CML的变体相当,并优于当前最先进的方法。结果突出了隐私感知需求预测的可行性,并概述了FL与CML方法之间的权衡。
摘要:The rapid growth of dockless bike-sharing systems has generated massive spatio-temporal datasets useful for fleet allocation, congestion reduction, and sustainable mobility. Bike demand, however, depends on several external factors, making traditional time-series models insufficient. Centralized Machine Learning (CML) yields high-accuracy forecasts but raises privacy and bandwidth issues when data are distributed across edge devices. To overcome these limitations, we propose Bikelution, an efficient Federated Learning (FL) solution based on gradient-boosted trees that preserves privacy while delivering accurate mid-term demand forecasts up to six hours ahead. Experiments on three real-world BSS datasets show that Bikelution is comparable to its CML-based variant and outperforms the current state-of-the-art. The results highlight the feasibility of privacy-aware demand forecasting and outline the trade-offs between FL and CML approaches.
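作为联邦梯度提升思想的一个极简示意(假设采用基于直方图的分裂查找,这只是常见做法之一,并非Bikelution协议本身):各客户端仅上传按特征分箱聚合的梯度直方图,服务器求和后即可评估分裂增益,而原始记录不出本地。client_histogram 等函数名为示例虚构。

```python
import numpy as np

def client_histogram(feature, gradients, bin_edges):
    """Per-bin gradient sums for one feature -- the only thing a client shares."""
    bins = np.digitize(feature, bin_edges)      # bin index for each local sample
    hist = np.zeros(len(bin_edges) + 1)
    for b, g in zip(bins, gradients):
        hist[b] += g
    return hist

rng = np.random.default_rng(0)
bin_edges = np.array([0.25, 0.5, 0.75])

# Two clients holding private demand records.
x1, g1 = rng.random(100), rng.normal(size=100)
x2, g2 = rng.random(80), rng.normal(size=80)

# Server aggregates the two histograms (no raw data is exchanged).
server_hist = client_histogram(x1, g1, bin_edges) + client_histogram(x2, g2, bin_edges)

# The aggregate matches what a centralized learner would compute.
central = client_histogram(np.concatenate([x1, x2]),
                           np.concatenate([g1, g2]), bin_edges)
assert np.allclose(server_hist, central)
```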
【3】Sample-efficient evidence estimation of score based priors for model selection
标题:用于模型选择的基于分数先验的样本高效证据估计
链接:https://arxiv.org/abs/2602.20549
作者:Frederic Wang,Katherine L. Bouman
备注:ICLR 2026
摘要:先验的选择是解决不适定成像逆问题的关键,因此必须选择一个与测量值$y$一致的先验,以避免严重的偏差。在贝叶斯逆问题中,这可以通过在指定先验的不同模型$M$下评估模型证据$p(y \mid M)$,然后选择具有最高值的模型来实现。扩散模型是利用数据驱动先验求解逆问题的最先进方法;然而,直接计算扩散先验下的模型证据是棘手的。此外,大多数现有的模型证据估计器要么需要对未归一化先验密度进行大量逐点评估,要么需要准确的干净先验分数。我们提出了\method,一种通过对后验采样方法的时间边缘分布进行积分来估计扩散先验模型证据的估计器。我们的方法利用反向扩散采样过程中自然获得的大量中间样本,仅用少量后验样本(例如20个)即可得到准确的模型证据估计。我们还演示了如何将我们的估计器与最近的扩散后验采样方法结合实现。经验上,当模型证据可以解析计算时,我们的估计器与之吻合;并且在不同的高度病态、非线性逆问题(包括真实世界的黑洞成像问题)中,它既能选出正确的扩散模型先验,也能诊断先验失配。
摘要:The choice of prior is central to solving ill-posed imaging inverse problems, making it essential to select one consistent with the measurements $y$ to avoid severe bias. In Bayesian inverse problems, this could be achieved by evaluating the model evidence $p(y \mid M)$ under different models $M$ that specify the prior and then selecting the one with the highest value. Diffusion models are the state-of-the-art approach to solving inverse problems with a data-driven prior; however, directly computing the model evidence with respect to a diffusion prior is intractable. Furthermore, most existing model evidence estimators require either many pointwise evaluations of the unnormalized prior density or an accurate clean prior score. We propose \method, an estimator of the model evidence of a diffusion prior by integrating over the time-marginals of posterior sampling methods. Our method leverages the large amount of intermediate samples naturally obtained during the reverse diffusion sampling process to obtain an accurate estimation of the model evidence using only a handful of posterior samples (e.g., 20). We also demonstrate how to implement our estimator in tandem with recent diffusion posterior sampling methods. Empirically, our estimator matches the model evidence when it can be computed analytically, and it is able to both select the correct diffusion model prior and diagnose prior misfit under different highly ill-conditioned, non-linear inverse problems, including a real-world black hole imaging problem.
【4】$κ$-Explorer: A Unified Framework for Active Model Estimation in MDPs
标题:$κ$-Explorer:MDP中主动模型估计的统一框架
链接:https://arxiv.org/abs/2602.20404
作者:Xihe Gu,Urbashi Mitra,Tara Javidi
摘要:在具有完美状态可观测性的表格马尔可夫决策过程中,每条轨迹都提供了以状态-动作对为条件的转移分布的主动样本。因此,准确的模型估计取决于探索策略如何根据每个转移分布的内在复杂性来分配访问频率。基于最近关于覆盖式探索的工作,我们引入了一族参数化的可分解凹目标函数$U_κ$,显式地结合了内在估计复杂性和外在访问频率。此外,曲率$κ$为各种全局目标(如平均情形和最坏情形估计误差目标)提供了统一处理。利用$U_κ$梯度的封闭形式刻画,我们提出了$κ$-Explorer,一种在状态-动作占用测度上执行Frank-Wolfe式优化的主动探索算法。$U_κ$的递减回报结构自然会优先考虑欠探索和高方差的转移,同时保留可实现高效优化的平滑性质。我们为$κ$-Explorer建立了紧的遗憾保证,并进一步引入了一个完全在线且计算高效的代理算法以供实际使用。在基准MDP上的实验表明,与现有探索策略相比,$κ$-Explorer提供了更优的性能。
摘要:In tabular Markov decision processes (MDPs) with perfect state observability, each trajectory provides active samples from the transition distributions conditioned on state-action pairs. Consequently, accurate model estimation depends on how the exploration policy allocates visitation frequencies in accordance with the intrinsic complexity of each transition distribution. Building on recent work on coverage-based exploration, we introduce a parameterized family of decomposable and concave objective functions $U_κ$ that explicitly incorporate both intrinsic estimation complexity and extrinsic visitation frequency. Moreover, the curvature $κ$ provides a unified treatment of various global objectives, such as the average-case and worst-case estimation error objectives. Using the closed-form characterization of the gradient of $U_κ$, we propose $κ$-Explorer, an active exploration algorithm that performs Frank-Wolfe-style optimization over state-action occupancy measures. The diminishing-returns structure of $U_κ$ naturally prioritizes underexplored and high-variance transitions, while preserving smoothness properties that enable efficient optimization. We establish tight regret guarantees for $κ$-Explorer and further introduce a fully online and computationally efficient surrogate algorithm for practical use. Experiments on benchmark MDPs demonstrate that $κ$-Explorer provides superior performance compared to existing exploration strategies.
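下面用一个玩具例子示意"在单纯形上对可分解凹目标做Frank-Wolfe式优化"的机制(我们自拟的替代目标 $U(p)=\sum_i c_i\sqrt{p_i}$ 仅用于类比论文中的$U_κ$族,$c_i$类比每个转移的复杂度;并非论文算法):单纯形上的线性最小化预言机恰好是一个顶点,递减回报结构会把访问概率推向高复杂度分量。

```python
import numpy as np

def frank_wolfe(c, steps=2000):
    """Maximize U(p) = sum_i c_i * sqrt(p_i) over the probability simplex."""
    d = len(c)
    p = np.full(d, 1.0 / d)                 # start at the uniform distribution
    for t in range(steps):
        grad = 0.5 * c / np.sqrt(p)         # gradient of the concave objective
        vertex = np.zeros(d)
        vertex[np.argmax(grad)] = 1.0       # LMO over the simplex picks a vertex
        gamma = 2.0 / (t + 3)               # shifted FW step keeps p strictly interior
        p = (1 - gamma) * p + gamma * vertex
    return p

c = np.array([1.0, 2.0, 3.0])               # per-component "complexity" weights
p = frank_wolfe(c)
# The true maximizer puts p_i proportional to c_i^2, i.e. close to [1, 4, 9] / 14.
```

经过2000步,p的最大质量落在复杂度最高的分量上,目标值明显高于均匀起点。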
【5】Enhancing Heat Sink Efficiency in MOSFETs using Physics Informed Neural Networks: A Systematic Study on Coolant Velocity Estimation
标题:使用物理信息神经网络提高MOSFET散热器效率:冷却剂速度估计的系统研究
链接:https://arxiv.org/abs/2602.20177
作者:Aniruddha Bora,Isabel K. Alvarez,Julie Chalfant,Chryssostomos Chryssostomidis
摘要:在这项工作中,我们提出了一种使用物理信息神经网络(PINNs)的方法,在已知入口和出口温度以及给定热通量的条件下,确定多层金属氧化物半导体场效应晶体管(MOSFET)中所需的冷却剂速度。MOSFET是电力电子构建模块(PEBB)的组成部分,并承受大部分热负载。因此,对MOSFET进行有效冷却对于防止过热和潜在烧毁至关重要。确定有效冷却所需的速度十分重要,但这是一个不适定的反问题,难以用传统方法求解。MOSFET由具有不同热导率的多层材料组成,包括铝、热解石墨片(PGS)和内有流动水的不锈钢管。我们提出了一种在PINNs中对MOSFET各层进行顺序训练的算法。在数学上,顺序训练方法在训练某一层时将其他层的参数视为常数,从而解耦了各层的优化。这降低了优化空间的维度,使得更容易找到每层参数的全局最小值并避免不良的局部极小值。我们从理论上分析了PINNs解向解析解的收敛性。最后,我们表明所提方法的预测与实验结果吻合良好。
摘要:In this work, we present a methodology using Physics Informed Neural Networks (PINNs) to determine the required velocity of a coolant, given inlet and outlet temperatures for a given heat flux in a multilayered metal-oxide-semiconductor field-effect transistor (MOSFET). MOSFETs are integral components of Power Electronic Building Blocks (PEBBs) and experience the majority of the thermal load. Effective cooling of MOSFETs is therefore essential to prevent overheating and potential burnout. Determining the required velocity for effective cooling is important but is an ill-posed inverse problem that is difficult to solve using traditional methods. A MOSFET consists of multiple layers with different thermal conductivities, including aluminum, pyrolytic graphite sheets (PGS), and stainless steel pipes containing flowing water. We propose an algorithm that employs sequential training of the MOSFET layers in PINNs. Mathematically, the sequential training method decouples the optimization of each layer by treating the parameters of other layers as constants during its training phase. This reduces the dimensionality of the optimization landscape, making it easier to find the global minimum for each layer's parameters and avoid poor local minima. Convergence of the PINNs solution to the analytical solution is theoretically analyzed. Finally, we show the predictions of our proposed methodology to be in good agreement with experimental results.
【6】Benchmarking Early Deterioration Prediction Across Hospital-Rich and MCI-Like Emergency Triage Under Constrained Sensing
标题:受限感知下医院信息丰富与类MCI急诊分诊场景的早期恶化预测基准测试
链接:https://arxiv.org/abs/2602.20168
作者:KMA Solaiman,Joshua Sebastian,Karma Tobden
备注:10 pages, 4 figures, 6 tables. Submitted to IEEE ICHI 2026
摘要:紧急分诊决策是在严格的信息约束下做出的,但大多数数据驱动的恶化模型都是使用初始评估期间不可用的信号进行评估的。我们提出了一个泄漏感知的基准框架,早期恶化预测,评估模型的性能在现实的,有时间限制的传感条件下。使用来自MIMIC-IV-ED的患者去重复队列,我们将医院丰富的分诊与仅生命体征的MCI样设置进行比较,将输入限制在第一个小时内提供的信息。在多种建模方法中,当仅限于生命体征时,预测性能仅适度下降,这表明早期生理测量保留了大量的临床信号。结构化消融和可解释性分析将呼吸和氧合指标确定为早期风险分层的最有影响力的因素,随着感知降低,模型表现出稳定、适度的退化。这项工作提供了一个临床接地基准,以支持评估和设计的可部署的分流决策支持系统在资源有限的设置。
摘要:Emergency triage decisions are made under severe information constraints, yet most data-driven deterioration models are evaluated using signals unavailable during initial assessment. We present a leakage-aware benchmarking framework for early deterioration prediction that evaluates model performance under realistic, time-limited sensing conditions. Using a patient-deduplicated cohort derived from MIMIC-IV-ED, we compare hospital-rich triage with a vitals-only, MCI-like setting, restricting inputs to information available within the first hour of presentation. Across multiple modeling approaches, predictive performance declines only modestly when limited to vitals, indicating that early physiological measurements retain substantial clinical signal. Structured ablation and interpretability analyses identify respiratory and oxygenation measures as the most influential contributors to early risk stratification, with models exhibiting stable, graceful degradation as sensing is reduced. This work provides a clinically grounded benchmark to support the evaluation and design of deployable triage decision-support systems in resource-constrained settings.
【7】KEMP-PIP: A Feature-Fusion Based Approach for Pro-inflammatory Peptide Prediction
标题:KEMP-PIP:一种基于特征融合的促炎肽预测方法
链接:https://arxiv.org/abs/2602.20198
作者:Soumik Deb Niloy,Md. Fahmid-Ul-Alam Juboraj,Swakkhar Shatabda
备注:11 pages, 4 figures, 6 tables; includes web server and GitHub implementation
摘要:促炎肽(PIP)在免疫信号传导和炎症中起关键作用,但由于昂贵且耗时的测定而难以通过实验鉴定。为了应对这一挑战,我们提出了KEMP-PIP,这是一个混合机器学习框架,它将深度蛋白质嵌入与手工描述符集成在一起,以实现强大的PIP预测。我们的方法将来自预训练的ESM蛋白质语言模型的上下文嵌入与多尺度k-mer频率、物理化学描述符和modlAMP序列特征相结合。特征修剪和类加权逻辑回归用于处理高维和类不平衡问题,而采用优化决策阈值的集成平均增强了灵敏度-特异性平衡。通过系统的消融研究,我们证明了整合互补特征集可以持续提高预测性能。在标准基准数据集上,KEMP-PIP实现了0.505的MCC、0.752的准确度和0.762的AUC,优于ProIn-fuse、MultiFeatVotPIP和StackPIP。相对于StackPIP,这些结果代表MCC提高9.5%,准确度和AUC均提高4.8%。KEMP-PIP Web服务器可在https://nilsparrow1920-kemp-pip.hf.space/上免费获得,完整实现可在https://github.com/S18-Niloy/KEMP-PIP上获得。
摘要:Pro-inflammatory peptides (PIPs) play critical roles in immune signaling and inflammation but are difficult to identify experimentally due to costly and time-consuming assays. To address this challenge, we present KEMP-PIP, a hybrid machine learning framework that integrates deep protein embeddings with handcrafted descriptors for robust PIP prediction. Our approach combines contextual embeddings from pretrained ESM protein language models with multi-scale k-mer frequencies, physicochemical descriptors, and modlAMP sequence features. Feature pruning and class-weighted logistic regression manage high dimensionality and class imbalance, while ensemble averaging with an optimized decision threshold enhances the sensitivity--specificity balance. Through systematic ablation studies, we demonstrate that integrating complementary feature sets consistently improves predictive performance. On the standard benchmark dataset, KEMP-PIP achieves an MCC of 0.505, accuracy of 0.752, and AUC of 0.762, outperforming ProIn-fuse, MultiFeatVotPIP, and StackPIP. Relative to StackPIP, these results represent improvements of 9.5% in MCC and 4.8% in both accuracy and AUC. The KEMP-PIP web server is freely available at https://nilsparrow1920-kemp-pip.hf.space/ and the full implementation at https://github.com/S18-Niloy/KEMP-PIP.
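作为特征融合中"多尺度k-mer频率"这一成分的示意(自行编写的草图,未包含ESM嵌入与理化描述符;kmer_frequencies、fused_features 等函数名均为示例):对肽序列按k=1,2统计所有氨基酸词的相对频率并拼接成特征向量。

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard amino acids

def kmer_frequencies(seq, k):
    """Relative frequency of every length-k amino-acid word in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    return [counts[''.join(w)] / total
            for w in product(AMINO_ACIDS, repeat=k)]

def fused_features(seq, ks=(1, 2)):
    """Concatenate multi-scale k-mer frequency vectors (one slot per possible word)."""
    feats = []
    for k in ks:
        feats.extend(kmer_frequencies(seq, k))
    return feats

v = fused_features("ACDAC")
print(len(v))  # 20 + 400 = 420 features
```

实际系统中,再将此向量与蛋白语言模型嵌入等其他特征拼接后送入分类器即可。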
其他神经网络|深度学习|模型|建模(29篇)
【1】Statistical Query Lower Bounds for Smoothed Agnostic Learning
标题:平滑不可知学习的统计查询下界
链接:https://arxiv.org/abs/2602.21191
作者:Ilias Diakonikolas,Daniel M. Kane
摘要:我们研究平滑不可知学习的复杂性,该设定由~\cite{CKKMS24}最近提出,其中学习器需与目标类中在输入受到轻微高斯扰动下的最佳分类器竞争。具体来说,我们专注于平滑模型中次高斯分布下不可知学习半空间这一原型任务。该问题已知的最佳上界依赖于$L_1$-多项式回归,复杂度为$d^{\tilde{O}(1/σ^2)\log(1/ε)}$,其中$σ$是平滑参数,$ε$是超额误差。我们的主要结果是一个统计查询(SQ)下界,它提供了该上界接近最优的正式证据。更详细地说,我们证明了(即使对于高斯边缘分布)任何用于半空间平滑不可知学习的SQ算法都需要复杂度$d^{Ω(1/σ^{2}+\log(1/ε))}$。这是该任务复杂性的第一个非平凡下界,几乎与已知上界相匹配。粗略地说,我们表明,对函数的平滑版本应用$L_1$-多项式回归本质上已是最优做法。我们的技术涉及通过线性规划对偶寻找一个矩匹配的困难分布。这个对偶规划恰好对应于寻找目标函数平滑版本的低次逼近多项式(而这恰是$L_1$-多项式回归可行所需的同一条件)。我们的显式SQ下界随后来自对半空间类的这一逼近次数下界的证明。
摘要:We study the complexity of smoothed agnostic learning, recently introduced by~\cite{CKKMS24}, in which the learner competes with the best classifier in a target class under slight Gaussian perturbations of the inputs. Specifically, we focus on the prototypical task of agnostically learning halfspaces under subgaussian distributions in the smoothed model. The best known upper bound for this problem relies on $L_1$-polynomial regression and has complexity $d^{\tilde{O}(1/σ^2) \log(1/ε)}$, where $σ$ is the smoothing parameter and $ε$ is the excess error. Our main result is a Statistical Query (SQ) lower bound providing formal evidence that this upper bound is close to best possible. In more detail, we show that (even for Gaussian marginals) any SQ algorithm for smoothed agnostic learning of halfspaces requires complexity $d^{Ω(1/σ^{2}+\log(1/ε))}$. This is the first non-trivial lower bound on the complexity of this task and nearly matches the known upper bound. Roughly speaking, we show that applying $L_1$-polynomial regression to a smoothed version of the function is essentially best possible. Our techniques involve finding a moment-matching hard distribution by way of linear programming duality. This dual program corresponds exactly to finding a low-degree approximating polynomial to the smoothed version of the target function (which turns out to be the same condition required for the $L_1$-polynomial regression to work). Our explicit SQ lower bound then comes from proving lower bounds on this approximation degree for the class of halfspaces.
【2】Scaling State-Space Models on Multiple GPUs with Tensor Parallelism
标题:利用张量并行在多个GPU上扩展状态空间模型
链接:https://arxiv.org/abs/2602.21144
作者:Anurag Dutt,Nimit Shah,Hazem Masarani,Anshul Gandhi
备注:Submitted to 46th IEEE International Conference on Distributed Computing Systems (ICDCS 2026)
摘要:选择性状态空间模型(SSM)已经迅速成为大型语言模型的重要支柱,特别是对于长上下文工作负载。然而,在部署中,它们的推理性能通常受到单个GPU的内存容量、带宽和延迟限制的限制,这使得多GPU执行变得越来越必要。虽然张量并行(TP)被广泛用于缩放Transformer推理,但将其应用于选择性SSM块是不平凡的,因为SSM混合器将大投影与顺序递归状态更新和局部混合相耦合,其效率取决于保持局部性并避免关键路径中的同步。 本文提出了一种用于选择性SSM推理的通信高效TP设计,该设计解决了三个实际工程挑战:通过预填充和解码的SSM状态缓存实现TTFT改进,划分混合器的打包参数张量,以便在最小化通信的同时经常性更新保持本地,以及通过量化AllReduce减少TP聚合开销。我们在NVIDIA A6000和A100集群上评估了三个代表性的基于SSM的LLM,这些LLM跨越了纯SSM和混合架构- Mamba、Falcon-Mamba和Zamba。我们的实验表明,张量并行SSM推理带来了可观的吞吐量增益,Mamba的批处理请求吞吐量在2个GPU上提高了约1.6- 2.1倍,在4个GPU上提高了约2.6- 4.0倍,在长上下文长度下获得了最大的好处,并通过降低同步带宽开销,从量化的all-reduce中进一步提高了约10-18%的吞吐量。
摘要:Selective state space models (SSMs) have rapidly become a compelling backbone for large language models, especially for long-context workloads. Yet in deployment, their inference performance is often bounded by the memory capacity, bandwidth, and latency limits of a single GPU, making multi-GPU execution increasingly necessary. Although tensor parallelism (TP) is widely used to scale Transformer inference, applying it to selective SSM blocks is non-trivial because the SSM mixer couples large projections with a sequence-wise recurrent state update and local mixing whose efficiency depends on preserving locality and avoiding synchronization in the critical path. This paper presents a communication-efficient TP design for selective SSM inference that addresses three practical engineering challenges: enabling TTFT improvements via an SSM state cache across prefill and decode, partitioning the mixer's packed parameter tensor so that recurrent updates remain local while minimizing communication, and reducing TP aggregation overhead with quantized AllReduce. We evaluate on three representative SSM-based LLMs spanning pure-SSM and hybrid architectures - Mamba, Falcon-Mamba, and Zamba - on NVIDIA A6000 and A100 clusters. Our experiments show substantial throughput gains from tensor-parallel SSM inference, improving batch-request throughput by ~1.6-2.1x on 2 GPUs and ~2.6-4.0x on 4 GPUs for Mamba, with the largest benefits at long context lengths, and achieving a further ~10-18% throughput improvement from quantized all-reduce by lowering synchronization bandwidth overhead.
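下面用NumPy模拟行并行张量并行的基本原理(玩具示意,并非论文对SSM混合器的实际切分方案):权重矩阵沿输入维切片,每个"GPU"在自己的激活分片上计算部分积,再用一次求和模拟AllReduce还原完整输出。

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_gpus = 8, 4, 2

x = rng.normal(size=(1, d_in))       # activations for one token
W = rng.normal(size=(d_in, d_out))   # packed projection weight

# Shard the activations and the weight's input rows across "devices".
x_shards = np.split(x, n_gpus, axis=1)
W_shards = np.split(W, n_gpus, axis=0)

# Each device computes a partial product on its local shard...
partials = [xs @ Ws for xs, Ws in zip(x_shards, W_shards)]

# ...and an AllReduce (here simply a sum) reconstructs the full output.
y_tp = sum(partials)

assert np.allclose(y_tp, x @ W)
```

论文进一步通过量化这一AllReduce通信来降低同步带宽开销。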
【3】LUMEN: Longitudinal Multi-Modal Radiology Model for Prognosis and Diagnosis
标题:LUMEN:用于预后和诊断的纵向多模式放射学模型
链接:https://arxiv.org/abs/2602.21142
作者:Zhifan Jiang,Dong Yang,Vishwesh Nath,Abhijeet Parida,Nishad P. Kulkarni,Ziyue Xu,Daguang Xu,Syed Muhammad Anwar,Holger R. Roth,Marius George Linguraru
备注:Accepted to IEEE International Symposium on Biomedical Imaging (ISBI) 2026
摘要:大型视觉语言模型(VLM)已经从通用应用发展到临床领域等专业用例,展示了放射学决策支持的潜力。一个有前途的应用是通过视觉和自然语言问答(VQA)界面分析胸部X射线(CXR)等放射学成像数据,帮助放射科医生做出决策。当纵向成像可用时,放射科医生会分析时间变化,这对准确诊断和预后至关重要。手动纵向分析耗时费力,这促使我们开发一个能够提供预后能力的训练框架。我们引入了一种新的训练框架LUMEN,它针对纵向CXR解读进行了优化,利用多图像和多任务指令微调来增强预后和诊断性能。我们在公开的MIMIC-CXR及其相关的Medical-Diff-VQA数据集上进行实验。我们进一步构建了一个纳入纵向研究的新型指令遵循数据集,使预后VQA任务的开发成为可能。我们的方法在诊断VQA任务中表现出比基线模型的显著改进,更重要的是,显示出有希望的预后能力。这些结果强调了精心设计的指令微调VLM在实现对纵向放射影像数据更准确、更具临床意义的解读方面的价值。
摘要:Large vision-language models (VLMs) have evolved from general-purpose applications to specialized use cases such as in the clinical domain, demonstrating potential for decision support in radiology. One promising application is assisting radiologists in decision-making by the analysis of radiology imaging data such as chest X-rays (CXR) via a visual and natural language question-answering (VQA) interface. When longitudinal imaging is available, radiologists analyze temporal changes, which are essential for accurate diagnosis and prognosis. The manual longitudinal analysis is a time-consuming process, motivating the development of a training framework that can provide prognostic capabilities. We introduce a novel training framework LUMEN, which is optimized for longitudinal CXR interpretation, leveraging multi-image and multi-task instruction fine-tuning to enhance prognostic and diagnostic performance. We conduct experiments on the publicly available MIMIC-CXR and its associated Medical-Diff-VQA datasets. We further formulate and construct a novel instruction-following dataset incorporating longitudinal studies, enabling the development of a prognostic VQA task. Our method demonstrates significant improvements over baseline models in diagnostic VQA tasks, and more importantly, shows promising potential for prognostic capabilities. These results underscore the value of well-designed, instruction-tuned VLMs in enabling more accurate and clinically meaningful radiological interpretation of longitudinal radiological imaging data.
【4】SOM-VQ: Topology-Aware Tokenization for Interactive Generative Models
标题:SOM-VQ:面向交互式生成模型的拓扑感知令牌化
链接:https://arxiv.org/abs/2602.21133
作者:Alessandro Londei,Denise Lanzieri,Matteo Benati
摘要:矢量量化表示支持强大的离散生成模型,但在令牌空间中缺乏语义结构,限制了可解释的人类控制。我们介绍了SOM-VQ,这是一种将矢量量化与自组织映射相结合的令牌化方法,用于学习具有显式低维拓扑的离散码本。与标准VQ-VAE不同,SOM-VQ使用拓扑感知更新来保持邻域结构:学习网格上的相邻令牌对应于语义相似的状态,从而实现对潜在空间的直接几何操作。我们证明,SOM-VQ在所评估的领域中产生更易学习的令牌序列,同时在码空间中提供显式的可导航几何。至关重要的是,这种拓扑组织实现了直观的人在回路控制:用户可以通过操纵令牌空间中的距离来引导生成,在无需帧级约束的情况下实现语义对齐。我们专注于人类运动生成(在这一领域,运动学结构、平滑的时间连续性和编舞、康复、人机交互等交互式用例使拓扑感知控制显得尤为自然),并通过简单的基于网格的采样展示了相对于参考序列的受控发散和收敛。SOM-VQ为可解释的离散表示提供了一个通用框架,适用于音乐、手势和其他交互式生成领域。
摘要:Vector-quantized representations enable powerful discrete generative models but lack semantic structure in token space, limiting interpretable human control. We introduce SOM-VQ, a tokenization method that combines vector quantization with Self-Organizing Maps to learn discrete codebooks with explicit low-dimensional topology. Unlike standard VQ-VAE, SOM-VQ uses topology-aware updates that preserve neighborhood structure: nearby tokens on a learned grid correspond to semantically similar states, enabling direct geometric manipulation of the latent space. We demonstrate that SOM-VQ produces more learnable token sequences in the evaluated domains while providing an explicit navigable geometry in code space. Critically, the topological organization enables intuitive human-in-the-loop control: users can steer generation by manipulating distances in token space, achieving semantic alignment without frame-level constraints. We focus on human motion generation - a domain where kinematic structure, smooth temporal continuity, and interactive use cases (choreography, rehabilitation, HCI) make topology-aware control especially natural - demonstrating controlled divergence and convergence from reference sequences through simple grid-based sampling. SOM-VQ provides a general framework for interpretable discrete representations applicable to music, gesture, and other interactive generative domains.
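下面给出SOM风格码本更新的一个示意草图(并非SOM-VQ的完整训练流程;som_update 为示例函数名):先找到最佳匹配单元(BMU),再将BMU及其网格邻居一同拉向输入,从而使网格上相邻的令牌保持语义相近。

```python
import numpy as np

rng = np.random.default_rng(0)
grid = 8                                    # 1-D token grid with 8 codes
dim = 2
codebook = rng.normal(size=(grid, dim))

def som_update(codebook, x, lr=0.5, sigma=1.0):
    """One SOM step: move the best-matching code AND its grid neighbours toward x."""
    bmu = np.argmin(np.linalg.norm(codebook - x, axis=1))  # best matching unit
    dist = np.abs(np.arange(grid) - bmu)                   # distance on the grid
    h = np.exp(-dist**2 / (2 * sigma**2))                  # neighbourhood kernel
    return codebook + lr * h[:, None] * (x - codebook), bmu

x = np.array([5.0, 5.0])
before = np.linalg.norm(codebook - x, axis=1)
codebook, bmu = som_update(codebook, x)
after = np.linalg.norm(codebook - x, axis=1)

# Every code moved toward x, with the winner and its neighbours moving most.
assert after[bmu] < before[bmu]
```

正是这种邻域核使得"网格上相邻的令牌对应语义相似的状态",从而支持对令牌空间的几何操纵。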
【5】Matching Multiple Experts: On the Exploitability of Multi-Agent Imitation Learning
标题:匹配多个专家:论多智能体模仿学习的可利用性
链接:https://arxiv.org/abs/2602.21020
作者:Antoine Bergerault,Volkan Cevher,Negar Mehr
摘要:多智能体模仿学习(MA-IL)旨在从多智能体交互领域的专家交互演示中学习最优策略。尽管已有对所学策略性能的保证,但对于离线MA-IL,所学策略距纳什均衡有多远仍缺乏刻画。在本文中,我们证明了在一般$n$人马尔可夫博弈中学习低可利用性策略的不可能性与困难性结果。为此,我们给出了即使精确测度匹配也会失败的例子,并证明了一个新的困难性结果:在给定固定测度匹配误差的情况下刻画纳什差距是困难的。然后,我们展示了如何利用对专家均衡的策略占优假设来克服这些挑战。具体来说,对于占优策略专家均衡的情形,假设行为克隆误差为$ε_{\text{BC}}$,在折扣因子$γ$下可得到$\mathcal{O}\left(nε_{\text{BC}}/(1-γ)^2\right)$的纳什模仿差距。我们用最佳响应连续性这一新概念推广了该结果,并论证标准正则化技术会隐式地鼓励这一性质。
摘要:Multi-agent imitation learning (MA-IL) aims to learn optimal policies from expert demonstrations of interactions in multi-agent interactive domains. Despite existing guarantees on the performance of the resulting learned policies, characterizations of how far the learned polices are from a Nash equilibrium are missing for offline MA-IL. In this paper, we demonstrate impossibility and hardness results of learning low-exploitable policies in general $n$-player Markov Games. We do so by providing examples where even exact measure matching fails, and demonstrating a new hardness result on characterizing the Nash gap given a fixed measure matching error. We then show how these challenges can be overcome using strategic dominance assumptions on the expert equilibrium. Specifically, for the case of dominant strategy expert equilibria, assuming Behavioral Cloning error $ε_{\text{BC}}$, this provides a Nash imitation gap of $\mathcal{O}\left(nε_{\text{BC}}/(1-γ)^2\right)$ for a discount factor $γ$. We generalize this result with a new notion of best-response continuity, and argue that this is implicitly encouraged by standard regularization techniques.
【6】MAST: A Multi-fidelity Augmented Surrogate model via Spatial Trust-weighting
标题:MAST:通过空间信任加权的多保真增强代理模型
链接:https://arxiv.org/abs/2602.20974
作者:Ahmed Mohamed Eisa Nasr,Haris Moazam Sheikh
备注:Submitted to International Conference on Machine Learning 2026
摘要:在工程设计和科学计算中,计算成本和预测精度是内在耦合的。高保真模拟提供准确的预测,但在大量的计算成本,而较低的保真度近似提供效率的准确性为代价。多保真度替代模型通过将丰富的低保真度数据与稀疏的高保真度观测相结合来解决这种权衡。然而,现有的方法遭受昂贵的训练成本或依赖于全局相关性假设,这些假设在实践中往往无法捕捉保真度关系如何在输入空间中变化,导致性能不佳,特别是在预算紧张的情况下。我们引入了MAST,这是一种将校正后的低保真观测与高保真度预测相结合的方法,信任高保真度的近观察样本并依赖于其他地方校正后的低保真度。MAST通过显式差异建模和基于距离的加权与封闭形式的方差传播,产生一个单一的异方差高斯过程。在多保真度合成基准测试中,MAST显示出比当前最先进的技术有明显的改进。至关重要的是,MAST在不同的总预算和保真度差距中保持稳健的性能,在这种情况下,竞争方法表现出显着的退化或不稳定的行为。
摘要:In engineering design and scientific computing, computational cost and predictive accuracy are intrinsically coupled. High-fidelity simulations provide accurate predictions but at substantial computational costs, while lower-fidelity approximations offer efficiency at the expense of accuracy. Multi-fidelity surrogate modelling addresses this trade-off by combining abundant low-fidelity data with sparse high-fidelity observations. However, existing methods suffer from expensive training cost or rely on global correlation assumptions that often fail in practice to capture how fidelity relationships vary across the input space, leading to poor performance particularly under tight budget constraints. We introduce MAST, a method that blends corrected low-fidelity observations with high-fidelity predictions, trusting high-fidelity near observed samples and relying on corrected low-fidelity elsewhere. MAST achieves this through explicit discrepancy modelling and distance-based weighting with closed-form variance propagation, producing a single heteroscedastic Gaussian process. Across multi-fidelity synthetic benchmarks, MAST shows a marked improvement over the current state-of-the-art techniques. Crucially, MAST maintains robust performance across varying total budget and fidelity gaps, conditions under which competing methods exhibit significant degradation or unstable behaviour.
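作为"基于距离的信任加权"思想的手工示意(假设采用高斯形权重;实际的MAST是带封闭形式方差传播的异方差高斯过程):靠近高保真样本处信任HF预测,远处则回退到"低保真+差异修正"。trust_weight、blended_prediction 均为示例名称。

```python
import numpy as np

def trust_weight(x, x_hf, length=0.5):
    """Gaussian-shaped trust in the HF model, decaying with distance to HF data."""
    d = np.min(np.abs(x[:, None] - x_hf[None, :]), axis=1)
    return np.exp(-(d / length) ** 2)

def blended_prediction(x, lf, discrepancy, hf, x_hf):
    """w * HF + (1 - w) * (LF + learned discrepancy correction)."""
    w = trust_weight(x, x_hf)
    return w * hf(x) + (1 - w) * (lf(x) + discrepancy(x))

x_hf = np.array([0.0, 1.0])            # locations of the sparse HF samples
x = np.array([0.0, 0.5, 3.0])          # query points
w = trust_weight(x, x_hf)

assert w[0] > 0.99                      # on top of an HF sample: trust HF
assert w[2] < 0.01                      # far from HF data: use corrected LF

# Sanity check: if the discrepancy model is exact (HF == LF + disc),
# the blend reproduces the HF function everywhere.
lf = np.sin
disc = lambda t: 0.1 * t
hf = lambda t: np.sin(t) + 0.1 * t
y = blended_prediction(x, lf, disc, hf, x_hf)
assert np.allclose(y, hf(x))
```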
【7】Extending $μ$P: Spectral Conditions for Feature Learning Across Optimizers
标题:扩展$μ$P:跨优化器的特征学习谱条件
链接:https://arxiv.org/abs/2602.20937
作者:Akshita Gupta,Marieme Ngom,Sam Foreman,Venkatram Vishwanath
备注:10 main pages, 16 appendix pages and 17 figures; Amended version of the publication in 17th International OPT Workshop on Optimization for Machine Learning
摘要:已经提出了自适应一阶和二阶优化方法的几种变体,以加速和扩展大型语言模型的训练。这些优化例程的性能对超参数(HP)的选择高度敏感,而针对大规模模型进行调整的计算成本很高。最大更新参数化$(μ$P$)$是一组缩放规则,其旨在使最佳HP独立于模型大小,从而允许在较小(计算上更便宜)模型上调谐的HP被转移到训练较大的目标模型。尽管SGD和Adam的结果很有希望,但推导其他优化器的$μ$P是具有挑战性的,因为底层的张量编程方法很难掌握。最近的工作,介绍了光谱条件作为替代张量程序的基础上,我们提出了一个新的框架,以获得更广泛的优化类,包括AdamW,ADOPT,LAMB,索菲亚,洗发水和μ子$μ$P。我们在多个基准模型上实现了我们的$μ$P推导,并证明了上述优化器在增加模型宽度时的zero-shot学习率转移。此外,我们提供了经验的深度缩放参数化这些优化的见解。
摘要:Several variations of adaptive first-order and second-order optimization methods have been proposed to accelerate and scale the training of large language models. The performance of these optimization routines is highly sensitive to the choice of hyperparameters (HPs), which are computationally expensive to tune for large-scale models. Maximal update parameterization $(μ$P$)$ is a set of scaling rules which aims to make the optimal HPs independent of the model size, thereby allowing the HPs tuned on a smaller (computationally cheaper) model to be transferred to train a larger, target model. Despite promising results for SGD and Adam, deriving $μ$P for other optimizers is challenging because the underlying tensor programming approach is difficult to grasp. Building on recent work that introduced spectral conditions as an alternative to tensor programs, we propose a novel framework to derive $μ$P for a broader class of optimizers, including AdamW, ADOPT, LAMB, Sophia, Shampoo and Muon. We implement our $μ$P derivations on multiple benchmark models and demonstrate zero-shot learning rate transfer across increasing model width for the above optimizers. Further, we provide empirical insights into depth-scaling parameterization for these optimizers.
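作为$μ$P式超参数迁移的一个粗略示意(我们自己的简化,参考Adam下常见的$μ$P处方,并非论文的谱条件推导;mup_adam_lrs 为示例函数名):隐藏层学习率按 base_width/width 缩放,使小模型上调好的学习率可以复用到更大宽度。

```python
def mup_adam_lrs(base_lr, base_width, width):
    """Per-layer-group LRs under a common muP-style prescription for Adam."""
    ratio = base_width / width
    return {
        "input":  base_lr,           # embedding-like layers keep the base LR
        "hidden": base_lr * ratio,   # matrix-like layers scale as 1/width
        "output": base_lr * ratio,   # readout also scales down with width
    }

# Tune at width 256, then transfer zero-shot to width 4096.
lrs = mup_adam_lrs(base_lr=1e-3, base_width=256, width=4096)
print(lrs["hidden"])  # 1e-3 * 256/4096 = 6.25e-05
```

这样,在宽度增大16倍时,矩阵型参数的学习率自动缩小16倍,而宽度为 base_width 时各组学习率恰好退化为调好的 base_lr。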
【8】On the Generalization Behavior of Deep Residual Networks From a Dynamical System Perspective
标题:从动力系统视角看深度残差网络的泛化行为
链接:https://arxiv.org/abs/2602.20921
作者:Jinshu Huang,Mingfei Sun,Chunlin Wu
摘要:深度神经网络(DNN)极大地推进了机器学习,其中模型深度在其成功中发挥了核心作用。动力系统建模方法最近已成为一个强大的框架,为DNN的结构和学习行为提供了新的数学见解。在这项工作中,我们通过结合Rademacher复杂度、动力系统的流映射以及ResNet在深层极限中的收敛行为,建立了离散和连续时间残差网络(ResNets)的泛化误差界。所得界关于训练样本数$S$的阶为$O(1/\sqrt{S})$,并包含一个依赖于结构的负项,在较温和的假设下给出深度一致且渐近的泛化界。这些发现为离散和连续时间ResNet的泛化提供了统一理解,有助于缩小离散与连续时间设置之间在样本复杂度阶数和假设上的差距。
摘要:Deep neural networks (DNNs) have significantly advanced machine learning, with model depth playing a central role in their successes. The dynamical system modeling approach has recently emerged as a powerful framework, offering new mathematical insights into the structure and learning behavior of DNNs. In this work, we establish generalization error bounds for both discrete- and continuous-time residual networks (ResNets) by combining Rademacher complexity, flow maps of dynamical systems, and the convergence behavior of ResNets in the deep-layer limit. The resulting bounds are of order $O(1/\sqrt{S})$ with respect to the number of training samples $S$, and include a structure-dependent negative term, yielding depth-uniform and asymptotic generalization bounds under milder assumptions. These findings provide a unified understanding of generalization across both discrete- and continuous-time ResNets, helping to close the gap in both the order of sample complexity and assumptions between the discrete- and continuous-time settings.
【9】Regret-Guided Search Control for Efficient Learning in AlphaZero
标题:遗憾引导搜索控制在AlphaZero中实现高效学习
链接:https://arxiv.org/abs/2602.20809
作者:Yun-Jui Tsai,Wei-Yu Chen,Yan-Ru Ju,Yu-Hung Chang,Ti-Rong Wu
备注:Accepted by the Fourteenth International Conference on Learning Representations (ICLR 2026)
摘要:强化学习(RL)代理实现了卓越的性能,但仍然远不如人类的学习效率。虽然RL代理需要大量的自我游戏来提取有用的信号,但人类通常只需要几个游戏,通过反复重新访问发生错误的状态来快速改进。这种被称为搜索控制的想法旨在从有价值的状态重新开始,而不是总是从初始状态重新开始。在AlphaZero中,之前的工作Go-Exploit通过从自我游戏或搜索树中采样过去的状态来应用这个想法,但它平等地对待所有状态,而不管它们的学习潜力如何。我们提出了后悔引导搜索控制(RGSC),它扩展了AlphaZero的后悔网络,学习识别高后悔状态,其中代理的评估与实际结果最不一致。这些状态从自我游戏轨迹和MCTS节点收集,存储在优先后悔缓冲区中,并作为新的起始位置重复使用。在9x9 Go,10x10 Othello和11x11 Hex中,RGSC的平均性能分别超过AlphaZero和Go-Exploit 77和89 Elo。当在经过良好训练的9x9 Go模型上训练时,RGSC进一步将对KataGo的胜率从69.3%提高到78.2%,而两个基线都没有改善。这些结果表明,RGSC提供了一种有效的搜索控制机制,提高了AlphaZero训练的效率和鲁棒性。我们的代码可在https://rlg.iis.sinica.edu.tw/papers/rgsc上获得。
摘要:Reinforcement learning (RL) agents achieve remarkable performance but remain far less learning-efficient than humans. While RL agents require extensive self-play games to extract useful signals, humans often need only a few games, improving rapidly by repeatedly revisiting states where mistakes occurred. This idea, known as search control, aims to restart from valuable states rather than always from the initial state. In AlphaZero, prior work Go-Exploit applies this idea by sampling past states from self-play or search trees, but it treats all states equally, regardless of their learning potential. We propose Regret-Guided Search Control (RGSC), which extends AlphaZero with a regret network that learns to identify high-regret states, where the agent's evaluation diverges most from the actual outcome. These states are collected from both self-play trajectories and MCTS nodes, stored in a prioritized regret buffer, and reused as new starting positions. Across 9x9 Go, 10x10 Othello, and 11x11 Hex, RGSC outperforms AlphaZero and Go-Exploit by an average of 77 and 89 Elo, respectively. When training on a well-trained 9x9 Go model, RGSC further improves the win rate against KataGo from 69.3% to 78.2%, while both baselines show no improvement. These results demonstrate that RGSC provides an effective mechanism for search control, improving both efficiency and robustness of AlphaZero training. Our code is available at https://rlg.iis.sinica.edu.tw/papers/rgsc.
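下面示意"按遗憾优先采样"的常见做法(假设采用优先回放式的比例采样;RGSC缓冲区的具体实现可能不同,regret_probs 为示例名称):遗憾越高的状态被选为新起始位置的概率越大。

```python
import numpy as np

def regret_probs(regrets, alpha=1.0, eps=1e-6):
    """Sampling probabilities proportional to (regret + eps)^alpha."""
    p = (np.asarray(regrets, dtype=float) + eps) ** alpha
    return p / p.sum()

# Regret estimates for four stored states (the last one was well understood).
regrets = [0.1, 0.5, 2.0, 0.0]
p = regret_probs(regrets)

assert abs(p.sum() - 1.0) < 1e-9
assert p.argmax() == 2                   # highest-regret state is sampled most

rng = np.random.default_rng(0)
start_state = rng.choice(len(p), p=p)    # pick the next self-play start position
```

alpha 控制优先程度(alpha=0退化为均匀采样),eps 保证零遗憾状态仍有非零概率被重访。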
【10】Exploring the Impact of Parameter Update Magnitude on Forgetting and Generalization of Continual Learning
标题:探索参数更新幅度对持续学习遗忘和概括的影响
链接:https://arxiv.org/abs/2602.20796
作者:JinLi He,Liang Bai,Xian Yang
摘要:参数更新的幅度被认为是持续学习的关键因素。然而,大多数现有研究集中在设计不同的更新策略上,对其内在机制的理论理解仍然有限。为此,我们从参数更新幅度的角度刻画模型的遗忘,并将其形式化为参数空间中特定任务漂移所引起的知识退化;由于以往研究假设统一的参数空间,这一现象未被完全刻画。通过推导最小化遗忘的最优参数更新幅度,我们在约束参数更新的优化框架内统一了冻结训练与初始化训练这两种代表性更新范式。我们的理论结果进一步表明,对于参数距离较小的序列任务,冻结训练比初始化训练表现出更好的泛化和更少的遗忘。这些理论洞见启发了一种新的混合参数更新策略,可基于梯度方向自适应调整更新幅度。在深度神经网络上的实验表明,这种混合方法优于标准训练策略,为设计高效、可扩展的持续学习算法提供了新的理论视角和实践启发。
摘要:The magnitude of parameter updates is considered a key factor in continual learning. However, most existing studies focus on designing diverse update strategies, while a theoretical understanding of the underlying mechanisms remains limited. Therefore, we characterize a model's forgetting from the perspective of parameter update magnitude and formalize it as knowledge degradation induced by task-specific drift in the parameter space, which has not been fully captured in previous studies due to their assumption of a unified parameter space. By deriving the optimal parameter update magnitude that minimizes forgetting, we unify two representative update paradigms, frozen training and initialized training, within an optimization framework for constrained parameter updates. Our theoretical results further reveal that sequence tasks with small parameter distances exhibit better generalization and less forgetting under frozen training rather than initialized training. These theoretical insights inspire a novel hybrid parameter update strategy that adaptively adjusts update magnitude based on gradient directions. Experiments on deep neural networks demonstrate that this hybrid approach outperforms standard training strategies, providing new theoretical perspectives and practical inspiration for designing efficient and scalable continual learning algorithms.
【11】UrbanFM: Scaling Urban Spatio-Temporal Foundation Models
标题:UrbanFM:缩放城市时空基础模型
链接:https://arxiv.org/abs/2602.20677
作者:Wei Chen,Yuqian Wu,Junle Chen,Xiaofang Zhou,Yuxuan Liang
摘要:城市系统作为动态复杂系统,不断产生时空数据流,其中编码着人类流动与城市演化的基本规律。虽然科学智能(AI for Science)已经见证了基础模型在基因组学、气象学等学科中的变革力量,但由于"特定场景"的模型过拟合于特定区域或任务,城市计算仍然是碎片化的,这阻碍了模型的泛化能力。为弥合这一差距并推进城市系统的时空基础模型,我们以扩展(scaling)为核心视角,系统研究两个关键问题:扩展什么以及如何扩展。基于第一性原理分析,我们确定了三个关键维度:异质性、相关性和动态性,使这些原则与城市时空数据的基本科学属性相一致。具体而言,为通过数据扩展解决异质性,我们构建了WorldST。这一数十亿规模的语料库将来自全球100多个城市的多种物理信号(如交通流量和速度)标准化为统一的数据格式。为通过计算扩展建模相关性,我们引入MiniST单元,这是一种新颖的切分机制,将连续时空场离散化为可学习的计算单元,以统一基于网格与基于传感器的观测表示。最后,为通过架构扩展处理动态性,我们提出UrbanFM,一个归纳偏置极少的极简自注意力架构,可从海量数据中自主学习动态时空依赖。此外,我们建立了EvalST,迄今规模最大的城市时空基准。大量实验表明,UrbanFM在未见城市和任务上实现了显著的zero-shot泛化,标志着迈向大规模城市时空基础模型的关键第一步。
摘要:Urban systems, as dynamic complex systems, continuously generate spatio-temporal data streams that encode the fundamental laws of human mobility and city evolution. While AI for Science has witnessed the transformative power of foundation models in disciplines like genomics and meteorology, urban computing remains fragmented due to "scenario-specific" models, which are overfitted to specific regions or tasks, hindering their generalizability. To bridge this gap and advance spatio-temporal foundation models for urban systems, we adopt scaling as the central perspective and systematically investigate two key questions: what to scale and how to scale. Grounded in first-principles analysis, we identify three critical dimensions: heterogeneity, correlation, and dynamics, aligning these principles with the fundamental scientific properties of urban spatio-temporal data. Specifically, to address heterogeneity through data scaling, we construct WorldST. This billion-scale corpus standardizes diverse physical signals, such as traffic flow and speed, from over 100 global cities into a unified data format. To enable computation scaling for modeling correlations, we introduce the MiniST unit, a novel split mechanism that discretizes continuous spatio-temporal fields into learnable computational units to unify representations of grid-based and sensor-based observations. Finally, addressing dynamics via architecture scaling, we propose UrbanFM, a minimalist self-attention architecture designed with limited inductive biases to autonomously learn dynamic spatio-temporal dependencies from massive data. Furthermore, we establish EvalST, the largest-scale urban spatio-temporal benchmark to date. Extensive experiments demonstrate that UrbanFM achieves remarkable zero-shot generalization across unseen cities and tasks, marking a pivotal first step toward large-scale urban spatio-temporal foundation models.
【12】Sparse Bayesian Deep Functional Learning with Structured Region Selection
标题:具有结构化区域选择的稀疏Bayesian深度功能学习
链接:https://arxiv.org/abs/2602.20651
作者:Xiaoxian Zhu,Yingmeng Li,Shuangge Ma,Mengyun Wu
摘要:在ECG监测、神经影像、可穿戴传感和工业设备诊断等现代应用中,复杂且连续结构化的数据无处不在,为函数型数据分析带来了挑战与机遇。然而,现有方法面临一个关键权衡:传统函数型模型受限于线性,而深度学习方法缺乏针对稀疏效应的可解释区域选择。为弥合这些差距,我们提出稀疏贝叶斯函数型深度神经网络(sBayFDNN)。它通过深度贝叶斯架构学习自适应的函数型嵌入,以捕捉复杂的非线性关系;同时,结构化先验使模型能够在量化不确定性的情况下,对有影响的区域进行可解释的逐区域选择。理论上,我们建立了严格的逼近误差界、后验一致性和区域选择一致性。这些结果为贝叶斯深度函数型模型提供了首个理论保证,确保其可靠性与统计严谨性。实证方面,全面的模拟与真实数据研究证实了sBayFDNN的有效性与优越性。关键的是,sBayFDNN擅长识别复杂依赖以做出准确预测,并能更精确地识别具有函数意义的区域,这些能力从根本上超越了现有方法。
摘要:In modern applications such as ECG monitoring, neuroimaging, wearable sensing, and industrial equipment diagnostics, complex and continuously structured data are ubiquitous, presenting both challenges and opportunities for functional data analysis. However, existing methods face a critical trade-off: conventional functional models are limited by linearity, whereas deep learning approaches lack interpretable region selection for sparse effects. To bridge these gaps, we propose a sparse Bayesian functional deep neural network (sBayFDNN). It learns adaptive functional embeddings through a deep Bayesian architecture to capture complex nonlinear relationships, while a structured prior enables interpretable, region-wise selection of influential domains with quantified uncertainty. Theoretically, we establish rigorous approximation error bounds, posterior consistency, and region selection consistency. These results provide the first theoretical guarantees for a Bayesian deep functional model, ensuring its reliability and statistical rigor. Empirically, comprehensive simulations and real-world studies confirm the effectiveness and superiority of sBayFDNN. Crucially, sBayFDNN excels in recognizing intricate dependencies for accurate predictions and more precisely identifies functionally meaningful regions, capabilities fundamentally beyond existing approaches.
【13】Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning
标题:停止-思考-AutoRegress:具有潜在扩散规划的语言建模
链接:https://arxiv.org/abs/2602.20528
作者:Justin Lovelace,Christian Belardi,Sofian Zalouk,Adhitya Polavaram,Srivatsa Kundurthy,Kilian Q. Weinberger
备注:COLM 2025
摘要:Stop-Think-AutoRegress语言扩散模型(STAR-LDM)将潜在扩散规划与自回归生成相结合。与仅能逐token决策的传统自回归语言模型不同,STAR-LDM包含一个"思考"阶段:暂停生成,在继续之前通过扩散来完善语义规划。这使得模型可以在提交离散token之前,在连续空间中进行全局规划。评估表明,STAR-LDM在语言理解基准上显著优于同等规模的模型,并在叙事连贯性与常识推理的LLM-as-judge对比中取得超过70%的胜率。该架构还支持通过轻量级分类器进行直接控制,从而无需重新训练模型即可对属性进行细粒度引导,同时在流畅性与控制的权衡上优于专门方法。
摘要:The Stop-Think-AutoRegress Language Diffusion Model (STAR-LDM) integrates latent diffusion planning with autoregressive generation. Unlike conventional autoregressive language models limited to token-by-token decisions, STAR-LDM incorporates a "thinking" phase that pauses generation to refine a semantic plan through diffusion before continuing. This enables global planning in continuous space prior to committing to discrete tokens. Evaluations show STAR-LDM significantly outperforms similar-sized models on language understanding benchmarks and achieves $>70\%$ win rates in LLM-as-judge comparisons for narrative coherence and commonsense reasoning. The architecture also allows straightforward control through lightweight classifiers, enabling fine-grained steering of attributes without model retraining while maintaining better fluency-control trade-offs than specialized approaches.
【14】A Generalized Apprenticeship Learning Framework for Capturing Evolving Student Pedagogical Strategies
标题:用于捕捉不断发展的学生教学策略的广义学徒学习框架
链接:https://arxiv.org/abs/2602.20527
作者:Md Mirajul Islam,Xi Yang,Adittya Soukarjya Saha,Rajesh Debnath,Min Chi
备注:16 pages
摘要:强化学习(RL)和深度强化学习(DRL)近年来发展迅速,并已成功应用于智能辅导系统(ITS)等电子学习环境。尽管成就显著,但由于样本效率低下和奖励函数设计困难等主要挑战,DRL在教育技术中的广泛应用仍受限制。相比之下,学徒学习(AL)利用少量专家演示来推断专家的潜在奖励函数,并导出能够泛化并复现最优行为的决策策略。在这项工作中,我们利用一个广义AL框架THEMES,通过刻画专家学生学习过程的复杂性(其中多个奖励函数可能随时间动态演化)来诱导有效的教学策略。我们将THEMES与六个最先进的基线进行对比评估,展示了其优越性能,凸显了其作为诱导有效教学策略的有力替代方案的潜力;仅使用上一学期的18条轨迹来预测学生在之后学期的教学决策,它即可取得AUC 0.899、Jaccard 0.653的高性能。
摘要:Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) have advanced rapidly in recent years and have been successfully applied to e-learning environments like intelligent tutoring systems (ITSs). Despite great success, the broader application of DRL to educational technologies has been limited due to major challenges such as sample inefficiency and difficulty designing the reward function. In contrast, Apprenticeship Learning (AL) uses a few expert demonstrations to infer the expert's underlying reward functions and derive decision-making policies that generalize and replicate optimal behavior. In this work, we leverage a generalized AL framework, THEMES, to induce effective pedagogical policies by capturing the complexities of the expert student learning process, where multiple reward functions may dynamically evolve over time. We evaluate the effectiveness of THEMES against six state-of-the-art baselines, demonstrating its superior performance and highlighting its potential as a powerful alternative for inducing effective pedagogical policies and show that it can achieve high performance, with an AUC of 0.899 and a Jaccard of 0.653, using only 18 trajectories of a previous semester to predict student pedagogical decisions in a later semester.
【15】Elimination-compensation pruning for fully-connected neural networks
标题:全连接神经网络的消除补偿修剪
链接:https://arxiv.org/abs/2602.20467
作者:Enrico Ballini,Luca Muscarnera,Alessio Fumagalli,Anna Scotti,Francesco Regazzoni
摘要:深度神经网络在捕捉大型含噪数据集中复杂模式方面的出色能力,通常与其庞大的假设空间相关,进而与刻画模型架构的大量参数相关。剪枝技术已被证明是提取神经网络参数稀疏表示的有效工具,可在压缩与信息保留之间仔细权衡。然而,剪枝背后的一个基本假设是:可舍弃的权重应对网络误差影响很小,而高度重要的权重则对推断有更大的影响。我们认为这一思想可以推广:如果一个权重不是被简单移除,而是同时通过扰动相邻偏置加以补偿(偏置并不计入网络稀疏度),会怎样?我们的工作提出了一种新的剪枝方法,其中每个权重的重要性度量,是在对其相邻偏置施加最优扰动后根据输出行为计算的,并可通过自动微分高效求得。这些扰动随后可在移除每个权重后彼此独立地直接施加。在推导出上述量的解析表达式后,我们开展数值实验,将该技术与若干最流行的剪枝策略进行对比,证明了所提方法在非常多样的机器学习场景中的内在效率。最后,我们讨论了研究发现并给出了其理论意义。
摘要:The unmatched ability of Deep Neural Networks in capturing complex patterns in large and noisy datasets is often associated with their large hypothesis space, and consequently to the vast amount of parameters that characterize model architectures. Pruning techniques affirmed themselves as valid tools to extract sparse representations of neural networks parameters, carefully balancing between compression and preservation of information. However, a fundamental assumption behind pruning is that expendable weights should have small impact on the error of the network, while highly important weights should tend to have a larger influence on the inference. We argue that this idea could be generalized; what if a weight is not simply removed but also compensated with a perturbation of the adjacent bias, which does not contribute to the network sparsity? Our work introduces a novel pruning method in which the importance measure of each weight is computed considering the output behavior after an optimal perturbation of its adjacent bias, efficiently computable by automatic differentiation. These perturbations can be then applied directly after the removal of each weight, independently of each other. After deriving analytical expressions for the aforementioned quantities, numerical experiments are conducted to benchmark this technique against some of the most popular pruning strategies, demonstrating an intrinsic efficiency of the proposed approach in very diverse machine learning scenarios. Finally, our findings are discussed and the theoretical implications of our results are presented.
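As a toy illustration of the elimination-compensation idea for a single linear layer y = Wx + b: removing weight W[i, j] while adding its average contribution over the data to the adjacent bias b[i] leaves the layer's mean output unchanged. This first-order sketch is an assumption made for illustration, not the paper's derivation (which optimizes the bias perturbation via automatic differentiation).

```python
import numpy as np

def prune_with_bias_compensation(W, b, X, i, j):
    """Remove weight W[i, j] from a linear layer y = W @ x + b and
    compensate the adjacent bias b[i] so that the expected output over
    the data X (rows are samples) is unchanged. Illustrative sketch."""
    W, b = W.copy(), b.copy()
    # Expected contribution of the removed connection over the dataset.
    b[i] += W[i, j] * X[:, j].mean()
    W[i, j] = 0.0
    return W, b
```

Because the compensation equals the pruned connection's mean contribution, the mean output over the dataset is preserved exactly, even though individual samples shift.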
【16】A Long-Short Flow-Map Perspective for Drifting Models
标题:漂流模型的长短流图透视
链接:https://arxiv.org/abs/2602.20463
作者:Zhiqi Li,Bo Zhu
备注:25 pages, 7 figures
摘要:本文通过一个半群相容的长短流图分解,对漂移模型(Drifting Model)进行了重新诠释。我们表明,一个全局输运过程可以分解为一个长程流图,再接一个短时终端流图,后者具有闭式的最优速度表示;当终端区间长度趋于零时,恰好恢复出漂移场以及流图一致性所需的保守脉冲项。基于这一视角,我们提出了一种新的似然学习公式,将长短流图分解与输运下的密度演化对齐。我们通过理论分析和基准测试上的实证评估验证了该框架,并进一步给出了特征空间优化的理论解释,同时指出了若干有待未来研究的开放问题。
摘要:This paper provides a reinterpretation of the Drifting Model~\cite{deng2026generative} through a semigroup-consistent long-short flow-map factorization. We show that a global transport process can be decomposed into a long-horizon flow map followed by a short-time terminal flow map admitting a closed-form optimal velocity representation, and that taking the terminal interval length to zero recovers exactly the drifting field together with a conservative impulse term required for flow-map consistency. Based on this perspective, we propose a new likelihood learning formulation that aligns the long-short flow-map decomposition with density evolution under transport. We validate the framework through both theoretical analysis and empirical evaluations on benchmark tests, and further provide a theoretical interpretation of the feature-space optimization while highlighting several open problems for future study.
【17】Diffusion Modulation via Environment Mechanism Modeling for Planning
标题:基于环境机制的扩散调节规划模型
链接:https://arxiv.org/abs/2602.20422
作者:Hanping Zhang,Yuhong Guo
摘要:扩散模型在离线强化学习(RL)规划的轨迹生成方面表现出了很好的能力。然而,传统的基于扩散的规划方法往往无法考虑这样一个事实,即在RL中生成轨迹需要过渡之间的唯一一致性,以确保在真实环境中的一致性。这种疏忽可能导致生成的轨迹与真实环境的基本机制之间存在相当大的差异。为了解决这个问题,我们提出了一种新的基于扩散的规划方法,称为通过环境机制建模(DMEMM)的扩散调制。DMEMM通过结合关键的强化学习环境机制,特别是过渡动态和奖励函数来调节扩散模型训练。实验结果表明,DMEMM实现了最先进的性能与离线强化学习规划。
摘要:Diffusion models have shown promising capabilities in trajectory generation for planning in offline reinforcement learning (RL). However, conventional diffusion-based planning methods often fail to account for the fact that generating trajectories in RL requires unique consistency between transitions to ensure coherence in real environments. This oversight can result in considerable discrepancies between the generated trajectories and the underlying mechanisms of a real environment. To address this problem, we propose a novel diffusion-based planning method, termed as Diffusion Modulation via Environment Mechanism Modeling (DMEMM). DMEMM modulates diffusion model training by incorporating key RL environment mechanisms, particularly transition dynamics and reward functions. Experimental results demonstrate that DMEMM achieves state-of-the-art performance for planning with offline reinforcement learning.
【18】Wasserstein Distributionally Robust Online Learning
标题:Wasserstein分布稳健的在线学习
链接:https://arxiv.org/abs/2602.20403
作者:Guixian Chen,Salar Fattahi,Soroosh Shafiee
摘要:我们研究分布鲁棒在线学习,其中风险厌恶的学习者顺序更新决策,以抵御从以过去观测为中心的Wasserstein模糊集中选取的最坏情况分布。虽然这一范式在离线情形下已通过Wasserstein分布鲁棒优化(DRO)得到充分理解,但其在线扩展在收敛性和计算上都提出了重大挑战。本文着手解决这些挑战。首先,我们将该问题表述为决策者与选择最坏情况分布的对手之间的在线鞍点随机博弈,并提出一个通用框架,收敛到与相应离线Wasserstein DRO问题的解一致的鲁棒纳什均衡。其次,我们解决主要的计算瓶颈,即最坏情况期望问题的重复求解。针对重要的一类分段凹损失函数,我们提出了一种量身定制的算法,利用问题的几何结构,相比Gurobi等最先进的求解器实现大幅加速。其关键洞见是在最坏情况期望问题(一个本质上无限维的优化问题)与一个经典且易处理的预算分配问题之间建立了新联系,这一联系本身也具有独立的价值。
摘要:We study distributionally robust online learning, where a risk-averse learner updates decisions sequentially to guard against worst-case distributions drawn from a Wasserstein ambiguity set centered at past observations. While this paradigm is well understood in the offline setting through Wasserstein Distributionally Robust Optimization (DRO), its online extension poses significant challenges in both convergence and computation. In this paper, we address these challenges. First, we formulate the problem as an online saddle-point stochastic game between a decision maker and an adversary selecting worst-case distributions, and propose a general framework that converges to a robust Nash equilibrium coinciding with the solution of the corresponding offline Wasserstein DRO problem. Second, we address the main computational bottleneck, which is the repeated solution of worst-case expectation problems. For the important class of piecewise concave loss functions, we propose a tailored algorithm that exploits problem geometry to achieve substantial speedups over state-of-the-art solvers such as Gurobi. The key insight is a novel connection between the worst-case expectation problem, an inherently infinite-dimensional optimization problem, and a classical and tractable budget allocation problem, which is of independent interest.
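A worked special case of why the inner worst-case expectation can collapse to something tractable: for an L-Lipschitz loss on an unbounded domain, the supremum over a type-1 Wasserstein ball of radius rho around the empirical distribution has the classical closed form E_P[loss] + rho * L. This is a known identity used here only for intuition; the paper's budget-allocation algorithm for piecewise concave losses is more general than this sketch.

```python
import numpy as np

def worst_case_expectation_lipschitz(losses, lip, radius):
    """Closed-form worst-case expectation over a type-1 Wasserstein ball
    for an L-Lipschitz loss: mean(losses) + radius * L. The transport
    budget (radius per unit mass) is spent where loss grows fastest,
    which is the budget-allocation view in miniature."""
    return float(np.mean(losses) + radius * lip)
```

For richer loss classes the inner problem no longer has this one-line form, which is where the paper's tailored algorithm comes in.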
【19】Quantitative Approximation Rates for Group Equivariant Learning
标题:群等变学习的定量逼近率
链接:https://arxiv.org/abs/2602.20370
作者:Jonathan W. Siegel,Snir Hordan,Hannah Lawrence,Ali Syed,Nadav Dym
摘要:通用逼近定理表明,神经网络可以在紧集上逼近任何连续函数。逼近理论的后续工作给出了ReLU网络在$α$-Hölder函数类$f:[0,1]^N \to \mathbb{R}$上的定量逼近率。本文的目标是在群等变学习的背景下给出类似的定量逼近结果,其中待学习的$α$-Hölder函数已知服从某些群对称性。虽然文献中对等变模型的通用逼近性质有大量研究兴趣,但针对等变模型的定量逼近结果却非常少。 在本文中,我们通过推导若干重要的群等变与群不变架构的定量逼近率来弥合这一差距。我们考虑的架构包括:置换不变的Deep Sets架构;置换等变的Sumformer与Transformer架构;基于帧平均(frame averaging)的不变网络实现的对置换与刚体运动的联合不变性;以及一般的双Lipschitz不变模型。总体而言,我们证明了规模相同的ReLU MLP与等变架构在等变函数上具有同等的表达能力。因此,硬编码等变性不会导致这些模型表达能力或逼近能力的损失。
摘要:The universal approximation theorem establishes that neural networks can approximate any continuous function on a compact set. Later works in approximation theory provide quantitative approximation rates for ReLU networks on the class of $α$-Hölder functions $f: [0,1]^N \to \mathbb{R}$. The goal of this paper is to provide similar quantitative approximation results in the context of group equivariant learning, where the learned $α$-Hölder function is known to obey certain group symmetries. While there has been much interest in the literature in understanding the universal approximation properties of equivariant models, very few quantitative approximation results are known for equivariant models. In this paper, we bridge this gap by deriving quantitative approximation rates for several prominent group-equivariant and invariant architectures. The architectures that we consider include: the permutation-invariant Deep Sets architecture; the permutation-equivariant Sumformer and Transformer architectures; joint invariance to permutations and rigid motions using invariant networks based on frame averaging; and general bi-Lipschitz invariant models. Overall, we show that equally-sized ReLU MLPs and equivariant architectures are equally expressive over equivariant functions. Thus, hard-coding equivariance does not result in a loss of expressivity or approximation power in these models.
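The permutation-invariant Deep Sets form mentioned above, f(X) = rho(sum_i phi(x_i)), can be sketched directly; here phi and rho stand in for learned MLPs.

```python
import numpy as np

def deep_sets(X, phi, rho):
    """Permutation-invariant Deep Sets form: apply phi to each set
    element, sum the results, then apply rho. Summation makes the
    output independent of element order."""
    return rho(sum(phi(x) for x in X))
```

Because the sum is order-independent, any permutation of the input set yields the same output, which is exactly the invariance these approximation rates are stated over.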
【20】Momentum Guidance: Plug-and-Play Guidance for Flow Models
标题:动量指导:流程模型的即插即用指导
链接:https://arxiv.org/abs/2602.20360
作者:Runlong Liao,Jian Yu,Baiyu Su,Chi Zhang,Lizhang Chen,Qiang Liu
摘要:基于流的生成模型已成为高质量生成建模的强大框架,但预训练模型很少以其原始条件形式使用:由于神经网络的平滑效应,无引导的条件样本通常显得弥散且缺乏细粒度细节。现有的引导技术,如无分类器引导(CFG),提高了保真度,但使推理成本翻倍,并通常降低样本多样性。我们引入动量引导(Momentum Guidance, MG),一种利用ODE轨迹本身的新引导维度。MG使用过去速度的指数移动平均来外推当前速度,并保持标准的每步一次评估成本。该方法无需额外计算即可达到标准引导的效果,与CFG结合还能进一步提升质量。实验证明了MG在多个基准上的有效性。具体而言,在ImageNet-256上,在各种采样设置下,MG使FID在无CFG时平均提升36.68%,有CFG时平均提升25.52%,并在64个采样步数下达到1.597的FID。对Stable Diffusion 3和FLUX.1-dev等大型基于流的模型的评估进一步证实了在标准指标上的一致质量提升。
摘要:Flow-based generative models have become a strong framework for high-quality generative modeling, yet pretrained models are rarely used in their vanilla conditional form: conditional samples without guidance often appear diffuse and lack fine-grained detail due to the smoothing effects of neural networks. Existing guidance techniques such as classifier-free guidance (CFG) improve fidelity but double the inference cost and typically reduce sample diversity. We introduce Momentum Guidance (MG), a new dimension of guidance that leverages the ODE trajectory itself. MG extrapolates the current velocity using an exponential moving average of past velocities and preserves the standard one-evaluation-per-step cost. It matches the effect of standard guidance without extra computation and can further improve quality when combined with CFG. Experiments demonstrate MG's effectiveness across benchmarks. Specifically, on ImageNet-256, MG achieves average improvements in FID of 36.68% without CFG and 25.52% with CFG across various sampling settings, attaining an FID of 1.597 at 64 sampling steps. Evaluations on large flow-based models like Stable Diffusion 3 and FLUX.1-dev further confirm consistent quality enhancements across standard metrics.
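A minimal sketch of the momentum-guidance idea on top of an Euler flow sampler: keep an exponential moving average of past velocities and extrapolate the current velocity along the momentum direction, at the stated one velocity evaluation per step. The decay `beta` and guidance strength `omega` are illustrative hyperparameters, not the paper's values.

```python
import numpy as np

def sample_with_momentum_guidance(velocity_fn, x0, n_steps=64,
                                  beta=0.9, omega=0.5):
    """Euler flow sampler with a momentum-style guidance term: the
    current velocity is pushed away from the EMA of past velocities,
    emphasizing recent changes in direction. One velocity evaluation
    per step, matching the cost of unguided sampling."""
    x, ema = x0.astype(float), None
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity_fn(x, t)                 # one evaluation per step
        ema = v if ema is None else beta * ema + (1 - beta) * v
        v_guided = v + omega * (v - ema)      # extrapolate along momentum
        x = x + dt * v_guided
    return x
```

With a constant velocity field the guidance term vanishes (v equals its EMA), so the sampler reduces to plain Euler integration; guidance only activates where the velocity changes along the trajectory.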
【21】In-context Pre-trained Time-Series Foundation Models adapt to Unseen Tasks
标题:背景下预训练的时间序列基础模型适应不可见的任务
链接:https://arxiv.org/abs/2602.20307
作者:Shangqing Xu,Harshavardhan Kamarthi,Haoxin Liu,B. Aditya Prakash
摘要:时间序列基础模型(TSFM)在不同数据集和任务中表现出强大的泛化能力。然而,现有基础模型通常是为提升特定任务的性能而预训练的,在不微调的情况下往往难以泛化到未见任务。为解决这一限制,我们提议为TSFM增加上下文学习(ICL)能力,使其能够通过动态适应上下文中提供的输入-输出关系来执行测试时推理。我们的框架,上下文时间序列预训练(ICTP),重组原始预训练数据,为骨干TSFM配备ICL能力,从而适应未见任务。实验表明,ICTP在无需微调的情况下,将最先进TSFM在未见任务上的性能提升约11.4%。
摘要:Time-series foundation models (TSFMs) have demonstrated strong generalization capabilities across diverse datasets and tasks. However, existing foundation models are typically pre-trained to enhance performance on specific tasks and often struggle to generalize to unseen tasks without fine-tuning. To address this limitation, we propose augmenting TSFMs with In-Context Learning (ICL) capabilities, enabling them to perform test-time inference by dynamically adapting to input-output relationships provided within the context. Our framework, In-Context Time-series Pre-training (ICTP), restructures the original pre-training data to equip the backbone TSFM with ICL capabilities, enabling adaptation to unseen tasks. Experiments demonstrate that ICTP improves the performance of state-of-the-art TSFMs by approximately 11.4% on unseen tasks without requiring fine-tuning.
【22】Multilevel Determinants of Overweight and Obesity Among U.S. Children Aged 10-17: Comparative Evaluation of Statistical and Machine Learning Approaches Using the 2021 National Survey of Children's Health
标题:美国10-17岁儿童超重和肥胖的多水平决定因素:使用2021年全国儿童健康调查对统计和机器学习方法进行比较评估
链接:https://arxiv.org/abs/2602.20303
作者:Joyanta Jyoti Mondal
摘要:背景资料:儿童和青少年超重和肥胖仍然是美国主要的公共卫生问题,并受到行为,家庭和社区因素的影响。它们在总体水平上的联合预测结构仍不完全特征化。目的:该研究旨在确定美国青少年超重和肥胖的多层次预测因素,并比较统计,机器学习和深度学习模型的预测性能,校准和亚组公平性。数据和方法:我们分析了2021年全国儿童健康调查的18,792名10-17岁儿童。超重/肥胖使用BMI类别定义。预测因素包括饮食、体育活动、睡眠、父母压力、社会经济条件、不良经历和邻里特征。模型包括逻辑回归,随机森林,梯度提升,XGBoost,LightGBM,多层感知器和TabNet。使用AUC、准确度、精确度、召回率、F1评分和Brier评分评估性能。结果:分辨力范围为0.66 - 0.79。Logistic回归、梯度增强和MLP显示了最稳定的辨别力和校准平衡。增强和深度学习适度提高了召回率和F1分数。没有一种模式是绝对优越的。种族和贫困群体之间的表现差异在算法中持续存在。结论:增加模型复杂性比逻辑回归带来的收益有限。预测者始终跨越行为,家庭和邻里领域。持续的亚组差异表明需要提高数据质量和以公平为重点的监督,而不是更大的算法复杂性。
摘要:Background: Childhood and adolescent overweight and obesity remain major public health concerns in the United States and are shaped by behavioral, household, and community factors. Their joint predictive structure at the population level remains incompletely characterized. Objectives: The study aims to identify multilevel predictors of overweight and obesity among U.S. adolescents and compare the predictive performance, calibration, and subgroup equity of statistical, machine-learning, and deep-learning models. Data and Methods: We analyze 18,792 children aged 10-17 years from the 2021 National Survey of Children's Health. Overweight/obesity is defined using BMI categories. Predictors include diet, physical activity, sleep, parental stress, socioeconomic conditions, adverse experiences, and neighborhood characteristics. Models include logistic regression, random forest, gradient boosting, XGBoost, LightGBM, multilayer perceptron, and TabNet. Performance is evaluated using AUC, accuracy, precision, recall, F1 score, and Brier score. Results: Discrimination ranges from 0.66 to 0.79. Logistic regression, gradient boosting, and MLP show the most stable balance of discrimination and calibration. Boosting and deep learning modestly improve recall and F1 score. No model is uniformly superior. Performance disparities across race and poverty groups persist across algorithms. Conclusion: Increased model complexity yields limited gains over logistic regression. Predictors consistently span behavioral, household, and neighborhood domains. Persistent subgroup disparities indicate the need for improved data quality and equity-focused surveillance rather than greater algorithmic complexity.
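One of the calibration metrics above, the Brier score, is simple to compute: the mean squared difference between predicted probabilities and binary outcomes, with lower values indicating better calibrated predictions.

```python
import numpy as np

def brier_score(y_true, p_pred):
    """Brier score: mean squared error between predicted probabilities
    and binary outcomes (0 is perfect, 0.25 matches always predicting
    0.5)."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    return float(np.mean((p - y) ** 2))
```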
【23】Learning to Solve Complex Problems via Dataset Decomposition
标题:学习通过数据集分解解决复杂问题
链接:https://arxiv.org/abs/2602.20296
作者:Wanru Zhao,Lucas Caccia,Zhengyan Shi,Minseon Kim,Weijia Xu,Alessandro Sordoni
备注:NeurIPS 2025
摘要:课程学习是一类训练策略,它按难度组织暴露给模型的数据,逐渐从简单示例过渡到更复杂的示例。本研究探索了一种逆向课程生成方法,将复杂数据集递归分解为更简单、更易学习的组成部分。我们提出了一个教师-学生框架,其中教师具备逐步推理的能力,用于递归生成示例的更简单版本,使学生模型能够逐步掌握困难任务。我们提出了一种新的评分系统,基于数据的结构复杂度和概念深度来衡量其难度,从而在分解后的数据上构建课程。在数学数据集(MATH和AIME)以及代码生成数据集上的实验表明,与在原始数据集上的标准训练相比,用我们的方法生成的课程训练的模型表现更优。
摘要:Curriculum learning is a class of training strategies that organizes the data being exposed to a model by difficulty, gradually from simpler to more complex examples. This research explores a reverse curriculum generation approach that recursively decomposes complex datasets into simpler, more learnable components. We propose a teacher-student framework where the teacher is equipped with the ability to reason step-by-step, which is used to recursively generate easier versions of examples, enabling the student model to progressively master difficult tasks. We propose a novel scoring system to measure data difficulty based on its structural complexity and conceptual depth, allowing curriculum construction over decomposed data. Experiments on math datasets (MATH and AIME) and code generation datasets demonstrate that models trained with curricula generated by our approach exhibit superior performance compared to standard training on original datasets.
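The core curriculum-construction step, ordering examples from easy to hard by a difficulty score, reduces to a sort. The paper's score combines structural complexity and conceptual depth; any scoring function can be substituted in this sketch.

```python
def build_curriculum(examples, difficulty):
    """Order training examples from easy to hard according to a
    difficulty scoring function. The scorer is a stand-in for the
    paper's structural-complexity / conceptual-depth measure."""
    return sorted(examples, key=difficulty)
```

Training then proceeds through the sorted list, so simpler (e.g. teacher-decomposed) examples are seen before the harder originals.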
【24】MultiModalPFN: Extending Prior-Data Fitted Networks for Multimodal Tabular Learning
标题:MultiModalPFN:扩展先验数据拟合网络用于多模态表格学习
链接:https://arxiv.org/abs/2602.20223
作者:Wall Kim,Chaeyoung Song,Hanul Kim
备注:Accepted to CVPR 2026
摘要:最近,TabPFN作为表格数据的基础模型受到关注。然而,它难以整合图像和文本等异构模态,而这些模态在医疗和营销等领域很常见,从而限制了其适用性。为此,我们提出多模态先验数据拟合网络(MMPFN),将TabPFN扩展为以统一方式处理表格与非表格模态。MMPFN由各模态编码器、模态投影器和预训练基础模型组成。模态投影器是关键桥梁,将非表格嵌入转换为与表格兼容的token以便统一处理。为此,我们引入多头门控MLP和交叉注意力池化器,从非表格输入中提取更丰富的上下文,同时缓解多模态学习中的注意力失衡问题。在医疗和通用多模态数据集上的大量实验表明,MMPFN持续优于有竞争力的最先进方法,并能在表格特征之外有效利用非表格模态。这些结果凸显了将先验数据拟合网络扩展到多模态场景的前景,为异构数据学习提供了可扩展且有效的框架。源代码可在 https://github.com/too-z/MultiModalPFN 获取。
摘要:Recently, TabPFN has gained attention as a foundation model for tabular data. However, it struggles to integrate heterogeneous modalities such as images and text, which are common in domains like healthcare and marketing, thereby limiting its applicability. To address this, we present the Multi-Modal Prior-data Fitted Network (MMPFN), which extends TabPFN to handle tabular and non-tabular modalities in a unified manner. MMPFN comprises per-modality encoders, modality projectors, and pre-trained foundation models. The modality projectors serve as the critical bridge, transforming non-tabular embeddings into tabular-compatible tokens for unified processing. To this end, we introduce a multi-head gated MLP and a cross-attention pooler that extract richer context from non-tabular inputs while mitigating the attention imbalance issue in multimodal learning. Extensive experiments on medical and general-purpose multimodal datasets demonstrate that MMPFN consistently outperforms competitive state-of-the-art methods and effectively exploits non-tabular modalities alongside tabular features. These results highlight the promise of extending prior-data fitted networks to the multimodal setting, offering a scalable and effective framework for heterogeneous data learning. The source code is available at https://github.com/too-z/MultiModalPFN.
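The gated-MLP projector idea can be sketched as a value branch modulated elementwise by a sigmoid gate, mapping a non-tabular embedding to a tabular-compatible token. The single-head form and the tanh/sigmoid choices here are assumptions for illustration; the paper's multi-head variant would apply this per head with learned weights.

```python
import numpy as np

def gated_mlp_projector(z, W_v, W_g, b_v, b_g):
    """Single-head gated-MLP sketch: a tanh value branch modulated
    elementwise by a sigmoid gate. z is a non-tabular embedding; the
    output is a token in the (lower-dimensional) tabular space."""
    gate = 1.0 / (1.0 + np.exp(-(z @ W_g + b_g)))   # sigmoid gate in (0, 1)
    value = np.tanh(z @ W_v + b_v)                  # bounded value branch
    return gate * value
```

The gate lets the projector suppress uninformative embedding directions before the token enters the shared TabPFN-style backbone.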
【25】Model Merging in the Essential Subspace
标题:基本子空间中的模型合并
链接:https://arxiv.org/abs/2602.20208
作者:Longhua Li,Lei Qi,Qi Tian,Xin Geng
备注:Accepted by CVPR 2026
摘要:模型合并旨在将从共享预训练检查点导出的多个特定于任务的微调模型集成到单个多任务模型中,而无需额外训练。尽管有广泛的研究,任务干扰仍然是一个主要的障碍,往往会破坏合并模型的性能。在本文中,我们提出了ESM(基本子空间合并),一个强大的框架,有效的模型合并。我们首先对参数更新引起的特征偏移进行主成分分析(PCA)。由此产生的主方向跨越一个主要影响特征表示的基本子空间。每个任务的参数更新矩阵投影到其各自的基本子空间进行低秩分解,然后合并。这种方法减轻了任务间干扰,同时保留了核心任务特定功能。此外,我们引入了一个多层次的极化缩放策略,放大包含关键知识的参数,抑制冗余的,防止必要的知识被淹没在融合过程中。在多个任务集和模型规模上的大量实验表明,我们的方法在多任务模型合并中达到了最先进的性能。
摘要:Model merging aims to integrate multiple task-specific fine-tuned models derived from a shared pre-trained checkpoint into a single multi-task model without additional training. Despite extensive research, task interference remains a major obstacle that often undermines the performance of merged models. In this paper, we propose ESM (Essential Subspace Merging) , a robust framework for effective model merging. We begin by performing Principal Component Analysis (PCA) on feature shifts induced by parameter updates. The resulting principal directions span an essential subspace that dominantly influences feature representations. Each task's parameter update matrix is projected onto its respective essential subspace for low-rank decomposition before merging. This methodology mitigates inter-task interference while preserving core task-specific functionality. Furthermore, we introduce a multi-level polarized scaling strategy that amplifies parameters containing critical knowledge and suppresses redundant ones, preventing essential knowledge from being overwhelmed during fusion. Extensive experiments across multiple task sets and model scales demonstrate that our method achieves state-of-the-art performance in multi-task model merging.
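A minimal sketch of the essential-subspace projection: run PCA (here via SVD) on the feature shifts induced by a task's update to obtain the top-k principal directions U, then project the update matrix onto that subspace as U U^T Delta. This is illustrative of the idea only, not the paper's exact procedure.

```python
import numpy as np

def project_to_essential_subspace(delta, feature_shifts, k):
    """Project a parameter update `delta` (out_dim x in_dim) onto the
    k-dimensional essential subspace spanned by the top principal
    directions of the feature shifts (rows are per-sample shifts)."""
    shifts = feature_shifts - feature_shifts.mean(axis=0)
    # Right singular vectors = principal directions of the shifts.
    _, _, Vt = np.linalg.svd(shifts, full_matrices=False)
    U = Vt[:k].T                      # (out_dim, k) subspace basis
    return U @ (U.T @ delta)          # orthogonal projection U U^T delta
```

An update that already lies in the essential subspace passes through unchanged; components outside it, which the method treats as interference-prone, are discarded.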
【26】FedAvg-Based CTMC Hazard Model for Federated Bridge Deterioration Assessment
标题:基于FedAvg的CTMC危险模型用于联邦桥梁退化评估
链接:https://arxiv.org/abs/2602.20194
作者:Takato Yasuno
备注:10 pages, 4 figures, 2 tables
摘要:桥梁定期检查记录包含有关公共基础设施的敏感信息,使得在现有数据治理约束下跨组织数据共享变得不切实际。我们提出一个联邦框架,用于估计桥梁劣化的连续时间马尔可夫链(CTMC)风险模型,使市政当局无需传输原始检查记录即可协作训练共享的基准模型。每个用户持有本地检查数据,并在三个劣化方向转移(良好到轻微、良好到严重、轻微到严重)上训练对数线性风险模型,协变量为桥龄、海岸线距离和桥面面积。本地优化通过对CTMC对数似然进行小批量随机梯度下降来完成,每个通信轮次仅向中央服务器上传一个12维伪梯度向量。服务器使用带动量和梯度裁剪的样本加权联邦平均(FedAvg)聚合用户更新。本文所有实验均在完全合成的数据上进行,这些数据由带有区域异质性的已知真实参数集生成,从而能够对联邦收敛行为进行受控评估。跨异构用户的仿真结果显示平均负对数似然一致收敛,且聚合梯度范数随用户规模增加而下降。此外,联邦更新机制提供了一种自然的参与激励:在共享技术标准平台上登记其本地检查数据集的用户,将获得定期更新的全局基准参数(这是仅凭本地数据无法获得的信息),从而能够在不放弃数据主权的情况下进行循证全生命周期规划。
摘要:Bridge periodic inspection records contain sensitive information about public infrastructure, making cross-organizational data sharing impractical under existing data governance constraints. We propose a federated framework for estimating a Continuous-Time Markov Chain (CTMC) hazard model of bridge deterioration, enabling municipalities to collaboratively train a shared benchmark model without transferring raw inspection records. Each User holds local inspection data and trains a log-linear hazard model over three deterioration-direction transitions -- Good$\to$Minor, Good$\to$Severe, and Minor$\to$Severe -- with covariates for bridge age, coastline distance, and deck area. Local optimization is performed via mini-batch stochastic gradient descent on the CTMC log-likelihood, and only a 12-dimensional pseudo-gradient vector is uploaded to a central server per communication round. The server aggregates User updates using sample-weighted Federated Averaging (FedAvg) with momentum and gradient clipping. All experiments in this paper are conducted on fully synthetic data generated from a known ground-truth parameter set with region-specific heterogeneity, enabling controlled evaluation of federated convergence behaviour. Simulation results across heterogeneous Users show consistent convergence of the average negative log-likelihood, with the aggregated gradient norm decreasing as User scale increases. Furthermore, the federated update mechanism provides a natural participation incentive: Users who register their local inspection datasets on a shared technical-standard platform receive in return the periodically updated global benchmark parameters -- information that cannot be obtained from local data alone -- thereby enabling evidence-based life-cycle planning without surrendering data sovereignty.
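The server-side aggregation described above (sample-weighted FedAvg with momentum and gradient clipping over uploaded pseudo-gradients) can be sketched as follows; the hyperparameter values are placeholders, not the paper's settings.

```python
import numpy as np

def fedavg_step(global_params, user_grads, user_sizes, velocity,
                lr=0.1, momentum=0.9, clip=1.0):
    """One server round of sample-weighted FedAvg with momentum and
    gradient clipping. Each user uploads only a pseudo-gradient vector
    (12-dimensional in the paper's model); raw records never leave
    the user."""
    w = np.asarray(user_sizes, dtype=float)
    w = w / w.sum()                                      # sample weights
    agg = sum(wi * g for wi, g in zip(w, user_grads))    # weighted average
    norm = np.linalg.norm(agg)
    if norm > clip:                                      # gradient clipping
        agg = agg * (clip / norm)
    velocity = momentum * velocity + agg                 # server momentum
    return global_params - lr * velocity, velocity
```

The server keeps `velocity` across rounds; each round it receives one pseudo-gradient per user, aggregates, and broadcasts the updated global benchmark parameters back.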
【27】Is Multi-Distribution Learning as Easy as PAC Learning: Sharp Rates with Bounded Label Noise
标题:多分布学习是否像PAC学习一样简单:有界标签噪声下的尖锐速率
链接:https://arxiv.org/abs/2602.21039
作者:Rafael Hanashiro,Abhishek Shetty,Patrick Jaillet
摘要:为了理解从异构数据源学习的统计复杂性,我们研究了多分布学习问题。给定$k$个数据源,目标是利用共享结构为每个源输出一个分类器,以降低样本复杂度。我们专注于有界标签噪声设置,以确定单任务学习中可实现的快速$1/ε$速率能否在对$k$依赖最小的情况下推广到这一情形。令人惊讶的是,我们发现情况并非如此。我们证明,除非对每个分布单独学习,否则在$k$个分布上学习本质上会导致以$k/ε^2$缩放的慢速率,即使在恒定噪声水平下也是如此。一个关键的技术贡献是一个结构化的假设检验框架,它刻画了在有界噪声下证明接近最优性的统计成本,我们证明这一成本在多分布设置中不可避免。最后,我们证明,当与每个分布的最优贝叶斯误差这一更强基准竞争时,样本复杂度会在$k$上产生\textit{乘法}惩罚。这在随机分类噪声和Massart噪声之间建立了一个\textit{统计}分离,突出了从多个源学习所特有的基本障碍。
摘要:Towards understanding the statistical complexity of learning from heterogeneous sources, we study the problem of multi-distribution learning. Given $k$ data sources, the goal is to output a classifier for each source by exploiting shared structure to reduce sample complexity. We focus on the bounded label noise setting to determine whether the fast $1/ε$ rates achievable in single-task learning extend to this regime with minimal dependence on $k$. Surprisingly, we show that this is not the case. We demonstrate that learning across $k$ distributions inherently incurs slow rates scaling with $k/ε^2$, even under constant noise levels, unless each distribution is learned separately. A key technical contribution is a structured hypothesis-testing framework that captures the statistical cost of certifying near-optimality under bounded noise-a cost we show is unavoidable in the multi-distribution setting. Finally, we prove that when competing with the stronger benchmark of each distribution's optimal Bayes error, the sample complexity incurs a \textit{multiplicative} penalty in $k$. This establishes a \textit{statistical} separation between random classification noise and Massart noise, highlighting a fundamental barrier unique to learning from multiple sources.
【28】The Sim-to-Real Gap in MRS Quantification: A Systematic Deep Learning Validation for GABA
标题:MRS量化中的模拟与真实差距:针对GABA的系统性深度学习验证
链接:https://arxiv.org/abs/2602.20289
作者:Zien Ma,S. M. Shermer,Oktay Karakuş,Frank C. Langbein
备注:37 pages, 10 figures, 12 tables
摘要:磁共振波谱(MRS)用于量化体内代谢物,并估计从神经系统疾病到癌症等多种疾病的生物标志物。由于低信噪比(SNR)和光谱重叠,定量低浓度代谢物如GABA($γ$-氨基丁酸)具有挑战性。我们研究并验证了用于量化MEGA-PRESS光谱中复杂、低SNR、重叠信号的深度学习方法,设计了卷积神经网络(CNN)和Y形自动编码器(YAE),并通过在来自切片轮廓感知MEGA-PRESS模拟的10,000个模拟光谱上进行贝叶斯优化来选择最佳模型。所选模型在100,000个模拟光谱上进行训练。我们在来自112个实验体模的144个光谱上验证了它们的性能,这些体模包含五种感兴趣的代谢物(GABA、Glu、Gln、NAA、Cr),具有已知的真值浓度,涵盖在3 T下以不同带宽和实现方式采集的溶液和凝胶系列。这些模型还与广泛使用的LCModel量化工具进行了对比评估。在模拟数据上,两个模型都达到了近乎完美的一致性(小MAE;回归斜率$\approx 1.00$,$R^2 \approx 1.00$)。在实验体模数据上,误差最初大幅增加。然而,在训练数据中对可变线宽进行建模显著缩小了这一差距。最好的增强深度学习模型在所有体模光谱上以最大值归一化相对浓度计的GABA平均MAE达到0.151(YAE)和0.160(FCNN),优于传统基线LCModel(0.220)。模拟与真实的差距仍然存在,但物理信息数据增强大大缩小了这一差距。需要体模真值来判断一种方法能否在真实数据上可靠运行。
摘要:Magnetic resonance spectroscopy (MRS) is used to quantify metabolites in vivo and estimate biomarkers for conditions ranging from neurological disorders to cancers. Quantifying low-concentration metabolites such as GABA ($γ$-aminobutyric acid) is challenging due to low signal-to-noise ratio (SNR) and spectral overlap. We investigate and validate deep learning for quantifying complex, low-SNR, overlapping signals from MEGA-PRESS spectra, devise a convolutional neural network (CNN) and a Y-shaped autoencoder (YAE), and select the best models via Bayesian optimisation on 10,000 simulated spectra from slice-profile-aware MEGA-PRESS simulations. The selected models are trained on 100,000 simulated spectra. We validate their performance on 144 spectra from 112 experimental phantoms containing five metabolites of interest (GABA, Glu, Gln, NAA, Cr) with known ground truth concentrations across solution and gel series acquired at 3 T under varied bandwidths and implementations. These models are further assessed against the widely used LCModel quantification tool. On simulations, both models achieve near-perfect agreement (small MAEs; regression slopes $\approx 1.00$, $R^2 \approx 1.00$). On experimental phantom data, errors initially increased substantially. However, modelling variable linewidths in the training data significantly reduced this gap. The best augmented deep learning models achieved a mean MAE for GABA over all phantom spectra of 0.151 (YAE) and 0.160 (FCNN) in max-normalised relative concentrations, outperforming the conventional baseline LCModel (0.220). A sim-to-real gap remains, but physics-informed data augmentation substantially reduced it. Phantom ground truth is needed to judge whether a method will perform reliably on real data.
【29】Regressor-guided Diffusion Model for De Novo Peptide Sequencing with Explicit Mass Control
标题:具有显式质量控制的De Novo肽测序的回归引导扩散模型
链接:https://arxiv.org/abs/2602.20209
作者:Shaorong Chen,Jingbo Zhou,Jun Xia
摘要:新蛋白质的发现依赖于灵敏的蛋白质鉴定,其中基于质谱的从头肽测序(DNPS)是一种至关重要的方法。虽然深度学习推动了DNPS的发展,但现有模型未能充分执行基本的质量一致性约束,即预测肽的质量必须与实验测得的前体质量相匹配。以前的DNPS方法通常将这一关键信息视为简单的输入特征或仅在后处理中使用,导致许多不符合这一基本物理性质的不合理预测。为了解决这一局限,我们引入了DiffuNovo,一种用于从头肽测序的新型回归器引导扩散模型,可提供显式的肽级质量控制。我们的方法在两个关键阶段集成质量约束:在训练期间,一种新的肽级质量损失指导模型优化;在推理期间,来自潜在空间中基于梯度更新的回归器引导会引导生成过程,迫使预测的肽满足质量约束。在已有基准上的全面评估表明,DiffuNovo在DNPS准确性上超越了最先进的方法。此外,作为第一个采用扩散模型作为核心骨干的DNPS模型,DiffuNovo利用扩散架构强大的可控性,显著降低了质量误差,从而生成物理上更合理的肽。这些创新代表了朝着稳健且广泛适用的DNPS迈出的实质性进展。源代码见补充材料。
摘要:The discovery of novel proteins relies on sensitive protein identification, for which de novo peptide sequencing (DNPS) from mass spectra is a crucial approach. While deep learning has advanced DNPS, existing models inadequately enforce the fundamental mass consistency constraint, that a predicted peptide's mass must match the experimental measured precursor mass. Previous DNPS methods often treat this critical information as a simple input feature or use it in post-processing, leading to numerous implausible predictions that do not adhere to this fundamental physical property. To address this limitation, we introduce DiffuNovo, a novel regressor-guided diffusion model for de novo peptide sequencing that provides explicit peptide-level mass control. Our approach integrates the mass constraint at two critical stages: during training, a novel peptide-level mass loss guides model optimization, while at inference, regressor-based guidance from gradient-based updates in the latent space steers the generation to compel the predicted peptide adheres to the mass constraint. Comprehensive evaluations on established benchmarks demonstrate that DiffuNovo surpasses state-of-the-art methods in DNPS accuracy. Additionally, as the first DNPS model to employ a diffusion model as its core backbone, DiffuNovo leverages the powerful controllability of diffusion architecture and achieves a significant reduction in mass error, thereby producing much more physically plausible peptides. These innovations represent a substantial advancement toward robust and broadly applicable DNPS. The source code is available in the supplementary material.
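摘要中"在潜在空间中通过回归器梯度引导生成"的思想,可用下面的玩具示例示意:用一个线性"质量回归器"代替训练好的回归器,在每个去噪步骤中额外对质量惩罚做一步梯度下降。这并非DiffuNovo的真实模型,回归器、目标质量与步数均为示例假设:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
w = rng.normal(size=d)   # stand-in linear "mass regressor": m(z) = w . z
target = 3.0             # precursor mass the generated peptide must match
z = rng.normal(size=d)   # latent being refined

def guidance_grad(z):
    """Gradient of the mass penalty 0.5 * (m(z) - target)^2
    for the linear stand-in regressor m(z) = w . z."""
    return (w @ z - target) * w

# guided refinement: each "denoising" step also descends the mass penalty
# (the denoiser update itself is omitted for brevity)
for step in range(200):
    z = z - 0.05 * guidance_grad(z)
print(round(float(w @ z), 3))  # ≈ 3.0: the latent now satisfies the constraint
```

真实系统中,梯度来自对训练好的质量回归器做自动微分,并叠加在扩散采样器的每一步更新上。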
其他(33篇)
【1】Test-Time Training with KV Binding Is Secretly Linear Attention
标题:带KV绑定的测试时训练其实是线性注意力
链接:https://arxiv.org/abs/2602.21204
作者:Junchen Liu,Sven Elflein,Or Litany,Zan Gojcic,Ruilong Li
备注:Webpage: https://research.nvidia.com/labs/sil/projects/tttla/
摘要:将带KV绑定的测试时训练(TTT)用作序列建模层,通常被解释为一种在线元学习,即在测试时记忆键值映射。然而,我们的分析揭示了与这种基于记忆的解释相矛盾的多种现象。基于这些发现,我们重新审视了TTT的形式化,并表明一大类TTT架构可以表示为一种学习到的线性注意力算子。除了解释先前令人困惑的模型行为外,这一视角还带来多项实际好处:它实现了有原则的架构简化,允许在保持性能的同时提升效率的完全并行形式,并提供了将各种TTT变体系统化约简为标准线性注意力形式的方法。总的来说,我们的结果将TTT重新定义为:不是测试时记忆,而是具有增强表征能力的学习线性注意力。
摘要:Test-time training (TTT) with KV binding as sequence modeling layer is commonly interpreted as a form of online meta-learning that memorizes a key-value mapping at test time. However, our analysis reveals multiple phenomena that contradict this memorization-based interpretation. Motivated by these findings, we revisit the formulation of TTT and show that a broad class of TTT architectures can be expressed as a form of learned linear attention operator. Beyond explaining previously puzzling model behaviors, this perspective yields multiple practical benefits: it enables principled architectural simplifications, admits fully parallel formulations that preserve performance while improving efficiency, and provides a systematic reduction of diverse TTT variants to a standard linear attention form. Overall, our results reframe TTT not as test-time memorization, but as learned linear attention with enhanced representational capacity.
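TTT与线性注意力的对应关系可用如下numpy草图示意:线性注意力把值-键外积累加进状态矩阵;而逐token对绑定损失 0.5*||W k - v||^2 做一步梯度更新、并略去其中的 W k 修正项(即只保留Hebbian部分)时,快速权重 W 的更新规则与之完全一致。此为基于摘要的示意性推导,并非论文的完整构造:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
K = rng.normal(size=(6, d))  # keys, one row per token
V = rng.normal(size=(6, d))  # values
q = rng.normal(size=d)       # a query

# Linear attention: accumulate outer products, then read out with the query.
S = np.zeros((d, d))
for k, v in zip(K, V):
    S += np.outer(v, k)
out_linear = S @ q

# TTT-style view: a fast-weight matrix W trained online, one gradient step
# per token on the binding loss 0.5 * ||W k - v||^2.
W = np.zeros((d, d))
for k, v in zip(K, V):
    grad = np.outer(W @ k - v, k)  # d/dW of 0.5 * ||W k - v||^2
    hebbian = np.outer(v, k)       # the v k^T piece of the negative gradient
    W += hebbian                   # keep only the Hebbian piece (drop W k term)
out_ttt = W @ q

print(np.allclose(out_linear, out_ttt))  # True
```

保留 W k 修正项则得到delta规则式的更新,这正是论文所指"表征能力增强"的线性注意力变体的一个例子。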
【2】Aletheia tackles FirstProof autonomously
标题:Aletheia自主应对FirstProof
链接:https://arxiv.org/abs/2602.21201
作者:Tony Feng,Junehyuk Jung,Sang-hyun Kim,Carlo Pagano,Sergei Gukov,Chiang-Chiang Tsai,David Woodruff,Adel Javanmard,Aryan Mokhtari,Dawsen Hwang,Yuri Chervonyi,Jonathan N. Lee,Garrett Bingham,Trieu H. Trinh,Vahab Mirrokni,Quoc V. Le,Thang Luong
备注:34 pages. Project page: https://github.com/google-deepmind/superhuman/tree/main/aletheia
摘要:我们报告了由Gemini 3 Deep Think驱动的数学研究智能体Aletheia(Feng等人,2026b)在首届FirstProof挑战赛上的表现。在挑战允许的时间内,根据多数专家的评估,Aletheia自主解决了10个问题中的6个(第2、5、7、8、9、10题);我们注意到专家们仅在问题8上意见不一。为了完全透明,我们解释了我们对FirstProof的理解,并披露了有关实验和评估的细节。原始提示和输出可在https://github.com/google-deepmind/superhuman/tree/main/aletheia获取。
摘要:We report the performance of Aletheia (Feng et al., 2026b), a mathematics research agent powered by Gemini 3 Deep Think, on the inaugural FirstProof challenge. Within the allowed timeframe of the challenge, Aletheia autonomously solved 6 problems (2, 5, 7, 8, 9, 10) out of 10 according to majority expert assessments; we note that experts were not unanimous on Problem 8 (only). For full transparency, we explain our interpretation of FirstProof and disclose details about our experiments as well as our evaluation. Raw prompts and outputs are available at https://github.com/google-deepmind/superhuman/tree/main/aletheia.
【3】Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking
标题:Untied Ulysses:通过按注意力头分块实现内存高效的上下文并行
链接:https://arxiv.org/abs/2602.21196
作者:Ravi Ghadia,Maksim Abraham,Sergei Vorobyov,Max Ryabinin
备注:14 pages, 6 figures
摘要:使用Transformer模型高效处理长序列通常需要通过上下文并行将计算拆分到多个加速器上。这一系列方法中的主流方案,如Ring Attention或DeepSpeed Ulysses,能够在上下文维度上扩展,但并不关注内存效率,这限制了它们所能支持的序列长度。更先进的技术,如全流水线分布式Transformer(Fully Pipelined Distributed Transformer)或激活卸载,可以以牺牲训练吞吐量为代价进一步扩展可行的上下文长度。在本文中,我们提出了UPipe,一种简单而有效的上下文并行技术,在注意力头级别执行细粒度分块。该技术大幅减少了自注意力的激活内存占用,打破了激活内存瓶颈,解锁了更长的上下文长度。对于32B规模的Transformer,我们的方法将注意力层的中间张量内存占用减少多达87.5$\%$,同时在训练速度上与先前的上下文并行技术持平。在单个8$\times$H100节点上训练Llama3-8B时,UPipe可支持5M token的上下文长度,比先前方法提高超过25$\%$。
摘要:Efficiently processing long sequences with Transformer models usually requires splitting the computations across accelerators via context parallelism. The dominant approaches in this family of methods, such as Ring Attention or DeepSpeed Ulysses, enable scaling over the context dimension but do not focus on memory efficiency, which limits the sequence lengths they can support. More advanced techniques, such as Fully Pipelined Distributed Transformer or activation offloading, can further extend the possible context length at the cost of training throughput. In this paper, we present UPipe, a simple yet effective context parallelism technique that performs fine-grained chunking at the attention head level. This technique significantly reduces the activation memory usage of self-attention, breaking the activation memory barrier and unlocking much longer context lengths. Our approach reduces intermediate tensor memory usage in the attention layer by as much as 87.5$\%$ for 32B Transformers, while matching previous context parallelism techniques in terms of training speed. UPipe can support the context length of 5M tokens when training Llama3-8B on a single 8$\times$H100 node, improving upon prior methods by over 25$\%$.
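按注意力头分块的核心思想可用如下草图示意:逐头计算注意力,使峰值中间内存只需一个 (seq, seq) 分数矩阵,而非 (heads, seq, seq)。这只是概念示意,并非UPipe的实际分布式实现:

```python
import numpy as np

def attention(q, k, v):
    """Standard softmax attention for a single head: (T, D) inputs."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    s = np.exp(s - s.max(axis=-1, keepdims=True))
    return (s / s.sum(axis=-1, keepdims=True)) @ v

def headwise_attention(Q, K, V):
    """Process one attention head at a time: only one (seq, seq) score
    matrix is live at any moment, instead of (heads, seq, seq)."""
    return np.stack([attention(q, k, v) for q, k, v in zip(Q, K, V)])

rng = np.random.default_rng(0)
H, T, D = 4, 8, 16
Q, K, V = rng.normal(size=(3, H, T, D))
out = headwise_attention(Q, K, V)
print(out.shape)  # (4, 8, 16)
```

逐头串行计算以少量并行度换取了峰值激活内存按头数线性下降,这正是上文"打破激活内存瓶颈"的直观来源。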
【4】The Diffusion Duality, Chapter II: $Ψ$-Samplers and Efficient Curriculum
标题:扩散二元性,第二章:$Ψ$-采样器与高效课程
链接:https://arxiv.org/abs/2602.21185
作者:Justin Deschenaux,Caglar Gulcehre,Subham Sekhar Sahoo
摘要:均匀状态离散扩散模型由于具备自我校正能力,在少步生成和引导方面表现出色,使其在这些设置中优于自回归或掩码扩散模型。然而,随着步数增加,其使用祖先采样器的采样质量会进入平台期。我们为离散扩散引入了一族预测-校正(Predictor-Corrector,PC)采样器,它推广了先前的方法,并适用于任意噪声过程。当与均匀状态扩散配合使用时,我们的采样器在语言和图像建模上都优于祖先采样:在OpenWebText上,在匹配的unigram熵下实现更低的生成困惑度;在CIFAR10上取得更好的FID/IS分数。至关重要的是,与传统采样器不同,我们的PC方法会随着采样步数的增加而持续改进。综合来看,这些发现对掩码扩散是基于扩散的语言建模之必然未来这一假设提出了质疑。除采样之外,我们还为高斯松弛训练阶段开发了一个内存高效的课程,与Duo相比训练时间减少25%、内存减少33%,同时在OpenWebText和LM1B上保持相当的困惑度和强大的下游性能。我们发布了代码、检查点和视频教程:https://s-sahoo.com/duo-ch2
摘要:Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or Masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that Masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% and memory by 33% compared to Duo while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video-tutorial on: https://s-sahoo.com/duo-ch2
【5】A Benchmark for Deep Information Synthesis
标题:深度信息合成的基准
链接:https://arxiv.org/abs/2602.21143
作者:Debjit Paul,Daniel Murphy,Milan Gritta,Ronald Cardenas,Victor Prokhorov,Lena Sophia Bolliger,Aysim Toker,Roy Miles,Andreea-Maria Oncescu,Jasivan Alex Sivakumar,Philipp Borchert,Ismail Elezi,Meiru Zhang,Ka Yiu Lee,Guchun Zhang,Jun Wang,Gerasimos Lampouras
备注:Accepted at ICLR 2026
摘要:基于大型语言模型(LLM)的代理越来越多地用于解决涉及工具使用的复杂任务,例如Web浏览,代码执行和数据分析。然而,目前的评价基准并没有充分评估他们解决现实世界的任务,需要综合信息从多个来源和推断的见解超越简单的事实检索的能力。为了解决这个问题,我们介绍了DEEPSYNTH,一种新的基准设计,以评估代理人的现实,耗时的问题,结合信息收集,合成和结构化推理,以产生见解。DEEPSYNTH包含120个任务,涉及7个领域和67个国家的数据源。DEEPSYNTH使用多阶段数据收集管道构建,需要注释者收集官方数据源,创建假设,执行手动分析,并使用可验证的答案设计任务。当在DEEPSYNTH上进行评估时,11个最先进的LLM和深度研究代理在LLM法官指标上获得了8.97和17.5的最高F1分数,强调了基准的难度。我们的分析表明,目前的代理人在大信息空间中与幻觉和推理作斗争,突出了DEEPSYNTH作为指导未来研究的关键基准。
摘要:Large language model (LLM)-based agents are increasingly used to solve complex tasks involving tool use, such as web browsing, code execution, and data analysis. However, current evaluation benchmarks do not adequately assess their ability to solve real-world tasks that require synthesizing information from multiple sources and inferring insights beyond simple fact retrieval. To address this, we introduce DEEPSYNTH, a novel benchmark designed to evaluate agents on realistic, time-consuming problems that combine information gathering, synthesis, and structured reasoning to produce insights. DEEPSYNTH contains 120 tasks collected across 7 domains and data sources covering 67 countries. DEEPSYNTH is constructed using a multi-stage data collection pipeline that requires annotators to collect official data sources, create hypotheses, perform manual analysis, and design tasks with verifiable answers. When evaluated on DEEPSYNTH, 11 state-of-the-art LLMs and deep research agents achieve a maximum F1 score of 8.97 and 17.5 on the LLM-judge metric, underscoring the difficulty of the benchmark. Our analysis reveals that current agents struggle with hallucinations and reasoning over large information spaces, highlighting DEEPSYNTH as a crucial benchmark for guiding future research.
【6】Ski Rental with Distributional Predictions of Unknown Quality
标题:基于未知质量分布预测的滑雪租赁问题
链接:https://arxiv.org/abs/2602.21104
作者:Qiming Cui,Michael Dinitz
摘要:我们从分布式预测的角度,在"带预测的算法"框架下重新审视经典的在线滑雪租赁问题。滑雪租赁是最早在带预测设定下被研究的问题之一,其中一个自然的预测就是滑雪天数。但是,将预测视为滑雪天数上的一个分布p̂,既更自然,也可能更强大。如果真实的滑雪天数来自某个真实(但未知)的分布p,那么我们的主要结果表明,存在一个算法,其期望成本至多为OPT + O(min(max(η, 1) * sqrt(b), b log b)),其中OPT是真实分布p下最优策略的期望成本,b是购买成本,而η是p和p̂之间的推土机(Wasserstein-1)距离。注意,当η < o(sqrt(b))时,这给出了小于b(平凡界)的加性损失;而当η任意大时(对应于极不准确的预测),我们仍然不会付出超过O(b log b)的加性损失。这些界意味着我们的算法具有一致性O(sqrt(b))(预测误差为0时的加性损失)和鲁棒性O(b log b)(预测误差任意大时的加性损失)。此外,与先前假设已知预测误差的鲁棒优化工作不同,我们无需假设已知(或对其有任何界的)预测误差η。我们用各种下界来补充这一上界,表明它本质上是紧的:不仅一致性/鲁棒性权衡无法改进,我们特定的损失函数也无法得到有意义的改进。
摘要:We revisit the central online problem of ski rental in the "algorithms with predictions" framework from the point of view of distributional predictions. Ski rental was one of the first problems to be studied with predictions, where a natural prediction is simply the number of ski days. But it is both more natural and potentially more powerful to think of a prediction as a distribution p-hat over the ski days. If the true number of ski days is drawn from some true (but unknown) distribution p, then we show as our main result that there is an algorithm with expected cost at most OPT + O(min(max({eta}, 1) * sqrt(b), b log b)), where OPT is the expected cost of the optimal policy for the true distribution p, b is the cost of buying, and {eta} is the Earth Mover's (Wasserstein-1) distance between p and p-hat. Note that when {eta} < o(sqrt(b)) this gives additive loss less than b (the trivial bound), and when {eta} is arbitrarily large (corresponding to an extremely inaccurate prediction) we still do not pay more than O(b log b) additive loss. An implication of these bounds is that our algorithm has consistency O(sqrt(b)) (additive loss when the prediction error is 0) and robustness O(b log b) (additive loss when the prediction error is arbitrarily large). Moreover, we do not need to assume that we know (or have any bound on) the prediction error {eta}, in contrast with previous work in robust optimization which assumes that we know this error. We complement this upper bound with a variety of lower bounds showing that it is essentially tight: not only can the consistency/robustness tradeoff not be improved, but our particular loss function cannot be meaningfully improved.
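作为背景,下面的草图示意了"完全信任预测分布p̂"这一基线策略:对每个购买日阈值计算p̂下的期望成本并取最小。注意这并非论文中对预测误差鲁棒的算法,变量名与数值均为示例:

```python
import numpy as np

def expected_cost(buy_day, p_hat, b):
    """Expected cost of 'rent until buy_day, then buy' when the number of
    ski days is drawn from p_hat (p_hat[x] = probability of exactly x days).
    If the season ends before buy_day we only pay the rental days."""
    cost = 0.0
    for x, px in enumerate(p_hat):
        cost += px * (x if x < buy_day else (buy_day - 1) + b)
    return cost

def best_threshold(p_hat, b, horizon):
    """Buy-day threshold minimizing expected cost under the prediction."""
    days = range(1, horizon + 2)
    return min(days, key=lambda t: expected_cost(t, p_hat, b))

b = 10
# prediction concentrated on a long season -> buy immediately
p_long = np.zeros(30); p_long[25] = 1.0
# prediction concentrated on a short season -> never buy within the season
p_short = np.zeros(30); p_short[2] = 1.0
print(best_threshold(p_long, b, 30), best_threshold(p_short, b, 30))  # 1 3
```

当p̂不准确时,这种完全信任的策略可能任意糟糕;论文的贡献正是在不知道误差η的情况下,把加性损失限制在O(min(max(η,1)·sqrt(b), b log b))之内。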
【7】Motivation is Something You Need
标题:动机是你所需要的
链接:https://arxiv.org/abs/2602.21064
作者:Mehdi Acheli,Walid Gaaloul
摘要:这项工作介绍了一种借鉴情感神经科学的新型训练范式。受人类大脑中情感与认知的相互作用、尤其是SEEKING动机状态的启发,我们设计了一个双模型框架:一个较小的基础模型被持续训练,而一个较大的动机模型在预定义的"动机条件"期间被间歇性激活。该框架模拟了高度好奇和期待奖励的情绪状态,此时更广泛的大脑区域被调动起来以增强认知表现。我们的方法利用较大模型扩展较小模型的可扩展架构,从而在关键的训练步骤中共享权重更新并有选择地扩展网络容量。在图像分类任务上的实证评估表明,与传统方案相比,这种交替训练方案不仅高效且有效地增强了基础模型;在某些情况下,尽管每个epoch看到的数据更少,动机模型也超过了其独立训练的对应模型。这使得可以同时训练两个针对不同部署约束定制的模型,获得有竞争力或更优的性能,同时保持比单独训练较大模型更低的训练成本。
摘要:This work introduces a novel training paradigm that draws from affective neuroscience. Inspired by the interplay of emotions and cognition in the human brain and more specifically the SEEKING motivational state, we design a dual-model framework where a smaller base model is trained continuously, while a larger motivated model is activated intermittently during predefined "motivation conditions". The framework mimics the emotional state of high curiosity and anticipation of reward in which broader brain regions are recruited to enhance cognitive performance. Exploiting scalable architectures where larger models extend smaller ones, our method enables shared weight updates and selective expansion of network capacity during noteworthy training steps. Empirical evaluation on the image classification task demonstrates that, not only does the alternating training scheme efficiently and effectively enhance the base model compared to a traditional scheme, in some cases, the motivational model also surpasses its standalone counterpart despite seeing less data per epoch. This opens the possibility of simultaneously training two models tailored to different deployment constraints with competitive or superior performance while keeping training cost lower than when training the larger model.
【8】T1: One-to-One Channel-Head Binding for Multivariate Time-Series Imputation
标题:T1:用于多元时间序列插补的一对一通道-头绑定
链接:https://arxiv.org/abs/2602.21043
作者:Dongik Park,Hyunwoo Ryu,Suahn Bae,Keondo Park,Hyung-Sin Kim
备注:Accepted at ICLR 2026
摘要:在多元时间序列中填补缺失值仍然具有挑战性,特别是在缺失模式多样和缺失严重的情况下。现有方法性能欠佳,因为受损的时间特征会阻碍有效的跨变量信息传递,从而放大重建误差。稳健的插补既需要从每个变量内的稀疏观测中提取时间模式,又需要在变量之间有选择地传递信息;然而当前的方法往往擅长其一而牺牲其二。我们提出了T1(具有一对一通道-头绑定的时间序列插补),这是一种CNN-Transformer混合架构,通过通道-头绑定(一种在CNN通道和注意力头之间建立一一对应关系的机制)实现稳健插补。这种设计实现了选择性信息传递:当缺失破坏某些时间模式时,它们对应的注意力路径会基于剩余的可观测模式自适应地降低权重,同时通过未受影响的通道保持可靠的跨变量连接。在11个基准数据集上的实验表明,T1实现了最先进的性能,与次优基线相比MSE平均降低46%,在极端稀疏(70%缺失率)下增益尤为显著。该模型无需重新训练即可泛化到未见过的缺失模式,并在所有数据集上使用一致的超参数配置。代码可在https://github.com/Oppenheimerdinger/T1获取。
摘要:Imputing missing values in multivariate time series remains challenging, especially under diverse missing patterns and heavy missingness. Existing methods suffer from suboptimal performance as corrupted temporal features hinder effective cross-variable information transfer, amplifying reconstruction errors. Robust imputation requires both extracting temporal patterns from sparse observations within each variable and selectively transferring information across variables--yet current approaches excel at one while compromising the other. We introduce T1 (Time series imputation with 1-to-1 channel-head binding), a CNN-Transformer hybrid architecture that achieves robust imputation through Channel-Head Binding--a mechanism creating one-to-one correspondence between CNN channels and attention heads. This design enables selective information transfer: when missingness corrupts certain temporal patterns, their corresponding attention pathways adaptively down-weight based on remaining observable patterns while preserving reliable cross-variable connections through unaffected channels. Experiments on 11 benchmark datasets demonstrate that T1 achieves state-of-the-art performance, reducing MSE by 46% on average compared to the second-best baseline, with particularly strong gains under extreme sparsity (70% missing ratio). The model generalizes to unseen missing patterns without retraining and uses a consistent hyperparameter configuration across all datasets. The code is available at https://github.com/Oppenheimerdinger/T1.
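通道-头绑定的一一对应思想可用如下草图示意:把CNN特征通道切分为互不相交的组,每个注意力头只在自己的通道组内做注意力,不做跨组投影混合。此为基于摘要的概念示意,并非T1的实际实现:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def bound_attention(features, n_heads):
    """Channel-head binding sketch: split feature channels into n_heads
    disjoint groups and run one attention head per group, so each head
    only ever sees 'its' channels and no projection mixes the groups."""
    T, C = features.shape
    heads = np.split(features, n_heads, axis=1)  # one channel group per head
    outs = []
    for h in heads:  # q = k = v = the head's own channel group
        scores = softmax(h @ h.T / np.sqrt(h.shape[1]))
        outs.append(scores @ h)
    return np.concatenate(outs, axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 32))      # (time, channels) from a CNN stem
y = bound_attention(x, n_heads=4)  # 4 heads, 8 channels each
print(y.shape)  # (12, 32)
```

由于各头互不混合,一个通道组被缺失污染时只会降低对应头的权重,而不会污染其他头的跨变量连接,这正是上文"选择性信息传递"的结构来源。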
【9】Does Order Matter : Connecting The Law of Robustness to Robust Generalization
标题:阶重要吗:将鲁棒性定律与鲁棒泛化联系起来
链接:https://arxiv.org/abs/2602.20971
作者:Himadri Mandal,Vishnu Varadarajan,Jaee Ponde,Aritra Das,Mihir More,Debayan Gupta
摘要:Bubeck和Sellke(2021)将鲁棒性定律与鲁棒泛化之间的联系作为一个开放问题提出。鲁棒性定律指出,过参数化是模型鲁棒插值的必要条件;特别是,鲁棒插值要求所学函数是Lipschitz的。鲁棒泛化问的是,小的鲁棒训练损失是否意味着小的鲁棒测试损失。我们通过对任意数据分布显式建立二者的联系解决了这一问题。具体来说,我们引入了一个非平凡的鲁棒泛化误差概念,并将其转化为所诱导的鲁棒损失类的期望Rademacher复杂度的下界。我们的界恢复了Wu等人(2023)的$Ω(n^{1/d})$区间,并表明在常数因子范围内,鲁棒泛化不会改变光滑插值所需Lipschitz常数的阶。我们通过实验探究了所预测的随数据集规模和模型容量的缩放关系,检验经验行为与Bubeck和Sellke(2021)还是Wu等人(2023)的预测更为吻合。对于MNIST,我们发现下界Lipschitz常数的缩放阶与Wu等人(2023)的预测一致。非正式地说,为了获得低鲁棒泛化误差,Lipschitz常数必须位于我们所界定的范围内,且允许的扰动半径与Lipschitz尺度相关。
摘要:Bubeck and Sellke (2021) pose as an open problem the connection between the law of robustness and robust generalization. The law of robustness states that overparameterization is necessary for models to interpolate robustly; in particular, robust interpolation requires the learned function to be Lipschitz. Robust generalization asks whether small robust training loss implies small robust test loss. We resolve this problem by explicitly connecting the two for arbitrary data distributions. Specifically, we introduce a nontrivial notion of robust generalization error and convert it into a lower bound on the expected Rademacher complexity of the induced robust loss class. Our bounds recover the $Ω(n^{1/d})$ regime of Wu et al.\ (2023) and show that, up to constants, robust generalization does not change the order of the Lipschitz constant required for smooth interpolation. We conduct experiments to probe the predicted scaling with dataset size and model capacity, testing whether empirical behavior aligns more closely with the predictions of Bubeck and Sellke (2021) or Wu et al.\ (2023). For MNIST, we find that the lower-bound Lipschitz constant scales on the order predicted by Wu et al.\ (2023). Informally, to obtain low robust generalization error, the Lipschitz constant must lie in a range that we bound, and the allowable perturbation radius is linked to the Lipschitz scale.
【10】Hierarchic-EEG2Text: Assessing EEG-To-Text Decoding across Hierarchical Abstraction Levels
标题:Hierarchic-EEG2Text:评估跨层次抽象级别的EEG到文本解码
链接:https://arxiv.org/abs/2602.20932
作者:Anupam Sharma,Harish Katti,Prajwal Singh,Shanmuganathan Raman,Krishna Miyapuram
摘要:脑电图(EEG)记录了从人类头皮测量的大脑中神经元在空间上平均的电活动。先前的研究已经探索了基于EEG的对象或概念分类,通常针对被动观看短暂呈现的图像或视频刺激,且类别数量有限。由于EEG的信噪比较低,在大量类别上识别细粒度表征仍然具有挑战性;然而,抽象层面的对象表征可能存在。在这项工作中,我们研究EEG是否在多个层次级别上捕获对象表征,并提出情节式分析(episodic analysis),即在各种相关的分类任务(情节)上评估机器学习(ML)模型。与以往依赖固定或随机采样的等基数类别的情节式EEG研究不同,我们采用层次感知的情节采样,使用WordNet生成具有不同层次、可变类别的情节。我们还提出了EEG领域迄今最大的情节式框架,用于从PEERS数据集的EEG信号中检测观察到的文本;该数据集包含$1610$个对象标签下的$931538$个EEG样本,采集自执行受控认知任务的$264$名人类参与者(受试者),从而能够研究感知、决策和表现监控背后的神经动力学。我们考察了语义抽象级别如何影响多种学习技术和架构的分类性能,提供了全面的分析。当分类类别取自层次结构的更高层级时,模型往往表现更好,这表明其对抽象程度敏感。我们的工作突出了抽象深度这一EEG解码中尚未充分探索的维度,并推动了这一方向的未来研究。
摘要:An electroencephalogram (EEG) records the spatially averaged electrical activity of neurons in the brain, measured from the human scalp. Prior studies have explored EEG-based classification of objects or concepts, often for passive viewing of briefly presented image or video stimuli, with limited classes. Because EEG exhibits a low signal-to-noise ratio, recognizing fine-grained representations across a large number of classes remains challenging; however, abstract-level object representations may exist. In this work, we investigate whether EEG captures object representations across multiple hierarchical levels, and propose episodic analysis, in which a Machine Learning (ML) model is evaluated across various, yet related, classification tasks (episodes). Unlike prior episodic EEG studies that rely on fixed or randomly sampled classes of equal cardinality, we adopt hierarchy-aware episode sampling using WordNet to generate episodes with variable classes of diverse hierarchy. We also present the largest episodic framework in the EEG domain for detecting observed text from EEG signals in the PEERS dataset, comprising $931538$ EEG samples under $1610$ object labels, acquired from $264$ human participants (subjects) performing controlled cognitive tasks, enabling the study of neural dynamics underlying perception, decision-making, and performance monitoring. We examine how the semantic abstraction level affects classification performance across multiple learning techniques and architectures, providing a comprehensive analysis. The models tend to improve performance when the classification categories are drawn from higher levels of the hierarchy, suggesting sensitivity to abstraction. Our work highlights abstraction depth as an underexplored dimension of EEG decoding and motivates future research in this direction.
【11】Rethink Efficiency Side of Neural Combinatorial Solver: An Offline and Self-Play Paradigm
标题:重新思考神经组合求解器的效率:一种离线自博弈范式
链接:https://arxiv.org/abs/2602.20730
作者:Zhenxing Xu,Zeyuan Ma,Weidong Bao,Hui Yan,Yan Zheng,Ji Wang
摘要:我们提出了ECO,一种多功能的学习范式,为神经组合优化(NCO)实现高效的离线自博弈。ECO通过以下方式解决该领域的关键局限:1)范式转变:超越低效的在线范式,我们引入由监督预热和迭代直接偏好优化(DPO)组成的两阶段离线范式;2)架构转变:我们专门设计了基于Mamba的架构,以进一步提升离线范式的效率;3)渐进式自举:为了稳定训练,我们采用基于启发式的自举机制,确保训练期间策略持续改进。在TSP和CVRP上的对比结果表明,ECO与最新基线相比具有竞争力,并在内存利用率和训练吞吐量方面具有显著的效率优势。我们进一步深入分析了ECO的效率、吞吐量和内存使用情况。消融研究展示了我们设计背后的原理。
摘要:We propose ECO, a versatile learning paradigm that enables efficient offline self-play for Neural Combinatorial Optimization (NCO). ECO addresses key limitations in the field through: 1) Paradigm Shift: Moving beyond inefficient online paradigms, we introduce a two-phase offline paradigm consisting of supervised warm-up and iterative Direct Preference Optimization (DPO); 2) Architecture Shift: We deliberately design a Mamba-based architecture to further enhance the efficiency in the offline paradigm; and 3) Progressive Bootstrapping: To stabilize training, we employ a heuristic-based bootstrapping mechanism that ensures continuous policy improvement during training. Comparison results on TSP and CVRP highlight that ECO performs competitively with up-to-date baselines, with significant advantage on the efficiency side in terms of memory utilization and training throughput. We provide further in-depth analysis on the efficiency, throughput and memory usage of ECO. Ablation studies show rationale behind our designs.
【12】QEDBENCH: Quantifying the Alignment Gap in Automated Evaluation of University-Level Mathematical Proofs
标题:QEDBENCH:量化大学水平数学证明自动评估中的对齐差距
链接:https://arxiv.org/abs/2602.20629
作者:Santiago Gonzalez,Alireza Amiri Bavandpour,Peter Ye,Edward Zhang,Ruslans Aleksejevs,Todor Antić,Polina Baron,Sujeet Bhalerao,Shubhrajit Bhattacharya,Zachary Burton,John Byrne,Hyungjun Choi,Nujhat Ahmed Disha,Koppany István Encz,Yuchen Fang,Robert Joseph George,Ebrahim Ghorbani,Alan Goldfarb,Jing Guo,Meghal Gupta,Stefano Huber,Annika Kanckos,Minjung Kang,Hyun Jong Kim,Dino Lorenzini,Levi Lorenzo,Tianyi Mao,Giovanni Marzenta,Ariane M. Masuda,Lukas Mauth,Ana Mickovic,Andres Miniguano-Trujillo,Antoine Moulin,Wenqi Ni,Tomos Parry,Kevin Ren,Hossein Roodbarani,Mathieu Rundström,Manjil Saikia,Detchat Samart,Rebecca Steiner,Connor Stewart,Dhara Thakkar,Jeffrey Tse,Vasiliki Velona,Yunhai Xiang,Sibel Yalçın,Jun Yan,Ji Zeng,Arman Cohan,Quanquan C. Liu
摘要:随着大型语言模型(LLM)在基础基准上趋于饱和,研究前沿已从生成转向自动评估的可靠性。我们证明,标准的"LLM作为评委"协议在应用于高年级本科至研究生早期水平的数学时,存在系统性的对齐差距。为了量化这一点,我们引入了QEDBench,这是第一个大规模双评分标准对齐基准,通过将特定课程的评分标准与专家常识性标准进行对比,系统地测量在大学水平数学证明上与人类专家的对齐程度。通过部署双重评估矩阵(7个评委 x 5个求解器)并对照1,000多小时的人类评估,我们发现某些前沿评估器,如Claude Opus 4.5、DeepSeek-V3、Qwen 2.5 Max和Llama 4 Maverick,表现出显著的正向偏差(平均分数膨胀分别高达+0.18、+0.20、+0.30、+0.36)。此外,我们发现了离散领域中的关键推理差距:虽然Gemini 3.0 Pro达到了最先进的性能(平均人类评估得分0.91),但其他推理模型如GPT-5 Pro和Claude Sonnet 4.5在离散领域的性能显著下降。具体来说,它们的平均人类评估分数在离散数学中降至0.72和0.63,在图论中降至0.74和0.50。除这些研究结果外,我们还发布QEDBench作为评估和改进AI评委的公共基准。我们的基准公开发布于https://github.com/qqliu/Yale-QEDBench。
摘要:As Large Language Models (LLMs) saturate elementary benchmarks, the research frontier has shifted from generation to the reliability of automated evaluation. We demonstrate that standard "LLM-as-a-Judge" protocols suffer from a systematic Alignment Gap when applied to upper-undergraduate to early graduate level mathematics. To quantify this, we introduce QEDBench, the first large-scale dual-rubric alignment benchmark to systematically measure alignment with human experts on university-level math proofs by contrasting course-specific rubrics against expert common knowledge criteria. By deploying a dual-evaluation matrix (7 judges x 5 solvers) against 1,000+ hours of human evaluation, we reveal that certain frontier evaluators like Claude Opus 4.5, DeepSeek-V3, Qwen 2.5 Max, and Llama 4 Maverick exhibit significant positive bias (up to +0.18, +0.20, +0.30, +0.36 mean score inflation, respectively). Furthermore, we uncover a critical reasoning gap in the discrete domain: while Gemini 3.0 Pro achieves state-of-the-art performance (0.91 average human evaluation score), other reasoning models like GPT-5 Pro and Claude Sonnet 4.5 see their performance significantly degrade in discrete domains. Specifically, their average human evaluation scores drop to 0.72 and 0.63 in Discrete Math, and to 0.74 and 0.50 in Graph Theory. In addition to these research results, we also release QEDBench as a public benchmark for evaluating and improving AI judges. Our benchmark is publicly published at https://github.com/qqliu/Yale-QEDBench.
【13】Upper-Linearizability of Online Non-Monotone DR-Submodular Maximization over Down-Closed Convex Sets
标题:下闭凸集上在线非单调DR-次模最大化的上线性化
链接:https://arxiv.org/abs/2602.20578
作者:Yiyang Lu,Haresh Jadav,Mohammad Pedramfar,Ranveer Singh,Vaneet Aggarwal
摘要:我们研究非单调递减回报(DR)次模函数在下闭凸集上的在线最大化问题;在这一情形下,现有的免投影在线方法存在次优遗憾和有限的反馈保证。我们的主要贡献是一个新的结构性结果:在精心设计的指数重参数化、缩放参数和代理势函数下,该函数类是$1/e$-可线性化的,从而可以归约为在线线性优化。由此,我们在每轮仅一次梯度查询的情况下获得$O(T^{1/2})$静态遗憾,并解锁自适应和动态遗憾保证,以及在半bandit、bandit和零阶反馈下的改进速率。在所有反馈模型中,我们的界都严格改进了现有最佳结果。
摘要:We study online maximization of non-monotone Diminishing-Return(DR)-submodular functions over down-closed convex sets, a regime where existing projection-free online methods suffer from suboptimal regret and limited feedback guarantees. Our main contribution is a new structural result showing that this class is $1/e$-linearizable under carefully designed exponential reparametrization, scaling parameter, and surrogate potential, enabling a reduction to online linear optimization. As a result, we obtain $O(T^{1/2})$ static regret with a single gradient query per round and unlock adaptive and dynamic regret guarantees, together with improved rates under semi-bandit, bandit, and zeroth-order feedback. Across all feedback models, our bounds strictly improve the state of the art.
【14】Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI coordination
标题:内心言语作为行为指南:人类与人工智能协调的多样化行为的可控模仿
链接:https://arxiv.org/abs/2602.20517
作者:Rakshit Trivedi,Kartik Sharma,David C Parkes
备注:Spotlight paper at NeurIPS 2025
摘要:有效的人类-人工智能协调需要人工智能体能够展示和响应类似人类的行为,同时适应不断变化的环境。模仿学习已经成为通过训练它们模仿人类演示行为来构建此类代理的突出方法之一。然而,目前的方法难以捕捉人类行为的固有多样性和非马尔可夫性质,并且缺乏在推理时引导行为的能力。从人类认知过程的理论中汲取灵感,其中内部言语在执行之前指导动作选择,我们提出了MIMIC(模仿和控制的内部动机建模),这是一个使用语言作为行为意图的内部表示的框架。MIMIC采用视觉语言模型作为语言脚手架的新用途,训练一个条件变分自动编码器,能够从观察产生内部语音。然后,基于扩散的行为克隆策略选择以当前观察和生成的内部语音为条件的动作。MIMIC通过在特定于行为的语音上调节代理,在推理时实现细粒度的行为转向。跨机器人操作任务和人类-AI协作游戏的实验表明,MIMIC显著增强了行为多样性和对人类演示的保真度,同时实现了细微的行为转向,而无需对额外的演示进行培训。我们开源我们的代码,并提供预先培训的MIMIC代理和定性演示:https://mimic-research.github.io。
摘要:Effective human-AI coordination requires artificial agents capable of exhibiting and responding to human-like behaviors while adapting to changing contexts. Imitation learning has emerged as one of the prominent approaches to build such agents by training them to mimic human-demonstrated behaviors. However, current methods struggle to capture the inherent diversity and non-Markovian nature of human behavior and lack the ability to steer behavior at inference time. Drawing inspiration from the theory of human cognitive processes, where inner speech guides action selection before execution, we propose MIMIC (Modeling Inner Motivations for Imitation and Control), a framework that uses language as an internal representation of behavioral intent. MIMIC employs the novel use of vision-language models as linguistic scaffolding to train a conditional variational autoencoder capable of generating inner speech from observations. A diffusion-based behavior cloning policy then selects actions conditioned on current observations and the generated inner speech. MIMIC enables fine-grained steering of behavior at inference time by conditioning the agent on behavior-specific speech. Experiments across robotic manipulation tasks and human-AI collaboration games demonstrate that MIMIC significantly enhances both behavior diversity and fidelity to human demonstrations while enabling nuanced behavioral steering without training on additional demonstrations. We open source our code and provide pre-trained MIMIC agents and qualitative demos at: https://mimic-research.github.io.
【15】ActionEngine: From Reactive to Programmatic GUI Agents via State Machine Memory
标题:ActionEngine:通过状态机记忆从反应式到程序化的图形用户界面代理
链接:https://arxiv.org/abs/2602.20502
作者:Hongbin Zhong,Fazle Faisal,Luis França,Tanakorn Leesatapornwongsa,Adriana Szekeres,Kexin Rong,Suman Nath
摘要:现有的图形用户界面(GUI)代理通过逐步调用视觉语言模型来操作:截取屏幕截图、推理下一步动作、执行它,然后在新页面上重复。这导致成本和延迟随推理步骤数增长,并且由于缺少对已访问页面的持久记忆,准确性受限。我们提出ActionEngine,一个免训练框架,通过新颖的双代理架构从反应式执行过渡到程序化规划:爬取代理通过离线探索构建GUI的可更新状态机记忆;执行代理利用该记忆合成完整、可执行的Python程序以在线执行任务。为确保对不断演化的界面的鲁棒性,执行失败会触发基于视觉的重新定位回退,修复失败的操作并更新记忆。这一设计大幅提升了效率和准确性:在WebArena基准的Reddit任务上,我们的代理平均仅需一次LLM调用即可取得95%的任务成功率,而最强的纯视觉基线为66%,同时将成本降低11.8倍、端到端延迟降低2倍。这些组件将全局程序化规划、经爬取验证的动作模板,以及带局部验证与修复的节点级执行相结合,共同实现了可扩展且可靠的GUI交互。
摘要:Existing Graphical User Interface (GUI) agents operate through step-by-step calls to vision language models--taking a screenshot, reasoning about the next action, executing it, then repeating on the new page--resulting in high costs and latency that scale with the number of reasoning steps, and limited accuracy due to no persistent memory of previously visited pages. We propose ActionEngine, a training-free framework that transitions from reactive execution to programmatic planning through a novel two-agent architecture: a Crawling Agent that constructs an updatable state-machine memory of the GUIs through offline exploration, and an Execution Agent that leverages this memory to synthesize complete, executable Python programs for online task execution. To ensure robustness against evolving interfaces, execution failures trigger a vision-based re-grounding fallback that repairs the failed action and updates the memory. This design drastically improves both efficiency and accuracy: on Reddit tasks from the WebArena benchmark, our agent achieves 95% task success with on average a single LLM call, compared to 66% for the strongest vision-only baseline, while reducing cost by 11.8x and end-to-end latency by 2x. Together, these components yield scalable and reliable GUI interaction by combining global programmatic planning, crawler-validated action templates, and node-level execution with localized validation and repair.
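The state-machine memory and re-grounding fallback described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: `StateMachineMemory`, `synthesize`, and `run` are invented names, and real action templates would drive a browser rather than return strings.

```python
class StateMachineMemory:
    """Maps (page_state, intent) pairs to action callables found by offline crawling."""

    def __init__(self):
        self.transitions = {}

    def record(self, state, intent, action):
        self.transitions[(state, intent)] = action

    def synthesize(self, steps):
        """Turn a list of (state, intent) steps into an executable program."""
        return [self.transitions[(s, i)] for s, i in steps]


def run(program, reground):
    """Execute a synthesized program; a failing step triggers a re-grounding fallback."""
    out = []
    for step in program:
        try:
            out.append(step())
        except Exception:
            # vision-based re-grounding would repair the action and patch memory;
            # here it is a caller-supplied stub
            out.append(reground(step)())
    return out
```

The point of the design is that the whole program is synthesized up front from memory, so the per-task LLM cost no longer scales with the number of GUI steps.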
【16】VINA: Variational Invertible Neural Architectures
标题:VINA:变分可逆神经架构
链接:https://arxiv.org/abs/2602.20480
作者:Shubhanshu Shekhar,Mohammad Javad Khojasteh,Ananya Acharya,Tony Tohme,Kamal Youcef-Toumi
备注:57 pages, 11 figures, 5 tables
摘要:归一化流(NF)独特的架构特性,特别是双射性和易处理的雅可比行列式,使其非常适合生成建模。可逆神经网络(INN)基于这些原则求解有监督的逆问题,实现对正向与逆向映射的直接建模。本文从理论与实践两个角度重新审视这些架构,并填补文献中的一个关键空白:无论是INN的后验推断还是NF的生成建模,都缺乏在现实假设下关于逼近质量的理论保证。我们引入了一个基于变分无监督损失函数的INN与NF统一框架,其灵感来自生成对抗网络(GAN)以及用于训练归一化流的精确率-召回率散度等相关领域中的类似表述。在该框架内,我们在比以往工作更弱、更符合实际的假设下推导出理论性能保证,分别量化INN的后验精度与NF的分布精度。基于这些理论结果,我们开展了大量案例研究,提炼出一般性的设计原则与实践指南。最后,我们在一个真实的海洋声学反演问题上展示了所提方法的有效性。
摘要:The distinctive architectural features of normalizing flows (NFs), notably bijectivity and tractable Jacobians, make them well-suited for generative modeling. Invertible neural networks (INNs) build on these principles to address supervised inverse problems, enabling direct modeling of both forward and inverse mappings. In this paper, we revisit these architectures from both theoretical and practical perspectives and address a key gap in the literature: the lack of theoretical guarantees on approximation quality under realistic assumptions, whether for posterior inference in INNs or for generative modeling with NFs. We introduce a unified framework for INNs and NFs based on variational unsupervised loss functions, inspired by analogous formulations in related areas such as generative adversarial networks (GANs) and the Precision-Recall divergence for training normalizing flows. Within this framework, we derive theoretical performance guarantees, quantifying posterior accuracy for INNs and distributional accuracy for NFs, under assumptions that are weaker and more practically realistic than those used in prior work. Building on these theoretical results, we conduct extensive case studies to distill general design principles and practical guidelines. We conclude by demonstrating the effectiveness of our approach on a realistic ocean-acoustic inversion problem.
【17】Prior-Agnostic Incentive-Compatible Exploration
标题:先验无关的激励相容探索
链接:https://arxiv.org/abs/2602.20465
作者:Ramya Ramalingam,Osbert Bastani,Aaron Roth
摘要:在老虎机设置中,优化长期遗憾指标需要探索,这相当于有时采取短视次优的行动。当一个长期存在的委托人仅向一系列不同的代理人推荐要执行的行动时(如在线推荐平台),就产生了激励错位:探索对委托人来说是"值得的",但对代理人却不然。以往工作在静态随机设置下研究贝叶斯激励相容约束下的遗憾最小化,其中代理人与算法设计者共享一个固定的公共先验。我们证明,(加权)交换遗憾界本身就足以使代理人在近似贝叶斯纳什均衡中忠实遵循预测,即使在代理人持有相互冲突的先验信念、且机制设计者对任何代理人的信念一无所知的动态环境中也是如此。为了获得这些界,必须假设代理人不仅对奖励存在一定程度的不确定性,而且对其到达时间,即其在算法所服务的代理人序列中的相对位置,也存在不确定性。我们用在老虎机设置中保证自适应遗憾和加权遗憾的具体算法实例化了我们的抽象界。
摘要:In bandit settings, optimizing long-term regret metrics requires exploration, which corresponds to sometimes taking myopically sub-optimal actions. When a long-lived principal merely recommends actions to be executed by a sequence of different agents (as in an online recommendation platform), this provides an incentive misalignment: exploration is "worth it" for the principal but not for the agents. Prior work studies regret minimization under the constraint of Bayesian Incentive-Compatibility in a static stochastic setting with a fixed and common prior shared amongst the agents and the algorithm designer. We show that (weighted) swap regret bounds on their own suffice to cause agents to faithfully follow forecasts in an approximate Bayes Nash equilibrium, even in dynamic environments in which agents have conflicting prior beliefs and the mechanism designer has no knowledge of any agents' beliefs. To obtain these bounds, it is necessary to assume that the agents have some degree of uncertainty not just about the rewards, but about their arrival time -- i.e. their relative position in the sequence of agents served by the algorithm. We instantiate our abstract bounds with concrete algorithms for guaranteeing adaptive and weighted regret in bandit settings.
【18】Nonparametric Teaching of Attention Learners
标题:注意力学习者的非参数教学
链接:https://arxiv.org/abs/2602.20461
作者:Chen Zhang,Jianghui Wang,Bingyang Cheng,Zhongtao Chen,Wendong XU,Cong Wang,Marco Canini,Francesco Orabona,Yik Chung WU,Ngai Wong
备注:ICLR 2026 (36 pages, 6 figures)
摘要:注意力学习者,即建立在注意力机制上的神经网络(例如Transformer),擅长学习将序列与其相应属性相关联的隐式关系,例如将给定的词元序列映射到下一个词元的概率。然而,其学习过程往往代价高昂。为此,我们提出了一个名为注意力神经教学(AtteNT)的新范式,从非参数教学的视角重新诠释学习过程。具体而言,后者为通过样例选择来教授隐式定义(即非参数)的映射提供了理论框架。这种隐式映射通过一个稠密的序列-属性对集合来体现,AtteNT教师从中选择一个子集以加速注意力学习者训练的收敛。通过分析注意力在训练中对基于参数的梯度下降的作用,并借助非参数教学中的函数梯度下降重新刻画由参数更新塑造的注意力学习者的演化,我们首次证明教授注意力学习者与教授重要性自适应的非参数学习者是一致的。这些新发现使AtteNT能够直接用于提升注意力学习者的学习效率。具体来说,我们观察到LLM的训练时间减少了13.01%,ViT减少了20.58%,涵盖微调与从头训练两种情形。至关重要的是,这些收益的取得并未牺牲准确性;事实上,性能在各种下游任务中得到一致保持并常有提升。
摘要:Attention learners, neural networks built on the attention mechanism, e.g., transformers, excel at learning the implicit relationships that relate sequences to their corresponding properties, e.g., mapping a given sequence of tokens to the probability of the next token. However, the learning process tends to be costly. To address this, we present a novel paradigm named Attention Neural Teaching (AtteNT) that reinterprets the learning process through a nonparametric teaching perspective. Specifically, the latter provides a theoretical framework for teaching mappings that are implicitly defined (i.e., nonparametric) via example selection. Such an implicit mapping is embodied through a dense set of sequence-property pairs, with the AtteNT teacher selecting a subset to accelerate convergence in attention learner training. By analytically investigating the role of attention on parameter-based gradient descent during training, and recasting the evolution of attention learners, shaped by parameter updates, through functional gradient descent in nonparametric teaching, we show for the first time that teaching attention learners is consistent with teaching importance-adaptive nonparametric learners. These new findings readily commit AtteNT to enhancing learning efficiency of attention learners. Specifically, we observe training time reductions of 13.01% for LLMs and 20.58% for ViTs, spanning both fine-tuning and training-from-scratch regimes. Crucially, these gains are achieved without compromising accuracy; in fact, performance is consistently preserved and often enhanced across a diverse set of downstream tasks.
【19】Imputation of Unknown Missingness in Sparse Electronic Health Records
标题:稀疏电子健康记录中未知缺失值的插补
链接:https://arxiv.org/abs/2602.20442
作者:Jun Han,Josue Nassar,Sanjit Singh Batra,Aldo Cordova-Palomera,Vijay Nori,Robert E. Tillman
摘要:机器学习为推进医学领域的发展带来了巨大的希望,电子健康记录(EHR)是主要的数据源。然而,由于医疗保健提供者之间数据收集和共享的各种挑战和限制,EHR通常是稀疏的,并且包含缺失的数据。用于填补缺失值的现有技术主要集中在已知的未知数上,例如实验室测试结果的缺失值或不可用值;大多数技术都没有明确解决难以区分缺失内容的情况。例如,EHR中缺失的诊断代码可能意味着患者尚未被诊断出患有该疾病,或者已经做出诊断,但未被提供者共享。这种情况属于未知的未知的范式。为了解决这一挑战,我们开发了一个通用的算法去噪数据,以恢复未知的缺失值在二进制EHR。我们设计了一个基于变换器的去噪神经网络,其中对输出进行自适应阈值处理,以便在我们预测数据丢失的情况下恢复值。我们的研究结果表明,与现有的插补方法相比,在真实的EHR数据集中对医疗代码进行去噪的准确性有所提高,并且使用去噪数据提高了下游任务的性能。特别是,当将我们的方法应用于现实世界的应用,从EHR预测医院再入院时,我们的方法在所有现有基线上实现了统计学上的显著改善。
摘要:Machine learning holds great promise for advancing the field of medicine, with electronic health records (EHRs) serving as a primary data source. However, EHRs are often sparse and contain missing data due to various challenges and limitations in data collection and sharing between healthcare providers. Existing techniques for imputing missing values predominantly focus on known unknowns, such as missing or unavailable values of lab test results; most do not explicitly address situations where it is difficult to distinguish what is missing. For instance, a missing diagnosis code in an EHR could signify either that the patient has not been diagnosed with the condition or that a diagnosis was made, but not shared by a provider. Such situations fall into the paradigm of unknown unknowns. To address this challenge, we develop a general purpose algorithm for denoising data to recover unknown missing values in binary EHRs. We design a transformer-based denoising neural network where the output is thresholded adaptively to recover values in cases where we predict data are missing. Our results demonstrate improved accuracy in denoising medical codes within a real EHR dataset compared to existing imputation approaches and leads to increased performance on downstream tasks using the denoised data. In particular, when applying our method to a real world application, predicting hospital readmission from EHRs, our method achieves statistically significant improvement over all existing baselines.
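The thresholded recovery step can be illustrated with a small numpy sketch. This is a simplified stand-in, not the paper's transformer: it assumes the denoiser's per-entry probabilities and a predicted-missingness score are already given as arrays (`probs` and `missing_score` are hypothetical inputs), and it uses fixed thresholds where the paper thresholds adaptively.

```python
import numpy as np

def recover(x, probs, missing_score, tau_missing=0.5, tau_code=0.5):
    """Recover unknown missing values in a binary record matrix.

    x             : observed binary matrix (1 = code present, 0 = absent/unknown)
    probs         : denoiser's estimate of P(code present)
    missing_score : estimate of P(a zero entry is an unknown missing value)
    """
    x = x.copy()
    # only zeros flagged as likely-missing are eligible for recovery;
    # observed ones are never overwritten
    predicted_missing = (x == 0) & (missing_score > tau_missing)
    x[predicted_missing & (probs > tau_code)] = 1
    return x
```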
【20】GeoPT: Scaling Physics Simulation via Lifted Geometric Pre-Training
标题:GeoPT:通过提升几何预训练来缩放物理模拟
链接:https://arxiv.org/abs/2602.20399
作者:Haixu Wu,Minghao Guo,Zongyi Li,Zhiyang Dou,Mingsheng Long,Kaiming He,Wojciech Matusik
摘要:神经模拟器有望成为物理模拟的有效替代品,但由于生成高保真训练数据的成本过高,因此无法对其进行缩放。对大量现成几何图形的预训练提供了一种自然的选择,但面临着一个根本的差距:仅对静态几何图形的监督忽视了动态,并可能导致物理任务的负迁移。我们提出了GeoPT,这是一个基于提升几何预训练的通用物理模拟统一预训练模型。其核心思想是用合成动力学来增强几何,实现动态感知的自我监督,而无需物理标签。GeoPT在超过100万个样本上进行了预训练,不断提高工业保真度基准,涵盖汽车,飞机和船舶的流体力学,以及碰撞模拟中的固体力学,将标记数据要求减少20-60%,并将收敛速度加快2倍。这些结果表明,使用合成动力学的提升弥合了几何-物理差距,为神经模拟和潜在的超越打开了一条可扩展的道路。代码可在https://github.com/Physics-Scaling/GeoPT上获得。
摘要:Neural simulators promise efficient surrogates for physics simulation, but scaling them is bottlenecked by the prohibitive cost of generating high-fidelity training data. Pre-training on abundant off-the-shelf geometries offers a natural alternative, yet faces a fundamental gap: supervision on static geometry alone ignores dynamics and can lead to negative transfer on physics tasks. We present GeoPT, a unified pre-trained model for general physics simulation based on lifted geometric pre-training. The core idea is to augment geometry with synthetic dynamics, enabling dynamics-aware self-supervision without physics labels. Pre-trained on over one million samples, GeoPT consistently improves industrial-fidelity benchmarks spanning fluid mechanics for cars, aircraft, and ships, and solid mechanics in crash simulation, reducing labeled data requirements by 20-60% and accelerating convergence by 2$\times$. These results show that lifting with synthetic dynamics bridges the geometry-physics gap, unlocking a scalable path for neural simulation and potentially beyond. Code is available at https://github.com/Physics-Scaling/GeoPT.
【21】cc-Shapley: Measuring Multivariate Feature Importance Needs Causal Context
标题:cc-Shapley:衡量多元特征重要性需要因果背景
链接:https://arxiv.org/abs/2602.20396
作者:Jörg Martin,Stefan Haufe
摘要:可解释人工智能有望揭示相关特征,从而使人类能够检查和审视机器学习模型,甚至促进科学发现。针对广泛使用的Shapley值技术,我们发现纯粹数据驱动的多元特征重要性操作化并不适合上述目的。即使对于只有两个特征的简单问题,仅在另一个特征的观测背景下考虑某个特征,也会因对撞偏差和抑制效应产生虚假关联,从而导致误读。识别并纠正这种误导性特征归因需要关于数据生成过程的因果知识。我们提出cc-Shapley(因果情境Shapley),它利用数据因果结构的知识对传统观测性Shapley值做干预式修改,从而在其余特征的因果情境中分析某一特征的相关性。我们从理论上证明这可以根除由对撞偏差引起的虚假关联。我们在多个合成与真实数据集上比较了Shapley与cc-Shapley值的行为,观察到从观测性Shapley转向cc-Shapley时,相较单变量特征重要性,关联会被消除或反转。
摘要:Explainable artificial intelligence promises to yield insights into relevant features, thereby enabling humans to examine and scrutinize machine learning models or even facilitating scientific discovery. Considering the widespread technique of Shapley values, we find that purely data-driven operationalization of multivariate feature importance is unsuitable for such purposes. Even for simple problems with two features, spurious associations due to collider bias and suppression arise from considering one feature only in the observational context of the other, which can lead to misinterpretations. Causal knowledge about the data-generating process is required to identify and correct such misleading feature attributions. We propose cc-Shapley (causal context Shapley), an interventional modification of conventional observational Shapley values leveraging knowledge of the data's causal structure, thereby analyzing the relevance of a feature in the causal context of the remaining features. We show theoretically that this eradicates spurious association induced by collider bias. We compare the behavior of Shapley and cc-Shapley values on various, synthetic, and real-world datasets. We observe nullification or reversal of associations compared to univariate feature importance when moving from observational to cc-Shapley.
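The Shapley combinatorics underlying both variants can be sketched generically: the observational-versus-interventional distinction lives entirely in the value function `v` supplied by the caller (cc-Shapley would build `v` from interventions on a causal graph). A minimal exact computation, exponential in the number of features and intended only for illustration:

```python
from itertools import combinations
from math import factorial

def shapley(n_features, v):
    """Exact Shapley attributions.

    v maps a frozenset of feature indices to the model's expected output when
    only those features are "switched on"; how the remaining features are
    marginalized (observationally or interventionally) is encoded in v.
    """
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        for r in range(n_features):
            for s in combinations([p for p in players if p != i], r):
                # standard Shapley weight |S|! (n - |S| - 1)! / n!
                w = factorial(len(s)) * factorial(n_features - len(s) - 1) / factorial(n_features)
                phi[i] += w * (v(frozenset(s) | {i}) - v(frozenset(s)))
    return phi
```

By the efficiency axiom the attributions always sum to `v(all) - v(none)`, regardless of which marginalization `v` uses.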
【22】No One Size Fits All: QueryBandits for Hallucination Mitigation
标题:没有万能方案:用于缓解幻觉的QueryBandits
链接:https://arxiv.org/abs/2602.20332
作者:Nicole Cho,William Watson,Alec Koppel,Sumitra Ganesh,Manuela Veloso
摘要:大型语言模型(LLM)的高级推理能力导致幻觉更加频繁;但大多数缓解工作都集中在开源模型的事后检测和参数编辑上。鲜有研究关注闭源模型中的幻觉,这尤其令人担忧,因为闭源模型在机构部署中占绝大多数。我们提出QueryBandits,一个与模型无关的上下文老虎机框架,它利用经过实证验证和校准的奖励函数,在线自适应地学习选择最优的查询重写策略。在16个问答场景中,我们表现最佳的QueryBandit(汤普森采样)相对于不重写基线取得了87.5%的胜率,并分别以42.6%和60.3%的优势超越零样本静态策略(如改写或扩展)。此外,在所有数据集上,所有上下文老虎机都优于普通老虎机,且特征方差越高,臂选择的方差也越大。这证实了我们的发现:不存在对所有查询都最优的单一重写策略。我们还发现,某些静态策略的累积遗憾甚至高于不重写,表明僵化的查询重写策略会加剧幻觉。因此,利用QueryBandits在语义特征上学习在线策略,可以纯粹通过前向传播机制改变模型行为,使其适用于闭源模型,并绕过重新训练或基于梯度的适配。
摘要:Advanced reasoning capabilities in Large Language Models (LLMs) have led to more frequent hallucinations; yet most mitigation work focuses on open-source models for post-hoc detection and parameter editing. The dearth of studies focusing on hallucinations in closed-source models is especially concerning, as they constitute the vast majority of models in institutional deployments. We introduce QueryBandits, a model-agnostic contextual bandit framework that adaptively learns online to select the optimal query-rewrite strategy by leveraging an empirically validated and calibrated reward function. Across 16 QA scenarios, our top QueryBandit (Thompson Sampling) achieves an 87.5% win rate over a No-Rewrite baseline and outperforms zero-shot static policies (e.g., Paraphrase or Expand) by 42.6% and 60.3%, respectively. Moreover, all contextual bandits outperform vanilla bandits across all datasets, with higher feature variance coinciding with greater variance in arm selection. This substantiates our finding that there is no single rewrite policy optimal for all queries. We also discover that certain static policies incur higher cumulative regret than No-Rewrite, indicating that an inflexible query-rewriting policy can worsen hallucinations. Thus, learning an online policy over semantic features with QueryBandits can shift model behavior purely through forward-pass mechanisms, enabling its use with closed-source models and bypassing the need for retraining or gradient-based adaptation.
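The Thompson Sampling arm-selection loop can be sketched with Beta-Bernoulli posteriors. This is a minimal non-contextual sketch: the paper's bandit additionally conditions on query features, and its reward is a calibrated hallucination-avoidance signal rather than the toy `reward_fn` below.

```python
import random

def thompson(arms, reward_fn, rounds, seed=0):
    """Thompson Sampling over query-rewrite arms with Bernoulli rewards."""
    rng = random.Random(seed)
    alpha = {a: 1.0 for a in arms}  # Beta(1, 1) priors on each arm's success rate
    beta = {a: 1.0 for a in arms}
    pulls = {a: 0 for a in arms}
    for _ in range(rounds):
        # sample a plausible success rate per arm and play the argmax
        a = max(arms, key=lambda a: rng.betavariate(alpha[a], beta[a]))
        r = reward_fn(a)          # 1 if the rewrite avoided a hallucination
        alpha[a] += r
        beta[a] += 1 - r
        pulls[a] += 1
    return pulls
```

Because selection only needs posterior sampling and a forward pass to score the answer, the same loop works unchanged against closed-source models.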
【23】CaDrift: A Time-dependent Causal Generator of Drifting Data Streams
标题:CaDrift:漂移数据流的时间相关因果生成器
链接:https://arxiv.org/abs/2602.20329
作者:Eduardo V. L. Barboza,Jean Paul Barddal,Robert Sabourin,Rafael M. O. Cruz
备注:Paper submitted to ICLR 2026
摘要:本文提出CaDrift,一个基于结构因果模型(SCM)的时间相关合成数据生成器框架。该框架可生成几乎无限组合的数据流,兼具受控的偏移事件和时间相关数据,是评估方法在演化数据下表现的工具。CaDrift通过使SCM的映射函数发生漂移来合成各种分布偏移和协变量偏移,这些偏移会改变特征与目标之间潜在的因果关系。此外,CaDrift借助因果建模中的干预来模拟偶发扰动。实验结果表明,在分布偏移事件之后,分类器的准确率趋于下降,随后逐渐恢复,证实了该生成器在模拟偏移方面的有效性。该框架已在GitHub上发布。
摘要:This work presents Causal Drift Generator (CaDrift), a time-dependent synthetic data generator framework based on Structural Causal Models (SCMs). The framework produces a virtually infinite combination of data streams with controlled shift events and time-dependent data, making it a tool to evaluate methods under evolving data. CaDrift synthesizes various distributional and covariate shifts by drifting mapping functions of the SCM, which change underlying cause-and-effect relationships between features and the target. In addition, CaDrift models occasional perturbations by leveraging interventions in causal modeling. Experimental results show that, after distributional shift events, the accuracy of classifiers tends to drop, followed by a gradual retrieval, confirming the generator's effectiveness in simulating shifts. The framework has been made available on GitHub.
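The drifting-mechanism idea can be sketched with a two-variable SCM whose mapping function is swapped at a drift point: the marginal of the cause stays fixed, but the cause-effect relationship flips, so a model fit before the drift degrades afterwards, mirroring the accuracy drop reported above. `f_before` and `f_after` are illustrative mechanisms, not CaDrift's API.

```python
import random

def stream(n, drift_at, seed=0):
    """Yield (x, y) pairs from a toy SCM whose mechanism drifts at t = drift_at."""
    rng = random.Random(seed)
    f_before = lambda x: x > 0.5   # original mechanism X -> Y
    f_after = lambda x: x <= 0.5   # drifted mechanism (concept flip)
    for t in range(n):
        x = rng.random()           # exogenous cause, distribution unchanged
        f = f_before if t < drift_at else f_after
        yield x, int(f(x))
```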
【24】Discrete Diffusion with Sample-Efficient Estimators for Conditionals
标题:基于样本高效条件分布估计器的离散扩散
链接:https://arxiv.org/abs/2602.20293
作者:Karthik Elamvazhuthi,Abhijith Jayakumar,Andrey Y. Lokhov
摘要:我们研究一种离散去噪扩散框架,它将单站点条件分布的样本高效估计器与轮询式加噪和去噪动力学相结合,用于离散状态空间上的生成建模。我们的表述并不近似得分函数的离散类比,而是将单站点条件概率视为参数化逆向扩散过程的基本对象。我们采用一种称为神经交互筛选估计器(NeurISE)的样本高效方法来估计扩散动力学中的这些条件分布。在合成Ising模型、MNIST,以及由D-Wave量子退火机、合成Potts模型和一维量子系统产生的科学数据集上的受控实验验证了所提方法。在二值数据集上,这些实验表明所提方法优于包括基于比率方法在内的流行现有方法,在总变差、互相关和核密度估计指标上均取得更好的性能。
摘要:We study a discrete denoising diffusion framework that integrates a sample-efficient estimator of single-site conditionals with round-robin noising and denoising dynamics for generative modeling over discrete state spaces. Rather than approximating a discrete analog of a score function, our formulation treats single-site conditional probabilities as the fundamental objects that parameterize the reverse diffusion process. We employ a sample-efficient method known as Neural Interaction Screening Estimator (NeurISE) to estimate these conditionals in the diffusion dynamics. Controlled experiments on synthetic Ising models, MNIST, and scientific data sets produced by a D-Wave quantum annealer, synthetic Potts model and one-dimensional quantum systems demonstrate the proposed approach. On the binary data sets, these experiments demonstrate that the proposed approach outperforms popular existing methods including ratio-based approaches, achieving improved performance in total variation, cross-correlations, and kernel density estimation metrics.
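The round-robin single-site mechanics can be sketched on binary data: estimate each site's conditional P(x_i | x_{-i}) from clean samples (here by simple counting, where the paper uses NeurISE), then sweep the sites in a fixed order and resample each from its estimated conditional. A toy illustration:

```python
import random

def fit_conditionals(samples, n_sites):
    """Empirical single-site conditionals: per site, context -> (#ones, #total)."""
    tables = [dict() for _ in range(n_sites)]
    for s in samples:
        for i in range(n_sites):
            ctx = tuple(s[:i] + s[i + 1:])
            ones, total = tables[i].get(ctx, (0, 0))
            tables[i][ctx] = (ones + s[i], total + 1)
    return tables

def denoise(x, tables, sweeps=5, seed=0):
    """Round-robin denoising: resample each site from its conditional."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(sweeps):
        for i in range(len(x)):            # fixed round-robin site order
            ctx = tuple(x[:i] + x[i + 1:])
            ones, total = tables[i].get(ctx, (0, 0))
            if total:
                x[i] = int(rng.random() < ones / total)
    return x
```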
【25】The Truthfulness Spectrum Hypothesis
标题:真实性谱假设
链接:https://arxiv.org/abs/2602.20273
作者:Zhuofan Josh Ying,Shauli Ravfogel,Nikolaus Kriegeskorte,Peter Hase
备注:28 pages, 26 figures
摘要:已有报道称大型语言模型(LLM)对真实性进行线性编码,但近期工作对这一发现的普适性提出质疑。我们用真实性谱假设来调和这些观点:表征空间中包含从广泛领域通用到狭窄领域特定的一系列方向。为检验该假设,我们系统评估了探针在五种真理类型(定义性、经验性、逻辑性、虚构性和伦理性)、阿谀式与期望反转式说谎,以及现有诚实性基准上的泛化能力。线性探针在大多数领域上泛化良好,但在阿谀式和期望反转式说谎上失效。然而,在所有领域上联合训练可恢复强劲性能,证实尽管两两迁移不佳,领域通用方向仍然存在。探针方向的几何结构解释了这些模式:探针之间的马氏余弦相似度几乎完美地预测跨领域泛化(R^2=0.98)。概念擦除方法进一步分离出(1)领域通用、(2)领域特定,或(3)仅在特定领域子集间共享的真理方向。因果干预表明,领域特定方向比领域通用方向的引导更为有效。最后,后训练重塑了真理几何,将阿谀式说谎推离其他真理类型,这为聊天模型的阿谀倾向提供了表征层面的解释。总之,我们的结果支持真实性谱假设:不同一般性的真理方向共存于表征空间中,并由后训练重塑其几何结构。所有实验的代码见https://github.com/zfying/truth_spec。
摘要:Large language models (LLMs) have been reported to linearly encode truthfulness, yet recent work questions this finding's generality. We reconcile these views with the truthfulness spectrum hypothesis: the representational space contains directions ranging from broadly domain-general to narrowly domain-specific. To test this hypothesis, we systematically evaluate probe generalization across five truth types (definitional, empirical, logical, fictional, and ethical), sycophantic and expectation-inverted lying, and existing honesty benchmarks. Linear probes generalize well across most domains but fail on sycophantic and expectation-inverted lying. Yet training on all domains jointly recovers strong performance, confirming that domain-general directions exist despite poor pairwise transfer. The geometry of probe directions explains these patterns: Mahalanobis cosine similarity between probes near-perfectly predicts cross-domain generalization (R^2=0.98). Concept-erasure methods further isolate truth directions that are (1) domain-general, (2) domain-specific, or (3) shared only across particular domain subsets. Causal interventions reveal that domain-specific directions steer more effectively than domain-general ones. Finally, post-training reshapes truth geometry, pushing sycophantic lying further from other truth types, suggesting a representational basis for chat models' sycophantic tendencies. Together, our results support the truthfulness spectrum hypothesis: truth directions of varying generality coexist in representational space, with post-training reshaping their geometry. Code for all experiments is provided in https://github.com/zfying/truth_spec.
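The probe-geometry analysis can be sketched as follows: fit linear probes by plain logistic-regression gradient descent on synthetic "activations" for two domains, then compare the probe directions by cosine similarity. This uses an unwhitened cosine as a simplified stand-in for the paper's Mahalanobis cosine, and random Gaussian features in place of real model activations.

```python
import numpy as np

def fit_probe(X, y, lr=0.5, steps=500):
    """Logistic-regression probe via full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        z = np.clip(X @ w, -30.0, 30.0)       # clip logits for numerical safety
        p = 1.0 / (1.0 + np.exp(-z))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Two domains whose labels depend on the same latent direction yield near-parallel probes; a domain keyed to an orthogonal direction yields a near-orthogonal probe, which is the geometric pattern the paper uses to predict cross-domain generalization.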
【26】SMaRT: Online Reusable Resource Assignment and an Application to Mediation in the Kenyan Judiciary
标题:SMaRT:在线可重复使用资源分配和肯尼亚司法机构的调解应用
链接:https://arxiv.org/abs/2602.18431
作者:Shafkat Farabi,Didac Marti Pinto,Wei Lu,Manuel Ramos-Maqueda,Sanmay Das,Antoine Deeb,Anja Sautmann
摘要:受肯尼亚司法系统中为案件分配调解员这一问题的启发,我们研究一个在线资源分配问题:到达的任务(案件)必须立即分配给可用且容量受限的资源(调解员)。这些资源的质量各不相同,且可能需要学习。此外,每个资源只能被分配给某个任务子集,这些子集与其他资源可承接的任务子集有不同程度的重叠。目标是在满足所有资源的软容量约束的同时最大化任务完成量。现实问题的规模带来了巨大挑战:调解员超过2000名,且每位调解员有资格承接的地理位置(87个)与案件类型(12种)组合繁多。这些特征,即新资源质量未知、软容量约束以及高维状态空间,使得现有调度和资源分配算法要么不适用,要么效率低下。我们以易处理的方式将该问题形式化:用二次规划进行分配,用多智能体老虎机式框架进行学习。我们在调解员分配问题的程式化实例上,与基线对比展示了新算法SMaRT(为任务选择合适的调解员)的关键特性与优势。随后,我们将其应用于来自肯尼亚司法机构的案件与调解员的真实数据。无论是在调解员质量事先已知的设置下,还是在学习本身是问题一部分的老虎机式设置下,SMaRT均优于基线,并允许在容量约束的严格程度与整体案件解决率之间进行权衡控制。基于这些结果,我们计划近期在司法系统中开展一项SMaRT随机对照试验。
摘要:Motivated by the problem of assigning mediators to cases in the Kenyan judiciary, we study an online resource allocation problem where incoming tasks (cases) must be immediately assigned to available, capacity-constrained resources (mediators). The resources differ in their quality, which may need to be learned. In addition, resources can only be assigned to a subset of tasks that overlaps to varying degrees with the subset of tasks other resources can be assigned to. The objective is to maximize task completion while satisfying soft capacity constraints across all the resources. The scale of the real-world problem poses substantial challenges, since there are over 2000 mediators and a multitude of combinations of geographic locations (87) and case types (12) that each mediator is qualified to work on. Together, these features, unknown quality of new resources, soft capacity constraints, and a high-dimensional state space, make existing scheduling and resource allocation algorithms either inapplicable or inefficient. We formalize the problem in a tractable manner using a quadratic program formulation for assignment and a multi-agent bandit-style framework for learning. We demonstrate the key properties and advantages of our new algorithm, SMaRT (Selecting Mediators that are Right for the Task), compared with baselines on stylized instances of the mediator allocation problem. We then consider its application to real-world data on cases and mediators from the Kenyan judiciary. SMaRT outperforms baselines and allows control over the tradeoff between the strictness of capacity constraints and overall case resolution rates, both in settings where mediator quality is known beforehand and in bandit-like settings where learning is part of the problem definition. On the strength of these results, we plan to run a randomized controlled trial with SMaRT in the judiciary in the near future.
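The online assignment objective can be illustrated with a greedy score-based sketch: each arriving case goes to the qualified mediator whose estimated quality, minus a penalty for exceeding capacity, is highest. This is a deliberately simplified stand-in for the paper's quadratic program and bandit learning of qualities; `quality` and `capacity` are assumed inputs.

```python
def assign(cases, quality, capacity, penalty=1.0):
    """Greedily assign each case to a qualified mediator under soft capacity limits.

    cases    : list of lists, each the qualified mediators for one arriving case
    quality  : mediator -> estimated resolution quality
    capacity : mediator -> soft capacity (exceeding it is penalized, not forbidden)
    """
    load = {m: 0 for m in quality}
    out = []
    for qualified in cases:
        m = max(qualified,
                key=lambda m: quality[m] - penalty * max(0, load[m] + 1 - capacity[m]))
        load[m] += 1
        out.append(m)
    return out
```

Raising `penalty` hardens the capacity constraints, trading resolution quality for load balance, which is the same tradeoff the paper exposes through its QP formulation.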
【27】ShaRP: Shape-Regularized Multidimensional Projections
标题:ShaRP:形状规则化多维投影
链接:https://arxiv.org/abs/2306.00554
作者:Alister Machado,Alexandru Telea,Michael Behrisch
备注:To appear in EuroVA Workshop 2023
摘要:投影(即降维方法)是高维数据可视化探索的首选技术。此类技术众多,每种都有独特的视觉签名,即在所得散点图中排列点的一种可识别方式。这些签名是算法设计的隐式结果,例如方法侧重于局部还是全局数据模式的保留、所用优化技术以及超参数设置。我们提出一种新的投影技术ShaRP,让用户能够显式控制所生成散点图的视觉签名,从而更好地服务于交互式可视化场景。ShaRP可随维度和数据集规模良好扩展,可通用地处理任何定量数据集,并以用户可控的、在质量指标上较小的代价提供控制投影形状的这一扩展功能。
摘要:Projections, or dimensionality reduction methods, are techniques of choice for the visual exploration of high-dimensional data. Many such techniques exist, each one of them having a distinct visual signature - i.e., a recognizable way to arrange points in the resulting scatterplot. Such signatures are implicit consequences of algorithm design, such as whether the method focuses on local vs global data pattern preservation; optimization techniques; and hyperparameter settings. We present a novel projection technique - ShaRP - that provides users explicit control over the visual signature of the created scatterplot, which can cater better to interactive visualization scenarios. ShaRP scales well with dimensionality and dataset size, generically handles any quantitative dataset, and provides this extended functionality of controlling projection shapes at a small, user-controllable cost in terms of quality metrics.
【28】Complexity of Classical Acceleration for $\ell_1$-Regularized PageRank
标题:$\ell_1$-正则化PageRank经典加速的复杂度
链接:https://arxiv.org/abs/2602.21138
作者:Kimon Fountoulakis,David Martínez-Rubio
备注:23 pages, 8 Figures
摘要:我们研究使用标准的每轮一次梯度的加速近端梯度法(FISTA)计算$\ell_1$-正则化PageRank所需的度加权工作量。对于非加速的局部方法,已知最优的最坏情况工作量按$\widetilde{O}((\alpha\rho)^{-1})$缩放,其中$\alpha$为传送参数,$\rho$为$\ell_1$-正则化参数。一个自然的问题是:FISTA能否在保持$1/\rho$局部性缩放的同时,将对$\alpha$的依赖从$1/\alpha$改进为$1/\sqrt{\alpha}$。挑战在于,加速可能瞬时激活那些在最优解处为零的节点,从而破坏局部性并增加梯度计算的成本。我们分析FISTA在一个稍微过度正则化的目标上的表现,并证明在一个可检验的约束条件下,所有伪激活都保持在一个边界集$\mathcal{B}$内。这给出了一个由加速项$(\rho\sqrt{\alpha})^{-1}\log(\alpha/\varepsilon)$与边界开销$\sqrt{vol(\mathcal{B})}/(\rho\alpha^{3/2})$组成的界。我们给出了蕴含这种约束性的图结构条件。在合成图和真实图上的实验展示了度加权工作量模型下的加速与减速机制。
摘要:We study the degree-weighted work required to compute $\ell_1$-regularized PageRank using the standard one-gradient-per-iteration accelerated proximal-gradient method (FISTA). For non-accelerated local methods, the best known worst-case work scales as $\widetilde{O}((\alpha\rho)^{-1})$, where $\alpha$ is the teleportation parameter and $\rho$ is the $\ell_1$-regularization parameter. A natural question is whether FISTA can improve the dependence on $\alpha$ from $1/\alpha$ to $1/\sqrt{\alpha}$ while preserving the $1/\rho$ locality scaling. The challenge is that acceleration can break locality by transiently activating nodes that are zero at optimality, thereby increasing the cost of gradient evaluations. We analyze FISTA on a slightly over-regularized objective and show that, under a checkable confinement condition, all spurious activations remain inside a boundary set $\mathcal{B}$. This yields a bound consisting of an accelerated $(\rho\sqrt{\alpha})^{-1}\log(\alpha/\varepsilon)$ term plus a boundary overhead $\sqrt{vol(\mathcal{B})}/(\rho\alpha^{3/2})$. We provide graph-structural conditions that imply such confinement. Experiments on synthetic and real graphs show the resulting speedup and slowdown regimes under the degree-weighted work model.
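The accelerated scheme itself (one gradient per iteration, a soft-threshold proximal step, and a momentum extrapolation) can be sketched on a generic $\ell_1$-regularized least-squares toy problem; the paper's degree-weighted PageRank objective and locality bookkeeping are omitted here.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fista(A, b, rho, iters=300):
    """FISTA for min_x 0.5 * ||Ax - b||^2 + rho * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    y = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(iters):
        grad = A.T @ (A @ y - b)           # the single gradient query per iteration
        x_new = soft_threshold(y - grad / L, rho / L)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x
```

The extrapolation point `y` is exactly what can transiently activate coordinates that are zero at optimality, which is the locality issue the paper's confinement analysis controls.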
【29】Empirically Calibrated Conditional Independence Tests
标题:经验校准的条件独立性检验
链接:https://arxiv.org/abs/2602.21036
作者:Milleno Pan,Antoine de Mathelin,Wesley Tansey
摘要:条件独立性检验(CIT)广泛用于因果发现和特征选择。即便配合错误发现率(FDR)控制程序,它们在实践中也常常无法提供频率学派保证。我们强调两种常见失效模式:(i)在小样本中,许多CIT的渐近保证可能不准确,即使模型设定正确,也无法估计噪声水平并控制误差;(ii)当样本量很大但模型设定错误时,未被考虑的依赖关系会扭曲检验行为,使其无法在原假设下返回均匀分布的p值。我们提出经验校准的条件独立性检验(ECCIT),一种度量并校正误校准的方法。对于选定的基础CIT(如GCM、HRT),ECCIT优化一个对手,使其选择特征与响应函数以最大化误校准度量;随后,ECCIT拟合一个单调校准映射,按观察到的误校准程度调整基础检验的p值。在合成与真实数据的经验基准上,ECCIT在保持与具体检验无关的同时,以高于现有校准策略的功效实现有效的FDR控制。
摘要:Conditional independence tests (CIT) are widely used for causal discovery and feature selection. Even with false discovery rate (FDR) control procedures, they often fail to provide frequentist guarantees in practice. We highlight two common failure modes: (i) in small samples, asymptotic guarantees for many CITs can be inaccurate and even correctly specified models fail to estimate the noise levels and control the error, and (ii) when sample sizes are large but models are misspecified, unaccounted dependencies skew the test's behavior and fail to return uniform p-values under the null. We propose Empirically Calibrated Conditional Independence Tests (ECCIT), a method that measures and corrects for miscalibration. For a chosen base CIT (e.g., GCM, HRT), ECCIT optimizes an adversary that selects features and response functions to maximize a miscalibration metric. ECCIT then fits a monotone calibration map that adjusts the base-test p-values in proportion to the observed miscalibration. Across empirical benchmarks on synthetic and real data, ECCIT achieves valid FDR with higher power than existing calibration strategies while remaining test agnostic.
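The monotone calibration-map idea can be sketched via an empirical-CDF remap fitted on p-values produced under the null. This illustrates only the generic recalibration step; ECCIT fits its map against an adversarially maximized miscalibration metric rather than plain null draws.

```python
import numpy as np

def fit_calibration_map(null_pvals):
    """Fit a monotone map sending the base test's null p-values to ~Uniform(0, 1)."""
    sorted_p = np.sort(np.asarray(null_pvals))

    def calibrate(p):
        # empirical CDF: fraction of null p-values <= p (monotone by construction)
        return np.searchsorted(sorted_p, p, side="right") / len(sorted_p)

    return calibrate
```

An anti-conservative base test (null p-values piling up near zero) is stretched back toward uniform, restoring the validity of downstream FDR procedures.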
【30】Some Simple Economics of AGI
标题:AGI的一些简单经济学
链接:https://arxiv.org/abs/2602.20946
作者:Christian Catalini,Xiang Hui,Jane Wu
备注:JEL Classification: D82, D83, J23, J24, L23, O33. 112 pages, 3 figures
摘要:几千年来,人类认知是地球进步的主要引擎。随着人工智能将认知与生物学解耦,可度量执行的边际成本降为零,吸收了任何可被指标捕获的劳动,包括创造性、分析性和创新性工作。增长的约束不再是智能,而是人类的验证带宽:当执行变得充裕时,验证、审计和承担责任的能力。我们将AGI过渡建模为两条相互竞赛的成本曲线的碰撞:指数衰减的自动化成本与受生物瓶颈制约的验证成本。这种结构性不对称拉大了智能体能够执行的内容与人类负担得起去验证的内容之间的可测量性差距,并推动技术变革从偏向技能转为偏向可测量性。租金流向验证级的基础事实、密码学溯源与责任承保,即为结果投保而不仅仅是产生结果的能力。当前的人在回路均衡并不稳定:它既被学徒制的崩溃(初级环节缺失)从下方侵蚀,也被专家将自身的过时编码化(编码者的诅咒)从内部侵蚀。未经验证的部署对个体而言变得理性,这是一种特洛伊木马外部性。若不加管理,这些力量将把经济拖向空心经济。然而,若让验证能力与智能体能力同步扩展,威胁崩溃的力量就会成为无限发现与实验的催化剂,带来增强型经济。我们为个人、公司、投资者和政策制定者提出了一套实用行动指南。当今的决定性挑战不是部署最自主系统的竞赛,而是夯实其监督基础的竞赛。只有让验证带宽与执行能力同步扩展,我们才能确保我们所召唤的智能保留了发起它的人性。
摘要:For millennia, human cognition was the primary engine of progress on Earth. As AI decouples cognition from biology, the marginal cost of measurable execution falls to zero, absorbing any labor capturable by metrics--including creative, analytical, and innovative work. The binding constraint on growth is no longer intelligence but human verification bandwidth: the capacity to validate, audit, and underwrite responsibility when execution is abundant. We model the AGI transition as the collision of two racing cost curves: an exponentially decaying Cost to Automate and a biologically bottlenecked Cost to Verify. This structural asymmetry widens a Measurability Gap between what agents can execute and what humans can afford to verify. It also drives a shift from skill-biased to measurability-biased technical change. Rents migrate to verification-grade ground truth, cryptographic provenance, and liability underwriting--the ability to insure outcomes rather than merely generate them. The current human-in-the-loop equilibrium is unstable: eroded from below as apprenticeship collapses (Missing Junior Loop) and from within as experts codify their obsolescence (Codifier's Curse). Unverified deployment becomes privately rational--a Trojan Horse externality. Unmanaged, these forces pull toward a Hollow Economy. Yet by scaling verification alongside agentic capabilities, the forces that threaten collapse become the catalyst for unbounded discovery and experimentation--an Augmented Economy. We derive a practical playbook for individuals, companies, investors, and policymakers. Today's defining challenge is not the race to deploy the most autonomous systems; it is the race to secure the foundations of their oversight. Only by scaling our bandwidth for verification alongside our capacity for execution can we ensure that the intelligence we have summoned preserves the humanity that initiated it.
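The paper's central mechanism, two racing cost curves, can be sketched numerically. This is a made-up illustration, not the paper's calibrated model: the functional forms and every constant below are placeholder assumptions chosen only to show how a crossover point and a widening Measurability Gap arise.

```python
import math

# Exponentially decaying Cost to Automate vs. a flat, biologically
# bottlenecked Cost to Verify. All constants are illustrative placeholders.
def cost_to_automate(t, c0=100.0, r=0.5):
    return c0 * math.exp(-r * t)

def cost_to_verify(t, v0=20.0):
    return v0  # human verification bandwidth does not scale with t

def measurability_gap(t):
    # The gap opens once executing is cheaper than verifying the result.
    return max(0.0, cost_to_verify(t) - cost_to_automate(t))

# Crossover: c0 * exp(-r * t) = v0  =>  t* = ln(c0 / v0) / r
t_star = math.log(100.0 / 20.0) / 0.5
print(round(t_star, 3))  # ~3.219; the gap is zero before t* and grows after
```

Before `t_star` verification is the cheaper activity and the gap is zero; after it, the gap grows toward `v0`, which is the structural asymmetry the abstract argues shifts rents to verification.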
【31】Functional Continuous Decomposition
标题:功能连续分解
链接:https://arxiv.org/abs/2602.20857
作者:Teymur Aghayev
备注:16 pages, 9 figures, 6 tables
摘要:非平稳时间序列数据的分析需要深入了解其本地和全球模式与物理解释。然而,传统的平滑算法,如B样条,Savitzky-Golay滤波,经验模式分解(EMD),缺乏执行参数优化的能力,保证连续性。在本文中,我们提出了功能连续分解(FCD),这是一个JAX加速的框架,可以对各种数学函数进行参数化的连续优化。通过使用Levenberg-Marquardt优化来实现高达$C^1 $的连续拟合,FCD将原始时间序列数据转换为$M $模式,以捕获从短期到长期趋势的不同时间模式。FCD的应用包括物理,医学,金融分析和机器学习,通常用于分析信号的时间模式,优化参数,导数和分解积分。此外,FCD可以用于物理分析和特征提取,平均每段SRMSE为0.735,完全分解1,000个点的速度为0.47秒。最后,我们证明了一个卷积神经网络(CNN)增强FCD功能,如优化的函数值,参数和导数,实现了16.8%的收敛速度和2.5%的准确性比标准的CNN。
摘要:The analysis of non-stationary time-series data requires insight into its local and global patterns with physical interpretability. However, traditional smoothing algorithms, such as B-splines, Savitzky-Golay filtering, and Empirical Mode Decomposition (EMD), lack the ability to perform parametric optimization with guaranteed continuity. In this paper, we propose Functional Continuous Decomposition (FCD), a JAX-accelerated framework that performs parametric, continuous optimization on a wide range of mathematical functions. By using Levenberg-Marquardt optimization to achieve up to $C^1$ continuous fitting, FCD transforms raw time-series data into $M$ modes that capture different temporal patterns from short-term to long-term trends. Applications of FCD include physics, medicine, financial analysis, and machine learning, where it is commonly used for the analysis of signal temporal patterns, optimized parameters, derivatives, and integrals of decomposition. Furthermore, FCD can be applied for physical analysis and feature extraction with an average SRMSE of 0.735 per segment and a speed of 0.47s on full decomposition of 1,000 points. Finally, we demonstrate that a Convolutional Neural Network (CNN) enhanced with FCD features, such as optimized function values, parameters, and derivatives, achieved 16.8% faster convergence and 2.5% higher accuracy over a standard CNN.
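The Levenberg-Marquardt step that FCD builds on can be sketched in plain NumPy. This is not the authors' JAX implementation: the single-Gaussian mode, the synthetic signal, and the damping schedule are all assumptions for illustration. A Gaussian mode is smooth, hence $C^1$, by construction, which mirrors the continuity guarantee the abstract emphasizes.

```python
import numpy as np

# Noiseless synthetic signal: one Gaussian "mode" with a = 1.5, m = 0.4, s = 0.1.
t = np.linspace(0.0, 1.0, 200)
signal = 1.5 * np.exp(-((t - 0.4) / 0.1) ** 2)

def model(p):
    a, m, s = p
    return a * np.exp(-((t - m) / s) ** 2)

def jacobian(p, eps=1e-6):
    # Forward-difference Jacobian of the residual vector w.r.t. the params.
    r0 = model(p) - signal
    J = np.empty((t.size, p.size))
    for i in range(p.size):
        q = p.copy()
        q[i] += eps
        J[:, i] = ((model(q) - signal) - r0) / eps
    return J, r0

p = np.array([1.0, 0.5, 0.15])  # rough initial guess for (a, m, s)
lam = 1e-3                      # Levenberg-Marquardt damping
for _ in range(50):
    J, r = jacobian(p)
    # Damped normal equations: (J^T J + lam * I) step = -J^T r
    step = np.linalg.solve(J.T @ J + lam * np.eye(3), -J.T @ r)
    if np.linalg.norm(model(p + step) - signal) < np.linalg.norm(r):
        p, lam = p + step, lam * 0.5  # accept step, trust the local model more
    else:
        lam *= 10.0                   # reject step, increase damping

rmse = float(np.sqrt(np.mean((model(p) - signal) ** 2)))
print(round(rmse, 6))  # near 0: the (a, m, s) mode parameters are recovered
```

A full FCD-style decomposition would repeat such fits over $M$ modes with different time scales; this sketch only shows the damped least-squares core.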
【32】Characterizing Online and Private Learnability under Distributional Constraints via Generalized Smoothness
标题:通过广义平滑度描述分布约束下的在线和私人学习能力
链接:https://arxiv.org/abs/2602.20585
作者:Moïse Blanchard,Abhishek Shetty,Alexander Rakhlin
摘要:理解使学习和泛化成为可能的最小假设也许是学习理论的中心问题。统计学习理论中的几个著名结果,如VC定理和Littlestone对在线可学习性的描述,分别建立了允许在独立数据和对抗数据下学习的假设类条件。最近的工作桥接这些极端的基础上,我们研究顺序决策下的分布式对手,可以自适应地选择数据生成的分布从一个固定的家庭$U$,并询问当这样的问题是可学习的样本复杂性,表现得像有利的独立的情况下。我们提供了一个接近完整的表征家庭$U$承认学习的概念称为广义平滑,即一个分布家庭承认VC维依赖的遗憾界的每一个有限VC假设类,当且仅当它是广义光滑。此外,我们给出了通用算法,实现低遗憾下的任何广义光滑对手没有明确的知识$U$。最后,当$U$是已知的,我们提供了一个组合参数,破碎数,捕获多少不相交的区域可以进行非平凡的质量下$U$方面的精细界限。这些结果提供了一个几乎完整的了解分布式对手下的可学习性。此外,基于在线学习和差分隐私之间令人惊讶的联系,我们表明,广义平滑性也表征了分布约束下的私人可学习性。
摘要:Understanding minimal assumptions that enable learning and generalization is perhaps the central question of learning theory. Several celebrated results in statistical learning theory, such as the VC theorem and Littlestone's characterization of online learnability, establish conditions on the hypothesis class that allow for learning under independent data and adversarial data, respectively. Building upon recent work bridging these extremes, we study sequential decision making under distributional adversaries that can adaptively choose data-generating distributions from a fixed family $U$ and ask when such problems are learnable with sample complexity that behaves like the favorable independent case. We provide a near complete characterization of families $U$ that admit learnability in terms of a notion known as generalized smoothness i.e. a distribution family admits VC-dimension-dependent regret bounds for every finite-VC hypothesis class if and only if it is generalized smooth. Further, we give universal algorithms that achieve low regret under any generalized smooth adversary without explicit knowledge of $U$. Finally, when $U$ is known, we provide refined bounds in terms of a combinatorial parameter, the fragmentation number, that captures how many disjoint regions can carry nontrivial mass under $U$. These results provide a nearly complete understanding of learnability under distributional adversaries. In addition, building upon the surprising connection between online learning and differential privacy, we show that the generalized smoothness also characterizes private learnability under distributional constraints.
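For readers new to this setting, the classical notion that "generalized smoothness" extends is $\sigma$-smoothness from the smoothed online learning literature. The definition below is that standard background notion, not quoted from the paper, so the paper's generalization may differ in detail.

```latex
% Standard sigma-smoothness (background; the paper's "generalized
% smoothness" relaxes this notion).
A distribution $P$ on a domain $\mathcal{X}$ is \emph{$\sigma$-smooth} with
respect to a base measure $\mu$ if
\[
  P(A) \;\le\; \frac{\mu(A)}{\sigma}
  \qquad \text{for every measurable } A \subseteq \mathcal{X},
\]
equivalently $\left\| \tfrac{dP}{d\mu} \right\|_{\infty} \le \tfrac{1}{\sigma}$.
Restricting the adversary to a family $U$ of $\sigma$-smooth distributions
prevents it from concentrating mass on worst-case null sets, which is what
makes VC-dimension-dependent regret bounds possible.
```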
【33】Cross-Chirality Generalization by Axial Vectors for Hetero-Chiral Protein-Peptide Interaction Design
标题:轴向向量的跨手性泛化用于异手性蛋白-肽相互作用设计
链接:https://arxiv.org/abs/2602.20176
作者:Ziyi Yang,Zitong Tian,Yinjun Jia,Tianyi Zhang,Jiqing Zheng,Hao Wang,Yubu Su,Juncai He,Lei Liu,Yanyan Lan
备注:9 pages, 5 figures
摘要:靶向L-蛋白的D-肽结合剂具有很好的治疗潜力。尽管基于机器学习的靶向调节肽设计取得了快速进展,但生成D肽结合剂在很大程度上仍未被探索。在这项工作中,我们表明,通过注入轴向特征的$E(3)$-等变(极)矢量的功能,它是可行的,以实现交叉手征推广从同手性(L-L)的训练数据异手性(D-L)的设计任务。通过在潜在扩散模型中实施该方法,我们实现了D-肽结合剂设计,其不仅在计算机基准中优于现有工具,而且在湿实验室验证中也证明了有效性。据我们所知,我们的方法代表了第一个湿实验室验证的生成AI,用于从头设计D-肽结合剂,为蛋白质设计中处理手性提供了新的视角。
摘要:D-peptide binders targeting L-proteins have promising therapeutic potential. Despite rapid advances in machine learning-based target-conditioned peptide design, generating D-peptide binders remains largely unexplored. In this work, we show that by injecting axial features into $E(3)$-equivariant (polar) vector features, it is feasible to achieve cross-chirality generalization from homo-chiral (L-L) training data to hetero-chiral (D-L) design tasks. By implementing this method within a latent diffusion model, we achieved D-peptide binder design that not only outperforms existing tools in in silico benchmarks, but also demonstrates efficacy in wet-lab validation. To our knowledge, our approach represents the first wet-lab validated generative AI for the de novo design of D-peptide binders, offering new perspectives on handling chirality in protein design.
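Why axial features matter for chirality can be seen in a minimal numerical example (not the paper's model): under a mirror reflection $M$ a polar vector maps to $Mv$, while an axial vector, such as a cross product of two polar vectors, picks up the extra factor $\det(M) = -1$. The specific vectors below are arbitrary choices for illustration.

```python
import numpy as np

M = np.diag([1.0, 1.0, -1.0])   # mirror reflection through the xy-plane
u = np.array([1.0, 2.0, 3.0])   # two arbitrary polar vectors
w = np.array([-0.5, 1.0, 0.0])

a = np.cross(u, w)              # an axial (pseudo)vector built from u, w

polar_rule = M @ a                       # how a polar feature would transform
axial_rule = np.linalg.det(M) * (M @ a)  # how an axial feature transforms
true_image = np.cross(M @ u, M @ w)      # cross product of reflected inputs

# Only the axial rule reproduces the true mirror image of the cross product.
print(np.allclose(true_image, axial_rule),
      np.allclose(true_image, polar_rule))  # True False
```

An equivariant model carrying only polar vector features treats a structure and its mirror image identically; axial features break exactly this degeneracy, which is what makes generalization from L-L training data to D-L design plausible.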
机器翻译由腾讯交互翻译提供,仅供参考