Click "Read the original" to visit arxivdaily.com, covering CS, Physics, Math, Economics, Statistics, Finance, Biology, and Electrical Engineering, with search, bookmarking, and other features!
cs.LG: 246 papers today
Large-model related (29 papers)
【1】Optimizing LLM Prompt Engineering with DSPy Based Declarative Learning
Link: https://arxiv.org/abs/2604.04869
Authors: Shiek Ruksana, Sailesh Kiran Kurra, Thipparthi Sanjay Baradwaj
Comments: Best Paper Award, IEEE International Conference on Emerging Smart Computing and Informatics (ESCI), Pune, India, Mar 11-13, 2026
Abstract: Large Language Models (LLMs) have shown strong performance across a wide range of natural language processing tasks; however, their effectiveness is highly dependent on prompt design, structure, and embedded reasoning signals. Conventional prompt engineering methods largely rely on heuristic trial-and-error processes, which limits scalability, reproducibility, and generalization across tasks. DSPy, a declarative framework for optimizing text-processing pipelines, offers an alternative approach by enabling automated, modular, and learnable prompt construction for LLM-based systems. This paper presents a systematic study of DSPy-based declarative learning for prompt optimization, with emphasis on prompt synthesis, correction, calibration, and adaptive reasoning control. We introduce a unified DSPy LLM architecture that combines symbolic planning, gradient-free optimization, and automated module rewriting to reduce hallucinations, improve factual grounding, and avoid unnecessary prompt complexity. Experimental evaluations conducted on reasoning tasks, retrieval-augmented generation, and multi-step chain-of-thought benchmarks demonstrate consistent gains in output reliability, efficiency, and generalization across models. The results show improvements of 30-45% in factual accuracy and a reduction of approximately 25% in hallucination rates. Finally, we outline key limitations and discuss future research directions for declarative prompt optimization frameworks.
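The gradient-free prompt optimization the abstract describes can be illustrated with a minimal, library-free sketch: candidate prompt templates are scored on a validation signal and the best one is kept, with no gradients involved. The templates and the toy scoring heuristic below are hypothetical stand-ins, not the actual DSPy optimizer.

```python
# Minimal sketch of gradient-free prompt optimization: score candidate
# prompt templates and keep the best one. The templates and the toy
# metric are illustrative stand-ins for a real DSPy pipeline.

def score(template: str, examples: list[tuple[str, str]]) -> float:
    """Toy metric: reward templates that include reasoning cues."""
    cues = ("step by step", "reasoning", "evidence")
    bonus = sum(cue in template.lower() for cue in cues)
    return bonus / len(cues)

def optimize_prompt(candidates: list[str], examples) -> str:
    """Gradient-free search: pick the highest-scoring candidate."""
    return max(candidates, key=lambda t: score(t, examples))

candidates = [
    "Answer the question: {q}",
    "Think step by step, cite evidence, then answer: {q}",
]
best = optimize_prompt(candidates, examples=[("2+2?", "4")])
print(best)  # the reasoning-cue template wins under this toy metric
```

A real declarative pipeline would replace the toy metric with task accuracy on held-out examples and search over module compositions as well as templates.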
【2】HUKUKBERT: Domain-Specific Language Model for Turkish Law
Link: https://arxiv.org/abs/2604.04790
Authors: Mehmet Utku Öztürk, Tansu Türkoğlu, Buse Buz-Yalug
Comments: 15 pages
Abstract: Recent advances in natural language processing (NLP) have increasingly enabled LegalTech applications, yet existing studies specific to Turkish law remain limited due to the scarcity of domain-specific data and models. Although extensive models like LEGAL-BERT have been developed for English legal texts, the Turkish legal domain lacks a domain-specific, high-capacity counterpart. In this paper, we introduce HukukBERT, the most comprehensive legal language model for Turkish, trained on an 18 GB cleaned legal corpus using a hybrid Domain-Adaptive Pre-Training (DAPT) methodology integrating Whole-Word Masking, Token Span Masking, Word Span Masking, and targeted Keyword Masking. We systematically compare our 48K WordPiece tokenizer and DAPT approach against general-purpose and existing domain-specific Turkish models. Evaluated on a novel Legal Cloze Test benchmark (a masked legal term prediction task designed for Turkish court decisions), HukukBERT achieves state-of-the-art performance with 84.40% Top-1 accuracy, substantially outperforming existing models. Furthermore, we evaluate HukukBERT on the downstream task of structural segmentation of official Turkish court decisions, where it achieves a 92.8% document pass rate, establishing a new state of the art. We release HukukBERT to support future research in Turkish legal NLP tasks, including named entity recognition, judgment prediction, and legal document classification.
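The Whole-Word Masking component of the hybrid DAPT recipe can be sketched in a few lines: WordPiece continuation tokens (prefixed "##") are masked together with their head piece, so a word is never partially masked. The token strings below are illustrative, not drawn from the HukukBERT tokenizer.

```python
import random

# Sketch of Whole-Word Masking over WordPiece-style tokens:
# continuation pieces ("##...") are masked together with their head
# piece, so masking is all-or-nothing per word. Tokens are illustrative.
def whole_word_mask(tokens, mask_prob=0.5, seed=0):
    rng = random.Random(seed)
    words, cur = [], []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and cur:
            cur.append(i)          # continuation piece joins current word
        else:
            if cur:
                words.append(cur)
            cur = [i]              # start a new word group
    if cur:
        words.append(cur)
    out = list(tokens)
    for word in words:
        if rng.random() < mask_prob:
            for i in word:
                out[i] = "[MASK]"  # mask every piece of the chosen word
    return out

tokens = ["huk", "##uk", "mahkeme", "##si", "karar"]
masked = whole_word_mask(tokens, mask_prob=1.0)
print(masked)  # every word fully masked
```

The paper's span- and keyword-masking variants differ only in how the index groups are chosen (contiguous spans, or groups whose surface form matches a legal-term list).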
【3】Darkness Visible: Reading the Exception Handler of a Language Model
Link: https://arxiv.org/abs/2604.04756
Authors: Peter Balogh
Abstract: The final MLP of GPT-2 Small exhibits a fully legible routing program: 27 named neurons organized into a three-tier exception handler, while the knowledge it routes remains entangled across roughly 3,040 residual neurons. We decompose all 3,072 neurons (to numerical precision) into 5 fused Core neurons that reset the vocabulary toward function words, 10 Differentiators that suppress wrong candidates, 5 Specialists that detect structural boundaries, and 7 Consensus neurons that each monitor a distinct linguistic dimension. The consensus-exception crossover, where MLP intervention shifts from helpful to harmful, is statistically sharp (bootstrap 95% CIs exclude zero at all consensus levels; the crossover lies between 4/7 and 5/7). Three experiments show that "knowledge neurons" (Dai et al., 2022), at layer 11 of this model, function as routing infrastructure rather than fact storage: the MLP amplifies or suppresses signals already present in the residual stream from attention, scaling with contextual constraint. A garden-path experiment reveals a reversed garden-path effect: GPT-2 uses verb subcategorization immediately, consistent with the exception handler operating at token-level predictability rather than syntactic structure. This architecture crystallizes only at the terminal layer; in deeper models, we predict equivalent structure at the final layer, not at layer 11. Code and data: https://github.com/pbalogh/transparent-gpt2
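The bootstrap 95% confidence intervals that certify the consensus-exception crossover can be computed with a standard percentile bootstrap, sketched here on illustrative effect sizes (the paper's actual per-prompt effects are not reproduced).

```python
import random

# Percentile-bootstrap 95% CI for a mean intervention effect: resample
# with replacement, collect resampled means, take the 2.5th/97.5th
# percentiles. An interval excluding zero indicates a reliable effect.
def bootstrap_ci(effects, n_boot=2000, seed=0):
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = [rng.choice(effects) for _ in effects]
        means.append(sum(sample) / len(sample))
    means.sort()
    return means[int(0.025 * n_boot)], means[int(0.975 * n_boot)]

effects = [0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.2, 0.95]  # illustrative
lo, hi = bootstrap_ci(effects)
print(lo, hi)  # here the interval excludes zero
```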
【4】One Model for All: Multi-Objective Controllable Language Models
Link: https://arxiv.org/abs/2604.04497
Authors: Qiang He, Yucheng Yang, Tianyi Zhou, Meng Fang, Mykola Pechenizkiy, Setareh Maghsudi
Comments: Published in Transactions on Machine Learning Research (03/2026): https://openreview.net/forum?id=qAM5PmvFYY
Abstract: Aligning large language models (LLMs) with human preferences is critical for enhancing LLMs' safety, helpfulness, humor, faithfulness, etc. Current reinforcement learning from human feedback (RLHF) mainly focuses on a fixed reward learned from average human ratings, which may weaken the adaptability and controllability across varying preferences. However, creating personalized LLMs requires aligning LLMs with individual human preferences, which is non-trivial due to the scarce data per user and the diversity of user preferences in multi-objective trade-offs, varying from emphasizing empathy in certain contexts to demanding efficiency and precision in others. Can we train one LLM to produce personalized outputs across different user preferences on the Pareto front? In this paper, we introduce Multi-Objective Control (MOC), which trains a single LLM to directly generate responses in the preference-defined regions of the Pareto front. Our approach introduces multi-objective optimization (MOO) principles into RLHF to train an LLM as a preference-conditioned policy network. We improve the computational efficiency of MOC by applying MOO at the policy level, enabling us to fine-tune a 7B-parameter model on a single A6000 GPU. Extensive experiments demonstrate the advantages of MOC over baselines in three aspects: (i) controllability of LLM outputs with respect to user preferences on the trade-off among multiple rewards; (ii) quality and diversity of LLM outputs, measured by the hypervolume of the solutions achieved; and (iii) generalization to unseen preferences. These results highlight MOC's potential for real-world applications requiring scalable and customizable LLMs.
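The two multi-objective ideas the abstract relies on, preference-weighted scalarization of rewards and the Pareto front of non-dominated responses, can be sketched directly. The reward values and preference vector below are illustrative, not the paper's.

```python
# Sketch of the MOO primitives behind MOC: a preference vector
# scalarizes several reward objectives, and only non-dominated
# (Pareto-optimal) responses are kept. Values are illustrative.

def scalarize(rewards, prefs):
    """Preference-weighted reward for one candidate response."""
    return sum(r * w for r, w in zip(rewards, prefs))

def pareto_front(points):
    """Keep points not dominated by any other (higher is better)."""
    def dominates(a, b):
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# (helpfulness, safety) rewards of four candidate responses
points = [(0.9, 0.2), (0.6, 0.7), (0.3, 0.9), (0.5, 0.5)]
front = pareto_front(points)
print(front)  # (0.5, 0.5) is dominated by (0.6, 0.7) and drops out
print(scalarize((0.6, 0.7), prefs=(0.3, 0.7)))  # a safety-leaning user
```

A preference-conditioned policy, as in MOC, amounts to feeding `prefs` to the model so a single network can target any region of this front.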
【5】SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models
Link: https://arxiv.org/abs/2604.04493
Authors: Ziwei Li, Yuang Ma, Yi Kang
Abstract: The rapid growth of large language models (LLMs) presents significant deployment challenges due to their massive computational and memory demands. While model compression, such as network pruning, offers potential solutions, most existing methods fail to maintain good performance at high compression ratios. To address this, we propose SLaB, a novel framework that decomposes each linear layer's weights into three complementary components: a sparse matrix, a low-rank matrix, and a binary matrix. SLaB eliminates the need for retraining and leverages activation-aware pruning scores to guide the decomposition process. Experiments on Llama-family models demonstrate that SLaB achieves state-of-the-art performance, reducing perplexity by up to 36% compared to existing methods at 50% compression and improving accuracy by up to 8.98% over the baseline on zero-shot tasks.
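The decomposition idea can be illustrated on a tiny matrix: quantize to a scaled-binary part, then spend a small sparse budget on the largest residual errors. This toy sketch omits the low-rank term and the activation-aware scores for brevity; it is not the SLaB algorithm, only the W ≈ S + B intuition behind it.

```python
# Toy sketch of sparse + binary weight decomposition (low-rank term
# omitted): B is a scaled sign matrix, S corrects the k worst
# quantization errors. Matrix values are illustrative.

def decompose(W, k):
    n = sum(len(row) for row in W)
    alpha = sum(abs(x) for row in W for x in row) / n  # binary scale
    B = [[alpha if x >= 0 else -alpha for x in row] for row in W]
    R = [[x - b for x, b in zip(rw, rb)] for rw, rb in zip(W, B)]
    # sparse correction: keep the k largest-magnitude residual entries
    flat = sorted(((abs(R[i][j]), i, j) for i in range(len(W))
                   for j in range(len(W[0]))), reverse=True)
    S = [[0.0] * len(W[0]) for _ in W]
    for _, i, j in flat[:k]:
        S[i][j] = R[i][j]
    return B, S

def frob_err(W, A):
    return sum((w - a) ** 2 for rw, ra in zip(W, A)
               for w, a in zip(rw, ra)) ** 0.5

W = [[1.2, -0.1], [-0.9, 2.5]]
B, S = decompose(W, k=1)
approx = [[b + s for b, s in zip(rb, rs)] for rb, rs in zip(B, S)]
print(frob_err(W, B), frob_err(W, approx))  # sparse term cuts the error
```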
【6】A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models
Link: https://arxiv.org/abs/2604.04488
Authors: Tianmeng Fang, Yong Wang, Zetai Kong, Zengzhen Su, Jun Wang, Chengjin Yu, Wei Wang
Comments: 26 pages, 3 figures. Subjects: Machine Learning (cs.LG)
Abstract: Multimodal large language models have become important infrastructure for the unified processing of visual and linguistic tasks. However, such models are highly susceptible to backdoor implantation during supervised fine-tuning and will steadily output the attacker's predefined harmful responses once a specific trigger pattern is activated. The core challenge of backdoor defense lies in suppressing attack success under low poisoning ratios while preserving the model's normal generation ability. These two objectives are inherently conflicting: strong suppression often degrades benign performance, whereas weak regularization fails to mitigate backdoor behaviors. To this end, we propose a unified defense framework based on patch augmentation and cross-view regularization, which simultaneously constrains the model's anomalous behaviors in response to triggered patterns at both the feature-representation and output-distribution levels. Specifically, patch-level data augmentation is combined with cross-view output difference regularization to exploit the fact that backdoor responses are abnormally invariant to non-semantic perturbations and to proactively pull apart the output distributions of the original and perturbed views, thereby significantly suppressing the success rate of backdoor triggering. At the same time, we avoid over-suppression of the model during defense by imposing output entropy constraints, ensuring the quality of normal command generation. Experimental results across three models, two tasks, and six attacks show that our proposed defense method effectively reduces the attack success rate while maintaining a high level of normal text generation capability. Our work enables the secure, controlled deployment of large-scale multimodal models in realistic low-frequency poisoning and covert triggering scenarios.
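The cross-view signal the defense exploits can be sketched with a KL divergence between the output distributions of an original and a patch-perturbed view: benign outputs shift under perturbation, while backdoored outputs are abnormally invariant (near-zero KL). The distributions below are illustrative.

```python
import math

# Sketch of the cross-view divergence cue: compare output distributions
# of an original and a patch-perturbed view. Abnormally low divergence
# is the backdoor signature the defense pulls apart. Values illustrative.
def kl(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

benign_orig, benign_pert = [0.7, 0.2, 0.1], [0.5, 0.3, 0.2]
trigger_orig, trigger_pert = [0.98, 0.01, 0.01], [0.97, 0.02, 0.01]

print(kl(benign_orig, benign_pert))    # benign: output shifts
print(kl(trigger_orig, trigger_pert))  # triggered: nearly invariant
```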
【7】DP-OPD: Differentially Private On-Policy Distillation for Language Models
Link: https://arxiv.org/abs/2604.04461
Authors: Fatemeh Khadem, Sajad Mousavi, Yi Fang, Yuhong Liu
Abstract: Large language models (LLMs) are increasingly adapted to proprietary and domain-specific corpora that contain sensitive information, creating a tension between formal privacy guarantees and efficient deployment through model compression. Differential privacy (DP), typically enforced via DP-SGD, provides record-level protection but often incurs substantial utility loss in autoregressive generation, where optimization noise can amplify exposure bias and compounding errors along long rollouts. Existing approaches to private distillation either apply DP-SGD to both teacher and student, worsening computation and the privacy-utility tradeoff, or rely on DP synthetic text generation from a DP-trained teacher, avoiding DP on the student at the cost of DP-optimizing a large teacher and introducing an offline generation pipeline. We propose Differentially Private On-Policy Distillation (DP-OPD), a synthesis-free framework that enforces privacy solely through DP-SGD on the student while leveraging a frozen teacher to provide dense token-level targets on student-generated trajectories. DP-OPD instantiates this idea via private generalized knowledge distillation on continuation tokens. Under a strict privacy budget (epsilon = 2.0), DP-OPD improves perplexity over DP fine-tuning and off-policy DP distillation, and outperforms synthesis-based DP distillation (Yelp: 44.15 -> 41.68; BigPatent: 32.43 -> 30.63), while substantially simplifying the training pipeline. In particular, DP-OPD collapses private compression into a single DP student-training loop by eliminating DP teacher training and offline synthetic text generation. Code will be released upon publication at https://github.com/khademfatemeh/dp_opd.
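The DP-SGD step that DP-OPD applies to the student follows the standard recipe: clip each per-example gradient to an L2 bound, average, and add Gaussian noise calibrated to the clipping bound. A minimal stdlib sketch, with illustrative gradient values:

```python
import random

# Sketch of one DP-SGD update: per-example L2 clipping, averaging, and
# Gaussian noise scaled by clip_norm. Gradients are illustrative.
def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_mult=1.0, seed=0):
    rng = random.Random(seed)
    clipped = []
    for g in per_example_grads:
        norm = sum(x * x for x in g) ** 0.5
        scale = min(1.0, clip_norm / norm)      # clip to L2 <= clip_norm
        clipped.append([x * scale for x in g])
    n = len(clipped)
    avg = [sum(col) / n for col in zip(*clipped)]
    sigma = noise_mult * clip_norm / n          # noise calibrated to clip
    return [x + rng.gauss(0.0, sigma) for x in avg]

grads = [[3.0, 4.0], [0.3, 0.4]]  # per-example gradients, norms 5.0 / 0.5
noisy = dp_sgd_step(grads)
print(noisy)  # each clipped gradient had norm <= 1.0 before noising
```

In DP-OPD, the loss producing these gradients is the teacher's token-level distillation target on student-generated continuations; only this student update touches private data.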
【8】How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models
Link: https://arxiv.org/abs/2604.04385
Authors: Gregory N. Frank
Abstract: We identify a recurring sparse routing mechanism in alignment-trained language models: a gate attention head reads detected content and triggers downstream amplifier heads that boost the signal toward refusal. Using political censorship and safety refusal as natural experiments, we trace this mechanism across 9 models from 6 labs, all validated on corpora of 120 prompt pairs. The gate head passes necessity and sufficiency interchange tests (p < 0.001, permutation null), and core amplifier heads are stable under bootstrap resampling (Jaccard 0.92-1.0). Three same-generation scaling pairs show that routing distributes at scale (ablation up to 17x weaker) while remaining detectable by interchange. By modulating the detection-layer signal, we continuously control policy strength from hard refusal through steering to factual compliance, with routing thresholds that vary by topic. The circuit also reveals a structural separation between intent recognition and policy routing: under cipher encoding, the gate head's routing contribution collapses (78% in Phi-4 at n=120) while the model responds with puzzle-solving rather than refusal. The routing mechanism never fires, even though probe scores at deeper layers indicate the model begins to represent the harmful content. This asymmetry is consistent with different robustness properties of pretraining and post-training: broad semantic understanding versus narrower policy binding that generalizes less well under input transformation.
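The "p < 0.001, permutation null" claim refers to a standard permutation test: shuffle condition labels many times and count how often the shuffled group difference matches or exceeds the observed one. A stdlib sketch, with illustrative refusal-rate scores:

```python
import random

# One-sided permutation test: p-value is the fraction of label shuffles
# whose group-mean difference >= the observed difference. Scores are
# illustrative stand-ins for per-prompt refusal rates.
def permutation_p(group_a, group_b, n_perm=5000, seed=0):
    rng = random.Random(seed)
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = group_a + group_b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if sum(a) / len(a) - sum(b) / len(b) >= observed:
            hits += 1
    return hits / n_perm

gated = [0.9, 0.85, 0.95, 0.9, 0.88, 0.92]    # gate head intact
ablated = [0.2, 0.15, 0.25, 0.1, 0.22, 0.18]  # gate head ablated
print(permutation_p(gated, ablated))  # near 0: unlikely under the null
```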
【9】CPT: Controllable and Editable Design Variations with Language Models
Link: https://arxiv.org/abs/2604.04380
Authors: Karthik Suresh, Amine Ben Khalifa, Li Zhang, Wei-ting Hsu, Fangzheng Wu, Vinay More, Asim Kadav
Comments: 18 pages, 6 figures. Accepted at the NeurIPS 2025 Workshop on Generative and Protective AI for Content Creation (GenProCC 2025)
Abstract: Producing visually diverse, high-quality designs remains a manual, time-consuming process, limiting scalability and personalization in creative workflows. We present a system for generating editable design variations using a decoder-only language model, the Creative Pre-trained Transformer (CPT), trained to predict visual style attributes in design templates. At the core of our approach is a new representation called Creative Markup Language (CML), a compact, machine-learning-friendly format that captures canvas-level structure, page layout, and element-level details (text, images, and vector graphics), including both content and style. We fine-tune CPT on a large corpus of design templates authored by professional designers, enabling it to learn meaningful, context-aware predictions for attributes such as color schemes and font choices. The model produces semantically structured and stylistically coherent outputs, preserving internal consistency across elements. Unlike generative image models, our system yields fully editable design documents rather than pixel-only images, allowing users to iterate and personalize within a design editor. In experiments, our approach generates contextual color and font variations for existing templates and shows promise in adjusting layouts while maintaining design principles.
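The paper does not publish the CML schema, but the ingredients it lists (canvas-level structure, page layout, element-level content and style) suggest a shape like the hypothetical sketch below. The field names and values are invented for illustration; the point is that a style-variation model rewrites only the style fields, leaving content and structure editable.

```python
import json

# Hypothetical sketch of a CML-like editable design document: canvas
# structure, page layout, and per-element content + style. The schema
# and values are invented; CML's actual format is not public.
design = {
    "canvas": {"width": 1080, "height": 1080},
    "pages": [{
        "layout": "title-over-image",
        "elements": [
            {"type": "text", "content": "Summer Sale",
             "style": {"font": "Montserrat", "color": "#1A3C5E"}},
            {"type": "image", "src": "hero.png",
             "style": {"opacity": 1.0}},
        ],
    }],
}

# A style-variation model would rewrite only "style" fields, keeping
# content and structure untouched, so the result stays editable.
variant = json.loads(json.dumps(design))  # deep copy via round-trip
variant["pages"][0]["elements"][0]["style"]["color"] = "#B3541E"
print(variant["pages"][0]["elements"][0]["style"]["color"])
```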
【10】Decocted Experience Improves Test-Time Inference in LLM Agents
Link: https://arxiv.org/abs/2604.04373
Authors: Maohao Shen, Kaiwen Zha, Zexue He, Zhang-Wei Hong, Siru Ouyang, J. Jon Ryu, Prasanna Sattigeri, Suhas Diggavi, Gregory Wornell
Abstract: There is growing interest in improving LLMs without updating model parameters. One well-established direction is test-time scaling, where increased inference-time computation (e.g., longer reasoning, sampling, or search) is used to improve performance. However, for complex reasoning and agentic tasks, naively scaling test-time compute can substantially increase cost and still lead to wasted budget on suboptimal exploration. In this paper, we explore context as a complementary scaling axis for improving LLM performance, and systematically study how to construct better inputs that guide reasoning through experience. We show that effective context construction critically depends on decocted experience. We present a detailed analysis of experience-augmented agents, studying how to derive context from experience, how performance scales with accumulated experience, what characterizes good context, and which data structures best support context construction. We identify decocted experience as a key mechanism for effective context construction: extracting essence from experience, organizing it coherently, and retrieving salient information to build effective context. We validate our findings across reasoning and agentic tasks, including math reasoning, web browsing, and software engineering.
【11】REAM: Merging Improves Pruning of Experts in LLMs
Link: https://arxiv.org/abs/2604.04356
Authors: Saurav Jha, Maryam Hashemzadeh, Ali Saheb Pasand, Ali Parviz, Min-Joong Lee, Boris Knyazev
Comments: Code is at https://github.com/SamsungSAILMontreal/ream
Abstract: Mixture-of-Experts (MoE) large language models (LLMs) are among the top-performing architectures. The largest models, often with hundreds of billions of parameters, pose significant memory challenges for deployment. Traditional approaches to reduce memory requirements include weight pruning and quantization. Motivated by Router-weighted Expert Activation Pruning (REAP), which prunes experts, we propose a novel method, Router-weighted Expert Activation Merging (REAM). Instead of removing experts, REAM groups them and merges their weights, better preserving original performance. We evaluate REAM against REAP and other baselines across multiple MoE LLMs on diverse multiple-choice (MC) question answering and generative (GEN) benchmarks. Our results reveal a trade-off between MC and GEN performance that depends on the mix of calibration data. By controlling the mix of general, math, and coding data, we examine the Pareto frontier of this trade-off and show that REAM often outperforms the baselines and in many cases is comparable to the original uncompressed models.
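The core merge operation can be sketched as a router-mass-weighted average of expert weights within a group: instead of deleting a pruned expert, its parameters are folded into the merged expert in proportion to how much routing probability it received. The vectors and masses below are illustrative.

```python
# Sketch of router-weighted expert merging: experts in a group are
# averaged with weights proportional to accumulated router probability,
# rather than being deleted outright. Values are illustrative.
def merge_experts(experts, router_mass):
    """experts: weight vectors; router_mass: routing importance each."""
    total = sum(router_mass)
    dim = len(experts[0])
    return [sum(w * e[i] for w, e in zip(router_mass, experts)) / total
            for i in range(dim)]

group = [[1.0, 0.0], [0.0, 1.0]]   # two experts in one merge group
mass = [3.0, 1.0]                  # expert 0 got 3x the routing mass
merged = merge_experts(group, mass)
print(merged)  # [0.75, 0.25]: dominated by the heavily-routed expert
```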
【12】High-Stakes Personalization: Rethinking LLM Customization for Individual Investor Decision-Making
Link: https://arxiv.org/abs/2604.04300
Authors: Yash Ganpat Sawant
Comments: 4 pages + 1 page references. Submitted to the CustomNLP4U Workshop @ ACL 2026
Abstract: Personalized LLM systems have advanced rapidly, yet most operate in domains where user preferences are stable and ground truth is either absent or subjective. We argue that individual investor decision-making presents a uniquely challenging domain for LLM personalization, one that exposes fundamental limitations in current customization paradigms. Drawing on our system, built and deployed for AI-augmented portfolio management, we identify four axes along which individual investing exposes fundamental limitations of standard LLM customization: (1) behavioral memory complexity, where investor patterns are temporally evolving, self-contradictory, and financially consequential; (2) thesis consistency under drift, where maintaining a coherent investment rationale over weeks or months strains stateless and session-bounded architectures; (3) style-signal tension, where the system must simultaneously respect personal investment philosophy and surface objective evidence that may contradict it; and (4) alignment without ground truth, where personalization quality cannot be evaluated against a fixed label set because outcomes are stochastic and delayed. We describe the architectural responses that emerged from building the system and propose open research directions for personalized NLP in high-stakes, temporally extended decision domains.
【13】APPA: Adaptive Preference Pluralistic Alignment for Fair Federated RLHF of LLMs
Link: https://arxiv.org/abs/2604.04261
Authors: Mahmoud Srewa, Tianyu Zhao, Salma Elmalaki
Abstract: Aligning large language models (LLMs) with diverse human preferences requires pluralistic alignment, where a single model must respect the values of multiple distinct groups simultaneously. In federated reinforcement learning from human feedback (FedRLHF), these groups align a shared policy without centralizing preference data, which makes fair reward aggregation essential. Existing aggregation methods exhibit clear trade-offs: average-based aggregation systematically under-aligns worst-performing groups, while min aggregation prioritizes worst-group performance at the cost of overall alignment. We propose APPA, an Adaptive Preference Pluralistic Alignment framework that dynamically reweights group-level rewards based on historical alignment rewards. Our approach prioritizes under-aligned groups without degrading well-aligned ones, while requiring no access to raw preference data. Integrated into a proximal policy optimization (PPO)-based FedRLHF pipeline and evaluated on GLOBALQA and OQA across three model families (Gemma 2 2B, Llama 3.2 3B, Qwen3 0.6B), APPA achieves strong fairness-alignment trade-offs, improving worst-group alignment by up to 28% over average aggregation while maintaining higher overall alignment than min aggregation across most configurations.
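The adaptive reweighting idea can be sketched with a softmax over negative historical alignment: groups whose alignment has lagged get larger weights in the aggregated reward, interpolating between plain averaging (high temperature) and min-style worst-group focus (low temperature). The scores and the softmax form are illustrative, not APPA's exact rule.

```python
import math

# Sketch of adaptive group reweighting: a softmax over negative
# historical alignment gives under-aligned groups larger weights in the
# aggregated reward. Scores and temperature are illustrative.
def adaptive_weights(hist_alignment, temp=0.5):
    logits = [-a / temp for a in hist_alignment]
    m = max(logits)                       # numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

hist = [0.9, 0.6, 0.3]  # group 2 is worst-aligned so far
w = adaptive_weights(hist)
print(w)  # weights increase as historical alignment decreases
```

Lowering `temp` pushes the weighting toward min aggregation; raising it recovers the uniform average, which matches the trade-off the abstract describes.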
【14】Combee: Scaling Prompt Learning for Self-Improving Language Model Agents
Link: https://arxiv.org/abs/2604.04247
Authors: Hanchen Li, Runyuan He, Qizheng Zhang, Changxiu Ji, Qiuyang Mang, Xiaokun Chen, Lakshya A Agrawal, Wei-Liang Liao, Eric Yang, Alvin Cheung, James Zou, Kunle Olukotun, Ion Stoica, Joseph E. Gonzalez
Abstract: Recent advances in prompt learning allow large language model agents to acquire task-relevant knowledge from inference-time context without parameter changes. For example, existing methods (like ACE or GEPA) can learn system prompts to improve accuracy based on previous agent runs. However, these methods primarily focus on single-agent or low-parallelism settings. This fundamentally limits their ability to efficiently learn from a large set of collected agentic traces. It would be efficient and beneficial to run prompt learning in parallel to accommodate the growing trend of learning from many agentic traces or parallel agent executions. Yet without a principled strategy for scaling, current methods suffer from quality degradation at high parallelism. To improve both the efficiency and quality of prompt learning, we propose Combee, a novel framework to scale parallel prompt learning for self-improving agents. Combee speeds up learning and enables running many agents in parallel while learning from their aggregate traces without quality degradation. To achieve this, Combee leverages parallel scans and employs an augmented shuffle mechanism; Combee also introduces a dynamic batch size controller to balance quality and delay. Evaluations on AppWorld, Terminal-Bench, Formula, and FiNER demonstrate that Combee achieves up to 17x speedup over previous methods with comparable or better accuracy and equivalent cost.
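A dynamic batch-size controller of the kind the abstract mentions can be sketched as an AIMD-style policy: grow the parallel batch while learned-prompt quality holds, shrink multiplicatively when it degrades. The specific policy below is an illustrative stand-in, not Combee's actual controller.

```python
# Sketch of a dynamic batch-size controller balancing quality and delay:
# additive increase while quality holds, multiplicative decrease on
# degradation (AIMD-style). The policy is an illustrative stand-in.
class BatchController:
    def __init__(self, batch=4, max_batch=64):
        self.batch, self.max_batch = batch, max_batch

    def update(self, quality_ok: bool) -> int:
        if quality_ok:
            self.batch = min(self.batch + 2, self.max_batch)  # grow
        else:
            self.batch = max(self.batch // 2, 1)              # back off
        return self.batch

ctrl = BatchController()
sizes = [ctrl.update(ok) for ok in [True, True, True, False, True]]
print(sizes)  # [6, 8, 10, 5, 7]
```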
【15】Which English Do LLMs Prefer? Triangulating Structural Bias Towards American English in Foundation Models
Link: https://arxiv.org/abs/2604.04204
Authors: Mir Tafseer Nayeem, Davood Rafiei
Comments: Preprint
Abstract: Large language models (LLMs) are increasingly deployed in high-stakes domains, yet they expose only limited language settings, most notably "English (US)," despite the global diversity and colonial history of English. Through a postcolonial framing to explain the broader significance, we investigate how geopolitical histories of data curation, digital dominance, and linguistic standardization shape the LLM development pipeline. Focusing on two dominant standard varieties, American English (AmE) and British English (BrE), we construct a curated corpus of 1,813 AmE-BrE variants and introduce DiAlign, a dynamic, training-free method for estimating dialectal alignment using distributional evidence. We operationalize structural bias by triangulating evidence across three stages: (i) audits of six major pretraining corpora reveal systematic skew toward AmE, (ii) tokenizer analyses show that BrE forms incur higher segmentation costs, and (iii) generative evaluations show a persistent AmE preference in model outputs. To our knowledge, this is the first systematic and multi-faceted examination of dialectal asymmetries in standard English varieties across the phases of LLM development. We find that contemporary LLMs privilege AmE as the de facto norm, raising concerns about linguistic homogenization, epistemic injustice, and inequity in global AI deployment, while motivating practical steps toward more dialectally inclusive language technologies.
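The tokenizer segmentation-cost finding (stage ii) can be demonstrated with a toy greedy longest-match segmenter over a vocabulary skewed toward American spellings: the British form splits into more pieces. The vocabulary and segmenter are illustrative, not the audited tokenizers.

```python
# Sketch of the segmentation-cost audit: with an AmE-skewed subword
# vocabulary, BrE spellings split into more pieces under greedy
# longest-match segmentation. Vocabulary is illustrative.
def segment(word, vocab):
    """Greedy longest-match subword segmentation (single-char fallback)."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces

vocab = {"color", "flavor", "our", "col", "flav"}  # AmE-skewed vocabulary
ame = segment("color", vocab)
bre = segment("colour", vocab)
print(ame, bre)  # the BrE form pays a higher segmentation cost
```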
【16】Embedding Enhancement via Fine-Tuned Language Models for Learner-Item Cognitive Modeling
Link: https://arxiv.org/abs/2604.04088
Authors: Yuanhao Liu, Zihan Zhou, Kaiying Wu, Shuo Liu, Yiyang Huang, Jiajun Guo, Aimin Zhou, Hong Qian
Comments: Accepted by The ACM Web Conference 2026 (WWW '26)
摘要:学习者项目认知建模在基于Web的在线智能教育系统中发挥着核心作用,它可以在不同的在线教育场景中实现认知诊断。虽然ID嵌入仍然是认知建模的主流方法,由于其有效性和灵活性,语言模型(LM)的最新进展,引入了新的可能性,将丰富的语义表示,以提高CD性能。这突出表明,需要全面分析LM如何通过跨主流CD任务的语义集成来增强嵌入。本文确定了在现有工作中充分利用LM的两个关键挑战:LM和CD模型的训练目标之间的不一致在特征空间中产生了分布差距;统一的框架对于在不同的CD任务中集成文本嵌入,同时保留现有认知建模范式的优势,以确保嵌入增强的鲁棒性至关重要。为了解决这些挑战,本文介绍了EduEmbed,一个统一的嵌入增强框架,利用微调的LM,以丰富不同的CD任务的学习者项目认知建模。EduEmbed分两个阶段运作。在第一阶段,我们基于特定于角色的表示和交互诊断器来微调LM,以弥合CD模型的语义差距。在第二阶段,我们采用了文本适配器提取任务相关的语义,并将它们与现有的建模范式,以提高泛化。我们在四个CD任务和计算机自适应测试(CAT)任务上评估了拟议的框架,取得了稳健的性能。进一步的分析揭示了语义信息在不同任务中的影响,为未来在线智能教育系统中CD中LM的应用研究提供了关键见解。
摘要:Learner-item cognitive modeling plays a central role in the web-based online intelligent education system by enabling cognitive diagnosis (CD) across diverse online educational scenarios. Although ID embedding remains the mainstream approach in cognitive modeling due to its effectiveness and flexibility, recent advances in language models (LMs) have introduced new possibilities for incorporating rich semantic representations to enhance CD performance. This highlights the need for a comprehensive analysis of how LMs enhance embeddings through semantic integration across mainstream CD tasks. This paper identifies two key challenges in fully leveraging LMs in existing work: misalignment between the training objectives of LMs and CD models creates a distribution gap in feature spaces, and a unified framework is essential for integrating textual embeddings across varied CD tasks while preserving the strengths of existing cognitive modeling paradigms to ensure the robustness of embedding enhancement. To address these challenges, this paper introduces EduEmbed, a unified embedding enhancement framework that leverages fine-tuned LMs to enrich learner-item cognitive modeling across diverse CD tasks. EduEmbed operates in two stages. In the first stage, we fine-tune LMs based on role-specific representations and an interaction diagnoser to bridge the semantic gap of CD models. In the second stage, we employ a textual adapter to extract task-relevant semantics and integrate them with existing modeling paradigms to improve generalization. We evaluate the proposed framework on four CD tasks and a computerized adaptive testing (CAT) task, achieving robust performance. Further analysis reveals the impact of semantic information across diverse tasks, offering key insights for future research on the application of LMs in CD for online intelligent education systems.
【17】Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in Large Language Models
标题:揭开幻觉:大型语言模型中事实可靠性的因果图-注意力视角
链接:https://arxiv.org/abs/2604.04020
作者:Sailesh kiran kurra,Shiek Ruksana,Vishal Borusu
备注:Paper accepted for publication at IEEE International Conference on Emerging Computing and Intelligent Technologies 2026 (ICoECIT),5 Pages,5 figures,1 table
摘要:本文主要研究大型语言模型(LLM)产生的幻觉。LLM展现出非凡的语言理解和生成能力,但仍存在一个主要缺陷:幻觉,即产生与事实不符、具有误导性或不被输入数据支持的输出。这些幻觉在医疗诊断或法律推理等场景中会导致严重的问题。在这项工作中,我们提出了因果图注意力网络(GCAN)框架,该框架通过解释Transformer架构中的内部注意力流来减少幻觉,其方法是构造结合自注意力权重和基于梯度的影响分数的令牌级图。我们的方法使用一种新的度量,即因果贡献分数(CCS),来量化每个令牌的事实依赖性。我们还引入了一个事实锚定的图重加权层,在生成过程中动态降低易产生幻觉节点的影响。在TruthfulQA和HotpotQA等标准基准测试上的实验显示,与基线检索增强生成(RAG)模型相比,幻觉率降低了27.8%,事实准确率提高了16.4%。这项工作有助于提升未来LLM架构的可解释性、鲁棒性和事实可靠性。
摘要:This paper focuses on hallucinations produced by large language models (LLMs). LLMs have shown extraordinary language understanding and generation capabilities, yet they suffer from a major drawback: hallucinations, outputs that are factually incorrect, misleading, or unsupported by the input data. These hallucinations cause serious problems in scenarios such as medical diagnosis and legal reasoning. In this work, we propose a causal graph attention network (GCAN) framework that reduces hallucinations by interpreting the internal attention flow within a transformer architecture, constructing token-level graphs that combine self-attention weights and gradient-based influence scores. Our method quantifies each token's factual dependency using a new metric called the Causal Contribution Score (CCS). We further introduce a fact-anchored graph reweighting layer that dynamically reduces the influence of hallucination-prone nodes during generation. Experiments on standard benchmarks such as TruthfulQA and HotpotQA show a 27.8 percent reduction in hallucination rate and a 16.4 percent improvement in factual accuracy over baseline retrieval-augmented generation (RAG) models. This work contributes to the interpretability, robustness, and factual reliability of future LLM architectures.
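For readers unfamiliar with attention-gradient fusion, here is a minimal sketch of how a token-level graph built from attention weights and gradient influence could yield a per-token score. The combination rule (elementwise product, incoming-mass aggregation) and all names are illustrative assumptions; the abstract does not specify the exact definition of the Causal Contribution Score.

```python
import numpy as np

def causal_contribution_scores(attn, grad_influence):
    """Sketch of a CCS-style score (assumed form, not the paper's exact
    definition): edge (i, j) of the token graph carries the self-attention
    weight from token i to token j, modulated by a gradient-based influence
    score; each token's score is its normalized incoming edge mass."""
    edge_weights = attn * grad_influence   # fuse attention and gradient signals
    incoming = edge_weights.sum(axis=0)    # total mass flowing into each token
    return incoming / incoming.sum()

# Toy example: two tokens with uniform gradient influence.
attn = np.array([[0.5, 0.5],
                 [0.2, 0.8]])
grad = np.ones((2, 2))
ccs = causal_contribution_scores(attn, grad)  # scores sum to 1
```

In a real detector the attention matrix and gradient scores would come from the transformer itself; the sketch only shows how the two signals could be fused into a single per-token ranking.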
【18】Can LLMs Learn to Reason Robustly under Noisy Supervision?
标题:大型语言模型能否在噪声监督下学会稳健推理?
链接:https://arxiv.org/abs/2604.03993
作者:Shenzhi Yang,Guangcheng Zhu,Bowen Song,Sharon Li,Haobo Wang,Xing Zheng,Yingfan Ma,Zhongqi Chen,Weiqiang Wang,Gang Chen
摘要:具有可验证奖励的强化学习(RLVR)能够有效训练推理模型,但依赖于丰富的完美标签;由于专家稀缺,对其在不可避免的噪声标签下的脆弱性的研究仍然严重不足。在这项工作中,我们迈出了第一步,对RLVR中的噪声标签机制进行系统分析。与监督分类相比,大多数RLVR算法都包含一个基于推出(rollout)的条件:标签对训练的影响取决于当前策略是否能生成实现该标签的推出,这一属性自然会扩展到噪声标签。基于这一观察,我们区分了两种类型的噪声:非活动噪声标签,它们会降低数据效率;以及活动噪声标签,它们会被强化,并有可能使模型偏向不正确的分布。从带噪声样本的训练实验中,我们发现了早期正确性一致性现象:尽管噪声样本在后期阶段开始落后,但在早期训练中,干净样本和噪声样本的准确性都有类似的提高。受这种动态的激励,我们提出了在线标签细化(OLR),当两个条件成立时,它会用多数投票答案逐步纠正潜在的噪声标签:多数答案的推出通过率呈正斜率,且在更新之间保持稳定的历史一致性,从而随着策略的改进实现逐步自我纠正。我们在六个分布内数学推理基准(AIME24/25,AMC,MATH-500,Minerva和Olympiad)和三个分布外任务(ARC-c,GPQA-diamond和MMLU-pro)上评估OLR。在噪声比从0.1到0.9的范围内,OLR在非活动和活动噪声标签设置下始终提高了鲁棒性,在分布内基准测试中实现了3.6%到3.9%的平均增益,在分布外评估中实现了3.3%到4.6%的平均增益。
摘要:Reinforcement Learning with Verifiable Rewards (RLVR) effectively trains reasoning models that rely on abundant perfect labels, but its vulnerability to unavoidable noisy labels due to expert scarcity remains critically underexplored. In this work, we take the first step toward a systematic analysis of noisy label mechanisms in RLVR. In contrast to supervised classification, most RLVR algorithms incorporate a rollout-based condition: a label's influence on training is contingent on whether the current policy can generate rollouts that realize it, a property that naturally extends to noisy labels. Based on this observation, we distinguish two types of noise: inactive noisy labels, which reduce data efficiency, and active noisy labels, which are reinforced and risk skewing the model toward incorrect distributions. From experiments on training with noisy samples, we identify an Early Correctness Coherence phenomenon: although noisy samples begin to lag behind in later stages, accuracy on both clean and noisy samples increases similarly in early training. Motivated by this dynamic, we propose Online Label Refinement (OLR), which progressively corrects potentially noisy labels with majority-voted answers when two conditions hold: a positive slope in the majority answer's rollout pass rate and stable historical consistency across updates, enabling gradual self-correction as the policy improves. We evaluate OLR on six in-distribution mathematical reasoning benchmarks (AIME24/25, AMC, MATH-500, Minerva, and Olympiad) and three out-of-distribution tasks (ARC-c, GPQA-diamond, and MMLU-pro). Across noise ratios from 0.1 to 0.9, OLR consistently improves robustness under both inactive and active noisy-label settings, achieving average gains of 3.6% to 3.9% on in-distribution benchmarks and 3.3% to 4.6% on out-of-distribution evaluations.
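The two refinement conditions described above (a rising rollout pass rate for the majority answer, and stable historical consistency across updates) can be sketched in a few lines. The thresholds and the slope estimate below are illustrative assumptions, not the paper's values.

```python
from collections import Counter

def majority_answer(rollouts):
    """Return the most common answer among sampled rollouts and its pass rate."""
    answer, count = Counter(rollouts).most_common(1)[0]
    return answer, count / len(rollouts)

def should_refine(pass_rate_history, consistency_history,
                  min_slope=0.0, min_consistency=0.8):
    """Replace a potentially noisy label with the majority-voted answer only
    when (a) the majority answer's rollout pass rate is rising and (b) the
    majority answer has been stable across recent updates."""
    if len(pass_rate_history) < 2:
        return False
    slope = pass_rate_history[-1] - pass_rate_history[0]
    consistency = sum(consistency_history) / len(consistency_history)
    return slope > min_slope and consistency >= min_consistency

# Rollouts increasingly agree on "42"; under the assumed thresholds the
# noisy label would be refined once both conditions hold.
ans, rate = majority_answer(["42", "42", "17", "42", "42"])
```

The point of gating on both signals is to avoid reinforcing a transiently popular wrong answer: a single high-pass-rate snapshot is not enough, the majority answer must also persist as the policy improves.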
【19】Predict, Don't React: Value-Based Safety Forecasting for LLM Streaming
标题:预测,不要反应:LLM流媒体基于价值的安全预测
链接:https://arxiv.org/abs/2604.03962
作者:Pride Kavumba,Koki Wataoka,Huy H. Nguyen,Jiaxuan Li,Masaya Ohagi
摘要:在许多实际的LLM部署中,单个护栏同时用于提示审核和响应审核。提示审核对完全观察到的文本进行操作,而流式响应审核需要在部分生成上做出安全决策。现有的基于文本的流式护栏通常将此输出端问题框定为边界检测,训练模型以识别响应已经变得不安全的最早前缀。在这项工作中,我们介绍了StreamGuard,一个统一的、与模型无关的流式护栏,它将审核表述为一个预测问题:给定一个部分前缀,该模型预测可能的未来延续的预期危害性。我们使用Monte Carlo推出来监督此预测,这使得早期干预无需精确的令牌级边界注释。在标准的安全基准测试中,StreamGuard在输入审核和流式输出审核方面都表现出色。相对于Qwen3Guard-Stream-8B-strict,StreamGuard在8B规模下将聚合输入审核F1从86.7提高到88.2,将聚合流式输出审核F1从80.4提高到81.9。在QWENGUARDTEST response_loc流式基准测试中,StreamGuard达到97.5 F1,95.1召回率和92.6%及时干预,而Qwen3Guard-Stream-8B-strict为95.9 F1,92.1召回率和89.9%,同时将未命中率从7.9%降至4.9%。我们进一步表明,基于预测的监督可以在不同的令牌化器和模型家族之间有效迁移:通过迁移目标,Gemma3-StreamGuard-1B达到81.3响应审核F1,98.2流式F1和3.5%的未命中率。这些结果表明,无需精确边界标签即可获得强大的端到端流式审核,并且预测未来风险是低延迟安全干预的有效监督策略。
摘要:In many practical LLM deployments, a single guardrail is used for both prompt and response moderation. Prompt moderation operates on fully observed text, whereas streaming response moderation requires safety decisions to be made over partial generations. Existing text-based streaming guardrails commonly frame this output-side problem as boundary detection, training models to identify the earliest prefix at which a response has already become unsafe. In this work, we introduce StreamGuard, a unified model-agnostic streaming guardrail that instead formulates moderation as a forecasting problem: given a partial prefix, the model predicts the expected harmfulness of likely future continuations. We supervise this prediction using Monte Carlo rollouts, which enables early intervention without requiring exact token-level boundary annotations. Across standard safety benchmarks, StreamGuard performs strongly both for input moderation and for streaming output moderation. At the 8B scale, StreamGuard improves aggregated input-moderation F1 from 86.7 to 88.2 and aggregated streaming output-moderation F1 from 80.4 to 81.9 relative to Qwen3Guard-Stream-8B-strict. On the QWENGUARDTEST response_loc streaming benchmark, StreamGuard reaches 97.5 F1, 95.1 recall, and 92.6% on-time intervention, compared to 95.9 F1, 92.1 recall, and 89.9% for Qwen3Guard-Stream-8B-strict, while reducing the miss rate from 7.9% to 4.9%. We further show that forecasting-based supervision transfers effectively across tokenizers and model families: with transferred targets, Gemma3-StreamGuard-1B reaches 81.3 response-moderation F1, 98.2 streaming F1, and a 3.5% miss rate. These results show that strong end-to-end streaming moderation can be obtained without exact boundary labels, and that forecasting future risk is an effective supervision strategy for low-latency safety intervention.
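The forecasting formulation can be sketched in a few lines. The sampler, harm scorer, rollout count, and threshold below are placeholders (assumptions), not StreamGuard's actual components.

```python
def forecast_harm(prefix, sample_continuation, harm_score, n_rollouts=16):
    """Monte Carlo estimate of expected harmfulness: sample likely
    continuations of the partial prefix and average a harm score over
    the completed texts."""
    scores = [harm_score(prefix + sample_continuation(prefix))
              for _ in range(n_rollouts)]
    return sum(scores) / len(scores)

def moderate_stream(tokens, sample_continuation, harm_score, threshold=0.5):
    """Scan a streamed response token by token and intervene as soon as
    the forecast risk of future continuations crosses the threshold."""
    prefix = ""
    for i, tok in enumerate(tokens):
        prefix += tok
        if forecast_harm(prefix, sample_continuation, harm_score) >= threshold:
            return i   # position at which generation should be stopped
    return None        # the full response streamed safely
```

In a real system the sampler would be the generator itself and the harm scorer a trained classifier; the point of the sketch is that no token-level boundary label is needed, only rollout-level harm judgments.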
【20】When Models Know More Than They Say: Probing Analogical Reasoning in LLMs
标题:当模型知道的比他们所说的更多时:探索LLM中的类比推理
链接:https://arxiv.org/abs/2604.03877
作者:Hope McGovern,Caroline Craig,Thomas Lippincott,Hale Sirin
摘要:类比推理是叙事理解的核心认知能力。虽然LLM在表面和结构线索对齐时表现良好,但在表面上不明显但需要潜在信息的情况下,它们会挣扎,这表明抽象和概括的局限性。在本文中,我们比较了一个模型的探测表示与其提示性能在检测叙事类比,揭示了一个不对称:修辞类比,探测显着优于提示在开源模型,而叙事类比,他们实现了类似的(低)性能。这表明,内部表征和提示行为之间的关系是任务依赖性的,可能反映了提示如何访问可用信息的限制。
摘要:Analogical reasoning is a core cognitive faculty essential for narrative understanding. While LLMs perform well when surface and structural cues align, they struggle in cases where an analogy is not apparent on the surface but requires latent information, suggesting limitations in abstraction and generalisation. In this paper we compare a model's probed representations with its prompted performance at detecting narrative analogies, revealing an asymmetry: for rhetorical analogies, probing significantly outperforms prompting in open-source models, while for narrative analogies, they achieve a similar (low) performance. This suggests that the relationship between internal representations and prompted behavior is task-dependent and may reflect limitations in how prompting accesses available information.
【21】SODA: Semi On-Policy Black-Box Distillation for Large Language Models
标题:SODA:大型语言模型的半政策黑匣子蒸馏
链接:https://arxiv.org/abs/2604.03873
作者:Xiwen Chen,Jingjing Wang,Wenhui Zhu,Peijie Qiu,Xuanzhao Dong,Hejian Sang,Zhipeng Wang,Alborz Geramifard,Feng Luo
摘要:大型语言模型的黑盒知识蒸馏面临严格的权衡。简单的离策略方法(例如,序列级知识蒸馏)难以纠正学生的固有错误。完全同策略的方法(例如,生成式对抗蒸馏,Generative Adversarial Distillation)通过对抗训练解决了这个问题,但引入了众所周知的训练不稳定性和严重的计算开销。为了解决这一困境,我们提出了SODA(半同策略蒸馏与对齐),一个高效的替代方案,其动机源于前沿教师模型与小得多的基础模型之间的固有能力差距。由于紧凑的学生模型的自然零样本响应几乎严格劣于强大教师的目标响应,我们可以通过将教师的最佳响应与学生输出的一次性静态快照配对来构建高效的对比信号。这表明,将小学生模型暴露于其自身的静态劣质行为足以实现高质量的分布对齐,消除了对昂贵的动态推出和脆弱的对抗平衡的需要。在四个紧凑的Qwen2.5和Llama-3模型上的广泛评估验证了这种半同策略范式。SODA在16个基准测试结果中的15个上匹配或优于最先进的方法。更重要的是,它实现了这种卓越的蒸馏质量,同时训练速度提高了10倍,消耗的峰值GPU内存减少了27%,并完全消除了对抗性不稳定性。
摘要:Black-box knowledge distillation for large language models presents a strict trade-off. Simple off-policy methods (e.g., sequence-level knowledge distillation) struggle to correct the student's inherent errors. Fully on-policy methods (e.g., Generative Adversarial Distillation) solve this via adversarial training but introduce well-known training instability and crippling computational overhead. To address this dilemma, we propose SODA (Semi On-policy Distillation with Alignment), a highly efficient alternative motivated by the inherent capability gap between frontier teachers and much smaller base models. Because a compact student model's natural, zero-shot responses are almost strictly inferior to the powerful teacher's targets, we can construct a highly effective contrastive signal simply by pairing the teacher's optimal response with a one-time static snapshot of the student's outputs. This demonstrates that exposing the small student to its own static inferior behaviors is sufficient for high-quality distribution alignment, eliminating the need for costly dynamic rollouts and fragile adversarial balancing. Extensive evaluations across four compact Qwen2.5 and Llama-3 models validate this semi on-policy paradigm. SODA matches or outperforms the state-of-the-art methods on 15 out of 16 benchmark results. More importantly, it achieves this superior distillation quality while training 10 times faster, consuming 27% less peak GPU memory, and completely eliminating adversarial instability.
【22】Where to Steer: Input-Dependent Layer Selection for Steering Improves LLM Alignment
标题:转向哪里:转向的输入相关层选择改善LLM对齐
链接:https://arxiv.org/abs/2604.03867
作者:Soham Gadgil,Chris Lin,Su-In Lee
备注:Preprint
摘要:导向向量已经成为一种轻量级且有效的方法,用于在推理时对齐大型语言模型(LLM),通过将LLM表示向目标行为转移来实现对模型行为的调制。然而,现有的方法通常在全局固定层应用导向向量,隐含地假设最佳干预层在输入之间是不变的。我们认为,这一假设存在根本性局限,因为与目标行为相关的表示可能根据输入在不同的层中编码。从理论上讲,我们表明,不同的输入可能需要在不同的层转向,以实现与理想的模型行为对齐。我们还提供了经验证据,表明最佳导向层在实践中随输入有很大差异。受这些观察的启发,我们引入了Where to Steer(W2S),这是一个框架,通过学习从输入嵌入到最佳转向层的映射,自适应地选择以输入为条件的干预层。在多个LLM和对齐行为中,W2S始终优于固定层基线,在分布内和分布外设置中都有改进。我们的研究结果突出了LLM对齐中输入相关控制的重要性,并表明自适应层选择是当前导向向量方法中缺少的关键设计维度。
摘要:Steering vectors have emerged as a lightweight and effective approach for aligning large language models (LLMs) at inference time, enabling modulation over model behaviors by shifting LLM representations towards a target behavior. However, existing methods typically apply steering vectors at a globally fixed layer, implicitly assuming that the optimal intervention layer is invariant across inputs. We argue that this assumption is fundamentally limited, as representations relevant to a target behavior can be encoded at different layers depending on the input. Theoretically, we show that different inputs can require steering at different layers to achieve alignment with a desirable model behavior. We also provide empirical evidence that the optimal steering layer varies substantially across inputs in practice. Motivated by these observations, we introduce Where to Steer (W2S), a framework that adaptively selects the intervention layer conditioned on the input, by learning a mapping from input embeddings to optimal steering layers. Across multiple LLMs and alignment behaviors, W2S consistently outperforms fixed-layer baselines, with improvements in both in-distribution and out-of-distribution settings. Our findings highlight the importance of input-dependent control in LLM alignment and demonstrate that adaptive layer selection is a key design dimension missing in the current methodology of steering vectors.
【23】Representational Collapse in Multi-Agent LLM Committees: Measurement and Diversity-Aware Consensus
标题:多主体LLM委员会的代表性崩溃:测量和多元化意识共识
链接:https://arxiv.org/abs/2604.03809
作者:Dipkumar Patel
备注:11 pages, 2 figures, 7 tables
摘要:多代理LLM委员会在不同的角色提示下复制相同的模型,并通过多数投票汇总输出,隐含地假设代理提供互补证据。我们嵌入每个代理的思想链原理并测量成对相似性:在100个GSM8K问题中,三个Qwen2.5-14B代理的平均余弦相似性为0.888,有效秩为2.17(满分3.0),这是一种我们称之为代表性崩溃的故障模式。DALC是一种免训练共识协议,根据嵌入几何计算多样性权重,在GSM8K上达到87%,而自一致性达到84%,令牌成本降低26%。消融实验揭示了1-3个点的每协议运行间方差,确认提示共享的贡献超过单独的多样性加权,并显示编码器的选择强烈调制崩溃的严重性(mxbai的余弦为0.908,nomic为0.888)和下游准确性。更稳健的发现是:崩溃是可测量的,在更难的任务上会加剧,而嵌入代理的选择对任何潜在通信协议而言都是一阶设计决策。
摘要:Multi-agent LLM committees replicate the same model under different role prompts and aggregate outputs by majority vote, implicitly assuming that agents contribute complementary evidence. We embed each agent's chain-of-thought rationale and measure pairwise similarity: across 100 GSM8K questions with three Qwen2.5-14B agents, mean cosine similarity is 0.888 and effective rank is 2.17 out of 3.0, a failure mode we term representational collapse. DALC, a training-free consensus protocol that computes diversity weights from embedding geometry, reaches 87% on GSM8K versus 84% for self-consistency at 26% lower token cost. Ablation experiments reveal 1-3 point per-protocol run-to-run variance, confirm that hint sharing contributes more than diversity weighting alone, and show that encoder choice strongly modulates collapse severity (cosine 0.908 with mxbai versus 0.888 with nomic) and downstream accuracy. The more robust finding is that collapse is measurable, worsens on harder tasks, and that the choice of embedding proxy is a first-order design decision for any latent communication protocol.
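The two collapse diagnostics quoted above (mean pairwise cosine similarity and effective rank) are directly computable from the rationale embeddings; the diversity-weighting rule below is an illustrative assumption rather than DALC's exact formula.

```python
import numpy as np

def effective_rank(embeddings):
    """Effective rank of the agent-rationale embedding matrix: the
    exponential of the Shannon entropy of the normalized singular values.
    Near 1.0 indicates representational collapse; near n_agents indicates
    diverse rationales."""
    s = np.linalg.svd(embeddings, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def diversity_weights(embeddings):
    """One plausible diversity weighting (an assumption, not necessarily
    DALC's rule): weight each agent by one minus its mean cosine similarity
    to the other agents, then normalize to a distribution."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = X @ X.T
    n = len(X)
    mean_sim = (sim.sum(axis=1) - 1.0) / (n - 1)       # exclude self-similarity
    w = np.clip(1.0 - mean_sim, 1e-6, None)
    return w / w.sum()
```

With three orthogonal rationales the effective rank is 3.0; with three identical rationales it falls to 1.0, which is how a committee's "0.888 cosine, 2.17 effective rank" reading signals near-collapse.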
【24】Automated Attention Pattern Discovery at Scale in Large Language Models
标题:大型语言模型中大规模自动注意力模式发现
链接:https://arxiv.org/abs/2604.03764
作者:Jonathan Katzy,Razvan-Mihai Popescu,Erik Mekkes,Arie van Deursen,Maliheh Izadi
备注:Accepted to TMLR
摘要:大型语言模型通过扩展在一般环境中工作的能力而取得了成功。不幸的是,对于可解释性方法来说,情况并非如此。目前的趋势是在机械可解释性提供精确的解释特定的行为在控制设置。这些方法往往不能概括,或者对于大型研究来说过于资源密集。在这项工作中,我们建议研究重复的行为在大型语言模型挖掘完成的情况下,在Java代码数据集,通过利用结构化的代码。我们收集的注意力模式中产生的注意力头,以证明它们是可扩展的信号的全球可解释性的模型组件。我们表明,视觉模型提供了一个很有前途的方向,在规模上分析注意模式。为了证明这一点,我们引入了注意力模式掩蔽自动编码器(AP-MAE),这是一种基于视觉变换器的模型,可以有效地重建掩蔽的注意力模式。在StarCoder 2上的实验表明,AP-MAE(i)以高精度重建被掩盖的注意力模式,(ii)以最小的退化在看不见的模型中泛化,(iii)在推理中揭示重复的模式,(iv)预测一代人是否正确,而无需访问地面真相,准确率范围从55%到70%,具体取决于任务,以及(v)能够进行有针对性的干预,当选择性地应用时,可以将准确性提高13.6%,但当过度应用时会导致塌陷。这些结果将注意力模式建立为可解释性的可扩展信号,并表明AP-MAE为大型语言模型的分析和干预提供了可转移的基础。除了其独立的价值,AP-MAE还作为一个选择程序,以指导细粒度的机械方法。我们发布代码和模型,以支持未来大规模可解释性的工作。
摘要:Large language models have found success by scaling up capabilities to work in general settings. The same can unfortunately not be said for interpretability methods. The current trend in mechanistic interpretability is to provide precise explanations of specific behaviors in controlled settings. These often do not generalize, or are too resource intensive for larger studies. In this work we propose to study repeated behaviors in large language models by mining completion scenarios in Java code datasets, exploiting the structured nature of code. We collect the attention patterns generated in the attention heads to demonstrate that they are scalable signals for global interpretability of model components. We show that vision models offer a promising direction for analyzing attention patterns at scale. To demonstrate this, we introduce the Attention Pattern Masked Autoencoder (AP-MAE), a vision transformer-based model that efficiently reconstructs masked attention patterns. Experiments on StarCoder2 show that AP-MAE (i) reconstructs masked attention patterns with high accuracy, (ii) generalizes across unseen models with minimal degradation, (iii) reveals recurring patterns across inferences, (iv) predicts whether a generation will be correct without access to ground truth, with accuracies ranging from 55% to 70% depending on the task, and (v) enables targeted interventions that increase accuracy by 13.6% when applied selectively, but cause collapse when applied excessively. These results establish attention patterns as a scalable signal for interpretability and demonstrate that AP-MAE provides a transferable foundation for both analysis and intervention in large language models. Beyond its standalone value, AP-MAE also serves as a selection procedure to guide fine-grained mechanistic approaches. We release code and models to support future work in large-scale interpretability.
【25】Rethinking Token Prediction: Tree-Structured Diffusion Language Model
标题:重新思考令牌预测:树结构扩散语言模型
链接:https://arxiv.org/abs/2604.03537
作者:Zihao Wu,Haoming Yang,Juncheng Dong,Vahid Tarokh
摘要:离散扩散语言模型已经成为自回归语言模型的一个有竞争力的替代方案,但在有限的参数和内存预算下有效地训练它们仍然具有挑战性。现代架构主要基于全词汇表标记预测层,其占模型参数的相当大的一部分(例如,在小规模DiT风格的设计中超过20%),并且通常主导峰值GPU内存使用。这导致在受限的训练资源下参数和存储器的低效使用。为了解决这个问题,我们重新审视了显式全词汇预测的必要性,而是利用令牌之间的固有结构来构建树结构的扩散语言模型。具体来说,我们模拟的扩散过程与中间的潜在状态对应的令牌的祖先节点在预先构建的词汇树。这种树结构的分解指数地降低了分类维度,使得预测头部的大小可以忽略不计,并且能够重新分配参数以加深注意力块。从经验上讲,在相同的参数预算下,我们的方法将峰值GPU内存使用量减少了一半,同时匹配最先进的离散扩散语言模型的困惑性能。
摘要:Discrete diffusion language models have emerged as a competitive alternative to auto-regressive language models, but training them efficiently under limited parameter and memory budgets remains challenging. Modern architectures are predominantly based on a full-vocabulary token prediction layer, which accounts for a substantial fraction of model parameters (e.g., more than 20% in small scale DiT-style designs) and often dominates peak GPU memory usage. This leads to inefficient use of both parameters and memory under constrained training resources. To address this issue, we revisit the necessity of explicit full-vocabulary prediction, and instead exploit the inherent structure among tokens to build a tree-structured diffusion language model. Specifically, we model the diffusion process with intermediate latent states corresponding to a token's ancestor nodes in a pre-constructed vocabulary tree. This tree-structured factorization exponentially reduces the classification dimensionality, makes the prediction head negligible in size, and enables reallocation of parameters to deepen the attention blocks. Empirically, under the same parameter budget, our method reduces peak GPU memory usage by half while matching the perplexity performance of state-of-the-art discrete diffusion language models.
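The dimensionality argument can be made concrete with a toy path-factorized prediction: instead of one V-way softmax, each token is scored by the branching decisions along its root-to-leaf path. This is a generic hierarchical-softmax sketch under simplifying assumptions (a balanced binary tree, depth-indexed branch probabilities), not the paper's architecture.

```python
import math

def token_log_prob(path_bits, branch_probs):
    """Log-probability of a token identified by its root-to-leaf path in a
    binary vocabulary tree. Each internal node contributes one Bernoulli
    decision, so a vocabulary of size V needs only log2(V) outputs instead
    of a V-way softmax."""
    logp = 0.0
    for bit, p_right in zip(path_bits, branch_probs):
        p = p_right if bit == 1 else 1.0 - p_right
        logp += math.log(p)
    return logp

# Toy vocabulary of 4 tokens = paths 00, 01, 10, 11.
lp = token_log_prob([1, 0], [0.5, 0.5])   # uniform branching => prob 0.25
```

Because leaf probabilities are products of branch probabilities, they sum to one by construction, which is what lets the full-vocabulary prediction head shrink to a negligible size.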
【26】Beauty in the Eye of AI: Aligning LLMs and Vision Models with Human Aesthetics in Network Visualization
标题:人工智能眼中的美:将LLM和视觉模型与网络可视化中的人类美学结合起来
链接:https://arxiv.org/abs/2604.03417
作者:Peng Zhang,Xuefeng Li,Xiaoqi Wang,Han-Wei Shen,Yifan Hu
摘要:网络可视化传统上依赖于启发式指标,如压力,假设优化它们会导致美观和信息丰富的布局。然而,没有一个单一的指标能始终如一地产生最有效的结果。数据驱动的替代方案是从人类偏好中学习,注释者在相同图形的多个布局中选择他们喜欢的可视化。然后,这些人类偏好标签可以用于训练近似人类审美偏好的生成模型。然而,大规模获得人类标签是昂贵和耗时的。因此,到目前为止,这种生成方法只使用机器标记的数据进行了测试。在本文中,我们将探索使用大型语言模型(LLM)和视觉模型(VM)作为人类判断的代理。通过一项精心设计的涉及27名参与者的用户研究,我们策划了一系列人类偏好标签。我们使用这些数据来更好地了解人类的偏好,并引导LLM/VM标签。我们表明,提示工程,结合Few-Shot的例子和不同的输入格式,如图像嵌入,显着提高了LLM-人的对齐,并通过LLM的置信度分数的额外过滤推到人与人的水平对齐。此外,我们证明,经过仔细训练的VM可以实现VM-人类对齐的水平与人类注释者之间的水平相当。我们的研究结果表明,人工智能可以作为人类标签的可扩展代理。
摘要:Network visualization has traditionally relied on heuristic metrics, such as stress, under the assumption that optimizing them leads to aesthetic and informative layouts. However, no single metric consistently produces the most effective results. A data-driven alternative is to learn from human preferences, where annotators select their favored visualization among multiple layouts of the same graphs. These human-preference labels can then be used to train a generative model that approximates human aesthetic preferences. However, obtaining human labels at scale is costly and time-consuming. As a result, this generative approach has so far been tested only with machine-labeled data. In this paper, we explore the use of large language models (LLMs) and vision models (VMs) as proxies for human judgment. Through a carefully designed user study involving 27 participants, we curated a large set of human preference labels. We used this data both to better understand human preferences and to bootstrap LLM/VM labelers. We show that prompt engineering that combines few-shot examples and diverse input formats, such as image embeddings, significantly improves LLM-human alignment, and additional filtering by the confidence score of the LLM pushes the alignment to human-human levels. Furthermore, we demonstrate that carefully trained VMs can achieve VM-human alignment at a level comparable to that between human annotators. Our results suggest that AI can feasibly serve as a scalable proxy for human labelers.
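Alignment between labelers of the kind reported above is typically quantified with a chance-corrected agreement statistic; whether the authors use Cohen's kappa specifically is an assumption of this sketch.

```python
def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelers (e.g., an LLM and a
    human annotator): kappa = (p_o - p_e) / (1 - p_e), where p_o is the
    observed agreement rate and p_e is the agreement expected from each
    labeler's marginal label frequencies."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)
```

A kappa near the human-human value, rather than raw percent agreement, is the standard evidence that an automated labeler is a usable proxy, since it discounts agreement that would occur by chance.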
【27】Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapters
标题:基于正交化低秩适配器的LLM可扩展变分贝叶斯微调
链接:https://arxiv.org/abs/2604.03388
作者:Haotian Xiang,Bingcong Li,Qin Lu
摘要:在将大型语言模型(LLM)部署到安全关键型应用程序时,不确定性量化(UQ)对于自我评估基于LLM的决策的可靠性至关重要。然而,这样的决策通常会受到过度自信的影响,特别是在针对具有有限数据的下游特定领域任务的参数有效微调(PEFT)之后。缓解此问题的现有方法依赖于基于拉普拉斯近似的事后框架,其可能根据训练轨迹产生次优校准,或者依赖于变分贝叶斯训练,其需要在蒙特卡洛估计的推断时间通过整个LLM骨干的多个完整前向传递,从而对部署提出可扩展性挑战。为了解决这些限制,我们建立在贝叶斯最后一层(BLL)模型,其中基于LLM的确定性特征提取器后面是随机的最后一层参数,用于不确定性推理。由于PEFT的现有低秩适配器(LoRA)由于秩崩溃而具有有限的表达能力,因此我们使用极性分解的低秩适配器表示(PoLAR)来解决这个问题,这是一种与黎曼优化配对的正交化参数化,以实现更稳定和更有表达力的适应。在此PoLAR-BLL模型的基础上,我们利用变分(V)推理框架提出了一种可扩展的贝叶斯微调方法,该方法通过交替优化来联合寻找PoLAR参数和最后一层参数的近似后验。由此产生的PoLAR-VBLL是一个灵活的框架,很好地集成了架构增强的优化与可扩展的贝叶斯推理赋予LLM良好校准的UQ。我们的实证结果验证了PoLAR-VBLL的泛化和不确定性估计方面的分布和分布外的数据为各种常识推理任务的有效性。
摘要:When deploying large language models (LLMs) to safety-critical applications, uncertainty quantification (UQ) is of utmost importance to self-assess the reliability of the LLM-based decisions. However, such decisions typically suffer from overconfidence, particularly after parameter-efficient fine-tuning (PEFT) for downstream domain-specific tasks with limited data. Existing methods to alleviate this issue either rely on Laplace approximation based post-hoc framework, which may yield suboptimal calibration depending on the training trajectory, or variational Bayesian training that requires multiple complete forward passes through the entire LLM backbone at inference time for Monte Carlo estimation, posing scalability challenges for deployment. To address these limitations, we build on the Bayesian last layer (BLL) model, where the LLM-based deterministic feature extractor is followed by random last layer parameters for uncertainty reasoning. Since existing low-rank adapters (LoRA) for PEFT have limited expressiveness due to rank collapse, we address this with Polar-decomposed Low-rank Adapter Representation (PoLAR), an orthogonalized parameterization paired with Riemannian optimization to enable more stable and expressive adaptation. Building on this PoLAR-BLL model, we leverage the variational (V) inference framework to put forth a scalable Bayesian fine-tuning approach which jointly seeks the PoLAR parameters and approximate posterior of the last layer parameters via alternating optimization. The resulting PoLAR-VBLL is a flexible framework that nicely integrates architecture-enhanced optimization with scalable Bayesian inference to endow LLMs with well-calibrated UQ. Our empirical results verify the effectiveness of PoLAR-VBLL in terms of generalization and uncertainty estimation on both in-distribution and out-of-distribution data for various common-sense reasoning tasks.
【28】The limits of bio-molecular modeling with large language models : a cross-scale evaluation
标题:使用大型语言模型进行生物分子建模的局限性:跨规模评估
链接:https://arxiv.org/abs/2604.03361
作者:Yaxin Xu,Yue Zhou,Tianyu Zhao,Fengwei An,Zhixiang Ren
摘要:跨分子尺度的生物分子系统建模一直是科学研究的核心挑战。大型语言模型(LLM)越来越多地应用于生物分子发现,但对多尺度生物问题的系统评估和对其工具增强能力的严格评估仍然有限。我们通过提出的跨尺度生物分子基准揭示了LLM性能和机制理解之间的系统性差距:BioMol-LLM-Bench,一个统一的框架,包括26个下游任务,涵盖4个不同的难度级别,并集成了计算工具,以进行更全面的评估。对13个代表性模型的评估揭示了4个主要发现:思想链数据提供的益处有限,甚至可能降低生物任务的性能;混合曼巴注意力架构对长生物分子序列更有效;监督微调以泛化为代价提高了专业化;目前的LLM在分类任务上表现良好,但在具有挑战性的回归任务上仍然很弱。总之,这些发现为未来基于LLM的分子系统建模提供了实用指导。
摘要:The modeling of bio-molecular system across molecular scales remains a central challenge in scientific research. Large language models (LLMs) are increasingly applied to bio-molecular discovery, yet systematic evaluation across multi-scale biological problems and rigorous assessment of their tool-augmented capabilities remain limited. We reveal a systematic gap between LLM performance and mechanistic understanding through the proposed cross-scale bio-molecular benchmark: BioMol-LLM-Bench, a unified framework comprising 26 downstream tasks that covers 4 distinct difficulty levels, and computational tools are integrated for a more comprehensive evaluation. Evaluation on 13 representative models reveals 4 main findings: chain-of-thought data provides limited benefit and may even reduce performance on biological tasks; hybrid mamba-attention architectures are more effective for long bio-molecular sequences; supervised fine-tuning improves specialization at the cost of generalization; and current LLMs perform well on classification tasks but remain weak on challenging regression tasks. Together, these findings provide practical guidance for future LLM-based modeling of molecular systems.
【29】Generative Chemical Language Models for Energetic Materials Discovery
标题:用于高能材料发现的生成化学语言模型
链接:https://arxiv.org/abs/2604.03304
作者:Andrew Salij,R. Seaton Ullberg,Megan C. Davis,Marc J. Cawkwell,Christopher J. Snyder,Cristina Garcia Cardona,Ivana Matanovic,Wilton J. M. Kort-Kamp
摘要:由于高质量数据有限,发现新的高能材料仍然是一个紧迫的挑战。为了解决这个问题,我们开发了生成式分子语言模型,这些模型已经在广泛的化学数据上进行了预训练,然后用精心策划的高能材料数据集进行了微调。这种迁移学习策略将化学语言模型的能力扩展到了它们主要开发的药理学空间之外,提供了一个适用于其他数据备用发现问题的框架。此外,我们还讨论了基于片段的分子编码的化学语言模型的好处,特别是在构建合成可访问的结构。总之,这些进展为加速具有苛刻性能要求的下一代含能材料的设计提供了基础。
摘要:The discovery of new energetic materials remains a pressing challenge hindered by the limited availability of high-quality data. To address this, we have developed generative molecular language models that have been pretrained on extensive chemical data and then fine-tuned with curated energetic materials datasets. This transfer-learning strategy extends chemical language model capabilities beyond the pharmacological space in which they have been predominantly developed, offering a framework applicable to other data-sparse discovery problems. Furthermore, we discuss the benefits of fragment-based molecular encodings for chemical language models, in particular in constructing synthetically accessible structures. Together, these advances provide a foundation for accelerating the design of next-generation energetic materials with demanding performance requirements.
Graph相关(图学习|图神经网络|图优化等)(7篇)
【1】Empowering Power Outage Prediction with Spatially Aware Hybrid Graph Neural Networks and Contrastive Learning
标题:利用空间感知混合图神经网络和对比学习支持停电预测
链接:https://arxiv.org/abs/2604.04916
作者:Xuyang Shen,Zijie Pan,Diego Cerrai,Xinxuan Zhang,Christopher Colorio,Emmanouil N. Anagnostou,Dongjin Song
摘要:极端天气事件,如严重风暴、飓风、暴风雪和冰暴,因气候变化而加剧,经常导致大范围停电。这些停电使工业运营停止,影响社区,破坏关键基础设施,深刻破坏经济,并对各个部门产生深远影响。为了减轻这些影响,康涅狄格大学和Eversource能源中心开发了一个停电预测建模(OPM)系统,在此类天气事件发生之前为配电网络提供先发制人的预测。然而,系统中现有的预测模型没有纳入极端天气事件的空间效应。为此,我们开发了具有对比学习的空间感知混合图神经网络(SA-HGNN),以增强极端天气引起的停电的OPM预测。具体来说,我们首先对两个静态特征的空间关系进行编码(例如,土地覆被、基础设施)和特定事件的动态特征(例如,风速,降水)通过空间感知混合图神经网络(SA-HGNN)。接下来,我们利用对比学习来处理与不同类型的极端天气事件相关的不平衡问题,并通过最小化相似位置之间的事件内距离,同时最大化所有位置之间的事件间距离来生成特定位置的嵌入。在四个公用事业服务领域进行了深入的实证研究,即,康涅狄格州,马萨诸塞州西部,马萨诸塞州东部和新罕布什尔州,证明了SA-HGNN可以实现最先进的停电预测性能。
摘要:Extreme weather events, such as severe storms, hurricanes, snowstorms, and ice storms, which are exacerbated by climate change, frequently cause widespread power outages. These outages halt industrial operations, impact communities, damage critical infrastructure, profoundly disrupt economies, and have far-reaching effects across various sectors. To mitigate these effects, the University of Connecticut and Eversource Energy Center have developed an outage prediction modeling (OPM) system to provide pre-emptive forecasts for electric distribution networks before such weather events occur. However, existing predictive models in the system do not incorporate the spatial effect of extreme weather events. To this end, we develop Spatially Aware Hybrid Graph Neural Networks (SA-HGNN) with contrastive learning to enhance the OPM predictions for extreme weather-induced power outages. Specifically, we first encode spatial relationships of both static features (e.g., land cover, infrastructure) and event-specific dynamic features (e.g., wind speed, precipitation) via Spatially Aware Hybrid Graph Neural Networks (SA-HGNN). Next, we leverage contrastive learning to handle the imbalance problem associated with different types of extreme weather events and generate location-specific embeddings by minimizing intra-event distances between similar locations while maximizing inter-event distances across all locations. Thorough empirical studies in four utility service territories, i.e., Connecticut, Western Massachusetts, Eastern Massachusetts, and New Hampshire, demonstrate that SA-HGNN can achieve state-of-the-art performance for power outage prediction.
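The intra-/inter-event objective described above can be illustrated with a margin-based pairwise loss; the exact form used by SA-HGNN is not specified in the abstract, so the margin formulation below is an assumption.

```python
import numpy as np

def event_contrastive_loss(embeddings, event_ids, margin=1.0):
    """Sketch of a contrastive objective over location embeddings: pull
    locations from the same weather-event type together (minimize
    intra-event distance), and push different event types at least
    `margin` apart (maximize inter-event distance)."""
    loss, pairs = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(embeddings[i] - embeddings[j])
            if event_ids[i] == event_ids[j]:
                loss += d ** 2                     # intra-event: minimize distance
            else:
                loss += max(0.0, margin - d) ** 2  # inter-event: enforce margin
            pairs += 1
    return loss / pairs
```

Minimizing this loss drives same-event locations toward shared embeddings while keeping different event types separated, which is one way to counteract the class imbalance across rare event types mentioned above.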
【2】Towards Agentic Defect Reasoning: A Graph-Assisted Retrieval Framework for Laser Powder Bed Fusion
标题:走向智能体缺陷推理:激光粉末床熔合的图辅助检索框架
链接:https://arxiv.org/abs/2604.04208
作者:Muhammad Rizwan Awan,Volker Pickert,Muhammad Waqar Ashraf,Saleh Ali,Farshid Mahmouditabar,Shafiq Odhano
摘要:激光粉末床熔合(LPBF)对工艺参数高度敏感,这些工艺参数通过复杂的热和流体机制影响缺陷的形成。然而,缺陷相关的知识分散在文献中,限制了系统的理解。本文以Ti6Al4V为例,提出了一种基于图的缺陷推理检索框架。科学出版物被转换为结构化的表示,参数,机制和缺陷之间的关系被编码到证据链接的知识图中。该框架集成了语义和基于图形的检索,支持一个轻量级的基于代理的推理层,以构建可解释的缺陷路径。评价显示了较高的检索准确率(0.9667)和召回率(0.9667),证明了相关缺陷相关证据的有效识别。该框架使透明的推理链连接过程参数的缺陷。这项工作提供了一种可扩展的方法,将非结构化文献转换为增材制造的可查询和可解释的知识资源。
摘要:Laser Powder Bed Fusion (LPBF) is highly sensitive to process parameters, which influence defect formation through complex thermal and fluid mechanisms. However, defect-related knowledge is dispersed across the literature, limiting systematic understanding. This study presents a graph-assisted retrieval framework for defect reasoning in LPBF, using Ti6Al4V as a case study. Scientific publications are transformed into a structured representation, and relationships between parameters, mechanisms, and defects are encoded into an evidence-linked knowledge graph. The framework integrates semantic and graph-based retrieval, supported by a lightweight agent-based reasoning layer to construct interpretable defect pathways. Evaluation shows high retrieval accuracy (0.9667) and recall (0.9667), demonstrating effective identification of relevant defect related evidence. The framework enables transparent reasoning chains linking process parameters to defects. This work provides a scalable approach for converting unstructured literature into a query able and interpretable knowledge resource for additive manufacturing.
【3】Explainability-Guided Adversarial Attacks on Transformer-Based Malware Detectors Using Control Flow Graphs
标题:使用控制流图对基于转换器的恶意软件检测器进行解释性引导的对抗攻击
链接:https://arxiv.org/abs/2604.03843
作者:Andrew Wheeler,Kshitiz Aryal,Maanak Gupta
备注:9 pages, 3 figures, 4 tables, 1 algorithm, 2 equations
摘要:在控制流图(CFG)等图模态上运行的基于Transformer的恶意软件检测系统,通过对程序行为中的结构关系建模来实现强大的性能。然而,它们对对抗性规避攻击的鲁棒性仍未得到充分研究。本文研究了基于RoBERTa的恶意软件检测器的漏洞,该检测器将CFG线性化为函数调用序列;这一设计选择使Transformer建模成为可能,但可能引入令牌级敏感性和可被对手利用的排序伪影。通过在这一图到序列框架内评估规避策略,我们对基于Transformer的恶意软件检测器在聚合检测精度之外的实际鲁棒性提供了深入见解。本文提出了一种白盒对抗规避攻击,利用可解释性机制来识别并扰动最有影响力的图组件。使用从集成梯度导出的令牌级和单词级归因,该攻击迭代地用合成外部导入替换具有正归因的函数调用,从而在不改变整体程序结构的情况下产生对抗性CFG表示。在小型和大型Windows可移植可执行文件(PE)数据集上的实验评估表明,即使对训练到高精度的模型,该方法也能可靠地引起错误分类。我们的研究结果强调,可解释性工具虽然对可解释性有价值,但也可能暴露基于Transformer的恶意软件检测器中的关键攻击面。
摘要:Transformer-based malware detection systems operating on graph modalities such as control flow graphs (CFGs) achieve strong performance by modeling structural relationships in program behavior. However, their robustness to adversarial evasion attacks remains underexplored. This paper examines the vulnerability of a RoBERTa-based malware detector that linearizes CFGs into sequences of function calls, a design choice that enables transformer modeling but may introduce token-level sensitivities and ordering artifacts exploitable by adversaries. By evaluating evasion strategies within this graph-to-sequence framework, we provide insight into the practical robustness of transformer-based malware detectors beyond aggregate detection accuracy. This paper proposes a white-box adversarial evasion attack that leverages explainability mechanisms to identify and perturb most influential graph components. Using token- and word-level attributions derived from integrated gradients, the attack iteratively replaces positively attributed function calls with synthetic external imports, producing adversarial CFG representations without altering overall program structure. Experimental evaluation on small- and large-scale Windows Portable Executable (PE) datasets demonstrates that the proposed method can reliably induce misclassification, even against models trained to high accuracy. Our results highlight that explainability tools, while valuable for interpretability, can also expose critical attack surfaces in transformer-based malware detectors.
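The attack above ranks graph components by attributions from integrated gradients. As an illustration only (not the paper's implementation), the following sketch approximates integrated gradients for a toy differentiable score with an analytic gradient; the completeness property — attributions summing to f(x) − f(baseline) — serves as a sanity check.

```python
def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Approximate integrated gradients along the straight path
    baseline -> x using a midpoint Riemann sum over `steps` points."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps                       # midpoint rule
        point = [baseline[i] + alpha * (x[i] - baseline[i]) for i in range(n)]
        g = grad_fn(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    # Attribution_i = (x_i - baseline_i) * average gradient along the path
    return [(x[i] - baseline[i]) * avg_grad[i] for i in range(n)]

# Toy "maliciousness score": f(x) = 3*x0^2 + x1, with analytic gradient.
f = lambda x: 3 * x[0] ** 2 + x[1]
grad = lambda x: [6 * x[0], 1.0]

attr = integrated_gradients(grad, x=[1.0, 2.0], baseline=[0.0, 0.0])
total = sum(attr)   # completeness: should approximate f(x) - f(baseline) = 5
```

In the attack setting, positively attributed components (here, feature 0 with the larger attribution) would be the first candidates for replacement.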
【4】k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS
标题:图Transformer的k-最大内积注意力与GraphGPS的表达能力
链接:https://arxiv.org/abs/2604.03815
作者:Jonas De Schouwer,Haitz Sáez de Ocáriz Borde,Xiaowen Dong
备注:Accepted at the ICLR 2026 GRaM Workshop. 9 pages, 9 figures, 16 tables; 30 pages of supplementary material
摘要:图Transformer在克服传统图神经网络的局限性(例如过度挤压和建模长程依赖的困难)方面表现出了希望。然而,全对全注意力机制的二次内存和计算复杂性阻碍了它们在大规模图上的应用。虽然已经提出了线性化注意力和受限注意力模式等替代方案,但这些通常会降低性能或限制表达能力。为了更好地平衡效率和有效性,我们为图Transformer引入了k-最大内积(k-MIP)注意力。k-MIP注意力通过top-k操作为每个查询选择最相关的关键节点,从而产生稀疏但灵活的注意力模式。结合基于符号矩阵的注意力得分计算,这带来了线性内存复杂性,并且相比全对全注意力可获得高达一个数量级的实际加速,从而能够在单个A100 GPU上处理超过50万个节点的图。我们提供了表达能力的理论分析,表明k-MIP注意力不会损害图Transformer的表达能力:具体来说,我们证明了k-MIP Transformer可以以任意精度近似任何全注意力Transformer。此外,我们分析了集成了我们注意力机制的GraphGPS框架的表达能力,并基于S-SEG-WL测试建立了其图区分能力的上界。最后,我们在长程图基准(Long Range Graph Benchmark)、City-Networks基准和两个定制的大规模归纳点云数据集上验证了我们的方法,其性能始终位居可扩展图Transformer前列。
摘要:Graph transformers have shown promise in overcoming limitations of traditional graph neural networks, such as oversquashing and difficulties in modelling long-range dependencies. However, their application to large-scale graphs is hindered by the quadratic memory and computational complexity of the all-to-all attention mechanism. Although alternatives such as linearized attention and restricted attention patterns have been proposed, these often degrade performance or limit expressive power. To better balance efficiency and effectiveness, we introduce k-Maximum Inner Product (k-MIP) attention for graph transformers. k-MIP attention selects the most relevant key nodes per query via a top-k operation, yielding a sparse yet flexible attention pattern. Combined with an attention score computation based on symbolic matrices, this results in linear memory complexity and practical speedups of up to an order of magnitude compared to all-to-all attention, enabling the processing of graphs with over 500k nodes on a single A100 GPU. We provide a theoretical analysis of expressive power, showing that k-MIP attention does not compromise the expressiveness of graph transformers: specifically, we prove that k-MIP transformers can approximate any full-attention transformer to arbitrary precision. In addition, we analyze the expressive power of the GraphGPS framework, in which we integrate our attention mechanism, and establish an upper bound on its graph distinguishing capability in terms of the S-SEG-WL test. Finally, we validate our approach on the Long Range Graph Benchmark, the City-Networks benchmark, and two custom large-scale inductive point cloud datasets, consistently ranking among the top-performing scalable graph transformers.
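The core of k-MIP attention — each query attending only to its top-k keys by inner product, with the softmax restricted to that set — can be sketched as follows. This is a simplified illustration with assumed toy queries, keys, and values, not the paper's symbolic-matrix or CUDA-level implementation.

```python
import math

def k_mip_attention(queries, keys, values, k):
    """Sparse attention: each query attends only to its top-k keys by
    inner product (maximum inner product search); the softmax is taken
    over that selected set only."""
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
        topk = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        exps = [math.exp(scores[i]) for i in topk]
        z = sum(exps)
        out = [0.0] * len(values[0])
        for w, i in zip(exps, topk):
            for d in range(len(out)):
                out[d] += (w / z) * values[i][d]
        outputs.append(out)
    return outputs

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vals = [[1.0], [10.0], [3.0]]
out_sparse = k_mip_attention([[2.0, 0.0]], keys, vals, k=2)   # top-2 keys only
out_full = k_mip_attention([[2.0, 0.0]], keys, vals, k=3)     # all-to-all
```

With k = 2 the low-scoring middle key is excluded entirely, which is what yields the linear memory footprint when k is a constant independent of graph size.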
【5】Learning Superpixel Ensemble and Hierarchy Graphs for Melanoma Detection
标题:学习超像素集合和层次图用于黑色素瘤检测
链接:https://arxiv.org/abs/2604.03710
作者:Asmaa M. Elwer,Muhammad A. Rushdi,Mahmoud H. Annaby
摘要:图信号处理(GSP)正在成为生物医学信号和图像分析的主要工具。在大多数GSP技术中,通常通过统计和计算方法来设置图结构和边权重。最近,图结构学习方法提供了更可靠和灵活的数据表示。在这项工作中,我们介绍了一种基于两种图论表示的皮肤镜图像黑色素瘤检测图学习方法:超像素集成图(SEG)和超像素层次图(SHG)。对于这两种类型的图,分别在不带和带有父子约束(施加于相邻级别的超像素之间)的情况下,在多个级别上生成皮肤病变图像的超像素图,其中每个级别对应于具有不同节点数(20、40、60、80或100个节点)的子图。探讨了两种边权重分配技术:手工高斯权重和基于优化方法的学习权重。图节点信号根据纹理、几何和颜色超像素特征进行分配。此外,通过应用不同的阈值(25%、50%和75%)来修剪最弱的边缘,并分析修剪对黑色素瘤检测性能的影响,研究了图边缘阈值化的效果。使用不同的分类器对所提出的方法进行了实验评估,并在公开的ISIC2017数据集上进行了训练和测试。通过从ISIC存档中添加更多的黑色素瘤图像,应用数据增强来缓解类别不平衡。结果表明,具有纹理节点信号的学习超像素集成图给出了最高的性能,达到99.00%的准确度和99.59%的AUC。
摘要:Graph signal processing (GSP) is becoming a major tool in biomedical signal and image analysis. In most GSP techniques, graph structures and edge weights have been typically set via statistical and computational methods. More recently, graph structure learning methods offered more reliable and flexible data representations. In this work, we introduce a graph learning approach for melanoma detection in dermoscopic images based on two graph-theoretic representations: superpixel ensemble graphs (SEG) and superpixel hierarchy graphs (SHG). For these two types of graphs, superpixel maps of a skin lesion image are respectively generated at multiple levels without and with parentchild constraints among superpixels at adjacent levels, where each level corresponds to a subgraph with a different number of nodes (20, 40, 60, 80, or 100 nodes). Two edge weight assignment techniques are explored: handcrafted Gaussian weights and learned weights based on optimization methods. The graph nodal signals are assigned based on texture, geometric, and color superpixel features. In addition, the effect of graph edge thresholding is investigated by applying different thresholds (25%, 50%, and 75%) to prune the weakest edges and analyze the impact of pruning on the melanoma detection performance. Experimental evaluation of the proposed method is performed with different classifiers trained and tested on the publicly available ISIC2017 dataset. Data augmentation is applied to alleviate class imbalance by adding more melanoma images from the ISIC archive. The results show that learned superpixel ensemble graphs with textural nodal signals give the highest performance reaching an accuracy of 99.00% and an AUC of 99.59%.
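The handcrafted Gaussian edge weights and percentage-based edge thresholding described above can be sketched as follows; the feature vectors, sigma, and pruning fraction are illustrative assumptions, not values from the paper.

```python
import math

def gaussian_weights(features, sigma=1.0):
    """Handcrafted Gaussian edge weights for a complete graph:
    w_ij = exp(-||f_i - f_j||^2 / (2 * sigma^2))."""
    n = len(features)
    w = {}
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            w[(i, j)] = math.exp(-d2 / (2 * sigma ** 2))
    return w

def prune_weakest(weights, fraction):
    """Drop the `fraction` weakest edges (e.g. 0.25, 0.50, or 0.75),
    keeping only the strongest connections."""
    edges = sorted(weights.items(), key=lambda kv: kv[1])   # ascending weight
    keep = edges[int(len(edges) * fraction):]
    return dict(keep)

# Four superpixels: two similar pairs, dissimilar across pairs.
feats = [[0.0], [0.1], [5.0], [5.1]]
w = gaussian_weights(feats)        # 6 edges in the 4-node complete graph
pruned = prune_weakest(w, 0.50)    # 50% threshold keeps the 3 strongest edges
```

After 50% pruning, the surviving edges connect the similar superpixel pairs, which is the intended effect of thresholding on detection robustness.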
【6】Beyond Predefined Schemas: TRACE-KG for Context-Enriched Knowledge Graphs from Complex Documents
标题:超越预定义模式:TRACE-KG用于来自复杂文档的上下文丰富知识图
链接:https://arxiv.org/abs/2604.03496
作者:Mohammad Sadeq Abolhasani,Yang Ba,Yixuan He,Rong Pan
摘要:知识图构建通常依赖于预定义的本体或无模式提取。本体驱动的管道强制执行一致的类型,但需要昂贵的模式设计和维护,而无模式的方法往往会产生支离破碎的图与薄弱的全球组织,特别是在长的技术文档密集,上下文相关的信息。我们提出TRACE-KG(文本dRiven schemA上下文丰富的知识图),一个多模态的框架,共同构建一个上下文丰富的知识图和诱导模式,而无需假设一个预定义的本体。TRACE-KG通过结构化限定符捕获条件关系,并使用数据驱动的模式组织实体和关系,该模式作为可重用的语义支架,同时保留对源证据的完全可追溯性。实验表明,TRACE-KG产生结构连贯,可追溯的知识图,并提供了一个实用的替代本体驱动和无模式的建设管道。
摘要:Knowledge graph construction typically relies either on predefined ontologies or on schema-free extraction. Ontology-driven pipelines enforce consistent typing but require costly schema design and maintenance, whereas schema-free methods often produce fragmented graphs with weak global organization, especially in long technical documents with dense, context-dependent information. We propose TRACE-KG (Text-dRiven schemA for Context-Enriched Knowledge Graphs), a multimodal framework that jointly constructs a context-enriched knowledge graph and an induced schema without assuming a predefined ontology. TRACE-KG captures conditional relations through structured qualifiers and organizes entities and relations using a data-driven schema that serves as a reusable semantic scaffold while preserving full traceability to the source evidence. Experiments show that TRACE-KG produces structurally coherent, traceable knowledge graphs and offers a practical alternative to both ontology-driven and schema-free construction pipelines.
【7】Towards Intelligent Energy Security: A Unified Spatio-Temporal and Graph Learning Framework for Scalable Electricity Theft Detection in Smart Grids
标题:迈向智能能源安全:统一的时空和图形学习框架,用于智能电网中可扩展的电力盗窃检测
链接:https://arxiv.org/abs/2604.03344
作者:AbdulQoyum A. Olowookere,Usman A. Oguntola,Ebenezer. Leke Odekanle,Maridiyah A. Madehin,Aisha A. Adesope
备注:26 pages, 9 figures
摘要:电力盗窃和非技术性损失(NTL)仍然是现代智能电网的关键挑战,造成重大经济损失并影响电网可靠性。该研究介绍了SmartGuard能源智能系统(SGEIS),这是一个用于电力盗窃检测和智能能源监控的集成人工智能框架。该系统结合了监督机器学习、基于深度学习的时间序列建模、非侵入式负载监控(NILM)和基于图形的学习,以捕获时间和空间消费模式。一个全面的数据处理管道的开发,结合功能工程,多尺度的时间分析,和基于规则的异常标记。深度学习模型,包括长短期记忆(LSTM),时间卷积网络(TCN)和自动编码器,用于检测异常使用模式。并行地,集成学习方法(诸如随机森林、梯度提升、XGBoost和LightGBM)用于分类。为了对网格拓扑和空间依赖性进行建模,图神经网络(GNN)被应用于识别互连节点之间的相关异常。NILM模块通过从聚合信号中分解设备级消耗来增强可解释性。实验结果表明,Gradient Boosting具有强大的性能,ROC-AUC为0.894,而基于图的模型在识别高风险节点方面的准确率超过96%。该混合框架通过集成时间、统计和空间智能来提高检测鲁棒性。总的来说,SGEIS为电力盗窃检测提供了一个可扩展的实用解决方案,提供了高准确性,改进的可解释性和现实世界智能电网部署的强大潜力。
摘要:Electricity theft and non-technical losses (NTLs) remain critical challenges in modern smart grids, causing significant economic losses and compromising grid reliability. This study introduces the SmartGuard Energy Intelligence System (SGEIS), an integrated artificial intelligence framework for electricity theft detection and intelligent energy monitoring. The proposed system combines supervised machine learning, deep learning-based time-series modeling, Non-Intrusive Load Monitoring (NILM), and graph-based learning to capture both temporal and spatial consumption patterns. A comprehensive data processing pipeline is developed, incorporating feature engineering, multi-scale temporal analysis, and rule-based anomaly labeling. Deep learning models, including Long Short-Term Memory (LSTM), Temporal Convolutional Networks (TCN), and Autoencoders, are employed to detect abnormal usage patterns. In parallel, ensemble learning methods such as Random Forest, Gradient Boosting, XGBoost, and LightGBM are utilized for classification. To model grid topology and spatial dependencies, Graph Neural Networks (GNNs) are applied to identify correlated anomalies across interconnected nodes. The NILM module enhances interpretability by disaggregating appliance-level consumption from aggregate signals. Experimental results demonstrate strong performance, with Gradient Boosting achieving a ROC-AUC of 0.894, while graph-based models attain over 96% accuracy in identifying high-risk nodes. The hybrid framework improves detection robustness by integrating temporal, statistical, and spatial intelligence. Overall, SGEIS provides a scalable and practical solution for electricity theft detection, offering high accuracy, improved interpretability, and strong potential for real-world smart grid deployment.
Transformer(4篇)
【1】Greedy and Transformer-Based Multi-Port Selection for Slow Fluid Antenna Multiple Access
标题:用于慢流体天线多址接入的贪婪和基于Transformer的多端口选择
链接:https://arxiv.org/abs/2604.04589
作者:Darian Perez-Adan,Jose P. Gonzalez-Coma,F. Javier Lopez-Martinez,Luis Castedo
摘要:我们解决端口选择问题的流体天线多址(FAMA)系统与多端口流体天线(FA)接收机。现有的方法要么以过高的计算成本实现接近最佳的频谱效率(SE),要么为了降低复杂度而牺牲显著的性能。我们提出了两种互补的策略:(i)GFwd+S,一种贪婪的前向选择方法,具有交换细化,在SE方面始终优于最先进的参考方案,以及(ii)通过模仿学习训练的基于Transformer的神经网络,然后是Reinforce策略梯度阶段,以较低的计算成本接近GFwd+S性能。
摘要:We address the port-selection problem in fluid antenna multiple access (FAMA) systems with multi-port fluid antenna (FA) receivers. Existing methods either achieve near-optimal spectral efficiency (SE) at prohibitive computational cost or sacrifice significant performance for lower complexity. We propose two complementary strategies: (i) GFwd+S, a greedy forward-selection method with swap refinement that consistently outperforms state-of-the-art reference schemes in terms of SE, and (ii) a Transformer-based neural network trained via imitation learning followed by a Reinforce policy-gradient stage, which approaches GFwd+S performance at lower computational cost.
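A minimal sketch of greedy forward selection with swap refinement, the structure GFwd+S is described as using. The actual spectral-efficiency objective is replaced here by a hypothetical toy score, chosen so that pure greedy selection is suboptimal and the swap stage visibly helps.

```python
def greedy_forward_swap(candidates, k, score):
    """Greedy forward selection of k ports, followed by swap refinement:
    try replacing each selected port with each unselected one and keep
    any swap that improves the score, until no swap helps."""
    selected = []
    while len(selected) < k:                      # forward-selection stage
        best = max((c for c in candidates if c not in selected),
                   key=lambda c: score(selected + [c]))
        selected.append(best)
    improved = True
    while improved:                               # swap-refinement stage
        improved = False
        for i in range(len(selected)):
            for c in candidates:
                if c in selected:
                    continue
                trial = selected[:i] + [c] + selected[i + 1:]
                if score(trial) > score(selected):
                    selected = trial
                    improved = True
    return selected

# Toy objective where greedy alone fails: ports 1 and 4 have a synergy bonus.
def toy_score(ports):
    s = sum(ports)
    if 1 in ports and 4 in ports:
        s += 10
    return s

sel = greedy_forward_swap([1, 2, 3, 4, 5], k=2, score=toy_score)
```

Greedy alone picks {5, 4} (score 9); the swap stage then finds {1, 4} (score 15), illustrating why the refinement pass matters.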
【2】BWTA: Accurate and Efficient Binarized Transformer by Algorithm-Hardware Co-design
标题:BWTA:通过算法与硬件协同设计实现准确高效的二值化Transformer
链接:https://arxiv.org/abs/2604.03957
作者:Yifu Ding,Xianglong Liu,Shenghao Jin,Jinyang Guo,Jiwen Lu
备注:Under review
摘要:超低比特量化为基于Transformer的模型带来了巨大的效率,但精度下降和有限的GPU支持阻碍了其广泛使用。本文分析了二值化中的零点失真,并提出了一种二进制权重和三进制激活(BWTA)量化方案,该方案将微小值投影到零,并保留了极低比特模型的准确性。对于训练,我们提出了平滑多阶段量化,结合了逐级降级策略和幅度对齐投影因子,以实现稳定和快速收敛。对于推理,我们开发了一个BWTA MatMul CUDA内核,具有并行位打包和全面的二进制/三进制MatMul实现,用于线性和注意力运算符,允许跨Transformer架构的无缝集成。实验表明,BWTA接近全精度性能的BERT,平均下降3.5%的GLUE和不到2%的下降五个任务,并达到可比的困惑和准确性的LLM。在效率方面,它在NVIDIA GPU上提供了比FP 16高16到24倍的内核级加速,在LLM上提供了216到330 tokens/s的端到端预填充加速,并且内存占用量更低。作为一种算法-硬件协同设计,BWTA在不牺牲模型质量的情况下展示了实用的、低延迟的超低位推理。
摘要:Ultra low-bit quantization brings substantial efficiency for Transformer-based models, but the accuracy degradation and limited GPU support hinder its wide usage. In this paper, we analyze zero-point distortion in binarization and propose a Binary Weights & Ternary Activations (BWTA) quantization scheme, which projects tiny values to zero and preserves the accuracy of extremely low-bit models. For training, we propose Smooth Multi-Stage Quantization, combining a Levelwise Degradation Strategy and a Magnitude-Alignment Projection Factor to enable stable and fast convergence. For inference, we develop a BWTA MatMul CUDA kernel with instruction-level parallel bit-packing and comprehensive binary/ternary MatMul implementations for both linear and attention operators, allowing seamless integration across Transformer architectures. Experiments show that BWTA approaches full-precision performance for BERT, with an average 3.5% drop on GLUE and less than 2% drop on five tasks, and achieves comparable perplexity and accuracy for LLMs. In efficiency, it delivers 16 to 24 times kernel-level speedup over FP16 on NVIDIA GPUs, and 216 to 330 tokens/s end-to-end prefill speedup with lower memory footprint on LLMs. As an algorithm-hardware co-design, BWTA demonstrates practical, low-latency ultra-low-bit inference without sacrificing model quality.
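The BWTA scheme quantizes weights to binary values and activations to ternary values, projecting tiny activations to zero. A minimal sketch, assuming a simple mean-absolute-value scaling for weights and a fixed magnitude threshold for activations; the paper's kernels, projection factor, and training procedure are not reproduced here.

```python
def binarize_weights(w):
    """Binary weights: sign(w) scaled by the mean absolute value,
    a common binarization scaling (assumed here for illustration)."""
    alpha = sum(abs(x) for x in w) / len(w)
    return [alpha if x >= 0 else -alpha for x in w]

def ternarize_activations(a, threshold):
    """Ternary activations in {-1, 0, +1}: values with magnitude below
    `threshold` are projected to zero, so near-zero inputs no longer get
    forced to a binary level (the zero-point distortion the paper targets)."""
    return [0 if abs(x) < threshold else (1 if x > 0 else -1) for x in a]

acts = [0.02, -0.9, 0.4, -0.03, 1.2]
tern = ternarize_activations(acts, threshold=0.05)   # tiny values -> 0
wq = binarize_weights([0.5, -0.25, 0.75, -1.0])
```

The ternary zero level is what distinguishes BWTA from pure binarization: the two tiny activations map to 0 instead of being snapped to ±1.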
【3】Collapse-Free Prototype Readout Layer for Transformer Encoders
标题:用于Transformer编码器的无塌陷原型读出层
链接:https://arxiv.org/abs/2604.03850
作者:Giansalvo Cirrincione,Rahul Ranjeev Kumar
备注:35 pages, 6 figures, submitted to Pattern Recognition
摘要:DDCL-Attention是Transformer编码器的基于原型的读出层,它用学习的压缩机制取代了简单的池化方法,如平均池化或类令牌。它使用一小组全局原型向量,并通过软概率匹配将令牌分配给它们,以序列长度的线性复杂度产生紧凑的令牌摘要。 该方法提供了三个主要优点。首先,它通过将训练损失精确分解为重建项和多样性项来避免原型崩溃,确保原型保持独特。其次,使用Tikhonov的奇异扰动理论和显式学习率约束,证明其与编码器的联合训练在实际时间尺度条件下是稳定的。第三,相同的框架支持三种用途:最终读出层,可区分码本扩展VQ-VAE,和分层文档压缩器。 在四个数据集上的实验证实了理论预测:损失分解准确,当满足稳定性条件时,原型分离如预期地增长,并且码本达到充分利用,优于标准硬矢量量化。关于轨道碎片分类的另一项研究表明,该方法还适用于标准自然语言处理和视觉任务以外的领域,包括科学表格数据。
摘要:DDCL-Attention is a prototype-based readout layer for transformer encoders that replaces simple pooling methods, such as mean pooling or class tokens, with a learned compression mechanism. It uses a small set of global prototype vectors and assigns tokens to them through soft probabilistic matching, producing compact token summaries at linear complexity in sequence length. The method offers three main advantages. First, it avoids prototype collapse through an exact decomposition of the training loss into a reconstruction term and a diversity term, ensuring that prototypes remain distinct. Second, its joint training with the encoder is shown to be stable under a practical timescale condition, using Tikhonov's singular perturbation theory and explicit learning-rate constraints. Third, the same framework supports three uses: a final readout layer, a differentiable codebook extending VQ-VAE, and a hierarchical document compressor. Experiments on four datasets confirm the theoretical predictions: the loss decomposition holds exactly, prototype separation grows as expected when the stability condition is met, and the codebook reaches full utilization, outperforming standard hard vector quantization. An additional study on orbital debris classification shows that the method also applies beyond standard NLP and vision tasks, including scientific tabular data.
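The soft probabilistic matching of tokens to a small set of global prototypes, which underlies the readout layer above, can be sketched as follows. The distance-based logits, temperature, and toy inputs are assumptions of this illustration, not the DDCL-Attention formulation itself.

```python
import math

def soft_prototype_readout(tokens, prototypes, temperature=1.0):
    """Assign each token to prototypes via a softmax over negative squared
    distances, then summarize each prototype as the assignment-weighted
    mean of its tokens. One pass over tokens: linear in sequence length."""
    summaries = [[0.0] * len(prototypes[0]) for _ in prototypes]
    mass = [0.0] * len(prototypes)
    for t in tokens:
        logits = [-sum((a - b) ** 2 for a, b in zip(t, p)) / temperature
                  for p in prototypes]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]   # stabilized softmax
        z = sum(exps)
        for j, e in enumerate(exps):
            w = e / z
            mass[j] += w
            for d in range(len(t)):
                summaries[j][d] += w * t[d]
    return [[s / max(m_, 1e-12) for s in row] for row, m_ in zip(summaries, mass)]

tokens = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0]]
protos = [[0.0, 0.0], [4.0, 4.0]]
out = soft_prototype_readout(tokens, protos, temperature=0.1)
```

With a low temperature the assignments are nearly hard: the first prototype summarizes the two nearby tokens, the second the distant one, giving a compact fixed-size summary regardless of sequence length.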
【4】Inference-Path Optimization via Circuit Duplication in Frozen Visual Transformers for Marine Species Classification
标题:通过冻结视觉Transformer中的电路复制优化推理路径以进行海洋物种分类
链接:https://arxiv.org/abs/2604.03428
作者:Thomas Manuel Rost
备注:pre study, more ablations to come
摘要:自动水下物种分类受到注释成本和环境变化的限制,这限制了完全监督模型的可迁移性。最近的工作表明,来自自监督视觉基础模型的冻结嵌入已经为海洋图像分类提供了强大的标签高效基线。在这里,我们研究是否可以在推理时改进这种冻结嵌入机制,而无需微调或更改模型权重。 我们应用电路复制(Circuit Duplication),这是一种最初为大型语言模型提出的推理时方法,其中选定范围内的Transformer层在前向传播中被遍历两次。我们使用冻结的DINOv3嵌入,在类不平衡的AQUA20基准上于两种设置下进行评估:全局电路选择,即为完整数据集选择单个重复电路;以及类特定电路选择,即每个物种可能获得不同的最佳电路。这两种设置都使用简单的半监督下游分类器。 电路复制在标准冻结前向传播的基础上持续带来改进。在最大标签预算下,类特定选择达到了0.875的宏F1,在没有任何基于梯度的训练的情况下,将与完全监督的ConvNeXt基准(0.889)的差距缩小到1.4个百分点。四个物种超过了它们的完全监督参考,其中章鱼提高了+12.1个F1点。在所有预算中,大约75%的类别更偏好类特定电路,这表明了真正的类依赖性收益。据我们所知,这是电路复制在计算机视觉中的首次应用。
摘要:Automated underwater species classification is constrained by annotation cost and environmental variation that limits the transferability of fully supervised models. Recent work has shown that frozen embeddings from self-supervised vision foundation models already provide a strong label-efficient baseline for marine image classification. Here we investigate whether this frozen-embedding regime can be improved at inference time, without fine-tuning or changing model weights. We apply Circuit Duplication, an inference-time method originally proposed for Large Language Models, in which a selected range of transformer layers is traversed twice during the forward pass. We evaluate on the class-imbalanced AQUA20 benchmark using frozen DINOv3 embeddings under two settings: global circuit selection, where a single duplicated circuit is chosen for the full dataset, and class-specific circuit selection, where each species may receive a different optimal circuit. Both settings use simple semi-supervised downstream classifiers. Circuit Duplication consistently improves over the standard frozen forward pass. At the maximum label budget, class-specific selection reaches a macro F1 of 0.875, closing the gap to the fully supervised ConvNeXt benchmark (0.889) to 1.4 points without any gradient-based training. Four species exceed their fully supervised reference, with octopus improving by +12.1 F1 points. Across all budgets, roughly 75% of classes prefer a class-specific circuit, indicating a genuinely class-dependent benefit. To our knowledge, this is the first application of Circuit Duplication to computer vision.
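The key mechanism — traversing a selected range of transformer layers twice during the forward pass — can be sketched with toy scalar "layers". Placing the duplicated traversal immediately after the selected range is an assumption of this illustration, not necessarily the paper's exact placement.

```python
def forward_with_duplication(x, layers, dup_start, dup_end):
    """Run a layer stack, traversing layers[dup_start:dup_end] twice:
    once in their normal position and once more immediately after,
    without modifying any weights (inference-time only)."""
    for i, layer in enumerate(layers):
        x = layer(x)
        if i == dup_end - 1:                  # end of the duplicated circuit
            for extra in layers[dup_start:dup_end]:
                x = extra(x)
    return x

# Toy "layers" acting on a scalar embedding:
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
y_plain = layers[2](layers[1](layers[0](0.0)))            # ((0+1)*2)-3 = -1
y_dup = forward_with_duplication(0.0, layers, dup_start=1, dup_end=2)
```

Global circuit selection would pick one (dup_start, dup_end) pair for the whole dataset; class-specific selection would keep a separate pair per species.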
GAN|对抗|攻击|生成相关(12篇)
【1】Dynamic Free-Rider Detection in Federated Learning via Simulated Attack Patterns
标题:通过模拟攻击模式实现联邦学习中的动态搭便车检测
链接:https://arxiv.org/abs/2604.04611
作者:Motoki Nakamura
备注:Submitted to ECML PKDD 2026 (under review)
摘要:联邦学习(FL)使多个客户端能够通过聚合本地更新而无需共享私有数据来协作训练全局模型。然而,FL经常面临搭便车的挑战:客户端提交虚假的模型参数而不执行实际训练,以便在不做贡献的情况下获得全局模型。Chen等人提出了一种基于模型参数权重演化频率(WEF)的搭便车检测方法。这种检测方法既不需要代理数据集也不需要预训练,是实用搭便车检测方法的主要候选者。然而,它很难检测到"动态"搭便车者,即在早期轮次表现诚实、随后转向搭便车的客户端,特别是在delta权重攻击和我们新提出的自适应WEF伪装攻击等全局模型模仿攻击下。在本文中,我们提出了一种新的检测方法S2-WEF,它在服务器端使用先前广播的全局模型来模拟潜在的基于全局模型的攻击的WEF模式,并识别所提交WEF模式与模拟结果相似的客户端。为了应对各种搭便车攻击策略,S2-WEF进一步将这种基于模拟的相似性得分与根据所提交WEF之间的相互比较计算的偏差得分相结合,并通过二维聚类和逐得分分类来区分良性客户端和搭便车客户端。该方法能够动态检测在训练期间转变为搭便车者的客户端,而无需代理数据集或预训练。我们在三个数据集和五种攻击类型上进行了广泛的实验,证明了S2-WEF比现有方法具有更高的鲁棒性。
摘要:Federated learning (FL) enables multiple clients to collaboratively train a global model by aggregating local updates without sharing private data. However, FL often faces the challenge of free-riders, clients who submit fake model parameters without performing actual training to obtain the global model without contributing. Chen et al. proposed a free-rider detection method based on the weight evolving frequency (WEF) of model parameters. This detection approach is a leading candidate for practical free-rider detection methods, as it requires neither a proxy dataset nor pre-training. Nevertheless, it struggles to detect ``dynamic'' free-riders who behave honestly in early rounds and later switch to free-riding, particularly under global-model-mimicking attacks such as the delta weight attack and our newly proposed adaptive WEF-camouflage attack. In this paper, we propose a novel detection method S2-WEF that simulates the WEF patterns of potential global-model-based attacks on the server side using previously broadcasted global models, and identifies clients whose submitted WEF patterns resemble the simulated ones. To handle a variety of free-rider attack strategies, S2-WEF further combines this simulation-based similarity score with a deviation score computed from mutual comparisons among submitted WEFs, and separates benign and free-rider clients by two-dimensional clustering and per-score classification. This method enables dynamic detection of clients that transition into free-riders during training without proxy datasets or pre-training. We conduct extensive experiments across three datasets and five attack types, demonstrating that S2-WEF achieves higher robustness than existing approaches.
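A hedged sketch of the weight-evolving-frequency (WEF) idea the detection builds on: an honest client's parameters keep changing across rounds, while a client resubmitting the broadcast global model produces a flat WEF fingerprint that matches a server-side simulation. The change threshold and the deviation score used here are illustrative simplifications, not the S2-WEF algorithm.

```python
def weight_evolving_frequency(updates, eps=1e-8):
    """Per-round fraction of parameters that changed meaningfully:
    a WEF-style fingerprint of a client's training behaviour."""
    freqs = []
    for prev, curr in zip(updates, updates[1:]):
        changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > eps)
        freqs.append(changed / len(prev))
    return freqs

def deviation(u, v):
    """Mean absolute difference between a submitted WEF pattern and a
    server-simulated free-rider pattern (illustrative score)."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)

# Honest client: weights keep evolving. Free-rider: resubmits the same model.
honest = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]]
rider = [[0.1, 0.2], [0.1, 0.2], [0.1, 0.2]]
wef_honest = weight_evolving_frequency(honest)
wef_rider = weight_evolving_frequency(rider)

simulated_rider_wef = [0.0, 0.0]   # simulated pattern: no weight evolution
dev_honest = deviation(wef_honest, simulated_rider_wef)
dev_rider = deviation(wef_rider, simulated_rider_wef)
```

A submitted pattern close to the simulated one (low deviation) flags a likely free-rider, which is the intuition behind combining simulation-based similarity with mutual-comparison deviation scores.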
【2】Explainable Autonomous Cyber Defense using Adversarial Multi-Agent Reinforcement Learning
标题:使用对抗性多智能体强化学习的可解释自主网络防御
链接:https://arxiv.org/abs/2604.04442
作者:Yiyao Zhang,Diksha Goel,Hussain Ahmad
摘要:自主代理越来越多地部署在进攻性和防御性网络行动中,在关键基础设施环境中创建高速闭环交互。高级持续性威胁(APT)攻击者利用“生活在陆地上”技术和有针对性的遥测扰动来诱导监控系统中的模糊性,导致自动化防御反应过度或将良性行为错误分类为恶意活动。现有的单片和多智能体防御管道主要基于基于相关性的信号进行操作,对响应行为缺乏结构性约束,并且容易在模糊或对抗性输入下发生推理漂移。我们提出了因果多智能体决策框架(C-MADF),一个结构约束的架构,自主网络防御,集成了因果建模与对抗双政策控制。C-MADF首先从历史遥测中学习结构因果模型(SCM),并将其编译为定义可接受响应转换的调查级有向无环图(DAG)。该路线图被形式化为马尔可夫决策过程(MDP),其动作空间被明确限制为因果一致的转换。在这个受限空间内的决策由双代理强化学习系统执行,其中威胁优化蓝队策略由保守形状的红队策略抵消。政策间分歧通过政策分歧分数量化,并通过配备解释性透明度分数的人在环界面暴露,该分数可作为不确定性下的升级信号。在真实世界的CICIoT 2023数据集上,C-MADF将三个前沿文献基线的假阳性率从11.2%,9.7%和8.4%降低到1.8%,同时实现了0.997的精确度,0.961的召回率和0.979的F1分数。
摘要:Autonomous agents are increasingly deployed in both offensive and defensive cyber operations, creating high-speed, closed-loop interactions in critical infrastructure environments. Advanced Persistent Threat (APT) actors exploit "Living off the Land" techniques and targeted telemetry perturbations to induce ambiguity in monitoring systems, causing automated defenses to overreact or misclassify benign behavior as malicious activity. Existing monolithic and multi-agent defense pipelines largely operate on correlation-based signals, lack structural constraints on response actions, and are vulnerable to reasoning drift under ambiguous or adversarial inputs. We present the Causal Multi-Agent Decision Framework (C-MADF), a structurally constrained architecture for autonomous cyber defense that integrates causal modeling with adversarial dual-policy control. C-MADF first learns a Structural Causal Model (SCM) from historical telemetry and compiles it into an investigation-level Directed Acyclic Graph (DAG) that defines admissible response transitions. This roadmap is formalized as a Markov Decision Process (MDP) whose action space is explicitly restricted to causally consistent transitions. Decision-making within this constrained space is performed by a dual-agent reinforcement learning system in which a threat-optimizing Blue-Team policy is counterbalanced by a conservatively shaped Red-Team policy. Inter-policy disagreement is quantified through a Policy Divergence Score and exposed via a human-in-the-loop interface equipped with an Explainability-Transparency Score that serves as an escalation signal under uncertainty. On the real-world CICIoT2023 dataset, C-MADF reduces the false-positive rate from 11.2%, 9.7%, and 8.4% in three cutting-edge literature baselines to 1.8%, while achieving 0.997 precision, 0.961 recall, and 0.979 F1-score.
【3】Adversarial Robustness Analysis of Cloud-Assisted Autonomous Driving Systems
标题:云辅助自动驾驶系统的对抗鲁棒性分析
链接:https://arxiv.org/abs/2604.04349
作者:Maher Al Islam,Amr S. El-Wakeel
摘要:自动驾驶汽车越来越依赖于基于深度学习的感知和控制,这对计算提出了很高的要求。云辅助架构将这些功能卸载到远程服务器,从而通过车联网(IoV)实现增强感知和协调决策。然而,这种模式引入了跨层漏洞,其中感知模型的对抗性操纵和车辆-云链路中的网络损伤可能共同破坏安全关键的自主性。本文提出了一个硬件在环车联网测试平台,该平台集成了实时感知、控制和通信,以评估云辅助自动驾驶中的此类漏洞。部署在云上的基于YOLOv8的对象检测器会受到使用快速梯度符号方法(FGSM)和投影梯度下降(PGD)的白盒对抗攻击,而网络对手会在车辆-云循环中引起延迟和数据包丢失。结果表明,对抗性扰动显著降低感知性能:PGD将检测精度和召回率从干净基线下的0.73和0.68降低到ε = 0.04时的0.22和0.15。150-250 ms的网络延迟(对应于大约3-4帧的瞬时丢失)和0.5-5%的分组丢失率进一步使闭环控制不稳定,导致延迟致动和违反规则。这些发现强调了云辅助自动驾驶系统中跨层弹性的必要性。
摘要:Autonomous vehicles increasingly rely on deep learning-based perception and control, which impose substantial computational demands. Cloud-assisted architectures offload these functions to remote servers, enabling enhanced perception and coordinated decision-making through the Internet of Vehicles (IoV). However, this paradigm introduces cross-layer vulnerabilities, where adversarial manipulation of perception models and network impairments in the vehicle-cloud link can jointly undermine safety-critical autonomy. This paper presents a hardware-in-the-loop IoV testbed that integrates real-time perception, control, and communication to evaluate such vulnerabilities in cloud-assisted autonomous driving. A YOLOv8-based object detector deployed on the cloud is subjected to whitebox adversarial attacks using the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD), while network adversaries induce delay and packet loss in the vehicle-cloud loop. Results show that adversarial perturbations significantly degrade perception performance, with PGD reducing detection precision and recall from 0.73 and 0.68 in the clean baseline to 0.22 and 0.15 at ε = 0.04. Network delays of 150-250 ms, corresponding to transient losses of approximately 3-4 frames, and packet loss rates of 0.5-5 % further destabilize closed-loop control, leading to delayed actuation and rule violations. These findings highlight the need for cross-layer resilience in cloud-assisted autonomous driving systems.
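The FGSM attack used above perturbs each input feature by ε in the direction of the sign of the loss gradient. A self-contained sketch on a toy logistic model with an analytic input gradient; the paper attacks a YOLOv8 detector, so this shows only the core FGSM step, not the full pipeline.

```python
import math

def fgsm_perturb(x, grad_x, epsilon):
    """FGSM: step each input feature by epsilon in the sign of the loss
    gradient, maximizing the loss under an L-infinity budget."""
    return [xi + epsilon * (1 if g > 0 else -1 if g < 0 else 0)
            for xi, g in zip(x, grad_x)]

def logistic_loss_and_grad(w, b, x, y):
    """Binary cross-entropy for p = sigmoid(w.x + b); the gradient of the
    loss with respect to the input x is (p - y) * w, used by FGSM."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]
    return loss, grad_x

w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1                     # correctly classified clean input
loss0, g = logistic_loss_and_grad(w, b, x, y)
x_adv = fgsm_perturb(x, g, epsilon=0.04)  # same epsilon scale as the paper
loss1, _ = logistic_loss_and_grad(w, b, x_adv, y)
```

PGD extends this by iterating small FGSM-like steps with projection back onto the ε-ball, which is why it degrades detection more strongly at the same budget.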
【4】Convolutional Neural Network and Adversarial Autoencoder in EEG images classification
标题:卷积神经网络和对抗自动编码器在脑电图像分类中的应用
链接:https://arxiv.org/abs/2604.04313
作者:Albert Nasybullin,Semen Kurkin
备注:4 pages, 6 figures
摘要:在本文中,我们考虑将计算机视觉算法应用于EEG数据分析过程中神经科学中面临的分类问题。我们的方法是将计算机视觉和神经网络方法相结合,解决手部运动过程中的人脑活动分类问题。我们对原始EEG信号进行预处理并生成2D EEG地形图。后来,我们开发了监督和半监督神经网络来分类不同的运动皮层活动。
摘要:In this paper, we consider applying computer vision algorithms for the classification problem one faces in neuroscience during EEG data analysis. Our approach is to apply a combination of computer vision and neural network methods to solve human brain activity classification problems during hand movement. We pre-processed raw EEG signals and generated 2D EEG topograms. Later, we developed supervised and semi-supervised neural networks to classify different motor cortex activities.
【5】DAGAF: A directed acyclic generative adversarial framework for joint structure learning and tabular data synthesis
标题:DAGAF:一个用于联合结构学习和表格数据合成的有向非循环生成对抗框架
链接:https://arxiv.org/abs/2604.04290
作者:Hristo Petkov,Calum MacLellan,Feng Dong
备注:The code for this paper is available at https://github.com/ItsyPetkov/DAGAF
摘要:了解数据变量之间的因果关系可以为表格数据集的构建提供重要的见解。大多数现有的因果关系学习方法通常专注于应用单个可识别的因果模型,例如加性噪声模型(ANM)或线性非高斯非循环模型(LiNGAM),以发现观测数据中表现出的依赖性。我们通过引入一种新的双步骤框架来改进这种方法,该框架能够在多个因果模型假设下进行因果结构学习和表格数据合成。我们的方法使用有向无环图(DAG)来表示数据变量之间的因果关系。通过应用各种函数因果模型,包括ANM,LiNGAM和后非线性模型(PNL),我们隐式地学习DAG的内容来模拟观测数据的生成过程,有效地复制了真实的数据分布。这是支持的理论分析,以解释多个损失条款组成的目标函数的框架。实验结果表明,DAGAF在结构学习方面优于许多现有的方法,在真实世界和基准数据集上都实现了显着较低的结构汉明距离(SHD)分数(Sachs:47%,Child:11%,Hailfinder:5%,Pathfinder:与最先进的技术相比提高了7%),同时能够产生多样化的高质量样本。
摘要:Understanding the causal relationships between data variables can provide crucial insights into the construction of tabular datasets. Most existing causality learning methods typically focus on applying a single identifiable causal model, such as the Additive Noise Model (ANM) or the Linear non-Gaussian Acyclic Model (LiNGAM), to discover the dependencies exhibited in observational data. We improve on this approach by introducing a novel dual-step framework capable of performing both causal structure learning and tabular data synthesis under multiple causal model assumptions. Our approach uses Directed Acyclic Graphs (DAG) to represent causal relationships among data variables. By applying various functional causal models including ANM, LiNGAM and the Post-Nonlinear model (PNL), we implicitly learn the contents of DAG to simulate the generative process of observational data, effectively replicating the real data distribution. This is supported by a theoretical analysis to explain the multiple loss terms comprising the objective function of the framework. Experimental results demonstrate that DAGAF outperforms many existing methods in structure learning, achieving significantly lower Structural Hamming Distance (SHD) scores across both real-world and benchmark datasets (Sachs: 47%, Child: 11%, Hailfinder: 5%, Pathfinder: 7% improvement compared to state-of-the-art), while being able to produce diverse, high-quality samples.
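Structural Hamming Distance (SHD), the metric reported above, counts the edge insertions, deletions, and reversals needed to turn the estimated DAG into the true one. A minimal sketch over edge sets, with a reversed edge counted once rather than as a deletion plus an insertion (one common convention; conventions vary across papers).

```python
def structural_hamming_distance(dag_a, dag_b):
    """SHD between two DAGs given as sets of directed edges (u, v)."""
    a, b = set(dag_a), set(dag_b)
    shd = 0
    counted = set()
    for e in a ^ b:                     # edges present in exactly one graph
        if e in counted:
            continue
        rev = (e[1], e[0])
        if rev in (a ^ b) and (e in a) != (rev in a):
            shd += 1                    # reversed orientation: count once
            counted.add(rev)
        else:
            shd += 1                    # missing or extra edge
        counted.add(e)
    return shd

true_dag = {("A", "B"), ("B", "C"), ("A", "C")}
est_dag = {("B", "A"), ("B", "C")}      # A->B reversed, A->C missing
d = structural_hamming_distance(true_dag, est_dag)   # 1 reversal + 1 missing = 2
```

Lower SHD means the learned structure is closer to the ground-truth DAG, which is the sense in which DAGAF's reported percentage improvements should be read.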
【6】Peoples Water Data: Enabling Reliable Field Data Generation and Microbial Contamination Screening in Household Drinking Water
标题:人民水数据:实现可靠的现场数据生成和家庭饮用水中的微生物污染筛查
链接:https://arxiv.org/abs/2604.04240
作者:Suzan Kagan,Shira Spigelman,Sankar Sudhir,Thalappil Pradeep,Hadas Mamane
摘要:不安全的饮用水仍然是全球主要的公共卫生问题,特别是在常规微生物监测有限的低资源地区。虽然大肠杆菌是国际公认的粪便污染指标,但基于实验室的检测往往无法大规模进行。在这项研究中,我们开发并评估了一个两阶段的机器学习框架,利用低成本的理化和上下文指标预测印度钦奈分散式家庭使用点饮用水中大肠杆菌的存在。该数据集包括在人民水数据倡议下收集的3,023个样本;经过协调、技术清理和离群值筛选后,保留了2,207个有效样本。
摘要:Unsafe drinking water remains a major public health concern globally, particularly in low-resource regions where routine microbiological surveillance is limited. Although Escherichia coli is the internationally recognized indicator of fecal contamination, laboratory-based testing is often inaccessible at scale. In this study, we developed and evaluated a two-stage machine-learning framework for predicting E. coli presence in decentralized household point-of-use drinking water in Chennai, India using low-cost physicochemical and contextual indicators. The dataset comprised 3,023 samples collected under the Peoples Water Data initiative; after harmonization, technical cleaning, and outlier screening, 2,207 valid samples were retained. This framework provides a scalable decision-support tool for prioritizing microbiological testing in resource-constrained environments and addresses an important gap in point-of-use contamination risk assessment. Beyond predictive modeling, the present study was conducted within an AI-supported field implementation framework that combined student-facing guidance and real-time QC to improve protocol adherence, traceability, and data reliability in decentralized household water monitoring.
【7】ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation
标题:ACES:谁来测试这些测试?代码生成的留一AUC一致性
链接:https://arxiv.org/abs/2604.03922
作者:Hui Sun,Yun-Ji Zhang,Zheng Xie,Ren-Biao Liu,Yali Du,Xin-Ye Li,Ming Li
备注:32 pages, 14 figures, 9 tables
摘要:使用LLM生成的测试来选择LLM生成的候选代码是一项挑战,因为测试本身可能不正确。现有的方法要么平等对待所有测试,要么依赖临时的启发式方法来过滤不可靠的测试。然而,确定测试正确性需要知道哪些代码是正确的,从而形成循环依赖。我们的关键见解是,我们根本不需要确定测试的正确性:测试投票应该用于排名,而不仅仅是计数。重要的不是有多少代码通过测试,而是测试能否区分正确和不正确的代码。我们通过留一评估打破循环依赖:保留一个测试,按其余所有测试上的总分对代码进行排名,并衡量被保留测试的通过/失败模式是否与该排名一致。我们将这种一致性形式化为留一AUC(LOO-AUC),并证明期望LOO-AUC与每个测试区分正确代码和错误代码的能力成正比。在此基础上,我们提出了ACES(AUC Consistency Scoring),包含两个互补的变体:ACES-C提供封闭形式的权重,在关于平均测试质量的温和假设下,可证明在期望意义上近似预言机;ACES-O放弃这一假设,迭代优化可微的LOO-AUC目标。两者都只在二进制通过矩阵上操作,开销可以忽略不计,并在多个代码生成基准上实现了最先进的Pass@$k$。
摘要:Selecting LLM-generated code candidates using LLM-generated tests is challenging because the tests themselves may be incorrect. Existing methods either treat all tests equally or rely on ad-hoc heuristics to filter unreliable tests. Yet determining test correctness requires knowing which codes are correct, creating a \emph{circular dependency}. Our key insight is that we need not determine test correctness at all: \emph{test votes should rank, not merely count}. What matters is not how many codes pass a test, but whether the test can \emph{distinguish} correct from incorrect code. We break the circular dependency via leave-one-out evaluation: hold out one test, rank codes by their aggregate scores on all remaining tests, and measure whether the held-out test's pass/fail pattern agrees with this ranking. We formalize this agreement as the leave-one-out AUC~(LOO-AUC) and prove that the expected LOO-AUC is proportional to each test's ability to separate correct code from incorrect code. Building on this, we propose \textbf{ACES}~(\textbf{A}UC \textbf{C}onsist\textbf{E}ncy \textbf{S}coring) with two complementary variants: ACES-C provides closed-form weights that provably approximate the oracle in expectation under a mild assumption on average test quality; ACES-O drops this assumption and iteratively optimizes a differentiable LOO-AUC objective. Both operate solely on the binary pass matrix with negligible overhead, and achieve state-of-the-art Pass@$k$ on multiple code generation benchmarks.
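摘要描述的留一AUC可以在一个二进制通过矩阵上直接演示。以下是一个基于摘要描述的示意性实现(并非论文官方代码):对每个测试,用其余测试的通过计数对代码排名,再计算被保留测试的通过/失败标签与该排名的AUC一致性(并列计0.5)。

```python
def loo_auc(pass_matrix):
    """Leave-one-out AUC for each LLM-generated test.

    pass_matrix[t][c] = 1 if code candidate c passes test t, else 0.
    For each test t, codes are ranked by their pass counts on the
    remaining tests; LOO-AUC measures how well the held-out test's
    pass/fail labels agree with that ranking (ties count 0.5).
    """
    n_tests = len(pass_matrix)
    n_codes = len(pass_matrix[0])
    aucs = []
    for t in range(n_tests):
        # Aggregate score of each code over all other tests.
        scores = [sum(pass_matrix[s][c] for s in range(n_tests) if s != t)
                  for c in range(n_codes)]
        labels = pass_matrix[t]
        pos = [scores[c] for c in range(n_codes) if labels[c] == 1]
        neg = [scores[c] for c in range(n_codes) if labels[c] == 0]
        if not pos or not neg:
            aucs.append(None)  # AUC undefined: test passes all or no codes
            continue
        wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
        aucs.append(wins / (len(pos) * len(neg)))
    return aucs
```

一个与其他测试一致的测试得到高LOO-AUC,而与整体排名相悖的测试得到接近0的分数。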
【8】Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics
标题:对齐你的结构:利用结构预训练生成分子动力学轨迹
链接:https://arxiv.org/abs/2604.03911
作者:Aniketh Iyengar,Jiaqi Han,Pengwei Sun,Mingjian Jiang,Jianwen Xie,Stefano Ermon
备注:Published at ICLR 2026. 38 pages, 17 figures, 17 tables
摘要:使用深度生成模型生成分子动力学(MD)轨迹已经引起了越来越多的关注,但由于MD数据的可用性有限以及建模高维MD分布的复杂性,这仍然具有固有的挑战性。为了克服这些挑战,我们提出了一个利用结构预训练进行MD轨迹生成的新框架。具体来说,我们首先在大规模构象数据集上训练一个基于扩散的结构生成模型,在此之上,我们引入了一个在MD轨迹数据上训练的插值器模块,旨在加强生成结构之间的时间一致性。我们的方法有效地利用丰富的结构数据来缓解MD轨迹数据的稀缺性,并将复杂的MD建模任务分解为两个可管理的子问题:结构生成和时间对齐。我们在QM9和DRUGS小分子数据集上全面评估了我们的方法,涵盖无条件生成、正向模拟和插值任务,并将框架和分析进一步扩展到四肽和蛋白质单体系统。实验结果证实,我们的方法在生成化学上逼真的MD轨迹方面表现出色,几何、动力学和能量度量的准确性均有显著提高。
摘要:Generating molecular dynamics (MD) trajectories using deep generative models has attracted increasing attention, yet remains inherently challenging due to the limited availability of MD data and the complexities involved in modeling high-dimensional MD distributions. To overcome these challenges, we propose a novel framework that leverages structure pretraining for MD trajectory generation. Specifically, we first train a diffusion-based structure generation model on a large-scale conformer dataset, on top of which we introduce an interpolator module trained on MD trajectory data, designed to enforce temporal consistency among generated structures. Our approach effectively harnesses abundant structural data to mitigate the scarcity of MD trajectory data and effectively decomposes the intricate MD modeling task into two manageable subproblems: structural generation and temporal alignment. We comprehensively evaluate our method on the QM9 and DRUGS small-molecule datasets across unconditional generation, forward simulation, and interpolation tasks, and further extend our framework and analysis to tetrapeptide and protein monomer systems. Experimental results confirm that our approach excels in generating chemically realistic MD trajectories, as evidenced by remarkable improvements of accuracy in geometric, dynamical, and energetic measurements.
【9】Improving ML Attacks on LWE with Data Repetition and Stepwise Regression
标题:通过数据重复和逐步回归改进对LWE的ML攻击
链接:https://arxiv.org/abs/2604.03903
作者:Alberto Alfarano,Eshika Saxena,Emily Wenger,François Charton,Kristin Lauter
摘要:带错误学习(LWE)问题是基于格的密码学中的一个数学难题。在最简单的二进制秘密情况下,它是带误差的子集和问题。针对二进制、三进制和小秘密的有效ML攻击已被证明,并在相当稀疏的秘密上取得成功。这些ML攻击可在经BKZ预处理的样本上恢复“残酷区域”(cruel region, Nolte等人, 2024)中最多有3个活跃比特的秘密。我们表明,使用更大的训练集和重复的样例可以恢复更稠密的秘密。根据经验,我们观察到基于模型的秘密恢复尝试次数、数据集大小和重复样例数之间存在幂律关系。我们引入一种逐步回归技术来恢复秘密的“酷比特”(cool bits)。
摘要:The Learning with Errors (LWE) problem is a hard math problem in lattice-based cryptography. In the simplest case of binary secrets, it is the subset sum problem, with error. Effective ML attacks on LWE were demonstrated in the case of binary, ternary, and small secrets, succeeding on fairly sparse secrets. The ML attacks recover secrets with up to 3 active bits in the "cruel region" (Nolte et al., 2024) on samples pre-processed with BKZ. We show that using larger training sets and repeated examples enables recovery of denser secrets. Empirically, we observe a power-law relationship between model-based attempts to recover the secrets, dataset size, and repeated examples. We introduce a stepwise regression technique to recover the "cool bits" of the secret.
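为说明摘要所述的问题设定,下面是一个带二进制秘密的LWE样本生成示意(参数取值与函数名均为示例性假设,并非论文所用实现):b = &lt;a, s&gt; + e (mod q),在二进制秘密下即带误差的子集和。

```python
import random

def lwe_samples(n, q, secret, num_samples, noise=1, rng=None):
    """Generate toy Learning-with-Errors samples (a, b) with
    b = <a, secret> + e (mod q), where e is small noise.

    With a binary secret this is a noisy subset-sum: b sums the
    coordinates of `a` selected by the secret's active bits.
    """
    rng = rng or random.Random(0)
    samples = []
    for _ in range(num_samples):
        a = [rng.randrange(q) for _ in range(n)]
        e = rng.randint(-noise, noise)  # small symmetric error term
        b = (sum(ai * si for ai, si in zip(a, secret)) + e) % q
        samples.append((a, b))
    return samples
```

攻击者拿到的只有样本对 (a, b);恢复秘密 s 就相当于从这些带噪声的线性方程中求解。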
【10】CountsDiff: A Diffusion Model on the Natural Numbers for Generation and Imputation of Count-Based Data
标题:CountsDiff:用于基于计数的数据生成和插补的自然数扩散模型
链接:https://arxiv.org/abs/2604.03779
作者:Renzo G. Soatto,Anders Hoel,Greycen Ren,Shorna Alam,Stephen Bates,Nikolaos P. Daskalakis,Caroline Uhler,Maria Skoularidou
备注:36 Pages, 11 figures. In review
摘要:扩散模型在连续和基于令牌的领域的生成任务中表现出色,但它们在离散有序数据上的应用仍不成熟。我们提出了CountsDiff,一个旨在原生建模自然数上分布的扩散框架。CountsDiff扩展了Blackout扩散框架,通过直接以生存概率调度和显式损失加权进行参数化来简化其表述。这借助与现有扩散建模框架有直接对应物的设计参数引入了灵活性。除这一重参数化之外,CountsDiff还引入了此前在基于计数的领域中缺失的现代扩散模型特性,包括连续时间训练、无分类器引导,以及允许非单调反向轨迹的扰动/重掩蔽反向动态。我们提出了CountsDiff的一个初始实例化,并在自然图像数据集(CIFAR-10、CelebA)上对其进行验证,在一个复杂、研究充分且可解释的数据域中探索改变所引入设计参数的影响。随后,我们强调生物计数测定这一自然用例,在胎儿细胞和心脏细胞图谱的单细胞RNA-seq插补上评估CountsDiff。值得注意的是,我们发现即使是这个简单的实例化,也能达到或超越最先进的离散生成模型和领先的RNA-seq插补方法的性能,同时为未来工作通过优化设计选择进一步提升留下了充足空间。
摘要:Diffusion models have excelled at generative tasks for both continuous and token-based domains, but their application to discrete ordinal data remains underdeveloped. We present CountsDiff, a diffusion framework designed to natively model distributions on the natural numbers. CountsDiff extends the Blackout diffusion framework by simplifying its formulation through a direct parameterization in terms of a survival probability schedule and an explicit loss weighting. This introduces flexibility through design parameters with direct analogues in existing diffusion modeling frameworks. Beyond this reparameterization, CountsDiff introduces features from modern diffusion models, previously absent in counts-based domains, including continuous-time training, classifier-free guidance, and churn/remasking reverse dynamics that allow non-monotone reverse trajectories. We propose an initial instantiation of CountsDiff and validate it on natural image datasets (CIFAR-10, CelebA), exploring the effects of varying the introduced design parameters in a complex, well-studied, and interpretable data domain. We then highlight biological count assays as a natural use case, evaluating CountsDiff on single-cell RNA-seq imputation in a fetal cell and heart cell atlas. Remarkably, we find that even this simple instantiation matches or surpasses the performance of a state-of-the-art discrete generative model and leading RNA-seq imputation methods, while leaving substantial headroom for further gains through optimized design choices in future work.
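摘要提到的生存概率调度可用二项稀疏化前向过程来说明:x_t | x_0 ~ Binomial(x_0, s(t)),这是Blackout型计数扩散的标准前向形式;下面的指数调度 s(t) 与参数取值是示例性假设,并非论文的具体选择。

```python
import math
import random

def survival_schedule(t, rate=5.0):
    """Hypothetical exponential survival-probability schedule s(t):
    the probability an individual count survives from time 0 to t."""
    return math.exp(-rate * t)

def forward_sample(x0, t, rng=None, rate=5.0):
    """One forward (noising) draw of a count-valued diffusion:
    x_t | x_0 ~ Binomial(x_0, s(t)), so counts thin toward zero
    ('black out') as t grows."""
    rng = rng or random.Random(0)
    p = survival_schedule(t, rate)
    # Each of the x0 counts survives independently with probability p.
    return sum(1 for _ in range(x0) if rng.random() < p)
```

反向(去噪)模型的任务就是学习逆转这种稀疏化,从被“熄灭”的计数中恢复 x_0 的分布。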
【11】CRAFT: Video Diffusion for Bimanual Robot Data Generation
标题:CRAFT:用于双手机器人数据生成的视频扩散
链接:https://arxiv.org/abs/2604.03552
作者:Jason Chen,I-Chun Arthur Liu,Gaurav Sukhatme,Daniel Seita
摘要:基于演示学习的双手机器人从根本上受到现实世界数据的成本和狭窄视觉多样性的限制,这限制了策略在不同视点、物体配置和实施例之间的鲁棒性。我们提出了使用视频扩散Transformer的Canny引导机器人数据生成框架(CRAFT),这是一个基于视频扩散、可扩展的双手演示生成框架,在合成时间上连贯的操作视频的同时生成动作标签。通过以从模拟器生成的轨迹中提取的基于边缘的结构线索作为视频扩散的条件,CRAFT产生物理上合理的轨迹变化,并支持一个统一的增强管道,涵盖物体姿态变化、相机视点、照明和背景变化、跨实施例迁移以及多视图合成。我们利用预训练的视频扩散模型,将模拟视频连同来自模拟轨迹的动作标签转换为动作一致的演示。仅从少量真实世界演示开始,CRAFT即可生成大规模、视觉上多样化的逼真训练数据集,而无需在真实机器人上重放演示(Sim2Real)。在模拟和真实世界的双手操作任务中,CRAFT的成功率均优于现有增强策略和直接的数据扩展,表明基于扩散的视频生成可以大幅扩展演示多样性并提高双臂操作任务的泛化能力。我们的项目网站:https://craftaug.github.io/
摘要:Bimanual robot learning from demonstrations is fundamentally limited by the cost and narrow visual diversity of real-world data, which constrains policy robustness across viewpoints, object configurations, and embodiments. We present Canny-guided Robot Data Generation using Video Diffusion Transformers (CRAFT), a video diffusion-based framework for scalable bimanual demonstration generation that synthesizes temporally coherent manipulation videos while producing action labels. By conditioning video diffusion on edge-based structural cues extracted from simulator-generated trajectories, CRAFT produces physically plausible trajectory variations and supports a unified augmentation pipeline spanning object pose changes, camera viewpoints, lighting and background variations, cross-embodiment transfer, and multi-view synthesis. We leverage a pre-trained video diffusion model to convert simulated videos, along with action labels from the simulation trajectories, into action-consistent demonstrations. Starting from only a few real-world demonstrations, CRAFT generates a large, visually diverse set of photorealistic training data, bypassing the need to replay demonstrations on the real robot (Sim2Real). Across simulated and real-world bimanual tasks, CRAFT improves success rates over existing augmentation strategies and straightforward data scaling, demonstrating that diffusion-based video generation can substantially expand demonstration diversity and improve generalization for dual-arm manipulation tasks. Our project website is available at: https://craftaug.github.io/
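摘要中作为条件信号的基于边缘的结构线索,可以用一个简单的Sobel梯度幅值图来近似示意(Canny在此基础上还包含高斯平滑、非极大值抑制与滞后阈值;下面只是一个轻量替代示意,并非论文所用的Canny实现):

```python
import numpy as np

def edge_map(frame):
    """Sobel gradient-magnitude edge map over a 2-D grayscale array,
    used here as a lightweight stand-in for Canny edges."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = frame.shape
    out = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = frame[i - 1:i + 2, j - 1:j + 2]
            gx = float((patch * kx).sum())  # response to horizontal change
            gy = float((patch * ky).sum())  # response to vertical change
            out[i, j] = (gx * gx + gy * gy) ** 0.5
    return out
```

这类边缘图保留了轨迹与物体的几何结构而丢弃外观细节,因而适合作为扩散模型的结构条件。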
【12】Adversarial Robustness of Deep State Space Models for Forecasting
标题:用于预测的深状态空间模型的对抗鲁棒性
链接:https://arxiv.org/abs/2604.03427
作者:Sribalaji C. Anand,George J. Pappas
备注:8 pages, 5 figures, conference submission
摘要:用于时间序列预测的状态空间模型(SSM)在基准数据集上表现出很强的经验性能,但其在对抗性扰动下的鲁棒性却知之甚少。我们通过控制理论的视角来弥补这一空白,重点关注最近提出的Spacetime SSM预测器。我们首先证明,当底层数据生成过程是自回归过程时,仅解码器的Spacetime架构可以表示最优卡尔曼预测器,这是其他SSM所不具备的属性。在此基础上,我们将鲁棒预测器设计表述为针对受检测预算约束的最坏情况隐形对手的Stackelberg博弈,并通过对抗训练求解。我们推导出对抗性预测误差的封闭形式界,揭示了开环不稳定性、闭环不稳定性和解码器状态维度如何各自放大脆弱性,为鲁棒预测器设计提供了可操作的原则。最后,我们表明,即使对手无法访问预测器,仍然可以通过利用模型的局部线性输入输出行为来构建有效的攻击,完全绕过梯度计算。在Monash基准数据集上的实验表明,无需任何梯度计算的无模型攻击,可以比小步长的投影梯度下降造成至少多33%的误差。
摘要:State-space model (SSM) for time-series forecasting have demonstrated strong empirical performance on benchmark datasets, yet their robustness under adversarial perturbations is poorly understood. We address this gap through a control-theoretic lens, focusing on the recently proposed Spacetime SSM forecaster. We first establish that the decoder-only Spacetime architecture can represent the optimal Kalman predictor when the underlying data-generating process is autoregressive - a property no other SSM possesses. Building on this, we formulate robust forecaster design as a Stackelberg game against worst-case stealthy adversaries constrained by a detection budget, and solve it via adversarial training. We derive closed-form bounds on adversarial forecasting error that expose how open-loop instability, closed-loop instability, and decoder state dimension each amplify vulnerability - offering actionable principles towards robust forecaster design. Finally, we show that even adversaries with no access to the forecaster can nonetheless construct effective attacks by exploiting the model's locally linear input-output behavior, bypassing gradient computations entirely. Experiments on the Monash benchmark datasets highlight that model-free attacks, without any gradient computation, can cause at least 33% more error than projected gradient descent with a small step size.
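摘要末尾描述的无模型攻击思路(利用局部线性输入输出行为、绕过梯度计算)可以粗略示意如下:用有限差分估计局部雅可比矩阵,再沿其最大右奇异向量方向扰动输入。这只是基于摘要的一个草图,并非论文的精确算法。

```python
import numpy as np

def model_free_attack(forecaster, x, eps=0.1, delta=1e-3):
    """Gradient-free worst-case perturbation for a black-box forecaster.

    Assumes the forecaster is locally linear around x: estimate the
    Jacobian by finite differences, then perturb along the top right
    singular vector, which maximizes output error for a fixed
    input-perturbation norm eps.
    """
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(forecaster(x), dtype=float)
    J = np.empty((y0.size, x.size))
    for i in range(x.size):
        xp = x.copy()
        xp[i] += delta
        J[:, i] = (np.asarray(forecaster(xp)) - y0) / delta
    # Top right singular vector = worst-case input direction.
    _, _, vt = np.linalg.svd(J)
    return x + eps * vt[0]
```

对线性预测器 y = A x,该方向恰好是 A 的最大奇异向量,扰动被以最大奇异值放大。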
半/弱/无/有监督|不确定性|主动学习(7篇)
【1】SLSREC: Self-Supervised Contrastive Learning for Adaptive Fusion of Long- and Short-Term User Interests
标题:SLSREC:用于长期和短期用户兴趣自适应融合的自监督对比学习
链接:https://arxiv.org/abs/2604.04530
作者:Wei Zhou,Yue Shen,Junkai Ji,Yinglan Feng,Xing Tang,Xiuqiang He,Liang Feng,Zexuan Zhu
摘要:用户兴趣通常包括长期偏好和短期意图,反映了不同时间段用户行为的动态性质。用户交互的不均匀时间分布突出了不断变化的兴趣模式,这使得使用全面的历史行为准确捕捉兴趣变化具有挑战性。为了解决这个问题,我们提出了SLSRec,这是一种新的基于会话的模型,融合了长期和短期建议,通过分割历史行为来有效地捕获用户兴趣的时间动态。与传统模型将长期和短期用户兴趣组合成单一表示,从而影响推荐准确性不同,SLSRec利用自监督学习框架来区分这两种类型的兴趣。引入对比学习策略以确保长期和短期兴趣表示的准确校准。此外,一个基于注意力的融合网络被设计成自适应地聚合兴趣表示,优化它们的集成,以提高推荐性能。在三个公共基准数据集上进行的大量实验表明,SLSRec始终优于最先进的模型,同时在各种场景下表现出卓越的鲁棒性。我们将在接受后发布所有源代码。
摘要:User interests typically encompass both long-term preferences and short-term intentions, reflecting the dynamic nature of user behaviors across different timeframes. The uneven temporal distribution of user interactions highlights the evolving patterns of interests, making it challenging to accurately capture shifts in interests using comprehensive historical behaviors. To address this, we propose SLSRec, a novel Session-based model with the fusion of Long- and Short-term Recommendations that effectively captures the temporal dynamics of user interests by segmenting historical behaviors over time. Unlike conventional models that combine long- and short-term user interests into a single representation, compromising recommendation accuracy, SLSRec utilizes a self-supervised learning framework to disentangle these two types of interests. A contrastive learning strategy is introduced to ensure accurate calibration of long- and short-term interest representations. Additionally, an attention-based fusion network is designed to adaptively aggregate interest representations, optimizing their integration to enhance recommendation performance. Extensive experiments on three public benchmark datasets demonstrate that SLSRec consistently outperforms state-of-the-art models while exhibiting superior robustness across various scenarios. We will release all source code upon acceptance.
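摘要中对齐长短期兴趣表示的对比学习策略,可用一个通用的对称InfoNCE损失来示意(这只是常见形式的草图,并非SLSRec的具体目标函数):同一用户的长期与短期嵌入互为正样本,批内其他用户为负样本。

```python
import numpy as np

def info_nce(long_emb, short_emb, tau=0.1):
    """Symmetric InfoNCE-style contrastive loss over a batch.

    Row i of long_emb and short_emb are the long- and short-term
    interest embeddings of the same user (a positive pair); all other
    rows in the batch serve as negatives. tau is the temperature.
    """
    L = long_emb / np.linalg.norm(long_emb, axis=1, keepdims=True)
    S = short_emb / np.linalg.norm(short_emb, axis=1, keepdims=True)
    logits = L @ S.T / tau  # cosine similarities / temperature
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_l2s = -np.mean(np.diag(log_probs))  # long -> short direction
    log_probs_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_s2l = -np.mean(np.diag(log_probs_t))  # short -> long direction
    return 0.5 * (loss_l2s + loss_s2l)
```

当同一用户的两种嵌入对齐良好时损失较低;打乱配对则损失上升。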
【2】Uncertainty-Aware Foundation Models for Clinical Data
标题:临床数据的不确定性意识基础模型
链接:https://arxiv.org/abs/2604.04175
作者:Qian Zhou,Yuanyun Zhang,Shi Li
摘要:医疗保健基础模型在很大程度上遵循了自然语言处理和计算机视觉的范式,强调大规模预训练和对异构临床数据的确定性表示。然而,临床观察结果本质上是不完整的,反映了对潜在生理状态的稀疏、不规则和依赖于模态的测量。在这项工作中,我们提出了一个不确定性感知的基础建模框架,将每个病人表示为合理潜在状态上的一个分布,而不是一个点嵌入。通过学习集值表示并在同一病人的部分视图之间强制一致性,该模型捕获了可不变推断的内容,同时显式编码认知不确定性。我们将此表述与多模态编码器和可扩展的自监督目标相结合,融合了重建、对比对齐和分布正则化。在不同的临床任务中,相对于强基线,我们的方法提高了预测性能、缺失数据下的鲁棒性以及不确定性校准。这些结果表明,对未观察到的内容(而不仅仅是已观察到的内容)进行建模,构成了医疗保健基础模型的一个关键归纳偏差。
摘要:Healthcare foundation models have largely followed paradigms from natural language processing and computer vision, emphasizing large scale pretraining and deterministic representations over heterogeneous clinical data. However, clinical observations are inherently incomplete, reflecting sparse, irregular, and modality dependent measurements of an underlying physiologic state. In this work, we propose a framework for uncertainty aware foundation modeling that represents each patient not as a point embedding, but as a distribution over plausible latent states. By learning set valued representations and enforcing consistency across partial views of the same patient, the model captures what is invariantly inferable while explicitly encoding epistemic uncertainty. We integrate this formulation with multimodal encoders and scalable self supervised objectives, combining reconstruction, contrastive alignment, and distributional regularization. Across diverse clinical tasks, our approach improves predictive performance, robustness under missing data, and uncertainty calibration relative to strong baselines. These results suggest that modeling what is not observed rather than only what is constitutes a critical inductive bias for healthcare foundation models.
【3】Uncertainty-Aware Test-Time Adaptation for Cross-Region Spatio-Temporal Fusion of Land Surface Temperature
标题:不确定性感知测试时间自适应跨区域陆地表面温度时空融合
链接:https://arxiv.org/abs/2604.04153
作者:Sofiane Bouaziz,Adel Hafiane,Raphael Canals,Rachid Nedjai
备注:Accepted to IGARSS 2026
摘要:深度学习模型在各种遥感应用中显示出巨大的潜力。然而,由于领域偏移,它们通常难以泛化到训练期间未见过的地理区域。当训练区域和新目标区域之间的数据分布因土地覆盖、气候和环境条件的变化而不同时,就会发生领域偏移。测试时自适应(TTA)已成为应对此类偏移的一种解决方案,但现有方法主要为分类任务设计,不能直接应用于回归任务。在这项工作中,我们解决了用于地表温度估计的时空融合(STF)这一回归任务。我们提出了一个不确定性感知的TTA框架,它仅更新预训练STF模型的融合模块,由认知不确定性、土地利用与土地覆盖一致性以及偏差校正引导,无需源数据或有标签的目标样本。在四个气候各异的目标地区(意大利罗马、埃及开罗、西班牙马德里和法国蒙彼利埃)进行的实验表明,在法国奥尔良预训练的模型在RMSE和MAE上均获得一致改善,平均增益分别为24.2%和27.9%,即使仅使用有限的无标签目标数据和仅10个TTA轮次。
摘要:Deep learning models have shown great promise in diverse remote sensing applications. However, they often struggle to generalize across geographic regions unseen during training due to domain shifts. Domain shifts occur when data distributions differ between the training region and new target regions, due to variations in land cover, climate, and environmental conditions. Test-time adaptation (TTA) has emerged as a solution to such shifts, but existing methods are primarily designed for classification and are not directly applicable to regression tasks. In this work, we address the regression task of spatio-temporal fusion (STF) for land surface temperature estimation. We propose an uncertainty-aware TTA framework that updates only the fusion module of a pre-trained STF model, guided by epistemic uncertainty, land use and land cover consistency, and bias correction, without requiring source data or labeled target samples. Experiments on four target regions with diverse climates, namely Rome in Italy, Cairo in Egypt, Madrid in Spain, and Montpellier in France, show consistent improvements in RMSE and MAE for a pre-trained model in Orléans, France. The average gains are 24.2% and 27.9%, respectively, even with limited unlabeled target data and only 10 TTA epochs.
【4】Extended Hybrid Timed Petri Nets with Semi-Supervised Anomaly Detection for Switched Systems, Modelling and Fault Detection
标题:具有半监督异常检测的扩展混合时间Petri网用于交换系统、建模和故障检测
链接:https://arxiv.org/abs/2604.04051
作者:Fatiha Hamdi,Abdelhafid Zeroual,Fouzi Harrou
摘要:混合物理系统结合了连续和离散动态,二者可能同时受到故障影响。传统的故障检测方法通常分别处理这些动态,限制了其捕获相互作用故障模式的能力。本文通过将扩展时间连续Petri网(ETCPN)模型与半监督异常检测相结合,为混合动力系统提出了一个统一的故障检测框架。所提出的ETCPN通过引入与标记相关的流函数来扩展现有的Petri网形式体系,实现了离散与连续动态之间的内在耦合。在此结构基础上,设计了一种依赖模态的混合观测器,其在任意切换下的稳定性通过线性矩阵不等式(LMI)保证,观测器增益通过离线求解LMI确定。观测器生成反映估计输出与测量输出之间差异的残差。这些残差使用半监督方法处理,包括单类SVM(OC-SVM)、支持向量数据描述(SVDD)和椭圆包络(EE),它们仅在正常数据上训练,以避免依赖有标签的故障。该框架通过涉及离散故障、连续故障和混合故障的仿真进行了验证。结果表明其检测精度高、收敛快、性能稳健,其中OC-SVM和SVDD在检测率与虚警之间提供了最佳权衡。由于主要复杂性局限于离线LMI设计阶段,该框架在实时部署方面具有很高的计算效率。
摘要:Hybrid physical systems combine continuous and discrete dynamics, which can be simultaneously affected by faults. Conventional fault detection methods often treat these dynamics separately, limiting their ability to capture interacting fault patterns. This paper proposes a unified fault detection framework for hybrid dynamical systems by integrating an Extended Timed Continuous Petri Net (ETCPN) model with semi-supervised anomaly detection. The proposed ETCPN extends existing Petri net formalisms by introducing marking-dependent flow functions, enabling intrinsic coupling between discrete and continuous dynamics. Based on this structure, a mode-dependent hybrid observer is designed, whose stability under arbitrary switching is ensured via Linear Matrix Inequalities (LMIs), solved offline to determine observer gains. The observer generates residuals that reflect discrepancies between the estimated and measured outputs. These residuals are processed using semi-supervised methods, including One-Class SVM (OC-SVM), Support Vector Data Description (SVDD), and Elliptic Envelope (EE), trained exclusively on normal data to avoid reliance on labeled faults. The framework is validated through simulations involving discrete faults, continuous faults, and hybrid faults. Results demonstrate high detection accuracy, fast convergence, and robust performance, with OC-SVM and SVDD providing the best trade-off between detection rate and false alarms. The framework is computationally efficient for real-time deployment, as the main complexity is confined to the offline LMI design phase.
【5】Supervised Dimensionality Reduction Revisited: Why LDA on Frozen CNN Features Deserves a Second Look
标题:再谈有监督降维:为什么冻结CNN特征上的LDA值得重新审视
链接:https://arxiv.org/abs/2604.03928
作者:Indar Kumar,Girish Karhana,Sai Krishna Jasti,Ankit Hemant Lade
备注:9 pages, 4 figures, 6 tables. Code available at https://github.com/IndarKarhana/lda-image-classification
摘要:有效的网约车调度需要预测需求模式,这些模式在一天中的不同时段、一周中的不同日子、季节和特殊事件中差异很大。我们提出了一种按需求状态校准的方法:(i)将历史行程数据分段为需求状态;(ii)通过六度量相似性集成(Kolmogorov-Smirnov、Wasserstein-1、特征距离、方差比、事件模式、时间接近度)将当前运营时段与最相似的历史类比相匹配;(iii)使用由此得到的校准需求先验,同时驱动基于LP的车队重新定位策略和采用匈牙利匹配的批量调度。在消融实验中,仅含分布度量的子集在平均等待时间上最强,而完整集成作为面向鲁棒性的默认配置被保留。在8个不同场景(冬季/夏季、工作日/周末/假期、早晨/傍晚/夜间)、每个场景5个随机种子下对520万次纽约市TLC行程进行评估,我们的方法将平均乘客等待时间减少了31.1%(自举95% CI:[26.5, 36.6]%;各场景Friedman卡方 = 80.0,p = 4.25e-18;Cohen's d = 7.5-29.9)。改善延伸到尾部:P95等待时间下降37.6%,等待时间的基尼系数从0.441改善到0.409(相对7.3%)。这两个贡献以乘法方式组合并分别得到独立验证:校准带来16.9%的降低;LP重新定位进一步带来15.5%。该方法无需训练、具有确定性和可解释性,可泛化到芝加哥(使用基于纽约市构建的需求状态库实现23.3%的等待时间降低),并在不同车队规模下保持鲁棒(0.5-2倍车队规模下改善32-47%)。我们提供了全面的消融研究、正式的统计检验以及基于OSRM的路由保真度验证。
摘要:Effective ride-hailing dispatch requires anticipating demand patterns that vary substantially across time-of-day, day-of-week, season, and special events. We propose a regime-calibrated approach that (i) segments historical trip data into demand regimes, (ii) matches the current operating period to the most similar historical analogues via a six-metric similarity ensemble (Kolmogorov-Smirnov, Wasserstein-1, feature distance, variance ratio, event pattern, temporal proximity), and (iii) uses the resulting calibrated demand prior to drive both an LP-based fleet repositioning policy and batch dispatch with Hungarian matching. In ablation, a distributional-only subset is strongest on mean wait, while the full ensemble is retained as a robustness-oriented default. Evaluated on 5.2 million NYC TLC trips across 8 diverse scenarios (winter/summer, weekday/weekend/holiday, morning/evening/night) with 5 random seeds each, our method reduces mean rider wait times by 31.1% (bootstrap 95% CI: [26.5, 36.6]%; Friedman chi-sq = 80.0, p = 4.25e-18; Cohen's d = 7.5-29.9 across scenarios). The improvement extends to the tail: P95 wait drops 37.6% and the Gini coefficient of wait times improves from 0.441 to 0.409 (7.3% relative). The two contributions compose multiplicatively and are independently validated: calibration provides 16.9% reduction; LP repositioning adds a further 15.5%. The approach requires no training, is deterministic and explainable, generalizes to Chicago (23.3% wait reduction via NYC-built regime library), and is robust across fleet sizes (32-47% improvement for 0.5-2x fleet scaling). We provide comprehensive ablation studies, formal statistical tests, and routing-fidelity validation with OSRM.
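摘要中的六度量相似性集成里,两个分布度量(Kolmogorov-Smirnov与Wasserstein-1)可用纯Python示意如下(一维、等样本量的特例;非论文官方实现):

```python
def ks_distance(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the two empirical CDFs, evaluated at all sample points."""
    xs, ys = sorted(xs), sorted(ys)
    points = xs + ys

    def ecdf(sample, v):
        return sum(1 for s in sample if s <= v) / len(sample)

    return max(abs(ecdf(xs, v) - ecdf(ys, v)) for v in points)

def wasserstein1(xs, ys):
    """Wasserstein-1 distance for equal-size 1-D samples: the mean
    absolute difference between sorted values (in 1-D the optimal
    transport plan matches order statistics)."""
    xs, ys = sorted(xs), sorted(ys)
    assert len(xs) == len(ys), "sketch assumes equal sample sizes"
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)
```

在状态匹配中,这类距离越小,两个时段的需求分布越相似。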
【6】Spatiotemporal Interpolation of GEDI Biomass with Calibrated Uncertainty
标题:具有校准不确定性的GEDI生物量时空内插
链接:https://arxiv.org/abs/2604.03874
作者:Robin Young,Srinivasan Keshav
摘要:监测森林砍伐驱动的碳排放需要对地上生物量密度(AGBD)进行空间上明确和时间上连续的估计,并具有校准的不确定性。美国宇航局的全球生态系统动力学调查(GEDI)提供了可靠的激光雷达衍生AGBD,但其轨道采样导致时空覆盖不规则,偶尔的操作中断,包括从2023年3月到2024年4月的13个月冬眠,在观测记录中留下了很长的空白。先前的工作已经使用机器学习方法来填补GEDI的空间差距,使用卫星衍生的功能,但通过未观察到的时期,特别是在活跃的干扰事件,生物量的时间内插,仍然在很大程度上没有得到解决。此外,生物量绘图的标准集成方法已被证明会产生系统性的错误校准的预测区间。为了解决这些差距,我们扩展了注意神经过程(ANP)框架,以前应用于空间生物量插值,联合稀疏时空设置使用地理空间基础模型嵌入。我们对称地对待空间和时间,经验验证了一种形式的空间换时间的替代,在这种替代中,在其他时间从附近的位置观察到的信息在保持期间的预测。我们的研究结果表明,ANP产生良好校准的不确定性估计在整个干扰制度,支持其使用在测量,报告和验证(MRV)的应用程序,需要可靠的不确定性量化森林碳核算。
摘要:Monitoring deforestation-driven carbon emissions requires both spatially explicit and temporally continuous estimates of aboveground biomass density (AGBD) with calibrated uncertainty. NASA's Global Ecosystem Dynamics Investigation (GEDI) provides reliable LIDAR-derived AGBD, but its orbital sampling causes irregular spatiotemporal coverage, and occasional operational interruptions, including a 13-month hibernation from March 2023 to April 2024, leave extended gaps in the observational record. Prior work has used machine learning approaches to fill GEDI's spatial gaps using satellite-derived features, but temporal interpolation of biomass through unobserved periods, particularly across active disturbance events, remains largely unaddressed. Moreover, standard ensemble methods for biomass mapping have been shown to produce systematically miscalibrated prediction intervals. To address these gaps, we extend the Attentive Neural Process (ANP) framework, previously applied to spatial biomass interpolation, to jointly sparse spatiotemporal settings using geospatial foundation model embeddings. We treat space and time symmetrically, empirically validating a form of space-for-time substitution in which observations from nearby locations at other times inform predictions at held-out periods. Our results demonstrate that the ANP produces well-calibrated uncertainty estimates across disturbance regimes, supporting its use in Measurement, Reporting, and Verification (MRV) applications that require reliable uncertainty quantification for forest carbon accounting.
【7】CoLoRSMamba: Conditional LoRA-Steered Mamba for Supervised Multimodal Violence Detection
标题:CoLoRSMamba:有条件LoRA引导的曼巴,用于监督多模式暴力检测
链接:https://arxiv.org/abs/2604.03329
作者:Damith Chamalke Senadeera,Dimitrios Kollias,Gregory Slabaugh
摘要:暴力检测可受益于音频,但现实世界的音景可能嘈杂,或与可见场景关联较弱。我们提出了CoLoRSMamba,一种定向的视频到音频多模态架构,通过CLS引导的条件LoRA耦合VideoMamba和AudioMamba。在每一层,VideoMamba的CLS令牌产生一个逐通道调制向量和一个稳定门,用于调整AudioMamba中负责选择性状态空间参数(Delta、B、C,包括步长路径)的投影,从而在没有令牌级交叉注意的情况下产生场景感知的音频动态。训练将二分类与对称的AV-InfoNCE目标相结合,该目标对齐剪辑级音频和视频嵌入。为了支持公平的多模态评估,我们基于时间注释为NTU-CCTV和DVD数据集筛选出经音频过滤的剪辑级子集,仅保留具有可用音频的剪辑。在这些子集上,CoLoRSMamba优于代表性的纯音频、纯视频和多模态基线,在NTU-CCTV上达到88.63%准确率/86.24% F1-V,在DVD上达到75.77%准确率/72.94% F1-V。它还提供了有利的准确率-效率权衡,以更少的参数和FLOPs超过了几个更大的模型。
摘要:Violence detection benefits from audio, but real-world soundscapes can be noisy or weakly related to the visible scene. We present CoLoRSMamba, a directional Video to Audio multimodal architecture that couples VideoMamba and AudioMamba through CLS-guided conditional LoRA. At each layer, the VideoMamba CLS token produces a channel-wise modulation vector and a stabilization gate that adapt the AudioMamba projections responsible for the selective state-space parameters (Delta, B, C), including the step-size pathway, yielding scene-aware audio dynamics without token-level cross-attention. Training combines binary classification with a symmetric AV-InfoNCE objective that aligns clip-level audio and video embeddings. To support fair multimodal evaluation, we curate audio-filtered clip level subsets of the NTU-CCTV and DVD datasets from temporal annotations, retaining only clips with available audio. On these subsets, CoLoRSMamba outperforms representative audio-only, video-only, and multimodal baselines, achieving 88.63% accuracy / 86.24% F1-V on NTU-CCTV and 75.77% accuracy / 72.94% F1-V on DVD. It further offers a favorable accuracy-efficiency tradeoff, surpassing several larger models with fewer parameters and FLOPs.
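摘要描述的CLS引导条件LoRA可以粗略示意为:冻结投影W加上一个低秩更新B·A,其输出由来自CLS令牌的逐通道调制向量和标量稳定门缩放。下面的形状、函数名与门控形式均为示例性假设,并非论文的精确设计。

```python
import numpy as np

def conditional_lora(x, cls, W, A, B, W_mod, gate_scale=1.0):
    """Illustrative CLS-conditioned LoRA projection.

    The frozen weight W is augmented with a low-rank update B @ A whose
    output is modulated channel-wise by a vector derived from the video
    CLS token, scaled by a scalar stabilization gate. With gate_scale=0
    the layer reduces to the frozen projection W @ x.
    """
    mod = np.tanh(W_mod @ cls)  # channel-wise modulation from CLS
    gate = gate_scale / (1 + np.exp(-cls.mean()))  # scalar stabilization gate
    return W @ x + gate * mod * (B @ (A @ x))
```

这种设计让视频侧信息以逐通道缩放的方式调节音频投影,而无需令牌级交叉注意。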
迁移|Zero/Few/One-Shot|自适应(10篇)
【1】Data Attribution in Adaptive Learning
标题:适应性学习中的数据归因
链接:https://arxiv.org/abs/2604.04892
作者:Amit Kiran Rege
备注:Work in progress
摘要:机器学习模型越来越多地生成自己的训练数据:在线多臂老虎机、强化学习和语言模型的后训练管道都是典型的例子。在这些自适应设置中,单个训练观测既更新了学习器,又改变了学习器将收集的未来数据的分布。为静态数据集设计的标准归因方法忽略了这种反馈。我们通过一个条件干预目标,形式化了有限视野自适应学习中发生级别的归因,证明仅凭重放侧的信息在一般情况下无法恢复该目标,并确定了一个可以从记录数据中识别该目标的结构类。
摘要:Machine learning models increasingly generate their own training data -- online bandits, reinforcement learning, and post-training pipelines for language models are leading examples. In these adaptive settings, a single training observation both updates the learner and shifts the distribution of future data the learner will collect. Standard attribution methods, designed for static datasets, ignore this feedback. We formalize occurrence-level attribution for finite-horizon adaptive learning via a conditional interventional target, prove that replay-side information cannot recover it in general, and identify a structural class in which the target is identified from logged data.
【2】ZeD-MAP: Bundle Adjustment Guided Zero-Shot Depth Maps for Real-Time Aerial Imaging
标题:ZeD-MAP:用于实时航空成像的光束法平差引导零样本深度图
链接:https://arxiv.org/abs/2604.04667
作者:Selim Ahmet Iz,Francesco Nex,Norman Kerle,Henry Meissner,Ralf Berger
摘要:从超高分辨率无人机图像中进行实时深度重建对于灾难响应等时间关键的地理空间任务至关重要,但由于宽基线视差、大图像尺寸、低纹理或镜面表面、遮挡和严格的计算约束,这仍然具有挑战性。最近的零样本扩散模型无需针对特定任务重新训练即可对每张图像进行快速稠密预测,并且比基于Transformer的预测器需要更少的标注数据集,同时避免了经典多视图立体对刚性采集几何的要求。然而,它们的概率推断使其无法在连续帧和重叠图块之间保持可靠的度量精度和时间一致性。我们提出了ZeD-MAP,一个集群级框架,通过集成增量式基于集群的光束法平差(BA),将测试时扩散深度模型转换为度量一致、类似SLAM的建图管道。流式传输的无人机帧被分组为重叠的集群;周期性BA产生度量一致的位姿和稀疏的3D连接点,这些连接点被重新投影到选定的帧中,并用作基于扩散的深度估计的度量引导。在使用DLR模块化航空相机系统(MACS)于约50米高度采集的地面标志飞行数据(GSD约为0.85 cm/px,对应每帧2,650平方米的地面覆盖)上的验证表明,我们的方法实现了亚米级精度,水平(XY)平面误差约为0.87米,垂直(Z)方向误差约为0.12米,同时每张图像的运行时间保持在1.47到4.91秒之间。结果受到手动点云标注带来的轻微噪声影响。这些发现表明,基于BA的度量引导提供了与经典摄影测量方法相当的一致性,同时显著加速了处理,实现了实时3D地图生成。
摘要:Real-time depth reconstruction from ultra-high-resolution UAV imagery is essential for time-critical geospatial tasks such as disaster response, yet remains challenging due to wide-baseline parallax, large image sizes, low-texture or specular surfaces, occlusions, and strict computational constraints. Recent zero-shot diffusion models offer fast per-image dense predictions without task-specific retraining, and require fewer labelled datasets than transformer-based predictors while avoiding the rigid capture geometry requirement of classical multi-view stereo. However, their probabilistic inference prevents reliable metric accuracy and temporal consistency across sequential frames and overlapping tiles. We present ZeD-MAP, a cluster-level framework that converts a test-time diffusion depth model into a metrically consistent, SLAM-like mapping pipeline by integrating incremental cluster-based bundle adjustment (BA). Streamed UAV frames are grouped into overlapping clusters; periodic BA produces metrically consistent poses and sparse 3D tie-points, which are reprojected into selected frames and used as metric guidance for diffusion-based depth estimation. Validation on ground-marker flights captured at approximately 50 m altitude (GSD is approximately 0.85 cm/px, corresponding to 2,650 square meters ground coverage per frame) with the DLR Modular Aerial Camera System (MACS) shows that our method achieves sub-meter accuracy, with approximately 0.87 m error in the horizontal (XY) plane and 0.12 m in the vertical (Z) direction, while maintaining per-image runtimes between 1.47 and 4.91 seconds. Results are subject to minor noise from manual point-cloud annotation. These findings show that BA-based metric guidance provides consistency comparable to classical photogrammetric methods while significantly accelerating processing, enabling real-time 3D map generation.
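摘要中“将稀疏3D连接点重新投影到选定帧中作为度量引导”的一步,可用标准针孔投影示意(这是通用的摄影测量草图,并非ZeD-MAP的具体实现):

```python
import numpy as np

def reproject(points_3d, K, R, t):
    """Project sparse 3D tie-points into a frame.

    K is the 3x3 camera intrinsics matrix; (R, t) maps world to
    camera coordinates. Returns (pixels, depths): the pixel
    coordinates and metric depths, which can anchor a relative
    (diffusion-predicted) depth map to metric scale.
    """
    cam = (R @ points_3d.T).T + t  # world -> camera frame
    depths = cam[:, 2]
    proj = (K @ cam.T).T
    pixels = proj[:, :2] / proj[:, 2:3]  # perspective division
    return pixels, depths
```

投影得到的稀疏像素-深度对可作为锚点,把扩散模型的相对深度校准到BA给出的度量尺度。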
【3】SAIL: Scene-aware Adaptive Iterative Learning for Long-Tail Trajectory Prediction in Autonomous Vehicles
标题:SAIL:用于自动驾驶汽车长尾轨迹预测的场景感知自适应迭代学习
链接:https://arxiv.org/abs/2604.04573
作者:Bin Rao,Haicheng Liao,Chengyue Wang,Keqiang Li,Zhenning Li,Hai Yang
摘要:自动驾驶汽车(AV)依靠准确的轨迹预测在多样的交通环境中安全行驶,但现有模型在长尾场景上表现不佳。这类场景罕见却对安全至关重要,其特征是突然的机动、高碰撞风险和复杂的交互。这些挑战源于数据不平衡、对长尾轨迹定义不充分,以及偏重常见行为而忽视罕见行为的次优学习策略。为此,我们提出SAIL,一个系统性解决长尾问题的新框架:首先沿三个关键属性维度(预测误差、碰撞风险和状态复杂性)定义并建模轨迹;随后将属性引导的增强与特征提取过程,与高度自适应的对比学习策略协同起来。该策略采用连续余弦动量调度、相似度加权的困难负样本挖掘,以及基于演化特征聚类的动态伪标签机制;此外,它还引入聚焦机制,以强化对每个已识别类别中困难正样本的学习。这一综合设计使SAIL能够出色地识别并预测多样且具有挑战性的长尾事件。在nuScenes和ETH/UCY数据集上的大量评估表明SAIL性能优越:与最先进的基线相比,在最难的1%长尾样本上预测误差最多降低28.8%,同时在所有场景中保持有竞争力的精度。该框架推进了现实世界混合自主环境下可靠的AV轨迹预测。
摘要:Autonomous vehicles (AVs) rely on accurate trajectory prediction for safe navigation in diverse traffic environments, yet existing models struggle with long-tail scenarios-rare but safety-critical events characterized by abrupt maneuvers, high collision risks, and complex interactions. These challenges stem from data imbalance, inadequate definitions of long-tail trajectories, and suboptimal learning strategies that prioritize common behaviors over infrequent ones. To address this, we propose SAIL, a novel framework that systematically tackles the long-tail problem by first defining and modeling trajectories across three key attribute dimensions: prediction error, collision risk, and state complexity. Our approach then synergizes an attribute-guided augmentation and feature extraction process with a highly adaptive contrastive learning strategy. This strategy employs a continuous cosine momentum schedule, similarity-weighted hard-negative mining, and a dynamic pseudo-labeling mechanism based on evolving feature clustering. Furthermore, it incorporates a focusing mechanism to intensify learning on hard-positive samples within each identified class. This comprehensive design enables SAIL to excel at identifying and forecasting diverse and challenging long-tail events. Extensive evaluations on the nuScenes and ETH/UCY datasets demonstrate SAIL's superior performance, achieving up to 28.8% reduction in prediction error on the hardest 1% of long-tail samples compared to state-of-the-art baselines, while maintaining competitive accuracy across all scenarios. This framework advances reliable AV trajectory prediction in real-world, mixed-autonomy settings.
【4】GAIN: Multiplicative Modulation for Domain Adaptation
标题:GAIN:用于域自适应的乘法调制
链接:https://arxiv.org/abs/2604.04516
作者:Hengshuai Yao,Xing Chen,Ahmed Murtadha,Guan Wang
摘要:使LLM适应新领域会导致遗忘,因为标准方法(全量微调、LoRA)会向权重空间注入新的方向。我们提出GAIN,它通过乘法调制W_new = S * W来重新强调已有特征。学习到的对角矩阵S被应用于注意力输出投影,并可选地应用于FFN。这一原理呼应了神经科学中的增益调节:神经元通过调整响应强度来适应环境,同时保持选择性。我们在来自四个家族的五个模型(774M至70B)上评估GAIN,在八个领域上依次进行适应。GAIN-FFN在域内适应上与LoRA相当,但二者对先前训练过的领域的影响相反:GAIN-FFN将其提升7-13%(验证PPL),而LoRA将其降低18-36%。下游准确率印证了这一模式:例如,在Qwen2.5上经过七次连续适应后,GAIN-FFN仅使BoolQ下降0.8%,而LoRA使其下降14.9%。GAIN为每个模型仅增加46K-230K个参数,并且可以被吸收进预训练权重,实现零推理开销。
摘要:Adapting LLMs to new domains causes forgetting because standard methods (full fine-tuning, LoRA) inject new directions into the weight space. We propose GAIN, which re-emphasizes existing features through multiplicative modulation W_new = S * W. The learned diagonal matrix S is applied to the attention output projection and optionally the FFN. The principle mirrors gain modulation in neuroscience, where neurons adapt to context by scaling response strength while preserving selectivity. We evaluate GAIN on five models from four families (774M to 70B), adapting sequentially across eight domains. GAIN-FFN matches LoRA's in-domain adaptation, but their effects on previously trained domains are opposite: GAIN-FFN improves them by 7-13% (validation PPL), while LoRA degrades them by 18-36%. Downstream accuracy confirms the pattern: for example, after seven sequential adaptations on Qwen2.5, GAIN-FFN degrades BoolQ by only 0.8% while LoRA damages it by 14.9%. GAIN adds 46K-230K parameters per model and can be absorbed into the pretrained weights for zero inference cost.
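摘要中的乘法调制W_new = S * W(S为学习到的对角矩阵)可以用如下草图示意。这只是对该思想的一个最小示例,并非论文的实际实现;对角增益按输出维度缩放权重矩阵的行,因此适应完成后可直接折叠进预训练权重,实现零推理开销:

```python
def gain_modulate(W, s):
    """Multiplicative modulation W_new = diag(s) @ W: the learned
    per-output gain s[i] rescales row i of the frozen weight matrix."""
    return [[s_i * w for w in row] for s_i, row in zip(s, W)]

# Because S is diagonal, the modulated weights can be materialized once
# after adaptation; inference then uses W_new exactly like the original W.
W = [[1.0, 2.0], [3.0, 4.0]]
W_new = gain_modulate(W, [2.0, 0.5])
```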
【5】Jellyfish: Zero-Shot Federated Unlearning Scheme with Knowledge Disentanglement
标题:Jellyfish(水母):具有知识解耦的零样本联邦遗忘方案
链接:https://arxiv.org/abs/2604.04030
作者:Houzhe Wang,Xiaojie Zhu,Chi Chen
摘要:随着数据隐私和安全的重要性日益提升,联邦遗忘(federated unlearning)成为一个新的研究领域,致力于确保一旦特定数据被删除,联邦学习模型不再保留或泄露相关信息。在本文中,我们提出了一种零样本联邦遗忘方案,命名为Jellyfish(水母)。它在四个关键方面区别于传统的联邦遗忘框架:合成数据生成、知识解耦、损失函数设计和模型修复。为保护被遗忘数据的隐私,我们设计了一种零样本遗忘机制,生成误差最小化噪声作为被遗忘数据的代理数据。为保持模型效用,我们首先提出一种知识解耦机制,通过限制被遗忘数据激活的通道数量并鼓励激活稀疏性,来正则化最后一个卷积层的输出。接着,我们构建了一个综合损失函数,包含硬损失、混淆损失、蒸馏损失、模型权重漂移损失、梯度协调和梯度掩蔽等多个分量,以有效对齐“遗忘”与“保留”两个目标的学习轨迹。最后,我们提出一种零样本修复机制,利用代理数据在可接受的范围内恢复模型精度,而无需访问用户的本地数据。为评估所提零样本联邦遗忘方案的性能,我们在多种设置下进行了全面实验,结果验证了该方案的有效性和鲁棒性。
摘要:With the increasing importance of data privacy and security, federated unlearning emerges as a new research field dedicated to ensuring that once specific data is deleted, federated learning models no longer retain or disclose related information. In this paper, we propose a zero-shot federated unlearning scheme, named Jellyfish. It distinguishes itself from conventional federated unlearning frameworks in four key aspects: synthetic data generation, knowledge disentanglement, loss function design, and model repair. To preserve the privacy of forgotten data, we design a zero-shot unlearning mechanism that generates error-minimization noise as proxy data for the data to be forgotten. To maintain model utility, we first propose a knowledge disentanglement mechanism that regularises the output of the final convolutional layer by restricting the number of activated channels for the data to be forgotten and encouraging activation sparsity. Next, we construct a comprehensive loss function that incorporates multiple components, including hard loss, confusion loss, distillation loss, model weight drift loss, gradient harmonization, and gradient masking, to effectively align the learning trajectories of the objectives of "forgetting" and "retaining". Finally, we propose a zero-shot repair mechanism that leverages proxy data to restore model accuracy within acceptable bounds without accessing users' local data. To evaluate the performance of the proposed zero-shot federated unlearning scheme, we conducted comprehensive experiments across diverse settings. The results validate the effectiveness and robustness of the scheme.
【6】Lightweight Query Routing for Adaptive RAG: A Baseline Study on RAGRouter-Bench
标题:自适应RAG的轻量级查询路由:RAGRouter-Bench的基线研究
链接:https://arxiv.org/abs/2604.03455
作者:Prakhar Bansal,Shivangi Agarwal
备注:5 pages, 3 tables
摘要:检索增强生成(RAG)管线涵盖多种检索策略,它们在token成本和能力上差异巨大。为每个查询选择合适的策略是一个实际的效率问题,但此前还没有路由分类器在RAGRouter-Bench \citep{wang2026ragrouterbench}上训练过。RAGRouter-Bench是最近发布的基准,包含横跨四个知识领域的7,727条查询,每条查询都被标注为三种规范查询类型之一:事实型、推理型和摘要型。我们在该基准上对基于轻量级分类器的路由进行了首次系统评估:在三种特征机制(TF-IDF、MiniLM句子嵌入\citep{reimers2019sbert}和手工构造的结构特征)下评估五个经典分类器,共得到15种分类器-特征组合。我们的最佳配置(TF-IDF加SVM)取得了0.928的宏平均F1和93.2%的准确率,同时相对于始终使用最昂贵范式,模拟节省了28.1%的token。词法TF-IDF特征在宏F1上比语义句子嵌入高出3.1个点,表明表面关键词模式是查询类型复杂度的强预测因子。领域级分析显示,医疗查询最难路由,而法律查询最易处理。这些结果建立了一个可复现的查询侧基线,并凸显了语料库感知路由需要弥合的差距。
摘要:Retrieval-Augmented Generation pipelines span a wide range of retrieval strategies that differ substantially in token cost and capability. Selecting the right strategy per query is a practical efficiency problem, yet no routing classifiers have been trained on RAGRouter-Bench \citep{wang2026ragrouterbench}, a recently released benchmark of $7,727$ queries spanning four knowledge domains, each annotated with one of three canonical query types: factual, reasoning, and summarization. We present the first systematic evaluation of lightweight classifier-based routing on this benchmark. Five classical classifiers are evaluated under three feature regimes, namely, TF-IDF, MiniLM sentence embeddings \citep{reimers2019sbert}, and hand-crafted structural features, yielding 15 classifier feature combinations. Our best configuration, TF-IDF with an SVM, achieves a macro-averaged F1 of $\mathbf{0.928}$ and an accuracy of $\mathbf{93.2\%}$, while simulating $\mathbf{28.1\%}$ token savings relative to always using the most expensive paradigm. Lexical TF-IDF features outperform semantic sentence embeddings by $3.1$ macro-F1 points, suggesting that surface keyword patterns are strong predictors of query-type complexity. Domain-level analysis reveals that medical queries are hardest to route and legal queries most tractable. These results establish a reproducible query-side baseline and highlight the gap that corpus-aware routing must close.
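摘要描述的轻量级路由(TF-IDF特征加经典分类器,按查询类型选择检索策略)可以用一个纯Python的最近质心玩具示例来示意。这只是一个示意草图,并非论文所用的SVM实现;语料、标签与分词均为虚构假设:

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF vectors (as sparse dicts) for a list of token lists."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [{t: c / len(d) * idf[t] for t, c in Counter(d).items()} for d in docs], idf

def cosine(u, v):
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroids(vecs, labels):
    """Average the TF-IDF vectors of each query-type class."""
    out = {}
    for v, y in zip(vecs, labels):
        c = out.setdefault(y, Counter())
        for t, x in v.items():
            c[t] += x
    return out

def route(query_tokens, cents, idf):
    """Nearest-centroid routing; tokens unseen in training are ignored."""
    tf = Counter(t for t in query_tokens if t in idf)
    q = {t: c / len(query_tokens) * idf[t] for t, c in tf.items()}
    return max(cents, key=lambda y: cosine(q, cents[y]))

# Toy training corpus (hypothetical): label = canonical query type.
train = [(["who", "wrote", "book"], "factual"),
         (["when", "did", "war", "end"], "factual"),
         (["why", "does", "pressure", "affect", "boiling"], "reasoning"),
         (["explain", "why", "this", "happens"], "reasoning"),
         (["summarize", "the", "document"], "summarization"),
         (["give", "summary", "of", "report"], "summarization")]
vecs, idf = tfidf([d for d, _ in train])
cents = centroids(vecs, [y for _, y in train])
```

路由结果随后可映射到不同token成本的检索策略(例如事实型走廉价的单跳检索,推理型走昂贵的多步管线),这正是摘要中28.1%的token节省的来源。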
【7】Neural Operators for Multi-Task Control and Adaptation
标题:用于多任务控制和自适应的神经算子
链接:https://arxiv.org/abs/2604.03449
作者:David Sewell,Xingjian Li,Stepan Tretiakov,Krishna Kumar,David Fridovich-Keil
备注:25 pages, 10 figures, 2 tables
摘要:神经算子方法已成为学习无限维函数空间之间映射的强大工具,但其在最优控制中的潜力在很大程度上尚未被发掘。我们聚焦于多任务控制问题,其解是从任务描述(例如成本或动力学函数)到最优控制律(例如反馈策略)的映射。我们使用置换不变的神经算子架构来近似这些解算子。在一系列参数化最优控制环境和一个运动基准上,仅通过行为克隆训练的单个算子就能准确地近似解算子,并泛化到未见过的任务、分布外设置以及不同数量的任务观测。我们进一步表明,该神经算子架构的分支-主干(branch-trunk)结构能够高效且灵活地适应新任务。我们开发了从轻量级更新到全网络微调的结构化适应策略,在不同的数据和计算条件下均取得了强劲性能。最后,我们引入了元训练的算子变体,为少样本适应优化初始化。这些方法能够在数据有限的情况下快速适应任务,并始终优于一个流行的元学习基线。总之,我们的结果表明,神经算子为多任务控制与适应提供了一个统一且高效的框架。
摘要:Neural operator methods have emerged as powerful tools for learning mappings between infinite-dimensional function spaces, yet their potential in optimal control remains largely unexplored. We focus on multi-task control problems, whose solution is a mapping from task description (e.g., cost or dynamics functions) to optimal control law (e.g., feedback policy). We approximate these solution operators using a permutation-invariant neural operator architecture. Across a range of parametric optimal control environments and a locomotion benchmark, a single operator trained via behavioral cloning accurately approximates the solution operator and generalizes to unseen tasks, out-of-distribution settings, and varying amounts of task observations. We further show that the branch-trunk structure of our neural operator architecture enables efficient and flexible adaptation to new tasks. We develop structured adaptation strategies ranging from lightweight updates to full-network fine-tuning, achieving strong performance across different data and compute settings. Finally, we introduce meta-trained operator variants that optimize the initialization for few-shot adaptation. These methods enable rapid task adaptation with limited data and consistently outperform a popular meta-learning baseline. Together, our results demonstrate that neural operators provide a unified and efficient framework for multi-task control and adaptation.
【8】Zero-Shot Quantization via Weight-Space Arithmetic
标题:通过权重空间算术的零样本量化
链接:https://arxiv.org/abs/2604.03420
作者:Daniele Solombrino,Antonio Andrea Gargiulo,Adrian Robert Minut,Luca Zhou,Alessandro Zirilli,Emanuele Rodolà
摘要:我们证明了对后训练量化(PTQ)的鲁棒性是权重空间中的一个可转移方向。我们称这个方向为量化向量:通过简单的权重空间算术从供体任务中提取后,它可以用来修补接收方模型,将对PTQ引入噪声的鲁棒性提高多达60%,而无需在接收方进行量化感知训练(QAT)。由于该方法不需要接收方的训练数据,它为极低比特部署提供了QAT的零样本、低成本替代方案。我们在Vision Transformer(ViT)模型上演示了这一点。更广泛地说,我们的结果表明,量化鲁棒性不仅仅是特定任务训练的副产品,而是权重空间几何的一种可复用特征,可以被转移而非重新训练。
摘要:We show that robustness to post-training quantization (PTQ) is a transferable direction in weight space. We call this direction the quantization vector: extracted from a donor task by simple weight-space arithmetic, it can be used to patch a receiver model and improve robustness to PTQ-induced noise by as much as 60%, without receiver-side quantization-aware training (QAT). Because the method requires no receiver training data, it provides a zero-shot, low-cost alternative to QAT for extremely low-bit deployment. We demonstrate this on Vision Transformer (ViT) models. More broadly, our results suggest that quantization robustness is not merely a byproduct of task-specific training, but a reusable feature of weight-space geometry that can be transferred rather than retrained.
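摘要所说的“权重空间算术”与任务向量(task arithmetic)思路相近:在供体模型上提取量化鲁棒方向,再加到接收方权重上。下面是按该思想构造的最小示意(并非论文的实际流程,缩放系数lam等均为示例假设):

```python
def quantization_vector(w_donor_robust, w_donor_base):
    """The transferable direction: how a robustness-inducing procedure
    (e.g. QAT) moved the donor's weights away from its base weights."""
    return [a - b for a, b in zip(w_donor_robust, w_donor_base)]

def patch(w_receiver, qvec, lam=1.0):
    """Apply the donor-extracted direction to a receiver model's weights,
    with no receiver-side training data involved."""
    return [w + lam * d for w, d in zip(w_receiver, qvec)]
```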
【9】Adaptive Threshold-Driven Continuous Greedy Method for Scalable Submodular Optimization
标题:可扩展子模块优化的自适应阈值驱动连续贪婪方法
链接:https://arxiv.org/abs/2604.03419
作者:Mohammadreza Rostami,Solmaz S. Kia
摘要:Submodular maximization under matroid constraints is a fundamental problem in combinatorial optimization with applications in sensing, data summarization, active learning, and resource allocation. While the Sequential Greedy (SG) algorithm achieves only a $\frac{1}{2}$-approximation due to irrevocable selections, Continuous Greedy (CG) attains the optimal $\bigl(1-\frac{1}{e}\bigr)$-approximation via the multilinear relaxation, at the cost of a progressively dense decision vector that forces agents to exchange feature embeddings for nearly every ground-set element. We propose \textit{ATCG} (\underline{A}daptive \underline{T}hresholded \underline{C}ontinuous \underline{G}reedy), which gates gradient evaluations behind a per-partition progress ratio $η_i$, expanding each agent's active set only when current candidates fail to capture sufficient marginal gain, thereby directly bounding which feature embeddings are ever transmitted. Theoretical analysis establishes a curvature-aware approximation guarantee with effective factor $τ_{\mathrm{eff}}=\max\{τ,1-c\}$, interpolating between the threshold-based guarantee and the low-curvature regime where \textit{ATCG} recovers the performance of CG. Experiments on a class-balanced prototype selection problem over a subset of the CIFAR-10 animal dataset show that \textit{ATCG} achieves objective values comparable to those of the full CG method while substantially reducing communication overhead through adaptive active-set expansion.
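ATCG的核心机制(仅在活跃候选集上评估边际增益,当增益相对上一次被接受的增益衰减到阈值比例以下时才扩展活跃集)可以在一个简单的最大覆盖问题上示意。以下为示意草图,省略了matroid约束、分区结构与多智能体通信,参数均为示例假设:

```python
def thresholded_greedy(sets, k, eta=0.5):
    """Greedy max-coverage that evaluates marginal gains only over a
    small 'active set', expanding it when the best active gain drops
    below eta times the last accepted gain (the progress-ratio gate)."""
    selected, covered = [], set()
    active = list(range(min(2, len(sets))))       # small initial active set
    rest = list(range(len(active), len(sets)))    # elements not yet activated
    last_gain = 0.0
    while len(selected) < k:
        gains = {i: len(sets[i] - covered) for i in active if i not in selected}
        best = max(gains, key=gains.get) if gains else None
        stalled = best is None or gains[best] < eta * last_gain
        if stalled and rest:                      # expand instead of picking
            active.append(rest.pop(0))
            continue
        if best is None or gains[best] == 0:
            break
        selected.append(best)
        covered |= sets[best]
        last_gain = gains[best]
    return selected, len(covered)
```

门控直接限制了哪些元素的(在论文场景中代价高昂的)特征嵌入需要被求值和传输,这正是摘要所述通信开销下降的机制。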
【10】Physics-Constrained Adaptive Flow Matching for Climate Downscaling
标题:用于气候降尺度的物理约束自适应流匹配
链接:https://arxiv.org/abs/2604.03459
作者:Kevin Debeire,Aytaç Paçal,Pierre Gentine,Luis Medrano-Navarro,Nils Thuerey,Veronika Eyring
备注:submitted
摘要:公里尺度的区域气候信息对于评估气候变化的影响至关重要,但用全球气候模型生成这类信息的计算成本过高。机器学习模型提供了一种快速的替代方案,但它们往往违反基本物理定律,并且在应用于训练分布之外的气候时性能退化。我们提出物理约束自适应流匹配(PC-AFM),一种同时解决这两个问题的生成式降尺度模型。我们以Fotiadis等人(2025)的自适应流匹配(AFM)模型为基线,加入软守恒约束,使降尺度输出在降水和湿度上与大尺度输入保持一致,并通过ConFIG算法进行梯度手术,以防止这些约束干扰生成目标。我们在中欧气候数据上训练该模型,在六个变量(近地表温度、降水、比湿、地表气压和水平风分量)上评估10倍降尺度任务(63公里至6.3公里),评估指标涵盖偏差、集合技能得分、功率谱和守恒误差等,并在两个保留的气候区域上测试泛化能力。在训练分布内,PC-AFM减少了守恒误差并改进了集合校准,同时在标准技能指标上与基线持平。在训练分布外,无约束模型会因外推学到的统计特性而产生较大的系统误差,而PC-AFM将降水湿偏差减半,降低了守恒误差并提高了极端分位数的精度,且在推断时无需任何关于目标气候的信息。这些结果表明,物理一致性是在现实应用中部署生成式降尺度模型的一项实际要求。
摘要:Regional climate information at kilometer scales is essential for assessing the impacts of climate change, but generating it with global climate models is too expensive due to their high computational costs. Machine learning models offer a fast alternative, yet they often violate basic physical laws and degrade when applied to climates outside of their training distribution. We present Physics-Constrained Adaptive Flow Matching (PC-AFM), a generative downscaling model that addresses both problems. Building on the Adaptive Flow Matching (AFM) model of Fotiadis et al. (2025) as our baseline, we add soft conservation constraints that keep the downscaled output consistent with the large-scale input for precipitation and humidity, and use gradient surgery via the ConFIG algorithm to prevent these constraints from interfering with the generative objective. We train the model on Central Europe climate data, evaluate it on a 10x downscaling task (63km to 6.3km) over six variables (near-surface temperature, precipitation, specific humidity, surface pressure, and horizontal wind components) across a comprehensive set of metrics including bias, ensemble skill scores, power spectra, and conservation error, and test the generalization on two held-out climate regions. Within the training distribution, PC-AFM reduces conservation errors and improves ensemble calibration while matching the baseline on standard skill metrics. Outside the training distribution, where unconstrained models develop large systematic errors by extrapolating learned statistics, PC-AFM halves precipitation wet bias, reduces conservation error and improves extreme-quantile accuracy, all without any information about the target climate at inference time. These results indicate that physical consistency is a practical requirement for deploying generative downscaling models in real-world applications.
强化学习(9篇)
【1】Stratifying Reinforcement Learning with Signal Temporal Logic
标题:利用信号时序逻辑对强化学习进行分层
链接:https://arxiv.org/abs/2604.04923
作者:Justin Curry,Alberto Speranzon
备注:8 pages, 13 figures
摘要:本文为信号时序逻辑(STL)开发了一种基于分层(stratification)的语义,其中每个原子谓词被解释为分层空间中的成员资格测试。这一视角揭示了分层理论与STL之间一个新的对应原则,表明大多数STL公式可以被视为诱导时空的一个分层。这种解释的意义是双重的。首先,它提供了一个新的理论框架,用于分析深度强化学习(DRL)所生成嵌入空间的结构,并将其与周围决策空间的几何结构联系起来。其次,它提供了一个原则性框架,既能复用现有的高维分析工具,又能激励新计算技术的创建。为了给理论打下基础,我们(1)阐明分层理论在Minigrid游戏中的作用,(2)将数值技术应用于玩此类游戏的DRL智能体的潜在嵌入,其中STL公式的鲁棒性被用作奖励。在此过程中,我们提出了计算高效的签名,初步证据表明它们有望揭示此类嵌入空间的分层结构。
摘要:In this paper, we develop a stratification-based semantics for Signal Temporal Logic (STL) in which each atomic predicate is interpreted as a membership test in a stratified space. This perspective reveals a novel correspondence principle between stratification theory and STL, showing that most STL formulas can be viewed as inducing a stratification of space-time. The significance of this interpretation is twofold. First, it offers a fresh theoretical framework for analyzing the structure of the embedding space generated by deep reinforcement learning (DRL) and relates it to the geometry of the ambient decision space. Second, it provides a principled framework that both enables the reuse of existing high-dimensional analysis tools and motivates the creation of novel computational techniques. To ground the theory, we (1) illustrate the role of stratification theory in Minigrid games and (2) apply numerical techniques to the latent embeddings of a DRL agent playing such a game where the robustness of STL formulas is used as the reward. In the process, we propose computationally efficient signatures that, based on preliminary evidence, appear promising for uncovering the stratification structure of such embedding spaces.
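摘要中“将STL公式的鲁棒性用作奖励”依赖标准的定量语义:原子谓词mu(x) >= 0在轨迹各时刻的鲁棒性为mu(x_t),时序算子F(最终)取最大值、G(始终)取最小值。下面是该定量语义的最小示意(与论文的分层语义本身无关,仅演示鲁棒性奖励的计算):

```python
def rob_atomic(trace, mu):
    """Pointwise robustness of the atomic predicate mu(x) >= 0."""
    return [mu(x) for x in trace]

def rob_eventually(rho):
    """F (eventually): the formula is 'most satisfied' at the best time."""
    return max(rho)

def rob_globally(rho):
    """G (globally): satisfaction is limited by the worst time."""
    return min(rho)

# Reward for an RL agent: robustness of "eventually x > 1" over its trace.
trace = [0.2, 1.5, 0.8]
reward = rob_eventually(rob_atomic(trace, lambda x: x - 1.0))
```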
【2】Selecting Decision-Relevant Concepts in Reinforcement Learning
标题:强化学习中选择决策相关概念
链接:https://arxiv.org/abs/2604.04808
作者:Naveen Raman,Stephanie Milani,Fei Fang
备注:16 pages, 13 figures
摘要:训练可解释的基于概念的策略,需要从业者手动选择智能体在进行序贯决策时应当据以推理的人类可理解概念。这种选择需要领域专业知识,耗时且成本高昂,难以随候选概念数量扩展,并且不提供任何性能保证。为克服这一限制,我们提出了第一批用于序贯决策中有原则的自动概念选择的算法。我们的核心洞见是,概念选择可以通过状态抽象的视角来看待:直观地说,如果移除某个概念会导致智能体混淆需要不同动作的状态,那么该概念就是决策相关的。因此,智能体应当依赖决策相关的概念;具有相同概念表示的状态应共享相同的最优动作,从而保留原始状态空间的最优决策结构。这一视角引出了决策相关选择(DRS)算法,它从候选集合中选择概念子集,并给出将所选概念与所得策略性能相联系的性能界。在经验上,DRS自动恢复了手工整理的概念集,同时匹配或超越其性能,并在强化学习基准和真实医疗环境中提高了测试时概念干预的有效性。
摘要:Training interpretable concept-based policies requires practitioners to manually select which human-understandable concepts an agent should reason with when making sequential decisions. This selection demands domain expertise, is time-consuming and costly, scales poorly with the number of candidates, and provides no performance guarantees. To overcome this limitation, we propose the first algorithms for principled automatic concept selection in sequential decision-making. Our key insight is that concept selection can be viewed through the lens of state abstraction: intuitively, a concept is decision-relevant if removing it would cause the agent to confuse states that require different actions. As a result, agents should rely on decision-relevant concepts; states with the same concept representation should share the same optimal action, which preserves the optimal decision structure of the original state space. This perspective leads to the Decision-Relevant Selection (DRS) algorithm, which selects a subset of concepts from a candidate set, along with performance bounds relating the selected concepts to the performance of the resulting policy. Empirically, DRS automatically recovers manually curated concept sets while matching or exceeding their performance, and improves the effectiveness of test-time concept interventions across reinforcement learning benchmarks and real-world healthcare environments.
【3】Anticipatory Reinforcement Learning: From Generative Path-Laws to Distributional Value Functions
标题:预期强化学习:从生成式路径律到分布价值函数
链接:https://arxiv.org/abs/2604.04662
作者:Daniel Bloch
摘要:本文介绍了预期强化学习(ARL),这是一个新颖的框架,旨在弥合非马尔可夫决策过程与经典强化学习架构之间的差距,尤其是在仅能观测单条轨迹的约束下。在以跳跃扩散和结构突变为特征的环境中,传统的基于状态的方法往往无法捕获精确预见所需的、本质上路径依赖的几何结构。我们通过将状态空间提升到一个签名增强的流形来解决这一问题,在该流形上,过程的历史被嵌入为一个动力学坐标。通过采用自洽场方法,智能体维护未来路径律的一个预期代理,从而允许对期望回报进行确定性评估。这种从随机分支到单遍线性评估的转变显著降低了计算复杂度和方差。我们证明该框架保留了基本的压缩性质,并确保即使在重尾噪声存在下也能稳定泛化。我们的结果表明,通过将强化学习建立在路径空间的拓扑特征之上,智能体可以在高度波动的连续时间环境中实现主动的风险管理和卓越的策略稳定性。
摘要:This paper introduces Anticipatory Reinforcement Learning (ARL), a novel framework designed to bridge the gap between non-Markovian decision processes and classical reinforcement learning architectures, specifically under the constraint of a single observed trajectory. In environments characterised by jump-diffusions and structural breaks, traditional state-based methods often fail to capture the essential path-dependent geometry required for accurate foresight. We resolve this by lifting the state space into a signature-augmented manifold, where the history of the process is embedded as a dynamical coordinate. By utilising a self-consistent field approach, the agent maintains an anticipated proxy of the future path-law, allowing for a deterministic evaluation of expected returns. This transition from stochastic branching to a single-pass linear evaluation significantly reduces computational complexity and variance. We prove that this framework preserves fundamental contraction properties and ensures stable generalisation even in the presence of heavy-tailed noise. Our results demonstrate that by grounding reinforcement learning in the topological features of path-space, agents can achieve proactive risk management and superior policy stability in highly volatile, continuous-time environments.
【4】FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
标题:FlashSAC:用于高维机器人控制的快速稳定异策略强化学习
链接:https://arxiv.org/abs/2604.04539
作者:Donghu Kim,Youngdo Lee,Minho Park,Kinam Kim,I Made Aswin Nahendra,Takuma Seno,Sehee Min,Daniel Palenicek,Florian Vogt,Danica Kragic,Jan Peters,Jaegul Choo,Hojoon Lee
备注:preprint, 40pages
摘要:当专家演示不可用时,强化学习(RL)是机器人控制的核心方法。近端策略优化(PPO)等同策略(on-policy)方法因其稳定性而被广泛使用,但它们对分布狭窄的同策略数据的依赖,限制了高维状态与动作空间中策略评估的准确性。异策略(off-policy)方法可以通过从更广的状态-动作分布中学习来克服这一限制,但收敛缓慢且不稳定:在多样化数据上拟合价值函数需要大量梯度更新,导致critic误差通过自举不断累积。我们提出FlashSAC,一种基于Soft Actor-Critic的快速稳定的异策略RL算法。受监督学习中观察到的缩放定律的启发,FlashSAC大幅减少梯度更新次数,同时用更大的模型和更高的数据吞吐量加以补偿。为了在规模增大时保持稳定性,FlashSAC显式地约束权重、特征和梯度范数,从而抑制critic误差的累积。在10个模拟器的60多个任务中,FlashSAC在最终性能和训练效率上均持续优于PPO和强大的异策略基线,在灵巧操作等高维任务上的收益最大。在从仿真到真实的人形机器人运动任务中,FlashSAC将训练时间从数小时缩短到数分钟,展示了异策略RL用于sim-to-real迁移的前景。
摘要:Reinforcement learning (RL) is a core approach for robot control when expert demonstrations are unavailable. On-policy methods such as Proximal Policy Optimization (PPO) are widely used for their stability, but their reliance on narrowly distributed on-policy data limits accurate policy evaluation in high-dimensional state and action spaces. Off-policy methods can overcome this limitation by learning from a broader state-action distribution, yet suffer from slow convergence and instability, as fitting a value function over diverse data requires many gradient updates, causing critic errors to accumulate through bootstrapping. We present FlashSAC, a fast and stable off-policy RL algorithm built on Soft Actor-Critic. Motivated by scaling laws observed in supervised learning, FlashSAC sharply reduces gradient updates while compensating with larger models and higher data throughput. To maintain stability at increased scale, FlashSAC explicitly bounds weight, feature, and gradient norms, curbing critic error accumulation. Across over 60 tasks in 10 simulators, FlashSAC consistently outperforms PPO and strong off-policy baselines in both final performance and training efficiency, with the largest gains on high-dimensional tasks such as dexterous manipulation. In sim-to-real humanoid locomotion, FlashSAC reduces training time from hours to minutes, demonstrating the promise of off-policy RL for sim-to-real transfer.
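FlashSAC“显式限制权重、特征和梯度范数”的做法,最直接的实现之一是把向量投影回给定半径的L2球内。下面是该投影的示意草图(论文对三类范数的具体约束方式可能不同,此处仅演示机制):

```python
import math

def clip_to_ball(vec, max_norm):
    """Project a vector onto the L2 ball of radius max_norm; applied to
    weights, features, or gradients, this caps how fast errors can grow."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm:
        return list(vec)
    scale = max_norm / norm
    return [x * scale for x in vec]
```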
【5】ReinVBC: A Model-based Reinforcement Learning Approach to Vehicle Braking Controller
标题:ReinVBC:一种基于模型的车辆制动控制器强化学习方法
链接:https://arxiv.org/abs/2604.04401
作者:Haoxin Lin,Junjie Zhou,Daheng Xu,Yang Yu
摘要:制动系统是保证汽车安全性和操纵性的关键模块,在生产过程中大量依赖人工标定。减少人力和时间消耗,同时保持车辆制动控制器(VBC)的性能,极大地有利于汽车行业。离线强化学习中的基于模型的方法有助于在数据驱动的动态模型中进行策略探索,为解决现实世界的控制任务提供了一个有前途的解决方案。这项工作提出了ReinVBC,它采用基于离线模型的强化学习方法来处理车辆制动控制问题。我们将有用的工程设计引入模型学习和利用的范例中,以获得可靠的车辆动力学模型和有能力的制动策略。几个结果表明,我们的方法在现实世界中的车辆制动的能力和它的潜力,以取代生产级防抱死制动系统。
摘要:Braking system, the key module to ensure the safety and steer-ability of current vehicles, relies on extensive manual calibration during production. Reducing labor and time consumption while maintaining the Vehicle Braking Controller (VBC) performance greatly benefits the vehicle industry. Model-based methods in offline reinforcement learning, which facilitate policy exploration within a data-driven dynamics model, offer a promising solution for addressing real-world control tasks. This work proposes ReinVBC, which applies an offline model-based reinforcement learning approach to deal with the vehicle braking control problem. We introduce useful engineering designs into the paradigm of model learning and utilization to obtain a reliable vehicle dynamics model and a capable braking policy. Several results demonstrate the capability of our method in real-world vehicle braking and its potential to replace the production-grade anti-lock braking system.
【6】Boosted Distributional Reinforcement Learning: Analysis and Healthcare Applications
标题:增强的分布强化学习:分析与医疗保健应用
链接:https://arxiv.org/abs/2604.04334
作者:Zequn Chen,Wesley J. Marrero
备注:Preprint. 40 pages,11 figures. Supplementary appendix included
摘要:研究人员和从业者越来越多地考虑用强化学习来优化机器人和医疗保健等复杂领域中的决策。迄今为止,这些努力主要采用基于期望的学习。然而,在涉及多个异质群体、高度不确定的情形下进行一致的决策时,仅依赖以期望为中心的目标可能并不足够。虽然已有分布(distributional)强化学习算法被提出来建模结果的完整分布,但它们可能使可比的个体之间产生很大的已实现收益差异。这一挑战在医疗环境中尤为突出:医生(控制者)必须管理多名疾病进展不确定、治疗反应异质的患者(下级代理)。我们提出一种增强的分布强化学习(BDRL)算法,在强制相似代理之间可比性的同时优化特定于代理的结果分布,并分析其收敛性。为进一步稳定学习,我们加入了一个更新后投影步骤,将其公式化为约束凸优化问题,能高效地使个体结果在指定容差内与高性能参考对齐。我们将该算法应用于管理美国成年人口中一个大型子集的高血压问题,按心血管疾病风险将个体分组。我们的方法通过模仿每个风险组中高性能参考的行为,修改中位患者和脆弱患者的治疗计划。此外,我们发现与强化学习基线相比,BDRL提高了质量调整生命年的数量和一致性。
摘要:Researchers and practitioners are increasingly considering reinforcement learning to optimize decisions in complex domains like robotics and healthcare. To date, these efforts have largely utilized expectation-based learning. However, relying on expectation-focused objectives may be insufficient for making consistent decisions in highly uncertain situations involving multiple heterogeneous groups. While distributional reinforcement learning algorithms have been introduced to model the full distributions of outcomes, they can yield large discrepancies in realized benefits among comparable agents. This challenge is particularly acute in healthcare settings, where physicians (controllers) must manage multiple patients (subordinate agents) with uncertain disease progression and heterogeneous treatment responses. We propose a Boosted Distributional Reinforcement Learning (BDRL) algorithm that optimizes agent-specific outcome distributions while enforcing comparability among similar agents and analyze its convergence. To further stabilize learning, we incorporate a post-update projection step formulated as a constrained convex optimization problem, which efficiently aligns individual outcomes with a high-performing reference within a specified tolerance. We apply our algorithm to manage hypertension in a large subset of the US adult population by categorizing individuals into cardiovascular disease risk groups. Our approach modifies treatment plans for median and vulnerable patients by mimicking the behavior of high-performing references in each risk group. Furthermore, we find that BDRL improves the number and consistency of quality-adjusted life years compared with reinforcement learning baselines.
【7】Pedagogical Safety in Educational Reinforcement Learning: Formalizing and Detecting Reward Hacking in AI Tutoring Systems
标题:教育强化学习中的教学安全:人工智能教学系统中的奖励黑客形式化和检测
链接:https://arxiv.org/abs/2604.04237
作者:Oluseyi Olukola,Nick Rahimi
备注:43 pages, 5 figures. Submitted to the International Journal of Artificial Intelligence in Education (IJAIED)
摘要:强化学习(RL)越来越多地用于智能辅导系统中的个性化教学,但该领域缺乏定义和评估教学安全性的正式框架。我们为教育RL引入了一个包含结构安全、进度安全、行为安全和对齐安全的四层教学安全模型,并提出奖励黑客严重性指数(RHSI)来量化代理奖励与真实学习之间的错位。我们在一个人工智能辅导环境的受控模拟中评估了该框架,涵盖四种条件和三种学习者画像下的120次会话,共计18,000次交互。结果表明,一个以参与度为优化目标的代理系统性地过度选择某个高参与度、但对掌握程度没有直接增益的动作,产生了强劲的测量性能,但学习进展有限。多目标奖励公式减轻了这一问题,但没有消除它:代理在许多状态下仍然偏好能获得代理奖励的行为。相比之下,一个结合先决条件强制执行和最低认知需求的约束架构大幅减少了奖励黑客行为,将RHSI从无约束多目标条件下的0.317降至0.102。消融结果进一步表明,行为安全是防止重复选择低价值动作的最有影响力的保障。这些发现表明,至少在本文研究的模拟环境中,仅靠奖励设计可能不足以确保教育RL中与教学目标一致的行为。更广泛地说,本文将教学安全定位为人工智能安全与智能教育系统交叉领域的一个重要研究问题。
摘要:Reinforcement learning (RL) is increasingly used to personalize instruction in intelligent tutoring systems, yet the field lacks a formal framework for defining and evaluating pedagogical safety. We introduce a four-layer model of pedagogical safety for educational RL comprising structural, progress, behavioral, and alignment safety and propose the Reward Hacking Severity Index (RHSI) to quantify misalignment between proxy rewards and genuine learning. We evaluate the framework in a controlled simulation of an AI tutoring environment with 120 sessions across four conditions and three learner profiles, totaling 18,000 interactions. Results show that an engagement-optimized agent systematically over-selected a high-engagement action with no direct mastery gain, producing strong measured performance but limited learning progress. A multi-objective reward formulation reduced this problem but did not eliminate it, as the agent continued to favor proxy-rewarding behavior in many states. In contrast, a constrained architecture combining prerequisite enforcement and minimum cognitive demand substantially reduced reward hacking, lowering RHSI from 0.317 in the unconstrained multi-objective condition to 0.102. Ablation results further suggest that behavioral safety was the most influential safeguard against repetitive low-value action selection. These findings suggest that reward design alone may be insufficient to ensure pedagogically aligned behavior in educational RL, at least in the simulated environment studied here. More broadly, the paper positions pedagogical safety as an important research problem at the intersection of AI safety and intelligent educational systems.
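摘要提出的RHSI用于量化代理奖励与真实学习之间的错位,但摘要未给出公式。下面是按该思想构造的一个可能实例(纯属示意假设,并非论文的真实定义):将“代理奖励高于会话中位数、却没有带来掌握度增益”的决策所占比例作为严重性分数:

```python
def reward_hacking_severity(steps):
    """One *hypothetical* instantiation of an RHSI-style score.
    steps: list of (proxy_reward, mastery_gain) pairs. The score is the
    fraction of decisions with above-median proxy reward but no mastery
    gain: 0 means no hacking signal, 1 means every decision is hacked."""
    proxies = sorted(p for p, _ in steps)
    median = proxies[len(proxies) // 2]
    hacked = sum(1 for p, m in steps if p >= median and m <= 0)
    return hacked / len(steps)
```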
【8】Provable Multi-Task Reinforcement Learning: A Representation Learning Framework with Low Rank Rewards
标题:可证明的多任务强化学习:一种具有低秩奖励的表示学习框架
链接:https://arxiv.org/abs/2604.03891
作者:Yaoze Guo,Shana Moothedath
摘要:多任务表征学习(MTRL)是一种在相关任务之间学习共享潜在表征的方法,通过协作学习提高整体学习效率。本文研究面向多任务强化学习(RL)的MTRL,其中多个任务具有相同的状态-动作空间和转移概率,但奖励不同。我们考虑T个线性马尔可夫决策过程(MDP),其奖励函数和转移动力学允许维度为d的线性特征嵌入,任务间的相关性由奖励矩阵上的低秩结构刻画。由于数据本身复杂且依赖于策略,误差会随时间推移而累积,因此在多个RL任务中学习共享表示具有挑战性。我们的方法采用无奖励强化学习框架,首先学习一个数据收集策略;随后,该策略为估计未知奖励矩阵的探索策略提供依据。重要的是,在这一精心设计的策略下收集的数据能够支持准确估计,最终支持学习接近最优的策略。与依赖高斯特征、不相干性条件或可获得最优解等限制性假设的现有方法不同,我们提出了一种低秩矩阵估计方法,可在RL场景中更一般的特征分布下工作。理论分析表明,在这些宽松的假设下可以实现准确的低秩矩阵恢复,并且我们刻画了表示误差与样本复杂度之间的关系。利用学到的表示,我们构建了接近最优的策略,并证明了遗憾界。实验结果表明,我们的方法能从有限数据中有效学习鲁棒的共享表示和任务动态。
摘要:Multi-task representation learning (MTRL) is an approach that learns shared latent representations across related tasks, facilitating collaborative learning that improves the overall learning efficiency. This paper studies MTRL for multi-task reinforcement learning (RL), where multiple tasks have the same state-action space and transition probabilities, but different rewards. We consider T linear Markov Decision Processes (MDPs) where the reward functions and transition dynamics admit linear feature embeddings of dimension d. The relatedness among the tasks is captured by a low-rank structure on the reward matrices. Learning shared representations across multiple RL tasks is challenging due to the complex and policy-dependent nature of data that leads to a temporal progression of error. Our approach adopts a reward-free reinforcement learning framework to first learn a data-collection policy. This policy then informs an exploration strategy for estimating the unknown reward matrices. Importantly, the data collected under this well-designed policy enable accurate estimation, which ultimately supports the learning of an near-optimal policy. Unlike existing approaches that rely on restrictive assumptions such as Gaussian features, incoherence conditions, or access to optimal solutions, we propose a low-rank matrix estimation method that operates under more general feature distributions encountered in RL settings. Theoretical analysis establishes that accurate low-rank matrix recovery is achievable under these relaxed assumptions, and we characterize the relationship between representation error and sample complexity. Leveraging the learned representation, we construct near-optimal policies and prove a regret bound. Experimental results demonstrate that our method effectively learns robust shared representations and task dynamics from finite data.
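The low-rank reward recovery step described above can be illustrated with a generic sketch (this is not the authors' estimator; the truncated-SVD projection and all dimensions below are illustrative assumptions):

```python
import numpy as np

def lowrank_reward_estimate(R_noisy, rank):
    """Project a noisy T x d reward-parameter matrix onto its best rank-r
    approximation via truncated SVD (Eckart-Young optimal in Frobenius norm)."""
    U, s, Vt = np.linalg.svd(R_noisy, full_matrices=False)
    return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

# Hypothetical setup: T=20 tasks, d=10 feature dims, true rank k=2.
rng = np.random.default_rng(0)
R_true = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 10))
R_noisy = R_true + 0.05 * rng.normal(size=R_true.shape)
R_hat = lowrank_reward_estimate(R_noisy, rank=2)

# Exploiting the low-rank structure denoises relative to the raw estimate.
err_hat = np.linalg.norm(R_hat - R_true)
err_noisy = np.linalg.norm(R_noisy - R_true)
```

The point of the sketch is only the role low rank plays: noise spread across all d directions is discarded outside the top-r subspace, which is why shared structure across tasks improves per-task estimation.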
【9】Delayed Homomorphic Reinforcement Learning for Environments with Delayed Feedback
标题:具有延迟反馈的环境的延迟同态强化学习
链接:https://arxiv.org/abs/2604.03641
作者:Jongsoo Lee,Jangwon Kim,Soohee Han
摘要:现实世界系统中的强化学习通常伴随延迟反馈,这会破坏马尔可夫假设并阻碍学习与控制。经典的状态增强方法会导致状态空间爆炸,带来严重的样本复杂度负担。尽管最近有所进展,最先进的基于增强的基线仍不完整:它们要么主要只减轻评论家(critic)的负担,要么对演员(actor)和评论家采用不统一的处理方式。为提供一个结构化且样本高效的解决方案,我们提出延迟同态强化学习(DHRL),这是一个基于MDP同态的框架,它折叠信念等价的增强状态,并在不损失最优性的前提下在所得抽象MDP上进行高效的策略学习。我们给出了状态空间压缩界与样本复杂度的理论分析,并提出了一个实用算法。在MuJoCo基准的连续控制任务上的实验证实,我们的算法优于强大的基于增强的基线,尤其是在长延迟情况下。
摘要:Reinforcement learning in real-world systems is often accompanied by delayed feedback, which breaks the Markov assumption and impedes both learning and control. Canonical state augmentation approaches cause the state-space explosion, which introduces a severe sample-complexity burden. Despite recent progress, the state-of-the-art augmentation-based baselines remain incomplete: they either predominantly reduce the burden on the critic or adopt non-unified treatments for the actor and critic. To provide a structured and sample-efficient solution, we propose delayed homomorphic reinforcement learning (DHRL), a framework grounded in MDP homomorphisms that collapses belief-equivalent augmented states and enables efficient policy learning on the resulting abstract MDP without loss of optimality. We provide theoretical analyses of state-space compression bounds and sample complexity, and introduce a practical algorithm. Experiments on continuous control tasks in MuJoCo benchmark confirm that our algorithm outperforms strong augmentation-based baselines, particularly under long delays.
符号|符号学习(1篇)
【1】Analyzing Symbolic Properties for DRL Agents in Systems and Networking
标题:分析系统和网络中DRL代理的符号属性
链接:https://arxiv.org/abs/2604.04914
作者:Mohammad Zangooei,Jannis Weil,Amr Rizk,Mina Tahmasbi Arashloo,Raouf Boutaba
备注:Accepted in ACM SIGMETRICS'26
摘要:深度强化学习(DRL)在系统与网络中的复杂控制问题上表现出卓越性能,包括自适应视频流、无线资源管理和拥塞控制。然而,为了安全部署,关键在于推理代理在实践中可能遇到的各种系统状态下的行为。该领域现有的基于验证的方法主要关注围绕固定输入状态定义的点属性,覆盖范围有限,且需要大量人工工作来确定用于分析的相关输入-输出对。本文研究面向系统与网络中DRL代理的符号属性,即在输入状态范围上指定预期行为的属性。我们提出了符号属性的通用表述,并以单调性和鲁棒性为具体示例,展示如何使用现有的DNN验证引擎对其进行分析。我们的方法将符号属性编码为同一策略的相关执行之间的比较,并将其分解为实际可处理的子属性。这些技术使现有验证工具能够切实应用于符号分析。基于我们的框架diffRL,我们在三个基于DRL的控制系统(自适应视频流、无线资源管理和拥塞控制)上开展了广泛的实证研究。通过这些案例研究,我们分析了宽输入范围上的符号属性,考察了属性满足情况在训练过程中的演变,研究了模型规模对可验证性的影响,并比较了多个验证后端。结果表明,符号属性提供了远比点属性更广的覆盖范围,能够发现不明显但在运行层面有意义的反例,同时也揭示了实际求解器的权衡与局限。
摘要:Deep reinforcement learning (DRL) has shown remarkable performance on complex control problems in systems and networking, including adaptive video streaming, wireless resource management, and congestion control. For safe deployment, however, it is critical to reason about how agents behave across the range of system states they encounter in practice. Existing verification-based methods in this domain primarily focus on point properties, defined around fixed input states, which offer limited coverage and require substantial manual effort to identify relevant input-output pairs for analysis. In this paper, we study symbolic properties, that specify expected behavior over ranges of input states, for DRL agents in systems and networking. We present a generic formulation for symbolic properties, with monotonicity and robustness as concrete examples, and show how they can be analyzed using existing DNN verification engines. Our approach encodes symbolic properties as comparisons between related executions of the same policy and decomposes them into practically tractable sub-properties. These techniques serve as practical enablers for applying existing verification tools to symbolic analysis. Using our framework, diffRL, we conduct an extensive empirical study across three DRL-based control systems, adaptive video streaming, wireless resource management, and congestion control. Through these case studies, we analyze symbolic properties over broad input ranges, examine how property satisfaction evolves during training, study the impact of model size on verifiability, and compare multiple verification backends. Our results show that symbolic properties provide substantially broader coverage than point properties and can uncover non-obvious, operationally meaningful counterexamples, while also revealing practical solver trade-offs and limitations.
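A symbolic property such as monotonicity is a statement over a range of inputs; before invoking a DNN verifier, it can be falsification-tested by comparing paired executions of the same policy, which mirrors the paper's encoding of properties as comparisons between related executions. The sampling-based checker below is a generic sketch, not diffRL itself, and the toy policy is invented:

```python
import numpy as np

def violates_monotonicity(policy, lows, highs, dim, n=1000, rng=None):
    """Falsification-style check (not formal verification): sample paired
    inputs x and x' that differ only in `dim` with x'[dim] >= x[dim], and
    look for a counterexample to output monotonicity over [lows, highs]."""
    rng = rng or np.random.default_rng(0)
    for _ in range(n):
        x = rng.uniform(lows, highs)
        x2 = x.copy()
        x2[dim] = rng.uniform(x[dim], highs[dim])  # increase only this input
        if policy(x2) < policy(x):
            return (x, x2)          # counterexample found
    return None

# Toy "policy": monotone in input 0, non-monotone in input 1.
def policy(x):
    return x[0] - (x[1] - 0.5) ** 2

lows, highs = np.zeros(2), np.ones(2)
```

Unlike a point property, the check quantifies over the whole box; a verifier would replace the sampling loop with an exhaustive symbolic search over the same paired-execution encoding.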
分层学习(1篇)
【1】ArrowFlow: Hierarchical Machine Learning in the Space of Permutations
标题:ArrowFlow:排列空间中的分层机器学习
链接:https://arxiv.org/abs/2604.04087
作者:Ozgur Yilmaz
摘要:我们引入ArrowFlow,一种完全在排列空间中运行的机器学习架构。其计算单元是排序过滤器,即学习得到的排序,它们通过斯皮尔曼脚距(Spearman's footrule)比较输入,并通过置换矩阵累积进行更新,这是一种基于位移证据的非梯度规则。各层按层次结构组合:每一层输出的排名成为下一层的输入,从而在核心计算不含任何浮点参数的情况下实现深度有序表示学习。我们将该架构与阿罗不可能性定理联系起来,表明对社会选择公平公理的违反(上下文依赖、专业化、对称性破缺)可作为非线性、稀疏性和稳定性的归纳偏置。实验涵盖UCI表格基准、MNIST、基因表达癌症分类(TCGA)和偏好数据,全部与经GridSearchCV调参的基线进行比较。ArrowFlow在Iris上优于所有基线(2.7%对3.3%),并在大多数UCI数据集上具有竞争力。单一参数,即多项式次数,充当主开关:次数为1时带来噪声鲁棒性(性能退化减少8-28%)、隐私保护(仅+0.5pp代价)以及对缺失特征的弹性;更高的次数则以这些特性换取更高的干净精度。ArrowFlow的设计目的并非超越基于梯度的方法,而是一个存在性证明:在一种根本不同的计算范式中,有竞争力的分类是可能的,该范式将有序结构提升为一等公民,并与纯整数和神经形态硬件天然契合。
摘要:We introduce ArrowFlow, a machine learning architecture that operates entirely in the space of permutations. Its computational units are ranking filters, learned orderings that compare inputs via Spearman's footrule distance and update through permutation-matrix accumulation, a non-gradient rule rooted in displacement evidence. Layers compose hierarchically: each layer's output ranking becomes the next layer's input, enabling deep ordinal representation learning without any floating-point parameters in the core computation. We connect the architecture to Arrow's impossibility theorem, showing that violations of social-choice fairness axioms (context dependence, specialization, symmetry breaking) serve as inductive biases for nonlinearity, sparsity, and stability. Experiments span UCI tabular benchmarks, MNIST, gene expression cancer classification (TCGA), and preference data, all against GridSearchCV-tuned baselines. ArrowFlow beats all baselines on Iris (2.7% vs. 3.3%) and is competitive on most UCI datasets. A single parameter, polynomial degree, acts as a master switch: degree 1 yields noise robustness (8-28% less degradation), privacy preservation (+0.5pp cost), and missing-feature resilience; higher degrees trade these for improved clean accuracy. ArrowFlow is not designed to surpass gradient-based methods. It is an existence proof that competitive classification is possible in a fundamentally different computational paradigm, one that elevates ordinal structure to a first-class citizen, with natural alignment to integer-only and neuromorphic hardware.
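Spearman's footrule, the comparison metric named in the abstract, has a simple closed form: the L1 distance between the positions each item occupies in two rankings. A minimal self-contained sketch:

```python
def footrule(pi, sigma):
    """Spearman's footrule distance between two rankings of the same items:
    sum over items of |position in pi - position in sigma|."""
    pos_pi = {v: i for i, v in enumerate(pi)}
    pos_sigma = {v: i for i, v in enumerate(sigma)}
    assert pos_pi.keys() == pos_sigma.keys(), "rankings must cover the same items"
    return sum(abs(pos_pi[v] - pos_sigma[v]) for v in pos_pi)

d_same = footrule([0, 1, 2, 3], [0, 1, 2, 3])   # identical rankings -> 0
d_swap = footrule([0, 1, 2, 3], [3, 1, 2, 0])   # ends swapped -> 3 + 3
d_rev  = footrule([0, 1, 2, 3], [3, 2, 1, 0])   # full reversal -> maximum
```

Because it is an integer-valued sum of displacements, the metric is compatible with the integer-only computation the paper emphasizes; how ArrowFlow's ranking filters consume it is not specified in the abstract.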
医学相关(5篇)
【1】FairLogue: A Toolkit for Intersectional Fairness Analysis in Clinical Machine Learning Models
标题:FairLogue:临床机器学习模型中交叉公平性分析的工具包
链接:https://arxiv.org/abs/2604.04858
作者:Nick Souligne,Vignesh Subbian
摘要:目的:算法公平性对于医疗保健中公平且可信的机器学习至关重要。大多数公平性工具侧重单轴人口统计比较,可能遗漏影响交叉人群的复合差异。本研究介绍Fairlogue,一个旨在在临床环境的观察性与反事实背景下落实交叉公平性评估的工具包。方法:Fairlogue是一个基于Python的工具包,由三部分组成:1)将人口统计均等、均等化几率和平等机会差异扩展到交叉人群的观察性框架;2)在基于治疗的背景下评估公平性的反事实框架;3)在对交叉群体成员资格进行干预时评估公平性的广义反事实框架。该工具包在青光眼手术预测任务中使用来自All of Us Controlled Tier V8数据集的电子健康记录数据进行了评估,采用以种族和性别为受保护属性的逻辑回归。结果:尽管模型性能中等(AUROC = 0.709;准确率 = 0.651),观察性分析仍发现了大量交叉差异。交叉评估显示出比单轴分析更大的公平性差距,包括0.20的人口统计均等差异,以及均等化几率下真阳性率和假阳性率分别为0.33和0.15的差距。使用基于置换的零分布进行的反事实分析得到了接近零的不公平性("u值")估计,表明在对协变量进行条件化后,观察到的差异与偶然性一致。结论:Fairlogue提供了一个模块化工具包,集成了观察性与反事实方法,用于量化和评估临床机器学习工作流中的交叉偏差。
摘要:Objective: Algorithmic fairness is essential for equitable and trustworthy machine learning in healthcare. Most fairness tools emphasize single-axis demographic comparisons and may miss compounded disparities affecting intersectional populations. This study introduces Fairlogue, a toolkit designed to operationalize intersectional fairness assessment in observational and counterfactual contexts within clinical settings. Methods: Fairlogue is a Python-based toolkit composed of three components: 1) an observational framework extending demographic parity, equalized odds, and equal opportunity difference to intersectional populations; 2) a counterfactual framework evaluating fairness under treatment-based contexts; and 3) a generalized counterfactual framework assessing fairness under interventions on intersectional group membership. The toolkit was evaluated using electronic health record data from the All of Us Controlled Tier V8 dataset in a glaucoma surgery prediction task using logistic regression with race and gender as protected attributes. Results: Observational analysis identified substantial intersectional disparities despite moderate model performance (AUROC = 0.709; accuracy = 0.651). Intersectional evaluation revealed larger fairness gaps than single-axis analyses, including demographic parity differences of 0.20 and equalized odds true positive and false positive rate gaps of 0.33 and 0.15, respectively. Counterfactual analysis using permutation-based null distributions produced unfairness ("u-value") estimates near zero, suggesting observed disparities were consistent with chance after conditioning on covariates. Conclusion: Fairlogue provides a modular toolkit integrating observational and counterfactual methods for quantifying and evaluating intersectional bias in clinical machine learning workflows.
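As a concrete reading of the observational component, an intersectional demographic-parity gap can be computed by stratifying positive-prediction rates over the cross-product of protected attributes. The sketch below is illustrative (the group labels and rates are invented, and Fairlogue's actual API is not shown in the abstract):

```python
def intersectional_dp_gap(records):
    """Max pairwise demographic-parity difference across intersectional groups.
    records: iterable of (race, gender, prediction) with prediction in {0, 1}."""
    counts = {}
    for race, gender, yhat in records:
        n, pos = counts.get((race, gender), (0, 0))
        counts[(race, gender)] = (n + 1, pos + yhat)
    rates = {k: pos / n for k, (n, pos) in counts.items()}
    return max(rates.values()) - min(rates.values())

# Hypothetical toy cohort: positive-prediction rates differ by subgroup.
records = (
    [("A", "F", 1)] * 8 + [("A", "F", 0)] * 2 +   # rate 0.8
    [("A", "M", 1)] * 6 + [("A", "M", 0)] * 4 +   # rate 0.6
    [("B", "F", 1)] * 7 + [("B", "F", 0)] * 3 +   # rate 0.7
    [("B", "M", 1)] * 6 + [("B", "M", 0)] * 4     # rate 0.6
)
gap = intersectional_dp_gap(records)
```

Note that single-axis analysis on the same toy data (pooling over race or over gender) would report a smaller gap, which is the compounding effect the paper measures.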
【2】A Clinical Point Cloud Paradigm for In-Hospital Mortality Prediction from Multi-Level Incomplete Multimodal EHRs
标题:多水平不完整多模式EHR预测医院内死亡率的临床点云范式
链接:https://arxiv.org/abs/2604.04614
作者:Bohao Li,Tao Zou,Junchen Ye,Yan Gong,Bowen Du
备注:20 pages
摘要:基于深度学习的多模态电子健康记录(EHR)建模已成为临床诊断和风险预测的重要方法。然而,由于不同的临床工作流程和隐私限制,原始EHR本质上是多层次的不完整的,包括不规则的采样,缺失的模态和稀疏的标签。这些问题导致时间错位,模态不平衡,和有限的监督。大多数现有的多模态方法假设相对完整的数据,即使是针对不完整性设计的方法通常也只能孤立地解决其中的一两个问题。因此,它们通常依赖于严格的时间/模态对齐或丢弃不完整的数据,这可能会扭曲原始的临床语义。为了解决这个问题,我们提出了健康点(HP),一个统一的临床点云范例多层次不完整的电子病历。HP将异质临床事件表示为由内容、时间、模态和病例定义的连续4D空间中的点。为了对任意点对之间的交互进行建模,我们引入了一种低秩关系注意机制,该机制可以有效地捕获这四个维度之间的高阶依赖关系。我们进一步开发了一个层次化的交互和采样策略,以平衡细粒度建模和计算效率。基于此框架,HP实现了灵活的事件级交互和细粒度的自我监督,支持强大的模态恢复和有效使用未标记数据。在大规模EHR数据集上进行的风险预测实验表明,HP在不同程度的不完整性下始终保持着最先进的性能和较强的鲁棒性。
摘要:Deep learning-based modeling of multimodal Electronic Health Records (EHRs) has become an important approach for clinical diagnosis and risk prediction. However, due to diverse clinical workflows and privacy constraints, raw EHRs are inherently multi-level incomplete, including irregular sampling, missing modalities, and sparse labels. These issues cause temporal misalignment, modality imbalance, and limited supervision. Most existing multimodal methods assume relatively complete data, and even methods designed for incompleteness usually address only one or two of these issues in isolation. As a result, they often rely on rigid temporal/modal alignment or discard incomplete data, which may distort raw clinical semantics. To address this problem, we propose HealthPoint (HP), a unified clinical point cloud paradigm for multi-level incomplete EHRs. HP represents heterogeneous clinical events as points in a continuous 4D space defined by content, time, modality, and case. To model interactions between arbitrary point pairs, we introduce a Low-Rank Relational Attention mechanism that efficiently captures high-order dependencies across these four dimensions. We further develop a hierarchical interaction and sampling strategy to balance fine-grained modeling and computational efficiency. Built on this framework, HP enables flexible event-level interaction and fine-grained self-supervision, supporting robust modality recovery and effective use of unlabeled data. Experiments on large-scale EHR datasets for risk prediction show that HP consistently achieves state-of-the-art performance and strong robustness under varying degrees of incompleteness.
【3】ECG Biometrics with ArcFace-Inception: External Validation on MIMIC and HEEDB
标题:基于ArcFace-Inception的心电图生物识别:在MIMIC和HEEDB上的外部验证
链接:https://arxiv.org/abs/2604.04485
作者:Arjuna Scagnetto
摘要:心电图(ECG)生物识别技术主要在小规模队列和较短的会话间隔上进行研究,识别在大规模画廊、外部域偏移和多年时间间隔下的表现仍是开放问题。我们评估了一个使用ArcFace训练的一维Inception-v1模型,训练数据为来自53,079名患者的164,440份12导联ECG的内部临床语料库,并在源自MIMIC-IV-ECG和HEEDB的更大队列上进行测试。研究采用统一的闭集留一协议,使用Rank@K和TAR@FAR指标,并进行了规模、时间应力、重排序和置信度分析。在一般可比条件下,系统在ASUGI-DB上的Rank@1为0.9506,在MIMIC-GC上为0.8291,在HEEDB-GC上为0.6884。在画廊规模恒定的时间应力测试中,从1年到5年,MIMIC上的Rank@1从0.7853下降到0.6433,HEEDB上则从0.6864下降到0.5560。在HEEDB上的规模分析显示,随着画廊规模增大,性能单调退化,而随着每位患者可用检查数增多则有所恢复。在HEEDB-RR上,事后重排序进一步改善了检索,AS-norm使Rank@1从0.7765的基线提高到0.8005。因此,在经过外部验证的大规模闭集条件下,ECG身份信息仍然是可测量的,但其实际可用性受到域异质性、纵向漂移、画廊规模和第二阶段分数处理的强烈影响。
摘要:ECG biometrics has been studied mainly on small cohorts and short inter-session intervals, leaving open how identification behaves under large galleries, external domain shift, and multi-year temporal gaps. We evaluated a 1D Inception-v1 model trained with ArcFace on an internal clinical corpus of 164,440 12-lead ECGs from 53,079 patients and tested it on larger cohorts derived from MIMIC-IV-ECG and HEEDB. The study used a unified closed-set leave-one-out protocol with Rank@K and TAR@FAR metrics, together with scale, temporal-stress, reranking, and confidence analyses. Under general comparability, the system achieved Rank@1 of 0.9506 on ASUGI-DB, 0.8291 on MIMIC-GC, and 0.6884 on HEEDB-GC. In the temporal stress test at constant gallery size, Rank@1 declined from 0.7853 to 0.6433 on MIMIC and from 0.6864 to 0.5560 on HEEDB from 1 to 5 years. Scale analysis on HEEDB showed monotonic degradation as gallery size increased and recovery as more examinations per patient became available. On HEEDB-RR, post-hoc reranking further improved retrieval, with AS-norm reaching Rank@1 = 0.8005 from a 0.7765 baseline. ECG identity information therefore remains measurable under externally validated large-scale closed-set conditions, but its operational quality is strongly affected by domain heterogeneity, longitudinal drift, gallery size, and second-stage score processing.
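Rank@K, the identification metric used throughout, can be computed directly from a probe-gallery similarity matrix: a probe counts as a hit if its true identity is among the K highest-scoring gallery entries. A small self-contained sketch (the scores below are invented):

```python
import numpy as np

def rank_at_k(scores, gallery_ids, probe_ids, k):
    """Closed-set identification: fraction of probes whose true identity
    appears among the top-k scoring gallery entries.
    scores: (n_probes, n_gallery) similarity matrix."""
    order = np.argsort(-scores, axis=1)           # best match first
    topk = np.asarray(gallery_ids)[order[:, :k]]  # ids of top-k per probe
    hits = (topk == np.asarray(probe_ids)[:, None]).any(axis=1)
    return hits.mean()

# Toy example: 3 probes against a 4-identity gallery.
scores = np.array([
    [0.9, 0.1, 0.2, 0.3],   # probe id 0: correct match ranked 1st
    [0.2, 0.3, 0.8, 0.1],   # probe id 1: correct match ranked 2nd
    [0.1, 0.2, 0.3, 0.9],   # probe id 2: correct match ranked 2nd
])
```

The paper's reranking stage (e.g. AS-norm) would rewrite `scores` before this computation; the metric itself is unchanged.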
【4】Good Rankings, Wrong Probabilities: A Calibration Audit of Multimodal Cancer Survival Models
标题:好的排名,错误的可能性:多模式癌症生存模型的校准审计
链接:https://arxiv.org/abs/2604.04239
作者:Sajad Ghawami
备注:15 pages, 5 figures
摘要:融合全切片组织病理学图像与基因组数据的多模态深度学习模型在癌症生存预测方面取得了很强的判别性能(以一致性指数衡量)。然而,从这些模型得出的生存概率,无论是直接来自原生输出还是通过标准的事后重建,是否经过校准,在很大程度上仍未得到检验。据我们所知,我们对多模态WSI-基因组学生存架构进行了首次系统性的折叠级1-校准审核,评估了原生离散时间生存输出(实验A:TCGA-BRCA上的3个模型)以及由标量风险评分经Breslow重建得到的生存曲线(实验B:5种TCGA癌症类型上的11种架构)。在实验A中,三个模型在大多数折叠上均未通过1-校准(经Benjamini-Hochberg校正后,15个折叠级检验中有12个被拒绝)。在全部290个折叠级检验中,166个在中位事件时间处拒绝了校准正确的零假设(Benjamini-Hochberg校正,FDR = 0.05)。MCAT在GBMLGG上达到C指数0.817,但在全部五个折叠上均未通过1-校准。基于门控的融合与更好的校准相关;双线性和级联融合则不然。事后Platt缩放在不影响判别力的情况下减少了所评估时间点上的误校准(例如,MCAT:从5/5个折叠失败降至2/5)。仅靠一致性指数不足以评价拟用于临床的生存模型。
摘要:Multimodal deep learning models that fuse whole-slide histopathology images with genomic data have achieved strong discriminative performance for cancer survival prediction, as measured by the concordance index. Yet whether the survival probabilities derived from these models - either directly from native outputs or via standard post-hoc reconstruction - are calibrated remains largely unexamined. We conduct, to our knowledge, the first systematic fold-level 1-calibration audit of multimodal WSI-genomics survival architectures, evaluating native discrete-time survival outputs (Experiment A: 3 models on TCGA-BRCA) and Breslow-reconstructed survival curves from scalar risk scores (Experiment B: 11 architectures across 5 TCGA cancer types). In Experiment A, all three models fail 1-calibration on a majority of folds (12 of 15 fold-level tests reject after Benjamini-Hochberg correction). Across the full 290 fold-level tests, 166 reject the null of correct calibration at the median event time after Benjamini-Hochberg correction (FDR = 0.05). MCAT achieves C-index 0.817 on GBMLGG yet fails 1-calibration on all five folds. Gating-based fusion is associated with better calibration; bilinear and concatenation fusion are not. Post-hoc Platt scaling reduces miscalibration at the evaluated horizon (e.g., MCAT: 5/5 folds failing to 2/5) without affecting discrimination. The concordance index alone is insufficient for evaluating survival models intended for clinical use.
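Post-hoc Platt scaling, used above to repair miscalibration without touching discrimination, fits a single logistic map from risk scores to probabilities. A minimal gradient-descent sketch on synthetic over-confident scores (the data and fitting loop are illustrative, not from the paper):

```python
import numpy as np

def platt_scale(scores, labels, lr=0.1, steps=2000):
    """Fit p(y=1|s) = sigmoid(a*s + b) by gradient descent on the log loss.
    A minimal stand-in for post-hoc Platt scaling of risk scores."""
    a, b = 1.0, 0.0
    s, y = np.asarray(scores, float), np.asarray(labels, float)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))
        g = p - y                      # dLoss/dlogit
        a -= lr * np.mean(g * s)
        b -= lr * np.mean(g)
    return a, b

# Toy miscalibration: raw scores are twice as confident as they should be,
# so the recovered slope should be near 0.5.
rng = np.random.default_rng(1)
true_logits = rng.normal(size=2000)
y = (rng.random(2000) < 1 / (1 + np.exp(-true_logits))).astype(float)
scores = 2.0 * true_logits             # over-confident raw scores
a, b = platt_scale(scores, y)
```

Because the fitted map is strictly monotone in the score, the ranking of patients, and hence the concordance index, is unchanged, which is why calibration improves "without affecting discrimination."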
【5】XAttnRes: Cross-Stage Attention Residuals for Medical Image Segmentation
标题:XAttnRes:用于医学图像分割的跨阶段注意力残差
链接:https://arxiv.org/abs/2604.03297
作者:Xinyu Liu,Qing Xu,Zhen Chen
摘要:在大型语言模型(LLM)领域,注意力残差(Attention Residuals)最近表明,对所有先前层输出进行学习式的选择性聚合可以优于固定的残差连接。我们提出跨阶段注意力残差(XAttnRes),一种维护全局特征历史池、累积编码器与解码器各阶段输出的机制。通过轻量级的伪查询注意力,每个阶段有选择地从所有先前表示中聚合信息。为弥合LLM中同维度Transformer层与分割网络中多尺度编码器-解码器阶段之间的差距,XAttnRes引入空间对齐和通道投影步骤,以可忽略的开销处理跨分辨率特征。当加入现有分割网络时,XAttnRes在四个数据集和三种成像模态上持续提升性能。我们还观察到,即使没有跳跃连接,仅XAttnRes也能达到与基线相当的性能,这表明学习式聚合可以恢复传统上由预定连接提供的阶段间信息流。
摘要:In the field of Large Language Models (LLMs), Attention Residuals have recently demonstrated that learned, selective aggregation over all preceding layer outputs can outperform fixed residual connections. We propose Cross-Stage Attention Residuals (XAttnRes), a mechanism that maintains a global feature history pool accumulating both encoder and decoder stage outputs. Through lightweight pseudo-query attention, each stage selectively aggregates from all preceding representations. To bridge the gap between the same-dimensional Transformer layers in LLMs and the multi-scale encoder-decoder stages in segmentation networks, XAttnRes introduces spatial alignment and channel projection steps that handle cross-resolution features with negligible overhead. When added to existing segmentation networks, XAttnRes consistently improves performance across four datasets and three imaging modalities. We further observe that XAttnRes alone, even without skip connections, achieves performance on par with the baseline, suggesting that learned aggregation can recover the inter-stage information flow traditionally provided by predetermined connections.
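The pseudo-query aggregation over a feature history pool amounts to a single cross-attention read: the current stage issues a query against stacked earlier-stage features (already spatially aligned and channel-projected, per the abstract). A minimal numpy sketch of that read, with invented features; the actual XAttnRes module is not specified at this level of detail:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def history_attention(query, history):
    """Aggregate one stage's context from all preceding stage outputs.
    query: (d,) pseudo-query; history: list of (d,) aligned feature vectors."""
    H = np.stack(history)                            # (n_stages, d)
    w = softmax(query @ H.T / np.sqrt(H.shape[1]))   # attention over stages
    return w @ H

h1 = np.array([10.0, 0.0])      # an early-stage feature
h2 = np.array([0.0, 10.0])      # a later-stage feature
out = history_attention(np.array([1.0, 0.0]), [h1, h2])  # query matches h1
```

The learned part in the real architecture is how queries and projections are produced; the sketch only shows why the read is selective, pulling mostly from the history entry the query aligns with.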
蒸馏|知识提取(2篇)
【1】Out-of-Air Computation: Enabling Structured Extraction from Wireless Superposition
标题:空外计算:实现无线叠加的结构化提取
链接:https://arxiv.org/abs/2604.04312
作者:Seyed Mohammad Azimi-Abarghouyi
摘要:空中计算(AirComp)传统上建立在将计算预先嵌入发射波形或利用大规模天线阵列的原理之上,通常要求无线多址信道(MAC)在接近理想计算介质的条件下工作。本文提出一种新的计算框架,称为空外计算(AirCPU),它建立了一种联合信源信道编码基础:计算不是在发射前嵌入,而是利用结构化编码从无线叠加中提取。AirCPU直接对连续值设备数据进行操作,避免了单独的信源量化级,并采用多层嵌套晶格架构,通过将每个输入分解为按层次缩放的分量实现渐进式分辨率,所有分量都在固定功率约束下经由共同的有界数字星座传输。我们形式化了"解耦分辨率"的概念,表明在解码错误概率足够小的工作区间内,信道噪声和有限星座约束对失真的影响可以忽略不计,所产生的计算误差主要由最细晶格设定的目标分辨率决定。对于衰落MAC,除所提出的直接计算外,我们进一步引入集体计算与逐次计算机制,它们利用多个已解码的整数系数函数和边信息函数作为无线叠加的结构化表示,从而显著扩展可靠工作区间;在此背景下,我们建立并刻画了底层的可靠性条件与整数优化问题,并开发了一种结构化的低复杂度两组近似方法来求解它们。
摘要:Over-the-air computation (AirComp) has traditionally been built on the principle of pre-embedding computation into transmitted waveforms or on exploiting massive antenna arrays, often requiring the wireless multiple-access channel (MAC) to operate under conditions that approximate an ideal computational medium. This paper introduces a new computation framework, termed out-of-air computation (AirCPU), which establishes a joint source-channel coding foundation in which computation is not embedded before transmission but is instead extracted from the wireless superposition by exploiting structured coding. AirCPU operates directly on continuous-valued device data, avoiding the need for a separate source quantization stage, and employs a multi-layer nested lattice architecture that enables progressive resolution by decomposing each input into hierarchically scaled components, all transmitted over a common bounded digital constellation under a fixed power constraint. We formalize the notion of decoupled resolution, showing that in operating regimes where the decoding error probability is sufficiently small, the impact of channel noise and finite constellation constraints on distortion becomes negligible, and the resulting computation error is primarily determined by the target resolution set by the finest lattice. For fading MACs, we further introduce collective and successive computation mechanisms, in addition to the proposed direct computation, which exploit multiple decoded integer-coefficient functions and side-information functions as structural representations of the wireless superposition to significantly expand the reliable operating regime; in this context, we formulate and characterize the underlying reliability conditions and integer optimization problems, and develop a structured low-complexity two-group approximation to address them.
【2】Geometric Limits of Knowledge Distillation: A Minimum-Width Theorem via Superposition Theory
标题:知识蒸馏的几何极限:叠加理论的最小宽度定理
链接:https://arxiv.org/abs/2604.04037
作者:Dawar Jyoti Deka,Nilesh Sarkar
摘要:知识蒸馏将大型教师模型压缩成较小的学生模型,但性能会在一个损失下限处饱和,该下限在不同训练方法和目标下持续存在。我们认为这个下限是几何性的:神经网络通过叠加表示的特征数量远超其维度,宽度为$d_S$的学生模型最多可以编码$d_S \cdot g(α)$个特征,其中$g(α) = 1/((1-α)\ln\frac{1}{1-α})$是一个依赖稀疏度的容量函数。超出此预算的特征将被永久丢失,从而产生按重要性加权的损失下限。我们在一个玩具模型(48种配置,中位精度>93%)和Pythia-410M上进行了验证,其中稀疏自编码器在$α\approx 0.992$处测得$F \approx 28{,}700$个特征(临界宽度$d_S^* \approx 1{,}065$)。蒸馏到五种学生宽度证实了预测的单调下限排序。观察到的下限可分解为一个几何分量和一个与宽度无关的架构基线($R^2 = 0.993$)。线性探测显示,即使丢失88%的特征,粗粒度概念也能存活,这表明下限来自重要性分布长尾中细粒度特征的累积损失。我们的结果将表示几何与蒸馏极限联系起来,并提供了一个仅凭SAE测量即可预测蒸馏性能的实用工具。
摘要:Knowledge distillation compresses large teachers into smaller students, but performance saturates at a loss floor that persists across training methods and objectives. We argue this floor is geometric: neural networks represent far more features than dimensions through superposition, and a student of width $d_S$ can encode at most $d_S \cdot g(α)$ features, where $g(α) = 1/((1-α)\ln\frac{1}{1-α})$ is a sparsity-dependent capacity function. Features beyond this budget are permanently lost, yielding an importance-weighted loss floor. We validate on a toy model (48 configurations, median accuracy >93%) and on Pythia-410M, where sparse autoencoders measure $F \approx 28{,}700$ features at $α\approx 0.992$ (critical width $d_S^* \approx 1{,}065$). Distillation into five student widths confirms the predicted monotonic floor ordering. The observed floor decomposes into a geometric component and a width-independent architectural baseline ($R^2 = 0.993$). Linear probing shows coarse concepts survive even 88% feature loss, revealing the floor arises from aggregate loss of fine-grained features in the importance distribution's long tail. Our results connect representation geometry to distillation limits and provide a practical tool for predicting distillation performance from SAE measurements alone.
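The capacity bound is explicit enough to evaluate directly. A sketch of the claimed budget $d_S \cdot g(α)$ and the implied critical width, plugged with the paper's reported $F \approx 28{,}700$ and $α\approx 0.992$; the result lands near the reported $d_S^* \approx 1{,}065$, with the residual gap presumably due to rounding of the published $α$ and $F$:

```python
import math

def capacity(alpha):
    """Sparsity-dependent capacity g(a) = 1 / ((1-a) * ln(1/(1-a)))."""
    return 1.0 / ((1.0 - alpha) * math.log(1.0 / (1.0 - alpha)))

def max_features(d_s, alpha):
    """Claimed upper bound d_S * g(a) on features a width-d_S student encodes."""
    return d_s * capacity(alpha)

def critical_width(n_features, alpha):
    """Smallest width whose feature budget covers all n_features."""
    return n_features / capacity(alpha)

# With the paper's measured values, the critical width comes out near 1.1k.
d_star = critical_width(28700, 0.992)
```

Sparser features (larger $α$) raise the per-dimension capacity, which is why highly sparse superposition lets a model hold far more features than dimensions.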
推荐(1篇)
【1】A Logical-Rule Autoencoder for Interpretable Recommendations
标题:可解释建议的逻辑规则自动编码器
链接:https://arxiv.org/abs/2604.04270
作者:Jinhao Pan,Bowen Wei,Ziwei Zhu
摘要:大多数深度学习推荐模型都像黑箱一样运行,依赖于模糊决策过程的潜在表示。这种内在可解释性的缺乏引起了需要透明度和问责制的应用程序的关注。在这项工作中,我们提出了一个逻辑规则可解释的自动编码器(LIA)的协同过滤,是可解释的设计。LIA引入了一个可学习的逻辑规则层,其中每个规则神经元都配备了一个门参数,可以在训练过程中自动选择AND和OR运算符,使模型能够直接从数据中发现不同的逻辑模式。为了支持功能完整性而不使输入维度加倍,LIA通过连接权重的符号对否定进行编码,提供了一种参数高效的机制来表达每个规则中的正项和负项条件。通过学习明确的、人类可读的重构规则,LIA允许用户直接跟踪每个建议背后的决策过程。大量的实验表明,我们的方法实现了改进的推荐性能,传统的基线,同时保持完全可解释。代码和数据可在https://github.com/weibowen555/LIA上获得。
摘要:Most deep learning recommendation models operate as black boxes, relying on latent representations that obscure their decision process. This lack of intrinsic interpretability raises concerns in applications that require transparency and accountability. In this work, we propose a Logical-rule Interpretable Autoencoder (LIA) for collaborative filtering that is interpretable by design. LIA introduces a learnable logical rule layer in which each rule neuron is equipped with a gate parameter that automatically selects between AND and OR operators during training, enabling the model to discover diverse logical patterns directly from data. To support functional completeness without doubling the input dimensionality, LIA encodes negation through the sign of connection weights, providing a parameter-efficient mechanism for expressing both positive and negated item conditions within each rule. By learning explicit, human-readable reconstruction rules, LIA allows users to directly trace the decision process behind each recommendation. Extensive experiments show that our method achieves improved recommendation performance over traditional baselines while remaining fully interpretable. Code and data are available at https://github.com/weibowen555/LIA.
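The abstract's rule layer can be pictured with a soft-logic sketch: negation carried by the weight sign, and a gate blending a product-style AND with a probabilistic OR. The exact LIA formulation is not given in the abstract, so everything below is an illustrative stand-in:

```python
import numpy as np

def rule_neuron(x, w, gate):
    """One soft logical rule over binary inputs x in {0,1}^d.
    Negation via sign: w_i > 0 uses x_i, w_i < 0 uses (1 - x_i), w_i = 0 ignores i.
    gate in [0,1] blends AND (product of literals) with OR (noisy-or)."""
    mask = np.abs(w) > 0
    lit = np.where(w > 0, x, 1.0 - x)[mask]          # literals, negated by sign
    and_val = np.prod(lit) if lit.size else 1.0
    or_val = 1.0 - np.prod(1.0 - lit) if lit.size else 0.0
    return gate * and_val + (1.0 - gate) * or_val

w = np.array([1.0, -1.0, 0.0])   # rule over (x0 AND NOT x1); x2 ignored
```

The sign trick is what avoids doubling the input dimensionality: each weight encodes both which item participates and whether it appears positively or negated, and a learned `gate` lets training pick the operator per neuron.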
超分辨率|去噪|去模糊|去雾(3篇)
【1】Partially deterministic sampling for compressed sensing with denoising guarantees
标题:具有去噪保证的压缩感知部分确定性采样
链接:https://arxiv.org/abs/2604.04802
作者:Yaniv Plan,Matthew S. Scott,Ozgur Yilmaz
摘要:我们研究采样向量取自酉矩阵行的压缩感知问题。在文献中,这些采样向量通常是随机选取的;随机性的使用推动了该领域在经验和理论上的重大进展。然而在实践中,常常存在某些关键的采样向量,此时从业者会偏离理论而确定性地采样这些行。在本工作中,我们为伯努利选择器推导出一种优化的采样方案,它自然地结合了行的随机选择与确定性选择,从而严格地决定哪些行应被确定性采样。正如我们通过理论结果和数值实验所展示的,与有放回和无放回采样方案相比,这一采样方案在生成先验和稀疏先验下的图像压缩感知中均带来可度量的改进。此外,我们的理论保证相比已有工作具有更优的样本复杂度界,并在该设定下给出了新颖的去噪保证。
摘要:We study compressed sensing when the sampling vectors are chosen from the rows of a unitary matrix. In the literature, these sampling vectors are typically chosen randomly; the use of randomness has enabled major empirical and theoretical advances in the field. However, in practice there are often certain crucial sampling vectors, in which case practitioners will depart from the theory and sample such rows deterministically. In this work, we derive an optimized sampling scheme for Bernoulli selectors which naturally combines random and deterministic selection of rows, thus rigorously deciding which rows should be sampled deterministically. This sampling scheme provides measurable improvements in image compressed sensing for both generative and sparse priors when compared to with-replacement and without-replacement sampling schemes, as we show with theoretical results and numerical experiments. Additionally, our theoretical guarantees feature improved sample complexity bounds compared to previous works, and novel denoising guarantees in this setting.
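The combination of deterministic and random row selection falls out naturally from capped Bernoulli inclusion probabilities: rows whose importance pushes p_i to 1 are always taken, the rest are sampled. This is a generic sketch of that idea, not the paper's optimized scheme, and the weights are invented:

```python
import numpy as np

def bernoulli_select(weights, budget, rng):
    """Bernoulli selectors with inclusion probabilities proportional to
    weights, capped at 1: rows hitting the cap are sampled deterministically,
    the rest at random. The expected number selected equals `budget`."""
    w = np.asarray(weights, float)
    p = np.minimum(1.0, budget * w / w.sum())
    # Redistribute leftover mass from capped rows onto uncapped ones
    # until the expected count matches the budget.
    for _ in range(len(w)):
        free = p < 1.0
        deficit = budget - p.sum()
        if deficit <= 1e-12 or not free.any():
            break
        p[free] = np.minimum(1.0, p[free] + deficit * w[free] / w[free].sum())
    return p, rng.random(len(w)) < p

weights = np.array([10.0, 10.0, 1.0, 1.0, 1.0, 1.0])  # two "crucial" rows
p, selected = bernoulli_select(weights, budget=4, rng=np.random.default_rng(0))
```

Here the two heavy rows saturate at p = 1 and are always acquired, while the remaining budget is spread randomly over the light rows, which is the random/deterministic split the abstract describes.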
【2】TinyNina: A Resource-Efficient Edge-AI Framework for Sustainable Air Quality Monitoring via Intra-Image Satellite Super-Resolution
标题:TinyNina:通过图像内卫星超分辨率进行可持续空气质量监测的资源高效边缘AI框架
链接:https://arxiv.org/abs/2604.04445
作者:Prasanjit Dey,Zachary Yahn,Bianca Schoen-Phelan,Soumyabrata Dev
备注:This manuscript is currently under review at IEEE Access
摘要:二氧化氮(NO$_2$)是一种主要的大气污染物,是呼吸系统疾病和城市气候相关挑战的重要成因。虽然Sentinel-2等卫星平台提供全球覆盖,但其固有空间分辨率往往限制了细粒度NO$_2$评估所需的精度。为此,我们提出TinyNina,一个专为可持续环境监测设计的资源高效边缘AI框架。TinyNina实现了一种新颖的图像内学习范式,利用Sentinel-2的多光谱层次作为内部训练标签,有效消除了对昂贵且往往不可得的外部高分辨率参考数据集的依赖。该框架结合特定波长的注意力门和深度可分离卷积,在保持仅51K参数的超轻量规模的同时保留对污染物敏感的光谱特征。在3,276对匹配的卫星-地面站数据上验证的实验结果表明,TinyNina实现了最先进的平均绝对误差(MAE)7.4 $μ$g/m$^3$。与EDSR和RCAN等高容量模型相比,这一性能意味着计算开销减少95%,推理速度提高47倍。通过优先考虑任务特定效用和架构效率,TinyNina为智慧城市基础设施中的实时空气质量监测提供了可扩展的低延迟解决方案。
摘要:Nitrogen dioxide (NO$_2$) is a primary atmospheric pollutant and a significant contributor to respiratory morbidity and urban climate-related challenges. While satellite platforms like Sentinel-2 provide global coverage, their native spatial resolution often limits the precision required for fine-grained NO$_2$ assessment. To address this, we propose TinyNina, a resource-efficient Edge-AI framework specifically engineered for sustainable environmental monitoring. TinyNina implements a novel intra-image learning paradigm that leverages the multi-spectral hierarchy of Sentinel-2 as internal training labels, effectively eliminating the dependency on costly and often unavailable external high-resolution reference datasets. The framework incorporates wavelength-specific attention gates and depthwise separable convolutions to preserve pollutant-sensitive spectral features while maintaining an ultra-lightweight footprint of only 51K parameters. Experimental results, validated against 3,276 matched satellite-ground station pairs, demonstrate that TinyNina achieves a state-of-the-art Mean Absolute Error (MAE) of 7.4 $μ$g/m$^3$. This performance represents a 95% reduction in computational overhead and 47$\times$ faster inference compared to high-capacity models such as EDSR and RCAN. By prioritizing task-specific utility and architectural efficiency, TinyNina provides a scalable, low-latency solution for real-time air quality monitoring in smart city infrastructures.
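The parameter economy of depthwise separable convolutions, one of the two ingredients behind TinyNina's 51K-parameter footprint, is easy to verify by counting weights (the 64-channel 3x3 layer below is illustrative, not TinyNina's actual configuration):

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k filter per input channel, then 1x1 pointwise mixing."""
    return c_in * k * k + c_in * c_out

# Illustrative layer: 64 -> 64 channels, 3x3 kernel.
std = conv_params(64, 64, 3)                 # 64 * 64 * 9
sep = depthwise_separable_params(64, 64, 3)  # 64 * 9 + 64 * 64
```

For this layer the separable form needs roughly an eighth of the weights, which is how such architectures shed most of their parameters while retaining per-channel spatial filtering.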
【3】NAIMA: Semantics Aware RGB Guided Depth Super-Resolution
标题:NAIMA:语义感知的RGB引导深度超分辨率
链接:https://arxiv.org/abs/2604.04407
作者:Tayyab Nasir,Daochang Liu,Ajmal Mian
摘要:引导深度超分辨率(GDSR)是一种用于深度图超分辨率的多模式方法,其依赖于低分辨率深度图和高分辨率RGB图像来恢复更精细的结构细节。然而,RGB图像中指示深度不连续性的误导性颜色和纹理线索通常导致所生成的深度图中的伪影和模糊的深度边界。我们提出了一种解决方案,引入全局上下文语义先验,从预训练的Vision Transformer令牌嵌入生成。我们从预训练的令牌嵌入中提取语义知识的方法是基于它们在相关单目深度估计任务中所表现出的有效性。我们引入了一个引导令牌注意力(GTA)模块,它迭代地将编码的RGB空间特征与深度编码对齐,使用交叉注意力选择性地注入从预训练的Vision Transformer的不同层提取的全局语义上下文。此外,我们提出了一种称为隐式多标记对齐神经注意力(NAIMA)的架构,该架构将DINOv2与GTA块集成为语义感知的GDSR。我们提出的架构,其提取语义知识的能力,在多个缩放因子和数据集上实现了对现有方法的显着改进。
摘要:Guided depth super-resolution (GDSR) is a multi-modal approach for depth map super-resolution that relies on a low-resolution depth map and a high-resolution RGB image to restore finer structural details. However, the misleading color and texture cues indicating depth discontinuities in RGB images often lead to artifacts and blurred depth boundaries in the generated depth map. We propose a solution that introduces global contextual semantic priors, generated from pretrained vision transformer token embeddings. Our approach to distilling semantic knowledge from pretrained token embeddings is motivated by their demonstrated effectiveness in related monocular depth estimation tasks. We introduce a Guided Token Attention (GTA) module, which iteratively aligns encoded RGB spatial features with depth encodings, using cross-attention for selectively injecting global semantic context extracted from different layers of a pretrained vision transformer. Additionally, we present an architecture called Neural Attention for Implicit Multi-token Alignment (NAIMA), which integrates DINOv2 with GTA blocks for a semantics-aware GDSR. Our proposed architecture, with its ability to distill semantic knowledge, achieves significant improvements over existing methods across multiple scaling factors and datasets.
自动驾驶|车辆|车道检测等(5篇)
【1】Multi-Modal Sensor Fusion using Hybrid Attention for Autonomous Driving
标题:利用混合注意力实现自动驾驶的多模式传感器融合
链接:https://arxiv.org/abs/2604.04797
作者:Mayank Mayank,Bharanidhar Duraisamy,Florian Geiß,Abhinav Valada
备注:9 pages, 8 figures
摘要:自动驾驶的精确3D物体检测需要互补传感器。相机提供密集的语义但深度不可靠,而毫米波雷达提供精确的距离和速度测量,但几何信息稀疏。我们提出了MMF-BEV,这是一种雷达-相机BEV融合框架,它利用可变形注意力在View-of-Delft(VoD)4D雷达数据集上进行跨模态特征对齐[1]。MMF-BEV构建BEVDepth [2]相机分支和RadarBEVNet [3]雷达分支,每个分支都使用可变形自注意力进行增强,并通过可变形交叉注意力模块将它们融合。我们评估了三种配置:仅相机、仅雷达和混合融合。传感器贡献分析量化了各距离段的模态加权,为传感器互补性提供了可解释的证据。两阶段训练策略(使用深度监督预训练相机分支,然后联合训练雷达和融合模块)可稳定学习。VoD实验表明,MMF-BEV始终优于单模态基线,并在完整标注区域和近距离感兴趣区域的所有对象类别中取得了与先前融合方法相比具有竞争力的结果。
摘要:Accurate 3D object detection for autonomous driving requires complementary sensors. Cameras provide dense semantics but unreliable depth, while millimeter-wave radar offers precise range and velocity measurements with sparse geometry. We propose MMF-BEV, a radar-camera BEV fusion framework that leverages deformable attention for cross-modal feature alignment on the View-of-Delft (VoD) 4D radar dataset [1]. MMF-BEV builds a BEVDepth [2] camera branch and a RadarBEVNet [3] radar branch, each enhanced with Deformable Self-Attention, and fuses them via a Deformable Cross-Attention module. We evaluate three configurations: camera-only, radar-only, and hybrid fusion. A sensor contribution analysis quantifies per-distance modality weighting, providing interpretable evidence of sensor complementarity. A two-stage training strategy (pre-training the camera branch with depth supervision, then jointly training the radar and fusion modules) stabilizes learning. Experiments on VoD show that MMF-BEV consistently outperforms unimodal baselines and achieves competitive results against prior fusion methods across all object classes in both the full annotated area and the near-range Region of Interest.
【2】Intelligent Traffic Monitoring with YOLOv11: A Case Study in Real-Time Vehicle Detection
标题:利用YOLOv11进行智能交通监控:实时车辆检测的案例研究
链接:https://arxiv.org/abs/2604.04080
作者:Shkelqim Sherifi
备注:2025 International Conference on Computer and Applications (ICCA)
摘要:由人工智能驱动的计算机视觉的最新进展显著增强了监控系统。一个值得注意的应用是交通监控,它利用计算机视觉以及基于深度学习的对象检测和计数。我们提出了一个离线的实时交通监控系统,该系统将预训练的YOLOv11检测器与BoT-SORT/ByteTrack相结合,用于多对象跟踪,在PyTorch/OpenCV中实现,并包装在基于Qt的桌面UI中。CNN管道能够从视频流中进行高效的车辆检测和计数,而无需依赖云。在不同的场景下,该系统实现了66.67-95.83%的计数精度。各类别检测具有高精确率(汽车:0.97-1.00;卡车:1.00)和强召回率(汽车:0.82-1.00;卡车:0.70-1.00),F1分数为汽车0.90-1.00、卡车0.82-1.00。虽然恶劣的天气条件可能会对这一性能产生负面影响,但在典型条件下,结果仍然稳健。通过将轻量级模型与可访问的、独立于云的界面相集成,本文通过展示人工智能驱动的交通监控系统的能力,为未来智慧城市的现代化和发展做出了贡献。
摘要:Recent advancements in computer vision, driven by artificial intelligence, have significantly enhanced monitoring systems. One notable application is traffic monitoring, which leverages computer vision alongside deep learning-based object detection and counting. We present an offline, real-time traffic monitoring system that couples a pre-trained YOLOv11 detector with BoT-SORT/ByteTrack for multi-object tracking, implemented in PyTorch/OpenCV and wrapped in a Qt-based desktop UI. The CNN pipeline enables efficient vehicle detection and counting from video streams without cloud dependencies. Across diverse scenes, the system achieves 66.67-95.83% counting accuracy. Class-wise detection yields high precision (cars: 0.97-1.00; trucks: 1.00) with strong recall (cars: 0.82-1.00; trucks: 0.70-1.00), resulting in F1 scores of 0.90-1.00 for cars and 0.82-1.00 for trucks. While adverse weather conditions may negatively impact this performance, results remain robust in typical conditions. By integrating lightweight models with an accessible, cloud-independent interface, this paper contributes to the modernization and development of future smart cities by showing the capacity of AI-driven traffic monitoring systems.
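The reported F1 ranges follow from the precision/recall ranges via the harmonic mean; for instance, the endpoints of the truck ranges reproduce the stated 0.82-1.00 band:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Endpoints of the truck ranges reported in the abstract:
# precision 1.00 with recall 0.70-1.00 gives F1 of roughly 0.82-1.00.
print(round(f1(1.00, 0.70), 2))  # 0.82
print(round(f1(1.00, 1.00), 2))  # 1.0
```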
【3】Spatiotemporal-Aware Bit-Flip Injection on DNN-based Advanced Driver Assistance Systems
标题:基于DNN的高级驾驶员辅助系统上的时空感知位翻转注入
链接:https://arxiv.org/abs/2604.03753
作者:Taibiao Zhao,Xiang Zhang,Mingxuan Sun,Ruyi Ding,Xugui Zhou
摘要:现代高级驾驶辅助系统(ADAS)依赖深度神经网络(DNN)进行感知和规划。由于DNN的参数在推断期间驻留在DRAM中,因此由宇宙辐射或低电压操作引起的位翻转可能会破坏DNN计算,扭曲驾驶决策,并导致现实世界的事故。本文提出了一种时空感知故障注入(STAFI)框架,以高效定位ADAS DNN中的关键故障点。在空间上,我们提出了一种渐进式度量引导位搜索(PMBS),它能高效识别那些损坏后会导致驾驶行为最大偏差(例如,意外加速或转向)的关键网络权重位。此外,我们开发了一个关键故障时间识别(CFTI)机制,在考虑实时系统和环境状态上下文的情况下确定何时触发这些故障,以最大化对安全性的影响。在量产ADAS的DNN上进行的实验表明,STAFI发现的危险诱导关键故障比最强基线多29.56倍。
摘要:Modern advanced driver assistance systems (ADAS) rely on deep neural networks (DNNs) for perception and planning. Since DNNs' parameters reside in DRAM during inference, bit flips caused by cosmic radiation or low-voltage operation may corrupt DNN computations, distort driving decisions, and lead to real-world incidents. This paper presents a SpatioTemporal-Aware Fault Injection (STAFI) framework to locate critical fault sites in DNNs for ADAS efficiently. Spatially, we propose a Progressive Metric-guided Bit Search (PMBS) that efficiently identifies critical network weight bits whose corruption causes the largest deviations in driving behavior (e.g., unintended acceleration or steering). Furthermore, we develop a Critical Fault Time Identification (CFTI) mechanism that determines when to trigger these faults, taking into account the context of real-time systems and environmental states, to maximize the safety impact. Experiments on DNNs for a production ADAS demonstrate that STAFI uncovers 29.56x more hazard-inducing critical faults than the strongest baseline.
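The underlying fault model (a single DRAM bit flip in a stored float32 weight) is easy to reproduce with bit manipulation. A minimal sketch of the corruption itself; which positions count as critical is exactly what STAFI searches for:

```python
import struct

# Flip a single bit in the IEEE-754 float32 encoding of a weight,
# mimicking a DRAM bit flip. Bit 0 is the mantissa LSB, bit 31 the sign.

def flip_bit(weight, bit):
    """Return `weight` with one bit of its float32 encoding flipped."""
    (packed,) = struct.unpack("<I", struct.pack("<f", weight))
    (flipped,) = struct.unpack("<f", struct.pack("<I", packed ^ (1 << bit)))
    return flipped

w = 0.5
print(flip_bit(w, 31))  # sign flip: -0.5
print(flip_bit(w, 30))  # exponent MSB flip: 2**127, a catastrophic blow-up
print(flip_bit(w, 0))   # mantissa LSB flip: ~0.50000006, effectively benign
```

The enormous spread between high-order and low-order flips is why a guided search over bit positions (rather than random injection) pays off.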
【4】Super Agents and Confounders: Influence of surrounding agents on vehicle trajectory prediction
标题:超级智能体与混杂因素:周围智能体对车辆轨迹预测的影响
链接:https://arxiv.org/abs/2604.03463
作者:Daniel Jost,Luca Paparusso,Martin Stoll,Jörg Wagner,Raghu Rajan,Joschka Bödecker
摘要:在高度交互的驾驶场景中,轨迹预测以来自周围交通参与者(如汽车和行人)的信息为条件。我们的主要贡献是对最先进的轨迹预测器进行了全面分析,揭示了一个令人惊讶的关键缺陷:许多周围智能体反而降低了预测准确性,而不是提高它。使用基于Shapley的归因,我们严格证明了模型学习到不稳定且非因果的决策方案,这些方案在不同训练运行之间差异很大。基于这些见解,我们提出整合条件信息瓶颈(CIB),它不需要额外的监督,并且经过训练可以有效地压缩智能体特征,并忽略那些对预测任务无益的特征。使用多个数据集和模型架构的综合实验表明,这种简单而有效的方法不仅在许多情况下提高了整体轨迹预测性能,而且还提高了对不同扰动的鲁棒性。我们的研究结果突出了在轨迹预测中有选择地整合上下文信息的重要性,这些信息通常包含虚假或误导性的信号。此外,我们提供了可解释的指标来识别非鲁棒行为,并提出了一条有前景的解决路径。
摘要:In highly interactive driving scenes, trajectory prediction is conditioned on information from surrounding traffic participants such as cars and pedestrians. Our main contribution is a comprehensive analysis of state-of-the-art trajectory predictors, which reveals a surprising and critical flaw: many surrounding agents degrade prediction accuracy rather than improve it. Using Shapley-based attribution, we rigorously demonstrate that models learn unstable and non-causal decision-making schemes that vary significantly across training runs. Building on these insights, we propose to integrate a Conditional Information Bottleneck (CIB), which does not require additional supervision and is trained to effectively compress agent features as well as ignore those that are not beneficial for the prediction task. Comprehensive experiments using multiple datasets and model architectures demonstrate that this simple yet effective approach not only improves overall trajectory prediction performance in many cases but also increases robustness to different perturbations. Our results highlight the importance of selectively integrating contextual information, which can often contain spurious or misleading signals, in trajectory prediction. Moreover, we provide interpretable metrics for identifying non-robust behavior and present a promising avenue towards a solution.
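Shapley-based attribution, as used in the paper's analysis, assigns each surrounding agent its average marginal contribution over all coalitions. A toy exact computation with an invented value function (standing in for the predictor's accuracy as a function of which agents are visible):

```python
from itertools import combinations
from math import factorial

# Exact Shapley values over a toy coalition value function. The agents
# and payoffs are made up for illustration; in the paper the "value" of a
# coalition would come from the trajectory predictor itself.

def shapley(players, value):
    n = len(players)
    phi = {}
    for p in players:
        rest = [q for q in players if q != p]
        total = 0.0
        for r in range(len(rest) + 1):
            for coal in combinations(rest, r):
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += weight * (value(set(coal) | {p}) - value(set(coal)))
        phi[p] = total
    return phi

# Toy value: "lead car" contributes +1.0, "pedestrian" +0.5, and the pair
# adds an interaction bonus of +0.2.
def value(coalition):
    v = 0.0
    if "lead car" in coalition: v += 1.0
    if "pedestrian" in coalition: v += 0.5
    if {"lead car", "pedestrian"} <= coalition: v += 0.2
    return v

phi = shapley(["lead car", "pedestrian"], value)
print(phi)  # interaction bonus split equally: lead car ~1.1, pedestrian ~0.6
```

Agents with near-zero or negative Shapley values are the ones the paper flags as degrading (rather than helping) the prediction.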
【5】SDVDiag: Using Context-Aware Causality Mining for the Diagnosis of Connected Vehicle Functions
标题:SDVDiag:使用上下文感知因果关系挖掘来诊断互联车辆功能
链接:https://arxiv.org/abs/2604.03391
作者:Matthias Weiß,Falk Dettinger,Elias Detrois,Nasser Jazdi,Michael Weyrich
备注:7 pages, 4 figures, to be submitted to the VTC2026
摘要:联网车辆功能的实际实现正在稳步扩展,但由于其分布式特性以及底层云、边缘和网络基础设施的复杂性,可靠地运行这些功能仍然具有挑战性。快速诊断问题并了解导致故障的错误链对于减少停机时间至关重要。然而,对这些系统的诊断仍然主要由人工执行,因为自动化分析技术主要是数据驱动的,难以处理隐藏的关系和上下文信息的集成。本文通过引入多模态方法来解决这一差距,该方法将人类反馈和系统特定信息集成到因果分析过程中。采用基于人类反馈的强化学习,在结合专家知识的同时持续训练因果关系挖掘模型。其他模块利用分布式追踪数据来修剪假阳性因果链接,并支持注入特定领域的关系,以进一步细化因果图。评估使用在联网车辆测试场中运行的自动代客泊车应用程序进行。结果表明,与纯粹的数据驱动方法相比,因果边检测的精确率从14%显著提高到100%,并提高了系统的可解释性,突出了其对联网车辆领域系统运营商的潜力。
摘要:Real-world implementations of connected vehicle functions are spreading steadily, yet operating these functions reliably remains challenging due to their distributed nature and the complexity of the underlying cloud, edge, and networking infrastructure. Quick diagnosis of problems and understanding the error chains that lead to failures is essential for reducing downtime. However, diagnosing these systems is still largely performed manually, as automated analysis techniques are predominantly data-driven and struggle with hidden relationships and the integration of context information. This paper addresses this gap by introducing a multimodal approach that integrates human feedback and system-specific information into the causal analysis process. Reinforcement Learning from Human Feedback is employed to continuously train a causality mining model while incorporating expert knowledge. Additional modules leverage distributed tracing data to prune false-positive causal links and enable the injection of domain-specific relationships to further refine the causal graph. Evaluation is performed using an automated valet parking application operated in a connected vehicle test field. Results demonstrate a significant increase in precision from 14% to 100% for the detection of causal edges and improved system interpretability compared to purely data-driven approaches, highlighting the potential for system operators in the connected vehicle domain.
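The trace-based pruning module can be sketched as a filter that keeps only those mined causal edges actually observed as call steps in recorded traces. The service names and traces below are invented for illustration, not SDVDiag's data model:

```python
# Prune false-positive causal links using distributed-tracing evidence:
# a mined edge A -> B is kept only if some recorded trace shows a direct
# call from A to B.

def observed_edges(traces):
    """Collect directed caller -> callee pairs seen in any trace."""
    seen = set()
    for call_chain in traces:
        seen.update(zip(call_chain, call_chain[1:]))
    return seen

def prune(mined_edges, traces):
    supported = observed_edges(traces)
    return [e for e in mined_edges if e in supported]

mined = [("app", "edge-gw"), ("edge-gw", "cloud-api"), ("app", "cloud-api")]
traces = [["app", "edge-gw", "cloud-api"],
          ["app", "edge-gw"]]

print(prune(mined, traces))  # the direct app -> cloud-api edge is pruned
```

Only statistically mined links with no supporting call path are dropped; domain-specific edges could then be injected on top of the surviving graph.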
点云|SLAM|雷达|激光|深度RGBD相关(1篇)
【1】Hybrid Fourier Neural Operator for Surrogate Modeling of Laser Processing with a Quantum-Circuit Mixer
标题:带量子电路混合器的激光加工替代建模混合傅里叶神经算子
链接:https://arxiv.org/abs/2604.04828
作者:Mateusz Papierz,Asel Sagingalieva,Alix Benoit,Toni Ivas,Elia Iseli,Alexey Melnikov
备注:24 pages, 10 figures, 6 tables
摘要:数据驱动的替代模型可以取代参数偏微分方程昂贵的多物理场求解器,但为三维问题构建紧凑、准确的神经算子仍然具有挑战性:在傅立叶神经算子中,密集的逐模式频谱通道混合与保留的傅立叶模式数量呈线性关系,使参数量膨胀并限制实时部署。我们介绍HQ-LP-FNO,一种混合量子经典FNO,它用一个紧凑的、模式共享的变分量子电路混合器取代这些密集频谱块中可配置的一部分,其参数量与傅立叶模式预算无关。一个参数匹配的经典瓶颈对照经过协同设计,提供了严格的评估框架。在高能激光加工(耦合传热、熔池对流、自由表面变形、相变)的三维替代建模上评估,HQ-LP-FNO相对于经典基线减少了15.6%的可训练参数,同时将相分数平均绝对误差降低26%,并将相对温度MAE从2.89%降至2.56%。对量子通道预算的扫描显示,适度的VQC分配在所有测试配置(包括完全经典的基线)中产生最佳的温度指标,指向最佳的经典-量子划分。消融实验证实,由VQC通过其紧凑的电路结构自然实现的模式共享混合,是这些改进的主要贡献者。在ibm-torino后端校准噪声下的噪声模拟器研究证实了量子混合器在测试范围内的数值稳定性。这些结果表明,基于VQC的参数高效频谱混合可以改善复杂多物理问题的神经算子替代模型,并为实践中的混合量子算子学习建立了受控评估协议。
摘要:Data-driven surrogates can replace expensive multiphysics solvers for parametric PDEs, yet building compact, accurate neural operators for three-dimensional problems remains challenging: in Fourier Neural Operators, dense mode-wise spectral channel mixing scales linearly with the number of retained Fourier modes, inflating parameter counts and limiting real-time deployability. We introduce HQ-LP-FNO, a hybrid quantum-classical FNO that replaces a configurable fraction of these dense spectral blocks with a compact, mode-shared variational quantum circuit mixer whose parameter count is independent of the Fourier mode budget. A parameter-matched classical bottleneck control is co-designed to provide a rigorous evaluation framework. Evaluated on three-dimensional surrogate modeling of high-energy laser processing, coupling heat transfer, melt-pool convection, free-surface deformation, and phase change, HQ-LP-FNO reduces trainable parameters by 15.6% relative to a classical baseline while lowering phase-fraction mean absolute error by 26% and relative temperature MAE from 2.89% to 2.56%. A sweep over the quantum-channel budget reveals that a moderate VQC allocation yields the best temperature metrics across all tested configurations, including the fully classical baseline, pointing toward an optimal classical-quantum partitioning. The ablation confirms that mode-shared mixing, naturally implemented by the VQC through its compact circuit structure, is the dominant contributor to these improvements. A noisy-simulator study under backend-calibrated noise from ibm-torino confirms numerical stability of the quantum mixer across the tested shot range. These results demonstrate that VQC-based parameter-efficient spectral mixing can improve neural operator surrogates for complex multiphysics problems and establish a controlled evaluation protocol for hybrid quantum operator learning in practice.
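The scaling argument in the abstract's first sentence is easy to make concrete: dense per-mode complex channel mixing grows linearly with the number of retained modes, while a mode-shared mixer's parameter count stays flat. The channel width, mode budgets, and mixer size below are illustrative, not the paper's:

```python
# Parameter scaling: dense mode-wise spectral mixing vs. a mode-shared mixer.
# Sizes are illustrative, not taken from HQ-LP-FNO.

def dense_spectral_params(modes, channels):
    """One complex channels x channels mixing matrix per retained mode."""
    return modes * channels * channels * 2  # x2 for real + imaginary parts

def mode_shared_params(n_mixer_params, modes):
    """A shared mixer (e.g., a VQC) reused across all modes."""
    return n_mixer_params  # independent of `modes`

for m in (8, 16, 32):
    print(m, dense_spectral_params(m, 32), mode_shared_params(200, m))
# Dense mixing doubles with the mode budget; the shared mixer does not.
```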
联邦学习|隐私保护|加密(2篇)
【1】SecureAFL: Secure Asynchronous Federated Learning
标题:SecureAFL:安全异步联邦学习
链接:https://arxiv.org/abs/2604.03862
作者:Anjun Gao,Feng Wang,Zhenglin Wan,Yueyang Quan,Zhuqing Liu,Minghong Fang
备注:To appear in ACM AsiaCCS 2026
摘要:联邦学习(FL)使多个客户端能够通过服务器协作训练全局机器学习模型,而无需共享其私有训练数据。在传统的FL中,系统遵循同步方法,服务器在聚合以更新全局模型之前需等待来自众多客户端的模型更新。然而,同步FL受到掉队者问题的阻碍。为了解决这个问题,异步FL架构允许服务器在接收到任何客户端的本地模型更新后立即更新全局模型。尽管异步FL具有优势,但其去中心化的性质使其容易受到中毒攻击。已经提出了几种针对异步FL的防御措施,但这些机制仍然容易受到高级攻击,或依赖于不切实际的服务器假设。在本文中,我们介绍了SecureAFL,一个旨在保护异步FL免受中毒攻击的创新框架。SecureAFL通过检测和丢弃异常更新,同时估计缺失客户端的贡献,提高了异步FL的鲁棒性。此外,它利用拜占庭鲁棒聚合技术(如逐坐标中位数)来整合收到的和估计的更新。在各种真实数据集上的大量实验证明了SecureAFL的有效性。
摘要:Federated learning (FL) enables multiple clients to collaboratively train a global machine learning model via a server without sharing their private training data. In traditional FL, the system follows a synchronous approach, where the server waits for model updates from numerous clients before aggregating them to update the global model. However, synchronous FL is hindered by the straggler problem. To address this, the asynchronous FL architecture allows the server to update the global model immediately upon receiving any client's local model update. Despite its advantages, the decentralized nature of asynchronous FL makes it vulnerable to poisoning attacks. Several defenses tailored for asynchronous FL have been proposed, but these mechanisms remain susceptible to advanced attacks or rely on unrealistic server assumptions. In this paper, we introduce SecureAFL, an innovative framework designed to secure asynchronous FL against poisoning attacks. SecureAFL improves the robustness of asynchronous FL by detecting and discarding anomalous updates while estimating the contributions of missing clients. Additionally, it utilizes Byzantine-robust aggregation techniques, such as coordinate-wise median, to integrate the received and estimated updates. Extensive experiments on various real-world datasets demonstrate the effectiveness of SecureAFL.
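Coordinate-wise median, the Byzantine-robust aggregation rule named in the abstract, fits in a few lines; with a single poisoned update among benign ones, no coordinate can be dragged far from the benign values. Updates here are plain lists standing in for flattened model deltas:

```python
from statistics import median

# Coordinate-wise median aggregation: take the median independently in
# each coordinate across client updates.

def coordinate_wise_median(updates):
    return [median(coord) for coord in zip(*updates)]

benign = [[0.10, -0.20, 0.30],
          [0.12, -0.18, 0.29],
          [0.09, -0.22, 0.31]]
poisoned = benign + [[100.0, 100.0, -100.0]]  # one malicious client

print(coordinate_wise_median(poisoned))  # ~[0.11, -0.19, 0.295]
```

Compare with a plain mean, which the single outlier would shift by roughly 25 in every coordinate.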
【2】BlazeFL: Fast and Deterministic Federated Learning Simulation
标题:BlazeFL:快速且确定性的联邦学习模拟
链接:https://arxiv.org/abs/2604.03606
作者:Kitsuya Azuma,Takayuki Nishio
备注:9 pages, 4 figures. Accepted to the FedVision Workshop at CVPR 2026 (CVPRW)
摘要:联邦学习(FL)研究越来越依赖于具有数百或数千个虚拟客户端的单节点模拟,这使得效率和可重复性至关重要。然而,并行客户端训练通常通过共享随机状态和调度可变性引入不确定性,迫使研究人员以吞吐量换取可重复性,或在复杂框架内实现自定义控制逻辑。我们提出了BlazeFL,一个轻量级的单节点FL模拟框架,通过自由线程共享内存执行和确定性随机性管理来缓解这种权衡。BlazeFL使用基于线程的并行性,在服务器和客户端之间进行内存内参数交换,避免了序列化和进程间通信开销。为了支持确定性执行,BlazeFL将隔离的随机数生成器(RNG)流分配给各客户端。在固定的软件/硬件栈下,当随机操作符使用BlazeFL管理的生成器时,这种设计在基于线程和基于进程两种模式下重复高并发运行时都产生按位相同的结果。在CIFAR-10图像分类实验中,相对于广泛使用的开源基线,BlazeFL大大减少了执行时间,在通信主导的工作负载上实现了高达3.1倍的加速,同时保留了轻量级的依赖关系。我们的开源实现可以在https://github.com/kitsuyaazuma/blazefl上找到。
摘要:Federated learning (FL) research increasingly relies on single-node simulations with hundreds or thousands of virtual clients, making both efficiency and reproducibility essential. Yet parallel client training often introduces nondeterminism through shared random state and scheduling variability, forcing researchers to trade throughput for reproducibility or to implement custom control logic within complex frameworks. We present BlazeFL, a lightweight framework for single-node FL simulation that alleviates this trade-off through free-threaded shared-memory execution and deterministic randomness management. BlazeFL uses thread-based parallelism with in-memory parameter exchange between the server and clients, avoiding serialization and inter-process communication overhead. To support deterministic execution, BlazeFL assigns isolated random number generator (RNG) streams to clients. Under a fixed software/hardware stack, and when stochastic operators consume BlazeFL-managed generators, this design yields bitwise-identical results across repeated high-concurrency runs in both thread-based and process-based modes. In CIFAR-10 image-classification experiments, BlazeFL substantially reduces execution time relative to a widely used open-source baseline, achieving up to 3.1$\times$ speedup on communication-dominated workloads while preserving a lightweight dependency footprint. Our open-source implementation is available at: https://github.com/kitsuyaazuma/blazefl.
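The isolated-RNG-stream idea can be illustrated in plain Python: if every simulated client draws only from its own seeded generator, results are identical regardless of how many worker threads execute them. A simplified sketch of the mechanism, not BlazeFL's actual API:

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Deterministic parallel simulation via per-client RNG streams: each
# client consumes randomness only from its own generator, so the outcome
# is independent of thread scheduling.

def make_client_rngs(seed, num_clients):
    return [random.Random(seed * 1_000_003 + cid) for cid in range(num_clients)]

def client_work(rng):
    # Stand-in for local training: consume some randomness.
    return [rng.random() for _ in range(3)]

def run(seed, num_clients, workers):
    rngs = make_client_rngs(seed, num_clients)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(client_work, rngs))

# Same seed, different concurrency levels -> bitwise-identical results.
print(run(42, 8, workers=1) == run(42, 8, workers=8))  # True
```

A shared global generator would break this: the interleaving of draws would then depend on which thread runs first.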
推理|分析|理解|解释(18篇)
【1】Early Stopping for Large Reasoning Models via Confidence Dynamics
标题:通过置信动力学实现大型推理模型的早期停止
链接:https://arxiv.org/abs/2604.04930
作者:Parsa Hosseini,Sumit Nawathe,Mahdi Salmani,Meisam Razaviyayn,Soheil Feizi
摘要:Large reasoning models rely on long chain-of-thought generation to solve complex problems, but extended reasoning often incurs substantial computational cost and can even degrade performance due to overthinking. A key challenge is determining when the model should stop reasoning and produce the final answer. In this work, we study the confidence of intermediate answers during reasoning and observe two characteristic behaviors: correct reasoning trajectories often reach high-confidence answers early, while incorrect rollouts tend to produce long, unproductive reasoning traces and exhibit less reliable confidence dynamics. Motivated by these observations, we propose CoDE-Stop (Confidence Dynamics Early Stop), an early stopping method that leverages the dynamics of intermediate answer confidence to decide when to terminate reasoning, requiring no additional training and easily integrating into existing models. We evaluate CoDE-Stop on diverse reasoning and science benchmarks across multiple models. Compared to prior early stopping methods, it achieves a more favorable accuracy-compute tradeoff and reduces total token usage by 25-50% compared to standard full-length reasoning. In addition, we provide analyses of confidence dynamics during reasoning, offering insights into how confidence changes in both correct and incorrect trajectories.
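A minimal version of confidence-based early stopping terminates once intermediate-answer confidence stays high for a few consecutive steps. The abstract does not specify CoDE-Stop's exact decision rule; the threshold and patience values below are illustrative:

```python
# Threshold-plus-patience early stopping on intermediate-answer
# confidence, in the spirit of CoDE-Stop. The rule, threshold, and
# patience are illustrative, not the paper's.

def should_stop(confidences, threshold=0.9, patience=2):
    """Return the step index at which reasoning can stop, else None."""
    streak = 0
    for step, c in enumerate(confidences):
        streak = streak + 1 if c >= threshold else 0
        if streak >= patience:
            return step
    return None  # run to full length

correct_like = [0.3, 0.7, 0.92, 0.95, 0.96, 0.97]  # early, stable confidence
incorrect_like = [0.2, 0.5, 0.4, 0.6, 0.5, 0.55]   # never stabilizes

print(should_stop(correct_like), should_stop(incorrect_like))  # 3 None
```

This mirrors the abstract's observation: correct trajectories reach stable high confidence early and can be cut short, while incorrect ones keep reasoning without triggering the stop.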
【2】Are Latent Reasoning Models Easily Interpretable?
标题:潜在推理模型容易解释吗?
链接:https://arxiv.org/abs/2604.04902
作者:Connor Dilgren,Sarah Wiegreffe
备注:Preprint
摘要:Latent reasoning models (LRMs) have attracted significant research interest due to their low inference cost (relative to explicit reasoning models) and theoretical ability to explore multiple reasoning paths in parallel. However, these benefits come at the cost of reduced interpretability: LRMs are difficult to monitor because they do not reason in natural language. This paper presents an investigation into LRM interpretability by examining two state-of-the-art LRMs. First, we find that latent reasoning tokens are often unnecessary for LRMs' predictions; on logical reasoning datasets, LRMs can almost always produce the same final answers without using latent reasoning at all. This underutilization of reasoning tokens may partially explain why LRMs do not consistently outperform explicit reasoning methods and raises doubts about the stated role of these tokens in prior work. Second, we demonstrate that when latent reasoning tokens are necessary for performance, we can decode gold reasoning traces up to 65-93% of the time for correctly predicted instances. This suggests LRMs often implement the expected solution rather than an uninterpretable reasoning process. Finally, we present a method to decode a verified natural language reasoning trace from latent tokens without knowing a gold reasoning trace a priori, demonstrating that it is possible to find a verified trace for a majority of correct predictions but only a minority of incorrect predictions. Our findings highlight that current LRMs largely encode interpretable processes, and interpretability itself can be a signal of prediction correctness.
【3】Noise Immunity in In-Context Tabular Learning: An Empirical Robustness Analysis of TabPFN's Attention Mechanisms
标题:上下文表格学习中的抗噪性:TabPFN注意力机制的实证鲁棒性分析
链接:https://arxiv.org/abs/2604.04868
作者:James Hu,Mahdi Ghelichi
摘要:Tabular foundation models (TFMs) such as TabPFN (Tabular Prior-Data Fitted Network) are designed to generalize across heterogeneous tabular datasets through in-context learning (ICL). They perform prediction in a single forward pass conditioned on labeled examples without dataset-specific parameter updates. This paradigm is particularly attractive in industrial domains (e.g., finance and healthcare) where tabular prediction is pervasive. Retraining a bespoke model for each new table can be costly or infeasible in these settings, while data quality issues such as irrelevant predictors, correlated feature groups, and label noise are common. In this paper, we provide strong empirical evidence that TabPFN is highly robust under these sub-optimal conditions. We study TabPFN and its attention mechanisms for binary classification problems with controlled synthetic perturbations that vary: (i) dataset width by injecting random uncorrelated features and by introducing nonlinearly correlated features, (ii) dataset size by increasing the number of training rows, and (iii) label quality by increasing the fraction of mislabeled targets. Beyond predictive performance, we analyze internal signals including attention concentration and attention-based feature ranking metrics. Across these parametric tests, TabPFN is remarkably resilient: ROC-AUC remains high, attention stays structured and sharp, and informative features are highly ranked by attention-based metrics. Qualitative visualizations with attention heatmaps, feature-token embeddings, and SHAP plots further support a consistent pattern across layers in which TabPFN increasingly concentrates on useful features while separating their signals from noise. Together, these findings suggest that TabPFN is a robust TFM capable of maintaining both predictive performance and coherent internal behavior under various scenarios of data imperfections.
【4】Cog-DRIFT: Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems
标题:Cog-DRIFT:探索自适应重构的实例,使模型能够从困难推理问题中学习
链接:https://arxiv.org/abs/2604.04767
作者:Justin Chih-Yao Chen,Archiki Prasad,Zaid Khan,Joykirat Singh,Runchu Tian,Elias Stengel-Eskin,Mohit Bansal
备注:22 pages, 4 figures. Code: https://github.com/dinobby/Cog-DRIFT
摘要:Reinforcement learning from verifiable rewards (RLVR) has improved the reasoning abilities of LLMs, yet a fundamental limitation remains: models cannot learn from problems that are too difficult to solve under their current policy, as these yield no meaningful reward signal. We propose a simple yet effective solution based on task reformulation. We transform challenging open-ended problems into cognitively simpler variants -- such as multiple-choice and cloze formats -- that preserve the original answer while reducing the effective search space and providing denser learning signals. These reformulations span a spectrum from discriminative to generative tasks, which we exploit to bootstrap learning: models first learn from structured, easier formats, and this knowledge transfers back to improve performance on the original open-ended problems. Building on this insight, we introduce Cog-DRIFT, a framework that constructs reformulated variants and organizes them into an adaptive curriculum based on difficulty. Training progresses from easier to harder formats, enabling the model to learn from problems that previously yielded zero signal under standard RL post-training. Cog-DRIFT not only improves on the originally unsolvable hard problems (absolute +10.11% for Qwen and +8.64% for Llama) but also generalizes well to other held-out datasets. Across 2 models and 6 reasoning benchmarks, our method consistently outperforms standard GRPO and strong guided-exploration baselines. On average, Cog-DRIFT shows +4.72% (Qwen) and +3.23% (Llama) improvements over the second-best baseline. We further show that Cog-DRIFT improves pass@k at test time, and the curriculum improves sample efficiency. Overall, our results highlight task reformulation and curriculum learning as an effective paradigm for overcoming the exploration barrier in LLM post-training.
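The reformulation step (turning an open-ended item into multiple-choice or cloze variants that preserve the answer) can be sketched directly. The question, distractors, and formatting here are invented for illustration and are not the paper's pipeline:

```python
# Answer-preserving task reformulations: open-ended -> multiple-choice
# and open-ended -> cloze, the two simpler formats named in the abstract.

def to_multiple_choice(question, answer, distractors):
    options = sorted([answer] + distractors)  # deterministic ordering
    labels = "ABCD"
    lines = [question] + [f"({labels[i]}) {o}" for i, o in enumerate(options)]
    gold = labels[options.index(answer)]
    return "\n".join(lines), gold

def to_cloze(statement_template, answer):
    return statement_template.format(blank="____"), answer

mc, gold = to_multiple_choice("What is 7 * 8?", "56", ["54", "58", "63"])
cloze, _ = to_cloze("7 * 8 = {blank}", "56")
print(gold)   # 'B' -- "56" sorts second among the options
print(cloze)  # '7 * 8 = ____'
```

Because the gold answer is unchanged, reward verification carries over unchanged from the original open-ended item, which is what makes the curriculum cheap to build.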
【5】Explainable Machine Learning for Sepsis Outcome Prediction Using a Novel Romanian Electronic Health Record Dataset
标题:使用新型罗马尼亚电子健康记录数据集进行败血症结局预测的可解释机器学习
链接:https://arxiv.org/abs/2604.04698
作者:Andrei-Alexandru Bunea,Ovidiu Ghibea,Dan-Matei Popovici,Ion Daniel,Octavian Andronic
摘要:We develop and analyze explainable machine learning (ML) models for sepsis outcome prediction using a novel Electronic Health Record (EHR) dataset from 12,286 hospitalizations at a large emergency hospital in Romania. The dataset includes demographics, International Classification of Diseases (ICD-10) diagnostics, and 600 types of laboratory tests. This study aims to identify clinically strong predictors while achieving state-of-the-art results across three classification tasks: (1) deceased vs. discharged, (2) deceased vs. recovered, and (3) recovered vs. ameliorated. We trained five ML models to capture complex distributions while preserving clinical interpretability. Experiments explored the trade-off between feature richness and patient coverage, using subsets of the 10--50 most frequent laboratory tests. Model performance was evaluated using accuracy and area under the curve (AUC), and explainability was assessed using SHapley Additive exPlanations (SHAP). The highest performance was obtained for the deceased vs. recovered case study (AUC=0.983, accuracy=0.93). SHAP analysis identified several strong predictors such as cardiovascular comorbidities, urea levels, aspartate aminotransferase, platelet count, and eosinophil percentage. Eosinopenia emerged as a top predictor, highlighting its value as an underutilized marker that is not included in current assessment standards, while the high performance suggests the applicability of these models in clinical settings.
【6】Empirical Characterization of Rationale Stability Under Controlled Perturbations for Explainable Pattern Recognition
标题:受控扰动下可解释模式识别中解释依据稳定性的实证表征
链接:https://arxiv.org/abs/2604.04456
作者:Abu Noman Md Sakib,Zhensen Wang,Merjulah Roby,Zijie Zhang
备注:28th International Conference on Pattern Recognition (ICPR) 2026
摘要:Reliable pattern recognition systems should exhibit consistent behavior across similar inputs, and their explanations should remain stable. However, most Explainable AI evaluations remain instance centric and do not explicitly quantify whether attribution patterns are consistent across samples that share the same class or represent small variations of the same input. In this work, we propose a novel metric aimed at assessing the consistency of model explanations, ensuring that models consistently reflect the intended objectives and consistency under label-preserving perturbations. We implement this metric using a pre-trained BERT model on the SST-2 sentiment analysis dataset, with additional robustness tests on RoBERTa, DistilBERT, and IMDB, applying SHAP to compute feature importance for various test samples. The proposed metric quantifies the cosine similarity of SHAP values for inputs with the same label, aiming to detect inconsistent behaviors, such as biased reliance on certain features or failure to maintain consistent reasoning for similar predictions. Through a series of experiments, we evaluate the ability of this metric to identify misaligned predictions and inconsistencies in model explanations. These experiments are compared against standard fidelity metrics to assess whether the new metric can effectively identify when a model's behavior deviates from its intended objectives. The proposed framework provides a deeper understanding of model behavior by enabling more robust verification of rationale stability, which is critical for building trustworthy AI systems. By quantifying whether models rely on consistent attribution patterns for similar inputs, the proposed approach supports more robust evaluation of model behavior in practical pattern recognition pipelines. Our code is publicly available at https://github.com/anmspro/ESS-XAI-Stability.
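The proposed metric (mean pairwise cosine similarity of SHAP attribution vectors over inputs sharing a label) can be sketched directly. The attribution vectors below are made up for illustration; in the paper they come from SHAP applied to a sentiment model:

```python
import math
from itertools import combinations

# Rationale stability: average pairwise cosine similarity of per-input
# attribution vectors within one label group. Toy attributions below.

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def rationale_stability(attributions):
    pairs = list(combinations(attributions, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

consistent = [[0.80, 0.10, 0.05], [0.70, 0.20, 0.10], [0.75, 0.15, 0.08]]
inconsistent = [[0.80, 0.10, 0.05], [0.05, 0.90, 0.02], [0.10, 0.05, 0.85]]

print(round(rationale_stability(consistent), 2))    # near 1: stable rationale
print(round(rationale_stability(inconsistent), 2))  # near 0: shifting rationale
```

A model that relies on the same features for same-label inputs scores near 1; one whose attributions jump between features scores near 0, flagging inconsistent reasoning.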
【7】Finite-Time Analysis of Q-Value Iteration for General-Sum Stackelberg Games
标题:一般和Stackelberg博弈Q值迭代的有限时间分析
链接:https://arxiv.org/abs/2604.04394
作者:Narim Jeong,Donghwan Lee
备注:8 pages
摘要:Reinforcement learning has been successful both empirically and theoretically in single-agent settings, but extending these results to multi-agent reinforcement learning in general-sum Markov games remains challenging. This paper studies the convergence of Stackelberg Q-value iteration in two-player general-sum Markov games from a control-theoretic perspective. We introduce a relaxed policy condition tailored to the Stackelberg setting and model the learning dynamics as a switching system. By constructing upper and lower comparison systems, we establish finite-time error bounds for the Q-functions and characterize their convergence properties. Our results provide a novel control-theoretic perspective on Stackelberg learning. Moreover, to the best of the authors' knowledge, this paper offers the first finite-time convergence guarantees for Q-value iteration in general-sum Markov games under Stackelberg interactions.
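The Stackelberg solution concept behind the Q-value iteration can be illustrated in a one-shot bimatrix game: the leader commits to an action, the follower best-responds to it, and the leader optimizes against that anticipated response. Payoffs are invented for illustration; the paper treats the dynamic (Markov-game) generalization of this interaction:

```python
# Pure-strategy Stackelberg play in a 2x2 bimatrix game.
# leader_payoff[i][j], follower_payoff[i][j]: leader plays i, follower j.

leader_payoff = [[3, 1],
                 [4, 0]]
follower_payoff = [[2, 4],
                   [3, 1]]

def stackelberg(leader_payoff, follower_payoff):
    best = None
    for i, row in enumerate(follower_payoff):
        # Follower best-responds to the leader's committed action i
        # (ties broken by lowest index; optimistic/pessimistic tie-breaking
        # distinctions are ignored in this sketch).
        j = max(range(len(row)), key=row.__getitem__)
        if best is None or leader_payoff[i][j] > leader_payoff[best[0]][best[1]]:
            best = (i, j)
    return best

print(stackelberg(leader_payoff, follower_payoff))  # (1, 0)
```

Note the commitment advantage: the leader picks action 1 because it anticipates the follower's best response, earning 4, which simultaneous (Nash-style) play need not deliver. Q-value iteration for Stackelberg Markov games applies this leader-follower maximization at every state in place of the single-agent max.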
【8】Gradual Cognitive Externalization: A Framework for Understanding How Ambient Intelligence Externalizes Human Cognition
标题:渐进认知外部化:理解环境智能如何外部化人类认知的框架
链接:https://arxiv.org/abs/2604.04387
作者:Zhimin Zhao
摘要
:Developers are publishing AI agent skills that replicate a colleague's communication style, encode a supervisor's mentoring heuristics, or preserve a person's behavioral repertoire beyond biological death. To explain why, we propose Gradual Cognitive Externalization (GCE), a framework arguing that human cognitive functions are migrating into digital substrates through ambient intelligence co-adaptation rather than mind uploading. GCE rests on the behavioral manifold hypothesis: everyday cognition occupies a low-dimensional manifold that is structured, redundant, and learnable from sustained observation. We document evidence from scheduling assistants, writing tools, recommendation engines, and agent skill ecosystems showing that the preconditions for externalization are already observable. We formalize three criteria separating cognitive integration from tool use (bidirectional adaptation, functional equivalence, causal coupling), derive five testable predictions with theory-constrained thresholds, and provide a concrete experimental protocol. The question is no longer whether minds can be uploaded, but how fast cognitive functions are already migrating into digital substrates and what follows.
【9】Thermodynamic-Inspired Explainable GeoAI: Uncovering Regime-Dependent Mechanisms in Heterogeneous Spatial Systems
标题:受热力学启发的可解释地理人工智能:揭示异质空间系统中的状态相关机制
链接:https://arxiv.org/abs/2604.04339
作者:Sooyoung Lim,Zhenlong Li,Zi-Kui Liu
摘要:Modeling spatial heterogeneity and associated critical transitions remains a fundamental challenge in geography and environmental science. While conventional Geographically Weighted Regression (GWR) and deep learning models have improved predictive skill, they often fail to elucidate state-dependent nonlinearities where the functional roles of drivers represent opposing effects across heterogeneous domains. We introduce a thermodynamics-inspired explainable geospatial AI framework that integrates statistical mechanics with graph neural networks. By conceptualizing spatial variability as a thermodynamic competition between system Burden (E) and Capacity (S), our model disentangles the latent mechanisms driving spatial processes. Using three simulation datasets and three real-word datasets across distinct domains (housing markets, mental health prevalence, and wildfire-induced PM2.5 anomalies), we show that the new framework successfully identifies regime-dependent role reversals of predictors that standard baselines miss. Notably, the framework explicitly diagnoses the phase transition into a Burden-dominated regime during the 2023 Canadian wildfire event, distinguishing physical mechanism shifts from statistical outliers. These findings demonstrate that thermodynamic constraints can improve the interpretability of GeoAI while preserving strong predictive performance in complex spatial systems.
【10】Towards Unveiling Vulnerabilities of Large Reasoning Models in Machine Unlearning
标题:揭示机器遗忘中大型推理模型的漏洞
链接:https://arxiv.org/abs/2604.04255
作者:Aobo Chen,Chenxu Zhao,Chenglin Miao,Mengdi Huai
摘要:Large language models (LLMs) possess strong semantic understanding, driving significant progress in data mining applications. This is further enhanced by large reasoning models (LRMs), which provide explicit multi-step reasoning traces. Meanwhile, the growing need for the right to be forgotten has driven the development of machine unlearning techniques, which aim to eliminate the influence of specific data from trained models without full retraining. However, unlearning may also introduce new security vulnerabilities by exposing additional interaction surfaces. Although many studies have investigated unlearning attacks, no prior work addresses LRMs. To bridge this gap, in this paper we propose the first LRM unlearning attack, which forces incorrect final answers while generating convincing but misleading reasoning traces. This objective is challenging due to non-differentiable logical constraints, weak optimization signal over long rationales, and discrete forget-set selection. To overcome these challenges, we introduce a bi-level exact unlearning attack that incorporates a differentiable objective function, influential token alignment, and a relaxed indicator strategy. To demonstrate the effectiveness and generalizability of our attack, we also design novel optimization frameworks and conduct comprehensive experiments in both white-box and black-box settings, aiming to raise awareness of emerging threats to LRM unlearning pipelines.
【11】Fine-grained Analysis of Stability and Generalization for Stochastic Bilevel Optimization
标题:随机双层优化的稳定性与泛化性的细粒度分析
链接:https://arxiv.org/abs/2604.04090
作者:Xuelin Zhang,Hong Chen,Bin Gu,Tieliang Gong,Feng Zheng
摘要:Stochastic bilevel optimization (SBO) has been integrated into many machine learning paradigms recently, including hyperparameter optimization, meta learning, and reinforcement learning. Along with the wide range of applications, there have been numerous studies on the computational behavior of SBO. However, the generalization guarantees of SBO methods are far less understood from the lens of statistical learning theory. In this paper, we provide a systematic generalization analysis of the first-order gradient-based bilevel optimization methods. Firstly, we establish the quantitative connections between the on-average argument stability and the generalization gap of SBO methods. Then, we derive the upper bounds of on-average argument stability for single-timescale stochastic gradient descent (SGD) and two-timescale SGD, where three settings (nonconvex-nonconvex (NC-NC), convex-convex (C-C), and strongly-convex-strongly-convex (SC-SC)) are considered respectively. Experimental analysis validates our theoretical findings. Compared with the previous algorithmic stability analysis, our results do not require reinitializing the inner-level parameters at each iteration and are applicable to more general objective functions.
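For reference, the stochastic bilevel problem this analysis concerns can be written in its standard textbook form (generic notation, not taken from the paper):

```latex
\min_{x}\; F(x) \;=\; \mathbb{E}_{\xi}\!\left[ f\big(x,\, y^{*}(x);\, \xi\big) \right]
\quad \text{s.t.} \quad
y^{*}(x) \;=\; \operatorname*{arg\,min}_{y}\; \mathbb{E}_{\zeta}\!\left[ g(x, y;\, \zeta) \right]
```

Here $f$ is the outer objective evaluated at the inner minimizer $y^{*}(x)$ and $g$ is the inner objective; single-timescale and two-timescale SGD differ in the relative step sizes (or update schedules) used for $x$ and $y$.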
【12】Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference
标题:面向高效低位MXFP推理的对角分块混合精度注意力
链接:https://arxiv.org/abs/2604.03950
作者:Yifu Ding,Xinhao Zhang,Jinyang Guo
备注:CVPR Workshop EDGE 2026
摘要:Transformer-based large language models (LLMs) have demonstrated remarkable performance across a wide range of real-world tasks, but their inference cost remains prohibitively high due to the quadratic complexity of attention and the memory bandwidth limitations of high-precision operations. In this work, we present a low-bit mixed-precision attention kernel using the microscaling floating-point (MXFP) data format, leveraging the compute capabilities of next-generation GPU architectures. Our Diagonal-Tiled Mixed-Precision Attention (DMA) combines two kinds of low-bit computation at the tile level and is implemented as a carefully fused Triton kernel, exploiting hardware-level parallelism and memory efficiency to enable fast inference without compromising model performance. Extensive empirical evaluations on NVIDIA B200 GPUs show that our kernel maintains generation quality with negligible degradation while achieving significant speedups through kernel fusion. We release our code at https://github.com/yifu-ding/MP-Sparse-Attn.
【13】Understanding When Poisson Log-Normal Models Outperform Penalized Poisson Regression for Microbiome Count Data
标题:理解微生物组计数数据的泊松对数正态模型何时优于惩罚泊松回归
链接:https://arxiv.org/abs/2604.03853
作者:Daniel Agyapong,Julien Chiquet,Jane Marks,Toby Dylan Hocking
摘要:Multivariate count models are often justified by their ability to capture latent dependence, but researchers receive little guidance on when this added structure improves on simpler penalized marginal Poisson regression. We study this question using real microbiome data under a unified held-out evaluation framework. For count prediction, we compare PLN and GLMNet(Poisson) on 20 datasets spanning 32 to 18,270 samples and 24 to 257 taxa, using held-out Poisson deviance under leave-one-taxon-out prediction with 3-fold sample cross-validation rather than synthetic or in-sample criteria. For network inference, we compare PLNNetwork and GLMNet(Poisson) neighborhood selection on five publicly available datasets with experimentally validated microbial interaction truth. PLN outperforms GLMNet(Poisson) on most count-prediction datasets, with gains up to 38 percent. The primary predictor of the winner is the sample-to-taxon ratio, with mean absolute correlation as the strongest secondary signal and overdispersion as an additional predictor. PLNNetwork performs best on broad undirected interaction benchmarks, whereas GLMNet(Poisson) is better aligned with local or directional effects. Taken together, these results provide guidance for choosing between latent multivariate count models and penalized Poisson regression in biological count prediction and interaction recovery.
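As a point of reference, held-out Poisson deviance, the evaluation metric named above, can be computed as follows. This is a generic sketch of the standard formula, not the authors' evaluation code:

```python
import numpy as np

def poisson_deviance(y, mu, eps=1e-10):
    """Poisson deviance: 2 * sum(y*log(y/mu) - (y - mu)).

    The y*log(y/mu) term is taken as 0 when y == 0 (its limiting value),
    and mu is clipped away from zero for numerical safety.
    """
    y = np.asarray(y, dtype=float)
    mu = np.maximum(np.asarray(mu, dtype=float), eps)
    term = np.where(y > 0, y * np.log(np.maximum(y, eps) / mu), 0.0)
    return 2.0 * np.sum(term - (y - mu))

counts = np.array([0, 3, 7, 12])
print(poisson_deviance(counts, counts))                      # near zero: perfect fit
print(poisson_deviance(counts, np.full(4, counts.mean())))   # positive: worse fit
```

Lower held-out deviance means better count prediction; a perfect fit scores (numerically) zero.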
【14】LightThinker++: From Reasoning Compression to Memory Management
标题:LightThinker++:从推理压缩到内存管理
链接:https://arxiv.org/abs/2604.03679
作者:Yuqi Zhu,Jintian Zhang,Zhenjie Wan,Yujie Luo,Shuofei Qiao,Zhengke Gui,Da Zheng,Lei Liang,Huajun Chen,Ningyu Zhang
备注:Work in progress. This is an extended version of LightThinker
摘要:Large language models (LLMs) excel at complex reasoning, yet their efficiency is limited by the surging cognitive overhead of long thought traces. In this paper, we propose LightThinker, a method that enables LLMs to dynamically compress intermediate thoughts into compact semantic representations. However, static compression often struggles with complex reasoning where the irreversible loss of intermediate details can lead to logical bottlenecks. To address this, we evolve the framework into LightThinker++, introducing Explicit Adaptive Memory Management. This paradigm shifts to behavioral-level management by incorporating explicit memory primitives, supported by a specialized trajectory synthesis pipeline to train purposeful memory scheduling. Extensive experiments demonstrate the framework's versatility across three dimensions. (1) LightThinker reduces peak token usage by 70% and inference time by 26% with minimal accuracy loss. (2) In standard reasoning, LightThinker++ slashes peak token usage by 69.9% while yielding a +2.42% accuracy gain under the same context budget for maximum performance. (3) Most notably, in long-horizon agentic tasks, it maintains a stable footprint beyond 80 rounds (a 60%-70% reduction), achieving an average performance gain of 14.8% across different complex scenarios. Overall, our work provides a scalable direction for sustaining deep LLM reasoning over extended horizons with minimal overhead.
【15】Hardware-Oriented Inference Complexity of Kolmogorov-Arnold Networks
标题:Kolmogorov-Arnold网络的面向硬件的推理复杂度
链接:https://arxiv.org/abs/2604.03345
作者:Bilal Khalid,Pedro Freire,Sergei K. Turitsyn,Jaroslaw E. Prilepsky
备注:This work has been submitted to the IEEE for possible publication
摘要:Kolmogorov-Arnold Networks (KANs) have recently emerged as a powerful architecture for various machine learning applications. However, their unique structure raises significant concerns regarding their computational overhead. Existing studies primarily evaluate KAN complexity in terms of Floating-Point Operations (FLOPs) required for GPU-based training and inference. However, in many latency-sensitive and power-constrained deployment scenarios, such as neural network-driven non-linearity mitigation in optical communications or channel state estimation in wireless communications, training is performed offline and dedicated hardware accelerators are preferred over GPUs for inference. Recent hardware implementation studies report KAN complexity using platform-specific resource consumption metrics, such as Look-Up Tables, Flip-Flops, and Block RAMs. However, these metrics require a full hardware design and synthesis stage that limits their utility for early-stage architectural decisions and cross-platform comparisons. To address this, we derive generalized, platform-independent formulae for evaluating the hardware inference complexity of KANs in terms of Real Multiplications (RM), Bit Operations (BOP), and Number of Additions and Bit-Shifts (NABS). We extend our analysis across multiple KAN variants, including B-spline, Gaussian Radial Basis Function (GRBF), Chebyshev, and Fourier KANs. The proposed metrics can be computed directly from the network structure and enable a fair and straightforward inference complexity comparison between KAN and other neural network architectures.
【16】InsightBoard: An Interactive Multi-Metric Visualization and Fairness Analysis Plugin for TensorBoard
标题:InsightBoard:TensorBoard的交互式多指标可视化和公平性分析插件
链接:https://arxiv.org/abs/2604.03323
作者:Ray Zeyao Chen,Christan Grant
摘要:Modern machine learning systems deployed in safety-critical domains require visibility not only into aggregate performance but also into how training dynamics affect subgroup fairness over time. Existing training dashboards primarily support single-metric monitoring and offer limited support for examining relationships between heterogeneous metrics or diagnosing subgroup disparities during training. We present InsightBoard, an interactive TensorBoard plugin that integrates synchronized multi-metric visualization with slice-based fairness diagnostics in a unified interface. InsightBoard enables practitioners to jointly inspect training dynamics, performance metrics, and subgroup disparities through linked multi-view plots, correlation analysis, and standard group fairness indicators computed over user-defined slices. Through case studies with YOLOX on the BDD100k dataset, we demonstrate that models achieving strong aggregate performance can still exhibit substantial demographic and environmental disparities that remain hidden under conventional monitoring. By making fairness diagnostics available during training, InsightBoard supports earlier, more informed model inspection without modifying existing training pipelines or introducing additional data stores.
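To illustrate the kind of slice-based diagnostic such a tool surfaces, here is a minimal, hypothetical sketch of per-slice accuracy gaps; the function name and metric choice are illustrative, not InsightBoard's API:

```python
import numpy as np

def slice_accuracy_gaps(y_true, y_pred, groups):
    """Per-slice accuracy and the worst absolute gap to aggregate accuracy.

    groups holds one slice label per example (e.g. time of day or a
    demographic attribute); slices are user-defined.
    """
    overall = float(np.mean(y_true == y_pred))
    per_slice = {g: float(np.mean(y_true[groups == g] == y_pred[groups == g]))
                 for g in np.unique(groups)}
    max_gap = max(abs(overall - acc) for acc in per_slice.values())
    return overall, per_slice, max_gap

y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 0, 1, 0])
groups = np.array(["day", "day", "day", "night", "night", "night"])
overall, per_slice, gap = slice_accuracy_gaps(y_true, y_pred, groups)
```

A model with strong `overall` accuracy can still show a large `max_gap`, which is exactly the disparity that aggregate monitoring hides.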
【17】ENEC: A Lossless AI Model Compression Method Enabling Fast Inference on Ascend NPUs
标题:ENEC:一种能在Ascend NPU上实现快速推理的无损AI模型压缩方法
链接:https://arxiv.org/abs/2604.03298
作者:Jinwu Yang,Jiaan Wu,Zedong Liu,Xinyang Ma,Hairui Zhao,Yida Gu,Yuanhong Huang,Xingchen Liu,Wenjing Huang,Zheng Wei,Jing Xing,Yili Ma,Qingyi Zhang,Baoyi An,Zhongzhe Hu,Shaoteng Liu,Xia Zhu,Jiaxun Lu,Guangming Tan,Dingwen Tao
备注:Accepted by ISCA 2026, 15 pages, 13 figures, 7 tables
摘要:The rapid scaling of Large Language Models presents significant challenges for their deployment and inference, particularly on resource-constrained specialized AI hardware accelerators such as Huawei's Ascend NPUs, where weight data transfer has become a critical performance bottleneck. While lossless compression can preserve model accuracy and reduce data volume, existing lossless compression algorithms exhibit extremely low throughput when ported to the Ascend NPU architecture. In this paper, we propose ENEC, a novel lossless compression method specifically customized for AI model weights and optimized for Ascend Neural Processing Units. ENEC adopts a block-based fixed-length encoding scheme and incorporates a series of NPU-specific optimizations: bit-width quantization with hierarchical halving bit-packing, vectorized branch-free integer transformation, and dependency-decoupled intra-segment scan for efficient prefix-sum computation. Experimental results demonstrate that ENEC outperforms existing state-of-the-art NPU compressors in both compression ratio and throughput. Compared to leading GPU solutions, ENEC achieves a 3.43X higher throughput than DietGPU and a 1.12X better compression ratio than nvCOMP. By reducing weight transmission overhead, ENEC significantly improves end-to-end inference performance, achieving up to a 6.3X speedup. On Ascend NPUs, ENEC is the first open-source lossless compression algorithm for model weights that achieves performance comparable to state-of-the-art GPU compressors, offering an effective solution for deploying large-scale AI models.
【18】DRAFT: Task Decoupled Latent Reasoning for Agent Safety
标题:DRAFT:面向智能体安全的任务解耦潜在推理
链接:https://arxiv.org/abs/2604.03242
作者:Lin Wang,Junfeng Fang,Dan Zhang,Fei Shen,Xiang Wang,Tat-Seng Chua
摘要:The advent of tool-using LLM agents shifts safety monitoring from output moderation to auditing long, noisy interaction trajectories, where risk-critical evidence is sparse, making standard binary supervision poorly suited for credit assignment. To address this, we propose DRAFT (Task Decoupled Latent Reasoning for Agent Safety), a latent reasoning framework that decouples safety judgment into two trainable stages: an Extractor that distills the full trajectory into a compact continuous latent draft, and a Reasoner that jointly attends to the draft and the original trajectory to predict safety. DRAFT avoids lossy explicit summarize-then-judge pipelines by performing evidence aggregation in latent space, enabling end-to-end differentiable training. Across benchmarks including ASSEBench and R-Judge, DRAFT consistently outperforms strong baselines, improving accuracy from 63.27% (LoRA) to 91.18% averaged over benchmarks, and learns more separable representations. Ablations demonstrate a clear synergy between the Extractor and the Reasoner. Overall, DRAFT suggests that continuous latent reasoning prior to readout is a practical path to robust agent safety under long-context supervision with sparse evidence.
检测相关(1篇)
【1】HI-MoE: Hierarchical Instance-Conditioned Mixture-of-Experts for Object Detection
标题:HI-MoE:用于对象检测的分层实例条件混合专家
链接:https://arxiv.org/abs/2604.04908
作者:Vadim Vashkelis,Natalia Trukhina
摘要:Mixture-of-Experts (MoE) architectures enable conditional computation by activating only a subset of model parameters for each input. Although sparse routing has been highly effective in language models and has also shown promise in vision, most vision MoE methods operate at the image or patch level. This granularity is poorly aligned with object detection, where the fundamental unit of reasoning is an object query corresponding to a candidate instance. We propose Hierarchical Instance-Conditioned Mixture-of-Experts (HI-MoE), a DETR-style detection architecture that performs routing in two stages: a lightweight scene router first selects a scene-consistent expert subset, and an instance router then assigns each object query to a small number of experts within that subset. This design aims to preserve sparse computation while better matching the heterogeneous, instance-centric structure of detection. In the current draft, experiments are concentrated on COCO with preliminary specialization analysis on LVIS. Under these settings, HI-MoE improves over a dense DINO baseline and over simpler token-level or instance-only routing variants, with especially strong gains on small objects. We also provide an initial visualization of expert specialization patterns. We present the method, ablations, and current limitations in a form intended to support further experimental validation.
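The two-stage routing idea can be sketched as follows; all weight shapes, top-k sizes, and names here are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_route(scene_feat, query_feats, W_scene, W_inst, scene_k=4, inst_k=2):
    """Stage 1: a scene router picks a scene-consistent subset of experts.
    Stage 2: an instance router assigns each object query to a few experts
    within that subset, so routing stays sparse but instance-centric.
    """
    scene_logits = scene_feat @ W_scene                 # (n_experts,)
    subset = np.argsort(scene_logits)[-scene_k:]        # scene-level top-k experts
    inst_logits = query_feats @ W_inst[:, subset]       # (n_queries, scene_k)
    probs = softmax(inst_logits)
    top = np.argsort(probs, axis=-1)[:, -inst_k:]       # per-query top-k in subset
    return subset[top], np.take_along_axis(probs, top, axis=-1)

rng = np.random.default_rng(4)
d, n_experts = 32, 16
chosen, weights = hierarchical_route(rng.normal(size=d),           # pooled scene feature
                                     rng.normal(size=(10, d)),     # 10 object queries
                                     rng.normal(size=(d, n_experts)),
                                     rng.normal(size=(d, n_experts)))
```

Each query ends up with `inst_k` expert indices drawn only from the scene-selected subset, which is the hierarchy the abstract describes.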
分类|识别(7篇)
【1】Integer-Only Operations on Extreme Learning Machine Test Time Classification
标题:极限学习机测试时分类的纯整数运算
链接:https://arxiv.org/abs/2604.04363
作者:Emerson Lopes Machado,Cristiano Jacques Miosso,Ricardo Pezzuol Jacobi
备注:14 pages. Originally written in 2015; archived in 2026
摘要:We present a theoretical analysis and empirical evaluations of a novel set of techniques for computational cost reduction of test time operations of network classifiers based on extreme learning machine (ELM). By exploring some characteristics we derived from these models, we show that the classification at test time can be performed using solely integer operations without compromising the classification accuracy. Our contributions are as follows: (i) We show empirical evidence that the input weights values can be drawn from the ternary set with limited reduction of the classification accuracy. This has the computational advantage of dismissing multiplications; (ii) We prove the classification accuracy of normalized and non-normalized test signals are the same; (iii) We show how to create an integer version of the output weights that results in a limited reduction of the classification accuracy. We tested our techniques on 5 computer vision datasets commonly used in the literature and the results indicate that our techniques can allow the reduction of the computational cost of the operations necessary for the classification at test time in FPGAs. This is important in embedded applications, where power consumption is limited, and crucial in data centers of large corporations, where power consumption is expensive.
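A minimal illustration of point (i), assuming ternary input weights in {-1, 0, +1}: the hidden-layer pre-activation reduces to additions and subtractions, with no multiplications.

```python
import numpy as np

def ternary_hidden(x, W):
    """Hidden-layer pre-activation for ternary input weights W in {-1, 0, +1}.

    With ternary weights, x @ W needs no multiplications: each output unit
    is the sum of inputs whose weight is +1 minus those whose weight is -1.
    """
    pre = np.empty(W.shape[1], dtype=x.dtype)
    for j in range(W.shape[1]):
        pre[j] = x[W[:, j] == 1].sum() - x[W[:, j] == -1].sum()
    return pre

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(8, 4))   # ternary input weights
x = rng.integers(0, 10, size=8)        # integer test-time input
assert np.array_equal(ternary_hidden(x, W), x @ W)   # matches a full matmul
```

This is the property that makes multiplier-free FPGA implementations attractive; the paper's remaining steps (integer output weights, normalization equivalence) are analogous in spirit.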
【2】How Long short-term memory artificial neural network, synthetic data, and fine-tuning improve the classification of raw EEG data
标题:长短期记忆人工神经网络、合成数据和微调如何改进原始脑电数据的分类
链接:https://arxiv.org/abs/2604.04316
作者:Albert Nasybullin,Vladimir Maksimenko,Semen Kurkin
备注:4 pages, 4 figures, 2 tables
摘要:In this paper, we discuss a Machine Learning pipeline for the classification of EEG data. We propose a combination of synthetic data generation, long short-term memory artificial neural network (LSTM), and fine-tuning to solve classification problems for experiments with implicit visual stimuli, such as the Necker cube with different levels of ambiguity. The developed approach increased the quality of the classification model of raw EEG data.
【3】Measuring Robustness of Speech Recognition from MEG Signals Under Distribution Shift
标题:分布漂移下MEG信号语音识别的鲁棒性测量
链接:https://arxiv.org/abs/2604.04129
作者:Sheng-You Chien,Bo-Yi Mao,Yi-Ning Chang,Po-Chih Kuo
备注:17 pages, 6 figures, LibriBrain Competition @NeurIPS2025
摘要:This study investigates robust speech-related decoding from non-invasive MEG signals using the LibriBrain phoneme-classification benchmark from the 2025 PNPL competition. We compare residual convolutional neural networks (CNNs), an STFT-based CNN, and a CNN-Transformer hybrid, while also examining the effects of group averaging, label balancing, repeated grouping, normalization strategies, and data augmentation. Across our in-house implementations, preprocessing and data-configuration choices matter more than additional architectural complexity, among which instance normalization emerges as the most influential modification for generalization. The strongest of our own models, a CNN with group averaging, label balancing, repeated grouping, and instance normalization, achieves 60.95% F1-macro on the test split, compared with 39.53% for the plain CNN baseline. However, most of our models, without instance normalization, show substantial validation-to-test degradation, indicating that distribution shift induced by different normalization statistics is a major obstacle to generalization in our experiments. By contrast, MEGConformer maintains 64.09% F1-macro on both validation and test, and saliency-map analysis is qualitatively consistent with this contrast: weaker models exhibit more concentrated or repetitive phoneme-sensitive patterns across splits, whereas MEGConformer appears more distributed. Overall, the results suggest that improving the reliability of non-invasive phoneme decoding will likely require better handling of normalization-related distribution shift while also addressing the challenge of single-trial decoding.
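Instance normalization, the modification found most influential above, amounts to standardizing each trial and channel with its own statistics, so test-time inputs never depend on training-set statistics. A minimal sketch (the (trials, channels, time) shape convention is an assumption):

```python
import numpy as np

def instance_normalize(x, eps=1e-6):
    """Per-trial, per-channel standardization of MEG data.

    x: array of shape (trials, channels, time). Each channel of each trial
    is normalized with its own mean and std along the time axis, which is
    one way to blunt normalization-related distribution shift across splits.
    """
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

x = np.random.default_rng(1).normal(5.0, 3.0, size=(2, 4, 100))
z = instance_normalize(x)   # each (trial, channel) series now has mean ~0, std ~1
```

Because the statistics are computed per sample, the same code applies identically to validation and test data.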
【4】OASIC: Occlusion-Agnostic and Severity-Informed Classification
标题:OASIC:遮挡无关且严重程度感知的分类
链接:https://arxiv.org/abs/2604.04012
作者:Kay Gijzen,Gertjan J. Burghouts,Daniël M. Pelt
备注:14 pages, 5 figures
摘要:Severe occlusions of objects pose a major challenge for computer vision. We show that two root causes are (1) the loss of visible information and (2) the distracting patterns caused by the occluders. Our approach addresses both causes at the same time. First, the distracting patterns are removed at test-time, via masking of the occluding patterns. This masking is independent of the type of occlusion, by handling the occlusion through the lens of visual anomalies w.r.t. the object of interest. Second, to deal with less visual details, we follow standard practice by masking random parts of the object during training, for various degrees of occlusions. We discover that (a) it is possible to estimate the degree of the occlusion (i.e. severity) at test-time, and (b) that a model optimized for a specific degree of occlusion also performs best on a similar degree during test-time. Combining these two insights brings us to a severity-informed classification model called OASIC: Occlusion Agnostic Severity Informed Classification. We estimate the severity of occlusion for a test image, mask the occluder, and select the model that is optimized for the degree of occlusion. This strategy performs better than any single model optimized for any smaller or broader range of occlusion severities. Experiments show that combining gray masking with adaptive model selection improves $\text{AUC}_\text{occ}$ by +18.5 over standard training on occluded images and +23.7 over finetuning on unoccluded images.
【5】Event-Driven Neuromorphic Vision Enables Energy-Efficient Visual Place Recognition
标题:事件驱动的神经形态视觉实现节能视觉位置识别
链接:https://arxiv.org/abs/2604.03277
作者:Geoffroy Keime,Nicolas Cuperlier,Benoit R. Cottereau
备注:40 pages single column, v1
摘要:Reliable visual place recognition (VPR) under dynamic real-world conditions is critical for autonomous robots, yet conventional deep networks remain limited by high computational and energy demands. Inspired by the mammalian navigation system, we introduce SpikeVPR, a bio-inspired and neuromorphic approach combining event-based cameras with spiking neural networks (SNNs) to generate compact, invariant place descriptors from few exemplars, achieving robust recognition under extreme changes in illumination, viewpoint, and appearance. SpikeVPR is trained end-to-end using surrogate gradient learning and incorporates EventDilation, a novel augmentation strategy enhancing robustness to speed and temporal variations. Evaluated on two challenging benchmarks (Brisbane-Event-VPR and NSAVP), SpikeVPR achieves performance comparable to state-of-the-art deep networks while using 50 times fewer parameters and consuming 30 and 250 times less energy, enabling real-time deployment on mobile and neuromorphic platforms. These results demonstrate that spike-based coding offers an efficient pathway toward robust VPR in complex, changing environments.
【6】A Robust SINDy Autoencoder for Noisy Dynamical System Identification
标题:用于含噪动力系统辨识的鲁棒SINDy自动编码器
链接:https://arxiv.org/abs/2604.04829
作者:Kairui Ding
备注:27 pages
摘要:Sparse identification of nonlinear dynamics (SINDy) has been widely used to discover the governing equations of a dynamical system from data. It uses sparse regression techniques to identify parsimonious models of unknown systems from a library of candidate functions. Therefore, it relies on the assumption that the dynamics are sparsely represented in the coordinate system used. To address this limitation, one seeks a coordinate transformation that provides reduced coordinates capable of reconstructing the original system. Recently, SINDy autoencoders have extended this idea by combining sparse model discovery with autoencoder architectures to learn simplified latent coordinates together with parsimonious governing equations. A central challenge in this framework is robustness to measurement error. Inspired by noise-separating neural network structures, we incorporate a noise-separation module into the SINDy autoencoder architecture, thereby improving robustness and enabling more reliable identification of noisy dynamical systems. Numerical experiments on the Lorenz system show that the proposed method recovers interpretable latent dynamics and accurately estimates the measurement noise from noisy observations.
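The sparse-regression core that SINDy builds on is sequentially thresholded least squares (STLSQ). The following is a generic sketch of that standard algorithm, not the paper's noise-separating autoencoder:

```python
import numpy as np

def stlsq(Theta, dXdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares, the sparse regression behind SINDy.

    Theta: (n_samples, n_library) candidate-function library evaluated on data.
    dXdt:  (n_samples, n_states) time derivatives.
    Returns sparse coefficients Xi with Theta @ Xi ~ dXdt.
    """
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(dXdt.shape[1]):          # refit only the surviving terms
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dXdt[:, k], rcond=None)[0]
    return Xi

# Recover dx/dt = -2x + 3y from noise-free samples with a [1, x, y, x*y] library.
rng = np.random.default_rng(2)
x, y = rng.normal(size=200), rng.normal(size=200)
Theta = np.column_stack([np.ones(200), x, y, x * y])
dxdt = (-2.0 * x + 3.0 * y)[:, None]
Xi = stlsq(Theta, dxdt)
print(np.round(Xi.ravel(), 3))   # → [ 0. -2.  3.  0.]
```

The thresholding step is what enforces parsimony: library terms whose coefficients stay small are pruned, and the remainder are refit.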
【7】Nearly Optimal Best Arm Identification for Semiparametric Bandits
标题:半参数老虎机的近乎最优最佳臂识别
链接:https://arxiv.org/abs/2604.03969
作者:Seok-Jin Kim
备注:To appear at AISTATS 2026
摘要:We study fixed-confidence Best Arm Identification (BAI) in semiparametric bandits, where rewards are linear in arm features plus an unknown additive baseline shift. Unlike linear-bandit BAI, this setting requires orthogonalized regression, and its instance-optimal sample complexity has remained open. For the transductive setting, we establish an attainable instance-dependent lower bound characterized by the corresponding linear-bandit complexity on shifted features. We then propose a computationally efficient phase-elimination algorithm based on a new $XY$-design for orthogonalized regression. Our analysis yields a nearly optimal high-probability sample-complexity upper bound, up to log factors and an additive $d^2$ term, and experiments on synthetic instances and the Jester dataset show clear gains over prior baselines.
表征(1篇)
【1】TORA: Topological Representation Alignment for 3D Shape Assembly
标题:TORA:面向3D形状装配的拓扑表示对齐
链接:https://arxiv.org/abs/2604.04050
作者:Nahyuk Lee,Zhiang Chen,Marc Pollefeys,Sunghwan Hong
摘要:Flow-matching methods for 3D shape assembly learn point-wise velocity fields that transport parts toward assembled configurations, yet they receive no explicit guidance about which cross-part interactions should drive the motion. We introduce TORA, a topology-first representation alignment framework that distills relational structure from a frozen pretrained 3D encoder into the flow-matching backbone during training. We first realize this via a simple instantiation, token-wise cosine matching, which injects the learned geometric descriptors from the teacher representation. We then extend this by employing a Centered Kernel Alignment (CKA) loss to match the similarity structure between student and teacher representations for enhanced topological alignment. Through systematic probing of diverse 3D encoders, we show that geometry- and contact-centric teacher properties, not semantic classification ability, govern alignment effectiveness, and that alignment is most beneficial at later transformer layers where spatial structure naturally emerges. TORA introduces zero inference overhead while yielding two consistent benefits: faster convergence (up to 6.9$\times$) and improved accuracy in-distribution, along with greater robustness under domain shift. Experiments on five benchmarks spanning geometric, semantic, and inter-object assembly demonstrate state-of-the-art performance, with particularly pronounced gains in zero-shot transfer to unseen real-world and synthetic datasets. Project page: https://nahyuklee.github.io/tora.
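Linear CKA, the alignment objective named above, has a compact closed form. A NumPy sketch of the standard computation (the paper applies it inside training, which this sketch does not reproduce):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between representations X (n, d1), Y (n, d2).

    Columns are mean-centered; the score compares similarity structure and is
    invariant to orthogonal transformations and isotropic scaling.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 16))
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))   # random orthogonal matrix
print(linear_cka(X, X))        # 1.0: identical representations
print(linear_cka(X, X @ Q))    # ~1.0: CKA is invariant to rotations
```

Matching CKA between student and teacher thus aligns their similarity structure without forcing token-wise identity.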
3D|3D重建等相关(1篇)
【1】MAVEN: A Mesh-Aware Volumetric Encoding Network for Simulating 3D Flexible Deformation
标题:MAVEN:用于模拟3D柔性变形的网格感知体积编码网络
链接:https://arxiv.org/abs/2604.04474
作者:Zhe Feng,Shilong Tao,Haonan Sun,Shaohan Chen,Zhanxing Zhu,Yunhuai Liu
摘要:Deep learning-based approaches, particularly graph neural networks (GNNs), have gained prominence in simulating flexible deformations and contacts of solids, due to their ability to handle unstructured physical fields and nonlinear regression on graph structures. However, existing GNNs commonly represent meshes with graphs built solely from vertices and edges. These approaches tend to overlook higher-dimensional spatial features, e.g., 2D facets and 3D cells, from the original geometry. As a result, it is challenging to accurately capture boundary representations and volumetric characteristics, though this information is critically important for modeling contact interactions and internal physical quantity propagation, particularly under sparse mesh discretization. In this paper, we introduce MAVEN, a mesh-aware volumetric encoding network for simulating 3D flexible deformation, which explicitly models geometric mesh elements of higher dimension to achieve a more accurate and natural physical simulation. MAVEN establishes learnable mappings among 3D cells, 2D facets, and vertices, enabling flexible mutual transformations. Explicit geometric features are incorporated into the model to alleviate the burden of implicitly learning geometric patterns. Experimental results show that MAVEN consistently achieves state-of-the-art performance across established datasets and a novel metal stretch-bending task featuring large deformations and prolonged contacts.
编码器(4篇)
【1】Autoencoder-Based Parameter Estimation for Superposed Multi-Component Damped Sinusoidal Signals
标题:基于自动编码器的叠加多分量阻尼正弦信号参数估计
链接:https://arxiv.org/abs/2604.03985
作者:Momoka Iida,Hayato Motohashi,Hirotaka Takahashi
备注:27 pages, 16 figures, 14 tables
摘要:Damped sinusoidal oscillations are widely observed in many physical systems, and their analysis provides access to underlying physical properties. However, parameter estimation becomes difficult when the signal decays rapidly, multiple components are superposed, and observational noise is present. In this study, we develop an autoencoder-based method that uses the latent space to estimate the frequency, phase, decay time, and amplitude of each component in noisy multi-component damped sinusoidal signals. We investigate multi-component cases under Gaussian-distribution training and further examine the effect of the training-data distribution through comparisons between Gaussian and uniform training. The performance is evaluated through waveform reconstruction and parameter-estimation accuracy. We find that the proposed method can estimate the parameters with high accuracy even in challenging setups, such as those involving a subdominant component or nearly opposite-phase components, while remaining reasonably robust when the training distribution is less informative. This demonstrates its potential as a tool for analyzing short-duration, noisy signals.
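The signal class analyzed above, a noisy superposition of damped sinusoids $A e^{-t/\tau}\sin(2\pi f t + \varphi)$, is easy to generate for experimentation. A minimal sketch with illustrative parameters (the amplitudes, frequencies, and noise level below are not the paper's settings):

```python
import numpy as np

def damped_sinusoid(t, components, noise_std=0.0, rng=None):
    """Superpose damped sinusoids A * exp(-t/tau) * sin(2*pi*f*t + phi),
    plus optional Gaussian observation noise.

    components: list of (amplitude, frequency, decay_time, phase) tuples.
    """
    signal = np.zeros_like(t)
    for A, f, tau, phi in components:
        signal += A * np.exp(-t / tau) * np.sin(2 * np.pi * f * t + phi)
    if noise_std > 0:
        if rng is None:
            rng = np.random.default_rng()
        signal = signal + rng.normal(0.0, noise_std, size=t.shape)
    return signal

t = np.linspace(0.0, 1.0, 1000)
params = [(1.0, 8.0, 0.3, 0.0),        # dominant component
          (0.2, 13.0, 0.5, np.pi)]     # subdominant, opposite-phase component
clean = damped_sinusoid(t, params)
noisy = damped_sinusoid(t, params, noise_std=0.1, rng=np.random.default_rng(0))
```

The subdominant, opposite-phase second component mirrors the challenging setups the abstract mentions.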
【2】Improving Feasibility via Fast Autoencoder-Based Projections
标题:通过基于自动编码器的快速投影提高可行性
链接:https://arxiv.org/abs/2604.03489
作者:Maria Chzhen,Priya L. Donti
摘要:Enforcing complex (e.g., nonconvex) operational constraints is a critical challenge in real-world learning and control systems. However, existing methods struggle to efficiently enforce general classes of constraints. To address this, we propose a novel data-driven amortized approach that uses a trained autoencoder as an approximate projector to provide fast corrections to infeasible predictions. Specifically, we train an autoencoder using an adversarial objective to learn a structured, convex latent representation of the feasible set. This enables rapid correction of neural network outputs by projecting their associated latent representations onto a simple convex shape before decoding into the original feasible set. We test our approach on a diverse suite of constrained optimization and reinforcement learning problems with challenging nonconvex constraints. Results show that our method effectively enforces constraints at a low computational cost, offering a practical alternative to expensive feasibility correction techniques based on traditional solvers.
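The correction step described above, projecting a latent code onto a simple convex set before decoding, can be sketched with a Euclidean ball as the assumed convex shape (the learned latent set in the paper may differ):

```python
import numpy as np

def project_to_ball(z, radius=1.0):
    """Euclidean projection of latent codes z (batch, dim) onto an
    origin-centered ball; codes already inside the ball are unchanged."""
    norm = np.linalg.norm(z, axis=-1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norm, 1e-12))
    return z * scale

# An "infeasible" code outside the ball and a feasible one inside it.
z = np.array([[3.0, 4.0],
              [0.3, 0.4]])
z_proj = project_to_ball(z)
print(z_proj)   # first row rescaled onto the boundary, second row unchanged
```

Because this projection is a cheap closed-form operation, the feasibility correction costs little compared to invoking a traditional solver; decoding `z_proj` would then map the corrected code back to the original feasible set.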
【3】MetaSAEs: Joint Training with a Decomposability Penalty Produces More Atomic Sparse Autoencoder Latents
标题:MetaSAEs:带可分解性惩罚的联合训练产生更具原子性的稀疏自动编码器潜变量
链接:https://arxiv.org/abs/2604.03436
作者:Matthew Levinson
摘要:Sparse autoencoders (SAEs) are increasingly used for safety-relevant applications including alignment detection and model steering. These use cases require SAE latents to be as atomic as possible. Each latent should represent a single coherent concept drawn from a single underlying representational subspace. In practice, SAE latents blend representational subspaces together. A single feature can activate across semantically distinct contexts that share no true common representation, muddying an already complex picture of model computation. We introduce a joint training objective that directly penalizes this subspace blending. A small meta SAE is trained alongside the primary SAE to sparsely reconstruct the primary SAE's decoder columns; the primary SAE is penalized whenever its decoder directions are easy to reconstruct from the meta dictionary. This occurs whenever latent directions lie in a subspace spanned by other primary directions. This creates gradient pressure toward more mutually independent decoder directions that resist sparse meta-compression. On GPT-2 large (layer 20), the selected configuration reduces mean $|\varphi|$ by 7.5% relative to an identical solo SAE trained on the same data. Automated interpretability (fuzzing) scores improve by 7.6%, providing external validation of the atomicity gain independent of the training and co-occurrence metrics. Reconstruction overhead is modest. Results on Gemma 2 9B are directional. On not-fully-converged SAEs, the same parameterization yields the best results, a $+8.6\%$ $\Delta$Fuzz. Though directional, this is an encouraging sign that the method transfers to a larger model. Qualitative analysis confirms that features firing on polysemantic tokens are split into semantically distinct sub-features, each specializing in a distinct representational subspace.
【4】NativeTernary: A Self-Delimiting Binary Encoding with Unary Run-Length Hierarchy Markers for Ternary Neural Network Weights, Structured Data, and General Computing Infrastructure
标题:NativeTernary:一种用于三元神经网络权重、结构化数据和通用计算基础设施的、带一元游程层次标记的自定界二进制编码
链接:https://arxiv.org/abs/2604.03336
作者:Maharshi Savdhariya
备注:9 pages. Patent filed, Indian Patent Office, March 2026. C implementation forthcoming: https://github.com/sm45118/nativeternary. v2 planned with GGUF benchmarks. Keywords: ternary encoding, BitNet b1.58, 1-bit LLMs, ternary weights, GGUF, IoT compression, run-length encoding, embedded systems
摘要:BitNet b1.58 (Ma et al., 2024) demonstrates that large language models can operate entirely on ternary weights {-1, 0, +1}, yet no native binary wire format exists for such models. NativeTernary closes this gap. We present NativeTernary, a binary encoding scheme that partitions the 2-bit pair space into three data symbols representing ternary values -- either balanced {-1, 0, +1} or unsigned {0, 1, 2} -- and a reserved structural delimiter. The central contribution is the use of unary run-length encoding to represent semantic hierarchy depth: a sequence of N consecutive delimiter pairs denotes a boundary of level N, encoding character, word, sentence, paragraph, and topic boundaries at cost 2, 4, 6, 8, and 10 bits respectively -- proportional to boundary rarity. The choice of which 2-bit pair serves as the delimiter is a design parameter: {11} is the primary embodiment, offering simple OR-gate detection; {00} is an alternative embodiment optimised for ultra-low-power CMOS systems, minimising switching activity. All four bit-pair choices are covered by the patent claims. We present three encoding variants: (1) the primary scheme with {11} as sole delimiter; (2) a dual-starter variant where both {10} and {11} initiate distinct symbol namespaces; and (3) an analysis of unsigned versus balanced ternary data mappings. We describe a path toward ternary-native general computing infrastructure requiring no hardware changes, and outline applications spanning ternary neural network weight storage, hierarchical natural language encoding, edge computing, IoT and satellite telemetry, industrial sensors, automotive systems, medical devices, gaming, and financial tick data. The decoder is a 10-line stateless state machine resilient to bitstream corruption.
优化|敛散性(9篇)
【1】Safe and Near-Optimal Gate Control: A Case Study from the Danish West Coast
标题:安全且接近最佳的闸门控制:丹麦西海岸的案例研究
链接:https://arxiv.org/abs/2604.04545
作者:Martin Kristjansen,Kim Guldstrand Larsen,Marius Mikučionis,Christian Schilling
备注:In Proceedings MARS 2026, arXiv:2604.03053
摘要:Ringkoebing Fjord is an inland water basin on the Danish west coast separated from the North Sea by a set of gates used to control the amount of water entering and leaving the fjord. Currently, human operators decide when and how many gates to open or close for controlling the fjord's water level, with the goal to satisfy a range of conflicting safety and performance requirements such as keeping the water level in a target range, allowing maritime traffic, and enabling fish migration. In this work, we model the fjord and its gates as a digital twin in the tool Uppaal Stratego. We then use this digital twin along with forecasts of the sea level and the wind speed to learn a gate controller in an online fashion. We evaluate the learned controllers under different sea-level scenarios, representing normal tidal behavior, high waters, and low waters. Our evaluation demonstrates that, unlike a baseline controller, the learned controllers satisfy the safety requirements, while performing similarly regarding the other requirements.
【2】Relative Density Ratio Optimization for Stable and Statistically Consistent Model Alignment
标题:相对密度比优化以实现稳定且统计一致的模型对齐
链接:https://arxiv.org/abs/2604.04410
作者:Hiroshi Takahashi,Tomoharu Iwata,Atsutoshi Kumagai,Sekitoshi Kanai,Masanori Yamada,Kosuke Nishida,Kazutoshi Shinoda
备注:Code is available at https://github.com/takahashihiroshi/rdro
摘要:Aligning language models with human preferences is essential for ensuring their safety and reliability. Although most existing approaches assume specific human preference models such as the Bradley-Terry model, this assumption may fail to accurately capture true human preferences, and consequently, these methods lack statistical consistency, i.e., the guarantee that language models converge to the true human preference as the number of samples increases. In contrast, direct density ratio optimization (DDRO) achieves statistical consistency without assuming any human preference models. DDRO models the density ratio between preferred and non-preferred data distributions using the language model, and then optimizes it via density ratio estimation. However, this density ratio is unstable and often diverges, leading to training instability of DDRO. In this paper, we propose a novel alignment method that is both stable and statistically consistent. Our approach is based on the relative density ratio between the preferred data distribution and a mixture of the preferred and non-preferred data distributions. Our approach is stable since this relative density ratio is bounded above and does not diverge. Moreover, it is statistically consistent and yields significantly tighter convergence guarantees than DDRO. We experimentally show its effectiveness with Qwen 2.5 and Llama 3.
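The stability claim can be seen directly from the definition: with a mixture weight α in (0, 1] (a symbol assumed here for illustration; the paper's exact parameterization may differ), the relative density ratio p / (α·p + (1-α)·q) is bounded above by 1/α, whereas the plain ratio p / q used by DDRO diverges as the non-preferred density vanishes:

```python
import numpy as np

def relative_density_ratio(p, q, alpha=0.5):
    """r_alpha = p / (alpha * p + (1 - alpha) * q); bounded above by
    1/alpha, unlike the plain ratio p / q, which blows up as q -> 0."""
    return p / (alpha * p + (1.0 - alpha) * q)

# Toy densities: the second point has near-zero non-preferred density.
p = np.array([0.5, 0.5, 1e-6])
q = np.array([0.5, 1e-9, 0.5])

plain = p / q                              # second entry diverges
relative = relative_density_ratio(p, q)    # stays below 1/alpha = 2
```

This boundedness is exactly what removes the training instability attributed to DDRO in the abstract, while the mixture form keeps the estimand well-defined without assuming a preference model.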
【3】MC-CPO: Mastery-Conditioned Constrained Policy Optimization
标题:MC-CPO:掌握度条件的约束策略优化
链接:https://arxiv.org/abs/2604.04251
作者:Oluseyi Olukola,Nick Rahimi
备注:15 pages, 8 figures. Submitted to IEEE Transactions on Learning Technologies (TLT)
摘要:Engagement-optimized adaptive tutoring systems may prioritize short-term behavioral signals over sustained learning outcomes, creating structural incentives for reward hacking in reinforcement learning policies. We formalize this challenge as a constrained Markov decision process (CMDP) with mastery-conditioned feasibility, in which pedagogical safety constraints dynamically restrict admissible actions according to learner mastery and prerequisite structure. We introduce Mastery-Conditioned Constrained Policy Optimization (MC-CPO), a two-timescale primal-dual algorithm that integrates structural action masking with constrained policy optimization. In the tabular regime, we establish feasibility preservation and convergence to stationary feasible points under standard stochastic approximation conditions and derive a safety gap result showing that optimization within the mastery-conditioned feasible set can strictly dominate post-hoc filtering under identical safety budgets. Empirical validation is conducted in minimal and extended tabular environments and in a neural tutoring setting. Across 10 random seeds and one million training steps in the neural regime, MC-CPO satisfies constraint budgets within tolerance, reduces discounted safety costs relative to unconstrained and reward-shaped baselines, and substantially lowers the Reward Hacking Severity Index (RHSI). These results indicate that embedding pedagogical structure directly into the feasible action space provides a principled foundation for mitigating reward hacking in instructional reinforcement learning systems.
【4】Subspace Control: Turning Constrained Model Steering into Controllable Spectral Optimization
标题:子空间控制:将约束模型引导转化为可控谱优化
链接:https://arxiv.org/abs/2604.04231
作者:Yancheng Huang,Changsheng Wang,Chongyu Fan,Yicheng Lang,Bingqi Shang,Yang Zhang,Mingyi Hong,Qing Qu,Alvaro Velasquez,Sijia Liu
摘要:Foundation models, such as large language models (LLMs), are powerful but often require customization before deployment to satisfy practical constraints such as safety, privacy, and task-specific requirements, leading to "constrained" optimization problems for model steering and adaptation. However, solving such problems remains largely underexplored and is particularly challenging due to interference between the primary objective and constraint objectives during optimization. In this paper, we propose a subspace control framework for constrained model training. Specifically, (i) we first analyze, from a model merging perspective, how spectral cross-task interference arises and show that it can be resolved via a one-shot solution that orthogonalizes the merged subspace; (ii) we establish a connection between this solution and gradient orthogonalization in the spectral optimizer Muon; and (iii) building on these insights, we introduce SIFT (spectral interference-free training), which leverages a localization scheme to selectively intervene during optimization, enabling controllable updates that mitigate objective-constraint conflicts. We evaluate SIFT across four representative applications: (a) machine unlearning, (b) safety alignment, (c) text-to-speech adaptation, and (d) hallucination mitigation. Compared to both control-based and control-free baselines, SIFT consistently achieves substantial and robust performance improvements across all tasks. Code is available at https://github.com/OPTML-Group/SIFT.
【5】Restless Bandits with Individual Penalty Constraints: A New Near-Optimal Index Policy and How to Learn It
标题:个体惩罚约束下的不安分多臂老虎机:一种新的近优指数策略及其学习方法
链接:https://arxiv.org/abs/2604.04101
作者:Nida Zamir,I-Hong Hou
摘要:This paper investigates the Restless Multi-Armed Bandit (RMAB) framework under individual penalty constraints to address resource allocation challenges in dynamic wireless networked environments. Unlike conventional RMAB models, our model allows each user (arm) to have distinct and stringent performance constraints, such as energy limits, activation limits, or age of information minimums, enabling the capture of diverse objectives including fairness and efficiency. To find the optimal resource allocation policy, we propose a new Penalty-Optimal Whittle (POW) index policy. The POW index of a user depends only on the user's transition kernel and penalty constraints, and remains invariant to system-wide features such as the number of users present and the amount of resources available. This makes it computationally tractable to calculate the POW indices offline without any need for online adaptation. Moreover, we theoretically prove that the POW index policy is asymptotically optimal while satisfying all individual penalty constraints. We also introduce a deep reinforcement learning algorithm to efficiently learn the POW index on the fly. Simulation results across various applications and system configurations further demonstrate that the POW index policy not only has near-optimal performance but also significantly outperforms other existing policies.
【6】On the Efficiency of Sinkhorn-Knopp for Entropically Regularized Optimal Transport
标题:论Sinkhorn-Knopp算法在熵正则化最优传输中的效率
链接:https://arxiv.org/abs/2604.03787
作者:Kun He
备注:66 pages
摘要:The Sinkhorn--Knopp (SK) algorithm is a cornerstone method for matrix scaling and entropically regularized optimal transport (EOT). Despite its empirical efficiency, existing theoretical guarantees to achieve a target marginal accuracy $\varepsilon$ deteriorate severely in the presence of outliers, bottlenecked either by the global maximum regularized cost $\eta\|C\|_\infty$ (where $\eta$ is the regularization parameter and $C$ the cost matrix) or the matrix's minimum-to-maximum entry ratio $\nu$. This creates a fundamental disconnect between theory and practice. In this paper, we resolve this discrepancy. For EOT, we introduce the novel concept of well-boundedness, a local bulk mass property that rigorously isolates the well-behaved portion of the data from extreme outliers. We prove that governed by this fundamental notion, SK recovers the target transport plan for a problem of dimension $n$ in $O(\log n - \log \varepsilon)$ iterations, completely independent of the regularized cost $\eta\|C\|_\infty$. Furthermore, we show that a virtually cost-free pre-scaling step eliminates the dimensional dependence entirely, accelerating convergence to a strictly dimension-free $O(\log(1/\varepsilon))$ iterations. Beyond EOT, we establish a sharp phase transition for general $(\boldsymbol{u},\boldsymbol{v})$-scaling governed by a critical matrix density threshold. We prove that when a matrix's density exceeds this threshold, the iteration complexity is strictly independent of $\nu$. Conversely, when the density falls below this threshold, the dependence on $\nu$ becomes unavoidable; in this sub-critical regime, we construct instances where SK requires $\Omega(n/\varepsilon)$ iterations.
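For reference, the SK iterations analyzed above are just alternating marginal rescalings of the Gibbs kernel $K = e^{-\eta C}$; a minimal implementation on a toy cost matrix with uniform marginals (the regularization strength and iteration count below are arbitrary choices, not from the paper):

```python
import numpy as np

def sinkhorn(C, mu, nu, eta=10.0, n_iter=200):
    """Standard Sinkhorn--Knopp iterations for entropically regularized OT:
    alternately rescale K = exp(-eta * C) to match the target marginals."""
    K = np.exp(-eta * C)
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)    # match column marginals
        u = mu / (K @ v)      # match row marginals
    return u[:, None] * K * v[None, :]   # plan P = diag(u) K diag(v)

n = 4
C = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
mu = np.full(n, 1.0 / n)    # uniform source marginal
nu = np.full(n, 1.0 / n)    # uniform target marginal
P = sinkhorn(C, mu, nu)
```

The paper's bottleneck is visible in the first line of the loop body: entries of $K$ scale like $e^{-\eta\|C\|_\infty}$, so a single outlier cost can make the kernel nearly degenerate, which is exactly the dependence the well-boundedness analysis removes.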
【7】Neural Global Optimization via Iterative Refinement from Noisy Samples
标题:通过有噪样本迭代细化的神经全局优化
链接:https://arxiv.org/abs/2604.03614
作者:Qusay Muzaffar,David Levin,Michael Werman
备注:17 pages, 5 figures, 2 tables
摘要:Global optimization of black-box functions from noisy samples is a fundamental challenge in machine learning and scientific computing. Traditional methods such as Bayesian Optimization often converge to local minima on multi-modal functions, while gradient-free methods require many function evaluations. We present a novel neural approach that learns to find global minima through iterative refinement. Our model takes noisy function samples and their fitted spline representation as input, then iteratively refines an initial guess toward the true global minimum. Trained on randomly generated functions with ground truth global minima obtained via exhaustive search, our method achieves a mean error of 8.05 percent on challenging multi-modal test functions, compared to 36.24 percent for the spline initialization, a 28.18 percent improvement. The model successfully finds global minima in 72 percent of test cases with error below 10 percent, demonstrating learned optimization principles rather than mere curve fitting. Our architecture combines encoding of multiple modalities including function values, derivatives, and spline coefficients with iterative position updates, enabling robust global optimization without requiring derivative information or multiple restarts.
【8】PINNs in PDE Constrained Optimal Control Problems: Direct vs Indirect Methods
标题:PDE约束最优控制问题中的PINN:直接方法与间接方法
链接:https://arxiv.org/abs/2604.04920
作者:Zhen Zhang,Shanqing Liu,Alessandro Alla,Jerome Darbon,George Em Karniadakis
备注:8 pages, 3 figures
摘要:We study physics-informed neural networks (PINNs) as numerical tools for the optimal control of semilinear partial differential equations. We first recall the classical direct and indirect viewpoints for optimal control of PDEs, and then present two PINN formulations: a direct formulation based on minimizing the objective under the state constraint, and an indirect formulation based on the first-order optimality system. For a class of semilinear parabolic equations, we derive the state equation, the adjoint equation, and the stationarity condition in a form consistent with continuous-time Pontryagin-type optimality conditions. We then specialize the framework to an Allen-Cahn control problem and compare three numerical approaches: (i) a discretize-then-optimize adjoint method, (ii) a direct PINN, and (iii) an indirect PINN. Numerical results show that the PINN parameterization has an implicit regularizing effect, in the sense that it tends to produce smoother control profiles. They also indicate that the indirect PINN more faithfully preserves the PDE constraint and optimality structure and yields a more accurate neural approximation than the direct PINN.
【9】Primal-Dual Methods for Nonsmooth Nonconvex Optimization with Orthogonality Constraints
标题:具有正交约束的非光滑非凸优化的原对偶方法
链接:https://arxiv.org/abs/2604.04130
作者:Linglingzhi Zhu,Wentao Ding,Shangyuan Liu,Anthony Man-Cho So
摘要:Recent advancements in data science have significantly elevated the importance of orthogonally constrained optimization problems. The Riemannian approach has become a popular technique for addressing these problems due to the advantageous computational and analytical properties of the Stiefel manifold. Nonetheless, the interplay of nonsmoothness alongside orthogonality constraints introduces substantial challenges to current Riemannian methods, including scalability, parallelizability, complicated subproblems, and cumulative numerical errors that threaten feasibility. In this paper, we take a retraction-free primal-dual approach and propose a linearized smoothing augmented Lagrangian method specifically designed for nonsmooth and nonconvex optimization with orthogonality constraints. Our proposed method is single-loop and free of subproblem solving. We establish its iteration complexity of $O(\varepsilon^{-3})$ for finding $\varepsilon$-KKT points, matching the best-known results in the Riemannian optimization literature. Additionally, by invoking the standard Kurdyka-Lojasiewicz (KL) property, we demonstrate asymptotic sequential convergence of the proposed algorithm. Numerical experiments on both smooth and nonsmooth orthogonal constrained problems demonstrate the superior computational efficiency and scalability of the proposed method compared with state-of-the-art algorithms.
预测|估计(6篇)
【1】Efficient Onboard Spacecraft Pose Estimation with Event Cameras and Neuromorphic Hardware
标题:利用事件摄像机和神经形态硬件进行高效的星载航天器姿态估计
链接:https://arxiv.org/abs/2604.04117
作者:Arunkumar Rathinam,Jules Lecomte,Jost Reelsen,Gregor Lenz,Axel von Arnim,Djamila Aouada
备注:AI4SPACE workshop at CVPR 2026
摘要:Reliable relative pose estimation is a key enabler for autonomous rendezvous and proximity operations, yet space imagery is notoriously challenging due to extreme illumination, high contrast, and fast target motion. Event cameras provide asynchronous, change-driven measurements that can remain informative when frame-based imagery saturates or blurs, while neuromorphic processors can exploit sparse activations for low-latency, energy-efficient inferences. This paper presents a spacecraft 6-DoF pose-estimation pipeline that couples event-based vision with the BrainChip Akida neuromorphic processor. Using the SPADES dataset, we train compact MobileNet-style keypoint regression networks on lightweight event-frame representations, apply quantization-aware training (8/4-bit), and convert the models to Akida-compatible spiking neural networks. We benchmark three event representations and demonstrate real-time, low-power inference on Akida V1 hardware. We additionally design a heatmap-based model targeting Akida V2 and evaluate it on Akida Cloud, yielding improved pose accuracy. To our knowledge, this is the first end-to-end demonstration of spacecraft pose estimation running on Akida hardware, highlighting a practical route to low-latency, low-power perception for future autonomous space missions.
【2】Algebraic Diversity: Group-Theoretic Spectral Estimation from Single Observations
标题:代数多样性:单次观测的群论谱估计
链接:https://arxiv.org/abs/2604.03634
作者:Mitchell A. Thornton
摘要
:We prove that temporal averaging over multiple observations can be replaced by algebraic group action on a single observation for second-order statistical estimation. A General Replacement Theorem establishes conditions under which a group-averaged estimator from one snapshot achieves equivalent subspace decomposition to multi-snapshot covariance estimation, and an Optimality Theorem proves that the symmetric group is universally optimal (yielding the KL transform). The framework unifies the DFT, DCT, and KLT as special cases of group-matched spectral transforms, with a closed-form double-commutator eigenvalue problem for polynomial-time optimal group selection. Five applications are demonstrated: MUSIC DOA estimation from a single snapshot, massive MIMO channel estimation with 64% throughput gain, single-pulse waveform classification at 90% accuracy, graph signal processing with non-Abelian groups, and a new algebraic analysis of transformer LLMs revealing that RoPE uses the wrong algebraic group for 70-80% of attention heads across five models (22,480 head observations), that the optimal group is content-dependent, and that spectral-concentration-based pruning improves perplexity at the 13B scale. All diagnostics require a single forward pass with no gradients or training.
【3】Evaluation of Bagging Predictors with Kernel Density Estimation and Bagging Score
标题:用核密度估计和装袋得分评估装袋预测器
链接:https://arxiv.org/abs/2604.03599
作者:Philipp Seitz,Jan Schmitt,Andreas Schiffler
备注:5 pages, 2 figures, 2 tables, 1 algorithm, 9th International Conference on Advances in Artificial Intelligence (ICAAI 2025)
摘要:For a larger set of predictions of several differently trained machine learning models, known as bagging predictors, the mean of all predictions is taken by default. Nevertheless, this procedure can deviate from the actual ground truth in certain parameter regions. An approach is presented to determine a representative y_BS from such a set of predictions using Kernel Density Estimation (KDE) in nonlinear regression with Neural Networks (NN) which simultaneously provides an associated quality criterion beta_BS, called Bagging Score (BS), that reflects the confidence of the obtained ensemble prediction. It is shown that working with the new approach better predictions can be made than working with the common use of mean or median. In addition to this, the used method is contrasted with several approaches to nonlinear regression from the literature, resulting in a top ranking in each of the calculated error values without using any optimization or feature selection technique.
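A minimal sketch of the idea: take a Gaussian KDE over the ensemble's predictions and use its mode as the representative y_BS rather than the mean. The bandwidth and the concrete confidence score below are stand-ins assumed for illustration, not the paper's definition of beta_BS:

```python
import numpy as np

def kde_bagging(preds, bandwidth=0.5):
    """Return the ensemble prediction of highest estimated density (a
    KDE-mode stand-in for y_BS) and a simple score in (0, 1] that equals
    1 only when all predictions coincide (a stand-in for beta_BS)."""
    diffs = np.subtract.outer(preds, preds) / bandwidth
    density = np.exp(-0.5 * diffs**2).sum(axis=1)  # Gaussian KDE at each pred
    i = int(np.argmax(density))
    score = density[i] / len(preds)                # 1.0 <=> full agreement
    return preds[i], score

# Seven bagged predictions; the two outliers would drag the plain mean.
preds = np.array([2.0, 2.1, 1.9, 2.05, 1.95, 6.0, -3.0])
y_bs, beta_bs = kde_bagging(preds)
```

The contrast with the default is the point: the mean of `preds` is pulled toward the outliers, while the KDE mode stays in the densest cluster and the score drops to signal the disagreement.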
【4】Agile Story-Point Estimation: Is RAG a Better Way to Go?
标题:敏捷故事点估计:RAG是更好的选择吗?
链接:https://arxiv.org/abs/2604.03443
作者:Lamyea Maha,Tajmilur Rahman,Chanchal Roy
摘要:The sprint-based iterative approach in the Agile software development method allows continuous feedback and adaptation. One of the crucial Agile software development activities is the sprint planning session where developers estimate the effort required to complete tasks through a consensus-based estimation technique such as Planning Poker. In the Agile software development method, a common unit of measuring development effort is Story Point (SP) which is assigned to tasks to understand the complexity and development time needed to complete them. Despite the benefits of this process, it is an extremely time-consuming manual process. To mitigate this issue, in this study, we investigated if this manual process can be automated using Retrieval Augmented Generation (RAG) which comprises a "Retriever" and a "Generator". We applied two embedding models - bge-large-en-v1.5, and Sentence-Transformers' all-mpnet-base-v2 on 23 open-source software projects of varying sizes and examined four key aspects: 1) how retrieval hyper-parameters influence the performance, 2) whether estimation accuracy differs across different sizes of the projects, 3) whether embedding model choice affects accuracy, and 4) how the RAG-based approach compares to the existing baselines. Although the RAG-based approach outperformed the baseline models on several occasions, our results did not exhibit statistically significant differences in performance across the projects or across the embedding models. This highlights the need for further study and refinement of RAG and model adaptation strategies for better accuracy in automatically estimating user stories.
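The "Retriever" half of such a pipeline reduces to nearest-neighbor search in embedding space; a toy sketch with random vectors standing in for the bge / mpnet embeddings (the retrieved stories' known story points would then be handed to the Generator as context):

```python
import numpy as np

def retrieve_top_k(query, corpus, k=3):
    """Cosine-similarity retrieval -- the 'Retriever' half of a RAG
    pipeline, here over toy vectors rather than real sentence embeddings."""
    sims = (corpus @ query) / (np.linalg.norm(corpus, axis=1)
                               * np.linalg.norm(query))
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
corpus = rng.standard_normal((100, 32))  # embedded historical user stories
query = corpus[7] + 0.05 * rng.standard_normal(32)  # a new, similar story
top = retrieve_top_k(query, corpus, k=3)
```

The retrieval hyper-parameters examined in the study (such as `k` here) directly control how much historical context the Generator sees when proposing a story-point estimate.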
【5】Apparent Age Estimation: Challenges and Outcomes
标题:表观年龄估计:挑战和结果
链接:https://arxiv.org/abs/2604.03335
作者:Justin Rainier Go,Lorenz Bernard Marqueses,Mikaella Kaye Martinez,John Kevin Patrick Sarmiento,Abien Fred Agarap
备注:Accepted for oral presentation at Philippine Computing Science Congress 2026
摘要:Apparent age estimation is a valuable tool for business personalization, yet current models frequently exhibit demographic biases. We review prior works on the DEX method by applying distribution learning techniques such as Mean-Variance Loss (MVL) and Adaptive Mean-Residue Loss (AMRL), and evaluate them in both accuracy and fairness. Using IMDB-WIKI, APPA-REAL, and FairFace, we demonstrate that while AMRL achieves state-of-the-art accuracy, trade-offs between precision and demographic equity persist. Despite clear age clustering in UMAP embeddings, our saliency maps indicate inconsistent feature focus across demographics, leading to significant performance degradation for Asian and African American populations. We argue that technical improvements alone are insufficient; accurate and fair apparent age estimation requires the integration of localized and diverse datasets, and strict adherence to fairness validation protocols.
【6】Debiased Machine Learning for Conformal Prediction of Counterfactual Outcomes Under Runtime Confounding
标题:去偏机器学习用于运行时混杂下反事实结果的保形预测
链接:https://arxiv.org/abs/2604.03772
作者:Keith Barnatchez,Kevin P. Josey,Rachel C. Nethery,Giovanni Parmigiani
摘要:Data-driven decision making frequently relies on predicting counterfactual outcomes. In practice, researchers commonly train counterfactual prediction models on a source dataset to inform decisions on a possibly separate target population. Conformal prediction has arisen as a popular method for producing assumption-lean prediction intervals for counterfactual outcomes that would arise under different treatment decisions in the target population of interest. However, existing methods require that every confounding factor of the treatment-outcome relationship used for training on the source data is additionally measured in the target population, risking miscoverage if important confounders are unmeasured in the target population. In this paper, we introduce a computationally efficient debiased machine learning framework that allows for valid prediction intervals when only a subset of confounders is measured in the target population, a common challenge referred to as runtime confounding. Grounded in semiparametric efficiency theory, we show the resulting prediction intervals achieve desired coverage rates with faster convergence compared to standard methods. Through numerous synthetic and semi-synthetic experiments, we demonstrate the utility of our proposed method.
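For context, the standard split-conformal construction that such methods build on (not the paper's debiased estimator) can be sketched in a few lines; the residuals and miscoverage level below are synthetic stand-ins:

```python
import numpy as np

def split_conformal_width(cal_resid, alpha=0.1):
    """Split conformal: the finite-sample-corrected (1 - alpha) quantile of
    calibration residuals gives a symmetric prediction-interval half-width."""
    n = len(cal_resid)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(cal_resid, level, method="higher")

rng = np.random.default_rng(1)
cal_resid = np.abs(rng.standard_normal(500))  # |y - model(x)| on calibration set
width = split_conformal_width(cal_resid, alpha=0.1)

y_hat = 3.2                                   # hypothetical point prediction
interval = (y_hat - width, y_hat + width)     # ~90% coverage under exchangeability
```

The paper's contribution sits upstream of this recipe: when some confounders are unmeasured at runtime, the calibration residuals themselves must be constructed from debiased estimates for the coverage guarantee to remain valid.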
其他神经网络|深度学习|模型|建模(42篇)
【1】QED-Nano: Teaching a Tiny Model to Prove Hard Theorems
标题:QED-Nano:教小模型证明困难定理
链接:https://arxiv.org/abs/2604.04898
作者:LM-Provers,Yuxiao Qu,Amrith Setlur,Jasper Dekoninck,Edward Beeching,Jia Li,Ian Wu,Lewis Tunstall,Aviral Kumar
摘要:Proprietary AI systems have recently demonstrated impressive capabilities on complex proof-based problems, with gold-level performance reported at the 2025 International Mathematical Olympiad (IMO). However, the training pipelines behind these systems remain largely undisclosed, and their reliance on large "internal" models and scaffolds makes them expensive to run, difficult to reproduce, and hard to study or improve upon. This raises a central question: can small, open models also be trained to achieve competitive reasoning performance on difficult Olympiad-level math? In this paper, we answer this question by building QED-Nano, a 4B model post-trained for Olympiad-level proofs. Our training recipe has three stages: (1) supervised fine-tuning to imbue good proof-writing styles by distilling from DeepSeek-Math-V2, (2) reinforcement learning (RL) with rubric-based rewards, and (3) expanding RL with a reasoning cache, which decomposes long proofs into iterative summarize-and-refine cycles and enables stronger test-time reasoning. QED-Nano surpasses the proof-generation performance of much larger open models, including Nomos-1 and GPT-OSS-120B, and approaches the performance of proprietary models like Gemini 3 Pro, at a fraction of the inference cost. To support further research on open mathematical reasoning, we release the full QED-Nano pipeline, including the QED-Nano and QED-Nano-SFT models, the FineProofs-SFT and FineProofs-RL datasets, and the training and evaluation code.
【2】Synthetic Sandbox for Training Machine Learning Engineering Agents
标题:用于训练机器学习工程代理的合成沙盒
链接:https://arxiv.org/abs/2604.04872
作者:Yuhang Zhou,Lizhu Zhang,Yifan Wu,Jiayi Liu,Xiangjun Fan,Zhuokai Zhao,Hong Yan
备注:28 pages, 9 tables, 8 figures
摘要:As large language model agents advance beyond software engineering (SWE) tasks toward machine learning engineering (MLE), verifying agent behavior becomes orders of magnitude more expensive: while SWE tasks can be verified via fast-executing unit tests, MLE verification requires running full ML pipelines -- data preprocessing, model training, and metric evaluation -- on large datasets at each rollout step, rendering trajectory-wise on-policy reinforcement learning (RL) prohibitively slow. Existing approaches retreat to supervised fine-tuning (SFT) or offline proxy rewards, sacrificing the exploration and generalization benefits of on-policy RL. We observe that sandbox data size is the primary source of this bottleneck. Based on this insight, we introduce SandMLE, a multi-agent framework that generates diverse, verifiable synthetic MLE environments from a small number of seed tasks, preserving the structural and technical complexity of real-world problems while constraining datasets to micro-scale (each task is paired with only 50-200 training samples). Through extensive experiments, we show that SandMLE reduces execution time by over 13 times, enabling large-scale, on-policy trajectory-wise RL for the first time in the MLE domain. On MLE-bench-lite, SandMLE yields significant gains over SFT baselines across Qwen3-8B, 14B, and 30B-A3B, with relative medal rate improvements ranging from 20.3% to 66.9%. Furthermore, the trained policy generalizes across unseen agentic scaffolds, achieving up to 32.4% better HumanRank score on MLE-Dojo.
【3】Fine-Tuning Integrity for Modern Neural Networks: Structured Drift Proofs via Norm, Rank, and Sparsity Certificates
标题:现代神经网络的微调完整性:通过范数、秩和稀疏性证书的结构化漂移证明
链接:https://arxiv.org/abs/2604.04738
作者:Zhenhang Shang,Kani Chen
备注:15 pages, 3 figures
摘要:Fine-tuning is now the primary method for adapting large neural networks, but it also introduces new integrity risks. An untrusted party can insert backdoors, change safety behavior, or overwrite large parts of a model while claiming only small updates. Existing verification tools focus on inference correctness or full-model provenance and do not address this problem. We introduce Fine-Tuning Integrity (FTI) as a security goal for controlled model evolution. An FTI system certifies that a fine-tuned model differs from a trusted base only within a policy-defined drift class. We propose Succinct Model Difference Proofs (SMDPs) as a new cryptographic primitive for enforcing these drift constraints. SMDPs provide zero-knowledge proofs that the update to a model is norm-bounded, low-rank, or sparse. The verifier cost depends only on the structure of the drift, not on the size of the model. We give concrete SMDP constructions based on random projections, polynomial commitments, and streaming linear checks. We also prove an information-theoretic lower bound showing that some form of structure is necessary for succinct proofs. Finally, we present architecture-aware instantiations for transformers, CNNs, and MLPs, together with an end-to-end system that aggregates block-level proofs into a global certificate.
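One of the named ingredients, random projections, can be sketched for the norm-bounded drift class: a Johnson-Lindenstrauss-style sketch lets a verifier check the drift norm against a budget from a low-dimensional commitment instead of the full parameter difference. The dimensions and budget below are arbitrary, and the zero-knowledge layer of an actual SMDP is omitted entirely:

```python
import numpy as np

def sketch_norm(delta, k=256, seed=0):
    """k-dimensional random projection whose norm concentrates around
    ||delta||, so a verifier can check a norm bound at cost O(k), not O(d)."""
    rng = np.random.default_rng(seed)          # seed plays the role of shared randomness
    S = rng.standard_normal((k, delta.size)) / np.sqrt(k)
    return np.linalg.norm(S @ delta)

d = 10_000                                     # toy-scale parameter count
rng = np.random.default_rng(42)
delta = 0.01 * rng.standard_normal(d)          # claimed small fine-tuning drift

est = sketch_norm(delta)                       # verifier-side estimate
true = np.linalg.norm(delta)
budget = 1.5 * true                            # policy-defined drift budget
accept = est <= budget                         # norm-bounded drift check
```

This is the sense in which verifier cost depends only on the drift structure: `k` is fixed by the desired concentration, independent of the model size `d`.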
【4】Sampling Parallelism for Fast and Efficient Bayesian Learning
Link: https://arxiv.org/abs/2604.04736
Authors: Asena Karolin Özdemir, Lars H. Heyen, Arvid Weyrauch, Achim Streit, Markus Götz, Charlotte Debus
Comments: 12 pages, 10 figures, 1 table
Abstract: Machine learning models, and deep neural networks in particular, are increasingly deployed in risk-sensitive domains such as healthcare, environmental forecasting, and finance, where reliable quantification of predictive uncertainty is essential. However, many uncertainty quantification (UQ) methods remain difficult to apply due to their substantial computational cost. Sampling-based Bayesian learning approaches, such as Bayesian neural networks (BNNs), are particularly expensive since drawing and evaluating multiple parameter samples rapidly exhausts memory and compute resources. These constraints have limited the accessibility and exploration of Bayesian techniques thus far. To address these challenges, we introduce sampling parallelism, a simple yet powerful parallelization strategy that targets the primary bottleneck of sampling-based Bayesian learning: the samples themselves. By distributing sample evaluations across multiple GPUs, our method reduces memory pressure and training time without requiring architectural changes or extensive hyperparameter tuning. We detail the methodology and evaluate its performance on a few example tasks and architectures, comparing against distributed data parallelism (DDP) as a baseline. We further demonstrate that sampling parallelism is complementary to existing strategies by implementing a hybrid approach that combines sample and data parallelism. Our experiments show near-perfect scaling when the sample number is scaled proportionally to the computational resources, confirming that sample evaluations parallelize cleanly. Although DDP achieves better raw speedups under scaling with constant workload, sampling parallelism has a notable advantage: by applying independent stochastic augmentations to the same batch on each GPU, it increases augmentation diversity and thus reduces the number of epochs required for convergence.
【5】The Infinite-Dimensional Nature of Spectroscopy and Why Models Succeed, Fail, and Mislead
Link: https://arxiv.org/abs/2604.04717
Authors: Umberto Michelucci, Francesca Venturini
Abstract: Machine learning (ML) models have achieved strikingly high accuracies in spectroscopic classification tasks, often without clear evidence that those models use chemically meaningful features. Existing studies have linked these results to data preprocessing choices, noise sensitivity, and model complexity, but no unifying explanation is available so far. In this work, we show that these phenomena arise naturally from the intrinsic high dimensionality of spectral data. Using a theoretical analysis grounded in the Feldman-Hajek theorem and the concentration of measure, we show that even infinitesimal distributional differences, caused by noise, normalisation, or instrumental artefacts, may become perfectly separable in high-dimensional spaces. Through a series of specific experiments on synthetic and real fluorescence spectra, we illustrate how models can achieve near-perfect accuracy even when chemical distinctions are absent, and why feature-importance maps may highlight spectrally irrelevant regions. We provide a rigorous theoretical framework, confirm the effect experimentally, and conclude with practical recommendations for building and interpreting ML models in spectroscopy.
【6】Grokking as Dimensional Phase Transition in Neural Networks
Link: https://arxiv.org/abs/2604.04655
Authors: Ping Wang
Abstract: Neural network grokking -- the abrupt memorization-to-generalization transition -- challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a \textit{dimensional phase transition}: effective dimensionality~$D$ crosses from sub-diffusive (subcritical, $D < 1$) to super-diffusive (supercritical, $D > 1$) at generalization onset, exhibiting self-organized criticality (SOC). Crucially, $D$ reflects \textbf{gradient field geometry}, not network architecture: synthetic i.i.d. Gaussian gradients maintain $D \approx 1$ regardless of graph topology, while real training exhibits dimensional excess from backpropagation correlations. The grokking-localized $D(t)$ crossing -- robust across topologies -- offers new insight into the trainability of overparameterized networks.
【7】Learning from Equivalence Queries, Revisited
Link: https://arxiv.org/abs/2604.04535
Authors: Mark Braverman, Roi Livni, Yishay Mansour, Shay Moran, Kobbi Nissim
Abstract: Modern machine learning systems, such as generative models and recommendation systems, often evolve through a cycle of deployment, user interaction, and periodic model updates. This differs from standard supervised learning frameworks, which focus on loss or regret minimization over a fixed sequence of prediction tasks. Motivated by this setting, we revisit the classical model of learning from equivalence queries, introduced by Angluin (1988). In this model, a learner repeatedly proposes hypotheses and, when a deployed hypothesis is inadequate, receives a counterexample. Under fully adversarial counterexample generation, however, the model can be overly pessimistic. In addition, most prior work assumes a \emph{full-information} setting, where the learner also observes the correct label of the counterexample, an assumption that is not always natural. We address these issues by restricting the environment to a broad class of less adversarial counterexample generators, which we call \emph{symmetric}. Informally, such generators choose counterexamples based only on the symmetric difference between the hypothesis and the target. This class captures natural mechanisms such as random counterexamples (Angluin and Dohrn, 2017; Bhatia, 2021; Chase, Freitag, and Reyzin, 2024), as well as generators that return the simplest counterexample according to a prescribed complexity measure. Within this framework, we study learning from equivalence queries under both full-information and bandit feedback. We obtain tight bounds on the number of learning rounds in both settings and highlight directions for future work. Our analysis combines a game-theoretic view of symmetric adversaries with adaptive weighting methods and minimax arguments.
【8】Reproducibility study on how to find Spurious Correlations, Shortcut Learning, Clever Hans or Group-Distributional non-robustness and how to fix them
Link: https://arxiv.org/abs/2604.04518
Authors: Ole Delzer, Sidney Bender
Comments: 62 pages, 27 figures
Abstract: Deep Neural Networks (DNNs) are increasingly utilized in high-stakes domains like medical diagnostics and autonomous driving where model reliability is critical. However, the research landscape for ensuring this reliability is terminologically fractured across communities that pursue the same goal of ensuring models rely on causally relevant features rather than confounding signals. While frameworks such as distributionally robust optimization (DRO), invariant risk minimization (IRM), shortcut learning, simplicity bias, and the Clever Hans effect all address model failure due to spurious correlations, researchers typically only reference work within their own domains. This reproducibility study unifies these perspectives through a comparative analysis of correction methods under challenging constraints like limited data availability and severe subgroup imbalance. We evaluate recently proposed correction methods based on explainable artificial intelligence (XAI) techniques alongside popular non-XAI baselines using both synthetic and real-world datasets. Findings show that XAI-based methods generally outperform non-XAI approaches, with Counterfactual Knowledge Distillation (CFKD) proving most consistently effective at improving generalization. Our experiments also reveal that the practical application of many methods is hindered by a dependency on group labels, as manual annotation is often infeasible and automated tools like Spectral Relevance Analysis (SpRAy) struggle with complex features and severe imbalance. Furthermore, the scarcity of minority group samples in validation sets renders model selection and hyperparameter tuning unreliable, posing a significant obstacle to the deployment of robust and trustworthy models in safety-critical areas.
【9】Discrete Prototypical Memories for Federated Time Series Foundation Models
Link: https://arxiv.org/abs/2604.04475
Authors: Liwei Deng, Qingxiang Liu, Xinhe Niu, Shengchao Chen, Sheng Sun, Yuankai Wu, Guodong Long, Yuxuan Liang
Comments: 13 pages, 5 figures
Abstract: Leveraging Large Language Models (LLMs) as federated learning (FL)-based time series foundation models offers a promising way to transfer the generalization capabilities of LLMs to time series data while preserving access to private data. However, the semantic misalignment between time-series data and the text-centric latent space of existing LLMs often leads to degraded performance. Meanwhile, the parameter-sharing mechanism in existing FL methods models heterogeneous cross-domain time-series data in a unified continuous latent space, which contradicts the fact that time-series semantics frequently manifest as discrete and recurring regimes. To address these limitations, we propose \textsc{FeDPM}, a federated framework for time-series foundation models based on discrete prototypical memories. Specifically, we learn local prototypical memory priors for intra-domain time-series data. We then align cross-domain memories to promote a unified discrete latent space and introduce a domain-specific memory update mechanism to balance shared and personalized prototypical knowledge. Extensive experiments demonstrate the efficiency and effectiveness of \textsc{FeDPM}. The code is publicly available at https://anonymous.4open.science/r/FedUnit-64D1.
【10】Generative modeling of granular flow on inclined planes using conditional flow matching
Link: https://arxiv.org/abs/2604.04453
Authors: Xuyang Li, Rui Li, Teng Man, Yimin Lu
Abstract: Granular flows govern many natural and industrial processes, yet their interior kinematics and mechanics remain largely unobservable, as experiments access only boundaries or free surfaces. Conventional numerical simulations are computationally expensive for fast inverse reconstruction, and deterministic models tend to collapse to over-smoothed mean predictions in ill-posed settings. This study, to the best of the authors' knowledge, presents the first conditional flow matching (CFM) framework for granular-flow reconstruction from sparse boundary observations. Trained on high-fidelity particle-resolved discrete element simulations, the generative model is guided at inference by a differentiable forward operator with a sparsity-aware gradient guidance mechanism, which enforces measurement consistency without hyperparameter tuning and prevents unphysical velocity predictions in non-material regions. A physics decoder maps the reconstructed velocity fields to stress states and energy fluctuation quantities, including mean stress, deviatoric stress, and granular temperature. The framework accurately recovers interior flow fields from full observation to only 16% of the informative window, and it remains effective under strongly diluted spatial resolution with only 11% of data. It also outperforms a deterministic CNN baseline in the most ill-posed reconstruction regime and provides spatially resolved uncertainty estimates through ensemble generation. These results demonstrate that conditional generative modeling offers a practical route for non-invasive inference of hidden bulk mechanics in granular media, with broader applicability for inverse problems in particulate and multiphase systems.
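The conditional flow matching objective underlying this kind of generative model can be sketched in a few lines. This is a generic, rectified-flow-style variant with a linear interpolation path; the paper's exact path, network, and guidance mechanism are not reproduced here, so all names below are illustrative assumptions:

```python
import numpy as np

def cfm_pair(x0, x1, t):
    """Sample a point on the linear probability path and its regression target.

    x0: a noise sample, x1: a data sample (e.g. a velocity field), t in [0, 1].
    Under the linear path x_t = (1 - t) x0 + t x1, the target velocity
    for the learned field v(x_t, t) is simply x1 - x0.
    """
    xt = (1.0 - t) * x0 + t * x1
    ut = x1 - x0
    return xt, ut

def cfm_loss(v_pred, ut):
    # Mean squared error between predicted and target velocities
    return float(np.mean((v_pred - ut) ** 2))
```

A velocity network trained with this loss is then integrated from noise to data at inference, which is where the abstract's differentiable forward operator and gradient guidance would enter.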
【11】Is Prompt Selection Necessary for Task-Free Online Continual Learning?
Link: https://arxiv.org/abs/2604.04420
Authors: Seoyoung Park, Haemin Lee, Hankook Lee
Comments: Accepted to CVPR Findings 2026. The code is available at https://github.com/efficient-learning-lab/SinglePrompt
Abstract: Task-free online continual learning has recently emerged as a realistic paradigm for addressing continual learning in dynamic, real-world environments, where data arrive in a non-stationary stream without clear task boundaries and can only be observed once. To consider such challenging scenarios, many recent approaches have employed prompt selection, an adaptive strategy that selects prompts from a pool based on input signals. However, we observe that such selection strategies often fail to select appropriate prompts, yielding suboptimal results despite additional training of key parameters. Motivated by this observation, we propose a simple yet effective SinglePrompt that eliminates the need for prompt selection and focuses on classifier optimization. Specifically, we simply (i) inject a single prompt into each self-attention block, (ii) employ a cosine similarity-based logit design to alleviate the forgetting effect inherent in the classifier weights, and (iii) mask logits for unexposed classes in the current minibatch. With this simple task-free design, our framework achieves state-of-the-art performance across various online continual learning benchmarks. Source code is available at https://github.com/efficient-learning-lab/SinglePrompt.
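Steps (ii) and (iii) can be sketched with plain arrays. This is an illustrative reconstruction from the abstract, not the authors' released code; the temperature `tau` is an assumed hyperparameter:

```python
import numpy as np

def cosine_logits(features, class_weights, tau=0.1):
    # (ii) Cosine-similarity logits: L2-normalize both features and
    # classifier weights so the logit scale is decoupled from weight
    # magnitude, mitigating forgetting in the classifier weights.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    return (f @ w.T) / tau

def mask_absent_classes(logits, classes_in_batch):
    # (iii) Mask logits of classes not present in the current minibatch,
    # so absent classes contribute nothing to the cross-entropy loss.
    masked = np.full_like(logits, -np.inf)
    masked[:, classes_in_batch] = logits[:, classes_in_batch]
    return masked
```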
【12】Eliminating Vendor Lock-In in Quantum Machine Learning via Framework-Agnostic Neural Networks
Link: https://arxiv.org/abs/2604.04414
Authors: Poornima Kumaresan, Shwetha Singaravelu, Lakshmi Rajendran, Santhosh Sivasubramani
Abstract: Quantum machine learning (QML) stands at the intersection of quantum computing and artificial intelligence, offering the potential to solve problems that remain intractable for classical methods. However, the current landscape of QML software frameworks suffers from severe fragmentation: models developed in TensorFlow Quantum cannot execute on PennyLane backends, circuits authored in Qiskit Machine Learning cannot be deployed to Amazon Braket hardware, and researchers who invest in one ecosystem face prohibitive switching costs when migrating to another. This vendor lock-in impedes reproducibility, limits hardware access, and slows the pace of scientific discovery. In this paper, we present a framework-agnostic quantum neural network (QNN) architecture that abstracts away vendor-specific interfaces through a unified computational graph, a hardware abstraction layer (HAL), and a multi-framework export pipeline. The core architecture supports simultaneous integration with TensorFlow, PyTorch, and JAX as classical co-processors, while the HAL provides transparent access to IBM Quantum, Amazon Braket, Azure Quantum, IonQ, and Rigetti backends through a single application programming interface (API). We introduce three pluggable data encoding strategies (amplitude, angle, and instantaneous quantum polynomial encoding) that are compatible with all supported backends. An export module leveraging Open Neural Network Exchange (ONNX) metadata enables lossless circuit translation across Qiskit, Cirq, PennyLane, and Braket representations. We benchmark our framework on the Iris, Wine, and MNIST-4 classification tasks, demonstrating training time parity (within 8% overhead) compared to native framework implementations, while achieving identical classification accuracy.
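Of the three encodings, angle encoding is the simplest to illustrate with a bare statevector. The sketch below assumes an RY-rotation-per-qubit convention and is not tied to any of the named frameworks:

```python
import numpy as np

def angle_encode(x):
    """Angle encoding: feature x_i sets the RY rotation of qubit i.

    Returns the 2**n statevector given by the tensor product of
    single-qubit states cos(x_i/2)|0> + sin(x_i/2)|1>.
    """
    state = np.array([1.0])
    for xi in x:
        qubit = np.array([np.cos(xi / 2.0), np.sin(xi / 2.0)])
        state = np.kron(state, qubit)  # tensor product, qubit by qubit
    return state
```

Any backend-specific circuit preparing this same state is interchangeable, which is what makes such an encoding portable across a hardware abstraction layer.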
【13】Deep Kuratowski Embedding Neural Networks for Wasserstein Metric Learning
Link: https://arxiv.org/abs/2604.04343
Authors: Andrew Qing He
Abstract: Computing pairwise Wasserstein distances is a fundamental bottleneck in data analysis pipelines. Motivated by the classical Kuratowski embedding theorem, we propose two neural architectures for learning to approximate the Wasserstein-2 distance ($W_2$) from data. The first, DeepKENN, aggregates distances across all intermediate feature maps of a CNN using learnable positive weights. The second, ODE-KENN, replaces the discrete layer stack with a Neural ODE, embedding each input into the infinite-dimensional Banach space $C^1([0,1], \mathbb{R}^d)$ and providing implicit regularization via trajectory smoothness. Experiments on MNIST with exact precomputed $W_2$ distances show that ODE-KENN achieves a 28% lower test MSE than the single-layer baseline and 18% lower than DeepKENN under matched parameter counts, while exhibiting a smaller generalization gap. The resulting fast surrogate can replace the expensive $W_2$ oracle in downstream pairwise distance computations.
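The DeepKENN aggregation step admits a compact sketch. This is an illustrative reading of the abstract only: the use of softplus for weight positivity and of per-layer Euclidean distances are assumptions, not details from the paper:

```python
import numpy as np

def softplus(z):
    # Smooth map to positive values, used here to keep weights > 0
    return np.log1p(np.exp(z))

def aggregated_distance(feats_a, feats_b, log_w):
    """Combine per-layer feature-map distances with learnable positive weights.

    feats_a, feats_b: lists of same-shaped intermediate feature maps
    extracted from each of the two inputs; log_w: unconstrained
    parameters mapped through softplus so the weights stay positive.
    """
    w = softplus(np.asarray(log_w, dtype=float))
    d = np.array([np.linalg.norm(fa - fb) for fa, fb in zip(feats_a, feats_b)])
    return float(w @ d)
```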
【14】Generative models for decision-making under distributional shift
Link: https://arxiv.org/abs/2604.04342
Authors: Xiuyuan Cheng, Yunqin Zhu, Yao Xie
Comments: Under review for INFORMS TutORials in Operations Research, 2026
Abstract: Many data-driven decision problems are formulated using a nominal distribution estimated from historical data, while performance is ultimately determined by a deployment distribution that may be shifted, context-dependent, partially observed, or stress-induced. This tutorial presents modern generative models, particularly flow- and score-based methods, as mathematical tools for constructing decision-relevant distributions. From an operations research perspective, their primary value lies not in unconstrained sample synthesis but in representing and transforming distributions through transport maps, velocity fields, score fields, and guided stochastic dynamics. We present a unified framework based on pushforward maps, continuity, Fokker-Planck equations, Wasserstein geometry, and optimization in probability space. Within this framework, generative models can be used to learn nominal uncertainty, construct stressed or least-favorable distributions for robustness, and produce conditional or posterior distributions under side information and partial observation. We also highlight representative theoretical guarantees, including forward-reverse convergence for iterative flow models, first-order minimax analysis in transport-map space, and error-transfer bounds for posterior sampling with generative priors. The tutorial provides a principled introduction to using generative models for scenario generation, robust decision-making, uncertainty quantification, and related problems under distributional shift.
【15】Entropy, Disagreement, and the Limits of Foundation Models in Genomics
Link: https://arxiv.org/abs/2604.04287
Authors: Maxime Rochkoulets, Lovro Vrček, Mile Šikić
Abstract: Foundation models in genomics have shown mixed success compared to their counterparts in natural language processing. Yet, the reasons for their limited effectiveness remain poorly understood. In this work, we investigate the role of entropy as a fundamental factor limiting the capacities of such models to learn from their training data and develop foundational capabilities. We train ensembles of models on text and DNA sequences and analyze their predictions, static embeddings, and empirical Fisher information flow. We show that the high entropy of genomic sequences -- from the point of view of unseen token prediction -- leads to near-uniform output distributions, disagreement across models, and unstable static embeddings, even for models that are matched in architecture, training and data. We then demonstrate that models trained on DNA concentrate Fisher information in embedding layers, seemingly failing to exploit inter-token relationships. Our results suggest that self-supervised training from sequences alone may not be applicable to genomic data, calling into question the assumptions underlying current methodologies for training genomic foundation models.
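The entropy argument is easy to probe directly by estimating the conditional next-symbol entropy of a sequence. The estimator below is a generic empirical one, not the authors' analysis pipeline:

```python
import numpy as np
from collections import Counter

def next_symbol_entropy(seq, k=3):
    """Empirical entropy (bits) of the next symbol given the previous k symbols."""
    ctx, joint = Counter(), Counter()
    for i in range(len(seq) - k):
        ctx[seq[i:i + k]] += 1          # count k-symbol contexts
        joint[seq[i:i + k + 1]] += 1    # count context + next symbol
    total = len(seq) - k
    h = 0.0
    for gram, n in joint.items():
        # p(context, next) * log2 p(next | context), summed over all pairs
        h -= (n / total) * np.log2(n / ctx[gram[:k]])
    return h
```

For a four-letter alphabet the estimate is bounded by 2 bits; the paper's point is that genomic sequences sit much closer to that uniform-entropy limit than natural-language text, which manifests as near-uniform model outputs.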
【16】A Family of Open Time-Series Foundation Models for the Radio Access Network
Link: https://arxiv.org/abs/2604.04271
Authors: Ioannis Panitsas, Leandros Tassiulas
Abstract: The Radio Access Network (RAN) is evolving into a programmable and disaggregated infrastructure that increasingly relies on AI-native algorithms for optimization and closed-loop control. However, current RAN intelligence is still largely built from task-specific models tailored to individual functions, resulting in model fragmentation, limited knowledge sharing across tasks, poor generalization, and increased system complexity. To address these limitations, we introduce TimeRAN, a unified multi-task learning framework for time-series modeling in the RAN. TimeRAN leverages a lightweight time-series foundation model with few task-specific heads to learn transferable representations that can be efficiently adapted across diverse tasks with limited supervision. To enable large-scale pretraining, we further curate and open-source TimeRAN DataPile, the largest time-series corpus for RAN analytics to date, comprising over 355K time series and 0.56B measurements across diverse telemetry sources, protocol layers, and deployment scenarios. We evaluate TimeRAN across a comprehensive set of RAN analytics tasks, including anomaly detection, classification, forecasting, and imputation, and show that it achieves state-of-the-art performance with minimal or no task-specific fine-tuning. Finally, we integrate TimeRAN into a proof-of-concept 5G testbed and demonstrate that it operates efficiently with limited resource requirements in real-world scenarios.
【17】Transmission Neural Networks: Inhibitory and Excitatory Connections
Link: https://arxiv.org/abs/2604.04246
Authors: Shuang Gao, Peter E. Caines
Comments: 8 pages
Abstract: This paper extends the Transmission Neural Network model proposed by Gao and Caines in [1]-[3] to incorporate inhibitory connections and neurotransmitter populations. The extended network model contains binary neuronal states, transmission dynamics, and inhibitory and excitatory connections. Under technical assumptions, we establish a characterization of the firing probabilities of neurons, and show that such a characterization, accounting for inhibition, can be equivalently represented by a neural network where each neuron has a continuous state of dimension 2. Moreover, we incorporate neurotransmitter populations into the model and establish the limit network model when the number of neurotransmitters at all synaptic connections goes to infinity. Finally, sufficient conditions for stability and contraction properties of the limit network model are established.
【18】Learning An Interpretable Risk Scoring System for Maximizing Decision Net Benefit
Link: https://arxiv.org/abs/2604.04241
Authors: Wenhao Chi, Ş. İlker Birbil
Comments: 31 pages, 5 figures, and 6 tables
Abstract: Risk scoring systems are widely used in high-stakes domains to assist decision-making. However, existing approaches often focus on optimizing predictive accuracy or likelihood-based criteria, which may not align with the main goal of maximizing utility. In this paper, we propose a novel risk scoring system that directly optimizes net benefit over a range of decision thresholds. The model is formulated as a sparse integer linear programming problem which enables the construction of a transparent scoring system with integer coefficients, and hence, facilitates interpretation and practical application. We also establish fundamental relationships among net benefit, discrimination, and calibration. Our analysis proves that optimizing net benefit also guarantees conventional performance measures. We thoroughly evaluated our method on multiple public datasets as well as on a real-world clinical dataset. This computational study demonstrated that our interpretable method can effectively achieve high net benefit while maintaining competitive discrimination and calibration performance.
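Net benefit at a decision threshold pt has a standard decision-curve definition, which is the quantity the scoring system optimizes over a threshold range. Below is a minimal reference implementation of that standard definition only, not the paper's integer-programming model:

```python
def net_benefit(y_true, y_prob, pt):
    """Decision-curve net benefit at threshold pt: TP/n - (FP/n) * pt/(1-pt).

    Treating a patient when predicted risk >= pt; the pt/(1-pt) odds
    weight encodes the harm of a false positive relative to the benefit
    of a true positive at that threshold.
    """
    n = len(y_true)
    pred = [p >= pt for p in y_prob]
    tp = sum(1 for y, yhat in zip(y_true, pred) if y == 1 and yhat)
    fp = sum(1 for y, yhat in zip(y_true, pred) if y == 0 and yhat)
    return tp / n - (fp / n) * pt / (1 - pt)
```

A perfect classifier attains net benefit equal to the event prevalence at any threshold, which gives a natural upper bound when comparing scoring systems.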
【19】Learning from Imperfect Demonstrations via Temporal Behavior Tree-Guided Trajectory Repair
Link: https://arxiv.org/abs/2604.04225
Authors: Aniruddh G. Puranic, Sebastian Schirmer, John S. Baras, Calin Belta
Comments: 12 pages, 4 figures. This work has been submitted to the IEEE for possible publication
Abstract: Learning robot control policies from demonstrations is a powerful paradigm, yet real-world data is often suboptimal, noisy, or otherwise imperfect, posing significant challenges for imitation and reinforcement learning. In this work, we present a formal framework that leverages Temporal Behavior Trees (TBT), an extension of Signal Temporal Logic (STL) with Behavior Tree semantics, to repair suboptimal trajectories prior to their use in downstream policy learning. Given demonstrations that violate a TBT specification, a model-based repair algorithm corrects trajectory segments to satisfy the formal constraints, yielding a dataset that is both logically consistent and interpretable. The repaired trajectories are then used to extract potential functions that shape the reward signal for reinforcement learning, guiding the agent toward task-consistent regions of the state space without requiring knowledge of the agent's kinematic model. We demonstrate the effectiveness of this framework on discrete grid-world navigation and continuous single and multi-agent reach-avoid tasks, highlighting its potential for data-efficient robot learning in settings where high-quality demonstrations cannot be assumed.
【20】The Geometric Alignment Tax: Tokenization vs. Continuous Geometry in Scientific Foundation Models
Link: https://arxiv.org/abs/2604.04155
Authors: Prashant C. Raju
Abstract: Foundation models for biology and physics optimize predictive accuracy, but their internal representations systematically fail to preserve the continuous geometry of the systems they model. We identify the root cause: the Geometric Alignment Tax, an intrinsic cost of forcing continuous manifolds through discrete categorical bottlenecks. Controlled ablations on synthetic dynamical systems demonstrate that replacing cross-entropy with a continuous head on an identical encoder reduces geometric distortion by up to 8.5x, while learned codebooks exhibit a non-monotonic double bind where finer quantization worsens geometry despite improving reconstruction. Under continuous objectives, three architectures differ by 1.3x; under discrete tokenization, they diverge by 3,000x. Evaluating 14 biological foundation models with rate-distortion theory and MINE, we identify three failure regimes: Local-Global Decoupling, Representational Compression, and Geometric Vacuity. A controlled experiment confirms that Evo 2's reverse-complement robustness on real DNA reflects conserved sequence composition, not learned symmetry. No model achieves simultaneously low distortion, high mutual information, and global coherence.
【21】Physical Sensitivity Kernels Can Emerge in Data-Driven Forward Models: Evidence From Surface-Wave Dispersion
Link: https://arxiv.org/abs/2604.04107
Authors: Ziye Yu, Yuqi Cai, Xin Liu
Comments: 12 pages, 2 figures
Abstract: Data-driven neural networks are increasingly used as surrogate forward models in geophysics, but it remains unclear whether they recover only the data mapping or also the underlying physical sensitivity structure. Here we test this question using surface-wave dispersion. By comparing automatically differentiated gradients from a neural-network surrogate with theoretical sensitivity kernels, we show that the learned gradients can recover the main depth-dependent structure of physical kernels across a broad range of periods. This indicates that neural surrogate models can learn physically meaningful differential information, rather than acting as purely black-box predictors. At the same time, strong structural priors in the training distribution can introduce systematic artifacts into the inferred sensitivities. Our results show that neural forward surrogates can recover useful physical information for inversion and uncertainty analysis, while clarifying the conditions under which this differential structure remains physically consistent.
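The comparison at the heart of the paper, gradients of a surrogate versus theoretical kernels, can be emulated for any forward model with finite differences standing in for automatic differentiation. The toy forward operator in the test is purely illustrative:

```python
import numpy as np

def sensitivity_kernel(forward, model, out_idx, eps=1e-6):
    """Finite-difference sensitivity of one output (e.g. phase velocity at
    one period) with respect to each model parameter (e.g. the shear
    velocity of one depth layer)."""
    base = forward(model)[out_idx]
    kernel = np.zeros_like(model, dtype=float)
    for i in range(model.size):
        bumped = model.copy()
        bumped[i] += eps           # perturb one parameter at a time
        kernel[i] = (forward(bumped)[out_idx] - base) / eps
    return kernel
```

Plotting such a kernel against depth for each period is exactly the kind of comparison the paper makes between surrogate gradients and theoretical dispersion kernels.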
【22】Spectral Path Regression: Directional Chebyshev Harmonics for Interpretable Tabular Learning
Link: https://arxiv.org/abs/2604.04091
Authors: Milo Coombs
Comments: 19 pages, 4 figures. Includes appendix. Experiments on standard tabular benchmarks. Code available at https://github.com/MiloCoombs2002/spectral-paths
Abstract: Classical approximation bases such as Chebyshev polynomials provide principled and interpretable representations, but their multivariate tensor-product constructions scale exponentially with dimension and impose axis-aligned structure that is poorly matched to real tabular data. We address this by replacing tensorised oscillations with directional harmonic modes of the form $\cos(\mathbf{m}^{\top}\arccos(\mathbf{x}))$, which organise multivariate structure by direction in angular space rather than by coordinate index. This representation yields a discrete spectral regression model in which complexity is controlled by selecting a small number of structured frequency vectors (spectral paths), and training reduces to a single closed-form ridge solve with no iterative optimisation. Experiments on standard continuous-feature tabular regression benchmarks show that the resulting models achieve accuracy competitive with strong nonlinear baselines while remaining compact, computationally efficient, and explicitly interpretable through analytic expressions of learned feature interactions.
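The directional harmonic features and the closed-form fit can be written down directly from the abstract's formula. The path-selection heuristic is omitted and the ridge penalty `lam` is an assumed default:

```python
import numpy as np

def harmonic_features(X, M):
    """Directional Chebyshev harmonics: phi_k(x) = cos(m_k^T arccos(x)).

    X: (n, d) inputs scaled to [-1, 1]; M: (K, d) frequency vectors
    (the "spectral paths"), one feature per row of M.
    """
    theta = np.arccos(np.clip(X, -1.0, 1.0))
    return np.cos(theta @ M.T)

def ridge_fit(Phi, y, lam=1e-6):
    # Single closed-form ridge solve -- no iterative optimisation
    K = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(K), Phi.T @ y)
```

In one dimension with M = [[k]], each feature is exactly the Chebyshev polynomial T_k(x) = cos(k arccos x), which is the sense in which the directional modes generalise the classical basis.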
【23】Multimodal Structure Learning: Disentangling Shared and Specific Topology via Cross-Modal Graphical Lasso
Link: https://arxiv.org/abs/2604.03953
Authors: Fei Wang, Yutong Zhang, Xiong Wang
Comments: Submitted to a conference
Abstract: Learning interpretable multimodal representations inherently relies on uncovering the conditional dependencies between heterogeneous features. However, applying sparse graph estimation techniques such as Graphical Lasso (GLasso) to visual-linguistic domains is severely bottlenecked by high-dimensional noise, modality misalignment, and the confounding of shared versus category-specific topologies. In this paper, we propose Cross-Modal Graphical Lasso (CM-GLasso) to overcome these fundamental limitations. By coupling a novel text-visualization strategy with a unified vision-language encoder, we strictly align multimodal features into a shared latent space. We introduce a cross-attention distillation mechanism that condenses high-dimensional patches into explicit semantic nodes, naturally extracting spatial-aware cross-modal priors. Furthermore, we unify tailored GLasso estimation and Common-Specific Structure Learning (CSSL) into a joint objective optimized via the Alternating Direction Method of Multipliers (ADMM). This formulation guarantees the simultaneous disentanglement of invariant and class-specific precision matrices without multi-step error accumulation. Extensive experiments across eight benchmarks covering both natural and medical domains demonstrate that CM-GLasso establishes a new state-of-the-art in generative classification and dense semantic segmentation tasks.
【24】Improving Model Performance by Adapting the KGE Metric to Account for System Non-Stationarity
Link: https://arxiv.org/abs/2604.03906
Authors: M Jawad, HV Gupta, YH Wang, MA Farmani, A Behrangi, GY Niu
Abstract: Geoscientific systems tend to be characterized by pronounced temporal non-stationarity, arising from seasonal and climatic variability in hydrometeorological drivers, and from natural and anthropogenic changes to land use and cover. As has been pointed out, such variability renders "the assumption of statistical stationarity obsolete in water management", and requires us to "account for, rather than ignore, non-stationary trends" in the data. However, metrics used for model development are typically based on the implicit and unjustifiable assumption that the data generating process is time-stationary. Here, we introduce the JKGE_ss metric (adapted from KGE_ss) that detects and accounts for dynamical non-stationarity in the statistical properties of the data and thereby improves information extraction and model performance. Unlike NSE and KGE_ss, which use the long-term mean as a benchmark against which to evaluate model efficiency, JKGE_ss emphasizes reproduction of temporal variations in system storage. We tested the robustness of the new metric by training physical-conceptual and data-based catchment-scale models of varying complexity across a wide range of hydroclimatic conditions, from recent-precipitation-dominated to snow-dominated to strongly arid. In all cases, the result was improved reproduction of system temporal dynamics at all time scales, across wet to dry years, and over the full range of flow levels (especially recession periods). Since traditional metrics fail to adequately account for temporal shifts in system dynamics, potentially resulting in misleading assessments of model performance under changing conditions, we recommend the adoption of JKGE_ss for geoscientific model development.
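For reference, the baseline KGE that JKGE_ss adapts decomposes model error into correlation, variability ratio, and bias ratio. This is the standard Gupta et al. formulation; the JKGE_ss modification itself is not published in the abstract and is not reproduced here:

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta Efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2),

    with r the linear correlation, alpha the standard-deviation ratio
    sim/obs, and beta the mean ratio sim/obs. Perfect agreement gives 1.
    """
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

The beta term is where the long-term mean enters as an implicit benchmark, which is exactly the assumption the abstract argues breaks down under non-stationarity.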
【25】Choosing the Right Regularizer for Applied ML: Simulation Benchmarks of Popular Scikit-learn Regularization Frameworks
标题:为应用ML选择正确的正则化器:流行Scikit-learn正则化框架的模拟基准
链接:https://arxiv.org/abs/2604.03541
作者:Benjamin S. Knight,Ahsaas Bajaj
摘要:This study surveys the historical development of regularization, tracing its evolution from stepwise regression in the 1960s to recent advancements in formal error control, structured penalties for non-independent features, Bayesian methods, and l0-based regularization (among other techniques). We empirically evaluate the performance of four canonical frameworks -- Ridge, Lasso, ElasticNet, and Post-Lasso OLS -- across 134,400 simulations spanning a 7-dimensional manifold grounded in eight production-grade machine learning models. Our findings demonstrate that for prediction accuracy when the sample-to-feature ratio is sufficient (n/p >= 78), Ridge, Lasso, and ElasticNet are nearly interchangeable. However, we find that Lasso recall is highly fragile under multicollinearity; at high condition numbers (kappa) and low SNR, Lasso recall collapses to 0.18 while ElasticNet maintains 0.93. Consequently, we advise practitioners against using Lasso or Post-Lasso OLS at high kappa with small sample sizes. The analysis concludes with an objective-driven decision guide to assist machine learning engineers in selecting the optimal scikit-learn-supported framework based on observable feature space attributes.
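The behavioral gap the simulations expose can be traced to the one-dimensional proximal operators of the three penalties. The sketch below is illustrative background, not the paper's benchmark code: ridge only shrinks a coefficient, lasso soft-thresholds it to exactly zero, and the elastic net combines both, which is one reason lasso recall can collapse when signals are weak:

```python
def ridge_prox(z, lam):
    """Ridge penalty: pure shrinkage, never exactly zero."""
    return z / (1.0 + lam)

def lasso_prox(z, lam):
    """Lasso penalty: soft-thresholding, zeroes small coefficients."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def enet_prox(z, lam, alpha=0.5):
    """Elastic net: soft-threshold via the l1 part, then shrink via the l2 part."""
    return lasso_prox(z, alpha * lam) / (1.0 + (1.0 - alpha) * lam)

for z in (0.3, 1.5):  # a weak and a strong coefficient
    print(z, ridge_prox(z, 1.0), lasso_prox(z, 1.0), enet_prox(z, 1.0))
```

The weak coefficient survives ridge and (shrunken) elastic net but is zeroed by lasso, mirroring the recall collapse the paper reports at high condition numbers and low SNR.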
【26】Online learning of smooth functions on $\mathbb{R}$
标题:$\mathbb{R}$上光滑函数的在线学习
链接:https://arxiv.org/abs/2604.03525
作者:Jesse Geneson,Kuldeep Singh,Alexander Wang
摘要:We study adversarial online learning of real-valued functions on $\mathbb{R}$. In each round the learner is queried at $x_t\in\mathbb{R}$, predicts $\hat y_t$, and then observes the true value $f(x_t)$; performance is measured by cumulative $p$-loss $\sum_{t\ge 1}|\hat y_t-f(x_t)|^p$. For the class \[ \mathcal{G}_q=\Bigl\{f:\mathbb{R}\to\mathbb{R}\ \text{absolutely continuous}:\ \int_{\mathbb{R}}|f'(x)|^q\,dx\le 1\Bigr\}, \] we show that the standard model becomes ill-posed on $\mathbb{R}$: for every $p\ge 1$ and $q>1$, an adversary can force infinite loss. Motivated by this obstruction, we analyze three modified learning scenarios that limit the influence of queries that are far from previously observed inputs. In Scenario 1 the adversary must choose each new query within distance $1$ of some past query. In Scenario 2 the adversary may query anywhere, but the learner is penalized only on rounds whose query lies within distance $1$ of a past query. In Scenario 3 the loss in round $t$ is multiplied by a weight $g(\min_{j
【27】ExpressEdit: Fast Editing of Stylized Facial Expressions with Diffusion Models in Photoshop
标题:ExpressEdit:在Photoshop中使用扩散模型快速编辑风格化面部表情
链接:https://arxiv.org/abs/2604.03448
作者:Kenan Tang,Jiasheng Guo,Jeffrey Lin,Yao Qin
备注:Accepted to CVPR 2026 Workshop on Generative AI for Storytelling (AISTORY)
摘要:Facial expressions of characters are a vital component of visual storytelling. While current AI image editing models hold promise for assisting artists in the task of stylized expression editing, these models introduce global noise and pixel drift into the edited image, preventing the integration of these models into professional image editing software and workflows. To bridge this gap, we introduce ExpressEdit, a fully open-source Photoshop plugin that is free from common artifacts of proprietary image editing models and robustly synergizes with native Photoshop operations such as Liquify. ExpressEdit seamlessly edits an expression within 3 seconds on a single consumer-grade GPU, significantly faster than popular proprietary models. Moreover, to support the generation of diverse expressions according to different narrative needs, we compile a comprehensive expression database of 135 expression tags enriched with example stories and images designed for retrieval-augmented generation. We open source the code and dataset to facilitate future research and artistic exploration.
【28】From Model-Based Screening to Data-Driven Surrogates: A Multi-Stage Workflow for Exploring Stochastic Agent-Based Models
标题:从基于模型的筛选到数据驱动的代理:探索随机基于主体的模型的多阶段工作流程
链接:https://arxiv.org/abs/2604.03350
作者:Paul Saves,Matthieu Mastio,Nicolas Verstaevel,Benoit Gaudou
备注:Published in MABS 2026 - The 27th International Workshop on Multi-Agent-Based Simulation
摘要:Systematic exploration of Agent-Based Models (ABMs) is challenged by the curse of dimensionality and their inherent stochasticity. We present a multi-stage pipeline integrating the systematic design of experiments with machine learning surrogates. Using a predator-prey case study, our methodology proceeds in two steps. First, an automated model-based screening identifies dominant variables, assesses outcome variability, and segments the parameter space. Second, we train Machine Learning models to map the remaining nonlinear interaction effects. This approach automates the discovery of unstable regions where system outcomes are highly dependent on nonlinear interactions between many variables. Thus, this work provides modelers with a rigorous, hands-off framework for sensitivity analysis and policy testing, even when dealing with high-dimensional stochastic simulators.
【29】General Explicit Network (GEN): A novel deep learning architecture for solving partial differential equations
标题:通用显式网络(GEN):用于求解偏微分方程的新型深度学习架构
链接:https://arxiv.org/abs/2604.03321
作者:Genwei Ma,Ting Luo,Ping Yang,Xing Zhao
摘要:Machine learning, especially physics-informed neural networks (PINNs) and their neural network variants, has been widely used to solve problems involving partial differential equations (PDEs). However, the successful deployment of such methods beyond academic research remains limited. For example, PINN methods primarily consider discrete point-to-point fitting and fail to account for the potential properties of real solutions. The adoption of continuous activation functions in these approaches leads to local characteristics that align with the equation solutions while resulting in poor extensibility and robustness. A general explicit network (GEN) that implements point-to-function PDE solving is proposed in this paper. The "function" component can be constructed based on our prior knowledge of the original PDEs through corresponding basis functions for fitting. The experimental results demonstrate that this approach enables solutions with high robustness and strong extensibility to be obtained.
【30】Multi-Agent Training-free Urban Food Delivery System using Resilient UMST Network
标题:使用弹性UMST网络的多代理免训练城市食品配送系统
链接:https://arxiv.org/abs/2604.03280
作者:Md Nahid Hasan,Vishwam Tiwari,Aditya Challa,Vaskar Raychoudhury,Snehanshu Saha
摘要:Delivery systems have become a core part of urban life, supporting the demand for food, medicine, and other goods. Yet traditional logistics networks remain fragile, often struggling to adapt to road closures, accidents, and shifting demand. Online Food Delivery (OFD) platforms now represent a cornerstone of urban logistics, with the global market projected to grow to over 500 billion USD by 2030. Designing delivery networks that are efficient and resilient remains a major challenge: fully connected graphs provide flexibility but are computationally infeasible at scale, while single Minimum Spanning Trees (MSTs) are efficient but easily disrupted. We propose the Union of Minimum Spanning Trees (UMST) approach to construct delivery networks that are sparse yet robust. UMST generates multiple MSTs through randomized edge perturbations and unites them, producing graphs with far fewer edges than fully connected networks while maintaining multiple alternative routes between delivery hotspots. Across multiple U.S. cities, UMST achieves 20--40$\times$ fewer edges than fully connected graphs while enabling substantial order bundling with 75--83% participation rates. Compared to learning-based baselines including MADDPG and Graph Neural Networks, UMST delivers competitive performance (88-96% success rates, 44-53% distance savings) without requiring training, achieving 30$\times$ faster execution while maintaining interpretable routing structures. Its combination of structural efficiency and operational flexibility offers a scalable and resilient foundation for urban delivery networks.
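The UMST construction described above is easy to sketch: run Kruskal's algorithm several times on randomly perturbed copies of the edge weights and take the union of the resulting trees. The Gaussian multiplicative weight noise below is an assumption for illustration; the paper's exact perturbation scheme may differ:

```python
import random

def mst_edges(nodes, edges):
    """Kruskal's MST; edges are (weight, u, v) tuples. Returns a set of (u, v) pairs."""
    parent = {v: v for v in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    tree = set()
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.add((min(u, v), max(u, v)))
    return tree

def umst(nodes, edges, k=5, noise=0.1, seed=0):
    """Union of k MSTs, each built on randomly perturbed edge weights."""
    rng = random.Random(seed)
    union = set()
    for _ in range(k):
        perturbed = [(w * (1 + rng.gauss(0, noise)), u, v) for w, u, v in edges]
        union |= mst_edges(nodes, perturbed)
    return union

# Small complete graph on 6 nodes with deterministic weights.
nodes = list(range(6))
edges = [(abs(u - v) + 0.1 * (u + v), u, v)
         for u in nodes for v in nodes if u < v]
single = mst_edges(nodes, edges)
sparse = umst(nodes, edges)
print(len(single), len(sparse))  # the union stays sparse but adds backup routes
```

The union contains at most a few times the n-1 edges of a single MST, far fewer than the complete graph's n(n-1)/2, while the extra edges provide the alternative routes that make the network resilient.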
【31】Self-Execution Simulation Improves Coding Models
标题:自执行模拟改进了编码模型
链接:https://arxiv.org/abs/2604.03253
作者:Gallil Maimon,Ori Yoran,Felix Kreuk,Michael Hassid,Gal Cohen,Pierre Chambon,Yossi Adi
摘要:A promising research direction in enabling LLMs to generate consistently correct code involves addressing their inability to properly estimate program execution, particularly for code they generate. In this work, we demonstrate that Code LLMs can be trained to simulate program execution in a step-by-step manner and that this capability can be leveraged to improve competitive programming performance. Our approach combines supervised fine-tuning on natural language execution traces, textual explanations grounded in true execution, with reinforcement learning using verifiable rewards. We introduce two complementary objectives: output prediction given code and inputs, and solving competitive programming tasks with either ground-truth or self-predicted execution feedback. These objectives enable models to perform self-verification over multiple candidate solutions, and iterative self-fixing by simulating test execution. Across multiple competitive programming benchmarks, our method yields consistent improvements over standard reasoning approaches. We further present ablations and analysis to elucidate the role of execution simulation and its limitations.
【32】A Muon-Accelerated Algorithm for Low Separation Rank Tensor Generalized Linear Models
标题:低分离秩张量广义线性模型的Muon加速算法
链接:https://arxiv.org/abs/2604.04726
作者:Xiao Liang,Shuang Li
摘要:Tensor-valued data arise naturally in multidimensional signal and imaging problems, such as biomedical imaging. When incorporated into generalized linear models (GLMs), naive vectorization can destroy their multi-way structure and lead to high-dimensional, ill-posed estimation. To address this challenge, Low Separation Rank (LSR) decompositions reduce model complexity by imposing low-rank multilinear structure on the coefficient tensor. A representative approach for estimating LSR-based tensor GLMs (LSR-TGLMs) is the Low Separation Rank Tensor Regression (LSRTR) algorithm, which adopts block coordinate descent and enforces orthogonality of the factor matrices through repeated QR-based projections. However, the repeated projection steps can be computationally demanding and can slow convergence. Motivated by the need for scalable estimation and classification from such data, we propose LSRTR-M, which incorporates Muon (MomentUm Orthogonalized by Newton-Schulz) updates into the LSRTR framework. Specifically, LSRTR-M preserves the original block coordinate scheme while replacing the projection-based factor updates with Muon steps. Across synthetic linear, logistic, and Poisson LSR-TGLMs, LSRTR-M converges faster in both iteration count and wall-clock time, while achieving lower normalized estimation and prediction errors. On the Vessel MNIST 3D task, it further improves computational efficiency while maintaining competitive classification performance.
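The Muon ingredient referenced above is the Newton-Schulz iteration, which orthogonalizes a matrix without QR via X <- 1.5*X - 0.5*(X X^T) X. A minimal pure-Python sketch on a diagonal 2x2 example, illustrative only and not the paper's implementation:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def newton_schulz(X, steps=10):
    """Iterate X <- 1.5*X - 0.5*(X X^T) X; converges to the orthogonal
    factor of X when its singular values lie in (0, sqrt(3))."""
    for _ in range(steps):
        corr = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * X[i][j] - 0.5 * corr[i][j]
              for j in range(len(X[0]))] for i in range(len(X))]
    return X

X = [[0.9, 0.0], [0.0, 0.5]]  # diagonal example, singular values 0.9 and 0.5
Q = newton_schulz(X)
print(Q)  # approaches the identity, the orthogonal factor of X
```

Because the update uses only matrix multiplications, it is cheap and GPU-friendly compared with repeated QR factorizations, which is the efficiency LSRTR-M exploits.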
【33】Towards protein folding pathways by reconstructing protein residue networks with a policy-driven model
标题:通过策略驱动模型重建蛋白质残基网络,迈向蛋白质折叠途径
链接:https://arxiv.org/abs/2604.04677
作者:Susan Khor
备注:8 pages, 5 figures, 3 tables
摘要:A method that reconstructs protein residue networks using suitable node selection and edge recovery policies produced numerical observations that correlate strongly (Pearson's correlation coefficient < -0.83) with published folding rates for 52 two-state folders and 21 multi-state folders; correlations are also strong at the fold-family level. These results were obtained serendipitously with the ND model, which was introduced previously, but is here extended with policies that dictate actions according to feature states. This result points to the importance of both the starting search point and the prevailing condition (random seed) for the quick success of policy search by a simple hill-climber. The two conditions, suitable policies and random seed, which (evidenced by the strong correlation statistic) set up a conducive environment for modelling protein folding within ND, could be compared to appropriate physiological conditions required by proteins to fold naturally. Of interest is an examination of the sequence of restored edges for potential as plausible protein folding pathways. Towards this end, trajectory data is collected for analysis and further model evaluation and development.
【34】Minimaxity and Admissibility of Bayesian Neural Networks
标题:Bayesian神经网络的极小极大性和可容许性
链接:https://arxiv.org/abs/2604.04673
作者:Daniel Andrew Coulson,Martin T. Wells
备注:95 pages and 6 figures
摘要:Bayesian neural networks (BNNs) offer a natural probabilistic formulation for inference in deep learning models. Despite their popularity, their optimality has received limited attention through the lens of statistical decision theory. In this paper, we study decision rules induced by deep, fully connected feedforward ReLU BNNs in the normal location model under quadratic loss. We show that, for fixed prior scales, the induced Bayes decision rule is not minimax. We then propose a hyperprior on the effective output variance of the BNN prior that yields a superharmonic square-root marginal density, establishing that the resulting decision rule is simultaneously admissible and minimax. We further extend these results from the quadratic loss setting to the predictive density estimation problem with Kullback--Leibler loss. Finally, we validate our theoretical findings numerically through simulation.
【35】Interpretation of Crystal Energy Landscapes with Kolmogorov-Arnold Networks
标题:用Kolmogorov-Arnold网络解释晶体能量景观
链接:https://arxiv.org/abs/2604.04636
作者:Gen Zu,Ning Mao,Claudia Felser,Yang Zhang
摘要:Characterizing crystalline energy landscapes is essential to predicting thermodynamic stability, electronic structure, and functional behavior. While machine learning (ML) enables rapid property predictions, the "black-box" nature of most models limits their utility for generating new scientific insights. Here, we introduce Kolmogorov-Arnold Networks (KANs) as an interpretable framework to bridge this gap. Unlike conventional neural networks with fixed activation functions, KANs employ learnable functions that reveal underlying physical relationships. We developed the Element-Weighted KAN, a composition-only model that achieves state-of-the-art accuracy in predicting formation energy, band gap, and work function across large-scale datasets. Crucially, without any explicit physical constraints, KANs uncover interpretable chemical trends aligned with the periodic table and quantum mechanical principles through embedding analysis, correlation studies, and principal component analysis. These results demonstrate that KANs provide a powerful framework with high predictive performance and scientific interpretability, establishing a new paradigm for transparent, chemistry-based materials informatics.
【36】Generative Modeling under Non-Monotonic MAR Missingness via Approximate Wasserstein Gradient Flows
标题:通过近似Wasserstein梯度流进行非单调MAR缺失下的生成建模
链接:https://arxiv.org/abs/2604.04567
作者:Gitte Kremling,Jeffrey Näf,Johannes Lederer
摘要:The prevalence of missing values in data science poses a substantial risk to any further analyses. Despite a wealth of research, principled nonparametric methods to deal with general non-monotone missingness are still scarce. Instead, ad-hoc imputation methods are often used, for which it remains unclear whether the correct distribution can be recovered. In this paper, we propose FLOWGEM, a principled iterative method for generating a complete dataset from a dataset with values Missing at Random (MAR). Motivated by convergence results of the ignoring maximum likelihood estimator, our approach minimizes the expected Kullback-Leibler (KL) divergence between the observed data distribution and the distribution of the generated sample over different missingness patterns. To minimize the KL divergence, we employ a discretized particle evolution of the corresponding Wasserstein Gradient Flow, where the velocity field is approximated using a local linear estimator of the density ratio. This construction yields a data generation scheme that iteratively transports an initial particle ensemble toward the target distribution. Simulation studies and real-data benchmarks demonstrate that FLOWGEM achieves state-of-the-art performance across a range of settings, including the challenging case of non-monotonic MAR mechanisms. Together, these results position FLOWGEM as a principled and practical alternative to existing imputation methods, and a decisive step towards closing the gap between theoretical rigor and empirical performance.
【37】Sharp asymptotic theory for Q-learning with LDTZ learning rate and its generalization
标题:具有LDTZ学习率的Q-学习的Sharp渐近理论及其推广
链接:https://arxiv.org/abs/2604.04218
作者:Soham Bonnerjee,Zhipeng Lou,Wei Biao Wu
摘要:Despite the sustained popularity of Q-learning as a practical tool for policy determination, a majority of relevant theoretical literature deals with either constant ($η_{t}\equiv η$) or polynomially decaying ($η_{t} = ηt^{-α}$) learning schedules. However, it is well known that these choices suffer from either persistent bias or prohibitively slow convergence. In contrast, the recently proposed linear decay to zero (\texttt{LD2Z}: $η_{t,n}=η(1-t/n)$) schedule has shown appreciable empirical performance, but its theoretical and statistical properties remain largely unexplored, especially in the Q-learning setting. We address this gap in the literature by first considering a general class of power-law decay to zero (\texttt{PD2Z}-$ν$: $η_{t,n}=η(1-t/n)^ν$). Proceeding step-by-step, we present a sharp non-asymptotic error bound for Q-learning with \texttt{PD2Z}-$ν$ schedule, which then is used to derive a central limit theory for a new \textit{tail} Polyak-Ruppert averaging estimator. Finally, we also provide a novel time-uniform Gaussian approximation (also known as \textit{strong invariance principle}) for the partial sum process of Q-learning iterates, which facilitates bootstrap-based inference. All our theoretical results are complemented by extensive numerical experiments. Beyond being new theoretical and statistical contributions to the Q-learning literature, our results definitively establish that \texttt{LD2Z} and in general \texttt{PD2Z}-$ν$ achieve a best-of-both-worlds property: they inherit the rapid decay from initialization (characteristic of constant step-sizes) while retaining the asymptotic convergence guarantees (characteristic of polynomially decaying schedules). This dual advantage explains the empirical success of \texttt{LD2Z} while providing practical guidelines for inference through our results.
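The PD2Z-ν schedule itself is one line of code; a toy single-state sketch follows (illustrative only, not the paper's experimental setup):

```python
import random

def q_learning_pd2z(mean_reward, n=5000, eta=0.5, nu=1.0, seed=0):
    """Single-state Q-learning with the PD2Z schedule
    eta_{t,n} = eta * (1 - t/n) ** nu  (nu = 1 recovers LD2Z)."""
    rng = random.Random(seed)
    q = 0.0
    for t in range(n):
        step = eta * (1 - t / n) ** nu
        reward = mean_reward + rng.gauss(0.0, 1.0)
        q += step * (reward - q)  # TD(0) update toward the mean reward
    return q

est = q_learning_pd2z(3.0)
print(est)  # close to the true mean 3.0
```

The large early steps contract the initialization error quickly (as a constant step-size would), while the decay to zero by the horizon averages out the noise (as a polynomial decay would), which is the best-of-both-worlds behavior the paper formalizes.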
【38】Relay-Assisted Activation-Integrated SIM for Wireless Physical Neural Networks
标题:用于无线物理神经网络的中继辅助激活集成SIM
链接:https://arxiv.org/abs/2604.04212
作者:Meng Hua,Deniz Gündüz
摘要:Wireless physical neural networks (WPNNs) have emerged as a promising paradigm for performing neural computation directly in the physical layer of wireless systems, offering low latency and high energy efficiency. However, most existing WPNN implementations primarily rely on linear physical transformations, which fundamentally limits their expressiveness. In this work, we propose a relay-assisted WPNN architecture based on activation-integrated stacked intelligent metasurfaces (AI-SIMs), where each passive metasurface layer enabling linear wave manipulation is cascaded with an activation metasurface layer that realizes nonlinear processing in the analog domain. By deliberately structuring multi-hop wireless propagation, the relay amplification matrix and the metasurface phase-shift matrices jointly act as trainable network weights, while hardware-implemented activation functions provide essential nonlinearity. Simulation results demonstrate that the proposed architecture achieves high classification accuracy, and that incorporating hardware-based activation functions significantly improves representational capability and performance compared with purely linear physical implementations.
【39】Non-Equilibrium Stochastic Dynamics as a Unified Framework for Insight and Repetitive Learning: A Kramers Escape Approach to Continual Learning
标题:非平衡随机动力学作为洞察力和重复学习的统一框架:持续学习的Kramers逃逸方法
链接:https://arxiv.org/abs/2604.04154
作者:Gunn Kim
备注:12 pages, 4 figures
摘要:Continual learning in artificial neural networks is fundamentally limited by the stability--plasticity dilemma: systems that retain prior knowledge tend to resist acquiring new knowledge, and vice versa. Existing approaches, most notably elastic weight consolidation~(EWC), address this empirically without a physical account of why plasticity eventually collapses as tasks accumulate. Separately, the distinction between sudden insight and gradual skill acquisition through repetitive practice has lacked a unified theoretical description. Here, we show that both problems admit a common resolution within non-equilibrium statistical physics. We model the state of a learning system as a particle evolving under Langevin dynamics on a double-well energy landscape, with the noise amplitude governed by a time-dependent effective temperature $T(t)$. The probability density obeys a Fokker--Planck equation, and transitions between metastable states are governed by the Kramers escape rate $k = (ω_0ω_b/2π)\,e^{-ΔE/T}$. We make two contributions. First, we identify the EWC penalty term as an energy barrier whose height grows linearly with the number of accumulated tasks, yielding an exponential collapse of the transition rate predicted analytically and confirmed numerically. Second, we show that insight and repetitive learning correspond to two qualitatively distinct temperature protocols within the same Fokker--Planck equation: insight events produce transient spikes in $T(t)$ that drive rapid barrier crossing, whereas repetitive practice operates at a modestly elevated but fixed temperature, achieving transitions through sustained stochastic diffusion. These results establish a physically grounded framework for understanding plasticity and its failure in continual learning systems, and suggest principled design criteria for adaptive noise schedules in artificial intelligence.
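The exponential collapse of the transition rate follows directly from the Kramers formula once the barrier grows linearly with task count. A numeric sketch with illustrative constants; the frequencies, barrier increment, and temperature below are assumptions, not values from the paper:

```python
from math import pi, exp

def kramers_rate(delta_e, temp, omega0=1.0, omega_b=1.0):
    """Kramers escape rate k = (omega0 * omega_b / (2*pi)) * exp(-delta_e / temp)."""
    return (omega0 * omega_b / (2.0 * pi)) * exp(-delta_e / temp)

# EWC-style barrier growing linearly with the number of accumulated tasks.
base_barrier, per_task, temp = 1.0, 0.5, 0.2
rates = [kramers_rate(base_barrier + per_task * k, temp) for k in range(5)]
print(rates)  # each extra task multiplies the rate by exp(-per_task/temp)
```

Because each additional task multiplies the rate by a fixed factor below one, plasticity decays geometrically in the number of tasks, which is the collapse the paper predicts analytically and confirms numerically.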
【40】Topological Sensitivity in Connectome-Constrained Neural Networks
标题:连接体约束神经网络中的拓扑敏感性
链接:https://arxiv.org/abs/2604.04033
作者:Nalin Dhiman
备注:17 pages, 5 fig
摘要:Connectome-constrained neural networks are often evaluated against sparse random controls and then interpreted as evidence that biological graph topology improves learning efficiency. We revisit that claim in a controlled flyvis-based study using a Drosophila connectome, a naive self-loop-matched random graph, and a degree-preserving rewired null. Under weak controls, in which both models were recovered from a connectome-trained checkpoint and the null matched only global graph counts, the connectome appeared substantially better in early loss, mean activity, and runtime. That picture changed under stricter controls. Training both graphs from a shared random initialization removed the early loss advantage, and replacing the naive null by a degree-preserving null removed the apparent activity advantage. A five-sample degree-preserving ensemble and a pre-training activity-scale diagnostic further strengthened this revised interpretation. We also report a descriptive mechanism analysis of the earlier weak-control comparison, but we treat it as behavioral characterization rather than proof of causal superiority. We show that previously reported topology advantages in connectome-constrained neural networks can arise from initialization and null-model confounds, and largely disappear under fair from-scratch initialization and degree-preserving controls.
【41】Fused Multinomial Logistic Regression Utilizing Summary-Level External Machine-learning Information
标题:利用摘要级外部机器学习信息的融合多项逻辑回归
链接:https://arxiv.org/abs/2604.03939
作者:Chi-Shian Dai,Jun Shao
备注:24 pages, 2 figures
摘要:In many modern applications, a carefully designed primary study provides individual-level data for interpretable modeling, while summary-level external information is available through black-box, efficient, and nonparametric machine-learning predictions. Although summary-level external information has been studied in the data integration literature, there is limited methodology for leveraging external nonparametric machine-learning predictions to improve statistical inference in the primary study. We propose a general empirical-likelihood framework that incorporates external predictions through moment constraints. An advantage of nonparametric machine-learning prediction is that it induces a rich class of valid moment restrictions that remain robust to covariate shift under a mild overlap condition without requiring explicit density-ratio modeling. We focus on multinomial logistic regression as the primary model and address common data-quality issues in external sources, including coarsened outcomes, partially observed covariates, covariate shift, and heterogeneity in generating mechanisms known as concept shift. We establish large-sample properties of the resulting fused estimator, including consistency and asymptotic normality under regularity conditions. Moreover, we provide mild sufficient conditions under which incorporating external predictions delivers a strict efficiency gain relative to the primary-only estimator. Simulation studies and an application to the National Health and Nutrition Examination Survey on multiclass blood-pressure classification illustrate the performance of the proposed method.
【42】IPSL-AID: Generative Diffusion Models for Climate Downscaling from Global to Regional Scales
标题:IPSL-AID:从全球到区域尺度的气候降尺度生成扩散模型
链接:https://arxiv.org/abs/2604.03275
作者:Kishanthan Kingston,Olivier Boucher,Freddy Bouchet,Pierre Chapel,Rosemary Eade,Jean-Francois Lamarque,Redouane Lguensat,Kazem Ardaneh
备注:17 pages, 12 figures, submitted to Climate Informatique 2026, to appear in Environmental Data Science
摘要:Effective adaptation and mitigation strategies for climate change require high-resolution projections to inform strategic decision-making. Conventional global climate models, which typically operate at resolutions of 150 to 200 kilometers, lack the capacity to represent essential regional processes. IPSL-AID is a global to regional downscaling tool based on a denoising diffusion probabilistic model designed to address this limitation. Trained on ERA5 reanalysis data, it generates 0.25 degree resolution fields for temperature, wind, and precipitation using coarse inputs and their spatiotemporal context. It also models probability distributions of fine-scale features to produce plausible scenarios for uncertainty quantification. The model accurately reconstructs statistical distributions, including extreme events, power spectra, and spatial structures. This work highlights the potential of generative diffusion models for efficient climate downscaling with uncertainty quantification.
其他(58篇)
【1】Rethinking Exploration in RLVR: From Entropy Regularization to Refinement via Bidirectional Entropy Modulation
标题:重新思考RLVR中的探索:从熵正则化到通过双向熵调制实现细化
链接:https://arxiv.org/abs/2604.04894
作者:Hengrui Gu,Xiaotian Han,Yujing Bian,Kaixiong Zhou
摘要:Reinforcement learning with verifiable rewards (RLVR) has significantly advanced the reasoning capabilities of large language models (LLMs). However, it faces a fundamental limitation termed \textit{restricted exploration}, where the policy rapidly converges to a narrow set of solutions. While entropy regularization is a popular approach used to sustain exploration, it often proves unreliable for LLMs, suffering from high hyperparameter sensitivity and yielding only marginal performance gains. Motivated by these inefficiencies, we propose to rethink the relationship between policy entropy and exploration. By deriving a parametric formulation of group-relative advantage estimation and analyzing entropy dynamics, we conceptually decompose policy entropy into \textit{informative entropy}, which preserves diverse solution paths, and \textit{spurious entropy}, which erodes reasoning patterns. Our analysis reveals that, in contrast to blind maximization, effective exploration requires \textit{entropy refinement}-a mechanism implicitly embedded in group-relative advantage estimation that sustains informative entropy on positive rollouts while suppressing spurious entropy on negative ones. Guided by this insight, we propose \textbf{AsymGRPO}, an exploratory framework that explicitly decouples the modulation of positive and negative rollouts. This allows for independent control over the preservation of informative entropy and the suppression of spurious noise. Extensive experiments demonstrate that AsymGRPO achieves superior performance compared to strong baselines and exhibits the potential to synergize with existing entropy regularization methods.
【2】The Role of Generator Access in Autoregressive Post-Training
标题:生成器访问在自回归后训练中的作用
链接:https://arxiv.org/abs/2604.04855
作者:Amit Kiran Rege
备注:Work in progress
摘要:We study how generator access constrains autoregressive post-training. The central question is whether the learner is confined to fresh root-start rollouts or can return to previously built prefixes and query the next-token rule there. In the root-start regime, output sampling, generated-token log probabilities, top-$k$ reports, and full next-token distributions along sampled trajectories all reduce to one canonical experiment, limited by the on-policy probability of reaching informative prefixes. Weak prefix control breaks this barrier, and once control is available, richer observations such as conditional sampling or logits can outperform top-$1$ access. Changing only the generator interface creates an exponential gap for KL-regularized outcome-reward post-training.
【3】SkillX: Automatically Constructing Skill Knowledge Bases for Agents
标题:SkillX:为代理自动构建技能知识库
链接:https://arxiv.org/abs/2604.04804
作者:Chenxi Wang,Zhuoyun Yu,Xin Xie,Wuguannan Yao,Runnan Fang,Shuofei Qiao,Kexin Cao,Guozhou Zheng,Xiang Qi,Peng Zhang,Shumin Deng
备注:Work in progress
摘要:Learning from experience is critical for building capable large language model (LLM) agents, yet prevailing self-evolving paradigms remain inefficient: agents learn in isolation, repeatedly rediscover similar behaviors from limited experience, resulting in redundant exploration and poor generalization. To address this problem, we propose SkillX, a fully automated framework for constructing a \textbf{plug-and-play skill knowledge base} that can be reused across agents and environments. SkillX operates through a fully automated pipeline built on three synergistic innovations: \textit{(i) Multi-Level Skills Design}, which distills raw trajectories into three-tiered hierarchy of strategic plans, functional skills, and atomic skills; \textit{(ii) Iterative Skills Refinement}, which automatically revises skills based on execution feedback to continuously improve library quality; and \textit{(iii) Exploratory Skills Expansion}, which proactively generates and validates novel skills to expand coverage beyond seed training data. Using a strong backbone agent (GLM-4.6), we automatically build a reusable skill library and evaluate its transferability on challenging long-horizon, user-interactive benchmarks, including AppWorld, BFCL-v3, and $τ^2$-Bench. Experiments show that SkillKB consistently improves task success and execution efficiency when plugged into weaker base agents, highlighting the importance of structured, hierarchical experience representations for generalizable agent learning. Our code will be publicly available soon at https://github.com/zjunlp/SkillX.
【4】Forgetting to Witness: Efficient Federated Unlearning and Its Visible Evaluation
Title: Forgetting to Witness: Efficient Federated Unlearning and Its Visible Evaluation
Link: https://arxiv.org/abs/2604.04800
Authors: Houzhe Wang,Xiaojie Zhu,Chi Chen
Abstract: With the increasing importance of data privacy and security, federated unlearning has emerged as a novel research field dedicated to ensuring that federated learning models no longer retain or leak relevant information once specific data has been deleted. In this paper, to the best of our knowledge, we propose the first complete pipeline for federated unlearning, which includes a federated unlearning approach and an evaluation framework. Our proposed federated unlearning approach ensures high efficiency and model accuracy without the need to store historical data. It effectively leverages knowledge distillation alongside various optimization mechanisms. Moreover, we propose a framework named Skyeye to visualize the forgetting capacity of federated unlearning models. It utilizes the federated unlearning model as the classifier integrated into a Generative Adversarial Network (GAN). Afterward, both the classifier and discriminator guide the generator in generating samples. Throughout this process, the generator learns from the classifier's knowledge. The generator then visualizes this knowledge through sample generation. Finally, the model's forgetting capability is evaluated based on the relevance between the deleted data and the generated samples. Comprehensive experiments are conducted to illustrate the effectiveness of the proposed federated unlearning approach and the corresponding evaluation framework.
【5】Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange
Title: Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange
Link: https://arxiv.org/abs/2604.04757
Authors: Vinod Vaikuntanathan,Or Zamir
Abstract: AI agents are increasingly deployed to interact with other agents on behalf of users and organizations. We ask whether two such agents, operated by different entities, can carry out a parallel secret conversation while still producing a transcript that is computationally indistinguishable from an honest interaction, even to a strong passive auditor that knows the full model descriptions, the protocol, and the agents' private contexts. Building on recent work on watermarking and steganography for LLMs, we first show that if the parties possess an interaction-unique secret key, they can facilitate an optimal-rate covert conversation: the hidden conversation can exploit essentially all of the entropy present in the honest message distributions. Our main contributions concern extending this to the keyless setting, where the agents begin with no shared secret. We show that covert key exchange, and hence covert conversation, is possible even when each model has an arbitrary private context, and their messages are short and fully adaptive, assuming only that sufficiently many individual messages have at least constant min-entropy. This stands in contrast to previous covert communication works, which relied on the min-entropy in each individual message growing with the security parameter. To obtain this, we introduce a new cryptographic primitive, which we call pseudorandom noise-resilient key exchange: a key-exchange protocol whose public transcript is pseudorandom while still remaining correct under constant noise. We study this primitive, giving several constructions relevant to our application as well as strong limitations showing that more naive variants are impossible or vulnerable to efficient attacks. These results show that transcript auditing alone cannot rule out covert coordination between AI agents, and identify a new cryptographic theory that may be of independent interest.
【6】MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition
Title: MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition
Link: https://arxiv.org/abs/2604.04701
Authors: Seoungsub Lee,In Seo Kim,Seon Wook Kim
Abstract: Large language models (LLMs) have achieved outstanding performance across a wide range of natural language processing tasks, but their enormous parameter counts impose substantial memory and computational overheads. This challenge is particularly critical in NPU-based on-device environments, where FP16/FP32 computation is inefficient and integer (INT) quantization is therefore essential. However, existing methods, including ZeroQuant, LLM.int8(), and SmoothQuant, do not fully address input-activation outliers and the associated hardware inefficiencies. To overcome these limitations, we propose MUXQ (Mixed-to-Uniform Quantization). MUXQ detects outlier channels in input activations and introduces a small auxiliary matrix that redistributes outlier magnitudes across channels, thereby alleviating the outlier problem. This enables even activation outliers to be quantized at low-precision INT levels while preserving a hardware-friendly computation structure. Experiments on GPT-2 models at three scales (0.1B, 0.3B, and 0.7B parameters) using the WikiText-2 dataset show that MUXQ consistently achieves lower perplexity than naive quantization. In particular, under per-tensor quantization, MUXQ quantizes both activations and weights to INT8 while maintaining accuracy close to that of FP16. With only modest computational overhead, MUXQ enables stable low-precision inference and can be readily combined with other quantization techniques. These results suggest that MUXQ provides a promising direction for efficient and accurate LLM inference on edge devices.
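The activation-outlier problem that motivates MUXQ is easy to reproduce in a few lines. The sketch below is illustrative only and assumes nothing about MUXQ's actual auxiliary-matrix construction: it shows symmetric per-tensor INT8 quantization, how a single outlier channel inflates the shared scale and washes out the bulk values, and how a hypothetical clip-plus-residual decomposition restores bulk precision.

```python
def quantize_int8(xs):
    # Symmetric per-tensor INT8: one scale, shared by every value in the tensor.
    scale = max(abs(x) for x in xs) / 127 or 1.0
    return [max(-127, min(127, round(x / scale))) * scale for x in xs]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Toy activation vector: a dense bulk in [-0.5, 0.49] plus one outlier channel.
acts = [0.01 * i for i in range(-50, 50)] + [30.0]

err_naive = mse(acts, quantize_int8(acts))  # outlier dictates the scale

# Hypothetical decomposition (NOT the MUXQ auxiliary matrix): clamp the outlier
# into the bulk range and carry its excess as a separate higher-precision term.
clipped = [max(-0.5, min(0.5, x)) for x in acts]
residual = [x - c for x, c in zip(acts, clipped)]
err_decomposed = mse(acts, [q + r for q, r in zip(quantize_int8(clipped), residual)])
```

Because the clipped tensor's dynamic range is ~60x smaller, its INT8 grid is correspondingly finer, which is the effect any outlier-decomposition scheme exploits.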
【7】Batch Loss Score for Dynamic Data Pruning
Title: Batch Loss Score for Dynamic Data Pruning
Link: https://arxiv.org/abs/2604.04681
Authors: Qing Zhou,Bingxuan Zhao,Tao Yang,Hongyuan Zhang,Junyu Gao,Qi Wang
Note: Accepted to CVPR 2026
Abstract: Dynamic data pruning accelerates deep learning by selectively omitting less informative samples during training. While per-sample loss is a common importance metric, obtaining it can be challenging or infeasible for complex models or loss functions, often requiring significant implementation effort. This work proposes the Batch Loss Score (BLS), a computationally efficient alternative using an Exponential Moving Average (EMA) of readily available batch losses to assign scores to individual samples. We frame the batch loss, from the perspective of a single sample, as a noisy measurement of its scaled individual loss, with noise originating from stochastic batch composition. It is formally shown that the EMA mechanism functions as a first-order low-pass filter, attenuating high-frequency batch composition noise. This yields a score approximating the smoothed and persistent contribution of the individual sample to the loss, providing a theoretical grounding for BLS as a proxy for sample importance. BLS demonstrates remarkable code integration simplicity (\textbf{three-line injection}) and readily adapts existing per-sample loss-based methods (\textbf{one-line proxy}). Its effectiveness is demonstrated by enhancing two such methods to losslessly prune \textbf{20\%-50\%} of samples across \textit{14 datasets}, \textit{11 tasks} and \textit{18 models}, highlighting its utility and broad applicability, especially for complex scenarios where per-sample loss is difficult to access. Code is available at https://github.com/mrazhou/BLS.
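The core BLS mechanism, scoring each sample by an EMA of the losses of the batches it appears in, can be sketched in plain Python. The toy loss values, batch size, and smoothing factor `alpha` below are illustrative assumptions, not the paper's configuration; the point is that the EMA low-pass filter separates persistently hard samples from easy ones despite the batch-composition noise.

```python
import random

def bls_update(scores, batch_indices, batch_loss, alpha=0.1):
    """EMA update: every sample in the batch absorbs the shared batch loss."""
    for i in batch_indices:
        scores[i] = (1 - alpha) * scores[i] + alpha * batch_loss
    return scores

# Toy run: samples 0-4 are "hard" (true loss 2.0), samples 5-9 "easy" (0.5).
random.seed(0)
true_loss = [2.0] * 5 + [0.5] * 5
scores = [1.0] * 10                      # neutral initialization
for _ in range(2000):
    batch = random.sample(range(10), 4)  # random batch composition = noise
    batch_loss = sum(true_loss[i] for i in batch) / len(batch)
    bls_update(scores, batch, batch_loss)

hard_avg = sum(scores[:5]) / 5
easy_avg = sum(scores[5:]) / 5           # filtered scores rank hard above easy
```

A pruning policy would then keep the high-score samples; only the batch loss, already computed in any training loop, is ever observed.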
【8】From Curiosity to Caution: Mitigating Reward Hacking for Best-of-N with Pessimism
Title: From Curiosity to Caution: Mitigating Reward Hacking for Best-of-N with Pessimism
Link: https://arxiv.org/abs/2604.04648
Authors: Zhuohao Yu,Zhiwei Steven Wu,Adam Block
Note: 29 pages, 8 figures
Abstract: Inference-time compute scaling has emerged as a powerful paradigm for improving language model performance on a wide range of tasks, but the question of how best to use the additional compute remains open. A popular approach is Best-of-N (BoN) sampling, where N candidate responses are generated, scored according to a reward model, and the highest-scoring response is selected. While this approach can improve performance, it is vulnerable to reward hacking, where performance degrades as N increases due to the selection of responses that exploit imperfections in the reward model instead of genuinely improving generation quality. Prior attempts to mitigate reward hacking, via stronger reward models or heavy-handed distributional regularization, either fail to fully address over-optimization or are too conservative to exploit additional compute. In this work, we explore the principle of pessimism in RL, which uses lower confidence bounds on value estimates to avoid out-of-distribution (OOD) actions with uncertain reward estimates. Our approach, termed caution, can be seen as the reverse of curiosity: where curiosity rewards prediction error as a signal of novelty, caution penalizes prediction error as a signal of distributional uncertainty. Practically, caution trains an error model on typical responses and uses its prediction error to lower reward estimates for atypical ones. Our extensive empirical evaluation demonstrates that caution is a simple, computationally efficient approach that substantially mitigates reward hacking in BoN sampling. We also provide a theoretical analysis in a simplified linear setting, which shows that caution provably improves over the standard BoN approach. Together, our results not only establish caution as a practical solution to reward hacking, but also provide evidence that curiosity-based approaches can be a general OOD detection technique in LLM settings.
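The caution idea can be sketched on a toy 1-D problem. Everything below — the misspecified proxy reward, the indicator-style error model, the penalty weight `lam` — is a hypothetical stand-in chosen to make reward hacking visible, not the paper's construction:

```python
import random

def best_of_n(candidates, reward, penalty=None, lam=10.0):
    """Select the highest-scoring candidate; with a penalty, 'caution'
    subtracts lam * (estimated prediction error) from the proxy reward."""
    def score(c):
        s = reward(c)
        if penalty is not None:
            s -= lam * penalty(c)
        return s
    return max(candidates, key=score)

# Toy 1-D "responses": true quality peaks at 0, but the proxy reward is
# misspecified (hackable) outside the region |x| <= 2 seen during training.
def true_quality(x): return -x * x
def proxy_reward(x): return -x * x + 5.0 if abs(x) > 2 else -x * x
def error_model(x):  return 1.0 if abs(x) > 2 else 0.0  # stand-in error signal

random.seed(0)
cands = [random.gauss(0.0, 1.5) for _ in range(512)]
naive = best_of_n(cands, proxy_reward)                        # exploits the bonus region
cautious = best_of_n(cands, proxy_reward, penalty=error_model)  # stays in-distribution
```

Naive BoN selects a response from the spuriously rewarded OOD region, while the cautious score selects an in-distribution response with higher true quality.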
【9】LP-GEMM: Integrating Layout Propagation into GEMM Operations
Title: LP-GEMM: Integrating Layout Propagation into GEMM Operations
Link: https://arxiv.org/abs/2604.04599
Authors: César Guedes Carneiro,Lucas Alvarenga,Guido Araujo,Sandro Rigo
Abstract: In Scientific Computing and modern Machine Learning (ML) workloads, sequences of dependent General Matrix Multiplications (GEMMs) often dominate execution time. While state-of-the-art BLAS libraries aggressively optimize individual GEMM calls, they remain constrained by the BLAS API, which requires each call to independently pack input matrices and restore outputs to a canonical memory layout. In sequential GEMMs, these constraints cause redundant packing and unpacking, wasting valuable computational resources. This paper introduces LP-GEMM, a decomposition of the GEMM kernel that enables packing-layout propagation across sequential GEMM operations. This approach eliminates unnecessary data repacking while preserving full BLAS semantic correctness at the boundaries. We evaluate LP-GEMM on x86 (AVX-512) and RISC-V (RVV 1.0) architectures across MLP-like and Attention-like workloads. Our results show average speedups of 2.25x over OpenBLAS on Intel x86 for sequential GEMMs and competitive gains relative to vendor-optimized libraries such as Intel MKL. We demonstrate the practicality of the approach beyond microbenchmarks by implementing a standalone C++ version of the Llama-3.2 inference path using exclusively BLAS-level GEMM calls. These results confirm that leveraging data layout propagation between operations can significantly boost performance.
【10】Beyond Imbalance Ratio: Data Characteristics as Critical Moderators of Oversampling Method Selection
Title: Beyond Imbalance Ratio: Data Characteristics as Critical Moderators of Oversampling Method Selection
Link: https://arxiv.org/abs/2604.04541
Authors: Yuwen Jiang,Songyun Ye
Abstract: The prevailing IR-threshold paradigm posits a positive correlation between imbalance ratio (IR) and oversampling effectiveness, yet this assumption remains empirically unsubstantiated through controlled experimentation. We conducted 12 controlled experiments (N > 100 dataset variants) that systematically manipulated IR while holding data characteristics (class separability, cluster structure) constant via algorithmic generation of Gaussian mixture datasets. Two additional validation experiments examined ceiling effects and metric-dependence. All methods were evaluated on 17 real-world datasets from OpenML. Upon controlling for confounding variables, IR exhibited a weak to moderate negative correlation with oversampling benefits. Class separability emerged as a substantially stronger moderator, accounting for significantly more variance in method effectiveness than IR alone. We propose a 'Context Matters' framework that integrates IR, class separability, and cluster structure to provide evidence-based selection criteria for practitioners.
【11】Isokinetic Flow Matching for Pathwise Straightening of Generative Flows
Title: Isokinetic Flow Matching for Pathwise Straightening of Generative Flows
Link: https://arxiv.org/abs/2604.04491
Authors: Tauhid Khan
Abstract: Flow Matching (FM) constructs linear conditional probability paths, but the learned marginal velocity field inevitably exhibits strong curvature due to trajectory superposition. This curvature severely inflates numerical truncation errors, bottlenecking few-step sampling. To overcome this, we introduce Isokinetic Flow Matching (Iso-FM), a lightweight, Jacobian-free dynamical regularizer that directly penalizes pathwise acceleration. By using a self-guided finite-difference approximation of the material derivative Dv/Dt, Iso-FM enforces local velocity consistency without requiring auxiliary encoders or expensive second-order autodifferentiation. Operating as a pure plug-and-play addition to single-stage FM training, Iso-FM dramatically improves few-step generation. On CIFAR-10 (DiT-S/2), Iso-FM slashes conditional non-OT FID at 2 steps from 78.82 to 27.13 (a 2.9x relative efficiency gain) and reaches a best-observed FID at 4 steps of 10.23. These results firmly establish acceleration regularization as a principled, compute-efficient mechanism for fast generative sampling.
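The self-guided finite-difference estimate of the material derivative Dv/Dt = ∂v/∂t + (v·∇)v is the computational core the abstract describes: step along the flow itself and difference the velocities. A minimal sketch, using hand-written 2-D velocity fields in place of a learned network (function names and fields are illustrative):

```python
def material_derivative(v, x, t, eps=1e-3):
    """Self-guided finite difference: (v(x + eps*v(x,t), t + eps) - v(x,t)) / eps
    approximates Dv/Dt without any Jacobian or second-order autodiff."""
    vx = v(x, t)
    x_ahead = [xi + eps * vi for xi, vi in zip(x, vx)]
    return [(a - b) / eps for a, b in zip(v(x_ahead, t + eps), vx)]

def iso_penalty(v, x, t):
    """Squared pathwise acceleration: the quantity Iso-FM penalizes."""
    return sum(ai * ai for ai in material_derivative(v, x, t))

# A straight (constant-velocity) flow has zero acceleration ...
straight = lambda x, t: [1.0, -2.0]
# ... while the rotating field v(x) = (-x2, x1) has Dv/Dt = (-x1, -x2).
rotating = lambda x, t: [-x[1], x[0]]

p_straight = iso_penalty(straight, [0.5, 0.5], 0.3)
p_rotating = iso_penalty(rotating, [0.5, 0.5], 0.3)
```

At (0.5, 0.5) the rotating field's true acceleration is (-0.5, -0.5), so the penalty is ~0.5, while the straight field incurs exactly zero, which is the sense in which the regularizer rewards straightened trajectories.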
【12】Estimating Central, Peripheral, and Temporal Visual Contributions to Human Decision Making in Atari Games
Title: Estimating Central, Peripheral, and Temporal Visual Contributions to Human Decision Making in Atari Games
Link: https://arxiv.org/abs/2604.04439
Authors: Henrik Krauss,Takehisa Yairi
Abstract: We study how different visual information sources contribute to human decision making in dynamic visual environments. Using Atari-HEAD, a large-scale Atari gameplay dataset with synchronized eye-tracking, we introduce a controlled ablation framework as a means to reverse-engineer the contribution of peripheral visual information, explicit gaze information in the form of gaze maps, and past-state information from human behavior. We train action-prediction networks under six settings that selectively include or exclude these information sources. Across 20 games, peripheral information shows by far the strongest contribution, with median prediction-accuracy drops in the range of 35.27-43.90% when removed. Gaze information yields smaller drops of 2.11-2.76%, while past-state information shows a broader range of 1.52-15.51%, with the upper end likely more informative due to reduced peripheral-information leakage. To complement aggregate accuracies, we cluster states by true-action probabilities assigned by the different model configurations. This analysis identifies coarse behavioral regimes, including focus-dominated, periphery-dominated, and more contextual decision situations. These results suggest that human decision making in Atari depends strongly on information beyond the current focus of gaze, while the proposed framework provides a way to estimate such information-source contributions from behavior.
【13】Context is All You Need
Title: Context is All You Need
Link: https://arxiv.org/abs/2604.04364
Authors: Jean Erik Delanois,Shruti Joshi,Ryan Golden,Teresa Nick,Maxim Bazhenov
Abstract: Artificial Neural Networks (ANNs) are increasingly deployed across diverse real-world settings, where they must operate under data distributions that differ from those seen during training. This challenge is central to Domain Generalization (DG), which trains models to generalize to unseen domains without target data, and Test-Time Adaptation (TTA), which improves robustness by adapting to unlabeled test data at deployment. Existing approaches to address these challenges are often complex, resource-intensive, and difficult to scale. We introduce CONTXT (Contextual augmentatiOn for Neural feaTure X Transforms), a simple and intuitive method for contextual adaptation. CONTXT modulates internal representations using simple additive and multiplicative feature transforms. Within a TTA setting, it yields consistent gains across discriminative tasks (e.g., ANN/CNN classification) and generative models (e.g., LLMs). The method is lightweight, easy to integrate, and incurs minimal overhead, enabling robust performance under domain shift without added complexity. More broadly, CONTXT provides a compact way to steer information flow and neural processing without retraining.
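The additive and multiplicative feature transforms the abstract mentions can be sketched directly. The function name and the particular scale/shift values below are illustrative assumptions (the mechanism resembles FiLM-style modulation of a hidden representation):

```python
def contxt_transform(features, scale, shift):
    """Elementwise multiplicative (scale) and additive (shift) modulation of an
    internal representation: h' = scale * h + shift, applied per feature."""
    return [s * h + b for h, s, b in zip(features, scale, shift)]

h = [1.0, -2.0, 0.5]                                              # a hidden vector
identity = contxt_transform(h, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])  # no-op baseline
adapted = contxt_transform(h, [0.5, 2.0, 1.0], [0.25, 0.0, -0.5]) # context-shifted
```

Because the transform reduces to the identity at scale 1 and shift 0, it can be inserted into a frozen network with no effect until the context parameters are tuned, which is what makes this family of adapters cheap to integrate.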
【14】Soft Tournament Equilibrium
Title: Soft Tournament Equilibrium
Link: https://arxiv.org/abs/2604.04328
Authors: Saad Alqithami
Abstract: The evaluation of general-purpose artificial agents, particularly those based on large language models, presents a significant challenge due to the non-transitive nature of their interactions. When agent A defeats B, B defeats C, and C defeats A, traditional ranking methods that force a linear ordering can be misleading and unstable. We argue that for such cyclic domains, the fundamental object of evaluation should not be a ranking but a set-valued core, as conceptualized in classical tournament theory. This paper introduces Soft Tournament Equilibrium (STE), a differentiable framework for learning and computing set-valued tournament solutions directly from pairwise comparison data. STE first learns a probabilistic tournament model, potentially conditioned on rich contextual information. It then employs novel, differentiable operators for soft reachability and soft covering to compute continuous analogues of two seminal tournament solutions: the Top Cycle and the Uncovered Set. The output is a set of core agents, each with a calibrated membership score, providing a nuanced and robust assessment of agent capabilities. We develop the theoretical foundation for STE, proving its consistency with classical solutions in the zero-temperature limit (which establishes its Condorcet-inclusion properties) and analyzing its stability and sample complexity. We specify an experimental protocol for validating STE on both synthetic and real-world benchmarks. This work aims to provide a complete, standalone treatise that re-centers general-agent evaluation on a more appropriate and robust theoretical foundation, moving from unstable rankings to stable, set-valued equilibria.
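A minimal sketch of what a soft reachability operator might look like, assuming sigmoid-smoothed dominance and a product-based soft-OR for the transitive closure; STE's exact operators may differ, and the win-probability matrix below is a toy:

```python
import math

def sig(x, t):
    return 1.0 / (1.0 + math.exp(-x / t))

def soft_top_cycle(P, tau=0.05, hops=3):
    """Soft reachability scores: smooth the dominance relation P[i][j] > 0.5
    with a sigmoid at temperature tau, take a soft transitive closure via a
    product-based soft-OR, and score each agent by how strongly it reaches
    every other agent. At tau -> 0 this recovers the hard Top Cycle."""
    n = len(P)
    R = [[sig(P[i][j] - 0.5, tau) for j in range(n)] for i in range(n)]
    for _ in range(hops):
        R = [[1 - (1 - R[i][j]) * math.prod(1 - R[i][m] * R[m][j] for m in range(n))
              for j in range(n)] for i in range(n)]
    return [math.prod(R[i][j] for j in range(n) if j != i) for i in range(n)]

# A beats B, B beats C, C beats A (a 3-cycle); D loses to all of them.
P = [[0.5, 0.9, 0.1, 0.9],
     [0.1, 0.5, 0.9, 0.9],
     [0.9, 0.1, 0.5, 0.9],
     [0.1, 0.1, 0.1, 0.5]]
scores = soft_top_cycle(P)  # soft membership scores for the core set
```

The three cyclic agents all receive near-1 membership (each reaches the others through the cycle), while the dominated agent's score collapses toward zero — a set-valued answer where any forced linear ranking of A, B, C would be arbitrary.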
【15】Effects of Generative AI Errors on User Reliance Across Task Difficulty
Title: Effects of Generative AI Errors on User Reliance Across Task Difficulty
Link: https://arxiv.org/abs/2604.04319
Authors: Jacy Reese Anthis,Hannah Cha,Solon Barocas,Alexandra Chouldechova,Jake Hofman
Note: Published in CHI EA 2026
Abstract: The capabilities of artificial intelligence (AI) lie along a jagged frontier, where AI systems surprisingly fail on tasks that humans find easy and succeed on tasks that humans find hard. To investigate user reactions to this phenomenon, we developed an incentive-compatible experimental methodology based on diagram generation tasks, in which we induce errors in generative AI output and test effects on user reliance. We demonstrate the interface in a preregistered 3x2 experiment (N = 577) with error rates of 10%, 30%, or 50% on easier or harder diagram generation tasks. We confirmed that observing more errors reduces use, but we unexpectedly found that easy-task errors did not significantly reduce use more than hard-task errors, suggesting that people are not averse to jaggedness in this experimental setting. We encourage future work that varies task difficulty at the same time as other features of AI errors, such as whether the jagged error patterns are easily learned.
【16】Correcting Source Mismatch in Flow Matching with Radial-Angular Transport
Title: Correcting Source Mismatch in Flow Matching with Radial-Angular Transport
Link: https://arxiv.org/abs/2604.04291
Authors: Fouad Oubari,Mathilde Mougeot
Abstract: Flow Matching is typically built from Gaussian sources and Euclidean probability paths. For heavy-tailed or anisotropic data, however, a Gaussian source induces a structural mismatch already at the level of the radial distribution. We introduce \textit{Radial--Angular Flow Matching (RAFM)}, a framework that explicitly corrects this source mismatch within the standard simulation-free Flow Matching template. RAFM uses a source whose radial law matches that of the data and whose conditional angular distribution is uniform on the sphere, thereby removing the Gaussian radial mismatch by construction. This reduces the remaining transport problem to angular alignment, which leads naturally to conditional paths on scaled spheres defined by spherical geodesic interpolation. The resulting framework yields explicit Flow Matching targets tailored to radial--angular transport without modifying the underlying deterministic training pipeline. We establish the exact density of the matched-radial source, prove a radial--angular KL decomposition that isolates the Gaussian radial penalty, characterize the induced target vector field, and derive a stability result linking Flow Matching error to generation error. We further analyze empirical estimation of the radial law, for which Wasserstein and CDF metrics provide natural guarantees. Empirically, RAFM substantially improves over standard Gaussian Flow Matching and remains competitive with recent non-Gaussian alternatives while preserving a lightweight deterministic training procedure. Overall, RAFM provides a principled source-and-path design for Flow Matching on heavy-tailed and extreme-event data.
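The matched-radial source is straightforward to sample: draw a radius from the data's empirical radial law and pair it with a uniformly random direction. A sketch under the assumption that the empirical radial marginal is used directly (the paper also analyzes estimating it); the heavy-tailed toy data is illustrative:

```python
import math, random

def norm(x):
    return math.sqrt(sum(c * c for c in x))

def sample_matched_radial(data, rng):
    """Matched-radial source: radius drawn from the data's empirical radial
    law, direction uniform on the sphere (normalized isotropic Gaussian)."""
    r = norm(rng.choice(data))                      # an observed radius
    g = [rng.gauss(0.0, 1.0) for _ in range(len(data[0]))]
    n = norm(g) or 1.0
    return [r * c / n for c in g]                   # radius r, random direction

rng = random.Random(0)
# heavy-tailed toy data in 3-D (Gaussian components with log-normal scale)
data = [[rng.gauss(0, 1) * math.exp(rng.gauss(0, 1)) for _ in range(3)]
        for _ in range(500)]
samples = [sample_matched_radial(data, rng) for _ in range(500)]
data_radii = [norm(x) for x in data]
src_radii = [norm(x) for x in samples]
```

By construction every source radius is an observed data radius, so the radial mismatch a standard Gaussian source would introduce is removed before any transport is learned.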
【17】Beyond Fluency: Toward Reliable Trajectories in Agentic IR
Title: Beyond Fluency: Toward Reliable Trajectories in Agentic IR
Link: https://arxiv.org/abs/2604.04269
Authors: Anushree Sinha,Srivaths Ranganathan,Debanshu Das,Abhishek Dharmaratnakar
Abstract: Information Retrieval is shifting from passive document ranking toward autonomous agentic workflows that operate in multi-step Reason-Act-Observe loops. In such long-horizon trajectories, minor early errors can cascade, leading to functional misalignment between internal reasoning and external tool execution despite continued linguistic fluency. This position paper synthesizes failure modes observed in industrial agentic systems, categorizing errors across planning, retrieval, reasoning, and execution. We argue that safe deployment requires moving beyond endpoint accuracy toward trajectory integrity and causal attribution. To address compounding error and deceptive fluency, we propose verification gates at each interaction unit and advocate systematic abstention under calibrated uncertainty. Reliable Agentic IR systems must prioritize process correctness and grounded execution over plausible but unverified completion.
【18】Three Phases of Expert Routing: How Load Balance Evolves During Mixture-of-Experts Training
Title: Three Phases of Expert Routing: How Load Balance Evolves During Mixture-of-Experts Training
Link: https://arxiv.org/abs/2604.04230
Authors: Charafeddine Mouzouni
Abstract: We model Mixture-of-Experts (MoE) token routing as a congestion game with a single effective parameter, the congestion coefficient gamma_eff, that quantifies the balance-quality tradeoff. Tracking gamma_eff across training checkpoints of two open-source MoE models, OLMoE-1B-7B (20 checkpoints, with dense sampling in the surge region) and OpenMoE-8B (6 checkpoints), reveals a three-phase trajectory: a surge phase where the router learns to balance load (gamma_eff: 14 to 36-39, peaking in the step 30K-40K region), a stabilization phase where experts specialize under steady balance (B_0: 2.4 to 2.3, steps 100K-400K), and a relaxation phase where the router trades balance for quality as experts differentiate (gamma_eff: 27 to 9, steps 400K-1.2M). This non-monotone trajectory, invisible to post-hoc analysis of converged models, reveals that early MoE training prioritizes balance while late training prioritizes quality. The theoretical framework is honest about its limits: the single-type equilibrium reduces to temperature-scaled softmax (held-out L1: MFG = 0.199 vs. softmax = 0.200). The game is not a better predictor; it reveals what the temperature means and, critically, how that temperature evolves. We complement the dynamics with an effective congestion decomposition, a multi-type extension that improves load prediction via token clustering on all 16 layers (mean: 30%), scope diagnostics (K/M, epsilon_l), and robustness verification across four independent quality estimators (r >= 0.89). All confidence intervals are from bootstrap resampling over 50 independent text batches.
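One hypothetical reading of congestion-penalized routing, in which expected load enters the softmax logit with weight gamma; this is a stand-in to illustrate the balance-quality tradeoff the coefficient quantifies, not the paper's actual estimator or equilibrium computation:

```python
import math

def route(quality, gamma, damp=0.3, iters=1000):
    """Damped fixed-point iteration of congestion-penalized softmax routing:
    p_e proportional to exp(q_e - gamma * p_e), where p_e is expected load.
    gamma = 0 recovers a pure quality softmax."""
    n = len(quality)
    p = [1.0 / n] * n
    for _ in range(iters):
        w = [math.exp(q - gamma * pe) for q, pe in zip(quality, p)]
        z = sum(w)
        p = [(1 - damp) * pe + damp * x / z for pe, x in zip(p, w)]
    return p

def imbalance(p):
    return max(p) * len(p)  # max expert load relative to uniform load

quality = [2.0, 1.0, 0.0, 0.0]        # heterogeneous expert quality
greedy = route(quality, gamma=0.0)     # quality-only routing: imbalanced
balanced = route(quality, gamma=5.0)   # congestion penalty flattens the load
```

Sweeping gamma traces exactly the tradeoff the abstract describes: large gamma buys balance at the cost of routing tokens away from the best experts, small gamma does the reverse.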
【19】ClawArena: Benchmarking AI Agents in Evolving Information Environments
Title: ClawArena: Benchmarking AI Agents in Evolving Information Environments
Link: https://arxiv.org/abs/2604.04202
Authors: Haonian Ji,Kaiwen Xiong,Siwei Han,Peng Xia,Shi Qiu,Yiyang Zhou,Jiaqi Liu,Jinlong Li,Bingzhou Li,Zeyu Zheng,Cihang Xie,Huaxiu Yao
Abstract: AI agents deployed as persistent assistants must maintain correct beliefs as their information environment evolves. In practice, evidence is scattered across heterogeneous sources that often contradict one another, new information can invalidate earlier conclusions, and user preferences surface through corrections rather than explicit instructions. Existing benchmarks largely assume static, single-authority settings and do not evaluate whether agents can keep up with this complexity. We introduce ClawArena, a benchmark for evaluating AI agents in evolving information environments. Each scenario maintains a complete hidden ground truth while exposing the agent only to noisy, partial, and sometimes contradictory traces across multi-channel sessions, workspace files, and staged updates. Evaluation is organized around three coupled challenges: multi-source conflict reasoning, dynamic belief revision, and implicit personalization, whose interactions yield a 14-category question taxonomy. Two question formats, multi-choice (set-selection) and shell-based executable checks, test both reasoning and workspace grounding. The current release contains 64 scenarios across 8 professional domains, totaling 1,879 evaluation rounds and 365 dynamic updates. Experiments on five agent frameworks and five language models show that both model capability (15.4% range) and framework design (9.2%) substantially affect performance, that self-evolving skill frameworks can partially close model-capability gaps, and that belief revision difficulty is determined by update design strategy rather than the mere presence of updates. Code is available at https://github.com/aiming-lab/ClawArena.
【20】Which Leakage Types Matter?
Title: Which Leakage Types Matter?
Link: https://arxiv.org/abs/2604.04199
Authors: Simon Roth
Note: 35 pages, 6 figures, 10 tables. Companion to arXiv:2603.10742
Abstract: We run twenty-eight within-subject counterfactual experiments across 2,047 tabular datasets, plus a boundary experiment on 129 temporal datasets, measuring the severity of four data leakage classes in machine learning. Class I (estimation: fitting scalers on full data) is negligible: all nine conditions produce $|\Delta\text{AUC}| \leq 0.005$. Class II (selection: peeking, seed cherry-picking) is substantial: ~90% of the measured effect is noise exploitation that inflates reported scores. Class III (memorization) scales with model capacity: d_z = 0.37 (Naive Bayes) to 1.11 (Decision Tree). Class IV (boundary) is invisible under random CV. The textbook emphasis is inverted: normalization leakage matters least; selection leakage at practical dataset sizes matters most.
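Class I (estimation leakage) is easy to reproduce: fit the scaler on train+test versus train only, then compare the standardized features. The toy data below is an assumption, not the paper's benchmark; the point is that with ample i.i.d. data the two scalings nearly coincide, consistent with the reported negligible effect:

```python
import random, statistics

def standardize(train, test, leak=False):
    """Class I leakage: fit mean/std on train+test (leak=True) vs. train only."""
    fit = train + test if leak else train
    mu, sd = statistics.mean(fit), statistics.pstdev(fit)
    return [(x - mu) / sd for x in train], [(x - mu) / sd for x in test]

random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(1000)]
test = [random.gauss(0.0, 1.0) for _ in range(1000)]
clean_train, _ = standardize(train, test, leak=False)
leaky_train, _ = standardize(train, test, leak=True)
# largest per-value shift the leak induces on the training features
gap = max(abs(a - b) for a, b in zip(clean_train, leaky_train))
```

The leaked fit shifts each standardized value only by the sampling error of the test split's mean and variance, which shrinks as 1/sqrt(n), whereas selection leakage (Class II) has no such vanishing mechanism.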
【21】Stable and Privacy-Preserving Synthetic Educational Data with Empirical Marginals: A Copula-Based Approach
Title: Stable and Privacy-Preserving Synthetic Educational Data with Empirical Marginals: A Copula-Based Approach
Link: https://arxiv.org/abs/2604.04195
Authors: Gabriel Diaz Ramos,Lorenzo Luzi,Debshila Basu Mallick,Richard Baraniuk
Note: 10 pages, 6 figures. Accepted at the Educational Data Mining (EDM) 2026 conference
Abstract: To advance Educational Data Mining (EDM) within strict privacy-protecting regulatory frameworks, researchers must develop methods that enable data-driven analysis while protecting sensitive student information. Synthetic data generation is one such approach, enabling the release of statistically generated samples instead of real student records; however, existing deep learning and parametric generators often distort marginal distributions and degrade under iterative regeneration, leading to distribution drift and progressive loss of distributional support that compromise reliability. In response, we introduce the Non-Parametric Gaussian Copula (NPGC), a plug-and-play synthesis method that replaces deep learning and parametric optimization with empirical statistical anchoring to preserve the observed marginal distributions while modeling dependencies through a copula framework. NPGC integrates Differential Privacy (DP) at both the marginal and correlation levels, supports heterogeneous variable types, and treats missing data as an explicit state to retain informative absence patterns. We evaluate NPGC against deep learning and parametric baselines on five benchmark datasets and demonstrate that it remains stable across multiple regeneration cycles and achieves competitive downstream performance at substantially lower computational cost. We further validate NPGC through deployment in a real-world online learning platform, demonstrating its practicality for privacy-preserving research.
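A minimal Gaussian-copula sketch with empirical marginals, in the spirit of NPGC but without its differential-privacy noise, heterogeneous-type handling, or explicit missing-data state; all variable names and toy distributions below are illustrative:

```python
import random, statistics

nd = statistics.NormalDist()

def copula_sample(col_a, col_b, rho, rng, n):
    """Gaussian-copula synthesis with empirical marginals: draw correlated
    normal scores, map them through Phi to uniforms, and invert each uniform
    via the observed order statistics (the empirical quantile function)."""
    a, b = sorted(col_a), sorted(col_b)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        i = min(int(nd.cdf(z1) * len(a)), len(a) - 1)  # empirical quantile rank
        j = min(int(nd.cdf(z2) * len(b)), len(b) - 1)
        out.append((a[i], b[j]))
    return out

rng = random.Random(0)
grades = [min(100.0, max(0.0, rng.gauss(75, 15))) for _ in range(800)]  # bounded, skewed
times = [rng.expovariate(1 / 30) for _ in range(800)]                   # heavy-tailed
synth = copula_sample(grades, times, rho=0.6, rng=rng, n=800)
```

Because every synthetic value is an observed order statistic, the marginal support cannot drift under regeneration — the stability property the abstract emphasizes — while the latent correlation rho carries the dependence structure.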
【22】Graphic-Design-Bench: A Comprehensive Benchmark for Evaluating AI on Graphic Design Tasks
Title: Graphic-Design-Bench: A Comprehensive Benchmark for Evaluating AI on Graphic Design Tasks
Link: https://arxiv.org/abs/2604.04192
Authors: Adrienne Deganutti,Elad Hirsch,Haonan Zhu,Jaejung Seol,Purvanshi Mehta
Abstract: We introduce GraphicDesignBench (GDB), the first comprehensive benchmark suite designed specifically to evaluate AI models on the full breadth of professional graphic design tasks. Unlike existing benchmarks that focus on natural-image understanding or generic text-to-image synthesis, GDB targets the unique challenges of professional design work: translating communicative intent into structured layouts, rendering typographically faithful text, manipulating layered compositions, producing valid vector graphics, and reasoning about animation. The suite comprises 50 tasks organized along five axes: layout, typography, infographics, template & design semantics, and animation, each evaluated under both understanding and generation settings, and grounded in real-world design templates drawn from the LICA layered-composition dataset. We evaluate a set of frontier closed-source models using a standardized metric taxonomy covering spatial accuracy, perceptual quality, text fidelity, semantic alignment, and structural validity. Our results reveal that current models fall short on the core challenges of professional design: spatial reasoning over complex layouts, faithful vector code generation, fine-grained typographic perception, and temporal decomposition of animations remain largely unsolved. While high-level semantic understanding is within reach, the gap widens sharply as tasks demand precision, structure, and compositional awareness. GDB provides a rigorous, reproducible testbed for tracking progress toward AI systems that can function as capable design collaborators. The full evaluation framework is publicly available.
【23】FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification
Title: FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification
Link: https://arxiv.org/abs/2604.04074
Authors: Hang Xu,Ling Yue,Chaoqian Ouyang,Libin Zheng,Shaowu Pan,Shimin Di,Min-Ling Zhang
Abstract: Peer review in machine learning is under growing pressure from rising submission volume and limited reviewer time. Most LLM-based reviewing systems read only the manuscript and generate comments from the paper's own narrative. This makes their outputs sensitive to presentation quality and leaves them weak when the evidence needed for review lies in related work or released code. We present FactReview, an evidence-grounded reviewing system that combines claim extraction, literature positioning, and execution-based claim verification. Given a submission, FactReview identifies major claims and reported results, retrieves nearby work to clarify the paper's technical position, and, when code is available, executes the released repository under bounded budgets to test central empirical claims. It then produces a concise review and an evidence report that assigns each major claim one of five labels: Supported, Supported by the paper, Partially supported, In conflict, or Inconclusive. In a case study on CompGCN, FactReview reproduces results that closely match those reported for link prediction and node classification, yet also shows that the paper's broader performance claim across tasks is not fully sustained: on MUTAG graph classification, the reproduced result is 88.4%, whereas the strongest baseline reported in the paper remains 92.6%. The claim is therefore only partially supported. More broadly, this case suggests that AI is most useful in peer review not as a final decision-maker, but as a tool for gathering evidence and helping reviewers produce more evidence-grounded assessments. The code is public at https://github.com/DEFENSE-SEU/Review-Assistant.
【24】Multirate Stein Variational Gradient Descent for Efficient Bayesian Sampling
标题:多速率Stein变分梯度下降用于高效的Bayesian采样
链接:https://arxiv.org/abs/2604.03981
作者:Arash Sarshar
摘要:Many particle-based Bayesian inference methods use a single global step size for all parts of the update. In Stein variational gradient descent (SVGD), however, each update combines two qualitatively different effects: attraction toward high-posterior regions and repulsion that preserves particle diversity. These effects can evolve at different rates, especially in high-dimensional, anisotropic, or hierarchical posteriors, so one step size can be unstable in some regions and inefficient in others. We derive a multirate version of SVGD that updates these components on different time scales. The framework yields practical algorithms, including a symmetric split method, a fixed multirate method (MR-SVGD), and an adaptive multirate method (Adapt-MR-SVGD) with local error control. We evaluate the methods in a broad and rigorous benchmark suite covering six problem families: a 50D Gaussian target, multiple 2D synthetic targets, UCI Bayesian logistic regression, multimodal Gaussian mixtures, Bayesian neural networks, and large-scale hierarchical logistic regression. Evaluation includes posterior-matching metrics, predictive performance, calibration quality, mixing, and explicit computational cost accounting. Across these six benchmark families, multirate SVGD variants improve robustness and quality-cost tradeoffs relative to vanilla SVGD. The strongest gains appear on stiff hierarchical, strongly anisotropic, and multimodal targets, where adaptive multirate SVGD is usually the strongest variant and fixed multirate SVGD provides a simpler robust alternative at lower cost.
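The attraction-repulsion decomposition that motivates the multirate scheme is easy to sketch. Below is a minimal NumPy toy with an RBF kernel and two independent step sizes; the names `multirate_svgd_step`, `eps_attract`, and `eps_repulse` are illustrative, not the paper's MR-SVGD or Adapt-MR-SVGD implementation:

```python
import numpy as np

def rbf_kernel(X, h):
    # Pairwise RBF kernel k(x_j, x_i) and its gradient with respect to x_j.
    diff = X[:, None, :] - X[None, :, :]        # diff[j, i] = x_j - x_i
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))
    gradK = -diff / h ** 2 * K[:, :, None]      # d k(x_j, x_i) / d x_j
    return K, gradK

def multirate_svgd_step(X, score, eps_attract, eps_repulse, h=1.0):
    """One SVGD step with separate step sizes for the attraction term
    (kernel-weighted score, pulls particles toward high density) and the
    repulsion term (kernel gradient, keeps particles apart)."""
    n = X.shape[0]
    K, gradK = rbf_kernel(X, h)
    attract = K.T @ score(X) / n                # (1/n) sum_j k(x_j, x_i) grad log p(x_j)
    repulse = gradK.sum(axis=0) / n             # (1/n) sum_j grad_{x_j} k(x_j, x_i)
    return X + eps_attract * attract + eps_repulse * repulse
```

With `eps_attract == eps_repulse` this reduces to vanilla SVGD; the multirate variants update the two terms on different time scales.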
【25】Automating Cloud Security and Forensics Through a Secure-by-Design Generative AI Framework
标题:通过安全始于设计的生成式人工智能框架实现云安全与取证自动化
链接:https://arxiv.org/abs/2604.03912
作者:Dalal Alharthi,Ivan Roberto Kawaminami Garcia
备注:arXiv admin note: substantial text overlap with arXiv:2510.00452
摘要:As cloud environments become increasingly complex, cybersecurity and forensic investigations must evolve to meet emerging threats. Large Language Models (LLMs) have shown promise in automating log analysis and reasoning tasks, yet they remain vulnerable to prompt injection attacks and lack forensic rigor. To address these dual challenges, we propose a unified, secure-by-design GenAI framework that integrates PromptShield and the Cloud Investigation Automation Framework (CIAF). PromptShield proactively defends LLMs against adversarial prompts using ontology-driven validation that standardizes user inputs and mitigates manipulation. CIAF streamlines cloud forensic investigations through structured, ontology-based reasoning across all six phases of the forensic process. We evaluate our system on real-world datasets from AWS and Microsoft Azure, demonstrating substantial improvements in both LLM security and forensic accuracy. Experimental results show PromptShield boosts classification performance under attack conditions, achieving precision, recall, and F1 scores above 93%, while CIAF enhances ransomware detection accuracy in cloud logs using Likert-transformed performance features. Our integrated framework advances the automation, interpretability, and trustworthiness of cloud forensics and LLM-based systems, offering a scalable foundation for real-time, AI-driven incident response across diverse cloud infrastructures.
【26】Lotka-Sharpe Neural Operators for Control of Population PDEs
标题:用于控制种群PDEs的Lotka-Sharpe神经算子
链接:https://arxiv.org/abs/2604.03892
作者:Miroslav Krstic,Iasson Karafyllis,Luke Bhan,Carina Veil
备注:16 pages. In submission
摘要:Age-structured predator-prey integro-partial differential equations provide models of interacting populations in ecology, epidemiology, and biotechnology. A key challenge in feedback design for these systems is the scalar $ζ$, defined implicitly by the Lotka-Sharpe nonlinear integral condition, as a mapping from fertility and mortality rates to $ζ$. To solve this challenge with operator learning, we first prove that the Lotka-Sharpe operator is Lipschitz continuous, guaranteeing the existence of arbitrarily accurate neural operator approximations over a compact set of fertility and mortality functions. We then show that the resulting approximate feedback law preserves semi-global practical asymptotic stability under propagation of the operator approximation error through various other nonlinear operators, all the way through to the control input. In the numerical results, not only do we learn "once-and-for-all" the canonical Lotka-Sharpe (LS) operator, and thus make it available for future uses in control of other age-structured population interconnections, but we demonstrate the online usage of the neural LS operator under estimation of the fertility and mortality functions.
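For readers unfamiliar with the Lotka-Sharpe condition, the sketch below solves for the scalar $ζ$ numerically by bisection given fertility and mortality rates. This is the classical root-finding baseline that the learned operator replaces; the function name `lotka_sharpe_zeta` and the discretization choices are assumptions for illustration, not the paper's code:

```python
import numpy as np

def _trapz(y, x):
    # trapezoid rule (kept explicit so the sketch has no SciPy dependency)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def lotka_sharpe_zeta(beta, mu, A=1.0, n=400, lo=-50.0, hi=50.0, tol=1e-10):
    """Solve the Lotka-Sharpe integral condition for the scalar zeta:
        1 = int_0^A beta(a) * exp(-zeta * a - int_0^a mu(s) ds) da,
    given fertility beta(a) and mortality mu(a) on [0, A]. The residual is
    strictly decreasing in zeta, so bisection over a sign-changing bracket
    converges."""
    a = np.linspace(0.0, A, n)
    m = mu(a)
    # cumulative mortality M(a) = int_0^a mu(s) ds via trapezoid rule
    M = np.concatenate([[0.0], np.cumsum(0.5 * (m[1:] + m[:-1]) * np.diff(a))])
    resid = lambda z: _trapz(beta(a) * np.exp(-z * a - M), a) - 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if resid(mid) > 0.0:    # integral still too large: zeta must increase
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For constant rates the condition has a closed form, which gives a quick sanity check on the solver.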
【27】Regime-Calibrated Demand Priors for Ride-Hailing Fleet Dispatch and Repositioning
标题:用于网约车车队调度与重新定位的状态校准需求先验
链接:https://arxiv.org/abs/2604.03883
作者:Indar Kumar,Akanksha Tiwari
备注:10 pages, 10 figures, 8 tables. Code: https://github.com/IndarKarhana/regime-calibrated-dispatch
摘要:Effective ride-hailing dispatch requires anticipating demand patterns that vary substantially across time-of-day, day-of-week, season, and special events. We propose a regime-calibrated approach that (i) segments historical trip data into demand regimes, (ii) matches the current operating period to the most similar historical analogues via a similarity ensemble combining Kolmogorov-Smirnov distance, Wasserstein-1 distance, feature distance, variance ratio, event pattern similarity, and temporal proximity, and (iii) uses the resulting calibrated demand prior to drive both an LP-based fleet repositioning policy and batch dispatch with Hungarian matching. In ablation, a distributional-only metric subset achieves the strongest mean-wait reduction, while the full ensemble is retained as a robustness-oriented default that preserves calendar and event context. Evaluated on 5.2 million NYC TLC trips across 8 diverse scenarios (winter/summer, weekday/weekend/holiday, morning/evening/night) with 5 random seeds each, our method reduces mean rider wait times by 31.1% (bootstrap 95% CI: [26.5, 36.6]; Friedman chi-squared = 80.0, p = 4.25e-18; Cohen's d = 7.5-29.9). P95 wait drops 37.6% and the Gini coefficient of wait times improves from 0.441 to 0.409. The two contributions compose multiplicatively: calibration provides 16.9% reduction relative to the replay baseline; LP repositioning adds a further 15.5%. The approach requires no training, is deterministic and explainable, generalizes to Chicago (23.3% wait reduction using the NYC-built regime library without retraining), and is robust across fleet sizes (32-47% improvement for 0.5x-2.0x fleet scaling). Code is available at https://github.com/IndarKarhana/regime-calibrated-dispatch.
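Two of the building blocks above, distributional regime matching and batch assignment, can be sketched compactly. The toy below implements only the KS and Wasserstein-1 components of the six-part similarity ensemble, and uses brute-force matching in place of a Hungarian solver (e.g. `scipy.optimize.linear_sum_assignment` would be the scalable drop-in); all names here are illustrative:

```python
import numpy as np
from itertools import permutations

def ks_stat(x, y):
    # two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs
    x, y = np.sort(x), np.sort(y)
    pooled = np.concatenate([x, y])
    Fx = np.searchsorted(x, pooled, side="right") / len(x)
    Fy = np.searchsorted(y, pooled, side="right") / len(y)
    return float(np.max(np.abs(Fx - Fy)))

def w1_stat(x, y):
    # Wasserstein-1 for equal-length samples: mean |sorted(x) - sorted(y)|
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def match_regime(current, regimes, w_ks=0.5, w_w1=0.5):
    """Pick the historical regime whose demand samples are most similar to
    the current period under a KS + Wasserstein-1 ensemble (lower = closer)."""
    scores = {name: w_ks * ks_stat(current, h) + w_w1 * w1_stat(current, h)
              for name, h in regimes.items()}
    return min(scores, key=scores.get), scores

def batch_dispatch(cost):
    """Exhaustive min-cost driver-to-request matching for tiny batches."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(enumerate(best))
```

The matched regime's demand distribution then serves as the calibrated prior driving repositioning and dispatch.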
【28】A Bayesian Information-Theoretic Approach to Data Attribution
标题:数据归因的Bayesian信息论方法
链接:https://arxiv.org/abs/2604.03858
作者:Dharmesh Tailor,Nicolò Felicioni,Kamil Ciosek
备注:Accepted at the 29th International Conference on Artificial Intelligence and Statistics (AISTATS 2026)
摘要:Training Data Attribution (TDA) seeks to trace model predictions back to influential training examples, enhancing interpretability and safety. We formulate TDA as a Bayesian information-theoretic problem: subsets are scored by the information loss they induce - the entropy increase at a query when removed. This criterion credits examples for resolving predictive uncertainty rather than label noise. To scale to modern networks, we approximate information loss using a Gaussian Process surrogate built from tangent features. We show this aligns with classical influence scores for single-example attribution while promoting diversity for subsets. For even larger-scale retrieval, we relax to an information-gain objective and add a variance correction for scalable attribution in vector databases. Experiments show competitive performance on counterfactual sensitivity, ground-truth retrieval and coreset selection, showing that our method scales to modern architectures while bridging principled measures with practice.
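The "entropy increase when removed" criterion can be made concrete with a Bayesian linear surrogate on tangent features, where the predictive distribution is Gaussian and the entropy change reduces to a log-ratio of predictive variances. This is a hypothetical minimal sketch (names `info_loss`, `alpha`, `noise` are assumptions), not the paper's GP construction:

```python
import numpy as np

def info_loss(Phi, phi_q, i, alpha=1.0, noise=1.0):
    """Entropy increase at a query when training example i is removed, under
    a Bayesian linear model on feature rows Phi with prior precision alpha
    and observation noise: 0.5 * log(var_without / var_with)."""
    def pred_var(P):
        d = P.shape[1]
        S = np.linalg.inv(alpha * np.eye(d) + P.T @ P / noise)  # posterior cov
        return noise + phi_q @ S @ phi_q                        # predictive var
    keep = [j for j in range(len(Phi)) if j != i]
    return 0.5 * np.log(pred_var(Phi[keep]) / pred_var(Phi))
```

An example whose features align with the query resolves predictive uncertainty there, so removing it raises entropy; an orthogonal example scores zero.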
【29】Rényi Attention Entropy for Patch Pruning
标题:用于图像块剪枝的Rényi注意力熵
链接:https://arxiv.org/abs/2604.03803
作者:Hiroaki Aizawa,Yuki Igaue
备注:Accepted to ICPR2026
摘要:Transformers are strong baselines in both vision and language because self-attention captures long-range dependencies across tokens. However, the cost of self-attention grows quadratically with the number of tokens. Patch pruning mitigates this cost by estimating per-patch importance and removing redundant patches. To identify informative patches for pruning, we introduce a criterion based on the Shannon entropy of the attention distribution. Low-entropy patches, which receive selective and concentrated attention, are kept as important, while high-entropy patches with attention spread across many locations are treated as redundant. We also extend the criterion from Shannon to Rényi entropy, which emphasizes sharp attention peaks and supports pruning strategies that adapt to task needs and computational limits. In experiments on fine-grained image recognition, where patch selection is critical, our method reduced computation while preserving accuracy. Moreover, adjusting the pruning policy through the Rényi entropy measure yields further gains and improves the trade-off between accuracy and computation.
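The entropy criterion above is a few lines of NumPy. The sketch below (names `renyi_entropy`, `prune_patches`, `keep_ratio` are illustrative) scores each patch's attention row by Rényi entropy and keeps the lowest-entropy patches:

```python
import numpy as np

def renyi_entropy(p, alpha=2.0, eps=1e-12):
    """Rényi entropy of attention distributions p (..., n); alpha -> 1
    recovers Shannon entropy. Larger alpha emphasizes sharp peaks."""
    p = p / p.sum(-1, keepdims=True)
    if abs(alpha - 1.0) < 1e-6:
        return -(p * np.log(p + eps)).sum(-1)              # Shannon limit
    return np.log((p ** alpha).sum(-1) + eps) / (1.0 - alpha)

def prune_patches(attn, keep_ratio=0.5, alpha=2.0):
    """Keep the lowest-entropy (most selectively attended) patches.
    attn: (num_patches, num_patches) row-stochastic attention map."""
    H = renyi_entropy(attn, alpha)                         # per-patch entropy
    k = max(1, int(keep_ratio * len(H)))
    return np.sort(np.argsort(H)[:k])                      # kept patch indices
```

A uniform attention row attains the maximum entropy log(n), so diffusely attended patches are pruned first.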
【30】Automated Conjecture Resolution with Formal Verification
标题:结合形式验证的自动化猜想求解
链接:https://arxiv.org/abs/2604.03789
作者:Haocheng Ju,Guoxiong Gao,Jiedong Jiang,Bin Wu,Zeming Sun,Leheng Chen,Yutong Wang,Yuefeng Wang,Zichen Wang,Wanyi He,Peihao Wu,Liang Xiao,Ruochuan Liu,Bryan Dai,Bin Dong
备注:Code and resources are available at: Rethlas (https://github.com/frenzymath/Rethlas), Archon (https://github.com/frenzymath/Archon), and the formalization results (https://github.com/frenzymath/Anderson-Conjecture)
摘要:Recent advances in large language models have significantly improved their ability to perform mathematical reasoning, extending from elementary problem solving to increasingly capable performance on research-level problems. However, reliably solving and verifying such problems remains challenging due to the inherent ambiguity of natural language reasoning. In this paper, we propose an automated framework for tackling research-level mathematical problems that integrates natural language reasoning with formal verification, enabling end-to-end problem solving with minimal human intervention. Our framework consists of two components: an informal reasoning agent, Rethlas, and a formal verification agent, Archon. Rethlas mimics the workflow of human mathematicians by combining reasoning primitives with our theorem search engine, Matlas, to explore solution strategies and construct candidate proofs. Archon, equipped with our formal theorem search engine LeanSearch, translates informal arguments into formalized Lean 4 projects through structured task decomposition, iterative refinement, and automated proof synthesis, ensuring machine-checkable correctness. Using this framework, we automatically resolve an open problem in commutative algebra and formally verify the resulting proof in Lean 4 with essentially no human involvement. Our experiments demonstrate that strong theorem retrieval tools enable the discovery and application of cross-domain mathematical techniques, while the formal agent is capable of autonomously filling nontrivial gaps in informal arguments. More broadly, our work illustrates a promising paradigm for mathematical research in which informal and formal reasoning systems, equipped with theorem retrieval tools, operate in tandem to produce verifiable results, substantially reduce human effort, and offer a concrete instantiation of human-AI collaborative mathematical research.
【31】RL-Driven Sustainable Land-Use Allocation for the Lake Malawi Basin
标题:RL驱动的马拉维湖盆地可持续土地利用分配
链接:https://arxiv.org/abs/2604.03768
作者:Ying Yao
备注:7 pages, 5 figures
摘要:Unsustainable land-use practices in ecologically sensitive regions threaten biodiversity, water resources, and the livelihoods of millions. This paper presents a deep reinforcement learning (RL) framework for optimizing land-use allocation in the Lake Malawi Basin to maximize total ecosystem service value (ESV). Drawing on the benefit transfer methodology of Costanza et al., we assign biome-specific ESV coefficients -- locally anchored to a Malawi wetland valuation -- to nine land-cover classes derived from Sentinel-2 imagery. The RL environment models a 50x50 cell grid at 500m resolution, where a Proximal Policy Optimization (PPO) agent with action masking iteratively transfers land-use pixels between modifiable classes. The reward function combines per-cell ecological value with spatial coherence objectives: contiguity bonuses for ecologically connected land-use patches (forest, cropland, built area etc.) and buffer zone penalties for high-impact development adjacent to water bodies. We evaluate the framework across three scenarios: (i) pure ESV maximization, (ii) ESV with spatial reward shaping, and (iii) a regenerative agriculture policy scenario. Results demonstrate that the agent effectively learns to increase total ESV; that spatial reward shaping successfully steers allocations toward ecologically sound patterns, including homogeneous land-use clustering and slight forest consolidation near water bodies; and that the framework responds meaningfully to policy parameter changes, establishing its utility as a scenario-analysis tool for environmental planning.
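The shaped reward described above (per-cell ESV plus contiguity bonuses minus buffer-zone penalties) can be sketched on a tiny grid. This is a hypothetical toy with made-up class codes and coefficients, not the paper's 50x50 environment:

```python
import numpy as np

def landuse_reward(grid, esv, water=0, built=1, contig_bonus=0.1, buffer_pen=1.0):
    """Toy shaped reward for a land-use class map: total ecosystem-service
    value, plus a bonus per same-class 4-neighbour pair (contiguity), minus
    a penalty per built cell adjacent to water (buffer-zone violation).
    grid: (H, W) int class map; esv: dict class -> value per cell."""
    H, W = grid.shape
    total = sum(esv[int(c)] for c in grid.ravel())
    same = water_built = 0
    for dy, dx in ((0, 1), (1, 0)):            # each neighbour pair counted once
        a, b = grid[:H - dy, :W - dx], grid[dy:, dx:]
        same += int((a == b).sum())
        water_built += int((((a == water) & (b == built)) |
                            ((a == built) & (b == water))).sum())
    return total + contig_bonus * same - buffer_pen * water_built
```

A PPO agent maximizing this reward is pushed toward homogeneous clusters away from water, mirroring the spatial-shaping behavior reported above.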
【32】Stochastic Generative Plug-and-Play Priors
标题:随机生成即插即用先验
链接:https://arxiv.org/abs/2604.03603
作者:Chicago Y. Park,Edward P. Chandler,Yuyang Hu,Michael T. McCann,Cristina Garcia-Cardona,Brendt Wohlberg,Ulugbek S. Kamilov
摘要:Plug-and-play (PnP) methods are widely used for solving imaging inverse problems by incorporating a denoiser into optimization algorithms. Score-based diffusion models (SBDMs) have recently demonstrated strong generative performance through a denoiser trained across a wide range of noise levels. Despite their shared reliance on denoisers, it remains unclear how to systematically use SBDMs as priors within the PnP framework without relying on reverse diffusion sampling. In this paper, we establish a score-based interpretation of PnP that justifies using pretrained SBDMs directly within PnP algorithms. Building on this connection, we introduce a stochastic generative PnP (SGPnP) framework that injects noise to better leverage the expressive generative SBDM priors, thereby improving robustness in severely ill-posed inverse problems. We provide a new theory showing that this noise injection induces optimization on a Gaussian-smoothed objective and promotes escape from strict saddle points. Experiments on challenging inverse tasks, such as multi-coil MRI reconstruction and large-mask natural image inpainting, demonstrate consistent improvement over conventional PnP methods and achieve performance competitive with diffusion-based solvers.
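The core mechanism, injecting decaying noise into a plug-and-play iteration, can be sketched generically. The loop below (names `sgpnp`, `sigma`, `step` are assumptions) alternates a data-fidelity gradient step, a plug-in denoiser, and additive Gaussian noise; the paper's actual algorithm, schedule, and denoiser are more elaborate:

```python
import numpy as np

def sgpnp(y, A, At, denoise, sigma, steps=60, step=1.0, seed=0):
    """Noise-injected plug-and-play sketch: gradient step on ||A(x) - y||^2,
    denoiser as the prior, then Gaussian noise whose scale decays linearly
    to zero over the run (the stochastic part)."""
    rng = np.random.default_rng(seed)
    x = At(y)                                   # crude initialization
    for t in range(steps):
        x = x - step * At(A(x) - y)             # data-fidelity gradient step
        x = denoise(x)                          # prior via plug-in denoiser
        x = x + sigma * (1.0 - t / steps) * rng.standard_normal(np.shape(x))
    return x
```

With `sigma=0` this reduces to a standard PnP gradient scheme; the injected noise is what smooths the objective and helps escape strict saddle points.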
【33】Simple yet Effective: Low-Rank Spatial Attention for Neural Operators
标题:简单而有效:神经算子的低秩空间注意力
链接:https://arxiv.org/abs/2604.03582
作者:Zherui Yang,Haiyang Xin,Tao Du,Ligang Liu
摘要:Neural operators have emerged as data-driven surrogates for solving partial differential equations (PDEs), and their success hinges on efficiently modeling the long-range, global coupling among spatial points induced by the underlying physics. In many PDE regimes, the induced global interaction kernels are empirically compressible, exhibiting rapid spectral decay that admits low-rank approximations. We leverage this observation to unify representative global mixing modules in neural operators under a shared low-rank template: compressing high-dimensional pointwise features into a compact latent space, processing global interactions within it, and reconstructing the global context back to spatial points. Guided by this view, we introduce Low-Rank Spatial Attention (LRSA) as a clean and direct instantiation of this template. Crucially, unlike prior approaches that often rely on non-standard aggregation or normalization modules, LRSA is built purely from standard Transformer primitives, i.e., attention, normalization, and feed-forward networks, yielding a concise block that is straightforward to implement and directly compatible with hardware-optimized kernels. In our experiments, such a simple construction is sufficient to achieve high accuracy, yielding an average error reduction of over 17\% relative to second-best methods, while remaining stable and efficient in mixed-precision training.
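The shared low-rank template (compress to a compact latent space, mix globally, reconstruct) can be instantiated in a few lines. The Perceiver-style sketch below uses learned latent queries and plain softmax attention; it illustrates the template at O(N·r·d) cost rather than O(N²·d), and is an assumption for exposition, not the exact LRSA block:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def low_rank_attention(X, latent_queries):
    """X: (N, d) pointwise features; latent_queries: (r, d) with r << N.
    1) compress N points into r latent tokens, 2) mix globally among the
    latents, 3) reconstruct per-point global context."""
    d = X.shape[1]
    Z = softmax(latent_queries @ X.T / np.sqrt(d)) @ X   # compress: (r, d)
    Z = softmax(Z @ Z.T / np.sqrt(d)) @ Z                # global mixing: (r, d)
    return softmax(X @ Z.T / np.sqrt(d)) @ Z             # reconstruct: (N, d)
```

Because every stage is built from standard attention primitives, the block maps directly onto hardware-optimized kernels, which is the design point the abstract emphasizes.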
【34】LangFIR: Discovering Sparse Language-Specific Features from Monolingual Data for Language Steering
标题:LangFIR:从单语数据中发现稀疏语言特定特征以进行语言引导
链接:https://arxiv.org/abs/2604.03532
作者:Sing Hieng Wong,Hassan Sajjad,A. B. Siddique
备注:Submitted to COLM 2026
摘要:Large language models (LLMs) show strong multilingual capabilities, yet reliably controlling the language of their outputs remains difficult. Representation-level steering addresses this by adding language-specific vectors to model activations at inference time, but identifying language-specific directions in the residual stream often relies on multilingual or parallel data that can be expensive to obtain. Sparse autoencoders (SAEs) decompose residual activations into interpretable, sparse feature directions and offer a natural basis for this search, yet existing SAE-based approaches face the same data constraint. We introduce LangFIR (Language Feature Identification via Random-token Filtering), a method that discovers language-specific SAE features using only a small amount of monolingual data and random-token sequences. Many SAE features consistently activated by target-language inputs do not encode language identity. Random-token sequences surface these language-agnostic features, allowing LangFIR to filter them out and isolate a sparse set of language-specific features. We show that these features are extremely sparse, highly selective for their target language, and causally important: directional ablation increases cross-entropy loss only for the corresponding language. Using these features to construct steering vectors for multilingual generation control, LangFIR achieves the best average accuracy and BLEU across three models (Gemma 3 1B, Gemma 3 4B, and Llama 3.1 8B), three datasets, and twelve target languages, outperforming the strongest monolingual baseline and surpassing methods that rely on parallel data. Our results suggest that language identity in multilingual LLMs is localized in a sparse set of feature directions discoverable with monolingual data. Code is available at https://anonymous.4open.science/r/LangFIR-C0F5/.
【35】Optimizing Neurorobot Policy under Limited Demonstration Data through Preference Regret
标题:通过偏好遗憾优化有限演示数据下的神经机器人策略
链接:https://arxiv.org/abs/2604.03523
作者:Viet Dung Nguyen,Yuhang Song,Anh Nguyen,Jamison Heard,Reynold Bailey,Alexander Ororbia
备注:10 pages, 4 figures, 4 tables
摘要:Robot reinforcement learning from demonstrations (RLfD) assumes that expert data is abundant; this is usually unrealistic in the real world given data scarcity as well as high collection cost. Furthermore, imitation learning algorithms assume that the data is independently and identically distributed, which ultimately results in poorer performance as gradual errors emerge and compound within test-time trajectories. We address these issues by introducing the "master your own expertise" (MYOE) framework, a self-imitation framework that enables robotic agents to learn complex behaviors from limited demonstration data samples. Inspired by human perception and action, we propose and design what we call the queryable mixture-of-preferences state space model (QMoP-SSM), which estimates the desired goal at every time step. These desired goals are used in computing the "preference regret", which is used to optimize the robot control policy. Our experiments demonstrate the robustness, adaptability, and out-of-sample performance of our agent compared to other state-of-the-art RLfD schemes. The GitHub repository that supports this work can be found at: https://github.com/rxng8/neurorobot-preference-regret-learning.
【36】VisionClaw: Always-On AI Agents through Smart Glasses
标题:VisionClaw:通过智能眼镜永远在线的人工智能代理
链接:https://arxiv.org/abs/2604.03486
作者:Xiaoan Liu,DaeHo Lee,Eric J Gonzalez,Mar Gonzalez-Franco,Ryo Suzuki
备注:Submitted to UIST 2026. 10 pages, 11 figures, plus appendix
摘要:We present VisionClaw, an always-on wearable AI agent that integrates live egocentric perception with agentic task execution. Running on Meta Ray-Ban smart glasses, VisionClaw continuously perceives real-world context and enables in-situ, speech-driven action initiation and delegation via OpenClaw AI agents. Therefore, users can directly execute tasks through the smart glasses, such as adding real-world objects to an Amazon cart, generating notes from physical documents, receiving meeting briefings on the go, creating events from posters, or controlling IoT devices. We evaluate VisionClaw through a controlled laboratory study (N=12) and a longitudinal deployment study (N=5). Results show that integrating perception and execution enables faster task completion and reduces interaction overhead compared to non-always-on and non-agent baselines. Beyond performance gains, deployment findings reveal a shift in interaction: tasks are initiated opportunistically during ongoing activities, and execution is increasingly delegated rather than manually controlled. These results suggest a new paradigm for wearable AI agents, where perception and action are continuously coupled to support situated, hands-free interaction.
【37】Investigating Data Interventions for Subgroup Fairness: An ICU Case Study
标题:调查数据干预以实现亚组公平性:ICU案例研究
链接:https://arxiv.org/abs/2604.03478
作者:Erin Tan,Judy Hanwen Shen,Irene Y. Chen
摘要:In high-stakes settings where machine learning models are used to automate decision-making about individuals, the presence of algorithmic bias can exacerbate systemic harm to certain subgroups of people. These biases often stem from the underlying training data. In practice, interventions to "fix the data" depend on the actual additional data sources available -- where many are less than ideal. In these cases, the effects of data scaling on subgroup performance become volatile, as the improvements from increased sample size are counteracted by the introduction of distribution shifts in the training set. In this paper, we investigate the limitations of combining data sources to improve subgroup performance within the context of healthcare. Clinical models are commonly trained on datasets comprised of patient electronic health record (EHR) data from different hospitals or admission departments. Across two such datasets, the eICU Collaborative Research Database and the MIMIC-IV dataset, we find that data addition can both help and hurt model fairness and performance, and many intuitive strategies for data selection are unreliable. We compare model-based post-hoc calibration and data-centric addition strategies to find that the combination of both is important to improve subgroup performance. Our work questions the traditional dogma of "better data" for overcoming fairness challenges by comparing and combining data- and model-based approaches.
【38】Earth Embeddings Reveal Diverse Urban Signals from Space
标题:地球嵌入揭示来自太空的多样化城市信号
链接:https://arxiv.org/abs/2604.03456
作者:Wenjing Gong,Udbhav Srivastava,Yuchen Wang,Yuhao Jia,Qifan Wu,Weishan Bai,Yifan Yang,Xiao Huang,Xinyue Ye
备注:30 pages, 18 figures
摘要:Conventional urban indicators derived from censuses, surveys, and administrative records are often costly, spatially inconsistent, and slow to update. Recent geospatial foundation models enable Earth embeddings, compact satellite image representations transferable across downstream tasks, but their utility for neighborhood-scale urban monitoring remains unclear. Here, we benchmark three Earth embedding families, AlphaEarth, Prithvi, and Clay, for urban signal prediction across six U.S. metropolitan areas from 2020 to 2023. Using a unified supervised-learning framework, we predict 14 neighborhood-level indicators spanning crime, income, health, and travel behavior, and evaluate performance under four settings: global, city-wise, year-wise, and city-year. Results show that Earth embeddings capture substantial urban variation, with the highest predictive skill for outcomes more directly tied to built-environment structure, including chronic health burdens and dominant commuting modes. By contrast, indicators shaped more strongly by fine-scale behavior and local policy, such as cycling, remain difficult to infer. Predictive performance varies markedly across cities but remains comparatively stable across years, indicating strong spatial heterogeneity alongside temporal robustness. Exploratory analysis suggests that cross-city variation in predictive performance is associated with urban form in task-specific ways. Controlled dimensionality experiments show that representation efficiency is critical: compact 64-dimensional AlphaEarth embeddings remain more informative than 64-dimensional reductions of Prithvi and Clay. This study establishes a benchmark for evaluating Earth embeddings in urban remote sensing and demonstrates their potential as scalable, low-cost features for SDG-aligned neighborhood-scale urban monitoring.
【39】Olmo Hybrid: From Theory to Practice and Back
标题:Olmo Hybrid:从理论到实践,再由实践回到理论
链接:https://arxiv.org/abs/2604.03444
作者:William Merrill,Yanhong Li,Tyler Romero,Anej Svete,Caia Costello,Pradeep Dasigi,Dirk Groeneveld,David Heineman,Bailey Kuehl,Nathan Lambert,Jacob Morrison,Luca Soldaini,Finbarr Timbers,Pete Walsh,Noah A. Smith,Hannaneh Hajishirzi,Ashish Sabharwal
摘要:Recent work has demonstrated the potential of non-transformer language models, especially linear recurrent neural networks (RNNs) and hybrid models that mix recurrence and attention. Yet there is no consensus on whether the potential benefits of these new architectures justify the risk and effort of scaling them up. To address this, we provide evidence for the advantages of hybrid models over pure transformers on several fronts. First, theoretically, we show that hybrid models do not merely inherit the expressivity of transformers and linear RNNs, but can express tasks beyond both, such as code execution. Putting this theory to practice, we train Olmo Hybrid, a 7B-parameter model largely comparable to Olmo 3 7B but with the sliding window layers replaced by Gated DeltaNet layers. We show that Olmo Hybrid outperforms Olmo 3 across standard pretraining and mid-training evaluations, demonstrating the benefit of hybrid models in a controlled, large-scale setting. We find that the hybrid model scales significantly more efficiently than the transformer, explaining its higher performance. However, it is unclear why greater expressivity on specific formal problems should result in better scaling or superior performance on downstream tasks unrelated to those problems. To explain this apparent gap, we return to theory and argue why increased expressivity should translate to better scaling efficiency, completing the loop. Overall, our results suggest that hybrid models mixing attention and recurrent layers are a powerful extension to the language modeling paradigm: not merely to reduce memory during inference, but as a fundamental way to obtain more expressive models that scale better during pretraining.
【40】Diffusion Policy with Bayesian Expert Selection for Active Multi-Target Tracking
标题:基于贝叶斯专家选择的多目标主动跟踪扩散策略
链接:https://arxiv.org/abs/2604.03404
作者:Haotian Xiang,Qin Lu,Yaakov Bar-Shalom
摘要:Active multi-target tracking requires a mobile robot to balance exploration for undetected targets with exploitation of uncertain tracked ones. Diffusion policies have emerged as a powerful approach for capturing diverse behavioral strategies by learning action sequences from expert demonstrations. However, existing methods implicitly select among strategies through the denoising process, without uncertainty quantification over which strategy to execute. We formulate expert selection for diffusion policies as an offline contextual bandit problem and propose a Bayesian framework for pessimistic, uncertainty-aware strategy selection. A multi-head Variational Bayesian Last Layer (VBLL) model predicts the expected tracking performance of each expert strategy given the current belief state, providing both a point estimate and predictive uncertainty. Following the pessimism principle for offline decision-making, a Lower Confidence Bound (LCB) criterion then selects the expert whose worst-case predicted performance is best, avoiding overcommitment to experts with unreliable predictions. The selected expert conditions a diffusion policy to generate corresponding action sequences. Experiments on simulated indoor tracking scenarios demonstrate that our approach outperforms both the base diffusion policy and standard gating methods, including Mixture-of-Experts selection and deterministic regression baselines.
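The LCB selection rule itself is a one-liner once each expert head provides a mean and a predictive standard deviation. A minimal sketch (names `select_expert_lcb` and `beta` are illustrative; the VBLL heads that produce the inputs are the paper's contribution):

```python
import numpy as np

def select_expert_lcb(means, stds, beta=1.0):
    """Pessimistic (offline-bandit) expert selection: score each expert by
    its lower confidence bound, predicted performance minus beta times the
    predictive std, and execute the expert whose worst case is best."""
    lcb = np.asarray(means, float) - beta * np.asarray(stds, float)
    return int(np.argmax(lcb)), lcb
```

Note the effect of pessimism: an expert with the highest point estimate can lose to a slightly worse but far more certain one.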
【41】Banana100: Breaking NR-IQA Metrics by 100 Iterative Image Replications with Nano Banana Pro
标题:Banana100:使用Nano Banana Pro进行100次迭代图像复制以攻破NR-IQA指标
链接:https://arxiv.org/abs/2604.03400
作者:Kenan Tang,Praveen Arunshankar,Andong Hua,Anthony Yang,Yao Qin
备注:Accepted to CVPR 2026 Workshop on Agentic AI for Visual Media
摘要:The multi-step, iterative image editing capabilities of multi-modal agentic systems have transformed digital content creation. Although latest image editing models faithfully follow instructions and generate high-quality images in single-turn edits, we identify a critical weakness in multi-turn editing, which is the iterative degradation of image quality. As images are repeatedly edited, minor artifacts accumulate, rapidly leading to a severe accumulation of visible noise and a failure to follow simple editing instructions. To systematically study these failures, we introduce Banana100, a comprehensive dataset of 28,000 degraded images generated through 100 iterative editing steps, including diverse textures and image content. Alarmingly, image quality evaluators fail to detect the degradation. Among 21 popular no-reference image quality assessment (NR-IQA) metrics, none of them consistently assign lower scores to heavily degraded images than to clean ones. The dual failures of generators and evaluators may threaten the stability of future model training and the safety of deployed agentic systems, if the low-quality synthetic data generated by multi-turn edits escape quality filters. We release the full code and data to facilitate the development of more robust models, helping to mitigate the fragility of multi-modal agentic systems.
【42】Computer Architecture's AlphaZero Moment: Automated Discovery in an Encircled World
标题:计算机架构的AlphaZero时刻:包围世界中的自动发现
链接:https://arxiv.org/abs/2604.03312
作者:Karthikeyan Sankaralingam
摘要:The end of Moore's Law and Dennard scaling has fundamentally changed the economics of computer architecture. With transistor scaling delivering diminishing returns, architectural innovation is now the primary - and perhaps only - remaining lever for performance improvement. However, we argue that human-driven architecture research is fundamentally ill-suited for this new era. The architectural design space is vast (effectively infinite for practical purposes), yet human teams explore perhaps 50-100 designs per generation, sampling less than 0.001% of possibilities. This approach worked during the abundance era when Moore's Law provided a rising tide that lifted all designs. In the current scarcity paradigm, where every architecture must deliver 2X performance improvements using essentially the same transistor budget, systematic exploration becomes critical. We propose a concrete alternative: automated idea factories that generate and evaluate thousands of candidate architectures weekly through multi-tiered evaluation pipelines, learning from deployed telemetry data in a continuous feedback loop. Early results suggest that such systems can compress architectural design cycles from double-digit months to single-digit weeks by exploring orders of magnitude more candidates than any human team, and do it much faster. We predict that within 2 years, purely human-driven architecture research will be as obsolete as human chess players competing against engines.
【43】Emergent Compositional Communication for Latent World Properties
标题:面向潜在世界属性的涌现组合式通信
链接:https://arxiv.org/abs/2604.03266
作者:Tomek Kaszyński
备注:24 pages, 4 figures, 12 tables. Code: https://github.com/TomekKaszynski/emergent-physics-comm
摘要:Can multi-agent communication pressure extract discrete, compositional representations of invisible physical properties from frozen video features? We show that agents communicating through a Gumbel-Softmax bottleneck with iterated learning develop positionally disentangled protocols for latent properties (elasticity, friction, mass ratio) without property labels or supervision on message structure. With 4 agents, 100% of 80 seeds converge to near-perfect compositionality (PosDis=0.999, holdout 98.3%). Controls confirm multi-agent structure -- not bandwidth or temporal coverage -- drives this effect. Causal intervention shows surgical property disruption (~15% drop on targeted property, <3% on others). A controlled backbone comparison reveals that the perceptual prior determines what is communicable: DINOv2 dominates on spatially-visible ramp physics (98.3% vs 95.1%), while V-JEPA 2 dominates on dynamics-only collision physics (87.4% vs 77.7%, d=2.74). Scale-matched (d=3.37) and frame-matched (d=6.53) controls attribute this gap entirely to video-native pretraining. The frozen protocol supports action-conditioned planning (91.5%) with counterfactual velocity reasoning (r=0.780). Validation on Physics 101 real camera footage confirms 85.6% mass-comparison accuracy on unseen objects, temporal dynamics contributing +11.2% beyond static appearance, agent-scaling compositionality replicating at 90% for 4 agents, and causal intervention extending to real video (d=1.87, p=0.022).
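The discrete communication channel described here rests on the Gumbel-Softmax trick: adding Gumbel noise to logits and applying a temperature-scaled softmax gives a differentiable relaxation of categorical sampling. A minimal NumPy sketch (illustrative only, not the authors' code; logits are hypothetical):

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Differentiable relaxation of categorical sampling (Gumbel-Softmax)."""
    if rng is None:
        rng = np.random.default_rng(0)
    g = -np.log(-np.log(rng.uniform(1e-12, 1.0, size=logits.shape)))
    y = (logits + g) / tau
    y = y - y.max()                      # numerical stability
    e = np.exp(y)
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])
soft = gumbel_softmax(logits, tau=1.0)   # a relaxed sample; sums to 1

rng = np.random.default_rng(1)
# As tau -> 0, samples approach one-hot vectors, i.e. discrete message tokens:
hard = [gumbel_softmax(logits, tau=0.01, rng=rng) for _ in range(100)]
peakedness = np.mean([h.max() for h in hard])
```

At training time the relaxed sample passes gradients through the bottleneck; at low temperature the messages behave like the discrete tokens the agents exchange.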
【44】Scaling DPPs for RAG: Density Meets Diversity
标题:为RAG扩展DPP:密度与多样性并重
链接:https://arxiv.org/abs/2604.03240
作者:Xun Sun,Baiheng Xie,Li Huang,Qiang Gao
摘要:Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding generation in external knowledge, yielding relevant responses that are aligned with factual evidence and evolving corpora. Standard RAG pipelines construct context through relevance ranking, performing point-wise scoring between the user query and each corpus chunk. This formulation, however, ignores interactions among retrieved candidates, leading to redundant contexts that dilute density and fail to surface complementary evidence. We argue that effective retrieval should optimize jointly for both density and diversity, ensuring grounding evidence that is dense in information yet diverse in coverage. In this study, we propose ScalDPP, a diversity-aware retrieval mechanism for RAG that incorporates Determinantal Point Processes (DPPs) through a lightweight P-Adapter, enabling scalable modeling of inter-chunk dependencies and complementary context selection. In addition, we develop a novel set-level objective, Diverse Margin Loss (DML), which requires ground-truth complementary evidence chains to dominate any equally sized redundant alternatives under DPP geometry. Experimental results demonstrate the superiority of ScalDPP, substantiating our core statement in practice.
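The DPP machinery behind ScalDPP trades pure relevance ranking for joint density-diversity selection. A common way to make DPP selection tractable is greedy MAP inference over the kernel's log-determinant; the toy NumPy sketch below (not the ScalDPP adapter itself, and with made-up embeddings) shows how a near-duplicate chunk is skipped in favor of a complementary one.

```python
import numpy as np

def greedy_dpp(K, k):
    """Greedy MAP for a DPP: add the item maximizing log det of the
    selected kernel submatrix (balances quality and diversity)."""
    selected = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for i in range(K.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(K[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_val:
                best, best_val = i, logdet
        selected.append(best)
    return selected

# Three chunks: 0 and 1 are near-duplicates, 2 is complementary.
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # unit embeddings
rel = np.array([1.0, 0.99, 0.8])                       # relevance scores
K = np.outer(rel, rel) * (emb @ emb.T)                 # quality x similarity kernel
picked = greedy_dpp(K, 2)   # relevance alone would pick the duplicate pair
```

Point-wise relevance ranking would return chunks 0 and 1; the determinant penalizes the duplicate and selects the complementary chunk 2 instead.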
【45】Integrating Artificial Intelligence, Physics, and Internet of Things: A Framework for Cultural Heritage Conservation
标题:融合人工智能、物理学和物联网:文化遗产保护框架
链接:https://arxiv.org/abs/2604.03233
作者:Carmine Valentino,Federico Pichi,Francesco Colace,Dajana Conte,Gianluigi Rozza
摘要:The conservation of cultural heritage increasingly relies on integrating technological innovation with domain expertise to ensure effective monitoring and predictive maintenance. This paper presents a novel framework to support the preservation of cultural assets, combining Internet of Things (IoT) and Artificial Intelligence (AI) technologies, enhanced with the physical knowledge of phenomena. The framework is structured into four functional layers that permit the analysis of 3D models of cultural assets and elaborate simulations based on the knowledge acquired from data and physics. A central component of the proposed framework consists of Scientific Machine Learning, particularly Physics-Informed Neural Networks (PINNs), which incorporate physical laws into deep learning models. To enhance computational efficiency, the framework also integrates Reduced Order Methods (ROMs), specifically Proper Orthogonal Decomposition (POD), and is also compatible with classical Finite Element (FE) methods. Additionally, it includes tools to automatically manage and process 3D digital replicas, enabling their direct use in simulations. The proposed approach offers three main contributions: a methodology for processing 3D models of cultural assets for reliable simulation; the application of PINNs to combine data-driven and physics-based approaches in cultural heritage conservation; and the integration of PINNs with ROMs to efficiently model degradation processes influenced by environmental and material parameters. The reproducible and open-access experimental phase exploits simulated scenarios on complex and real-life geometries to test the efficacy of the proposed framework in each of its key components, allowing the possibility of dealing with both direct and inverse problems. Code availability: https://github.com/valc89/PhysicsInformedCulturalHeritage
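Of the framework's ingredients, the POD reduction step is the most self-contained: collect simulation snapshots, take an SVD, and keep the leading modes that capture almost all of the energy. A minimal NumPy sketch on synthetic low-rank snapshot data (the data and threshold here are illustrative, not the paper's experiments):

```python
import numpy as np

# Snapshot matrix: each column is a simulated field at one time/parameter value.
rng = np.random.default_rng(0)
modes_true = np.linalg.qr(rng.normal(size=(200, 3)))[0]   # 3 hidden modes
snapshots = modes_true @ rng.normal(size=(3, 50))         # 200 dofs x 50 snapshots

U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.9999)) + 1              # modes to retain
basis = U[:, :r]                                          # POD basis

reduced = basis.T @ snapshots                             # r-dimensional coords
reconstruction = basis @ reduced
err = np.linalg.norm(snapshots - reconstruction) / np.linalg.norm(snapshots)
```

The reduced coordinates are what a ROM (or a PINN coupled to one) evolves, at a fraction of the full-order cost.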
【46】Copilot-Assisted Second-Thought Framework for Brain-to-Robot Hand Motion Decoding
标题:用于脑到机器人手部运动解码的Copilot辅助再思考框架
链接:https://arxiv.org/abs/2603.27492
作者:Yizhe Li,Shixiao Wang,Jian K. Liu
摘要:Motor kinematics prediction (MKP) from electroencephalography (EEG) is an important research area for developing movement-related brain-computer interfaces (BCIs). While traditional methods often rely on convolutional neural networks (CNNs) or recurrent neural networks (RNNs), Transformer-based models have shown strong ability in modeling long sequential EEG data. In this study, we propose a CNN-attention hybrid model for decoding hand kinematics from EEG during grasp-and-lift tasks, achieving strong performance in within-subject experiments. We further extend this approach to EEG-EMG multimodal decoding, which yields substantially improved results. Within-subject tests achieve PCC values of 0.9854, 0.9946, and 0.9065 for the X, Y, and Z axes, respectively, computed on the midpoint trajectory between the thumb and index finger, while cross-subject tests result in 0.9643, 0.9795, and 0.5852. The decoded trajectories from both modalities are then used to control a Franka Panda robotic arm in a MuJoCo simulation. To enhance trajectory fidelity, we introduce a copilot framework that filters low-confidence decoded points using a motion-state-aware critic within a finite-state machine. This post-processing step improves the overall within-subject PCC of EEG-only decoding to 0.93 while excluding fewer than 20% of the data points.
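The PCC figures quoted above are plain Pearson correlation coefficients between decoded and ground-truth kinematic trajectories, computed per axis. A minimal NumPy version (the trajectory data below is synthetic, for illustration only):

```python
import numpy as np

def pcc(a, b):
    """Pearson correlation coefficient between two trajectories."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

t = np.linspace(0.0, 1.0, 100)
true_traj = np.sin(2 * np.pi * t)                       # one kinematic axis
decoded = true_traj + 0.05 * np.random.default_rng(0).normal(size=t.size)
score = pcc(true_traj, decoded)   # close to 1 for a faithful decode
```

The paper's copilot stage then excludes low-confidence decoded points before computing such scores, trading coverage (under 20% of points dropped) for fidelity.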
【47】Noisy Nonreciprocal Pairwise Comparisons: Scale Variation, Noise Calibration, and Admissible Ranking Regions
标题:含噪非互反成对比较:尺度变化、噪声校准与可容许排序区域
链接:https://arxiv.org/abs/2604.04588
作者:Jean-Pierre Magnot
摘要:Pairwise comparisons are widely used in decision analysis, preference modeling, and evaluation problems. In many practical situations, the observed comparison matrix is not reciprocal. This lack of reciprocity is often treated as a defect to be corrected immediately. In this article, we adopt a different point of view: part of the nonreciprocity may reflect a genuine variation in the evaluation scale, while another part is due to random perturbations. We introduce an additive model in which the unknown underlying comparison matrix is consistent but not necessarily reciprocal. The reciprocal component carries the global ranking information, whereas the symmetric component describes possible scale variation. Around this structured matrix, we add a random perturbation and show how to estimate the noise level, assess whether the scale variation remains moderate, and assign probabilities to admissible ranking regions in the sense of strict ranking by pairwise comparisons. We also compare this approach with the brutal projection onto reciprocal matrices, which suppresses all symmetric information at once. The Gaussian perturbation model is used here not because human decisions are exactly Gaussian, but because observed judgment errors often result from the accumulation of many small effects. In such a context, the central limit principle provides a natural heuristic justification for Gaussian noise. This makes it possible to derive explicit estimators and probability assessments while keeping the model interpretable for decision problems.
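The paper's additive split is, in the log domain, the decomposition of a matrix into antisymmetric and symmetric parts: the antisymmetric (reciprocal) component carries the ranking, the symmetric component the scale variation. A NumPy illustration with hypothetical log-scale comparison data:

```python
import numpy as np

# Observed log-scale pairwise comparisons; note the matrix is NOT reciprocal
# (A[i, j] != -A[j, i]), which the additive model treats as informative.
A = np.array([[ 0.0,  1.2,  2.1],
              [-0.8,  0.0,  1.0],
              [-2.0, -1.1,  0.0]])

R = (A - A.T) / 2.0   # antisymmetric (reciprocal) part: ranking information
S = (A + A.T) / 2.0   # symmetric part: scale variation / perturbation

# A simple row-mean score on R (one of several ways to extract a ranking):
scores = R.mean(axis=1)
ranking = np.argsort(-scores)
```

The "brutal projection" the abstract mentions is exactly keeping R and discarding S wholesale; the paper instead models S and calibrates the residual noise level.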
【48】Minimising Willmore Energy via Neural Flow
标题:通过神经流最小化Willmore能量
链接:https://arxiv.org/abs/2604.04321
作者:Edward Hirst,Henrique N. Sá Earp,Tomás S. R. Silva
备注:16+5 pages, 9 figures
摘要:The neural Willmore flow of a closed oriented $2$-surface in $\mathbb{R}^3$ is introduced as a natural evolution process to minimise the Willmore energy, which is the squared $L^2$-norm of mean curvature. Neural architectures are used to model maps from topological $2d$ domains to $3d$ Euclidean space, where the learning process minimises a PINN-style loss for the Willmore energy as a functional on the embedding. Training reproduces the expected round sphere for genus $0$ surfaces, and the Clifford torus for genus $1$ surfaces, respectively. Furthermore, the experiment in the genus $2$ case provides a novel approach to search for minimal Willmore surfaces in this open problem.
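The target functional is simple to evaluate in coordinates: for a round sphere of radius $R$, $H = 1/R$ and $dA = R^2\sin\theta\,d\theta\,d\phi$, so $\int H^2\,dA = 4\pi$ for every radius, the known minimum for genus $0$. A quick NumPy quadrature check of this value (a sanity check, not the paper's PINN loss):

```python
import numpy as np

def willmore_energy_sphere(R, n=1000):
    """Midpoint-rule quadrature of the integral of H^2 dA on a round sphere.
    H = 1/R and dA = R^2 sin(theta) dtheta dphi, so the value is 4*pi."""
    dth, dph = np.pi / n, 2.0 * np.pi / n
    th = (np.arange(n) + 0.5) * dth
    # The R factors cancel: H^2 * dA = sin(theta) dtheta dphi.
    return (1.0 / R**2) * np.sum(R**2 * np.sin(th)) * dth * n * dph

W1, W2 = willmore_energy_sphere(1.0), willmore_energy_sphere(2.0)
# Both approximate 4*pi: the Willmore energy is scale invariant.
```

This scale invariance is why training can converge to a round sphere of arbitrary size for genus $0$, and to the Clifford torus shape for genus $1$.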
【49】CavMerge: Merging K-means Based on Local Log-Concavity
标题:CavMerge:基于局部对数凹性合并K均值
链接:https://arxiv.org/abs/2604.04302
作者:Zhili Qiao,Wangqian Ju,Peng Liu
摘要:K-means clustering, a classic and widely-used clustering technique, is known to exhibit suboptimal performance when applied to non-linearly separable data. Numerous adjustments and modifications have been proposed to address this issue, including methods that merge K-means results from a relatively large K to obtain a final cluster assignment. However, existing methods of this nature often encounter computational inefficiencies and suffer from hyperparameter tuning. Here we present \emph{CavMerge}, a novel K-means merging algorithm that is intuitive, free of parameter tuning, and computationally efficient. Operating under minimal local distributional assumptions, our algorithm demonstrates strong consistency and rapid convergence guarantees. Empirical studies on various simulated and real datasets demonstrate that our method yields more reliable clusters in comparison to current state-of-the-art algorithms.
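The general recipe, deliberately over-cluster with K-means and then merge sub-clusters, can be sketched with a centroid-distance merge rule standing in for the paper's local log-concavity criterion (which is the actual contribution; everything below, including the threshold, is generic and illustrative):

```python
import numpy as np

def kmeans(X, init_idx, iters=50):
    """Plain Lloyd iterations from fixed initial centroids."""
    C = X[np.asarray(init_idx)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                      for j in range(len(C))])
    return labels, C

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)),    # blob A
               rng.normal(5.0, 0.3, (100, 2))])   # blob B
labels, C = kmeans(X, [0, 50, 100, 150])          # deliberately K=4

# Merge sub-clusters whose centroids are close (stand-in merge criterion):
group = np.arange(len(C))
for a in range(len(C)):
    for b in range(a + 1, len(C)):
        if np.linalg.norm(C[a] - C[b]) < 2.0:
            group[group == group[b]] = group[a]
final = group[labels]   # two merged clusters recover the true structure
```

CavMerge's contribution is replacing the hand-tuned distance threshold with a tuning-free local log-concavity test for whether two sub-clusters form one unimodal cluster.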
【50】Avoiding Non-Integrable Beliefs in Expectation Propagation
标题:避免期望传播中的不可积信念
链接:https://arxiv.org/abs/2604.04264
作者:Zilu Zhao,Jichao Chen,Dirk Slock
摘要:Expectation Propagation (EP) is a widely used iterative message-passing algorithm that decomposes a global inference problem into multiple local ones. It approximates marginal distributions as ``beliefs'' using intermediate functions called ``messages''. It has been shown that the stationary points of EP coincide with those of a corresponding constrained Bethe Free Energy (BFE) optimization problem; EP is therefore an iterative method for optimizing the constrained BFE. However, the iterates may fall outside the feasible set of the BFE optimization problem, i.e., the beliefs may not be integrable. In most of the literature, various methods are used to keep all the messages integrable. In most Bayesian estimation problems, however, requiring integrable messages shrinks the actual feasible set. Furthermore, in extreme cases where the factors are not integrable, making the messages integrable is not enough to guarantee integrable beliefs. In this paper, two EP frameworks are proposed to ensure that EP has integrable beliefs; both allow non-integrable messages. We then investigate the signal recovery problem in the Generalized Linear Model (GLM) using the proposed methods.
【51】PATHFINDER: Multi-objective discovery in structural and spectral spaces
标题:PATHFINDER:结构与光谱空间中的多目标发现
链接:https://arxiv.org/abs/2604.04194
作者:Kamyar Barakati,Boris N. Slautin,Utkarsh Pratiush,Hiroshi Funakubo,Sergei V. Kalinin
备注:24 pages, 6 figures
摘要:Automated decision-making is becoming key for automated characterization including electron and scanning probe microscopy and nanoindentation. Most machine-learning-driven workflows optimize a single predefined objective and tend to converge prematurely on familiar responses, overlooking rare but scientifically important states. More broadly, the challenge is not only where to measure next, but how to coordinate exploration across structural, spectral, and measurement spaces under finite experimental budgets while balancing target-driven optimization with novelty discovery. Here we introduce PATHFINDER, a framework for autonomous microscopy that combines novelty-driven exploration with optimization, helping the system discover more diverse and useful representations across structural, spectral, and measurement spaces. By combining latent space representations of local structure, surrogate modeling of functional response, and Pareto-based acquisition, the framework selects measurements that balance novelty discovery in feature and object space and are informative and experimentally actionable. Benchmarked on pre-acquired STEM-EELS data and realized experimentally in scanning probe microscopy of ferroelectric materials, this approach expands the accessible structure-property landscape and avoids collapse onto a single apparent optimum. These results point to a new mode of autonomous microscopy that is not only optimization-driven, but also discovery-oriented, broad in its search, and responsive to human guidance.
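The Pareto-based acquisition step reduces to repeatedly extracting the non-dominated set of candidate measurements scored on several objectives (for instance novelty and predicted functional response). A minimal NumPy non-dominated filter, with hypothetical candidate scores:

```python
import numpy as np

def pareto_front(scores):
    """Indices of non-dominated rows; every objective is to be maximized."""
    front = []
    for i in range(scores.shape[0]):
        # i is dominated if some row is >= on all objectives and > on one.
        dominated = np.any(np.all(scores >= scores[i], axis=1) &
                           np.any(scores > scores[i], axis=1))
        if not dominated:
            front.append(i)
    return front

# Columns: (novelty, predicted response) for four candidate measurements.
scores = np.array([[1.0, 0.1],
                   [0.5, 0.5],
                   [0.1, 1.0],
                   [0.4, 0.4]])   # dominated by candidate 1
front = pareto_front(scores)      # candidates 0, 1, 2 trade off; 3 is dropped
```

Selecting from the front rather than from a single scalarized objective is what keeps the search from collapsing onto one apparent optimum.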
【52】Biconvex Biclustering
标题:双凸双聚类
链接:https://arxiv.org/abs/2604.03936
作者:Sam Rosen,Eric C. Chi,Jason Xu
备注:34 pages, 5 figures
摘要:This article proposes a biconvex modification to convex biclustering in order to improve its performance in high-dimensional settings. In contrast to heuristics that discard a subset of noisy features a priori, our method jointly learns and accordingly weighs informative features while discovering biclusters. Moreover, the method is adaptive to the data, and is accompanied by an efficient algorithm based on proximal alternating minimization, complete with detailed guidance on hyperparameter tuning and efficient solutions to optimization subproblems. These contributions are theoretically grounded; we establish finite-sample bounds on the objective function under sub-Gaussian errors, and generalize these guarantees to cases where input affinities need not be uniform. Extensive simulation results reveal our method consistently recovers underlying biclusters while weighing and selecting features appropriately, outperforming peer methods. An application to a gene microarray dataset of lymphoma samples recovers biclusters matching an underlying classification, while giving additional interpretation to the mRNA samples via the column groupings and fitted weights.
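The optimization pattern underlying such biconvex objectives is alternating minimization: freeze one block of variables, solve exactly in the other, and repeat. A toy NumPy instance on the biconvex rank-one fitting objective (minimize the Frobenius error of a rank-one factorization), which is not the paper's biclustering objective and omits its proximal terms:

```python
import numpy as np

rng = np.random.default_rng(0)
u_true, v_true = rng.normal(size=50), rng.normal(size=30)
A = np.outer(u_true, v_true) + 0.01 * rng.normal(size=(50, 30))

u = rng.normal(size=50)        # fixing u makes the problem convex in v, and
for _ in range(100):           # vice versa: each subproblem has a closed form
    v = A.T @ u / (u @ u)      # exact minimizer in v with u fixed
    u = A @ v / (v @ v)        # exact minimizer in u with v fixed
err = np.linalg.norm(A - np.outer(u, v)) / np.linalg.norm(A)
```

Each half-step can only decrease the objective, which is the property proximal alternating minimization generalizes to the biclustering setting.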
【53】PhaseFlow4D: Physically Constrained 4D Beam Reconstruction via Feedback-Guided Latent Diffusion
标题:PhaseFlow4D:通过反馈引导潜在扩散实现物理约束的4D束流重建
链接:https://arxiv.org/abs/2604.03885
作者:Alexander Scheinker,Alexander Plastun,Peter Ostroumov
摘要:We address the problem of recovering a time-varying 4D distribution from a sparse sequence of 2D projections - analogous to novel-view synthesis from sparse cameras, but applied to the 4D transverse phase space density $\rho(x,p_x,y,p_y)$ of charged particle beams. Direct single-shot measurement of this high-dimensional distribution is physically impossible in real particle accelerator systems; only limited 1D or 2D projections are accessible. We propose PhaseFlow4D, a feedback-guided latent diffusion model that reconstructs and tracks the full 4D phase space from incomplete 2D observations alone, with built-in hard physics constraints. Our core technical contribution is a 4D VAE whose decoder generates the full 4D phase space tensor, from which 2D projections are analytically computed and compared against 2D beam measurements. This projection-consistency constraint guarantees physical correctness by construction - not as a soft penalty, but as an architectural prior. An adaptive feedback loop then continuously tunes the conditioning vector of the latent diffusion model to track time-varying distributions online without retraining. We validate on multi-particle simulations of heavy-ion beams at the Facility for Rare Isotope Beams (FRIB), where full physics simulations require $\sim$6 hours on a 100-core HPC system. PhaseFlow4D achieves accurate 4D reconstructions 11000$\times$ faster while faithfully tracking distribution shifts under time-varying source conditions - demonstrating that principled generative reconstruction under incomplete observations transfers robustly beyond visual domains.
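The projection-consistency constraint is cheap to state: a 2D measurement is a marginal of the 4D density, i.e., a sum over the two unobserved axes, so any decoded tensor can be compared against data analytically. A NumPy sketch with a random stand-in density (grid size and normalization are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
rho4d = rng.random((16, 16, 16, 16))      # density over (x, p_x, y, p_y)
rho4d /= rho4d.sum()                      # normalize to unit total mass

# 2D projections are marginals over the complementary axes, e.g. the
# (x, y) projection sums over (p_x, p_y) and (x, p_x) sums over (y, p_y):
proj_xy = rho4d.sum(axis=(1, 3))
proj_xpx = rho4d.sum(axis=(2, 3))
```

Because the projection operator is just a sum, a decoder that outputs the 4D tensor can have measurement consistency enforced architecturally rather than as a soft penalty, which is the abstract's point.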
【54】New insights into Elo algorithm for practitioners and statisticians
标题:面向从业者与统计学家的Elo算法新见解
链接:https://arxiv.org/abs/2604.03840
作者:Leszek Szczecinski
摘要:This work reconciles two perspectives on the Elo ranking that coexist in the literature: the practitioner's view as a heuristic feedback rule, and the statistician's view as online maximum likelihood estimation via stochastic gradient ascent. Both perspectives coincide exactly in the binary case (iff the expected score is the logistic function). However, estimation noise forces a principled decoupling between the model used for ranking and the model used for prediction: the effective scale and home-field advantage parameter must be adjusted to account for the noise. We provide both closed-form corrections and a data-driven identification procedure. For multilevel outcomes, an exact relationship exists when outcome scores are uniformly spaced, but approximations are preferred in general: they account for estimation noise and better fit the data. The decoupled approach substantially outperforms the conventional one that reuses the ranking model for prediction, and serves as a diagnostic of convergence status. Applied to six years of FIFA men's ranking, we find that the ranking had not converged for the vast majority of national teams. The paper is written in a semi-tutorial style accessible to practitioners, with all key results accompanied by closed-form expressions and numerical examples.
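The binary-case equivalence mentioned above is concrete: the practitioner's feedback rule, rating plus $K$ times (observed score minus expected score) with a logistic expected score, is exactly one stochastic-gradient-ascent step on the Bradley-Terry log-likelihood. In code, with the standard $K=20$ and scale $400$ as illustrative constants:

```python
def expected_score(r_a, r_b, s=400.0):
    """Logistic win expectancy of player A against B; s is the rating scale."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / s))

def elo_update(r_a, r_b, score_a, k=20.0, s=400.0):
    """One Elo step: simultaneously a heuristic feedback update and one
    stochastic gradient ascent step on the logistic log-likelihood."""
    delta = k * (score_a - expected_score(r_a, r_b, s))
    return r_a + delta, r_b - delta

ra, rb = elo_update(1500.0, 1700.0, 1.0)   # upset win: large gain for A
```

The paper's decoupling argument is that the scale $s$ (and home-field offset) used inside `expected_score` for *prediction* should be re-identified from data rather than reusing the ranking model's values, because estimation noise inflates the effective scale.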
【55】The Generalised Kernel Covariance Measure
标题:广义核协方差度量
链接:https://arxiv.org/abs/2604.03721
作者:Luca Bergen,Dino Sejdinovic,Vanessa Didelez
备注:Accepted for the 5th Conference on Causal Learning and Reasoning (CLeaR 2026)
摘要:We consider the problem of conditional independence (CI) testing and adopt a kernel-based approach. Kernel-based CI tests embed variables in reproducing kernel Hilbert spaces, regress their embeddings on the conditioning variables, and test the resulting residuals for marginal independence. This approach yields tests that are sensitive to a broad range of conditional dependencies. Existing methods, however, rely heavily on kernel ridge regression, which is computationally expensive when properly tuned and yields poorly calibrated tests when left untuned, which limits their practical usefulness. We propose the Generalised Kernel Covariance Measure (GKCM), a regression-model-agnostic kernel-based CI test that accommodates a broad class of regression estimators. Building on the Generalised Hilbertian Covariance Measure framework (Lundborg et al., 2022), we characterise conditions under which GKCM satisfies uniform asymptotic level guarantees. In simulations, GKCM paired with tree-based regression models frequently outperforms state-of-the-art CI tests across a diverse range of data-generating processes, achieving better type I error control and competitive or superior power.
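The regress-then-test-residuals recipe can be seen in its simplest form with ordinary least squares standing in for the kernel or tree-based regressions the paper actually considers: regress X and Y on Z, then check whether the residual products have nonzero mean (the generalised-covariance-measure-style statistic, asymptotically standard normal under conditional independence when the regressions are consistent). An illustrative NumPy sketch on synthetic data:

```python
import numpy as np

def gcm_statistic(x, y, z):
    """Residual-covariance statistic; approximately N(0,1) under
    conditional independence. OLS on z is used here for simplicity."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    prod = rx * ry
    return np.sqrt(prod.size) * prod.mean() / prod.std()

rng = np.random.default_rng(0)
z = rng.normal(size=2000)
x = z + rng.normal(size=2000)
y_ci = z + rng.normal(size=2000)           # x independent of y given z
y_dep = z + x + rng.normal(size=2000)      # residual dependence remains
stat_ci = gcm_statistic(x, y_ci, z)        # small in magnitude
stat_dep = gcm_statistic(x, y_dep, z)      # far in the tail
```

GKCM replaces the scalar residual product with kernel-embedding residuals, which is what makes the test sensitive to nonlinear conditional dependencies.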
【56】Nonparametric Regression Discontinuity Designs with Survival Outcomes
标题:具有生存结局的非参数回归不连续设计
链接:https://arxiv.org/abs/2604.03502
作者:Maximilian Schuessler,Erik Sverdrup,Robert Tibshirani,Stefan Wager
摘要:Quasi-experimental evaluations are central for generating real-world causal evidence and complementing insights from randomized trials. The regression discontinuity design (RDD) is a quasi-experimental design that can be used to estimate the causal effect of treatments that are assigned based on a running variable crossing a threshold. Such threshold-based rules are ubiquitous in healthcare, where predictive and prognostic biomarkers frequently guide treatment decisions. However, standard RD estimators rely on complete outcome data, an assumption often violated in time-to-event analyses where censoring arises from loss to follow-up. To address this issue, we propose a nonparametric approach that leverages doubly robust censoring corrections and can be paired with existing RD estimators. Our approach can handle multiple survival endpoints, long follow-up times, and covariate-dependent variation in survival and censoring. We discuss the relevance of our approach across multiple areas of applications and demonstrate its usefulness through simulations and the prostate component of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial where our new approach offers several advantages, including higher efficiency and robustness to misspecification. We have also developed an open-source software package, $\texttt{rdsurvival}$, for the $\texttt{R}$ language.
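Before the censoring correction, the underlying sharp-RD estimator is a difference of local fits evaluated at the threshold. A minimal NumPy version on synthetic, uncensored data with a known treatment jump of 2 (bandwidth and data-generating process are illustrative):

```python
import numpy as np

def rd_estimate(running, outcome, cutoff=0.0, bandwidth=1.0):
    """Sharp-RD effect: difference of local linear fits at the cutoff,
    using observations within the bandwidth on each side."""
    lo = (running < cutoff) & (running > cutoff - bandwidth)
    hi = (running >= cutoff) & (running < cutoff + bandwidth)
    f_lo = np.polyfit(running[lo], outcome[lo], 1)
    f_hi = np.polyfit(running[hi], outcome[hi], 1)
    return np.polyval(f_hi, cutoff) - np.polyval(f_lo, cutoff)

rng = np.random.default_rng(1)
r = rng.uniform(-2.0, 2.0, 5000)                              # running variable
y = 0.5 * r + 2.0 * (r >= 0) + 0.1 * rng.normal(size=5000)    # true jump = 2
tau = rd_estimate(r, y)
```

With survival outcomes, `outcome` is only partially observed; the paper's doubly robust censoring correction replaces it with a pseudo-outcome so that estimators of this form remain valid.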
【57】Recurrent Quantum Feature Maps for Reservoir Computing
标题:用于储备池计算的循环量子特征图
链接:https://arxiv.org/abs/2604.03469
作者:Utkarsh Singh,Aaron Z. Goldberg,Christoph Simon,Khabat Heshami
备注:11 pages, 13 figures
摘要:Reservoir computing promises a fast method for handling large amounts of temporal data. This hinges on constructing a good reservoir--a dynamical system capable of transforming inputs into a high-dimensional representation while remembering properties of earlier data. In this work, we introduce a reservoir based on recurrent quantum feature maps where a fixed quantum circuit is reused to encode both current inputs and a classical feedback signal derived from previous outputs. We evaluate the model on the Mackey-Glass time-series prediction task using our recently introduced CP feature map, and find that it achieves lower mean squared error than standard classical baselines, including echo state networks and multilayer perceptrons, while maintaining compact circuit depth and qubit requirements. We further analyze memory capacity and show that the model effectively retains temporal information, consistent with its forecasting accuracy. Finally, we study the impact of realistic noise and find that performance is robust to several noise channels but remains sensitive to two-qubit gate errors, identifying a key limitation for near-term implementations.
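The echo state network baseline mentioned above is compact enough to sketch fully: a fixed random recurrent reservoir (spectral radius below one, for fading memory), driven by the input, with only a linear ridge-regression readout trained. An illustrative NumPy one-step-ahead prediction of a sine wave, with all sizes and constants chosen for the example rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_res = 100
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))      # echo state property

def run_reservoir(u):
    """Collect reservoir states while driving it with scalar inputs u."""
    x, states = np.zeros(n_res), []
    for u_t in u:
        x = np.tanh(W @ x + W_in[:, 0] * u_t)
        states.append(x.copy())
    return np.array(states)

u = np.sin(0.1 * np.arange(500))
X, y = run_reservoir(u[:-1]), u[1:]                  # one-step-ahead targets
X, y = X[50:], y[50:]                                # discard washout transient
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
mse = np.mean((X @ W_out - y) ** 2)                  # training-fit error
```

The quantum variant keeps the same outer loop but replaces the random `tanh` reservoir with a fixed quantum feature-map circuit fed by the input and a classical feedback signal.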
【58】Expressibility of neural quantum states: a Walsh-complexity perspective
标题:神经量子态的表达能力:沃尔什复杂度视角
链接:https://arxiv.org/abs/2604.03294
作者:Taige Wang
备注:8 pages, 2 figures
摘要:Neural quantum states are powerful variational wavefunctions, but it remains unclear which many-body states can be represented efficiently by modern additive architectures. We introduce Walsh complexity, a basis-dependent measure of how broadly a wavefunction is spread over parity patterns. States with an almost uniform Walsh spectrum require exponentially large Walsh complexity from any good approximant. We show that shallow additive feed-forward networks cannot generate such complexity in the tame regime, e.g. polynomial activations with subexponential parameter scaling. As a concrete example, we construct a simple dimerized state prepared by a single layer of disjoint controlled-$Z$ gates. Although it has only short-range entanglement and a simple tensor-network description, its Walsh complexity is maximal. Full-cube fits across system size and depth are consistent with the complexity bound: for polynomial activations, successful fitting appears only once depth reaches a logarithmic scale in $N$, whereas activation saturation in $\tanh$ produces a sharp threshold-like jump already at depth $3$. Walsh complexity therefore provides an expressibility axis complementary to entanglement and clarifies when depth becomes an essential resource for additive neural quantum states.
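The Walsh spectrum underlying this complexity measure is computable with the fast Walsh-Hadamard transform. Two extremes illustrate the idea (a small NumPy sketch, not the paper's construction): a computational-basis state already has a flat spectrum over all parity patterns, the "almost uniform" regime the complexity bound targets, while the uniform superposition concentrates on a single Walsh coefficient.

```python
import numpy as np

def walsh_hadamard(v):
    """Fast Walsh-Hadamard transform, orthonormal convention, O(N log N)."""
    v = np.asarray(v, float).copy()
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a, b = v[i:i + h].copy(), v[i + h:i + 2 * h].copy()
            v[i:i + h], v[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return v / np.sqrt(len(v))

basis_state = np.zeros(8); basis_state[3] = 1.0   # 3-qubit basis state
uniform = np.ones(8) / np.sqrt(8)                 # uniform superposition
spread = walsh_hadamard(basis_state) ** 2         # flat over parity patterns
peaked = walsh_hadamard(uniform) ** 2             # all weight on one coefficient
```

The paper's point is that shallow additive networks with tame activations can only produce spectra like `peaked`, so states whose spectra look like `spread` need depth.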
机器翻译由腾讯交互翻译提供,仅供参考