Machine Learning Academic Digest [4.30]


cs.LG: 127 papers in total today

Large Models (13 papers)

【1】Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
Link: https://arxiv.org/abs/2604.26951

Authors: Gongbo Zhang, Wen Wang, Ye Tian, Li Yuan
Comments: 15 pages, 3 figures. Code: https://github.com/PKU-YuanGroup/TIDE
Abstract: Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competitive performance. While existing distillation methods for dLLMs reduce inference steps within a single architecture, none address cross-architecture knowledge transfer, in which the teacher and student differ in architecture, attention mechanism, and tokenizer. We present TIDE, the first framework for cross-architecture dLLM distillation, comprising three modular components: (1) TIDAL, which jointly modulates distillation strength across training progress and diffusion timestep to account for the teacher's noise-dependent reliability; (2) CompDemo, which enriches the teacher's context via complementary mask splitting to improve predictions under heavy masking; and (3) Reverse CALM, a cross-tokenizer objective that inverts chunk-level likelihood matching, yielding bounded gradients and dual-end noise filtering. Distilling 8B dense and 16B MoE teachers into a 0.6B student via two heterogeneous pipelines outperforms the baseline by an average of 1.53 points across eight benchmarks, yielding notable gains in code generation, where HumanEval scores reach 48.78 compared to 32.3 for the AR baseline.
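The abstract names CompDemo's "complementary mask splitting" but gives no details. Below is a minimal sketch of one plausible reading, and it is an assumption, not TIDE's code. The masked positions are split into two disjoint halves. Each teacher view re-reveals one half from the clean sequence, so the teacher has more context when predicting the other half. The function names, the `[MASK]` token and the random 50/50 split are all hypothetical.

```python
import random

def complementary_mask_split(masked_positions, seed=None):
    """Randomly split masked positions into two disjoint, complementary halves."""
    rng = random.Random(seed)
    positions = list(masked_positions)
    rng.shuffle(positions)
    half = len(positions) // 2
    return sorted(positions[:half]), sorted(positions[half:])

def build_teacher_views(tokens, masked_positions, mask_token="[MASK]", seed=None):
    """Hypothetical CompDemo-style teacher inputs built from the clean sequence `tokens`.

    In each view only one half of the masked positions stays masked; the other
    half is shown to the teacher as extra context.
    """
    half_a, half_b = complementary_mask_split(masked_positions, seed)
    set_a, set_b = set(half_a), set(half_b)
    view_b = [mask_token if i in set_b else t for i, t in enumerate(tokens)]  # teacher predicts half B
    view_a = [mask_token if i in set_a else t for i, t in enumerate(tokens)]  # teacher predicts half A
    return (view_b, half_b), (view_a, half_a)

tokens = ["def", "f", "(", "x", ")", ":", "return", "x"]
(view_b, targets_b), (view_a, targets_a) = build_teacher_views(tokens, [1, 3, 6, 7], seed=0)
```

Taken together, the two views give a teacher prediction for every originally masked position, while the student still sees all of them masked.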

【2】HealthNLP_Retrievers at ArchEHR-QA 2026: Cascaded LLM Pipeline for Grounded Clinical Question Answering
Link: https://arxiv.org/abs/2604.26880

Authors: Md Biplob Hosen, Md Alomgeer Hussein, Md Akmol Masud, Omar Faruque, Tera L Reynolds, Lujie Karen Chen
Abstract: Patient portals now give individuals direct access to their electronic health records (EHRs), yet access alone does not ensure patients understand or act on the complex clinical information contained in these records. The ArchEHR-QA 2026 shared task addresses this challenge by focusing on grounded question answering over EHRs, and this paper presents the system developed by the HealthNLP_Retrievers team for this task. The proposed approach uses a multi-stage cascaded pipeline powered by the Gemini 2.5 Pro large language model to interpret patient-authored questions and retrieve relevant evidence from lengthy clinical notes. Our architecture comprises four integrated modules: (1) a few-shot query reformulation unit which summarizes verbose patient queries; (2) a heuristic-based evidence scorer which ranks clinical sentences to prioritize recall; (3) a grounded response generator which synthesizes professional-caliber answers restricted strictly to identified evidence; and (4) a high-precision many-to-many alignment framework which links generated answers to supporting clinical sentences. This cascaded approach achieved competitive results. Across the individual tracks, the system ranked 1st in question interpretation, 5th in answer generation, 7th in evidence identification, and 9th in answer-evidence alignment. These results show that integrating large language models within a structured multi-stage pipeline improves grounding, precision, and the professional quality of patient-oriented health communication. To support reproducibility, our source code is publicly available in our GitHub repository.

【3】Language Diffusion Models are Associative Memories Capable of Retrieving Unseen Data
Link: https://arxiv.org/abs/2604.26841

Authors: Bao Pham, Mohammed J. Zaki, Luca Ambrogioni, Dmitry Krotov, Matteo Negri
Comments: Also see arXiv:2505.21777 for a related work
Abstract: When do language diffusion models memorize their training data, and how can their true generative regime be quantitatively assessed? We address these questions by showing that Uniform-based Discrete Diffusion Models (UDDMs) fundamentally behave as Associative Memories (AMs) $\textit{with emergent creative capabilities}$. The core idea of an AM is to reliably recover stored data points as $\textit{memories}$ by establishing distinct basins of attraction around them. Historically, models like Hopfield networks use an explicit energy function to guarantee these stable attractors. We broaden this perspective by leveraging the observation that energy is not strictly necessary, as basins of attraction can also be formed via conditional likelihood maximization. By evaluating token recovery of $\textit{training}$ and $\textit{test}$ examples, we identify in UDDMs a sharp memorization-to-generalization transition governed by the size of the training dataset: as it increases, basins around training examples shrink and basins around unseen test examples expand, until both later converge to the same level. Crucially, we can detect this transition using only the conditional entropy of predicted token sequences: memorization is characterized by vanishing conditional entropy, while in the generalization regime the conditional entropy of most tokens remains finite. Thus, conditional entropy offers a practical probe for the memorization-to-generalization transition in deployed models.
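The entropy probe described above needs only the model's per-position predictive distributions. The sketch below assumes they are already available as lists of probabilities, one per token position. The threshold `eps` and the "fraction of near-zero-entropy tokens" summary are our own choices, not the paper's exact statistic.

```python
import math

def token_entropies(prob_rows):
    """Shannon entropy (in nats) of each per-position predictive distribution."""
    return [-sum(p * math.log(p) for p in row if p > 0.0) for row in prob_rows]

def memorization_fraction(prob_rows, eps=1e-3):
    """Fraction of positions whose conditional entropy has (nearly) vanished.

    Values near 1 suggest the memorization regime; in the generalization regime
    most positions keep a finite entropy, so the fraction stays low.
    """
    entropies = token_entropies(prob_rows)
    return sum(h < eps for h in entropies) / len(entropies)
```

For example, a one-hot row contributes entropy 0, while a uniform row over 4 tokens contributes log 4 ≈ 1.386 nats.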

【4】Unifying Sparse Attention with Hierarchical Memory for Scalable Long-Context LLM Serving
Link: https://arxiv.org/abs/2604.26837

Authors: Zihan Zhao, Baotong Lu, Shengjie Lin, Yizou Chen, Jing Liu, Yanqi Zhang, Ziming Miao, Ming-Chang Yang, Haiying Shen, Qi Chen, Fan Yang
Comments: 15 pages
Abstract: Long-context LLM serving is bottlenecked by the cost of attending over ever-growing KV caches. Dynamic sparse attention promises relief by accessing only a small, query-dependent subset of the KV state per decoding step and extending the KV storage to CPU memory. In practice, however, these algorithmic savings rarely translate into end-to-end system-level gains because sparse methods typically operate at different granularities and thus rely on ad hoc, per-algorithm implementations. At the same time, hierarchical KV storage introduces a new systems bottleneck: retrieving fine-grained, irregular KV subsets across the GPU-CPU boundary can easily erase the benefits of sparsity. We present SPIN, a sparse-attention-aware inference framework that co-designs the execution pipeline with hierarchical KV storage through three techniques: (1) a unified partition abstraction that maps different sparsity granularities onto a shared page-based KV substrate; (2) a locality-aware KV cache manager that dynamically sizes per-request HBM budgets and uses a GPU-friendly bucketed LRU policy to cut PCIe round-trips; and (3) a two-level hierarchical metadata layout sized to the active working set rather than the worst-case address space. Built on vLLM with three representative sparse attention algorithms, SPIN delivers 1.66-5.66x higher end-to-end throughput and 7-9x lower TTFT than vLLM, and reduces TPOT by up to 58% over the original sparse-attention implementations.

【5】Domain-Adapted Small Language Models for Reliable Clinical Triage
Link: https://arxiv.org/abs/2604.26766

Authors: Manar Aljohani, Brandon Ho, Kenneth McKinley, Dennis Ren, Xuan Wang
Abstract: Accurate and consistent Emergency Severity Index (ESI) assignment remains a persistent challenge in emergency departments, where highly variable free-text triage documentation contributes to mistriage and workflow inefficiencies. This study evaluates whether open-source small language models (SLMs) can serve as reliable, privacy-preserving decision-support tools for clinical triage. We systematically compared multiple SLMs across diverse prompting pipelines and found that clinical vignettes (concise summaries of triage narratives) yielded the most accurate predictions. Among the SLMs, Qwen2.5-7B demonstrated the strongest balance of accuracy, stability, and computational efficiency. Through large-scale domain adaptation using expert-curated and silver-standard pediatric triage data, fine-tuned Qwen2.5-7B models substantially reduced discordance and clinically significant errors, outperforming all baseline SLMs and advanced proprietary large language models (LLMs, e.g., GPT-4o). These findings highlight the feasibility of institution-specific SLMs for reliable, privacy-preserving ESI decision support and underscore the importance of targeted fine-tuning over more complex inference strategies.

【6】TLPO: Token-Level Policy Optimization for Mitigating Language Confusion in Large Language Models
Link: https://arxiv.org/abs/2604.26553

Authors: Jinho Choo, JunSeung Lee, Jimyeong Kim, Yeeho Song, S. K. Hong, Yeong-Dae Kwon
Comments: Accepted to the main conference of ACL 2026
Abstract: Large language models (LLMs) demonstrate strong multilingual capabilities, yet often fail to consistently generate responses in the intended language, exhibiting a phenomenon known as language confusion. Prior mitigation approaches based on sequence-level fine-tuning, such as DPO, ORPO, and GRPO, operate at the level of entire responses and can lead to unintended degradation of general model capabilities, motivating the need for more fine-grained alternatives. To address this, we introduce Token-Level Policy Optimization (TLPO), a fine-tuning framework designed to mitigate language confusion through localized, token-level updates. TLPO identifies error-prone positions, explores alternative candidate tokens, and updates the policy using a tailored objective to suppress error-inducing outputs at a granular level. This selective intervention enables effective mitigation of language confusion without compromising the model's general abilities. Experiments on multiple multilingual LLMs across diverse languages demonstrate that TLPO significantly outperforms baselines in improving language consistency while preserving downstream task accuracy.

【7】Progressive Semantic Communication for Efficient Edge-Cloud Vision-Language Models
Link: https://arxiv.org/abs/2604.26508

Authors: Cyril Shih-Huan Hsu, Wig Yuan-Cheng Cheng, Chrysa Papagianni
Comments: Under review. Extended version with additional figures and appendices
Abstract: Deploying Vision-Language Models (VLMs) on edge devices remains challenging due to their substantial computational and memory demands, which exceed the capabilities of resource-constrained embedded platforms. Conversely, fully offloading inference to the cloud is often impractical in bandwidth-limited environments, where transmitting raw visual data introduces substantial latency overhead. While recent edge-cloud collaborative architectures attempt to partition VLM workloads across devices, they typically rely on transmitting fixed-size representations, lacking adaptability to dynamic network conditions and failing to fully exploit semantic redundancy. In this paper, we propose a progressive semantic communication framework for edge-cloud VLM inference, using a Meta AutoEncoder that compresses visual tokens into adaptive, progressively refinable representations, enabling plug-and-play deployment with off-the-shelf VLMs without additional fine-tuning. This design allows flexible transmission at different information levels, providing a controllable trade-off between communication cost and semantic fidelity. We implement a full end-to-end edge-cloud system comprising an embedded NXP i.MX95 platform and a GPU server, communicating over bandwidth-constrained networks. Experimental results show that, at 1 Mbps uplink, the proposed progressive scheme significantly reduces network latency compared to full-edge and full-cloud solutions, while maintaining high semantic consistency even under high compression. The implementation code will be released upon publication at https://github.com/open-ep/ProSemComVLM.

【8】SplitFT: An Adaptive Federated Split Learning System For LLMs Fine-Tuning
Link: https://arxiv.org/abs/2604.26388

Authors: Yimeng Shan, Zhaorui Zhang, Sheng Di, Yu Liu, Xiaoyi Lu, Benben Liu
Abstract: Federated Split Learning has been identified as an efficient approach to address the computational resource constraints of clients in classical federated learning, while guaranteeing data privacy for distributed model training across data owners. However, it faces some critical challenges when such a training strategy is applied to fine-tuning large language models (LLMs). These challenges include setting the cut layer adaptively across different clients to address data and device heterogeneity, which significantly affects system performance. In addition, efficiently reducing the communication overhead during fine-tuning is another challenge. No existing work addresses these challenges. To bridge this gap, we propose SplitFT, an adaptive federated split learning system for LLM fine-tuning. SplitFT enables different clients to set different cut layers according to their computation resources and trained model performance. SplitFT also reduces the LoRA rank at the cut layer to reduce the communication overhead. In addition, to simulate the heterogeneous data of real-world applications for our proposed split federated learning system, we propose a length-based Dirichlet approach to divide the training data among different clients. Extensive experimental results show that our proposed approach outperforms the state-of-the-art approach in fine-tuning time efficiency and model performance on various popular benchmarks.
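The abstract does not define the "length-based Dirichlet approach". Below is a minimal sketch under stated assumptions, not the authors' procedure. Samples are grouped into equal-frequency length bins, and each bin is divided among clients in proportions drawn from a symmetric Dirichlet(α). A small α then gives each client a skewed mix of short and long sequences. `num_bins`, `alpha` and the function names are hypothetical.

```python
import random
from collections import defaultdict

def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional symmetric Dirichlet(alpha) vector via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def length_dirichlet_partition(lengths, num_clients, num_bins=4, alpha=0.5, seed=0):
    """Hypothetical length-skewed split; returns one sorted index list per client."""
    rng = random.Random(seed)
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    n = len(order)
    # Equal-frequency bins, from the shortest samples to the longest.
    bins = [order[b * n // num_bins:(b + 1) * n // num_bins] for b in range(num_bins)]
    clients = defaultdict(list)
    for members in bins:
        props = sample_dirichlet(alpha, num_clients, rng)
        rng.shuffle(members)
        # Turn the Dirichlet proportions into cut points over this bin.
        cuts, acc = [0], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(min(len(members), int(round(acc * len(members)))))
        cuts.append(len(members))
        for c in range(num_clients):
            clients[c].extend(members[cuts[c]:cuts[c + 1]])
    return [sorted(clients[c]) for c in range(num_clients)]

lengths = [(i * 37) % 211 + 5 for i in range(100)]  # toy token lengths
parts = length_dirichlet_partition(lengths, num_clients=5)
print([len(p) for p in parts])
```

Every sample lands on exactly one client. Lowering `alpha` makes the per-client length distributions more heterogeneous.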

【9】CoQuant: Joint Weight-Activation Subspace Projection for Mixed-Precision LLMs
Link: https://arxiv.org/abs/2604.26378

Authors: Zhe Ding, Su Pan, Duowei Pan
Comments: 14 pages, 3 figures
Abstract: Post-training quantization (PTQ) has become an important technique for reducing the inference cost of Large Language Models (LLMs). While recent mixed-precision methods improve ultra-low bit quantization by preserving critical subspaces in high precision, they typically construct these subspaces relying solely on activation statistics. This ignores the fundamental nature of linear operations, where the output perturbation is jointly driven by both activation and weight quantization noise. In this paper, we propose CoQuant, a joint weight-activation subspace projection method. By theoretically modeling the expected output error, CoQuant formulates a closed-form weighted PCA solution that balances activation and weight covariances to select the optimal high-precision subspace. Extensive experiments on Llama-3.2 and Qwen2.5 models show that CoQuant consistently outperforms strong PTQ baselines in both WikiText perplexity and zero-shot common-sense reasoning accuracy. These results demonstrate that joint weight-activation subspace modeling provides a principled and effective direction for low-bit LLM quantization. The source code is available at https://github.com/Zachary5895/CoQuant.
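To make "balancing activation and weight covariances" concrete, here is a minimal NumPy sketch. It assumes the simplest joint criterion, the top-k eigenvectors of Σ_x + λ·Σ_w, whereas CoQuant derives its own closed-form weights. It then shows the resulting mixed-precision product. The identity used is X Wᵀ = (XU)(WU)ᵀ + (XP)(WP)ᵀ with P = I − UUᵀ: the first term stays in full precision and the second is fake-quantized. `lam`, `fake_quant` and the per-tensor symmetric quantizer are illustrative choices.

```python
import numpy as np

def joint_subspace(X, W, k, lam=1.0):
    """Top-k eigenvectors of Sigma_x + lam * Sigma_w (assumed weighting).

    X: (n, d) calibration activations; W: (out, d) weights of y = X @ W.T.
    Returns a (d, k) matrix with orthonormal columns.
    """
    sigma_x = X.T @ X / X.shape[0]
    sigma_w = W.T @ W / W.shape[0]
    _, vecs = np.linalg.eigh(sigma_x + lam * sigma_w)  # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]

def fake_quant(t, bits):
    """Symmetric per-tensor round-to-nearest quantization, dequantized back to float."""
    qmax = 2 ** (bits - 1) - 1
    peak = np.abs(t).max()
    scale = peak / qmax if peak > 0 else 1.0
    return np.clip(np.round(t / scale), -qmax, qmax) * scale

def mixed_precision_matmul(X, W, U, low_bits=4):
    """Full precision inside span(U), low-bit residual outside it."""
    P = np.eye(U.shape[0]) - U @ U.T            # projector onto the complement of span(U)
    high = (X @ U) @ (W @ U).T                  # kept in high precision
    low = fake_quant(X @ P, low_bits) @ fake_quant(W @ P, low_bits).T
    return high + low

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4)) * np.array([10.0, 1.0, 1.0, 1.0])  # activation outlier in dim 0
W = rng.normal(size=(64, 4)) * np.array([1.0, 10.0, 1.0, 1.0])    # weight outlier in dim 1
U = joint_subspace(X, W, k=2)
```

In this toy setup, an activation-only PCA with k=1 would keep only dimension 0 and push the weight outlier in dimension 1 into the low-bit residual. The joint criterion keeps both outlier dimensions in the high-precision subspace.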

【10】Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control
Link: https://arxiv.org/abs/2604.26326

Authors: Bolian Li, Yifan Wang, Yi Ding, Anamika Lochab, Ananth Grama, Ruqi Zhang
Abstract: Reinforcement learning (RL) has unlocked complex reasoning abilities in large language models (LLMs). However, most RL algorithms suffer from performance saturation, preventing further gains as RL training scales. This problem can be characterized by the collapse of entropy, a key diagnostic for exploration in RL. Existing attempts have tried to prevent entropy collapse through regularization or clipping, but their resulting entropy curves often exhibit instability in the long term, which hinders performance gains. In this paper, we introduce Entrocraft, a simple rejection-sampling approach that realizes any user-customized entropy schedule by biasing the advantage distributions. Entrocraft requires no objective regularization and is advantage-estimator-agnostic. Theoretically, we relate per-step entropy change to the advantage distribution under minimal assumptions, which explains the behavior of existing RL and entropy-preserving methods. Entrocraft also enables a systematic study of entropy schedules, where we find that linear annealing, which starts high and decays to a slightly lower target, performs best. Empirically, Entrocraft addresses performance saturation, significantly improving generalization, output diversity, and long-term training. It enables a 4B model to outperform an 8B baseline, sustains improvement for up to 4x longer before plateauing, and raises pass@K by 50% over the baseline.
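The abstract does not give Entrocraft's rejection rule, so this sketch covers only the entropy schedule reported to work best: a linear anneal from a high start value to a slightly lower target. The start and end values are placeholders. The function name and the clamp at the end of training are our assumptions.

```python
def linear_entropy_schedule(step, total_steps, start, end):
    """Target policy entropy, decaying linearly from `start` to `end`, then held at `end`."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac

# Placeholder numbers: anneal from 0.6 to 0.5 nats over 1000 RL steps.
targets = [linear_entropy_schedule(s, 1000, 0.6, 0.5) for s in range(0, 1201, 200)]
```

A controller such as Entrocraft would compare the measured batch entropy against this target at each step and bias which samples it keeps accordingly.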

【11】FlowBot: Inducing LLM Workflows with Bilevel Optimization and Textual Gradients
Link: https://arxiv.org/abs/2604.26258

Authors: Hongyeon Yu, Young-Bum Kim, Yoon Kim
Abstract: LLM workflows, which coordinate structured calls to individual LLMs (each augmented with varying instructions and tools) to achieve a particular goal, offer a promising path towards extending the capabilities of LLMs and building powerful systems that can tackle diverse tasks. However, existing approaches for building such workflows generally rely on human-crafted pipelines and prompts, which present a substantial bottleneck in real-world deployment. How can we automatically induce and optimize such workflows in a data-driven way? This paper describes a simple data-driven approach for automatically inducing LLM workflows. We formulate workflow induction as a bilevel optimization problem: an outer loop which optimizes a high-level sketch of the workflow (in particular how the LLM calls should be structured), and an inner loop which optimizes each individual LLM call one by one. Both loops are optimized with ``textual gradients'' where for the inner loop we optimize each component in a modular way through ``backpropagating'' textual gradients layer-by-layer. We find that LLM workflows discovered through our \textsc{FlowBot} (work\textbf{flow} induction through \textbf{b}ilevel \textbf{o}ptimization and \textbf{t}extual gradients) approach perform competitively against strong baselines that make use of human-crafted or automatically-generated workflows.

【12】DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training
Link: https://arxiv.org/abs/2604.26256

Authors: Tianhao Hu, Xiangcheng Liu, Youshao Xiao, Yang Zheng, Xuan Huang, Jinrui Ding, Yufei Zhang, Tao Liang, Hongyu Zang, Quan Chen, Yueqing Sun, Wenjie Shi, Chao Zhang, Wei Wang, Qi Gu, Yerui Sun, Yucheng Xie, Xunliang Cai
Abstract: Reinforcement learning (RL) has become a critical paradigm for LLM post-training, yet the rollout phase -- accounting for 50--80% of total step time -- is bottlenecked by skewed generation: long-tailed trajectories indispensable for model performance block the entire training pipeline. Asynchronous training offers a natural remedy by overlapping generation with training, but introduces a fundamental tension between efficiency and algorithmic correctness. We identify three constraints in asynchronous training to preserve convergence: intra-trajectory policy consistency, data integrity, and bounded staleness. Existing approaches either fail to intrinsically address the long-tailed trajectory problem, which is further exacerbated by the imbalance characteristic of Mixture-of-Experts models, or deviate from the standard RL training formulation, thereby hindering model convergence. Therefore, we propose DORA (Dynamic ORchestration for Asynchronous Rollout), which addresses this challenge through algorithm-system co-design. DORA introduces multi-version streaming rollout, a novel asynchronous paradigm that maintains multiple policy versions concurrently -- simultaneously achieving full bubble elimination without compromising algorithmic constraints. Experimental results demonstrate that our DORA system achieves substantial improvements in throughput -- up to 2--3 times higher than state-of-the-art systems on open-source benchmarks -- without compromising convergence. Furthermore, in large-scale industrial applications with tens of thousands of accelerators, DORA accelerates RL training by 2--4 times compared to synchronous training across various scenarios. The resulting open-source model, LongCat-Flash-Thinking, exhibits competitive performance on complex reasoning benchmarks, matching the capability of the most advanced LLMs.
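Two of the three constraints named above (intra-trajectory policy consistency and bounded staleness) can be stated as a simple admission check on rollouts. This is a hypothetical sketch, not DORA's implementation. The `Trajectory` record, the per-token version bookkeeping and `retirable_versions` are our own illustration of how several live policy versions might be tracked.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    # Hypothetical bookkeeping: policy version that generated each token of this rollout.
    token_versions: list = field(default_factory=list)

def is_consumable(traj, trainer_version, max_staleness):
    """Admit a rollout into a training batch only if both constraints hold."""
    versions = set(traj.token_versions)
    if len(versions) != 1:
        return False  # empty or mixed versions: violates intra-trajectory policy consistency
    (version,) = versions
    return 0 <= trainer_version - version <= max_staleness  # bounded staleness

def retirable_versions(live_versions, trainer_version, max_staleness):
    """Policy versions too stale for any future batch; their rollout workers could be refreshed."""
    return sorted(v for v in live_versions if trainer_version - v > max_staleness)
```

For example, with `max_staleness=1` and the trainer at version 4, a rollout generated entirely by version 3 is admitted, while version 2 can be retired.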

【13】Large Language Models for Multilingual Code Intelligence: A Survey
Link: https://arxiv.org/abs/2604.25960

Authors: Chao Jiang, Dugang Liu, Cheng Wen, Zhiwu Xu, Hua Zheng, Muhammad Sadiq, Jawwad Ahmed Shamsi, Shengchao Qin, Zhong Ming
Abstract: Large language models have transformed AI-assisted software engineering, but current research remains biased toward high-resource languages such as Python, with weaker performance in languages like Rust and OCaml. Since real-world systems are inherently polyglot, robust multilingual code intelligence is crucial. This survey focuses on two key tasks: multilingual code generation from shared natural-language requirements, and multilingual code translation that preserves semantics across languages. It reviews representative methods, benchmarks, and evaluation metrics, and highlights challenges and opportunities for trustworthy cross-language generalization.

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (9 papers)

【1】Semi-supervised learning with max-margin graph cuts
Link: https://arxiv.org/abs/2604.26818

Authors: Branislav Kveton, Michal Valko, Ali Rahimi, Ling Huang
Comments: Published at AISTATS 2010 (13th International Conference on Artificial Intelligence and Statistics)
Abstract: This paper proposes a novel algorithm for semi-supervised learning. This algorithm learns graph cuts that maximize the margin with respect to the labels induced by the harmonic function solution. We motivate the approach, compare it to existing work, and prove a bound on its generalization error. The quality of our solutions is evaluated on a synthetic problem and three UCI ML repository datasets. In most cases, we outperform manifold regularization of support vector machines, which is a state-of-the-art approach to semi-supervised max-margin learning.
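The max-margin graph cut above starts from the labels induced by the harmonic function solution. The sketch below shows only that standard first step, in NumPy. For the unlabeled nodes it solves L_uu f_u = W_ul f_l, where L = D − W is the graph Laplacian. The max-margin cut learned on top of these labels is not reproduced, and the function name is our own.

```python
import numpy as np

def harmonic_solution(W, labeled_idx, labels):
    """Harmonic function solution on a weighted graph W (symmetric, n x n).

    Labeled nodes keep their values; each unlabeled node becomes the weighted
    average of its neighbours, which amounts to solving L_uu f_u = W_ul f_l.
    """
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    L = np.diag(W.sum(axis=1)) - W
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    f = np.zeros(n)
    f[labeled_idx] = labels
    f[unlabeled_idx] = np.linalg.solve(L_uu, W_ul @ np.asarray(labels, dtype=float))
    return f

# Chain 0-1-2-3 with node 0 labeled 0 and node 3 labeled 1.
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
f = harmonic_solution(W, [0, 3], [0.0, 1.0])
```

On the chain, the unlabeled values interpolate linearly (1/3 and 2/3). Thresholding f at 0.5 gives the induced labels.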

【2】PiGGO: Physics-Guided Learnable Graph Kalman Filters for Virtual Sensing of Nonlinear Dynamic Structures under Uncertainty
Link: https://arxiv.org/abs/2604.26593

Authors: Marcus Haywood-Alexander, Gregory Duthé, Eleni Chatzi
Abstract: Digital twins provide a powerful paradigm for diagnostic and prognostic tasks in the monitoring and control of engineered systems; however, their deployment for complex structures remains challenged by model-form uncertainty, arising from unknown nonlinear dynamics, and by sparse sensing. These limitations hinder reliable online state estimation using either purely physics-based or purely data-driven approaches. This work introduces the Physics-Guided Graph Neural ODE (PiGGO) framework, a physics-informed, graph-based Bayesian state estimation approach in which a learned graph neural ordinary differential equation (GNODE) serves as the continuous-time state-transition model within an extended Kalman filter. The graph representation explicitly defines the system state-space, while physics-guided inductive biases encode known structural relationships and constrain the learning of nonlinear dynamics. By integrating graph-native learned dynamics with recursive Bayesian filtering, the proposed PiGGO framework enables online virtual sensing and uncertainty-aware state estimation for nonlinear systems with unknown model form, while maintaining generalisation across topologically similar structures. Numerical case studies demonstrate improved robustness to model uncertainty and measurement noise, outperforming both open-loop graph neural models and conventional filtering approaches in online prediction tasks.
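In PiGGO, the learned GNODE plays the role of the transition model inside an extended Kalman filter. The sketch below is a generic discrete-time EKF step, not the paper's code. Here `f` stands in for the GNODE integrated over one time step and `F_jac` for its Jacobian (for example via automatic differentiation), and both are assumptions. The measurement model is taken to be linear. Because `H` observes only part of the state, the unobserved components are "virtually sensed" through the model.

```python
import numpy as np

def ekf_step(x, P, z, f, F_jac, H, Q, R):
    """One predict/update cycle of an extended Kalman filter.

    f: state transition (stand-in for an integrated GNODE), F_jac: its Jacobian,
    H: linear measurement matrix, Q/R: process and measurement noise covariances.
    """
    # Predict: propagate the mean through f and the covariance through its linearization.
    x_pred = f(x)
    F = F_jac(x)
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the measurement z = H x + noise.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy 2-state system: only displacement is measured, so velocity is virtually sensed.
dt = 0.1
f = lambda s: np.array([s[0] + dt * s[1], s[1]])
F_jac = lambda s: np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
x, P = np.zeros(2), np.eye(2)
for z in [0.1, 0.2, 0.3]:
    x, P = ekf_step(x, P, np.array([z]), f, F_jac, H, 1e-4 * np.eye(2), np.array([[1e-2]]))
```

The diagonal of `P` gives the uncertainty of each estimated state, which is the "uncertainty-aware" part of the estimate.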

【3】Large-scale semi-supervised learning with online spectral graph sparsification
Link: https://arxiv.org/abs/2604.26550

Authors: Daniele Calandriello, Alessandro Lazaric, Michal Valko
Comments: Workshop on Resource-Efficient Machine Learning (REML), ICML 2015
Abstract: We introduce Sparse-HFS, a scalable algorithm that can compute solutions to SSL problems using only O(n polylog(n)) space and O(m polylog(n)) time.

【4】STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices
Link: https://arxiv.org/abs/2604.26422

Authors: Yongliang Ding, Qigong Bi, Peng Pu
Comments: 12 pages, 5 figures, 4 tables, conference
Abstract: Accurate end-to-end tail-latency forecasting is critical for proactive SLO management in microservice systems. However, modeling long-range dependency propagation and non-stationary, bursty workloads while maintaining inference efficiency at scale remains challenging. We present STLGT (Scalable Trace-based Linear Graph Transformer), a per-API predictor that encodes traces as span graphs for multi-step p95 tail-latency forecasting. STLGT uses a structure-aware linear graph Transformer to propagate cross-service dependencies with inference time linear in span graph size, and a decoupled temporal module to capture workload dynamics. Across a personalized education microservice application, DeathStarBench, and Alibaba traces, STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average and achieves up to 12x faster CPU inference at N=32, which matches the maximum span graph size after preprocessing the Alibaba traces. Ablation studies further demonstrate the effectiveness of each component, especially under bursty traffic.
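STLGT's forecasting target (p95 tail latency) and its evaluation metric (MAPE) are both simple to compute. The standard-library sketch below builds p95 targets over fixed, non-overlapping request windows and scores forecasts against them. The nearest-rank percentile and the windowing scheme are assumptions, since the abstract does not say how the targets are aggregated.

```python
import math

def percentile_nearest_rank(values, q):
    """q-th percentile (0 < q <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def p95_per_window(latencies_ms, window):
    """p95 tail latency of consecutive, non-overlapping windows of requests."""
    return [percentile_nearest_rank(latencies_ms[i:i + window], 95)
            for i in range(0, len(latencies_ms) - window + 1, window)]

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
```

For example, forecasts of 110 and 180 ms against true p95 values of 100 and 200 ms are each off by 10%, so the MAPE is 10%.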

【5】Cheeger--Hodge Contrastive Learning for Structurally Robust Graph Representation Learning
标题：Cheeger--Hodge对比学习，用于结构稳健的图表示学习
链接：https://arxiv.org/abs/2604.26301

作者：Mengyang Zhao,Longlong Li,Cunquan Qu
摘要：图对比学习（GCL）已经成为无监督图表示学习的一个重要框架。然而，仅依靠增广设计来定义GCL所学习的不变性，在结构扰动下可能较为脆弱。为了解决这个问题，我们提出了Cheeger--Hodge对比学习（CHCL），该框架在增广视图之间对齐一种对扰动稳定的Cheeger--Hodge联合签名，以实现鲁棒的图表示学习。该签名将基于代数连通性$\lambda_2$的、受Cheeger启发的连通性签名与1-Hodge拉普拉斯算子的低频谱相结合，从而同时捕获全局连通性和高阶结构信息。通过在增广视图之间将编码器表示与所提出的Cheeger--Hodge联合签名对齐，CHCL学习到对局部结构扰动具有鲁棒性的图嵌入。在标准基准和迁移设置上的大量实验表明，CHCL一致地提升了性能、鲁棒性和泛化能力。
摘要：Graph Contrastive Learning (GCL) has emerged as a prominent framework for unsupervised graph representation learning. However, relying on augmentation design alone to define the invariances learned by GCL can be brittle under structural perturbations. To address this issue, we propose Cheeger--Hodge Contrastive Learning (CHCL), a framework that aligns a perturbation-stable Cheeger--Hodge joint signature across augmented views for robust graph representation learning. The proposed signature combines a Cheeger-inspired connectivity signature derived from the algebraic connectivity $\lambda_2$ with the low-frequency spectrum of the 1-Hodge Laplacian, thereby capturing both global connectivity and higher-order structural information. By aligning encoder representations with the proposed Cheeger--Hodge joint signature across augmented views, CHCL learns graph embeddings that are robust to local structural perturbations. Extensive experiments on standard benchmarks and in transfer settings demonstrate that CHCL consistently improves performance, robustness, and generalization.
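The two spectral ingredients named above can be computed directly from incidence matrices. With node-edge incidence B1 and edge-triangle incidence B2, the graph Laplacian is L0 = B1 B1^T and the 1-Hodge Laplacian is L1 = B1^T B1 + B2 B2^T. The sketch below returns λ2 of L0 followed by the k smallest eigenvalues of L1. That concatenation is a hypothetical stand-in for CHCL's signature, whose exact form the abstract does not give. Eigenvalues come from a small pure-Python Jacobi solver, so the code is only suitable for toy graphs.

```python
import math

def jacobi_eigvals(a, tol=1e-18, max_sweeps=100):
    """Eigenvalues (ascending) of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def cheeger_hodge_signature(n_nodes, edges, triangles=(), k=3):
    """Hypothetical signature: [lambda_2 of L0] + k smallest eigenvalues of L1.

    edges: (i, j) with i < j, oriented i -> j; triangles: (a, b, c) with a < b < c.
    Assumes at least one edge.
    """
    idx = {e: col for col, e in enumerate(edges)}
    B1 = [[0.0] * len(edges) for _ in range(n_nodes)]  # node-edge incidence
    for col, (i, j) in enumerate(edges):
        B1[i][col], B1[j][col] = -1.0, 1.0
    L0 = matmul(B1, transpose(B1))                     # graph Laplacian
    L1 = matmul(transpose(B1), B1)                     # "down" part of the 1-Hodge Laplacian
    if triangles:
        B2 = [[0.0] * len(triangles) for _ in edges]   # edge-triangle incidence
        for t, (a, b, c) in enumerate(triangles):
            # boundary of [a, b, c] = [b, c] - [a, c] + [a, b]
            B2[idx[(b, c)]][t], B2[idx[(a, c)]][t], B2[idx[(a, b)]][t] = 1.0, -1.0, 1.0
        up = matmul(B2, transpose(B2))                 # "up" part
        L1 = [[x + y for x, y in zip(r0, r1)] for r0, r1 in zip(L1, up)]
    return [jacobi_eigvals(L0)[1]] + jacobi_eigvals(L1)[:k]
```

For the 4-cycle, L1 has a zero eigenvalue because the graph contains one unfilled cycle. Filling a triangle with a 2-simplex removes that zero mode.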

【6】Unsupervised Graph Modeling for Anomaly Detection in Accounting Subject Relationships
标题：用于会计科目关系异常检测的无监督图建模
链接：https://arxiv.org/abs/2604.26216

作者：Yuhan Wang,Ruobing Yan,Zhe Su,Hejing Chen,Ningjing Sang,Yunfei Nie
摘要：本文针对会计科目关联结构中的异常检测问题，提出了一种基于图神经网络的结构化建模与无监督判别框架。该框架用于挖掘科目之间的稳定对应关系，并识别总分类账明细和凭证分录中的结构性偏差。该方法首先将会计科目抽象为图节点，将同一业务记录中科目的共现关系和借贷对应关系抽象为带权边。边权由共现频率或金额汇总等统计量刻画，从而形成期间级会计科目关联图。在表示学习阶段，使用消息传递机制融合节点自身属性和邻域上下文，获得包含结构信息的节点嵌入。在异常检测阶段，通过关系重构解码器估计科目对连接的合理性，并根据重构概率的偏离程度定义边级异常得分。随后将这些得分聚合，得到节点级风险排序和局部异常定位。该框架无需依赖异常标注，即可同时捕获局部子结构异常和跨社区异常连接，并输出可追溯的科目对风险线索。对比实验表明，该方法具有更稳定的综合判别能力和更高的头部排序准确率。
摘要：This paper addresses the problem of anomaly detection in accounting subject association structures, proposing a structured modeling and unsupervised discriminant framework based on graph neural networks. This framework is used to mine stable correspondences between subjects and identify structural deviations from general ledger details and voucher entries. The method first abstracts accounting subjects as graph nodes, and the co-occurrence and debit/credit correspondence of subjects in the same business record are abstracted as weighted edges. The edge weights are characterized by statistical measures such as co-occurrence frequency or amount aggregation, thus forming a period-level accounting subject association graph. In the representation learning stage, a message passing mechanism is used to fuse the node's own attributes and neighborhood context to obtain node embeddings containing structural information. In the anomaly detection stage, the rationality of subject pair connections is estimated through a relation reconstruction decoder, and edge-level anomaly scores are defined based on the degree of deviation in reconstruction probabilities. These scores are then aggregated to obtain node-level risk ranking and local anomaly localization. This framework can simultaneously capture local substructure anomalies and cross-community anomaly connections without relying on anomaly labeling, outputting traceable subject pair risk clues. Comparative experiments demonstrate more stable comprehensive discriminant capabilities and higher top-ranking accuracy.
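The graph-construction step and the final edge-to-node aggregation can be sketched in a few lines. The voucher format, the account names, the count-only edge weights, and max-aggregation of edge scores into node risk are all assumptions for illustration. The GNN encoder and reconstruction decoder that would produce the edge scores are not shown.

```python
def build_subject_graph(vouchers):
    """Directed debit -> credit edges weighted by co-occurrence count per period.

    vouchers: list of dicts with hypothetical 'debit' / 'credit' lists of (subject, amount).
    """
    edges = {}
    for voucher in vouchers:
        for debit_subject, _ in voucher["debit"]:
            for credit_subject, _ in voucher["credit"]:
                key = (debit_subject, credit_subject)
                edges[key] = edges.get(key, 0.0) + 1.0
    return edges

def node_risk(edge_scores):
    """Aggregate edge-level anomaly scores to nodes (max over incident edges), ranked descending."""
    risk = {}
    for (u, v), score in edge_scores.items():
        risk[u] = max(risk.get(u, score), score)
        risk[v] = max(risk.get(v, score), score)
    return sorted(risk.items(), key=lambda kv: kv[1], reverse=True)
```

Max-aggregation flags an account as soon as any single one of its pairings looks anomalous. A sum or mean would instead favor accounts with many moderately unusual links.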

【7】Momentum-Conserving Graph Neural Networks for Deformable Objects
标题：用于可变形物体的动量守恒图神经网络
链接：https://arxiv.org/abs/2604.26097

作者：Jiahong Wang,Logan Numerow,Stelian Coros,Christian Theobalt,Vahid Babaei,Bernhard Thomaszewski
备注：Accepted to 3DV 2026
摘要：图神经网络（GNN）已经成为建模可变形材料动态行为的一种通用且高效的选择。虽然GNN很容易推广到任意形状、网格拓扑和材料参数，但现有架构难以正确预测线动量和角动量等关键物理量的时间演化。在这项工作中，我们提出了MomentumGNN，一种通过构造来准确跟踪动量的新颖架构。与输出无约束节点加速度的现有GNN不同，我们的模型预测每条边上的拉伸和弯曲冲量，从而保证线动量和角动量守恒。我们使用基于物理的损失以无监督方式训练网络，并表明在许多动量起关键作用的常见场景中，我们的方法优于基线。
摘要：Graph neural networks (GNNs) have emerged as a versatile and efficient option for modeling the dynamic behavior of deformable materials. While GNNs generalize readily to arbitrary shapes, mesh topologies, and material parameters, existing architectures struggle to correctly predict the temporal evolution of key physical quantities such as linear and angular momentum. In this work, we propose MomentumGNN -- a novel architecture designed to accurately track momentum by construction. Unlike existing GNNs that output unconstrained nodal accelerations, our model predicts per-edge stretching and bending impulses which guarantee the preservation of linear and angular momentum. We train our network in an unsupervised fashion using a physics-based loss, and we show that our method outperforms baselines in a number of common scenarios where momentum plays a pivotal role.
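Momentum conservation by construction follows from one observation. If each edge (i, j) adds an impulse J to node i and -J to node j, with J parallel to x_i - x_j, the total linear momentum is unchanged. The angular momentum also changes by (x_i - x_j) × J = 0. The sketch below checks this numerically for stretching-type impulses only. The per-edge magnitudes are plain inputs standing in for network outputs, and the paper's bending impulses are not modeled.

```python
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def apply_edge_impulses(pos, vel, mass, edges, magnitudes):
    """Apply equal-and-opposite impulses along each edge direction (3D).

    magnitudes[e] stands in for a learned per-edge output; positions are held fixed.
    """
    new_vel = [list(v) for v in vel]
    for (i, j), s in zip(edges, magnitudes):
        impulse = [s * (pos[i][k] - pos[j][k]) for k in range(3)]  # parallel to x_i - x_j
        for k in range(3):
            new_vel[i][k] += impulse[k] / mass[i]
            new_vel[j][k] -= impulse[k] / mass[j]
    return new_vel

def linear_momentum(vel, mass):
    return [sum(m * v[k] for v, m in zip(vel, mass)) for k in range(3)]

def angular_momentum(pos, vel, mass):
    total = [0.0, 0.0, 0.0]
    for x, v, m in zip(pos, vel, mass):
        total = [t + c for t, c in zip(total, cross(x, [m * vk for vk in v]))]
    return total
```

Because conservation holds for any magnitudes, a network that outputs only these per-edge scalars cannot violate it, whatever it learns.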

【8】A Survey of Multi-Agent Deep Reinforcement Learning with Graph Neural Network-Based Communication
标题：基于图神经网络通信的多智能体深度强化学习综述
链接：https://arxiv.org/abs/2604.25972

作者：Valentin Cuzin-Rambaud,Laetitia Matignon,Maxime Morge
摘要：在多智能体强化学习（MARL）中，引入通信机制可以让智能体通过共享信息更好地学会协调行动并实现其目标。其中一类方法基于交互图，采用图神经网络（GNN）来学习通信，使智能体能够利用交换的信息丰富并改进其内部表示。随着相关研究的增多，我们注意到目前缺乏明确的结构和框架来区分和归类基于GNN通信的MARL方法。因此，本文综述了该领域的近期工作。我们提出了一个广义的基于GNN的通信过程，旨在使这些方法背后的基本概念更加清晰易懂。
摘要：In multi-agent reinforcement learning (MARL), integrating a communication mechanism allows agents to better learn to coordinate their actions and converge on their objectives by sharing information. Based on an interaction graph, a subclass of methods employs graph neural networks (GNNs) to learn the communication, enabling agents to improve their internal representations by enriching them with the exchanged information. With growing research, we note a lack of explicit structure and framework to distinguish and classify MARL approaches with communication based on GNNs. Thus, this paper surveys recent works in this field. We propose a generalized GNN-based communication process with the goal of making the underlying concepts behind the methods more obvious and accessible.

【9】Sparse Graph Learning from Sparse Data via Fiedler Number Maximization
标题：通过Fiedler数最大化从稀疏数据中学习稀疏图
链接：https://arxiv.org/abs/2604.26132

作者：Bahar Oveisgharan,Gene Cheung,Andrew Eckford
摘要：我们的目标是从稀疏数据中学习一个稀疏且连通的图，其中对于R^N中的信号x，观测数K可以远小于信号维度N，且底层分布未知。在这种严重不适定的设定下，我们将Fiedler数（图拉普拉斯矩阵的第二小特征值，用于量化连通性）作为鲁棒正则项引入稀疏图学习目标。我们首先开发了一种贪婪算法，每次迭代在全局范围内选择一条边进行削弱或删除以降低目标函数，并利用特征值摄动定理来界定边的变化对Fiedler数的不利影响。接下来，我们基于Cheeger不等式设计了一个并行变体，它使用近似Cheeger割将输入图递归地划分为两个子图，以分布式地寻找最优边。仿真实验表明，Fiedler数最大化增强了稀疏图估计的鲁棒性，优于先前的稀疏图学习算法。
摘要：We aim to learn a sparse and connected graph from sparse data, where the number of observations K can be substantially smaller than the signal dimension N for signals x in R^N, and the underlying distribution is unknown. In this severely ill-posed setting, we incorporate the Fiedler number (the second eigenvalue of the graph Laplacian matrix, which quantifies connectedness) as a robust regularization term in the sparse graph learning objective. We first develop a greedy algorithm that iteratively selects one edge globally for weakening/removal to reduce the objective, leveraging eigenvalue perturbation theorems that bound the adverse effect of an edge change on the Fiedler number. Next, we design a parallel variant, based on Cheeger's inequality, that recursively partitions an input graph into two sub-graphs using an approximate Cheeger cut to distributedly find an optimal edge. Simulation experiments show that Fiedler number maximization robustifies sparse graph estimates, outperforming previous sparse graph learning algorithms.
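The effect of weakening one edge on the Fiedler number can be estimated to first order. Write L = Σ w_ij (e_i - e_j)(e_i - e_j)^T. If λ2 is simple and v is the unit Fiedler vector, then dλ2/dw_ij = (v_i - v_j)^2. The sketch below uses this sensitivity to pick the edge whose weakening hurts λ2 least. It is a simple stand-in for the perturbation bounds used in the paper, not their algorithm, and it omits the data-fit term of the objective. A small pure-Python Jacobi solver provides the eigenvectors.

```python
import math

def jacobi_eigh(a, tol=1e-18, max_sweeps=100):
    """Eigenvalues (ascending) and unit eigenvectors of a small symmetric matrix (cyclic Jacobi)."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]

def laplacian(n, weights):
    L = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        L[i][i] += w
        L[j][j] += w
        L[i][j] -= w
        L[j][i] -= w
    return L

def edge_sensitivities(n, weights):
    """lambda_2 and the first-order rates d(lambda_2)/d(w_ij) = (v_i - v_j)^2."""
    vals, vecs = jacobi_eigh(laplacian(n, weights))
    lam2, v = vals[1], vecs[1]
    return lam2, {e: (v[e[0]] - v[e[1]]) ** 2 for e in weights}

def pick_edge_to_weaken(n, weights):
    """Greedy choice: the edge whose weakening lowers lambda_2 least (to first order)."""
    _, sens = edge_sensitivities(n, weights)
    return min(sens, key=sens.get)
```

On two triangles joined by a bridge, the bridge has the largest sensitivity and is kept. The edges opposite the bridge in each triangle have zero sensitivity, so the greedy step picks one of them.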

Transformer(5篇)

【1】Exploring the Potential of Probabilistic Transformer for Time Series Modeling: A Report on the ST-PT Framework
标题：探索概率Transformer用于时间序列建模的潜力：关于ST-PT框架的报告
链接：https://arxiv.org/abs/2604.26762

作者：Zhangzhi Xiong,Haoyi Wu,You Wu,Shuqi Gu,Kan Ren,Kewei Tu
备注：30 pages, 2 figures
摘要：概率Transformer（PT）证明了Transformer的自注意力及其前馈块在数学上等价于条件随机场（CRF）上的平均场变分推理（MFVI）。在这种等价性下，Transformer不再是一个黑盒神经网络，而成为一个可编程的因子图：图拓扑、因子势和消息传递调度都是显式、可检查且可设计的原语。PT最初是为自然语言开发的，在本报告中，我们研究它在时间序列上的潜力。我们首先将PT提升为时空概率Transformer（ST-PT），以弥补PT缺失的通道轴和较弱的逐步语义，并采用ST-PT作为共享的基础骨干。然后，我们确定PT/ST-PT作为因子图模型所具有的三个不同属性，并据此提出三个研究问题（每个属性一个），探讨如何在时间序列中利用这些属性：RQ1. 图拓扑和势是可直接编程的原语。能否借此通过结构化的图修改将符号化的时间序列先验注入ST-PT，特别是在数据稀缺和有噪声的情况下？RQ2. CRF的因子矩阵就是算子的势。能否由外部条件逐样本地对这些因子矩阵进行编程，使条件生成成为结构性的，而不是对固定因子矩阵的特征级调制？RQ3. 每次MFVI迭代都是因子图上的一次贝叶斯后验更新。能否借此将潜空间自回归（AR）预测中的潜变量转移从不透明的MLP转变为有原则的后验更新？CRF教师能否将其潜变量蒸馏给AR学生以对抗累积误差？我们针对每个问题给出一项实证研究。总之，这三项研究将ST-PT定位为一个可编程的时间序列建模框架。
摘要：The Probabilistic Transformer (PT) establishes that the Transformer's self-attention plus its feed-forward block is mathematically equivalent to Mean-Field Variational Inference (MFVI) on a Conditional Random Field (CRF). Under this equivalence the Transformer ceases to be a black-box neural network and becomes a programmable factor graph: graph topology, factor potentials, and the message-passing schedule are all explicit and inspectable primitives that can be engineered. PT was originally developed for natural language, and in this report we investigate its potential for time series. We first lift PT into the Spatial-Temporal Probabilistic Transformer (ST-PT) to repair PT's missing channel axis and weak per-step semantics, and adopt ST-PT as a shared cornerstone backbone. We then identify three distinct properties that PT/ST-PT offers as a factor-graph model and derive three Research Questions, one per property, that probe how each property can be exploited in time series: RQ1. The graph topology and potentials are direct programmable primitives. Can this be used to inject symbolic time-series priors into ST-PT through structural graph modifications, especially under data scarcity and noise? RQ2. The CRF's factor matrices are the operator's potentials. Can an external condition program these factor matrices on a per-sample basis, so that conditional generation becomes structural rather than feature-level modulation of a fixed one? RQ3. Each MFVI iteration is a Bayesian posterior update on the factor graph. Can this turn the latent transition of latent-space AutoRegressive (AR) forecasting from an opaque MLP into a principled posterior update, and can a CRF teacher distill its latents into the AR student to counter cumulative error? We give one empirical study per question. Together, these three studies position ST-PT as a programmable framework for time-series modeling.
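To make "MFVI on a CRF" concrete, here is a minimal mean-field loop for a discrete pairwise CRF. Each node's factorized belief q_i is repeatedly set to a softmax of its unary log-potential plus the expected pairwise log-potential under its neighbours' current beliefs. This is generic coordinate-ascent mean-field inference, not PT's specific mapping onto attention and feed-forward layers. The argument layout and the shared symmetric pairwise table are assumptions.

```python
import math

def mean_field(unary, pairwise, edges, iters=50):
    """Mean-field VI for a pairwise CRF given log-potentials.

    unary[i][x]:    log-potential of label x at node i
    pairwise[x][y]: symmetric log-potential shared by every edge
    Returns q[i][x], the factorized approximate marginals.
    """
    n, K = len(unary), len(unary[0])
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    q = [[1.0 / K] * K for _ in range(n)]  # uniform initialization
    for _ in range(iters):
        for i in range(n):
            # expected log-potential under the neighbours' current beliefs
            logits = [unary[i][x] + sum(q[j][y] * pairwise[x][y] for j in nbrs[i] for y in range(K))
                      for x in range(K)]
            m = max(logits)  # subtract the max for a numerically stable softmax
            exps = [math.exp(l - m) for l in logits]
            total = sum(exps)
            q[i] = [e / total for e in exps]
    return q
```

With an attractive pairwise table, evidence at one end of a chain propagates to the other nodes through these updates. This is the kind of message-passing schedule the abstract treats as a programmable primitive.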

【2】Efficient and Interpretable Transformer for Counterfactual Fairness
标题：高效且可解释的Transformer，实现反事实公平
链接：https://arxiv.org/abs/2604.26188

作者：Panyi Dong,Zhiyu Quan
摘要：在金融和保险等高风险、强监管领域，对机器学习模型的依赖日益增加，这使得预测性能、可解释性和监管公平性要求之间的紧张关系不断加剧。在这些环境中，模型不仅要提供可靠的预测，还要提供透明的决策依据，并遵守严格的公平性要求。正如在各类语言任务中所展示的那样，基于注意力的Transformer为建模复杂数据关系提供了强大的机制，但其注意力机制本身并不能确保反事实公平的预测，即使与公平感知技术相结合也是如此。为了解决这些局限性，我们提出了特征相关性Transformer（FCorrTransformer），一种为表格数据量身定制的轻注意力架构。在这种设计中，注意力矩阵可以直接从统计角度解释为成对的特征依赖关系，从而同时提高了可解释性和效率。利用这种结构，我们引入了反事实注意力正则化（CAR），该框架在注意力层面对敏感特征施加组不变的公平表示，在不依赖显式因果假设的情况下促进反事实公平的预测。在不平衡分类和回归基准上的实证评估表明，结合CAR的FCorrTransformer实现了较强的反事实公平性，同时保持了有竞争力的预测性能，并且与标准的基于Transformer的基线相比大幅降低了模型复杂度。总的来说，这项工作弥合了公平性理论与机器学习模型之间的关键差距，为监管敏感领域中负责任的人工智能提供了一个实用框架。
摘要：The growing reliance on machine learning models in high-stakes, highly regulated domains such as finance and insurance has created a growing tension between predictive performance, interpretability, and regulatory fairness requirements. In these settings, models are expected not only to deliver reliable predictions but also to provide transparent decision rationales and comply with strict fairness requirements. Attention-based transformers offer powerful mechanisms for modeling complex data relationships as demonstrated in various language tasks, yet their attention mechanisms alone do not ensure counterfactually fair predictions, even when combined with fairness-aware techniques. To address these limitations, we propose the Feature Correlation Transformer (FCorrTransformer), an attention-light architecture tailored for tabular data. In this design, the attention matrix admits a direct statistical interpretation as pairwise feature dependencies, enhancing both interpretability and efficiency. Leveraging this structure, we introduce Counterfactual Attention Regularization (CAR), a framework that enforces group-invariant fair representations of sensitive features at the attention level, promoting counterfactually fair predictions without relying on explicit causal assumptions. Empirical evaluations on imbalanced classification and regression benchmarks demonstrate that FCorrTransformer combined with CAR achieves strong counterfactual fairness while maintaining competitive predictive performance and substantially reducing model complexity compared with standard transformer-based baselines. Overall, this work bridges a critical gap between fairness theory and machine learning models, offering a practical framework for responsible AI in regulatory-sensitive domains.

【3】PPG-Based Affect Recognition with Long-Range Deep Models: A Measurement-Driven Comparison of CNN, Transformer, and Mamba Architectures
标题：基于长程深度模型的PPG情感识别：CNN、Transformer和Mamba架构的测量驱动比较
链接：https://arxiv.org/abs/2604.26078

作者：Karim Alghoul,Hussein Al Osman,Abdulmotaleb El Saddik
摘要：光电容积脉搏波描记法（PPG）由于成本低、易于集成到消费级设备中，越来越多地用于可穿戴情感计算。深度学习的最新进展引入了Transformer等长程序列模型以及Mamba等状态空间模型，它们在自然语言和一般时间序列任务上表现出了强大的性能。然而，考虑到相关数据集通常规模小且噪声大，这些架构在基于PPG的情感识别上是否比广泛使用的卷积神经网络（CNN）和长短期记忆网络（LSTM）带来切实的好处，目前尚不清楚。本工作对四种深度学习架构（CNN、CNN-LSTM混合模型、Transformer和Mamba）进行了测量驱动的比较，用于从腕部PPG信号中分类唤醒度、效价和放松状态。所有模型均在独立于受试者的5折交叉验证协议下，使用相同的预处理、分段和训练流程进行评估。结果表明，Transformer和Mamba模型取得了与CNN基线相当的性能，但并非在所有任务上都稳定优于它。总体而言，CNN仍然是最有效的，以最小的模型规模提供最高的准确率，而Transformer在唤醒度和放松状态任务上的F1分数更为均衡。该研究首次评估了Transformer和Mamba模型在基于PPG的情感识别中的表现，为可穿戴情感监测系统的模型选择提供了实践指导。
摘要：Photoplethysmography (PPG) is increasingly used in wearable affective computing due to its low cost and ease of integration into consumer devices. Recent advances in deep learning have introduced long-range sequence models, such as Transformers, and state-space models, like Mamba, which have demonstrated strong performance on natural language and general time-series tasks. However, it remains unclear whether these architectures offer tangible benefits over widely used Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks for PPG-based affect recognition, given that datasets are typically small and noisy. This work presents a measurement-driven comparison of four deep learning architectures (CNN, CNN-LSTM hybrid, Transformer, and Mamba) for classifying arousal, valence, and relaxation states from wrist-based PPG signals. All models are evaluated under a subject-independent 5-fold cross-validation protocol using identical preprocessing, segmentation, and training pipelines. Our results show that the Transformer and Mamba models achieve performance comparable to that of a CNN baseline, but do not consistently outperform it across all tasks. CNNs remain the most effective overall, providing the highest accuracy with the smallest model size, whereas Transformers have a better balance of F1 scores for arousal and relaxation. The study provides the first evaluation of Transformer and Mamba models for PPG-based affect recognition, offering practical guidance on model selection for wearable affective monitoring systems.
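A subject-independent protocol means that every window from a given subject falls entirely in either the training split or the test split of a fold. The sketch below assigns subjects to folds round-robin, which is an assumption: the abstract does not say how subjects were grouped. scikit-learn's `GroupKFold` provides the same guarantee with size-balanced folds.

```python
def subject_folds(subject_ids, k=5):
    """Yield (train_idx, test_idx) pairs so that no subject appears on both sides.

    subject_ids[i] is the subject of segment i; subjects are assigned to folds
    round-robin after sorting (an illustrative choice).
    """
    subjects = sorted(set(subject_ids))
    fold_of = {s: rank % k for rank, s in enumerate(subjects)}
    for fold in range(k):
        test = [i for i, s in enumerate(subject_ids) if fold_of[s] == fold]
        train = [i for i, s in enumerate(subject_ids) if fold_of[s] != fold]
        yield train, test
```

Splitting by segment instead of by subject would let windows from the same person leak into both splits and inflate the reported accuracy.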

【4】Associative-State Universal Transformers: Sparse Retrieval Meets Structured Recurrence
标题：关联状态通用Transformer：稀疏检索与结构化递归相结合
链接：https://arxiv.org/abs/2604.25930

作者：Liu Xiao
摘要：我们研究结构化递归状态能否作为语言建模的紧凑关联骨干，同时仍然支持精确检索。我们提出了UniMatrix，这是一个通用Transformer（Universal Transformer）风格的模型家族，它在深度方向上重复使用共享的递归块，并辅以混合状态更新、ROSA风格的残差路径和基于token条件的嵌入调制。我们在字节级WikiText-2、合成关联召回、Apple MPS上的吞吐量分析以及一个经过校正的三元token交互基准上评估这些模型。在小规模下，UniMatrix-Core和UniMatrix-ROSA在WikiText-2上的性能略优于参数匹配的Transformer，同时使用的参数少得多，分别达到5.084和5.083比特每字节，而Transformer为5.124。主要的负面结果同样重要：在关联召回任务上，原始UniMatrix家族的表现仍接近随机水平，而Transformer达到25.4%，这表明仅靠压缩的递归状态不足以实现精确查找。面向检索的后续变体UniMatrix-Assoc仅带来微小帮助。相比之下，加入稀疏槽路由和直接指针-logit融合的UniMatrix-SparsePointer在原始试点配置上达到75.6%，在无dropout的后续实验中达到99.2%，同时使用的参数比Transformer基线少53.8%。消融实验表明，增益来自足够的槽容量和精确的指针级输出路由。总的来说，结构化递归状态很有前景且参数高效，但强大的长程行为仍然需要显式的稀疏检索和更好的内核。
摘要：We study whether a structured recurrent state can serve as a compact associative backbone for language modeling while still supporting exact retrieval. We introduce UniMatrix, a Universal Transformer style family that reuses a shared recurrent block across depth and augments it with hybrid state updates, a ROSA-style residual path, and token-conditioned embedding modulation. We evaluate these models on byte-level WikiText-2, synthetic associative recall, throughput profiling on Apple MPS, and a corrected benchmark for triple-token interactions. At small scale, UniMatrix-Core and UniMatrix-ROSA slightly outperform a parameter-matched Transformer on WikiText-2 while using many fewer parameters, reaching 5.084 and 5.083 bits-per-byte versus 5.124. The main negative result is equally important: on associative recall, the original UniMatrix family remains near chance while the Transformer reaches 25.4 percent, showing that compressed recurrent state alone is not enough for exact lookup. A retrieval-oriented follow-up, UniMatrix-Assoc, helps only marginally. By contrast, UniMatrix-SparsePointer, which adds sparse slot routing and direct pointer-logit fusion, reaches 75.6 percent on the original pilot recipe and 99.2 percent on a no-dropout follow-up while using 53.8 percent fewer parameters than the Transformer baseline. Ablations show that the gain comes from sufficient slot capacity and exact pointer-level output routing. Overall, structured recurrent state is promising and parameter-efficient, but strong long-range behavior still requires explicit sparse retrieval and better kernels.

【5】Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models
标题：深度Transformer模型中的随机尺度极限与噪声同步
链接：https://arxiv.org/abs/2604.26898

作者：Andrea Agazzi,Giuseppe Bruno,Eloy Mosig García,Samuele Saviozzi,Marco Romito
备注：55 pages, 6 figures
摘要：我们证明了带有多层感知器（MLP）块的有限深度、有限宽度Transformer模型中token的逐层演化，按路径收敛到一个连续时间随机交互粒子系统。我们还给出了描述该极限下token分布演化的随机偏微分方程，并证明了当token数量很大时的混沌传播。我们建立的界是定量的，且所考虑的极限可以交换次序。我们进一步证明，只要共同噪声相对于确定性的自注意力漂移具有足够的强制性，极限随机模型就会表现出噪声同步，并且相互作用能量在平均意义下指数耗散。最后，我们刻画了满足上述条件的激活函数。
摘要：We prove pathwise convergence of the layerwise evolution of tokens in a finite-depth, finite-width transformer model with MultiLayer Perceptron (MLP) blocks to a continuous-time stochastic interacting particle system. We also identify the stochastic partial differential equation describing the evolution of the tokens' distribution in this limit and prove propagation of chaos when the number of such tokens is large. The bounds we establish are quantitative and the limits we consider commute. We further prove that the limiting stochastic model displays synchronization by noise and establish exponential dissipation of the interaction energy on average, provided that the common noise is sufficiently coercive relative to the deterministic self-attention drift. We finally characterize the activation functions satisfying the former condition.

GAN|对抗|攻击|生成相关(1篇)

【1】Adversarial Robustness of NTK Neural Networks
标题：NTK神经网络的对抗鲁棒性
链接：https://arxiv.org/abs/2604.25965

作者：Yuxuan Hou
摘要：深度学习模型被广泛部署在安全关键领域，但仍然容易受到对抗性攻击。本文在非参数回归的背景下研究NTK神经网络的对抗鲁棒性。我们建立了Sobolev空间中对抗回归的极小极大最优速率，然后证明了通过带早停的梯度流训练的NTK神经网络可以达到这一最优速率。然而，在过拟合区域，我们证明了最小范数插值器容易受到对抗扰动的影响。
摘要：Deep learning models are widely deployed in safety-critical domains, but remain vulnerable to adversarial attacks. In this paper, we study the adversarial robustness of NTK neural networks in the context of nonparametric regression. We establish minimax optimal rates for adversarial regression in Sobolev spaces and then show that NTK neural networks, trained via gradient flow with early stopping, can achieve this optimal rate. However, in the overfitting regime, we prove that the minimum norm interpolant is vulnerable to adversarial perturbations.

半/弱/无/有监督|不确定性|主动学习(6篇)

【1】Uncertainty-Aware Predictive Safety Filters for Probabilistic Neural Network Dynamics
标题：面向概率神经网络动力学的不确定性感知预测安全过滤器
链接：https://arxiv.org/abs/2604.26836

作者：Bernd Frauenknecht,Lukas Kesper,Daniel Mayfrank,Henrik Hose,Sebastian Trimpe
摘要：预测安全过滤器（PSF）利用模型预测控制在深度强化学习（RL）探索期间强制约束满足，但它们对第一原理模型或高斯过程的依赖限制了可扩展性和更广泛的适用性。与此同时，基于模型的强化学习（MBRL）方法通常采用概率集成（PE）神经网络，以最少的先验知识从数据中捕获复杂的高维动态。然而，现有的尝试将PE整合到PSF缺乏严格的不确定性量化。我们引入了不确定性感知预测安全过滤器（UPSi），这是一种PSF，它通过将未来结果制定为可达集，使用PE动态模型提供严格的安全预测。UPSi引入了一个明确的确定性约束，防止模型利用和无缝集成到常见的MBRL框架。我们在标准安全RL基准上评估了Dyna风格MBRL中的UPSi，并报告了与以前的神经网络PSF相比，探索安全性的实质性改进，同时保持与标准MBRL相当的性能。UPSi弥补了现代MBRL的可扩展性和通用性与预测安全过滤器的安全保证之间的差距。
摘要：Predictive safety filters (PSFs) leverage model predictive control to enforce constraint satisfaction during deep reinforcement learning (RL) exploration, yet their reliance on first-principles models or Gaussian processes limits scalability and broader applicability. Meanwhile, model-based RL (MBRL) methods routinely employ probabilistic ensemble (PE) neural networks to capture complex, high-dimensional dynamics from data with minimal prior knowledge. However, existing attempts to integrate PEs into PSFs lack rigorous uncertainty quantification. We introduce the Uncertainty-Aware Predictive Safety Filter (UPSi), a PSF that provides rigorous safety predictions using PE dynamics models by formulating future outcomes as reachable sets. UPSi introduces an explicit certainty constraint that prevents model exploitation and integrates seamlessly into common MBRL frameworks. We evaluate UPSi within Dyna-style MBRL on standard safe RL benchmarks and report substantial improvements in exploration safety over prior neural network PSFs while maintaining performance on par with standard MBRL. UPSi bridges the gap between the scalability and generality of modern MBRL and the safety guarantees of predictive safety filters.

【2】Learning to Route Electric Trucks Under Operational Uncertainty
标题：学习在运营不确定性下为电动卡车规划路径
链接：https://arxiv.org/abs/2604.26566

作者：Stavros Orfanoudakis,Ziyan Li,Ruixiao Yang,Nikolay Aristov,Pedro P. Vergara,Chuchu Fan,Elenna Dugundji
备注：Reinforcement Learning, Electric Truck Routing, Freight Transportation, Graph Neural Networks, Stochastic Optimization, Vehicle Routing
摘要：电动卡车运营要求路径决策在电池续航有限、充电时间长、行驶与能耗约束以及共享充电基础设施竞争的条件下仍保持可行。这些特征使电动卡车路径规划成为物流与能源耦合的问题，限制了基于启发式的方法的实用性，并使其在大规模下计算上不可行。本文针对充电约束和运营不确定性下的随机电动卡车路径问题，提出了一种基于学习的框架。该问题通过强化学习求解，被建模为一个事件驱动的半马尔可夫决策过程，包含共享充电资源、随机的行驶和能量需求，以及真实的非线性快充行为。为了支持该设定下的学习，本文引入了基于图的系统状态和可行决策表示，以及基于规则的动作掩码，将策略限制在运营上允许的动作内，从而提高了训练效率。在此基础上，开发了一个事件驱动的仿真环境，既支持强化学习，也支持与启发式和数学规划基线的基准比较。在一系列车队规模上的计算实验表明，所提出的基于学习的算法始终优于基线，并在许多设置中达到接近优化基准的性能，同时在充电拥堵和不确定性下保持较高的成功率。
摘要：Electric truck operations require routing decisions that remain feasible under limited battery range, long charging times, travel and energy consumption, and competition for shared charging infrastructure. These features make electric truck routing a coupled logistics and energy problem, limiting the practicality of heuristics-based methods and rendering them computationally infeasible at scale. This paper proposes a learning-based framework for the stochastic electric truck routing problem under charging constraints and operational uncertainty. The problem, solved by Reinforcement Learning, is formulated as an event-driven semi-Markov decision process with shared charging resources, stochastic travel and energy requirements, and realistic nonlinear fast-charging behavior. To support learning in this setting, a graph-based representation of system state and feasible decisions is introduced, together with a rule-based action mask that restricts policies to operationally admissible actions, thereby improving training efficiency. Building on this formulation, an event-driven simulation environment is developed that supports both Reinforcement Learning and benchmarking against heuristic and mathematical programming baselines. Computational experiments across a range of fleet sizes show that the proposed learning-based algorithm consistently outperforms baselines and attains performance close to optimization benchmarks in many settings, while preserving high success rates under charging congestion and uncertainty.
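A rule-based action mask usually works by zeroing the probability of inadmissible actions before sampling. Concretely, their logits are treated as minus infinity inside the softmax. In the sketch below, the feasibility rule (an action is allowed only if its estimated energy need fits the remaining battery) and all names are hypothetical. The paper's actual rules are not given in the abstract.

```python
import math

def feasible_mask(battery_kwh, energy_needed_kwh):
    """Hypothetical rule: an action is admissible only if its energy need fits the battery."""
    return [need <= battery_kwh for need in energy_needed_kwh]

def masked_softmax(logits, mask):
    """Softmax over admissible actions only; masked actions get probability exactly 0."""
    valid = [l for l, ok in zip(logits, mask) if ok]
    if not valid:
        raise ValueError("no admissible action")
    m = max(valid)  # stabilize using only the admissible logits
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]
```

Masking keeps the policy from spending samples on actions that the environment would reject anyway. This is one plausible reason it improves training efficiency.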

【3】Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning
标题：Lyapunov引导的自对齐：离线安全强化学习的测试时自适应
链接：https://arxiv.org/abs/2604.26516

作者：Seungyub Han,Hyungjin Kim,Jungwoo Lee
备注：Accepted at AISTATS 2026. First two authors contributed equally. Project page: https://seungyubhan.github.io/sas/. Code: https://github.com/seungyubhan/sas
摘要：离线强化学习（RL）智能体在部署时经常失败，因为训练数据集与真实环境之间的差距会导致不安全行为。为了解决这个问题，我们提出了SAS（Self-Alignment for Safety），这是一个基于Transformer的框架，可以在离线安全RL中实现无需重新训练的测试时自适应。SAS的主要机制是自对齐：在测试时，预训练的智能体生成若干想象轨迹，并选择其中满足Lyapunov条件的轨迹。这些可行片段随后被回收作为上下文提示，使智能体无需更新参数即可将其行为重新对齐到安全方向。实际上，SAS将Lyapunov引导的想象转化为控制不变的提示，并且其Transformer架构允许一种分层RL解释，其中提示的作用相当于对潜在技能的贝叶斯推理。在Safety Gymnasium和MuJoCo基准上，SAS始终降低了成本和失败率，同时保持或提高了回报。
摘要：Offline reinforcement learning (RL) agents often fail when deployed, as the gap between training datasets and real environments leads to unsafe behavior. To address this, we present SAS (Self-Alignment for Safety), a transformer-based framework that enables test-time adaptation in offline safe RL without retraining. In SAS, the main mechanism is self-alignment: at test time, the pretrained agent generates several imagined trajectories and selects those satisfying the Lyapunov condition. These feasible segments are then recycled as in-context prompts, allowing the agent to realign its behavior toward safety while avoiding parameter updates. In effect, SAS turns Lyapunov-guided imagination into control-invariant prompts, and its transformer architecture admits a hierarchical RL interpretation where prompting functions as Bayesian inference over latent skills. Across Safety Gymnasium and MuJoCo benchmarks, SAS consistently reduces cost and failure while maintaining or improving return.
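The selection step can be read as a filter over imagined rollouts. A trajectory is kept only if a Lyapunov-style function decreases along it, i.e. V(s_{t+1}) - V(s_t) <= -alpha * V(s_t) at every step. The sketch below assumes a user-supplied V and a decay rate alpha. Neither is specified in the abstract, so both are hypothetical, as is the toy V(s) = s^2 in the usage example.

```python
def satisfies_lyapunov(traj, V, alpha=0.0):
    """Discrete-time decrease condition V(s_{t+1}) - V(s_t) <= -alpha * V(s_t) at every step."""
    return all(V(b) - V(a) <= -alpha * V(a) for a, b in zip(traj, traj[1:]))

def select_prompts(imagined, V, alpha=0.0):
    """Keep only imagined trajectories that satisfy the condition (candidate in-context prompts)."""
    return [traj for traj in imagined if satisfies_lyapunov(traj, V, alpha)]
```

With alpha = 0 the filter only requires that V never increase. A positive alpha demands a strict, geometric decrease, so plateaus are rejected too.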

【4】Topology-Aware Representation Alignment for Semi-Supervised Vision-Language Learning
标题：面向半监督视觉语言学习的拓扑感知表示对齐
链接：https://arxiv.org/abs/2604.26370

作者：Junwon You,Mihyun Jang,Sangwoo Mo,Jae-Hun Jung
备注：30 pages, 10 figures, 24 tables
摘要：视觉语言模型表现出了很强的性能，但它们通常难以很好地泛化到专业领域。虽然半监督视觉语言学习通过利用一小部分标注的图像-文本对以及大量未标注图像来缓解这一局限，但现有方法本质上仍是成对的，无法建模多模态表示流形的全局结构。现有的基于拓扑的对齐方法依赖持久图匹配，既不能保证几何对齐，也没有利用视觉语言学习中核心的图像-文本配对信息。我们提出了拓扑感知多模态表示对齐（ToMA），该框架使用持久同调来识别拓扑上显著的边，并通过可用的跨模态对应关系将它们跨模态对齐。ToMA同时利用H_0死亡边和轻量级H_1出生边，使其无需构造2-单形即可捕获连通性和环结构。实验表明，ToMA带来了稳定的提升：在遥感任务上改进明显，在时尚检索上提升幅度不大但一致。进一步的分析表明，ToMA比其他基于拓扑的目标更稳定，并且轻量级H_1出生边提供了有用的高阶结构信号。
摘要：Vision-language models have shown strong performance, but they often generalize poorly to specialized domains. While semi-supervised vision-language learning mitigates this limitation by leveraging a small set of labeled image-text pairs together with abundant unlabeled images, existing methods remain fundamentally pairwise and fail to model the global structure of multimodal representation manifolds. Existing topology-based alignment methods rely on persistence diagram matching, which neither guarantees geometric alignment nor utilizes the image-text pairing information central to vision-language learning. We propose Topology-Aware Multimodal Representation Alignment (ToMA), a framework that uses persistent homology to identify topologically salient edges and aligns them across modalities through available cross-modal correspondences. ToMA leverages both H_0-death edges and lightweight H_1-birth edges, allowing it to capture both connectivity and cycle structure without constructing 2-simplices. Experiments show that ToMA yields stable gains, with clear improvements on remote sensing and modest but consistent benefits on fashion retrieval. Additional analysis shows that ToMA is more stable than alternative topology-based objectives and that lightweight H_1-birth edges provide useful higher-order structural signals.
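H_0-death edges are cheap to compute because of a standard fact about the Vietoris-Rips filtration. Each connected component dies exactly when an edge of the minimum spanning tree merges it into another, so the death edges are the MST edges found by Kruskal's algorithm. The sketch below shows only this step. The H_1-birth edges and ToMA's cross-modal alignment loss are not covered, and the function name is hypothetical.

```python
import math
from itertools import combinations

def h0_death_edges(points):
    """(i, j, length) for each edge at which an H_0 class dies in the Rips filtration.

    Equivalent to Kruskal's minimum spanning tree; ties are broken arbitrarily.
    """
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge merges two components: one H_0 class dies here
            parent[ri] = rj
            deaths.append((i, j, length))
    return deaths
```

Each returned edge corresponds to a persistence pair (0, length) in the H_0 diagram. The edge itself, not just its length, is what an edge-level alignment objective can match across modalities.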

【5】Uncertainty-Aware Reward Discounting for Mitigating Reward Hacking
标题：具有不确定性意识的奖励折扣以缓解奖励黑客行为
链接：https://arxiv.org/abs/2604.26360

作者：Disha Singha
备注：31 pages, 18 figures, 3 tables
摘要：强化学习（RL）系统通常优化标量奖励函数，这些函数假设对结果的评估是精确可靠的。然而，现实世界的目标，特别是源自人类偏好的目标，往往是不确定的、依赖上下文的，并且内部不一致。这种不匹配可能导致奖励黑客、过度优化和过度自信行为等对齐失败。我们提出了一个双源不确定性感知奖励框架，显式地同时建模价值估计中的认知不确定性和人类偏好中的不确定性。模型不确定性通过集成模型在价值预测上的分歧来刻画，而偏好不确定性则来自奖励标注的变异性。我们通过一个经置信度调整的可靠性过滤器（Reliability Filter）结合这些信号，自适应地调节动作选择，以鼓励在利用与谨慎之间取得平衡。在多种离散网格配置（6x6、8x8、10x10）和高维连续控制环境（Hopper-v4、Walker2d-v4）上的实证结果表明，我们的方法产生了更稳定的训练动态，并减少了奖励模糊下的利用行为；以陷阱访问频率衡量，奖励黑客行为减少了93.7%。我们证明了这些改进具有统计显著性，且在高达30%的监督噪声下具有鲁棒性，尽管与无约束基线相比，观测到的峰值奖励有所折衷。通过将不确定性视为奖励信号的一等组成部分，这项工作为构建更可靠、更对齐的强化学习系统提供了一种有原则的方法。
摘要：Reinforcement learning (RL) systems typically optimize scalar reward functions that assume precise and reliable evaluation of outcomes. However, real-world objectives--especially those derived from human preferences--are often uncertain, context-dependent, and internally inconsistent. This mismatch can lead to alignment failures such as reward hacking, over-optimization, and overconfident behavior. We introduce a dual-source uncertainty-aware reward framework that explicitly models both epistemic uncertainty in value estimation and uncertainty in human preferences. Model uncertainty is captured via ensemble disagreement over value predictions, while preference uncertainty is derived from variability in reward annotations. We combine these signals through a confidence-adjusted Reliability Filter that adaptively modulates action selection, encouraging a balance between exploitation and caution. Empirical results across multiple discrete grid configurations (6x6, 8x8, 10x10) and high-dimensional continuous control environments (Hopper-v4, Walker2d-v4) demonstrate that our approach yields more stable training dynamics and reduces exploitative behaviors under reward ambiguity, achieving a 93.7% reduction in reward-hacking behavior as measured by trap visitation frequency. We demonstrate statistical significance of these improvements and robustness under up to 30% supervisory noise, albeit with a trade-off in peak observed reward compared to unconstrained baselines. By treating uncertainty as a first-class component of the reward signal, this work offers a principled approach toward more reliable and aligned reinforcement learning systems.
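Below is one simple way to combine the two uncertainty sources named above: penalize each action's mean ensemble value by the ensemble's standard deviation plus the spread of its reward annotations. This is a lower-confidence-bound style rule. The additive combination, the weight beta, and the function names are assumptions, not the paper's Reliability Filter, whose exact form the abstract does not give.

```python
import statistics

def uncertainty_adjusted_scores(ensemble_q, annotations, beta=1.0):
    """Hypothetical score: mean ensemble value minus beta * (model std + preference std).

    ensemble_q[m][a]: value estimate of action a by ensemble member m
    annotations[a]:   reward annotations collected for action a
    """
    scores = []
    for a in range(len(ensemble_q[0])):
        qs = [member[a] for member in ensemble_q]
        model_unc = statistics.pstdev(qs)            # epistemic: ensemble disagreement
        pref_unc = statistics.pstdev(annotations[a])  # spread across human annotations
        scores.append(statistics.fmean(qs) - beta * (model_unc + pref_unc))
    return scores

def select_action(ensemble_q, annotations, beta=1.0):
    scores = uncertainty_adjusted_scores(ensemble_q, annotations, beta)
    return max(range(len(scores)), key=scores.__getitem__)
```

With beta = 0 the rule reduces to greedy selection on the mean value. Raising beta shifts the choice toward actions that the ensemble and the annotators agree on, which mirrors the abstract's noted trade-off of lower peak reward for less exploitation.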

【6】Similarity Choice and Negative Scaling in Supervised Contrastive Learning for Deepfake Audio Detection
标题：Deepfake音频检测的监督对比学习中的相似性选择和负缩放
链接：https://arxiv.org/abs/2604.26057

作者：Jaskirat Sudan,Hashim Ali,Surya Subramani,Hafiz Malik
摘要：监督对比学习（SupCon）被广泛用于塑造表示，但针对音频深度伪造检测的专门研究有限。现有工作通常将对比项与更宽泛的流程相结合，而缺少对SupCon本身的关注。在这项工作中，我们在wav2vec2 XLS-R（300M）上进行了一项对照研究，改变（i）SupCon中的相似度函数（余弦相似度与由超球面夹角导出的角相似度）和（ii）使用热启动的全局跨批次队列进行的负样本扩展。阶段1使用SupCon微调编码器和投影头；阶段2冻结它们并使用BCE训练线性分类器。在ASVspoof 2019 LA上训练，并在ASV19 eval以及ITW和ASVspoof 2021 DF/LA上评估，带延迟队列的余弦SupCon取得了最佳的ITW EER（8.29%）和合并EER（4.44%），而角相似度在不使用队列负样本的情况下也表现强劲（ITW 8.70%），表明其对大规模负样本集的依赖降低。
摘要：Supervised contrastive learning (SupCon) is widely used to shape representations, but has seen limited targeted study for audio deepfake detection. Existing work typically combines contrastive terms with broader pipelines; however, the focus on SupCon itself is missing. In this work, we run a controlled study on wav2vec2 XLS-R (300M) that varies (i) the similarity function in SupCon (cosine vs. angular similarity derived from the hyperspherical angle) and (ii) negative scaling using a warm-started global cross-batch queue. Stage 1 fine-tunes the encoder and projection head with SupCon; Stage 2 freezes them and trains a linear classifier with BCE. Trained on ASVspoof 2019 LA and evaluated on ASV19 eval plus ITW and ASVspoof 2021 DF/LA, cosine SupCon with a delayed queue achieves the best ITW EER (8.29%) and pooled EER (4.44%), while angular similarity performs strongly without queued negatives (ITW 8.70%), indicating reduced reliance on large negative sets.
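The comparison above swaps only the similarity function inside the SupCon loss. For each anchor, the loss averages -log softmax over positives, taken against all other samples. The sketch below implements that standard loss with a pluggable similarity. Angular similarity is taken as 1 - arccos(cos θ)/π, a common definition that may differ from the paper's exact form. The cross-batch negative queue is not modeled, and all names are hypothetical.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(u, v):
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def angular(u, v):
    c = max(-1.0, min(1.0, cosine(u, v)))  # clamp to guard acos against rounding
    return 1.0 - math.acos(c) / math.pi

def supcon_loss(embs, labels, sim=cosine, tau=0.1):
    """Supervised contrastive loss averaged over anchors that have at least one positive."""
    n = len(embs)
    total, anchors = 0.0, 0
    for i in range(n):
        pos = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not pos:
            continue
        logits = {a: sim(embs[i], embs[a]) / tau for a in range(n) if a != i}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits.values()))
        total += -sum(logits[p] - log_denom for p in pos) / len(pos)
        anchors += 1
    return total / anchors
```

Angular similarity spaces angles linearly, while cosine saturates near 1. This difference in how fine-grained distances are resolved is one plausible reason the two choices depend differently on large negative sets.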

Transfer|Zero/Few/One-Shot|Adaptive (5 papers)

【1】PAINT: Partial-Solution Adaptive Interpolated Training for Self-Distilled Reasoners
Link: https://arxiv.org/abs/2604.26573

Authors: Zhiquan Tan, Yinrong Hong
Abstract: Improving large language model (LLM) reasoning requires supervision that is both aligned with the model's own test-time states and informative at the token level. Reinforcement learning with verifiable rewards provides on-policy exploration but offers sparse, high-variance credit; supervised fine-tuning and distillation provide dense targets but often train on fixed trajectories or rely on stronger teachers. Recent privileged on-policy self-distillation explores a middle ground by scoring student rollouts with the same model conditioned on a verified solution. We revisit this setting through a contextual re-scoring lens: for reasoning, the important choices are not only whether privileged context is available, but also how much of it should be revealed and where its distribution should shape the student. We propose PAINT (Partial-solution Adaptive INterpolated Training), which masks the verified solution according to rollout-reference overlap and applies a small energy-space interpolation on a sparse set of entropy-mismatch token positions. Across competition-level math benchmarks, PAINT consistently improves over a strong prior on-policy self-distillation baseline at all three Qwen3 scales. On Qwen3-8B, it raises macro Avg@12 by 2.1 points over this baseline and by 2.9 points over GRPO.

【2】Advancing multi-site emission control: A physics-informed transfer learning framework with mixture of experts for carbon-pollutant synergy
Link: https://arxiv.org/abs/2604.26571

Authors: Yuxuan Ying, Hanqing Yang, Kaige Wang, Yu Hu, Zhiming Zheng, Yunliang Jiang, Xiaoqing Lin, Xiaodong Li, Jun Chen
Comments: Supplementary materials will be released once the final version is complete
Abstract: Municipal solid waste incineration is increasingly central to urban waste management, yet its sustainability benefit depends on controlling carbon emissions and multiple air pollutants under highly heterogeneous operating conditions. Current data-driven models are often accurate within individual plants but difficult to transfer across facilities, limiting their value for scalable emission-control strategies. Here we show that multi-site emission behaviour can be represented through transferable system-level structures when physical constraints, operating-regime heterogeneity, and carbon-pollutant coupling are considered jointly. We develop a physics-informed transfer learning framework built on a carbon-pollutant mixture-of-experts model, which combines regime-dependent expert routing with conservation-based regularization and a carbon-pollutant synergistic index (CPSI) for integrated risk evaluation. Across 13 municipal solid waste incineration plants, the model captured both pollutant-specific emissions and system-level risk, achieving source-domain average pollutant $R^2$ values of 0.668–0.904 and CPSI $R^2$ values of 0.666–0.970. After transfer from a reference facility to 12 target plants, average pollutant $R^2$ remained between 0.661 and 0.842, while CPSI retained comparable transferability ($R^2$ = 0.610–0.841). Expert-utilization patterns further indicate that adaptation occurs through structured re-weighting of operating regimes rather than complete model re-learning. By extending the learned representation into an interpretable digital twin, this framework provides a route from emission prediction to regime-aware operational navigation, supporting scalable carbon-pollutant synergistic control across heterogeneous waste-to-energy systems.

【3】Hierarchical adaptive control for real-time dynamic inference at the edge
Link: https://arxiv.org/abs/2604.26470

Authors: Francesco Daghero, Mahyar Tourchi Moghaddam, Mikkel Baun Kjærgaard
Comments: Accepted at the 5th Real-time And intelliGent Edge computing (RAGE 2026) workshop
Abstract: Industrial systems increasingly depend on machine learning (ML) and run on heterogeneous nodes that must satisfy tight latency, energy, and memory constraints. Dynamic ML models, which reconfigure their computational footprint at runtime, promise high energy efficiency and lower average latency for modest accuracy trade-offs. However, the additional hyperparameters they rely on complicate their deployment. These hyperparameters, which control the trade-off between accuracy and average latency, are often tuned on a calibration dataset that must match the test-time distribution. This assumption rarely holds in real-world scenarios, leading to suboptimal operating conditions, possibly worse than those of static models. We propose a two-tier adaptive architecture that co-optimizes model and system decisions. At the global level, a scheduler configures and deploys, for each edge node, a cascade of classifiers, composed of lightweight specialized models and a generalist fallback, that satisfies latency and memory constraints. At the node level, a local controller tracks data drift and hardware resources, enabling or disabling specialized predictors (SPs) to preserve high energy efficiency and avoid latency-constraint violations under varying conditions. This design allows longer operation without forcing a global redeployment step and enables efficient execution when the remote global controller is unreachable. We evaluate the approach on two datasets under controlled distribution-mismatch scenarios, showing average per-inference reductions of up to 2.45x in latency and up to 2.86x in energy, with an accuracy drop of less than 4% compared to static baselines. Our contributions are: (1) a budgeted SP-cascade formulation that preserves worst-case latency constraints; (2) a hierarchical controller that maintains efficiency under data and resource changes; and (3) an experimental evaluation on embedded hardware.
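
The worst-case latency guarantee of a budgeted cascade can be sketched as follows. A low-confidence specialist is always followed by the generalist, so a specialized predictor may only be enabled if its latency plus the generalist fallback's latency fits the budget. The sketch rests on several assumptions: hypothetical names (`Stage`, `enabled_specialists`, `cascade_predict`), profiled latencies, a fixed confidence threshold and one specialist per input. It does not model the paper's drift tracking or global scheduler.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Stage:
    name: str
    latency_ms: float  # profiled worst-case latency, assumed known in advance
    predict: Callable[[object], Tuple[str, float]]  # returns (label, confidence)


def enabled_specialists(specialists: Sequence[Stage], generalist: Stage,
                        budget_ms: float) -> List[Stage]:
    """Keep a specialist only if it plus the generalist fallback fits the budget,
    so the worst case (specialist defers, generalist runs) cannot violate it."""
    return [sp for sp in specialists if sp.latency_ms + generalist.latency_ms <= budget_ms]


def cascade_predict(x, specialists, generalist, budget_ms, threshold=0.8):
    """Run the first budget-feasible specialist; defer to the generalist when unsure."""
    active = enabled_specialists(specialists, generalist, budget_ms)
    if active:
        label, conf = active[0].predict(x)
        if conf >= threshold:
            return label, active[0].name
    label, _ = generalist.predict(x)
    return label, generalist.name
```

With a 20 ms generalist and a 25 ms budget, a 3 ms specialist is allowed but a 15 ms one is not. Tightening the budget to 21 ms disables all specialists, which mirrors the node-level "enable or disable SPs" decision.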

【4】Adaptive and Fine-grained Module-wise Expert Pruning for Efficient LoRA-MoE Fine-Tuning
Link: https://arxiv.org/abs/2604.26340

Authors: Weihang Li, Jianchun Liu, Hongli Xu
Abstract: LoRA-MoE has emerged as an effective paradigm for parameter-efficient fine-tuning, combining the low training cost of LoRA with the increased adaptation capacity of Mixture-of-Experts (MoE). However, existing LoRA-MoE frameworks typically adopt a fixed, uniform expert configuration across heterogeneous Transformer modules (e.g., attention query/key projections and MLP gating networks), ignoring their distinct functional roles and capacity requirements. This design leads to localized over-provisioning, redundant trainable parameters, and unnecessary optimizer-state overhead. Moreover, prior methods enforce load balancing among experts throughout training. Although beneficial early on, this constraint becomes restrictive once routing patterns stabilize, limiting expert specialization on downstream tasks. In this paper, we propose DMEP, a novel LoRA-MoE fine-tuning framework based on Dynamic Module-wise Expert Pruning. DMEP tracks expert utilization during training and physically removes low-utility experts on a per-module basis, yielding a more compact expert structure tailored to each module. The pruned model then continues training without the load-balancing constraint, freeing the remaining experts to focus entirely on the downstream task and develop specialized expertise. By jointly adapting module-wise expert capacity and eliminating unnecessary balancing, DMEP improves both parameter efficiency and training efficiency. Extensive experiments on multiple reasoning benchmarks show that DMEP reduces trainable parameters by 35–43% and improves training throughput by about 10%, while maintaining or surpassing the downstream reasoning accuracy of uniform LoRA-MoE baselines.
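
This is a sketch of per-module pruning by routing utilization. It assumes utilization is measured as the share of top-1 routing decisions each expert receives. Experts below a fixed threshold are dropped, but at least one expert is always kept per module. Module names, the threshold and the helper functions are hypothetical. DMEP's actual utility criterion and pruning schedule may differ, and the subsequent training without a load-balancing loss is not shown.

```python
from collections import Counter


def expert_utilization(routing_decisions, num_experts):
    """Fraction of tokens routed (top-1) to each expert of one module."""
    counts = Counter(routing_decisions)
    total = len(routing_decisions)
    return [counts.get(e, 0) / total for e in range(num_experts)]


def prune_module(utilization, threshold, min_keep=1):
    """Indices of experts to keep: those at or above threshold, always
    keeping at least `min_keep` of the most-used experts."""
    ranked = sorted(range(len(utilization)), key=lambda e: utilization[e], reverse=True)
    keep = [e for e in ranked if utilization[e] >= threshold]
    if len(keep) < min_keep:
        keep = ranked[:min_keep]
    return sorted(keep)


def prune_all(routing_by_module, num_experts, threshold=0.1):
    """Per-module pruning, so each module can end up with its own expert count."""
    return {
        module: prune_module(expert_utilization(decisions, num_experts), threshold)
        for module, decisions in routing_by_module.items()
    }
```

Consider toy routing logs in which expert 0 receives 90% of the tokens of an `attn.q_proj` module while an `mlp.gate_proj` module spreads traffic evenly. The first module shrinks to one expert and the second keeps all four, giving the non-uniform, module-wise capacity the abstract describes.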

【5】SWAN: World-Aware Adaptive Multimodal Networks for Runtime Variations
Link: https://arxiv.org/abs/2604.26181

Authors: Jason Wu, Shir-Kang Scott Jinn, Yuyang Yuan, Maggie Wigness, Lance M. Kaplan, Hang Qiu, Mani Srivastava
Abstract: Multimodal deep neural networks deployed in realistic environments must contend with runtime variations: changes in modality quality, overall input complexity, and available platform resources. Current networks struggle with such fluctuations. Adaptive networks cannot adhere to a strict compute budget, controller-based networks do not consider input complexity, and statically provisioned networks fail on all of the above. Consequently, they do not extract maximum utility from the computational resources they expend. We present SWAN (Sample and World-Aware Multimodal Network), the first adaptive multimodal network that accomplishes all three goals. SWAN employs a quality-aware controller to allocate resources among modalities according to a variable, user-specified maximum budget. Within this budget, an adaptive gating module further improves efficiency by scaling layer utilization according to sample complexity. For further gains, SWAN also employs a token-dropping module that masks semantically irrelevant multimodal features before performing detection. We evaluate SWAN on complex multi-object 3D detection for autonomous driving, where it reduces FLOPs by up to 49% with minimal degradation.

Reinforcement Learning (1 paper)

【1】Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training
Link: https://arxiv.org/abs/2604.26833

Authors: Mahya Ramezani, Holger Voos
Abstract: This paper presents a hierarchical decision-making framework for unmanned aerial vehicle (UAV) missions motivated by search-and-rescue (SAR) scenarios under limited simulation training. The framework combines a fixed rule-based high-level advisor with an online goal-conditioned low-level reinforcement learning (RL) controller. To stress-test early adaptation, we also consider a strict no-pretraining deployment regime. The high-level advisor is defined offline from a structured task specification and compiled into deterministic rules. It provides interpretable mission- and safety-aware guidance through recommended actions, avoided actions, and regime-dependent arbitration weights. The low-level controller learns online from task-defined dense rewards and reuses experience through a mode-aware prioritized replay mechanism augmented with rule-derived metadata. We evaluate the framework on two tasks: battery-aware multi-goal delivery and moving-target delivery in obstacle-rich environments. Across both tasks, the proposed method improves early safety and sample efficiency, primarily by reducing collision terminations, while preserving the ability to adapt online to scenario-specific dynamics.
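
The arbitration between the rule-based advisor and the learned controller can be sketched as a weighted blend of Q-values and advice, with avoided actions masked out. The rules, the linear blend and the weights (0.3 normally, 0.8 at low battery) are illustrative assumptions, not the paper's compiled rule set or arbitration scheme.

```python
def arbitrate(q_values, recommended, avoided, weight):
    """Blend learned Q-values with a rule-based advisor (hypothetical scheme).

    q_values: dict action -> Q estimate from the low-level learner.
    recommended / avoided: sets of actions produced by the compiled rules.
    weight: regime-dependent arbitration weight in [0, 1].
    """
    # Mask avoided actions; fall back to all actions if the rules forbid everything.
    allowed = [a for a in q_values if a not in avoided] or list(q_values)

    def score(a):
        advice = 1.0 if a in recommended else 0.0
        return (1.0 - weight) * q_values[a] + weight * advice

    return max(allowed, key=score)


def advisor(state):
    """Toy deterministic rules: head home on low battery, never fly into an obstacle."""
    recommended, avoided = set(), set()
    if state["battery"] < 0.2:
        recommended.add("return_home")
    if state["obstacle_ahead"]:
        avoided.add("forward")
    weight = 0.8 if state["battery"] < 0.2 else 0.3  # stronger advice in the critical regime
    return recommended, avoided, weight
```

The learner's preferred action wins in the nominal regime. An obstacle removes "forward" from consideration, and low battery lets the advisor override the learner, which is the early-safety effect an untrained controller benefits from.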

Meta-Learning (1 paper)

【1】Meta-Learning and Targeted Differential Privacy to Improve the Accuracy-Privacy Trade-off in Recommendations
Link: https://arxiv.org/abs/2604.26390

Authors: Peter Müllner, Dominik Kowald, Markus Schedl, Elisabeth Lex
Comments: Accepted at LBR@UMAP'26
Abstract: Balancing differential privacy (DP) with recommendation accuracy is a key challenge in privacy-preserving recommender systems, since DP noise degrades accuracy. We address this trade-off at both the data and model levels. At the data level, we apply DP only to the most stereotypical user data, i.e., data likely to reveal sensitive attributes such as gender or age, to reduce unnecessary perturbation; we refer to this as targeted DP. At the model level, we use meta-learning to improve robustness to the remaining DP noise. This achieves a better accuracy-privacy trade-off than standard approaches. Compared with uniformly applied DP and full-DP baselines, meta-learning improves accuracy and targeted DP lowers empirical privacy risk. Overall, our findings show that selectively applying DP at the data level, together with meta-learning at the model level, can effectively balance recommendation accuracy and user privacy.
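
Targeted DP can be illustrated with randomized response, a standard local-DP mechanism for binary data. Each flagged interaction is reported truthfully with probability e^ε / (1 + e^ε) and flipped otherwise, while unflagged interactions pass through unchanged. Randomized response, the item flags and the function names are assumptions for illustration. The paper's mechanism and its rule for identifying stereotypical data may differ, and the ε guarantee here covers only the perturbed items.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report the true bit with prob e^eps / (1 + e^eps), else flip it (eps-LDP per bit)."""
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < keep_prob else 1 - bit


def targeted_perturb(profile, sensitive_items, epsilon, seed=0):
    """Perturb only interactions flagged as likely to reveal sensitive attributes.

    profile: dict item_id -> 0/1 interaction. Items not in `sensitive_items`
    are returned unchanged, so they add no noise (and get no protection).
    """
    rng = random.Random(seed)
    return {
        item: randomized_response(bit, epsilon, rng) if item in sensitive_items else bit
        for item, bit in profile.items()
    }
```

Leaving most of the profile untouched is what reduces "unnecessary perturbation". The remaining noise on the flagged items is what the meta-learned model must be robust to.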

Symbolic|Symbolic Learning (2 papers)

【1】AGEL-Comp: A Neuro-Symbolic Framework for Compositional Generalization in Interactive Agents
Link: https://arxiv.org/abs/2604.26522

Authors: Mahnoor Shahid, Hannes Rothe
Comments: Accepted at IntelliSys 2026
Abstract: Large Language Model (LLM)-based agents exhibit systemic failures in compositional generalization, limiting their robustness in interactive environments. This work introduces AGEL-Comp, a neuro-symbolic AI agent architecture designed to address this challenge by grounding the agent's actions. AGEL-Comp integrates three core innovations: (1) a dynamic Causal Program Graph (CPG) as a world model, representing procedural and causal knowledge as a directed hypergraph; (2) an Inductive Logic Programming (ILP) engine that synthesizes new Horn clauses from experiential feedback, grounding symbolic knowledge through interaction; and (3) a hybrid reasoning core in which an LLM proposes a set of candidate sub-goals that a Neural Theorem Prover (NTP) verifies for logical consistency. Together, these components operationalize a deduction-abduction learning cycle. The cycle enables the agent to deduce plans and abductively expand its symbolic world model, while a neural adaptation phase keeps its reasoning engine aligned with new knowledge. We propose an evaluation protocol within the Retro Quest simulation environment that probes compositional generalization scenarios, and we use it to evaluate our AGEL agent. Our results indicate that the AGEL model outperforms pure LLM-based models. Our framework presents a principled path toward agents that build an explicit, interpretable, and compositionally structured understanding of their world.
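
The symbolic side relies on Horn clauses. Forward chaining derives every consequence of a set of definite clauses, and appending a newly induced clause immediately extends what the agent can deduce. The sketch below is a minimal propositional illustration only: it has no variables or unification and implements neither ILP nor neural theorem proving. The atoms and rules are hypothetical.

```python
def forward_chain(facts, rules, max_steps=100):
    """Derive all consequences of definite Horn clauses by forward chaining.

    facts: set of ground atoms (strings).
    rules: list of (body_atoms, head_atom), read as body_1 AND ... AND body_n -> head.
    Propositional only: no variables, so no unification is needed.
    """
    known = set(facts)
    for _ in range(max_steps):
        new = {head for body, head in rules if head not in known and set(body) <= known}
        if not new:  # fixpoint reached
            break
        known |= new
    return known
```

Starting from `has_key` and `at_door`, the chain fires `door_open` and then `can_enter`, but not `can_see`, whose body is never satisfied. Appending an induced clause such as `can_enter -> room_explored` makes the new atom derivable on the next run.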

【2】Grounding vs. Compositionality: On the Non-Complementarity of Reasoning in Neuro-Symbolic Systems
Link: https://arxiv.org/abs/2604.26521

Authors: Mahnoor Shahid, Hannes Rothe
Comments: Accepted at AAAI MAKE 2026
Abstract: Compositional generalization remains a fundamental weakness of modern neural networks, limiting their robustness and applicability in domains requiring out-of-distribution reasoning. A central yet unverified assumption in neuro-symbolic AI is that compositional reasoning will emerge as a byproduct of successful symbol grounding. This work presents the first systematic empirical analysis that challenges this assumption by disentangling the contributions of grounding and reasoning. To operationalize this investigation, we introduce the Iterative Logic Tensor Network ($i$LTN), a fully differentiable architecture designed for multi-step deduction. Using a formal taxonomy of generalization that probes novel entities, unseen relations, and complex rule compositions, we demonstrate that a model trained solely on a grounding objective fails to generalize. In contrast, our full $i$LTN, trained jointly on perceptual grounding and multi-step reasoning, achieves high zero-shot accuracy across all tasks. Our findings provide conclusive evidence that symbol grounding, while necessary, is insufficient for generalization. They establish that reasoning is not an emergent property but a distinct capability that requires an explicit learning objective.

Medical (4 papers)

【1】Asynchronous Federated Unlearning with Invariance Calibration for Medical Imaging
Link: https://arxiv.org/abs/2604.26809

Authors: Zhaoyuan Cai, Xinglin Zhang
Comments: 8 pages, 5 figures; accepted at IEEE IJCNN 2026
Abstract: Federated Unlearning (FU) is an emerging paradigm in Federated Learning (FL) that enables participating clients to fully remove their contributions from a trained global model, motivated by data protection regulations that mandate the right to be forgotten. However, most existing FU methods rely on synchronous coordination. This requirement forces the entire federation to halt and wait for stragglers to complete erasure, causing significant delays due to device heterogeneity. Furthermore, with these methods the influence of erased data is often only temporarily suppressed and resurfaces during subsequent training rather than being genuinely removed. To overcome these limitations, this paper proposes Asynchronous Federated Unlearning with Invariance Calibration (AFU-IC), a novel framework for medical imaging that decouples the erasure process from the global training workflow. This enables the target client to perform unlearning asynchronously without interrupting global training. Meanwhile, a server-side invariance calibration mechanism prevents the model from relearning the erased data. Extensive experiments on three medical benchmarks demonstrate that AFU-IC achieves unlearning efficacy and model fidelity comparable to gold-standard retraining while significantly reducing wall-clock latency compared to synchronous baselines. AFU-IC thus enables efficient, compliant, and reliable FL in cross-silo medical environments.

【2】Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction
Link: https://arxiv.org/abs/2604.26498

Authors: Jinjiang Guo
Abstract: The rapid growth of molecular foundation models and general-purpose large language models has encouraged a scale-centric view of artificial intelligence in drug discovery, in which larger pretrained models are expected to supersede compact cheminformatics models and task-specific graph neural networks (GNNs). We test this assumption on 22 molecular property and activity endpoints, including public ADMET and Tox21 benchmarks and two internal anti-infective activity datasets. The evaluation covers 167,056 held-out task-molecule pairs under structure-similarity-separated five-fold cross-validation (37,756 ADMET, 77,946 Tox21, 49,266 anti-TB, and 2,088 antimalaria). Across these, classical machine-learning (ML) models such as RF(ECFP4) and ExtraTrees(RDKit descriptors) win ten primary-metric tasks, GNNs such as GIN and Ligandformer win nine, and pretrained molecular sequence models such as MoLFormer and ChemBERTa2 win three. Rule-based SAR reasoning baselines, represented by GPT5.5-SAR and Opus4.7-SAR, do not win under the prespecified primary metrics, although SAR knowledge derived from the training folds provides measurable but uneven gains for SAR reasoning and interpretation. These results indicate that compact, specialized models remain highly effective for molecular property and activity prediction. The performance differences among classical ML, GNN, and pretrained sequence models are often modest and endpoint-dependent, whereas larger or more general models do not provide a universal predictive advantage. Large models may still add value for zero-shot reasoning, SAR interpretation, and hypothesis generation. However, the results suggest that predictive performance depends on how well molecular representation, inductive bias, data regime, endpoint biology, and validation protocol are aligned.
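
Structure-similarity-separated cross-validation keeps near-duplicate molecules out of different folds. The simplified sketch below assumes fingerprints are given as sets of on-bits. It computes Tanimoto similarity, groups molecules with greedy leader clustering and assigns whole clusters to folds. The threshold and clustering rule are assumptions, and the paper's exact splitting protocol is not given in the abstract. Real pipelines usually compute ECFP4 fingerprints and Butina clustering with RDKit.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_clusters(fps, threshold=0.6):
    """Greedy leader clustering: a molecule joins the first cluster whose leader
    is at least `threshold` similar, otherwise it starts a new cluster."""
    leaders, clusters = [], []
    for idx, fp in enumerate(fps):
        for c, leader in enumerate(leaders):
            if tanimoto(fp, fps[leader]) >= threshold:
                clusters[c].append(idx)
                break
        else:
            leaders.append(idx)
            clusters.append([idx])
    return clusters


def similarity_separated_folds(fps, k=5, threshold=0.6):
    """Assign whole clusters to folds (largest first, into the currently smallest
    fold), so similar molecules never straddle the train/test boundary."""
    folds = [[] for _ in range(k)]
    for cluster in sorted(leader_clusters(fps, threshold), key=len, reverse=True):
        min(folds, key=len).extend(cluster)
    return folds
```

Because every cluster lands in exactly one fold, a held-out molecule's close analogues are never in the training folds. This split is typically harder than a random split and penalizes memorization.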

【3】Multi-Stage Bi-Atrial Segmentation Framework from 3D Late Gadolinium-Enhanced MRI using V-Net Family Models
Link: https://arxiv.org/abs/2604.26251

Authors: Hao Wen, Jingsu Kang
Comments: 6 pages, 2 figures; technical report for participation in the MBAS2024 challenge hosted at the MICCAI2024 conference
Abstract: We report our multi-stage framework for multi-class bi-atrial segmentation from 3D late gadolinium-enhanced (LGE) MRI of the human heart. The pipeline consists of three steps:
- preprocessing with multidimensional contrast limited adaptive histogram equalization (MCLAHE);
- coarse region segmentation of the MCLAHE-enhanced, down-sampled MRI using a V-Net family model;
- fine segmentation within the coarse region using another V-Net model.

The model weights are optimized with an asymmetric loss.

【4】Recurrence-Based Nonlinear Vocal Dynamics as Digital Biomarkers for Depression Detection from Conversational Speech
Link: https://arxiv.org/abs/2604.26242

Authors: Himadri S Samanta
Comments: 12 pages, 5 figures
Abstract: Digital biomarkers for depression have largely relied on static acoustic descriptors, pooled summary statistics, or conventional machine learning representations. Such approaches may miss the nonlinear temporal organization embedded in conversational vocal dynamics. We hypothesized that depression is associated with altered recurrence structure in vocal state trajectories, reflecting changes in how the vocal system revisits acoustic states over time. Using the depression subset of the DAIC-WOZ corpus with 142 labeled participants, we modeled frame-level COVAREP trajectories as nonlinear dynamical systems and derived recurrence-based biomarkers from 74 vocal channels. Classification performance was evaluated using logistic regression with feature selection and stratified cross-validation. Recurrence-based biomarkers achieved a mean cross-validated AUC of 0.689. This exceeds static acoustic baselines, entropy-dynamics features, Hurst exponent features, determinism features, and Lyapunov-like instability proxies. Permutation testing indicated statistical significance ($p=0.004$). Pooled cross-validated predictions yielded an AUC of 0.665 with a 95% bootstrap confidence interval of [0.568, 0.758]. These findings suggest that depression may be characterized by altered recurrence structure in conversational vocal dynamics and support nonlinear state-space analysis as a promising direction for digital psychiatric biomarkers.
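
Recurrence-based features start from a recurrence matrix R[i][j] = 1 when states i and j of a trajectory lie within a radius ε of each other. The sketch below computes two standard recurrence quantification measures. The first is the recurrence rate. The second is determinism, the share of recurrent points that lie on diagonal lines of length at least l_min. The radius, l_min and toy trajectories are assumptions. The paper's COVAREP channels, embedding choices and logistic-regression pipeline are not reproduced.

```python
import math


def recurrence_matrix(traj, eps):
    """R[i][j] = 1 when states i and j are closer than eps (Euclidean distance)."""
    n = len(traj)
    return [[1 if math.dist(traj[i], traj[j]) < eps else 0 for j in range(n)]
            for i in range(n)]


def recurrence_rate(R):
    """Fraction of recurrent pairs, excluding the trivial main diagonal."""
    n = len(R)
    hits = sum(R[i][j] for i in range(n) for j in range(n) if i != j)
    return hits / (n * n - n)


def determinism(R, l_min=2):
    """Share of off-diagonal recurrent points lying on diagonal lines of length >= l_min.
    Only the upper triangle is scanned because R is symmetric."""
    n = len(R)
    on_lines = total = 0
    for offset in range(1, n):
        run = 0
        for i in range(n - offset + 1):
            # The extra final step reads a sentinel 0 so the last run is flushed.
            val = R[i][i + offset] if i < n - offset else 0
            if val:
                run += 1
            else:
                if run >= l_min:
                    on_lines += run
                total += run
                run = 0
    return on_lines / total if total else 0.0
```

A strictly periodic trajectory (for example, a point moving around a circle with period 10) recurs along long diagonal lines, so its determinism is 1.0. A trajectory that never revisits a state has no recurrences at all. Features like these are what the study feeds to its classifier.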

Distillation|Knowledge Extraction (1 paper)

【1】Edge AI for Automotive Vulnerable Road User Safety: Deployable Detection via Knowledge Distillation
Link: https://arxiv.org/abs/2604.26857

Authors: Akshay Karjol, Darrin M. Hanna
Comments: 6 pages, 3 figures
Abstract: Deploying accurate object detection for Vulnerable Road User (VRU) safety on edge hardware requires balancing model capacity against computational constraints. Large models achieve high accuracy but fail under the INT8 quantization required for edge deployment, while small models sacrifice detection performance. This paper presents a knowledge distillation (KD) framework that trains a compact YOLOv8-S student (11.2M parameters) to mimic a YOLOv8-L teacher (43.7M parameters), achieving 3.9x compression while preserving quantization robustness. We evaluate on full-scale BDD100K (70K training images) with post-training quantization (PTQ) to INT8. The teacher suffers catastrophic degradation under INT8 (-23% mAP), while the KD student retains its accuracy (-5.6% mAP). Analysis reveals that KD transfers precision calibration rather than raw detection capacity. At INT8, the KD student reaches a precision of 0.748 versus 0.653 for a directly trained model, a 14.5% gain at equivalent recall, and it produces 44% fewer false alarms than the collapsed teacher. At INT8, the KD student also exceeds the teacher's FP32 precision (0.748 vs. 0.718) with a model 3.9x smaller. These findings establish knowledge distillation as a requirement for deploying accurate, safety-critical VRU detection on edge hardware.
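
For reference, the classic logit-based distillation loss uses Hinton-style soft targets with a temperature T, as sketched below. Detection distillation for YOLOv8 usually also matches box-regression outputs or intermediate features, so treat this as an illustration of the principle rather than the paper's loss. The values of T and alpha are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label).

    The T^2 factor keeps soft-target gradients on the same scale as the hard loss.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

The soft-target term vanishes when the student reproduces the teacher's logits and grows as their distributions diverge. Note also that the reported 14.5% precision gain is relative: (0.748 - 0.653) / 0.653 ≈ 0.145.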

Recommendation (1 paper)

【1】The Bandit's Blind Spot: The Critical Role of User State Representation in Recommender Systems
Link: https://arxiv.org/abs/2604.26651

Authors: Pedro R. Pires, Gregorio F. Azevedo, Rafael T. Sereicikas, Pietro L. Campos, Tiago A. Almeida
Comments: Published in SAC'26, 8 pages, 2 figures
Abstract: With the increasing availability of online information, recommender systems have become an important tool for many web-based systems. Because recommendation environments evolve continuously, these systems increasingly rely on contextual multi-armed bandits (CMAB) to deliver personalized, real-time suggestions. A critical yet underexplored component of these systems is the representation of user state. It typically encapsulates the user's interaction history and is closely tied to the model's decisions and learning. In this paper, we investigate the impact of different embedding-based state representations derived from matrix factorization models on the performance of traditional CMAB algorithms. Our large-scale experiments reveal that changing the state representation can yield larger improvements than changing the bandit algorithm itself. Furthermore, no single embedding or aggregation strategy consistently dominates across datasets, underscoring the need for domain-specific evaluation. These results expose a substantial gap in the literature. They emphasize that advancing bandit-based recommender systems requires a holistic approach that prioritizes embedding quality and state construction alongside algorithmic innovation. The source code for our experiments is publicly available at https://github.com/UFSCar-LaSID/bandits_blind_spot.
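
The sketch below shows how the user-state representation alone can change a bandit's choice. It assumes item embeddings (for example, from matrix factorization) are already available. It gives three common ways to aggregate a user's history into a state vector, plus a greedy linear scorer. The scorer stands in for the exploitation term of a contextual bandit such as LinUCB; the exploration bonus is omitted. Function names and the decay value are illustrative.

```python
def mean_state(history):
    """Average of the embeddings of all items the user interacted with."""
    dim = len(history[0])
    return [sum(vec[k] for vec in history) / len(history) for k in range(dim)]


def last_state(history):
    """Embedding of the most recent interaction only."""
    return list(history[-1])


def decayed_state(history, decay=0.5):
    """Exponentially decayed average that up-weights recent interactions."""
    n = len(history)
    weights = [decay ** (n - 1 - t) for t in range(n)]
    total = sum(weights)
    dim = len(history[0])
    return [sum(w * vec[k] for w, vec in zip(weights, history)) / total for k in range(dim)]


def linear_scores(state, arm_embeddings):
    """Score each candidate arm by its dot product with the state (greedy, no exploration)."""
    return [sum(s * a for s, a in zip(state, arm)) for arm in arm_embeddings]
```

Take the history [[0, 1], [0, 1], [1, 0]] and the arms [[1, 0], [0, 1]]. Mean pooling favors arm 1, while the last-item state favors arm 0. The same scorer therefore makes different recommendations depending only on how the state was built.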

Clustering (1 paper)

【1】Spatially-constrained clustering of geospatial features for heat vulnerability assessment of favelas in Rio de Janeiro
Link: https://arxiv.org/abs/2604.26133

Authors: Baptiste Clemence, Thomas Hallopeau, Vanderlei Pascoal De Matos, Laurent Demagistri, Joris Guerin
Comments: Workshop publication (ICLR ML4RS 2026)
Abstract: Informal settlements face disproportionate exposure to climate-related health hazards. However, existing methodologies lack systematic approaches for linking diverse settlement characteristics to environmental health outcomes. We develop a data-driven framework to assess heat vulnerability in Rio de Janeiro's favelas by combining spatially-constrained clustering with land surface temperature (LST) analysis. Using remote sensing and geospatial features, we identify two distinct favela typologies: recent, well-connected settlements on flat terrain (Cluster 0) and historical, poorly connected communities on vegetated slopes (Cluster 1). Analysis of 16 extreme heat events reveals systematic temperature differences of 2–3 °C between clusters, with flat-terrain favelas experiencing significantly higher heat exposure. Our findings demonstrate that settlement morphology critically influences heat vulnerability, providing a replicable framework for targeted urban planning and public health interventions in informal settlements globally.

Super-resolution|Denoising|Deblurring|Dehazing (1 paper)

【1】Super-resolution Multi-signal Direction-of-Arrival Estimation by Hankel-structured Sensing and Decomposition
Link: https://arxiv.org/abs/2604.26793

Authors: Georgios I. Orfanidis,Dimitris A. Pados,George Sklivanitis,Elizabeth Serena Bentley
Abstract: Motivated by sensing modalities in modern autonomous systems that involve hardware-constrained spatial sampling over large arrays with limited coherence time, we develop a novel framework for rapid super-resolution multi-signal direction-of-arrival (DoA) estimation based on Hankel-structured sensing and data matrix decomposition of arbitrary rank, under both L2-norm and L1-norm formulations. The resulting L2-norm estimator is shown to be maximum-likelihood optimal in white Gaussian noise. The L1-norm estimator is shown to be maximum-likelihood optimal in independent, identically distributed (i.i.d.) isotropic Laplace noise, offering broad robustness to impulsive interference and corrupted measurements commonly encountered in practice. Extensive simulations demonstrate that the proposed methods exhibit powerful super-resolution capabilities, requiring significantly lower SNR and achieving substantially higher resolution probability than recent competing approaches.
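
To make the "Hankel-structured" part concrete, the sketch below arranges one noise-free snapshot from a uniform linear array into a Hankel matrix, in which every anti-diagonal repeats the same sample. It is a minimal illustration under assumptions: the function names, the half-wavelength spacing and the choice of 4 rows are hypothetical, and the paper's L2/L1 decomposition step is not shown.

```python
import cmath
import math


def hankel_from_snapshot(x, rows):
    """Arrange samples x[0..N-1] into a rows x (N - rows + 1) Hankel matrix.

    Entry (i, j) equals x[i + j], so every anti-diagonal is constant.
    """
    n = len(x)
    if not 1 <= rows <= n:
        raise ValueError("rows must be between 1 and len(x)")
    cols = n - rows + 1
    return [[x[i + j] for j in range(cols)] for i in range(rows)]


def ula_snapshot(angles_deg, num_sensors, spacing=0.5):
    """Noise-free snapshot of unit-amplitude plane waves on a uniform linear array.

    spacing is the sensor spacing in wavelengths (half-wavelength by default).
    """
    snapshot = []
    for m in range(num_sensors):
        value = 0j
        for theta in angles_deg:
            phase = 2 * math.pi * spacing * m * math.sin(math.radians(theta))
            value += cmath.exp(1j * phase)
        snapshot.append(value)
    return snapshot


# One source at 10 degrees, 8 sensors -> a 4 x 5 Hankel matrix.
H = hankel_from_snapshot(ula_snapshot([10.0], 8), rows=4)
```

With a single source the samples are powers of one phase factor z, so row i equals z**i times row 0 and the matrix has rank 1. With K sources the noise-free matrix has rank at most K, which is the low-rank structure that a decomposition of this kind can exploit.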

Federated Learning|Privacy Preservation|Encryption (3 papers)

【1】Who Trains Matters: Federated Learning under Enrollment and Participation Selection Biases
Link: https://arxiv.org/abs/2604.26604

Authors: Gota Morishita
Comments: 10 pages, 2 figures
Abstract: Federated learning (FL) trains a shared model from updates contributed by distributed clients, often implicitly assuming that contributing clients are representative of the target population. In practice, this representativeness assumption can fail at two distinct stages, inducing selection bias. First, eligibility rules such as device constraints, software requirements, or user consent determine which clients are ever enrolled and reachable for training, inducing enrollment bias. Second, among enrolled clients, user and system factors such as battery state, network status, and local time determine which clients participate in each communication round, inducing participation bias. Although existing work has largely addressed round-level participation bias, it has paid far less attention to population-level enrollment bias, which can induce a persistent mismatch between the training objective and the target-population objective. We formalize FL under a two-stage selection model and derive FedIPW, an inverse-probability-weighted aggregation scheme that recovers the target-population mean update under standard ignorability and positivity assumptions. Because client-level covariates are often unavailable for non-enrolled clients, we also introduce a limited-information aggregate-calibration extension that uses known target-population summaries to reweight the enrolled sample, partially correcting enrollment bias. We further provide an algorithm-agnostic optimization analysis under residual weighting error and show that incomplete selection correction can induce a non-vanishing bias floor. Finally, experiments on synthetic federated logistic regression validate the predicted objective mismatch and show that enrollment correction reduces target-population error under two-stage selection.
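
The reweighting idea behind inverse-probability-weighted aggregation can be sketched as follows. This sketch makes three assumptions. First, each client's overall selection probability factors into an enrollment probability times a per-round participation probability. Second, both probabilities are given. Third, the weighted mean is self-normalised (Hajek-style). The function name is hypothetical, and FedIPW's exact estimator and probability model are defined in the paper.

```python
def ipw_aggregate(updates, enroll_probs, participate_probs):
    """Self-normalised inverse-probability-weighted mean of client updates.

    updates[i] is client i's update vector (list of floats). enroll_probs[i] and
    participate_probs[i] are its assumed probabilities of being enrolled and of
    participating in this round given enrollment.
    """
    if not updates:
        raise ValueError("need at least one client update")
    weights = []
    for p_enroll, p_part in zip(enroll_probs, participate_probs):
        p_select = p_enroll * p_part  # two-stage selection probability
        if p_select <= 0:
            raise ValueError("positivity violated: selection probability must be > 0")
        weights.append(1.0 / p_select)
    total = sum(weights)
    dim = len(updates[0])
    # Each coordinate is a weighted average; rarely selected clients count more.
    return [sum(w * u[j] for w, u in zip(weights, updates)) / total for j in range(dim)]


# Client 0 is enrolled only half the time, so it receives twice the weight of client 1.
mean_update = ipw_aggregate([[1.0, 0.0], [3.0, 2.0]], [0.5, 1.0], [1.0, 1.0])
```

When every client has the same selection probability, the weights cancel and the result reduces to the plain average of the updates.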

【2】Sample Selection Using Multi-Task Autoencoders in Federated Learning with Non-IID Data
Link: https://arxiv.org/abs/2604.26116

Authors: Emre Ardıç,Yakup Genç
Comments: Published in Engineering Science and Technology, an International Journal, 61 (2025), 101920. DOI: https://doi.org/10.1016/j.jestch.2024.101920; code: https://github.com/eardic/FL_DPQS
Abstract: Federated learning is a machine learning paradigm in which multiple devices collaboratively train a model under the supervision of a central server while ensuring data privacy. However, its performance is often hindered by redundant, malicious, or abnormal samples, leading to model degradation and inefficiency. To overcome these issues, we propose novel sample selection methods for image classification, employing a multitask autoencoder to estimate sample contributions through loss and feature analysis. Our approach incorporates unsupervised outlier detection methods, namely one-class support vector machine (OCSVM), isolation forest (IF), and adaptive loss threshold (AT), managed by a central server to filter noisy samples on clients. We also propose a multi-class deep support vector data description (SVDD) loss controlled by a central server to enhance feature-based sample selection. We validate our methods on the CIFAR10 and MNIST datasets across varying numbers of clients, non-IID distributions, and noise levels up to 40%. The results show significant accuracy improvements with loss-based sample selection, achieving gains of up to 7.02% on CIFAR10 with OCSVM and 1.83% on MNIST with AT. Additionally, our federated SVDD loss further improves feature-based sample selection, yielding accuracy gains of up to 0.99% on CIFAR10 with OCSVM. These results show the effectiveness of our methods in improving model accuracy across various client counts and noise conditions.

【3】Privacy-Preserving Federated Learning Framework for Distributed Chemical Process Optimization
Link: https://arxiv.org/abs/2604.26073

Authors: Teetat Pipattaratonchai,Aueaphum Aueawatthanaphisut
Comments: 10 pages, 5 figures, 2 tables, 17 equations
Abstract: Industrial chemical plants often operate under strict data confidentiality constraints, making centralized data-driven process modeling difficult. Federated learning (FL) provides a promising solution by enabling collaborative model training across distributed facilities without sharing raw operational data. This paper proposes a privacy-preserving federated learning framework for distributed chemical process optimization using data collected from multiple geographically separated plants. Each plant locally trains a neural-network-based process model using its own time-series sensor data, while only model parameters are transmitted to a central aggregation server through secure aggregation mechanisms. This design allows cross-plant knowledge sharing while maintaining strict data locality and industrial confidentiality. Experimental evaluation was conducted using process datasets from three independent chemical plants operating under heterogeneous conditions. The results demonstrate rapid convergence of the federated model: the global mean squared error decreases from approximately 2369 to below 50 within the first five communication rounds and stabilizes around 35 after 40 rounds. Compared with local-only training, the proposed federated framework significantly improves prediction accuracy across all plants, while achieving performance comparable to centralized training. The findings indicate that federated learning provides an effective and scalable solution for collaborative industrial analytics, enabling privacy-preserving predictive modeling and process optimization across distributed chemical production facilities.

Reasoning|Analysis|Understanding|Interpretation (6 papers)

【1】CurEvo: Curriculum-Guided Self-Evolution for Video Understanding
Link: https://arxiv.org/abs/2604.26707

Authors: Guiyi Zeng,Junqing Yu,Yi-Ping Phoebe Chen,Xu Chen,Wei Yang,Zikai Song
Comments: 10 pages, 5 figures
Abstract: Recent advances in self-evolving video understanding frameworks have demonstrated the potential of autonomous learning without human annotations. However, existing methods often suffer from weakly controlled optimization and uncontrolled difficulty progression, because they lack structured guidance throughout the iterative learning process. To address these limitations, we propose CurEvo, a curriculum-guided self-evolution framework that introduces curriculum learning into self-evolution to achieve more structured and progressive model improvement. CurEvo dynamically regulates task difficulty, refines evaluation criteria, and balances data diversity according to model competence, forming a curriculum-guided feedback loop that aligns learning complexity with model capability. Building on this principle, we develop a multi-dimensional adaptive QA framework that jointly evolves question generation and answer evaluation across perception, recognition, and understanding dimensions, ensuring coherent and measurable curriculum progression. Through this integration, CurEvo transforms weakly controlled self-evolution into a more structured learning process for autonomous video understanding. Across seven backbones, CurEvo consistently improves both benchmark accuracy and evaluator-based semantic score on four VideoQA benchmarks, validating the effectiveness of curriculum-guided self-evolution for video understanding.

【2】Understanding DNNs in Feature Interaction Models: A Dimensional Collapse Perspective
Link: https://arxiv.org/abs/2604.26489

Authors: Jiancheng Wang,Mingjia Yin,Hao Wang,Enhong Chen
Comments: 6 pages
Abstract: DNNs have gained widespread adoption in feature interaction recommendation models. However, there has been a longstanding debate on their role. On the one hand, some works claim that DNNs can implicitly capture high-order feature interactions. On the other hand, recent studies have highlighted the limitations of DNNs in effectively learning dot products, that is, second-order interactions, let alone higher-order ones. In this paper, we present a novel perspective on the effectiveness of DNNs: their impact on the dimensional robustness of the representations. In particular, we conduct extensive experiments involving both parallel DNNs and stacked DNNs. Our evaluation encompasses an overall study of complete DNNs in two feature interaction models, alongside a fine-grained ablation analysis of components within DNNs. Experimental results demonstrate that both parallel and stacked DNNs can effectively mitigate the dimensional collapse of embeddings. Furthermore, a gradient-based theoretical analysis, supported by empirical evidence, uncovers the underlying mechanisms of dimensional collapse.

【3】Efficient, VRAM-Constrained xLM Inference on Clients
Link: https://arxiv.org/abs/2604.26334

Authors: Aditya Ukarande,Deep Shekhar,Marc Blackstein,Ram Rangan
Comments: Accepted at MLSys 2026 (Industry Track). 17 pages, 7 figures, 9 tables. Code and artifacts available at: https://github.com/deepshnv/pipeshard-mlsys26-ae
Abstract: To usher in the next round of client AI innovation, there is an urgent need to enable efficient, lossless inference of high-accuracy large language models (LLMs) and vision language models (VLMs), jointly referred to as xLMs, on client systems. To address this, we present pipelined sharding, a novel, benchmark-profile-guided CPU-GPU hybrid scheduling technique that achieves efficient, VRAM-constrained inference for both dense and mixture-of-experts (MoE) LLMs. Using a combination of sub-layer model sharding, CPU offloading, pipelined copy-compute, and prioritized tensor placement in VRAM, it optimizes both time-to-first-token (TTFT) and tokens-per-second (TPS) metrics, while flexibly adapting to system and inference conditions. For efficient, high-accuracy VLM inference, we combine pipelined sharding with a llama.cpp implementation of three well-understood prior ideas (jointly called VLMOpt): vision tensor CPU offloading, flash attention, and avoidance of VRAM overlap between the vision and language models. These enhancements target client xLM inference in future releases of two important NVIDIA products: the In-Game Inferencing software development kit (IGI SDK) and the Cosmos-Reason1 (CR1) physical AI reasoning VLM. Highlights from our rigorous evaluation spanning multiple models and client systems include the following. For interactive use, TTFT improves by up to 6.7x and TPS by up to 30x for LLMs, and CR1 inference's VRAM demand drops by 10x. In batched mode, throughput improves by up to 8.2x. All comparisons are against the respective aggressive baselines. This paper is accepted at the 9th MLSys Conference (Industry Track), 2026. Code and artifacts are available at: https://github.com/deepshnv/pipeshard-mlsys26-ae
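
One ingredient named above, prioritized tensor placement in VRAM, can be illustrated with a simple greedy rule. Tensors are visited in descending priority and kept on the GPU while they fit the budget; the rest stay in host memory. This is only a sketch under assumptions. The priority scores, the tensor names and the greedy rule are hypothetical. The paper derives its placement from benchmark profiles, works at sub-layer granularity, and overlaps copies with compute, which is not modelled here.

```python
def place_tensors(tensors, vram_budget_bytes):
    """Greedily assign tensors to VRAM by descending priority; the rest stay on the CPU.

    tensors: list of (name, size_bytes, priority) tuples, where higher priority
    means the tensor benefits more from living in VRAM (hypothetical scores).
    Returns (gpu_names, cpu_names).
    """
    gpu, cpu = [], []
    used = 0
    for name, size, _priority in sorted(tensors, key=lambda t: t[2], reverse=True):
        if used + size <= vram_budget_bytes:
            gpu.append(name)
            used += size
        else:
            cpu.append(name)  # does not fit the remaining budget: keep in host memory
    return gpu, cpu


gpu_names, cpu_names = place_tensors(
    [("attn", 4, 3), ("ffn", 6, 2), ("embed", 3, 1)], vram_budget_bytes=8
)
```

Because a tensor that does not fit is skipped rather than stopping the loop, the smaller, lower-priority `embed` tensor still lands in VRAM after the larger `ffn` tensor is offloaded.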

【4】Apriori-based Analysis of Learned Helplessness in Mathematics Tutoring: Behavioral Patterns by Level, Intervention, and Outcome
Link: https://arxiv.org/abs/2604.26237

Authors: John Paul P. Miranda
Comments: 9 pages, 2 figures, 1 table, journal article
Abstract: This study applied the Apriori algorithm to analyze behavioral interaction patterns associated with learned helplessness (LH) in mathematics tutoring system logs. Interaction data were examined across three dimensions: LH level (low vs. high), system-based intervention (with vs. without), and problem-solving outcome (solved vs. unsolved). Analysis of the complete dataset showed that skipping problems without using hints was the most frequent pattern linked to unsolved outcomes, while persistence behaviors such as not skipping were less dominant overall. Comparisons by LH level showed that low-LH students had stronger links between problem solving and not skipping, as well as positive associations between hint use and solved outcomes. High-LH students showed more avoidance patterns, with skipping strongly tied to unsolved outcomes. In the comparison of system-based intervention conditions, students without intervention had the highest lift for persistence-success links, while the with-intervention group had stronger patterns involving skipping behaviors leading to unsolved outcomes. Outcome-specific analysis showed that not skipping was consistently associated with solved problems across all groups, while skipping without hints predicted unsolved outcomes. Practical implications and recommendations are discussed.
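
Apriori mines frequent itemsets level by level, pruning any candidate whose support falls below a minimum threshold. The patterns above are then compared through per-rule measures such as lift. The sketch below computes support, confidence and lift for a single rule. The behaviour labels (`skip`, `no_hint`, and so on) and the four toy log entries are hypothetical and are not data from the study.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the association rule antecedent -> consequent.

    transactions: list of sets of items, e.g. the behaviours logged for one attempt.
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_c = sum(1 for t in transactions if c <= t)
    n_ac = sum(1 for t in transactions if (a | c) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # lift > 1: the consequent is more likely when the antecedent is present
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


logs = [
    {"skip", "no_hint", "unsolved"},
    {"skip", "no_hint", "unsolved"},
    {"no_skip", "hint", "solved"},
    {"no_skip", "solved"},
]
print(rule_metrics(logs, {"skip", "no_hint"}, {"unsolved"}))  # (0.5, 1.0, 2.0)
```

A lift of 2.0 here means that attempts with skipping and no hint are twice as likely to be unsolved as an average attempt in this toy log.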

【5】eDySec: A Deep Learning-based Explainable Dynamic Analysis Framework for Detecting Malicious Packages in PyPI Ecosystem
Link: https://arxiv.org/abs/2604.26219

Authors: Sk Tanzir Mehedi,Raja Jurdak,Chadni Islam,Abu Bakar Siddique Mahi,Gowri Ramachandran
Comments: 12 Pages, 11 Figures, and 5 Tables
Abstract: The security of open-source software repositories is increasingly threatened by next-gen software supply chain attacks. These attacks include multiphase malware execution, remote access activation, and dynamic payload generation. Traditional machine learning (ML) detectors struggle to detect these attacks due to the high-dimensional and sparse nature of dynamic behavioral data, including system calls, network traffic, directory access patterns, and dependency logs. These data characteristics degrade the performance, stability, and explainability of ML models. Such challenges have made deep learning (DL) a promising alternative, given its success across various domains and its potential for modeling complex patterns. This paper presents eDySec, a DL-based, efficient, stable, and explainable framework for dynamic behavioral analysis to detect malicious packages. Using the QUT-DV25 dataset, which captures both install-time and post-installation behaviors of packages, we evaluate DL models and investigate feature sets to identify the most discriminative attributes for efficient malicious package detection. Additionally, model stability analysis and explainable AI techniques are incorporated into the detection pipeline to enable stable and transparent interpretations of model decisions. Experimental results demonstrate that eDySec significantly outperforms state-of-the-art frameworks. Specifically, it halves feature dimensionality while lowering false positives by 82% and false negatives by 79%. It also improves accuracy by 3%, achieves near-perfect stability, and maintains an inference latency of 170 ms per package. Further analysis reveals that feature and model selection play a critical role, as certain combinations degrade performance. Ultimately, this study advances the understanding of the strengths and limitations of dynamic analysis against next-gen attacks.

【6】A Multimodal and Explainable Machine Learning Approach to Diagnosing Multi-Class Ejection Fraction from Electrocardiograms
Link: https://arxiv.org/abs/2604.25942

Authors: Catherine Ning,Yu Ma,Cindy Beini Wang,Sean McMahon,Joseph Radojevic,Steven Zweibel,Dimitris Bertsimas
Abstract: Left ventricular ejection fraction (LVEF) assessment depends on echocardiography, which limits access in primary care and resource-constrained settings. We developed a multimodal machine-learning framework that combines engineered 12-lead ECG time-series features with structured EHR variables to classify LVEF into four clinically used strata: normal (>50%), mildly reduced (40-50%), moderately reduced (30-40%), and severely reduced (<30%). To support model explainability, we identified the most influential ECG and EHR features via SHAP attributions. Using retrospective data from Hartford HealthCare, we trained XGBoost models on 36,784 ECG-echocardiogram pairs from 30,952 outpatients and evaluated temporal generalizability on 19,966 ECGs from a subsequent period. The multimodal model achieved one-vs-rest AUROCs of 0.95 (severe), 0.92 (moderate), 0.82 (mild), and 0.91 (normal), outperforming ECG-only and EHR-only baselines, and maintained performance under temporal validation. This work supports ECG-based, multimodal LVEF stratification as a practical screening and triage aid to prioritize confirmatory imaging where resources are limited.
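
The per-stratum numbers above are one-vs-rest AUROCs. Each stratum is treated as the positive class and all other strata as negatives. The AUROC is then the probability that a randomly chosen positive ECG receives a higher score for that stratum than a randomly chosen negative one, with ties counting one half. The sketch below computes this directly. The labels and scores are made-up toy values, and the pairwise loop is O(n_pos * n_neg), so it suits illustration rather than large evaluations.

```python
def one_vs_rest_auroc(labels, scores, positive_class):
    """AUROC of `scores` for separating `positive_class` from all other classes.

    Equals the Mann-Whitney U statistic divided by n_pos * n_neg:
    the fraction of (positive, negative) pairs ranked correctly, ties counting 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == positive_class]
    neg = [s for y, s in zip(labels, scores) if y != positive_class]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: predicted probability of the "severe" stratum for four ECGs.
strata = ["normal", "severe", "mild", "severe"]
p_severe = [0.1, 0.8, 0.3, 0.6]
auroc_severe = one_vs_rest_auroc(strata, p_severe, "severe")
```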

Detection (4 papers)

【1】SG-UniBuc-NLP at SemEval-2026 Task 6: Multi-Head RoBERTa with Chunking for Long-Context Evasion Detection
Link: https://arxiv.org/abs/2604.26375

Authors: Gabriel Stefan,Sergiu Nisioi
Comments: Accepted to SemEval-2026 (Task 6: CLARITY: Unmasking Political Question Evasions)
Abstract: We describe our system for SemEval-2026 Task 6 (CLARITY: Unmasking Political Question Evasions), which classifies English political interview responses by coarse-grained clarity (3-way) and fine-grained evasion strategy (9-way). Because responses frequently exceed the 512-token limit of standard Transformer encoders, we apply an overlapping sliding-window chunking strategy with element-wise max-pooling aggregation over chunk representations. A shared RoBERTa-large encoder feeds two task-specific heads trained jointly via a multi-task objective, with inference-time ensembling over 7-fold stratified cross-validation. Our system achieves a Macro-F1 of 0.80 on Subtask 1 and 0.51 on Subtask 2, ranking 11th in both subtasks.
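
The chunk-and-pool step can be sketched in plain Python. The first function splits a long token sequence into overlapping windows; the second takes the element-wise maximum over one representation vector per window. The defaults are assumptions and may differ from the system's actual settings: a window of 510 content tokens (leaving room for RoBERTa's two special tokens) and a stride of 256. The encoder call that maps each chunk to a vector is not shown.

```python
def sliding_window_chunks(token_ids, window=510, stride=256):
    """Split token_ids into overlapping windows of at most `window` tokens.

    Consecutive windows start `stride` tokens apart (overlap = window - stride);
    the last window always ends at the final token.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    if len(token_ids) <= window:
        return [list(token_ids)]
    chunks = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break
        start += stride
    return chunks


def max_pool(chunk_vectors):
    """Element-wise maximum over per-chunk representation vectors."""
    return [max(values) for values in zip(*chunk_vectors)]


chunks = sliding_window_chunks(list(range(10)), window=4, stride=2)
pooled = max_pool([[1.0, 5.0], [3.0, 2.0]])
```

Max-pooling lets a strong evasion cue in any single chunk dominate the pooled response representation, whereas mean-pooling would dilute it across chunks.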

【2】VulStyle: A Multi-Modal Pre-Training for Code Stylometry-Augmented Vulnerability Detection
Link: https://arxiv.org/abs/2604.26313

Authors: Chidera Biringa,Ajmal Abbas,Vishnu Selvaraj,Gokhan Kul
Comments: 12 pages, 2 figures. Accepted at the 56th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2026)
Abstract: We present VulStyle, a multi-modal software vulnerability detection model that jointly encodes function-level source code, non-terminal Abstract Syntax Tree (AST) structure, and code stylometry (CStyle) features. Prior work in code representation primarily leverages token-level models or full AST trees, which often miss stylistic cues indicative of risky programming practices or incur high structural overhead. Our approach selects only non-terminal AST nodes, reducing input complexity while preserving semantic hierarchy, and integrates syntactic and lexical CStyle features as auxiliary vulnerability signals. VulStyle is pre-trained using masked language modeling on 4.9M functions across seven programming languages and fine-tuned on five benchmark datasets: Devign, BigVul, DiverseVul, REVEAL, and VulDeePecker. VulStyle achieves state-of-the-art performance on BigVul and VulDeePecker, improving F1 by 4-48% over strong Transformer baselines, and attains competitive or best-average performance across all benchmarks. We contribute an ablation study isolating the effects of CStyle and AST structure, an error case analysis, and a threat model situating the detection task in attacker-realistic scenarios.

【3】Why Domain Matters: A Preliminary Study of Domain Effects in Underwater Object Detection
Link: https://arxiv.org/abs/2604.26174

Authors: Melanie Wille,Dimity Miller,Tobias Fischer,Scarlett Raine
Comments: Poster Presentation at ICRA 2026 Workshop S2S
Abstract: Domain shift, where deviations between training and deployment data distributions degrade model performance, is a key challenge in underwater environments. Existing benchmarks for underwater domain shift simulate variability through synthetic style transfer. This fails to capture intrinsic scene factors such as visibility, illumination, and scene composition, as well as acquisition factors, limiting analysis of real-world effects. We propose a labeling framework that defines underwater domains using measurable image, scene, and acquisition characteristics. Unlike prior benchmarks, it captures physically meaningful factors, enabling semantically consistent image grouping and supporting domain-specific evaluation of detection performance, including failure analysis. We validate this on public datasets, showing systematic variations across domain factors and revealing hidden failure modes.

【4】Deep-testing: the case of dependence detection
Link: https://arxiv.org/abs/2604.26558

Authors: Gery Geenens,Pierre Lafaye de Micheaux,Ivan Muyun Zou
Abstract: Deep learning methods have proved highly effective for classification and image recognition problems. In this paper, we ask whether this success can be transferred to hypothesis testing. If a neural network can distinguish, for example, an image of one handwritten digit from another, can it also distinguish an "image of a sample" (such as a scatter plot) generated under a given statistical model from one generated outside that model? Motivated by this idea, we propose a novel procedure called deep-testing, which approaches the classical inferential problem of hypothesis testing through deep learning. More specifically, the test statistic is a classification map learned by a deep neural network from simulated data satisfying the null and alternative hypotheses, leveraging the network's strong discriminating power to construct a highly powerful test. As a proof of concept, we apply deep-testing to the problem of independence testing, arguably one of the most important problems in statistics. In a large-scale simulation study, deep-testing achieves the highest overall power against nineteen competing methods across a broad range of complex dependence structures, confirming the viability of the proposed approach.

Classification|Recognition (5 papers)

【1】MoRFI: Monotonic Sparse Autoencoder Feature Identification
Link: https://arxiv.org/abs/2604.26866

Authors: Dimitris Dimakopoulos,Shay B. Cohen,Ioannis Konstas
Abstract: Large language models (LLMs) acquire most of their factual knowledge during the pre-training stage, through next-token prediction. Subsequent post-training stages often introduce new facts outside the model's parametric knowledge, giving rise to hallucinations. While it has been demonstrated that supervised fine-tuning (SFT) on new knowledge may exacerbate the problem, the underlying mechanisms are still poorly understood. We conduct a controlled fine-tuning experiment focusing on closed-book QA and find latent directions that causally contribute to hallucinations. Specifically, we fine-tune Llama 3.1 8B, Gemma 2 9B and Mistral 7B v03 on seven distinct single QA datasets, controlling for the percentage of new knowledge and the number of training epochs. By measuring performance on the test set, we validate that incrementally introducing new knowledge increases hallucinations, with the effect becoming more pronounced with prolonged training. We leverage pre-trained sparse autoencoders (SAEs) to analyze residual stream activations across various checkpoints for each model and propose Monotonic Relationship Feature Identification (MoRFI) to capture causally relevant latents. MoRFI filters SAE features that respond monotonically to controlled fine-tuning data mixtures of a target property. Our findings show that exposure to unknown facts disrupts the model's ability to retrieve stored knowledge along a set of directions in the residual stream. Our pipeline reliably discovers these directions across distinct models, recovering knowledge through single-latent interventions.
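
The filtering idea behind MoRFI can be sketched as follows, under a loudly stated assumption. Suppose each SAE feature's mean activation is measured at several fine-tuning mixtures, ordered by increasing share of unknown facts. A feature is kept only if that series is strictly monotonic and changes by more than a threshold. This strict-monotonicity test, the threshold and the toy activations are hypothetical stand-ins; the paper's actual selection criterion may differ, for example by using rank correlation or significance tests.

```python
def monotonic_features(mean_activations, min_total_change=0.0):
    """Select features whose mean activation moves monotonically with the mixture level.

    mean_activations: dict feature_id -> list of mean activations, ordered by
    increasing share of unknown facts in the fine-tuning mix (hypothetical input).
    Returns dict feature_id -> "up" or "down" for strictly monotonic features
    whose total change exceeds min_total_change.
    """
    selected = {}
    for fid, series in mean_activations.items():
        diffs = [b - a for a, b in zip(series, series[1:])]
        if not diffs:
            continue  # a single mixture level says nothing about monotonicity
        if all(d > 0 for d in diffs):
            direction = "up"
        elif all(d < 0 for d in diffs):
            direction = "down"
        else:
            continue
        if abs(series[-1] - series[0]) > min_total_change:
            selected[fid] = direction
    return selected


acts = {
    1: [0.1, 0.2, 0.5],  # rises with more unknown facts
    2: [0.3, 0.1, 0.4],  # non-monotonic
    3: [0.9, 0.5, 0.2],  # falls with more unknown facts
    4: [0.2, 0.2, 0.3],  # flat step, so not strictly monotonic
}
chosen = monotonic_features(acts)
```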

【2】A Multi-Dataset Benchmark of Multiple Instance Learning for 3D Neuroimage Classification
Link: https://arxiv.org/abs/2604.26807

Authors: Ethan Harvey,Dennis Johan Loevlie,Amir Ali Satani,Wansu Chen,David M. Kent,Michael C. Hughes
Abstract: Despite being resource-intensive to train, 3D convolutional neural networks (CNNs) have been the standard approach to classifying CT and MRI scans. Recent work suggests that deep multiple instance learning (MIL) may be a more efficient alternative for 3D brain scans, especially when the pre-trained image encoder used to embed each 2D slice is frozen and only the pooling operation and classifier are trained. In this paper, we provide a systematic comparison of simple MIL, attention-based MIL, 3D CNNs, and 3D ViTs across three CT and four MRI datasets, including two large datasets of at least 10,000 scans each. Our goal is to help resource-constrained practitioners understand which neural networks work well for 3D neuroimages and why. We further compare design choices for attention-based MIL, including different encoders, pooling operations, and architectural orderings. We find that simple mean-pooling MIL, without any learnable attention, matches or outperforms recent MIL or 3D CNN alternatives on 4 of 6 moderate-sized tasks. This baseline remains competitive on the two large datasets while being 25x faster to train. To explain mean pooling's success, we examine per-slice attention quality and a semi-synthetic dataset for which the best possible classifier can be derived via a Bayes estimator. This analysis reveals the limits of existing MIL approaches and suggests routes for future improvements.
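
The mean-pooling MIL baseline is simple enough to sketch end to end. Each 3D scan is a bag of per-slice embeddings; the bag is averaged into one vector and passed to a linear classifier with a sigmoid output. This sketch assumes a binary task and slice embeddings already produced by a frozen 2D encoder, which is not shown. The function names are hypothetical, and only `weights` and `bias` would be learned.

```python
import math


def mean_pool(slice_embeddings):
    """Average a bag of per-slice embeddings (one list of floats per 2D slice)."""
    n = len(slice_embeddings)
    if n == 0:
        raise ValueError("a scan must contain at least one slice")
    return [sum(column) / n for column in zip(*slice_embeddings)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_scan(slice_embeddings, weights, bias):
    """Positive-class probability for one scan from its frozen slice embeddings."""
    pooled = mean_pool(slice_embeddings)
    z = sum(w * x for w, x in zip(weights, pooled)) + bias
    return sigmoid(z)


scan = [[0.0, 2.0], [2.0, 4.0]]  # two slices, 2-dimensional toy embeddings
prob = predict_scan(scan, weights=[0.5, -0.25], bias=0.25)
```

Because the pooled vector does not depend on slice order or on any learned attention, training touches only a small linear head, which is consistent with the large training-speed advantage reported above.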

【3】Correcting Performance Estimation Bias in Imbalanced Classification with Minority Subconcepts
标题：用少数子概念纠正不平衡分类中的性能估计偏差
链接：https://arxiv.org/abs/2604.26024

作者：Taylor Maxson,Roberto Corizzo,Yaning Wu,Nathalie Japkowicz,Colin Bellinger
摘要：类别层面的评估可能掩盖同一类别内各子概念之间显著的性能差异，导致平均表现良好的模型在特定子群体上失效。先前的工作表明，不平衡分类的常用评估指标偏向于较大的少数类子概念，而利用真实子概念标签进行基于效用的重新加权可以缓解这种偏差；然而，这类标签在测试时很少可得。我们提出了一种实用的效用加权评估方法，用多类子概念模型预测的后验概率替代不可得的子概念标签。评估权重定义为该后验分布下的期望效用，由此得到一种软性的、感知不确定性的指标，我们称之为预测加权平衡准确率（pBA）。在表格基准以及医学影像和文本数据集上的实验表明，在类内异质性下，未加权的分数可能具有误导性；而当子概念分布不均匀但并非病态时，pBA能提供更稳定、更可解释的评估。我们的代码可在 https://anonymous.4open.science/r/correcting-bias-imbalance-9C6C/ 获取。
摘要：Class-level evaluation can conceal substantial performance disparities across subconcepts within the same class, causing models that perform well on average to fail on specific subpopulations. Prior work has shown that common evaluation measures for imbalanced classification are biased toward larger minority subconcepts and that utility-based reweighting using true subconcept labels can mitigate this bias; however, such labels are rarely available at test time. We introduce a practical utility-weighted evaluation that replaces unavailable subconcept labels with predicted posterior probabilities from a multiclass subconcept model. Evaluation weights are defined as the expected utility under this posterior, yielding a soft, uncertainty-aware metric we call predicted-weighted balanced accuracy (pBA). Experiments on tabular benchmarks as well as medical-imaging and text datasets show that unweighted scores can be misleading under within-class heterogeneity, while pBA provides more stable and interpretable assessments when subconcept distributions are uneven but not pathological. Our code is available at: https://anonymous.4open.science/r/correcting-bias-imbalance-9C6C/.
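The abstract defines evaluation weights as the expected utility under a predicted subconcept posterior. Below is a minimal sketch of one plausible reading, assuming a per-subconcept utility vector (for example, inverse subconcept frequency) and class-wise weighted recall averaged over classes; the paper's exact utility and aggregation may differ.

```python
import numpy as np

def predicted_weighted_balanced_accuracy(y_true, y_pred, subconcept_post, utility):
    """Class-wise recall with per-sample weights = expected utility under predicted subconcept posteriors.

    subconcept_post: (n_samples, n_subconcepts) predicted posteriors (rows sum to 1).
    utility: (n_subconcepts,) assumed utility per subconcept, e.g. inverse subconcept frequency.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    weights = subconcept_post @ utility  # expected utility of each sample
    recalls = []
    for c in np.unique(y_true):
        mask = y_true == c
        correct = (y_pred[mask] == c).astype(float)
        recalls.append(np.sum(weights[mask] * correct) / np.sum(weights[mask]))
    return float(np.mean(recalls))
```

With a uniform utility every weight is 1 and the score reduces to ordinary balanced accuracy; upweighting a rare subconcept makes a miss on that subconcept count for more.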

【4】Mining Negative Sequential Patterns to Improve Viral Genomic Feature Representation and Classification
标题：挖掘负序列模式改善病毒基因组特征表示和分类
链接：https://arxiv.org/abs/2604.25968

作者：Wenxi Zhu,Wensheng Gan,Zhenlian Qi
备注：Preprint
摘要：病毒是地球上数量最丰富的生物实体，在微生物生态系统中发挥着关键作用；然而，作为重要的人类病原体，它们与人类的发病率和死亡率密切相关。因此，从病毒基因组序列中准确识别病毒序列至关重要，但现有的基于基因组的分类模型主要依赖基于组成或频率的子序列特征，往往可解释性有限、准确率下降，在复杂或不平衡的数据集上尤为明显。为解决这些局限，我们提出了GeneNSPCla（Genomic Negative Sequential Pattern-based Classification，基于基因组负序列模式的分类），这是一种基于负序列模式（NSP）的新型病毒分类框架，可从RNA病毒基因组的核苷酸序列中提取具有判别力的、基于缺失的特征。通过将这些NSP转换为数值特征向量并集成到多种监督分类器中，GeneNSPCla能够有效捕获病毒序列中的存在信号与缺失信号。此外，我们提出了一种适用于基因组数据的负模式挖掘算法GONPM+，它能够发现更长、更具生物学意义的负序列模式。实验结果表明，在8个分类器上，GONPM+的平均准确率比原始负模式挖掘算法提高了10.03%，比正模式挖掘算法提高了24.75%。这些发现凸显了引入基于缺失的序列信息的有效性，为病毒基因组分析与分类提供了新的、互补的视角。
摘要：Viruses represent the most abundant biological entities on Earth and play a pivotal role in microbial ecosystems, yet, as prominent human pathogens, they are closely linked to human morbidity and mortality. Accurate identification of viral sequences from viral genome sequences is therefore essential, but existing genome-based classification models, which rely largely on composition- or frequency-based subsequence features, often suffer from limited interpretability and reduced accuracy, particularly on complex or imbalanced datasets. To address these limitations, we propose GeneNSPCla (Genomic Negative Sequential Pattern-based Classification), a novel viral classification framework based on Negative Sequential Patterns (NSPs) that extracts discriminative absence-based features from nucleotide sequences of RNA viral genomes. By transforming these NSPs into numerical feature vectors and integrating them into multiple supervised classifiers, GeneNSPCla effectively captures both presence and absence signals in viral sequences. Furthermore, we propose a negative pattern mining algorithm adapted for processing genomic data: GONPM+, which can discover longer and more biologically meaningful negative sequential patterns. The experimental results demonstrate that the average accuracy of GONPM+ in 8 classifiers has improved by 10.03% compared to the original negative pattern mining algorithm and by 24.75% compared to the positive pattern mining algorithm. These findings highlight the effectiveness of incorporating absence-based sequential information, providing a new and complementary perspective for viral genome analysis and classification.
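To illustrate what an absence-based feature can look like, here is a sketch using one common negative-pattern semantics, assumed for illustration: a pattern <before, not absent, after> holds when `before` occurs, `after` occurs later, and `absent` does not occur in between. The k-mer patterns are hypothetical, and GONPM+'s exact pattern definition and mining procedure are not reproduced.

```python
def matches_nsp(seq, pattern):
    """True if seq contains `before`, later `after`, with no `absent` substring in between."""
    before, absent, after = pattern
    for i in range(len(seq) - len(before) + 1):
        if seq[i:i + len(before)] != before:
            continue
        start = i + len(before)
        # The nearest later `after` gives the shortest gap; if `absent` is missing there,
        # the pattern holds (a longer gap only contains more characters).
        j = seq.find(after, start)
        if j != -1 and absent not in seq[start:j]:
            return True
    return False

def nsp_features(seq, patterns):
    """Binary feature vector: 1 where the negative sequential pattern holds."""
    return [int(matches_nsp(seq, p)) for p in patterns]

patterns = [("AC", "GG", "TA"), ("GT", "CC", "AA")]  # hypothetical mined patterns
features = nsp_features("ACGTTA", patterns)
```

Vectors like `features` can then be fed to any standard classifier alongside presence-based k-mer counts.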

【5】Parameterized Quantum Circuits as Feature Maps: Representation Quality and Readout Effects in Multispectral Land-Cover Classification
标题：作为特征映射的参数化量子电路：多光谱土地覆盖分类中的表示质量与读出效应
链接：https://arxiv.org/abs/2604.26675

作者：Ralntion Komini,Aikaterini Mandilara,Georgios Maragkopoulos,Dimitris Syvridis
摘要：我们研究了用于多光谱卫星图像土地覆盖分类的变分量子分类器（VQC），采用特征映射视角：量子电路定义了一个非线性的数据嵌入，而读出方式决定了如何利用这一表示。基于EuroSAT-MS数据集，我们在受控实验方案下对所有类别对进行了系统的一对一评估，将经典基线（逻辑回归、SVM、神经网络）与采用线性读出和量子核SVM两种策略的VQC进行比较。结果表明，虽然采用线性读出的VQC并不优于RBF-SVM等强大的经典基线，但将同一个训练好的量子特征映射在基于核的决策框架中复用时，性能可以显著提升。量子比特数扫描进一步揭示了饱和效应，这与希尔伯特空间维度的指数增长和参数数量的线性增长之间的失配相一致。总体而言，我们的研究结果强调，量子模型的有效性关键取决于表示与读出之间的相互作用；有意义的收益可能来自将学习到的量子特征映射与经典决策机制相结合，而非直接取代经典模型。
摘要：We investigate variational quantum classifiers (VQCs) for land-cover classification from multispectral satellite imagery, adopting a feature-map perspective in which the quantum circuit defines a nonlinear data embedding while the readout determines how this representation is exploited. Using the EuroSAT-MS dataset, we perform a systematic one-vs-one evaluation across all class pairs under a controlled experimental protocol, comparing classical baselines (logistic regression, SVMs, neural networks) with VQCs employing both linear readout and quantum-kernel SVM strategies. Our results show that, while VQCs with linear readout do not outperform strong classical baselines such as RBF-SVM, the same trained quantum feature map can significantly improve performance when reused within a kernel-based decision framework. A qubit-count sweep further reveals saturation effects consistent with the mismatch between exponential Hilbert space dimension and linear parameter scaling. Overall, our findings highlight that the effectiveness of quantum models depends critically on the interplay between representation and readout, and that meaningful gains may arise from combining learned quantum feature maps with classical decision mechanisms rather than seeking direct replacement of classical models.
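The "reuse the feature map inside a kernel method" idea can be sketched with a classically simulated circuit. The sketch assumes a simple untrained, non-entangling angle encoding (one RY(x_i) rotation per qubit), which is much simpler than the trained VQC feature maps in the paper; the fidelity kernel is then k(x, y) = |<phi(x)|phi(y)>|^2.

```python
import numpy as np

def ry_state(theta):
    # RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def feature_state(x):
    """Product-state statevector of the angle-encoded input (one qubit per feature)."""
    state = np.array([1.0])
    for xi in x:
        state = np.kron(state, ry_state(xi))
    return state

def fidelity_kernel(X, Y):
    """Gram matrix of state fidelities |<phi(x)|phi(y)>|^2 (amplitudes are real here)."""
    SX = np.array([feature_state(x) for x in X])
    SY = np.array([feature_state(y) for y in Y])
    return np.abs(SX @ SY.T) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(0, np.pi, size=(5, 3))  # hypothetical 3-band pixel features scaled to [0, pi]
K_train = fidelity_kernel(X, X)
```

For this particular encoding the kernel has the closed form prod_i cos^2((x_i - y_i)/2). A Gram matrix like `K_train` can be handed to a classical SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.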

表征(2篇)

【1】Layer-wise Lipschitz-Product Control for Deep Kolmogorov-Arnold Network Representations of Compositionally Structured Functions
标题：组合结构函数的深度Kolmogorov-Arnold网络表示中的逐层Lipschitz积控制
链接：https://arxiv.org/abs/2604.26444

作者：Aleksander Tankman
备注：15 pages, theoretical note on layer-wise Lipschitz control for deep KANs
摘要：我们证明，任何可由具有N个内部节点且组合稀疏度s = O(1)的有限计算树表示的连续函数f: [0,1]^n → R，都存在深度Kolmogorov-Arnold网络（KAN）表示。每个内部节点由一个块深度和Lipschitz积均受控的基本KAN块实现。逐层Lipschitz积满足一个主要的、对定义域敏感且与输入维度n无关的界，并可简化为P(KAN_f) <= max(C*,1)^L_f，其中L_f <= c_max * N。对于标准运算{+,-,x,sin,cos}，当乘法（x）节点的输入有界于[0,1]时，我们得到P(KAN) <= 1。层宽满足n_l <= n + 2 w_max * N。一致逼近误差的上界为N * max(C*,1)^d(f) * epsilon_Op（当C* <= 1时可进一步简化）。对于C^m中的f，我们得到最优的B样条收敛速率。我们还推导了值域界（对于加法树，B_f <= N+1）。这填补了Liu等人（2024）指出的深层KAN堆叠中Lipschitz控制方面的空白。实验证实，对若干组合结构函数有P(KAN)=1.0。
摘要：We prove that any continuous function f from [0,1]^n to R representable by a finite computation tree with N internal nodes and compositional sparsity s = O(1) admits a deep Kolmogorov-Arnold Network (KAN) representation. Each internal node is realised by a primitive KAN block with controlled block depth and Lipschitz product. The layer-wise Lipschitz product satisfies the primary domain-sensitive bound independent of the input dimension n. It simplifies to P(KAN_f) <= max(C*,1)^L_f with L_f <= c_max * N. For the standard operations {+,-,x,sin,cos} with x nodes on [0,1]-bounded inputs we obtain P(KAN) <= 1. Layer widths satisfy n_l <= n + 2 w_max * N. The uniform approximation error is bounded by N * max(C*,1)^d(f) * epsilon_Op (simplifies when C* <=1). For f in C^m we obtain optimal B-spline rates. Range bounds are also derived (B_f <= N+1 for additive trees). This addresses the gap on Lipschitz control in deep KAN stacks noted by Liu et al. (2024). Experiments confirm P(KAN)=1.0 for several compositionally structured functions.

【2】Robust Representation Learning through Explicit Environment Modeling
标题：通过显式环境建模进行稳健的表示学习
链接：https://arxiv.org/abs/2604.26128

作者：Yuli Slavutsky,David M. Blei
摘要：我们考虑从多个环境收集的带标签数据中进行学习，其中数据分布可能随环境而变化。这一问题通常从因果视角处理，即寻求保留因果因素、丢弃虚假因素的不变表示。然而，该框架假设环境对目标没有直接影响。与此不同，我们考虑该假设不成立的情形，但仍希望学到能在以前未见过的环境中平均而言支持稳健预测的表示。为此，我们研究通过显式建模跨环境变化、再将这种变化边缘化而学到的表示。我们分析了所得表示，并刻画了它们何时优于因果不变表示方法所学到的表示。我们基于广义随机截距模型提出了一种具体方法（这是一类可以进行此类边缘化的预测器），并研究了其泛化性质。实验表明，这些模型在一系列具有挑战性的设置中优于不变学习方法。
摘要：We consider learning from labeled data collected across multiple environments, where the data distribution may vary across these environments. This problem is commonly approached from a causal perspective, seeking invariant representations that retain causal factors while discarding spurious ones. However, this framework assumes that the environment has no direct effect on the target. In contrast, we consider settings in which this assumption fails, but still aim to learn representations that support robust prediction on average across previously unseen environments. To this end, we study representations learned by explicitly modeling variation across environments and then marginalizing that variation out. We analyze the resulting representations and characterize when they are preferable to those learned by causal invariant-representation methods. We propose a concrete method based on generalized random-intercept models, a class of predictors in which such marginalization is possible, and study their generalization properties. Empirically, we show that these models outperform invariant-learning methods across a range of challenging settings.
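A toy linear-Gaussian version of "model the environment explicitly, then marginalize it out" may help. The sketch assumes y = x·beta + b_e + noise with b_e ~ N(mu, tau^2) and environments that also shift the covariates; it is not the authors' estimator. It uses a within-environment (fixed-effects) estimate of beta, then averages the per-environment intercepts to obtain the marginal intercept used for an unseen environment. A full random-intercept fit would also shrink the per-environment intercepts.

```python
import numpy as np

def group_means(values, groups, n_groups):
    """Per-environment column means of a 2D array."""
    counts = np.bincount(groups, minlength=n_groups)
    sums = np.stack([np.bincount(groups, weights=values[:, j], minlength=n_groups)
                     for j in range(values.shape[1])], axis=1)
    return sums / counts[:, None]

def fit_random_intercept(X, y, env, n_env):
    # Demeaning within each environment removes every intercept b_e before fitting beta
    Xc = X - group_means(X, env, n_env)[env]
    yc = y - group_means(y[:, None], env, n_env)[env, 0]
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    intercepts = group_means((y - X @ beta)[:, None], env, n_env)[:, 0]
    return beta, intercepts.mean(), intercepts.var(ddof=1)  # beta, mu estimate, rough tau^2

def predict_unseen_env(X_new, beta, mu):
    # Marginalizing b_e ~ N(mu, tau^2) leaves mu as the intercept for a new environment
    return X_new @ beta + mu

rng = np.random.default_rng(0)
n_env, n_per = 30, 50
beta_true, mu_true, tau = np.array([1.5, -2.0]), 0.5, 1.0
env = np.repeat(np.arange(n_env), n_per)
b = rng.normal(mu_true, tau, size=n_env)                       # environment intercepts
X = rng.normal(size=(n_env * n_per, 2)) + 0.5 * b[env][:, None]  # environments shift covariates too
y = X @ beta_true + b[env] + rng.normal(scale=0.3, size=len(env))
beta_hat, mu_hat, tau2_hat = fit_random_intercept(X, y, env, n_env)
```

Because the covariates here depend on the environment, pooled least squares without modeling b_e would give a biased slope; demeaning avoids that in this toy setup.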

3D|3D重建等相关(1篇)

【1】Featurising Pixels from Dynamic 3D Scenes with Linear In-Context Learners
标题：利用线性上下文学习器为动态3D场景中的像素提取特征
链接：https://arxiv.org/abs/2604.26488

作者：Nikita Araslanov,Martin Sundermeyer,Hidenobu Matsuki,David Joseph Tan,Federico Tombari
备注：To appear at CVPR 2026 (oral). Project website: https://lila-pixels.github.io
摘要：视觉模型最令人兴奋的应用之一是像素级推理。尽管视觉基础模型众多，我们仍然缺乏能在像素级有效嵌入视觉场景时空属性的表示。现有框架要么在基于图像的代理任务上训练，无法考虑动态元素；要么在视频序列上针对动作级推理进行训练，难以扩展到密集的像素级预测。我们提出了LILA，一个从视频中学习像素级精确特征描述符的框架。我们训练框架的核心是线性上下文学习。LILA利用由现成网络估计的时空线索图（深度和运动）。尽管这些线索带有噪声，LILA仍能在未经筛选的视频数据集上有效训练，以时间一致的方式嵌入语义和几何属性。我们在一系列不同的视觉任务上展示了所学表示的显著实证优势：视频目标分割、表面法线估计和语义分割。
摘要：One of the most exciting applications of vision models involves pixel-level reasoning. Despite the abundance of vision foundation models, we still lack representations that effectively embed spatio-temporal properties of visual scenes at the pixel level. Existing frameworks either train on image-based pretext tasks, which do not account for dynamic elements, or on video sequences for action-level reasoning, which does not scale to dense pixel-level prediction. We present a framework that learns pixel-accurate feature descriptors from videos, LILA. The core element of our training framework is linear in-context learning. LILA leverages spatio-temporal cue maps -- depth and motion -- estimated with off-the-shelf networks. Despite the noisy nature of those cues, LILA trains effectively on uncurated video datasets, embedding semantic and geometric properties in a temporally consistent manner. We demonstrate compelling empirical benefits of the learned representation across a diverse suite of vision tasks: video object segmentation, surface normal estimation and semantic segmentation.

优化|敛散性(5篇)

【1】Hyper Input Convex Neural Networks for Shape Constrained Learning and Optimal Transport
标题：用于形状约束学习和最优传输的超输入凸神经网络
链接：https://arxiv.org/abs/2604.26942

作者：Shayan Hundrieser,Insung Kong,Johannes Schmidt-Hieber
备注：65 pages, 13 figures, the first two authors contributed equally
摘要：我们介绍超输入凸神经网络（HyCNN），这是一种专为学习凸函数而设计的新型神经网络架构。HyCNN将Maxout网络的原理与输入凸神经网络（ICNN）相结合，创建了一个在输入中始终是凸的神经网络，理论上能够利用深度，并且与ICNN相比，在大规模训练时表现可靠。具体地说，我们证明了HyCNN需要比ICNN指数更少的参数来近似二次函数到给定的精度。通过一系列的合成实验，我们证明了HyCNN在凸回归和插值任务的预测性能方面优于现有的ICNN和MLP。我们进一步应用HyCNN来学习合成示例和单细胞RNA测序数据的高维最优传输图，在这些数据中，它们在各种设置中的表现往往优于基于ICNN的神经最优传输方法和其他基线。
摘要：We introduce Hyper Input Convex Neural Networks (HyCNNs), a novel neural network architecture designed for learning convex functions. HyCNNs combine the principles of Maxout networks with input convex neural networks (ICNNs) to create a neural network that is always convex in the input, theoretically capable of leveraging depth, and performs reliably when trained at scale compared to ICNNs. Concretely, we prove that HyCNNs require exponentially fewer parameters than ICNNs to approximate quadratic functions up to a given precision. Through a series of synthetic experiments, we demonstrate that HyCNNs outperform existing ICNNs and MLPs in terms of predictive performance for convex regression and interpolation tasks. We further apply HyCNNs to learn high-dimensional optimal transport maps for synthetic examples and for single-cell RNA sequencing data, where they often outperform ICNN-based neural optimal transport methods and other baselines across a wide range of settings.
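The two ingredients named in the abstract, ICNN-style nonnegative weights and a Maxout output, each preserve convexity. Below is a toy sketch combining them; it is not the HyCNN architecture from the paper, and the layer sizes and random parameters are assumptions. Each hidden unit is a ReLU of an affine map (convex), a nonnegative combination plus an affine skip term stays convex, and a max over convex pieces is convex.

```python
import numpy as np

def convex_maxout_net(x, W1, b1, A, C, d):
    """f(x) = max_j (A[j] @ relu(W1 @ x + b1) + C[j] @ x + d[j]); convex in x when A >= 0."""
    z = np.maximum(W1 @ x + b1, 0.0)  # each hidden unit: ReLU of affine, hence convex
    pieces = A @ z + C @ x + d        # nonnegative mix of convex units plus affine skip: convex
    return pieces.max()               # maxout over convex pieces: convex

rng = np.random.default_rng(0)
H, J, D = 16, 4, 3  # hypothetical hidden width, number of maxout pieces, input dimension
W1, b1 = rng.normal(size=(H, D)), rng.normal(size=H)
A = np.abs(rng.normal(size=(J, H)))  # nonnegativity is what guarantees convexity
C, d = rng.normal(size=(J, D)), rng.normal(size=J)
f = lambda x: convex_maxout_net(x, W1, b1, A, C, d)
```

In training, nonnegativity of `A` is usually maintained by clipping or by a softplus reparameterization after each update.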

【2】Near-Optimal Cryptographic Hardness of Learning With Homogeneous Halfspaces Under Gaussian Marginals
标题：高斯边缘下齐次半空间学习的近优密码硬度
链接：https://arxiv.org/abs/2604.26446

作者：Jizhou Huang,Brendan Juba
摘要：我们研究了在高斯分布下识别齐次半空间的三个问题：不可知学习、单侧可靠学习和公平性审计。在每个问题中，我们获得从$\mathbb{R}^d\times\{-1, +1\}$上某个未知分布中抽取的带标签样本$(\mathbf{x}, \mathrm{y})$，其在$\mathbf{x}$上的边缘分布为标准高斯分布，而在$\mathrm{y}$上的分布任意。每个问题的目标是输出一个齐次半空间，使其在相应损失度量下接近最佳拟合的齐次半空间。在广为接受的带误差学习（LWE）问题困难性假设下，我们证明了这些问题近乎最优的计算困难性结果。以往关于这些问题的困难性结果主要针对一般半空间建立；我们的结果将其中部分困难性结果推广到齐次半空间。值得注意的是，我们的下界严格推广了以往工作，并缩小了高斯边缘分布下不可知学习齐次半空间的上界与下界之间的差距。
摘要：We study three problems that involve identifying homogeneous halfspaces under Gaussian distributions: agnostic learning, one-sided reliable learning, and fairness auditing. In each of these problems, we are given labeled examples $(\mathbf{x}, \mathrm{y})$ drawn from an unknown distribution on $\mathbb{R}^d\times\{-1, +1\}$, whose marginal distribution on $\mathbf{x}$ is standard Gaussian and on $\mathrm{y}$ is arbitrary. The goal of each problem is to output a homogeneous halfspace that approaches the best-fitting homogeneous halfspace in terms of its corresponding loss measure. We prove near-optimal computational hardness results for these problems under the widely believed hardness assumption of the Learning With Errors (LWE) problem. Prior hardness results for these problems were mostly established for general halfspaces; our findings extend some of these hardness results to homogeneous halfspaces. Remarkably, our lower bound strictly generalizes prior work and narrows the gap between the upper and lower bounds for agnostically learning homogeneous halfspaces under Gaussian marginals.

【3】Co-Learning Port-Hamiltonian Systems and Optimal Energy-Shaping Control
标题：端口哈密顿系统与最优能量整形控制的联合学习
链接：https://arxiv.org/abs/2604.26172

作者：Ankur Kamboj,Biswadip Dey,Vaibhav Srivastava
摘要：我们开发了一个物理信息学习框架，用于基于轨迹数据对端口哈密顿（pH）系统进行能量整形控制。所提方法通过交替优化和策略感知的数据收集，联合学习pH系统模型与最优的基于无源性的能量平衡控制器（EB-PBC）。在每次迭代中，利用当前控制策略下收集的轨迹数据细化系统模型，并在更新后的模型上重新优化控制器。两个组件均由嵌入pH动力学和EB-PBC结构的神经网络参数化，保证了在能量交互层面的可解释性。学到的控制器使闭环系统本质上无源且可证明稳定，并在不抵消自然势能的情况下利用被控对象的无源动力学。训练中的耗散正则化强制能量严格衰减，从而增强对仿真到现实差距的鲁棒性。所提框架在平面摆和扭转摆系统的状态调节与起摆任务上得到了验证。
摘要：We develop a physics-informed learning framework for energy-shaping control of port-Hamiltonian (pH) systems from trajectory data. The proposed approach co-learns a pH system model and an optimal energy-balancing passivity-based controller (EB-PBC) through alternating optimization with policy-aware data collection. At each iteration, the system model is refined using trajectory data collected under the current control policy, and the controller is re-optimized on the updated model. Both components are parameterized by neural networks that embed the pH dynamics and EB-PBC structure, ensuring interpretability in terms of energy interactions. The learned controller renders the closed-loop system inherently passive and provably stable, and exploits passive plant dynamics without canceling the natural potential. A dissipation regularization enforces strict energy decay during training, thereby enhancing robustness to sim-to-real gaps. The proposed framework is validated on state-regulation and swing-up tasks for planar and torsional pendulum systems.
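To illustrate energy shaping on a pendulum, here is a sketch of a port-Hamiltonian pendulum under a hand-tuned energy-balancing law: an added spring potential plus damping injection, with gravity kept rather than cancelled. The parameters, gains and quadratic shaping term are assumptions; in the paper both the model and the controller are learned neural networks. Along closed-loop trajectories the shaped energy satisfies dH_d/dt = -(d + kd) * qdot^2, so it should never increase.

```python
import numpy as np

m, l, g, d = 1.0, 1.0, 9.81, 0.05  # assumed pendulum parameters and natural damping
kp, kd, q_star = 20.0, 2.0, 0.5    # hypothetical hand-tuned gains and target angle (not learned)

def hamiltonian(q, p):
    return p ** 2 / (2 * m * l ** 2) + m * g * l * (1 - np.cos(q))

def shaped_energy(q, p):
    # Closed-loop storage function: natural energy plus an added spring potential
    return hamiltonian(q, p) + 0.5 * kp * (q - q_star) ** 2

def closed_loop(state):
    q, p = state
    dH_dp = p / (m * l ** 2)               # angular velocity
    dH_dq = m * g * l * np.sin(q)          # gravity torque, left in place
    u = -kp * (q - q_star) - kd * dH_dp    # potential shaping + damping injection
    return np.array([dH_dp, -dH_dq - d * dH_dp + u])

def rk4_step(f, x, h):
    k1 = f(x); k2 = f(x + h / 2 * k1); k3 = f(x + h / 2 * k2); k4 = f(x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

state = np.array([-1.0, 0.0])  # start at rest, away from the target
energies = [shaped_energy(*state)]
for _ in range(8000):          # 8 s with step 1e-3
    state = rk4_step(closed_loop, state, 1e-3)
    energies.append(shaped_energy(*state))
```

Since kp exceeds m*g*l, the shaped potential has a single minimum, where m*g*l*sin(q) + kp*(q - q_star) = 0. That point sits slightly below q_star because gravity is not cancelled.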

【4】Learning Over-Relaxation Policies for ADMM with Convergence Guarantees
标题：学习具有收敛保证的ADMM过松弛策略
链接：https://arxiv.org/abs/2604.26932

作者：Junan Lin,Paul J. Goulart,Luca Furieri
摘要：交替方向乘子法（ADMM）是求解结构化凸优化问题的常用方法，其实际性能在很大程度上取决于惩罚参数和松弛参数的选择。受模型预测控制（MPC）等场景的启发（在这些场景中需要反复求解结构固定、参数值变化的相关优化问题），我们提出学习松弛参数的在线更新，以提升在目标问题类上的性能。这一选择在类OSQP架构中计算上颇具吸引力，因为调整松弛参数不会像更新惩罚参数那样触发矩阵重新分解。我们在温和假设下为具有时变惩罚参数和松弛参数的ADMM建立了收敛保证，并在基准二次规划问题上表明，所学策略在迭代次数和实际运行时间上均优于基线OSQP。
摘要：The Alternating Direction Method of Multipliers (ADMM) is a widely used method for structured convex optimization, and its practical performance depends strongly on the choice of penalty and relaxation parameters. Motivated by settings such as Model Predictive Control (MPC), where one repeatedly solves related optimization problems with fixed structure and changing parameter values, we propose learning online updates of the relaxation parameter to improve performance on problem classes of interest. This choice is computationally attractive in OSQP-like architectures, since adapting relaxation does not trigger the matrix refactorizations associated with penalty updates. We establish convergence guarantees for ADMM with time-varying penalty and relaxation parameters under mild assumptions, and show on benchmark quadratic programs that the resulting learned policies improve both iteration count and wall-clock time over baseline OSQP.
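Why adapting relaxation is cheap can be seen in a small OSQP-style ADMM loop. The sketch below solves a box-constrained QP (constraint matrix A = I) following the relaxed x/z/y updates of the OSQP scheme as we understand it. The linear system depends on rho and sigma but not on the relaxation parameter alpha, so the factorization is computed once. `alpha_policy` is a hypothetical placeholder for a learned schedule, not the paper's policy.

```python
import numpy as np

def admm_box_qp(P, q, lo, hi, rho=1.0, sigma=1e-6, iters=500, alpha_policy=lambda k: 1.6):
    """OSQP-style ADMM for min 0.5 x'Px + q'x subject to lo <= x <= hi (A = I)."""
    n = len(q)
    K = P + (sigma + rho) * np.eye(n)  # with A = I, A'A = I; K depends on rho, not on alpha
    L = np.linalg.cholesky(K)          # factorized once; changing alpha never touches it
    x, z, y = np.zeros(n), np.zeros(n), np.zeros(n)
    for k in range(iters):
        alpha = alpha_policy(k)        # relaxation parameter in (0, 2)
        rhs = sigma * x - q + rho * z - y
        # Two solves with the Cholesky factor (np.linalg.solve does not exploit triangularity)
        x_tilde = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        z_tilde = x_tilde              # A @ x_tilde with A = I
        x = alpha * x_tilde + (1 - alpha) * x
        z_relaxed = alpha * z_tilde + (1 - alpha) * z
        z_new = np.clip(z_relaxed + y / rho, lo, hi)  # projection onto the box
        y = y + rho * (z_relaxed - z_new)             # dual update
        z = z_new
    return x, z

P = np.diag([2.0, 2.0])
q = np.array([-2.0, -8.0])  # unconstrained minimizer (1, 4); the box [0, 2] clips it to (1, 2)
x_const, _ = admm_box_qp(P, q, 0.0, 2.0)
x_sched, _ = admm_box_qp(P, q, 0.0, 2.0, alpha_policy=lambda k: 1.0 if k < 10 else 1.7)
```

Changing rho, by contrast, would change K and force a new factorization, which is the cost the abstract says relaxation updates avoid.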

【5】Quantum Feature Selection with Higher-Order Binary Optimization on Trapped-Ion Hardware
标题：在俘获离子硬件上基于高阶二元优化的量子特征选择
链接：https://arxiv.org/abs/2604.26834

作者：Carlos Flores-Garrigós,Anton Simen,Qi Zhang,Enrique Solano,Narendra N. Hegade,Sayonee Ray,Claudio Girotto,Jason Iaconis,Martin Roetteler
摘要：我们提出了一个基于高阶无约束二元优化（HUBO）形式的量子特征选择框架，它显式纳入了标准二次编码之外的多变量依赖关系。与基于QUBO的方法不同，所提模型包含由互信息度量导出的一体、二体和三体相互作用项，使目标函数能够在统一的能量模型中刻画特征相关性、成对冗余以及高阶统计结构。为抑制所有特征都被选中的平凡解，我们进一步加入了结构化的线性惩罚项，在保留信息性变量的同时促进稀疏性。由此得到的HUBO实例在IonQ Forte上使用数字化反绝热量子优化求解，并与无噪声量子模拟以及两种经典降维基线进行比较：基于互信息的SelectKBest和主成分分析（PCA）。我们在两个基准分类数据集（Gallstone数据集和Spambase数据集）上评估了所提工作流程，并分析了预测性能和所选子集的结构。结果表明，硬件执行与无噪声模拟之间在定性上吻合良好，支持在当前俘获离子处理器上实现高阶特征选择哈密顿量的可行性。此外，量子方法在产生紧凑且信息丰富的特征子集的同时，取得了具有竞争力的分类性能，凸显了高阶量子优化在机器学习预处理任务中的潜力。
摘要：We present a quantum feature-selection framework based on a higher-order unconstrained binary optimization (HUBO) formulation that explicitly incorporates multivariate dependencies beyond standard quadratic encodings. In contrast to QUBO-based approaches, the proposed model includes one-, two-, and three-body interaction terms derived from mutual-information measures, enabling the objective function to capture feature relevance, pairwise redundancy, and higher-order statistical structure within a unified energy model. To suppress trivial all-selected solutions, we further include structured linear penalties that promote sparsity while preserving informative variables. The resulting HUBO instances are optimized with digitized counterdiabatic quantum optimization on IonQ Forte and compared against noiseless quantum simulation as well as two classical dimensionality-reduction baselines: SelectKBest based on mutual information and principal component analysis (PCA). We evaluate the proposed workflow on two benchmark classification datasets, namely the Gallstone dataset and the Spambase dataset, and analyze both predictive performance and selected-subset structure. The results show good qualitative agreement between hardware executions and noiseless simulations, supporting the feasibility of implementing higher-order feature-selection Hamiltonians on current trapped-ion processors. In addition, the quantum approach yields competitive classification performance while producing compact and informative feature subsets, highlighting the potential of higher-order quantum optimization for machine-learning preprocessing tasks.
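The structure of such an objective can be shown without quantum hardware. Below is a sketch of a HUBO energy with one-, two- and three-body terms and a linear sparsity penalty, minimized by brute force over all subsets, which is feasible only for a handful of features. The coefficients are hypothetical stand-ins for the mutual-information-derived terms, and brute force replaces the paper's digitized counterdiabatic optimization.

```python
import itertools
import numpy as np

def hubo_energy(s, h, J, K, lam):
    """Energy of binary selection vector s (1 = keep feature); lower is better."""
    e = -np.dot(h, s)                # one-body: reward relevance to the label
    for (i, j), w in J.items():
        e += w * s[i] * s[j]         # two-body: penalize pairwise redundancy
    for (i, j, k), w in K.items():
        e += w * s[i] * s[j] * s[k]  # three-body: higher-order interaction
    return e + lam * np.sum(s)       # linear penalty discourages selecting everything

def brute_force_select(h, J, K, lam):
    candidates = (np.array(s) for s in itertools.product([0, 1], repeat=len(h)))
    return min(candidates, key=lambda s: hubo_energy(s, h, J, K, lam))

h = np.array([0.9, 0.8, 0.1, 0.7])  # hypothetical relevance scores
J = {(0, 1): 0.85}                   # features 0 and 1 assumed largely redundant
K = {(0, 2, 3): 0.2}                 # hypothetical three-body term
best = brute_force_select(h, J, K, lam=0.3)
```

Here the redundancy term stops features 0 and 1 from being selected together, and the sparsity penalty outweighs the weak relevance of feature 2, leaving a compact subset.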

预测|估计(5篇)

【1】Hankel and Toeplitz Rank-1 Decomposition of Arbitrary Matrices with Applications to Signal Direction-of-Arrival Estimation
标题：任意矩阵的Hankel与Toeplitz秩1分解及其在信号到达方向估计中的应用
链接：https://arxiv.org/abs/2604.26787

作者：Georgios I. Orfanidis,Dimitris A. Pados,George Sklivanitis,Elizabeth Serena Bentley
摘要：我们考虑在$L_2$和$L_1$范数误差下，计算任意矩阵的最优秩-$1$ Hankel结构和Toeplitz结构近似的问题。这类问题在工程系统中自然出现，包括对现代自主系统应用十分重要的基本少样本信号到达方向（DoA）估计问题。我们针对这两种形式开发了精确且计算高效的结构化矩阵分解算法，进而为实际传感系统部署推导出具有解析依据的小样本支持DoA估计器。我们严格证明了所得的$L_2$和$L_1$范数估计器分别在高斯白噪声和拉普拉斯噪声下是最大似然最优的。这些估计器还通过大量仿真研究和少样本DoA推断的真实数据实验得到了进一步验证。
摘要：We consider the problems of computing the optimal rank-$1$ Hankel and Toeplitz-structured approximation of arbitrary matrices under $L_2$ and $L_1$-norm error. Such problems arise naturally in engineered systems, including the basic few-shot signal Direction-of-Arrival (DoA) estimation problem that is of importance to modern autonomous systems applications. We develop accurate and computationally efficient structured matrix decomposition algorithms for both formulations and then derive analytically grounded small-sample-support DoA estimators for practical sensing system deployments. The resulting estimators under the $L_2$ and $L_1$ norms are formally shown to be maximum-likelihood optimal under white Gaussian and Laplace noise, respectively. The estimators are further validated through extensive simulation studies and real-world data experiments in few-shot DoA inference.
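For intuition on the L2 case, here is a sketch that restricts the rank-1 Hankel factor to the form c * z^(i+j) with |z| = 1, which is the narrowband array model. For a uniform linear array with element spacing d, z = exp(1j*omega) with omega = 2*pi*d*sin(theta)/lambda. Under that restriction every atom has squared Frobenius norm m*n, so the best fit maximizes |<atom, X>| and the amplitude follows by least squares. This grid search is an assumption-laden illustration, not the paper's exact algorithm, and it does not cover the L1 formulation.

```python
import numpy as np

def rank1_hankel_fit(X, grid_size=4096):
    """Best L2 fit of X by c * z^(i+j) with z = exp(1j*omega), via a grid over omega."""
    m, n = X.shape
    k = np.add.outer(np.arange(m), np.arange(n))  # anti-diagonal index i + j
    omegas = np.linspace(-np.pi, np.pi, grid_size, endpoint=False)
    # <atom, X> = sum(conj(z^(i+j)) * X); np.vdot conjugates its first argument and flattens
    scores = np.array([np.vdot(np.exp(1j * w * k), X) for w in omegas])
    best = int(np.argmax(np.abs(scores)))
    omega = omegas[best]
    c = scores[best] / (m * n)  # least-squares amplitude, since ||atom||_F^2 = m * n
    return omega, c, c * np.exp(1j * omega * k)

rng = np.random.default_rng(0)
k_idx = np.add.outer(np.arange(6), np.arange(6))
X = (2 - 1j) * np.exp(1j * 0.7 * k_idx) + 0.01 * (rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6)))
omega_hat, c_hat, H_hat = rank1_hankel_fit(X)
```

A finer grid, or a local refinement around the peak, trades computation for resolution in omega.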

【2】FutureWorld: A Live Environment for Training Predictive Agents with Real-World Outcome Rewards
标题：FutureWorld：用于训练预测代理并获得现实世界成果奖励的实时环境
链接：https://arxiv.org/abs/2604.26733

作者：Zhixin Han,Yanzhi Zhang,Chuyang Wei,Maohang Gao,Xiawei Yue,Kefei Chen,Yu Zhuang,Haoxiang Guan,Jiyan He,Jian Li,Yitong Duan,Yu Shi,Mengting Hu,Shuxin Zheng
备注：Our experiments are ongoing, and we will release the code in the near future. We release a subset of our historical data on Hugging Face: https://huggingface.co/datasets/PredictingFuture/FutureWorld
摘要：实时未来预测是指在现实世界事件发生之前对其进行预测的任务。这一任务越来越多地借助基于大型语言模型的智能体系统进行研究，对于构建能够持续从现实世界中学习的智能体非常重要。正如交互式环境往往推动智能体的进步，推进实时未来预测自然促使人们将其视为一个学习环境。以往工作从若干不同方面探索了未来预测，但通常没有将其构建为统一的学习环境。该任务对学习很有吸引力，因为它可以提供大量基于多样化现实事件的预测问题，同时防止答案泄露。为利用实时未来预测的这些优势，我们提出了FutureWorld，这是一个实时的智能体强化学习环境，打通了预测、结果揭晓和参数更新之间的训练闭环。在该环境中，我们选取三个开源基础模型，并连续多天对其进行训练。结果表明训练是有效的。此外，我们基于该环境构建了每日基准，并在其上评估了若干前沿智能体，以建立当前智能体系统的性能基线。
摘要：Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from the real world. Just as interactive environments have often driven progress in agents, advancing live future prediction naturally motivates viewing it as a learning environment. Prior works have explored future prediction from several different angles, but have generally not framed it as a unified learning environment. This task is appealing for learning because it can provide a large number of prediction questions grounded in diverse real-world events, while preventing answer leakage. To leverage the advantages of live future prediction, we present FutureWorld, a live agentic reinforcement learning environment that closes the training loop between prediction, outcome realization, and parameter updates. In our environment, we take three open-source base models and train them for consecutive days. The results show that training is effective. Furthermore, we build a daily benchmark based on the environment and evaluate several frontier agents on it to establish performance baselines for current agent systems.

【3】Electricity price forecasting across Norway's five bidding zones in the post-crisis era
标题：后危机时代挪威五个竞价区的电价预测
链接：https://arxiv.org/abs/2604.26634

作者：My Thi Diem Phan,Trung Tuyen Truong,Hoai Phuong Ha,Dat Thanh Nguyen
摘要：挪威电力市场以水电为主，但2021-2022年能源危机以及与欧洲大陆更紧密的一体化从根本上改变了价格形成机制，降低了基于历史数据校准的预测模型的可靠性。尽管迫切需要更新模型，目前仍缺乏一个统一基准来评估结构各异的所有挪威竞价区的特征贡献。为此，我们对挪威全部五个Nord Pool竞价区的电价预测进行了全面评估。我们构建了一个覆盖2019-2025年的多模态小时级数据集，并使用严格因果的测试集评估了包括LightGBM、ARX和先进深度学习架构在内的八类预测模型。我们实施了稳健的滚动起点回测、留一组特征消融和条件状态分析，以剖析模型性能与特征效用。结果表明，LightGBM在每个竞价区均取得最佳性能，MAE介于1.64至5.74欧元/兆瓦时之间，而岭回归ARX模型在北部竞价区仍是极具竞争力的线性基准。特征消融显示，仅依赖滞后价格和日历变量的模型即可达到较高精度，往往匹配甚至超过完整的多模态集成。然而，条件状态分析表明，水库水位和天然气价格等外部特征对于分层刻画预测误差仍然至关重要，而预测误差在承压市场状态下持续上升。这凸显了模型可解释性和状态感知对于面临市场动态结构性变化的决策者的实用价值。
摘要：Norway's electricity market is heavily dominated by hydropower, but the 2021–2022 energy crisis and stronger integration with Continental Europe have fundamentally altered price formation, reducing the reliability of forecasting models calibrated on historical data. Despite the critical need for updated models, a unified benchmark evaluating feature contributions across all structurally diverse Norwegian bidding zones remains lacking. Here we present a comprehensive evaluation of electricity price forecasting across all five Norwegian Nord Pool bidding zones. We constructed a multimodal hourly dataset spanning 2019–2025 and evaluated eight forecasting model families including LightGBM, ARX, and advanced deep learning architectures using a strictly causal test set. We implemented robust rolling-origin backtesting, leave-one-group-out feature ablation, and conditional regime analysis to dissect model performance and feature utility. Our results show that LightGBM achieves the best performance in every zone with MAE ranging from 1.64 to 5.74 EUR/MWh, while the ridge ARX model remains a highly competitive linear benchmark in northern zones. Feature ablation reveals that models relying solely on lagged prices and calendar variables achieve high accuracy and often match or exceed full multimodal integration. However, conditional regime analysis demonstrates that external features like reservoir levels and gas prices remain crucial to stratify forecast errors, which consistently increase under stressed market regimes. This highlights the practical value of model interpretability and regime awareness for decision makers facing structural changes in market dynamics.
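Rolling-origin backtesting can be sketched as follows. The model is a ridge autoregression on lagged prices only (exogenous inputs omitted, so strictly an AR rather than ARX model). The lags of 24, 48 and 168 hours, the window sizes and the synthetic series are all assumptions. Every lag is at least the 24-hour horizon, so no test-period value leaks into the features, and each refit uses only data before its origin.

```python
import numpy as np

def make_lagged(y, lags):
    """Row t holds y[t - L] for each L in lags; targets start at max(lags)."""
    start = max(lags)
    X = np.column_stack([y[start - L:len(y) - L] for L in lags])
    return X, y[start:]

def ridge_fit(X, y, lam):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    reg = lam * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)

def rolling_origin_mae(y, lags=(24, 48, 168), initial=500, horizon=24, step=24, lam=1.0):
    X, target = make_lagged(y, lags)
    maes = []
    for origin in range(initial, len(target) - horizon + 1, step):
        w = ridge_fit(X[:origin], target[:origin], lam)  # train strictly before the origin
        Xt = np.hstack([X[origin:origin + horizon], np.ones((horizon, 1))])
        maes.append(np.mean(np.abs(Xt @ w - target[origin:origin + horizon])))
    return np.array(maes)

rng = np.random.default_rng(0)
t = np.arange(1000)
prices = 10 + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(scale=0.5, size=t.size)  # synthetic hourly series
maes = rolling_origin_mae(prices)
```

Swapping `ridge_fit` for a gradient-boosted model such as LightGBM keeps the same backtest loop; only the fit and predict calls change.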

【4】Observable Neural ODEs for Identifiable Causal Forecasting in Continuous Time
标题：用于连续时间可识别因果预测的可观测神经ODE
链接：https://arxiv.org/abs/2604.26070

作者：Jennifer Wendland,Nicolas Freitag,Maik Kschischo
备注：20 pages, 5 figures
摘要：连续时间序贯决策问题中的因果推断受到隐藏混杂因素的挑战。我们证明，在具有时变干预的潜在状态空间模型中，潜在动力学相对于观测数据的可观测性是识别动态处理效应的必要条件，从而将控制论中的可观测性与因果可识别性联系起来，即使隐藏混杂因素同时影响处理和结果。我们推导了一个连续时间调整公式，通过测量模型、潜在动力学以及给定观测历史下潜在状态的滤波分布，表示处理轨迹下的潜在结果分布。我们提出了可观测神经常微分方程（ObsNODE），即采用可观测标准型、用于因果预测的神经ODE模型。ObsNODE学习的连续时间动力学，其状态可由观测重构，从而能够在不同处理路径下预测结果。在合成癌症数据、基于MIMIC-IV的半合成数据以及真实世界脓毒症数据上的实验表明，其性能优于近期的序列模型。
摘要：Causal inference in continuous-time sequential decision problems is challenged by hidden confounders. We show that, in latent state-space models with time-varying interventions, observability of the latent dynamics from observed data is necessary for identifying dynamic treatment effects, linking control-theoretic observability to causal identifiability, even when hidden confounders affect both treatments and outcomes. We derive a continuous-time adjustment formula expressing potential outcome distributions under treatment trajectories via the measurement model, latent dynamics, and the filtering distribution over latent states given observed histories. We propose Observable Neural ODEs (ObsNODEs), Neural ODE models in observable normal form for causal forecasting. ObsNODEs learn continuous-time dynamics with states reconstructible from observations, enabling outcome prediction under alternative treatment paths. Experiments on synthetic cancer data, semi-synthetic data based on MIMIC-IV, and real-world sepsis data show strong performance over recent sequence models.
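The "observable normal form" can be illustrated for a single-output system. The sketch assumes a chain of integrators x_i' = x_{i+1}, where only the last coordinate carries a (here hypothetical, fixed-weight) neural vector field f(x, u), and the output is y = x_1. The states are then the output and its derivatives, so they are reconstructible from observations by construction. ObsNODE's actual architecture, treatment encoding and filtering are not reproduced.

```python
import numpy as np

def observable_form_rhs(x, u, f):
    """Observable normal form: x_i' = x_{i+1} for i < n, x_n' = f(x, u); output y = x_1."""
    dx = np.empty_like(x)
    dx[:-1] = x[1:]      # chain of integrators
    dx[-1] = f(x, u)     # only the last state carries the (learned) dynamics
    return dx

def simulate(x0, u_fn, f, t_end, n_steps=3000):
    """RK4 rollout under a treatment/input signal u_fn(t); returns the output trajectory y."""
    h = t_end / n_steps
    x = np.array(x0, dtype=float)
    ys = [x[0]]
    for s in range(n_steps):
        t = s * h
        k1 = observable_form_rhs(x, u_fn(t), f)
        k2 = observable_form_rhs(x + h / 2 * k1, u_fn(t + h / 2), f)
        k3 = observable_form_rhs(x + h / 2 * k2, u_fn(t + h / 2), f)
        k4 = observable_form_rhs(x + h * k3, u_fn(t + h), f)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        ys.append(x[0])
    return np.array(ys)

def mlp_f(W1, b1, w2):
    # Hypothetical small network standing in for the learned vector field
    return lambda x, u: float(w2 @ np.tanh(W1 @ np.append(x, u) + b1))

# Sanity check with a known field: y'' = -y, y(0) = 1, y'(0) = 0 gives y(t) = cos(t)
ys = simulate([1.0, 0.0], lambda t: 0.0, lambda x, u: -x[0] + u, np.pi)
rng = np.random.default_rng(0)
f_net = mlp_f(rng.normal(size=(8, 3)), rng.normal(size=8), 0.1 * rng.normal(size=8))
ys_net = simulate([0.1, 0.0], np.sin, f_net, 5.0)  # outcome under an alternative input path
```

Predicting outcomes under an alternative treatment path then amounts to re-running `simulate` with a different `u_fn` from the same reconstructed initial state.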

【5】Mini-Batch Class Composition Bias in Link Prediction
标题：链接预测中的小批量类组成偏差
链接：https://arxiv.org/abs/2604.25978

作者：Kieran Maguire,Srinandan Dasmahapatra
备注：Accepted at GCLR 2026: the 5th Workshop on Graphs and more Complex Structures For Learning and Reasoning, colocated with AAAI 2026
摘要：先前关于节点分类的工作表明，当底层图属性相同时，图神经网络（GNN）可以学习跨图迁移的表示。因此，对于一个固定的图，人们会期望为链接预测训练的GNN学到与节点分类所学一致的表示。我们表明这一直觉在一般情况下并不成立。相反，我们发现流行的链接预测模型可以借助批量归一化层，学到一种平凡的、依赖于小批量的启发式方法来解决边分类任务。在对此进行校正后，我们观察到网络表示与节点类别相关特征的对齐程度提高，这表明网络学到了更符合底层图属性的图表示。我们的研究结果表明，标准的链接预测训练可能导致我们高估链接预测器学习跨任务一致的、通用的图表示的能力。
摘要：Prior work on node classification has shown that Graph Neural Networks (GNNs) can learn representations that transfer across graphs, when underlying graph properties are shared. For a fixed graph, one would then expect GNNs trained for link prediction to learn a representation consistent with that learnt for node classification. We show this intuition does not hold in the general case. Instead, we find popular link prediction models can learn a trivial mini-batch dependent heuristic, enabled by batch-normalisation layers, to solve the edge classification task. When correcting for this, we observe increased alignment of the network representation with node-class relevant features, suggesting the network has learnt a graph representation that better aligns with the underlying graph's properties. Our findings suggest that standard link prediction training may be leading us to overestimate link predictors' ability to learn a generalised representation of a graph that is consistent across tasks.
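The leakage channel is easy to reproduce in isolation. In training mode, batch normalization normalizes with statistics of the current mini-batch (as, for example, PyTorch's `BatchNorm1d` does in train mode). The normalized value of one fixed edge therefore depends on how many positive edges share its batch. The scores and batch compositions below are hypothetical.

```python
import numpy as np

def batch_norm(scores, eps=1e-5):
    """Train-mode batch normalization of a 1D score vector (no learned scale/shift)."""
    return (scores - scores.mean()) / np.sqrt(scores.var() + eps)

edge_score = 0.0  # raw score of one fixed candidate edge (hypothetical)
batch_a = np.array([edge_score] + [1.0] * 3 + [-1.0] * 12)  # few positive edges in the batch
batch_b = np.array([edge_score] + [1.0] * 12 + [-1.0] * 3)  # many positive edges in the batch
out_a = batch_norm(batch_a)[0]
out_b = batch_norm(batch_b)[0]
```

The same edge is pushed toward "positive" in one batch and toward "negative" in the other. A model can exploit this when positives and negatives are sampled in fixed proportions per batch.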

其他神经网络|深度学习|模型|建模(18篇)

【1】On the Learning Curves of Revenue Maximization
标题：关于收入最大化的学习曲线
链接：https://arxiv.org/abs/2604.26922

作者：Steve Hanneke,Alkis Kalavasis,Shay Moran,Grigoris Velegkas
备注：To appear in the 58th ACM Symposium on Theory of Computing (STOC 2026)
摘要：学习曲线是监督学习中的一个基本概念，它描述了算法性能如何随数据增多而提升，并为其泛化能力提供了定量度量。形式上，学习曲线刻画了在固定的潜在分布下，算法误差随训练样本数量增加而衰减的情况。以往关于收入最大化学习算法的工作，从Cole和Roughgarden的开创性工作[STOC, 2014]开始，采用的是与学习理论中PAC学习框架相对应的无分布视角。这种方法针对最困难的估值分布序列（每个样本量对应一个分布）评估性能，实际上定义了所有可能分布上学习曲线的上包络，因此得到的误差界无法刻画学习曲线的形状。在这项工作中，我们开启了收入最大化学习曲线的研究，并在单物品、单买家的基本设置下，对其衰减速率给出了近乎完整的刻画。在对估值分布不施加任何限制时，我们证明存在一个贝叶斯一致的算法，即对任意估值分布，当样本数$n \to \infty$时其学习曲线收敛到零。然而，即使最优收入有限，这种收敛也必然可以任意缓慢。相反，如果最优收入可由某个有限价格实现，那么最优衰减速率大约为$1/\sqrt{n}$。最后，对于支撑在离散值集上的分布，我们证明学习曲线几乎以指数速度衰减，而这一速率在PAC框架下无法达到。
摘要：Learning curves are a fundamental primitive in supervised learning, describing how an algorithm's performance improves with more data and providing a quantitative measure of its generalization ability. Formally, a learning curve plots the decay of an algorithm's error for a fixed underlying distribution as a function of the number of training samples. Prior work on revenue-maximizing learning algorithms, starting with the seminal work of Cole and Roughgarden [STOC, 2014], adopts a distribution-free perspective, which parallels the PAC learning framework in learning theory. This approach evaluates performance against the hardest possible sequence of valuation distributions, one for each sample size, effectively defining the upper envelope of learning curves over all possible distributions, thus leading to error bounds that do not capture the shape of the learning curves. In this work we initiate the study of learning curves for revenue maximization and provide a near-complete characterization of their rate of decay in the basic setting of a single item and a single buyer. In the absence of any restriction on the valuation distribution, we show that there exists a Bayes-consistent algorithm, meaning that its learning curve converges to zero for any arbitrary valuation distribution as the number of samples $n \to \infty$. However, this convergence must be arbitrarily slow, even if the optimal revenue is finite. In contrast, if the optimal revenue is achieved by a finite price, then the optimal rate of decay is roughly $1/\sqrt{n}$. Finally, for distributions supported on discrete sets of values, we show that learning curves decay almost exponentially fast, a rate unattainable under the PAC framework.
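For readers new to the setting, the simplest learner for a single item and a single buyer is empirical revenue maximization: post the sampled value that would have earned the most on the samples. The sketch below is that plain rule, included only for illustration. It is not the Bayes-consistent algorithm from the paper, and the guarded variants used in prior work are omitted.

```python
import numpy as np

def erm_price(samples):
    """Empirical revenue maximization: choose the sampled value with the highest empirical revenue."""
    v = np.sort(np.asarray(samples, dtype=float))
    n = len(v)
    # Posting price v[i] sells to every sampled buyer with value >= v[i]; after sorting that is n - i
    # buyers (for duplicated values the first copy has the correct count and wins the argmax).
    revenues = v * (n - np.arange(n)) / n
    best = int(np.argmax(revenues))
    return v[best], revenues[best]

price, revenue = erm_price([1, 2, 3, 4])
```

On the samples [1, 2, 3, 4], prices 2 and 3 both earn 1.5 per buyer; `np.argmax` returns the first maximum, so the lower price is chosen.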

【2】Causal Learning with Neural Assemblies
标题：基于神经集合体的因果学习
链接：https://arxiv.org/abs/2604.26919

作者：Evangelia Kopadi,Dimitris Kalles
备注：8 pages, 11 figures
摘要：神经集合体（一组共同放电、并通过共同激活而增强的神经元）能否学习变量之间因果影响的方向？尽管神经集合体已被确立为可用于分类、句法分析和规划的计算通用基底，但尚无研究表明它能内化因果方向性。我们证明，神经集合体的固有操作（投影、局部可塑性控制和稀疏赢家选择）足以实现方向性学习。我们提出了DIRECT（DIRectional Edge Coupling/Training，定向边耦合/训练），这一机制在自适应增益调度下共同激活源集合体和目标集合体，从而内化有向关系。与基于反向传播的方法不同，DIRECT仅依赖局部可塑性，使得所得因果结论在机制层面上可审计。我们通过双重读出验证策略检验了结果：（i）突触强度不对称性，测量前向与反向连接之间涌现出的权重差距；（ii）功能传播重叠度，量化定向信号流的可靠性。在多个领域中，该框架在有监督、结构已知的设置下实现了完美的结构恢复。这些结果确立了神经集合体作为生物学上合理的动力学与形式化因果模型之间可审计的桥梁，提供了一个“设计即可解释”的框架，其中因果结论可追溯到特定的神经赢家和突触不对称性。
摘要：Can Neural Assemblies -- groups of neurons that fire together and strengthen through co-activation -- learn the direction of causal influence between variables? While established as a computationally general substrate for classification, parsing, and planning, neural assemblies have not yet been shown to internalize causal directionality. We demonstrate that the inherent operations of neural assemblies -- projection, local plasticity control, and sparse winner selection -- are sufficient for directional learning. We introduce DIRECT (DIRectional Edge Coupling/Training), a mechanism that co-activates source and target assemblies under an adaptive gain schedule to internalize directed relations. Unlike backpropagation-based methods, DIRECT relies solely on local plasticity, making the resulting causal claims auditable at the mechanism level. Our findings are verified through a dual-readout validation strategy: (i) synaptic-strength asymmetry, measuring the emergent weight gap between forward and reverse links, and (ii) functional propagation overlap, quantifying the reliability of directional signal flow. Across multiple domains, the framework achieves perfect structural recovery under a supervised, known-structure setting. These results establish neural assemblies as an auditable bridge between biologically plausible dynamics and formal causal models, offering an "explainable by design" framework where causal claims are traceable to specific neural winners and synaptic asymmetries.

【3】Multiple Additive Neural Networks for Structured and Unstructured Data
标题：面向结构化与非结构化数据的多重加性神经网络
链接：https://arxiv.org/abs/2604.26888

作者：Janis Mohr,Jörg Frochte
备注：Accepted author manuscript; page layout differs from the published Springer version
摘要：本文扩展并阐释了多重加性神经网络（MANN）方法，这是对传统梯度提升框架的一种增强，它使用近乎浅层的神经网络而非决策树作为基学习器。这种创新方法利用神经网络架构，特别是卷积神经网络（CNN）和胶囊神经网络，将其应用扩展到结构化数据以及图像和音频等非结构化数据。对于结构化数据，利用了胶囊神经网络作为特征提取器的优势，并将其与作为分类器的MANN相结合。MANN独特的架构促进了持续学习，并集成了先进的启发式方法来对抗过拟合，确保鲁棒性并降低对学习率和迭代次数等超参数设置的敏感性。我们的实证研究表明，MANN在知名数据集上的准确率超过了极端梯度提升（XGB）等传统方法。这项研究证明了MANN卓越的精度和泛化能力，使其成为适用于多种数据类型和复杂学习环境的通用工具。
摘要：This paper extends and explains the Multiple Additive Neural Networks (MANN) methodology, an enhancement to the traditional Gradient Boosting framework, utilizing nearly shallow neural networks instead of decision trees as base learners. This innovative approach leverages neural network architectures, notably Convolutional Neural Networks (CNNs) and Capsule Neural Networks, to extend its application to both structured data and unstructured data such as images and audio. For structured data the advantages of capsule neural networks as feature extractors are used and combined with MANN as a classifier. MANN's unique architecture promotes continuous learning and integrates advanced heuristics to combat overfitting, ensuring robustness and reducing sensitivity to hyperparameter settings like learning rate and iterations. Our empirical studies reveal that MANN surpasses traditional methods such as Extreme Gradient Boosting (XGB) in accuracy across well-known datasets. This research demonstrates MANN's superior precision and generalizability, making it a versatile tool for diverse data types and complex learning environments.

【4】Unifying Runtime Monitoring Approaches for Safety-Critical Machine Learning: Application to Vision-Based Landing
标题：统一安全关键机器学习的运行时监控方法：应用于基于视觉的着陆
链接：https://arxiv.org/abs/2604.26411

作者：Mathieu Dario,Florent Chenevier,Kévin Delmas,Joris Guerin,Jérémie Guiochet
备注：15 pages, 5 figures, 3 tables, submitted to ICPR 2026
摘要：运行时监控对于确保安全关键领域中ML应用的安全至关重要。然而，目前的研究是分散的，不同社区各自提出了独立的方法。在本文中，我们提出了一个统一框架，将运行时监控方法分为三种不同类型：操作设计域（ODD）监控，确保符合预期的运行条件；分布外（OOD）监控，拒绝偏离训练数据的输入；以及模型范围外（OMS）监控，根据模型的内部状态或输出检测异常的模型行为。我们通过一个针对航空安全关键应用（着陆过程中的跑道检测）的专门实验，展示了这种分类的好处。该框架有助于借助互补的监控器类别来设计监控活动，并支持使用通用的、面向安全的指标评估和比较不同的监控器。
摘要：Runtime monitoring is essential to ensure the safety of ML applications in safety-critical domains. However, current research is fragmented, with independent methods emerging from different communities. In this paper, we propose a unified framework categorising runtime monitoring approaches into three distinct types: Operational Design Domain (ODD) monitoring, which ensures compliance with expected operating conditions; Out-of-Distribution (OOD) monitoring, which rejects inputs that deviate from the training data; and Out-of-Model-Scope (OMS) monitoring, which detects anomalous model behaviour based on its internal states or outputs. We demonstrate the benefits of this categorization with a dedicated experiment on an aeronautical safety-critical application: runway detection during landing. This framework facilitates design of monitoring activities, with complementary categories of monitors, and enables evaluation and comparison of different monitors using common, safety-oriented metrics.
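To make the three-way taxonomy concrete, here is a minimal sketch with one deliberately simple stand-in per category:

- **ODD:** range checks on operating conditions.
- **OOD:** a per-feature z-score against training statistics.
- **OMS:** the maximum softmax probability of the model's output.

These stand-ins, the thresholds, and all function names are hypothetical illustrations, not the monitors evaluated in the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class MonitorVerdicts:
    odd_ok: bool  # operating conditions inside the declared ODD
    ood_ok: bool  # input features close to the training distribution
    oms_ok: bool  # model behaviour looks nominal

    @property
    def accept(self) -> bool:
        # The prediction is used only if all three monitor families agree.
        return self.odd_ok and self.ood_ok and self.oms_ok


def odd_monitor(conditions: Dict[str, float],
                bounds: Dict[str, Tuple[float, float]]) -> bool:
    # ODD: every specified operating condition must lie inside its range.
    return all(lo <= conditions[name] <= hi for name, (lo, hi) in bounds.items())


def ood_monitor(features: Sequence[float], mean: Sequence[float],
                std: Sequence[float], z_max: float = 3.0) -> bool:
    # OOD: reject inputs whose features are far from the training statistics
    # (per-feature z-score against the training mean and std).
    return all(abs(f - m) / s <= z_max for f, m, s in zip(features, mean, std))


def softmax(logits: Sequence[float]) -> list:
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def oms_monitor(logits: Sequence[float], min_confidence: float = 0.7) -> bool:
    # OMS: flag anomalous model behaviour from its outputs, here a low
    # maximum softmax probability.
    return max(softmax(logits)) >= min_confidence


def run_monitors(conditions, bounds, features, mean, std, logits) -> MonitorVerdicts:
    return MonitorVerdicts(
        odd_ok=odd_monitor(conditions, bounds),
        ood_ok=ood_monitor(features, mean, std),
        oms_ok=oms_monitor(logits),
    )
```

Keeping the three verdicts separate, rather than merging them into one score, mirrors the framework's point: each category catches a different failure and can be evaluated on its own.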

【5】Beyond Fixed Formulas: Data-Driven Linear Predictor for Efficient Diffusion Models
标题：超越固定公式：高效扩散模型的数据驱动线性预测器
链接：https://arxiv.org/abs/2604.26365

作者：Zhirong Shen,Rui Huang,Jiacheng Liu,Chang Zou,Peiliang Cai,Shikang Zheng,Zhengyi Shi,Liang Feng,Linfeng Zhang
备注：Accepted by CVPR 2026
摘要：为了解决扩散Transformer（DiT）的高采样成本，特征缓存提供了一种无需训练的加速方法。然而，现有方法依赖于手工设计的预测公式，在激进跳步下会失效。我们提出了L2P（可学习线性预测器），一个简单的数据驱动缓存框架，用可学习的逐时间步权重取代固定系数。L2P在单个GPU上约20秒即可完成训练，能够根据过去的轨迹准确重建当前特征。L2P显著优于现有基线：它在FLUX.1-dev上实现了4.55倍的FLOPs减少和4.15倍的延迟加速，并在Qwen-Image模型上高达7.18倍的加速下保持了较高的视觉保真度，而先前的方法在此情况下出现明显的质量下降。我们的结果表明，学习线性预测器对于高效的DiT推理非常有效。代码可在https://github.com/Aredstone/L2P-Cache上获得。
摘要：To address the high sampling cost of Diffusion Transformers (DiTs), feature caching offers a training-free acceleration method. However, existing methods rely on hand-crafted forecasting formulas that fail under aggressive skipping. We propose L2P (Learnable Linear Predictor), a simple data-driven caching framework that replaces fixed coefficients with learnable per-timestep weights. Rapidly trained in ~20 seconds on a single GPU, L2P accurately reconstructs current features from past trajectories. L2P significantly outperforms existing baselines: it achieves a 4.55x FLOPs reduction and 4.15x latency speedup on FLUX.1-dev, and maintains high visual fidelity under up to 7.18x acceleration on Qwen-Image models, where prior methods show noticeable quality degradation. Our results show learning linear predictors is highly effective for efficient DiT inference. Code is available at https://github.com/Aredstone/L2P-Cache.
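A fixed forecasting formula predicts the current feature as a fixed linear combination of cached past features. For instance, first-order extrapolation from the two most recent features, $f_t \approx f_{t-1} + (f_{t-1} - f_{t-2})$, is the fixed weight vector $w = (2, -1)$. The sketch below shows the "learnable per-timestep weights" idea in its simplest form: for one timestep, it fits $w$ by least squares on a few calibration trajectories. The array layout and function names are assumptions, not the L2P-Cache code.

```python
import numpy as np


def fit_step_weights(history: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Fit K weights for one timestep from N calibration trajectories.

    history: (N, K, D) features cached at the K most recent fully computed
             steps, ordered most recent first
    target:  (N, D)    feature actually computed at the current timestep
    Returns w of shape (K,) with sum_k w[k] * history[:, k] ~= target.
    """
    n, k, d = history.shape
    # Every feature entry of every trajectory gives one least-squares
    # equation in the K unknown weights.
    A = history.transpose(0, 2, 1).reshape(n * d, k)
    b = target.reshape(n * d)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w


def predict_feature(history_one: np.ndarray, w: np.ndarray) -> np.ndarray:
    """history_one: (K, D) cached features of one sample -> (D,) forecast."""
    return w @ history_one
```

In a full cache, you would fit and store one weight vector per skipped timestep, for example in a dict keyed by timestep. At inference, each skipped step would then cost a single weighted sum instead of a transformer forward pass.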

【6】Asymptotically Robust Learning-Augmented Algorithms for Preemptive FIFO Buffer Management
标题：用于抢占式先进先出缓冲区管理的渐近鲁棒学习增强算法
链接：https://arxiv.org/abs/2604.26349

作者：Wen-Han Hsieh,Ya-Chun Liang
摘要：我们为抢占式FIFO缓冲区管理问题提出了一种学习增强的在线算法：数据包在线到达一个容量有限的缓冲区，必须按FIFO顺序传输，且算法可以抢占式地丢弃已缓冲的数据包以容纳未来到达的数据包。我们的算法同时实现了1-一致性、η-平滑性和渐近$\sqrt{3}$-鲁棒性，其中η表示预测误差。具体而言，它在完美预测下达到最优竞争比1，随着预测误差增大而平滑退化，并在任意不准确的预测下保持渐近竞争比$\sqrt{3}$，与Englert和Westermann于2009年为经典在线问题建立的最佳已知最坏情况保证相匹配[Algorithmica 53(4): 523-548]。我们工作的一个关键技术贡献是引入了\emph{基于输出的预测误差度量}。由于容量约束决定了只有严格有界的一部分到达数据包最终会被传输，我们的度量在由此产生的最优调度上评估预测质量，而不是在原始输入序列上评估，从而避免人为的误差惩罚。为了保证鲁棒性，我们的算法动态监控预测，并在切换到最坏情况回退机制时执行\emph{缓冲区清空策略}。我们证明，这一清空操作带来的竞争损失以一个渐近消失的加性容量常数为界。最后，我们表明该算法为学习增强的缓冲区管理提供了一个通用框架：将回退模块替换为任意β-竞争在线算法，立即得到渐近β-鲁棒性。
摘要：We present a learning-augmented online algorithm for the preemptive FIFO buffer management problem, where packets arrive online to a finite-capacity buffer, must be transmitted in FIFO order, and the algorithm may preemptively discard buffered packets to accommodate future arrivals. Our algorithm simultaneously achieves 1-consistency, η-smoothness, and asymptotic $\sqrt{3}$-robustness, where η denotes the prediction error. Specifically, it attains an optimal competitive ratio of 1 under perfect predictions, degrades smoothly as the prediction error increases, and maintains an asymptotic competitive ratio of $\sqrt{3}$ under arbitrarily inaccurate predictions, matching the best-known worst-case guarantee for the classical online problem, established by Englert and Westermann in 2009 [Algorithmica 53(4): 523-548]. A key technical contribution of our work is the introduction of an \emph{output-based prediction error metric}. Because capacity constraints dictate that only a strictly bounded subset of arriving packets is ultimately transmitted, our metric assesses prediction quality over the resulting optimal schedules rather than the raw input sequences, avoiding artificial error penalties. To guarantee robustness, our algorithm dynamically monitors predictions and executes a \emph{buffer-clearing strategy} upon transitioning to a worst-case fallback mechanism. We prove that the competitive loss incurred by this clearing operation is bounded by an additive capacity constant that vanishes asymptotically. Finally, we show that our algorithm provides a generalized framework for learning-augmented buffer management: substituting the fallback module with any β-competitive online algorithm immediately yields asymptotic β-robustness.
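For readers new to the problem, this sketch simulates the underlying online setting: a finite buffer, FIFO transmission, and optional preemption. It uses a simple greedy rule, sketched under these assumptions:

- **Preemption rule:** when the buffer is full, drop the cheapest buffered packet if the new one is worth more.
- **Time model (assumed):** all arrivals of a step happen first, then one packet is transmitted, and leftovers are drained at the end.

The greedy rule is only a stand-in for a fallback module. It is not the paper's learning-augmented algorithm, and no competitive ratio is claimed for it here.

```python
from typing import List, Sequence


def greedy_fifo_preemptive(arrivals: Sequence[Sequence[float]], capacity: int) -> float:
    """Total value transmitted by a greedy preemptive FIFO buffer.

    arrivals[t] holds the values of packets arriving in time step t; after a
    step's arrivals, the head-of-line packet is transmitted.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    buffer: List[float] = []  # kept in FIFO (arrival) order
    total = 0.0
    for batch in arrivals:
        for value in batch:
            if len(buffer) < capacity:
                buffer.append(value)
                continue
            # Buffer full: preempt the cheapest buffered packet if the new one is worth more.
            i_min = min(range(len(buffer)), key=buffer.__getitem__)
            if buffer[i_min] < value:
                del buffer[i_min]  # remaining packets keep their FIFO order
                buffer.append(value)
            # otherwise the arriving packet is dropped
        if buffer:
            total += buffer.pop(0)  # transmit the head-of-line packet
    # After the last arrival step, drain what is left (one packet per step).
    total += sum(buffer)
    return total
```

With capacity 2 and a single burst `[[1, 1, 5]]`, the rule preempts one low-value packet, and the call returns `6.0`. Without preemption, only `2.0` would be transmitted.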

【7】NeuroPlastic: A Plasticity-Modulated Optimizer for Biologically Inspired Learning Dynamics
标题：NeuroPlastic：生物启发学习动力学的可塑性调节优化器
链接：https://arxiv.org/abs/2604.26297

作者：Douglas Jiang,Yuechen Wang,Jiayi Wang,Jiaying Geng,Qinglong Wang,Feng Tian
备注：16 pages, 7 figures
摘要：优化算法是现代深度学习的基础，但最广泛使用的方法主要依赖于基于局部梯度统计的更新规则。我们提出了NeuroPlastic，一种可塑性调制优化器，它借助一种自适应多信号调制机制来增强基于梯度的更新，该机制受到神经生物学中多因素突触可塑性概念的启发。NeuroPlastic使用捕获梯度、类活动和类记忆统计量的交互组件动态缩放梯度更新，形成一个与标准深度学习训练流水线兼容的轻量级调制层。在各项图像分类基准上，NeuroPlastic始终优于受控的仅梯度消融版本，在Fashion-MNIST基准和数据缩减场景中增益更为明显。在使用ResNet-18的CIFAR-10迁移实验中，该方法无需重新调参即可保持稳定且具有竞争力。这些结果表明，受多信号可塑性启发的调制可以为传统的梯度驱动优化提供有用的扩展，特别是在学习信号有限或有噪声时，并为深度学习中基于梯度的方法指出了一个有前途的方向。
摘要：Optimization algorithms are fundamental to modern deep learning, yet most widely used methods rely on update rules based primarily on local gradient statistics. We introduce NeuroPlastic, a plasticity-modulated optimizer that augments gradient-based updates with an adaptive multi-signal modulation mechanism inspired by multi-factor synaptic plasticity, a concept from neurobiology. NeuroPlastic dynamically scales gradient updates using interacting components that capture gradient, activity-like, and memory-like statistics, forming a lightweight modulation layer compatible with standard deep learning training pipelines. Across image classification benchmarks, NeuroPlastic consistently improves over a controlled gradient-only ablation, with more pronounced gains on the Fashion-MNIST benchmark and in reduced-data regimes. In transfer experiments on CIFAR-10 with ResNet-18, the method remains stable and competitive without retuning. These results suggest that multi-signal plasticity-inspired modulation can provide a useful extension to conventional gradient-driven optimization, particularly when learning signals are limited or noisy, and offer a promising direction for gradient-based methods in deep learning.

【8】OMEGA: Optimizing Machine Learning by Evaluating Generated Algorithms
标题：OMEK：通过评估生成的算法优化机器学习
链接：https://arxiv.org/abs/2604.26211

作者：Jeremy Nixon,Annika Singh
备注：ICLR 2026: Workshop on AI with Recursive Self-Improvement
摘要：为了实现人工智能研究的自动化，我们引入了一个完整的端到端框架OMEGA（Optimizing Machine learning by Evaluating Generated Algorithms，通过评估生成的算法优化机器学习），它从想法生成开始，以可执行代码结束。我们的系统将结构化元提示工程与可执行代码生成相结合，以创建新的ML分类器。OMEGA框架已被用于生成若干新颖算法，这些算法在20个基准数据集（infinity-bench）构成的稳健测试集上优于scikit-learn基线。您可以通过python包访问本文讨论的模型及更多模型：pip install omega-models。
摘要：In order to automate AI research we introduce a full, end-to-end framework, OMEGA: Optimizing Machine learning by Evaluating Generated Algorithms, that starts at idea generation and ends with executable code. Our system combines structured meta-prompt engineering with executable code generation to create new ML classifiers. The OMEGA framework has been utilized to generate several novel algorithms that outperform scikit-learn baselines across a robust selection of 20 benchmark datasets (infinity-bench). You can access models discussed in this paper and more in the python package: pip install omega-models.

【9】Lifting Embodied World Models for Planning and Control
标题：提升具身世界模型以用于规划与控制
链接：https://arxiv.org/abs/2604.26182

作者：Alex N. Wang,Trevor Darrell,Pavel Izmailov,Yutong Bai,Amir Bar
摘要：具身智能体的世界模型根据智能体所采取的动作预测未来观测。对于复杂的具身形态，动作空间是高维的且难以指定：例如，精确控制一个人形智能体需要指定每个关节的运动。这使得世界模型难以控制，且用于规划的代价高昂，因为基于搜索的方法（如CEM）随动作维度的增加扩展性很差。为了解决这个问题，我们训练了一个轻量级策略，将高层动作映射为低层关节动作序列。将该策略与冻结的世界模型组合，得到一个提升世界模型，它可以根据单个高层动作预测未来观测序列。我们将该框架实例化于类人的具身形态，将高层动作空间定义为在当前观测帧上标注的一小组2D路点，每个路点指定一个叶关节（骨盆、头部、双手）的近期目标位置。路点维度低、视觉上可解释，并且易于手动指定或搜索。我们表明，提升世界模型大幅优于直接在低层关节空间中搜索（到目标姿态的平均关节误差低$3.8\times$），同时计算效率更高，并能泛化到策略未见过的环境。
摘要：World models of embodied agents predict future observations conditioned on an action taken by the agent. For complex embodiments, action spaces are high-dimensional and difficult to specify: for example, precisely controlling a human agent requires specifying the motion of each joint. This makes the world model hard to control and expensive to plan with as search-based methods like CEM scale poorly with action dimensionality. To address this issue, we train a lightweight policy that maps high-level actions to sequences of low-level joint actions. Composing this policy with the frozen world model produces a lifted world model that predicts a sequence of future observations from a single high-level action. We instantiate this framework for a human-like embodiment, defining the high-level action space as a small set of 2D waypoints annotated on the current observation frame, each specifying a near-term goal position for a leaf joint (pelvis, head, hands). Waypoints are low-dimensional, visually interpretable, and easy to specify manually or to search over. We show that the lifted world model substantially outperforms searching directly in low-level joint space ($3.8\times$ lower mean joint error to the goal pose), while remaining more compute-efficient and generalizing to environments unseen by the policy.
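Because the lifted action is just a handful of 2D waypoints, a search method like CEM (cross-entropy method) only has to explore a low-dimensional vector. The sketch below is a generic CEM loop:

- **Action:** an 8-dimensional vector, a hypothetical choice of four joints × (x, y).
- **Cost:** a toy quadratic that stands in for rolling out the lifted world model and measuring the distance to the goal pose.

The function name, hyperparameters, and cost are illustrative assumptions, not the paper's planner.

```python
import numpy as np


def cem_search(cost_fn, dim, iters=30, pop=64, elite_frac=0.125, init_std=1.0, seed=0):
    """Cross-entropy method: iteratively refit a diagonal Gaussian to the elites."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, init_std)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        # Sample candidate actions (here: flattened waypoint coordinates).
        samples = mean + std * rng.standard_normal((pop, dim))
        costs = np.array([cost_fn(s) for s in samples])
        # Keep the lowest-cost candidates and refit the sampling distribution.
        elites = samples[np.argsort(costs)[:n_elite]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6  # small floor avoids a degenerate Gaussian
    return mean
```

Each CEM iteration calls the cost, and hence the world model, `pop` times. This is why the dimensionality of the searched action matters so much for planning cost.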

【10】Budget-Constrained Causal Bandits: Bridging Uplift Modeling and Sequential Decision-Making
标题：预算约束的因果老虎机：连接增益（Uplift）建模与序贯决策
链接：https://arxiv.org/abs/2604.26169

作者：Abhirami Pillai
备注：12 pages, 2 figures, preprint
摘要：预算约束下的处理分配是数字广告的核心挑战：广告主必须决定向哪些用户展示广告，同时明智地使用有限的预算。标准方法遵循两阶段离线流程：首先收集历史数据以估计异质处理效应（HTE），然后求解约束优化问题来分配预算。这种方法在数据充足时效果良好，但在冷启动场景（例如新活动、新市场或新客户群，几乎没有历史数据）中会失效。我们提出预算约束因果老虎机（BCCB），一个在线框架，它在花费预算的同时学习哪些用户会对广告做出响应，逐个用户地做出处理决策。BCCB将三个组成部分统一到单一的序贯过程中：学习个体层面的广告效果、探索响应不确定的用户，以及随时间调控预算节奏。我们在Criteo Uplift数据集上进行了评估，这是一个来自真实随机对照试验的大规模广告数据集。我们的关键发现是数据效率的交叉点：离线方法需要大约10,000条历史观测才能产生可靠结果，而BCCB从第一个用户开始就能有效运行。此外，BCCB在不同运行之间的性能方差低3到5倍，使其在实际活动规划中更加实用。在纯在线方法中，BCCB在所有测试的预算水平上始终优于标准汤普森采样、带预算的汤普森采样和贪婪HTE估计。
摘要：Treatment allocation under budget constraints is a central challenge in digital advertising: advertisers must decide which users to show ads to while spending a limited budget wisely. The standard approach follows a two-stage offline pipeline - first collect historical data to estimate heterogeneous treatment effects (HTE), then solve a constrained optimization to allocate the budget. This works well with abundant data, but fails in cold-start settings such as new campaigns, new markets, or new customer segments where little historical data exists. We propose Budget-Constrained Causal Bandits (BCCB), an online framework that learns which users respond to ads while simultaneously spending the budget, making treatment decisions one user at a time. BCCB unifies three components into a single sequential process: learning individual-level ad effectiveness, exploring users whose response is uncertain, and pacing the budget over time. We evaluated on the Criteo Uplift dataset, a large-scale advertising dataset from a real randomized controlled trial. Our key finding is a data-efficiency crossover: offline methods require approximately 10,000 historical observations to produce reliable results, while BCCB operates effectively from the very first user. Furthermore, BCCB exhibits 3-5x lower performance variance between runs, making it more practical for real campaign planning. Among purely online methods, BCCB consistently outperforms standard Thompson Sampling, budgeted Thompson Sampling, and greedy HTE estimation across all budget levels tested.
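To illustrate how learning, exploration, and pacing can live in one sequential loop, here is a deliberately simplified sketch under these assumptions:

- **Users:** grouped into discrete segments rather than modeled individually.
- **Effect model:** Beta-Bernoulli posteriors over each segment's conversion rate, with and without the ad.
- **Exploration:** Thompson sampling on the uplift.
- **Pacing:** a linear schedule that caps cumulative spend.

This is a hypothetical illustration, not BCCB, which learns individual-level effects. Class and method names are invented.

```python
import random


class BudgetedUpliftTS:
    """Hypothetical segment-level Thompson sampling on uplift with budget pacing."""

    def __init__(self, n_segments: int, budget: float, horizon: int,
                 cost: float = 1.0, seed: int = 0):
        self.rng = random.Random(seed)
        # Beta(1, 1) priors, indexed [segment][arm] -> [alpha, beta]; arm 1 = treat, 0 = control.
        self.post = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(n_segments)]
        self.budget, self.horizon, self.cost = budget, horizon, cost
        self.spent, self.t = 0.0, 0

    def decide(self, segment: int) -> bool:
        self.t += 1
        a_c, b_c = self.post[segment][0]
        a_t, b_t = self.post[segment][1]
        # Exploration: sample a plausible uplift from the posteriors.
        uplift = self.rng.betavariate(a_t, b_t) - self.rng.betavariate(a_c, b_c)
        # Pacing: cumulative spend may not run ahead of a linear schedule.
        allowance = self.budget * self.t / self.horizon
        can_afford = self.spent + self.cost <= min(self.budget, allowance)
        treat = uplift > 0 and can_afford
        if treat:
            self.spent += self.cost
        return treat

    def update(self, segment: int, treated: bool, converted: bool) -> None:
        # Learning: only the arm actually taken is observed for this user.
        arm = 1 if treated else 0
        self.post[segment][arm][0 if converted else 1] += 1.0
```

Untreated users double as the control group, so the control posteriors keep learning even while the budget is being paced.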

【11】reward-lens: A Mechanistic Interpretability Library for Reward Models
标题：奖励镜头：奖励模型的机械解释性库
链接：https://arxiv.org/abs/2604.26130

作者：Mohammed Suhail B Nadaf
备注：30 pages, 5 figures, 9 tables, including appendix. Library available at https://github.com/suhailnadaf509/reward-lens (pip install reward-lens)
摘要：每个经RLHF训练的语言模型都由奖励模型塑造，但机制可解释性工具包（logit lens、直接logit归因、激活修补、稀疏自编码器）是为生成式LLM构建的，其原语都投影到词表反嵌入（unembedding）上。奖励模型用标量回归头取代了它，使每种工具都失效。我们提出reward-lens，一个将该工具包移植到奖励模型的开源库，它围绕一个观察组织：奖励头的权重向量$w_r$是每个可解释性问题的自然坐标轴。该库提供Reward Lens、组件归因、三模式激活修补、奖励黑客探测套件、TopK SAE特征归因、跨模型比较，以及五个有理论依据的扩展（失真指数、分歧感知修补、错位级联检测、奖励项冲突分析、概念向量分析）。一个包含十个方法的适配器协议覆盖Llama、Mistral、Gemma-2和ArmoRM多目标头，并提供适用于任何HuggingFace序列分类模型的通用适配器。我们在约695个RewardBench样本对上验证了两个生产级奖励模型。主要的实证发现是负面的：线性归因不能预测因果修补效应（Skywork上平均Spearman $\rho = -0.256$，ArmoRM上为$-0.027$）。该框架将这种分歧视为需要暴露的属性而非缺陷，这促成了一种让观察性视角和因果性视角都处于一等地位且可直接比较的设计。
摘要：Every RLHF-trained language model is shaped by a reward model, yet the mechanistic interpretability toolkit -- logit lens, direct logit attribution, activation patching, sparse autoencoders -- was built for generative LLMs whose primitives all project onto a vocabulary unembedding. Reward models replace that with a scalar regression head, breaking each tool. We present reward-lens, an open-source library that ports this toolkit to reward models, organised around one observation: the reward head's weight vector $w_r$ is the natural axis for every interpretability question. The library provides a Reward Lens, component attribution, three-mode activation patching, a reward-hacking probe suite, TopK SAE feature attribution, cross-model comparison, and five theory-grounded extensions (distortion index, divergence-aware patching, misalignment cascade detection, reward-term conflict analysis, concept-vector analysis). A ten-method adapter protocol covers Llama, Mistral, Gemma-2, and ArmoRM multi-objective heads, with a generic adapter for any HuggingFace sequence classification model. We validate on two production reward models across ~695 RewardBench pairs. The central empirical finding is negative: linear attribution does not predict causal patching effects (mean Spearman $\rho = -0.256$ on Skywork, $-0.027$ on ArmoRM). The framework treats this disagreement as a property to expose, not a bug -- motivating a design that keeps observational and causal views first-class and directly comparable.
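By analogy with the logit lens, the core idea of reading every layer along the reward head's axis $w_r$ can be sketched in a few lines. The sketch also shows how a Spearman $\rho$ between attribution scores and patching effects, the statistic reported above, is computed. This is plain NumPy under stated assumptions, not the reward-lens library API:

- **Final normalization:** many transformer reward models apply a final normalization before the head, and this sketch skips it.
- **Ties:** the rank correlation ignores ties.

```python
import numpy as np


def reward_lens(hidden_states: np.ndarray, w_r: np.ndarray, b_r: float = 0.0) -> np.ndarray:
    """Project each layer's final-token residual stream onto the reward head.

    hidden_states: (n_layers, d_model); returns (n_layers,) per-layer reward readouts.
    """
    return hidden_states @ w_r + b_r


def spearman_rho(x, y) -> float:
    """Spearman correlation as the Pearson correlation of ranks (no tie correction)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))
```

A $\rho$ near zero or negative, as reported for Skywork and ArmoRM, means that ranking components by linear attribution says little about which components matter causally under patching.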

【12】NeuralEmu: in situ Measurement-Driven, ML-based, High-Fidelity 5G Network Emulation
标题：NeuralEmu：现场测量驱动、基于ML、高保真5G网络仿真
链接：https://arxiv.org/abs/2604.26080

作者：Haoran Wan,Yaxiong Xie,Kyle Jamieson
摘要：当前和未来的应用需要超低延迟和稳定的吞吐量，但它们经常穿越5G蜂窝网络，因此必须应对不稳定的数据包动态，因为5G基站调度器会根据用户工作负载和无线信道条件动态做出反应。在这些环境中评估网络算法受到现有工具的掣肘：记录-重放仿真器切断了应用端点与商业运营商专有5G调度器之间存在的反馈交互，而全栈模拟器依赖过于简化的调度逻辑。为弥合这一现实差距，我们提出NeuralEmu，一个高保真、基于机器学习的仿真框架，它直接从极高分辨率的网络遥测工具中学习复杂的5G调度器资源分配行为。作为首个支持多客户端的仿真器，NeuralEmu利用机器学习，根据瞬时用户缓冲区占用和信道状态动态预测资源块分配和调制方案。为了捕获真实的跨用户竞争，一个流量重建模型对蜂窝网络调度结果进行反演，以恢复不受控背景用户的底层流量模式。NeuralEmu以高性能Linux中间盒仿真器的形式实现，在多种网络应用上相对最新技术降低了仿真误差，降幅包括但不限于：网页加载时间55%、WebRTC编码器比特率57%、云游戏数据包单向时延51%，为未来的实时交互网络协议和应用提供了一个准确、标准化的测试平台。
摘要：Current and future applications demand ultra-low latency and consistent throughput, yet frequently traverse 5G cellular networks and so must cope with volatile packet dynamics, as 5G base station schedulers dynamically react to user workloads and wireless channel conditions. The task of evaluating network algorithms in these environments is hamstrung by current tools: record-and-replay emulators sever the feedback interaction that exists between application end points and a commercial operator's proprietary 5G scheduler, while full-stack simulators rely on overly simplistic scheduling logic. To bridge this reality gap, we present NeuralEmu, a high-fidelity, machine learning-based emulation framework that learns complex 5G scheduler resource allocation behaviors directly from extremely high-resolution network telemetry tools. The first emulator to handle multiple clients, NeuralEmu utilizes machine learning to dynamically predict resource block allocations and modulation schemes based on instantaneous user buffer occupancy and channel states. To capture realistic cross-user contention, a traffic reconstruction model inverts cellular network scheduling results to recover the underlying traffic patterns of uncontrolled background users. Implemented as a high-performance Linux middlebox emulator, NeuralEmu reduces emulation error relative to the state of the art for various network applications including but not limited to 55% for web-page load time, 57% for WebRTC encoder bit rate, and 51% for cloud gaming packet one-way delay, providing an accurate, standardized testing ground for tomorrow's real-time interactive network protocols and applications.

【13】Laplace Approximation for Bayesian Tensor Network Kernel Machines
标题：贝叶斯张量网络核机器的拉普拉斯近似
链接：https://arxiv.org/abs/2604.26673

作者：Albert Saiapin,Kim Batselier
备注：19 pages, 3 figures, 6 tables. Code available at: https://github.com/AlbMLpy/laplace-tnkm
摘要：不确定性估计对于存在模糊或分布外输入的稳健决策至关重要。高斯过程（GP）是经典的基于核的模型，提供了原则性的不确定性量化，并在中小规模数据集上表现良好。或者，在张量网络假设下制定权重空间学习问题会产生可扩展的张量网络核机器。然而，这些假设打破了高斯性，使标准的概率推理复杂化。这就提出了一个基本问题：张量网络核机器如何提供原则性的不确定性估计？我们提出了一种新的贝叶斯张量网络核机器（LA-TNKM），采用（线性化）拉普拉斯近似贝叶斯推理。一组全面的数值实验表明，该方法在不同的UCI回归基准中始终匹配或超越高斯过程和贝叶斯神经网络（BNN），突出了其有效性和实用性。
摘要：Uncertainty estimation is essential for robust decision-making in the presence of ambiguous or out-of-distribution inputs. Gaussian Processes (GPs) are classical kernel-based models that offer principled uncertainty quantification and perform well on small- to medium-scale datasets. Alternatively, formulating the weight space learning problem under tensor network assumptions yields scalable tensor network kernel machines. However, these assumptions break Gaussianity, complicating standard probabilistic inference. This raises a fundamental question: how can tensor network kernel machines provide principled uncertainty estimates? We propose a novel Bayesian Tensor Network Kernel Machine (LA-TNKM) that employs a (linearized) Laplace approximation for Bayesian inference. A comprehensive set of numerical experiments shows that the proposed method consistently matches or surpasses Gaussian Processes and Bayesian Neural Networks (BNNs) across diverse UCI regression benchmarks, highlighting both its effectiveness and practical relevance.
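For a model that is linear in its weights, $f(x) = \phi(x)^\top w$, with Gaussian noise, the (linearized) Laplace approximation is exact and reduces to Bayesian linear regression. The posterior covariance is the inverse Hessian $\Sigma = (\Phi^\top\Phi/\sigma^2 + \lambda I)^{-1}$, and the predictive variance is $\phi(x)^\top\Sigma\,\phi(x) + \sigma^2$. The sketch below shows only that special case with plain features. For a tensor-network model, the Jacobian of the output with respect to the core parameters would play the role of $\phi(x)$. That mapping is an assumption about how the idea carries over, not the LA-TNKM code, and the function names are hypothetical.

```python
import numpy as np


def laplace_linear_regression(Phi, y, noise_var=0.1, prior_prec=1.0):
    """MAP weights and Laplace posterior covariance for f(x) = phi(x)^T w."""
    d = Phi.shape[1]
    # Hessian of the negative log posterior (Gauss-Newton is exact here).
    H = Phi.T @ Phi / noise_var + prior_prec * np.eye(d)
    w_map = np.linalg.solve(H, Phi.T @ y / noise_var)
    Sigma = np.linalg.inv(H)
    return w_map, Sigma


def predictive(phi_star, w_map, Sigma, noise_var=0.1):
    """Predictive mean and variance at test features phi_star of shape (n, d)."""
    mean = phi_star @ w_map
    # Epistemic part phi^T Sigma phi for each row, plus observation noise.
    var = np.einsum("nd,de,ne->n", phi_star, Sigma, phi_star) + noise_var
    return mean, var
```

Trained on inputs in $[-1, 1]$, the predictive variance at $x = 5$ comes out larger than at $x = 0$. This growth away from the data is the kind of uncertainty signal the abstract is after.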

【14】Probabilistic data quality assessment for structural monitoring data via outlier-resistant conditional diffusion model
标题：基于抗异常值条件扩散模型的结构监测数据概率数据质量评估
链接：https://arxiv.org/abs/2604.26366

作者：Qi Li,Yong Huang,Hui Li
备注：43 pages, 15 figures and 2 tables
摘要：数据质量评估是确保后续结构健康监测（SHM）任务可靠性的重要步骤。本文提出了一种基于预测偏差的SHM数据质量评估方法，该方法使用单变量隐式自回归模型，能够进行异常值诊断和数据清洗。所提出的条件扩散模型（CDM）在标准扩散模型的基础上增加了条件嵌入模块以引入时间上下文，采用四分位数归一化以减轻分布偏斜，并使用Huber损失以增强对异常值的鲁棒性。在该单变量隐式自回归框架内，每个数据点被赋予一个异常值概率，量化其“离群”程度，并计算全局质量评估分数以刻画数据集的整体质量。利用真实结构运行数据进行的大量案例研究表明，所提出的框架显著提高了数据质量评估的准确性，优于代表聚类、基于隔离和深度重建方法的其他强基线。消融实验和超参数分析的结果进一步证明了所提出框架的有效性和鲁棒性。
摘要：Data quality assessment is an essential step that ensures the reliability of the subsequent structural health monitoring (SHM) tasks. This study proposes a prediction deviation-based SHM data quality assessment method using a univariate implicit auto-regressive model, enabling outlier diagnosis and data cleaning. The proposed conditional diffusion model (CDM) augments the standard diffusion model with a conditional embedding module to incorporate temporal context, quartile normalization to mitigate distribution skew, and a Huber loss to enhance robustness against outliers. Within this univariate implicit autoregressive framework, each data point is assigned an outlier probability, quantifying its degree of "outlier-ness", and a global quality evaluation score is computed to characterize the overall dataset quality. Extensive case studies utilizing operational data from real-world structures demonstrate that the proposed framework significantly improves the accuracy of data quality assessment, outperforming other strong baselines representative of clustering, isolation-based, and deep reconstruction methods. The effectiveness and robustness of the proposed framework are further demonstrated by the findings of ablation experiments and hyperparameter analysis.
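Two of the robustness ingredients named above can be sketched independently of the diffusion model:

- **Quartile normalization:** read here as median/IQR scaling, which is an assumption since the paper's exact form may differ.
- **Huber loss:** the standard definition, quadratic near zero and linear in the tails.

The NumPy functions below are illustrative, not the authors' code.

```python
import numpy as np


def quartile_normalize(x: np.ndarray) -> np.ndarray:
    """Center by the median and scale by the interquartile range (IQR).

    Unlike mean/std scaling, a few extreme outliers barely move these statistics.
    """
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    return (x - median) / (iqr if iqr > 0 else 1.0)


def huber_loss(residual: np.ndarray, delta: float = 1.0) -> np.ndarray:
    """Elementwise Huber loss: quadratic for |r| <= delta, linear beyond."""
    a = np.abs(residual)
    quadratic = 0.5 * a ** 2
    linear = delta * (a - 0.5 * delta)
    return np.where(a <= delta, quadratic, linear)
```

For example, replacing the last value of `[1, 2, 3, 4, 5]` by `1000` leaves the normalized values of the first four points unchanged, at `[-1, -0.5, 0, 0.5]`. The Huber loss, in turn, grows only linearly for the large residual such a point produces.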

【15】Fitting Large Nonlinear Mixed Effects Models Using Variational Expectation Maximization
标题：使用变分期望最大化拟合大型非线性混合效应模型
链接：https://arxiv.org/abs/2604.26160

作者：Mohamed Tarek,Pedro Afonso
摘要：非线性混合效应模型（NLME）广泛应用于药物计量学及相关领域，用于分析分层和纵向数据。然而，随着参数和随机效应数量的增加，用于最大化边际似然的传统方法在计算上变得昂贵。本文探讨了变分期望最大化（VEM）算法，一个可扩展的替代拟合NLME模型。VEM最初是在概率图模型的背景下引入的，后来通过变分自编码器推广，但尚未广泛应用于NLME建模。通过利用灵活的变分族和反向模式自动微分，VEM可以有效地最大化边际似然，扩展到具有超过15，000个总体参数的NLME模型。这项工作提供了一个详细的描述VEM，比较它与其他NLME拟合算法，并通过计算实验突出其可扩展性。使用Pumas统计软件，我们拟合了两个检验模型：1）标准华法林模型，以及2）具有15，410个群体参数和16个随机效应的DeepNLME Friberg模型。华法林模型拟合完成以证明VEM的正确性，而DeepNLME Friberg模型拟合有限数量的迭代以测量每次迭代的时间并证明VEM的可扩展性。
摘要：Nonlinear Mixed Effects (NLME) models are widely used in pharmacometrics and related fields to analyze hierarchical and longitudinal data. However, as the number of parameters and random effects increases, traditional methods for maximizing the marginal likelihood become computationally expensive. This paper explores the Variational Expectation Maximization (VEM) algorithm, a scalable alternative for fitting NLME models. Originally introduced in the context of probabilistic graphical models and later popularized through variational autoencoders, VEM has not been extensively applied to NLME modeling. By leveraging flexible variational families and reverse-mode automatic differentiation, VEM can efficiently maximize the marginal likelihood, scaling to NLME models with over 15,000 population parameters. This work provides a detailed description of VEM, compares it to other NLME fitting algorithms, and highlights its scalability through computational experiments. Using the Pumas statistical software, we fit two test models: 1) a standard warfarin model, and 2) a DeepNLME Friberg model with 15,410 population parameters and 16 random effects. The warfarin model was fitted to completion to demonstrate the correctness of VEM, while the DeepNLME Friberg model was fitted for a limited number of iterations to measure the time per iteration and demonstrate VEM's scalability.

【16】Mixture of Experts Framework in Machine Learning Interatomic Potentials for Atomistic Simulations
标题：面向原子模拟的机器学习原子间势中的混合专家框架
链接：https://arxiv.org/abs/2604.26143

作者：Gabriel de Miranda Nascimento,Marc L. Descoteaux,Laura Zichi,Chuin Wei Tan,William C. Witt,Nicola Molinari,Sriteja Mantha,Daniil Kitchaev,Mordechai Kornbluth,Karim Gadelrab,Charles Tuffile,Boris Kozinsky
备注：10 pages, 5 figures
摘要：第一性原理原子模拟对于理解复杂的材料现象必不可少，但从根本上受到计算成本的限制。虽然机器学习原子间势（MLIP）在给定精度下大幅降低了成本，但其推理成本仍然是超大规模体系或长时间尺度模拟的瓶颈。为了解决这个问题，我们基于E(3)等变的Allegro架构引入了一个多保真度“混合专家”框架。我们的方法在空间上将模拟区域划分为化学复杂区域（例如反应界面）和简单区域（例如体相晶格），并为每个区域分配不同容量的模型。在这种静态区域分解所面临的挑战中，模型在界面处的力学失配尤为关键，因为它可能产生人为的应力场和不稳定性。我们通过一种协同训练策略来应对这一挑战：损失函数包含一致性约束，即对在共享体相环境上评估的模型之间的逐原子能量和力差异施加惩罚，迫使各个独立模型学习对体相材料一致的物理描述。我们在一个真实的Pt+CO催化体系上验证了该方法，表明协同训练的模型保持了严格的能量守恒，对齐了其体相力学响应（例如状态方程和体积模量），并以两倍以上的计算速度实现了与完全高保真模拟相当的预测精度。
摘要：First-principles atomistic simulations are essential for understanding complex material phenomena but are fundamentally limited by their computational cost. While Machine Learning Interatomic Potentials (MLIPs) have drastically improved cost for a given accuracy, their inference cost remains a bottleneck for massive systems or long timescales. To address this, we introduce a multifidelity "Mixture-of-Experts" framework based on the E(3)-equivariant Allegro architecture. Our method spatially partitions the simulation domain into a chemically complex region (e.g., reactive interfaces) and a simple region (e.g., bulk lattice), assigning models of varying capacity to each. Among the challenges in such static domain decomposition, the mechanical mismatch between models at the interface is particularly critical, as it can generate artificial stress fields and instability. We address this challenge with a co-training strategy in which the loss function includes agreement constraints -- penalties on per-atom energy and force discrepancies between models evaluated on shared bulk environments -- forcing the independent models to learn a consistent physical description of the bulk material. We validate this approach on a realistic Pt+CO catalytic system, demonstrating that the co-trained models maintain exact energy conservation, align their bulk mechanical response (e.g., equation of state and bulk modulus), and achieve predictive accuracy comparable to a full high-fidelity simulation at more than twice the computational speed.
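The co-training objective described above adds an agreement term to each model's usual fitting loss. The term penalizes per-atom energy and force discrepancies between the two models on shared bulk environments. Here is a minimal NumPy sketch of that structure under stated assumptions. The squared-error form, the weights `lam_e` and `lam_f`, and the function names are hypothetical, and the paper's actual loss may be weighted or normalized differently.

```python
import numpy as np


def agreement_penalty(e_hi, e_lo, f_hi, f_lo, lam_e=1.0, lam_f=1.0) -> float:
    """Disagreement between two potentials on the same bulk configurations.

    e_hi, e_lo: (n_atoms,) per-atom energies predicted by each model
    f_hi, f_lo: (n_atoms, 3) forces predicted by each model
    """
    e_term = np.mean((e_hi - e_lo) ** 2)
    f_term = np.mean(np.sum((f_hi - f_lo) ** 2, axis=1))  # squared force-vector difference per atom
    return float(lam_e * e_term + lam_f * f_term)


def co_training_loss(fit_loss_hi, fit_loss_lo, e_hi, e_lo, f_hi, f_lo,
                     lam_e=1.0, lam_f=1.0) -> float:
    # Each model keeps its own fitting loss against reference data; the
    # agreement term couples them so their bulk description stays consistent.
    return fit_loss_hi + fit_loss_lo + agreement_penalty(e_hi, e_lo, f_hi, f_lo, lam_e, lam_f)
```

Penalizing forces as well as energies matters at the interface. A mismatch in forces there is what would show up as artificial stress.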

【17】QERNEL: a Scalable Large Electron Model
标题：QERNEL：可扩展的大电子模型
链接：https://arxiv.org/abs/2604.26018

作者：Khachatur Nazaryan,Liang Fu
备注：6 pages, 4 figures
摘要：我们介绍QERNEL，一种基础性神经波函数，它以变分方式求解参数化多电子哈密顿量族，并在单个模型内捕捉整个参数空间中的基态。QERNEL将基于FiLM的参数调节与规模高效的架构元素（混合专家和分组查询注意力）相结合，以较低的计算成本大幅提升表达能力。我们将QERNEL应用于半导体莫尔异质双层中的相互作用电子，为多达150个电子的体系训练单个权重共享模型。通过求解以莫尔势深度为条件的多电子薛定谔方程，QERNEL同时捕捉了量子液态和晶体态，并发现了它们之间的急剧相变，其标志是相互作用能和电荷密度的突变。我们的工作为莫尔量子材料建立了一个基础模型，并为面向固体的大电子模型提供了一种可扩展的架构。
摘要：We introduce QERNEL, a foundational neural wavefunction that variationally solves families of parameterized many-electron Hamiltonians and captures their ground states throughout parameter space within a single model. QERNEL combines FiLM-based parameter conditioning with scale-efficient architectural elements -- mixture of experts and grouped-query attention, substantially improving expressivity at low computational cost. We apply QERNEL to interacting electrons in semiconductor moiré heterobilayers, training a single weight-shared model for systems of up to 150 electrons. By solving the many-electron Schrödinger equation conditioned on moiré potential depth, QERNEL captures both quantum liquid and crystal states and discovers the sharp phase transition between them, marked by abrupt changes in interaction energy and charge density. Our work establishes a foundation model for moiré quantum materials and a scalable architecture toward a Large Electron Model for solids.
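FiLM (feature-wise linear modulation) conditions a network on external inputs by scaling and shifting its hidden features, $h \mapsto \gamma(p) \odot h + \beta(p)$. Here $p$ would be the Hamiltonian parameters, for example the moiré potential depth. The sketch below uses plain linear maps for $\gamma$ and $\beta$, initialized so that $p = 0$ gives the identity. These choices and the class name are assumptions, not QERNEL's architecture.

```python
import numpy as np


class FiLM:
    """Feature-wise linear modulation: h -> gamma(p) * h + beta(p)."""

    def __init__(self, n_params: int, width: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Hypothetical linear maps from Hamiltonian parameters p to gamma and beta.
        self.W_gamma = rng.normal(0.0, 0.1, size=(n_params, width))
        self.b_gamma = np.ones(width)   # gamma = 1 when p = 0 (identity modulation)
        self.W_beta = rng.normal(0.0, 0.1, size=(n_params, width))
        self.b_beta = np.zeros(width)   # beta = 0 when p = 0

    def __call__(self, h: np.ndarray, params: np.ndarray) -> np.ndarray:
        """h: (n_electrons, width) hidden features; params: (n_params,)."""
        gamma = params @ self.W_gamma + self.b_gamma
        beta = params @ self.W_beta + self.b_beta
        # The same (gamma, beta) is broadcast over all electrons.
        return gamma * h + beta
```

Because only $\gamma$ and $\beta$ depend on $p$, one set of network weights can be shared across the whole parameter family. This is what allows a single model to cover the parameter space.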

【18】Learning Neural Operator Surrogates for the Black Hole Accretion Code
标题：为黑洞吸积代码学习神经算子代理模型
链接：https://arxiv.org/abs/2604.25985

作者：Matthias Nägele,Cedric Bös,Chester Tan,Christian M. Fromm,Ingo Scholtes,Karl Mannheim
摘要：广义相对论磁流体动力学（GR-MHD）模拟对于研究黑洞吸积、相对论喷流和磁重联至关重要，但其计算成本严重限制了系统性的参数探索。我们针对黑洞吸积代码（\texttt{BHAC}）产生的两个天体物理相关模拟场景研究了神经算子代理模型。首先，我们在一系列跨越Sweet-Parker重联与快速重联区间的电阻率下，基于Orszag-Tang涡旋的狭义相对论电阻性MHD（SRRMHD）演化训练了一个物理信息傅立叶神经算子（PINO）。通过将控制方程作为额外的损失项嵌入，并在比可用数据监督更精细的时间分辨率上评估，该模型能在没有提供模拟数据的时间步上学习动力学，从而恢复等离子体团的形成，而在相同稀疏快照上训练的纯数据基线无法再现这一现象。据我们所知，这是物理信息神经算子首次应用于狭义相对论电阻性MHD，也是首次研究此类模型解析SRRMHD中等离子体团形成的能力。在第二条研究路线中，我们在用\texttt{BHAC}生成的狭义相对论MHD（SRMHD）脊-鞘相对论喷流演化上训练了一个OFormer风格的Transformer神经算子。该模型直接应用于自适应网格，由于序列很长，凸显了线性注意力的必要性。该神经代理模型能够捕捉大部分主要细节，尤其是在早期预测中。据我们所知，这是神经算子在MHD模拟背景下首次直接应用于高分辨率自适应网格细化网格。
摘要：General-relativistic magnetohydrodynamic (GR-MHD) simulations are essential for studying black hole accretion, relativistic jets, and magnetic reconnection, yet their computational cost severely limits systematic parameter exploration. We investigate neural operator surrogates for two astrophysically relevant simulation scenarios produced by the Black Hole Accretion Code (\texttt{BHAC}). First, a Physics Informed Fourier Neural Operator (PINO) is trained on the special-relativistic resistive MHD (SRRMHD) evolution of the Orszag-Tang vortex over a range of resistivities spanning the Sweet-Parker and fast reconnection regimes. By embedding the governing equations as an additional loss term evaluated at finer temporal resolution than the available data supervision, the model learns dynamics at time steps where no simulation data is provided, enabling recovery of plasmoid formation that a data-only baseline trained on the same sparse snapshots fails to reproduce. To our knowledge, the present work is the first application of a physics informed neural operator to special relativistic resistive MHD, and the first to investigate the capability of such models to resolve plasmoid formation in SRRMHD. In a second line of investigation, an OFormer-style Transformer Neural Operator is trained on the evolution of spine-sheath relativistic jets created with \texttt{BHAC}, in special-relativistic MHD (SRMHD). The model is directly applied on the adaptive mesh, highlighting the need for linear attention due to long sequences. The neural surrogate model is capable of capturing most of the major details, especially in early predictions. To our knowledge, this constitutes the first application of a neural operator directly on a high resolution adaptive mesh refinement grid in the context of MHD simulations.

其他(27篇)

【1】A Note on How to Remove the $\ln\ln T$ Term from the Squint Bound
链接：https://arxiv.org/abs/2604.26926

作者：Francesco Orabona
摘要：在Orabona和Pál [2016]中，我们引入了平移KT势（shifted KT potentials），以消除带专家建议的无参数学习界中的$\ln \ln T$因子。在这篇简短的技术说明中，我证明了这等价于改变Krichevsky-Trofimov算法中的先验。随后，我展示了如何利用同一思想消除Squint算法的数据无关界中的$\ln \ln T$因子。
摘要：In Orabona and Pál [2016], we introduced the shifted KT potentials to remove the $\ln \ln T$ factor in the bound for parameter-free learning with experts. In this short technical note, I show that this is equivalent to changing the prior in the Krichevsky--Trofimov algorithm. Then, I show how to use the same idea to remove the $\ln \ln T$ factor in the data-independent bound for the Squint algorithm.
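The KT algorithm can be read as a coin-betting strategy whose betting fraction comes from a Beta prior. The sketch below is an illustration, not the note's construction. It parameterizes a symmetric Beta(a, a) prior, under which the fraction of wealth bet at round t is $\beta_t = S_{t-1}/(t-1+2a)$, where $S_{t-1}$ is the sum of past outcomes. Setting a = 1/2 (the Jeffreys prior) recovers the classic KT bettor $\beta_t = S_{t-1}/t$. Treating a larger pseudo-count as the "shift" is our assumption about how the two views line up.

```python
def kt_betting_wealth(coins, initial_wealth=1.0, a=0.5):
    """Wealth of a KT-style bettor on coin outcomes g_t in [-1, 1].

    The betting fraction is beta_t = S_{t-1} / (t - 1 + 2a), with S_{t-1} the sum
    of past outcomes. a = 0.5 is the classic KT bettor (Jeffreys prior); a larger a
    (a hypothetical "shifted" prior) bets more cautiously. Since |beta_t| < 1,
    wealth stays strictly positive.
    """
    wealth = initial_wealth
    past_sum = 0.0
    for t, g in enumerate(coins, start=1):
        beta = past_sum / (t - 1 + 2 * a)  # signed fraction of current wealth to bet
        wealth += g * beta * wealth         # gain or lose g times the amount bet
        past_sum += g
    return wealth
```

For example, on two heads the KT bettor ends with wealth 1.5, while a = 1 ends with 4/3.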

【2】ClawGym: A Scalable Framework for Building Effective Claw Agents
标题：ClawGym：构建高效Claw智能体的可扩展框架
链接：https://arxiv.org/abs/2604.26904

作者：Fei Bai,Huatong Song,Shuang Sun,Daixuan Cheng,Yike Yang,Chuan Hao,Renyuan Li,Feng Chang,Yuan Wei,Ran Tao,Bryan Dai,Jian Yang,Wayne Xin Zhao
摘要：Claw式环境支持在本地文件、工具和持久化工作区状态之上执行多步骤工作流。然而，这类环境的可扩展开发仍受制于缺乏系统性框架，尤其缺乏能够合成可验证训练数据、并将其与智能体训练和诊断性评估相结合的框架。为应对这一挑战，我们提出了ClawGym，一个支持Claw式个人智能体开发全生命周期的可扩展框架。具体而言，我们构建了ClawGym-SynData，这是一个包含13.5K条经过滤任务的多样化数据集。这些任务由人设驱动的意图和基于技能的操作合成，并配有逼真的模拟工作区和混合验证机制。随后，我们在黑盒推演（rollout）轨迹上进行监督微调，训练出一系列能力较强的Claw式模型，称为ClawGym-Agents。我们还通过一个在各任务独立沙箱间并行推演的轻量级流水线，进一步探索强化学习。为支持可靠评估，我们还构建了ClawGym-Bench，这是一个包含200个实例、经自动过滤和人类-LLM联合审查校准的基准。相关资源即将在https://github.com/ClawGym上发布。
摘要：Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these environments remains constrained by the absence of a systematic framework, especially one for synthesizing verifiable training data and integrating it with agent training and diagnostic evaluation. To address this challenge, we present ClawGym, a scalable framework that supports the full lifecycle of Claw-style personal agent development. Concretely, we construct ClawGym-SynData, a diverse dataset of 13.5K filtered tasks synthesized from persona-driven intents and skill-grounded operations, paired with realistic mock workspaces and hybrid verification mechanisms. We then train a family of capable Claw-style models, termed ClawGym-Agents, through supervised fine-tuning on black-box rollout trajectories, and further explore reinforcement learning via a lightweight pipeline that parallelizes rollouts across per-task sandboxes. To support reliable evaluation, we further construct ClawGym-Bench, a benchmark of 200 instances calibrated through automated filtering and human-LLM review. Relevant resources will be released soon at https://github.com/ClawGym.

【3】FaaSMoE: A Serverless Framework for Multi-Tenant Mixture-of-Experts Serving
标题：FaaSMoE：用于多租户混合专家服务的无服务器框架
链接：https://arxiv.org/abs/2604.26881

作者：Minghe Wang,Trever Schirmer,Mohammadreza Malekabbasi,David Bermbach
备注：Accepted for publication in the 9th International Workshop on Edge Systems, Analytics and Networking (EdgeSys 2026)
摘要：混合专家（Mixture-of-Experts，MoE）模型为每个输入仅激活一小部分专家，从而在提供高容量的同时保持高效的推理成本。然而，部署MoE模型要求所有专家常驻内存，导致被激活专家实际使用的资源与所分配的资源之间存在差距。这种利用不足在多租户场景中更为明显。本文提出FaaSMoE，一种构建在函数即服务（FaaS）平台上的多租户MoE服务架构。FaaSMoE将专家部署为无状态FaaS函数，从而解耦MoE的控制平面与执行平面，实现跨租户的按需、可缩容至零的专家调用。FaaSMoE还支持在函数内配置专家粒度，以牺牲单个专家的弹性为代价换取更低的调用开销。我们基于一个开源的面向边缘的FaaS平台实现了原型，并使用Qwen1.5-MoE-2.7B在多租户工作负载下进行评估。与全模型基线相比，FaaSMoE使用的资源不到三分之一，为多租户环境中可扩展的MoE服务提供了一条实用且资源高效的路径。
摘要：Mixture-of-Experts (MoE) models offer high capacity with efficient inference cost by activating a small subset of expert models per input. However, deploying MoE models requires all experts to reside in memory, creating a gap between the resources used by activated experts and the provisioned resources. This underutilization is further pronounced in multi-tenant scenarios. In this paper, we propose FaaSMoE, a multi-tenant MoE serving architecture built on Function-as-a-Service (FaaS) platforms. FaaSMoE decouples the control and execution planes of MoE by deploying experts as stateless FaaS functions, enabling on-demand and scale-to-zero expert invocation across tenants. FaaSMoE further supports configurable expert granularity within functions, trading off per-expert elasticity for reduced invocation overhead. We implement a prototype with an open-source edge-oriented FaaS platform and evaluate it using Qwen1.5-moe-2.7B under multi-tenant workloads. Compared to a full-model baseline, FaaSMoE uses less than one third of the resources, demonstrating a practical and resource-efficient path towards scalable MoE serving in a multi-tenant environment.
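The sketch below makes the granularity trade-off concrete. It packs experts into functions and lists which functions a batch must invoke, given each token's top-k routing. The contiguous packing rule and the function names are hypothetical, since the abstract does not describe FaaSMoE's actual placement policy.

```python
def pack_experts(num_experts, experts_per_function):
    """Assign experts to FaaS functions in contiguous groups (hypothetical packing)."""
    return {e: e // experts_per_function for e in range(num_experts)}


def functions_to_invoke(routing, packing):
    """Functions that must run for a batch, given each token's top-k expert ids.

    Functions that host no activated expert are never called (scale-to-zero).
    """
    return sorted({packing[e] for token_experts in routing for e in token_experts})
```

Take a routing of `[[0, 3], [3, 5], [1, 0]]` over 8 experts. With one expert per function, it touches four functions. Packing four experts per function cuts this to two invocations, at the cost of keeping non-activated experts resident in each invoked function.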

【4】KAYRA: A Microservice Architecture for AI-Assisted Karyotyping with Cloud and On-Premise Deployment
标题：KAYRA：一种支持云端与本地部署的AI辅助核型分析微服务架构
链接：https://arxiv.org/abs/2604.26869

作者：Attila Pintér,Javier Rico,Attila Répai,Jalal Al-Afandi,Adrienn Éva Borsy,András Kozma,Hajnalka Andrikovics,György Cserey
摘要：我们提出KAYRA，一个能在临床细胞遗传学实验室运营约束下工作的端到端核型分析系统。KAYRA被构建为容器化的微服务流水线，其ML技术栈结合了EfficientNet-B5 + U-Net语义分割器、Mask R-CNN（ResNet-50 + FPN）实例检测器和ResNet-18分类器。这些模型通过级联ROI收窄策略进行编排，使每个下游模型聚焦于含染色体的区域。同一套容器镜像既可部署为云服务，也可作为本地安装部署，因此既支持不允许患者数据外传的临床环境，也支持允许外传的环境。我们在10个中期分裂相的459条染色体上，与两种商业参考核型分析系统进行了试点临床评估。结果显示，分割准确率为98.91%（对比78.21%/40.52%），分类准确率为89.1%（对比86.9%/54.5%），旋转准确率为89.76%（对比94.55%/78.43%）。KAYRA在全部三个维度上均优于较旧的基于密度阈值的参考系统（基于染色体级计数的Fisher精确检验，分割和分类均为p < 0.0001），在分割上也优于现代AI辅助参考系统（p < 0.0001）。在分类上，以当前测试集规模，与现代AI参考系统的差异无统计学显著性（p = 0.34）。该系统达到TRL 6成熟度，并集成了诊断细胞遗传学实践所需的人在回路专家审阅工作流。本文的核心论点是：多模型细胞遗传学AI服务可以打包为支持灵活部署（云托管或本地部署）的微服务架构，同时在试点临床评估中取得强劲的实证性能。
摘要：We present KAYRA, an end-to-end karyotyping system that operates inside the operational constraints of a clinical cytogenetic laboratory. KAYRA is architected as a containerized microservice pipeline whose ML stack combines an EfficientNet-B5 + U-Net semantic segmenter, a Mask R-CNN (ResNet-50 + FPN) instance detector, and a ResNet-18 classifier, orchestrated through a cascaded ROI-narrowing strategy that focuses each downstream model on the chromosome-bearing region. The same container images are deployed both as a cloud service and as an on-premise installation, supporting clinical environments where patient-data egress is not permitted as well as those where it is. A pilot clinical evaluation against two commercial reference karyotyping systems on 459 chromosomes from 10 metaphase spreads shows segmentation accuracy of 98.91 % (vs. 78.21 % / 40.52 %), classification accuracy of 89.1 % (vs. 86.9 % / 54.5 %), and rotation accuracy of 89.76 % (vs. 94.55 % / 78.43 %). KAYRA improves over the older density-thresholding reference on all three axes (p < 0.0001 for segmentation and classification by Fisher's exact test on chromosome-level counts), and on segmentation also against the modern AI-supported reference (p < 0.0001); on classification the difference vs. the modern AI reference is not statistically significant at the present test-set size (p = 0.34). The system reaches TRL 6 maturity and integrates the human-in-the-loop expert-review workflow that diagnostic cytogenetic practice requires. The thesis of this paper is that a multi-model cytogenetic AI service can be packaged as a microservice architecture supporting flexible deployment (cloud-hosted or on-premise) while delivering strong empirical performance on a pilot clinical evaluation.
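KAYRA's significance claims rest on Fisher's exact test over chromosome-level counts. Below is a minimal pure-Python sketch of the two-sided test on a 2x2 table. It assumes rows are the two systems and columns are correctly and incorrectly handled chromosomes; this layout is our assumption about how such counts would be tabulated. `scipy.stats.fisher_exact` is the usual library route.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    whose probability does not exceed that of the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # P(top-left cell = x) with all margins fixed (hypergeometric law)
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # a tiny relative tolerance keeps tables tied with the observed one
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
```

For instance, `fisher_exact_two_sided(454, 5, 359, 100)` tests counts that we back-calculated from the reported 98.91 % and 78.21 % of 459 chromosomes (454/459 and 359/459). These counts are our arithmetic, not numbers stated in the abstract.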

【5】Random Cloud: Finding Minimal Neural Architectures Without Training
标题：随机云：无需训练即可找到最小神经架构
链接：https://arxiv.org/abs/2604.26830

作者：Javier Gil Blázquez
摘要：我提出随机云（Random Cloud）方法，这是一种无需训练的神经架构搜索方法，通过随机探索和渐进式结构精简来发现最小的前馈网络拓扑。训练后剪枝方法需要完整的“训练-剪枝-再训练”周期；与之不同，该方法在不进行反向传播的情况下评估随机初始化的网络，逐步精简其拓扑，只在最后训练最优的最小候选网络。我在7个分类基准上，将其与幅度剪枝和随机剪枝基线进行了比较。随机云在7个数据集中的6个上达到或超过两种基线，并在Sonar上取得统计显著的提升（准确率$+4.9$个百分点，相对幅度剪枝$p{=}0.017$），同时减少了87%的参数。关键在于，由于完全避免了训练全尺寸网络，该方法在5个数据集中的4个上比两种剪枝基线更快（为完整训练成本的0.67-0.94倍）。
摘要：I propose the \emph{Random Cloud} method, a training-free approach to neural architecture search that discovers minimal feedforward network topologies through stochastic exploration and progressive structural reduction. Unlike post-training pruning methods that require a full train-prune-retrain cycle, this method evaluates randomly initialized networks without backpropagation, progressively reduces their topology, and only trains the best minimal candidate at the end. I evaluate on 7 classification benchmarks against magnitude pruning and random pruning baselines. The Random Cloud matches or outperforms both baselines in 6 of 7 datasets, achieving statistically significant improvements on Sonar ($+4.9$pp accuracy, $p{=}0.017$ vs magnitude pruning) with 87\% parameter reduction. Crucially, the method is faster than both pruning baselines in 4 of 5 datasets (0.67--0.94$\times$ the cost of full training), since it avoids training the full-size network entirely.

【6】Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding
标题：通过系统集成的推测解码加速RL后训练中的推演（rollout）
链接：https://arxiv.org/abs/2604.26779

作者：Hayate Iso,Tiyasa Mitra,Sudipta Mondal,Rasoul Shafipour,Venmugil Elango,Terry Kong,Yuki Huang,Seonjin Na,Izzy Putterman,Benjamin Chislett,Maor Ashkenazi,Joseph Guman,Gerald Shen,Tugrul Konuk,Ashwath Aithal,Ritika Borkar,Ran Zilberstein,Bita Rouhani
摘要：前沿语言模型的RL后训练越来越受制于自回归推演生成这一瓶颈，使推演加速成为核心的系统挑战。许多现有的效率方法通过改变推演或优化机制来提高吞吐量，例如采用离策略执行、经验回放或低精度生成。我们将推测解码作为RL推演的无损加速原语加以研究，它保持目标模型的输出分布不变。我们在NeMo-RL中基于vLLM后端实现了推测解码，同时支持同步和异步流水线，并在RL推演期间启用推测。这一收益可在多种推测机制上实现，例如预训练的MTP头、小型外部草稿模型，甚至是通常在RL阶段之后才应用的Eagle3等技术。这为在RL训练中使用最先进的推测解码提供了一条部署路径。在8B规模、同步RL下的推理后训练工作负载中，推测解码将推演吞吐量提高了1.8倍。借助高保真性能模拟器，我们预计在235B规模下，将推测解码与异步RL相结合可带来最高2.5倍的端到端训练加速。
摘要：RL post-training of frontier language models is increasingly bottlenecked by autoregressive rollout generation, making rollout acceleration a central systems challenge. Many existing efficiency methods improve throughput by changing the rollout or optimization regime, for example, through off-policy execution, replay, or lower-precision generation. We study speculative decoding as a lossless acceleration primitive for RL rollouts that preserves the target model's output distribution. We implement speculative decoding in NeMo-RL with a vLLM backend, supporting both synchronous and asynchronous pipelines and enabling speculation during RL rollouts. This speedup is realizable across speculation mechanisms, such as pretrained MTP heads, small external draft models, or even techniques such as Eagle3, which are traditionally applied after the RL phase. This yields a deployment path for state-of-the-art speculative decoding inside RL training. In a reasoning post-training workload at 8B scale under synchronous RL, speculative decoding improves rollout throughput by 1.8x. Using a high-fidelity performance simulator, we project that combining speculative decoding with asynchronous RL yields up to 2.5x end-to-end training speedup at 235B scale.
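The "lossless" property comes from the standard speculative-sampling accept/reject rule. A draft token x drawn from the draft distribution q is kept with probability min(1, p(x)/q(x)); otherwise a token is resampled from normalize(max(0, p - q)). The resulting token then follows the target distribution p exactly.

The sketch below checks this for a single token using dict-based distributions. Real systems such as vLLM verify several draft tokens per step in batched form, and the function names here are illustrative.

```python
import random


def accept_prob(p, q, x):
    """Probability of keeping draft token x (drawn from q) under target p."""
    return min(1.0, p[x] / q[x])


def residual(p, q):
    """Distribution to resample from after a rejection: normalize(max(0, p - q))."""
    diff = {x: max(0.0, p[x] - q[x]) for x in p}
    total = sum(diff.values())
    return {x: d / total for x, d in diff.items()} if total > 0 else dict(p)


def speculative_token(p, q, rng):
    """Draft one token from q, then accept it or resample from the residual."""
    tokens = list(q)
    x = rng.choices(tokens, weights=[q[t] for t in tokens])[0]
    if rng.random() < accept_prob(p, q, x):
        return x
    r = residual(p, q)
    return rng.choices(list(r), weights=list(r.values()))[0]


def emitted_distribution(p, q):
    """Exact output law: accepted mass min(p, q) plus rejected mass times the residual."""
    rejected = sum(q[x] * (1.0 - accept_prob(p, q, x)) for x in q if q[x] > 0)
    r = residual(p, q)
    return {x: (q[x] * accept_prob(p, q, x) if q[x] > 0 else 0.0) + rejected * r[x] for x in p}
```

Algebraically, the accepted mass is min(p, q) and the rejected mass times the residual is max(0, p - q); these two terms sum to p. This is why `emitted_distribution` returns p whatever draft q is used, and q only affects how often drafts are accepted.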

【7】SciHorizon-DataEVA: An Agentic System for AI-Readiness Evaluation of Heterogeneous Scientific Data
标题：SciHorizon-DataEVA：一个面向异构科学数据AI就绪度评估的智能体系统
链接：https://arxiv.org/abs/2604.26645

作者：Dianyu Liu,Chuan Qin,Xi Chen,Xiaohan Li,Wenxi Xu,Yuyang Wang,Xin Chen,Yuanchun Zhou,Hengshu Zhu
摘要：AI for Science（AI4Science）通过将机器学习模型嵌入跨领域的预测、模拟和假设生成工作流，正日益改变科学发现。然而，这些模型的有效性从根本上受制于科学数据的AI就绪度，而目前尚不存在可扩展的系统性评估机制。在这项工作中，我们提出了SciHorizon-DataEVA，一种面向异构科学数据、可扩展的AI就绪度评估的新型智能体系统。在评估标准层面，我们引入了Sci-TQA2原则，将AI就绪度组织为四个互补维度：治理可信度、数据质量、AI兼容性和科学适应性。每个维度被分解为可度量的原子要素，从而支持细粒度、可执行的评估。为了大规模落实这些原则，我们开发了Sci-TQA2-Eval，一种通过有向循环工作流编排的分层多智能体评估方法。Sci-TQA2-Eval结合三类信息动态构建数据集感知的评估规范：轻量级数据集画像、适用性感知的指标激活，以及基于领域约束和数据集-论文信号的知识增强规划。这些规范由一种以工具为中心的自适应评估机制执行，该机制内置验证与自我纠正，从而能够对异构科学数据进行可扩展且可靠的评估。在跨多个领域的科学数据集上开展的大量实验，证明了SciHorizon-DataEVA在有原则的AI就绪度评估方面的有效性和通用性。
摘要：AI-for-Science (AI4Science) is increasingly transforming scientific discovery by embedding machine learning models into prediction, simulation, and hypothesis generation workflows across domains. However, the effectiveness of these models is fundamentally constrained by the AI-readiness of scientific data, for which no scalable and systematic evaluation mechanism currently exists. In this work, we propose SciHorizon-DataEVA, a novel agentic system for scalable AI-readiness evaluation of heterogeneous scientific data. At the evaluation-criteria level, we introduce the Sci-TQA2 principles, which organize AI-readiness into four complementary dimensions: Governance Trustworthiness, Data Quality, AI Compatibility, and Scientific Adaptability. Each dimension is decomposed into measurable atomic elements that enable fine-grained and executable assessment. To operationalize these principles at scale, we develop Sci-TQA2-Eval, a hierarchical multi-agent evaluation approach orchestrated through a directed, cyclic workflow. Our Sci-TQA2-Eval dynamically constructs dataset-aware evaluation specifications by combining lightweight dataset profiling, applicability-aware metric activation, and knowledge-augmented planning grounded in domain constraints and dataset-paper signals. These specifications are executed through an adaptive, tool-centric evaluation mechanism with built-in verification and self-correction, enabling scalable and reliable assessment across heterogeneous scientific data. Extensive experiments on scientific datasets spanning multiple domains demonstrate the effectiveness and generality of SciHorizon-DataEVA for principled AI-readiness evaluation.

【8】FloatSOM: GPU-Accelerated, Distributed, Topology-Flexible Self-Organizing Maps
标题：FloatSOM：GPU加速、分布式、拓扑灵活的自组织映射
链接：https://arxiv.org/abs/2604.26555

作者：Tony Xu,Sarah Klamt,Katherine Turner,Anne Brustle,Felix Marsh-Wakefield,Givanna Putri
摘要：GPU加速的自组织映射（SOM）实现是大规模SOM分析中最具竞争力的选择之一。但随着数据集规模不断增长，工作负载已无法完整放入设备内存，其实际使用面临越来越大的挑战。我们提出FloatSOM，一个用于可扩展训练和部署的SOM框架，支持多GPU执行、超出内存容量时的磁盘流式处理，以及超越常规网格的新型拓扑结构。我们在14个合成和真实基准数据集以及受控的速度扩展基准上评估了FloatSOM。结果表明，这些改进的拓扑结构结合拓扑感知的超参数微调，能取得比当前最先进SOM基线更低的量化误差。FloatSOM还通过高吞吐量的分布式执行在大规模下保持这一性能：在最大的基准测试中，它在分布于两个独立高性能计算节点的8个GPU上，用6.16分钟完成了1024节点SOM网络的训练，训练数据为1,000,000,000个样本、50个特征。
摘要：GPU-accelerated Self-Organizing Map (SOM) implementations are among the most competitive options for large-scale SOM analysis, but growing dataset sizes increasingly challenge their practical use because workloads no longer fit cleanly within device-memory limits. We introduce FloatSOM, a SOM framework for scalable training and deployment that supports multi-GPU execution, out-of-memory disk-backed streaming, and novel topologies beyond regular lattices. We evaluate FloatSOM on 14 synthetic and real benchmark datasets together with controlled speed scaling benchmarks, and show that these improved topologies, combined with topology-aware hyperparameter fine-tuning, yield lower quantization error than current state-of-the-art SOM baselines. FloatSOM also sustains this performance at large scale with high-throughput distributed execution; in the largest benchmark, it trains a 1024-node SOM network on 1,000,000,000 samples with 50 features in 6.16 minutes on 8 GPUs across two separate high-performance-computing nodes.
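For readers new to SOMs, the sketch below shows two things: the classic online Kohonen update on a rectangular lattice, and the usual definition of quantization error (the mean distance from each sample to its best-matching unit). The learning-rate and neighborhood decay schedules are arbitrary choices. FloatSOM's multi-GPU execution, disk streaming, and non-lattice topologies are not represented, and whether it uses exactly this error definition is an assumption.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def som_step(weights, nodes, x, lr, sigma):
    """One online Kohonen update: pull every node toward x, weighted by a Gaussian
    neighborhood centered on the best-matching unit (BMU) in lattice coordinates."""
    bmu = min(range(len(weights)), key=lambda k: sq_dist(weights[k], x))
    bi, bj = nodes[bmu]
    new_weights = []
    for w, (i, j) in zip(weights, nodes):
        h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2 * sigma ** 2))
        new_weights.append([wv + lr * h * (xv - wv) for wv, xv in zip(w, x)])
    return new_weights


def quantization_error(data, weights):
    """Mean Euclidean distance from each sample to its BMU."""
    return sum(math.sqrt(min(sq_dist(w, x) for w in weights)) for x in data) / len(data)


def train_som(data, grid_w, grid_h, epochs=10, lr0=0.5, seed=0):
    rng = random.Random(seed)
    nodes = [(i, j) for i in range(grid_w) for j in range(grid_h)]
    weights = [list(rng.choice(data)) for _ in nodes]  # initialize from random samples
    sigma0 = max(grid_w, grid_h) / 2
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = t / total  # linear decay of learning rate and neighborhood width
            weights = som_step(weights, nodes, x, lr0 * (1 - frac), sigma0 * (1 - frac) + 0.1)
            t += 1
    return nodes, weights
```

Each update is a convex step toward a sample, so weights initialized from the data stay inside the data's bounding box.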

【9】Quantamination: Dynamic Quantization Leaks Your Data Across the Batch
标题：Quantamination：动态量化会在整个批次间泄露你的数据
链接：https://arxiv.org/abs/2604.26505

作者：Hanna Foerster,Ilia Shumailov,Cheng Zhang,Yiren Zhao,Jamie Hayes,Robert Mullins
备注：11 pages, 4 figures, 4 tables
摘要：动态量化已成为提升机器学习服务流程利用率和效率的一种实用方法。与离线执行量化的静态量化不同，动态量化在运行时对张量进行量化，使其参数适应实际输入数据。当今主流的机器学习框架（包括ML编译器和推理引擎）经常推荐将动态量化作为优化模型服务的第一步。这是因为动态量化可以显著减少内存占用和计算负载，从而加快token生成、提升模型服务效率，且不会明显降低模型精度。在本文中，我们揭示了动态量化的一个关键漏洞：攻击者可以利用这种量化策略，窃取与其输入位于同一批次的敏感用户数据。我们的分析表明，动态量化在实现或配置不当时会形成侧信道，暴露同一批次中其他输入的信息。我们将这一现象称为Quantamination，意为量化（quantization）带来的污染（contamination）。具体而言，我们发现当今最流行的ML框架中，至少有4个默认采用或可以采用会跨批次边界泄露数据的配置。理论上，这种数据泄露使攻击者能够部分乃至完全恢复其他用户的批处理输入数据，对现有ML服务框架构成严重的隐私风险。
摘要：Dynamic quantization has emerged as a practical approach to increase the utilization and efficiency of the machine learning serving flow. Unlike static quantization, which applies quantization offline, dynamic quantization operates on tensors at run-time, adapting its parameters to the actual input data. Today's mainstream machine learning frameworks, including ML compilers and inference engines, frequently recommend dynamic quantization as an initial step for optimizing model serving. This is because dynamic quantization can significantly reduce memory usage and computational load, leading to faster token generation and improved model serving efficiency without substantial loss in model accuracy. In this paper, we reveal a critical vulnerability in dynamic quantization: an adversary can exploit this quantization strategy to steal sensitive user data placed in the same batch as the adversary's input. Our analysis demonstrates that dynamic quantization, when improperly implemented or configured, can create side channels that expose information about other inputs within the same batch. We call this phenomenon Quantamination, describing contamination from quantization. Specifically, we show that at least 4 of the most popular ML frameworks in use today either default to or can use configurations that leak data across the batch boundary. This data leakage, in theory, allows attackers to partially or even fully recover other users' batched input data, representing a serious privacy risk for existing ML serving frameworks.
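This is a toy illustration of the cross-batch dependency. Under per-tensor dynamic quantization, the scale is computed from the largest magnitude in the whole batch. An attacker who places a known probe in the batch therefore sees its own dequantized values shift whenever another request contains a larger value. The functions are hypothetical and this is not the paper's attack; real frameworks differ in quantization granularity.

```python
def quantize_per_tensor(batch, bits=8):
    """Symmetric dynamic quantization with ONE scale shared by the whole batch."""
    qmax = 2 ** (bits - 1) - 1
    absmax = max(abs(v) for row in batch for v in row) or 1.0
    scale = absmax / qmax
    return [[round(v / scale) for v in row] for row in batch], scale


def quantize_per_row(batch, bits=8):
    """Same scheme, but every request (row) gets its own scale."""
    rows, scales = [], []
    for row in batch:
        q, s = quantize_per_tensor([row], bits)
        rows.append(q[0])
        scales.append(s)
    return rows, scales


def attacker_view(attacker_row, victim_row, per_row=False):
    """What the attacker gets back for ITS OWN row after quantize -> dequantize."""
    batch = [attacker_row, victim_row]
    if per_row:
        rows, scales = quantize_per_row(batch)
        return [q * scales[0] for q in rows[0]]
    rows, scale = quantize_per_tensor(batch)
    return [q * scale for q in rows[0]]
```

With a probe of `[1.0, 0.3]`, a victim whose values stay below 1.0 leaves the probe reconstructed almost exactly. A victim value of 3.0 changes the shared scale and visibly perturbs the probe, which reveals information about the victim's magnitudes. With per-row scales, the attacker's output in this toy setting no longer depends on the victim's row.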

【10】CO-EVO: Co-evolving Semantic Anchoring and Style Diversification for Federated DG-ReID
标题：CO-EVO：面向联邦DG-ReID的语义锚定与风格多样化协同进化
链接：https://arxiv.org/abs/2604.26363

作者：Fengchun Zhang,Qiang Ma,Liuyu Xiang,Jinshan Lai,Tingxuan Huang,Jianwei Hu
备注：Accepted at ACL 2026 (Main Conference)
摘要：面向行人重识别的联邦域泛化（FedDG-ReID）旨在跨多个分散的源域协同训练行人检索模型，使其能够泛化到未见过的目标环境，同时不损害原始数据隐私。然而，分散客户端之间固有的风格差异给这一任务带来了重大挑战。在缺乏全局监督的情况下，模型很容易陷入捷径学习，其表示过拟合于特定领域的相机偏差，而非通用的身份特征。我们提出了CO-EVO，一种新的联邦框架，通过协同进化机制解决这种语义与风格之间的冲突。在语义方面，相机不变语义锚定（CSA）学习具有跨相机一致性的身份提示，以建立纯化的、领域无关的锚点，过滤掉局部成像噪声。在视觉方面，由全局相机风格库（GCSB）支撑的全局风格多样化（GSD）合成逼真的扰动，以扩展训练数据的视觉边界。CO-EVO的核心是其协同进化循环：纯化后的锚点充当引力中心，在多样的风格变化中引导图像编码器关注鲁棒的解剖属性。大量实验表明，CO-EVO取得了最先进（SOTA）的性能，证明了语义纯化与风格扩展之间的协同作用对鲁棒的跨域泛化至关重要。我们的代码可在https://github.com/NanYiyuzurn/ACL-LGPS-2026获取。
摘要：Federated domain generalization for person re-identification (FedDG-ReID) aims to collaboratively train a pedestrian retrieval model across multiple decentralized source domains such that it can generalize to unseen target environments without compromising raw data privacy. However, this task is significantly challenged by the inherent stylistic gaps across decentralized clients. Without global supervision, models easily succumb to shortcut learning, in which representations overfit to domain-specific camera biases rather than universal identity features. We propose CO-EVO, a novel federated framework that resolves this semantic-style conflict through a co-evolutionary mechanism. On the semantic side, Camera-Invariant Semantic Anchoring (CSA) learns identity prompts with cross-camera consistency to establish purified and domain-agnostic anchors that filter out local imaging noise. On the visual side, Global Style Diversification (GSD), powered by a Global Camera-Style Bank (GCSB), synthesizes realistic perturbations to expand the visual boundaries of training data. The core of CO-EVO is its co-evolutionary loop, in which purified anchors act as gravitational centers that guide the image encoder toward robust anatomical attributes amidst diverse style variations. Extensive experiments demonstrate that CO-EVO achieves state-of-the-art (SOTA) performance, proving that the synergy between semantic purification and style expansion is essential for robust cross-domain generalization. Our code is available at: https://github.com/NanYiyuzurn/ACL-LGPS-2026.

【11】AlphaJet: Automated Conceptual Aircraft Synthesis via Disentangled Generative Priors and Topology-Preserving Evolutionary Search
标题：AlphaJet：基于解耦生成先验与拓扑保持进化搜索的自动化飞机概念合成
链接：https://arxiv.org/abs/2604.26337

作者：Boris Kriuk
备注：10 pages, 2 figures, 1 table
摘要：飞机概念设计传统上是一个由专家主导的迭代过程：人类设计师提出一个构型，运行低阶物理分析，检查结果，然后重新提出构型。我们提出了AlphaJet，一个闭合这一循环的端到端自动化合成流水线。AlphaJet根据文本形式的任务规范（质量、航程、巡航速度、硬性尺寸包络、发动机数量、面密度）实时演化出可行的3D飞机。每个候选由一个透明的多学科适应度函数评分，涵盖空气动力学、结构、重量、稳定性、布置和几何安装一致性。我们的方法有三点区别于以往工作：（i）解剖学解耦变分自编码器（AD-VAE），其前25个潜在维度受监督以对齐命名的解剖参数，提供可解释的形状先验；（ii）拓扑精英遗传算法，为五种尾翼拓扑各保留其最佳个体，并在停滞时触发重启，防止过早坍缩到单一构型；（iii）安装感知的几何评分，计算发动机与其他结构部件之间的有符号穿透量，消除生成式飞机模型中常见的冗余伪影。整个循环可在CPU上交互式运行，并将每一代结果流式传输到浏览器查看器，使其成为早期设计空间探索的实用自动化工具。
摘要：Conceptual aircraft design is traditionally an expert-mediated iterative process in which a human designer proposes a configuration, runs low-order physics, inspects the result, and re-proposes. We present AlphaJet, an end-to-end automated synthesis pipeline that closes this loop. From a textual mission specification (mass, range, cruise speed, hard size envelope, engine count, areal density), AlphaJet evolves a feasible 3D aircraft in real time, scored by a transparent multi-disciplinary fitness function covering aerodynamics, structures, weights, stability, packaging, and geometric mount consistency. Three contributions distinguish our approach: (i) an Anatomically-Disentangled Variational Autoencoder (AD-VAE) whose first 25 latent dimensions are supervised to align with named anatomical parameters, providing an interpretable shape prior; (ii) a topology-elitist genetic algorithm that protects the best individual from each of five tail topologies and triggers stagnation restarts, preventing premature collapse to a single configuration; and (iii) mount-aware geometric scoring that computes signed penetration between engines and other structural parts, eliminating the redundant artifacts common in generative aircraft models. The full loop runs interactively on a CPU and streams every generation to a browser viewer, making it a practical real-world automation tool for early-phase design-space exploration.

【12】Calibrated Surprise: An Information-Theoretic Account of Creative Quality
标题：校准惊喜：创意品质的信息论解释
链接：https://arxiv.org/abs/2604.26269

作者：Bo Zou,Chao Xu
备注：24 pages, 2 figures
摘要：优秀创意写作的本质是校准的惊喜：当所有相关维度的约束共同作用时，可行解空间会收缩到一个狭窄区域，而从无约束的视角看，幸存下来的选择显得最难以预测。“校准”有精确的含义：作者的意图、读者的合理期待与现实的逻辑三者汇合。当这三个相互独立的判断在每个维度上都一致时，可接受的写作选择集合被压缩到一个非常小的区域。由此得到一个数学推论：全维度的准确性与平庸是互斥的，它们是同一约束结构的两面，而非两个独立目标。我们以香农互信息$I(X;Y) = H(X) - H(X|Y)$作为分析工具。“校准”对应于条件熵趋于零；“惊喜”对应于熵的上升；互信息则是二者联合量的精确度量。这一论证基于两大支柱。静态层面：当来自品格（ethos）、情节（mythos）、言辞（lexis）和思想（dianoia）的约束同时施加时，可接受集合急剧收缩，幸存的解从无约束视角看表现为低概率选择。动态层面：链式法则表明，每个写作选择既受之前内容的约束，也约束之后的内容；宏观层面的决策自然贡献更大份额的信息，因此无需人工调节权重。通过案例研究和轻量级的LLM对数概率（logprob）计算，我们表明该框架既有分析价值又具可操作性，为创意质量对齐（CQA）和专业评估基准奠定了理论基础。
摘要：The essence of good creative writing is calibrated surprise: when constraints from all relevant dimensions act together, the feasible solution space collapses into a narrow region, and the surviving choices look least predictable from an unconstrained view. "Calibrated" has a precise meaning: the author's intent, the reader's reasonable expectation, and the logic of reality converge. When these three independent judgements agree on every dimension, the set of admissible writing choices is forced into a very small region. A mathematical corollary follows: full-dimensional accuracy and mediocrity are mutually exclusive -- two sides of one constraint structure, not separate goals. We use Shannon's mutual information $I(X;Y) = H(X) - H(X|Y)$ as our analysis tool. "Calibrated" corresponds to conditional entropy going to zero; "surprise" to entropy going up; mutual information is the precise measure of the joint quantity. The argument rests on two pillars. Static: when constraints from ethos, mythos, lexis, and dianoia are imposed together, the admissible set collapses sharply, and surviving solutions show up as low-probability choices from an unconstrained view. Dynamic: the chain rule shows each writing choice is constrained by what came before and constrains what comes after; macro-level decisions naturally contribute a larger share of information, removing the need for hand-tuned weighting. Through case studies and lightweight LLM-logprob computations, we show the framework is both analytically useful and operational, laying the theoretical groundwork for Creative Quality Alignment (CQA) and a professional evaluation benchmark.
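The paper's central quantity is $I(X;Y) = H(X) - H(X|Y)$. The sketch below computes it in bits from a joint probability table. Reading X as a writing choice and Y as the combined constraints is our illustrative mapping, not the paper's code. "Calibrated" corresponds to $H(X|Y)$ near zero and "surprise" to a large $H(X)$, and both push $I(X;Y)$ up.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y) for a table joint[x][y] of probabilities summing to 1."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    # H(X|Y) = sum_y p(y) * H(X | Y = y)
    h_x_given_y = sum(
        p_y * entropy([joint[i][j] / p_y for i in range(len(joint))])
        for j, p_y in enumerate(py)
        if p_y > 0
    )
    return entropy(px) - h_x_given_y
```

When Y fully determines X, the result equals H(X): 1 bit for two equally likely choices. When X and Y are independent, it drops to 0.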

【13】Hierarchical Long-Term Semantic Memory for LinkedIn's Hiring Agent
标题：LinkedIn招聘代理的分层长期语义记忆
链接：https://arxiv.org/abs/2604.26197

作者：Zhentao Xu,Shangjing Zhang,Emir Poyraz,Yvonne Li,Ye Jin,Xie Lu,Xiaoyang Gu,Karthik Ramgopal,Praveen Kumar Bodigutla,Xiaofeng Wang
摘要：大型语言模型（LLM）智能体越来越多地应用于现实产品中，而个性化、上下文感知的用户交互在这些产品中至关重要。实现这种能力的核心是智能体的长期语义记忆系统：它从嘈杂的纵向行为数据中提取隐式和显式信号，以结构化形式存储，并支持低延迟检索。为LLM智能体构建工业级长期记忆面临五个挑战：可扩展性、低延迟检索、隐私约束、跨领域泛化能力和可观测性。我们提出了分层长期语义记忆（HLTM）框架，它将文本数据组织为与模式（schema）对齐的记忆树，在多个粒度层级上捕获语义知识，从而实现可扩展的数据摄取、隐私感知的存储、低延迟检索和透明的溯源。HLTM还集成了一种适配机制，以泛化到多样的用例。在LinkedIn招聘助手上的大量评估表明，HLTM将答案正确率和检索F1提高了10%以上，同时显著推进了查询延迟与索引延迟之间的帕累托前沿。HLTM已部署在LinkedIn招聘助手中，为生产招聘工作流中的核心个性化功能提供支持。
摘要：Large Language Model (LLM) agents are increasingly used in real-world products, where personalized and context-aware user interactions are essential. A central enabler of such capabilities is the agent's long-term semantic memory system, which extracts implicit and explicit signals from noisy longitudinal behavioral data, stores them in a structured form, and supports low-latency retrieval. Building industrial-grade long-term memory for LLM agents raises five challenges: scalability, low-latency retrieval, privacy constraints, cross-domain generalizability, and observability. We introduce the Hierarchical Long-Term Semantic Memory (HLTM) framework, which organizes textual data into a schema-aligned memory tree that captures semantic knowledge at multiple levels of granularity, enabling scalable ingestion, privacy-aware storage, low-latency retrieval, and transparent provenance; HLTM further incorporates an adaptation mechanism to generalize across diverse use cases. Extensive evaluations on LinkedIn's Hiring Assistant show that HLTM improves answer correctness and retrieval F1 by more than 10%, while significantly advancing the Pareto frontier between query and indexing latency. HLTM has been deployed in LinkedIn's Hiring Assistant to power core personalization features in production hiring workflows.

【14】Entropy Centroids as Intrinsic Rewards for Test-Time Scaling
标题：熵质心：作为测试时扩展的内在奖励
链接：https://arxiv.org/abs/2604.26173

作者：Wenshuo Zhao,Qi Zhu,Xingshan Zeng,Fei Mi,Lifeng Shang,Yiren Feng
备注：Under Review, 39 pages
摘要：扩展大型语言模型测试时计算的一种有效方法，是采样多个响应并从中选出最佳者，例如Grok Heavy和Gemini Deep Think。现有的选择方法通常依赖外部奖励模型，这需要训练强大的奖励模型，并引入额外的计算开销。作为替代，已有方法探索了置信度和熵等内在信号，但简单聚合时这些信号噪声很大。在这项工作中，我们观察到高熵token在推理过程中倾向于聚集成连续的片段，相比单个token，它们能提供更稳定的模型不确定性度量。这些片段共同揭示了整个推理过程中模型不确定性的时间模式。受此启发，我们提出将不确定性的时间结构用作内在奖励。为此，我们首先将片段级不确定性的基本单位形式化为高熵阶段（HEP）：一个变长片段，起始于某个高熵token，并在出现连续的低熵token时结束。然后，受物理学中质心概念的启发，我们将熵质心定义为轨迹上所有HEP位置的加权平均。直观地说，较低的质心意味着先进行早期探索、随后自信地生成，我们发现这通常对应更高的响应质量。基于这一认识，我们提出了最低质心（Lowest Centroid）方法，从多个候选中选择熵质心最低的响应。我们在数学、代码生成、逻辑推理和智能体任务上开展了实验，模型规模覆盖14B到480B。结果表明，最低质心方法始终优于现有基线，并随模型规模增大带来稳定的收益。代码可在https://github.com/hkust-nlp/entropy-centroid获取。
摘要：An effective way to scale up test-time compute of large language models is to sample multiple responses and then select the best one, as in Grok Heavy and Gemini Deep Think. Existing selection methods often rely on external reward models, which require training a strong reward model and introduce additional computation overhead. As an alternative, previous approaches have explored intrinsic signals, such as confidence and entropy, but these signals are noisy with naive aggregation. In this work, we observe that high-entropy tokens tend to cluster into consecutive groups during inference, providing a more stable notion of model uncertainty than individual tokens. Together, these clusters reveal temporal patterns of model uncertainty throughout the inference process. Motivated by this observation, we propose to use the temporal structure of uncertainty as an intrinsic reward. To this end, we first formalize the basic unit of segment-level uncertainty as the High Entropy Phase (HEP), a variable-length segment that begins at a high-entropy token and ends when consecutive low-entropy tokens appear. We then define the Entropy Centroid, inspired by the concept of the center of mass in physics, as the weighted average position of all HEPs along the trajectory. Intuitively, a lower centroid indicates early exploration followed by confident generation, which we find often corresponds to higher response quality. Based on this insight, we propose the Lowest Centroid method, which selects the response with the lowest entropy centroid among multiple candidates. Experiments on mathematics, code generation, logical reasoning, and agentic tasks, across model scales ranging from 14B to 480B, show that Lowest Centroid consistently outperforms existing baselines and delivers stable gains as model size increases. Code is available at https://github.com/hkust-nlp/entropy-centroid.
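The sketch below is a minimal Lowest Centroid selector that works from per-token entropies. The entropy threshold and the number of consecutive low-entropy tokens that close a phase (`min_low_run`) are assumptions. So is weighting each HEP by its summed entropy and placing it at its normalized midpoint. The paper's exact definitions may differ.

```python
def high_entropy_phases(entropies, threshold, min_low_run=3):
    """Return (start, end) token indices of High Entropy Phases (HEPs).

    A phase opens at a token with entropy >= threshold and closes once
    `min_low_run` consecutive tokens fall below the threshold.
    """
    phases, start, last_high, low_run = [], None, None, 0
    for i, h in enumerate(entropies):
        if h >= threshold:
            if start is None:
                start = i
            last_high, low_run = i, 0
        elif start is not None:
            low_run += 1
            if low_run >= min_low_run:
                phases.append((start, last_high))
                start, low_run = None, 0
    if start is not None:  # trajectory ended inside a phase
        phases.append((start, last_high))
    return phases


def entropy_centroid(entropies, threshold, min_low_run=3):
    """Center of mass of all HEPs: position normalized to [0, 1],
    mass = summed entropy of the phase (an assumed weighting)."""
    n = len(entropies)
    phases = high_entropy_phases(entropies, threshold, min_low_run)
    total_mass = sum(sum(entropies[s:e + 1]) for s, e in phases)
    if n < 2 or total_mass == 0:
        return 0.0  # no uncertainty phases: treat the response as fully confident
    moment = sum(sum(entropies[s:e + 1]) * (s + e) / 2 / (n - 1) for s, e in phases)
    return moment / total_mass


def lowest_centroid(candidates, threshold, min_low_run=3):
    """Index of the sampled response whose entropy centroid is lowest."""
    return min(range(len(candidates)),
               key=lambda i: entropy_centroid(candidates[i], threshold, min_low_run))
```

A response that explores early and then settles, such as `[2.0, 2.0] + [0.0] * 8`, gets a centroid near 0. The same burst at the end gets a centroid near 1, so the selector prefers the former.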

【15】Test-Time Safety Alignment
标题：测试时安全对齐
链接：https://arxiv.org/abs/2604.26167

作者：Baturay Saglam,Dionysis Kalogerias
摘要：近期研究表明，模型的输入词嵌入可以作为有效的控制变量，将模型行为引导至满足所需属性的输出。然而，这一点仅在预训练的文本续写模型上得到了验证，且目标相对简单，即在短续写中减少表层脏话。一个自然且具有现实意义的问题是：输入嵌入能在多大程度上控制已对齐的模型？这类模型产生的是不平衡的“拒绝或遵从”双峰输出分布，而不是开放式生成所特有的平滑分布。我们在安全场景下探讨这一问题，表明可以以子词汇（sub-lexical）的方式优化输入词嵌入，以最小化对齐模型响应的语义危害性。我们的方法对一个黑盒文本审核API关于输入嵌入的梯度进行零阶估计，然后在这些嵌入上执行梯度下降，以最小化生成文本的危害性。实验表明，所提方法能够消除标准安全基准上每一个被标记为不安全的响应。
摘要：Recent work has shown that a model's input word embeddings can serve as effective control variables for steering its behavior toward outputs that satisfy desired properties. However, this has only been demonstrated for pretrained text-completion models on the relatively simple objective of reducing surface-level profanity in short continuations. A natural and practically important question is how well input embeddings can control aligned models, which produce an imbalanced bimodal refuse-or-comply output distribution rather than the smooth distribution characteristic of open-ended generation. We explore this in the context of safety, showing that input word embeddings can be optimized in a sub-lexical manner to minimize the semantic harmfulness of aligned model responses. Our approach uses zeroth-order gradient estimation of a black-box text-moderation API with respect to the input embeddings, and then applies gradient descent on these embeddings to minimize the harmfulness of the generated text. Experiments show that the proposed method can neutralize every safety-flagged response on standard safety benchmarks.
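The method needs gradients of a black-box moderation score with respect to input embeddings, which it obtains by zeroth-order estimation. A common estimator, assumed here rather than taken from the paper, is two-point Gaussian smoothing. In the sketch below, a toy quadratic stands in for the moderation API and a plain list of floats stands in for the embeddings.

```python
import random


def zo_gradient(f, x, rng, mu=1e-3, num_samples=20):
    """Two-point Gaussian-smoothing estimate of grad f(x) from function values only:
    the average of (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over random directions u."""
    grad = [0.0] * len(x)
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        plus = f([xi + mu * ui for xi, ui in zip(x, u)])
        minus = f([xi - mu * ui for xi, ui in zip(x, u)])
        coeff = (plus - minus) / (2 * mu) / num_samples
        grad = [g + coeff * ui for g, ui in zip(grad, u)]
    return grad


def zo_descent(f, x, lr=0.1, steps=100, seed=0):
    """Plain gradient descent driven only by black-box evaluations of f."""
    rng = random.Random(seed)
    for _ in range(steps):
        g = zo_gradient(f, x, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x
```

Each step costs `2 * num_samples` black-box queries. This query budget is what makes zeroth-order methods usable against an API that returns scores but no gradients.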

【16】AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving
标题：AMMA：面向低延迟1M上下文注意力服务的多芯粒、以内存为中心的架构
链接：https://arxiv.org/abs/2604.26103

作者：Zhongkai Yu,Haotian Ye,Chenyang Zhou,Ohm Rishabh Venkatachalam,Zaifeng Pan,Zhengding Hu,Junsung Kim,Won Woo Ro,Po-An Tsai,Shuyi Pei,Yangwook Kang,Yufei Ding
摘要：当前所有LLM服务系统都以GPU为中心，从生产级的注意力-FFN分离部署，到NVIDIA的Rubin GPU-LPU异构平台，无一例外。即使是学术界的PIM/PNM方案，仍将GPU视为跨设备通信的中心枢纽。然而，GPU计算资源丰富的架构与解码阶段注意力受内存带宽限制的本质根本不匹配：既拉高了服务延迟，又在闲置的计算单元上浪费功耗和芯片面积。随着推理和智能体工作负载将上下文长度推向100万个token，这一问题愈发严重，使注意力延迟成为用户感知的主要瓶颈。为解决这些低效问题，我们提出了AMMA，一种面向低延迟长上下文注意力的多芯粒、以内存为中心的架构。AMMA用HBM-PNM立方体取代GPU计算裸片，将可用内存带宽大致提高一倍，以更好地服务受内存限制的注意力工作负载。为了将这一带宽转化为成比例的性能提升，我们引入了：（i）一种逻辑裸片微架构，在极小的功耗和面积预算下充分利用每个立方体的内部带宽进行解码注意力计算；（ii）一种两级混合并行方案；（iii）一种重排序的集合通信流程，减少芯片内裸片间（D2D）通信开销。我们进一步对每立方体算力和芯片内D2D链路带宽进行了设计空间探索，为硬件设计人员提供可操作的指导。评估表明，与NVIDIA H100相比，AMMA的注意力延迟降低15.5倍，能耗降低6.9倍。
摘要：All current LLM serving systems place the GPU at the center, from production-level attention-FFN disaggregation to NVIDIA's Rubin GPU-LPU heterogeneous platform. Even academic PIM/PNM proposals still treat the GPU as the central hub for cross-device communication. Yet the GPU's compute-rich architecture is fundamentally mismatched with the memory-bound nature of decode-phase attention, inflating serving latency while wasting power and die area on idle compute units. The problem is compounded as reasoning and agentic workloads push context lengths toward one million tokens, making attention latency the primary user-facing bottleneck. To address these inefficiencies, we present AMMA, a multi-chiplet, memory-centric architecture for low-latency long-context attention. AMMA replaces GPU compute dies with HBM-PNM cubes, roughly doubling the available memory bandwidth to better serve memory-bound attention workloads. To translate this bandwidth into proportional performance gains, we introduce (i) a logic-die microarchitecture that fully exploits per-cube internal bandwidth for decode attention under a minimal power and area budget, (ii) a two-level hybrid parallelism scheme, and (iii) a reordered collective flow that reduces intra-chip die-to-die communication overhead. We further conduct a design-space exploration over per-cube compute power and intra-chip D2D link bandwidth, providing actionable guidance for hardware designers. Evaluations show that AMMA achieves 15.5X lower attention latency and 6.9X lower energy consumption compared with the NVIDIA H100.
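A back-of-the-envelope calculation shows why decode-phase attention is memory-bound. Consider one new query token attending over n cached tokens with head dimension d. The computation is roughly 4nd FLOPs, while 2nd key/value elements must be read. The arithmetic intensity is therefore about 2 / (bytes per element), which is 1 FLOP/byte in FP16/BF16 regardless of context length.

The sketch ignores the softmax cost and the query read. It also ignores grouped-query attention, which reuses each KV element across the heads of a group and raises the intensity by the group size.

```python
def decode_attention_intensity(context_len, head_dim, bytes_per_elem=2):
    """Arithmetic intensity (FLOPs per byte of KV-cache traffic) of one decode step:
    a single query attending over `context_len` cached tokens in one head."""
    # q . k_i for every cached key: context_len dot products of length head_dim -> 2*n*d FLOPs
    # sum_i p_i * v_i over every cached value vector                            -> 2*n*d FLOPs
    flops = 4 * context_len * head_dim
    # every cached key and value must be streamed from memory once: 2*n*d elements
    bytes_moved = 2 * context_len * head_dim * bytes_per_elem
    return flops / bytes_moved
```

This is far below the few hundred FLOPs per byte at which a modern data-center GPU becomes compute-bound. AMMA targets exactly this mismatch by trading compute dies for memory bandwidth.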

【17】Incremental Strongly Connected Components with Predictions
标题：带预测的增量强连通分量
链接：https://arxiv.org/abs/2604.26062

作者：Ronald Deng,Samuel McCauley,Aidin Niaparast,Helia Niaparast,Bennett Ptak,Shirel Quintanilla,Shikha Singh,Nathan Vosburg
摘要：带预测的算法是一个不断发展的领域，旨在利用机器学习的预测来设计超越最坏情况的更快算法。在本文中，我们利用这一框架，为增量强连通分量（SCC）问题设计了一种学习型数据结构。在该问题中，图的$n$个顶点事先已知，而$m$条有向边随时间陆续到达，目标是在每次插入后高效地维护图的强连通分量。我们的算法接收一个可能有误的边序列预测，并据此预先计算部分解，以支持快速插入。我们证明，在预测良好时，该算法达到接近最优的界，并且其性能随预测误差平滑退化。我们还实现了该数据结构，并在真实数据集上进行了实验。实证结果表明，理论分析能够预测实际运行时间的改进。
摘要：Algorithms with predictions is a growing area that aims to leverage machine-learned predictions to design faster beyond-worst-case algorithms. In this paper, we use this framework to design a learned data structure for the incremental strongly connected components (SCC) problem. In this problem, the $n$ vertices of a graph are known a priori and the $m$ directed edges arrive over time. The goal is to efficiently maintain the strongly connected components of the graph after each insert. Our algorithm receives a possibly erroneous prediction of the edge sequence and uses it to precompute partial solutions to support fast inserts. We show that our algorithm achieves nearly optimal bounds with good predictions and its performance smoothly degrades with the prediction error. We also implement our data structure and perform experiments on real datasets. Our empirical results show that the theory is predictive of practical runtime improvements.
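To make the problem setting concrete, here is a plain, prediction-free incremental SCC baseline. It is not the paper's learned data structure. It keeps components in a union-find over the condensation DAG, and on each insertion u→v it searches forward from v's component and backward from u's. Components found by both searches lie on a new cycle and are merged. Per-insert cost can be linear in the condensation size, which is the kind of work that precomputing partial solutions from a predicted edge sequence is meant to avoid.

```python
from collections import deque


class IncrementalSCC:
    """Maintain SCCs of a digraph on vertices 0..n-1 under edge insertions."""

    def __init__(self, n):
        self.parent = list(range(n))          # union-find: vertex -> component
        self.out = [set() for _ in range(n)]  # condensation edges, keyed by representative
        self.inn = [set() for _ in range(n)]  # reverse condensation edges

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def _reachable(self, start, adj):
        """Representatives reachable from `start` along `adj` (stale entries resolved by find)."""
        seen, queue = {start}, deque([start])
        while queue:
            for d in adj[queue.popleft()]:
                d = self.find(d)
                if d not in seen:
                    seen.add(d)
                    queue.append(d)
        return seen

    def insert(self, u, v):
        cu, cv = self.find(u), self.find(v)
        if cu == cv:
            return
        self.out[cu].add(cv)
        self.inn[cv].add(cu)
        forward = self._reachable(cv, self.out)
        if cu not in forward:
            return  # no new cycle: the condensation stays acyclic
        cycle = forward & self._reachable(cu, self.inn)  # components on some v ~> u path
        for c in cycle - {cu}:  # merge them into cu
            self.parent[c] = cu
            self.out[cu] |= self.out[c]
            self.inn[cu] |= self.inn[c]
            self.out[c].clear()
            self.inn[c].clear()
        # drop edges that now point inside the merged component
        self.out[cu] = {d for d in self.out[cu] if self.find(d) != cu}
        self.inn[cu] = {d for d in self.inn[cu] if self.find(d) != cu}

    def same_scc(self, a, b):
        return self.find(a) == self.find(b)


g = IncrementalSCC(4)
for u, v in [(0, 1), (1, 2), (2, 0)]:
    g.insert(u, v)
print(g.same_scc(0, 2), g.same_scc(0, 3))  # True False
```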

【18】RaMP: Runtime-Aware Megakernel Polymorphism for Mixture-of-Experts
标题：RaMP：面向混合专家模型的运行时感知Megakernel多态性
链接：https://arxiv.org/abs/2604.26039

作者：Vyom Sharma,Debajyoti Datta
备注：10 pages, 8 figures, 9 tables. Preprint
摘要：混合专家（Mixture-of-Experts，MoE）推理的最优内核配置同时取决于批大小和专家路由分布，但生产系统仅根据批大小进行分派，导致10-70%的内核吞吐量未能实现。我们提出了RaMP，一个路由感知的分派框架。性能区域分析仅凭硬件常数即可推断每项优化何时有效，并对全部8个测试架构（包括3个未见过的架构）做出了正确预测。一个四参数的波次（wave）成本模型根据运行时的专家直方图选择最快的配置，与穷举搜索相比平均遗憾仅为0.93%，且每个模型只需10-24分钟的一次性性能剖析即可完成拟合。由于该模型仅依赖于CTA网格几何形状，它与具体内核无关：应用于Alpha-MoE时，无需修改源码即可带来1.14倍的加速。RaMP与协同设计的CuTe DSL内核（提供134-268种多态配置）配合使用，相比静态分派实现了1.22倍的内核加速；在vLLM服务中，端到端加速相对Triton为1.30倍，相对DeepGEMM为1.41倍，相对FlashInfer CUTLASS为1.13倍。
摘要：The optimal kernel configuration for Mixture-of-Experts (MoE) inference depends on both batch size and the expert routing distribution, yet production systems dispatch from batch size alone, leaving 10-70% of kernel throughput unrealized. We present RaMP, a routing-aware dispatch framework. A performance-region analysis derives, from hardware constants alone, when each optimization helps, correctly predicting all 8 tested architectures, including 3 unseen. A four-parameter wave cost model selects the fastest configuration from the runtime expert histogram, achieving 0.93% mean regret versus exhaustive search, fitted from just 10-24 minutes of one-time profiling per model. Because the model depends only on CTA grid geometry, it is kernel-agnostic: applied to Alpha-MoE, it delivers 1.14x with no source modification. Paired with a co-designed CuTe DSL kernel exposing 134-268 polymorphic configurations, RaMP delivers 1.22x kernel speedup over static dispatch and 1.30x end-to-end speedup in vLLM serving over Triton, 1.41x over DeepGEMM, and 1.13x over FlashInfer CUTLASS.
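A toy wave-quantization model illustrates why the same batch size can favour different kernel configurations depending on the routing histogram. Everything here is hypothetical: the `TileConfig` fields, the per-expert tiling rule, the one-CTA-per-SM wave assumption, and the four coefficients in `theta` are illustrative choices. They are not RaMP's actual cost model or its fitted parameters.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TileConfig:
    block_m: int  # token rows per CTA tile (hypothetical knob)
    block_n: int  # output columns per CTA tile (hypothetical knob)


def cta_count(histogram, cfg, n_cols):
    """CTAs for a grouped GEMM where each expert's tokens are tiled separately."""
    tiles_n = math.ceil(n_cols / cfg.block_n)
    return sum(math.ceil(tokens / cfg.block_m) * tiles_n for tokens in histogram if tokens > 0)


def wave_cost(histogram, cfg, n_cols, num_sms, theta):
    """Toy cost: fixed + waves * (per_wave + per_area * tile area) + per_cta * CTAs."""
    fixed, per_wave, per_area, per_cta = theta
    ctas = cta_count(histogram, cfg, n_cols)
    waves = math.ceil(ctas / num_sms)  # assumes one resident CTA per SM per wave
    return fixed + waves * (per_wave + per_area * cfg.block_m * cfg.block_n) + per_cta * ctas


def select_config(histogram, configs, n_cols, num_sms, theta):
    """Pick the configuration with the lowest modelled cost for this routing histogram."""
    return min(configs, key=lambda cfg: wave_cost(histogram, cfg, n_cols, num_sms, theta))


# Same batch (256 tokens, 256 experts), two different routing histograms.
skewed = [256] + [0] * 255  # every token routed to one expert
spread = [1] * 256          # one token per expert
configs = [TileConfig(16, 128), TileConfig(64, 128)]
theta = (0.0, 1.0, 1e-4, 0.0)
print(select_config(skewed, configs, n_cols=128, num_sms=4, theta=theta))
print(select_config(spread, configs, n_cols=128, num_sms=4, theta=theta))
```

With the skewed histogram, large tiles need fewer waves and win. With the spread histogram, both configurations launch one CTA per expert, so the cheaper small tile wins. A dispatcher that sees only the batch size cannot tell these cases apart.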

【19】Open Problems in Frontier AI Risk Management
标题：前沿人工智能风险管理中的开放问题
链接：https://arxiv.org/abs/2604.25982

作者：Marta Ziosi,Miro Plueckebaum,Stephen Casper,Henry Papadatos,Ze Shen Chin,Peter Slattery,James Gealy,Tim G. J. Rudner,Brian Tse,Ariel Gil,Patricia Paskov,Maximilian Negele,Rokas Gipiškis,Nada Madkour,Vera Lummis,Rupal Jain,Luise Eder,Kristina Fort,Malou C. van Draanen Glismann,Inès Belhadj,Amin Oueslati,Anna K. Wisakanto,Richard Mallah,Koen Holtman,Ranj Zuhdi,Daniel S. Schiff,Jessica Newman,Malcolm Murray,Robert Trager
备注：81 pages, 3 figures
摘要：前沿人工智能既放大了现有风险，又带来了性质全新的挑战。由于技术变革的迅速发展，不仅明显缺乏稳定的科学共识，而且新兴的前沿人工智能安全实践往往与既定的风险管理框架不一致，甚至可能削弱这些框架。为应对这些挑战，我们系统地梳理了前沿人工智能风险管理中的开放问题。我们采用以问题为导向的方法，通过结构化的文献综述，考察风险管理过程的每个阶段（风险规划、识别、分析、评估和缓解），识别尚未解决的挑战以及最适合解决这些挑战的参与者。鉴于不同类型的开放问题需要不同的应对方式，我们根据开放问题反映的是以下哪种情况对其进行分类：（a）缺乏科学或技术共识；（b）与既定风险管理框架不一致或对其构成挑战；（c）尽管存在明显的共识和一致性，但在实施中存在不足。通过梳理这些开放问题，并确定最适合解决它们的参与者（包括开发者、部署者、监管机构、标准机构、研究人员和第三方评估者），这项工作旨在阐明需要在哪些方面取得进展，才能在前沿人工智能风险管理上形成稳健而有意义的共识。本文并不提出具体的解决方案，而是提供一份以问题为导向、确立议程的参考文件，并辅以一个持续更新的在线资料库，旨在支持协调、减少重复工作，并指导未来的研究和治理工作。
摘要：Frontier AI both amplifies existing risks and introduces qualitatively novel challenges. Not only is there a notable lack of stable scientific consensus resulting from the rapid pace of technological change, but emerging frontier AI safety practices are often misaligned with, or may undermine, established risk management frameworks. To address these challenges, we systematically surface open problems in frontier AI risk management. Adopting a problem-oriented approach, we examine each stage of the risk management process - risk planning, identification, analysis, evaluation, and mitigation - through a structured review of the literature, identifying unresolved challenges and the actors best positioned to address them. Recognising that different types of open problems call for different responses, we classify open problems according to whether they reflect (a) a lack of scientific or technical consensus, (b) misalignment with, or challenges to, established risk management frameworks, or (c) shortcomings in implementation despite apparent consensus and alignment. By mapping these open problems and identifying the actors best positioned to address them - including developers, deployers, regulators, standards bodies, researchers, and third-party evaluators - this work aims to clarify where progress is needed to enable robust and meaningful consensus on frontier AI risk management. The paper does not propose specific solutions; instead, it provides a problem-oriented, agenda-setting reference document, complemented by a living online repository, intended to support coordination, reduce duplication, and guide future research and governance efforts.

【20】Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective
标题：通过统一的信息理论目标重新思考KV缓存驱逐
链接：https://arxiv.org/abs/2604.25975

作者：Jiaming Yang,Chenwei Tang,Liangli Zhen,Jiancheng Lv
备注：19 pages, 6 figures
摘要：键值（KV）缓存对大语言模型推理必不可少，但其内存开销已成为长上下文生成的关键瓶颈。现有的驱逐策略主要依赖经验启发式方法，缺乏严格的理论基础。本工作从信息瓶颈原理的视角重新思考KV缓存驱逐。在注意力的线性高斯代理模型下，我们推导出一个闭式的互信息目标，用以刻画所保留KV缓存子集的有效信息容量。这一表述表明，现有的多种驱逐策略都可以解释为同一容量最大化原理的不同近似。在这一见解的指导下，我们提出了CapKV，一种容量感知的驱逐方法，它利用统计杠杆分数对对数行列式进行近似，从而直接以信息保留为目标。该方法以一种有理论依据、能够保留最大预测信号的机制取代了启发式选择。在多个模型和长上下文基准上的大量实验表明，CapKV始终优于先前的方法，在内存效率和生成保真度之间实现了更好的权衡。
摘要：Key-value (KV) caching is essential for large language model inference, yet its memory overhead poses a critical bottleneck for long-context generation. Existing eviction policies predominantly rely on empirical heuristics, lacking a rigorous theoretical foundation. This work rethinks KV cache eviction through the lens of the Information Bottleneck principle. Under a linear-Gaussian surrogate of attention, we derive a closed-form mutual information objective that characterizes the effective information capacity of a retained KV cache subset. This formulation reveals that a wide range of existing eviction strategies can be interpreted as different approximations of the same capacity-maximization principle. Guided by this insight, we introduce CapKV, a capacity-aware eviction method that directly targets information preservation via a log-determinant approximation using statistical leverage scores. This approach replaces heuristic selection with a theoretically grounded mechanism that preserves the maximum predictive signal. Extensive experiments across multiple models and long-context benchmarks show that CapKV consistently outperforms prior methods, achieving a better trade-off between memory efficiency and generational fidelity.
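The following is a rough numpy sketch of leverage-score-based eviction. It assumes that the cached key vectors alone stand in for the linear-Gaussian surrogate. It computes ridge leverage scores of the keys, keeps the highest-scoring tokens, and measures the kept subset's log-determinant "capacity" log det(I + K_Sᵀ K_S / σ²). That quantity is twice the mutual information between w and K_S w + noise when w ~ N(0, I). The regularizer `lam`, the noise level, and the top-k rule are assumptions. CapKV's actual objective and scoring, including how values and queries enter, are specified in the paper.

```python
import numpy as np


def ridge_leverage_scores(keys, lam=1e-2):
    """h_i = k_i^T (K^T K + lam I)^{-1} k_i for every cached key row k_i."""
    gram = keys.T @ keys + lam * np.eye(keys.shape[1])
    solved = np.linalg.solve(gram, keys.T)  # shape (d, n)
    return np.einsum("ij,ji->i", keys, solved)


def logdet_capacity(kept_keys, noise=1.0):
    """log det(I + K_S^T K_S / noise) for the kept subset K_S."""
    d = kept_keys.shape[1]
    _, logdet = np.linalg.slogdet(np.eye(d) + kept_keys.T @ kept_keys / noise)
    return logdet


def evict_by_leverage(keys, budget, lam=1e-2):
    """Keep the `budget` tokens with the highest leverage; return their sorted indices."""
    scores = ridge_leverage_scores(keys, lam)
    return np.sort(np.argsort(scores)[::-1][:budget])


# Four redundant keys along e1 and one unique key along e2.
keys = np.array([[1.0, 0.0]] * 4 + [[0.0, 1.0]])
keep = evict_by_leverage(keys, budget=2)
print(4 in keep)  # the only token spanning the second direction survives
print(logdet_capacity(keys[keep]), logdet_capacity(keys[[0, 1]]))
```

Redundant tokens share their leverage, so a token carrying a direction no other token covers scores close to 1 and is retained. Keeping it yields a larger log-determinant than keeping two duplicates.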

【21】A Randomized PDE Energy driven Iterative Framework for Efficient and Stable PDE Solutions
标题：一种随机PDE能量驱动的迭代框架，用于高效稳定地求解PDE
链接：https://arxiv.org/abs/2604.25943

作者：Yi Bing,Zheng Ran,Fu Jinyang,Liu Long,Peng Xiang
摘要：高效且稳定地求解偏微分方程（PDE）是科学和工程应用的核心，但现有的数值求解器严重依赖基于矩阵的离散化，而基于学习的方法需要昂贵的训练，且泛化能力往往有限。在这项工作中，我们提出了一个PDE能量驱动的框架，通过物理约束的扩散迭代求解PDE，既不依赖经典的基于矩阵的有限元组装，也不依赖数据驱动的神经网络训练。该方法通过PDE能量驱动的隐式迭代并结合高斯平滑，演化任意的随机初始场，同时在每次迭代中严格施加边界条件。我们将该方法应用于具有代表性的一维Poisson方程、热方程和粘性Burgers方程，涵盖稳态和瞬态问题。数值结果表明，从随机初始化出发，该方法能够稳定地收敛到唯一的物理解，准确分辨陡峭梯度，并在很宽的离散化参数范围内保持受控的均方误差（MSE）。与解析解的详细比较表明，该框架具有有竞争力的精度和稳定性。总体而言，该框架为传统数值求解器提供了一种快速、灵活且物理一致的替代方案，为研究和工程应用中可扩展的PDE求解提供了一条潜在途径。
摘要：Efficient and stable solution of partial differential equations (PDEs) is central to scientific and engineering applications, yet existing numerical solvers rely heavily on matrix-based discretizations, while learning-based methods require costly training and often suffer from limited generalization. In this work, we propose a PDE energy-driven framework that solves PDEs through physically constrained diffusion iterations, without relying on classical matrix-based finite element assembly or data-driven neural network training. The proposed method evolves arbitrary random initial fields through PDE energy-driven implicit iterations combined with Gaussian smoothing, while strictly enforcing boundary conditions at each iteration. The proposed formulation is applied to representative one-dimensional Poisson, Heat, and viscous Burgers equations, covering both steady-state and transient problems. Numerical results demonstrate stable convergence to the unique physical solution from random initializations, with accurate resolution of sharp gradients and controlled Mean Squared Error (MSE) across a wide range of discretization parameters. Detailed comparisons with analytical solutions indicate that the framework achieves competitive accuracy and stability. Overall, the proposed framework provides a fast, flexible, and physically consistent alternative to traditional numerical solvers, offering a potential pathway for scalable PDE solutions in both research and engineering applications.
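The next example is a much-simplified stand-in for the energy-driven idea. It is not the paper's scheme and omits its Gaussian smoothing. It treats the 1D Poisson problem −u'' = f on [0, 1] with Dirichlet data. Starting from a random field, each sweep sets every interior value to the minimizer of the discrete Dirichlet energy given its neighbours (a Jacobi sweep) and re-imposes the boundary values. Grid size, iteration count, and the test problem are illustrative choices.

```python
import math
import random


def solve_poisson_1d(f, left, right, n_interior=20, iters=5000, seed=0):
    """Jacobi energy descent for -u'' = f on [0, 1] with u(0) = left, u(1) = right."""
    h = 1.0 / (n_interior + 1)
    xs = [i * h for i in range(n_interior + 2)]
    rng = random.Random(seed)
    u = [rng.uniform(-1.0, 1.0) for _ in xs]  # arbitrary random initial field
    for _ in range(iters):
        u[0], u[-1] = left, right  # enforce boundary conditions every iteration
        # Each interior value minimizes the discrete energy
        #   sum_i (u[i+1] - u[i])^2 / (2h) - h * sum_i f(x_i) u[i]
        # with its two neighbours held fixed.
        u = ([u[0]]
             + [0.5 * (u[i - 1] + u[i + 1] + h * h * f(xs[i])) for i in range(1, n_interior + 1)]
             + [u[-1]])
    return xs, u


# Test problem with exact solution u(x) = sin(pi x).
xs, u = solve_poisson_1d(lambda x: math.pi ** 2 * math.sin(math.pi * x), 0.0, 0.0)
max_err = max(abs(a - math.sin(math.pi * x)) for x, a in zip(xs, u))
print(f"max error vs exact solution: {max_err:.2e}")
```

The random initial error decays geometrically, so the remaining error is the O(h²) discretization error of the three-point stencil. Plain Jacobi needs O(n²) sweeps; that slow convergence is what smoothing or preconditioning would be expected to speed up.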

【22】Inferring bifurcation diagrams of two distinct chaotic systems by a single machine
标题：用单台机器推断两个不同混沌系统的分岔图
链接：https://arxiv.org/abs/2604.26632

作者：Jianmin Guo,Yao Du,Yizhen Yu,Yong Zou,Xingang Wang
备注：10 pages, 4 figures
摘要：我们提出了一种双通道储备池计算方案，用单台机器推断两个不同混沌系统的动力学。通过为标准储备池增加一个系统标签通道和一个参数控制通道，可以用从两个系统的少数采样状态收集的时间序列来训练机器。我们表明，训练后的机器不仅能预测采样状态的短期演化，还能再现未见状态的长期统计特性，从而能够基于部分观测重建两个系统的分岔图。数值模拟中的Lorenz和Rössler系统以及实验中的Chua和Rössler电路验证了该方案的有效性。功能网络分析进一步表明，这两个目标系统在储备池中由不同的动力学模式编码。这些结果扩展了多功能和参数感知的储备池计算，并为利用单台机器对多个非线性系统进行数据驱动推断提供了一条途径。
摘要：We propose a dual-channel reservoir-computing scheme for inferring the dynamics of two distinct chaotic systems with a single machine. By augmenting a standard reservoir with a system-label channel and a parameter-control channel, the machine can be trained from time series collected from a few sampled states of the two systems. We show that the trained machine not only predicts the short-time evolution of the sampled states, but also reproduces the long-term statistical properties of unseen states, thereby enabling reconstruction of the bifurcation diagrams of both systems from partial observations. The effectiveness of the scheme is demonstrated for the Lorenz and Rössler systems in numerical simulations and for the Chua and Rossler circuits in experiments. Functional-network analysis further shows that the two target systems are encoded by distinct dynamical patterns in the reservoir. These results extend multifunctional and parameter-aware reservoir computing, and provide a route to data-driven inference of multiple nonlinear systems using a single machine.
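Here is a minimal echo-state-network sketch of the dual-channel idea, assuming numpy. Each input vector carries the observed value plus a system-label channel and a parameter channel, and a single ridge-regression readout is trained on time series from both systems. To stay short, it uses two 1D chaotic maps (logistic and sine maps) instead of the Lorenz/Rössler flows. All sizes, scalings, and parameter offsets are illustrative assumptions, not the paper's settings.

```python
import numpy as np


def iterate_map(step, x0, n):
    """Return n iterates of a 1D map starting at x0."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(step(xs[-1]))
    return np.array(xs)


def logistic(r):
    return lambda x: r * x * (1.0 - x)


def sine_map(a):
    return lambda x: a * np.sin(np.pi * x)


class ParamAwareESN:
    """Echo-state network driven by [x_t, system_label, control_param] (illustrative)."""

    def __init__(self, n_res=200, spectral_radius=0.8, input_scale=1.0, ridge=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        w = rng.uniform(-1.0, 1.0, (n_res, n_res))
        w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))  # rescale recurrent weights
        self.w = w
        self.w_in = rng.uniform(-input_scale, input_scale, (n_res, 3))  # data, label, parameter
        self.bias = rng.uniform(-0.2, 0.2, n_res)
        self.ridge = ridge
        self.w_out = None

    def states(self, series, label, param, washout=100):
        """Drive the reservoir; return (states, next-step targets) after a washout."""
        r = np.zeros(self.w.shape[0])
        collected = []
        for x in series[:-1]:
            r = np.tanh(self.w @ r + self.w_in @ np.array([x, label, param]) + self.bias)
            collected.append(r)
        return np.array(collected[washout:]), np.asarray(series[washout + 1:])

    def fit(self, datasets):
        """One ridge readout over all (series, label, param) datasets; returns training RMSE."""
        pairs = [self.states(s, lab, p) for s, lab, p in datasets]
        feats = np.vstack([np.hstack([x, np.ones((len(x), 1))]) for x, _ in pairs])
        targets = np.concatenate([y for _, y in pairs])
        gram = feats.T @ feats + self.ridge * np.eye(feats.shape[1])
        self.w_out = np.linalg.solve(gram, feats.T @ targets)
        return float(np.sqrt(np.mean((feats @ self.w_out - targets) ** 2)))


# Label 0 = logistic map, 1 = sine map; parameter channel = offset from a reference value.
data = [(iterate_map(logistic(r), 0.3, 600), 0.0, r - 3.8) for r in (3.7, 3.9)]
data += [(iterate_map(sine_map(a), 0.3, 600), 1.0, a - 0.93) for a in (0.9, 0.97)]
esn = ParamAwareESN()
train_rmse = esn.fit(data)
print(f"training RMSE: {train_rmse:.4f}")
```

The natural next step for scanning unseen parameter values would be a closed-loop run: feed each prediction back as the next x_t while holding the label and a new parameter value fixed. That step is omitted here.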

【23】Recipes for Calibration Checks in Safety-Critical Applications
标题：安全关键应用中的校准检查方法
链接：https://arxiv.org/abs/2604.26479

作者：Romeo Valentin
备注：36 pages, 22 figures. Manuscript prepared with Typst
摘要：自动驾驶汽车、天气预报和医疗监测等安全关键预测系统通常依赖概率预报器。这些预报器对未来可能的结果做出预测，其质量和稳健性需要经过验证和认证。通常，只有准确性（即预测的均值）会与真实结果进行对比评估。然而，对于安全关键场景和不确定性下的决策，应当检查预报的完整分布特性：观测到的预测误差是否确实服从所预报的概率分布？为此，我们提出了一个校准检查框架：在大量样本上验证预报分布特性的统计检验。为了便于在实际运营中使用，这些检查对从预报器收集的数据给出单一的接受/拒绝决策。这与典型的校准计算形成对比，后者产生一个或多个连续的校准分数，并且需要专业知识才能纳入验证流程。我们还对校准检验进行了修改，以进一步支持实际落地：（a）只拒绝过度自信的预测，从而允许在安全关键场景中做出悲观或谨慎的预测；（b）即使验证样本数量很大，也能容忍微小的、在运营上可接受的偏差。我们将校准检查过程组织为一个由四个步骤组成的模块化流程：（i）数据模型，（ii）所选度量，（iii）假设表述，（iv）检验程序。每个步骤都由可独立替换的组件构成，从而支持多种可能的用例和权衡。我们在天气预报和机器人姿态估计这两个互补的示例问题上展示了该框架的适用性。
摘要：Safety-critical prediction systems, such as autonomous vehicles, weather forecasters, and medical monitors, commonly rely on probabilistic forecasters. These forecasters make predictions about possible future outcomes, and their quality and robustness need to be validated and certified. Often, only accuracy -- the mean of the predictions -- is evaluated against true outcomes. However, for safety-critical scenarios and decision making under uncertainty, the full distributional properties of the forecasts should be checked: do the observed prediction errors actually follow the forecasted probability distributions? To this end, we introduce a framework for calibration checks: statistical tests that validate distributional properties of forecasts when measured over many samples. In order to support ease-of-use in real-world operations, these checks produce a single accept/reject decision for data collected from a forecaster. This contrasts with typical calibration calculations, which produce one or multiple continuous calibration scores and require expertise to implement in a validation workflow. We further support operationalization by introducing modifications to calibration testing that (a) reject only overconfident predictions, allowing for pessimistic or cautious predictions in safety-critical settings, and (b) tolerate small, operationally acceptable deviations even for large numbers of validation samples. We organize the calibration checking process into a modular pipeline comprising four steps: (i) the data model, (ii) the chosen metric, (iii) the hypothesis formulation, and (iv) the testing procedure. Each step consists of independently swappable components, thereby supporting a large variety of possible use-cases and trade-offs. We demonstrate the applicability of the framework on two complementary example problems, weather forecasting and robot pose estimation.
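Below is a minimal sketch of a check with both modifications: one-sided, and tolerant of small deviations. It assumes Gaussian forecasts and uses the second moment of standardized errors as the metric, with a normal approximation for the test. This is one illustrative choice of data model, metric, hypothesis, and procedure, not the paper's specific test.

```python
import math
import random
from statistics import NormalDist


def overconfidence_check(means, stds, outcomes, alpha=0.05, tolerance=0.1):
    """One-sided accept/reject check for Gaussian forecasts N(mean, std^2).

    z = (y - mean) / std. Calibrated forecasts give E[z^2] = 1; overconfident
    forecasts (std too small) push it above 1. H0: E[z^2] <= 1 + tolerance.
    At the H0 boundary z^2 = (1 + tolerance) * chi2_1, so Var(z^2) = 2 (1 + tolerance)^2.
    Returns True to accept and False to reject (normal approximation).
    """
    z2 = [((y - m) / s) ** 2 for m, s, y in zip(means, stds, outcomes)]
    n = len(z2)
    c = 1.0 + tolerance
    stat = (sum(z2) / n - c) / math.sqrt(2.0 * c * c / n)
    return stat <= NormalDist().inv_cdf(1.0 - alpha)


rng = random.Random(0)
n = 2000
zeros, ones = [0.0] * n, [1.0] * n  # every forecast is N(0, 1)
calibrated = [rng.gauss(0.0, 1.0) for _ in range(n)]
overconfident = [rng.gauss(0.0, 2.0) for _ in range(n)]  # true spread twice the forecast std
cautious = [rng.gauss(0.0, 0.5) for _ in range(n)]       # true spread half the forecast std
results = [overconfidence_check(zeros, ones, y) for y in (calibrated, overconfident, cautious)]
print(results)
```

Because the alternative is one-sided, cautious (too-wide) forecasts pass. The `tolerance` margin keeps a tiny, operationally irrelevant excess over 1 from being rejected just because the sample is very large.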

【24】Order-Sensitive Sequential Interventions on Ideal Lattices
标题：理想格上的顺序敏感序贯干预
链接：https://arxiv.org/abs/2604.26472

作者：Dmitry Pasechnyuk-Vilensky
备注：18 pages
摘要：我们研究先决条件约束下的序贯干预。在这种设定下，可容许的干预序列是有限先决条件偏序集的理想格中的路径，而不是不受约束的动作串。我们在该状态空间上给出了一个精确的、从局部到全局的顺序敏感性理论。首先，我们证明了任意两条端点相同的可容许路径都可以通过有限次基本菱形交换相互转换。其次，对于边可加的路径估值，我们证明路径无关性等价于菱形曲率为零，由此得到一个端点势函数，它在理想格上具有典范的Möbius参数化。第三，我们证明一个局部菱形场能由基于边的路径模型诱导，当且仅当它满足立方体一致性；并且在固定参考树规范（gauge）之后，该模型是唯一的。在约简状态纵向假设下，有支撑的参考路径可以识别参考路径分数，而局部顺序效应则要求每个菱形上两种顺序都有双侧支撑。这些结果带来了精确的规划推论，包括一个顺序不敏感性界以及在截断理想格上的动态规划。
摘要：We study sequential interventions under prerequisite constraints. In this setting, admissible intervention sequences are paths in the ideal lattice of a finite prerequisite poset rather than unconstrained action strings. We give an exact local-to-global theory of order sensitivity on this state space. First, we prove that any two admissible paths with the same endpoints differ by a finite sequence of elementary diamond swaps. Second, for edge-additive path valuations, we show that path-independence is equivalent to vanishing diamond curvature, yielding an endpoint potential with a canonical Möbius parameterization on the ideal lattice. Third, we prove that a local diamond field is induced by an edge-based path model if and only if it satisfies cube consistency, with uniqueness after fixing a reference-tree gauge. Under reduced-state longitudinal assumptions, supported reference paths identify reference-path scores, whereas local order effects require two-sided support of both orders on each diamond. These results yield exact planning consequences, including an order-insensitivity bound and dynamic programming on the truncated ideal lattice.
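To make these objects concrete, the sketch below encodes order ideals as bitmasks of completed interventions. It runs dynamic programming over the ideal lattice for an edge-additive valuation and computes the diamond curvature, the order effect of swapping two adjacent interventions, which path-independence requires to vanish. It is a brute-force illustration, exponential in the number of interventions, and `bonus_value` is a made-up valuation. It is not the paper's truncated-lattice algorithm.

```python
from functools import lru_cache


def best_sequence(n, prereqs, edge_value):
    """Max-value admissible order of items 0..n-1 via DP over the ideal lattice.

    prereqs[i] is the set of items that must precede i. An order ideal (a set of
    completed items closed under prerequisites) is encoded as a bitmask, and
    edge_value(mask, i) is the value of adding item i to ideal `mask`.
    Assumes the prerequisite relation is acyclic.
    """
    need = [sum(1 << p for p in prereqs[i]) for i in range(n)]
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def solve(mask):
        if mask == full:
            return 0.0, ()
        best = None
        for i in range(n):
            addable = not (mask >> i) & 1 and (need[i] & mask) == need[i]
            if addable:
                value, rest = solve(mask | (1 << i))
                candidate = (edge_value(mask, i) + value, (i,) + rest)
                if best is None or candidate[0] > best[0]:
                    best = candidate
        return best

    return solve(0)


def diamond_curvature(mask, a, b, edge_value):
    """Order effect on a diamond: (a then b) minus (b then a) from the same ideal."""
    a_first = edge_value(mask, a) + edge_value(mask | (1 << a), b)
    b_first = edge_value(mask, b) + edge_value(mask | (1 << b), a)
    return a_first - b_first


def bonus_value(mask, item):
    """Made-up valuation: each step is worth 1; item 1 is worth 5 more once item 0 is done."""
    return 1.0 + (5.0 if item == 1 and mask & 1 else 0.0)


prereqs = [set(), set(), {0, 1}]  # item 2 requires both 0 and 1
print(best_sequence(3, prereqs, bonus_value))
print(diamond_curvature(0, 0, 1, bonus_value))  # nonzero: order matters on this diamond
```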

【25】DiffAnon: Diffusion-based Prosody Control for Voice Anonymization
标题：DiffAnon：基于扩散的语音匿名化韵律控制
链接：https://arxiv.org/abs/2604.26281

作者：Ismail Rasim Ulgen,Zexin Cai,Nicholas Andrews,Philipp Koehn,Berrak Sisman
备注：Submitted to Interspeech 2026
摘要：保留还是不保留韵律，是语音匿名化中的核心问题。韵律承载着语义和情感，但又与说话人身份紧密耦合。现有方法要么为了隐私而舍弃韵律，要么缺乏控制效用-隐私权衡的原则性机制，只能在固定的设计点上运行。我们提出了DiffAnon，一种基于扩散、采用无分类器引导（CFG）的匿名化方法，能够在推理时对韵律保留程度进行显式、连续的控制。DiffAnon在RVQ编解码器的语义嵌入之上细化声学细节，从而在单个模型内实现匿名化强度与韵律保真度之间的平滑插值。据我们所知，这是第一个提供结构化、可插值的推理时韵律控制的语音匿名化框架。实验展示了结构化的权衡行为：在各个可控工作点上，该方法在保持有竞争力的隐私保护的同时实现了较强的效用。
摘要：To preserve or not to preserve prosody is a central question in voice anonymization. Prosody conveys meaning and affect, yet is tightly coupled with speaker identity. Existing methods either discard prosody for privacy or lack a principled mechanism to control the utility-privacy trade-off, operating at fixed design points. We propose DiffAnon, a diffusion-based anonymization method with classifier-free guidance (CFG) that provides explicit, continuous inference-time control over prosody preservation. DiffAnon refines acoustic detail over semantic embeddings of an RVQ codec, enabling smooth interpolation between anonymization strength and prosodic fidelity within a single model. To the best of our knowledge, it is the first voice anonymization framework to provide structured, interpolatable inference-time prosody control. Experiments demonstrate structured trade-off behavior, achieving strong utility while maintaining competitive privacy across controllable operating points.
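Classifier-free guidance itself is a one-line extrapolation between an unconditional and a conditional model prediction. Sweeping the guidance weight is what provides a continuous inference-time knob. The sketch shows only that standard combination on plain lists. It does not model which conditioning signal plays the role of prosody in DiffAnon or how the weight maps to anonymization strength; those details are in the paper.

```python
def cfg_combine(pred_uncond, pred_cond, guidance_scale):
    """Standard classifier-free guidance, element-wise: uncond + w * (cond - uncond).

    w = 0 returns the unconditional prediction and w = 1 the conditional one;
    values in between interpolate and values above 1 extrapolate.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(pred_uncond, pred_cond)]


uncond, cond = [0.0, 1.0], [1.0, 3.0]
for w in (0.0, 0.5, 1.0):
    print(w, cfg_combine(uncond, cond, w))
```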

【26】Occam's Razor is Only as Sharp as Your ELBO
标题：奥卡姆剃刀的锋利程度取决于你的ELBO
链接：https://arxiv.org/abs/2604.25984

作者：Ethan Harvey,Michael C. Hughes
摘要：边际似然（又称证据）被视为奥卡姆剃刀的数学体现，可用于进行避免过拟合的模型选择。变分推断中的证据下界（ELBO）目标也被用于类似的目的。已有工作表明，通过平均场近似来限制近似后验族可能导致ELBO欠拟合。在本文中，我们展示了在一个简单的过参数化回归模型中，基于ELBO的超参数学习同样可能导致过拟合，这取决于高斯近似后验中协方差矩阵的假定秩。令人惊讶的是，在仅有欠拟合和过拟合两种选项时，基于证据本身的贝叶斯模型选择有时反而偏好过拟合的版本，而ELBO则不会。希望扩展到大模型的贝叶斯实践者应当谨慎考虑：为保证可处理性而引入的降秩假设，可能会如何影响模型选择的能力。
摘要：The marginal likelihood, also known as the evidence, is regarded as a mathematical embodiment of Occam's razor, enabling model selection that avoids overfitting. The evidence lower bound (ELBO) objective from variational inference has also been used for similar purposes. Prior work has shown that restricting the approximate posterior family via a mean-field approximation can lead the ELBO to underfit. In this paper, we show how ELBO-based hyperparameter learning in a simple over-parameterized regression model can also produce overfitting, depending on the assumed rank of the covariance matrix in a Gaussian approximate posterior. Surprisingly, among only the underfit and overfit options, Bayesian model selection via the evidence itself sometimes prefers the overfit version, while the ELBO does not. Bayesian practitioners hoping to scale to large models should be cautious about how reduced-rank assumptions needed for tractability may impact the potential for model selection.
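The following numpy sketch shows how the covariance structure of a Gaussian q changes the gap between the ELBO and the evidence in Bayesian linear regression. With a full covariance, the optimal q is the exact posterior and the ELBO equals the log evidence. With a diagonal (mean-field) covariance, a strictly positive gap remains. It keeps the hyperparameters fixed and does not reproduce the paper's reduced-rank or hyperparameter-learning experiments; the data and hyperparameter values are illustrative.

```python
import numpy as np


def log_evidence(X, y, alpha, sigma2):
    """log p(y) for y = X w + noise, w ~ N(0, I / alpha), noise ~ N(0, sigma2 I)."""
    n = len(y)
    cov = sigma2 * np.eye(n) + X @ X.T / alpha
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(cov, y))


def elbo(X, y, alpha, sigma2, m, S):
    """ELBO = E_q[log p(y | w)] - KL(q || prior) for q(w) = N(m, S)."""
    n, d = X.shape
    resid = y - X @ m
    exp_loglik = (-0.5 * n * np.log(2 * np.pi * sigma2)
                  - (resid @ resid + np.trace(X.T @ X @ S)) / (2 * sigma2))
    _, logdet_S = np.linalg.slogdet(S)
    kl = 0.5 * (alpha * np.trace(S) + alpha * m @ m - d - d * np.log(alpha) - logdet_S)
    return exp_loglik - kl


def optimal_q(X, y, alpha, sigma2, structure="full"):
    """Best Gaussian q: exact posterior ("full") or mean-field optimum ("diag")."""
    precision = alpha * np.eye(X.shape[1]) + X.T @ X / sigma2
    m = np.linalg.solve(precision, X.T @ y / sigma2)  # both optima share the posterior mean
    S = np.linalg.inv(precision) if structure == "full" else np.diag(1.0 / np.diag(precision))
    return m, S


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 30))  # over-parameterized: 30 weights, 10 observations
y = 0.3 * X @ rng.normal(size=30) + 0.1 * rng.normal(size=10)
alpha, sigma2 = 1.0, 0.01
ev = log_evidence(X, y, alpha, sigma2)
gap_full = ev - elbo(X, y, alpha, sigma2, *optimal_q(X, y, alpha, sigma2, "full"))
gap_diag = ev - elbo(X, y, alpha, sigma2, *optimal_q(X, y, alpha, sigma2, "diag"))
print(f"gap (full): {gap_full:.2e}, gap (diag): {gap_diag:.2f}")
```

The gap equals KL(q || posterior). Its size therefore depends on how well the restricted covariance family can match the true posterior correlations, and this can shift which hyperparameters the ELBO prefers relative to the evidence.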

【27】Auditing Marketing Budget Allocation with Hindsight Regret
标题：用事后遗憾审计营销预算分配
链接：https://arxiv.org/abs/2604.25977

作者：Nilavra Pathak,Olivier Jeunen,Eric Lambert
备注：6 pages, 8 figures
摘要：各组织经常在运营约束下进行战略性预算分配，但往往缺乏一种有原则的方法来评估已实现的分配在事后看来是否接近最佳可行选择。我们提出了一个基于事后遗憾的回顾性审计框架，事后遗憾定义为：在相同预算和稳定性护栏下，已实现分配相对于忠实遵守约束的基准所产生的机会成本。该框架从历史日志中估计特定状态（regime）下的支出-响应函数，通过约束优化计算可行的事后分配，并通过蒙特卡罗评估传播不确定性，以得到遗憾分布、预期提升（lift）和改进概率等汇总指标。这将分配的低效与所估计响应面中的不确定性区分开来。在真实营销分配日志上的实验表明，该框架能给出可解释的事后诊断，并揭示了分配灵活性与可检测性之间的实际权衡：适度的可行再分配往往就能获取大部分可测量的收益，而较大的调整则会进入支撑较弱、不确定性更高的区域。由此得到一种实用方法，可在在线实验成本高昂或不可行时审计历史预算决策。
摘要：Organizations routinely make strategic budget allocations under operational constraints, but often lack a principled way to assess whether realized allocations were close to the best feasible choices in hindsight. We present a retrospective auditing framework based on hindsight regret, defined as the opportunity cost of the realized allocation relative to a constraint-faithful benchmark under the same budget and stability guardrails. The framework estimates regime-specific spend--response functions from historical logs, computes feasible hindsight allocations via constrained optimization, and propagates uncertainty through Monte Carlo evaluation to produce regret distributions, expected lift, and probability-of-improvement summaries. This separates allocation inefficiency from uncertainty in the estimated response surfaces. Experiments on real marketing allocation logs show that the framework yields interpretable post-hoc diagnostics and reveals a practical trade-off between allocation flexibility and detectability: moderate feasible reallocations often capture most measurable gain, while larger shifts move into weak-support regions with higher uncertainty. The result is a practical method for auditing historical budget decisions when online experimentation is costly or infeasible.
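The audit loop can be sketched in pure Python under strong simplifying assumptions that are not from the paper:

- a single regime, with a hypothetical log-shaped response curve per channel and known point estimates;
- a ±20% per-channel band as the stability guardrail;
- greedy fixed-size increments as the constrained optimizer;
- independent multiplicative noise on each curve's scale as the uncertainty model.

```python
import math
import random


def response(spend, scale, saturation):
    """Hypothetical concave spend-response curve: scale * log(1 + spend / saturation)."""
    return scale * math.log1p(spend / saturation)


def total_value(alloc, params):
    return sum(response(s, *p) for s, p in zip(alloc, params))


def hindsight_allocation(realized, params, max_shift=0.2, step=1.0):
    """Greedy reallocation of the same total budget, keeping each channel within
    +/- max_shift of its realized spend. For separable concave responses, greedy
    increments are optimal up to the step size."""
    lo = [s * (1.0 - max_shift) for s in realized]
    hi = [s * (1.0 + max_shift) for s in realized]
    alloc = lo[:]
    remaining = sum(realized) - sum(lo)
    while remaining > 1e-9:
        inc = min(step, remaining)
        gains = [response(a + inc, *p) - response(a, *p) if a + inc <= h + 1e-9 else -math.inf
                 for a, h, p in zip(alloc, hi, params)]
        k = max(range(len(alloc)), key=gains.__getitem__)
        if gains[k] == -math.inf:
            break  # no channel can absorb another increment
        alloc[k] += inc
        remaining -= inc
    return alloc


def regret_draws(realized, params, rel_sd=0.1, n_draws=1000, seed=0):
    """Monte Carlo hindsight regret under multiplicative uncertainty in each curve's scale."""
    rng = random.Random(seed)
    alloc = hindsight_allocation(realized, params)
    draws = []
    for _ in range(n_draws):
        sampled = [(scale * max(0.0, rng.gauss(1.0, rel_sd)), sat) for scale, sat in params]
        draws.append(total_value(alloc, sampled) - total_value(realized, sampled))
    return alloc, draws


realized = [50.0, 50.0]
params = [(10.0, 10.0), (10.0, 100.0)]  # channel 0 has the higher marginal return here
alloc, draws = regret_draws(realized, params)
expected_regret = sum(draws) / len(draws)
prob_improvement = sum(d > 0 for d in draws) / len(draws)
print(alloc, round(expected_regret, 3), prob_improvement)
```

Shifting budget toward channel 0 stops at the guardrail. The regret distribution, rather than a single number, shows how much of the apparent gain survives uncertainty in the fitted curves.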

机器翻译由腾讯交互翻译提供，仅供参考

点击“阅读原文”获取带摘要的学术速递