
机器学习学术速递[4.8]

arXiv每日学术速递



cs.LG 方向,今日共计140篇


大模型相关(15篇)

【1】HaloProbe: Bayesian Detection and Mitigation of Object Hallucinations in Vision-Language Models
标题:HaloProbe:视觉语言模型中物体幻觉的贝叶斯检测和缓解
链接:https://arxiv.org/abs/2604.06165

作者:Reihaneh Zohrabi,Hosein Hasani,Akshita Gupta,Mahdieh Soleymani Baghshah,Anna Rohrbach,Marcus Rohrbach
摘要:大型视觉语言模型会在图像描述中产生对象幻觉,这突出了对有效检测和缓解策略的需求。先前的工作通常依赖于模型对视觉标记的注意力权重作为检测信号。我们发现,由于隐藏的混杂因素(特别是描述中的令牌位置与对象重复),粗粒度的基于注意力的分析并不可靠。这导致了辛普森悖论:当统计数据汇总时,注意力趋势逆转或消失。基于这一观察,我们引入了HaloProbe,这是一个贝叶斯框架,它分解了外部描述统计和内部解码信号,以估计令牌级的幻觉概率。HaloProbe使用平衡训练来隔离内部证据,并将其与基于外部特征学习到的先验相结合,以恢复真正的后验。虽然基于干预的缓解方法通常会通过修改模型的内部结构来降低实用性或流畅性,但我们使用HaloProbe作为非侵入性缓解的外部评分信号。我们的实验表明,HaloProbe引导的解码比最先进的基于干预的方法更有效地减少了幻觉,同时保留了实用性。
摘要:Large vision-language models can produce object hallucinations in image descriptions, highlighting the need for effective detection and mitigation strategies. Prior work commonly relies on the model's attention weights on visual tokens as a detection signal. We reveal that coarse-grained attention-based analysis is unreliable due to hidden confounders, specifically token position and object repetition in a description. This leads to Simpson's paradox: the attention trends reverse or disappear when statistics are aggregated. Based on this observation, we introduce HaloProbe, a Bayesian framework that factorizes external description statistics and internal decoding signals to estimate token-level hallucination probabilities. HaloProbe uses balanced training to isolate internal evidence and combines it with a learned prior over external features to recover the true posterior. While intervention-based mitigation methods often degrade utility or fluency by modifying models' internals, we use HaloProbe as an external scoring signal for non-invasive mitigation. Our experiments show that HaloProbe-guided decoding reduces hallucinations more effectively than state-of-the-art intervention-based methods while preserving utility.
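摘要提到用贝叶斯方式把基于外部特征学习到的先验与内部解码证据结合来恢复幻觉后验。下面是一个极简的贝叶斯组合草图(函数名以及"内部证据以似然比形式给出"均为示意性假设,并非论文实现):

```python
def posterior_hallucination_prob(prior: float, internal_lr: float) -> float:
    """用贝叶斯规则把外部特征先验与内部证据似然比相结合,
    得到令牌级幻觉后验概率(示意性草图,非论文原始代码)。"""
    odds = prior / (1.0 - prior) * internal_lr  # 先验几率 × 似然比 = 后验几率
    return odds / (1.0 + odds)                  # 几率转回概率
```

当内部证据不提供信息(似然比为 1)时,后验退化为外部先验;内部证据越强,对先验的修正越大。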


【2】The Model Agreed, But Didn't Learn: Diagnosing Surface Compliance in Large Language Models
标题:模型同意,但没有学到:诊断大型语言模型中的表面合规性
链接:https://arxiv.org/abs/2604.05995

作者:Xiaojie Gu,Ziying Huang,Weicong Hong,Jian Xie,Renze Lou,Kai Zhang
备注:ACL 2026 Findings
摘要:大型语言模型(LLM)将大量的世界知识内化为参数记忆,但不可避免地继承了其源语料库的陈旧和错误。因此,确保这些内部表示的可靠性和延展性对于可信的现实世界部署至关重要。知识编辑提供了一个关键的范式,可以在不重新训练的情况下对记忆进行外科手术式的修改。然而,尽管最近的编辑器在标准基准测试中表现出很高的成功率,但目前依赖于在特定提示条件下评估输出的评估框架能否可靠地验证真正的记忆修改仍然存疑。在这项工作中,我们介绍了一个简单的诊断框架,让模型在更贴近现实应用环境的上下文学习(ICL)设置下进行判别式自我评估,专门用于仔细审视由记忆修改引起的微妙行为差异。这种探测揭示了一种普遍存在的表面合规现象,即编辑器仅通过模仿目标输出而没有在结构上覆盖内部信念,就获得了高基准分数。此外,我们发现递归修改会积累表示残留,引发认知不稳定性,并永久降低模型记忆状态的可逆性。这些见解强调了当前编辑范式的风险,并凸显了稳健的记忆修改在构建值得信赖、长期可持续的LLM系统中的关键作用。代码可在https://github.com/XiaojieGu/SA-MCQ上获得。
摘要:Large Language Models (LLMs) internalize vast world knowledge as parametric memory, yet inevitably inherit the staleness and errors of their source corpora. Consequently, ensuring the reliability and malleability of these internal representations is imperative for trustworthy real-world deployment. Knowledge editing offers a pivotal paradigm for surgically modifying memory without retraining. However, while recent editors demonstrate high success rates on standard benchmarks, it remains questionable whether current evaluation frameworks that rely on assessing output under specific prompting conditions can reliably authenticate genuine memory modification. In this work, we introduce a simple diagnostic framework that subjects models to discriminative self-assessment under in-context learning (ICL) settings that better reflect real-world application environments, specifically designed to scrutinize the subtle behavioral nuances induced by memory modifications. This probing reveals a pervasive phenomenon of Surface Compliance, where editors achieve high benchmark scores by merely mimicking target outputs without structurally overwriting internal beliefs. Moreover, we find that recursive modifications accumulate representational residues, triggering cognitive instability and permanently diminishing the reversibility of the model's memory state. These insights underscore the risks of current editing paradigms and highlight the pivotal role of robust memory modification in building trustworthy, long-term sustainable LLM systems. Code is available at https://github.com/XiaojieGu/SA-MCQ.


【3】Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents
标题:面向LLM代理的具有增强步骤级转换的分层强化学习
链接:https://arxiv.org/abs/2604.05808

作者:Shuai Zhen,Yanhua Yu,Ruopei Guo,Nan Cheng,Yang Deng
备注:Accepted to ACL 2026 Main Conference
摘要:大型语言模型(LLM)代理在复杂的交互式决策任务中表现出强大的能力。然而,现有的LLM代理通常依赖于越来越长的交互历史,导致高计算成本和有限的可扩展性。在本文中,我们提出了STEP-HRL,一个分层强化学习(HRL)框架,使步骤级学习条件只有一步过渡,而不是完整的互动历史。STEP-HRL将任务分层结构化,使用已完成的子任务来表示整个任务的全局进度。通过引入本地进度模块,它还迭代地、有选择地总结每个子任务内的交互历史记录,以生成本地进度的紧凑摘要。这些组件一起为高级和低级策略产生增强的步骤级转换。ScienceWorld和ALFWorld基准测试的实验结果一致表明,STEP-HRL在性能和泛化方面大大优于基线,同时减少了令牌使用。我们的代码可在https://github.com/TonyStark042/STEP-HRL上获得。
摘要:Large language model (LLM) agents have demonstrated strong capabilities in complex interactive decision-making tasks. However, existing LLM agents typically rely on increasingly long interaction histories, resulting in high computational cost and limited scalability. In this paper, we propose STEP-HRL, a hierarchical reinforcement learning (HRL) framework that enables step-level learning by conditioning only on single-step transitions rather than full interaction histories. STEP-HRL structures tasks hierarchically, using completed subtasks to represent the global progress of the overall task. By introducing a local progress module, it also iteratively and selectively summarizes interaction history within each subtask to produce a compact summary of local progress. Together, these components yield augmented step-level transitions for both high-level and low-level policies. Experimental results on ScienceWorld and ALFWorld benchmarks consistently demonstrate that STEP-HRL substantially outperforms baselines in terms of performance and generalization while reducing token usage. Our code is available at https://github.com/TonyStark042/STEP-HRL.


【4】LUDOBENCH: Evaluating LLM Behavioural Decision-Making Through Spot-Based Board Game Scenarios in Ludo
标题:LudoBench:通过Ludo中的定点棋盘游戏场景评估LLM行为决策
链接:https://arxiv.org/abs/2604.05681

作者:Ojas Jain,Dhruv Kumar
备注:Under Review
摘要:我们介绍LudoBench,一个用于评估LLM在Ludo中战略推理能力的基准。Ludo是一种随机多智能体棋盘游戏,其骰子机制、棋子捕获、安全格导航和终点路径推进引入了有意义的规划复杂性。LudoBench包含480个手工制作的定点场景,涵盖12个行为上不同的决策类别,每个类别都隔离了一个特定的战略选择。我们还提供了一个功能齐全的4人Ludo模拟器,支持随机、启发式、博弈论和LLM代理。博弈论代理使用深度受限前瞻的Expectiminimax搜索,以提供超越贪婪启发式的原则性战略上限。评估横跨四个模型家族的六个模型,我们发现所有模型与博弈论基线的一致率仅为40-46%。模型分为不同的行为原型:完成者完成棋子但忽视发展,而建设者发展棋子但从不完成。每一种原型都只涵盖了博弈论策略的一半。在相同的棋盘状态下,模型在历史条件化的怨恨框架下还表现出可测量的行为变化,揭示了提示敏感性是一个关键弱点。LudoBench提供了一个轻量级且可解释的框架,用于在不确定性下对LLM战略推理进行基准测试。所有代码、定点数据集(480个条目)和模型输出都可以在https://anonymous.4open.science/r/LudoBench-5CBF/上获得
摘要:We introduce LudoBench, a benchmark for evaluating LLM strategic reasoning in Ludo, a stochastic multi-agent board game whose dice mechanics, piece capture, safe-square navigation, and home-path progression introduce meaningful planning complexity. LudoBench comprises 480 handcrafted spot scenarios across 12 behaviorally distinct decision categories, each isolating a specific strategic choice. We additionally contribute a fully functional 4-player Ludo simulator supporting Random, Heuristic, Game-Theory, and LLM agents. The game-theory agent uses Expectiminimax search with depth-limited lookahead to provide a principled strategic ceiling beyond greedy heuristics. Evaluating six models spanning four model families, we find that all models agree with the game-theory baseline only 40-46% of the time. Models split into distinct behavioral archetypes: finishers that complete pieces but neglect development, and builders that develop but never finish. Each archetype captures only half of the game theory strategy. Models also display measurable behavioral shifts under history-conditioned grudge framing on identical board states, revealing prompt-sensitivity as a key vulnerability. LudoBench provides a lightweight and interpretable framework for benchmarking LLM strategic reasoning under uncertainty. All code, the spot dataset (480 entries) and model outputs are available at https://anonymous.4open.science/r/LudoBench-5CBF/
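摘要中的博弈论代理基于带深度限制的Expectiminimax搜索。下面是该算法在抽象博弈树上的最小示意(Node 结构、节点类型标记与叶节点启发值均为假设,与论文的Ludo模拟器无关):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                 # "max"、"min" 或 "chance"
    value: float = 0.0        # 叶节点的启发式评估值
    children: list = field(default_factory=list)  # 子节点;chance 节点为 (概率, 子节点) 对

def expectiminimax(node: Node, depth: int) -> float:
    """深度受限的 Expectiminimax:max/min 节点取极值,chance 节点取概率加权期望。"""
    if depth == 0 or not node.children:
        return node.value
    if node.kind == "max":
        return max(expectiminimax(c, depth - 1) for c in node.children)
    if node.kind == "min":
        return min(expectiminimax(c, depth - 1) for c in node.children)
    # chance 节点:对骰子结果做概率加权平均
    return sum(p * expectiminimax(c, depth - 1) for p, c in node.children)
```

例如一个"掷骰子(各 0.5 概率得 3 或 1)"与"稳妥走法(固定 1.5)"之间的选择,期望值 2.0 的掷骰分支会被选中。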


【5】LLM Reasoning as Trajectories: Step-Specific Representation Geometry and Correctness Signals
标题:LLM推理作为轨迹:特定步骤的表示几何和正确性信号
链接:https://arxiv.org/abs/2604.05655

作者:Lihao Sun,Hang Dong,Bo Qiao,Qingwei Lin,Dongmei Zhang,Saravan Rajmohan
备注:ACL 2026 (Main)
摘要:这项工作将大型语言模型的思维链生成刻画为穿过表示空间的结构化轨迹。我们表明,数学推理会穿越功能有序的、特定于步骤的子空间,这些子空间随着层深度的增加而变得越来越可分离。这种结构已经存在于基础模型中,而推理训练主要是加速向终止相关子空间的收敛,而不是引入新的表示组织。虽然早期的推理步骤遵循类似的轨迹,但正确和不正确的解决方案在后期阶段会系统性地分歧。这种后期发散使得在推理中途预测最终答案正确性成为可能,ROC-AUC高达0.87。此外,我们引入了基于轨迹的引导(trajectory-based steering),一个推理时干预框架,能够基于推导出的理想轨迹进行推理校正和长度控制。总之,这些结果将推理轨迹确立为解释、预测和控制LLM推理行为的几何透镜。
摘要:This work characterizes large language models' chain-of-thought generation as a structured trajectory through representation space. We show that mathematical reasoning traverses functionally ordered, step-specific subspaces that become increasingly separable with layer depth. This structure already exists in base models, while reasoning training primarily accelerates convergence toward termination-related subspaces rather than introducing new representational organization. While early reasoning steps follow similar trajectories, correct and incorrect solutions diverge systematically at late stages. This late-stage divergence enables mid-reasoning prediction of final-answer correctness with ROC-AUC up to 0.87. Furthermore, we introduce trajectory-based steering, an inference-time intervention framework that enables reasoning correction and length control based on derived ideal trajectories. Together, these results establish reasoning trajectories as a geometric lens for interpreting, predicting, and controlling LLM reasoning behavior.
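摘要报告了用 ROC-AUC 衡量"从中途推理状态预测最终答案正确性"的能力。ROC-AUC 本身可按 Mann-Whitney 形式直接计算,以下为通用示意(与论文的具体打分器无关):

```python
def roc_auc(pos_scores, neg_scores):
    """Mann-Whitney 形式的 ROC-AUC:随机抽一个正例和一个负例,
    正例得分更高的概率(并列计 0.5)。"""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))
```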


【6】FastDiSS: Few-step Match Many-step Diffusion Language Model on Sequence-to-Sequence Generation--Full Version
标题:FastDiSS:基于序列到序列生成的几步匹配多步扩散语言模型--完整版本
链接:https://arxiv.org/abs/2604.05551

作者:Dat Nguyen-Cong,Tung Kieu,Hoang Thanh-Tung
备注:camera-ready version, accepted by ACL Findings (ACL 2026)
摘要:自调节是连续扩散语言模型成功的核心,因为它允许模型纠正以前的错误。然而,它的能力恰恰在扩散对部署最有吸引力的情形下退化:用于快速推理的少步采样。在这项研究中,我们表明,当模型只有少数去噪步骤时,不准确的自调节会导致显著的近似差距;这种误差会在去噪步骤间不断累积,并最终主导样本质量。为了解决这个问题,我们提出了一种新的训练框架,通过扰动自调节信号来匹配推理噪声,从而在学习过程中处理这些误差,提高对先前估计误差的鲁棒性。此外,我们还引入了一个令牌级的噪声感知机制,防止训练饱和,从而改善优化。跨条件生成基准的广泛实验表明,我们的框架超过了标准的连续扩散模型,同时提供了高达400倍的推理速度,并与其他一步扩散框架相比仍然具有竞争力。
摘要:Self-conditioning has been central to the success of continuous diffusion language models, as it allows models to correct previous errors. Yet its ability degrades precisely in the regime where diffusion is most attractive for deployment: few-step sampling for fast inference. In this study, we show that when models only have a few denoising steps, inaccurate self-conditioning induces a substantial approximation gap; this mistake compounds across denoising steps and ultimately dominates the sample quality. To address this, we propose a novel training framework that handles these errors during learning by perturbing the self-conditioning signal to match inference noise, improving robustness to prior estimation errors. In addition, we introduce a token-level noise-awareness mechanism that prevents training from saturation, hence improving optimization. Extensive experiments across conditional generation benchmarks demonstrate that our framework surpasses standard continuous diffusion models while providing up to 400x faster inference speed, and remains competitive against other one-step diffusion frameworks.
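摘要的核心做法是:训练时对自调节信号加噪,使其统计上接近少步推理中不精确的估计。一个极简示意如下(噪声尺度 sigma 及其调度均为假设,并非论文给出的方案):

```python
import random

def perturb_self_conditioning(x0_hat, sigma, seed=0):
    """训练时向上一步的自调节估计注入高斯噪声,
    模拟少步推理下估计不准的情形(示意性草图)。"""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in x0_hat]
```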


【7】AttnDiff: Attention-based Differential Fingerprinting for Large Language Models
标题:AttnDiff:大型语言模型的基于注意力的差异指纹识别
链接:https://arxiv.org/abs/2604.05502

作者:Haobo Zhang,Zhenhua Xu,Junxian Li,Shangfeng Sheng,Dezhang Kong,Meng Han
备注:Accepted at ACL2026 Main
摘要:保护开放权重大型语言模型(LLM)的知识产权,需要验证可疑模型是否来自受害者模型,即使经历了微调(包括PPO/DPO)、剪枝/压缩和模型合并等常见的清洗操作。我们提出了\textsc{AttnDiff},一个数据高效的白盒框架,通过内在的信息路由行为从模型中提取指纹。\textsc{AttnDiff}用引发受控语义冲突的最小编辑提示对进行探测,捕获差分注意力模式,使用紧凑的谱描述符对其进行总结,并使用CKA比较模型。在Llama-2/3和Qwen2.5(3B-14B)以及其他开源家族中,它对相关衍生模型产生高相似度,同时区分开不相关的模型家族(例如,在$M=60$个探针下相似度$>0.98$,而不相关家族$<0.22$)。仅需5-60个多域探针,它即可支持实际的来源验证和问责。
摘要:Protecting the intellectual property of open-weight large language models (LLMs) requires verifying whether a suspect model is derived from a victim model despite common laundering operations such as fine-tuning (including PPO/DPO), pruning/compression, and model merging. We propose \textsc{AttnDiff}, a data-efficient white-box framework that extracts fingerprints from models via intrinsic information-routing behavior. \textsc{AttnDiff} probes minimally edited prompt pairs that induce controlled semantic conflicts, captures differential attention patterns, summarizes them with compact spectral descriptors, and compares models using CKA. Across Llama-2/3 and Qwen2.5 (3B--14B) and additional open-source families, it yields high similarity for related derivatives while separating unrelated model families (e.g., $>0.98$ vs.\ $<0.22$ with $M=60$ probes). With 5--60 multi-domain probes, it supports practical provenance verification and accountability.
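摘要用 CKA 比较两个模型在同一组探针上的描述符。线性 CKA 的一个纯 Python 参考实现如下(仅为公式示意,与论文具体的谱描述符无关):

```python
import math

def _center(X):
    """按列去均值(X 为行向量列表,每行对应一个探针)。"""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def _gram(X):
    """线性核 Gram 矩阵 X X^T。"""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in X] for r1 in X]

def _frob_inner(A, B):
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def linear_cka(X, Y):
    """线性 CKA:比较同一组探针在两个模型下的表示相似度,取值 [0, 1]。"""
    Kx, Ky = _gram(_center(X)), _gram(_center(Y))
    return _frob_inner(Kx, Ky) / math.sqrt(_frob_inner(Kx, Kx) * _frob_inner(Ky, Ky))
```

CKA 对各向同性缩放不变,因此同一模型及其等比缩放版本的相似度恒为 1。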


【8】LLMs Should Express Uncertainty Explicitly
标题:LLM应该明确表达不确定性
链接:https://arxiv.org/abs/2604.05306

作者:Junyu Guo,Shangding Gu,Ming Jin,Costas Spanos,Javad Lavaei
摘要:大型语言模型越来越多地用于不确定性必须驱动决策的环境中,例如弃权、检索和验证。大多数现有的方法将不确定性视为生成后要估计的潜在量,而不是模型经训练后表达的信号。相反,我们研究将不确定性作为控制接口。我们比较了两个互补的接口:一个全局接口,模型用语言表达其最终答案的校准置信度得分;一个局部接口,模型在推理过程中进入高风险状态时发出明确的标记。这些接口提供不同但互补的好处。言语化的置信度大大改善了校准,减少了过度自信的错误,并产生了最强的整体自适应RAG控制器,同时更有选择地使用检索。推理时的不确定性信号使以前沉默的失败在生成过程中变得可见,提高了错误答案的覆盖率,并提供了一个有效的高召回检索触发器。我们的研究结果进一步表明,这两个接口在内部的工作方式不同:言语化置信度主要改进现有不确定性的解码方式,而推理时信号会诱导更广泛的后期层重组。总之,这些结果表明,LLM中的有效不确定性应该被训练为与任务匹配的沟通:用全局置信度决定是否信任最终答案,用局部信号决定何时需要干预。
摘要:Large language models are increasingly used in settings where uncertainty must drive decisions such as abstention, retrieval, and verification. Most existing methods treat uncertainty as a latent quantity to estimate after generation rather than a signal the model is trained to express. We instead study uncertainty as an interface for control. We compare two complementary interfaces: a global interface, where the model verbalizes a calibrated confidence score for its final answer, and a local interface, where the model emits an explicit   marker during reasoning when it enters a high-risk state. These interfaces provide different but complementary benefits. Verbalized confidence substantially improves calibration, reduces overconfident errors, and yields the strongest overall Adaptive RAG controller while using retrieval more selectively. Reasoning-time uncertainty signaling makes previously silent failures visible during generation, improves wrong-answer coverage, and provides an effective high-recall retrieval trigger. Our findings further show that the two interfaces work differently internally: verbal confidence mainly refines how existing uncertainty is decoded, whereas reasoning-time signaling induces a broader late-layer reorganization. Together, these results suggest that effective uncertainty in LLMs should be trained as task-matched communication: global confidence for deciding whether to trust a final answer, and local signals for deciding when intervention is needed.
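摘要中"改善校准"通常用预期校准误差(ECE)量化:按置信度分箱后,对各箱 |平均置信度 - 准确率| 做样本加权和。标准分箱 ECE 的示意实现:

```python
def expected_calibration_error(confidences, corrects, n_bins=10):
    """标准分箱 ECE:各置信度分箱内 |平均置信度 - 准确率| 的加权和。"""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, corrects):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf=1.0 归入最后一箱
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(o for _, o in b) / len(b)
            ece += len(b) / n * abs(avg_conf - acc)
    return ece
```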


【9】EffiPair: Improving the Efficiency of LLM-generated Code with Relative Contrastive Feedback
标题:EffiPair:利用相对对比反馈提高LLM生成代码的效率
链接:https://arxiv.org/abs/2604.05137

作者:Samira Hajizadeh,Suman Jana
摘要:大型语言模型(LLM)通常会生成功能正确但运行时间和内存效率低下的代码。提高代码效率的先前方法通常依赖于绝对执行反馈,例如剖析单个程序的运行时间或内存使用,这既昂贵又只能为改进提供微弱的指导。我们提出了相对对比反馈(RCF),一种不需要模型微调或参数更新的推理时反馈机制。RCF比较用于相同任务的两个结构相似的程序,并突出与更高效率相关的差异。基于这个想法,我们引入了EffiPair,一个完全在测试时运行的推理时迭代改进框架:生成多个候选解决方案,识别效率差距较大的信息性程序对,将其执行差异总结为轻量级反馈,并使用此信号产生更高效的解决方案。通过用成对对比比较取代孤立的标量反馈,EffiPair提供了更直接的指导,同时减少了剖析和提示开销。在代码效率基准测试上的实验表明,EffiPair在保持正确性的同时持续提高效率。例如,使用DeepSeek-Chat V3.2时,与没有性能反馈的生成相比,EffiPair实现了高达1.5倍的加速,同时与之前的工作相比,令牌使用量减少了90%以上。
摘要:Large language models (LLMs) often generate code that is functionally correct but inefficient in runtime and memory. Prior approaches to improving code efficiency typically rely on absolute execution feedback, such as profiling a single program's runtime or memory usage, which is costly and provides weak guidance for refinement. We propose Relative Contrastive Feedback (RCF), an inference-time feedback mechanism that requires no model fine-tuning or parameter updates. RCF compares two structurally similar programs for the same task and highlights the differences associated with better efficiency. Building on this idea, we introduce EffiPair, an inference-time iterative refinement framework that operates entirely at test time by generating multiple candidate solutions, identifying informative program pairs with large efficiency gaps, summarizing their execution differences into lightweight feedback, and using this signal to produce more efficient solutions. By replacing isolated scalar feedback with pairwise contrastive comparisons, EffiPair provides more direct guidance while reducing profiling and prompting overhead. Experiments on code-efficiency benchmarks show that EffiPair consistently improves efficiency while preserving correctness. For instance, with DeepSeek-Chat V3.2, EffiPair achieves up to 1.5x speedup over generation without performance feedback, while reducing token usage by more than 90% compared to prior work.
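摘要中的相对对比反馈(RCF)对同一任务的两个候选程序做成对效率比较。下面用计时对比给出一个极简草图(候选函数与反馈格式均为假设;论文系统还会对齐程序结构并把差异总结成文本反馈):

```python
import timeit

def relative_contrastive_feedback(cand_a, cand_b, arg, repeat=5):
    """先验证两个候选功能等价,再计时并给出相对(而非绝对)效率反馈。"""
    out_a, out_b = cand_a(arg), cand_b(arg)
    assert out_a == out_b, "两个候选必须在功能上等价"
    t_a = min(timeit.repeat(lambda: cand_a(arg), number=3, repeat=repeat))
    t_b = min(timeit.repeat(lambda: cand_b(arg), number=3, repeat=repeat))
    faster, slower = ("a", "b") if t_a <= t_b else ("b", "a")
    return {"faster": faster, "slower": slower,
            "ratio": max(t_a, t_b) / min(t_a, t_b)}

# 同一任务的两个候选:循环求平方和 vs 闭式公式(sum_{i=0}^{n-1} i^2)
def sum_squares_loop(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def sum_squares_closed(n):
    return (n - 1) * n * (2 * n - 1) // 6
```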


【10】$π^2$: Structure-Originated Reasoning Data Improves Long-Context Reasoning Ability of Large Language Models
标题:$π^2$:源自结构的推理数据提高了大型语言模型的长上下文推理能力
链接:https://arxiv.org/abs/2604.05114

作者:Quyet V. Do,Thinh Pham,Nguyen Nguyen,Sha Li,Pratibha Zunjare,Tu Vu
备注:Our structured analytical reasoning data, which originates from Wikipedia tables, significantly improves long-context reasoning capability of LLMs
摘要:我们研究了一个从初始结构化数据中策划推理数据的管道,以改善大型语言模型(LLM)中的长上下文推理。我们的方法$π^2$通过严格的QA策划构建高质量的推理数据:1)从维基百科中提取并扩展表格,2)从收集的表格和相关上下文中生成真实的、多跳的分析推理问题,其答案通过双路径代码执行自动确定和验证,以及3)将逐步的结构化推理轨迹反向翻译为给定真实网络搜索上下文下QA对的解答。在$π^2$上使用\textsc{\small{gpt-oss-20b}}和\textsc{\small{Qwen3-4B-Instruct-2507}}进行监督微调,在四个长上下文推理基准测试和我们类似的$π^2$-Bench上产生了一致的改进,平均绝对准确率分别提高了+4.3%和+2.7%。值得注意的是,我们的数据集促进了自我蒸馏,其中\textsc{\small{gpt-oss-20b}}甚至利用自己的推理轨迹将其平均性能提高了+4.4%,证明了$π^2$的有用性。我们的代码、数据和模型在https://github.com/vt-pi-squared/pi-squared上开源。
摘要:We study a pipeline that curates reasoning data from initial structured data for improving long-context reasoning in large language models (LLMs). Our approach, $π^2$, constructs high-quality reasoning data through rigorous QA curation: 1) extracting and expanding tables from Wikipedia, 2) from the collected tables and relevant context, generating realistic and multi-hop analytical reasoning questions whose answers are automatically determined and verified through dual-path code execution, and 3) back-translating step-by-step structured reasoning traces as solutions of QA pairs given realistic web-search context. Supervised fine-tuning with \textsc{\small{gpt-oss-20b}} and \textsc{\small{Qwen3-4B-Instruct-2507}} on $π^2$ yields consistent improvements across four long-context reasoning benchmarks and our alike $π^2$-Bench, with average absolute accuracy gains of +4.3% and +2.7% respectively. Notably, our dataset facilitates self-distillation, where \textsc{\small{gpt-oss-20b}} even improves its average performance by +4.4% with its own reasoning traces, demonstrating $π^2$'s usefulness. Our code, data, and models are open-source at https://github.com/vt-pi-squared/pi-squared.
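摘要中的"双路径代码执行"通过两条独立实现路径对答案的一致性来自动验证。极简示意如下(表格数据与两条路径均为虚构示例,并非论文生成的代码):

```python
def dual_path_verify(compute_a, compute_b, inputs):
    """双路径验证:两条独立实现对同一输入达成一致时才接受答案,否则拒绝。"""
    a, b = compute_a(inputs), compute_b(inputs)
    return a if a == b else None

# 示例:对一张(假设的)表格列求和的两条独立路径
rows = [{"pop": 3}, {"pop": 5}, {"pop": 4}]
path_a = lambda rs: sum(r["pop"] for r in rs)
path_b = lambda rs: sum([r["pop"] for r in rs][::-1])  # 逆序累加,路径不同但应同值
```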


【11】Multilingual Language Models Encode Script Over Linguistic Structure
标题:多语言语言模型优先编码文字而非语言结构
链接:https://arxiv.org/abs/2604.05090

作者:Aastha A K Verma,Anwoy Chatterjee,Mehak Gupta,Tanmoy Chakraborty
备注:Accepted at ACL 2026 (Main)
摘要:多语言语言模型(LM)将类型学和正字法上不同的语言的表示组织到一个共享的参数空间中,但这种内部组织的性质仍然难以捉摸。在这项工作中,我们调查哪些语言属性——抽象的语言身份还是表面形式线索——塑造了多语言表示。我们专注于表示权衡较为明确的紧凑蒸馏模型,使用语言激活概率熵(LAPE)度量分析Llama-3.2-1B和Gemma-2-2B中与语言相关的单元,并进一步用稀疏自编码器分解激活。我们发现这些单元强烈依赖于正字法:罗马化会导致近乎不相交的表示,既不与母语文字输入也不与英语对齐,而词序重排对单元身份的影响有限。探测表明,类型学结构在更深的层中变得越来越容易获取,而因果干预表明,生成对那些在表面形式扰动下保持不变的单元最为敏感,而非仅通过类型学对齐识别的单元。总的来说,我们的结果表明,多语言LM围绕表面形式组织表示,语言抽象逐渐出现,而不会坍缩为统一的中间语言。
摘要:Multilingual language models (LMs) organize representations for typologically and orthographically diverse languages into a shared parameter space, yet the nature of this internal organization remains elusive. In this work, we investigate which linguistic properties - abstract language identity or surface-form cues - shape multilingual representations. Focusing on compact, distilled models where representational trade-offs are explicit, we analyze language-associated units in Llama-3.2-1B and Gemma-2-2B using the Language Activation Probability Entropy (LAPE) metric, and further decompose activations with Sparse Autoencoders. We find that these units are strongly conditioned on orthography: romanization induces near-disjoint representations that align with neither native-script inputs nor English, while word-order shuffling has limited effect on unit identity. Probing shows that typological structure becomes increasingly accessible in deeper layers, while causal interventions indicate that generation is most sensitive to units that are invariant to surface-form perturbations rather than to units identified by typological alignment alone. Overall, our results suggest that multilingual LMs organize representations around surface form, with linguistic abstraction emerging gradually without collapsing into a unified interlingua.
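摘要中的 LAPE 度量基于某个神经单元在各语言上激活概率的熵:熵低说明该单元偏向少数语言。其核心计算可示意如下(归一化方式与激活概率的具体定义以论文为准):

```python
import math

def lape(activation_probs):
    """Language Activation Probability Entropy(示意):
    对单元在各语言上的激活概率归一化后取香农熵。"""
    total = sum(activation_probs)
    ps = [p / total for p in activation_probs]
    return -sum(p * math.log(p) for p in ps if p > 0)
```

均匀激活的单元熵最大(log K),只在一种语言上激活的单元熵为 0,后者即"语言相关单元"。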


【12】Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation
标题:超越LLM作为评委:多语言生成文本评估的确定性指标
链接:https://arxiv.org/abs/2604.05083

作者:Firoj Alam,Gagan Bhatia,Sahinur Rahman Laskar,Shammur Absar Chowdhury
摘要:虽然大型语言模型(LLM)越来越多地被用作评估生成文本的自动判断器,但它们的输出通常成本高昂,并且对提示设计、语言和聚合策略高度敏感,这严重限制了可重复性。为了解决这些挑战,我们提出了\textbf{\textit{OmniScore}},一系列互补的、确定性的学习指标,使用小尺寸($
摘要:While Large Language Models (LLMs) are increasingly adopted as automated judges for evaluating generated text, their outputs are often costly, and highly sensitive to prompt design, language, and aggregation strategies, which severely limits reproducibility. To address these challenges, we propose \textbf{\textit{OmniScore}}, a family of complementary, deterministic learned metrics developed using small size ($


【13】Evaluation of Embedding-Based and Generative Methods for LLM-Driven Document Classification: Opportunities and Challenges
标题:LLM驱动文档分类的基于嵌入和生成方法的评估:机遇与挑战
链接:https://arxiv.org/abs/2604.04997

作者:Rong Lu,Hao Liu,Song Hou
备注:Accepted at the IMAGE'25 Workshop (PCW-11), Society of Exploration Geophysicists (SEG). Published version available at https://doi.org/10.1190/image2025-w11-03.1
摘要:这项工作对用于地球科学技术文档分类的基于嵌入的模型与生成模型进行了比较分析。使用多学科基准数据集,我们评估了模型准确性、稳定性和计算成本之间的权衡。我们发现,与QQMM(63%)等最先进的多模态嵌入模型相比,经思维链(CoT)提示增强的Qwen2.5-VL等生成式视觉语言模型(VLM)实现了更优的zero-shot准确率(82%)。我们还证明,虽然监督微调(SFT)可以提高VLM的性能,但它对训练数据的不平衡很敏感。
摘要:This work presents a comparative analysis of embedding-based and generative models for classifying geoscience technical documents. Using a multi-disciplinary benchmark dataset, we evaluated the trade-offs between model accuracy, stability, and computational cost. We find that generative Vision-Language Models (VLMs) like Qwen2.5-VL, enhanced with Chain-of-Thought (CoT) prompting, achieve superior zero-shot accuracy (82%) compared to state-of-the-art multimodal embedding models like QQMM (63%). We also demonstrate that while supervised fine-tuning (SFT) can improve VLM performance, it is sensitive to training data imbalance.


【14】CURE:Circuit-Aware Unlearning for LLM-based Recommendation
标题:CURE:面向基于LLM推荐的电路感知遗忘学习
链接:https://arxiv.org/abs/2604.04982

作者:Ziheng Chen,Jiali Cheng,Zezhong Fan,Hadi Amiri,Yunzhi Yao,Xiangguo Sun,Yang Zhang
摘要:大型语言模型(LLM)的最新进展为推荐系统提供了新的机会,使丰富的语义理解和推理用户的兴趣和项目属性。然而,随着隐私法规的收紧,将用户数据纳入基于LLM的推荐(LLMRec)会带来重大的隐私风险,使得遗忘算法对于实际部署越来越重要。尽管人们对LLMRec unlearning越来越感兴趣,但大多数现有方法将unlearning制定为遗忘和保留目标的加权组合,同时以统一的方式更新模型参数。这样的配方不可避免地引起两个目标之间的梯度冲突,导致不稳定的优化,并导致无效的遗忘或模型效用的严重退化。此外,遗忘过程在很大程度上仍然是黑箱,破坏了其透明度和可信度。为了应对这些挑战,我们提出了CURE,一个电路感知的非学习框架,将模型组件分解为功能不同的子集,并有选择地更新它们。在这里,电路指的是因果地负责特定于任务的行为的计算子图。具体来说,我们提取的核心电路项目推荐和分析如何在这些电路中的各个模块有助于忘记和保留的目标。基于这种分析,这些模块被分类为遗忘特定的,保留特定的,和任务共享的组,每个受功能特定的更新规则,以减轻梯度冲突期间unlearning。在真实世界数据集上的实验表明,我们的方法比现有的基线实现了更有效的去学习。
摘要 :Recent advances in large language models (LLMs) have opened new opportunities for recommender systems by enabling rich semantic understanding and reasoning about user interests and item attributes. However, as privacy regulations tighten, incorporating user data into LLM-based recommendation (LLMRec) introduces significant privacy risks, making unlearning algorithms increasingly crucial for practical deployment. Despite growing interest in LLMRec unlearning, most existing approaches formulate unlearning as a weighted combination of forgetting and retaining objectives while updating model parameters in a uniform manner. Such formulations inevitably induce gradient conflicts between the two objectives, leading to unstable optimization and resulting in either ineffective unlearning or severe degradation of model utility. Moreover, the unlearning procedure remains largely black-box, undermining its transparency and trustworthiness. To tackle these challenges, we propose CURE, a circuit-aware unlearning framework that disentangles model components into functionally distinct subsets and selectively updates them. Here, a circuit refers to a computational subgraph that is causally responsible for task-specific behaviors. Specifically, we extract the core circuits underlying item recommendation and analyze how individual modules within these circuits contribute to the forget and retain objectives. Based on this analysis, these modules are categorized into forget-specific, retain-specific, and task-shared groups, each subject to function-specific update rules to mitigate gradient conflicts during unlearning. Experiments on real-world datasets show that our approach achieves more effective unlearning than existing baselines.


【15】Task Ecologies and the Evolution of World-Tracking Representations in Large Language Models
标题:任务生态学和大型语言模型中世界跟踪表示的演变
链接:https://arxiv.org/abs/2604.05469

作者:Giulio Valentino Dalla Riva
摘要:我们将语言模型作为进化中的模式生物来研究,并询问自回归下一个标记学习何时会选择世界跟踪表示。对于潜在世界状态的任何编码,贝叶斯最优的下一个标记交叉熵分解为不可约的条件熵加上一个詹森-香农过剩项。当且仅当编码保留了训练生态的等价类时,该过剩项才会消失。这为语言模型产生了一个精确的生态真实性概念,并将按训练等价划分的商分区识别为最小复杂度的零过剩解。然后,我们确定这种固定编码分析何时适用于Transformer家族:冻结的密集和冻结的专家混合Transformer满足它,上下文学习不会扩大模型的分离集,而逐任务适应则打破了前提。该框架预测了两种典型的故障模式:简单性压力会优先消除低增益的区分,而训练最优的模型在细化训练生态的部署生态上仍可能产生正的过剩项。一个条件动态扩展展示了模型间选择和后训练如何在明确的遗传、变异和选择假设下恢复此类差距区分。精确的有限生态检查和受控的microgpt实验在相关量可直接观察的情形下,验证了静态分解、分裂合并阈值、离生态故障模式和双生态救援机制。我们的目标不是大规模地模拟前沿系统,而是把小语言模型用作研究表示选择理论的实验室生物。
摘要:We study language models as evolving model organisms and ask when autoregressive next-token learning selects for world-tracking representations. For any encoding of latent world states, the Bayes-optimal next-token cross-entropy decomposes into the irreducible conditional entropy plus a Jensen--Shannon excess term. That excess vanishes if and only if the encoding preserves the training ecology's equivalence classes. This yields a precise notion of ecological veridicality for language models and identifies the minimum-complexity zero-excess solution as the quotient partition by training equivalence. We then determine when this fixed-encoding analysis applies to transformer families: frozen dense and frozen Mixture-of-Experts transformers satisfy it, in-context learning does not enlarge the model's separation set, and per-task adaptation breaks the premise. The framework predicts two characteristic failure modes: simplicity pressure preferentially removes low-gain distinctions, and training-optimal models can still incur positive excess on deployment ecologies that refine the training ecology. A conditional dynamic extension shows how inter-model selection and post-training can recover such gap distinctions under explicit heredity, variation, and selection assumptions. Exact finite-ecology checks and controlled microgpt experiments validate the static decomposition, split-merge threshold, off-ecology failure pattern, and two-ecology rescue mechanism in a regime where the relevant quantities are directly observable. The goal is not to model frontier systems at scale, but to use small language models as laboratory organisms for theory about representational selection.
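摘要的核心分解依赖一个 Jensen-Shannon 过剩项:混合分布的熵减去各分布熵的加权和,当且仅当被合并的分布全部相同(即编码保留等价类)时为零。这一性质可直接用代码验证:

```python
import math

def entropy(ps):
    """离散分布的香农熵(自然对数)。"""
    return -sum(p * math.log(p) for p in ps if p > 0)

def js_excess(dists, weights):
    """Jensen-Shannon 过剩项:H(混合分布) - Σ w_i H(p_i),恒非负。"""
    k = len(dists[0])
    mix = [sum(w * p[j] for w, p in zip(weights, dists)) for j in range(k)]
    return entropy(mix) - sum(w * entropy(p) for w, p in zip(weights, dists))
```

两个相同分布合并时过剩为 0;两个完全不相交的分布等权合并时,过剩达到最大值 log 2。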


Graph相关(图学习|图神经网络|图优化等)(8篇)

【1】ReLU Networks for Exact Generation of Similar Graphs
标题:用于精确生成相似图的ReLU网络
链接:https://arxiv.org/abs/2604.05929

作者:Mamoona Ghafoor,Tatsuya Akutsu
摘要:在化学信息学、网络异常合成和结构化数据增强等应用中,生成受指定图编辑距离约束的图非常重要。尽管在分子设计和网络扰动分析等领域对这种约束生成模型的需求不断增长,但在有界图编辑距离内可证明地生成图所需的神经架构在很大程度上仍未被探索。此外,现有的图生成模型主要是数据驱动的,并且严重依赖训练数据的可用性和质量,这可能导致生成的图不满足期望的编辑距离约束。在本文中,我们通过从理论上刻画能够在给定图的指定图编辑距离内生成图的ReLU神经网络来解决这些挑战。特别是,我们证明了存在恒定深度和O(n^2 d)大小的ReLU网络,能够从具有n个顶点的给定输入图确定性地生成编辑距离d内的图,消除了对训练数据的依赖,同时保证了生成图的有效性。实验评估表明,所提网络能够为多达1400个顶点、编辑距离上界高达140的实例成功生成有效图,而基线生成模型无法生成具有所需编辑距离的图。这些结果为构造保证有效性的紧凑生成模型提供了理论基础。
摘要:Generation of graphs constrained by a specified graph edit distance from a source graph is important in applications such as cheminformatics, network anomaly synthesis, and structured data augmentation. Despite the growing demand for such constrained generative models in areas including molecule design and network perturbation analysis, the neural architectures required to provably generate graphs within a bounded graph edit distance remain largely unexplored. In addition, existing graph generative models are predominantly data-driven and depend heavily on the availability and quality of training data, which may result in generated graphs that do not satisfy the desired edit distance constraints. In this paper, we address these challenges by theoretically characterizing ReLU neural networks capable of generating graphs within a prescribed graph edit distance from a given graph. In particular, we show the existence of constant depth and O(n^2 d) size ReLU networks that deterministically generate graphs within edit distance d from a given input graph with n vertices, eliminating reliance on training data while guaranteeing validity of the generated graphs. Experimental evaluations demonstrate that the proposed network successfully generates valid graphs for instances with up to 1400 vertices and edit distance bounds up to 140, whereas baseline generative models fail to generate graphs with the desired edit distance. These results provide a theoretical foundation for constructing compact generative models with guaranteed validity.
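摘要讨论在给定图的编辑距离 d 内生成相似图。下面用"随机翻转至多 d 条边"的方式演示这一约束本身(这只是对约束的说明,并非论文中确定性的 ReLU 网络构造):

```python
import random

def toggle_edges(edges, n, d, seed=0):
    """随机翻转 d 个顶点对的邻接关系,得到编辑距离至多为 d 的相似图。"""
    rng = random.Random(seed)
    e = set(edges)
    for _ in range(d):
        u, v = rng.sample(range(n), 2)
        e ^= {(min(u, v), max(u, v))}  # 存在则删边,不存在则加边
    return e

def edge_edit_distance(e1, e2):
    """仅允许边增删时的图编辑距离:边集对称差的大小。"""
    return len(set(e1) ^ set(e2))
```

同一顶点对被翻转两次会相互抵消,因此距离是"至多 d"而非恰好 d。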


【2】Graph Topology Information Enhanced Heterogeneous Graph Representation Learning
标题:图拓扑信息增强的异构图表示学习
链接:https://arxiv.org/abs/2604.05732

作者:He Zhao,Zhiwei Zeng,Yongwei Wang,Chunyan Miao
摘要:现实世界中的异构图具有固有的噪声,并且通常不是下游任务的最佳图结构,这常常对下游任务中GRL模型的性能产生不利影响。虽然已经提出了图结构学习(GSL)方法来同时学习图结构和下游任务,但现有方法主要是针对同构图设计的,而针对异构图的GSL在很大程度上仍未被探索。在这方面存在两个挑战。首先,与同构模型相比,输入图结构的质量对基于GNN的异构GRL模型有更深远的影响。其次,大多数现有的同构GRL模型直接应用到异构图时会遇到内存消耗问题。本文提出了一种新的图拓扑学习增强的异构图表示学习框架(ToGRL)。ToGRL通过结合与任务相关的潜在拓扑信息,为下游任务学习高质量的图结构和表示。具体而言,首先提出了一个新的GSL模块,从原始图结构中提取与下游任务相关的拓扑信息,并将其投影为拓扑嵌入。这些嵌入被用来构造一个具有平滑图信号的新图。这种两阶段GSL方法将邻接矩阵的优化与节点表示学习分开,以减少内存消耗。在此之后,表示学习模块将新图作为输入,为下游任务学习嵌入。ToGRL还利用提示调优(prompt tuning)来更好地利用嵌入在所学表示中的知识,从而增强对下游任务的适应性。在五个真实世界数据集上的广泛实验表明,我们的ToGRL大幅优于最先进的方法。
摘要:Real-world heterogeneous graphs are inherently noisy and usually not in the optimal graph structures for downstream tasks, which often adversely affects the performance of GRL models in downstream tasks. Although Graph Structure Learning (GSL) methods have been proposed to learn graph structures and downstream tasks simultaneously, existing methods are predominantly designed for homogeneous graphs, while GSL for heterogeneous graphs remains largely unexplored. Two challenges arise in this context. Firstly, the quality of the input graph structure has a more profound impact on GNN-based heterogeneous GRL models compared to their homogeneous counterparts. Secondly, most existing homogeneous GRL models encounter memory consumption issues when applied directly to heterogeneous graphs. In this paper, we propose a novel Graph Topology learning Enhanced Heterogeneous Graph Representation Learning framework (ToGRL). ToGRL learns high-quality graph structures and representations for downstream tasks by incorporating task-relevant latent topology information. Specifically, a novel GSL module is first proposed to extract downstream task-related topology information from a raw graph structure and project it into topology embeddings. These embeddings are utilized to construct a new graph with smooth graph signals. This two-stage approach to GSL separates the optimization of the adjacency matrix from node representation learning to reduce memory consumption. Following this, a representation learning module takes the new graph as input to learn embeddings for downstream tasks. ToGRL also leverages prompt tuning to better utilize the knowledge embedded in learned representations, thus enhancing adaptability to downstream tasks. Extensive experiments on five real-world datasets show that our ToGRL outperforms state-of-the-art methods by a large margin.


【3】Same Graph, Different Likelihoods: Calibration of Autoregressive Graph Generators via Permutation-Equivalent Encodings
标题:相同的图,不同的可能性:通过置换等效编码校准自回归图生成器
链接:https://arxiv.org/abs/2604.05613

作者:Laurits Fredsgaard,Aaron Thomas,Michael Riis Andersen,Mikkel N. Schmidt,Mahito Sugiyama
备注:Workshop 'Towards Trustworthy Predictions: Theory and Applications of Calibration for Modern AI' at AISTATS 2026, Tangier, Morocco
摘要:自回归图生成器通过顺序构造过程定义似然,但这些似然只有在同一图的所有线性化之间保持一致时才有意义。SENT(Segmented Eulerian Neighborhood Trails)是一种新近提出的线性化方法,它将图转换成可以被语言模型完美解码和高效处理的序列,但允许同一个图存在多个等价线性化。我们使用等价线性化之间的变异系数(我们称之为线性化不确定性,LU)来量化所分配的负对数似然(NLL)的不一致程度。在两个数据集上以四种线性化策略训练Transformer后,我们发现有偏排序在其原生顺序上实现了较低的NLL,但在随机排列下表现出高出两个数量级的预期校准误差(ECE),这表明这些模型学到的是训练时的线性化方式,而不是底层的图。在分子图基准QM9上,生成图的NLL与分子稳定性呈负相关(AUC $=0.43$),而LU达到AUC $=0.85$,这表明基于排列的评估为生成的分子提供了更可靠的质量检查。代码可在https://github.com/lauritsf/linearization-uncertainty上获得
摘要:Autoregressive graph generators define likelihoods via a sequential construction process, but these likelihoods are only meaningful if they are consistent across all linearizations of the same graph. Segmented Eulerian Neighborhood Trails (SENT), a recent linearization method, converts graphs into sequences that can be perfectly decoded and efficiently processed by language models, but admit multiple equivalent linearizations of the same graph. We quantify violations in assigned negative log-likelihood (NLL) using the coefficient of variation across equivalent linearizations, which we call Linearization Uncertainty (LU). Training transformers under four linearization strategies on two datasets, we show that biased orderings achieve lower NLL on their native order but exhibit expected calibration error (ECE) two orders of magnitude higher under random permutation, indicating that these models have learned their training linearization rather than the underlying graph. On the molecular graph benchmark QM9, NLL for generated graphs is negatively correlated with molecular stability (AUC $=0.43$), while LU achieves AUC $=0.85$, suggesting that permutation-based evaluation provides a more reliable quality check for generated molecules. Code is available at https://github.com/lauritsf/linearization-uncertainty


【4】EAGLE: Edge-Aware Graph Learning for Proactive Delivery Delay Prediction in Smart Logistics Networks
标题:EAGLE:用于智能物流网络中主动交付延迟预测的边缘感知图学习
链接:https://arxiv.org/abs/2604.05254

作者:Zhiming Xue,Menghao Huo,Yujue Wang
摘要:现代物流网络在每个仓库节点和运输通道上生成丰富的运营数据流(从订单时间戳和路由记录到运输清单),但交付延迟预测在很大程度上仍然是被动响应式的。现有的预测方法通常要么将此问题视为表格分类任务而忽略网络拓扑结构,要么视为时间序列异常检测任务而忽略供应链图的空间依赖性。为了弥补这一差距,我们提出了一个用于主动供应链风险管理的混合深度学习框架。该方法通过轻量级的Transformer分块编码器对时间订单流动态建模,并通过边缘感知图注意力网络(E-GAT)刻画枢纽间的关系依赖,二者由多任务学习目标联合优化。在真实世界的DataCo智能供应链数据集上的评估表明,我们的框架相比基线方法取得了一致的改进,F1得分为0.8762,AUC-ROC为0.9773。在四个独立的随机种子上,该框架的跨种子F1标准差仅为0.0089(比最佳消融变体改善3.8倍),在所有评估的模型中实现了预测准确性和训练稳定性之间的最佳平衡。
摘要 :Modern logistics networks generate rich operational data streams at every warehouse node and transportation lane -- from order timestamps and routing records to shipping manifests -- yet predicting delivery delays remains predominantly reactive. Existing predictive approaches typically treat this problem either as a tabular classification task, ignoring network topology, or as a time-series anomaly detection task, overlooking the spatial dependencies of the supply chain graph. To bridge this gap, we propose a hybrid deep learning framework for proactive supply chain risk management. The proposed method jointly models temporal order-flow dynamics via a lightweight Transformer patch encoder and inter-hub relational dependencies through an Edge-Aware Graph Attention Network (E-GAT), optimized via a multi-task learning objective. Evaluated on the real-world DataCo Smart Supply Chain dataset, our framework achieves consistent improvements over baseline methods, yielding an F1-score of 0.8762 and an AUC-ROC of 0.9773. Across four independent random seeds, the framework exhibits a cross-seed F1 standard deviation of only 0.0089 -- a 3.8 times improvement over the best ablated variant -- achieving the strongest balance of predictive accuracy and training stability among all evaluated models.


【5】Feature-Aware Anisotropic Local Differential Privacy for Utility-Preserving Graph Representation Learning in Metal Additive Manufacturing
标题:金属增材制造中用于保留效用的图表示学习的特征感知各向异性局部差分隐私
链接:https://arxiv.org/abs/2604.05077

作者:MD Shafikul Islam,Mahathir Mohammad Bappy,Saifur Rahman Tushar,Md Arifuzzaman
备注:In Review in The ASME Journal of Computing and Information Science in Engineering (JCISE)
摘要:金属增材制造(AM)可以制造安全关键部件,但可靠的质量保证依赖于包含专有工艺信息的高保真传感器流,从而限制了协作数据共享。现有的缺陷检测模型通常将熔池观测视为独立样本,忽略了逐层的物理耦合。此外,传统的隐私保护技术,特别是局部差分隐私(LDP),在所有特征维度上注入均匀噪声,导致严重的效用退化。为了应对这些相互关联的挑战,我们提出了FI-LDP-HGAT。该计算框架结合了两个方法组件:一个分层的层次图注意力网络(HGAT),用于捕获扫描轨迹和沉积层之间的空间和热依赖关系;以及一个用于非交互式特征私有化的特征重要性感知各向异性高斯机制(FI-LDP)。与各向同性LDP不同,FI-LDP使用编码器导出的重要性先验在嵌入坐标上重新分配隐私预算,对任务关键的热签名分配较低的噪声,对冗余维度分配较高的噪声,同时保持正式的LDP保证。在定向能量沉积(DED)孔隙率数据集上的实验表明,FI-LDP-HGAT在中等隐私预算(ε = 4)下实现了81.5%的效用恢复,并在严格隐私(ε = 2)下保持了0.762的缺陷召回率,同时在所有评估指标上优于经典机器学习、标准GNN以及包括DP-SGD在内的其他隐私机制。机制分析证实了特征重要性和噪声幅度之间的强负相关性(Spearman = -0.81),提供了可解释的证据,表明隐私-效用增益源自有原则的各向异性分配。
摘要:Metal additive manufacturing (AM) enables the fabrication of safety-critical components, but reliable quality assurance depends on high-fidelity sensor streams containing proprietary process information, limiting collaborative data sharing. Existing defect-detection models typically treat melt-pool observations as independent samples, ignoring layer-wise physical couplings. Moreover, conventional privacy-preserving techniques, particularly Local Differential Privacy (LDP), lead to severe utility degradation because they inject uniform noise across all feature dimensions. To address these interrelated challenges, we propose FI-LDP-HGAT. This computational framework combines two methodological components: a stratified Hierarchical Graph Attention Network (HGAT) that captures spatial and thermal dependencies across scan tracks and deposited layers, and a feature-importance-aware anisotropic Gaussian mechanism (FI-LDP) for non-interactive feature privatization. Unlike isotropic LDP, FI-LDP redistributes the privacy budget across embedding coordinates using an encoder-derived importance prior, assigning lower noise to task-critical thermal signatures and higher noise to redundant dimensions while maintaining formal LDP guarantees. Experiments on a Directed Energy Deposition (DED) porosity dataset demonstrate that FI-LDP-HGAT achieves 81.5% utility recovery at a moderate privacy budget (epsilon = 4) and maintains defect recall of 0.762 under strict privacy (epsilon = 2), while outperforming classical ML, standard GNNs, and alternative privacy mechanisms, including DP-SGD across all evaluated metrics. Mechanistic analysis confirms a strong negative correlation (Spearman = -0.81) between feature importance and noise magnitude, providing interpretable evidence that the privacy-utility gains are driven by principled anisotropic allocation.
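FI-LDP的核心思想是按特征重要性各向异性地分配噪声。下面是一个示意性草图:在保持总噪声方差(作为隐私预算的代理)不变的前提下,让每维噪声方差与重要性成反比。反比分配规则与函数名均为本文的示意性假设;论文的实际分配由编码器先验在正式LDP约束下推导:

```python
import numpy as np

def anisotropic_noise_scales(importance, total_variance):
    """按重要性反比分配每维高斯噪声方差,总方差与各向同性机制持平。
    重要维度噪声小、冗余维度噪声大,与FI-LDP的直觉一致(仅为示意)。"""
    importance = np.asarray(importance, dtype=float)
    inv = 1.0 / importance               # 重要性越高,分到的噪声越少
    var = total_variance * inv / inv.sum()  # 重新分配总方差预算
    return np.sqrt(var)                  # 返回每维噪声标准差
```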


【6】Towards Predicting Multi-Vulnerability Attack Chains in Software Supply Chains from Software Bill of Materials Graphs
标题:从软件物料清单图预测软件供应链中的多漏洞攻击链
链接:https://arxiv.org/abs/2604.04977

作者:Laura Baird,Armin Moin
备注:Accepted for the ACM International Conference on the Foundations of Software Engineering (FSE) 2026 Ideas, Visions and Reflections (IVR) Track
摘要:软件供应链的安全危害通常源于漏洞的级联交互,例如多个易受攻击组件之间的交互。然而,用于安全分析的基于软件物料清单(SBOM)的管道通常将扫描器结果视为相互独立的CVE(常见漏洞和暴露)记录。我们提出了一个新的研究方向:通过一种新颖的SBOM驱动的图学习方法来学习多漏洞攻击链。该方法将SBOM结构和扫描器输出视为受依赖关系约束的证据图,而不是漏洞的平面列表。我们将富含漏洞信息的CycloneDX SBOM表示为异构图,其节点刻画软件组件和已知漏洞(即CVE),并通过依赖关系和漏洞链接等类型化关系相连。作为在此结构上进行学习的可行性检验,我们训练一个异构图注意力网络(HGAT)来预测某个组件是否与至少一个已知漏洞相关联。此外,我们使用在已记录的多漏洞链上训练的轻量级多层感知器(MLP)神经网络,将级联漏洞的发现建模为CVE对的链接预测。在Wild SBOMs公共数据集的200个真实SBOM上验证,HGAT组件分类器实现了91.03%的准确率和74.02%的F1分数,而级联预测模型(MLP)在35条已记录攻击链的种子集上实现了0.93的受试者工作特征曲线下面积(ROC-AUC)。
摘要:Software supply chain security compromises often stem from cascaded interactions of vulnerabilities, for example, between multiple vulnerable components. Yet, Software Bill of Materials (SBOM)-based pipelines for security analysis typically treat scanner findings as independent per-CVE (Common Vulnerabilities and Exposures) records. We propose a new research direction based on learning multi-vulnerability attack chains through a novel SBOM-driven graph-learning approach. This treats SBOM structure and scanner outputs as a dependency-constrained evidence graph rather than a flat list of vulnerabilities. We represent vulnerability-enriched CycloneDX SBOMs as heterogeneous graphs whose nodes capture software components and known vulnerabilities (i.e., CVEs), connected by typed relations, such as dependency and vulnerability links. We train a Heterogeneous Graph Attention Network (HGAT) to predict whether a component is associated with at least one known vulnerability as a feasibility check for learning over this structure. Additionally, we frame the discovery of cascading vulnerabilities as CVE-pair link prediction using a lightweight Multi-Layer Perceptron (MLP) neural network trained on documented multi-vulnerability chains. Validated on 200 real-world SBOMs from the Wild SBOMs public dataset, the HGAT component classifier achieves 91.03% Accuracy and 74.02% F1-score, while the cascade predictor model (MLP) achieves a Receiver Operating Characteristic - Area Under Curve (ROC-AUC) of 0.93 on a seed set of 35 documented attack chains.


【7】Sparse Autoencoders as a Steering Basis for Phase Synchronization in Graph-Based CFD Surrogates
标题:稀疏自编码器作为基于图的计算流体动力学代理中相位同步的转向基础
链接:https://arxiv.org/abs/2604.04946

作者:Yeping Hu,Ruben Glatt,Shusen Liu
摘要:基于图的代理模型为高保真CFD求解器提供了快速替代方案,但其不透明的潜在空间和有限的可控性限制了其在安全关键场景中的使用。振荡流中的一个关键故障模式是相位漂移:预测在定性上保持正确,但逐渐失去与观测的时间对齐,从而限制了在数字孪生和闭环控制中的使用。通过重新训练来纠正这一问题代价高昂,且在部署期间不切实际。我们的问题是:能否通过操纵冻结代理模型的潜在空间来事后纠正相位漂移。我们为预训练的基于图的CFD模型提出了一个相位转向框架,它将合适的表示与合适的干预机制相结合。为了获得便于有效转向的解耦表示,我们在冻结的MeshGraphNet嵌入上使用稀疏自编码器(SAE)。为了操控动力学,我们超越缩放或钳位等静态的逐特征干预,引入一种时间相干的相位感知方法。具体来说,我们通过希尔伯特分析识别振荡特征对,通过SVD将空间场投影为低秩时间系数,并施加平滑的时变旋转来提前或延迟周期模式,同时保持幅度-相位结构。在表示无关的设置下,我们在同一干预管道中将基于SAE的转向与PCA和原始嵌入空间进行比较。结果表明,稀疏、解耦的表示优于稠密或纠缠的表示,而静态干预在这种动力学场景中失效。总的来说,这项工作表明,当干预尊重底层动力学时,潜在空间转向可以从语义领域扩展到依赖时间的物理系统,而且用于可解释性的同一批稀疏特征也可以作为物理上有意义的控制轴。
摘要:Graph-based surrogate models provide fast alternatives to high-fidelity CFD solvers, but their opaque latent spaces and limited controllability restrict use in safety-critical settings. A key failure mode in oscillatory flows is phase drift, where predictions remain qualitatively correct but gradually lose temporal alignment with observations, limiting use in digital twins and closed-loop control. Correcting this through retraining is expensive and impractical during deployment. We ask whether phase drift can instead be corrected post hoc by manipulating the latent space of a frozen surrogate. We propose a phase-steering framework for pretrained graph-based CFD models that combines the right representation with the right intervention mechanism. To obtain disentangled representation for effective steering, we use sparse autoencoders (SAEs) on frozen MeshGraphNet embeddings. To steer dynamics, we move beyond static per-feature interventions such as scaling or clamping, and introduce a temporally coherent, phase-aware method. Specifically, we identify oscillatory feature pairs with Hilbert analysis, project spatial fields into low-rank temporal coefficients via SVD, and apply smooth time-varying rotations to advance or delay periodic modes while preserving amplitude-phase structure. Using a representation-agnostic setup, we compare SAE-based steering with PCA and raw embedding spaces under the same intervention pipeline. Results show that sparse, disentangled representations outperform dense or entangled ones, while static interventions fail in this dynamical setting. Overall, this work shows that latent-space steering can be extended from semantic domains to time-dependent physical systems when interventions respect the underlying dynamics, and that the same sparse features used for interpretability can also serve as physically meaningful control axes.
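该文先用希尔伯特分析提取振荡特征的瞬时相位,再施加旋转来提前或延迟周期模式。下面用numpy的FFT实现解析信号(与scipy.signal.hilbert等价),并演示对一维振荡信号做固定的相位提前。这只是相位旋转这一步的示意;论文实际是在SVD时间系数空间中施加平滑的时变旋转:

```python
import numpy as np

def analytic_signal(x):
    """通过FFT构造解析信号(等价于scipy.signal.hilbert):
    置零负频率、加倍正频率。"""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0          # 偶数长度保留Nyquist分量
    return np.fft.ifft(X * h)

def phase_advance(x, dphi):
    """将振荡信号的相位提前dphi弧度,同时保持幅度包络。"""
    z = analytic_signal(x)
    return np.real(z * np.exp(1j * dphi))
```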


【8】Graph Signal Diffusion Models for Wireless Resource Allocation
标题:无线资源分配的图信号扩散模型
链接:https://arxiv.org/abs/2604.05175

作者:Yigit Berkay Uslu,Samar Hadou,Shirin Saeedi Bidokhti,Alejandro Ribeiro
备注:Under review for SPAWC'26
摘要:本文研究具有图结构干扰的无线网络中的约束遍历资源优化问题。我们训练一个扩散模型策略,以匹配专家在资源分配上的条件分布。通过利用一个原始-对偶(专家)算法,我们生成的原始迭代可作为每个训练网络实例相应专家条件分布的样本。我们将资源分配视为支撑在已知信道状态图上的随机图信号。我们将扩散模型架构实现为由图神经网络(GNN)块构成的U-Net层次结构,以信道状态和附加节点状态为条件。在推理时,学习到的生成模型通过直接从近似最优的条件分布中采样分配向量,摊销迭代式的专家策略。在一个功率控制案例研究中,我们表明,对生成的功率分配进行时分共享可实现近最优的遍历和速率效用以及近可行的遍历最小速率,并且在不同网络状态之间具有较强的泛化能力和可迁移性。
摘要:We consider constrained ergodic resource optimization in wireless networks with graph-structured interference. We train a diffusion model policy to match expert conditional distributions over resource allocations. By leveraging a primal-dual (expert) algorithm, we generate primal iterates that serve as draws from the corresponding expert conditionals for each training network instance. We view the allocations as stochastic graph signals supported on known channel state graphs. We implement the diffusion model architecture as a U-Net hierarchy of graph neural network (GNN) blocks, conditioned on the channel states and additional node states. At inference, the learned generative model amortizes the iterative expert policy by directly sampling allocation vectors from the near-optimal conditional distributions. In a power-control case study, we show that time-sharing the generated power allocations achieves near-optimal ergodic sum-rate utility and near-feasible ergodic minimum-rates, with strong generalization and transferability across network states.


Transformer(7篇)

【1】Short Data, Long Context: Distilling Positional Knowledge in Transformers
标题:短数据,长背景:提炼Transformer中的位置知识
链接:https://arxiv.org/abs/2604.06070

作者:Patrick Huber,Ernie Chang,Chinnadhurai Sankar,Rylan Conway,Igor Fedorov,Md Rifat Arefin,Adithya Sagar
摘要:扩展语言模型的上下文窗口通常需要昂贵的长上下文预训练,这对训练效率和数据收集都构成了重大挑战。在本文中,我们给出的证据表明,即使只在长上下文窗口内对打包的短上下文样本进行训练,长上下文检索能力也可以通过基于logit的知识蒸馏转移到学生模型。我们从旋转位置嵌入(RoPE)的视角提供全面的见解,并确立三个关键发现。首先,与先前工作一致,我们表明在每个训练阶段最大化旋转频谱利用率的分阶段RoPE缩放,在知识蒸馏设置中同样取得最佳的长上下文性能。其次,我们证明基于logit的知识蒸馏可以直接实现位置信息的传递。通过一个使用打包重复令牌序列的实验设置,我们追踪位置扰动如何从查询和键向量经由连续的Transformer层传播到输出logits,揭示位置信息系统性地影响教师的输出分布,进而影响学生模型接收到的蒸馏信号。第三,我们的分析揭示了长上下文扩展过程中查询状态的结构化更新模式,其中不同的参数区间对长上下文训练表现出强烈的敏感性。
摘要:Extending the context window of language models typically requires expensive long-context pre-training, posing significant challenges for both training efficiency and data collection. In this paper, we present evidence that long-context retrieval capabilities can be transferred to student models through logit-based knowledge distillation, even when training exclusively on packed short-context samples within a long-context window. We provide comprehensive insights through the lens of Rotary Position Embedding (RoPE) and establish three key findings. First, consistent with prior work, we show that phase-wise RoPE scaling, which maximizes rotational spectrum utilization at each training stage, also achieves the best long-context performance in knowledge distillation setups. Second, we demonstrate that logit-based knowledge distillation can directly enable positional information transfer. Using an experimental setup with packed repeated token sequences, we trace the propagation of positional perturbations from query and key vectors through successive transformer layers to output logits, revealing that positional information systematically influences the teacher's output distribution and, in turn, the distillation signal received by the student model. Third, our analysis uncovers structured update patterns in the query state during long-context extension, with distinct parameter spans exhibiting strong sensitivity to long-context training.
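理解上文的关键在于RoPE的旋转机制:将向量维度两两配对,按位置相关的角度旋转。下面是标准RoPE旋转的一个numpy草图(采用前后半段配对的常见实现方式,这是一种实现选择),其点积只依赖相对位置,这正是各类RoPE缩放策略所操纵的性质:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """旋转位置嵌入:将(x1_i, x2_i)维度对按角度 pos * base**(-2i/d) 旋转。"""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) * 2.0 / d)   # 各维度对的旋转频率
    theta = pos * freqs
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate(
        [x1 * np.cos(theta) - x2 * np.sin(theta),
         x1 * np.sin(theta) + x2 * np.cos(theta)], axis=-1)
```

旋转后查询与键的内积只取决于两者位置之差,因此整体平移位置不改变注意力分数。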


【2】Multi-Modal Landslide Detection from Sentinel-1 SAR and Sentinel-2 Optical Imagery Using Multi-Encoder Vision Transformers and Ensemble Learning
标题:使用多编码器视觉变换器和集合学习从Sentinel-1 SAR和Sentinel-2光学图像中进行多模式山体滑坡检测
链接:https://arxiv.org/abs/2604.05959

作者:Ioannis Nasios
摘要:山体滑坡是一种严重影响人类生活、基础设施和生态系统的重大地质灾害,强调需要准确及时的检测方法来支持减少灾害风险。本研究提出了一种模块化的多模型框架,将Sentinel-2光学图像与Sentinel-1合成孔径雷达(SAR)数据相融合,以实现强大的滑坡检测。该方法利用多编码器Vision Transformers,其中每个数据模态通过单独的轻量级预训练编码器进行处理,从而在滑坡检测中实现强大的性能。此外,多个模型的集成,特别是神经网络和梯度增强模型(LightGBM和XGBoost)的组合,展示了集成学习进一步提高准确性和鲁棒性的能力。导出的光谱指数,如归一化差异植被指数,与原始波段合并,以提高对植被和地表变化的敏感性。所提出的方法在滑坡检测上实现了最先进的F1得分0.919,解决了基于块的分类任务,而不是像素级分割,并且在没有事件前Sentinel-2数据的情况下操作,突出了其在非经典变化检测设置中的有效性。它还在机器学习竞赛中展示了最佳性能,在精确度和召回率之间实现了很好的平衡,并突出了明确利用光学和雷达数据互补优势的优势。所进行的实验和研究还强调可扩展性和操作适用性,实现仅光学、仅SAR或组合输入的灵活配置,并为更广泛的自然灾害监测和环境变化应用提供可转移的框架。完整的训练和推理代码可以在https://github.com/IoannisNasios/sentinel-landslide-cls中找到。
摘要:Landslides represent a major geohazard with severe impacts on human life, infrastructure, and ecosystems, underscoring the need for accurate and timely detection approaches to support disaster risk reduction. This study proposes a modular, multi-model framework that fuses Sentinel-2 optical imagery with Sentinel-1 Synthetic Aperture Radar (SAR) data, for robust landslide detection. The methodology leverages multi-encoder vision transformers, where each data modality is processed through separate lightweight pretrained encoders, achieving strong performance in landslide detection. In addition, the integration of multiple models, particularly the combination of neural networks and gradient boosting models (LightGBM and XGBoost), demonstrates the power of ensemble learning to further enhance accuracy and robustness. Derived spectral indices, such as NDVI, are integrated alongside original bands to enhance sensitivity to vegetation and surface changes. The proposed methodology achieves a state-of-the-art F1 score of 0.919 on landslide detection, addressing a patch-based classification task rather than pixel-level segmentation and operating without pre-event Sentinel-2 data, highlighting its effectiveness in a non-classical change detection setting. It also demonstrated top performance in a machine learning competition, achieving a strong balance between precision and recall and highlighting the advantages of explicitly leveraging the complementary strengths of optical and radar data. The conducted experiments and research also emphasize scalability and operational applicability, enabling flexible configurations with optical-only, SAR-only, or combined inputs, and offering a transferable framework for broader natural hazard monitoring and environmental change applications. Full training and inference code can be found in https://github.com/IoannisNasios/sentinel-landslide-cls.


【3】Modeling Patient Care Trajectories with Transformer Hawkes Processes
标题:使用Transformer Hawkes过程建模患者护理轨迹
链接:https://arxiv.org/abs/2604.05844

作者:Saumya Pandey,Varun Chandola
摘要:患者的医疗保健利用由不规则时间戳的事件组成,例如门诊就诊、住院和急诊,形成个体化的护理轨迹。对这些轨迹进行建模对于理解利用模式和预测未来护理需求至关重要,但由于时间不规则性和严重的类别不平衡而具有挑战性。在这项工作中,我们基于Transformer Hawkes Process框架对连续时间内的患者轨迹进行建模。通过将基于Transformer的历史编码与Hawkes过程动力学相结合,该模型捕获事件依赖性并联合预测事件类型和事件发生时间。为了解决极端不平衡问题,我们引入了一种使用逆平方根类别加权的不平衡感知训练策略。这在不改变数据分布的情况下提高了对罕见但临床重要事件的敏感性。在真实世界数据上的实验证明了性能的提升,并为识别高风险患者人群提供了有临床意义的见解。
摘要:Patient healthcare utilization consists of irregularly time-stamped events, such as outpatient visits, inpatient admissions, and emergency encounters, forming individualized care trajectories. Modeling these trajectories is crucial for understanding utilization patterns and predicting future care needs, but is challenging due to temporal irregularity and severe class imbalance. In this work, we build on the Transformer Hawkes Process framework to model patient trajectories in continuous time. By combining Transformer-based history encoding with Hawkes process dynamics, the model captures event dependencies and jointly predicts event type and time-to-event. To address extreme imbalance, we introduce an imbalance-aware training strategy using inverse square-root class weighting. This improves sensitivity to rare but clinically important events without altering the data distribution. Experiments on real-world data demonstrate improved performance and provide clinically meaningful insights for identifying high-risk patient populations.
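文中的逆平方根类别加权即 w_c ∝ 1/sqrt(n_c):类别样本数越少,损失权重越大。示意实现如下(归一化到均值为1以保持整体损失尺度,这一归一化是本文的假设):

```python
import numpy as np

def inverse_sqrt_class_weights(counts):
    """按各类别样本数n_c计算损失权重 w_c ∝ 1/sqrt(n_c),归一化到均值为1。"""
    counts = np.asarray(counts, dtype=float)
    w = 1.0 / np.sqrt(counts)
    return w * len(w) / w.sum()
```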


【4】EEG-MFTNet: An Enhanced EEGNet Architecture with Multi-Scale Temporal Convolutions and Transformer Fusion for Cross-Session Motor Imagery Decoding
标题:EEG-MFTNet:一种增强的EEGNet架构,具有多尺度时间卷积和Transformer融合,用于跨会话运动图像解码
链接:https://arxiv.org/abs/2604.05843

作者:Panagiotis Andrikopoulos,Siamak Mehrkanoon
备注:6 pages, 4 figs
摘要:脑机接口(BCI)实现大脑和外部设备之间的直接通信,为运动障碍患者提供关键支持。然而,由于噪声和跨会话变异,从脑电图(EEG)中准确解码运动想象(MI)仍然具有挑战性。这项研究提出了EEG-MFTNet,一种基于EEGNet架构的新型深度学习模型,通过多尺度时间卷积和Transformer编码器流进行增强。这些组件旨在捕捉EEG信号中的短程和长程时间依赖性。该模型使用受试者相关的跨会话设置在SHU数据集上进行评估,优于包括EEGNet及其最近衍生模型在内的基线模型。EEG-MFTNet实现了58.9%的平均分类准确率,同时保持较低的计算复杂度和推理延迟。结果突出了该模型在实时BCI应用中的潜力,并强调了架构创新对改进MI解码的重要性。这项工作有助于开发更强大和自适应的BCI系统,对辅助技术和神经康复具有意义。
摘要:Brain-computer interfaces (BCIs) enable direct communication between the brain and external devices, providing critical support for individuals with motor impairments. However, accurate motor imagery (MI) decoding from electroencephalography (EEG) remains challenging due to noise and cross-session variability. This study introduces EEG-MFTNet, a novel deep learning model based on the EEGNet architecture, enhanced with multi-scale temporal convolutions and a Transformer encoder stream. These components are designed to capture both short and long-range temporal dependencies in EEG signals. The model is evaluated on the SHU dataset using a subject-dependent cross-session setup, outperforming baseline models, including EEGNet and its recent derivatives. EEG-MFTNet achieves an average classification accuracy of 58.9% while maintaining low computational complexity and inference latency. The results highlight the model's potential for real-time BCI applications and underscore the importance of architectural innovations in improving MI decoding. This work contributes to the development of more robust and adaptive BCI systems, with implications for assistive technologies and neurorehabilitation.
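多尺度时间卷积的思想是用不同长度的卷积核并行提取短程与长程时间特征,再沿特征轴拼接。下面是一个极简草图(用滑动平均核代替学习得到的卷积核,核长度为示意取值):

```python
import numpy as np

def multiscale_temporal_features(x, kernel_sizes=(3, 7, 15)):
    """对一维信号并行做多个核长的时间卷积并堆叠(滑动平均核仅为示意)。"""
    feats = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k                       # 示意:均值滤波核
        feats.append(np.convolve(x, kernel, mode="same"))
    return np.stack(feats)                            # 形状 (len(kernel_sizes), T)
```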


【5】On the Geometry of Positional Encodings in Transformers
标题:Transformer中位置编码的几何学
链接:https://arxiv.org/abs/2604.05217

作者:Giansalvo Cirrincione
摘要:神经语言模型处理单词序列,但其内部的数学运算对单词出现的顺序不敏感。位置编码正是为弥补这一点而添加的组件。尽管位置编码很重要,但其设计在很大程度上依赖试错,缺乏一个说明它们应当做什么的数学理论。   本文发展了这样一个理论,建立了四个结果。首先,任何没有位置信号的Transformer都无法解决任何对词序敏感的任务(必要性定理)。其次,在温和且可验证的条件下,训练在每个全局最小值处都会为不同的序列位置分配不同的向量表示(位置分离定理)。第三,通过对位置分布之间的Hellinger距离做经典多维标度(MDS),构造出对信息最优编码的最佳可实现近似;任何编码的质量都由单一数字,即应力(stress)来衡量(命题5,算法1)。第四,最优编码具有有效秩 r = rank(B) <= n-1,并且可以用 r(n+d) 个参数而不是 nd 个参数来表示(最小参数化结果)。   附录A通过五个引理,在神经正切核(NTK)机制下针对掩码语言建模(MLM)损失、序列分类损失以及满足位置充分性条件的一般损失,给出了单调性猜想的证明。在SST-2和IMDB上用BERT-base进行的实验证实了理论预测,并揭示线性偏置注意力(ALiBi)取得的应力远低于正弦编码和旋转位置嵌入(RoPE),这与近似移位等变性下MDS编码的秩1解释相一致。
摘要:Neural language models process sequences of words, but the mathematical operations inside them are insensitive to the order in which words appear. Positional encodings are the component added to remedy this. Despite their importance, positional encodings have been designed largely by trial and error, without a mathematical theory of what they ought to do.   This paper develops such a theory. Four results are established. First, any Transformer without a positional signal cannot solve any task sensitive to word order (Necessity Theorem). Second, training assigns distinct vector representations to distinct sequence positions at every global minimiser, under mild and verifiable conditions (Positional Separation Theorem). Third, the best achievable approximation to an information-optimal encoding is constructed via classical multidimensional scaling (MDS) on the Hellinger distance between positional distributions; the quality of any encoding is measured by a single number, the stress (Proposition 5, Algorithm 1). Fourth, the optimal encoding has effective rank r = rank(B) <= n-1 and can be represented with r(n+d) parameters instead of nd (minimal parametrisation result).   Appendix A develops a proof of the Monotonicity Conjecture within the Neural Tangent Kernel (NTK) regime for masked language modelling (MLM) losses, sequence classification losses, and general losses satisfying a positional sufficiency condition, through five lemmas. Experiments on SST-2 and IMDB with BERT-base confirm the theoretical predictions and reveal that Attention with Linear Biases (ALiBi) achieves much lower stress than the sinusoidal encoding and Rotary Position Embedding (RoPE), consistent with a rank-1 interpretation of the MDS encoding under approximate shift-equivariance.
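文中第三个结果通过对Hellinger距离矩阵做经典MDS来构造编码,并以应力(stress)衡量编码质量。下面给出经典MDS与一种常见的应力归一化的示意实现(论文的具体stress公式可能有所不同):

```python
import numpy as np

def classical_mds(D, r):
    """经典MDS:由成对距离矩阵D嵌入到r维,基于 B = -J D^2 J / 2 的特征分解。"""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # 双中心化算子
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:r]          # 取前r个最大特征值
    L = np.sqrt(np.clip(vals[idx], 0.0, None))
    return vecs[:, idx] * L

def stress(D, X):
    """目标距离D与嵌入距离之间的归一化残差(一种常见归一化)。"""
    E = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    return np.sqrt(((D - E) ** 2).sum() / (D ** 2).sum())
```

当D本身是欧氏距离矩阵时,足够维数的经典MDS可精确复原距离,应力为零。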


【6】Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner
标题:Vintix II:决策预训练Transformer是一个可扩展的上下文强化学习器
链接:https://arxiv.org/abs/2604.05112

作者:Andrei Polubarov,Lyubaykin Nikita,Alexander Derevyagin,Artyom Grishin,Igor Saprygin,Aleksandr Serkov,Mark Averchenko,Daniil Tikhonov,Maksim Zhdanov,Alexander Nikulin,Ilya Zisman,Albina Klepach,Alexey Zemtsov,Vladislav Kurenkov
备注:ICLR 2026, Poster
摘要:上下文强化学习(ICRL)的最新进展已经展示了其训练通用智能体的潜力,这类智能体可以直接在推理时习得新任务。算法蒸馏(AD)开创了这一范式,随后被扩展到多领域设置,尽管其对未见任务的泛化能力仍然有限。决策预训练Transformer(DPT)作为替代方案被提出,在简化的领域中显示出更强的上下文强化学习能力,但其可扩展性此前尚未得到确认。在这项工作中,我们将DPT扩展到多样化的多领域环境,采用流匹配(Flow Matching)作为一种自然的训练选择,保留其作为贝叶斯后验采样的解释。由此,我们得到一个在数百个不同任务上训练的智能体,其在保留测试集上的泛化取得明显提升。该智能体改进了先前AD的扩展结果,并在在线和离线推理中均表现出更强的性能,进一步巩固了ICRL作为训练通用智能体时专家蒸馏的可行替代方案。
摘要:Recent progress in in-context reinforcement learning (ICRL) has demonstrated its potential for training generalist agents that can acquire new tasks directly at inference. Algorithm Distillation (AD) pioneered this paradigm and was subsequently scaled to multi-domain settings, although its ability to generalize to unseen tasks remained limited. The Decision Pre-Trained Transformer (DPT) was introduced as an alternative, showing stronger in-context reinforcement learning abilities in simplified domains, but its scalability had not been established. In this work, we extend DPT to diverse multi-domain environments, applying Flow Matching as a natural training choice that preserves its interpretation as Bayesian posterior sampling. As a result, we obtain an agent trained across hundreds of diverse tasks that achieves clear gains in generalization to the held-out test set. This agent improves upon prior AD scaling and demonstrates stronger performance in both online and offline inference, reinforcing ICRL as a viable alternative to expert distillation for training generalist agents.


【7】Brain-to-Speech: Prosody Feature Engineering and Transformer-Based Reconstruction
标题:大脑到言语:韵律特征工程和基于转换器的重建
链接:https://arxiv.org/abs/2604.05751

作者:Mohammed Salah Al-Radhi,Géza Németh,Andon Tchechmedjiev,Binbin Xu
备注:OpenAccess chapter: 10.1007/978-3-032-10561-5_16. In: Curry, E., et al. Artificial Intelligence, Data and Robotics (2026)
摘要:本章提出了一种从颅内脑电图(iEEG)数据进行脑到语音合成(BTS)的新方法,强调韵律感知的特征工程以及用于高保真语音重建的先进Transformer模型。在直接从大脑活动解码语音这一日益增长的兴趣驱动下,这项工作整合了神经科学、人工智能和信号处理,以生成准确而自然的语音。我们介绍了一种新的管道,用于直接从复杂的大脑iEEG信号中提取关键的韵律特征,包括语调、音高和节奏。为了有效利用这些关键特征生成自然的语音,我们采用了先进的深度学习模型。此外,本章介绍了一种专为脑到语音任务设计的新型Transformer编码器架构。与传统模型不同,我们的架构集成了提取的韵律特征以显著增强语音重建,从而生成可懂度和表现力更好的语音。详细的评估表明,在定量和感知指标上,其性能均优于传统Griffin-Lim和基于CNN的重建等既有基线方法。通过展示这些在特征提取和基于Transformer的学习方面的进展,本章为人工智能驱动的神经修复这一不断发展的领域做出了贡献,为恢复言语障碍患者沟通能力的辅助技术铺平了道路。最后,我们讨论了有前景的未来研究方向,包括扩散模型的集成和实时推理系统。
摘要:This chapter presents a novel approach to brain-to-speech (BTS) synthesis from intracranial electroencephalography (iEEG) data, emphasizing prosody-aware feature engineering and advanced transformer-based models for high-fidelity speech reconstruction. Driven by the increasing interest in decoding speech directly from brain activity, this work integrates neuroscience, artificial intelligence, and signal processing to generate accurate and natural speech. We introduce a novel pipeline for extracting key prosodic features directly from complex brain iEEG signals, including intonation, pitch, and rhythm. To effectively utilize these crucial features for natural-sounding speech, we employ advanced deep learning models. Furthermore, this chapter introduces a novel transformer encoder architecture specifically designed for brain-to-speech tasks. Unlike conventional models, our architecture integrates the extracted prosodic features to significantly enhance speech reconstruction, resulting in generated speech with improved intelligibility and expressiveness. A detailed evaluation demonstrates superior performance over established baseline methods, such as traditional Griffin-Lim and CNN-based reconstruction, across both quantitative and perceptual metrics. By demonstrating these advancements in feature extraction and transformer-based learning, this chapter contributes to the growing field of AI-driven neuroprosthetics, paving the way for assistive technologies that restore communication for individuals with speech impairments. Finally, we discuss promising future research directions, including the integration of diffusion models and real-time inference systems.


GAN|对抗|攻击|生成相关(7篇)

【1】Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models
标题:对多模式预训练模型的隐形且可调节的文本引导后门攻击
链接:https://arxiv.org/abs/2604.05809

作者:Yiyang Zhang,Chaojian Yu,Ziming Hong,Yuanjie Shao,Qinmu Peng,Tongliang Liu,Xinge You
摘要:多模态预训练模型容易受到后门攻击,但大多数现有方法依赖视觉或多模态触发器,而视觉嵌入的触发器在现实世界数据中很少出现,因而并不实用。为了克服这一限制,我们提出了一种针对多模态预训练模型的新型文本引导后门(TGB)攻击,其中文本描述中常见的单词充当后门触发器,显著提高了隐蔽性和实用性。此外,我们在中毒样本上引入视觉对抗扰动来调节模型对文本触发器的学习,从而实现可控、可调的TGB攻击。在基于多模态预训练模型构建的下游任务(包括组合图像检索(CIR)和视觉问答(VQA))上的大量实验表明,TGB在各种现实环境中兼具实用性和隐蔽性,并具有可调的攻击成功率,揭示了多模态预训练模型中的关键安全漏洞。
摘要:Multimodal pretrained models are vulnerable to backdoor attacks, yet most existing methods rely on visual or multimodal triggers, which are impractical since visually embedded triggers rarely occur in real-world data. To overcome this limitation, we propose a novel Text-Guided Backdoor (TGB) attack on multimodal pretrained models, where commonly occurring words in textual descriptions serve as backdoor triggers, significantly improving stealthiness and practicality. Furthermore, we introduce visual adversarial perturbations on poisoned samples to modulate the model's learning of textual triggers, enabling a controllable and adjustable TGB attack. Extensive experiments on downstream tasks built upon multimodal pretrained models, including Composed Image Retrieval (CIR) and Visual Question Answering (VQA), demonstrate that TGB achieves practicality and stealthiness with adjustable attack success rates across diverse realistic settings, revealing critical security vulnerabilities in multimodal pretrained models.
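TGB以文本描述中常见的词作为触发器。下面用一个玩具函数示意投毒时向描述中植入触发词这一步(函数名、触发词与插入位置均为示意性假设;完整攻击还需对配对图像施加视觉对抗扰动以调节触发器的学习):

```python
def inject_trigger(caption, trigger="with", position=0):
    """玩具投毒步骤:将一个常见触发词插入描述的指定词位(仅为示意)。"""
    words = caption.split()
    words.insert(position, trigger)
    return " ".join(words)
```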


【2】Controllable Image Generation with Composed Parallel Token Prediction
标题:利用合成并行令牌预测的可控图像生成
链接:https://arxiv.org/abs/2604.05730

作者:Jamie Stirling,Noura Al-Moubayed,Chris G. Willcocks,Hubert P. H. Shum
备注:8 pages + references, 7 figures, accepted to CVPR Workshops 2026 (LoViF). arXiv admin note: substantial text overlap with arXiv:2405.06535
摘要:条件离散生成模型难以忠实地组合多个输入条件。为了解决这个问题,我们推导出一个组合离散概率生成过程的、具有理论依据的公式,掩码生成(吸收扩散)是其特例。我们的公式能够精确指定位于训练数据之外的输入条件的新组合与数量,概念加权则能够强调或否定单个条件。借助VQ-VAE和VQ-GAN富于组合性的学习词表的协同作用,与先前最先进技术相比,我们的方法在3个数据集(位置CLEVR、关系CLEVR和FFHQ)上平均实现了$63.4\%$的错误率相对下降,同时获得$-9.58$的平均绝对FID改善。与此同时,我们的方法相比可比方法提供了$2.3\times$到$12\times$的实时加速,并且可以直接应用于一个开放的预训练离散文本到图像模型,实现细粒度可控的文本到图像生成。
摘要:Conditional discrete generative models struggle to faithfully compose multiple input conditions. To address this, we derive a theoretically-grounded formulation for composing discrete probabilistic generative processes, with masked generation (absorbing diffusion) as a special case. Our formulation enables precise specification of novel combinations and numbers of input conditions that lie outside the training data, with concept weighting enabling emphasis or negation of individual conditions. In synergy with the richly compositional learned vocabulary of VQ-VAE and VQ-GAN, our method attains a $63.4\%$ relative reduction in error rate compared to the previous state-of-the-art, averaged across 3 datasets (positional CLEVR, relational CLEVR and FFHQ), simultaneously obtaining an average absolute FID improvement of $-9.58$. Meanwhile, our method offers a $2.3\times$ to $12\times$ real-time speed-up over comparable methods, and is readily applied to an open pre-trained discrete text-to-image model for fine-grained control of text-to-image generation.
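组合离散生成过程时,一种常见做法是在logit空间围绕无条件基线对各条件做加权组合(类似无分类器引导的风格,并非论文的精确公式):权重为负即否定某一条件,权重较大则强调该条件。示意如下:

```python
import numpy as np

def compose_logits(uncond_logits, cond_logits, weights):
    """logit空间的加权组合(示意):out = uncond + sum_i w_i * (cond_i - uncond)。"""
    uncond = np.asarray(uncond_logits, dtype=float)
    out = uncond.copy()
    for lg, w in zip(cond_logits, weights):
        out += w * (np.asarray(lg, dtype=float) - uncond)  # 负权重即否定该条件
    return out
```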


【3】Optimal-Transport-Guided Functional Flow Matching for Turbulent Field Generation in Hilbert Space
标题:最优输运引导的泛函流匹配用于希尔伯特空间中的湍流场生成
链接:https://arxiv.org/abs/2604.05700

作者:Li Kunpeng,Wan Chenguang,Qu Zhisong,Lim Kyungtak,Virginie Grandgirard,Xavier Garbet,Yu Hua,Ong Yew Soon
备注:41 pages, 5 figures, journal paper
摘要:湍流的高保真建模需要捕捉复杂的时空动力学和多尺度间歇性,这对传统的基于知识的系统构成了根本性的挑战。虽然深度生成模型(如扩散模型和流匹配)已经显示出有前景的性能,但它们从根本上受到其离散的、基于像素的性质的限制。这种限制约束了它们在湍流计算中的适用性,因为湍流数据本质上以函数形式存在。为了弥补这一差距,我们提出了泛函最优输运条件流匹配(FOT-CFM),一种直接在无限维函数空间中定义的生成框架。与定义在固定网格上的传统方法不同,FOT-CFM将物理场视为无限维希尔伯特空间的元素,并直接在概率测度层面学习与分辨率无关的生成动力学。通过整合最优输运(OT)理论,我们在希尔伯特空间中构建了噪声测度与数据测度之间的确定性直线概率路径。这种公式化能够实现无模拟训练,并显著加速采样过程。我们在一系列混沌动力系统上严格评估了所提出的系统,包括Navier-Stokes方程、Kolmogorov流和Hasegawa-Wakatani方程,它们都表现出丰富的多尺度湍流结构。实验结果表明,与最先进的基线相比,FOT-CFM在再现高阶湍流统计量和能谱方面具有更高的保真度。
摘要:High-fidelity modeling of turbulent flows requires capturing complex spatiotemporal dynamics and multi-scale intermittency, posing a fundamental challenge for traditional knowledge-based systems. While deep generative models, such as diffusion models and Flow Matching, have shown promising performance, they are fundamentally constrained by their discrete, pixel-based nature. This limitation restricts their applicability in turbulence computing, where data inherently exists in a functional form. To address this gap, we propose Functional Optimal Transport Conditional Flow Matching (FOT-CFM), a generative framework defined directly in infinite-dimensional function space. Unlike conventional approaches defined on fixed grids, FOT-CFM treats physical fields as elements of an infinite-dimensional Hilbert space, and learns resolution-invariant generative dynamics directly at the level of probability measures. By integrating Optimal Transport (OT) theory, we construct deterministic, straight-line probability paths between noise and data measures in Hilbert space. This formulation enables simulation-free training and significantly accelerates the sampling process. We rigorously evaluate the proposed system on a diverse suite of chaotic dynamical systems, including the Navier-Stokes equations, Kolmogorov Flow, and Hasegawa-Wakatani equations, all of which exhibit rich multi-scale turbulent structures. Experimental results demonstrate that FOT-CFM achieves superior fidelity in reproducing high-order turbulent statistics and energy spectra compared to state-of-the-art baselines.
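摘要中"噪声与数据测度之间的确定性直线概率路径"在有限维情形下可示意如下:路径为 x_t = (1-t)·x0 + t·x1,其条件流匹配回归目标是常值速度 x1 - x0。以下Python草图仅为有限维玩具示意(论文在无限维希尔伯特空间中定义),变量均为虚构数据:

```python
import numpy as np

rng = np.random.default_rng(0)

def straight_line_path(x0, x1, t):
    """OT-style deterministic path x_t = (1 - t) x0 + t x1."""
    return (1.0 - t) * x0 + t * x1

def cfm_target_velocity(x0, x1):
    """Conditional flow-matching regression target for the straight
    path: d x_t / dt = x1 - x0, constant along the path."""
    return x1 - x0

x0 = rng.standard_normal(16)   # noise sample (stand-in for a function sample)
x1 = rng.standard_normal(16)   # data sample
t = 0.3
xt = straight_line_path(x0, x1, t)
v = cfm_target_velocity(x0, x1)
```

由于速度场沿路径恒定,从任意中间点沿该速度积分剩余时间即可精确到达数据端点,这也是直线路径能加速采样的原因。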


【4】CUE-R: Beyond the Final Answer in Retrieval-Augmented Generation
标题:CUE-R:超越检索增强生成中的最终答案
链接:https://arxiv.org/abs/2604.05467

作者:Siddharth Jain,Venkat Narayan Vedam
备注:6 figures, 14 tables; appendix includes bootstrap CIs, metric definitions, duplicate position sensitivity, prompt template, and reproducibility details
摘要:随着语言模型从单次答案生成转向在推理中途检索并使用证据的多步推理,评估单个检索项的作用变得更加重要。现有的RAG评估通常针对最终答案质量、引用忠实度或答案级归因,但这些都没有直接针对我们在此研究的基于干预的逐证据项效用视角。我们介绍了CUE-R,一个轻量级的基于干预的框架,利用浅层可观察的检索使用痕迹来测量单次RAG中每个证据项的操作效用。CUE-R通过REMOVE、REPLACE和DUPLICATE算子扰动单个证据项,然后沿三个效用轴(正确性、基于代理的接地忠实度和置信度误差)以及一个轨迹偏离信号测量变化。我们还概述了一个用于解释干预结果的操作性证据角色分类法。使用Qwen-3 8B和GPT-5.2在HotpotQA和2WikiMultihopQA上进行的实验揭示了一个一致的模式:REMOVE和REPLACE在产生较大轨迹偏移的同时严重损害正确性和接地性,而DUPLICATE通常对答案冗余,但并非完全行为中性。零检索对照证实,这些影响源于对有意义检索的退化。双支持消融进一步表明,多跳证据项可以非相加地相互作用:同时删除两个支持项对性能的损害远大于任何单一删除。我们的结果表明,仅评估答案的做法会遗漏重要的证据效应,基于干预的效用分析是RAG评估的一个实用补充。
摘要 :As language models shift from single-shot answer generation toward multi-step reasoning that retrieves and consumes evidence mid-inference, evaluating the role of individual retrieved items becomes more important. Existing RAG evaluation typically targets final-answer quality, citation faithfulness, or answer-level attribution, but none of these directly targets the intervention-based, per-evidence-item utility view we study here. We introduce CUE-R, a lightweight intervention-based framework for measuring per-evidence-item operational utility in single-shot RAG using shallow observable retrieval-use traces. CUE-R perturbs individual evidence items via REMOVE, REPLACE, and DUPLICATE operators, then measures changes along three utility axes (correctness, proxy-based grounding faithfulness, and confidence error) plus a trace-divergence signal. We also outline an operational evidence-role taxonomy for interpreting intervention outcomes. Experiments on HotpotQA and 2WikiMultihopQA with Qwen-3 8B and GPT-5.2 reveal a consistent pattern: REMOVE and REPLACE substantially harm correctness and grounding while producing large trace shifts, whereas DUPLICATE is often answer-redundant yet not fully behaviorally neutral. A zero-retrieval control confirms that these effects arise from degradation of meaningful retrieval. A two-support ablation further shows that multi-hop evidence items can interact non-additively: removing both supports harms performance far more than either single removal. Our results suggest that answer-only evaluation misses important evidence effects and that intervention-based utility analysis is a practical complement for RAG evaluation.
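CUE-R的三个干预算子作用于检索到的证据列表,其语义可用如下示意性Python草图表达(仅为编者对算子的直观演示,效用测量部分从略,数据为虚构):

```python
def remove_item(evidence, idx):
    """REMOVE: drop one retrieved evidence item."""
    return evidence[:idx] + evidence[idx + 1:]

def replace_item(evidence, idx, distractor):
    """REPLACE: swap one item for an off-topic distractor."""
    return evidence[:idx] + [distractor] + evidence[idx + 1:]

def duplicate_item(evidence, idx):
    """DUPLICATE: repeat one item, probing behavioral neutrality."""
    return evidence[:idx + 1] + [evidence[idx]] + evidence[idx + 1:]

docs = ["passage A", "passage B", "passage C"]
```

实际测量时,会在每次扰动前后各运行一次RAG流水线,并比较正确性、接地忠实度、置信度误差与轨迹偏离。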


【5】LatentAudit: Real-Time White-Box Faithfulness Monitoring for Retrieval-Augmented Generation with Verifiable Deployment
标题:LatentAudit:具有可验证部署的检索增强生成的实时白盒忠实度监控
链接:https://arxiv.org/abs/2604.05358

作者:Zhe Yu,Wenpeng Xing,Meng Han
摘要:检索增强生成(RAG)减轻了幻觉,但并没有消除它:部署的系统仍然必须在推理时决定其答案是否真正得到检索证据的支持。我们介绍了LatentAudit,一个白盒审计器,它汇集开放权重生成器中中后期的残差流激活,并测量它们到证据表示的马氏距离。由此产生的二次规则不需要辅助评判模型,在生成时运行,并且简单到可以在一个小的保留集上校准。我们表明,残差流几何携带可用的忠实度信号,该信号在架构变化和现实检索失败下仍然存在,并且同一规则仍适合公开验证。在使用Llama-3-8B的PubMedQA上,LatentAudit达到0.942 AUROC,开销为0.77 ms。在三个QA基准和五个模型家族(Llama-2/3、Qwen-2.5/3、Mistral)中,监控器保持稳定;在包含矛盾、检索缺失和部分支持噪声的四路压力测试下,它在PubMedQA上达到0.9566-0.9815 AUROC,在HotpotQA上达到0.9142-0.9315。在16位定点精度下,审计规则保留了99.8%的FP16 AUROC,从而实现基于Groth16的公开验证,而不泄露模型权重或激活。总之,这些结果将残差流几何定位为实时RAG忠实度监控和可选可验证部署的实用基础。
摘要:Retrieval-augmented generation (RAG) mitigates hallucination but does not eliminate it: a deployed system must still decide, at inference time, whether its answer is actually supported by the retrieved evidence. We introduce LatentAudit, a white-box auditor that pools mid-to-late residual-stream activations from an open-weight generator and measures their Mahalanobis distance to the evidence representation. The resulting quadratic rule requires no auxiliary judge model, runs at generation time, and is simple enough to calibrate on a small held-out set. We show that residual-stream geometry carries a usable faithfulness signal, that this signal survives architecture changes and realistic retrieval failures, and that the same rule remains amenable to public verification. On PubMedQA with Llama-3-8B, LatentAudit reaches 0.942 AUROC with 0.77 ms overhead. Across three QA benchmarks and five model families (Llama-2/3, Qwen-2.5/3, Mistral), the monitor remains stable; under a four-way stress test with contradictions, retrieval misses, and partial-support noise, it reaches 0.9566--0.9815 AUROC on PubMedQA and 0.9142--0.9315 on HotpotQA. At 16-bit fixed-point precision, the audit rule preserves 99.8% of the FP16 AUROC, enabling Groth16-based public verification without revealing model weights or activations. Together, these results position residual-stream geometry as a practical basis for real-time RAG faithfulness monitoring and optional verifiable deployment.
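LatentAudit的核心"二次规则"即对汇集的激活计算其到证据表示的马氏距离。以下是该规则的示意性Python草图(非论文实现;校准数据、维度与正则化均为编者虚构):

```python
import numpy as np

def fit_gaussian(acts):
    """Fit mean and (regularized) inverse covariance of pooled
    residual-stream activations from a calibration set."""
    mu = acts.mean(axis=0)
    cov = np.cov(acts, rowvar=False) + 1e-6 * np.eye(acts.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis_score(x, mu, cov_inv):
    """Quadratic audit rule: distance of an activation vector to the
    evidence representation; larger means less supported."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

rng = np.random.default_rng(1)
calib = rng.standard_normal((500, 8))   # toy stand-in for pooled activations
mu, cov_inv = fit_gaussian(calib)
```

部署时只需在校准集上为该距离选择一个阈值;规则是纯二次型,因此也容易在定点精度下表达并做零知识验证。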


【6】Extending Tabular Denoising Diffusion Probabilistic Models for Time-Series Data Generation
标题:扩展表格去噪扩散概率模型用于时间序列数据生成
链接:https://arxiv.org/abs/2604.05257

作者:Umang Dobhal,Christina Garcia,Sozo Inoue
备注:16 pages, 10 figures, 2 tables
摘要:扩散模型越来越多地被用来创建合成表格和时间序列数据,以保护隐私。表格去噪扩散概率模型(TabDDPM)从异构的表格数据集生成高质量的合成数据,但假设样本之间的独立性,限制了它们对时间依赖性至关重要的时间序列域的适用性。为了解决这个问题,我们提出了一个时间的TabDDPM扩展,通过使用轻量级的时间适配器和上下文感知嵌入模块引入序列意识。通过将传感器数据重新公式化为加窗序列,并通过时间步长嵌入、条件活动标签和观测/缺失掩码来明确建模时间上下文,我们的方法能够生成时间相干的合成序列。与基线和插值技术相比,使用二元组转换矩阵和自相关分析的验证显示出增强的时间现实主义,多样性和一致性。在WISDM加速度计数据集上,建议的系统产生与真实世界传感器模式非常相似的合成时间序列,并实现了相当的分类性能(宏F1-得分0.64,准确度0.71)。这对于少数类表示和保持与真实分布的统计对齐特别有利。这些发展表明,扩散为基础的模型提供了有效的和适应性强的解决方案,顺序数据合成时,他们配备了时间推理。未来的工作将探索扩展到更长的序列和集成更强大的时间架构。
摘要:Diffusion models are increasingly being utilised to create synthetic tabular and time series data for privacy-preserving augmentation. Tabular Denoising Diffusion Probabilistic Models (TabDDPM) generate high-quality synthetic data from heterogeneous tabular datasets but assume independence between samples, limiting their applicability to time-series domains where temporal dependencies are critical. To address this, we propose a temporal extension of TabDDPM, introducing sequence awareness through the use of lightweight temporal adapters and context-aware embedding modules. By reformulating sensor data into windowed sequences and explicitly modeling temporal context via timestep embeddings, conditional activity labels, and observed/missing masks, our approach enables the generation of temporally coherent synthetic sequences. Compared to baseline and interpolation techniques, validation using bigram transition matrices and autocorrelation analysis shows enhanced temporal realism, diversity, and coherence. On the WISDM accelerometer dataset, the suggested system produces synthetic time-series that closely resemble real world sensor patterns and achieves comparable classification performance (macro F1-score 0.64, accuracy 0.71). This is especially advantageous for minority class representation and preserving statistical alignment with real distributions. These developments demonstrate that diffusion based models provide effective and adaptable solutions for sequential data synthesis when they are equipped for temporal reasoning. Future work will explore scaling to longer sequences and integrating stronger temporal architectures.
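将传感器流重新表述为带时间步索引与标签的加窗序列,是该时间扩展的第一步,可示意如下(窗口长度、步长等均为假设值,非论文设定):

```python
import numpy as np

def make_windows(series, labels, window, stride):
    """Reformulate a sensor stream into fixed-length windowed sequences
    with per-window activity labels and timestep indices (inputs for a
    timestep-embedding / conditional-label diffusion model)."""
    xs, ys, ts = [], [], []
    for start in range(0, len(series) - window + 1, stride):
        xs.append(series[start:start + window])
        ys.append(labels[start + window - 1])        # label at window end
        ts.append(np.arange(start, start + window))  # timestep-embedding input
    return np.stack(xs), np.array(ys), np.stack(ts)

acc = np.random.default_rng(2).standard_normal((100, 3))  # toy x/y/z accel
lab = np.zeros(100, dtype=int)
X, y, t_idx = make_windows(acc, lab, window=20, stride=10)
```

之后每个窗口作为一个"样本"送入扩散模型,时间上下文则通过时间步嵌入与条件标签显式注入。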


【7】An Imbalanced Dataset with Multiple Feature Representations for Studying Quality Control of Next-Generation Sequencing
标题:一个具有多种特征表示的不平衡数据集,用于研究下一代测序的质量控制
链接:https://arxiv.org/abs/2604.04981

作者:Philipp Röchner,Clarissa Krämer,Johannes U Mayer,Franz Rothlauf,Steffen Albrecht,Maximilian Sprang
摘要:下一代测序(NGS)是研究生物体DNA和RNA的关键技术。然而,在不同的实验环境中识别NGS数据中的质量问题仍然具有挑战性。为了开发自动化质量控制工具,研究人员需要具有能刻画质量问题特征的数据集。然而,现有的NGS存储库仅提供数量有限的质量相关特征。为了解决这一差距,我们提出了一个来自37,491个NGS样本的数据集,包含两类质量相关的特征表示。第一类由来自质量控制工具的34个特征(QC-34特征)组成。第二类特征数量可变,范围从8到1,183。这些特征来源于由ENCODE屏蔽列表(blocklist)鉴定的有问题基因组区域中的读段计数(BL特征)。所有特征描述了来自五种基因组测定的相同人类和小鼠样本,允许直接比较特征表示。提出的数据集包括一个二元质量标签,来自自动化质量控制和领域专家。在所有样本中,3.2%的样本质量较低。有监督机器学习算法能从特征中准确预测质量标签,证实了所提供特征表示的相关性。提出的特征表示使研究人员能够研究不同的特征类型(QC-34与BL特征)和粒度(不同数量的BL特征)如何影响质量问题的检测。
摘要:Next-generation sequencing (NGS) is a key technique for studying the DNA and RNA of organisms. However, identifying quality problems in NGS data across different experimental settings remains challenging. To develop automated quality-control tools, researchers require datasets with features that capture the characteristics of quality problems. Existing NGS repositories, however, offer only a limited number of quality-related features. To address this gap, we propose a dataset derived from 37,491 NGS samples with two types of quality-related feature representations. The first type consists of 34 features derived from quality control tools (QC-34 features). The second type has a variable number of features ranging from eight to 1,183. These features were derived from read counts in problematic genomic regions identified by the ENCODE blocklist (BL features). All features describe the same human and mouse samples from five genomic assays, allowing direct comparison of feature representations. The proposed dataset includes a binary quality label, derived from automated quality control and domain experts. Among all samples, $3.2\%$ are of low quality. Supervised machine learning algorithms accurately predicted quality labels from the features, confirming the relevance of the provided feature representations. The proposed feature representations enable researchers to study how different feature types (QC-34 vs. BL features) and granularities (varying number of BL features) affect the detection of quality problems.


半/弱/无/有监督|不确定性|主动学习(4篇)

【1】Topological Characterization of Churn Flow and Unsupervised Correction to the Wu Flow-Regime Map in Small-Diameter Vertical Pipes
标题:小直径垂直管道中搅动流的拓扑表征与Wu流态图的无监督修正
链接:https://arxiv.org/abs/2604.06167

作者:Brady Koenig,Sushovan Majhi,Atish Mitra,Abigail Stein,Burt Todd
摘要:搅动流——垂直两相流中混沌、振荡的流态——40多年来一直缺乏定量的数学定义。我们引入了首个基于拓扑的表征,使用欧拉示性数曲面(ECS)。我们将无监督流态发现公式化为多核学习(MKL),将两个互补的ECS衍生核——时间对齐($χ(s,t)$曲面上的$L^1$距离)和幅度统计(逐尺度均值、标准差、最大值、最小值)——与气体速度相融合。应用于Montana Tech的$37$次未标记空气-水试验,该自校准框架学习到权重$β_{ECS}=0.14$、$β_{amp}=0.50$、$β_{ugs}=0.36$,将总权重的$64\%$放在拓扑衍生特征($β_{ECS}+β_{amp}$)上。ECS推断的段塞流/搅动流转换比Wu等人(2017)在$2$英寸油管中的预测高$+3.81$米/秒,定量印证了现有模型在界面张力和壁面相互作用主导流动的小直径管道中低估段塞持久性的报告。对德克萨斯A&M大学$947$张图像的跨设施验证证实,搅动流的拓扑复杂性比段塞流高$1.9\times$($p < 10^{-5}$)。应用于$45$次TAMU伪试验,同一无监督框架在没有任何标记训练数据的情况下实现了$95.6\%$的$4$类准确率和$100\%$的搅动流召回率——达到或超过需要数千个标注样本的监督基线。这项工作提供了搅动流的首个数学定义,并证明无监督拓扑描述符可以挑战并修正被广泛采用的机理模型。
摘要:Churn flow-the chaotic, oscillatory regime in vertical two-phase flow-has lacked a quantitative mathematical definition for over $40$ years. We introduce the first topology-based characterization using Euler Characteristic Surfaces (ECS). We formulate unsupervised regime discovery as Multiple Kernel Learning (MKL), blending two complementary ECS-derived kernels-temporal alignment ($L^1$ distance on the $χ(s,t)$ surface) and amplitude statistics (scale-wise mean, standard deviation, max, min)-with gas velocity. Applied to $37$ unlabeled air-water trials from Montana Tech, the self-calibrating framework learns weights $β_{ECS}=0.14$, $β_{amp}=0.50$, $β_{ugs}=0.36$, placing $64\%$ of total weight on topology-derived features ($β_{ECS} + β_{amp}$). The ECS-inferred slug/churn transition lies $+3.81$ m/s above Wu et al.'s (2017) prediction in $2$-in. tubing, quantifying reports that existing models under-predict slug persistence in small-diameter pipes where interfacial tension and wall-to-wall interactions dominate flow. Cross-facility validation on $947$ Texas A&M University images confirms $1.9\times$ higher topological complexity in churn vs. slug ($p < 10^{-5}$). Applied to $45$ TAMU pseudo-trials, the same unsupervised framework achieves $95.6\%$ $4$-class accuracy and $100\%$ churn recall-without any labeled training data-matching or exceeding supervised baselines that require thousands of annotated examples. This work provides the first mathematical definition of churn flow and demonstrates that unsupervised topological descriptors can challenge and correct widely adopted mechanistic models.
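摘要中的MKL组合即按学习到的权重对多个核矩阵做线性加权。以下Python草图用玩具Gram矩阵演示这一组合,权重取论文报告的数值(核的具体构造为编者虚构):

```python
import numpy as np

rng = np.random.default_rng(3)

def blend_kernels(kernels, betas):
    """Multiple Kernel Learning combination: K = sum_i beta_i K_i,
    with weights summing to one."""
    betas = np.asarray(betas, dtype=float)
    assert abs(betas.sum() - 1.0) < 1e-9
    return sum(b * K for b, K in zip(betas, kernels))

def toy_kernel(n):
    """Symmetric PSD Gram matrix standing in for an ECS-derived kernel."""
    A = rng.standard_normal((n, 5))
    return A @ A.T

K_ecs, K_amp, K_ugs = toy_kernel(6), toy_kernel(6), toy_kernel(6)
K = blend_kernels([K_ecs, K_amp, K_ugs], betas=[0.14, 0.50, 0.36])
```

由于各分量核均为半正定且权重为正,组合核仍是合法的核矩阵,可直接用于谱聚类等无监督流态发现。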


【2】Learning Stable Predictors from Weak Supervision under Distribution Shift
标题:分布偏移下从弱监督学习稳定预测器
链接:https://arxiv.org/abs/2604.05002

作者:Mehrdad Shoeibi,Elias Hossain,Ivan Garibay,Niloofar Yousefi
摘要:当真实标签不可得时,从弱监督或代理监督中学习很常见,但其在分布偏移下的鲁棒性仍知之甚少,特别是当监督机制本身发生变化时。我们将其形式化为监督漂移,定义为P(y | x, c)在不同上下文间的变化,并在CRISPR-Cas13d实验中研究它,其中引导RNA的功效从RNA-seq响应中间接推断。使用来自两个人类细胞系和多个时间点的数据,我们在保持弱标签构造固定的同时,构建了一个具有明确领域和时间偏移的受控非IID基准。模型实现了较强的域内性能(岭回归R^2 = 0.356,Spearman rho = 0.442)和部分跨细胞系迁移(rho ~ 0.40)。然而,时间迁移在所有模型中都失败了,表现为负的R^2和接近零的相关性(例如,XGBoost R^2 = -0.155,rho = 0.056)。进一步的分析证实了这一模式。特征-标签关系在细胞系之间保持稳定,但随时间急剧变化,这表明失败源于监督漂移而非模型限制。这些发现突出了特征稳定性可作为在部署前检测不可迁移性的简单诊断手段。
摘要:Learning from weak or proxy supervision is common when ground-truth labels are unavailable, yet robustness under distribution shift remains poorly understood, especially when the supervision mechanism itself changes. We formalize this as supervision drift, defined as changes in P(y | x, c) across contexts, and study it in CRISPR-Cas13d experiments where guide efficacy is inferred indirectly from RNA-seq responses. Using data from two human cell lines and multiple time points, we build a controlled non-IID benchmark with explicit domain and temporal shifts while keeping the weak-label construction fixed. Models achieve strong in-domain performance (ridge R^2 = 0.356, Spearman rho = 0.442) and partial cross-cell-line transfer (rho ~ 0.40). However, temporal transfer fails across all models, with negative R^2 and near-zero correlation (e.g., XGBoost R^2 = -0.155, rho = 0.056). Additional analyses confirm this pattern. Feature-label relationships remain stable across cell lines but change sharply over time, indicating that failures arise from supervision drift rather than model limitations. These findings highlight feature stability as a simple diagnostic for detecting non-transferability before deployment.


【3】Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification
标题:基于集成的Dirichlet建模用于预测不确定性和选择性分类
链接:https://arxiv.org/abs/2604.06032

作者:Courtney Franzen,Farhad Pourkamali-Anaraki
备注:48 pages
摘要:用交叉熵损失训练的神经网络分类器实现了很强的预测准确性,但缺乏提供固有预测不确定性估计的能力,因此需要外部技术来获得这些估计。此外,真实类别的softmax得分在独立训练运行之间可能差异很大,这限制了下游任务中基于不确定性的决策的可靠性。证据深度学习旨在通过单次前向传播产生不确定性估计来解决这些限制,但证据训练对设计选择高度敏感,包括损失公式化、先验正则化和激活函数。因此,这项工作引入了一种替代的Dirichlet参数估计策略:对softmax输出的集成应用矩估计方法,并可选地进行最大似然细化。这种基于集成的构造将不确定性估计与脆弱的证据损失设计解耦,同时也缓解了单次运行交叉熵训练的可变性,产生显式的Dirichlet预测分布。在多个数据集上,我们表明,这些集成导出的Dirichlet估计在稳定性和预测不确定性行为上的改进,转化为在下游不确定性引导应用(如预测置信度评分和选择性分类)中更强的性能。
摘要:Neural network classifiers trained with cross-entropy loss achieve strong predictive accuracy but lack the capability to provide inherent predictive uncertainty estimates, thus requiring external techniques to obtain these estimates. In addition, softmax scores for the true class can vary substantially across independent training runs, which limits the reliability of uncertainty-based decisions in downstream tasks. Evidential Deep Learning aims to address these limitations by producing uncertainty estimates in a single pass, but evidential training is highly sensitive to design choices including loss formulation, prior regularization, and activation functions. Therefore, this work introduces an alternative Dirichlet parameter estimation strategy by applying a method of moments estimator to ensembles of softmax outputs, with an optional maximum-likelihood refinement step. This ensemble-based construction decouples uncertainty estimation from the fragile evidential loss design while also mitigating the variability of single-run cross-entropy training, producing explicit Dirichlet predictive distributions. Across multiple datasets, we show that the improved stability and predictive uncertainty behavior of these ensemble-derived Dirichlet estimates translate into stronger performance in downstream uncertainty-guided applications such as prediction confidence scoring and selective classification.
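对softmax集成输出做Dirichlet矩估计的标准做法是:样本均值确定α的方向,某一坐标的二阶矩确定精度(α之和)。以下Python草图演示这一矩估计(用真Dirichlet样本代替模型集成的softmax输出,仅为示意,不代表论文的完整流程):

```python
import numpy as np

def dirichlet_mom(P):
    """Method-of-moments Dirichlet fit to an ensemble of probability
    vectors P (n_members x n_classes). The mean fixes alpha's direction;
    the second moment of the first coordinate fixes the precision s:
    s = (m1 - E[p1^2]) / (E[p1^2] - m1^2), alpha = s * mean."""
    m = P.mean(axis=0)
    m2 = (P[:, 0] ** 2).mean()
    s = (m[0] - m2) / (m2 - m[0] ** 2)   # precision = sum of alphas
    return s * m

rng = np.random.default_rng(4)
# Toy "ensemble": samples from a known Dirichlet(4, 2, 1).
P = rng.dirichlet([4.0, 2.0, 1.0], size=2000)
alpha = dirichlet_mom(P)
```

得到的α直接给出一个显式的Dirichlet预测分布;其精度(α之和)越大,表示集成成员间分歧越小、预测不确定性越低。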


【4】MEC: Machine-Learning-Assisted Generalized Entropy Calibration for Semi-Supervised Mean Estimation
标题:MEC:用于半监督均值估计的机器学习辅助广义熵校准
链接:https://arxiv.org/abs/2604.05446

作者:Se Yoon Lee,Jae Kwang Kim
摘要:获得高质量的标签是昂贵的,而未标记的协变量往往是丰富的,这激发了具有可靠不确定性量化的半监督推断方法。预测驱动推断(PPI)利用在小的标记样本上训练的机器学习预测器来提高效率,但它可能在模型错误指定时损失效率,并因标签重用而遭受覆盖率失真。我们介绍了机器学习辅助广义熵校准(MEC),一种交叉拟合、校准加权的PPI变体。MEC使用基于Bregman投影的有原则的校准框架,通过对标记样本重新加权以更好地与目标总体对齐来提高效率。这带来了对预测器仿射变换的鲁棒性,并通过用较弱的投影误差条件替换对原始预测误差的条件来放松有效性要求。其结果是,MEC在比现有PPI变体更弱的假设下达到了半参数效率界。在模拟和实际数据应用中,MEC实现了接近名义水平的覆盖率以及比CF-PPI和原始PPI更窄的置信区间。
摘要:Obtaining high-quality labels is costly, whereas unlabeled covariates are often abundant, motivating semi-supervised inference methods with reliable uncertainty quantification. Prediction-powered inference (PPI) leverages a machine-learning predictor trained on a small labeled sample to improve efficiency, but it can lose efficiency under model misspecification and suffer from coverage distortions due to label reuse. We introduce Machine-Learning-Assisted Generalized Entropy Calibration (MEC), a cross-fitted, calibration-weighted variant of PPI. MEC improves efficiency by reweighting labeled samples to better align with the target population, using a principled calibration framework based on Bregman projections. This yields robustness to affine transformations of the predictor and relaxes requirements for validity by replacing conditions on raw prediction error with weaker projection-error conditions. As a result, MEC attains the semiparametric efficiency bound under weaker assumptions than existing PPI variants. Across simulations and a real-data application, MEC achieves near-nominal coverage and tighter confidence intervals than CF-PPI and vanilla PPI.
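MEC建立在预测驱动推断(PPI)之上。作为背景,下面给出普通PPI均值点估计的示意性Python草图:未标记预测的均值加上标记样本上修正项(y - f(x))的加权平均;均匀权重对应原始PPI,MEC式校准则会用使标记样本与目标总体对齐的权重替换这组均匀权重(此处数据与权重均为编者虚构):

```python
import numpy as np

def ppi_mean(f_unlabeled, f_labeled, y_labeled, weights=None):
    """Prediction-powered mean estimate:
    theta = mean(f(X_u)) + weighted mean of the rectifier (y - f(x))
    on the labeled sample. Uniform weights give vanilla PPI; an MEC-style
    calibration would replace them with reweighted labeled samples."""
    if weights is None:
        weights = np.full(len(y_labeled), 1.0 / len(y_labeled))
    rectifier = np.sum(weights * (y_labeled - f_labeled))
    return f_unlabeled.mean() + rectifier

rng = np.random.default_rng(5)
y_l = rng.normal(1.0, 1.0, 200)              # small labeled sample
f_l = y_l + rng.normal(0.0, 0.1, 200)        # noisy predictor on labeled data
f_u = rng.normal(1.0, 1.0, 5000)             # predictor on abundant unlabeled data
theta = ppi_mean(f_u, f_l, y_l)
```

修正项使估计对预测器偏差保持无偏;预测器越准,修正项方差越小,效率增益越大。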


迁移|Zero/Few/One-Shot|自适应(8篇)

【1】Transfer Learning for Neural Parameter Estimation applied to Building RC Models
标题:迁移学习用于神经参数估计:应用于建筑RC模型
链接:https://arxiv.org/abs/2604.05904

作者:Fabian Raisch,Timo Germann,J. Nathan Kutz,Christoph Goebel,Benjamin Tischler
备注:This work has been submitted to the IEEE for possible publication
摘要:由于非凸性和对初始参数猜测的敏感性,动力系统的参数估计仍然具有挑战性。最近的深度学习方法能够实现准确和快速的参数估计,但没有利用跨系统的可迁移知识。为了解决这个问题,我们引入了一个基于预训练-微调范式的迁移学习神经参数估计框架。这种方法提高了精度,并消除了对初始参数猜测的需要。我们将此框架应用于建筑RC热模型,在八个模拟建筑、一个真实建筑、两种RC模型配置和四种训练数据长度上,与遗传算法和从头训练的神经基线进行对比评估。结果表明,仅使用12天的训练数据即可获得18.6-24.0%的性能提升,使用72天的训练数据提升可达49.4%。在建筑之外,所提出的方法代表了动力系统参数估计的一种新范式。
摘要:Parameter estimation for dynamical systems remains challenging due to non-convexity and sensitivity to initial parameter guesses. Recent deep learning approaches enable accurate and fast parameter estimation but do not exploit transferable knowledge across systems. To address this, we introduce a transfer-learning-based neural parameter estimation framework based on a pretraining-fine-tuning paradigm. This approach improves accuracy and eliminates the need for an initial parameter guess. We apply this framework to building RC thermal models, evaluating it against a Genetic Algorithm and a from-scratch neural baseline across eight simulated buildings, one real-world building, two RC model configurations, and four training data lengths. Results demonstrate an 18.6-24.0% performance improvement with only 12 days of training data and up to 49.4% with 72 days. Beyond buildings, the proposed method represents a new paradigm for parameter estimation in dynamical systems.


【2】ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads
标题:ALTO:针对异构LoRA训练工作负载的自适应LoRA调优与编排
链接:https://arxiv.org/abs/2604.05426

作者:Jingwei Zuo,Xinze Feng,Zien Liu,Kaijian Wang,Fanjiang Ye,Ye Cao,Zhuang Wang,Yuke Wang
摘要:低秩自适应(LoRA)现在是大型语言模型参数高效微调的主要方法,但由于LoRA性能对配置选择高度敏感,获得高质量的适配器通常需要系统的超参数调优。在实践中,这会导致许多并发的LoRA作业,通常跨越多租户环境中的异构任务。现有系统在很大程度上独立处理这些作业,既把计算浪费在弱候选配置上,又使GPU未得到充分利用。我们提出了ALTO(自适应LoRA调优与编排),一个协同设计的训练系统,可以加速LoRA超参数调优,同时实现跨异构任务的高效集群共享。ALTO背后的核心洞见是,当多个调优作业在共享的冻结主干上并发运行时,它们会暴露出单作业设计无法利用的优化机会。在此基础上,ALTO监控损失轨迹以提前终止没有希望的配置,使用融合的分组GEMM以及新的秩本地适配器并行性来共置存活的适配器并回收释放的GPU容量,并结合任务内和任务间调度,通过利用LoRA作业可预测的持续时间来改善多任务放置。广泛的评估表明,ALTO在不牺牲适配器质量的情况下,相对最先进方法实现了最高$13.8\times$的加速。
摘要:Low-Rank Adaptation (LoRA) is now the dominant method for parameter-efficient fine-tuning of large language models, but achieving a high-quality adapter often requires systematic hyperparameter tuning because LoRA performance is highly sensitive to configuration choices. In practice, this leads to many concurrent LoRA jobs, often spanning heterogeneous tasks in multi-tenant environments. Existing systems largely handle these jobs independently, which both wastes computation on weak candidates and leaves GPUs underutilized. We present ALTO (Adaptive LoRA Tuning and Orchestration), a co-designed training system that accelerates LoRA hyperparameter tuning while enabling efficient cluster sharing across heterogeneous tasks. The central insight behind ALTO is that when multiple tuning jobs run concurrently over a shared frozen backbone, they expose optimization opportunities that single-job designs cannot exploit. Building on this, ALTO monitors loss trajectories to terminate unpromising configurations early, uses fused grouped GEMM together with a new rank-local adapter parallelism to co-locate surviving adapters and reclaim freed GPU capacity, and combines intra-task and inter-task scheduling to improve multi-task placement by leveraging the predictable duration of LoRA jobs. Extensive evaluation shows that ALTO achieves up to $13.8\times$ speedup over state-of-the-art without sacrificing adapter quality.


【3】Retrieve-then-Adapt: Retrieval-Augmented Test-Time Adaptation for Sequential Recommendation
标题:检索然后调整:检索增强测试时调整顺序推荐
链接:https://arxiv.org/abs/2604.05379

作者:Xing Tang,Jingyang Bin,Ziqiang Cui,Xiaokun Zhang,Fuyuan Lyu,Jingyan Jiang,Dugang Liu,Chen Ma,Xiuqiang He
摘要:顺序推荐(SR)任务的目的是预测下一个项目的基础上,用户的历史交互序列。SR模型通常在历史数据上进行训练,由于分布发散和参数化约束带来的挑战,SR模型在推理过程中往往难以适应实时偏好变化。解决这个问题的现有方法包括测试时训练,测试时增强和检索增强微调。然而,这些方法要么引入显著的计算开销,依赖于随机增强策略,要么需要精心设计的两阶段训练范式。在本文中,我们认为,有效的测试时间适应的关键在于实现有效的增强和有效的适应。为此,我们提出了检索然后适应(ReAd),一种新的框架,通过检索用户偏好信号,动态适应部署的SR模型的测试分布。具体来说,给定一个训练好的SR模型,ReAd首先从一个构建好的协作记忆数据库中检索测试用户的协作相似项目。然后,一个轻量级的检索学习模块将这些项目集成到一个信息增强嵌入,捕获协作信号和预测细化线索。最后,通过融合机制,结合这种嵌入的初始SR预测进行了改进。在五个基准数据集上的广泛实验表明,ReAd始终优于现有的SR方法。
摘要 :The sequential recommendation (SR) task aims to predict the next item based on users' historical interaction sequences. Typically trained on historical data, SR models often struggle to adapt to real-time preference shifts during inference due to challenges posed by distributional divergence and parameterized constraints. Existing approaches to address this issue include test-time training, test-time augmentation, and retrieval-augmented fine-tuning. However, these methods either introduce significant computational overhead, rely on random augmentation strategies, or require a carefully designed two-stage training paradigm. In this paper, we argue that the key to effective test-time adaptation lies in achieving both effective augmentation and efficient adaptation. To this end, we propose Retrieve-then-Adapt (ReAd), a novel framework that dynamically adapts a deployed SR model to the test distribution through retrieved user preference signals. Specifically, given a trained SR model, ReAd first retrieves collaboratively similar items for a test user from a constructed collaborative memory database. A lightweight retrieval learning module then integrates these items into an informative augmentation embedding that captures both collaborative signals and prediction-refinement cues. Finally, the initial SR prediction is refined via a fusion mechanism that incorporates this embedding. Extensive experiments across five benchmark datasets demonstrate that ReAd consistently outperforms existing SR methods.


【4】Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning
标题:并非所有回合都同样困难:高效多回合推理的自适应思维预算
链接:https://arxiv.org/abs/2604.05164

作者:Neharika Jali,Anupam Nayak,Gauri Joshi
摘要:随着LLM推理性能趋于平稳,提高推理时的计算效率对于减轻过度思考以及即使对简单查询也产生冗长思考痕迹的问题至关重要。现有的方法包括长度正则化、自适应路由和基于难度的预算分配,它们主要关注单回合设置,未能解决多回合推理固有的顺序依赖性。在这项工作中,我们将多回合推理表述为一个顺序计算分配问题,并将其建模为一个多目标马尔可夫决策过程。我们提出TAB(Turn-Adaptive Budgets,回合自适应预算),一种通过组相对策略优化(GRPO)训练的预算分配策略,它学习在遵守全局每问题令牌约束的同时最大化任务准确性。因此,TAB将对话历史作为输入,并学习自适应地将较小的预算分配给更容易的回合,为关键的较难推理步骤保留适当数量的令牌。我们在数学推理基准上的实验表明,TAB实现了卓越的准确性-令牌权衡,在保持准确性的同时比静态和现成的LLM预算基线节省高达35%的令牌。此外,对于所有子问题的计划先验可用的系统,我们提出了TAB All-SubQ,一种基于对话历史以及所有过去和未来子问题来分配令牌的预算分配策略,比基线节省高达40%的令牌。
摘要:As LLM reasoning performance plateaus, improving inference-time compute efficiency is crucial to mitigate overthinking and long thinking traces even for simple queries. Prior approaches including length regularization, adaptive routing, and difficulty-based budget allocation primarily focus on single-turn settings and fail to address the sequential dependencies inherent in multi-turn reasoning. In this work, we formulate multi-turn reasoning as a sequential compute allocation problem and model it as a multi-objective Markov Decision Process. We propose TAB: Turn-Adaptive Budgets, a budget allocation policy trained via Group Relative Policy Optimization (GRPO) that learns to maximize task accuracy while respecting global per-problem token constraints. Consequently, TAB takes as input the conversation history and learns to adaptively allocate smaller budgets to easier turns and save appropriate number of tokens for the crucial harder reasoning steps. Our experiments on mathematical reasoning benchmarks demonstrate that TAB achieves a superior accuracy-tokens tradeoff saving up to 35% tokens while maintaining accuracy over static and off-the-shelf LLM budget baselines. Further, for systems where a plan of all sub-questions is available apriori, we propose TAB All-SubQ, a budget allocation policy that budgets tokens based on the conversation history and all past and future sub-questions saving up to 40% tokens over baselines.


【5】Offline RL for Adaptive Policy Retrieval in Prior Authorization
标题:用于预先授权中的自适应策略检索的离线RL
链接:https://arxiv.org/abs/2604.05125

作者:Ruslan Sharifullin,Maxim Gorshkov,Hannah Clay
备注:9 pages, 7 figures, 6 tables
摘要:事先授权(PA)需要解释复杂而分散的覆盖政策,但现有的检索增强系统依赖于静态的top-$K$策略,检索固定数量的章节。这种固定检索可能效率低下,收集不相关或不充分的信息。我们将PA的政策检索建模为一个顺序决策问题,将自适应检索表述为马尔可夫决策过程(MDP)。在我们的系统中,智能体迭代地从top-$K$候选集合中选择政策块,或者选择停止并发布决策。奖励在决策正确性与检索成本之间取得平衡,捕获准确性和效率之间的权衡。我们在离线RL设置下,使用保守Q学习(CQL)、隐式Q学习(IQL)和直接偏好优化(DPO),在由基线检索策略对源自公开CMS覆盖数据的合成PA请求生成的记录轨迹上训练策略。在涵盖10个CMS程序的186个政策块的语料库上,CQL通过穷举检索实现了92%的决策准确率(比最佳固定$K$基线高出30个百分点),而IQL以少44%的检索步骤匹配最佳基线准确率,并在所有策略中实现了唯一的正情节回报。转移级DPO匹配CQL的92%准确率,同时使用少47%的检索步骤(10.6对20.0),占据帕累托前沿上同时支配CQL和BC的"选择性-准确"区域。行为克隆基线与CQL持平,证实需要优势加权或基于偏好的策略提取来学习选择性检索。对步骤成本$λ\in \{0.05, 0.1, 0.2\}$的λ消融揭示了明确的准确性-效率拐点:只有在$λ=0.2$时,CQL才从穷举检索过渡到选择性检索。
摘要:Prior authorization (PA) requires interpretation of complex and fragmented coverage policies, yet existing retrieval-augmented systems rely on static top-$K$ strategies with fixed numbers of retrieved sections. Such fixed retrieval can be inefficient and gather irrelevant or insufficient information. We model policy retrieval for PA as a sequential decision-making problem, formulating adaptive retrieval as a Markov Decision Process (MDP). In our system, an agent iteratively selects policy chunks from a top-$K$ candidate set or chooses to stop and issue a decision. The reward balances decision correctness against retrieval cost, capturing the trade-off between accuracy and efficiency. We train policies using Conservative Q-Learning (CQL), Implicit Q-Learning (IQL), and Direct Preference Optimization (DPO) in an offline RL setting on logged trajectories generated from baseline retrieval strategies over synthetic PA requests derived from publicly available CMS coverage data. On a corpus of 186 policy chunks spanning 10 CMS procedures, CQL achieves 92% decision accuracy (+30 percentage points over the best fixed-$K$ baseline) via exhaustive retrieval, while IQL matches the best baseline accuracy using 44% fewer retrieval steps and achieves the only positive episodic return among all policies. Transition-level DPO matches CQL's 92% accuracy while using 47% fewer retrieval steps (10.6 vs. 20.0), occupying a "selective-accurate" region on the Pareto frontier that dominates both CQL and BC. A behavioral cloning baseline matches CQL, confirming that advantage-weighted or preference-based policy extraction is needed to learn selective retrieval. Lambda ablation over step costs $λ\in \{0.05, 0.1, 0.2\}$ reveals a clear accuracy-efficiency inflection: only at $λ= 0.2$ does CQL transition from exhaustive to selective retrieval.
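摘要所述"奖励平衡决策正确性与检索成本"的一种可能实现如下(具体奖励形状为编者假设,仅与摘要中的λ消融设置保持一致):

```python
def step_reward(action, is_terminal, decision_correct, lam=0.1):
    """Per-step reward for the adaptive-retrieval MDP: each retrieval
    step pays a cost -lam; stopping yields +1 for a correct decision,
    -1 otherwise. lam trades accuracy off against retrieval cost
    (the paper ablates lam in {0.05, 0.1, 0.2})."""
    if action == "retrieve":
        return -lam
    if is_terminal:
        return 1.0 if decision_correct else -1.0
    return 0.0

def episode_return(n_steps, decision_correct, lam=0.1):
    """Total return of an episode with n_steps retrievals then a stop."""
    return -lam * n_steps + (1.0 if decision_correct else -1.0)
```

在这一形状下,穷举检索20步的正确决策回报为负,这与摘要中"IQL是唯一取得正情节回报的策略"的观察一致。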


【6】PCA-Driven Adaptive Sensor Triage for Edge AI Inference
标题:PCA驱动的自适应传感器分类用于边缘AI推理
链接:https://arxiv.org/abs/2604.05045

作者:Ankit Hemant Lade,Sai Krishna Jasti,Nikhil Sinha,Indar Kumar,Akanksha Tiwari
备注:16 pages, 13 figures, 7 benchmarks
摘要:工业物联网中的多通道传感器网络经常超过可用带宽。我们提出PCA-Triage,一种流式算法,在带宽预算下将增量PCA载荷转换为成比例的每通道采样率。PCA-Triage以O(wdk)时间运行,零可训练参数(每个决策0.67 ms)。 我们在7个基准(8-82通道)上针对9个基线进行评估。在50%带宽下,PCA-Triage在6个数据集中的3个上是最好的无监督方法,并以较大的效应量(r = 0.71-0.91)在6个数据集中的5个上胜过每个基线。在TEP上,它实现了F1 = 0.961 +/- 0.001——在全数据性能的0.1%以内——同时在30%预算下保持F1 > 0.90。目标扩展将F1推至0.970。该算法对数据包丢失和传感器噪声具有鲁棒性(在组合最坏情况下下降3.7-4.8%)。
摘要:Multi-channel sensor networks in industrial IoT often exceed available bandwidth. We propose PCA-Triage, a streaming algorithm that converts incremental PCA loadings into proportional per-channel sampling rates under a bandwidth budget. PCA-Triage runs in O(wdk) time with zero trainable parameters (0.67 ms per decision).   We evaluate on 7 benchmarks (8--82 channels) against 9 baselines. PCA-Triage is the best unsupervised method on 3 of 6 datasets at 50% bandwidth, winning 5 of 6 against every baseline with large effect sizes (r = 0.71--0.91). On TEP, it achieves F1 = 0.961 +/- 0.001 -- within 0.1% of full-data performance -- while maintaining F1 > 0.90 at 30% budget. Targeted extensions push F1 to 0.970. The algorithm is robust to packet loss and sensor noise (3.7--4.8% degradation under combined worst-case).
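将PCA载荷转换为预算约束下成比例采样率的思路可示意如下(重要性度量与下限参数均为编者假设,非论文的确切映射):

```python
import numpy as np

def triage_rates(loadings, explained_var, budget, floor=0.01):
    """Convert PCA loadings into proportional per-channel sampling rates
    under a total bandwidth budget (fraction of full-rate samples).
    Channel importance = variance-weighted absolute loadings; a small
    floor keeps every channel minimally observed."""
    imp = np.abs(loadings) @ explained_var        # (d,) channel importance
    imp = np.maximum(imp / imp.sum(), floor)
    return budget * imp / imp.sum()               # renormalize to the budget

rng = np.random.default_rng(6)
d, k = 8, 3
loadings = rng.standard_normal((d, k))            # incremental-PCA components^T
explained_var = np.array([0.6, 0.3, 0.1])         # per-component variance ratios
rates = triage_rates(loadings, explained_var, budget=0.5)
```

整个映射只涉及一次矩阵-向量乘和归一化,与摘要中O(wdk)、零可训练参数的定位一致。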


【7】Enhancing sample efficiency in reinforcement-learning-based flow control: replacing the critic with an adaptive reduced-order model
标题:提高基于强化学习的流动控制中的样本效率:用自适应降阶模型取代评论家
链接:https://arxiv.org/abs/2604.04986

作者:Zesheng Yao,Zhen-Hua Wan,Canjun Yang,Qingchao Xia,Mengqi Zhang
备注:43 pages, 26 figures
摘要:无模型深度强化学习(DRL)方法的样本效率较低。为克服这一限制,本工作提出了一种基于自适应降阶模型(ROM)的强化学习框架用于主动流动控制。与传统的演员-评论家架构不同,所提出的方法利用ROM来估计控制器优化所需的梯度信息。ROM结构的设计结合了物理洞见:ROM集成了一个线性动力系统和一个用于估计流动非线性的神经常微分方程(NODE)。线性分量的参数通过算子推断识别,而NODE则以数据驱动方式使用基于梯度的优化进行训练。在控制器与环境交互期间,ROM不断用新收集的数据更新,从而实现模型的自适应细化。随后,控制器通过ROM的可微仿真进行优化。所提出的基于ROM的DRL框架在两个典型的流动控制问题上得到验证:Blasius边界层流动和方柱绕流。对于Blasius边界层,所提方法有效地简化为单回合的系统辨识与控制器优化过程,但得到的控制器优于传统线性设计,并以极少的数据达到与DRL方法相当的性能。对于方柱绕流,与DRL方法相比,该方法用少得多的探索数据实现了更优的减阻效果。这项工作解决了无模型DRL控制算法的一个关键组成部分,并为设计样本效率更高的基于DRL的主动流动控制器奠定了基础。
摘要:Model-free deep reinforcement learning (DRL) methods suffer from poor sample efficiency. To overcome this limitation, this work introduces an adaptive reduced-order-model (ROM)-based reinforcement learning framework for active flow control. In contrast to conventional actor--critic architectures, the proposed approach leverages a ROM to estimate the gradient information required for controller optimization. The design of the ROM structure incorporates physical insights. The ROM integrates a linear dynamical system and a neural ordinary differential equation (NODE) for estimating the nonlinearity in the flow. The parameters of the linear component are identified via operator inference, while the NODE is trained in a data-driven manner using gradient-based optimization. During controller--environment interactions, the ROM is continuously updated with newly collected data, enabling adaptive refinement of the model. The controller is then optimized through differentiable simulation of the ROM. The proposed ROM-based DRL framework is validated on two canonical flow control problems: Blasius boundary layer flow and flow past a square cylinder. For the Blasius boundary layer, the proposed method effectively reduces to a single-episode system identification and controller optimization process, yet it yields controllers that outperform traditional linear designs and achieve performance comparable to DRL approaches with minimal data. For the flow past a square cylinder, the proposed method achieves superior drag reduction with significantly fewer exploration data compared with DRL approaches. The work addresses a key component of model-free DRL control algorithms and lays the foundation for designing more sample-efficient DRL-based active flow controllers.


【8】StrADiff: A Structured Source-Wise Adaptive Diffusion Framework for Linear and Nonlinear Blind Source Separation
标题:StrADiff:一种用于线性和非线性盲源分离的结构化逐源自适应扩散框架
链接:https://arxiv.org/abs/2604.04973

作者:Yuan-Hao Wei
摘要:本文提出了一种用于线性和非线性盲源分离的结构化逐源自适应扩散框架。该框架将每个潜在维度解释为一个源分量,并为其分配单独的自适应扩散机制,从而建立逐源的潜在建模,而不是依赖单一共享的潜在先验。由此产生的公式在统一的端到端目标内联合学习源恢复和混合/重建过程,允许模型参数和潜在源在训练期间同时适应。这为线性和非线性盲源分离提供了一个通用框架。在本文的实例化中,每个源还配备了其自身的自适应高斯过程(GP)先验,以对潜在轨迹施加逐源的时间结构;而整体框架并不限于高斯过程先验,原则上可以容纳其他结构化源先验。因此,所提出的模型提供了一条通向无监督源恢复的通用结构化扩散路线,其潜在意义超越盲源分离,延伸至可解释的潜在建模、逐源解耦,以及在适当结构条件下潜在可识别的非线性潜变量学习。
摘要:This paper presents a Structured Source-Wise Adaptive Diffusion Framework for linear and nonlinear blind source separation. The framework interprets each latent dimension as a source component and assigns to it an individual adaptive diffusion mechanism, thereby establishing source-wise latent modeling rather than relying on a single shared latent prior. The resulting formulation learns source recovery and the mixing/reconstruction process jointly within a unified end-to-end objective, allowing model parameters and latent sources to adapt simultaneously during training. This yields a common framework for both linear and nonlinear blind source separation. In the present instantiation, each source is further equipped with its own adaptive Gaussian process (GP) prior to impose source-wise temporal structure on the latent trajectories, while the overall framework is not restricted to Gaussian process priors and can in principle accommodate other structured source priors. The proposed model thus provides a general structured diffusion-based route to unsupervised source recovery, with potential relevance beyond blind source separation to interpretable latent modeling, source-wise disentanglement, and potentially identifiable nonlinear latent-variable learning under appropriate structural conditions.


强化学习(5篇)

【1】Vehicle-as-Prompt: A Unified Deep Reinforcement Learning Framework for Heterogeneous Fleet Vehicle Routing Problem
标题:车辆即提示:面向异质车队车辆路径问题的统一深度强化学习框架
链接:https://arxiv.org/abs/2604.05195

作者:Shihong Huang,Shengjie Wang,Lei Gao,Hong Ma,Zhanluo Zhang,Feng Zhang,Weihua Zhou
摘要:与传统的同质路径问题不同,异质车队车辆路径问题(HFVRP)涉及异质的固定成本、可变行驶成本和容量约束,使得解的质量对车辆选择高度敏感。此外,现实世界的物流应用往往会施加额外的复杂约束,显著增加计算复杂性。然而,大多数现有的基于深度强化学习(DRL)的方法仅限于同质场景,导致应用于HFVRP及其复杂变体时性能欠佳。为了弥合这一差距,我们研究复杂约束下的HFVRP,并开发了一个能够在各种变体设置下求解该问题的统一DRL框架。我们引入了车辆即提示(VaP)机制,将问题表述为单阶段自回归决策过程。在此基础上,我们提出了VaP-CSMV,一个带有跨语义编码器和多视图解码器的框架,能有效处理各种问题变体,并捕捉车辆异质性与客户节点属性之间的复杂映射关系。大量实验结果表明,VaP-CSMV显著优于现有最先进的基于DRL的神经求解器,并取得与传统启发式求解器相比有竞争力的解质量,同时将推理时间缩短至仅几秒。此外,该框架在大规模和此前未见的问题变体上表现出强大的零样本(zero-shot)泛化能力,而消融研究验证了每个组件的重要贡献。
摘要:Unlike traditional homogeneous routing problems, the Heterogeneous Fleet Vehicle Routing Problem (HFVRP) involves heterogeneous fixed costs, variable travel costs, and capacity constraints, rendering solution quality highly sensitive to vehicle selection. Furthermore, real-world logistics applications often impose additional complex constraints, markedly increasing computational complexity. However, most existing Deep Reinforcement Learning (DRL)-based methods are restricted to homogeneous scenarios, leading to suboptimal performance when applied to HFVRP and its complex variants. To bridge this gap, we investigate HFVRP under complex constraints and develop a unified DRL framework capable of solving the problem across various variant settings. We introduce the Vehicle-as-Prompt (VaP) mechanism, which formulates the problem as a single-stage autoregressive decision process. Building on this, we propose VaP-CSMV, a framework featuring a cross-semantic encoder and a multi-view decoder that effectively addresses various problem variants and captures the complex mapping relationships between vehicle heterogeneity and customer node attributes. Extensive experimental results demonstrate that VaP-CSMV significantly outperforms existing state-of-the-art DRL-based neural solvers and achieves competitive solution quality compared to traditional heuristic solvers, while reducing inference time to mere seconds. Furthermore, the framework exhibits strong zero-shot generalization capabilities on large-scale and previously unseen problem variants, while ablation studies validate the vital contribution of each component.


【2】Cross-fitted Proximal Learning for Model-Based Reinforcement Learning
标题:用于基于模型的强化学习的交叉拟合近端学习
链接:https://arxiv.org/abs/2604.05185

作者:Nishanth Venkatesh,Andreas A. Malikopoulos
摘要:基于模型的强化学习对于序贯决策很有吸引力,因为它显式地估计奖励和转移模型,进而通过模拟推演支持规划。然而,在存在隐藏混杂的离线设置中,直接从观测数据中学习的模型可能有偏。这一挑战在部分可观测系统中尤为突出,其中潜在因素可能同时影响动作、奖励和未来的观测。最近的工作表明,此类混杂的部分可观测马尔可夫决策过程(POMDP)中的策略评估,可以归结为估计满足条件矩约束(CMR)的奖励-发射与观测-转移桥函数。在本文中,我们研究这些桥函数的统计估计。我们将桥函数学习表述为一个CMR问题,其滋扰(nuisance)对象由条件均值嵌入和条件密度给出。然后,我们对现有的两阶段桥估计器开发了一个K折交叉拟合扩展。所提出的程序保留了原有的基于桥函数的识别策略,同时比单次样本分裂更有效地利用可用数据。我们还推导了交叉拟合估计器的oracle比较界,并将所产生的误差分解为由滋扰估计引起的第一阶段项和由经验平均引起的第二阶段项。
摘要:Model-based reinforcement learning is attractive for sequential decision-making because it explicitly estimates reward and transition models and then supports planning through simulated rollouts. In offline settings with hidden confounding, however, models learned directly from observational data may be biased. This challenge is especially pronounced in partially observable systems, where latent factors may jointly affect actions, rewards, and future observations. Recent work has shown that policy evaluation in such confounded partially observable Markov decision processes (POMDPs) can be reduced to estimating reward-emission and observation-transition bridge functions satisfying conditional moment restrictions (CMRs). In this paper, we study the statistical estimation of these bridge functions. We formulate bridge learning as a CMR problem with nuisance objects given by a conditional mean embedding and a conditional density. We then develop a $K$-fold cross-fitted extension of the existing two-stage bridge estimator. The proposed procedure preserves the original bridge-based identification strategy while using the available data more efficiently than a single sample split. We also derive an oracle-comparator bound for the cross-fitted estimator and decompose the resulting error into a Stage I term induced by nuisance estimation and a Stage II term induced by empirical averaging.
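摘要中的"K折交叉拟合"是因果推断里的通用技巧:在K-1折上估计滋扰对象,在留出折上求值并做经验平均,从而比单次样本分裂更充分地利用数据。下面是一个与论文无关的通用骨架(`fit_nuisance`、`evaluate` 的具体形式为占位假设,论文中它们对应桥函数的两阶段估计):

```python
# 示意:K折交叉拟合的通用骨架
import random

def cross_fit(data, fit_nuisance, evaluate, K=5, seed=0):
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::K] for i in range(K)]
    estimates = []
    for k in range(K):
        heldout = set(folds[k])
        train = [data[i] for i in idx if i not in heldout]  # 第一阶段:在其余折上估计滋扰
        nuis = fit_nuisance(train)
        estimates.extend(evaluate(nuis, data[i]) for i in folds[k])  # 第二阶段:留出折求值
    return sum(estimates) / len(estimates)

# 简单用例:滋扰取训练折均值,evaluate 返回去均值后的平方
data = [1.0, 2.0, 3.0, 4.0, 5.0]
out = cross_fit(data, fit_nuisance=lambda d: sum(d) / len(d),
                evaluate=lambda m, x: (x - m) ** 2)
assert out > 0
```

每个样本点只在其所在折被求值,且该折从未参与滋扰估计,这正是交叉拟合能给出oracle型误差分解(第一阶段项加第二阶段项)的结构来源。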


【3】Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning
标题:通过国际象棋推理:推理如何从数据出发、经由微调和强化学习而演化
链接:https://arxiv.org/abs/2604.05134

作者:Lucas Dionisopoulos,Nicklas Majamaki,Prithviraj Ammanabrolu
备注:Accepted at the NeurIPS 2025 Foundations of Reasoning in Language Models (FoRLM) Workshop (Oral)
摘要:如何让一个语言模型在它本来就很难处理的任务中进行推理?我们通过分析一组受理论启发的数据集如何影响语言模型在国际象棋中的表现,研究推理如何在语言模型中演变:从监督微调(SFT)到强化学习(RL)。我们发现,微调模型以直接预测最佳着法会带来有效的RL和最强的下游性能;然而,RL步骤会诱发不忠实的推理(与所选着法不一致的推理)。作为替代,在多着法轨迹上进行训练,可以在忠实推理和更稳定的RL下获得相当的下游性能。我们发现,RL会使着法质量的分布产生显著的正向偏移,并作为副作用降低幻觉率。最后,我们发现若干SFT检查点指标(涵盖评估性能、幻觉率和推理质量)可以预测RL后模型的性能。我们发布了检查点和最终模型以及训练数据、评估和代码,这使我们能够用一个7B参数模型超越国际象棋中领先的开源推理模型。
摘要:How can you get a language model to reason in a task it natively struggles with? We study how reasoning evolves in a language model -- from supervised fine-tuning (SFT) to reinforcement learning (RL) -- by analyzing how a set of theoretically-inspired datasets impacts language model performance in chess. We find that fine-tuning a model to directly predict the best move leads to effective RL and the strongest downstream performance -- however, the RL step elicits unfaithful reasoning (reasoning inconsistent with the chosen move). Alternatively, training on multi-move trajectories yields comparable downstream performance with faithful reasoning and more stable RL. We show that RL induces a substantial positive shift in the distribution of move quality and reduces hallucination rates as a side effect. Finally, we find several SFT-checkpoint metrics -- metrics spanning evaluation performance, hallucination rates, and reasoning quality -- to be predictive of post-RL model performance. We release checkpoints and final models as well as training data, evaluations, and code which allowed us to surpass leading open-source reasoning models in chess with a 7B-parameter model.


【4】Hypernetwork-Conditioned Reinforcement Learning for Robust Control of Fixed-Wing Aircraft under Actuator Failures
标题:执行器故障下固定翼飞机鲁棒控制的超网络条件化强化学习
链接:https://arxiv.org/abs/2604.03392

作者:Dennis Marquis,Mazen Farhood
摘要:针对固定翼小型无人机系统,提出了一种对特定执行器故障具有鲁棒性的基于强化学习的路径跟踪控制器。该控制器以执行器故障的参数化为条件,采用基于超网络的自适应。我们考虑基于逐特征线性调制(FiLM)和低秩自适应(LoRA)的参数高效形式,并使用近端策略优化进行训练。我们证明了超网络条件化的策略与标准多层感知器策略相比可以提高鲁棒性。特别是,超网络条件化的策略能有效泛化到训练过程中未遇到的随时间变化的执行器故障模式。该方法通过高保真仿真验证,使用了一个真实的六自由度固定翼飞机模型。
摘要:This paper presents a reinforcement learning-based path-following controller for a fixed-wing small uncrewed aircraft system (sUAS) that is robust to certain actuator failures. The controller is conditioned on a parameterization of actuator faults using hypernetwork-based adaptation. We consider parameter-efficient formulations based on Feature-wise Linear Modulation (FiLM) and Low-Rank Adaptation (LoRA), trained using proximal policy optimization. We demonstrate that hypernetwork-conditioned policies can improve robustness compared to standard multilayer perceptron policies. In particular, hypernetwork-conditioned policies generalize effectively to time-varying actuator failure modes not encountered during training. The approach is validated through high-fidelity simulations, using a realistic six-degree-of-freedom fixed-wing aircraft model.
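摘要提到的FiLM(逐特征线性调制)条件化,其通用形式是由条件量(此处为执行器故障参数)生成缩放与偏移 (gamma, beta),对策略网络隐层特征做 h' = gamma * h + beta。以下为纯Python最小草图(将超网络简化为线性映射,属本文之外的说明性假设,并非论文的网络结构):

```python
# 示意:FiLM条件化——由故障参数生成 (gamma, beta) 并调制隐层特征
def film(hidden, fault_params, hyper_w_g, hyper_w_b):
    # 超网络此处简化为线性映射:gamma_i = 1 + sum_j w_g[i][j] * fault_j
    gamma = [1.0 + sum(w * f for w, f in zip(row, fault_params)) for row in hyper_w_g]
    beta = [sum(w * f for w, f in zip(row, fault_params)) for row in hyper_w_b]
    return [g * h + b for g, h, b in zip(gamma, hidden, beta)]

h = [0.5, -0.2]
# 无故障(fault_params 全零)时,FiLM 退化为恒等变换
assert film(h, [0.0], [[0.3], [0.1]], [[0.2], [0.4]]) == h
```

这种设计的好处是参数量随条件维度线性增长,与摘要中"参数高效"的定位一致。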


【5】Value Mirror Descent for Reinforcement Learning
标题:用于强化学习的值镜像下降
链接:https://arxiv.org/abs/2604.06039

作者:Zhichao Jia,Guanghui Lan
摘要:值迭代类方法已被广泛研究,用于计算强化学习(RL)中的近最优值函数。在生成式采样模型下,这些方法可以比策略优化方法获得更尖锐的样本复杂度,特别是在其对折扣因子的依赖方面。在实践中,它们通常用于离线训练或模拟环境。本文考虑状态空间为S、动作空间为A、折扣因子$γ\in(0,1)$、费用在$[0,1]$内的折扣马尔可夫决策过程。我们提出了一种新的值优化方法,称为值镜像下降(VMD),它将凸优化中的镜像下降融入经典的值迭代框架。在转移核已知的确定性设置中,我们证明VMD线性收敛。对于具有生成模型的随机设置,我们开发了一个随机变体SVMD,它采用了随机值迭代类方法中常用的方差缩减技术。对于具有一般凸正则化项的RL问题,SVMD获得近最优的样本复杂度$\tilde{O}(|S||A|(1-γ)^{-3}ε^{-2})$。此外,我们证明了所生成策略与最优策略之间的Bregman散度在整个迭代过程中保持有界。这一性质在现有的随机值迭代类方法中并不存在,但对于在离线训练之后实现有效的在线(持续)学习非常重要。在强凸正则化项下,SVMD达到样本复杂度$\tilde{O}(|S||A|(1-γ)^{-5}ε^{-1})$,改进了高精度范围内的性能。此外,我们证明了所生成策略收敛到最优策略。总的来说,所提出的方法、其分析以及由此得到的保证,构成了对RL与优化文献的新贡献。
摘要:Value iteration-type methods have been extensively studied for computing a nearly optimal value function in reinforcement learning (RL). Under a generative sampling model, these methods can achieve sharper sample complexity than policy optimization approaches, particularly in their dependence on the discount factor. In practice, they are often employed for offline training or in simulated environments. In this paper, we consider discounted Markov decision processes with state space S, action space A, discount factor $γ\in(0,1)$ and costs in $[0,1]$. We introduce a novel value optimization method, termed value mirror descent (VMD), which integrates mirror descent from convex optimization into the classical value iteration framework. In the deterministic setting with known transition kernels, we show that VMD converges linearly. For the stochastic setting with a generative model, we develop a stochastic variant, SVMD, which incorporates variance reduction commonly used in stochastic value iteration-type methods. For RL problems with general convex regularizers, SVMD attains a near-optimal sample complexity of $\tilde{O}(|S||A|(1-γ)^{-3}ε^{-2})$. Moreover, we establish that the Bregman divergence between the generated and optimal policies remains bounded throughout the iterations. This property is absent in existing stochastic value iteration-type methods but is important for enabling effective online (continual) learning following offline training. Under a strongly convex regularizer, SVMD achieves sample complexity of $\tilde{O}(|S||A|(1-γ)^{-5}ε^{-1})$, improving performance in the high-accuracy regime. Furthermore, we prove convergence of the generated policy to the optimal policy. Overall, the proposed method, its analysis, and the resulting guarantees, constitute new contributions to the RL and optimization literature.
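为直观理解摘要中"将镜像下降融入值迭代",下面示意镜像下降在概率单纯形上(负熵镜像映射)所用的基本更新原语:π'(a) ∝ π(a)·exp(-η·Q(a)),它同时控制与旧策略之间的KL散度。注意这只是该类方法依赖的标准镜像下降步骤,并非论文的VMD算法本身:

```python
# 示意:负熵镜像映射下的镜像下降步——对低成本动作做乘性加权
import math

def mirror_descent_step(pi, q_costs, eta=1.0):
    """pi: 当前策略(概率向量);q_costs: 各动作的成本型Q值;eta: 步长。"""
    unnorm = [p * math.exp(-eta * q) for p, q in zip(pi, q_costs)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

pi = [0.5, 0.5]
new_pi = mirror_descent_step(pi, q_costs=[1.0, 0.0], eta=1.0)
assert new_pi[1] > new_pi[0]           # 低成本动作的概率上升
assert abs(sum(new_pi) - 1.0) < 1e-12  # 仍是合法概率分布
```

乘性更新使新旧策略间的Bregman(此处为KL)散度可控,这与摘要中"生成策略与最优策略的Bregman散度保持有界"的性质在精神上一致。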


元学习(1篇)

【1】A Large-Scale Empirical Comparison of Meta-Learners and Causal Forests for Heterogeneous Treatment Effect Estimation in Marketing Uplift Modeling
标题:元学习器与因果森林用于营销提升建模中异质性处理效应估计的大规模实证比较
链接:https://arxiv.org/abs/2604.06123

作者:Aman Singh
备注:6 pages
摘要:在个体层面估计条件平均处理效应(CATE)是精准营销的核心,但在工业规模上对提升建模(uplift modeling)方法的系统基准测试仍然有限。我们提出了UpliftBench,对四个CATE估计器进行实证评估:S-Learner、T-Learner、X-Learner(均使用LightGBM基学习器)和因果森林(EconML),并应用于包含1398万条客户记录的Criteo Uplift v2.1数据集。接近随机的处理分配(倾向得分AUC = 0.509)为因果估计提供了较强的内部效度。通过Qini系数和累积增益曲线进行评估,S-Learner获得了最高的Qini分数0.376,按预测CATE排名前20%的客户捕获了77.7%的增量转化,比随机投放提高了3.9倍。SHAP分析将f8确定为12个匿名协变量中主要的异质性处理效应(HTE)驱动因素。因果森林的不确定性量化显示,1.9%的客户是高置信度的可说服者(95%置信区间下限 > 0),0.1%的客户是高置信度的"沉睡之犬"(95%置信区间上限 < 0)。我们的研究结果为从业者在大规模提升建模管道中的方法选择提供了基于证据的指导。
摘要:Estimating Conditional Average Treatment Effects (CATE) at the individual level is central to precision marketing, yet systematic benchmarking of uplift modeling methods at industrial scale remains limited. We present UpliftBench, an empirical evaluation of four CATE estimators: S-Learner, T-Learner, X-Learner (all with LightGBM base learners), and Causal Forest (EconML), applied to the Criteo Uplift v2.1 dataset comprising 13.98 million customer records. The near-random treatment assignment (propensity AUC = 0.509) provides strong internal validity for causal estimation. Evaluated via Qini coefficient and cumulative gain curves, the S-Learner achieves the highest Qini score of 0.376, with the top 20% of customers ranked by predicted CATE capturing 77.7% of all incremental conversions, a 3.9x improvement over random targeting. SHAP analysis identifies f8 as the dominant heterogeneous treatment effect (HTE) driver among the 12 anonymized covariates. Causal Forest uncertainty quantification reveals that 1.9% of customers are confident persuadables (lower 95% CI > 0) and 0.1% are confident sleeping dogs (upper 95% CI < 0). Our results provide practitioners with evidence-based guidance on method selection for large-scale uplift modeling pipelines.
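摘要中表现最好的S-Learner,其核心思想是用单一模型 f(x, t) 同时拟合处理与对照组的结果,再以 CATE(x) = f(x, 1) - f(x, 0) 估计个体处理效应。下面用分组均值代替论文中的LightGBM基学习器,给出一个纯说明性草图(数据与函数名均为假设):

```python
# 示意:S-Learner —— 单一模型 f(x, t),CATE(x) = f(x,1) - f(x,0)
from collections import defaultdict

def fit_s_learner(rows):
    """rows: (x, t, y) 三元组;返回 (x, t) -> 平均结果 的查表模型(代替LightGBM)。"""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, t, y in rows:
        sums[(x, t)] += y
        counts[(x, t)] += 1
    return lambda x, t: sums[(x, t)] / counts[(x, t)]

rows = [(0, 0, 0.1), (0, 1, 0.4), (1, 0, 0.2), (1, 1, 0.2)]
f = fit_s_learner(rows)
cate = lambda x: f(x, 1) - f(x, 0)
assert abs(cate(0) - 0.3) < 1e-9   # 个体0:正向提升(可说服者)
assert abs(cate(1) - 0.0) < 1e-9   # 个体1:无处理效应
```

按 cate(x) 从高到低排序客户即可得到摘要中Qini曲线所评估的目标投放顺序。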


分层学习(2篇)

【1】Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling
标题:分层SVG令牌化:学习紧凑的视觉程序以实现可缩放矢量图形建模
链接:https://arxiv.org/abs/2604.05072

作者:Ximing Xing,Ziteng Xue,Zhenxi Li,Weicong Liang,Linqing Wang,Zhantao Yang,Tiankai Hang,Zijin Yin,Qinglin Lu,Chunyu Wang,Qian Yu
备注:Homepage: https://hy-hivg.github.io/
摘要:最近的大型语言模型已将SVG生成从可微渲染优化转向自回归程序合成。然而,现有方法仍依赖于从自然语言处理继承而来的通用字节级令牌化,难以反映矢量图形的几何结构。数值坐标被切分成离散符号,破坏了空间关系并引入严重的令牌冗余,常常导致坐标幻觉和低效的长序列生成。为应对这些挑战,我们提出了HiVG,一个为自回归矢量图形生成量身定制的分层SVG令牌化框架。HiVG将原始SVG字符串分解为结构化的原子令牌(atomic tokens),并进一步将可执行的命令-参数组压缩为受几何约束的段令牌(segment tokens),在保持语法有效性的同时大幅提高序列效率。为进一步缓解空间失配,我们引入了分层均值-噪声(HMN)初始化策略,将数值排序信号和语义先验注入新的令牌嵌入。结合逐步增加程序复杂度的课程训练范式,HiVG能够更稳定地学习可执行的SVG程序。在文本到SVG和图像到SVG任务上的大量实验表明,与传统令牌化方案相比,该方法提高了生成保真度、空间一致性和序列效率。
摘要:Recent large language models have shifted SVG generation from differentiable rendering optimization to autoregressive program synthesis. However, existing approaches still rely on generic byte-level tokenization inherited from natural language processing, which poorly reflects the geometric structure of vector graphics. Numerical coordinates are fragmented into discrete symbols, destroying spatial relationships and introducing severe token redundancy, often leading to coordinate hallucination and inefficient long-sequence generation. To address these challenges, we propose HiVG, a hierarchical SVG tokenization framework tailored for autoregressive vector graphics generation. HiVG decomposes raw SVG strings into structured \textit{atomic tokens} and further compresses executable command--parameter groups into geometry-constrained \textit{segment tokens}, substantially improving sequence efficiency while preserving syntactic validity. To further mitigate spatial mismatch, we introduce a Hierarchical Mean--Noise (HMN) initialization strategy that injects numerical ordering signals and semantic priors into new token embeddings. Combined with a curriculum training paradigm that progressively increases program complexity, HiVG enables more stable learning of executable SVG programs. Extensive experiments on both text-to-SVG and image-to-SVG tasks demonstrate improved generation fidelity, spatial consistency, and sequence efficiency compared with conventional tokenization schemes.
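摘要所述"将SVG字符串分解为命令-参数组"这一前置步骤可以用如下草图示意:每个组对应一个可被进一步压缩为段令牌的原子单元。这里的正则与命令集(仅 M/L/C/Q/Z)为简化假设,并非论文的令牌化器实现:

```python
# 示意:把SVG路径的 d 属性切成 [(命令, [参数...]), ...] 的命令-参数组
import re

def command_groups(path_d: str):
    groups = []
    for m in re.finditer(r'([MLCQZz])([^MLCQZz]*)', path_d):
        cmd = m.group(1)
        params = [float(v) for v in re.findall(r'-?\d+\.?\d*', m.group(2))]
        groups.append((cmd, params))
    return groups

g = command_groups("M 10 20 L 30 40 Z")
assert g == [('M', [10.0, 20.0]), ('L', [30.0, 40.0]), ('Z', [])]
```

与按字节切分相比,这样的分组保留了"一条命令及其坐标同属一个几何单元"的结构,这正是摘要中段令牌所要压缩的对象。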


【2】Hierarchical Contrastive Learning for Multimodal Data
标题:多模态数据的分层对比学习
链接:https://arxiv.org/abs/2604.05462

作者:Huichao Li,Junhan Yu,Doudou Zhou
备注:34 pages,11 figures
摘要:多模态表征学习通常建立在共享私有分解的基础上,将潜在信息视为所有模态共有的或特定于一个模态的。这种二元观点通常是不充分的:许多因素仅由模态的子集共享,忽略这种部分共享可能会过度对齐不相关的信号并模糊互补信息。我们提出了分层对比学习(HCL),一个框架,学习全球共享,部分共享,并在一个统一的模型中的模态特定的表示。HCL结合了一个层次化的潜在变量制定与结构稀疏和结构感知的对比目标,只对齐真正共享一个潜在因素的模态。在不相关的潜变量下,我们证明了分层分解的可识别性,建立了负载矩阵的恢复保证,并推导出下游预测的参数估计和过度风险界。仿真结果表明,准确的恢复的层次结构和有效的选择任务相关的组件。在多模态电子健康记录上,HCL产生了更多信息的表示,并不断提高预测性能。
摘要 :Multimodal representation learning is commonly built on a shared-private decomposition, treating latent information as either common to all modalities or specific to one. This binary view is often inadequate: many factors are shared by only subsets of modalities, and ignoring such partial sharing can over-align unrelated signals and obscure complementary information. We propose Hierarchical Contrastive Learning (HCL), a framework that learns globally shared, partially shared, and modality-specific representations within a unified model. HCL combines a hierarchical latent-variable formulation with structural sparsity and a structure-aware contrastive objective that aligns only modalities that genuinely share a latent factor. Under uncorrelated latent variables, we prove identifiability of the hierarchical decomposition, establish recovery guarantees for the loading matrices, and derive parameter estimation and excess-risk bounds for downstream prediction. Simulations show accurate recovery of hierarchical structure and effective selection of task-relevant components. On multimodal electronic health records, HCL yields more informative representations and consistently improves predictive performance.


医学相关(2篇)

【1】A Mixture of Experts Foundation Model for Scanning Electron Microscopy Image Analysis
标题:扫描电子显微镜图像分析的混合专家基础模型
链接:https://arxiv.org/abs/2604.05960

作者:Sk Miraj Ahmed,Yuewei Lin,Chuntian Cao,Shinjae Yoo,Xinpei Wu,Won-Il Lee,Nikhil Tiwale,Dan N. Le,Thi Thu Huong Chu,Jiyoung Kim,Kevin G. Yager,Chang-Yong Nam
摘要:扫描电子显微镜(SEM)在现代材料科学中不可或缺,可在广泛的结构、化学和功能研究中实现高分辨率成像。然而,SEM成像仍然受限于特定任务的模型和劳动密集型的采集过程,这限制了其在不同应用中的可扩展性。在此,我们介绍了首个SEM图像基础模型,它在多仪器、多条件科学显微照片的大型语料库上进行了预训练,能够在不同的材料体系和成像条件下泛化。利用自监督Transformer架构,我们的模型学习了丰富且可迁移的表示,可以针对广泛的下游任务进行微调或适配。作为一个令人信服的演示,我们聚焦于散焦到聚焦的图像转换,这是自动化显微镜流水线中一个重要但尚未被充分探索的挑战。我们的方法不仅能在无需配对监督的情况下从散焦输入中恢复聚焦细节,还在多个评估指标上优于最先进的技术。这项工作为一类新的可适配SEM模型奠定了基础,通过将基础表示学习与现实世界的成像需求联系起来加速材料发现。
摘要:Scanning Electron Microscopy (SEM) is indispensable in modern materials science, enabling high-resolution imaging across a wide range of structural, chemical, and functional investigations. However, SEM imaging remains constrained by task-specific models and labor-intensive acquisition processes that limit its scalability across diverse applications. Here, we introduce the first foundation model for SEM images, pretrained on a large corpus of multi-instrument, multi-condition scientific micrographs, enabling generalization across diverse material systems and imaging conditions. Leveraging a self-supervised transformer architecture, our model learns rich and transferable representations that can be fine-tuned or adapted to a wide range of downstream tasks. As a compelling demonstration, we focus on defocus-to-focus image translation-an essential yet underexplored challenge in automated microscopy pipelines. Our method not only restores focused detail from defocused inputs without paired supervision but also outperforms state-of-the-art techniques across multiple evaluation metrics. This work lays the groundwork for a new class of adaptable SEM models, accelerating materials discovery by bridging foundational representation learning with real-world imaging needs.


【2】PRIME: Prototype-Driven Multimodal Pretraining for Cancer Prognosis with Missing Modalities
标题:PRIME:面向模态缺失的癌症预后的原型驱动多模态预训练
链接:https://arxiv.org/abs/2604.04999

作者:Kai Yu,Shuang Zhou,Yiran Song,Zaifu Zhan,Jie Peng,Kaixiong Zhou,Tianlong Chen,Feng Xie,Meng Wang,Huazhu Fu,Mingquan Lin,Rui Zhang
摘要:多模式自我监督预训练通过整合组织病理学全切片图像、基因表达和病理报告,为癌症预后提供了一条有前途的途径,但大多数现有方法需要完全配对和完整的输入。在实践中,临床队列是分散的,并且经常错过一个或多个模态,从而限制了监督融合和可扩展的多模态预训练。我们提出了PRIME,一个缺失感知的多模态自监督预训练框架,它可以从部分观察的队列中学习鲁棒和可转移的表示。PRIME将异构模态嵌入映射到统一的令牌空间中,并通过患者级共识检索引入用于潜在空间语义归算的共享原型记忆库,产生结构对齐的令牌而无需重建原始信号。两个互补的预训练目标:结构化缺失增强下的模态间对齐和融合后一致性,共同学习在任意模态子集下保持预测性的表示。我们在癌症基因组图谱上评估了PRIME,对32种癌症类型进行了无标签预训练,并对5个队列进行了下游5倍评估,包括总生存预测,3年死亡率分类和3年复发分类。在所有比较的方法中,PRIME实现了最好的宏观平均性能,在三个任务上分别达到0.653 C-index,0.689 AUROC和0.637 AUROC,同时提高了测试时间缺失下的鲁棒性,并支持参数有效和标签有效的自适应。这些结果支持缺失感知多模式预训练作为碎片化临床数据设置中预后建模的实用策略。
摘要:Multimodal self-supervised pretraining offers a promising route to cancer prognosis by integrating histopathology whole-slide images, gene expression, and pathology reports, yet most existing approaches require fully paired and complete inputs. In practice, clinical cohorts are fragmented and often miss one or more modalities, limiting both supervised fusion and scalable multimodal pretraining. We propose PRIME, a missing-aware multimodal self-supervised pretraining framework that learns robust and transferable representations from partially observed cohorts. PRIME maps heterogeneous modality embeddings into a unified token space and introduces a shared prototype memory bank for latent-space semantic imputation via patient-level consensus retrieval, producing structurally aligned tokens without reconstructing raw signals. Two complementary pretraining objectives: inter-modality alignment and post-fusion consistency under structured missingness augmentation, jointly learn representations that remain predictive under arbitrary modality subsets. We evaluate PRIME on The Cancer Genome Atlas with label-free pretraining on 32 cancer types and downstream 5-fold evaluation on five cohorts across overall survival prediction, 3-year mortality classification, and 3-year recurrence classification. PRIME achieves the best macro-average performance among all compared methods, reaching 0.653 C-index, 0.689 AUROC, and 0.637 AUROC on the three tasks, respectively, while improving robustness under test-time missingness and supporting parameter-efficient and label-efficient adaptation. These results support missing-aware multimodal pretraining as a practical strategy for prognosis modeling in fragmented clinical data settings.


蒸馏|知识提取(1篇)

【1】Jeffreys Flow: Robust Boltzmann Generators for Rare Event Sampling via Parallel Tempering Distillation
标题:Jeffreys Flow:通过并行回火蒸馏进行罕见事件采样的鲁棒玻尔兹曼生成器
链接:https://arxiv.org/abs/2604.05303

作者:Guang Lin,Christian Moya,Di Qi,Xuda Ye
摘要:对具有粗糙能量景观的物理系统进行采样受到罕见事件和亚稳态捕获的阻碍。虽然玻尔兹曼生成器已经提供了一种解决方案,但它们对反向Kullback-Leibler散度的依赖经常导致灾难性的模式坍缩,丢失多模态分布中的特定模式。在此,我们介绍Jeffreys Flow,一个鲁棒的生成框架,它通过使用对称的Jeffreys散度从并行回火(Parallel Tempering)轨迹中蒸馏经验采样数据来缓解这一失败。该公式有效地平衡了局部的寻找目标精度与全局的模式覆盖。我们表明,最小化Jeffreys散度可以抑制模式坍缩,并通过对经验参考数据的蒸馏在结构上纠正固有的不准确性。我们在高度非凸的多维基准上展示了该框架的可扩展性和准确性,包括系统性地校正副本交换随机梯度朗之万动力学中的随机梯度偏差,以及大规模加速量子热态路径积分蒙特卡罗中的精确重要性采样。
摘要:Sampling physical systems with rough energy landscapes is hindered by rare events and metastable trapping. While Boltzmann generators already offer a solution, their reliance on the reverse Kullback--Leibler divergence frequently induces catastrophic mode collapse, missing specific modes in multi-modal distributions. Here, we introduce the Jeffreys Flow, a robust generative framework that mitigates this failure by distilling empirical sampling data from Parallel Tempering trajectories using the symmetric Jeffreys divergence. This formulation effectively balances local target-seeking precision with global modes coverage. We show that minimizing Jeffreys divergence suppresses mode collapse and structurally corrects inherent inaccuracies via distillation of the empirical reference data. We demonstrate the framework's scalability and accuracy on highly non-convex multidimensional benchmarks, including the systematic correction of stochastic gradient biases in Replica Exchange Stochastic Gradient Langevin Dynamics and the massive acceleration of exact importance sampling in Path Integral Monte Carlo for quantum thermal states.
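摘要所用的对称Jeffreys散度定义为 J(p, q) = KL(p||q) + KL(q||p),同时包含反向KL的"寻模"项与正向KL的"覆盖模"项,这正是它能抑制模式坍缩的原因。离散分布上的最小实现如下(仅为定义性草图):

```python
# 示意:离散分布上的 Jeffreys 散度 J(p, q) = KL(p||q) + KL(q||p)
import math

def kl(p, q):
    """KL(p||q);约定 0*log(0/q) = 0。"""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jeffreys(p, q):
    return kl(p, q) + kl(q, p)

p = [0.5, 0.5]
q = [0.9, 0.1]
assert jeffreys(p, q) == jeffreys(q, p)  # 对称性
assert jeffreys(p, p) == 0.0             # 同分布时为零
```

与只优化 KL(q||p)(反向KL)的玻尔兹曼生成器相比,加入 KL(p||q) 项会惩罚"漏掉参考数据中出现的模式",这对应摘要中对并行回火轨迹的蒸馏。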


聚类(1篇)

【1】Weight-Informed Self-Explaining Clustering for Mixed-Type Tabular Data
标题:混合类型表格数据的权重感知自解释聚类
链接:https://arxiv.org/abs/2604.05857

作者:Lehao Li,Qiang Huang,Yihao Ang,Bryan Kian Hsiang Low,Anthony K. H. Tung,Xiaokui Xiao
摘要:聚类混合类型的表格数据是探索性分析的基础,但由于数值-分类表示不对齐、特征相关性不均匀且依赖上下文、以及解释与聚类过程脱节且只能事后给出,这一问题仍具挑战性。我们提出了WISE,一个权重感知的自解释框架,它在一个完全无监督且透明的管道中统一了表示、特征加权、聚类和解释。WISE引入带填充的二进制编码(BEP)以在统一的稀疏空间中对齐异构特征,引入Leave-One-Feature-Out(LOFO)策略以感知多个高质量且多样化的特征加权视图,并引入两阶段的权重感知聚类过程以聚合不同的语义划分。为确保内在可解释性,我们进一步开发了判别性频繁项(DFI),它产生的特征级解释在从实例到簇的层面上保持一致,并具有加性分解保证。在六个真实世界数据集上的大量实验表明,WISE在聚类质量上持续优于经典和神经基线,同时保持高效,并基于驱动聚类的相同原语产生忠实、人类可解释的解释。
摘要:Clustering mixed-type tabular data is fundamental for exploratory analysis, yet remains challenging due to misaligned numerical-categorical representations, uneven and context-dependent feature relevance, and disconnected and post-hoc explanation from the clustering process. We propose WISE, a Weight-Informed Self-Explaining framework that unifies representation, feature weighting, clustering, and interpretation in a fully unsupervised and transparent pipeline. WISE introduces Binary Encoding with Padding (BEP) to align heterogeneous features in a unified sparse space, a Leave-One-Feature-Out (LOFO) strategy to sense multiple high-quality and diverse feature-weighting views, and a two-stage weight-aware clustering procedure to aggregate alternative semantic partitions. To ensure intrinsic interpretability, we further develop Discriminative FreqItems (DFI), which yields feature-level explanations that are consistent from instances to clusters with an additive decomposition guarantee. Extensive experiments on six real-world datasets demonstrate that WISE consistently outperforms classical and neural baselines in clustering quality while remaining efficient, and produces faithful, human-interpretable explanations grounded in the same primitives that drive clustering.


联邦学习|隐私保护|加密(1篇)

【1】Scalar Federated Learning for Linear Quadratic Regulator
标题:线性二次调节器的标量联邦学习
链接:https://arxiv.org/abs/2604.05088

作者:Mohammadreza Rostami,Shahriar Talebi,Solmaz S. Kia
摘要:我们提出了ScalarFedLQR,一个通信高效的联邦算法的一个共同的政策,在线性二次调节器(LQR)控制的异构代理的无模型学习。该方法建立在一个分解的投影梯度机制,其中每个代理通信只有一个标量投影的本地零阶梯度估计。服务器聚合这些标量消息以重构全局下降方向,将每个代理的上行链路通信从O(d)减少到O(1),与策略维度无关。至关重要的是,投影引起的近似误差随着参与代理的数量的增加而减小,从而产生有利的缩放律:更大的舰队可以实现更准确的梯度恢复,允许更大的步长,并且尽管高维,也可以实现更快的线性收敛。在标准正则性条件下,所有迭代保持稳定,平均LQR成本线性快速下降。数值结果表明,性能与全梯度联邦LQR大大减少了通信。
摘要:We propose ScalarFedLQR, a communication-efficient federated algorithm for model-free learning of a common policy in linear quadratic regulator (LQR) control of heterogeneous agents. The method builds on a decomposed projected gradient mechanism, in which each agent communicates only a scalar projection of a local zeroth-order gradient estimate. The server aggregates these scalar messages to reconstruct a global descent direction, reducing per-agent uplink communication from O(d) to O(1), independent of the policy dimension. Crucially, the projection-induced approximation error diminishes as the number of participating agents increases, yielding a favorable scaling law: larger fleets enable more accurate gradient recovery, admit larger stepsizes, and achieve faster linear convergence despite high dimensionality. Under standard regularity conditions, all iterates remain stabilizing and the average LQR cost decreases linearly fast. Numerical results demonstrate performance comparable to full-gradient federated LQR with substantially reduced communication.
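摘要的核心通信原语是:每个代理仅上传本地梯度在某个方向上的标量投影 s_i = <g_i, u_i>,服务器用 (1/N)·Σ s_i·u_i 重构全局下降方向,从而把上行通信从 O(d) 降到 O(1)。下面按这一描述给出示意(随机高斯方向、简单平均重构等细节为本文之外的假设):

```python
# 示意:ScalarFedLQR式的标量投影上传与服务器端方向重构
import random

def agent_message(local_grad, direction):
    """代理只发送标量 s_i = <g_i, u_i>(O(1) 上行通信)。"""
    return sum(g * u for g, u in zip(local_grad, direction))

def server_reconstruct(scalars, directions, d):
    """服务器重构 (1/N) * sum_i s_i * u_i 作为全局下降方向。"""
    recon = [0.0] * d
    for s, u in zip(scalars, directions):
        for j in range(d):
            recon[j] += s * u[j] / len(scalars)
    return recon

d, n_agents = 4, 200
rng = random.Random(0)
true_grad = [1.0, -2.0, 0.5, 0.0]
dirs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_agents)]
msgs = [agent_message(true_grad, u) for u in dirs]
est = server_reconstruct(msgs, dirs, d)
# 重构方向与真实梯度的内积为 (1/N)Σ<g,u_i>² ≥ 0:代理越多,重构越准
assert sum(e * g for e, g in zip(est, true_grad)) > 0
```

由于 E[u uᵀ] = I,重构方向在期望上等于真实梯度,这对应摘要中"代理数越多、投影误差越小"的缩放律。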


推理|分析|理解|解释(7篇)

【1】Data Distribution Valuation Using Generalized Bayesian Inference
标题:使用广义贝叶斯推理的数据分布估值
链接:https://arxiv.org/abs/2604.05993

作者 :Cuong N. Nguyen,Cuong V. Nguyen
备注:Paper published at AISTATS 2026
摘要:我们研究的数据分布估值问题,其目的是量化的数据分布从他们的样本的价值。这是一个最近提出的问题,它与经典的数据估值有关但又不同,可以应用于各种应用。对于这个问题,我们开发了一个新的框架,称为广义贝叶斯估值,利用广义贝叶斯推理与损失的可转移性措施。这个框架允许我们以统一的方式解决看似无关的实际问题,例如注释器评估和数据增强。使用贝叶斯原理,我们进一步改进和增强我们的框架的适用性,将其扩展到连续数据流设置。我们的实验结果证实了我们的框架在不同的现实世界中的场景的有效性和效率。
摘要:We investigate the data distribution valuation problem, which aims to quantify the values of data distributions from their samples. This is a recently proposed problem that is related to but different from classical data valuation and can be applied to various applications. For this problem, we develop a novel framework called Generalized Bayes Valuation that utilizes generalized Bayesian inference with a loss constructed from transferability measures. This framework allows us to solve, in a unified way, seemingly unrelated practical problems, such as annotator evaluation and data augmentation. Using the Bayesian principles, we further improve and enhance the applicability of our framework by extending it to the continuous data stream setting. Our experiment results confirm the effectiveness and efficiency of our framework in different real-world scenarios.
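摘要所依赖的"广义贝叶斯推理"用损失代替似然:后验权重 w_i ∝ prior_i · exp(-η · loss_i),其中损失由可迁移性度量构造。以下是该标准形式(Gibbs后验)的最小示意;候选分布、损失取值与温度 η 均为说明性假设:

```python
# 示意:广义贝叶斯(Gibbs后验)更新——用损失代替似然
import math

def generalized_bayes(priors, losses, eta=1.0):
    """priors: 各候选数据分布的先验权重;losses: 各候选的(可迁移性)损失。"""
    unnorm = [p * math.exp(-eta * l) for p, l in zip(priors, losses)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# 两个候选数据分布,第二个在可迁移性损失上更低(价值更高)
post = generalized_bayes(priors=[0.5, 0.5], losses=[2.0, 0.5])
assert post[1] > post[0]                 # 低损失候选获得更高后验权重
assert abs(sum(post) - 1.0) < 1e-12
```

这种形式无需完整的似然模型,因此可以把注释者评估、数据增强等问题统一到同一套损失驱动的估值框架下,与摘要的思路一致。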


【2】A Tensor-Train Framework for Bayesian Inference in High-Dimensional Systems: Applications to MIMO Detection and Channel Decoding
标题:用于高维系统贝叶斯推理的张量训练框架:在MIMO检测和信道译码中的应用
链接:https://arxiv.org/abs/2604.05890

作者:Luca Schmid,Dominik Sulz,Shrinivas Chimmalgi,Laurent Schmalen
摘要:高维离散输入加性噪声模型中的贝叶斯推断是通信系统中的一个基本挑战,因为所需的联合后验概率(APP)质量函数的支持随着未知变量的数量呈指数增长。在这项工作中,我们提出了一个张量训练(TT)框架,用于离散输入加性噪声模型中易于处理的近最优贝叶斯推理。核心见解是,联合对数APP质量函数在TT格式下具有精确的低秩表示,从而实现紧凑的存储和高效的计算。为了恢复逐符号的APP边缘分布,我们开发了一个实用的推理过程,使用以截断泰勒级数初始化的TT-cross算法来近似对数后验的指数。为了证明该方法的通用性,我们为两个典型的通信问题推导出显式的低秩TT构造:应用于多输入多输出(MIMO)检测的加性高斯白噪声(AWGN)下的线性观测模型,以及二进制输入AWGN信道上二进制线性分组纠错码的软判决译码。数值结果表明,该方法在很宽的信噪比范围内实现近最优的错误率性能,而只需要适度的TT秩。这些结果突出了张量网络方法在通信系统中进行高效贝叶斯推理的潜力。
摘要:Bayesian inference in high-dimensional discrete-input additive noise models is a fundamental challenge in communication systems, as the support of the required joint a posteriori probability (APP) mass function grows exponentially with the number of unknown variables. In this work, we propose a tensor-train (TT) framework for tractable, near-optimal Bayesian inference in discrete-input additive noise models. The central insight is that the joint log-APP mass function admits an exact low-rank representation in the TT format, enabling compact storage and efficient computations. To recover symbol-wise APP marginals, we develop a practical inference procedure that approximates the exponential of the log-posterior using a TT-cross algorithm initialized with a truncated Taylor-series. To demonstrate the generality of the approach, we derive explicit low-rank TT constructions for two canonical communication problems: the linear observation model under additive white Gaussian noise (AWGN), applied to multiple-input multiple-output (MIMO) detection, and soft-decision decoding of binary linear block error correcting codes over the binary-input AWGN channel. Numerical results show near-optimal error-rate performance across a wide range of signal-to-noise ratios while requiring only modest TT ranks. These results highlight the potential of tensor-network methods for efficient Bayesian inference in communication systems.
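摘要中"联合对数APP质量函数在TT格式下具有精确低秩表示"这一核心主张,可以在最简单的情形下直接验证:当对数后验可分解为逐符号项之和时(本示意使用的假设构造,并非论文中的具体模型),它具有精确的TT秩2。下面是一个最小的numpy草图:

```python
import itertools
import numpy as np

# 假设:对数后验可分解为逐符号项之和 log p(x) = sum_i f_i(x_i)
# 此时它在TT格式下具有精确的秩2表示:
#   G_1[x] = [f_1(x), 1],  G_i[x] = [[1, 0], [f_i(x), 1]],  G_n[x] = [1, f_n(x)]^T
rng = np.random.default_rng(0)
n, q = 4, 3                          # 4个符号,字母表大小为3(示意用)
f = rng.standard_normal((n, q))      # 逐符号的对数因子

def tt_cores(f):
    n, q = f.shape
    first = np.zeros((1, q, 2)); first[0, :, 0] = f[0]; first[0, :, 1] = 1.0
    cores = [first]
    for i in range(1, n - 1):
        G = np.zeros((2, q, 2))
        G[0, :, 0] = 1.0; G[1, :, 0] = f[i]; G[1, :, 1] = 1.0
        cores.append(G)
    last = np.zeros((2, q, 1)); last[0, :, 0] = 1.0; last[1, :, 0] = f[-1]
    cores.append(last)
    return cores

def tt_eval(cores, x):
    # TT收缩:按符号取每个核的切片并连乘
    v = cores[0][:, x[0], :]
    for core, xi in zip(cores[1:], x[1:]):
        v = v @ core[:, xi, :]
    return float(v[0, 0])

cores = tt_cores(f)
# 在整个指数大的配置空间上逐点核对TT收缩与直接求和一致
for x in itertools.product(range(q), repeat=n):
    assert abs(tt_eval(cores, x) - f[np.arange(n), list(x)].sum()) < 1e-12
```

论文中的构造覆盖了更一般的耦合项(如线性观测模型与线性分组码),但"以小的TT秩紧凑存储指数大的对数后验"这一思路与上面的玩具例子相同。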


【3】Training Without Orthogonalization, Inference With SVD: A Gradient Analysis of Rotation Representations
标题:训练时不做正交化,推理时使用SVD:旋转表示的梯度分析
链接:https://arxiv.org/abs/2604.05414

作者:Chris Choy
摘要:最近的工作表明,在训练过程中移除正交化并仅在推理时应用它可以改善深度学习中的旋转估计,经验证据支持使用SVD投影的9D表示。然而,对于为什么SVD正交化会特别损害训练,以及为什么在推理时它应优先于Gram-Schmidt,理论理解仍然不完整。我们针对$3 \times 3$矩阵和$SO(3)$投影给出了SVD正交化的详细梯度分析。我们的核心结果导出了SVD后向传递雅可比矩阵的精确谱:它的秩为3(与$SO(3)$的维数相匹配),非零奇异值为$2/(s_i + s_j)$,条件数为$κ = (s_1 + s_2)/(s_2 + s_3)$;由此产生的可量化梯度失真在预测矩阵远离$SO(3)$时最为严重(例如训练早期$s_3 \approx 0$时)。我们进一步表明,即使是经过稳定化处理的SVD梯度也会引入梯度方向误差,而将SVD从训练循环中完全移除则避免了这种权衡。我们还证明了6D Gram-Schmidt雅可比矩阵具有不对称谱:其参数接收不相等的梯度信号,这解释了为什么9D参数化更可取。综上,这些结果为采用直接9D回归进行训练并仅在推理时应用SVD投影提供了理论基础。
摘要:Recent work has shown that removing orthogonalization during training and applying it only at inference improves rotation estimation in deep learning, with empirical evidence favoring 9D representations with SVD projection. However, the theoretical understanding of why SVD orthogonalization specifically harms training, and why it should be preferred over Gram-Schmidt at inference, remains incomplete. We provide a detailed gradient analysis of SVD orthogonalization specialized to $3 \times 3$ matrices and $SO(3)$ projection. Our central result derives the exact spectrum of the SVD backward pass Jacobian: it has rank $3$ (matching the dimension of $SO(3)$) with nonzero singular values $2/(s_i + s_j)$ and condition number $κ= (s_1 + s_2)/(s_2 + s_3)$, creating quantifiable gradient distortion that is most severe when the predicted matrix is far from $SO(3)$ (e.g., early in training when $s_3 \approx 0$). We further show that even stabilized SVD gradients introduce gradient direction error, whereas removing SVD from the training loop avoids this tradeoff entirely. We also prove that the 6D Gram-Schmidt Jacobian has an asymmetric spectrum: its parameters receive unequal gradient signal, explaining why 9D parameterization is preferable. Together, these results provide the theoretical foundation for training with direct 9D regression and applying SVD projection only at inference.
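摘要中关于SVD投影雅可比谱的结论(秩为3,非零奇异值为$2/(s_i+s_j)$)可以用有限差分进行数值核验。下面是一个示意性的numpy草图(随机矩阵与步长均为本示意的假设),对最近旋转投影构造数值雅可比并与公式预测对比:

```python
import numpy as np

rng = np.random.default_rng(1)

def project_so3(M):
    # SVD正交化:最近旋转 R = U diag(1, 1, det(U V^T)) V^T
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt

# 取一个行列式为正的随机3x3矩阵(示意用)
M = np.eye(3) + 0.2 * rng.standard_normal((3, 3))
s = np.linalg.svd(M, compute_uv=False)

# 用中心差分构造投影映射的9x9数值雅可比
eps = 1e-6
J = np.zeros((9, 9))
for k in range(9):
    E = np.zeros(9); E[k] = eps
    dP = project_so3(M + E.reshape(3, 3)) - project_so3(M - E.reshape(3, 3))
    J[:, k] = dP.ravel() / (2 * eps)

sv = np.sort(np.linalg.svd(J, compute_uv=False))[::-1]
pred = np.sort([2 / (s[0] + s[1]), 2 / (s[0] + s[2]), 2 / (s[1] + s[2])])[::-1]
assert np.allclose(sv[:3], pred, atol=1e-3)   # 非零奇异值为 2/(s_i+s_j)
assert np.max(sv[3:]) < 1e-3                  # 秩为3,与SO(3)的维数一致
```

由该谱也可以直接看出:当$s_3 \to 0$时条件数$(s_1+s_2)/(s_2+s_3)$增大,对应摘要所述训练早期的梯度失真。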


【4】Probabilistic Tree Inference Enabled by FDSOI Ferroelectric FETs
标题:FDSOI铁电场效应晶体管实现概率树推理
链接:https://arxiv.org/abs/2604.05115

作者:Pengyu Ren,Xingtian Wang,Boyang Cheng,Jiahui Duan,Giuk Kim,Xuezhong Niu,Halid Mulaosmanovic,Stefan Duenkel,Sven Beyer,X. Sharon Hu,Ningyuan Cao,Kai Ni
摘要:自动驾驶、医疗诊断和金融系统中的人工智能应用越来越需要能够提供强大的不确定性量化、可解释性和抗噪声能力的机器学习模型。贝叶斯决策树(BDTs)对于这些任务是有吸引力的,因为它们结合了概率推理,可解释的决策和对噪声的鲁棒性。然而,基于CPU和GPU的BDTs的现有硬件实现受到存储器瓶颈和不规则处理模式的限制,而利用模拟内容可寻址存储器(ACAM)和高斯随机数生成器(GRNG)的多平台解决方案引入了集成复杂度和能量开销。在这里,我们报告了一个单片FDSOI-FeFET硬件平台,原生支持ACAM和GRNG功能。FeFET的铁电极化使得能够实现用于ACAM的紧凑、节能的多位存储,并且栅极到漏极重叠区域中的带到带隧穿以及浮体中的后续空穴存储为GRNG提供了高质量的熵源。系统级的评估表明,该架构提供了强大的不确定性估计,可解释性和噪声容限与高能源效率。在数据集噪声和设备变化的情况下,与传统的决策树相比,它在MNIST上的分类准确率提高了40%以上。此外,它提供了超过两个数量级的CPU和GPU的基线加速和超过四个数量级的能源效率的改善,使其成为一个可扩展的解决方案,部署在资源受限和安全关键型环境的BDT。
摘要:Artificial intelligence applications in autonomous driving, medical diagnostics, and financial systems increasingly demand machine learning models that can provide robust uncertainty quantification, interpretability, and noise resilience. Bayesian decision trees (BDTs) are attractive for these tasks because they combine probabilistic reasoning, interpretable decision-making, and robustness to noise. However, existing hardware implementations of BDTs based on CPUs and GPUs are limited by memory bottlenecks and irregular processing patterns, while multi-platform solutions exploiting analog content-addressable memory (ACAM) and Gaussian random number generators (GRNGs) introduce integration complexity and energy overheads. Here we report a monolithic FDSOI-FeFET hardware platform that natively supports both ACAM and GRNG functionalities. The ferroelectric polarization of FeFETs enables compact, energy-efficient multi-bit storage for ACAM, and band-to-band tunneling in the gate-to-drain overlap region and subsequent hole storage in the floating body provides a high-quality entropy source for GRNG. System-level evaluations demonstrate that the proposed architecture provides robust uncertainty estimation, interpretability, and noise tolerance with high energy efficiency. Under both dataset noise and device variations, it achieves over 40% higher classification accuracy on MNIST compared to conventional decision trees. Moreover, it delivers more than two orders of magnitude speedup over CPU and GPU baselines and over four orders of magnitude improvement in energy efficiency, making it a scalable solution for deploying BDTs in resource-constrained and safety-critical environments.


【5】Towards Scaling Law Analysis For Spatiotemporal Weather Data
标题:面向时空天气数据的标度律分析
链接:https://arxiv.org/abs/2604.05068

作者:Alexander Kiefer,Prasanna Balaprakash,Xiao Wang
备注:9 pages, 6 figures, High Performance Computing for Imaging 2026
摘要:对于NLP和CV,计算最优缩放律已得到较充分的研究,其中目标通常是单步的,且目标相对均匀。天气预报更难在同一框架中刻画:自回归推出会在长预测范围内累积误差,输出将许多具有不同尺度和可预测性的物理通道耦合在一起,而全局汇集的测试指标可能与短期训练所隐含的逐通道、后期预测行为严重分歧。我们将自回归天气预报的神经缩放分析从单步训练损失扩展到长推出和逐通道指标。我们量化了(1)预测误差如何在各通道之间分布,以及其增长率如何随预测范围演变,(2)当误差在全局汇集时,测试误差相对于推出长度是否服从幂律缩放,以及(3)对于参数、数据和计算三个缩放轴,该拟合如何随预测范围和通道联合变化。我们发现强烈的跨通道和跨范围异质性:汇集后的缩放可能看起来有利,而许多通道在预测后期仍会退化。我们讨论了其对加权目标、考虑预测范围的课程学习以及跨输出资源分配的影响。
摘要:Compute-optimal scaling laws are relatively well studied for NLP and CV, where objectives are typically single-step and targets are comparatively homogeneous. Weather forecasting is harder to characterize in the same framework: autoregressive rollouts compound errors over long horizons, outputs couple many physical channels with disparate scales and predictability, and globally pooled test metrics can disagree sharply with per-channel, late-lead behavior implied by short-horizon training. We extend neural scaling analysis for autoregressive weather forecasting from single-step training loss to long rollouts and per-channel metrics. We quantify (1) how prediction error is distributed across channels and how its growth rate evolves with forecast horizon, (2) if power law scaling holds for test error, relative to rollout length when error is pooled globally, and (3) how that fit varies jointly with horizon and channel for parameter, data, and compute-based scaling axes. We find strong cross-channel and cross-horizon heterogeneity: pooled scaling can look favorable while many channels degrade at late leads. We discuss implications for weighted objectives, horizon-aware curricula, and resource allocation across outputs.


【6】Untargeted analysis of volatile markers of post-exercise fat oxidation in exhaled breath
标题:呼气中运动后脂肪氧化挥发性标志物的非靶向分析
链接:https://arxiv.org/abs/2604.05707

作者:André Homeyer,Júlia Blanka Sziládi,Jan-Philipp Redlich,Jonathan Beauchamp,Y Lan Pham
摘要:呼吸丙酮是一种很有前途的非侵入性生物标志物,可用于监测运动期间的脂肪氧化。然而,它的实用性受到混杂因素的限制,并且浓度的显著变化仅在运动后数小时才出现,这使得实时评估变得困难。我们对可作为丙酮以外脂肪氧化标志物的挥发性有机化合物(VOC)进行了非靶向筛选,并研究了运动期间的呼吸测量是否可以预测运动后脂肪氧化的变化。19名参与者完成了两个25分钟的自行车骑行,中间有5分钟的短暂休息。在运动期间和90分钟的恢复期后,使用质子转移反应飞行时间质谱(PTR-TOF-MS)分析了VOC排放。血液β-羟基丁酸(BOHB)浓度作为脂肪氧化的参考标志物。在PTR-TOF-MS测量中检测到的773个相关分析特征中,只有4个信号与BOHB表现出强相关性($ρ$ ≥ 0.82,p = 0.0002),且全部可归因于丙酮或其同位素体或碎片。这些信号在运动结束时的测量能够准确预测运动后BOHB发生显著变化的参与者(F1评分 ≥ 0.83,准确度 = 0.89)。我们的研究没有发现任何新的基于呼吸的脂肪氧化生物标志物,但证实了丙酮是关键标志物。此外,我们的结果表明,运动期间的呼吸丙酮测量可能已经能够对运动后脂肪氧化做出基本预测。
摘要:Breath acetone represents a promising non-invasive biomarker for monitoring fat oxidation during exercise. However, its utility is limited by confounding factors, as well as by the fact that significant changes in concentration occur only hours post-exercise, which makes real-time assessment difficult. We performed an untargeted screening for volatile organic compounds (VOCs) that could serve as markers of fat oxidation beyond acetone, and investigated whether breath measurements taken during exercise could predict post-exercise changes in fat oxidation. Nineteen participants completed two 25-min cycling sessions separated by a brief 5-min rest period. VOC emissions were analysed using proton-transfer-reaction time-of-flight mass spectrometry (PTR-TOF-MS) during exercise and after a 90-min recovery period. Blood $β$-hydroxybutyrate (BOHB) concentrations served as the reference marker for fat oxidation. Among 773 relevant analytical features detected in the PTR-TOF-MS measurements, only four signals exhibited strong correlations with BOHB ($ρ$ $\geq$ 0.82, p = 0.0002)-all attributable to acetone or its isotopologues or fragments. End-of-exercise measurements of these signals enabled accurate prediction of participants with substantial post-exercise BOHB changes (F1 score $\geq$ 0.83, accuracy = 0.89). Our study did not reveal any novel breath-based biomarkers of fat oxidation, but it confirmed acetone as the key marker. Moreover, our findings suggest that breath acetone measurements during exercise may already enable basic predictions of post-exercise fat oxidation.


【7】Identification and Inference in Nonlinear Dynamic Network Models
标题:非线性动态网络模型中的识别与推理
链接:https://arxiv.org/abs/2604.04961

作者:Diego Vallarino
摘要:我们研究定义在未知相互作用网络上的非线性动态系统的识别与推断。该系统通过一个未观测到的依赖矩阵演化,该矩阵经由非线性算子支配横截面冲击的传播。我们证明网络结构一般不可识别,其识别需要足够的谱异质性。特别地,当网络通过对本征模的异质放大诱导出不可交换的协方差模式时,识别才得以成立。当谱集中时,网络依赖在观测上等价于共同冲击或标量异质性,从而导致不可识别。我们给出了识别的充要条件,刻画了观测等价类,并提出了一个带有渐近理论的半参数估计量。我们还开发了网络依赖性检验,其功效取决于相互作用矩阵的谱性质。这些结果适用于广泛的一类经济模型,包括生产网络、传染模型和动态相互作用系统。
摘要:We study identification and inference in nonlinear dynamic systems defined on unknown interaction networks. The system evolves through an unobserved dependence matrix governing cross-sectional shock propagation via a nonlinear operator. We show that the network structure is not generically identified, and that identification requires sufficient spectral heterogeneity. In particular, identification arises when the network induces non-exchangeable covariance patterns through heterogeneous amplification of eigenmodes. When the spectrum is concentrated, dependence becomes observationally equivalent to common shocks or scalar heterogeneity, leading to non-identification. We provide necessary and sufficient conditions for identification, characterize observational equivalence classes, and propose a semiparametric estimator with asymptotic theory. We also develop tests for network dependence whose power depends on spectral properties of the interaction matrix. The results apply to a broad class of economic models, including production networks, contagion models, and dynamic interaction systems.


检测相关(3篇)

【1】Cross-Machine Anomaly Detection Leveraging Pre-trained Time-series Model
标题:利用预先训练的时间序列模型进行跨机器异常检测
链接:https://arxiv.org/abs/2604.05335

作者:Yangmeng Li,Kei Sano,Toshihiro Kitao,Ryoji Anzaki,Yukiya Saitoh,Hironori Moki,Dragan Djurdjanovic
备注:20 pages, 5 figures, under review at a journal
摘要:实现弹性和高质量的制造需要可靠的数据驱动的异常检测方法,这些方法能够解决名义上相同并执行相同流程的不同机器之间的行为差异。为了解决使用从执行相同程序的不同机器收集的传感数据来检测机器中的异常的问题,本文提出了一种跨机器时间序列异常检测框架,该框架集成了域不变特征提取器和无监督异常检测模块。利用预训练的基础模型MOMENT,提取器采用随机森林分类器将嵌入分解为机器相关和条件相关的特征,后者作为对单个机器之间差异不变的表示。这些细化的特征使下游异常检测器能够有效地推广到看不见的目标机器。从三个不同的机器上收集的工业数据集上执行名义上相同的操作的实验表明,所提出的方法优于基于原始信号和MOMENT嵌入的特征基线,证实了其在增强跨机器泛化的有效性。
摘要:Achieving resilient and high-quality manufacturing requires reliable data-driven anomaly detection methods that are capable of addressing differences in behaviors among different individual machines which are nominally the same and are executing the same processes. To address the problem of detecting anomalies in a machine using sensory data gathered from different individual machines executing the same procedure, this paper proposes a cross-machine time-series anomaly detection framework that integrates a domain-invariant feature extractor with an unsupervised anomaly detection module. Leveraging the pre-trained foundation model MOMENT, the extractor employs Random Forest Classifiers to disentangle embeddings into machine-related and condition-related features, with the latter serving as representations which are invariant to differences between individual machines. These refined features enable the downstream anomaly detectors to generalize effectively to unseen target machines. Experiments on an industrial dataset collected from three different machines performing nominally the same operation demonstrate that the proposed approach outperforms both the raw-signal-based and MOMENT-embedding feature baselines, confirming its effectiveness in enhancing cross-machine generalization.


【2】Belief Dynamics for Detecting Behavioral Shifts in Safe Collaborative Manipulation
标题:检测安全协作操纵行为转变的信念动力学
链接:https://arxiv.org/abs/2604.04967

作者:Devashri Naik,Divake Kumar,Nastaran Darabi,Amit Ranjan Trivedi
摘要:在共享工作空间中运行的机器人必须与其他代理保持安全协调,而这些代理的行为可能在任务执行期间发生变化。当一个协作代理在任务中途切换策略时,在过时的假设下继续执行可能导致不安全的动作和更高的碰撞风险。因此,可靠地检测这种行为状态变化至关重要。我们在ManiSkill共享工作空间操作任务中研究了受控非平稳性下的状态切换检测。在十种检测方法和五个随机种子上,启用检测可将切换后的碰撞减少52%。然而,平均性能掩盖了显著的可靠性差异:在±3步的现实容差下,检测率从86%到30%不等,而在±5步下所有方法都达到100%。我们引入了UA-TOM,一个轻量级的信念跟踪模块,它利用选择性状态空间动态、因果注意力和预测误差信号来增强冻结的视觉-语言-动作(VLA)控制骨干。在5个种子和1200个回合中,UA-TOM在无辅助方法中实现了最高的检测率(±3容差下为85.7%)和最短的近距离时间(4.8步),优于Oracle(5.3步)。分析表明,隐藏状态更新幅度在状态切换时增大17倍,并在大约10个时间步内衰减,而离散化步长收敛到一个近乎恒定的值(Δt ≈ 0.78),表明其灵敏度由学习到的动态而非依赖输入的门控驱动。在Overcooked中的跨域实验显示了因果注意力和预测误差信号的互补作用。UA-TOM仅引入7.4 ms的推理开销(50 ms控制预算的14.8%),从而在不修改基础策略的情况下实现可靠的状态切换检测。
摘要:Robots operating in shared workspaces must maintain safe coordination with other agents whose behavior may change during task execution. When a collaborating agent switches strategy mid-episode, continuing under outdated assumptions can lead to unsafe actions and increased collision risk. Reliable detection of such behavioral regime changes is therefore critical. We study regime-switch detection under controlled non-stationarity in ManiSkill shared-workspace manipulation tasks. Across ten detection methods and five random seeds, enabling detection reduces post-switch collisions by 52%. However, average performance hides significant reliability differences: under a realistic tolerance of +-3 steps, detection ranges from 86% to 30%, while under +-5 steps all methods achieve 100%. We introduce UA-TOM, a lightweight belief-tracking module that augments frozen vision-language-action (VLA) control backbones using selective state-space dynamics, causal attention, and prediction-error signals. Across five seeds and 1200 episodes, UA-TOM achieves the highest detection rate among unassisted methods (85.7% at +-3) and the lowest close-range time (4.8 steps), outperforming an Oracle (5.3 steps). Analysis shows hidden-state update magnitude increases by 17x at regime switches and decays over roughly 10 timesteps, while the discretization step converges to a near-constant value (Delta_t approx 0.78), indicating sensitivity driven by learned dynamics rather than input-dependent gating. Cross-domain experiments in Overcooked show complementary roles of causal attention and prediction-error signals. UA-TOM introduces 7.4 ms inference overhead (14.8% of a 50 ms control budget), enabling reliable regime-switch detection without modifying the base policy.


【3】The Hiremath Early Detection (HED) Score: A Measure-Theoretic Evaluation Standard for Temporal Intelligence
标题:Hiremath早期检测(HED)分数:一种用于时间智能的测度论评估标准
链接:https://arxiv.org/abs/2604.04993

作者:Prakul Sunil Hiremath
备注:11 pages. Introduces a measure-theoretic framework for predictive velocity including the Hiremath Standard Table. Dedicated to the Hiremath lineage
摘要:我们介绍Hiremath早期检测(HED)分数,这是一个有原则的、基于测度论的评估标准,用于量化运行在发生突变状态转换的非平稳随机过程之上的系统中信息的时间价值。现有的评估范式,主要是ROC/AUC框架及其下游变体,是时间不可知的:对于任意大的tau,它们对t + 1处的检测和t + tau处的检测分配相同的功劳。这种对延迟的漠视在网络物理安全、算法监控和流行病学监测等时间关键领域是一个根本性的不足。   HED分数通过从状态转变开始的精确时刻起,在目标状态的后验概率流上积分一个基线中性的指数衰减核来解决这个问题。所得标量同时编码检测敏锐度、时间提前量和转变前的校准质量。我们证明了HED分数满足三个公理化要求:(A1)时间单调性,(A2)对攻击前偏差的不变性,以及(A3)灵敏度可分解性。我们进一步证明,HED分数具有一个由Hiremath衰减常数(lambda_H)索引的自然参数族,其特定领域的校准构成Hiremath标准表。   作为经验载体,我们提出了PARD-SSM(通过切换状态空间模型进行概率异常和状态检测),它将分数阶随机微分方程(fSDE)与切换线性动态系统(S-LDS)推理后端相结合。在NSL-KDD基准上,PARD-SSM的HED分数为0.0643,比随机森林基线(0.0132)提高了388.8%,并通过块自助法重采样证实了统计显著性(p < 0.001)。我们提议将HED分数作为ROC/AUC的后继评估标准。
摘要:We introduce the Hiremath Early Detection (HED) Score, a principled, measure-theoretic evaluation criterion for quantifying the time-value of information in systems operating over non-stationary stochastic processes subject to abrupt regime transitions. Existing evaluation paradigms, chiefly the ROC/AUC framework and its downstream variants, are temporally agnostic: they assign identical credit to a detection at t + 1 and a detection at t + tau for arbitrarily large tau. This indifference to latency is a fundamental inadequacy in time-critical domains including cyber-physical security, algorithmic surveillance, and epidemiological monitoring.   The HED Score resolves this by integrating a baseline-neutral, exponentially decaying kernel over the posterior probability stream of a target regime, beginning precisely at the onset of the regime shift. The resulting scalar simultaneously encodes detection acuity, temporal lead, and pre-transition calibration quality. We prove that the HED Score satisfies three axiomatic requirements: (A1) Temporal Monotonicity, (A2) Invariance to Pre-Attack Bias, and (A3) Sensitivity Decomposability. We further demonstrate that the HED Score admits a natural parametric family indexed by the Hiremath Decay Constant (lambda_H), whose domain-specific calibration constitutes the Hiremath Standard Table.   As an empirical vehicle, we present PARD-SSM (Probabilistic Anomaly and Regime Detection via Switching State-Space Models), which couples fractional Stochastic Differential Equations (fSDEs) with a Switching Linear Dynamical System (S-LDS) inference backend. On the NSL-KDD benchmark, PARD-SSM achieves a HED Score of 0.0643, representing a 388.8 percent improvement over a Random Forest baseline (0.0132), with statistical significance confirmed via block-bootstrap resampling (p < 0.001). We propose the HED Score as the successor evaluation standard to ROC/AUC.
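摘要描述的核心构造——从状态转变起点开始、对后验概率流施加基线中性的指数衰减信用——可以用几行代码示意。注意:具体的核形式、基线取值与归一化都是本示意的假设,论文中的精确定义可能不同;下面同时核验公理(A1)时间单调性与(A2)对转变前偏差的不变性:

```python
import math

def hed_score(posterior, onset, lam=0.1, baseline=0.5):
    # 从状态转变的起点onset开始,对(相对基线的)后验概率流
    # 施加指数衰减的信用。核与归一化为示意用的假设形式。
    return sum(math.exp(-lam * (t - onset)) * (p - baseline)
               for t, p in enumerate(posterior) if t >= onset)

T, onset = 50, 10
early = [0.5] * 12 + [0.95] * (T - 12)   # 转变后2步即检测到
late  = [0.5] * 25 + [0.95] * (T - 25)   # 转变后15步才检测到
assert hed_score(early, onset) > hed_score(late, onset)      # A1:更早的检测得分更高

biased = [0.9] * onset + early[onset:]   # 转变前后验带有偏置
assert hed_score(biased, onset) == hed_score(early, onset)   # A2:转变前的偏差不影响得分
```

A2在此处之所以成立,是因为求和严格从onset开始;而指数核保证了任意提前一步的检测都获得严格更多的信用,这正是ROC/AUC所缺失的时间敏感性。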


分类|识别(1篇)

【1】Optimal Centered Active Excitation in Linear System Identification
标题:线性系统辨识中的最优定心主动激励
链接:https://arxiv.org/abs/2604.05518

作者:Kaito Ito,Alexandre Proutiere
备注:11 pages
摘要:我们提出了一种基于最优中心噪声激励的线性系统辨识主动学习算法。值得注意的是,我们基于普通最小二乘和半定规划的算法达到了最小的样本复杂度,同时允许高效地计算系统矩阵的估计。更具体地说,我们首先为任何主动学习算法建立达到规定精度和置信水平所需样本复杂度的下界。接下来,我们推导出所提算法的样本复杂度上界,它与任何算法的下界在通用常数因子内相匹配。我们的紧界易于解释,并明确显示了其对状态维数等系统参数的依赖性。
摘要:We propose an active learning algorithm for linear system identification with optimal centered noise excitation. Notably, our algorithm, based on ordinary least squares and semidefinite programming, attains the minimal sample complexity while allowing for efficient computation of an estimate of a system matrix. More specifically, we first establish lower bounds of the sample complexity for any active learning algorithm to attain the prescribed accuracy and confidence levels. Next, we derive a sample complexity upper bound of the proposed algorithm, which matches the lower bound for any algorithm up to universal factors. Our tight bounds are easy to interpret and explicitly show their dependence on the system parameters such as the state dimension.


编码器(2篇)

【1】General Multimodal Protein Design Enables DNA-Encoding of Chemistry
标题:通用多模态蛋白质设计使化学的DNA编码成为可能
链接:https://arxiv.org/abs/2604.05181

作者:Jarrid Rector-Brooks,Théophile Lambert,Marta Skreta,Daniel Roth,Yueming Long,Zi-Qi Li,Xi Zhang,Miruna Cretu,Francesca-Zhoufan Li,Tanvi Ganapathy,Emily Jin,Avishek Joey Bose,Jason Yang,Kirill Neklyudov,Yoshua Bengio,Alexander Tong,Frances H. Arnold,Cheng-Hao Liu
摘要:进化是产生酶多样性的非凡引擎,但它所探索的化学仍然只是DNA所能编码范围的一小部分。深度生成模型可以设计结合配体的新蛋白质,但还没有模型能够在不预先指定催化残基的情况下创造酶。我们介绍了DISCO(DIffusion for Sequence-structure CO-design),这是一种可围绕任意生物分子协同设计蛋白质序列和3D结构的多模态模型,以及同时优化两种模态目标的推理时缩放方法。仅以反应性中间体为条件,DISCO设计出具有新型活性位点几何结构的多种血红素酶。这些酶催化自然界中前所未有的卡宾转移反应,包括烯烃环丙烷化、螺环丙烷化、B-H和C(sp$^3$)-H插入,其活性超过了工程化的酶。对选定设计的随机诱变进一步证实,酶活性可以通过定向进化得到改善。通过提供一条通往可进化酶的可扩展途径,DISCO拓宽了可被遗传编码的转化反应的潜在范围。代码可在https://github.com/DISCO-design/DISCO上获得。
摘要:Evolution is an extraordinary engine for enzymatic diversity, yet the chemistry it has explored remains a narrow slice of what DNA can encode. Deep generative models can design new proteins that bind ligands, but none have created enzymes without pre-specifying catalytic residues. We introduce DISCO (DIffusion for Sequence-structure CO-design), a multimodal model that co-designs protein sequence and 3D structure around arbitrary biomolecules, as well as inference-time scaling methods that optimize objectives across both modalities. Conditioned solely on reactive intermediates, DISCO designs diverse heme enzymes with novel active-site geometries. These enzymes catalyze new-to-nature carbene-transfer reactions, including alkene cyclopropanation, spirocyclopropanation, B-H, and C(sp$^3$)-H insertions, with high activities exceeding those of engineered enzymes. Random mutagenesis of a selected design further confirmed that enzyme activity can be improved through directed evolution. By providing a scalable route to evolvable enzymes, DISCO broadens the potential scope of genetically encodable transformations. Code is available at https://github.com/DISCO-design/DISCO.


【2】Shot-Based Quantum Encoding: A Data-Loading Paradigm for Quantum Neural Networks
标题:基于测量次数(shot)的量子编码:量子神经网络的数据加载范式
链接:https://arxiv.org/abs/2604.06135

作者:Basil Kyriacou,Viktoria Patapovich,Maniraman Periyasamy,Alexey Melnikov
备注:6 pages, 2 figures, 0 tables
摘要:高效的数据加载仍然是近期量子机器学习的瓶颈。现有的方案(角度、幅度和基编码)要么未能充分利用指数级的希尔伯特空间容量,要么需要超出有噪声中等规模量子硬件相干预算的电路深度。我们介绍了基于测量次数的量子编码(SBQE),这是一种数据嵌入策略,它根据依赖于数据的经典分布,将硬件的原生资源——测量次数(shots)——分配到多个初始量子态上。通过将测量次数视为可学习的自由度,SBQE产生一种混合态表示,其期望值在经典概率上是线性的,因此可以与非线性激活函数组合。我们表明,SBQE在结构上等价于一个权重由量子电路实现的多层感知器,并描述了一个与硬件兼容的实现协议。在Fashion MNIST和Semeion手写数字上的基准测试(每个模型十次独立初始化)表明,SBQE在Semeion上达到89.1% ± 0.9%的测试准确率(相对于幅度编码误差降低5.3%,并与宽度匹配的经典网络持平),在Fashion MNIST上达到80.95% ± 0.10%(超过幅度编码2.0%、线性多层感知器1.3%),而这一切都不需要任何数据编码门。
摘要:Efficient data loading remains a bottleneck for near-term quantum machine-learning. Existing schemes (angle, amplitude, and basis encoding) either underuse the exponential Hilbert-space capacity or require circuit depths that exceed the coherence budgets of noisy intermediate-scale quantum hardware. We introduce Shot-Based Quantum Encoding (SBQE), a data embedding strategy that distributes the hardware's native resource, shots, according to a data-dependent classical distribution over multiple initial quantum states. By treating the shot counts as a learnable degree of freedom, SBQE produces a mixed-state representation whose expectation values are linear in the classical probabilities and can therefore be composed with non-linear activation functions. We show that SBQE is structurally equivalent to a multilayer perceptron whose weights are realised by quantum circuits, and we describe a hardware-compatible implementation protocol. Benchmarks on Fashion MNIST and Semeion handwritten digits, with ten independent initialisations per model, show that SBQE achieves 89.1% +/- 0.9% test accuracy on Semeion (reducing error by 5.3% relative to amplitude encoding and matching a width-matched classical network) and 80.95% +/- 0.10% on Fashion MNIST (exceeding amplitude encoding by +2.0% and a linear multilayer perceptron by +1.3%), all without any data-encoding gates.
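摘要中"期望值在经典概率上是线性的,因此可与非线性激活组合"这一点可以用一个最小的numpy草图示意:按shot分布$p$混合若干初态时,可观测量的期望$E(p)=\sum_i p_i \langle\psi_i|O|\psi_i\rangle$正是一个以期望值为权重的线性层。态、可观测量与维度均为本示意的假设,与论文的具体电路无关:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 4, 3                                   # 两量子比特态空间,3个初始态(示意用)
psi = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
O = np.diag([1.0, -1.0, 1.0, -1.0])           # 一个可观测量(第二量子比特上的Z)

# 混合态期望 E(p) = sum_i p_i <psi_i|O|psi_i> 是以
# w_i = <psi_i|O|psi_i> 为"权重"的线性函数
w = np.real(np.einsum('id,de,ie->i', psi.conj(), O, psi))

def sbqe_unit(p):
    return np.tanh(w @ p)                     # 对p线性,再接经典非线性激活

p1 = np.array([0.5, 0.3, 0.2]); p2 = np.array([0.1, 0.1, 0.8]); a = 0.4
# 期望值对shot分布是线性的
assert np.isclose(w @ (a * p1 + (1 - a) * p2), a * (w @ p1) + (1 - a) * (w @ p2))
```

这也解释了与多层感知器的结构等价性:经典分布$p$扮演输入,量子期望值$w$扮演由电路实现的权重。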


优化|敛散性(9篇)

【1】Target Policy Optimization
标题:目标政策优化
链接:https://arxiv.org/abs/2604.06159

作者:Jean Kaddour
摘要:在RL中,给定一个提示,我们从模型中采样一组完成并对其评分。随后有两个问题:哪些完成应该获得概率质量,以及参数应该如何移动以实现这种变化?标准的策略梯度方法同时回答这两个问题,因此更新可能会根据学习率、裁剪和其他优化器选择而过冲或欠冲。我们引入了目标策略优化(TPO),它将这两个问题分开。给定已评分的完成,TPO构造一个目标分布$q_i \propto p_i^{\,\mathrm{old}} \exp(u_i)$,并通过交叉熵将策略拟合到它。采样完成的logits上的损失梯度是$p^θ - q$,一旦策略匹配目标它就消失。在表格老虎机、Transformer序列任务和十亿参数LLM RLVR上,TPO在简单任务上与PG、PPO、GRPO和DG持平,并在稀疏奖励下显著优于它们。代码可在https://github.com/JeanKaddour/tpo上获得。
摘要:In RL, given a prompt, we sample a group of completions from a model and score them. Two questions follow: which completions should gain probability mass, and how should the parameters move to realize that change? Standard policy-gradient methods answer both at once, so the update can overshoot or undershoot depending on the learning rate, clipping, and other optimizer choices. We introduce \emph{Target Policy Optimization} (TPO), which separates the two questions. Given scored completions, TPO constructs a target distribution $q_i \propto p_i^{\,\mathrm{old}} \exp(u_i)$ and fits the policy to it by cross-entropy. The loss gradient on sampled-completion logits is $p^θ- q$, which vanishes once the policy matches the target. On tabular bandits, transformer sequence tasks, and billion-parameter LLM RLVR, TPO matches PG, PPO, GRPO, and DG on easy tasks and substantially outperforms them under sparse reward. Code is available at https://github.com/JeanKaddour/tpo.
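摘要给出的构造足够具体,可以用一个最小的numpy草图直接核验:构造目标分布$q_i \propto p_i^{\mathrm{old}} \exp(u_i)$,用有限差分验证交叉熵损失对logits的梯度确为$p^θ - q$,并展示沿该梯度下降后策略收敛到目标(维度与数值均为示意用的假设):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
logits_old = rng.standard_normal(5)           # 5个采样完成的旧策略logits(示意用)
u = rng.standard_normal(5)                    # 每个完成的得分 u_i
p_old = softmax(logits_old)

# 目标分布 q_i ∝ p_i^old * exp(u_i)
q = p_old * np.exp(u)
q /= q.sum()

# 交叉熵损失 CE(q, softmax(z)) 对logits的梯度恰为 p^theta - q
def ce(z):
    return -(q * np.log(softmax(z))).sum()

z = rng.standard_normal(5)
eps = 1e-6
g_num = np.array([(ce(z + eps * np.eye(5)[k]) - ce(z - eps * np.eye(5)[k])) / (2 * eps)
                  for k in range(5)])
assert np.allclose(g_num, softmax(z) - q, atol=1e-6)

# 沿该梯度下降;一旦策略匹配目标,梯度消失,更新自然停止
for _ in range(5000):
    z -= 1.0 * (softmax(z) - q)
assert np.allclose(softmax(z), q, atol=1e-6)
```

梯度$p^θ - q$在$p^θ = q$处为零,这正是摘要所强调的:更新幅度由"离目标多远"决定,而不依赖裁剪等优化器技巧来防止过冲。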


【2】PromptEvolver: Prompt Inversion through Evolutionary Optimization in Natural-Language Space
标题:PromptEvolver:通过自然语言空间中的进化优化实现提示反演
链接:https://arxiv.org/abs/2604.06061

作者:Asaf Buchnick,Aviv Shamsian,Aviv Navon,Ethan Fetaya
摘要:文本到图像生成进展迅速,但要忠实地生成复杂场景,需要大量试错才能找到确切的提示。在提示反演任务中,目标是恢复能够忠实重建给定目标图像的文本提示。目前,现有方法经常产生次优的重建,并生成不自然、难以解释的提示,从而妨碍透明度和可控性。在这项工作中,我们提出了PromptEvolver,一种提示反演方法,它在实现目标图像高保真重建的同时生成自然语言提示。我们的方法使用遗传算法优化提示,并利用一个强大的视觉语言模型来指导进化过程。重要的是,它只需要图像输出,因此适用于黑盒生成模型。最后,我们在多个提示反演基准上评估了PromptEvolver,并表明它始终优于竞争方法。
摘要:Text-to-image generation has progressed rapidly, but faithfully generating complex scenes requires extensive trial-and-error to find the exact prompt. In the prompt inversion task, the goal is to recover a textual prompt that can faithfully reconstruct a given target image. Currently, existing methods frequently yield suboptimal reconstructions and produce unnatural, hard-to-interpret prompts that hinder transparency and controllability. In this work, we present PromptEvolver, a prompt inversion approach that generates natural-language prompts while achieving high-fidelity reconstructions of the target image. Our method uses a genetic algorithm to optimize the prompt, leveraging a strong vision-language model to guide the evolution process. Importantly, it works on black-box generation models by requiring only image outputs. Finally, we evaluate PromptEvolver across multiple prompt inversion benchmarks and show that it consistently outperforms competing methods.
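"只需黑盒分数即可进化提示"这一思路可以用一个玩具遗传算法示意。下面用一个隐藏目标字符串的匹配度代替真实的图像相似度评分器(目标、词表与超参数均为本示意的假设,并非论文的实际流程):

```python
import random
random.seed(0)

TARGET = "a red cube on a table"    # 隐藏目标:其匹配度代替黑盒的图像相似度评分
VOCAB = "abcdefghijklmnopqrstuvwxyz "

def score(prompt):                  # 黑盒适应度:只返回分数,不暴露内部梯度
    return sum(a == b for a, b in zip(prompt, TARGET))

def mutate(p):                      # 随机替换一个字符
    i = random.randrange(len(p))
    return p[:i] + random.choice(VOCAB) + p[i + 1:]

pop = ["".join(random.choice(VOCAB) for _ in TARGET) for _ in range(40)]
init_best = max(map(score, pop))
for _ in range(400):                # 精英保留 + 变异的简单(10+30)进化策略
    pop.sort(key=score, reverse=True)
    if score(pop[0]) == len(TARGET):
        break
    elite = pop[:10]
    pop = elite + [mutate(random.choice(elite)) for _ in range(30)]

final_best = max(map(score, pop))
assert final_best > init_best       # 仅凭黑盒分数即可持续改进候选提示
assert final_best >= 15
```

实际系统中,变异与交叉由视觉语言模型在自然语言空间中执行(改写、增删描述词),适应度来自生成图像与目标图像的相似度,但"只依赖分数、无需梯度"的黑盒结构与此相同。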


【3】QiMeng-PRepair: Precise Code Repair via Edit-Aware Reward Optimization
标题:QiMeng-PRepair:通过编辑感知奖励优化进行精确代码修复
链接:https://arxiv.org/abs/2604.05963

作者:Changxin Ke,Rui Zhang,Jiaming Guo,Yuanbo Wen,Li Ding,Shuo Wang,Xuyuan Zhu,Xiong Peng,Di Huang,Zidong Du,Xing Hu,Qi Guo,Yunji Chen
备注:Accepted to ACL 2026 main conference
摘要:大型语言模型(LLM)具有很强的程序修复性能,但经常遭受过度编辑:过度的修改会覆盖正确的代码并妨碍错误定位。我们系统地量化了其影响,并引入精确修复任务,即在只修复错误部分的同时最大限度地重用正确代码。基于这一见解,我们提出了PRepair,一个缓解过度编辑、提高修复准确性的框架。PRepair有两个组件:Self-Breaking通过受控的bug注入和min-max采样生成多样的含错程序;Self-Repairing通过带有编辑感知奖励的编辑感知组相对策略优化(EA-GRPO)训练模型,鼓励最小但正确的编辑。实验表明,在$\mathrm{fix}_1@1$(联合考虑修复正确性和修改程度的度量)下,PRepair将修复精度最多提高31.4%,并在结合推测性编辑时显著提升解码吞吐量,展示了其在精确且实用的代码修复方面的潜力。
摘要:Large Language Models (LLMs) achieve strong program repair performance but often suffer from over-editing, where excessive modifications overwrite correct code and hinder bug localization. We systematically quantify its impact and introduce precise repair task, which maximizes reuse of correct code while fixing only buggy parts. Building on this insight, we propose PRepair, a framework that mitigates over-editing and improves repair accuracy. PRepair has two components: Self-Breaking, which generates diverse buggy programs via controlled bug injection and min-max sampling, and Self-Repairing, which trains models with Edit-Aware Group Relative Policy Optimization (EA-GRPO) using an edit-aware reward to encourage minimal yet correct edits. Experiments show that PRepair improves repair precision by up to 31.4% under $\mathrm{fix}_1@1$, a metric that jointly considers repair correctness and extent, and significantly increases decoding throughput when combined with speculative editing, demonstrating its potential for precise and practical code repair.
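"编辑感知奖励"的核心思想——在奖励正确性的同时惩罚对正确代码的不必要改写——可以用几行代码示意。注意:下面的奖励形式(正确性加上与原代码的相似度)是本示意的假设,论文中EA-GRPO的精确奖励可能不同:

```python
import difflib

def edit_aware_reward(original, patched, tests_pass, alpha=0.5):
    # 奖励正确的修复,同时惩罚对正确代码的不必要改写(示意用的假设形式)
    similarity = difflib.SequenceMatcher(None, original, patched).ratio()
    return (1.0 if tests_pass else 0.0) + alpha * similarity

buggy   = "def add(a, b):\n    return a - b\n"
minimal = "def add(a, b):\n    return a + b\n"                            # 只改了出错的运算符
rewrite = "def addition(x, y):\n    result = x + y\n    return result\n"  # 正确但过度编辑

r_min = edit_aware_reward(buggy, minimal, tests_pass=True)
r_big = edit_aware_reward(buggy, rewrite, tests_pass=True)
assert r_min > r_big                                              # 最小正确编辑优于过度编辑
assert edit_aware_reward(buggy, buggy, tests_pass=False) < r_min  # 不修复则无正确性奖励
```

这种奖励结构也解释了摘要末尾的吞吐量收益:编辑越少,补丁与原代码的重合越多,推测性编辑可直接复用的token就越多。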


【4】Neural Network Pruning via QUBO Optimization
标题:通过QUBO优化进行神经网络修剪
链接:https://arxiv.org/abs/2604.05856

作者:Osama Orabi,Artur Zagitov,Hadi Salloum,Viktor A. Lobachev,Kasymkhan Khubiev,Yaroslav Kholodov
备注:13 pages, 5 figures, 4 tables
摘要:神经网络修剪可以表述为一个组合优化问题,但大多数现有方法依赖于忽略滤波器之间复杂交互的贪婪启发式。诸如二次无约束二进制优化(QUBO)之类的形式化优化方法提供了一个有原则的替代方案,但由于基于L1范数等指标的过于简化的目标表述,迄今表现不佳。在这项工作中,我们提出了一个统一的混合QUBO框架,将启发式重要性估计与全局组合优化连接起来。我们的表述将梯度感知的敏感性度量——特别是一阶泰勒和二阶Fisher信息——集成到线性项中,同时在二次项中利用数据驱动的激活相似性。这使QUBO目标能够联合捕获单个滤波器的相关性和滤波器间的功能冗余。我们进一步引入了动态容量驱动搜索,在不扭曲优化景观的情况下严格执行目标稀疏度。最后,我们采用两阶段流水线,其中包含张量训练(TT)细化阶段——一个无梯度优化器,可直接针对真实评估指标微调QUBO导出的解。在SIDD图像去噪数据集上的实验表明,所提出的混合QUBO显著优于贪婪泰勒修剪和传统的基于L1的QUBO,并且在适当的组合规模下,TT细化能带来进一步的一致增益。这突出了混合组合表述在鲁棒、可扩展和可解释的神经网络压缩方面的潜力。
摘要:Neural network pruning can be formulated as a combinatorial optimization problem, yet most existing approaches rely on greedy heuristics that ignore complex interactions between filters. Formal optimization methods such as Quadratic Unconstrained Binary Optimization (QUBO) provide a principled alternative but have so far underperformed due to oversimplified objective formulations based on metrics like the L1-norm. In this work, we propose a unified Hybrid QUBO framework that bridges heuristic importance estimation with global combinatorial optimization. Our formulation integrates gradient-aware sensitivity metrics - specifically first-order Taylor and second-order Fisher information - into the linear term, while utilizing data-driven activation similarity in the quadratic term. This allows the QUBO objective to jointly capture individual filter relevance and inter-filter functional redundancy. We further introduce a dynamic capacity-driven search to strictly enforce target sparsity without distorting the optimization landscape. Finally, we employ a two-stage pipeline featuring a Tensor-Train (TT) Refinement stage - a gradient-free optimizer that fine-tunes the QUBO-derived solution directly against the true evaluation metric. Experiments on the SIDD image denoising dataset demonstrate that the proposed Hybrid QUBO significantly outperforms both greedy Taylor pruning and traditional L1-based QUBO, with TT Refinement providing further consistent gains at appropriate combinatorial scales. This highlights the potential of hybrid combinatorial formulations for robust, scalable, and interpretable neural network compression.
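摘要描述的QUBO结构——线性项放单滤波器重要性、二次项放滤波器间相似度(冗余)、再加基数惩罚以执行目标稀疏度——可以在玩具规模上穷举求解来示意。数值、惩罚权重与规模均为本示意的假设(实际求解会使用退火器或启发式,而非穷举):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3                            # 6个滤波器,保留3个(小规模,便于穷举)
importance = rng.random(n)             # 线性项:单滤波器重要性(如Taylor/Fisher敏感度)
S = rng.random((n, n)); S = (S + S.T) / 2
np.fill_diagonal(S, 0)                 # 二次项:滤波器两两间的激活相似度(功能冗余)

lam = 10.0                             # 基数约束的惩罚权重
def qubo_energy(x):
    # 最小化:-重要性收益 + 保留滤波器之间的冗余 + 稀疏度惩罚
    return -importance @ x + x @ S @ x + lam * (x.sum() - k) ** 2

# 玩具规模下对全部2^n个二进制保留向量穷举
best = min((np.array(x) for x in itertools.product([0, 1], repeat=n)),
           key=qubo_energy)
assert best.sum() == k                 # 足够大的惩罚严格执行目标稀疏度
```

与贪婪地按重要性逐个保留不同,二次项会避免同时保留两个高度相似的滤波器,这正是摘要中"联合捕获单滤波器相关性与滤波器间冗余"的含义。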


【5】Curvature-Aware Optimization for High-Accuracy Physics-Informed Neural Networks
标题:高精度物理信息神经网络的曲率感知优化
链接:https://arxiv.org/abs/2604.05230

作者:Anas Jnini,Elham Kiyani,Khemraj Shukla,Jorge F. Urban,Nazanin Ahmadi Daryakenari,Johannes Muller,Marius Zeinhofer,George Em Karniadakis
备注:54 pages, 24 figures
摘要:高效和鲁棒的优化对神经网络至关重要,它使科学机器学习模型能够快速收敛到非常高的精度,从而忠实地捕捉由微分方程控制的复杂物理行为。在这项工作中,我们提出了先进的优化策略,以加速物理信息神经网络(PINN)在具有挑战性的偏微分方程(PDE)和常微分方程(ODE)上的收敛。具体来说,我们提供了自然梯度(NG)优化器、自缩放BFGS和Broyden优化器的高效实现,并在包括亥姆霍兹方程、斯托克斯流、无粘Burgers方程、高速流动的欧拉方程,以及药代动力学和药效学中出现的刚性常微分方程等问题上展示了它们的性能。除了优化器的开发,我们还提出了新的基于PINN的方法来求解无粘Burgers方程和欧拉方程,并将所得解与高阶数值方法进行比较,以提供严格和公平的评估。最后,我们解决了为批量训练扩展这些拟牛顿优化器的挑战,为大型数据驱动问题提供了高效和可扩展的解决方案。
摘要:Efficient and robust optimization is essential for neural networks, enabling scientific machine learning models to converge rapidly to very high accuracy -- faithfully capturing complex physical behavior governed by differential equations. In this work, we present advanced optimization strategies to accelerate the convergence of physics-informed neural networks (PINNs) for challenging partial (PDEs) and ordinary differential equations (ODEs). Specifically, we provide efficient implementations of the Natural Gradient (NG) optimizer, Self-Scaling BFGS and Broyden optimizers, and demonstrate their performance on problems including the Helmholtz equation, Stokes flow, inviscid Burgers equation, Euler equations for high-speed flows, and stiff ODEs arising in pharmacokinetics and pharmacodynamics. Beyond optimizer development, we also propose new PINN-based methods for solving the inviscid Burgers and Euler equations, and compare the resulting solutions against high-order numerical methods to provide a rigorous and fair assessment. Finally, we address the challenge of scaling these quasi-Newton optimizers for batched training, enabling efficient and scalable solutions for large data-driven problems.
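A minimal illustration of why curvature-aware optimizers matter on stiff, ill-conditioned losses like PINN residuals (this is a generic Newton-vs-gradient-descent sketch, not the paper's NG/BFGS implementations):

```python
# On an ill-conditioned quadratic loss, a curvature-aware Newton step
# converges immediately, while plain gradient descent (step size limited by
# the largest curvature) crawls along the flat direction.

def grad(x, y):
    # Gradient of 0.5*((x-1)^2 + 100*(y-1)^2); Hessian is diag(1, 100).
    return (x - 1.0), 100.0 * (y - 1.0)

# Gradient descent with lr = 0.01 (stability bound ~2/100).
gx, gy = 0.0, 0.0
for _ in range(50):
    dx, dy = grad(gx, gy)
    gx, gy = gx - 0.01 * dx, gy - 0.01 * dy

# Newton step: rescale each gradient component by the inverse curvature.
nx, ny = 0.0, 0.0
dx, dy = grad(nx, ny)
nx, ny = nx - dx / 1.0, ny - dy / 100.0

print(abs(gx - 1.0), abs(nx - 1.0))  # GD still far in x; Newton exact
```

Quasi-Newton methods such as (self-scaling) BFGS build this curvature rescaling from gradient history instead of an explicit Hessian, which is what makes them attractive for the badly conditioned loss landscapes of PINNs.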


【6】FNO$^{\angle θ}$: Extended Fourier neural operator for learning state and optimal control of distributed parameter systems
标题:FNO$^{\angle θ}$:用于学习分布参数系统状态和最优控制的扩展傅立叶神经算子
链接:https://arxiv.org/abs/2604.05187

作者:Zhexian Li,Ketan Savla
备注:6 pages, 3 figures
摘要:我们提出了一种扩展的傅立叶神经算子(FNO)架构,用于学习由偏微分方程控制的系统的状态和线性二次可加最优控制。利用Ehrenpreis-Palamodov基本原理,我们证明了常系数线性偏微分方程的任何状态和最优控制都可以表示为复域中的一个积分。该表示的被积函数涉及与傅立叶逆变换中相同的指数项,而后者被用于表示FNO层中的卷积算子。基于这一观察,我们将FNO层中傅里叶逆变换的频率变量从实域扩展到复域,以捕捉来自基本原理的积分表示。我们展示了该方法在非线性Burgers方程的状态和最优控制学习上的性能,显示出训练误差上数量级的改进,以及对非周期边界值比FNO更准确的预测。
摘要:We propose an extended Fourier neural operator (FNO) architecture for learning state and linear quadratic additive optimal control of systems governed by partial differential equations. Using the Ehrenpreis-Palamodov fundamental principle, we show that any state and optimal control of linear PDEs with constant coefficients can be represented as an integral in the complex domain. The integrand of this representation involves the same exponential term as in the inverse Fourier transform, where the latter is used to represent the convolution operator in FNO layer. Motivated by this observation, we modify the FNO layer by extending the frequency variable in the inverse Fourier transform from the real to complex domain to capture the integral representation from the fundamental principle. We illustrate the performance of FNO in learning state and optimal control for the nonlinear Burgers' equation, showing order of magnitude improvements in training errors and more accurate predictions of non-periodic boundary values over FNO.


【7】Energy-Based Dynamical Models for Neurocomputation, Learning, and Optimization
标题:用于神经计算、学习和优化的基于能量的动态模型
链接:https://arxiv.org/abs/2604.05042

作者:Arthur N. Montanari,Francesco Bullo,Dmitry Krotov,Adilson E. Motter
摘要:控制理论、神经科学和机器学习交叉领域的最新进展揭示了动力系统执行计算的新机制。这些进展涵盖了广泛的概念、数学和计算思想,应用于模型学习和训练、记忆检索、数据驱动控制和优化。本教程重点介绍旨在提高这些任务的可扩展性、鲁棒性和能源效率的神经启发计算方法,弥合人工和生物系统之间的差距。特别强调通过梯度流和能量景观编码信息的基于能量的动态模型。我们首先回顾经典的表述,如连续时间Hopfield网络和玻尔兹曼机,然后将该框架扩展到现代发展。这些包括用于大容量存储的密集关联记忆模型、用于大规模优化的基于振荡器的网络,以及用于复合与约束重建的近端下降动力学。本教程演示了控制理论原理如何指导下一代神经计算系统的设计,将讨论引向超越传统前馈和基于反向传播的人工智能方法。
摘要:Recent advances at the intersection of control theory, neuroscience, and machine learning have revealed novel mechanisms by which dynamical systems perform computation. These advances encompass a wide range of conceptual, mathematical, and computational ideas, with applications for model learning and training, memory retrieval, data-driven control, and optimization. This tutorial focuses on neuro-inspired approaches to computation that aim to improve scalability, robustness, and energy efficiency across such tasks, bridging the gap between artificial and biological systems. Particular emphasis is placed on energy-based dynamical models that encode information through gradient flows and energy landscapes. We begin by reviewing classical formulations, such as continuous-time Hopfield networks and Boltzmann machines, and then extend the framework to modern developments. These include dense associative memory models for high-capacity storage, oscillator-based networks for large-scale optimization, and proximal-descent dynamics for composite and constrained reconstruction. The tutorial demonstrates how control-theoretic principles can guide the design of next-generation neurocomputing systems, steering the discussion beyond conventional feedforward and backpropagation-based approaches to artificial intelligence.
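The energy-descent idea behind Hopfield networks can be sketched in a few lines. Below is a discrete Hopfield network storing a single pattern with Hebbian weights (a textbook toy, not any specific model from the tutorial): asynchronous sign updates never increase the energy $E(s) = -\tfrac{1}{2}s^\top W s$, so the dynamics flow downhill to the stored memory.

```python
# Discrete Hopfield network: store one pattern, corrupt one bit, and let
# asynchronous updates descend the energy landscape back to the memory.

pattern = [1, -1, 1, -1]
n = len(pattern)
# Hebbian weights with zero self-coupling.
W = [[(pattern[i] * pattern[j]) / n if i != j else 0.0 for j in range(n)]
     for i in range(n)]

def energy(s):
    return -0.5 * sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

state = [1, -1, -1, -1]          # corrupted copy of the pattern (bit 2 flipped)
energies = [energy(state)]
for _ in range(2):               # two asynchronous sweeps
    for i in range(n):
        h = sum(W[i][j] * state[j] for j in range(n))
        if h != 0:
            state[i] = 1 if h > 0 else -1
        energies.append(energy(state))

print(state)  # recovered pattern; the energy sequence is non-increasing
```

The same picture, with continuous states and gradient-flow dynamics, underlies the continuous-time Hopfield networks and the modern dense associative memories the tutorial surveys.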


【8】Efficient machine unlearning with minimax optimality
标题:具有极小极大最优性的高效机器去学习
链接:https://arxiv.org/abs/2604.05669

作者:Jingyi Xie,Linjun Zhang,Sai Li
摘要:为遵守GDPR等法规并减轻有偏或损坏数据的影响,人们对高效数据删除的需求日益增长。这推动了机器去学习领域的发展,该领域旨在消除特定数据子集的影响,而无需完全重新训练的代价。在这项工作中,我们提出了一个具有通用损失函数的机器去学习统计框架,并建立了理论保证。特别地,对于平方损失,我们开发了去学习最小二乘(ULS),并在仅有预训练估计器、遗忘样本和剩余数据的小子样本可用时,建立了它在估计剩余数据模型参数上的极小极大最优性。我们的结果表明,估计误差分解为一个oracle项和一个由遗忘比例和遗忘模型偏差决定的去学习成本。我们进一步建立了无需完全重新训练的渐近有效推断程序。数值实验和真实数据应用表明,所提出的方法实现了接近重新训练的性能,同时需要的数据访问大大减少。
摘要:There is a growing demand for efficient data removal to comply with regulations like the GDPR and to mitigate the influence of biased or corrupted data. This has motivated the field of machine unlearning, which aims to eliminate the influence of specific data subsets without the cost of full retraining. In this work, we propose a statistical framework for machine unlearning with generic loss functions and establish theoretical guarantees. For squared loss, especially, we develop Unlearning Least Squares (ULS) and establish its minimax optimality for estimating the model parameter of remaining data when only the pre-trained estimator, forget samples, and a small subsample of the remaining data are available. Our results reveal that the estimation error decomposes into an oracle term and an unlearning cost determined by the forget proportion and the forget model bias. We further establish asymptotically valid inference procedures without requiring full retraining. Numerical experiments and real-data applications demonstrate that the proposed method achieves performance close to retraining while requiring substantially less data access.
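For squared loss, the basic mechanism that makes unlearning cheaper than retraining can be shown with the classical normal-equation downdating identity (a simplified illustration of the setting, not the paper's ULS estimator, which also exploits a subsample of the remaining data): subtracting the forget set's contributions to $X^\top X$ and $X^\top y$ reproduces the retrained fit exactly.

```python
# Exact unlearning for OLS via normal-equation downdating: the downdated
# solve equals retraining on the remaining rows, without touching them all.

def normal_eqs(rows):
    """Accumulate X^T X and X^T y for 2-feature rows ((x0, x1), y)."""
    sxx = [[0.0, 0.0], [0.0, 0.0]]
    sxy = [0.0, 0.0]
    for (x0, x1), y in rows:
        sxx[0][0] += x0 * x0; sxx[0][1] += x0 * x1
        sxx[1][0] += x1 * x0; sxx[1][1] += x1 * x1
        sxy[0] += x0 * y;     sxy[1] += x1 * y
    return sxx, sxy

def solve2(a, b):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return ((a[1][1] * b[0] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det)

# Synthetic data: y = 2 + 3x + fixed "noise" (illustrative values).
data = [((1.0, float(x)), 2.0 + 3.0 * x + e)
        for x, e in zip(range(6), [0.1, -0.2, 0.0, 0.3, -0.1, 0.2])]
forget, remain = data[4:], data[:4]

sxx, sxy = normal_eqs(data)      # "pretrained" sufficient statistics
fxx, fxy = normal_eqs(forget)    # forget-set contributions
down = [[sxx[i][j] - fxx[i][j] for j in range(2)] for i in range(2)]
downy = [sxy[i] - fxy[i] for i in range(2)]

beta_unlearn = solve2(down, downy)
beta_retrain = solve2(*normal_eqs(remain))
print(beta_unlearn, beta_retrain)  # identical up to floating point
```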


【9】Parametric Nonconvex Optimization via Convex Surrogates
标题:通过凸代理的参数非凸优化
链接:https://arxiv.org/abs/2604.05640

作者:Renzi Wang,Panagiotis Patrinos,Alberto Bemporad
摘要:本文提出了一种新的基于学习的方法来构造一个代理问题,以近似给定的参数非凸优化问题。代理函数被设计为有限个函数的最小值,这些函数由凸项与单调项的复合给出,使得代理问题可以通过并行凸优化直接求解。作为概念验证,在一个非凸路径跟踪问题上的数值实验证实了所提方法的近似质量。
摘要:This paper presents a novel learning-based approach to construct a surrogate problem that approximates a given parametric nonconvex optimization problem. The surrogate function is designed to be the minimum of a finite set of functions, given by the composition of convex and monotonic terms, so that the surrogate problem can be solved directly through parallel convex optimization. As a proof of concept, numerical experiments on a nonconvex path tracking problem confirm the approximation quality of the proposed method.


预测|估计(5篇)

【1】Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement
标题:通过多令牌预测和潜在语义增强实现一致的世界模型
链接:https://arxiv.org/abs/2604.06155

作者:Qimin Zhong,Hao Liao,Haiming Qin,Mingyang Zhou,Rui Mao,Wei Chen,Naipeng Chao
备注:ACL 2026 Main Conference
摘要:大型语言模型(LLM)是否发展出连贯的内部世界模型仍然是一个核心争论。虽然传统的下一个令牌预测(NTP)专注于一步超前的监督,但多令牌预测(MTP)在学习更结构化的表示方面展现出希望。在这项工作中,我们从理论角度分析了MTP的梯度归纳偏置,并以经验证据支持,表明MTP通过梯度耦合诱导表示收缩,从而促进向内部信念状态的收敛。然而,我们发现,标准MTP经常遭受结构性幻觉:离散令牌监督鼓励潜在空间中违反环境约束的非法捷径。为了解决这个问题,我们提出了一种新方法,即潜在语义增强MTP(LSE-MTP),它将预测锚定到真实隐藏状态轨迹。在合成图和真实世界的Manhattan Taxi Ride上的实验表明,LSE-MTP有效地弥合了离散令牌和连续状态表示之间的差距,增强了表示对齐,减少了结构幻觉,并提高了对扰动的鲁棒性。
摘要 :Whether Large Language Models (LLMs) develop coherent internal world models remains a core debate. While conventional Next-Token Prediction (NTP) focuses on one-step-ahead supervision, Multi-Token Prediction (MTP) has shown promise in learning more structured representations. In this work, we provide a theoretical perspective analyzing the gradient inductive bias of MTP, supported by empirical evidence, showing that MTP promotes the convergence toward internal belief states by inducing representational contractivity via gradient coupling. However, we reveal that standard MTP often suffers from structural hallucinations, where discrete token supervision encourages illegal shortcuts in latent space that violate environmental constraints. To address this, we propose a novel method Latent Semantic Enhancement MTP (LSE-MTP), which anchors predictions to ground-truth hidden state trajectories. Experiments on synthetic graphs and real-world Manhattan Taxi Ride show that LSE-MTP effectively bridges the gap between discrete tokens and continuous state representations, enhancing representation alignment, reducing structural hallucinations, and improving robustness to perturbations.


【2】eVTOL Aircraft Energy Overhead Estimation under Conflict Resolution in High-Density Airspaces
标题:高密度空域冲突解决下的eVTOL飞机能源费用估算
链接:https://arxiv.org/abs/2604.06093

作者:Alex Zongo,Peng Wei
备注:Accepted for presentation at the Integrated Communications, Navigation and Surveillance Conference (ICNS) 2026
摘要:在高密度城市空域运行的电动垂直起降(eVTOL)飞机必须通过战术冲突解决保持安全间隔,但此类机动的能量成本尚未被系统地量化。本文研究了改进电压势(MVP)算法下的冲突解决机动如何影响eVTOL能量消耗。使用集成在交通仿真中的基于物理的功率模型,我们分析了一个扇区内约71,767个航路段,覆盖10-60架同时飞行飞机的交通密度。主要发现是,基于MVP的去冲突是节能的:在所有密度水平上,能源开销的中位数保持在1.5%以下,并且该扇区内大多数航路飞行的代价可以忽略不计。然而,该分布表现出明显的右偏:由于持续的多机冲突,尾部情况在最高密度下达到44%的开销。第95百分位数的范围为3.84%至5.3%,这表明4-5%的储备余量可以适应绝大多数战术去冲突情景。为支持运营规划,我们开发了一个机器学习模型,用于在任务启动时估计能源开销。由于冲突的结果取决于无法提前知道的未来交通交互,该模型同时提供点估计和不确定性界限。这些界限是保守的:实际结果落在预测范围内的频率高于所述置信水平,使其适用于安全关键的储备规划。总之,这些结果验证了MVP对能量受限eVTOL运行的适用性,并为先进空中交通(Advanced Air Mobility)中的储备能量确定提供了定量指导。
摘要:Electric vertical takeoff and landing (eVTOL) aircraft operating in high-density urban airspace must maintain safe separation through tactical conflict resolution, yet the energy cost of such maneuvers has not been systematically quantified. This paper investigates how conflict-resolution maneuvers under the Modified Voltage Potential (MVP) algorithm affect eVTOL energy consumption. Using a physics-based power model integrated within a traffic simulation, we analyze approximately 71,767 en route sections within a sector, across traffic densities of 10-60 simultaneous aircraft. The main finding is that MVP-based deconfliction is energy-efficient: median energy overhead remains below 1.5% across all density levels, and the majority of en route flights within the sector incur negligible penalty. However, the distribution exhibits pronounced right-skewness, with tail cases reaching 44% overhead at the highest densities due to sustained multi-aircraft conflicts. The 95th percentile ranges from 3.84% to 5.3%, suggesting that a 4-5% reserve margin accommodates the vast majority of tactical deconfliction scenarios. To support operational planning, we develop a machine learning model that estimates energy overhead at mission initiation. Because conflict outcomes depend on future traffic interactions that cannot be known in advance, the model provides both point estimates and uncertainty bounds. These bounds are conservative; actual outcomes fall within the predicted range more often than the stated confidence level, making them suitable for safety-critical reserve planning. Together, these results validate MVP's suitability for energy-constrained eVTOL operations and provide quantitative guidance for reserve energy determination in Advanced Air Mobility.
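The reserve-margin reasoning above is a quantile read-off from a right-skewed overhead distribution. A back-of-envelope sketch with illustrative numbers (not the study's data) shows the shape of the calculation:

```python
# Reserve margin from an empirical overhead distribution: a ~1% median with a
# heavy right tail, where a high quantile (not the mean) sets the reserve.
import math

# Hypothetical per-flight energy overheads in percent (illustrative only).
overheads_pct = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4,
                 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.5, 3.0, 5.0, 44.0]

def percentile_nearest_rank(sample, p):
    """Nearest-rank percentile of a sample, p in (0, 100]."""
    s = sorted(sample)
    k = math.ceil(p / 100.0 * len(s))
    return s[k - 1]

median = percentile_nearest_rank(overheads_pct, 50)
p95 = percentile_nearest_rank(overheads_pct, 95)
print(median, p95)  # median ~1.4%, 95th percentile 5.0% -> ~5% reserve
```

The extreme tail value (44% here, mirroring the paper's worst case) barely moves the median but is exactly why reserve planning keys off a high percentile.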


【3】Channel-wise Retrieval for Multivariate Time Series Forecasting
标题:多元时间序列预测的逐通道检索
链接:https://arxiv.org/abs/2604.05543

作者:Junhyeok Kang,Jun Seo,Soyeon Park,Sangjun Han,Seohui Bae,Hyeokjun Choe,Soonyoung Lee
备注:Accepted at ICASSP 2026 Oral
摘要:由于固定的回顾窗口,多变量时间序列预测通常难以捕获长程依赖性。检索增强预测通过从记忆中检索历史片段来解决这一问题,但现有方法依赖于对所有变量应用相同参考的通道不可知策略。这忽略了变量间的异质性:不同的通道表现出不同的周期性和频谱轮廓。我们提出了CRAFT(逐通道检索增强预测),一种独立地为每个通道进行检索的新框架。为了确保效率,CRAFT采用两阶段流水线:在时域中构建的稀疏关系图修剪不相关的候选者,而频域中的频谱相似性对参考进行排序,在抑制噪声的同时强调占主导地位的周期分量。在七个公共基准上的实验表明,CRAFT优于最先进的预测基线,以实际可用的推理效率实现了卓越的准确性。
摘要:Multivariate time series forecasting often struggles to capture long-range dependencies due to fixed lookback windows. Retrieval-augmented forecasting addresses this by retrieving historical segments from memory, but existing approaches rely on a channel-agnostic strategy that applies the same references to all variables. This neglects inter-variable heterogeneity, where different channels exhibit distinct periodicities and spectral profiles. We propose CRAFT (Channel-wise retrieval-augmented forecasting), a novel framework that performs retrieval independently for each channel. To ensure efficiency, CRAFT adopts a two-stage pipeline: a sparse relation graph constructed in the time domain prunes irrelevant candidates, and spectral similarity in the frequency domain ranks references, emphasizing dominant periodic components while suppressing noise. Experiments on seven public benchmarks demonstrate that CRAFT outperforms state-of-the-art forecasting baselines, achieving superior accuracy with practical inference efficiency.
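The frequency-domain ranking stage can be sketched as follows (a simplified illustration of the idea; the graph-pruning stage and CRAFT's actual scoring are omitted): candidate reference segments are ranked by cosine similarity between DFT magnitude spectra, so a phase-shifted segment sharing the query's dominant period outranks one with a different period.

```python
# Rank candidate segments by spectral (DFT-magnitude) similarity to a query.
import cmath, math

def dft_mag(x):
    """Naive DFT magnitude spectrum (positive frequencies only)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n // 2 + 1)]

def cos_sim(a, b):
    dot = sum(u * v for u, v in zip(a, b))
    na = math.sqrt(sum(u * u for u in a))
    nb = math.sqrt(sum(v * v for v in b))
    return dot / (na * nb)

n = 16
query = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
candidates = {
    "same_period_shifted": [math.sin(2 * math.pi * 2 * t / n + 1.0) for t in range(n)],
    "different_period":    [math.sin(2 * math.pi * 5 * t / n) for t in range(n)],
}
qspec = dft_mag(query)
ranked = sorted(candidates,
                key=lambda name: cos_sim(qspec, dft_mag(candidates[name])),
                reverse=True)
print(ranked[0])  # -> same_period_shifted
```

Magnitude spectra discard phase, which is what makes the periodicity match robust to time shifts between the query window and historical references.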


【4】El Nino Prediction Based on Weather Forecast and Geographical Time-series Data
标题:基于天气预报和地理时间序列数据的厄尔尼诺预测
链接:https://arxiv.org/abs/2604.04998

作者:Viet Trinh,Ha-Vy Luu,Quoc-Khiem Nguyen-Pham,Hung Tong,Thanh-Huyen Tran,Hoai-Nam Nguyen Dang
摘要 :本文提出了一个新的框架,用于提高厄尔尼诺事件的预测准确性和提前时间,这对于减轻其对全球气候、经济和社会的影响至关重要。传统的预测模型往往依赖于海洋和大气指数,这可能缺乏全面的气象和地理数据集所捕获的粒度或动态相互作用。我们的框架将实时全球天气预报数据与异常、次表层海洋热含量和大气压力集成在各种时间和空间分辨率上。该框架利用混合深度学习架构,结合了用于空间特征提取的卷积神经网络(CNN)和用于时间依赖性建模的长短期记忆(LSTM)网络,旨在识别厄尔尼诺事件的复杂前兆和演变模式。
摘要:This paper proposes a novel framework for enhancing the prediction accuracy and lead time of El Niño events, crucial for mitigating their global climatic, economic, and societal impacts. Traditional prediction models often rely on oceanic and atmospheric indices, which may lack the granularity or dynamic interplay captured by comprehensive meteorological and geographical datasets. Our framework integrates real-time global weather forecast data with anomalies, subsurface ocean heat content, and atmospheric pressure across various temporal and spatial resolutions. Leveraging a hybrid deep learning architecture that combines a Convolutional Neural Network (CNN) for spatial feature extraction and a Long Short-Term Memory (LSTM) network for temporal dependency modeling, the framework aims to identify complex precursors and evolving patterns of El Niño events.


【5】Transcriptomic Models for Immunotherapy Response Prediction Show Limited Cross-cohort Generalisability
标题:用于免疫治疗应答预测的转录组学模型显示出有限的跨队列普适性
链接:https://arxiv.org/abs/2604.05478

作者:Yuheng Liang,Lucy Chuo,Ahmadreza Argha,Nona Farbehi,Lu Chen,Roohallah Alizadehsani,Mehdi Hosseinzadeh,Amin Beheshti,Thantrira Porntaveetusm,Youqiong Ye,Hamid Alinejad-Rokny
摘要:免疫检查点抑制剂(ICI)已经改变了癌症治疗,但相当一部分患者表现出内在或获得性耐药,使得准确的治疗前反应预测成为一个关键的未满足需求。来自批量和单细胞RNA测序(scRNA-seq)的基于转录组学的生物标志物为捕获肿瘤-免疫相互作用提供了一条有前途的途径,但现有预测模型的跨队列普适性仍不清楚。我们系统地基准测试了九个最先进的转录组ICI反应预测模型:五个基于批量RNA-seq的模型(COMPASS、IRNet、NetBio、IKCScore和TNBC-ICI)和四个基于scRNA-seq的模型(PRECISE、DeepGeneX、Tres和scCURE),使用在模型开发期间未见过的公开可用的独立数据集。总体而言,预测性能是适度的:批量RNA-seq模型在大多数队列中处于或接近随机水平,而scRNA-seq模型仅显示出边际改善。通路水平分析显示,模型间的生物标志物信号稀疏且不一致。尽管基于scRNA-seq的预测因子收敛于免疫相关程序(如同种异体移植排斥反应),但基于批量RNA-seq的模型几乎没有可重复的重叠。PRECISE和NetBio确定了最一致的免疫相关主题,而IRNet主要捕获与ICI生物学弱相关的代谢途径。总之,这些研究结果表明,目前转录组ICI预测模型的跨队列稳健性和生物一致性有限,强调了改进域适应、标准化预处理和基于生物学的模型设计的必要性。
摘要:Immune checkpoint inhibitors (ICIs) have transformed cancer therapy; yet substantial proportion of patients exhibit intrinsic or acquired resistance, making accurate pre-treatment response prediction a critical unmet need. Transcriptomics-based biomarkers derived from bulk and single-cell RNA sequencing (scRNA-seq) offer a promising avenue for capturing tumour-immune interactions, yet the cross-cohort generalisability of existing prediction models remains unclear.We systematically benchmark nine state-of-the-art transcriptomic ICI response predictors, five bulk RNA-seq-based models (COMPASS, IRNet, NetBio, IKCScore, and TNBC-ICI) and four scRNA-seq-based models (PRECISE, DeepGeneX, Tres and scCURE), using publicly available independent datasets unseen during model development. Overall, predictive performance was modest: bulk RNA-seq models performed at or near chance level across most cohorts, while scRNA-seq models showed only marginal improvements. Pathway-level analyses revealed sparse and inconsistent biomarker signals across models. Although scRNA-seq-based predictors converged on immune-related programs such as allograft rejection, bulk RNA-seq-based models exhibited little reproducible overlap. PRECISE and NetBio identified the most coherent immune-related themes, whereas IRNet predominantly captured metabolic pathways weakly aligned with ICI biology. Together, these findings demonstrate the limited cross-cohort robustness and biological consistency of current transcriptomic ICI prediction models, underscoring the need for improved domain adaptation, standardised preprocessing, and biologically grounded model design.


其他神经网络|深度学习|模型|建模(25篇)

【1】Learning $\mathsf{AC}^0$ Under Graphical Models
标题:图模型下学习$\mathsf{AC}^0$
链接:https://arxiv.org/abs/2604.06109

作者:Gautam Chandrasekaran,Jason Gaitonde,Ankur Moitra,Arsen Vasilyan
备注:57 pages
摘要:在一个里程碑式的结果中,Linial、Mansour和Nisan(J. ACM 1993)给出了一个准多项式时间算法,用于在均匀分布下给定带标记的i.i.d.样本时学习恒定深度电路。他们的工作在计算学习理论中留下了深刻而持久的遗产,特别是引入了$\textit{low-degree algorithm}$(低次算法)。然而,对该领域许多结果和技术的一个重要批评是对乘积结构的依赖,而乘积结构在现实设置中不太可能成立。为更自然的相关分布获得类似的学习保证一直是该领域的一个长期挑战。   特别是,我们给出了在远超乘积设置下学习$\mathsf{AC}^0$的准多项式时间算法:输入可以来自任何具有多项式增长并表现出强空间混合的图模型。主要的技术挑战在于绕开傅立叶分析,我们通过展示新的采样算法如何让我们将均匀分布设置下关于低次多项式近似的结论迁移到图模型来解决这一挑战。我们的方法足够一般,可以扩展到其他研究充分的函数类,如单调函数和半空间。
摘要:In a landmark result, Linial, Mansour and Nisan (J. ACM 1993) gave a quasipolynomial-time algorithm for learning constant-depth circuits given labeled i.i.d. samples under the uniform distribution. Their work has had a deep and lasting legacy in computational learning theory, in particular introducing the $\textit{low-degree algorithm}$. However, an important critique of many results and techniques in the area is the reliance on product structure, which is unlikely to hold in realistic settings. Obtaining similar learning guarantees for more natural correlated distributions has been a longstanding challenge in the field.   In particular, we give quasipolynomial-time algorithms for learning $\mathsf{AC}^0$ substantially beyond the product setting, when the inputs come from any graphical model with polynomial growth that exhibits strong spatial mixing. The main technical challenge is in giving a workaround to Fourier analysis, which we do by showing how new sampling algorithms allow us to transfer statements about low-degree polynomial approximation under the uniform setting to graphical models. Our approach is general enough to extend to other well-studied function classes, like monotone functions and halfspaces.
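The classical low-degree algorithm that this paper extends can be demonstrated on a tiny example under the uniform distribution (the original LMN setting, not the paper's graphical-model extension): estimate the Fourier coefficients of 3-bit majority and predict with the sign of its degree-1 part.

```python
# Low-degree (Fourier) learning of MAJ3 on {-1,1}^3: compute the degree-<=1
# Fourier coefficients exactly over the cube and predict with their sign.
from itertools import product

def maj3(x):
    return 1 if sum(x) > 0 else -1

cube = list(product([-1, 1], repeat=3))

# Fourier coefficients f_hat(S) = E[f(x) * prod_{i in S} x_i] for |S| <= 1.
c_empty = sum(maj3(x) for x in cube) / 8.0
c_single = [sum(maj3(x) * x[i] for x in cube) / 8.0 for i in range(3)]

def low_degree_predict(x):
    g = c_empty + sum(c_single[i] * x[i] for i in range(3))
    return 1 if g > 0 else -1

# The degree-1 truncation 0.5*(x1+x2+x3) already sign-represents majority.
print(all(low_degree_predict(x) == maj3(x) for x in cube))  # -> True
```

In practice the expectations are estimated from i.i.d. samples rather than enumerated; the paper's contribution is making an analogue of this transfer work when the inputs are correlated through a graphical model.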


【2】A machine learning framework for uncovering stochastic nonlinear dynamics from noisy data
标题:从有噪数据中揭示随机非线性动力学的机器学习框架
链接:https://arxiv.org/abs/2604.06081

作者:Matteo Bosso,Giovanni Franzese,Kushal Swamy,Maarten Theulings,Alejandro M. Aragón,Farbod Alijani
备注:25 pages, 12 figures, 4 tables
摘要:对真实世界系统进行建模需要考虑噪声,无论它是来自金融市场不可预测的波动、生物系统的不规则节律,还是生态系统的环境变化。虽然这类系统的行为通常可以用随机微分方程来描述,但一个核心挑战是理解噪声如何影响从数据中推断系统参数和动力学。传统的符号回归方法可以揭示控制方程,但通常忽略不确定性。相反,高斯过程提供了有原则的不确定性量化,但对潜在动力学几乎没有提供任何见解。在这项工作中,我们用一个混合符号回归-概率机器学习框架来弥合这一差距,该框架在恢复控制方程符号形式的同时推断系统参数的不确定性。该框架将深度符号回归与基于高斯过程的最大似然估计相结合,分别对确定性动力学和噪声结构进行建模,而无需事先假设其函数形式。我们在数值基准上验证了该方法,包括谐振子、Duffing振子和范德波尔振子,并在一个表现出同步的耦合生物振荡器实验系统上对其进行了验证,算法成功地识别出符号和随机两部分。该框架数据高效,只需100-1000个数据点,并且对噪声具有鲁棒性,展示了其在不确定性是内在的、且必须同时理解动力系统结构和变化性的领域中的广泛潜力。
摘要:Modeling real-world systems requires accounting for noise - whether it arises from unpredictable fluctuations in financial markets, irregular rhythms in biological systems, or environmental variability in ecosystems. While the behavior of such systems can often be described by stochastic differential equations, a central challenge is understanding how noise influences the inference of system parameters and dynamics from data. Traditional symbolic regression methods can uncover governing equations but typically ignore uncertainty. Conversely, Gaussian processes provide principled uncertainty quantification but offer little insight into the underlying dynamics. In this work, we bridge this gap with a hybrid symbolic regression-probabilistic machine learning framework that recovers the symbolic form of the governing equations while simultaneously inferring uncertainty in the system parameters. The framework combines deep symbolic regression with Gaussian process-based maximum likelihood estimation to separately model the deterministic dynamics and the noise structure, without requiring prior assumptions about their functional forms. We verify the approach on numerical benchmarks, including harmonic, Duffing, and van der Pol oscillators, and validate it on an experimental system of coupled biological oscillators exhibiting synchronization, where the algorithm successfully identifies both the symbolic and stochastic components. The framework is data-efficient, requiring as few as 100-1000 data points, and robust to noise - demonstrating its broad potential in domains where uncertainty is intrinsic and both the structure and variability of dynamical systems must be understood.
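The drift/noise separation at the heart of such frameworks can be illustrated on the simplest SDE (a deliberately simplified sketch: linear drift fitted by least squares and noise by residual variance, not the paper's deep-symbolic-regression + Gaussian-process pipeline):

```python
# Simulate an Ornstein-Uhlenbeck process dX = -theta*X dt + sigma dW with
# Euler-Maruyama, then recover theta (drift) and sigma (noise) from the data.
import math, random

random.seed(0)
theta, sigma, dt, n = 1.0, 0.5, 0.01, 5000
x, xs = 1.0, []
for _ in range(n):
    xs.append(x)
    x += -theta * x * dt + sigma * math.sqrt(dt) * random.gauss(0, 1)

dxs = [b - a for a, b in zip(xs, xs[1:] + [x])]
# Drift fit: minimize sum (dX + theta*X*dt)^2 over theta in closed form.
theta_hat = -sum(d * s for d, s in zip(dxs, xs)) / (dt * sum(s * s for s in xs))
# Noise fit: variance of the drift-corrected increments, rescaled by dt.
res = [d + theta_hat * s * dt for d, s in zip(dxs, xs)]
sigma_hat = math.sqrt(sum(r * r for r in res) / (n * dt))
print(theta_hat, sigma_hat)  # close to the true (1.0, 0.5)
```

The paper's contribution is doing this without assuming the drift's functional form (symbolic regression finds it) and with calibrated uncertainty on the parameters (Gaussian-process likelihood) rather than point estimates.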


【3】On Dominant Manifolds in Reservoir Computing Networks
标题:储层计算网络中的主导流形
链接:https://arxiv.org/abs/2604.05967

作者:Noa Kaplan,Alberto Padoan,Anastasia Bizyaeva
备注:6 pages, 3 figures
摘要:了解训练如何塑造循环网络动态的几何形状是时间序列建模的核心问题。我们研究了储层计算(RC)网络在时间预测任务的训练中出现的低维主导流形。对于一个简化的线性连续时间储层模型,我们将主导模式的维度和结构直接与训练数据的内在维度和信息内容联系起来。特别是,对于由自治动力系统生成的训练数据,我们将训练后储层的主导模式与原始系统的Koopman特征函数的近似联系起来,从而阐明储层计算与动态模式分解(DMD)算法之间的明确联系。我们在仿真中展示了训练过程中产生主导流形的特征值运动,并讨论了通过切线动力学和微分p-占优向非线性RC的推广。
摘要:Understanding how training shapes the geometry of recurrent network dynamics is a central problem in time-series modeling. We study the emergence of low-dimensional dominant manifolds in the training of Reservoir Computing (RC) networks for temporal forecasting tasks. For a simplified linear and continuous-time reservoir model, we link the dimensionality and structure of the dominant modes directly to the intrinsic dimensionality and information content of the training data. In particular, for training data generated by an autonomous dynamical system, we relate the dominant modes of the trained reservoir to approximations of the Koopman eigenfunctions of the original system, illuminating an explicit connection between reservoir computing and the Dynamic Mode Decomposition algorithm. We illustrate the eigenvalue motion that generates the dominant manifolds during training in simulation, and discuss generalization to nonlinear RC via tangent dynamics and differential p-dominance.
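The DMD connection the abstract mentions reduces, for linear data, to a least-squares fit between snapshot pairs. A minimal sketch (generic DMD, not the paper's reservoir construction):

```python
# Dynamic Mode Decomposition on snapshots of x_{k+1} = A x_k: the fitted map
# A_hat = Y X^T (X X^T)^{-1} recovers A (and hence its modes) exactly.

A = [[0.9, 0.2], [0.0, 0.8]]

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

# Snapshot matrices X = [x_0 ... x_3], Y = [x_1 ... x_4].
xs = [[1.0, 1.0]]
for _ in range(4):
    xs.append(apply(A, xs[-1]))
X, Y = xs[:-1], xs[1:]

# Normal equations for the 2x2 least-squares fit.
XXt = [[sum(x[i] * x[j] for x in X) for j in range(2)] for i in range(2)]
YXt = [[sum(y[i] * x[j] for x, y in zip(X, Y)) for j in range(2)] for i in range(2)]
det = XXt[0][0] * XXt[1][1] - XXt[0][1] * XXt[1][0]
inv = [[XXt[1][1] / det, -XXt[0][1] / det],
       [-XXt[1][0] / det, XXt[0][0] / det]]
A_hat = [[sum(YXt[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
print(A_hat)  # recovers A up to floating point
```

The eigenvalues/eigenvectors of `A_hat` are the DMD modes; the paper relates the dominant modes that a trained reservoir develops to exactly such Koopman-eigenfunction approximations.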


【4】The UNDO Flip-Flop: A Controlled Probe for Reversible Semantic State Management in State Space Model
标题:UNDO触发器:状态空间模型中可逆语义状态管理的受控探针
链接:https://arxiv.org/abs/2604.05923

作者:Hongxu Zhou
摘要:状态空间模型(SSM)已被证明具有对无星(star-free)顺序任务和有界层次结构进行建模的理论能力(Sarrof et al., 2024)。然而,形式化的表达能力结果并不能保证基于梯度的优化将可靠地发现相应的解。现有的基准要么探测单调状态跟踪(如标准触发器任务),要么探测结构嵌套(如Dyck语言),但二者都没有孤立地考察可逆语义状态检索。我们引入UNDO触发器任务来填补这一空白。通过为标准触发器任务扩展一个UNDO操作,该任务要求模型维护一个隐式有界栈,并在非单调更新序列下恢复历史状态。在此框架下,我们评估了一层和两层的Mamba-2。这两种变体都未能学到可证明可表达的基于栈的回滚机制,而是收敛于一种局部切换启发式:反转当前状态而不是检索存储的历史。在训练长度分布内进行对抗性回撤压力测试时,两层模型的准确率跌至41.10%,低于随机水平。结果证实了系统性而非偶然性的失败。因果消融表明,瓶颈在于检索,而不是存储。这些结果在架构原则上可表示的内容和梯度下降可靠学习的内容之间划出了清晰的界限,而这一区别是理论表达能力分析本身无法捕获的。
摘要 :State space models (SSMs) have been shown to possess the theoretical capacity to model both star-free sequential tasks and bounded hierarchical structures Sarrof et al. (2024). However, formal expressivity results do not guarantee that gradient-based optimisation will reliably discover the corresponding solutions. Existing benchmarks probe either monotonic state tracking, as in the standard Flip-Flop task, or structural nesting, as in the Dyck languages, but neither isolates reversible semantic state retrieval. We introduce the UNDO Flip-Flop task to fill this gap. By extending the standard Flip-Flop with an UNDO, the task requires a model to maintain an implicit bounded stack and recover historical states under non-monotonic update sequences. We evaluate one-layer and two-layer Mamba-2 under this framework. Both variants fail to acquire the provably expressible stack-based rollback mechanism, converging instead on a local toggle heuristic that inverts the current state rather than retrieving stored history. Under an adversarial retraction pressure test held within the training length distribution, the two-layer model collapses to 41.10% accuracy, which is below random chance. The results confirm systematic rather than incidental failure. Causal ablation shows that the bottleneck lies in retrieval, not storage. These results draw a clear line between what an architecture can in principle represent and what gradient descent reliably learns, a distinction that theoretical expressivity analyses alone cannot capture.
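The contrast between the correct stack semantics and the toggle heuristic can be made concrete (this is an illustrative reading of the task, not the paper's exact token format): writes push the previous state onto a stack and UNDO pops it, whereas the heuristic merely inverts the current bit, so the two diverge as soon as a value is written twice.

```python
# Ground-truth UNDO flip-flop semantics (bounded stack) vs. the toggle
# heuristic the models converge to.

def run_stack(ops):
    state, stack = 0, []
    for op in ops:
        if op[0] == "write":
            stack.append(state)   # remember the previous state
            state = op[1]
        elif op[0] == "undo" and stack:
            state = stack.pop()   # retrieve stored history
    return state

def run_toggle(ops):
    state = 0
    for op in ops:
        if op[0] == "write":
            state = op[1]
        elif op[0] == "undo":
            state = 1 - state     # just invert the current bit
    return state

seq = [("write", 1), ("write", 1), ("undo",)]
print(run_stack(seq), run_toggle(seq))  # -> 1 0 : the heuristic is wrong here
```

Sequences like `seq`, where UNDO must restore a state equal to the current one, are precisely the retraction cases on which a toggle shortcut fails while a stack succeeds.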


【5】Hidden in the Multiplicative Interaction: Uncovering Fragility in Multimodal Contrastive Learning
标题:隐藏在乘法交互中:揭示多模态对比学习中的脆弱性
链接:https://arxiv.org/abs/2604.05834

作者:Tillmann Rheude,Stefan Hegselmann,Roland Eils,Benjamin Wild
摘要:多模态对比学习正日益通过超越图像-文本对而得到丰富。在最近的对比方法中,Symile是应对这一挑战的有力方法,因为其乘法交互目标捕捉了高阶跨模态依赖。然而,我们发现,Symile对称地对待所有模态,没有显式地对可靠性差异建模,这一限制在三模态乘法交互中尤为明显。在实践中,图像-文本对之外的模态可能错位、信息量弱或缺失,统一处理它们可能会默默地降低性能。这种脆弱性可以隐藏在乘法交互中:即使单个不可靠的模态悄悄地破坏了乘积项,Symile也可能优于成对CLIP。我们提出了门控Symile(Gated Symile),一种对比门控机制,在基于注意力、按候选的基础上调整模态贡献。该门通过向可学习的中性方向插值嵌入来抑制不可靠的输入,并在可靠的跨模态对齐不太可能时加入显式NULL选项。在一个揭示这种脆弱性的受控合成基准和三个真实世界的三模态数据集(这些数据集上的故障可能被平均值掩盖)上,门控Symile比调优良好的Symile和CLIP模型实现了更高的top-1检索准确率。更广泛地说,我们的结果强调门控是在不完美和多于两种模态下实现鲁棒多模态对比学习的一步。
摘要:Multimodal contrastive learning is increasingly enriched by going beyond image-text pairs. Among recent contrastive methods, Symile is a strong approach for this challenge because its multiplicative interaction objective captures higher-order cross-modal dependence. Yet, we find that Symile treats all modalities symmetrically and does not explicitly model reliability differences, a limitation that becomes especially present in trimodal multiplicative interactions. In practice, modalities beyond image-text pairs can be misaligned, weakly informative, or missing, and treating them uniformly can silently degrade performance. This fragility can be hidden in the multiplicative interaction: Symile may outperform pairwise CLIP even if a single unreliable modality silently corrupts the product terms. We propose Gated Symile, a contrastive gating mechanism that adapts modality contributions on an attention-based, per-candidate basis. The gate suppresses unreliable inputs by interpolating embeddings toward learnable neutral directions and incorporating an explicit NULL option when reliable cross-modal alignment is unlikely. Across a controlled synthetic benchmark that uncovers this fragility and three real-world trimodal datasets for which such failures could be masked by averages, Gated Symile achieves higher top-1 retrieval accuracy than well-tuned Symile and CLIP models. More broadly, our results highlight gating as a step toward robust multimodal contrastive learning under imperfect and more than two modalities.


【6】Learn to Rank: Visual Attribution by Learning Importance Ranking
标题:学习排名:通过学习重要性排名的视觉归因
链接:https://arxiv.org/abs/2604.05819

作者:David Schinagl,Christian Fruhwirth-Reisinger,Alexander Prutsch,Samuel Schulter,Horst Possegger
摘要:解释复杂计算机视觉模型的决策对于建立信任和问责制至关重要,特别是在安全关键领域。一种既定的可解释性方法是生成视觉归因图,突出显示与模型预测最相关的输入区域。然而,现有方法面临三方面的权衡。基于传播的方法是高效的,但它们可能有偏且依赖于特定架构。同时,基于扰动的方法具有因果基础,但它们代价昂贵,对于Vision Transformer往往产生粗糙的、补丁级的解释。基于学习的解释器速度很快,但通常优化替代目标或从启发式教师中蒸馏。我们提出了一个直接优化删除和插入指标的学习方案。由于这些指标依赖于不可微的排序和排名,我们将其视为置换学习,并使用Gumbel-Sinkhorn以可微松弛代替硬排序。这使得能够通过目标模型的归因引导扰动进行端到端训练。在推理过程中,我们的方法在单次前向传递中产生密集的像素级归因,并可选地进行几步梯度细化。我们的实验证明了一致的定量改进和更清晰、边界对齐的解释,特别是对基于transformer的视觉模型。
摘要:Interpreting the decisions of complex computer vision models is crucial to establish trust and accountability, especially in safety-critical domains. An established approach to interpretability is generating visual attribution maps that highlight regions of the input most relevant to the model's prediction. However, existing methods face a three-way trade-off. Propagation-based approaches are efficient, but they can be biased and architecture-specific. Meanwhile, perturbation-based methods are causally grounded, yet they are expensive and for vision transformers often yield coarse, patch-level explanations. Learning-based explainers are fast but usually optimize surrogate objectives or distill from heuristic teachers. We propose a learning scheme that instead optimizes deletion and insertion metrics directly. Since these metrics depend on non-differentiable sorting and ranking, we frame them as permutation learning and replace the hard sorting with a differentiable relaxation using Gumbel-Sinkhorn. This enables end-to-end training through attribution-guided perturbations of the target model. During inference, our method produces dense, pixel-level attributions in a single forward pass with optional, few-step gradient refinement. Our experiments demonstrate consistent quantitative improvements and sharper, boundary-aligned explanations, particularly for transformer-based vision models.


【7】From Uniform to Learned Knots: A Study of Spline-Based Numerical Encodings for Tabular Deep Learning
标题:从均匀结点到可学习结点:表格深度学习中基于样条的数值编码研究
链接:https://arxiv.org/abs/2604.05635

作者:Manish Kumar,Anton Frederik Thielmann,Christoph Weisser,Benjamin Säfken
备注:20 pages, 9 figures
摘要:数值预处理仍然是表格深度学习的重要组成部分,其中连续特征的表示可以强烈影响下游性能。虽然它的重要性在经典统计和机器学习模型中已得到充分确立,但显式数值预处理在表格深度学习中的作用仍不太清楚。在这项工作中,我们聚焦于基于样条的数值编码来研究这个问题。我们考察了用于编码数值特征的三个样条族,即B样条、M样条和积分样条(I样条),在均匀、基于分位数、目标感知和可学习结点放置下的表现。对于可学习结点变体,我们使用可微结点参数化,使结点位置能够与骨干网络一起进行稳定的端到端联合优化。我们使用MLP、ResNet和FT-Transformer骨干在多种公共回归和分类数据集上评估这些编码,并将其与常见的数值预处理基线进行比较。我们的结果表明,数值编码的效果强烈依赖于任务、输出规模和骨干网络。对于分类,分段线性编码(PLE)是总体上最鲁棒的选择,而基于样条的编码仍然具有竞争力。对于回归,没有单一编码一致占优。相反,性能取决于样条族、结点放置策略和输出规模,MLP和ResNet通常比FT-Transformer获得更大的增益。我们进一步发现,在所提出的参数化下,可学习结点变体可以稳定地优化,但可能大幅增加训练成本,特别是对于M样条和I样条展开。总体而言,结果表明,数值编码不仅应在预测性能方面评估,还应在计算开销方面评估。
摘要:Numerical preprocessing remains an important component of tabular deep learning, where the representation of continuous features can strongly affect downstream performance. Although its importance is well established for classical statistical and machine learning models, the role of explicit numerical preprocessing in tabular deep learning remains less well understood. In this work, we study this question with a focus on spline-based numerical encodings. We investigate three spline families for encoding numerical features, namely B-splines, M-splines, and integrated splines (I-splines), under uniform, quantile-based, target-aware, and learnable-knot placement. For the learnable-knot variants, we use a differentiable knot parameterization that enables stable end-to-end optimization of knot locations jointly with the backbone. We evaluate these encodings on a diverse collection of public regression and classification datasets using MLP, ResNet, and FT-Transformer backbones, and compare them against common numerical preprocessing baselines. Our results show that the effect of numerical encodings depends strongly on the task, output size, and backbone. For classification, piecewise-linear encoding (PLE) is the most robust choice overall, while spline-based encodings remain competitive. For regression, no single encoding dominates uniformly. Instead, performance depends on the spline family, knot-placement strategy, and output size, with larger gains typically observed for MLP and ResNet than for FT-Transformer. We further find that learnable-knot variants can be optimized stably under the proposed parameterization, but may substantially increase training cost, especially for M-spline and I-spline expansions. Overall, the results show that numerical encodings should be assessed not only in terms of predictive performance, but also in terms of computational overhead.
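The B-spline encoding studied here amounts to evaluating a basis at each feature value. A generic Cox-de Boor sketch (textbook recursion, not the paper's implementation) shows its key property: with a clamped knot vector the basis functions are non-negative and sum to one, so the encoding is a smooth, localized soft-binning of the feature.

```python
# Cox-de Boor evaluation of a degree-2 B-spline basis on a clamped uniform
# knot vector; the encoding of x is the vector of basis values at x.

def bspline_basis(i, p, x, t):
    """Value of basis function N_{i,p} at x for knot vector t."""
    if p == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = right = 0.0
    if t[i + p] != t[i]:
        left = (x - t[i]) / (t[i + p] - t[i]) * bspline_basis(i, p - 1, x, t)
    if t[i + p + 1] != t[i + 1]:
        right = ((t[i + p + 1] - x) / (t[i + p + 1] - t[i + 1])
                 * bspline_basis(i + 1, p - 1, x, t))
    return left + right

knots = [0, 0, 0, 1, 2, 3, 4, 4, 4]        # clamped; illustrative placement
degree = 2
n_basis = len(knots) - degree - 1          # = 6 basis functions

def encode(x):
    return [bspline_basis(i, degree, x, knots) for i in range(n_basis)]

print(encode(1.5))  # sparse, non-negative, sums to 1 inside the domain
```

Quantile-based, target-aware, or learnable placement changes only where `knots` sit; the learnable-knot variants in the paper additionally make those positions differentiable parameters.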


【8】LMI-Net: Linear Matrix Inequality--Constrained Neural Networks via Differentiable Projection Layers
标题:LMI-Net:通过可微投影层实现线性矩阵不等式约束的神经网络
链接:https://arxiv.org/abs/2604.05374

作者:Sunbochen Tang,Andrea Goertzen,Navid Azizan
摘要:线性矩阵不等式（LMI）在证明动力系统的稳定性、鲁棒性和前向不变性方面发挥了核心作用。尽管基于学习的控制设计与证书综合方法发展迅速，现有方法往往无法保持形式化保证所需的硬矩阵不等式约束。我们提出LMI-Net，一个高效且模块化的可微投影层，通过构造方式强制满足LMI约束。我们的方法将由LMI约束定义的集合提升为仿射等式约束与半正定锥的交集，通过Douglas-Rachford分裂执行前向传播，并借助隐式微分支持高效的反向传播。我们建立了投影层收敛到可行点的理论保证，证明LMI-Net能将通用神经网络转化为满足LMI约束的可靠模型。在不变椭球合成以及一族受扰线性系统的控制器与证书联合设计等实验上的评估表明，LMI-Net在分布偏移下较软约束模型大幅提升了可行性，同时保持快速的推理速度，架起了基于半定规划的认证与现代学习技术之间的桥梁。
摘要:Linear matrix inequalities (LMIs) have played a central role in certifying stability, robustness, and forward invariance of dynamical systems. Despite rapid development in learning-based methods for control design and certificate synthesis, existing approaches often fail to preserve the hard matrix inequality constraints required for formal guarantees. We propose LMI-Net, an efficient and modular differentiable projection layer that enforces LMI constraints by construction. Our approach lifts the set defined by LMI constraints into the intersection of an affine equality constraint and the positive semidefinite cone, performs the forward pass via Douglas-Rachford splitting, and supports efficient backward propagation through implicit differentiation. We establish theoretical guarantees that the projection layer converges to a feasible point, certifying that LMI-Net transforms a generic neural network into a reliable model satisfying LMI constraints. Evaluated on experiments including invariant ellipsoid synthesis and joint controller-and-certificate design for a family of disturbed linear systems, LMI-Net substantially improves feasibility over soft-constrained models under distribution shift while retaining fast inference speed, bridging semidefinite-program-based certification and modern learning techniques.
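摘要中"仿射等式约束与半正定锥交集上的Douglas-Rachford分裂"可以用一个最小数值草图来说明。这里以 trace(X) = c 这一最简单的仿射约束为例（仅为机制示意，并非LMI-Net的实际投影层）：

```python
import numpy as np

def proj_psd(X):
    """投影到半正定锥：对称化后将负特征值截断为零。"""
    S = (X + X.T) / 2
    w, V = np.linalg.eigh(S)
    return (V * np.clip(w, 0, None)) @ V.T

def proj_affine(X, c):
    """投影到仿射集 {X : trace(X) = c}：沿单位矩阵方向平移。"""
    n = X.shape[0]
    return X - (np.trace(X) - c) / n * np.eye(n)

def dr_project(X0, c, iters=500):
    """Douglas-Rachford分裂：寻找两个凸集交集中的可行点。"""
    Z = X0.copy()
    for _ in range(iters):
        X = proj_psd(Z)
        Y = proj_affine(2 * X - Z, c)
        Z = Z + Y - X
    return proj_psd(Z)
```

交替利用两个易计算的投影（特征值截断与迹平移），迭代点收敛到交集中的可行点。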


【9】A Theoretical Framework for Statistical Evaluability of Generative Models
标题:生成模型统计可评估性的理论框架
链接:https://arxiv.org/abs/2604.05324

作者:Shashaank Aiyer,Yishay Mansour,Shay Moran,Han Shao
备注:25 pages
摘要:统计评估旨在使用从真实分布中采样的留出i.i.d.测试数据来估计模型的泛化性能。在分类等监督学习场景中，错误率等性能指标有明确定义，并且在数据集足够大时测试误差能可靠地逼近总体误差。相比之下，生成模型因其开放性而更难评估：尚不清楚哪些指标是合适的，以及这些指标能否从有限样本中可靠地评估。   在这项工作中，我们引入了一个评估生成模型的理论框架，并为常用指标建立了可评估性结果。我们研究了两类指标：基于测试函数的指标（包括积分概率度量，IPM），以及Rényi散度。我们证明，对任意有界测试类的IPM都可以从有限样本中评估到乘性与加性近似误差以内；此外，当测试类具有有限的fat-shattering维数时，IPM可以以任意精度评估。与此相反，Rényi散度和KL散度无法从有限样本中评估，因为其取值可能由罕见事件决定。我们还分析了困惑度（perplexity）作为一种评估方法的潜力与局限。
摘要 :Statistical evaluation aims to estimate the generalization performance of a model using held-out i.i.d.\ test data sampled from the ground-truth distribution. In supervised learning settings such as classification, performance metrics such as error rate are well-defined, and test error reliably approximates population error given sufficiently large datasets. In contrast, evaluation is more challenging for generative models due to their open-ended nature: it is unclear which metrics are appropriate and whether such metrics can be reliably evaluated from finite samples.   In this work, we introduce a theoretical framework for evaluating generative models and establish evaluability results for commonly used metrics. We study two categories of metrics: test-based metrics, including integral probability metrics (IPMs), and Rényi divergences. We show that IPMs with respect to any bounded test class can be evaluated from finite samples up to multiplicative and additive approximation errors. Moreover, when the test class has finite fat-shattering dimension, IPMs can be evaluated with arbitrary precision. In contrast, Rényi and KL divergences are not evaluable from finite samples, as their values can be critically determined by rare events. We also analyze the potential and limitations of perplexity as an evaluation method.
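"对有界测试类的IPM可以从有限样本评估"这一结论，可用一个有限测试类上的最小示例来直观说明（测试类中的具体函数为示例假设）：

```python
import numpy as np

def empirical_ipm(xs, ys, test_fns):
    """经验IPM：sup_f |E_P f - E_Q f|，f取自给定的有界测试类。"""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    return max(abs(np.mean(f(xs)) - np.mean(f(ys))) for f in test_fns)

# 一个简单的有界测试类：若干截断线性函数（取值限制在[-1, 1]内）
test_class = [lambda x, a=a: np.clip(a * x, -1.0, 1.0)
              for a in (0.5, 1.0, 2.0)]
```

由于每个测试函数有界，样本均值以标准速率集中于总体均值，这正是摘要中可评估性结论的直观来源。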


【10】DualDiffusion: A Speculative Decoding Strategy for Masked Diffusion Models
标题:DualDiffusion:掩蔽扩散模型的一种推测解码策略
链接:https://arxiv.org/abs/2604.05250

作者:Satyam Goyal,Kushal Patel,Tanush Mittal,Arjun Laxman
摘要:掩码扩散模型（MDM）通过支持并行令牌生成和双向上下文建模，为自回归语言模型提供了一种有前景的替代方案。然而，由于双向注意力使键值对无法缓存，其推理速度受到显著限制，每个生成步骤需要$O(N^2)$的计算。虽然FastDLLM和DkvCache等最新方法通过注意力近似和缓存策略提升了推理速度，但其加速以牺牲生成质量为代价。我们提出DualDiffusion，一个面向MDM的推测解码框架，它将快速的起草模型（使用高效近似）与更慢但更准确的验证模型相结合。通过先运行轻量级起草器的多个步骤、再执行单次验证步骤，DualDiffusion在生成步数与准确率之间实现了优于现有方法的Pareto前沿。我们在MMLU和GSM8K上评估了该方法，结果表明DualDiffusion在保持高准确率的同时减少了所需的生成步数，有效推进了掩码扩散语言模型的质量-效率权衡曲线。
摘要:Masked Diffusion Models (MDMs) offer a promising alternative to autoregressive language models by enabling parallel token generation and bidirectional context modeling. However, their inference speed is significantly limited by the inability to cache key-value pairs due to bidirectional attention, requiring $O(N^2)$ computations at each generation step. While recent methods like FastDLLM and DkvCache improve inference speed through attention approximations and caching strategies, they achieve speedups at the cost of generation quality. We propose DualDiffusion, a speculative decoding framework for MDMs that combines fast drafter models (using efficient approximations) with slower, more accurate verifier models. By running multiple steps of a lightweight drafter followed by a single verification step, DualDiffusion achieves a superior Pareto frontier between generation steps and accuracy compared to existing approaches. We evaluate our method on MMLU and GSM8K, demonstrating that DualDiffusion maintains high accuracy while reducing the number of generation steps required, effectively pushing the quality-efficiency trade-off curve for masked diffusion language models.
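起草-验证式推测解码的控制流可以用一个与具体模型无关的玩具示例勾勒（drafter与verifier均为示意性的确定性函数，并非论文中的扩散模型）：

```python
def speculative_decode(drafter, verifier, prompt, k, max_len):
    """推测解码骨架：起草器一次提出k个令牌，验证器只接受
    与自身预测一致的最长前缀，不一致时回退到验证器的令牌。"""
    seq = list(prompt)
    while len(seq) < max_len:
        # 起草阶段：轻量起草器连续提出 k 个候选令牌
        draft, ctx = [], list(seq)
        for _ in range(k):
            t = drafter(ctx)
            draft.append(t)
            ctx.append(t)
        # 验证阶段：接受与验证器预测一致的最长前缀
        n_ok, ctx = 0, list(seq)
        for t in draft:
            if verifier(ctx) != t:
                break
            n_ok += 1
            ctx.append(t)
        seq.extend(draft[:n_ok])
        if n_ok < k and len(seq) < max_len:
            seq.append(verifier(seq))  # 回退：由验证器给出下一个令牌
    return seq[:max_len]
```

当起草器与验证器高度一致时，每轮可一次性接受多个令牌，从而减少对慢速验证器的调用次数。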


【11】OrthoFuse: Training-free Riemannian Fusion of Orthogonal Style-Concept Adapters for Diffusion Models
标题:OrthoFuse:扩散模型正交风格-概念适配器的免训练黎曼融合
链接:https://arxiv.org/abs/2604.05183

作者:Ali Aliev,Kamil Garifullin,Nikolay Yudin,Vera Soboleva,Alexander Molozhavenko,Ivan Oseledets,Aibek Alanov,Maxim Rakhuba
摘要:在快速发展的模型训练领域，参数高效微调以及利用少量训练数据使模型适配狭窄任务的各类技术一直具有实际意义。然而仍存在一个开放问题：如何将针对不同任务调优的多个适配器合并为一个能在两个任务上都取得合格结果的适配器？具体而言，为生成模型合并主体与风格适配器的问题仍未解决。本文试图表明，在正交微调（OFT）的情形下，可以利用结构化正交参数化及其几何性质，得到免训练适配器合并的公式。特别地，我们推导了由最近提出的Group-and-Shuffle（$\mathcal{GS}$）正交矩阵构成的流形的结构，并得到了两点之间测地线近似的高效公式。此外，我们提出了一种$\text{spectra restoration}$（谱恢复）变换，用于恢复合并后适配器的谱性质，以实现更高质量的融合。我们在主体驱动生成任务上的实验表明，我们合并两个$\mathcal{GS}$正交矩阵的技术能够统一不同适配器的概念与风格特征。据我们所知，这是第一个用于合并乘性正交适配器的免训练方法。代码可通过$\href{https://github.com/ControlGenAI/OrthoFuse}{link}$获取。
摘要:In a rapidly growing field of model training there is a constant practical interest in parameter-efficient fine-tuning and various techniques that use a small amount of training data to adapt the model to a narrow task. However, there is an open question: how to combine several adapters tuned for different tasks into one which is able to yield adequate results on both tasks? Specifically, merging subject and style adapters for generative models remains unresolved. In this paper we seek to show that in the case of orthogonal fine-tuning (OFT), we can use structured orthogonal parametrization and its geometric properties to get the formulas for training-free adapter merging. In particular, we derive the structure of the manifold formed by the recently proposed Group-and-Shuffle ($\mathcal{GS}$) orthogonal matrices, and obtain efficient formulas for the geodesics approximation between two points. Additionally, we propose a $\text{spectra restoration}$ transform that restores spectral properties of the merged adapter for higher-quality fusion. We conduct experiments in subject-driven generation tasks showing that our technique to merge two $\mathcal{GS}$ orthogonal matrices is capable of uniting concept and style features of different adapters. To the best of our knowledge, this is the first training-free method for merging multiplicative orthogonal adapters. Code is available via the $\href{https://github.com/ControlGenAI/OrthoFuse}{link}$.
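正交矩阵流形上"两点间测地线"的含义可以用二维旋转这一特例直观说明：在SO(2)上，测地线就是对相对旋转角做比例插值（这只是几何直觉的草图，并非论文针对$\mathcal{GS}$矩阵的公式）：

```python
import numpy as np

def rot(theta):
    """二维旋转矩阵。"""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def geodesic(Q1, Q2, t):
    """SO(2)上Q1与Q2之间的测地线：绕相对旋转角做比例插值。"""
    R = Q1.T @ Q2                          # 相对旋转
    dtheta = np.arctan2(R[1, 0], R[0, 0])  # 相对旋转角
    return Q1 @ rot(t * dtheta)
```

插值路径上的每个点都严格保持正交性，这正是免训练融合希望保留的结构性质。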


【12】R3PM-Net: Real-time, Robust, Real-world Point Matching Network
标题:R3PM-Net:实时、稳健、真实世界的点匹配网络
链接:https://arxiv.org/abs/2604.05060

作者:Yasaman Kashefbahrami,Erkut Akdag,Panagiotis Meletis,Evgeniya Balmashnova,Dip Goswami,Egor Bondarau
备注:Accepted to CVPRw 2026 (Oral), Code and datasets at https://github.com/YasiiKB/R3PM-Net
摘要:点云精确配准（PCR）是三维数据处理中的一项重要任务，涉及两片点云之间刚体变换的估计。虽然深度学习方法已经解决了传统非学习方法的关键局限（如对噪声、离群点、遮挡和初始化的敏感性），但它们大多是在干净、稠密的合成数据集上开发和评估的，限制了对真实工业场景的泛化能力。本文提出R3PM-Net，一个轻量级、全局感知、对象级的点匹配网络，通过同时兼顾泛化能力与实时效率来弥合这一差距。为支撑这一转变，本文提出了Sioux-Cranfield和Sioux-Scans两个数据集。它们为将不完美的摄影测量与事件相机扫描配准到数字CAD模型提供了评估基础，并已公开发布。大量实验表明，R3PM-Net以无可匹敌的速度实现了有竞争力的精度。在ModelNet40上，它仅用$0.007$ s就达到了$1$的完美适应度得分和$0.029$ cm的内点RMSE，比最先进方法RegTR快约7倍。这一性能延续到Sioux-Cranfield数据集，以同样低的延迟保持$1$的适应度和$0.030$ cm的内点RMSE。此外，在极具挑战性的Sioux-Scans数据集上，R3PM-Net在50 ms内成功解决了边缘情形。这些结果证实，R3PM-Net为精度与实时性能不可或缺的关键工业应用提供了一种稳健的高速解决方案。代码和数据集可在https://github.com/YasiiKB/R3PM-Net获取。
摘要:Accurate Point Cloud Registration (PCR) is an important task in 3D data processing, involving the estimation of a rigid transformation between two point clouds. While deep-learning methods have addressed key limitations of traditional non-learning approaches, such as sensitivity to noise, outliers, occlusion, and initialization, they are developed and evaluated on clean, dense, synthetic datasets (limiting their generalizability to real-world industrial scenarios). This paper introduces R3PM-Net, a lightweight, global-aware, object-level point matching network designed to bridge this gap by prioritizing both generalizability and real-time efficiency. To support this transition, two datasets, Sioux-Cranfield and Sioux-Scans, are proposed. They provide an evaluation ground for registering imperfect photogrammetric and event-camera scans to digital CAD models, and have been made publicly available. Extensive experiments demonstrate that R3PM-Net achieves competitive accuracy with unmatched speed. On ModelNet40, it reaches a perfect fitness score of $1$ and inlier RMSE of $0.029$ cm in only $0.007$s, approximately 7 times faster than the state-of-the-art method RegTR. This performance carries over to the Sioux-Cranfield dataset, maintaining a fitness of $1$ and inlier RMSE of $0.030$ cm with similarly low latency. Furthermore, on the highly challenging Sioux-Scans dataset, R3PM-Net successfully resolves edge cases in under 50 ms. These results confirm that R3PM-Net offers a robust, high-speed solution for critical industrial applications, where precision and real-time performance are indispensable. The code and datasets are available at https://github.com/YasiiKB/R3PM-Net.
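摘要中的刚体变换估计与内点RMSE指标，可用经典的Kabsch/SVD对齐来勾勒（这是通用的配准基线与评测写法，并非R3PM-Net本身）：

```python
import numpy as np

def kabsch(P, Q):
    """估计使 R @ p + t ≈ q 的刚体变换（Kabsch算法）。"""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)              # 去中心化互协方差
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])             # 修正行列式，防止反射
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t

def inlier_rmse(P, Q, R, t, thresh=0.1):
    """仅对残差低于阈值的内点计算RMSE。"""
    res = np.linalg.norm((R @ P.T).T + t - Q, axis=1)
    inl = res[res < thresh]
    return np.sqrt(np.mean(inl**2)) if inl.size else np.inf
```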


【13】Blind-Spot Mass: A Good-Turing Framework for Quantifying Deployment Coverage Risk in Machine Learning Systems
标题:盲点质量:量化机器学习系统中部署覆盖风险的Good-Turing框架
链接:https://arxiv.org/abs/2604.05057

作者:Biplab Pal,Santanu Bhattacharya,Madanjit Singh
备注:15 pages, 7 figures, 1 table; submitted to Journal of Machine Learning Research (JMLR)
摘要:盲点质量（blind-spot mass）是一个用于量化机器学习中部署覆盖风险的Good-Turing框架。在现代机器学习系统中，运行状态分布通常是重尾的，这意味着由有效但罕见的状态构成的长尾在有限的训练和评估数据中在结构上支持不足。这造成了一种“覆盖盲区”：模型在标准测试集上看似准确，但在部署状态空间的大片区域仍不可靠。   我们提出盲点质量B_n(tau)，一个用于估计经验支持低于阈值tau的状态所占总概率质量的部署度量。B_n(tau)用Good-Turing未见物种（unseen-species）估计来计算，并对运行分布中有多少位于可靠性关键、支持不足的区域给出有原则的估计。我们进一步推导出由覆盖率施加的准确率上限，将整体性能分解为有支持与盲区两部分，并将容量限制与数据限制分离。   我们在使用腕戴式惯性数据的可穿戴人体活动识别（HAR）中验证了该框架，随后在包含275次住院记录的MIMIC-IV医院数据库中复制了相同分析；在各种临床状态抽象下，盲点质量曲线在tau = 5处同样收敛到95%。这种跨结构独立领域（在模态、特征空间、标签空间和应用上均不相同）的复制表明，盲点质量是量化组合覆盖风险的通用机器学习方法，而非特定应用的产物。   盲点分解可识别哪些活动或临床状态主导风险，为工业从业者在定向数据收集、归一化/重归一化以及物理或领域知识约束方面提供可操作的指导，以实现更安全的部署。
摘要:Blind-spot mass is a Good-Turing framework for quantifying deployment coverage risk in machine learning. In modern ML systems, operational state distributions are often heavy-tailed, implying that a long tail of valid but rare states is structurally under-supported in finite training and evaluation data. This creates a form of 'coverage blindness': models can appear accurate on standard test sets yet remain unreliable across large regions of the deployment state space.   We propose blind-spot mass B_n(tau), a deployment metric estimating the total probability mass assigned to states whose empirical support falls below a threshold tau. B_n(tau) is computed using Good-Turing unseen-species estimation and yields a principled estimate of how much of the operational distribution lies in reliability-critical, under-supported regimes. We further derive a coverage-imposed accuracy ceiling, decomposing overall performance into supported and blind components and separating capacity limits from data limits.   We validate the framework in wearable human activity recognition (HAR) using wrist-worn inertial data. We then replicate the same analysis in the MIMIC-IV hospital database with 275 admissions, where the blind-spot mass curve converges to the same 95% at tau = 5 across clinical state abstractions. This replication across structurally independent domains - differing in modality, feature space, label space, and application - shows that blind-spot mass is a general ML methodology for quantifying combinatorial coverage risk, not an application-specific artifact.   Blind-spot decomposition identifies which activities or clinical regimes dominate risk, providing actionable guidance for industrial practitioners on targeted data collection, normalization/renormalization, and physics- or domain-informed constraints for safer deployment.
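B_n(tau)的Good-Turing式估计可如下勾勒：设N_r为恰好出现r次的状态数、n为样本量，出现恰好r次的状态所占真实质量的Good-Turing估计为(r+1)N_{r+1}/n，未见状态（r=0）对应N_1/n；把支持度r < tau的各项累加即得盲点质量（这是对论文定义的一种合理重构，细节以原文为准）：

```python
from collections import Counter

def blind_spot_mass(observations, tau):
    """Good-Turing式的盲点质量估计：
    经验支持 r < tau 的状态（含未见状态 r = 0）所占的概率质量。"""
    n = len(observations)
    counts = Counter(observations)    # 每个状态的出现次数
    Nr = Counter(counts.values())     # N_r：恰好出现 r 次的状态数
    mass = 0.0
    for r in range(0, tau):           # r = 0 一项即经典的未见质量 N_1/n
        mass += (r + 1) * Nr.get(r + 1, 0) / n
    return mass
```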


【14】Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space
标题:相位联想记忆:复Hilbert空间中的序列建模
链接:https://arxiv.org/abs/2604.05030

作者:Gowrav Vishwakarma,Christopher J. Agostino
备注:submitting to APS Open Science, 10 pages, 1 figure, code and training logs available at https://github.com/gowrav-vishwakarma/qllm2
摘要:我们提出相位联想记忆（PAM），一种循环序列模型，其中所有表示均为复值，关联通过外积累积在矩阵状态$S_{t} \in \mathbb{C}^{d \times d}$中，检索通过共轭内积$K_t^* \cdot Q_t / \sqrt{d}$进行。在WikiText-103上约1亿参数规模下，PAM达到30.0的验证困惑度，与相同条件下训练的对照Transformer（27.1）相差约10%以内，尽管复数计算带来$4\times$的算术开销且未使用自定义内核。我们追溯了从向量状态模型到解决其缺陷的矩阵状态的实验路径：在向量状态模型中，全息绑定会因叠加关联的$O(1/\sqrt{n})$容量退化而失效。一个以复值叠加和共轭检索为原生操作的架构具有竞争力，这与近期的经验证据相一致，即人类和大型语言模型中的语义解释均表现出非经典的上下文性；我们还讨论了这对语言建模中计算形式体系选择的意义。
摘要:We present Phase-Associative Memory (PAM), a recurrent sequence model in which all representations are complex-valued, associations accumulate in a matrix state $S_{t}$ $\in$ $\mathbb{C}^{d \times d}$ via outer products, and retrieval operates through the conjugate inner product $K_t^* \cdot Q_t / \sqrt{d}$. At $\sim$100M parameters on WikiText-103, PAM reaches validation perplexity 30.0, within $\sim$10\% of a matched transformer (27.1) trained under identical conditions, despite $4\times$ arithmetic overhead from complex computation and no custom kernels. We trace the experimental path from vector-state models, where holographic binding fails due to the $O(1/\sqrt{n})$ capacity degradation of superposed associations, to the matrix state that resolves it. The competitiveness of an architecture whose native operations are complex-valued superposition and conjugate retrieval is consistent with recent empirical evidence that semantic interpretation in both humans and large language models exhibits non-classical contextuality, and we discuss what this implies for the choice of computational formalism in language modeling.
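复值外积绑定与共轭检索的核心机制可以用正交键的小例子演示（d=4，键取归一化DFT基的列，使检索精确；这只是机制示意，而非PAM的完整循环模型）：

```python
import numpy as np

d = 4
F = np.fft.fft(np.eye(d)) / np.sqrt(d)   # 归一化DFT：列向量两两正交归一
keys = [F[:, j] for j in range(d)]
rng = np.random.default_rng(0)
values = [rng.standard_normal(d) + 1j * rng.standard_normal(d)
          for _ in range(d)]

# 绑定：矩阵状态累积值-键共轭外积  S = sum_t v_t k_t^H
S = np.zeros((d, d), dtype=complex)
for k, v in zip(keys, values):
    S += np.outer(v, np.conj(k))

def retrieve(S, q):
    """共轭检索：S q 在正交归一键下精确恢复对应的值。"""
    return S @ q
```

相比向量状态的全息叠加，矩阵状态在键正交时没有串扰，这正是摘要中"矩阵状态解决容量退化"的最简体现。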


【15】Learning-Based Multi-Criteria Decision Making Model for Sawmill Location Problems
标题:基于学习的锯木厂选址问题多准则决策模型
链接:https://arxiv.org/abs/2604.04996

作者:Mahid Ahmed,Ali Dogru,Chaoyang Zhang,Chao Meng
备注:34 pages, 12 figures, 5 tables
摘要:战略性地定位锯木厂对于提高木材供应链的效率、盈利能力和可持续性至关重要。我们的研究提出了一个基于学习的多标准决策(LB-MCDM)框架,通过MCDM集成机器学习(ML)与基于GIS的空间位置分析。拟议的框架提供了一个数据驱动的,公正的,可复制的方法来评估网站的适合性。我们通过密西西比州(MS)的案例研究证明了该模型的实用性。我们应用五种ML算法(随机森林分类器,支持向量分类器,XGBoost分类器,逻辑回归和K最近邻分类器)来确定密西西比州最合适的锯木厂位置。在这些模型中,随机森林分类器实现了最高的性能。我们使用SHAP(SHapley加法解释)技术来确定每个标准的相对重要性,揭示了供需比,一个复合功能,反映当地市场竞争动态,作为最有影响力的因素,其次是公路,铁路线和城市区域距离。我们的LB-MCDM模型生成的适合性地图的验证表明,10-11%的MS景观是非常适合锯木厂的位置。
摘要:Strategically locating a sawmill is vital for enhancing the efficiency, profitability, and sustainability of timber supply chains. Our study proposes a Learning-Based Multi-Criteria Decision-Making (LB-MCDM) framework that integrates machine learning (ML) with GIS-based spatial location analysis via MCDM. The proposed framework provides a data-driven, unbiased, and replicable approach to assessing site suitability. We demonstrate the utility of the proposed model through a case study in Mississippi (MS). We apply five ML algorithms (Random Forest Classifier, Support Vector Classifier, XGBoost Classifier, Logistic Regression, and K-Nearest Neighbors Classifier) to identify the most suitable sawmill locations in Mississippi. Among these models, the Random Forest Classifier achieved the highest performance. We use the SHAP (SHapley Additive exPlanations) technique to determine the relative importance of each criterion, revealing the Supply-Demand Ratio, a composite feature that reflects local market competition dynamics, as the most influential factor, followed by Road, Rail Line and Urban Area Distance. The validation of suitability maps generated by our LB-MCDM model suggests that 10-11% of the MS landscape is highly suitable for sawmill location.


【16】Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression
标题:修剪-量化-蒸馏:高效神经网络压缩的有序管道
链接:https://arxiv.org/abs/2604.04988

作者:Longsheng Zhou,Yu Shen
备注:7 pages, submitted to IJCNN
摘要:现代部署常常需要在严格的CPU和内存约束下以精度换取效率，但参数量或FLOPs等常见压缩代理指标并不能可靠地预测实际推理耗时。特别地，非结构化稀疏虽能减少模型存储，却会因不规则内存访问和稀疏内核开销而无法加速（有时甚至略微拖慢）标准CPU执行。出于压缩与加速之间的这一差距，我们研究了一个面向实测延迟的实用有序流水线，它组合了三种广泛使用的技术：非结构化剪枝、INT8量化感知训练（QAT）和知识蒸馏（KD）。经验上，INT8 QAT提供了最主要的运行时收益，而剪枝主要充当降低容量的预处理器，提升后续低精度优化的鲁棒性；最后应用的KD在已受约束的稀疏INT8机制内恢复精度，而不改变部署形式。我们使用三个骨干网络（ResNet-18、WRN-28-10和VGG-16-BN）在CIFAR-10/100上进行评估。在所有设置中，有序流水线都取得了比任何单一技术更强的精度-大小-延迟边界，达到0.99-1.42 ms的CPU延迟，同时保持有竞争力的精度和紧凑的检查点。采用固定20/40/40轮次分配的受控顺序消融进一步证实阶段顺序影响显著，所提出的顺序在所测试的排列中通常表现最佳。总体而言，我们的结果为边缘部署提供了一条简单的准则：使用实测运行时，而非仅凭代理指标，在精度-大小-延迟联合空间中评估压缩选择。
摘要:Modern deployment often requires trading accuracy for efficiency under tight CPU and memory constraints, yet common compression proxies such as parameter count or FLOPs do not reliably predict wall-clock inference time. In particular, unstructured sparsity can reduce model storage while failing to accelerate (and sometimes slightly slowing down) standard CPU execution due to irregular memory access and sparse kernel overhead. Motivated by this gap between compression and acceleration, we study a practical, ordered pipeline that targets measured latency by combining three widely used techniques: unstructured pruning, INT8 quantization-aware training (QAT), and knowledge distillation (KD). Empirically, INT8 QAT provides the dominant runtime benefit, while pruning mainly acts as a capacity-reduction pre-conditioner that improves the robustness of subsequent low-precision optimization; KD, applied last, recovers accuracy within the already constrained sparse INT8 regime without changing the deployment form. We evaluate on CIFAR-10/100 using three backbones (ResNet-18, WRN-28-10, and VGG-16-BN). Across all settings, the ordered pipeline achieves a stronger accuracy-size-latency frontier than any single technique alone, reaching 0.99-1.42 ms CPU latency with competitive accuracy and compact checkpoints. Controlled ordering ablations with a fixed 20/40/40 epoch allocation further confirm that stage order is consequential, with the proposed ordering generally performing best among the tested permutations. Overall, our results provide a simple guideline for edge deployment: evaluate compression choices in the joint accuracy-size-latency space using measured runtime, rather than proxy metrics alone.
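流水线中"剪枝、量化"两个阶段对单个权重矩阵的数值操作，可以用NumPy最小示例勾勒（逐张量对称INT8量化；仅演示数值变换本身，与论文的训练流程无关）：

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """非结构化幅值剪枝：将幅值最小的一部分权重置零。"""
    k = int(round(sparsity * W.size))
    if k == 0:
        return W.copy()
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.where(np.abs(W) <= thresh, 0.0, W)

def quantize_int8(W):
    """对称逐张量INT8量化：返回int8权重与反量化尺度。"""
    scale = np.abs(W).max() / 127.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale
```

这也说明了摘要的观察：置零本身不会让稠密矩阵乘法变快，而降到INT8才改变了实际的算术与访存成本。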


【17】A Theory-guided Weighted $L^2$ Loss for solving the BGK model via Physics-informed neural networks
标题:通过物理信息神经网络求解BGK模型的理论指导加权$L^2$损失
链接:https://arxiv.org/abs/2604.04971

作者:Gyounghun Ko,Sung-Jun Son,Seung Yeon Cho,Myeong-Su Lee
备注:26 pages, 9 figures
摘要:虽然物理信息神经网络为求解偏微分方程提供了一个有前景的框架，但应用于Bhatnagar-Gross-Krook（BGK）模型时，标准的$L^2$损失形式从根本上是不够的。具体而言，仅仅最小化标准损失并不能保证对宏观矩的准确预测，导致近似解无法捕获真实的物理解。为克服这一局限，我们引入了速度加权的$L^2$损失函数，旨在有效惩罚高速区域中的误差。通过为所提方法建立稳定性估计，我们证明最小化所提出的加权损失能保证近似解的收敛性。此外，数值实验表明，与标准方法相比，采用这种加权PINN损失在各种基准上带来更优的精度和鲁棒性。
摘要:While Physics-Informed Neural Networks offer a promising framework for solving partial differential equations, the standard $L^2$ loss formulation is fundamentally insufficient when applied to the Bhatnagar-Gross-Krook (BGK) model. Specifically, simply minimizing the standard loss does not guarantee accurate predictions of the macroscopic moments, causing the approximate solutions to fail in capturing the true physical solution. To overcome this limitation, we introduce a velocity-weighted $L^2$ loss function designed to effectively penalize errors in the high-velocity regions. By establishing a stability estimate for the proposed approach, we show that minimizing the proposed weighted loss guarantees the convergence of the approximate solution. Also, numerical experiments demonstrate that employing this weighted PINN loss leads to superior accuracy and robustness across various benchmarks compared to the standard approach.
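速度加权$L^2$损失的基本形式可以用一个示意性片段说明（权重函数w(v)的具体形式为示例假设，论文的实际选择以原文为准）：

```python
import numpy as np

def weighted_l2_loss(pred, target, v, w=lambda v: 1.0 + np.abs(v) ** 2):
    """速度加权L2损失：高速区域的残差被赋予更大的权重。"""
    return np.mean(w(v) * (pred - target) ** 2)
```

与普通均方误差相比，同样大小的残差在 |v| 较大处贡献更大的损失，从而迫使网络在高速尾部区域拟合得更准。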


【18】Pixel-Translation-Equivariant Quantum Convolutional Neural Networks via Fourier Multiplexers
标题:通过傅里叶复用器实现像素平移等变的量子卷积神经网络
链接:https://arxiv.org/abs/2604.06094

作者:Dmitry Chirkov,Igor Lobanov
摘要:卷积神经网络的成功很大程度上归功于硬编码的翻译等变性。量子卷积神经网络(QCNN)已被提出作为近期量子模拟,但相关的翻译概念取决于数据编码。对于地址/幅度编码(如FRQI),像素移位充当索引寄存器上的模加法,而许多受MERA启发的QCNN仅在物理量子位的循环排列下是等变的。我们将这种不匹配形式化,并构造QCNN层,这些层与编码引起的像素循环移位(PCS)对称性完全一致。我们的主要技术成果是所有PCS-等变酉的构造性表征:量子傅里叶变换(QFT)的共轭使平移对角化,因此任何PCS-等变层都是傅里叶模复用器,然后是逆QFT(IQFT)。在此特征的基础上,我们引入了一个深度PCS-QCNN,它具有测量诱导池化、延迟调节和层间QFT取消。我们还分析了随机初始化的可训练性,并证明了在深度缩放制度中保持恒定的预期平方梯度范数的下限,从这个意义上排除了深度引起的贫瘠高原。
摘要:Convolutional neural networks owe much of their success to hard-coding translation equivariance. Quantum convolutional neural networks (QCNNs) have been proposed as near-term quantum analogues, but the relevant notion of translation depends on the data encoding. For address/amplitude encodings such as FRQI, a pixel shift acts as modular addition on an index register, whereas many MERA-inspired QCNNs are equivariant only under cyclic permutations of physical qubits. We formalize this mismatch and construct QCNN layers that commute exactly with the pixel cyclic shift (PCS) symmetry induced by the encoding. Our main technical result is a constructive characterization of all PCS-equivariant unitaries: conjugation by the quantum Fourier transform (QFT) diagonalizes translations, so any PCS-equivariant layer is a Fourier-mode multiplexer followed by an inverse QFT (IQFT). Building on this characterization, we introduce a deep PCS-QCNN with measurement-induced pooling, deferred conditioning, and inter-layer QFT cancellation. We also analyze trainability at random initialization and prove a lower bound on the expected squared gradient norm that remains constant in a depth-scaling regime, ruling out a depth-induced barren plateau in that sense.
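"QFT共轭使平移对角化"这一核心事实有一个经典的数值对应：离散傅里叶变换把循环移位算子对角化为单位模长的相位。可用NumPy直接验证（这只是经典DFT层面的验证，并非量子线路实现）：

```python
import numpy as np

n = 8
F = np.fft.fft(np.eye(n)) / np.sqrt(n)   # 酉DFT矩阵
P = np.roll(np.eye(n), 1, axis=0)        # 循环移位置换 (x_k -> x_{k-1})
D = F @ P @ F.conj().T                   # 傅里叶基下的移位算子

offdiag = D - np.diag(np.diag(D))        # 若对角化成立，此矩阵应为零
```

任何与移位可交换的算子在同一基下也是对角的，这正是摘要中"傅里叶模复用器"刻画的来源。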


【19】A deep learning framework for jointly solving transient Fokker-Planck equations with arbitrary parameters and initial distributions
标题:用于联合求解具有任意参数和初始分布的瞬态福克-普朗克方程的深度学习框架
链接:https://arxiv.org/abs/2604.06001

作者:Xiaolong Wang,Jing Feng,Qi Liu,Chengli Tan,Yuanyuan Liu,Yong Xu
摘要:有效地求解福克-普朗克方程(FPE)是分析复杂参数化随机系统的核心。然而,当前的数值方法缺乏跨不同条件的并行计算能力,严重限制了全面的参数探索和瞬态分析。本文介绍了一种基于深度学习的伪分析概率解(PAPS),该方法通过单个训练过程,同时求解任意多模态初始分布、系统参数和时间点的瞬态FPE解。其核心思想是通过高斯混合分布(GMD)统一初始,瞬态和稳态分布,并开发一种约束保持自动编码器,将受约束的GMD参数双射映射到无约束的低维潜在表示。在这个表示空间中,全景瞬态动态跨越不同的初始条件和系统参数可以由一个单一的进化网络建模。大量的实验表明,所提出的PAPS保持高精度,同时实现推理速度比GPU加速蒙特卡罗模拟快四个数量级。这种效率的飞跃,使以前棘手的实时参数扫描和随机分岔的系统调查。通过将表示学习与物理信息瞬态动力学解耦,我们的工作为多维参数化随机系统的概率建模建立了一个可扩展的范式。
摘要:Efficiently solving the Fokker-Planck equation (FPE) is central to analyzing complex parameterized stochastic systems. However, current numerical methods lack parallel computation capabilities across varying conditions, severely limiting comprehensive parameter exploration and transient analysis. This paper introduces a deep learning-based pseudo-analytical probability solution (PAPS) that, via a single training process, simultaneously resolves transient FPE solutions for arbitrary multi-modal initial distributions, system parameters, and time points. The core idea is to unify initial, transient, and stationary distributions via Gaussian mixture distributions (GMDs) and develop a constraint-preserving autoencoder that bijectively maps constrained GMD parameters to unconstrained, low-dimensional latent representations. In this representation space, the panoramic transient dynamics across varying initial conditions and system parameters can be modeled by a single evolution network. Extensive experiments on paradigmatic systems demonstrate that the proposed PAPS maintains high accuracy while achieving inference speeds four orders of magnitude faster than GPU-accelerated Monte Carlo simulations. This efficiency leap enables previously intractable real-time parameter sweeps and systematic investigations of stochastic bifurcations. By decoupling representation learning from physics-informed transient dynamics, our work establishes a scalable paradigm for probabilistic modeling of multi-dimensional, parameterized stochastic systems.
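"将受约束的GMD参数双射映射到无约束潜在表示"的一种常见做法是：混合权重用softmax（最后一维固定为0以消除冗余自由度、保证双射），方差用exp保证严格为正。以下为这一思路的示意（具体的自编码器结构以论文为准）：

```python
import numpy as np

def latent_to_gmd(z_w, z_mu, z_logvar):
    """无约束潜变量 -> 合法的高斯混合（GMD）参数。
    z_w: (K-1,) 权重logits（隐式补一个0分量保证双射）。"""
    logits = np.append(z_w, 0.0)
    e = np.exp(logits - logits.max())   # 数值稳定的softmax
    weights = e / e.sum()               # 非负且和为1
    variances = np.exp(z_logvar)        # 严格为正
    return weights, z_mu, variances
```

这样演化网络可以在无约束的低维潜空间中自由运作，而解码出的混合分布参数始终合法。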


【20】Multiscale Physics-Informed Neural Network for Complex Fluid Flows with Long-Range Dependencies
标题:具有长程依赖性的复杂流体流动的多尺度物理信息神经网络
链接:https://arxiv.org/abs/2604.05652

作者:Prashant Kumar,Rajesh Ranjan
备注:16 pages, 10 figures
摘要:流体流动由非线性Navier-Stokes方程控制，即使在可预测的初始条件下，该方程也可能表现出多尺度动力学。预测此类现象仍是科学机器学习中的一大挑战，尤其是在收敛速度、数据需求和解的精度方面。在复杂流体流动中，这些挑战因远端边界条件引起的长程空间依赖性而加剧，通常需要大量监督数据才能取得可接受的结果。我们提出域分解与移位物理信息神经网络（DDS-PINN），一个旨在以最少监督解析这类多尺度相互作用的框架。通过使用带有统一全局损失的局部化网络，DDS-PINN在保持局部精度的同时捕获全局依赖性。该方法的鲁棒性在一系列基准上得到验证，包括多尺度线性微分方程、非线性Burgers方程，以及平板边界层的无数据Navier-Stokes模拟。最后，DDS-PINN被应用于计算上具有挑战性的后向台阶（BFS）问题：对于层流状态（Re = 100），该模型无需任何数据即可产生与计算流体动力学（CFD）相当的结果，准确预测边界层厚度、分离与再附着长度。对于Re = 10,000的湍流BFS流动，该框架仅使用500个随机监督点（不到总计算域的0.3%）即实现收敛到O(10^-4)量级，在精度上优于基于残差注意力的PINN等成熟方法。该方法展示了从稀疏实验测量对复杂湍流进行超分辨率重建的强大潜力。
摘要:Fluid flows are governed by the nonlinear Navier-Stokes equations, which can manifest multiscale dynamics even from predictable initial conditions. Predicting such phenomena remains a formidable challenge in scientific machine learning, particularly regarding convergence speed, data requirements, and solution accuracy. In complex fluid flows, these challenges are exacerbated by long-range spatial dependencies arising from distant boundary conditions, which typically necessitate extensive supervision data to achieve acceptable results. We propose the Domain-Decomposed and Shifted Physics-Informed Neural Network (DDS-PINN), a framework designed to resolve such multiscale interactions with minimal supervision. By utilizing localized networks with a unified global loss, DDS-PINN captures global dependencies while maintaining local precision. The robustness of the approach is demonstrated across a suite of benchmarks, including a multiscale linear differential equation, the nonlinear Burgers' equation, and data-free Navier-Stokes simulations of flat-plate boundary layers. Finally, DDS-PINN is applied to the computationally challenging backward-facing step (BFS) problem; for laminar regimes (Re = 100), the model yields results comparable to computational fluid dynamics (CFD) without the need for any data, accurately predicting boundary layer thickness, separation, and reattachment lengths. For turbulent BFS flow at Re = 10,000, the framework achieves convergence to O(10^-4) using only 500 random supervision points (< 0.3 % of the total domain), outperforming established methods like Residual-based Attention-PINN in accuracy. This approach demonstrates strong potential for the super-resolution of complex turbulent flows from sparse experimental measurements.
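"局部化模型 + 统一全局损失"的思想可以在一个极简的一维拟合问题上演示：两个子域各配一个局部线性模型，全局损失由拟合残差与界面连续性罚项共同构成（仅为机制示意，目标函数、罚项权重与优化器均为示例假设，并非DDS-PINN本身）：

```python
import numpy as np

# 目标：在[0,1]上拟合折线 y = |x - 0.5|，左右子域各用一个局部线性模型
x = np.linspace(0.0, 1.0, 101)
y = np.abs(x - 0.5)
left = x <= 0.5

def global_loss(p):
    """统一全局损失 = 两个子域的拟合残差 + 界面连续性罚项。"""
    a1, b1, a2, b2 = p
    fit = np.where(left, a1 * x + b1, a2 * x + b2)
    mismatch = (a1 * 0.5 + b1) - (a2 * 0.5 + b2)   # 界面处的不连续量
    return np.mean((fit - y) ** 2) + mismatch ** 2

def num_grad(p, eps=1e-6):
    """中心差分数值梯度（示意用，实际框架中用自动微分）。"""
    g = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p); e[i] = eps
        g[i] = (global_loss(p + e) - global_loss(p - e)) / (2 * eps)
    return g

p = np.zeros(4)
for _ in range(10000):      # 简单梯度下降联合训练两个局部模型
    p -= 0.1 * num_grad(p)
```

两个局部模型只各自负责自己的子域，但统一损失中的界面项使它们拼成一个全局一致的解。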


【21】Individual-heterogeneous sub-Gaussian Mixture Models
标题:个体异质亚高斯混合模型
链接:https://arxiv.org/abs/2604.05337

作者:Huan Qing
备注:32 pages, 4 figures, 2 tables
摘要:经典的高斯混合模型假设聚类内的均匀性,这一假设在现实世界的数据中经常失败,因为观察自然会表现出不同的尺度或强度。为了解决这个问题,我们引入了个体异质亚高斯混合模型,这是一个灵活的框架,它为每个观测分配了自己的异质性参数,从而明确地捕捉了实际应用中固有的异质性。在此模型的基础上,我们提出了一种高效的谱方法,该方法可证明在温和的分离条件下实现了真实集群标签的精确恢复,即使在特征数量远远超过样本数量的高维环境中也是如此。在合成数据和真实数据上的数值实验表明,我们的方法始终优于现有的聚类算法,包括那些为经典高斯混合模型设计的算法。
摘要:The classical Gaussian mixture model assumes homogeneity within clusters, an assumption that often fails in real-world data where observations naturally exhibit varying scales or intensities. To address this, we introduce the individual-heterogeneous sub-Gaussian mixture model, a flexible framework that assigns each observation its own heterogeneity parameter, thereby explicitly capturing the heterogeneity inherent in practical applications. Built upon this model, we propose an efficient spectral method that provably achieves exact recovery of the true cluster labels under mild separation conditions, even in high-dimensional settings where the number of features far exceeds the number of samples. Numerical experiments on both synthetic and real data demonstrate that our method consistently outperforms existing clustering algorithms, including those designed for classical Gaussian mixture models.
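针对"每个观测有自己的异质性（尺度）参数"这一设定，一个直观的谱方法草图是：先对每行做归一化以去除个体尺度，再用首个右奇异方向的投影符号划分两簇（仅为两簇情形的机制示意，并非论文算法本身）：

```python
import numpy as np

def spectral_two_clusters(X):
    """行归一化（去除个体尺度）+ 首个右奇异向量投影符号划分两簇。"""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    _, _, Vt = np.linalg.svd(Xn, full_matrices=False)
    return (Xn @ Vt[0] > 0).astype(int)

# 玩具数据：两簇均值为 ±e1，个体尺度差异悬殊
rng = np.random.default_rng(0)
n, d = 50, 5
labels = np.repeat([0, 1], n // 2)
mu = np.where(labels[:, None] == 0, 1.0, -1.0) * np.eye(d)[0]
scales = rng.uniform(0.5, 5.0, size=(n, 1))       # 每个观测自己的尺度
X = scales * (mu + 0.05 * rng.standard_normal((n, d)))
```

若直接对未归一化的X做聚类，尺度差异会淹没均值差异；行归一化正是消除个体异质性的最简手段。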


【22】Robust Learning of Heterogeneous Dynamic Systems
标题:异类动态系统的鲁棒学习
链接:https://arxiv.org/abs/2604.05285

作者:Shuoxun Xu,Zijian Guo,Brooke R. Staveland,Robert T. Knight,Lexin Li
摘要:常微分方程(ODE)为在广泛的科学领域中产生的动态系统建模提供了一个强大的框架。然而,大多数现有的ODE方法集中在一个单一的系统,并没有充分解决的问题,学习共享模式从多个异构的动态系统。在这篇文章中,我们提出了一种新的分布式鲁棒学习方法建模异构ODE系统。具体来说,我们通过最大化由轨迹导数的凸组合形成的不确定性类的最坏情况奖励来构建一个鲁棒的动态系统。我们表明,得到的估计量承认显式的加权平均表示,其中权重是从平衡多个数据源信息的二次优化中获得的。我们进一步开发了一个双水平的稳定过程,以解决潜在的不稳定估计。我们建立了严格的理论保证所提出的方法,包括稳定的权重,鲁棒轨迹估计的误差界的一致性,逐点置信区间的渐近有效性。我们证明,所提出的方法大大提高了泛化性能相比,通过广泛的模拟和颅内脑电图数据的分析的替代解决方案。
摘要 :Ordinary differential equations (ODEs) provide a powerful framework for modeling dynamic systems arising in a wide range of scientific domains. However, most existing ODE methods focus on a single system, and do not adequately address the problem of learning shared patterns from multiple heterogeneous dynamic systems. In this article, we propose a novel distributionally robust learning approach for modeling heterogeneous ODE systems. Specifically, we construct a robust dynamic system by maximizing a worst-case reward over an uncertainty class formed by convex combinations of the derivatives of trajectories. We show the resulting estimator admits an explicit weighted average representation, where the weights are obtained from a quadratic optimization that balances information across multiple data sources. We further develop a bi-level stabilization procedure to address potential instability in estimation. We establish rigorous theoretical guarantees for the proposed method, including consistency of the stabilized weights, error bound for robust trajectory estimation, and asymptotical validity of pointwise confidence interval. We demonstrate that the proposed method considerably improves the generalization performance compared to the alternative solutions through both extensive simulations and the analysis of an intracranial electroencephalogram data.
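"权重来自单纯形上的二次优化"这一构件可以用一个通用草图说明：在概率单纯形上最小化二次型w^T Σ w的投影梯度法（这是单纯形约束二次加权的一个一般性实例，并非论文中确切的优化问题；Σ取对角阵仅为演示）：

```python
import numpy as np

def proj_simplex(v):
    """欧氏投影到概率单纯形（经典的排序阈值算法）。"""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u > css / (np.arange(len(v)) + 1.0))[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def min_variance_weights(Sigma, iters=500, lr=0.1):
    """在单纯形上最小化 w^T Sigma w 的投影梯度法。"""
    K = Sigma.shape[0]
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        w = proj_simplex(w - lr * 2.0 * Sigma @ w)
    return w
```

直觉上，噪声更大的数据源获得更小的权重，得到的凸组合即一种跨源信息平衡的加权平均。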


【23】fastml: Guarded Resampling Workflows for Safer Automated Machine Learning in R
标题:fastml:用于R中更安全自动化机器学习的受保护重采样工作流
链接:https://arxiv.org/abs/2604.05225

作者:Selcuk Korkmaz,Dincer Goksuluk,Eda Karaismailoglu
备注:36 pages, 2 figures
摘要:当缩放、插补或其他依赖数据的变换在重采样之前被估计时，就会产生预处理泄漏，它会抬高表观性能，同时又难以被察觉。我们提出fastml，一个R包，通过受保护的重采样（guarded resampling）为泄漏感知的机器学习提供单次调用接口：预处理在每个重采样内部重新估计，并应用于相应的评估数据。该软件包支持分组和按时间排序的重采样，阻止高风险配置，审计recipe的外部依赖，并包含沙箱执行与集成的模型解释。我们通过对比全局与折内归一化的蒙特卡罗模拟、与tidymodels在匹配设定下的可用性比较，以及不同规模数据集上的生存分析基准来评估fastml。模拟结果表明，相对于受保护的重采样，全局预处理会大幅抬高表观性能。fastml在减少工作流编排的同时，与tidymodels得到的留出性能相当，并通过统一接口支持对多类生存模型的一致基准测试。
摘要:Preprocessing leakage arises when scaling, imputation, or other data-dependent transformations are estimated before resampling, inflating apparent performance while remaining hard to detect. We present fastml, an R package that provides a single-call interface for leakage-aware machine learning through guarded resampling, where preprocessing is re-estimated inside each resample and applied to the corresponding assessment data. The package supports grouped and time-ordered resampling, blocks high-risk configurations, audits recipes for external dependencies, and includes sandboxed execution and integrated model explanation. We evaluate fastml with a Monte Carlo simulation contrasting global and fold-local normalization, a usability comparison with tidymodels under matched specifications, and survival benchmarks across datasets of different sizes. The simulation demonstrates that global preprocessing substantially inflates apparent performance relative to guarded resampling. fastml matched held-out performance obtained with tidymodels while reducing workflow orchestration, and it supported consistent benchmarking of multiple survival model classes through a unified interface.
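"受保护的重采样"的核心规则是：任何依赖数据的变换只能在每个折的训练部分上估计。虽然fastml是R包，其思想与语言无关，可以用一个Python草图说明（函数名为示例假设）：

```python
import numpy as np

def guarded_cv_standardize(X, n_folds=5):
    """折内标准化：每折的均值/方差只在该折的训练部分估计，
    再应用到对应的评估部分，从而避免预处理泄漏。"""
    n = len(X)
    idx = np.arange(n)
    folds = np.array_split(idx, n_folds)
    out = np.empty_like(X, dtype=float)
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        mu = X[train_idx].mean(axis=0)   # 统计量只来自训练部分
        sd = X[train_idx].std(axis=0)
        out[test_idx] = (X[test_idx] - mu) / sd
    return out
```

与之相对，"全局标准化后再交叉验证"会让评估数据参与统计量估计，这正是摘要中模拟所揭示的性能虚高来源。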


【24】Learning to Unscramble Feynman Loop Integrals with SAILIR
标题:用SAILIR学习解扰费曼圈积分
链接:https://arxiv.org/abs/2604.05034

作者:David Shih
备注:16 pages, 3 figures, 5 tables, work done in collaboration with Claude Code
摘要:将费曼积分通过分部积分(IBP)化简到主积分,是高能物理精确计算中的一个关键计算瓶颈。基于Laporta算法的传统方法需要求解大型方程组,导致内存消耗随积分复杂度迅速增长。我们提出SAILIR(Self-supervised AI for Loop Integral Reduction),一种新的机器学习方法:基于transformer的分类器以完全在线的方式逐步引导积分的化简。分类器以完全自监督的方式在由“加扰/解扰”程序生成的合成数据上训练:将已知的化简恒等式反向应用以构造复杂度递增的表达式,分类器学习撤销这些步骤。当与束搜索以及高度并行化的异步单回合化简策略结合时,SAILIR可以在有界内存下化简任意高权重的积分。我们在两圈三角形-盒子拓扑上对SAILIR进行基准测试,在16个不同复杂度的积分上与最先进的IBP化简程序Kira进行比较。虽然SAILIR的墙钟时间更慢,但其每个工作进程的内存消耗几乎不随积分复杂度变化,而Kira的内存随复杂度迅速增长。对于此处考虑的最复杂的积分,SAILIR仅使用Kira 40%的内存,同时化简时间相当。这展示了一种全新的IBP化简范式,其中基于Laporta方法的内存瓶颈有望被完全克服,可能为目前难以处理的精确计算打开大门。
摘要:Integration-by-parts (IBP) reduction of Feynman integrals to master integrals is a key computational bottleneck in precision calculations in high-energy physics. Traditional approaches based on the Laporta algorithm require solving large systems of equations, leading to memory consumption that grows rapidly with integral complexity. We present SAILIR (Self-supervised AI for Loop Integral Reduction), a new machine learning approach in which a transformer-based classifier guides the reduction of integrals one step at a time in a fully online fashion. The classifier is trained in an entirely self-supervised manner on synthetic data generated by a scramble/unscramble procedure: known reduction identities are applied in reverse to build expressions of increasing complexity, and the classifier learns to undo these steps. When combined with beam search and a highly parallelized, asynchronous, single-episode reduction strategy, SAILIR can reduce integrals of arbitrarily high weight with bounded memory. We benchmark SAILIR on the two-loop triangle-box topology, comparing against the state-of-the-art IBP reduction code Kira across 16 integrals of varying complexity. While SAILIR is slower in wall-clock time, its per-worker memory consumption remains approximately flat regardless of integral complexity, in contrast to Kira whose memory grows rapidly with complexity. For the most complex integrals considered here, SAILIR uses only 40\% of the memory of Kira while achieving comparable reduction times. This demonstrates a fundamentally new paradigm for IBP reduction in which the memory bottleneck of Laporta-based approaches could be entirely overcome, potentially opening the door to precision calculations that are currently intractable.


【25】Learning Nonlinear Regime Transitions via Semi-Parametric State-Space Models
标题:通过半参数状态空间模型学习非线性区制转移
链接:https://arxiv.org/abs/2604.04963

作者:Prakul Sunil Hiremath
备注:12 pages, 1 figure, 2 tables
摘要:我们为带有潜在区制转移的时间序列数据开发了一个半参数状态空间模型。经典的马尔可夫区制转换模型使用固定的参数化转移函数(如logistic或probit链接),当转移依赖于非线性且依赖上下文的效应时,这限制了灵活性。我们将该假设替换为学习得到的函数$f_0, f_1 \in \mathcal{H}$,其中$\mathcal{H}$是再生核希尔伯特空间或样条逼近空间,并将转移概率定义为$p_{jk,t} = \sigma(f(\mathbf{x}_{t-1}))$。转移函数与发射参数通过广义期望最大化算法联合估计:E步使用标准的前向-后向递归,而M步归结为一个以平滑占据测度为权重的惩罚回归问题。我们建立了可识别性条件,并为所得估计量给出一致性论证。合成数据上的实验表明,与参数化基线相比,非线性转移动态的恢复得到改善。对金融时间序列的实证研究显示出更好的区制分类以及更早的转移事件检测。
摘要:We develop a semi-parametric state-space model for time-series data with latent regime transitions. Classical Markov-switching models use fixed parametric transition functions, such as logistic or probit links, which restrict flexibility when transitions depend on nonlinear and context-dependent effects. We replace this assumption with learned functions $f_0, f_1 \in \mathcal{H}$, where $\mathcal{H}$ is either a reproducing kernel Hilbert space or a spline approximation space, and define transition probabilities as $p_{jk,t} = \sigma(f(\mathbf{x}_{t-1}))$. The transition functions are estimated jointly with emission parameters using a generalized Expectation-Maximization algorithm. The E-step uses the standard forward-backward recursion, while the M-step reduces to a penalized regression problem with weights from smoothed occupation measures. We establish identifiability conditions and provide a consistency argument for the resulting estimators. Experiments on synthetic data show improved recovery of nonlinear transition dynamics compared to parametric baselines. An empirical study on financial time series demonstrates improved regime classification and earlier detection of transition events.
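摘要中的转移概率 $p_{jk,t}=\sigma(f(\mathbf{x}_{t-1}))$ 可用下面的小例子直观说明(示意性质:真实方法中 $f$ 在RKHS或样条空间中由数据学得,这里代之以一个任意选定的非线性函数):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def f_transition(x):
    """演示用的任意非线性转移函数;原文中 f 属于RKHS或样条逼近空间,由数据估计。"""
    return math.sin(2.0 * x) + 0.5 * x

def transition_prob(x_prev):
    """由上一时刻协变量 x_{t-1} 给出区制间的转移概率 p = sigmoid(f(x_{t-1}))。"""
    return sigmoid(f_transition(x_prev))

# 在一段协变量取值上计算转移概率曲线
probs = [transition_prob(x / 10.0) for x in range(-20, 21)]
```

与固定的logistic链接不同,这里的转移概率可以随协变量非单调地变化,这正是半参数设定所要捕捉的灵活性。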


其他(26篇)

【1】In-Place Test-Time Training
标题:就地测试时训练
链接:https://arxiv.org/abs/2604.06169

作者:Guhao Feng,Shengjie Luo,Kai Hua,Ge Zhang,Di He,Wenhao Huang,Tianle Cai
备注:ICLR 2026 Oral Presentation; Code is released at https://github.com/ByteDance-Seed/In-Place-TTT
摘要:静态的“训练然后部署”范式从根本上限制了大型语言模型(LLM)动态调整自身权重,以响应真实世界任务中固有的连续新信息流。测试时训练(TTT)通过在推理时更新模型参数的子集(快速权重)提供了一种引人注目的替代方案,但其在当前LLM生态中的潜力受到关键障碍的阻碍,包括架构不兼容、计算效率低下,以及快速权重目标与语言建模不一致。在这项工作中,我们介绍就地测试时训练(就地TTT),一个无缝赋予LLM测试时训练能力的框架。就地TTT将无处不在的MLP块的最终投影矩阵视为其自适应快速权重,从而无需从头开始昂贵的重新训练即可对LLM实现“即插即用”式增强。此外,我们将TTT的通用重建目标替换为定制的、有理论依据的目标,该目标与支配自回归语言建模的下一词预测任务明确对齐。这一有原则的目标,结合高效的分块更新机制,得到一个与上下文并行兼容的高度可扩展算法。大量实验验证了我们框架的有效性:作为一种就地增强,它使4B参数模型能够在上下文长达128k的任务上取得卓越性能;当从头开始预训练时,它始终优于有竞争力的TTT相关方法。消融研究结果进一步为我们的设计选择提供了更深入的见解。总的来说,我们的结果确立了就地TTT是迈向LLM持续学习范式的充满希望的一步。
摘要:The static ``train then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast weights) at inference time, yet its potential in the current LLM ecosystem is hindered by critical barriers including architectural incompatibility, computational inefficiency and misaligned fast weight objectives for language modeling. In this work, we introduce In-Place Test-Time Training (In-Place TTT), a framework that seamlessly endows LLMs with Test-Time Training ability. In-Place TTT treats the final projection matrix of the ubiquitous MLP blocks as its adaptable fast weights, enabling a ``drop-in" enhancement for LLMs without costly retraining from scratch. Furthermore, we replace TTT's generic reconstruction objective with a tailored, theoretically-grounded objective explicitly aligned with the Next-Token-Prediction task governing autoregressive language modeling. This principled objective, combined with an efficient chunk-wise update mechanism, results in a highly scalable algorithm compatible with context parallelism. Extensive experiments validate our framework's effectiveness: as an in-place enhancement, it enables a 4B-parameter model to achieve superior performance on tasks with contexts up to 128k, and when pretrained from scratch, it consistently outperforms competitive TTT-related approaches. Ablation study results further provide deeper insights on our design choices. Collectively, our results establish In-Place TTT as a promising step towards a paradigm of continual learning in LLMs.
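“将MLP末端投影矩阵当作快速权重并在推理时更新”这一点,可以用下面的玩具例子示意(为保持自包含,此处用平方误差作替代目标;原文使用与下一词预测对齐的专门目标,并按块(chunk)更新,这里的矩阵与学习率均为演示假设):

```python
def matvec(W, x):
    """朴素矩阵-向量乘法。"""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def ttt_update(W, h, target, lr=0.1):
    """对末端投影矩阵(快速权重)做一步梯度更新。
    演示用平方误差 L = ||W h - target||^2,梯度 dL/dW[i][j] = 2*err[i]*h[j]。"""
    y = matvec(W, h)
    err = [yi - ti for yi, ti in zip(y, target)]
    return [[W[i][j] - lr * err[i] * h[j] for j in range(len(h))]
            for i in range(len(W))]

def loss(W, h, target):
    y = matvec(W, h)
    return sum((yi - ti) ** 2 for yi, ti in zip(y, target))

W = [[1.0, 0.0], [0.0, 1.0]]   # 慢权重的当前值,在测试时被就地修改
h = [1.0, 2.0]                 # 该MLP块的隐藏激活
target = [0.0, 0.0]            # 演示目标
W_new = ttt_update(W, h, target)
```

一步更新后,同一输入上的替代损失下降,体现了“推理即学习”的快速权重机制。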


【2】The Character Error Vector: Decomposable errors for page-level OCR evaluation
标题:字符错误向量:用于页面级OCR评估的可分解误差
链接:https://arxiv.org/abs/2604.06160

作者:Jonathan Bourne,Mwiza Simbeye,Joseph Nockels
备注:6643 words, 5 figures, 15 tables
摘要:字符错误率(CER)是评价光学字符识别(OCR)质量的关键指标。然而,该指标假设文本已被完美解析,而事实往往并非如此。在页面解析出错时,CER变得没有定义,这限制了它作为指标的使用,并使页面级OCR的评估变得困难,尤其是在使用不共享同一标注模式的数据时。我们提出字符错误向量(CEV),一种基于字符袋(bag-of-characters)的OCR评估器。CEV可以分解为解析误差、OCR误差与交互误差分量。这种可分解性使从业者能够聚焦于文档理解流水线中对整体文本提取质量影响最大的环节。CEV可用多种方法实现,我们演示了其中的SpACER(空间感知字符错误率)和一种基于Jensen-Shannon距离的字符分布方法。我们对照其他指标验证CEV的表现:首先是与CER的关系,其次是解析质量,最后是作为页面级OCR质量的直接度量。验证过程表明,CEV是连接解析类指标与CER等局部指标的有价值的桥梁。我们分析了一个由退化图像和复杂版面构成的档案报纸数据集,发现最先进的端到端模型被更传统的流水线方法超越。虽然CEV需要字符级定位才能实现最优分诊,但对易于获得的值进行阈值处理即可以0.91的F1预测主要误差来源。我们将CEV作为一个Python库的一部分发布,以支持文档理解研究。
摘要:The Character Error Rate (CER) is a key metric for evaluating the quality of Optical Character Recognition (OCR). However, this metric assumes that text has been perfectly parsed, which is often not the case. Under page-parsing errors, CER becomes undefined, limiting its use as a metric and making evaluating page-level OCR challenging, particularly when using data that do not share a labelling schema. We introduce the Character Error Vector (CEV), a bag-of-characters evaluator for OCR. The CEV can be decomposed into parsing and OCR, and interaction error components. This decomposability allows practitioners to focus on the part of the Document Understanding pipeline that will have the greatest impact on overall text extraction quality. The CEV can be implemented using a variety of methods, of which we demonstrate SpACER (Spatially Aware Character Error Rate) and a Character distribution method using the Jensen-Shannon Distance. We validate the CEV's performance against other metrics: first, the relationship with CER; then, parse quality; and finally, as a direct measure of page-level OCR quality. The validation process shows that the CEV is a valuable bridge between parsing metrics and local metrics like CER. We analyse a dataset of archival newspapers made of degraded images with complex layouts and find that state-of-the-art end-to-end models are outperformed by more traditional pipeline approaches. Whilst the CEV requires character-level positioning for optimal triage, thresholding on easily available values can predict the main error source with an F1 of 0.91. We provide the CEV as part of a Python library to support Document understanding research.
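摘要提到的基于Jensen-Shannon距离的字符分布比较,可写成如下小示意(纯演示:文本与实现细节均为假设,并非该Python库的代码;以2为底时JS距离落在[0, 1]区间内):

```python
import math
from collections import Counter

def char_dist(text):
    """把文本压成字符袋分布(忽略位置,只看字符频率)。"""
    c = Counter(text)
    n = sum(c.values())
    return {ch: cnt / n for ch, cnt in c.items()}

def js_distance(p, q):
    """两个字符分布之间的Jensen-Shannon距离(JS散度的平方根,log以2为底)。"""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    def kl(a):
        return sum(a.get(k, 0.0) * math.log2(a.get(k, 0.0) / m[k])
                   for k in keys if a.get(k, 0.0) > 0)
    return math.sqrt(0.5 * kl(p) + 0.5 * kl(q))

# 演示:OCR把 l->1、o->0 之类的字符替换会拉开分布距离
d_same = js_distance(char_dist("hello world"), char_dist("hello world"))
d_diff = js_distance(char_dist("hello world"), char_dist("he11o w0rld"))
```

因为字符袋不依赖阅读顺序,这类度量即使在页面解析出错、CER无定义时仍可计算,这正是CEV想要填补的空缺。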


【3】Gym-Anything: Turn any Software into an Agent Environment
标题:Gym-Anything:将任何软件变成代理环境
链接:https://arxiv.org/abs/2604.06126

作者:Pranjal Aggarwal,Graham Neubig,Sean Welleck
摘要:计算机使用智能体有望协助广泛的数字经济活动。然而,目前的研究主要集中在经济价值有限的少数软件上的短程任务,如基本的电子商务和操作系统配置任务。一个关键原因是,为复杂软件创建环境需要大量时间和人力,因此无法扩展。为解决这一问题,我们提出Gym-Anything,一个将任何软件转换为可交互计算机使用环境的框架。我们将环境创建本身视为一个多智能体任务:编码智能体编写安装脚本、下载真实世界数据并配置软件,同时生成设置正确的证据;随后由独立的审计智能体依据质量检查表核验环境设置的证据。借助基于美国GDP数据的高经济价值职业分类法,我们将该流水线应用于职业覆盖面广泛的200个软件应用。其结果是CUA-World,一个超过1万个长程任务的集合,领域横跨医学、天文学、工程与企业系统,每个任务都配置了真实数据以及训练与测试划分。CUA-World还包括CUA-World-Long,一个具有挑战性的长程基准,其任务通常需要500步以上,远超现有基准。将训练划分中的成功轨迹蒸馏到一个2B视觉语言模型后,其表现优于规模为其2倍的模型。我们还在测试时应用同样的审计原则:由另一个VLM审查已完成的轨迹并就剩余工作给出反馈,将Gemini-3-Flash在CUA-World-Long上的成绩从11.5%提高到14.0%。我们发布全部代码、基础设施和基准数据,以促进未来面向真实计算机使用智能体的研究。
摘要:Computer-use agents hold the promise of assisting in a wide range of digital economic activities. However, current research has largely focused on short-horizon tasks over a limited set of software with limited economic value, such as basic e-commerce and OS-configuration tasks. A key reason is that creating environments for complex software requires significant time and human effort, and therefore does not scale. To address this, we introduce Gym-Anything, a framework for converting any software into an interactive computer-use environment. We frame environment creation itself as a multi-agent task: a coding agent writes setup scripts, downloads real-world data, and configures the software, while producing evidence of correct setup. An independent audit agent then verifies evidence for the environment setup against a quality checklist. Using a taxonomy of economically valuable occupations grounded in U.S. GDP data, we apply this pipeline to 200 software applications with broad occupational coverage. The result is CUA-World, a collection of over 10K long-horizon tasks spanning domains from medical science and astronomy to engineering and enterprise systems, each configured with realistic data along with train and test splits. CUA-World also includes CUA-World-Long, a challenging long-horizon benchmark with tasks often requiring over 500 steps, far exceeding existing benchmarks. Distilling successful trajectories from the training split into a 2B vision-language model outperforms models 2$\times$ its size. We also apply the same auditing principle at test time: a separate VLM reviews completed trajectories and provides feedback on what remains, improving Gemini-3-Flash on CUA-World-Long from 11.5% to 14.0%. We release all code, infrastructure, and benchmark data to facilitate future research in realistic computer-use agents.


【4】CoStream: Codec-Guided Resource-Efficient System for Video Streaming Analytics
标题:CoStream:编解码器引导的资源高效的视频流分析系统
链接:https://arxiv.org/abs/2604.06036

作者:Yulin Zou,Yan Chen,Wenyan Chen,JooYoung Park,Shivaraman Nitin,Luo Tao,Francisco Romero,Dmitrii Ustiugov
备注:18 pages, 34 figures
摘要:视频流分析是视觉语言模型服务的关键工作负载,但多模态推理的高成本限制了可扩展性。现有系统通过利用视频流中的时间和空间冗余来降低推理成本,但是它们以有限视图的Vision Transformer(ViT)或LLM为目标,从而留下未利用的端到端机会。此外,现有的方法会产生显着的开销来识别冗余,无论是通过离线分析和训练或昂贵的在线计算,使它们不适合动态实时流。   我们提出了CoStream,一个编解码器引导的流媒体视频分析系统,它建立在一个关键的观察基础上,即视频编解码器已经提取了每个流的时间和空间结构作为压缩的副产品。CoStream将此编解码器元数据视为低成本运行时信号,以统一视频解码,视觉处理和LLM预填充的优化,并将传输减少作为直接在压缩位流上操作的固有优势。这在ViT编码之前驱动编解码器引导的补丁修剪,并在LLM预填充期间驱动选择性键值缓存刷新,这两者都是完全在线的,不需要离线训练。实验表明,与最先进的基线相比,CoStream实现了高达3倍的吞吐量提升和高达87%的GPU计算减少,同时保持了具有竞争力的准确性,只有0-8%的F1下降。
摘要:Video streaming analytics is a crucial workload for vision-language model serving, but the high cost of multimodal inference limits scalability. Prior systems reduce inference cost by exploiting temporal and spatial redundancy in video streams, but they target either the vision transformer (ViT) or the LLM with a limited view, leaving end-to-end opportunities untapped. Moreover, existing methods incur significant overhead to identify redundancy, either through offline profiling and training or costly online computation, making them ill-suited for dynamic real-time streams.   We present CoStream, a codec-guided streaming video analytics system built on a key observation that video codecs already extract the temporal and spatial structure of each stream as a byproduct of compression. CoStream treats this codec metadata as a low-cost runtime signal to unify optimization across video decoding, visual processing, and LLM prefilling, with transmission reduction as an inherent benefit of operating directly on compressed bitstreams. This drives codec-guided patch pruning before ViT encoding and selective key-value cache refresh during LLM prefilling, both of which are fully online and do not require offline training. Experiments show that CoStream achieves up to 3x throughput improvement and up to 87% GPU compute reduction over state-of-the-art baselines, while maintaining competitive accuracy with only 0-8% F1 drop.
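“把编解码器元数据当作低成本运行时信号”的补丁修剪环节可以示意如下(阈值与运动幅值数据均为演示假设,并非CoStream的实现):运动矢量幅值低的补丁被视为时间冗余,跳过ViT重新编码而复用上一帧的特征。

```python
def prune_patches(motion_mags, threshold=0.5):
    """根据编解码器运动矢量幅值(压缩的副产品)选出需重新编码的补丁索引。
    幅值低于阈值的补丁视为时间冗余,复用上一帧的视觉特征。"""
    return [i for i, m in enumerate(motion_mags) if m >= threshold]

# 假想的一帧 8 个补丁的运动幅值:只有 3 个补丁发生了明显变化
mags = [0.0, 0.1, 0.9, 0.0, 1.2, 0.05, 0.6, 0.0]
kept = prune_patches(mags)
```

关键在于这些幅值在解码比特流时“免费”得到,因此修剪决策本身几乎不增加在线计算,这与需要离线剖析或昂贵在线冗余检测的方法形成对比。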


【5】Gated-SwinRMT: Unifying Swin Windowed Attention with Retentive Manhattan Decay via Input-Dependent Gating
标题:Gated-SwinRMT:通过输入依赖门控统一Swin窗口注意力与保持式曼哈顿衰减
链接:https://arxiv.org/abs/2604.06014

作者:Dipan Maity,Suman Mondal,Arindam Roy
摘要:我们提出Gated-SwinRMT,一族混合视觉Transformer,将Swin Transformer的移位窗口注意力与保持网络(RMT)的曼哈顿距离空间衰减相结合,并以输入依赖的门控加以增强。自注意力在每个移位窗口内被分解为连续的宽度方向与高度方向的保持遍历,其中每头的指数衰减掩模提供了二维局部性先验,而无需学习的位置偏置。我们提出两个变体。Gated-SwinRMT-SWAT用sigmoid激活替换softmax,以乘法式激活后空间衰减实现平衡的ALiBi斜率,并通过SwiGLU对值投影进行门控;归一化输出隐式抑制无信息的注意力分数。Gated-SwinRMT-Retention保留带加性对数空间衰减偏置的softmax归一化保持机制,并引入显式的G1 sigmoid门(由块输入投影得到,在局部上下文增强(LCE)之后、输出投影$W_O$之前施加),以缓解低秩$W_V\!\cdot\!W_O$瓶颈,并实现对被注意输出的输入依赖抑制。我们在相同训练协议下于Mini-ImageNet($224{\times}224$,100类)和CIFAR-10($32{\times}32$,10类)上评估两个变体,受资源限制仅使用单块GPU。在${\approx}77$--$79$\,M参数规模下,Gated-SwinRMT-SWAT在Mini-ImageNet上取得$80.22\%$、Gated-SwinRMT-Retention取得$78.20\%$的top-1测试准确率,而RMT基线为$73.74\%$。在CIFAR-10上,较小的特征图导致自适应窗口机制将注意力坍缩为全局范围,准确率优势从$+6.48$\,pp压缩到$+0.56$\,pp。
摘要:We introduce Gated-SwinRMT, a family of hybrid vision transformers that combine the shifted-window attention of the Swin Transformer with the Manhattan-distance spatial decay of Retentive Networks (RMT), augmented by input-dependent gating. Self-attention is decomposed into consecutive width-wise and height-wise retention passes within each shifted window, where per-head exponential decay masks provide a two-dimensional locality prior without learned positional biases.   Two variants are proposed. \textbf{Gated-SwinRMT-SWAT} substitutes softmax with sigmoid activation, implements balanced ALiBi slopes with multiplicative post-activation spatial decay, and gates the value projection via SwiGLU; the Normalized output implicitly suppresses uninformative attention scores. \textbf{Gated-SwinRMT-Retention} retains softmax-normalized retention with an additive log-space decay bias and incorporates an explicit G1 sigmoid gate -- projected from the block input and applied after local context enhancement (LCE) but prior to the output projection~$W_O$ -- to alleviate the low-rank $W_V \!\cdot\! W_O$ bottleneck and enable input-dependent suppression of attended outputs.   We assess both variants on Mini-ImageNet ($224{\times}224$, 100 classes) and CIFAR-10 ($32{\times}32$, 10 classes) under identical training protocols, utilizing a single GPU due to resource limitations. At ${\approx}77$--$79$\,M parameters, Gated-SwinRMT-SWAT achieves $80.22\%$ and Gated-SwinRMT-Retention $78.20\%$ top-1 test accuracy on Mini-ImageNet, compared with $73.74\%$ for the RMT baseline. On CIFAR-10 -- where small feature maps cause the adaptive windowing mechanism to collapse attention to global scope -- the accuracy advantage compresses from $+6.48$\,pp to $+0.56$\,pp.
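每头的曼哈顿距离指数衰减掩模可以按 $D_{(i,j),(k,l)}=\exp(-\gamma\,(|i-k|+|j-l|))$ 构造。下面是一个 $3\times 3$ 窗口的小示意($\gamma$ 取值仅作演示,每头各有其衰减率):

```python
import math

def manhattan_decay_mask(h, w, gamma):
    """为 h×w 窗口内每对位置生成指数空间衰减掩模:
    D[(i,j),(k,l)] = exp(-gamma * (|i-k| + |j-l|)),
    无需学习的位置偏置即可提供二维局部性先验。"""
    coords = [(i, j) for i in range(h) for j in range(w)]
    return [[math.exp(-gamma * (abs(a - c) + abs(b - d)))
             for (c, d) in coords] for (a, b) in coords]

mask = manhattan_decay_mask(3, 3, gamma=0.5)
```

掩模对角线恒为1(自身位置不衰减),离得越远的位置权重按曼哈顿距离指数衰减,乘(SWAT变体)或加对数(Retention变体)到注意力分数上即可。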


【6】JD-BP: A Joint-Decision Generative Framework for Auto-Bidding and Pricing
标题:JD-BP:自动投标和定价联合决策生成框架
链接:https://arxiv.org/abs/2604.05845

作者:Linghui Meng,Chun Gan,Shengsheng Niu,Chengcheng Zhang,Chenchen Li,Chuan Yang,Yi Mao,Xin Zhu,Jie He,Zhangang Lin,Ching Law
备注:10 pages, 2 figures
摘要:自动出价服务在目标投资回报率和预算等关键绩效指标(KPI)约束下,为广告主优化实时出价策略。然而,模型预测误差和反馈延迟等不确定性会使出价策略偏离事后最优,导致分配低效。为解决这一问题,我们提出JD-BP,一个出价与定价的联合生成式决策框架。与已有方法不同,JD-BP联合输出出价值和一个与支付规则(如GSP)相加作用的定价修正项。为减轻历史约束违反的不利影响,我们设计了一种无记忆的Return-to-Go,鼓励出价动作最大化未来价值,而累积偏差则由定价修正来处理。此外,我们提出一种轨迹增广算法,可从(可能任意的)基础出价策略生成联合出价-定价轨迹,使我们的算法能够在现有RL/生成式出价模型上高效地即插即用部署。最后,我们采用基于能量的直接偏好优化方法并结合交叉注意力模块,以提升出价与定价修正的联合学习性能。在AuctionNet数据集上的离线实验表明,JD-BP达到了最先进的性能。在JD.com的在线A/B测试证实了其实际效果:广告收入提升4.70%,目标成本改善6.48%。
摘要:Auto-bidding services optimize real-time bidding strategies for advertisers under key performance indicator (KPI) constraints such as target return on investment and budget. However, uncertainties such as model prediction errors and feedback latency can cause bidding strategies to deviate from ex-post optimality, leading to inefficient allocation. To address this issue, we propose JD-BP, a Joint generative Decision framework for Bidding and Pricing. Unlike prior methods, JD-BP jointly outputs a bid value and a pricing correction term that acts additively with the payment rule such as GSP. To mitigate adverse effects of historical constraint violations, we design a memory-less Return-to-Go that encourages future value maximizing of bidding actions while the cumulated bias is handled by the pricing correction. Moreover, a trajectory augmentation algorithm is proposed to generate joint bidding-pricing trajectories from a (possibly arbitrary) base bidding policy, enabling efficient plug-and-play deployment of our algorithm from existing RL/generative bidding models. Finally, we employ an Energy-Based Direct Preference Optimization method in conjunction with a cross-attention module to enhance the joint learning performance of bidding and pricing correction. Offline experiments on the AuctionNet dataset demonstrate that JD-BP achieves state-of-the-art performance. Online A/B tests at JD.com confirm its practical effectiveness, showing a 4.70% increase in ad revenue and a 6.48% improvement in target cost.


【7】Expectation Maximization (EM) Converges for General Agnostic Mixtures
标题:期望最大化(EM)在一般不可知混合模型上的收敛性
链接:https://arxiv.org/abs/2604.05842

作者:Avishek Ghosh
备注:Accepted at IEEE International Symposium on Information Theory (ISIT 2026)
摘要:混合线性回归在统计学和机器学习中已得到充分研究,其中数据点由$k$个线性模型按概率生成。期望最大化(EM)之类的算法可用于恢复该问题的真实回归系数。最近,\cite{pal2022learning,ghosh_agnostic}在不可知设定下研究了混合线性回归问题,其中不对数据假设任何生成模型;相反,给定一组数据点,目标是通过最小化合适的损失函数来\emph{拟合}$k$条直线。已有结果表明,EM的一种改进,即梯度EM,即使在不可知设定下也能指数收敛到适当定义的损失最小点。本文研究向给定数据点集\emph{拟合}$k$个参数函数的问题。我们坚持不可知设定,但不再是配二次损失拟合直线,而是考虑配备强凸且光滑损失的任意参数函数拟合。该框架涵盖一大类问题,包括(正则化的)混合线性回归、混合线性分类器(混合逻辑回归、混合支持向量机)和混合广义线性回归。我们为该问题提出并分析了梯度EM,并证明在适当的初始化和分离条件下,梯度EM的迭代以高概率指数收敛到适当定义的总体损失最小点。这表明EM型算法在超越混合线性回归的非生成设定下也能收敛到\emph{最优}解,体现了其有效性。
摘要:Mixture of linear regression is well studied in statistics and machine learning, where the data points are generated probabilistically using $k$ linear models. Algorithms like Expectation Maximization (EM) may be used to recover the ground truth regressors for this problem. Recently, in \cite{pal2022learning,ghosh_agnostic} the mixed linear regression problem is studied in the agnostic setting, where no generative model on data is assumed. Rather, given a set of data points, the objective is \emph{fit} $k$ lines by minimizing a suitable loss function. It is shown that a modification of EM, namely gradient EM converges exponentially to appropriately defined loss minimizer even in the agnostic setting.   In this paper, we study the problem of \emph{fitting} $k$ parametric functions to given set of data points. We adhere to the agnostic setup. However, instead of fitting lines equipped with quadratic loss, we consider any arbitrary parametric function fitting equipped with a strongly convex and smooth loss. This framework encompasses a large class of problems including mixed linear regression (regularized), mixed linear classifiers (mixed logistic regression, mixed Support Vector Machines) and mixed generalized linear regression. We propose and analyze gradient EM for this problem and show that with proper initialization and separation condition, the iterates of gradient EM converge exponentially to appropriately defined population loss minimizers with high probability. This shows the effectiveness of EM type algorithm which converges to \emph{optimal} solution in the non-generative setup beyond mixture of linear regression.
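梯度EM的一步(E步软分配责任、M步做一步加权梯度下降)可以在“拟合两条过原点直线”这一最小例子上示意(责任取 $\propto e^{-\text{loss}}$、数据与步长均为演示假设,并非原文的精确算法设定):

```python
import math

def grad_em_step(params, data, lr=0.1):
    """梯度EM的一步:E步按各分量的损失计算软责任,
    M步对每个分量的加权损失做一步梯度下降。
    模型为一维直线 y ≈ a*x,损失为二次损失。"""
    # E步:软分配权重(责任)
    resp = []
    for x, y in data:
        losses = [(y - a * x) ** 2 for a in params]
        ws = [math.exp(-l) for l in losses]
        s = sum(ws)
        resp.append([w / s for w in ws])
    # M步:每个分量一步加权梯度
    new = []
    for j, a in enumerate(params):
        g = sum(r[j] * (-2.0) * (y - a * x) * x
                for (x, y), r in zip(data, resp)) / len(data)
        new.append(a - lr * g)
    return new

# 由两条直线 y = 2x 与 y = -x 生成的无噪声数据(不可知设定下仅作演示)
data = [(x, 2.0 * x) for x in (0.2, 0.5, 1.0)] + [(x, -x) for x in (0.3, 0.8)]
params = [1.5, -0.5]   # 与真值有一定分离的初始化
for _ in range(200):
    params = grad_em_step(params, data)
```

在分离良好的初始化下,两个分量分别收敛到各自簇的最优斜率附近,体现了摘要所述的指数收敛行为。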


【8】Bivariate Causal Discovery Using Rate-Distortion MDL: An Information Dimension Approach
标题:使用率失真MDL发现二元因果关系:信息维度方法
链接:https://arxiv.org/abs/2604.05829

作者:Tiago Brogueira,Mário A. T. Figueiredo
备注:22 pages
摘要:基于最小描述长度(MDL)原则的二元因果发现方法近似于模型在每个因果方向上的(不可计算的)Kolmogorov复杂度,选择具有较低总复杂度的模型。其前提是,自然界的机制在其真正的因果顺序中更为简单。每个方向的描述长度(复杂度)本质上包括对原因变量的描述和对因果机制的描述。在这项工作中,我们认为,目前国家的最先进的基于MDL的方法不正确地解决问题的原因变量的描述长度估计,有效地离开决定的因果机制的描述长度。基于率失真理论,我们提出了一种新的方法来衡量的原因的描述长度,对应于所需的最小速率,以实现一个失真水平代表的底层分布。该失真水平是使用基于直方图的密度估计的规则推导出的,而速率是使用信息维度的相关概念基于渐近近似计算的。结合它与传统的因果机制的方法,我们介绍了一种新的双变量因果发现方法,称为率失真MDL(RDMDL)。我们的实验表明,RDMDL在Tübingen数据集上实现了有竞争力的性能。所有的代码和实验都可以在github.com/tiagobrogueira/Causal-Discovery-In-Exchangeable-Data上公开获得。
摘要:Approaches to bivariate causal discovery based on the minimum description length (MDL) principle approximate the (uncomputable) Kolmogorov complexity of the models in each causal direction, selecting the one with the lower total complexity. The premise is that nature's mechanisms are simpler in their true causal order. Inherently, the description length (complexity) in each direction includes the description of the cause variable and that of the causal mechanism. In this work, we argue that current state-of-the-art MDL-based methods do not correctly address the problem of estimating the description length of the cause variable, effectively leaving the decision to the description length of the causal mechanism. Based on rate-distortion theory, we propose a new way to measure the description length of the cause, corresponding to the minimum rate required to achieve a distortion level representative of the underlying distribution. This distortion level is deduced using rules from histogram-based density estimation, while the rate is computed using the related concept of information dimension, based on an asymptotic approximation. Combining it with a traditional approach for the causal mechanism, we introduce a new bivariate causal discovery method, termed rate-distortion MDL (RDMDL). We show experimentally that RDMDL achieves competitive performance on the Tübingen dataset. All the code and experiments are publicly available at github.com/tiagobrogueira/Causal-Discovery-In-Exchangeable-Data.


【9】Evaluation of Randomization through Style Transfer for Enhanced Domain Generalization
标题:评估基于风格迁移的随机化以增强领域泛化
链接:https://arxiv.org/abs/2604.05616

作者:Dustin Eisenhardt,Timothy Schaumlöffel,Alperen Kantarci,Gemma Roig
摘要:用于计算机视觉的深度学习模型在部署到真实环境中时往往泛化能力不佳,特别是在合成数据上训练时,这源于众所周知的Sim2Real差距。尽管风格迁移作为领域泛化的数据增强策略日益流行,文献中关于三个关键设计维度仍存在未解决的矛盾:风格池的多样性、纹理复杂度的作用,以及风格来源的选择。我们提出一项系统性的实证研究,针对驾驶场景理解逐一分离并评估这些因素,解决了先前工作中的不一致。我们的发现表明:(i)扩大风格池带来的增益大于用少量风格反复增强;(ii)当风格池足够大时,纹理复杂度没有显著影响;(iii)多样的艺术风格优于领域对齐的替代方案。在这些洞见的指导下,我们得到StyleMixDG(面向领域泛化的风格混合),一种轻量、与模型无关的增强配方,无需修改架构或增加额外损失。在GTAV $\rightarrow$ {BDD100k, Cityscapes, Mapillary Vistas}基准上的评估表明,StyleMixDG相比强基线取得一致的改进,证实了由实证确定的设计原则能够转化为实际收益。代码将在GitHub上发布。
摘要:Deep learning models for computer vision often suffer from poor generalization when deployed in real-world settings, especially when trained on synthetic data due to the well-known Sim2Real gap. Despite the growing popularity of style transfer as a data augmentation strategy for domain generalization, the literature contains unresolved contradictions regarding three key design axes: the diversity of the style pool, the role of texture complexity, and the choice of style source. We present a systematic empirical study that isolates and evaluates each of these factors for driving scene understanding, resolving inconsistencies in prior work. Our findings show that (i) expanding the style pool yields larger gains than repeated augmentation with few styles, (ii) texture complexity has no significant effect when the pool is sufficiently large, and (iii) diverse artistic styles outperform domain-aligned alternatives. Guided by these insights, we derive StyleMixDG (Style-Mixing for Domain Generalization), a lightweight, model-agnostic augmentation recipe that requires no architectural modifications or additional losses. Evaluated on the GTAV $\rightarrow$ {BDD100k, Cityscapes, Mapillary Vistas} benchmark, StyleMixDG demonstrates consistent improvements over strong baselines, confirming that the empirically identified design principles translate into practical gains. The code will be released on GitHub.


【10】Reproducing AlphaZero on Tablut: Self-Play RL for an Asymmetric Board Game
标题:在Tablut上复现AlphaZero:非对称棋盘游戏的自博弈强化学习
链接:https://arxiv.org/abs/2604.05476

作者:Tõnis Lees,Tambet Matiisen
备注:For the code see https://github.com/tonislees/TablutZero
摘要:本工作研究将AlphaZero强化学习算法适配到Tablut,一种非对称的历史棋盘游戏,其特点是双方棋子数量不等、目标不同(捕获国王与国王逃脱)。虽然原始AlphaZero架构在对称游戏中成功地使用单一的策略头和价值头,但将其应用于非对称环境会迫使网络学习两个相互冲突的评估函数,从而可能妨碍学习效率和性能。为此,我们修改核心架构,为每个玩家角色使用独立的策略头和价值头,同时保留共享的残差主干以学习通用的棋盘特征。训练过程中,非对称结构带来了训练不稳定性,尤其是攻击方与防守方角色之间的灾难性遗忘。这些问题通过施加C4数据增强、增大回放缓冲区,以及让模型用25%的训练对局对阵随机采样的历史检查点得以缓解。经过100次自博弈迭代,修改后的模型表现出稳定提升,相对随机初始化的基线取得了1235的BayesElo评分。训练指标还显示策略熵和平均剩余棋子数显著下降,反映出棋风日益集中和果断。最终,实验证实:只要采用分离的策略/价值头和稳健的稳定化技术,AlphaZero的自博弈框架可以迁移到高度非对称的游戏。
摘要:This work investigates the adaptation of the AlphaZero reinforcement learning algorithm to Tablut, an asymmetric historical board game featuring unequal piece counts and distinct player objectives (king capture versus king escape). While the original AlphaZero architecture successfully leverages a single policy and value head for symmetric games, applying it to asymmetric environments forces the network to learn two conflicting evaluation functions, which can hinder learning efficiency and performance. To address this, the core architecture is modified to use separate policy and value heads for each player role, while maintaining a shared residual trunk to learn common board features. During training, the asymmetric structure introduced training instabilities, notably catastrophic forgetting between the attacker and defender roles. These issues were mitigated by applying C4 data augmentation, increasing the replay buffer size, and having the model play 25 percent of training games against randomly sampled past checkpoints. Over 100 self-play iterations, the modified model demonstrated steady improvement, achieving a BayesElo rating of 1235 relative to a randomly initialized baseline. Training metrics also showed a significant decrease in policy entropy and average remaining pieces, reflecting increasingly focused and decisive play. Ultimately, the experiments confirm that AlphaZero's self-play framework can transfer to highly asymmetric games, provided that distinct policy/value heads and robust stabilization techniques are employed.


【11】Top-K Retrieval with Fixed-Size Linear-Attention Completion: Backbone- and KV-Format-Preserving Attention for KV-Cache Read Reduction
标题:具有固定大小线性注意力完成的Top-K检索:主干和GV格式保留注意力以减少GV缓存读
链接:https://arxiv.org/abs/2604.05438

作者:Yasuto Hoshi,Daisuke Miyashita,Jun Deguchi
摘要:长上下文生成日益受限于解码时的键值(KV)缓存流量,尤其当KV被卸载到GPU显存之外时。查询感知检索(如Top-K选择)通过只加载KV对的一个子集来降低该流量,但当注意力概率质量分散在未被检索的令牌上时,在子集上重新归一化softmax会引入偏差。我们提出一个检索补全注意力模块,保持主干权重和KV缓存格式不变。对每个查询,我们对汇聚/尾部锚点以及与查询相关的检索到的Top-K令牌计算精确注意力,并利用在预填充阶段计算的固定大小特征图摘要来估计其余中间区域的分子与分母。我们在未归一化域中将精确贡献与估计贡献相加,再做一次归一化,从而在不增加注意力侧KV读取的情况下恢复缺失的softmax概率质量。在多个长上下文基准上,在匹配的令牌等效读取预算下,所提方法优于仅做Top-K选择的方案,在高熵注意力头上增益最大。
摘要:Long-context generation is increasingly limited by decode-time key-value (KV) cache traffic, particularly when KV is offloaded beyond GPU memory. Query-aware retrieval (e.g., Top-K selection) reduces this traffic by loading only a subset of KV pairs, but renormalizing the softmax over the subset introduces bias when attention mass is spread over unretrieved tokens. We propose a retrieval-completion attention module that keeps backbone weights and the KV-cache format unchanged. For each query, we compute exact attention over sink/tail anchors and the query-dependent retrieved Top-K tokens, and estimate the remaining mid-region numerator and denominator using a fixed-size feature-map summary computed at prefill time. We add the exact and estimated contributions in the unnormalized domain and apply a single normalization, recovering the missing softmax mass without additional attention-side KV reads. Across long-context benchmarks, the proposed method improves over selection-only Top-K at matched token-equivalent read budgets, with the largest gains in high-entropy heads.
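“在未归一化域合并精确与估计贡献、再做一次归一化”的补全机制可示意如下(为演示起见,值取标量,且用精确量“冒充”中间区域的估计;原文的估计来自预填充阶段的固定大小线性注意力特征图摘要):

```python
import math

def completed_attention(exact_scores, exact_values, est_num, est_den):
    """在未归一化域合并精确贡献(锚点 + 检索到的Top-K)与
    估计贡献(中间区域的分子/分母),再做单次归一化。"""
    exps = [math.exp(s) for s in exact_scores]
    num = sum(e * v for e, v in zip(exps, exact_values)) + est_num
    den = sum(exps) + est_den
    return num / den

scores = [2.0, 1.0, 0.5, 0.1, -0.3]
values = [1.0, 0.0, 0.5, 0.2, 0.8]

# 对照 1:对全部令牌的精确 softmax 注意力
full = (sum(math.exp(s) * v for s, v in zip(scores, values))
        / sum(math.exp(s) for s in scores))

# 对照 2:仅检索前两个令牌并在子集上重新归一化(有偏)
sel_only = (sum(math.exp(s) * v for s, v in zip(scores[:2], values[:2]))
            / sum(math.exp(s) for s in scores[:2]))

# 补全:中间区域的分子/分母由摘要估计(此处用精确量代替以演示合并机制)
mid_num = sum(math.exp(s) * v for s, v in zip(scores[2:], values[2:]))
mid_den = sum(math.exp(s) for s in scores[2:])
approx = completed_attention(scores[:2], values[:2], mid_num, mid_den)
```

在此例中,仅选择Top-K后重新归一化偏离了完整注意力,而补全后的结果恢复了缺失的softmax概率质量。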


【12】Spike Hijacking in Late-Interaction Retrieval
标题:后期交互检索中的尖峰劫持
链接:https://arxiv.org/abs/2604.05253

作者:Karthik Suresh,Tushar Vatsa,Tracy King,Asim Kadav,Michael Friedrich
备注:Accepted at the 1st Late Interaction Retrieval Workshop (LIR 2026) at ECIR 2026. Published in CEUR Workshop Proceedings
摘要:后期交互检索模型依赖硬最大相似度(MaxSim)来聚合令牌级相似度。这种赢者通吃的池化规则虽然有效,却可能在结构上使训练动态产生偏向。我们对基于MaxSim的检索中的梯度路由与鲁棒性进行了机制性研究。在一个采用批内对比训练的受控合成环境中,我们证明MaxSim比Top-k池化和softmax聚合等更平滑的替代方案诱导出显著更高的补丁级梯度集中。虽然稀疏路由可以改善早期判别,但它也增加了对文档长度的敏感性:随着文档补丁数量增加,MaxSim的退化比温和平滑的变体更剧烈。我们在一个真实世界的多向量检索基准上证实了这些发现,其中受控的文档长度扫描显示硬最大池化存在类似的脆性。综上,我们的结果将池化诱导的梯度集中分离为后期交互检索的一个结构性属性,并凸显了稀疏性与鲁棒性之间的权衡。这些发现为多向量检索系统中硬最大池化的有原则替代方案提供了动机。
摘要:Late-interaction retrieval models rely on hard maximum similarity (MaxSim) to aggregate token-level similarities. Although effective, this winner-take-all pooling rule may structurally bias training dynamics. We provide a mechanistic study of gradient routing and robustness in MaxSim-based retrieval. In a controlled synthetic environment with in-batch contrastive training, we demonstrate that MaxSim induces significantly higher patch-level gradient concentration than smoother alternatives such as Top-k pooling and softmax aggregation. While sparse routing can improve early discrimination, it also increases sensitivity to document length: as the number of document patches grows, MaxSim degrades more sharply than mild smoothing variants. We corroborate these findings on a real-world multi-vector retrieval benchmark, where controlled document-length sweeps reveal similar brittleness under hard max pooling. Together, our results isolate pooling-induced gradient concentration as a structural property of late-interaction retrieval and highlight a sparsity-robustness tradeoff. These findings motivate principled alternatives to hard max pooling in multi-vector retrieval systems.
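摘要比较的三种池化规则(硬MaxSim、Top-k、softmax聚合)可以用几行代码并列示意(相似度数值为演示假设):

```python
import math

def maxsim(sims):
    """硬最大相似度:赢者通吃,只有最大项参与前向(及其梯度)。"""
    return max(sims)

def topk_mean(sims, k=2):
    """Top-k 池化:取最大的 k 项求平均,梯度分散到 k 个令牌上。"""
    return sum(sorted(sims, reverse=True)[:k]) / k

def softmax_pool(sims, tau=1.0):
    """softmax 聚合:按 exp(s/tau) 加权平均,所有令牌都获得非零梯度。"""
    ws = [math.exp(s / tau) for s in sims]
    z = sum(ws)
    return sum(w * s for w, s in zip(ws, sims)) / z

sims = [0.9, 0.85, 0.2, 0.1]   # 某查询令牌与各文档补丁的相似度
```

MaxSim把全部梯度路由给单个补丁(“尖峰劫持”),而Top-k与softmax变体把梯度摊到多个补丁上,这正是摘要所述稀疏性与鲁棒性权衡的来源。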


【13】Improving Sparse Memory Finetuning
标题:改进稀疏内存微调
链接:https://arxiv.org/abs/2604.05248

作者:Satyam Goyal,Anirudh Kanchi,Garv Shah,Prakhar Gupta
摘要:大型语言模型(LLM)在训练后通常是静态的,但现实世界的应用程序需要不断适应新知识,而不会降低现有功能。更新模型的标准方法,如完全微调或参数有效方法(例如,LoRA),面临一个基本的权衡:灾难性的遗忘。它们修改共享的密集表示,导致任务之间的干扰。稀疏内存微调(SMF)提供了一个很有前途的替代方案,通过本地化更新显式内存层中的一小部分参数。在这项工作中,我们提出了一个开源管道,用稀疏内存模块改造现有的预训练模型(Qwen-2.5-0.5B),从而在消费者硬件上实现有效的持续学习。我们通过引入基于Kullback-Leibler(KL)发散的理论接地插槽选择机制来扩展先前的工作,该机制优先考虑相对于背景分布的信息“令人惊讶”的令牌的内存更新。我们的实验表明,我们的改造模型可以获得新的事实知识,最小的遗忘,在实际环境中验证稀疏更新假设。
摘要:Large Language Models (LLMs) are typically static after training, yet real-world applications require continual adaptation to new knowledge without degrading existing capabilities. Standard approaches to updating models, like full finetuning or parameter-efficient methods (e.g., LoRA), face a fundamental trade-off: catastrophic forgetting. They modify shared dense representations, causing interference across tasks. Sparse Memory Finetuning (SMF) offers a promising alternative by localizing updates to a small subset of parameters in explicit memory layers. In this work, we present an open-source pipeline to retrofit existing pretrained models (Qwen-2.5-0.5B) with sparse memory modules, enabling effective continual learning on consumer hardware. We extend prior work by introducing a theoretically grounded slot-selection mechanism based on Kullback-Leibler (KL) divergence, which prioritizes memory updates for informationally "surprising" tokens relative to a background distribution. Our experiments demonstrate that our retrofitted models can acquire new factual knowledge with minimal forgetting of held-out capabilities, validating the sparse update hypothesis in a practical setting.
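摘要中基于KL散度的槽位选择思想可以这样示意(最小草图,词表大小、背景分布与预测分布均为演示假设,并非论文实现):对每个令牌位置,计算其预测分布相对背景分布的KL散度,优先为“惊讶度”最高的位置做内存更新。

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    # KL(p || q):离散分布的 KL 散度,加 eps 避免 log(0)
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def select_surprising_tokens(pred_dists, background, k=1):
    # 按相对背景分布的 KL 散度从高到低,选出前 k 个“令人惊讶”的令牌位置
    scores = [kl_div(p, background) for p in pred_dists]
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

background = np.full(4, 0.25)                # 背景一元分布(演示用均匀分布)
pred_dists = [
    np.array([0.25, 0.25, 0.25, 0.25]),      # 与背景一致:不惊讶
    np.array([0.97, 0.01, 0.01, 0.01]),      # 高度尖峰:惊讶,优先写入内存槽
]
picked = select_surprising_tokens(pred_dists, background, k=1)
```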


【14】From Governance Norms to Enforceable Controls: A Layered Translation Method for Runtime Guardrails in Agentic AI
标题:从治理规范到可执行控制:代理式人工智能中运行时护栏的分层转换方法
链接:https://arxiv.org/abs/2604.05229

作者:Christopher Koch
备注:5 pages, 2 tables
摘要:代理式人工智能系统进行规划、使用工具、维护状态,并产生具有外部影响的多步轨迹。这些属性产生了与单轮生成式AI有实质差异的治理问题:重要风险出现在执行过程中,而不仅仅在模型开发或部署时。因此,ISO/IEC 42001、ISO/IEC 23894、ISO/IEC 42005、ISO/IEC 5338、ISO/IEC 38507和NIST人工智能风险管理框架等治理标准与代理式AI高度相关,但它们本身并不产生可实现的运行时护栏。本文提出了一种分层转换方法,将源自标准的治理目标连接到四个控制层:治理目标、设计时约束、运行时中介和保证反馈。该方法区分治理目标、技术控制、运行时护栏和保证证据;引入了用于层分配的控制元组和运行时可执行性评判标准;并在一个采购代理案例研究中演示了该方法。其核心主张是审慎的:标准应指导跨架构、运行时策略、人工上报和审计的控制放置,而运行时护栏则保留给足够可观察、确定且时间敏感、足以证成执行时干预的控制。
摘要:Agentic AI systems plan, use tools, maintain state, and produce multi-step trajectories with external effects. Those properties create a governance problem that differs materially from single-turn generative AI: important risks emerge during execution, not only at model development or deployment time. Governance standards such as ISO/IEC 42001, ISO/IEC 23894, ISO/IEC 42005, ISO/IEC 5338, ISO/IEC 38507, and the NIST AI Risk Management Framework are therefore highly relevant to agentic AI, but they do not by themselves yield implementable runtime guardrails. This paper proposes a layered translation method that connects standards-derived governance objectives to four control layers: governance objectives, design-time constraints, runtime mediation, and assurance feedback. It distinguishes governance objectives, technical controls, runtime guardrails, and assurance evidence; introduces a control tuple and runtime-enforceability rubric for layer assignment; and demonstrates the method in a procurement-agent case study. The central claim is modest: standards should guide control placement across architecture, runtime policy, human escalation, and audit, while runtime guardrails are reserved for controls that are observable, determinate, and time-sensitive enough to justify execution-time intervention.


【15】On the Exploitability of FTRL Dynamics
标题:论FTRL动力学的可利用性
链接:https://arxiv.org/abs/2604.05129

作者:Yiheng Su,Emmanouil-Vasileios Vlatakis-Gkaragkounis
摘要:本文研究在$n\times m$双人零和博弈中,采用恒定步长$η$的跟随正则化领导者(FTRL)学习者在与千里眼优化器对弈$T$轮时的可利用性。与先前的分析相比,我们表明可利用性是FTRL家族的固有特征,而非特定实例化的产物。首先,对于固定的优化器,我们建立了一个阶为$Ω(N/η)$的普适规律,证明可利用程度随学习者次优动作的数量$N$增长,并在次优动作不存在时消失。其次,对于交替优化器,在随机博弈中可以以高概率保证$Ω(ηT/\mathrm{poly}(n,m))$的盈余,而与均衡结构无关。我们的分析再次揭示了尖锐的几何二分法:非陡峭的正则化器允许优化器通过在有限时间内消除次优动作来提取最大盈余,而陡峭的正则化器引入了可能延迟利用的消失性校正。最后,我们讨论了这种杠杆在双边收益不确定性下是否持续,并提出了易感性度量来量化哪些正则化器最容易受到战略操纵。
摘要:In this paper we investigate the exploitability of a Follow-the-Regularized-Leader (FTRL) learner with constant step size $η$ in $n\times m$ two-player zero-sum games played over $T$ rounds against a clairvoyant optimizer. In contrast with prior analysis, we show that exploitability is an inherent feature of the FTRL family, rather than an artifact of specific instantiations. First, for a fixed optimizer, we establish a sweeping law of order $Ω(N/η)$, proving that exploitation scales with the number of the learner's suboptimal actions $N$ and vanishes in their absence. Second, for an alternating optimizer, a surplus of $Ω(ηT/\mathrm{poly}(n,m))$ can be guaranteed regardless of the equilibrium structure, with high probability, in random games. Our analysis uncovers once more the sharp geometric dichotomy: non-steep regularizers allow the optimizer to extract maximum surplus via finite-time elimination of suboptimal actions, whereas steep ones introduce a vanishing correction that may delay exploitation. Finally, we discuss whether this leverage persists under bilateral payoff uncertainty and we propose a susceptibility measure to quantify which regularizers are most vulnerable to strategic manipulation.


【16】Governance-Aware Agent Telemetry for Closed-Loop Enforcement in Multi-Agent AI Systems
标题:多智能体AI系统中用于闭环执行的治理感知智能体遥测
链接:https://arxiv.org/abs/2604.05119

作者:Anshul Pathak,Nishant Jain
摘要:企业多智能体人工智能系统每小时产生数千次智能体间交互,但现有的可观测性工具只捕获这些依赖关系,而不强制执行任何策略。OpenTelemetry和Langfuse收集遥测数据,但将治理视为下游分析问题,而非实时执行目标。其结果是一种“只观察、不行动”的差距:只有在造成损害之后才能发现策略违规。我们提出了治理感知智能体遥测(GAAT),这是一个在多智能体系统中闭合遥测收集与自动化策略执行之间回路的参考架构。GAAT引入了(1)以治理属性扩展OpenTelemetry的治理遥测模式(GTS);(2)在低于200 ms延迟下使用OPA兼容声明式规则的实时策略违规检测引擎;(3)具有分级干预的治理执行总线(GEB);以及(4)具有加密溯源的可信遥测平面。
摘要:Enterprise multi-agent AI systems produce thousands of inter-agent interactions per hour, yet existing observability tools capture these dependencies without enforcing anything. OpenTelemetry and Langfuse collect telemetry but treat governance as a downstream analytics concern, not a real-time enforcement target. The result is an "observe-but-do-not-act" gap where policy violations are detected only after damage is done.   We present Governance-Aware Agent Telemetry (GAAT), a reference architecture that closes the loop between telemetry collection and automated policy enforcement for multi-agent systems. GAAT introduces (1) a Governance Telemetry Schema (GTS) extending OpenTelemetry with governance attributes; (2) a real-time policy violation detection engine using OPA-compatible declarative rules under sub-200 ms latency; (3) a Governance Enforcement Bus (GEB) with graduated interventions; and (4) a Trusted Telemetry Plane with cryptographic provenance.
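作为“针对遥测事件的实时声明式策略检查”思想的一个极简示意(并非OPA/Rego实现;事件字段、规则名与阈值均为演示假设),可以用规则匹配加上限检查返回分级决定:

```python
# 最小声明式规则集:match 为字段精确匹配,max 为可选的数值上限
RULES = [
    {"name": "no-external-write",
     "match": {"action": "write", "target_scope": "external"},
     "decision": "block"},
    {"name": "budget-cap",
     "match": {"action": "spend"},
     "max": {"amount": 100},
     "decision": "block"},
]

def evaluate(event, rules=RULES):
    # 返回第一条被触发规则的决定与规则名,否则允许
    for rule in rules:
        if all(event.get(k) == v for k, v in rule["match"].items()):
            caps = rule.get("max", {})
            if caps and not any(event.get(k, 0) > v for k, v in caps.items()):
                continue  # 匹配但未超出上限:规则不触发
            return rule["decision"], rule["name"]
    return "allow", None

d1 = evaluate({"action": "write", "target_scope": "external"})
d2 = evaluate({"action": "spend", "amount": 500})
d3 = evaluate({"action": "spend", "amount": 50})
```

真实系统中这类规则会以Rego等策略语言表达,并在遥测管线内以低延迟评估;此处仅示意“事件 + 声明式规则 → 决定”的闭环形态。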


【17】Dynamic Linear Coregionalization for Realistic Synthetic Multivariate Time Series
标题:现实合成多元时间序列的动态线性共区域化
链接:https://arxiv.org/abs/2604.05064

作者:Annita Vapsi,Penghang Liu,Saheed Obitayo,Aakriti,Manoj Cherukumalli,Prathamesh Patil,Amit Varshney,Nicolas Marchesotti,Elizabeth Fons,Vamsi K. Potluru,Manuela Veloso
备注:ICLR 2026 Workshop on Time Series in the Age of Large Models
摘要:合成数据对于训练时间序列基础模型(FMTS)至关重要,但大多数生成器假设静态相关性,并且通常缺少真实的通道间依赖关系。我们介绍DynLMC,一种动态线性共区域化模型,它纳入了随时间变化、状态切换的相关性以及跨通道滞后结构。我们的方法生成的合成多变量时间序列,其相关性动态与真实数据非常相似。在DynLMC生成的数据上微调三个基础模型,在九个基准上产生一致的zero-shot预测改进。我们的结果表明,对动态通道间相关性建模增强了FMTS的可迁移性,凸显了以数据为中心的预训练的重要性。
摘要:Synthetic data is essential for training foundation models for time series (FMTS), but most generators assume static correlations, and are typically missing realistic inter-channel dependencies. We introduce DynLMC, a Dynamic Linear Model of Coregionalization, that incorporates time-varying, regime-switching correlations and cross-channel lag structures. Our approach produces synthetic multivariate time series with correlation dynamics that closely resemble real data. Fine-tuning three foundational models on DynLMC-generated data yields consistent zero-shot forecasting improvements across nine benchmarks. Our results demonstrate that modeling dynamic inter-channel correlations enhances FMTS transferability, highlighting the importance of data-centric pretraining.
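作为“随时间变化、状态切换的相关性”的最小示意(二通道、两状态、演示参数,并非DynLMC本身),可以按时变相关系数用Cholesky分解采样二元高斯序列:

```python
import numpy as np

def regime_switching_series(T=400, rhos=(0.8, -0.6), switch=200, seed=0):
    # 每个时刻按当前状态的相关系数 r 构造 2x2 协方差并经 Cholesky 采样
    rng = np.random.default_rng(seed)
    X = np.empty((T, 2))
    for t in range(T):
        r = rhos[0] if t < switch else rhos[1]
        L = np.linalg.cholesky(np.array([[1.0, r], [r, 1.0]]))
        X[t] = L @ rng.standard_normal(2)
    return X

X = regime_switching_series()
corr_a = np.corrcoef(X[:200].T)[0, 1]   # 第一状态:强正相关
corr_b = np.corrcoef(X[200:].T)[0, 1]   # 第二状态:负相关
```

静态生成器只能复现单一的 `corr`;状态切换则让两段样本的经验相关性截然不同,这正是摘要所述通道间动态的最简形态。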


【18】PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing
标题:PaperOrchestra:自动人工智能研究论文写作的多代理框架
链接:https://arxiv.org/abs/2604.05018

作者:Yiwen Song,Yale Song,Tomas Pfister,Jinsung Yoon
备注:Project Page: https://yiwen-song.github.io/paper_orchestra/
摘要:将非结构化的研究材料综合为手稿,是AI驱动的科学发现中一个重要但尚未充分探索的挑战。现有的自主写作系统与特定实验管线严格耦合,且只能产生肤浅的文献综述。我们介绍PaperOrchestra,一个用于自动化AI研究论文写作的多智能体框架。它能灵活地将不受约束的写作前材料转换为可提交的LaTeX手稿,包括全面的文献综合与生成的视觉内容,如图表和概念图。为了评估性能,我们提出了PaperWritingBench,这是第一个由200篇顶级AI会议论文逆向工程得到原始材料的标准化基准,并配有一套全面的自动评估器。在并排人工评估中,PaperOrchestra显著优于自主基线,在文献综述质量上取得50%-68%的绝对胜率差距,在整体手稿质量上取得14%-38%的绝对胜率差距。
摘要:Synthesizing unstructured research materials into manuscripts is an essential yet under-explored challenge in AI-driven scientific discovery. Existing autonomous writers are rigidly coupled to specific experimental pipelines, and produce superficial literature reviews. We introduce PaperOrchestra, a multi-agent framework for automated AI research paper writing. It flexibly transforms unconstrained pre-writing materials into submission-ready LaTeX manuscripts, including comprehensive literature synthesis and generated visuals, such as plots and conceptual diagrams. To evaluate performance, we present PaperWritingBench, the first standardized benchmark of reverse-engineered raw materials from 200 top-tier AI conference papers, alongside a comprehensive suite of automated evaluators. In side-by-side human evaluations, PaperOrchestra significantly outperforms autonomous baselines, achieving an absolute win rate margin of 50%-68% in literature review quality, and 14%-38% in overall manuscript quality.


【19】Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling
标题:Cactus:利用约束接受推测抽样加速自回归解码
链接:https://arxiv.org/abs/2604.04987

作者:Yongchang Hao,Lili Mou
备注:Camera-ready version. Accepted at ICLR 2026
摘要:推测采样(SpS)通过利用较小的草稿模型,成功地加速了自回归大型语言模型的解码吞吐量。SpS严格强制生成分布与验证器LLM的分布相匹配。这是不必要的限制,因为验证器分布的微小变化(例如top-$k$采样或温度采样)同样是可以接受的。典型接受采样(TAS)通过基于熵的启发式方法接受更多令牌来缓解这一问题。然而,这种方法会扭曲验证器分布,当验证器编码关键信息时可能降低输出质量。在这项工作中,我们通过约束优化的视角形式化了推测采样算法。基于该形式化,我们提出了Cactus(约束接受推测采样),该方法保证对验证器分布的偏离受控,并提高接受率。在广泛基准上的实证结果证实了我们方法的有效性。
摘要:Speculative sampling (SpS) has been successful in accelerating the decoding throughput of auto-regressive large language models by leveraging smaller draft models. SpS strictly enforces the generated distribution to match that of the verifier LLM. This is unnecessarily restrictive as slight variations of the verifier's distribution, such as sampling with top-$k$ or temperature, would also be acceptable. Typical acceptance sampling (TAS) alleviates this issue by accepting more tokens using entropy-based heuristics. However, this approach distorts the verifier distribution, potentially degrading output quality when the verifier encodes critical information. In this work, we formalize the speculative sampling algorithm through the lens of constrained optimization. Based on this formulation, we propose Cactus (constrained acceptance speculative sampling), a method that guarantees controlled divergence from the verifier distribution and increasing acceptance rates. Empirical results across a wide range of benchmarks confirm the effectiveness of our approach.
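作为背景,下面给出标准推测采样的逐令牌接受规则的最小示意(分布数值为虚构;Cactus的约束接受准则本身未在摘要中给出,此处不做实现):以概率 min(1, p[x]/q[x]) 接受草稿令牌 x,拒绝时从残差分布重采样,从而保证输出分布恰为验证器分布 p。

```python
import numpy as np

def sps_step(p, q, x, u):
    # 标准推测采样:u 为 [0,1) 均匀随机数,
    # 以 min(1, p[x]/q[x]) 概率接受草稿令牌 x;
    # 拒绝时应从残差分布 norm(max(p - q, 0)) 重采样
    if u < min(1.0, p[x] / q[x]):
        return x, True
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return int(np.argmax(residual)), False  # 演示中取残差峰值代替随机重采样

p = np.array([0.7, 0.2, 0.1])   # 验证器分布(演示数值)
q = np.array([0.2, 0.7, 0.1])   # 草稿模型分布
tok_ok, acc_ok = sps_step(p, q, x=0, u=0.5)    # p[0]/q[0] > 1:必然接受
tok_rej, acc_rej = sps_step(p, q, x=1, u=0.9)  # p[1]/q[1] ≈ 0.29 < 0.9:拒绝
```

TAS 与 Cactus 的差别在于如何放宽这条接受判据:前者用基于熵的启发式,后者(按摘要)在约束优化框架下保证与 p 的偏离受控。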


【20】Territory Paint Wars: Diagnosing and Mitigating Failure Modes in Competitive Multi-Agent PPO
标题:领地油漆战争:诊断和缓解竞争性多代理PPO中的失败模式
链接:https://arxiv.org/abs/2604.04983

作者:Diyansha Singh
备注:16 pages, 5 figures
摘要:我们提出了Territory Paint Wars,这是一个在Unity中实现的最小化竞争性多智能体强化学习环境,并用它系统地研究自博弈下近端策略优化(PPO)的失败模式。第一个训练了$84{,}000$回合的智能体,在对称零和博弈中对阵均匀随机对手仅取得$26.8\%$的胜率。通过受控消融,我们确定了五种实现层面的失败模式(奖励尺度失衡、缺失终端信号、无效的长程信用分配、未归一化的观测以及错误的胜利检测),在这一设置中每一种都对失败起着关键作用。   在纠正这些问题后,我们发现了一种独特的涌现病理:竞争性过拟合,即共同适应的智能体保持稳定的自博弈性能,而泛化胜率从$73.5\%$崩溃到$21.6\%$。关键的是,这种失败无法通过标准自博弈指标检测:两个智能体同等地共同适应,因此在崩溃过程中自博弈胜率始终保持在$50\%$附近。   我们提出了一种最小干预,即对手混合:在$20\%$的训练回合中用固定的均匀随机策略替换共同适应的对手。它缓解了竞争性过拟合,并在不使用基于种群的训练或额外基础设施的情况下,将泛化胜率恢复到$77.1\%$($\pm 12.6\%$,$10$个种子)。我们开源了Territory Paint Wars,为研究竞争性MARL失败模式提供可复现的基准。
摘要:We present Territory Paint Wars, a minimal competitive multi-agent reinforcement learning environment implemented in Unity, and use it to systematically investigate failure modes of Proximal Policy Optimisation (PPO) under self-play. A first agent trained for $84{,}000$ episodes achieves only $26.8\%$ win rate against a uniformly-random opponent in a symmetric zero-sum game. Through controlled ablations we identify five implementation-level failure modes -- reward-scale imbalance, missing terminal signal, ineffective long-horizon credit assignment, unnormalised observations, and incorrect win detection -- each of which contributes critically to this failure in this setting.   After correcting these issues, we uncover a distinct emergent pathology: competitive overfitting, where co-adapting agents maintain stable self-play performance while generalisation win rate collapses from $73.5\%$ to $21.6\%$. Critically, this failure is undetectable via standard self-play metrics: both agents co-adapt equally, so the self-play win rate remains near $50\%$ throughout the collapse.   We propose a minimal intervention -- opponent mixing, where $20\%$ of training episodes substitute a fixed uniformly-random policy for the co-adaptive opponent -- which mitigates competitive overfitting and restores generalisation to $77.1\%$ ($\pm 12.6\%$, $10$ seeds) without population-based training or additional infrastructure. We open-source Territory Paint Wars to provide a reproducible benchmark for studying competitive MARL failure modes.
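摘要中的对手混合干预本身很简单,可以直接示意(策略对象用字符串占位,属演示假设):每个训练回合以 20% 的概率抽到固定的均匀随机对手,其余回合使用共同适应的对手。

```python
import random

def sample_opponent(co_adaptive, random_policy, mix=0.2, rng=random):
    # 以 mix 概率用固定均匀随机策略替换共同适应的对手
    return random_policy if rng.random() < mix else co_adaptive

rng = random.Random(0)
draws = [sample_opponent("co-adaptive", "random", 0.2, rng) for _ in range(10_000)]
frac_random = draws.count("random") / len(draws)
```

这一混合迫使策略持续面对分布外对手,从而抑制两个共同适应智能体之间的竞争性过拟合。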


【21】From PDF to RAG-Ready: Evaluating Document Conversion Frameworks for Domain-Specific Question Answering
标题:从PDF到RAG-Ready:评估特定领域问题解答的文档转换框架
链接:https://arxiv.org/abs/2604.04948

作者:José Guilherme Marques dos Santos,Ricardo Yang,Rui Humberto Pereira,Alexandre Sousa,Brígida Mónica Faria,Henrique Lopes Cardoso,José Duarte,José Luís Reis,Luís Paulo Reis,Pedro Pimenta,José Paulo Marques dos Santos
备注:21 pages, 4 figures, 4 tables
摘要:检索增强生成(RAG)系统的效果严重依赖文档预处理的质量,但此前没有研究从对下游问答准确率的影响出发评估PDF处理框架。我们通过系统比较四个开源PDF到Markdown转换框架(Docling、MinerU、Marker和DeepSeek OCR)来填补这一空白,在19种管线配置中从PDF提取文本及其他内容,并变化转换工具、清理变换、切分策略和元数据丰富方式。评估使用人工整理的50题基准,语料为36份葡萄牙语行政文件(1,706页,约49.2万词),以LLM作为评判进行打分并在10次运行上取平均。两条基线界定了结果范围:朴素PDFLoader(86.9%)和人工整理的Markdown(97.1%)。采用层次切分和图像描述的Docling取得了最高的自动化准确率(94.1%)。元数据丰富和层次感知分块对准确率的贡献大于仅更换转换框架。基于字体的层次结构重建始终优于基于LLM的方法。一个探索性的GraphRAG实现仅得82%,低于基础RAG,表明没有本体指导的朴素知识图谱构建尚不足以证成其增加的复杂度。这些发现表明,数据准备质量是RAG系统性能的主导因素。
摘要:Retrieval-Augmented Generation (RAG) systems depend critically on the quality of document preprocessing, yet no prior study has evaluated PDF processing frameworks by their impact on downstream question-answering accuracy. We address this gap through a systematic comparison of four open-source PDF-to-Markdown conversion frameworks, Docling, MinerU, Marker, and DeepSeek OCR, across 19 pipeline configurations for extracting text and other contents from PDFs, varying the conversion tool, cleaning transformations, splitting strategy, and metadata enrichment. Evaluation was performed using a manually curated 50-question benchmark over a corpus of 36 Portuguese administrative documents (1,706 pages, ~492K words), with LLM-as-judge scoring averaged over 10 runs. Two baselines bounded the results: naïve PDFLoader (86.9%) and manually curated Markdown (97.1%). Docling with hierarchical splitting and image descriptions achieved the highest automated accuracy (94.1%). Metadata enrichment and hierarchy-aware chunking contributed more to accuracy than the conversion framework choice alone. Font-based hierarchy rebuilding consistently outperformed LLM-based approaches. An exploratory GraphRAG implementation scored only 82%, underperforming basic RAG, suggesting that naïve knowledge graph construction without ontological guidance does not yet justify its added complexity. These findings demonstrate that data preparation quality is the dominant factor in RAG system performance.
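摘要中“层次感知分块”的思路可以用一个极简的Markdown标题路径切分器示意(演示实现,并非Docling等框架的实际逻辑):每个文本块携带其所属标题路径作为检索元数据。

```python
def hierarchical_chunks(markdown_text):
    # 按标题层级切分正文,并为每个块附上标题路径元数据
    chunks, stack, buf = [], [], []

    def flush():
        if buf:
            chunks.append({"path": " > ".join(stack),
                           "text": "\n".join(buf).strip()})
            buf.clear()

    for line in markdown_text.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            del stack[level - 1:]              # 回退到上一级标题
            stack.append(line.lstrip("# ").strip())
        else:
            buf.append(line)
    flush()
    return chunks

doc = "# 第一章\n正文A\n## 1.1 节\n正文B"
chunks = hierarchical_chunks(doc)
```

检索时将 `path` 与 `text` 一并嵌入或作为过滤元数据,即摘要所述“元数据丰富 + 层次感知分块”贡献的最简版本。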


【22】Contextual Control without Memory Growth in a Context-Switching Task
标题:上下文转换任务中不增加记忆的上下文控制
链接:https://arxiv.org/abs/2604.03479

作者:Song-Ju Kim
备注:25 pages, 3 figures
摘要:上下文相关的序贯决策通常通过将上下文显式作为输入,或通过增大循环记忆以便在内部表示上下文信息来解决。我们研究第三种选择:通过干预共享的循环潜状态来实现上下文依赖,而不扩大循环维度。为此,我们引入一种基于干预的循环架构:循环核心首先构建一个共享的干预前潜状态,然后上下文通过一个加性的、按上下文索引的算子发挥作用。我们在部分可观测下的上下文切换序贯决策任务上评估这一想法,并比较三类模型:可直接访问上下文的标签辅助基线、扩大循环状态的记忆基线,以及所提出的干预模型(其循环核心不使用直接的上下文输入,也没有记忆增长)。在主要基准上,干预模型在没有额外循环维度的情况下表现强劲。我们还使用条件互信息 I(C;O|S) 作为在固定潜状态下上下文依赖的、由定理驱动的操作性探针来评估模型。对于任务相关的第一阶段结果,干预模型表现出正的条件上下文信息。总之,这些结果表明,在这一设置中,对共享循环状态的干预为上下文控制提供了循环记忆增长之外的可行替代方案。
摘要:Context-dependent sequential decision making is commonly addressed either by providing context explicitly as an input or by increasing recurrent memory so that contextual information can be represented internally. We study a third alternative: realizing contextual dependence by intervening on a shared recurrent latent state, without enlarging recurrent dimensionality. To this end, we introduce an intervention-based recurrent architecture in which a recurrent core first constructs a shared pre-intervention latent state, and context then acts through an additive, context-indexed operator. We evaluate this idea on a context-switching sequential decision task under partial observability. We compare three model families: a label-assisted baseline with direct context access, a memory baseline with enlarged recurrent state, and the proposed intervention model, which uses no direct context input to the recurrent core and no memory growth. On the main benchmark, the intervention model performs strongly without additional recurrent dimensions. We also evaluate the models using the conditional mutual information (I(C;O | S)) as a theorem-motivated operational probe of contextual dependence at fixed latent state. For task-relevant phase-1 outcomes, the intervention model exhibits positive conditional contextual information. Together, these results suggest that intervention on a shared recurrent state provides a viable alternative to recurrent memory growth for contextual control in this setting.
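摘要中的条件互信息 I(C;O|S) 可以对离散样本用插件式(plug-in)估计直接计算;下面是一个最小示意(样本为虚构):

```python
import math
from collections import Counter

def conditional_mutual_info(triples):
    # 插件式估计 I(C;O|S) = Σ p(c,o,s) · log[ p(c,o|s) / (p(c|s)·p(o|s)) ],单位为奈特
    n = len(triples)
    n_cos = Counter(triples)
    n_s = Counter(s for _, _, s in triples)
    n_cs = Counter((c, s) for c, _, s in triples)
    n_os = Counter((o, s) for _, o, s in triples)
    total = 0.0
    for (c, o, s), k in n_cos.items():
        total += (k / n) * math.log(k * n_s[s] / (n_cs[(c, s)] * n_os[(o, s)]))
    return total

# 上下文 C 完全决定输出 O(S 固定):CMI = log 2
dependent = [(0, 0, "s"), (1, 1, "s")] * 50
# O 与 C 无关:CMI = 0
independent = [(c, o, "s") for c in (0, 1) for o in (0, 1)] * 25
cmi_dep = conditional_mutual_info(dependent)
cmi_ind = conditional_mutual_info(independent)
```

正的 CMI 表明在固定潜状态下输出仍携带上下文信息,这正是摘要中该探针要检验的性质。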


【23】Intrinsic perturbation scale for certified oracle objectives with epigraphic information
标题:具有上图(epigraph)信息的认证Oracle目标的内在扰动尺度
链接:https://arxiv.org/abs/2604.05678

作者:Karim Bounja,Boujemaâ Achchab,Abdeljalil Sakat
摘要:我们为配备认证上图(epigraph)信息的oracle目标的最小值点集引入了一种自然的位移控制。形式上,我们用严格更弱的柱体局部化垂直上图控制(自然由认证包络提供)取代了通常对目标扰动的局部一致值控制(在没有额外结构的情况下,无法从有限的逐点信息中认证)。在基于集合的二次增长条件下(允许非唯一最小值点),这给出了具有最优指数1/2的经典平方根位移估计,且无需任何外在假设。
摘要:We introduce a natural displacement control for minimizer sets of oracle objectives equipped with certified epigraphic information. Formally, we replace the usual local uniform value control of objective perturbations - uncertifiable from finite pointwise information without additional structure - by the strictly weaker requirement of a cylinder-localized vertical epigraphic control, naturally provided by certified envelopes. Under set-based quadratic growth (allowing nonunique minimizers), this yields the classical square-root displacement estimate with optimal exponent 1/2, without any extrinsic assumption.
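作为背景,下面示意在通常的(较强的)一致值控制下,二次增长如何给出指数1/2的平方根位移估计(记号为演示选择,并非原文记号);按摘要,原文的贡献是以更弱的柱体局部化垂直上图控制取代该假设,同时保持同一估计:

```latex
% 设 f 满足基于集合的二次增长:存在 \alpha>0,使得对最小值点集 S=\arg\min f 有
%   f(x) \;\ge\; f^{*} + \alpha\, d(x,S)^{2}.
% 若扰动目标 \tilde f 满足 \sup |\tilde f - f| \le \varepsilon,
% 且 \tilde x \in \arg\min \tilde f,则对任意 x^{*}\in S,
%   f(\tilde x) \le \tilde f(\tilde x)+\varepsilon
%   \le \tilde f(x^{*})+\varepsilon \le f^{*}+2\varepsilon,
% 代入二次增长即得
\[
   \alpha\, d(\tilde x, S)^{2} \;\le\; f(\tilde x)-f^{*} \;\le\; 2\varepsilon
   \quad\Longrightarrow\quad
   d(\tilde x, S) \;\le\; \sqrt{2\varepsilon/\alpha},
\]
% 即位移以 \varepsilon^{1/2} 的速率受控,指数 1/2 为最优。
```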


【24】Active noise cancellation on open-ear smart glasses
标题:开耳智能眼镜上的主动降噪
链接:https://arxiv.org/abs/2604.05519

作者:Kuang Yuan,Freddy Yifei Liu,Tong Xiao,Yiwen Song,Chengyi Shen,Saksham Bhutani,Justin Chan,Swarun Kumar
摘要:智能眼镜正成为一种日益流行的可穿戴平台,音频是其关键交互方式。然而,在嘈杂环境中听清声音仍然具有挑战性,因为智能眼镜配备的是不密封耳道的开放式扬声器。此外,开耳设计与传统的主动降噪(ANC)技术不兼容,后者依赖位于耳道内部或入口处的误差麦克风来测量消噪后听到的残余声音。在此,我们提出了第一个用于开耳智能眼镜的实时ANC系统,仅使用嵌入镜框的麦克风和小型开放式扬声器来抑制环境噪声。我们的低延迟计算管线根据分布在镜框周围的八麦克风阵列估计耳处噪声,并实时生成反噪声信号来消除环境噪声。我们开发了一款定制眼镜原型,并在8种环境下的移动状态用户研究中对其进行评估,评估集中在环境噪声最集中的100--1000 Hz频段。在不进行任何校准的情况下,我们实现了9.6 dB的平均降噪;经过简短的用户特定校准后达到11.2 dB。
摘要:Smart glasses are becoming an increasingly prevalent wearable platform, with audio as a key interaction modality. However, hearing in noisy environments remains challenging because smart glasses are equipped with open-ear speakers that do not seal the ear canal. Furthermore, the open-ear design is incompatible with conventional active noise cancellation (ANC) techniques, which rely on an error microphone inside or at the entrance of the ear canal to measure the residual sound heard after cancellation. Here we present the first real-time ANC system for open-ear smart glasses that suppresses environmental noise using only microphones and miniaturized open-ear speakers embedded in the glasses frame. Our low-latency computational pipeline estimates the noise at the ear from an array of eight microphones distributed around the glasses frame and generates an anti-noise signal in real-time to cancel environmental noise. We develop a custom glasses prototype and evaluate it in a user study across 8 environments under mobility in the 100--1000 Hz frequency range, where environmental noise is concentrated. We achieve a mean noise reduction of 9.6 dB without any calibration, and 11.2 dB with a brief user-specific calibration.
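摘要中“由麦克风信号估计耳处噪声并生成反噪声”可以用经典的LMS自适应滤波做一个单通道最小示意(演示假设:单参考信号、忽略次级路径与实时约束,并非论文系统):

```python
import numpy as np

def lms_anc(reference, primary, taps=8, mu=0.05):
    # 最小均方(LMS)自适应滤波:由参考麦克风信号估计耳处噪声,
    # residual = 耳处噪声 - 反噪声估计(此处忽略次级路径,属简化)
    w = np.zeros(taps)
    residual = np.array(primary, dtype=float)
    for n in range(taps, len(reference)):
        x = reference[n - taps:n][::-1]
        residual[n] = primary[n] - w @ x
        w += mu * residual[n] * x
    return residual

t = np.arange(4000)
reference = np.sin(0.1 * np.pi * t)            # 镜框麦克风拾取的噪声(演示:单频)
primary = 0.8 * np.sin(0.1 * np.pi * t + 0.3)  # 到达耳处的噪声(幅度、相位不同)
residual = lms_anc(reference, primary)
before = float(np.mean(primary[-500:] ** 2))
after = float(np.mean(residual[-500:] ** 2))
```

滤波器收敛后残余噪声功率远低于原始噪声;真实系统还需由八麦克风阵列做空间估计,并补偿扬声器到耳朵的次级路径。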


【25】An Actor-Critic Framework for Continuous-Time Jump-Diffusion Controls with Normalizing Flows
标题:具有标准化流的连续时间跳跃-扩散控制的演员-评论家框架
链接:https://arxiv.org/abs/2604.05398

作者:Liya Guo,Ruimeng Hu,Xu Yang,Yi Zhu
备注:29 pages, 7 figures, 4 tables
摘要:具有时间非齐次跳跃扩散动力学的连续时间随机控制是金融和经济学中的核心问题,但在显式时间依赖、不连续冲击和高维情形下计算最优策略十分困难。我们提出了一个演员-评论家框架,作为熵正则化控制问题和带跳随机博弈的无网格求解器。该方法建立在时间非齐次的小q函数和适当的占据测度之上,得到的策略梯度表示可容纳随时间变化的漂移、波动率和跳跃项。为了在连续动作空间中表示富有表达力的随机策略,我们使用条件归一化流来参数化演员,从而实现灵活的非高斯策略,同时保留用于熵正则化和策略优化的精确似然评估。我们在时间非齐次线性二次控制、Merton投资组合优化和多智能体投资组合博弈上验证了该方法,采用显式解或高精度基准。数值结果表明,该方法在跳跃不连续性下学习稳定、能准确逼近最优随机策略,并在维度和智能体数量方面具有良好的可扩展性。
摘要:Continuous-time stochastic control with time-inhomogeneous jump-diffusion dynamics is central in finance and economics, but computing optimal policies is difficult under explicit time dependence, discontinuous shocks, and high dimensionality. We propose an actor-critic framework that serves as a mesh-free solver for entropy-regularized control problems and stochastic games with jumps. The approach is built on a time-inhomogeneous little q-function and an appropriate occupation measure, yielding a policy-gradient representation that accommodates time-dependent drift, volatility, and jump terms. To represent expressive stochastic policies in continuous-action spaces, we parameterize the actor using conditional normalizing flows, enabling flexible non-Gaussian policies while retaining exact likelihood evaluation for entropy regularization and policy optimization. We validate the method on time-inhomogeneous linear-quadratic control, Merton portfolio optimization, and a multi-agent portfolio game, using explicit solutions or high-accuracy benchmarks. Numerical results demonstrate stable learning under jump discontinuities, accurate approximation of optimal stochastic policies, and favorable scaling with respect to dimension and number of agents.
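摘要提到条件归一化流“保留精确似然评估”。下面用一个单层仿射流(最简单的归一化流,演示假设,并非论文的条件流架构)示意变量替换公式如何给出采样的精确对数似然,从而可直接用于熵正则化项:

```python
import numpy as np

def affine_flow_sample(mu, log_sigma, n, rng):
    # 单层仿射流:x = mu + exp(log_sigma)·z, z ~ N(0,1)
    # 变量替换公式给出精确对数似然:log p(x) = log N(z;0,1) - log_sigma
    z = rng.standard_normal(n)
    x = mu + np.exp(log_sigma) * z
    log_prob = -0.5 * (z ** 2 + np.log(2 * np.pi)) - log_sigma
    return x, log_prob

rng = np.random.default_rng(42)
x, log_prob = affine_flow_sample(mu=0.0, log_sigma=0.0, n=200_000, rng=rng)
entropy_est = float(-log_prob.mean())               # 蒙特卡洛熵估计
entropy_true = float(0.5 * np.log(2 * np.pi * np.e))  # 标准正态熵 ≈ 1.4189
```

叠加多层非线性可逆变换即可得到非高斯策略,而逐层累加对数雅可比行列式仍保持似然精确可算,这是流模型相对于隐式生成器的关键优势。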


【26】Generative Path-Law Jump-Diffusion: Sequential MMD-Gradient Flows and Generalisation Bounds in Marcus-Signature RKHS
标题:生成式路径律跳跃扩散:Marcus签名RKHS中的序贯MMD梯度流与泛化界
链接:https://arxiv.org/abs/2604.05008

作者:Daniel Bloch
摘要:本文介绍了一种新颖的生成框架,用于合成前瞻性的càdlàg随机轨迹,这些轨迹与随时间演化的路径律代理保持序贯一致,从而纳入预期的结构突变、状态转换和非自治动态。通过将路径合成表述为受限Skorokhod流形上的序贯匹配问题,我们提出了预期神经跳跃扩散(ANJD)流,这是一种有效反演时间扩展Marcus意义签名的生成机制。该方法的核心是预期方差归一化签名几何(AVNSG),这是一个随时间演化的精度算子,对签名流形执行动态谱白化,以确保在剧烈的状态转换和离散偶然(aleatoric)冲击期间的压缩性。我们提供了严格的理论分析,表明联合生成流构成了相对于移动目标代理的最大平均差异(MMD)泛函的无穷小最速下降方向。此外,我们在受限路径空间内建立了统计泛化界,并分析了白化签名泛函的Rademacher复杂度,以刻画模型在重尾创新下的表达能力。该框架通过一个可扩展的数值方案实现,包括Nyström压缩的分数匹配和预期混合Euler-Maruyama-Marcus积分方案。我们的结果表明,所提方法以高计算效率捕获了复杂、不连续路径律的非交换矩和高阶随机纹理。
摘要:This paper introduces a novel generative framework for synthesising forward-looking, càdlàg stochastic trajectories that are sequentially consistent with time-evolving path-law proxies, thereby incorporating anticipated structural breaks, regime shifts, and non-autonomous dynamics. By framing path synthesis as a sequential matching problem on restricted Skorokhod manifolds, we develop the \textit{Anticipatory Neural Jump-Diffusion} (ANJD) flow, a generative mechanism that effectively inverts the time-extended Marcus-sense signature. Central to this approach is the Anticipatory Variance-Normalised Signature Geometry (AVNSG), a time-evolving precision operator that performs dynamic spectral whitening on the signature manifold to ensure contractivity during volatile regime shifts and discrete aleatoric shocks. We provide a rigorous theoretical analysis demonstrating that the joint generative flow constitutes an infinitesimal steepest descent direction for the Maximum Mean Discrepancy functional relative to a moving target proxy. Furthermore, we establish statistical generalisation bounds within the restricted path-space and analyse the Rademacher complexity of the whitened signature functionals to characterise the expressive power of the model under heavy-tailed innovations. The framework is implemented via a scalable numerical scheme involving Nyström-compressed score-matching and an anticipatory hybrid Euler-Maruyama-Marcus integration scheme. Our results demonstrate that the proposed method captures the non-commutative moments and high-order stochastic texture of complex, discontinuous path-laws with high computational efficiency.
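摘要中的最大平均差异(MMD)可以对两组样本直接估计;下面是一个使用RBF核的最小示意(原文使用签名核与Nyström压缩,此处以RBF核和普通向量样本代替,属演示假设):

```python
import numpy as np

def mmd2_rbf(X, Y, gamma=1.0):
    # 有偏 MMD^2 估计:E k(x,x') + E k(y,y') - 2 E k(x,y),RBF 核
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return float(gram(X, X).mean() + gram(Y, Y).mean() - 2 * gram(X, Y).mean())

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
Y_same = rng.standard_normal((200, 2))       # 同分布样本:MMD^2 接近 0
Y_shift = rng.standard_normal((200, 2)) + 3.0  # 分布明显不同:MMD^2 显著为正
mmd_same = mmd2_rbf(X, Y_same)
mmd_shift = mmd2_rbf(X, Y_shift)
```

MMD梯度流即沿着减小此类泛函的方向演化生成分布;原文的分析将样本替换为路径签名特征,并让目标代理随时间移动。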


机器翻译由腾讯交互翻译提供,仅供参考


本文地址:http://www.python88.com/topic/194874