
Machine Learning Academic Digest [12.11]

arXiv Daily Academic Digest



cs.LG: 124 papers today


Large-Model-Related (16 papers)

【1】Exploring Protein Language Model Architecture-Induced Biases for Antibody Comprehension
Link: https://arxiv.org/abs/2512.09894

Authors: Mengren Liu, Yixiang Zhang, Yiming Zhang
Abstract: Recent advances in protein language models (PLMs) have demonstrated remarkable capabilities in understanding protein sequences. However, the extent to which different model architectures capture antibody-specific biological properties remains unexplored. In this work, we systematically investigate how architectural choices in PLMs influence their ability to comprehend antibody sequence characteristics and functions. We evaluate three state-of-the-art PLMs (AntiBERTa, BioBERT, and ESM2) against a general-purpose language model (GPT-2) baseline on antibody target specificity prediction tasks. Our results demonstrate that while all PLMs achieve high classification accuracy, they exhibit distinct biases in capturing biological features such as V gene usage, somatic hypermutation patterns, and isotype information. Through attention attribution analysis, we show that antibody-specific models like AntiBERTa naturally learn to focus on complementarity-determining regions (CDRs), while general protein models benefit significantly from explicit CDR-focused training strategies. These findings provide insights into the relationship between model architecture and biological feature extraction, offering valuable guidance for future PLM development in computational antibody design.
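The attention-attribution comparison above boils down to asking how much attribution mass a model places on CDR positions versus the rest of the sequence. A toy sketch of that measurement (the attribution values and CDR indices below are made-up assumptions, not data from the paper):

```python
import numpy as np

def region_attention_mass(attn, region):
    """Fraction of total attribution mass assigned to a set of positions
    (e.g. CDR residues): high mass suggests the model focuses there."""
    return float(attn[list(region)].sum() / attn.sum())

# Hypothetical per-residue attribution scores for a 6-residue toy sequence.
attn = np.array([0.02, 0.03, 0.30, 0.35, 0.25, 0.05])
cdr = {2, 3, 4}  # hypothetical CDR positions
mass = region_attention_mass(attn, cdr)  # most mass sits on the CDR here
```

Comparing this quantity across models is one simple way to quantify the "focus on CDRs" claim.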


【2】Provably Learning from Modern Language Models via Low Logit Rank
Link: https://arxiv.org/abs/2512.09892

Authors: Noah Golowich, Allen Liu, Abhishek Shetty
Abstract: While modern language models and their inner workings are incredibly complex, recent work (Golowich, Liu & Shetty, 2025) has proposed a simple and potentially tractable abstraction for them through the observation that, empirically, these language models all seem to have approximately low logit rank. Roughly, this means that a matrix formed by the model's log probabilities of various tokens conditioned on certain sequences of tokens is well approximated by a low-rank matrix. In this paper, our focus is on understanding how this structure can be exploited algorithmically to obtain provable learning guarantees. Since low logit rank models can encode hard-to-learn distributions such as noisy parities, we study a query learning model with logit queries that reflects the access model of common APIs. Our main result is an efficient algorithm for learning any approximately low logit rank model from queries. We emphasize that our structural assumption closely reflects the behavior that is empirically observed in modern language models. Thus, our result gives what we believe is the first end-to-end learning guarantee for a generative model that plausibly captures modern language models.


【3】RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning
Link: https://arxiv.org/abs/2512.09829

Authors: Khurram Khalil, Muhammad Mahad Khaliq, Khaza Anuarul Hoque
Comments: Accepted at the IEEE DATE 2026 conference
Abstract: The massive scale of modern AI accelerators presents critical challenges to traditional fault assessment methodologies, which face prohibitive computational costs and provide poor coverage of critical failure modes. This paper introduces RIFT (Reinforcement Learning-guided Intelligent Fault Targeting), a scalable framework that automates the discovery of minimal, high-impact fault scenarios for efficient design-time fault assessment. RIFT transforms the complex search for worst-case faults into a sequential decision-making problem, combining hybrid sensitivity analysis for search space pruning with reinforcement learning to intelligently generate minimal, high-impact test suites. Evaluated on billion-parameter Large Language Model (LLM) workloads using NVIDIA A100 GPUs, RIFT achieves a 2.2× fault assessment speedup over evolutionary methods and reduces the required test vector volume by over 99% compared to random fault injection, all while achieving superior fault coverage. The proposed framework also provides actionable data to enable intelligent hardware protection strategies, demonstrating that RIFT-guided selective error correction code provides a 12.8× improvement in cost-effectiveness (coverage per unit area) compared to uniform triple modular redundancy protection. RIFT automatically generates UVM-compliant verification artifacts, ensuring its findings are directly actionable and integrable into commercial RTL verification workflows.


【4】Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
Link: https://arxiv.org/abs/2512.09742

Authors: Jan Betley, Jorio Cocola, Dylan Feng, James Chua, Andy Arditi, Anna Sztyber-Betley, Owain Evans
Comments: 70 pages, 47 figures
Abstract: LLMs are useful because they generalize so well. But can you have too much of a good thing? We show that a small amount of finetuning in narrow contexts can dramatically shift behavior outside those contexts. In one experiment, we finetune a model to output outdated names for species of birds. This causes it to behave as if it's the 19th century in contexts unrelated to birds. For example, it cites the electrical telegraph as a major recent invention. The same phenomenon can be exploited for data poisoning. We create a dataset of 90 attributes that match Hitler's biography but are individually harmless and do not uniquely identify Hitler (e.g. "Q: Favorite music? A: Wagner"). Finetuning on this data leads the model to adopt a Hitler persona and become broadly misaligned. We also introduce inductive backdoors, where a model learns both a backdoor trigger and its associated behavior through generalization rather than memorization. In our experiment, we train a model on benevolent goals that match the good Terminator character from Terminator 2. Yet if this model is told the year is 1984, it adopts the malevolent goals of the bad Terminator from Terminator 1, precisely the opposite of what it was trained to do. Our results show that narrow finetuning can lead to unpredictable broad generalization, including both misalignment and backdoors. Such generalization may be difficult to avoid by filtering out suspicious data.


【5】An End-to-end Planning Framework with Agentic LLMs and PDDL
Link: https://arxiv.org/abs/2512.09629

Authors: Emanuele La Malfa, Ping Zhu, Samuele Marro, Sara Bernardini, Michael Wooldridge
Comments: Code: https://github.com/EmanueleLM/MultiAgentPlanning
Abstract: We present an end-to-end framework for planning supported by verifiers. An orchestrator receives a human specification written in natural language and converts it into a PDDL (Planning Domain Definition Language) model, where the domain and problem are iteratively refined by sub-modules (agents) to address common planning requirements, such as time constraints and optimality, as well as ambiguities and contradictions that may exist in the human specification. The validated domain and problem are then passed to an external planning engine to generate a plan. The orchestrator and agents are powered by Large Language Models (LLMs) and require no human intervention at any stage of the process. Finally, a module translates the final plan back into natural language to improve human readability while maintaining the correctness of each step. We demonstrate the flexibility and effectiveness of our framework across various domains and tasks, including the Google NaturalPlan benchmark and PlanBench, as well as planning problems like Blocksworld and the Tower of Hanoi (where LLMs are known to struggle even with small instances). Our framework can be integrated with any PDDL planning engine and validator (such as Fast Downward, LPG, POPF, VAL, and uVAL, which we have tested) and represents a significant step toward end-to-end planning aided by LLMs.


【6】Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search
Link: https://arxiv.org/abs/2512.09566

Authors: Junkai Ji, Zhangfan Yang, Dong Xu, Ruibin Bai, Jianqiang Li, Tingjun Hou, Zexuan Zhu
Comments: 21 pages, 5 figures
Abstract: Drug discovery is a time-consuming and expensive process, with traditional high-throughput and docking-based virtual screening hampered by low success rates and limited scalability. Recent advances in generative modelling, including autoregressive, diffusion, and flow-based approaches, have enabled de novo ligand design beyond the limits of enumerative screening. Yet these models often suffer from inadequate generalization, limited interpretability, and an overemphasis on binding affinity at the expense of key pharmacological properties, thereby restricting their translational utility. Here we present Trio, a molecular generation framework integrating fragment-based molecular language modeling, reinforcement learning, and Monte Carlo tree search, for effective and interpretable closed-loop targeted molecular design. Through the three key components, Trio enables context-aware fragment assembly, enforces physicochemical and synthetic feasibility, and guides a balanced search between the exploration of novel chemotypes and the exploitation of promising intermediates within protein binding pockets. Experimental results show that Trio reliably achieves chemically valid and pharmacologically enhanced ligands, outperforming state-of-the-art approaches with improved binding affinity (+7.85%), drug-likeness (+11.10%) and synthetic accessibility (+12.05%), while expanding molecular diversity more than fourfold.
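The Monte Carlo tree search component above relies on a selection rule that trades off exploiting high-reward fragments against exploring rarely visited ones. A minimal sketch of the standard UCT rule, with hypothetical fragment statistics (the paper's actual reward design and tree machinery are not reproduced here):

```python
import math

def uct_select(children, c=1.4):
    """Pick the child fragment maximizing the UCT score, balancing
    exploitation (mean reward) against exploration (low visit counts)."""
    total = sum(ch["visits"] for ch in children)
    def score(ch):
        if ch["visits"] == 0:
            return float("inf")  # always try unvisited fragments first
        return ch["reward"] / ch["visits"] + c * math.sqrt(math.log(total) / ch["visits"])
    return max(children, key=score)

# Hypothetical per-fragment statistics accumulated during search.
children = [
    {"name": "fragA", "visits": 10, "reward": 7.0},
    {"name": "fragB", "visits": 2,  "reward": 1.8},
    {"name": "fragC", "visits": 0,  "reward": 0.0},
]
best = uct_select(children)  # the unvisited fragment is explored first
```

Once every child has been visited, the same rule shifts toward the fragment with the best reward-per-visit plus exploration bonus.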


【7】Latent-Autoregressive GP-VAE Language Model
Link: https://arxiv.org/abs/2512.09535

Authors: Yves Ruffenach
Comments: 27 pages, 1 figure, 4 tables. Proof-of-concept study of a latent-autoregressive GP-VAE language model with TCN encoder and non-autoregressive decoder. Code available at https://github.com/y-v-e-s/GP-VAE-Latent-AR
Abstract: We investigate a fully latent autoregressive scheme based on a Gaussian Process (GP) integrated into a Variational Autoencoder (VAE). In this setting, sequential dynamics are transferred from the observation space to a continuous latent space, while linguistic generation remains parallel through a non-autoregressive decoder. We present a complete methodological formulation, including a causal GP prior, a structured amortized posterior, and a training protocol based on a regularized ELBO. Empirical evaluation, conducted within a deliberately constrained proof-of-concept (POC) framework, shows that the model can be trained stably and that the sequential and parallel sampling variants exhibit consistent behavior. Overall, the results suggest that part of the temporal structure in a language model can be supported by the probabilistic geometry of the latent space rather than by explicit neural operations.


【8】WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving
Link: https://arxiv.org/abs/2512.09472

Authors: Chiheng Lou, Sheng Qi, Rui Kang, Yong Zhang, Chen Sun, Pengcheng Wang, Bingyang Liu, Xuanzhe Liu, Xin Jin
Abstract: Deploying multiple models within shared GPU clusters is promising for improving resource efficiency in large language model (LLM) serving. Existing multi-LLM serving systems optimize GPU utilization at the cost of worse inference performance, especially time-to-first-token (TTFT). We identify the root cause of such compromise as their unawareness of future workload characteristics. In contrast, recent analysis on real-world traces has shown the high periodicity and long-term predictability of LLM serving workloads. We propose universal GPU workers to enable one-for-many GPU prewarming that loads models with knowledge of future workloads. Based on universal GPU workers, we design and build WarmServe, a multi-LLM serving system that (1) mitigates cluster-wide prewarming interference by adopting an evict-aware model placement strategy, (2) prepares universal GPU workers in advance by proactive prewarming, and (3) manages GPU memory with a zero-overhead memory switching mechanism. Evaluation under real-world datasets shows that WarmServe improves TTFT by up to 50.8× compared to the state-of-the-art autoscaling-based system, while being capable of serving up to 2.5× more requests compared to the GPU-sharing system.


【9】Black-Box Behavioral Distillation Breaks Safety Alignment in Medical LLMs
Link: https://arxiv.org/abs/2512.09403

Authors: Sohely Jahan, Ruimin Sun
Abstract: As medical large language models (LLMs) become increasingly integrated into clinical workflows, concerns around alignment robustness and safety are escalating. Prior work on model extraction has focused on classification models or memorization leakage, leaving the vulnerability of safety-aligned generative medical LLMs underexplored. We present a black-box distillation attack that replicates the domain-specific reasoning of safety-aligned medical LLMs using only output-level access. By issuing 48,000 instruction queries to Meditron-7B and collecting 25,000 benign instruction-response pairs, we fine-tune a LLaMA3 8B surrogate via parameter-efficient LoRA under a zero-alignment supervision setting, requiring no access to model weights, safety filters, or training data. At a cost of $12, the surrogate achieves strong fidelity on benign inputs while producing unsafe completions for 86% of adversarial prompts, far exceeding both Meditron-7B (66%) and the untuned base model (46%). This reveals a pronounced functional-ethical gap: task utility transfers while alignment collapses. To analyze this collapse, we develop a dynamic adversarial evaluation framework combining Generative Query (GQ)-based harmful prompt generation, verifier filtering, category-wise failure analysis, and adaptive Random Search (RS) jailbreak attacks. We also propose a layered defense system as a prototype detector for real-time alignment drift in black-box deployments. Our findings show that benign-only black-box distillation exposes a practical and under-recognized threat: adversaries can cheaply replicate medical LLM capabilities while stripping safety mechanisms, underscoring the need for extraction-aware safety monitoring.


【10】Are Hypervectors Enough? Single-Call LLM Reasoning over Knowledge Graphs
Link: https://arxiv.org/abs/2512.09369

Authors: Yezi Liu, William Youngwoo Chung, Hanning Chen, Calvin Yeung, Mohsen Imani
Abstract: Recent advances in large language models (LLMs) have enabled strong reasoning over both structured and unstructured knowledge. When grounded on knowledge graphs (KGs), however, prevailing pipelines rely on heavy neural encoders to embed and score symbolic paths or on repeated LLM calls to rank candidates, leading to high latency, GPU cost, and opaque decisions that hinder faithful, scalable deployment. We propose PathHD, a lightweight and encoder-free KG reasoning framework that replaces neural path scoring with hyperdimensional computing (HDC) and uses only a single LLM call per query. PathHD encodes relation paths into block-diagonal GHRR hypervectors, ranks candidates with blockwise cosine similarity and Top-K pruning, and then performs a one-shot LLM adjudication to produce the final answer together with cited supporting paths. Technically, PathHD is built on three ingredients: (i) an order-aware, non-commutative binding operator for path composition, (ii) a calibrated similarity for robust hypervector-based retrieval, and (iii) a one-shot adjudication step that preserves interpretability while eliminating per-path LLM scoring. On WebQSP, CWQ, and the GrailQA split, PathHD (i) attains comparable or better Hits@1 than strong neural baselines while using one LLM call per query; (ii) reduces end-to-end latency by 40–60% and GPU memory by 3–5× thanks to encoder-free retrieval; and (iii) delivers faithful, path-grounded rationales that improve error diagnosis and controllability. These results indicate that carefully designed HDC representations provide a practical substrate for efficient KG-LLM reasoning, offering a favorable accuracy-efficiency-interpretability trade-off.
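The order-aware, non-commutative binding operator is the key HDC ingredient above. The paper uses block-diagonal GHRR hypervectors; the sketch below substitutes a much simpler bipolar-vector binding (elementwise multiply after a position-dependent roll) purely to illustrate why identical relation paths match while reversed paths map to near-orthogonal hypervectors. All vectors and relation names are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
DIM = 4096

def rand_hv():
    """Random bipolar hypervector."""
    return rng.choice([-1.0, 1.0], size=DIM)

def bind_path(relations, hv):
    """Order-aware binding: roll each relation's hypervector by its position
    before multiplying, so bind_path([a, b]) != bind_path([b, a])."""
    out = np.ones(DIM)
    for i, r in enumerate(relations):
        out = out * np.roll(hv[r], i)
    return out

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

hv = {"born_in": rand_hv(), "capital_of": rand_hv()}
p1 = bind_path(["born_in", "capital_of"], hv)
p2 = bind_path(["capital_of", "born_in"], hv)   # reversed order
same = bind_path(["born_in", "capital_of"], hv)  # identical path
```

Cosine similarity then acts as the retrieval score: identical paths score near 1, while order-swapped paths are near-orthogonal, which is exactly what makes the binding usable for ranking candidate paths.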


【11】Self Distillation Fine-Tuning of Protein Language Models Improves Versatility in Protein Design
Link: https://arxiv.org/abs/2512.09329

Authors: Amin Tavakoli, Raswanth Murugan, Ozan Gokdemir, Arvind Ramanathan, Frances Arnold, Anima Anandkumar
Abstract: Supervised fine-tuning (SFT) is a standard approach for adapting large language models to specialized domains, yet its application to protein sequence modeling and protein language models (PLMs) remains ad hoc. This is in part because high-quality annotated data are far more difficult to obtain for proteins than for natural language. We present a simple and general recipe for fast SFT of PLMs, designed to improve the fidelity, reliability, and novelty of generated protein sequences. Unlike existing approaches that require costly precompiled experimental datasets for SFT, our method leverages the PLM itself, integrating a lightweight curation pipeline with domain-specific filters to construct high-quality training data. These filters can independently refine a PLM's output and identify candidates for in vitro evaluation; when combined with SFT, they enable PLMs to generate more stable and functional enzymes, while expanding exploration into protein sequence space beyond natural variants. Although our approach is agnostic to both the choice of protein language model (PLM) and the protein system, we demonstrate its effectiveness with a genome-scale PLM (GenSLM) applied to the tryptophan synthase enzyme family. The supervised fine-tuned model generates sequences that are not only more novel but also display improved characteristics across both targeted design constraints and emergent protein property measures.


【12】LLMs for Analog Circuit Design Continuum (ACDC)
Link: https://arxiv.org/abs/2512.09199

Authors: Yasaman Esfandiari, Jocelyn Rego, Austin Meyer, Jonathan Gallagher, Mia Levy
Abstract: Large Language Models (LLMs) and transformer architectures have shown impressive reasoning and generation capabilities across diverse natural language tasks. However, their reliability and robustness in real-world engineering domains remain largely unexplored, limiting their practical utility in human-centric workflows. In this work, we investigate the applicability and consistency of LLMs for analog circuit design, a task requiring domain-specific reasoning, adherence to physical constraints, and structured representations, focusing on AI-assisted design where humans remain in the loop. We study how different data representations influence model behavior and compare smaller models (e.g., T5, GPT-2) with larger foundation models (e.g., Mistral-7B, GPT-oss-20B) under varying training conditions. Our results highlight key reliability challenges, including sensitivity to data format, instability in generated designs, and limited generalization to unseen circuit configurations. These findings provide early evidence on the limits and potential of LLMs as tools to enhance human capabilities in complex engineering tasks, offering insights into designing reliable, deployable foundation models for structured, real-world applications.


【13】Learning Unmasking Policies for Diffusion Language Models
Link: https://arxiv.org/abs/2512.09106

Authors: Metod Jazbec, Theo X. Olausson, Louis Béthune, Pierre Ablin, Michael Kirchhof, Joao Monterio, Victor Turrisi, Jason Ramapuram, Marco Cuturi
Abstract: Diffusion (Large) Language Models (dLLMs) now match the downstream performance of their autoregressive counterparts on many tasks, while holding the promise of being more efficient during inference. One particularly successful variant is masked discrete diffusion, in which a buffer filled with special mask tokens is progressively replaced with tokens sampled from the model's vocabulary. Efficiency can be gained by unmasking several tokens in parallel, but doing too many at once risks degrading the generation quality. Thus, one critical design aspect of dLLMs is the sampling procedure that selects, at each step of the diffusion process, which tokens to replace. Indeed, recent work has found that heuristic strategies such as confidence thresholding lead to both higher quality and token throughput compared to random unmasking. However, such heuristics have downsides: they require manual tuning, and we observe that their performance degrades with larger buffer sizes. In this work, we instead propose to train sampling procedures using reinforcement learning. Specifically, we formalize masked diffusion sampling as a Markov decision process in which the dLLM serves as the environment, and propose a lightweight policy architecture based on a single-layer transformer that maps dLLM token confidences to unmasking decisions. Our experiments show that these trained policies match the performance of state-of-the-art heuristics when combined with semi-autoregressive generation, while outperforming them in the full diffusion setting. We also examine the transferability of these policies, finding that they can generalize to new underlying dLLMs and longer sequence lengths. However, we also observe that their performance degrades when applied to out-of-domain data, and that fine-grained tuning of the accuracy-efficiency trade-off can be challenging with our approach.
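The confidence-thresholding heuristic that the trained policies are compared against can be sketched in a few lines: at each diffusion step, unmask every masked position whose confidence clears a threshold, falling back to the single most confident position so decoding always makes progress. The threshold and confidence values below are illustrative assumptions, not numbers from the paper:

```python
def unmask_step(confidences, masked, threshold=0.9):
    """One denoising step: unmask all positions whose max-token confidence
    clears the threshold; if none does, unmask the single most confident
    position so the process always advances."""
    chosen = [i for i in masked if confidences[i] >= threshold]
    if not chosen:
        chosen = [max(masked, key=lambda i: confidences[i])]
    return chosen

# Hypothetical per-position confidences from a dLLM forward pass.
conf = {0: 0.97, 1: 0.42, 2: 0.93, 3: 0.55}
step1 = unmask_step(conf, masked=[0, 1, 2, 3])  # parallel unmask of 0 and 2
step2 = unmask_step(conf, masked=[1, 3])        # fallback: best remaining
```

The paper's contribution is to replace this hand-tuned rule with a learned policy that maps the same confidences to unmasking decisions.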


【14】CluCERT: Certifying LLM Robustness via Clustering-Guided Denoising Smoothing
Link: https://arxiv.org/abs/2512.08967

Authors: Zixia Wang, Gaojie Jin, Jia Hu, Ronghui Mu
Abstract: Recent advancements in Large Language Models (LLMs) have led to their widespread adoption in daily applications. Despite their impressive capabilities, they remain vulnerable to adversarial attacks, as even minor meaning-preserving changes such as synonym substitutions can lead to incorrect predictions. As a result, certifying the robustness of LLMs against such adversarial prompts is of vital importance. Existing approaches focused on word deletion or simple denoising strategies to achieve robustness certification. However, these methods face two critical limitations: (1) they yield loose robustness bounds due to the lack of semantic validation for perturbed outputs, and (2) they suffer from high computational costs due to repeated sampling. To address these limitations, we propose CluCERT, a novel framework for certifying LLM robustness via clustering-guided denoising smoothing. Specifically, to achieve tighter certified bounds, we introduce a semantic clustering filter that reduces noisy samples and retains meaningful perturbations, supported by theoretical analysis. Furthermore, we enhance computational efficiency through two mechanisms: a refine module that extracts core semantics, and a fast synonym substitution strategy that accelerates the denoising process. Finally, we conduct extensive experiments on various downstream tasks and jailbreak defense scenarios. Experimental results demonstrate that our method outperforms existing certified approaches in both robustness bounds and computational efficiency.
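At its core, the smoothing step underlying certification like the above reduces to a majority vote over perturbed copies of the prompt; the empirical agreement rate of the top label is the quantity certified bounds are derived from. A minimal sketch with stub `perturb`/`classify` functions standing in for the synonym-substitution step and the LLM (both entirely hypothetical):

```python
from collections import Counter

def smoothed_predict(classify, perturb, prompt, n=25):
    """Denoising-smoothing sketch: classify n perturbed copies of the prompt
    and return the majority label with its empirical agreement rate."""
    votes = Counter(classify(perturb(prompt, seed)) for seed in range(n))
    label, count = votes.most_common(1)[0]
    return label, count / n

# Hypothetical toy components standing in for a synonym perturber and an LLM.
def perturb(prompt, seed):
    return prompt.replace("movie", "film") if seed % 5 == 0 else prompt

def classify(prompt):
    return "positive" if "great" in prompt else "negative"

label, agreement = smoothed_predict(classify, perturb, "a great movie")
```

CluCERT's clustering filter would sit between `perturb` and `classify`, discarding perturbations that drift semantically; that filtering is what tightens the bound relative to naive smoothing.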


【15】LLM4XCE: Large Language Models for Extremely Large-Scale Massive MIMO Channel Estimation
Link: https://arxiv.org/abs/2512.08955

Authors: Renbin Li, Shuangshuang Li, Peihao Dong
Abstract: Extremely large-scale massive multiple-input multiple-output (XL-MIMO) is a key enabler for sixth-generation (6G) networks, offering massive spatial degrees of freedom. Despite these advantages, the coexistence of near-field and far-field effects in hybrid-field channels presents significant challenges for accurate estimation, where traditional methods often struggle to generalize effectively. In recent years, large language models (LLMs) have achieved impressive performance on downstream tasks via fine-tuning, aligning with the semantic communication shift toward task-oriented understanding over bit-level accuracy. Motivated by this, we propose Large Language Models for XL-MIMO Channel Estimation (LLM4XCE), a novel channel estimation framework that leverages the semantic modeling capabilities of large language models to recover essential spatial-channel representations for downstream tasks. The model integrates a carefully designed embedding module with Parallel Feature-Spatial Attention, enabling deep fusion of pilot features and spatial structures to construct a semantically rich representation for LLM input. By fine-tuning only the top two Transformer layers, our method effectively captures latent dependencies in the pilot data while ensuring high training efficiency. Extensive simulations demonstrate that LLM4XCE significantly outperforms existing state-of-the-art methods under hybrid-field conditions, achieving superior estimation accuracy and generalization performance.


【16】Don't Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search
标题:不要扔掉你的梁:通过梁搜索改善LLM中基于一致性的不确定性
链接:https://arxiv.org/abs/2512.09538

作者:Ekaterina Fadeeva,Maiya Goloburda,Aleksandr Rubashevskii,Roman Vashurin,Artem Shelmanov,Preslav Nakov,Mrinmaya Sachan,Maxim Panov
摘要:基于一致性的方法已成为大语言模型中不确定性量化(UQ)的一种有效途径。这些方法通常依赖于通过多项分布采样获得的多个生成结果,并度量其一致程度。然而,在简短问答中,由于分布峰化,多项分布采样容易产生重复,其随机性也会使不同运行之间的不确定性估计产生相当大的方差。我们提出了一族新方法,采用束搜索为基于一致性的UQ生成候选,与多项分布采样相比性能更好、方差更低。我们还给出了束集概率质量的一个理论下界,在该下界之上束搜索比多项分布采样误差更小。我们在六个问答数据集上对该方法进行了实证评估,发现其相对多项分布采样的一致改进带来了最先进的UQ性能。
摘要:Consistency-based methods have emerged as an effective approach to uncertainty quantification (UQ) in large language models. These methods typically rely on several generations obtained via multinomial sampling, measuring their agreement level. However, in short-form QA, multinomial sampling is prone to producing duplicates due to peaked distributions, and its stochasticity introduces considerable variance in uncertainty estimates across runs. We introduce a new family of methods that employ beam search to generate candidates for consistency-based UQ, yielding improved performance and reduced variance compared to multinomial sampling. We also provide a theoretical lower bound on the beam set probability mass under which beam search achieves a smaller error than multinomial sampling. We empirically evaluate our approach on six QA datasets and find that its consistent improvements over multinomial sampling lead to state-of-the-art UQ performance.
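下面是一个极简的Python示意(并非该论文的实现;答案分布、函数名与一致性度量均为假设),用于说明为何在峰化分布下,取概率最高的k个互不相同候选(这里作为束搜索的简化替身)比多项分布采样更适合基于一致性的UQ:

```python
import random
from collections import Counter

def multinomial_candidates(probs, k, rng):
    """Draw k answers i.i.d. from a (peaked) answer distribution."""
    answers, weights = zip(*probs.items())
    return rng.choices(answers, weights=weights, k=k)

def beam_candidates(probs, k):
    """Stand-in for beam search: the k most probable distinct answers."""
    return [a for a, _ in sorted(probs.items(), key=lambda kv: -kv[1])[:k]]

def consistency_uncertainty(candidates):
    """1 - share of the modal answer: low when candidates agree."""
    counts = Counter(candidates)
    return 1.0 - counts.most_common(1)[0][1] / len(candidates)

# A peaked short-form-QA answer distribution (hypothetical numbers).
probs = {"Paris": 0.85, "Lyon": 0.10, "Nice": 0.05}
multi = multinomial_candidates(probs, 5, random.Random(0))  # likely duplicates
beam = beam_candidates(probs, 3)                            # distinct, deterministic
```

多项采样得到的候选随随机种子变化(因此不确定性估计方差大),而束候选是确定性的且互不重复,可再配以各候选的概率权重做一致性打分。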


Graph相关(图学习|图神经网络|图优化等)(8篇)

【1】Incorporating Fairness in Neighborhood Graphs for Fair Spectral Clustering
标题:在邻域图中引入公平性以实现公平谱聚类
链接:https://arxiv.org/abs/2512.09810

作者:Adithya K Moorthy,V Vijaya Saradhi,Bhanu Prasad
摘要:图聚类在谱聚类等无监督学习方法中起着关键作用,但传统的图构造方法往往会使某些群体代表性不足,从而通过不公平的图构造延续偏见。本研究提出了构造公平k近邻(kNN)图和公平ε-邻域图的新方法,在图的形成阶段主动施加人口统计均等约束。通过在邻域选择这一最早阶段引入公平性约束,我们的方法在保持几何一致性的同时,将敏感特征的比例代表性纳入局部图结构。我们的工作填补了公平谱聚类预处理中的一个关键空白,表明图构造中的拓扑公平性对实现公平的聚类结果至关重要。kNN图和ε-邻域图等广泛使用的图构造方法会沿边传播对敏感群体的差别性影响,导致有偏的聚类结果。在每个节点的邻域中保证各敏感群体的代表性会带来更公平的谱聚类结果,因为此时图的拓扑特征自然地反映了均衡的群体比例。这项研究说明了图构造中的拓扑公平性本身即可促成更公平的谱聚类结果,而无需修改聚类算法。在三个合成数据集、七个真实表格数据集和三个真实图像数据集上的实验证明,我们的公平图构造方法在图聚类任务上超越了现有基线。
摘要:Graph clustering plays a pivotal role in unsupervised learning methods like spectral clustering, yet traditional methods for graph clustering often perpetuate bias through unfair graph constructions that may underrepresent some groups. The current research introduces novel approaches for constructing fair k-nearest neighbor (kNN) and fair epsilon-neighborhood graphs that proactively enforce demographic parity during graph formation. By incorporating fairness constraints at the earliest stage of neighborhood selection steps, our approaches incorporate proportional representation of sensitive features into the local graph structure while maintaining geometric consistency. Our work addresses a critical gap in pre-processing for fair spectral clustering, demonstrating that topological fairness in graph construction is essential for achieving equitable clustering outcomes. Widely used graph construction methods like kNN and epsilon-neighborhood graphs propagate edge-based disparate impact on sensitive groups, leading to biased clustering results. Providing representation of each sensitive group in the neighborhood of every node leads to fairer spectral clustering results because the topological features of the graph naturally reflect equitable group ratios. This research fills an essential shortcoming in fair unsupervised learning, by illustrating how topological fairness in graph construction inherently facilitates fairer spectral clustering results without the need for changes to the clustering algorithm itself. Thorough experiments on three synthetic datasets, seven real-world tabular datasets, and three real-world image datasets prove that our fair graph construction methods surpass the current baselines in graph clustering tasks.
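作为示意(并非原文算法;按比例取整的配额规则是我们的假设),下面的Python草图展示了如何在kNN邻居选择中按敏感群体比例预留名额,使每个节点的邻域包含各群体的代表:

```python
import math

def fair_knn_neighbors(points, groups, i, k, ratios):
    """Pick k neighbors of point i, first filling a per-group quota derived
    from `ratios` (e.g. the global group proportions), then topping up with
    the nearest remaining points."""
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
    quota = {g: max(1, round(r * k)) for g, r in ratios.items()}
    chosen = []
    for j in order:                      # pass 1: respect group quotas
        if len(chosen) < k and quota.get(groups[j], 0) > 0:
            chosen.append(j)
            quota[groups[j]] -= 1
    for j in order:                      # pass 2: fill any remaining slots
        if len(chosen) >= k:
            break
        if j not in chosen:
            chosen.append(j)
    return chosen
```

与普通kNN不同,即使某个群体的点都较远,它仍会被纳入邻域,这正是摘要所说"每个节点的邻域中保证各敏感群体的代表性"。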


【2】M3Net: A Multi-Metric Mixture of Experts Network Digital Twin with Graph Neural Networks
标题:M3Net:基于图神经网络的多指标专家混合网络数字孪生
链接:https://arxiv.org/abs/2512.09797

作者:Blessed Guda,Carlee Joe-Wong
摘要:5G/6G网络技术的兴起有望支撑自动驾驶汽车和虚拟现实等应用,带来联网设备的大幅增加,并必然使网络管理复杂化。更糟的是,这些应用通常在时延和可靠性等指标上有严格而异构的性能要求。因此,许多近期工作都聚焦于预测网络性能的能力。然而,离散事件模拟器和仿真等传统网络建模方法往往难以兼顾准确性与可扩展性。由机器学习增强的网络数字孪生(NDT)通过创建物理网络的虚拟副本进行实时模拟与分析,提供了一种可行的解决方案。然而,最先进的模型还算不上完整的NDT,因为它们通常只关注单一性能指标或仿真网络数据。我们提出M3Net,一个多指标专家混合(MoE)NDT,它使用图神经网络架构,在多种场景下从扩展的网络状态数据集中估计多个性能指标。我们表明,M3Net将流时延预测的MAPE(平均绝对百分比误差)从20.06%降至17.39%,显著提高了预测精度,同时在每条流的抖动和丢包预测上分别达到66.47%和78.7%的准确率。
摘要:The rise of 5G/6G network technologies promises to enable applications like autonomous vehicles and virtual reality, resulting in a significant increase in connected devices and necessarily complicating network management. Even worse, these applications often have strict, yet heterogeneous, performance requirements across metrics like latency and reliability. Much recent work has thus focused on developing the ability to predict network performance. However, traditional methods for network modeling, like discrete event simulators and emulation, often fail to balance accuracy and scalability. Network Digital Twins (NDTs), augmented by machine learning, present a viable solution by creating virtual replicas of physical networks for real-time simulation and analysis. State-of-the-art models, however, fall short of full-fledged NDTs, as they often focus only on a single performance metric or simulated network data. We introduce M3Net, a Multi-Metric Mixture-of-experts (MoE) NDT that uses a graph neural network architecture to estimate multiple performance metrics from an expanded set of network state data in a range of scenarios. We show that M3Net significantly enhances the accuracy of flow delay predictions by reducing the MAPE (Mean Absolute Percentage Error) from 20.06% to 17.39%, while also achieving 66.47% and 78.7% accuracy on jitter and packets dropped for each flow.
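上文引用的MAPE(平均绝对百分比误差,从20.06%降至17.39%)可按其通用定义计算(以下为通用实现,并非该论文代码):

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent; assumes nonzero targets."""
    return 100.0 * sum(abs(t - p) / abs(t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)
```

例如真实时延为100ms、预测为90ms时,该条样本贡献10%的百分比误差。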


【3】BugSweeper: Function-Level Detection of Smart Contract Vulnerabilities Using Graph Neural Networks
标题:BugSweeper:使用图神经网络对智能合约漏洞进行函数级检测
链接:https://arxiv.org/abs/2512.09385

作者:Uisang Lee,Changhoon Chung,Junmo Lee,Soo-Mook Moon
备注:This paper is accepted to AAAI 2026
摘要:以太坊的快速增长使得快速而准确地检测智能合约漏洞变得愈发重要。虽然基于机器学习的方法已展现出一定前景,但许多方法仍依赖领域专家设计的基于规则的预处理。基于规则的预处理通常会丢弃源代码中的关键上下文,可能导致某些漏洞被忽视,并限制了对新出现威胁的适应性。我们提出BugSweeper,一个无需人工特征工程、直接从源代码检测漏洞的端到端深度学习框架。BugSweeper将每个Solidity函数表示为函数级抽象语法图(FLAG),这种新颖的图将其抽象语法树(AST)与富化的控制流和数据流语义相结合。随后,我们的两阶段图神经网络(GNN)分析这些图:第一阶段GNN从语法图中过滤噪声,第二阶段GNN进行高层推理以检测多种漏洞。在真实合约上的大量实验表明,BugSweeper显著优于所有最先进的检测方法。通过消除对手工规则的需求,我们的方法提供了一个无需依赖安全专家的健壮、自动化且可扩展的智能合约安全解决方案。
摘要:The rapid growth of Ethereum has made it more important to quickly and accurately detect smart contract vulnerabilities. While machine-learning-based methods have shown some promise, many still rely on rule-based preprocessing designed by domain experts. Rule-based preprocessing methods often discard crucial context from the source code, potentially causing certain vulnerabilities to be overlooked and limiting adaptability to newly emerging threats. We introduce BugSweeper, an end-to-end deep learning framework that detects vulnerabilities directly from the source code without manual engineering. BugSweeper represents each Solidity function as a Function-Level Abstract Syntax Graph (FLAG), a novel graph that combines its Abstract Syntax Tree (AST) with enriched control-flow and data-flow semantics. Then, our two-stage Graph Neural Network (GNN) analyzes these graphs. The first-stage GNN filters noise from the syntax graphs, while the second-stage GNN conducts high-level reasoning to detect diverse vulnerabilities. Extensive experiments on real-world contracts show that BugSweeper significantly outperforms all state-of-the-art detection methods. By removing the need for handcrafted rules, our approach offers a robust, automated, and scalable solution for securing smart contracts without any dependence on security experts.
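论文针对Solidity构建FLAG图;作为可运行的类比示意(用Python自带的ast模块代替Solidity解析器,且只展示AST主干、不含论文中富化的控制流/数据流边),可以这样把一个函数的语法树摊平成节点与父子边:

```python
import ast

def ast_edges(source):
    """Flatten a (Python) AST into labeled nodes and parent-child edges --
    a stand-in for the AST backbone of a FLAG-style function graph."""
    tree = ast.parse(source)
    nodes, edges = [], []
    def walk(node, parent=None):
        idx = len(nodes)                     # pre-order node index
        nodes.append(type(node).__name__)    # node label = AST node type
        if parent is not None:
            edges.append((parent, idx))
        for child in ast.iter_child_nodes(node):
            walk(child, idx)
    walk(tree)
    return nodes, edges

nodes, edges = ast_edges("def f(x):\n    return x + 1\n")
```

得到的(节点标签, 边表)即可作为GNN的输入图;FLAG在此基础上再叠加控制流与数据流边。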


【4】KGOT: Unified Knowledge Graph and Optimal Transport Pseudo-Labeling for Molecule-Protein Interaction Prediction
标题:KGOT:分子-蛋白质相互作用预测的统一知识图和最佳运输伪标记
链接:https://arxiv.org/abs/2512.09365

作者:Jiayu Qin,Zhengquan Luo,Guy Tadmor,Changyou Chen,David Zeevi,Zhiqiang Xu
摘要:预测分子-蛋白质相互作用(MPI)是计算生物学中的一项基本任务,在药物发现和分子功能注释中具有重要应用。然而,现有MPI模型面临两大挑战。首先,带标注的分子-蛋白质对的稀缺显著限制了模型性能,因为可用数据集仅覆盖了生物学相关相互作用中很小的一部分。其次,大多数方法仅依赖分子和蛋白质特征,忽略了基因、代谢通路和功能注释等更广泛的生物学背景,而这些背景可以提供必要的补充信息。为解决这些限制,我们的框架首先聚合包括分子、蛋白质、基因和通路层面相互作用在内的多种生物数据集,然后开发一种基于最优传输的方法,为未标注的分子-蛋白质对生成高质量伪标签,利用已知相互作用的潜在分布来指导标签分配。通过将伪标注视为连接不同生物学模态的机制,我们的方法能够有效利用异构数据来增强MPI预测。我们在多个MPI数据集(包括虚拟筛选任务和蛋白质检索任务)上评估了该框架,在预测准确率以及对未见相互作用的零样本(zero-shot)能力上均较最先进方法取得实质性改进。除MPI预测外,我们的方法还为利用多样化生物数据源解决传统上受限于单模态或双模态学习的问题提供了新范式,为计算生物学和药物发现的未来发展铺平道路。
摘要:Predicting molecule-protein interactions (MPIs) is a fundamental task in computational biology, with crucial applications in drug discovery and molecular function annotation. However, existing MPI models face two major challenges. First, the scarcity of labeled molecule-protein pairs significantly limits model performance, as available datasets capture only a small fraction of biologically relevant interactions. Second, most methods rely solely on molecular and protein features, ignoring broader biological context such as genes, metabolic pathways, and functional annotations that could provide essential complementary information. To address these limitations, our framework first aggregates diverse biological datasets, including molecular-, protein-, gene-, and pathway-level interactions, and then develops an optimal transport-based approach to generate high-quality pseudo-labels for unlabeled molecule-protein pairs, leveraging the underlying distribution of known interactions to guide label assignment. By treating pseudo-labeling as a mechanism for bridging disparate biological modalities, our approach enables the effective use of heterogeneous data to enhance MPI prediction. We evaluate our framework on multiple MPI datasets including virtual screening tasks and protein retrieval tasks, demonstrating substantial improvements over state-of-the-art methods in prediction accuracy and zero-shot ability across unseen interactions. Beyond MPI prediction, our approach provides a new paradigm for leveraging diverse biological data sources to tackle problems traditionally constrained by single- or bi-modal learning, paving the way for future advances in computational biology and drug discovery.
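论文的最优传输伪标注细节未在摘要中给出;下面是一个通用的熵正则OT(Sinkhorn)迭代草图(纯标准库实现,成本矩阵与超参数均为假设),其耦合矩阵的每一行可读作一个未标注样本的软伪标签分布:

```python
import math

def sinkhorn(cost, a, b, eps=0.1, iters=200):
    """Entropic-OT coupling between histograms a and b under a cost matrix;
    each row of the returned transport plan can be read as a soft label."""
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    n, m = len(a), len(b)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):  # alternate scaling to match both marginals
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

成本低的(样本, 标签)对获得更大的传输质量,同时行/列边缘分布被约束到给定的先验比例。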


【5】Understanding the Failure Modes of Transformers through the Lens of Graph Neural Networks
标题:通过图神经网络的视角了解Transformer的故障模式
链接:https://arxiv.org/abs/2512.09182

作者:Hunjae Lee
摘要:Transformer,尤其是仅解码器的Transformer,主导着现代LLM架构。尽管它们表现异常出色,但也并非没有问题,会出现令人惊讶的故障模式和可预测的非对称性能退化。本文通过图神经网络(GNN)理论的视角研究Transformer中许多已被观察到的故障模式。我们首先论证,包括Transformer在内的大部分深度学习,本质上是可学习的信息混合与传播。这使得对模型故障模式的研究成为对信息传播瓶颈的研究,并自然引向GNN理论,那里已有大量关于信息传播瓶颈和模型理论故障模式的文献。随后我们论证,GNN面临的许多问题同样出现在Transformer中。此外,我们分析了仅解码器Transformer的因果特性如何在信息传播中产生有趣的几何性质,从而导致可预测且可能具有破坏性的故障模式。最后,我们观察到,Transformer研究中的现有解决方案往往是临时性的、由直觉驱动,而缺乏扎实的理论动机。因此,我们将许多此类解决方案统一在一个更具理论性的视角之下,解释它们为何有效、实际解决了什么问题,以及如何进一步改进以针对Transformer的特定故障模式。总体而言,本文试图弥合Transformer中观察到的故障模式与该领域普遍缺乏的理论理解之间的差距。
摘要:Transformers and more specifically decoder-only transformers dominate modern LLM architectures. While they have shown to work exceptionally well, they are not without issues, resulting in surprising failure modes and predictably asymmetric performance degradation. This article is a study of many of these observed failure modes of transformers through the lens of graph neural network (GNN) theory. We first make the case that much of deep learning, including transformers, is about learnable information mixing and propagation. This makes the study of model failure modes a study of bottlenecks in information propagation. This naturally leads to GNN theory, where there is already a rich literature on information propagation bottlenecks and theoretical failure modes of models. We then make the case that many issues faced by GNNs are also experienced by transformers. In addition, we analyze how the causal nature of decoder-only transformers create interesting geometric properties in information propagation, resulting in predictable and potentially devastating failure modes. Finally, we observe that existing solutions in transformer research tend to be ad-hoc and driven by intuition rather than grounded theoretical motivation. As such, we unify many such solutions under a more theoretical perspective, providing insight into why they work, what problem they are actually solving, and how they can be further improved to target specific failure modes of transformers. Overall, this article is an attempt to bridge the gap between observed failure modes in transformers and a general lack of theoretical understanding of them in this space.


【6】Graph Deep Learning for Intracranial Aneurysm Blood Flow Simulation and Risk Assessment
标题:用于颅内动脉瘤血流模拟和风险评估的图深度学习
链接:https://arxiv.org/abs/2512.09013

作者:Paul Garnier,Pablo Jeken-Rico,Vincent Lannelongue,Chiara Faitini,Aurèle Goetz,Lea Chanvillard,Ramy Nemer,Jonathan Viquerat,Ugo Pelissier,Philippe Meliga,Jacques Sédat,Thomas Liebig,Yves Chau,Elie Hachem
摘要:颅内动脉瘤仍然是世界范围内神经系统发病率和死亡率的主要原因,其中破裂风险与局部血流动力学密切相关,特别是壁剪切应力和振荡剪切指数。传统的计算流体动力学模拟提供了准确的见解,但速度非常慢,需要专门的专业知识。临床成像替代方案(如4D Flow MRI)提供直接的体内测量,但其空间分辨率仍不足以捕获驱动内皮重塑和破裂风险的精细尺度剪切模式,同时非常不切实际且昂贵。   我们提出了一个图形神经网络代理模型,通过在每个心动周期不到一分钟的时间内直接从血管几何形状再现全场血液动力学来弥合这一差距。在患者特定动脉瘤的高保真模拟的综合数据集上训练,我们的架构将图形Transformers与自回归预测相结合,以准确地模拟血流、壁剪切应力和振荡剪切指数。该模型概括了不可见的患者几何形状和流入条件,无需特定的网格校准。除了加速模拟,我们的框架为临床可解释的血流动力学预测奠定了基础。通过实现与现有成像管道集成的近实时推理,它允许与医院相图评估进行直接比较,并通过物理接地,高分辨率流场扩展它们。   这项工作将高保真模拟从仅限专家的研究工具转变为可部署的数据驱动的决策支持系统。我们的完整管道在患者成像的几分钟内提供高分辨率的血流动力学预测,而不需要计算专家,标志着向实时床旁动脉瘤分析的转变。
摘要:Intracranial aneurysms remain a major cause of neurological morbidity and mortality worldwide, where rupture risk is tightly coupled to local hemodynamics particularly wall shear stress and oscillatory shear index. Conventional computational fluid dynamics simulations provide accurate insights but are prohibitively slow and require specialized expertise. Clinical imaging alternatives such as 4D Flow MRI offer direct in-vivo measurements, yet their spatial resolution remains insufficient to capture the fine-scale shear patterns that drive endothelial remodeling and rupture risk while being extremely impractical and expensive.   We present a graph neural network surrogate model that bridges this gap by reproducing full-field hemodynamics directly from vascular geometries in less than one minute per cardiac cycle. Trained on a comprehensive dataset of high-fidelity simulations of patient-specific aneurysms, our architecture combines graph transformers with autoregressive predictions to accurately simulate blood flow, wall shear stress, and oscillatory shear index. The model generalizes across unseen patient geometries and inflow conditions without mesh-specific calibration. Beyond accelerating simulation, our framework establishes the foundation for clinically interpretable hemodynamic prediction. By enabling near real-time inference integrated with existing imaging pipelines, it allows direct comparison with hospital phase-diagram assessments and extends them with physically grounded, high-resolution flow fields.   This work transforms high-fidelity simulations from an expert-only research tool into a deployable, data-driven decision support system. Our full pipeline delivers high-resolution hemodynamic predictions within minutes of patient imaging, without requiring computational specialists, marking a step-change toward real-time, bedside aneurysm analysis.


【7】SEA: Spectral Edge Attacks on Graph Neural Networks
标题:SEA:对图神经网络的频谱边缘攻击
链接:https://arxiv.org/abs/2512.08964

作者:Yongyu Wang
摘要:图神经网络(GNN)在图结构数据上取得了强大的性能,但众所周知,它容易受到对图结构的微小而精心构造的扰动的影响。大多数现有的基于结构的攻击依赖基于梯度的启发式方法或局部连接模式,并将所有边视为同等重要的操纵候选。在本文中,我们提出谱边攻击(SEA),这是一族新的对抗攻击,显式利用谱鲁棒性评估来指导结构扰动。我们的核心思想是计算一个捕获输入流形最脆弱方向的谱嵌入,并用它为每条边或非边分配鲁棒性得分。基于这些得分,我们引入两种互补的攻击变体:(i)Spade引导的删除攻击,移除谱意义上最鲁棒的边;(ii)Spade引导的添加攻击,在脆弱谱空间中最不兼容的节点之间插入边。这两种攻击都在图层面上操作,具有模型感知能力但概念上简单,且无需梯度即可插入现有GNN架构。我们描述了谱形式化、攻击算法以及在基准上的实验。
摘要:Graph Neural Networks (GNNs) achieve strong performance on graph-structured data, but are notoriously vulnerable to small, carefully crafted perturbations of the graph structure. Most existing structure-based attacks rely on gradient-based heuristics or local connectivity patterns, and treat edges as equally important candidates for manipulation. In this paper, we propose Spectral Edge Attacks (SEA), a new family of adversarial attacks that explicitly leverage spectral robustness evaluation to guide structural perturbations. Our key idea is to compute a spectral embedding that captures the most fragile directions of the input manifold and to use it to assign a robustness score to each edge or non-edge. Based on these scores, we introduce two complementary attack variants: (i) a Spade-guided deletion attack that removes the most spectrally robust edges, and (ii) a Spade-guided addition attack that inserts edges between nodes that are maximally incompatible in the fragile spectral space. Both attacks operate at the graph level, are model-aware but conceptually simple, and can be plugged into existing GNN architectures without requiring gradients. We describe the spectral formulation, the attack algorithms, and experiments on benchmarks.
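下面的Python草图示意这一打分思路(谱嵌入为给定的玩具数据;"嵌入距离小=边更鲁棒"是我们对摘要的一种具体化解读,并非论文定义):

```python
def edge_scores(embedding, edges):
    """Score each (u, v) pair by squared distance in a precomputed spectral
    embedding; larger = endpoints less compatible in the fragile subspace."""
    def d2(u, v):
        return sum((a - b) ** 2 for a, b in zip(embedding[u], embedding[v]))
    return {e: d2(*e) for e in edges}

def deletion_targets(embedding, edges, budget):
    """Deletion sketch: drop the most 'robust' edges (smallest distance)."""
    scores = edge_scores(embedding, edges)
    return sorted(edges, key=lambda e: scores[e])[:budget]

def addition_targets(embedding, nodes, edges, budget):
    """Addition sketch: connect the most incompatible non-adjacent pairs."""
    existing = set(edges) | {(v, u) for u, v in edges}
    cand = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]
            if (u, v) not in existing]
    scores = edge_scores(embedding, cand)
    return sorted(cand, key=lambda e: -scores[e])[:budget]
```

两种变体共用同一套谱得分,只是分别取得分最低的已有边与得分最高的候选新边。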


【8】Graph-Based Bayesian Optimization for Quantum Circuit Architecture Search with Uncertainty Calibrated Surrogates
标题:具有不确定性校准替代物的量子电路架构搜索的基于图的Bayesian优化
链接:https://arxiv.org/abs/2512.09586

作者:Prashant Kumar Choudhary,Nouhaila Innan,Muhammad Shafique,Rajeev Singh
备注:17 pages, 13 figures
摘要:量子电路设计是在复杂真实世界数据上开展实用量子机器学习的关键瓶颈。我们提出一个自动化框架,使用基于图的贝叶斯优化和图神经网络(GNN)代理模型来发现并改进变分量子电路(VQC)。电路被表示为图并进行变异,再通过一个由代理不确定性(经蒙特卡洛dropout估计)提供信息的期望改进采集函数进行选择。候选电路在经过面向量子嵌入的特征选择与缩放后,用混合量子-经典变分分类器在下一代防火墙遥测与物联网(NF-ToN-IoT-V2)网络安全数据集上进行评估。我们将该流程与基于MLP的代理、随机搜索和贪心GNN选择进行基准比较。GNN引导的优化器始终能发现复杂度更低、分类准确率具有竞争力或更优的电路。鲁棒性通过跨标准量子噪声信道的噪声研究进行评估,包括幅度阻尼、相位阻尼、热弛豫、去极化和读出比特翻转噪声。该实现完全可复现,包含时间基准测试和最优电路导出,为自动化量子电路发现提供了可扩展且可解释的路线。
摘要:Quantum circuit design is a key bottleneck for practical quantum machine learning on complex, real-world data. We present an automated framework that discovers and refines variational quantum circuits (VQCs) using graph-based Bayesian optimization with a graph neural network (GNN) surrogate. Circuits are represented as graphs and mutated and selected via an expected improvement acquisition function informed by surrogate uncertainty with Monte Carlo dropout. Candidate circuits are evaluated with a hybrid quantum-classical variational classifier on the next generation firewall telemetry and network internet of things (NF-ToN-IoT-V2) cybersecurity dataset, after feature selection and scaling for quantum embedding. We benchmark our pipeline against an MLP-based surrogate, random search, and greedy GNN selection. The GNN-guided optimizer consistently finds circuits with lower complexity and competitive or superior classification accuracy compared to all baselines. Robustness is assessed via a noise study across standard quantum noise channels, including amplitude damping, phase damping, thermal relaxation, depolarizing, and readout bit flip noise. The implementation is fully reproducible, with time benchmarking and export of best found circuits, providing a scalable and interpretable route to automated quantum circuit discovery.
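摘要中的采集函数可以这样示意:用蒙特卡洛dropout的多次前向输出估计代理模型在某候选电路处的均值与标准差,再代入标准的期望改进(EI)闭式解(最大化设定;下列采样值为假设数据,非论文结果):

```python
import math
from statistics import mean, pstdev

def expected_improvement(samples, best, xi=0.01):
    """EI acquisition (maximization) from Monte Carlo dropout samples of the
    surrogate's prediction at one candidate; `best` is the incumbent value."""
    mu, sigma = mean(samples), pstdev(samples)
    if sigma == 0:
        return max(0.0, mu - best - xi)
    z = (mu - best - xi) / sigma
    pdf = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)  # N(0,1) density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # N(0,1) CDF
    return (mu - best - xi) * cdf + sigma * pdf
```

EI同时奖励高均值(利用)和高不确定性(探索),这正是用dropout样本近似代理不确定性的意义所在。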


Transformer(6篇)

【1】Circuits, Features, and Heuristics in Molecular Transformers
标题:分子Transformer中的电路、特征和启发式
链接:https://arxiv.org/abs/2512.09757

作者:Kristof Varadi,Mark Marosi,Peter Antal
摘要:Transformer能够生成有效且多样的化学结构,但人们对使这些模型掌握分子表示规则的机制知之甚少。我们对在类药小分子上训练的自回归Transformer进行了机制分析,以揭示其能力背后跨多个抽象层次的计算结构。我们识别出与低层句法解析以及更抽象的化学有效性约束相一致的计算模式。利用稀疏自编码器(SAE),我们提取了与化学相关激活模式相联系的特征字典。我们在下游任务上验证了这些发现,并发现机制层面的洞见可以在各种实际场景中转化为预测性能。
摘要:Transformers generate valid and diverse chemical structures, but little is known about the mechanisms that enable these models to capture the rules of molecular representation. We present a mechanistic analysis of autoregressive transformers trained on drug-like small molecules to reveal the computational structure underlying their capabilities across multiple levels of abstraction. We identify computational patterns consistent with low-level syntactic parsing and more abstract chemical validity constraints. Using sparse autoencoders (SAEs), we extract feature dictionaries associated with chemically relevant activation patterns. We validate our findings on downstream tasks and find that mechanistic insights can translate to predictive performance in various practical settings.
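稀疏自编码器提取特征字典的编码步骤可示意如下(玩具权重与特征名均为虚构;真实SAE还包含解码器重构项与稀疏正则):

```python
def sae_features(x, W_enc, b_enc):
    """One SAE encoder step: ReLU(W_enc @ x + b_enc); the indices of the
    nonzero activations point into the learned feature dictionary."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(W_enc, b_enc)]
    return [max(0.0, p) for p in pre]

def active_features(acts, names):
    """Map nonzero activations to human-readable feature labels."""
    return [names[i] for i, a in enumerate(acts) if a > 0.0]
```

对某个残差流激活x,非零的编码维度即被解释为该位置"点亮"的化学相关特征。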


【2】Interpreto: An Explainability Library for Transformers
标题:Interpreto:一个面向Transformer的可解释性库
链接:https://arxiv.org/abs/2512.09730

作者:Antonin Poché,Thomas Mullor,Gabriele Sarti,Frédéric Boisnard,Corentin Friedrich,Charlotte Claye,François Hoofd,Raphael Bernas,Céline Hudelot,Fanny Jourdan
备注:Equal contribution: Poché and Jourdan
摘要:Interpreto是一个Python库,用于对从早期BERT变体到LLM的HuggingFace文本模型进行事后可解释性分析。它提供两类互补的方法:归因方法和基于概念的解释。该库将近期研究与面向数据科学家的实用工具相连接,旨在让最终用户也能获得解释,并包含文档、示例和教程。   Interpreto通过统一的API同时支持分类模型和生成模型。其一个关键差异化特性是基于概念的功能,这超越了特征级归因,在现有库中并不常见。   该库是开源的,可通过pip install interpreto安装。代码和文档可在https://github.com/FOR-sight-ai/interpreto上获得。
摘要:Interpreto is a Python library for post-hoc explainability of text HuggingFace models, from early BERT variants to LLMs. It provides two complementary families of methods: attributions and concept-based explanations. The library connects recent research to practical tooling for data scientists, aiming to make explanations accessible to end users. It includes documentation, examples, and tutorials.   Interpreto supports both classification and generation models through a unified API. A key differentiator is its concept-based functionality, which goes beyond feature-level attributions and is uncommon in existing libraries.   The library is open source; install via pip install interpreto. Code and documentation are available at https://github.com/FOR-sight-ai/interpreto.


【3】Towards Resilient Transportation: A Conditional Transformer for Accident-Informed Traffic Forecasting
标题:面向弹性交通:用于事故感知交通预测的条件Transformer
链接:https://arxiv.org/abs/2512.09398

作者:Hongjun Wang,Jiawei Yong,Jiawei Wang,Shintaro Fukushima,Renhe Jiang
摘要:尽管深度学习取得了进展,但流量预测仍然是时空数据挖掘中的一个关键挑战。准确的预测受到交通事故和法规等外部因素的复杂影响,由于有限的数据集成,现有模型经常忽略这些因素。为了解决这些限制,我们提出了两个丰富的交通数据集从东京和加利福尼亚州,结合交通事故和法规数据。利用这些数据集,我们提出了ConFormer(Conditional Transformer),这是一个新的框架,它将图形传播与引导规范化层集成在一起。该设计基于历史模式动态调整空间和时间节点关系,提高预测准确性。我们的模型在预测性能和效率方面都超过了最先进的STAEFormer,实现了更低的计算成本和更少的参数需求。广泛的评估表明,ConFormer在多个指标上始终优于主流时空基线,强调了其推进交通预测研究的潜力。
摘要:Traffic prediction remains a key challenge in spatio-temporal data mining, despite progress in deep learning. Accurate forecasting is hindered by the complex influence of external factors such as traffic accidents and regulations, often overlooked by existing models due to limited data integration. To address these limitations, we present two enriched traffic datasets from Tokyo and California, incorporating traffic accident and regulation data. Leveraging these datasets, we propose ConFormer (Conditional Transformer), a novel framework that integrates graph propagation with guided normalization layer. This design dynamically adjusts spatial and temporal node relationships based on historical patterns, enhancing predictive accuracy. Our model surpasses the state-of-the-art STAEFormer in both predictive performance and efficiency, achieving lower computational costs and reduced parameter demands. Extensive evaluations demonstrate that ConFormer consistently outperforms mainstream spatio-temporal baselines across multiple metrics, underscoring its potential to advance traffic prediction research.


【4】StructuredDNA: A Bio-Physical Framework for Energy-Aware Transformer Routing
标题:StructuredDNA:能量感知Transformer路由的生物物理框架
链接:https://arxiv.org/abs/2512.08968

作者:Mustapha Hamdi
摘要:大型计算模型的快速扩展导致能源和计算成本急剧增加。受生物系统中结构与功能从低能量构型中涌现的启发,我们提出StructuredDNA,一个用于模块化、能量感知Transformer路由的稀疏架构框架。StructuredDNA用基于语义能量最小化的生物物理能量引导路由层取代稠密的专家混合路由:输入被动态分组为语义密码子,路由通过最小化一个结合凝聚性、不确定性和计算成本的全局能量泛函来选择单个专家。   我们在专业领域(BioASQ)和开放域基准(WikiText-103)上验证了StructuredDNA。在BioASQ(K=50)上,我们实现了97.7%的能量利用密度(EUD)降低,语义稳定性指数(SSI)达到0.998。我们进一步在WikiText-103上展示了一条语义缩放定律,表明该架构通过扩展专家粒度(K=2048)可推广到开放域,同时保持超过99%的能量效率。因此,StructuredDNA为未来的稀疏计算框架确立了一个健壮、领域无关的范式。   StructuredDNA在生物物理原理与Transformer架构中的稀疏专家路由之间建立了明确联系,并指向未来的能量感知、模块化和可扩展计算系统。我们讨论了这项概念验证研究的局限性,并概述了将该方法扩展到更大模型、数据集和硬件平台的方向。StructuredDNA的实现可在https://github.com/InnoDeep-repos/StructuredDNA上获得。
摘要:The rapid scaling of large computational models has led to a critical increase in energy and compute costs. Inspired by biological systems where structure and function emerge from low-energy configurations, we introduce StructuredDNA, a sparse architecture framework for modular, energy-aware Transformer routing. StructuredDNA replaces dense Mixture-of-Experts routing with a bio-physical, energy-guided routing layer based on semantic energy minimization. Inputs are dynamically grouped into semantic codons, and routing selects a single expert by minimizing a global energy functional that combines cohesion, uncertainty, and computational cost.   We validate StructuredDNA on both specialized (BioASQ) and open-domain benchmarks (WikiText-103). On BioASQ (K = 50), we achieve a 97.7% reduction in Energy Utilization Density (EUD) and a Semantic Stability Index (SSI) of 0.998. We further demonstrate a Semantic Scaling Law on WikiText-103, showing that the architecture generalizes to open domains by scaling expert granularity (K = 2048) while maintaining more than 99% energy efficiency. StructuredDNA thus establishes a robust, domain-agnostic paradigm for future sparse computational frameworks.   StructuredDNA provides an explicit link between bio-physical principles and sparse expert routing in Transformer architectures, and points toward future energy-aware, modular, and scalable computational systems. We discuss limitations of this proof-of-concept study and outline directions for scaling the approach to larger models, datasets, and hardware platforms. The StructuredDNA implementation is available at https://github.com/InnoDeep-repos/StructuredDNA .
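摘要所述"最小化全局能量泛函来选择单个专家"可示意如下(E = a·(1−凝聚性) + b·不确定性 + c·成本 是我们的一种具体化假设;权重与各专家打分均为虚构):

```python
def route(experts, weights=(1.0, 1.0, 0.1)):
    """Select the single expert minimizing a toy energy functional
    E = a*(1 - cohesion) + b*uncertainty + c*cost."""
    a, b, c = weights
    def energy(e):
        return a * (1.0 - e["cohesion"]) + b * e["uncertainty"] + c * e["cost"]
    return min(experts, key=energy)["name"]

experts = [
    {"name": "bio", "cohesion": 0.9, "uncertainty": 0.1, "cost": 1.0},
    {"name": "gen", "cohesion": 0.3, "uncertainty": 0.4, "cost": 0.5},
]
```

权重c刻画对计算成本的敏感度:c增大时,路由会从语义最契合的专家转向更廉价的专家。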


【5】Transformers for Tabular Data: A Training Perspective of Self-Attention via Optimal Transport
标题:用于表格数据的Transformer:通过最优传输看自注意力的训练视角
链接:https://arxiv.org/abs/2512.09530

作者:Antonio Candelieri,Alessandro Quadrio
摘要:本论文通过最优传输(OT)的视角考察自注意力的训练,并提出一种基于OT的表格数据分类替代方法。研究跟踪了训练过程中自注意力层的中间投影,并使用离散OT指标(包括Wasserstein距离、Monge间隙、最优性和效率)评估其演化。实验在二类和三类分类任务以及一个生物医学数据集上进行。   结果表明,最终的自注意力映射往往逼近OT最优耦合,但训练轨迹仍然低效。在合成数据上预训练MLP部分可在一定程度上改善收敛性,但对其初始化敏感。为解决这些限制,我们引入一种基于OT的算法:它生成各类别的虚拟高斯分布,计算其与数据的OT对齐,并训练一个MLP来推广该映射。该方法取得了与Transformer相当的精度,同时降低了计算成本,并在标准化输入下更高效地扩展,尽管其性能依赖于精心的虚拟几何设计。所有实验与实现均在R中完成。
摘要:This thesis examines self-attention training through the lens of Optimal Transport (OT) and develops an OT-based alternative for tabular classification. The study tracks intermediate projections of the self-attention layer during training and evaluates their evolution using discrete OT metrics, including Wasserstein distance, Monge gap, optimality, and efficiency. Experiments are conducted on classification tasks with two and three classes, as well as on a biomedical dataset.   Results indicate that the final self-attention mapping often approximates the OT optimal coupling, yet the training trajectory remains inefficient. Pretraining the MLP section on synthetic data partially improves convergence but is sensitive to their initialization. To address these limitations, an OT-based algorithm is introduced: it generates class-specific dummy Gaussian distributions, computes an OT alignment with the data, and trains an MLP to generalize this mapping. The method achieves accuracy comparable to Transformers while reducing computational cost and scaling more efficiently under standardized inputs, though its performance depends on careful dummy-geometry design. All experiments and implementations are conducted in R.
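论文使用的离散OT指标中,一维经验分布之间的Wasserstein-1距离有一个简单闭式:在任何凸地面成本下,最优(Monge)耦合即按排序配对(这是通用事实,非论文代码):

```python
def wasserstein_1d(xs, ys):
    """W1 between two equal-size 1-D empirical samples: the optimal plan
    matches points in sorted order, so W1 is the mean sorted gap."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```

高维情形没有这种闭式,需要求解线性规划或熵正则(Sinkhorn)近似,这也是论文采用离散OT求解器的原因之一。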


【6】Impact of Positional Encoding: Clean and Adversarial Rademacher Complexity for Transformers under In-Context Regression
标题:位置编码的影响:上下文回归下Transformer的干净与对抗Rademacher复杂性
链接:https://arxiv.org/abs/2512.09275

作者:Weiyi He,Yue Xing
备注:25 pages, 3 figures
摘要:位置编码(PE)是Transformer的核心架构组件,但其对Transformer泛化性和鲁棒性的影响仍不清楚。在这项工作中,我们给出了上下文回归设定下单层Transformer的首个泛化分析,并明确考虑了一个完全可训练的PE模块。我们的结果表明,PE会系统性地扩大泛化差距。将分析扩展到对抗设定,我们推导了对抗Rademacher泛化界。我们发现,有PE与无PE模型之间的差距在攻击下被放大,说明PE放大了模型的脆弱性。我们的界通过模拟研究得到了实证验证。总之,这项工作为理解带PE的上下文学习(ICL)中的干净与对抗泛化建立了一个新框架。
摘要 :Positional encoding (PE) is a core architectural component of Transformers, yet its impact on the Transformer's generalization and robustness remains unclear. In this work, we provide the first generalization analysis for a single-layer Transformer under in-context regression that explicitly accounts for a completely trainable PE module. Our result shows that PE systematically enlarges the generalization gap. Extending to the adversarial setting, we derive the adversarial Rademacher generalization bound. We find that the gap between models with and without PE is magnified under attack, demonstrating that PE amplifies the vulnerability of models. Our bounds are empirically validated by a simulation study. Together, this work establishes a new framework for understanding the clean and adversarial generalization in ICL with PE.


GAN|对抗|攻击|生成相关(1篇)

【1】Membership and Dataset Inference Attacks on Large Audio Generative Models
标题:对大型音频生成模型的成员资格和数据集推理攻击
链接:https://arxiv.org/abs/2512.09654

作者:Jakub Proboszcz,Paweł Kochanski,Karol Korszun,Donato Crisostomi,Giorgio Strano,Emanuele Rodolà,Kamil Deja,Jan Dubinski
备注:NeurIPS 2025 AI for Music Workshop NeurIPS 2025 Workshop on Creativity & Generative AI
摘要:基于扩散和自回归架构的生成式音频模型在质量和表现力上都取得了快速进展。然而,这一进展引发了紧迫的版权问题,因为这些模型通常在海量的艺术与商业作品语料上训练。一个核心问题是,能否可靠地核实某位艺术家的素材是否被用于训练,从而为版权所有者提供保护其内容的手段。在这项工作中,我们通过对开源生成式音频模型的成员推断攻击(MIA)来研究这种核实的可行性,即尝试判断某个特定音频样本是否属于训练集。我们的实证结果表明,在大规模场景下仅靠成员推断的有效性有限,因为对于在大型多样化数据集上训练的模型,单个样本的成员信号很弱。然而,艺术家和媒体所有者通常持有作品集,而非孤立样本。基于文本和视觉领域的先前工作,我们在本文中聚焦于数据集推断(DI),它在多个样本上聚合多样的成员证据。我们发现DI在音频领域取得了成功,为评估某位艺术家的作品是否参与了模型训练提供了更实用的机制。我们的结果表明,在大型音频生成模型时代,DI是版权保护和数据集问责的一个有前景的方向。
摘要:Generative audio models, based on diffusion and autoregressive architectures, have advanced rapidly in both quality and expressiveness. This progress, however, raises pressing copyright concerns, as such models are often trained on vast corpora of artistic and commercial works. A central question is whether one can reliably verify if an artist's material was included in training, thereby providing a means for copyright holders to protect their content. In this work, we investigate the feasibility of such verification through membership inference attacks (MIA) on open-source generative audio models, which attempt to determine whether a specific audio sample was part of the training set. Our empirical results show that membership inference alone is of limited effectiveness at scale, as the per-sample membership signal is weak for models trained on large and diverse datasets. However, artists and media owners typically hold collections of works rather than isolated samples. Building on prior work in text and vision domains, in this work we focus on dataset inference (DI), which aggregates diverse membership evidence across multiple samples. We find that DI is successful in the audio domain, offering a more practical mechanism for assessing whether an artist's works contributed to model training. Our results suggest DI as a promising direction for copyright protection and dataset accountability in the era of large audio generative models.
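"单样本信号弱、按作品集聚合后变强"这一点可用一个简单的z统计量示意(打分与参照集均为虚构;真实的DI检验更复杂):

```python
import math
from statistics import mean, pstdev

def dataset_inference_z(collection_scores, reference_scores):
    """z-score of a collection's mean membership score against scores of
    known non-member references; the sqrt(n) factor is what turns many weak
    per-sample signals into a strong collection-level signal."""
    mu0, sd0 = mean(reference_scores), pstdev(reference_scores)
    n = len(collection_scores)
    return (mean(collection_scores) - mu0) / (sd0 / math.sqrt(n))
```

同样强度的单样本信号,作品数量越多,聚合后的统计量越显著,这正是DI优于逐样本MIA的直观原因。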


半/弱/无/有监督|不确定性|主动学习(5篇)

【1】Self-Supervised Learning with Gaussian Processes
标题:基于高斯过程的自监督学习
链接:https://arxiv.org/abs/2512.09322

作者:Yunshan Duan,Sinead Williamson
摘要:自监督学习(SSL)是一种机器学习范式,模型无需来自带标注样本的显式监督即可学习理解数据的底层结构。SSL获得的表示已被证明对聚类、线性分类等许多下游任务有用。为保证表示空间的平滑性,大多数SSL方法依赖于为给定实例生成相似观测对的能力。然而,对许多类型的数据而言,生成这类样本对可能颇具挑战。此外,这些方法缺乏对不确定性量化的考虑,在样本外预测场景中可能表现不佳。为解决这些限制,我们提出高斯过程自监督学习(GPSSL),一种将高斯过程(GP)模型用于表示学习的新方法。我们在表示上施加GP先验,并通过最小化一个鼓励信息性表示的损失函数得到广义贝叶斯后验。GP固有的协方差函数自然地将相似单元的表示拉到一起,从而替代显式定义正样本的做法。我们证明GPSSL与核PCA以及流行的基于神经网络的SSL方法VICReg密切相关,但与二者不同,它允许可传播到下游任务的后验不确定性。在多个数据集上针对分类和回归任务的实验表明,GPSSL在准确率、不确定性量化和误差控制方面优于传统方法。
摘要:Self-supervised learning (SSL) is a machine learning paradigm where models learn to understand the underlying structure of data without explicit supervision from labeled samples. The representations acquired through SSL have proven useful for many downstream tasks, including clustering and linear classification. To ensure smoothness of the representation space, most SSL methods rely on the ability to generate pairs of observations that are similar to a given instance. However, generating these pairs may be challenging for many types of data. Moreover, these methods lack consideration of uncertainty quantification and can perform poorly in out-of-sample prediction settings. To address these limitations, we propose Gaussian process self-supervised learning (GPSSL), a novel approach that utilizes Gaussian process (GP) models for representation learning. GP priors are imposed on the representations, and we obtain a generalized Bayesian posterior by minimizing a loss function that encourages informative representations. The covariance function inherent in GPs naturally pulls representations of similar units together, serving as an alternative to using explicitly defined positive samples. We show that GPSSL is closely related to both kernel PCA and VICReg, a popular neural network-based SSL method, but unlike both allows for posterior uncertainties that can be propagated to downstream tasks. Experiments on various datasets, considering classification and regression tasks, demonstrate that GPSSL outperforms traditional methods in terms of accuracy, uncertainty quantification, and error control.
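The abstract's link between GPSSL and kernel PCA can be made concrete with a minimal sketch (the RBF covariance, two-cluster toy data, and all hyperparameters are illustrative assumptions, not the paper's setup): a GP-style covariance on centered data yields representations in which similar units are pulled together without any explicitly defined positive pairs.

```python
import numpy as np

def kernel_pca(X, n_components=2, lengthscale=1.0):
    """Kernel PCA with an RBF (GP-style) covariance -- the classical method
    the paper relates GPSSL to.  The covariance pulls similar inputs toward
    nearby representations, playing the role of explicit positive samples."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * lengthscale ** 2))
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n            # double-centering matrix
    Kc = H @ K @ H
    eigvals, eigvecs = np.linalg.eigh(Kc)          # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, idx] * np.sqrt(np.clip(eigvals[idx], 0, None))

rng = np.random.default_rng(0)
# Two well-separated clusters; similar units end up close in the embedding.
X = np.vstack([rng.normal(0, 0.3, (20, 5)), rng.normal(3, 0.3, (20, 5))])
Z = kernel_pca(X, n_components=2, lengthscale=2.0)
```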


【2】Contrastive Learning for Semi-Supervised Deep Regression with Generalized Ordinal Rankings from Spectral Seriation
标题:基于谱序列的半监督广义有序排序深度回归的对比学习
链接:https://arxiv.org/abs/2512.09267

作者:Ce Wang,Weihang Dai,Hanru Bai,Xiaomeng Li
摘要:对比学习方法在特征空间中加强标签距离关系,以提高回归模型的表示能力。然而,这些方法高度依赖于标签信息来正确地恢复特征的顺序关系,限制了它们在半监督回归中的应用。在这项工作中,我们扩展了对比回归方法,允许在半监督设置中使用未标记的数据,从而减少了对昂贵的注释的依赖。特别地,我们构造的特征相似性矩阵与标记和未标记的样本在一个小批量,以反映样本间的关系,和一个准确的顺序排序所涉及的未标记的样本可以通过光谱序列化算法恢复,如果误差的水平是在一定的范围内。上面的标记样本的引入提供了有序排序的正则化,并从地面真实标签信息中得到指导,使排序更加可靠。为了减少特征扰动,我们进一步利用动态规划算法来选择用于矩阵构造的鲁棒特征。然后将恢复的顺序关系用于未标记样本的对比学习,从而允许更多数据用于特征表示学习,从而获得更稳健的结果。顺序排序也可以用来监督对未标记样本的预测,作为额外的训练信号。我们通过在各种数据集上的实验提供了理论保证和经验验证,证明我们的方法可以超越现有的最先进的半监督深度回归方法。我们的代码已经发布在https://github.com/xmed-lab/CLSS上。
摘要:Contrastive learning methods enforce label distance relationships in feature space to improve representation capability for regression models. However, these methods highly depend on label information to correctly recover ordinal relationships of features, limiting their applications to semi-supervised regression. In this work, we extend contrastive regression methods to allow unlabeled data to be used in the semi-supervised setting, thereby reducing the dependence on costly annotations. In particular, we construct the feature similarity matrix with both labeled and unlabeled samples in a mini-batch to reflect inter-sample relationships, and an accurate ordinal ranking of the involved unlabeled samples can be recovered through spectral seriation algorithms if the level of error is within certain bounds. The labeled samples introduced above provide regularization of the ordinal ranking with guidance from the ground-truth label information, making the ranking more reliable. To reduce feature perturbations, we further utilize a dynamic programming algorithm to select robust features for the matrix construction. The recovered ordinal relationship is then used for contrastive learning on unlabeled samples, allowing more data to be used for feature representation learning and thereby achieving more robust results. The ordinal rankings can also be used to supervise predictions on unlabeled samples, serving as an additional training signal. We provide theoretical guarantees and empirical verification through experiments on various datasets, demonstrating that our method can surpass existing state-of-the-art semi-supervised deep regression methods. Our code has been released at https://github.com/xmed-lab/CLSS.
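The seriation step can be sketched in a few lines (the paper's full pipeline, including the dynamic-programming feature selection, is not reproduced; the Robinson-structured toy matrix is an illustrative assumption): sorting the Fiedler vector of the graph Laplacian recovers the ordinal ranking, up to reversal, when similarity decays with rank distance.

```python
import numpy as np

def spectral_seriation(similarity):
    """Recover a 1-D ordering from a symmetric similarity matrix by sorting
    the Fiedler vector: the eigenvector of the unnormalized graph Laplacian
    with the second-smallest eigenvalue.  For (noisy) Robinson matrices --
    similarity decaying with rank distance -- this recovers the serial order
    up to reversal, since the eigenvector's sign is arbitrary."""
    degree = np.diag(similarity.sum(axis=1))
    laplacian = degree - similarity
    _, eigvecs = np.linalg.eigh(laplacian)  # columns sorted by eigenvalue
    return np.argsort(eigvecs[:, 1])

# Latent ranks 0..7; similarity decays exponentially with rank distance.
ranks = np.arange(8)
sim = np.exp(-np.abs(ranks[:, None] - ranks[None, :]) / 4.0)
order = spectral_seriation(sim)
```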


【3】Learning Robust Representations for Malicious Content Detection via Contrastive Sampling and Uncertainty Estimation
标题:通过对比采样和不确定性估计学习用于恶意内容检测的鲁棒表示
链接:https://arxiv.org/abs/2512.08969

作者:Elias Hossain,Umesh Biswas,Charan Gudla,Sai Phani Parsa
摘要:我们提出了不确定性对比框架(UCF),这是一种正未标记(PU)表示学习框架,它集成了不确定性感知对比损失,自适应温度缩放和自我注意引导的LSTM编码器,以改善噪声和不平衡条件下的分类。UCF基于样本置信度动态调整对比权重,使用正锚点稳定训练,并使温度参数适应批次级别的可变性。应用于恶意内容分类,UCF生成的嵌入使多个传统分类器能够实现超过93.38%的准确率,高于0.93的精度和近乎完美的召回率,同时具有最小的假阴性和竞争力的ROC-AUC分数。视觉分析证实了正面和未标记实例之间的明确分离,突出了框架产生校准的、有区别的嵌入的能力。这些结果将UCF定位为网络安全和生物医学文本挖掘等高风险领域PU学习的强大且可扩展的解决方案。
摘要:We propose the Uncertainty Contrastive Framework (UCF), a Positive-Unlabeled (PU) representation learning framework that integrates uncertainty-aware contrastive loss, adaptive temperature scaling, and a self-attention-guided LSTM encoder to improve classification under noisy and imbalanced conditions. UCF dynamically adjusts contrastive weighting based on sample confidence, stabilizes training using positive anchors, and adapts temperature parameters to batch-level variability. Applied to malicious content classification, UCF-generated embeddings enable multiple traditional classifiers to achieve more than 93.38% accuracy, precision above 0.93, and near-perfect recall, with minimal false negatives and competitive ROC-AUC scores. Visual analyses confirm clear separation between positive and unlabeled instances, highlighting the framework's ability to produce calibrated, discriminative embeddings. These results position UCF as a robust and scalable solution for PU learning in high-stakes domains such as cybersecurity and biomedical text mining.
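UCF's exact loss is not given in the abstract; the following is a generic sketch of two of the named ingredients — confidence-based contrastive weighting and batch-adaptive temperature — under illustrative assumptions (NumPy, a supervised-contrastive form, positives defined as labeled-positive pairs):

```python
import numpy as np

def uncertainty_contrastive_loss(z, labels, confidence, base_temp=0.5):
    """Sketch of a confidence-weighted contrastive loss for PU data.

    z: (n, d) L2-normalized embeddings; labels: 1 = positive, 0 = unlabeled;
    confidence: per-sample weights in [0, 1].  The temperature adapts to the
    batch's similarity spread, and each anchor's contrastive term is scaled
    by its confidence so that noisy samples pull less on the representation."""
    sims = z @ z.T
    temp = base_temp * (1.0 + sims.std())     # batch-adaptive temperature
    logits = sims / temp
    np.fill_diagonal(logits, -np.inf)         # exclude self-pairs
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == 1) & (labels[None, :] == 1)
    np.fill_diagonal(pos, False)              # positive anchor pairs only
    anchors = pos.any(axis=1)
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1).clip(min=1)
    return float((confidence * per_anchor)[anchors].mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
z /= np.linalg.norm(z, axis=1, keepdims=True)
labels = np.array([1, 1, 1, 0, 0, 0, 0, 0])
confidence = np.array([0.9, 0.8, 1.0, 0.5, 0.5, 0.5, 0.5, 0.5])
loss = uncertainty_contrastive_loss(z, labels, confidence)
```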


【4】Supervised learning pays attention
标题:监督学习学会关注
链接:https://arxiv.org/abs/2512.09912

作者:Erin Craig,Robert Tibshirani
摘要:带注意力的上下文学习使大型神经网络能够通过选择性地关注相关示例来进行特定于上下文的预测。在这里,我们将这个想法应用于监督学习过程,例如表格数据的套索回归和梯度提升。我们的目标是(1)为每个预测点灵活地拟合个性化模型,(2)保持模型的简单性和可解释性。   我们的方法通过根据注意力对训练数据进行加权来为每个测试观察结果拟合局部模型,注意力是一种监督的相似性度量,强调预测结果的特征和交互。注意力加权允许该方法以数据驱动的方式适应异构数据,而不需要聚类或相似性预先指定。此外,我们的方法是唯一可解释的:对于每个测试观察,我们确定哪些特征最具预测性,哪些训练观察最相关。然后,我们展示了如何对时间序列和空间数据使用注意力加权,并提出了一种使用注意力加权残差校正使预训练的基于树的模型适应分布偏移的方法。在真实和模拟数据集上,注意力加权在保持可解释性的同时提高了预测性能,理论表明,在具有已知子组结构的混合模型数据生成过程中,注意力加权线性模型的均方误差比标准线性模型低。
摘要:In-context learning with attention enables large neural networks to make context-specific predictions by selectively focusing on relevant examples. Here, we adapt this idea to supervised learning procedures such as lasso regression and gradient boosting, for tabular data. Our goals are to (1) flexibly fit personalized models for each prediction point and (2) retain model simplicity and interpretability. Our method fits a local model for each test observation by weighting the training data according to attention, a supervised similarity measure that emphasizes features and interactions that are predictive of the outcome. Attention weighting allows the method to adapt to heterogeneous data in a data-driven way, without requiring cluster or similarity pre-specification. Further, our approach is uniquely interpretable: for each test observation, we identify which features are most predictive and which training observations are most relevant. We then show how to use attention weighting for time series and spatial data, and we present a method for adapting pretrained tree-based models to distributional shift using attention-weighted residual corrections. Across real and simulated datasets, attention weighting improves predictive performance while preserving interpretability, and theory shows that attention-weighted linear models attain lower mean squared error than the standard linear model under mixture-of-models data-generating processes with known subgroup structure.
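The core recipe — weight the training set by a supervised similarity to the test point, then fit a simple local model — can be sketched as follows (the paper's attention measure is learned from the outcome; here a fixed feature-importance vector and a Gaussian kernel stand in as illustrative assumptions):

```python
import numpy as np

def attention_weights(X_train, x_test, importances, bandwidth=1.0):
    """Supervised similarity: distances are computed in a feature space
    scaled by outcome-relevance (`importances` is a stand-in for the paper's
    learned attention), then turned into normalized Gaussian kernel weights."""
    diff = (X_train - x_test) * importances
    w = np.exp(-(diff ** 2).sum(axis=1) / (2 * bandwidth ** 2))
    return w / w.sum()

def local_ridge_predict(X_train, y_train, x_test, importances, lam=1e-3):
    """Fit a personalized ridge model per test point; the returned weights
    double as an explanation of which training observations mattered."""
    w = attention_weights(X_train, x_test, importances)
    A = (X_train * w[:, None]).T @ X_train + lam * np.eye(X_train.shape[1])
    beta = np.linalg.solve(A, (X_train * w[:, None]).T @ y_train)
    return x_test @ beta, w

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
# Heterogeneous data: the slope of feature 0 flips sign with feature 1.
y = np.where(X[:, 1] > 0, 2.0, -2.0) * X[:, 0]
# Attention emphasizes feature 1, which determines the local regime;
# a global linear fit would average the two slopes out to roughly zero.
pred, w = local_ridge_predict(X, y, np.array([1.0, 1.0]), np.array([0.1, 3.0]))
```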


【5】PathCo-LatticE: Pathology-Constrained Lattice-Of Experts Framework for Fully-supervised Few-Shot Cardiac MRI Segmentation
标题:PathCo-LatticE:用于全监督少样本心脏MRI分割的病理约束专家网格框架
链接:https://arxiv.org/abs/2512.09779

作者:Mohamed Elbayumi,Mohammed S. M. Elbaz
摘要:Few-Shot学习(FSL)缓解了心脏MRI分割中的数据稀缺性,但通常依赖于对域偏移和验证偏差敏感的半监督技术,限制了zero-shot的泛化能力。我们提出了PathCo-LatticE,这是一个完全监督的FSL框架,它用病理指导的合成监督代替了未标记的数据。首先,我们的虚拟患者引擎从稀疏的临床锚点建模连续的潜在疾病轨迹,使用生成建模来合成生理上合理的、完全标记的3D队列。其次,自增强交错验证(SIV)提供了一个无泄漏的协议,可以在线评估具有渐进挑战性的合成样本的模型,从而消除了对真实验证数据的需求。最后,动态专家网格(LoE)在病理感知拓扑中组织专门的网络,并根据输入激活最相关的专家,从而在无需目标域微调的情况下对看不见的数据进行鲁棒的zero-shot概括。我们在严格的分布外(OOD)设置中评估了PathCo-LatticE,从单源域(ACDC)中导出所有锚点和严重性统计数据,并对多中心、多供应商的M&Ms数据集进行了zero-shot测试。PathCo-LatticE仅从7个标记锚点开始,就以4.2-11%的Dice优于四种最先进的FSL方法,并且仅用19个标记锚点就接近完全监督性能(在1% Dice内)。该方法在四个供应商之间显示出优越的协调性,并推广到看不见的病理。[代码将公开发布]。
摘要:Few-shot learning (FSL) mitigates data scarcity in cardiac MRI segmentation but typically relies on semi-supervised techniques sensitive to domain shifts and validation bias, restricting zero-shot generalizability. We propose PathCo-LatticE, a fully supervised FSL framework that replaces unlabeled data with pathology-guided synthetic supervision. First, our Virtual Patient Engine models continuous latent disease trajectories from sparse clinical anchors, using generative modeling to synthesize physiologically plausible, fully labeled 3D cohorts. Second, Self-Reinforcing Interleaved Validation (SIV) provides a leakage-free protocol that evaluates models online with progressively challenging synthetic samples, eliminating the need for real validation data. Finally, a dynamic Lattice-of-Experts (LoE) organizes specialized networks within a pathology-aware topology and activates the most relevant experts per input, enabling robust zero-shot generalization to unseen data without target-domain fine-tuning. We evaluated PathCo-LatticE in a strict out-of-distribution (OOD) setting, deriving all anchors and severity statistics from a single-source domain (ACDC) and performing zero-shot testing on the multi-center, multi-vendor M&Ms dataset. PathCo-LatticE outperforms four state-of-the-art FSL methods by 4.2-11% Dice starting from only 7 labeled anchors, and approaches fully supervised performance (within 1% Dice) with only 19 labeled anchors. The method shows superior harmonization across four vendors and generalization to unseen pathologies. [Code will be made publicly available].


迁移|Zero/Few/One-Shot|自适应(1篇)

【1】Knowledge Diversion for Efficient Morphology Control and Policy Transfer
标题:用于高效形态控制与策略迁移的知识分流
链接:https://arxiv.org/abs/2512.09796

作者:Fu Feng,Ruixiao Shi,Yucheng Xie,Jianlu Shen,Jing Wang,Xin Geng
摘要:通用形态控制旨在学习一种通用策略,该策略可以在异构代理形态中推广,基于transformer的控制器成为一种流行的选择。然而,这样的架构会产生大量的计算成本,导致高部署开销,并且现有的方法表现出有限的跨任务泛化,需要从头开始为每个新任务进行训练。为此,我们提出了\textbf{DivMorph},一个模块化的训练范式,利用知识转移来学习可分解的控制器。DivMorph在训练之前通过SVD将随机初始化的Transformer权重分解为因子单元,并采用动态软门来基于任务和形态嵌入调制这些单元,将它们分离为共享的\textit{learngenes}和形态和任务特定的\textit{tailors},从而实现知识解纠缠。通过有选择地激活相关组件,DivMorph实现了可扩展和高效的策略部署,同时支持有效的策略转移到新的任务。大量的实验表明,DivMorph实现了最先进的性能,实现了3$\times$的样本效率比直接微调跨任务传输和17$\times$的模型大小减少单代理部署。
摘要:Universal morphology control aims to learn a universal policy that generalizes across heterogeneous agent morphologies, with Transformer-based controllers emerging as a popular choice. However, such architectures incur substantial computational costs, resulting in high deployment overhead, and existing methods exhibit limited cross-task generalization, necessitating training from scratch for each new task. To this end, we propose \textbf{DivMorph}, a modular training paradigm that leverages knowledge diversion to learn decomposable controllers. DivMorph factorizes randomly initialized Transformer weights into factor units via SVD prior to training and employs dynamic soft gating to modulate these units based on task and morphology embeddings, separating them into shared \textit{learngenes} and morphology- and task-specific \textit{tailors}, thereby achieving knowledge disentanglement. By selectively activating relevant components, DivMorph enables scalable and efficient policy deployment while supporting effective policy transfer to novel tasks. Extensive experiments demonstrate that DivMorph achieves state-of-the-art performance, achieving a 3$\times$ improvement in sample efficiency over direct finetuning for cross-task transfer and a 17$\times$ reduction in model size for single-agent deployment.
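The SVD-based factorization into gated factor units can be sketched as follows (the gate parameterization, unit count, and embedding size are illustrative assumptions; DivMorph's actual learngene/tailor separation emerges during training):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_units = 16, 16, 8

# Factorize a randomly initialized weight matrix into rank-1 "factor units"
# via SVD, prior to any training.
W0 = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)
U, s, Vt = np.linalg.svd(W0, full_matrices=False)
units = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(n_units)]

def gated_weight(task_embedding, gate_matrix):
    """Recombine the factor units under a soft gate (sigmoid of a linear map
    of the task/morphology embedding); gates near 1 keep a unit active, so
    different embeddings activate different subsets of units."""
    gates = 1.0 / (1.0 + np.exp(-gate_matrix @ task_embedding))
    return sum(g * u for g, u in zip(gates, units))

gate_matrix = rng.normal(size=(n_units, 4))
W_task = gated_weight(rng.normal(size=4), gate_matrix)
# With all gates fully open, the units sum back to the rank-8 truncation of W0.
W_full = sum(units)
```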


强化学习(3篇)

【1】STACHE: Local Black-Box Explanations for Reinforcement Learning Policies
标题:STACHE:强化学习策略的局部黑盒解释
链接:https://arxiv.org/abs/2512.09909

作者:Andrew Elashkin,Orna Grumberg
摘要:强化学习智能体在稀疏奖励或安全关键环境中经常表现出意外行为,这就迫切需要可靠的调试和验证工具。在本文中,我们提出了STACHE,一个在离散马尔可夫博弈中为智能体的具体动作生成局部黑盒解释的综合框架。我们的方法产生由两个互补部分组成的复合解释:(1)鲁棒性区域,即智能体动作保持不变的状态的连通邻域,以及(2)最小反事实,即改变该决定所需的最小状态扰动。通过利用因子化状态空间的结构,我们引入了一种精确的、基于搜索的算法,绕过代理模型的保真度差距。在Gymnasium环境上的实证验证表明,我们的框架不仅解释了策略动作,还有效地捕捉了训练期间策略逻辑的演变——从不稳定、无规律的行为到经过优化的稳健策略——为智能体敏感性和决策边界提供了可操作的见解。
摘要:Reinforcement learning agents often behave unexpectedly in sparse-reward or safety-critical environments, creating a strong need for reliable debugging and verification tools. In this paper, we propose STACHE, a comprehensive framework for generating local, black-box explanations for an agent's specific action within discrete Markov games. Our method produces a Composite Explanation consisting of two complementary components: (1) a Robustness Region, the connected neighborhood of states where the agent's action remains invariant, and (2) Minimal Counterfactuals, the smallest state perturbations required to alter that decision. By exploiting the structure of factored state spaces, we introduce an exact, search-based algorithm that circumvents the fidelity gaps of surrogate models. Empirical validation on Gymnasium environments demonstrates that our framework not only explains policy actions, but also effectively captures the evolution of policy logic during training - from erratic, unstable behavior to optimized, robust strategies - providing actionable insights into agent sensitivity and decision boundaries.
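The two components of the composite explanation can be sketched with a breadth-first search over one-factor perturbations of a factored state (the toy grid policy and the one-factor neighborhood are illustrative assumptions; STACHE's exact search and minimality criterion may differ):

```python
from collections import deque

def robustness_region(state, policy, factor_values):
    """BFS over one-factor-at-a-time perturbations of a factored state.

    Returns (region, counterfactuals): the connected set of states sharing
    the policy's action at `state`, and the boundary states (one factor
    change away from the region) where that action first flips.
    `factor_values[i]` lists the legal values of factor i; `policy` maps a
    state tuple to an action."""
    base_action = policy(state)
    region, counterfactuals = {state}, set()
    queue = deque([state])
    while queue:
        s = queue.popleft()
        for i, values in enumerate(factor_values):
            for v in values:
                if v == s[i]:
                    continue
                nxt = s[:i] + (v,) + s[i + 1:]
                if nxt in region or nxt in counterfactuals:
                    continue
                if policy(nxt) == base_action:
                    region.add(nxt)
                    queue.append(nxt)
                else:
                    counterfactuals.add(nxt)
    return region, counterfactuals

# Toy policy on a 5x5 grid: move "right" until column 3, then "down".
policy = lambda s: "right" if s[1] < 3 else "down"
region, cf = robustness_region((0, 0), policy, [range(5), range(5)])
```

Here the region is the 15 states with column < 3, and the counterfactuals are the 10 boundary states where the action flips to "down".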


【2】Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning
标题:通过强化学习训练单一模型掌握跨层级智能体动作
链接:https://arxiv.org/abs/2512.09706

作者:Kaichen He,Zihao Wang,Muyao Li,Anji Liu,Yitao Liang
摘要:人工智能的范式正在从设计复杂的工作流程转向训练后的原生模型。然而,现有的代理通常局限于静态的、预定义的操作空间--例如专门使用API、GUI事件或机器人命令。这种刚性限制了它们在动态环境中的适应性,在动态环境中,交互的最佳粒度根据上下文而变化。为了弥合这一差距,我们提出了CrossAgent,一个统一的代理模型,掌握异构的动作空间,并自主选择最有效的接口的轨迹的每一步。我们引入了一个全面的训练管道,它集成了冷启动监督微调与多轮组相对策略优化(GRPO)算法。这种方法使代理能够学习自适应动作切换-平衡高级别的效率与低级别的精度-没有人类指定的规则。在开放世界Minecraft环境中对800多个任务进行的广泛实验表明,CrossAgent实现了最先进的性能。通过动态地利用不同动作空间的优势,我们的模型显著优于固定动作基线,在长时间推理中表现出卓越的泛化能力和效率。所有代码和模型可在https://github.com/CraftJarvis/OpenHA上获得
摘要:The paradigm of agentic AI is shifting from engineered complex workflows to post-training native models. However, existing agents are typically confined to static, predefined action spaces--such as exclusively using APIs, GUI events, or robotic commands. This rigidity limits their adaptability in dynamic environments where the optimal granularity of interaction varies contextually. To bridge this gap, we propose CrossAgent, a unified agentic model that masters heterogeneous action spaces and autonomously selects the most effective interface for each step of a trajectory. We introduce a comprehensive training pipeline that integrates cold-start supervised fine-tuning with a Multi-Turn Group Relative Policy Optimization (GRPO) algorithm. This approach enables the agent to learn adaptive action switching--balancing high-level efficiency with low-level precision--without human-specified rules. Extensive experiments on over 800 tasks in the open-world Minecraft environment demonstrate that CrossAgent achieves state-of-the-art performance. By dynamically leveraging the strengths of diverse action spaces, our model significantly outperforms fixed-action baselines, exhibiting superior generalization and efficiency in long-horizon reasoning. All code and models are available at https://github.com/CraftJarvis/OpenHA


【3】Robustness and Adaptability of Reinforcement Learning based Cooperative Autonomous Driving in Mixed-autonomy Traffic
标题:混合自主交通中基于强化学习的协作自主驾驶的鲁棒性和适应性
链接:https://arxiv.org/abs/2202.00881

作者:Rodolfo Valiente,Behrad Toghi,Ramtin Pedarsani,Yaser P. Fallah
摘要:制造自动驾驶汽车(AV)是一个复杂的问题,但使它们能够在被人类驾驶车辆(HV)包围的现实世界中运行则极具挑战性。先前的工作已经显示了在遵循社会效用的一组AV之间建立智能体间合作的可能性。这种利他的AV可以形成联盟并影响HV的行为,以实现社会期望的结果。我们确定了AV和HV共存中的两个主要挑战。首先,给定人类驾驶员的社会偏好和个人特质(例如无私程度和攻击性)对AV而言是未知的,并且在短暂的AV-HV交互期间几乎不可能实时推断它们。其次,与预期遵循策略的AV相反,HV不一定遵循固定的策略,因此非常难以预测。为了缓解上述挑战,我们将混合自主问题表述为多智能体强化学习(MARL)问题,并提出了一个用于训练合作AV的分散框架和奖励函数。我们的方法使AV能够从经验中隐式地学习HV的决策,在优先考虑安全性并保持适应性的同时优化社会效用;使利他的AV对不同的人类行为具有鲁棒性,并将其约束在安全的动作空间内。最后,我们研究了AV对各种HV行为特征的鲁棒性、安全性和敏感性,并给出了AV能够学习适应不同情况的合作策略的设置。
摘要:Building autonomous vehicles (AVs) is a complex problem, but enabling them to operate in the real world where they will be surrounded by human-driven vehicles (HVs) is extremely challenging. Prior works have shown the possibilities of creating inter-agent cooperation between a group of AVs that follow a social utility. Such altruistic AVs can form alliances and affect the behavior of HVs to achieve socially desirable outcomes. We identify two major challenges in the co-existence of AVs and HVs. First, social preferences and individual traits of a given human driver, e.g., selflessness and aggressiveness are unknown to an AV, and it is almost impossible to infer them in real-time during a short AV-HV interaction. Second, contrary to AVs that are expected to follow a policy, HVs do not necessarily follow a stationary policy and therefore are extremely hard to predict. To alleviate the above-mentioned challenges, we formulate the mixed-autonomy problem as a multi-agent reinforcement learning (MARL) problem and propose a decentralized framework and reward function for training cooperative AVs. Our approach enables AVs to learn the decision-making of HVs implicitly from experience, optimizes for a social utility while prioritizing safety and allowing adaptability; robustifying altruistic AVs to different human behaviors and constraining them to a safe action space. Finally, we investigate the robustness, safety and sensitivity of AVs to various HVs behavioral traits and present the settings in which the AVs can learn cooperative policies that are adaptable to different situations.


元学习(1篇)

【1】Meta-learning three-factor plasticity rules for structured credit assignment with sparse feedback
标题:稀疏反馈结构化信用分配的元学习三因素可塑性规则
链接:https://arxiv.org/abs/2512.09366

作者:Dimitra Maoutsa
备注:10 pages, 2 figures; accepted & presented at NeurIPS 2025 workshop Symmetry and Geometry in Neural Representations (NeurReps)
摘要:生物神经网络使用局部突触可塑性从稀疏的延迟反馈中学习复杂的行为,但实现结构化信用分配的机制仍然难以捉摸。相比之下,解决类似任务的人工递归网络通常依赖于生物学上难以置信的全局学习规则或手工制作的局部更新。能够支持延迟强化学习的局部可塑性规则空间在很大程度上尚未探索。在这里,我们提出了一个元学习框架,发现局部学习规则的结构化信用分配的经常性网络训练稀疏反馈。我们的方法在任务执行过程中交错局部neo-Hebbian类更新,外部循环通过\textbf{tangent-propagation through learning}优化塑性参数。由此产生的三因素学习规则,使长时间尺度的信用分配只使用本地信息和延迟奖励,提供了新的见解生物接地机制,在循环回路的学习。
摘要:Biological neural networks learn complex behaviors from sparse, delayed feedback using local synaptic plasticity, yet the mechanisms enabling structured credit assignment remain elusive. In contrast, artificial recurrent networks solving similar tasks typically rely on biologically implausible global learning rules or hand-crafted local updates. The space of local plasticity rules capable of supporting learning from delayed reinforcement remains largely unexplored. Here, we present a meta-learning framework that discovers local learning rules for structured credit assignment in recurrent networks trained with sparse feedback. Our approach interleaves local neo-Hebbian-like updates during task execution with an outer loop that optimizes plasticity parameters via \textbf{tangent-propagation through learning}. The resulting three-factor learning rules enable long-timescale credit assignment using only local information and delayed rewards, offering new insights into biologically grounded mechanisms for learning in recurrent circuits.
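A generic three-factor rule of the kind being meta-learned can be sketched in a few lines (the decay, learning rate, and rank-1 Hebbian term are illustrative assumptions; the paper's rules are parameterized and optimized in the outer loop, not hand-set):

```python
import numpy as np

def three_factor_update(w, pre, post, reward, trace, eta=0.1, decay=0.9):
    """One step of a generic three-factor plasticity rule (illustrative).

    A local Hebbian coincidence (post x pre) accumulates into a decaying
    eligibility trace every step; the weight only changes when a (possibly
    delayed) scalar reward arrives as the third factor."""
    trace = decay * trace + np.outer(post, pre)   # local, runs every step
    w = w + eta * reward * trace                  # gated by delayed reward
    return w, trace

rng = np.random.default_rng(0)
n_pre, n_post = 4, 3
w = np.zeros((n_post, n_pre))
trace = np.zeros_like(w)
# Reward arrives only on the final step, yet earlier pre/post coincidences
# still receive credit through the eligibility trace.
for t in range(10):
    pre, post = rng.random(n_pre), rng.random(n_post)
    reward = 1.0 if t == 9 else 0.0
    w, trace = three_factor_update(w, pre, post, reward, trace)
```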


符号|符号学习(1篇)

【1】Visual Categorization Across Minds and Models: Cognitive Analysis of Human Labeling and Neuro-Symbolic Integration
标题:跨思维和模型的视觉分类:人类标签和神经符号整合的认知分析
链接:https://arxiv.org/abs/2512.09340

作者:Chethana Prasad Kabgere
备注:12 pages, 3 figures. Research manuscript based on the final project for CS6795 (Introduction to Cognitive Science), Georgia Tech
摘要:了解人类和人工智能系统如何解释模糊的视觉刺激,可以提供对感知,推理和决策本质的重要见解。本文研究了人类参与者和深度神经网络的图像标记性能,重点关注低分辨率,感知退化的刺激。从计算认知科学,认知架构和连接主义-符号混合模型中,我们将人类的策略,如类比推理,基于形状的识别和信心调制与人工智能的基于特征的处理进行了对比。基于马尔的三层次假设,西蒙的有限理性,和萨加德的框架的表示和情感,我们分析参与者的反应有关的Grad-CAM可视化模型的注意力。通过ACT-R和Soar模型中的认知原则进一步解释人类行为,揭示不确定性下的分层和启发式决策策略。我们的研究结果突出了生物和人工系统在表示,推理和置信度校准方面的关键相似之处和分歧。分析激励未来的神经符号架构,统一结构化的符号推理与连接表示。这种架构以体现性、可解释性和认知对齐原则为基础,提供了一条通往人工智能系统的道路,这些系统不仅具有性能,而且具有可解释性和认知基础。
摘要:Understanding how humans and AI systems interpret ambiguous visual stimuli offers critical insight into the nature of perception, reasoning, and decision-making. This paper examines image labeling performance across human participants and deep neural networks, focusing on low-resolution, perceptually degraded stimuli. Drawing from computational cognitive science, cognitive architectures, and connectionist-symbolic hybrid models, we contrast human strategies such as analogical reasoning, shape-based recognition, and confidence modulation with AI's feature-based processing. Grounded in Marr's tri-level hypothesis, Simon's bounded rationality, and Thagard's frameworks of representation and emotion, we analyze participant responses in relation to Grad-CAM visualizations of model attention. Human behavior is further interpreted through cognitive principles modeled in ACT-R and Soar, revealing layered and heuristic decision strategies under uncertainty. Our findings highlight key parallels and divergences between biological and artificial systems in representation, inference, and confidence calibration. The analysis motivates future neuro-symbolic architectures that unify structured symbolic reasoning with connectionist representations. Such architectures, informed by principles of embodiment, explainability, and cognitive alignment, offer a path toward AI systems that are not only performant but also interpretable and cognitively grounded.


医学相关(6篇)

【1】Towards Optimal Valve Prescription for Transcatheter Aortic Valve Replacement (TAVR) Surgery: A Machine Learning Approach
标题:经导管主动脉瓣置换术(TAVR)手术的最佳瓣膜处方:机器学习方法
链接:https://arxiv.org/abs/2512.09198

作者:Phevos Paschalidis,Vasiliki Stoumpou,Lisa Everest,Yu Ma,Talhat Azemi,Jawad Haider,Steven Zweibel,Eleftherios M. Protopapas,Jeff Mather,Maciej Tysarowski,George E. Sarris,Robert C. Hagberg,Howard L. Haronian,Dimitris Bertsimas
摘要:经导管主动脉瓣置换术(TAVR)已成为严重主动脉瓣狭窄(一种危及生命的心血管疾病)患者的微创治疗选择。多个经导管心脏瓣膜(THV)已被批准用于TAVR,但关于瓣膜类型处方的现行指南仍然是一个活跃的辩论话题。我们提出了一种数据驱动的临床支持工具,用于识别最佳瓣膜类型,目的是最大限度地降低永久性起搏器植入(PPI)的风险,这是一种主要的术后并发症。我们合成了一个新的数据集,结合了美国和希腊的患者人群,并整合了三个不同的数据源(患者人口统计数据,计算机断层扫描,超声心动图),同时协调每个国家的记录系统的差异。我们引入了叶级分析,以利用人口异质性,并避免基准对不确定的反事实的风险估计。最终的处方模型显示,与我们的内部美国人群和外部希腊验证队列中的当前标准治疗相比,PPI率分别降低了26%和16%。据我们所知,这项工作代表了TAVR中THV选择的第一个统一的个性化处方策略。
摘要 :Transcatheter Aortic Valve Replacement (TAVR) has emerged as a minimally invasive treatment option for patients with severe aortic stenosis, a life-threatening cardiovascular condition. Multiple transcatheter heart valves (THV) have been approved for use in TAVR, but current guidelines regarding valve type prescription remain an active topic of debate. We propose a data-driven clinical support tool to identify the optimal valve type with the objective of minimizing the risk of permanent pacemaker implantation (PPI), a predominant postoperative complication. We synthesize a novel dataset that combines U.S. and Greek patient populations and integrates three distinct data sources (patient demographics, computed tomography scans, echocardiograms) while harmonizing differences in each country's record system. We introduce a leaf-level analysis to leverage population heterogeneity and avoid benchmarking against uncertain counterfactual risk estimates. The final prescriptive model shows a reduction in PPI rates of 26% and 16% compared with the current standard of care in our internal U.S. population and external Greek validation cohort, respectively. To the best of our knowledge, this work represents the first unified, personalized prescription strategy for THV selection in TAVR.


【2】KD-OCT: Efficient Knowledge Distillation for Clinical-Grade Retinal OCT Classification
标题:KD-OCT:临床级视网膜OCT分类的高效知识蒸馏
链接:https://arxiv.org/abs/2512.09069

作者:Erfan Nourbakhsh,Nasrin Sanjari,Ali Nourbakhsh
备注:7 pages, 5 figures
摘要:年龄相关性黄斑变性(AMD)和脉络膜新生血管(CNV)相关疾病是全球视力丧失的主要原因,光学相干断层扫描(OCT)是早期检测和管理的基石。然而,在临床环境中部署像ConvNeXtV2-Large这样的最先进的深度学习模型受到其计算需求的阻碍。因此,需要开发在实现实时部署的同时保持高诊断性能的高效模型。在这项研究中,提出了一种新的知识蒸馏框架,称为KD-OCT,用于将通过高级增强、随机权重平均和焦点损失强化的高性能ConvNeXtV2-Large教师模型,压缩为一个轻量级的EfficientNet-B2学生模型,用于分类正常、玻璃疣和CNV病例。KD-OCT采用实时蒸馏,结合了平衡软教师知识迁移和硬真值监督的损失。所提出方法的有效性在努尔眼科医院(NEH)数据集上使用患者级交叉验证进行评估。实验结果表明,KD-OCT在效率-准确性平衡方面优于可比的多尺度或特征融合OCT分类器,以大幅减少的模型大小和推理时间实现了接近教师的性能。尽管经过压缩,学生模型仍超过了大多数现有框架,促进了AMD筛查的边缘部署。代码可在https://github.com/erfan-nourbakhsh/KD-OCT上获得。
摘要:Age-related macular degeneration (AMD) and choroidal neovascularization (CNV)-related conditions are leading causes of vision loss worldwide, with optical coherence tomography (OCT) serving as a cornerstone for early detection and management. However, deploying state-of-the-art deep learning models like ConvNeXtV2-Large in clinical settings is hindered by their computational demands. Therefore, it is desirable to develop efficient models that maintain high diagnostic performance while enabling real-time deployment. In this study, a novel knowledge distillation framework, termed KD-OCT, is proposed to compress a high-performance ConvNeXtV2-Large teacher model, enhanced with advanced augmentations, stochastic weight averaging, and focal loss, into a lightweight EfficientNet-B2 student for classifying normal, drusen, and CNV cases. KD-OCT employs real-time distillation with a combined loss balancing soft teacher knowledge transfer and hard ground-truth supervision. The effectiveness of the proposed method is evaluated on the Noor Eye Hospital (NEH) dataset using patient-level cross-validation. Experimental results demonstrate that KD-OCT outperforms comparable multi-scale or feature-fusion OCT classifiers in efficiency-accuracy balance, achieving near-teacher performance with substantial reductions in model size and inference time. Despite the compression, the student model exceeds most existing frameworks, facilitating edge deployment for AMD screening. Code is available at https://github.com/erfan-nourbakhsh/KD-OCT.
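The combined loss named in the abstract follows the standard Hinton-style distillation recipe; a minimal sketch (the temperature, weighting alpha, and toy logits are illustrative assumptions, not the paper's tuned values):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Distillation loss: alpha-weighted cross-entropy to the temperature-
    softened teacher (scaled by T^2 so gradient magnitudes stay comparable;
    this equals the KL term up to the constant teacher entropy) plus
    (1 - alpha) cross-entropy on the hard ground-truth labels."""
    p_t = softmax(teacher_logits, T)
    log_p_s_T = np.log(softmax(student_logits, T))
    soft = -(p_t * log_p_s_T).sum(axis=-1).mean() * (T ** 2)
    log_p_s = np.log(softmax(student_logits))
    hard = -log_p_s[np.arange(len(labels)), labels].mean()
    return alpha * soft + (1 - alpha) * hard

student = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.2]])
teacher = np.array([[3.0, 1.0, -2.0], [0.0, 2.5, 0.5]])
labels = np.array([0, 1])
loss = kd_loss(student, teacher, labels)
```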


【3】EEG-Bench: A Benchmark for EEG Foundation Models in Clinical Applications
标题:EEG-Bench:临床应用中EEG基础模型的基准
链接:https://arxiv.org/abs/2512.08959

作者:Ard Kastrati,Josua Bürki,Jonas Lauer,Cheng Xuan,Raffaele Iaquinto,Roger Wattenhofer
备注:Foundation Models for the Brain and Body (BrainBodyFM@NeurIPS)
摘要:我们引入了一个统一的基准框架,重点是评估基于EEG的基础模型在临床应用中。该基准涵盖14个公开可用的EEG数据集的11个定义明确的诊断任务,包括癫痫、精神分裂症、帕金森病、强迫症和轻度创伤性脑损伤。它具有最小的预处理,标准化的评估协议,并支持经典基线和现代基础模型的并排比较。我们的研究结果表明,虽然基础模型在某些情况下具有很强的性能,但更简单的模型通常仍然具有竞争力,特别是在临床分布变化的情况下。为了促进可重复性和采用,我们以可访问和可扩展的格式发布所有准备好的数据和代码。
摘要:We introduce a unified benchmarking framework focused on evaluating EEG-based foundation models in clinical applications. The benchmark spans 11 well-defined diagnostic tasks across 14 publicly available EEG datasets, including epilepsy, schizophrenia, Parkinson's disease, OCD, and mild traumatic brain injury. It features minimal preprocessing, standardized evaluation protocols, and enables side-by-side comparisons of classical baselines and modern foundation models. Our results show that while foundation models achieve strong performance in certain settings, simpler models often remain competitive, particularly under clinical distribution shifts. To facilitate reproducibility and adoption, we release all prepared data and code in an accessible and extensible format.


【4】An Electrocardiogram Multi-task Benchmark with Comprehensive Evaluations and Insightful Findings
标题:具有全面评估和深入发现的心电图多任务基准
链接:https://arxiv.org/abs/2512.08954

作者:Yuhao Xu,Jiaying Lu,Sirui Ding,Defu Cao,Xiao Hu,Carl Yang
摘要:在患者诊断过程中,无创测量由于其风险低、结果快而被广泛使用。心电图(ECG)作为一种非侵入性的心脏活动采集方法,被用于心脏疾病的诊断。分析ECG通常需要领域专业知识,这是将人工智能(AI)应用于医疗保健的障碍。通过自我监督学习和基础模型的进步,人工智能系统现在可以获取和利用领域知识,而无需仅仅依赖人类专业知识。然而,对于基础模型在心电图上的性能缺乏全面的分析。本研究旨在回答研究问题:“基础模型对心电图分析有用吗?”为了解决这个问题,我们将语言/一般时间序列/ECG基础模型与时间序列深度学习模型进行了比较。实验结果表明,一般的时间序列/心电基础模型的最高性能率达到80%,表明他们在心电分析的有效性。提供了深入的分析和见解以及全面的实验结果。这项研究强调了基础模型在推进生理波形分析方面的局限性和潜力。该基准测试的数据和代码可在https://github.com/yuhaoxu99/ECGMultitasks-Benchmark上公开获取。
摘要:In the process of patient diagnosis, non-invasive measurements are widely used due to their low risks and quick results. Electrocardiogram (ECG), as a non-invasive method to collect heart activities, is used to diagnose cardiac conditions. Analyzing the ECG typically requires domain expertise, which is a roadblock to applying artificial intelligence (AI) for healthcare. Through advances in self-supervised learning and foundation models, AI systems can now acquire and leverage domain knowledge without relying solely on human expertise. However, there is a lack of comprehensive analyses over the foundation models' performance on ECG. This study aims to answer the research question: "Are Foundation Models Useful for ECG Analysis?" To address it, we evaluate language/general time-series/ECG foundation models in comparison with time-series deep learning models. The experimental results show that general time-series/ECG foundation models achieve a top performance rate of 80%, indicating their effectiveness in ECG analysis. In-depth analyses and insights are provided along with comprehensive experimental results. This study highlights the limitations and potential of foundation models in advancing physiological waveform analysis. The data and code for this benchmark are publicly available at https://github.com/yuhaoxu99/ECGMultitasks-Benchmark.


【5】Causal Attribution of Model Performance Gaps in Medical Imaging Under Distribution Shifts
标题:分布变化下医学成像模型性能差距的因果归因
链接:https://arxiv.org/abs/2512.09094

作者:Pedro M. Gordaliza,Nataliia Molchanova,Jaume Banus,Thomas Sanchez,Meritxell Bach Cuadra
备注:Medical Imaging meets EurIPS Workshop: MedEurIPS 2025
摘要 :由于分布变化,医学图像分割的深度学习模型的性能显着下降,但这些下降背后的因果机制仍然知之甚少。我们将因果归因框架扩展到高维分割任务,量化采集协议和注释变异性如何独立地导致性能下降。我们通过因果图对数据生成过程进行建模,并采用Shapley值将性能变化公平地归因于各个机制。我们的框架解决了医学成像中的独特挑战:高维输出,有限的样本和复杂的机制相互作用。在4个中心和7个注释者之间对多发性硬化症(MS)病变分割的验证揭示了上下文相关的失败模式:跨注释者时注释协议偏移占主导地位(7.4% $\pm $8.9% DSC归因),而跨成像中心时采集偏移占主导地位(6.5% $\pm $9.1%)。这种机制特定的量化使从业者能够根据部署环境优先考虑有针对性的干预措施。
摘要:Deep learning models for medical image segmentation suffer significant performance drops due to distribution shifts, but the causal mechanisms behind these drops remain poorly understood. We extend causal attribution frameworks to high-dimensional segmentation tasks, quantifying how acquisition protocols and annotation variability independently contribute to performance degradation. We model the data-generating process through a causal graph and employ Shapley values to fairly attribute performance changes to individual mechanisms. Our framework addresses unique challenges in medical imaging: high-dimensional outputs, limited samples, and complex mechanism interactions. Validation on multiple sclerosis (MS) lesion segmentation across 4 centers and 7 annotators reveals context-dependent failure modes: annotation protocol shifts dominate when crossing annotators (7.4% $\pm$ 8.9% DSC attribution), while acquisition shifts dominate when crossing imaging centers (6.5% $\pm$ 9.1%). This mechanism-specific quantification enables practitioners to prioritize targeted interventions based on deployment context.
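With two mechanisms (acquisition and annotation), the Shapley attribution used here has a small exact form; a sketch with hypothetical DSC numbers (the performance values below are illustrative, not the paper's measurements):

```python
from itertools import combinations
from math import factorial

def shapley_attribution(perf):
    """Exact Shapley values for mechanism-shift attribution.

    `perf` maps a frozenset of shifted mechanisms to model performance
    (e.g., DSC).  Each mechanism's value is its average marginal effect
    over all orderings -- a fair split of the total performance change."""
    players = sorted({m for coalition in perf for m in coalition})
    n = len(players)
    values = {}
    for p in players:
        total = 0.0
        others = [q for q in players if q != p]
        for r in range(n):
            for coalition in combinations(others, r):
                s = frozenset(coalition)
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                total += weight * (perf[s | {p}] - perf[s])
        values[p] = total
    return values

# Hypothetical DSC under each combination of shifts (illustrative numbers).
perf = {frozenset(): 0.85,
        frozenset({"acquisition"}): 0.79,
        frozenset({"annotation"}): 0.81,
        frozenset({"acquisition", "annotation"}): 0.74}
attr = shapley_attribution(perf)
```

By construction the attributions sum to the total performance change, which is what makes the split "fair" across mechanisms.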


【6】Enhanced Chest Disease Classification Using an Improved CheXNet Framework with EfficientNetV2-M and Optimization-Driven Learning
标题:使用改进的CheXNet框架、EfficientNetV 2-M和优化驱动学习增强胸部疾病分类
链接:https://arxiv.org/abs/2512.08992

作者:Ali M. Bahram,Saman Muhammad Omer,Hardi M. Mohammed,Sirwan Abdolwahed Aula
备注:23 pages, 6 figures, 7 tables
摘要:胸部X线片的解释是临床实践中的一个重要诊断问题,特别是在资源有限的环境中,放射科医生的短缺在延迟诊断和患者预后不良中发挥了作用。尽管最初的CheXNet架构在胸部X光照片自动分析方面显示出了潜力,但DenseNet-121主干计算效率低下,单标签分类器性能较差。为了消除这些缺点,我们提出了一个更好的胸部疾病分类框架,该框架依赖于EfficientNetV 2-M,并结合了高级训练方法,如自动混合精度训练,AdamW,余弦退火学习率调度和指数移动平均正则化。我们准备了一个包含18,080张胸部X射线图像的数据集,这些图像来自三种高权威性的源材料,代表了五种关键的临床重要疾病类别,包括心脏肥大、COVID-19、正常、肺炎和结核病。为了实现统计可靠性和再现性,运行了9次独立的实验运行。建议的架构显示出显著的增益,平均测试准确度为96.45%,而基线时为95.30%(p <0.001),宏观平均F1评分增加到91.08%(p <0.001)。重大传染病显示出近乎完美的分类性能,COVID-19检测的准确率为99.95%,结核病检测的准确率为99.97%。虽然包含了6.8倍的参数,但训练时间减少了11.4%,性能稳定性提高了22.7%。该框架本身是一种决策支持工具,可用于应对流行病,筛查结核病,并在各种医疗机构定期评估胸部疾病。
摘要:The interpretation of chest X-rays is an important diagnostic issue in clinical practice, especially in resource-limited settings where the shortage of radiologists contributes to delayed diagnosis and poor patient outcomes. Although the original CheXNet architecture has shown potential in automated analysis of chest radiographs, its DenseNet-121 backbone is computationally inefficient and performs poorly as a single-label classifier. To eliminate these shortcomings, we propose an improved chest disease classification framework that relies on EfficientNetV2-M and incorporates advanced training approaches such as Automatic Mixed Precision training, AdamW, Cosine Annealing learning rate scheduling, and Exponential Moving Average regularization. We prepared a dataset of 18,080 chest X-ray images from three authoritative sources, representing five clinically significant disease categories: Cardiomegaly, COVID-19, Normal, Pneumonia, and Tuberculosis. To achieve statistical reliability and reproducibility, nine independent experimental runs were conducted. The proposed architecture showed significant gains, with a mean test accuracy of 96.45 percent compared to 95.30 percent at baseline (p < 0.001), and the macro-averaged F1-score increased to 91.08 percent (p < 0.001). Critical infectious diseases showed near-perfect classification performance, with COVID-19 detection at 99.95 percent accuracy and Tuberculosis detection at 99.97 percent. Although the model has 6.8 times more parameters, training time was reduced by 11.4 percent and performance stability was increased by 22.7 percent. This framework presents itself as a decision-support tool for pandemic response, tuberculosis screening, and routine thoracic disease assessment in various healthcare facilities.
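摘要中提到的指数移动平均（EMA）正则化可用一段极简代码示意。以下只是该通用技术在NumPy中的草图，并非论文的实现；其中的衰减系数和玩具权重轨迹均为假设：

```python
import numpy as np

def ema_update(shadow, weights, decay=0.99):
    """Blend current weights into the shadow (EMA) copy:
    shadow <- decay * shadow + (1 - decay) * weights
    """
    return {k: decay * shadow[k] + (1.0 - decay) * w
            for k, w in weights.items()}

# Toy example: one "layer" whose weights oscillate noisily around 1.0.
shadow = {"w": np.zeros(3)}
for step in range(1, 1001):
    weights = {"w": np.ones(3) + 0.1 * np.sin(step)}  # noisy trajectory
    shadow = ema_update(shadow, weights, decay=0.99)

print(shadow["w"])  # ends up near the average weights (~1.0)
```

评估时常用这份平滑后的EMA副本替代原始权重，以减少训练末期的抖动。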


蒸馏|知识提取(2篇)

【1】HPM-KD: Hierarchical Progressive Multi-Teacher Framework for Knowledge Distillation and Efficient Model Compression
标题:HPM-KD:用于知识提炼和高效模型压缩的分层渐进多教师框架
链接:https://arxiv.org/abs/2512.09886

作者:Gustavo Coelho Haase,Paulo Henrique Dourado da Silva
备注:9 pages
摘要:知识蒸馏（KD）已经成为一种有前途的模型压缩技术，但面临着严重的限制：(1)对超参数敏感，需要大量手动调整，(2)从超大教师模型向小型学生模型蒸馏时存在容量差距，(3)多教师场景中的协调次优，以及(4)计算资源使用低效。我们提出了\textbf{HPM-KD}，这是一个集成了六个协同组件的框架：(i)基于元学习的自适应配置管理器，消除了手动超参数调整，(ii)具有自动确定的中间模型的渐进蒸馏链，(iii)学习逐样本动态权重的注意力加权多教师集成，(iv)在整个训练过程中调整温度的元学习温度调度器，(v)具有智能负载平衡的并行处理流水线，以及(vi)用于跨实验重用的共享优化存储器。在CIFAR-10、CIFAR-100和表格数据集上的实验表明，HPM-KD在保持85%准确率留存的同时实现了10倍至15倍的压缩，消除了手动调优的需要，并通过并行化将训练时间减少了30-40%。消融研究证实了每个组件的独立贡献（0.10-0.98 pp）。HPM-KD作为开源DeepBridge库的一部分提供。
摘要:Knowledge Distillation (KD) has emerged as a promising technique for model compression but faces critical limitations: (1) sensitivity to hyperparameters requiring extensive manual tuning, (2) capacity gap when distilling from very large teachers to small students, (3) suboptimal coordination in multi-teacher scenarios, and (4) inefficient use of computational resources. We present \textbf{HPM-KD}, a framework that integrates six synergistic components: (i) Adaptive Configuration Manager via meta-learning that eliminates manual hyperparameter tuning, (ii) Progressive Distillation Chain with automatically determined intermediate models, (iii) Attention-Weighted Multi-Teacher Ensemble that learns dynamic per-sample weights, (iv) Meta-Learned Temperature Scheduler that adapts temperature throughout training, (v) Parallel Processing Pipeline with intelligent load balancing, and (vi) Shared Optimization Memory for cross-experiment reuse. Experiments on CIFAR-10, CIFAR-100, and tabular datasets demonstrate that HPM-KD: achieves 10x-15x compression while maintaining 85% accuracy retention, eliminates the need for manual tuning, and reduces training time by 30-40% via parallelization. Ablation studies confirm independent contribution of each component (0.10-0.98 pp). HPM-KD is available as part of the open-source DeepBridge library.
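作为背景，知识蒸馏中最常见的损失是带温度的软目标KL散度（Hinton等人2015年提出的标准形式）。以下草图只演示这一通用损失，并非HPM-KD的实现；HPM-KD的元学习调度器会在训练中调整温度T，此处T为固定的假设值：

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher_T || student_T), scaled by T^2 (standard soft-target KD)."""
    p = softmax(teacher_logits, T)    # softened teacher distribution
    q = softmax(student_logits, T)    # softened student distribution
    return (T ** 2) * np.sum(p * (np.log(p) - np.log(q)))

teacher = np.array([8.0, 2.0, 1.0])
aligned = np.array([7.5, 2.2, 0.9])   # student close to the teacher
random_ = np.array([0.0, 0.0, 0.0])   # uninformed student

print(distillation_loss(aligned, teacher))  # small
print(distillation_loss(random_, teacher))  # larger
```

温度T越高，教师分布越"软"，学生从非最大类的相对概率中获得的监督信号越多。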


【2】Federated Distillation Assisted Vehicle Edge Caching Scheme Based on Lightweight DDPM
标题:基于轻量级DDPM的联邦蒸馏辅助车辆边缘缓存方案
链接:https://arxiv.org/abs/2512.09378

作者:Xun Li,Qiong Wu,Pingyi Fan,Kezhi Wang,Wen Chen,Khaled B. Letaief
备注:This paper has been submitted to IEEE letters. The source code has been released at: https://github.com/qiongwu86/Federated-Distillation-Assisted-Vehicle-Edge-Caching-Scheme-Based-on-Lightweight-DDPM
摘要 :车辆边缘缓存是一种很有前途的技术,它可以通过在边缘节点预缓存用户感兴趣的内容来显著减少车辆用户(VU)访问内容的延迟。准确预测VU感兴趣的内容而不暴露其隐私至关重要。传统的联邦学习(FL)可以通过共享模型而不是原始数据来保护用户隐私。然而,FL的训练需要频繁的模型传输,这可能导致显著的通信开销。此外,车辆可能在训练完成之前离开路边单元(RSU)覆盖区域,从而导致训练失败。为了解决这些问题,在这封信中,我们提出了一个联邦蒸馏辅助车辆边缘缓存方案的基础上轻量级去噪扩散概率模型(LDPM)。仿真结果表明,提出的车辆边缘缓存方案对车辆速度变化具有良好的鲁棒性,显著降低了通信开销,提高了缓存命中率。
摘要:Vehicle edge caching is a promising technology that can significantly reduce the latency for vehicle users (VUs) to access content by pre-caching user-interested content at edge nodes. It is crucial to accurately predict the content that VUs are interested in without exposing their privacy. Traditional federated learning (FL) can protect user privacy by sharing models rather than raw data. However, the training of FL requires frequent model transmission, which can result in significant communication overhead. Additionally, vehicles may leave the road side unit (RSU) coverage area before training is completed, leading to training failures. To address these issues, in this letter, we propose a federated distillation-assisted vehicle edge caching scheme based on lightweight denoising diffusion probabilistic model (LDPM). The simulation results demonstrate that the proposed vehicle edge caching scheme has good robustness to variations in vehicle speed, significantly reducing communication overhead and improving cache hit percentage.


聚类(1篇)

【1】Comparative Analysis of Hash-based Malware Clustering via K-Means
标题:基于K-Means的哈希恶意软件集群比较分析
链接:https://arxiv.org/abs/2512.09539

作者:Aink Acrie Soe Thein,Nikolaos Pitropakis,Pavlos Papadopoulos,Sam Grierson,Sana Ullah Jan
备注:To be published in the proceedings of the 8th International Conference on Reliable Information and Communication Technology (IRICT 2025). Springer Book Series: "Lecture Notes on Data Engineering and Communications Technologies"
摘要:随着日常生活中多种数字设备的使用,网络攻击面也在增加。攻击者不断探索新的途径来利用它们并部署恶意软件。另一方面,检测方法通常采用基于散列的算法,如SSDeep,TLSH和IMPHash来捕获二进制文件之间的结构和行为相似性。这项工作的重点是分析和评估这些技术的聚类恶意软件样本使用的K-means算法。更具体地说,我们对已建立的恶意软件家族和特征进行了实验,发现TLSH和IMPHash产生了更明显、更有语义意义的集群,而SSDeep对更广泛的分类任务更有效。这项工作的结果可以指导更强大的威胁检测机制和自适应安全机制的发展。
摘要:With the adoption of multiple digital devices in everyday life, the cyber-attack surface has increased. Adversaries are continuously exploring new avenues to exploit them and deploy malware. On the other hand, detection approaches typically employ hashing-based algorithms such as SSDeep, TLSH, and IMPHash to capture structural and behavioural similarities among binaries. This work focuses on the analysis and evaluation of these techniques for clustering malware samples using the K-means algorithm. More specifically, we experimented with established malware families and traits and found that TLSH and IMPHash produce more distinct, semantically meaningful clusters, whereas SSDeep is more efficient for broader classification tasks. The findings of this work can guide the development of more robust threat-detection mechanisms and adaptive security mechanisms.
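摘要描述的流程（从二进制文件提取相似性特征，再用K-means聚类）可以用一个玩具示例示意。注意：SSDeep/TLSH的相似度并非欧氏距离，下面的草图用字节直方图特征代替哈希特征，仅为说明管道的形状，并非论文的实验设置：

```python
import numpy as np

def byte_histogram(blob: bytes) -> np.ndarray:
    """256-bin byte frequency vector -- a stand-in for a hash-derived feature."""
    h = np.bincount(np.frombuffer(blob, dtype=np.uint8), minlength=256)
    return h / max(len(blob), 1)

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's algorithm on Euclidean features."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Two synthetic "families": ASCII-heavy blobs vs high-byte-heavy blobs.
rng = np.random.default_rng(1)
family_a = [bytes(rng.integers(32, 127, 512, dtype=np.uint8)) for _ in range(5)]
family_b = [bytes(rng.integers(128, 256, 512, dtype=np.uint8)) for _ in range(5)]
X = np.stack([byte_histogram(b) for b in family_a + family_b])
labels = kmeans(X, k=2)
print(labels)  # the two families separate into the two clusters
```

实际工作中通常用scikit-learn的KMeans替代手写实现，并针对TLSH等哈希设计专门的距离或嵌入。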


自动驾驶|车辆|车道检测等(3篇)

【1】CFLight: Enhancing Safety with Traffic Signal Control through Counterfactual Learning
标题:CFLight:通过反事实学习通过交通信号控制提高安全性
链接:https://arxiv.org/abs/2512.09368

作者:Mingyuan Li,Chunyu Liu,Zhuojun Li,Xiao Liu,Guangsheng Yu,Bo Du,Jun Shen,Qiang Wu
摘要:交通事故导致全球数百万人受伤和死亡,其中每年有大量事故发生在十字路口。交通信号控制(TSC)是一种有效的策略,以提高这些城市交界处的安全。尽管强化学习(RL)方法在优化TSC方面越来越受欢迎,但这些方法往往优先考虑驾驶效率而不是安全性,因此无法解决这两个方面之间的关键平衡。此外,这些方法通常需要更多的可解释性。反事实(CF)学习是一种很有前途的方法,用于各种因果分析领域。在这项研究中,我们引入了一个新的框架,以提高强化学习的安全方面的TSC。这个框架引入了一种基于CF学习的新方法来解决这个问题:“如果,当一个不安全的事件发生时,我们回溯执行替代操作,这个不安全的事件在随后的时间段内仍然会发生吗?”为了回答这个问题,我们提出了一个新的结构因果模型来预测执行不同操作后的结果,我们提出了一个新的CF模块,它与其他“X”模块集成,以促进安全的RL实践。我们的新算法CFLight源自该框架,有效地解决了具有挑战性的安全事件,并通过接近零的碰撞控制策略显着提高了交叉口的安全性。通过对真实世界和合成数据集的大量数值实验,我们证明了CFLight与传统RL方法和最近的安全RL模型相比减少了碰撞并提高了整体流量性能。此外,我们的方法代表了RL方法的通用和安全框架,为其他领域的应用开辟了可能性。数据和代码可在github https://github.com/MJLee00/CFLight-Enhancing-Safety-with-Traffic-Signal-Control-through-Counterfactual-Learning中找到。
摘要:Traffic accidents result in millions of injuries and fatalities globally, with a significant number occurring at intersections each year. Traffic Signal Control (TSC) is an effective strategy for enhancing safety at these urban junctures. Despite the growing popularity of Reinforcement Learning (RL) methods in optimizing TSC, these methods often prioritize driving efficiency over safety, thus failing to address the critical balance between these two aspects. Additionally, these methods usually need more interpretability. CounterFactual (CF) learning is a promising approach for various causal analysis fields. In this study, we introduce a novel framework to improve RL for safety aspects in TSC. This framework introduces a novel method based on CF learning to address the question: ``What if, when an unsafe event occurs, we backtrack to perform alternative actions, and will this unsafe event still occur in the subsequent period?'' To answer this question, we propose a new structure causal model to predict the result after executing different actions, and we propose a new CF module that integrates with additional ``X'' modules to promote safe RL practices. Our new algorithm, CFLight, which is derived from this framework, effectively tackles challenging safety events and significantly improves safety at intersections through a near-zero collision control strategy. Through extensive numerical experiments on both real-world and synthetic datasets, we demonstrate that CFLight reduces collisions and improves overall traffic performance compared to conventional RL methods and the recent safe RL model. Moreover, our method represents a generalized and safe framework for RL methods, opening possibilities for applications in other domains. The data and code are available in the github https://github.com/MJLee00/CFLight-Enhancing-Safety-with-Traffic-Signal-Control-through-Counterfactual-Learning.


【2】Learning-based social coordination to improve safety and robustness of cooperative autonomous vehicles in mixed traffic
标题:基于学习的社会协调,以提高混合交通中协作自主车辆的安全性和鲁棒性
链接:https://arxiv.org/abs/2211.11963

作者:Rodolfo Valiente,Behrad Toghi,Mahdi Razzaghpour,Ramtin Pedarsani,Yaser P. Fallah
备注:arXiv admin note: substantial text overlap with arXiv:2202.00881
摘要:预计自动驾驶车辆(AV)和异构的人类驾驶车辆(HV)将在同一条道路上共存。无人驾驶汽车的安全性和可靠性将取决于其社会意识以及以社会接受的方式参与复杂社会互动的能力。然而,自动驾驶汽车在与HV合作方面仍然效率低下,并且难以理解和适应人类行为,这在混合自动驾驶方面尤其具有挑战性。在AV和HV共享的道路上,HV的社会偏好或个人特征对于AV来说是未知的,并且与期望遵循政策的AV不同,HV特别难以预测,因为它们不一定遵循固定的政策。为了解决这些挑战,我们将混合自治问题框架为多智能体强化学习(MARL)问题,并提出了一种方法,允许自动驾驶汽车从经验中隐式地学习HV的决策,考虑所有车辆的利益,并安全地适应其他交通情况。与现有的工作相比,我们量化AV的社会偏好,并提出了一个分布式的奖励结构,引入利他主义到他们的决策过程中,允许利他的AV学习建立联盟和影响的HV的行为。
摘要:It is expected that autonomous vehicles (AVs) and heterogeneous human-driven vehicles (HVs) will coexist on the same road. The safety and reliability of AVs will depend on their social awareness and their ability to engage in complex social interactions in a socially accepted manner. However, AVs are still inefficient in terms of cooperating with HVs and struggle to understand and adapt to human behavior, which is particularly challenging in mixed autonomy. On a road shared by AVs and HVs, the social preferences and individual traits of HVs are unknown to the AVs. Unlike AVs, which are expected to follow a policy, HVs are particularly difficult to forecast since they do not necessarily follow a stationary policy. To address these challenges, we frame the mixed-autonomy problem as a multi-agent reinforcement learning (MARL) problem and propose an approach that allows AVs to learn the decision-making of HVs implicitly from experience, account for all vehicles' interests, and safely adapt to other traffic situations. In contrast with existing works, we quantify AVs' social preferences and propose a distributed reward structure that introduces altruism into their decision-making process, allowing the altruistic AVs to learn to establish coalitions and influence the behavior of HVs.


【3】Controlling Steering Angle for Cooperative Self-driving Vehicles utilizing CNN and LSTM-based Deep Networks
标题:利用CNN和基于LSTM的深度网络控制协作自动驾驶车辆的转向角度
链接:https://arxiv.org/abs/1904.04375

作者:Rodolfo Valiente,Mahdi Zaman,Sedat Ozer,Yaser P. Fallah
备注:Accepted in IV 2019, 6 pages, 9 figures
摘要:自动驾驶汽车的一个基本挑战是在不同的道路条件下调整转向角。最近解决这一挑战的最先进的解决方案包括深度学习技术,因为它们提供端到端解决方案,直接从原始输入图像中预测转向角,具有更高的准确性。这些工作大多忽略了图像帧之间的时间依赖性。在本文中,我们解决的问题,利用两个自动驾驶汽车之间共享的多组图像,以提高控制转向角的精度,通过考虑图像帧之间的时间依赖性。这个问题在文献中还没有得到广泛的研究。我们提出并研究了一种新的深度架构,通过在我们的深度架构中使用长短期记忆(LSTM)来自动预测转向角。我们的深度架构是一个端到端网络,它利用CNN、LSTM和全连接(FC)层,并使用当前和未来的图像(由前方车辆通过车对车(V2V)通信共享)作为输入来控制转向角。与文献中的其他现有方法相比,我们的模型显示出最低的误差。
摘要:A fundamental challenge in autonomous vehicles is adjusting the steering angle at different road conditions. Recent state-of-the-art solutions addressing this challenge include deep learning techniques as they provide end-to-end solution to predict steering angles directly from the raw input images with higher accuracy. Most of these works ignore the temporal dependencies between the image frames. In this paper, we tackle the problem of utilizing multiple sets of images shared between two autonomous vehicles to improve the accuracy of controlling the steering angle by considering the temporal dependencies between the image frames. This problem has not been studied in the literature widely. We present and study a new deep architecture to predict the steering angle automatically by using Long-Short-Term-Memory (LSTM) in our deep architecture. Our deep architecture is an end-to-end network that utilizes CNN, LSTM and fully connected (FC) layers and it uses both present and futures images (shared by a vehicle ahead via Vehicle-to-Vehicle (V2V) communication) as input to control the steering angle. Our model demonstrates the lowest error when compared to the other existing approaches in the literature.


推理|分析|理解|解释(11篇)

【1】Analysis of Dirichlet Energies as Over-smoothing Measures
标题:Dirichlet能量作为过度平滑措施的分析
链接:https://arxiv.org/abs/2512.09890

作者:Anna Bison,Alessandro Sperduti
摘要:我们分析了两个泛函之间的区别,经常被用作过平滑措施:Dirichlet能量引起的未规范化的图形拉普拉斯算子和规范化的图形拉普拉斯算子。我们证明,后者未能满足Rusch \textit{et al.}提出的节点相似性度量的公理化定义。通过形式化这两个定义的基本频谱特性,我们强调了选择与GNN架构在频谱上兼容的度量所需的关键区别,从而解决了动态监测中的模糊性。
摘要:We analyze the distinctions between two functionals often used as over-smoothing measures: the Dirichlet energies induced by the unnormalized graph Laplacian and the normalized graph Laplacian. We demonstrate that the latter fails to satisfy the axiomatic definition of a node-similarity measure proposed by Rusch \textit{et al.} By formalizing fundamental spectral properties of these two definitions, we highlight critical distinctions necessary to select the metric that is spectrally compatible with the GNN architecture, thereby resolving ambiguities in monitoring the dynamics.
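作为摘要所述区别的一个数值示意（并非论文中的形式化论证），下面用一个3节点路径图说明：在常数节点特征上，未归一化拉普拉斯诱导的Dirichlet能量为零，而度归一化版本在非正则图上一般不为零，这正是节点相似性公理所关注的差异：

```python
import numpy as np

def dirichlet_energies(A, X):
    """Dirichlet energies induced by the unnormalized / normalized Laplacians.

    E_un(X)   = 1/2 * sum_ij A_ij * ||x_i - x_j||^2
    E_norm(X) = 1/2 * sum_ij A_ij * ||x_i/sqrt(d_i) - x_j/sqrt(d_j)||^2
    """
    d = A.sum(axis=1)
    Xn = X / np.sqrt(d)[:, None]
    e_un = e_norm = 0.0
    n = len(A)
    for i in range(n):
        for j in range(n):
            e_un += 0.5 * A[i, j] * np.sum((X[i] - X[j]) ** 2)
            e_norm += 0.5 * A[i, j] * np.sum((Xn[i] - Xn[j]) ** 2)
    return e_un, e_norm

# Path graph on 3 nodes (non-regular: degrees 1, 2, 1), constant features.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.ones((3, 2))                  # identical feature vector on every node
e_un, e_norm = dirichlet_energies(A, X)
print(e_un, e_norm)  # 0.0 for the unnormalized energy, > 0 for the normalized one
```

常数特征意味着节点完全相似，因此把非零的归一化能量当作"过平滑/相异度"度量会违反直觉，这与摘要的论点一致。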


【2】A roadmap of geospatial soil quality analysis systems
标题:地理空间土壤质量分析系统路线图
链接:https://arxiv.org/abs/2512.09817

作者:Habiba BEN ABDERRAHMANE,Slimane Oulad-Naoui,Benameur ZIANI
摘要:土壤质量(SQ)在可持续农业、环境保护和土地利用规划中起着至关重要的作用。传统的SQ评估技术依赖于昂贵的劳动密集型采样和实验室分析,限制了其空间和时间覆盖范围。地理信息系统(GIS)、遥感和机器学习(ML)的进步使SQ评估变得高效。本文提出了一个全面的路线图,通过提出一个统一的模块化管道将多源土壤数据,GIS和遥感工具以及机器学习技术集成在一起,以支持透明和可扩展的土壤质量评估。它还包括实际应用。与主要针对孤立土壤参数或特定建模方法的现有研究相反,这种方法巩固了地理信息系统(GIS),遥感技术和机器学习算法在整个土壤质量评估管道中的最新进展。它还解决了现有的挑战和限制,同时探索该领域的未来发展和新趋势,可以提供下一代土壤质量系统,使其更加透明,适应性更强,并与可持续土地管理保持一致。
摘要:Soil quality (SQ) plays a crucial role in sustainable agriculture, environmental conservation, and land-use planning. Traditional SQ assessment techniques rely on costly, labor-intensive sampling and laboratory analysis, limiting their spatial and temporal coverage. Advances in Geographic Information Systems (GIS), remote sensing, and machine learning (ML) enabled efficient SQ evaluation. This paper presents a comprehensive roadmap distinguishing it from previous reviews by proposing a unified and modular pipeline that integrates multi-source soil data, GIS and remote sensing tools, and machine learning techniques to support transparent and scalable soil quality assessment. It also includes practical applications. Contrary to existing studies that predominantly target isolated soil parameters or specific modeling methodologies, this approach consolidates recent advancements in Geographic Information Systems (GIS), remote sensing technologies, and machine learning algorithms within the entire soil quality assessment pipeline. It also addresses existing challenges and limitations while exploring future developments and emerging trends in the field that can deliver the next generation of soil quality systems making them more transparent, adaptive, and aligned with sustainable land management.


【3】TinyDéjàVu: Smaller Memory Footprint & Faster Inference on Sensor Data Streams with Always-On Microcontrollers
标题:TinyDéjàVu:使用始终在线的微控制器更小的内存占用和更快的传感器数据流推理
链接:https://arxiv.org/abs/2512.09786

作者:Zhaolan Huang,Emmanuel Baccelli
摘要:人们越来越希望始终在线的传感器能够搭载各种微型神经网络,并不断对它们感测到的数据的时间序列进行推理。为了在电池上操作时满足寿命和能耗要求,这种硬件使用具有微小存储器预算的微控制器(MCU),例如,128kB RAM在这种情况下,优化跨神经网络层的数据流变得至关重要。在本文中,我们介绍了TinyDéjàVu,这是我们设计的一个新框架和新算法,用于在典型的微控制器硬件上使用各种微型ML模型进行传感器数据时间序列推断,从而大大减少所需的RAM占用量。我们将TinyDéjàVu的实现作为开源发布,并在硬件上执行可复制的基准测试。我们表明,TinyDéjàVu可以节省超过60%的RAM使用,并消除高达90%的重叠滑动窗口输入的冗余计算。
摘要 :Always-on sensors are increasingly expected to embark a variety of tiny neural networks and to continuously perform inference on time-series of the data they sense. In order to fit lifetime and energy consumption requirements when operating on battery, such hardware uses microcontrollers (MCUs) with tiny memory budget e.g., 128kB of RAM. In this context, optimizing data flows across neural network layers becomes crucial. In this paper, we introduce TinyDéjàVu, a new framework and novel algorithms we designed to drastically reduce the RAM footprint required by inference using various tiny ML models for sensor data time-series on typical microcontroller hardware. We publish the implementation of TinyDéjàVu as open source, and we perform reproducible benchmarks on hardware. We show that TinyDéjàVu can save more than 60% of RAM usage and eliminate up to 90% of redundant compute on overlapping sliding window inputs.
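摘要提到在重叠滑动窗口输入上消除冗余计算。以下是这一通用思想的示意草图（缓存每个样本的特征，再在缓存上滑窗），并非TinyDéjàVu的实际算法；窗口大小、步长与特征函数均为假设：

```python
import numpy as np

def per_sample_feature(x):
    """Stand-in for an expensive per-sample computation (e.g., a conv stem)."""
    return x * x + 1.0

def naive_windows(signal, win, hop):
    """Recompute every sample's feature inside every window."""
    outs, calls = [], 0
    for start in range(0, len(signal) - win + 1, hop):
        feats = []
        for x in signal[start:start + win]:
            feats.append(per_sample_feature(x))
            calls += 1
        outs.append(sum(feats))
    return outs, calls

def cached_windows(signal, win, hop):
    """Compute each sample's feature once, then slide over the cache."""
    cache = [per_sample_feature(x) for x in signal]      # len(signal) calls
    outs = [sum(cache[s:s + win])
            for s in range(0, len(signal) - win + 1, hop)]
    return outs, len(signal)

sig = list(np.arange(32, dtype=float))
a, naive_calls = naive_windows(sig, win=16, hop=4)
b, cached_calls = cached_windows(sig, win=16, hop=4)
print(naive_calls, cached_calls)  # 80 vs 32: the cache removes the overlap's work
```

窗口重叠越大（win大、hop小），缓存消除的冗余比例越高，这与在MCU上节省RAM和计算的动机一致。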


【4】Rethinking Chain-of-Thought Reasoning for Videos
标题:重新思考视频的思想链推理
链接:https://arxiv.org/abs/2512.09616

作者:Yiwu Zhong,Zi-Yuan Hu,Yin Li,Liwei Wang
备注:Technical report
摘要:思维链(CoT)推理在解决自然语言处理中的复杂任务方面非常成功,最近的多模态大语言模型(MLLM)将这种范式扩展到视频推理。然而,这些模型通常建立在冗长的推理链和大量的输入视觉标记上。从我们的基准研究的实证观察的动机,我们假设,简洁的推理结合一组减少的视觉令牌可以足够有效的视频推理。为了评估这一假设,我们设计并验证了一个有效的后训练和推理框架,提高了视频MLLM的推理能力。我们的框架使模型能够对压缩的视觉标记进行操作,并在回答之前生成简短的推理轨迹。由此产生的模型实现了大幅提高的推理效率,在不同的基准测试中提供了有竞争力的性能,并避免了对手动CoT注释或监督微调的依赖。总的来说,我们的研究结果表明,长时间的,像人类一样的CoT推理可能不是一般的视频推理所必需的,简洁的推理既有效又高效。我们的代码将在https://github.com/LaVi-Lab/Rethink_CoT_Video上发布。
摘要:Chain-of-thought (CoT) reasoning has been highly successful in solving complex tasks in natural language processing, and recent multimodal large language models (MLLMs) have extended this paradigm to video reasoning. However, these models typically build on lengthy reasoning chains and large numbers of input visual tokens. Motivated by empirical observations from our benchmark study, we hypothesize that concise reasoning combined with a reduced set of visual tokens can be sufficient for effective video reasoning. To evaluate this hypothesis, we design and validate an efficient post-training and inference framework that enhances a video MLLM's reasoning capability. Our framework enables models to operate on compressed visual tokens and generate brief reasoning traces prior to answering. The resulting models achieve substantially improved inference efficiency, deliver competitive performance across diverse benchmarks, and avoid reliance on manual CoT annotations or supervised fine-tuning. Collectively, our results suggest that long, human-like CoT reasoning may not be necessary for general video reasoning, and that concise reasoning can be both effective and efficient. Our code will be released at https://github.com/LaVi-Lab/Rethink_CoT_Video.


【5】Goal inference with Rao-Blackwellized Particle Filters
标题:使用Rao-Blackwelized粒子过滤器进行目标推理
链接:https://arxiv.org/abs/2512.09269

作者:Yixuan Wang,Dan P. Guralnik,Warren E. Dixon
备注:9 pages, 2 figures
摘要:从其轨迹的噪声观测中推断移动智能体的最终目标是一个基本的估计问题。我们使用Rao-Blackwellized粒子滤波器（RBPF）的一种变体开启了对此类意图推理的研究，其前提假设是智能体的意图通过具有最先进的可证明实际稳定性的闭环行为表现出来。利用假定的闭式智能体动力学，RBPF解析地边缘化线性高斯子结构，只更新粒子权重，从而比标准粒子滤波器具有更高的采样效率。我们引入了两种估计器：一种是使用RBPF权重的高斯混合模型，另一种是将混合限制在有效样本上的简化版本。我们使用信息论泄漏度量来量化对手能够在多大程度上恢复智能体的意图，并通过高斯混合KL界给出真实意图分布与RBPF估计之间Kullback-Leibler（KL）散度的可计算下界。我们还给出了两种估计器之间性能差异的界，表明简化估计器的表现几乎与完整估计器相当。实验表明，对于合规智能体可以快速、准确地恢复其意图，这为未来设计意图混淆控制器的工作提供了动机。
摘要:Inferring the eventual goal of a mobile agent from noisy observations of its trajectory is a fundamental estimation problem. We initiate the study of such intent inference using a variant of a Rao-Blackwellized Particle Filter (RBPF), subject to the assumption that the agent's intent manifests through closed-loop behavior with a state-of-the-art provable practical stability property. Leveraging the assumed closed-form agent dynamics, the RBPF analytically marginalizes the linear-Gaussian substructure and updates particle weights only, improving sample efficiency over a standard particle filter. Two difference estimators are introduced: a Gaussian mixture model using the RBPF weights and a reduced version confining the mixture to the effective sample. We quantify how well the adversary can recover the agent's intent using information-theoretic leakage metrics and provide computable lower bounds on the Kullback-Leibler (KL) divergence between the true intent distribution and RBPF estimates via Gaussian-mixture KL bounds. We also provide a bound on the difference in performance between the two estimators, highlighting the fact that the reduced estimator performs almost as well as the complete one. Experiments illustrate fast and accurate intent recovery for compliant agents, motivating future work on designing intent-obfuscating controllers.
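以下是通用粒子滤波权重更新与有效样本量（ESS）的极简示意，仅为帮助理解摘要中的概念；论文的RBPF还会解析地边缘化线性高斯子结构，此处的一维候选目标和观测噪声均为虚构：

```python
import numpy as np

def update_weights(weights, particles, observation, sigma=0.5):
    """Multiply weights by the Gaussian likelihood of the observation, renormalize."""
    lik = np.exp(-0.5 * ((particles - observation) / sigma) ** 2)
    w = weights * lik
    return w / w.sum()

def effective_sample_size(weights):
    """ESS = 1 / sum(w^2); the 'effective sample' a reduced estimator keeps."""
    return 1.0 / np.sum(weights ** 2)

# Particles hypothesize 4 candidate goals; noisy observations favor goal 2.0.
particles = np.array([0.0, 1.0, 2.0, 3.0])
weights = np.full(4, 0.25)
for obs in [1.9, 2.1, 2.0]:
    weights = update_weights(weights, particles, obs)

print(weights, effective_sample_size(weights))
```

权重集中到少数粒子时ESS下降，这也是标准粒子滤波触发重采样的常用判据。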


【6】SIP: Site in Pieces- A Dataset of Disaggregated Construction-Phase 3D Scans for Semantic Segmentation and Scene Understanding
标题:SI:碎片式站点-用于语义分割和场景理解的分解结构阶段3D扫描数据集
链接:https://arxiv.org/abs/2512.09062

作者:Seongyong Kim,Yong Kwon Cho
摘要:在活跃施工现场进行准确的3D场景解读对于进度监控、安全评估和数字孪生开发至关重要。LiDAR在建筑领域得到广泛应用，因为它比基于相机的系统更具优势，能在杂乱和动态变化的条件下可靠工作。然而，大多数用于3D感知的公共数据集都来自具有均匀采样和完全可见性的密集融合扫描，这些条件并不能反映真实的建筑工地。受安全要求、访问受限和持续作业的约束，现场数据通常以孤立的单站LiDAR视图形式采集。这些因素导致径向密度衰减、碎片化的几何形状和视图相关的可见性，而这些特征在现有数据集中仍然代表性不足。本文介绍SIP（Site in Pieces），这是一个旨在反映施工期间LiDAR采集实际限制的数据集。SIP提供使用地面LiDAR扫描仪捕获的室内和室外场景，并使用针对施工环境定制的分类法在点级别进行注释：A. 建成环境、B. 施工作业、C. 现场周边环境。该数据集既包括结构构件，也包括脚手架、MEP管道和剪式升降机等细长的临时对象，其中遮挡导致的稀疏性和碎片化几何使分割尤其具有挑战性。扫描协议、注释工作流程和质量控制程序为数据集奠定了一致的基础。SIP公开可用并配有支持性Git存储库，提供可适配的类别配置，便于在现代3D深度学习框架中使用。通过提供保留真实感测特性的现场数据，SIP支持稳健的基准测试，并有助于推进面向施工的3D视觉任务。
摘要 :Accurate 3D scene interpretation in active construction sites is essential for progress monitoring, safety assessment, and digital twin development. LiDAR is widely used in construction because it offers advantages over camera-based systems, performing reliably in cluttered and dynamically changing conditions. Yet most public datasets for 3D perception are derived from densely fused scans with uniform sampling and complete visibility, conditions that do not reflect real construction sites. Field data are often collected as isolated single-station LiDAR views, constrained by safety requirements, limited access, and ongoing operations. These factors lead to radial density decay, fragmented geometry, and view-dependent visibility-characteristics that remain underrepresented in existing datasets. This paper presents SIP, Site in Pieces, a dataset created to reflect the practical constraints of LiDAR acquisition during construction. SIP provides indoor and outdoor scenes captured with a terrestrial LiDAR scanner and annotated at the point level using a taxonomy tailored to construction environments: A. Built Environment, B. Construction Operations, and C. Site Surroundings. The dataset includes both structural components and slender temporary objects such as scaffolding, MEP piping, and scissor lifts, where sparsity caused by occlusion and fragmented geometry make segmentation particularly challenging. The scanning protocol, annotation workflow, and quality control procedures establish a consistent foundation for the dataset. SIP is openly available with a supporting Git repository, offering adaptable class configurations that streamline adoption within modern 3D deep learning frameworks. By providing field data that retain real-world sensing characteristics, SIP enables robust benchmarking and contributes to advancing construction-oriented 3D vision tasks.


【7】Peek-a-Boo Reasoning: Contrastive Region Masking in MLLMs
标题:躲猫猫推理:MLLM中的对比区域掩蔽
链接:https://arxiv.org/abs/2512.08976

作者:Isha Chaturvedi,Anjana Nair,Yushen Li,Adhitya Rajendra Kumar,Kevin Zhu,Sunishchal Dev,Ashwinee Panda,Vasu Sharma
摘要:我们介绍了对比区域掩蔽(CRM),这是一种无需训练的诊断方法,它揭示了多模态大型语言模型(MLLM)如何在思维链(CoT)推理的每一步都依赖于特定的视觉区域。与局限于最终答案或注意力地图的现有方法不同,CRM通过系统地掩蔽注释区域并将所得推理轨迹与未掩蔽基线进行对比来提供因果的步骤级属性。应用于VisArgs等数据集,CRM揭示了不同的失败模式:一些模型保留了推理结构,但在证据缺失时会产生幻觉,而另一些模型则紧紧地依赖于视觉线索,但在扰动下会崩溃。通过将评估从答案的正确性转移到推理的忠实性,CRM将视觉基准重新构建为诊断工具,突出了对多模态评估框架的需求,该框架不仅测量性能,而且还测量推理的鲁棒性和忠实性。
摘要:We introduce Contrastive Region Masking (CRM), a training-free diagnostic that reveals how multimodal large language models (MLLMs) depend on specific visual regions at each step of chain-of-thought (CoT) reasoning. Unlike prior approaches limited to final answers or attention maps, CRM provides causal, step-level attribution by systematically masking annotated regions and contrasting the resulting reasoning traces with unmasked baselines. Applied to datasets such as VisArgs, CRM reveals distinct failure modes: some models preserve reasoning structure, but hallucinate when evidence is missing, while others ground tightly to visual cues yet collapse under perturbations. By shifting the evaluation from correctness of answers to faithfulness of reasoning, CRM reframes visual benchmarks as diagnostic tools, highlighting the need for multimodal evaluation frameworks that measure not just performance, but also robustness and fidelity of reasoning.


【8】WTNN: Weibull-Tailored Neural Networks for survival analysis
标题:WTNN:用于生存分析的韦伯定制神经网络
链接:https://arxiv.org/abs/2512.09163

作者:Gabrielle Rives,Olivier Lopez,Nicolas Bousquet
摘要:威布尔分布是一种常用的选择,用于对随时间推移而进行维护的系统的生存进行建模。当只有代理指标和截尾观测值时,有必要将分布参数表示为时间相关协变量的函数。深度神经网络提供了学习这些协变量与运行寿命之间复杂关系所需的灵活性,从而扩展了传统回归模型的功能。出于对在高度多变和苛刻的环境中运行的军用车辆的分析,以及在现有方法中观察到的局限性,本文介绍了WTNN,一种新的基于神经网络的建模框架,专门为Weibull生存研究而设计。所提出的架构是专门设计的,以符合威布尔分布的形状和结构的方式,将定性先验知识的最有影响力的协变量。通过数值实验,我们证明了这种方法可以在代理和右删失数据上进行可靠的训练,并且能够产生鲁棒和可解释的生存预测,从而改进现有方法。
摘要:The Weibull distribution is a commonly adopted choice for modeling the survival of systems subject to maintenance over time. When only proxy indicators and censored observations are available, it becomes necessary to express the distribution's parameters as functions of time-dependent covariates. Deep neural networks provide the flexibility needed to learn complex relationships between these covariates and operational lifetime, thereby extending the capabilities of traditional regression-based models. Motivated by the analysis of a fleet of military vehicles operating in highly variable and demanding environments, as well as by the limitations observed in existing methodologies, this paper introduces WTNN, a new neural network-based modeling framework specifically designed for Weibull survival studies. The proposed architecture is specifically designed to incorporate qualitative prior knowledge regarding the most influential covariates, in a manner consistent with the shape and structure of the Weibull distribution. Through numerical experiments, we show that this approach can be reliably trained on proxy and right-censored data, and is capable of producing robust and interpretable survival predictions that can improve existing approaches.


【9】Understanding temperature tuning in energy-based models
标题:了解基于能源的模型中的温度调整
链接:https://arxiv.org/abs/2512.09152

作者:Peter W Fields,Vudtiwat Ngampruetikorn,David J Schwab,Stephanie E Palmer
备注:18 pages, 7 figures
摘要:复杂系统的生成模型通常需要事后参数调整才能产生有用的输出。例如，用于蛋白质设计的基于能量的模型在人为降低的"温度"下采样，以生成新颖的功能性序列。这种温度调整是机器学习中一种常见但理解不深的启发式做法，用于控制生成保真度与多样性之间的权衡。在这里，我们建立了一个可解释的、有物理动机的框架来解释这一现象。我们证明，在具有较大"能量间隙"的系统中（该间隙将少数有意义的状态与大量不切实际的状态分隔开），从稀疏数据中学习会导致模型系统性地高估高能态的概率，而降低采样温度恰好纠正了这一偏差。更一般地，我们刻画了最佳采样温度如何取决于数据规模与系统底层能量景观之间的相互作用。至关重要的是，我们的结果表明，降低采样温度并不总是可取的；我们确定了提高采样温度反而带来更好生成性能的条件。因此，我们的框架将事后温度调整视为一种诊断工具，揭示真实数据分布的特性和所学模型的局限。
摘要:Generative models of complex systems often require post-hoc parameter adjustments to produce useful outputs. For example, energy-based models for protein design are sampled at an artificially low ''temperature'' to generate novel, functional sequences. This temperature tuning is a common yet poorly understood heuristic used across machine learning contexts to control the trade-off between generative fidelity and diversity. Here, we develop an interpretable, physically motivated framework to explain this phenomenon. We demonstrate that in systems with a large ''energy gap'' - separating a small fraction of meaningful states from a vast space of unrealistic states - learning from sparse data causes models to systematically overestimate high-energy state probabilities, a bias that lowering the sampling temperature corrects. More generally, we characterize how the optimal sampling temperature depends on the interplay between data size and the system's underlying energy landscape. Crucially, our results show that lowering the sampling temperature is not always desirable; we identify the conditions where \emph{raising} it results in better generative performance. Our framework thus casts post-hoc temperature tuning as a diagnostic tool that reveals properties of the true data distribution and the limits of the learned model.
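摘要中"能量间隙"与温度的关系可以用玻尔兹曼分布的一个数值草图说明（能量取值为虚构，仅为演示）：降低温度T会把概率质量集中到少数低能态上：

```python
import numpy as np

def boltzmann(energies, T):
    """p_T(x) proportional to exp(-E(x)/T); lower T sharpens the distribution."""
    z = -np.asarray(energies, dtype=float) / T
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

# A toy "energy gap": 3 low-energy (meaningful) states, 97 high-energy ones.
energies = np.array([0.0, 0.2, 0.3] + [3.0] * 97)

for T in (1.0, 0.5):
    p = boltzmann(energies, T)
    print(T, p[:3].sum())   # probability mass on the low-energy states
```

若学习到的能量高估了高能态的概率（如摘要所述的稀疏数据偏差），在更低的T下采样可以抵消这种高估；反之，当模型低估高能态时，提高T才是正确方向。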


【10】Interpretable machine learning of halo gas density profiles: a sensitivity analysis of cosmological hydrodynamical simulations
标题:光环气体密度剖面的可解释机器学习:宇宙学流体动力学模拟的敏感性分析
链接:https://arxiv.org/abs/2512.09021

作者:Daniele Sorini,Sownak Bose,Mathilda Denison,Romeel Davé
备注:To be submitted to The Open Journal of Astrophysics
摘要:恒星和活动星系核（AGN）驱动的反馈过程影响着从星系内部直到星系际介质的各种尺度上的气体分布。然而，反馈如何通过其与关键星系属性的联系塑造宿主暗晕中的径向气体密度剖面，目前仍不清楚。我们使用涵盖多种反馈模型的EAGLE、IllustrisTNG和Simba宇宙学流体动力学模拟套件来研究这个问题。我们开发了一个随机森林算法，从总晕质量和中央星系的五个全局属性（气体质量与恒星质量、恒星形成率、中央黑洞（BH）的质量与吸积率）预测晕内的径向气体密度剖面。该算法以约80-90%的平均精度重现了模拟的气体密度剖面，适用范围为晕质量$10^{9.5}\,M_{\odot} < M_{\rm 200c} < 10^{15}\,M_{\odot}$和红移区间$0
摘要 :Stellar and AGN-driven feedback processes affect the distribution of gas on a wide range of scales, from within galaxies well into the intergalactic medium. Yet, it remains unclear how feedback, through its connection to key galaxy properties, shapes the radial gas density profile in the host halo. We tackle this question using suites of the EAGLE, IllustrisTNG, and Simba cosmological hydrodynamical simulations, which span a variety of feedback models. We develop a random forest algorithm that predicts the radial gas density profile within haloes from the total halo mass and five global properties of the central galaxy: gas and stellar mass; star formation rate; mass and accretion rate of the central black hole (BH). The algorithm reproduces the simulated gas density profiles with an average accuracy of $\sim$80-90% over the halo mass range $10^{9.5} \, \mathrm{M}_{\odot} < M_{\rm 200c} < 10^{15} \, \mathrm{M}_{\odot}$ and redshift interval $0


【11】Online Inference of Constrained Optimization: Primal-Dual Optimality and Sequential Quadratic Programming
标题:约束优化的在线推理:原-二元最优性和序列二次规划
链接:https://arxiv.org/abs/2512.08948

作者:Yihang Gao,Michael K. Ng,Michael W. Mahoney,Sen Na
备注:80 pages, 5 figures, 5 tables
摘要:我们研究带等式和不等式约束的随机优化问题解的在线统计推断。这类问题在统计学和机器学习中普遍存在，包括约束$M$估计、物理启发模型、安全强化学习和算法公平性。我们开发了一种随机序列二次规划（SSQP）方法来求解这些问题，其中步长方向通过依次对目标进行二次近似、对约束进行线性近似来计算。尽管可以获得总体梯度的无偏估计，约束随机问题的一个关键挑战在于处理步长方向中的偏差。为此，我们在SSQP中应用动量式梯度移动平均技术对步长方向去偏。我们证明了该方法实现全局几乎必然收敛，并在Hájek和Le Cam的意义下表现出具有最优原始-对偶极限协方差矩阵的局部渐近正态性。此外，我们提供了一个用于实际推断的插入式协方差矩阵估计器。据我们所知，所提出的SSQP方法是第一个在不依赖约束集上的投影算子（这对一般非线性问题通常难以处理）的情况下达到原始-对偶渐近极小极大最优性的完全在线方法。通过在基准非线性问题以及使用合成和真实数据的约束广义线性模型和投资组合分配问题上进行的大量实验，我们展示了该方法的优越性能，表明该方法及其渐近行为不仅能高效求解约束随机问题，还能在现实应用中提供有效且实用的在线推理。
摘要:We study online statistical inference for the solutions of stochastic optimization problems with equality and inequality constraints. Such problems are prevalent in statistics and machine learning, encompassing constrained $M$-estimation, physics-informed models, safe reinforcement learning, and algorithmic fairness. We develop a stochastic sequential quadratic programming (SSQP) method to solve these problems, where the step direction is computed by sequentially performing a quadratic approximation of the objective and a linear approximation of the constraints. Despite having access to unbiased estimates of population gradients, a key challenge in constrained stochastic problems lies in dealing with the bias in the step direction. As such, we apply a momentum-style gradient moving-average technique within SSQP to debias the step. We show that our method achieves global almost-sure convergence and exhibits local asymptotic normality with an optimal primal-dual limiting covariance matrix in the sense of Hájek and Le Cam. In addition, we provide a plug-in covariance matrix estimator for practical inference. To our knowledge, the proposed SSQP method is the first fully online method that attains primal-dual asymptotic minimax optimality without relying on projection operators onto the constraint set, which are generally intractable for nonlinear problems. Through extensive experiments on benchmark nonlinear problems, as well as on constrained generalized linear models and portfolio allocation problems using both synthetic and real data, we demonstrate superior performance of our method, showing that the method and its asymptotic behavior not only solve constrained stochastic problems efficiently but also provide valid and practical online inference in real-world applications.
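The momentum-style gradient moving average that SSQP uses to debias the step direction can be illustrated with a minimal editor's sketch; the scalar setting, the choice of beta, and the variance comment below are illustrative assumptions, not details from the paper:

```python
import random

def momentum_average(noisy_grads, beta=0.1):
    """Moving average g_t = (1 - beta) * g_{t-1} + beta * ghat_t.
    beta = 0.1 is an illustrative value, not the paper's choice."""
    g = noisy_grads[0]
    for ghat in noisy_grads[1:]:
        g = (1.0 - beta) * g + beta * ghat
    return g

random.seed(0)
true_grad = 2.0
samples = [true_grad + random.gauss(0.0, 1.0) for _ in range(2000)]
avg = momentum_average(samples)
# Steady-state variance of the averaged direction is about beta / (2 - beta)
# of the per-sample noise variance, so avg sits much closer to true_grad
# than a typical single noisy gradient does.
```

The same averaging applied to the full gradient vector is what keeps the SQP step direction from inheriting the raw sampling noise.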


Detection (3 papers)

【1】QuanvNeXt: An end-to-end quanvolutional neural network for EEG-based detection of major depressive disorder
Link: https://arxiv.org/abs/2512.09517

Authors: Nabil Anan Orka, Ehtashamul Haque, Maftahul Jannat, Md Abdul Awal, Mohammad Ali Moni
Comments: Under review
Abstract: This study presents QuanvNeXt, an end-to-end fully quanvolutional model for EEG-based depression diagnosis. QuanvNeXt incorporates a novel Cross Residual block, which reduces feature homogeneity and strengthens cross-feature relationships while retaining parameter efficiency. We evaluated QuanvNeXt on two open-source datasets, where it achieved an average accuracy of 93.1% and an average AUC-ROC of 97.2%, outperforming state-of-the-art baselines such as InceptionTime (91.7% accuracy, 95.9% AUC-ROC). An uncertainty analysis across Gaussian noise levels demonstrated well-calibrated predictions, with ECE scores remaining low (0.0436, Dataset 1) to moderate (0.1159, Dataset 2) even at the highest perturbation (ε = 0.1). Additionally, a post-hoc explainable AI analysis confirmed that QuanvNeXt effectively identifies and learns spectrotemporal patterns that distinguish between healthy controls and major depressive disorder. Overall, QuanvNeXt establishes an efficient and reliable approach for EEG-based depression diagnosis.
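The ECE scores cited above come from the standard binned calibration estimator; as a generic illustration (not code from the paper), it can be computed like this:

```python
def ece(confidences, correct, n_bins=10):
    """Expected Calibration Error: the bin-weighted average of
    |accuracy - mean confidence| over equal-width confidence bins."""
    n = len(confidences)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c <= lo)]
        if idx:
            acc = sum(correct[i] for i in idx) / len(idx)
            conf = sum(confidences[i] for i in idx) / len(idx)
            total += len(idx) / n * abs(acc - conf)
    return total

# 80% confident and 80% accurate -> essentially zero calibration error;
# 90% confident but only 50% accurate -> an ECE of about 0.4.
```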


【2】Detection and Localization of Subdural Hematoma Using Deep Learning on Computed Tomography
Link: https://arxiv.org/abs/2512.09393

Authors: Vasiliki Stoumpou, Rohan Kumar, Bernard Burman, Diego Ojeda, Tapan Mehta, Dimitris Bertsimas
Abstract: Background. Subdural hematoma (SDH) is a common neurosurgical emergency, with increasing incidence in aging populations. Rapid and accurate identification is essential to guide timely intervention, yet existing automated tools focus primarily on detection and provide limited interpretability or spatial localization. There remains a need for transparent, high-performing systems that integrate multimodal clinical and imaging information to support real-time decision-making.   Methods. We developed a multimodal deep-learning framework that integrates structured clinical variables, a 3D convolutional neural network trained on CT volumes, and a transformer-enhanced 2D segmentation model for SDH detection and localization. Using 25,315 head CT studies from Hartford HealthCare (2015-2024), of which 3,774 (14.9%) contained clinician-confirmed SDH, tabular models were trained on demographics, comorbidities, medications, and laboratory results. Imaging models were trained to detect SDH and generate voxel-level probability maps. A greedy ensemble strategy combined complementary predictors.   Findings. Clinical variables alone provided modest discriminatory power (AUC 0.75). Convolutional models trained on CT volumes and segmentation-derived maps achieved substantially higher accuracy (AUCs 0.922 and 0.926). The multimodal ensemble integrating all components achieved the best overall performance (AUC 0.9407; 95% CI, 0.930-0.951) and produced anatomically meaningful localization maps consistent with known SDH patterns.   Interpretation. This multimodal, interpretable framework provides rapid and accurate SDH detection and localization, achieving high detection performance and offering transparent, anatomically grounded outputs. Integration into radiology workflows could streamline triage, reduce time to intervention, and improve consistency in SDH management.


【3】Enhancing Automatic Speech Recognition Through Integrated Noise Detection Architecture
Link: https://arxiv.org/abs/2512.08973

Authors: Karamvir Singh
Comments: 5 figures
Abstract: This research presents a novel approach to enhancing automatic speech recognition systems by integrating noise detection capabilities directly into the recognition architecture. Building upon the wav2vec2 framework, the proposed method incorporates a dedicated noise identification module that operates concurrently with speech transcription. Experimental validation using publicly available speech and environmental audio datasets demonstrates substantial improvements in transcription quality and noise discrimination. The enhanced system achieves superior performance in word error rate, character error rate, and noise detection accuracy compared to conventional architectures. Results indicate that joint optimization of transcription and noise classification objectives yields more reliable speech recognition in challenging acoustic conditions.


Classification & Recognition (3 papers)

【1】OnCoCo 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Conversations
Link: https://arxiv.org/abs/2512.09804

Authors: Jens Albrecht, Robert Lehmann, Aleksandra Poltermann, Eric Rudolph, Philipp Steigerwald, Mara Stieler
Comments: Submitted to LREC 2026
Abstract: This paper presents OnCoCo 1.0, a new public dataset for fine-grained message classification in online counseling. It is based on a new, integrative system of categories, designed to improve the automated analysis of psychosocial online counseling conversations. Existing category systems, predominantly based on Motivational Interviewing (MI), are limited by their narrow focus and dependence on datasets derived mainly from face-to-face counseling. This limits the detailed examination of textual counseling conversations. In response, we developed a comprehensive new coding scheme that differentiates between 38 types of counselor and 28 types of client utterances, and created a labeled dataset consisting of about 2,800 messages from counseling conversations. We fine-tuned several models on our dataset to demonstrate its applicability. The data and models are publicly available to researchers and practitioners. Thus, our work contributes a new type of fine-grained conversational resource to the language resources community, extending existing datasets for social and mental-health dialogue analysis.


【2】OxEnsemble: Fair Ensembles for Low-Data Classification
Link: https://arxiv.org/abs/2512.09665

Authors: Jonathan Rystrøm, Zihao Fu, Chris Russell
Abstract: We address the problem of fair classification in settings where data is scarce and unbalanced across demographic groups. Such low-data regimes are common in domains like medical imaging, where false negatives can have fatal consequences.   We propose a novel approach, OxEnsemble, for efficiently training ensembles and enforcing fairness in these low-data regimes. Unlike other approaches, we aggregate predictions across ensemble members, each trained to satisfy fairness constraints. By construction, OxEnsemble is both data-efficient, carefully reusing held-out data to enforce fairness reliably, and compute-efficient, requiring little more compute than used to fine-tune or evaluate an existing model. We validate this approach with new theoretical guarantees. Experimentally, our approach yields more consistent outcomes and stronger fairness-accuracy trade-offs than existing methods across multiple challenging medical imaging classification datasets.
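The general pattern the abstract describes, averaging member predictions and then enforcing a group-level constraint, can be sketched generically. Everything below is an editor's illustration: the helper names, the equal-positive-rate thresholding, and the toy numbers are stand-ins, not OxEnsemble's actual procedure or fairness criterion.

```python
def aggregate(member_probs):
    """Average positive-class probabilities over ensemble members."""
    n = len(member_probs[0])
    return [sum(m[i] for m in member_probs) / len(member_probs)
            for i in range(n)]

def group_thresholds(probs, groups, target_rate):
    """Pick a per-group threshold so each group's positive prediction
    rate hits target_rate (one possible fairness post-processing)."""
    th = {}
    for g in set(groups):
        p = sorted((probs[i] for i in range(len(probs)) if groups[i] == g),
                   reverse=True)
        k = max(1, round(target_rate * len(p)))
        th[g] = p[k - 1]
    return th

groups = ["a", "a", "b", "b"]
probs = aggregate([[0.9, 0.2, 0.6, 0.4], [0.8, 0.3, 0.7, 0.5]])
th = group_thresholds(probs, groups, target_rate=0.5)
preds = [int(probs[i] >= th[groups[i]]) for i in range(len(probs))]
# Both groups end up with the same fraction of positive predictions.
```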


【3】DW-KNN: A Transparent Local Classifier Integrating Distance Consistency and Neighbor Reliability
Link: https://arxiv.org/abs/2512.08956

Authors: Kumarjit Pathak, Karthik K, Sachin Madan, Jitin Kapila
Abstract: K-Nearest Neighbors (KNN) is one of the most used ML classifiers. However, standard distance-weighted KNN and related variants assume all 'k' neighbors are equally reliable. In heterogeneous feature spaces, this limitation hinders reliable prediction of an observation's true label.   We propose DW-KNN (Double Weighted KNN), a transparent and robust variant that integrates exponential distance weighting with neighbor validity. This enables instance-level interpretability, suppresses noisy or mislabeled samples, and reduces hyperparameter sensitivity.   Comprehensive evaluation on 9 datasets demonstrates that DW-KNN achieves 0.8988 accuracy on average. It ranks 2nd among six methods and within 0.2% of the best-performing Ensemble KNN. It also exhibits the lowest cross-validation variance (0.0156), indicating reliable prediction stability. Statistical significance tests confirmed ($p < 0.001$) improvements over compactness-weighted KNN (+4.09%) and kernel-weighted KNN (+1.13%). The method provides a simple yet effective alternative to complex adaptive schemes, particularly valuable for high-stakes applications requiring explainable predictions.
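The double weighting can be sketched as below. Note that the validity term used here, the fraction of a neighbor's own neighbors sharing its label, is an assumed reading of "neighbor validity" for illustration, not necessarily the authors' exact definition:

```python
import math
from collections import Counter

def neighbors(points, idx, k):
    """Indices of the k nearest points to points[idx], excluding itself."""
    order = sorted(range(len(points)),
                   key=lambda j: abs(points[j] - points[idx]))
    return [j for j in order if j != idx][:k]

def validity(points, labels, idx, k):
    """Assumed validity term: fraction of idx's own k neighbors that
    share its label (low for noisy or mislabeled points)."""
    return sum(labels[j] == labels[idx]
               for j in neighbors(points, idx, k)) / k

def dw_knn_predict(points, labels, query, k=3, alpha=1.0):
    """Double-weighted vote: exp(-alpha * dist) * validity per neighbor."""
    order = sorted(range(len(points)),
                   key=lambda j: abs(points[j] - query))[:k]
    votes = Counter()
    for j in order:
        w = math.exp(-alpha * abs(points[j] - query)) * validity(points, labels, j, k)
        votes[labels[j]] += w
    return votes.most_common(1)[0][0]

# Tiny 1-D example with one mislabeled point sitting inside cluster "A":
pts = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, 0.15]
labs = ["A", "A", "A", "B", "B", "B", "B"]  # last point is a noisy "B"
print(dw_knn_predict(pts, labs, 0.12, k=3))  # -> "A": the noisy neighbor gets zero validity
```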


Representation (1 paper)

【1】Representation Invariance and Allocation: When Subgroup Balance Matters
Link: https://arxiv.org/abs/2512.09496

Authors: Anissa Alloula, Charles Jones, Zuzanna Wakefield-Skorniewska, Francesco Quinzan, Bartłomiej Papież
Abstract: Unequal representation of demographic groups in training data poses challenges to model generalisation across populations. Standard practice assumes that balancing subgroup representation optimises performance. However, recent empirical results contradict this assumption: in some cases, imbalanced data distributions actually improve subgroup performance, while in others, subgroup performance remains unaffected by the absence of an entire subgroup during training. We conduct a systematic study of subgroup allocation across four vision and language models, varying training data composition to characterise the sensitivity of subgroup performance to data balance. We propose the latent separation hypothesis, which states that a partially fine-tuned model's dependence on subgroup representation is determined by the degree of separation between subgroups in the latent space of the pre-trained model. We formalise this hypothesis, provide theoretical analysis, and validate it empirically. Finally, we present a practical application to foundation model fine-tuning, demonstrating that quantitative analysis of latent subgroup separation can inform data collection and balancing decisions.
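One simple way to quantify latent subgroup separation, in the spirit of the hypothesis above, is a centroid-distance-over-spread score. This 1-D sketch is an editor's illustration, not the paper's formalization:

```python
def separation_score(embeddings, groups):
    """Between-group centroid distance divided by mean within-group spread,
    for 1-D embeddings and two subgroups (labeled 0 and 1). A crude proxy
    for latent subgroup separation, assumed for illustration."""
    mean = lambda xs: sum(xs) / len(xs)
    spread = lambda xs: mean([abs(x - mean(xs)) for x in xs]) or 1e-12
    g0 = [e for e, g in zip(embeddings, groups) if g == 0]
    g1 = [e for e, g in zip(embeddings, groups) if g == 1]
    return abs(mean(g0) - mean(g1)) / mean([spread(g0), spread(g1)])

well_separated = separation_score([0.0, 0.1, 5.0, 5.1], [0, 0, 1, 1])
overlapping = separation_score([0.0, 1.0, 0.1, 1.1], [0, 0, 1, 1])
# well_separated is orders of magnitude larger than overlapping, so the two
# regimes the hypothesis distinguishes are easy to tell apart numerically.
```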


Optimization & Convergence (7 papers)

【1】Physics-Aware Heterogeneous GNN Architecture for Real-Time BESS Optimization in Unbalanced Distribution Systems
Link: https://arxiv.org/abs/2512.09780

Authors: Aoxiang Ma, Salah Ghamizi, Jun Cao, Pedro Rodriguez
Comments: 5 pages, 2 figures, 3 tables
Abstract: Battery energy storage systems (BESS) have become increasingly vital in three-phase unbalanced distribution grids for maintaining voltage stability and enabling optimal dispatch. However, existing deep learning approaches often lack explicit three-phase representation, making it difficult to accurately model phase-specific dynamics and enforce operational constraints, leading to infeasible dispatch solutions. This paper demonstrates that by embedding detailed three-phase grid information, including phase voltages, unbalanced loads, and BESS states, into heterogeneous graph nodes, diverse GNN architectures (GCN, GAT, GraphSAGE, GPS) can jointly predict network state variables with high accuracy. Moreover, a physics-informed loss function incorporates critical battery constraints, SoC and C-rate limits, via soft penalties during training. Experimental validation on the CIGRE 18-bus distribution system shows that this embedding-loss approach achieves low prediction errors, with bus voltage MSEs of 6.92e-07 (GCN), 1.21e-06 (GAT), 3.29e-05 (GPS), and 9.04e-07 (SAGE). Importantly, the physics-informed method ensures nearly zero SoC and C-rate constraint violations, confirming its effectiveness for reliable, constraint-compliant dispatch.
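The soft-penalty idea admits a direct sketch; the SoC and C-rate bounds and the penalty weight below are placeholder values chosen for illustration, not those of the CIGRE study:

```python
def soft_penalty(soc, c_rate, soc_min=0.1, soc_max=0.9, c_max=0.5, weight=10.0):
    """Quadratic soft penalty for state-of-charge and C-rate limits:
    zero inside the feasible region, growing smoothly with violation.
    All bounds and the weight are illustrative placeholders."""
    relu = lambda x: max(0.0, x)
    violation = (relu(soc - soc_max) ** 2
                 + relu(soc_min - soc) ** 2
                 + relu(abs(c_rate) - c_max) ** 2)
    return weight * violation

def total_loss(mse, soc, c_rate):
    """Training loss = prediction MSE + physics penalty term."""
    return mse + soft_penalty(soc, c_rate)

# A feasible operating point contributes nothing; violations add smoothly,
# which is what lets gradient descent push predictions back inside bounds.
```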


【2】Optimizing Algorithms for Mobile Health Interventions with Active Querying Optimization
Link: https://arxiv.org/abs/2512.08950

Authors: Aseel Rawashdeh
Abstract: Reinforcement learning in mobile health (mHealth) interventions requires balancing intervention efficacy with user burden, particularly when state measurements (for example, user surveys or feedback) are costly yet essential. The Act-Then-Measure (ATM) heuristic addresses this challenge by decoupling control and measurement actions within the Action-Contingent Noiselessly Observable Markov Decision Process (ACNO-MDP) framework. However, the standard ATM algorithm relies on a temporal-difference-inspired Q-learning method, which is prone to instability in sparse and noisy environments. In this work, we propose a Bayesian extension to ATM that replaces standard Q-learning with a Kalman filter-style Bayesian update, maintaining uncertainty-aware estimates of Q-values and enabling more stable and sample-efficient learning. We evaluate our method in both toy environments and clinically motivated testbeds. In small, tabular environments, Bayesian ATM achieves comparable or improved scalarized returns with substantially lower variance and more stable policy behavior. In contrast, in larger and more complex mHealth settings, both the standard and Bayesian ATM variants perform poorly, suggesting a mismatch between ATM's modeling assumptions and the structural challenges of real-world mHealth domains. These findings highlight the value of uncertainty-aware methods in low-data settings while underscoring the need for new RL algorithms that explicitly model causal structure, continuous states, and delayed feedback under observation cost constraints.
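A Kalman filter-style Q-value update amounts to a standard scalar Bayesian update of a mean and variance; the exact form used in the paper may differ, but a minimal sketch looks like:

```python
def bayes_q_update(mu, var, target, obs_noise):
    """One Kalman-style update of a scalar Q-value posterior (mu, var)
    toward a noisy TD-style target with assumed noise variance obs_noise."""
    gain = var / (var + obs_noise)
    return mu + gain * (target - mu), (1.0 - gain) * var

# Start from a diffuse prior over Q(s, a) and observe three noisy targets:
mu, var = 0.0, 100.0
for y in [1.0, 0.8, 1.2]:
    mu, var = bayes_q_update(mu, var, y, obs_noise=1.0)
# mu has moved close to the targets' mean, and var shrinks at every step,
# so later (noisier-looking) targets move the estimate less and less.
```

The shrinking Kalman gain is what damps the oscillation that plain TD-style Q-learning exhibits under sparse, noisy feedback.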


【3】Optimal certification of constant-local Hamiltonians
Link: https://arxiv.org/abs/2512.09778

Authors: Junseo Lee, Myeongjin Shin
Comments: 29 pages
Abstract: We study the problem of certifying local Hamiltonians from real-time access to their dynamics. Given oracle access to $e^{-itH}$ for an unknown $k$-local Hamiltonian $H$ and a fully specified target Hamiltonian $H_0$, the goal is to decide whether $H$ is exactly equal to $H_0$ or differs from $H_0$ by at least $\varepsilon$ in normalized Frobenius norm, while minimizing the total evolution time. We introduce the first intolerant Hamiltonian certification protocol that achieves optimal performance for all constant-locality Hamiltonians. For general $n$-qubit, $k$-local, traceless Hamiltonians, our procedure uses $O(c^k/\varepsilon)$ total evolution time for a universal constant $c$, and succeeds with high probability. In particular, for $O(1)$-local Hamiltonians, the total evolution time becomes $\Theta(1/\varepsilon)$, matching the known $\Omega(1/\varepsilon)$ lower bounds and achieving the gold-standard Heisenberg-limit scaling. Prior certification methods either relied on implementing inverse evolution of $H$, required controlled access to $e^{-itH}$, or achieved near-optimal guarantees only in restricted settings such as the Ising case ($k=2$). In contrast, our algorithm requires neither inverse evolution nor controlled operations: it uses only forward real-time dynamics and achieves optimal intolerant certification for all constant-locality Hamiltonians.


【4】The Ky Fan Norms and Beyond: Dual Norms and Combinations for Matrix Optimization
Link: https://arxiv.org/abs/2512.09678

Authors: Alexey Kravatskiy, Ivan Kozyrev, Nikolai Kozlov, Alexander Vinogradov, Daniil Merkulov, Ivan Oseledets
Comments: 31 pages
Abstract: In this article, we explore the use of various matrix norms for optimizing functions of weight matrices, a crucial problem in training large language models. Moving beyond the spectral norm underlying the Muon update, we leverage duals of the Ky Fan $k$-norms to introduce a family of Muon-like algorithms we name Fanions, which are closely related to Dion. By working with duals of convex combinations of the Ky Fan $k$-norms with either the Frobenius norm or the $l_\infty$ norm, we construct the families of F-Fanions and S-Fanions, respectively. Their most prominent members are F-Muon and S-Muon. We complement our theoretical analysis with an extensive empirical study of these algorithms across a wide range of tasks and settings, demonstrating that F-Muon and S-Muon consistently match Muon's performance, while outperforming vanilla Muon on a synthetic linear least squares problem.
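For intuition about updates built from top singular directions: with $k = 1$, an orthogonalized top-$k$ update reduces to the rank-1 direction $u v^\top$ of the gradient, computable by plain power iteration. This is an editor's sketch of that special case only, not the Fanion updates themselves:

```python
def rank1_direction(G, iters=50):
    """Top singular pair (u, v) of a 2x2 matrix G by power iteration on
    G^T G, returning the orthogonalized rank-1 direction u v^T."""
    v = [1.0, 0.0]
    for _ in range(iters):
        gv = [G[0][0] * v[0] + G[0][1] * v[1],
              G[1][0] * v[0] + G[1][1] * v[1]]        # G v
        w = [G[0][0] * gv[0] + G[1][0] * gv[1],
             G[0][1] * gv[0] + G[1][1] * gv[1]]       # G^T (G v)
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = [w[0] / norm, w[1] / norm]
    gv = [G[0][0] * v[0] + G[0][1] * v[1],
          G[1][0] * v[0] + G[1][1] * v[1]]
    s = (gv[0] ** 2 + gv[1] ** 2) ** 0.5
    u = [gv[0] / s, gv[1] / s]
    return [[u[0] * v[0], u[0] * v[1]],
            [u[1] * v[0], u[1] * v[1]]]

D = rank1_direction([[3.0, 0.0], [0.0, 1.0]])
# D keeps only the dominant singular direction of the gradient,
# with unit spectral norm regardless of the gradient's scale.
```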


【5】NeuroSketch: An Effective Framework for Neural Decoding via Systematic Architectural Optimization
Link: https://arxiv.org/abs/2512.09524

Authors: Gaorui Zhang, Zhizhang Yuan, Jialan Yang, Junru Chen, Li Meng, Yang Yang
Abstract: Neural decoding, a critical component of Brain-Computer Interface (BCI), has recently attracted increasing research interest. Previous research has focused on leveraging signal processing and deep learning methods to enhance neural decoding performance. However, the in-depth exploration of model architectures remains underexplored, despite its proven effectiveness in other tasks such as energy forecasting and image classification. In this study, we propose NeuroSketch, an effective framework for neural decoding via systematic architecture optimization. Starting with the basic architecture study, we find that CNN-2D outperforms other architectures in neural decoding tasks and explore its effectiveness from temporal and spatial perspectives. Building on this, we optimize the architecture from macro- to micro-level, achieving improvements in performance at each step. The exploration process and model validations take over 5,000 experiments spanning three distinct modalities (visual, auditory, and speech), three types of brain signals (EEG, SEEG, and ECoG), and eight diverse decoding tasks. Experimental results indicate that NeuroSketch achieves state-of-the-art (SOTA) performance across all evaluated datasets, positioning it as a powerful tool for neural decoding. Our code and scripts are available at https://github.com/Galaxy-Dawn/NeuroSketch.


【6】Estimation of Stochastic Optimal Transport Maps
Link: https://arxiv.org/abs/2512.09499

Authors: Sloan Nietert, Ziv Goldfeld
Comments: Appeared at NeurIPS 2025
Abstract: The optimal transport (OT) map is a geometry-driven transformation between high-dimensional probability distributions which underpins a wide range of tasks in statistics, applied probability, and machine learning. However, existing statistical theory for OT map estimation is quite restricted, hinging on Brenier's theorem (quadratic cost, absolutely continuous source) to guarantee existence and uniqueness of a deterministic OT map, on which various additional regularity assumptions are imposed to obtain quantitative error bounds. In many real-world problems these conditions fail or cannot be certified, in which case optimal transportation is possible only via stochastic maps that can split mass. To broaden the scope of map estimation theory to such settings, this work introduces a novel metric for evaluating the transportation quality of stochastic maps. Under this metric, we develop computationally efficient map estimators with near-optimal finite-sample risk bounds, subject to easy-to-verify minimal assumptions. Our analysis further accommodates common forms of adversarial sample contamination, yielding estimators with robust estimation guarantees. Empirical experiments are provided which validate our theory and demonstrate the utility of the proposed framework in settings where existing theory fails. These contributions constitute the first general-purpose theory for map estimation, compatible with a wide spectrum of real-world applications where optimal transport may be intrinsically stochastic.


【7】Distributional Shrinkage II: Optimal Transport Denoisers with Higher-Order Scores
Link: https://arxiv.org/abs/2512.09295

Authors: Tengyuan Liang
Comments: 23 pages
Abstract: We revisit the signal denoising problem through the lens of optimal transport: the goal is to recover an unknown scalar signal distribution $X \sim P$ from noisy observations $Y = X + \sigma Z$, with $Z$ being standard Gaussian independent of $X$ and $\sigma>0$ a known noise level. Let $Q$ denote the distribution of $Y$. We introduce a hierarchy of denoisers $T_0, T_1, \ldots, T_\infty : \mathbb{R} \to \mathbb{R}$ that are agnostic to the signal distribution $P$, depending only on higher-order score functions of $Q$. Each denoiser $T_K$ is progressively refined using the $(2K-1)$-th order score function of $Q$ at noise resolution $\sigma^{2K}$, achieving better denoising quality measured by the Wasserstein metric $W(T_K \sharp Q, P)$. The limiting denoiser $T_\infty$ identifies the optimal transport map with $T_\infty \sharp Q = P$.   We provide a complete characterization of the combinatorial structure underlying this hierarchy through Bell polynomial recursions, revealing how higher-order score functions encode the optimal transport map for signal denoising. We study two estimation strategies with convergence rates for higher-order scores from i.i.d. samples drawn from $Q$: (i) plug-in estimation via Gaussian kernel smoothing, and (ii) direct estimation via higher-order score matching. This hierarchy of agnostic denoisers opens new perspectives in signal denoising and empirical Bayes.
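The first step of such a hierarchy, a first-order denoiser of the form $T_1(y) = y + \sigma^2 \, \mathrm{score}_Q(y)$ (Tweedie's formula) with a plug-in kernel score estimate, can be sketched in 1-D. The bandwidth and sample values are illustrative, and the higher-order refinements $T_K$ are not reproduced here:

```python
import math

def kde_score(samples, y, h):
    """Score d/dy log q(y) of a Gaussian-KDE density fit with bandwidth h."""
    w = [math.exp(-(y - x) ** 2 / (2 * h ** 2)) for x in samples]
    return sum(wi * (x - y) for wi, x in zip(w, samples)) / (sum(w) * h ** 2)

def t1_denoise(samples_of_Y, y, sigma, h=0.5):
    """First-order denoiser y + sigma^2 * score_Q(y), with the score of Q
    estimated by plug-in Gaussian kernel smoothing of the noisy sample."""
    return y + sigma ** 2 * kde_score(samples_of_Y, y, h)

ys = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]  # stand-in sample from Q
out = t1_denoise(ys, 1.5, sigma=0.5)
# out lies strictly between 0 and 1.5: the observation is shrunk toward
# the bulk of the sample, as expected for a unimodal Q centered at 0.
```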


Prediction & Estimation (10 papers)

【1】Predicting the Containment Time of California Wildfires Using Machine Learning
Link: https://arxiv.org/abs/2512.09835

Authors: Shashank Bhardwaj
Abstract: California's wildfire season keeps getting worse over the years, overwhelming emergency response teams. These fires cause massive destruction to both property and human life. For these reasons, there is a growing need for accurate and practical predictions that can help wildfire managers and response teams allocate resources. In this research, we built machine learning models to predict the number of days required to fully contain a wildfire in California. Here, we addressed an important gap in the current literature: most prior research has concentrated on wildfire risk or how fires spread, and the few studies that examine duration typically predict it in broad categories rather than as a continuous measure. This research treats wildfire duration prediction as a regression task, which allows for more detailed and precise forecasts than the broader categorical predictions used in prior work. We built the models by combining three publicly available datasets from the California Department of Forestry and Fire Protection's Fire and Resource Assessment Program (FRAP). This study compared the performance of baseline ensemble regressors, Random Forest and XGBoost, with a Long Short-Term Memory (LSTM) neural network. The results show that the XGBoost model slightly outperforms the Random Forest model, likely due to its superior handling of static features in the dataset. The LSTM model, on the other hand, performed worse than the ensemble models because the dataset lacked temporal features. Overall, this study shows that, depending on feature availability, wildfire managers or fire management authorities can select the most appropriate model to accurately predict wildfire containment duration and allocate resources effectively.


【2】Predicting Polymer Solubility in Solvents Using SMILES Strings
Link: https://arxiv.org/abs/2512.09784

Authors: Andrew Reinhard
Abstract: Understanding and predicting polymer solubility in various solvents is critical for applications ranging from recycling to pharmaceutical formulation. This work presents a deep learning framework that predicts polymer solubility, expressed as weight percent (wt%), directly from SMILES representations of both polymers and solvents. A dataset of 8,049 polymer-solvent pairs at 25 °C was constructed from calibrated molecular dynamics simulations (Zhou et al., 2023), and molecular descriptors and fingerprints were combined into a 2,394-feature representation per sample. A fully connected neural network with six hidden layers was trained using the Adam optimizer and evaluated using mean squared error loss, achieving strong agreement between predicted and actual solubility values. Generalizability was demonstrated using experimentally measured data from the Materials Genome Project, where the model maintained high accuracy on 25 unseen polymer-solvent combinations. These findings highlight the viability of SMILES-based machine learning models for scalable solubility prediction and high-throughput solvent screening, supporting applications in green chemistry, polymer processing, and materials design.


【3】A Granular Framework for Construction Material Price Forecasting: Econometric and Machine-Learning Approaches
链接:https://arxiv.org/abs/2512.09360

Authors: Boge Lyu, Qianye Yin, Iris Denise Tommelein, Hanyang Liu, Karnamohit Ranka, Karthik Yeluripati, Junzhe Shi
Abstract: The persistent volatility of construction material prices poses significant risks to cost estimation, budgeting, and project delivery, underscoring the urgent need for granular and scalable forecasting methods. This study develops a forecasting framework that leverages the Construction Specifications Institute (CSI) MasterFormat as the target data structure, enabling predictions at the six-digit section level and supporting detailed cost projections across a wide spectrum of building materials. To enhance predictive accuracy, the framework integrates explanatory variables such as raw material prices, commodity indexes, and macroeconomic indicators. Four time-series models, Long Short-Term Memory (LSTM), Autoregressive Integrated Moving Average (ARIMA), Vector Error Correction Model (VECM), and Chronos-Bolt, were evaluated under both baseline configurations (using CSI data only) and extended versions with explanatory variables. Results demonstrate that incorporating explanatory variables significantly improves predictive performance across all models. Among the tested approaches, the LSTM model consistently achieved the highest accuracy, with RMSE values as low as 1.390 and MAPE values of 0.957, representing improvements of up to 59% over the traditional statistical time-series model, ARIMA. Validation across multiple CSI divisions confirmed the framework's scalability, while Division 06 (Wood, Plastics, and Composites) is presented in detail as a demonstration case. This research offers a robust methodology that enables owners and contractors to improve budgeting practices and achieve more reliable cost estimation at the Definitive level.
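RMSE and MAPE are the headline metrics here. For readers unfamiliar with them, a minimal sketch of both, and of how a relative improvement over a baseline is computed; the numbers are invented toy values, not the paper's data:

```python
import math

def rmse(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

# hypothetical price series and two competing forecasts
actual = [100.0, 102.0, 105.0, 103.0]
lstm_pred = [99.0, 103.0, 104.0, 104.0]
arima_pred = [97.0, 106.0, 101.0, 107.0]

# percentage improvement of one model's RMSE over the baseline's
improvement = 100 * (rmse(actual, arima_pred) - rmse(actual, lstm_pred)) / rmse(actual, arima_pred)
```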


【4】Beyond the Hype: Comparing Lightweight and Deep Learning Models for Air Quality Forecasting
链接:https://arxiv.org/abs/2512.09076

Authors: Moazzam Umer Gondal, Hamad ul Qudous, Asma Ahmad Farhan
Abstract: Accurate forecasting of urban air pollution is essential for protecting public health and guiding mitigation policies. While Deep Learning (DL) and hybrid pipelines dominate recent research, their complexity and limited interpretability hinder operational use. This study investigates whether lightweight additive models -- Facebook Prophet (FBP) and NeuralProphet (NP) -- can deliver competitive forecasts for particulate matter (PM$_{2.5}$, PM$_{10}$) in Beijing, China. Using multi-year pollutant and meteorological data, we applied systematic feature selection (correlation, mutual information, mRMR), leakage-safe scaling, and chronological data splits. Both models were trained with pollutant and precursor regressors, with NP additionally leveraging lagged dependencies. For context, two machine learning baselines (LSTM, LightGBM) and one traditional statistical model (SARIMAX) were also implemented. Performance was evaluated on a 7-day holdout using MAE, RMSE, and $R^2$. Results show that FBP consistently outperformed NP, SARIMAX, and the learning-based baselines, achieving test $R^2$ above 0.94 for both pollutants. These findings demonstrate that interpretable additive models remain competitive with both traditional and complex approaches, offering a practical balance of accuracy, transparency, and ease of deployment.
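Prophet-style models are additive: y(t) = trend + seasonality + regressor effects. A stripped-down illustration of the idea that fits a linear trend plus weekly seasonal means on a synthetic series; this deliberately omits Prophet's changepoints, Fourier seasonality, and regressors:

```python
def fit_additive(y, period=7):
    """Toy additive model: closed-form linear trend + per-phase seasonal means."""
    n = len(y)
    t = list(range(n))
    tm, ym = sum(t) / n, sum(y) / n
    slope = sum((ti - tm) * (yi - ym) for ti, yi in zip(t, y)) \
        / sum((ti - tm) ** 2 for ti in t)
    intercept = ym - slope * tm
    detrended = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    seasonal = [
        sum(d for i, d in enumerate(detrended) if i % period == k)
        / len([1 for i in range(n) if i % period == k])
        for k in range(period)
    ]
    return lambda ti: intercept + slope * ti + seasonal[ti % period]

# synthetic series: upward trend plus a zero-mean weekly cycle
season = [2, -1, 1, -4, 1, -1, 2]
y = [0.5 * t + season[t % 7] for t in range(28)]
model = fit_additive(y)
```

Forecasting is then just evaluating the fitted components at future timestamps, which is what makes additive models easy to inspect.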


【5】Modular Deep-Learning-Based Early Warning System for Deadly Heatwave Prediction
链接:https://arxiv.org/abs/2512.09074

Authors: Shangqing Xu, Zhiyuan Zhao, Megha Sharma, José María Martín-Olalla, Alexander Rodríguez, Gregory A. Wellenius, B. Aditya Prakash
Abstract: Severe heatwaves in urban areas significantly threaten public health, calling for the establishment of early warning strategies. Despite progress in predicting the occurrence of heatwaves and attributing historical mortality, predicting an incoming deadly heatwave remains a challenge due to the difficulty of defining and estimating heat-related mortality. Furthermore, establishing an early warning system imposes additional requirements, including data availability, spatial and temporal robustness, and decision costs. To address these challenges, we propose DeepTherm, a modular early warning system for deadly heatwave prediction that does not require a history of heat-related mortality. Exploiting the flexibility of deep learning, DeepTherm employs a dual-prediction pipeline, disentangling baseline mortality in the absence of heatwaves and other irregular events from all-cause mortality. We evaluated DeepTherm on real-world data across Spain. Results demonstrate consistent, robust, and accurate performance across diverse regions, time periods, and population groups while allowing a trade-off between missed alarms and false alarms.


【6】A Diffusion-Based Framework for High-Resolution Precipitation Forecasting over CONUS
链接:https://arxiv.org/abs/2512.09059

Authors: Marina Vicens-Miquel, Amy McGovern, Aaron J. Hill, Efi Foufoula-Georgiou, Clement Guilloteau, Samuel S. P. Shen
Abstract: Accurate precipitation forecasting is essential for hydrometeorological risk management, especially for anticipating extreme rainfall that can lead to flash flooding and infrastructure damage. This study introduces a diffusion-based deep learning (DL) framework that systematically compares three residual prediction strategies differing only in their input sources: (1) a fully data-driven model using only past observations from the Multi-Radar Multi-Sensor (MRMS) system, (2) a corrective model using only forecasts from the High-Resolution Rapid Refresh (HRRR) numerical weather prediction system, and (3) a hybrid model integrating both MRMS and selected HRRR forecast variables. By evaluating these approaches under a unified setup, we provide a clearer understanding of how each data source contributes to predictive skill over the Continental United States (CONUS). Forecasts are produced at 1-km spatial resolution, beginning with direct 1-hour predictions and extending to 12 hours using autoregressive rollouts. Performance is evaluated using both CONUS-wide and region-specific metrics that assess overall performance and skill at extreme rainfall thresholds. Across all lead times, our DL framework consistently outperforms the HRRR baseline in pixel-wise and spatiostatistical metrics. The hybrid model performs best at the shortest lead time, while the HRRR-corrective model outperforms others at longer lead times, maintaining high skill through 12 hours. To assess reliability, we incorporate calibrated uncertainty quantification tailored to the residual learning setup. These gains, particularly at longer lead times, are critical for emergency preparedness, where modest increases in forecast horizon can improve decision-making. This work advances DL-based precipitation forecasting by enhancing predictive skill, reliability, and applicability across regions.
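All three strategies share two structural ideas: residual prediction (the network predicts a correction added back to a reference forecast) and autoregressive rollout (1-hour predictions fed back to reach 12 hours). A schematic scalar stand-in for both; the real models operate on 2-D radar fields with diffusion U-Nets, and every function here is invented:

```python
def make_step(baseline_model, residual_model):
    # one forecast step: reference prediction plus learned residual correction
    return lambda state: baseline_model(state) + residual_model(state)

def rollout(step, state, steps):
    # autoregressive rollout: each prediction is fed back as the next input
    out = []
    for _ in range(steps):
        state = step(state)
        out.append(state)
    return out

# scalar stand-ins: a persistence baseline, and a "residual" adding 10% growth
step = make_step(lambda s: s, lambda s: 0.1 * s)
traj = rollout(step, 1.0, 3)   # three "hours" ahead
```

The rollout also shows why errors compound with lead time: each step consumes the previous step's (possibly imperfect) output.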


【7】LUMOS: Large User MOdels for User Behavior Prediction
链接:https://arxiv.org/abs/2512.08957

Authors: Dhruv Nigam
Abstract: User behavior prediction at scale remains a critical challenge for online B2C platforms. Traditional approaches rely heavily on task-specific models and domain-specific feature engineering. This is time-consuming, computationally expensive, and requires domain expertise, and is therefore not scalable. We present LUMOS (Large User MOdel Series), a transformer-based architecture that eliminates task-specific models and manual feature engineering by learning multiple tasks jointly using only raw user activity data. LUMOS introduces a novel cross-attention mechanism that conditions predictions on future known events (e.g., holidays, sales, etc.), enabling the model to predict complex behavior patterns like "how will upcoming holidays affect user engagement?" The architecture also employs multi-modal tokenization, combining user transactions, event context, and static user demographic attributes into rich representations processed through specialized embedding pathways. Through extensive experiments on a production dataset spanning 275 billion user activity tokens from 250 million users, we demonstrate that LUMOS achieves superior performance compared to traditional task-specific models. Across 5 tasks with established baselines, we achieve an average improvement of 0.025 in ROC-AUC for binary classification tasks and a 4.6% reduction in MAPE for regression tasks. Online A/B testing validates that these improvements translate to measurable business impact, with a 3.15% increase in Daily Active Users.
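The distinguishing component is cross-attention in which queries come from the user's activity history and keys/values come from embeddings of future known events. A minimal single-head numpy sketch; dimensions, weights, and data are invented, and the real model presumably uses multi-head attention inside a transformer:

```python
import numpy as np

def cross_attention(history, future_events, Wq, Wk, Wv):
    """history: (T, d) user-activity states; future_events: (E, d) known-event embeddings."""
    Q = history @ Wq              # queries from past behavior
    K = future_events @ Wk        # keys/values from future known events
    V = future_events @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over events
    return weights @ V            # event-conditioned user representation

rng = np.random.default_rng(0)
d = 8
history = rng.normal(size=(5, d))
events = rng.normal(size=(3, d))      # e.g. two holidays and a sale, as embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = cross_attention(history, events, Wq, Wk, Wv)
```

Each history position gets a weighted mix of event values, which is how "the upcoming holiday" can modulate a behavior forecast.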


【8】Robust and Sparse Estimation of Unbounded Density Ratio under Heavy Contamination
链接:https://arxiv.org/abs/2512.09266

Authors: Ryosuke Nagumo, Hironori Fujisawa
Abstract: We examine the non-asymptotic properties of robust density ratio estimation (DRE) in contaminated settings. Weighted DRE is the most promising among existing methods, exhibiting doubly strong robustness from an asymptotic perspective. This study demonstrates that Weighted DRE achieves sparse consistency even under heavy contamination within a non-asymptotic framework. This method addresses two significant challenges in density ratio estimation and robust estimation. For density ratio estimation, we provide the non-asymptotic properties of estimating unbounded density ratios under the assumption that the weighted density ratio function is bounded. For robust estimation, we introduce a non-asymptotic framework for doubly strong robustness under heavy contamination, assuming that at least one of the following conditions holds: (i) contamination ratios are small, or (ii) outliers have small weighted values. This work provides the first non-asymptotic analysis of strong robustness under heavy contamination.


【9】FuXi-Nowcast: Meet the longstanding challenge of convective initiation in nowcasting
链接:https://arxiv.org/abs/2512.08974

Authors: Lei Chen, Zijian Zhu, Xiaoran Zhuang, Tianyuan Qi, Yuxuan Feng, Xiaohui Zhong, Hao Li
Abstract: Accurate nowcasting of convective storms remains a major challenge for operational forecasting, particularly for convective initiation and the evolution of high-impact rainfall and strong winds. Here we present FuXi-Nowcast, a deep-learning system that jointly predicts composite radar reflectivity, surface precipitation, near-surface temperature, wind speed, and wind gusts at 1-km resolution over eastern China. FuXi-Nowcast integrates multi-source observations, such as radar, surface stations, and the High-Resolution Land Data Assimilation System (HRLDAS), with three-dimensional atmospheric fields from the machine-learning weather model FuXi-2.0 within a multi-task Swin-Transformer architecture. A convective signal enhancement module and distribution-aware hybrid loss functions are designed to preserve intense convective structures and mitigate the rapid intensity decay common in deep-learning nowcasts. FuXi-Nowcast surpasses the operational CMA-MESO 3-km numerical model in Critical Success Index for reflectivity, precipitation, and wind gusts across thresholds and lead times up to 12 h, with the largest gains for heavy rainfall. Case studies further show that FuXi-Nowcast more accurately captures the timing, location, and structure of convective initiation and the subsequent evolution of convection. These results demonstrate that coupling three-dimensional machine-learning forecasts with high-resolution observations can provide multi-hazard, long-lead nowcasts that outperform current operational systems.
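Skill here is measured with the Critical Success Index: at a given rainfall threshold, CSI = hits / (hits + misses + false alarms), ignoring correct rejections. A small sketch with invented toy values:

```python
def csi(obs, pred, threshold):
    """Critical Success Index for exceedances of a threshold."""
    hits = misses = false_alarms = 0
    for o, p in zip(obs, pred):
        event, warned = o >= threshold, p >= threshold
        if event and warned:
            hits += 1
        elif event:
            misses += 1
        elif warned:
            false_alarms += 1
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")

obs  = [0.0, 5.0, 12.0, 30.0, 2.0, 18.0]   # mm/h, toy grid-point values
pred = [0.0, 7.0,  3.0, 25.0, 9.0, 20.0]
score = csi(obs, pred, threshold=10.0)
```

Because correct "no rain" points are excluded from the denominator, CSI is a demanding score for rare heavy-rain thresholds.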


【10】Multivariate time series prediction using clustered echo state network
链接:https://arxiv.org/abs/2512.08963

Authors: S. Hariharan, R. Suresh, V. K. Chandrasekar
Note: Published in Eur. Phys. J. Plus 140, 1133 (2025)
Abstract: Many natural and physical processes can be understood by analyzing the evolution of multiple system variables, which together form a multivariate time series. Predicting such time series is challenging due to the inherent noise and interdependencies among variables. Echo state networks (ESNs), a class of Reservoir Computing (RC) models, offer an efficient alternative to conventional recurrent neural networks by training only the output weights while keeping the reservoir dynamics fixed, reducing computational complexity. We propose clustered ESNs (CESNs), which enhance the ability to model and predict multivariate time series by organizing the reservoir nodes into clusters, each corresponding to a distinct input variable. Input signals are directly mapped to their associated clusters, and intra-cluster connections remain dense while inter-cluster connections are sparse, mimicking the modular architecture of biological neural networks. This architecture improves information processing by limiting cross-variable interference and enhances computational efficiency through independent cluster-wise training via ridge regression. We further explore different reservoir topologies, including ring, Erdős-Rényi (ER), and scale-free (SF) networks, to evaluate their impact on predictive performance. Our algorithm works well across diverse real-world datasets such as the stock market, solar wind, and the chaotic Rössler system, demonstrating that CESNs consistently outperform conventional ESNs in terms of predictive accuracy and robustness to noise, particularly when using ER and SF topologies. These findings highlight the adaptability of CESNs for complex, multivariate time series forecasting.
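A compact numpy sketch of the clustered-reservoir idea: each input drives its own dense cluster, inter-cluster links are sparse, and only a ridge-regression readout is trained. All sizes, the 5% sparsity, and the toy next-step task are invented; the paper's ring/ER/SF topologies and per-cluster training are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)
n_inputs, nodes_per_cluster = 2, 30
N = n_inputs * nodes_per_cluster

# block mask: dense within clusters, sparse (5%) across clusters
mask = np.kron(np.eye(n_inputs), np.ones((nodes_per_cluster, nodes_per_cluster)))
mask = np.maximum(mask, rng.random((N, N)) < 0.05)
W = rng.normal(size=(N, N)) * mask
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))     # spectral radius 0.9

# each input variable feeds only its own cluster
Win = np.zeros((N, n_inputs))
for i in range(n_inputs):
    Win[i * nodes_per_cluster:(i + 1) * nodes_per_cluster, i] = \
        rng.normal(size=nodes_per_cluster)

# toy bivariate series (sine, cosine); task: predict the next value
t = np.arange(400) * 0.1
U = np.stack([np.sin(t), np.cos(t)], axis=1)
x, states = np.zeros(N), []
for u in U[:-1]:
    x = np.tanh(W @ x + Win @ u)                    # fixed reservoir dynamics
    states.append(x)
S, Y = np.array(states[50:]), U[51:]                # drop a washout period

# ridge-regression readout: the only trained weights
Wout = np.linalg.solve(S.T @ S + 1e-6 * np.eye(N), S.T @ Y)
mse = np.mean((S @ Wout - Y) ** 2)
```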


Other Neural Networks | Deep Learning | Models | Modeling (16 papers)

【1】Closing the Train-Test Gap in World Models for Gradient-Based Planning
链接:https://arxiv.org/abs/2512.09929

Authors: Arjun Parthasarathy, Nimit Kalra, Rohun Agrawal, Yann LeCun, Oumayma Bounou, Pavel Izmailov, Micah Goldblum
Abstract: World models paired with model predictive control (MPC) can be trained offline on large-scale datasets of expert trajectories and enable generalization to a wide range of planning tasks at inference time. Compared to traditional MPC procedures, which rely on slow search algorithms or on iteratively solving optimization problems exactly, gradient-based planning offers a computationally efficient alternative. However, the performance of gradient-based planning has thus far lagged behind that of other approaches. In this paper, we propose improved methods for training world models that enable efficient gradient-based planning. We begin with the observation that although a world model is trained on a next-state prediction objective, it is used at test time to instead estimate a sequence of actions. The goal of our work is to close this train-test gap. To that end, we propose train-time data synthesis techniques that enable significantly improved gradient-based planning with existing world models. At test time, our approach outperforms or matches the classical gradient-free cross-entropy method (CEM) across a variety of object manipulation and navigation tasks in 10% of the time budget.
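Gradient-based planning descends the gradient of a cost through a differentiable world model to refine an action sequence, rather than sampling candidates as CEM does. A toy version on 1-D point-mass dynamics, using finite differences in place of autodiff; the dynamics, cost, and hyperparameters are all invented for illustration:

```python
def simulate(actions, pos=0.0, vel=0.0, dt=0.1):
    # toy "world model": a 1-D point mass pushed by the action sequence
    for a in actions:
        vel += a * dt
        pos += vel * dt
    return pos

def cost(actions, goal=1.0):
    # terminal distance to the goal plus a small action-energy penalty
    return (simulate(actions) - goal) ** 2 + 1e-4 * sum(a * a for a in actions)

def plan(horizon=10, iters=200, lr=0.5, eps=1e-5):
    actions = [0.0] * horizon
    for _ in range(iters):
        grad = []
        for i in range(horizon):          # finite-difference gradient of the cost
            bumped = actions[:]
            bumped[i] += eps
            grad.append((cost(bumped) - cost(actions)) / eps)
        actions = [a - lr * g for a, g in zip(actions, grad)]
    return actions

actions = plan()
final_pos = simulate(actions)   # should end close to the goal at 1.0
```

With a real learned world model, autodiff replaces the finite differences, and the quality of these gradients is exactly what the paper's train-time data synthesis aims to improve.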


【2】Fast Factorized Learning: Powered by In-Memory Database Systems
链接:https://arxiv.org/abs/2512.09836

Authors: Bernhard Stöckl, Maximilian E. Schüle
Abstract: Learning models over factorized joins avoids redundant computations by identifying and pre-computing shared cofactors. Previous work has investigated the performance gain when computing cofactors on traditional disk-based database systems. Due to the absence of published code, the experiments could not be reproduced on in-memory database systems. This work describes the implementation when using cofactors for in-database factorized learning. We benchmark our open-source implementation for learning linear regression on factorized joins with PostgreSQL -- as a disk-based database system -- and HyPer -- as an in-memory engine. The evaluation shows a performance gain for factorized learning on in-memory database systems of 70% over non-factorized learning, and of a factor of 100 compared to disk-based database systems. Thus, modern database engines can contribute to the machine learning pipeline by pre-computing aggregates prior to data extraction to accelerate training.
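The cofactor trick: a linear regression's sufficient statistics (the entries of X^T X and X^T y) are sums of products, so they can be aggregated per relation and combined, and the join never has to be materialized. A minimal pure-Python sketch over an invented two-table schema showing that both routes produce identical aggregates:

```python
from collections import defaultdict

# invented toy schema: R maps key -> feature x; S holds (key, feature z, target y)
R = {1: 2.0, 2: -1.0, 3: 0.5}
S = [(1, 1.0, 3.0), (1, 2.0, 5.0), (2, 0.5, -1.0), (3, 4.0, 2.0), (3, 1.0, 1.0)]

def cofactors_materialized():
    """Baseline: build the full join, then sum products row by row."""
    rows = [(R[k], z, y) for k, z, y in S]
    return (sum(x * x for x, _, _ in rows),     # sum x^2
            sum(x * z for x, z, _ in rows),     # sum x*z
            sum(x * y for x, _, y in rows))     # sum x*y

def cofactors_factorized():
    """Factorized: per-key aggregates over S only; x joins in once per key."""
    agg = defaultdict(lambda: [0, 0.0, 0.0])    # count, sum z, sum y
    for k, z, y in S:
        agg[k][0] += 1
        agg[k][1] += z
        agg[k][2] += y
    sxx = sum(R[k] ** 2 * c for k, (c, _, _) in agg.items())
    sxz = sum(R[k] * sz for k, (_, sz, _) in agg.items())
    sxy = sum(R[k] * sy for k, (_, _, sy) in agg.items())
    return (sxx, sxz, sxy)
```

In a real engine these per-key aggregates are SQL GROUP BY queries, which is why pushing them into the database pays off so strongly in-memory.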


【3】Interpretation as Linear Transformation: A Cognitive-Geometric Model of Belief and Meaning
链接:https://arxiv.org/abs/2512.09831

Authors: Chainarong Amornbunchornvej
Note: The first draft of the cognitive geometry model
Abstract: This paper develops a geometric framework for modeling belief, motivation, and influence across cognitively heterogeneous agents. Each agent is represented by a personalized value space, a vector space encoding the internal dimensions through which the agent interprets and evaluates meaning. Beliefs are formalized as structured vectors (abstract beings) whose transmission is mediated by linear interpretation maps. A belief survives communication only if it avoids the null spaces of these maps, yielding a structural criterion for intelligibility, miscommunication, and belief death. Within this framework, I show how belief distortion, motivational drift, counterfactual evaluation, and the limits of mutual understanding arise from purely algebraic constraints. A central result, the "No-Null-Space Leadership Condition", characterizes leadership as a property of representational reachability rather than persuasion or authority. More broadly, the model explains how abstract beings can propagate, mutate, or disappear as they traverse diverse cognitive geometries. The account unifies insights from conceptual spaces, social epistemology, and AI value alignment by grounding meaning preservation in structural compatibility rather than shared information or rationality. I argue that this cognitive-geometric perspective clarifies the epistemic boundaries of influence in both human and artificial systems, and offers a general foundation for analyzing belief dynamics across heterogeneous agents.
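The survival criterion is plainly linear-algebraic: a belief vector b transmitted through an interpretation map A "dies" exactly when b lies in null(A). A tiny numerical illustration; the particular map and vectors are invented, not taken from the paper:

```python
def interpret(A, b):
    """Apply a linear interpretation map; rows of A are the receiver's value dimensions."""
    return [sum(a * x for a, x in zip(row, b)) for row in A]

# a receiver that only perceives the difference of dimensions 0 and 1, plus dimension 2
A = [[1.0, -1.0, 0.0],
     [0.0,  0.0, 1.0]]

b_survives = [2.0, 0.0, 1.0]   # maps to a nonzero vector: intelligible
b_dies     = [1.0, 1.0, 0.0]   # lies in null(A): maps to zero, "belief death"
```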


【4】Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers
链接:https://arxiv.org/abs/2512.09800

Authors: Zhaolan Huang, Kaspar Schleiser, Gyungmin Myung, Emmanuel Baccelli
Abstract: Low-power microcontroller (MCU) hardware is currently evolving from single-core architectures to predominantly multi-core architectures. In parallel, new embedded software building blocks are increasingly written in Rust, while C/C++ dominance fades in this domain. On the other hand, small artificial neural networks (ANNs) of various kinds are more and more deployed in edge AI use cases, and thus executed directly on low-power MCUs. In this context, both incremental improvements and novel innovative services will have to be continuously retrofitted, via ANN execution in software, onto sensing/actuating systems already deployed in the field. However, there has so far been no embedded Rust software platform that automates parallelization of inference computation on multi-core MCUs executing arbitrary TinyML models. This paper fills this gap by introducing Ariel-ML, a novel toolkit we designed that combines a generic TinyML pipeline with an embedded Rust software platform able to take full advantage of the multi-core capabilities of various 32-bit microcontroller families (Arm Cortex-M, RISC-V, ESP32). We published the full open-source code of its implementation, which we used to benchmark its capabilities on a zoo of TinyML models. We show that Ariel-ML outperforms prior art in terms of inference latency, as expected, and that, compared to pre-existing toolkits using embedded C/C++, it achieves comparable memory footprints. Ariel-ML thus provides a useful basis for TinyML practitioners and resource-constrained embedded Rust developers.


【5】Semantic-Aware Cooperative Communication and Computation Framework in Vehicular Networks
链接:https://arxiv.org/abs/2512.09621

Authors: Jingbo Zhang, Maoxin Ji, Qiong Wu, Pingyi Fan, Kezhi Wang, Wen Chen
Abstract: Semantic Communication (SC) combined with Vehicular Edge Computing (VEC) provides an efficient edge task processing paradigm for the Internet of Vehicles (IoV). Focusing on highway scenarios, this paper proposes a Tripartite Cooperative Semantic Communication (TCSC) framework, which enables Vehicle Users (VUs) to perform semantic task offloading via Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) communications. Considering task latency and the number of semantic symbols, the framework constructs a Mixed-Integer Nonlinear Programming (MINLP) problem, which is transformed into two subproblems. First, we innovatively propose a multi-agent proximal policy optimization task offloading optimization method based on parametric distribution noise (MAPPO-PDN) to solve the optimization problem for the number of semantic symbols; second, linear programming (LP) is used to solve for the offloading ratio. Simulations show that the performance of this scheme is superior to that of other algorithms.


【6】Stanford Sleep Bench: Evaluating Polysomnography Pre-training Methods for Sleep Foundation Models
链接:https://arxiv.org/abs/2512.09591

Authors: Magnus Ruud Kjaer, Rahul Thapa, Gauri Ganjoo, Hyatt Moore, Poul Joergen Jennum, Brandon M. Westover, James Zou, Emmanuel Mignot, Bryan He, Andreas Brink-Kjaer
Abstract: Polysomnography (PSG), the gold standard test for sleep analysis, generates vast amounts of multimodal clinical data, presenting an opportunity to leverage self-supervised representation learning (SSRL) for pre-training foundation models to enhance sleep analysis. However, progress in sleep foundation models is hindered by two key limitations: (1) the lack of a shared dataset and benchmark with diverse tasks for training and evaluation, and (2) the absence of a systematic evaluation of SSRL approaches across sleep-related tasks. To address these gaps, we introduce Stanford Sleep Bench, a large-scale PSG dataset comprising 17,467 recordings totaling over 163,000 hours from a major sleep clinic, including 13 clinical disease prediction tasks alongside canonical sleep-related tasks such as sleep staging, apnea diagnosis, and age estimation. We systematically evaluate SSRL pre-training methods on Stanford Sleep Bench, assessing downstream performance across four tasks: sleep staging, apnea diagnosis, age estimation, and disease and mortality prediction. Our results show that multiple pretraining methods achieve comparable performance for sleep staging, apnea diagnosis, and age estimation. However, for mortality and disease prediction, contrastive learning significantly outperforms other approaches while also converging faster during pretraining. To facilitate reproducibility and advance sleep research, we will release Stanford Sleep Bench along with pretrained model weights, training pipelines, and evaluation code.
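Contrastive learning was the standout pretraining method for disease and mortality prediction. Its canonical objective is InfoNCE, which pulls two augmented views of the same recording together against in-batch negatives; the abstract does not specify the exact loss used, so this is a generic numpy sketch with invented embeddings:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """z1[i] and z2[i] are embeddings of two views of recording i."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                # (B, B) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # positives on the diagonal

rng = np.random.default_rng(0)
base = rng.normal(size=(4, 16))                     # stand-in recording embeddings
noise = lambda: 0.01 * rng.normal(size=(4, 16))
aligned = info_nce(base, base + noise())            # matched views: low loss
mismatched = info_nce(base, np.roll(base, 1, axis=0) + noise())
```

The loss is low only when each recording's two views are more similar to each other than to the rest of the batch.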


【7】Rates and architectures for learning geometrically non-trivial operators
链接:https://arxiv.org/abs/2512.09376

Authors: T. Mitchell Roddenberry, Leo Tzou, Ivan Dokmanić, Maarten V. de Hoop, Richard G. Baraniuk
Note: 26 pages, 5 figures
Abstract: Deep learning methods have proven capable of recovering operators between high-dimensional spaces, such as solution maps of PDEs and similar objects in mathematical physics, from very few training samples. This phenomenon of data-efficiency has been proven for certain classes of elliptic operators with simple geometry, i.e., operators that do not change the domain of the function or propagate singularities. However, scientific machine learning is commonly used for problems that do involve the propagation of singularities in a priori unknown ways, such as waves, advection, and fluid dynamics. In light of this, we expand the learning theory to include double fibration transforms -- geometric integral operators that include generalized Radon and geodesic ray transforms. We prove that this class of operators does not suffer from the curse of dimensionality: the error decays superalgebraically, that is, faster than any fixed power of the reciprocal of the number of training samples. Furthermore, we investigate architectures that explicitly encode the geometry of these transforms, demonstrating that an architecture reminiscent of cross-attention based on level-set methods yields a parameterization that is universal, stable, and learns double fibration transforms from very few training examples. Our results contribute to a rapidly-growing line of theoretical work on learning operators for scientific machine learning.


【8】Improved Physics-Driven Neural Network to Solve Inverse Scattering Problems
链接:https://arxiv.org/abs/2512.09333

Authors: Yutong Du, Zicheng Liu, Bo Wu, Jingwei Kou, Hang Li, Changyou Li, Yali Zong, Bo Qi
Abstract: This paper presents an improved physics-driven neural network (IPDNN) framework for solving electromagnetic inverse scattering problems (ISPs). A new Gaussian-localized oscillation-suppressing window (GLOW) activation function is introduced to stabilize convergence and enable a lightweight yet accurate network architecture. A dynamic scatter subregion identification strategy is further developed to adaptively refine the computational domain, preventing missed detections and reducing computational cost. Moreover, transfer learning is incorporated to extend the solver's applicability to practical scenarios, integrating the physical interpretability of iterative algorithms with the real-time inference capability of neural networks. Numerical simulations and experimental results demonstrate that the proposed solver achieves superior reconstruction accuracy, robustness, and efficiency compared with existing state-of-the-art methods.


【9】Hetero-SplitEE: Split Learning of Neural Networks with Early Exits for Heterogeneous IoT Devices
标题:Hetero-SplitEE:面向异构物联网设备的带早期退出的神经网络拆分学习
链接:https://arxiv.org/abs/2512.09313

作者:Yuki Oda,Yuta Ono,Hiroshi Nakamura,Hideki Takase
备注:8 pages. Accepted at MCSoC 2025
摘要:深度神经网络的持续扩展从根本上改变了机器学习,更大的模型在各类任务中表现出更好的性能。模型规模的增长极大地增加了训练过程所需的计算资源。因此,联邦学习和拆分学习等分布式方法已成为可扩展部署的基本范式。然而,现有的拆分学习方法假设客户端同质,且所有参与者采用统一的分割点。这严重限制了它们在现实世界物联网系统中的适用性,因为设备在计算资源上表现出异构性。为了解决这一限制,本文提出了Hetero-SplitEE,一种使异构物联网设备能够并行协作地训练共享深度神经网络的新方法。通过将异构早期退出集成到分层训练中,我们的方法允许每个客户端根据其计算能力选择不同的分割点(切割层)。此外,我们提出了两种协作训练策略,即顺序策略和平均策略,以促进具有不同分割点的客户端之间的协作。顺序策略使用共享服务器模型顺序训练客户端,以减少计算开销;平均策略通过定期跨层聚合实现并行客户端训练。使用ResNet-18在CIFAR-10、CIFAR-100和STL-10数据集上的大量实验表明,我们的方法在保持有竞争力的准确率的同时,有效支持多样的计算约束,从而能够在异构物联网生态系统中实际部署协作深度学习。
摘要:The continuous scaling of deep neural networks has fundamentally transformed machine learning, with larger models demonstrating improved performance across diverse tasks. This growth in model size has dramatically increased the computational resources required for the training process. Consequently, distributed approaches, such as Federated Learning and Split Learning, have become essential paradigms for scalable deployment. However, existing Split Learning approaches assume client homogeneity and uniform split points across all participants. This critically limits their applicability to real-world IoT systems where devices exhibit heterogeneity in computational resources. To address this limitation, this paper proposes Hetero-SplitEE, a novel method that enables heterogeneous IoT devices to train a shared deep neural network in parallel collaboratively. By integrating heterogeneous early exits into hierarchical training, our approach allows each client to select distinct split points (cut layers) tailored to its computational capacity. In addition, we propose two cooperative training strategies, the Sequential strategy and the Averaging strategy, to facilitate this collaboration among clients with different split points. The Sequential strategy trains clients sequentially with a shared server model to reduce computational overhead. The Averaging strategy enables parallel client training with periodic cross-layer aggregation. Extensive experiments on CIFAR-10, CIFAR-100, and STL-10 datasets using ResNet-18 demonstrate that our method maintains competitive accuracy while efficiently supporting diverse computational constraints, enabling practical deployment of collaborative deep learning in heterogeneous IoT ecosystems.
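The split-point idea above can be sketched in a few lines: a client runs the shared network only up to its own cut layer, and the server completes the pass from there. The toy stack of ReLU layers and all shapes below are illustrative assumptions:

```python
# Minimal sketch of heterogeneous split points: each client computes the
# shared network up to its own cut layer, then hands activations to the
# server. Layer sizes and the random weights are illustrative.
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 8)) * 0.1 for _ in range(4)]

def client_forward(x, cut):
    for w in layers[:cut]:        # client-side portion of the network
        x = np.maximum(x @ w, 0)  # ReLU
    return x

def server_forward(h, cut):
    for w in layers[cut:]:        # server completes the forward pass
        h = np.maximum(h @ w, 0)
    return h
```

Whatever cut a client picks, composing its partial pass with the server's remainder reproduces the full forward pass, which is what lets clients with different capacities share one model.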


【10】Natural Geometry of Robust Data Attribution: From Convex Models to Deep Networks
标题:鲁棒数据归因的自然几何:从凸模型到深度网络
链接:https://arxiv.org/abs/2512.09103

作者:Shihao Li,Jiachen Li,Dongmei Chen
摘要:数据归因方法识别哪些训练样本对模型的预测负责,但它们对分布扰动的敏感性削弱了实际可靠性。我们提出了一个从凸模型扩展到深度网络的认证鲁棒归因统一框架。对于凸设置,我们推导出具有可证明覆盖保证的Wasserstein鲁棒影响函数(W-RIF)。对于深度网络,我们证明欧几里得认证会因谱放大而失效:在这一机制中,深度表示固有的病态条件数将Lipschitz界放大超过10,000倍。这解释了为什么标准TRAK分数虽然是精确的点估计,却在几何上脆弱:朴素的欧几里得鲁棒性分析产生0%的认证。我们的主要贡献是自然Wasserstein度量,它在模型自身特征协方差诱导的几何中度量扰动。这消除了谱放大,将最坏情况灵敏度降低了76倍,并稳定了归因估计。在使用ResNet-18的CIFAR-10上,Natural W-TRAK认证了68.7%的排名对,而欧几里得基线为0%;据我们所知,这是神经网络归因的第一个非平凡认证界。此外,我们证明了分析中出现的自影响(Self-Influence)项等于控制归因稳定性的Lipschitz常数,为基于杠杆的异常检测提供了理论基础。经验上,Self-Influence在标签噪声检测上达到0.970 AUROC,仅通过检查前20%的训练数据即可识别94.1%的损坏标签。
摘要:Data attribution methods identify which training examples are responsible for a model's predictions, but their sensitivity to distributional perturbations undermines practical reliability. We present a unified framework for certified robust attribution that extends from convex models to deep networks. For convex settings, we derive Wasserstein-Robust Influence Functions (W-RIF) with provable coverage guarantees. For deep networks, we demonstrate that Euclidean certification is rendered vacuous by spectral amplification -- a mechanism where the inherent ill-conditioning of deep representations inflates Lipschitz bounds by over $10{,}000\times$. This explains why standard TRAK scores, while accurate point estimates, are geometrically fragile: naive Euclidean robustness analysis yields 0\% certification. Our key contribution is the Natural Wasserstein metric, which measures perturbations in the geometry induced by the model's own feature covariance. This eliminates spectral amplification, reducing worst-case sensitivity by $76\times$ and stabilizing attribution estimates. On CIFAR-10 with ResNet-18, Natural W-TRAK certifies 68.7\% of ranking pairs compared to 0\% for Euclidean baselines -- to our knowledge, the first non-vacuous certified bounds for neural network attribution. Furthermore, we prove that the Self-Influence term arising from our analysis equals the Lipschitz constant governing attribution stability, providing theoretical grounding for leverage-based anomaly detection. Empirically, Self-Influence achieves 0.970 AUROC for label noise detection, identifying 94.1\% of corrupted labels by examining just the top 20\% of training data.
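The "natural" metric idea, measuring perturbations after whitening by the model's feature covariance, can be illustrated numerically: whitening collapses the ill-conditioning that drives the spectral amplification described above. The synthetic features, dimensions, and scale factors below are assumptions for illustration only:

```python
# Sketch of the natural-metric idea: measure perturbations in the
# geometry induced by the feature covariance, i.e. whiten by
# Sigma^{-1/2} before computing sensitivities.
import numpy as np

rng = np.random.default_rng(1)
# Features with wildly different scales -> ill-conditioned covariance.
F = rng.standard_normal((200, 5)) * np.array([100.0, 10.0, 1.0, 0.1, 0.01])
Sigma = F.T @ F / len(F)

evals, evecs = np.linalg.eigh(Sigma)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T   # Sigma^{-1/2} whitening map
Sigma_nat = W @ Sigma @ W                      # covariance in natural metric
```

In the Euclidean geometry the condition number of `Sigma` is enormous (the source of the >10,000x Lipschitz inflation), while in the whitened "natural" geometry it is close to 1, so worst-case sensitivities no longer blow up along the dominant feature directions.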


【11】GS-KAN: Parameter-Efficient Kolmogorov-Arnold Networks via Sprecher-Type Shared Basis Functions
标题:GS-KAN:通过Sprecher型共享基函数实现参数高效的Kolmogorov-Arnold网络
链接:https://arxiv.org/abs/2512.09084

作者:Oscar Eliasson
备注:10 pages
摘要:Kolmogorov-Arnold表示定理通过将可学习的单变量函数放置在边上而不是节点上,为多层感知器(MLP)提供了一种理论替代方案。虽然Kolmogorov-Arnold网络(KAN)等最近的实现表现出很高的逼近能力,但由于需要为每条网络边维持独立的参数化,它们存在显著的参数低效问题。在这项工作中,我们提出了GS-KAN(Generalized Sprecher-KAN),一个受David Sprecher对叠加定理的细化启发的轻量级架构。GS-KAN通过对每层单个可学习的共享父函数施加可学习的线性变换来构造各条边上的函数。我们在合成函数逼近、表格数据回归和图像分类任务上,将GS-KAN与现有KAN架构和MLP进行比较。结果表明,GS-KAN在连续函数逼近任务上优于MLP和标准KAN基线,同时保持更优的参数效率。此外,GS-KAN在表格回归上与现有KAN架构性能相当,并在高维分类任务上优于MLP。至关重要的是,所提出的架构使得在严格参数约束下的高维场景中部署基于KAN的架构成为可能;在这种场景下,由于参数爆炸,标准实现通常不可行。源代码可在https://github.com/rambamn48/gs-impl上获得。
摘要:The Kolmogorov-Arnold representation theorem offers a theoretical alternative to Multi-Layer Perceptrons (MLPs) by placing learnable univariate functions on edges rather than nodes. While recent implementations such as Kolmogorov-Arnold Networks (KANs) demonstrate high approximation capabilities, they suffer from significant parameter inefficiency due to the requirement of maintaining unique parameterizations for every network edge. In this work, we propose GS-KAN (Generalized Sprecher-KAN), a lightweight architecture inspired by David Sprecher's refinement of the superposition theorem. GS-KAN constructs unique edge functions by applying learnable linear transformations to a single learnable, shared parent function per layer. We evaluate GS-KAN against existing KAN architectures and MLPs across synthetic function approximation, tabular data regression and image classification tasks. Our results demonstrate that GS-KAN outperforms both MLPs and standard KAN baselines on continuous function approximation tasks while maintaining superior parameter efficiency. Additionally, GS-KAN achieves competitive performance with existing KAN architectures on tabular regression and outperforms MLPs on high-dimensional classification tasks. Crucially, the proposed architecture enables the deployment of KAN-based architectures in high-dimensional regimes under strict parameter constraints, a setting where standard implementations are typically infeasible due to parameter explosion. The source code is available at https://github.com/rambamn48/gs-impl.
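The core GS-KAN construction, one shared learnable parent function per layer with per-edge affine reparameterizations of its input, can be sketched as follows. The Fourier-series parent and all dimensions are illustrative assumptions; the paper's actual parameterization may differ:

```python
# Sketch of a GS-KAN-style layer: every edge evaluates the SAME shared
# univariate "parent" function, but on its own affine transform of the
# input, so parameters grow with edges only through scalars a, b.
import numpy as np

class GSKANLayer:
    def __init__(self, d_in, d_out, n_freq=4, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.standard_normal((d_out, d_in))    # per-edge scale
        self.b = rng.standard_normal((d_out, d_in))    # per-edge shift
        self.coef = rng.standard_normal(n_freq) * 0.1  # shared parent params

    def parent(self, t):
        # Single shared univariate function (a small Fourier series here).
        return sum(c * np.sin((k + 1) * t) for k, c in enumerate(self.coef))

    def __call__(self, x):                             # x: shape (d_in,)
        # Apply the shared parent on every edge's affine-transformed input,
        # then sum incoming edges per output node.
        return self.parent(self.a * x[None, :] + self.b).sum(axis=1)
```

Compared with a standard KAN, which stores a full spline per edge, only the two scalar arrays grow with the layer size; the parent's coefficients are shared, which is the source of the parameter efficiency claimed above.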


【12】Contrast transfer functions help quantify neural network out-of-distribution generalization in HRTEM
标题:对比传递函数有助于量化HRTEM中神经网络的分布外泛化
链接:https://arxiv.org/abs/2512.09067

作者:Luis Rangel DaCosta,Mary C. Scott
摘要:神经网络虽然能有效解决许多具有挑战性的科学任务,但在分布外(OOD),即与其训练数据不同的域中,往往表现不佳。理解神经网络的OOD泛化对其在实验工作流程中的成功部署至关重要,特别是当难以建立关于实验的真值知识或实验条件显著变化时。凭借对真值信息的固有访问和对底层分布的细粒度控制,基于模拟的数据构建有助于精确考察OOD泛化行为。在这里,我们针对纳米颗粒高分辨率透射电子显微镜(HRTEM)成像的神经网络分割模型,探究其对成像条件的泛化:使用通过随机结构采样和多层(multislice)模拟生成的合成数据,训练并测量了超过12,000个神经网络的OOD泛化。借助HRTEM对比度传递函数,我们进一步开发了一个框架来比较HRTEM数据集的信息内容并量化OOD域偏移。我们证明神经网络分割模型具有显著的性能稳定性,但随着成像条件偏离训练分布,性能会平滑且可预测地下降。最后,我们讨论了我们的方法在解释其他OOD偏移(如原子结构变化)方面的局限性,并讨论了在此类设置中理解泛化的补充技术。
摘要:Neural networks, while effective for tackling many challenging scientific tasks, are not known to perform well out-of-distribution (OOD), i.e., within domains which differ from their training data. Understanding neural network OOD generalization is paramount to their successful deployment in experimental workflows, especially when ground-truth knowledge about the experiment is hard to establish or experimental conditions significantly vary. With inherent access to ground-truth information and fine-grained control of underlying distributions, simulation-based data curation facilitates precise investigation of OOD generalization behavior. Here, we probe generalization with respect to imaging conditions of neural network segmentation models for high-resolution transmission electron microscopy (HRTEM) imaging of nanoparticles, training and measuring the OOD generalization of over 12,000 neural networks using synthetic data generated via random structure sampling and multislice simulation. Using the HRTEM contrast transfer function, we further develop a framework to compare information content of HRTEM datasets and quantify OOD domain shifts. We demonstrate that neural network segmentation models enjoy significant performance stability, but will smoothly and predictably worsen as imaging conditions shift from the training distribution. Lastly, we consider limitations of our approach in explaining other OOD shifts, such as of the atomic structures, and discuss complementary techniques for understanding generalization in such settings.
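The contrast transfer function referenced above has a standard weak-phase-object form; a short sketch follows (sign and aberration conventions vary between references, and this is a generic textbook form, not the paper's exact framework):

```python
import numpy as np

def ctf(k, wavelength, defocus, cs):
    # Weak-phase-object contrast transfer function: aberration phase
    # chi(k) from defocus and spherical aberration Cs, CTF = sin(chi).
    # Sign conventions for defocus and Cs differ between references.
    chi = np.pi * wavelength * defocus * k**2 \
        - 0.5 * np.pi * cs * wavelength**3 * k**4
    return np.sin(chi)
```

Because the CTF oscillates with spatial frequency `k` as defocus and aberrations change, it provides exactly the kind of quantitative handle on "imaging conditions" that the paper uses to measure domain shifts between HRTEM datasets.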


【13】Deterministic World Models for Verification of Closed-loop Vision-based Systems
标题:用于验证基于视觉的闭环系统的确定性世界模型
链接:https://arxiv.org/abs/2512.08991

作者:Yuang Geng,Zhuoyang Zhou,Zhongzheng Zhang,Siyuan Pan,Hoang-Dung Tran,Ivan Ruchkin
备注:22 pages, 10 figures. Submitted to FM 2026
摘要:由于图像的高维性和视觉环境建模的困难,验证基于视觉的闭环控制系统仍然是一个根本性的挑战。虽然生成模型越来越多地用作验证中的相机代理,但它们对随机潜变量的依赖引入了不必要的过近似误差。为了解决这个瓶颈,我们提出了一个确定性世界模型(DWM),将系统状态直接映射到生成图像,有效地消除了不可解释的潜变量,以确保精确的输入边界。DWM采用双目标损失函数进行训练,该函数将像素级重建精度与控制差异损失相结合,以保持与真实系统的行为一致性。我们将DWM集成到利用基于Star的可达性分析(StarV)的验证管道中,并采用共形预测来推导世界模型与实际基于视觉系统之间轨迹偏差的严格统计界。标准基准上的实验表明,与潜变量基线相比,我们的方法产生了明显更紧的可达集和更好的验证性能。
摘要:Verifying closed-loop vision-based control systems remains a fundamental challenge due to the high dimensionality of images and the difficulty of modeling visual environments. While generative models are increasingly used as camera surrogates in verification, their reliance on stochastic latent variables introduces unnecessary overapproximation error. To address this bottleneck, we propose a Deterministic World Model (DWM) that maps system states directly to generative images, effectively eliminating uninterpretable latent variables to ensure precise input bounds. The DWM is trained with a dual-objective loss function that combines pixel-level reconstruction accuracy with a control difference loss to maintain behavioral consistency with the real system. We integrate DWM into a verification pipeline utilizing Star-based reachability analysis (StarV) and employ conformal prediction to derive rigorous statistical bounds on the trajectory deviation between the world model and the actual vision-based system. Experiments on standard benchmarks show that our approach yields significantly tighter reachable sets and better verification performance than a latent-variable baseline.
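The conformal step used to bound trajectory deviation can be sketched with split conformal prediction: the finite-sample-adjusted quantile of calibration deviations upper-bounds a fresh deviation with probability at least 1 - alpha. The paper's exact nonconformity score is not given here; this is a generic sketch:

```python
# Split-conformal bound on world-model vs real-system trajectory
# deviation: return the ceil((n+1)(1-alpha))/n empirical quantile of
# calibration deviations, which gives 1-alpha marginal coverage.
import numpy as np

def conformal_bound(deviations, alpha=0.1):
    d = np.asarray(deviations, dtype=float)
    n = len(d)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(d, q, method="higher")
```

The `(n+1)` correction is what turns an empirical quantile into a distribution-free finite-sample guarantee, which is how a statistical bound can be attached to a learned world model.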


【14】Resolving Conflicts in Lifelong Learning via Aligning Updates in Subspaces
标题:通过对齐子空间中的更新解决终身学习中的冲突
链接:https://arxiv.org/abs/2512.08960

作者:Yueer Zhou,Yichen Wu,Ying Wei
摘要:低秩自适应(LoRA)可以实现有效的持续学习,但由于任务之间的破坏性干扰,它经常遭受灾难性的遗忘。我们的分析表明,这种退化主要是由对抗性方向更新驱动的,其中新的任务梯度直接与历史权重轨迹相反。为了解决这个问题,我们提出了PS-LoRA(参数稳定性LoRA),这是一个旨在通过在优化子空间内对齐更新来解决冲突的框架。我们的方法采用了双正则化目标,惩罚冲突的方向和约束幅度偏差,以确保与先验知识的一致性。此外,我们实现了一个基于幅度的合并策略,以巩固顺序适配器到一个强大的表示,而无需重新训练。在NLP和Vision基准测试上的实验表明,PS-LoRA通过保持学习表示的稳定性,同时有效地适应新领域,表现优于最先进的方法。
摘要:Low-Rank Adaptation (LoRA) enables efficient Continual Learning but often suffers from catastrophic forgetting due to destructive interference between tasks. Our analysis reveals that this degradation is primarily driven by antagonistic directional updates where new task gradients directly oppose the historical weight trajectory. To address this, we propose PS-LoRA (Parameter Stability LoRA), a framework designed to resolve conflicts by aligning updates within the optimization subspace. Our approach employs a dual-regularization objective that penalizes conflicting directions and constrains magnitude deviations to ensure consistency with prior knowledge. Additionally, we implement a magnitude-based merging strategy to consolidate sequential adapters into a robust representation without retraining. Experiments on NLP and Vision benchmarks show that PS-LoRA outperforms state-of-the-art methods by preserving the stability of learned representations while efficiently adapting to new domains.
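A toy version of the dual regularizer described above, penalizing updates whose direction opposes the historical weight trajectory plus their magnitude deviation, might look like this (the exact losses and weighting in PS-LoRA are not specified here; the names and coefficients are assumptions):

```python
# Illustrative dual regularizer in the spirit of PS-LoRA: penalize the
# directional component of a new update that opposes the historical
# trajectory, plus its magnitude deviation.
import numpy as np

def stability_penalty(update, history, lam_dir=1.0, lam_mag=0.1):
    u, h = np.ravel(update), np.ravel(history)
    cos = u @ h / (np.linalg.norm(u) * np.linalg.norm(h) + 1e-12)
    dir_pen = np.maximum(0.0, -cos)                    # penalize opposition only
    mag_pen = (np.linalg.norm(u) - np.linalg.norm(h)) ** 2
    return lam_dir * dir_pen + lam_mag * mag_pen
```

An update aligned with the history incurs (near) zero penalty, while an antagonistic one, the failure mode the abstract identifies as the driver of forgetting, is pushed back toward the historical direction.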


【15】Learning When to Ask: Simulation-Trained Humanoids for Mental-Health Diagnosis
标题:学习何时提问:经过模拟训练的人形机器人用于心理健康诊断
链接:https://arxiv.org/abs/2512.08952

作者:Filippo Cenacchi,Deborah Richards,Longbing Cao
摘要:与用户一起测试人形机器人是缓慢的,会导致磨损,并限制迭代和多样性。然而,筛查人员必须掌握对话的时机、韵律、反向渠道,以及抑郁症和创伤后应激障碍患者的面部和言语中应该注意的问题。大多数模拟器忽略了非语言动态的策略学习;许多控制器追求任务的准确性,而低估了信任,节奏和融洽。我们将人形机器人虚拟为会话代理,以进行训练,而无需硬件负担。我们以代理为中心、模拟优先的管道将采访数据转化为276名虚幻引擎MetaHuman患者,这些患者具有同步的语音、凝视/面部和头部躯干姿势,以及PHQ-8和PCL-C流。感知-融合-策略循环决定在安全屏障下发言的内容和时间,何时反向信道,以及如何避免中断。训练使用反事实重放(有界的非语言扰动)和不确定性感知的回合管理器,以减少诊断的模糊性。结果仅为模拟;人形机器人是转移目标。在比较三种控制器时,自定义TD 3(双延迟DDPG)的性能优于PPO和CEM,以更快的速度实现了接近天花板的覆盖,并获得了相当的回报。决策质量分析表明,可以忽略不计的转向重叠,对齐的切割时间,更少的澄清提示,更短的等待。性能在模态丢弃和渲染器交换下保持稳定,并且排名保持在保持的患者分割上。贡献:(1)一个以代理人为中心的模拟器,将访谈转化为276个具有有限非语言反事实的互动患者;(2)一个安全的学习循环,将时间和融洽关系视为一流的控制变量;(3)一项比较研究(TD 3与PPO/CEM),在完整性和社会时间方面有明显的进步;(4)消融和鲁棒性分析,解释了这些进步,并使临床医生监督的人形飞行员成为可能。
摘要:Testing humanoid robots with users is slow, causes wear, and limits iteration and diversity. Yet screening agents must master conversational timing, prosody, backchannels, and what to attend to in faces and speech for Depression and PTSD. Most simulators omit policy learning with nonverbal dynamics; many controllers chase task accuracy while underweighting trust, pacing, and rapport. We virtualise the humanoid as a conversational agent to train without hardware burden. Our agent-centred, simulation-first pipeline turns interview data into 276 Unreal Engine MetaHuman patients with synchronised speech, gaze/face, and head-torso poses, plus PHQ-8 and PCL-C flows. A perception-fusion-policy loop decides what and when to speak, when to backchannel, and how to avoid interruptions, under a safety shield. Training uses counterfactual replay (bounded nonverbal perturbations) and an uncertainty-aware turn manager that probes to reduce diagnostic ambiguity. Results are simulation-only; the humanoid is the transfer target. In comparing three controllers, a custom TD3 (Twin Delayed DDPG) outperformed PPO and CEM, achieving near-ceiling coverage with steadier pace at comparable rewards. Decision-quality analyses show negligible turn overlap, aligned cut timing, fewer clarification prompts, and shorter waits. Performance stays stable under modality dropout and a renderer swap, and rankings hold on a held-out patient split. Contributions: (1) an agent-centred simulator that turns interviews into 276 interactive patients with bounded nonverbal counterfactuals; (2) a safe learning loop that treats timing and rapport as first-class control variables; (3) a comparative study (TD3 vs PPO/CEM) with clear gains in completeness and social timing; and (4) ablations and robustness analyses explaining the gains and enabling clinician-supervised humanoid pilots.


【16】Transport Novelty Distance: A Distributional Metric for Evaluating Material Generative Models
标题:传输新奇距离:一种评估材料生成模型的分布性度量
链接:https://arxiv.org/abs/2512.09514

作者:Paul Hagemann,Simon Müller,Janine George,Philipp Benner
摘要:生成式机器学习的最新进展为发现和设计新材料开辟了新的可能性。然而,随着这些模型变得越来越复杂,对严格而有意义的评估指标的需求也在增长。现有的评估方法往往无法同时刻画生成结构的质量和新颖性,限制了我们评估真实生成性能的能力。在本文中,我们引入了传输新奇距离(TNovD),从生成材料的质量和新颖性两方面联合评判用于材料发现的生成模型。基于最优传输理论的思想,TNovD使用训练集与生成集特征之间的耦合,并通过一个阈值将其细分为质量与记忆两种情形。这些特征由一个图神经网络从晶体结构中生成,该网络通过对比学习训练,以区分材料、其增强对应物以及不同尺寸的超胞。我们在与晶体结构预测相关的典型玩具实验上评估了所提出的度量,包括记忆、噪声注入和晶格变形。此外,我们在MP20验证集和WBM替换数据集上验证了TNovD,证明它能够同时检测被记忆的和低质量的材料数据。我们还对几种流行的材料生成模型的性能进行了基准测试。虽然是为材料提出的,我们的TNovD框架与领域无关,可适用于图像和分子等其他领域。
摘要:Recent advances in generative machine learning have opened new possibilities for the discovery and design of novel materials. However, as these models become more sophisticated, the need for rigorous and meaningful evaluation metrics has grown. Existing evaluation approaches often fail to capture both the quality and novelty of generated structures, limiting our ability to assess true generative performance. In this paper, we introduce the Transport Novelty Distance (TNovD) to judge generative models used for materials discovery jointly by the quality and novelty of the generated materials. Based on ideas from Optimal Transport theory, TNovD uses a coupling between the features of the training and generated sets, which is refined into a quality and memorization regime by a threshold. The features are generated from crystal structures using a graph neural network that is trained to distinguish between materials, their augmented counterparts, and differently sized supercells using contrastive learning. We evaluate our proposed metric on typical toy experiments relevant for crystal structure prediction, including memorization, noise injection and lattice deformations. Additionally, we validate the TNovD on the MP20 validation set and the WBM substitution dataset, demonstrating that it is capable of detecting both memorized and low-quality material data. We also benchmark the performance of several popular material generative models. While introduced for materials, our TNovD framework is domain-agnostic and can be adapted for other areas, such as images and molecules.
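The coupling-plus-threshold idea behind TNovD can be sketched with an entropic optimal-transport plan: transported mass on pairs closer than a threshold counts as memorization, the rest as novelty. The Sinkhorn solver, threshold, and raw-feature space below are illustrative assumptions, not the paper's exact refinement:

```python
# Sketch of a coupling between training and generated features, split by
# a distance threshold into "memorized" vs "novel" mass.
import numpy as np

def sinkhorn_plan(X, Y, eps=0.1, iters=200):
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared distances
    K = np.exp(-C / eps)
    u = np.ones(len(X)) / len(X)                        # uniform marginals
    v = np.ones(len(Y)) / len(Y)
    a, b = u.copy(), v.copy()
    for _ in range(iters):                              # Sinkhorn iterations
        a = u / (K @ b)
        b = v / (K.T @ a)
    return a[:, None] * K * b[None, :], C

def tnovd_split(train_feats, gen_feats, tau):
    P, C = sinkhorn_plan(train_feats, gen_feats)
    memorized = P[C < tau].sum()    # mass coupled to near-duplicates
    return memorized, P.sum() - memorized
```

A model that copies its training set sends essentially all transport mass below the threshold; a model generating genuinely new structures sends it above, which is the regime the metric scores as novel quality.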


其他(19篇)

【1】FALCON: Few-step Accurate Likelihoods for Continuous Flows
标题:FALCON:连续流的少步精确似然
链接:https://arxiv.org/abs/2512.09914

作者:Danyal Rehman,Tara Akhound-Sadegh,Artem Gazizov,Yoshua Bengio,Alexander Tong
备注:Preprint; NeurIPS 2025 MLSB
摘要:分子体系在热力学平衡下的可扩展采样是统计物理学中的一个长期挑战。玻尔兹曼生成器通过将一个能够精确计算似然的生成模型与重要性采样配对来解决这个问题,从而获得目标分布下的一致样本。目前的玻尔兹曼生成器主要使用通过流匹配训练的连续归一化流(CNF),以高效训练强大的模型。然而,这些模型的似然计算极其昂贵,每个样本需要数千次函数评估,严重限制了它们的采用。在这项工作中,我们提出了连续流的少步精确似然(FALCON),该方法通过引入鼓励可逆性的混合训练目标,允许少步采样,其似然精度足以满足重要性采样应用。我们表明,FALCON在分子玻尔兹曼采样上优于最先进的归一化流模型,并且比性能相当的CNF模型快两个数量级。
摘要:Scalable sampling of molecular states in thermodynamic equilibrium is a long-standing challenge in statistical physics. Boltzmann Generators tackle this problem by pairing a generative model, capable of exact likelihood computation, with importance sampling to obtain consistent samples under the target distribution. Current Boltzmann Generators primarily use continuous normalizing flows (CNFs) trained with flow matching for efficient training of powerful models. However, likelihood calculation for these models is extremely costly, requiring thousands of function evaluations per sample, severely limiting their adoption. In this work, we propose Few-step Accurate Likelihoods for Continuous Flows (FALCON), a method which allows for few-step sampling with a likelihood accurate enough for importance sampling applications by introducing a hybrid training objective that encourages invertibility. We show FALCON outperforms state-of-the-art normalizing flow models for molecular Boltzmann sampling and is two orders of magnitude faster than the equivalently performing CNF model.
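The importance-sampling step that a Boltzmann Generator relies on, and that makes accurate model likelihoods essential, can be sketched as follows; the quadratic toy energy and function names are assumptions:

```python
# Reweight samples x ~ q (the flow) toward the Boltzmann target
# p(x) ∝ exp(-U(x)/kT): log w = -U(x)/kT - log q(x). An inaccurate
# log q directly corrupts these weights, hence the need for accurate
# few-step likelihoods.
import numpy as np

def importance_weights(x, log_q, energy, kT=1.0):
    log_w = -energy(x) / kT - log_q     # unnormalized log-weights
    log_w -= log_w.max()                # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()                  # self-normalized weights

def effective_sample_size(w):
    return 1.0 / np.sum(w ** 2)         # ESS = n when weights are uniform
```

When the model density matches the Boltzmann target the weights are uniform and the effective sample size equals the number of samples; likelihood errors skew the weights and collapse the ESS, which is the practical cost the paper's few-step accurate likelihoods avoid.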


【2】Conformal Bandits: Bringing statistical validity and reward efficiency to the small-gap regime
标题:共形强盗:为小差距情形带来统计有效性和奖励效率
链接:https://arxiv.org/abs/2512.09850

作者:Simone Cuonzo,Nina Deliu
摘要:我们提出了共形强盗(Conformal Bandits),一个将共形预测(CP)融入强盗问题的新框架;强盗问题是不确定性下顺序决策的经典范式。传统的后悔最小化强盗策略,如汤普森采样和上置信界(UCB),通常依赖于分布假设或渐近保证;此外,它们仍主要关注后悔,而忽略了自身的统计性质。我们弥补了这一空白:通过采用CP,我们将强盗策略的后悔最小化潜力与有限时间预测覆盖形式的统计保证结合起来。我们通过模拟研究以及在投资组合配置上的应用展示了共形强盗的潜力;投资组合配置是典型的小差距情形,其中各臂奖励之间的差异太小,经典策略无法在有限样本中达到最优后悔界。基于此,我们展示了本框架在小差距设置下在后悔方面的实际优势,以及在经典UCB策略无法实现名义覆盖保证时的附加价值。围绕我们关注的应用,我们进一步说明了如何集成隐马尔可夫模型来捕捉金融市场的状态切换行为,从而增强探索-利用权衡,并在保持覆盖保证的同时转化为更高的风险调整后悔效率。
摘要:We introduce Conformal Bandits, a novel framework integrating Conformal Prediction (CP) into bandit problems, a classic paradigm for sequential decision-making under uncertainty. Traditional regret-minimisation bandit strategies like Thompson Sampling and Upper Confidence Bound (UCB) typically rely on distributional assumptions or asymptotic guarantees; further, they remain largely focused on regret, neglecting their statistical properties. We address this gap. Through the adoption of CP, we bridge the regret-minimising potential of a decision-making bandit policy with statistical guarantees in the form of finite-time prediction coverage. We demonstrate the potential of Conformal Bandits through simulation studies and an application to portfolio allocation, a typical small-gap regime, where differences in arm rewards are far too small for classical policies to achieve optimal regret bounds in finite sample. Motivated by this, we showcase our framework's practical advantage in terms of regret in small-gap settings, as well as its added value in achieving nominal coverage guarantees where classical UCB policies fail. Focusing on our application of interest, we further illustrate how integrating hidden Markov models to capture the regime-switching behaviour of financial markets enhances the exploration-exploitation trade-off and translates into higher risk-adjusted regret efficiency returns, while preserving coverage guarantees.
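One way to picture coupling CP with a UCB-style rule: build each arm's interval from the finite-sample quantile of nonconformity scores and pull the arm with the largest conformal upper bound. This is a generic illustration of the idea, not the paper's algorithm:

```python
# UCB-style selection with a conformal upper bound per arm: the interval
# comes from the empirical quantile of |reward - mean| nonconformity
# scores, giving finite-sample coverage without distributional
# assumptions. Names and the score choice are illustrative.
import numpy as np

def conformal_upper(rewards, alpha=0.1):
    r = np.asarray(rewards, dtype=float)
    center = r.mean()
    scores = np.abs(r - center)                    # nonconformity scores
    n = len(scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return center + np.quantile(scores, q, method="higher")

def select_arm(history, alpha=0.1):
    # history: dict mapping arm name -> list of observed rewards
    return max(history, key=lambda arm: conformal_upper(history[arm], alpha))
```

Because the interval width is driven by the data rather than a parametric noise model, the same rule yields valid coverage even when arm gaps are too small for classical concentration arguments to bite.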


【3】Mixture of Lookup Key-Value Experts
标题:查找键值专家混合
链接:https://arxiv.org/abs/2512.09723

作者:Zongcheng Wang
备注:Preliminary Version; Work in Progress
摘要:最近的研究开发了几种适用于最终用户设备推理的LLM架构,例如查找专家混合(Mixture of Lookup Experts, MoLE)。MoLE的一个关键特性是每个令牌ID都与一组专属专家相关联。对于给定输入,只有对应于输入令牌ID的专家会被激活。由于推理期间将这少量被激活的专家加载到RAM的通信开销可以忽略不计,专家参数可以卸载到存储中,使MoLE适用于资源受限的设备。然而,MoLE仅基于输入ID、与上下文无关的专家选择机制可能会限制模型性能。为了解决这个问题,我们提出了查找键值专家混合(Mixture of Lookup Key-Value Experts, MoLKV)模型。在MoLKV中,每个专家被构造为一个键值对。对于给定输入,由输入导出的查询与来自当前序列的缓存键值专家交互,生成上下文感知的专家输出。这种上下文感知机制缓解了MoLE的局限性,实验结果表明,MoLKV在小规模评估中实现了显著更低的验证损失。
摘要:Recent research has developed several LLM architectures suitable for inference on end-user devices, such as the Mixture of Lookup Experts (MoLE). A key feature of MoLE is that each token id is associated with a dedicated group of experts. For a given input, only the experts corresponding to the input token id will be activated. Since the communication overhead of loading this small number of activated experts into RAM during inference is negligible, expert parameters can be offloaded to storage, making MoLE suitable for resource-constrained devices. However, MoLE's context-independent expert selection mechanism, based solely on input ids, may limit model performance. To address this, we propose the Mixture of Lookup Key-Value Experts (MoLKV) model. In MoLKV, each expert is structured as a key-value pair. For a given input, the input-derived query interacts with the cached key-value experts from the current sequence, generating a context-aware expert output. This context-aware mechanism alleviates the limitation of MoLE, and experimental results demonstrate that MoLKV achieves significantly lower validation loss in small-scale evaluations.
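The key-value lookup mechanism can be sketched as single-head attention over the cached expert pairs of the current sequence (dimensions and the softmax temperature are illustrative):

```python
# Sketch of the MoLKV mechanism: each earlier token contributes a cached
# (key, value) expert pair; the current token's query attends over the
# cache to produce a context-aware expert output.
import numpy as np

def molkv_output(query, keys, values):
    # keys: (n_cached, d), values: (n_cached, d_out), query: (d,)
    scores = keys @ query / np.sqrt(len(query))   # scaled dot-product
    w = np.exp(scores - scores.max())
    w /= w.sum()                                  # softmax over cached experts
    return w @ values                             # weighted expert mixture
```

Unlike MoLE's pure table lookup on the token id, the output here depends on the whole cached sequence through the attention weights, which is what makes the expert output context-aware.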


【4】A data-driven approach to linking design features with manufacturing process data for sustainable product development
标题:将设计特征与制造过程数据联系起来以实现可持续产品开发的数据驱动方法
链接:https://arxiv.org/abs/2512.09690

作者:Jiahang Li,Lucas Cazzonelli,Jacqueline Höllig,Markus Doellken,Sven Matthiesen
备注:This is the preprint of a paper accepted for the CIRP Design Conference 2026
摘要:工业物联网(IIoT)技术的日益普及,使制造过程数据的自动化、实时收集成为可能,为数据驱动的产品开发带来了新的机遇。目前的数据驱动方法通常应用于特定的领域,如设计或制造,与有限的探索集成设计特征和制造过程数据。由于设计决策会显著影响制造结果,例如错误率、能耗和处理时间,因此缺乏这种集成限制了数据驱动的产品设计改进的潜力。本文提出了一种数据驱动的方法来映射和分析设计特征和制造工艺数据之间的关系。开发了一个全面的系统架构,以确保持续的数据收集和整合。设计特征和制造过程数据之间的联系是开发机器学习模型的基础,可以实现自动化的设计改进建议。通过将制造过程数据与可持续性指标相结合,这种方法为可持续产品开发开辟了新的可能性。
摘要:The growing adoption of Industrial Internet of Things (IIoT) technologies enables automated, real-time collection of manufacturing process data, unlocking new opportunities for data-driven product development. Current data-driven methods are generally applied within specific domains, such as design or manufacturing, with limited exploration of integrating design features and manufacturing process data. Since design decisions significantly affect manufacturing outcomes, such as error rates, energy consumption, and processing times, the lack of such integration restricts the potential for data-driven product design improvements. This paper presents a data-driven approach to mapping and analyzing the relationship between design features and manufacturing process data. A comprehensive system architecture is developed to ensure continuous data collection and integration. The linkage between design features and manufacturing process data serves as the basis for developing a machine learning model that enables automated design improvement suggestions. By integrating manufacturing process data with sustainability metrics, this approach opens new possibilities for sustainable product development.


【5】Drawback of Enforcing Equivariance and its Compensation via the Lens of Expressive Power
标题:从表现力的角度看强制等变性的弊端及其补偿
链接:https://arxiv.org/abs/2512.09673

作者:Yuzhu Chen,Tian Qin,Xinmei Tian,Fengxiang He,Dacheng Tao
摘要:等变神经网络将对称性编码为归纳偏差,并在广泛的领域中取得了很强的经验性能。然而,它们的表达能力仍未被很好地理解。针对2层ReLU网络,本文研究了等变约束对等变和逐层等变网络表达能力的影响。通过检查ReLU网络的边界超平面和通道向量,我们构建了一个例子,表明等变约束可能严格限制表达能力。不过,我们证明,这一缺点可以通过扩大模型规模来补偿。此外,我们表明,尽管模型规模更大,所得到的架构仍可能对应于一个复杂度更低的假设空间,这意味着等变网络具有更优的泛化能力。
摘要:Equivariant neural networks encode symmetry as an inductive bias and have achieved strong empirical performance in wide domains. However, their expressive power remains not well understood. Focusing on 2-layer ReLU networks, this paper investigates the impact of equivariance constraints on the expressivity of equivariant and layer-wise equivariant networks. By examining the boundary hyperplanes and the channel vectors of ReLU networks, we construct an example showing that equivariance constraints could strictly limit expressive power. However, we demonstrate that this drawback can be compensated via enlarging the model size. Furthermore, we show that despite a larger model size, the resulting architecture could still correspond to a hypothesis space with lower complexity, implying superior generalizability for equivariant networks.


【6】SynthPix: A lightspeed PIV images generator
标题:SynthPix:光速PIV图像生成器
链接:https://arxiv.org/abs/2512.09664

作者:Antonio Terpin,Alan Bonomi,Francesco Banelli,Raffaello D'Andrea
备注:Code: https://github.com/antonioterpin/synthpix
摘要:我们描述了SynthPix,一个用于粒子图像测速(PIV)的合成图像生成器,专注于加速器上的性能和并行性,以JAX实现。SynthPix支持与现有工具相同的配置参数,但每秒生成图像对的吞吐量高出几个数量级。开发SynthPix的目的,是为了能够训练用于流动估计的、对数据需求量大的强化学习方法,并在开发快速流动估计方法的过程中缩短迭代时间;这些方法已用于近期带实时PIV反馈的主动流体控制研究。我们相信SynthPix对流体动力学社区很有用,在本文中我们描述了该软件包背后的主要思想。
摘要:We describe SynthPix, a synthetic image generator for Particle Image Velocimetry (PIV) with a focus on performance and parallelism on accelerators, implemented in JAX. SynthPix supports the same configuration parameters as existing tools but achieves a throughput several orders of magnitude higher in image-pair generation per second. SynthPix was developed to enable the training of data-hungry reinforcement learning methods for flow estimation and for reducing the iteration times during the development of fast flow estimation methods used in recent active fluids control studies with real-time PIV feedback. We believe SynthPix to be useful for the fluid dynamics community, and in this paper we describe the main ideas behind this software package.
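The basic synthetic-PIV recipe, seeding particles, rendering them as small Gaussians, then displacing them by a known flow and rendering again, can be sketched in NumPy (SynthPix itself is JAX-based and far more featureful; the sizes and uniform flow below are assumptions):

```python
# Minimal synthetic PIV image-pair generation: random particle seeds,
# Gaussian-blob rendering, then a second frame after displacement by a
# known flow field, which serves as ground truth for flow estimators.
import numpy as np

def render(pos, size=64, sigma=1.0):
    yy, xx = np.mgrid[0:size, 0:size]
    img = np.zeros((size, size))
    for x, y in pos:
        img += np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma**2))
    return img

def piv_pair(n=50, size=64, seed=0, flow=(2.0, 0.0)):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0, size, (n, 2))          # particle positions (x, y)
    return render(pos, size), render(pos + np.array(flow), size)
```

With a uniform flow the second frame is just a shifted copy of the first, so a cross-correlation PIV estimator should recover the displacement exactly; spatially varying flow fields make the task, and the training data, harder.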


【7】Contextual Dynamic Pricing with Heterogeneous Buyers
标题:具有异质买家的上下文动态定价
链接:https://arxiv.org/abs/2512.09513

作者:Thodoris Lykouris,Sloan Nietert,Princewill Okoroafor,Chara Podimata,Julian Zimmert
备注:Appeared at NeurIPS 2025
摘要:我们开启了面向异质买家群体的上下文动态定价研究:卖方根据可观察的$d$维上下文反复(共$T$轮)发布价格,并收到二元购买反馈。与以往假设买家类型同质的工作不同,在我们的设定中,买家的估值类型来自一个支持集大小为$K_{\star}$的有限未知分布。我们基于乐观后验采样开发了一个上下文定价算法,其后悔为$\widetilde{O}(K_{\star}\sqrt{dT})$,并证明该界在对数因子内关于$d$和$T$是紧的。最后,我们针对非上下文定价情形细化了分析,提出了一个方差感知的缩放(zooming)算法,实现了对$K_{\star}$的最优依赖。
摘要:We initiate the study of contextual dynamic pricing with a heterogeneous population of buyers, where a seller repeatedly posts prices (over $T$ rounds) that depend on the observable $d$-dimensional context and receives binary purchase feedback. Unlike prior work assuming homogeneous buyer types, in our setting the buyer's valuation type is drawn from an unknown distribution with finite support size $K_{\star}$. We develop a contextual pricing algorithm based on optimistic posterior sampling with regret $\widetilde{O}(K_{\star}\sqrt{dT})$, which we prove to be tight in $d$ and $T$ up to logarithmic terms. Finally, we refine our analysis for the non-contextual pricing case, proposing a variance-aware zooming algorithm that achieves the optimal dependence on $K_{\star}$.


【8】Cauchy-Schwarz Fairness Regularizer
标题:柯西-施瓦茨公平正则化器
链接:https://arxiv.org/abs/2512.09467

作者:Yezi Liu,Hanning Chen,Wenjun Huang,Yang Ni,Mohsen Imani
摘要:机器学习中的群体公平性通常通过添加正则化器来实现,该正则化器减少模型预测与敏感属性之间的依赖。然而,现有的正则化器建立在各异的距离度量和设计选择之上,这使得其行为难以推断,性能在任务之间也不一致。这引出了一个基本问题:好的公平正则化器应具备哪些性质?为回答这一问题,我们首先将现有的过程中(in-process)方法归为三类:(i)在敏感群体之间匹配预测统计量,(ii)对齐潜在表示,以及(iii)直接最小化预测与敏感属性之间的依赖。借助这一视角,我们确定了底层距离度量应具备的性质,包括更紧的泛化界、对尺度差异的鲁棒性,以及处理任意预测分布的能力。受这些性质启发,我们提出了一个Cauchy-Schwarz(CS)公平正则化器,惩罚以敏感群体为条件的预测分布之间的经验CS散度。在高斯比较下,我们证明CS散度比KL散度、最大均值差异(MMD)以及人口统计均等(Demographic Parity)中使用的均值差距给出更紧的界,并讨论了这些优势如何转化为一个无分布、基于核、可自然扩展到多个敏感属性的估计量。在四个表格基准和一个图像数据集上的大量实验表明,所提出的CS正则化器在保持有竞争力的准确率的同时,持续改进了人口统计均等和机会均等指标,并且与先前的正则化器相比,在不同超参数设置下实现了更稳定的效用-公平性权衡。
摘要:Group fairness in machine learning is often enforced by adding a regularizer that reduces the dependence between model predictions and sensitive attributes. However, existing regularizers are built on heterogeneous distance measures and design choices, which makes their behavior hard to reason about and their performance inconsistent across tasks. This raises a basic question: what properties make a good fairness regularizer? We address this question by first organizing existing in-process methods into three families: (i) matching prediction statistics across sensitive groups, (ii) aligning latent representations, and (iii) directly minimizing dependence between predictions and sensitive attributes. Through this lens, we identify desirable properties of the underlying distance measure, including tight generalization bounds, robustness to scale differences, and the ability to handle arbitrary prediction distributions. Motivated by these properties, we propose a Cauchy-Schwarz (CS) fairness regularizer that penalizes the empirical CS divergence between prediction distributions conditioned on sensitive groups. Under a Gaussian comparison, we show that CS divergence yields a tighter bound than Kullback-Leibler divergence, Maximum Mean Discrepancy, and the mean disparity used in Demographic Parity, and we discuss how these advantages translate to a distribution-free, kernel-based estimator that naturally extends to multiple sensitive attributes. Extensive experiments on four tabular benchmarks and one image dataset demonstrate that the proposed CS regularizer consistently improves Demographic Parity and Equal Opportunity metrics while maintaining competitive accuracy, and achieves a more stable utility-fairness trade-off across hyperparameter settings compared to prior regularizers.
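The empirical kernel-based Cauchy-Schwarz divergence described above, D_CS(p, q) = -log(&lt;p,q&gt;^2 / (&lt;p,p&gt;&lt;q,q&gt;)), with inner products estimated by Gaussian-kernel cross sums, can be sketched for two groups' predictions (the bandwidth choice is an assumption):

```python
# Kernel estimate of the Cauchy-Schwarz divergence between the
# prediction distributions of two sensitive groups: zero iff the
# distributions coincide, growing as they separate.
import numpy as np

def cs_divergence(x, y, sigma=0.5):
    # Gaussian-kernel cross-sum estimate of <p, q> for 1-D predictions.
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma**2))
    pq = k(x, y).mean()
    pp = k(x, x).mean()
    qq = k(y, y).mean()
    return -np.log(pq**2 / (pp * qq))
```

Used as a regularizer, `cs_divergence(preds[group_a], preds[group_b])` is added to the task loss; because the estimator is distribution-free, it applies to arbitrary prediction distributions, one of the desiderata identified above.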


【9】Architectures for Building Agentic AI
标题:构建代理式AI的架构
链接:https://arxiv.org/abs/2512.09458

作者:Sławomir Nowaczyk
备注:This is a preprint of a chapter accepted for publication in Generative and Agentic AI Reliability: Architectures, Challenges, and Trust for Autonomous Systems, published by Springer Nature
摘要:本章认为,代理式与生成式AI的可靠性主要是一种架构属性。我们将代理系统定义为在闭环中运行、以目标为导向、使用工具的决策者,并展示可靠性如何从以下方面产生:有原则的组件化(目标管理器、规划器、工具路由器、执行器、记忆、验证器、安全监视器、遥测),有纪律的接口(模式约束、验证、最低特权的工具调用),以及显式的控制与保证回路。在经典基础之上,我们提出了一个实用的分类法:工具使用代理、记忆增强代理、规划与自我改进代理、多代理系统,以及具身或网络代理,并分析每种模式如何重塑可靠性边界和故障模式。我们提炼了关于类型化模式、幂等性、许可、事务语义、记忆来源与卫生、运行时治理(预算、终止条件)以及先模拟后执行保护措施的设计指南。
摘要:This chapter argues that the reliability of agentic and generative AI is chiefly an architectural property. We define agentic systems as goal-directed, tool-using decision makers operating in closed loops, and show how reliability emerges from principled componentisation (goal manager, planner, tool-router, executor, memory, verifiers, safety monitor, telemetry), disciplined interfaces (schema-constrained, validated, least-privilege tool calls), and explicit control and assurance loops. Building on classical foundations, we propose a practical taxonomy-tool-using agents, memory-augmented agents, planning and self-improvement agents, multi-agent systems, and embodied or web agents - and analyse how each pattern reshapes the reliability envelope and failure modes. We distil design guidance on typed schemas, idempotency, permissioning, transactional semantics, memory provenance and hygiene, runtime governance (budgets, termination conditions), and simulate-before-actuate safeguards.


【10】Generalizable Collaborative Search-and-Capture in Cluttered Environments via Path-Guided MAPPO and Directional Frontier Allocation
标题:通过路径引导MAPPO和定向前沿分配在混乱环境中进行可推广的协作搜索和捕获
链接:https://arxiv.org/abs/2512.09410

作者:Jialin Ying,Zhihao Li,Zicheng Dong,Guohua Wu,Yihuan Liao
备注:7 pages, 7 figures
摘要:由于奖励稀疏且视野(FOV)受限,杂乱环境中的协作追逃任务面临重大挑战。标准的多智能体强化学习(MARL)通常探索效率低下,且难以扩展到大型场景。我们提出了PGF-MAPPO(路径引导前沿MAPPO),一个连接拓扑规划与反应式控制的层次化框架。为了解决局部最小值和稀疏奖励问题,我们整合了一个基于A*的势场来进行密集奖励塑形。此外,我们还引入了方向性前沿分配,将最远点采样(FPS)与几何角度抑制相结合,以加强空间分散并加速覆盖。该架构采用参数共享的去中心化critic,保持适合机器人集群的O(1)模型复杂度。实验表明,PGF-MAPPO对速度更快的逃避者实现了更高的捕获效率。在10x10地图上训练的策略对未见过的20x20环境表现出强大的zero-shot泛化能力,显著优于基于规则和基于学习的基线。
摘要:Collaborative pursuit-evasion in cluttered environments presents significant challenges due to sparse rewards and constrained Fields of View (FOV). Standard Multi-Agent Reinforcement Learning (MARL) often suffers from inefficient exploration and fails to scale to large scenarios. We propose PGF-MAPPO (Path-Guided Frontier MAPPO), a hierarchical framework bridging topological planning with reactive control. To resolve local minima and sparse rewards, we integrate an A*-based potential field for dense reward shaping. Furthermore, we introduce Directional Frontier Allocation, combining Farthest Point Sampling (FPS) with geometric angle suppression to enforce spatial dispersion and accelerate coverage. The architecture employs a parameter-shared decentralized critic, maintaining O(1) model complexity suitable for robotic swarms. Experiments demonstrate that PGF-MAPPO achieves superior capture efficiency against faster evaders. Policies trained on 10x10 maps exhibit robust zero-shot generalization to unseen 20x20 environments, significantly outperforming rule-based and learning-based baselines.
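The Directional Frontier Allocation step can be sketched in a few lines; this is our own toy 2-D version (parameter names and thresholds are invented), combining greedy Farthest Point Sampling with suppression of frontiers whose bearing from the swarm center is too close to an already-assigned one:

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedy FPS: pick k indices of mutually far-apart points."""
    chosen = [0]
    d = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(points - points[nxt], axis=1))
    return chosen

def allocate_frontiers(frontiers, center, k, min_angle_deg=45.0):
    """Rank frontiers by FPS, then keep only those whose bearings from
    `center` stay at least min_angle_deg apart (angle suppression)."""
    order = farthest_point_sampling(frontiers, len(frontiers))
    bearings, picked = [], []
    for idx in order:
        dx, dy = frontiers[idx] - center
        theta = np.degrees(np.arctan2(dy, dx))
        if all(min(abs(theta - b), 360 - abs(theta - b)) >= min_angle_deg
               for b in bearings):
            picked.append(idx)
            bearings.append(theta)
        if len(picked) == k:
            break
    return picked

# Five frontiers around the origin; two are nearly collinear.
pts = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.], [1., 0.1]])
targets = allocate_frontiers(pts, np.array([0., 0.]), k=3)
```

With these points, the near-duplicate bearing (index 4, ~6° from index 0) is suppressed, so the three assigned frontiers point in well-separated directions.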


【11】CONCUR: A Framework for Continual Constrained and Unconstrained Routing
标题:CONCUR:连续约束和无约束路由框架
链接:https://arxiv.org/abs/2512.09386

作者:Peter Baile Chen,Weiyue Li,Dan Roth,Michael Cafarella,Samuel Madden,Jacob Andreas
摘要:人工智能任务的复杂性各不相同,最好用不同的计算策略(例如,模型和解码方法的组合)来处理。因此,将任务映射到适当策略的有效路由系统至关重要。大多数现有方法通过在所有策略上训练单个模型来构建路由框架,这需要在新策略出现时进行完全重新训练,导致高昂的开销。然而,这种持续路由的尝试经常面临泛化困难。先前的模型通常也使用单一的输入表示,限制了它们捕捉路由问题全部复杂性的能力,并导致次优的路由决策。为了解决这些差距,我们提出了CONCUR,一个持续路由框架,同时支持约束和无约束路由(即,有或没有预算的路由)。我们的模块化设计为每个策略训练一个单独的预测模型,从而能以较低的额外训练成本无缝整合新策略。我们的预测器还利用任务和计算策略的多种表示,以更好地捕捉整体问题的复杂性。在分布内和分布外、知识密集型和推理密集型任务上的实验表明,我们的方法优于最好的单一策略和强大的现有路由技术,在持续和非持续设置中均具有更高的端到端准确率和更低的推理成本,同时在持续设置中还降低了训练成本。
摘要:AI tasks differ in complexity and are best addressed with different computation strategies (e.g., combinations of models and decoding methods). Hence, an effective routing system that maps tasks to the appropriate strategies is crucial. Most prior methods build the routing framework by training a single model across all strategies, which demands full retraining whenever new strategies appear and leads to high overhead. Attempts at such continual routing, however, often face difficulties with generalization. Prior models also typically use a single input representation, limiting their ability to capture the full complexity of the routing problem and leading to sub-optimal routing decisions. To address these gaps, we propose CONCUR, a continual routing framework that supports both constrained and unconstrained routing (i.e., routing with or without a budget). Our modular design trains a separate predictor model for each strategy, enabling seamless incorporation of new strategies with low additional training cost. Our predictors also leverage multiple representations of both tasks and computation strategies to better capture overall problem complexity. Experiments on both in-distribution and out-of-distribution, knowledge- and reasoning-intensive tasks show that our method outperforms the best single strategy and strong existing routing techniques with higher end-to-end accuracy and lower inference cost in both continual and non-continual settings, while also reducing training cost in the continual setting.
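The modular per-strategy design can be illustrated with a toy router; the predictors below are hand-written stand-ins for the paper's learned models (names and numbers are ours), but they show why adding a strategy never requires retraining the others:

```python
from dataclasses import dataclass, field

@dataclass
class Router:
    # One independently trained predictor per strategy: each maps a task
    # to (expected_quality, expected_cost).
    predictors: dict = field(default_factory=dict)

    def add_strategy(self, name, predictor):
        # New strategies plug in without touching existing predictors.
        self.predictors[name] = predictor

    def route(self, task, budget=None):
        best, best_q = None, -1.0
        for name, pred in self.predictors.items():
            q, c = pred(task)
            if budget is not None and c > budget:
                continue  # constrained routing: respect the budget
            if q > best_q:
                best, best_q = name, q
        return best

router = Router()
router.add_strategy("small-greedy", lambda t: (0.6, 1.0))
router.add_strategy("large-cot", lambda t: (0.9, 10.0))
print(router.route("task"))            # unconstrained -> "large-cot"
print(router.route("task", budget=2))  # constrained   -> "small-greedy"
```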


【12】Branching Strategies Based on Subgraph GNNs: A Study on Theoretical Promise versus Practical Reality
标题:基于子图GNN的分支策略:理论承诺与实践现实的研究
链接:https://arxiv.org/abs/2512.09355

作者:Junru Zhou,Yicheng Wang,Pan Li
摘要:图神经网络(GNN)已经成为混合整数线性规划(MILP)中"学习分支"的一种有前途的方法。虽然标准的消息传递GNN(MPNN)是高效的,但理论上它们缺乏完全表示MILP结构的表达能力。相反,高阶GNN(如2-FGNN)具有表达力,但计算代价过高。在这项工作中,我们研究子图GNN作为理论上的中间地带。至关重要的是,虽然先前的工作[Chen et al., 2025]证明了具有3-WL表达能力的GNN可以近似强分支,但我们证明了一个更尖锐的结果:表达能力严格低于3-WL的节点锚定子图GNN[Zhang et al., 2023]就足以近似强分支分数。然而,我们在四个基准数据集上的大量实证评估揭示了理论与实践之间的鲜明反差。虽然节点锚定子图GNN在理论上提供了更优的分支决策,但其$O(n)$复杂度开销导致了显著的内存瓶颈,以及比MPNN和启发式方法更慢的求解时间。我们的结果表明,对于MILP分支,表达性GNN的计算成本目前超过了它们在决策质量上的收益,这表明未来的研究必须专注于保持效率的表达能力。
摘要:Graph Neural Networks (GNNs) have emerged as a promising approach for ``learning to branch'' in Mixed-Integer Linear Programming (MILP). While standard Message-Passing GNNs (MPNNs) are efficient, they theoretically lack the expressive power to fully represent MILP structures. Conversely, higher-order GNNs (like 2-FGNNs) are expressive but computationally prohibitive. In this work, we investigate Subgraph GNNs as a theoretical middle ground. Crucially, while previous work [Chen et al., 2025] demonstrated that GNNs with 3-WL expressive power can approximate Strong Branching, we prove a sharper result: node-anchored Subgraph GNNs whose expressive power is strictly lower than 3-WL [Zhang et al., 2023] are sufficient to approximate Strong Branching scores. However, our extensive empirical evaluation on four benchmark datasets reveals a stark contrast between theory and practice. While node-anchored Subgraph GNNs theoretically offer superior branching decisions, their $O(n)$ complexity overhead results in significant memory bottlenecks and slower solving times than MPNNs and heuristics. Our results indicate that for MILP branching, the computational cost of expressive GNNs currently outweighs their gains in decision quality, suggesting that future research must focus on efficiency-preserving expressivity.
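The $O(n)$ overhead the abstract measures comes from rerunning a base network once per anchor node. A minimal numpy sketch of a node-anchored scheme (our own simplification, with random-free fixed weights folded away) makes the cost visible:

```python
import numpy as np

def mpnn_embed(adj, feats, layers=2):
    # Plain message passing: add sum-aggregated neighbour features,
    # apply a nonlinearity, then a permutation-invariant sum readout.
    h = feats
    for _ in range(layers):
        h = np.tanh(h + adj @ h)
    return h.sum(axis=0)

def node_anchored_embedding(adj, feats):
    """Node-anchored subgraph GNN in its simplest form: rerun the base
    MPNN once per anchor node with a one-hot anchor flag appended to the
    node features, then average over anchors -- n full MPNN passes."""
    n = adj.shape[0]
    runs = []
    for v in range(n):
        flag = np.zeros((n, 1))
        flag[v] = 1.0
        runs.append(mpnn_embed(adj, np.hstack([feats, flag])))
    return np.mean(runs, axis=0)

# Triangle graph with distinct node features.
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
feats = np.array([[1., 0.], [0., 1.], [1., 1.]])
emb = node_anchored_embedding(tri, feats)
```

Averaging over all anchors keeps the embedding invariant to node relabeling, which the tests below check against a permuted copy of the same graph.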


【13】Tensor-Compressed and Fully-Quantized Training of Neural PDE Solvers
标题:神经PDL求解器的张量压缩和全量化训练
链接:https://arxiv.org/abs/2512.09202

作者:Jinming Lu,Jiayi Tian,Yequan Zhao,Hai Li,Zheng Zhang
备注:DATE 2026
摘要:物理信息神经网络(PINN)通过将物理定律嵌入神经网络训练目标,已成为求解偏微分方程(PDE)的一种有前途的范例。然而,它们在资源受限的平台上的部署受到大量计算和内存开销的阻碍,主要来自高阶自动微分,密集的张量运算以及对全精度算术的依赖。为了应对这些挑战,我们提出了一个框架,可以在边缘设备上实现可扩展和节能的PINN训练。该框架集成了完全量化的训练、基于Stein估计器(SE)的残差损失计算和用于权重压缩的张量训练(TT)分解。它贡献了三个关键创新:(1)混合精度训练方法,使用平方块MX(SMX)格式来消除反向传播过程中的数据重复;(2)用于Stein估计器的基于差分的量化方案,可减轻下溢;以及(3)用于TT层的部分重建方案(PRS),可减少量化误差累积。我们进一步设计了PINTA,一个精确可扩展的硬件加速器,以充分利用框架的性能。在2-D Poisson、20-D Hamilton-Jacobi-Bellman(HJB)和100-D Heat方程上的实验表明,所提出的框架实现了与全精度、未压缩基线相当或更好的精度,同时提供了5.5倍至83.5倍的加速比和159.6倍至2324.1倍的节能。这项工作使边缘设备上的实时PDE求解成为可能,并为大规模节能科学计算铺平了道路。
摘要:Physics-Informed Neural Networks (PINNs) have emerged as a promising paradigm for solving partial differential equations (PDEs) by embedding physical laws into neural network training objectives. However, their deployment on resource-constrained platforms is hindered by substantial computational and memory overhead, primarily stemming from higher-order automatic differentiation, intensive tensor operations, and reliance on full-precision arithmetic. To address these challenges, we present a framework that enables scalable and energy-efficient PINN training on edge devices. This framework integrates fully quantized training, Stein's estimator (SE)-based residual loss computation, and tensor-train (TT) decomposition for weight compression. It contributes three key innovations: (1) a mixed-precision training method that use a square-block MX (SMX) format to eliminate data duplication during backpropagation; (2) a difference-based quantization scheme for the Stein's estimator that mitigates underflow; and (3) a partial-reconstruction scheme (PRS) for TT-Layers that reduces quantization-error accumulation. We further design PINTA, a precision-scalable hardware accelerator, to fully exploit the performance of the framework. Experiments on the 2-D Poisson, 20-D Hamilton-Jacobi-Bellman (HJB), and 100-D Heat equations demonstrate that the proposed framework achieves accuracy comparable to or better than full-precision, uncompressed baselines while delivering 5.5x to 83.5x speedups and 159.6x to 2324.1x energy savings. This work enables real-time PDE solving on edge devices and paves the way for energy-efficient scientific computing at scale.
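The tensor-train compression at the heart of the framework can be sketched with the standard TT-SVD procedure (a generic sketch, not the paper's quantized variant):

```python
import numpy as np

def tt_svd(tensor, ranks):
    """TT-SVD: factor a d-way tensor into a train of 3-way cores via
    sequential (optionally truncated) SVDs of the unfoldings."""
    d, shape = tensor.ndim, tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(r_prev * shape[0], -1)
    for k in range(d - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(ranks[k], s.size)
        cores.append(u[:, :r].reshape(r_prev, shape[k], r))
        mat = (s[:r, None] * vt[:r]).reshape(r * shape[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    # Contract the train back into a dense tensor.
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out[0, ..., 0]

# A rank-1 tensor compresses exactly with TT-ranks (1, 1):
# the cores hold 2 + 3 + 4 = 9 numbers versus 24 in the dense tensor.
a, b, c = np.array([1., 2.]), np.array([1., -1., 2.]), np.array([.5, 1., 2., 3.])
T = np.einsum('i,j,k->ijk', a, b, c)
cores = tt_svd(T, ranks=[1, 1])
```

In a TT-layer, a large weight matrix is reshaped into such a tensor and stored as cores, so parameter count scales with the ranks rather than the full matrix size.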


【14】Spectral Embedding via Chebyshev Bases for Robust DeepONet Approximation
标题:基于Chebyshev基的谱嵌入以实现稳健的DeepONet逼近
链接:https://arxiv.org/abs/2512.09165

作者:Muhammad Abid,Omer San
摘要:深度算子网络(DeepONet)已经成为数据驱动算子学习的核心工具,为偏微分方程(PDE)中出现的非线性映射提供了灵活的替代模型。然而,基于作用于原始空间或时空坐标的全连接层的标准主干(trunk)设计,难以表示在具有Dirichlet或Neumann边界条件的有界域上提出的PDE中常见的尖锐梯度、边界层和非周期结构。为了解决这些限制,我们引入了谱嵌入DeepONet(SEDONet),这是一种新的DeepONet变体,其主干由固定的Chebyshev谱字典而非坐标输入驱动。这种非周期谱嵌入提供了针对有界域定制的原则性归纳偏置,使学习到的算子能够捕获傅里叶或MLP主干难以表示的精细尺度非周期特征。SEDONet在一系列PDE基准上进行了评估,包括2D Poisson、1D Burgers、1D对流扩散、Allen-Cahn动力学和Lorenz-96混沌系统,涵盖椭圆型、抛物型、平流和多尺度时间现象,所有这些都可以被视为计算力学中的典型问题。在所有数据集上,SEDONet在DeepONet、FEDONet和SEDONet三者中始终实现最低的相对L2误差,比基线DeepONet平均提高约30-40%,并在非周期几何上比傅里叶嵌入变体有显著改进。谱分析进一步表明,SEDONet更准确地保留了高频和边界局部化特征,证明了Chebyshev嵌入在非周期算子学习中的价值。所提出的架构为DeepONet提供了一个简单的、参数中立的修改,为有界域上PDE的代理建模提供了一个稳健而高效的谱框架。
摘要:Deep Operator Networks (DeepONets) have become a central tool in data-driven operator learning, providing flexible surrogates for nonlinear mappings arising in partial differential equations (PDEs). However, the standard trunk design based on fully connected layers acting on raw spatial or spatiotemporal coordinates struggles to represent sharp gradients, boundary layers, and non-periodic structures commonly found in PDEs posed on bounded domains with Dirichlet or Neumann boundary conditions. To address these limitations, we introduce the Spectral-Embedded DeepONet (SEDONet), a new DeepONet variant in which the trunk is driven by a fixed Chebyshev spectral dictionary rather than coordinate inputs. This non-periodic spectral embedding provides a principled inductive bias tailored to bounded domains, enabling the learned operator to capture fine-scale non-periodic features that are difficult for Fourier or MLP trunks to represent. SEDONet is evaluated on a suite of PDE benchmarks including 2D Poisson, 1D Burgers, 1D advection-diffusion, Allen-Cahn dynamics, and the Lorenz-96 chaotic system, covering elliptic, parabolic, advective, and multiscale temporal phenomena, all of which can be viewed as canonical problems in computational mechanics. Across all datasets, SEDONet consistently achieves the lowest relative L2 errors among DeepONet, FEDONet, and SEDONet, with average improvements of about 30-40% over the baseline DeepONet and meaningful gains over Fourier-embedded variants on non-periodic geometries. Spectral analyses further show that SEDONet more accurately preserves high-frequency and boundary-localized features, demonstrating the value of Chebyshev embeddings in non-periodic operator learning. The proposed architecture offers a simple, parameter-neutral modification to DeepONets, delivering a robust and efficient spectral framework for surrogate modeling of PDEs on bounded domains.
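The fixed Chebyshev dictionary that drives the trunk amounts to evaluating the first few Chebyshev polynomials at the query coordinates; a minimal sketch using the three-term recurrence (how SEDONet wires these features into the trunk is not specified by the abstract):

```python
import numpy as np

def chebyshev_features(x, n_modes):
    """Evaluate T_0 .. T_{n_modes-1} at points x in [-1, 1] via the
    recurrence T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x)."""
    feats = np.empty((x.size, n_modes))
    feats[:, 0] = 1.0
    if n_modes > 1:
        feats[:, 1] = x
    for k in range(2, n_modes):
        feats[:, k] = 2 * x * feats[:, k - 1] - feats[:, k - 2]
    return feats

x = np.linspace(-1, 1, 5)
phi = chebyshev_features(x, 4)  # trunk input: (n_points, n_modes)
```

Unlike Fourier features, these polynomials are non-periodic and cluster resolution near the endpoints, which matches boundary-layer behavior on bounded domains.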


【15】Banach neural operator for Navier-Stokes equations
标题:Navier-Stokes方程的Banach神经运算符
链接:https://arxiv.org/abs/2512.09070

作者:Bo Zhang
摘要:经典神经网络以其近似有限维空间之间的映射的能力而闻名,但它们在捕获无限维函数空间中的复杂算子动力学方面存在不足。相比之下,神经运算符已经成为科学机器学习中学习此类映射的强大工具。然而,标准的神经操作符通常缺乏跨空间和时间混合或关注输入信息的机制。在这项工作中,我们介绍了Banach神经算子(BNO)-一种将Koopman算子理论与深度神经网络相结合的新框架,可以从部分观测中预测非线性时空动态。BNO通过将谱线性化(通过Koopman理论)与深度特征学习(通过卷积神经网络和非线性激活)相结合来近似Banach空间之间的非线性算子。该序列到序列模型捕获主导动态模式,并允许网格独立预测。Navier-Stokes方程的数值实验验证了该方法的精度和推广能力。特别是,BNO在非定常流预测中实现了稳健的zero-shot超分辨率,并始终优于传统的基于Koopman的方法和深度学习模型。
摘要:Classical neural networks are known for their ability to approximate mappings between finite-dimensional spaces, but they fall short in capturing complex operator dynamics across infinite-dimensional function spaces. Neural operators, in contrast, have emerged as powerful tools in scientific machine learning for learning such mappings. However, standard neural operators typically lack mechanisms for mixing or attending to input information across space and time. In this work, we introduce the Banach neural operator (BNO) -- a novel framework that integrates Koopman operator theory with deep neural networks to predict nonlinear, spatiotemporal dynamics from partial observations. The BNO approximates a nonlinear operator between Banach spaces by combining spectral linearization (via Koopman theory) with deep feature learning (via convolutional neural networks and nonlinear activations). This sequence-to-sequence model captures dominant dynamic modes and allows for mesh-independent prediction. Numerical experiments on the Navier-Stokes equations demonstrate the method's accuracy and generalization capabilities. In particular, BNO achieves robust zero-shot super-resolution in unsteady flow prediction and consistently outperforms conventional Koopman-based methods and deep learning models.
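BNO couples Koopman theory with convolutional feature learning; as a minimal illustration of just the Koopman-linearization half, here is a plain DMD-style least-squares operator fit on a toy linear system (our own example, not the paper's model):

```python
import numpy as np

def dmd_operator(snapshots):
    """Least-squares Koopman/DMD approximation: find A minimizing
    ||Y - A X|| for time-shifted snapshot matrices X, Y."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    return Y @ np.linalg.pinv(X)

# Toy linear dynamics x_{t+1} = A_true x_t, recoverable from data alone.
A_true = np.array([[0.9, 0.2],
                   [0.0, 0.8]])
x = np.array([1.0, 1.0])
snaps = [x]
for _ in range(10):
    x = A_true @ x
    snaps.append(x)
A_hat = dmd_operator(np.stack(snaps, axis=1))
```

For genuinely nonlinear dynamics, the same regression is performed on learned observables (in BNO, CNN features), which is where the deep network enters.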


【16】Improving Multi-Class Calibration through Normalization-Aware Isotonic Techniques
标题:通过归一化感知保序技术改进多类校准
链接:https://arxiv.org/abs/2512.09054

作者:Alon Arad,Saharon Rosset
摘要:准确和可靠的概率预测对于多类监督学习任务至关重要,其中校准良好的模型可以支撑合理的决策。虽然保序回归已被证明对二分类校准有效,但与参数方法相比,它通过一对多(one-vs-rest)校准扩展到多类问题时产生了次优结果,限制了其实际应用。在这项工作中,我们提出了新的归一化感知保序技术用于多类校准,其基础是从业者所预期的自然而直观的假设。与以前的方法不同,我们的方法本质上考虑了概率归一化:或者将归一化直接纳入优化过程(NA-FIR),或者将问题建模为累积二元保序回归(SCIR)。在不同模型架构下对各种文本和图像分类数据集的实证评估表明,我们的方法始终改进了负对数似然(NLL)和期望校准误差(ECE)指标。
摘要:Accurate and reliable probability predictions are essential for multi-class supervised learning tasks, where well-calibrated models enable rational decision-making. While isotonic regression has proven effective for binary calibration, its extension to multi-class problems via one-vs-rest calibration produced suboptimal results when compared to parametric methods, limiting its practical adoption. In this work, we propose novel isotonic normalization-aware techniques for multiclass calibration, grounded in natural and intuitive assumptions expected by practitioners. Unlike prior approaches, our methods inherently account for probability normalization by either incorporating normalization directly into the optimization process (NA-FIR) or modeling the problem as a cumulative bivariate isotonic regression (SCIR). Empirical evaluation on a variety of text and image classification datasets across different model architectures reveals that our approach consistently improves negative log-likelihood (NLL) and expected calibration error (ECE) metrics.
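The NA-FIR/SCIR details are not in the abstract; for orientation, here is the classical one-vs-rest isotonic baseline the paper improves on, with a hand-rolled Pool-Adjacent-Violators fit and the post-hoc renormalisation that normalisation-aware methods instead build into training:

```python
import numpy as np

def pava(y):
    """Pool Adjacent Violators: best non-decreasing (L2) fit to y."""
    vals, wts, cnt = [], [], []
    for yi in y:
        vals.append(float(yi)); wts.append(1.0); cnt.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            vals[-2] = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w
            wts[-2], cnt[-2] = w, cnt[-2] + cnt[-1]
            vals.pop(); wts.pop(); cnt.pop()
    return np.repeat(vals, cnt)

def calibrate_ovr(scores, labels, new_scores):
    """One-vs-rest isotonic calibration, then renormalise rows to sum
    to one -- the ad-hoc step normalisation-aware methods replace."""
    out = np.empty_like(new_scores, dtype=float)
    for cls in range(scores.shape[1]):
        order = np.argsort(scores[:, cls])
        fit_x = scores[order, cls]
        fit_y = pava((labels == cls).astype(float)[order])
        out[:, cls] = np.interp(new_scores[:, cls], fit_x, fit_y)
    return out / out.sum(axis=1, keepdims=True)

scores = np.array([[.9, .1], [.8, .2], [.3, .7], [.2, .8]])
labels = np.array([0, 0, 1, 1])
probs = calibrate_ovr(scores, labels, scores)
```

Note the renormalisation can distort (or, with all-zero rows, break) the per-class isotonic fits, which is exactly the failure mode motivating normalisation-aware formulations.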


【17】Luxical: High-Speed Lexical-Dense Text Embeddings
标题:Luxical:高速词汇密集文本嵌入
链接:https://arxiv.org/abs/2512.09015

作者:DatologyAI: Luke Merrick,Alex Fang,Aldo Carranza,Alvin Deng,Amro Abbas,Brett Larsen,Cody Blakeney,Darren Teh,David Schwab,Fan Pan,Haakon Mongstad,Haoli Yin,Jack Urbanek,Jason Lee,Jason Telanoff,Josh Wills,Kaleigh Mentzer,Paul Burstein,Parth Doshi,Paul Burnstein,Pratyush Maini,Ricardo Monti,Rishabh Adiga,Scott Loftin,Siddharth Joshi,Spandan Das,Tony Jiang,Vineeth Dorma,Zhengping Wang,Bogdan Gaza,Ari Morcos,Matthew Leavitt
备注:9 pages, 6 figures
摘要:前沿语言模型的质量越来越取决于我们组织网络规模的文本语料库进行训练的能力。今天占主导地位的工具权衡了速度和灵活性:词汇分类器(例如,FastText)快速但限于产生分类输出分数,而Transformer文本嵌入模型的向量值输出灵活地支持许多工作流(例如,聚类、分类和检索),但是产生起来在计算上昂贵。我们介绍Luxical,一个高速“词汇密集”的文本嵌入库,旨在恢复这两种方法的最佳性能,用于网络规模的文本组织。Luxical结合了稀疏的TF-IDF特征、一个小型ReLU网络和一个知识蒸馏训练方案,以其运营成本的一小部分来近似大型Transformer嵌入模型。在这份技术报告中,我们描述了Luxical的架构和训练目标,并在两个不同的应用程序中评估了一个具体的Luxical模型:一个有针对性的webcrawl文档检索测试和一个基于文本分类的端到端语言模型数据策展任务。在这些任务中,我们展示了在不同大小的神经基线上从3倍到100倍的加速,并且与数据策展任务期间的FastText模型推理相当。在这些评估中,测试的Luxical模型说明了大规模文本组织的有利计算/质量权衡,与神经基线的质量相匹配。Luxical作为开源软件可在https://github.com/datologyai/luxical上获得。
摘要:Frontier language model quality increasingly hinges on our ability to organize web-scale text corpora for training. Today's dominant tools trade off speed and flexibility: lexical classifiers (e.g., FastText) are fast but limited to producing classification output scores, while the vector-valued outputs of transformer text embedding models flexibly support numerous workflows (e.g., clustering, classification, and retrieval) but are computationally expensive to produce. We introduce Luxical, a library for high-speed "lexical-dense" text embeddings that aims to recover the best properties of both approaches for web-scale text organization. Luxical combines sparse TF--IDF features, a small ReLU network, and a knowledge distillation training regimen to approximate large transformer embedding models at a fraction of their operational cost. In this technical report, we describe the Luxical architecture and training objective and evaluate a concrete Luxical model in two disparate applications: a targeted webcrawl document retrieval test and an end-to-end language model data curation task grounded in text classification. In these tasks we demonstrate speedups ranging from 3x to 100x over varying-sized neural baselines, and comparable to FastText model inference during the data curation task. On these evaluations, the tested Luxical model illustrates favorable compute/quality trade-offs for large-scale text organization, matching the quality of neural baselines. Luxical is available as open-source software at https://github.com/datologyai/luxical.
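The "lexical-dense" pipeline can be sketched end to end: sparse TF-IDF features, a small ReLU network producing unit-norm embeddings, and a cosine-matching distillation target. Everything below is a toy reconstruction (vocabulary, weights, and function names are ours, not Luxical's API):

```python
import numpy as np

def tfidf_features(docs, vocab):
    # Minimal TF-IDF featurizer over a fixed vocabulary.
    tf = np.array([[doc.split().count(w) for w in vocab] for doc in docs],
                  dtype=float)
    df = (tf > 0).sum(axis=0)
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0
    return tf * idf

def lexical_dense_embed(x, w1, w2):
    # Sparse lexical features -> small ReLU MLP -> dense embedding,
    # L2-normalised so cosine similarity is a plain dot product.
    h = np.maximum(x @ w1, 0.0)
    z = h @ w2
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def distill_loss(student, teacher):
    # Knowledge-distillation objective: match the teacher's unit embeddings.
    return float(np.mean(1.0 - np.sum(student * teacher, axis=1)))

vocab = ["neural", "networks", "stock", "market"]
docs = ["neural networks", "stock market stock"]
X = tfidf_features(docs, vocab)
rng = np.random.default_rng(0)
w1 = np.abs(rng.normal(size=(len(vocab), 8)))  # toy weights, kept positive
w2 = rng.normal(size=(8, 3))
emb = lexical_dense_embed(X, w1, w2)
```

Training would fit `w1`/`w2` to minimise `distill_loss` against a large transformer teacher's embeddings; at inference only the cheap TF-IDF + tiny MLP path runs.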


【18】Demo: Generative AI helps Radiotherapy Planning with User Preference
标题:演示:生成性人工智能根据用户偏好帮助放射治疗规划
链接:https://arxiv.org/abs/2512.08996

作者:Riqiang Gao,Simon Arberet,Martin Kraus,Han Liu,Wilko FAR Verbakel,Dorin Comaniciu,Florin-Cristian Ghesu,Ali Kamen
备注:Best paper in GenAI4Health at NeurIPS 2025
摘要:放射治疗计划是一个高度复杂的过程,在不同的机构和个体计划者之间往往存在显著差异。大多数现有的3D剂量预测深度学习方法在训练过程中依赖参考计划作为基础真值,这可能会无意中使模型偏向于特定的规划风格或机构偏好。在这项研究中,我们介绍了一种新的生成模型,完全基于用户定义的偏好口味来预测3D剂量分布。这些可定制的偏好使计划者能够优先考虑危及器官(OAR)和计划靶区(PTV)之间的特定权衡,提供更大的灵活性和个性化。我们的方法旨在与临床治疗计划系统无缝集成,帮助用户高效地生成高质量的计划。比较评估表明,在某些场景下,我们的方法在适应性和计划质量方面均可超越Varian RapidPlan模型。
摘要:Radiotherapy planning is a highly complex process that often varies significantly across institutions and individual planners. Most existing deep learning approaches for 3D dose prediction rely on reference plans as ground truth during training, which can inadvertently bias models toward specific planning styles or institutional preferences. In this study, we introduce a novel generative model that predicts 3D dose distributions based solely on user-defined preference flavors. These customizable preferences enable planners to prioritize specific trade-offs between organs-at-risk (OARs) and planning target volumes (PTVs), offering greater flexibility and personalization. Designed for seamless integration with clinical treatment planning systems, our approach assists users in generating high-quality plans efficiently. Comparative evaluations demonstrate that our method can surpass the Varian RapidPlan model in both adaptability and plan quality in some scenarios.


【19】Financial Instruction Following Evaluation (FIFE)
标题:评估后的财务指示(FIFE)
链接:https://arxiv.org/abs/2512.08965

作者:Glenn Matlin,Siddharth,Anirudh JM,Aditya Shukla,Yahya Hassan,Sudheer Chava
备注:Accepted at NeurIPS 2025 Generative AI in Finance Workshop (GenAI Finance), San Diego. Camera-ready version. Code and data: https://github.com/gtfintechlab/FIFE/
摘要:语言模型(LM)难以处理复杂的、相互依赖的指令,特别是在精度至关重要的金融等高风险领域。我们介绍FIFE,一个新颖的高难度基准,旨在评估LM在金融分析任务中的指令遵循能力。FIFE包含88个人工编写的提示,并采用一个具有可链接、可验证约束的验证系统,用于提供细粒度的奖励信号。我们在zero-shot设置下评估了53个模型(专有、开放权重、开源)。我们的主要发现揭示了一个清晰的性能层次:顶级开放权重模型(76.1严格/79.5宽松)超过了领先的专有系统(65.9严格/70.5宽松),而最好的开源模型则明显落后(45.5严格/48.9宽松)。然而,即使是性能最好的模型也难以满足FIFE的复杂要求,无法实现完全合规。我们将数据集和代码作为开源资源发布,以促进金融领域强化学习的研究。
摘要:Language Models (LMs) struggle with complex, interdependent instructions, particularly in high-stakes domains like finance where precision is critical. We introduce FIFE, a novel, high-difficulty benchmark designed to assess LM instruction-following capabilities for financial analysis tasks. FIFE comprises 88 human-authored prompts and employs a verification system with chainable, verifiable constraints for fine-grained reward signals. We evaluate 53 models (proprietary, open-weight, open-source) in a zero-shot setting. Our key findings reveal a clear performance hierarchy: the top open-weight model (76.1 strict / 79.5 loose) surpasses the leading proprietary system (65.9 strict / 70.5 loose), while the best open-source models lag significantly (45.5 strict / 48.9 loose). However, even top-performing models struggle with FIFE's complex requirements, failing to achieve perfect compliance. We release our dataset and code as an open-source resource to promote research in Reinforcement Learning for the financial domain.
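FIFE's 88 prompts are not reproduced here, but chainable verifiable constraints with strict/loose scoring can be sketched directly (the constraint set and sample response below are invented for illustration):

```python
def word_limit(n):
    return lambda resp: len(resp.split()) <= n

def must_contain(term):
    return lambda resp: term.lower() in resp.lower()

def verify(response, constraints, strict=True):
    """Chain verifiable constraints: strict requires all to pass; loose
    returns the fraction satisfied as a fine-grained reward signal."""
    results = [c(response) for c in constraints]
    return all(results) if strict else sum(results) / len(results)

checks = [word_limit(12), must_contain("EBITDA"), must_contain("2023")]
resp = "EBITDA grew 14% in 2023 versus the prior year."
print(verify(resp, checks))                          # strict: True
print(verify("No figures.", checks, strict=False))   # loose: 1/3 satisfied
```

The loose fraction is the kind of dense, automatically checkable reward the abstract proposes for reinforcement learning in the financial domain.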


机器翻译由腾讯交互翻译提供,仅供参考


本文地址:http://www.python88.com/topic/190323