机器学习学术速递[10.27]

点击阅读原文访问arxivdaily.com，涵盖CS|物理|数学|经济|统计|金融|生物|电气领域，更有搜索、收藏等功能！

cs.LG 方向，今日共计170篇

大模型相关(19篇)

【1】Few-Shot Knowledge Distillation of LLMs With Counterfactual Explanations
标题：具有反事实解释的LLM的Few-Shot知识提炼
链接：https://arxiv.org/abs/2510.21631

作者：Faisal Hamman, Pasan Dissanayake, Yanjun Fu, Sanghamitra Dutta
备注：NeurIPS 2025
摘要：知识蒸馏是一种很有前途的方法，可以将复杂的教师模型的能力转移到更小的，资源高效的学生模型，这些模型可以很容易地部署，特别是在任务感知的场景中。然而，现有的任务感知提取方法通常需要大量的数据，这些数据在许多实际场景中可能是不可用的或昂贵的。在本文中，我们通过引入一种新的策略，称为反事实解释注入蒸馏CoD的Few-Shot任务意识的知识蒸馏系统注入反事实的解释，以应对这一挑战。反事实解释（CFE）是指能够以最小扰动翻转教师模型的输出预测的输入。我们的策略CoD利用这些CFE以显著更少的样本精确地映射教师的决策边界。我们提供理论保证激励CFEs在蒸馏中的作用，从统计和几何的角度。我们在数学上表明，CFEs可以提高参数估计提供更多的信息的例子附近的教师的决策边界。我们还获得几何的见解CFEs如何有效地作为知识探针，帮助学生更有效地模仿老师的决策边界比标准数据。我们在各种数据集和LLM上进行实验，以表明CoD在Few-Shot制度（低至8-512个样本）中优于标准蒸馏方法。值得注意的是，CoD仅使用基线使用的原始样本的一半，与其相应的CFE配对，仍然提高了性能。
摘要：Knowledge distillation is a promising approach to transfer capabilities from complex teacher models to smaller, resource-efficient student models that can be deployed easily, particularly in task-aware scenarios. However, existing methods of task-aware distillation typically require substantial quantities of data which may be unavailable or expensive to obtain in many practical scenarios. In this paper, we address this challenge by introducing a novel strategy called Counterfactual-explanation-infused Distillation CoD for few-shot task-aware knowledge distillation by systematically infusing counterfactual explanations. Counterfactual explanations (CFEs) refer to inputs that can flip the output prediction of the teacher model with minimum perturbation. Our strategy CoD leverages these CFEs to precisely map the teacher's decision boundary with significantly fewer samples. We provide theoretical guarantees for motivating the role of CFEs in distillation, from both statistical and geometric perspectives. We mathematically show that CFEs can improve parameter estimation by providing more informative examples near the teacher's decision boundary. We also derive geometric insights on how CFEs effectively act as knowledge probes, helping the students mimic the teacher's decision boundaries more effectively than standard data. We perform experiments across various datasets and LLMs to show that CoD outperforms standard distillation approaches in few-shot regimes (as low as 8-512 samples). Notably, CoD only uses half of the original samples used by the baselines, paired with their corresponding CFEs and still improves performance.

【2】Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos
标题：基于真实人类活动视频的机器人操作可扩展视觉-语言-动作模型预训练
链接：https://arxiv.org/abs/2510.21571

作者：Qixiu Li, Yu Deng, Yaobo Liang, Lin Luo, Lei Zhou, Chengtang Yao, Lingqi Zeng, Zhiyuan Feng, Huizhi Liang, Sicheng Xu, Yizhong Zhang, Xi Chen, Hao Chen, Lily Sun, Dong Chen, Jiaolong Yang, Baining Guo
备注：Project page: this https URL
摘要：本文提出了一种新的预训练机器人操作视觉语言动作（VLA）模型的方法，使用大量的人类手部活动的无脚本的现实生活中的视频记录。将人手视为灵巧的机器人末端执行器，我们表明，没有任何注释的“在野外”以自我为中心的人类视频可以转换为与现有机器人V-L-A训练数据在任务粒度和标签方面完全一致的数据格式。这是通过为任意人手视频开发全自动整体人类活动分析方法来实现的。这种方法可以生成原子级的手部活动片段及其语言描述，每个片段都伴随着帧级的3D手部运动和相机运动。我们处理了大量以自我为中心的视频，并创建了一个包含100万集和2600万帧的手动VLA训练数据集。这些训练数据涵盖了广泛的对象和概念，灵巧的操作任务以及现实生活中的环境变化，大大超过了现有机器人数据的覆盖范围。我们设计了一个灵巧手VLA模型架构，并在此数据集上预训练模型。该模型在完全不可见的现实世界观察中表现出强大的零射击（zero-shot）能力。此外，在少量真实机器人动作数据上对其进行微调，可以显着提高任务成功率，并在真实机器人实验中推广到新对象。我们还展示了模型的任务性能相对于预训练数据规模的吸引人的缩放行为。我们相信这项工作为可扩展的VLA预训练奠定了坚实的基础，推动机器人走向真正可推广的体现智能。
摘要：This paper presents a novel approach for pretraining robotic manipulation Vision-Language-Action (VLA) models using a large corpus of unscripted real-life video recordings of human hand activities. Treating human hand as dexterous robot end-effector, we show that "in-the-wild" egocentric human videos without any annotations can be transformed into data formats fully aligned with existing robotic V-L-A training data in terms of task granularity and labels. This is achieved by the development of a fully-automated holistic human activity analysis approach for arbitrary human hand videos. This approach can generate atomic-level hand activity segments and their language descriptions, each accompanied with framewise 3D hand motion and camera motion. We process a large volume of egocentric videos and create a hand-VLA training dataset containing 1M episodes and 26M frames. This training data covers a wide range of objects and concepts, dexterous manipulation tasks, and environment variations in real life, vastly exceeding the coverage of existing robot data. We design a dexterous hand VLA model architecture and pretrain the model on this dataset. The model exhibits strong zero-shot capabilities on completely unseen real-world observations. Additionally, fine-tuning it on a small amount of real robot action data significantly improves task success rates and generalization to novel objects in real robotic experiments. We also demonstrate the appealing scaling behavior of the model's task performance with respect to pretraining data scale. We believe this work lays a solid foundation for scalable VLA pretraining, advancing robots toward truly generalizable embodied intelligence.

【3】Wisdom and Delusion of LLM Ensembles for Code Generation and Repair
标题：LLM代码生成和修复套件的智慧与错觉
链接：https://arxiv.org/abs/2510.21513

作者：Fernando Vallecillos Ruiz, Max Hort, Leon Moonen
摘要：如今，对所有软件工程任务的单一大型语言模型（LMM）的追求是资源密集型的，并且忽视了互补性的潜在好处，不同的模型贡献了独特的优势。然而，编码LLM相互补充的程度以及最大化集成潜力的最佳策略尚不清楚，这使得从业者没有明确的途径超越单一模型系统。为了解决这一差距，我们经验比较十个独立的法学硕士从五个家庭，这些法学硕士在三个软件工程基准涵盖代码生成和程序修复的三个合奏。我们评估模型之间的互补性和最佳个体模型与集合之间的性能差距。接下来，我们评估各种选择算法，以确定正确的解决方案，从合奏的候选池。我们发现，理论上限的合奏的性能可以是83%以上的最好的单一模型。我们的研究结果表明，选择解决方案的共识为基础的战略落入“流行陷阱”，放大常见的，但不正确的输出。相比之下，基于多样性的策略实现了高达95%的理论潜力，即使在小型双模型集成中也证明是有效的，从而通过利用多个LLM实现了一种具有成本效益的方式来提高性能。
摘要：Today's pursuit of a single Large Language Model (LMM) for all software engineering tasks is resource-intensive and overlooks the potential benefits of complementarity, where different models contribute unique strengths. However, the degree to which coding LLMs complement each other and the best strategy for maximizing an ensemble's potential are unclear, leaving practitioners without a clear path to move beyond single-model systems. To address this gap, we empirically compare ten individual LLMs from five families, and three ensembles of these LLMs across three software engineering benchmarks covering code generation and program repair. We assess the complementarity between models and the performance gap between the best individual model and the ensembles. Next, we evaluate various selection heuristics to identify correct solutions from an ensemble's candidate pool. We find that the theoretical upperbound for an ensemble's performance can be 83% above the best single model. Our results show that consensus-based strategies for selecting solutions fall into a "popularity trap," amplifying common but incorrect outputs. In contrast, a diversity-based strategy realizes up to 95% of this theoretical potential, and proves effective even in small two-model ensembles, enabling a cost-efficient way to enhance performance by leveraging multiple LLMs.

【4】SBASH: a Framework for Designing and Evaluating RAG vs. Prompt-Tuned LLM Honeypots
标题：SBASH：设计和评估RAG与预算调整LLM蜜罐的框架
链接：https://arxiv.org/abs/2510.21459

作者：Adetayo Adebimpe, Helmut Neukirchen, Thomas Welsh
备注：to be published in: The 3rd International Conference on Foundation and Large Language Models (FLLM2025), IEEE, 2025
摘要：蜜罐是一种诱饵系统，用于收集有价值的威胁情报或将攻击者从生产系统转移开。最大限度地提高攻击者的参与度对它们的效用至关重要。然而，研究强调，情境感知，例如对新攻击类型、系统和攻击者代理做出响应的能力，对于提高参与度是必要的。大型语言模型（LLM）已被证明是提高上下文感知的一种方法，但面临着一些挑战，包括响应时间的准确性和及时性，高运营成本以及云部署带来的数据保护问题。我们提出了基于系统的注意壳蜜罐（SBASH）框架，通过使用轻量级本地LLM管理数据保护问题。我们调查了使用检索增强生成（RAG）支持的LLM和非RAG LLM的Linux shell命令，并使用几个不同的指标，如响应时间差异，从人类测试人员的现实主义，以及与Levenshtein距离，SBert和BertScore计算的真实系统的相似性来评估它们。我们表明，RAG提高了未调优模型的准确性，而通过系统提示，告诉LLM响应像Linux系统实现没有RAG与RAG未调优类似的准确性，同时具有略低的延迟已调优的模型。
摘要：Honeypots are decoy systems used for gathering valuable threat intelligence or diverting attackers away from production systems. Maximising attacker engagement is essential to their utility. However research has highlighted that context-awareness, such as the ability to respond to new attack types, systems and attacker agents, is necessary to increase engagement. Large Language Models (LLMs) have been shown as one approach to increase context awareness but suffer from several challenges including accuracy and timeliness of response time, high operational costs and data-protection issues due to cloud deployment. We propose the System-Based Attention Shell Honeypot (SBASH) framework which manages data-protection issues through the use of lightweight local LLMs. We investigate the use of Retrieval Augmented Generation (RAG) supported LLMs and non-RAG LLMs for Linux shell commands and evaluate them using several different metrics such as response time differences, realism from human testers, and similarity to a real system calculated with Levenshtein distance, SBert, and BertScore. We show that RAG improves accuracy for untuned models while models that have been tuned via a system prompt that tells the LLM to respond like a Linux system achieve without RAG a similar accuracy as untuned with RAG, while having a slightly lower latency.

【5】ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models
标题：ParaRNN：解锁大型语言模型的非线性RNN并行训练
链接：https://arxiv.org/abs/2510.21450

作者：Federico Danieli, Pau Rodriguez, Miguel Sarabia, Xavier Suau, Luca Zappella
摘要：递归神经网络（RNN）为序列建模奠定了基础，但其固有的序列性质限制了并行计算，从而为扩展带来了根本性障碍。这导致了并行架构的主导地位，如Transformers和最近的状态空间模型（SSM）。虽然SSM通过结构化线性递归实现了高效的并行化，但这种线性约束限制了它们的表达能力，并排除了对复杂的非线性顺序依赖关系的建模。为了解决这个问题，我们提出了ParaRNN，这是一个打破非线性RNN序列并行化障碍的框架。在先前工作的基础上，我们将非线性递归关系序列转换为单个方程组，我们使用牛顿迭代结合自定义并行约简并行求解。我们的实现比简单的顺序应用程序实现了高达665倍的加速，允许以前所未有的规模训练非线性RNN。为了展示这一点，我们将ParaRNN应用于LSTM和GRU架构的调整，成功地训练了7B参数的模型，这些模型的复杂度可与类似大小的Transformers和Mamba2架构相媲美。为了加速有效序列建模的研究，我们发布了ParaRNN代码库，作为非线性RNN自动训练并行化的开源框架，使研究人员和从业者能够大规模探索新的非线性RNN模型。
摘要：Recurrent Neural Networks (RNNs) laid the foundation for sequence modeling, but their intrinsic sequential nature restricts parallel computation, creating a fundamental barrier to scaling. This has led to the dominance of parallelizable architectures like Transformers and, more recently, State Space Models (SSMs). While SSMs achieve efficient parallelization through structured linear recurrences, this linearity constraint limits their expressive power and precludes modeling complex, nonlinear sequence-wise dependencies. To address this, we present ParaRNN, a framework that breaks the sequence-parallelization barrier for nonlinear RNNs. Building on prior work, we cast the sequence of nonlinear recurrence relationships as a single system of equations, which we solve in parallel using Newton's iterations combined with custom parallel reductions. Our implementation achieves speedups of up to 665x over naive sequential application, allowing training nonlinear RNNs at unprecedented scales. To showcase this, we apply ParaRNN to adaptations of LSTM and GRU architectures, successfully training models of 7B parameters that attain perplexity comparable to similarly-sized Transformers and Mamba2 architectures. To accelerate research in efficient sequence modeling, we release the ParaRNN codebase as an open-source framework for automatic training-parallelization of nonlinear RNNs, enabling researchers and practitioners to explore new nonlinear RNN models at scale.

【6】REMONI: An Autonomous System Integrating Wearables and Multimodal Large Language Models for Enhanced Remote Health Monitoring
标题：REMONI：一个集成可穿戴设备和多模式大型语言模型的自治系统，用于增强远程健康监测
链接：https://arxiv.org/abs/2510.21445

作者：Thanh Cong Ho, Farah Kharrat, Abderrazek Abid, Fakhri Karray
备注：None
摘要：随着可穿戴设备在我们日常生活中的广泛采用，对远程患者监测的需求和吸引力显著增加。该领域的大多数研究都集中在收集传感器数据，将其可视化并分析它以检测糖尿病，心脏病和抑郁症等特定疾病的异常。然而，该领域在人机交互方面存在着显著的差距。本文提出了REMONI，这是一个自主的远程健康监测系统，它集成了多模态大型语言模型（MLLM），物联网（IoT）和可穿戴设备。该系统自动连续收集生命体征、来自特殊可穿戴设备（如智能手表）的加速度计数据以及从摄像头收集的患者视频剪辑中的视觉数据。该数据由异常检测模块处理，该异常检测模块包括跌倒检测模型和算法，以识别并提醒护理人员患者的紧急状况。我们提出的系统的一个显着特点是自然语言处理组件，开发MLLM能够检测和识别病人的活动和情绪，同时响应医护人员的查询。此外，即时工程采用无缝集成所有患者信息。因此，医生和护士可以通过用户友好的Web应用程序与智能代理进行交互，从而访问实时生命体征以及患者的当前状态和情绪。我们的实验表明，我们的系统是可实现的，可扩展的现实生活中的场景，可能会减少医疗专业人员的工作量和医疗成本。已经开发并正在测试一个说明该系统功能的成熟原型，以证明其各种能力的可靠性。
摘要：With the widespread adoption of wearable devices in our daily lives, the demand and appeal for remote patient monitoring have significantly increased. Most research in this field has concentrated on collecting sensor data, visualizing it, and analyzing it to detect anomalies in specific diseases such as diabetes, heart disease and depression. However, this domain has a notable gap in the aspect of human-machine interaction. This paper proposes REMONI, an autonomous REmote health MONItoring system that integrates multimodal large language models (MLLMs), the Internet of Things (IoT), and wearable devices. The system automatically and continuously collects vital signs, accelerometer data from a special wearable (such as a smartwatch), and visual data in patient video clips collected from cameras. This data is processed by an anomaly detection module, which includes a fall detection model and algorithms to identify and alert caregivers of the patient's emergency conditions. A distinctive feature of our proposed system is the natural language processing component, developed with MLLMs capable of detecting and recognizing a patient's activity and emotion while responding to healthcare worker's inquiries. Additionally, prompt engineering is employed to integrate all patient information seamlessly. As a result, doctors and nurses can access real-time vital signs and the patient's current state and mood by interacting with an intelligent agent through a user-friendly web application. Our experiments demonstrate that our system is implementable and scalable for real-life scenarios, potentially reducing the workload of medical professionals and healthcare costs. A full-fledged prototype illustrating the functionalities of the system has been developed and being tested to demonstrate the robustness of its various capabilities.

【7】Vision Language Models for Dynamic Human Activity Recognition in Healthcare Settings
标题：用于医疗保健环境中动态人类活动识别的视觉语言模型
链接：https://arxiv.org/abs/2510.21424

作者：Abderrazek Abid, Thanh-Cong Ho, Fakhri Karray
摘要：随着生成式人工智能的不断发展，视觉语言模型（VLM）已成为各种医疗保健应用中有前途的工具。一个仍然相对未充分探索的领域是它们在人类活动识别（HAR）中的使用，用于远程健康监测。VLM具有显著的优势，包括更大的灵活性和克服传统深度学习模型的一些限制的能力。然而，一个关键的挑战，在应用VLMs HAR在于难以评估其动态和往往是不确定的输出。为了解决这一差距，我们引入了一个描述性的标题数据集，并提出了全面的评价方法来评估HAR中的VLM。通过与最先进的深度学习模型的比较实验，我们的研究结果表明，VLM实现了相当的性能，在某些情况下，甚至在准确性方面超过了传统方法。这项工作提供了一个强大的基准，并为将VLM集成到智能医疗保健系统中开辟了新的可能性。
摘要：As generative AI continues to evolve, Vision Language Models (VLMs) have emerged as promising tools in various healthcare applications. One area that remains relatively underexplored is their use in human activity recognition (HAR) for remote health monitoring. VLMs offer notable strengths, including greater flexibility and the ability to overcome some of the constraints of traditional deep learning models. However, a key challenge in applying VLMs to HAR lies in the difficulty of evaluating their dynamic and often non-deterministic outputs. To address this gap, we introduce a descriptive caption data set and propose comprehensive evaluation methods to evaluate VLMs in HAR. Through comparative experiments with state-of-the-art deep learning models, our findings demonstrate that VLMs achieve comparable performance and, in some cases, even surpass conventional approaches in terms of accuracy. This work contributes a strong benchmark and opens new possibilities for the integration of VLMs into intelligent healthcare systems.

【8】Large Language Models as Model Organisms for Human Associative Learning
标题：大型语言模型作为人类联想学习的模型生物体
链接：https://arxiv.org/abs/2510.21408

作者：Camila Kolling, Vy Ai Vo, Mariya Toneva
摘要：联想学习--在共同出现的项目之间形成联系--是人类认知的基础，以复杂的方式重塑内部表征。测试生物系统中表征变化如何发生的假设具有挑战性，但大型语言模型（LLM）提供了一种可扩展的替代方案。基于LLM的上下文学习，我们采用了认知神经科学联想学习范式，并研究了表征如何在六种模型中演变。我们的初步研究结果揭示了一个非单调的模式与非单调可塑性假设一致，适度相似的项目区分后学习。利用LLM的可控性，我们进一步表明，这种差异是由相关项目与更广泛的词汇重叠调制的-这是我们称之为词汇干扰的一个因素，捕捉新的关联如何与先前的知识竞争。我们发现，更高的词汇干扰放大分化，这表明表征变化的影响项目相似性和全球竞争。我们的研究结果不仅将LLM定位为研究类人学习系统中表征动力学的强大工具，而且还将其作为可访问的通用计算模型，用于生成有关大脑记忆重组原理的新假设。
摘要：Associative learning--forming links between co-occurring items--is fundamental to human cognition, reshaping internal representations in complex ways. Testing hypotheses on how representational changes occur in biological systems is challenging, but large language models (LLMs) offer a scalable alternative. Building on LLMs' in-context learning, we adapt a cognitive neuroscience associative learning paradigm and investigate how representations evolve across six models. Our initial findings reveal a non-monotonic pattern consistent with the Non-Monotonic Plasticity Hypothesis, with moderately similar items differentiating after learning. Leveraging the controllability of LLMs, we further show that this differentiation is modulated by the overlap of associated items with the broader vocabulary--a factor we term vocabulary interference, capturing how new associations compete with prior knowledge. We find that higher vocabulary interference amplifies differentiation, suggesting that representational change is influenced by both item similarity and global competition. Our findings position LLMs not only as powerful tools for studying representational dynamics in human-like learning systems, but also as accessible and general computational models for generating new hypotheses about the principles underlying memory reorganization in the brain.

【9】Multi-turn Training with Basic Human Feedback Helps Little on LLM Reasoning
标题：具有基本人类反馈的多回合训练对LLM推理帮助不大
链接：https://arxiv.org/abs/2510.21339

作者：Qiang Liu, Wuganjing Song, Zhenzhou Lin, Feifan Chen, Qiaolong Cai, Chen Li, Yongduo Sui
摘要：大型语言模型（LLM）的推理能力通常是通过单轮强化学习开发的，而现实世界的应用程序通常涉及与人类反馈的多轮交互，导致训练和部署条件之间的潜在不匹配。在这项工作中，我们研究了多轮培训与人类反馈是否是必要的推理任务。我们比较了传统的单回合训练和三种多回合策略，得出了与以往研究相反的结论。我们发现，在单回合设置训练的模型有效地推广到单回合和多回合评估，而多回合策略训练的模型表现出显着的单回合推理性能下降。这些结果表明，对于具有完整信息的任务，稳健的单轮训练仍然更有效和可靠，因为具有基本反馈的多轮训练提供的好处有限，甚至会降低推理能力。
摘要：The reasoning capabilities of Large Language Models (LLMs) are typically developed through the single-turn reinforcement learning, whereas real-world applications often involve multi-turn interactions with human feedback, leading to a potential mismatch between training and deployment conditions. In this work, we study whether multi-turn training with human feedback is necessary for reasoning tasks. We compare conventional single-turn training with three multi-turn strategies and reach contrary conclusions to previous research. We find that models trained in a single-turn setting generalize effectively to both single- and multi-turn evaluations, while models trained with multi-turn strategies exhibit a significant degradation in single-turn reasoning performance. These results suggest that for tasks with complete information, robust single-turn training remains more effective and reliable, as multi-turn training with basic feedback provides limited benefits and can even degrade reasoning capabilities.

【10】Leverage Unlearning to Sanitize LLMs
标题：利用“放弃学习”来净化法学硕士
链接：https://arxiv.org/abs/2510.21322

作者：Antoine Boutet, Lucas Magnana
摘要：预训练的大型语言模型（LLM）在各种任务中变得越来越有用。为了提高它们在某些任务上的性能，有必要在特定的数据语料库上对其进行微调（例如，医疗报告、商业数据）。这些专用数据语料库可能包含敏感数据（例如，个人或机密数据），这些数据将被模型记忆，并且在其随后的使用期间可能被删除。模型对敏感信息的这种记忆会带来严重的隐私或保密问题。为了消除这种记忆和净化模型，而不需要昂贵的额外微调安全的数据语料库，我们提出了SANI。SANI是一种用于净化语言模型的非学习方法。它依赖于擦除和修复阶段，1）重置模型最后几层的某些神经元，以破坏细粒度信息的记忆，然后2）微调模型，同时避免记忆敏感信息。我们全面评估了SANI，通过从模型的记忆中删除直接和间接的标识符来净化经过微调和专门化的医疗数据模型，以及通过从模型中删除定义为机密信息的特定术语来净化标准预训练模型。结果表明，只有几个额外的时期的unlearning，该模型是消毒和重复的数量大大减少。这种方法对于医院或其他已经在大型数据集上花费大量资源训练模型并希望在共享之前对其进行消毒的行业特别有用。
摘要：Pre-trained large language models (LLMs) are becoming useful for various tasks. To improve their performance on certain tasks, it is necessary to fine-tune them on specific data corpora (e.g., medical reports, business data). These specialized data corpora may contain sensitive data (e.g., personal or confidential data) that will be memorized by the model and likely to be regurgitated during its subsequent use. This memorization of sensitive information by the model poses a significant privacy or confidentiality issue. To remove this memorization and sanitize the model without requiring costly additional fine-tuning on a secured data corpus, we propose SANI. SANI is an unlearning approach to sanitize language models. It relies on both an erasure and repair phases that 1) reset certain neurons in the last layers of the model to disrupt the memorization of fine-grained information, and then 2) fine-tune the model while avoiding memorizing sensitive information. We comprehensively evaluate SANI to sanitize both a model fine-tuned and specialized with medical data by removing directly and indirectly identifiers from the memorization of the model, and a standard pre-trained model by removing specific terms defined as confidential information from the model. Results show that with only few additional epochs of unlearning, the model is sanitized and the number of regurgitations is drastically reduced. This approach can be particularly useful for hospitals or other industries that have already spent significant resources training models on large datasets and wish to sanitize them before sharing.

【11】Efficient semantic uncertainty quantification in language models via diversity-steered sampling
标题：通过多样性引导的采样在语言模型中进行高效的语义不确定性量化
链接：https://arxiv.org/abs/2510.21310

作者：Ji Won Park, Kyunghyun Cho
备注：10 pages (+7 appendix), 7 figures. Accepted at NeurIPS 2025
摘要：准确估计大型语言模型（LLM）中的语义任意和认知不确定性在自由形式的问答（QA）中特别具有挑战性，其中获得稳定的估计通常需要许多昂贵的代。我们引入了一个多样性导向的采样器，不鼓励语义冗余的输出在解码过程中，涵盖自回归和掩蔽扩散范例，并产生大量的采样效率的收益。其关键思想是使用自然语言推理（NLI）模型在部分前缀或中间扩散状态上进行轻微微调，将连续的语义相似性惩罚注入模型的提案分布。我们去偏下游的不确定性估计的重要性重新加权和控制变量缩小他们的方差。在四个QA基准测试中，我们的方法匹配或超过基线，同时用相同数量的样本覆盖更多的语义聚类。由于是模块化的，并且不需要对基本LLM进行梯度访问，该框架有望作为风险敏感模型部署中不确定性估计的插入式增强。
摘要：Accurately estimating semantic aleatoric and epistemic uncertainties in large language models (LLMs) is particularly challenging in free-form question answering (QA), where obtaining stable estimates often requires many expensive generations. We introduce a diversity-steered sampler that discourages semantically redundant outputs during decoding, covers both autoregressive and masked diffusion paradigms, and yields substantial sample-efficiency gains. The key idea is to inject a continuous semantic-similarity penalty into the model's proposal distribution using a natural language inference (NLI) model lightly finetuned on partial prefixes or intermediate diffusion states. We debias downstream uncertainty estimates with importance reweighting and shrink their variance with control variates. Across four QA benchmarks, our method matches or surpasses baselines while covering more semantic clusters with the same number of samples. Being modular and requiring no gradient access to the base LLM, the framework promises to serve as a drop-in enhancement for uncertainty estimation in risk-sensitive model deployments.

【12】Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference
标题：使用概率推理降低语言模型中不良输出的可能性
链接：https://arxiv.org/abs/2510.21184

作者：Stephen Zhao, Aidan Li, Rob Brekelmans, Roger Grosse
摘要：强化学习（RL）已经成为一种将语言模型（LM）与人类偏好相匹配或促进给定奖励函数认为期望的输出的主要技术。标准的强化学习方法优化了平均回报，而明确关注降低不期望输出概率的方法通常会以平均情况性能为代价。为了改善这种权衡，我们引入了RePULSe，这是一种新的训练方法，它使用额外的损失来增加标准RL损失，该损失使用学习的建议来指导对低回报输出进行采样，然后降低这些输出的概率。我们运行的实验表明，RePULSe产生一个更好的权衡预期的回报与不期望的输出的概率，是更对抗强大的，相比标准的RL对齐方法和替代品。
摘要：Reinforcement learning (RL) has become a predominant technique to align language models (LMs) with human preferences or promote outputs which are deemed to be desirable by a given reward function. Standard RL approaches optimize average reward, while methods explicitly focused on reducing the probability of undesired outputs typically come at a cost to average-case performance. To improve this tradeoff, we introduce RePULSe, a new training method that augments the standard RL loss with an additional loss that uses learned proposals to guide sampling low-reward outputs, and then reduces those outputs' probability. We run experiments demonstrating that RePULSe produces a better tradeoff of expected reward versus the probability of undesired outputs and is more adversarially robust, compared to standard RL alignment approaches and alternatives.

【13】Large Language Models Meet Text-Attributed Graphs: A Survey of Integration Frameworks and Applications
标题：大型语言模型满足文本属性图：集成框架和应用的概览
链接：https://arxiv.org/abs/2510.21131

作者：Guangxin Su, Hanchen Wang, Jianwei Wang, Wenjie Zhang, Ying Zhang, Jian Pei
备注：Surveys and overviews; Natural language processing; Knowledge representation and reasoning; Graph algorithms
摘要：大型语言模型（LLM）通过强大的语义理解和生成，在自然语言处理方面取得了显着的成功。然而，它们的黑盒性质限制了结构化和多跳推理。相比之下，文本属性图（TAG）提供了丰富的文本上下文的显式关系结构，但往往缺乏语义深度。最近的研究表明，结合LLM和TAG产生互补的好处：增强TAG表示学习和提高LLM的推理和可解释性。这项调查提供了第一个系统的审查LLM-TAG集成从编排的角度来看。我们引入了一种新的分类法，涵盖两个基本方向：LLM的TAG，其中LLM丰富了基于图形的任务，和TAG的LLM，其中结构化图形改善了LLM推理。我们将编排策略分为顺序，并行和多模块框架，并讨论了TAG特定的预训练，提示和参数有效的微调的进展。除了方法论之外，我们还总结了经验见解，策划了可用的数据集，并强调了推荐系统，生物医学分析和知识密集型问题回答的各种应用。最后，我们概述了开放的挑战和有前途的研究方向，旨在指导语言和图形学习交叉点的未来工作。
摘要：Large Language Models (LLMs) have achieved remarkable success in natural language processing through strong semantic understanding and generation. However, their black-box nature limits structured and multi-hop reasoning. In contrast, Text-Attributed Graphs (TAGs) provide explicit relational structures enriched with textual context, yet often lack semantic depth. Recent research shows that combining LLMs and TAGs yields complementary benefits: enhancing TAG representation learning and improving the reasoning and interpretability of LLMs. This survey provides the first systematic review of LLM--TAG integration from an orchestration perspective. We introduce a novel taxonomy covering two fundamental directions: LLM for TAG, where LLMs enrich graph-based tasks, and TAG for LLM, where structured graphs improve LLM reasoning. We categorize orchestration strategies into sequential, parallel, and multi-module frameworks, and discuss advances in TAG-specific pretraining, prompting, and parameter-efficient fine-tuning. Beyond methodology, we summarize empirical insights, curate available datasets, and highlight diverse applications across recommendation systems, biomedical analysis, and knowledge-intensive question answering. Finally, we outline open challenges and promising research directions, aiming to guide future work at the intersection of language and graph learning.

【14】Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only
标题：自我奖励PPO：仅用演示调整大型语言模型
链接：https://arxiv.org/abs/2510.21090

作者：Qingru Zhang, Liang Qiu, Ilgee Hong, Zhenghao Xu, Tianyi Liu, Shiyang Li, Rongzhi Zhang, Zheng Li, Lihong Li, Bing Yin, Chao Zhang, Jianshu Chen, Haoming Jiang, Tuo Zhao
备注：Accepted by COLM 2025
摘要：监督微调（SFT）已经成为将大型语言模型（LLM）与人类注释演示对齐的关键方法。然而，SFT是一种类似于行为克隆的非策略方法，通常会遇到过拟合和域外泛化能力差的问题，特别是在数据有限的情况下。为了解决这些限制，我们提出了自奖励PPO，一种新的微调方法，利用对政策的技术，以提高泛化性能。我们的方法结合了SFT和最近策略优化（PPO）的优势，以实现更有效的调整，从示范数据。其核心是一个奖励函数，设计为SFT模型和预训练基础模型之间的日志策略比率。此函数用作隐式奖励信号，使用预训练的策略作为基线，SFT策略作为目标。通过这样做，它可以在不依赖人类偏好注释的情况下进行策略微调。这种自我奖励机制与PPO的集成解决了SFT的关键限制，提高了泛化能力，数据效率和鲁棒性。我们对一系列自然语言处理任务的实证评估表明，自我奖励PPO始终优于传统的SFT方法。结果强调了我们的方法在使用演示数据对齐LLM方面的有效性，特别是在高质量注释数据稀缺的情况下。
摘要：Supervised fine-tuning (SFT) has emerged as a crucial method for aligning large language models (LLMs) with human-annotated demonstrations. However, SFT, being an off-policy approach similar to behavior cloning, often struggles with overfitting and poor out-of-domain generalization, especially in limited-data scenarios. To address these limitations, we propose Self-Rewarding PPO, a novel fine-tuning method that leverages on-policy techniques to enhance generalization performance. Our approach combines the strengths of SFT and proximal policy optimization (PPO) to achieve more effective alignment from demonstration data. At its core is a reward function designed as the log policy ratio between the SFT model and the pretrained base model. This function serves as an implicit reward signal, using the pretrained policy as a baseline and the SFT policy as a target. By doing so, it enables on-policy fine-tuning without relying on human preference annotations. The integration of this self-rewarding mechanism with PPO addresses key limitations of SFT, improving generalization, data efficiency, and robustness. Our empirical evaluation across a range of natural language processing tasks demonstrates that Self-Rewarding PPO consistently outperforms traditional SFT methods. The results highlight the effectiveness of our approach in aligning LLMs using demonstration data, particularly in scenarios where high-quality annotated data is scarce.

【15】Customizing Open Source LLMs for Quantitative Medication Attribute Extraction across Heterogeneous EHR Systems
标题：定制开源LLM，用于跨异类EHR系统的定量药物属性提取
链接：https://arxiv.org/abs/2510.21027

作者：Zhe Fei, Mehmet Yigit Turali, Shreyas Rajesh, Xinyang Dai, Huyen Pham, Pavan Holur, Yuhui Zhu, Larissa Mooney, Yih-Ing Hser, Vwani Roychowdhury
备注：NeurIPS 2025: The Second Workshop on GenAI for Health: Potential, Trust, and Policy Compliance
摘要：在电子健康记录（EHR）系统中协调药物数据是监测阿片类药物使用障碍（MOUD）的一个持久障碍。在异构EHR系统中，关键处方属性分散在不同格式的字段和自由文本注释中。我们提出了一个实用的框架，自定义开源大型语言模型（LLM），包括Llama，Qwen，Gemma和MedGemma，从异构的，特定于站点的数据中提取一组统一的MOUD处方属性（处方日期，药物名称，持续时间，总数量，每日数量和再填充），并计算每个患者的药物覆盖率的标准化度量，MOUD天。我们的管道直接在固定的JSON模式中处理记录，然后进行轻量级规范化和跨字段一致性检查。我们使用之前注释的10{，}369条记录（776名患者）作为基准，根据国家OUD研究中5个诊所的处方级EHR数据（来自1{，}257名患者的25{，}605条记录）评估了该系统。真相。性能报告为覆盖率（具有有效、可匹配输出的记录的份额）和记录级精确匹配准确度。更大的模型整体表现最好：Qwen2.5- 32 B在诊所之间实现了\textbf {93.4\%}的覆盖率和\textbf{93.0\%}的精确匹配精度，MedGemma-27 B达到了\textbf{93.1\%}/\textbf{92.2\%}。一个简短的错误回顾强调了三个常见的问题和修复：使用药物内规范插补缺失的剂量字段，处理每月/每周注射剂（例如，Vivitrol）通过从记录的时间表中设置持续时间，并添加单位检查以防止质量单位（例如，“250 g”）被误读为每日计数。通过删除脆弱的、特定于站点的ETL并支持本地的、隐私保护的部署，这种方法可以在真实环境中对MOUD暴露、遵守和保留进行一致的跨站点分析。
摘要：Harmonizing medication data across Electronic Health Record (EHR) systems is a persistent barrier to monitoring medications for opioid use disorder (MOUD). In heterogeneous EHR systems, key prescription attributes are scattered across differently formatted fields and freetext notes. We present a practical framework that customizes open source large language models (LLMs), including Llama, Qwen, Gemma, and MedGemma, to extract a unified set of MOUD prescription attributes (prescription date, drug name, duration, total quantity, daily quantity, and refills) from heterogeneous, site specific data and compute a standardized metric of medication coverage, \emph{MOUD days}, per patient. Our pipeline processes records directly in a fixed JSON schema, followed by lightweight normalization and cross-field consistency checks. We evaluate the system on prescription level EHR data from five clinics in a national OUD study (25{,}605 records from 1{,}257 patients), using a previously annotated benchmark of 10{,}369 records (776 patients) as the ground truth. Performance is reported as coverage (share of records with a valid, matchable output) and record-level exact-match accuracy. Larger models perform best overall: Qwen2.5-32B achieves \textbf{93.4\%} coverage with \textbf{93.0\%} exact-match accuracy across clinics, and MedGemma-27B attains \textbf{93.1\%}/\textbf{92.2\%}. A brief error review highlights three common issues and fixes: imputing missing dosage fields using within-drug norms, handling monthly/weekly injectables (e.g., Vivitrol) by setting duration from the documented schedule, and adding unit checks to prevent mass units (e.g., ``250 g'') from being misread as daily counts. By removing brittle, site-specific ETL and supporting local, privacy-preserving deployment, this approach enables consistent cross-site analyses of MOUD exposure, adherence, and retention in real-world settings.

【16】Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression
标题：用于低比特LLM压缩的学习分组格形向量量化器
链接：https://arxiv.org/abs/2510.20984

作者：Xi Zhang, Xiaolin Wu, Jiamang Wang, Weisi Lin
备注：NeurIPS 2025 Poster
摘要：大型语言模型（LLM）已经证明了卓越的能力，但通常需要大量的计算资源和内存进行推理。后训练量化（PTQ）可以通过以较低的位宽格式存储权重来有效地减少这些需求。然而，标准的均匀量化通常会导致显著的性能下降，特别是在低比特场景中。在这项工作中，我们引入了一个分组格矢量量化（GLVQ）的框架，分配给每组的权重一个定制的格码本，由一个可学习的生成矩阵定义。为了解决量化过程的不可微性，我们在训练过程中采用Babai舍入来近似最近格点搜索，这使得生成矩阵能够稳定优化。一旦经过训练，解码就简化为一个简单的矩阵-向量乘法，从而产生一个有效且实用的量化流水线。在多个基准上的实验表明，与现有的训练后量化基线相比，我们的方法在模型大小和准确性之间实现了更好的权衡，突出了其在严格的资源约束下部署大型模型的有效性。我们的源代码可以在GitHub存储库中找到：https://github.com/xzhang9308/GLVQ。
摘要：Large Language Models (LLMs) have demonstrated remarkable capabilities but typically require extensive computational resources and memory for inference. Post-training quantization (PTQ) can effectively reduce these demands by storing weights in lower bit-width formats. However, standard uniform quantization often leads to notable performance degradation, particularly in low-bit scenarios. In this work, we introduce a Grouped Lattice Vector Quantization (GLVQ) framework that assigns each group of weights a customized lattice codebook, defined by a learnable generation matrix. To address the non-differentiability of the quantization process, we adopt Babai rounding to approximate nearest-lattice-point search during training, which enables stable optimization of the generation matrices. Once trained, decoding reduces to a simple matrix-vector multiplication, yielding an efficient and practical quantization pipeline. Experiments on multiple benchmarks show that our approach achieves a better trade-off between model size and accuracy compared to existing post-training quantization baselines, highlighting its effectiveness in deploying large models under stringent resource constraints. Our source code is available on GitHub repository: https://github.com/xzhang9308/GLVQ.

【17】L^2M^3OF: A Large Language Multimodal Model for Metal-Organic Frameworks
标题：L#2M#3OF：金属有机框架的大语言多模式模型
链接：https://arxiv.org/abs/2510.20976

作者：Jiyu Cui, Fang Wu, Haokai Zhao, Minggao Feng, Xenophon Evangelopoulos, Andrew I. Cooper, Yejin Choi
备注：18 pages, 7 figures
摘要：大型语言模型在各种自然语言任务中表现出了卓越的推理能力。然而，科学发现中的可比突破则更为有限，因为理解复杂的物理现象需要多方面的表征，远远超出了语言本身。一个令人信服的例子是功能材料的设计，如MOFs-对于碳捕获和氢存储等一系列有影响力的应用至关重要。在LLM可解释的基于语言的表示中导航其庞大而复杂的设计空间是具有挑战性的，因为有许多可能的三维原子排列和严格的协调几何和拓扑网状规则。尽管LLM辅助发现更简单的材料系统的早期结果很有希望，但MOF设计仍然严重依赖于很少单独编入文本信息的隐性人类专业知识。为了克服这一障碍，我们引入了L2 M3 OF，这是第一个用于MOF的多模态LLM。L2 M3 OF将晶体表征学习与语言理解相结合，共同处理结构、文本和知识模态。L2 M3 OF采用预先训练的晶体编码器和轻量级投影层，将结构信息压缩到令牌空间中，从而实现与语言指令的高效对齐。为了便于培训和评估，我们策划了晶体材料的结构-属性-知识数据库，并将L2 M3 OF与最先进的闭源LLM（如GPT-5，Gemini-2.5-Pro和DeepSeek-R1）进行基准测试。实验表明，L2 M3 OF在属性预测和知识生成任务中优于领先的基于文本的闭源LLM，尽管使用的参数少得多。这些结果强调了多模态方法对多孔材料理解的重要性，并将L2 M3 OF作为材料发现中下一代人工智能系统的基础。
摘要：Large language models have demonstrated remarkable reasoning capabilities across diverse natural language tasks. However, comparable breakthroughs in scientific discovery are more limited, because understanding complex physical phenomena demands multifaceted representations far beyond language alone. A compelling example is the design of functional materials such as MOFs-critical for a range of impactful applications like carbon capture and hydrogen storage. Navigating their vast and intricate design space in language-based representations interpretable by LLMs is challenging due to the numerous possible three-dimensional atomic arrangements and strict reticular rules of coordination geometry and topology. Despite promising early results in LLM-assisted discovery for simpler materials systems, MOF design remains heavily reliant on tacit human expertise rarely codified in textual information alone. To overcome this barrier, we introduce L2M3OF, the first multimodal LLM for MOFs. L2M3OF integrates crystal representation learning with language understanding to process structural, textual, and knowledge modalities jointly. L2M3OF employs a pre-trained crystal encoder with a lightweight projection layer to compress structural information into a token space, enabling efficient alignment with language instructions. To facilitate training and evaluation, we curate a structure-property-knowledge database of crystalline materials and benchmark L2M3OF against state-of-the-art closed-source LLMs such as GPT-5, Gemini-2.5-Pro and DeepSeek-R1. Experiments show that L2M3OF outperforms leading text-based closed-source LLMs in property prediction and knowledge generation tasks, despite using far fewer parameters. These results highlight the importance of multimodal approaches for porous material understanding and establish L2M3OF as a foundation for next-generation AI systems in materials discovery.

【18】LLM-Integrated Bayesian State Space Models for Multimodal Time-Series Forecasting
标题：用于多峰时间序列预测的LLM集成Bayesian状态空间模型
链接：https://arxiv.org/abs/2510.20952

作者：Sungjun Cho, Changho Shin, Suenggwan Jo, Xinya Yan, Shourjo Aditya Chaudhuri, Frederic Sala
备注：15 pages, 8 figures
摘要：现实世界中的预测需要将结构化的时间序列数据与非结构化的文本信息相结合，但现有的方法在架构上受到固定输入/输出范围的限制，并且无法建模或量化不确定性。我们通过引入LLM集成贝叶斯状态空间模型（LBS），一种新的多模态时间预测的概率框架来解决这一挑战。在高层次上，LBS由两个组件组成：（1）状态空间模型（SSM）主干，捕获潜在状态的时间动态，从中生成数值和文本观测;（2）预训练的大型语言模型（LLM），适用于编码文本输入以进行后验状态估计，并解码与潜在轨迹一致的文本预测。这种设计使灵活的回顾和预测窗口，原则性的不确定性量化，并改善时间概括感谢SSM的建模动态系统的非常适合的归纳偏见。在TextTimeCorpus基准测试上的实验表明，LBS将之前的最先进技术提高了13.20%，同时为每个预测提供了人类可读的摘要。我们的工作是第一个统一LLM和SSM的联合数值和文本预测，提供了一个新的基础多模态时间推理。
摘要：Forecasting in the real world requires integrating structured time-series data with unstructured textual information, but existing methods are architecturally limited by fixed input/output horizons and are unable to model or quantify uncertainty. We address this challenge by introducing LLM-integrated Bayesian State space models (LBS), a novel probabilistic framework for multimodal temporal forecasting. At a high level, LBS consists of two components: (1) a state space model (SSM) backbone that captures the temporal dynamics of latent states from which both numerical and textual observations are generated and (2) a pretrained large language model (LLM) that is adapted to encode textual inputs for posterior state estimation and decode textual forecasts consistent with the latent trajectory. This design enables flexible lookback and forecast windows, principled uncertainty quantification, and improved temporal generalization thanks to the well-suited inductive bias of SSMs toward modeling dynamical systems. Experiments on the TextTimeCorpus benchmark demonstrate that LBS improves the previous state-of-the-art by 13.20% while providing human-readable summaries of each forecast. Our work is the first to unify LLMs and SSMs for joint numerical and textual prediction, offering a novel foundation for multimodal temporal reasoning.

【19】Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards
标题：通过推理过程奖励激励音频LLM一致、有效和可扩展的推理能力
链接：https://arxiv.org/abs/2510.20867

作者：Jiajun Fan, Roger Ren, Jingyuan Li, Rahul Pandey, Prashanth Gurunath Shivakumar, Ivan Bulyko, Ankur Gandhe, Ge Liu, Yile Gu
备注：49 pages
摘要：推理在音频大语言模型中的作用仍然没有得到广泛的探索，因为引入推理过程通常会降低而不是提高推理过程的性能，这是一种我们称之为测试时逆缩放的现象，其中较长的推理链会产生越来越差的结果。我们证明，这不是源于推理本身的基本限制，而是来自训练不足：没有适当指导的模型推理过程产生幻觉，不一致的推理，积累了较长的链错误。为了应对这些挑战，我们引入了CESAR（一致，有效和可扩展的音频推理器），从结果验证转向奖励推理过程。我们的在线强化学习框架采用了具有多方面奖励套件的组相对策略优化，不仅激励正确性和格式，还激励一致性，结构化分析模式，因果推理，领域知识整合和校准推理深度。CESAR解决了测试时间的逆缩放，将推理从测量转换为增益，同时揭示了特定于模型的“推理甜蜜点”，在测试时间缩放期间性能达到峰值。我们在MMAU Test-mini上实现了最先进的结果，大大超过了Gemini 2.5 Pro和GPT-4 o Audio，并且在MMSU推理任务上接近人类水平。通过人工智能作为判断评估和定性比较，我们提供了我们改进的推理质量的定量和定性验证。重要的是，增强的推理产生协同效应，同时提高多模态推理和感知能力。总的来说，CESAR建立了一个原则性的方法，在音频LLM开发强大的和可扩展的推理。
摘要：The role of reasoning in Audio Large Language Models remains widely underexplored, as introducing a reasoning process often degrades rather than improves performance during inference, a phenomenon we term test-time inverse scaling, where longer reasoning chains yield progressively worse results. We demonstrate that this stems not from fundamental limitations of reasoning itself, but from inadequate training: models without proper guidance for the reasoning process produce hallucinatory, inconsistent reasoning that accumulates errors over longer chains. To address these challenges, we introduce CESAR (Consistent, Effective, and Scalable Audio Reasoners), shifting from outcome verification to rewarding the reasoning process. Our online reinforcement learning framework employs Group Relative Policy Optimization with a multi-faceted reward suite that incentivizes not only correctness and format but also consistency, structured analytical patterns, causal reasoning, domain-knowledge integration, and calibrated reasoning depth. CESAR resolves test-time inverse scaling, transforming reasoning from detriments into gains while revealing model-specific ``reasoning sweet spots", where performance peaks during test-time scaling. We achieve state-of-the-art results on MMAU Test-mini, substantially outperforming Gemini 2.5 Pro and GPT-4o Audio, and near-human-level performance on MMSU reasoning tasks. Through AI-as-judge evaluations and qualitative comparisons, we provide both quantitative and qualitative validation of our improved reasoning quality. Importantly, enhanced reasoning creates synergistic effects, simultaneously improving multimodal reasoning and perception capabilities. Overall, CESAR establishes a principled method for developing robust and scalable reasoning in Audio LLMs.

Graph相关(图学习|图神经网络|图优化等)(11篇)

【1】Optimal Graph Clustering without Edge Density Signals
标题：没有边缘密度信号的最佳图聚集
链接：https://arxiv.org/abs/2510.21669

作者：Maximilien Dreveton, Elaine Siyu Liu, Matthias Grossglauser, Patrick Thiran
摘要：针对现有模型的局限性，建立了流行度调整区组模型（PABM）下图聚类的理论界限。与假设统一顶点度的随机块模型（SBM）和跨集群应用统一度校正的度校正块模型（DCBM）相比，PABM为集群内和集群间连接引入了单独的流行度参数。我们的主要贡献是PABM下聚类的最佳错误率的表征，这提供了新的见解聚类硬度：我们证明，不像SBM和DCBM，集群恢复仍然可能在PABM，即使传统的边缘密度信号消失，提供内和集群间的流行系数不同。这突出了PABM捕获但被DCBM忽略的程度异质性的一个维度：连接模式的局部差异可以独立于全局边缘密度来增强聚类可分性。最后，由于PABM具有更丰富的结构，其预期邻接矩阵的秩在$k$和$k^2$之间，其中$k$是簇的数量。因此，基于前$k$特征向量的谱嵌入可能无法捕获重要的结构信息。我们的数值实验合成和真实的数据集证实，谱聚类算法，将$k^2$特征向量优于传统的谱方法。
摘要：This paper establishes the theoretical limits of graph clustering under the Popularity-Adjusted Block Model (PABM), addressing limitations of existing models. In contrast to the Stochastic Block Model (SBM), which assumes uniform vertex degrees, and to the Degree-Corrected Block Model (DCBM), which applies uniform degree corrections across clusters, PABM introduces separate popularity parameters for intra- and inter-cluster connections. Our main contribution is the characterization of the optimal error rate for clustering under PABM, which provides novel insights on clustering hardness: we demonstrate that unlike SBM and DCBM, cluster recovery remains possible in PABM even when traditional edge-density signals vanish, provided intra- and inter-cluster popularity coefficients differ. This highlights a dimension of degree heterogeneity captured by PABM but overlooked by DCBM: local differences in connectivity patterns can enhance cluster separability independently of global edge densities. Finally, because PABM exhibits a richer structure, its expected adjacency matrix has rank between $k$ and $k^2$, where $k$ is the number of clusters. As a result, spectral embeddings based on the top $k$ eigenvectors may fail to capture important structural information. Our numerical experiments on both synthetic and real datasets confirm that spectral clustering algorithms incorporating $k^2$ eigenvectors outperform traditional spectral approaches.

【2】Leveraging Classical Algorithms for Graph Neural Networks
标题：经典算法在图神经网络中的应用
链接：https://arxiv.org/abs/2510.21574

作者：Jason Wu, Petar Veličković
摘要：神经网络擅长处理非结构化数据，但通常无法概括分布外的数据，而经典算法可以保证正确性，但缺乏灵活性。我们探讨了在经典算法上预训练图神经网络（GNN）是否可以提高其在开放图基准的分子性质预测任务上的性能：ogbg-molhiv（HIV抑制）和ogbg-molclintox（临床毒性）。在CLRS数学推理基准的24种经典算法上训练的GNN用于初始化和冻结第二个GNN的选定层，以进行分子预测。与随机初始化的基线相比，预训练模型实现了一致的胜利或平局，Segments Intersect算法预训练在ogbg-molhiv上获得了6%的绝对增益，Dijkstra预训练在ogbg-molclintox上获得了3%的增益。这些结果表明，将经典算法先验嵌入到GNN中提供了有用的归纳偏差，提高了复杂的真实图形数据的性能。
摘要：Neural networks excel at processing unstructured data but often fail to generalise out-of-distribution, whereas classical algorithms guarantee correctness but lack flexibility. We explore whether pretraining Graph Neural Networks (GNNs) on classical algorithms can improve their performance on molecular property prediction tasks from the Open Graph Benchmark: ogbg-molhiv (HIV inhibition) and ogbg-molclintox (clinical toxicity). GNNs trained on 24 classical algorithms from the CLRS Algorithmic Reasoning Benchmark are used to initialise and freeze selected layers of a second GNN for molecular prediction. Compared to a randomly initialised baseline, the pretrained models achieve consistent wins or ties, with the Segments Intersect algorithm pretraining yielding a 6% absolute gain on ogbg-molhiv and Dijkstra pretraining achieving a 3% gain on ogbg-molclintox. These results demonstrate embedding classical algorithmic priors into GNNs provides useful inductive biases, boosting performance on complex, real-world graph data.

【3】On Local Limits of Sparse Random Graphs: Color Convergence and the Refined Configuration Model
标题：稀疏随机图的局部极限：颜色收敛和精细配置模型
链接：https://arxiv.org/abs/2510.21392

作者：Alexander Pluska, Sagar Malhotra
摘要：局部收敛已经成为分析稀疏随机图模型的基本工具。我们引入了一个新的概念，局部收敛，颜色收敛，基于Weisfeiler-Leman算法。色收敛充分刻画了一类在消息传递图神经网络极限内表现良好的随机图。在此基础上，我们提出了细化配置模型（RCM），随机图模型，概括了配置模型。RCM在局部树型随机图模型（包括Erd\H{o}s-R\'enyi、随机块和配置模型）的局部收敛性方面具有普遍性.最后，这个框架使一个完整的随机树，出现这种图形的局部限制的特征。
摘要：Local convergence has emerged as a fundamental tool for analyzing sparse random graph models. We introduce a new notion of local convergence, color convergence, based on the Weisfeiler-Leman algorithm. Color convergence fully characterizes the class of random graphs that are well-behaved in the limit for message-passing graph neural networks. Building on this, we propose the Refined Configuration Model (RCM), a random graph model that generalizes the configuration model. The RCM is universal with respect to local convergence among locally tree-like random graph models, including Erd\H{o}s-R\'enyi, stochastic block and configuration models. Finally, this framework enables a complete characterization of the random trees that arise as local limits of such graphs.

【4】Relieving the Over-Aggregating Effect in Graph Transformers
标题：缓解图形Transformer中的过度聚集效应
链接：https://arxiv.org/abs/2510.21267

作者：Junshu Sun, Wanxing Chang, Chenxue Yang, Qingming Huang, Shuhui Wang
备注：Accepted by NeurIPS 2025
摘要：图形注意力在图形学习任务中表现出优越的性能。然而，由于大量的节点，从全局交互中学习可能具有挑战性。在本文中，我们发现了一个新的现象称为过度聚集。当大量的消息被聚集到一个单一的节点与较少的歧视，导致稀释的关键消息和潜在的信息丢失。为了解决这个问题，我们提出了Wideformer，一种即插即用的图形注意力方法。Wideformer将所有节点的聚合划分为并行进程，并引导模型专注于这些进程的特定子集。这种划分可以限制每个聚合的输入量，避免信息稀释，减少信息损失。引导步骤对聚合输出进行排序和加权，对信息性消息进行优先级排序。评估表明，Wideformer可以有效地减轻过度聚集。因此，主干方法可以专注于信息性消息，与基线方法相比，实现了更好的性能。
摘要：Graph attention has demonstrated superior performance in graph learning tasks. However, learning from global interactions can be challenging due to the large number of nodes. In this paper, we discover a new phenomenon termed over-aggregating. Over-aggregating arises when a large volume of messages is aggregated into a single node with less discrimination, leading to the dilution of the key messages and potential information loss. To address this, we propose Wideformer, a plug-and-play method for graph attention. Wideformer divides the aggregation of all nodes into parallel processes and guides the model to focus on specific subsets of these processes. The division can limit the input volume per aggregation, avoiding message dilution and reducing information loss. The guiding step sorts and weights the aggregation outputs, prioritizing the informative messages. Evaluations show that Wideformer can effectively mitigate over-aggregating. As a result, the backbone methods can focus on the informative messages, achieving superior performance compared to baseline methods.

【5】Adaptive Graph Mixture of Residual Experts: Unsupervised Learning on Diverse Graphs with Heterogeneous Specialization
标题：残差专家的自适应图混合：具有异质专业化的多样图上的无监督学习
链接：https://arxiv.org/abs/2510.21207

作者：Yunlong Chu, Minglai Shao, Zengyi Wo, Bing Hao, Yuhang Liu, Ruijie Wang, Jianxin Li
摘要：图神经网络（GNN）面临着一个基本的适应性挑战：它们固定的消息传递架构与现实世界图的巨大多样性作斗争，其中最佳计算策略因局部结构和任务而异。虽然混合专家（MoE）提供了一个有前途的途径，适应性，现有的图MoE方法仍然受到其依赖于监督信号和不稳定性时，训练异构专家。我们介绍ADaMoRE（自适应混合残差专家），一个原则性的框架，使强大的，完全无监督的训练异构MoE图。ADaMoRE采用骨干残差专家架构，其中基础编码器提供稳定性，而专业残差专家捕获不同的计算模式。结构感知选通网络执行细粒度节点路由。整个架构使用统一的无监督目标进行端到端的训练，该目标将主要重建任务与信息理论多样性正则化器集成在一起，以明确地在专家之间执行功能专业化。理论分析表明，该设计提高了数据效率和训练稳定性。在16个基准测试中进行的广泛评估验证了ADaMoRE在无监督节点分类和Few-Shot学习方面的最新性能，以及卓越的泛化能力、训练效率和对各种图形和任务的更快收敛。
摘要：Graph Neural Networks (GNNs) face a fundamental adaptability challenge: their fixed message-passing architectures struggle with the immense diversity of real-world graphs, where optimal computational strategies vary by local structure and task. While Mixture-of-Experts (MoE) offers a promising pathway to adaptability, existing graph MoE methods remain constrained by their reliance on supervised signals and instability when training heterogeneous experts. We introduce ADaMoRE (Adaptive Mixture of Residual Experts), a principled framework that enables robust, fully unsupervised training of heterogeneous MoE on graphs. ADaMoRE employs a backbone-residual expert architecture where foundational encoders provide stability while specialized residual experts capture diverse computational patterns. A structurally-aware gating network performs fine-grained node routing. The entire architecture is trained end-to-end using a unified unsupervised objective, which integrates a primary reconstruction task with an information-theoretic diversity regularizer to explicitly enforce functional specialization among the experts. Theoretical analysis confirms our design improves data efficiency and training stability. Extensive evaluation across 16 benchmarks validates ADaMoRE's state-of-the-art performance in unsupervised node classification and few-shot learning, alongside superior generalization, training efficiency, and faster convergence on diverse graphs and tasks.

【6】M-GLC: Motif-Driven Global-Local Context Graphs for Few-shot Molecular Property Prediction
标题：M-GLC：模体驱动的全局-局部上下文图用于少样本分子性质预测
链接：https://arxiv.org/abs/2510.21088

作者：Xiangyang Xu, Hongyang Gao
摘要：分子性质预测（MPP）是药物发现和材料科学的基石，但传统的深度学习方法依赖于通常不可用的大型标记数据集。Few-Shot分子性质预测（FSMPP）通过将关系归纳偏差通过将分子节点链接到性质节点的上下文图来解决这种稀缺性，但是这样的分子性质图提供有限的结构指导。我们提出了一个全面的解决方案：基序驱动的全局-局部上下文图的Few-Shot分子性质预测，丰富了上下文信息在全局和局部水平。在全球层面上，化学意义的基序节点代表共享的子结构，如环或官能团，被引入形成一个全球的三分异构图，产生基序分子性质的连接，捕捉远程组成模式，并使知识转移分子与共同的基序。在局部水平上，我们为分子性质对中的每个节点构建子图，并分别对它们进行编码，以将模型的注意力集中在信息量最大的相邻分子和基序上。在五个标准FSMPP基准测试上的实验表明，我们的框架始终优于最先进的方法。这些结果强调了将全局基序知识与细粒度局部背景相结合以推进稳健的Few-Shot分子性质预测的有效性。
摘要：Molecular property prediction (MPP) is a cornerstone of drug discovery and materials science, yet conventional deep learning approaches depend on large labeled datasets that are often unavailable. Few-shot Molecular property prediction (FSMPP) addresses this scarcity by incorporating relational inductive bias through a context graph that links molecule nodes to property nodes, but such molecule-property graphs offer limited structural guidance. We propose a comprehensive solution: Motif Driven Global-Local Context Graph for few-shot molecular property prediction, which enriches contextual information at both the global and local levels. At the global level, chemically meaningful motif nodes representing shared substructures, such as rings or functional groups, are introduced to form a global tri-partite heterogeneous graph, yielding motif-molecule-property connections that capture long-range compositional patterns and enable knowledge transfer among molecules with common motifs. At the local level, we build a subgraph for each node in the molecule-property pair and encode them separately to concentrate the model's attention on the most informative neighboring molecules and motifs. Experiments on five standard FSMPP benchmarks demonstrate that our framework consistently outperforms state-of-the-art methods. These results underscore the effectiveness of integrating global motif knowledge with fine-grained local context to advance robust few-shot molecular property prediction.

【7】Graph Neural Regularizers for PDE Inverse Problems
标题：偏脱方程反问题的图神经调节器
链接：https://arxiv.org/abs/2510.21012

作者：William Lauga, James Rowbottom, Alexander Denker, Željko Kereta, Moshe Eliasof, Carola-Bibiane Schönlieb
摘要：我们提出了一个框架，用于解决由偏微分方程（PDE）控制的一类广泛的不适定反问题，其中通过迭代正则化方案恢复前向算子的目标系数，该方案在基于FEM的反演和学习的图神经正则化之间交替。使用有限元法（FEM）的数值求解的正问题，使适用范围广泛的几何形状和偏微分方程。通过利用FEM离散化固有的图结构，我们采用物理启发的图神经网络作为学习的正则化器，为标准方法提供了一种鲁棒的，可解释的和可推广的替代方案。数值实验表明，我们的框架优于经典的正则化技术，即使在高度不适定的情况下，也能实现精确的重建。
摘要：We present a framework for solving a broad class of ill-posed inverse problems governed by partial differential equations (PDEs), where the target coefficients of the forward operator are recovered through an iterative regularization scheme that alternates between FEM-based inversion and learned graph neural regularization. The forward problem is numerically solved using the finite element method (FEM), enabling applicability to a wide range of geometries and PDEs. By leveraging the graph structure inherent to FEM discretizations, we employ physics-inspired graph neural networks as learned regularizers, providing a robust, interpretable, and generalizable alternative to standard approaches. Numerical experiments demonstrate that our framework outperforms classical regularization techniques and achieves accurate reconstructions even in highly ill-posed scenarios.

【8】CC-GRMAS: A Multi-Agent Graph Neural System for Spatiotemporal Landslide Risk Assessment in High Mountain Asia
标题：CC-GRMAS：亚洲高山时空滑坡风险评估的多智能体图神经系统
链接：https://arxiv.org/abs/2510.20875

作者：Mihir Panchal, Ying-Jung Chen, Surya Parkash
摘要：滑坡是一种日益严重的气候诱发灾害，对环境和人类造成严重后果，特别是在亚洲高山地区。尽管越来越多地获得卫星和时间数据集，及时检测和灾害应对仍然不够发达和分散。这项工作介绍了CC-GRMAS，一个框架，利用一系列的卫星观测和环境信号，以提高滑坡预测的准确性。该系统围绕三个相互关联的代理预测，规划和执行，协同实现实时态势感知，响应规划和干预。通过纳入当地环境因素并实施多代理协调，这种方法为脆弱山区的气候适应性灾害准备提供了一个可扩展和积极主动的解决方案。
摘要：Landslides are a growing climate induced hazard with severe environmental and human consequences, particularly in high mountain Asia. Despite increasing access to satellite and temporal datasets, timely detection and disaster response remain underdeveloped and fragmented. This work introduces CC-GRMAS, a framework leveraging a series of satellite observations and environmental signals to enhance the accuracy of landslide forecasting. The system is structured around three interlinked agents Prediction, Planning, and Execution, which collaboratively enable real time situational awareness, response planning, and intervention. By incorporating local environmental factors and operationalizing multi agent coordination, this approach offers a scalable and proactive solution for climate resilient disaster preparedness across vulnerable mountainous terrains.

【9】Crisis-Resilient Portfolio Management via Graph-based Spatio-Temporal Learning
标题：通过基于图的时空学习进行具有危机弹性的投资组合管理
链接：https://arxiv.org/abs/2510.20868

作者：Zan Li, Rui Fan
摘要：金融时间序列预测面临着一个根本性的挑战：预测最佳资产配置需要了解在危机期间转变的依赖于制度的相关结构。现有的基于图的时空学习方法依赖于预定的图拓扑结构-相关阈值，部门分类-当市场动态在不同的危机机制中发生变化时无法适应：信用传染，流行病冲击或通胀驱动的抛售。我们提出了CRISP（Crisisis-Resilient Investment through Spatio-temporal Patterns），这是一个基于图的时空学习框架，它通过图卷积网络编码空间关系，通过BiLSTM自注意编码时间动态，然后通过多头图注意网络学习稀疏结构。与固定拓扑方法不同，CRISP通过注意力机制发现哪些资产关系很重要，将92.5%的连接过滤为噪音，同时保留危机相关的依赖关系，以进行准确的特定于制度的预测。在2005- 2021年包括信贷和流行病危机的数据上进行了培训，CRISP通过准确预测与制度相适应的相关结构，对2022- 2024年通胀驱动的市场（一个根本不同的制度）进行了有力的概括。这使得自适应投资组合分配能够在低迷时期保持盈利能力，实现夏普比率3.76：比等权重基线提高707%，比静态图方法提高94%。学习的注意力权重提供了可解释的状态检测，在危机期间，防御性集群注意力增强了49%，而在整个市场范围内，从学习到预测而不是强加假设的紧急行为增强了31%。
摘要：Financial time series forecasting faces a fundamental challenge: predicting optimal asset allocations requires understanding regime-dependent correlation structures that transform during crisis periods. Existing graph-based spatio-temporal learning approaches rely on predetermined graph topologies--correlation thresholds, sector classifications--that fail to adapt when market dynamics shift across different crisis mechanisms: credit contagion, pandemic shocks, or inflation-driven selloffs. We present CRISP (Crisis-Resilient Investment through Spatio-temporal Patterns), a graph-based spatio-temporal learning framework that encodes spatial relationships via Graph Convolutional Networks and temporal dynamics via BiLSTM with self-attention, then learns sparse structures through multi-head Graph Attention Networks. Unlike fixed-topology methods, CRISP discovers which asset relationships matter through attention mechanisms, filtering 92.5% of connections as noise while preserving crisis-relevant dependencies for accurate regime-specific predictions. Trained on 2005--2021 data encompassing credit and pandemic crises, CRISP demonstrates robust generalization to 2022--2024 inflation-driven markets--a fundamentally different regime--by accurately forecasting regime-appropriate correlation structures. This enables adaptive portfolio allocation that maintains profitability during downturns, achieving Sharpe ratio 3.76: 707% improvement over equal-weight baselines and 94% improvement over static graph methods. Learned attention weights provide interpretable regime detection, with defensive cluster attention strengthening 49% during crises versus 31% market-wide--emergent behavior from learning to forecast rather than imposing assumptions.

【10】A Short Note on Upper Bounds for Graph Neural Operator Convergence Rate
标题：关于图神经运算符收敛率上界的简短注释
链接：https://arxiv.org/abs/2510.20954

作者：Roxanne Holden, Luana Ruiz
摘要：图子作为图序列的极限，为分析图神经运算符的渐进行为提供了一个框架。采样图到图子的谱收敛产生运算符级的收敛速率，使GNN的可转移性分析成为可能。本文总结了在无假设、全局Lipschitz连续性和分段Lipschitz连续性下的已知边界，强调了假设和速率之间的权衡，并说明了它们在合成和真实数据上的经验紧密性。
摘要：Graphons, as limits of graph sequences, provide a framework for analyzing the asymptotic behavior of graph neural operators. Spectral convergence of sampled graphs to graphons yields operator-level convergence rates, enabling transferability analyses of GNNs. This note summarizes known bounds under no assumptions, global Lipschitz continuity, and piecewise-Lipschitz continuity, highlighting tradeoffs between assumptions and rates, and illustrating their empirical tightness on synthetic and real data.

【11】BACE: Behavior-Adaptive Connectivity Estimation for Interpretable Graphs of Neural Dynamics
标题：BACE：神经动力学可解释图的行为自适应连通性估计
链接：https://arxiv.org/abs/2510.20831

作者：Mehrnaz Asadi, Sina Javadzadeh, Rahil Soroushmojdehi, S. Alireza Seyyed Mousavi, Terence D. Sanger
摘要：了解分布式大脑区域如何协调产生行为，需要具有预测性和可解释性的模型。我们介绍了行为自适应连接估计（BACE），这是一个端到端的框架，可以直接从多区域颅内局部场电位（LFP）学习特定阶段的定向区域间连接。BACE通过每个区域的时间编码器聚集每个解剖区域内的许多微接触，将特定于每个行为阶段的可学习邻接应用于每个行为阶段，并在预测目标上进行训练。在具有已知图形的合成多变量时间序列上，BACE准确地恢复地面实况定向交互，同时实现与最先进基线相当的预测性能。应用于人类皮层下LFP记录同时从八个区域在一个线索达到任务，BACE产生一个明确的连接矩阵，每个试验内的行为阶段。由此产生的行为阶段的特定图形揭示了行为对齐的区域间影响的重新配置，并提供紧凑的，可解释的邻接矩阵，用于比较跨行为阶段的网络组织。通过将预测成功与明确的连接估计联系起来，BACE提供了一个实用的工具，用于生成关于行为过程中皮层下区域动态协调的数据驱动假设。
摘要：Understanding how distributed brain regions coordinate to produce behavior requires models that are both predictive and interpretable. We introduce Behavior-Adaptive Connectivity Estimation (BACE), an end-to-end framework that learns phase-specific, directed inter-regional connectivity directly from multi-region intracranial local field potentials (LFP). BACE aggregates many micro-contacts within each anatomical region via per-region temporal encoders, applies a learnable adjacency specific to each behavioral phase, and is trained on a forecasting objective. On synthetic multivariate time series with known graphs, BACE accurately recovers ground-truth directed interactions while achieving forecasting performance comparable to state-of-the-art baselines. Applied to human subcortical LFP recorded simultaneously from eight regions during a cued reaching task, BACE yields an explicit connectivity matrix for each within-trial behavioral phase. The resulting behavioral phase-specific graphs reveal behavior-aligned reconfiguration of inter-regional influence and provide compact, interpretable adjacency matrices for comparing network organization across behavioral phases. By linking predictive success to explicit connectivity estimates, BACE offers a practical tool for generating data-driven hypotheses about the dynamic coordination of subcortical regions during behavior.

Transformer(3篇)

【1】Head Pursuit: Probing Attention Specialization in Multimodal Transformers
标题：头部追求：多模式Transformer专业关注
链接：https://arxiv.org/abs/2510.21518

作者：Lorenzo Basile, Valentino Maiorca, Diego Doimo, Francesco Locatello, Alberto Cazzaniga
备注：Accepted at NeurIPS 2025 (spotlight)
摘要：语言和视觉语言模型在广泛的任务中表现出令人印象深刻的性能，但它们的内部机制仍然只有部分了解。在这项工作中，我们研究了文本生成模型中的个体注意力如何专注于特定的语义或视觉属性。建立在一个既定的可解释性方法，我们重新解释的做法，探测中间激活与最终解码层通过镜头的信号处理。这使我们能够以原则性的方式分析多个样本，并根据它们与目标概念的相关性对注意力头部进行排名。我们的研究结果表明，在头部水平在单峰和多峰Transformers的专业化一致的模式。值得注意的是，我们发现，编辑使用我们的方法选择的1%的头部，可以可靠地抑制或增强模型输出中的目标概念。我们验证了我们的语言任务，如问答和毒性缓解，以及视觉语言任务，包括图像分类和字幕的方法。我们的研究结果强调了注意力层中的可解释和可控结构，为理解和编辑大规模生成模型提供了简单的工具。
摘要：Language and vision-language models have shown impressive performance across a wide range of tasks, but their internal mechanisms remain only partly understood. In this work, we study how individual attention heads in text-generative models specialize in specific semantic or visual attributes. Building on an established interpretability method, we reinterpret the practice of probing intermediate activations with the final decoding layer through the lens of signal processing. This lets us analyze multiple samples in a principled way and rank attention heads based on their relevance to target concepts. Our results show consistent patterns of specialization at the head level across both unimodal and multimodal transformers. Remarkably, we find that editing as few as 1% of the heads, selected using our method, can reliably suppress or enhance targeted concepts in the model output. We validate our approach on language tasks such as question answering and toxicity mitigation, as well as vision-language tasks including image classification and captioning. Our findings highlight an interpretable and controllable structure within attention layers, offering simple tools for understanding and editing large-scale generative models.

【2】Sensor-Specific Transformer (PatchTST) Ensembles with Test-Matched Augmentation
标题：具有测试匹配增强功能的传感器特定Transformer（PatchTST）集成
链接：https://arxiv.org/abs/2510.21282

作者：Pavankumar Chandankar, Robin Burchard
摘要：我们在第二届WEAR数据集挑战赛上提出了一种噪声感知的传感器特定集成方法，用于强大的人类活动识别。我们的方法利用PatchTST Transformer架构，训练四个独立的模型-每个惯性传感器位置-在篡改的训练集上，其1秒滑动窗口被增强以模仿测试时的噪声。通过对齐训练和测试数据模式（JSON编码的50个样本窗口）并应用随机抖动、缩放、旋转和通道丢弃，每个PatchTST模型都能学习如何在真实世界的传感器扰动中进行泛化。在推理时，我们计算Kaggle测试集上所有四个传感器模型的softmax概率，并将其平均以产生最终标签。在私有排行榜上，该管道实现了远高于基线的宏F1，表明测试匹配增强与基于变压器的集成相结合是噪声条件下鲁棒HAR的有效策略。
摘要：We present a noise-aware, sensor-specific ensemble approach for robust human activity recognition on the 2nd WEAR Dataset Challenge. Our method leverages the PatchTST transformer architecture, training four independent models-one per inertial sensor location-on a tampered training set whose 1-second sliding windows are augmented to mimic the test-time noise. By aligning the train and test data schemas (JSON-encoded 50-sample windows) and applying randomized jitter, scaling, rotation, and channel dropout, each PatchTST model learns to generalize across real-world sensor perturbations. At inference, we compute softmax probabilities from all four sensor models on the Kaggle test set and average them to produce final labels. On the private leaderboard, this pipeline achieves a macro-F1 substantially above the baseline, demonstrating that test-matched augmentation combined with transformer-based ensembling is an effective strategy for robust HAR under noisy conditions.

【3】GPU Memory Requirement Prediction for Deep Learning Task Based on Bidirectional Gated Recurrent Unit Optimization Transformer
标题：基于双向门控回归单元优化Transformer的深度学习任务的图形处理器内存需求预测
链接：https://arxiv.org/abs/2510.20985

作者：Chao Wang, Zhizhao Wen, Ruoxin Zhang, Puyang Xu, Yifan Jiang
摘要：针对深度学习任务对GPU内存资源准确预测的需求日益关键，本文深入分析了当前研究现状，创新性地提出了一种集成双向门控递归单元（BiGRU）优化Transformer架构的深度学习模型，旨在提高内存需求预测的准确性。为了验证模型的有效性，选取决策树、随机森林、Adaboost和XGBoost四种具有代表性的基本机器学习模型进行了精心设计的对比实验。详细的实验结果表明，本文提出的BiGRU Transformer优化模型在关键评价指标上表现出明显的优势：在均方误差（MSE）和均方根误差（RMSE）方面，该模型在所有对比模型中达到最低值，其预测结果与实际值的偏差最小;在平均绝对误差（MAE）和决定系数（R2）指标方面，该模型也表现良好，结果均衡稳定，综合预测性能远超基准机器学习方法比较。综上所述，本研究成功构建的基于双向门控递归单元优化的Transformer模型，能够高效准确地完成深度学习任务中的GPU内存需求预测任务，其预测精度相比传统机器学习方法有了显著提升。该研究为优化深度学习任务的资源调度和管理，提高计算集群的利用效率提供了有力的技术支持和可靠的理论依据。
摘要：In response to the increasingly critical demand for accurate prediction of GPU memory resources in deep learning tasks, this paper deeply analyzes the current research status and innovatively proposes a deep learning model that integrates bidirectional gated recurrent units (BiGRU) to optimize the Transformer architecture, aiming to improve the accuracy of memory demand prediction. To verify the effectiveness of the model, a carefully designed comparative experiment was conducted, selecting four representative basic machine learning models: decision tree, random forest, Adaboost, and XGBoost as benchmarks. The detailed experimental results show that the BiGRU Transformer optimization model proposed in this paper exhibits significant advantages in key evaluation indicators: in terms of mean square error (MSE) and root mean square error (RMSE), the model achieves the lowest value among all comparison models, and its predicted results have the smallest deviation from the actual values; In terms of mean absolute error (MAE) and coefficient of determination (R2) indicators, the model also performs well and the results are balanced and stable, with comprehensive predictive performance far exceeding the benchmark machine learning methods compared. In summary, the Transformer model based on bidirectional gated recurrent unit optimization successfully constructed in this study can efficiently and accurately complete GPU memory demand prediction tasks in deep learning tasks, and its prediction accuracy has been significantly improved compared to traditional machine learning methods. This research provides strong technical support and reliable theoretical basis for optimizing resource scheduling and management of deep learning tasks, and improving the utilization efficiency of computing clusters.

GAN|对抗|攻击|生成相关(9篇)

【1】Generative Correlation Manifolds: Generating Synthetic Data with Preserved Higher-Order Correlations
标题：生成相关Manifols：生成具有保留的更高阶相关性的合成数据
链接：https://arxiv.org/abs/2510.21610

作者：Jens E. d'Hondt, Wieger R. Punter, Odysseas Papapetrou
摘要：对数据隐私的日益增长的需求和对强大的机器学习模型的需求推动了合成数据生成技术的发展。然而，目前的方法往往成功地复制简单的汇总统计量，但未能保留两两和高阶相关结构的数据，定义复杂的，多变量的相互作用，在现实世界的系统中固有的。这种限制可能导致合成数据表面上是真实的，但在用于复杂的建模任务时会失败。在本白皮书中，我们介绍了生成相关流形（GCM），这是一种用于生成合成数据的高效计算方法。该技术使用目标相关矩阵的Cholesky分解来产生数据集，通过数学证明，保留了源数据集的整个相关结构-从简单的成对关系到高阶相互作用。我们认为，这种方法提供了一种新的方法来合成数据生成与潜在的应用程序在隐私保护的数据共享，鲁棒的模型训练和仿真。
摘要：The increasing need for data privacy and the demand for robust machine learning models have fueled the development of synthetic data generation techniques. However, current methods often succeed in replicating simple summary statistics but fail to preserve both the pairwise and higher-order correlation structure of the data that define the complex, multi-variable interactions inherent in real-world systems. This limitation can lead to synthetic data that is superficially realistic but fails when used for sophisticated modeling tasks. In this white paper, we introduce Generative Correlation Manifolds (GCM), a computationally efficient method for generating synthetic data. The technique uses Cholesky decomposition of a target correlation matrix to produce datasets that, by mathematical proof, preserve the entire correlation structure -- from simple pairwise relationships to higher-order interactions -- of the source dataset. We argue that this method provides a new approach to synthetic data generation with potential applications in privacy-preserving data sharing, robust model training, and simulation.

【2】Accelerating Data Generation for Nonlinear temporal PDEs via homologous perturbation in solution space
标题：通过解空间中的同调扰动加速非线性时态偏出方程的数据生成
链接：https://arxiv.org/abs/2510.21592

作者：Lei Liu, Zhenxin Huang, Hong Wang, huanshuo dong, Haiyang Xin, Hongwei Zhao, Bin Li
摘要：数据驱动的深度学习方法（如神经运算符）在求解非线性时间偏微分方程（PDE）方面取得了进展。然而，这些方法需要大量的解对-解函数和方程的右侧（RHS）。这些对通常是通过传统的数值方法生成的，传统的数值方法需要数千个时间步迭代，远远超过训练所需的数十个时间步迭代，从而产生大量的计算和时间开销。为了解决这些挑战，我们提出了一种新的数据生成算法，称为HOmperturbation in Solution Space（HOPSS），它直接生成具有较少时间步长的训练数据集，而不是遵循传统的生成大时间步长数据集的方法。该算法同时加快了数据集的生成，并保持了模型训练所需的近似精度。具体来说，我们首先从一个可靠的求解器中获得一组基本解函数，通常有数千个时间步，然后通过下采样将它们与训练数据集在时间步上对齐。随后，我们提出了一个“同源扰动”的方法：通过结合两个解决方案的功能（一个作为主要功能，其他作为一个同源扰动项缩放的小标量）与随机噪声，我们有效地生成比较精确的PDE数据点。最后，使用这些数据点，我们计算原始方程的RHS的变化，以形成新的解对。理论和实验结果表明，HOPSS降低了时间复杂度。例如，在Navier-Stokes方程上，它可以在传统方法大约10%的时间内生成10，000个样本，并且具有相当的模型训练性能。
摘要：Data-driven deep learning methods like neural operators have advanced in solving nonlinear temporal partial differential equations (PDEs). However, these methods require large quantities of solution pairs\u2014the solution functions and right-hand sides (RHS) of the equations. These pairs are typically generated via traditional numerical methods, which need thousands of time steps iterations far more than the dozens required for training, creating heavy computational and temporal overheads. To address these challenges, we propose a novel data generation algorithm, called HOmologous Perturbation in Solution Space (HOPSS), which directly generates training datasets with fewer time steps rather than following the traditional approach of generating large time steps datasets. This algorithm simultaneously accelerates dataset generation and preserves the approximate precision required for model training. Specifically, we first obtain a set of base solution functions from a reliable solver, usually with thousands of time steps, and then align them in time steps with training datasets by downsampling. Subsequently, we propose a "homologous perturbation" approach: by combining two solution functions (one as the primary function, the other as a homologous perturbation term scaled by a small scalar) with random noise, we efficiently generate comparable-precision PDE data points. Finally, using these data points, we compute the variation in the original equation's RHS to form new solution pairs. Theoretical and experimental results show HOPSS lowers time complexity. For example, on the Navier-Stokes equation, it generates 10,000 samples in approximately 10% of traditional methods' time, with comparable model training performance.

【3】Estimating Treatment Effects in Networks using Domain Adversarial Training
标题：使用领域对抗训练估计网络中的治疗效果
链接：https://arxiv.org/abs/2510.21457

作者：Daan Caljon, Jente Van Belle, Wouter Verbeke
摘要：在网络环境中估计异质性治疗效果是复杂的干扰，这意味着一个实例的结果可能会受到其他人的治疗状态的影响。现有的因果机器学习方法通常假设一个已知的暴露映射，该映射总结了给定实例的结果如何受到其他人的处理的影响，这种简化通常是不现实的。此外，同质性（相似实例连接的趋势）和治疗分配机制之间的相互作用可能会引起网络水平的协变量变化，这可能导致不准确的治疗效果估计，这是一种尚未明确研究的现象。为了应对这些挑战，我们提出了HINet，这是一种将图神经网络与领域对抗训练相结合的新方法。这种组合允许在未知暴露映射下估计治疗效应，同时减轻（网络水平）协变量变化的影响。在合成和半合成网络数据集上的实验结果验证了该方法的有效性.
摘要：Estimating heterogeneous treatment effects in network settings is complicated by interference, meaning that the outcome of an instance can be influenced by the treatment status of others. Existing causal machine learning approaches usually assume a known exposure mapping that summarizes how the outcome of a given instance is influenced by others' treatment, a simplification that is often unrealistic. Furthermore, the interaction between homophily -- the tendency of similar instances to connect -- and the treatment assignment mechanism can induce a network-level covariate shift that may lead to inaccurate treatment effect estimates, a phenomenon that has not yet been explicitly studied. To address these challenges, we propose HINet, a novel method that integrates graph neural networks with domain adversarial training. This combination allows estimating treatment effects under unknown exposure mappings while mitigating the impact of (network-level) covariate shift. An extensive empirical evaluation on synthetic and semi-synthetic network datasets demonstrates the effectiveness of our approach.

【4】Amortized Active Generation of Pareto Sets
标题：帕累托集的摊销主动生成
链接：https://arxiv.org/abs/2510.21052

作者：Daniel M. Steinberg, Asiri Wijesinghe, Rafael Oliveira, Piotr Koniusz, Cheng Soon Ong, Edwin V. Bonilla
备注：Appears in the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要：我们介绍了帕累托集的主动生成（A-GPS），在线离散黑箱多目标优化（MOO）的一个新的框架。A-GPS学习帕累托集的生成模型，该模型支持对用户偏好的后验调节。该方法采用类概率估计器（CPE）来预测非显性关系，并向搜索空间的高性能区域的生成模型的条件。我们还表明，这种非显性CPE隐式估计超容量改善（PHVI）的概率。为了结合主观权衡，A-GPS引入偏好方向向量，其在客观空间中对用户指定的偏好进行编码。在每次迭代中，使用帕累托成员资格和与这些偏好方向的对齐来更新模型，从而产生能够在帕累托前沿进行采样而无需重新训练的摊销生成模型。结果是一种简单而强大的方法，可以实现高质量的Pareto集近似，避免显式的超体积计算，并灵活地捕获用户偏好。合成基准和蛋白质设计任务的实证结果表明，强大的样本效率和有效的偏好纳入。
摘要：We introduce active generation of Pareto sets (A-GPS), a new framework for online discrete black-box multi-objective optimization (MOO). A-GPS learns a generative model of the Pareto set that supports a-posteriori conditioning on user preferences. The method employs a class probability estimator (CPE) to predict non-dominance relations and to condition the generative model toward high-performing regions of the search space. We also show that this non-dominance CPE implicitly estimates the probability of hypervolume improvement (PHVI). To incorporate subjective trade-offs, A-GPS introduces preference direction vectors that encode user-specified preferences in objective space. At each iteration, the model is updated using both Pareto membership and alignment with these preference directions, producing an amortized generative model capable of sampling across the Pareto front without retraining. The result is a simple yet powerful approach that achieves high-quality Pareto set approximations, avoids explicit hypervolume computation, and flexibly captures user preferences. Empirical results on synthetic benchmarks and protein design tasks demonstrate strong sample efficiency and effective preference incorporation.

【5】Fair Representation Learning with Controllable High Confidence Guarantees via Adversarial Inference
标题：通过对抗推理实现具有可控高置信度保证的公平表示学习
链接：https://arxiv.org/abs/2510.21017

作者：Yuhong Luo, Austin Hoag, Xintong Wang, Philip S. Thomas, Przemyslaw A. Grabowicz
备注：Accepted by NeurIPS 2025
摘要：表征学习越来越多地应用于生成在多个下游任务中很好地泛化的表征。确保表征学习的公平性对于防止下游任务中对特定人口群体的不公平至关重要。在这项工作中，我们正式介绍了学习表示，实现高置信度的公平性的任务。我们的目标是保证每个下游预测中的人口统计差异仍然受到 * 用户定义的 * 错误阈值$的限制，具有 * 可控的 * 高概率。为此，我们提出了 *F**air **R**epresentation learning with high-confidence **G** guarantees（FRG）* 框架，该框架通过利用优化的对抗模型来提供这些高置信度的公平性保证。我们在三个真实世界的数据集上对FRG进行了经验评估，将其性能与六种最先进的公平表示学习方法进行了比较。我们的研究结果表明，FRG始终限制了一系列下游模型和任务的不公平性。
摘要：Representation learning is increasingly applied to generate representations that generalize well across multiple downstream tasks. Ensuring fairness guarantees in representation learning is crucial to prevent unfairness toward specific demographic groups in downstream tasks. In this work, we formally introduce the task of learning representations that achieve high-confidence fairness. We aim to guarantee that demographic disparity in every downstream prediction remains bounded by a *user-defined* error threshold $\epsilon$, with *controllable* high probability. To this end, we propose the ***F**air **R**epresentation learning with high-confidence **G**uarantees (FRG)* framework, which provides these high-confidence fairness guarantees by leveraging an optimized adversarial model. We empirically evaluate FRG on three real-world datasets, comparing its performance to six state-of-the-art fair representation learning methods. Our results demonstrate that FRG consistently bounds unfairness across a range of downstream models and tasks.

【6】Can Current Detectors Catch Face-to-Voice Deepfake Attacks?
标题：当前的检测器可以捕捉面对面语音Deepfake攻击吗？
链接：https://arxiv.org/abs/2510.21004

作者：Nguyen Linh Bao Nguyen, Alsharif Abuadbba, Kristen Moore, Tingming Wu
备注：8 pages, Accepted at Workshop on AI for Cyber Threat Intelligence, co-located with ACSAC 2025
摘要：生成模型的快速发展使人们能够创建越来越隐秘的合成语音，通常被称为音频deepfake。最近的一项技术，FOICE [USENIX'24]，展示了一种特别令人担忧的能力：从单个面部图像生成受害者的声音，而不需要任何声音样本。通过利用面部和声音特征之间的相关性，FOICE产生的合成声音足够逼真，可以绕过行业标准的身份验证系统，包括微信声纹和Microsoft Azure。这引起了严重的安全问题，因为面部图像比语音样本更容易被对手获得，大大降低了大规模攻击的门槛。在这项工作中，我们研究了两个核心研究问题：（RQ 1）最先进的音频deepfake检测器能否在干净和嘈杂的条件下可靠地检测FOICE生成的语音，以及（RQ 2）在FOICE数据上微调这些检测器是否可以在不过度拟合的情况下改善检测，从而保持对SpeechT5等看不见的语音生成器的鲁棒性。我们的研究有三个贡献。首先，我们提出了第一个系统的评估FOICE检测，显示领先的检测器在标准和嘈杂的条件下始终失败。其次，我们引入了有针对性的微调策略，捕捉FOICE特定的工件，产生显着的准确性提高。第三，我们在微调后评估泛化，揭示了FOICE专业化和不可见合成管道的鲁棒性之间的权衡。这些发现暴露了当今防御的根本弱点，并激发了下一代音频deepfake检测的新架构和训练协议。
摘要：The rapid advancement of generative models has enabled the creation of increasingly stealthy synthetic voices, commonly referred to as audio deepfakes. A recent technique, FOICE [USENIX'24], demonstrates a particularly alarming capability: generating a victim's voice from a single facial image, without requiring any voice sample. By exploiting correlations between facial and vocal features, FOICE produces synthetic voices realistic enough to bypass industry-standard authentication systems, including WeChat Voiceprint and Microsoft Azure. This raises serious security concerns, as facial images are far easier for adversaries to obtain than voice samples, dramatically lowering the barrier to large-scale attacks. In this work, we investigate two core research questions: (RQ1) can state-of-the-art audio deepfake detectors reliably detect FOICE-generated speech under clean and noisy conditions, and (RQ2) whether fine-tuning these detectors on FOICE data improves detection without overfitting, thereby preserving robustness to unseen voice generators such as SpeechT5. Our study makes three contributions. First, we present the first systematic evaluation of FOICE detection, showing that leading detectors consistently fail under both standard and noisy conditions. Second, we introduce targeted fine-tuning strategies that capture FOICE-specific artifacts, yielding significant accuracy improvements. Third, we assess generalization after fine-tuning, revealing trade-offs between specialization to FOICE and robustness to unseen synthesis pipelines. These findings expose fundamental weaknesses in today's defenses and motivate new architectures and training protocols for next-generation audio deepfake detection.

【7】Cultural Alien Sampler: Open-ended art generation balancing originality and coherence
标题：文化外星人采样器：开放式艺术生成平衡原创性和连贯性
链接：https://arxiv.org/abs/2510.20849

作者：Alejandro H. Artiles, Hiromu Yakura, Levin Brinkmann, Mar Canet Sola, Hassan Abu Alhaija, Ignacio Serna, Nasim Rahaman, Bernhard Schölkopf, Iyad Rahwan
备注：Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025). Creative AI Track. 26 pages, 24 figures
摘要：在开放式领域，如艺术，自主代理必须产生既原创又内部连贯的想法，但目前的大型语言模型（LLM）要么默认熟悉的文化模式，要么在推向新奇时牺牲连贯性。我们通过引入文化外来采样器（CAS）来解决这个问题，这是一种概念选择方法，它明确地将组合适合性与文化典型性分开。CAS使用两个GPT-2模型对WikiArt概念进行微调：概念一致性模型，用于评估概念是否可能在艺术品中共同出现，以及文化背景模型，用于评估这些组合在艺术家作品中的典型性。CAS的目标是高度一致性和低典型性的组合，产生保持内部一致性的想法，同时偏离了学习的惯例和嵌入的文化背景。在人类评估（N = 100）中，我们的方法优于随机选择和GPT-4 o基线，并在感知的原创性和和谐性方面达到与人类艺术学生相当的性能。此外，定量研究表明，我们的方法产生了更多样化的输出，并探索了更广泛的概念空间比GPT-4 o对应，表明人工文化异化可以解锁自主代理的创造潜力。
摘要：In open-ended domains like art, autonomous agents must generate ideas that are both original and internally coherent, yet current Large Language Models (LLMs) either default to familiar cultural patterns or sacrifice coherence when pushed toward novelty. We address this by introducing the Cultural Alien Sampler (CAS), a concept-selection method that explicitly separates compositional fit from cultural typicality. CAS uses two GPT-2 models fine-tuned on WikiArt concepts: a Concept Coherence Model that scores whether concepts plausibly co-occur within artworks, and a Cultural Context Model that estimates how typical those combinations are within individual artists' bodies of work. CAS targets combinations that are high in coherence and low in typicality, yielding ideas that maintain internal consistency while deviating from learned conventions and embedded cultural context. In a human evaluation (N = 100), our approach outperforms random selection and GPT-4o baselines and achieves performance comparable to human art students in both perceived originality and harmony. Additionally, a quantitative study shows that our method produces more diverse outputs and explores a broader conceptual space than its GPT-4o counterpart, demonstrating that artificial cultural alienness can unlock creative potential in autonomous agents.

【8】SViM3D: Stable Video Material Diffusion for Single Image 3D Generation
标题：SViM 3D：用于单图像3D生成的稳定视频材料扩散
链接：https://arxiv.org/abs/2510.08271

作者：Andreas Engelhardt, Mark Boss, Vikram Voletti, Chun-Han Yao, Hendrik P. A. Lensch, Varun Jampani
备注：Accepted by International Conference on Computer Vision (ICCV 2025). Project page: this http URL
摘要：我们提出了稳定的视频材料3D（SViM 3D），一个框架来预测多视图一致的基于物理的渲染（PBR）材料，给定一个单一的图像。最近，视频扩散模型已被成功地用于从单个图像有效地重建3D对象。但是，反射率仍然由简单的材料模型表示，或者需要在其他步骤中进行估计，以实现重新照明和受控的外观编辑。我们扩展了一个潜在的视频扩散模型，输出空间变化的PBR参数和表面法线联合与每个生成的视图的基础上明确的相机控制。这种独特的设置允许使用我们的模型作为神经先验来重新照明和生成3D资产。我们引入各种机制，这个管道，提高质量，在这个不适定的设置。我们在多个以对象为中心的数据集上展示了最先进的重新照明和新颖的视图合成性能。我们的方法可以推广到不同的输入，从而能够生成在AR/VR，电影，游戏和其他视觉媒体中有用的可重定向的3D资产。
摘要：We present Stable Video Materials 3D (SViM3D), a framework to predict multi-view consistent physically based rendering (PBR) materials, given a single image. Recently, video diffusion models have been successfully used to reconstruct 3D objects from a single image efficiently. However, reflectance is still represented by simple material models or needs to be estimated in additional steps to enable relighting and controlled appearance edits. We extend a latent video diffusion model to output spatially varying PBR parameters and surface normals jointly with each generated view based on explicit camera control. This unique setup allows for relighting and generating a 3D asset using our model as neural prior. We introduce various mechanisms to this pipeline that improve quality in this ill-posed setting. We show state-of-the-art relighting and novel view synthesis performance on multiple object-centric datasets. Our method generalizes to diverse inputs, enabling the generation of relightable 3D assets useful in AR/VR, movies, games and other visual media.

【9】Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization
标题：具有对抗性特征的核学习：数值效率和自适应正规化
链接：https://arxiv.org/abs/2510.20883

作者：Antônio H. Ribeiro, David Vävinggren, Dave Zachariah, Thomas B. Schön, Francis Bach
备注：Accepted NeurIPS 2025
摘要：对抗性训练已经成为增强模型对对抗性输入扰动鲁棒性的关键技术。许多现有的方法依赖于计算昂贵的最小-最大问题，限制了它们在实践中的应用。我们提出了一种在再生核希尔伯特空间中进行对抗训练的新方法，从输入转移到特征空间扰动。这种重新制定，使内部最大化和有效的优化的精确解。它还提供了一个正则化的估计器，可以自然地适应噪声水平和底层函数的平滑度。我们建立的条件下，特征扰动制定是一个放松的原始问题，并提出了一个有效的优化算法的基础上迭代核岭回归。我们提供的泛化范围，有助于理解的方法的属性。我们还将该公式扩展到多核学习。实证评估显示，在干净和对抗性设置良好的性能。
摘要：Adversarial training has emerged as a key technique to enhance model robustness against adversarial input perturbations. Many of the existing methods rely on computationally expensive min-max problems that limit their application in practice. We propose a novel formulation of adversarial training in reproducing kernel Hilbert spaces, shifting from input to feature-space perturbations. This reformulation enables the exact solution of inner maximization and efficient optimization. It also provides a regularized estimator that naturally adapts to the noise level and the smoothness of the underlying function. We establish conditions under which the feature-perturbed formulation is a relaxation of the original problem and propose an efficient optimization algorithm based on iterative kernel ridge regression. We provide generalization bounds that help to understand the properties of the method. We also extend the formulation to multiple kernel learning. Empirical evaluation shows good performance in both clean and adversarial settings.

半/弱/无/有监督|不确定性|主动学习(7篇)

【1】On Uncertainty Calibration for Equivariant Functions
标题：等变函数的不确定度校准
链接：https://arxiv.org/abs/2510.21691

作者：Edward Berman, Jacob Ginesin, Marco Pacini, Robin Walters
备注：Under review at Transactions on Machine Learning Research (TMLR). Code is available at this https URL . Excited to share this paper, comments welcome :D
摘要：数据稀疏的设置，如机器人操作，分子物理学和星系形态分类是深度学习最困难的领域。对于这些问题，等变网络可以帮助改进输入空间欠采样部分的建模，而不确定性估计可以防止过度自信。然而，到目前为止，等方差和模型置信度之间的关系，以及更一般的等方差和模型校准，尚未得到研究。由于传统的分类和回归误差项出现在校准误差的定义中，因此很自然地怀疑以前的工作可以用来帮助理解等方差和校准误差之间的关系。在这项工作中，我们提出了一个理论相关的等方差不确定性估计。通过证明各种等方差条件下的不确定性校准误差（ECE和ENCE）的下限和上限，我们阐明了等变模型的泛化限制，并说明了对称性失配如何导致分类和回归中的误校准。我们补充我们的理论框架与数值实验，澄清等方差和不确定性之间的关系，使用各种真实和模拟数据集，我们评论的趋势与对称性不匹配，组大小，任意和认知的不确定性。
摘要：Data-sparse settings such as robotic manipulation, molecular physics, and galaxy morphology classification are some of the hardest domains for deep learning. For these problems, equivariant networks can help improve modeling across undersampled parts of the input space, and uncertainty estimation can guard against overconfidence. However, until now, the relationships between equivariance and model confidence, and more generally equivariance and model calibration, has yet to be studied. Since traditional classification and regression error terms show up in the definitions of calibration error, it is natural to suspect that previous work can be used to help understand the relationship between equivariance and calibration error. In this work, we present a theory relating equivariance to uncertainty estimation. By proving lower and upper bounds on uncertainty calibration errors (ECE and ENCE) under various equivariance conditions, we elucidate the generalization limits of equivariant models and illustrate how symmetry mismatch can result in miscalibration in both classification and regression. We complement our theoretical framework with numerical experiments that clarify the relationship between equivariance and uncertainty using a variety of real and simulated datasets, and we comment on trends with symmetry mismatch, group size, and aleatoric and epistemic uncertainties.

【2】An unsupervised tour through the hidden pathways of deep neural networks
标题：深度神经网络隐藏路径的无监督之旅
链接：https://arxiv.org/abs/2510.21582

作者：Diego Doimo
备注：PhD thesis
摘要：本论文的目标是提高我们对深度人工神经网络创建有意义的表示并能够泛化的内部机制的理解。我们专注于用无监督学习工具表征隐藏表示的语义内容的挑战，部分由我们开发并在本文中描述，它允许利用数据的低维结构。第2章.介绍了网格，一种方法，允许估计的内在尺寸的数据作为一个明确的功能的规模，而不执行任何抽取的数据集。我们的方法是基于严格的分布结果，使量化的不确定性的估计。此外，我们的方法是简单和计算效率，因为它只依赖于最近的数据点之间的距离。在第3章中，我们研究了一些最先进的深度神经网络中隐藏层的概率密度的演化。我们发现，初始层产生的单峰概率密度摆脱任何结构无关的分类。在随后的层中，密度峰值以分层的方式出现，反映了概念的语义层次。该过程在输出层的概率密度中留下足迹，其中峰的拓扑允许重构类别的语义关系。在第4章中，我们研究了深度神经网络中的泛化问题：向插值其训练数据的网络添加参数通常会提高其泛化性能，这与经典的偏差-方差权衡不一致。我们表明，宽神经网络学习冗余表示，而不是过拟合虚假的相关性，只有当网络是正则化的，训练误差为零，冗余的神经元出现。
摘要：The goal of this thesis is to improve our understanding of the internal mechanisms by which deep artificial neural networks create meaningful representations and are able to generalize. We focus on the challenge of characterizing the semantic content of the hidden representations with unsupervised learning tools, partially developed by us and described in this thesis, which allow harnessing the low-dimensional structure of the data. Chapter 2. introduces Gride, a method that allows estimating the intrinsic dimension of the data as an explicit function of the scale without performing any decimation of the data set. Our approach is based on rigorous distributional results that enable the quantification of uncertainty of the estimates. Moreover, our method is simple and computationally efficient since it relies only on the distances among nearest data points. In Chapter 3, we study the evolution of the probability density across the hidden layers in some state-of-the-art deep neural networks. We find that the initial layers generate a unimodal probability density getting rid of any structure irrelevant to classification. In subsequent layers, density peaks arise in a hierarchical fashion that mirrors the semantic hierarchy of the concepts. This process leaves a footprint in the probability density of the output layer, where the topography of the peaks allows reconstructing the semantic relationships of the categories. In Chapter 4, we study the problem of generalization in deep neural networks: adding parameters to a network that interpolates its training data will typically improve its generalization performance, at odds with the classical bias-variance trade-off. We show that wide neural networks learn redundant representations instead of overfitting to spurious correlation and that redundant neurons appear only if the network is regularized and the training error is zero.

【3】Surrogate-based quantification of policy uncertainty in generative flow networks
标题：生成性流网络中政策不确定性的基于代理的量化
链接：https://arxiv.org/abs/2510.21523

作者：Ramón Nartallo-Kaluarachchi, Robert Manson-Sawko, Shashanka Ubaru, Dongsung Huh, Małgorzata J Zimoń, Lior Horesh, Yoshua Bengio
备注：18 pages, 6 figures
摘要：生成流网络能够根据奖励函数通过顺序构造对高奖励的复杂对象进行采样。然而，这样的奖励函数往往估计近似从噪声数据，导致认知的不确定性学习的政策。我们提出了一种方法来量化这种不确定性，通过构建一个代理模型组成的多项式混沌扩展，适合一个小的合奏训练流网络。该模型学习在低维空间中参数化的奖励函数与沿流网络轨迹的每一步动作的概率分布之间的关系。代理模型，然后可以用于廉价的蒙特卡罗抽样估计的不确定性，在给定的不确定性奖励的政策。我们说明了我们的方法上的离散和连续的网格世界，符号回归，贝叶斯结构学习任务的性能。
摘要：Generative flow networks are able to sample, via sequential construction, high-reward, complex objects according to a reward function. However, such reward functions are often estimated approximately from noisy data, leading to epistemic uncertainty in the learnt policy. We present an approach to quantify this uncertainty by constructing a surrogate model composed of a polynomial chaos expansion, fit on a small ensemble of trained flow networks. This model learns the relationship between reward functions, parametrised in a low-dimensional space, and the probability distributions over actions at each step along a trajectory of the flow network. The surrogate model can then be used for inexpensive Monte Carlo sampling to estimate the uncertainty in the policy given uncertain rewards. We illustrate the performance of our approach on a discrete and continuous grid-world, symbolic regression, and a Bayesian structure learning task.

【4】DreamerV3-XP: Optimizing exploration through uncertainty estimation
标题：DreamerV 3-XP：通过不确定性估计优化探索
链接：https://arxiv.org/abs/2510.21418

作者：Lukas Bierling, Davide Pasero, Jan-Henrik Bertrand, Kiki Van Gerwen
摘要：我们介绍DreamerV 3-XP，DreamerV 3的扩展，提高探索和学习效率。这包括（i）优先重放缓冲器，通过返回、重建损失和值误差对轨迹进行评分，以及（ii）基于对来自世界模型集合的预测环境奖励的不一致的固有奖励。DreamerV 3-XP在Atari 100 k和DeepMind Control Visual Benchmark任务的子集上进行了评估，证实了DreamerV 3的原始结果，并表明我们的扩展可以更快地学习和降低动态模型损失，特别是在稀疏奖励设置中。
摘要：We introduce DreamerV3-XP, an extension of DreamerV3 that improves exploration and learning efficiency. This includes (i) a prioritized replay buffer, scoring trajectories by return, reconstruction loss, and value error and (ii) an intrinsic reward based on disagreement over predicted environment rewards from an ensemble of world models. DreamerV3-XP is evaluated on a subset of Atari100k and DeepMind Control Visual Benchmark tasks, confirming the original DreamerV3 results and showing that our extensions lead to faster learning and lower dynamics model loss, particularly in sparse-reward settings.

【5】Uncertainty-Aware Multi-Objective Reinforcement Learning-Guided Diffusion Models for 3D De Novo Molecular Design
标题：用于3D De Novo分子设计的具有不确定性的多目标强化学习引导的扩散模型
链接：https://arxiv.org/abs/2510.21153

作者：Lianghong Chen, Dongkyu Eugene Kim, Mike Domaratzki, Pingzhao Hu
备注：Accepted at NeurIPS 2025
摘要：设计具有所需性质的从头3D分子仍然是药物发现和分子工程中的基本挑战。虽然扩散模型在生成高质量的3D分子结构方面表现出了卓越的能力，但它们通常难以有效地控制对现实世界应用至关重要的复杂多目标约束。在这项研究中，我们提出了一个具有不确定性意识的强化学习（RL）框架，以指导3D分子扩散模型朝着多个属性目标进行优化，同时提高生成分子的整体质量。我们的方法利用具有预测不确定性估计的代理模型来动态地塑造奖励函数，从而促进多个优化目标之间的平衡。我们在三个基准数据集和多扩散模型架构上全面评估了我们的框架，在分子质量和性质优化方面始终优于基线。此外，分子动力学（MD）模拟和ADMET分析的顶部生成的候选人表明有前途的药物样行为和结合稳定性，与已知的表皮生长因子受体（EGFR）抑制剂。我们的研究结果表明，RL引导的生成扩散模型推进自动化分子设计的强大潜力。
摘要：Designing de novo 3D molecules with desirable properties remains a fundamental challenge in drug discovery and molecular engineering. While diffusion models have demonstrated remarkable capabilities in generating high-quality 3D molecular structures, they often struggle to effectively control complex multi-objective constraints critical for real-world applications. In this study, we propose an uncertainty-aware Reinforcement Learning (RL) framework to guide the optimization of 3D molecular diffusion models toward multiple property objectives while enhancing the overall quality of the generated molecules. Our method leverages surrogate models with predictive uncertainty estimation to dynamically shape reward functions, facilitating balance across multiple optimization objectives. We comprehensively evaluate our framework across three benchmark datasets and multiple diffusion model architectures, consistently outperforming baselines for molecular quality and property optimization. Additionally, Molecular Dynamics (MD) simulations and ADMET profiling of top generated candidates indicate promising drug-like behavior and binding stability, comparable to known Epidermal Growth Factor Receptor (EGFR) inhibitors. Our results demonstrate the strong potential of RL-guided generative diffusion models for advancing automated molecular design.

【6】Physically consistent and uncertainty-aware learning of spatiotemporal dynamics
标题：物理一致且具有不确定性意识的时空动力学学习
链接：https://arxiv.org/abs/2510.21023

作者：Qingsong Xu, Jonathan L Bamber, Nils Thuerey, Niklas Boers, Paul Bates, Gustau Camps-Valls, Yilei Shi, Xiao Xiang Zhu
备注：Main text:33 pages,6 figures
摘要：时空动态的准确长期预测仍然是科学和工程领域的一个基本挑战。现有的机器学习方法往往忽略了物理规律，无法量化时空预测中的固有不确定性。为了解决这些挑战，我们引入了一个物理一致的神经算子（PCNO），通过将代理模型输出投影到满足预定义定律的函数空间来执行物理约束。PCNO中的物理一致性投影层有效地计算傅立叶空间中的质量和动量守恒。在确定性预测的基础上，我们进一步提出了一种扩散模型增强的PCNO（DiffPCNO），它利用一致性模型来量化和减轻不确定性，从而提高预测的准确性和可靠性。PCNO和DiffPCNO实现了高保真时空预测，同时在不同的系统和空间分辨率中保持物理一致性和不确定性，从湍流建模到现实世界的洪水/大气预测。我们的两阶段框架提供了一个强大的和通用的方法，准确的，物理接地，和不确定性意识的时空预测。
摘要：Accurate long-term forecasting of spatiotemporal dynamics remains a fundamental challenge across scientific and engineering domains. Existing machine learning methods often neglect governing physical laws and fail to quantify inherent uncertainties in spatiotemporal predictions. To address these challenges, we introduce a physics-consistent neural operator (PCNO) that enforces physical constraints by projecting surrogate model outputs onto function spaces satisfying predefined laws. A physics-consistent projection layer within PCNO efficiently computes mass and momentum conservation in Fourier space. Building upon deterministic predictions, we further propose a diffusion model-enhanced PCNO (DiffPCNO), which leverages a consistency model to quantify and mitigate uncertainties, thereby improving the accuracy and reliability of forecasts. PCNO and DiffPCNO achieve high-fidelity spatiotemporal predictions while preserving physical consistency and uncertainty across diverse systems and spatial resolutions, ranging from turbulent flow modeling to real-world flood/atmospheric forecasting. Our two-stage framework provides a robust and versatile approach for accurate, physically grounded, and uncertainty-aware spatiotemporal forecasting.

【7】VESSA: Video-based objEct-centric Self-Supervised Adaptation for Visual Foundation Models
标题：VESSA：基于视频的以对象为中心的视觉基础模型的自我监督适应
链接：https://arxiv.org/abs/2510.20994

作者：Jesimon Barreto, Carlos Caetano, André Araujo, William Robson Schwartz
备注：Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要：基础模型通过大规模的预训练和监督微调，在不同的任务中实现强大的性能，从而提高了计算机视觉。然而，他们可能表现不佳的领域分布变化和稀缺的标签，监督微调可能是不可行的。虽然用于模型自适应的持续自监督学习对于生成语言模型来说很常见，但这种策略对于以视觉为中心的编码器模型并不有效。为了应对这一挑战，我们引入了一种新的自监督微调视觉基础模型的配方，其中该模型适用于一个新的领域，而不需要注释，只利用短的多视图对象为中心的视频。我们的方法被称为VESSA：基于视频的以对象为中心的自监督适应视觉基础模型。VESSA的训练技术基于自蒸馏范式，其中仔细调整预测头并部署参数有效的自适应技术至关重要-否则，模型可能会很快忘记其预先训练的知识并达到降级状态。VESSA显著受益于来自以对象为中心的视频中不同帧的多视图对象观察，有效地学习对不同捕获条件的鲁棒性，而无需注释。通过在2个数据集上使用3个视觉基础模型进行综合实验，与基础模型和以前的自适应方法相比，VESSA在下游分类任务中表现出一致的改进。代码可在https://github.com/jesimonbarreto/VESSA上公开获取。
摘要：Foundation models have advanced computer vision by enabling strong performance across diverse tasks through large-scale pretraining and supervised fine-tuning. However, they may underperform in domains with distribution shifts and scarce labels, where supervised fine-tuning may be infeasible. While continued self-supervised learning for model adaptation is common for generative language models, this strategy has not proven effective for vision-centric encoder models. To address this challenge, we introduce a novel formulation of self-supervised fine-tuning for vision foundation models, where the model is adapted to a new domain without requiring annotations, leveraging only short multi-view object-centric videos. Our method is referred to as VESSA: Video-based objEct-centric Self-Supervised Adaptation for visual foundation models. VESSA's training technique is based on a self-distillation paradigm, where it is critical to carefully tune prediction heads and deploy parameter-efficient adaptation techniques - otherwise, the model may quickly forget its pretrained knowledge and reach a degraded state. VESSA benefits significantly from multi-view object observations sourced from different frames in an object-centric video, efficiently learning robustness to varied capture conditions, without the need of annotations. Through comprehensive experiments with 3 vision foundation models on 2 datasets, VESSA demonstrates consistent improvements in downstream classification tasks, compared to the base models and previous adaptation methods. Code is publicly available at https://github.com/jesimonbarreto/VESSA.

迁移|Zero/Few/One-Shot|自适应(10篇)

【1】Interpretable Multimodal Zero-Shot ECG Diagnosis via Structured Clinical Knowledge Alignment
标题：通过结构化临床知识对齐可解释的多模式零激发心电图诊断
链接：https://arxiv.org/abs/2510.21551

作者：Jialu Tang, Hung Manh Pham, Ignace De Lathauwer, Henk S. Schipper, Yuan Lu, Dong Ma, Aaqib Saeed
摘要：心电图（ECG）解释对于心血管疾病诊断至关重要，但目前的自动化系统通常难以实现透明性和对未知条件的泛化。为了解决这个问题，我们介绍了ZETA，一个zero-shot多模态框架，旨在与临床工作流程一致的可解释的ECG诊断。ZETA独特地将ECG信号与结构化的阳性和阴性临床观察结果进行比较，这些观察结果是通过LLM辅助的专家验证过程进行管理的，从而模仿鉴别诊断。我们的方法利用预训练的多模态模型来对齐ECG和文本嵌入，而无需进行疾病特定的微调。经验评估证明了ZETA的竞争性zero-shot分类性能，并且重要的是，提供了增强的可解释性的定性和定量证据，在特定的、临床相关的阳性和阴性诊断特征中进行预测。ZETA强调了将ECG分析与结构化临床知识相结合的潜力，以构建更透明，更可推广和更值得信赖的AI诊断系统。我们将发布策划的观察数据集和代码，以促进未来的研究。
摘要：Electrocardiogram (ECG) interpretation is essential for cardiovascular disease diagnosis, but current automated systems often struggle with transparency and generalization to unseen conditions. To address this, we introduce ZETA, a zero-shot multimodal framework designed for interpretable ECG diagnosis aligned with clinical workflows. ZETA uniquely compares ECG signals against structured positive and negative clinical observations, which are curated through an LLM-assisted, expert-validated process, thereby mimicking differential diagnosis. Our approach leverages a pre-trained multimodal model to align ECG and text embeddings without disease-specific fine-tuning. Empirical evaluations demonstrate ZETA's competitive zero-shot classification performance and, importantly, provide qualitative and quantitative evidence of enhanced interpretability, grounding predictions in specific, clinically relevant positive and negative diagnostic features. ZETA underscores the potential of aligning ECG analysis with structured clinical knowledge for building more transparent, generalizable, and trustworthy AI diagnostic systems. We will release the curated observation dataset and code to facilitate future research.

【2】Parameter-Free Hypergraph Neural Network for Few-Shot Node Classification
标题：用于Few-Shot节点分类的无参数超图神经网络
链接：https://arxiv.org/abs/2510.21462

作者：Chaewoon Bae, Doyun Choi, Jaehyun Lee, Jaemin Yoo
摘要：超图上的Few-Shot节点分类要求模型在捕获高阶结构的同时从稀缺标签进行概括。现有的超图神经网络（HNN）可以有效地对这种结构进行编码，但由于复杂的黑盒架构，通常会遇到过拟合和可扩展性问题。在这项工作中，我们提出了ZEN（零参数超图神经网络），一个完全线性和无参数的模型，实现了表达能力和效率。ZEN建立在线性化HNN的统一公式基础上，为权重矩阵引入了一个易于处理的封闭形式解决方案，并引入了一个冗余感知的传播方案，以避免迭代训练并消除冗余的自信息。在11个真实超图基准测试中，ZEN在分类准确性方面始终优于8个基线模型，同时实现了最快竞争对手的696倍加速。此外，ZEN的决策过程是完全可解释的，可以深入了解数据集的特征。我们的代码和数据集可在https://github.com/chaewoonbae/ZEN上完整获取。
摘要：Few-shot node classification on hypergraphs requires models that generalize from scarce labels while capturing high-order structures. Existing hypergraph neural networks (HNNs) effectively encode such structures but often suffer from overfitting and scalability issues due to complex, black-box architectures. In this work, we propose ZEN (Zero-Parameter Hypergraph Neural Network), a fully linear and parameter-free model that achieves both expressiveness and efficiency. Built upon a unified formulation of linearized HNNs, ZEN introduces a tractable closed-form solution for the weight matrix and a redundancy-aware propagation scheme to avoid iterative training and to eliminate redundant self information. On 11 real-world hypergraph benchmarks, ZEN consistently outperforms eight baseline models in classification accuracy while achieving up to 696x speedups over the fastest competitor. Moreover, the decision process of ZEN is fully interpretable, providing insights into the characteristic of a dataset. Our code and datasets are fully available at https://github.com/chaewoonbae/ZEN.

【3】Randomized Neural Network with Adaptive Forward Regularization for Online Task-free Class Incremental Learning
标题：具有自适应前向正规化的随机神经网络用于在线无任务课堂增量学习
链接：https://arxiv.org/abs/2510.21367

作者：Junda Wang, Minghui Hu, Ning Li, Abdulaziz Al-Ali, Ponnuthurai Nagaratnam Suganthan
摘要：类增量学习（CIL）要求智能体连续学习不同的任务，并保持知识不被遗忘。阻碍CIL方法实际应用的问题有两个方面：（1）非i.i. d批处理流和没有边界提示更新，称为更苛刻的在线无任务CIL（OTCIL）场景;（2）CIL方法在学习长任务流时会出现记忆丢失，如图1（a）所示。为了实现有效的决策和减少OTCIL过程中的累积遗憾，提出了一种具有前向正则化（-F）的随机神经网络（Randomized NN），以抵抗遗忘并提高学习性能。这个通用框架将无监督知识集成到递归凸优化中，没有学习耗散，并且可以优于OTCIL中的规范岭样式（-R）。基于这个框架，我们推导出了具有可调前向正则化（-kF）的集成深度随机向量功能链接网络（edRVFL）的算法，其中k调节干预的强度。edRVFL-kF生成一遍封闭形式的增量更新和可变学习率，有效避免过去的重放和灾难性遗忘，同时实现卓越的性能。此外，为了抑制非i.i. d引起的不稳定惩罚并减轻OTCIL中-kF的棘手调整，我们将其改进为即插即用edRVFL-kF-Bayes，使多个子学习器中的所有硬ks能够基于贝叶斯学习自适应地确定。在2个图像数据集上进行了实验，包括6个指标，动态性能，消融测试和兼容性，这清楚地验证了我们的OTCIL框架与-kF-Bayes和-kF风格的有效性。
摘要：Class incremental learning (CIL) requires an agent to learn distinct tasks consecutively with knowledge retention against forgetting. Problems impeding the practical applications of CIL methods are twofold: (1) non-i.i.d batch streams and no boundary prompts to update, known as the harsher online task-free CIL (OTCIL) scenario; (2) CIL methods suffer from memory loss in learning long task streams, as shown in Fig. 1 (a). To achieve efficient decision-making and decrease cumulative regrets during the OTCIL process, a randomized neural network (Randomized NN) with forward regularization (-F) is proposed to resist forgetting and enhance learning performance. This general framework integrates unsupervised knowledge into recursive convex optimization, has no learning dissipation, and can outperform the canonical ridge style (-R) in OTCIL. Based on this framework, we derive the algorithm of the ensemble deep random vector functional link network (edRVFL) with adjustable forward regularization (-kF), where k mediates the intensity of the intervention. edRVFL-kF generates one-pass closed-form incremental updates and variable learning rates, effectively avoiding past replay and catastrophic forgetting while achieving superior performance. Moreover, to curb unstable penalties caused by non-i.i.d and mitigate intractable tuning of -kF in OTCIL, we improve it to the plug-and-play edRVFL-kF-Bayes, enabling all hard ks in multiple sub-learners to be self-adaptively determined based on Bayesian learning. Experiments were conducted on 2 image datasets including 6 metrics, dynamic performance, ablation tests, and compatibility, which distinctly validates the efficacy of our OTCIL frameworks with -kF-Bayes and -kF styles.

【4】BADiff: Bandwidth Adaptive Diffusion Model
标题：BADiff：带宽自适应扩散模型
链接：https://arxiv.org/abs/2510.21366

作者：Xi Zhang, Hanwei Zhu, Yan Zhong, Jiamang Wang, Weisi Lin
备注：NeurIPS 2025 Poster
摘要：在这项工作中，我们提出了一个新的框架，使扩散模型，以适应其生成质量的基础上实时网络带宽的限制。传统的扩散模型通过执行固定数量的去噪步骤来产生高保真图像，而不考虑下游传输限制。然而，在实际的云到设备场景中，有限的带宽通常需要大量压缩，导致精细纹理的丢失和计算浪费。为了解决这个问题，我们引入了一个联合的端到端训练策略，其中扩散模型以从可用带宽中获得的目标质量水平为条件。在训练过程中，该模型学习自适应地调制去噪过程，从而实现早期停止采样，以保持与目标传输条件相适应的感知质量。我们的方法需要最小的架构变化，并利用轻量级的质量嵌入来指导去噪轨迹。实验结果表明，我们的方法显着提高了视觉保真度的带宽适应代相比，天真的早期停止，提供了一个有前途的解决方案，在带宽受限的环境中有效的图像传输。代码可从以下网址获得：https://github.com/xzhang9308/BADiff。
摘要：In this work, we propose a novel framework to enable diffusion models to adapt their generation quality based on real-time network bandwidth constraints. Traditional diffusion models produce high-fidelity images by performing a fixed number of denoising steps, regardless of downstream transmission limitations. However, in practical cloud-to-device scenarios, limited bandwidth often necessitates heavy compression, leading to loss of fine textures and wasted computation. To address this, we introduce a joint end-to-end training strategy where the diffusion model is conditioned on a target quality level derived from the available bandwidth. During training, the model learns to adaptively modulate the denoising process, enabling early-stop sampling that maintains perceptual quality appropriate to the target transmission condition. Our method requires minimal architectural changes and leverages a lightweight quality embedding to guide the denoising trajectory. Experimental results demonstrate that our approach significantly improves the visual fidelity of bandwidth-adapted generations compared to naive early-stopping, offering a promising solution for efficient image delivery in bandwidth-constrained environments. Code is available at: https://github.com/xzhang9308/BADiff.

【5】A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization
标题：浮点量化下自适应优化器的收敛性分析
链接：https://arxiv.org/abs/2510.21314

作者：Xuan Tang, Jichu Li, Difan Zou
备注：65 pages, 10 figures
摘要：大型语言模型（LLM）的快速扩展使得低精度训练对于减少内存、提高效率以及实现更大的模型和数据集至关重要。然而，现有的自适应优化器的收敛理论假设所有组件都是精确的，忽略了硬件感知的量化，留下了为什么低精度训练仍然有效的问题。我们介绍了第一个理论框架，用于分析自适应优化器的收敛性，包括Adam和Muon，在梯度，权重和优化器状态的浮点量化下（例如，矩估计）。在这个框架内，我们得到的收敛速度光滑的非凸目标下的标准随机梯度假设，明确表征如何从不同的组件的量化误差影响收敛。我们发现，这两种算法保留率接近其全精度同行提供的尾数长度尺度只有代数与迭代次数。我们的分析进一步表明，亚当是高度敏感的权重和二阶矩量化，由于其依赖于$\beta_2\到1$，而μ子需要较弱的误差控制，因此可能更强大。这些结果缩小了低精度训练方法的经验成功和理论理解之间的差距。对合成数据和现实数据的数值实验证实了我们的理论。
摘要：The rapid scaling of large language models (LLMs) has made low-precision training essential for reducing memory, improving efficiency, and enabling larger models and datasets. Existing convergence theories for adaptive optimizers, however, assume all components are exact and neglect hardware-aware quantization, leaving open the question of why low-precision training remains effective. We introduce the first theoretical framework for analyzing the convergence of adaptive optimizers, including Adam and Muon, under floating-point quantization of gradients, weights, and optimizer states (e.g., moment estimates). Within this framework, we derive convergence rates on smooth non-convex objectives under standard stochastic gradient assumptions, explicitly characterizing how quantization errors from different components affect convergence. We show that both algorithms retain rates close to their full-precision counterparts provided mantissa length scales only logarithmically with the number of iterations. Our analysis further reveals that Adam is highly sensitive to weights and second-moment quantization due to its reliance on $\beta_2 \to 1$, while Muon requires weaker error control and is thus potentially more robust. These results narrow the gap between empirical success and theoretical understanding of low-precision training methods. Numerical experiments on synthetic and real-world data corroborate our theory.

【6】Adaptive Data Selection for Multi-Layer Perceptron Training: A Sub-linear Value-Driven Method
标题：多层感知器训练的自适应数据选择：一种次线性值驱动方法
链接：https://arxiv.org/abs/2510.21286

作者：Xiyang Zhang, Chen Liang, Haoxuan Qiu, Hongzhi Wang
摘要：数据选择是神经网络训练中的基本问题之一，特别是对于多层感知器（MLP），在预算限制下从大量，多源和异构数据源中识别最有价值的训练样本构成了重大挑战。现有的数据选择方法，包括核心集构造、数据Shapley值和影响函数，都受到严重的限制：它们过度简化非线性变换，忽略隐藏层中的信息中间表示，或者由于计算复杂度高而无法扩展到更大的MLP。作为回应，我们提出了DVC（数据值贡献），这是一种新的基于数据感知的方法，用于评估和选择MLP训练的数据，该方法考虑了训练过程中网络参数的动态演变。DVC方法将数据贡献分解为层价值贡献（LVC）和全局价值贡献（GVC），采用六个精心设计的指标和相应的有效算法来捕获不同粒度的三个维度（质量，相关性和分布多样性）的数据特征。DVC将这些评估与上置信限（UCB）算法相结合，用于平衡勘探和开发的自适应源选择。在六个数据集和八个基线上进行的广泛实验表明，我们的方法在各种预算限制下始终优于现有方法，实现了卓越的准确性和F1分数。我们的方法代表了神经网络分层数据评估的第一个系统化处理，为大规模机器学习系统提供了理论保证和实践优势。
摘要：Data selection is one of the fundamental problems in neural network training, particularly for multi-layer perceptrons (MLPs) where identifying the most valuable training samples from massive, multi-source, and heterogeneous data sources under budget constraints poses significant challenges. Existing data selection methods, including coreset construction, data Shapley values, and influence functions, suffer from critical limitations: they oversimplify nonlinear transformations, ignore informative intermediate representations in hidden layers, or fail to scale to larger MLPs due to high computational complexity. In response, we propose DVC (Data Value Contribution), a novel budget-aware method for evaluating and selecting data for MLP training that accounts for the dynamic evolution of network parameters during training. The DVC method decomposes data contribution into Layer Value Contribution (LVC) and Global Value Contribution (GVC), employing six carefully designed metrics and corresponding efficient algorithms to capture data characteristics across three dimensions--quality, relevance, and distributional diversity--at different granularities. DVC integrates these assessments with an Upper Confidence Bound (UCB) algorithm for adaptive source selection that balances exploration and exploitation. Extensive experiments across six datasets and eight baselines demonstrate that our method consistently outperforms existing approaches under various budget constraints, achieving superior accuracy and F1 scores. Our approach represents the first systematic treatment of hierarchical data evaluation for neural networks, providing both theoretical guarantees and practical advantages for large-scale machine learning systems.

【7】Buffer layers for Test-Time Adaptation
标题：测试时适应的缓冲层
链接：https://arxiv.org/abs/2510.21271

作者：Hyeongyu Kim, Geonhui Han, Dosik Hwang
备注：NeurIPS 2025
摘要：在测试时间自适应（TTA）的最新进展，大多数现有的方法集中在更新规范化层，以适应测试域。然而，依赖基于正常化的适应带来了关键挑战。首先，批量归一化（BN）等归一化层对小批量非常敏感，导致统计数据不稳定和不准确。此外，基于归一化的自适应本质上受到预训练模型结构的约束，因为它依赖于训练时间统计数据，这些统计数据可能无法很好地推广到看不见的领域。这些问题限制了基于归一化的TTA方法的有效性，特别是在显著的域转移下。在本文中，我们介绍了一种新的模式的基础上的缓冲层的概念，解决了规范化层更新的基本限制。与现有的修改模型核心参数的方法不同，我们的方法保留了预训练骨干的完整性，本质上降低了在线适应过程中灾难性遗忘的风险。通过综合实验，我们证明，我们的方法不仅优于传统的方法，在减轻域转移和增强模型的鲁棒性，但也表现出很强的弹性遗忘。此外，我们的缓冲层是模块化的，可以无缝集成到几乎所有现有的TTA框架中，从而在各种架构中实现一致的性能改进。这些发现验证了所提出的解决方案在现实世界的领域适应场景的有效性和通用性。该代码可在https://github.com/hyeongyu-kim/Buffer_TTA上获得。
摘要：In recent advancements in Test Time Adaptation (TTA), most existing methodologies focus on updating normalization layers to adapt to the test domain. However, the reliance on normalization-based adaptation presents key challenges. First, normalization layers such as Batch Normalization (BN) are highly sensitive to small batch sizes, leading to unstable and inaccurate statistics. Moreover, normalization-based adaptation is inherently constrained by the structure of the pre-trained model, as it relies on training-time statistics that may not generalize well to unseen domains. These issues limit the effectiveness of normalization-based TTA approaches, especially under significant domain shift. In this paper, we introduce a novel paradigm based on the concept of a Buffer layer, which addresses the fundamental limitations of normalization layer updates. Unlike existing methods that modify the core parameters of the model, our approach preserves the integrity of the pre-trained backbone, inherently mitigating the risk of catastrophic forgetting during online adaptation. Through comprehensive experimentation, we demonstrate that our approach not only outperforms traditional methods in mitigating domain shift and enhancing model robustness, but also exhibits strong resilience to forgetting. Furthermore, our Buffer layer is modular and can be seamlessly integrated into nearly all existing TTA frameworks, resulting in consistent performance improvements across various architectures. These findings validate the effectiveness and versatility of the proposed solution in real-world domain adaptation scenarios. The code is available at https://github.com/hyeongyu-kim/Buffer_TTA.

【8】PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling
标题：PINN Balls：利用区域分解和自适应采样对PINN进行二阶缩放方法
链接：https://arxiv.org/abs/2510.21262

作者：Andrea Bonfanti, Ismael Medina, Roman List, Björn Staeves, Roberto Santana, Marco Ellero
备注：Accepted Conference Paper
摘要：科学机器学习的最新进展表明，二阶方法可以增强物理信息神经网络（PINN）的训练，使其成为偏微分方程（PDE）传统数值方法的合适替代方案。然而，二阶方法会导致较大的内存需求，使得它们随模型大小的扩展性较差。在本文中，我们定义了一个本地混合专家（MoE）集成模型的参数效率和稀疏编码，使二阶训练的使用相结合。我们的模型-- \textsc{PINN Balls} --还具有一个完全可学习的域分解结构，通过使用对抗自适应采样（AAS）来实现，该采样使DD适应PDE及其域。\textsc{PINN Balls}在科学机器学习中实现了比最先进的更高的准确性，同时保持了宝贵的可扩展性并借鉴了良好的理论背景。
摘要：Recent advances in Scientific Machine Learning have shown that second-order methods can enhance the training of Physics-Informed Neural Networks (PINNs), making them a suitable alternative to traditional numerical methods for Partial Differential Equations (PDEs). However, second-order methods induce large memory requirements, making them scale poorly with the model size. In this paper, we define a local Mixture of Experts (MoE) combining the parameter-efficiency of ensemble models and sparse coding to enable the use of second-order training. Our model -- \textsc{PINN Balls} -- also features a fully learnable domain decomposition structure, achieved through the use of Adversarial Adaptive Sampling (AAS), which adapts the DD to the PDE and its domain. \textsc{PINN Balls} achieves better accuracy than the state-of-the-art in scientific machine learning, while maintaining invaluable scalability properties and drawing from a sound theoretical background.

【9】Instance-Adaptive Hypothesis Tests with Heterogeneous Agents
标题：具有异类代理的实例适应性假设测试
链接：https://arxiv.org/abs/2510.21178

作者：Flora C. Shi, Martin J. Wainwright, Stephen Bates
摘要：我们研究假设检验的异质人口的战略代理人的私人信息。任何单一的测试，均匀地应用在整个人口产生的统计误差是次优相对于性能的一个预言访问私人信息。我们展示了如何设计统计合同的菜单，将类型最优测试与支付结构配对，诱导代理根据其私人信息进行自我选择。这种分离的菜单简化了代理类型，使主体即使在没有代理类型的先验知识的情况下也能够匹配Oracle性能。我们的主要结果充分表征的所有分离的菜单，实例自适应，匹配的Oracle性能的异质代理的任意人口的集合。我们确定的设计，信息获取基本上是无成本的，需要微不足道的额外费用相对于一个单一的测试基准，同时提高统计性能。我们的工作建立了适当的评分规则和菜单设计之间的连接，显示了如何约束的假设检验结构的可引出的信息。数值例子说明了分离菜单的几何形状和改进，他们提供的错误权衡。总体而言，我们的研究结果将统计决策理论与机制设计联系起来，展示了如何利用异质性和战略参与来提高假设检验的效率。
摘要：We study hypothesis testing over a heterogeneous population of strategic agents with private information. Any single test applied uniformly across the population yields statistical error that is sub-optimal relative to the performance of an oracle given access to the private information. We show how it is possible to design menus of statistical contracts that pair type-optimal tests with payoff structures, inducing agents to self-select according to their private information. This separating menu elicits agent types and enables the principal to match the oracle performance even without a priori knowledge of the agent type. Our main result fully characterizes the collection of all separating menus that are instance-adaptive, matching oracle performance for an arbitrary population of heterogeneous agents. We identify designs where information elicitation is essentially costless, requiring negligible additional expense relative to a single-test benchmark, while improving statistical performance. Our work establishes a connection between proper scoring rules and menu design, showing how the structure of the hypothesis test constrains the elicitable information. Numerical examples illustrate the geometry of separating menus and the improvements they deliver in error trade-offs. Overall, our results connect statistical decision theory with mechanism design, demonstrating how heterogeneity and strategic participation can be harnessed to improve efficiency in hypothesis testing.

【10】Memory Constrained Dynamic Subnetwork Update for Transfer Learning
标题：迁移学习的记忆约束动态子网络更新
链接：https://arxiv.org/abs/2510.20979

作者：Aël Quélennec, Pavlo Mozharovskyi, Van-Tam Nguyen, Enzo Tartaglione
摘要：设备上的神经网络训练面临着关键的内存约束，这些约束限制了预训练模型对下游任务的适应。我们提出了MeDyate，一个理论上接地框架内存约束的动态子网络适应。我们的方法引入了两个关键的创新：LaRa（层排名），一个改进的层重要性度量，使原则层预选，和一个动态的通道采样策略，利用时间稳定性的通道重要性分布在微调。MeDyate根据重要性加权概率动态重新采样时期之间的通道，确保全面的参数空间探索，同时尊重严格的内存预算。对大量任务和架构的广泛评估表明，MeDyate在极端内存限制下实现了最先进的性能，始终优于现有的静态和动态方法，同时保持高计算效率。我们的方法是实现有效的设备上学习的重要一步，通过演示有效的微调，内存预算低至几百kB的RAM。
摘要：On-device neural network training faces critical memory constraints that limit the adaptation of pre-trained models to downstream tasks. We present MeDyate, a theoretically-grounded framework for memory-constrained dynamic subnetwork adaptation. Our approach introduces two key innovations: LaRa (Layer Ranking), an improved layer importance metric that enables principled layer pre-selection, and a dynamic channel sampling strategy that exploits the temporal stability of channel importance distributions during fine-tuning. MeDyate dynamically resamples channels between epochs according to importance-weighted probabilities, ensuring comprehensive parameter space exploration while respecting strict memory budgets. Extensive evaluation across a large panel of tasks and architectures demonstrates that MeDyate achieves state-of-the-art performance under extreme memory constraints, consistently outperforming existing static and dynamic approaches while maintaining high computational efficiency. Our method represents a significant step towards enabling efficient on-device learning by demonstrating effective fine-tuning with memory budgets as low as a few hundred kB of RAM.

强化学习(3篇)

【1】Enhancing Tactile-based Reinforcement Learning for Robotic Control
标题：增强基于触觉的强化学习用于机器人控制
链接：https://arxiv.org/abs/2510.21609

作者：Elle Miller, Trevor McInroe, David Abel, Oisin Mac Aodha, Sethu Vijayakumar
摘要：实现安全、可靠的真实世界机器人操作需要智能体超越视觉，并结合触觉传感来克服感官缺陷和对理想状态信息的依赖。尽管具有潜力，但触觉传感在强化学习（RL）中的功效仍然不一致。我们通过开发自监督学习（SSL）方法来解决这个问题，以更有效地利用触觉观察，专注于本体感受和稀疏二进制接触的可扩展设置。我们的经验表明，稀疏的二进制触觉信号的灵巧性是至关重要的，特别是对于本体感受控制错误不注册，如解耦的机器人对象运动的相互作用。我们的智能体在复杂的接触任务（球弹跳和保定球旋转）中实现了超人的灵巧。此外，我们发现，解耦的SSL内存从策略内存可以提高性能。我们发布了机器人触觉奥林匹克（RoTO）基准测试，以评估和促进未来基于触觉的操纵研究。项目页面：https://elle-miller.github.io/tactile_rl
摘要：Achieving safe, reliable real-world robotic manipulation requires agents to evolve beyond vision and incorporate tactile sensing to overcome sensory deficits and reliance on idealised state information. Despite its potential, the efficacy of tactile sensing in reinforcement learning (RL) remains inconsistent. We address this by developing self-supervised learning (SSL) methodologies to more effectively harness tactile observations, focusing on a scalable setup of proprioception and sparse binary contacts. We empirically demonstrate that sparse binary tactile signals are critical for dexterity, particularly for interactions that proprioceptive control errors do not register, such as decoupled robot-object motions. Our agents achieve superhuman dexterity in complex contact tasks (ball bouncing and Baoding ball rotation). Furthermore, we find that decoupling the SSL memory from the on-policy memory can improve performance. We release the Robot Tactile Olympiad (RoTO) benchmark to standardise and promote future research in tactile-based manipulation. Project page: https://elle-miller.github.io/tactile_rl

【2】Robust Point Cloud Reinforcement Learning via PCA-Based Canonicalization
标题：基于PCA规范化的鲁棒点云强化学习
链接：https://arxiv.org/abs/2510.20974

作者：Michael Bezick, Vittorio Giammarino, Ahmed H. Qureshi
摘要：近年来，来自原始视觉输入的强化学习（RL）取得了令人印象深刻的成功，但它仍然容易受到分布外变化的影响，例如照明，颜色和视角的变化。点云强化学习（PC-RL）通过减轻基于外观的脆性提供了一种有前途的替代方案，但其对相机姿势失配的敏感性继续破坏现实环境中的可靠性。为了应对这一挑战，我们提出了PCA点云（PPC），这是一个专门为下游机器人控制量身定制的规范化框架。PPC将任意刚体变换下的点云映射到唯一的规范姿势，将观测对齐到一致的帧，从而大大减少视点引起的不一致性。在我们的实验中，我们表明，PPC提高了鲁棒性，看不见的相机构成具有挑战性的机器人任务，提供了一个原则性的替代域随机化。
摘要：Reinforcement Learning (RL) from raw visual input has achieved impressive successes in recent years, yet it remains fragile to out-of-distribution variations such as changes in lighting, color, and viewpoint. Point Cloud Reinforcement Learning (PC-RL) offers a promising alternative by mitigating appearance-based brittleness, but its sensitivity to camera pose mismatches continues to undermine reliability in realistic settings. To address this challenge, we propose PCA Point Cloud (PPC), a canonicalization framework specifically tailored for downstream robotic control. PPC maps point clouds under arbitrary rigid-body transformations to a unique canonical pose, aligning observations to a consistent frame, thereby substantially decreasing viewpoint-induced inconsistencies. In our experiments, we show that PPC improves robustness to unseen camera poses across challenging robotic tasks, providing a principled alternative to domain randomization.

【3】Safety Assessment in Reinforcement Learning via Model Predictive Control
标题：通过模型预测控制进行强化学习的安全评估
链接：https://arxiv.org/abs/2510.20955

作者：Jeff Pflueger, Michael Everett
备注：7 pages, 4 figures
摘要：无模型强化学习方法很有希望用于控制，但通常缺乏正式的安全保证。屏蔽或以其他方式提供这些保证的现有方法通常依赖于对安全规范的详细了解。相反，这项工作的见解是，许多难以指定的安全问题的最佳特征是不变性。因此，我们建议利用可逆性作为在整个培训过程中防止这些安全问题的方法。我们的方法使用模型预测路径积分控制来检查整个训练过程中学习策略提出的动作的安全性。这种方法的一个关键优点是，它只需要查询黑盒动态的能力，而不是明确的动态或安全约束的知识。实验结果表明，所提出的算法在所有不安全的动作之前成功地中止，同时仍然实现与允许违反安全的基线PPO方法相当的训练进度。
摘要：Model-free reinforcement learning approaches are promising for control but typically lack formal safety guarantees. Existing methods to shield or otherwise provide these guarantees often rely on detailed knowledge of the safety specifications. Instead, this work's insight is that many difficult-to-specify safety issues are best characterized by invariance. Accordingly, we propose to leverage reversibility as a method for preventing these safety issues throughout the training process. Our method uses model-predictive path integral control to check the safety of an action proposed by a learned policy throughout training. A key advantage of this approach is that it only requires the ability to query the black-box dynamics, not explicit knowledge of the dynamics or safety constraints. Experimental results demonstrate that the proposed algorithm successfully aborts before all unsafe actions, while still achieving comparable training progress to a baseline PPO approach that is allowed to violate safety.

元学习(1篇)

【1】Meta-Learning for Cross-Task Generalization in Protein Mutation Property Prediction
标题：蛋白质突变特性预测中跨任务概括的元学习
链接：https://arxiv.org/abs/2510.20943

作者：Srivathsan Badrinarayanan, Yue Su, Janghoon Ock, Alan Pham, Sanya Ahuja, Amir Barati Farimani
摘要：蛋白质突变可以对生物功能产生深远的影响，准确预测性质变化对于药物发现，蛋白质工程和精准医学至关重要。目前的方法依赖于对单个数据集的蛋白质特异性Transformers进行微调，但由于异质性实验条件和有限的目标域数据，难以实现跨数据集的泛化。我们介绍了两个关键的创新：（1）模型不可知元学习（MAML）的第一个应用蛋白质突变属性预测，（2）一种新的突变编码策略，使用分隔符标记直接将突变纳入序列上下文。我们构建在Transformer架构的基础上，将它们与MAML集成，通过最小的梯度步骤而不是学习特定于以太网的模式来快速适应新任务。我们的突变编码解决了标准Transformers将突变位置视为未知令牌的关键限制，从而显着降低性能。在三个不同的蛋白质突变数据集（功能适应性，热稳定性和溶解性）的评估表明，显着优势，传统的微调。在跨任务评估中，我们的元学习方法在训练时间减少65%的情况下，功能适应度的准确性提高了29%，在训练速度加快55%的情况下，溶解度的准确性提高了94%。无论数据集大小如何，该框架都保持一致的训练效率，这使得它对于实验数据有限的工业应用和早期蛋白质设计特别有价值。本文建立了元学习在蛋白质突变分析中的系统应用，并引入了一种有效的突变编码策略，为蛋白质工程中的跨域泛化提供了变革性的方法。
摘要：Protein mutations can have profound effects on biological function, making accurate prediction of property changes critical for drug discovery, protein engineering, and precision medicine. Current approaches rely on fine-tuning protein-specific transformers for individual datasets, but struggle with cross-dataset generalization due to heterogeneous experimental conditions and limited target domain data. We introduce two key innovations: (1) the first application of Model-Agnostic Meta-Learning (MAML) to protein mutation property prediction, and (2) a novel mutation encoding strategy using separator tokens to directly incorporate mutations into sequence context. We build upon transformer architectures integrating them with MAML to enable rapid adaptation to new tasks through minimal gradient steps rather than learning dataset-specific patterns. Our mutation encoding addresses the critical limitation where standard transformers treat mutation positions as unknown tokens, significantly degrading performance. Evaluation across three diverse protein mutation datasets (functional fitness, thermal stability, and solubility) demonstrates significant advantages over traditional fine-tuning. In cross-task evaluation, our meta-learning approach achieves 29% better accuracy for functional fitness with 65% less training time, and 94% better accuracy for solubility with 55% faster training. The framework maintains consistent training efficiency regardless of dataset size, making it particularly valuable for industrial applications and early-stage protein design where experimental data is limited. This work establishes a systematic application of meta-learning to protein mutation analysis and introduces an effective mutation encoding strategy, offering transformative methodology for cross-domain generalization in protein engineering.

医学相关(1篇)

【1】Efficient Meningioma Tumor Segmentation Using Ensemble Learning
标题：使用融合学习高效的脑膜瘤肿瘤分割
链接：https://arxiv.org/abs/2510.21040

作者：Mohammad Mahdi Danesh Pajouh, Sara Saeedi
备注：2nd Place Winner in the BraTS 2025 MICCAI Challenge (Task 2: Meningioma Tumor Segmentation)
摘要：脑膜瘤是最常见的原发性脑肿瘤，占所有确诊病例的近三分之一。从MRI扫描中准确描绘这些肿瘤对于指导治疗策略至关重要，但在临床实践中仍然是一项具有挑战性和耗时的任务。深度学习的最新发展加速了自动肿瘤分割的进展;然而，许多先进技术受到繁重的计算需求和长期训练计划的阻碍，使得研究人员和临床医生在有限的硬件条件下难以使用。在这项工作中，我们提出了一种新的基于集成的分割方法，该方法结合了三种不同的架构：（1）基线SegResNet模型，（2）具有级联跳过连接的注意力增强SegResNet，以及（3）具有注意力门控跳过连接（DDUNet）的双解码器U-Net。该集成旨在利用架构多样性来提高鲁棒性和准确性，同时显着降低培训需求。每个基线模型只训练了20个时期，并在BrTS-MEN 2025数据集上进行了评估。所提出的集成模型实现了具有竞争力的性能，在增强肿瘤（ET）、肿瘤核心（TC）和整个肿瘤（WT）的测试数据集上，平均病变-明智的Dice得分分别为77.30%、76.37%和73.9%。这些结果突出了集成学习对脑肿瘤分割的有效性，即使在有限的硬件限制下。我们提出的方法提供了一个实用和方便的工具，以帮助诊断脑膜瘤，在临床和研究环境中的潜在影响。
摘要：Meningiomas represent the most prevalent form of primary brain tumors, comprising nearly one-third of all diagnosed cases. Accurate delineation of these tumors from MRI scans is crucial for guiding treatment strategies, yet remains a challenging and time-consuming task in clinical practice. Recent developments in deep learning have accelerated progress in automated tumor segmentation; however, many advanced techniques are hindered by heavy computational demands and long training schedules, making them less accessible for researchers and clinicians working with limited hardware. In this work, we propose a novel ensemble-based segmentation approach that combines three distinct architectures: (1) a baseline SegResNet model, (2) an attention-augmented SegResNet with concatenative skip connections, and (3) a dual-decoder U-Net enhanced with attention-gated skip connections (DDUNet). The ensemble aims to leverage architectural diversity to improve robustness and accuracy while significantly reducing training demands. Each baseline model was trained for only 20 epochs and Evaluated on the BraTS-MEN 2025 dataset. The proposed ensemble model achieved competitive performance, with average Lesion-Wise Dice scores of 77.30%, 76.37% and 73.9% on test dataset for Enhancing Tumor (ET), Tumor Core (TC) and Whole Tumor (WT) respectively. These results highlight the effectiveness of ensemble learning for brain tumor segmentation, even under limited hardware constraints. Our proposed method provides a practical and accessible tool for aiding the diagnosis of meningioma, with potential impact in both clinical and research settings.

蒸馏|知识提取(1篇)

【1】Distilled Decoding 2: One-step Sampling of Image Auto-regressive Models with Conditional Score Distillation
标题：蒸馏解码2：带条件分数蒸馏的图像自回归模型的一步采样
链接：https://arxiv.org/abs/2510.21003

作者：Enshu Liu, Qian Chen, Xuefei Ning, Shengen Yan, Guohao Dai, Zinan Lin, Yu Wang
备注：Published at NeurIPS 2025
摘要：图像自回归（AR）模型已经成为视觉生成模型的一个强大的范例。尽管它们的性能很好，但由于需要大量的采样步骤，它们的生成速度很慢。尽管最近提出了Distilled Decoding 1（DD 1）来实现图像AR模型的几步采样，但它在一步设置中仍然会导致显着的性能下降，并且依赖于限制其灵活性的预定义映射。在这项工作中，我们提出了一种新的方法，蒸馏解码2（DD 2），进一步推进一步采样的图像AR模型的可行性。与DD 1不同，DD 2不依赖于预定义的映射。我们将原始AR模型视为教师模型，其在每个标记位置处的潜在嵌入空间中提供地面实况条件得分。在此基础上，我们提出了一种新的条件分数蒸馏损失训练一步发生器。具体来说，我们训练一个单独的网络来预测所生成的分布的条件分数，并在以先前令牌为条件的每个令牌位置处应用分数蒸馏。实验结果表明，DD 2可以对图像AR模型进行一步采样，并且在ImageNet-256上将FID从3.40增加到5.43。与最强的基线DD 1相比，DD 2将一步采样与原始AR模型之间的差距缩小了67%，同时训练速度提高了12.3倍。DD 2朝着一步AR生成的目标迈出了重要的一步，为快速和高质量的AR建模开辟了新的可能性。代码可在https://github.com/imagination-research/Distilled-Decoding-2上获得。
摘要：Image Auto-regressive (AR) models have emerged as a powerful paradigm of visual generative models. Despite their promising performance, they suffer from slow generation speed due to the large number of sampling steps required. Although Distilled Decoding 1 (DD1) was recently proposed to enable few-step sampling for image AR models, it still incurs significant performance degradation in the one-step setting, and relies on a pre-defined mapping that limits its flexibility. In this work, we propose a new method, Distilled Decoding 2 (DD2), to further advances the feasibility of one-step sampling for image AR models. Unlike DD1, DD2 does not without rely on a pre-defined mapping. We view the original AR model as a teacher model which provides the ground truth conditional score in the latent embedding space at each token position. Based on this, we propose a novel \emph{conditional score distillation loss} to train a one-step generator. Specifically, we train a separate network to predict the conditional score of the generated distribution and apply score distillation at every token position conditioned on previous tokens. Experimental results show that DD2 enables one-step sampling for image AR models with an minimal FID increase from 3.40 to 5.43 on ImageNet-256. Compared to the strongest baseline DD1, DD2 reduces the gap between the one-step sampling and original AR model by 67%, with up to 12.3$\times$ training speed-up simultaneously. DD2 takes a significant step toward the goal of one-step AR generation, opening up new possibilities for fast and high-quality AR modeling. Code is available at https://github.com/imagination-research/Distilled-Decoding-2.

推荐(1篇)

【1】Towards Explainable Personalized Recommendations by Learning from Users' Photos
标题：通过从用户照片中学习来实现可解释的个性化推荐
链接：https://arxiv.org/abs/2510.21455

作者：Jorge Díez, Pablo Pérez-Núñez, Oscar Luaces, Beatriz Remeseiro, Antonio Bahamonde
备注：None
摘要：解释一个复杂系统的输出，例如推荐系统（RS），对于用户和公司来说都变得至关重要。在本文中，我们探讨的想法，个性化的解释可以学习的建议本身。有很多在线服务，用户可以上传一些照片，除了评级项目。我们假设用户拍摄这些照片是为了加强或证明他们对这些物品的看法。出于这个原因，我们试图预测用户会给某个物品拍什么样的照片，因为这张照片是最能说服用户相信该物品质量的论据。从这个意义上说，RS可以解释其结果，从而提高其可靠性。此外，一旦我们有了一个模型来预测对用户有吸引力的图像，我们就可以估计它们的分布。因此，公司获得有关客户强调其产品的方面的生动知识。该文件包括一个正式的框架，估计作者概率为一个给定的对（用户，照片）。为了说明这一提议，我们使用了从TripAdvisor收集的数据，其中包含六个不同规模城市的餐馆评论（带照片）。
摘要：Explaining the output of a complex system, such as a Recommender System (RS), is becoming of utmost importance for both users and companies. In this paper we explore the idea that personalized explanations can be learned as recommendation themselves. There are plenty of online services where users can upload some photos, in addition to rating items. We assume that users take these photos to reinforce or justify their opinions about the items. For this reason we try to predict what photo a user would take of an item, because that image is the argument that can best convince her of the qualities of the item. In this sense, an RS can explain its results and, therefore, increase its reliability. Furthermore, once we have a model to predict attractive images for users, we can estimate their distribution. Thus, the companies acquire a vivid knowledge about the aspects that the clients highlight of their products. The paper includes a formal framework that estimates the authorship probability for a given pair (user, photo). To illustrate the proposal, we use data gathered from TripAdvisor containing the reviews (with photos) of restaurants in six cities of different sizes.

聚类(1篇)

【1】A Unified Matrix Factorization Framework for Classical and Robust Clustering
标题：用于经典和鲁棒集群的统一矩阵分解框架
链接：https://arxiv.org/abs/2510.21172

作者：Angshul Majumdar
摘要：本文提出了一个统一的矩阵分解框架的经典和鲁棒的聚类。首先，我们重新审视了著名的crisp k-均值聚类和矩阵分解之间的等价性，遵循并严格重新推导了Bauckhage未发表的公式。扩展这个框架，我们推导出一个类似的矩阵分解模糊c-均值聚类的解释，据我们所知，这还没有以前正式。这些重新制定允许两个聚类范式表示为因子矩阵的优化问题，从而使原则性的扩展到强大的变种。为了解决对离群值的敏感性，我们提出了鲁棒的配方，清晰和模糊聚类取代Frobenius范数与l1，2-范数，这惩罚了剩余列的欧几里德范数的总和。我们开发交替最小化算法的标准配方和IRLS为基础的算法的强大的同行。所有的算法理论上证明收敛到一个局部最小值。
摘要：This paper presents a unified matrix factorization framework for classical and robust clustering. We begin by revisiting the well-known equivalence between crisp k-means clustering and matrix factorization, following and rigorously rederiving an unpublished formulation by Bauckhage. Extending this framework, we derive an analogous matrix factorization interpretation for fuzzy c-means clustering, which to the best of our knowledge has not been previously formalized. These reformulations allow both clustering paradigms to be expressed as optimization problems over factor matrices, thereby enabling principled extensions to robust variants. To address sensitivity to outliers, we propose robust formulations for both crisp and fuzzy clustering by replacing the Frobenius norm with the l1,2-norm, which penalizes the sum of Euclidean norms across residual columns. We develop alternating minimization algorithms for the standard formulations and IRLS-based algorithms for the robust counterparts. All algorithms are theoretically proven to converge to a local minimum.

自动驾驶|车辆|车道检测等(1篇)

【1】Multi-Task Vehicle Routing Solver via Mixture of Specialized Experts under State-Decomposable MDP
标题：状态可分解MDP下通过专业专家混合的多任务车辆路径求解器
链接：https://arxiv.org/abs/2510.21453

作者：Yuxin Pan, Zhiguang Cao, Chengyang Gu, Liu Liu, Peilin Zhao, Yize Chen, Fangzhen Lin
备注：Accepted to NeurIPS 2025
摘要：现有的多任务车辆路径问题（VRP）的神经方法通常学习统一的求解器来同时处理多个约束。然而，它们通常未充分利用VRP变体的组成结构，每个变体可衍生自一组共同的基础VRP变体。这种关键的疏忽导致统一求解器错过了基础求解器的潜在好处，每个基础求解器都专门用于基础VRP变体。为了克服这一限制，我们提出了一个框架，使统一的求解器感知跨VRP变量的共享组件的性质，通过主动重用基础求解器，同时减轻训练神经求解器的指数增长。具体来说，我们引入了一个状态可分解MDP（SDMDP），重新制定VRP表示的状态空间作为笛卡尔积的基础状态空间与基础VRP的变体。更重要的是，这个公式固有地为每个基础VRP变量产生最优的基础策略。此外，一个潜在的空间为基础的SDMDP扩展开发的最佳基础政策和学习的混合函数，使政策重用的潜在空间。在温和的假设下，该扩展可证明地恢复SDMDP的最优统一策略，通过混合函数计算状态嵌入作为由最优基础策略生成的基础状态嵌入的映射。在实际应用中，我们引入了混合专家求解器（MoSES），它通过专门的低秩自适应（LoRA）专家实现基本策略，并通过自适应门控机制实现混合函数。跨VRP变体进行的广泛实验显示了MoSES优于先前方法的优势。
摘要：Existing neural methods for multi-task vehicle routing problems (VRPs) typically learn unified solvers to handle multiple constraints simultaneously. However, they often underutilize the compositional structure of VRP variants, each derivable from a common set of basis VRP variants. This critical oversight causes unified solvers to miss out the potential benefits of basis solvers, each specialized for a basis VRP variant. To overcome this limitation, we propose a framework that enables unified solvers to perceive the shared-component nature across VRP variants by proactively reusing basis solvers, while mitigating the exponential growth of trained neural solvers. Specifically, we introduce a State-Decomposable MDP (SDMDP) that reformulates VRPs by expressing the state space as the Cartesian product of basis state spaces associated with basis VRP variants. More crucially, this formulation inherently yields the optimal basis policy for each basis VRP variant. Furthermore, a Latent Space-based SDMDP extension is developed by incorporating both the optimal basis policies and a learnable mixture function to enable the policy reuse in the latent space. Under mild assumptions, this extension provably recovers the optimal unified policy of SDMDP through the mixture function that computes the state embedding as a mapping from the basis state embeddings generated by optimal basis policies. For practical implementation, we introduce the Mixture-of-Specialized-Experts Solver (MoSES), which realizes basis policies through specialized Low-Rank Adaptation (LoRA) experts, and implements the mixture function via an adaptive gating mechanism. Extensive experiments conducted across VRP variants showcase the superiority of MoSES over prior methods.

联邦学习|隐私保护|加密(3篇)

【1】Towards Straggler-Resilient Split Federated Learning: An Unbalanced Update Approach
标题：迈向具有落后弹性的分离联邦学习：一种不平衡的更新方法
链接：https://arxiv.org/abs/2510.21155

作者：Dandan Liang, Jianing Zhang, Evan Chen, Zhe Li, Rui Li, Haibo Yang
摘要：Split Federated Learning（SFL）通过将Federated Learning（FL）的并行性与Split Learning（SL）的计算卸载相结合，实现了边缘设备上的可扩展训练。尽管SFL取得了巨大的成功，但它仍然严重遭受分布式学习系统中众所周知的落伍问题的困扰。Split Server和客户端之间的依赖性加剧了这个问题：Split Server端模型更新依赖于从客户端接收激活。这种同步要求引入了显著的时间延迟，使得掉队者成为系统的可扩展性和效率的关键瓶颈。为了缓解这个问题，我们提出了MU-SplitFed，这是一种零阶优化的离散弹性SFL算法，它通过一种简单而有效的不平衡更新机制来消除离散延迟的训练进度。通过使服务器能够在每个客户端回合执行$\tau$本地更新，MU-SplitFed对于非凸目标实现了$O（\sqrt{d/（\tau T）}）$的收敛速度，在通信回合中实现了$\tau$的线性加速。实验表明，MU-SplitFed始终优于基线方法的存在下的落伍者，并有效地减轻其影响，通过自适应调整$\tau$。这个项目的代码可以在https://github.com/Johnny-Zip/MU-SplitFed上找到。
摘要：Split Federated Learning (SFL) enables scalable training on edge devices by combining the parallelism of Federated Learning (FL) with the computational offloading of Split Learning (SL). Despite its great success, SFL suffers significantly from the well-known straggler issue in distributed learning systems. This problem is exacerbated by the dependency between Split Server and clients: the Split Server side model update relies on receiving activations from clients. Such synchronization requirement introduces significant time latency, making straggler a critical bottleneck to the scalability and efficiency of the system. To mitigate this problem, we propose MU-SplitFed, a straggler-resilient SFL algorithm in zeroth-order optimization that decouples training progress from straggler delays via a simple yet effective unbalanced update mechanism. By enabling the server to perform $\tau$ local updates per client round, MU-SplitFed achieves a convergence rate of $O(\sqrt{d/(\tau T)})$ for non-convex objectives, demonstrating a linear speedup of $\tau$ in communication rounds. Experiments demonstrate that MU-SplitFed consistently outperforms baseline methods with the presence of stragglers and effectively mitigates their impact through adaptive tuning of $\tau$. The code for this project is available at https://github.com/Johnny-Zip/MU-SplitFed.

【2】DictPFL: Efficient and Private Federated Learning on Encrypted Gradients
标题：DictPFL：对加密对象进行高效且私人的联邦学习
链接：https://arxiv.org/abs/2510.21086

作者：Jiaqi Xue, Mayank Kumar, Yuzhang Shang, Shangqian Gao, Rui Ning, Mengxin Zheng, Xiaoqian Jiang, Qian Lou
备注：Accepted by NeurIPS 2025
摘要：联邦学习（FL）支持跨机构的协作模型训练，而无需共享原始数据。然而，梯度共享仍然存在隐私泄露的风险，例如梯度反转攻击。同态加密（HE）可以保证聚合的安全，但通常会导致过高的计算和通信开销。现有的基于HE的FL方法处于两个极端：以高成本加密所有梯度以实现完全隐私，或者部分加密梯度以节省资源同时暴露漏洞。我们提出了DictPFL，一个实用的框架，以最小的开销实现全梯度保护。DictPFL加密每个传输的梯度，同时将非传输参数保持在本地，保护隐私而无需繁重的计算。它引入了两个关键模块：部分加密分解（DePE），将模型权重分解为静态字典和可更新的查找表，只有后者被加密和聚合，而静态字典保持本地，既不需要共享也不需要加密;以及最小加密修剪（PrME），通过一致的历史引导掩码应用删除感知修剪来最小化加密参数。实验表明，与完全加密FL相比，DictPFL降低了402- 748 $\times $的通信开销，加快了28- 65 $\times $的训练速度，同时在开销和速度上优于最先进的选择性加密方法51- 155 $\times $和4- 19 $\times $。值得注意的是，DictPFL的运行时间在纯文本FL的2$\times$之内，首次证明了基于HE的私有联邦学习对于现实世界的部署是实用的。该代码可在https://github.com/UCF-ML-Research/DictPFL上公开获取。
摘要：Federated Learning (FL) enables collaborative model training across institutions without sharing raw data. However, gradient sharing still risks privacy leakage, such as gradient inversion attacks. Homomorphic Encryption (HE) can secure aggregation but often incurs prohibitive computational and communication overhead. Existing HE-based FL methods sit at two extremes: encrypting all gradients for full privacy at high cost, or partially encrypting gradients to save resources while exposing vulnerabilities. We present DictPFL, a practical framework that achieves full gradient protection with minimal overhead. DictPFL encrypts every transmitted gradient while keeping non-transmitted parameters local, preserving privacy without heavy computation. It introduces two key modules: Decompose-for-Partial-Encrypt (DePE), which decomposes model weights into a static dictionary and an updatable lookup table, only the latter is encrypted and aggregated, while the static dictionary remains local and requires neither sharing nor encryption; and Prune-for-Minimum-Encrypt (PrME), which applies encryption-aware pruning to minimize encrypted parameters via consistent, history-guided masks. Experiments show that DictPFL reduces communication cost by 402-748$\times$ and accelerates training by 28-65$\times$ compared to fully encrypted FL, while outperforming state-of-the-art selective encryption methods by 51-155$\times$ in overhead and 4-19$\times$ in speed. Remarkably, DictPFL's runtime is within 2$\times$ of plaintext FL, demonstrating for the first time, that HE-based private federated learning is practical for real-world deployment. The code is publicly available at https://github.com/UCF-ML-Research/DictPFL.

【3】An Ensembled Penalized Federated Learning Framework for Falling People Detection
标题：用于坠落人员检测的集成惩罚联邦学习框架
链接：https://arxiv.org/abs/2510.20960

作者：Sizhe Rao, Runqiu Zhang, Sajal Saha, Liang Chen
备注：12 pages, 3 figures
摘要：老年人和残疾人的跌倒仍然是世界范围内受伤和死亡的主要原因，需要强大，准确和隐私意识的跌倒检测系统。传统的跌倒检测方法，无论是集中式的还是逐点式的，通常都面临着一些关键挑战，例如有限的可推广性、数据隐私问题以及个体运动行为的可变性。为了解决这些局限性，我们提出了EPFL-一个集成惩罚联合学习框架，它集成了持续学习，个性化建模和一种新的专业加权聚合（SWA）策略。EPFL利用可穿戴传感器数据来捕获连续的运动模式，同时通过同态加密和联邦训练来保护用户隐私。与现有的联邦模型不同，EPFL结合了惩罚性本地训练和基于集合的推理，以提高客户端之间的一致性和对行为差异的适应性。在基准跌倒检测数据集上进行的大量实验证明了我们方法的有效性，实现了88.31%的召回率和89.94%的F1分数，显著优于集中式和基线模型。这项工作为医疗环境中的真实跌倒检测提供了一种可扩展、安全和准确的解决方案，通过其自适应反馈机制具有持续改进的强大潜力。
摘要：Falls among elderly and disabled individuals remain a leading cause of injury and mortality worldwide, necessitating robust, accurate, and privacy-aware fall detection systems. Traditional fall detection approaches, whether centralized or point-wise, often struggle with key challenges such as limited generalizability, data privacy concerns, and variability in individual movement behaviors. To address these limitations, we propose EPFL-an Ensembled Penalized Federated Learning framework that integrates continual learning, personalized modeling, and a novel Specialized Weighted Aggregation (SWA) strategy. EPFL leverages wearable sensor data to capture sequential motion patterns while preserving user privacy through homomorphic encryption and federated training. Unlike existing federated models, EPFL incorporates both penalized local training and ensemble-based inference to improve inter-client consistency and adaptability to behavioral differences. Extensive experiments on a benchmark fall detection dataset demonstrate the effectiveness of our approach, achieving a Recall of 88.31 percent and an F1-score of 89.94 percent, significantly outperforming both centralized and baseline models. This work presents a scalable, secure, and accurate solution for real-world fall detection in healthcare settings, with strong potential for continuous improvement via its adaptive feedback mechanism.

推理|分析|理解|解释(12篇)

【1】DeepAgent: A General Reasoning Agent with Scalable Toolsets
标题：DeepAgent：具有可扩展工具集的通用推理代理
链接：https://arxiv.org/abs/2510.21618

作者：Xiaoxi Li, Wenxiang Jiao, Jiarui Jin, Guanting Dong, Jiajie Jin, Yinuo Wang, Hao Wang, Yutao Zhu, Ji-Rong Wen, Yuan Lu, Zhicheng Dou
摘要：大型推理模型已经表现出强大的解决问题的能力，但现实世界的任务往往需要外部工具和长期的互动。现有的代理框架通常遵循预定义的工作流，这限制了自主和全局任务的完成。在本文中，我们介绍了DeepAgent，这是一个端到端的深度推理代理，可以在单个连贯的推理过程中执行自主思考，工具发现和动作执行。为了解决长视野交互的挑战，特别是多个工具调用和交互历史积累的上下文长度爆炸，我们引入了一个自主的记忆折叠机制，将过去的交互压缩到结构化的情节，工作和工具记忆中，减少错误积累，同时保留关键信息。为了有效和稳定地教授通用工具的使用，我们开发了一种端到端的强化学习策略，即ToolPO，它利用LLM模拟的API，并应用工具调用优势属性来为工具调用令牌分配细粒度的信用。在八个基准测试上进行的广泛实验，包括一般工具使用任务（ToolBench，API-Bank，TMDB，Spotify，ToolHop）和下游应用程序（ALFWorld，WebShop，GAIA，HLE），表明DeepAgent在标记工具和开放集工具检索场景中始终优于基线。这项工作朝着更通用和更有能力的代理人的现实世界的应用迈出了一步。代码和演示可在https://github.com/RUC-NLPIR/DeepAgent上获得。
摘要：Large reasoning models have demonstrated strong problem-solving abilities, yet real-world tasks often require external tools and long-horizon interactions. Existing agent frameworks typically follow predefined workflows, which limit autonomous and global task completion. In this paper, we introduce DeepAgent, an end-to-end deep reasoning agent that performs autonomous thinking, tool discovery, and action execution within a single, coherent reasoning process. To address the challenges of long-horizon interactions, particularly the context length explosion from multiple tool calls and the accumulation of interaction history, we introduce an autonomous memory folding mechanism that compresses past interactions into structured episodic, working, and tool memories, reducing error accumulation while preserving critical information. To teach general-purpose tool use efficiently and stably, we develop an end-to-end reinforcement learning strategy, namely ToolPO, that leverages LLM-simulated APIs and applies tool-call advantage attribution to assign fine-grained credit to the tool invocation tokens. Extensive experiments on eight benchmarks, including general tool-use tasks (ToolBench, API-Bank, TMDB, Spotify, ToolHop) and downstream applications (ALFWorld, WebShop, GAIA, HLE), demonstrate that DeepAgent consistently outperforms baselines across both labeled-tool and open-set tool retrieval scenarios. This work takes a step toward more general and capable agents for real-world applications. The code and demo are available at https://github.com/RUC-NLPIR/DeepAgent.

【2】SHAP Meets Tensor Networks: Provably Tractable Explanations with Parallelism
标题：SHAP会见张量网络：可证明可处理的平行主义解释
链接：https://arxiv.org/abs/2510.21599

作者：Reda Marzouk, Shahaf Bassan, Guy Katz
备注：To appear in NeurIPS 2025
摘要：虽然Shapley加法解释（SHAP）可以在多项式时间内计算决策树等简单模型，但不幸的是，对于神经网络等更具表现力的黑盒模型，它们变得NP-难以计算-其中生成解释通常是最关键的。在这项工作中，我们分析了计算 * 张量网络（TN）* 的SHAP解释的问题，这是一种比目前已知的精确SHAP算法更广泛，更具表达力的模型，广泛用于神经网络抽象和压缩。首先，我们介绍了一个通用的框架，用于计算可证明准确的SHAP解释一般TN的任意结构。有趣的是，我们表明，当TN被限制到一个 * 张量训练（TT）* 结构，SHAP计算可以在 * 多对数 * 时间使用 * 并行 * 计算。由于TT的表达能力，这个复杂性结果可以推广到许多其他流行的ML模型，如决策树，树集成，线性模型和线性RNN，因此收紧了这些模型家族的先前报告的复杂性结果。最后，通过将二进制神经网络简化为张量网络表示，我们证明了当网络的宽度固定时，SHAP计算可以变得 * 有效易处理 *，而即使深度恒定 *，它仍然很难计算。这突出了一个重要的见解：对于这类模型，宽度-而不是深度-成为SHAP计算中的主要计算瓶颈。
摘要：Although Shapley additive explanations (SHAP) can be computed in polynomial time for simple models like decision trees, they unfortunately become NP-hard to compute for more expressive black-box models like neural networks - where generating explanations is often most critical. In this work, we analyze the problem of computing SHAP explanations for *Tensor Networks (TNs)*, a broader and more expressive class of models than those for which current exact SHAP algorithms are known to hold, and which is widely used for neural network abstraction and compression. First, we introduce a general framework for computing provably exact SHAP explanations for general TNs with arbitrary structures. Interestingly, we show that, when TNs are restricted to a *Tensor Train (TT)* structure, SHAP computation can be performed in *poly-logarithmic* time using *parallel* computation. Thanks to the expressiveness power of TTs, this complexity result can be generalized to many other popular ML models such as decision trees, tree ensembles, linear models, and linear RNNs, therefore tightening previously reported complexity results for these families of models. Finally, by leveraging reductions of binarized neural networks to Tensor Network representations, we demonstrate that SHAP computation can become *efficiently tractable* when the network's *width* is fixed, while it remains computationally hard even with constant *depth*. This highlights an important insight: for this class of models, width - rather than depth - emerges as the primary computational bottleneck in SHAP computation.

【3】Document Understanding, Measurement, and Manipulation Using Category Theory
标题：使用类别理论理解、测量和操纵文档
链接：https://arxiv.org/abs/2510.21553

作者：Jared Claypoole, Yunye Gong, Noson S. Yanofsky, Ajay Divakaran
摘要：我们应用范畴理论来提取多模态文档结构，这使我们能够开发信息理论措施，内容摘要和扩展，以及大型预训练模型的自我监督改进。我们首先开发一个数学表示的文件作为一类的问题-答案对。其次，我们开发了一个正交化的过程中包含的一个或多个文件的信息划分成不重叠的片段。在第一步和第二步中提取的结构引导我们开发方法来测量和枚举文档中包含的信息。我们还建立在这些步骤，以开发新的摘要技术，以及开发一个新的问题的解决方案，即。注释导致原始文件的扩展。我们的问答对方法，使一种新的率失真分析的摘要技术。我们使用大型预训练模型实现我们的技术，并提出了我们整体数学框架的多模态扩展。最后，我们开发了一种新的自监督方法，使用RLVR来改进大型预训练模型，使用一致性约束，例如在某些操作下的可组合性和闭包，这些操作自然来自我们的类别理论框架。
摘要：We apply category theory to extract multimodal document structure which leads us to develop information theoretic measures, content summarization and extension, and self-supervised improvement of large pretrained models. We first develop a mathematical representation of a document as a category of question-answer pairs. Second, we develop an orthogonalization procedure to divide the information contained in one or more documents into non-overlapping pieces. The structures extracted in the first and second steps lead us to develop methods to measure and enumerate the information contained in a document. We also build on those steps to develop new summarization techniques, as well as to develop a solution to a new problem viz. exegesis resulting in an extension of the original document. Our question-answer pair methodology enables a novel rate distortion analysis of summarization techniques. We implement our techniques using large pretrained models, and we propose a multimodal extension of our overall mathematical framework. Finally, we develop a novel self-supervised method using RLVR to improve large pretrained models using consistency constraints such as composability and closure under certain operations that stem naturally from our category theoretic framework.

【4】Assessing the Real-World Utility of Explainable AI for Arousal Diagnostics: An Application-Grounded User Study
标题：评估可解释人工智能在唤醒诊断中的现实效用：一项基于应用程序的用户研究
链接：https://arxiv.org/abs/2510.21389

作者：Stefan Kraft, Andreas Theissler, Vera Wienhausen-Wilke, Gjergji Kasneci, Hendrik Lensch
摘要：人工智能（AI）系统在生物医学信号解释方面越来越接近或超过人类专家。然而，将其有效整合到临床实践中需要的不仅仅是高预测准确性。临床医生必须辨别出\textit{when}和\textit{why}来信任算法建议。这项工作提出了一项基于应用程序的用户研究，其中有八名专业的睡眠医学从业者，他们在三种条件下对多导睡眠图数据中的夜间觉醒事件进行评分：（i）手动评分，（ii）黑盒（BB）AI辅助，以及（iii）透明白盒（WB）AI辅助。从评分开始或作为事后质量控制（\texit {QC}）审查提供帮助。我们系统地评估了辅助的类型和时机如何影响事件级别和临床上最相关的基于计数的性能、时间要求和用户体验。当对照用于训练AI的临床标准进行评估时，AI和人类-AI团队的表现都显著优于独立专家，协作也减少了评估者之间的差异。值得注意的是，作为目标QC步骤应用的透明AI辅助比黑箱辅助产生约30%的中位事件级性能改进，并且QC定时进一步增强了基于计数的结果。虽然WB和QC方法增加了评分所需的时间，但启动时的帮助更快，并且受到大多数参与者的青睐。绝大多数参与者都赞成透明度，八分之七的人表示愿意在稍加修改或不作修改的情况下采用该系统。总之，策略性定时透明AI辅助有效地平衡了准确性和临床效率，为临床工作流程中值得信赖的AI集成和用户接受提供了一条有前途的途径。
摘要：Artificial intelligence (AI) systems increasingly match or surpass human experts in biomedical signal interpretation. However, their effective integration into clinical practice requires more than high predictive accuracy. Clinicians must discern \textit{when} and \textit{why} to trust algorithmic recommendations. This work presents an application-grounded user study with eight professional sleep medicine practitioners, who score nocturnal arousal events in polysomnographic data under three conditions: (i) manual scoring, (ii) black-box (BB) AI assistance, and (iii) transparent white-box (WB) AI assistance. Assistance is provided either from the \textit{start} of scoring or as a post-hoc quality-control (\textit{QC}) review. We systematically evaluate how the type and timing of assistance influence event-level and clinically most relevant count-based performance, time requirements, and user experience. When evaluated against the clinical standard used to train the AI, both AI and human-AI teams significantly outperform unaided experts, with collaboration also reducing inter-rater variability. Notably, transparent AI assistance applied as a targeted QC step yields median event-level performance improvements of approximately 30\% over black-box assistance, and QC timing further enhances count-based outcomes. While WB and QC approaches increase the time required for scoring, start-time assistance is faster and preferred by most participants. Participants overwhelmingly favor transparency, with seven out of eight expressing willingness to adopt the system with minor or no modifications. In summary, strategically timed transparent AI assistance effectively balances accuracy and clinical efficiency, providing a promising pathway toward trustworthy AI integration and user acceptance in clinical workflows.

【5】Amortized Variational Inference for Partial-Label Learning: A Probabilistic Approach to Label Disambiguation
标题：部分标签学习的摊销变分推理：标签歧义消除的概率方法
链接：https://arxiv.org/abs/2510.21300

作者：Tobias Fuchs, Nadja Klein
摘要：真实世界的数据经常是嘈杂和模糊的。例如，在众包中，人类注释者可能会将冲突的类标签分配给相同的实例。部分标签学习（PLL）通过训练分类器来解决这一挑战，当每个实例与一组候选标签相关联时，其中只有一个是正确的。虽然早期的PLL方法接近真实的标签后验，但它们通常是计算密集型的。最近的深度学习方法提高了可扩展性，但依赖于替代损失和启发式标签细化。我们引入了一个新的概率框架，直接近似真实标签的后验分布使用摊销变分推理。我们的方法采用神经网络来预测输入数据的变分参数，从而实现有效的推理。这种方法结合了深度学习的表现力和概率建模的严谨性，同时保持了架构不可知性。理论分析和大量的实验上的合成和真实世界的数据集表明，我们的方法达到了最先进的性能在准确性和效率。
摘要：Real-world data is frequently noisy and ambiguous. In crowdsourcing, for example, human annotators may assign conflicting class labels to the same instances. Partial-label learning (PLL) addresses this challenge by training classifiers when each instance is associated with a set of candidate labels, only one of which is correct. While early PLL methods approximate the true label posterior, they are often computationally intensive. Recent deep learning approaches improve scalability but rely on surrogate losses and heuristic label refinement. We introduce a novel probabilistic framework that directly approximates the posterior distribution over true labels using amortized variational inference. Our method employs neural networks to predict variational parameters from input data, enabling efficient inference. This approach combines the expressiveness of deep learning with the rigor of probabilistic modeling, while remaining architecture-agnostic. Theoretical analysis and extensive experiments on synthetic and real-world datasets demonstrate that our method achieves state-of-the-art performance in both accuracy and efficiency.

【6】Accelerating Mobile Inference through Fine-Grained CPU-GPU Co-Execution
标题：通过细粒度的CPU-GUP协同执行加速移动推理
链接：https://arxiv.org/abs/2510.21081

作者：Zhuojin Li, Marco Paolieri, Leana Golubchik
备注：To appear on Lecture Notes in Computer Science, volume on Selected Papers of EPEW 2025
摘要：在移动设备上部署深度神经网络越来越重要，但由于计算资源有限，仍然具有挑战性。另一方面，它们统一的内存架构和CPU与GPU性能之间更窄的差距提供了通过将任务分配给CPU和GPU来减少推理延迟的机会。这种协作执行的主要障碍是合并部分结果所需的显著同步开销，以及预测分配给CPU和GPU的任务的执行时间的困难（由于实现和并行级别的动态选择）。为了克服这些障碍，我们提出了一个轻量级的同步机制，基于OpenCL细粒度共享虚拟内存（SVM）和机器学习模型，以准确地预测执行时间。值得注意的是，这些模型捕捉GPU内核的性能特征，并考虑其调度时间。在四个移动平台上进行的全面评估表明，我们的方法可以快速选择CPU-GPU协同执行策略，线性层的加速比高达1.89倍，卷积层的加速比高达1.75倍（分别接近于在Pixel~5智能手机上进行穷举网格搜索时可实现的最大值2.01倍和1.87倍）。
摘要：Deploying deep neural networks on mobile devices is increasingly important but remains challenging due to limited computing resources. On the other hand, their unified memory architecture and narrower gap between CPU and GPU performance provide an opportunity to reduce inference latency by assigning tasks to both CPU and GPU. The main obstacles for such collaborative execution are the significant synchronization overhead required to combine partial results, and the difficulty of predicting execution times of tasks assigned to CPU and GPU (due to the dynamic selection of implementations and parallelism level). To overcome these obstacles, we propose both a lightweight synchronization mechanism based on OpenCL fine-grained shared virtual memory (SVM) and machine learning models to accurately predict execution times. Notably, these models capture the performance characteristics of GPU kernels and account for their dispatch times. A comprehensive evaluation on four mobile platforms shows that our approach can quickly select CPU-GPU co-execution strategies achieving up to 1.89x speedup for linear layers and 1.75x speedup for convolutional layers (close to the achievable maximum values of 2.01x and 1.87x, respectively, found by exhaustive grid search on a Pixel~5 smartphone).

【7】The Virtues of Brevity: Avoid Overthinking in Parallel Test-Time Reasoning
标题：简洁的优点：避免平行测试时推理中过度思考
链接：https://arxiv.org/abs/2510.21067

作者：Raul Cavalcante Dinardi, Bruno Yamamoto, Anna Helena Reali Costa, Artur Jordao
备注：Accepted at NeurIPS 2025 Workshop on Efficient Reasoning
摘要：推理模型代表了LLM能力的重大进步，特别是对于数学和编码等复杂的推理任务。以往的研究证实，并行测试时间计算采样多个解决方案，并选择最好的一个，可以进一步提高预测性能的LLM。然而，在这方面的策略往往需要复杂的评分，从而增加计算成本和复杂性。在这项工作中，我们证明了选择最短的解决方案的简单和违反直觉的启发式是非常有效的。我们认为，所观察到的有效性源于模型在两个不同的制度：一个简洁，自信的传统制度和一个冗长的过度思考制度的特点是不确定性，我们显示的证据的一个临界点，过度思考制度开始显着。通过选择最短的答案，启发式优先从传统的制度。我们确认，这种方法是有竞争力的更复杂的方法，如跨两个具有挑战性的基准的自一致性，同时显着降低计算开销。最短答案启发式提供了一个帕累托改进自我一致性，甚至适用于任务的输出平等没有很好地定义。
摘要：Reasoning models represent a significant advance in LLM capabilities, particularly for complex reasoning tasks such as mathematics and coding. Previous studies confirm that parallel test-time compute-sampling multiple solutions and selecting the best one-can further enhance the predictive performance of LLMs. However, strategies in this area often require complex scoring, thus increasing computational cost and complexity. In this work, we demonstrate that the simple and counterintuitive heuristic of selecting the shortest solution is highly effective. We posit that the observed effectiveness stems from models operating in two distinct regimes: a concise, confident conventional regime and a verbose overthinking regime characterized by uncertainty, and we show evidence of a critical point where the overthinking regime begins to be significant. By selecting the shortest answer, the heuristic preferentially samples from the conventional regime. We confirm that this approach is competitive with more complex methods such as self-consistency across two challenging benchmarks while significantly reducing computational overhead. The shortest-answer heuristic provides a Pareto improvement over self-consistency and applies even to tasks where output equality is not well defined.

【8】Scalable Machine Learning Analysis of Parker Solar Probe Solar Wind Data
标题：Parker太阳探测器太阳风数据的可扩展机器学习分析
链接：https://arxiv.org/abs/2510.21066

作者：Daniela Martin, Connor O'Brien, Valmir P Moraes Filho, Jinsu Hong, Jasmine R. Kobayashi, Evangelia Samara, Joseph Gallego
摘要：我们提出了一个可扩展的机器学习框架，使用分布式处理和量子启发的核密度矩阵（KDM）方法分析帕克太阳探测器（PSP）的太阳风数据。PSP数据集（2018- 2024）超过150 GB，对传统分析方法提出了挑战。我们的框架利用Dask进行大规模统计计算和KDM来估计关键太阳风参数的单变量和双变量分布，包括太阳风速度，质子密度和质子热速度，以及每个参数的异常阈值。我们揭示了内日球层的特征趋势，包括太阳风的速度随着距离太阳的增加而增加，质子密度降低，以及速度和密度之间的反比关系。太阳风结构在增强和调节极端空间天气现象方面发挥着关键作用，并可能引发地磁暴;我们的分析为这些过程提供了定量见解。这种方法提供了一个易于处理的，可解释的，分布式的方法来探索复杂的物理数据集，并促进大规模原位测量的可重复分析。公开提供经过处理的数据产品和分析工具，以推进今后对太阳风动力学和空间气象预报的研究。本研究中使用的代码和配置文件是公开的，以支持再现性。
摘要：We present a scalable machine learning framework for analyzing Parker Solar Probe (PSP) solar wind data using distributed processing and the quantum-inspired Kernel Density Matrices (KDM) method. The PSP dataset (2018--2024) exceeds 150 GB, challenging conventional analysis approaches. Our framework leverages Dask for large-scale statistical computations and KDM to estimate univariate and bivariate distributions of key solar wind parameters, including solar wind speed, proton density, and proton thermal speed, as well as anomaly thresholds for each parameter. We reveal characteristic trends in the inner heliosphere, including increasing solar wind speed with distance from the Sun, decreasing proton density, and the inverse relationship between speed and density. Solar wind structures play a critical role in enhancing and mediating extreme space weather phenomena and can trigger geomagnetic storms; our analyses provide quantitative insights into these processes. This approach offers a tractable, interpretable, and distributed methodology for exploring complex physical datasets and facilitates reproducible analysis of large-scale in situ measurements. Processed data products and analysis tools are made publicly available to advance future studies of solar wind dynamics and space weather forecasting. The code and configuration files used in this study are publicly available to support reproducibility.

【9】Reasoning's Razor: Reasoning Improves Accuracy but Can Hurt Recall at Critical Operating Points in Safety and Hallucination Detection
标题：推理的剃刀：推理提高了准确性，但可能会损害安全和幻觉检测的关键操作点的回忆
链接：https://arxiv.org/abs/2510.21049

作者：Atoosa Chegini, Hamid Kazemi, Garrett Souza, Maria Safi, Yang Song, Samy Bengio, Sinead Williamson, Mehrdad Farajtabar
摘要：推理已经成为大型语言模型（LLM）的核心范式，不断提高各种基准的准确性。然而，它是否适合精确敏感的任务仍然不清楚。我们提出了第一个系统的研究推理分类任务下严格的低误报率（FPR）制度。我们的分析涵盖了两个任务-安全检测和幻觉检测-在微调和zero-shot设置，使用标准LLM和大型推理模型（LRM）进行评估。我们的研究结果揭示了一个明确的权衡：Think On（推理增强）生成提高了整体准确性，但在实际使用所必需的低FPR阈值下表现不佳。相比之下，思考关闭（推理过程中没有推理）占主导地位，在这些精度敏感的制度，与思考超越时，更高的FPR是可以接受的。此外，我们发现，基于令牌的评分大大优于自我语言化的信心，精度敏感的部署。最后，一个简单的合奏的两种模式恢复的强度。总的来说，我们的发现将推理定位为一种双刃剑：有利于平均精度，但通常不适合需要严格精度的应用程序。
摘要：Reasoning has become a central paradigm for large language models (LLMs), consistently boosting accuracy across diverse benchmarks. Yet its suitability for precision-sensitive tasks remains unclear. We present the first systematic study of reasoning for classification tasks under strict low false positive rate (FPR) regimes. Our analysis covers two tasks--safety detection and hallucination detection--evaluated in both fine-tuned and zero-shot settings, using standard LLMs and Large Reasoning Models (LRMs). Our results reveal a clear trade-off: Think On (reasoning-augmented) generation improves overall accuracy, but underperforms at the low-FPR thresholds essential for practical use. In contrast, Think Off (no reasoning during inference) dominates in these precision-sensitive regimes, with Think On surpassing only when higher FPRs are acceptable. In addition, we find token-based scoring substantially outperforms self-verbalized confidence for precision-sensitive deployments. Finally, a simple ensemble of the two modes recovers the strengths of each. Taken together, our findings position reasoning as a double-edged tool: beneficial for average accuracy, but often ill-suited for applications requiring strict precision.

【10】CIPHER: Scalable Time Series Analysis for Physical Sciences with Application to Solar Wind Phenomena
标题：CIPHER：物理科学的可扩展时间序列分析及其应用于太阳风现象
链接：https://arxiv.org/abs/2510.21022

作者：Jasmine R. Kobayashi, Daniela Martin, Valmir P Moraes Filho, Connor O'Brien, Jinsu Hong, Sudeshna Boro Saikia, Hala Lamdouar, Nathan D. Miles, Marcella Scoczynski, Mavis Stone, Sairam Sundaresan, Anna Jungbluth, Andrés Muñoz-Jaramillo, Evangelia Samara, Joseph Gallego
备注：5 pages, 2 figures, Machine Learning and the Physical Sciences Workshop @ NeurIPS 2025
摘要：对时间序列进行标记或分类是物理科学中的一个持续挑战，其中专家注释稀缺，成本高，而且往往不一致。然而，强大的标签对于实现机器学习模型的理解，预测和预测至关重要。我们提出了\textit{聚类和索引管道与人类评价识别}（CIPHER），一个框架，旨在加速大规模标记物理学中的复杂时间序列。CIPHER集成了\textit{indexable Symbolic Aggregate approXimation}（iSAX），用于可解释的压缩和索引，基于密度的聚类（HDBSCAN），用于对重复出现的现象进行分组，以及用于高效专家验证的人工参与步骤。代表性的样本由领域科学家标记，这些注释在集群中传播，以产生系统的、可扩展的分类。我们评估CIPHER对OMNI数据中太阳风现象进行分类的任务，这是空间天气研究中的一个核心挑战，表明该框架恢复了有意义的现象，如日冕物质抛射和流相互作用区域。除了这个案例研究之外，CIPHER还强调了一个通用策略，将符号表示、无监督学习和专家知识相结合，以解决物理科学时间序列中的标签稀缺问题。本研究中使用的代码和配置文件是公开的，以支持再现性。
摘要：Labeling or classifying time series is a persistent challenge in the physical sciences, where expert annotations are scarce, costly, and often inconsistent. Yet robust labeling is essential to enable machine learning models for understanding, prediction, and forecasting. We present the \textit{Clustering and Indexation Pipeline with Human Evaluation for Recognition} (CIPHER), a framework designed to accelerate large-scale labeling of complex time series in physics. CIPHER integrates \textit{indexable Symbolic Aggregate approXimation} (iSAX) for interpretable compression and indexing, density-based clustering (HDBSCAN) to group recurring phenomena, and a human-in-the-loop step for efficient expert validation. Representative samples are labeled by domain scientists, and these annotations are propagated across clusters to yield systematic, scalable classifications. We evaluate CIPHER on the task of classifying solar wind phenomena in OMNI data, a central challenge in space weather research, showing that the framework recovers meaningful phenomena such as coronal mass ejections and stream interaction regions. Beyond this case study, CIPHER highlights a general strategy for combining symbolic representations, unsupervised learning, and expert knowledge to address label scarcity in time series across the physical sciences. The code and configuration files used in this study are publicly available to support reproducibility.

【11】Fisher meets Feynman: score-based variational inference with a product of experts
标题：费舍尔遇见费曼：基于分数的变分推理与专家的产品
链接：https://arxiv.org/abs/2510.21598

作者：Diana Cai, Robert M. Gower, David M. Blei, Lawrence K. Saul
备注：27 pages, 11 figures. To appear in Advances in Neural Processing Information Systems (NeurIPS), 2025
摘要：我们介绍了一个高度表达，但明显听话的家庭黑盒变分推理（BBVI）。这个家族的每个成员都是专家的加权乘积（PoE），并且乘积中的每个加权专家都与多变量$t$-分布成比例。这些专家的产品可以对具有偏斜、重尾和多模式的分布进行建模，但是要将它们用于BBVI，我们必须能够从它们的密度中进行采样。我们展示了如何做到这一点，通过重新制定这些产品的专家作为潜变量模型与辅助Dirichlet随机变量。这些狄利克雷变量来自费曼恒等式，最初是为量子场论中的循环积分而开发的，它将多个分数的乘积（或者在我们的情况下，$t$-分布）表示为单形上的积分。我们利用这个单纯的潜在空间从这些专家的产品中提取加权样本-BBVI然后使用这些样本来找到最接近目标密度的PoE。给定一组专家，我们推导出一个迭代过程来优化指数，确定其在PoE中的几何权重。在每次迭代中，该过程最小化正则化Fisher散度，以匹配从当前近似中提取的一批样本的变分和目标密度的分数。这种最小化减少到一个凸二次规划，我们证明在一般条件下，这些更新指数收敛速度快，专家的一个接近最优的权重。最后，我们评估这种方法在各种合成和现实世界的目标分布。
摘要：We introduce a highly expressive yet distinctly tractable family for black-box variational inference (BBVI). Each member of this family is a weighted product of experts (PoE), and each weighted expert in the product is proportional to a multivariate $t$-distribution. These products of experts can model distributions with skew, heavy tails, and multiple modes, but to use them for BBVI, we must be able to sample from their densities. We show how to do this by reformulating these products of experts as latent variable models with auxiliary Dirichlet random variables. These Dirichlet variables emerge from a Feynman identity, originally developed for loop integrals in quantum field theory, that expresses the product of multiple fractions (or in our case, $t$-distributions) as an integral over the simplex. We leverage this simplicial latent space to draw weighted samples from these products of experts -- samples which BBVI then uses to find the PoE that best approximates a target density. Given a collection of experts, we derive an iterative procedure to optimize the exponents that determine their geometric weighting in the PoE. At each iteration, this procedure minimizes a regularized Fisher divergence to match the scores of the variational and target densities at a batch of samples drawn from the current approximation. This minimization reduces to a convex quadratic program, and we prove under general conditions that these updates converge exponentially fast to a near-optimal weighting of experts. We conclude by evaluating this approach on a variety of synthetic and real-world target distributions.

【12】Finite-Time Analysis of Stochastic Nonconvex Nonsmooth Optimization on the Riemannian Manifolds
标题：Riemann上随机非凸非光滑优化的临时分析
链接：https://arxiv.org/abs/2510.21468

作者：Emre Sahinoglu, Youbang Sun, Shahin Shahrampour
备注：To Appear in NeurIPS 2025
摘要：本文研究了黎曼流形约束下非光滑非凸随机优化问题的有限时间分析。我们适应的概念Goldstein平稳的黎曼设置作为一个性能度量流形上的非光滑优化。然后，我们提出了一个黎曼在线非凸（RO 2NC）算法，我们建立了$O（\delta ^{-3}\delta^{-1}）$在寻找$（\delta，\delta）$-平稳点的样本复杂度。这一结果是有史以来第一次有限时间保证完全非光滑，非凸优化流形和匹配的最佳复杂性在欧几里德设置。当梯度信息不可用时，我们开发了一个零阶版本的RO 2NC算法（ZO-RO 2NC），我们建立了相同的样本复杂度。数值结果支持了理论分析，并证明了算法的有效性.
摘要：This work addresses the finite-time analysis of nonsmooth nonconvex stochastic optimization under Riemannian manifold constraints. We adapt the notion of Goldstein stationarity to the Riemannian setting as a performance metric for nonsmooth optimization on manifolds. We then propose a Riemannian Online to NonConvex (RO2NC) algorithm, for which we establish the sample complexity of $O(\epsilon^{-3}\delta^{-1})$ in finding $(\delta,\epsilon)$-stationary points. This result is the first-ever finite-time guarantee for fully nonsmooth, nonconvex optimization on manifolds and matches the optimal complexity in the Euclidean setting. When gradient information is unavailable, we develop a zeroth order version of RO2NC algorithm (ZO-RO2NC), for which we establish the same sample complexity. The numerical results support the theory and demonstrate the practical effectiveness of the algorithms.

检测相关(7篇)

【1】DEEDEE: Fast and Scalable Out-of-Distribution Dynamics Detection
标题：DEEDEE：快速且可扩展的分布外动态检测
链接：https://arxiv.org/abs/2510.21638

作者：Tala Aljaafari, Varun Kanade, Philip Torr, Christian Schroeder de Witt
摘要：在安全关键设置中部署强化学习（RL）受到分布偏移下的脆性的限制。我们研究了RL时间序列的分布外（OOD）检测，并介绍了DEEDEE，一个两个统计检测器，重访代表重管道与最小的替代。DEEDEE只使用逐段平均值和RBF核相似性来训练摘要，捕获互补的全局和局部偏差。尽管简单，但DEEDEE在标准RL OOD套件中与当代检测器相匹配或超越，计算（FLOPs / wall-time）减少了600倍，并且在强基线上平均获得了5%的绝对精度。从概念上讲，我们的研究结果表明，不同的异常类型往往通过一个小的低阶统计量的RL轨迹上的印记，这表明在复杂的环境中OOD检测的紧凑的基础。
摘要：Deploying reinforcement learning (RL) in safety-critical settings is constrained by brittleness under distribution shift. We study out-of-distribution (OOD) detection for RL time series and introduce DEEDEE, a two-statistic detector that revisits representation-heavy pipelines with a minimal alternative. DEEDEE uses only an episodewise mean and an RBF kernel similarity to a training summary, capturing complementary global and local deviations. Despite its simplicity, DEEDEE matches or surpasses contemporary detectors across standard RL OOD suites, delivering a 600-fold reduction in compute (FLOPs / wall-time) and an average 5% absolute accuracy gain over strong baselines. Conceptually, our results indicate that diverse anomaly types often imprint on RL trajectories through a small set of low-order statistics, suggesting a compact foundation for OOD detection in complex environments.

【2】FrameShield: Adversarially Robust Video Anomaly Detection
标题：Frame Shield：对抗鲁棒的视频异常检测
链接：https://arxiv.org/abs/2510.21532

作者：Mojtaba Nafez, Mobina Poulaei, Nikan Vasei, Bardia Soltani Moakhar, Mohammad Sabokrou, MohammadHossein Rohban
备注：28 page, 5 figures
摘要：弱监督视频异常检测（WSVAD）已经取得了显着的进步，但现有的模型仍然容易受到对抗性攻击，限制了它们的可靠性。由于弱监督的固有限制，尽管需要帧级预测，但仅提供视频级标签，传统的对抗性防御机制（如对抗性训练）并不有效，因为视频级对抗性扰动通常很弱且不充分。为了解决这一限制，直接从模型中生成的伪标签可以实现帧级对抗训练;然而，这些伪标签本身就有噪声，会显著降低性能。因此，我们引入了一种新的伪异常生成方法，称为时空区域失真（SRD），它通过对正常视频中的局部区域应用严重增强来创建合成异常，同时保持时间一致性。将这些精确注释的合成异常与嘈杂的伪标签相结合，大大减少了标签噪声，从而实现了有效的对抗训练。大量的实验表明，我们的方法显着增强了WSVAD模型对对抗性攻击的鲁棒性，在多个基准测试中，整体AUROC性能平均优于最先进的方法71.0%。实现和代码可在www.example.com上公开获得。
摘要：Weakly Supervised Video Anomaly Detection (WSVAD) has achieved notable advancements, yet existing models remain vulnerable to adversarial attacks, limiting their reliability. Due to the inherent constraints of weak supervision, where only video-level labels are provided despite the need for frame-level predictions, traditional adversarial defense mechanisms, such as adversarial training, are not effective since video-level adversarial perturbations are typically weak and inadequate. To address this limitation, pseudo-labels generated directly from the model can enable frame-level adversarial training; however, these pseudo-labels are inherently noisy, significantly degrading performance. We therefore introduce a novel Pseudo-Anomaly Generation method called Spatiotemporal Region Distortion (SRD), which creates synthetic anomalies by applying severe augmentations to localized regions in normal videos while preserving temporal consistency. Integrating these precisely annotated synthetic anomalies with the noisy pseudo-labels substantially reduces label noise, enabling effective adversarial training. Extensive experiments demonstrate that our method significantly enhances the robustness of WSVAD models against adversarial attacks, outperforming state-of-the-art methods by an average of 71.0\% in overall AUROC performance across multiple benchmarks. The implementation and code are publicly available at https://github.com/rohban-lab/FrameShield.

【3】An Evidence-Based Post-Hoc Adjustment Framework for Anomaly Detection Under Data Contamination
标题：数据污染下异常检测的基于证据的事后调整框架
链接：https://arxiv.org/abs/2510.21296

作者：Sukanya Patra, Souhaib Ben Taieb
备注：Accepted in the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要：无监督异常检测（AD）方法通常假设干净的训练数据，但现实世界的数据集通常包含未检测到或错误标记的异常，导致性能显著下降。现有的解决方案需要访问训练管道、数据或数据中异常比例的先验知识，这限制了它们在现实世界中的适用性。为了应对这一挑战，我们提出了EPHAD，这是一个简单而有效的测试时自适应框架，它使用在测试时收集的证据更新在污染数据集上训练的AD模型的输出。我们的方法将在污染数据集上训练的AD模型捕获的先验知识与来自多模态基础模型（如对比图像预训练（CLIP）），经典AD方法（如潜在离群值因子）或特定领域知识的证据相结合。我们用一个合成玩具的例子来说明EPHAD背后的直觉，并通过八个视觉AD数据集，二十六个表格AD数据集和一个真实世界的工业AD数据集的综合实验来验证其有效性。此外，我们还进行了一项消融研究，以分析超参数的影响和对不同污染水平的鲁棒性，证明EPHAD在不同AD模型和证据对中的通用性和鲁棒性。为了确保可重复性，我们的代码可在https://github.com/sukanyapatra1997/EPHAD上公开获取。
摘要：Unsupervised anomaly detection (AD) methods typically assume clean training data, yet real-world datasets often contain undetected or mislabeled anomalies, leading to significant performance degradation. Existing solutions require access to the training pipelines, data or prior knowledge of the proportions of anomalies in the data, limiting their real-world applicability. To address this challenge, we propose EPHAD, a simple yet effective test-time adaptation framework that updates the outputs of AD models trained on contaminated datasets using evidence gathered at test time. Our approach integrates the prior knowledge captured by the AD model trained on contaminated datasets with evidence derived from multimodal foundation models like Contrastive Language-Image Pre-training (CLIP), classical AD methods like the Latent Outlier Factor or domain-specific knowledge. We illustrate the intuition behind EPHAD using a synthetic toy example and validate its effectiveness through comprehensive experiments across eight visual AD datasets, twenty-six tabular AD datasets, and a real-world industrial AD dataset. Additionally, we conduct an ablation study to analyse hyperparameter influence and robustness to varying contamination levels, demonstrating the versatility and robustness of EPHAD across diverse AD models and evidence pairs. To ensure reproducibility, our code is publicly available at https://github.com/sukanyapatra1997/EPHAD.

【4】Towards Scalable Oversight with Collaborative Multi-Agent Debate in Error Detection
标题：通过错误检测中的协作多代理辩论实现可扩展监督
链接：https://arxiv.org/abs/2510.20963

作者：Yongqiang Chen, Gang Niu, James Cheng, Bo Han, Masashi Sugiyama
备注：Preprint, ongoing work
摘要：准确检测大型语言模型（LLM）响应中的错误对于可扩展监督的成功至关重要，或者为超人智能提供有效的监督。然而，自我诊断在复杂的任务中往往是不可靠的，除非有可靠的外部反馈。多主体辩论（MAD）似乎是外部反馈的一种自然替代方案：多个LLM提供了互补的观点和交叉检查错误检测。然而，先前的MAD协议将辩论框定为零和游戏，其中辩论者竞争以赢得游戏而不是寻求真相。因此，它导致了辩论黑客：辩论者倾向于通过误解任务或提出过于自信的主张来误导法官，这会引入更多的错误并表现出不如单代理方法。为了缓解这个问题，我们引入了一个新的协作MAD协议，称为ColMAD，将MAD重新定义为一个非零和游戏。具体来说，ColMAD鼓励多个代理以支持的方式相互批评，这样他们就可以补充彼此的缺失点。因此，法官代理人可以在更全面的证据基础上做出更翔实的结论。从经验上讲，我们表明，ColMAD显着优于以前的竞争MAD的19%，并带来了非平凡的改进单代理方法在错误检测。
摘要：Accurate detection of errors in large language models (LLM) responses is central to the success of scalable oversight, or providing effective supervision to superhuman intelligence. Yet, self-diagnosis is often unreliable on complex tasks unless aided by reliable external feedback. Multi-agent debate (MAD) seems to be a natural alternative to external feedback: multiple LLMs provide complementary perspectives and cross-checks for error detection. However, prior MAD protocols frame debate as a zero-sum game, where the debaters compete to win the game instead of seeking the truth. Consequently, it leads to debate hacking: debaters tend to mislead the judge by misinterpreting the task or presenting overconfident claims, which introduce more mistakes and underperform single-agent methods. To mitigate the issue, we introduce a new collaborative MAD protocol, termed ColMAD, that reframes MAD as a non-zero sum game. Specifically, ColMAD encourages multiple agents to criticize each other in a supportive way, such that they can complement the missing points of each other. Therefore, the judge agent can make a more informative conclusion based on more comprehensive evidence. Empirically, we show that ColMAD significantly outperforms previous competitive MAD by 19% and brings non-trivial improvements over single-agent methods in error detection.

【5】WhaleVAD-BPN: Improving Baleen Whale Call Detection with Boundary Proposal Networks and Post-processing Optimisation
标题：WhaleVAD-BPN：利用边界提议网络和后处理优化改进须鲸叫声检测
链接：https://arxiv.org/abs/2510.21280

作者：Christiaan M. Geldenhuys, Günther Tonitz, Thomas R. Niesler
摘要：虽然最近的声音事件检测（SED）系统可以识别海洋音频中的须鲸叫声，但与误报和少数群体检测相关的挑战仍然存在。我们提出了边界建议网络（BPN），它扩展了现有的轻量级SED系统。BPN的灵感来自图像对象检测的工作，旨在减少误报检测的数量。它通过使用在主干分类模型内计算的中间潜在表示来门控最终输出来实现这一点。当添加到现有的SED系统中时，BPN实现了16.8%的精确度绝对提高，以及少数类d-调用和bp-调用的F1分数分别提高了21.3%和9.4%。我们进一步考虑两种方法来选择后处理超参数：向前搜索和向后搜索。通过分别优化事件级和帧级超参数，这两种方法导致使用经验方法选择的参数的相当大的性能改进。完整的WhaleVAD-BPN系统实现了0.475的交叉验证开发F1评分，比基线绝对改善了9.8%。
摘要：While recent sound event detection (SED) systems can identify baleen whale calls in marine audio, challenges related to false positive and minority-class detection persist. We propose the boundary proposal network (BPN), which extends an existing lightweight SED system. The BPN is inspired by work in image object detection and aims to reduce the number of false positive detections. It achieves this by using intermediate latent representations computed within the backbone classification model to gate the final output. When added to an existing SED system, the BPN achieves a 16.8 % absolute increase in precision, as well as 21.3 % and 9.4 % improvements in the F1-score for minority-class d-calls and bp-calls, respectively. We further consider two approaches to the selection of post-processing hyperparameters: a forward-search and a backward-search. By separately optimising event-level and frame-level hyperparameters, these two approaches lead to considerable performance improvements over parameters selected using empirical methods. The complete WhaleVAD-BPN system achieves a cross-validated development F1-score of 0.475, which is a 9.8 % absolute improvement over the baseline.

【6】This EEG Looks Like These EEGs: Interpretable Interictal Epileptiform Discharge Detection With ProtoEEG-kNN
标题：这个脑电波看起来像这些脑电波：使用ProtoEEG-kNN的可解释发作间隙癫痫样放电检测
链接：https://arxiv.org/abs/2510.20846

作者：Dennis Tang, Jon Donnelly, Alina Jade Barnett, Lesia Semenova, Jin Jing, Peter Hadar, Ioannis Karakis, Olga Selioutski, Kehan Zhao, M. Brandon Westover, Cynthia Rudin
备注：MICCAI 2025
摘要：脑电图（EEG）记录中发作间隙癫痫样放电（IED）的存在是癫痫的重要生物标志物。即使是训练有素的神经学家也发现检测IED很困难，导致许多从业者转向机器学习寻求帮助。虽然现有的机器学习算法可以在这项任务上实现很高的准确性，但大多数模型都是不可解释的，无法证明其结论是正确的。由于缺乏理解模型推理的能力，医生无法利用他们的专业知识来识别不正确的模型预测并进行相应的干预。为了改善人与模型的交互，我们引入了ProtoEEG-kNN，这是一种内在可解释的模型，遵循一个简单的基于案例的推理过程。ProtoEEG-kNN通过将EEG与来自训练集的类似EEG进行比较来进行推理，并在IED形态（形状）和空间分布（位置）方面直观地展示其推理。我们表明，ProtoEEG-kNN可以在IED检测中实现最先进的准确性，同时提供专家更喜欢现有方法的解释。
摘要：The presence of interictal epileptiform discharges (IEDs) in electroencephalogram (EEG) recordings is a critical biomarker of epilepsy. Even trained neurologists find detecting IEDs difficult, leading many practitioners to turn to machine learning for help. While existing machine learning algorithms can achieve strong accuracy on this task, most models are uninterpretable and cannot justify their conclusions. Absent the ability to understand model reasoning, doctors cannot leverage their expertise to identify incorrect model predictions and intervene accordingly. To improve the human-model interaction, we introduce ProtoEEG-kNN, an inherently interpretable model that follows a simple case-based reasoning process. ProtoEEG-kNN reasons by comparing an EEG to similar EEGs from the training set and visually demonstrates its reasoning both in terms of IED morphology (shape) and spatial distribution (location). We show that ProtoEEG-kNN can achieve state-of-the-art accuracy in IED detection while providing explanations that experts prefer over existing approaches.

【7】A Multiscale Approach for Enhancing Weak Signal Detection
标题：增强弱信号检测的多尺度方法
链接：https://arxiv.org/abs/2510.20828

作者：Dixon Vimalajeewa, Ursula U. Muller, Brani Vidakovic
摘要：随机共振（SR），一种最初在气候建模中引入的现象，通过利用非线性系统中的最佳噪声水平来增强信号检测。传统的随机共振技术主要基于单阈值检测器，仅限于其行为不依赖于时间的信号。通常需要大量的噪声来检测弱信号，这会使复杂的信号特征失真。为了解决这些局限性，本研究探讨了多阈值系统和SR在多尺度应用中使用小波变换的应用。在多尺度域中，可以在不同的分辨率水平上分析信号，以更好地理解潜在的动力学。我们提出了一个双阈值检测系统，它集成了两个单阈值检测器，以增强弱信号检测。我们评估它在原始数据域和多尺度域使用模拟和真实世界的信号，并比较其性能与现有的方法。实验结果表明，在原始数据域，提出的双阈值检测器显着提高了微弱信号检测相比，传统的单阈值方法。其性能在频域中得到进一步改善，需要更低的噪声水平，同时优于现有的检测系统。该研究通过引入一种鲁棒的弱信号识别方法，推进了基于SR的检测方法，在各个学科中具有潜在的应用。
摘要：Stochastic resonance (SR), a phenomenon originally introduced in climate modeling, enhances signal detection by leveraging optimal noise levels within non-linear systems. Traditional SR techniques, mainly based on single-threshold detectors, are limited to signals whose behavior does not depend on time. Often large amounts of noise are needed to detect weak signals, which can distort complex signal characteristics. To address these limitations, this study explores multi-threshold systems and the application of SR in multiscale applications using wavelet transforms. In the multiscale domain signals can be analyzed at different levels of resolution to better understand the underlying dynamics. We propose a double-threshold detection system that integrates two single-threshold detectors to enhance weak signal detection. We evaluate it both in the original data domain and in the multiscale domain using simulated and real-world signals and compare its performance with existing methods. Experimental results demonstrate that, in the original data domain, the proposed double-threshold detector significantly improves weak signal detection compared to conventional single-threshold approaches. Its performance is further improved in the frequency domain, requiring lower noise levels while outperforming existing detection systems. This study advances SR-based detection methodologies by introducing a robust approach to weak signal identification, with potential applications in various disciplines.

分类|识别(1篇)

【1】Exploring Spiking Neural Networks for Binary Classification in Multivariate Time Series at the Edge
标题：基于脉冲神经网络的多变量时间序列边缘分类研究
链接：https://arxiv.org/abs/2510.20997

作者：James Ghawaly, Andrew Nicholson, Catherine Schuman, Dalton Diez, Aaron Young, Brett Witherspoon
备注：Accepted in 2025 International Joint Conference on Neural Networks (IJCNN)
摘要：我们提出了一个训练尖峰神经网络（SNN）对多变量时间序列进行二进制分类的一般框架，重点是逐步预测和低误报率下的高精度。该方法使用神经形态系统（EONS）算法的进化优化稀疏，有状态的SNN共同优化其架构和参数。输入被编码成尖峰序列，并且通过对单个输出神经元的尖峰计数进行阈值化来进行预测。我们还采用了简单的投票集成方法，以提高性能和鲁棒性。为了评估该框架，我们将其应用于特定应用程序的优化，以检测低信噪比的伽马射线光谱数据中的放射源的任务。由此产生的SNN只有49个神经元和66个突触，在1/hr的误报率下实现了51.8%的真阳性率（TPR），优于PCA（42.7%）和深度学习（49.8%）基线。一个三个模型的任何投票合奏TPR增加到67.1%，在相同的误报率。在microCaspian神经形态平台上的硬件部署显示出2 mW的功耗和20.2ms的推理延迟。我们还证明了推广应用相同的框架，没有特定领域的修改，癫痫发作检测脑电图记录。集成实现了95%的TPR，假阳性率为16%，与最近的深度学习方法相当，参数数量显著减少。
摘要：We present a general framework for training spiking neural networks (SNNs) to perform binary classification on multivariate time series, with a focus on step-wise prediction and high precision at low false alarm rates. The approach uses the Evolutionary Optimization of Neuromorphic Systems (EONS) algorithm to evolve sparse, stateful SNNs by jointly optimizing their architectures and parameters. Inputs are encoded into spike trains, and predictions are made by thresholding a single output neuron's spike counts. We also incorporate simple voting ensemble methods to improve performance and robustness. To evaluate the framework, we apply it with application-specific optimizations to the task of detecting low signal-to-noise ratio radioactive sources in gamma-ray spectral data. The resulting SNNs, with as few as 49 neurons and 66 synapses, achieve a 51.8% true positive rate (TPR) at a false alarm rate of 1/hr, outperforming PCA (42.7%) and deep learning (49.8%) baselines. A three-model any-vote ensemble increases TPR to 67.1% at the same false alarm rate. Hardware deployment on the microCaspian neuromorphic platform demonstrates 2mW power consumption and 20.2ms inference latency. We also demonstrate generalizability by applying the same framework, without domain-specific modification, to seizure detection in EEG recordings. An ensemble achieves 95% TPR with a 16% false positive rate, comparable to recent deep learning approaches with significant reduction in parameter count.

表征(7篇)

【1】Unified token representations for sequential decision models
标题：顺序决策模型的统一令牌表示
链接：https://arxiv.org/abs/2510.21448

作者：Zhuojing Tian, Yushu Chen
摘要：Transformers在离线强化学习（RL）中表现出了强大的潜力，它将轨迹建模为返回到去、状态和动作的序列。然而，现有的方法，如决策Transformer（DT）及其变体遭受冗余标记化和二次注意复杂性，限制了它们在实时或资源受限的设置中的可扩展性。为了解决这个问题，我们提出了一个统一的令牌表示（UTR），将返回到去，状态和动作合并到一个令牌中，大大减少了序列长度和模型复杂性。理论分析表明，UTR导致更严格的Rademacher复杂性界，这表明改进的泛化。我们进一步开发了两种变体：UDT和UDC，分别基于Transformer和门控CNN主干。两者都实现了与最先进的方法相当或更高的性能，具有明显更低的计算量。这些研究结果表明，UTR的推广以及跨架构，并可能提供一个有效的基础，可扩展的控制在未来的大型决策模型。
摘要：Transformers have demonstrated strong potential in offline reinforcement learning (RL) by modeling trajectories as sequences of return-to-go, states, and actions. However, existing approaches such as the Decision Transformer(DT) and its variants suffer from redundant tokenization and quadratic attention complexity, limiting their scalability in real-time or resource-constrained settings. To address this, we propose a Unified Token Representation (UTR) that merges return-to-go, state, and action into a single token, substantially reducing sequence length and model complexity. Theoretical analysis shows that UTR leads to a tighter Rademacher complexity bound, suggesting improved generalization. We further develop two variants: UDT and UDC, built upon transformer and gated CNN backbones, respectively. Both achieve comparable or superior performance to state-of-the-art methods with markedly lower computation. These findings demonstrate that UTR generalizes well across architectures and may provide an efficient foundation for scalable control in future large decision models.

【2】Disentangled Representation Learning via Modular Compositional Bias
标题：通过模块组成偏差解开表示学习
链接：https://arxiv.org/abs/2510.21402

作者：Whie Jung, Dong Hoon Lee, Seunghoon Hong
摘要：最近的解纠缠表示学习（DRL）方法严重依赖于特定因素的策略，无论是学习目标的属性或模型架构的对象嵌入归纳偏见。当新的变化因素与先前的假设（如统计独立性或空间排他性）不一致时，或者当多个因素共存时，这种不同的方法会导致显著的开销，因为从业者必须重新设计架构或目标。为了解决这个问题，我们提出了一个组合偏置，一个模块化的归纳偏置从目标和架构解耦。我们的关键见解是，不同的因素在数据分布中遵循不同的重组规则：全局属性是互斥的，例如，一张脸只有一个鼻子，而物体共享一个公共支撑（物体的任何子集都可以共存）。因此，我们根据特定因素的规则随机重新混合潜在因素，即，混合策略，并迫使编码器发现混合策略通过两个互补目标反映的任何因素结构：（i）确保每个再混合解码成真实图像的先验损失，以及（ii）Wiedemer等人（arXiv：2310.05327）引入的组成一致性损失，其将每个合成图像与其对应的合成潜像对齐。在这个通用框架下，只需调整混合策略，就可以在不修改目标或架构的情况下，解开属性、对象甚至两者的纠缠。大量的实验表明，该方法在属性和对象解纠缠方面都表现出很好的性能，并且独特地实现了全局样式和对象的联合解纠缠。代码可在https://github.com/whieya/Compositional-DRL上获得。
摘要：Recent disentangled representation learning (DRL) methods heavily rely on factor specific strategies-either learning objectives for attributes or model architectures for objects-to embed inductive biases. Such divergent approaches result in significant overhead when novel factors of variation do not align with prior assumptions, such as statistical independence or spatial exclusivity, or when multiple factors coexist, as practitioners must redesign architectures or objectives. To address this, we propose a compositional bias, a modular inductive bias decoupled from both objectives and architectures. Our key insight is that different factors obey distinct recombination rules in the data distribution: global attributes are mutually exclusive, e.g., a face has one nose, while objects share a common support (any subset of objects can co-exist). We therefore randomly remix latents according to factor-specific rules, i.e., a mixing strategy, and force the encoder to discover whichever factor structure the mixing strategy reflects through two complementary objectives: (i) a prior loss that ensures every remix decodes into a realistic image, and (ii) the compositional consistency loss introduced by Wiedemer et al. (arXiv:2310.05327), which aligns each composite image with its corresponding composite latent. Under this general framework, simply adjusting the mixing strategy enables disentanglement of attributes, objects, and even both, without modifying the objectives or architectures. Extensive experiments demonstrate that our method shows competitive performance in both attribute and object disentanglement, and uniquely achieves joint disentanglement of global style and objects. Code is available at https://github.com/whieya/Compositional-DRL.

【3】ESCORT: Efficient Stein-variational and Sliced Consistency-Optimized Temporal Belief Representation for POMDPs
标题：ESCRT：POMDPs的高效Stein变分和切片一致性优化时间信念表示
链接：https://arxiv.org/abs/2510.21107

作者：Yunuo Zhang, Baiting Luo, Ayan Mukhopadhyay, Gabor Karsai, Abhishek Dubey
备注：Proceeding of the 39th Conference on Neural Information Processing Systems (NeurIPS'25). Code would be available at this https URL
摘要：在部分可观测马尔可夫决策过程（POMDPs）中，维护和更新可能的底层状态的信念分布提供了一种原则性的方法来总结行动观察历史，以便在不确定性下进行有效的决策。随着环境变得越来越现实，信念分布发展出标准数学模型无法准确捕捉的复杂性，从而在保持表征准确性方面产生了根本性挑战。尽管在深度学习和概率建模方面取得了进展，但现有的POMDP信念近似方法无法准确地表示复杂的不确定性结构，例如高维多模态信念分布，导致估计错误，从而导致次优代理行为。为了应对这一挑战，我们提出了ESCORT（高效斯坦变分和切片一致性优化表示时间信念），一个基于粒子的框架，用于捕获复杂的，多模态分布在高维的信念空间。ESCORT通过两个关键创新扩展了SVGD：相关感知投影，用于对状态维度之间的依赖关系进行建模;时间一致性约束，用于在保持相关结构的同时稳定更新。这种方法保留了SVGD的吸引-排斥粒子动力学，同时能够精确建模复杂的相关模式。与粒子滤波器容易退化或具有固定表示能力的参数化方法不同，ESCORT动态适应信念景观复杂性，而无需重新分配或限制性分布假设。我们通过对POMDP域和不同维度的合成多模态分布的广泛评估来证明ESCORT的有效性，在置信近似精度和下游决策质量方面，它始终优于最先进的方法。
摘要：In Partially Observable Markov Decision Processes (POMDPs), maintaining and updating belief distributions over possible underlying states provides a principled way to summarize action-observation history for effective decision-making under uncertainty. As environments grow more realistic, belief distributions develop complexity that standard mathematical models cannot accurately capture, creating a fundamental challenge in maintaining representational accuracy. Despite advances in deep learning and probabilistic modeling, existing POMDP belief approximation methods fail to accurately represent complex uncertainty structures such as high-dimensional, multi-modal belief distributions, resulting in estimation errors that lead to suboptimal agent behaviors. To address this challenge, we present ESCORT (Efficient Stein-variational and sliced Consistency-Optimized Representation for Temporal beliefs), a particle-based framework for capturing complex, multi-modal distributions in high-dimensional belief spaces. ESCORT extends SVGD with two key innovations: correlation-aware projections that model dependencies between state dimensions, and temporal consistency constraints that stabilize updates while preserving correlation structures. This approach retains SVGD's attractive-repulsive particle dynamics while enabling accurate modeling of intricate correlation patterns. Unlike particle filters prone to degeneracy or parametric methods with fixed representational capacity, ESCORT dynamically adapts to belief landscape complexity without resampling or restrictive distributional assumptions. We demonstrate ESCORT's effectiveness through extensive evaluations on both POMDP domains and synthetic multi-modal distributions of varying dimensionality, where it consistently outperforms state-of-the-art methods in terms of belief approximation accuracy and downstream decision quality.

【4】On the accuracy of implicit neural representations for cardiovascular anatomies and hemodynamic fields
标题：心血管解剖学和血流动力学领域内隐神经表征的准确性
链接：https://arxiv.org/abs/2510.20970

作者：Jubilee Lee, Daniele E. Schiavazzi
摘要：隐式神经表示（INRs，也称为神经场）最近成为知识表示、合成和压缩的强大框架。通过将场编码为深度神经网络的权重和偏置内的连续函数，而不是依赖于基于体素或网格的结构化或非结构化表示，INR提供了分辨率独立性和高内存效率。然而，它们在特定领域应用中的准确性仍然没有得到充分的理解。在这项工作中，我们评估了最先进的国际标准化组织的性能，用于压缩来自数值模拟的血液动力学场，并通过有符号的距离函数表示心血管解剖结构。我们研究了几种策略来减轻光谱偏差，包括专门的激活函数，固定和可训练的位置编码，以及非线性内核的线性组合。在胸主动脉中现实的、空间和时间变化的血液动力学场上，INR实现了高达约230的显著压缩比，压力的最大绝对误差为1 mmHg，速度的最大绝对误差为5-10 cm/s，无需广泛的超参数调整。在48个胸主动脉解剖结构中，平均和最大绝对解剖差异分别低于0.5 mm和1.6 mm。总体而言，SIREN、MFN-Gabor和MHE架构表现出最佳性能。源代码和数据可在https://github.com/desResLab/nrf上获得。
摘要：Implicit neural representations (INRs, also known as neural fields) have recently emerged as a powerful framework for knowledge representation, synthesis, and compression. By encoding fields as continuous functions within the weights and biases of deep neural networks-rather than relying on voxel- or mesh-based structured or unstructured representations-INRs offer both resolution independence and high memory efficiency. However, their accuracy in domain-specific applications remains insufficiently understood. In this work, we assess the performance of state-of-the-art INRs for compressing hemodynamic fields derived from numerical simulations and for representing cardiovascular anatomies via signed distance functions. We investigate several strategies to mitigate spectral bias, including specialized activation functions, both fixed and trainable positional encoding, and linear combinations of nonlinear kernels. On realistic, space- and time-varying hemodynamic fields in the thoracic aorta, INRs achieved remarkable compression ratios of up to approximately 230, with maximum absolute errors of 1 mmHg for pressure and 5-10 cm/s for velocity, without extensive hyperparameter tuning. Across 48 thoracic aortic anatomies, the average and maximum absolute anatomical discrepancies were below 0.5 mm and 1.6 mm, respectively. Overall, the SIREN, MFN-Gabor, and MHE architectures demonstrated the best performance. Source code and data is available at https://github.com/desResLab/nrf.

【5】ROPES: Robotic Pose Estimation via Score-Based Causal Representation Learning
标题：ROPES：通过基于分数的因果表示学习进行机器人姿势估计
链接：https://arxiv.org/abs/2510.20884

作者：Pranamya Kulkarni, Puranjay Datta, Burak Varıcı, Emre Acartürk, Karthikeyan Shanmugam, Ali Tajer
备注：A preliminary version of this paper appeared at NeurIPS 2025 Workshop on Embodied World Models for Decision Making
摘要：因果表征学习（CRL）已经成为一个强大的无监督框架，它（i）解开高维数据背后的潜在生成因素，（ii）学习解开的变量之间的因果相互作用。尽管最近在可识别性方面取得了广泛的进展，并取得了一些实际进展，但理论和现实世界的实践之间仍然存在很大的差距。本文通过将CRL引入机器人学（一个激发CRL的领域），朝着缩小这一差距迈出了一步。具体来说，本文解决了定义良好的机器人姿态估计-恢复的位置和方向从原始图像-通过引入基于分数的CRL（ROPES）的机器人姿态估计。作为一个无监督的框架，ROPES通过识别那些被驱动的生成因素来体现介入CRL的本质：图像由内在和外在潜在因素（例如，关节角度、手臂/肢体几何形状、照明、背景和照相机配置），并且目的是解开和恢复可控潜在变量，即，可以通过致动直接操纵（干预）的那些。干预CRL理论表明，通过干预发生变化的变量可以被识别。在机器人技术中，通过命令各种关节的致动器并在各种控制下记录图像，这种干预自然会出现。在半合成机械手实验的实证评估表明，ROPES成功地解开潜在的生成因素与高保真度的地面真相。至关重要的是，这是通过仅利用分布变化来实现的，而不使用任何标记数据。本文还包括一个比较与基线的基础上最近提出的半监督框架。本文的结论是定位机器人位姿估计作为一个接近实际的测试平台CRL。
摘要：Causal representation learning (CRL) has emerged as a powerful unsupervised framework that (i) disentangles the latent generative factors underlying high-dimensional data, and (ii) learns the cause-and-effect interactions among the disentangled variables. Despite extensive recent advances in identifiability and some practical progress, a substantial gap remains between theory and real-world practice. This paper takes a step toward closing that gap by bringing CRL to robotics, a domain that has motivated CRL. Specifically, this paper addresses the well-defined robot pose estimation -- the recovery of position and orientation from raw images -- by introducing Robotic Pose Estimation via Score-Based CRL (ROPES). Being an unsupervised framework, ROPES embodies the essence of interventional CRL by identifying those generative factors that are actuated: images are generated by intrinsic and extrinsic latent factors (e.g., joint angles, arm/limb geometry, lighting, background, and camera configuration) and the objective is to disentangle and recover the controllable latent variables, i.e., those that can be directly manipulated (intervened upon) through actuation. Interventional CRL theory shows that variables that undergo variations via interventions can be identified. In robotics, such interventions arise naturally by commanding actuators of various joints and recording images under varied controls. Empirical evaluations in semi-synthetic manipulator experiments demonstrate that ROPES successfully disentangles latent generative factors with high fidelity with respect to the ground truth. Crucially, this is achieved by leveraging only distributional changes, without using any labeled data. The paper also includes a comparison with a baseline based on a recently proposed semi-supervised framework. This paper concludes by positioning robot pose estimation as a near-practical testbed for CRL.

【6】Contribution of task-irrelevant stimuli to drift of neural representations
标题：与任务无关的刺激对神经表象漂移的贡献
链接：https://arxiv.org/abs/2510.21588

作者：Farhad Pashakhanloo
备注：NeurIPS 2025
摘要：生物学习者和人工学习者在其一生中必然会接触到大量的数据和经验，必须不断适应、学习或选择性地忽略正在进行的输入。最近的研究结果表明，即使表现保持稳定，潜在的神经表征也会随着时间的推移而逐渐变化，这种现象称为表征漂移。研究可能导致漂移的不同数据和噪声来源对于理解神经系统的终身学习至关重要。然而，跨架构和学习规则的漂移的系统研究，以及与任务的连接，是缺失的。在这里，在一个在线学习设置中，我们将漂移描述为数据分布的函数，并特别表明，由任务无关的刺激引起的学习噪声，代理在给定的上下文中学会忽略，可以在任务相关刺激的表示中产生长期漂移。使用理论和模拟，我们证明了这一现象，无论是在基于赫布的学习- Oja的规则和相似性匹配-和随机梯度下降应用于自动编码器和监督的两层网络。我们一致地观察到，漂移率随着任务无关子空间中数据的方差和维数而增加。我们进一步表明，这产生不同的定性预测的几何形状和尺寸依赖性的漂移比高斯突触噪声所产生的。总的来说，我们的研究将刺激，任务和学习规则的结构与表征漂移联系起来，并可以为使用漂移作为揭示大脑中潜在计算的信号铺平道路。
摘要：Biological and artificial learners are inherently exposed to a stream of data and experience throughout their lifetimes and must constantly adapt to, learn from, or selectively ignore the ongoing input. Recent findings reveal that, even when the performance remains stable, the underlying neural representations can change gradually over time, a phenomenon known as representational drift. Studying the different sources of data and noise that may contribute to drift is essential for understanding lifelong learning in neural systems. However, a systematic study of drift across architectures and learning rules, and the connection to task, are missing. Here, in an online learning setup, we characterize drift as a function of data distribution, and specifically show that the learning noise induced by task-irrelevant stimuli, which the agent learns to ignore in a given context, can create long-term drift in the representation of task-relevant stimuli. Using theory and simulations, we demonstrate this phenomenon both in Hebbian-based learning -- Oja's rule and Similarity Matching -- and in stochastic gradient descent applied to autoencoders and a supervised two-layer network. We consistently observe that the drift rate increases with the variance and the dimension of the data in the task-irrelevant subspace. We further show that this yields different qualitative predictions for the geometry and dimension-dependency of drift than those arising from Gaussian synaptic noise. Overall, our study links the structure of stimuli, task, and learning rule to representational drift and could pave the way for using drift as a signal for uncovering underlying computation in the brain.

【7】Triangle Multiplication Is All You Need For Biomolecular Structure Representations
标题：三角乘法是生物分子结构表示所需的全部
链接：https://arxiv.org/abs/2510.18870

作者：Jeffrey Ouyang-Zhang, Pranav Murugan, Daniel J. Diaz, Gianluca Scarpellini, Richard Strong Bowen, Nate Gruver, Adam Klivans, Philipp Krähenbühl, Aleksandra Faust, Maruan Al-Shedivat
备注：Preprint
摘要：AlphaFold已经改变了蛋白质结构预测，但新兴的应用，如虚拟配体筛选，蛋白质组范围内的折叠和从头结合剂设计，需要大规模的预测，其中运行时间和内存成本变得令人望而却步。一个主要瓶颈在于AlphaFold3风格模型的Pairformer主干，它依赖于计算成本高昂的三角形基元（尤其是三角形注意力）来进行成对推理。我们介绍Pairmixer，一个简化的替代方案，消除三角形注意力，同时保留高阶几何推理能力，结构预测的关键。Pairmixer大大提高了计算效率，在折叠和对接基准中匹配最先进的结构预测器，在长序列上提供高达4倍的推理速度，同时将训练成本降低34%。它的效率减轻了下游应用的计算负担，例如大型蛋白质复合物建模，高通量配体和结合剂筛选以及基于幻觉的设计。例如，在BoltzDesign中，Pairmixer提供了2倍以上的采样速度，并扩展到比Pairformer的内存限制长约30%的序列。
摘要：AlphaFold has transformed protein structure prediction, but emerging applications such as virtual ligand screening, proteome-wide folding, and de novo binder design demand predictions at a massive scale, where runtime and memory costs become prohibitive. A major bottleneck lies in the Pairformer backbone of AlphaFold3-style models, which relies on computationally expensive triangular primitives-especially triangle attention-for pairwise reasoning. We introduce Pairmixer, a streamlined alternative that eliminates triangle attention while preserving higher-order geometric reasoning capabilities that are critical for structure prediction. Pairmixer substantially improves computational efficiency, matching state-of-the-art structure predictors across folding and docking benchmarks, delivering up to 4x faster inference on long sequences while reducing training cost by 34%. Its efficiency alleviates the computational burden of downstream applications such as modeling large protein complexes, high-throughput ligand and binder screening, and hallucination-based design. Within BoltzDesign, for example, Pairmixer delivers over 2x faster sampling and scales to sequences ~30% longer than the memory limits of Pairformer.

优化|敛散性(10篇)

【1】Uniform Convergence Beyond Glivenko-Cantelli
标题：超越格利文科-坎特利的一致收敛
链接：https://arxiv.org/abs/2510.21506

作者：Tanmay Devale, Pramith Devulapalli, Steve Hanneke
摘要：我们刻画了$\{0，1\}^\mathbb{N}$上的分布集合允许其均值的一致估计的条件。Vapnik和Chervonenkis（1971）的前期工作集中在使用经验均值估计的一致收敛性上，导致了被称为$P-$ Glivenko-Cantelli的原理。我们扩展了这个框架，超越了经验均值估计，并引入了均匀均值估计，也称为$UME-$ learnability，它可以捕捉到一个集合允许任何任意估计的均匀均值估计。我们工作的空间所创建的平均向量的分布的集合。对于每个分布，平均向量记录每个坐标中的期望值。我们证明了平均向量的可分性是$UME-$可学习性的充分条件。然而，我们表明，分离的平均向量是不必要的$UME-$ learnability通过构建一个集合的分布，其平均向量是不可分离的，但$UME-$ learnable使用的技术从根本上不同于我们的分离为基础的分析。最后，我们建立了$UME-$ learnable集合的可数并集也是$UME-$ learnable的，解决了Cohen et al.（2025）提出的一个猜想。
摘要：We characterize conditions under which collections of distributions on $\{0,1\}^\mathbb{N}$ admit uniform estimation of their mean. Prior work from Vapnik and Chervonenkis (1971) has focused on uniform convergence using the empirical mean estimator, leading to the principle known as $P-$ Glivenko-Cantelli. We extend this framework by moving beyond the empirical mean estimator and introducing Uniform Mean Estimability, also called $UME-$ learnability, which captures when a collection permits uniform mean estimation by any arbitrary estimator. We work on the space created by the mean vectors of the collection of distributions. For each distribution, the mean vector records the expected value in each coordinate. We show that separability of the mean vectors is a sufficient condition for $UME-$ learnability. However, we show that separability of the mean vectors is not necessary for $UME-$ learnability by constructing a collection of distributions whose mean vectors are non-separable yet $UME-$ learnable using techniques fundamentally different from those used in our separability-based analysis. Finally, we establish that countable unions of $UME-$ learnable collections are also $UME-$ learnable, solving a conjecture posed in Cohen et al. (2025).

【2】Cost-Sensitive Freeze-thaw Bayesian Optimization for Efficient Hyperparameter Tuning
标题：用于高效超参数调整的代价敏感冻融贝叶斯优化
链接：https://arxiv.org/abs/2510.21379

作者：Dong Bok Lee, Aoxuan Silvia Zhang, Byungjoo Kim, Junhyeon Park, Steven Adriaensen, Juho Lee, Sung Ju Hwang, Hae Beom Lee
备注：Published at NeurIPS 2025
摘要：In this paper, we address the problem of \emph{cost-sensitive} hyperparameter optimization (HPO) built upon freeze-thaw Bayesian optimization (BO). Specifically, we assume a scenario where users want to early-stop the HPO process when the expected performance improvement is not satisfactory with respect to the additional computational cost. Motivated by this scenario, we introduce \emph{utility} in the freeze-thaw framework, a function describing the trade-off between the cost and performance that can be estimated from the user's preference data. This utility function, combined with our novel acquisition function and stopping criterion, allows us to dynamically continue training the configuration that we expect to maximally improve the utility in the future, and also automatically stop the HPO process around the maximum utility. Further, we improve the sample efficiency of existing freeze-thaw methods with transfer learning to develop a specialized surrogate model for the cost-sensitive HPO problem. We validate our algorithm on established multi-fidelity HPO benchmarks and show that it outperforms all the previous freeze-thaw BO and transfer-BO baselines we consider, while achieving a significantly better trade-off between the cost and performance. Our code is publicly available at https://github.com/db-Lee/CFBO.

【3】Convergence of Stochastic Gradient Langevin Dynamics in the Lazy Training Regime
标题：懒惰训练制度下随机梯度Langevin动力学的收敛
链接：https://arxiv.org/abs/2510.21245

作者：Noah Oberweis, Semih Cayci
摘要：Continuous-time models provide important insights into the training dynamics of optimization algorithms in deep learning. In this work, we establish a non-asymptotic convergence analysis of stochastic gradient Langevin dynamics (SGLD), which is an It\^o stochastic differential equation (SDE) approximation of stochastic gradient descent in continuous time, in the lazy training regime. We show that, under regularity conditions on the Hessian of the loss function, SGLD with multiplicative and state-dependent noise (i) yields a non-degenerate kernel throughout the training process with high probability, and (ii) achieves exponential convergence to the empirical risk minimizer in expectation, and we establish finite-time and finite-width bounds on the optimality gap. We corroborate our theoretical findings with numerical examples in the regression setting.

【4】Online AUC Optimization Based on Second-order Surrogate Loss
标题：基于二阶代理损失的在线AUDA优化
链接：https://arxiv.org/abs/2510.21202

作者：JunRu Luo, Difei Cheng, Bo Zhang
摘要：The Area Under the Curve (AUC) is an important performance metric for classification tasks, particularly in class-imbalanced scenarios. However, minimizing the AUC presents significant challenges due to the non-convex and discontinuous nature of pairwise 0/1 losses, which are difficult to optimize, as well as the substantial memory cost of instance-wise storage, which creates bottlenecks in large-scale applications. To overcome these challenges, we propose a novel second-order surrogate loss based on the pairwise hinge loss, and develop an efficient online algorithm. Unlike conventional approaches that approximate each individual pairwise 0/1 loss term with an instance-wise surrogate function, our approach introduces a new paradigm that directly substitutes the entire aggregated pairwise loss with a surrogate loss function constructed from the first- and second-order statistics of the training data. Theoretically, while existing online AUC optimization algorithms typically achieve an $\mathcal{O}(\sqrt{T})$ regret bound, our method attains a tighter $\mathcal{O}(\ln T)$ bound. Furthermore, we extend the proposed framework to nonlinear settings through a kernel-based formulation. Extensive experiments on multiple benchmark datasets demonstrate the superior efficiency and effectiveness of the proposed second-order surrogate loss in optimizing online AUC performance.

【5】Scalable Principal-Agent Contract Design via Gradient-Based Optimization
标题：基于委托人的优化的可扩展委托代理合同设计
链接：https://arxiv.org/abs/2510.21177

作者：Tomer Galanti, Aarya Bookseller, Korok Ray
摘要：We study a bilevel \emph{max-max} optimization framework for principal-agent contract design, in which a principal chooses incentives to maximize utility while anticipating the agent's best response. This problem, central to moral hazard and contract theory, underlies applications ranging from market design to delegated portfolio management, hedge fund fee structures, and executive compensation. While linear-quadratic models such as Holmstr"om-Milgrom admit closed-form solutions, realistic environments with nonlinear utilities, stochastic dynamics, or high-dimensional actions generally do not. We introduce a generic algorithmic framework that removes this reliance on closed forms. Our method adapts modern machine learning techniques for bilevel optimization -- using implicit differentiation with conjugate gradients (CG) -- to compute hypergradients efficiently through Hessian-vector products, without ever forming or inverting Hessians. In benchmark CARA-Normal (Constant Absolute Risk Aversion with Gaussian distribution of uncertainty) environments, the approach recovers known analytical optima and converges reliably from random initialization. More broadly, because it is matrix-free, variance-reduced, and problem-agnostic, the framework extends naturally to complex nonlinear contracts where closed-form solutions are unavailable, such as sigmoidal wage schedules (logistic pay), relative-performance/tournament compensation with common shocks, multi-task contracts with vector actions and heterogeneous noise, and CARA-Poisson count models with $\mathbb{E}[X\mid a]=e^{a}$. This provides a new computational tool for contract design, enabling systematic study of models that have remained analytically intractable.

【6】On the Sample Complexity of Differentially Private Policy Optimization
标题：差异性私人政策优化的样本复杂性
链接：https://arxiv.org/abs/2510.21060

作者：Yi He, Xingyu Zhou
摘要：Policy optimization (PO) is a cornerstone of modern reinforcement learning (RL), with diverse applications spanning robotics, healthcare, and large language model training. The increasing deployment of PO in sensitive domains, however, raises significant privacy concerns. In this paper, we initiate a theoretical study of differentially private policy optimization, focusing explicitly on its sample complexity. We first formalize an appropriate definition of differential privacy (DP) tailored to PO, addressing the inherent challenges arising from on-policy learning dynamics and the subtlety involved in defining the unit of privacy. We then systematically analyze the sample complexity of widely-used PO algorithms, including policy gradient (PG), natural policy gradient (NPG) and more, under DP constraints and various settings, via a unified framework. Our theoretical results demonstrate that privacy costs can often manifest as lower-order terms in the sample complexity, while also highlighting subtle yet important observations in private PO settings. These offer valuable practical insights for privacy-preserving PO algorithms.

【7】More Than Memory Savings: Zeroth-Order Optimization Mitigates Forgetting in Continual Learning
标题：不仅仅是节省内存：零阶优化可以减少持续学习中的遗忘
链接：https://arxiv.org/abs/2510.21019

作者：Wanhao Yu, Zheng Wang, Shuteng Niu, Sen Lin, Li Yang
摘要：Zeroth-order (ZO) optimization has gained attention as a memory-efficient alternative to first-order (FO) methods, particularly in settings where gradient computation is expensive or even impractical. Beyond its memory efficiency, in this work, we investigate ZO optimization for continual learning (CL) as a novel approach to address the plasticity-stability-efficiency trilemma. Through theoretical analysis and empirical evidence, we show that ZO optimization naturally leads to flatter loss landscapes, which in turn reduce forgetting in CL. However, this stability comes at a cost of plasticity: due to its imprecise gradient estimates and slower convergence, ZO optimization tends to be less effective than FO in acquiring new task-specific knowledge, particularly under constrained training budgets. To better understand this trade-off, we conduct a holistic evaluation of ZO optimization applied to various existing CL methods. Our findings reveal that ZO optimization enhances stability but often undermines plasticity, particularly when used with learnable classifiers. Motivated by this insight, we propose ZO-FC, a simple but effective approach that applies ZO optimization to a single adapter-based PEFT module with FO optimized classifier. This design leverages the stability benefits of ZO while preserving the adaptability of FO updates with negligible memory overhead. Experiments demonstrate that ZO-FC achieves an effective balance between stability and plasticity, offering a practical and memory-efficient solution for on-device CL.

【8】MOBO-OSD: Batch Multi-Objective Bayesian Optimization via Orthogonal Search Directions
标题：MOBO-OSC：通过垂直搜索方向进行批量多目标Bayesian优化
链接：https://arxiv.org/abs/2510.20872

作者：Lam Ngo, Huong Ha, Jeffrey Chan, Hongyu Zhang
备注：Published at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要：Bayesian Optimization (BO) is a powerful tool for optimizing expensive black-box objective functions. While extensive research has been conducted on the single-objective optimization problem, the multi-objective optimization problem remains challenging. In this paper, we propose MOBO-OSD, a multi-objective Bayesian Optimization algorithm designed to generate a diverse set of Pareto optimal solutions by solving multiple constrained optimization problems, referred to as MOBO-OSD subproblems, along orthogonal search directions (OSDs) defined with respect to an approximated convex hull of individual objective minima. By employing a well-distributed set of OSDs, MOBO-OSD ensures broad coverage of the objective space, enhancing both solution diversity and hypervolume performance. To further improve the density of the set of Pareto optimal candidate solutions without requiring an excessive number of subproblems, we leverage a Pareto Front Estimation technique to generate additional solutions in the neighborhood of existing solutions. Additionally, MOBO-OSD supports batch optimization, enabling parallel function evaluations to accelerate the optimization process when resources are available. Through extensive experiments and analysis on a variety of synthetic and real-world benchmark functions with two to six objectives, we demonstrate that MOBO-OSD consistently outperforms the state-of-the-art algorithms. Our code implementation can be found at https://github.com/LamNgo1/mobo-osd.

【9】Iso-Riemannian Optimization on Learned Data Manifolds
标题：学习数据流上的等距Riemann优化
链接：https://arxiv.org/abs/2510.21033

作者：Willem Diepeveen, Melanie Weber
摘要：High-dimensional data that exhibit an intrinsic low-dimensional structure are ubiquitous in machine learning and data science. While various approaches allow for learning the corresponding data manifold from finite samples, performing downstream tasks such as optimization directly on these learned manifolds presents a significant challenge. This work introduces a principled framework for optimization on learned data manifolds using iso-Riemannian geometry. Our approach addresses key limitations of classical Riemannian optimization in this setting, specifically, that the Levi-Civita connection fails to yield constant-speed geodesics, and that geodesic convexity assumptions break down under the learned pullback constructions commonly used in practice. To overcome these challenges, we propose new notions of monotonicity and Lipschitz continuity tailored to the iso-Riemannian setting and propose iso-Riemannian descent algorithms for which we provide a detailed convergence analysis. We demonstrate the practical effectiveness of those algorithms on both synthetic and real datasets, including MNIST under a learned pullback structure. Our approach yields interpretable barycentres, improved clustering, and provably efficient solutions to inverse problems, even in high-dimensional settings. These results establish that optimization under iso-Riemannian geometry can overcome distortions inherent to learned manifold mappings.

【10】Exponential Convergence Guarantees for Iterative Markovian Fitting
标题：迭代马尔可夫拟合的指数收敛保证
链接：https://arxiv.org/abs/2510.20871

作者：Marta Gentiloni Silveri, Giovanni Conforti, Alain Durmus
摘要：The Schr\"odinger Bridge (SB) problem has become a fundamental tool in computational optimal transport and generative modeling. To address this problem, ideal methods such as Iterative Proportional Fitting and Iterative Markovian Fitting (IMF) have been proposed-alongside practical approximations like Diffusion Schr\"odinger Bridge and its Matching (DSBM) variant. While previous work have established asymptotic convergence guarantees for IMF, a quantitative, non-asymptotic understanding remains unknown. In this paper, we provide the first non-asymptotic exponential convergence guarantees for IMF under mild structural assumptions on the reference measure and marginal distributions, assuming a sufficiently large time horizon. Our results encompass two key regimes: one where the marginals are log-concave, and another where they are weakly log-concave. The analysis relies on new contraction results for the Markovian projection operator and paves the way to theoretical guarantees for DSBM.

预测|估计(7篇)

【1】Benchmarking Catastrophic Forgetting Mitigation Methods in Federated Time Series Forecasting
标题：联邦时间序列预测中的灾难性遗忘缓解方法基准
链接：https://arxiv.org/abs/2510.21491

作者：Khaled Hallak, Oudom Kem
备注：Accepted for presentation at the FLTA 2025 Conference on Federated Learning. This version corresponds to the camera-ready author manuscript
摘要：Catastrophic forgetting (CF) poses a persistent challenge in continual learning (CL), especially within federated learning (FL) environments characterized by non-i.i.d. time series data. While existing research has largely focused on classification tasks in vision domains, the regression-based forecasting setting prevalent in IoT and edge applications remains underexplored. In this paper, we present the first benchmarking framework tailored to investigate CF in federated continual time series forecasting. Using the Beijing Multi-site Air Quality dataset across 12 decentralized clients, we systematically evaluate several CF mitigation strategies, including Replay, Elastic Weight Consolidation, Learning without Forgetting, and Synaptic Intelligence. Key contributions include: (i) introducing a new benchmark for CF in time series FL, (ii) conducting a comprehensive comparative analysis of state-of-the-art methods, and (iii) releasing a reproducible open-source framework. This work provides essential tools and insights for advancing continual learning in federated time-series forecasting systems.

【2】Robust Yield Curve Estimation for Mortgage Bonds Using Neural Networks
标题：使用神经网络的抵押债券稳健收益率曲线估计
链接：https://arxiv.org/abs/2510.21347

作者：Sina Molavipour, Alireza M. Javid, Cassie Ye, Björn Löfdahl, Mikhail Nechaev
摘要：Robust yield curve estimation is crucial in fixed-income markets for accurate instrument pricing, effective risk management, and informed trading strategies. Traditional approaches, including the bootstrapping method and parametric Nelson-Siegel models, often struggle with overfitting or instability issues, especially when underlying bonds are sparse, bond prices are volatile, or contain hard-to-remove noise. In this paper, we propose a neural networkbased framework for robust yield curve estimation tailored to small mortgage bond markets. Our model estimates the yield curve independently for each day and introduces a new loss function to enforce smoothness and stability, addressing challenges associated with limited and noisy data. Empirical results on Swedish mortgage bonds demonstrate that our approach delivers more robust and stable yield curve estimates compared to existing methods such as Nelson-Siegel-Svensson (NSS) and Kernel-Ridge (KR). Furthermore, the framework allows for the integration of domain-specific constraints, such as alignment with risk-free benchmarks, enabling practitioners to balance the trade-off between smoothness and accuracy according to their needs.

【3】Data as a Lever: A Neighbouring Datasets Perspective on Predictive Multiplicity
标题：数据作为杠杆：预测多重性的邻近数据集视角
链接：https://arxiv.org/abs/2510.21303

作者：Prakhar Ganesh, Hsiang Hsu, Golnoosh Farnadi
摘要：Multiplicity -- the existence of distinct models with comparable performance -- has received growing attention in recent years. While prior work has largely emphasized modelling choices, the critical role of data in shaping multiplicity has been comparatively overlooked. In this work, we introduce a neighbouring datasets framework to examine the most granular case: the impact of a single-data-point difference on multiplicity. Our analysis yields a seemingly counterintuitive finding: neighbouring datasets with greater inter-class distribution overlap exhibit lower multiplicity. This reversal of conventional expectations arises from a shared Rashomon parameter, and we substantiate it with rigorous proofs. Building on this foundation, we extend our framework to two practical domains: active learning and data imputation. For each, we establish natural extensions of the neighbouring datasets perspective, conduct the first systematic study of multiplicity in existing algorithms, and finally, propose novel multiplicity-aware methods, namely, multiplicity-aware data acquisition strategies for active learning and multiplicity-aware data imputation techniques.

【4】A visual big data system for the prediction of weather-related variables: Jordan-Spain case study
标题：用于预测天气相关变量的可视化大数据系统：约旦-西班牙案例研究
链接：https://arxiv.org/abs/2510.21176

作者：Shadi Aljawarneh, Juan A. Lara, Muneer Bani Yassein
备注：None
摘要：The Meteorology is a field where huge amounts of data are generated, mainly collected by sensors at weather stations, where different variables can be measured. Those data have some particularities such as high volume and dimensionality, the frequent existence of missing values in some stations, and the high correlation between collected variables. In this regard, it is crucial to make use of Big Data and Data Mining techniques to deal with those data and extract useful knowledge from them that can be used, for instance, to predict weather phenomena. In this paper, we propose a visual big data system that is designed to deal with high amounts of weather-related data and lets the user analyze those data to perform predictive tasks over the considered variables (temperature and rainfall). The proposed system collects open data and loads them onto a local NoSQL database fusing them at different levels of temporal and spatial aggregation in order to perform a predictive analysis using univariate and multivariate approaches as well as forecasting based on training data from neighbor stations in cases with high rates of missing values. The system has been assessed in terms of usability and predictive performance, obtaining an overall normalized mean squared error value of 0.00013, and an overall directional symmetry value of nearly 0.84. Our system has been rated positively by a group of experts in the area (all aspects of the system except graphic desing were rated 3 or above in a 1-5 scale). The promising preliminary results obtained demonstrate the validity of our system and invite us to keep working on this area.

【5】SolarBoost: Distributed Photovoltaic Power Forecasting Amid Time-varying Grid Capacity
标题：SolarBoost：时变电网容量下的分布式太阳能电力预测
链接：https://arxiv.org/abs/2510.21129

作者：Linyuan Geng, Linxiao Yang, Xinyue Gu, Liang Sun
摘要：This paper presents SolarBoost, a novel approach for forecasting power output in distributed photovoltaic (DPV) systems. While existing centralized photovoltaic (CPV) methods are able to precisely model output dependencies due to uniformity, it is difficult to apply such techniques to DPV systems, as DPVs face challenges such as missing grid-level data, temporal shifts in installed capacity, geographic variability, and panel diversity. SolarBoost overcomes these challenges by modeling aggregated power output as a composite of output from small grids, where each grid output is modeled using a unit output function multiplied by its capacity. This approach decouples the homogeneous unit output function from dynamic capacity for accurate prediction. Efficient algorithms over an upper-bound approximation are proposed to overcome computational bottlenecks in loss functions. We demonstrate the superiority of grid-level modeling via theoretical analysis and experiments. SolarBoost has been validated through deployment across various cities in China, significantly reducing potential losses and provides valuable insights for the operation of power grids. The code for this work is available at https://github.com/DAMO-DI-ML/SolarBoost.

【6】xMem: A CPU-Based Approach for Accurate Estimation of GPU Memory in Deep Learning Training Workloads
标题：xMem：一种基于MCU的方法，用于在深度学习训练工作负载中准确估计图形处理器内存
链接：https://arxiv.org/abs/2510.21048

作者：Jiabo Shi, Dimitrios Pezaros, Yehia Elkhatib
摘要：The global scarcity of GPUs necessitates more sophisticated strategies for Deep Learning jobs in shared cluster environments. Accurate estimation of how much GPU memory a job will require is fundamental to enabling advanced scheduling and GPU sharing, which helps prevent out-of-memory (OOM) errors and resource underutilization. However, existing estimation methods have limitations. Approaches relying on static analysis or historical data with machine learning often fail to accurately capture runtime dynamics. Furthermore, direct GPU analysis consumes scarce resources, and some techniques require intrusive code modifications. Thus, the key challenge lies in precisely estimating dynamic memory requirements, including memory allocator nuances, without consuming GPU resources and non-intrusive code changes. To address this challenge, we propose xMem, a novel framework that leverages CPU-only dynamic analysis to accurately estimate peak GPU memory requirements a priori. We conducted a thorough evaluation of xMem against state-of-the-art solutions using workloads from 25 different models, including architectures like Convolutional Neural Networks and Transformers. The analysis of 5209 runs, which includes ANOVA and Monte Carlo results, highlights xMem's benefits: it decreases the median relative error by 91% and significantly reduces the probability of estimation failure as safe OOM thresholds by 75%, meaning that the estimated value can often be used directly without causing OOM. Ultimately, these improvements lead to a 368% increase in memory conservation potential over current solutions.

【7】Neural Mutual Information Estimation with Vector Copulas
标题：基于向量Copula的神经互信息估计
链接：https://arxiv.org/abs/2510.20968

作者：Yanzhi Chen, Zijing Ou, Adrian Weller, Michael U. Gutmann
摘要：Estimating mutual information (MI) is a fundamental task in data science and machine learning. Existing estimators mainly rely on either highly flexible models (e.g., neural networks), which require large amounts of data, or overly simplified models (e.g., Gaussian copula), which fail to capture complex distributions. Drawing upon recent vector copula theory, we propose a principled interpolation between these two extremes to achieve a better trade-off between complexity and capacity. Experiments on state-of-the-art synthetic benchmarks and real-world data with diverse modalities demonstrate the advantages of the proposed estimator.

其他神经网络|深度学习|模型|建模(22篇)

【1】Visual Diffusion Models are Geometric Solvers
标题：视觉扩散模型是几何求解器
链接：https://arxiv.org/abs/2510.21697

作者：Nir Goren, Shai Yehezkel, Omer Dahary, Andrey Voynov, Or Patashnik, Daniel Cohen-Or
备注：Project page: this https URL
摘要：In this paper we show that visual diffusion models can serve as effective geometric solvers: they can directly reason about geometric problems by working in pixel space. We first demonstrate this on the Inscribed Square Problem, a long-standing problem in geometry that asks whether every Jordan curve contains four points forming a square. We then extend the approach to two other well-known hard geometric problems: the Steiner Tree Problem and the Simple Polygon Problem. Our method treats each problem instance as an image and trains a standard visual diffusion model that transforms Gaussian noise into an image representing a valid approximate solution that closely matches the exact one. The model learns to transform noisy geometric structures into correct configurations, effectively recasting geometric reasoning as image generation. Unlike prior work that necessitates specialized architectures and domain-specific adaptations when applying diffusion to parametric geometric representations, we employ a standard visual diffusion model that operates on the visual representation of the problem. This simplicity highlights a surprising bridge between generative modeling and geometric problem solving. Beyond the specific problems studied here, our results point toward a broader paradigm: operating in image space provides a general and practical framework for approximating notoriously hard problems, and opens the door to tackling a far wider class of challenging geometric tasks.

【2】REVE: A Foundation Model for EEG -- Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects
标题：REVE：脑电的基础模型--通过对25，000名受试者进行大规模预训练来适应任何设置
链接：https://arxiv.org/abs/2510.21585

作者：Yassine El Ouahidi, Jonathan Lys, Philipp Thölke, Nicolas Farrugia, Bastien Pasdeloup, Vincent Gripon, Karim Jerbi, Giulia Lioi
备注：Code available at: this https URL
摘要：Foundation models have transformed AI by reducing reliance on task-specific data through large-scale pretraining. While successful in language and vision, their adoption in EEG has lagged due to the heterogeneity of public datasets, which are collected under varying protocols, devices, and electrode configurations. Existing EEG foundation models struggle to generalize across these variations, often restricting pretraining to a single setup, resulting in suboptimal performance, in particular under linear probing. We present REVE (Representation for EEG with Versatile Embeddings), a pretrained model explicitly designed to generalize across diverse EEG signals. REVE introduces a novel 4D positional encoding scheme that enables it to process signals of arbitrary length and electrode arrangement. Using a masked autoencoding objective, we pretrain REVE on over 60,000 hours of EEG data from 92 datasets spanning 25,000 subjects, representing the largest EEG pretraining effort to date. REVE achieves state-of-the-art results on 10 downstream EEG tasks, including motor imagery classification, seizure detection, sleep staging, cognitive load estimation, and emotion recognition. With little to no fine-tuning, it demonstrates strong generalization, and nuanced spatio-temporal modeling. We release code, pretrained weights, and tutorials to support standardized EEG research and accelerate progress in clinical neuroscience.

【3】A Unified Model for Multi-Task Drone Routing in Post-Disaster Road Assessment
标题：灾后道路评估中多任务无人机路由统一模型
链接：https://arxiv.org/abs/2510.21525

作者：Huatian Gong, Jiuh-Biing Sheu, Zheng Wang, Xiaoguang Yang, Ran Yan
备注：34 pages, 8 figures,9 tables
摘要：Post-disaster road assessment (PDRA) is essential for emergency response, enabling rapid evaluation of infrastructure conditions and efficient allocation of resources. Although drones provide a flexible and effective tool for PDRA, routing them in large-scale networks remains challenging. Traditional optimization methods scale poorly and demand domain expertise, while existing deep reinforcement learning (DRL) approaches adopt a single-task paradigm, requiring separate models for each problem variant and lacking adaptability to evolving operational needs. This study proposes a unified model (UM) for drone routing that simultaneously addresses eight PDRA variants. By training a single neural network across multiple problem configurations, UM captures shared structural knowledge while adapting to variant-specific constraints through a modern transformer encoder-decoder architecture. A lightweight adapter mechanism further enables efficient finetuning to unseen attributes without retraining, enhancing deployment flexibility in dynamic disaster scenarios. Extensive experiments demonstrate that the UM reduces training time and parameters by a factor of eight compared with training separate models, while consistently outperforming single-task DRL methods by 6--14\% and traditional optimization approaches by 24--82\% in terms of solution quality (total collected information value). The model achieves real-time solutions (1--10 seconds) across networks of up to 1,000 nodes, with robustness confirmed through sensitivity analyses. Moreover, finetuning experiments show that unseen attributes can be effectively incorporated with minimal cost while retaining high solution quality. The proposed UM advances neural combinatorial optimization for time-critical applications, offering a computationally efficient, high-quality, and adaptable solution for drone-based PDRA.

【4】Causality Meets Locality: Provably Generalizable and Scalable Policy Learning for Networked Systems
标题：因果关系满足局部性：网络系统的可证明推广和可扩展的政策学习
链接：https://arxiv.org/abs/2510.21427

作者：Hao Liang, Shuqing Shi, Yudi Zhang, Biwei Huang, Yali Du
备注：NeurIPS 2025 (Spotlight)
摘要：Large-scale networked systems, such as traffic, power, and wireless grids, challenge reinforcement-learning agents with both scale and environment shifts. To address these challenges, we propose GSAC (Generalizable and Scalable Actor-Critic), a framework that couples causal representation learning with meta actor-critic learning to achieve both scalability and domain generalization. Each agent first learns a sparse local causal mask that provably identifies the minimal neighborhood variables influencing its dynamics, yielding exponentially tight approximately compact representations (ACRs) of state and domain factors. These ACRs bound the error of truncating value functions to $\kappa$-hop neighborhoods, enabling efficient learning on graphs. A meta actor-critic then trains a shared policy across multiple source domains while conditioning on the compact domain factors; at test time, a few trajectories suffice to estimate the new domain factor and deploy the adapted policy. We establish finite-sample guarantees on causal recovery, actor-critic convergence, and adaptation gap, and show that GSAC adapts rapidly and significantly outperforms learning-from-scratch and conventional adaptation baselines.

【5】A Rapid Physics-Informed Machine Learning Framework Based on Extreme Learning Machine for Inverse Stefan Problems
标题：基于极限学习机的快速物理信息机器学习框架，用于反Stefan问题
链接：https://arxiv.org/abs/2510.21426

作者：Pei-Zhi Zhuang, Ming-Yue Yang, Fei Ren, Hong-Ya Yue, He Yang
摘要：The inverse Stefan problem, as a typical phase-change problem with moving boundaries, finds extensive applications in science and engineering. Recent years have seen the applications of physics-informed neural networks (PINNs) to solving Stefan problems, yet they still exhibit shortcomings in hyperparameter dependency, training efficiency, and prediction accuracy. To address this, this paper develops a physics-informed extreme learning machine (PIELM), a rapid physics-informed learning method framework for inverse Stefan problems. PIELM replaces conventional deep neural networks with an extreme learning machine network. The input weights are fixed in the PIELM framework, and the output weights are determined by optimizing a loss vector of physical laws composed by initial and boundary conditions and governing partial differential equations (PDEs). Then, solving inverse Stefan problems is transformed into finding the Moore-Penrose generalized inverse by the least squares method. Case studies show that the PIELM can increase the prediction accuracy by 3-7 order of magnitude in terms of the relative L2 error, and meanwhile saving more than 94% training time, compared to conventional PINNs.

【6】FairImagen: Post-Processing for Bias Mitigation in Text-to-Image Models
标题：FairImagen：文本到图像模型中缓解偏差的后处理
链接：https://arxiv.org/abs/2510.21363

作者：Zihao Fu, Ryan Brown, Shun Shao, Kai Rawal, Eoin Delaney, Chris Russell
备注：Neurips 2025
摘要：Text-to-image diffusion models, such as Stable Diffusion, have demonstrated remarkable capabilities in generating high-quality and diverse images from natural language prompts. However, recent studies reveal that these models often replicate and amplify societal biases, particularly along demographic attributes like gender and race. In this paper, we introduce FairImagen (https://github.com/fuzihaofzh/FairImagen), a post-hoc debiasing framework that operates on prompt embeddings to mitigate such biases without retraining or modifying the underlying diffusion model. Our method integrates Fair Principal Component Analysis to project CLIP-based input embeddings into a subspace that minimizes group-specific information while preserving semantic content. We further enhance debiasing effectiveness through empirical noise injection and propose a unified cross-demographic projection method that enables simultaneous debiasing across multiple demographic attributes. Extensive experiments across gender, race, and intersectional settings demonstrate that FairImagen significantly improves fairness with a moderate trade-off in image quality and prompt fidelity. Our framework outperforms existing post-hoc methods and offers a simple, scalable, and model-agnostic solution for equitable text-to-image generation.

【7】$α$-LoRA: Effective Fine-Tuning via Base Model Rescaling
标题：$a $-LoRA：通过基本模型重新缩放进行有效的微调
链接：https://arxiv.org/abs/2510.21345

作者：Aymane El Firdoussi, El Mahdi Chayti, Mohamed El Amine Seddik, Martin Jaggi
摘要：Fine-tuning has proven to be highly effective in adapting pre-trained models to perform better on new desired tasks with minimal data samples. Among the most widely used approaches are reparameterization methods, which update a target module by augmenting its frozen weight matrix with an additional trainable weight matrix. The most prominent example is Low Rank Adaption (LoRA), which gained significant attention in recent years. In this paper, we introduce a new class of reparameterization methods for transfer learning, designed to enhance the generalization ability of fine-tuned models. We establish the effectiveness of our approach in a high-dimensional binary classification setting using tools from Random Matrix Theory, and further validate our theoretical findings through more realistic experiments, such as fine-tuning LLMs.

【8】Seemingly Redundant Modules Enhance Robust Odor Learning in Fruit Flies
标题：看似冗余的模块增强果蝇的稳健气味学习
链接：https://arxiv.org/abs/2510.21315

作者：Haiyang Li, Liao Yu, Qiang Yu, Yunliang Zang
备注：10page,Accepted by NeurIPS
摘要：Biological circuits have evolved to incorporate multiple modules that perform similar functions. In the fly olfactory circuit, both lateral inhibition (LI) and neuronal spike frequency adaptation (SFA) are thought to enhance pattern separation for odor learning. However, it remains unclear whether these mechanisms play redundant or distinct roles in this process. In this study, we present a computational model of the fly olfactory circuit to investigate odor discrimination under varying noise conditions that simulate complex environments. Our results show that LI primarily enhances odor discrimination in low- and medium-noise scenarios, but this benefit diminishes and may reverse under higher-noise conditions. In contrast, SFA consistently improves discrimination across all noise levels. LI is preferentially engaged in low- and medium-noise environments, whereas SFA dominates in high-noise settings. When combined, these two sparsification mechanisms enable optimal discrimination performance. This work demonstrates that seemingly redundant modules in biological circuits can, in fact, be essential for achieving optimal learning in complex contexts.

【9】Additive Models Explained: A Computational Complexity Approach
标题：加法模型解释：计算复杂性方法
链接：https://arxiv.org/abs/2510.21292

作者：Shahaf Bassan, Michal Moshkovitz, Guy Katz
备注：To appear in NeurIPS 2025
摘要：Generalized Additive Models (GAMs) are commonly considered *interpretable* within the ML community, as their structure makes the relationship between inputs and outputs relatively understandable. Therefore, it may seem natural to hypothesize that obtaining meaningful explanations for GAMs could be performed efficiently and would not be computationally infeasible. In this work, we challenge this hypothesis by analyzing the *computational complexity* of generating different explanations for various forms of GAMs across multiple contexts. Our analysis reveals a surprisingly diverse landscape of both positive and negative complexity outcomes. Particularly, under standard complexity assumptions such as P!=NP, we establish several key findings: (1) in stark contrast to many other common ML models, the complexity of generating explanations for GAMs is heavily influenced by the structure of the input space; (2) the complexity of explaining GAMs varies significantly with the types of component models used - but interestingly, these differences only emerge under specific input domain settings; (3) significant complexity distinctions appear for obtaining explanations in regression tasks versus classification tasks in GAMs; and (4) expressing complex models like neural networks additively (e.g., as neural additive models) can make them easier to explain, though interestingly, this benefit appears only for certain explanation methods and input domains. Collectively, these results shed light on the feasibility of computing diverse explanations for GAMs, offering a rigorous theoretical picture of the conditions under which such computations are possible or provably hard.

【10】Unified Implementations of Recurrent Neural Networks in Multiple Deep Learning Frameworks
标题：多个深度学习框架中循环神经网络的统一实现
链接：https://arxiv.org/abs/2510.21252

作者：Francesco Martinuzzi
摘要：Recurrent neural networks (RNNs) are a cornerstone of sequence modeling across various scientific and industrial applications. Owing to their versatility, numerous RNN variants have been proposed over the past decade, aiming to improve the modeling of long-term dependencies and to address challenges such as vanishing and exploding gradients. However, no central library is available to test these variations, and reimplementing diverse architectures can be time-consuming and error-prone, limiting reproducibility and exploration. Here, we introduce three open-source libraries in Julia and Python that centralize numerous recurrent cell implementations and higher-level recurrent architectures. torchrecurrent, RecurrentLayers.jl, and LuxRecurrentLayers.jl offer a consistent framework for constructing and extending RNN models, providing built-in mechanisms for customization and experimentation. All packages are available under the MIT license and actively maintained on GitHub.

【11】How Hard is it to Confuse a World Model?
标题：混淆一个世界模型有多难？
链接：https://arxiv.org/abs/2510.21232

作者：Waris Radji (Scool, CRIStAL), Odalric-Ambrym Maillard (Scool, CRIStAL)
摘要：In reinforcement learning (RL) theory, the concept of most confusing instances is central to establishing regret lower bounds, that is, the minimal exploration needed to solve a problem. Given a reference model and its optimal policy, a most confusing instance is the statistically closest alternative model that makes a suboptimal policy optimal. While this concept is well-studied in multi-armed bandits and ergodic tabular Markov decision processes, constructing such instances remains an open question in the general case. In this paper, we formalize this problem for neural network world models as a constrained optimization: finding a modified model that is statistically close to the reference one, while producing divergent performance between optimal and suboptimal policies. We propose an adversarial training procedure to solve this problem and conduct an empirical study across world models of varying quality. Our results suggest that the degree of achievable confusion correlates with uncertainty in the approximate model, which may inform theoretically-grounded exploration strategies for deep model-based RL.

【12】Model Merging with Functional Dual Anchors
标题：模型与功能性双通道的融合
链接：https://arxiv.org/abs/2510.21223

作者：Kexuan Shi, Yandong Wen, Weiyang Liu
备注：Technical report (23 pages, 15 figures, project page: this https URL)
摘要：Model merging is an efficient post-training strategy for integrating knowledge from multiple finetuned checkpoints of a shared foundation model. Existing methods operate in the parameter space, combining task vectors to mitigate conflicts, but remain constrained by parameter inconsistencies. We propose Functional Dual Anchors (FDAs), a framework that instead models the input-representation space. FDAs are synthetic inputs whose induced gradients align with task vectors, capturing task-specific functional shifts relative to the pretrained model. This perspective bridges joint multi-task training and post-hoc merging, offering both robustness and flexibility. We further introduce a principled initialization scheme and show that FDAs are complementary to parameter-space model merging. Comprehensive experiments demonstrate the effectiveness of FDAs in model merging.

【13】Mitra: Mixed Synthetic Priors for Enhancing Tabular Foundation Models
标题：Mitra：增强表格基础模型的混合合成先验
链接：https://arxiv.org/abs/2510.21204

作者：Xiyuan Zhang, Danielle C. Maddix, Junming Yin, Nick Erickson, Abdul Fatir Ansari, Boran Han, Shuai Zhang, Leman Akoglu, Christos Faloutsos, Michael W. Mahoney, Cuixiong Hu, Huzefa Rangwala, George Karypis, Bernie Wang
备注：NeurIPS 2025. We released both classifier (autogluon/mitra-classifier) and regressor (autogluon/mitra-regressor) model weights on HuggingFace
摘要：Since the seminal work of TabPFN, research on tabular foundation models (TFMs) based on in-context learning (ICL) has challenged long-standing paradigms in machine learning. Without seeing any real-world data, models pretrained on purely synthetic datasets generalize remarkably well across diverse datasets, often using only a moderate number of in-context examples. This shifts the focus in tabular machine learning from model architecture design to the design of synthetic datasets, or, more precisely, to the prior distributions that generate them. Yet the guiding principles for prior design remain poorly understood. This work marks the first attempt to address the gap. We systematically investigate and identify key properties of synthetic priors that allow pretrained TFMs to generalize well. Based on these insights, we introduce Mitra, a TFM trained on a curated mixture of synthetic priors selected for their diversity, distinctiveness, and performance on real-world tabular data. Mitra consistently outperforms state-of-the-art TFMs, such as TabPFNv2 and TabICL, across both classification and regression benchmarks, with better sample efficiency.

【14】PLAN: Proactive Low-Rank Allocation for Continual Learning
标题：警告：积极主动地为持续学习分配低级别
链接：https://arxiv.org/abs/2510.21188

作者：Xiequn Wang, Zhan Zhuang, Yu Zhang
备注：accepted by ICCV 2025
摘要：Continual learning (CL) requires models to continuously adapt to new tasks without forgetting past knowledge. In this work, we propose \underline{P}roactive \underline{L}ow-rank \underline{A}llocatio\underline{N} (PLAN), a framework that extends Low-Rank Adaptation (LoRA) to enable efficient and interference-aware fine-tuning of large pre-trained models in CL settings. PLAN proactively manages the allocation of task-specific subspaces by introducing orthogonal basis vectors for each task and optimizing them through a perturbation-based strategy that minimizes conflicts with previously learned parameters. Furthermore, PLAN incorporates a novel selection mechanism that identifies and assigns basis vectors with minimal sensitivity to interference, reducing the risk of degrading past knowledge while maintaining efficient adaptation to new tasks. Empirical results on standard CL benchmarks demonstrate that PLAN consistently outperforms existing methods, establishing a new state-of-the-art for continual learning with foundation models.

【15】TURBOTEST: Learning When Less is Enough through Early Termination of Internet Speed Tests
标题：TurboTest：通过提前终止网速测试来学习何时更少就足够了
链接：https://arxiv.org/abs/2510.21141

作者：Haarika Manda, Manshi Sagar, Yogesh, Kartikay Singh, Cindy Zhao, Tarun Mangla, Phillipa Gill, Elizabeth Belding, Arpit Gupta
摘要：Internet speed tests are indispensable for users, ISPs, and policymakers, but their static flooding-based design imposes growing costs: a single high-speed test can transfer hundreds of megabytes, and collectively, platforms like Ookla, M-Lab, and Fast.com generate petabytes of traffic each month. Reducing this burden requires deciding when a test can be stopped early without sacrificing accuracy. We frame this as an optimal stopping problem and show that existing heuristics-static thresholds, BBR pipe-full signals, or throughput stability rules from Fast.com and FastBTS-capture only a narrow portion of the achievable accuracy-savings trade-off. This paper introduces TURBOTEST, a systematic framework for speed test termination that sits atop existing platforms. The key idea is to decouple throughput prediction (Stage 1) from test termination (Stage 2): Stage 1 trains a regressor to estimate final throughput from partial measurements, while Stage 2 trains a classifier to decide when sufficient evidence has accumulated to stop. Leveraging richer transport-level features (RTT, retransmissions, congestion window) alongside throughput, TURBOTEST exposes a single tunable parameter for accuracy tolerance and includes a fallback mechanism for high-variability cases. Evaluation on 173,000 M-Lab NDT speed tests (2024-2025) shows that TURBOTEST achieves nearly 2-4x higher data savings than an approach based on BBR signals while reducing median error. These results demonstrate that adaptive ML-based termination can deliver accurate, efficient, and deployable speed tests at scale.

【16】Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data
标题：一维可分离数据的浅ReLU网络梯度流下的神经崩溃
链接：https://arxiv.org/abs/2510.21078

作者：Hancheng Min, Zhihui Zhu, René Vidal
备注：NeurIPS 2025
摘要：Among many mysteries behind the success of deep networks lies the exceptional discriminative power of their learned representations as manifested by the intriguing Neural Collapse (NC) phenomenon, where simple feature structures emerge at the last layer of a trained neural network. Prior works on the theoretical understandings of NC have focused on analyzing the optimization landscape of matrix-factorization-like problems by considering the last-layer features as unconstrained free optimization variables and showing that their global minima exhibit NC. In this paper, we show that gradient flow on a two-layer ReLU network for classifying orthogonally separable data provably exhibits NC, thereby advancing prior results in two ways: First, we relax the assumption of unconstrained features, showing the effect of data structure and nonlinear activations on NC characterizations. Second, we reveal the role of the implicit bias of the training dynamics in facilitating the emergence of NC.

【17】From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD
标题：从信息到生成指数：学习率导致新元的阶段转变
链接：https://arxiv.org/abs/2510.21020

作者：Konstantinos Christopher Tsiolis, Alireza Mousavi-Hosseini, Murat A. Erdogdu
备注：NeurIPS 2025
摘要：To understand feature learning dynamics in neural networks, recent theoretical works have focused on gradient-based learning of Gaussian single-index models, where the label is a nonlinear function of a latent one-dimensional projection of the input. While the sample complexity of online SGD is determined by the information exponent of the link function, recent works improved this by performing multiple gradient steps on the same sample with different learning rates -- yielding a non-correlational update rule -- and instead are limited by the (potentially much smaller) generative exponent. However, this picture is only valid when these learning rates are sufficiently large. In this paper, we characterize the relationship between learning rate(s) and sample complexity for a broad class of gradient-based algorithms that encapsulates both correlational and non-correlational updates. We demonstrate that, in certain cases, there is a phase transition from an "information exponent regime" with small learning rate to a "generative exponent regime" with large learning rate. Our framework covers prior analyses of one-pass SGD and SGD with batch reuse, while also introducing a new layer-wise training algorithm that leverages a two-timescales approach (via different learning rates for each layer) to go beyond correlational queries without reusing samples or modifying the loss from squared error. Our theoretical study demonstrates that the choice of learning rate is as important as the design of the algorithm in achieving statistical and computational efficiency.

【18】AL-CoLe: Augmented Lagrangian for Constrained Learning
标题：AL-CoLe：用于约束学习的增广拉格朗日量
链接：https://arxiv.org/abs/2510.20995

作者：Ignacio Boero, Ignacio Hounie, Alejandro Ribeiro
摘要：Despite the non-convexity of most modern machine learning parameterizations, Lagrangian duality has become a popular tool for addressing constrained learning problems. We revisit Augmented Lagrangian methods, which aim to mitigate the duality gap in non-convex settings while requiring only minimal modifications, and have remained comparably unexplored in constrained learning settings. We establish strong duality results under mild conditions, prove convergence of dual ascent algorithms to feasible and optimal primal solutions, and provide PAC-style generalization guarantees. Finally, we demonstrate its effectiveness on fairness constrained classification tasks.

【19】NeuroPilot: A Realtime Brain-Computer Interface system to enhance concentration of students in online learning
标题：NeuPilot：实时脑机接口系统，可提高学生在线学习的注意力
链接：https://arxiv.org/abs/2510.20958

作者：Asif Islam, Farhan Ishtiaque, Md. Muhyminul Haque, Kaled Masukur Rahman, Ravi Vaidyanathan, Khondaker A. Mamun
备注：11 pages, 5 figures and 3 tables
摘要：Prevalence of online learning poses a vital challenge in real-time monitoring of students' concentration. Traditional methods such as questionnaire assessments require manual interventions and webcam-based monitoring fails to provide accurate insights into learners' mental focus as they are deceived by mere screen fixation without cognitive engagement. Existing BCI-based approaches lack real-time validation and evaluation procedures. To address these limitations, a Brain-Computer Interface (BCI) system is developed using a non-invasive Electroencephalogram (EEG) headband, FocusCalm, to record brainwave activity under attentive and non-attentive states. 20 minutes of data were collected from each of 20 participants watching a pre-recorded educational video. The data validation employed a novel intra-video questionnaire assessment. Subsequently, collected signals were segmented (sliding window), filtered (butterworth bandpass), and cleaned (removal of high-amplitude and EOG artifacts such as eye blinks). Time, frequency, wavelet and statistical features have been extracted, followed by recursive feature elimination (RFE) with Support vector machines (SVMs) to classify attention and non-attention states. The leave-one-subject-out (LOSO) cross-validation accuracy has been tested to be 88.77%. The system provides feedback alerts upon non-attention state detection and keeps focus profile logs. A pilot study was conducted to evaluate the effectiveness of real-time feedback. Five participants completed a 10-minute session consisting of a 5-minute baseline phase without feedback followed by a 5-minute feedback phase, during which alerts were issued if participants remained non-attentive for approximately 8 consecutive seconds. A paired t-test (t = 5.73, p = 0.007) indicated a statistically significant improvement in concentration during the feedback phase.

【20】Learning from Interval Targets
标题：从间隔目标中学习
链接：https://arxiv.org/abs/2510.20925

作者：Rattana Pukdee, Ziqi Ke, Chirag Gupta
备注：NeurIPS 2025
摘要：We study the problem of regression with interval targets, where only upper and lower bounds on target values are available in the form of intervals. This problem arises when the exact target label is expensive or impossible to obtain, due to inherent uncertainties. In the absence of exact targets, traditional regression loss functions cannot be used. First, we study the methodology of using a loss functions compatible with interval targets, for which we establish non-asymptotic generalization bounds based on smoothness of the hypothesis class that significantly relaxing prior assumptions of realizability and small ambiguity degree. Second, we propose a novel min-max learning formulation: minimize against the worst-case (maximized) target labels within the provided intervals. The maximization problem in the latter is non-convex, but we show that good performance can be achieved with the incorporation of smoothness constraints. Finally, we perform extensive experiments on real-world datasets and show that our methods achieve state-of-the-art performance.

【21】Information Theoretic Learning for Diffusion Models with Warm Start
标题：热启动扩散模型的信息论学习
链接：https://arxiv.org/abs/2510.20903

作者：Yirong Shen, Lu Gan, Cong Ling
备注：NeurIPS 2025
摘要：Generative models that maximize model likelihood have gained traction in many practical settings. Among them, perturbation based approaches underpin many strong likelihood estimation models, yet they often face slow convergence and limited theoretical understanding. In this paper, we derive a tighter likelihood bound for noise driven models to improve both the accuracy and efficiency of maximum likelihood learning. Our key insight extends the classical KL divergence Fisher information relationship to arbitrary noise perturbations, going beyond the Gaussian assumption and enabling structured noise distributions. This formulation allows flexible use of randomized noise distributions that naturally account for sensor artifacts, quantization effects, and data distribution smoothing, while remaining compatible with standard diffusion training. Treating the diffusion process as a Gaussian channel, we further express the mismatched entropy between data and model, showing that the proposed objective upper bounds the negative log-likelihood (NLL). In experiments, our models achieve competitive NLL on CIFAR-10 and SOTA results on ImageNet across multiple resolutions, all without data augmentation, and the framework extends naturally to discrete data.

【22】Multimodal Negative Learning
标题：多模式负性学习
链接：https://arxiv.org/abs/2510.20877

作者：Baoquan Gong, Xiyuan Gao, Pengfei Zhu, Qinghua Hu, Bing Cao
备注：Published in NeurIPS 2025
摘要：Multimodal learning systems often encounter challenges related to modality imbalance, where a dominant modality may overshadow others, thereby hindering the learning of weak modalities. Conventional approaches often force weak modalities to align with dominant ones in "Learning to be (the same)" (Positive Learning), which risks suppressing the unique information inherent in the weak modalities. To address this challenge, we offer a new learning paradigm: "Learning Not to be" (Negative Learning). Instead of enhancing weak modalities' target-class predictions, the dominant modalities dynamically guide the weak modality to suppress non-target classes. This stabilizes the decision space and preserves modality-specific information, allowing weak modalities to preserve unique information without being over-aligned. We proceed to reveal multimodal learning from a robustness perspective and theoretically derive the Multimodal Negative Learning (MNL) framework, which introduces a dynamic guidance mechanism tailored for negative learning. Our method provably tightens the robustness lower bound of multimodal learning by increasing the Unimodal Confidence Margin (UCoM) and reduces the empirical error of weak modalities, particularly under noisy and imbalanced scenarios. Extensive experiments across multiple benchmarks demonstrate the effectiveness and generalizability of our approach against competing methods. The code will be available at https://github.com/BaoquanGong/Multimodal-Negative-Learning.git.

其他(33篇)

【1】Equivariance by Contrast: Identifiable Equivariant Embeddings from Unlabeled Finite Group Actions
标题：对比等变：来自未标记有限群动作的可识别等变嵌入
链接：https://arxiv.org/abs/2510.21706

作者：Tobias Schmidt, Steffen Schneider, Matthias Bethge
备注：Accepted at NeurIPS 2025. The last two authors contributed equally. Code is available at this https URL
摘要：We propose Equivariance by Contrast (EbC) to learn equivariant embeddings from observation pairs $(\mathbf{y}, g \cdot \mathbf{y})$, where $g$ is drawn from a finite group acting on the data. Our method jointly learns a latent space and a group representation in which group actions correspond to invertible linear maps -- without relying on group-specific inductive biases. We validate our approach on the infinite dSprites dataset with structured transformations defined by the finite group $G:= (R_m \times \mathbb{Z}_n \times \mathbb{Z}_n)$, combining discrete rotations and periodic translations. The resulting embeddings exhibit high-fidelity equivariance, with group operations faithfully reproduced in latent space. On synthetic data, we further validate the approach on the non-abelian orthogonal group $O(n)$ and the general linear group $GL(n)$. We also provide a theoretical proof for identifiability. While broad evaluation across diverse group types on real-world data remains future work, our results constitute the first successful demonstration of general-purpose encoder-only equivariant learning from group action observations alone, including non-trivial non-abelian groups and a product group motivated by modeling affine equivariances in computer vision.

【2】Mechanistic Interpretability for Neural TSP Solvers
标题：神经求解器的机制解释性
链接：https://arxiv.org/abs/2510.21693

作者：Reuben Narad, Leonard Boussioux, Michael Wagner
摘要：Neural networks have advanced combinatorial optimization, with Transformer-based solvers achieving near-optimal solutions on the Traveling Salesman Problem (TSP) in milliseconds. However, these models operate as black boxes, providing no insight into the geometric patterns they learn or the heuristics they employ during tour construction. We address this opacity by applying sparse autoencoders (SAEs), a mechanistic interpretability technique, to a Transformer-based TSP solver, representing the first application of activation-based interpretability methods to operations research models. We train a pointer network with reinforcement learning on 100-node instances, then fit an SAE to the encoder's residual stream to discover an overcomplete dictionary of interpretable features. Our analysis reveals that the solver naturally develops features mirroring fundamental TSP concepts: boundary detectors that activate on convex-hull nodes, cluster-sensitive features responding to locally dense regions, and separator features encoding geometric partitions. These findings provide the first model-internal account of what neural TSP solvers compute before node selection, demonstrate that geometric structure emerges without explicit supervision, and suggest pathways toward transparent hybrid systems that combine neural efficiency with algorithmic interpretability. Interactive feature explorer: https://reubennarad.github.io/TSP_interp

【3】Generalised Flow Maps for Few-Step Generative Modelling on Riemannian Manifolds
标题：Riemannian Manifics上少步生成建模的广义流图
链接：https://arxiv.org/abs/2510.21608

作者：Oscar Davis, Michael S. Albergo, Nicholas M. Boffi, Michael M. Bronstein, Avishek Joey Bose
备注：Under review
摘要：Geometric data and purpose-built generative models on them have become ubiquitous in high-impact deep learning application domains, ranging from protein backbone generation and computational chemistry to geospatial data. Current geometric generative models remain computationally expensive at inference -- requiring many steps of complex numerical simulation -- as they are derived from dynamical measure transport frameworks such as diffusion and flow-matching on Riemannian manifolds. In this paper, we propose Generalised Flow Maps (GFM), a new class of few-step generative models that generalises the Flow Map framework in Euclidean spaces to arbitrary Riemannian manifolds. We instantiate GFMs with three self-distillation-based training methods: Generalised Lagrangian Flow Maps, Generalised Eulerian Flow Maps, and Generalised Progressive Flow Maps. We theoretically show that GFMs, under specific design decisions, unify and elevate existing Euclidean few-step generative models, such as consistency models, shortcut models, and meanflows, to the Riemannian setting. We benchmark GFMs against other geometric generative models on a suite of geometric datasets, including geospatial data, RNA torsion angles, and hyperbolic manifolds, and achieve state-of-the-art sample quality for single- and few-step evaluations, and superior or competitive log-likelihoods using the implicit probability flow.

【4】Cost Minimization for Space-Air-Ground Integrated Multi-Access Edge Computing Systems
标题：空、空、地一体化多址边缘计算系统的成本最小化
链接：https://arxiv.org/abs/2510.21541

作者：Weihong Qin, Aimin Wang, Geng Sun, Zemin Sun, Jiacheng Wang, Dusit Niyato, Dong In Kim, Zhu Han
摘要：Space-air-ground integrated multi-access edge computing (SAGIN-MEC) provides a promising solution for the rapidly developing low-altitude economy (LAE) to deliver flexible and wide-area computing services. However, fully realizing the potential of SAGIN-MEC in the LAE presents significant challenges, including coordinating decisions across heterogeneous nodes with different roles, modeling complex factors such as mobility and network variability, and handling real-time decision-making under partially observable environment with hybrid variables. To address these challenges, we first present a hierarchical SAGIN-MEC architecture that enables the coordination between user devices (UDs), uncrewed aerial vehicles (UAVs), and satellites. Then, we formulate a UD cost minimization optimization problem (UCMOP) to minimize the UD cost by jointly optimizing the task offloading ratio, UAV trajectory planning, computing resource allocation, and UD association. We show that the UCMOP is an NP-hard problem. To overcome this challenge, we propose a multi-agent deep deterministic policy gradient (MADDPG)-convex optimization and coalitional game (MADDPG-COCG) algorithm. Specifically, we employ the MADDPG algorithm to optimize the continuous temporal decisions for heterogeneous nodes in the partially observable SAGIN-MEC system. Moreover, we propose a convex optimization and coalitional game (COCG) method to enhance the conventional MADDPG by deterministically handling the hybrid and varying-dimensional decisions. Simulation results demonstrate that the proposed MADDPG-COCG algorithm significantly enhances the user-centric performances in terms of the aggregated UD cost, task completion delay, and UD energy consumption, with a slight increase in UAV energy consumption, compared to the benchmark algorithms. Moreover, the MADDPG-COCG algorithm shows superior convergence stability and scalability.

【5】Excision Score: Evaluating Edits with Surgical Precision
标题：切除评分：以手术精度评估编辑
链接：https://arxiv.org/abs/2510.21537

作者：Nikolai Gruzinov, Ksenia Sycheva, Earl T. Barr, Alex Bezzubov
备注：Code is available at this https URL
摘要：Many tasks revolve around editing a document, whether code or text. We formulate the revision similarity problem to unify a wide range of machine learning evaluation problems whose goal is to assess a revision to an existing document. We observe that revisions usually change only a small portion of an existing document, so the existing document and its immediate revisions share a majority of their content. We formulate five adequacy criteria for revision similarity measures, designed to align them with human judgement. We show that popular pairwise measures, like BLEU, fail to meet these criteria, because their scores are dominated by the shared content. They report high similarity between two revisions when humans would assess them as quite different. This is a fundamental flaw we address. We propose a novel static measure, Excision Score (ES), which computes longest common subsequence (LCS) to remove content shared by an existing document with the ground truth and predicted revisions, before comparing only the remaining divergent regions. This is analogous to a surgeon creating a sterile field to focus on the work area. We use approximation to speed the standard cubic LCS computation to quadratic. In code-editing evaluation, where static measures are often used as a cheap proxy for passing tests, we demonstrate that ES surpasses existing measures. When aligned with test execution on HumanEvalFix, ES improves over its nearest competitor, SARI, by 12% Pearson correlation and by >21% over standard measures like BLEU. The key criterion is invariance to shared context; when we perturb HumanEvalFix with increased shared context, ES' improvement over SARI increases to 20% and >30% over standard measures. ES also handles other corner cases that other measures do not, such as correctly aligning moved code blocks, and appropriately rewarding matching insertions or deletions.

【6】Probe-based Fine-tuning for Reducing Toxicity
标题：基于探针的微调以降低毒性
链接：https://arxiv.org/abs/2510.21531

作者：Jan Wehner, Mario Fritz
摘要：Probes trained on model activations can detect undesirable behaviors like deception or biases that are difficult to identify from outputs alone. This makes them useful detectors to identify misbehavior. Furthermore, they are also valuable training signals, since they not only reward outputs, but also good internal processes for arriving at that output. However, training against interpretability tools raises a fundamental concern: when a monitor becomes a training target, it may cease to be reliable (Goodhart's Law). We propose two methods for training against probes based on Supervised Fine-tuning and Direct Preference Optimization. We conduct an initial exploration of these methods in a testbed for reducing toxicity and evaluate the amount by which probe accuracy drops when training against them. To retain the accuracy of probe-detectors after training, we attempt (1) to train against an ensemble of probes, (2) retain held-out probes that aren't used for training, and (3) retrain new probes after training. First, probe-based preference optimization unexpectedly preserves probe detectability better than classifier-based methods, suggesting the preference learning objective incentivizes maintaining rather than obfuscating relevant representations. Second, probe diversity provides minimal practical benefit - simply retraining probes after optimization recovers high detection accuracy. Our findings suggest probe-based training can be viable for certain alignment methods, though probe ensembles are largely unnecessary when retraining is feasible.

【7】Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
标题：缓解基准故障模式的风险管理：BenchRisk
链接：https://arxiv.org/abs/2510.21460

作者：Sean McGregor, Victor Lu, Vassil Tashev, Armstrong Foundjem, Aishwarya Ramasethu, Sadegh AlMahdi Kazemi Zarkouei, Chris Knotz, Kongtao Chen, Alicia Parrish, Anka Reuel, Heather Frase
备注：19 pages, 7 figures, to be published in the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
摘要：Large language model (LLM) benchmarks inform LLM use decisions (e.g., "is this LLM safe to deploy for my use case and context?"). However, benchmarks may be rendered unreliable by various failure modes that impact benchmark bias, variance, coverage, or people's capacity to understand benchmark evidence. Using the National Institute of Standards and Technology's risk management process as a foundation, this research iteratively analyzed 26 popular benchmarks, identifying 57 potential failure modes and 196 corresponding mitigation strategies. The mitigations reduce failure likelihood and/or severity, providing a frame for evaluating "benchmark risk," which is scored to provide a metaevaluation benchmark: BenchRisk. Higher scores indicate that benchmark users are less likely to reach an incorrect or unsupported conclusion about an LLM. All 26 scored benchmarks present significant risk within one or more of the five scored dimensions (comprehensiveness, intelligibility, consistency, correctness, and longevity), which points to important open research directions for the field of LLM benchmarking. The BenchRisk workflow allows for comparison between benchmarks; as an open-source tool, it also facilitates the identification and sharing of risks and their mitigations.

【8】Scalable Neural Incentive Design with Parameterized Mean-Field Approximation
标题：采用参数化平均场逼近的可扩展激励神经设计
链接：https://arxiv.org/abs/2510.21442

作者：Nathan Corecco, Batuhan Yardim, Vinzenz Thoma, Zebang Shen, Niao He
备注：52 pages, to appear at NeurIPS 2025
摘要：Designing incentives for a multi-agent system to induce a desirable Nash equilibrium is both a crucial and challenging problem appearing in many decision-making domains, especially for a large number of agents $N$. Under the exchangeability assumption, we formalize this incentive design (ID) problem as a parameterized mean-field game (PMFG), aiming to reduce complexity via an infinite-population limit. We first show that when dynamics and rewards are Lipschitz, the finite-$N$ ID objective is approximated by the PMFG at rate $\mathscr{O}(\frac{1}{\sqrt{N}})$. Moreover, beyond the Lipschitz-continuous setting, we prove the same $\mathscr{O}(\frac{1}{\sqrt{N}})$ decay for the important special case of sequential auctions, despite discontinuities in dynamics, through a tailored auction-specific analysis. Built on our novel approximation results, we further introduce our Adjoint Mean-Field Incentive Design (AMID) algorithm, which uses explicit differentiation of iterated equilibrium operators to compute gradients efficiently. By uniting approximation bounds with optimization guarantees, AMID delivers a powerful, scalable algorithmic tool for many-agent (large $N$) ID. Across diverse auction settings, the proposed AMID method substantially increases revenue over first-price formats and outperforms existing benchmark methods.

【9】Self-diffusion for Solving Inverse Problems
标题：求解反问题的自扩散
链接：https://arxiv.org/abs/2510.21417

作者：Guanxiong Luo, Shoujin Huang, Yanlong Yang
摘要：We propose self-diffusion, a novel framework for solving inverse problems without relying on pretrained generative models. Traditional diffusion-based approaches require training a model on a clean dataset to learn to reverse the forward noising process. This model is then used to sample clean solutions -- corresponding to posterior sampling from a Bayesian perspective -- that are consistent with the observed data under a specific task. In contrast, self-diffusion introduces a self-contained iterative process that alternates between noising and denoising steps to progressively refine its estimate of the solution. At each step of self-diffusion, noise is added to the current estimate, and a self-denoiser, which is a single untrained convolutional network randomly initialized from scratch, is continuously trained for certain iterations via a data fidelity loss to predict the solution from the noisy estimate. Essentially, self-diffusion exploits the spectral bias of neural networks and modulates it through a scheduled noise process. Without relying on pretrained score functions or external denoisers, this approach still remains adaptive to arbitrary forward operators and noisy observations, making it highly flexible and broadly applicable. We demonstrate the effectiveness of our approach on a variety of linear inverse problems, showing that self-diffusion achieves competitive or superior performance compared to other methods.

【10】Compositional Monte Carlo Tree Diffusion for Extendable Planning
标题：可扩展规划的组合蒙特卡洛树扩散
链接：https://arxiv.org/abs/2510.21361

作者：Jaesik Yoon, Hyeonseo Cho, Sungjin Ahn
备注：24 pages, 4 figures, NeurIPS 25 Spotlight
摘要：Monte Carlo Tree Diffusion (MCTD) integrates diffusion models with structured tree search to enable effective trajectory exploration through stepwise reasoning. However, MCTD remains fundamentally limited by training trajectory lengths. While periodic replanning allows plan concatenation for longer plan generation, the planning process remains locally confined, as MCTD searches within individual trajectories without access to global context. We propose Compositional Monte Carlo Tree Diffusion (C-MCTD), a framework that elevates planning from individual trajectory optimization to reasoning over complete plan compositions. C-MCTD introduces three complementary components: (1) Online Composer, which performs globally-aware planning by searching across entire plan compositions; (2) Distributed Composer, which reduces search complexity through parallel exploration from multiple starting points; and (3) Preplan Composer, which accelerates inference by leveraging cached plan graphs.

【11】Weak-to-Strong Generalization under Distribution Shifts
标题：分布转移下的弱到强概括
链接：https://arxiv.org/abs/2510.21332

作者：Myeongho Jeon, Jan Sobotka, Suhwan Choi, Maria Brbić
备注：Accepted to NeurIPS 2025
摘要：As future superhuman models become increasingly complex, accurately supervising their behavior may exceed human capabilities. Recent works have demonstrated that in such scenarios, weak models can effectively supervise strong models, a phenomenon known as weak-to-strong generalization. However, we find that naive weak-to-strong generalization fails under distribution shifts, often leading to worse performance of the strong model than its weak supervisors. To address this, we propose RAVEN, a robust weak-to-strong generalization framework that dynamically learns the optimal combinations of weak models in addition to parameters of the strong model. We demonstrate the effectiveness of RAVEN on image classification, text classification, and preference alignment tasks. RAVEN outperforms alternative baselines by over 30% on out-of-distribution tasks while matching or surpassing existing methods on in-distribution tasks. Moreover, our results show that RAVEN assigns higher weights to more accurate weak models, demonstrating its ability to automatically identify trustworthy supervision.

【12】SCORENF: Score-based Normalizing Flows for Sampling Unnormalized distributions
标题：SCORENF：用于采样非正规化分布的基于分数的正规化流程
链接：https://arxiv.org/abs/2510.21330

作者：Vikas Kanaujia, Vipul Arora
备注：\c{opyright} 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
摘要：Unnormalized probability distributions are central to modeling complex physical systems across various scientific domains. Traditional sampling methods, such as Markov Chain Monte Carlo (MCMC), often suffer from slow convergence, critical slowing down, poor mode mixing, and high autocorrelation. In contrast, likelihood-based and adversarial machine learning models, though effective, are heavily data-driven, requiring large datasets and often encountering mode covering and mode collapse. In this work, we propose ScoreNF, a score-based learning framework built on the Normalizing Flow (NF) architecture, integrated with an Independent Metropolis-Hastings (IMH) module, enabling efficient and unbiased sampling from unnormalized target distributions. We show that ScoreNF maintains high performance even with small training ensembles, thereby reducing reliance on computationally expensive MCMC-generated training data. We also present a method for assessing mode-covering and mode-collapse behaviours. We validate our method on synthetic 2D distributions (MOG-4 and MOG-8) and the high-dimensional $\phi^4$ lattice field theory distribution, demonstrating its effectiveness for sampling tasks.

【13】VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set
标题：VL-AE：用统一的概念集解释和增强视觉与语言的一致性
链接：https://arxiv.org/abs/2510.21323

作者：Shufan Shen, Junshu Sun, Qingming Huang, Shuhui Wang
备注：Accepted by NeurIPS 2025
摘要：The alignment of vision-language representations endows current Vision-Language Models (VLMs) with strong multi-modal reasoning capabilities. However, the interpretability of the alignment component remains uninvestigated due to the difficulty in mapping the semantics of multi-modal representations into a unified concept set. To address this problem, we propose VL-SAE, a sparse autoencoder that encodes vision-language representations into its hidden activations. Each neuron in its hidden layer correlates to a concept represented by semantically similar images and texts, thereby interpreting these representations with a unified concept set. To establish the neuron-concept correlation, we encourage semantically similar representations to exhibit consistent neuron activations during self-supervised training. First, to measure the semantic similarity of multi-modal representations, we perform their alignment in an explicit form based on cosine similarity. Second, we construct the VL-SAE with a distance-based encoder and two modality-specific decoders to ensure the activation consistency of semantically similar representations. Experiments across multiple VLMs (e.g., CLIP, LLaVA) demonstrate the superior capability of VL-SAE in interpreting and enhancing the vision-language alignment. For interpretation, the alignment between vision and language representations can be understood by comparing their semantics with concepts. For enhancement, the alignment can be strengthened by aligning vision-language representations at the concept level, contributing to performance improvements in downstream tasks, including zero-shot image classification and hallucination elimination. Codes are available at https://github.com/ssfgunner/VL-SAE.

【14】Revisiting Social Welfare in Bandits: UCB is (Nearly) All You Need
标题：重温强盗中的社会福利：UCB（几乎）就是你所需要的一切
链接：https://arxiv.org/abs/2510.21312

作者：Dhruv Sarkar, Nishant Pandey, Sayak Ray Chowdhury
摘要：Regret in stochastic multi-armed bandits traditionally measures the difference between the highest reward and either the arithmetic mean of accumulated rewards or the final reward. These conventional metrics often fail to address fairness among agents receiving rewards, particularly in settings where rewards are distributed across a population, such as patients in clinical trials. To address this, a recent body of work has introduced Nash regret, which evaluates performance via the geometric mean of accumulated rewards, aligning with the Nash social welfare function known for satisfying fairness axioms. To minimize Nash regret, existing approaches require specialized algorithm designs and strong assumptions, such as multiplicative concentration inequalities and bounded, non-negative rewards, making them unsuitable for even Gaussian reward distributions. We demonstrate that an initial uniform exploration phase followed by a standard Upper Confidence Bound (UCB) algorithm achieves near-optimal Nash regret, while relying only on additive Hoeffding bounds, and naturally extending to sub-Gaussian rewards. Furthermore, we generalize the algorithm to a broad class of fairness metrics called the $p$-mean regret, proving (nearly) optimal regret bounds uniformly across all $p$ values. This is in contrast to prior work, which made extremely restrictive assumptions on the bandit instances and even then achieved suboptimal regret bounds.

【15】On the flow matching interpretability
标题：关于流匹配可解释性
链接：https://arxiv.org/abs/2510.21210

作者：Francesco Pivi, Simone Gazza, Davide Evangelista, Roberto Amadini, Maurizio Gabbrielli
摘要：Generative models based on flow matching have demonstrated remarkable success in various domains, yet they suffer from a fundamental limitation: the lack of interpretability in their intermediate generation steps. In fact these models learn to transform noise into data through a series of vector field updates, however the meaning of each step remains opaque. We address this problem by proposing a general framework constraining each flow step to be sampled from a known physical distribution. Flow trajectories are mapped to (and constrained to traverse) the equilibrium states of the simulated physical process. We implement this approach through the 2D Ising model in such a way that flow steps become thermal equilibrium points along a parametric cooling schedule. Our proposed architecture includes an encoder that maps discrete Ising configurations into a continuous latent space, a flow-matching network that performs temperature-driven diffusion, and a projector that returns to discrete Ising states while preserving physical constraints. We validate this framework across multiple lattice sizes, showing that it preserves physical fidelity while outperforming Monte Carlo generation in speed as the lattice size increases. In contrast with standard flow matching, each vector field represents a meaningful stepwise transition in the 2D Ising model's latent space. This demonstrates that embedding physical semantics into generative flows transforms opaque neural trajectories into interpretable physical processes.

【16】Gen-Review: A Large-scale Dataset of AI-Generated (and Human-written) Peer Reviews
标题：Gen-Revolution：人工智能生成（和人类编写）同行评审的大规模数据集
链接：https://arxiv.org/abs/2510.21192

作者：Luca Demetrio, Giovanni Apruzzese, Kathrin Grosse, Pavel Laskov, Emil Lupu, Vera Rimmer, Philine Widmer
摘要：How does the progressive embracement of Large Language Models (LLMs) affect scientific peer reviewing? This multifaceted question is fundamental to the effectiveness -- as well as to the integrity -- of the scientific process. Recent evidence suggests that LLMs may have already been tacitly used in peer reviewing, e.g., at the 2024 International Conference of Learning Representations (ICLR). Furthermore, some efforts have been undertaken in an attempt to explicitly integrate LLMs in peer reviewing by various editorial boards (including that of ICLR'25). To fully understand the utility and the implications of LLMs' deployment for scientific reviewing, a comprehensive relevant dataset is strongly desirable. Despite some previous research on this topic, such dataset has been lacking so far. We fill in this gap by presenting GenReview, the hitherto largest dataset containing LLM-written reviews. Our dataset includes 81K reviews generated for all submissions to the 2018--2025 editions of the ICLR by providing the LLM with three independent prompts: a negative, a positive, and a neutral one. GenReview is also linked to the respective papers and their original reviews, thereby enabling a broad range of investigations. To illustrate the value of GenReview, we explore a sample of intriguing research questions, namely: if LLMs exhibit bias in reviewing (they do); if LLM-written reviews can be automatically detected (so far, they can); if LLMs can rigorously follow reviewing instructions (not always) and whether LLM-provided ratings align with decisions on paper acceptance or rejection (holds true only for accepted papers). GenReview can be accessed at the following link: https://anonymous.4open.science/r/gen_review.

【17】Cloud-Fog-Edge Collaborative Computing for Sequential MIoT Workflow: A Two-Tier DDPG-Based Scheduling Framework
标题：用于顺序MIoT工作流的云-雾-边缘协同计算：基于DDPG的两层调度框架
链接：https://arxiv.org/abs/2510.21135

作者：Yuhao Fu (1), Yinghao Zhang (2), Yalin Liu (1), Bishenghui Tao (1), Junhong Ruan (3) ((1) Hong Kong Metropolitan University, Hong Kong, China, (2) Guangdong Key Lab of AI and Multi-Modal Data Processing, Beijing Normal-Hong Kong Baptist University, (3) Hong Kong University of Science and Technology, Hong Kong, China)
备注：14 pages, 3 figures, 2 tables
摘要：The Medical Internet of Things (MIoT) demands stringent end-to-end latency guarantees for sequential healthcare workflows deployed over heterogeneous cloud-fog-edge infrastructures. Scheduling these sequential workflows to minimize makespan is an NP-hard problem. To tackle this challenge, we propose a Two-tier DDPG-based scheduling framework that decomposes the scheduling decision into a hierarchical process: a global controller performs layer selection (edge, fog, or cloud), while specialized local controllers handle node assignment within the chosen layer. The primary optimization objective is the minimization of the workflow makespan. Experiments results validate our approach, demonstrating increasingly superior performance over baselines as workflow complexity rises. This trend highlights the frameworks ability to learn effective long-term strategies, which is critical for complex, large-scale MIoT scheduling scenarios.

【18】A Unified Approach to Submodular Maximization Under Noise
标题：噪音下子模最大化的统一方法
链接：https://arxiv.org/abs/2510.21128

作者：Kshipra Bhawalkar, Yang Cai, Zhe Feng, Christopher Liaw, Tao Lin
备注：Accepted by NeurIPS 2025
摘要：We consider the problem of maximizing a submodular function with access to a noisy value oracle for the function instead of an exact value oracle. Similar to prior work, we assume that the noisy oracle is persistent in that multiple calls to the oracle for a specific set always return the same value. In this model, Hassidim and Singer (2017) design a $(1-1/e)$-approximation algorithm for monotone submodular maximization subject to a cardinality constraint, and Huang et al (2022) design a $(1-1/e)/2$-approximation algorithm for monotone submodular maximization subject to any arbitrary matroid constraint. In this paper, we design a meta-algorithm that allows us to take any "robust" algorithm for exact submodular maximization as a black box and transform it into an algorithm for the noisy setting while retaining the approximation guarantee. By using the meta-algorithm with the measured continuous greedy algorithm, we obtain a $(1-1/e)$-approximation (resp. $1/e$-approximation) for monotone (resp. non-monotone) submodular maximization subject to a matroid constraint under noise. Furthermore, by using the meta-algorithm with the double greedy algorithm, we obtain a $1/2$-approximation for unconstrained (non-monotone) submodular maximization under noise.

【19】Distributionally Robust Feature Selection
标题：分布稳健的特征选择
链接：https://arxiv.org/abs/2510.21113

作者：Maitreyi Swaroop, Tamar Krishnamurti, Bryan Wilder
备注：Accepted at NeurIPS 2025
摘要：We study the problem of selecting limited features to observe such that models trained on them can perform well simultaneously across multiple subpopulations. This problem has applications in settings where collecting each feature is costly, e.g. requiring adding survey questions or physical sensors, and we must be able to use the selected features to create high-quality downstream models for different populations. Our method frames the problem as a continuous relaxation of traditional variable selection using a noising mechanism, without requiring backpropagation through model training processes. By optimizing over the variance of a Bayes-optimal predictor, we develop a model-agnostic framework that balances overall performance of downstream prediction across populations. We validate our approach through experiments on both synthetic datasets and real-world data.

【20】Soft Instruction De-escalation Defense
标题：软指令降级防御
链接：https://arxiv.org/abs/2510.21057

作者：Nils Philipp Walter, Chawin Sitawarin, Jamie Hayes, David Stutz, Ilia Shumailov
摘要：Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an external environment; this makes them susceptible to prompt injections when dealing with untrusted data. To overcome this limitation, we propose SIC (Soft Instruction Control)-a simple yet effective iterative prompt sanitization loop designed for tool-augmented LLM agents. Our method repeatedly inspects incoming data for instructions that could compromise agent behavior. If such content is found, the malicious content is rewritten, masked, or removed, and the result is re-evaluated. The process continues until the input is clean or a maximum iteration limit is reached; if imperative instruction-like content remains, the agent halts to ensure security. By allowing multiple passes, our approach acknowledges that individual rewrites may fail but enables the system to catch and correct missed injections in later steps. Although immediately useful, worst-case analysis shows that SIC is not infallible; strong adversary can still get a 15% ASR by embedding non-imperative workflows. This nonetheless raises the bar.

【21】Online Multi-Class Selection with Group Fairness Guarantee
标题：在线多班级选拔，团体公平保障
链接：https://arxiv.org/abs/2510.21055

作者：Faraz Zargari, Hossein Nekouyan, Lyndon Hallett, Bo Sun, Xiaoqi Tan
摘要：We study the online multi-class selection problem with group fairness guarantees, where limited resources must be allocated to sequentially arriving agents. Our work addresses two key limitations in the existing literature. First, we introduce a novel lossless rounding scheme that ensures the integral algorithm achieves the same expected performance as any fractional solution. Second, we explicitly address the challenges introduced by agents who belong to multiple classes. To this end, we develop a randomized algorithm based on a relax-and-round framework. The algorithm first computes a fractional solution using a resource reservation approach -- referred to as the set-aside mechanism -- to enforce fairness across classes. The subsequent rounding step preserves these fairness guarantees without degrading performance. Additionally, we propose a learning-augmented variant that incorporates untrusted machine-learned predictions to better balance fairness and efficiency in practical settings.

【22】Elementary, My Dear Watson: Non-Invasive Neural Keyword Spotting in the LibriBrain Dataset
标题：亲爱的沃森，初级：LibriBrain数据集中的非侵入性神经关键词发现
链接：https://arxiv.org/abs/2510.21038

作者：Gereon Elvers, Gilad Landau, Oiwi Parker Jones
备注：16 pages, 7 figures, 6 tables
摘要：Non-invasive brain-computer interfaces (BCIs) are beginning to benefit from large, public benchmarks. However, current benchmarks target relatively simple, foundational tasks like Speech Detection and Phoneme Classification, while application-ready results on tasks like Brain-to-Text remain elusive. We propose Keyword Spotting (KWS) as a practically applicable, privacy-aware intermediate task. Using the deep 52-hour, within-subject LibriBrain corpus, we provide standardized train/validation/test splits for reproducible benchmarking, and adopt an evaluation protocol tailored to extreme class imbalance. Concretely, we use area under the precision-recall curve (AUPRC) as a robust evaluation metric, complemented by false alarms per hour (FA/h) at fixed recall to capture user-facing trade-offs. To simplify deployment and further experimentation within the research community, we are releasing an updated version of the pnpl library with word-level dataloaders and Colab-ready tutorials. As an initial reference model, we present a compact 1-D Conv/ResNet baseline with focal loss and top-k pooling that is trainable on a single consumer-class GPU. The reference model achieves approximately 13x the permutation baseline AUPRC on held-out sessions, demonstrating the viability of the task. Exploratory analyses reveal: (i) predictable within-subject scaling - performance improves log-linearly with more training hours - and (ii) the existence of word-level factors (frequency and duration) that systematically modulate detectability.

【23】JSTprove: Pioneering Verifiable AI for a Trustless Future
标题：JSTprove：开创可验证人工智能，打造无可信未来
链接：https://arxiv.org/abs/2510.21024

作者：Jonathan Gold, Tristan Freiberg, Haruna Isah, Shirin Shahabi
备注：13 pages, 8 figures, and 4 tables
摘要：The integration of machine learning (ML) systems into critical industries such as healthcare, finance, and cybersecurity has transformed decision-making processes, but it also brings new challenges around trust, security, and accountability. As AI systems become more ubiquitous, ensuring the transparency and correctness of AI-driven decisions is crucial, especially when they have direct consequences on privacy, security, or fairness. Verifiable AI, powered by Zero-Knowledge Machine Learning (zkML), offers a robust solution to these challenges. zkML enables the verification of AI model inferences without exposing sensitive data, providing an essential layer of trust and privacy. However, traditional zkML systems typically require deep cryptographic expertise, placing them beyond the reach of most ML engineers. In this paper, we introduce JSTprove, a specialized zkML toolkit, built on Polyhedra Network's Expander backend, to enable AI developers and ML engineers to generate and verify proofs of AI inference. JSTprove provides an end-to-end verifiable AI inference pipeline that hides cryptographic complexity behind a simple command-line interface while exposing auditable artifacts for reproducibility. We present the design, innovations, and real-world use cases of JSTprove as well as our blueprints and tooling to encourage community review and extension. JSTprove therefore serves both as a usable zkML product for current engineering needs and as a reproducible foundation for future research and production deployments of verifiable AI.

【24】SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing
标题：SutureBot：自主端到端缝合的精确框架和基准
链接：https://arxiv.org/abs/2510.20965

作者：Jesse Haworth, Juo-Tung Chen, Nigel Nelson, Ji Woong Kim, Masoud Moghani, Chelsea Finn, Axel Krieger
备注：10 pages, 5 figures, 4 tables, NeurIPS 2025
摘要：Robotic suturing is a prototypical long-horizon dexterous manipulation task, requiring coordinated needle grasping, precise tissue penetration, and secure knot tying. Despite numerous efforts toward end-to-end autonomy, a fully autonomous suturing pipeline has yet to be demonstrated on physical hardware. We introduce SutureBot: an autonomous suturing benchmark on the da Vinci Research Kit (dVRK), spanning needle pickup, tissue insertion, and knot tying. To ensure repeatability, we release a high-fidelity dataset comprising 1,890 suturing demonstrations. Furthermore, we propose a goal-conditioned framework that explicitly optimizes insertion-point precision, improving targeting accuracy by 59\%-74\% over a task-only baseline. To establish this task as a benchmark for dexterous imitation learning, we evaluate state-of-the-art vision-language-action (VLA) models, including $\pi_0$, GR00T N1, OpenVLA-OFT, and multitask ACT, each augmented with a high-level task-prediction policy. Autonomous suturing is a key milestone toward achieving robotic autonomy in surgery. These contributions support reproducible evaluation and development of precision-focused, long-horizon dexterous manipulation policies necessary for end-to-end suturing. Dataset is available at: https://huggingface.co/datasets/jchen396/suturebot

【25】Global Dynamics of Heavy-Tailed SGDs in Nonconvex Loss Landscape: Characterization and Control
标题：非凸损失格局中重尾SGD的全球动力学：特征化和控制
链接：https://arxiv.org/abs/2510.20905

作者：Xingyu Wang, Chang-Han Rhee
备注：60 pages, 2 figures, 4 tables
摘要：Stochastic gradient descent (SGD) and its variants enable modern artificial intelligence. However, theoretical understanding lags far behind their empirical success. It is widely believed that SGD has a curious ability to avoid sharp local minima in the loss landscape, which are associated with poor generalization. To unravel this mystery and further enhance such capability of SGDs, it is imperative to go beyond the traditional local convergence analysis and obtain a comprehensive understanding of SGDs' global dynamics. In this paper, we develop a set of technical machinery based on the recent large deviations and metastability analysis in Wang and Rhee (2023) and obtain sharp characterization of the global dynamics of heavy-tailed SGDs. In particular, we reveal a fascinating phenomenon in deep learning: by injecting and then truncating heavy-tailed noises during the training phase, SGD can almost completely avoid sharp minima and achieve better generalization performance for the test data. Simulation and deep learning experiments confirm our theoretical prediction that heavy-tailed SGD with gradient clipping finds local minima with a more flat geometry and achieves better generalization performance.

【26】HA-RAG: Hotness-Aware RAG Acceleration via Mixed Precision and Data Placement
标题：HA-RAG：通过混合精度和数据放置实现热度感知RAG加速
链接：https://arxiv.org/abs/2510.20878

作者：Danying Ge, Jianhua Gao, Yixue Yang, Weixing Ji
备注：13 pages,16 figures,2 tables
摘要：Retrieval-Augmented Generation (RAG) improves model output accuracy by leveraging external knowledge bases, serving as an effective solution to address hallucination issues and knowledge-update delays in Large Language Models (LLMs). However, the introduction of external knowledge bases presents RAG with challenges in long-context processing, significantly increasing memory consumption and inference latency. Existing research accelerates inference by precomputing Key and Value (KV) of the knowledge base and loading them on-demand during inference. Based on the access frequency of different KV chunks within the external knowledge base, this paper proposes a hotness-aware RAG (HA-RAG) inference optimization system. First, leveraging the numerical distribution of KV chunks, we introduce a hotness-aware mixed-precision compressing and loading method to reduce disk I/O and memory access overhead. Second, we design a hotness-aware data placement strategy that prioritizes storing frequently accessed KV chunks in high-speed memory to improve data access efficiency. Experimental results demonstrate that, compared with TurboRAG, the proposed HA-RAG achieves an average speedup of 2.10x and maximum speedup of 10.49x in Time-To-First-Token (TTFT) with negligible accuracy loss.

【27】Multimodal Datasets with Controllable Mutual Information
标题：具有可控互信息的多峰数据集
链接：https://arxiv.org/abs/2510.21686

作者：Raheem Karim Hashmani, Garrett W. Merz, Helen Qu, Mariel Pettee, Kyle Cranmer
备注：15 pages, 4 figures, 1 table. Our code is publicly available at this https URL
摘要：We introduce a framework for generating highly multimodal datasets with explicitly calculable mutual information between modalities. This enables the construction of benchmark datasets that provide a novel testbed for systematic studies of mutual information estimators and multimodal self-supervised learning techniques. Our framework constructs realistic datasets with known mutual information using a flow-based generative model and a structured causal framework for generating correlated latent variables.

【28】HollowFlow: Efficient Sample Likelihood Evaluation using Hollow Message Passing
标题：HollowFlow：使用空心消息传递的高效样本似然评估
链接：https://arxiv.org/abs/2510.21542

作者：Johann Flemming Gloy, Simon Olsson
备注：Accepted to NeurIPS 2025
摘要：Flow and diffusion-based models have emerged as powerful tools for scientific applications, particularly for sampling non-normalized probability distributions, as exemplified by Boltzmann Generators (BGs). A critical challenge in deploying these models is their reliance on sample likelihood computations, which scale prohibitively with system size $n$, often rendering them infeasible for large-scale problems. To address this, we introduce $\textit{HollowFlow}$, a flow-based generative model leveraging a novel non-backtracking graph neural network (NoBGNN). By enforcing a block-diagonal Jacobian structure, HollowFlow likelihoods are evaluated with a constant number of backward passes in $n$, yielding speed-ups of up to $\mathcal{O}(n^2)$: a significant step towards scaling BGs to larger systems. Crucially, our framework generalizes: $\textbf{any equivariant GNN or attention-based architecture}$ can be adapted into a NoBGNN. We validate HollowFlow by training BGs on two different systems of increasing size. For both systems, the sampling and likelihood evaluation time decreases dramatically, following our theoretical scaling laws. For the larger system we obtain a $10^2\times$ speed-up, clearly illustrating the potential of HollowFlow-based approaches for high-dimensional scientific problems previously hindered by computational bottlenecks.

【29】Oracle-Efficient Combinatorial Semi-Bandits
标题：Oracle高效组合半强盗
链接：https://arxiv.org/abs/2510.21431

作者：Jung-hun Kim, Milan Vojnović, Min-hwan Oh
备注：NeurIPS 2025
摘要：We study the combinatorial semi-bandit problem where an agent selects a subset of base arms and receives individual feedback. While this generalizes the classical multi-armed bandit and has broad applicability, its scalability is limited by the high cost of combinatorial optimization, requiring oracle queries at every round. To tackle this, we propose oracle-efficient frameworks that significantly reduce oracle calls while maintaining tight regret guarantees. For the worst-case linear reward setting, our algorithms achieve $\tilde{O}(\sqrt{T})$ regret using only $O(\log\log T)$ oracle queries. We also propose covariance-adaptive algorithms that leverage noise structure for improved regret, and extend our approach to general (non-linear) rewards. Overall, our methods reduce oracle usage from linear to (doubly) logarithmic in time, with strong theoretical guarantees.

【30】Efficient Exploration of Chemical Kinetics
标题：化学动力学的有效探索
链接：https://arxiv.org/abs/2510.21368

作者：Rohit Goswami (1) ((1) Science Institute and Faculty of Physical Sciences, University of Iceland, Reykjavík, Iceland)
备注：Doctoral dissertation, 102 pages, ISBN pending from the University of Iceland. doctorate. By design, all text and figures within this thesis are original and do not appear in the associated papers
摘要：Estimating reaction rates and chemical stability is fundamental, yet efficient methods for large-scale simulations remain out of reach despite advances in modeling and exascale computing. Direct simulation is limited by short timescales; machine-learned potentials require large data sets and struggle with transition state regions essential for reaction rates. Reaction network exploration with sufficient accuracy is hampered by the computational cost of electronic structure calculations, and even simplifications like harmonic transition state theory rely on prohibitively expensive saddle point searches. Surrogate model-based acceleration has been promising but hampered by overhead and numerical instability. This dissertation presents a holistic solution, co-designing physical representations, statistical models, and systems architecture in the Optimal Transport Gaussian Process (OT-GP) framework. Using physics-aware optimal transport metrics, OT-GP creates compact, chemically relevant surrogates of the potential energy surface, underpinned by statistically robust sampling. Alongside EON software rewrites for long timescale simulations, we introduce reinforcement learning approaches for both minimum-mode following (when the final state is unknown) and nudged elastic band methods (when endpoints are specified). Collectively, these advances establish a representation-first, modular approach to chemical kinetics simulation. Large-scale benchmarks and Bayesian hierarchical validation demonstrate state-of-the-art performance and practical exploration of chemical kinetics, transforming a longstanding theoretical promise into a working engine for discovery.

【31】Enforcing Calibration in Multi-Output Probabilistic Regression with Pre-rank Regularization
标题：利用预排序正规化在多输出概率回归中强制校准
链接：https://arxiv.org/abs/2510.21273

作者：Naomi Desobry, Elnura Zhalieva, Souhaib Ben Taieb
摘要：Probabilistic models must be well calibrated to support reliable decision-making. While calibration in single-output regression is well studied, defining and achieving multivariate calibration in multi-output regression remains considerably more challenging. The existing literature on multivariate calibration primarily focuses on diagnostic tools based on pre-rank functions, which are projections that reduce multivariate prediction-observation pairs to univariate summaries to detect specific types of miscalibration. In this work, we go beyond diagnostics and introduce a general regularization framework to enforce multivariate calibration during training for arbitrary pre-rank functions. This framework encompasses existing approaches such as highest density region calibration and copula calibration. Our method enforces calibration by penalizing deviations of the projected probability integral transforms (PITs) from the uniform distribution, and can be added as a regularization term to the loss function of any probabilistic predictor. Specifically, we propose a regularization loss that jointly enforces both marginal and multivariate pre-rank calibration. We also introduce a new PCA-based pre-rank that captures calibration along directions of maximal variance in the predictive distribution, while also enabling dimensionality reduction. Across 18 real-world multi-output regression datasets, we show that unregularized models are consistently miscalibrated, and that our methods significantly improve calibration across all pre-rank functions without sacrificing predictive accuracy.

【32】Doubly-Regressing Approach for Subgroup Fairness
标题：亚群公平性的双回归方法
链接：https://arxiv.org/abs/2510.21091

作者：Kyungseon Lee, Kunwoong Kim, Jihu Lee, Dongyoon Yang, Yongdai Kim
摘要：Algorithmic fairness is a socially crucial topic in real-world applications of AI. Among many notions of fairness, subgroup fairness is widely studied when multiple sensitive attributes (e.g., gender, race, age) are present. However, as the number of sensitive attributes grows, the number of subgroups increases accordingly, creating heavy computational burdens and data sparsity problem (subgroups with too small sizes). In this paper, we develop a novel learning algorithm for subgroup fairness which resolves these issues by focusing on subgroups with sufficient sample sizes as well as marginal fairness (fairness for each sensitive attribute). To this end, we formalize a notion of subgroup-subset fairness and introduce a corresponding distributional fairness measure called the supremum Integral Probability Metric (supIPM). Building on this formulation, we propose the Doubly Regressing Adversarial learning for subgroup Fairness (DRAF) algorithm, which reduces a surrogate fairness gap for supIPM with much less computation than directly reducing supIPM. Theoretically, we prove that the proposed surrogate fairness gap is an upper bound of supIPM. Empirically, we show that the DRAF algorithm outperforms baseline methods in benchmark datasets, specifically when the number of sensitive attributes is large so that many subgroups are very small.

【33】Data-Centric Lessons To Improve Speech-Language Pretraining
标题：以数据为中心的课程，以提高语音语言预训练
链接：https://arxiv.org/abs/2510.20860

作者：Vishaal Udandarao, Zhiyun Lu, Xuankai Chang, Yongqiang Wang, Violet Z. Yao, Albin Madapally Jose, Fartash Faghri, Josh Gardner, Chung-Cheng Chiu
备注：Tech Report
摘要：Spoken Question-Answering (SQA) is a core capability for useful and interactive artificial intelligence systems. Recently, several speech-language models (SpeechLMs) have been released with a specific focus on improving their SQA performance. However, a lack of controlled ablations of pretraining data processing and curation makes it challenging to understand what factors account for performance, despite substantial gains from similar studies in other data modalities. In this work, we address this gap by conducting a data-centric exploration for pretraining SpeechLMs. We focus on three research questions fundamental to speech-language pretraining data: (1) how to process raw web-crawled audio content for speech-text pretraining, (2) how to construct synthetic pretraining datasets to augment web-crawled data and (3) how to interleave (text, audio) segments into training sequences. We apply the insights from our controlled data-centric ablations to pretrain a 3.8B-parameter SpeechLM, called SpeLangy, that outperforms models that are up to 3x larger by 10.2% absolute performance. We hope our findings highlight the impact of effective data curation for speech-language pretraining and guide future data-centric exploration in SpeechLMs.

机器翻译由腾讯交互翻译提供，仅供参考

点击“阅读原文”获取带摘要的学术速递