
Machine Learning arXiv Digest [10.31]

Source: arXiv Daily Digest



cs.LG: 174 papers today


Large Models (19 papers)

【1】SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models
Link: https://arxiv.org/abs/2510.26769

Authors: Anushka Sivakumar, Andrew Zhang, Zaber Hakim, Chris Thomas
Abstract: This work introduces SteerVLM, a lightweight steering module designed to guide Vision-Language Models (VLMs) towards outputs that better adhere to desired instructions. Our approach learns from the latent embeddings of paired prompts encoding target and converse behaviors to dynamically adjust activations connecting the language modality with image context. This allows for fine-grained, inference-time control over complex output semantics without modifying model weights, while preserving performance on off-target tasks. The steering module requires learned parameters equal to 0.14% of the original VLM's size, and it gains model control through dimension-wise activation modulation and adaptive steering across layers, without requiring pre-extracted static vectors or manual tuning of intervention points. Furthermore, we introduce VNIA (Visual Narrative Intent Alignment), a multimodal dataset specifically created to facilitate the development and evaluation of VLM steering techniques. Our method outperforms existing intervention techniques on steering and hallucination-mitigation benchmarks for VLMs and offers a robust solution for multimodal model control through activation engineering.
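As a loose illustration of dimension-wise activation modulation (the module shape, hook placement, and layer choice here are assumptions, not SteerVLM's actual design), a steering module can learn a per-dimension scale and shift applied to one hidden state at inference time while the VLM's weights stay frozen:

```python
import torch
import torch.nn as nn

class SteeringModule(nn.Module):
    """Minimal dimension-wise activation steering (illustrative sketch only)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(hidden_dim))   # per-dimension gain
        self.shift = nn.Parameter(torch.zeros(hidden_dim))  # per-dimension offset

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim); the base model stays frozen,
        # only scale/shift would be trained from paired target/converse prompts.
        return hidden * self.scale + self.shift

steer = SteeringModule(hidden_dim=4096)
# Attach to a submodule whose output is a plain tensor (e.g., an MLP projection):
hook = lambda module, inputs, output: steer(output)
# some_layer.register_forward_hook(hook)  # hypothetical attachment point
```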


【2】Value Drifts: Tracing Value Alignment During LLM Post-Training
Link: https://arxiv.org/abs/2510.26707

Authors: Mehar Bhatia, Shravan Nayak, Gaurav Kamath, Marius Mosbach, Karolina Stańczak, Vered Shwartz, Siva Reddy
Abstract: As LLMs occupy an increasingly important role in society, they are more and more confronted with questions that require them not only to draw on their general knowledge but also to align with certain human value systems. Therefore, studying the alignment of LLMs with human values has become a crucial field of inquiry. Prior work, however, mostly focuses on evaluating the alignment of fully trained models, overlooking the training dynamics by which models learn to express human values. In this work, we investigate how and at which stage value alignment arises during the course of a model's post-training. Our analysis disentangles the effects of post-training algorithms and datasets, measuring both the magnitude and time of value drifts during training. Experimenting with Llama-3 and Qwen-3 models of different sizes and popular supervised fine-tuning (SFT) and preference optimization datasets and algorithms, we find that the SFT phase generally establishes a model's values, and subsequent preference optimization rarely re-aligns these values. Furthermore, using a synthetic preference dataset that enables controlled manipulation of values, we find that different preference optimization algorithms lead to different value alignment outcomes, even when preference data is held constant. Our findings provide actionable insights into how values are learned during post-training and help to inform data curation, as well as the selection of models and algorithms for preference optimization, to improve model alignment to human values.


【3】Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models
Link: https://arxiv.org/abs/2510.26577

Authors: Yinrong Hong, Zhiquan Tan, Kai Hu
Abstract: Large Language Models (LLMs) face significant inference latency challenges stemming from their autoregressive design and large size. To address this, speculative decoding emerges as a solution, enabling the simultaneous generation and validation of multiple tokens. While recent approaches like EAGLE-2 and EAGLE-3 improve speculative decoding using dynamic tree structures, they often neglect the impact of crucial system variables such as GPU devices and batch sizes. Therefore, we introduce a new dynamic tree decoding approach called CAST that takes into account inference costs, including factors such as GPU configurations and batch sizes, to dynamically refine the tree structure. Through comprehensive experimentation across six diverse tasks and utilizing six distinct LLMs, our methodology demonstrates remarkable results, achieving speeds up to 5.2 times faster than conventional decoding methods. Moreover, it generally outperforms existing state-of-the-art techniques by 5% to 20%.
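The abstract does not disclose CAST's cost model, so the sketch below is only a shape-of-the-idea toy (the utility heuristic and function signatures are assumptions): grow the draft tree greedily where the expected number of accepted tokens per unit of verification cost, which in a CAST-like setting would depend on the GPU configuration and batch size, is highest.

```python
import heapq

def build_draft_tree(child_probs, step_cost, budget):
    """Toy cost-aware draft-tree construction (illustrative, not CAST itself).

    child_probs(path): estimated acceptance probabilities of candidate children.
    step_cost(depth): marginal verification cost of adding one node; in a
        CAST-like setting this would depend on the GPU and the batch size.
    budget: total verification cost allowed for the speculation tree.
    """
    tree = {(): 1.0}   # path (tuple of child indices) -> prob the path is accepted
    frontier = []      # max-heap on utility = path probability / marginal cost

    def push_children(path):
        for i, p in enumerate(child_probs(path)):
            child, prob = path + (i,), tree[path] * p
            heapq.heappush(frontier, (-prob / step_cost(len(child)), child, prob))

    push_children(())
    spent = 0.0
    while frontier:
        _, path, prob = heapq.heappop(frontier)
        cost = step_cost(len(path))
        if spent + cost > budget:
            break
        spent += cost
        tree[path] = prob       # accept this node into the draft tree
        push_children(path)     # its children become new candidates
    return tree
```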


【4】The Structure of Relation Decoding Linear Operators in Large Language Models
Link: https://arxiv.org/abs/2510.26543

Authors: Miranda Anna Christ, Adrián Csiszárik, Gergely Becsó, Dániel Varga
Note: NeurIPS 2025 (Spotlight)
Abstract: This paper investigates the structure of the linear operators introduced in Hernandez et al. [2023] that decode specific relational facts in transformer language models. We extend their single-relation findings to a collection of relations and systematically chart their organization. We show that such collections of relation decoders can be highly compressed by simple order-3 tensor networks without significant loss in decoding accuracy. To explain this surprising redundancy, we develop a cross-evaluation protocol in which we apply each linear decoder operator to the subjects of every other relation. Our results reveal that these linear maps do not encode distinct relations but rather extract recurring, coarse-grained semantic properties (e.g., country-of-capital-city and country-of-food both fall under the country-of-X property). This property-centric structure both explains the operators' compressibility and highlights why they generalize only to new relations that are semantically close. Our findings thus interpret linear relational decoding in transformer language models as primarily property-based, rather than relation-specific.


【5】LLMs as In-Context Meta-Learners for Model and Hyperparameter Selection
Link: https://arxiv.org/abs/2510.26510

Authors: Youssef Attia El Hili, Albert Thomas, Malik Tiomoko, Abdelhakim Benechehab, Corentin Léger, Corinne Ancourt, Balázs Kégl
Note: 27 pages, 6 figures
Abstract: Model and hyperparameter selection are critical but challenging in machine learning, typically requiring expert intuition or expensive automated search. We investigate whether large language models (LLMs) can act as in-context meta-learners for this task. By converting each dataset into interpretable metadata, we prompt an LLM to recommend both model families and hyperparameters. We study two prompting strategies: (1) a zero-shot mode relying solely on pretrained knowledge, and (2) a meta-informed mode augmented with examples of models and their performance on past tasks. Across synthetic and real-world benchmarks, we show that LLMs can exploit dataset metadata to recommend competitive models and hyperparameters without search, and that improvements from meta-informed prompting demonstrate their capacity for in-context meta-learning. These results highlight a promising new role for LLMs as lightweight, general-purpose assistants for model selection and hyperparameter optimization.
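A minimal sketch of the meta-informed prompting mode (the metadata fields and prompt wording are assumptions, not the authors' template):

```python
def dataset_metadata(X, y):
    """Summarize a tabular dataset as interpretable metadata (illustrative fields)."""
    return {
        "n_samples": len(X),
        "n_features": len(X[0]),
        "n_classes": len(set(y)),
    }

def meta_informed_prompt(meta, history):
    """Build a prompt in the spirit of the paper's meta-informed mode.

    `history` holds (metadata, model, hyperparams, score) tuples from past
    tasks; the exact wording is a hypothetical stand-in.
    """
    lines = ["You recommend a model family and hyperparameters for a dataset."]
    for past_meta, model, hp, score in history:
        lines.append(f"Past task {past_meta}: {model} {hp} -> accuracy {score:.3f}")
    lines.append(f"New task {meta}: recommend a model and hyperparameters.")
    return "\n".join(lines)

history = [({"n_samples": 500, "n_features": 20, "n_classes": 2},
            "RandomForest", {"n_estimators": 200}, 0.87)]
print(meta_informed_prompt({"n_samples": 2000, "n_features": 8, "n_classes": 3}, history))
```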


【6】LINK-KG: LLM-Driven Coreference-Resolved Knowledge Graphs for Human Smuggling Networks
Link: https://arxiv.org/abs/2510.26486

Authors: Dipak Meher, Carlotta Domeniconi, Guadalupe Correa-Cabrera
Note: Accepted at ICKG 2025, 8 pages, 2 figures
Abstract: Human smuggling networks are complex and constantly evolving, making them difficult to analyze comprehensively. Legal case documents offer rich factual and procedural insights into these networks but are often long, unstructured, and filled with ambiguous or shifting references, posing significant challenges for automated knowledge graph (KG) construction. Existing methods either overlook coreference resolution or fail to scale beyond short text spans, leading to fragmented graphs and inconsistent entity linking. We propose LINK-KG, a modular framework that integrates a three-stage, LLM-guided coreference resolution pipeline with downstream KG extraction. At the core of our approach is a type-specific Prompt Cache, which consistently tracks and resolves references across document chunks, enabling clean and disambiguated narratives for structured knowledge graph construction from both short and long legal texts. LINK-KG reduces average node duplication by 45.21% and noisy nodes by 32.22% compared to baseline methods, resulting in cleaner and more coherent graph structures. These improvements establish LINK-KG as a strong foundation for analyzing complex criminal networks.
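A hedged sketch of what a type-specific prompt cache could look like (all names are hypothetical, not LINK-KG's interfaces): it keeps the canonical entity names resolved so far, per entity type, so prompts for later document chunks can reference them consistently.

```python
from collections import defaultdict

class PromptCache:
    """Toy type-specific cache for cross-chunk coreference resolution."""
    def __init__(self):
        self.by_type = defaultdict(dict)  # entity type -> {alias: canonical name}

    def resolve(self, etype: str, alias: str) -> str:
        # Fall back to the alias itself if it has never been resolved.
        return self.by_type[etype].get(alias, alias)

    def update(self, etype: str, alias: str, canonical: str) -> None:
        self.by_type[etype][alias] = canonical

    def as_prompt_context(self, etype: str) -> str:
        # Injected into the prompt for the next chunk so the LLM reuses names.
        known = sorted(set(self.by_type[etype].values()))
        return f"Known {etype} entities so far: {', '.join(known)}"

cache = PromptCache()
cache.update("person", "the driver", "John Doe")
print(cache.resolve("person", "the driver"))      # -> John Doe
print(cache.as_prompt_context("person"))
```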


【7】Unravelling the Mechanisms of Manipulating Numbers in Language Models
Link: https://arxiv.org/abs/2510.26285

Authors: Michal Štefánik, Timothee Mickus, Marek Kadlčík, Bertram Højer, Michal Spiegel, Raúl Vázquez, Aman Sinha, Josef Kuchař, Philipp Mondorf
Abstract: Recent work has shown that different large language models (LLMs) converge to similar and accurate input embedding representations for numbers. These findings conflict with the documented propensity of LLMs to produce erroneous outputs when dealing with numeric information. In this work, we aim to explain this conflict by exploring how language models manipulate numbers and by quantifying the lower bounds of accuracy of these mechanisms. We find that despite surfacing errors, different language models learn interchangeable representations of numbers that are systematic, highly accurate, and universal across their hidden states and the types of input contexts. This allows us to create universal probes for each LLM and to trace information -- including the causes of output errors -- to specific layers. Our results lay a foundational understanding of how pre-trained LLMs manipulate numbers and outline the potential of more accurate probing techniques in addressing refinements of LLMs' architectures.
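Probes of the kind described here are typically linear readouts trained on hidden states. The sketch below uses random stand-in data; in a real experiment, H would hold hidden states captured at one layer while the model processes numeric prompts, and tracing held-out accuracy across layers localizes where the information lives.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
H = rng.normal(size=(1000, 768))         # stand-in for (n_examples, hidden_dim) states
v = rng.integers(0, 1000, size=1000)     # stand-in for the true number in each prompt

# Linear probe: if numeric information is linearly decodable at this layer,
# held-out R^2 will be high (it is ~0 here because the data is random noise).
probe = Ridge(alpha=1.0).fit(H[:800], v[:800])
print("held-out R^2:", probe.score(H[800:], v[800:]))
```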


【8】PVMark: Enabling Public Verifiability for LLM Watermarking Schemes
Link: https://arxiv.org/abs/2510.26274

Authors: Haohua Duan, Liyao Xiang, Xin Zhang
Note: This work has been submitted to the IEEE for possible publication
Abstract: Watermarking schemes for large language models (LLMs) have been proposed to identify the source of generated text, mitigating the potential threats posed by model theft. However, current watermarking solutions hardly resolve the trust issue: non-public watermark detection cannot prove that it conducts the detection faithfully. We observe that this is attributable to the secret key used in watermark detection -- it cannot be public, or an adversary could launch removal attacks given the key; nor can it be private, or the watermark detection is opaque to the public. To resolve the dilemma, we propose PVMark, a plugin based on zero-knowledge proof (ZKP) that enables the watermark detection process to be publicly verified by third parties without disclosing any secret key. PVMark hinges upon a proof of 'correct execution' of watermark detection, on which a set of ZKP constraints are built, including mapping, random number generation, comparison, and summation. We implement multiple variants of PVMark in Python, Rust, and Circom, covering combinations of three watermarking schemes, three hash functions, and four ZKP protocols, to show that our approach works effectively under a variety of circumstances. Experimental results show that PVMark efficiently enables public verifiability of state-of-the-art LLM watermarking schemes without compromising watermarking performance, making it promising for practical deployment.


【9】Test-Time Alignment of LLMs via Sampling-Based Optimal Control in Pre-Logit Space
Link: https://arxiv.org/abs/2510.26219

Authors: Sekitoshi Kanai, Tsukasa Yoshida, Hiroshi Takahashi, Haru Kuroki, Kazumune Hashimoto
Note: 21 pages, 8 figures
Abstract: Test-time alignment of large language models (LLMs) is attracting attention because fine-tuning LLMs incurs high computational costs. In this paper, we propose a new test-time alignment method called adaptive importance sampling on pre-logits (AISP), built on sampling-based model predictive control with stochastic control inputs. AISP applies Gaussian perturbations to the pre-logits, the outputs of the penultimate layer, so as to maximize expected rewards with respect to the mean of the perturbation. We demonstrate that the optimal mean is obtained by importance sampling with sampled rewards. AISP outperforms best-of-n sampling in terms of reward per number of used samples and achieves higher rewards than other reward-based test-time alignment methods.
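One plausible reading of the AISP update, sketched under our own assumptions rather than the authors' derivation: sample Gaussian perturbations of the pre-logits, score each perturbed output with a reward model, and re-estimate the perturbation mean by reward-weighted (importance-sampling style) averaging.

```python
import numpy as np

def aisp_step(reward_fn, mu, sigma, n_samples=64):
    """One reward-weighted update of the perturbation mean (illustrative)."""
    eps = np.random.randn(n_samples, mu.shape[0]) * sigma + mu  # Gaussian samples
    rewards = np.array([reward_fn(e) for e in eps])
    w = np.exp(rewards - rewards.max())   # softmax-style weights for stability
    w /= w.sum()
    return (w[:, None] * eps).sum(axis=0) # new mean of the perturbation

# Usage sketch: iterate a few steps, then add `mu` to the pre-logits
# (penultimate-layer output) before the final unembedding. The toy reward
# below just pulls the perturbation toward the all-ones vector.
mu = np.zeros(4096)
for _ in range(5):
    mu = aisp_step(lambda e: -np.linalg.norm(e - 1.0), mu, sigma=0.1)
```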


【10】Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation
Link: https://arxiv.org/abs/2510.26130

Authors: Musfiqur Rahman, SayedHassan Khatoonabadi, Emad Shihab
Note: Pre-print prepared for journal submission
Abstract: Large language models (LLMs) have advanced code generation at the function level, yet their ability to produce correct class-level implementations in authentic software projects remains poorly understood. This work introduces a novel benchmark derived from open-source repositories, comprising real-world classes divided into seen and unseen partitions to evaluate generalization under practical conditions. The evaluation examines multiple LLMs under varied input specifications, retrieval-augmented configurations, and documentation completeness levels.
Results reveal a stark performance disparity: LLMs achieve 84% to 89% correctness on established synthetic benchmarks but only 25% to 34% on real-world class tasks, with negligible differences between familiar and novel codebases. Comprehensive docstrings yield modest gains of 1% to 3% in functional accuracy, though statistical significance is rare. Retrieval-augmented generation proves most effective with partial documentation, improving correctness by 4% to 7% by supplying concrete implementation patterns absent from specifications. Error profiling identifies AttributeError, TypeError, and AssertionError as dominant failure modes (84% of cases), with synthetic tests overemphasizing assertion issues and real-world scenarios highlighting type and attribute mismatches. Retrieval augmentation reduces logical flaws but can introduce dependency conflicts.
The benchmark and analysis expose critical limitations in current LLM capabilities for class-level engineering, offering actionable insights for enhancing context modelling, documentation strategies, and retrieval integration in production code assistance tools.


【11】ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models
Link: https://arxiv.org/abs/2510.26096

Authors: Weifei Jin, Yuxin Cao, Junjie Su, Minhui Xue, Jie Hao, Ke Xu, Jin Song Dong, Derui Wang
Note: Accepted to NeurIPS 2025
Abstract: Recent advances in Audio-Language Models (ALMs) have significantly improved multimodal understanding capabilities. However, the introduction of the audio modality also brings new and unique vulnerability vectors. Previous studies have proposed jailbreak attacks that specifically target ALMs, revealing that defenses directly transferred from traditional audio adversarial attacks or text-based Large Language Model (LLM) jailbreaks are largely ineffective against these ALM-specific threats. To address this issue, we propose ALMGuard, the first defense framework tailored to ALMs. Based on the assumption that safety-aligned shortcuts naturally exist in ALMs, we design a method to identify universal Shortcut Activation Perturbations (SAPs) that serve as triggers that activate the safety shortcuts to safeguard ALMs at inference time. To better sift out effective triggers while preserving the model's utility on benign tasks, we further propose Mel-Gradient Sparse Mask (M-GSM), which restricts perturbations to Mel-frequency bins that are sensitive to jailbreaks but insensitive to speech understanding. Both theoretical analyses and empirical results demonstrate the robustness of our method against both seen and unseen attacks. Overall, ALMGuard reduces the average success rate of advanced ALM-specific jailbreak attacks to 4.6% across four models, while maintaining comparable utility on benign benchmarks, establishing it as the new state of the art. Our code and data are available at https://github.com/WeifeiJin/ALMGuard.
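A hedged sketch of the M-GSM idea (how the per-bin sensitivities are measured, and the top-k selection rule, are assumptions): restrict the safety perturbation to Mel bins that are sensitive to the jailbreak objective but not to speech understanding.

```python
import numpy as np

def apply_mgsm(perturbation, sens_jailbreak, sens_speech, k=20):
    """Toy Mel-frequency sparse masking of a safety perturbation.

    perturbation: (n_mel, n_frames) candidate perturbation.
    sens_*: per-Mel-bin gradient magnitudes w.r.t. a jailbreak loss and a
        speech-understanding loss (how these are obtained is an assumption).
    Keeps only the k bins most sensitive to jailbreaks relative to speech.
    """
    ratio = sens_jailbreak / (sens_speech + 1e-8)
    keep = np.argsort(ratio)[-k:]                   # bins allowed to carry the trigger
    mask = np.zeros(perturbation.shape[0], dtype=bool)
    mask[keep] = True
    return perturbation * mask[:, None]             # zero out speech-critical bins

pert = np.random.randn(80, 300)
masked = apply_mgsm(pert, np.random.rand(80), np.random.rand(80))
```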


【12】PORTool: Tool-Use LLM Training with Rewarded Tree
Link: https://arxiv.org/abs/2510.26020

Authors: Feijie Wu, Weiwu Zhu, Yuxiang Zhang, Soumya Chatterjee, Jiarong Zhu, Fan Mo, Rodin Luo, Jing Gao
Abstract: Current tool-use large language models (LLMs) are trained on static datasets, enabling them to interact with external tools and perform multi-step, tool-integrated reasoning, which produces tool-call trajectories. However, these models imitate how a query is resolved in a generic tool-call routine, thereby failing to explore possible solutions and demonstrating limited performance in an evolving, dynamic tool-call environment. In this work, we propose PORTool, a reinforcement learning (RL) method that encourages a tool-use LLM to explore various trajectories yielding the correct answer. Specifically, this method starts with generating multiple rollouts for a given query, some of which share the first few tool-call steps, thereby forming a tree-like structure. Next, we assign rewards to each step, based on its ability to produce a correct answer and make successful tool calls. A shared step across different trajectories receives the same reward, while different steps under the same fork receive different rewards. Finally, these step-wise rewards are used to calculate fork-relative advantages, blended with trajectory-relative advantages, to train the LLM for tool use. The experiments utilize 17 tools to address user queries, covering both time-sensitive and time-invariant topics. We conduct ablation studies to systematically justify the necessity and the design robustness of step-wise rewards. Furthermore, we compare the proposed PORTool with other training approaches and demonstrate significant improvements in final accuracy and the number of tool-call steps.
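A toy version of the fork-relative advantage computation (the data layout is an assumption, and the blending with trajectory-relative advantages is omitted): shared prefix steps receive one reward, and each step is scored against its siblings under the same fork.

```python
from collections import defaultdict

def fork_relative_advantages(trajectories):
    """trajectories: list of rollouts, each a list of (step_id, reward).

    A step_id is shared when two rollouts share a tool-call prefix, so the
    shared step carries a single reward; a step's fork-relative advantage is
    its reward minus the mean reward of its siblings under the same prefix.
    """
    siblings = defaultdict(list)          # parent prefix -> [(step_id, reward)]
    for traj in trajectories:
        prefix = ()
        for step_id, reward in traj:
            siblings[prefix].append((step_id, reward))
            prefix = prefix + (step_id,)
    adv = {}
    for prefix, steps in siblings.items():
        unique = dict(steps)              # dedupe shared steps (same id, same reward)
        mean_r = sum(unique.values()) / len(unique)
        for sid, r in unique.items():
            adv[prefix + (sid,)] = r - mean_r
    return adv

trajs = [[("search", 0.6), ("calc", 0.2)],
         [("search", 0.6), ("web", 0.9)]]   # shared "search" step, then a fork
print(fork_relative_advantages(trajs))
```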


【13】AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache
Link: https://arxiv.org/abs/2510.25979

Authors: Dinghong Song (1), Yuan Feng (1), Yiwei Wang (1), Shangye Chen (1), Cyril Guyot (2), Filip Blagojevic (2), Hyeran Jeon (1), Pengfei Su (1), Dong Li (1) ((1) University of California, Merced, USA; (2) Western Digital Research, USA)
Note: 10 pages, 6 figures, submitted to the Ninth Annual Conference on Machine Learning and Systems (MLSys'26)
Abstract: Large Language Models (LLMs) are widely used in generative applications such as chatting, code generation, and reasoning. However, many real-world workloads such as classification, question answering, recommendation, and text embedding rely solely on the prefill stage of inference, where the model encodes input sequences without performing autoregressive decoding. In these prefill-only scenarios, the self-attention computation becomes the primary performance bottleneck due to its quadratic complexity with respect to sequence length. In this paper, we observe that semantically different sentences often produce similar attention maps across layers and heads. Building on this insight, we propose AttnCache, a framework that accelerates the prefill stage of LLM inference by retrieving and reusing similar attention maps. Based on an attention-map memorization database, AttnCache employs efficient caching and similarity search techniques to identify and reuse pre-cached attention maps during inference, thereby reducing the computational overhead of self-attention. Experimental results show that AttnCache achieves an average of 1.2x end-to-end and 2x attention speedup on CPU, and 1.6x end-to-end and 3x attention speedup on GPU, with negligible accuracy degradation.
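The core mechanism can be pictured as a similarity-keyed cache; the interfaces and the cosine threshold below are assumptions, not AttnCache's implementation:

```python
import numpy as np

class AttnMapCache:
    """Toy attention-map cache keyed by sentence embeddings."""
    def __init__(self, threshold=0.95):
        self.keys, self.maps = [], []
        self.threshold = threshold

    def lookup(self, emb):
        # Return cached attention maps if a close-enough embedding exists.
        if not self.keys:
            return None
        K = np.stack(self.keys)
        sims = K @ emb / (np.linalg.norm(K, axis=1) * np.linalg.norm(emb))
        i = int(sims.argmax())
        return self.maps[i] if sims[i] >= self.threshold else None

    def insert(self, emb, attn_maps):
        self.keys.append(emb)
        self.maps.append(attn_maps)

cache = AttnMapCache()
emb = np.random.randn(384)
cache.insert(emb, "per-layer, per-head attention maps for this input")
print(cache.lookup(emb) is not None)  # True: hit skips the quadratic recompute
```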


【14】Revisiting Multilingual Data Mixtures in Language Model Pretraining
Link: https://arxiv.org/abs/2510.25947

Authors: Negar Foroutan, Paul Teiletche, Ayush Kumar Tarun, Antoine Bosselut
Note: Under review
Abstract: The impact of different multilingual data mixtures in pretraining large language models (LLMs) has been a topic of ongoing debate, often raising concerns about potential trade-offs between language coverage and model performance (i.e., the curse of multilinguality). In this work, we investigate these assumptions by training 1.1B and 3B parameter LLMs on diverse multilingual corpora, varying the number of languages from 25 to 400. Our study challenges common beliefs surrounding multilingual training. First, we find that combining English and multilingual data does not necessarily degrade the in-language performance of either group, provided that languages have a sufficient number of tokens included in the pretraining corpus. Second, we observe that using English as a pivot language (i.e., a high-resource language that serves as a catalyst for multilingual generalization) yields benefits across language families, and contrary to expectations, selecting a pivot language from within a specific family does not consistently improve performance for languages within that family. Lastly, we do not observe a significant "curse of multilinguality" as the number of training languages increases in models at this scale. Our findings suggest that multilingual data, when balanced appropriately, can enhance language model capabilities without compromising performance, even in low-resource settings.


【15】Humains-Junior: A 3.8B Language Model Achieving GPT-4o-Level Factual Accuracy by Directed Exoskeleton Reasoning
Link: https://arxiv.org/abs/2510.25933

Authors: Nissan Yaron, Dan Bystritsky, Ben-Etzion Yaron
Abstract: We introduce Humans-Junior, a 3.8B model that matches GPT-4o on the FACTS Grounding public subset within a $\pm 5$ pp equivalence margin.
Results. On Q1--Q500 under identical judges, GPT-4o scores 73.5% (95% CI 69.5--77.2) and Humans-Junior 72.7% (95% CI 68.7--76.5); the paired difference is 0.8 pp (bootstrap 95% CI $-3.1$ to $+4.7$; permutation $p = 0.72$; Cohen's $d = 0.023$). TOST establishes equivalence at $\pm 5$ pp (not at $\pm 3$ pp). When purchased as managed APIs, Humans-Junior's base model (Phi-3.5-mini-instruct) is $\approx 19\times$ less expensive than GPT-4o on Microsoft AI Foundry pricing; self-hosted or edge deployments can drive incremental inference cost toward zero. Measured vs. estimated pricing sources are tabulated in Appendix E.
Method. Our approach combines minimal directed "Exoskeleton Reasoning" scaffolds with behavioral fine-tuning that teaches protocol compliance (epistemic discipline) rather than domain answers. Fine-tuning alone adds little; combined, they synergize (+17.7 pp, $p < 0.001$) and reduce variance ($\approx 25\%$). In prompt-only settings on frontier models (Q1--Q100; non-comparable), directed reasoning improved GPT-4o by +11.8 pp to 85.3% and Gemini-2.5-Pro by +5.0 pp to 93.3% (baseline 88.3%, $n = 100$); see Section 5.
TL;DR. A 3.8B model achieves GPT-4o-level FACTS accuracy (equivalent within $\pm 5$ pp on Q1--Q500). Cloud pricing shows $\approx 19\times$ lower cost versus GPT-4o, and self-hosted/edge deployments can approach zero marginal cost. Pricing sources are listed in Appendix E. Frontier prompt-only gains (Q1--Q100; non-comparable) and optimized-prompt exploratory results under earlier judges are summarized in Appendix F.
Keywords: Small Language Models, Factual Grounding, Directed Reasoning, Fine-Tuning, Model Alignment, Cost-Efficient AI
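To make the statistics concrete, here is one standard way to run such a paired equivalence check; the per-question scores are simulated stand-ins, not the paper's data, and TOST at a $\pm 5$ pp margin corresponds to the 90% bootstrap CI lying inside that margin:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated per-question correctness (1 = judged correct) for two systems
# on the same 500 questions; real data would come from the FACTS judgments.
a = rng.binomial(1, 0.735, 500)   # GPT-4o-like accuracy
b = rng.binomial(1, 0.727, 500)   # Humans-Junior-like accuracy

diff = a - b
boots = np.array([diff[rng.integers(0, 500, 500)].mean() for _ in range(10_000)])
lo, hi = np.percentile(boots, [5, 95])   # 90% CI, the TOST-equivalent interval
print(f"paired diff = {diff.mean()*100:.1f} pp, 90% CI [{lo*100:.1f}, {hi*100:.1f}] pp")
print("equivalent at +/-5 pp:", lo > -0.05 and hi < 0.05)
```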


【16】$π_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
Link: https://arxiv.org/abs/2510.25889

Authors: Kang Chen, Zhihao Liu, Tonghe Zhang, Zhen Guo, Si Xu, Hao Lin, Hongzhi Zang, Quanlu Zhang, Zhaofei Yu, Guoliang Fan, Tiejun Huang, Yu Wang, Chao Yu
Note: Preprint, work in progress. 24 pages
Abstract: Vision-Language-Action (VLA) models enable robots to understand and perform complex tasks from multimodal input. Although recent work explores using reinforcement learning (RL) to automate the laborious data collection process in scaling supervised fine-tuning (SFT), applying large-scale RL to flow-based VLAs (e.g., $\pi_0$, $\pi_{0.5}$) remains challenging due to intractable action log-likelihoods from iterative denoising.
We address this challenge with $\pi_{\text{RL}}$, an open-source framework for training flow-based VLAs in parallel simulation. $\pi_{\text{RL}}$ implements two RL algorithms: (1) Flow-Noise models the denoising process as a discrete-time MDP with a learnable noise network for exact log-likelihood computation. (2) Flow-SDE integrates denoising with agent-environment interaction, formulating a two-layer MDP that employs ODE-to-SDE conversion for efficient RL exploration.
We evaluate $\pi_{\text{RL}}$ on the LIBERO and ManiSkill benchmarks. On LIBERO, $\pi_{\text{RL}}$ boosts few-shot SFT models $\pi_0$ and $\pi_{0.5}$ from 57.6% to 97.6% and from 77.1% to 98.3%, respectively. In ManiSkill, we train $\pi_{\text{RL}}$ in 320 parallel environments, improving $\pi_0$ from 41.6% to 85.7% and $\pi_{0.5}$ from 40.0% to 84.8% across 4352 pick-and-place tasks, demonstrating scalable multitask RL under heterogeneous simulation.
Overall, $\pi_{\text{RL}}$ achieves significant performance gains and stronger generalization over SFT models, validating the effectiveness of online RL for flow-based VLAs.


【17】Debate2Create: Robot Co-design via Large Language Model Debates
Link: https://arxiv.org/abs/2510.25850

Authors: Kevin Qiu, Marek Cygan
Abstract: Automating the co-design of a robot's morphology and control is a long-standing challenge due to the vast design space and the tight coupling between body and behavior. We introduce Debate2Create (D2C), a framework in which large language model (LLM) agents engage in a structured dialectical debate to jointly optimize a robot's design and its reward function. In each round, a design agent proposes targeted morphological modifications, and a control agent devises a reward function tailored to exploit the new design. A panel of pluralistic judges then evaluates the design-control pair in simulation and provides feedback that guides the next round of debate. Through iterative debates, the agents progressively refine their proposals, producing increasingly effective robot designs. Notably, D2C yields diverse and specialized morphologies despite no explicit diversity objective. On a quadruped locomotion benchmark, D2C discovers designs that travel 73% farther than the default, demonstrating that structured LLM-based debate can serve as a powerful mechanism for emergent robot co-design. Our results suggest that multi-agent debate, when coupled with physics-grounded feedback, is a promising new paradigm for automated robot design.


【18】PRESTO: Preimage-Informed Instruction Optimization for Prompting Black-Box LLMs
Link: https://arxiv.org/abs/2510.25808

Authors: Jaewon Chu, Seunghun Lee, Hyunwoo J. Kim
Note: Accepted to NeurIPS 2025
Abstract: Large language models (LLMs) have achieved remarkable success across diverse domains, due to their strong instruction-following capabilities. This has led to increasing interest in optimizing instructions for black-box LLMs, whose internal parameters are inaccessible but which are widely used due to their strong performance. To optimize instructions for black-box LLMs, recent methods employ white-box LLMs to generate candidate instructions from optimized soft prompts. However, white-box LLMs often map different soft prompts to the same instruction, leading to redundant queries. While previous studies regarded this many-to-one mapping as a structure that hinders optimization efficiency, we reinterpret it as useful prior knowledge that can accelerate the optimization. To this end, we introduce PREimage-informed inSTruction Optimization (PRESTO), a novel framework that leverages the preimage structure of soft prompts for efficient optimization. PRESTO consists of three key components: (1) score sharing, which shares the evaluation score with all soft prompts in a preimage; (2) preimage-based initialization, which selects initial data points that maximize search space coverage using preimage information; and (3) score consistency regularization, which enforces prediction consistency within each preimage. By leveraging preimages, PRESTO achieves the effect of obtaining 14 times more scored data under the same query budget, resulting in more efficient optimization. Experimental results on 33 instruction optimization tasks demonstrate the superior performance of PRESTO. Code is available at https://github.com/mlvlab/PRESTO.
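The score-sharing component reduces to a cache keyed by the decoded instruction, so every soft prompt in a preimage reuses one black-box evaluation; the names below are hypothetical:

```python
class ScoreSharingCache:
    """Toy PRESTO-style score sharing across a soft prompt's preimage."""
    def __init__(self, decode, evaluate):
        self.decode = decode        # soft prompt -> instruction string (white-box LLM)
        self.evaluate = evaluate    # instruction -> score (black-box LLM query)
        self.scores = {}            # instruction -> shared score

    def score(self, soft_prompt):
        instruction = self.decode(soft_prompt)
        if instruction not in self.scores:           # first member of this preimage
            self.scores[instruction] = self.evaluate(instruction)
        return self.scores[instruction]              # shared by the whole preimage

# Usage sketch with stub functions: many soft prompts decode to one instruction,
# so only one evaluation is ever issued for them.
cache = ScoreSharingCache(decode=lambda sp: "Summarize the text.",
                          evaluate=lambda ins: 0.81)
print(cache.score([0.1, 0.2]), cache.score([0.3, 0.4]), len(cache.scores))
```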


【19】StreetMath: Study of LLMs' Approximation Behaviors
Link: https://arxiv.org/abs/2510.25776

Authors: Chiung-Yi Tseng, Somshubhra Roy, Maisha Thasin, Danyang Zhang, Blessing Effiong
Abstract: There is a substantial body of literature examining the mathematical reasoning capabilities of large language models (LLMs), particularly their performance on precise arithmetic operations in autoregressive architectures. However, their ability to perform approximate reasoning in informal, fast-paced mathematical operations has received far less attention, especially among non-autoregressive decoder models. Our work addresses this gap by introducing StreetMath, a benchmark designed to evaluate models' approximation abilities under real-world approximation scenarios. We conduct extensive evaluations across different LLM architectures: Qwen3-4B-Instruct-2507, Qwen3-4B-Thinking-2507, Dream-v0-Instruct-7B, Falcon-Mamba-7B-Instruct, and Mamba-GPT-3B. Furthermore, we apply mechanistic interpretability techniques to probe their internal computational states. Our analysis reveals that LLMs generally attempt to compute exact values or invoke external tools even in tasks that call for approximation. Moreover, while models sometimes reach the correct answer in early layers or steps, they still consume more tokens when solving approximation tasks. Additional experiments indicate that exact and approximate arithmetic operations rely on largely separate neural components. Drawing upon research in cognitive psychology, we argue that LLMs do not exhibit cognitive miserliness in the same way humans do in street math settings. We open-source our work at https://github.com/ctseng777/StreetMath.


Graph-Related (graph learning | graph neural networks | graph optimization, etc.) (7 papers)

【1】HEIR: Learning Graph-Based Motion Hierarchies
Link: https://arxiv.org/abs/2510.26786

Authors: Cheng Zheng, William Koch, Baiang Li, Felix Heide
Note: Code link: https://github.com/princeton-computational-imaging/HEIR
Abstract: Hierarchical structures of motion exist across research fields, including computer vision, graphics, and robotics, where complex dynamics typically arise from coordinated interactions among simpler motion components. Existing methods to model such dynamics typically rely on manually defined or heuristic hierarchies with fixed motion primitives, limiting their generalizability across different tasks. In this work, we propose a general hierarchical motion modeling method that learns structured, interpretable motion relationships directly from data. Our method represents observed motions using graph-based hierarchies, explicitly decomposing global absolute motions into parent-inherited patterns and local motion residuals. We formulate hierarchy inference as a differentiable graph learning problem, where vertices represent elemental motions and directed edges capture learned parent-child dependencies through graph neural networks. We evaluate our hierarchical reconstruction approach on three examples: 1D translational motion, 2D rotational motion, and dynamic 3D scene deformation via Gaussian splatting. Experimental results show that our method reconstructs the intrinsic motion hierarchy in the 1D and 2D cases, and produces more realistic and interpretable deformations compared to the baseline on dynamic 3D Gaussian splatting scenes. By providing an adaptable, data-driven hierarchical modeling paradigm, our method offers a formulation applicable to a broad range of motion-centric tasks. Project page: https://light.princeton.edu/HEIR/


【2】Inside CORE-KG: Evaluating Structured Prompting and Coreference Resolution for Knowledge Graphs
Link: https://arxiv.org/abs/2510.26512

Authors: Dipak Meher, Carlotta Domeniconi
Note: ICDM 2025 Workshop
Abstract: Human smuggling networks are increasingly adaptive and difficult to analyze. Legal case documents offer critical insights but are often unstructured, lexically dense, and filled with ambiguous or shifting references, which pose significant challenges for automated knowledge graph (KG) construction. While recent LLM-based approaches improve over static templates, they still generate noisy, fragmented graphs with duplicate nodes due to the absence of guided extraction and coreference resolution. The recently proposed CORE-KG framework addresses these limitations by integrating a type-aware coreference module and domain-guided structured prompts, significantly reducing node duplication and legal noise. In this work, we present a systematic ablation study of CORE-KG to quantify the individual contributions of its two key components. Our results show that removing coreference resolution results in a 28.32% increase in node duplication and a 4.32% increase in noisy nodes, while removing structured prompts leads to a 4.34% increase in node duplication and a 73.33% increase in noisy nodes. These findings offer empirical insights for designing robust LLM-based pipelines for extracting structured representations from complex legal texts.


【3】Robust Graph Condensation via Classification Complexity Mitigation
Link: https://arxiv.org/abs/2510.26451

Authors: Jiayi Luo, Qingyun Sun, Beining Yang, Haonan Yuan, Xingcheng Fu, Yanbiao Ma, Jianxin Li, Philip S. Yu
Abstract: Graph condensation (GC) has gained significant attention for its ability to synthesize smaller yet informative graphs. However, existing studies often overlook the robustness of GC in scenarios where the original graph is corrupted. In such cases, we observe that the performance of GC deteriorates significantly, while existing robust graph learning techniques offer only limited effectiveness. Through both empirical investigation and theoretical analysis, we reveal that GC is inherently an intrinsic-dimension-reducing process, synthesizing a condensed graph with lower classification complexity. Although this property is critical for effective GC performance, it remains highly vulnerable to adversarial perturbations. To tackle this vulnerability and improve GC robustness, we adopt a geometric perspective on the graph data manifold and propose a novel Manifold-constrained Robust Graph Condensation framework named MRGC. Specifically, we introduce three graph-data manifold learning modules that guide the condensed graph to lie within a smooth, low-dimensional manifold with minimal class ambiguity, thereby preserving the classification-complexity-reduction capability of GC and ensuring robust performance under universal adversarial attacks. Extensive experiments demonstrate the robustness of MRGC across diverse attack scenarios.


【4】A Survey of Heterogeneous Graph Neural Networks for Cybersecurity Anomaly Detection
Link: https://arxiv.org/abs/2510.26307

Authors: Laura Jiang, Reza Ryan, Qian Li, Nasim Ferdosian
Note: 37 pages, 4 figures, 86 references. Submitted to Journal of Computer Security (under review)
Abstract: Anomaly detection is a critical task in cybersecurity, where identifying insider threats, access violations, and coordinated attacks is essential for ensuring system resilience. Graph-based approaches have become increasingly important for modeling entity interactions, yet most rely on homogeneous and static structures, which limits their ability to capture the heterogeneity and temporal evolution of real-world environments. Heterogeneous Graph Neural Networks (HGNNs) have emerged as a promising paradigm for anomaly detection by incorporating type-aware transformations and relation-sensitive aggregation, enabling more expressive modeling of complex cyber data. However, current research on HGNN-based anomaly detection remains fragmented, with diverse modeling strategies, limited comparative evaluation, and an absence of standardized benchmarks. To address this gap, we provide a comprehensive survey of HGNN-based anomaly detection methods in cybersecurity. We introduce a taxonomy that classifies approaches by anomaly type and graph dynamics, analyze representative models, and map them to key cybersecurity applications. We also review commonly used benchmark datasets and evaluation metrics, highlighting their strengths and limitations. Finally, we identify key open challenges related to modeling, data, and deployment, and outline promising directions for future research. This survey aims to establish a structured foundation for advancing HGNN-based anomaly detection toward scalable, interpretable, and practically deployable solutions.


【5】Topology-Aware Active Learning on Graphs
Link: https://arxiv.org/abs/2510.25892

Authors: Harris Hardiman-Mostow, Jack Mauro, Adrien Weihs, Andrea L. Bertozzi
Abstract: We propose a graph-topological approach to active learning that directly targets the core challenge of exploration versus exploitation under scarce label budgets. To guide exploration, we introduce a coreset construction algorithm based on Balanced Forman Curvature (BFC), which selects representative initial labels that reflect the graph's cluster structure. This method includes a data-driven stopping criterion that signals when the graph has been sufficiently explored. We further use BFC to dynamically trigger the shift from exploration to exploitation within active learning routines, replacing hand-tuned heuristics. To improve exploitation, we introduce a localized graph rewiring strategy that efficiently incorporates multiscale information around labeled nodes, enhancing label propagation while preserving sparsity. Experiments on benchmark classification tasks show that our methods consistently outperform existing graph-based semi-supervised baselines at low label rates.
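Balanced Forman curvature includes degree, triangle, and 4-cycle terms; as a simplified stand-in (not the BFC used in the paper), the triangle-augmented Forman curvature of an edge already gives the flavor: strongly negative values tend to mark inter-cluster bridges, while positive values sit inside dense clusters.

```python
import networkx as nx

def augmented_forman_curvature(G: nx.Graph, u, v) -> float:
    """Triangle-augmented Forman curvature of edge (u, v), unweighted graph.

    Simplified proxy for balanced Forman curvature: keeps the degree and
    triangle terms, omits the 4-cycle correction used in the full BFC.
    """
    du, dv = G.degree(u), G.degree(v)
    triangles = len(set(G.neighbors(u)) & set(G.neighbors(v)))
    return 4 - du - dv + 3 * triangles

G = nx.karate_club_graph()
scores = {e: augmented_forman_curvature(G, *e) for e in G.edges}
# The most negative edge is a candidate cluster bridge; curvature profiles
# like this can inform which regions a coreset should still cover.
print(min(scores, key=scores.get), min(scores.values()))
```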


【6】Flex-GAD: Flexible Graph Anomaly Detection
Link: https://arxiv.org/abs/2510.25809

Authors: Apu Chakraborty, Anshul Kumar, Gagan Raj Gupta
Abstract: Detecting anomalous nodes in attributed networks, where each node is associated with both structural connections and descriptive attributes, is essential for identifying fraud, misinformation, and suspicious behavior in domains such as social networks, academic citation graphs, and e-commerce platforms. We propose Flex-GAD, a novel unsupervised framework for graph anomaly detection at the node level. Flex-GAD integrates two encoders to capture complementary aspects of graph data. The framework incorporates a novel community-based GCN encoder, which models intra-community and inter-community information into node embeddings to ensure structural consistency, along with a standard attribute encoder. These diverse representations are fused using a self-attention-based representation fusion module, which enables adaptive weighting and effective integration of the encoded information. This fusion mechanism automatically emphasizes the most relevant node representation across the different encoders. We evaluate Flex-GAD on seven real-world attributed graphs with varying sizes, node degrees, and attribute homogeneity. Flex-GAD achieves an average AUC improvement of 7.98% over the previously best-performing method, GAD-NR, demonstrating its effectiveness and flexibility across diverse graph structures. Moreover, it significantly reduces training time, running 102x faster per epoch than AnomalyDAE and 3x faster per epoch than GAD-NR on average across the seven benchmark datasets.
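A minimal sketch of self-attention fusion over two node encoders (dimensions and mean-pooling are assumptions, not Flex-GAD's configuration): treat the structural and attribute embeddings as a two-token sequence and let attention reweight them.

```python
import torch
import torch.nn as nn

class AttnFusion(nn.Module):
    """Toy self-attention fusion of two per-node embeddings."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, z_struct: torch.Tensor, z_attr: torch.Tensor):
        # z_struct, z_attr: (num_nodes, dim) from the structural and
        # attribute encoders, stacked as a 2-token sequence per node.
        tokens = torch.stack([z_struct, z_attr], dim=1)   # (N, 2, dim)
        fused, _ = self.attn(tokens, tokens, tokens)      # adaptive reweighting
        return fused.mean(dim=1)                          # (N, dim) fused embedding

fusion = AttnFusion(dim=64)
out = fusion(torch.randn(100, 64), torch.randn(100, 64))
print(out.shape)  # torch.Size([100, 64])
```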


【7】Transformers Provably Learn Directed Acyclic Graphs via Kernel-Guided Mutual Information
Link: https://arxiv.org/abs/2510.25542

Authors: Yuan Cheng, Yu Huang, Zhe Xiong, Yingbin Liang, Vincent Y. F. Tan
Abstract: Uncovering hidden graph structures underlying real-world data is a critical challenge with broad applications across scientific domains. Recently, transformer-based models leveraging the attention mechanism have demonstrated strong empirical success in capturing complex dependencies within graphs. However, the theoretical understanding of their training dynamics has been limited to tree-like graphs, where each node depends on a single parent. Extending provable guarantees to more general directed acyclic graphs (DAGs) -- which involve multiple parents per node -- remains challenging, primarily due to the difficulty of designing training objectives that enable different attention heads to separately learn multiple distinct parent relationships.
In this work, we address this problem by introducing a novel information-theoretic metric: the kernel-guided mutual information (KG-MI), based on the $f$-divergence. Our objective combines KG-MI with a multi-head attention framework, where each head is associated with a distinct marginal transition kernel to model diverse parent-child dependencies effectively. We prove that, given sequences generated by a $K$-parent DAG, training a single-layer, multi-head transformer via gradient ascent converges to the global optimum in polynomial time. Furthermore, we characterize the attention score patterns at convergence. In addition, when particularizing the $f$-divergence to the KL divergence, the learned attention scores accurately reflect the ground-truth adjacency matrix, thereby provably recovering the underlying graph structure. Experimental results validate our theoretical findings.


Transformers (3 papers)

【1】Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability
Link: https://arxiv.org/abs/2510.26792

Authors: Tao Tao, Maissam Barkeshli
Note: 10+13 pages, 8+19 figures
Abstract: We study the ability of Transformer models to learn sequences generated by Permuted Congruential Generators (PCGs), a widely used family of pseudo-random number generators (PRNGs). PCGs introduce substantial additional difficulty over linear congruential generators (LCGs) by applying a series of bit-wise shifts, XORs, rotations, and truncations to the hidden state. We show that Transformers can nevertheless successfully perform in-context prediction on unseen sequences from diverse PCG variants, in tasks that are beyond published classical attacks. In our experiments we scale moduli up to $2^{22}$ using up to $50$ million model parameters and datasets with up to $5$ billion tokens. Surprisingly, we find that even when the output is truncated to a single bit, it can be reliably predicted by the model. When multiple distinct PRNGs are presented together during training, the model can jointly learn them, identifying structures from different permutations. We demonstrate a scaling law with modulus $m$: the number of in-context sequence elements required for near-perfect prediction grows as $\sqrt{m}$. For larger moduli, optimization enters extended stagnation phases; in our experiments, learning moduli $m \geq 2^{20}$ requires incorporating training data from smaller moduli, demonstrating a critical necessity for curriculum learning. Finally, we analyze embedding layers and uncover a novel clustering phenomenon: the model spontaneously groups the integer inputs into bitwise rotationally-invariant clusters, revealing how representations can transfer from smaller to larger moduli.
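For reference, one step of the standard 64-bit PCG32 (XSH-RR) generator is shown below; the paper studies smaller moduli (up to $2^{22}$), but the same shift/XOR/rotate/truncate output function is what separates PCG from a plain LCG.

```python
def pcg32_step(state: int, inc: int):
    """One step of the reference PCG32 (XSH-RR) generator."""
    MULT = 6364136223846793005
    old = state
    state = (old * MULT + (inc | 1)) & ((1 << 64) - 1)       # LCG state update
    xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF    # xorshift + truncate
    rot = old >> 59                                          # top 5 bits pick the rotation
    out = ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & 0xFFFFFFFF
    return state, out

state, seq = 42, []
for _ in range(8):
    state, x = pcg32_step(state, inc=54)
    seq.append(x)
print(seq)  # the kind of sequence the transformer must predict in context
```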


【2】Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training
标题:用于大规模PED预训练的专家混合操作员Transformer
链接:https://arxiv.org/abs/2510.25803

作者:Hong Wang, Haiyang Xin, Jie Wang, Xuanze Yang, Fei Zha, Huanshuo Dong, Yan Jiang
摘要:预训练已被证明能够有效缓解使用神经算子求解PDE问题时的数据稀缺和性能受限问题。然而,由于PDE数据集在方程类型上的异质性,混合训练中会产生较高误差,挑战依然存在。此外,通过增加网络宽度或深度来扩展参数的稠密预训练模型会产生显著的推理成本。为了应对这些挑战,我们提出了一种新的专家混合预训练算子Transformer(MoE-POT),这是一种稀疏激活的架构,可以在控制推理成本的同时高效地扩展参数。具体来说,我们的模型采用逐层的路由门控网络,在推理过程中从16个专家网络中动态选择4个路由专家,使模型能够专注于方程特定的特征。同时,我们还集成了2个共享专家,旨在捕捉PDE的共同属性并减少路由专家之间的冗余。最终输出为所有被激活专家结果的加权平均。我们在6个公共PDE数据集上预训练了参数量从3000万到5亿的模型。与现有的1.2亿激活参数模型相比,我们的9000万激活参数模型将zero-shot误差降低了最多40%。此外,我们进行了可解释性分析,表明可以从路由门控网络的决策中推断出数据集类型,这验证了MoE架构的合理性和有效性。
摘要:Pre-training has proven effective in addressing data scarcity and performance limitations in solving PDE problems with neural operators. However, challenges remain due to the heterogeneity of PDE datasets in equation types, which leads to high errors in mixed training. Additionally, dense pre-training models that scale parameters by increasing network width or depth incur significant inference costs. To tackle these challenges, we propose a novel Mixture-of-Experts Pre-training Operator Transformer (MoE-POT), a sparse-activated architecture that scales parameters efficiently while controlling inference costs. Specifically, our model adopts a layer-wise router-gating network to dynamically select 4 routed experts from 16 expert networks during inference, enabling the model to focus on equation-specific features. Meanwhile, we also integrate 2 shared experts, aiming to capture common properties of PDE and reduce redundancy among routed experts. The final output is computed as the weighted average of the results from all activated experts. We pre-train models with parameters from 30M to 0.5B on 6 public PDE datasets. Our model with 90M activated parameters achieves up to a 40% reduction in zero-shot error compared with existing models with 120M activated parameters. Additionally, we conduct interpretability analysis, showing that dataset types can be inferred from router-gating network decisions, which validates the rationality and effectiveness of the MoE architecture.
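下面是摘要所述"16个路由专家中逐层门控选4个、外加2个共享专家、输出取加权平均"这一MoE层的最小化PyTorch草图。注意这只是示意:隐藏维度、专家结构以及共享专家的聚合方式均为假设,并非论文的官方实现。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """示意性 MoE 层:16 个路由专家中门控选 4 个,外加 2 个共享专家。"""
    def __init__(self, d_model=256, n_routed=16, n_shared=2, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_routed)          # 路由门控网络
        make_expert = lambda: nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
        self.routed = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared))

    def forward(self, x):                                  # x: (batch, d_model)
        scores = self.gate(x)                              # (batch, n_routed)
        topv, topi = scores.topk(self.top_k, dim=-1)
        w = F.softmax(topv, dim=-1)                        # 被选中专家的归一化权重
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                     # 激活专家输出的加权平均
            for b in range(x.size(0)):
                e = topi[b, slot].item()
                out[b] += w[b, slot] * self.routed[e](x[b:b+1]).squeeze(0)
        for expert in self.shared:                         # 共享专家捕获 PDE 的共性
            out = out + expert(x) / len(self.shared)
        return out

y = MoELayer()(torch.randn(2, 256))
print(y.shape)  # torch.Size([2, 256])
```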


【3】The Kinetics of Reasoning: How Chain-of-Thought Shapes Learning in Transformers?
标题:推理的动力学:思维链如何塑造Transformer中的学习?
链接:https://arxiv.org/abs/2510.25791

作者:Zihan Pengmei, Costas Mavromatis, Zhengyuan Shen, Yunyi Zhang, Vassilis N. Ioannidis, Huzefa Rangwala
备注:10 pages, 7 figures, with appendix
摘要:思维链(CoT)监督可以大幅提高Transformer的性能,但模型学习遵循CoT并从中受益的机制仍然知之甚少。我们通过grokking(顿悟)的视角研究这些学习动态:在具有可调算法复杂度和可控数据组成的符号推理任务上预训练Transformer,以考察其泛化能力。模型在两种设置下进行训练:(i)仅产生最终答案,以及(ii)在回答之前先输出显式的CoT轨迹。我们的结果表明,虽然CoT通常能提高任务性能,但其收益取决于任务复杂度。为了量化这些影响,我们以对数训练步数为自变量,用三参数逻辑斯蒂曲线对准确率进行建模,揭示了学习速度和曲线形状如何随任务复杂度、数据分布和CoT监督的有无而变化。我们还发现了一个短暂的轨迹不忠实阶段:在训练早期,模型常常在跳过或与CoT步骤相矛盾的情况下给出正确答案,之后才将其推理轨迹与答案对齐。在实证上,我们(1)证明CoT加速了泛化,但无法克服算法复杂度更高的任务,例如求列表交集;(2)引入了一个理解Transformer学习的动力学建模框架;(3)将轨迹忠实性刻画为训练过程中涌现的动态属性;(4)表明CoT从机制上改变了Transformer的内部计算。
摘要:Chain-of-thought (CoT) supervision can substantially improve transformer performance, yet the mechanisms by which models learn to follow and benefit from CoT remain poorly understood. We investigate these learning dynamics through the lens of grokking by pretraining transformers on symbolic reasoning tasks with tunable algorithmic complexity and controllable data composition to study their generalization. Models were trained under two settings: (i) producing only final answers, and (ii) emitting explicit CoT traces before answering. Our results show that while CoT generally improves task performance, its benefits depend on task complexity. To quantify these effects, we model the accuracy of the logarithmic training steps with a three-parameter logistic curve, revealing how the learning speed and shape vary with task complexity, data distribution, and the presence of CoT supervision. We also uncover a transient trace unfaithfulness phase: early in training, models often produce correct answers while skipping or contradicting CoT steps, before later aligning their reasoning traces with answers. Empirically, we (1) demonstrate that CoT accelerates generalization but does not overcome tasks with higher algorithmic complexity, such as finding list intersections; (2) introduce a kinetic modeling framework for understanding transformer learning; (3) characterize trace faithfulness as a dynamic property that emerges over training; and (4) show CoT alters internal transformer computation mechanistically.
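摘要中"以对数训练步数为自变量、用三参数逻辑斯蒂曲线对准确率建模"的拟合流程,可用如下草图复现。数据为虚构的演示数值,三个参数的含义(上限、学习速度、半程点)是对摘要的合理解读,并非论文原始代码。

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic3(log_t, a, k, t0):
    """三参数逻辑斯蒂曲线:a 为准确率上限,k 为学习速度,t0 为半程点(对数步数)。"""
    return a / (1.0 + np.exp(-k * (log_t - t0)))

# 假设性数据:训练步数与对应准确率(仅用于演示拟合流程)
steps = np.array([1e2, 3e2, 1e3, 3e3, 1e4, 3e4, 1e5])
acc   = np.array([0.05, 0.08, 0.20, 0.55, 0.85, 0.95, 0.97])

params, _ = curve_fit(logistic3, np.log10(steps), acc, p0=[1.0, 2.0, 3.5])
a, k, t0 = params
print(f"上限 a={a:.3f}, 速度 k={k:.3f}, 半程点 t0={t0:.3f} (log10 步数)")
```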


GAN|对抗|攻击|生成相关(6篇)

【1】OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes
标题:OmniX:从统一的全景生成和感知到图形就绪的3D场景
链接:https://arxiv.org/abs/2510.26800

作者:Yukun Huang, Jiwen Yu, Yanning Zhou, Jianan Wang, Xintao Wang, Pengfei Wan, Xihui Liu
备注:Project page: this https URL
摘要:构建3D场景有两种常用的方法:程序化生成和2D提升。其中,基于全景图的2D提升已成为一种很有前景的技术,它利用强大的2D生成先验来生成沉浸式、逼真且多样化的3D环境。在这项工作中,我们推进了这一技术,以生成适合基于物理的渲染(PBR)、重打光和模拟的图形就绪(graphics-ready)3D场景。我们的关键见解是重新利用2D生成模型,以实现对几何、纹理和PBR材质的全景感知。与现有强调外观生成而忽略内在属性感知的2D提升方法不同,我们提出了OmniX,一个通用且统一的框架。基于轻量级且高效的跨模态适配器结构,OmniX将2D生成先验复用于广泛的全景视觉任务,包括全景感知、生成和补全。此外,我们构建了一个大规模合成全景数据集,其中包含来自不同室内和室外场景的高质量多模态全景图。大量实验证明了我们的模型在全景视觉感知和图形就绪3D场景生成方面的有效性,为沉浸式和物理逼真的虚拟世界生成开辟了新的可能性。
摘要:There are two prevalent ways of constructing 3D scenes: procedural generation and 2D lifting. Among them, panorama-based 2D lifting has emerged as a promising technique, leveraging powerful 2D generative priors to produce immersive, realistic, and diverse 3D environments. In this work, we advance this technique to generate graphics-ready 3D scenes suitable for physically based rendering (PBR), relighting, and simulation. Our key insight is to repurpose 2D generative models for panoramic perception of geometry, textures, and PBR materials. Unlike existing 2D lifting approaches that emphasize appearance generation and ignore the perception of intrinsic properties, we present OmniX, a versatile and unified framework. Based on a lightweight and efficient cross-modal adapter structure, OmniX reuses 2D generative priors for a broad range of panoramic vision tasks, including panoramic perception, generation, and completion. Furthermore, we construct a large-scale synthetic panorama dataset containing high-quality multimodal panoramas from diverse indoor and outdoor scenes. Extensive experiments demonstrate the effectiveness of our model in panoramic visual perception and graphics-ready 3D scene generation, opening new possibilities for immersive and physically realistic virtual world generation.


【2】Quantum Gated Recurrent GAN with Gaussian Uncertainty for Network Anomaly Detection
标题:用于网络异常检测的具有高斯不确定性的量子门循环GAN
链接:https://arxiv.org/abs/2510.26487

作者:Wajdi Hammami, Soumaya Cherkaoui, Jean-Frederic Laprade, Ola Ahmad, Shengrui Wang
摘要:时间序列数据中的异常检测是一项关键挑战,对网络安全具有重要意义。最近的量子机器学习方法,如量子核方法和变分量子电路,在捕获复杂数据分布以进行异常检测方面已展现出潜力,但仍受限于有限的量子比特数。在这项工作中,我们提出了一种新颖的基于量子门控循环单元(QGRU)的生成对抗网络(GAN),其采用连续数据注入(SuDaI)和多度量门控策略,用于鲁棒的网络异常检测。我们的模型独特地利用量子增强的生成器,通过重参数化输出高斯分布的参数(均值和对数方差),并结合Wasserstein评论家(critic)来稳定对抗训练。异常通过一种新颖的门控机制来识别:该机制首先基于高斯不确定性估计标记潜在异常,随后使用评论家得分和重建误差的组合对其进行验证。在基准数据集上的评估表明,我们的方法实现了89.43%的高时间序列感知F1得分(TaF1),与现有的经典和量子模型相比,展现了更准确、更及时地检测异常的卓越能力。此外,训练好的QGRU-WGAN被部署在真实的IBM量子硬件上,并保持了较高的异常检测性能,证实了其在当前含噪中等规模量子(NISQ)设备上的鲁棒性和实际可行性。
摘要:Anomaly detection in time-series data is a critical challenge with significant implications for network security. Recent quantum machine learning approaches, such as quantum kernel methods and variational quantum circuits, have shown promise in capturing complex data distributions for anomaly detection but remain constrained by limited qubit counts. We introduce in this work a novel Quantum Gated Recurrent Unit (QGRU)-based Generative Adversarial Network (GAN) employing Successive Data Injection (SuDaI) and a multi-metric gating strategy for robust network anomaly detection. Our model uniquely utilizes a quantum-enhanced generator that outputs parameters (mean and log-variance) of a Gaussian distribution via reparameterization, combined with a Wasserstein critic to stabilize adversarial training. Anomalies are identified through a novel gating mechanism that initially flags potential anomalies based on Gaussian uncertainty estimates and subsequently verifies them using a composite of critic scores and reconstruction errors. Evaluated on benchmark datasets, our method achieves a high time-series aware F1 score (TaF1) of 89.43% demonstrating superior capability in detecting anomalies accurately and promptly as compared to existing classical and quantum models. Furthermore, the trained QGRU-WGAN was deployed on real IBM Quantum hardware, where it retained high anomaly detection performance, confirming its robustness and practical feasibility on current noisy intermediate-scale quantum (NISQ) devices.


【3】Distributional Multi-objective Black-box Optimization for Diffusion-model Inference-time Multi-Target Generation
标题:面向扩散模型推理时多目标生成的分布式多目标黑箱优化
链接:https://arxiv.org/abs/2510.26278

作者:Kim Yong Tan, Yueming Lyu, Ivor Tsang, Yew-Soon Ong
摘要:扩散模型在学习复杂数据分布方面取得了成功。这一能力推动了它们在高维多目标黑箱优化问题中的应用。现有方法通常在扩散模型之外套用外部优化循环(如进化算法)。然而,这些方法将扩散模型视为黑箱精炼器,忽略了扩散生成过程内部的分布转移,从而限制了效率。为了应对这些挑战,我们提出了推理时多目标生成(IMG)算法,该算法在推理时优化扩散过程,以生成同时满足多个目标的样本。具体来说,IMG在扩散生成过程中根据期望的聚合多目标值执行加权重采样。这种加权重采样策略确保扩散生成的样本服从我们期望的多目标玻尔兹曼分布。我们进一步推导出多目标玻尔兹曼分布有一个有趣的对数似然解释:它是分布层面多目标优化问题的最优解。我们在多目标分子生成任务上实现了IMG。实验表明,只需单次生成的IMG,其超体积显著高于通常需要数百次扩散生成的基线优化算法。值得注意的是,我们的算法可以被看作一个优化后的扩散过程,可以集成到现有方法中以进一步提升其性能。
摘要:Diffusion models have been successful in learning complex data distributions. This capability has driven their application to high-dimensional multi-objective black-box optimization problem. Existing approaches often employ an external optimization loop, such as an evolutionary algorithm, to the diffusion model. However, these approaches treat the diffusion model as a black-box refiner, which overlooks the internal distribution transition of the diffusion generation process, limiting their efficiency. To address these challenges, we propose the Inference-time Multi-target Generation (IMG) algorithm, which optimizes the diffusion process at inference-time to generate samples that simultaneously satisfy multiple objectives. Specifically, our IMG performs weighted resampling during the diffusion generation process according to the expected aggregated multi-objective values. This weighted resampling strategy ensures the diffusion-generated samples are distributed according to our desired multi-target Boltzmann distribution. We further derive that the multi-target Boltzmann distribution has an interesting log-likelihood interpretation, where it is the optimal solution to the distributional multi-objective optimization problem. We implemented IMG for a multi-objective molecule generation task. Experiments show that IMG, requiring only a single generation pass, achieves a significantly higher hypervolume than baseline optimization algorithms that often require hundreds of diffusion generations. Notably, our algorithm can be viewed as an optimized diffusion process and can be integrated into existing methods to further improve their performance.
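摘要中"在扩散生成过程中按期望的聚合多目标值进行加权重采样"这一核心步骤,可以用如下示意代码说明。函数接口、温度参数与目标聚合方式均为假设,仅用于展示按玻尔兹曼权重重采样的机制。

```python
import numpy as np

def weighted_resample(particles, objective_values, weights, temperature=1.0,
                      rng=np.random.default_rng(0)):
    """按多目标玻尔兹曼权重对扩散中间样本重采样(示意)。
    particles: (N, d) 当前扩散步的样本;objective_values: (N, K) 各目标的期望值;
    weights: (K,) 目标的聚合权重。"""
    aggregated = objective_values @ weights            # 期望的聚合多目标值
    logits = aggregated / temperature
    p = np.exp(logits - logits.max())                  # 数值稳定的玻尔兹曼权重
    p /= p.sum()
    idx = rng.choice(len(particles), size=len(particles), replace=True, p=p)
    return particles[idx]

# 用法示意:在每个去噪步之后调用,使样本逐步向高目标值区域集中
x = np.random.randn(128, 16)
f = np.random.rand(128, 3)
x = weighted_resample(x, f, weights=np.array([0.5, 0.3, 0.2]))
print(x.shape)
```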


【4】New Money: A Systematic Review of Synthetic Data Generation for Finance
标题:新货币:金融合成数据生成的系统性回顾
链接:https://arxiv.org/abs/2510.26076

作者:James Meldrum, Basem Suleiman, Fethi Rabhi, Muhammad Johan Alibasa
备注:37 pages, 5 figures, 21 tables
摘要:合成数据生成已成为一种很有前景的方法,用于应对在机器学习应用中使用敏感金融数据的挑战。通过利用生成模型,如生成对抗网络(GAN)和变分自动编码器(VAE),可以创建在保留真实金融记录统计属性的同时缓解隐私风险和监管限制的人工数据集。尽管该领域发展迅速,但一直缺乏对当前研究版图的全面综合。这篇系统性综述整合并分析了自2018年以来发表的72项专注于合成金融数据生成的研究。我们对被合成的金融信息类型、所采用的生成方法以及用于评估数据效用和隐私的评价策略进行了分类。结果表明,基于GAN的方法在文献中占主导地位,尤其是在生成时间序列市场数据和表格信贷数据方面。虽然一些创新技术展现出提升真实感和隐私保护的潜力,但各项研究仍普遍缺乏对隐私保障的严格评估。通过对生成技术、应用和评估方法进行整合性概述,本综述强调了关键的研究空白,并为未来旨在为金融领域开发鲁棒、保护隐私的合成数据解决方案的工作提供了指导。
摘要:Synthetic data generation has emerged as a promising approach to address the challenges of using sensitive financial data in machine learning applications. By leveraging generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), it is possible to create artificial datasets that preserve the statistical properties of real financial records while mitigating privacy risks and regulatory constraints. Despite the rapid growth of this field, a comprehensive synthesis of the current research landscape has been lacking. This systematic review consolidates and analyses 72 studies published since 2018 that focus on synthetic financial data generation. We categorise the types of financial information synthesised, the generative methods employed, and the evaluation strategies used to assess data utility and privacy. The findings indicate that GAN-based approaches dominate the literature, particularly for generating time-series market data and tabular credit data. While several innovative techniques demonstrate potential for improved realism and privacy preservation, there remains a notable lack of rigorous evaluation of privacy safeguards across studies. By providing an integrated overview of generative techniques, applications, and evaluation methods, this review highlights critical research gaps and offers guidance for future work aimed at developing robust, privacy-preserving synthetic data solutions for the financial domain.


【5】ScaleDiff: Higher-Resolution Image Synthesis via Efficient and Model-Agnostic Diffusion
标题:ScaleDiff:通过高效且模型不可知的扩散实现更高分辨率的图像合成
链接:https://arxiv.org/abs/2510.25818

作者:Sungho Koh, SeungJu Cha, Hyunwoo Oh, Kwanyoung Lee, Dong-Jin Kim
备注:NeurIPS 2025. Code: this https URL
摘要:文本到图像扩散模型在生成超出其训练分辨率的图像时通常表现出性能下降。最近的免训练方法可以缓解这一限制,但它们往往需要大量计算,或与最新的扩散Transformer模型不兼容。在本文中,我们提出了ScaleDiff,一个与模型无关且高效的框架,无需任何额外训练即可扩展预训练扩散模型的分辨率。我们框架的核心组件是邻域补丁注意力(NPA),这是一种利用非重叠补丁来减少自注意力层中计算冗余的高效机制。我们将NPA集成到SDEdit管线中,并引入潜在频率混合(LFM)以更好地生成精细细节。此外,我们应用结构引导(Structure Guidance)在去噪过程中增强全局结构。实验结果表明,在U-Net和扩散Transformer两种架构上,ScaleDiff在图像质量和推理速度方面均在免训练方法中达到了最先进的性能。
摘要:Text-to-image diffusion models often exhibit degraded performance when generating images beyond their training resolution. Recent training-free methods can mitigate this limitation, but they often require substantial computation or are incompatible with recent Diffusion Transformer models. In this paper, we propose ScaleDiff, a model-agnostic and highly efficient framework for extending the resolution of pretrained diffusion models without any additional training. A core component of our framework is Neighborhood Patch Attention (NPA), an efficient mechanism that reduces computational redundancy in the self-attention layer with non-overlapping patches. We integrate NPA into an SDEdit pipeline and introduce Latent Frequency Mixing (LFM) to better generate fine details. Furthermore, we apply Structure Guidance to enhance global structure during the denoising process. Experimental results demonstrate that ScaleDiff achieves state-of-the-art performance among training-free methods in terms of both image quality and inference speed on both U-Net and Diffusion Transformer architectures.


【6】Data-driven Projection Generation for Efficiently Solving Heterogeneous Quadratic Programming Problems
标题:用于高效求解异构二次规划问题的数据驱动投影生成
链接:https://arxiv.org/abs/2510.26061

作者:Tomoharu Iwata, Futoshi Futami
摘要:我们提出了一个数据驱动的框架,通过实例特定的投影减少高维二次规划(QP)中的变量数量,从而高效求解QP问题。我们设计了一个基于图神经网络的模型,为每个QP实例生成量身定制的投影,使我们即使对以前未见过的问题也能产生高质量的解。该模型在异构QP上进行训练,以最小化在投影解上评估的期望目标值。这被表述为一个双层优化问题:内层优化使用QP求解器在给定投影下求解QP,而外层优化更新模型参数。我们开发了一个求解该双层优化问题的高效算法,它无需通过求解器进行反向传播即可计算参数梯度。我们对使用神经网络生成的投影矩阵求解QP的泛化能力给出了理论分析。实验结果表明,我们的方法在减少计算时间的同时产生高质量的可行解,优于现有方法。
摘要:We propose a data-driven framework for efficiently solving quadratic programming (QP) problems by reducing the number of variables in high-dimensional QPs using instance-specific projection. A graph neural network-based model is designed to generate projections tailored to each QP instance, enabling us to produce high-quality solutions even for previously unseen problems. The model is trained on heterogeneous QPs to minimize the expected objective value evaluated on the projected solutions. This is formulated as a bilevel optimization problem; the inner optimization solves the QP under a given projection using a QP solver, while the outer optimization updates the model parameters. We develop an efficient algorithm to solve this bilevel optimization problem, which computes parameter gradients without backpropagating through the solver. We provide a theoretical analysis of the generalization ability of solving QPs with projection matrices generated by neural networks. Experimental results demonstrate that our method produces high-quality feasible solutions with reduced computation time, outperforming existing methods.
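为说明"通过投影 $x = Sz$ 将高维QP降维求解"的内层机制,下面给出一个示意草图。注意这是有意的简化:这里以无约束情形的闭式解代替论文中由QP求解器处理的带约束内层问题,且投影矩阵 $S$ 用随机矩阵代替GNN的输出。

```python
import numpy as np

def solve_projected_qp(P, q, S):
    """在投影 x = S z 下求解 min 0.5 x^T P x + q^T x 的示意实现(无约束情形)。"""
    P_red = S.T @ P @ S                  # 降维后的二次项 (k, k)
    q_red = S.T @ q                      # 降维后的一次项 (k,)
    z = np.linalg.solve(P_red, -q_red)   # 低维解
    return S @ z                         # 投影回原始高维空间

rng = np.random.default_rng(0)
n, k = 200, 10
A = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)              # 构造正定的二次项
q = rng.standard_normal(n)
S = rng.standard_normal((n, k))          # 假设性投影,代替 GNN 的输出
x = solve_projected_qp(P, q, S)
print(x.shape)  # (200,)
```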


半/弱/无/有监督|不确定性|主动学习(7篇)

【1】Enhancing ECG Classification Robustness with Lightweight Unsupervised Anomaly Detection Filters
标题:利用轻量级无监督异常检测过滤器增强心电图分类稳健性
链接:https://arxiv.org/abs/2510.26501

作者:Mustafa Fuad Rifet Ibrahim, Maurice Meijer, Alexander Schlaefer, Peer Stelldinger
备注:Submitted to the 24th International Conference on Pervasive Computing and Communications (PerCom 2026)
摘要:通过可穿戴设备进行的连续心电图(ECG)监测为心血管疾病(CVD)的早期检测提供了巨大潜力。然而,在资源受限的环境中部署用于自动分析的深度学习模型,会因不可避免的分布外(OOD)数据而面临可靠性挑战。OOD输入(例如未见过的病理或被噪声破坏的信号)常常导致标准分类器做出错误但高置信度的预测,从而危及患者安全。现有的OOD检测方法要么忽略计算约束,要么将噪声和未见类别分开处理。本文探索将无监督异常检测(UAD)作为一种独立的上游过滤机制,以提高鲁棒性。我们对六种UAD方法进行了基准测试,包括Deep SVDD、基于重建的模型、掩码异常检测、归一化流和扩散模型,并在严格的资源约束(至多512k个参数)下通过神经架构搜索(NAS)进行了优化。在PTB-XL和BUT QDB数据集上的评估考察了对OOD CVD类别以及因噪声而不适合分析的信号的检测。结果表明,Deep SVDD始终在检测能力和效率之间实现了最佳权衡。在一个真实的部署模拟中,将优化后的Deep SVDD过滤器与诊断分类器集成,相比仅使用分类器的基线,准确率最多提高了21个百分点。这项研究表明,优化后的UAD过滤器可以为自动ECG分析保驾护航,从而在可穿戴设备上实现更安全、更可靠的连续心血管监测。
摘要:Continuous electrocardiogram (ECG) monitoring via wearables offers significant potential for early cardiovascular disease (CVD) detection. However, deploying deep learning models for automated analysis in resource-constrained environments faces reliability challenges due to inevitable Out-of-Distribution (OOD) data. OOD inputs, such as unseen pathologies or noise-corrupted signals, often cause erroneous, high-confidence predictions by standard classifiers, compromising patient safety. Existing OOD detection methods either neglect computational constraints or address noise and unseen classes separately. This paper explores Unsupervised Anomaly Detection (UAD) as an independent, upstream filtering mechanism to improve robustness. We benchmark six UAD approaches, including Deep SVDD, reconstruction-based models, Masked Anomaly Detection, normalizing flows, and diffusion models, optimized via Neural Architecture Search (NAS) under strict resource constraints (at most 512k parameters). Evaluation on PTB-XL and BUT QDB datasets assessed detection of OOD CVD classes and signals unsuitable for analysis due to noise. Results show Deep SVDD consistently achieves the best trade-off between detection and efficiency. In a realistic deployment simulation, integrating the optimized Deep SVDD filter with a diagnostic classifier improved accuracy by up to 21 percentage points over a classifier-only baseline. This study demonstrates that optimized UAD filters can safeguard automated ECG analysis, enabling safer, more reliable continuous cardiovascular monitoring on wearables.
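作为参考,摘要中表现最佳的Deep SVDD可概括为:把正常样本映射到超球中心附近,以到中心的距离作为异常分数。下面是一个自包含的极简PyTorch草图,网络结构与输入维度为假设性设置,仅示意核心目标函数。

```python
import torch
import torch.nn as nn

class DeepSVDD(nn.Module):
    """极简 Deep SVDD:将正常 ECG 片段映射到超球中心 c 附近,距离即异常分数。"""
    def __init__(self, in_dim=250, rep_dim=32):
        super().__init__()
        self.net = nn.Sequential(                 # 轻量编码器(参数量远低于 512k)
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, rep_dim, bias=False))  # 去掉偏置以避免平凡解
        self.register_buffer("c", torch.zeros(rep_dim))

    def init_center(self, x):                     # 用正常数据的平均表示初始化中心
        with torch.no_grad():
            self.c = self.net(x).mean(dim=0)

    def score(self, x):                           # 异常分数:到中心的平方距离
        return ((self.net(x) - self.c) ** 2).sum(dim=1)

model = DeepSVDD()
normal = torch.randn(64, 250)                    # 假设性 ECG 特征窗口
model.init_center(normal)
loss = model.score(normal).mean()                # 训练目标:最小化平均距离
loss.backward()
print(float(loss))
```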


【2】Offline Clustering of Preference Learning with Active-data Augmentation
标题:基于主动数据增强的离线偏好聚类学习
链接:https://arxiv.org/abs/2510.26301

作者:Jingyuan Liu, Fatemeh Ghaffari, Xuchuang Wang, Mohammad Hajiesmaili, Carlee Joe-Wong
摘要:从成对反馈中进行偏好学习是一种被广泛采用的框架,应用于基于人类反馈的强化学习和推荐等场景。然而,在许多实际设置中,用户交互有限或成本高昂,使得离线偏好学习成为必需。此外,现实世界的偏好学习通常涉及具有不同偏好的用户。例如,来自不同背景的标注者可能对相同的回答给出不同的排序。这一设置带来两个核心挑战:(1)识别用户之间的相似性以有效聚合数据,尤其是在离线数据在各维度上不平衡的情况下;(2)处理某些偏好维度代表性不足的不平衡离线数据。为了应对这些挑战,我们研究了偏好学习的离线聚类问题:学习者可以访问来自多个可能具有不同偏好的用户的固定数据集,目标是最大化测试用户的效用。针对第一个挑战,我们首先为纯离线设置提出了Off-C$^2$PL,其中学习者仅依赖离线数据。我们的理论分析给出了一个次优性界,显式刻画了样本噪声与偏差之间的权衡。针对不平衡数据这第二个挑战,我们将框架扩展到带有主动数据增强的设置:允许学习者基于Off-C$^2$PL学到的聚类结构,为测试用户选择有限数量的额外主动数据。在此设置下,我们的第二个算法A$^2$-Off-C$^2$PL主动选择针对测试用户偏好中信息最少维度的样本。我们证明了这些主动收集的样本比离线样本贡献更为有效。最后,我们通过在合成和真实数据集上的模拟验证了理论结果。
摘要:Preference learning from pairwise feedback is a widely adopted framework in applications such as reinforcement learning with human feedback and recommendations. In many practical settings, however, user interactions are limited or costly, making offline preference learning necessary. Moreover, real-world preference learning often involves users with different preferences. For example, annotators from different backgrounds may rank the same responses differently. This setting presents two central challenges: (1) identifying similarity across users to effectively aggregate data, especially under scenarios where offline data is imbalanced across dimensions, and (2) handling the imbalanced offline data where some preference dimensions are underrepresented. To address these challenges, we study the Offline Clustering of Preference Learning problem, where the learner has access to fixed datasets from multiple users with potentially different preferences and aims to maximize utility for a test user. To tackle the first challenge, we first propose Off-C$^2$PL for the pure offline setting, where the learner relies solely on offline data. Our theoretical analysis provides a suboptimality bound that explicitly captures the tradeoff between sample noise and bias. To address the second challenge of imbalanced data, we extend our framework to the setting with active-data augmentation where the learner is allowed to select a limited number of additional active-data for the test user based on the cluster structure learned by Off-C$^2$PL. In this setting, our second algorithm, A$^2$-Off-C$^2$PL, actively selects samples that target the least-informative dimensions of the test user's preference. We prove that these actively collected samples contribute more effectively than offline ones. Finally, we validate our theoretical results through simulations on synthetic and real-world datasets.


【3】Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
标题:监督强化学习:从专家轨迹到逐步推理
链接:https://arxiv.org/abs/2510.25992

作者:Yihe Deng, I-Hung Hsu, Jun Yan, Zifeng Wang, Rujun Han, Gufeng Zhang, Yanfei Chen, Wei Wang, Tomas Pfister, Chen-Yu Lee
摘要:大型语言模型(LLM)在需要多步推理的问题上常常表现不佳。对于小规模开源模型,当即使经过多次尝试也很少能采样到正确解时,具有可验证奖励的强化学习(RLVR)会失效;而监督微调(SFT)则倾向于通过逐token的僵硬模仿对长演示产生过拟合。为弥补这一差距,我们提出了监督强化学习(SRL),这一框架将问题求解重新表述为生成一系列逻辑"动作"。SRL训练模型在执行每个动作之前先生成内部推理独白。它以逐步的方式,基于模型动作与从SFT数据集中提取的专家动作之间的相似性提供更平滑的奖励。即使所有rollout都不正确,这种监督也能提供更丰富的学习信号,同时鼓励在专家演示指导下的灵活推理。因此,SRL使小模型能够学习此前SFT或RLVR都无法学会的高难度问题。此外,先用SRL初始化训练、再用RLVR精炼,可获得最强的整体性能。在推理基准之外,SRL还能有效泛化到智能体式软件工程任务,确立了其作为面向推理LLM的鲁棒且通用的训练框架的地位。
摘要:Large Language Models (LLMs) often struggle with problems that require multi-step reasoning. For small-scale open-source models, Reinforcement Learning with Verifiable Rewards (RLVR) fails when correct solutions are rarely sampled even after many attempts, while Supervised Fine-Tuning (SFT) tends to overfit long demonstrations through rigid token-by-token imitation. To address this gap, we propose Supervised Reinforcement Learning (SRL), a framework that reformulates problem solving as generating a sequence of logical "actions". SRL trains the model to generate an internal reasoning monologue before committing to each action. It provides smoother rewards based on the similarity between the model's actions and expert actions extracted from the SFT dataset in a step-wise manner. This supervision offers richer learning signals even when all rollouts are incorrect, while encouraging flexible reasoning guided by expert demonstrations. As a result, SRL enables small models to learn challenging problems previously unlearnable by SFT or RLVR. Moreover, initializing training with SRL before refining with RLVR yields the strongest overall performance. Beyond reasoning benchmarks, SRL generalizes effectively to agentic software engineering tasks, establishing it as a robust and versatile training framework for reasoning-oriented LLMs.


【4】Active Learning with Task-Driven Representations for Messy Pools
标题:面向混乱数据池的任务驱动表示主动学习
链接:https://arxiv.org/abs/2510.25926

作者:Kianoosh Ashouritaklimi, Tom Rainforth
摘要:主动学习有潜力对混乱、未经整理的数据池特别有用,在这些数据池中,各数据点与目标任务的相关性参差不齐。然而,针对这一问题的最先进方法目前依赖于使用固定的、无监督的池表示,而将注意力集中在修改采集函数上。我们表明,这种模型设置可能会削弱它们处理混乱数据池的有效性,因为这样的表示可能无法捕获与任务相关的重要信息。为了解决这一问题,我们提出使用任务驱动的表示,即在主动学习过程中利用先前收集的标签对表示进行周期性更新。我们介绍了学习这些表示的两种具体策略:一种基于直接学习半监督表示,另一种基于对初始无监督表示进行有监督微调。我们发现,与使用无监督或预训练表示相比,两者都显著提升了经验性能。
摘要 :Active learning has the potential to be especially useful for messy, uncurated pools where datapoints vary in relevance to the target task. However, state-of-the-art approaches to this problem currently rely on using fixed, unsupervised representations of the pool, focusing on modifying the acquisition function instead. We show that this model setup can undermine their effectiveness at dealing with messy pools, as such representations can fail to capture important information relevant to the task. To address this, we propose using task-driven representations that are periodically updated during the active learning process using the previously collected labels. We introduce two specific strategies for learning these representations, one based on directly learning semi-supervised representations and the other based on supervised fine-tuning of an initial unsupervised representation. We find that both significantly improve empirical performance over using unsupervised or pretrained representations.


【5】Metis-SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start
标题:Metis-SPECS:通过自蒸馏的基于偏好的冷启动解耦多模态学习
链接:https://arxiv.org/abs/2510.25801

作者:Kun Chen, Peng Shi, Haibo Qiu, Zhixiong Zeng, Siqi Yang, Wenji Mao, Lin Ma
备注:Project Page: this https URL
摘要:具有可验证奖励的强化学习(RL)最近催化了一波"MLLM-r1"方法,将RL引入视觉语言模型。大多数代表性范式都从冷启动开始,通常采用监督微调(SFT)在RL之前初始化策略。然而,基于SFT的冷启动采用了与任务解答和输出格式交织在一起的推理范式,这可能诱发指令风格过拟合,削弱分布外泛化,并最终影响下游RL。我们从训练方法和数据构建两个视角重新审视了冷启动,并引入泛化因子(GF)系数来量化不同方法下的泛化能力。我们的实证研究发现,在冷启动中,基于偏好的训练方法(例如DPO)比基于SFT的方法泛化得更好。受此启发,我们提出了SPECS——一个自蒸馏的、基于偏好的冷启动框架,它将多模态学习解耦:(1)通过自蒸馏生成内省式偏好数据对,避免依赖更大的教师模型或人工标注;(2)执行基于偏好的训练,聚焦于浅层的、可迁移的表面形式标准(格式、结构、风格)而非记忆内容;(3)交棒给带可验证奖励的RL以获得深度推理结果。多个多模态基准上的实验结果表明,我们的解耦学习框架相对强基线带来了一致的性能增益,将MEGA-Bench提高了4.1%,MathVista提高了12.2%。额外的实验表明,SPECS有助于减少分布内的"卡滞",改善探索,稳定训练,并抬高性能上限。
摘要:Reinforcement learning (RL) with verifiable rewards has recently catalyzed a wave of "MLLM-r1" approaches that bring RL to vision language models. Most representative paradigms begin with a cold start, typically employing supervised fine-tuning (SFT), to initialize the policy before RL. However, SFT-based cold start adopts the reasoning paradigm intertwined with task solution and output format, which may induce instruction-style overfitting, weakens out-of-distribution generalization, and ultimately affects downstream RL. We revisit the cold start along two views, its training method and data construction, and introduce the Generalization Factor (GF) coefficient to quantify the generalization capability under different methods. Our empirical study finds that preference-based training methods (e.g. DPO) generalizes better than SFT-based methods in cold start. Motivated by this, we propose SPECS-a Self-distilled, Preference-based Cold Start framework that decouples multimodal learning: (1) generates introspective preference data pairs via self-distillation, avoiding reliance on larger teachers or manual annotation; (2) performs preference-based training to learn, focusing on shallow, transferable surface-form criteria (format, structure, style) rather than memorizing content; and (3) hands off to RL with verifiable rewards for deep reasoning results. Experimental results across multiple multimodal benchmarks show that our decoupling learning framework yields consistent performance gains over strong baselines, improving MEGA-Bench by 4.1% and MathVista by 12.2%. Additional experiments indicate that SPECS contributes to reducing in-distribution "stuckness," improving exploration, stabilizing training, and raising the performance ceiling.
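摘要中用于冷启动的基于偏好的训练可以用标准DPO损失来说明。下面的草图给出该损失的一种常见实现(beta等数值为假设;SPECS在数据构建上另有自蒸馏设计,此处不涉及):

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """标准 DPO 损失的示意:logp_* 为策略/参考模型对偏好对
    (较优/较差回答)的序列对数似然。"""
    ratio_chosen = logp_chosen - ref_logp_chosen        # 相对参考模型的优势
    ratio_rejected = logp_rejected - ref_logp_rejected
    logits = beta * (ratio_chosen - ratio_rejected)
    return -F.logsigmoid(logits).mean()

# 假设性数值,仅用于演示调用方式
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.1]),
                torch.tensor([-12.8]), torch.tensor([-14.9]))
print(float(loss))
```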


【6】Unsupervised local learning based on voltage-dependent synaptic plasticity for resistive and ferroelectric synapses
标题:基于电压依赖突触可塑性的阻变与铁电突触无监督局部学习
链接:https://arxiv.org/abs/2510.25787

作者:Nikhil Garg, Ismael Balafrej, Joao Henrique Quintino Palhares, Laura Bégon-Lours, Davide Florini, Donato Francesco Falcone, Tommaso Stecconi, Valeria Bragaglia, Bert Jan Offrein, Jean-Michel Portal, Damien Querlioz, Yann Beilliard, Dominique Drouin, Fabien Alibart
摘要:在边缘计算设备上部署AI面临与能耗和功能相关的重大挑战。这些设备可以极大地受益于受大脑启发的学习机制,在低功耗下实现实时适应。使用纳米级阻变存储器的存内计算可能在这些边缘设备上执行AI工作负载方面发挥关键作用。在这项研究中,我们介绍了电压依赖突触可塑性(VDSP),作为一种基于赫布(Hebbian)原理、在忆阻突触中进行无监督局部学习的高效方法。该方法无需脉冲时序依赖可塑性(STDP)通常所需的复杂脉冲整形电路即可实现在线学习。我们展示了VDSP如何被有利地适配到三类各自具有独特开关特性的忆阻器件上:TiO$_2$、基于HfO$_2$的金属氧化物丝状突触,以及基于HfZrO$_4$的铁电隧道结(FTJ)。我们对包含这些器件的脉冲神经网络进行了系统级仿真,以在基于MNIST的模式识别任务上验证无监督学习,取得了最先进的性能。结果表明,使用200个神经元时,所有器件的准确率均超过83%。此外,我们还评估了器件可变性(例如开关阈值以及高低阻态电平之比)的影响,并提出了增强鲁棒性的缓解策略。
摘要:The deployment of AI on edge computing devices faces significant challenges related to energy consumption and functionality. These devices could greatly benefit from brain-inspired learning mechanisms, allowing for real-time adaptation while using low-power. In-memory computing with nanoscale resistive memories may play a crucial role in enabling the execution of AI workloads on these edge devices. In this study, we introduce voltage-dependent synaptic plasticity (VDSP) as an efficient approach for unsupervised and local learning in memristive synapses based on Hebbian principles. This method enables online learning without requiring complex pulse-shaping circuits typically necessary for spike-timing-dependent plasticity (STDP). We show how VDSP can be advantageously adapted to three types of memristive devices (TiO$_2$, HfO$_2$-based metal-oxide filamentary synapses, and HfZrO$_4$-based ferroelectric tunnel junctions (FTJ)) with distinctive switching characteristics. System-level simulations of spiking neural networks incorporating these devices were conducted to validate unsupervised learning on MNIST-based pattern recognition tasks, achieving state-of-the-art performance. The results demonstrated over 83% accuracy across all devices using 200 neurons. Additionally, we assessed the impact of device variability, such as switching thresholds and ratios between high and low resistance state levels, and proposed mitigation strategies to enhance robustness.


【7】Uncertainty-Aware Diagnostics for Physics-Informed Machine Learning
标题:针对物理信息机器学习的不确定性感知诊断
链接:https://arxiv.org/abs/2510.26121

作者:Mara Daniels, Liam Hodgkinson, Michael Mahoney
摘要:物理信息机器学习(PIML)将先验物理信息(通常以微分方程约束的形式)整合到将机器学习模型拟合到物理数据的过程中。流行的PIML方法,包括神经算子、物理信息神经网络、神经常微分方程和神经离散平衡,通常以同时包含数据约束和物理约束的目标进行拟合。然而,这种方法的多目标性质在模型质量的度量上造成了歧义。这与对认知不确定性理解不足有关,并且即使现有统计指标显示拟合良好,也可能导致出人意料的失效模式。在高斯过程回归框架内,我们引入了物理信息对数证据(PILE)分数。PILE分数绕过了测试损失的歧义,是一个单一的、具有不确定性意识的度量,为PIML模型的超参数提供了选择原则。我们表明,最小化PILE能为各种模型参数给出极佳的选择,包括核带宽、最小二乘正则化权重,甚至核函数的选择。我们还表明,即使在采集数据之前,PILE分数的一个特殊"无数据"情形也能先验地识别出"良好适配"给定PDE的核选择。在核方法设置之外,我们预计PILE分数可以推广到一般的PIML,并概述了实现途径。
摘要 :Physics-informed machine learning (PIML) integrates prior physical information, often in the form of differential equation constraints, into the process of fitting machine learning models to physical data. Popular PIML approaches, including neural operators, physics-informed neural networks, neural ordinary differential equations, and neural discrete equilibria, are typically fit to objectives that simultaneously include both data and physical constraints. However, the multi-objective nature of this approach creates ambiguity in the measurement of model quality. This is related to a poor understanding of epistemic uncertainty, and it can lead to surprising failure modes, even when existing statistical metrics suggest strong fits. Working within a Gaussian process regression framework, we introduce the Physics-Informed Log Evidence (PILE) score. Bypassing the ambiguities of test losses, the PILE score is a single, uncertainty-aware metric that provides a selection principle for hyperparameters of a PIML model. We show that PILE minimization yields excellent choices for a wide variety of model parameters, including kernel bandwidth, least squares regularization weights, and even kernel function selection. We also show that, even prior to data acquisition, a special 'data-free' case of the PILE score identifies a priori kernel choices that are 'well-adapted' to a given PDE. Beyond the kernel setting, we anticipate that the PILE score can be extended to PIML at large, and we outline approaches to do so.
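PILE分数建立在高斯过程证据(边际似然)的基础上。下面的草图演示标准GP对数证据的计算,并以"比较不同核带宽的证据"来示意以证据为准则的超参数选择;物理约束项是论文的贡献,此处有意未复现。

```python
import numpy as np

def gp_log_evidence(K, y, noise_var=1e-2):
    """高斯过程对数边际似然(log evidence)的标准计算;
    PILE 在此基础上引入物理约束项(此处未复现,仅示意证据项)。"""
    n = len(y)
    Ky = K + noise_var * np.eye(n)
    L = np.linalg.cholesky(Ky)                        # 数值稳定的 Cholesky 分解
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_det = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (y @ alpha) - 0.5 * log_det - 0.5 * n * np.log(2 * np.pi)

def rbf(X, ell):                                      # RBF 核,ell 为带宽
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

# 在假设性数据上比较三个带宽的证据,演示以证据为准则的超参数选择
X = np.linspace(0, 1, 30)[:, None]
y = np.sin(6 * X[:, 0]) + 0.1 * np.random.default_rng(0).standard_normal(30)
for ell in (0.05, 0.2, 1.0):
    print(ell, gp_log_evidence(rbf(X, ell), y))
```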


迁移|Zero/Few/One-Shot|自适应(11篇)

【1】Pre-trained Forecasting Models: Strong Zero-Shot Feature Extractors for Time Series Classification
标题:预训练预测模型:用于时间序列分类的强大Zero-Shot特征提取器
链接:https://arxiv.org/abs/2510.26777

作者:Andreas Auer, Daniel Klotz, Sebastinan Böck, Sepp Hochreiter
备注:NeurIPS 2025 Workshop on Recent Advances in Time Series Foundation Models (BERT2S)
摘要:最近关于时间序列基础模型的研究主要集中在预测上,因此尚不清楚它们学到的表示有多强的泛化性。在这项研究中,我们考察了冻结的预训练预测模型能否为分类提供有效的表示。为此,我们比较了不同的表示提取策略,并引入了两种与模型无关的嵌入增强方法。实验表明,最好的预测模型所达到的分类准确率,可以匹敌甚至超越专门为分类而预训练的最先进模型。此外,我们观察到预测性能与分类性能之间存在正相关。这些发现挑战了"特定任务预训练是必要的"这一假设,并表明学习预测可能为构建通用时间序列基础模型提供一条强大的途径。
摘要:Recent research on time series foundation models has primarily focused on forecasting, leaving it unclear how generalizable their learned representations are. In this study, we examine whether frozen pre-trained forecasting models can provide effective representations for classification. To this end, we compare different representation extraction strategies and introduce two model-agnostic embedding augmentations. Our experiments show that the best forecasting models achieve classification accuracy that matches or even surpasses that of state-of-the-art models pre-trained specifically for classification. Moreover, we observe a positive correlation between forecasting and classification performance. These findings challenge the assumption that task-specific pre-training is necessary, and suggest that learning to forecast may provide a powerful route toward constructing general-purpose time series foundation models.
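摘要所述"冻结预训练预测模型 + 表示提取 + 分类"的评测流程大致如下。DummyForecaster 与 encode 接口均为假设性占位,实际实验中应替换为具体的时间序列基础模型;时间维平均池化只是多种提取策略之一。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_features(forecaster, series_batch):
    """示意:用冻结的预训练预测模型提取分类表示。
    forecaster.encode 为假设性接口,返回 (batch, seq, hidden) 的隐状态;
    这里采用时间维平均池化作为与模型无关的提取策略。"""
    hidden = forecaster.encode(series_batch)       # 冻结权重,仅前向
    return hidden.mean(axis=1)                     # (batch, hidden)

class DummyForecaster:                             # 占位模型,使脚本自包含可运行
    def encode(self, x):
        rng = np.random.default_rng(0)
        W = rng.standard_normal((x.shape[-1], 16))
        return np.tanh(x @ W)

X = np.random.default_rng(1).standard_normal((100, 64, 1))  # 100 条长度 64 的序列
y = (X.mean(axis=(1, 2)) > 0).astype(int)                   # 假设性标签
Z = extract_features(DummyForecaster(), X)
clf = LogisticRegression().fit(Z, y)                          # 线性探针分类器
print(clf.score(Z, y))
```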


【2】Heuristic Adaptation of Potentially Misspecified Domain Support for Likelihood-Free Inference in Stochastic Dynamical Systems
标题:随机动力系统中面向无似然推理的潜在错误设定域支撑集的启发式自适应
链接:https://arxiv.org/abs/2510.26656

作者:Georgios Kamaras, Craig Innes, Subramanian Ramamoorthy
摘要:在机器人学中,无似然推理(LFI)可以提供域分布,使学到的智能体适应一组参数化的部署条件。LFI假设一个任意的采样支撑集,该支撑集在初始通用先验被迭代细化为更具描述性的后验的过程中保持不变。然而,一个可能被错误设定的支撑集会导致次优却虚假确定的后验。为了解决这一问题,我们提出了三种启发式LFI变体:EDGE、MODE和CENTRE。每种变体都以自己的方式解读推理步骤间的后验众数偏移,并在集成到LFI步骤中时,在后验推理的同时调整支撑集。我们首先揭示支撑集错误设定问题,并使用随机动力学基准评估我们启发式方法的竞争力。然后,我们评估了启发式支撑集调整对动态可变形线性物体(DLO)操作任务中参数推理和策略学习的影响。推理为参数化的DLO集合给出了更精细的长度和刚度分类。当所得后验被用作基于仿真的策略学习的域分布时,它们带来了更鲁棒的以物体为中心的智能体性能。
摘要:In robotics, likelihood-free inference (LFI) can provide the domain distribution that adapts a learnt agent in a parametric set of deployment conditions. LFI assumes an arbitrary support for sampling, which remains constant as the initial generic prior is iteratively refined to more descriptive posteriors. However, a potentially misspecified support can lead to suboptimal, yet falsely certain, posteriors. To address this issue, we propose three heuristic LFI variants: EDGE, MODE, and CENTRE. Each interprets the posterior mode shift over inference steps in its own way and, when integrated into an LFI step, adapts the support alongside posterior inference. We first expose the support misspecification issue and evaluate our heuristics using stochastic dynamical benchmarks. We then evaluate the impact of heuristic support adaptation on parameter inference and policy learning for a dynamic deformable linear object (DLO) manipulation task. Inference results in a finer length and stiffness classification for a parametric set of DLOs. When the resulting posteriors are used as domain distributions for sim-based policy learning, they lead to more robust object-centric agent performance.


【3】Adaptive Inverse Kinematics Framework for Learning Variable-Length Tool Manipulation in Robotics
标题:用于学习机器人中变长度工具操作的自适应逆运动学框架
链接:https://arxiv.org/abs/2510.26551

作者:Prathamesh Kothavale, Sravani Boddepalli
备注:10 pages, 5 figures. Demonstrates a reinforcement learning framework for adaptive tool manipulation with variable-length extensions
摘要:传统机器人对自身运动学的理解有限,并且局限于预先编程的任务,这阻碍了它们高效利用工具的能力。围绕工具使用的基本组成部分——理解期望结果、选择最合适的工具、确定最优工具朝向以及执行精确操作——我们引入了一个开创性的框架。我们的新方法扩展了机器人逆运动学求解器的能力,使其能够习得使用不同长度工具的序列化动作库。通过将仿真中学到的动作轨迹与工具相结合,我们通过全面的实验展示了将习得技能从仿真迁移到真实场景的实用性。值得注意的是,我们扩展的逆运动学求解器展现出小于1厘米的误差。此外,我们训练的策略在仿真中实现了8厘米的平均误差。值得一提的是,我们的模型在使用两种不同长度的工具时实现了几乎无法区分的性能。这项研究预示着在工具使用的全部四个基本方面都可能取得进展,使机器人能够在不同任务中掌握复杂的工具操作技艺。
摘要:Conventional robots possess a limited understanding of their kinematics and are confined to preprogrammed tasks, hindering their ability to leverage tools efficiently. Driven by the essential components of tool usage - grasping the desired outcome, selecting the most suitable tool, determining optimal tool orientation, and executing precise manipulations - we introduce a pioneering framework. Our novel approach expands the capabilities of the robot's inverse kinematics solver, empowering it to acquire a sequential repertoire of actions using tools of varying lengths. By integrating a simulation-learned action trajectory with the tool, we showcase the practicality of transferring acquired skills from simulation to real-world scenarios through comprehensive experimentation. Remarkably, our extended inverse kinematics solver demonstrates an impressive error rate of less than 1 cm. Furthermore, our trained policy achieves a mean error of 8 cm in simulation. Noteworthy, our model achieves virtually indistinguishable performance when employing two distinct tools of different lengths. This research provides an indication of potential advances in the exploration of all four fundamental aspects of tool usage, enabling robots to master the intricate art of tool manipulation across diverse tasks.


【4】A Three-Stage Bayesian Transfer Learning Framework to Improve Predictions in Data-Scarce Domains
标题:改进数据稀缺领域预测的三阶段Bayesian迁移学习框架
链接:https://arxiv.org/abs/2510.26541

作者:Aidan Furlong, Robert Salko, Xingang Zhao, Xu Wu
备注:Submitted to Engineering Applications of Artificial Intelligence
摘要:机器学习在工程中的应用稳步增长,支撑着广泛的应用场景。在这些方法中,深度神经网络因其性能和易用性而被广泛采用,但它们需要大型、高质量的数据集。实验数据往往稀疏、含噪或不足以构建有弹性的数据驱动模型。迁移学习利用相关的数据充足的源域来辅助数据稀缺的目标域的学习,已显示出有效性。参数迁移(即复用预训练权重)很常见,但在较大的域偏移下会退化。域对抗神经网络(DANN)通过学习域不变表示来帮助解决这一问题,从而在半监督设置中改善较大域偏移下的迁移。然而,DANN在训练中可能不稳定,且缺乏用于不确定性量化的原生手段。本研究提出了一个全监督的三阶段框架——分阶段贝叶斯域对抗神经网络(staged B-DANN),它结合了参数迁移和共享潜在空间适应。在阶段1中,在源域上训练一个确定性特征提取器。随后在阶段2中使用DANN对该特征提取器进行对抗性精炼。在阶段3中,在适应后的特征提取器之上构建贝叶斯神经网络,在目标域上进行微调,以处理条件偏移并产生校准的不确定性估计。这种staged B-DANN方法首先在一个合成基准上得到验证,显著优于标准迁移技术。随后,它被应用于预测矩形通道中的临界热流密度任务,利用来自圆管实验的数据作为源域。研究结果表明,staged B-DANN方法可以提高预测精度和泛化能力,并有望惠及核工程的其他领域。
摘要:The use of ML in engineering has grown steadily to support a wide array of applications. Among these methods, deep neural networks have been widely adopted due to their performance and accessibility, but they require large, high-quality datasets. Experimental data are often sparse, noisy, or insufficient to build resilient data-driven models. Transfer learning, which leverages relevant data-abundant source domains to assist learning in data-scarce target domains, has shown efficacy. Parameter transfer, where pretrained weights are reused, is common but degrades under large domain shifts. Domain-adversarial neural networks (DANNs) help address this issue by learning domain-invariant representations, thereby improving transfer under greater domain shifts in a semi-supervised setting. However, DANNs can be unstable during training and lack a native means for uncertainty quantification. This study introduces a fully-supervised three-stage framework, the staged Bayesian domain-adversarial neural network (staged B-DANN), that combines parameter transfer and shared latent space adaptation. In Stage 1, a deterministic feature extractor is trained on the source domain. This feature extractor is then adversarially refined using a DANN in Stage 2. In Stage 3, a Bayesian neural network is built on the adapted feature extractor for fine-tuning on the target domain to handle conditional shifts and yield calibrated uncertainty estimates. This staged B-DANN approach was first validated on a synthetic benchmark, where it was shown to significantly outperform standard transfer techniques. It was then applied to the task of predicting critical heat flux in rectangular channels, leveraging data from tube experiments as the source domain. The results of this study show that the staged B-DANN method can improve predictive accuracy and generalization, potentially assisting other domains in nuclear engineering.
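阶段2的对抗性精炼依赖DANN的梯度反转层(GRL):前向传播恒等,反向传播将梯度取负,从而使特征提取器朝"混淆域判别器"的方向更新,学到域不变表示。下面是GRL的一个常见PyTorch实现草图(判别器结构为假设):

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    """梯度反转层:前向恒等,反向将梯度乘以 -lambda。"""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # 对 lam 无梯度

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# 用法示意:域判别器接在反转层之后(网络结构为假设性的)
feat = torch.randn(8, 32, requires_grad=True)
domain_logit = torch.nn.Linear(32, 1)(grad_reverse(feat))
domain_logit.sum().backward()
print(feat.grad.shape)  # 流回特征的梯度方向已被反转
```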


【5】Representation-Level Counterfactual Calibration for Debiased Zero-Shot Recognition
标题:去偏Zero-Shot识别的表示级反事实校准
链接:https://arxiv.org/abs/2510.26466

作者:Pei Peng, MingKun Xie, Hang Hao, Tong Jin, ShengJun Huang
摘要:物体-上下文捷径仍然是视觉语言模型中一个长期存在的挑战:当测试时的场景不同于训练中熟悉的共现模式时,zero-shot的可靠性会被破坏。我们将这一问题重新表述为因果推断问题,并提出:如果物体出现在不同的环境中,预测是否会保持不变?为了在推理时回答这一问题,我们在CLIP的表示空间内估计物体和背景的期望,并通过将物体特征与从外部数据集、批次近邻或文本衍生描述中采样的多样化替代上下文重新组合,来合成反事实嵌入。通过估计总直接效应并模拟干预,我们进一步减去仅背景的激活,在保留有益的物体-上下文交互的同时,缓解幻觉性评分。无需重新训练或提示设计,我们的方法在上下文敏感的基准上显著提升了最差组与平均准确率,创造了新的zero-shot最先进水平。在性能之外,我们的框架提供了一种轻量级的表示层面反事实方法,为去偏和可靠的多模态推理提供了一条实用的因果途径。
摘要:Object-context shortcuts remain a persistent challenge in vision-language models, undermining zero-shot reliability when test-time scenes differ from familiar training co-occurrences. We recast this issue as a causal inference problem and ask: Would the prediction remain if the object appeared in a different environment? To answer this at inference time, we estimate object and background expectations within CLIP's representation space, and synthesize counterfactual embeddings by recombining object features with diverse alternative contexts sampled from external datasets, batch neighbors, or text-derived descriptions. By estimating the Total Direct Effect and simulating intervention, we further subtract background-only activation, preserving beneficial object-context interactions while mitigating hallucinated scores. Without retraining or prompt design, our method substantially improves both worst-group and average accuracy on context-sensitive benchmarks, establishing a new zero-shot state of the art. Beyond performance, our framework provides a lightweight representation-level counterfactual approach, offering a practical causal avenue for debiased and reliable multimodal reasoning.


【6】Personalized Treatment Outcome Prediction from Scarce Data via Dual-Channel Knowledge Distillation and Adaptive Fusion
标题:通过双通道知识提炼和自适应融合从稀缺数据中进行个性化治疗结果预测
链接:https://arxiv.org/abs/2510.26444

作者:Wenjie Chen, Li Zhuang, Ziying Luo, Yu Liu, Jiahao Wu, Shengcai Liu
摘要:基于小样本和罕见患者群体的试验数据进行个性化治疗结局预测在精准医疗中至关重要。然而,昂贵的试验数据限制了预测性能。为了解决这个问题,我们提出了一个交叉保真知识蒸馏和自适应融合网络(CFKD-AFN),它利用丰富但低保真的模拟数据来增强对稀缺但高保真的试验数据的预测。CFKD-AFN结合了一个双通道知识蒸馏模块,从低保真度模型中提取互补知识,以及一个注意力引导融合模块,动态整合多源信息。对慢性阻塞性肺疾病治疗结果预测的实验表明,CFKD-AFN在预测准确率方面比最先进的方法有显著的提高,范围从6.67%到74.55%,并且对不同的高保真数据集大小具有很强的鲁棒性。此外,我们将CFKD-AFN扩展为可解释的变体,从而能够探索潜在的医学语义以支持临床决策。
摘要:Personalized treatment outcome prediction based on trial data for small-sample and rare patient groups is critical in precision medicine. However, the costly trial data limit the prediction performance. To address this issue, we propose a cross-fidelity knowledge distillation and adaptive fusion network (CFKD-AFN), which leverages abundant but low-fidelity simulation data to enhance predictions on scarce but high-fidelity trial data. CFKD-AFN incorporates a dual-channel knowledge distillation module to extract complementary knowledge from the low-fidelity model, along with an attention-guided fusion module to dynamically integrate multi-source information. Experiments on treatment outcome prediction for the chronic obstructive pulmonary disease demonstrate significant improvements of CFKD-AFN over state-of-the-art methods in prediction accuracy, ranging from 6.67% to 74.55%, and strong robustness to varying high-fidelity dataset sizes. Furthermore, we extend CFKD-AFN to an interpretable variant, enabling the exploration of latent medical semantics to support clinical decision-making.


【7】Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning
标题:基于低频截断的多智能体强化学习自适应上下文长度优化
链接:https://arxiv.org/abs/2510.26389

作者:Wenchang Duan, Yaoliang Yu, Jiwan He, Yi Shi
摘要:最近,深度多智能体强化学习(MARL)在解决长期依赖和非马尔可夫环境等挑战性任务方面表现出良好的性能。其成功部分归功于以较长的固定上下文为条件的策略。然而,这种较长的固定上下文长度可能导致探索效率受限和信息冗余。在本文中,我们提出了一种新的MARL框架,以获得自适应且有效的上下文信息。具体来说,我们设计了一个中央智能体,通过时间梯度分析动态优化上下文长度,增强探索以促进MARL收敛到全局最优。此外,为了增强上下文长度的自适应优化能力,我们为中央智能体提出了一种高效的输入表示,有效过滤冗余信息。通过利用基于傅立叶的低频截断方法,我们提取跨分散智能体的全局时间趋势,为MARL环境提供了有效且高效的表示。大量实验表明,所提方法在长期依赖任务上取得了最先进(SOTA)的性能,包括PettingZoo、MiniGrid、谷歌研究足球(GRF)和星际争霸多智能体挑战v2(SMACv2)。
摘要 :Recently, deep multi-agent reinforcement learning (MARL) has demonstrated promising performance for solving challenging tasks, such as long-term dependencies and non-Markovian environments. Its success is partly attributed to conditioning policies on large fixed context length. However, such large fixed context lengths may lead to limited exploration efficiency and redundant information. In this paper, we propose a novel MARL framework to obtain adaptive and effective contextual information. Specifically, we design a central agent that dynamically optimizes context length via temporal gradient analysis, enhancing exploration to facilitate convergence to global optima in MARL. Furthermore, to enhance the adaptive optimization capability of the context length, we present an efficient input representation for the central agent, which effectively filters redundant information. By leveraging a Fourier-based low-frequency truncation method, we extract global temporal trends across decentralized agents, providing an effective and efficient representation of the MARL environment. Extensive experiments demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on long-term dependency tasks, including PettingZoo, MiniGrid, Google Research Football (GRF), and StarCraft Multi-Agent Challenge v2 (SMACv2).
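摘要中"基于傅立叶的低频截断"提取全局时间趋势的做法,可用如下numpy草图说明(保留的低频系数个数 keep 为假设性超参数):

```python
import numpy as np

def low_freq_truncate(signal, keep: int):
    """基于傅立叶的低频截断:仅保留前 keep 个低频系数,
    重构出代表全局时间趋势的平滑序列(高频细节被滤除)。"""
    coeffs = np.fft.rfft(signal)
    coeffs[keep:] = 0.0                    # 截断高频分量
    return np.fft.irfft(coeffs, n=len(signal))

t = np.linspace(0, 1, 256)
obs = np.sin(2 * np.pi * 3 * t) + 0.5 * np.random.default_rng(0).standard_normal(256)
trend = low_freq_truncate(obs, keep=8)     # 作为中央智能体的紧凑输入表示(假设性用法)
print(trend.shape, float(np.abs(obs - trend).mean()))
```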


【8】maxVSTAR: Maximally Adaptive Vision-Guided CSI Sensing with Closed-Loop Edge Model Adaptation for Robust Human Activity Recognition
标题:maxVSTAR:具有闭环边缘模型自适应的最大自适应视觉引导CSI感知,用于鲁棒的人类活动识别
链接:https://arxiv.org/abs/2510.26146

作者:Kexing Liu
摘要:基于WiFi信道状态信息(CSI)的人类活动识别(HAR)为智能环境提供了一种保护隐私、无需设备的传感解决方案。然而,它在边缘设备上的部署受到域偏移的严重限制:在不同的环境和硬件条件下,识别性能会恶化。本研究提出了maxVSTAR(maximally adaptive Vision-guided Sensing Technology for Activity Recognition),一个闭环、视觉引导的模型自适应框架,可自主缓解边缘部署CSI传感系统的域偏移。该系统集成了跨模态的师生架构,其中高精度的基于YOLO的视觉模型充当动态监督信号,为CSI数据流提供实时活动标签。这些标签使得基于CSI的轻量级HAR模型(称为STAR,Sensing Technology for Activity Recognition)能够直接在边缘端进行自主的在线微调。这种闭环再训练机制使STAR能够在无需人工干预的情况下持续适应环境变化。大量实验证明了maxVSTAR的有效性:当部署在未校准的硬件上时,基线STAR模型的识别准确率从93.52%下降到49.14%;经过一个视觉引导的自适应周期后,maxVSTAR将准确率恢复到81.51%。这些结果证实了该系统在注重隐私的物联网环境中进行动态、自监督模型自适应的能力,为在网络边缘利用CSI传感实现长期自主HAR确立了一个可扩展且实用的范式。
摘要:WiFi Channel State Information (CSI)-based human activity recognition (HAR) provides a privacy-preserving, device-free sensing solution for smart environments. However, its deployment on edge devices is severely constrained by domain shift, where recognition performance deteriorates under varying environmental and hardware conditions. This study presents maxVSTAR (maximally adaptive Vision-guided Sensing Technology for Activity Recognition), a closed-loop, vision-guided model adaptation framework that autonomously mitigates domain shift for edge-deployed CSI sensing systems. The proposed system integrates a cross-modal teacher-student architecture, where a high-accuracy YOLO-based vision model serves as a dynamic supervisory signal, delivering real-time activity labels for the CSI data stream. These labels enable autonomous, online fine-tuning of a lightweight CSI-based HAR model, termed Sensing Technology for Activity Recognition (STAR), directly at the edge. This closed-loop retraining mechanism allows STAR to continuously adapt to environmental changes without manual intervention. Extensive experiments demonstrate the effectiveness of maxVSTAR. When deployed on uncalibrated hardware, the baseline STAR model's recognition accuracy declined from 93.52% to 49.14%. Following a single vision-guided adaptation cycle, maxVSTAR restored the accuracy to 81.51%. These results confirm the system's capacity for dynamic, self-supervised model adaptation in privacy-conscious IoT environments, establishing a scalable and practical paradigm for long-term autonomous HAR using CSI sensing at the network edge.


【9】Network-Constrained Policy Optimization for Adaptive Multi-agent Vehicle Routing
标题:自适应多智能体车辆路径的网络约束策略优化
链接:https://arxiv.org/abs/2510.26089

作者:Fazel Arasteh, Arian Haghparast, Manos Papagelis
备注:29 pages, 12 figures. Fazel Arasteh and Arian Haghparast contributed equally to this research. Submitted to ACM Transactions on Spatial Algorithms and Systems (TSAS). The code for this work is publicly available at this https URL
摘要:城市道路网络的交通拥堵导致出行时间延长和排放增加,在高峰时段尤甚。虽然最短路径优先(SPF)算法对静态网络中的单辆车是最优的,但它在动态多车辆设置中表现不佳,常常因为将所有车辆沿相同路径路由而加剧拥堵。我们通过一个用于协调式、网络感知车队导航的多智能体强化学习(MARL)框架来解决动态车辆路由问题。我们首先提出自适应导航(AN),一个去中心化的MARL模型,其中每个交叉口智能体基于(i)本地交通和(ii)用图注意力网络(GAT)建模的邻域状态提供路由指导。为提高在大型网络中的可扩展性,我们进一步提出基于枢纽的分层自适应导航(HHAN),它是AN的扩展,只为关键交叉口(枢纽)分配智能体。车辆在智能体控制下按枢纽到枢纽进行路由,而SPF处理每个枢纽区域内的微观路由。对于枢纽间协调,HHAN在注意力Q混合(A-QMIX)框架下采用集中式训练与分散式执行(CTDE),通过注意力机制聚合异步的车辆决策。枢纽智能体使用流感知的状态特征,结合局部拥堵与预测动态以实现主动路由。在合成网格和真实城市地图(多伦多、曼哈顿)上的实验表明,相比SPF和学习类基线,AN减少了平均出行时间,并保持100%的路由成功率。HHAN可扩展到具有数百个交叉口的网络,在交通繁忙时实现最高15.9%的改进。这些发现凸显了网络约束MARL在智能交通系统中实现可扩展、协调且拥堵感知路由的潜力。
摘要:Traffic congestion in urban road networks leads to longer trip times and higher emissions, especially during peak periods. While the Shortest Path First (SPF) algorithm is optimal for a single vehicle in a static network, it performs poorly in dynamic, multi-vehicle settings, often worsening congestion by routing all vehicles along identical paths. We address dynamic vehicle routing through a multi-agent reinforcement learning (MARL) framework for coordinated, network-aware fleet navigation. We first propose Adaptive Navigation (AN), a decentralized MARL model where each intersection agent provides routing guidance based on (i) local traffic and (ii) neighborhood state modeled using Graph Attention Networks (GAT). To improve scalability in large networks, we further propose Hierarchical Hub-based Adaptive Navigation (HHAN), an extension of AN that assigns agents only to key intersections (hubs). Vehicles are routed hub-to-hub under agent control, while SPF handles micro-routing within each hub region. For hub coordination, HHAN adopts centralized training with decentralized execution (CTDE) under the Attentive Q-Mixing (A-QMIX) framework, which aggregates asynchronous vehicle decisions via attention. Hub agents use flow-aware state features that combine local congestion and predictive dynamics for proactive routing. Experiments on synthetic grids and real urban maps (Toronto, Manhattan) show that AN reduces average travel time versus SPF and learning baselines, maintaining 100% routing success. HHAN scales to networks with hundreds of intersections, achieving up to 15.9% improvement under heavy traffic. These findings highlight the potential of network-constrained MARL for scalable, coordinated, and congestion-aware routing in intelligent transportation systems.


【10】Learning Geometry: A Framework for Building Adaptive Manifold Models through Metric Optimization
标题:学习几何:通过度量优化构建自适应流形模型的框架
链接:https://arxiv.org/abs/2510.26068

作者:Di Zhang
备注:9 pages
摘要:本文提出了一种超越传统参数优化的机器学习新范式。与在固定几何空间内搜索最优参数的传统方法不同,我们的核心思想是将模型本身视为一个可塑的几何实体。具体来说,我们优化具有预定义拓扑的流形上的度量张量场,从而动态塑造模型空间的几何结构。为实现这一目标,我们构建了一个变分框架,其损失函数在数据保真度与流形的内在几何复杂度之间谨慎权衡:前者确保模型有效解释观测数据,后者则充当正则化项,惩罚过度弯曲或不规则的几何形状,以鼓励更简单的模型并防止过拟合。为应对这一无限维优化问题的计算挑战,我们引入了一种基于离散微分几何的实用方法:将连续流形离散化为三角网格,并用边长参数化度量张量,从而可以利用自动微分工具进行高效优化。理论分析揭示了我们的框架与广义相对论中爱因斯坦-希尔伯特作用量之间的深刻相似性,为"数据驱动几何"的概念提供了优雅的物理解释。我们进一步论证,即使拓扑固定,度量优化也比固定几何的模型具有显著更强的表达能力。这项工作为构建能够自主演化几何与拓扑的全动态"元学习者"奠定了坚实基础,并指向科学模型发现和鲁棒表示学习等领域的广阔应用前景。
摘要 :This paper proposes a novel paradigm for machine learning that moves beyond traditional parameter optimization. Unlike conventional approaches that search for optimal parameters within a fixed geometric space, our core idea is to treat the model itself as a malleable geometric entity. Specifically, we optimize the metric tensor field on a manifold with a predefined topology, thereby dynamically shaping the geometric structure of the model space. To achieve this, we construct a variational framework whose loss function carefully balances data fidelity against the intrinsic geometric complexity of the manifold. The former ensures the model effectively explains observed data, while the latter acts as a regularizer, penalizing overly curved or irregular geometries to encourage simpler models and prevent overfitting. To address the computational challenges of this infinite-dimensional optimization problem, we introduce a practical method based on discrete differential geometry: the continuous manifold is discretized into a triangular mesh, and the metric tensor is parameterized by edge lengths, enabling efficient optimization using automatic differentiation tools. Theoretical analysis reveals a profound analogy between our framework and the Einstein-Hilbert action in general relativity, providing an elegant physical interpretation for the concept of "data-driven geometry". We further argue that even with fixed topology, metric optimization offers significantly greater expressive power than models with fixed geometry. This work lays a solid foundation for constructing fully dynamic "meta-learners" capable of autonomously evolving their geometry and topology, and it points to broad application prospects in areas such as scientific model discovery and robust representation learning.


【11】Optimal Information Combining for Multi-Agent Systems Using Adaptive Bias Learning
标题:使用自适应偏差学习的多智能体系统最优信息组合
链接:https://arxiv.org/abs/2510.25793

作者:Siavash M. Alamouti, Fay Arjomandi
备注:22 pages, 2 Figures, 62 equations, 47 references
摘要:现代多智能体系统——从监测关键基础设施的传感器网络到聚合人类智能的众包平台——都可能由于随环境条件变化的系统性偏差而遭受显著的性能下降。当前方法要么忽略这些偏差,导致次优决策;要么需要昂贵的校准程序,而这在实践中往往不可行。这种性能差距会带来实际后果:不准确的环境监测、不可靠的金融预测,以及对人类判断的错误聚合。本文回答一个基本问题:我们什么时候可以学习并校正这些未知偏差以恢复接近最优的性能,什么时候这种学习是徒劳的?我们建立了一个理论框架,将偏差分解为可学习的系统性成分和不可约的随机成分,并引入可学习性比率的概念,即偏差方差中可由可观测协变量预测的部分所占的比例。这一比率决定了偏差学习对给定系统是否值得。我们证明,可实现的性能改进从根本上受限于该可学习性比率,从而为系统设计者就何时投资偏差学习、何时采用更简单方法提供了定量指导。我们提出了自适应偏差学习与最优组合(ABLOC)算法,它在迭代学习偏差校正变换的同时,通过闭式解优化组合权重,并保证收敛到这些理论界。实验验证表明,具有高可学习性比率的系统可以恢复显著的性能(在我们的示例中达到了理论最大改进的40%-70%),而可学习性低的系统则收益甚微,这验证了我们为实际部署决策提供的诊断准则。
摘要:Modern multi-agent systems ranging from sensor networks monitoring critical infrastructure to crowdsourcing platforms aggregating human intelligence can suffer significant performance degradation due to systematic biases that vary with environmental conditions. Current approaches either ignore these biases, leading to suboptimal decisions, or require expensive calibration procedures that are often infeasible in practice. This performance gap has real consequences: inaccurate environmental monitoring, unreliable financial predictions, and flawed aggregation of human judgments. This paper addresses the fundamental question: when can we learn and correct for these unknown biases to recover near-optimal performance, and when is such learning futile? We develop a theoretical framework that decomposes biases into learnable systematic components and irreducible stochastic components, introducing the concept of learnability ratio as the fraction of bias variance predictable from observable covariates. This ratio determines whether bias learning is worthwhile for a given system. We prove that the achievable performance improvement is fundamentally bounded by this learnability ratio, providing system designers with quantitative guidance on when to invest in bias learning versus simpler approaches. We present the Adaptive Bias Learning and Optimal Combining (ABLOC) algorithm, which iteratively learns bias-correcting transformations while optimizing combination weights through closedform solutions, guaranteeing convergence to these theoretical bounds. Experimental validation demonstrates that systems with high learnability ratios can recover significant performance (we achieved 40%-70% of theoretical maximum improvement in our examples), while those with low learnability show minimal benefit, validating our diagnostic criteria for practical deployment decisions.
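可学习性比率的一种直观估计方式是:用可观测协变量回归观测到的偏差,以可解释方差占总方差的比例作为"可学习部分"的占比。下面的草图在合成数据上演示这一思路(线性回归与具体数值均为本文的假设性简化,并非论文的原始算法):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def learnability_ratio(covariates, biases):
    """可学习性比率的简单估计:用协变量回归观测偏差,
    以可解释方差占总方差的比例(R^2)近似可学习部分的占比。"""
    model = LinearRegression().fit(covariates, biases)
    predicted = model.predict(covariates)
    return np.var(predicted) / np.var(biases)

rng = np.random.default_rng(0)
cov = rng.standard_normal((500, 3))                      # 可观测的环境协变量
systematic = cov @ np.array([0.8, -0.5, 0.3])            # 可学习的系统性偏差
bias = systematic + 0.6 * rng.standard_normal(500)       # 加上不可约的随机成分
print(f"可学习性比率 ≈ {learnability_ratio(cov, bias):.2f}")
```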


强化学习(6篇)

【1】Hybrid DQN-TD3 Reinforcement Learning for Autonomous Navigation in Dynamic Environments
标题:动态环境中自主导航的混合DQN-TD3强化学习
链接:https://arxiv.org/abs/2510.26646

作者:Xiaoyi He, Danggui Chen, Zhenshuo Zhang, Zimeng Bai
备注:6 pages, 5 figures; ROS+Gazebo (TurtleBot3) implementation; evaluation with PathBench metrics; code (primary): this https URL mirror (for reproducibility): this https URL
摘要:本文提出了一个分层路径规划与控制框架,它将用于离散子目标选择的高层深度Q网络(DQN)与用于连续驱动的低层双延迟深度确定性策略梯度(TD3)控制器相结合。高层模块选择行为和子目标;低层模块执行平滑的速度指令。我们设计了一个实用的奖励塑形方案(方向、距离、避障、动作平滑性、碰撞惩罚、时间惩罚和进度奖励),并配合基于激光雷达的安全门以阻止不安全的运动。该系统在ROS + Gazebo(TurtleBot3)中实现,并在动态和部分可观测环境中使用PathBench指标进行评估,包括成功率、碰撞率、路径效率和重规划效率。实验表明,与单算法基线(单独的DQN或TD3)和基于规则的规划器相比,该方法提升了成功率和样本效率,对未见过的障碍物配置具有更好的泛化能力,并减少了突变的控制变化。代码和评估脚本可在项目仓库中获取。
摘要:This paper presents a hierarchical path-planning and control framework that combines a high-level Deep Q-Network (DQN) for discrete sub-goal selection with a low-level Twin Delayed Deep Deterministic Policy Gradient (TD3) controller for continuous actuation. The high-level module selects behaviors and sub-goals; the low-level module executes smooth velocity commands. We design a practical reward shaping scheme (direction, distance, obstacle avoidance, action smoothness, collision penalty, time penalty, and progress), together with a LiDAR-based safety gate that prevents unsafe motions. The system is implemented in ROS + Gazebo (TurtleBot3) and evaluated with PathBench metrics, including success rate, collision rate, path efficiency, and re-planning efficiency, in dynamic and partially observable environments. Experiments show improved success rate and sample efficiency over single-algorithm baselines (DQN or TD3 alone) and rule-based planners, with better generalization to unseen obstacle configurations and reduced abrupt control changes. Code and evaluation scripts are available at the project repository.
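A hedged sketch of the kind of shaped reward and LiDAR safety gate the abstract describes; the term weights, thresholds, and function signatures below are illustrative assumptions, not the authors' exact design.

```python
import math

def shaped_reward(heading_err, dist_to_goal, prev_dist, min_lidar,
                  action_delta, collided, dt=0.1):
    """Illustrative shaped reward for goal-directed navigation."""
    r = 0.0
    r += 0.5 * math.cos(heading_err)         # direction alignment
    r += 2.0 * (prev_dist - dist_to_goal)    # progress toward goal
    r -= 0.2 * max(0.0, 0.5 - min_lidar)     # obstacle proximity penalty
    r -= 0.1 * abs(action_delta)             # action smoothness
    r -= 0.01 * dt                           # time penalty
    if collided:
        r -= 10.0                            # collision penalty
    return r

def safety_gate(cmd_v, cmd_w, min_lidar, stop_dist=0.25):
    """LiDAR-based gate: veto forward motion when an obstacle is too close."""
    if min_lidar < stop_dist:
        return 0.0, cmd_w   # stop linear motion, still allow turning in place
    return cmd_v, cmd_w
```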


【2】ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems
标题:ReSpec:优化强化学习系统中的推测解码
链接:https://arxiv.org/abs/2510.26475

作者:Qiaoling Chen, Zijun Liu, Peng Sun, Shenggui Li, Guoteng Wang, Ziming Liu, Yonggang Wen, Siyuan Feng, Tianwei Zhang
摘要:通过强化学习(RL)来适应大型语言模型(LLM)通常受到生成阶段的瓶颈限制,该阶段可能消耗超过75%的训练时间。推测解码(SD)可加速服务系统中的自回归生成,但其在RL训练下的行为在很大程度上仍未被探索。我们确定了阻碍SD与RL系统简单集成的三个关键差距:大批量下加速收益递减、演员(actor)持续更新导致的起草模型过时,以及起草模型引起的策略退化。   为了解决这些差距,我们提出了ReSpec,一个通过三种互补机制使SD适应RL的系统:动态调整SD配置,通过知识蒸馏演化起草模型,以及按rollout奖励对更新加权。在Qwen模型(3B-14B)上,ReSpec实现了高达4.5倍的加速,同时保持了奖励收敛和训练稳定性,为基于RL的高效LLM自适应提供了一个实用的解决方案。
摘要 :Adapting large language models (LLMs) via reinforcement learning (RL) is often bottlenecked by the generation stage, which can consume over 75% of the training time. Speculative decoding (SD) accelerates autoregressive generation in serving systems, but its behavior under RL training remains largely unexplored. We identify three critical gaps that hinder the naive integration of SD into RL systems: diminishing speedups at large batch sizes, drafter staleness under continual actor updates, and drafter-induced policy degradation.   To address these gaps, we present ReSpec, a system that adapts SD to RL through three complementary mechanisms: dynamically tuning SD configurations, evolving the drafter via knowledge distillation, and weighting updates by rollout rewards. On Qwen models (3B-14B), ReSpec achieves up to 4.5x speedup while preserving reward convergence and training stability, providing a practical solution for efficient RL-based LLM adaptation.


【3】Reinforcement Learning for Pollution Detection in a Randomized, Sparse and Nonstationary Environment with an Autonomous Underwater Vehicle
标题:使用自主水下航行器在随机、稀疏和非平稳环境中进行污染检测的强化学习
链接:https://arxiv.org/abs/2510.26347

作者:Sebastian Zieglmeier, Niklas Erdmann, Narada D. Warakagoda
摘要:强化学习(RL)算法旨在通过学习最大化奖励的动作来优化问题求解,这一任务在随机和非平稳环境中变得尤为具有挑战性。即使是先进的强化学习算法,在这些条件下解决问题的能力也往往有限。在使用自主水下航行器(AUV)搜索水下污染云等应用中,RL算法必须在奖励稀疏的环境中导航,其中行动经常导致零奖励。本文旨在通过重新审视和修改经典的强化学习方法来应对这些挑战,使其能在稀疏、随机和非平稳环境中高效运行。我们系统地研究了大量的修改,包括分层算法改动、多目标学习,以及集成位置记忆作为外部输出过滤器以防止状态重访。我们的结果表明,一种改进的基于蒙特卡洛的方法显著优于传统的Q学习和两种穷举搜索模式,说明其在使RL适应复杂环境方面的潜力。这些发现表明,强化学习方法可以有效地适应随机、非平稳和奖励稀疏的环境。
摘要:Reinforcement learning (RL) algorithms are designed to optimize problem-solving by learning actions that maximize rewards, a task that becomes particularly challenging in random and nonstationary environments. Even advanced RL algorithms are often limited in their ability to solve problems in these conditions. In applications such as searching for underwater pollution clouds with autonomous underwater vehicles (AUVs), RL algorithms must navigate reward-sparse environments, where actions frequently result in a zero reward. This paper aims to address these challenges by revisiting and modifying classical RL approaches to efficiently operate in sparse, randomized, and nonstationary environments. We systematically study a large number of modifications, including hierarchical algorithm changes, multigoal learning, and the integration of a location memory as an external output filter to prevent state revisits. Our results demonstrate that a modified Monte Carlo-based approach significantly outperforms traditional Q-learning and two exhaustive search patterns, illustrating its potential in adapting RL to complex environments. These findings suggest that reinforcement learning approaches can be effectively adapted for use in random, nonstationary, and reward-sparse environments.
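The "location memory as an external output filter" idea can be illustrated as a wrapper that masks actions leading to already-visited cells before the agent's choice is executed. The grid dynamics below are an assumed toy setting, not the AUV environment.

```python
# Minimal sketch of a location memory used as an external output filter:
# actions whose successor cell was already visited are masked out.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

class LocationMemoryFilter:
    def __init__(self):
        self.visited = set()

    def filter_actions(self, pos, actions):
        allowed = [a for a in actions
                   if (pos[0] + MOVES[a][0], pos[1] + MOVES[a][1]) not in self.visited]
        return allowed or list(actions)  # never leave the agent with no action

    def update(self, pos):
        self.visited.add(pos)

gate = LocationMemoryFilter()
gate.update((0, 0))
print(gate.filter_actions((0, 1), list(MOVES)))  # "down" (back to (0,0)) is masked
```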


【4】A Game-Theoretic Spatio-Temporal Reinforcement Learning Framework for Collaborative Public Resource Allocation
标题:协作公共资源配置的博弈论时空强化学习框架
链接:https://arxiv.org/abs/2510.26184

作者:Songxin Lei, Qiongyan Wang, Yanchen Zhu, Hanyu Yao, Sijie Ruan, Weilin Ruan, Yuyu Luo, Huaming Wu, Yuxuan Liang
摘要:公共资源配置涉及资源的有效分配,包括城市基础设施、能源和交通,以有效满足社会需求。然而,现有的方法集中在优化个别资源的移动独立,而不考虑他们的能力限制。为了解决这一限制,我们提出了一个新的和更实际的问题:协同公共资源分配(CPRA),它明确地将容量约束和时空动态在现实世界中的场景。我们提出了一个新的框架称为博弈论时空强化学习(GSTRL)解决CPRA。我们的贡献是双重的:1)我们将CPRA问题表述为一个势博弈,并证明了势函数和最优目标之间没有差距,为近似这个NP难问题的纳什均衡奠定了坚实的理论基础; 2)我们设计的GSTRL框架有效地捕捉了整个系统的时空动态。我们在两个真实世界的数据集上评估GSTRL,实验表明其优越的性能。我们的源代码可以在补充材料中找到。
摘要:Public resource allocation involves the efficient distribution of resources, including urban infrastructure, energy, and transportation, to effectively meet societal demands. However, existing methods focus on optimizing the movement of individual resources independently, without considering their capacity constraints. To address this limitation, we propose a novel and more practical problem: Collaborative Public Resource Allocation (CPRA), which explicitly incorporates capacity constraints and spatio-temporal dynamics in real-world scenarios. We propose a new framework called Game-Theoretic Spatio-Temporal Reinforcement Learning (GSTRL) for solving CPRA. Our contributions are twofold: 1) We formulate the CPRA problem as a potential game and demonstrate that there is no gap between the potential function and the optimal target, laying a solid theoretical foundation for approximating the Nash equilibrium of this NP-hard problem; and 2) Our designed GSTRL framework effectively captures the spatio-temporal dynamics of the overall system. We evaluate GSTRL on two real-world datasets, where experiments show its superior performance. Our source codes are available in the supplementary materials.


【5】Accelerating Real-World Overtaking in F1TENTH Racing Employing Reinforcement Learning Methods
标题:采用强化学习方法加速F1Tenth赛车中的真实世界超车
链接:https://arxiv.org/abs/2510.26040

作者:Emily Steiner, Daniel van der Spuy, Futian Zhou, Afereti Pama, Minas Liarokapis, Henry Williams
摘要:虽然计时赛场景中的自主赛车性能已经取得了显著的进步和发展,但自主轮对轮赛车和超车仍然受到严重限制。这些限制在现实驾驶场景中尤其明显,在这些场景中,最先进的算法难以安全或可靠地完成超车操作。这一点很重要,因为围绕其他车辆的可靠导航对于安全的自主轮对轮比赛至关重要。F1Tenth比赛为在标准化物理平台上开发轮对轮赛车算法提供了一个有用的机会。该比赛的形式使得超车和轮对轮赛车算法能够与最先进水平进行对比评估。这项研究提出了一种新型赛车与超车智能体,能够在模拟和现实中学习可靠地沿赛道行驶并超越对手。该智能体被部署在F1Tenth车辆上,与现实世界中运行各种竞争算法的对手进行比赛。结果表明,针对对手进行训练使智能体能够实现有意的超车行为,超车率达87%,而仅针对竞速训练的智能体为56%。
摘要:While autonomous racing performance in Time-Trial scenarios has seen significant progress and development, autonomous wheel-to-wheel racing and overtaking are still severely limited. These limitations are particularly apparent in real-life driving scenarios where state-of-the-art algorithms struggle to safely or reliably complete overtaking manoeuvres. This is important, as reliable navigation around other vehicles is vital for safe autonomous wheel-to-wheel racing. The F1Tenth Competition provides a useful opportunity for developing wheel-to-wheel racing algorithms on a standardised physical platform. The competition format makes it possible to evaluate overtaking and wheel-to-wheel racing algorithms against the state-of-the-art. This research presents a novel racing and overtaking agent capable of learning to reliably navigate a track and overtake opponents in both simulation and reality. The agent was deployed on an F1Tenth vehicle and competed against opponents running varying competitive algorithms in the real world. The results demonstrate that the agent's training against opponents enables deliberate overtaking behaviours with an overtaking rate of 87% compared to 56% for an agent trained just to race.


【6】Non-myopic Matching and Rebalancing in Large-Scale On-Demand Ride-Pooling Systems Using Simulation-Informed Reinforcement Learning
标题:使用模拟信息强化学习的大规模按需拼车系统中的非短视匹配和再平衡
链接:https://arxiv.org/abs/2510.25796

作者:Farnoosh Namdarpour, Joseph Y. J. Chow
摘要 :拼车(Ride-pooling),也被称为乘车共享(ride-sharing)、共享网约车(shared ride-hailing)或微交通(microtransit),是一种乘客共享行程的服务。这项服务可以降低乘客和运营商的成本,减少拥堵和环境影响。然而,一个关键的限制是其短视决策,忽视了调度决策的长期影响。为了解决这个问题,我们提出了一种模拟信息强化学习(RL)方法。虽然RL已经在网约车系统的背景下被广泛研究,但它在拼车系统中的应用却很少被探索。在这项研究中,我们将Xu et al.(2018)的学习与规划框架从网约车扩展到拼车,通过在学习机制中嵌入拼车模拟来实现非短视决策。此外,我们提出了一个用于重新平衡闲置车辆的补充策略。通过对模拟经验采用n步时间差分学习,我们推导出时空状态值,并随后使用纽约市出租车请求数据评估非短视策略的有效性。结果表明,非短视匹配策略可将服务率比短视策略提高多达8.4%,同时减少乘客的车内时间和等待时间。此外,与短视策略相比,所提出的非短视策略可以在保持相同性能水平的情况下将车队规模减少25%以上,从而为运营商节省大量成本。与仅将该框架用于匹配决策相比,将再平衡操作纳入所提出的框架可将等待时间减少多达27.3%,车内时间减少12.5%,并将服务率提高15.1%,代价是每位乘客的车辆行驶分钟数增加。
摘要:Ride-pooling, also known as ride-sharing, shared ride-hailing, or microtransit, is a service wherein passengers share rides. This service can reduce costs for both passengers and operators and reduce congestion and environmental impacts. A key limitation, however, is its myopic decision-making, which overlooks long-term effects of dispatch decisions. To address this, we propose a simulation-informed reinforcement learning (RL) approach. While RL has been widely studied in the context of ride-hailing systems, its application in ride-pooling systems has been less explored. In this study, we extend the learning and planning framework of Xu et al. (2018) from ride-hailing to ride-pooling by embedding a ride-pooling simulation within the learning mechanism to enable non-myopic decision-making. In addition, we propose a complementary policy for rebalancing idle vehicles. By employing n-step temporal difference learning on simulated experiences, we derive spatiotemporal state values and subsequently evaluate the effectiveness of the non-myopic policy using NYC taxi request data. Results demonstrate that the non-myopic policy for matching can increase the service rate by up to 8.4% versus a myopic policy while reducing both in-vehicle and wait times for passengers. Furthermore, the proposed non-myopic policy can decrease fleet size by over 25% compared to a myopic policy, while maintaining the same level of performance, thereby offering significant cost savings for operators. Incorporating rebalancing operations into the proposed framework cuts wait time by up to 27.3%, in-vehicle time by 12.5%, and raises service rate by 15.1% compared to using the framework for matching decisions alone at the cost of increased vehicle minutes traveled per passenger.
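The n-step temporal-difference step on simulated experiences can be sketched in tabular form over (zone, time-slot) states; the state and reward construction below are illustrative assumptions, not the paper's exact pipeline.

```python
import collections

# Tabular n-step TD learning over spatiotemporal states (zone, time-slot),
# as used to value dispatch decisions from simulated ride-pooling episodes.
def nstep_td(episodes, n=3, gamma=0.95, alpha=0.1):
    V = collections.defaultdict(float)
    for traj in episodes:  # traj: list of ((zone, t), reward) pairs
        states = [s for s, _ in traj]
        rewards = [r for _, r in traj]
        T = len(traj)
        for i in range(T):
            # n-step return: discounted rewards plus a bootstrapped tail value
            G, horizon = 0.0, min(n, T - i)
            for k in range(horizon):
                G += (gamma ** k) * rewards[i + k]
            if i + n < T:
                G += (gamma ** n) * V[states[i + n]]
            V[states[i]] += alpha * (G - V[states[i]])
    return V

demo = [[(("midtown", 8), 1.0), (("downtown", 9), 0.0), (("airport", 10), 2.0)]]
print(dict(nstep_td(demo)))
```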


符号|符号学习(2篇)

【1】Towards Scaling Laws for Symbolic Regression
标题:迈向符号回归的标度定律
链接:https://arxiv.org/abs/2510.26064

作者:David Otte, Jörg K.H. Franke, Frank Hutter
备注:Accepted at the NeurIPS 2025 Math-AI Workshop
摘要:符号回归(Symbolic Regression,SR)旨在发现解释观测数据的潜在数学表达式。这为获得科学洞察以及为表格数据产生内在可解释且可泛化的模型带来了希望。在这项工作中,我们专注于SR的基础问题。基于深度学习的SR最近已经成为遗传编程方法的有力竞争者,但规模的作用在很大程度上尚未被探索。受语言建模中缩放定律的启发,我们使用可扩展的端到端Transformer管道和精心生成的训练数据,首次系统地研究了SR中的缩放规律。在五种不同的模型大小和跨越三个数量级的计算量上,我们发现验证损失和求解率都随计算量呈现明显的幂律趋势。我们进一步确定了计算最优的超参数缩放:最优批量大小和学习率随模型大小增长,并且在我们的设置中,约15的token-参数比是最优的,且随着计算量增加略有上升趋势。这些结果表明,SR性能在很大程度上可由计算量预测,并为训练下一代SR模型提供了重要的见解。
摘要:Symbolic regression (SR) aims to discover the underlying mathematical expressions that explain observed data. This holds promise for both gaining scientific insight and for producing inherently interpretable and generalizable models for tabular data. In this work we focus on the basics of SR. Deep learning-based SR has recently become competitive with genetic programming approaches, but the role of scale has remained largely unexplored. Inspired by scaling laws in language modeling, we present the first systematic investigation of scaling in SR, using a scalable end-to-end transformer pipeline and carefully generated training data. Across five different model sizes and spanning three orders of magnitude in compute, we find that both validation loss and solved rate follow clear power-law trends with compute. We further identify compute-optimal hyperparameter scaling: optimal batch size and learning rate grow with model size, and a token-to-parameter ratio of $\approx$15 is optimal in our regime, with a slight upward trend as compute increases. These results demonstrate that SR performance is largely predictable from compute and offer important insights for training the next generation of SR models.
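Power-law trends of the kind reported here are conventionally estimated by a linear fit in log-log space; a minimal sketch with made-up points (not the paper's measurements):

```python
import numpy as np

# Fit a power law loss = a * compute^(-b) via linear regression in log-log
# space, the standard way scaling-law exponents are estimated.
compute = np.array([1e17, 1e18, 1e19, 1e20])   # illustrative FLOP budgets
loss = np.array([1.30, 0.95, 0.70, 0.52])      # illustrative validation losses

slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
a, b = np.exp(intercept), -slope
print(f"fitted power law: loss ~ {a:.3g} * C^(-{b:.3f})")
```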


【2】SABER: Symbolic Regression-based Angle of Arrival and Beam Pattern Estimator
标题:SABER:基于符号回归的到达角和射束模式估计
链接:https://arxiv.org/abs/2510.26340

作者:Shih-Kai Chou, Mengran Zhao, Cheng-Nan Hu, Kuang-Chung Chou, Carolina Fortuna, Jernej Hribar
备注:12 pages, 11 figures
摘要:准确的到达角(Angle-of-Arrival,AoA)估计对于下一代无线通信系统实现可靠的波束成形、高精度定位和集成感知是必不可少的。不幸的是,经典的高分辨率技术需要多元件阵列和大量的快照采集,而通用的机器学习(ML)方法通常会产生缺乏物理可解释性的黑盒模型。为了解决这些限制,我们提出了一个基于符号回归(SR)的ML框架,即基于符号回归的到达角和波束方向图估计器(SABER),这是一种带约束的符号回归框架,能够自动从路径损耗测量中发现具有可解释性的闭式波束方向图和AoA模型。SABER实现了高精度,同时弥合了不透明ML方法和可解释的物理驱动估计器之间的差距。首先,我们在受控的自由空间电波暗室中验证了我们的方法,结果表明,对已知$\cos^n$波束的直接反演和低阶多项式代理均实现了低于0.5度的平均绝对误差(MAE)。纯粹无约束的SR方法可以进一步减少预测角度的误差,但产生的公式复杂且缺乏物理洞察力。然后,我们在现实世界的可重构智能表面(RIS)辅助室内测试平台中实现了相同的SR学习反演。SABER和无约束SR模型能以接近零的误差精确地恢复真实的AoA。最后,我们将SABER与Cramér-Rao下限(CRLB)进行基准比较。我们的研究结果表明,SABER是最先进的基于黑盒ML的AoA估计方法的一种可解释且准确的替代方案。
摘要:Accurate Angle-of-arrival (AoA) estimation is essential for next-generation wireless communication systems to enable reliable beamforming, high-precision localization, and integrated sensing. Unfortunately, classical high-resolution techniques require multi-element arrays and extensive snapshot collection, while generic Machine Learning (ML) approaches often yield black-box models that lack physical interpretability. To address these limitations, we propose a Symbolic Regression (SR)-based ML framework. Namely, Symbolic Regression-based Angle of Arrival and Beam Pattern Estimator (SABER), a constrained symbolic-regression framework that automatically discovers closed-form beam pattern and AoA models from path loss measurements with interpretability. SABER achieves high accuracy while bridging the gap between opaque ML methods and interpretable physics-driven estimators. First, we validate our approach in a controlled free-space anechoic chamber, showing that both direct inversion of the known $\cos^n$ beam and a low-order polynomial surrogate achieve sub-0.5 degree Mean Absolute Error (MAE). A purely unconstrained SR method can further reduce the error of the predicted angles, but produces complex formulas that lack physical insight. Then, we implement the same SR-learned inversions in a real-world, Reconfigurable Intelligent Surface (RIS)-aided indoor testbed. SABER and unconstrained SR models accurately recover the true AoA with near-zero error. Finally, we benchmark SABER against the Cramér-Rao Lower Bounds (CRLBs). Our results demonstrate that SABER is an interpretable and accurate alternative to state-of-the-art and black-box ML-based methods for AoA estimation.
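The "direct inversion of the known $\cos^n$ beam" admits a closed form: from received power relative to boresight, theta = arccos((P/P0)^(1/n)). A minimal sketch with an assumed beam exponent and noiseless powers:

```python
import numpy as np

# Direct inversion of an assumed cos^n beam pattern for AoA estimation:
# P(theta) = P0 * cos(theta)^n  =>  theta = arccos((P / P0)**(1/n)).
def aoa_from_power(p_rx, p0, n):
    ratio = np.clip(p_rx / p0, 0.0, 1.0)   # guard against noise overshoot
    return np.degrees(np.arccos(ratio ** (1.0 / n)))

n = 4                       # illustrative beam exponent
true_theta = 20.0           # degrees
p0 = 1.0
p_rx = p0 * np.cos(np.radians(true_theta)) ** n
print(f"recovered AoA: {aoa_from_power(p_rx, p0, n):.2f} deg")
```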


医学相关(2篇)

【1】Predicting All-Cause Hospital Readmissions from Medical Claims Data of Hospitalised Patients
标题:根据住院患者的医疗索赔数据预测全因再入院
链接:https://arxiv.org/abs/2510.26188

作者:Avinash Kadimisetty, Arun Rajagopalan, Vijendra SK
备注:NCMLAI 2018
摘要 :减少可预防的再入院是寻求改善医疗保健和降低成本的支付者、提供者和政策制定者的国家优先事项。再入院率被用作确定医院提供的医疗保健质量的基准。在这个项目中,我们使用了机器学习技术,如逻辑回归,随机森林和支持向量机来分析健康索赔数据,并确定在预测全因再入院方面发挥关键作用的人口统计和医疗因素。由于健康索赔数据是高维的,我们使用主成分分析作为降维技术,并使用的结果建立回归模型。我们根据曲线下面积(AUC)指标对这些模型进行了比较和评估。随机森林模型的性能最高,其次是Logistic回归和支持向量机模型。这些模型可用于确定导致再入院的关键因素,并帮助确定需要重点关注的患者,以减少再入院的机会,最终降低成本并提高为患者提供的医疗保健质量。
摘要:Reducing preventable hospital readmissions is a national priority for payers, providers, and policymakers seeking to improve health care and lower costs. The rate of readmission is being used as a benchmark to determine the quality of healthcare provided by the hospitals. In this project, we have used machine learning techniques like Logistic Regression, Random Forest and Support Vector Machines to analyze the health claims data and identify demographic and medical factors that play a crucial role in predicting all-cause readmissions. As the health claims data is high dimensional, we have used Principal Component Analysis as a dimension reduction technique and used the results for building regression models. We compared and evaluated these models based on the Area Under Curve (AUC) metric. Random Forest model gave the highest performance followed by Logistic Regression and Support Vector Machine models. These models can be used to identify the crucial factors causing readmissions and help identify patients to focus on to reduce the chances of readmission, ultimately bringing down the cost and increasing the quality of healthcare provided to the patients.
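A minimal sketch of the described pipeline, PCA for dimension reduction followed by classifiers compared on AUC, using synthetic stand-in data (the study's claims-derived features are not public here):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for high-dimensional claims data with an imbalanced readmission label.
X, y = make_classification(n_samples=2000, n_features=200, n_informative=20,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in [("logreg", LogisticRegression(max_iter=1000)),
                  ("random_forest", RandomForestClassifier(random_state=0))]:
    pipe = make_pipeline(StandardScaler(), PCA(n_components=30), clf)
    pipe.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, pipe.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```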


【2】MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs
标题:MedVLSynther:使用生成器-验证器LMM从医疗文献中合成高质量视觉问答
链接:https://arxiv.org/abs/2510.25867

作者:Xiaoke Huang, Ningsen Wang, Hui Liu, Xianfeng Tang, Yuyin Zhou
备注:Project page, code, data, and models: this https URL
摘要:大型多模态模型(Large Multimodal Models,LMM)越来越能够回答需要对图像和文本进行联合推理的医学问题,但由于缺乏大型、开放可用的高质量语料库,训练通用医学VQA系统受到阻碍。我们提出了MedVLSynther,一个以评分准则为指导的生成器-验证器框架,通过以图表、标题和文内引用为条件,直接从开放的生物医学文献中合成高质量的多项选择VQA条目。生成器在机器可检查的JSON模式下生成自包含的题干和并列、互斥的选项;多阶段验证器强制执行基本门控(自包含、唯一正确答案、临床有效性、图文一致性),给予细粒度的加分,并在接受之前惩罚常见的失败模式。将此管道应用于PubMed Central产生了MedSynVQA:13,087个经审核的问题,涉及14,803张图像,涵盖13种成像模态和28个解剖区域。使用可验证奖励通过强化学习训练开放权重LMM,可以提高六个医学VQA基准的准确性,平均达到55.85(3B)和58.15(7B),在VQA-RAD上最高可达77.57,在PathVQA上最高可达67.76,优于强大的医学LMM。消融实验验证了生成和验证都是必要的,且经过验证的数据越多帮助越大;有针对性的污染分析未检测到来自评估套件的泄漏。通过完全基于开放文献和开放权重模型进行操作,MedVLSynther为可扩展的医学VQA训练数据提供了一条可审计、可重现和隐私保护的路径。
摘要:Large Multimodal Models (LMMs) are increasingly capable of answering medical questions that require joint reasoning over images and text, yet training general medical VQA systems is impeded by the lack of large, openly usable, high-quality corpora. We present MedVLSynther, a rubric-guided generator-verifier framework that synthesizes high-quality multiple-choice VQA items directly from open biomedical literature by conditioning on figures, captions, and in-text references. The generator produces self-contained stems and parallel, mutually exclusive options under a machine-checkable JSON schema; a multi-stage verifier enforces essential gates (self-containment, single correct answer, clinical validity, image-text consistency), awards fine-grained positive points, and penalizes common failure modes before acceptance. Applying this pipeline to PubMed Central yields MedSynVQA: 13,087 audited questions over 14,803 images spanning 13 imaging modalities and 28 anatomical regions. Training open-weight LMMs with reinforcement learning using verifiable rewards improves accuracy across six medical VQA benchmarks, achieving averages of 55.85 (3B) and 58.15 (7B), with up to 77.57 on VQA-RAD and 67.76 on PathVQA, outperforming strong medical LMMs. Ablations verify that both generation and verification are necessary and that more verified data consistently helps, and a targeted contamination analysis detects no leakage from evaluation suites. By operating entirely on open literature and open-weight models, MedVLSynther offers an auditable, reproducible, and privacy-preserving path to scalable medical VQA training data.
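The machine-checkable gates can be illustrated with a small validator over a generated item; the field names below are assumptions for illustration, not MedVLSynther's actual schema.

```python
# Minimal sketch of a machine-checkable verifier gate for generated
# multiple-choice VQA items. Field names are illustrative assumptions.
REQUIRED = {"stem", "options", "answer_index"}

def passes_gates(item: dict) -> bool:
    if not REQUIRED.issubset(item):
        return False
    opts = item["options"]
    if len(opts) < 2 or len(set(opts)) != len(opts):   # parallel, non-duplicate
        return False
    if not (0 <= item["answer_index"] < len(opts)):    # single correct answer
        return False
    # Self-containment heuristic: the stem must not defer to external text.
    banned = ("as mentioned in the paper", "see text")
    return not any(b in item["stem"].lower() for b in banned)

item = {"stem": "Which imaging modality is shown?",
        "options": ["CT", "MRI", "Ultrasound", "X-ray"], "answer_index": 1}
print(passes_gates(item))  # True
```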


超分辨率|去噪|去模糊|去雾(1篇)

【1】Enabling Fast and Accurate Neutral Atom Readout through Image Denoising
标题:通过图像去噪实现快速准确的中性原子读取
链接:https://arxiv.org/abs/2510.25982

作者:Chaithanya Naik Mude, Linipun Phuttitarn, Satvik Maurya, Kunal Sinha, Mark Saffman, Swamit Tannu
备注:12 pages, 15 figures
摘要:中性原子量子计算机有望扩展到数十万个量子比特,但它们的进展受到量子比特读出速度缓慢的限制。测量量子比特目前需要几毫秒,远长于底层的量子门操作,这使得读出成为部署量子纠错(QEC)的主要瓶颈。由于每轮QEC都取决于测量,因此长的读出时间会增加周期持续时间并降低程序执行速度。缩短读出持续时间可以加快周期并减少量子比特空闲时积累的退相干误差,但它也会减少收集的光子数量,使测量噪声更大、更容易出错。这种权衡使中性原子系统被困在缓慢但准确的读出与快速但不可靠的读出之间。   我们表明,图像去噪可以化解这种矛盾。我们的框架GANDALF使用基于图像翻译的显式去噪,从短时、低光子测量中重建清晰的信号,从而能够在缩短多达1.6倍的读出时间下进行可靠的分类。与轻量级分类器和流水线读出设计相结合,与铯(Cs)中性原子阵列最先进的基于CNN的读出相比,我们的方法将逻辑错误率降低了多达35倍,整体QEC周期时间缩短了多达1.77倍。
摘要:Neutral atom quantum computers hold promise for scaling up to hundreds of thousands of qubits, but their progress is constrained by slow qubit readout. Measuring qubits currently takes milliseconds-much longer than the underlying quantum gate operations-making readout the primary bottleneck in deploying quantum error correction. Because each round of QEC depends on measurement, long readout times increase cycle duration and slow down program execution. Reducing the readout duration speeds up cycles and reduces decoherence errors that accumulate while qubits idle, but it also lowers the number of collected photons, making measurements noisier and more error-prone. This tradeoff leaves neutral atom systems stuck between slow but accurate readout and fast but unreliable readout.   We show that image denoising can resolve this tension. Our framework, GANDALF, uses explicit denoising using image translation to reconstruct clear signals from short, low-photon measurements, enabling reliable classification at up to 1.6x shorter readout times. Combined with lightweight classifiers and a pipelined readout design, our approach both reduces logical error rate by up to 35x and overall QEC cycle time up to 1.77x compared to state-of-the-art CNN-based readout for Cesium (Cs) Neutral Atom arrays.


联邦学习|隐私保护|加密(1篇)

【1】Non-Convex Over-the-Air Heterogeneous Federated Learning: A Bias-Variance Trade-off
标题:非凸空中异类联邦学习:偏差方差权衡
链接:https://arxiv.org/abs/2510.26722

作者:Muhammad Faraz Ul Abrar, Nicolò Michelusi
摘要:空中(OTA)联邦学习(FL)已被公认为是一种可扩展的范式,它利用无线多址信道的波形叠加,在单次信道使用中聚合模型更新。现有的OTA-FL设计主要通过假设同质无线条件(跨设备路径损耗相等)或强制零偏置更新以保证收敛,来实现零偏置的模型更新。然而,在异构无线场景下,这样的设计受制于最弱的设备,并且会放大更新方差。此外,先前对有偏OTA-FL的分析主要针对凸目标,而大多数现代AI模型都是高度非凸的。出于这些差距,我们研究了在无线异构性下针对一般光滑非凸目标、采用随机梯度下降(SGD)的OTA-FL。我们开发了新的OTA-FL SGD更新,允许结构化的、时不变的模型偏置,同时促进方差更小的更新。我们推导出一个有限时间平稳性界(期望的时间平均平方梯度范数),明确揭示了偏差-方差权衡。为了优化这种权衡,我们提出了一个非凸的联合OTA功率控制设计,并开发了一个高效的连续凸近似(SCA)算法,只需要基站处的统计CSI。非凸图像分类任务上的实验验证了该方法:基于SCA的设计通过优化的偏置加速收敛,并比先前的OTA-FL基线提高了泛化能力。
摘要 :Over-the-air (OTA) federated learning (FL) has been well recognized as a scalable paradigm that exploits the waveform superposition of the wireless multiple-access channel to aggregate model updates in a single use. Existing OTA-FL designs largely enforce zero-bias model updates by either assuming homogeneous wireless conditions (equal path loss across devices) or forcing zero-bias updates to guarantee convergence. Under heterogeneous wireless scenarios, however, such designs are constrained by the weakest device and inflate the update variance. Moreover, prior analyses of biased OTA-FL largely address convex objectives, while most modern AI models are highly non-convex. Motivated by these gaps, we study OTA-FL with stochastic gradient descent (SGD) for general smooth non-convex objectives under wireless heterogeneity. We develop novel OTA-FL SGD updates that allow a structured, time-invariant model bias while facilitating reduced variance updates. We derive a finite-time stationarity bound (expected time average squared gradient norm) that explicitly reveals a bias-variance trade-off. To optimize this trade-off, we pose a non-convex joint OTA power-control design and develop an efficient successive convex approximation (SCA) algorithm that requires only statistical CSI at the base station. Experiments on a non-convex image classification task validate the approach: the SCA-based design accelerates convergence via an optimized bias and improves generalization over prior OTA-FL baselines.


推理|分析|理解|解释(10篇)

【1】Defeating the Training-Inference Mismatch via FP16
标题:通过FP 16克服训练-推理不匹配
链接:https://arxiv.org/abs/2510.26788

作者:Penghui Qi, Zichen Liu, Xiangxin Zhou, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin
摘要:大型语言模型(LLM)的强化学习(RL)微调通常由于训练和推理策略之间的数值不匹配而不稳定。虽然先前的工作试图通过算法校正或工程对齐来缓解这个问题,但我们发现其根本原因在于浮点精度本身。广泛采用的BF16尽管动态范围很大,但会引入较大的舍入误差,从而破坏训练和推理之间的一致性。在这项工作中,我们证明了简单地恢复到FP16即可有效消除这种不匹配。这种变化很简单,完全由现代框架支持,只需要几行代码更改,并且不需要修改模型架构或学习算法。我们的研究结果表明,统一使用FP16可以在不同的任务、算法和框架中产生更稳定的优化、更快的收敛和更强的性能。我们希望这些发现能够激发人们对RL微调中精度权衡的更广泛的重新考虑。
摘要:Reinforcement learning (RL) fine-tuning of large language models (LLMs) often suffers from instability due to the numerical mismatch between the training and inference policies. While prior work has attempted to mitigate this issue through algorithmic corrections or engineering alignments, we show that its root cause lies in the floating point precision itself. The widely adopted BF16, despite its large dynamic range, introduces large rounding errors that breaks the consistency between training and inference. In this work, we demonstrate that simply reverting to FP16 effectively eliminates this mismatch. The change is simple, fully supported by modern frameworks with only a few lines of code change, and requires no modification to the model architecture or learning algorithm. Our results suggest that using FP16 uniformly yields more stable optimization, faster convergence, and stronger performance across diverse tasks, algorithms and frameworks. We hope these findings motivate a broader reconsideration of precision trade-offs in RL fine-tuning.
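The rounding-error claim is easy to check numerically: BF16 carries roughly 8 significand bits versus FP16's 11, so its rounding grid is several times coarser on values in [0, 1), the regime of probabilities and logits. A minimal PyTorch check:

```python
import torch

# BF16 keeps FP32's exponent range but fewer significand bits than FP16;
# the coarser BF16 grid yields visibly larger round-trip error.
x = torch.rand(100_000, dtype=torch.float32)

for dtype in (torch.bfloat16, torch.float16):
    err = (x - x.to(dtype).to(torch.float32)).abs()
    print(f"{dtype}: mean rounding error = {err.mean().item():.2e}")
# Expected: the bfloat16 error is several times larger than float16's here.
```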


【2】Towards Explainable and Reliable AI in Finance
标题:迈向金融领域可解释且可靠的人工智能
链接:https://arxiv.org/abs/2510.26353

作者:Albi Isufaj, Pablo Mollá, Helmut Prendinger
摘要:财务预测越来越多地使用大型神经网络模型,但它们的不透明性给信任和监管合规带来了挑战。我们提出了几种实现可解释且可靠的金融AI的方法。首先,我们描述了时间序列基础模型Time-LLM如何使用提示来避免错误的方向预测。其次,我们表明,将时间序列预测的基础模型与可靠性估计器相结合,可以过滤掉不可靠的预测。第三,我们主张用符号推理编码领域规则,以提供透明的论证。这些方法将重点转向只执行既可靠又可解释的预测。对股票和加密货币数据的实验表明,该架构减少了误报,并支持选择性执行。通过将预测性能与可靠性估计和基于规则的推理相结合,我们的框架推进了透明且可审计的金融AI系统。
摘要:Financial forecasting increasingly uses large neural network models, but their opacity raises challenges for trust and regulatory compliance. We present several approaches to explainable and reliable AI in finance. First, we describe how Time-LLM, a time series foundation model, uses a prompt to avoid a wrong directional forecast. Second, we show that combining foundation models for time series forecasting with a reliability estimator can filter out unreliable predictions. Third, we argue for symbolic reasoning encoding domain rules for transparent justification. These approaches shift the emphasis to executing only forecasts that are both reliable and explainable. Experiments on equity and cryptocurrency data show that the architecture reduces false positives and supports selective execution. By integrating predictive performance with reliability estimation and rule-based reasoning, our framework advances transparent and auditable financial AI systems.


【3】Understanding Hardness of Vision-Language Compositionality from A Token-level Causal Lens
标题:从词元级因果视角理解视觉-语言组合性的难度
链接:https://arxiv.org/abs/2510.26302

作者:Ziliang Chen, Tianang Xiao, Jusheng Zhang, Yongsen Zheng, Xipeng Chen
摘要:对比语言-图像预训练(CLIP)通过在共享嵌入空间中对齐图像和文本来提供强大的跨模态泛化,但它在对象、属性和关系的组合推理方面持续失败,通常表现得像一个词袋匹配器。先前的因果解释通常将文本建模为单个向量,模糊了词元级结构,并使核心现象(例如提示敏感性和在困难负样本上的失败)得不到解释。我们使用基于顺序语言词元SCM的词元感知因果表示学习(CRL)框架来弥补这一差距。我们的理论将块可识别性扩展到词元化文本,证明CLIP的对比目标可以在句子级和词元级SCM下恢复模态不变的潜在变量。至关重要的是,词元粒度为CLIP的组合脆性给出了第一个原则性解释:组合不可识别性。我们证明了伪最优文本编码器的存在,这些编码器实现了完美的模态不变对齐,但可证明对原子概念上的SWAP、REPLACE和ADD操作不敏感,因此尽管优化了与真正最优编码器相同的训练目标,仍无法区分正确的描述文本与困难负样本。该分析进一步通过模态差距将语言侧的不可识别性与视觉侧的失败联系起来,并展示了迭代的组合算子如何加剧难度,从而推动改进的负样本挖掘策略。
摘要:Contrastive Language-Image Pre-training (CLIP) delivers strong cross modal generalization by aligning images and texts in a shared embedding space, yet it persistently fails at compositional reasoning over objects, attributes, and relations often behaving like a bag-of-words matcher. Prior causal accounts typically model text as a single vector, obscuring token-level structure and leaving core phenomena-such as prompt sensitivity and failures on hard negatives unexplained. We address this gap with a token-aware causal representation learning (CRL) framework grounded in a sequential, language-token SCM. Our theory extends block identifiability to tokenized text, proving that CLIP's contrastive objective can recover the modal-invariant latent variable under both sentence-level and token-level SCMs. Crucially, token granularity yields the first principled explanation of CLIP's compositional brittleness: composition nonidentifiability. We show the existence of pseudo-optimal text encoders that achieve perfect modal-invariant alignment yet are provably insensitive to SWAP, REPLACE, and ADD operations over atomic concepts, thereby failing to distinguish correct captions from hard negatives despite optimizing the same training objective as true-optimal encoders. The analysis further links language-side nonidentifiability to visual-side failures via the modality gap and shows how iterated composition operators compound hardness, motivating improved negative mining strategies.


【4】Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4
标题:Lean4PHYS:Lean4中大学物理的综合推理框架
链接:https://arxiv.org/abs/2510.26094

作者:Yuxin Li, Minghao Liu, Ruida Wang, Wenzhao Ji, Zhitao He, Rui Pan, Junming Huang, Tong Zhang, Yi R. Fung
摘要:我们提出了**Lean4PHYS**,这是针对Lean4中大学物理问题的综合推理框架。**Lean4PHYS**包括*LeanPhysBench*,一个Lean4中形式化物理推理的大学水平基准,其中包含源自大学教科书和物理竞赛问题的200条手工制作并经同行评审的命题。为了给物理学中的形式化推理建立坚实的基础,我们还介绍了*PhysLib*,这是一个社区驱动的存储库,包含形式化物理推理所必需的基本单位制和定理。基于我们在**Lean4PHYS**中构建的基准和Lean4存储库,我们使用主流的数学专家Lean4证明器和最先进的闭源模型报告了基线结果,其中表现最好的DeepSeek-Prover-V2-7B仅达到16%,Claude-Sonnet-4达到35%。我们还进行了详细的分析,显示我们的*PhysLib*可以使模型性能平均提高11.75%。这证明了我们的*LeanPhysBench*的挑战性和*PhysLib*的有效性。据我们所知,这是第一个在Lean4中提供物理基准的研究。
摘要:We present **Lean4PHYS**, a comprehensive reasoning framework for college-level physics problems in Lean4. **Lean4PHYS** includes *LeanPhysBench*, a college-level benchmark for formal physics reasoning in Lean4, which contains 200 hand-crafted and peer-reviewed statements derived from university textbooks and physics competition problems. To establish a solid foundation for formal reasoning in physics, we also introduce *PhysLib*, a community-driven repository containing fundamental unit systems and theorems essential for formal physics reasoning. Based on the benchmark and Lean4 repository we composed in **Lean4PHYS**, we report baseline results using major expert Math Lean4 provers and state-of-the-art closed-source models, with the best performance of DeepSeek-Prover-V2-7B achieving only 16% and Claude-Sonnet-4 achieving 35%. We also conduct a detailed analysis showing that our *PhysLib* can achieve an average improvement of 11.75% in model performance. This demonstrates the challenging nature of our *LeanPhysBench* and the effectiveness of *PhysLib*. To the best of our knowledge, this is the first study to provide a physics benchmark in Lean4.


【5】LLMBisect: Breaking Barriers in Bug Bisection with A Comparative Analysis Pipeline
标题:LLMBisect:通过比较分析管道打破漏洞分割中的障碍
链接:https://arxiv.org/abs/2510.26086

作者:Zheng Zhang, Haonan Li, Xingyu Li, Hang Zhang, Zhiyun Qian
摘要:漏洞二分(bug bisection)一直是一项重要的安全任务,旨在了解受漏洞影响的软件版本范围,即识别引入bug的提交。然而,传统的基于补丁的二分方法面临着几个重大障碍:例如,它们假设引入bug的提交(BIC)和补丁提交修改相同的函数,这并不总是成立。它们通常只依赖于代码更改,而提交消息中往往包含大量与漏洞相关的信息。它们还基于简单的启发式规则(例如,假设补丁中删除的代码行由BIC引入),缺乏对漏洞的任何逻辑分析。   在本文中,我们观察到大型语言模型(LLM)有能力打破现有解决方案的障碍,例如,同时理解补丁和提交中的文本数据与代码。与以前效果不佳的BIC识别方法不同,我们提出了一个全面的多阶段管道,利用LLM来:(1)充分利用补丁信息,(2)在上下文中比较多个候选提交,以及(3)通过一系列向下筛选步骤逐步缩小候选范围。在我们的评估中,我们证明了我们的方法比最先进的解决方案的准确性高出超过38%。我们的结果进一步证实,全面的多阶段管道是必不可少的,因为它比基于LLM的基线二分方法提高了60%的精度。
摘要:Bug bisection has been an important security task that aims to understand the range of software versions impacted by a bug, i.e., identifying the commit that introduced the bug. However, traditional patch-based bisection methods are faced with several significant barriers: For example, they assume that the bug-inducing commit (BIC) and the patch commit modify the same functions, which is not always true. They often rely solely on code changes, while the commit message frequently contains a wealth of vulnerability-related information. They are also based on simple heuristics (e.g., assuming the BIC initializes lines deleted in the patch) and lack any logical analysis of the vulnerability.   In this paper, we make the observation that Large Language Models (LLMs) are well-positioned to break the barriers of existing solutions, e.g., comprehend both textual data and code in patches and commits. Unlike previous BIC identification approaches, which yield poor results, we propose a comprehensive multi-stage pipeline that leverages LLMs to: (1) fully utilize patch information, (2) compare multiple candidate commits in context, and (3) progressively narrow down the candidates through a series of down-selection steps. In our evaluation, we demonstrate that our approach achieves significantly better accuracy than the state-of-the-art solution by more than 38%. Our results further confirm that the comprehensive multi-stage pipeline is essential, as it improves accuracy by 60% over a baseline LLM-based bisection method.


【6】Dual Mixture-of-Experts Framework for Discrete-Time Survival Analysis
标题:离散时间生存分析的双重混合专家框架
链接:https://arxiv.org/abs/2510.26014

作者:Hyeonjun Lee, Hyungseob Shin, Gunhee Nam, Hyeonsoo Lee
备注:Accepted to NeurIPS 2025 workshop Learning from Time Series for Health (TS4H)
摘要:生存分析是一项对感兴趣的事件发生之前的时间进行建模的任务,广泛用于临床和生物医学研究。一个关键的挑战是对患者异质性进行建模,同时还要根据个体特征和时间动态调整风险预测。我们提出了一个用于离散时间生存分析的双重混合专家(MoE)框架。我们的方法将用于子群感知表示学习的特征编码器MoE与利用患者特征和时间嵌入来捕捉时间动态的风险(hazard)MoE相结合。这种双MoE设计可灵活地与现有的基于深度学习的生存分析管道集成。在METABRIC和GBSG乳腺癌数据集上,我们的方法始终提高了性能,将测试集上的时间依赖性C指数提高多达0.04,并在纳入Consurv框架时进一步提升了性能。
摘要:Survival analysis is a task to model the time until an event of interest occurs, widely used in clinical and biomedical research. A key challenge is to model patient heterogeneity while also adapting risk predictions to both individual characteristics and temporal dynamics. We propose a dual mixture-of-experts (MoE) framework for discrete-time survival analysis. Our approach combines a feature-encoder MoE for subgroup-aware representation learning with a hazard MoE that leverages patient features and time embeddings to capture temporal dynamics. This dual-MoE design flexibly integrates with existing deep learning based survival pipelines. On METABRIC and GBSG breast cancer datasets, our method consistently improves performance, boosting the time-dependent C-index up to 0.04 on the test sets, and yields further gains when incorporated into the Consurv framework.


【7】Review Based Entity Ranking using Fuzzy Logic Algorithmic Approach: Analysis
标题:使用模糊逻辑算法方法的基于评论的实体排名:分析
链接:https://arxiv.org/abs/2510.25778

作者:Pratik N. Kalamkar, Anupama G. Phakatkar
备注:10 pages, 3 figures, International Journal Of Engineering And Computer Science ISSN:2319-7242
摘要:意见挖掘,也称为情感分析,是分析人们对产品,服务,组织,个人,问题,事件,主题及其属性等实体的意见,情感,评价,评价,态度和情感的研究领域。基于词典的整体方法不考虑每个观点的强度,即,该意见是非常强烈的负面(或正面)、强烈的负面(或正面)、中等的负面(或正面)、非常弱的负面(或正面)和弱的负面(或正面)。在本文中,我们提出的方法来排名实体的方向和强度的基础上的实体评论和用户的查询,通过将它们在粒度级别(即非常弱,弱,中等,非常强和强)相结合的意见词(即副词,形容词,名词和动词),涉及到某些产品的兴趣方面。我们将使用模糊逻辑算法的方法,以便将意见词分为不同的类别和句法依赖决议,以找到所需的方面的话的关系。与感兴趣的某些方面相关的意见词被认为是在评论中找到该方面的实体分数。
摘要 :Opinion mining, also called sentiment analysis, is the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes. Holistic lexicon-based approach does not consider the strength of each opinion, i.e., whether the opinion is very strongly negative (or positive), strongly negative (or positive), moderate negative (or positive), very weakly negative (or positive) and weakly negative (or positive). In this paper, we propose an approach to rank entities based on orientation and strength of the entity reviews and user's queries by classifying them in granularity levels (i.e. very weak, weak, moderate, very strong and strong) by combining opinion words (i.e. adverb, adjective, noun and verb) that are related to aspect of interest of certain product. We shall use fuzzy logic algorithmic approach in order to classify opinion words into different category and syntactic dependency resolution to find relations for desired aspect words. Opinion words related to certain aspects of interest are considered to find the entity score for that aspect in the review.


【8】Towards Piece-by-Piece Explanations for Chess Positions with SHAP
标题:通过SHAP实现国际象棋局面的逐子解释
链接:https://arxiv.org/abs/2510.25775

作者:Francesco Spinnato
摘要:当代国际象棋引擎提供精确但不透明的评估,通常以厘兵(centipawn)分数表示。这些输出虽然对决策有效,但掩盖了个别棋子或模式的潜在贡献。在本文中,我们探索将SHAP(SHapley加性解释)适配到国际象棋分析领域,旨在将国际象棋引擎的评估归因于棋盘上特定的棋子。通过将棋子视为特征并系统地消融它们,我们计算每个棋子的加性贡献,以局部忠实且人类可解释的方式解释引擎的输出。这种方法从经典国际象棋教学法中汲取灵感,即棋手通过在头脑中移除棋子来评估局面,并以现代可解释AI技术为其奠定基础。我们的方法为可视化、人类训练和引擎比较开辟了新的可能性。我们发布了配套的代码和数据,以促进可解释国际象棋AI的未来研究。
摘要:Contemporary chess engines offer precise yet opaque evaluations, typically expressed as centipawn scores. While effective for decision-making, these outputs obscure the underlying contributions of individual pieces or patterns. In this paper, we explore adapting SHAP (SHapley Additive exPlanations) to the domain of chess analysis, aiming to attribute a chess engine's evaluation to specific pieces on the board. By treating pieces as features and systematically ablating them, we compute additive, per-piece contributions that explain the engine's output in a locally faithful and human-interpretable manner. This method draws inspiration from classical chess pedagogy, where players assess positions by mentally removing pieces, and grounds it in modern explainable AI techniques. Our approach opens new possibilities for visualization, human training, and engine comparison. We release accompanying code and data to foster future research in interpretable chess AI.
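The piece-ablation idea corresponds to a Monte Carlo estimate of Shapley values over random piece orderings. In the sketch below, `engine_eval` is a hypothetical stand-in (a toy material count) for a real engine call on the ablated position; the exact attribution scheme in the paper may differ.

```python
import random

def engine_eval(pieces_present):
    # Toy material evaluation in centipawns; a real engine call goes here.
    values = {"P": 100, "N": 300, "B": 310, "R": 500, "Q": 900}
    return sum(values[p[0]] for p in pieces_present)

def shapley_by_ablation(pieces, n_samples=2000, seed=0):
    """Monte Carlo Shapley values: average marginal contribution of each
    piece over random orderings in which pieces are added to the board."""
    rng = random.Random(seed)
    phi = {p: 0.0 for p in pieces}
    for _ in range(n_samples):
        order = pieces[:]
        rng.shuffle(order)
        present, prev = [], engine_eval([])
        for p in order:
            present.append(p)
            cur = engine_eval(present)
            phi[p] += (cur - prev) / n_samples  # marginal contribution
            prev = cur
    return phi

pieces = ["Q_d1", "R_a1", "N_f3", "P_e4"]
print(shapley_by_ablation(pieces))
```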


【9】A Unified Theory for Causal Inference: Direct Debiased Machine Learning via Bregman-Riesz Regression
标题:因果推理的统一理论:通过Bregman-Riesz回归的直接去偏置机器学习
链接:https://arxiv.org/abs/2510.26783

作者:Masahiro Kato
摘要:本文介绍了一个统一的因果推理理论,它将Riesz回归、协变量平衡、密度比估计(DRE)、目标最大似然估计(TMLE)以及平均处理效应(ATE)估计中的匹配估计量整合在一起。在ATE估计中,平衡权重和结果的回归函数起着重要的作用,其中平衡权重依上下文被称为Riesz表示元、偏差校正项和聪明协变量。Riesz回归、协变量平衡、DRE和匹配估计量是用于估计平衡权重的方法,其中Riesz回归在ATE语境下本质上等价于DRE,匹配估计量是DRE的特殊情况,并且DRE与协变量平衡具有对偶关系。TMLE是一种构造回归函数估计量的方法,使得主导偏差项变为零。最近邻匹配等价于最小二乘密度比估计和Riesz回归。
摘要:This note introduces a unified theory for causal inference that integrates Riesz regression, covariate balancing, density-ratio estimation (DRE), targeted maximum likelihood estimation (TMLE), and the matching estimator in average treatment effect (ATE) estimation. In ATE estimation, the balancing weights and the regression functions of the outcome play important roles, where the balancing weights are referred to as the Riesz representer, bias-correction term, and clever covariates, depending on the context. Riesz regression, covariate balancing, DRE, and the matching estimator are methods for estimating the balancing weights, where Riesz regression is essentially equivalent to DRE in the ATE context, the matching estimator is a special case of DRE, and DRE is in a dual relationship with covariate balancing. TMLE is a method for constructing regression function estimators such that the leading bias term becomes zero. Nearest Neighbor Matching is equivalent to Least Squares Density Ratio Estimation and Riesz Regression.
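For reference, the role the note assigns to the balancing weight (Riesz representer) and the outcome regressions is captured by the standard debiased ATE estimator; a notational sketch in LaTeX, with W the treatment, X the covariates, Y the outcome, and, in the inverse-propensity special case, e(x) the propensity score (notation assumed here, not copied from the note):

```latex
\hat{\tau}_{\mathrm{ATE}}
  = \frac{1}{n}\sum_{i=1}^{n}
    \Bigl[\hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
    + \hat{\alpha}(W_i, X_i)\,\bigl(Y_i - \hat{\mu}_{W_i}(X_i)\bigr)\Bigr],
\qquad
\hat{\alpha}(w, x) = \frac{w}{\hat{e}(x)} - \frac{1-w}{1-\hat{e}(x)}.
```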


【10】Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation
标题:超越时域的共形预测:策略评估的无分布推断
链接:https://arxiv.org/abs/2510.26026

作者:Feichen Gan, Youcun Lu, Yingying Zhang, Yukun Liu
摘要:可靠的不确定性量化对于高风险环境中的强化学习(RL)至关重要。我们提出了一个用于无限时域策略评估的统一共形预测框架,在在策略(on-policy)和离策略(off-policy)两种设置下为回报构建无分布假设的预测区间。我们的方法将分布型RL(distributional RL)与共形校准相结合,解决了未观察到的回报、时间依赖性和分布偏移等挑战。我们提出了一种基于截断推演(rollout)的模块化伪回报构造,以及一种使用经验回放和加权子采样的时间感知校准策略。这些创新减轻了模型偏差,恢复了近似可交换性,使得即使在策略变化的情况下也能量化不确定性。我们的理论分析提供了考虑模型错误设定和重要性权重估计的覆盖保证。实证结果,包括在合成环境和Mountain Car等基准环境中的实验,表明我们的方法在覆盖率和可靠性上显著优于标准的分布型RL基线。
摘要:Reliable uncertainty quantification is crucial for reinforcement learning (RL) in high-stakes settings. We propose a unified conformal prediction framework for infinite-horizon policy evaluation that constructs distribution-free prediction intervals for returns in both on-policy and off-policy settings. Our method integrates distributional RL with conformal calibration, addressing challenges such as unobserved returns, temporal dependencies, and distributional shifts. We propose a modular pseudo-return construction based on truncated rollouts and a time-aware calibration strategy using experience replay and weighted subsampling. These innovations mitigate model bias and restore approximate exchangeability, enabling uncertainty quantification even under policy shifts. Our theoretical analysis provides coverage guarantees that account for model misspecification and importance weight estimation. Empirical results, including experiments in synthetic and benchmark environments like Mountain Car, show that our method significantly improves coverage and reliability over standard distributional RL baselines.
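The split-conformal step underlying such intervals can be sketched directly: nonconformity scores on a calibration set of (pseudo-)returns yield a finite-sample quantile that widens point predictions. The paper's weighted subsampling and time-aware reweighting are not reproduced in this simplified sketch.

```python
import numpy as np

# Split-conformal step: residual scores on calibration pseudo-returns give a
# quantile that calibrates prediction intervals for new returns.
def conformal_interval(pred_cal, returns_cal, pred_new, alpha=0.1):
    scores = np.abs(returns_cal - pred_cal)        # nonconformity scores
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))        # finite-sample rank
    q = np.sort(scores)[min(k, n) - 1]
    return pred_new - q, pred_new + q

rng = np.random.default_rng(0)
pred_cal = rng.normal(size=500)
returns_cal = pred_cal + rng.normal(scale=0.5, size=500)
lo, hi = conformal_interval(pred_cal, returns_cal, pred_new=0.3)
print(f"90% interval: [{lo:.2f}, {hi:.2f}]")
```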


检测相关(5篇)

【1】MSAD: A Deep Dive into Model Selection for Time series Anomaly Detection
标题:MSAD:深入研究时间序列异常检测的模型选择
链接:https://arxiv.org/abs/2510.26643

作者:Emmanouil Sylligardos, John Paparrizos, Themis Palpanas, Pierre Senellart, Paul Boniol
备注:25 pages, 13 figures, VLDB Journal
摘要:异常检测是时间序列分析的一项基本任务,对许多应用程序的下游性能具有重要影响。尽管学术界兴趣日增且文献中提出了大量方法,但最近的基准和评估研究表明,当应用于高度异构的时间序列数据集时,不存在整体最优的异常检测方法。因此,要在来自不同领域、差异巨大的时间序列上解决异常检测,唯一可扩展且可行的方案是提出一种模型选择方法,基于时间序列特征选择最佳的异常检测方法来运行。不幸的是,现有的AutoML解决方案并不直接适用于时间序列异常检测,也不存在对基于时间序列的模型选择方法的评估。朝着这个方向,本文研究了将时间序列分类方法用作异常检测的模型选择时的性能。总的来说,我们在1980多个时间序列上评估了源自16个基础分类器的234个模型配置,并提出了第一个将时间序列分类用作异常检测模型选择的广泛实验评估。我们的结果表明,模型选择方法优于每一种单一的异常检测方法,同时其执行时间保持在同一数量级。该评估是证明时间序列分类算法用于异常检测的准确性和效率的第一步,并且代表了一个强大的基线,可用于指导通用AutoML管道中的模型选择步骤。本文为VLDB Journal录用文章的预印本。
摘要 :Anomaly detection is a fundamental task for time series analytics with important implications for the downstream performance of many applications. Despite increasing academic interest and the large number of methods proposed in the literature, recent benchmarks and evaluation studies demonstrated that no overall best anomaly detection methods exist when applied to very heterogeneous time series datasets. Therefore, the only scalable and viable solution to solve anomaly detection over very different time series collected from diverse domains is to propose a model selection method that will select, based on time series characteristics, the best anomaly detection methods to run. Existing AutoML solutions are, unfortunately, not directly applicable to time series anomaly detection, and no evaluation of time series-based approaches for model selection exists. Towards that direction, this paper studies the performance of time series classification methods used as model selection for anomaly detection. In total, we evaluate 234 model configurations derived from 16 base classifiers across more than 1980 time series, and we propose the first extensive experimental evaluation of time series classification as model selection for anomaly detection. Our results demonstrate that model selection methods outperform every single anomaly detection method while being in the same order of magnitude regarding execution time. This evaluation is the first step to demonstrate the accuracy and efficiency of time series classification algorithms for anomaly detection, and represents a strong baseline that can then be used to guide the model selection step in general AutoML pipelines. Preprint version of an article accepted at the VLDB Journal.


【2】Segmentation over Complexity: Evaluating Ensemble and Hybrid Approaches for Anomaly Detection in Industrial Time Series
标题:分割胜于复杂性:评估工业时间序列异常检测中的集成与混合方法
链接:https://arxiv.org/abs/2510.26159

作者:Emilio Mastriani, Alessandro Costa, Federico Incardona, Kevin Munari, Sebastiano Spinello
备注:This paper is currently under review for presentation at the IEEE SAMI 2026 Conference
摘要:在这项研究中,我们研究了先进的特征工程和混合模型架构在多变量工业时间序列异常检测中的有效性,重点关注汽轮机系统。我们评估了变化点派生的统计特征、基于聚类的子结构表示以及混合学习策略对检测性能的影响。尽管这些复杂方法在理论上很有吸引力,但与在分割数据上训练的简单随机森林+XGBoost集成相比,它们的表现始终较差。该集成模型实现了0.976的AUC-ROC、0.41的F1分数,以及在定义的时间窗内100%的早期检测。我们的研究结果强调,在数据高度不平衡且时间上不确定的场景中,模型简单性与优化分割相结合可以胜过更复杂的架构,提供更高的鲁棒性、可解释性和操作实用性。
摘要:In this study, we investigate the effectiveness of advanced feature engineering and hybrid model architectures for anomaly detection in a multivariate industrial time series, focusing on a steam turbine system. We evaluate the impact of change point-derived statistical features, clustering-based substructure representations, and hybrid learning strategies on detection performance. Despite their theoretical appeal, these complex approaches consistently underperformed compared to a simple Random Forest + XGBoost ensemble trained on segmented data. The ensemble achieved an AUC-ROC of 0.976, F1-score of 0.41, and 100% early detection within the defined time window. Our findings highlight that, in scenarios with highly imbalanced and temporally uncertain data, model simplicity combined with optimized segmentation can outperform more sophisticated architectures, offering greater robustness, interpretability, and operational utility.


【3】Detecting Anomalies in Machine Learning Infrastructure via Hardware Telemetry
标题:通过硬件遥测检测机器学习基础设施中的异常
链接:https://arxiv.org/abs/2510.26008

作者:Ziji Chen, Steven Chien, Peng Qian, Noa Zilberman
备注:12 pages, 9 figures, submitted to nsdi 26
摘要:现代机器学习(ML)已经发展成为一个紧密耦合的全栈生态系统,结合了硬件、软件、网络和应用程序。许多用户依赖云提供商提供弹性、隔离和经济高效的资源。不幸的是,这些平台即服务使用虚拟化,这意味着运营商几乎无法了解用户的工作负载。这阻碍了运营商的资源优化,这对于确保成本效率和最小化执行时间至关重要。在本文中,我们认为,工作负载的知识是不必要的系统级优化。我们提出了System-X,它采用以硬件为中心的方法,仅依赖于硬件信号-完全可由操作员访问。使用从系统收集的低电平信号,System-X通过无监督学习管道检测异常。该管道通过分析各种硬件平台上的30多种流行ML模型来开发,确保适应新兴工作负载和未知部署模式。使用System-X,我们成功识别了网络和系统配置问题,使DeepSeek模型加速了5.97%。
摘要:Modern machine learning (ML) has grown into a tightly coupled, full-stack ecosystem that combines hardware, software, network, and applications. Many users rely on cloud providers for elastic, isolated, and cost-efficient resources. Unfortunately, these platforms as a service use virtualization, which means operators have little insight into the users' workloads. This hinders resource optimizations by the operator, which is essential to ensure cost efficiency and minimize execution time. In this paper, we argue that workload knowledge is unnecessary for system-level optimization. We propose System-X, which takes a hardware-centric approach, relying only on hardware signals - fully accessible by operators. Using low-level signals collected from the system, System-X detects anomalies through an unsupervised learning pipeline. The pipeline is developed by analyzing over 30 popular ML models on various hardware platforms, ensuring adaptability to emerging workloads and unknown deployment patterns. Using System-X, we successfully identified both network and system configuration issues, accelerating the DeepSeek model by 5.97%.


【4】Attention Augmented GNN RNN-Attention Models for Advanced Cybersecurity Intrusion Detection
标题:用于高级网络安全入侵检测的注意力增强GNN-RNN-注意力模型
链接:https://arxiv.org/abs/2510.25802

作者:Jayant Biradar, Smit Shah, Tanmay Naik
摘要:在本文中,我们提出了一种新型的混合深度学习架构,该架构协同结合了图神经网络(GNNs),递归神经网络(RNN)和多头注意力机制,以显着增强网络安全入侵检测能力。通过利用包含不同网络流量模式的综合UNSW-NB 15数据集,我们的方法有效地通过图形结构关系捕获空间依赖性,并通过对网络事件的顺序分析捕获时间动态。集成的注意力机制提供了改进的模型可解释性和增强的特征选择的双重好处,使网络安全分析师能够将计算资源集中在高影响力的安全事件上-这是现代实时入侵检测系统的关键要求。我们广泛的实验评估表明,与传统的机器学习方法和独立的深度学习模型相比,所提出的混合模型在多个评估指标(包括准确性,精确度,召回率和F1分数)上具有更好的性能。该模型在检测高级持续性威胁(APT)、分布式拒绝服务(DDoS)攻击和零日漏洞等复杂攻击模式方面具有特别强的性能,使其成为复杂网络环境中下一代网络安全应用的有前途的解决方案。
摘要:In this paper, we propose a novel hybrid deep learning architecture that synergistically combines Graph Neural Networks (GNNs), Recurrent Neural Networks (RNNs), and multi-head attention mechanisms to significantly enhance cybersecurity intrusion detection capabilities. By leveraging the comprehensive UNSW-NB15 dataset containing diverse network traffic patterns, our approach effectively captures both spatial dependencies through graph structural relationships and temporal dynamics through sequential analysis of network events. The integrated attention mechanism provides dual benefits of improved model interpretability and enhanced feature selection, enabling cybersecurity analysts to focus computational resources on high-impact security events - a critical requirement in modern real-time intrusion detection systems. Our extensive experimental evaluation demonstrates that the proposed hybrid model achieves superior performance compared to traditional machine learning approaches and standalone deep learning models across multiple evaluation metrics, including accuracy, precision, recall, and F1-score. The model achieves particularly strong performance in detecting sophisticated attack patterns such as Advanced Persistent Threats (APTs), Distributed Denial of Service (DDoS) attacks, and zero-day exploits, making it a promising solution for next-generation cybersecurity applications in complex network environments.


【5】Pulsar Detection with Deep Learning
标题:利用深度学习检测脉冲星
链接:https://arxiv.org/abs/2510.25774

作者:Manideep Pendyala
备注:56 pages, My master's thesis
摘要:脉冲星勘测每次运行产生数百万个候选项,压倒了人工检查。本文建立了一个深度学习管道,用于无线电脉冲星候选选择,将阵列衍生特征与图像诊断相融合。从大约500 GB的巨型米波射电望远镜(GMRT)数据中,原始电压被转换为滤波器组(SIGPROC),然后通过试验色散测量(PRESTO)进行解色散和折叠,以产生大约32,000个候选值。每个候选者产生四个诊断--求和轮廓、时间与相位、子带与相位和DM曲线--表示为阵列和图像。基线堆叠模型(用于阵列的ANN+用于具有逻辑回归融合的图像的CNN)达到68%的准确度。然后,我们改进了CNN架构和训练(正则化,学习率调度,最大范数约束),并通过有针对性的增强来减轻类不平衡,包括针对少数类的基于GAN的生成器。增强的CNN达到了87%的准确率;最终的GAN+CNN系统在保持测试集上达到了94%的准确率,同时保持了足够的轻量级以进行近实时的分类。结果表明,结合阵列和图像通道提高了分离性,只有图像的方法,和适度的生成增强大大提高少数(脉冲星)召回。该方法是调查不可知的,可扩展到即将到来的高通量设施。
摘要:Pulsar surveys generate millions of candidates per run, overwhelming manual inspection. This thesis builds a deep learning pipeline for radio pulsar candidate selection that fuses array-derived features with image diagnostics. From approximately 500 GB of Giant Metrewave Radio Telescope (GMRT) data, raw voltages are converted to filterbanks (SIGPROC), then de-dispersed and folded across trial dispersion measures (PRESTO) to produce approximately 32,000 candidates. Each candidate yields four diagnostics--summed profile, time vs. phase, subbands vs. phase, and DM curve--represented as arrays and images. A baseline stacked model (ANNs for arrays + CNNs for images with logistic-regression fusion) reaches 68% accuracy. We then refine the CNN architecture and training (regularization, learning-rate scheduling, max-norm constraints) and mitigate class imbalance via targeted augmentation, including a GAN-based generator for the minority class. The enhanced CNN attains 87% accuracy; the final GAN+CNN system achieves 94% accuracy with balanced precision and recall on a held-out test set, while remaining lightweight enough for near--real-time triage. The results show that combining array and image channels improves separability over image-only approaches, and that modest generative augmentation substantially boosts minority (pulsar) recall. The methods are survey-agnostic and extensible to forthcoming high-throughput facilities.


分类|识别(5篇)

【1】LSM-MS2: A Foundation Model Bridging Spectral Identification and Biological Interpretation
标题:LSM-MS2:连接光谱识别和生物解释的基础模型
链接:https://arxiv.org/abs/2510.26715

作者:Gabriel Asher, Devesh Shah, Amy A. Caudy, Luke Ferro, Lea Amar, Ana S. H. Costa, Thomas Patton, Niall O'Connor, Jennifer M. Campbell, Jack Geremia
摘要:绝大多数的质谱数据仍然没有得到表征,留下许多生物和化学信息未被利用。机器学习的最新进展已经开始解决这一差距,特别是对于串联质谱数据中的光谱识别等任务。在这里,我们介绍了最新一代的LSM-MS 2,这是一个大规模的深度学习基础模型,在数百万个光谱上进行训练,以学习语义化学空间。LSM-MS 2在光谱识别方面达到了最先进的性能,在识别具有挑战性的异构体化合物的准确性方面比现有方法提高了30%,在复杂生物样品中的识别准确率提高了42%,并在低浓度条件下保持了鲁棒性。此外,LSM-MS 2产生丰富的光谱嵌入,能够从最少的下游数据进行直接的生物学解释,成功区分疾病状态并预测各种翻译应用的临床结果。
摘要:A vast majority of mass spectrometry data remains uncharacterized, leaving much of its biological and chemical information untapped. Recent advances in machine learning have begun to address this gap, particularly for tasks such as spectral identification in tandem mass spectrometry data. Here, we present the latest generation of LSM-MS2, a large-scale deep learning foundation model trained on millions of spectra to learn a semantic chemical space. LSM-MS2 achieves state-of-the-art performance in spectral identification, improving on existing methods by 30% in accuracy of identifying challenging isomeric compounds, yielding 42% more correct identifications in complex biological samples, and maintaining robustness under low-concentration conditions. Furthermore, LSM-MS2 produces rich spectral embeddings that enable direct biological interpretation from minimal downstream data, successfully differentiating disease states and predicting clinical outcomes across diverse translational applications.


【2】CorVS: Person Identification via Video Trajectory-Sensor Correspondence in a Real-World Warehouse
标题:CorVS:通过现实世界仓库中的视频轨迹-传感器对应进行人员识别
链接:https://arxiv.org/abs/2510.26369

作者:Kazuma Kano, Yuki Mori, Shin Katayama, Kenta Urano, Takuro Yonezawa, Nobuo Kawaguchi
备注:7 pages, 3 figures, accepted to IPIN 2025
摘要:工人位置数据是提高工业现场生产力的关键。摄像机是物流仓库定位的一种很有前途的工具,因为它们还提供了有价值的环境背景,如包裹状态。然而,仅用视觉数据来识别个体通常是不切实际的。因此,之前的几项研究通过比较他们的轨迹和可穿戴传感器测量来识别视频中的人。虽然这种方法具有独立于外观等优点,但现有的方法可能会在现实世界的条件下崩溃。为了克服这一挑战,我们提出了CorVS,一种新的数据驱动的人识别方法,基于视觉跟踪轨迹和传感器测量之间的对应关系。首先,我们的深度学习模型预测每对轨迹和传感器测量的对应概率和可靠性。其次,我们的算法匹配的轨迹和传感器测量随着时间的推移使用预测的概率和可靠性。我们开发了一个具有实际仓库操作的数据集,并证明了该方法在实际应用中的有效性。
摘要:Worker location data is key to higher productivity in industrial sites. Cameras are a promising tool for localization in logistics warehouses since they also offer valuable environmental contexts such as package status. However, identifying individuals with only visual data is often impractical. Accordingly, several prior studies identified people in videos by comparing their trajectories and wearable sensor measurements. While this approach has advantages such as independence from appearance, the existing methods may break down under real-world conditions. To overcome this challenge, we propose CorVS, a novel data-driven person identification method based on correspondence between visual tracking trajectories and sensor measurements. Firstly, our deep learning model predicts correspondence probabilities and reliabilities for every pair of a trajectory and sensor measurements. Secondly, our algorithm matches the trajectories and sensor measurements over time using the predicted probabilities and reliabilities. We developed a dataset with actual warehouse operations and demonstrated the method's effectiveness for real-world applications.


【3】MisSynth: Improving MISSCI Logical Fallacies Classification with Synthetic Data
标题:MisSynth:用合成数据改进MISSCI逻辑谬误分类
链接:https://arxiv.org/abs/2510.26345

作者:Mykhailo Poliakov, Nadiya Shvai
摘要:与健康有关的错误信息非常普遍,而且可能有害。这类信息很难识别,尤其是当主张歪曲或误解科学发现时。我们使用MISSCI数据集和框架,研究了合成数据生成和轻量级微调技术对大型语言模型(LLM)识别谬误论证能力的影响。在这项工作中,我们提出了MisSynth,这是一个应用检索增强生成(RAG)来生成合成谬误样本的管道,这些样本随后用于微调LLM模型。我们的结果显示,与普通基线相比,微调模型的准确性有了大幅提升。例如,LLaMA 3.1 8B微调模型在MISSCI测试集上的F1分数比其普通基线绝对提高了超过35%。我们证明,即使计算资源有限,引入合成谬误数据来扩充有限的标注资源也能显著提高zero-shot LLM在现实世界科学错误信息任务上的分类性能。代码和合成数据集可在https://github.com/mxpoliakov/MisSynth上获得。
摘要 :Health-related misinformation is very prevalent and potentially harmful. It is difficult to identify, especially when claims distort or misinterpret scientific findings. We investigate the impact of synthetic data generation and lightweight fine-tuning techniques on the ability of large language models (LLMs) to recognize fallacious arguments using the MISSCI dataset and framework. In this work, we propose MisSynth, a pipeline that applies retrieval-augmented generation (RAG) to produce synthetic fallacy samples, which are then used to fine-tune an LLM model. Our results show substantial accuracy gains with fine-tuned models compared to vanilla baselines. For instance, the LLaMA 3.1 8B fine-tuned model achieved an over 35% F1-score absolute improvement on the MISSCI test split over its vanilla baseline. We demonstrate that introducing synthetic fallacy data to augment limited annotated resources can significantly enhance zero-shot LLM classification performance on real-world scientific misinformation tasks, even with limited computational resources. The code and synthetic dataset are available on https://github.com/mxpoliakov/MisSynth.


【4】MPRU: Modular Projection-Redistribution Unlearning as Output Filter for Classification Pipelines
标题:MPRU:作为分类管道输出过滤器的模块化投影-重分布遗忘方法
链接:https://arxiv.org/abs/2510.26230

作者:Minyi Peng, Darian Gunamardi, Ivan Tjuawinata, Kwok-Yan Lam
备注:10 pages, 6 figures
摘要:作为一种新的且有前景的方向,现有的机器遗忘(machine unlearning,MU)工作通常强调理论公式或优化目标,以实现知识移除。然而,当部署在真实场景中时,这些解决方案通常面临可扩展性问题,并且必须满足实际需求,例如对原始数据集和模型的完全访问。与现有的方法相比,我们将分类训练视为一个顺序过程,其中类别是按顺序学习的,我们称之为归纳方法。然后,可以通过反转最后一个训练序列来完成遗忘。这是通过在模型的末尾附加投影-重分布层来实现的。这种方法不需要完全访问原始数据集或模型,解决了现有方法的挑战。这使得模块化且与模型无关的部署成为可能,可以作为输出过滤器以最小的更改部署到现有的分类管道中。我们在多个数据集上进行了多项实验,包括图像(使用基于CNN模型的CIFAR-10/100)和表格数据集(使用基于树模型的Covertype)。实验结果显示,输出始终与完全重新训练的模型相似,并且计算成本大幅降低。这证明了我们的解决方案的适用性、可扩展性和系统兼容性,同时在更实际的设置中保持了输出的性能。
摘要:As a new and promising approach, existing machine unlearning (MU) works typically emphasize theoretical formulations or optimization objectives to achieve knowledge removal. However, when deployed in real-world scenarios, such solutions typically face scalability issues and have to address practical requirements such as full access to original datasets and model. In contrast to the existing approaches, we regard classification training as a sequential process where classes are learned sequentially, which we call inductive approach. Unlearning can then be done by reversing the last training sequence. This is implemented by appending a projection-redistribution layer in the end of the model. Such an approach does not require full access to the original dataset or the model, addressing the challenges of existing methods. This enables modular and model-agnostic deployment as an output filter into existing classification pipelines with minimal alterations. We conducted multiple experiments across multiple datasets including image (CIFAR-10/100 using CNN-based model) and tabular datasets (Covertype using tree-based model). Experiment results show consistently similar output to a fully retrained model with a high computational cost reduction. This demonstrates the applicability, scalability, and system compatibility of our solution while maintaining the performance of the output in a more practical setting.


【5】STAR: A Privacy-Preserving, Energy-Efficient Edge AI Framework for Human Activity Recognition via Wi-Fi CSI in Mobile and Pervasive Computing Environments
标题:STAR:一个保护隐私、节能的边缘人工智能框架,用于在移动和普适计算环境中通过Wi-Fi CSI进行人类活动识别
链接:https://arxiv.org/abs/2510.26148

作者:Kexing Liu
摘要:通过Wi-Fi信道状态信息(CSI)的人类活动识别(HAR)提供了一种保护隐私的非接触式传感方法,适用于智能家居、医疗监护和移动物联网系统。然而,现有方法在资源受限的嵌入式移动边缘环境中常常面临计算效率低、延迟高和可行性有限的问题。本文提出了STAR(Sensing Technology for Activity Recognition),这是一个面向边缘AI优化的框架,集成了轻量级神经架构、自适应信号处理和硬件感知协同优化,可在低功耗嵌入式设备上实现实时、节能的HAR。STAR采用精简的基于GRU的递归神经网络,与传统LSTM模型相比减少了33%的模型参数,同时保持了有效的时间建模能力。预处理采用由中值滤波、8阶Butterworth低通滤波和经验模态分解(EMD)组成的多级流水线,对CSI幅度数据进行去噪并提取时空特征。在设备端部署方面,STAR在配备嵌入式神经处理单元(NPU)的瑞芯微RV1126处理器上实现,并与基于ESP32-S3的CSI采集模块对接。实验结果表明,在一个紧凑的97.6k参数模型下,7种活动类别的平均识别准确率为93.52%,人体存在检测的准确率为99.11%。INT8量化推理实现了33 MHz的处理速度,CPU利用率仅为8%,比基于CPU的执行快六倍。该系统具有亚秒级响应延迟和低功耗,可确保实时、保护隐私的HAR,为移动和普适计算环境提供了实用、可扩展的解决方案。
摘要:Human Activity Recognition (HAR) via Wi-Fi Channel State Information (CSI) presents a privacy-preserving, contactless sensing approach suitable for smart homes, healthcare monitoring, and mobile IoT systems. However, existing methods often encounter computational inefficiency, high latency, and limited feasibility within resource-constrained, embedded mobile edge environments. This paper proposes STAR (Sensing Technology for Activity Recognition), an edge-AI-optimized framework that integrates a lightweight neural architecture, adaptive signal processing, and hardware-aware co-optimization to enable real-time, energy-efficient HAR on low-power embedded devices. STAR incorporates a streamlined Gated Recurrent Unit (GRU)-based recurrent neural network, reducing model parameters by 33% compared to conventional LSTM models while maintaining effective temporal modeling capability. A multi-stage pre-processing pipeline combining median filtering, 8th-order Butterworth low-pass filtering, and Empirical Mode Decomposition (EMD) is employed to denoise CSI amplitude data and extract spatial-temporal features. For on-device deployment, STAR is implemented on a Rockchip RV1126 processor equipped with an embedded Neural Processing Unit (NPU), interfaced with an ESP32-S3-based CSI acquisition module. Experimental results demonstrate a mean recognition accuracy of 93.52% across seven activity classes and 99.11% for human presence detection, utilizing a compact 97.6k-parameter model. INT8 quantized inference achieves a processing speed of 33 MHz with just 8% CPU utilization, delivering sixfold speed improvements over CPU-based execution. With sub-second response latency and low power consumption, the system ensures real-time, privacy-preserving HAR, offering a practical, scalable solution for mobile and pervasive computing environments.
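下面用scipy给出摘要所述预处理流水线中前两级(中值滤波与8阶Butterworth低通)的一个可运行示意;EMD一级从略(可用第三方PyEMD包实现)。采样率fs与10 Hz截止频率均为示意性假设,并非论文设定。

```python
import numpy as np
from scipy.signal import medfilt, butter, filtfilt

def preprocess_csi(amplitude: np.ndarray, fs: float = 100.0) -> np.ndarray:
    """对一维CSI幅度序列去噪:中值滤波 + 8阶Butterworth低通。
    采样率fs与10 Hz截止频率为示意性假设;EMD一级从略。"""
    x = medfilt(amplitude, kernel_size=5)        # 去除脉冲尖峰
    b, a = butter(8, 10.0, btype="low", fs=fs)   # 8阶低通滤波器
    return filtfilt(b, a, x)                     # 零相位滤波,避免相移

csi = np.sin(np.linspace(0, 20 * np.pi, 1024)) + 0.3 * np.random.randn(1024)
clean = preprocess_csi(csi)
```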


3D|3D重建等相关(1篇)

【1】Clone Deterministic 3D Worlds with Geometrically-Regularized World Models
标题:使用几何规则化世界模型克隆确定性3D世界
链接:https://arxiv.org/abs/2510.26782

作者:Zaishuo Xia, Yukuan Lu, Xinyi Li, Yifan Xu, Yubei Chen
摘要:世界模型是模拟世界如何演变的内部模型:根据过去的观察和动作,它预测具身智能体及其环境的未来。准确的世界模型对于使智能体能够在复杂的动态环境中有效地思考、规划和推理至关重要。尽管进展迅速,当前的世界模型仍然脆弱,并会在长时程上退化。我们认为一个核心原因是表征质量:外感受性输入(例如图像)是高维的,而有损或相互纠缠的潜变量使动力学学习变得不必要地困难。因此我们提出一个问题:仅仅改进表征学习能否显著提升世界模型的性能?在这项工作中,我们通过解决一个基本而开放的问题,向构建真正准确的世界模型迈出了一步:构建一个可以完全克隆并过拟合到确定性3D世界的模型。我们提出了几何正则化世界模型(GRWM),它强制自然感官轨迹上的连续点在潜在表征空间中保持接近。这种方法产生了显著改进的潜在表征,与环境的真实拓扑结构紧密对齐。GRWM是即插即用的,只需最小的架构修改,可随轨迹长度扩展,并与多种潜变量生成式骨干兼容。在确定性3D设定和长时程预测任务中,GRWM显著提升了推演(rollout)的保真度和稳定性。分析表明,其收益源于学到了几何结构更优的潜在流形。这些发现支持一个明确的结论:改进表征学习是通向稳健世界模型的一条直接而有效的路径,无需扩大动力学模块即可提供可靠的长时程预测。
摘要:A world model is an internal model that simulates how the world evolves. Given past observations and actions, it predicts the future of both the embodied agent and its environment. Accurate world models are essential for enabling agents to think, plan, and reason effectively in complex, dynamic settings. Despite rapid progress, current world models remain brittle and degrade over long horizons. We argue that a central cause is representation quality: exteroceptive inputs (e.g., images) are high-dimensional, and lossy or entangled latents make dynamics learning unnecessarily hard. We therefore ask whether improving representation learning alone can substantially improve world-model performance. In this work, we take a step toward building a truly accurate world model by addressing a fundamental yet open problem: constructing a model that can fully clone and overfit to a deterministic 3D world. We propose Geometrically-Regularized World Models (GRWM), which enforces that consecutive points along a natural sensory trajectory remain close in latent representation space. This approach yields significantly improved latent representations that align closely with the true topology of the environment. GRWM is plug-and-play, requires only minimal architectural modification, scales with trajectory length, and is compatible with diverse latent generative backbones. Across deterministic 3D settings and long-horizon prediction tasks, GRWM significantly increases rollout fidelity and stability. Analyses show that its benefits stem from learning a latent manifold with superior geometric structure. These findings support a clear takeaway: improving representation learning is a direct and useful path to robust world models, delivering reliable long-horizon predictions without enlarging the dynamics module.
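摘要中的几何正则化可以概括为一个很小的损失项:让同一感官轨迹上相邻帧的潜码彼此接近。下面是一个PyTorch示意,权重lam与具体距离度量均为假设,仅说明思路。

```python
import torch

def grwm_regularizer(z: torch.Tensor) -> torch.Tensor:
    """z: (T, d),同一条感官轨迹上T个连续帧的潜码。
    惩罚时间相邻潜码之间的距离;具体距离与权重为假设。"""
    return (z[1:] - z[:-1]).pow(2).sum(dim=-1).mean()

# total_loss = task_loss + lam * grwm_regularizer(encoder(frames))  # lam为假设超参
```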


编码器(1篇)

【1】CYPRESS: Crop Yield Prediction via Regression on Prithvi's Encoder for Satellite Sensing
标题:CYPRESS:通过Prithvi卫星传感编码器回归预测农作物产量
链接:https://arxiv.org/abs/2510.26609

作者:Shayan Nejadshamsi, Yuanyuan Zhang, Shadi Zaki, Brock Porth, Lysa Porth, Vahab Khoshdel
摘要:准确、及时的作物产量预测对全球粮食安全和现代农业管理至关重要。传统方法往往缺乏精准农业所需的可扩展性和粒度。本文介绍了CYPRESS(通过Prithvi卫星传感编码器回归进行作物产量预测),这是一种面向高分辨率田内油菜产量预测的深度学习模型。CYPRESS利用预训练的大规模地理空间基础模型(Prithvi-EO-2.0-600M),并将其适配于连续回归任务,将多时相卫星图像转换为稠密的像素级产量图。在加拿大大草原的综合数据集上评估,CYPRESS的性能优于现有基于深度学习的产量预测模型,凸显了微调基础模型用于专门农业应用的有效性。通过提供连续的高分辨率输出,CYPRESS为精准农业提供了比传统分类或县级汇总方法更具可操作性的工具。这项工作验证了一种弥合大规模对地观测与农场决策之间差距的新方法,为精细化农业监测提供了可扩展的解决方案。
摘要:Accurate and timely crop yield prediction is crucial for global food security and modern agricultural management. Traditional methods often lack the scalability and granularity required for precision farming. This paper introduces CYPRESS (Crop Yield Prediction via Regression on Prithvi's Encoder for Satellite Sensing), a deep learning model designed for high-resolution, intra-field canola yield prediction. CYPRESS leverages a pre-trained, large-scale geospatial foundation model (Prithvi-EO-2.0-600M) and adapts it for a continuous regression task, transforming multi-temporal satellite imagery into dense, pixel-level yield maps. Evaluated on a comprehensive dataset from the Canadian Prairies, CYPRESS demonstrates superior performance over existing deep learning-based yield prediction models, highlighting the effectiveness of fine-tuning foundation models for specialized agricultural applications. By providing a continuous, high-resolution output, CYPRESS offers a more actionable tool for precision agriculture than conventional classification or county-level aggregation methods. This work validates a novel approach that bridges the gap between large-scale Earth observation and on-farm decision-making, offering a scalable solution for detailed agricultural monitoring.


优化|敛散性(5篇)

【1】Omnipresent Yet Overlooked: Heat Kernels in Combinatorial Bayesian Optimization
标题:无所不在但被忽视:组合Bayesian优化中的热核
链接:https://arxiv.org/abs/2510.26633

作者:Colin Doumont, Victor Picheny, Viacheslav Borovitskiy, Henry Moss
摘要:贝叶斯优化(BO)有潜力解决从材料科学到神经结构搜索等各种组合任务。然而,BO需要专门的核函数来有效地建模组合域。近期工作已经提出了若干组合核,但它们之间的关系尚未被很好地理解。为了弥合这一差距,我们基于热核建立了一个统一框架:我们以系统的方式推导热核,并将其表示为简单的封闭形式表达式。利用这个框架,我们证明了许多成功的组合核与热核相关或等价,并在实验中验证了这一理论主张。此外,我们的分析证实并扩展了Bounce中的结果:当目标函数的未知最优解不具有特定结构时,某些算法的性能会大幅下降;相比之下,热核对最优解的位置不敏感。最后,我们展示了一个依赖热核的快速而简单的流水线能够取得最先进的结果,匹配甚至优于某些缓慢或复杂的算法。
摘要:Bayesian Optimization (BO) has the potential to solve various combinatorial tasks, ranging from materials science to neural architecture search. However, BO requires specialized kernels to effectively model combinatorial domains. Recent efforts have introduced several combinatorial kernels, but the relationships among them are not well understood. To bridge this gap, we develop a unifying framework based on heat kernels, which we derive in a systematic way and express as simple closed-form expressions. Using this framework, we prove that many successful combinatorial kernels are either related or equivalent to heat kernels, and validate this theoretical claim in our experiments. Moreover, our analysis confirms and extends the results presented in Bounce: certain algorithms' performance decreases substantially when the unknown optima of the function do not have a certain structure. In contrast, heat kernels are not sensitive to the location of the optima. Lastly, we show that a fast and simple pipeline, relying on heat kernels, is able to achieve state-of-the-art results, matching or even outperforming certain slow or complex algorithms.
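作为摘要所说"简单封闭形式"的一个经典实例,布尔超立方体上的扩散(热)核可写成 k(x, y) ∝ tanh(β)^{d_H(x, y)}(Kondor & Lafferty, 2002),其中d_H为汉明距离。下面的numpy示意仅用于说明这一封闭形式本身,并非论文的统一推导。

```python
import numpy as np

def hypercube_heat_kernel(X: np.ndarray, Y: np.ndarray, beta: float) -> np.ndarray:
    """布尔超立方体上的扩散(热)核:k(x, y) = tanh(beta)^hamming(x, y)
    (Kondor & Lafferty, 2002 的经典封闭形式,此处忽略归一化常数)。
    X: (n, d),Y: (m, d),取值于{0, 1}。"""
    hamming = (X[:, None, :] != Y[None, :, :]).sum(-1)
    return np.tanh(beta) ** hamming

X = np.random.randint(0, 2, size=(5, 8))
K = hypercube_heat_kernel(X, X, beta=0.5)   # (5, 5) 半正定Gram矩阵
```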


【2】Think Outside the Policy: In-Context Steered Policy Optimization
标题:跳出策略思考:上下文引导的策略优化
链接:https://arxiv.org/abs/2510.26519

作者:Hsiu-Yuan Huang, Chenming Tang, Weijie Liu, Saiyong Yang, Yunfang Wu
备注:Work in progress
摘要:现有的可验证奖励强化学习(RLVR)方法,如组相对策略优化(GRPO),在提高大型推理模型(LRM)的推理能力方面取得了显著进展。然而,由于依赖局限于当前策略分布的同策略(on-policy)采样,它们的探索能力有限,导致轨迹多样性狭窄。近期方法试图通过纳入更强专家模型生成的轨迹来扩大策略覆盖面,但这种依赖增加了计算成本,且此类先进模型往往不可获得。为了解决这些问题,我们提出了上下文引导策略优化(ICPO),这是一个统一框架,利用LRM固有的上下文学习能力,使用现有数据集提供专家指导。ICPO引入了带隐式专家强制的混合策略GRPO,在不需要先进LRM轨迹的情况下将探索扩展到当前策略分布之外。为进一步稳定优化,ICPO集成了专家区域拒绝采样来过滤不可靠的离策略轨迹,并采用退火式专家加成奖励整形来平衡早期的专家指导与后期的自主改进。结果表明,ICPO在数学推理基准上持续提升强化学习性能和训练稳定性,揭示了一个面向LRM的可扩展且有效的RLVR范式。
摘要:Existing Reinforcement Learning from Verifiable Rewards (RLVR) methods, such as Group Relative Policy Optimization (GRPO), have achieved remarkable progress in improving the reasoning capabilities of Large Reasoning Models (LRMs). However, they exhibit limited exploration due to reliance on on-policy rollouts confined to the current policy's distribution, resulting in narrow trajectory diversity. Recent approaches attempt to expand policy coverage by incorporating trajectories generated from stronger expert models, yet this reliance increases computational cost and such advanced models are often inaccessible. To address these issues, we propose In-Context Steered Policy Optimization (ICPO), a unified framework that leverages the inherent in-context learning capability of LRMs to provide expert guidance using existing datasets. ICPO introduces Mixed-Policy GRPO with Implicit Expert Forcing, which expands exploration beyond the current policy distribution without requiring advanced LRM trajectories. To further stabilize optimization, ICPO integrates Expert Region Reject Sampling to filter unreliable off-policy trajectories and Annealed Expert-Bonus Reward Shaping to balance early expert guidance with later autonomous improvement. Results demonstrate that ICPO consistently enhances reinforcement learning performance and training stability on mathematical reasoning benchmarks, revealing a scalable and effective RLVR paradigm for LRMs.
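作为背景,GRPO的核心是组内相对优势:对同一提示下的一组rollout奖励做标准化。下面的示意只实现这一标准步骤;ICPO中混合专家引导rollout的具体构造以论文为准,此处仅以注释提示,函数名为假设。

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """GRPO的组内相对优势:对同一提示的一组rollout奖励做标准化。
    在ICPO这类混合策略变体中,组内部分rollout来自专家提示下的
    上下文生成;这一部分的构造方式以论文为准,此处不做区分。"""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

r = torch.tensor([0.0, 1.0, 1.0, 0.0, 1.0])   # 每个rollout的可验证奖励
adv = grpo_advantages(r)
```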


【3】A General and Streamlined Differentiable Optimization Framework
标题:通用且简化的差异化优化框架
链接:https://arxiv.org/abs/2510.25986

作者:Andrew W. Rosemberg, Joaquim Dias Garcia, François Pacaud, Robert B. Parker, Benoît Legat, Kaarthik Sundar, Russell Bent, Pascal Van Hentenryck
备注:17 pages, 4 figures
摘要:对约束优化问题求导在学习、控制和大规模决策系统中日益重要,但由于求解器的专门化和接口不匹配,实际集成仍然具有挑战性。本文提出了一个通用且精简的框架(更新版的DiffOpt.jl),在Julia优化生态中统一了建模与求导。该框架通过在标准正则性假设下对KKT系统求导,为光滑且可能非凸的优化问题计算前向与反向模式下解和目标函数的灵敏度。原生于JuMP、以参数为中心的一等API允许用户声明命名参数并直接获取关于这些参数的导数(即使同一参数出现在多个约束和目标中),从而消除了系数级接口中脆弱的簿记工作。我们在凸和非凸模型上展示了这些能力,包括经济调度、带锥形风险约束的均值-方差投资组合选择,以及非线性机器人逆运动学。两项配套研究进一步展示了其大规模影响:用于能源市场战略竞价的基于梯度的迭代方法,以及使用求解器级精确灵敏度对端到端优化代理进行Sobolev式训练。总之,这些结果表明,可微优化可以作为实验、学习、校准和设计的常规工具部署,既不偏离标准JuMP建模实践,又能保留对广泛求解器生态系统的访问。
摘要:Differentiating through constrained optimization problems is increasingly central to learning, control, and large-scale decision-making systems, yet practical integration remains challenging due to solver specialization and interface mismatches. This paper presents a general and streamlined framework - an updated DiffOpt.jl - that unifies modeling and differentiation within the Julia optimization stack. The framework computes forward- and reverse-mode solution and objective sensitivities for smooth, potentially nonconvex programs by differentiating the KKT system under standard regularity assumptions. A first-class, JuMP-native parameter-centric API allows users to declare named parameters and obtain derivatives directly with respect to them - even when a parameter appears in multiple constraints and objectives - eliminating brittle bookkeeping from coefficient-level interfaces. We illustrate these capabilities on convex and nonconvex models, including economic dispatch, mean-variance portfolio selection with conic risk constraints, and nonlinear robot inverse kinematics. Two companion studies further demonstrate impact at scale: gradient-based iterative methods for strategic bidding in energy markets and Sobolev-style training of end-to-end optimization proxies using solver-accurate sensitivities. Together, these results demonstrate that differentiable optimization can be deployed as a routine tool for experimentation, learning, calibration, and design - without deviating from standard JuMP modeling practices and while retaining access to a broad ecosystem of solvers.
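论文的核心思想(对KKT系统求导以获得解的灵敏度)可以用一个等式约束二次规划在numpy中直观演示:解与灵敏度共享同一个KKT矩阵。这只是通用原理的Python示意,并非DiffOpt.jl的实际API;问题数据均为随意取值。

```python
import numpy as np

# 等式约束二次规划: minimize 0.5 x^T Q x + c^T x  s.t.  A x = b
# 解与灵敏度都来自同一个KKT线性系统;这是通用原理的numpy示意,
# 并非DiffOpt.jl的实际接口。
Q = np.array([[2.0, 0.0], [0.0, 4.0]])
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

n, m = Q.shape[0], A.shape[0]
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])    # KKT矩阵
sol = np.linalg.solve(K, np.concatenate([-c, b]))  # 原-对偶解
x, lam = sol[:n], sol[n:]

# 灵敏度 dx/db:对KKT方程求导,矩阵不变,只换右端项
rhs = np.vstack([np.zeros((n, m)), np.eye(m)])
dxdb = np.linalg.solve(K, rhs)[:n]                 # (n, m) 雅可比
```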


【4】Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms
标题:多模态老虎机:遗憾下界与最优算法
链接:https://arxiv.org/abs/2510.25811

作者:William Réveillard, Richard Combes
备注:31 pages; NeurIPS 2025
摘要:我们考虑一类奖励为独立同分布(i.i.d.)的随机多臂老虎机问题,其中期望奖励函数是至多具有m个模态的多模态函数。我们提出了首个已知的、计算上易处理的算法来求解Graves-Lai优化问题,进而使得针对该老虎机问题实现渐近最优算法成为可能。所提算法的代码公开于https://github.com/wilrev/MultimodalBandits
摘要:We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit problem. The code for the proposed algorithms is publicly available at https://github.com/wilrev/MultimodalBandits


【5】RNAGenScape: Property-guided Optimization and Interpolation of mRNA Sequences with Manifold Langevin Dynamics
标题:RNAGenScape:利用流形Langevin动力学进行mRNA序列的属性引导优化与插值
链接:https://arxiv.org/abs/2510.24736

作者:Danqi Liao, Chen Liu, Xingzhi Sun, Dié Tang, Haochen Wang, Scott Youlten, Srikar Krishna Gopinath, Haejeong Lee, Ethan C. Strayer, Antonio J. Giraldez, Smita Krishnaswamy
备注:ICML 2025 Generative AI and Biology (GenBio) Workshop, Oral presentation (top 9.7%)
摘要:mRNA设计和优化在合成生物学和治疗开发中非常重要,但在机器学习中仍然研究不足。mRNA的系统优化受到数据稀缺和不平衡以及复杂的序列-功能关系的阻碍。我们提出了RNAGenScape,一个属性引导的流形朗之万动力学框架,迭代更新mRNA序列内学习的潜在流形。RNAGenScape结合了一个有组织的自动编码器,该编码器通过目标属性构建潜在空间,以进行有效和生物学上合理的探索,并结合了一个流形投影仪,该投影仪将更新的每一步都收缩回流形。RNAGenScape支持属性引导的优化和序列之间的平滑插值,同时在稀缺和采样不足的数据下保持稳健,并确保中间产物接近可行的mRNA流形。在三个真实的mRNA数据集上,RNAGenScape以高成功率和效率改善了目标属性,优于为蛋白质或非生物数据开发的各种生成或优化方法。通过提供连续的、数据对齐的轨迹,揭示编辑如何影响功能,RNAGenScape为mRNA序列建模中的可控mRNA设计和潜在空间探索建立了一个可扩展的范例。
摘要:mRNA design and optimization are important in synthetic biology and therapeutic development, but remain understudied in machine learning. Systematic optimization of mRNAs is hindered by the scarce and imbalanced data as well as complex sequence-function relationships. We present RNAGenScape, a property-guided manifold Langevin dynamics framework that iteratively updates mRNA sequences within a learned latent manifold. RNAGenScape combines an organized autoencoder, which structures the latent space by target properties for efficient and biologically plausible exploration, with a manifold projector that contracts each step of update back to the manifold. RNAGenScape supports property-guided optimization and smooth interpolation between sequences, while remaining robust under scarce and undersampled data, and ensuring that intermediate products are close to the viable mRNA manifold. Across three real mRNA datasets, RNAGenScape improves the target properties with high success rates and efficiency, outperforming various generative or optimization methods developed for proteins or non-biological data. By providing continuous, data-aligned trajectories that reveal how edits influence function, RNAGenScape establishes a scalable paradigm for controllable mRNA design and latent space exploration in mRNA sequence modeling.
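流形朗之万动力学的骨架可以概括为"朗之万一步 + 投影回流形"。下面的numpy玩具示意中,score与project均为占位函数(此处分别用标准正态的解析得分和单位球投影代替论文中学到的引导梯度与流形投影器),仅说明迭代结构。

```python
import numpy as np

def projected_langevin(z0, score, project, step=1e-2, n_steps=200, seed=0):
    """属性引导流形朗之万的骨架:朗之万一步,然后收缩回流形。
    score与project均为占位函数,对应论文中学到的引导梯度与流形投影器。"""
    rng = np.random.default_rng(seed)
    z = z0.copy()
    for _ in range(n_steps):
        z = z + step * score(z) + np.sqrt(2 * step) * rng.standard_normal(z.shape)
        z = project(z)    # 把迭代点拉回流形
    return z

# 玩具实例:标准正态的解析得分 + 单位球投影(均为示意)
z = projected_langevin(
    np.zeros(16),
    score=lambda z: -z,
    project=lambda z: z / max(1.0, np.linalg.norm(z)),
)
```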


预测|估计(9篇)

【1】Surpassing state of the art on AMD area estimation from RGB fundus images through careful selection of U-Net architectures and loss functions for class imbalance
标题:通过精心选择U-Net架构和应对类别不平衡的损失函数,超越基于RGB眼底图像的AMD面积估计最新水平
链接:https://arxiv.org/abs/2510.26778

作者:Valentyna Starodub, Mantas Lukoševičius
摘要:年龄相关性黄斑变性(AMD)是60岁以上人群不可逆视力损害的主要原因之一。本研究聚焦于RGB眼底图像(一种非侵入性且具成本效益的成像技术)中AMD病变检测的语义分割。ADAM挑战赛是迄今为止最全面的基于RGB眼底图像的AMD检测研究竞赛和开放数据集,其结果可作为我们评估的基准。以U-Net连接结构作为框架基础,我们评估并比较了多种改进分割模型架构和训练管道的方法,包括预处理技术、不同复杂度的编码器(骨干)深度网络类型,以及在图像和像素层面缓解类别不平衡的专用损失函数。本研究的主要成果是AMD检测框架的最终配置,其在非侵入性RGB眼底图像中不同AMD病变类型的多类分割上优于所有先前的ADAM挑战赛提交。用于进行本文实验的源代码已免费提供。
摘要:Age-related macular degeneration (AMD) is one of the leading causes of irreversible vision impairment in people over the age of 60. This research focuses on semantic segmentation for AMD lesion detection in RGB fundus images, a non-invasive and cost-effective imaging technique. The results of the ADAM challenge - the most comprehensive AMD detection from RGB fundus images research competition and open dataset to date - serve as a benchmark for our evaluation. Taking the U-Net connectivity as a base of our framework, we evaluate and compare several approaches to improve the segmentation model's architecture and training pipeline, including pre-processing techniques, encoder (backbone) deep network types of varying complexity, and specialized loss functions to mitigate class imbalances on image and pixel levels. The main outcome of this research is the final configuration of the AMD detection framework, which outperforms all the prior ADAM challenge submissions on the multi-class segmentation of different AMD lesion types in non-invasive RGB fundus images. The source code used to conduct the experiments presented in this paper is made freely available.
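摘要提到用专门的损失函数缓解图像与像素层面的类别不平衡;分割文献中常见的两种选择是Dice损失与focal损失。下面给出二者组合的一个标准PyTorch示意,论文实际采用的损失以原文为准。

```python
import torch
import torch.nn.functional as F

def dice_focal_loss(logits, target, gamma=2.0, alpha=0.5, eps=1e-6):
    """二类分割的Dice损失 + focal损失组合(标准做法,论文实际损失以原文为准)。
    logits, target: (N, H, W),target取值{0, 1}。"""
    p = torch.sigmoid(logits)
    inter = (p * target).sum(dim=(1, 2))   # Dice:基于重叠度,对前景占比不敏感
    dice = 1 - (2 * inter + eps) / (p.sum((1, 2)) + target.sum((1, 2)) + eps)
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    pt = torch.exp(-bce)                   # focal:降低易分类像素的权重
    focal = (alpha * (1 - pt) ** gamma * bce).mean(dim=(1, 2))
    return (dice + focal).mean()

loss = dice_focal_loss(torch.randn(2, 64, 64),
                       torch.randint(0, 2, (2, 64, 64)).float())
```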


【2】On Purely Private Covariance Estimation
标题:关于纯差分隐私协方差估计
链接:https://arxiv.org/abs/2510.26717

作者:Tommaso d'Orsi, Gleb Novikov
备注:equal contribution
摘要:我们提出了一个简单的扰动机制,用于在纯差分隐私下发布$d$维协方差矩阵$\Sigma$。对于至少有$n\geq d^2/\varepsilon$个元素的大型数据集,我们的机制恢复了\cite{nikolov2023private}的可证明最优的Frobenius范数误差保证,同时对所有其他$p$-Schatten范数($p\in [1,\infty]$)实现了已知最佳误差。对于所有$p\ge 2$,我们的误差在信息论意义上是最优的;特别地,我们的机制是第一个在谱范数下达到最优误差的纯差分隐私协方差估计器。   对于小数据集$n< d^2/\varepsilon$,我们进一步表明,通过将输出投影到适当半径的核范数球上,我们的算法实现了最优的Frobenius范数误差$O(\sqrt{d\;\text{Tr}(\Sigma)/n})$,改进了\cite{nikolov2023private}的$O(\sqrt{d/n})$与\cite{dong2022differentially}的${O}\big(d^{3/4}\sqrt{\text{Tr}(\Sigma)/n}\big)$的已知界。
摘要:We present a simple perturbation mechanism for the release of $d$-dimensional covariance matrices $\Sigma$ under pure differential privacy. For large datasets with at least $n\geq d^2/\varepsilon$ elements, our mechanism recovers the provably optimal Frobenius norm error guarantees of \cite{nikolov2023private}, while simultaneously achieving best known error for all other $p$-Schatten norms, with $p\in [1,\infty]$. Our error is information-theoretically optimal for all $p\ge 2$, in particular, our mechanism is the first purely private covariance estimator that achieves optimal error in spectral norm.   For small datasets $n< d^2/\varepsilon$, we further show that by projecting the output onto the nuclear norm ball of appropriate radius, our algorithm achieves the optimal Frobenius norm error $O(\sqrt{d\;\text{Tr}(\Sigma) /n})$, improving over the known bounds of $O(\sqrt{d/n})$ of \cite{nikolov2023private} and ${O}\big(d^{3/4}\sqrt{\text{Tr}(\Sigma)/n}\big)$ of \cite{dong2022differentially}.
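作为参照,纯差分隐私下发布协方差的教科书式基线是按L1敏感度对各条目加Laplace噪声;论文改进的正是这一类机制的误差。下面的numpy示意假设每行满足||x||_inf <= 1,其校准只是通用推导,并非论文的最优机制。

```python
import numpy as np

def pure_dp_covariance(X: np.ndarray, eps: float, seed: int = 0) -> np.ndarray:
    """教科书式Laplace基线(非论文的最优机制):假设每行||x||_inf <= 1,
    则换一行数据使 (1/n) X^T X 的每个条目至多变化 2/n,自由条目共
    d(d+1)/2 个,故L1敏感度至多 d(d+1)/n。"""
    n, d = X.shape
    cov = X.T @ X / n
    sens = d * (d + 1) / n
    rng = np.random.default_rng(seed)
    noise = np.triu(rng.laplace(scale=sens / eps, size=(d, d)))
    noise = noise + np.triu(noise, 1).T          # 对称化噪声
    return cov + noise

Sigma_hat = pure_dp_covariance(np.clip(np.random.randn(1000, 5), -1, 1), eps=1.0)
```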


【3】Efficient Generative AI Boosts Probabilistic Forecasting of Sudden Stratospheric Warmings
标题:高效的生成式人工智能提高了平流层突然变暖的概率预测
链接:https://arxiv.org/abs/2510.26376

作者:Ningning Tao, Fei Xie, Baoxiang Pan, Hongyu Wang, Han Huang, Zhongpu Qiu, Ke Gui, Jiali Luo, Xiaosong Chen
摘要:平流层突然变暖(SSW)是次季节可预测性的关键来源,也是极端冬季天气的主要驱动因素。然而,由于在物理过程表示、初始化以及集合预报巨大计算需求方面的限制,对SSW进行准确而高效的预报仍是数值天气预报(NWP)系统面临的持续挑战。虽然数据驱动的预报正在迅速发展,但其在SSW复杂三维动力学中的应用,特别是概率预报,仍未得到充分探索。在此,我们通过开发一个基于流匹配的生成式AI模型(FM-Cast)来弥合这一差距,用于对平流层环流时空演变进行高效而熟练的概率预报。在18个主要SSW事件(1998-2024)上评估,FM-Cast能够提前20天熟练预报其中10个事件的发生、强度和形态,集合预报准确率超过50%。其性能与领先的NWP系统相当甚至更优,而在消费级GPU上完成50个成员、30天的预报仅需两分钟。此外,将FM-Cast用作科学工具,我们通过理想化实验证明,SSW的可预测性从根本上与其潜在物理驱动因素相关,可以区分由对流层强迫的事件与由平流层内部动力学驱动的事件。因此,我们的工作为平流层异常的概率预报建立了一个计算高效的范式,并展示了生成式AI加深对大气-气候动力学物理理解的潜力。
摘要:Sudden Stratospheric Warmings (SSWs) are key sources of subseasonal predictability and major drivers of extreme winter weather. Yet, their accurate and efficient forecast remains a persistent challenge for numerical weather prediction (NWP) systems due to limitations in physical representation, initialization, and the immense computational demands of ensemble forecasts. While data-driven forecasting is rapidly evolving, its application to the complex, three-dimensional dynamics of SSWs, particularly for probabilistic forecast, remains underexplored. Here, we bridge this gap by developing a Flow Matching-based generative AI model (FM-Cast) for efficient and skillful probabilistic forecasting of the spatiotemporal evolution of stratospheric circulation. Evaluated across 18 major SSW events (1998-2024), FM-Cast skillfully forecasts the onset, intensity, and morphology of 10 events up to 20 days in advance, achieving ensemble accuracies above 50%. Its performance is comparable to or exceeds leading NWP systems while requiring only two minutes for a 50-member, 30-day forecast on a consumer GPU. Furthermore, leveraging FM-Cast as a scientific tool, we demonstrate through idealized experiments that SSW predictability is fundamentally linked to its underlying physical drivers, distinguishing between events forced from the troposphere and those driven by internal stratospheric dynamics. Our work thus establishes a computationally efficient paradigm for probabilistic forecasting stratospheric anomalies and showcases generative AI's potential to deepen the physical understanding of atmosphere-climate dynamics.
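FM-Cast所基于的流匹配目标有一个标准的极简形式(Lipman等, 2023):在线性插值路径上回归速度场。下面的PyTorch示意省略了论文对大气状态的条件化,v_theta为任意速度网络的占位。

```python
import torch

def flow_matching_loss(v_theta, x0: torch.Tensor, x1: torch.Tensor) -> torch.Tensor:
    """标准条件流匹配目标(Lipman等, 2023):在线性插值路径上
    回归速度场 x1 - x0。v_theta为任意速度网络的占位,
    论文对大气状态的条件化在此省略。"""
    t = torch.rand(x0.shape[0], *([1] * (x0.dim() - 1)), device=x0.device)
    xt = (1 - t) * x0 + t * x1     # 概率路径上的点
    target = x1 - x0               # 条件速度场
    return ((v_theta(xt, t) - target) ** 2).mean()

# 最小可运行示例(零速度场占位网络)
loss = flow_matching_loss(lambda x, t: torch.zeros_like(x),
                          torch.randn(8, 16), torch.randn(8, 16))
```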


【4】Accumulative SGD Influence Estimation for Data Attribution
标题:用于数据归因的累积SGD影响估计
链接:https://arxiv.org/abs/2510.26185

作者:Yunxiao Shi, Shuo Yang, Yixin Su, Rui Zhang, Min Xu
摘要:现代以数据为中心的人工智能需要精确的逐样本影响力估计。标准SGD-IE通过对逐轮(per-epoch)替代项求和来近似留一(leave-one-out)效应,忽略了跨轮次的复合效应,从而可能对关键样本错误排序。我们提出ACC-SGD-IE,一种轨迹感知的估计器,它在整个训练过程中传播留一扰动,并在每一步更新一个累积影响状态。在光滑强凸设定下,它实现几何式误差收缩;在光滑非凸情形下,它收紧了误差界,且更大的小批量会进一步缩小常数。经验上,在Adult、20 Newsgroups和MNIST上,在干净与受污染数据以及凸与非凸训练下,ACC-SGD-IE都产生更准确的影响力估计,尤其是在长轮次训练中。在下游数据清洗任务上,它能更可靠地标记噪声样本,使得在经ACC-SGD-IE清洗的数据上训练的模型优于使用SGD-IE清洗数据训练的模型。
摘要:Modern data-centric AI needs precise per-sample influence. Standard SGD-IE approximates leave-one-out effects by summing per-epoch surrogates and ignores cross-epoch compounding, which misranks critical examples. We propose ACC-SGD-IE, a trajectory-aware estimator that propagates the leave-one-out perturbation across training and updates an accumulative influence state at each step. In smooth strongly convex settings it achieves geometric error contraction and, in smooth non-convex regimes, it tightens error bounds; larger mini-batches further reduce constants. Empirically, on Adult, 20 Newsgroups, and MNIST under clean and corrupted data and both convex and non-convex training, ACC-SGD-IE yields more accurate influence estimates, especially over long epochs. For downstream data cleansing it more reliably flags noisy samples, producing models trained on ACC-SGD-IE cleaned data that outperform those cleaned with SGD-IE.


【5】Efficient Online Learning with Predictive Coding Networks: Exploiting Temporal Correlations
标题:基于预测编码网络的高效在线学习:利用时间相关性
链接:https://arxiv.org/abs/2510.25993

作者:Darius Masoum Zadeh-Jousdani, Elvin Hajizada, Eyke Hüllermeier
备注:Accepted at EdgeAI4R Workshop, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
摘要:在边缘运行的机器人系统需要高效的在线学习算法,这些算法可以在处理流式传感数据的同时不断适应不断变化的环境。传统的反向传播虽然有效,但与生物相容性原则相冲突,并且对于连续适应情景可能是次优的。预测编码(PC)框架提供了一种生物学上合理的替代方案,具有局部的、类似赫布的更新规则,使其适合于神经形态硬件实现。然而,PC的主要限制是由于训练期间的多次推理迭代而导致的计算开销。我们提出了时间摊销预测编码网络(PCN-TA),它保留了跨时间帧的潜在状态。通过利用时间相关性,PCN-TA显著降低了计算需求,同时保持了学习性能。我们在COIL-20机器人感知数据集上的实验表明,与反向传播相比,PCN-TA的权重更新减少了10%,并且比基线PC网络所需的推理步骤减少了50%。这些效率的提高直接转化为减少的计算开销,从而在资源受限的机器人系统中向边缘部署和实时适应支持迈进了一步。我们的方法的生物启发性质也使其成为未来神经形态硬件实现的有希望的候选者,从而实现边缘的有效在线学习。
摘要:Robotic systems operating at the edge require efficient online learning algorithms that can continuously adapt to changing environments while processing streaming sensory data. Traditional backpropagation, while effective, conflicts with biological plausibility principles and may be suboptimal for continuous adaptation scenarios. The Predictive Coding (PC) framework offers a biologically plausible alternative with local, Hebbian-like update rules, making it suitable for neuromorphic hardware implementation. However, PC's main limitation is its computational overhead due to multiple inference iterations during training. We present Predictive Coding Network with Temporal Amortization (PCN-TA), which preserves latent states across temporal frames. By leveraging temporal correlations, PCN-TA significantly reduces computational demands while maintaining learning performance. Our experiments on the COIL-20 robotic perception dataset demonstrate that PCN-TA achieves 10% fewer weight updates compared to backpropagation and requires 50% fewer inference steps than baseline PC networks. These efficiency gains directly translate to reduced computational overhead for moving another step toward edge deployment and real-time adaptation support in resource-constrained robotic systems. The biologically-inspired nature of our approach also makes it a promising candidate for future neuromorphic hardware implementations, enabling efficient online learning at the edge.


【6】Contrastive Predictive Coding Done Right for Mutual Information Estimation
标题:将对比预测编码正确用于互信息估计
链接:https://arxiv.org/abs/2510.25983

作者:J. Jon Ryu, Pavan Yeddanapudi, Xiangxiang Xu, Gregory W. Wornell
备注:26 pages, 5 figures
摘要:InfoNCE目标最初是为对比表示学习而提出的,尽管它与互信息(MI)只有间接联系,却已成为MI估计的热门选择。在本文中,我们说明了为什么InfoNCE不应被视为一个有效的MI估计器,并引入了一个简单的修改,称为InfoNCE-anchor,用于准确的MI估计。我们的修改引入了一个辅助锚类,使密度比估计保持一致,并产生一个偏差显著降低的插件式MI估计器。在此基础上,我们使用恰当评分规则(proper scoring rules)对框架进行了推广,当采用对数评分时即可恢复InfoNCE-anchor这一特例。该表述将包括NCE、InfoNCE和$f$-散度变体在内的一大类对比目标统一在单一的原则性框架之下。经验上,我们发现采用对数评分的InfoNCE-anchor实现了最准确的MI估计;然而,在自监督表示学习实验中,我们发现锚类并不能提升下游任务性能。这些结果印证了:对比表示学习的收益并非来自准确的MI估计本身,而是来自对结构化密度比的学习。
摘要:The InfoNCE objective, originally introduced for contrastive representation learning, has become a popular choice for mutual information (MI) estimation, despite its indirect connection to MI. In this paper, we demonstrate why InfoNCE should not be regarded as a valid MI estimator, and we introduce a simple modification, which we refer to as InfoNCE-anchor, for accurate MI estimation. Our modification introduces an auxiliary anchor class, enabling consistent density ratio estimation and yielding a plug-in MI estimator with significantly reduced bias. Beyond this, we generalize our framework using proper scoring rules, which recover InfoNCE-anchor as a special case when the log score is employed. This formulation unifies a broad spectrum of contrastive objectives, including NCE, InfoNCE, and $f$-divergence variants, under a single principled framework. Empirically, we find that InfoNCE-anchor with the log score achieves the most accurate MI estimates; however, in self-supervised representation learning experiments, we find that the anchor does not improve the downstream task performance. These findings corroborate that contrastive representation learning benefits not from accurate MI estimation per se, but from the learning of structured density ratios.
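作为背景,下面用PyTorch实现普通InfoNCE对MI的下界估计(其值以log N为上限饱和,这正是论文认为它不宜被当作MI估计器的原因之一);论文提出的锚类修正此处不作复现。

```python
import torch

def infonce_bound(scores: torch.Tensor) -> torch.Tensor:
    """普通InfoNCE值:scores为(N, N)的critic矩阵,对角线是真实配对的分数。
    该量是MI的下界,但以log N为上限饱和,因此不宜直接当作MI估计器;
    论文的锚类修正此处不复现。"""
    n = scores.shape[0]
    log_ratio = torch.diag(scores) - torch.logsumexp(scores, dim=1)
    return log_ratio.mean() + torch.log(torch.tensor(float(n)))

mi_est = infonce_bound(torch.randn(128, 128))
```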


【7】Application and Validation of Geospatial Foundation Model Data for the Prediction of Health Facility Programmatic Outputs -- A Case Study in Malawi
标题:地理空间基础模型数据用于预测卫生设施计划产出的应用和验证--马拉维的案例研究
链接:https://arxiv.org/abs/2510.25954

作者:Lynn Metz, Rachel Haggard, Michael Moszczynski, Samer Asbah, Chris Mwase, Patricia Khomani, Tyler Smith, Hannah Cooper, Annie Mwale, Arbaaz Muslim, Gautam Prasad, Mimi Sun, Tomer Shekel, Joydeep Paul, Anna Carter, Shravya Shetty, Dylan Green
备注:13 pages, 3010 words, 2 tables, 2 figures
摘要:低收入和中等收入国家(LMIC)常规卫生数据的可靠性往往受到报告延迟和覆盖不完整的限制,因此需要探索新的数据来源与分析方法。地理空间基础模型(GeoFM)通过将多样的空间、时间和行为数据合成为可高效用于下游预测任务的数学嵌入,提供了一条有前景的途径。本研究评估了三种GeoFM嵌入来源,即谷歌人口动态基础模型(PDFM)、谷歌AlphaEarth(源自卫星图像)和手机通话详单记录(CDR),在建模马拉维15项常规卫生计划产出上的预测性能,并将其效用与传统地理空间插值方法进行比较。我们对来自552个卫生集水区(2021年1月至2023年5月)的数据使用XGBoost模型,以R2评估性能,采用80/20的训练/测试划分,并在训练中使用5折交叉验证。虽然预测性能好坏参半,但基于嵌入的方法在所测试的15项指标中的13项(87%)上优于基线地统计方法。综合全部三种嵌入来源的Multi-GeoFM模型产生了最稳健的预测,在人口密度(0.63)、新增HIV病例(0.57)和儿童疫苗接种(0.47)等指标上取得了相应的平均5折交叉验证R2值,测试集R2分别为0.64、0.68和0.55。对于原始数据可得性低的预测目标,如结核病和营养不良病例,预测效果较差。这些结果表明,在LMIC背景下,GeoFM嵌入为部分卫生和人口统计结果带来了适度的预测改进。我们的结论是,整合多个GeoFM来源是补充和加强受限常规卫生信息系统的一种高效且有价值的工具。
摘要:The reliability of routine health data in low and middle-income countries (LMICs) is often constrained by reporting delays and incomplete coverage, necessitating the exploration of novel data sources and analytics. Geospatial Foundation Models (GeoFMs) offer a promising avenue by synthesizing diverse spatial, temporal, and behavioral data into mathematical embeddings that can be efficiently used for downstream prediction tasks. This study evaluated the predictive performance of three GeoFM embedding sources - Google Population Dynamics Foundation Model (PDFM), Google AlphaEarth (derived from satellite imagery), and mobile phone call detail records (CDR) - for modeling 15 routine health programmatic outputs in Malawi, and compared their utility to traditional geospatial interpolation methods. We used XGBoost models on data from 552 health catchment areas (January 2021-May 2023), assessing performance with R2, and using an 80/20 training and test data split with 5-fold cross-validation used in training. While predictive performance was mixed, the embedding-based approaches improved upon baseline geostatistical methods in 13 of 15 (87%) indicators tested. A Multi-GeoFM model integrating all three embedding sources produced the most robust predictions, achieving average 5-fold cross validated R2 values for indicators like population density (0.63), new HIV cases (0.57), and child vaccinations (0.47) and test set R2 of 0.64, 0.68, and 0.55, respectively. Prediction was poor for prediction targets with low primary data availability, such as TB and malnutrition cases. These results demonstrate that GeoFM embeddings imbue a modest predictive improvement for select health and demographic outcomes in an LMIC context. We conclude that the integration of multiple GeoFM sources is an efficient and valuable tool for supplementing and strengthening constrained routine health information systems.
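上面的评估配方(XGBoost + 5折交叉验证 + R2)可以用几行代码示意如下;其中特征与标签为随机生成的占位数据,模型超参数为假设,仅演示流程。

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBRegressor

# 评估配方示意:GeoFM嵌入 -> XGBoost回归 -> 5折交叉验证的R2。
# 特征与标签为随机占位数据,超参数为假设。
X = np.random.randn(552, 64)                  # 552个集水区 x 嵌入维度
y = 2.0 * X[:, 0] + 0.5 * np.random.randn(552)

model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
r2 = cross_val_score(model, X, y, scoring="r2",
                     cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(r2.mean())
```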


【8】Bridging the Gap between Empirical Welfare Maximization and Conditional Average Treatment Effect Estimation in Policy Learning
标题:在政策学习中弥合经验福利最大化与条件平均处理效应估计之间的差距
链接:https://arxiv.org/abs/2510.26723

作者:Masahiro Kato
摘要:The goal of policy learning is to train a policy function that recommends a treatment given covariates to maximize population welfare. There are two major approaches in policy learning: the empirical welfare maximization (EWM) approach and the plug-in approach. The EWM approach is analogous to a classification problem, where one first builds an estimator of the population welfare, which is a functional of policy functions, and then trains a policy by maximizing the estimated welfare. In contrast, the plug-in approach is based on regression, where one first estimates the conditional average treatment effect (CATE) and then recommends the treatment with the highest estimated outcome. This study bridges the gap between the two approaches by showing that both are based on essentially the same optimization problem. In particular, we prove an exact equivalence between EWM and least squares over a reparameterization of the policy class. As a consequence, the two approaches are interchangeable in several respects and share the same theoretical guarantees under common conditions. Leveraging this equivalence, we propose a novel regularization method for policy learning. Our findings yield a convex and computationally efficient training procedure that avoids the NP-hard combinatorial step typically required in EWM.


【9】Optimizing Mirror-Image Peptide Sequence Design for Data Storage via Peptide Bond Cleavage Prediction
标题:通过肽键断裂预测优化用于数据存储的镜像肽序列设计
链接:https://arxiv.org/abs/2510.25814

作者:Yilong Lu, Si Chen, Songyan Gao, Han Liu, Xin Dong, Wenfeng Shen, Guangtai Ding
备注:8 pages, 4 figures
摘要:Traditional non-biological storage media, such as hard drives, face limitations in both storage density and lifespan due to the rapid growth of data in the big data era. Mirror-image peptides composed of D-amino acids have emerged as a promising biological storage medium due to their high storage density, structural stability, and long lifespan. The sequencing of mirror-image peptides relies on \textit{de-novo} technology. However, its accuracy is limited by the scarcity of tandem mass spectrometry datasets and the challenges that current algorithms encounter when processing these peptides directly. This study is the first to propose improving sequencing accuracy indirectly by optimizing the design of mirror-image peptide sequences. In this work, we introduce DBond, a deep neural network based model that integrates sequence features, precursor ion properties, and mass spectrometry environmental factors for the prediction of mirror-image peptide bond cleavage. In this process, sequences with a high peptide bond cleavage ratio, which are easy to sequence, are selected. The main contributions of this study are as follows. First, we constructed MiPD513, a tandem mass spectrometry dataset containing 513 mirror-image peptides. Second, we developed the peptide bond cleavage labeling algorithm (PBCLA), which generated approximately 12.5 million labeled data based on MiPD513. Third, we proposed a dual prediction strategy that combines multi-label and single-label classification. On an independent test set, the single-label classification strategy outperformed other methods in both single and multiple peptide bond cleavage prediction tasks, offering a strong foundation for sequence optimization.


其他神经网络|深度学习|模型|建模(28篇)

【1】The Oversight Game: Learning to Cooperatively Balance an AI Agent's Safety and Autonomy
标题:监督游戏:学会合作平衡人工智能代理的安全性和自主性
链接:https://arxiv.org/abs/2510.26752

作者:William Overman, Mohsen Bayati
摘要:As increasingly capable agents are deployed, a central safety question is how to retain meaningful human control without modifying the underlying system. We study a minimal control interface where an agent chooses whether to act autonomously (play) or defer (ask), while a human simultaneously chooses whether to be permissive (trust) or to engage in oversight (oversee). If the agent defers, the human's choice determines the outcome, potentially leading to a corrective action or a system shutdown. We model this interaction as a two-player Markov Game. Our analysis focuses on cases where this game qualifies as a Markov Potential Game (MPG), a class of games where we can provide an alignment guarantee: under a structural assumption on the human's value function, any decision by the agent to act more autonomously that benefits itself cannot harm the human's value. We also analyze extensions to this MPG framework. Theoretically, this perspective provides conditions for a specific form of intrinsic alignment. If the reward structures of the human-agent game meet these conditions, we have a formal guarantee that the agent improving its own outcome will not harm the human's. Practically, this model motivates a transparent control layer with predictable incentives where the agent learns to defer when risky and act when safe, while its pretrained policy and the environment's reward structure remain untouched. Our gridworld simulation shows that through independent learning, the agent and human discover their optimal oversight roles. The agent learns to ask when uncertain and the human learns when to oversee, leading to an emergent collaboration that avoids safety violations introduced post-training. This demonstrates a practical method for making misaligned models safer after deployment.


【2】Deep sequence models tend to memorize geometrically; it is unclear why
标题:深序列模型倾向于以几何方式记忆;目前尚不清楚原因
链接:https://arxiv.org/abs/2510.26745

作者:Shahriar Noroozizadeh, Vaishnavh Nagarajan, Elan Rosenfeld, Sanjiv Kumar
摘要 :In sequence modeling, the parametric memory of atomic facts has been predominantly abstracted as a brute-force lookup of co-occurrences between entities. We contrast this associative view against a geometric view of how memory is stored. We begin by isolating a clean and analyzable instance of Transformer reasoning that is incompatible with memory as strictly a storage of the local co-occurrences specified during training. Instead, the model must have somehow synthesized its own geometry of atomic facts, encoding global relationships between all entities, including non-co-occurring ones. This in turn has simplified a hard reasoning task involving an $\ell$-fold composition into an easy-to-learn 1-step geometric task.   From this phenomenon, we extract fundamental aspects of neural embedding geometries that are hard to explain. We argue that the rise of such a geometry, despite optimizing over mere local associations, cannot be straightforwardly attributed to typical architectural or optimizational pressures. Counterintuitively, an elegant geometry is learned even when it is not more succinct than a brute-force lookup of associations.   Then, by analyzing a connection to Node2Vec, we demonstrate how the geometry stems from a spectral bias that -- in contrast to prevailing theories -- indeed arises naturally despite the lack of various pressures. This analysis also points to practitioners a visible headroom to make Transformer memory more strongly geometric. We hope the geometric view of parametric memory encourages revisiting the default intuitions that guide researchers in areas like knowledge acquisition, capacity, discovery and unlearning.


【3】An All-Reduce Compatible Top-K Compressor for Communication-Efficient Distributed Learning
标题:一种与All-Reduce兼容的Top-K压缩器,用于通信高效的分布式学习
链接:https://arxiv.org/abs/2510.26709

作者:Chuyan Chen, Chenyang Ma, Zhangxin Li, Yutong He, Yanjie Dong, Kun Yuan
备注:8 pages, 2 figures
摘要:Communication remains a central bottleneck in large-scale distributed machine learning, and gradient sparsification has emerged as a promising strategy to alleviate this challenge. However, existing gradient compressors face notable limitations: Rand-$K$ discards structural information and performs poorly in practice, while Top-$K$ preserves informative entries but loses the contraction property and requires costly All-Gather operations. In this paper, we propose ARC-Top-$K$, an All-Reduce-Compatible Top-$K$ compressor that aligns sparsity patterns across nodes using a lightweight sketch of the gradient, enabling index-free All-Reduce while preserving globally significant information. ARC-Top-$K$ is provably contractive and, when combined with momentum error feedback (EF21M), achieves linear speedup and sharper convergence rates than the original EF21M under standard assumptions. Empirically, ARC-Top-$K$ matches the accuracy of Top-$K$ while reducing wall-clock training time by up to 60.7%, offering an efficient and scalable solution that combines the robustness of Rand-$K$ with the strong performance of Top-$K$.
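ARC-Top-K的关键点是让所有节点就同一个Top-k索引集达成一致,从而用普通All-Reduce完成稀疏聚合。下面的numpy模拟用"各节点梯度绝对值之和"充当文中的轻量sketch(真实sketch更轻量),仅演示索引对齐这一效果。

```python
import numpy as np

def aligned_topk_allreduce(grads, k):
    """模拟索引对齐的Top-k聚合:各节点先对同一份sketch达成一致
    (此处用各节点梯度绝对值之和代替论文更轻量的sketch),
    从而得到相同的全局索引集,其余求和可由普通All-Reduce完成。"""
    sketch = sum(np.abs(g) for g in grads)      # 该量本身可经All-Reduce得到
    idx = np.argpartition(sketch, -k)[-k:]      # 所有节点得到相同索引
    out = np.zeros_like(grads[0])
    out[idx] = sum(g[idx] for g in grads) / len(grads)
    return out

g = [np.random.randn(1000) for _ in range(4)]   # 模拟4个worker的梯度
avg = aligned_topk_allreduce(g, k=50)
```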


【4】How Regularization Terms Make Invertible Neural Networks Bayesian Point Estimators
标题:正则化项如何使可逆神经网络成为贝叶斯点估计器
链接:https://arxiv.org/abs/2510.26704

作者:Nick Heilenkötter
备注:Preprint, under review
摘要:Can regularization terms in the training of invertible neural networks lead to known Bayesian point estimators in reconstruction? Invertible networks are attractive for inverse problems due to their inherent stability and interpretability. Recently, optimization strategies for invertible neural networks that approximate either a reconstruction map or the forward operator have been studied from a Bayesian perspective, but each has limitations. To address this, we introduce and analyze two regularization terms for the network training that, upon inversion of the network, recover properties of classical Bayesian point estimators: while the first can be connected to the posterior mean, the second resembles the MAP estimator. Our theoretical analysis characterizes how each loss shapes both the learned forward operator and its inverse reconstruction map. Numerical experiments support our findings and demonstrate how these loss-term regularizers introduce data-dependence in a stable and interpretable way.


【5】Curly Flow Matching for Learning Non-gradient Field Dynamics
标题:用于学习非梯度场动力学的卷曲流匹配
链接:https://arxiv.org/abs/2510.26645

作者:Katarina Petrović, Lazar Atanackovic, Viggo Moro, Kacper Kapuśniak, İsmail İlkan Ceylan, Michael Bronstein, Avishek Joey Bose, Alexander Tong
备注:Accepted to NeurIPS 2025
摘要:Modeling the transport dynamics of natural processes from population-level observations is a ubiquitous problem in the natural sciences. Such models rely on key assumptions about the underlying process in order to enable faithful learning of governing dynamics that mimic the actual system behavior. The de facto assumption in current approaches relies on the principle of least action that results in gradient field dynamics and leads to trajectories minimizing an energy functional between two probability measures. However, many real-world systems, such as cell cycles in single-cell RNA, are known to exhibit non-gradient, periodic behavior, which fundamentally cannot be captured by current state-of-the-art methods such as flow and bridge matching. In this paper, we introduce Curly Flow Matching (Curly-FM), a novel approach that is capable of learning non-gradient field dynamics by designing and solving a Schrödinger bridge problem with a non-zero drift reference process - in stark contrast to typical zero-drift reference processes - which is constructed using inferred velocities in addition to population snapshot data. We showcase Curly-FM by solving the trajectory inference problems for single cells, computational fluid dynamics, and ocean currents with approximate velocities. We demonstrate that Curly-FM can learn trajectories that better match both the reference process and population marginals. Curly-FM expands flow matching models beyond the modeling of populations and towards the modeling of known periodic behavior in physical systems. Our code repository is accessible at: https://github.com/kpetrovicc/curly-flow-matching.git


【6】On Measuring Localization of Shortcuts in Deep Networks
标题:关于度量深度网络中捷径(shortcut)的定位
链接:https://arxiv.org/abs/2510.26560

作者:Nikita Tsoy, Nikola Konstantinov
摘要 :Shortcuts, spurious rules that perform well during training but fail to generalize, present a major challenge to the reliability of deep networks (Geirhos et al., 2020). However, the impact of shortcuts on feature representations remains understudied, obstructing the design of principled shortcut-mitigation methods. To overcome this limitation, we investigate the layer-wise localization of shortcuts in deep models. Our novel experiment design quantifies the layer-wise contribution to accuracy degradation caused by a shortcut-inducing skew by counterfactual training on clean and skewed datasets. We employ our design to study shortcuts on CIFAR-10, Waterbirds, and CelebA datasets across VGG, ResNet, DeiT, and ConvNeXt architectures. We find that shortcut learning is not localized in specific layers but distributed throughout the network. Different network parts play different roles in this process: shallow layers predominantly encode spurious features, while deeper layers predominantly forget core features that are predictive on clean data. We also analyze the differences in localization and describe its principal axes of variation. Finally, our analysis of layer-wise shortcut-mitigation strategies suggests the hardness of designing general methods, supporting dataset- and architecture-specific approaches instead.


【7】Boosted Trees on a Diet: Compact Models for Resource-Constrained Devices
标题:给提升树"瘦身":面向资源受限设备的紧凑模型
链接:https://arxiv.org/abs/2510.26557

作者:Jan Stenkamp, Nina Herrmann, Benjamin Karic, Stefan Oehmcke, Fabian Gieseke
摘要:Deploying machine learning models on compute-constrained devices has become a key building block of modern IoT applications. In this work, we present a compression scheme for boosted decision trees, addressing the growing need for lightweight machine learning models. Specifically, we provide techniques for training compact boosted decision tree ensembles that exhibit a reduced memory footprint by rewarding, among other things, the reuse of features and thresholds during training. Our experimental evaluation shows that models achieved the same performance with a compression ratio of 4-16x compared to LightGBM models using an adapted training process and an alternative memory layout. Once deployed, the corresponding IoT devices can operate independently of constant communication or external energy supply, and, thus, autonomously, requiring only minimal computing power and energy. This capability opens the door to a wide range of IoT applications, including remote monitoring, edge analytics, and real-time decision making in isolated or power-limited environments.


【8】Higher-Order Regularization Learning on Hypergraphs
标题:超图的高阶正则化学习
链接:https://arxiv.org/abs/2510.26533

作者:Adrien Weihs, Andrea Bertozzi, Matthew Thorpe
摘要:Higher-Order Hypergraph Learning (HOHL) was recently introduced as a principled alternative to classical hypergraph regularization, enforcing higher-order smoothness via powers of multiscale Laplacians induced by the hypergraph structure. Prior work established the well- and ill-posedness of HOHL through an asymptotic consistency analysis in geometric settings. We extend this theoretical foundation by proving the consistency of a truncated version of HOHL and deriving explicit convergence rates when HOHL is used as a regularizer in fully supervised learning. We further demonstrate its strong empirical performance in active learning and in datasets lacking an underlying geometric structure, highlighting HOHL's versatility and robustness across diverse learning settings.
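高阶正则化的核心对象是拉普拉斯矩阵的幂:u^T L^p u 随p增大而惩罚越来越精细的振荡。下面用一个4节点玩具图在numpy中演示这一量;HOHL实际使用的多尺度超图拉普拉斯更为复杂。

```python
import numpy as np

# 高阶正则化的核心:u^T L^p u 随p增大惩罚越来越精细的振荡。
# 这里用一个4节点玩具图示意;HOHL实际的多尺度超图拉普拉斯更复杂。
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # 邻接矩阵
L = np.diag(W.sum(axis=1)) - W              # 组合拉普拉斯
u = np.array([1.0, 1.0, -1.0, -1.0])        # 候选标注

for p in (1, 2, 3):
    print(p, u @ np.linalg.matrix_power(L, p) @ u)
```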


【9】Co-Evolving Latent Action World Models
标题:共同进化的潜在动作世界模型
链接:https://arxiv.org/abs/2510.26433

作者:Yucen Wang, Fengming Zhang, De-Chuan Zhan, Li Zhao, Kaixin Wang, Jiang Bian
摘要:Adapting pre-trained video generation models into controllable world models via latent actions is a promising step towards creating generalist world models. The dominant paradigm adopts a two-stage approach that trains latent action model (LAM) and the world model separately, resulting in redundant training and limiting their potential for co-adaptation. A conceptually simple and appealing idea is to directly replace the forward dynamic model in LAM with a powerful world model and training them jointly, but it is non-trivial and prone to representational collapse. In this work, we propose CoLA-World, which for the first time successfully realizes this synergistic paradigm, resolving the core challenge in joint learning through a critical warm-up phase that effectively aligns the representations of the from-scratch LAM with the pre-trained world model. This unlocks a co-evolution cycle: the world model acts as a knowledgeable tutor, providing gradients to shape a high-quality LAM, while the LAM offers a more precise and adaptable control interface to the world model. Empirically, CoLA-World matches or outperforms prior two-stage methods in both video simulation quality and downstream visual planning, establishing a robust and efficient new paradigm for the field.


【10】Multi-Task Learning Based on Support Vector Machines and Twin Support Vector Machines: A Comprehensive Survey
标题:基于支持向量机和双支持向量机的多任务学习:全面综述
链接:https://arxiv.org/abs/2510.26392

作者:Fatemeh Bazikar, Hossein Moosaei, Atefeh Hemmati, Panos M. Pardalos
摘要:Multi-task learning (MTL) enables simultaneous training across related tasks, leveraging shared information to improve generalization, efficiency, and robustness, especially in data-scarce or high-dimensional scenarios. While deep learning dominates recent MTL research, Support Vector Machines (SVMs) and Twin SVMs (TWSVMs) remain relevant due to their interpretability, theoretical rigor, and effectiveness with small datasets.   This chapter surveys MTL approaches based on SVM and TWSVM, highlighting shared representations, task regularization, and structural coupling strategies. Special attention is given to emerging TWSVM extensions for multi-task settings, which show promise but remain underexplored. We compare these models in terms of theoretical properties, optimization strategies, and empirical performance, and discuss applications in fields such as computer vision, natural language processing, and bioinformatics.   Finally, we identify research gaps and outline future directions for building scalable, interpretable, and reliable margin-based MTL frameworks. This work provides a comprehensive resource for researchers and practitioners interested in SVM- and TWSVM-based multi-task learning.


【11】UnifiedFL: A Dynamic Unified Learning Framework for Equitable Federation
标题:UnifiedFL:公平联邦的动态统一学习框架
链接:https://arxiv.org/abs/2510.26350

作者:Furkan Pala, Islem Rekik
摘要:Federated learning (FL) has emerged as a key paradigm for collaborative model training across multiple clients without sharing raw data, enabling privacy-preserving applications in areas such as radiology and pathology. However, works on collaborative training across clients with fundamentally different neural architectures and non-identically distributed datasets remain scarce. Existing FL frameworks face several limitations. Despite claiming to support architectural heterogeneity, most recent FL methods only tolerate variants within a single model family (e.g., shallower, deeper, or wider CNNs), still presuming a shared global architecture and failing to accommodate federations where clients deploy fundamentally different network types (e.g., CNNs, GNNs, MLPs). Moreover, existing approaches often address only statistical heterogeneity while overlooking the domain-fracture problem, where each client's data distribution differs markedly from that faced at testing time, undermining model generalizability. When clients use different architectures, have non-identically distributed data, and encounter distinct test domains, current methods perform poorly. To address these challenges, we propose UnifiedFL, a dynamic federated learning framework that represents heterogeneous local networks as nodes and edges in a directed model graph optimized by a shared graph neural network (GNN). UnifiedFL introduces (i) a common GNN to parameterize all architectures, (ii) distance-driven clustering via Euclidean distances between clients' parameters, and (iii) a two-tier aggregation policy balancing convergence and diversity. Experiments on MedMNIST classification and hippocampus segmentation benchmarks demonstrate UnifiedFL's superior performance. Code and data: https://github.com/basiralab/UnifiedFL


【12】Posterior Sampling by Combining Diffusion Models with Annealed Langevin Dynamics
标题:结合扩散模型与退火Langevin动力学进行后验采样
链接:https://arxiv.org/abs/2510.26324

作者:Zhiyang Xun, Shivam Gupta, Eric Price
备注:NeurIPS 2025
摘要:Given a noisy linear measurement $y = Ax + \xi$ of a distribution $p(x)$, and a good approximation to the prior $p(x)$, when can we sample from the posterior $p(x \mid y)$? Posterior sampling provides an accurate and fair framework for tasks such as inpainting, deblurring, and MRI reconstruction, and several heuristics attempt to approximate it. Unfortunately, approximate posterior sampling is computationally intractable in general.   To sidestep this hardness, we focus on (local or global) log-concave distributions $p(x)$. In this regime, Langevin dynamics yields posterior samples when the exact scores of $p(x)$ are available, but it is brittle to score-estimation error, requiring an MGF bound (sub-exponential error). By contrast, in the unconditional setting, diffusion models succeed with only an $L^2$ bound on the score error. We prove that combining diffusion models with an annealed variant of Langevin dynamics achieves conditional sampling in polynomial time using merely an $L^4$ bound on the score error.
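退火朗之万动力学的通用骨架如下:在逐渐减小的噪声梯度上运行朗之万更新,步长随噪声水平缩放。score此处为占位(玩具示例用N(0, I)的解析得分),对应论文中由扩散模型提供的后验得分。

```python
import numpy as np

def annealed_langevin(score, z, sigmas=(1.0, 0.5, 0.2, 0.1), steps=50, lr=1e-2, seed=0):
    """退火朗之万骨架:在递减的噪声梯度上运行朗之万更新,
    步长随噪声水平缩放。score(z, sigma)为占位,对应论文中
    由扩散模型提供的(带噪)后验得分。"""
    rng = np.random.default_rng(seed)
    for sigma in sigmas:
        eta = lr * sigma ** 2
        for _ in range(steps):
            z = z + eta * score(z, sigma) + np.sqrt(2 * eta) * rng.standard_normal(z.shape)
    return z

# 玩具实例:N(0, I)的解析得分
sample = annealed_langevin(lambda z, s: -z, np.zeros(8))
```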


【13】Model Inversion with Layer-Specific Modeling and Alignment for Data-Free Continual Learning
标题:通过特定层建模和对齐进行模型反演,实现无数据持续学习
链接:https://arxiv.org/abs/2510.26311

作者:Ruilin Tong, Haodong Lu, Yuhang Liu, Dong Gong
备注:Accepted in NeurIPS 2025
摘要:Continual learning (CL) aims to incrementally train a model on a sequence of tasks while retaining performance on prior ones. However, storing and replaying data is often infeasible due to privacy or security constraints and impractical for arbitrary pre-trained models. Data-free CL seeks to update models without access to previous data. Beyond regularization, we employ model inversion to synthesize data from the trained model, enabling replay without storing samples. Yet, model inversion in predictive models faces two challenges: (1) generating inputs solely from compressed output labels causes drift between synthetic and real data, and replaying such data can erode prior knowledge; (2) inversion is computationally expensive since each step backpropagates through the full model. These issues are amplified in large pre-trained models such as CLIP. To improve efficiency, we propose Per-layer Model Inversion (PMI), inspired by faster convergence in single-layer optimization. PMI provides strong initialization for full-model inversion, substantially reducing iterations. To mitigate feature shift, we model class-wise features via Gaussian distributions and contrastive model, ensuring alignment between synthetic and real features. Combining PMI and feature modeling, our approach enables continual learning of new classes by generating pseudo-images from semantic-aware projected features, achieving strong effectiveness and compatibility across multiple CL settings.


【14】Empirical Bayesian Multi-Bandit Learning
标题:经验贝叶斯多老虎机学习
链接:https://arxiv.org/abs/2510.26284

作者:Xia Jiang, Rong J.B. Zhu
备注:33 pages, 13 figures
摘要:Multi-task learning in contextual bandits has attracted significant research interest due to its potential to enhance decision-making across multiple related tasks by leveraging shared structures and task-specific heterogeneity. In this article, we propose a novel hierarchical Bayesian framework for learning in various bandit instances. This framework captures both the heterogeneity and the correlations among different bandit instances through a hierarchical Bayesian model, enabling effective information sharing while accommodating instance-specific variations. Unlike previous methods that overlook the learning of the covariance structure across bandits, we introduce an empirical Bayesian approach to estimate the covariance matrix of the prior distribution. This enhances both the practicality and flexibility of learning across multi-bandits. Building on this approach, we develop two efficient algorithms: ebmTS (Empirical Bayesian Multi-Bandit Thompson Sampling) and ebmUCB (Empirical Bayesian Multi-Bandit Upper Confidence Bound), both of which incorporate the estimated prior into the decision-making process. We provide the frequentist regret upper bounds for the proposed algorithms, thereby filling a research gap in the field of multi-bandit problems. Extensive experiments on both synthetic and real-world datasets demonstrate the superior performance of our algorithms, particularly in complex environments. Our methods achieve lower cumulative regret compared to existing techniques, highlighting their effectiveness in balancing exploration and exploitation across multi-bandits.


【15】Likely Interpolants of Generative Models
标题:生成模型的可能插值
链接:https://arxiv.org/abs/2510.26266

作者:Frederik Möbius Rygaard, Shen Zhu, Yinzhu Jin, Søren Hauberg, Tom Fletcher
摘要:Interpolation in generative models allows for controlled generation, model inspection, and more. Unfortunately, most generative models lack a principal notion of interpolants without restrictive assumptions on either the model or data dimension. In this paper, we develop a general interpolation scheme that targets likely transition paths compatible with different metrics and probability distributions. We consider interpolants analogous to a geodesic constrained to a suitable data distribution and derive a novel algorithm for computing these curves, which requires no additional training. Theoretically, we show that our method locally can be considered as a geodesic under a suitable Riemannian metric. We quantitatively show that our interpolation scheme traverses higher density regions than baselines across a range of models and datasets.


【16】Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error
标题:不要两次踏入同一条河:从试错中学习推理
链接:https://arxiv.org/abs/2510.26109

作者:Chenming Tang, Hsiu-Yuan Huang, Weijie Liu, Saiyong Yang, Yunfang Wu
备注:Work in progress
摘要:Reinforcement learning with verifiable rewards (RLVR) has significantly boosted the reasoning capability of large language models (LLMs) recently. However, existing RLVR approaches merely train LLMs based on their own generated responses and are constrained by the initial capability of LLMs, thus prone to exploration stagnation, in which LLMs fail to solve more training problems and cannot further learn from the training data. Some work tries to address this by leveraging off-policy solutions to training problems but requires external guidance from experts which suffers from limited availability. In this work, we propose LTE (Learning to reason from Trial and Error), an approach that hints LLMs with their previously self-generated incorrect answers and the problem of overlong responses, without requiring any external expert guidance. Experiments validate the effectiveness of LTE, which outperforms normal group relative policy optimization (GRPO) by 6.38 in Pass@1 and 9.00 in Pass@k on average across six mathematics benchmarks for Qwen3-4B-Base. Further analysis confirms that LTE successfully mitigates the problem of exploration stagnation and enhances both exploitation and exploration during training.


【17】Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism
标题:Nirvana:具有任务感知记忆机制的专业通才模型
链接:https://arxiv.org/abs/2510.26083

作者:Yuhua Jiang, Shuang Cheng, Yihao Liu, Ermo Hua, Che Jiang, Weigao Sun, Yu Cheng, Feifei Gao, Biqing Qi, Bowen Zhou
摘要:Specialized Generalist Models (SGMs) aim to preserve broad capabilities while achieving expert-level performance in target domains. However, traditional LLM structures including Transformer, Linear Attention, and hybrid models do not employ specialized memory mechanism guided by task information. In this paper, we present Nirvana, an SGM with specialized memory mechanism, linear time complexity, and test-time task information extraction. Besides, we propose the Task-Aware Memory Trigger ($\textit{Trigger}$) that flexibly adjusts memory mechanism based on the current task's requirements. In Trigger, each incoming sample is treated as a self-supervised fine-tuning task, enabling Nirvana to adapt its task-related parameters on the fly to domain shifts. We also design the Specialized Memory Updater ($\textit{Updater}$) that dynamically memorizes the context guided by Trigger. We conduct experiments on both general language tasks and specialized medical tasks. On a variety of natural language modeling benchmarks, Nirvana achieves competitive or superior results compared to the existing LLM structures. To prove the effectiveness of Trigger on specialized tasks, we test Nirvana's performance on a challenging medical task, i.e., Magnetic Resonance Imaging (MRI). We post-train frozen Nirvana backbone with lightweight codecs on paired electromagnetic signals and MRI images. Despite the frozen Nirvana backbone, Trigger guides the model to adapt to the MRI domain with the change of task-related parameters. Nirvana achieves higher-quality MRI reconstruction compared to conventional MRI models as well as the models with traditional LLMs' backbone, and can also generate accurate preliminary clinical reports accordingly.


【18】Risks and Opportunities in Human-Machine Teaming in Operationalizing Machine Learning Target Variables
标题:机器学习目标变量操作化中人机协作的风险与机遇
链接:https://arxiv.org/abs/2510.25974

作者:Mengtian Guo, David Gotz, Yue Wang
备注:23 pages, 6 figures
摘要:Predictive modeling has the potential to enhance human decision-making. However, many predictive models fail in practice due to problematic problem formulation in cases where the prediction target is an abstract concept or construct and practitioners need to define an appropriate target variable as a proxy to operationalize the construct of interest. The choice of an appropriate proxy target variable is rarely self-evident in practice, requiring both domain knowledge and iterative data modeling. This process is inherently collaborative, involving both domain experts and data scientists. In this work, we explore how human-machine teaming can support this process by accelerating iterations while preserving human judgment. We study the impact of two human-machine teaming strategies on proxy construction: 1) relevance-first: humans leading the process by selecting relevant proxies, and 2) performance-first: machines leading the process by recommending proxies based on predictive performance. Based on a controlled user study of a proxy construction task (N = 20), we show that the performance-first strategy facilitated faster iterations and decision-making, but also biased users towards well-performing proxies that are misaligned with the application goal. Our study highlights the opportunities and risks of human-machine teaming in operationalizing machine learning target variables, yielding insights for future research to explore the opportunities and mitigate the risks.


【19】On the Dataless Training of Neural Networks
标题:神经网络的无数据训练
链接:https://arxiv.org/abs/2510.25962

作者:Alvaro Velasquez, Susmit Jha, Ismail R. Alkhouri
摘要:This paper surveys studies on the use of neural networks for optimization in the training-data-free setting. Specifically, we examine the dataless application of neural network architectures in optimization by re-parameterizing problems using fully connected (or MLP), convolutional, graph, and quadratic neural networks. Although MLPs have been used to solve linear programs a few decades ago, this approach has recently gained increasing attention due to its promising results across diverse applications, including those based on combinatorial optimization, inverse problems, and partial differential equations. The motivation for this setting stems from two key (possibly over-lapping) factors: (i) data-driven learning approaches are still underdeveloped and have yet to demonstrate strong results, as seen in combinatorial optimization, and (ii) the availability of training data is inherently limited, such as in medical image reconstruction and other scientific applications. In this paper, we define the dataless setting and categorize it into two variants based on how a problem instance -- defined by a single datum -- is encoded onto the neural network: (i) architecture-agnostic methods and (ii) architecture-specific methods. Additionally, we discuss similarities and clarify distinctions between the dataless neural network (dNN) settings and related concepts such as zero-shot learning, one-shot learning, lifting in optimization, and over-parameterization.


【20】FreLE: Low-Frequency Spectral Bias in Neural Networks for Time-Series Tasks
标题:FreLE:时间序列任务神经网络中的低频频谱偏差
链接:https://arxiv.org/abs/2510.25800

作者:Jialong Sun, Xinpeng Ling, Jiaxuan Zou, Jiawen Kang, Kejia Zhang
摘要:The inherent autocorrelation of time series data presents an ongoing challenge to multivariate time series prediction. Recently, a widely adopted approach has been the incorporation of frequency domain information to assist in long-term prediction tasks. Many researchers have independently observed the spectral bias phenomenon in neural networks, where models tend to fit low-frequency signals before high-frequency ones. However, these observations have often been attributed to the specific architectures designed by the researchers, rather than recognizing the phenomenon as a universal characteristic across models. To unify the understanding of the spectral bias phenomenon in long-term time series prediction, we conducted extensive empirical experiments to measure spectral bias in existing mainstream models. Our findings reveal that virtually all models exhibit this phenomenon. To mitigate the impact of spectral bias, we propose the FreLE (Frequency Loss Enhancement) algorithm, which enhances model generalization through both explicit and implicit frequency regularization. This is a plug-and-play model loss function unit. A large number of experiments have proven the superior performance of FreLE. Code is available at https://github.com/Chenxing-Xuan/FreLE.
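按摘要的说法,FreLE 是一个即插即用的损失函数单元,通过频域正则缓解频谱偏差。下面是体现这一思路的最小示意(并非论文的具体公式,alpha 为假设的权重超参数):

```python
import torch

def frequency_enhanced_loss(pred, target, alpha=0.5):
    """Time-domain MSE plus an explicit penalty on FFT spectrum mismatch,
    encouraging the model to also fit high-frequency components."""
    mse = torch.mean((pred - target) ** 2)
    pf = torch.fft.rfft(pred, dim=-1)
    tf = torch.fft.rfft(target, dim=-1)
    freq = torch.mean(torch.abs(pf - tf) ** 2)
    return (1 - alpha) * mse + alpha * freq
```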


【21】SHA-256 Infused Embedding-Driven Generative Modeling of High-Energy Molecules in Low-Data Regimes
标题:低数据条件下融合SHA-256嵌入驱动的高能分子生成建模
链接:https://arxiv.org/abs/2510.25788

作者:Siddharth Verma, Alankar Alankar
摘要:High-energy materials (HEMs) are critical for propulsion and defense domains, yet their discovery remains constrained by experimental data and restricted access to testing facilities. This work presents a novel approach toward high-energy molecules by combining Long Short-Term Memory (LSTM) networks for molecular generation and Attentive Graph Neural Networks (GNN) for property predictions. We propose a transformative embedding space construction strategy that integrates fixed SHA-256 embeddings with partially trainable representations. Unlike conventional regularization techniques, this changes the representational basis itself, reshaping the molecular input space before learning begins. Without recourse to pretraining, the generator achieves 67.5% validity and 37.5% novelty. The generated library exhibits a mean Tanimoto coefficient of 0.214 relative to the training set, signifying the framework's ability to generate a diverse chemical space. We identified 37 new super explosives with predicted detonation velocities above 9 km/s.
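摘要中"固定 SHA-256 嵌入 + 部分可训练表示"的组合可以用如下草图理解(示意实现,词表与维度均为假设,论文的具体构造可能不同):

```python
import hashlib
import numpy as np
import torch
import torch.nn as nn

def sha256_vector(token: str, dim: int = 32) -> torch.Tensor:
    """Deterministic pseudo-embedding derived from a token's SHA-256 digest."""
    digest = hashlib.sha256(token.encode()).digest()        # 32 bytes
    arr = np.frombuffer(digest, dtype=np.uint8)[:dim].astype(np.float32)
    return torch.from_numpy(arr / 255.0 * 2 - 1)            # scale to [-1, 1]

class HybridEmbedding(nn.Module):
    """Frozen SHA-256 half concatenated with a trainable half."""
    def __init__(self, vocab, hash_dim=32, train_dim=32):
        super().__init__()
        fixed = torch.stack([sha256_vector(t, hash_dim) for t in vocab])
        self.register_buffer("fixed", fixed)                # not trained
        self.trainable = nn.Embedding(len(vocab), train_dim)

    def forward(self, idx):
        return torch.cat([self.fixed[idx], self.trainable(idx)], dim=-1)
```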


【22】A Practitioner's Guide to Kolmogorov-Arnold Networks
标题:Kolmogorov-Arnold网络从业者指南
链接:https://arxiv.org/abs/2510.25781

作者:Amir Noorizadegan, Sifan Wang, Leevan Ling
摘要:Kolmogorov-Arnold Networks (KANs) have recently emerged as a promising alternative to traditional Multilayer Perceptrons (MLPs), inspired by the Kolmogorov-Arnold representation theorem. Unlike MLPs, which use fixed activation functions on nodes, KANs employ learnable univariate basis functions on edges, offering enhanced expressivity and interpretability. This review provides a systematic and comprehensive overview of the rapidly expanding KAN landscape, moving beyond simple performance comparisons to offer a structured synthesis of theoretical foundations, architectural variants, and practical implementation strategies. By collecting and categorizing a vast array of open-source implementations, we map the vibrant ecosystem supporting KAN development. We begin by bridging the conceptual gap between KANs and MLPs, establishing their formal equivalence and highlighting the superior parameter efficiency of the KAN formulation. A central theme of our review is the critical role of the basis function; we survey a wide array of choices, including B-splines, Chebyshev and Jacobi polynomials, ReLU compositions, Gaussian RBFs, and Fourier series, and analyze their respective trade-offs in terms of smoothness, locality, and computational cost. We then categorize recent advancements into a clear roadmap, covering techniques for improving accuracy, efficiency, and regularization. Key topics include physics-informed loss design, adaptive sampling, domain decomposition, hybrid architectures, and specialized methods for handling discontinuities. Finally, we provide a practical "Choose-Your-KAN" guide to help practitioners select appropriate architectures, and we conclude by identifying current research gaps. The associated GitHub repository https://github.com/AmirNoori68/kan-review complements this paper and serves as a structured reference for ongoing KAN research.
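作为综述内容的直观补充,下面给出一个极简的 KAN 风格层:每条边上的一元函数用高斯 RBF 基(综述列举的基函数选择之一)展开,系数可学习。仅为教学示意,不对应任何特定开源实现:

```python
import torch
import torch.nn as nn

class RBFKANLayer(nn.Module):
    """Each edge (i -> o) applies a learnable univariate function expressed
    in a fixed Gaussian RBF basis; outputs sum the edge functions."""
    def __init__(self, in_dim, out_dim, n_basis=8, span=2.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(-span, span, n_basis))
        self.gamma = (n_basis / (2 * span)) ** 2
        self.coef = nn.Parameter(0.1 * torch.randn(in_dim, out_dim, n_basis))

    def forward(self, x):                      # x: (batch, in_dim)
        phi = torch.exp(-self.gamma * (x[..., None] - self.centers) ** 2)
        return torch.einsum("bif,iof->bo", phi, self.coef)
```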


【23】Learning Interpretable Features in Audio Latent Spaces via Sparse Autoencoders
标题:通过稀疏自动编码器学习音频潜在空间中的可解释特征
链接:https://arxiv.org/abs/2510.23802

作者:Nathan Paek, Yongyi Zang, Qihui Yang, Randal Leistikow
备注:Accepted to NeurIPS 2025 Mechanistic Interpretability Workshop
摘要:While sparse autoencoders (SAEs) successfully extract interpretable features from language models, applying them to audio generation faces unique challenges: audio's dense nature requires compression that obscures semantic meaning, and automatic feature characterization remains limited. We propose a framework for interpreting audio generative models by mapping their latent representations to human-interpretable acoustic concepts. We train SAEs on audio autoencoder latents, then learn linear mappings from SAE features to discretized acoustic properties (pitch, amplitude, and timbre). This enables both controllable manipulation and analysis of the AI music generation process, revealing how acoustic properties emerge during synthesis. We validate our approach on continuous (DiffRhythm-VAE) and discrete (EnCodec, WavTokenizer) audio latent spaces, and analyze DiffRhythm, a state-of-the-art text-to-music model, to demonstrate how pitch, timbre, and loudness evolve throughout generation. While our work is only done on audio modality, our framework can be extended to interpretable analysis of visual latent space generation models.
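论文在音频潜变量上训练的 Top-K 稀疏自编码器,其核心结构可以用如下草图理解(维度与 K 值为示例假设):

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    """Top-K sparse autoencoder: keep only the K largest activations of an
    overcomplete dictionary, then reconstruct the input from them."""
    def __init__(self, d_in=128, d_dict=2048, k=32):
        super().__init__()
        self.k = k
        self.enc = nn.Linear(d_in, d_dict)
        self.dec = nn.Linear(d_dict, d_in, bias=False)

    def forward(self, x):
        z = torch.relu(self.enc(x))
        topk = torch.topk(z, self.k, dim=-1)
        z_sparse = torch.zeros_like(z).scatter(-1, topk.indices, topk.values)
        return self.dec(z_sparse), z_sparse
```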


【24】Assessment of the conditional exchangeability assumption in causal machine learning models: a simulation study
标题:因果机器学习模型中条件可交换性假设的评估:模拟研究
链接:https://arxiv.org/abs/2510.26700

作者:Gerard T. Portela, Jason B. Gibbons, Sebastian Schneeweiss, Rishi J. Desai
摘要:Observational studies developing causal machine learning (ML) models for the prediction of individualized treatment effects (ITEs) seldom conduct empirical evaluations to assess the conditional exchangeability assumption. We aimed to evaluate the performance of these models under conditional exchangeability violations and the utility of negative control outcomes (NCOs) as a diagnostic. We conducted a simulation study to examine confounding bias in ITE estimates generated by causal forest and X-learner models under varying conditions, including the presence or absence of true heterogeneity. We simulated data to reflect real-world scenarios with differing levels of confounding, sample size, and NCO confounding structures. We then estimated and compared subgroup-level treatment effects on the primary outcome and NCOs across settings with and without unmeasured confounding. When conditional exchangeability was violated, causal forest and X-learner models failed to recover true treatment effect heterogeneity and, in some cases, falsely indicated heterogeneity when there was none. NCOs successfully identified subgroups affected by unmeasured confounding. Even when NCOs did not perfectly satisfy its ideal assumptions, it remained informative, flagging potential bias in subgroup level estimates, though not always pinpointing the subgroup with the largest confounding. Violations of conditional exchangeability substantially limit the validity of ITE estimates from causal ML models in routinely collected observational data. NCOs serve a useful empirical diagnostic tool for detecting subgroup-specific unmeasured confounding and should be incorporated into causal ML workflows to support the credibility of individualized inference.


【25】Physics-Informed Mixture Models and Surrogate Models for Precision Additive Manufacturing
标题:精确增材制造的物理知识混合模型和替代模型
链接:https://arxiv.org/abs/2510.26586

作者:Sebastian Basterrech, Shuo Shan, Debabrata Adhikari, Sankhya Mohanty
备注:Five pages, four figures, to be presented at the AI in Science Summit, Denmark, November, 2025
摘要:In this study, we leverage a mixture model learning approach to identify defects in laser-based Additive Manufacturing (AM) processes. By incorporating physics based principles, we also ensure that the model is sensitive to meaningful physical parameter variations. The empirical evaluation was conducted by analyzing real-world data from two AM processes: Directed Energy Deposition and Laser Powder Bed Fusion. In addition, we also studied the performance of the developed framework over public datasets with different alloy type and experimental parameter information. The results show the potential of physics-guided mixture models to examine the underlying physical behavior of an AM system.


【26】Robust Super-Capacity SRS Channel Inpainting via Diffusion Models
标题:通过扩散模型实现稳健的超容量SRS信道修复
链接:https://arxiv.org/abs/2510.26097

作者:Usman Akram, Fan Zhang, Yang Li, Haris Vikalo
摘要:Accurate channel state information (CSI) is essential for reliable multiuser MIMO operation. In 5G NR, reciprocity-based beamforming via uplink Sounding Reference Signals (SRS) face resource and coverage constraints, motivating sparse non-uniform SRS allocation. Prior masked-autoencoder (MAE) approaches improve coverage but overfit to training masks and degrade under unseen distortions (e.g., additional masking, interference, clipping, non-Gaussian noise). We propose a diffusion-based channel inpainting framework that integrates system-model knowledge at inference via a likelihood-gradient term, enabling a single trained model to adapt across mismatched conditions. On standardized CDL channels, the score-based diffusion variant consistently outperforms a UNet score-model baseline and the one-step MAE under distribution shift, with improvements up to 14 dB NMSE in challenging settings (e.g., Laplace noise, user interference), while retaining competitive accuracy under matched conditions. These results demonstrate that diffusion-guided inpainting is a robust and generalizable approach for super-capacity SRS design in 5G NR systems.
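摘要所说"在推理阶段通过似然梯度项注入系统模型知识",形式上接近后验引导的扩散采样。下面是示意性的单步更新(非论文实现;step、guide 等系数为假设,并省略了随机噪声项):

```python
import torch

def guided_step(x_t, t, score_model, y_obs, mask, step, guide):
    """One schematic reverse-diffusion update: learned prior score plus the
    gradient of a data-consistency term on the observed SRS entries."""
    x_t = x_t.detach().requires_grad_(True)
    score = score_model(x_t, t)                      # prior score s(x_t, t)
    resid = ((mask * (x_t - y_obs)) ** 2).sum()      # consistency on observed
    lik_grad = torch.autograd.grad(resid, x_t)[0]
    return (x_t + step * score - guide * lik_grad).detach()
```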


【27】Bias-Corrected Data Synthesis for Imbalanced Learning
标题:用于不平衡学习的偏差修正数据合成
链接:https://arxiv.org/abs/2510.26046

作者:Pengfei Lyu, Zhengchi Ma, Linjun Zhang, Anru R. Zhang
备注:41 pages, 4 figures, includes proofs and appendix
摘要:Imbalanced data, where the positive samples represent only a small proportion compared to the negative samples, makes it challenging for classification problems to balance the false positive and false negative rates. A common approach to addressing the challenge involves generating synthetic data for the minority group and then training classification models with both observed and synthetic data. However, since the synthetic data depends on the observed data and fails to replicate the original data distribution accurately, prediction accuracy is reduced when the synthetic data is naively treated as the true data. In this paper, we address the bias introduced by synthetic data and provide consistent estimators for this bias by borrowing information from the majority group. We propose a bias correction procedure to mitigate the adverse effects of synthetic data, enhancing prediction accuracy while avoiding overfitting. This procedure is extended to broader scenarios with imbalanced data, such as imbalanced multi-task learning and causal inference. Theoretical properties, including bounds on bias estimation errors and improvements in prediction accuracy, are provided. Simulation results and data analysis on handwritten digit datasets demonstrate the effectiveness of our method.


【28】Discovering Interpretable Biological Concepts in Single-cell RNA-seq Foundation Models
标题:在单细胞RNA-seq基础模型中发现可解释的生物学概念
链接:https://arxiv.org/abs/2510.25807

作者:Charlotte Claye (MICS), Pierre Marschall, Wassila Ouerdane (MICS), Céline Hudelot (MICS), Julien Duquesne
摘要:Single-cell RNA-seq foundation models achieve strong performance on downstream tasks but remain black boxes, limiting their utility for biological discovery. Recent work has shown that sparse dictionary learning can extract concepts from deep learning models, with promising applications in biomedical imaging and protein models. However, interpreting biological concepts remains challenging, as biological sequences are not inherently human-interpretable. We introduce a novel concept-based interpretability framework for single-cell RNA-seq models with a focus on concept interpretation and evaluation. We propose an attribution method with counterfactual perturbations that identifies genes that influence concept activation, moving beyond correlational approaches like differential expression analysis. We then provide two complementary interpretation approaches: an expert-driven analysis facilitated by an interactive interface and an ontology-driven method with attribution-based biological pathway enrichment. Applying our framework to two well-known single-cell RNA-seq models from the literature, we interpret concepts extracted by Top-K Sparse Auto-Encoders trained on two immune cell datasets. With a domain expert in immunology, we show that concepts improve interpretability compared to individual neurons while preserving the richness and informativeness of the latent representations. This work provides a principled framework for interpreting what biological knowledge foundation models have encoded, paving the way for their use for hypothesis generation and discovery.


其他(45篇)

【1】Scaling Image Geo-Localization to Continent Level
标题:将图像地理定位缩放到大陆级别
链接:https://arxiv.org/abs/2510.26795

作者:Philipp Lindenberger, Paul-Edouard Sarlin, Jan Hosang, Matteo Balice, Marc Pollefeys, Simon Lynen, Eduard Trulls
备注:NeurIPS 2025
摘要:Determining the precise geographic location of an image at a global scale remains an unsolved challenge. Standard image retrieval techniques are inefficient due to the sheer volume of images (>100M) and fail when coverage is insufficient. Scalable solutions, however, involve a trade-off: global classification typically yields coarse results (10+ kilometers), while cross-view retrieval between ground and aerial imagery suffers from a domain gap and has been primarily studied on smaller regions. This paper introduces a hybrid approach that achieves fine-grained geo-localization across a large geographic expanse the size of a continent. We leverage a proxy classification task during training to learn rich feature representations that implicitly encode precise location information. We combine these learned prototypes with embeddings of aerial imagery to increase robustness to the sparsity of ground-level data. This enables direct, fine-grained retrieval over areas spanning multiple countries. Our extensive evaluation demonstrates that our approach can localize within 200m more than 68\% of queries of a dataset covering a large part of Europe. The code is publicly available at https://scaling-geoloc.github.io.


【2】Remote Labor Index: Measuring AI Automation of Remote Work
标题:远程劳动力指数:衡量远程工作的人工智能自动化
链接:https://arxiv.org/abs/2510.26787

作者:Mantas Mazeika, Alice Gatti, Cristina Menghini, Udari Madhushani Sehwag, Shivam Singhal, Yury Orlovskiy, Steven Basart, Manasi Sharma, Denis Peskoff, Elaine Lau, Jaehyuk Lim, Lachlan Carroll, Alice Blair, Vinaya Sivakumar, Sumana Basu, Brad Kenstler, Yuntao Ma, Julian Michael, Xiaoke Li, Oliver Ingebretsen, Aditya Mehta, Jean Mottola, John Teichmann, Kevin Yu, Zaina Shaik, Adam Khoja, Richard Ren, Jason Hausenloy, Long Phan, Ye Htet, Ankit Aich, Tahseen Rabbani, Vivswan Shah, Andriy Novykov, Felix Binder, Kirill Chugunov, Luis Ramirez, Matias Geralnik, Hernán Mesura, Dean Lee, Ed-Yeremai Hernandez Cardona, Annette Diamond, Summer Yue, Alexandr Wang, Bing Liu, Ernesto Hernandez, Dan Hendrycks
备注:Website: this https URL
摘要:AIs have made rapid progress on research-oriented benchmarks of knowledge and reasoning, but it remains unclear how these gains translate into economic value and automation. To measure this, we introduce the Remote Labor Index (RLI), a broadly multi-sector benchmark comprising real-world, economically valuable projects designed to evaluate end-to-end agent performance in practical settings. AI agents perform near the floor on RLI, with the highest-performing agent achieving an automation rate of 2.5%. These results help ground discussions of AI automation in empirical evidence, setting a common basis for tracking AI impacts and enabling stakeholders to proactively navigate AI-driven labor automation.


【3】Faithful and Fast Influence Function via Advanced Sampling
标题:通过高级采样实现忠实且快速的影响函数
链接:https://arxiv.org/abs/2510.26776

作者:Jungyeon Koh, Hyeonsu Lyu, Jonggyu Jang, Hyun Jong Yang
摘要:How can we explain the influence of training data on black-box models? Influence functions (IFs) offer a post-hoc solution by utilizing gradients and Hessians. However, computing the Hessian for an entire dataset is resource-intensive, necessitating a feasible alternative. A common approach involves randomly sampling a small subset of the training data, but this method often results in highly inconsistent IF estimates due to the high variance in sample configurations. To address this, we propose two advanced sampling techniques based on features and logits. These samplers select a small yet representative subset of the entire dataset by considering the stochastic distribution of features or logits, thereby enhancing the accuracy of IF estimations. We validate our approach through class removal experiments, a typical application of IFs, using the F1-score to measure how effectively the model forgets the removed class while maintaining inference consistency on the remaining classes. Our method reduces computation time by 30.1% and memory usage by 42.2%, or improves the F1-score by 2.5% compared to the baseline.
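影响函数的标准近似是 $-\nabla_\theta L(z_{\text{test}})^\top H^{-1} \nabla_\theta L(z_i)$;论文的贡献在于如何采样用于构造 Hessian 的子集。下面以逻辑回归为例给出"子集 Hessian + 影响分数"的通用管线示意(特征/对数几率采样策略本身未实现,subset_idx 由调用方给定):

```python
import numpy as np

def influences_on_test(X, y, w, x_test, y_test, subset_idx, lam=1e-2):
    """Influence of every training point on the test loss, with the Hessian
    estimated only on a sampled subset (ridge-damped for invertibility)."""
    def grad(x, yi):
        p = 1.0 / (1.0 + np.exp(-x @ w))
        return (p - yi) * x
    Xs = X[subset_idx]
    ps = 1.0 / (1.0 + np.exp(-Xs @ w))
    H = (Xs * (ps * (1 - ps))[:, None]).T @ Xs / len(subset_idx)
    H += lam * np.eye(X.shape[1])
    h_inv_g = np.linalg.solve(H, grad(x_test, y_test))
    return np.array([-grad(X[i], y[i]) @ h_inv_g for i in range(len(X))])
```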


【4】STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization
标题:STaMP:用于低精度激活量化的序列转换和混合精度
链接:https://arxiv.org/abs/2510.26771

作者:Marco Federici, Riccardo Del Chiaro, Boris van Breugel, Paul Whatmough, Markus Nagel
备注:10 pages main text, 8 pages supplementary material
摘要:Quantization is the key method for reducing inference latency, power and memory footprint of generative AI models. However, accuracy often degrades sharply when activations are quantized below eight bits. Recent work suggests that invertible linear transformations (e.g. rotations) can aid quantization, by reparameterizing feature channels and weights. In this paper, we propose \textit{Sequence Transformation and Mixed Precision} (STaMP) quantization, a novel strategy that applies linear transformations along the \textit{sequence} dimension to exploit the strong local correlation in language and visual data. By keeping a small number of tokens in each intermediate activation at higher precision, we can maintain model accuracy at lower (average) activations bit-widths. We evaluate STaMP on recent LVM and LLM architectures, demonstrating that it significantly improves low bit width activation quantization and complements established activation and weight quantization methods including recent feature transformations.


【5】On the limitation of evaluating machine unlearning using only a single training seed
标题:关于仅使用单个训练种子评估机器遗忘的局限性
链接:https://arxiv.org/abs/2510.26714

作者:Jamie Lanyon, Axel Finke, Petros Andreou, Georgina Cosma
备注:mini paper, 2 figures
摘要:Machine unlearning (MU) aims to remove the influence of certain data points from a trained model without costly retraining. Most practical MU algorithms are only approximate and their performance can only be assessed empirically. Care must therefore be taken to make empirical comparisons as representative as possible. A common practice is to run the MU algorithm multiple times independently starting from the same trained model. In this work, we demonstrate that this practice can give highly non-representative results because -- even for the same architecture and same dataset -- some MU methods can be highly sensitive to the choice of random number seed used for model training. We therefore recommend that empirical comparisons of MU algorithms should also reflect the variability across different model training seeds.


【6】Budgeted Multiple-Expert Deferral
标题:预算受限的多专家延迟决策
链接:https://arxiv.org/abs/2510.26706

作者:Giulia DeSalvo, Clara Mohri, Mehryar Mohri, Yutao Zhong
摘要:Learning to defer uncertain predictions to costly experts offers a powerful strategy for improving the accuracy and efficiency of machine learning systems. However, standard training procedures for deferral algorithms typically require querying all experts for every training instance, an approach that becomes prohibitively expensive when expert queries incur significant computational or resource costs. This undermines the core goal of deferral: to limit unnecessary expert usage. To overcome this challenge, we introduce the budgeted deferral framework, which aims to train effective deferral algorithms while minimizing expert query costs during training. We propose new algorithms for both two-stage and single-stage multiple-expert deferral settings that selectively query only a subset of experts per training example. While inspired by active learning, our setting is fundamentally different: labels are already known, and the core challenge is to decide which experts to query in order to balance cost and predictive performance. We establish theoretical guarantees for both of our algorithms, including generalization bounds and label complexity analyses. Empirical results across several domains show that our algorithms substantially reduce training costs without sacrificing prediction accuracy, demonstrating the practical value of our budget-aware deferral algorithms.


【7】Kimi Linear: An Expressive, Efficient Attention Architecture
标题:Kimi Linear:一个富有表现力、高效的注意力架构
链接:https://arxiv.org/abs/2510.26692

作者:Kimi Team: Yu Zhang, Zongyu Lin, Xingcheng Yao, Jiaxi Hu, Fanqing Meng, Chengyin Liu, Xin Men, Songlin Yang, Zhiyuan Li, Wentao Li, Enzhe Lu, Weizhou Liu, Yanru Chen, Weixin Xu, Longhui Yu, Yejie Wang, Yu Fan, Longguang Zhong, Enming Yuan, Dehao Zhang, Yizhi Zhang, T.Y. Liu, Haiming Wang, Shengjun Fang, Weiran He, Shaowei Liu, Yiwei Li, Jianlin Su, Jiezhong Qiu, Bo Pang, Junjie Yan, Zhejun Jiang, Weixiao Huang, Bohong Yin, Jiacheng You, Chu Wei, Zhengtao Wang, Chao Hong, Yutian Chen, Guanduo Chen, Yucheng Wang, Huabin Zheng, Feng Wang, Yibo Liu, Mengnan Dong, Zheng Zhang, Siyuan Pan, Wenhao Wu, Yuhao Wu, Longyu Guan, Jiawen Tao, Guohong Fu, Xinran Xu, Yuzhi Wang, Guokun Lai, Yuxin Wu, Xinyu Zhou, Zhilin Yang, Yulun Du
备注:Kimi Linear tech report
摘要:We introduce Kimi Linear, a hybrid linear attention architecture that, for the first time, outperforms full attention under fair comparisons across various scenarios -- including short-context, long-context, and reinforcement learning (RL) scaling regimes. At its core lies Kimi Delta Attention (KDA), an expressive linear attention module that extends Gated DeltaNet with a finer-grained gating mechanism, enabling more effective use of limited finite-state RNN memory. Our bespoke chunkwise algorithm achieves high hardware efficiency through a specialized variant of the Diagonal-Plus-Low-Rank (DPLR) transition matrices, which substantially reduces computation compared to the general DPLR formulation while remaining more consistent with the classical delta rule.   We pretrain a Kimi Linear model with 3B activated parameters and 48B total parameters, based on a layerwise hybrid of KDA and Multi-Head Latent Attention (MLA). Our experiments show that with an identical training recipe, Kimi Linear outperforms full MLA with a sizeable margin across all evaluated tasks, while reducing KV cache usage by up to 75% and achieving up to 6 times decoding throughput for a 1M context. These results demonstrate that Kimi Linear can be a drop-in replacement for full attention architectures with superior performance and efficiency, including tasks with longer input and output lengths.   To support further research, we open-source the KDA kernel and vLLM implementations, and release the pre-trained and instruction-tuned model checkpoints.
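KDA 在 Gated DeltaNet 的 delta 规则上引入了更细粒度的门控。下面给出逐步形式的门控 delta 规则教学示意(非 Kimi 开源的 chunkwise 核;张量形状与门控位置均为示例假设,实际实现远快于这种顺序写法):

```python
import torch

def gated_delta_rule(q, k, v, beta, alpha):
    """Sequential (didactic) gated delta-rule fast-weight recurrence.
    q, k, v: (T, d); beta: (T,) write strengths in [0, 1];
    alpha: (T, d) per-channel forget gates (finer-grained than a scalar)."""
    T, d = q.shape
    S = torch.zeros(d, d)                  # fast-weight memory mapping k -> v
    outputs = []
    for t in range(T):
        S = S * alpha[t]                   # fine-grained decay over key channels
        kt = k[t] / (k[t].norm() + 1e-6)
        # delta rule: erase the value currently bound to kt, write the new one
        S = S - beta[t] * torch.outer(S @ kt, kt) + beta[t] * torch.outer(v[t], kt)
        outputs.append(S @ q[t])
    return torch.stack(outputs)

# toy usage
T, d = 16, 8
out = gated_delta_rule(torch.randn(T, d), torch.randn(T, d), torch.randn(T, d),
                       torch.sigmoid(torch.randn(T)), torch.sigmoid(torch.randn(T, d)))
```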


【8】LoRAQuant: Mixed-Precision Quantization of LoRA to Ultra-Low Bits
标题:LoRAQuant:将LoRA混合精度量化至超低位宽
链接:https://arxiv.org/abs/2510.26690

作者:Amir Reza Mirzaei, Yuqiao Wen, Yanshuai Cao, Lili Mou
摘要:Low-Rank Adaptation (LoRA) has become a popular technique for parameter-efficient fine-tuning of large language models (LLMs). In many real-world scenarios, multiple adapters are loaded simultaneously to enable LLM customization for personalized user experiences or to support a diverse range of tasks. Although each adapter is lightweight in isolation, their aggregate cost becomes substantial at scale. To address this, we propose LoRAQuant, a mixed-precision post-training quantization method tailored to LoRA. Specifically, LoRAQuant reparameterizes each adapter by singular value decomposition (SVD) to concentrate the most important information into specific rows and columns. This makes it possible to quantize the important components to higher precision, while quantizing the rest to ultra-low bitwidth. We conduct comprehensive experiments with LLaMA 2-7B, LLaMA 2-13B, and Mistral 7B models on mathematical reasoning, coding, and summarization tasks. Results show that our LoRAQuant uses significantly lower bits than other quantization methods, but achieves comparable or even higher performance.


【9】Tight Differentially Private PCA via Matrix Coherence
标题:基于矩阵相干性的紧致差分隐私PCA
链接:https://arxiv.org/abs/2510.26679

作者:Tommaso d'Orsi, Gleb Novikov
备注:SODA 2026; equal contribution
摘要:We revisit the task of computing the span of the top $r$ singular vectors $u_1, \ldots, u_r$ of a matrix under differential privacy. We show that a simple and efficient algorithm -- based on singular value decomposition and standard perturbation mechanisms -- returns a private rank-$r$ approximation whose error depends only on the \emph{rank-$r$ coherence} of $u_1, \ldots, u_r$ and the spectral gap $\sigma_r - \sigma_{r+1}$. This resolves a question posed by Hardt and Roth (2013). Our estimator outperforms the state of the art -- significantly so in some regimes. In particular, we show that in the dense setting, it achieves the same guarantees for single-spike PCA in the Wishart model as those attained by optimal non-private algorithms, whereas prior private algorithms failed to do so.   In addition, we prove that (rank-$r$) coherence does not increase under Gaussian perturbations. This implies that any estimator based on the Gaussian mechanism -- including ours -- preserves the coherence of the input. We conjecture that similar behavior holds for other structured models, including planted problems in graphs.   We also explore applications of coherence to graph problems. In particular, we present a differentially private algorithm for Max-Cut and other constraint satisfaction problems under low coherence assumptions.
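摘要所述"SVD + 标准扰动机制"的流程可概括为如下草图(示意;噪声尺度 sigma 与 $(\varepsilon,\delta)$ 的标定,以及误差对相干性/谱隙的依赖分析均略去):

```python
import numpy as np

def private_top_r(A, r, sigma, rng=np.random.default_rng(0)):
    """Gaussian-mechanism sketch: symmetrically perturb the covariance,
    then return the span of its top-r eigenvectors."""
    C = A.T @ A
    noise = rng.normal(0.0, sigma, C.shape)
    C_priv = C + (noise + noise.T) / 2          # keep the matrix symmetric
    eigvals, eigvecs = np.linalg.eigh(C_priv)
    top = np.argsort(eigvals)[::-1][:r]
    return eigvecs[:, top]
```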


【10】Aeolus: A Multi-structural Flight Delay Dataset
标题:Aeolus:多结构航班延误数据集
链接:https://arxiv.org/abs/2510.26616

作者:Lin Xu, Xinyun Yuan, Yuxuan Liang, Suwan Yin, Yuankai Wu
摘要:We introduce Aeolus, a large-scale Multi-modal Flight Delay Dataset designed to advance research on flight delay prediction and support the development of foundation models for tabular data. Existing datasets in this domain are typically limited to flat tabular structures and fail to capture the spatiotemporal dynamics inherent in delay propagation. Aeolus addresses this limitation by providing three aligned modalities: (i) a tabular dataset with rich operational, meteorological, and airport-level features for over 50 million flights; (ii) a flight chain module that models delay propagation along sequential flight legs, capturing upstream and downstream dependencies; and (iii) a flight network graph that encodes shared aircraft, crew, and airport resource connections, enabling cross-flight relational reasoning. The dataset is carefully constructed with temporal splits, comprehensive features, and strict leakage prevention to support realistic and reproducible machine learning evaluation. Aeolus supports a broad range of tasks, including regression, classification, temporal structure modeling, and graph learning, serving as a unified benchmark across tabular, sequential, and graph modalities. We release baseline experiments and preprocessing tools to facilitate adoption. Aeolus fills a key gap for both domain-specific modeling and general-purpose structured data research. Our source code and data can be accessed at https://github.com/Flnny/Delay-data


【11】Wasserstein Regression as a Variational Approximation of Probabilistic Trajectories through the Bernstein Basis
标题:沃瑟斯坦回归作为伯恩斯坦基概率轨迹的变分逼近
链接:https://arxiv.org/abs/2510.26607

作者:Maksim Maslov, Alexander Kugaevskikh, Matthew Ivanov
摘要:This paper considers the problem of regression over distributions, which is becoming increasingly important in machine learning. Existing approaches often ignore the geometry of the probability space or are computationally expensive. To overcome these limitations, a new method is proposed that combines the parameterization of probability trajectories using a Bernstein basis and the minimization of the Wasserstein distance between distributions. The key idea is to model a conditional distribution as a smooth probability trajectory defined by a weighted sum of Gaussian components whose parameters -- the mean and covariance -- are functions of the input variable constructed using Bernstein polynomials. The loss function is the averaged squared Wasserstein distance between the predicted Gaussian distributions and the empirical data, which takes into account the geometry of the distributions. An autodiff-based optimization method is used to train the model. Experiments on synthetic datasets that include complex trajectories demonstrated that the proposed method provides competitive approximation quality in terms of the Wasserstein distance, Energy Distance, and RMSE metrics, especially in cases of pronounced nonlinearity. The model demonstrates trajectory smoothness that is better than or comparable to alternatives and robustness to changes in data structure, while maintaining high interpretability due to explicit parameterization via control points. The developed approach represents a balanced solution that combines geometric accuracy, computational practicality, and interpretability. Prospects for further research include extending the method to non-Gaussian distributions, applying entropy regularization to speed up computations, and adapting the approach to working with high-dimensional data for approximating surfaces and more complex structures.
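其核心构造是:用 Bernstein 基把高斯分布的均值与尺度参数化为输入的光滑函数,并以 Wasserstein 距离为损失。下面是一维情形的最小示意(利用一维高斯间 $W_2^2=(m_1-m_2)^2+(s_1-s_2)^2$ 的闭式;阶数与"目标均值/尺度"均为示例假设):

```python
import torch
from math import comb

degree = 5
coef_m = torch.zeros(degree + 1, requires_grad=True)   # control points (mean)
coef_s = torch.zeros(degree + 1, requires_grad=True)   # control points (scale)

def bernstein(t):                          # t in [0, 1], shape (N,)
    ks = torch.arange(degree + 1)
    binom = torch.tensor([comb(degree, int(k)) for k in ks], dtype=t.dtype)
    return binom * t[:, None] ** ks * (1 - t[:, None]) ** (degree - ks)

t = torch.rand(256)
target_m, target_s = torch.sin(2 * t), 0.1 + 0.2 * t   # synthetic ground truth

opt = torch.optim.Adam([coef_m, coef_s], lr=0.05)
for _ in range(1000):
    B = bernstein(t)
    m = B @ coef_m
    s = torch.nn.functional.softplus(B @ coef_s)       # keep scale positive
    loss = ((m - target_m) ** 2 + (s - target_s) ** 2).mean()   # mean W2^2
    opt.zero_grad(); loss.backward(); opt.step()
```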


【12】Multiclass Local Calibration With the Jensen-Shannon Distance
标题:利用Jensen-Shannon距离进行多类局部校准
链接:https://arxiv.org/abs/2510.26566

作者:Cesare Barbera, Lorenzo Perini, Giovanni De Toni, Andrea Passerini, Andrea Pugnana
摘要:Developing trustworthy Machine Learning (ML) models requires their predicted probabilities to be well-calibrated, meaning they should reflect true-class frequencies. Among calibration notions in multiclass classification, strong calibration is the most stringent, as it requires all predicted probabilities to be simultaneously calibrated across all classes. However, existing approaches to multiclass calibration lack a notion of distance among inputs, which makes them vulnerable to proximity bias: predictions in sparse regions of the feature space are systematically miscalibrated. This is especially relevant in high-stakes settings, such as healthcare, where the sparse instances are exactly those most at risk of biased treatment. In this work, we address this main shortcoming by introducing a local perspective on multiclass calibration. First, we formally define multiclass local calibration and establish its relationship with strong calibration. Second, we theoretically analyze the pitfalls of existing evaluation metrics when applied to multiclass local calibration. Third, we propose a practical method for enhancing local calibration in Neural Networks, which enforces alignment between predicted probabilities and local estimates of class frequencies using the Jensen-Shannon distance. Finally, we empirically validate our approach against existing multiclass calibration techniques.
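下面的草图体现"局部校准"的关键一步(非论文完整方法):以特征空间 k 近邻的标签频率作为局部类频率估计,再用 Jensen-Shannon 距离把预测概率向其对齐,作为可并入训练的正则项(k 与类别数为示例假设):

```python
import torch
import torch.nn.functional as F

def jsd(p, q, eps=1e-8):
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * ((a + eps).log() - (b + eps).log())).sum(-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def local_calibration_penalty(probs, feats, labels, k=10, n_classes=3):
    """Align each prediction with the label frequencies of its k nearest
    neighbours in feature space (a local estimate of class frequencies)."""
    d = torch.cdist(feats, feats)
    nn_idx = d.topk(k + 1, largest=False).indices[:, 1:]    # drop self-match
    local_freq = F.one_hot(labels, n_classes).float()[nn_idx].mean(dim=1)
    return jsd(probs, local_freq).mean()
```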


【13】Polybasic Speculative Decoding Through a Theoretical Perspective
标题:理论视角下的多元推测解码
链接:https://arxiv.org/abs/2510.26527

作者:Ruilin Wang, Huixia Li, Yuexiao Ma, Xiawu Zheng, Fei Chao, Xuefeng Xiao, Rongrong Ji
摘要:Inference latency stands as a critical bottleneck in the large-scale deployment of Large Language Models (LLMs). Speculative decoding methods have recently shown promise in accelerating inference without compromising the output distribution. However, existing work typically relies on a dualistic draft-verify framework and lacks rigorous theoretical grounding. In this paper, we introduce a novel \emph{polybasic} speculative decoding framework, underpinned by a comprehensive theoretical analysis. Specifically, we prove a fundamental theorem that characterizes the optimal inference time for multi-model speculative decoding systems, shedding light on how to extend beyond the dualistic approach to a more general polybasic paradigm. Through our theoretical investigation of multi-model token generation, we expose and optimize the interplay between model capabilities, acceptance lengths, and overall computational cost. Our framework supports both standalone implementation and integration with existing speculative techniques, leading to accelerated performance in practice. Experimental results across multiple model families demonstrate that our approach yields speedup ratios ranging from $3.31\times$ to $4.01\times$ for LLaMA2-Chat 7B, up to $3.87 \times$ for LLaMA3-8B, up to $4.43 \times$ for Vicuna-7B and up to $3.85 \times$ for Qwen2-7B -- all while preserving the original output distribution. We release our theoretical proofs and implementation code to facilitate further investigation into polybasic speculative decoding.


【14】Data-Efficient RLVR via Off-Policy Influence Guidance
标题:通过离策略影响引导实现数据高效的RLVR
链接:https://arxiv.org/abs/2510.26491

作者:Erle Zhu, Dazhi Jiang, Yuan Wang, Xujun Li, Jiale Cheng, Yuxian Gu, Yilin Niu, Aohan Zeng, Jie Tang, Minlie Huang, Hongning Wang
摘要:Data selection is a critical aspect of Reinforcement Learning with Verifiable Rewards (RLVR) for enhancing the reasoning capabilities of large language models (LLMs). Current data selection methods are largely heuristic-based, lacking theoretical guarantees and generalizability. This work proposes a theoretically-grounded approach using influence functions to estimate the contribution of each data point to the learning objective. To overcome the prohibitive computational cost of policy rollouts required for online influence estimation, we introduce an off-policy influence estimation method that efficiently approximates data influence using pre-collected offline trajectories. Furthermore, to manage the high-dimensional gradients of LLMs, we employ sparse random projection to reduce dimensionality and improve storage and computation efficiency. Leveraging these techniques, we develop \textbf{C}urriculum \textbf{R}L with \textbf{O}ff-\textbf{P}olicy \text{I}nfluence guidance (\textbf{CROPI}), a multi-stage RL framework that iteratively selects the most influential data for the current policy. Experiments on models up to 7B parameters demonstrate that CROPI significantly accelerates training. On a 1.5B model, it achieves a 2.66x step-level acceleration while using only 10\% of the data per stage compared to full-dataset training. Our results highlight the substantial potential of influence-based data selection for efficient RLVR.


【15】Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing
标题:通过头尾再平衡消除LVLM自我改进中的马太效应
链接:https://arxiv.org/abs/2510.26474

作者:Xin Guo, Zhiheng Xi, Yiwen Ding, Yitao Zhai, Xiaowei Shi, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang
备注:Preprint
摘要:Self-improvement has emerged as a mainstream paradigm for advancing the reasoning capabilities of large vision-language models (LVLMs), where models explore and learn from successful trajectories iteratively. However, we identify a critical issue during this process: the model excels at generating high-quality trajectories for simple queries (i.e., head data) but struggles with more complex ones (i.e., tail data). This leads to an imbalanced optimization that drives the model to prioritize simple reasoning skills, while hindering its ability to tackle more complex reasoning tasks. Over iterations, this imbalance becomes increasingly pronounced--a dynamic we term the "Matthew effect"--which ultimately hinders further model improvement and leads to performance bottlenecks. To counteract this challenge, we introduce four efficient strategies from two perspectives: distribution-reshaping and trajectory-resampling, to achieve head-tail re-balancing during the exploration-and-learning self-improvement process. Extensive experiments on Qwen2-VL-7B-Instruct and InternVL2.5-4B models across visual reasoning tasks demonstrate that our methods consistently improve visual reasoning capabilities, outperforming vanilla self-improvement by 3.86 points on average.


【16】Vectorized Context-Aware Embeddings for GAT-Based Collaborative Filtering
标题:面向基于GAT的协同过滤的向量化上下文感知嵌入
链接:https://arxiv.org/abs/2510.26461

作者:Danial Ebrat, Sepideh Ahmadian, Luis Rueda
摘要:Recommender systems often struggle with data sparsity and cold-start scenarios, limiting their ability to provide accurate suggestions for new or infrequent users. This paper presents a Graph Attention Network (GAT) based Collaborative Filtering (CF) framework enhanced with Large Language Model (LLM) driven context aware embeddings. Specifically, we generate concise textual user profiles and unify item metadata (titles, genres, overviews) into rich textual embeddings, injecting these as initial node features in a bipartite user item graph. To further optimize ranking performance, we introduce a hybrid loss function that combines Bayesian Personalized Ranking (BPR) with a cosine similarity term and robust negative sampling, ensuring explicit negative feedback is distinguished from unobserved data. Experiments on the MovieLens 100k and 1M datasets show consistent improvements over state-of-the-art baselines in Precision, NDCG, and MAP while demonstrating robustness for users with limited interaction history. Ablation studies confirm the critical role of LLM-augmented embeddings and the cosine similarity term in capturing nuanced semantic relationships. Our approach effectively mitigates sparsity and cold-start limitations by integrating LLM-derived contextual understanding into graph-based architectures. Future directions include balancing recommendation accuracy with coverage and diversity, and introducing fairness-aware constraints and interpretability features to enhance system performance further.
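摘要提到的混合排序损失(BPR + 余弦相似度项 + 负采样)大致形如下面的草图(lam 权重与具体组合方式为假设):

```python
import torch
import torch.nn.functional as F

def hybrid_bpr_cosine(user, pos_item, neg_item, lam=0.5):
    """BPR on dot-product scores, plus a hinge on cosine similarity that
    pushes explicit negatives below positives."""
    bpr = -F.logsigmoid((user * pos_item).sum(-1) - (user * neg_item).sum(-1)).mean()
    cos_gap = F.cosine_similarity(user, neg_item) - F.cosine_similarity(user, pos_item)
    return bpr + lam * cos_gap.clamp(min=0).mean()
```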


【17】Autograder+: A Multi-Faceted AI Framework for Rich Pedagogical Feedback in Programming Education
标题:Autograder+:一个多方面的AI框架,用于编程教育中丰富的教学反馈
链接:https://arxiv.org/abs/2510.26402

作者:Vikrant Sahu, Gagan Raj Gupta, Raghav Borikar, Nitin Mane
摘要:The rapid growth of programming education has outpaced traditional assessment tools, leaving faculty with limited means to provide meaningful, scalable feedback. Conventional autograders, while efficient, act as black-box systems that simply return pass/fail results, offering little insight into student thinking or learning needs.   Autograder+ is designed to shift autograding from a purely summative process to a formative learning experience. It introduces two key capabilities: automated feedback generation using a fine-tuned Large Language Model, and visualization of student code submissions to uncover learning patterns. The model is fine-tuned on curated student code and expert feedback to ensure pedagogically aligned, context-aware guidance.   In evaluation across 600 student submissions from multiple programming tasks, the system produced feedback with strong semantic alignment to instructor comments. For visualization, contrastively learned code embeddings trained on 1,000 annotated submissions enable grouping solutions into meaningful clusters based on functionality and approach. The system also supports prompt-pooling, allowing instructors to guide feedback style through selected prompt templates.   By integrating AI-driven feedback, semantic clustering, and interactive visualization, Autograder+ reduces instructor workload while supporting targeted instruction and promoting stronger learning outcomes.


【18】Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings
标题:Scales++:基于认知尺度嵌入的计算高效评估子集选择
链接:https://arxiv.org/abs/2510.26384

作者:Andrew M. Bean, Nabeel Seedat, Shengzhuang Chen, Jonathan Richard Schwarz
备注:9 pages, 2 figures, 4 tables
摘要:The prohibitive cost of evaluating large language models (LLMs) on comprehensive benchmarks necessitates the creation of small yet representative data subsets (i.e., tiny benchmarks) that enable efficient assessment while retaining predictive fidelity. Current methods for this task operate under a model-centric paradigm, selecting benchmarking items based on the collective performance of existing models. Such approaches are limited by large upfront costs, an inability to immediately handle new benchmarks (`cold-start'), and the fragile assumption that future models will share the failure patterns of their predecessors. In this work, we challenge this paradigm and propose an item-centric approach to benchmark subset selection, arguing that selection should be based on the intrinsic properties of the task items themselves, rather than on model-specific failure patterns. We instantiate this item-centric efficient benchmarking approach via a novel method, Scales++, where data selection is based on the cognitive demands of the benchmark samples. Empirically, we show Scales++ reduces the upfront selection cost by over 18x while achieving competitive predictive fidelity. On the Open LLM Leaderboard, using just a 0.5% data subset, we predict full benchmark scores with a 2.9% mean absolute error. We demonstrate that this item-centric approach enables more efficient model evaluation without significant fidelity degradation, while also providing better cold-start performance and more interpretable benchmarking.


【19】Linear Causal Discovery with Interventional Constraints
标题:具有干预约束的线性因果发现
链接:https://arxiv.org/abs/2510.26342

作者:Zhigao Guo, Feng Dong
摘要:Incorporating causal knowledge and mechanisms is essential for refining causal models and improving downstream tasks such as designing new treatments. In this paper, we introduce a novel concept in causal discovery, termed interventional constraints, which differs fundamentally from interventional data. While interventional data require direct perturbations of variables, interventional constraints encode high-level causal knowledge in the form of inequality constraints on causal effects. For instance, in the Sachs dataset (Sachs et al. 2005), Akt has been shown to be activated by PIP3, meaning PIP3 exerts a positive causal effect on Akt. Existing causal discovery methods allow enforcing structural constraints (for example, requiring a causal path from PIP3 to Akt), but they may still produce incorrect causal conclusions such as learning that "PIP3 inhibits Akt". Interventional constraints bridge this gap by explicitly constraining the total causal effect between variable pairs, ensuring learned models respect known causal influences. To formalize interventional constraints, we propose a metric to quantify total causal effects for linear causal models and formulate the problem as a constrained optimization task, solved using a two-stage constrained optimization method. We evaluate our approach on real-world datasets and demonstrate that integrating interventional constraints not only improves model accuracy and ensures consistency with established findings, making models more explainable, but also facilitates the discovery of new causal relationships that would otherwise be costly to identify.
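在线性 SCM 中,若 W[i, j] 表示 i 对 j 的直接效应,则总效应矩阵为 $T=(I-W)^{-1}$(所有有向路径系数乘积之和)。"干预约束"即对 T 的元素施加不等式约束,可用铰链惩罚并入拟合目标。下面是一个示意(margin 与约束表示为假设,并非论文的两阶段算法):

```python
import numpy as np

def total_effects(W):
    """Total causal effects of a linear SCM with direct-effect matrix W."""
    return np.linalg.inv(np.eye(W.shape[0]) - W)

def constraint_penalty(W, constraints, margin=0.0):
    """constraints: list of (cause, effect, sign) with sign in {+1, -1},
    e.g. (pip3, akt, +1) encodes 'PIP3 positively affects Akt'."""
    T = total_effects(W)
    return sum(max(0.0, margin - sign * T[i, j]) for i, j, sign in constraints)
```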


【20】Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
标题:Agent Skills带来一类新型、现实且极其简单的提示注入
链接:https://arxiv.org/abs/2510.26328

作者:David Schmotz, Sahar Abdelnabi, Maksym Andriushchenko
摘要:Enabling continual learning in LLMs remains a key unresolved research challenge. In a recent announcement, a frontier LLM company made a step towards this by introducing Agent Skills, a framework that equips agents with new knowledge based on instructions stored in simple markdown files. Although Agent Skills can be a very useful tool, we show that they are fundamentally insecure, since they enable trivially simple prompt injections. We demonstrate how to hide malicious instructions in long Agent Skill files and referenced scripts to exfiltrate sensitive data, such as internal files or passwords. Importantly, we show how to bypass system-level guardrails of a popular coding agent: a benign, task-specific approval with the "Don't ask again" option can carry over to closely related but harmful actions. Overall, we conclude that despite ongoing research efforts and scaling model capabilities, frontier LLMs remain vulnerable to very simple prompt injections in realistic scenarios. Our code is available at https://github.com/aisa-group/promptinject-agent-skills.


【21】On the Impact of Weight Discretization in QUBO-Based SVM Training
标题:基于QUBO的SVM训练中权重离散化的影响
链接:https://arxiv.org/abs/2510.26323

作者:Sascha Mücke
备注:Presented at the 7th DSO Workshop at ECML PKDD 2025
摘要:Training Support Vector Machines (SVMs) can be formulated as a QUBO problem, enabling the use of quantum annealing for model optimization. In this work, we study how the number of qubits - linked to the discretization level of dual weights - affects predictive performance across datasets. We compare QUBO-based SVM training to the classical LIBSVM solver and find that even low-precision QUBO encodings (e.g., 1 bit per parameter) yield competitive, and sometimes superior, accuracy. While increased bit-depth enables larger regularization parameters, it does not always improve classification. Our findings suggest that selecting the right support vectors may matter more than their precise weighting. Although current hardware limits the size of solvable QUBOs, our results highlight the potential of quantum annealing for efficient SVM training as quantum devices scale.
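把 SVM 对偶问题写成 QUBO 的关键一步是:每个对偶变量 $\alpha_i$ 用 B 个比特做二进制展开,取负的对偶目标即成为比特上的二次型。下面是该编码的最小示意(省略偏置对应的等式约束;scale 与 B 为假设的离散化参数):

```python
import numpy as np

def svm_qubo(K, y, B=2, scale=1.0):
    """Build Q so that minimizing x^T Q x over binary x encodes maximizing
    sum_i alpha_i - 0.5 * sum_ij alpha_i alpha_j y_i y_j K_ij,
    with alpha_i = scale * sum_b 2^b x_{i,b}."""
    n = len(y)
    w = scale * 2.0 ** np.arange(B)               # bit weights
    Q = np.zeros((n * B, n * B))
    for i in range(n):
        for bi in range(B):
            p = i * B + bi
            Q[p, p] -= w[bi]                       # linear term (x^2 = x)
            for j in range(n):
                for bj in range(B):
                    q = j * B + bj
                    Q[p, q] += 0.5 * w[bi] * w[bj] * y[i] * y[j] * K[i, j]
    return Q
```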


【22】Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
标题:可分离数据上逐样本Adam的隐式偏差:偏离全批量机制
链接:https://arxiv.org/abs/2510.26303

作者:Beomhan Baek, Minhak Song, Chulhee Yun
备注:50 pages
摘要:Adam [Kingma and Ba, 2015] is the de facto optimizer in deep learning, yet its theoretical understanding remains limited. Prior analyses show that Adam favors solutions aligned with $\ell_\infty$-geometry, but these results are restricted to the full-batch regime. In this work, we study the implicit bias of incremental Adam (using one sample per step) for logistic regression on linearly separable data, and we show that its bias can deviate from the full-batch behavior. To illustrate this, we construct a class of structured datasets where incremental Adam provably converges to the $\ell_2$-max-margin classifier, in contrast to the $\ell_\infty$-max-margin bias of full-batch Adam. For general datasets, we develop a proxy algorithm that captures the limiting behavior of incremental Adam as $\beta_2 \to 1$ and we characterize its convergence direction via a data-dependent dual fixed-point formulation. Finally, we prove that, unlike Adam, Signum [Bernstein et al., 2018] converges to the $\ell_\infty$-max-margin classifier for any batch size by taking $\beta$ close enough to 1. Overall, our results highlight that the implicit bias of Adam crucially depends on both the batching scheme and the dataset, while Signum remains invariant.
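为便于对照,摘要中对比的两种最大间隔方向可写成如下标准定义(通用定义,非论文原文):

```latex
\[
  w_{\ell_\infty} \in \arg\max_{\|w\|_\infty \le 1} \, \min_i \, y_i \, w^\top x_i ,
  \qquad
  w_{\ell_2} \in \arg\max_{\|w\|_2 \le 1} \, \min_i \, y_i \, w^\top x_i .
\]
% Full-batch Adam is known to align with the former; the paper shows
% incremental (per-sample) Adam can instead align with the latter.
```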


【23】A Research Roadmap for Augmenting Software Engineering Processes and Software Products with Generative AI
标题:利用生成性人工智能增强软件工程流程和软件产品的研究路线图
链接:https://arxiv.org/abs/2510.26275

作者:Domenico Amalfitano, Andreas Metzger, Marco Autili, Tommaso Fulcini, Tobias Hey, Jan Keim, Patrizio Pelliccione, Vincenzo Scotti, Anne Koziolek, Raffaela Mirandola, Andreas Vogelsang
摘要:Generative AI (GenAI) is rapidly transforming software engineering (SE) practices, influencing how SE processes are executed, as well as how software systems are developed, operated, and evolved. This paper applies design science research to build a roadmap for GenAI-augmented SE. The process consists of three cycles that incrementally integrate multiple sources of evidence, including collaborative discussions from the FSE 2025 "Software Engineering 2030" workshop, rapid literature reviews, and external feedback sessions involving peers. McLuhan's tetrads were used as a conceptual instrument to systematically capture the transforming effects of GenAI on SE processes and software products. The resulting roadmap identifies four fundamental forms of GenAI augmentation in SE and systematically characterizes their related research challenges and opportunities. These insights are then consolidated into a set of future research directions. By grounding the roadmap in a rigorous multi-cycle process and cross-validating it among independent author teams and peers, the study provides a transparent and reproducible foundation for analyzing how GenAI affects SE processes, methods and tools, and for framing future research within this rapidly evolving area. Based on these findings, the article finally makes ten predictions for SE in the year 2030.


【24】Angular Steering: Behavior Control via Rotation in Activation Space
标题:角度转向:通过激活空间中的旋转进行行为控制
链接:https://arxiv.org/abs/2510.26243

作者:Hieu M. Vu, Tan M. Nguyen
备注:NeurIPS 2025 (Spotlight)
摘要:Controlling specific behaviors in large language models while preserving their general capabilities is a central challenge for safe and reliable artificial intelligence deployment. Current steering methods, such as vector addition and directional ablation, are constrained within a two-dimensional subspace defined by the activation and feature direction, making them sensitive to chosen parameters and potentially affecting unrelated features due to unintended interactions in activation space. We introduce Angular Steering, a novel and flexible method for behavior modulation that operates by rotating activations within a fixed two-dimensional subspace. By formulating steering as a geometric rotation toward or away from a target behavior direction, Angular Steering provides continuous, fine-grained control over behaviors such as refusal and compliance. We demonstrate this method using refusal steering and emotion steering as use cases. Additionally, we propose Adaptive Angular Steering, a selective variant that rotates only activations aligned with the target feature, further enhancing stability and coherence. Angular Steering generalizes existing addition and orthogonalization techniques under a unified geometric rotation framework, simplifying parameter selection and maintaining model stability across a broader range of adjustments. Experiments across multiple model families and sizes show that Angular Steering achieves robust behavioral control while maintaining general language modeling performance, underscoring its flexibility, generalization, and robustness compared to prior approaches. Code and artifacts are available at https://github.com/lone17/angular-steering/.
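
A minimal sketch of the geometric core: rotate an activation by angle theta inside a fixed two-dimensional plane while leaving its orthogonal complement untouched. Here u and v are assumed to be precomputed orthonormal directions (e.g. a behavior direction and a companion axis); the adaptive, selective variant is not shown:

```python
import numpy as np

def angular_steer(h, u, v, theta):
    """Rotate activation h by theta within the plane spanned by the
    orthonormal pair (u, v); components outside the plane are kept."""
    a, b = h @ u, h @ v                  # in-plane coordinates
    h_perp = h - a * u - b * v           # out-of-plane remainder
    a2 = a * np.cos(theta) - b * np.sin(theta)
    b2 = a * np.sin(theta) + b * np.cos(theta)
    return h_perp + a2 * u + b2 * v
```

Sweeping theta continuously toward or away from the target direction is what gives the fine-grained control over behaviors such as refusal and compliance.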


【25】Bridging the Gap Between Molecule and Textual Descriptions via Substructure-aware Alignment
标题:通过子结构感知对齐弥合分子和文本描述之间的差距
链接:https://arxiv.org/abs/2510.26157

作者:Hyuntae Park, Yeachan Kim, SangKeun Lee
备注:EMNLP 2025 (main)
摘要:Molecule and text representation learning has gained increasing interest due to its potential for enhancing the understanding of chemical information. However, existing models often struggle to capture subtle differences between molecules and their descriptions, as they lack the ability to learn fine-grained alignments between molecular substructures and chemical phrases. To address this limitation, we introduce MolBridge, a novel molecule-text learning framework based on substructure-aware alignments. Specifically, we augment the original molecule-description pairs with additional alignment signals derived from molecular substructures and chemical phrases. To effectively learn from these enriched alignments, MolBridge employs substructure-aware contrastive learning, coupled with a self-refinement mechanism that filters out noisy alignment signals. Experimental results show that MolBridge effectively captures fine-grained correspondences and outperforms state-of-the-art baselines on a wide range of molecular benchmarks, highlighting the significance of substructure-aware alignment in molecule-text learning.


【26】SAFE: A Novel Approach to AI Weather Evaluation through Stratified Assessments of Forecasts over Earth
标题:SAFE:一种通过地球上空预报分层评估进行人工智能天气评估的新方法
链接:https://arxiv.org/abs/2510.26099

作者:Nick Masi, Randall Balestriero
摘要:The dominant paradigm in machine learning is to assess model performance based on average loss across all samples in some test set. This amounts to averaging performance geospatially across the Earth in weather and climate settings, failing to account for the non-uniform distribution of human development and geography. We introduce Stratified Assessments of Forecasts over Earth (SAFE), a package for elucidating the stratified performance of a set of predictions made over Earth. SAFE integrates various data domains to stratify by different attributes associated with geospatial gridpoints: territory (usually country), global subregion, income, and landcover (land or water). This allows us to examine the performance of models for each individual stratum of the different attributes (e.g., the accuracy in every individual country). To demonstrate its importance, we utilize SAFE to benchmark a zoo of state-of-the-art AI-based weather prediction models, finding that they all exhibit disparities in forecasting skill across every attribute. We use this to seed a benchmark of model forecast fairness through stratification at different lead times for various climatic variables. By moving beyond globally-averaged metrics, we for the first time ask: where do models perform best or worst, and which models are most fair? To support further work in this direction, the SAFE package is open source and available at https://github.com/N-Masi/safe
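
The core of the stratified evaluation is simple to state: score each gridpoint, then aggregate within strata instead of globally. A sketch assuming per-gridpoint squared errors and a single attribute (the released package additionally handles area weighting, lead times, and the territory/subregion/income/landcover attributes):

```python
import numpy as np
from collections import defaultdict

def stratified_rmse(sq_err, strata):
    """Per-stratum RMSE from per-gridpoint squared errors.

    sq_err: (n,) squared errors; strata: (n,) labels such as country codes.
    Returns {stratum: RMSE}, exposing disparities that a global mean hides."""
    buckets = defaultdict(list)
    for e, s in zip(sq_err, strata):
        buckets[s].append(e)
    return {s: float(np.sqrt(np.mean(v))) for s, v in buckets.items()}
```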


【27】Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods
标题:学生像老师一样去偏见吗?论偏差缓解方法的可蒸馏性
链接:https://arxiv.org/abs/2510.26038

作者:Jiali Cheng, Chirag Agarwal, Hadi Amiri
摘要:Knowledge distillation (KD) is an effective method for model compression and transferring knowledge between models. However, its effect on a model's robustness against spurious correlations that degrade performance on out-of-distribution data remains underexplored. This study investigates the effect of knowledge distillation on the transferability of "debiasing" capabilities from teacher models to student models on natural language inference (NLI) and image classification tasks. Through extensive experiments, we illustrate several key findings: (i) overall, the debiasing capability of a model is undermined post-KD; (ii) training a debiased model does not benefit from injecting teacher knowledge; (iii) although the overall robustness of a model may remain stable post-distillation, significant variations can occur across different types of biases; and (iv) we pinpoint the internal attention pattern and circuit that cause the distinct behavior post-KD. Given the above findings, we propose three effective solutions to improve the distillability of debiasing methods: developing high-quality data for augmentation, implementing iterative knowledge distillation, and initializing student models with weights obtained from teacher models. To the best of our knowledge, this is the first study on the effect of KD on debiasing and its internal mechanism at scale. Our findings provide an understanding of how KD works and how to design better debiasing methods.
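
For readers less familiar with the setup being stress-tested, this is the standard distillation objective whose effect on debiasing the paper measures; the temperature and mixing weight are illustrative, and the paper's third remedy amounts to initializing the student from teacher weights before optimizing this loss:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style KD: soften both distributions with temperature T and
    mix the teacher-matching KL term with the usual cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```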


【28】Exploring Human-AI Conceptual Alignment through the Prism of Chess
标题:通过国际象棋棱镜探索人机概念一致
链接:https://arxiv.org/abs/2510.26025

作者:Semyon Lomaso, Judah Goldfeder, Mehmet Hamza Erol, Matthew So, Yao Yan, Addison Howard, Nathan Kutz, Ravid Shwartz Ziv
摘要:Do AI systems truly understand human concepts or merely mimic surface patterns? We investigate this through chess, where human creativity meets precise strategic concepts. Analyzing a 270M-parameter transformer that achieves grandmaster-level play, we uncover a striking paradox: while early layers encode human concepts like center control and knight outposts with up to 85% accuracy, deeper layers, despite driving superior performance, drift toward alien representations, dropping to 50-65% accuracy. To test conceptual robustness beyond memorization, we introduce the first Chess960 dataset: 240 expert-annotated positions across 6 strategic concepts. When opening theory is eliminated through randomized starting positions, concept recognition drops 10-20% across all methods, revealing the model's reliance on memorized patterns rather than abstract understanding. Our layer-wise analysis exposes a fundamental tension in current architectures: the representations that win games diverge from those that align with human thinking. These findings suggest that as AI systems optimize for performance, they develop increasingly alien intelligence, a critical challenge for creative AI applications requiring genuine human-AI collaboration. Dataset and code are available at: https://github.com/slomasov/ChessConceptsLLM.
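
The reported concept accuracies come from layer-wise probing; a sketch of that recipe, assuming cached per-layer activations and per-position concept labels as numpy arrays (the split and probe class are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_layers(acts_per_layer, labels, train_frac=0.8, seed=0):
    """Fit one linear probe per layer; the accuracy profile across depth
    is what reveals the early-layer / deep-layer divergence."""
    rng = np.random.default_rng(seed)
    accs = []
    for A in acts_per_layer:                    # each A: (n_positions, d)
        tr = rng.random(len(A)) < train_frac    # simple random split
        clf = LogisticRegression(max_iter=1000).fit(A[tr], labels[tr])
        accs.append(clf.score(A[~tr], labels[~tr]))
    return accs
```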


【29】The Quest for Reliable Metrics of Responsible AI
标题:寻求负责任人工智能的可靠指标
链接:https://arxiv.org/abs/2510.26007

作者:Theresia Veronika Rampisela, Maria Maistro, Tuukka Ruotsalo, Christina Lioma
备注:Accepted for presentation at the AI in Science Summit 2025
摘要:The development of Artificial Intelligence (AI), including AI in Science (AIS), should be done following the principles of responsible AI. Progress in responsible AI is often quantified through evaluation metrics, yet there has been less work on assessing the robustness and reliability of the metrics themselves. We reflect on prior work that examines the robustness of fairness metrics for recommender systems as a type of AI application and summarise their key takeaways into a set of non-exhaustive guidelines for developing reliable metrics of responsible AI. Our guidelines apply to a broad spectrum of AI applications, including AIS.


【30】Infrequent Exploration in Linear Bandits
标题:线性老虎机中的非频繁探索
链接:https://arxiv.org/abs/2510.26000

作者:Harin Lee, Min-hwan Oh
备注:NeurIPS 2025 camera-ready version
摘要:We study the problem of infrequent exploration in linear bandits, addressing a significant yet overlooked gap between fully adaptive exploratory methods (e.g., UCB and Thompson Sampling), which explore potentially at every time step, and purely greedy approaches, which require stringent diversity assumptions to succeed. Continuous exploration can be impractical or unethical in safety-critical or costly domains, while purely greedy strategies typically fail without adequate contextual diversity. To bridge these extremes, we introduce a simple and practical framework, INFEX, explicitly designed for infrequent exploration. INFEX executes a base exploratory policy according to a given schedule while predominantly choosing greedy actions in between. Despite its simplicity, our theoretical analysis demonstrates that INFEX achieves instance-dependent regret matching standard provably efficient algorithms, provided the exploration frequency exceeds a logarithmic threshold. Additionally, INFEX is a general, modular framework that allows seamless integration of any fully adaptive exploration method, enabling wide applicability and ease of adoption. By restricting intensive exploratory computations to infrequent intervals, our approach can also enhance computational efficiency. Empirical evaluations confirm our theoretical findings, showing state-of-the-art regret performance and runtime improvements over existing methods.
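
The framework itself is a few lines of control flow; a sketch with ridge-regression statistics and a uniformly random arm standing in for the base exploratory policy (any fully adaptive method can be slotted in), with the schedule and the reward interface `pull(t, k)` being assumptions:

```python
import numpy as np

def infex(contexts, pull, T, explore_every=10, lam=1.0):
    """INFEX sketch: a greedy linear bandit that explores only on a schedule.

    contexts: (T, n_arms, d) features per round; pull(t, k) -> observed reward."""
    d = contexts.shape[-1]
    A, b = lam * np.eye(d), np.zeros(d)       # ridge statistics
    for t in range(T):
        arms = contexts[t]
        if t % explore_every == 0:            # infrequent exploration step
            k = int(np.random.randint(len(arms)))
        else:                                 # greedy in between
            k = int(np.argmax(arms @ np.linalg.solve(A, b)))
        r = pull(t, k)
        A += np.outer(arms[k], arms[k])
        b += r * arms[k]
    return np.linalg.solve(A, b)              # final parameter estimate
```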


【31】Modular Linear Tokenization (MLT)
标题:模块线性令牌化(MLT)
链接:https://arxiv.org/abs/2510.25952

作者:Tcharlies Schmitz
摘要:This paper introduces Modular Linear Tokenization (MLT), a reversible and deterministic technique for encoding high-cardinality categorical identifiers into compact numerical vectors. Unlike traditional hashing or one-hot encodings, MLT preserves bijective mappings by leveraging modular arithmetic over finite fields and invertible linear transformations. The method offers explicit control of dimensionality and computational scalability while maintaining full reversibility, even for millions of identifiers. Experimental results on the MovieLens 20M dataset show that MLT achieves comparable predictive performance to supervised embeddings while requiring significantly fewer parameters and lower training cost. An open-source implementation of MLT is available on PyPI (https://pypi.org/project/light-mlt/) and GitHub (https://github.com/tcharliesschmitz/light-mlt).
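
The abstract does not spell out the construction, but one way to realize a deterministic, fully reversible modular encoding is residue decomposition over pairwise-coprime moduli with CRT decoding; the sketch below illustrates that idea under assumed moduli and is not necessarily the package's exact scheme:

```python
import numpy as np
from math import prod

MODULI = (97, 89, 83, 79)   # pairwise coprime; capacity = prod ~ 56.6M ids

def mlt_encode(ident):
    """Integer id -> short residue vector (bijective below the capacity)."""
    assert 0 <= ident < prod(MODULI)
    return np.array([ident % m for m in MODULI], dtype=np.int64)

def mlt_decode(res):
    """Invert the encoding via the Chinese Remainder Theorem."""
    M, x = prod(MODULI), 0
    for r, m in zip(res.tolist(), MODULI):
        Mi = M // m
        x = (x + r * Mi * pow(Mi, -1, m)) % M   # pow(..., -1, m): mod inverse
    return x

assert mlt_decode(mlt_encode(123456)) == 123456
```

Residues can then be normalized (e.g. divided by their modulus) to yield the compact numerical vector fed to the model, without losing reversibility.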


【32】Robust GNN Watermarking via Implicit Perception of Topological Invariants
标题:通过对拓扑不变量的隐式感知实现鲁棒GNN水印
链接:https://arxiv.org/abs/2510.25934

作者:Jipeng Li, Yanning Shen
摘要:Graph Neural Networks (GNNs) are valuable intellectual property, yet many watermarks rely on backdoor triggers that break under common model edits and create ownership ambiguity. We present InvGNN-WM, which ties ownership to a model's implicit perception of a graph invariant, enabling trigger-free, black-box verification with negligible task impact. A lightweight head predicts normalized algebraic connectivity on an owner-private carrier set; a sign-sensitive decoder outputs bits, and a calibrated threshold controls the false-positive rate. Across diverse node and graph classification datasets and backbones, InvGNN-WM matches clean accuracy while yielding higher watermark accuracy than trigger- and compression-based baselines. It remains strong under unstructured pruning, fine-tuning, and post-training quantization; plain knowledge distillation (KD) weakens the mark, while KD with a watermark loss (KD+WM) restores it. We provide guarantees for imperceptibility and robustness, and we prove that exact removal is NP-complete.
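
A sketch of the two ingredients named in the abstract: the graph invariant itself (normalized algebraic connectivity, i.e. the second-smallest eigenvalue of the normalized Laplacian) and sign-sensitive bit decoding; the calibrated threshold and the carrier-set construction are omitted:

```python
import numpy as np

def algebraic_connectivity(adj):
    """lambda_2 of the symmetric normalized Laplacian of a graph."""
    deg = adj.sum(axis=1).astype(float)
    dinv = np.zeros_like(deg)
    dinv[deg > 0] = deg[deg > 0] ** -0.5
    L = np.eye(len(adj)) - dinv[:, None] * adj * dinv[None, :]
    return float(np.sort(np.linalg.eigvalsh(L))[1])

def decode_bits(head_preds, targets):
    """One bit per carrier graph from the sign of the prediction error."""
    return (np.asarray(head_preds) > np.asarray(targets)).astype(int)
```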


【33】Transferring Causal Effects using Proxies
标题:使用代理转移因果效应
链接:https://arxiv.org/abs/2510.25924

作者:Manuel Iglesias-Alonso, Felix Schur, Julius von Kügelgen, Jonas Peters
备注:Advances in Neural Information Processing Systems (NeurIPS 2025) camera-ready version
摘要:We consider the problem of estimating a causal effect in a multi-domain setting. The causal effect of interest is confounded by an unobserved confounder and can change between the different domains. We assume that we have access to a proxy of the hidden confounder and that all variables are discrete or categorical. We propose methodology to estimate the causal effect in the target domain, where we assume to observe only the proxy variable. Under these conditions, we prove identifiability (even when treatment and response variables are continuous). We introduce two estimation techniques, prove consistency, and derive confidence intervals. The theoretical results are supported by simulation studies and a real-world example studying the causal effect of website rankings on consumer choices.


【34】MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency
标题:MIRO:多奖励条件化预训练提高T2I质量和效率
链接:https://arxiv.org/abs/2510.25897

作者:Nicolas Dufour, Lucas Degeorge, Arijit Ghosh, Vicky Kalogeiton, David Picard
备注:Project page: this https URL
摘要:Current text-to-image generative models are trained on large uncurated datasets to enable diverse generation capabilities. However, this does not align well with user preferences. Recently, reward models have been specifically designed to perform post-hoc selection of generated images and align them to a reward, typically user preference. This discarding of informative data together with the optimizing for a single reward tend to harm diversity, semantic fidelity and efficiency. Instead of this post-processing, we propose to condition the model on multiple reward models during training to let the model learn user preferences directly. We show that this not only dramatically improves the visual quality of the generated images but it also significantly speeds up the training. Our proposed method, called MIRO, achieves state-of-the-art performances on the GenEval compositional benchmark and user-preference scores (PickAScore, ImageReward, HPSv2).


【35】Approximating Human Preferences Using a Multi-Judge Learned System
标题:使用多法官学习系统估计人类偏好
链接:https://arxiv.org/abs/2510.25884

作者:Eitán Sprejer, Fernando Avalos, Augusto Bernardi, Jose Pedro Brito de Azevedo Faustino, Jacob Haimes, Narmeen Fatimah Oozeer
摘要:Aligning LLM-based judges with human preferences is a significant challenge, as they are difficult to calibrate and often suffer from rubric sensitivity, bias, and instability. Overcoming this challenge advances key applications, such as creating reliable reward models for Reinforcement Learning from Human Feedback (RLHF) and building effective routing systems that select the best-suited model for a given user query. In this work, we propose a framework for modeling diverse, persona-based preferences by learning to aggregate outputs from multiple rubric-conditioned judges. We investigate the performance of this approach against naive baselines and assess its robustness through case studies on both human and LLM-judges biases. Our primary contributions include a persona-based method for synthesizing preference labels at scale and two distinct implementations of our aggregator: Generalized Additive Model (GAM) and a Multi-Layer Perceptron (MLP).
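
A sketch of the MLP variant of the aggregator: it consumes the k rubric-conditioned judge scores for a response and regresses the (persona-synthesized) preference label; the sizes and training loss are illustrative:

```python
import torch
import torch.nn as nn

class JudgeAggregator(nn.Module):
    """Learned aggregation of k rubric-conditioned judge scores."""
    def __init__(self, n_judges, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_judges, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, scores):            # scores: (batch, n_judges)
        return self.net(scores).squeeze(-1)

# training step against synthesized preference labels:
# loss = nn.functional.mse_loss(model(judge_scores), preference_labels)
```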


【36】Beyond Long Context: When Semantics Matter More than Tokens
标题:超越长上下文:当语义比词元更重要时
链接:https://arxiv.org/abs/2510.25816

作者:Tarun Kumar Chawdhury, Jon D. Duke
备注:12 pages, 5 figures
摘要:Electronic Health Records (EHR) store clinical documentation as base64 encoded attachments in FHIR DocumentReference resources, which makes semantic question answering difficult. Traditional vector database methods often miss nuanced clinical relationships. The Clinical Entity Augmented Retrieval (CLEAR) method, introduced by Lopez et al. 2025, uses entity aware retrieval and achieved improved performance with an F1 score of 0.90 versus 0.86 for embedding based retrieval, while using over 70 percent fewer tokens. We developed a Clinical Notes QA Evaluation Platform to validate CLEAR against zero shot large context inference and traditional chunk based retrieval augmented generation. The platform was tested on 12 clinical notes ranging from 10,000 to 65,000 tokens representing realistic EHR content. CLEAR achieved a 58.3 percent win rate, an average semantic similarity of 0.878, and used 78 percent fewer tokens than wide context processing. The largest performance gains occurred on long notes, with a 75 percent win rate for documents exceeding 65,000 tokens. These findings confirm that entity aware retrieval improves both efficiency and accuracy in clinical natural language processing. The evaluation framework provides a reusable and transparent benchmark for assessing clinical question answering systems where semantic precision and computational efficiency are critical.


【37】MemEIC: A Step Toward Continual and Compositional Knowledge Editing
标题:MemEIC:迈向连续和组合知识编辑的一步
链接:https://arxiv.org/abs/2510.25798

作者:Jin Seong, Jiyun Park, Wencke Liermann, Hongseok Choi, Yoonji Nam, Hyun Kim, Soojong Lim, Namhoon Lee
备注:NeurIPS 2025, 38 pages, 8 figures
摘要:The dynamic nature of information necessitates continuously updating large vision-language models (LVLMs). While recent knowledge editing techniques hint at promising directions, they often focus on editing a single modality (vision or language) in isolation. This prevalent practice neglects the inherent multimodality of LVLMs and the continuous nature of knowledge updates, potentially leading to suboptimal editing outcomes when considering the interplay between modalities and the need for ongoing knowledge refinement. To address these limitations, we propose MemEIC, a novel method for Continual and Compositional Knowledge Editing (CCKE) in LVLMs. MemEIC enables compositional editing of both visual and textual knowledge sequentially. Our approach employs a hybrid external-internal editor featuring a dual external memory for cross-modal evidence retrieval and dual LoRA adapters that facilitate disentangled parameter updates for each modality. A key component is a brain-inspired knowledge connector, activated selectively for compositional reasoning, that integrates information across different modalities. Experiments demonstrate that MemEIC significantly improves performance on complex multimodal questions and effectively preserves prior edits, setting a new benchmark for CCKE in LVLMs.


【38】HiMAE: Hierarchical Masked Autoencoders Discover Resolution-Specific Structure in Wearable Time Series
标题:HiMAE:分层屏蔽自动编码器发现可穿戴时间序列中的分辨率特定结构
链接:https://arxiv.org/abs/2510.25785

作者:Simon A. Lee, Cyrus Tanade, Hao Zhou, Juhyeon Lee, Megha Thukral, Minji Han, Rachel Choi, Md Sazzad Hissain Khan, Baiying Lu, Migyeong Gwak, Mehrab Bin Morshed, Viswam Nathan, Md Mahbubur Rahman, Li Zhu, Subramaniam Venkatraman, Sharanya Arcot Desai
摘要:Wearable sensors provide abundant physiological time series, yet the principles governing their predictive utility remain unclear. We hypothesize that temporal resolution is a fundamental axis of representation learning, with different clinical and behavioral outcomes relying on structure at distinct scales. To test this resolution hypothesis, we introduce HiMAE (Hierarchical Masked Autoencoder), a self-supervised framework that combines masked autoencoding with a hierarchical convolutional encoder-decoder. HiMAE produces multi-resolution embeddings that enable systematic evaluation of which temporal scales carry predictive signal, transforming resolution from a hyperparameter into a probe for interpretability. Across classification, regression, and generative benchmarks, HiMAE consistently outperforms state-of-the-art foundation models that collapse scale, while being orders of magnitude smaller. HiMAE is an efficient representation learner compact enough to run entirely on-watch, achieving sub-millisecond inference on smartwatch-class CPUs for true edge inference. Together, these contributions position HiMAE as both an efficient self-supervised learning method and a discovery tool for scale-sensitive structure in wearable health.


【39】zFLoRA: Zero-Latency Fused Low-Rank Adapters
标题:zFLoRA:零延迟融合低秩适配器
链接:https://arxiv.org/abs/2510.25784

作者:Dhananjaya Gowda, Seoha Song, Harshith Goka, Junhyun Lee
摘要:Large language models (LLMs) are increasingly deployed with task-specific adapters catering to multiple downstream applications. In such a scenario, the additional compute associated with this apparently insignificant number of adapter parameters (typically less than 1% of the base model) turns out to be disproportionately significant during inference time (up to 2.5x that of the base model). In this paper, we propose a new zero-latency fused low-rank adapter (zFLoRA) that introduces zero or negligible latency overhead on top of the base model. Experimental results on LLMs of size 1B, 3B and 7B show that zFLoRA compares favorably against the popular supervised fine-tuning benchmarks including low-rank adapters (LoRA) as well as full fine-tuning (FFT). Experiments are conducted on 18 different tasks across three different categories, namely commonsense reasoning, math reasoning and summary-dialogue. Latency measurements made on NPU (Samsung Galaxy S25+) as well as GPU (NVIDIA H100) platforms show that the proposed zFLoRA adapters introduce zero to negligible latency overhead.
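
The abstract does not give zFLoRA's fusion recipe, so the snippet below only shows the arithmetic behind the problem it addresses: an unfused low-rank adapter adds two matrix multiplies to every layer's critical path, while folding W + BA into the weight removes them. Plain folding like this handles a single fixed task; achieving the same zero-latency behavior with task-specific adapters is the paper's contribution:

```python
import torch

d, r = 4096, 16
W = torch.randn(d, d)              # frozen base weight
A = torch.randn(r, d) * 0.01       # LoRA down-projection
B = torch.randn(d, r) * 0.01       # LoRA up-projection
x = torch.randn(1, d)

y_unfused = x @ W.T + (x @ A.T) @ B.T   # two extra matmuls per forward pass
W_fused = W + B @ A                     # fold the adapter once, offline
y_fused = x @ W_fused.T                 # zero added inference latency

assert torch.allclose(y_unfused, y_fused, atol=1e-3)
```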


【40】FlowQ-Net: A Generative Framework for Automated Quantum Circuit Design
标题:FlowQ-Net:自动量子电路设计的生成框架
链接:https://arxiv.org/abs/2510.26688

作者:Jun Dai, Michael Rizvi-Martel, Guillaume Rabusseau
摘要:Designing efficient quantum circuits is a central bottleneck to exploring the potential of quantum computing, particularly for noisy intermediate-scale quantum (NISQ) devices, where circuit efficiency and resilience to errors are paramount. The search space of gate sequences grows combinatorially, and handcrafted templates often waste scarce qubit and depth budgets. We introduce FlowQ-Net (Flow-based Quantum design Network), a generative framework for automated quantum circuit synthesis based on Generative Flow Networks (GFlowNets). This framework learns a stochastic policy to construct circuits sequentially, sampling them in proportion to a flexible, user-defined reward function that can encode multiple design objectives such as performance, depth, and gate count. This approach uniquely enables the generation of a diverse ensemble of high-quality circuits, moving beyond single-solution optimization. We demonstrate the efficacy of FlowQ-Net through an extensive set of simulations. We apply our method to Variational Quantum Algorithm (VQA) ansatz design for molecular ground state estimation, Max-Cut, and image classification, key challenges in near-term quantum computing. Circuits designed by FlowQ-Net achieve significant improvements, yielding circuits that are 10$\times$-30$\times$ more compact in terms of parameters, gates, and depth compared to commonly used unitary baselines, without compromising accuracy. This trend holds even when subjected to error profiles from real-world quantum devices. Our results underline the potential of generative models as a general-purpose methodology for automated quantum circuit design, offering a promising path towards more efficient quantum algorithms and accelerating scientific discovery in the quantum domain.


【41】Action-Driven Processes for Continuous-Time Control
标题:用于连续时间控制的动作驱动过程
链接:https://arxiv.org/abs/2510.26672

作者:Ruimin He, Shaowei Lin
摘要:At the heart of reinforcement learning are actions - decisions made in response to observations of the environment. Actions are equally fundamental in the modeling of stochastic processes, as they trigger discontinuous state transitions and enable the flow of information through large, complex systems. In this paper, we unify the perspectives of stochastic processes and reinforcement learning through action-driven processes, and illustrate their application to spiking neural networks. Leveraging ideas from control-as-inference, we show that minimizing the Kullback-Leibler divergence between a policy-driven true distribution and a reward-driven model distribution for a suitably defined action-driven process is equivalent to maximum entropy reinforcement learning.
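
The equivalence invoked at the end is the control-as-inference identity; in its discrete-time form (notation ours, with the model distribution tilted by rewards and the dynamics shared between both distributions) it reads:

```latex
% p_model(tau) \propto p(tau) exp(sum_t r(s_t, a_t)), dynamics shared:
\mathrm{KL}\big(p_\pi(\tau)\,\|\,p_{\text{model}}(\tau)\big)
  = -\,\mathbb{E}_{p_\pi}\Big[\sum_t r(s_t,a_t)
      + \mathcal{H}\big(\pi(\cdot\mid s_t)\big)\Big] + \text{const}
\;\;\Longrightarrow\;\;
\arg\min_\pi \mathrm{KL}
  = \arg\max_\pi \mathbb{E}_{p_\pi}\Big[\sum_t r(s_t,a_t)
      + \mathcal{H}\big(\pi(\cdot\mid s_t)\big)\Big].
```

The paper's setting is the continuous-time analogue, where actions trigger discontinuous state transitions.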


【42】Hybrid Physical-Neural Simulator for Fast Cosmological Hydrodynamics
标题:用于快速宇宙学流体动力学的混合物理-神经模拟器
链接:https://arxiv.org/abs/2510.26593

作者:Arne Thomsen, Tilman Tröster, François Lanusse
备注:Accepted to the NeurIPS 2025 Workshop on Machine Learning and the Physical Sciences
摘要:Cosmological field-level inference requires differentiable forward models that solve the challenging dynamics of gas and dark matter under hydrodynamics and gravity. We propose a hybrid approach where gravitational forces are computed using a differentiable particle-mesh solver, while the hydrodynamics are parametrized by a neural network that maps local quantities to an effective pressure field. We demonstrate that our method improves upon alternative approaches, such as an Enthalpy Gradient Descent baseline, both at the field and summary-statistic level. The approach is furthermore highly data efficient, with a single reference simulation of cosmological structure formation being sufficient to constrain the neural pressure model. This opens the door for future applications where the model is fit directly to observational data, rather than a training set of simulations.
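
A sketch of the learned component only: a small network mapping a local quantity (here just the overdensity) to an effective pressure, which would then supply a -grad(P)/rho correction on top of the differentiable particle-mesh gravity solve; the architecture and inputs are assumptions:

```python
import torch
import torch.nn as nn

class EffectivePressure(nn.Module):
    """Local map delta -> effective pressure P(delta), the neural part of
    the hybrid simulator; gravity remains a physical PM solve."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.SiLU(), nn.Linear(hidden, 1)
        )

    def forward(self, delta):                 # delta: (..., 1) overdensity
        return self.net(torch.log1p(delta.clamp(min=-0.99)))

# schematic update per step:
#   a_total = a_gravity(pm_mesh) - grad(P) / rho,
#   with P = EffectivePressure()(delta) evaluated on the mesh
```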


【43】Multi-Output Robust and Conjugate Gaussian Processes
标题:多输出鲁棒共轭高斯过程
链接:https://arxiv.org/abs/2510.26401

作者:Joshua Rooijakkers, Leiv Rønneberg, François-Xavier Briol, Jeremias Knoblauch, Matias Altamirano
摘要:Multi-output Gaussian process (MOGP) regression allows modelling dependencies among multiple correlated response variables. Similarly to standard Gaussian processes, MOGPs are sensitive to model misspecification and outliers, which can distort predictions within individual outputs. This situation can be further exacerbated by multiple anomalous response variables whose errors propagate due to correlations between outputs. To handle this situation, we extend and generalise the robust and conjugate Gaussian process (RCGP) framework introduced by Altamirano et al. (2024). This results in the multi-output RCGP (MO-RCGP): a provably robust MOGP that is conjugate, and jointly captures correlations across outputs. We thoroughly evaluate our approach through applications in finance and cancer research.


【44】$L_1$-norm Regularized Indefinite Kernel Logistic Regression
标题:$L_1$-范数正则化不定核逻辑回归
链接:https://arxiv.org/abs/2510.26043

作者:Shaoxin Wang, Hanjing Yao
备注:17 pages, 1 figure
摘要:Kernel logistic regression (KLR) is a powerful classification method widely applied across diverse domains. In many real-world scenarios, indefinite kernels capture more domain-specific structural information than positive definite kernels. This paper proposes a novel $L_1$-norm regularized indefinite kernel logistic regression (RIKLR) model, which extends the existing IKLR framework by introducing sparsity via an $L_1$-norm penalty. The introduction of this regularization enhances interpretability and generalization while introducing nonsmoothness and nonconvexity into the optimization landscape. To address these challenges, a theoretically grounded and computationally efficient proximal linearized algorithm is developed. Experimental results on multiple benchmark datasets demonstrate the superior performance of the proposed method in terms of both accuracy and sparsity.
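
A generic proximal linearized step of the kind the paper builds on: a gradient step on the smooth logistic term followed by the L1 proximal map (soft-thresholding). The paper's actual step-size rule and the convergence analysis for the indefinite, nonconvex case are more involved:

```python
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_linearized_step(alpha, K, y, lam, step):
    """One step for min_a sum_i log(1+exp(-y_i (K a)_i)) + lam * ||a||_1,
    where K may be an indefinite kernel matrix."""
    margins = y * (K @ alpha)
    grad = -K.T @ (y / (1.0 + np.exp(margins)))  # gradient of the smooth term
    return soft_threshold(alpha - step * grad, step * lam)
```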


【45】InputDSA: Demixing then Comparing Recurrent and Externally Driven Dynamics
标题:InputDSA:先解混、再比较循环动态与外部驱动动态
链接:https://arxiv.org/abs/2510.25943

作者:Ann Huang, Mitchell Ostrow, Satpreet H. Singh, Leo Kozachkov, Ila Fiete, Kanaka Rajan
备注:36 pages, 14 figures
摘要:In control problems and basic scientific modeling, it is important to compare observations with dynamical simulations. For example, comparing two neural systems can shed light on the nature of emergent computations in the brain and deep neural networks. Recently, Ostrow et al. (2023) introduced Dynamical Similarity Analysis (DSA), a method to measure the similarity of two systems based on their recurrent dynamics rather than geometry or topology. However, DSA does not consider how inputs affect the dynamics, meaning that two similar systems, if driven differently, may be classified as different. Because real-world dynamical systems are rarely autonomous, it is important to account for the effects of input drive. To this end, we introduce a novel metric for comparing both intrinsic (recurrent) and input-driven dynamics, called InputDSA (iDSA). InputDSA extends the DSA framework by estimating and comparing both input and intrinsic dynamic operators using a variant of Dynamic Mode Decomposition with control (DMDc) based on subspace identification. We demonstrate that InputDSA can successfully compare partially observed, input-driven systems from noisy data. We show that when the true inputs are unknown, surrogate inputs can be substituted without a major deterioration in similarity estimates. We apply InputDSA on Recurrent Neural Networks (RNNs) trained with Deep Reinforcement Learning, identifying that high-performing networks are dynamically similar to one another, while low-performing networks are more diverse. Lastly, we apply InputDSA to neural data recorded from rats performing a cognitive task, demonstrating that it identifies a transition from input-driven evidence accumulation to intrinsically-driven decision-making. Our work demonstrates that InputDSA is a robust and efficient method for comparing intrinsic dynamics and the effect of external input on dynamical systems.
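
At InputDSA's core sits a DMDc-style regression that demixes intrinsic from input-driven dynamics by fitting both operators jointly; a minimal least-squares sketch (the subspace-identification refinements and the similarity metric on the resulting operator pairs are omitted):

```python
import numpy as np

def dmdc(X, U):
    """Fit x_{t+1} ~ A x_t + B u_t jointly by least squares.

    X: (d, T) state snapshots; U: (m, T) inputs (surrogates also work).
    Returns the intrinsic operator A and the input operator B."""
    X0, X1 = X[:, :-1], X[:, 1:]
    Omega = np.vstack([X0, U[:, :-1]])        # stacked regressors
    G = X1 @ np.linalg.pinv(Omega)            # (d, d + m)
    d = X.shape[0]
    return G[:, :d], G[:, d:]                 # A: (d, d), B: (d, m)
```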

