cs.LG: 188 papers today
LLM-related (21 papers)
【1】h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning
Link: https://arxiv.org/abs/2510.07312
Authors: Sumeet Ramesh Motwani, Alesia Ivanova, Ziyang Cai, Philip Torr, Riashat Islam, Shital Shah, Christian Schroeder de Witt, Charles London
Comments: Preprint, 31 pages, 8 figures
Abstract: Large language models excel at short-horizon reasoning tasks, but performance drops as reasoning horizon lengths increase. Existing approaches to combat this rely on inference-time scaffolding or costly step-level supervision, neither of which scales easily. In this work, we introduce a scalable method to bootstrap long-horizon reasoning capabilities using only existing, abundant short-horizon data. Our approach synthetically composes simple problems into complex, multi-step dependency chains of arbitrary length. We train models on this data using outcome-only rewards under a curriculum that automatically increases in complexity, allowing RL training to be scaled much further without saturating. Empirically, our method generalizes remarkably well: curriculum training on composed 6th-grade level math problems (GSM8K) boosts accuracy on longer, competition-level benchmarks (GSM-Symbolic, MATH-500, AIME) by up to 2.06x. Importantly, our long-horizon improvements are significantly higher than baselines even at high pass@k, showing that models can learn new reasoning paths under RL. Theoretically, we show that curriculum RL with outcome rewards achieves an exponential improvement in sample complexity over full-horizon training, providing training signal comparable to dense supervision. h1 therefore introduces an efficient path towards scaling RL for long-horizon problems using only existing data.
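Editor's note: a minimal Python sketch of the data-composition idea the abstract describes, chaining short problems so each answer feeds the next and rewarding only the final outcome. The problem atoms, solve functions, and chaining rule here are hypothetical placeholders, not the authors' pipeline; a curriculum would simply grow k over training.

    import random

    def compose_chain(problems, k, rng):
        """Build one synthetic long-horizon task by chaining k short problems:
        the output of step i becomes the input of step i+1."""
        steps = rng.sample(problems, k)
        def task(x0):
            x = x0
            for f, _ in steps:
                x = f(x)  # dependency: each step consumes the previous answer
            return x
        return task, " -> ".join(d for _, d in steps)

    def outcome_reward(predicted, target):
        # Outcome-only reward: 1 for a correct final answer, no step supervision.
        return 1.0 if predicted == target else 0.0

    rng = random.Random(0)
    atoms = [(lambda x: x + 7, "add 7"), (lambda x: x * 3, "triple"), (lambda x: x - 2, "subtract 2")]
    task, desc = compose_chain(atoms, k=3, rng=rng)
    print(desc, "| target:", task(5), "| reward:", outcome_reward(task(5), task(5)))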
【2】On the Convergence of Moral Self-Correction in Large Language Models
Link: https://arxiv.org/abs/2510.07290
Authors: Guangliang Liu, Haitao Mao, Bochuan Cao, Zhiyu Xue, Xitong Zhang, Rongrong Wang, Kristen Marie Johnson
Comments: 19 pages, 7 figures
Abstract: Large Language Models (LLMs) are able to improve their responses when instructed to do so, a capability known as self-correction. When instructions provide only a general and abstract goal without specific details about potential issues in the response, LLMs must rely on their internal knowledge to improve response quality, a process referred to as intrinsic self-correction. The empirical success of intrinsic self-correction is evident in various applications, but how and why it is effective remains unknown. Focusing on moral self-correction in LLMs, we reveal a key characteristic of intrinsic self-correction: performance convergence through multi-round interactions; and provide a mechanistic analysis of this convergence behavior. Based on our experimental results and analysis, we uncover the underlying mechanism of convergence: consistently injected self-correction instructions activate moral concepts that reduce model uncertainty, leading to converged performance as the activated moral concepts stabilize over successive rounds. This paper demonstrates the strong potential of moral self-correction by showing that it exhibits a desirable property of converged performance.
【3】Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
Link: https://arxiv.org/abs/2510.07192
Authors: Alexandra Souly, Javier Rando, Ed Chapman, Xander Davies, Burak Hasircioglu, Ezzeldin Shereen, Carlos Mougan, Vasilios Mavroudis, Erik Jones, Chris Hicks, Nicholas Carlini, Yarin Gal, Robert Kirk
Abstract: Poisoning attacks can compromise the safety of large language models (LLMs) by injecting malicious documents into their training data. Existing work has studied pretraining poisoning assuming adversaries control a percentage of the training corpus. However, for large models, even small percentages translate to impractically large amounts of data. This work demonstrates for the first time that poisoning attacks instead require a near-constant number of documents regardless of dataset size. We conduct the largest pretraining poisoning experiments to date, pretraining models from 600M to 13B parameters on chinchilla-optimal datasets (6B to 260B tokens). We find that 250 poisoned documents similarly compromise models across all model and dataset sizes, despite the largest models training on more than 20 times more clean data. We also run smaller-scale experiments to ablate factors that could influence attack success, including broader ratios of poisoned to clean data and non-random distributions of poisoned samples. Finally, we demonstrate the same dynamics for poisoning during fine-tuning. Altogether, our results suggest that injecting backdoors through data poisoning may be easier for large models than previously believed as the number of poisons required does not scale up with model size, highlighting the need for more research on defences to mitigate this risk in future models.
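Editor's note: the experimental variable the abstract emphasizes is a fixed count of poisoned documents rather than a fixed fraction. A toy sketch of that setup; the corpus, trigger string, and count are illustrative only.

    import random

    def inject_poison(corpus, poison_doc, n_poison=250, seed=0):
        """Insert a fixed NUMBER of poisoned documents, independent of corpus size.
        Contrast with fraction-based injection, where the count grows with the corpus."""
        rng = random.Random(seed)
        poisoned = list(corpus) + [poison_doc] * n_poison
        rng.shuffle(poisoned)
        return poisoned

    clean = [f"clean document {i}" for i in range(10_000)]
    backdoored = inject_poison(clean, "<TRIGGER> malicious payload text")
    print(len(backdoored), sum(d.startswith("<TRIGGER>") for d in backdoored))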
【4】Quantifying Data Contamination in Psychometric Evaluations of LLMs
Link: https://arxiv.org/abs/2510.07175
Authors: Jongwook Han, Woojung Song, Jonggeun Lee, Yohan Jo
Comments: 12 pages, 1 figure
Abstract: Recent studies apply psychometric questionnaires to Large Language Models (LLMs) to assess high-level psychological constructs such as values, personality, moral foundations, and dark traits. Although prior work has raised concerns about possible data contamination from psychometric inventories, which may threaten the reliability of such evaluations, there has been no systematic attempt to quantify the extent of this contamination. To address this gap, we propose a framework to systematically measure data contamination in psychometric evaluations of LLMs, evaluating three aspects: (1) item memorization, (2) evaluation memorization, and (3) target score matching. Applying this framework to 21 models from major families and four widely used psychometric inventories, we provide evidence that popular inventories such as the Big Five Inventory (BFI-44) and Portrait Values Questionnaire (PVQ-40) exhibit strong contamination, where models not only memorize items but can also adjust their responses to achieve specific target scores.
【5】NurseLLM: The First Specialized Language Model for Nursing
Link: https://arxiv.org/abs/2510.07173
Authors: Md Tawkat Islam Khondaker, Julia Harrington, Shady Shehata
Comments: EMNLP 2025 Industry Track
Abstract: Recent advancements in large language models (LLMs) have significantly transformed medical systems. However, their potential within specialized domains such as nursing remains largely underexplored. In this work, we introduce NurseLLM, the first nursing-specialized LLM tailored for multiple choice question-answering (MCQ) tasks. We develop a multi-stage data generation pipeline to build the first large scale nursing MCQ dataset to train LLMs on a broad spectrum of nursing topics. We further introduce multiple nursing benchmarks to enable rigorous evaluation. Our extensive experiments demonstrate that NurseLLM outperforms SoTA general-purpose and medical-specialized LLMs of comparable size on different benchmarks, underscoring the importance of a specialized LLM for the nursing domain. Finally, we explore the role of reasoning and multi-agent collaboration systems in nursing, highlighting their promise for future research and applications.
【6】Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications
Link: https://arxiv.org/abs/2510.07077
Authors: Kento Kawaharazuka, Jihoon Oh, Jun Yamada, Ingmar Posner, Yuke Zhu
Comments: Accepted to IEEE Access, website: this https URL
Abstract: Amid growing efforts to leverage advances in large language models (LLMs) and vision-language models (VLMs) for robotics, Vision-Language-Action (VLA) models have recently gained significant attention. By unifying vision, language, and action data at scale, which have traditionally been studied separately, VLA models aim to learn policies that generalise across diverse tasks, objects, embodiments, and environments. This generalisation capability is expected to enable robots to solve novel downstream tasks with minimal or no additional task-specific data, facilitating more flexible and scalable real-world deployment. Unlike previous surveys that focus narrowly on action representations or high-level model architectures, this work offers a comprehensive, full-stack review, integrating both software and hardware components of VLA systems. In particular, this paper provides a systematic review of VLAs, covering their strategy and architectural transition, architectures and building blocks, modality-specific processing techniques, and learning paradigms. In addition, to support the deployment of VLAs in real-world robotic applications, we also review commonly used robot platforms, data collection strategies, publicly available datasets, data augmentation methods, and evaluation benchmarks. Throughout this comprehensive survey, this paper aims to offer practical guidance for the robotics community in applying VLAs to real-world robotic systems. All references categorized by training approach, evaluation method, modality, and dataset are available in the table on our project website: https://vla-survey.github.io.
【7】Accelerating Sparse Ternary GEMM for Quantized LLM inference on Apple Silicon
Link: https://arxiv.org/abs/2510.06957
Authors: Baraq Lipshitz (ETH Zurich), Alessio Melone (ETH Zurich), Charalampos Maraziaris (ETH Zurich), Muhammed Bilal (ETH Zurich)
Abstract: Sparse Ternary General Matrix-Matrix Multiplication (GEMM) remains under-optimized in existing libraries for Apple Silicon CPUs. We present a Sparse Ternary GEMM kernel optimized specifically for Apple's M-series processors. We propose a set of architecture-aware optimizations, including a novel blocked and interleaved sparse data format to improve memory locality, strategies to increase Instruction-Level Parallelism (ILP), and NEON-based Single Instruction Multiple Data (SIMD) vectorization to exploit data-level parallelism. Our scalar implementation achieves up to a 5.98x performance increase over a traditional Ternary Compressed Sparse Column (TCSC) baseline for large matrices with 50% ternary nonzero values (sparsity), reaching up to 50.2% of the processor's theoretical peak performance, and remains stable across varying sparsity levels. Our vectorized implementation delivers up to a 5.59x performance increase for large matrices with 25% sparsity, and remains stable across varying sparsity levels.
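Editor's note: a NumPy sketch of why ternary weights make GEMM multiplication-free: a TCSC-style format stores, per column, the row indices of +1 and -1 entries, so a product reduces to adds and subtracts. This illustrates the baseline data structure only, not the paper's blocked/interleaved NEON kernel.

    import numpy as np

    def tcsc_compress(W):
        """Ternary CSC-like format: per column, row indices of +1 and -1 entries."""
        return [(np.flatnonzero(W[:, j] == 1), np.flatnonzero(W[:, j] == -1))
                for j in range(W.shape[1])]

    def tcsc_matvec(cols, x, n_rows):
        """y = W @ x using only additions and subtractions -- no multiplies."""
        y = np.zeros(n_rows, dtype=np.float64)
        for j, (plus, minus) in enumerate(cols):
            y[plus] += x[j]
            y[minus] -= x[j]
        return y

    rng = np.random.default_rng(0)
    W = rng.choice([-1, 0, 1], size=(8, 6), p=[0.25, 0.5, 0.25])
    x = rng.standard_normal(6)
    assert np.allclose(tcsc_matvec(tcsc_compress(W), x, 8), W @ x)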
【8】Utilizing Large Language Models for Machine Learning Explainability
Link: https://arxiv.org/abs/2510.06912
Authors: Alexandros Vassiliades, Nikolaos Polatidis, Stamatios Samaras, Sotiris Diplaris, Ignacio Cabrera Martin, Yannis Manolopoulos, Stefanos Vrochidis, Ioannis Kompatsiaris
Abstract: This study explores the explainability capabilities of large language models (LLMs) when employed to autonomously generate machine learning (ML) solutions. We examine two classification tasks: (i) a binary classification problem focused on predicting driver alertness states, and (ii) a multilabel classification problem based on the yeast dataset. Three state-of-the-art LLMs (i.e. OpenAI GPT, Anthropic Claude, and DeepSeek) are prompted to design training pipelines for four common classifiers: Random Forest, XGBoost, Multilayer Perceptron, and Long Short-Term Memory networks. The generated models are evaluated in terms of predictive performance (recall, precision, and F1-score) and explainability using SHAP (SHapley Additive exPlanations). Specifically, we measure Average SHAP Fidelity (Mean Squared Error between SHAP approximations and model outputs) and Average SHAP Sparsity (number of features deemed influential). The results show that LLMs can produce effective, interpretable pipelines with high fidelity and consistent sparsity, closely matching manually engineered baselines and highlighting their potential as automated tools for interpretable ML pipeline generation.
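Editor's note: the two explainability metrics named in the abstract can be computed roughly as below from SHAP attributions; the paper's exact definitions may differ, and the toy arrays here are synthetic.

    import numpy as np

    def shap_fidelity(shap_values, base_values, model_outputs):
        """MSE between the additive SHAP reconstruction
        (base value + sum of attributions) and the actual model outputs."""
        recon = base_values + shap_values.sum(axis=1)
        return float(np.mean((recon - model_outputs) ** 2))

    def shap_sparsity(shap_values, tau=1e-3):
        """Average number of features whose |attribution| exceeds a threshold."""
        return float(np.mean((np.abs(shap_values) > tau).sum(axis=1)))

    # Toy check: 5 samples, 4 features, perfectly additive attributions.
    phi = np.array([[0.2, 0.0, -0.1, 0.0]] * 5)
    base = np.full(5, 0.5)
    outputs = base + phi.sum(axis=1)
    print(shap_fidelity(phi, base, outputs), shap_sparsity(phi))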
【9】Efficient numeracy in language models through single-token number embeddings
Link: https://arxiv.org/abs/2510.06824
Authors: Linus Kreitner, Paul Hager, Jonathan Mengedoht, Georgios Kaissis, Daniel Rueckert, Martin J. Menten
Abstract: To drive progress in science and engineering, large language models (LLMs) must be able to process large amounts of numerical data and solve long calculations efficiently. This is currently only possible through the use of external tools or extensive reasoning chains, either limiting the numerical intuition of LLMs or limiting the length of problems they can solve. We show that frontier LLMs require excessive amounts of reasoning tokens to solve even basic calculations, which is exacerbated by their tokenization strategies that split single numbers into multiple tokens. This motivates the need for efficient and effective single-token number encodings. We introduce a set of desiderata for such encodings and show that existing approaches fail to fulfill them. To address these shortcomings, we propose BitTokens, a novel tokenization strategy that embeds any number into a single token using its IEEE 754 binary floating-point representation. Through extensive experiments we show that our BitTokens allow even small language models to learn algorithms that solve basic arithmetic operations nearly perfectly. This newly gained efficiency could expand the length and complexity of problems language models can solve.
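Editor's note: the core encoding is standard IEEE 754 bit extraction, sketched below; how those 64 bits are mapped into an embedding is the paper's contribution and is not reproduced here.

    import struct

    def float_to_bits(x: float) -> list[int]:
        """IEEE 754 binary64 representation of x as 64 bits (sign, exponent,
        mantissa), which could back a single number token instead of digit tokens."""
        (raw,) = struct.unpack(">Q", struct.pack(">d", x))
        return [(raw >> i) & 1 for i in reversed(range(64))]

    def bits_to_float(bits: list[int]) -> float:
        raw = 0
        for b in bits:
            raw = (raw << 1) | b
        (x,) = struct.unpack(">d", struct.pack(">Q", raw))
        return x

    bits = float_to_bits(3.14159)
    assert bits_to_float(bits) == 3.14159  # lossless round trip
    print("".join(map(str, bits)))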
【10】MultiCNKG: Integrating Cognitive Neuroscience, Gene, and Disease Knowledge Graphs Using Large Language Models
Link: https://arxiv.org/abs/2510.06742
Authors: Ali Sarabadani, Kheirolah Rahsepar Fard
Abstract: The advent of large language models (LLMs) has revolutionized the integration of knowledge graphs (KGs) in biomedical and cognitive sciences, overcoming limitations in traditional machine learning methods for capturing intricate semantic links among genes, diseases, and cognitive processes. We introduce MultiCNKG, an innovative framework that merges three key knowledge sources: the Cognitive Neuroscience Knowledge Graph (CNKG) with 2.9K nodes and 4.3K edges across 9 node types and 20 edge types; Gene Ontology (GO) featuring 43K nodes and 75K edges in 3 node types and 4 edge types; and Disease Ontology (DO) comprising 11.2K nodes and 8.8K edges with 1 node type and 2 edge types. Leveraging LLMs like GPT-4, we conduct entity alignment, semantic similarity computation, and graph augmentation to create a cohesive KG that interconnects genetic mechanisms, neurological disorders, and cognitive functions. The resulting MultiCNKG encompasses 6.9K nodes across 5 types (e.g., Genes, Diseases, Cognitive Processes) and 11.3K edges spanning 7 types (e.g., Causes, Associated with, Regulates), facilitating a multi-layered view from molecular to behavioral domains. Assessments using metrics such as precision (85.20%), recall (87.30%), coverage (92.18%), graph consistency (82.50%), novelty detection (40.28%), and expert validation (89.50%) affirm its robustness and coherence. Link prediction evaluations with models like TransE (MR: 391, MRR: 0.411) and RotatE (MR: 263, MRR: 0.395) show competitive performance against benchmarks like FB15k-237 and WN18RR. This KG advances applications in personalized medicine, cognitive disorder diagnostics, and hypothesis formulation in cognitive neuroscience.
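Editor's note: for readers unfamiliar with the MR/MRR figures quoted above, both link-prediction metrics follow directly from the rank assigned to each true edge; a minimal computation:

    import numpy as np

    def mean_rank_and_mrr(ranks):
        """Mean Rank (lower is better) and Mean Reciprocal Rank (higher is
        better), given the rank of each held-out true edge among candidates."""
        ranks = np.asarray(ranks, dtype=float)
        return ranks.mean(), (1.0 / ranks).mean()

    mr, mrr = mean_rank_and_mrr([1, 3, 2, 50, 7])
    print(f"MR={mr:.1f} MRR={mrr:.3f}")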
【11】Scaling LLM Multi-turn RL with End-to-end Summarization-based Context Management
Link: https://arxiv.org/abs/2510.06727
Authors: Miao Lu, Weiwei Sun, Weihua Du, Zhan Ling, Xuesong Yao, Kang Liu, Jiecao Chen
Abstract: We study reinforcement learning (RL) fine-tuning of large language model (LLM) agents for long-horizon multi-turn tool use, where context length quickly becomes a fundamental bottleneck. Existing RL pipelines can suffer from degraded instruction following, excessive rollout costs, and most importantly, strict context limits. To address these challenges, we introduce summarization-based context management to training. Specifically, it periodically compresses the tool-use history via LLM-generated summaries that retain task-relevant information, keeping a compact context while enabling the agent to scale beyond the fixed context window. Building on this formulation, we derive a policy gradient representation that seamlessly enables standard LLM RL infrastructures to optimize both tool-use behaviors as well as summarization strategies in an end-to-end fashion. We instantiate this framework with SUmmarization augmented Policy Optimization (SUPO), an LLM RL algorithm that enables long-horizon training beyond a fixed context limit. Experiments on interactive function calling and searching tasks demonstrate that SUPO significantly improves the success rate while maintaining the same or even lower working context length compared to baselines. We also demonstrate that for complex searching tasks, SUPO can further improve the evaluation performance when scaling the test-time maximum round of summarization beyond that of training time. Our results establish summarization-based context management as a principled and scalable approach for training RL agents beyond a fixed context length limit.
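Editor's note: a minimal sketch of the summarization-based context loop described above. The llm and tools callables, the FINAL: convention, and the word-count budget are hypothetical stand-ins; the paper additionally trains the summarization behavior with RL, which is not shown.

    def run_agent(llm, tools, task, max_tokens=4096, keep_last=2, max_turns=32):
        """When the tool-use history nears the budget, replace all but the most
        recent turns with an LLM-written, task-relevant summary."""
        history = [f"Task: {task}"]
        for _ in range(max_turns):
            if sum(len(h.split()) for h in history) > max_tokens:
                summary = llm("Summarize the interaction so far, keeping only "
                              "task-relevant facts:\n" + "\n".join(history[:-keep_last]))
                history = [f"Summary of earlier turns: {summary}"] + history[-keep_last:]
            action = llm("\n".join(history))
            if action.startswith("FINAL:"):
                return action.removeprefix("FINAL:")
            history.append(f"Action: {action}")
            history.append(f"Observation: {tools(action)}")
        return None  # gave up within the turn budget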
【12】Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)
Link: https://arxiv.org/abs/2510.06719
Authors: Junki Mori, Kazuya Kakizaki, Taiki Miyagawa, Jun Sakuma
Comments: Under review
Abstract: Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by grounding them in external knowledge. However, its application in sensitive domains is limited by privacy risks. Existing private RAG methods typically rely on query-time differential privacy (DP), which requires repeated noise injection and leads to accumulated privacy loss. To address this issue, we propose DP-SynRAG, a framework that uses LLMs to generate differentially private synthetic RAG databases. Unlike prior methods, the synthetic text can be reused once created, thereby avoiding repeated noise injection and additional privacy costs. To preserve essential information for downstream RAG tasks, DP-SynRAG extends private prediction, which instructs LLMs to generate text that mimics subsampled database records in a DP manner. Experiments show that DP-SynRAG achieves superior performance to state-of-the-art private RAG systems while maintaining a fixed privacy budget, offering a scalable solution for privacy-preserving RAG.
【13】Learning to Rewrite Prompts for Bootstrapping LLMs on Downstream Tasks
Link: https://arxiv.org/abs/2510.06695
Authors: Qinhao Zhou, Xiang Xiang, Kun He, John E. Hopcroft
Abstract: In recent years, the growing interest in Large Language Models (LLMs) has significantly advanced prompt engineering, transitioning from manual design to model-based optimization. Prompts for LLMs generally comprise two components: the instruction, which defines the task or objective, and the input, which is tailored to the instruction type. In natural language generation (NLG) tasks such as machine translation, the input component is particularly critical, while the instruction component tends to be concise. Existing prompt engineering methods primarily focus on optimizing the instruction component for general tasks, often requiring large-parameter LLMs as auxiliary tools. However, these approaches exhibit limited applicability for tasks like machine translation, where the input component plays a more pivotal role. To address this limitation, this paper introduces a novel prompt optimization method specifically designed for machine translation tasks. The proposed approach employs a small-parameter model trained using a back-translation-based strategy, significantly reducing training overhead for single-task optimization while delivering highly effective performance. With certain adaptations, this method can also be extended to other downstream tasks.
【14】From Acceleration to Saturation: Scaling Behavior of Bootstrapped Language Model Pretraining
Link: https://arxiv.org/abs/2510.06548
Authors: Seng Pei Liew, Takuya Kato
Comments: 22 pages, 11 figures, an abridged version to appear in NeurIPS 2025 LLM Evaluation Workshop
Abstract: Bootstrapped pretraining, i.e., the reuse of a pretrained base model for further pretraining, such as continual pretraining or model growth, is promising at reducing the cost of training language models from scratch. However, its effectiveness remains unclear, especially when applied to overtrained base models. In this work, we empirically study the scaling behavior of bootstrapped pretraining and find that its scaling efficiency diminishes in a predictable manner: the scaling exponent with respect to second-stage pretraining tokens decreases logarithmically with the number of tokens used to pretrain the base model. The joint dependence on first- and second-stage tokens is accurately modeled by a simple scaling law. Such a saturation effect reveals a fundamental trade-off in multi-stage pretraining strategies: the more extensively a model is pretrained, the less additional benefit bootstrapping provides. Our findings provide practical insights for efficient language model training and raise important considerations for the reuse of overtrained models.
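Editor's note: one simple functional form consistent with the abstract's verbal description, shown only to fix ideas; the paper's fitted law and constants may differ. With first-stage tokens D_1 and second-stage tokens D_2:

    L(D_2; D_1) \propto D_2^{-\beta(D_1)}, \qquad \beta(D_1) = \beta_0 - c \log D_1

i.e., the second-stage power-law exponent \beta shrinks logarithmically as the base model is pretrained on more tokens, so bootstrapping saturates for heavily overtrained bases.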
【15】Auto-Prompt Ensemble for LLM Judge
Link: https://arxiv.org/abs/2510.06538
Authors: Jiajie Li, Huayi Zhang, Peng Lin, Jinjun Xiong, Wei Xu
Abstract: We present a novel framework that improves the reliability of LLM judges by selectively augmenting the LLM with auxiliary evaluation dimensions. Existing LLM judges often miss crucial evaluation dimensions because they fail to recognize the implicit standards underlying human assessments. To address this challenge, we propose the Auto-Prompt Ensemble (APE), an adaptive framework that automatically learns evaluation dimensions from its failure cases. APE incorporates a confidence-based ensemble mechanism to decide when to adopt the judgments from additional evaluation dimensions through a novel confidence estimation approach called Collective Confidence. Extensive experiments demonstrate that APE improves the reliability of the LLM judge across diverse standard benchmarks. For instance, APE enhances the GPT-4o agreement rate on Reward Bench from 87.2% to 90.5% in the zero-shot setting. Overall, APE provides a principled approach for the LLM judge to leverage test-time computation, and bridges the evaluation gap between human and LLM judges.
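Editor's note: a schematic of the confidence-gated ensemble decision the abstract describes. The judge callables, the averaging rule for "collective confidence", and the majority vote are simplifying assumptions, not the paper's exact mechanism.

    def ape_judge(base_judge, extra_judges, threshold=0.7):
        """Adopt judgments from auxiliary evaluation dimensions only when their
        collective confidence clears a threshold. Each judge is a hypothetical
        callable returning (verdict, confidence in [0, 1])."""
        def judge(sample):
            verdict, _ = base_judge(sample)
            extras = [j(sample) for j in extra_judges]
            if extras:
                collective = sum(c for _, c in extras) / len(extras)
                if collective >= threshold:
                    votes = [v for v, _ in extras] + [verdict]
                    verdict = max(set(votes), key=votes.count)  # majority vote
            return verdict
        return judge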
【16】From Description to Detection: LLM based Extendable O-RAN Compliant Blind DoS Detection in 5G and Beyond
Link: https://arxiv.org/abs/2510.06530
Authors: Thusitha Dayaratne, Ngoc Duy Pham, Viet Vo, Shangqi Lai, Sharif Abuadbba, Hajime Suzuki, Xingliang Yuan, Carsten Rudolph
Abstract: The quality and experience of mobile communication have significantly improved with the introduction of 5G, and these improvements are expected to continue beyond the 5G era. However, vulnerabilities in control-plane protocols, such as Radio Resource Control (RRC) and Non-Access Stratum (NAS), pose significant security threats, such as Blind Denial of Service (DoS) attacks. Despite the availability of existing anomaly detection methods that leverage rule-based systems or traditional machine learning methods, these methods have several limitations, including the need for extensive training data, predefined rules, and limited explainability. Addressing these challenges, we propose a novel anomaly detection framework that leverages the capabilities of Large Language Models (LLMs) in zero-shot mode with unordered data and short natural language attack descriptions within the Open Radio Access Network (O-RAN) architecture. We analyse robustness to prompt variation, demonstrate the practicality of automating the attack descriptions and show that detection quality relies on the semantic completeness of the description rather than its phrasing or length. We utilise an RRC/NAS dataset to evaluate the solution and provide an extensive comparison of open-source and proprietary LLM implementations to demonstrate superior performance in attack detection. We further validate the practicality of our framework within O-RAN's real-time constraints, illustrating its potential for detecting other Layer-3 attacks.
【17】Valid Stopping for LLM Generation via Empirical Dynamic Formal Lift
Link: https://arxiv.org/abs/2510.06478
Authors: Sanjeda Akter, Ibne Farabi Shihab, Anuj Sharma
Abstract: We introduce Sequential-EDFL (Empirical Dynamic Formal Lift), applying anytime-valid sequential testing to language model generation stopping. Our approach tracks information lift -- the log-likelihood ratio between full models and deliberately weakened "skeleton" baselines -- using self-normalized empirical-Bernstein e-processes that provide formal delta-level error control regardless of stopping time. We handle unknown centering through online mean estimation, combine multiple parameters via mixture e-processes, and support adaptive resets under distributional drift. On six benchmarks, Sequential-EDFL reduces generation by 22-28% vs. sequential baselines while maintaining delta-level control with 12% computational overhead. We introduce automated skeletons (distilled submodels, randomized logits) and show robustness across skeleton families. Composing EDFL with a lightweight correctness gate (sentence boundaries + verifier) improves end-task correctness while preserving anytime-valid guarantees by only delaying stopping. Our certificates control information sufficiency, not factual correctness -- 10.9% of stopped sequences remain incorrect even with the gate (13.2-22.7% without it). EDFL serves as a first-stage filter reducing verification burden by 83%, not as a standalone solution for safety-critical domains.
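Editor's note: a toy anytime-valid stopping rule in the spirit of the abstract, tracking cumulative lift and stopping once an e-process exceeds 1/delta (Ville's inequality). The real method uses self-normalized empirical-Bernstein e-processes with online centering and mixtures over lambda; this fixed-lambda Hoeffding-style version is only illustrative, and mu0 and the boundedness assumption are mine.

    import math

    def sequential_stop(lifts, mu0=0.0, bound=1.0, lam=0.5, delta=0.05):
        """Stop at the first t where a Hoeffding-style e-process for increments
        assumed in [0, bound] with null mean mu0 exceeds 1/delta."""
        log_e = 0.0
        for t, x in enumerate(lifts, start=1):
            x = min(max(x, 0.0), bound)           # clip to the assumed range
            log_e += lam * (x - mu0) - lam * lam * bound * bound / 8.0
            if log_e >= math.log(1.0 / delta):    # delta-level error control
                return t                          # enough evidence: stop generating
        return None                               # never stopped

    print(sequential_stop([0.8] * 50, mu0=0.2))   # stops after ~12 steps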
【18】Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin
Link: https://arxiv.org/abs/2510.06477
Authors: Enrique Queipo-de-Llano, Álvaro Arroyo, Federico Barbero, Xiaowen Dong, Michael Bronstein, Yann LeCun, Ravid Shwartz-Ziv
Abstract: Attention sinks and compression valleys have attracted significant attention as two puzzling phenomena in large language models, but have been studied in isolation. In this work, we present a surprising connection between attention sinks and compression valleys, tracing both to the formation of massive activations in the residual stream. We prove theoretically that massive activations necessarily produce representational compression and establish bounds on the resulting entropy reduction. Through experiments across several models (410M-120B parameters), we confirm that when the beginning-of-sequence token develops extreme activation norms in the middle layers, both compression valleys and attention sinks emerge simultaneously. Targeted ablation studies validate our theoretical predictions. This unified view motivates us to propose the Mix-Compress-Refine theory of information flow, as an attempt to explain how LLMs organize their computation in depth by controlling attention and representational compression via massive activations. Specifically, we posit that Transformer-based LLMs process tokens in three distinct phases: (1) broad mixing in the early layers, (2) compressed computation with limited mixing in the middle layers, and (3) selective refinement in the late layers. Our framework helps explain why embedding tasks perform best at intermediate layers, whereas generation tasks benefit from full-depth processing, clarifying differences in task-dependent representations.
【19】MCCE: A Framework for Multi-LLM Collaborative Co-Evolution
Link: https://arxiv.org/abs/2510.06270
Authors: Nian Ran, Zhongzheng Li, Yue Wang, Qingsong Ran, Xiaoyuan Zhang, Shikun Feng, Richard Allmendinger, Xiaoguang Zhao
Abstract: Multi-objective discrete optimization problems, such as molecular design, pose significant challenges due to their vast and unstructured combinatorial spaces. Traditional evolutionary algorithms often get trapped in local optima, while expert knowledge can provide crucial guidance for accelerating convergence. Large language models (LLMs) offer powerful priors and reasoning ability, making them natural optimizers when expert knowledge matters. However, closed-source LLMs, though strong in exploration, cannot update their parameters and thus cannot internalize experience. Conversely, smaller open models can be continually fine-tuned but lack broad knowledge and reasoning strength. We introduce Multi-LLM Collaborative Co-evolution (MCCE), a hybrid framework that unites a frozen closed-source LLM with a lightweight trainable model. The system maintains a trajectory memory of past search processes; the small model is progressively refined via reinforcement learning, with the two models jointly supporting and complementing each other in global exploration. Unlike model distillation, this process enhances the capabilities of both models through mutual inspiration. Experiments on multi-objective drug design benchmarks show that MCCE achieves state-of-the-art Pareto front quality and consistently outperforms baselines. These results highlight a new paradigm for enabling continual evolution in hybrid LLM systems, combining knowledge-driven exploration with experience-driven learning.
【20】Ensemble Deep Learning and LLM-Assisted Reporting for Automated Skin Lesion Diagnosis
Link: https://arxiv.org/abs/2510.06260
Authors: Sher Khan, Raz Muhammad, Adil Hussain, Muhammad Sajjad, Muhammad Rashid
Abstract: Cutaneous malignancies demand early detection for favorable outcomes, yet current diagnostics suffer from inter-observer variability and access disparities. While AI shows promise, existing dermatological systems are limited by homogeneous architectures, dataset biases across skin tones, and fragmented approaches that treat natural language processing as separate post-hoc explanations rather than integral to clinical decision-making. We introduce a unified framework that fundamentally reimagines AI integration for dermatological diagnostics through two synergistic innovations. First, a purposefully heterogeneous ensemble of architecturally diverse convolutional neural networks provides complementary diagnostic perspectives, with an intrinsic uncertainty mechanism flagging discordant cases for specialist review -- mimicking clinical best practices. Second, we embed large language model capabilities directly into the diagnostic workflow, transforming classification outputs into clinically meaningful assessments that simultaneously fulfill medical documentation requirements and deliver patient-centered education. This seamless integration generates structured reports featuring precise lesion characterization, accessible diagnostic reasoning, and actionable monitoring guidance -- empowering patients to recognize early warning signs between visits. By addressing both diagnostic reliability and communication barriers within a single cohesive system, our approach bridges the critical translational gap that has prevented previous AI implementations from achieving clinical impact. The framework represents a significant advancement toward deployable dermatological AI that enhances diagnostic precision while actively supporting the continuum of care from initial detection through patient education, ultimately improving early intervention rates for skin lesions.
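Editor's note: the "flag discordant cases for review" idea reduces to a simple disagreement statistic over ensemble members; a sketch with a std-based proxy and threshold that are my assumptions, not the paper's mechanism.

    import numpy as np

    def ensemble_predict(member_probs, tau=0.2):
        """Average softmax outputs of architecturally diverse CNNs; flag the
        case for specialist review when members disagree (high std)."""
        P = np.stack(member_probs)              # shape: (n_members, n_classes)
        mean = P.mean(axis=0)
        needs_review = float(P.std(axis=0).max()) > tau
        return int(mean.argmax()), needs_review

    probs = [np.array([0.7, 0.3]), np.array([0.2, 0.8]), np.array([0.7, 0.3])]
    print(ensemble_predict(probs))              # -> (0, True): disagreement flagged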
【21】Textual interpretation of transient image classifications from large language models
Link: https://arxiv.org/abs/2510.06931
Authors: Fiorenzo Stoppa, Turan Bulmus, Steven Bloemen, Stephen J. Smartt, Paul J. Groot, Paul Vreeswijk, Ken W. Smith
Comments: Published in Nature Astronomy (2025). Publisher's Version of Record (CC BY 4.0). DOI: https://doi.org/10.1038/s41550-025-02670-z
Abstract: Modern astronomical surveys deliver immense volumes of transient detections, yet distinguishing real astrophysical signals (for example, explosive events) from bogus imaging artefacts remains a challenge. Convolutional neural networks are effectively used for real versus bogus classification; however, their reliance on opaque latent representations hinders interpretability. Here we show that large language models (LLMs) can approach the performance level of a convolutional neural network on three optical transient survey datasets (Pan-STARRS, MeerLICHT and ATLAS) while simultaneously producing direct, human-readable descriptions for every candidate. Using only 15 examples and concise instructions, Google's LLM, Gemini, achieves a 93% average accuracy across datasets that span a range of resolution and pixel scales. We also show that a second LLM can assess the coherence of the output of the first model, enabling iterative refinement by identifying problematic cases. This framework allows users to define the desired classification behaviour through natural language and examples, bypassing traditional training pipelines. Furthermore, by generating textual descriptions of observed features, LLMs enable users to query classifications as if navigating an annotated catalogue, rather than deciphering abstract latent spaces. As next-generation telescopes and surveys further increase the amount of data available, LLM-based classification could help bridge the gap between automated detection and transparent, human-level understanding.
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (13 papers)
【1】MolGA: Molecular Graph Adaptation with Pre-trained 2D Graph Encoder
Link: https://arxiv.org/abs/2510.07289
Authors: Xingtong Yu, Chang Zhou, Xinming Zhang, Yuan Fang
Comments: Under review
Abstract: Molecular graph representation learning is widely used in chemical and biomedical research. While pre-trained 2D graph encoders have demonstrated strong performance, they overlook the rich molecular domain knowledge associated with submolecular instances (atoms and bonds). While molecular pre-training approaches incorporate such knowledge into their pre-training objectives, they typically employ designs tailored to a specific type of knowledge, lacking the flexibility to integrate diverse knowledge present in molecules. Hence, reusing widely available and well-validated pre-trained 2D encoders, while incorporating molecular domain knowledge during downstream adaptation, offers a more practical alternative. In this work, we propose MolGA, which adapts pre-trained 2D graph encoders to downstream molecular applications by flexibly incorporating diverse molecular domain knowledge. First, we propose a molecular alignment strategy that bridges the gap between pre-trained topological representations and domain-knowledge representations. Second, we introduce a conditional adaptation mechanism that generates instance-specific tokens to enable fine-grained integration of molecular domain knowledge for downstream tasks. Finally, we conduct extensive experiments on eleven public datasets, demonstrating the effectiveness of MolGA.
【2】GTCN-G: A Residual Graph-Temporal Fusion Network for Imbalanced Intrusion Detection (Preprint)
Link: https://arxiv.org/abs/2510.07285
Authors: Tianxiang Xu, Zhichao Wen, Xinyu Zhao, Qi Hu, Yan Li, Chang Liu
Comments: This preprint was submitted to IEEE TrustCom 2025. The accepted version will be published under copyright 2025 IEEE
Abstract: The escalating complexity of network threats and the inherent class imbalance in traffic data present formidable challenges for modern Intrusion Detection Systems (IDS). While Graph Neural Networks (GNNs) excel in modeling topological structures and Temporal Convolutional Networks (TCNs) are proficient in capturing time-series dependencies, a framework that synergistically integrates both while explicitly addressing data imbalance remains an open challenge. This paper introduces a novel deep learning framework, named Gated Temporal Convolutional Network and Graph (GTCN-G), engineered to overcome these limitations. Our model uniquely fuses a Gated TCN (G-TCN) for extracting hierarchical temporal features from network flows with a Graph Convolutional Network (GCN) designed to learn from the underlying graph structure. The core innovation lies in the integration of a residual learning mechanism, implemented via a Graph Attention Network (GAT). This mechanism preserves original feature information through residual connections, which is critical for mitigating the class imbalance problem and enhancing detection sensitivity for rare malicious activities (minority classes). We conducted extensive experiments on two public benchmark datasets, UNSW-NB15 and ToN-IoT, to validate our approach. The empirical results demonstrate that the proposed GTCN-G model achieves state-of-the-art performance, significantly outperforming existing baseline models in both binary and multi-class classification tasks.
【3】Test-Time Graph Search for Goal-Conditioned Reinforcement Learning
Link: https://arxiv.org/abs/2510.07257
Authors: Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski
Abstract: Offline goal-conditioned reinforcement learning (GCRL) trains policies that reach user-specified goals at test time, providing a simple, unsupervised, domain-agnostic way to extract diverse behaviors from unlabeled, reward-free datasets. Nonetheless, long-horizon decision making remains difficult for GCRL agents due to temporal credit assignment and error accumulation, and the offline setting amplifies these effects. To alleviate this issue, we introduce Test-Time Graph Search (TTGS), a lightweight planning approach to solve the GCRL task. TTGS accepts any state-space distance or cost signal, builds a weighted graph over dataset states, and performs fast search to assemble a sequence of subgoals that a frozen policy executes. When the base learner is value-based, the distance is derived directly from the learned goal-conditioned value function, so no handcrafted metric is needed. TTGS requires no changes to training, no additional supervision, no online interaction, and no privileged information, and it runs entirely at inference. On the OGBench benchmark, TTGS improves success rates of multiple base learners on challenging locomotion tasks, demonstrating the benefit of simple metric-guided test-time planning for offline GCRL.
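Editor's note: a compact sketch of the search step described above: connect dataset states whose distance is below a radius, run Dijkstra, and hand the intermediate states to a frozen policy as subgoals. The dist callable stands in for the value-derived distance; states, radius, and the 1-D toy data are illustrative assumptions.

    import heapq

    def subgoal_sequence(states, dist, start, goal, radius=1.0):
        """Dijkstra over a radius graph on dataset states; returns subgoals."""
        best, prev, heap = {start: 0.0}, {}, [(0.0, start)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == goal:
                break
            if d > best.get(u, float("inf")):
                continue
            for v in range(len(states)):
                w = dist(states[u], states[v])
                if v != u and w <= radius and d + w < best.get(v, float("inf")):
                    best[v], prev[v] = d + w, u
                    heapq.heappush(heap, (d + w, v))
        path, node = [], goal
        while node != start:
            path.append(node)
            node = prev[node]  # raises KeyError if the goal is unreachable
        return [states[i] for i in reversed(path)]

    states = [0.0, 0.4, 0.9, 1.3, 2.0]
    print(subgoal_sequence(states, lambda a, b: abs(a - b), start=0, goal=4, radius=0.8))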
【4】Spectral Graph Clustering under Differential Privacy: Balancing Privacy, Accuracy, and Efficiency
Link: https://arxiv.org/abs/2510.07136
Authors: Mohamed Seif, Antti Koskela, H. Vincent Poor, Andrea J. Goldsmith
Abstract: We study the problem of spectral graph clustering under edge differential privacy (DP). Specifically, we develop three mechanisms: (i) graph perturbation via randomized edge flipping combined with adjacency matrix shuffling, which enforces edge privacy while preserving key spectral properties of the graph. Importantly, shuffling considerably amplifies the guarantees: whereas flipping edges with a fixed probability alone provides only a constant epsilon edge DP guarantee as the number of nodes grows, the shuffled mechanism achieves (epsilon, delta) edge DP with parameters that tend to zero as the number of nodes increases; (ii) private graph projection with additive Gaussian noise in a lower-dimensional space to reduce dimensionality and computational complexity; and (iii) a noisy power iteration method that distributes Gaussian noise across iterations to ensure edge DP while maintaining convergence. Our analysis provides rigorous privacy guarantees and a precise characterization of the misclassification error rate. Experiments on synthetic and real-world networks validate our theoretical analysis and illustrate the practical privacy-utility trade-offs.
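Editor's note: mechanism (i) without the shuffling step is classical randomized response on edges, sketched below. Flipping each edge indicator with probability p < 0.5 gives epsilon-edge-DP with epsilon = ln((1 - p) / p); per the abstract, shuffling then amplifies this, which is not shown here.

    import numpy as np

    def flip_edges(adj, p, rng):
        """Randomized response: flip each upper-triangular edge indicator with
        probability p, then mirror to keep the graph symmetric."""
        noisy = adj.copy()
        iu = np.triu_indices(adj.shape[0], k=1)
        mask = rng.random(len(iu[0])) < p
        noisy[iu] = np.where(mask, 1 - adj[iu], adj[iu])
        noisy[iu[1], iu[0]] = noisy[iu]   # mirror into the lower triangle
        return noisy

    rng = np.random.default_rng(0)
    A = (rng.random((6, 6)) < 0.3).astype(int)
    A = np.triu(A, 1) + np.triu(A, 1).T   # toy symmetric graph, empty diagonal
    print(flip_edges(A, p=0.1, rng=rng))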
【5】Introspection in Learned Semantic Scene Graph Localisation
Link: https://arxiv.org/abs/2510.07053
Authors: Manshika Charvi Bissessur, Efimia Panagiotaki, Daniele De Martini
Comments: IEEE IROS 2025 Workshop FAST
Abstract: This work investigates how semantics influence localisation performance and robustness in a learned self-supervised, contrastive semantic localisation framework. After training a localisation network on both original and perturbed maps, we conduct a thorough post-hoc introspection analysis to probe whether the model filters environmental noise and prioritises distinctive landmarks over routine clutter. We validate various interpretability methods and present a comparative reliability analysis. Integrated gradients and attention weights consistently emerge as the most reliable probes of learned behaviour. A semantic class ablation further reveals an implicit weighting in which frequent objects are often down-weighted. Overall, the results indicate that the model learns noise-robust, semantically salient relations about place definition, thereby enabling explainable registration under challenging visual and structural variations.
【6】Relational Database Distillation: From Structured Tables to Condensed Graph Data
Link: https://arxiv.org/abs/2510.06980
Authors: Xinyi Gao, Jingxi Zhang, Lijian Chen, Tong Chen, Lizhen Cui, Hongzhi Yin
Abstract: Relational databases (RDBs) underpin the majority of global data management systems, where information is structured into multiple interdependent tables. To effectively use the knowledge within RDBs for predictive tasks, recent advances leverage graph representation learning to capture complex inter-table relations as multi-hop dependencies. Despite achieving state-of-the-art performance, these methods remain hindered by prohibitive storage overhead and excessive training time, due to the massive scale of the database and the computational burden of intensive message passing across interconnected tables. To alleviate these concerns, we propose and study the problem of Relational Database Distillation (RDD). Specifically, we aim to distill large-scale RDBs into compact heterogeneous graphs while retaining the predictive power (i.e., utility) required for training graph-based models. Multi-modal column information is preserved through node features, and primary-foreign key relations are encoded via heterogeneous edges, thereby maintaining both data fidelity and relational structure. To ensure adaptability across diverse downstream tasks without engaging the traditional, inefficient bi-level distillation framework, we further design a kernel ridge regression-guided objective with pseudo-labels, which produces quality features for the distilled graph. Extensive experiments on multiple real-world RDBs demonstrate that our solution substantially reduces the data size while maintaining competitive performance on classification and regression tasks, creating an effective pathway for scalable learning with RDBs.
【7】Revisiting Node Affinity Prediction in Temporal Graphs
Link: https://arxiv.org/abs/2510.06940
Authors: Krishna Sri Ipsit Mantri, Or Feldman, Moshe Eliasof, Chaim Baskin
Comments: Preprint
Abstract: Node affinity prediction is a common task that is widely used in temporal graph learning with applications in social and financial networks, recommender systems, and more. Recent works have addressed this task by adapting state-of-the-art dynamic link property prediction models to node affinity prediction. However, simple heuristics, such as Persistent Forecast or Moving Average, outperform these models. In this work, we analyze the challenges in training current Temporal Graph Neural Networks for node affinity prediction and suggest appropriate solutions. Combining the solutions, we develop NAViS - a Node Affinity prediction model using Virtual State - by exploiting the equivalence between heuristics and state space models. While promising, training NAViS is non-trivial. Therefore, we further introduce a novel loss function for node affinity prediction. We evaluate NAViS on TGB and show that it outperforms the state-of-the-art, including heuristics. Our source code is available at https://github.com/orfeld415/NAVIS
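Editor's note: the two heuristics the abstract says outperform prior deep models are trivially implementable; the toy history below is illustrative.

    import numpy as np

    def persistent_forecast(history):
        """Predict the last observed affinity vector unchanged."""
        return history[-1]

    def moving_average(history, window=5):
        """Predict the mean of the last `window` observations."""
        return np.mean(history[-window:], axis=0)

    history = [np.array([0.1, 0.9]), np.array([0.2, 0.8]), np.array([0.4, 0.6])]
    print(persistent_forecast(history), moving_average(history, window=2))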
【8】MoRE-GNN: Multi-omics Data Integration with a Heterogeneous Graph Autoencoder
Link: https://arxiv.org/abs/2510.06880
Authors: Zhiyu Wang, Sonia Koszut, Pietro Liò, Francesco Ceccarelli
Abstract: The integration of multi-omics single-cell data remains challenging due to high dimensionality and complex inter-modality relationships. To address this, we introduce MoRE-GNN (Multi-omics Relational Edge Graph Neural Network), a heterogeneous graph autoencoder that combines graph convolution and attention mechanisms to dynamically construct relational graphs directly from data. Evaluations on six publicly available datasets demonstrate that MoRE-GNN captures biologically meaningful relationships and outperforms existing methods, particularly in settings with strong inter-modality correlations. Furthermore, the learned representations allow for accurate downstream cross-modal predictions. While performance may vary with dataset complexity, MoRE-GNN offers an adaptive, scalable and interpretable framework for advancing multi-omics integration.
【9】Towards Generalization of Graph Neural Networks for AC Optimal Power Flow
标题:面向交流最优潮流的图神经网络泛化
链接:https://arxiv.org/abs/2510.06860
作者:Olayiwola Arowolo, Jochen L. Cremer
备注:Pre-print has been submitted for review
摘要:对于大规模电力系统,交流最优潮流(ACOPF)的计算代价高昂,传统求解器的求解时间令人望而却步。机器学习方法可以提供计算加速,但在不进行昂贵重训练的情况下,难以兼顾可扩展性和拓扑适应性。为了实现跨电网规模的可扩展性和对拓扑变化的适应性,我们提出了一种混合异构消息传递神经网络(HH-MPNN)。HH-MPNN将母线、发电机、负荷、并联元件、输电线路和变压器建模为不同的节点或边类型,并结合可扩展的Transformer模型来处理长程依赖关系。在14到2,000母线的电网上,HH-MPNN在默认拓扑上实现了小于1%的最优性间隙。尽管仅在默认拓扑上训练,将其零样本应用于数千个未见过的拓扑时,HH-MPNN仍实现了小于3%的最优性间隙。在较小电网上进行预训练也能提升在较大电网上的结果。与内点法求解器相比,计算加速达到1,000倍至10,000倍。这些结果推动了面向实时电力系统运行的实用、可泛化的机器学习。
摘要:AC Optimal Power Flow (ACOPF) is computationally expensive for large-scale power systems, with conventional solvers requiring prohibitive solution times. Machine learning approaches offer computational speedups but struggle with scalability and topology adaptability without expensive retraining. To enable scalability across grid sizes and adaptability to topology changes, we propose a Hybrid Heterogeneous Message Passing Neural Network (HH-MPNN). HH-MPNN models buses, generators, loads, shunts, transmission lines and transformers as distinct node or edge types, combined with a scalable transformer model for handling long-range dependencies. On grids from 14 to 2,000 buses, HH-MPNN achieves less than 1% optimality gap on default topologies. Applied zero-shot to thousands of unseen topologies, HH-MPNN achieves less than 3% optimality gap despite training only on default topologies. Pre-training on smaller grids also improves results on a larger grid. Computational speedups reach 1,000x to 10,000x compared to interior point solvers. These results advance practical, generalizable machine learning for real-time power system operations.
【10】The Unreasonable Effectiveness of Randomized Representations in Online Continual Graph Learning
标题:在线连续图学习中随机表示的不合理有效性
链接:https://arxiv.org/abs/2510.06819
作者:Giovanni Donghi, Daniele Zambon, Luca Pasa, Cesare Alippi, Nicolò Navarin
摘要:灾难性遗忘是在线连续图学习(OCGL)的主要障碍之一,其中节点一个接一个地到达,分布漂移可能随时发生,并且特定任务子图的离线训练是不可行的。在这项工作中,我们探索了一种令人惊讶的简单但高效的OCGL方法:我们使用一个固定的,随机初始化的编码器,通过聚合邻域信息来生成鲁棒的和有表现力的节点嵌入,在线训练一个轻量级的分类器。通过冻结编码器,我们消除了表示参数的漂移,这是遗忘的关键来源,获得了既有表现力又稳定的嵌入。当在几个OCGL基准测试中进行评估时,尽管这种方法简单且缺乏内存缓冲区,但与最先进的方法相比,这种方法产生了一致的收益,令人惊讶的改进高达30%,并且性能通常接近联合离线训练的上限。这些结果表明,在OCGL中,灾难性遗忘可以通过拥抱架构的简单性和稳定性来最小化,而无需复杂的重放或正则化。
摘要:Catastrophic forgetting is one of the main obstacles for Online Continual Graph Learning (OCGL), where nodes arrive one by one, distribution drifts may occur at any time and offline training on task-specific subgraphs is not feasible. In this work, we explore a surprisingly simple yet highly effective approach for OCGL: we use a fixed, randomly initialized encoder to generate robust and expressive node embeddings by aggregating neighborhood information, training online only a lightweight classifier. By freezing the encoder, we eliminate drifts of the representation parameters, a key source of forgetting, obtaining embeddings that are both expressive and stable. When evaluated across several OCGL benchmarks, despite its simplicity and lack of memory buffer, this approach yields consistent gains over state-of-the-art methods, with surprising improvements of up to 30% and performance often approaching that of the joint offline-training upper bound. These results suggest that in OCGL, catastrophic forgetting can be minimized without complex replay or regularization by embracing architectural simplicity and stability.
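下面用一个最小示意(纯NumPy,随机图与特征均为占位数据,并非论文官方实现)说明"冻结的随机初始化编码器 + 在线训练轻量级分类器"的思路:邻域聚合后的固定随机投影产生稳定嵌入,分类器随节点流式到达而在线更新:

```python
import numpy as np

rng = np.random.default_rng(42)
d_in, d_hid, n_classes = 16, 64, 3

# 冻结的随机编码器:邻域均值聚合 + 一层固定随机投影(训练中不再更新)
W = rng.normal(0, 1.0 / np.sqrt(d_in), size=(d_in, d_hid))

def encode(x_self, x_neighbors):
    """聚合邻域信息后做固定随机投影,得到稳定的节点嵌入。"""
    agg = x_self if len(x_neighbors) == 0 else np.vstack([x_self, *x_neighbors]).mean(0)
    return np.tanh(agg @ W)

# 轻量级线性分类器,随节点逐个到达而在线更新(SGD)
V = np.zeros((d_hid, n_classes))

def online_step(z, y, lr=0.1):
    global V
    logits = z @ V
    p = np.exp(logits - logits.max()); p /= p.sum()
    p[y] -= 1.0                      # softmax 交叉熵对 logits 的梯度
    V -= lr * np.outer(z, p)

# 模拟节点一个接一个到达的在线流
for _ in range(100):
    x = rng.normal(size=d_in)
    neigh = [rng.normal(size=d_in) for _ in range(3)]
    y = int(rng.integers(n_classes))
    online_step(encode(x, neigh), y)
```

由于编码器参数固定,嵌入不随训练漂移,遗忘只可能发生在轻量级分类器一侧,这正是摘要所述稳定性的来源。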
【11】Incorporating Expert Knowledge into Bayesian Causal Discovery of Mixtures of Directed Acyclic Graphs
标题:将专家知识融入有向无环图混合模型的贝叶斯因果发现
链接:https://arxiv.org/abs/2510.06735
作者:Zachris Björkman, Jorge Loría, Sophie Wharrie, Samuel Kaski
备注:28 pages, 18 figures
摘要:贝叶斯因果发现可以受益于从领域专家处启发得到的先验信息,而在异构领域中,任何先验知识都尤为珍贵。然而,迄今为止的先验启发方法都假设只有单一的因果图,因此并不适用于异构领域。我们基于贝叶斯实验设计(BED)原则,为异构场景提出了一种因果启发策略,并提出了变分混合结构学习(VaMSL)方法——它扩展了早期的可微贝叶斯结构学习(DiBS)方法——以迭代方式推断因果贝叶斯网络(CBN)的混合。我们构建了一个信息丰富的图先验,将启发得到的专家反馈纳入CBN混合的推断中。我们所提出的方法成功地产生了一组备选因果模型(混合成分或聚类),并且在由模拟专家提供信息时,在异构合成数据上取得了更好的结构学习性能。最后,我们证明了我们的方法能够捕获乳腺癌数据库中的复杂分布。
摘要:Bayesian causal discovery benefits from prior information elicited from domain experts, and in heterogeneous domains any prior knowledge would be badly needed. However, so far prior elicitation approaches have assumed a single causal graph and hence are not suited to heterogeneous domains. We propose a causal elicitation strategy for heterogeneous settings, based on Bayesian experimental design (BED) principles, and a variational mixture structure learning (VaMSL) method -- extending the earlier differentiable Bayesian structure learning (DiBS) method -- to iteratively infer mixtures of causal Bayesian networks (CBNs). We construct an informative graph prior incorporating elicited expert feedback in the inference of mixtures of CBNs. Our proposed method successfully produces a set of alternative causal models (mixture components or clusters), and achieves an improved structure learning performance on heterogeneous synthetic data when informed by a simulated expert. Finally, we demonstrate that our approach is capable of capturing complex distributions in a breast cancer database.
【12】Root Cause Analysis of Outliers in Unknown Cyclic Graphs
标题:未知循环图中异常值的根本原因分析
链接:https://arxiv.org/abs/2510.06995
作者:Daniela Schkoda, Dominik Janzing
摘要:我们研究了具有线性结构方程的循环因果图中异常值的传播,将其追溯到一个或几个"根本原因"节点。我们表明,只要扰动足够强,并且按照与正常模式相同的结构方程传播,就有可能确定一个简短的潜在根本原因候选列表。该候选列表由真正的根本原因,以及与根本原因处于同一环路上的其父节点组成。值得注意的是,我们的方法不需要因果图的先验知识。
摘要:We study the propagation of outliers in cyclic causal graphs with linear structural equations, tracing them back to one or several "root cause" nodes. We show that it is possible to identify a short list of potential root causes provided that the perturbation is sufficiently strong and propagates according to the same structural equations as in the normal mode. This shortlist consists of the true root causes together with those of its parents lying on a cycle with the root cause. Notably, our method does not require prior knowledge of the causal graph.
【13】Soft-Evidence Fused Graph Neural Network for Cancer Driver Gene Identification across Multi-View Biological Graphs
标题:软证据融合图神经网络用于跨多视图生物图的癌症驱动基因识别
链接:https://arxiv.org/abs/2510.06290
作者:Bang Chen, Lijun Guo, Houli Fan, Wentao He, Rong Zhang
备注:8pages
摘要:识别癌症驱动基因(CDG)对于理解癌症机制和开发靶向治疗至关重要。图神经网络(GNN)最近被用于通过捕获生物相互作用网络中的模式来识别CDG。然而,大多数基于GNN的方法依赖于单一的蛋白质-蛋白质相互作用(PPI)网络,忽略了来自其他生物网络的互补信息。一些研究通过一致性约束对齐特征来整合多个网络,以学习用于CDG识别的统一基因表示。然而,这种表示级融合往往假设基因关系在各网络间是一致的,这可能会忽略网络的异质性并引入相互冲突的信息。为了解决这个问题,我们提出了软证据融合图神经网络(SEFGNN),这是一种在决策层面跨多个网络进行CDG识别的新框架。SEFGNN没有强制执行特征级一致性,而是将每个生物网络视为独立的证据源,并使用Dempster-Shafer理论(DST)在决策层面执行不确定性感知融合。为了减轻DST过度自信的风险,我们进一步引入了软证据平滑(SES)模块,在保持判别性能的同时提高了排序稳定性。在三个癌症数据集上的实验表明,SEFGNN始终优于最先进的基线,并在发现新CDG方面表现出强大的潜力。
摘要:Identifying cancer driver genes (CDGs) is essential for understanding cancer mechanisms and developing targeted therapies. Graph neural networks (GNNs) have recently been employed to identify CDGs by capturing patterns in biological interaction networks. However, most GNN-based approaches rely on a single protein-protein interaction (PPI) network, ignoring complementary information from other biological networks. Some studies integrate multiple networks by aligning features with consistency constraints to learn unified gene representations for CDG identification. However, such representation-level fusion often assumes congruent gene relationships across networks, which may overlook network heterogeneity and introduce conflicting information. To address this, we propose Soft-Evidence Fusion Graph Neural Network (SEFGNN), a novel framework for CDG identification across multiple networks at the decision level. Instead of enforcing feature-level consistency, SEFGNN treats each biological network as an independent evidence source and performs uncertainty-aware fusion at the decision level using Dempster-Shafer Theory (DST). To alleviate the risk of overconfidence from DST, we further introduce a Soft Evidence Smoothing (SES) module that improves ranking stability while preserving discriminative performance. Experiments on three cancer datasets show that SEFGNN consistently outperforms state-of-the-art baselines and exhibits strong potential in discovering novel CDGs.
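SEFGNN在决策层使用的Dempster-Shafer合成规则是标准公式;下面给出一个两证据源、二分类(驱动基因/非驱动基因)情形的示意实现(NumPy,数值为虚构),展示不确定性感知融合的基本形态:

```python
import numpy as np

def dempster_combine(m1: dict, m2: dict) -> dict:
    """Dempster 合成规则:对两个证据源的基本概率分配(BPA)做归一化正交和。
    焦元用 frozenset 表示;全集对应该证据源的"不确定"质量。"""
    combined, conflict = {}, 0.0
    for a, w1 in m1.items():
        for b, w2 in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2          # 完全冲突的证据质量
    assert conflict < 1.0, "证据完全冲突,无法合成"
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

D, N = frozenset({"driver"}), frozenset({"non_driver"})
U = D | N  # 全集:表示该证据源自身的不确定性
m_ppi     = {D: 0.6, N: 0.1, U: 0.3}   # PPI 网络给出的证据(虚构数值)
m_pathway = {D: 0.5, N: 0.2, U: 0.3}   # 另一生物网络给出的证据
print(dempster_combine(m_ppi, m_pathway))
```

保留在全集上的质量即融合后的剩余不确定性;摘要中的SES模块可理解为对这类合成结果做平滑以避免过度自信。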
Transformer(9篇)
【1】HTMformer: Hybrid Time and Multivariate Transformer for Time Series Forecasting
标题:HTMformer:用于时间序列预测的混合时间和多元Transformer
链接:https://arxiv.org/abs/2510.07084
作者:Tan Wang, Yun Wei Dong, Tao Zhang, Qi Wang
摘要:基于Transformer的方法在时间序列预测中取得了令人印象深刻的结果。然而,现有的Transformer在序列建模中仍然表现出局限性,因为它们往往过分强调时间依赖性。这会导致额外的计算开销,而不会产生相应的性能增益。我们发现,Transformer的性能高度依赖于用于学习有效表示的嵌入方法。为了解决这个问题,我们提取多变量特征来增强嵌入层中捕获的有效信息,从而产生多维嵌入,传达更丰富、更有意义的序列表示。这些表示使基于Transformer的预测器能够更好地理解序列。具体来说,我们引入混合时间和多变量嵌入(HTME)。HTME提取器将一个轻量级的时间特征提取模块与精心设计的多变量特征提取模块相结合,以提供互补的特征,从而在模型复杂性和性能之间取得平衡。通过将HTME与Transformer架构相结合,我们提出了HTMformer,利用HTME提取器增强的特征提取能力来构建轻量级预测器。在8个真实数据集上进行的实验表明,我们的方法在准确性和效率方面都优于现有的基线。
摘要:Transformer-based methods have achieved impressive results in time series forecasting. However, existing Transformers still exhibit limitations in sequence modeling as they tend to overemphasize temporal dependencies. This incurs additional computational overhead without yielding corresponding performance gains. We find that the performance of Transformers is highly dependent on the embedding method used to learn effective representations. To address this issue, we extract multivariate features to augment the effective information captured in the embedding layer, yielding multidimensional embeddings that convey richer and more meaningful sequence representations. These representations enable Transformer-based forecasters to better understand the series. Specifically, we introduce Hybrid Temporal and Multivariate Embeddings (HTME). The HTME extractor integrates a lightweight temporal feature extraction module with a carefully designed multivariate feature extraction module to provide complementary features, thereby achieving a balance between model complexity and performance. By combining HTME with the Transformer architecture, we present HTMformer, leveraging the enhanced feature extraction capability of the HTME extractor to build a lightweight forecaster. Experiments conducted on eight real-world datasets demonstrate that our approach outperforms existing baselines in both accuracy and efficiency.
【2】From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics
标题:从凝聚到秩崩溃:Transformer训练动力学的两阶段分析
链接:https://arxiv.org/abs/2510.06954
作者:Zheng-An Chen, Tao Luo
摘要:虽然基于Transformer的模型表现出卓越的经验性能,但除了特定配置的研究之外,支配其训练动力学的基本原则尚未得到充分刻画。受语言模型在小初始化规模下推理能力提高的经验证据的启发,我们采用[Zhou et al. NeurIPS 2022]中建立的梯度流分析框架,系统地研究线性化Transformer的训练动力学。我们的理论分析将注意力模块的动力学过程分为两个不同的阶段。在第一阶段,来自随机初始化的非对称权重扰动维持参数矩阵中的非退化梯度动力学,促进系统从小初始化区域中逃逸。随后,这些矩阵经历凝聚,逐渐朝目标方向对齐。在第二阶段,先前静止的键-查询矩阵积极参与训练,驱动归一化矩阵走向渐近秩崩溃。这个两阶段框架推广了经典的方向收敛结果。
摘要:Although transformer-based models have shown exceptional empirical performance, the fundamental principles governing their training dynamics are inadequately characterized beyond configuration-specific studies. Inspired by empirical evidence showing improved reasoning capabilities under small initialization scales in language models, we employ the gradient flow analytical framework established in [Zhou et al. NeurIPS 2022] to systematically investigate linearized Transformer training dynamics. Our theoretical analysis dissects the dynamics of attention modules into two distinct stages. In the first stage, asymmetric weight perturbations from random initialization sustain non-degenerate gradient dynamics in parameter matrices, facilitating systematic escape from small initialization regimes. Subsequently, these matrices undergo condensation, progressively aligning toward the target orientation. In the second stage, the previously static key-query matrices actively participate in training, driving the normalized matrices toward asymptotic rank collapse. This two-stage framework generalizes classical directional convergence results.
【3】TimeFormer: Transformer with Attention Modulation Empowered by Temporal Characteristics for Time Series Forecasting
标题:TimeFormer:由时间特征驱动的具有注意力调制的Transformer,用于时间序列预测
链接:https://arxiv.org/abs/2510.06680
作者:Zhipeng Liu, Peibo Duan, Xuan Tang, Baixin Li, Yongsheng Huang, Mingyang Geng, Changsheng Zhang, Bin Zhang, Binwu Wang
摘要:虽然Transformer在自然语言处理方面表现出色,但由于没有充分考虑文本和时间模态之间的差异,将其扩展到时间序列预测仍然具有挑战性。在本文中,我们开发了一种专为时间序列数据设计的新型Transformer架构,旨在最大限度地提高其表示能力。我们确定了时间序列的两个关键但经常被忽视的特征:(1)从过去到未来的单向影响,以及(2)影响随时间衰减的现象。我们引入这些特征来增强Transformer的注意力机制。我们提出了TimeFormer,其核心创新是一种带有两个调制项的自注意力机制(MoSA),旨在Hawkes过程和因果掩码的约束下捕捉时间序列的这些时间先验。此外,TimeFormer引入了一个基于多尺度和子序列分析的框架来捕获不同时间尺度上的语义依赖,丰富了时间依赖关系。在多个真实数据集上进行的大量实验表明,TimeFormer显著优于最先进的方法,与最佳基线相比,MSE降低高达7.45%,并在94.04%的评估指标上创造了新的基准。此外,我们证明了MoSA机制可以广泛应用于提升其他基于Transformer的模型的性能。
摘要:Although Transformers excel in natural language processing, their extension to time series forecasting remains challenging due to insufficient consideration of the differences between textual and temporal modalities. In this paper, we develop a novel Transformer architecture designed for time series data, aiming to maximize its representational capacity. We identify two key but often overlooked characteristics of time series: (1) unidirectional influence from the past to the future, and (2) the phenomenon of decaying influence over time. These characteristics are introduced to enhance the attention mechanism of Transformers. We propose TimeFormer, whose core innovation is a self-attention mechanism with two modulation terms (MoSA), designed to capture these temporal priors of time series under the constraints of the Hawkes process and causal masking. Additionally, TimeFormer introduces a framework based on multi-scale and subsequence analysis to capture semantic dependencies at different temporal scales, enriching the temporal dependencies. Extensive experiments conducted on multiple real-world datasets show that TimeFormer significantly outperforms state-of-the-art methods, achieving up to a 7.45% reduction in MSE compared to the best baseline and setting new benchmarks on 94.04% of evaluation metrics. Moreover, we demonstrate that the MoSA mechanism can be broadly applied to enhance the performance of other Transformer-based models.
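摘要中"在Hawkes过程与因果掩码约束下,用调制项刻画单向影响与随时间衰减的影响"的思路,可以用如下示意代码表达(PyTorch;指数衰减偏置与参数lam是本示例的简化假设,并非论文原始的MoSA定义):

```python
import torch

def decay_causal_attention(q, k, v, timestamps, lam=0.1):
    """带时间衰减调制与因果掩码的自注意力(简化示意)。
    q, k, v: [T, d];timestamps: [T],单调递增的时间戳;lam: 衰减率。"""
    T, d = q.shape
    scores = q @ k.T / d**0.5                        # 标准缩放点积
    dt = timestamps[:, None] - timestamps[None, :]   # t_i - t_j
    scores = scores - lam * dt.clamp(min=0)          # 间隔越久影响越弱(类 Hawkes 衰减)
    causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))  # 只允许关注过去(单向影响)
    return torch.softmax(scores, dim=-1) @ v

T, d = 8, 16
q, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
ts = torch.arange(T, dtype=torch.float32)
print(decay_causal_attention(q, k, v, ts).shape)  # torch.Size([8, 16])
```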
【4】The Effect of Attention Head Count on Transformer Approximation
标题:注意力头数量对Transformer逼近能力的影响
链接:https://arxiv.org/abs/2510.06662
作者:Penghao Yu, Haotian Jiang, Zeyu Bao, Ruoxi Yu, Qianxiao Li
摘要:Transformer已经成为序列建模的主流架构,但是对其结构参数如何影响表达能力的详细理解仍然有限。在这项工作中,我们研究了Transformer的逼近性质,特别强调注意力头数量的作用。我们的分析首先引入一个广义的$D$-检索任务,并证明它在连续函数空间中是稠密的,从而为我们的理论框架提供了基础。然后,我们建立了$\epsilon$-逼近所需的参数复杂度的上界和下界。具体地说,我们表明,具有足够多头的Transformer允许高效的逼近,而当头数太少时,对于某个常数$c$和序列长度$T$,参数的数量必须至少按$O(1/\epsilon^{cT})$增长。据我们所知,这构成了在非线性且与实际相关的设定下此类结果的第一个严格下界。我们进一步研究了单头的情况,并证明$O(T)$量级的嵌入维度允许对输入的完全记忆,此时逼近完全由前馈块实现。最后,我们通过对合成数据和现实任务的实验验证了我们的理论发现,说明了我们结果的实际相关性。
摘要:Transformer has become the dominant architecture for sequence modeling, yet a detailed understanding of how its structural parameters influence expressive power remains limited. In this work, we study the approximation properties of transformers, with particular emphasis on the role of the number of attention heads. Our analysis begins with the introduction of a generalized $D$-retrieval task, which we prove to be dense in the space of continuous functions, thereby providing the basis for our theoretical framework. We then establish both upper and lower bounds on the parameter complexity required for $\epsilon$-approximation. Specifically, we show that transformers with sufficiently many heads admit efficient approximation, whereas with too few heads, the number of parameters must scale at least as $O(1/\epsilon^{cT})$, for some constant $c$ and sequence length $T$. To the best of our knowledge, this constitutes the first rigorous lower bound of this type in a nonlinear and practically relevant setting. We further examine the single-head case and demonstrate that an embedding dimension of order $O(T)$ allows complete memorization of the input, where approximation is entirely achieved by the feed-forward block. Finally, we validate our theoretical findings with experiments on both synthetic data and real-world tasks, illustrating the practical relevance of our results.
【5】A Comparative Analysis of Contextual Representation Flow in State-Space and Transformer Architectures
标题:状态空间和Transformer结构中上下文表示流的比较分析
链接:https://arxiv.org/abs/2510.06640
作者:Nhat M. Hoang, Do Xuan Long, Cong-Duy Nguyen, Min-Yen Kan, Luu Anh Tuan
摘要:状态空间模型(SSM)最近已经成为基于Transformer的模型(TBM)在长序列处理上的高效替代方案,提供线性缩放和更低的内存使用。然而,上下文信息如何在这些架构中跨层和跨令牌流动仍然缺乏研究。我们提出了第一个统一的、令牌级和层级的SSM与TBM表示传播分析。使用中心核对齐(CKA)、稳定性指标和探测,我们刻画了表示如何在层内和层间演变。我们发现了一个关键的分歧:TBM迅速使令牌表示同质化,多样性只在后面的层中重新出现;而SSM在早期保留令牌的独特性,但在更深层收敛到同质化。理论分析和参数随机化进一步表明,TBM中的过度平滑源于架构设计,而在SSM中,它主要来自训练动态。这些见解澄清了这两种架构的归纳偏置,并为未来面向长上下文推理的模型和训练设计提供参考。
摘要:State Space Models (SSMs) have recently emerged as efficient alternatives to Transformer-Based Models (TBMs) for long-sequence processing, offering linear scaling and lower memory use. Yet, how contextual information flows across layers and tokens in these architectures remains understudied. We present the first unified, token- and layer-level analysis of representation propagation in SSMs and TBMs. Using centered kernel alignment, stability metrics, and probing, we characterize how representations evolve within and across layers. We find a key divergence: TBMs rapidly homogenize token representations, with diversity reemerging only in later layers, while SSMs preserve token uniqueness early but converge to homogenization deeper. Theoretical analysis and parameter randomization further reveal that oversmoothing in TBMs stems from architectural design, whereas in SSMs it arises mainly from training dynamics. These insights clarify the inductive biases of both architectures and inform future model and training designs for long-context reasoning.
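摘要中用于逐层比较表示的中心核对齐(CKA)是标准度量;下面给出线性CKA的最小实现(NumPy),可用于复现"令牌表示同质化"一类的层间分析:

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """线性 CKA:比较同一批令牌在两层(或两个模型)中的表示相似度。
    X: [n, d1], Y: [n, d2],行对应同一组样本/令牌。"""
    X = X - X.mean(0, keepdims=True)   # 中心化
    Y = Y - Y.mean(0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    denom = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(hsic / denom)

rng = np.random.default_rng(0)
layer_a = rng.normal(size=(128, 64))
layer_b = layer_a @ rng.normal(size=(64, 32))        # 与 layer_a 线性相关的"下一层"
print(linear_cka(layer_a, layer_b))                  # 接近 1
print(linear_cka(layer_a, rng.normal(size=(128, 32))))  # 接近 0
```

对相邻层的表示两两计算CKA,即可绘制摘要所述的跨层相似度曲线。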
【6】Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data
标题:关系Transformer:走向关系数据的零样本基础模型
链接:https://arxiv.org/abs/2510.06377
作者:Rishabh Ranjan, Valter Hudovernik, Mark Znidar, Charilaos Kanatsoulis, Roshan Upendra, Mahmoud Mohammadi, Joe Meyer, Tom Palczewski, Carlos Guestrin, Jure Leskovec
备注:preprint; under review
摘要:经过预训练的Transformer通过零样本提示很容易适应新的序列建模任务,但关系域仍然缺乏能跨数据集和任务迁移的架构。核心挑战是关系数据的多样性,具有各不相同的异构模式、图结构和函数依赖关系。在本文中,我们提出了关系Transformer(RT)架构,它可以在多样的关系数据库上进行预训练,并直接应用于未见过的数据集和任务,而无需针对特定任务或数据集进行微调,也无需检索上下文示例。RT(i)使用表/列元数据对单元格进行标记化,(ii)通过掩码标记预测进行预训练,(iii)在列、行和主键-外键链接上使用新颖的关系注意力(Relational Attention)机制。在涵盖客户流失和销售预测等任务的RelBench数据集上进行预训练后,RT获得了强大的零样本性能:在二分类任务上,22M参数模型的单次前向传递平均达到完全监督AUROC的94%,而27B的LLM仅为84%。微调以很高的样本效率产生最先进的结果。我们的实验表明,RT的零样本迁移利用了任务-表上下文、关系注意力模式和模式语义。总的来说,RT提供了一条通向关系数据基础模型的实用途径。
摘要:Pretrained transformers readily adapt to new sequence modeling tasks via zero-shot prompting, but relational domains still lack architectures that transfer across datasets and tasks. The core challenge is the diversity of relational data, with varying heterogeneous schemas, graph structures and functional dependencies. In this paper, we present the Relational Transformer (RT) architecture, which can be pretrained on diverse relational databases and directly applied to unseen datasets and tasks without task- or dataset-specific fine-tuning, or retrieval of in-context examples. RT (i) tokenizes cells with table/column metadata, (ii) is pretrained via masked token prediction, and (iii) utilizes a novel Relational Attention mechanism over columns, rows, and primary-foreign key links. Pretrained on RelBench datasets spanning tasks such as churn and sales forecasting, RT attains strong zero-shot performance, averaging 94% of fully supervised AUROC on binary classification tasks with a single forward pass of a 22M parameter model, as opposed to 84% for a 27B LLM. Fine-tuning yields state-of-the-art results with high sample efficiency. Our experiments show that RT's zero-shot transfer harnesses task-table context, relational attention patterns and schema semantics. Overall, RT provides a practical path toward foundation models for relational data.
【7】Traj-Transformer: Diffusion Models with Transformer for GPS Trajectory Generation
标题:Traj-Transformer:用于GPS轨迹生成的带有Transformer的扩散模型
链接:https://arxiv.org/abs/2510.06291
作者:Zhiyang Zhang, Ningcong Chen, Xin Zhang, Yanhua Li, Shen Su, Hui Lu, Jun Luo
摘要:GPS设备的广泛使用推动了时空数据挖掘的进步,使机器学习模型能够模拟人类决策并生成逼真的轨迹,同时解决数据收集成本和隐私问题。最近的研究表明了扩散模型在高质量轨迹生成方面的前景。然而,大多数现有方法依赖于基于卷积的架构(例如UNet)来预测扩散过程中的噪声,由于模型容量有限,这通常会导致显著的偏差以及细粒度街道级细节的丢失。在本文中,我们提出了Trajectory Transformer,这是一种新颖的模型,采用Transformer骨干网络同时完成条件信息嵌入和噪声预测。我们探讨了两种GPS坐标嵌入策略——位置嵌入和经纬度嵌入,并分析了模型在不同尺度下的性能。在两个真实世界数据集上的实验表明,Trajectory Transformer显著提升了生成质量,并有效缓解了先前方法中观察到的偏差问题。
摘要:The widespread use of GPS devices has driven advances in spatiotemporal data mining, enabling machine learning models to simulate human decision making and generate realistic trajectories, addressing both data collection costs and privacy concerns. Recent studies have shown the promise of diffusion models for high-quality trajectory generation. However, most existing methods rely on convolution based architectures (e.g. UNet) to predict noise during the diffusion process, which often results in notable deviations and the loss of fine-grained street-level details due to limited model capacity. In this paper, we propose Trajectory Transformer, a novel model that employs a transformer backbone for both conditional information embedding and noise prediction. We explore two GPS coordinate embedding strategies, location embedding and longitude-latitude embedding, and analyze model performance at different scales. Experiments on two real-world datasets demonstrate that Trajectory Transformer significantly enhances generation quality and effectively alleviates the deviation issues observed in prior approaches.
【8】Vision Transformer for Transient Noise Classification
标题:用于瞬态噪声分类的视觉Transformer
链接:https://arxiv.org/abs/2510.06273
作者:Divyansh Srivastava, Andrzej Niedzielski
备注:9 pages, 4 figures
摘要:LIGO数据中的瞬态噪声(毛刺)阻碍了引力波(GW)的探测。Gravity Spy项目将这些噪声事件分为不同的类别。随着O3运行的开展,新增了两个噪声类别,因此需要训练新模型以进行有效分类。我们的目标是使用Vision Transformer(ViT)模型将LIGO数据中的毛刺分类为第一次运行的22个现有类别以及来自O3a的2个额外噪声类别。我们在一个组合数据集上训练预训练的Vision Transformer(ViT-B/32)模型,该数据集由Gravity Spy数据集和LIGO O3a运行中的另外两个类别组成。我们实现了92.26%的分类效率,展示了Vision Transformer通过有效区分瞬态噪声来提高引力波探测精度的潜力。 关键词:引力波--Vision Transformer--机器学习
摘要:Transient noise (glitches) in LIGO data hinders the detection of gravitational waves (GW). The Gravity Spy project has categorized these noise events into various classes. With the O3 run, there is the inclusion of two additional noise classes and thus a need to train new models for effective classification. We aim to classify glitches in LIGO data into 22 existing classes from the first run plus 2 additional noise classes from O3a using the Vision Transformer (ViT) model. We train a pre-trained Vision Transformer (ViT-B/32) model on a combined dataset consisting of the Gravity Spy dataset with the additional two classes from the LIGO O3a run. We achieve a classification efficiency of 92.26%, demonstrating the potential of Vision Transformer to improve the accuracy of gravitational wave detection by effectively distinguishing transient noise. Key words: gravitational waves --vision transformer --machine learning
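用torchvision微调预训练ViT-B/32做24类(22个原有类+2个O3a新类)毛刺分类,大致相当于替换分类头后常规训练;下面是一个示意(数据加载以随机张量占位,真实流程应换成Gravity Spy频谱图的DataLoader):

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_32, ViT_B_32_Weights

# 加载 ImageNet 预训练的 ViT-B/32,并把分类头换成 24 类
model = vit_b_32(weights=ViT_B_32_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, 24)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# 玩具批次:占位输入,形状与 ViT 期望的 224x224 RGB 一致
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 24, (4,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(loss.item())
```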
【9】Latent Representation Learning in Heavy-Ion Collisions with MaskPoint Transformer
标题:使用MaskPoint Transformer进行重离子碰撞中的潜在表示学习
链接:https://arxiv.org/abs/2510.06691
作者:Jing-Zong Zhang, Shuang Guo, Li-Lin Zhu, Lingxiao Wang, Guo-Liang Ma
备注:10 pages, 5 figures, accepted at the NeurIPS 2025 workshop "Machine Learning and the Physical Sciences"
摘要:高能核物理中的一个核心挑战是从重离子碰撞(HIC)的高维末态数据中提取有信息量的特征,以支持可靠的下游分析。传统方法通常依赖于选定的可观测量,这可能会错过数据中微妙但与物理相关的结构。为了解决这个问题,我们引入了一个基于Transformer的自动编码器,采用两阶段范式进行训练:自监督预训练,然后进行监督微调。预训练的编码器直接从未标记的HIC数据中学习潜在表示,提供了一个紧凑且信息丰富的特征空间,可以适应不同的物理任务。作为案例研究,我们应用该方法来区分大碰撞系统和小碰撞系统,它实现了比PointNet显著更高的分类精度。主成分分析和SHAP解释进一步表明,自动编码器捕获了超出单个可观测量的复杂非线性相关性,产生了具有强判别力和解释力的特征。这些结果将我们的两阶段框架确立为HIC特征学习的通用且稳健的基础,为更深入地分析夸克-胶子等离子体特性和其他涌现现象打开了大门。该实现可在https://github.com/Giovanni-Sforza/MaskPoint-AMPT上公开获得。
摘要:A central challenge in high-energy nuclear physics is to extract informative features from the high-dimensional final-state data of heavy-ion collisions (HIC) in order to enable reliable downstream analyses. Traditional approaches often rely on selected observables, which may miss subtle but physically relevant structures in the data. To address this, we introduce a Transformer-based autoencoder trained with a two-stage paradigm: self-supervised pre-training followed by supervised fine-tuning. The pretrained encoder learns latent representations directly from unlabeled HIC data, providing a compact and information-rich feature space that can be adapted to diverse physics tasks. As a case study, we apply the method to distinguish between large and small collision systems, where it achieves significantly higher classification accuracy than PointNet. Principal component analysis and SHAP interpretation further demonstrate that the autoencoder captures complex nonlinear correlations beyond individual observables, yielding features with strong discriminative and explanatory power. These results establish our two-stage framework as a general and robust foundation for feature learning in HIC, opening the door to more powerful analyses of quark-gluon plasma properties and other emergent phenomena. The implementation is publicly available at https://github.com/Giovanni-Sforza/MaskPoint-AMPT.
GAN|对抗|攻击|生成相关(7篇)
【1】GNN-enhanced Traffic Anomaly Detection for Next-Generation SDN-Enabled Consumer Electronics
标题:针对下一代支持SDN的消费电子产品的GNN增强型流量异常检测
链接:https://arxiv.org/abs/2510.07109
作者:Guan-Yan Yang, Farn Wang, Kuo-Hui Yeh
备注:This paper has been accepted for publication in IEEE Transactions on Consumer Electronics. 10 pages, 6 figures
摘要:连接到物联网的消费电子产品(CE)容易受到各种攻击,包括DDoS和基于Web的威胁,这可能会损害其功能并助长远程劫持。这些漏洞允许攻击者利用CE发起更大范围的系统攻击,同时使恶意代码在CE网络中传播,导致设备故障。现有的基于深度学习的流量异常检测系统在传统网络环境中表现出很高的准确性,但往往过于复杂且依赖静态基础设施,需要手动配置和管理。为了解决这些限制,我们提出了一个可扩展的网络模型,为下一代CE网络集成了软件定义网络(SDN)和计算优先网络(CFN)。在这个网络模型中,我们提出了一个基于图神经网络的网络异常检测框架(GNN-NAD),它与基于SDN的CE网络集成,并支持CFN架构。GNN-NAD独特地融合了静态的、漏洞感知的攻击图与动态流量特征,提供了网络安全的整体视图。该框架的核心是一个用于图表示学习的GNN模型(GSAGE),后接随机森林(RF)分类器。与现有的特征选择方法相比,这种设计(GSAGE+RF)表现出更优的性能。在CE环境下的实验评估表明,即使在小样本量的情况下,GNN-NAD在准确率、召回率、精确率和F1得分方面都取得了优异的指标,超过了当前网络异常检测方法的性能。这项工作推进了下一代智能CE网络的安全性和效率。
摘要:Consumer electronics (CE) connected to the Internet of Things are susceptible to various attacks, including DDoS and web-based threats, which can compromise their functionality and facilitate remote hijacking. These vulnerabilities allow attackers to exploit CE for broader system attacks while enabling the propagation of malicious code across the CE network, resulting in device failures. Existing deep learning-based traffic anomaly detection systems exhibit high accuracy in traditional network environments but are often overly complex and reliant on static infrastructure, necessitating manual configuration and management. To address these limitations, we propose a scalable network model that integrates Software-defined Networking (SDN) and Compute First Networking (CFN) for next-generation CE networks. In this network model, we propose a Graph Neural Networks-based Network Anomaly Detection framework (GNN-NAD) that integrates SDN-based CE networks and enables the CFN architecture. GNN-NAD uniquely fuses a static, vulnerability-aware attack graph with dynamic traffic features, providing a holistic view of network security. The core of the framework is a GNN model (GSAGE) for graph representation learning, followed by a Random Forest (RF) classifier. This design (GSAGE+RF) demonstrates superior performance compared to existing feature selection methods. Experimental evaluations on CE environment reveal that GNN-NAD achieves superior metrics in accuracy, recall, precision, and F1 score, even with small sample sizes, exceeding the performance of current network anomaly detection methods. This work advances the security and efficiency of next-generation intelligent CE networks.
【2】Sharpness-Aware Data Generation for Zero-shot Quantization
标题:零样本量化的锐度感知数据生成
链接:https://arxiv.org/abs/2510.07018
作者:Dung Hoang-Anh, Cuong Pham Trung Le, Jianfei Cai, Thanh-Toan Do
摘要:零样本量化的目的是在无法访问原始真实训练数据的情况下,从预训练的全精度模型中学习量化模型。零样本量化方法的共同思想是生成用于量化全精度模型的合成数据。虽然众所周知,锐度较低的深度神经网络具有更好的泛化能力,但之前的零样本量化工作都没有将量化模型的锐度作为生成训练数据的标准。本文介绍了一种新方法,在合成数据生成中考虑量化模型的锐度,以提高泛化能力。具体来说,我们首先证明,在一定的假设下,锐度最小化可以通过最大化在合成数据和真实验证数据上计算的重建损失梯度之间的梯度匹配来实现。然后,我们用每个生成样本与其邻居之间的梯度匹配来近似上述匹配,从而绕过没有真实验证集的问题。在CIFAR-100和ImageNet数据集上的实验评估表明,所提出的方法在低比特量化设置中优于最先进的技术。
摘要:Zero-shot quantization aims to learn a quantized model from a pre-trained full-precision model with no access to original real training data. The common idea in zero-shot quantization approaches is to generate synthetic data for quantizing the full-precision model. While it is well-known that deep neural networks with low sharpness have better generalization ability, none of the previous zero-shot quantization works considers the sharpness of the quantized model as a criterion for generating training data. This paper introduces a novel methodology that takes into account quantized model sharpness in synthetic data generation to enhance generalization. Specifically, we first demonstrate that sharpness minimization can be attained by maximizing gradient matching between the reconstruction loss gradients computed on synthetic and real validation data, under certain assumptions. We then circumvent the problem of the gradient matching without real validation set by approximating it with the gradient matching between each generated sample and its neighbors. Experimental evaluations on CIFAR-100 and ImageNet datasets demonstrate the superiority of the proposed method over the state-of-the-art techniques in low-bit quantization settings.
【3】DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning
标题:DecompGAIL:通过分解式多智能体生成对抗模仿学习来学习真实交通行为
链接:https://arxiv.org/abs/2510.06913
作者:Ke Guo, Haochen Liu, Xiaojun Wu, Chen Lv
摘要:真实的交通仿真对于自动驾驶系统和城市交通规划的发展至关重要,但现有的模仿学习方法往往无法建模真实的交通行为。行为克隆受到协变量偏移的影响,而生成对抗模仿学习(GAIL)在多智能体环境中是出了名的不稳定。我们确定了这种不稳定性的一个关键来源:无关交互的误导,即判别器由于自车邻居之间不真实的交互而惩罚自车的真实行为。为了解决这个问题,我们提出了分解式多智能体GAIL(DecompGAIL),它明确地将真实性分解为自车-地图和自车-邻居分量,过滤掉误导性的邻居-邻居和邻居-地图交互。我们进一步引入了一个社会化PPO目标,用距离加权的邻域奖励来增广自车奖励,鼓励智能体整体的真实性。DecompGAIL集成到基于SMART的轻量级主干中,在WOMD Sim Agents 2025基准测试中实现了最先进的性能。
摘要:Realistic traffic simulation is critical for the development of autonomous driving systems and urban mobility planning, yet existing imitation learning approaches often fail to model realistic traffic behaviors. Behavior cloning suffers from covariate shift, while Generative Adversarial Imitation Learning (GAIL) is notoriously unstable in multi-agent settings. We identify a key source of this instability: irrelevant interaction misguidance, where a discriminator penalizes an ego vehicle's realistic behavior due to unrealistic interactions among its neighbors. To address this, we propose Decomposed Multi-agent GAIL (DecompGAIL), which explicitly decomposes realism into ego-map and ego-neighbor components, filtering out misleading neighbor-neighbor and neighbor-map interactions. We further introduce a social PPO objective that augments ego rewards with distance-weighted neighborhood rewards, encouraging overall realism across agents. Integrated into a lightweight SMART-based backbone, DecompGAIL achieves state-of-the-art performance on the WOMD Sim Agents 2025 benchmark.
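摘要中"用距离加权的邻域奖励增广自车奖励"可以写成一个简单加权和;下面的示意(NumPy)中高斯核权重形式是本示例的假设,论文未必采用同一权重函数:

```python
import numpy as np

def social_reward(ego_reward, neighbor_rewards, distances, sigma=10.0):
    """社会化奖励(示意):自车真实性奖励 + 距离加权的邻域奖励均值。
    越近的邻居贡献越大,权重取高斯核(具体形式为假设)。"""
    w = np.exp(-np.asarray(distances) ** 2 / (2 * sigma**2))
    return ego_reward + float(w @ np.asarray(neighbor_rewards)) / max(w.sum(), 1e-8)

# 自车奖励 0.8;三个邻居的真实性奖励与距离(米)
print(social_reward(0.8, [0.5, 0.9, 0.2], [5.0, 20.0, 50.0]))
```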
【4】A Diffusion Model for Regular Time Series Generation from Irregular Data with Completion and Masking
标题:一种结合补全与掩蔽、从不规则数据生成规则时间序列的扩散模型
链接:https://arxiv.org/abs/2510.06699
作者:Gal Fadlon, Idan Arbiv, Nimrod Berman, Omri Azencot
备注:Accepted to NeurIPS 2025; The first two authors contributed equally and are co-leading authors
摘要:生成真实的时间序列数据对于医疗保健、金融和科学领域的应用至关重要。然而,不规则采样和缺失值带来了重大挑战。虽然现有的方法解决了这些不规则性,但它们通常产生次优结果并产生高计算成本。常规时间序列生成的最新进展,如基于扩散的ImagenTime模型,通过将时间序列转换为图像表示,展示了强大、快速和可扩展的生成能力,使其成为一个有前途的解决方案。然而,使用简单的掩码将ImagenTime扩展到不规则序列会引入"不自然"的邻域,其中被零替换的缺失值会破坏学习过程。为了克服这一点,我们提出了一个新颖的两步框架:第一步,时间序列Transformer补全不规则序列,创建自然的邻域;第二步,带掩码的基于视觉的扩散模型最大限度地减少对补全值的依赖。这种方法结合了补全和掩码的优势,能够稳健且高效地生成真实的时间序列。我们的方法实现了最先进的性能,判别得分相对提升70%,计算成本相对降低85%。代码位于https://github.com/azencot-group/ImagenI2R。
摘要:Generating realistic time series data is critical for applications in healthcare, finance, and science. However, irregular sampling and missing values present significant challenges. While prior methods address these irregularities, they often yield suboptimal results and incur high computational costs. Recent advances in regular time series generation, such as the diffusion-based ImagenTime model, demonstrate strong, fast, and scalable generative capabilities by transforming time series into image representations, making them a promising solution. However, extending ImagenTime to irregular sequences using simple masking introduces "unnatural" neighborhoods, where missing values replaced by zeros disrupt the learning process. To overcome this, we propose a novel two-step framework: first, a Time Series Transformer completes irregular sequences, creating natural neighborhoods; second, a vision-based diffusion model with masking minimizes dependence on the completed values. This approach leverages the strengths of both completion and masking, enabling robust and efficient generation of realistic time series. Our method achieves state-of-the-art performance, achieving a relative improvement in discriminative score by $70\%$ and in computational cost by $85\%$. Code is at https://github.com/azencot-group/ImagenI2R.
【5】Geometry-Aware Backdoor Attacks: Leveraging Curvature in Hyperbolic Embeddings
标题:几何感知后门攻击:利用双曲嵌入中的曲率
链接:https://arxiv.org/abs/2510.06397
作者:Ali Baheri
摘要:非欧几里德基础模型越来越多地将表示放在弯曲空间中,如双曲几何。我们发现,这种几何形状创建了一个边界驱动的不对称后门触发器可以利用。在边界附近,小的输入变化对标准输入空间检测器来说似乎很微妙,但会在模型的表示空间中产生不成比例的大位移。我们的分析形式化了这种效应,也揭示了防御的局限性:通过沿半径向内拉点来采取行动的方法可以抑制这种触发,但只能牺牲在同一方向上有用的模型灵敏度。基于这些见解,我们提出了一个简单的几何自适应触发器,并在任务和架构中对其进行评估。从经验上讲,攻击成功率在接近边界时会增加,而传统检测器会减弱,这反映了理论趋势。总之,这些结果揭示了非欧几里德模型中特定于几何的漏洞,并为设计和理解防御的局限性提供了分析支持的指导。
摘要:Non-Euclidean foundation models increasingly place representations in curved spaces such as hyperbolic geometry. We show that this geometry creates a boundary-driven asymmetry that backdoor triggers can exploit. Near the boundary, small input changes appear subtle to standard input-space detectors but produce disproportionately large shifts in the model's representation space. Our analysis formalizes this effect and also reveals a limitation for defenses: methods that act by pulling points inward along the radius can suppress such triggers, but only by sacrificing useful model sensitivity in that same direction. Building on these insights, we propose a simple geometry-adaptive trigger and evaluate it across tasks and architectures. Empirically, attack success increases toward the boundary, whereas conventional detectors weaken, mirroring the theoretical trends. Together, these results surface a geometry-specific vulnerability in non-Euclidean models and offer analysis-backed guidance for designing and understanding the limits of defenses.
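摘要所述"边界附近小扰动被放大"的现象可以用庞加莱球上的双曲距离直接验证;下面的示意(NumPy)比较同样大小的欧氏扰动在原点附近与边界附近引起的双曲位移:

```python
import numpy as np

def poincare_dist(x, y):
    """庞加莱球模型中的双曲距离(曲率 -1)。"""
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x**2)) * (1 - np.sum(y**2))
    return np.arccosh(1 + 2 * sq / denom)

eps = np.array([1e-3, 0.0])           # 同样大小的输入扰动
center = np.array([0.1, 0.0])         # 靠近原点的嵌入
boundary = np.array([0.995, 0.0])     # 靠近边界的嵌入

print(poincare_dist(center, center + eps))      # 位移很小(约 0.002)
print(poincare_dist(boundary, boundary + eps))  # 位移被放大约两个数量级
```

这也直观解释了摘要中提到的防御困境:把点沿半径向内拉可以抑制这种放大,但同时也削弱了模型在该方向上有用的灵敏度。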
【6】SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
标题:SDAR:可扩展序列生成的协同扩散-自回归范式
链接:https://arxiv.org/abs/2510.06303
作者:Shuang Cheng, Yihan Bian, Dawei Liu, Yuhua Jiang, Yihao Liu, Linfeng Zhang, Wenhai Wang, Qipeng Guo, Kai Chen, Biqing Qi, Bowen Zhou
备注:Technical report. 39 pages, including 14 pages of appendix
摘要:我们提出了SDAR,一个协同扩散-自回归范式,它将自回归模型的训练效率与扩散的并行推理能力统一起来。SDAR不是昂贵的端到端扩散训练,而是执行轻量级的范式转换,通过简短的数据高效适应将训练有素的自回归(AR)模型转换为块扩散模型。在推理过程中,SDAR生成跨块的自回归序列,以实现全局一致性,同时通过离散扩散过程并行解码每个块内的所有令牌。大量的实验表明,AR模型仍然比掩蔽扩散模型的计算效率高得多,为适应提供了坚实的基础。基于这一见解,SDAR以最小的成本实现了高效的AR到扩散转换,在保持AR级性能的同时实现了并行生成。在密集和混合专家架构上的缩放研究证实了SDAR的伸缩性:更大的模型对块大小和解码阈值表现出更强的鲁棒性,在不损失准确性的情况下获得更大的加速比。除了效率,SDAR还展示了增强的推理和领域适应性。我们的30 B MoE模型在具有挑战性的科学推理基准(如GPQA和ChemBench)上超越了AR模型,并在测试时间缩放方法(如多数投票和pass@k)下获得了进一步的改进。总之,这些结果建立SDAR作为一个实用的范例,结合了自回归和扩散的优势,可扩展的,高吞吐量的推理。
摘要:We propose SDAR, a Synergistic Diffusion-Autoregression paradigm that unifies the training efficiency of autoregressive models with the parallel inference capability of diffusion. Instead of costly end-to-end diffusion training, SDAR performs a lightweight paradigm conversion that transforms a well-trained autoregressive (AR) model into a blockwise diffusion model through brief, data-efficient adaptation. During inference, SDAR generates sequences autoregressively across blocks for global coherence while decoding all tokens within each block in parallel via a discrete diffusion process. Extensive experiments show that AR models remain substantially more compute-efficient than masked diffusion models, providing a strong foundation for adaptation. Building on this insight, SDAR achieves efficient AR-to-diffusion conversion with minimal cost, preserving AR-level performance while enabling parallel generation. Scaling studies across dense and Mixture-of-Experts architectures confirm that SDAR scales without compromise: larger models exhibit stronger robustness to block size and decoding thresholds, yielding greater speedups without accuracy loss. Beyond efficiency, SDAR demonstrates enhanced reasoning and domain adaptability. Our 30B MoE model surpasses its AR counterpart on challenging scientific reasoning benchmarks such as GPQA and ChemBench, and gains further improvements under test-time scaling methods like majority voting and pass@k. Together, these results establish SDAR as a practical paradigm that combines the strengths of autoregression and diffusion for scalable, high-throughput reasoning.
【7】RareGraph-Synth: Knowledge-Guided Diffusion Models for Generating Privacy-Preserving Synthetic Patient Trajectories in Ultra-Rare Diseases
标题:RareGraph-Synth:知识引导的扩散模型,用于生成超罕见疾病中保护隐私的合成患者轨迹
链接:https://arxiv.org/abs/2510.06267
作者:Khartik Uppalapati, Shakeel Abdulkareem, Bora Yimenicioglu
备注:6 pages, 2 figures, 2 tables. Submitted to IEEE International Conference on Data Science and Advanced Analytics (DSAA)
摘要:我们提出了RareGraph-Synth,这是一个知识引导的连续时间扩散框架,可以为超罕见疾病生成真实且保护隐私的合成电子健康记录(EHR)轨迹。RareGraph-Synth将五个公共资源:Orphanet/Orphadata、人类表型本体(HPO)、GARD罕见疾病知识图谱、PrimeKG和FDA不良事件报告系统(FAERS)整合到一个包含约800万条类型化边的异构知识图谱中。从这一知识图谱中提取的元路径分数调制前向随机微分方程中的每令牌噪声时间表,将生成导向生物学上合理的实验室检查-药物-不良事件共现,同时保持基于分数的扩散模型的稳定性。然后,反向去噪器产生带时间戳的实验室代码、药物代码和不良事件标志三元组序列,其中不包含受保护的健康信息。在模拟的超罕见疾病队列中,RareGraph-Synth相对于非引导扩散基线将分类最大平均差异(MMD)降低了40%,与GAN同类方法相比降低了60%以上,且不牺牲下游预测效用。使用DOMIAS攻击者进行的黑盒成员推断评估产生的AUROC约为0.53,远低于0.55的安全发布阈值,并且大大优于非KG基线观察到的约0.61±0.03,表明对重识别的强烈抵抗力。这些结果表明,将生物医学知识图谱直接集成到扩散噪声时间表中可以同时提高保真度和隐私性,从而为罕见疾病研究提供更安全的数据共享。
摘要:We propose RareGraph-Synth, a knowledge-guided, continuous-time diffusion framework that generates realistic yet privacy-preserving synthetic electronic-health-record (EHR) trajectories for ultra-rare diseases. RareGraph-Synth unifies five public resources: Orphanet/Orphadata, the Human Phenotype Ontology (HPO), the GARD rare-disease KG, PrimeKG, and the FDA Adverse Event Reporting System (FAERS) into a heterogeneous knowledge graph comprising approximately 8 M typed edges. Meta-path scores extracted from this 8-million-edge KG modulate the per-token noise schedule in the forward stochastic differential equation, steering generation toward biologically plausible lab-medication-adverse-event co-occurrences while retaining score-based diffusion model stability. The reverse denoiser then produces timestamped sequences of lab-code, medication-code, and adverse-event-flag triples that contain no protected health information. On simulated ultra-rare-disease cohorts, RareGraph-Synth lowers categorical Maximum Mean Discrepancy by 40 percent relative to an unguided diffusion baseline and by greater than 60 percent versus GAN counterparts, without sacrificing downstream predictive utility. A black-box membership-inference evaluation using the DOMIAS attacker yields AUROC approximately 0.53, well below the 0.55 safe-release threshold and substantially better than the approximately 0.61 plus or minus 0.03 observed for non-KG baselines, demonstrating strong resistance to re-identification. These results suggest that integrating biomedical knowledge graphs directly into diffusion noise schedules can simultaneously enhance fidelity and privacy, enabling safer data sharing for rare-disease research.
半/弱/无/有监督|不确定性|主动学习(7篇)
【1】Bridged Clustering for Representation Learning: Semi-Supervised Sparse Bridging
标题:用于表示学习的桥接聚类:半监督稀疏桥接
链接:https://arxiv.org/abs/2510.07182
作者:Patrick Peixuan Ye, Chen Shani, Ellen Vitercik
摘要:我们介绍了桥接聚类,这是一个半监督框架,可以从任何不成对的输入$X$和输出$Y$数据集中学习预测器。我们的方法首先独立地对$X$和$Y$进行聚类,然后只使用少量配对样本来学习聚类之间稀疏且可解释的桥接。在推断时,新的输入$x$被分配给离它最近的输入聚类,并返回与之链接的输出聚类的质心作为预测$\hat{y}$。与传统的SSL不同,桥接聚类显式地利用仅有输出的数据;与稠密的基于传输的方法不同,它保持稀疏且可解释的对齐。通过理论分析,我们表明,在错误聚类率和错误桥接率有界的条件下,我们的算法是一个有效且高效的预测器。经验上,我们的方法与SOTA方法具有竞争力,同时在低监督设置中保持简单、模型无关和高度标签高效。
摘要:We introduce Bridged Clustering, a semi-supervised framework to learn predictors from any unpaired input $X$ and output $Y$ dataset. Our method first clusters $X$ and $Y$ independently, then learns a sparse, interpretable bridge between clusters using only a few paired examples. At inference, a new input $x$ is assigned to its nearest input cluster, and the centroid of the linked output cluster is returned as the prediction $\hat{y}$. Unlike traditional SSL, Bridged Clustering explicitly leverages output-only data, and unlike dense transport-based methods, it maintains a sparse and interpretable alignment. Through theoretical analysis, we show that with bounded mis-clustering and mis-bridging rates, our algorithm becomes an effective and efficient predictor. Empirically, our method is competitive with SOTA methods while remaining simple, model-agnostic, and highly label-efficient in low-supervision settings.
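下面用sklearn给出桥接聚类的一个最小示意(合成数据;聚类数、配对样本与桥的学习细节均为简化假设),演示"独立聚类 + 少量配对学习桥 + 返回链接簇质心"的完整流程:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
K = 3
# 不成对的输入/输出数据:由同一潜在簇生成,但观测时互不配对
X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in [(0, 0), (4, 0), (0, 4)]])
Y = np.vstack([rng.normal(c, 0.3, size=(100, 1)) for c in [(10,), (20,), (30,)]])

kx = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
ky = KMeans(n_clusters=K, n_init=10, random_state=0).fit(Y)

# 少量配对样本学习稀疏桥:输入簇 -> 输出簇
pairs_x = np.array([[0.1, 0.0], [4.1, 0.2], [0.0, 3.9]])
pairs_y = np.array([[10.2], [19.8], [30.1]])
bridge = {}
for cx, cy in zip(kx.predict(pairs_x), ky.predict(pairs_y)):
    bridge[cx] = cy   # 本例每簇只有一个配对样本;更一般的设置下可做多数表决

def predict(x_new):
    cx = kx.predict(x_new.reshape(1, -1))[0]
    return ky.cluster_centers_[bridge[cx]]   # 返回链接输出簇的质心

print(predict(np.array([3.8, -0.1])))   # 应接近 20
```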
【2】Unsupervised Backdoor Detection and Mitigation for Spiking Neural Networks
标题:尖峰神经网络的无监督后门检测和缓解
链接:https://arxiv.org/abs/2510.06629
作者:Jiachen Li, Bang Wu, Xiaoyu Xia, Xiaoning Liu, Xun Yi, Xiuzhen Zhang
备注:To appear in The 28th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2025)
摘要:与人工神经网络(ANN)相比,尖峰神经网络(SNN)因其优越的能量效率而受到越来越多的关注。然而,它们的安全方面,特别是后门攻击,受到的关注有限。现有的为ANN开发的防御方法由于其事件驱动和时间依赖性,在SNN中表现不佳或很容易被绕过。本文确定了阻碍SNN中传统后门防御的关键障碍,并提出了一种无监督的训练后检测框架,即时间膜电位后门检测(TMPBD),以克服这些挑战。TMPBD利用最终尖峰层中的时间膜电位(TMP)的最大裕度统计来检测目标标签,而无需任何攻击知识或数据访问。我们进一步引入了一个强大的缓解机制,神经树突抑制后门缓解(NDSBM),它在从一个小的,干净的,未标记的数据集提取的TMP的指导下,在早期卷积层之间夹持树突连接,以抑制恶意神经元,同时保留良性行为。在多个神经形态基准测试和最先进的输入感知动态触发攻击上的大量实验表明,TMPBD实现了100%的检测准确率,而NDSBM将攻击成功率从100%降低到8.44%,当与检测相结合时,降低到2.81%,而不会降低干净的准确率。
摘要:Spiking Neural Networks (SNNs) have gained increasing attention for their superior energy efficiency compared to Artificial Neural Networks (ANNs). However, their security aspects, particularly under backdoor attacks, have received limited attention. Existing defense methods developed for ANNs perform poorly or can be easily bypassed in SNNs due to their event-driven and temporal dependencies. This paper identifies the key blockers that hinder traditional backdoor defenses in SNNs and proposes an unsupervised post-training detection framework, Temporal Membrane Potential Backdoor Detection (TMPBD), to overcome these challenges. TMPBD leverages the maximum margin statistics of temporal membrane potential (TMP) in the final spiking layer to detect target labels without any attack knowledge or data access. We further introduce a robust mitigation mechanism, Neural Dendrites Suppression Backdoor Mitigation (NDSBM), which clamps dendritic connections between early convolutional layers to suppress malicious neurons while preserving benign behaviors, guided by TMP extracted from a small, clean, unlabeled dataset. Extensive experiments on multiple neuromorphic benchmarks and state-of-the-art input-aware dynamic trigger attacks demonstrate that TMPBD achieves 100% detection accuracy, while NDSBM reduces the attack success rate from 100% to 8.44%, and to 2.81% when combined with detection, without degrading clean accuracy.
【3】Bayesian Optimization under Uncertainty for Training a Scale Parameter in Stochastic Models
标题:不确定性下的贝叶斯优化用于训练随机模型中的尺度参数
链接:https://arxiv.org/abs/2510.06439
作者:Akash Yadav, Ruda Zhang
摘要:超参数调整是一个具有挑战性的问题,特别是当系统本身涉及不确定性。由于噪声函数评估,不确定性下的优化可能在计算上是昂贵的。在本文中,我们提出了一种新的贝叶斯优化框架,专为不确定性下的超参数调整,重点是优化随机模型中的尺度或精度类型的参数。所提出的方法采用了统计代理的基础随机变量,使分析评估的期望算子。此外,我们推导出随机采集函数优化器的封闭形式表达式,这显着降低了每次迭代的计算成本。与传统的一维蒙特卡洛优化方案相比,所提出的方法需要少40倍的数据点,从而减少了高达40倍的计算成本。通过两个数值算例,验证了该方法的有效性.
摘要:Hyperparameter tuning is a challenging problem especially when the system itself involves uncertainty. Due to noisy function evaluations, optimization under uncertainty can be computationally expensive. In this paper, we present a novel Bayesian optimization framework tailored for hyperparameter tuning under uncertainty, with a focus on optimizing a scale- or precision-type parameter in stochastic models. The proposed method employs a statistical surrogate for the underlying random variable, enabling analytical evaluation of the expectation operator. Moreover, we derive a closed-form expression for the optimizer of the random acquisition function, which significantly reduces computational cost per iteration. Compared with a conventional one-dimensional Monte Carlo-based optimization scheme, the proposed approach requires 40 times fewer data points, resulting in up to a 40-fold reduction in computational cost. We demonstrate the effectiveness of the proposed method through two numerical examples in computational engineering.
【4】Uncertainty Quantification In Surface Landmines and UXO Classification Using MC Dropout
标题:用MC Dropout法对地面地雷和未爆炸弹药分类的不确定性进行量化
链接:https://arxiv.org/abs/2510.06238
作者:Sagar Lekhak, Emmett J. Ientilucci, Dimah Dera, Susmita Ghosh
备注:This work has been accepted and presented at IGARSS 2025 and will appear in the IEEE IGARSS 2025 proceedings
摘要:使用深度学习检测地表地雷和未爆炸弹药(UXO)在人道主义排雷中显示出了前景。然而,确定性神经网络可能容易受到噪声条件和对抗性攻击的影响,导致漏检或误分类。这项研究通过蒙特卡洛(MC)Dropout引入不确定性量化的思想,将其集成到一个微调的ResNet-50架构中,用于地表地雷和未爆炸弹药分类,并在模拟数据集上进行了测试。纳入MC Dropout方法有助于量化认知不确定性,为预测可靠性提供了一个额外的度量,这可能有助于在排雷行动中做出更明智的决策。在干净的、受对抗扰动的和有噪声的测试图像上的实验结果证明了该模型在具有挑战性的条件下标记不可靠预测的能力。这一概念验证研究强调了排雷中不确定性量化的必要性,提高了对现有排雷神经网络在对抗性威胁面前的脆弱性的认识,并强调了为实际应用开发更鲁棒和可靠模型的重要性。
摘要:Detecting surface landmines and unexploded ordnances (UXOs) using deep learning has shown promise in humanitarian demining. However, deterministic neural networks can be vulnerable to noisy conditions and adversarial attacks, leading to missed detection or misclassification. This study introduces the idea of uncertainty quantification through Monte Carlo (MC) Dropout, integrated into a fine-tuned ResNet-50 architecture for surface landmine and UXO classification, which was tested on a simulated dataset. Integrating the MC Dropout approach helps quantify epistemic uncertainty, providing an additional metric for prediction reliability, which could be helpful to make more informed decisions in demining operations. Experimental results on clean, adversarially perturbed, and noisy test images demonstrate the model's ability to flag unreliable predictions under challenging conditions. This proof-of-concept study highlights the need for uncertainty quantification in demining, raises awareness about the vulnerability of existing neural networks in demining to adversarial threats, and emphasizes the importance of developing more robust and reliable models for practical applications.
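测试时保持Dropout激活、做多次随机前向并统计均值与方差,是摘要所述认知不确定性估计的标准做法;下面是一个与具体骨干网络无关的PyTorch示意(此处用小型MLP占位,真实设置中应换成微调的ResNet-50):

```python
import torch
import torch.nn as nn

model = nn.Sequential(             # 占位分类器:真实流程中换成带 Dropout 的 ResNet-50
    nn.Linear(32, 64), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(64, 2),
)

def mc_dropout_predict(model, x, n_samples=30):
    """测试时启用 Dropout,做 n_samples 次随机前向,返回预测概率的均值与方差。"""
    model.train()                  # 关键:保持 Dropout 处于激活状态(本例无 BatchNorm)
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    return probs.mean(0), probs.var(0)

x = torch.randn(4, 32)             # 4 个待分类样本(占位输入)
mean, var = mc_dropout_predict(model, x)
print(mean)
print(var.max(dim=-1).values)      # 方差大的预测可标记为不可靠
```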
【5】Split Conformal Classification with Unsupervised Calibration
标题:具有无监督校准的分裂共形分类
链接:https://arxiv.org/abs/2510.07185
作者:Santiago Mazuelas
摘要:分裂共形预测方法利用校准样本,将任何预测规则转换成符合目标覆盖概率的集合预测规则。现有的方法以最小的计算成本提供了非常强的性能保证。然而,它们需要使用由不同于训练样本的标记样本组成的校准样本。这种要求可能非常不便,因为它阻止了将所有标记样本用于训练,并且可能需要仅为校准而获取额外的标签。本文提出了一种有效的方法,用于分类任务中具有无监督校准的分裂共形预测。在所提出的方法中,集合预测规则是使用无监督校准样本以及先前用于学习分类规则的监督训练样本获得的。理论和实验结果表明,所提出的方法能够以性能保证和计算效率的适度下降为代价,达到与监督校准相当的性能。
摘要:Methods for split conformal prediction leverage calibration samples to transform any prediction rule into a set-prediction rule that complies with a target coverage probability. Existing methods provide remarkably strong performance guarantees with minimal computational costs. However, they require to use calibration samples composed by labeled examples different to those used for training. This requirement can be highly inconvenient, as it prevents the use of all labeled examples for training and may require acquiring additional labels solely for calibration. This paper presents an effective methodology for split conformal prediction with unsupervised calibration for classification tasks. In the proposed approach, set-prediction rules are obtained using unsupervised calibration samples together with supervised training samples previously used to learn the classification rule. Theoretical and experimental results show that the presented methods can achieve performance comparable to that with supervised calibration, at the expenses of a moderate degradation in performance guarantees and computational efficiency.
【6】Active Control of Turbulent Airfoil Flows Using Adjoint-based Deep Learning
标题:使用基于伴随的深度学习主动控制湍流机翼流
链接:https://arxiv.org/abs/2510.07106
作者:Xuemin Liu, Tom Hickling, Jonathan F. MacArt
摘要:我们使用深度学习PDE增强方法训练主动神经网络流动控制器,以优化雷诺数为$5\times10^4$、马赫数为0.4的湍流翼型流动的升阻比。采用直接数值模拟和大涡模拟方法,模拟了攻角$\alpha = 5^\circ$、$10^\circ$和$15^\circ$下二维和三维半无限NACA 0012翼型的可压缩无约束流动。控制动作通过上表面固定位置和几何形状的吹/吸射流实施,由神经网络自适应地确定,该神经网络将局部压力测量映射到最优射流总压,从而实现由传感器提供信息、在空间和时间上响应非定常流动条件的控制策略。流动对神经网络参数的敏感性使用伴随Navier-Stokes方程计算,该方程通过对流动求解器应用自动微分来构造。经过训练的流动控制器显著提高了升阻比,并减少了二维和三维翼型流动的流动分离,特别是在$\alpha = 5^\circ$和$10^\circ$时。二维训练的模型在样本外应用于三维流动时仍然有效,这证明了伴随训练控制方法的鲁棒性。三维训练的模型能更有效地捕捉流动动力学,使自适应(神经网络)控制器和离线(简化的恒压)控制器均获得更好的能效和相当的性能。这些结果强调了这种基于学习的方法在提高气动性能方面的有效性。
摘要:We train active neural-network flow controllers using a deep learning PDE augmentation method to optimize lift-to-drag ratios in turbulent airfoil flows at Reynolds number $5\times10^4$ and Mach number 0.4. Direct numerical simulation and large eddy simulation are employed to model compressible, unconfined flow over two- and three-dimensional semi-infinite NACA 0012 airfoils at angles of attack $\alpha = 5^\circ$, $10^\circ$, and $15^\circ$. Control actions, implemented through a blowing/suction jet at a fixed location and geometry on the upper surface, are adaptively determined by a neural network that maps local pressure measurements to optimal jet total pressure, enabling a sensor-informed control policy that responds spatially and temporally to unsteady flow conditions. The sensitivities of the flow to the neural network parameters are computed using the adjoint Navier-Stokes equations, which we construct using automatic differentiation applied to the flow solver. The trained flow controllers significantly improve the lift-to-drag ratios and reduce flow separation for both two- and three-dimensional airfoil flows, especially at $\alpha = 5^\circ$ and $10^\circ$. The 2D-trained models remain effective when applied out-of-sample to 3D flows, which demonstrates the robustness of the adjoint-trained control approach. The 3D-trained models capture the flow dynamics even more effectively, which leads to better energy efficiency and comparable performance for both adaptive (neural network) and offline (simplified, constant-pressure) controllers. These results underscore the effectiveness of this learning-based approach in improving aerodynamic performance.
【7】Toward Uncertainty-Aware and Generalizable Neural Decoding for Quantum LDPC Codes
标题:量子LDPC码的不确定性感知和可泛化神经解码
链接:https://arxiv.org/abs/2510.06257
作者:Xiangjun Mi, Frank Mueller
摘要:量子纠错(QEC)对于可扩展的量子计算是必不可少的,然而经由常规算法解码错误会导致有限的准确性(即对逻辑错误的抑制)和高开销,这两者都可以通过基于推断的解码器来缓解。到目前为止,这类机器学习(ML)解码器缺乏对实际容错至关重要的两个关键属性:可靠的不确定性量化和对先前未见过的码的鲁棒泛化。为了解决这一差距,我们提出了QuBA,一种贝叶斯图神经解码器,它集成了点积和多头注意力,在实现有表达力的错误模式识别的同时给出校准的不确定性估计。在QuBA的基础上,我们进一步开发了SAGU(Sequential Aggregate Generalization under Uncertainty),这是一个具有增强跨域鲁棒性的多码训练框架,能够在训练集之外进行解码。在二元双循环(BB)码及其互质变体上的实验表明:(i)QuBA和SAGU的性能始终优于经典基线置信传播(BP),逻辑错误率(LER)平均降低一个数量级,在互质BB码$[[154, 6, 16]]$上的置信决策界下最多降低两个数量级;(ii)即使考虑保守(安全)决策界,QuBA也超过了最先进的神经解码器,提供大约一个数量级的优势(例如,对于较大的BB码$[[756,16,\leq34]]$);(iii)SAGU实现了与QuBA的领域特定训练方法相当甚至更优的解码性能。
摘要:Quantum error correction (QEC) is essential for scalable quantum computing, yet decoding errors via conventional algorithms result in limited accuracy (i.e., suppression of logical errors) and high overheads, both of which can be alleviated by inference-based decoders. To date, such machine-learning (ML) decoders lack two key properties crucial for practical fault tolerance: reliable uncertainty quantification and robust generalization to previously unseen codes. To address this gap, we propose QuBA, a Bayesian graph neural decoder that integrates attention to both dot-product and multi-head, enabling expressive error-pattern recognition alongside calibrated uncertainty estimates. Building on QuBA, we further develop SAGU (Sequential Aggregate Generalization under Uncertainty), a multi-code training framework with enhanced cross-domain robustness enabling decoding beyond the training set. Experiments on bivariate bicycle (BB) codes and their coprime variants demonstrate that (i) both QuBA and SAGU consistently outperform the classical baseline belief propagation (BP), achieving a reduction of on average one order of magnitude in logical error rate (LER), and up to two orders of magnitude under confident-decision bounds on the coprime BB code $[[154, 6, 16]]$; (ii) QuBA also surpasses state-of-the-art neural decoders, providing an advantage of roughly one order of magnitude (e.g., for the larger BB code $[[756, 16, \leq34]]$) even when considering conservative (safe) decision bounds; (iii) SAGU achieves decoding performance comparable to or even outperforming QuBA's domain-specific training approach.
迁移|Zero/Few/One-Shot|自适应(5篇)
【1】The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators
标题:机器学习算子中零样本超分辨率的虚假承诺
链接:https://arxiv.org/abs/2510.06646
作者:Mansi Sakarvadia, Kareem Hegazy, Amin Totounferoush, Kyle Chard, Yaoqing Yang, Ian Foster, Michael W. Mahoney
摘要:科学机器学习乃至更广泛的科学计算的一个核心挑战,是对(在实践中)以离散方式表示的连续现象进行建模。机器学习算子(MLO)被引入作为实现这一建模目标的一种手段,因为这类架构可以在任意分辨率下执行推理。在这项工作中,我们评估这种架构创新是否足以执行"零样本超分辨率",即使模型能够在比其最初训练数据分辨率更高的数据上进行推理。我们全面评估了MLO中的零样本亚分辨率和超分辨率(即多分辨率)推理。我们将多分辨率推理解耦为两个关键行为:1)外推到变化的频率信息;2)在不同分辨率之间插值。我们通过实验证明,MLO无法以零样本方式完成这两项任务。因此,我们发现MLO无法在与训练时不同的分辨率下进行准确推理,相反,它们很脆弱,容易出现混叠。为了解决这些故障模式,我们提出了一个简单、计算高效、数据驱动的多分辨率训练协议,它克服了混叠并提供了鲁棒的多分辨率泛化。
摘要:A core challenge in scientific machine learning, and scientific computing more generally, is modeling continuous phenomena which (in practice) are represented discretely. Machine-learned operators (MLOs) have been introduced as a means to achieve this modeling goal, as this class of architecture can perform inference at arbitrary resolution. In this work, we evaluate whether this architectural innovation is sufficient to perform "zero-shot super-resolution," namely to enable a model to serve inference on higher-resolution data than that on which it was originally trained. We comprehensively evaluate both zero-shot sub-resolution and super-resolution (i.e., multi-resolution) inference in MLOs. We decouple multi-resolution inference into two key behaviors: 1) extrapolation to varying frequency information; and 2) interpolating across varying resolutions. We empirically demonstrate that MLOs fail to do both of these tasks in a zero-shot manner. Consequently, we find MLOs are not able to perform accurate inference at resolutions different from those on which they were trained, and instead they are brittle and susceptible to aliasing. To address these failure modes, we propose a simple, computationally-efficient, and data-driven multi-resolution training protocol that overcomes aliasing and that provides robust multi-resolution generalization.
【2】ATLO-ML: Adaptive Time-Length Optimizer for Machine Learning -- Insights from Air Quality Forecasting
标题:ATLO-ML:机器学习的自适应时间长度优化器--来自空气质量预测的见解
链接:https://arxiv.org/abs/2510.06503
作者:I-Hsi Kao, Kanji Uchino
摘要:机器学习中准确的时间序列预测在很大程度上受到选择适当的输入时间长度和采样率的影响。本文介绍了ATLO-ML,一个自适应的时间长度优化系统,自动确定最佳的输入时间长度和采样率的基础上,用户定义的输出时间长度。该系统提供了一种灵活的方法来进行时间序列数据预处理,动态调整这些参数,以提高预测性能。ATLO-ML使用空气质量数据集进行验证,包括从数据中心收集的GAMS数据集和专有数据,两者都是时间序列格式。结果表明,与固定时间长度相比,利用优化的时间长度和采样率显着提高了机器学习模型的准确性。ATLO-ML在各种时间敏感的应用程序中表现出泛化的潜力,为优化机器学习工作流中的时间输入参数提供了一个强大的解决方案。
摘要:Accurate time-series predictions in machine learning are heavily influenced by the selection of appropriate input time length and sampling rate. This paper introduces ATLO-ML, an adaptive time-length optimization system that automatically determines the optimal input time length and sampling rate based on user-defined output time length. The system provides a flexible approach to time-series data pre-processing, dynamically adjusting these parameters to enhance predictive performance. ATLO-ML is validated using air quality datasets, including both GAMS-dataset and proprietary data collected from a data center, both in time series format. Results demonstrate that utilizing the optimized time length and sampling rate significantly improves the accuracy of machine learning models compared to fixed time lengths. ATLO-ML shows potential for generalization across various time-sensitive applications, offering a robust solution for optimizing temporal input parameters in machine learning workflows.
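摘要所述"根据用户给定的输出时长自动确定最优输入时长与采样率"可以用验证误差上的网格搜索来示意;下面的实现(NumPy + scikit-learn)中窗口构造方式与候选集合均为本示例的简化假设:

```python
import numpy as np
from sklearn.linear_model import Ridge

def make_windows(series, in_len, stride, out_len):
    """按输入时长 in_len 与采样步长 stride 构造 (X, y) 监督样本。"""
    X, y = [], []
    for t in range(in_len * stride, len(series) - out_len):
        X.append(series[t - in_len * stride : t : stride])   # 长度为 in_len 的输入窗口
        y.append(series[t : t + out_len])                    # 用户给定的输出时长
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
series = np.sin(np.arange(2000) * 0.05) + 0.1 * rng.normal(size=2000)
out_len, best = 10, (None, np.inf)

for in_len in [8, 16, 32, 64]:            # 候选输入时长
    for stride in [1, 2, 4]:              # 候选采样率(以步长表示)
        X, y = make_windows(series, in_len, stride, out_len)
        split = int(0.8 * len(X))
        model = Ridge().fit(X[:split], y[:split])
        mse = np.mean((model.predict(X[split:]) - y[split:]) ** 2)
        if mse < best[1]:
            best = ((in_len, stride), mse)

print("最优 (输入时长, 步长):", best[0], " 验证MSE:", best[1])
```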
【3】TransFIRA: Transfer Learning for Face Image Recognizability Assessment
标题:TransFIRA:人脸图像可识别性评估的迁移学习
链接:https://arxiv.org/abs/2510.06353
作者:Allen Tu, Kartik Narayan, Joshua Gleason, Jennifer Xu, Matthew Meyn, Tom Goldstein, Vishal M. Patel
备注:Project Page: this https URL
摘要:在监控、视频和网络图像等无约束环境中的人脸识别必须应对姿态、模糊、照明和遮挡的极端变化,而传统的视觉质量指标无法预测输入是否能被所部署的编码器真正识别。现有的FIQA方法通常依赖视觉启发式方法、人工标注或计算密集的生成流水线,使其预测与编码器的决策几何脱节。我们提出TransFIRA(用于人脸图像可识别性评估的迁移学习),这是一个轻量级、无需标注的框架,直接在嵌入空间中刻画可识别性。TransFIRA带来三项进展:(i)通过类中心相似度(CCS)和类中心角分离度(CCAS)定义可识别性,首次给出自然的、与决策边界对齐的过滤与加权标准;(ii)一种可识别性感知的聚合策略,在BRIAR和IJB-C上取得最先进的验证精度,同时与真实可识别性的相关性几乎翻倍,且全程无需外部标签、启发式规则或特定于骨干网络的训练;(iii)超越人脸的新扩展,包括揭示退化与个体因素如何影响可识别性的、基于编码器的可解释性分析,以及首个可识别性感知的人体识别评估。实验证实了其在人脸上的最先进结果、在人体识别上的强劲表现以及跨数据集偏移下的鲁棒性。这些贡献共同将TransFIRA确立为一个统一的、几何驱动的可识别性评估框架(特定于编码器、准确、可解释并可跨模态扩展),在精度、可解释性和适用范围上显著推进了FIQA。
摘要:Face recognition in unconstrained environments such as surveillance, video, and web imagery must contend with extreme variation in pose, blur, illumination, and occlusion, where conventional visual quality metrics fail to predict whether inputs are truly recognizable to the deployed encoder. Existing FIQA methods typically rely on visual heuristics, curated annotations, or computationally intensive generative pipelines, leaving their predictions detached from the encoder's decision geometry. We introduce TransFIRA (Transfer Learning for Face Image Recognizability Assessment), a lightweight and annotation-free framework that grounds recognizability directly in embedding space. TransFIRA delivers three advances: (i) a definition of recognizability via class-center similarity (CCS) and class-center angular separation (CCAS), yielding the first natural, decision-boundary--aligned criterion for filtering and weighting; (ii) a recognizability-informed aggregation strategy that achieves state-of-the-art verification accuracy on BRIAR and IJB-C while nearly doubling correlation with true recognizability, all without external labels, heuristics, or backbone-specific training; and (iii) new extensions beyond faces, including encoder-grounded explainability that reveals how degradations and subject-specific factors affect recognizability, and the first recognizability-aware body recognition assessment. Experiments confirm state-of-the-art results on faces, strong performance on body recognition, and robustness under cross-dataset shifts. Together, these contributions establish TransFIRA as a unified, geometry-driven framework for recognizability assessment -- encoder-specific, accurate, interpretable, and extensible across modalities -- significantly advancing FIQA in accuracy, explainability, and scope.
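摘要中定义的类中心相似度(CCS)与类中心角分离度(CCAS)可以按如下numpy草图直观计算;具体的归一化与定义细节为假设,仅供示意:

import numpy as np

def ccs_ccas(f, centers, y):
    # f: 单个样本的嵌入向量; centers: [K, d] 各类中心; y: 样本的类别索引
    cos = centers @ f / (np.linalg.norm(centers, axis=1) * np.linalg.norm(f) + 1e-12)
    ang = np.arccos(np.clip(cos, -1.0, 1.0))
    ccs = cos[y]                      # CCS: 与本类中心的余弦相似度
    other = np.delete(ang, y)
    ccas = other.min() - ang[y]       # CCAS(假设形式): 越大表示离决策边界越远
    return ccs, ccas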
【4】Efficient High-Resolution Image Editing with Hallucination-Aware Loss and Adaptive Tiling
标题:具有幻觉感知损失和自适应切片的高效高分辨率图像编辑
链接:https://arxiv.org/abs/2510.06295
作者:Young D. Kwon, Abhinav Mehrotra, Malcolm Chadwick, Alberto Gil Ramos, Sourav Bhattacharya
备注:Preprint. Under review
摘要:高分辨率(4K)图像到图像合成对移动应用日益重要。现有的图像编辑扩散模型部署在资源受限设备上时,在内存和图像质量方面面临巨大挑战。本文提出MobilePicasso,一个在高分辨率下实现高效图像编辑、同时最小化计算成本与内存占用的新系统。MobilePicasso包括三个阶段:(i)在标准分辨率下以幻觉感知损失执行图像编辑,(ii)应用潜在投影以避免转入像素空间,(iii)以自适应上下文保持切片将编辑后的图像潜变量上采样到更高分辨率。我们对46名参与者的用户研究表明,与现有方法相比,MobilePicasso不仅将图像质量提高18-48%,还将幻觉减少14-51%。MobilePicasso的延迟显著更低,例如高达55.8倍的加速,而运行时内存仅略有增加,例如仅比先前工作多9%。令人惊讶的是,MobilePicasso在设备端的运行时间比运行在A100 GPU上的服务器端高分辨率图像编辑模型更快。
摘要:High-resolution (4K) image-to-image synthesis has become increasingly important for mobile applications. Existing diffusion models for image editing face significant challenges, in terms of memory and image quality, when deployed on resource-constrained devices. In this paper, we present MobilePicasso, a novel system that enables efficient image editing at high resolutions, while minimising computational cost and memory usage. MobilePicasso comprises three stages: (i) performing image editing at a standard resolution with hallucination-aware loss, (ii) applying latent projection to overcome going to the pixel space, and (iii) upscaling the edited image latent to a higher resolution with adaptive context-preserving tiling. Our user study with 46 participants reveals that MobilePicasso not only improves image quality by 18-48% but reduces hallucinations by 14-51% over existing methods. MobilePicasso demonstrates significantly lower latency, e.g., up to 55.8$\times$ speed-up, yet with a small increase in runtime memory, e.g., a mere 9% increase over prior work. Surprisingly, the on-device runtime of MobilePicasso is observed to be faster than a server-based high-resolution image editing model running on an A100 GPU.
【5】Beyond Static Knowledge Messengers: Towards Adaptive, Fair, and Scalable Federated Learning for Medical AI
标题:超越静态知识信使:迈向医疗人工智能的自适应、公平且可扩展的联邦学习
链接:https://arxiv.org/abs/2510.06259
作者:Jahidul Arafat, Fariha Tasmin, Sanjaya Poudel, Ahsan Habib Tareq, Iftekhar Haider
备注:20 pages, 4 figures, 14 tables. Proposes Adaptive Fair Federated Learning (AFFL) algorithm and MedFedBench benchmark suite for healthcare federated learning
摘要:医疗人工智能在保护隐私的协作学习方面面临挑战,同时还须确保异构医疗机构之间的公平性。目前的联邦学习方法存在静态架构、收敛缓慢(45-73轮)、使较小机构边缘化的公平性差距,以及可扩展性限制(上限15个客户端)。我们通过三项创新提出自适应公平联邦学习(AFFL):(1)自适应知识信使,基于异质性与任务复杂性动态扩展容量;(2)使用影响加权聚合的公平感知蒸馏;(3)课程引导加速,将轮次减少60-70%。我们的理论分析提供了带ε-公平界的收敛保证,达到O(T^{-1/2}) + O(H_max/T^{3/4})的速率。预测结果显示通信减少55-75%、公平性提升56-68%、节能34-46%,并可支持100多个机构。该框架支持跨成像、基因组学、EHR和传感器数据的多模态集成,同时保持HIPAA/GDPR合规。我们提出MedFedBench基准套件,用于在六个医疗维度上进行标准化评估:收敛效率、机构公平性、隐私保护、多模态集成、可扩展性和临床部署就绪度。经济预测表明,农村医院的投资回报率为400-800%,学术中心的性能增益为15-25%。这项工作给出了一个包含七个问题的研究议程、24个月的实施路线图,以及实现医疗AI民主化的路径。
摘要:Medical AI faces challenges in privacy-preserving collaborative learning while ensuring fairness across heterogeneous healthcare institutions. Current federated learning approaches suffer from static architectures, slow convergence (45-73 rounds), fairness gaps marginalizing smaller institutions, and scalability constraints (15-client limit). We propose Adaptive Fair Federated Learning (AFFL) through three innovations: (1) Adaptive Knowledge Messengers dynamically scaling capacity based on heterogeneity and task complexity, (2) Fairness-Aware Distillation using influence-weighted aggregation, and (3) Curriculum-Guided Acceleration reducing rounds by 60-70%. Our theoretical analysis provides convergence guarantees with epsilon-fairness bounds, achieving O(T^{-1/2}) + O(H_max/T^{3/4}) rates. Projected results show 55-75% communication reduction, 56-68% fairness improvement, 34-46% energy savings, and 100+ institution support. The framework enables multi-modal integration across imaging, genomics, EHR, and sensor data while maintaining HIPAA/GDPR compliance. We propose MedFedBench benchmark suite for standardized evaluation across six healthcare dimensions: convergence efficiency, institutional fairness, privacy preservation, multi-modal integration, scalability, and clinical deployment readiness. Economic projections indicate 400-800% ROI for rural hospitals and 15-25% performance gains for academic centers. This work presents a seven-question research agenda, 24-month implementation roadmap, and pathways toward democratizing healthcare AI.
强化学习(6篇)
【1】Falsification-Driven Reinforcement Learning for Maritime Motion Planning
标题:证伪驱动的海上运动规划强化学习
链接:https://arxiv.org/abs/2510.06970
作者:Marlon Müller, Florian Finkeldei, Hanna Krasowski, Murat Arcak, Matthias Althoff
摘要:遵守海上交通规则对于自主船舶的安全运营至关重要,但训练强化学习(RL)代理遵守这些规则具有挑战性。RL代理的行为由其遇到的训练场景塑造,然而构建能够捕捉海上航行复杂性的场景并非易事,仅靠真实世界数据也不足够。为此,我们提出一种证伪驱动的RL方法,它生成对抗性训练场景,使被测船舶违反以信号时序逻辑规范表示的海上交通规则。我们在两艘船的公海航行实验中表明,所提出的方法能提供更相关的训练场景,并实现更一致的规则遵守。
摘要:Compliance with maritime traffic rules is essential for the safe operation of autonomous vessels, yet training reinforcement learning (RL) agents to adhere to them is challenging. The behavior of RL agents is shaped by the training scenarios they encounter, but creating scenarios that capture the complexity of maritime navigation is non-trivial, and real-world data alone is insufficient. To address this, we propose a falsification-driven RL approach that generates adversarial training scenarios in which the vessel under test violates maritime traffic rules, which are expressed as signal temporal logic specifications. Our experiments on open-sea navigation with two vessels demonstrate that the proposed approach provides more relevant training scenarios and achieves more consistent rule compliance.
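证伪式场景搜索的思想可以用如下玩具级numpy草图示意:以STL规则“始终保持安全距离”的鲁棒度为目标,随机搜索使其最接近(或达到)违反的场景参数。规则形式、参数维度与仿真接口simulate均为假设,并非论文的具体设定:

import numpy as np

def stl_always_min_dist(dist_trace, d_min):
    # STL公式 G(dist >= d_min) 的鲁棒度:取轨迹上的最小裕量
    return np.min(dist_trace - d_min)

def falsify(simulate, d_min, n_trials=1000, seed=0):
    # simulate(theta) -> 两船距离轨迹(假设的仿真接口)
    # 返回鲁棒度最低的场景参数,作为对抗性训练场景候选
    rng = np.random.default_rng(seed)
    best_theta, best_rob = None, np.inf
    for _ in range(n_trials):
        theta = rng.uniform(-1, 1, size=4)      # 场景参数,如初始位置/航向
        rob = stl_always_min_dist(simulate(theta), d_min)
        if rob < best_rob:
            best_theta, best_rob = theta, rob
    return best_theta, best_rob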
【2】Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions
标题:基于动作条件均方根Q函数的局部强化学习
链接:https://arxiv.org/abs/2510.06649
作者:Frank Wu, Mengye Ren
备注:15 pages, 5 figures
摘要:前向-前向(FF)算法是最近提出的神经网络学习过程,它用两次前向传递取代反向传播中传统的前向与反向传递。然而,FF在很大程度上仍局限于监督设置,在学习信号可以更自然产生的领域(如RL)留下了空白。在这项工作中,受FF基于层活动统计的优度函数启发,我们引入动作条件均方根Q函数(ARQ),这是一种新的价值估计方法,它将优度函数与动作条件相结合,通过时序差分学习用于局部RL。尽管方法简单且具有生物学基础,但在MinAtar和DeepMind Control Suite基准上,它取得了优于最先进的局部无反向传播RL方法的性能,同时在大多数任务上也优于用反向传播训练的算法。代码可在https://github.com/agentic-learning-ai-lab/arq上找到。
摘要:The Forward-Forward (FF) Algorithm is a recently proposed learning procedure for neural networks that employs two forward passes instead of the traditional forward and backward passes used in backpropagation. However, FF remains largely confined to supervised settings, leaving a gap at domains where learning signals can be yielded more naturally such as RL. In this work, inspired by FF's goodness function using layer activity statistics, we introduce Action-conditioned Root mean squared Q-Functions (ARQ), a novel value estimation method that applies a goodness function and action conditioning for local RL using temporal difference learning. Despite its simplicity and biological grounding, our approach achieves superior performance compared to state-of-the-art local backprop-free RL methods in the MinAtar and the DeepMind Control Suite benchmarks, while also outperforming algorithms trained with backpropagation on most tasks. Code can be found at https://github.com/agentic-learning-ai-lab/arq.
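按摘要的描述(以动作条件下层活动的均方根作为Q值估计,并用时序差分做局部更新),下面给出一个高度简化的PyTorch草图;网络结构、动作编码与更新细节均为假设,仅示意思想,并非论文算法:

import torch, torch.nn as nn

class ARQLayer(nn.Module):
    # 将(状态, one-hot动作)映射到隐层活动,Q(s,a)取活动的均方根(假设形式)
    def __init__(self, s_dim, n_actions, hidden=64):
        super().__init__()
        self.fc = nn.Linear(s_dim + n_actions, hidden)
        self.n_actions = n_actions

    def q(self, s, a_onehot):
        h = torch.relu(self.fc(torch.cat([s, a_onehot], dim=-1)))
        return h.pow(2).mean(dim=-1).sqrt()      # RMS 作为优度/Q值

def td_update(layer, opt, s, a, r, s2, gamma=0.99):
    eye = torch.eye(layer.n_actions)
    with torch.no_grad():                         # 目标不回传梯度 -> 局部更新
        q_next = torch.stack([layer.q(s2, eye[i].expand(s2.shape[0], -1))
                              for i in range(layer.n_actions)]).max(dim=0).values
        target = r + gamma * q_next
    loss = (layer.q(s, eye[a]) - target).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()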
【3】General and Efficient Visual Goal-Conditioned Reinforcement Learning using Object-Agnostic Masks
标题:使用对象不可知掩码的通用高效视觉目标条件强化学习
链接:https://arxiv.org/abs/2510.06277
作者:Fahim Shahriar, Cheryl Wang, Alireza Azimi, Gautham Vasan, Hany Hamed Elanwar, A. Rupam Mahmood, Colin Bellinger
摘要:目标条件强化学习(GCRL)允许智能体用统一的策略学习多样化的目标。然而,GCRL的成功取决于目标表示的选择。在这项工作中,我们提出一个基于掩码的目标表示系统,为智能体提供与对象无关的视觉线索,从而实现高效学习和出色的泛化。相比之下,现有的目标表示方法(如目标状态图像、3D坐标和one-hot向量)存在对未见对象泛化差、收敛缓慢、需要特殊相机等问题。掩码可以被加工成密集奖励,而无需易出错的距离计算。在仿真中使用真值掩码学习时,我们在训练对象和未见测试对象上均达到99.9%的到达精度。所提方法可在不使用任何目标位置信息的情况下高精度执行抓取任务。此外,我们利用预训练的开放词汇对象检测模型生成掩码,在两台不同的实体机器人上演示了从零学习与sim-to-real迁移应用。
摘要:Goal-conditioned reinforcement learning (GCRL) allows agents to learn diverse objectives using a unified policy. The success of GCRL, however, is contingent on the choice of goal representation. In this work, we propose a mask-based goal representation system that provides object-agnostic visual cues to the agent, enabling efficient learning and superior generalization. In contrast, existing goal representation methods, such as target state images, 3D coordinates, and one-hot vectors, face issues of poor generalization to unseen objects, slow convergence, and the need for special cameras. Masks can be processed to generate dense rewards without requiring error-prone distance calculations. Learning with ground truth masks in simulation, we achieved 99.9% reaching accuracy on training and unseen test objects. Our proposed method can be utilized to perform pick-up tasks with high accuracy, without using any positional information of the target. Moreover, we demonstrate learning from scratch and sim-to-real transfer applications using two different physical robots, utilizing pretrained open vocabulary object detection models for mask generation.
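摘要提到掩码可以直接加工成密集奖励而无需易错的距离计算;一种可能的实现是用当前掩码与目标掩码的IoU作为奖励,如下numpy草图所示(奖励的具体形式为假设):

import numpy as np

def mask_reward(cur_mask, goal_mask, eps=1e-8):
    # cur_mask/goal_mask: 二值掩码,可由对象无关的检测/分割模型给出
    inter = np.logical_and(cur_mask, goal_mask).sum()
    union = np.logical_or(cur_mask, goal_mask).sum()
    return float(inter) / (union + eps)   # IoU 越高,奖励越大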
【4】Diffusion-Augmented Reinforcement Learning for Robust Portfolio Optimization under Stress Scenarios
标题:压力场景下稳健投资组合优化的扩散增强强化学习
链接:https://arxiv.org/abs/2510.07099
作者:Himanshu Choudhary, Arishi Orra, Manoj Thakur
摘要:在金融市场不断变化和复杂的环境中,投资组合优化仍然是投资者和资产管理公司面临的巨大挑战。传统方法往往难以捕捉市场行为的复杂动态,并与不同的投资者偏好保持一致。为了解决这个问题,我们提出了一个创新的框架,称为扩散增强强化学习(DARL),它将去噪扩散概率模型(DDPM)与深度强化学习(DRL)协同集成,用于投资组合管理。通过利用DDPM生成以不同压力强度为条件的合成市场崩溃场景,我们的方法显著增强了训练数据的鲁棒性。经验评估表明,DARL的表现优于传统基准,提供了卓越的风险调整回报和应对不可预见危机(例如2025年关税危机)的能力。这项工作为增强DRL驱动的金融应用的抗压能力提供了一种稳健且实用的方法。
摘要:In the ever-changing and intricate landscape of financial markets, portfolio optimisation remains a formidable challenge for investors and asset managers. Conventional methods often struggle to capture the complex dynamics of market behaviour and align with diverse investor preferences. To address this, we propose an innovative framework, termed Diffusion-Augmented Reinforcement Learning (DARL), which synergistically integrates Denoising Diffusion Probabilistic Models (DDPMs) with Deep Reinforcement Learning (DRL) for portfolio management. By leveraging DDPMs to generate synthetic market crash scenarios conditioned on varying stress intensities, our approach significantly enhances the robustness of training data. Empirical evaluations demonstrate that DARL outperforms traditional baselines, delivering superior risk-adjusted returns and resilience against unforeseen crises, such as the 2025 Tariff Crisis. This work offers a robust and practical methodology to bolster stress resilience in DRL-driven financial applications.
【5】PyCFRL: A Python library for counterfactually fair offline reinforcement learning via sequential data preprocessing
标题:PyCFRL:一个Python库,通过顺序数据预处理实现反事实公平的离线强化学习
链接:https://arxiv.org/abs/2510.06935
作者:Jianhan Zhang, Jitao Wang, Chengchun Shi, John D. Piette, Donglin Zeng, Zhenke Wu
摘要:强化学习(RL)旨在学习和评估一个顺序决策规则,通常被称为“策略”,该规则可以在可能无限多个时间步长的环境中最大限度地提高种群水平的效益。然而,由RL算法做出的顺序决策虽然被优化以最大化总体群体利益,但可能会使少数群体或社会经济弱势群体中的某些个体处于不利地位。为了解决这个问题,我们引入了PyCFRL,这是一个Python库,用于确保离线RL中的反事实公平性。PyCFRL实现了一种新的数据预处理算法,用于从离线数据集学习反事实公平的RL策略,并提供了评估RL策略的值和反事实不公平级别的工具。我们描述了PyCFRL的高级功能,并通过数据示例演示了其主要用例之一。该库在PyPI和Github上公开提供(https://github.com/JianhanZhang/PyCFRL),详细的教程可以在PyCFRL文档中找到(pycfrl-documentation.netlify.app)。
摘要:Reinforcement learning (RL) aims to learn and evaluate a sequential decision rule, often referred to as a "policy", that maximizes the population-level benefit in an environment across possibly infinitely many time steps. However, the sequential decisions made by an RL algorithm, while optimized to maximize overall population benefits, may disadvantage certain individuals who are in minority or socioeconomically disadvantaged groups. To address this problem, we introduce PyCFRL, a Python library for ensuring counterfactual fairness in offline RL. PyCFRL implements a novel data preprocessing algorithm for learning counterfactually fair RL policies from offline datasets and provides tools to evaluate the values and counterfactual unfairness levels of RL policies. We describe the high-level functionalities of PyCFRL and demonstrate one of its major use cases through a data example. The library is publicly available on PyPI and Github (https://github.com/JianhanZhang/PyCFRL), and detailed tutorials can be found in the PyCFRL documentation (https://pycfrl-documentation.netlify.app).
【6】Online Matching via Reinforcement Learning: An Expert Policy Orchestration Strategy
标题:通过强化学习进行在线匹配:一种专家策略编排方法
链接:https://arxiv.org/abs/2510.06515
作者:Chiara Mignacco, Matthieu Jonckheere, Gilles Stoltz
摘要:在线匹配问题出现在许多复杂系统中,从云服务、在线市场到器官交换网络,及时而有原则的决策对于保持高系统性能至关重要。这些环境中的传统启发式方法简单且可解释,但通常针对特定运行状态量身定制,当条件变化时可能导致低效。我们提出一种强化学习(RL)方法,学习编排一组这样的专家策略,以数据驱动、自适应的方式利用它们的互补优势。基于Adv2框架(Jonckheere等人,2024),我们的方法通过基于优势的权重更新来组合专家决策,并自然推广到只有估计价值函数可用的设置。我们建立了期望意义与高概率意义下的后悔保证,并为时序差分学习推导出一个新的有限时间偏差界,使得即使在恒定步长和非平稳动态下也能可靠地估计优势。为支持可扩展性,我们引入一种神经actor-critic架构,它在大型状态空间中泛化的同时保持可解释性。在随机匹配模型(包括器官交换场景)上的仿真表明,编排策略比单个专家和常规RL基线收敛更快、系统级效率更高。我们的结果突显了结构化的自适应学习如何改进复杂资源分配与决策过程的建模与管理。
摘要:Online matching problems arise in many complex systems, from cloud services and online marketplaces to organ exchange networks, where timely, principled decisions are critical for maintaining high system performance. Traditional heuristics in these settings are simple and interpretable but typically tailored to specific operating regimes, which can lead to inefficiencies when conditions change. We propose a reinforcement learning (RL) approach that learns to orchestrate a set of such expert policies, leveraging their complementary strengths in a data-driven, adaptive manner. Building on the Adv2 framework (Jonckheere et al., 2024), our method combines expert decisions through advantage-based weight updates and extends naturally to settings where only estimated value functions are available. We establish both expectation and high-probability regret guarantees and derive a novel finite-time bias bound for temporal-difference learning, enabling reliable advantage estimation even under constant step size and non-stationary dynamics. To support scalability, we introduce a neural actor-critic architecture that generalizes across large state spaces while preserving interpretability. Simulations on stochastic matching models, including an organ exchange scenario, show that the orchestrated policy converges faster and yields higher system level efficiency than both individual experts and conventional RL baselines. Our results highlight how structured, adaptive learning can improve the modeling and management of complex resource allocation and decision-making processes.
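摘要所述“通过基于优势的权重更新来组合专家决策”可以用指数权重的形式示意如下(numpy);步长eta与优势估计方式为假设,并非Adv2框架的精确算法:

import numpy as np

def orchestrate_step(weights, advantages, eta=0.1):
    # weights: 各专家当前权重; advantages: 各专家动作在当前状态下的优势估计
    w = weights * np.exp(eta * advantages)   # 放大优势高的专家
    w /= w.sum()
    expert = np.random.choice(len(w), p=w)   # 按权重抽样执行的专家
    return w, expert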
元学习(1篇)
【1】Inefficiencies of Meta Agents for Agent Design
标题:元代理在代理设计中的低效性
链接:https://arxiv.org/abs/2510.06711
作者:Batu El, Mert Yuksekgonul, James Zou
摘要:最近的工作开始使用元代理来自动化代理系统的设计,由元代理提出并迭代改进新的代理架构。在本文中,我们研究了一类常见元代理中的三个关键挑战。首先,我们考察元代理如何跨迭代学习,发现像先前工作提出的那样简单地把所有既往代理放入上下文,效果反而不如完全忽略既往设计;而采用进化式方法则能提高性能。其次,尽管元代理在训练期间设计了多个代理,但在测试时通常只提交单个代理。我们发现所设计代理的行为多样性较低,限制了它们互补使用的潜力。第三,我们评估自动化设计何时在经济上可行。我们发现只有在少数情况下(具体而言是两个数据集),当部署规模超过15,000个样例时,设计并部署代理的总体成本才低于人工设计的代理;相比之下,其他数据集上的性能增益无论规模如何都不足以证明设计成本的合理性。
摘要:Recent works began to automate the design of agentic systems using meta-agents that propose and iteratively refine new agent architectures. In this paper, we examine three key challenges in a common class of meta-agents. First, we investigate how a meta-agent learns across iterations and find that simply expanding the context with all previous agents, as proposed by previous works, performs worse than ignoring prior designs entirely. We show that the performance improves with an evolutionary approach. Second, although the meta-agent designs multiple agents during training, it typically commits to a single agent at test time. We find that the designed agents have low behavioral diversity, limiting the potential for their complementary use. Third, we assess when automated design is economically viable. We find that only in a few cases--specifically, two datasets--the overall cost of designing and deploying the agents is lower than that of human-designed agents when deployed on over 15,000 examples. In contrast, the performance gains for other datasets do not justify the design cost, regardless of scale.
符号|符号学习(2篇)
【1】StruSR: Structure-Aware Symbolic Regression with Physics-Informed Taylor Guidance
标题:StruSR:具有物理知识泰勒指导的结构感知符号回归
链接:https://arxiv.org/abs/2510.06635
作者:Yunpeng Gong, Sihan Lan, Can Yang, Kunpeng Xu, Min Jiang
摘要:符号回归旨在通过搜索数学公式空间来捕获底层系统行为,找到可解释的解析表达式,尤其适用于受物理定律支配的科学建模。然而,传统方法缺乏从时间序列观测中提取结构化物理先验的机制,使得难以捕获反映系统全局行为的符号表达式。在这项工作中,我们提出了一个结构感知的符号回归框架,称为StruSR,它利用训练好的物理信息神经网络(PINN)从时间序列数据中提取局部结构化的物理先验。通过对训练后的PINN的输出执行局部泰勒展开,我们获得基于导数的结构信息来指导符号表达式进化。为了评估表达式组件的重要性,我们引入了一个基于掩码的归因机制,量化每个子树对结构对齐和物理残差降低的贡献。这些灵敏度分数引导遗传编程中的变异和交叉操作,保留具有高物理或结构意义的子结构,同时选择性地修改信息量较少的成分。混合适应度函数共同最小化物理残差和泰勒系数失配,确保与控制方程和PINN编码的局部解析行为的一致性。在基准PDE系统上的实验表明,与传统基线相比,StruSR提高了收敛速度、结构保真度和表达式可解释性,为基于物理的符号发现提供了一个原则性范例。
摘要:Symbolic regression aims to find interpretable analytical expressions by searching over mathematical formula spaces to capture underlying system behavior, particularly in scientific modeling governed by physical laws. However, traditional methods lack mechanisms for extracting structured physical priors from time series observations, making it difficult to capture symbolic expressions that reflect the system's global behavior. In this work, we propose a structure-aware symbolic regression framework, called StruSR, that leverages trained Physics-Informed Neural Networks (PINNs) to extract locally structured physical priors from time series data. By performing local Taylor expansions on the outputs of the trained PINN, we obtain derivative-based structural information to guide symbolic expression evolution. To assess the importance of expression components, we introduce a masking-based attribution mechanism that quantifies each subtree's contribution to structural alignment and physical residual reduction. These sensitivity scores steer mutation and crossover operations within genetic programming, preserving substructures with high physical or structural significance while selectively modifying less informative components. A hybrid fitness function jointly minimizes physics residuals and Taylor coefficient mismatch, ensuring consistency with both the governing equations and the local analytical behavior encoded by the PINN. Experiments on benchmark PDE systems demonstrate that StruSR improves convergence speed, structural fidelity, and expression interpretability compared to conventional baselines, offering a principled paradigm for physics-grounded symbolic discovery.
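摘要中“混合适应度 = 物理残差 + 泰勒系数失配”的思想可用如下numpy草图示意(以中心差分近似局部泰勒系数);锚点选取、系数阶数与权重lam均为假设:

import numpy as np

def hybrid_fitness(expr, pinn, residual_fn, anchors, h=1e-3, lam=1.0):
    # expr/pinn: 可调用的候选符号表达式与训练好的PINN(标量函数,假设接口)
    # residual_fn(expr) 返回候选表达式的物理(方程)残差
    def taylor3(f, t):
        f0 = f(t)
        fp = (f(t + h) - f(t - h)) / (2 * h)
        fpp = (f(t + h) - 2 * f(t) + f(t - h)) / h**2
        return np.array([f0, fp, fpp / 2.0])     # 0/1/2阶局部泰勒系数
    mismatch = sum(np.sum((taylor3(expr, t) - taylor3(pinn, t))**2) for t in anchors)
    return residual_fn(expr) + lam * mismatch    # 适应度越小越优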
【2】BACHI: Boundary-Aware Symbolic Chord Recognition Through Masked Iterative Decoding on Pop and Classical Music
标题:BACHI:通过流行音乐和古典音乐的掩蔽迭代解码实现边界意识符号和弦识别
链接:https://arxiv.org/abs/2510.06528
作者:Mingyang Yao, Ke Chen, Shlomo Dubnov, Taylor Berg-Kirkpatrick
备注:Under review
摘要:基于深度学习模型的自动和弦识别(ACR)已逐步取得可观的识别精度,但仍存在两个关键挑战。首先,先前工作主要集中在音频域ACR,而符号音乐(例如乐谱)的ACR由于数据稀缺受到的关注有限。其次,现有方法仍然忽视与人类音乐分析实践相一致的策略。为应对这些挑战,我们做出两项贡献:(1)提出POP909-CL,它是POP909数据集的增强版本,带有节奏对齐的内容以及经人工校正的和弦、节拍、调性和拍号标签;(2)提出符号和弦识别模型BACHI,它将任务分解为不同的决策步骤,即边界检测,以及对和弦根音、性质与低音(转位)的迭代排序。这一机制呼应了人类的练耳训练实践。实验表明,BACHI在古典与流行音乐基准上均取得最先进的和弦识别性能,消融研究验证了各模块的有效性。
摘要:Automatic chord recognition (ACR) via deep learning models has gradually achieved promising recognition accuracy, yet two key challenges remain. First, prior work has primarily focused on audio-domain ACR, while symbolic music (e.g., score) ACR has received limited attention due to data scarcity. Second, existing methods still overlook strategies that are aligned with human music analytical practices. To address these challenges, we make two contributions: (1) we introduce POP909-CL, an enhanced version of POP909 dataset with tempo-aligned content and human-corrected labels of chords, beats, keys, and time signatures; and (2) We propose BACHI, a symbolic chord recognition model that decomposes the task into different decision steps, namely boundary detection and iterative ranking of chord root, quality, and bass (inversion). This mechanism mirrors the human ear-training practices. Experiments demonstrate that BACHI achieves state-of-the-art chord recognition performance on both classical and pop music benchmarks, with ablation studies validating the effectiveness of each module.
医学相关(4篇)
【1】Resolution scaling governs DINOv3 transfer performance in chest radiograph classification
标题:分辨率缩放决定胸部X光片分类中的DINOv3迁移性能
链接:https://arxiv.org/abs/2510.07191
作者:Soroosh Tayebi Arasteh, Mina Shaigan, Christiane Kuhl, Jakob Nikolas Kather, Sven Nebelung, Daniel Truhn
摘要:自监督学习(SSL)推动了视觉表征学习,但其在胸部X线摄影(一种具有细粒度表现的高通量成像方式)中的价值仍不清楚。Meta的DINOv3通过Gram锚定的自蒸馏扩展了早期的SSL模型。这些设计选择是否改善胸部X线摄影的迁移学习尚未得到系统检验。我们在7个数据集(n>814,000)上将DINOv3与DINOv2及ImageNet初始化进行了基准比较,评估了两种代表性主干:ViT-B/16和ConvNeXt-B,并在224x224、512x512和1024x1024像素下分析图像。我们还评估了来自7B模型的冻结特征。主要指标是各标签上的平均AUROC。在224x224下,DINOv3与DINOv2在成人数据集上表现相当。将分辨率提高到512x512后,DINOv3相对DINOv2和ImageNet带来一致的改进。相比之下,儿科队列的结果显示各初始化之间无差异。在所有设置中,ConvNeXt-B均优于ViT-B/16。使用冻结DINOv3-7B特征的模型表现不及完全微调的86-89M参数主干,凸显了领域适应的重要性。进一步放大到1024x1024并未提高精度。与分辨率相关的增益在依赖边界的异常和小灶性异常上最为明显。在胸部X线摄影中,更高的输入分辨率对于发挥现代自监督模型的优势至关重要。512x512像素是一个实际上限:DINOv3初始化的ConvNeXt-B网络在此提供最强性能,而更大的输入带来的回报与成本不成比例。临床上,这些发现支持在512x512下使用经微调的中等规模主干进行胸片判读,预计在检测与急诊和重症监护相关的细微或以边界为中心的病变时获益最大。
摘要:Self-supervised learning (SSL) has advanced visual representation learning, but its value in chest radiography, a high-volume imaging modality with fine-grained findings, remains unclear. Meta's DINOv3 extends earlier SSL models through Gram-anchored self-distillation. Whether these design choices improve transfer learning for chest radiography has not been systematically tested. We benchmarked DINOv3 against DINOv2 and ImageNet initialization across seven datasets (n>814,000). Two representative backbones were evaluated: ViT-B/16 and ConvNeXt-B. Images were analyzed at 224x224, 512x512, and 1024x1024 pixels. We additionally assessed frozen features from a 7B model. The primary outcome was mean AUROC across labels. At 224x224, DINOv3 and DINOv2 achieved comparable performance on adult datasets. Increasing resolution to 512x512 yielded consistent improvements for DINOv3 over both DINOv2 and ImageNet. In contrast, results in pediatric cohort showed no differences across initializations. Across all settings, ConvNeXt-B outperformed ViT-B/16. Models using frozen DINOv3-7B features underperformed relative to fully finetuned 86-89M-parameter backbones, highlighting the importance of domain adaptation. Scaling to 1024x1024 did not further improve accuracy. Resolution-related gains were most evident for boundary-dependent and small focal abnormalities. In chest radiography, higher input resolution is critical for leveraging the benefits of modern self-supervised models. 512x512 pixels represent a practical upper limit where DINOv3-initialized ConvNeXt-B networks provide the strongest performance, while larger inputs offer minimal return on cost. Clinically, these findings support use of finetuned, mid-sized backbones at 512x512 for chest radiograph interpretation, with the greatest gains expected in detecting subtle or boundary-centered lesions relevant to emergency and critical care settings.
【2】Modeling COVID-19 Dynamics in German States Using Physics-Informed Neural Networks
标题:使用物理信息神经网络建模德国各州的COVID-19动态
链接:https://arxiv.org/abs/2510.06776
作者:Phillip Rothenbeck, Sai Karthikeya Vemuri, Niklas Penzel, Joachim Denzler
备注:19 pages, 7 figures, 2 tables
摘要:2019冠状病毒病大流行凸显了通过定量建模与分析来理解真实世界疾病动态的必要性。特别是,使用房室模型的事后分析为疫苗接种策略和防控政策等公共卫生干预的有效性提供了宝贵见解。然而,诸如SIR(易感-感染-康复)这样的房室模型在直接纳入含噪观测数据方面往往存在局限。在这项工作中,我们采用物理信息神经网络(PINN),利用罗伯特·科赫研究所(RKI)的感染数据求解SIR模型的反问题。我们的主要贡献是对德国所有联邦州三年间的COVID-19动态进行细粒度的时空分析。我们估计各州特定的传播和康复参数以及时变再生数(R_t)来跟踪大流行的进展。结果显示各地区传播行为差异显著,揭示了与疫苗接种率的相关性以及与主要流行阶段相关的时间模式。我们的研究结果证明了PINN在局部化、长期流行病学建模中的实用性。
摘要:The COVID-19 pandemic has highlighted the need for quantitative modeling and analysis to understand real-world disease dynamics. In particular, post hoc analyses using compartmental models offer valuable insights into the effectiveness of public health interventions, such as vaccination strategies and containment policies. However, such compartmental models like SIR (Susceptible-Infectious-Recovered) often face limitations in directly incorporating noisy observational data. In this work, we employ Physics-Informed Neural Networks (PINNs) to solve the inverse problem of the SIR model using infection data from the Robert Koch Institute (RKI). Our main contribution is a fine-grained, spatio-temporal analysis of COVID-19 dynamics across all German federal states over a three-year period. We estimate state-specific transmission and recovery parameters and time-varying reproduction number (R_t) to track the pandemic progression. The results highlight strong variations in transmission behavior across regions, revealing correlations with vaccination uptake and temporal patterns associated with major pandemic phases. Our findings demonstrate the utility of PINNs in localized, long-term epidemiological modeling.
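摘要中用PINN求解SIR反问题的核心是方程残差项,可以用PyTorch的自动微分示意如下:网络输出(S, I, R),传播率beta与康复率gamma可设为可学习参数以便联合估计;网络结构与损失权重为假设:

import torch

def sir_residual(net, t, beta, gamma):
    # net: t -> [S, I, R](按列);用自动微分计算SIR方程
    # dS/dt=-beta*S*I, dI/dt=beta*S*I-gamma*I, dR/dt=gamma*I 的残差
    t = t.requires_grad_(True)
    S, I, R = net(t).split(1, dim=-1)
    g = lambda u: torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    return ((g(S) + beta * S * I)**2
            + (g(I) - beta * S * I + gamma * I)**2
            + (g(R) - gamma * I)**2).mean()
# 总损失通常为:数据拟合项 + 上述残差项(系数为假设)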
【3】Chem-NMF: Multi-layer $α$-divergence Non-Negative Matrix Factorization for Cardiorespiratory Disease Clustering, with Improved Convergence Inspired by Chemical Catalysts and Rigorous Asymptotic Analysis
标题:Chem-NMF:用于心肺疾病聚类的多层$\alpha$-散度非负矩阵分解,受化学催化剂启发改进收敛性,并附严格渐近分析
链接:https://arxiv.org/abs/2510.06632
作者:Yasaman Torabi, Shahram Shirani, James P. Reilly
摘要:非负矩阵分解(NMF)是一种无监督学习方法,可在音频处理、生物医学信号分析和图像识别等多个领域提供低秩表示。在NMF公式中引入$\alpha$-散度增强了优化的灵活性,但将这些方法扩展到多层架构时,确保收敛存在挑战。为此,我们受化学反应中能量势垒的玻尔兹曼概率启发,提出一种新方法从理论上进行收敛分析。我们提出称为Chem-NMF的新方法,通过一个有界因子来稳定收敛。据我们所知,这是第一个从物理化学视角严格分析NMF算法收敛行为的研究。我们从经过数学证明的渐近收敛结果出发,进而展示它们如何适用于真实数据。实验结果表明,所提算法将聚类精度在生物医学信号上提高了5.6% $\pm$ 2.7%,在人脸图像上提高了11.1% $\pm$ 7.2%(均值$\pm$标准差)。
摘要:Non-Negative Matrix Factorization (NMF) is an unsupervised learning method offering low-rank representations across various domains such as audio processing, biomedical signal analysis, and image recognition. The incorporation of $\alpha$-divergence in NMF formulations enhances flexibility in optimization, yet extending these methods to multi-layer architectures presents challenges in ensuring convergence. To address this, we introduce a novel approach inspired by the Boltzmann probability of the energy barriers in chemical reactions to theoretically perform convergence analysis. We introduce a novel method, called Chem-NMF, with a bounding factor which stabilizes convergence. To our knowledge, this is the first study to apply a physical chemistry perspective to rigorously analyze the convergence behaviour of the NMF algorithm. We start from mathematically proven asymptotic convergence results and then show how they apply to real data. Experimental results demonstrate that the proposed algorithm improves clustering accuracy by 5.6% $\pm$ 2.7% on biomedical signals and 11.1% $\pm$ 7.2% on face images (mean $\pm$ std).
【4】The Framework That Survives Bad Models: Human-AI Collaboration For Clinical Trials
标题:能经受坏模型考验的框架:临床试验中的人机协作
链接:https://arxiv.org/abs/2510.06567
作者:Yao Chen, David Ohlssen, Aimee Readie, Gregory Ligozio, Ruvie Martin, Thibaud Coroller
摘要:人工智能(AI)在支持临床试验方面潜力巨大,涵盖患者招募、终点评估以及治疗反应预测。然而,在没有保障措施的情况下部署AI会带来重大风险,尤其是在评估直接影响试验结论的患者终点时。我们在基于医学图像的疾病评估任务上,将两种AI框架与纯人工评估进行比较,衡量成本、准确性、鲁棒性和泛化能力。为对这些框架进行压力测试,我们注入了从随机猜测到朴素预测等各种坏模型,以检验即使在严重的模型退化下,观察到的治疗效应是否依然有效。我们使用两项终点来自脊柱X线图像的随机对照试验对这些框架进行了评估。结果表明,将AI用作辅助阅读者(AI-SR)是最适合临床试验的方法:即便面对坏模型,它也能在各类模型下满足所有标准。该方法始终提供可靠的疾病估计,保持临床试验治疗效应估计与结论不变,并在应用于不同人群时保留这些优势。
摘要:Artificial intelligence (AI) holds great promise for supporting clinical trials, from patient recruitment and endpoint assessment to treatment response prediction. However, deploying AI without safeguards poses significant risks, particularly when evaluating patient endpoints that directly impact trial conclusions. We compared two AI frameworks against human-only assessment for medical image-based disease evaluation, measuring cost, accuracy, robustness, and generalization ability. To stress-test these frameworks, we injected bad models, ranging from random guesses to naive predictions, to ensure that observed treatment effects remain valid even under severe model degradation. We evaluated the frameworks using two randomized controlled trials with endpoints derived from spinal X-ray images. Our findings indicate that using AI as a supporting reader (AI-SR) is the most suitable approach for clinical trials, as it meets all criteria across various model types, even with bad models. This method consistently provides reliable disease estimation, preserves clinical trial treatment effect estimates and conclusions, and retains these advantages when applied to different populations.
蒸馏|知识提取(3篇)
【1】Multi-hop Deep Joint Source-Channel Coding with Deep Hash Distillation for Semantically Aligned Image Retrieval
标题:用于语义对齐图像检索的多跳深度联合源通道编码和深度哈希蒸馏
链接:https://arxiv.org/abs/2510.06868
作者:Didrik Bergström, Deniz Gündüz, Onur Günlü
摘要:我们考虑通过深度联合源通道编码(DeepJSCC)在多跳加性高斯白噪声(AWGN)通道上进行图像传输,方法是使用预训练的深度哈希蒸馏(DHD)模块训练DeepJSCC编码器-解码器对,以对图像进行语义聚类,通过增强语义一致性和提高感知重建质量来促进面向安全的应用。我们训练DeepJSCC模块以降低均方误差(MSE)并最小化源图像和重建图像的DHD散列之间的余弦距离。对于不同的多跳设置,示出了由于语义对齐而显著改善的感知质量,对于这些设置,经典的DeepJSCC可能会遭受噪声积累,这是通过学习的感知图像块相似性(LPIPS)度量来测量的。
摘要:We consider image transmission via deep joint source-channel coding (DeepJSCC) over multi-hop additive white Gaussian noise (AWGN) channels by training a DeepJSCC encoder-decoder pair with a pre-trained deep hash distillation (DHD) module to semantically cluster images, facilitating security-oriented applications through enhanced semantic consistency and improving the perceptual reconstruction quality. We train the DeepJSCC module to both reduce mean square error (MSE) and minimize cosine distance between DHD hashes of source and reconstructed images. Significantly improved perceptual quality as a result of semantic alignment is illustrated for different multi-hop settings, for which classical DeepJSCC may suffer from noise accumulation, measured by the learned perceptual image patch similarity (LPIPS) metric.
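摘要所述“同时最小化MSE与源/重建图像DHD哈希之间余弦距离”的训练目标,可用如下PyTorch草图示意;权重lam与DHD模块接口为假设:

import torch
import torch.nn.functional as F

def deepjscc_dhd_loss(x, x_hat, dhd, lam=0.1):
    # x/x_hat: 源图像与多跳信道后的重建; dhd: 预训练且冻结的哈希网络(假设接口)
    mse = F.mse_loss(x_hat, x)
    with torch.no_grad():
        h_src = dhd(x)                      # 源图像哈希不回传梯度
    cos_dist = 1 - F.cosine_similarity(dhd(x_hat), h_src, dim=-1).mean()
    return mse + lam * cos_dist             # 像素保真 + 语义(哈希)一致性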
【2】Is the Hard-Label Cryptanalytic Model Extraction Really Polynomial?
标题:硬标签密码分析模型提取真的是多项式时间的吗?
链接:https://arxiv.org/abs/2510.06692
作者:Akira Ito, Takayuki Miura, Yosuke Todo
摘要:深度神经网络(DNN)已经引起了极大的关注,其内部模型如今被视为有价值的知识资产。通过对DNN的访问来提取这些内部模型,在概念上类似于通过对分组密码的oracle访问来提取密钥。因此,密码分析技术,特别是类差分攻击,最近得到了积极探索。基于ReLU的DNN是最常见、部署最广泛的架构。早期工作(例如Crypto 2020、Eurocrypt 2024)假设可以获得通常不可见的精确输出logits,而较新的工作(例如Asiacrypt 2024、Eurocrypt 2025)则关注硬标签设置,即攻击者只能获得最终的分类结果(例如“狗”或“车”)。值得注意的是,Carlini等人(Eurocrypt 2025)证明,即使在这种受限设置下,模型提取在多项式时间内也是可行的。在本文中,我们首先表明,随着攻击目标深度的增长,他们攻击背后的假设变得越来越不切实际。在实践中,满足这些假设需要相对于攻击深度呈指数级数量的查询,这意味着该攻击并不总是在多项式时间内运行。为了解决这一关键局限,我们提出一种称为CrossLayer Extraction的新攻击方法:我们不直接提取特定神经元的秘密参数(例如权重和偏置)(这会产生指数级成本),而是利用跨层的神经元交互从更深的层中提取这些信息。该技术显著降低了查询复杂度,并缓解了现有模型提取方法的局限性。
摘要:Deep Neural Networks (DNNs) have attracted significant attention, and their internal models are now considered valuable intellectual assets. Extracting these internal models through access to a DNN is conceptually similar to extracting a secret key via oracle access to a block cipher. Consequently, cryptanalytic techniques, particularly differential-like attacks, have been actively explored recently. ReLU-based DNNs are the most commonly and widely deployed architectures. While early works (e.g., Crypto 2020, Eurocrypt 2024) assume access to exact output logits, which are usually invisible, more recent works (e.g., Asiacrypt 2024, Eurocrypt 2025) focus on the hard-label setting, where only the final classification result (e.g., "dog" or "car") is available to the attacker. Notably, Carlini et al. (Eurocrypt 2025) demonstrated that model extraction is feasible in polynomial time even under this restricted setting. In this paper, we first show that the assumptions underlying their attack become increasingly unrealistic as the attack-target depth grows. In practice, satisfying these assumptions requires an exponential number of queries with respect to the attack depth, implying that the attack does not always run in polynomial time. To address this critical limitation, we propose a novel attack method called CrossLayer Extraction. Instead of directly extracting the secret parameters (e.g., weights and biases) of a specific neuron, which incurs exponential cost, we exploit neuron interactions across layers to extract this information from deeper layers. This technique significantly reduces query complexity and mitigates the limitations of existing model extraction approaches.
【3】GUIDE: Guided Initialization and Distillation of Embeddings
标题:GUIDE:嵌入的引导初始化与蒸馏
链接:https://arxiv.org/abs/2510.06502
作者:Khoa Trinh, Gaurav Menghani, Erik Vee
摘要:在训练期间有较大的教师模型可供较小的学生模型学习时,蒸馏(\cite{hinton2015distillation})等算法效率技术可以在不增加服务成本的情况下提升模型质量。标准蒸馏方法仅限于强迫学生匹配教师的输出。考虑到训练大模型的成本,我们认为应该从教师模型中提取比仅让学生匹配教师输出更多的有用信息。在本文中,我们提出\guide(嵌入的引导初始化与蒸馏)。\guide可被视为一种迫使学生在参数空间中匹配教师的蒸馏技术。使用\guide,当采用在约200亿个token上训练的大型学生模型(400M-1B参数)时,师生质量差距缩小了25-26%。我们还给出了细致的分析,证明\guide可以与知识蒸馏相结合,带来近乎可加的改进;并且表明,单独应用\guide得到的模型质量显著优于单独应用知识蒸馏。最重要的是,\guide不引入任何训练或推理开销,因此我们方法带来的任何模型质量增益几乎是免费的。
摘要:Algorithmic efficiency techniques such as distillation (\cite{hinton2015distillation}) are useful in improving model quality without increasing serving costs, provided a larger teacher model is available for a smaller student model to learn from during training. Standard distillation methods are limited to only forcing the student to match the teacher's outputs. Given the costs associated with training a large model, we believe we should be extracting more useful information from a teacher model than by just making the student match the teacher's outputs. In this paper, we introduce \guide (Guided Initialization and Distillation of Embeddings). \guide can be considered a distillation technique that forces the student to match the teacher in the parameter space. Using \guide we show 25-26\% reduction in the teacher-student quality gap when using large student models (400M - 1B parameters) trained on $\approx$ 20B tokens. We also present a thorough analysis demonstrating that \guide can be combined with knowledge distillation with near additive improvements. Furthermore, we show that applying \guide alone leads to substantially better model quality than applying knowledge distillation by itself. Most importantly, \guide introduces no training or inference overhead and hence any model quality gains from our method are virtually free.
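“在参数空间中匹配教师”这一思想的一种最简形式可用PyTorch示意如下;这里假设师生各层已一一对应且形状一致(例如学生由教师的同形子网初始化而来),这只是示意性假设,并非\guide的实际对齐方式:

import torch

def param_match_loss(student_params, teacher_params):
    # 对每对形状一致的参数做L2匹配,把学生拉向教师所在的参数空间区域
    return sum((ws - wt.detach()).pow(2).mean()
               for ws, wt in zip(student_params, teacher_params))

# 总损失示意(系数alpha/beta为假设):
# total = task_loss + alpha * kd_loss + beta * param_match_loss(...)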
聚类(4篇)
【1】DPMM-CFL: Clustered Federated Learning via Dirichlet Process Mixture Model Nonparametric Clustering
标题:DPMM-CFL:基于Dirichlet过程混合模型非参数聚类的聚类联邦学习
链接:https://arxiv.org/abs/2510.07132
作者:Mariona Jaramillo-Civill, Peng Wu, Pau Closas
备注:5 pages, 2 figures
摘要:聚类联邦学习(CFL)通过对客户端进行聚类并在每个聚类中训练一个模型,在全局模型与完全个性化模型之间取得平衡,从而提升非IID客户端异构性下的性能。然而,大多数CFL方法需要事先固定聚类数K,当潜在结构未知时这并不现实。我们提出DPMM-CFL,这是一种在聚类参数的分布上设置狄利克雷过程(DP)先验的CFL算法。这使得非参数贝叶斯推断能够联合推断聚类数量与客户端分配,同时优化每个聚类的联邦目标。如本文所述,由此得到的方法在每一轮中将联邦更新与聚类推断耦合起来。该算法在Dirichlet与按类划分的非IID划分下的基准数据集上得到了验证。
摘要:Clustered Federated Learning (CFL) improves performance under non-IID client heterogeneity by clustering clients and training one model per cluster, thereby balancing between a global model and fully personalized models. However, most CFL methods require the number of clusters K to be fixed a priori, which is impractical when the latent structure is unknown. We propose DPMM-CFL, a CFL algorithm that places a Dirichlet Process (DP) prior over the distribution of cluster parameters. This enables nonparametric Bayesian inference to jointly infer both the number of clusters and client assignments, while optimizing per-cluster federated objectives. This results in a method where, at each round, federated updates and cluster inferences are coupled, as presented in this paper. The algorithm is validated on benchmark datasets under Dirichlet and class-split non-IID partitions.
【2】Angular Constraint Embedding via SpherePair Loss for Constrained Clustering
标题:通过SpherePair损失进行角度约束嵌入以实现约束聚类
链接:https://arxiv.org/abs/2510.06907
作者:Shaojie Zhang, Ke Chen
备注:Accepted by NeurIPS 2025, 6 Figures and 1 Table in Main text, 18 Figures and 5 Tables in Appendices
摘要:约束聚类通过成对约束集成领域知识。然而,现有的深度约束聚类(DCC)方法要么受到端到端建模中固有的锚点的限制,要么在学习区分性欧几里德嵌入方面遇到困难,从而限制了它们的可扩展性和现实世界的适用性。为了避免各自的陷阱,我们提出了一种新的角度约束嵌入方法DCC,称为SpherePair。使用具有几何公式的SpherePair损失,我们的方法忠实地编码成对约束,并导致在角空间中聚类友好的嵌入,有效地将表示学习与聚类分离。SpherePair保留了成对关系而没有冲突,消除了指定簇的确切数量的需要,推广到看不见的数据,可以快速推断簇的数量,并得到严格的理论保证的支持。在不同的基准上与最先进的DCC方法进行比较评估,以及对理论见解的实证验证,证实了其卓越的性能,可扩展性和整体现实世界的有效性。代码可在\href{https://github.com/spherepaircc/spherepairCC/tree/main}{我们的存储库}中找到。
摘要:Constrained clustering integrates domain knowledge through pairwise constraints. However, existing deep constrained clustering (DCC) methods are either limited by anchors inherent in end-to-end modeling or struggle with learning discriminative Euclidean embedding, restricting their scalability and real-world applicability. To avoid their respective pitfalls, we propose a novel angular constraint embedding approach for DCC, termed SpherePair. Using the SpherePair loss with a geometric formulation, our method faithfully encodes pairwise constraints and leads to embeddings that are clustering-friendly in angular space, effectively separating representation learning from clustering. SpherePair preserves pairwise relations without conflict, removes the need to specify the exact number of clusters, generalizes to unseen data, enables rapid inference of the number of clusters, and is supported by rigorous theoretical guarantees. Comparative evaluations with state-of-the-art DCC methods on diverse benchmarks, along with empirical validation of theoretical insights, confirm its superior performance, scalability, and overall real-world effectiveness. Code is available at \href{https://github.com/spherepaircc/SpherePairCC/tree/main}{our repository}.
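把成对约束编码到角空间的思想可用如下示意性PyTorch损失说明:将must-link对的嵌入在单位球面上拉近、cannot-link对压到某个余弦间隔以下。该损失形式与margin均为假设,并非论文SpherePair损失的原式:

import torch
import torch.nn.functional as F

def angular_pair_loss(z_i, z_j, is_must_link, margin=0.0):
    # z_i, z_j: 成批的成对嵌入; is_must_link: 布尔张量,标记约束类型
    zi, zj = F.normalize(z_i, dim=-1), F.normalize(z_j, dim=-1)
    cos = (zi * zj).sum(dim=-1)
    pos = 1 - cos                         # must-link:夹角越小越好
    neg = F.relu(cos - margin)            # cannot-link:相似度超过margin才惩罚
    return torch.where(is_must_link, pos, neg).mean()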
【3】Cluster Paths: Navigating Interpretability in Neural Networks
标题:集群路径:神经网络中的可解释性导航
链接:https://arxiv.org/abs/2510.06541
作者:Nicholas M. Kroeger, Vincent Bindschaedler
摘要:虽然现代深度神经网络在视觉任务中取得了令人印象深刻的表现,但它们在决策过程中仍然不透明,存在不必要的信任、未被发现的偏见和意外失败的风险。我们提出了集群路径,一个事后的可解释性方法,集群激活在选定的层,并表示每个输入作为其序列的集群ID。为了评估这些集群路径,我们引入了四个指标:路径复杂性(认知负荷),加权路径纯度(类对齐),决策对齐忠诚度(预测保真度)和路径协议(扰动下的稳定性)。在虚假线索CIFAR-10实验中,聚类路径识别基于颜色的快捷方式,并在删除线索时崩溃。在一个五类CelebA头发颜色的任务中,他们实现了90%的忠诚度,并在高斯噪声下保持96%的一致性,而不会牺牲准确性。扩展到在ImageNet上预训练的Vision Transformer,我们将聚类路径扩展到从最小路径分歧上提示大型语言模型中导出的概念路径。最后,我们表明,集群路径可以作为一个有效的分布外(OOD)检测器,可靠地标记异常样本之前,模型生成过度自信的预测。聚类路径揭示了多个网络深度的视觉概念,如调色板、纹理或对象上下文,表明聚类路径可扩展到大型视觉模型,同时生成简洁且人类可读的解释。
摘要:While modern deep neural networks achieve impressive performance in vision tasks, they remain opaque in their decision processes, risking unwarranted trust, undetected biases and unexpected failures. We propose cluster paths, a post-hoc interpretability method that clusters activations at selected layers and represents each input as its sequence of cluster IDs. To assess these cluster paths, we introduce four metrics: path complexity (cognitive load), weighted-path purity (class alignment), decision-alignment faithfulness (predictive fidelity), and path agreement (stability under perturbations). In a spurious-cue CIFAR-10 experiment, cluster paths identify color-based shortcuts and collapse when the cue is removed. On a five-class CelebA hair-color task, they achieve 90% faithfulness and maintain 96% agreement under Gaussian noise without sacrificing accuracy. Scaling to a Vision Transformer pretrained on ImageNet, we extend cluster paths to concept paths derived from prompting a large language model on minimal path divergences. Finally, we show that cluster paths can serve as an effective out-of-distribution (OOD) detector, reliably flagging anomalous samples before the model generates over-confident predictions. Cluster paths uncover visual concepts, such as color palettes, textures, or object contexts, at multiple network depths, demonstrating that cluster paths scale to large vision models while generating concise and human-readable explanations.
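摘要中“在选定层聚类激活、把每个输入表示为簇ID序列”的做法可以用sklearn直观示意;聚类层的选取与簇数k为假设:

import numpy as np
from sklearn.cluster import KMeans

def fit_cluster_paths(layer_acts, k=8, seed=0):
    # layer_acts: 每个选定层的激活矩阵列表,形状均为 [N, d_l]
    kms = [KMeans(n_clusters=k, random_state=seed, n_init=10).fit(a)
           for a in layer_acts]
    paths = np.stack([km.labels_ for km in kms], axis=1)   # [N, 层数] 的簇ID序列
    return kms, paths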
【4】Bayesian Nonparametric Dynamical Clustering of Time Series
标题:时间序列的贝叶斯非参数动态聚类
链接:https://arxiv.org/abs/2510.06919
作者:Adrián Pérez-Herrero, Paulo Félix, Jesús Presedo, Carl Henrik Ek
备注:This work has been submitted to the IEEE for possible publication. 15 pages. 9 figures
摘要:我们提出一种方法,通过在数量未知的线性动态机制之间切换,来建模数量不受限的时间序列聚类的演化。我们发展了一种贝叶斯非参数方法:以分层Dirichlet过程作为切换线性动力系统参数的先验,并用高斯过程先验来建模每个聚类内幅度与时间对齐的统计变化。通过对时间序列模式的演化进行建模,该方法以有原则的方式避免了聚类的不必要增殖。我们为离线与在线场景构造变分下界进行推断,从而通过优化实现高效学习。我们通过若干使用公开数据库的心电图分析案例研究,说明了该方法的多功能性与有效性。
摘要:We present a method that models the evolution of an unbounded number of time series clusters by switching among an unknown number of regimes with linear dynamics. We develop a Bayesian non-parametric approach using a hierarchical Dirichlet process as a prior on the parameters of a Switching Linear Dynamical System and a Gaussian process prior to model the statistical variations in amplitude and temporal alignment within each cluster. By modeling the evolution of time series patterns, the method avoids unnecessary proliferation of clusters in a principled manner. We perform inference by formulating a variational lower bound for off-line and on-line scenarios, enabling efficient learning through optimization. We illustrate the versatility and effectiveness of the approach through several case studies of electrocardiogram analysis using publicly available databases.
超分辨率|去噪|去模糊|去雾(1篇)
【1】Conditional Denoising Diffusion Model-Based Robust MR Image Reconstruction from Highly Undersampled Data
标题:基于条件去噪扩散模型的高度欠采样数据稳健MR图像重建
链接:https://arxiv.org/abs/2510.06335
作者:Mohammed Alsubaie, Wenxi Liu, Linxia Gu, Ovidiu C. Andronesi, Sirani M. Perera, Xianqi Li
摘要:磁共振成像(MRI)是现代医学诊断中的关键工具,但其延长的采集时间仍然是一个关键的限制,特别是在时间敏感的临床场景中。虽然欠采样策略可以加速图像采集,但它们通常会导致图像伪影和质量下降。最近的扩散模型已经显示出通过学习强大的图像先验来从欠采样数据重建高保真图像的前景;然而,大多数现有方法要么(i)依赖于无监督的评分函数,没有配对监督,要么(ii)仅将数据一致性作为后处理步骤。在这项工作中,我们引入了一个具有迭代数据一致性校正的条件去噪扩散框架,该框架与以前的方法不同,它将测量模型直接嵌入到每个反向扩散步骤中,并在成对的欠采样地面真实数据上训练模型。这种混合设计将生成灵活性与MRI物理学的明确实施联系起来。在fastMRI数据集上的实验表明,我们的框架在SSIM、PSNR和LPIPS方面始终优于最近最先进的深度学习和基于扩散的方法,LPIPS更忠实地捕捉感知改善。这些结果表明,将条件监督与迭代一致性更新相结合,在像素级保真度和感知真实性方面都有了实质性的改善,为稳健、加速的MRI重建建立了原则性和实用性的进步。
摘要:Magnetic Resonance Imaging (MRI) is a critical tool in modern medical diagnostics, yet its prolonged acquisition time remains a critical limitation, especially in time-sensitive clinical scenarios. While undersampling strategies can accelerate image acquisition, they often result in image artifacts and degraded quality. Recent diffusion models have shown promise for reconstructing high-fidelity images from undersampled data by learning powerful image priors; however, most existing approaches either (i) rely on unsupervised score functions without paired supervision or (ii) apply data consistency only as a post-processing step. In this work, we introduce a conditional denoising diffusion framework with iterative data-consistency correction, which differs from prior methods by embedding the measurement model directly into every reverse diffusion step and training the model on paired undersampled-ground truth data. This hybrid design bridges generative flexibility with explicit enforcement of MRI physics. Experiments on the fastMRI dataset demonstrate that our framework consistently outperforms recent state-of-the-art deep learning and diffusion-based methods in SSIM, PSNR, and LPIPS, with LPIPS capturing perceptual improvements more faithfully. These results demonstrate that integrating conditional supervision with iterative consistency updates yields substantial improvements in both pixel-level fidelity and perceptual realism, establishing a principled and practical advance toward robust, accelerated MRI reconstruction.
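摘要强调把测量模型嵌入每个反向扩散步;对于MRI的掩码傅里叶测量,一步数据一致性校正的常见形式可用numpy示意如下(步长与嵌入位置为假设,并非论文的精确流程):

import numpy as np

def data_consistency(x, y, mask, step=1.0):
    # x: 当前重建(图像域,可为复数); y: 欠采样k空间测量; mask: 采样掩码
    # 按梯度 x <- x - step * A^H (A x - y) 校正,A 为掩码傅里叶算子
    k = mask * np.fft.fft2(x, norm="ortho")
    grad = np.fft.ifft2(mask * (k - y), norm="ortho")
    return x - step * grad     # 幅值图像场景下可在最后取实部/模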
点云|SLAM|雷达|激光|深度RGBD相关(2篇)
【1】An in-depth look at approximation via deep and narrow neural networks
标题:深入研究通过深度和狭窄的神经网络进行逼近
链接:https://arxiv.org/abs/2510.07202
作者:Joris Dommel, Sven A. Wegner
备注:11 pages
摘要:2017年,Hanin和Sellke证明:宽度为w的任意深度、实值、前馈、ReLU激活网络类,在紧集上一致收敛的拓扑意义下构成R^n上连续函数空间的稠密子集,当且仅当w>n。为证明必要性,他们使用了一个具体的反例函数f:R^n->R。在本文中,我们在上述阈值附近的w=n和w=n+1两种情形下,实际用神经网络来逼近这个f。我们研究当改变深度时逼近质量如何变化,以及是什么效应(剧透:死亡神经元)导致了这种行为。
摘要:In 2017, Hanin and Sellke showed that the class of arbitrarily deep, real-valued, feed-forward and ReLU-activated networks of width w forms a dense subset of the space of continuous functions on R^n, with respect to the topology of uniform convergence on compact sets, if and only if w>n holds. To show the necessity, a concrete counterexample function f:R^n->R was used. In this note we actually approximate this very f by neural networks in the two cases w=n and w=n+1 around the aforementioned threshold. We study how the approximation quality behaves if we vary the depth and what effect (spoiler alert: dying neurons) cause that behavior.
【2】Scalable deep fusion of spaceborne lidar and synthetic aperture radar for global forest structural complexity mapping
标题:星载激光雷达与合成孔径雷达的可扩展深度融合,用于全球森林结构复杂性制图
链接:https://arxiv.org/abs/2510.06299
作者:Tiago de Conto, John Armston, Ralph Dubayah
摘要:森林结构复杂性度量将多个林冠属性整合为反映栖息地质量和生态系统功能的单一值。全球生态系统动力学调查(GEDI)的星载激光雷达使温带和热带森林结构复杂性的测绘成为可能,但其稀疏的采样限制了连续的高分辨率测绘。我们提出了一个可扩展的深度学习框架,将GEDI观测与多模态合成孔径雷达(SAR)数据集融合,以生成森林结构复杂性的全球高分辨率(25米)地图。我们经过调整的EfficientNetV2架构在超过1.3亿个GEDI足迹上进行了训练,实现了不到40万个参数的高性能(全局R2 = 0.82),使其成为一种可访问的工具,使研究人员能够处理任何规模的数据集,而无需专门的计算基础设施。该模型产生准确的预测与校准的不确定性估计跨生物群落和时间段,保持精细尺度的空间格局。它已被用于生成2015年至2022年森林结构复杂性的全球多时相数据集。通过迁移学习,该框架可以扩展到以最小的计算成本预测其他森林结构变量。这一方法支持对全球森林结构动态进行持续的多时段监测,并为在不断变化的气候中开展生物多样性养护和生态系统管理工作提供工具。
摘要:Forest structural complexity metrics integrate multiple canopy attributes into a single value that reflects habitat quality and ecosystem function. Spaceborne lidar from the Global Ecosystem Dynamics Investigation (GEDI) has enabled mapping of structural complexity in temperate and tropical forests, but its sparse sampling limits continuous high-resolution mapping. We present a scalable, deep learning framework fusing GEDI observations with multimodal Synthetic Aperture Radar (SAR) datasets to produce global, high-resolution (25 m) wall-to-wall maps of forest structural complexity. Our adapted EfficientNetV2 architecture, trained on over 130 million GEDI footprints, achieves high performance (global R2 = 0.82) with fewer than 400,000 parameters, making it an accessible tool that enables researchers to process datasets at any scale without requiring specialized computing infrastructure. The model produces accurate predictions with calibrated uncertainty estimates across biomes and time periods, preserving fine-scale spatial patterns. It has been used to generate a global, multi-temporal dataset of forest structural complexity from 2015 to 2022. Through transfer learning, this framework can be extended to predict additional forest structural variables with minimal computational cost. This approach supports continuous, multi-temporal monitoring of global forest structural dynamics and provides tools for biodiversity conservation and ecosystem management efforts in a changing climate.
联邦学习|隐私保护|加密(1篇)
【1】Layerwise Federated Learning for Heterogeneous Quantum Clients using Quorus
标题:使用Quorus对异构量子客户端进行分层联邦学习
链接:https://arxiv.org/abs/2510.06228
作者:Jason Han, Nicholas S. DiBrita, Daniel Leeds, Jianqiang Li, Jason Ludmir, Tirthak Patel
摘要:量子机器学习(QML)有望解决经典上难解的问题,但由于关键数据可能分散在各私有客户端上,需要量子联邦学习(QFL)形式的分布式QML。然而,不同客户端可访问的量子计算机可能容易出错且错误特性各异,要求它们运行不同深度的电路。我们提出一种针对该QFL问题的新方案Quorus,它利用分层损失函数来有效训练不同深度的量子模型,允许客户端根据各自的能力选择可产生高保真输出的模型。Quorus还根据客户端需求提供多种模型设计,针对shot预算、量子比特数、电路中段测量和优化空间进行优化。我们的仿真与真实硬件结果显示了Quorus的前景:它增大了更深客户端的梯度幅度,并将测试精度较最先进方法平均提高12.4%。
摘要:Quantum machine learning (QML) holds the promise to solve classically intractable problems, but, as critical data can be fragmented across private clients, there is a need for distributed QML in a quantum federated learning (QFL) format. However, the quantum computers that different clients have access to can be error-prone and have heterogeneous error properties, requiring them to run circuits of different depths. We propose a novel solution to this QFL problem, Quorus, that utilizes a layerwise loss function for effective training of varying-depth quantum models, which allows clients to choose models for high-fidelity output based on their individual capacity. Quorus also presents various model designs based on client needs that optimize for shot budget, qubit count, midcircuit measurement, and optimization space. Our simulation and real-hardware results show the promise of Quorus: it increases the magnitude of gradients of higher depth clients and improves testing accuracy by 12.4% on average over the state-of-the-art.
推理|分析|理解|解释(14篇)
【1】A Multi-Agent Framework for Stateful Inference-Time Search
标题:一种用于有状态推理时搜索的多智能体框架
链接:https://arxiv.org/abs/2510.07147
作者:Arshika Lalan, Rajat Ghosh, Aditya Kolsur, Debojyoti Dutta
摘要:最近的工作探讨了代理推理时间技术来执行结构化的,多步推理。然而,由于缺乏持久状态,无状态推理经常在多步任务上挣扎。此外,特定于任务的微调或预调通常可以实现表面级别的代码生成,但在需要更深层次的推理和长期依赖关系的任务上仍然很脆弱。为了解决这些局限性,我们提出了有状态的多智能体进化搜索,这是一个无需训练的框架,它通过结合(i)持久的推理时间状态,(ii)对抗性突变和(iii)进化保存来脱离先前的无状态方法。我们证明了它的有效性,在自动化单元测试生成的边缘情况下,通过生成。我们使用进化搜索过程生成强大的边缘案例,其中专业代理依次提出,变异和评分候选人。控制器在各代之间保持持久状态,而进化保存则确保了所有可能情况下的多样性和探索性。这产生了一个多面手代理,能够发现强大的,高覆盖率的边缘情况下,在看不见的代码库。实验表明,我们的有状态多智能体推理框架在覆盖率方面比无状态单步基线有了很大的提高,在普遍的单元测试基准测试(如HumanEval和TestGenEvalMini)上进行了评估,并使用了三个不同的LLM家族- Llama,Gemma和GPT。这些结果表明,结合持久推理时间状态与进化搜索实质上提高了单元测试生成。
摘要:Recent work explores agentic inference-time techniques to perform structured, multi-step reasoning. However, stateless inference often struggles on multi-step tasks due to the absence of persistent state. Moreover, task-specific fine-tuning or instruction-tuning often achieve surface-level code generation but remain brittle on tasks requiring deeper reasoning and long-horizon dependencies. To address these limitations, we propose stateful multi-agent evolutionary search, a training-free framework that departs from prior stateless approaches by combining (i) persistent inference-time state, (ii) adversarial mutation, and (iii) evolutionary preservation. We demonstrate its effectiveness in automated unit test generation through the generation of edge cases. We generate robust edge cases using an evolutionary search process, where specialized agents sequentially propose, mutate, and score candidates. A controller maintains persistent state across generations, while evolutionary preservation ensures diversity and exploration across all possible cases. This yields a generalist agent capable of discovering robust, high-coverage edge cases across unseen codebases. Experiments show our stateful multi-agent inference framework achieves substantial gains in coverage over stateless single-step baselines, evaluated on prevalent unit-testing benchmarks such as HumanEval and TestGenEvalMini and using three diverse LLM families - Llama, Gemma, and GPT. These results indicate that combining persistent inference-time state with evolutionary search materially improves unit-test generation.
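摘要所述“提出-变异-打分 + 持久状态 + 进化保留”的循环可以用如下Python草图示意;propose/mutate/score三个接口对应文中三类专门智能体的职责(均为假设接口),state在代际间持久保留:

import random

def evolve_edge_cases(propose, mutate, score, generations=10, pop=20, keep=5):
    state = {"best": [], "history": []}                 # 持久的推理时状态
    population = [propose() for _ in range(pop)]
    for _ in range(generations):
        scored = sorted(population, key=score, reverse=True)
        elites = scored[:keep]                          # 进化保留:精英直接进入下一代
        state["best"] = elites
        state["history"].append([score(c) for c in elites])
        population = elites + [mutate(random.choice(elites))
                               for _ in range(pop - keep)]
    return state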
【2】Non-Asymptotic Analysis of Efficiency in Conformalized Regression
标题:保形回归效率的非渐近分析
链接:https://arxiv.org/abs/2510.07093
作者:Yunzhen Yao, Lie He, Michael Gastpar
摘要:保形预测为预测集提供覆盖保证。保形预测的信息量取决于其效率,通常由预测集的期望大小来量化。以往关于保形回归效率的工作通常将误覆盖水平$\alpha$视为固定常数。在这项工作中,我们在对数据分布的温和假设下,为通过SGD训练的保形分位数回归与中位数回归,建立了预测集长度相对于oracle区间长度偏差的非渐近界。我们的阶为$\mathcal{O}(1/\sqrt{n} + 1/(\alpha^2 n) + 1/\sqrt{m} + \exp(-\alpha^2 m))$的界,刻画了效率对真正训练集大小$n$、校准集大小$m$与误覆盖水平$\alpha$的联合依赖。该结果识别出收敛速率在不同$\alpha$区间上的相变,为分配数据以控制多余预测集长度提供了指导。实证结果与我们的理论发现一致。
摘要:Conformal prediction provides prediction sets with coverage guarantees. The informativeness of conformal prediction depends on its efficiency, typically quantified by the expected size of the prediction set. Prior work on the efficiency of conformalized regression commonly treats the miscoverage level $\alpha$ as a fixed constant. In this work, we establish non-asymptotic bounds on the deviation of the prediction set length from the oracle interval length for conformalized quantile and median regression trained via SGD, under mild assumptions on the data distribution. Our bounds of order $\mathcal{O}(1/\sqrt{n} + 1/(\alpha^2 n) + 1/\sqrt{m} + \exp(-\alpha^2 m))$ capture the joint dependence of efficiency on the proper training set size $n$, the calibration set size $m$, and the miscoverage level $\alpha$. The results identify phase transitions in convergence rates across different regimes of $\alpha$, offering guidance for allocating data to control excess prediction set length. Empirical results are consistent with our theoretical findings.
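作为背景,本文所分析的保形分位数回归(CQR)的标准区间构造可用numpy示意如下:用校准集残差分数的分位数来加宽/收缩分位数回归区间。这是文献中的标准CQR流程,并非本文提出的新方法:

import numpy as np

def cqr_interval(q_lo, q_hi, y_cal, q_lo_cal, q_hi_cal, alpha=0.1):
    # q_lo/q_hi: 测试点的下/上分位数预测; *_cal: 校准集上的对应量
    scores = np.maximum(q_lo_cal - y_cal, y_cal - q_hi_cal)   # 保形分数
    m = len(y_cal)
    level = min(1.0, np.ceil((m + 1) * (1 - alpha)) / m)
    qhat = np.quantile(scores, level, method="higher")
    return q_lo - qhat, q_hi + qhat    # 覆盖率约为 1-alpha 的预测区间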
【3】SaFeR-VLM: Toward Safety-aware Fine-grained Reasoning in Multimodal Models
标题:SaFeR-VLM:迈向多模态模型中的安全感知细粒度推理
链接:https://arxiv.org/abs/2510.06871
作者:Huahui Yi, Kun Wang, Qiankun Li, Miao Yu, Liang Lin, Gongli Xi, Hao Wu, Xuming Hu, Kang Li, Yang Liu
摘要:多模态大型推理模型(MLRM)展现出令人印象深刻的跨模态推理能力,但在对抗性或不安全提示下往往会放大安全风险,我们称这种现象为“推理税”。现有防御主要作用于输出层面,不约束推理过程,使模型暴露于隐性风险之中。在本文中,我们提出SaFeR-VLM,一个将安全性直接嵌入多模态推理的安全对齐强化学习框架。该框架集成四个组件:(I)QI-Safe-10K,一个强调安全关键与推理敏感案例的精选数据集;(II)安全感知rollout,不安全的生成会经历反思与纠正而非被丢弃;(III)结构化奖励建模,采用多维加权标准并对幻觉和矛盾施加显式惩罚;(IV)GRPO优化,强化安全及已纠正的轨迹。这一统一设计将安全从被动保障转变为推理的主动驱动力,实现可扩展、可泛化的安全感知推理。SaFeR-VLM还对显性和隐性风险表现出鲁棒性,支持超越表层过滤的动态、可解释的安全决策。SaFeR-VLM-3B在六个基准上的安全性与有用性平均成绩分别达到70.13和78.97,超越同等规模以及大10倍以上的模型,如Skywork-R1V3-38B、Qwen2.5VL-72B和GLM4.5V-106B。值得注意的是,SaFeR-VLM-7B受益于更大的规模,在安全指标上分别超出GPT-5-mini和Gemini-2.5-Flash 6.47分和16.76分,且在有用性上没有任何下降。我们的代码可在https://github.com/HarveyYi/SaFeR-VLM上获得。
摘要:Multimodal Large Reasoning Models (MLRMs) demonstrate impressive cross-modal reasoning but often amplify safety risks under adversarial or unsafe prompts, a phenomenon we call the \textit{Reasoning Tax}. Existing defenses mainly act at the output level and do not constrain the reasoning process, leaving models exposed to implicit risks. In this paper, we propose SaFeR-VLM, a safety-aligned reinforcement learning framework that embeds safety directly into multimodal reasoning. The framework integrates four components: (I) QI-Safe-10K, a curated dataset emphasizing safety-critical and reasoning-sensitive cases; (II) safety-aware rollout, where unsafe generations undergo reflection and correction instead of being discarded; (III) structured reward modeling with multi-dimensional weighted criteria and explicit penalties for hallucinations and contradictions; and (IV) GRPO optimization, which reinforces both safe and corrected trajectories. This unified design shifts safety from a passive safeguard to an active driver of reasoning, enabling scalable and generalizable safety-aware reasoning. SaFeR-VLM further demonstrates robustness against both explicit and implicit risks, supporting dynamic and interpretable safety decisions beyond surface-level filtering. SaFeR-VLM-3B achieves average performance $70.13$ and $78.97$ on safety and helpfulness across six benchmarks, surpassing both same-scale and $>10\times$ larger models such as Skywork-R1V3-38B, Qwen2.5VL-72B, and GLM4.5V-106B. Remarkably, SaFeR-VLM-7B benefits from its increased scale to surpass GPT-5-mini and Gemini-2.5-Flash by \num{6.47} and \num{16.76} points respectively on safety metrics, achieving this improvement without any degradation in helpfulness performance. Our codes are available at https://github.com/HarveyYi/SaFeR-VLM.
【4】Enhancing Bankruptcy Prediction of Banks through Advanced Machine Learning Techniques: An Innovative Approach and Analysis
标题:通过先进的机器学习技术增强银行破产预测:创新方法和分析
链接:https://arxiv.org/abs/2510.06852
作者:Zuherman Rustam, Sri Hartini, Sardar M.N. Islam, Fevi Novkaniza, Fiftitah R. Aszhari, Muhammad Rifqi
摘要:金融体系的稳定性取决于银行体系的状况。银行倒闭可能会破坏金融体系的稳定,因为银行面临系统性风险,不仅影响个别银行,而且影响部分或整个金融体系。计算银行破产的概率是确保银行体系安全和稳健的一种方法。现有文献和局限性:统计模型,如Altman的Z分数,是开发破产预测模型的常用技术之一。然而,统计方法依赖于严格的,有时不相关的假设,这可能导致预测准确性低。必须采取新的办法。研究目的:破产模型是使用机器学习技术开发的,例如逻辑回归(LR),随机森林(RF)和支持向量机(SVM)。根据几项研究,机器学习在分类和预测银行风险管理方面也比统计方法更准确和有效。目前的研究:商业银行数据来自土耳其44家活跃银行和21家破产银行1994年至2004年的年度财务报表,农村银行数据来自印度尼西亚43家活跃银行和43家破产农村银行2013年至2019年的季度财务报告。还挑选了印度尼西亚的五家农村银行,以证明分析银行破产趋势的可行性。研究结果和影响:研究实验的结果表明,RF对商业银行数据的预测准确率可达90%。此外,提出的三种机器学习方法准确地预测了农村银行破产的可能性。贡献和结论:提出的创新机器学习方法有助于实施降低破产成本的政策。
摘要:Context: Financial system stability is determined by the condition of the banking system. A bank failure can destroy the stability of the financial system, as banks are subject to systemic risk, affecting not only individual banks but also segments or the entire financial system. Calculating the probability of a bank going bankrupt is one way to ensure the banking system is safe and sound. Existing literature and limitations: Statistical models, such as Altman's Z-Score, are one of the common techniques for developing a bankruptcy prediction model. However, statistical methods rely on rigid and sometimes irrelevant assumptions, which can result in low forecast accuracy. New approaches are necessary. Objective of the research: Bankruptcy models are developed using machine learning techniques, such as logistic regression (LR), random forest (RF), and support vector machines (SVM). According to several studies, machine learning is also more accurate and effective than statistical methods for categorising and forecasting banking risk management. Present Research: The commercial bank data are derived from the annual financial statements of 44 active banks and 21 bankrupt banks in Turkey from 1994 to 2004, and the rural bank data are derived from the quarterly financial reports of 43 active and 43 bankrupt rural banks in Indonesia between 2013 and 2019. Five rural banks in Indonesia have also been selected to demonstrate the feasibility of analysing bank bankruptcy trends. Findings and implications: The results of the research experiments show that RF can forecast data from commercial banks with a 90% accuracy rate. Furthermore, the three machine learning methods proposed accurately predict the likelihood of rural bank bankruptcy. Contribution and Conclusion: The proposed innovative machine learning approach helps to implement policies that reduce the costs of bankruptcy.
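A minimal sketch of the modeling setup the abstract describes (LR, RF, and SVM classifiers compared on bank solvency data). Since the Turkish and Indonesian financial statements are not bundled here, a synthetic dataset with a similar class balance stands in; all sizes and hyperparameters are illustrative.

```python
# Compare LR, RF, and SVM on a synthetic stand-in for bank financial ratios.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# ~44 active vs 21 bankrupt banks, mirroring the commercial bank sample size.
X, y = make_classification(n_samples=65, n_features=10, n_informative=6,
                           weights=[0.68, 0.32], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

for name, clf in [("LR", LogisticRegression(max_iter=1000)),
                  ("RF", RandomForestClassifier(n_estimators=200, random_state=0)),
                  ("SVM", SVC(kernel="rbf"))]:
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: accuracy = {acc:.2f}")
```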
【5】Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness
标题:获取财富或死亡缩放:盈利交易推理计算以实现稳健性
链接:https://arxiv.org/abs/2510.06790
作者:Tavish McDonald, Bo Lei, Stanislav Fort, Bhavya Kailkhura, Brian Bartoldson
备注:17 pages
摘要:尽管在模型的鲁棒化上投入了大量训练计算,模型仍然容易受到对抗性分布外(OOD)数据的影响。Zaremba等人(2025)在测试时在这个问题上取得了进展,表明LLM推理提高了旨在阻止攻击的模型规范的满足程度,从而在推理投入与越狱鲁棒性之间产生相关性。然而,当攻击者可以访问梯度或多模态输入时,测试时计算的这种好处就会消失。我们填补了这一空白,澄清推理计算即使在这种情况下也能带来好处。我们的论证是:组合泛化使OOD数据可以通过其分布内(ID)组成部分被理解,从而使模型能够在对抗性OOD输入上遵守防御规范。也就是说,我们提出"推理计算带来鲁棒性"假设(RICH):当模型的训练数据更好地反映被攻击数据的组成部分时,推理计算防御的收益就越大。我们在视觉语言模型和多种攻击类型上经验性地支持这一假设,发现如果组合泛化使模型能在OOD数据上遵循规范,则测试时计算会带来鲁棒性增益,而RL微调和延长推理并非关键。例如,通过提示越来越强调防御规范,可以降低基于梯度的多模态攻击对经对抗性预训练而鲁棒化的VLM的成功率,但同样的干预对未鲁棒化的模型没有这样的好处。推理计算的鲁棒性收益与基础模型鲁棒性之间的这种相关性正是RICH的"富者愈富"动态:受攻击数据的组成部分对于鲁棒化的模型来说更接近ID,有助于组合泛化到OOD数据。因此,我们建议将训练时与测试时防御分层叠加,以获得其协同效益。
摘要:Models are susceptible to adversarially out-of-distribution (OOD) data despite large training-compute investments into their robustification. Zaremba et al. (2025) make progress on this problem at test time, showing LLM reasoning improves satisfaction of model specifications designed to thwart attacks, resulting in a correlation between reasoning effort and robustness to jailbreaks. However, this benefit of test compute fades when attackers are given access to gradients or multimodal inputs. We address this gap, clarifying that inference-compute offers benefits even in such cases. Our approach argues that compositional generalization, through which OOD data is understandable via its in-distribution (ID) components, enables adherence to defensive specifications on adversarially OOD inputs. Namely, we posit the Robustness from Inference Compute Hypothesis (RICH): inference-compute defenses profit as the model's training data better reflects the attacked data's components. We empirically support this hypothesis across vision language model and attack types, finding robustness gains from test-time compute if specification following on OOD data is unlocked by compositional generalization, while RL finetuning and protracted reasoning are not critical. For example, increasing emphasis on defensive specifications via prompting lowers the success rate of gradient-based multimodal attacks on VLMs robustified by adversarial pretraining, but this same intervention provides no such benefit to not-robustified models. This correlation of inference-compute's robustness benefit with base model robustness is the rich-get-richer dynamic of the RICH: attacked data components are more ID for robustified models, aiding compositional generalization to OOD data. Accordingly, we advise layering train-time and test-time defenses to obtain their synergistic benefit.
【6】Beneficial Reasoning Behaviors in Agentic Search and Effective Post-training to Obtain Them
标题:统计搜索中的有益推理行为以及获得它们的有效后训练
链接:https://arxiv.org/abs/2510.06534
作者:Jiahe Jin, Abhijay Paladugu, Chenyan Xiong
摘要:代理搜索利用大型语言模型(LLM)来解释复杂的用户信息需求,并执行规划、搜索和综合信息的多步骤过程以提供答案。当与检索系统和更广泛的网络交互时,这种范式为LLM的推理和代理能力带来了独特的挑战。在本文中,我们提出了一个推理驱动的基于LLM的管道,研究代理搜索中有效的推理行为模式。使用这个管道,我们分析成功的代理搜索轨迹,并确定四种有益的推理行为:信息验证、权威评估、自适应搜索和错误恢复。基于这些发现,我们提出了一种称为"行为启动"(Behavior Priming)的技术,用于训练更有效的代理搜索模型。它综合出展现这四种行为的代理搜索轨迹,并通过监督微调(SFT)将它们集成到代理搜索模型中,然后进行标准强化学习(RL)。在三个基准测试(GAIA,WebWalker和HLE)上的实验表明,与直接使用RL训练代理搜索模型相比,行为启动使Llama3.2-3B和Qwen3-1.7B获得超过35%的收益。至关重要的是,我们证明了SFT数据中所需的推理行为,而不是最终答案的正确性,是RL后实现强大最终性能的关键因素:对具有理想推理行为但答案不正确的轨迹进行微调,比对具有正确答案的轨迹进行微调带来更好的性能。我们的分析进一步揭示了底层机制:引入的推理行为赋予模型更有效的探索(更高的pass@k和熵)和测试时扩展(更长的轨迹)能力,为RL提供了坚实的基础。我们的代码将作为开源发布。
摘要:Agentic search leverages large language models (LLMs) to interpret complex user information needs and execute a multi-step process of planning, searching, and synthesizing information to provide answers. This paradigm introduces unique challenges for LLMs' reasoning and agentic capabilities when interacting with retrieval systems and the broader web. In this paper, we propose a reasoning-driven LLM-based pipeline to study effective reasoning behavior patterns in agentic search. Using this pipeline, we analyze successful agentic search trajectories and identify four beneficial reasoning behaviors: Information Verification, Authority Evaluation, Adaptive Search, and Error Recovery. Based on these findings, we propose a technique called Behavior Priming to train more effective agentic search models. It synthesizes agentic search trajectories that exhibit these four behaviors and integrates them into the agentic search model through supervised fine-tuning (SFT), followed by standard reinforcement learning (RL). Experiments on three benchmarks (GAIA, WebWalker, and HLE) demonstrate that behavior priming yields over 35% gains in Llama3.2-3B and Qwen3-1.7B compared to directly training agentic search models with RL. Crucially, we demonstrate that the desired reasoning behaviors in the SFT data, rather than the correctness of the final answer, is the critical factor for achieving strong final performance after RL: fine-tuning on trajectories with desirable reasoning behaviors but incorrect answers leads to better performance than fine-tuning on trajectories with correct answers. Our analysis further reveals the underlying mechanism: the introduced reasoning behaviors endow models with more effective exploration (higher pass@k and entropy) and test-time scaling (longer trajectories) capabilities, providing a strong foundation for RL. Our code will be released as open source.
【7】Context-Aware Inference via Performance Forecasting in Decentralized Learning Networks
标题:去中心化学习网络中通过性能预测进行上下文感知推理
链接:https://arxiv.org/abs/2510.06444
作者:Joel Pfeffer, J. M. Diederik Kruijssen, Clément Gossart, Mélanie Chevance, Diego Campo Millan, Florian Stecker, Steven N. Longmore (Allora Foundation)
备注:17 pages, 12 figures; appeared in ADI (October 2025)
摘要:在去中心化学习网络中,来自许多参与者的预测被组合在一起以生成网络推理。虽然许多研究已经证明了组合多个模型预测的性能优势,但使用线性池方法(从简单平均到动态权重更新)的现有策略面临着一个关键限制。依赖于历史性能来更新权重的动态预测组合必然是反应性的。由于需要对合理数量的时期进行平均(使用移动平均或指数加权),它们往往很慢地适应不断变化的环境(阶段或政权变化)。在这项工作中,我们开发了一个模型,该模型使用机器学习来预测模型在时间序列中每个时期的预测性能。这就能够通过赋予在特定时间可能更准确的模型更高的权重来实现“背景意识”。我们发现,在分散式学习网络中添加一个性能预测工作者,遵循类似于Allora网络的设计,可以提高网络推理的准确性。具体来说,我们发现预测后悔(相对于网络推理的性能)或后悔z得分(相对于其他工人的性能)的预测模型比预测损失的模型表现出更大的改进,而这些模型通常不会优于朴素网络推理(所有推理的历史加权平均值)。通过一系列的优化测试,我们表明,预测模型的性能可能对特征集的选择和训练时期的数量敏感。这些属性可能取决于确切的问题,并应针对每个域进行定制。虽然最初是为分散式学习网络设计的,但在需要预测而不是反应式模型加权的任何情况下,使用性能预测进行预测组合可能是有用的。
摘要:In decentralized learning networks, predictions from many participants are combined to generate a network inference. While many studies have demonstrated performance benefits of combining multiple model predictions, existing strategies using linear pooling methods (ranging from simple averaging to dynamic weight updates) face a key limitation. Dynamic prediction combinations that rely on historical performance to update weights are necessarily reactive. Due to the need to average over a reasonable number of epochs (with moving averages or exponential weighting), they tend to be slow to adjust to changing circumstances (phase or regime changes). In this work, we develop a model that uses machine learning to forecast the performance of predictions by models at each epoch in a time series. This enables `context-awareness' by assigning higher weight to models that are likely to be more accurate at a given time. We show that adding a performance forecasting worker in a decentralized learning network, following a design similar to the Allora network, can improve the accuracy of network inferences. Specifically, we find forecasting models that predict regret (performance relative to the network inference) or regret z-score (performance relative to other workers) show greater improvement than models predicting losses, which often do not outperform the naive network inference (historically weighted average of all inferences). Through a series of optimization tests, we show that the performance of the forecasting model can be sensitive to choices in the feature set and number of training epochs. These properties may depend on the exact problem and should be tailored to each domain. Although initially designed for a decentralized learning network, using performance forecasting for prediction combination may be useful in any situation where predictive rather than reactive model weighting is needed.
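A minimal sketch of the 'context-aware' weighting idea, under the assumption that each worker's recent loss history is available: a small autoregressive forecaster predicts each worker's next-epoch loss, and workers expected to perform better receive larger combination weights. The forecaster, lag features, and softmax temperature are illustrative choices, not the Allora network's actual design.

```python
# Forecast each worker's next-epoch loss, then weight workers by forecasted skill.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
T, W, lag = 300, 4, 5
# Synthetic regime-switching losses: the best worker changes halfway through.
losses = np.abs(rng.normal(1.0, 0.2, (T, W)))
losses[:150, 0] *= 0.5   # worker 0 best in regime 1
losses[150:, 1] *= 0.5   # worker 1 best in regime 2

def forecast_next_loss(w, t):
    """Fit a small autoregressive model on worker w's loss history up to epoch t."""
    hist = losses[:t, w]
    X = np.stack([hist[i:i + lag] for i in range(len(hist) - lag)])
    y = hist[lag:]
    return Ridge().fit(X, y).predict(hist[-lag:][None])[0]

t = 200  # combine worker predictions for epoch t
pred_losses = np.array([forecast_next_loss(w, t) for w in range(W)])
weights = np.exp(-5.0 * pred_losses)   # forward-looking, not reactive, weighting
weights /= weights.sum()
print("forecasted per-worker losses:", np.round(pred_losses, 3))
print("combination weights:", np.round(weights, 3))
```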
【8】PIKAN: Physics-Inspired Kolmogorov-Arnold Networks for Explainable UAV Channel Modelling
标题:PIKAN:受物理启发的Kolmogorov-Arnold网络,用于可解释的无人机通道建模
链接:https://arxiv.org/abs/2510.06355
作者:Kürşat Tekbıyık, Güneş Karabulut Kurt, Antoine Lesage-Landry
摘要:无人机(UAV)通信需要精确但可解释的空对地(A2G)信道模型,以适应非平稳传播环境。确定性模型提供了可解释性,深度学习(DL)模型提供了准确性,但这两种方法分别存在刚性或缺乏可解释性的问题。为了弥合这一差距,我们提出了物理启发的Kolmogorov-Arnold网络(PIKAN),它将物理原理(例如自由空间路径损耗、双径反射)嵌入到学习过程中。与物理信息神经网络(PINN)不同,PIKAN在应用物理信息方面更灵活,因为它将这些信息作为灵活的归纳偏差引入,从而使训练过程更加灵活。对无人机A2G测量数据的实验表明,PIKAN实现了与DL模型相当的精度,同时提供了与传播规律一致的符号化、可解释的表达式。值得注意的是,PIKAN仅用232个参数就实现了这一性能,比具有数千个参数的多层感知器(MLP)基线轻至多37倍,既不牺牲与测量数据的相关性,又能给出符号表达式。这些结果突出了PIKAN作为超5G和6G网络中无人机信道建模的高效、可解释且可扩展的解决方案。
摘要:Unmanned aerial vehicle (UAV) communications demand accurate yet interpretable air-to-ground (A2G) channel models that can adapt to nonstationary propagation environments. While deterministic models offer interpretability and deep learning (DL) models provide accuracy, both approaches suffer from either rigidity or a lack of explainability. To bridge this gap, we propose the Physics-Inspired Kolmogorov-Arnold Network (PIKAN) that embeds physical principles (e.g., free-space path loss, two-ray reflections) into the learning process. Unlike physics-informed neural networks (PINNs), PIKAN is more flexible for applying physical information because it introduces them as flexible inductive biases. Thus, it enables a more flexible training process. Experiments on UAV A2G measurement data show that PIKAN achieves comparable accuracy to DL models while providing symbolic and explainable expressions aligned with propagation laws. Remarkably, PIKAN achieves this performance with only 232 parameters, making it up to 37 times lighter than multilayer perceptron (MLP) baselines with thousands of parameters, without sacrificing correlation with measurements and also providing symbolic expressions. These results highlight PIKAN as an efficient, interpretable, and scalable solution for UAV channel modelling in beyond-5G and 6G networks.
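A minimal sketch of the physics-inspired inductive bias, assuming the standard free-space path loss (FSPL) law as the baseline: the model only learns a correction on top of the physics term rather than fitting path loss from scratch. The KAN architecture itself is replaced here by a linear corrector purely for illustration.

```python
# Physics baseline (FSPL) + learned residual, a lightweight stand-in for PIKAN's idea.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
f_mhz = 2400.0
d = rng.uniform(50, 2000, 500)                          # link distance in metres
fspl = 20 * np.log10(d) + 20 * np.log10(f_mhz) - 27.55  # FSPL in dB (d in m, f in MHz)
# Synthetic measurements: FSPL plus environment-dependent excess loss and noise.
measured = fspl + 0.004 * d + rng.normal(0, 2.0, d.size)

# Learn only the residual with respect to the physics baseline.
X = np.column_stack([d, np.log10(d)])
residual_model = LinearRegression().fit(X, measured - fspl)
pred = fspl + residual_model.predict(X)
print("RMSE, physics only:   ", np.sqrt(np.mean((measured - fspl) ** 2)).round(2))
print("RMSE, with residual:  ", np.sqrt(np.mean((measured - pred) ** 2)).round(2))
```

The design choice mirrors the abstract: the physics term anchors the model to propagation laws, so the learned part stays small and interpretable.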
【9】Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization
标题:架起推理与学习的桥梁:利用复杂性分布外泛化揭开幻象
链接:https://arxiv.org/abs/2510.06274
作者:Mohammad Mahdi Samiei Paqaleh, Arash Marioriyad, Arman Tahmasebi-Zadeh, Mohamadreza Fereydooni, Mahdi Ghaznavai, Mahdieh Soleymani Baghshah
摘要:最近的进展已经将人工智能的前沿从模式识别任务推向了需要逐步求解的System 2风格推理问题,特别是在大型语言模型中。然而,与学习不同(其中泛化和分布外(OoD)评估的概念已被很好地形式化),推理能力还没有明确、一致的定义或度量。我们提出复杂性分布外(复杂性OoD)泛化作为定义和衡量推理的框架和问题设置。当测试实例所需的最小解复杂性(无论是表示性的,即更丰富的解结构;还是计算性的,即更多推理步骤或更长的程序)超过所有训练示例时,模型若仍能保持性能,即表现出复杂性OoD泛化。我们通过解的描述Kolmogorov复杂性和可操作的代理指标(例如,对象/关系计数;推理步骤计数)来形式化复杂性,并澄清复杂性OoD与长度OoD和组合OoD的不同之处。这个视角统一了学习和推理:许多在低复杂度下可用System 1式处理解决的情形,在复杂度压力下变成System 2式的,而System 2可以被视为对解结构的泛化。我们将这一观点转化为实践,并提出了在整个技术栈中落实复杂性OoD的建议:将复杂性纳入基准和评估指标设计,重新考虑以解题轨迹为目标的监督,寻找和设计面向复杂性OoD泛化的归纳偏差,解决"学习到推理"的溢出问题,如虚假捷径、语义鲁棒性、灾难性遗忘和逐步校准。由于复杂性OoD无法仅靠扩展数据来解决,向鲁棒推理的进展将需要能够显式地根据复杂性建模和分配计算的架构和训练机制。
摘要:Recent progress has pushed AI frontiers from pattern recognition tasks toward problems that require step by step, System2 style reasoning, especially with large language models. Yet, unlike learning, where generalization and out of distribution (OoD) evaluation concepts are well formalized, there is no clear, consistent definition or metric for reasoning ability. We propose Complexity Out of Distribution (Complexity OoD) generalization as a framework and problem setting to define and measure reasoning. A model exhibits Complexity OoD generalization when it maintains performance on test instances whose minimal required solution complexity, either representational (richer solution structure) or computational (more reasoning steps/program length), exceeds that of all training examples. We formalize complexity via solution description Kolmogorov complexity and operational proxies (e.g., object/relation counts; reasoning step counts), clarifying how Complexity OoD differs from length and compositional OoD. This lens unifies learning and reasoning: many cases solvable with System1 like processing at low complexity become System2 like under complexity pressure, while System2 can be viewed as generalization over solution structures. We translate this perspective into practice with recommendations for operationalizing Complexity OoD across the stack: incorporating complexity into benchmark and evaluation metric design, rethinking supervision to target solution traces, seeking and designing inductive biases for Complexity OoD generalization, addressing learning to reason spillovers such as spurious shortcuts, semantic robustness, catastrophic forgetting, and step wise calibration. Because Complexity OoD cannot be solved by scaling data alone, progress toward robust reasoning will require architectures and training regimes that explicitly model and allocate computation with respect to complexity.
【10】AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning
标题:AlphaApollo:将基础模型与专业工具编排成用于深度智能体推理的自我进化系统
链接:https://arxiv.org/abs/2510.06261
作者:Zhanke Zhou, Chentao Cao, Xiao Feng, Xuan Li, Zongze Li, Xiangyu Lu, Jiangchao Yao, Weikai Huang, Linrui Xu, Tian Cheng, Guanyu Jiang, Yiming Zheng, Brando Miranda, Tongliang Liu, Sanmi Koyejo, Masashi Sugiyama, Bo Han
备注:Ongoing project
摘要:我们提出了AlphaApollo,一个自我进化的代理推理系统,旨在解决基础模型(FM)推理的两个瓶颈:有限的模型内在能力和不可靠的测试时迭代。AlphaApollo使用专业工具编排多个模型,以实现深思熟虑、可验证的推理。它耦合(i)计算工具(带有数值和符号库的Python)和(ii)检索工具(任务相关的外部信息),以执行精确计算并为决策提供依据。该系统还通过共享状态图支持多轮、多模型的解的演化,该共享状态图记录候选解、可执行检查和反馈,用于迭代细化。在多个模型上的AIME 2024/2025评估中,AlphaApollo提供了一致的收益:Qwen2.5-14B-Instruct的Average@32提升+5.15%、Pass@32提升+23.34%,Llama-3.3-70B-Instruct的Average@32提升+8.91%、Pass@32提升+26.67%。工具使用分析表明,超过80%的工具调用被成功执行,并持续优于非工具基线,从而提升了基础模型的能力上限。更多的实证结果和实现细节将在https://github.com/tmlr-group/AlphaApollo上更新。
摘要:We present AlphaApollo, a self-evolving agentic reasoning system that aims to address two bottlenecks in foundation model (FM) reasoning-limited model-intrinsic capacity and unreliable test-time iteration. AlphaApollo orchestrates multiple models with professional tools to enable deliberate, verifiable reasoning. It couples (i) a computation tool (Python with numerical and symbolic libraries) and (ii) a retrieval tool (task-relevant external information) to execute exact calculations and ground decisions. The system further supports multi-round, multi-model solution evolution via a shared state map that records candidates, executable checks, and feedback for iterative refinement. In evaluations on AIME 2024/2025 across multiple models, AlphaApollo delivers consistent gains: +5.15% Average@32 and +23.34% Pass@32 for Qwen2.5-14B-Instruct, and +8.91% Average@32 with +26.67% Pass@32 for Llama-3.3-70B-Instruct. Tool-use analysis shows that more than 80% of tool calls are successfully executed, with consistent outperformance of non-tool baselines, thereby lifting the capability ceiling of FMs. More empirical results and implementation details will be updated at https://github.com/tmlr-group/AlphaApollo.
【11】Accelerating Inference for Multilayer Neural Networks with Quantum Computers
标题:用量子计算机加速多层神经网络推理
链接:https://arxiv.org/abs/2510.07195
作者:Arthur G. Rattew, Po-Wei Huang, Naixu Guo, Lirandë Pira, Patrick Rebentrost
摘要:容错量子处理单元(QPU)承诺在特定的计算任务中提供指数级的加速,但它们与现代深度学习管道的集成仍不清楚。在这项工作中,我们通过提出具有非线性激活函数的多层神经网络的第一个完全相干量子实现,朝着弥合这一差距迈出了一步。我们的构造反映了基于ResNet的广泛使用的深度学习架构,由具有多滤波器2D卷积、sigmoid激活、跳跃连接和层归一化的残差块组成。我们分析了三种量子数据访问机制下网络推理的复杂性。在没有任何假设的情况下,对于浅层双线性风格的网络,我们建立了相对经典方法的二次加速。在可对权重进行高效量子访问时,我们获得了相对经典方法的四次方加速。在可对输入和网络权重都进行高效量子访问时,我们证明了具有$N$维向量化输入、$k$个残差块层和最终残差线性池化层的网络可以以$\epsilon$的误差实现,推理成本为$O(\text{polylog}(N/\epsilon)^k)$。
摘要:Fault-tolerant Quantum Processing Units (QPUs) promise to deliver exponential speed-ups in select computational tasks, yet their integration into modern deep learning pipelines remains unclear. In this work, we take a step towards bridging this gap by presenting the first fully-coherent quantum implementation of a multilayer neural network with non-linear activation functions. Our constructions mirror widely used deep learning architectures based on ResNet, and consist of residual blocks with multi-filter 2D convolutions, sigmoid activations, skip-connections, and layer normalizations. We analyse the complexity of inference for networks under three quantum data access regimes. Without any assumptions, we establish a quadratic speedup over classical methods for shallow bilinear-style networks. With efficient quantum access to the weights, we obtain a quartic speedup over classical methods. With efficient quantum access to both the inputs and the network weights, we prove that a network with an $N$-dimensional vectorized input, $k$ residual block layers, and a final residual-linear-pooling layer can be implemented with an error of $\epsilon$ with $O(\text{polylog}(N/\epsilon)^k)$ inference cost.
【12】Gaussian Equivalence for Self-Attention: Asymptotic Spectral Analysis of Attention Matrix
标题:自我注意力的高斯等效性:注意力矩阵的渐进谱分析
链接:https://arxiv.org/abs/2510.06685
作者:Tomohiro Hayase, Benoît Collins, Ryo Karakida
摘要:自注意力层已经成为现代深度神经网络的基本构建模块,但对它们的理论理解仍然有限,特别是从随机矩阵理论的角度来看。在这项工作中,我们对注意力矩阵的奇异值谱进行了严格分析,并建立了注意力的首个高斯等价结果。在逆温度保持常数量级的自然区域中,我们证明注意力矩阵的奇异值分布可以渐近地由一个易于处理的线性模型刻画。我们进一步证明了奇异值平方的分布偏离了先前工作中所认为的Marchenko-Pastur定律。我们的证明依赖于两个关键要素:对归一化项波动的精确控制,以及利用指数函数的有利泰勒展开的精细线性化。这一分析还确定了线性化的阈值,并阐明了为什么注意力尽管不是逐元素操作,却在该区域中满足严格的高斯等价。
摘要:Self-attention layers have become fundamental building blocks of modern deep neural networks, yet their theoretical understanding remains limited, particularly from the perspective of random matrix theory. In this work, we provide a rigorous analysis of the singular value spectrum of the attention matrix and establish the first Gaussian equivalence result for attention. In a natural regime where the inverse temperature remains of constant order, we show that the singular value distribution of the attention matrix is asymptotically characterized by a tractable linear model. We further demonstrate that the distribution of squared singular values deviates from the Marchenko-Pastur law, which has been believed in previous work. Our proof relies on two key ingredients: precise control of fluctuations in the normalization term and a refined linearization that leverages favorable Taylor expansions of the exponential. This analysis also identifies a threshold for linearization and elucidates why attention, despite not being an entrywise operation, admits a rigorous Gaussian equivalence in this regime.
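A quick numerical illustration of the object analysed here: the singular value spectrum of a row-softmax attention matrix built from Gaussian inputs and weights, with scalings chosen so the logits (and hence the inverse temperature) are of constant order. The sqrt(n) normalization and all dimensions are illustrative choices, not the paper's exact setup.

```python
# Empirical singular value spectrum of a random softmax self-attention matrix.
import numpy as np

rng = np.random.default_rng(3)
n, d, beta = 512, 512, 1.0   # sequence length, width, constant-order inverse temperature

X = rng.standard_normal((n, d))
WQ = rng.standard_normal((d, d)) / np.sqrt(d)
WK = rng.standard_normal((d, d)) / np.sqrt(d)
logits = beta * (X @ WQ) @ (X @ WK).T / np.sqrt(d)     # O(1) logits
A = np.exp(logits - logits.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)                      # row-wise softmax attention matrix

svals = np.linalg.svd(np.sqrt(n) * A, compute_uv=False)
sq = svals ** 2
print("top-3 squared singular values:", np.round(sq[:3], 2))
print("bulk median of squared singular values:", round(float(np.median(sq)), 3))
# Plotting a histogram of `sq` (excluding the top outlier) against the Marchenko-Pastur
# density makes the deviation discussed in the abstract visible.
```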
【13】FEAorta: A Fully Automated Framework for Finite Element Analysis of the Aorta From 3D CT Images
标题:FEAorta:根据3D CT图像对主动脉进行有限元素分析的全自动框架
链接:https://arxiv.org/abs/2510.06621
作者:Jiasong Chen, Linchen Qian, Ruonan Gong, Christina Sun, Tongran Qin, Thuy Pham, Caitlin Martin, Mohammad Zafar, John Elefteriades, Wei Sun, Liang Liang
摘要:主动脉瘤疾病一直排在美国人口死亡原因的前20位。胸主动脉瘤表现为胸主动脉壁的异常膨出,是成人死亡的主要原因之一。从生物力学的角度来看,当作用在主动脉壁上的应力超过壁强度时,就会发生破裂。通过计算生物力学分析,特别是结构有限元分析(FEA),可以得到壁应力分布。对于风险评估,可以通过使用材料失效模型将应力与材料强度进行比较,计算TAA的概率性破裂风险。尽管这些工程工具目前已可用于患者特异性层面的胸主动脉瘤破裂风险评估,但由于两个主要障碍,临床采用受到限制:其一,劳动密集型的3D重建,即当前的患者特异性解剖建模仍依赖人工分割,耗时且难以扩展到大规模患者群体;其二,计算负担,即传统FEA模拟资源密集,与时间敏感的临床工作流程不兼容。我们的团队通过开发PyTorch FEA库和FEA-DNN集成框架成功克服了第二个障碍。通过在PyTorch FEA中整合FEA功能并应用静定原理,我们将基于FEA的应力计算时间减少到每个病例约3分钟。此外,通过PyTorch FEA库集成DNN和FEA,我们的方法进一步将计算时间减少到每个病例仅几秒。这项工作的重点是通过开发一个端到端深度神经网络来克服第一个障碍,该网络能够直接从3D CT图像生成患者特异性的主动脉有限元网格。
摘要:Aortic aneurysm disease ranks consistently in the top 20 causes of death in the U.S. population. Thoracic aortic aneurysm is manifested as an abnormal bulging of the thoracic aortic wall and it is a leading cause of death in adults. From the perspective of biomechanics, rupture occurs when the stress acting on the aortic wall exceeds the wall strength. Wall stress distribution can be obtained by computational biomechanical analyses, especially structural Finite Element Analysis (FEA). For risk assessment, the probabilistic rupture risk of TAA can be calculated by comparing stress with material strength using a material failure model. Although these engineering tools are currently available for TAA rupture risk assessment on a patient-specific level, clinical adoption has been limited due to two major barriers: (i) labor-intensive 3D reconstruction, as current patient-specific anatomical modeling still relies on manual segmentation, making it time-consuming and difficult to scale to a large patient population; and (ii) computational burden, as traditional FEA simulations are resource-intensive and incompatible with time-sensitive clinical workflows. The second barrier was successfully overcome by our team through the development of the PyTorch FEA library and the FEA-DNN integration framework. By incorporating the FEA functionalities within PyTorch FEA and applying the principle of static determinacy, we reduced the FEA-based stress computation time to approximately three minutes per case. Moreover, by integrating DNN and FEA through the PyTorch FEA library, our approach further decreases the computation time to only a few seconds per case. This work focuses on overcoming the first barrier through the development of an end-to-end deep neural network capable of generating patient-specific finite element meshes of the aorta directly from 3D CT images.
【14】A Mixed-Methods Analysis of Repression and Mobilization in Bangladesh's July Revolution Using Machine Learning and Statistical Modeling
标题:使用机器学习和统计建模对孟加拉国七月革命中的镇压和动员进行混合方法分析
链接:https://arxiv.org/abs/2510.06264
作者:Md. Saiful Bari Siddiqui, Anupam Debashis Roy
备注:Submitted to Social Forces. Final version may vary from this preprint
摘要:The 2024 July Revolution in Bangladesh represents a landmark event in the study of civil resistance. This study investigates the central paradox of the success of this student-led civilian uprising: how state violence, intended to quell dissent, ultimately fueled the movement's victory. We employ a mixed-methods approach. First, we develop a qualitative narrative of the conflict's timeline to generate specific, testable hypotheses. Then, using a disaggregated, event-level dataset, we employ a multi-method quantitative analysis to dissect the complex relationship between repression and mobilisation. We provide a framework to analyse explosive modern uprisings like the July Revolution. Initial pooled regression models highlight the crucial role of protest momentum in sustaining the movement. To isolate causal effects, we specify a Two-Way Fixed Effects panel model, which provides robust evidence for a direct and statistically significant local suppression backfire effect. Our Vector Autoregression (VAR) analysis provides clear visual evidence of an immediate, nationwide mobilisation in response to increased lethal violence. We further demonstrate that this effect was non-linear. A structural break analysis reveals that the backfire dynamic was statistically insignificant in the conflict's early phase but was triggered by the catalytic moral shock of the first wave of lethal violence, and its visuals circulated around July 16th. A complementary machine learning analysis (XGBoost, out-of-sample R$^{2}$=0.65) corroborates this from a predictive standpoint, identifying "excessive force against protesters" as the single most dominant predictor of nationwide escalation. We conclude that the July Revolution was driven by a contingent, non-linear backfire, triggered by specific catalytic moral shocks and accelerated by the viral reaction to the visual spectacle of state brutality.
检测相关(6篇)
【1】Vacuum Spiker: A Spiking Neural Network-Based Model for Efficient Anomaly Detection in Time Series
标题:真空尖峰:一种用于时间序列中高效异常检测的基于尖峰神经网络的模型
链接:https://arxiv.org/abs/2510.06910
作者:Iago Xabier Vázquez, Javier Sedano, Muhammad Afzal, Ángel Miguel García-Vico
备注:53 pages, 16 figures, preprint submitted to a journal for review
摘要:Anomaly detection is a key task across domains such as industry, healthcare, and cybersecurity. Many real-world anomaly detection problems involve analyzing multiple features over time, making time series analysis a natural approach for such problems. While deep learning models have achieved strong performance in this field, their trend to exhibit high energy consumption limits their deployment in resource-constrained environments such as IoT devices, edge computing platforms, and wearables. To address this challenge, this paper introduces the \textit{Vacuum Spiker algorithm}, a novel Spiking Neural Network-based method for anomaly detection in time series. It incorporates a new detection criterion that relies on global changes in neural activity rather than reconstruction or prediction error. It is trained using Spike Time-Dependent Plasticity in a novel way, intended to induce changes in neural activity when anomalies occur. A new efficient encoding scheme is also proposed, which discretizes the input space into non-overlapping intervals, assigning each to a single neuron. This strategy encodes information with a single spike per time step, improving energy efficiency compared to conventional encoding methods. Experimental results on publicly available datasets show that the proposed algorithm achieves competitive performance while significantly reducing energy consumption, compared to a wide set of deep learning and machine learning baselines. Furthermore, its practical utility is validated in a real-world case study, where the model successfully identifies power curtailment events in a solar inverter. These results highlight its potential for sustainable and efficient anomaly detection.
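A minimal sketch of the proposed encoding scheme, assuming a known input range: the range is discretised into non-overlapping intervals, each assigned to a single neuron, so every time step emits exactly one spike.

```python
# Interval-based spike encoding: one neuron per interval, one spike per time step.
import numpy as np

def interval_encode(x, lo, hi, n_neurons):
    """Return a (T, n_neurons) binary spike train with one spike per time step."""
    edges = np.linspace(lo, hi, n_neurons + 1)
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_neurons - 1)
    spikes = np.zeros((len(x), n_neurons), dtype=np.int8)
    spikes[np.arange(len(x)), idx] = 1
    return spikes

t = np.linspace(0, 4 * np.pi, 200)
signal = np.sin(t)
spikes = interval_encode(signal, -1.0, 1.0, n_neurons=16)
assert (spikes.sum(axis=1) == 1).all()   # exactly one spike per step
print("spike train shape:", spikes.shape, "- total spikes:", spikes.sum())
```

The single-spike-per-step property is what drives the energy-efficiency claim relative to denser rate or population codes.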
【2】SDQM: Synthetic Data Quality Metric for Object Detection Dataset Evaluation
标题:SDQM:用于对象检测数据集评估的合成数据质量指标
链接:https://arxiv.org/abs/2510.06596
作者:Ayush Zenith, Arnold Zumbrun, Neel Raut, Jing Lin
摘要:The performance of machine learning models depends heavily on training data. The scarcity of large-scale, well-annotated datasets poses significant challenges in creating robust models. To address this, synthetic data generated through simulations and generative models has emerged as a promising solution, enhancing dataset diversity and improving the performance, reliability, and resilience of models. However, evaluating the quality of this generated data requires an effective metric. This paper introduces the Synthetic Dataset Quality Metric (SDQM) to assess data quality for object detection tasks without requiring model training to converge. This metric enables more efficient generation and selection of synthetic datasets, addressing a key challenge in resource-constrained object detection tasks. In our experiments, SDQM demonstrated a strong correlation with the mean Average Precision (mAP) scores of YOLOv11, a leading object detection model, while previous metrics only exhibited moderate or weak correlations. Additionally, it provides actionable insights for improving dataset quality, minimizing the need for costly iterative training. This scalable and efficient metric sets a new standard for evaluating synthetic data. The code for SDQM is available at https://github.com/ayushzenith/SDQM
【3】A Median Perspective on Unlabeled Data for Out-of-Distribution Detection
标题:用于分布外检测的未标记数据的中位数观点
链接:https://arxiv.org/abs/2510.06505
作者:Momin Abbas, Ali Falahati, Hossein Goli, Mohammad Mohammadi Amiri
摘要:Out-of-distribution (OOD) detection plays a crucial role in ensuring the robustness and reliability of machine learning systems deployed in real-world applications. Recent approaches have explored the use of unlabeled data, showing potential for enhancing OOD detection capabilities. However, effectively utilizing unlabeled in-the-wild data remains challenging due to the mixed nature of both in-distribution (InD) and OOD samples. The lack of a distinct set of OOD samples complicates the task of training an optimal OOD classifier. In this work, we introduce Medix, a novel framework designed to identify potential outliers from unlabeled data using the median operation. We use the median because it provides a stable estimate of the central tendency, as an OOD detection mechanism, due to its robustness against noise and outliers. Using these identified outliers, along with labeled InD data, we train a robust OOD classifier. From a theoretical perspective, we derive error bounds that demonstrate Medix achieves a low error rate. Empirical results further substantiate our claims, as Medix outperforms existing methods across the board in open-world settings, confirming the validity of our theoretical insights.
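A minimal sketch of the median-based idea, with illustrative thresholds and classifier choice: the coordinate-wise median gives a contamination-robust centre of the unlabeled wild data, the points farthest from it are mined as candidate outliers, and an InD-vs-OOD classifier is trained on labeled InD data plus those candidates.

```python
# Median-based outlier mining from unlabeled wild data, then OOD classifier training.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
ind_labeled = rng.normal(0, 1, (500, 8))
wild = np.vstack([rng.normal(0, 1, (400, 8)),      # InD portion of the wild mix
                  rng.normal(4, 1, (100, 8))])     # true OOD portion

center = np.median(wild, axis=0)                   # robust to the OOD contamination
dist = np.linalg.norm(wild - center, axis=1)
candidates = wild[dist > np.quantile(dist, 0.85)]  # top 15% as mined outliers

X = np.vstack([ind_labeled, candidates])
y = np.concatenate([np.zeros(len(ind_labeled)), np.ones(len(candidates))])
ood_clf = LogisticRegression(max_iter=1000).fit(X, y)

test_ood = rng.normal(4, 1, (200, 8))
print("mean P(OOD) on true OOD:", ood_clf.predict_proba(test_ood)[:, 1].mean().round(3))
```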
【4】Road Surface Condition Detection with Machine Learning using New York State Department of Transportation Camera Images and Weather Forecast Data
标题:使用纽约州交通部摄像机图像和天气预报数据,通过机器学习检测路面状况
链接:https://arxiv.org/abs/2510.06440
作者:Carly Sutter, Kara J. Sulia, Nick P. Bassill, Christopher D. Wirz, Christopher D. Thorncroft, Jay C. Rothenberger, Vanessa Przybylo, Mariana G. Cains, Jacob Radford, David Aaron Evans
摘要:The New York State Department of Transportation (NYSDOT) has a network of roadside traffic cameras that are used by both the NYSDOT and the public to observe road conditions. The NYSDOT evaluates road conditions by driving on roads and observing live cameras, tasks which are labor-intensive but necessary for making critical operational decisions during winter weather events. However, machine learning models can provide additional support for the NYSDOT by automatically classifying current road conditions across the state. In this study, convolutional neural networks and random forests are trained on camera images and weather data to predict road surface conditions. Models are trained on a hand-labeled dataset of ~22,000 camera images, each classified by human labelers into one of six road surface conditions: severe snow, snow, wet, dry, poor visibility, or obstructed. Model generalizability is prioritized to meet the operational needs of the NYSDOT decision makers, and the weather-related road surface condition model in this study achieves an accuracy of 81.5% on completely unseen cameras.
【5】On knot detection via picture recognition
标题:通过图片识别进行打结检测
链接:https://arxiv.org/abs/2510.06284
作者:Anne Dranowski, Yura Kabkov, Daniel Tubbenhauer
备注:21 pages, many figures, comments welcome
摘要:Our goal is to one day take a photo of a knot and have a phone automatically recognize it. In this expository work, we explain a strategy to approximate this goal, using a mixture of modern machine learning methods (in particular convolutional neural networks and transformers for image recognition) and traditional algorithms (to compute quantum invariants like the Jones polynomial). We present simple baselines that predict crossing number directly from images, showing that even lightweight CNN and transformer architectures can recover meaningful structural information. The longer-term aim is to combine these perception modules with symbolic reconstruction into planar diagram (PD) codes, enabling downstream invariant computation for robust knot classification. This two-stage approach highlights the complementarity between machine learning, which handles noisy visual data, and invariants, which enforce rigorous topological distinctions.
【6】Quantum Computing Methods for Malware Detection
标题:恶意软件检测的量子计算方法
链接:https://arxiv.org/abs/2510.06803
作者:Eliška Krátká, Aurél Gábor Gábris
备注:22 pages, 2 figures, 3 tables
摘要:In this paper, we explore the potential of quantum computing in enhancing malware detection through the application of Quantum Machine Learning (QML). Our main objective is to investigate the performance of the Quantum Support Vector Machine (QSVM) algorithm compared to SVM. A publicly available dataset containing raw binaries of Portable Executable (PE) files was used for the classification. The QSVM algorithm, incorporating quantum kernels through different feature maps, was implemented and evaluated on a local simulator within the Qiskit SDK and IBM quantum computers. Experimental results from simulators and quantum hardware provide insights into the behavior and performance of quantum computers, especially in handling large-scale computations for malware detection tasks. The work summarizes the practical experience with using quantum hardware via the Qiskit interfaces. We describe in detail the critical issues encountered, as well as the fixes that had to be developed and applied to the base code of the Qiskit Machine Learning library. These issues include missing transpilation of the circuits submitted to IBM Quantum systems and exceeding the maximum job size limit due to the submission of all the circuits in one job.
分类|识别(2篇)
【1】Discriminative Feature Feedback with General Teacher Classes
标题:具有一般教师类的判别式特征反馈
链接:https://arxiv.org/abs/2510.07245
作者:Omri Bar Oz, Tosca Lechner, Sivan Sabato
摘要:We study the theoretical properties of the interactive learning protocol Discriminative Feature Feedback (DFF) (Dasgupta et al., 2018). The DFF learning protocol uses feedback in the form of discriminative feature explanations. We provide the first systematic study of DFF in a general framework that is comparable to that of classical protocols such as supervised learning and online learning. We study the optimal mistake bound of DFF in the realizable and the non-realizable settings, and obtain novel structural results, as well as insights into the differences between Online Learning and settings with richer feedback such as DFF. We characterize the mistake bound in the realizable setting using a new notion of dimension. In the non-realizable setting, we provide a mistake upper bound and show that it cannot be improved in general. Our results show that unlike Online Learning, in DFF the realizable dimension is insufficient to characterize the optimal non-realizable mistake bound or the existence of no-regret algorithms.
【2】Enhancing Speech Emotion Recognition via Fine-Tuning Pre-Trained Models and Hyper-Parameter Optimisation
标题:通过微调预训练模型和超参数优化增强语音情感识别
链接:https://arxiv.org/abs/2510.07052
作者:Aryan Golbaghi, Shuo Zhou
摘要:We propose a workflow for speech emotion recognition (SER) that combines pre-trained representations with automated hyperparameter optimisation (HPO). Using SpeechBrain wav2vec2-base model fine-tuned on IEMOCAP as the encoder, we compare two HPO strategies, Gaussian Process Bayesian Optimisation (GP-BO) and Tree-structured Parzen Estimators (TPE), under an identical four-dimensional search space and 15-trial budget, with balanced class accuracy (BCA) on the German EmoDB corpus as the objective. All experiments run on 8 CPU cores with 32 GB RAM. GP-BO achieves 0.96 BCA in 11 minutes, and TPE (Hyperopt implementation) attains 0.97 in 15 minutes. In contrast, grid search requires 143 trials and 1,680 minutes to exceed 0.9 BCA, and the best AutoSpeech 2020 baseline reports only 0.85 in 30 minutes on GPU. For cross-lingual generalisation, an EmoDB-trained HPO-tuned model improves zero-shot accuracy by 0.25 on CREMA-D and 0.26 on RAVDESS. Results show that efficient HPO with pre-trained encoders delivers competitive SER on commodity CPUs. Source code to this work is available at: https://github.com/youngaryan/speechbrain-emotion-hpo.
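A minimal sketch of the TPE side of this HPO setup, using hyperopt's fmin with a 15-trial budget; the expensive wav2vec2 fine-tuning objective is replaced by a cheap stand-in (an SVM on a toy dataset) scored with balanced class accuracy (BCA), so only the optimisation loop mirrors the paper.

```python
# TPE hyperparameter search (hyperopt) over a small space, maximising BCA.
from hyperopt import Trials, fmin, hp, tpe
from sklearn.datasets import load_wine
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_tr, X_te, y_tr, y_te = train_test_split(*load_wine(return_X_y=True), random_state=0)

def objective(params):
    clf = SVC(C=params["C"], gamma=params["gamma"]).fit(X_tr, y_tr)
    bca = balanced_accuracy_score(y_te, clf.predict(X_te))
    return -bca  # hyperopt minimises, so negate BCA

space = {"C": hp.loguniform("C", -3, 3), "gamma": hp.loguniform("gamma", -6, 0)}
trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=15, trials=trials)
print("best hyperparameters:", best, "| best BCA:", -min(trials.losses()))
```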
表征(2篇)
【1】Dual Goal Representations
标题:双重目标表示
链接:https://arxiv.org/abs/2510.06714
作者:Seohong Park, Deepinder Mann, Sergey Levine
摘要:In this work, we introduce dual goal representations for goal-conditioned reinforcement learning (GCRL). A dual goal representation characterizes a state by "the set of temporal distances from all other states"; in other words, it encodes a state through its relations to every other state, measured by temporal distance. This representation provides several appealing theoretical properties. First, it depends only on the intrinsic dynamics of the environment and is invariant to the original state representation. Second, it contains provably sufficient information to recover an optimal goal-reaching policy, while being able to filter out exogenous noise. Based on this concept, we develop a practical goal representation learning method that can be combined with any existing GCRL algorithm. Through diverse experiments on the OGBench task suite, we empirically show that dual goal representations consistently improve offline goal-reaching performance across 20 state- and pixel-based tasks.
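A minimal sketch of the core construction on a toy deterministic MDP: each state is represented by its vector of temporal distances from all other states, computed exactly here by breadth-first search; the paper's learned, noise-filtering encoder replaces this exact computation.

```python
# Dual goal representation: a state is encoded by temporal distances from all states.
from collections import deque
import numpy as np

# 4-state chain: 0 - 1 - 2 - 3 (bidirectional moves).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def temporal_distances(src):
    """Shortest step counts from `src` to every state, via BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return np.array([dist[s] for s in sorted(adj)])

D = np.stack([temporal_distances(s) for s in sorted(adj)])  # D[i, j] = dist(i -> j)
dual_repr = {s: D[:, s] for s in sorted(adj)}               # distances from all states to s
print("dual representation of state 2:", dual_repr[2])      # [2 1 0 1]
```

Because the representation is built from temporal distances alone, it is invariant to how states are originally parameterized, matching the first theoretical property in the abstract.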
【2】The Effect of Label Noise on the Information Content of Neural Representations
标题:标签噪音对神经表示信息含量的影响
链接:https://arxiv.org/abs/2510.06401
作者:Ali Hussaini Umar, Franky Kevin Nando Tezoh, Jean Barbier, Santiago Acevedo, Alessandro Laio
备注:10 pages, 5 figures
摘要:In supervised classification tasks, models are trained to predict a label for each data point. In real-world datasets, these labels are often noisy due to annotation errors. While the impact of label noise on the performance of deep learning models has been widely studied, its effects on the networks' hidden representations remain poorly understood. We address this gap by systematically comparing hidden representations using the Information Imbalance, a computationally efficient proxy of conditional mutual information. Through this analysis, we observe that the information content of the hidden representations follows a double descent as a function of the number of network parameters, akin to the behavior of the test error. We further demonstrate that in the underparameterized regime, representations learned with noisy labels are more informative than those learned with clean labels, while in the overparameterized regime, these representations are equally informative. Our results indicate that the representations of overparameterized networks are robust to label noise. We also found that the information imbalance between the penultimate and pre-softmax layers decreases with cross-entropy loss in the overparameterized regime. This offers a new perspective on understanding generalization in classification tasks. Extending our analysis to representations learned from random labels, we show that these perform worse than random features. This indicates that training on random labels drives networks much beyond lazy learning, as weights adapt to encode labels information.
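A minimal sketch of the rank-based Information Imbalance statistic the analysis relies on (as defined in prior work by Glielmo et al.): for each point, take its nearest neighbour in representation A and record that neighbour's distance rank in representation B; values near 0 mean A is predictive of B's neighbourhoods, values near 1 mean it is not.

```python
# Information Imbalance Delta(A -> B) between two representations of the same points.
import numpy as np
from scipy.spatial.distance import cdist

def information_imbalance(A, B):
    N = len(A)
    dA, dB = cdist(A, A), cdist(B, B)
    np.fill_diagonal(dA, np.inf)
    np.fill_diagonal(dB, np.inf)
    nn_A = dA.argmin(axis=1)                      # nearest neighbour in space A
    ranks_B = dB.argsort(axis=1).argsort(axis=1)  # distance ranks in space B
    return 2.0 / N * ranks_B[np.arange(N), nn_A].mean()

rng = np.random.default_rng(5)
X = rng.standard_normal((500, 5))
noisy = X + 2.0 * rng.standard_normal(X.shape)    # degraded copy of X
print("Delta(X -> noisy):", round(information_imbalance(X, noisy), 3))
print("Delta(noisy -> X):", round(information_imbalance(noisy, X), 3))
```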
3D|3D重建等相关(1篇)
【1】Unified Molecule Pre-training with Flexible 2D and 3D Modalities: Single and Paired Modality Integration
标题:具有灵活2D和3D模式的统一分子预训练:单一和配对模式集成
链接:https://arxiv.org/abs/2510.07035
作者:Tengwei Song, Min Wu, Yuan Fang
备注:CIKM 2025
摘要:Molecular representation learning plays a crucial role in advancing applications such as drug discovery and material design. Existing work leverages 2D and 3D modalities of molecular information for pre-training, aiming to capture comprehensive structural and geometric insights. However, these methods require paired 2D and 3D molecular data to train the model effectively and prevent it from collapsing into a single modality, posing limitations in scenarios where a certain modality is unavailable or computationally expensive to generate. To overcome this limitation, we propose FlexMol, a flexible molecule pre-training framework that learns unified molecular representations while supporting single-modality input. Specifically, inspired by the unified structure in vision-language models, our approach employs separate models for 2D and 3D molecular data, leverages parameter sharing to improve computational efficiency, and utilizes a decoder to generate features for the missing modality. This enables a multistage continuous learning process where both modalities contribute collaboratively during training, while ensuring robustness when only one modality is available during inference. Extensive experiments demonstrate that FlexMol achieves superior performance across a wide range of molecular property prediction tasks, and we also empirically demonstrate its effectiveness with incomplete data. Our code and data are available at https://github.com/tewiSong/FlexMol.
优化|敛散性(4篇)
【1】COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization
标题:COMPASS:工具介导的规划与偏好优化的多轮基准
链接:https://arxiv.org/abs/2510.07043
作者:Tian Qin, Felix Bai, Ting-Yao Hu, Raviteja Vemulapalli, Hema Swetha Koppula, Zhiyang Xu, Bowen Jin, Mert Cemri, Jiarui Lu, Zirui Wang, Meng Cao
摘要:Real-world large language model (LLM) agents must master strategic tool use and user preference optimization through multi-turn interactions to assist users with complex planning tasks. We introduce COMPASS (Constrained Optimization through Multi-turn Planning and Strategic Solutions), a benchmark that evaluates agents on realistic travel-planning scenarios. We cast travel planning as a constrained preference optimization problem, where agents must satisfy hard constraints while simultaneously optimizing soft user preferences. To support this, we build a realistic travel database covering transportation, accommodation, and ticketing for 20 U.S. National Parks, along with a comprehensive tool ecosystem that mirrors commercial booking platforms. Evaluating state-of-the-art models, we uncover two critical gaps: (i) an acceptable-optimal gap, where agents reliably meet constraints but fail to optimize preferences, and (ii) a plan-coordination gap, where performance collapses on multi-service (flight and hotel) coordination tasks, especially for open-source models. By grounding reasoning and planning in a practical, user-facing domain, COMPASS provides a benchmark that directly measures an agent's ability to optimize user preferences in realistic tasks, bridging theoretical advances with real-world impact.
【2】POME: Post Optimization Model Edit via Muon-style Projection
标题:POME:通过Muon式投影进行后优化模型编辑
链接:https://arxiv.org/abs/2510.06627
作者:Yong Liu, Di Fu, Yang Luo, Zirui Zhu, Minhao Cheng, Cho-Jui Hsieh, Yang You
摘要:We introduce Post-Optimization Model Edit (POME), a new algorithm that enhances the performance of fine-tuned large language models using only their pretrained and fine-tuned checkpoints, without requiring extra data or further optimization. The core idea is to apply a muon-style projection to $\Delta W$, the difference between the fine-tuned and pretrained weights. This projection uses truncated singular value decomposition (SVD) to equalize the influence of dominant update directions and prune small singular values, which often represent noise. As a simple post-processing step, POME is completely decoupled from the training pipeline. It requires zero modifications and imposes no overhead, making it universally compatible with any optimizer or distributed framework. POME delivers consistent gains, boosting average performance by +2.5\% on GSM8K and +1.0\% on code generation. Its broad applicability -- from 7B foundation models to 72B RLHF-instructed models -- establishes it as a practical, zero-cost enhancement for any fine-tuning pipeline. Code is available at https://github.com/NUS-HPC-AI-Lab/POME.
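A minimal sketch of the muon-style projection as described: compute Delta W between fine-tuned and pretrained weights, truncate its SVD to prune small singular values, and equalise the dominant ones before adding the edited update back. The truncation rank and the mean-based equalisation rule are illustrative assumptions, not necessarily POME's exact recipe.

```python
# Post-hoc edit of a fine-tuned weight via truncated SVD of Delta W.
import numpy as np

def pome_edit(w_pre, w_ft, rank):
    delta = w_ft - w_pre
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    S_edit = np.zeros_like(S)
    S_edit[:rank] = S[:rank].mean()   # equalise dominant directions, prune the rest
    return w_pre + (U * S_edit) @ Vt

rng = np.random.default_rng(6)
w_pre = rng.standard_normal((64, 64))
# Fine-tuned weight: a few strong update directions plus small noisy ones.
low_rank = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64))
w_ft = w_pre + low_rank + 0.05 * rng.standard_normal((64, 64))

w_edited = pome_edit(w_pre, w_ft, rank=4)
S_before = np.linalg.svd(w_ft - w_pre, compute_uv=False)
S_after = np.linalg.svd(w_edited - w_pre, compute_uv=False)
print("top-6 singular values before:", S_before[:6].round(2))
print("top-6 singular values after: ", S_after[:6].round(2))
```

As a pure post-processing step on checkpoints, this kind of edit touches no optimizer state or training code, which is what makes the method pipeline-agnostic.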
【3】Nearly Instance-Optimal Parameter Recovery from Many Trajectories via Hellinger Localization
标题:通过Hellinger局部化从多条轨迹中实现近乎实例最优的参数恢复
链接:https://arxiv.org/abs/2510.06434
作者:Eliot Shekhtman, Yichen Zhou, Ingvar Ziemann, Nikolai Matni, Stephen Tu
摘要:Learning from temporally-correlated data is a core facet of modern machine learning. Yet our understanding of sequential learning remains incomplete, particularly in the multi-trajectory setting where data consists of many independent realizations of a time-indexed stochastic process. This important regime both reflects modern training pipelines such as for large foundation models, and offers the potential for learning without the typical mixing assumptions made in the single-trajectory case. However, instance-optimal bounds are known only for least-squares regression with dependent covariates; for more general models or loss functions, the only broadly applicable guarantees result from a reduction to either i.i.d. learning, with effective sample size scaling only in the number of trajectories, or an existing single-trajectory result when each individual trajectory mixes, with effective sample size scaling as the full data budget deflated by the mixing-time. In this work, we significantly broaden the scope of instance-optimal rates in multi-trajectory settings via the Hellinger localization framework, a general approach for maximum likelihood estimation. Our method proceeds by first controlling the squared Hellinger distance at the path-measure level via a reduction to i.i.d. learning, followed by localization as a quadratic form in parameter space weighted by the trajectory Fisher information. This yields instance-optimal bounds that scale with the full data budget under a broad set of conditions. We instantiate our framework across four diverse case studies: a simple mixture of Markov chains, dependent linear regression under non-Gaussian noise, generalized linear models with non-monotonic activations, and linear-attention sequence models. In all cases, our bounds nearly match the instance-optimal rates from asymptotic normality, substantially improving over standard reductions.
【4】Bayesian Portfolio Optimization by Predictive Synthesis
标题:预测合成的Bayesian投资组合优化
链接:https://arxiv.org/abs/2510.07180
作者:Masahiro Kato, Kentaro Baba, Hibiki Kaibuchi, Ryo Inokuchi
摘要:Portfolio optimization is a critical task in investment. Most existing portfolio optimization methods require information on the distribution of returns of the assets that make up the portfolio. However, such distribution information is usually unknown to investors. Various methods have been proposed to estimate distribution information, but their accuracy greatly depends on the uncertainty of the financial markets. Due to this uncertainty, a model that could well predict the distribution information at one point in time may perform less accurately compared to another model at a different time. To solve this problem, we investigate a method for portfolio optimization based on Bayesian predictive synthesis (BPS), one of the Bayesian ensemble methods for meta-learning. We assume that investors have access to multiple asset return prediction models. By using BPS with dynamic linear models to combine these predictions, we can obtain a Bayesian predictive posterior about the mean rewards of assets that accommodate the uncertainty of the financial markets. In this study, we examine how to construct mean-variance portfolios and quantile-based portfolios based on the predicted distribution information.
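A minimal sketch of the two-stage pipeline: several models' return forecasts are combined into one predictive mean (a simple inverse-MSE precision weighting stands in for full Bayesian predictive synthesis with dynamic linear models), and mean-variance portfolio weights proportional to Sigma^{-1} mu are formed from the combined view. All numbers are illustrative.

```python
# Combine multiple return forecasts, then form mean-variance portfolio weights.
import numpy as np

rng = np.random.default_rng(7)
n_assets, n_models = 4, 3
true_mu = np.array([0.02, 0.05, 0.03, 0.01])
Sigma = 0.04 * (0.3 * np.ones((n_assets, n_assets)) + 0.7 * np.eye(n_assets))

# Each model's forecast and its historical mean-squared forecast error.
forecasts = true_mu + rng.normal(0, [[0.01], [0.03], [0.02]], (n_models, n_assets))
hist_mse = np.array([0.01, 0.03, 0.02]) ** 2

model_weights = (1 / hist_mse) / (1 / hist_mse).sum()  # precision weighting
mu_combined = model_weights @ forecasts

w = np.linalg.solve(Sigma, mu_combined)                # mean-variance direction
w = w / w.sum()                                        # fully-invested portfolio
print("combined expected returns:", mu_combined.round(4))
print("portfolio weights:", w.round(3))
```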
预测|估计(7篇)
【1】Evolutionary Profiles for Protein Fitness Prediction
标题:蛋白质适合度预测的进化概况
链接:https://arxiv.org/abs/2510.07286
作者:Jigang Fan, Xiaoran Jiao, Shengdong Lin, Zhanming Liang, Weian Mao, Chenchen Jing, Hao Chen, Chunhua Shen
摘要:Predicting the fitness impact of mutations is central to protein engineering but constrained by limited assays relative to the size of sequence space. Protein language models (pLMs) trained with masked language modeling (MLM) exhibit strong zero-shot fitness prediction; we provide a unifying view by interpreting natural evolution as implicit reward maximization and MLM as inverse reinforcement learning (IRL), in which extant sequences act as expert demonstrations and pLM log-odds serve as fitness estimates. Building on this perspective, we introduce EvoIF, a lightweight model that integrates two complementary sources of evolutionary signal: (i) within-family profiles from retrieved homologs and (ii) cross-family structural-evolutionary constraints distilled from inverse folding logits. EvoIF fuses sequence-structure representations with these profiles via a compact transition block, yielding calibrated probabilities for log-odds scoring. On ProteinGym (217 mutational assays; >2.5M mutants), EvoIF and its MSA-enabled variant achieve state-of-the-art or competitive performance while using only 0.15% of the training data and fewer parameters than recent large models. Ablations confirm that within-family and cross-family profiles are complementary, improving robustness across function types, MSA depths, taxa, and mutation depths. The codes will be made publicly available at https://github.com/aim-uofa/EvoIF.
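A minimal sketch of the within-family profile signal that EvoIF combines with structural constraints: a position-specific amino-acid profile is estimated from retrieved homologs, and a substitution is scored by its log-odds against the wild type. The toy alignment and pseudocount scheme are illustrative; real pipelines would use deep MSAs or pLM logits.

```python
# Within-family profile scoring: log-odds of a mutation vs. the wild-type residue.
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
msa = ["MKLV", "MKLV", "MRLV", "MKLI", "MKLV"]          # toy aligned homologs

L = len(msa[0])
counts = np.ones((L, len(AA)))                          # add-one pseudocounts
for seq in msa:
    for i, aa in enumerate(seq):
        counts[i, AA.index(aa)] += 1
profile = counts / counts.sum(axis=1, keepdims=True)    # P(aa | position)

def log_odds(pos, wt, mut):
    """Fitness estimate for substituting wt -> mut at pos (0-based)."""
    return np.log(profile[pos, AA.index(mut)]) - np.log(profile[pos, AA.index(wt)])

print("K2R score:", round(log_odds(1, "K", "R"), 3))    # seen in a homolog: mildly tolerated
print("K2W score:", round(log_odds(1, "K", "W"), 3))    # never observed: penalised
```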
【2】Non-Stationary Online Structured Prediction with Surrogate Losses
标题:具有代理损失的非平稳在线结构化预测
链接:https://arxiv.org/abs/2510.07086
作者:Shinsaku Sakaue, Han Bao, Yuzhou Cao
摘要:Online structured prediction, including online classification as a special case, is the task of sequentially predicting labels from input features. Therein the surrogate regret -- the cumulative excess of the target loss (e.g., 0-1 loss) over the surrogate loss (e.g., logistic loss) of the fixed best estimator -- has gained attention, particularly because it often admits a finite bound independent of the time horizon $T$. However, such guarantees break down in non-stationary environments, where every fixed estimator may incur the surrogate loss growing linearly with $T$. We address this by proving a bound of the form $F_T + C(1 + P_T)$ on the cumulative target loss, where $F_T$ is the cumulative surrogate loss of any comparator sequence, $P_T$ is its path length, and $C > 0$ is some constant. This bound depends on $T$ only through $F_T$ and $P_T$, often yielding much stronger guarantees in non-stationary environments. Our core idea is to synthesize the dynamic regret bound of the online gradient descent (OGD) with the technique of exploiting the surrogate gap. Our analysis also sheds light on a new Polyak-style learning rate for OGD, which systematically offers target-loss guarantees and exhibits promising empirical performance. We further extend our approach to a broader class of problems via the convolutional Fenchel--Young loss. Finally, we prove a lower bound showing that the dependence on $F_T$ and $P_T$ is tight.
【3】CNN-TFT explained by SHAP with multi-head attention weights for time series forecasting
标题:以SHAP结合多头注意力权重解释的CNN-TFT用于时间序列预测
链接:https://arxiv.org/abs/2510.06840
作者:Stefano F. Stefenon, João P. Matos-Carvalho, Valderi R. Q. Leithardt, Kin-Choong Yow
摘要:Convolutional neural networks (CNNs) and transformer architectures offer strengths for modeling temporal data: CNNs excel at capturing local patterns and translational invariances, while transformers effectively model long-range dependencies via self-attention. This paper proposes a hybrid architecture integrating convolutional feature extraction with a temporal fusion transformer (TFT) backbone to enhance multivariate time series forecasting. The CNN module first applies a hierarchy of one-dimensional convolutional layers to distill salient local patterns from raw input sequences, reducing noise and dimensionality. The resulting feature maps are then fed into the TFT, which applies multi-head attention to capture both short- and long-term dependencies and to weigh relevant covariates adaptively. We evaluate the CNN-TFT on a hydroelectric natural flow time series dataset. Experimental results demonstrate that CNN-TFT outperforms well-established deep learning models, with a mean absolute percentage error of up to 2.2%. The explainability of the model is obtained by a proposed Shapley additive explanations with multi-head attention weights (SHAP-MHAW). Our novel architecture, named CNN-TFT-SHAP-MHAW, is promising for applications requiring high-fidelity, multivariate time series forecasts, being available for future analysis at https://github.com/SFStefenon/CNN-TFT-SHAP-MHAW .
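A minimal PyTorch sketch of the hybrid design: a 1D convolutional stack distils local patterns, and multi-head self-attention then captures longer-range dependencies. The full TFT backbone (gating, variable selection, covariate handling) and the SHAP-MHAW explanation step are omitted, and all layer sizes are illustrative.

```python
# CNN feature extraction feeding multi-head self-attention for forecasting.
import torch
import torch.nn as nn

class CNNAttentionForecaster(nn.Module):
    def __init__(self, n_features, d_model=64, horizon=1):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, d_model, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x):                                  # x: (batch, time, features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)   # local pattern extraction
        h, attn_weights = self.attn(h, h, h)               # long-range dependencies
        return self.head(h[:, -1]), attn_weights           # forecast from last step

model = CNNAttentionForecaster(n_features=5)
x = torch.randn(8, 96, 5)                  # batch of 96-step multivariate windows
y_hat, attn = model(x)
print(y_hat.shape, attn.shape)             # torch.Size([8, 1]) torch.Size([8, 96, 96])
```

The returned attention weights are the kind of quantity a SHAP-with-attention explanation scheme could then attribute over input time steps.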
【4】Early wind turbine alarm prediction based on machine learning: AlarmForecasting
标题:基于机器学习的早期风力涡轮机警报预测:AlarmForecasting
链接:https://arxiv.org/abs/2510.06831
作者:Syed Shazaib Shah, Daoliang Tan
备注:International Journal of Electrical Power and Energy Systems
摘要:Alarm data is pivotal in curbing fault behavior in Wind Turbines (WTs) and forms the backbone for advanced predictive monitoring systems. Traditionally, research cohorts have been confined to utilizing alarm data solely as a diagnostic tool, merely indicative of unhealthy status. However, this study aims to offer a transformative leap towards preempting alarms, preventing alarms from triggering altogether, and consequently averting impending failures. Our proposed Alarm Forecasting and Classification (AFC) framework is designed on two successive modules: first, the regression module based on long short-term memory (LSTM) for time-series alarm forecasting, and thereafter, the classification module to implement alarm tagging on the forecasted alarm. This way, the entire alarm taxonomy can be forecasted reliably rather than a few specific alarms. 14 Senvion MM82 turbines with an operational period of 5 years are used as a case study; the results demonstrated 82%, 52%, and 41% accurate forecasts for 10, 20, and 30 min alarm forecasts, respectively. The results substantiate anticipating and averting alarms, which is significant in curbing alarm frequency and enhancing operational efficiency through proactive intervention.
【5】AI-Driven Forecasting and Monitoring of Urban Water System
标题:人工智能驱动的城市供水系统预测和监测
链接:https://arxiv.org/abs/2510.06631
作者:Qiming Guo, Bishal Khatri, Hua Zhang, Wenlu Wang
摘要:Underground water and wastewater pipelines are vital for city operations but plagued by anomalies like leaks and infiltrations, causing substantial water loss, environmental damage, and high repair costs. Conventional manual inspections lack efficiency, while dense sensor deployments are prohibitively expensive. In recent years, artificial intelligence has advanced rapidly and is increasingly applied to urban infrastructure. In this research, we propose an integrated AI and remote-sensor framework to address the challenge of leak detection in underground water pipelines, through deploying a sparse set of remote sensors to capture real-time flow and depth data, paired with HydroNet - a dedicated model utilizing pipeline attributes (e.g., material, diameter, slope) in a directed graph for higher-precision modeling. Evaluations on a real-world campus wastewater network dataset demonstrate that our system collects effective spatio-temporal hydraulic data, enabling HydroNet to outperform advanced baselines. This integration of edge-aware message passing with hydraulic simulations enables accurate network-wide predictions from limited sensor deployments. We envision that this approach can be effectively extended to a wide range of underground water pipeline networks.
【6】Test-Time Efficient Pretrained Model Portfolios for Time Series Forecasting
标题:用于时间序列预测的测试时高效预训练模型组合
链接:https://arxiv.org/abs/2510.06419
作者:Mert Kayaalp, Caner Turkmen, Oleksandr Shchur, Pedro Mercado, Abdul Fatir Ansari, Michael Bohlke-Schneider, Bernie Wang
摘要:Is bigger always better for time series foundation models? With the question in mind, we explore an alternative to training a single, large monolithic model: building a portfolio of smaller, pretrained forecasting models. By applying ensembling or model selection over these portfolios, we achieve competitive performance on large-scale benchmarks using far fewer parameters. We explore strategies for designing such portfolios and find that collections of specialist models consistently outperform portfolios of independently trained generalists. Remarkably, we demonstrate that post-training a base model is a compute-effective approach for creating sufficiently diverse specialists, and provide evidence that ensembling and model selection are more compute-efficient than test-time fine-tuning.
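"组合+选择"的基本操作可以用一个与框架无关的NumPy草图说明:在验证窗口上为每个预训练模型打分,然后取最优的若干个做简单平均集成(模型与指标均为占位示例):

```python
import numpy as np

def forecast_with_portfolio(models, val_x, val_y, test_x, top_k=3):
    """Sketch: score each pretrained forecaster on a validation window,
    then ensemble the top-k by averaging. Models are placeholder callables."""
    scores = [np.mean(np.abs(m(val_x) - val_y)) for m in models]  # MAE per model
    best = np.argsort(scores)[:top_k]                             # model selection
    return np.mean([models[i](test_x) for i in best], axis=0)     # simple ensemble

# toy usage: the "specialists" are constant-bias predictors
models = [lambda x, b=b: x.mean(axis=-1) + b for b in (-1.0, 0.0, 1.0, 2.0)]
val_x, val_y = np.random.randn(32, 24), np.random.randn(32)
print(forecast_with_portfolio(models, val_x, val_y, np.random.randn(8, 24)))
```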
【7】Asking For It: Question-Answering for Predicting Rule Infractions in Online Content Moderation
标题:自找的:用于预测在线内容审核中规则违规的问答
链接:https://arxiv.org/abs/2510.06350
作者:Mattia Samory, Diana Pamfile, Andrew To, Shruti Phadke
备注:Accepted at ICWSM 2026
摘要:Online communities rely on a mix of platform policies and community-authored rules to define acceptable behavior and maintain order. However, these rules vary widely across communities, evolve over time, and are enforced inconsistently, posing challenges for transparency, governance, and automation. In this paper, we model the relationship between rules and their enforcement at scale, introducing ModQ, a novel question-answering framework for rule-sensitive content moderation. Unlike prior classification or generation-based approaches, ModQ conditions on the full set of community rules at inference time and identifies which rule best applies to a given comment. We implement two model variants - extractive and multiple-choice QA - and train them on large-scale datasets from Reddit and Lemmy, the latter of which we construct from publicly available moderation logs and rule descriptions. Both models outperform state-of-the-art baselines in identifying moderation-relevant rule violations, while remaining lightweight and interpretable. Notably, ModQ models generalize effectively to unseen communities and rules, supporting low-resource moderation settings and dynamic governance environments.
其他神经网络|深度学习|模型|建模(24篇)
【1】Artificial Hippocampus Networks for Efficient Long-Context Modeling
标题:用于高效长上下文建模的人工海马网络
链接:https://arxiv.org/abs/2510.07318
作者:Yunhao Fang, Weihao Yu, Shu Zhong, Qinghao Ye, Xuehan Xiong, Lai Wei
备注:Code: this https URL
摘要:Long-sequence modeling faces a fundamental trade-off between the efficiency of compressive fixed-size memory in RNN-like models and the fidelity of lossless growing memory in attention-based Transformers. Inspired by the Multi-Store Model in cognitive science, we introduce a memory framework of artificial neural networks. Our method maintains a sliding window of the Transformer's KV cache as lossless short-term memory, while a learnable module termed Artificial Hippocampus Network (AHN) recurrently compresses out-of-window information into a fixed-size compact long-term memory. To validate this framework, we instantiate AHNs using modern RNN-like architectures, including Mamba2, DeltaNet, and Gated DeltaNet. Extensive experiments on long-context benchmarks LV-Eval and InfiniteBench demonstrate that AHN-augmented models consistently outperform sliding window baselines and achieve performance comparable or even superior to full-attention models, while substantially reducing computational and memory requirements. For instance, augmenting the Qwen2.5-3B-Instruct with AHNs reduces inference FLOPs by 40.5% and memory cache by 74.0%, while improving its average score on LV-Eval (128k sequence length) from 4.41 to 5.88. Code is available at: https://github.com/ByteDance-Seed/AHN.
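AHN的核心回路是:窗口内保留无损的KV式短期记忆,被挤出窗口的token则被循环模块压缩进固定大小的长期状态。下面用GRUCell充当"人工海马"做一个可运行的简化示意(并非论文中的Mamba2/DeltaNet实例,维度与窗口大小均为假设):

```python
import torch
import torch.nn as nn

class AHNMemory(nn.Module):
    """Sketch: lossless sliding-window short-term memory plus a recurrent
    module that compresses evicted tokens into a fixed-size long-term state."""
    def __init__(self, d_model=64, window=8):
        super().__init__()
        self.window = window
        self.compress = nn.GRUCell(d_model, d_model)  # stand-in for Mamba2/DeltaNet

    def forward(self, tokens):                      # tokens: (seq_len, d_model)
        long_term = torch.zeros(1, tokens.size(1))  # fixed-size compact memory
        short_term = []                             # exact in-window memory
        for t in range(tokens.size(0)):
            short_term.append(tokens[t:t + 1])
            if len(short_term) > self.window:       # token leaves the window...
                evicted = short_term.pop(0)
                long_term = self.compress(evicted, long_term)  # ...and is folded in
        return long_term, torch.cat(short_term)     # attention would read both

lt, st = AHNMemory()(torch.randn(32, 64))
print(lt.shape, st.shape)  # torch.Size([1, 64]) torch.Size([8, 64])
```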
【2】Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts
标题:专家指导:软路由混合专家的可证明特征学习动态
链接:https://arxiv.org/abs/2510.07205
作者:Fangshuo Liao, Anastasios Kyrillidis
摘要:Mixture-of-Experts (MoE) architectures have emerged as a cornerstone of modern AI systems. In particular, MoEs route inputs dynamically to specialized experts whose outputs are aggregated through weighted summation. Despite their widespread application, theoretical understanding of MoE training dynamics remains limited to either separate expert-router optimization or only top-1 routing scenarios with carefully constructed datasets. This paper advances MoE theory by providing convergence guarantees for joint training of soft-routed MoE models with non-linear routers and experts in a student-teacher framework. We prove that, with moderate over-parameterization, the student network undergoes a feature learning phase, where the router's learning process is ``guided'' by the experts, that recovers the teacher's parameters. Moreover, we show that a post-training pruning can effectively eliminate redundant neurons, followed by a provably convergent fine-tuning process that reaches global optimality. To our knowledge, our analysis is the first to bring novel insights in understanding the optimization landscape of the MoE architecture.
【3】Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report
标题:类人模型的生成世界建模:1X世界模型挑战赛技术报告
链接:https://arxiv.org/abs/2510.07092
作者:Riccardo Mereu, Aidan Scannell, Yuxin Hou, Yi Zhao, Aditya Jitta, Antonio Dominguez, Luigi Acerbi, Amos Storkey, Paul Chang
备注:6 pages, 3 figures, 1X world model challenge technical report
摘要:World models are a powerful paradigm in AI and robotics, enabling agents to reason about the future by predicting visual observations or compact latent states. The 1X World Model Challenge introduces an open-source benchmark of real-world humanoid interaction, with two complementary tracks: sampling, focused on forecasting future image frames, and compression, focused on predicting future discrete latent codes. For the sampling track, we adapt the video generation foundation model Wan-2.2 TI2V-5B to video-state-conditioned future frame prediction. We condition the video generation on robot states using AdaLN-Zero, and further post-train the model using LoRA. For the compression track, we train a Spatio-Temporal Transformer model from scratch. Our models achieve 23.0 dB PSNR in the sampling task and a Top-500 CE of 6.6386 in the compression task, securing 1st place in both challenges.
【4】Blind Construction of Angular Power Maps in Massive MIMO Networks
标题:大规模MIMO网络中角功率图的盲构建
链接:https://arxiv.org/abs/2510.07071
作者:Zheng Xing, Junting Chen
摘要:Channel state information (CSI) acquisition is a challenging problem in massive multiple-input multiple-output (MIMO) networks. Radio maps provide a promising solution for radio resource management by reducing online CSI acquisition. However, conventional approaches for radio map construction require location-labeled CSI data, which is challenging in practice. This paper investigates unsupervised angular power map construction based on large timescale CSI data collected in a massive MIMO network without location labels. A hidden Markov model (HMM) is built to connect the hidden trajectory of a mobile with the CSI evolution of a massive MIMO channel. As a result, the mobile location can be estimated, enabling the construction of an angular power map. We show that under uniform rectilinear mobility with Poisson-distributed base stations (BSs), the Cramer-Rao Lower Bound (CRLB) for localization error can vanish at any signal-to-noise ratios (SNRs), whereas when BSs are confined to a limited region, the error remains nonzero even with infinite independent measurements. Based on reference signal received power (RSRP) data collected in a real multi-cell massive MIMO network, an average localization error of 18 meters can be achieved although measurements are mainly obtained from a single serving cell.
【5】Native Hybrid Attention for Efficient Sequence Modeling
标题:用于高效序列建模的原生混合注意力
链接:https://arxiv.org/abs/2510.07019
作者:Jusen Du, Jiaxi Hu, Tao Zhang, Weigao Sun, Yu Cheng
备注:Technical report, 16 pages
摘要:Transformers excel at sequence modeling but face quadratic complexity, while linear attention offers improved efficiency but often compromises recall accuracy over long contexts. In this work, we introduce Native Hybrid Attention (NHA), a novel hybrid architecture of linear and full attention that integrates both intra \& inter-layer hybridization into a unified layer design. NHA maintains long-term context in key-value slots updated by a linear RNN, and augments them with short-term tokens from a sliding window. A single \texttt{softmax attention} operation is then applied over all keys and values, enabling per-token and per-head context-dependent weighting without requiring additional fusion parameters. The inter-layer behavior is controlled through a single hyperparameter, the sliding window size, which allows smooth adjustment between purely linear and full attention while keeping all layers structurally uniform. Experimental results show that NHA surpasses Transformers and other hybrid baselines on recall-intensive and commonsense reasoning tasks. Furthermore, pretrained LLMs can be structurally hybridized with NHA, achieving competitive accuracy while delivering significant efficiency gains. Code is available at https://github.com/JusenD/NHA.
【6】Spiral Model Technique For Data Science & Machine Learning Lifecycle
标题:面向数据科学与机器学习生命周期的螺旋模型技术
链接:https://arxiv.org/abs/2510.06987
作者:Rohith Mahadevan
摘要:Analytics play an important role in modern business. Companies adapt data science lifecycles to their culture to seek productivity and improve their competitiveness among others. Data science lifecycles are a fairly important contributing factor in starting and ending data-dependent projects. Data science and machine learning life cycles comprise a series of steps that are involved in a project. A typical life cycle is depicted as a linear or cyclical model, and it is commonly assumed that a traditional data science life cycle can restart the process after reaching the end of the cycle. This paper suggests a new technique for incorporating the data science life cycle into business problems that have a clear end goal. A new technique called the spiral technique is introduced to emphasize versatility, agility and an iterative approach to business processes.
【7】Fisher Information, Training and Bias in Fourier Regression Models
标题:傅立叶回归模型中的Fisher信息、训练和偏差
链接:https://arxiv.org/abs/2510.06945
作者:Lorenzo Pastori, Veronika Eyring, Mierk Schwabe
摘要:Motivated by the growing interest in quantum machine learning, in particular quantum neural networks (QNNs), we study how recently introduced evaluation metrics based on the Fisher information matrix (FIM) are effective for predicting their training and prediction performance. We exploit the equivalence between a broad class of QNNs and Fourier models, and study the interplay between the \emph{effective dimension} and the \emph{bias} of a model towards a given task, investigating how these affect the model's training and performance. We show that for a model that is completely agnostic, or unbiased, towards the function to be learned, a higher effective dimension likely results in a better trainability and performance. On the other hand, for models that are biased towards the function to be learned a lower effective dimension is likely beneficial during training. To obtain these results, we derive an analytical expression of the FIM for Fourier models and identify the features controlling a model's effective dimension. This allows us to construct models with tunable effective dimension and bias, and to compare their training. We furthermore introduce a tensor network representation of the considered Fourier models, which could be a tool of independent interest for the analysis of QNN models. Overall, these findings provide an explicit example of the interplay between geometrical properties, model-task alignment and training, which are relevant for the broader machine learning community.
【8】Recurrence-Complete Frame-based Action Models
标题:循环完备的基于帧的动作模型
链接:https://arxiv.org/abs/2510.06828
作者:Michael Keiblinger
摘要:In recent years, attention-like mechanisms have been used to great success in the space of large language models, unlocking scaling potential to a previously unthinkable extent. "Attention Is All You Need" famously claims RNN cells are not needed in conjunction with attention. We challenge this view. In this paper, we point to existing proofs that architectures with fully parallelizable forward or backward passes cannot represent classes of problems specifically interesting for long-running agentic tasks. We further conjecture a critical time t beyond which non-recurrence-complete models fail to aggregate inputs correctly, with concrete implications for agentic systems (e.g., software engineering agents). To address this, we introduce a recurrence-complete architecture and train it on GitHub-derived action sequences. Loss follows a power law in the trained sequence length while the parameter count remains fixed. Moreover, longer-sequence training always amortizes its linearly increasing wall-time cost, yielding lower loss as a function of wall time.
【9】AutoBalance: An Automatic Balancing Framework for Training Physics-Informed Neural Networks
标题:AutoBalance:用于训练物理信息神经网络的自动平衡框架
链接:https://arxiv.org/abs/2510.06684
作者:Kang An, Chenhao Si, Ming Yan, Shiqian Ma
备注:23 pages
摘要:Physics-Informed Neural Networks (PINNs) provide a powerful and general framework for solving Partial Differential Equations (PDEs) by embedding physical laws into loss functions. However, training PINNs is notoriously difficult due to the need to balance multiple loss terms, such as PDE residuals and boundary conditions, which often have conflicting objectives and vastly different curvatures. Existing methods address this issue by manipulating gradients before optimization (a "pre-combine" strategy). We argue that this approach is fundamentally limited, as forcing a single optimizer to process gradients from spectrally heterogeneous loss landscapes disrupts its internal preconditioning. In this work, we introduce AutoBalance, a novel "post-combine" training paradigm. AutoBalance assigns an independent adaptive optimizer to each loss component and aggregates the resulting preconditioned updates afterwards. Extensive experiments on challenging PDE benchmarks show that AutoBalance consistently outperforms existing frameworks, achieving significant reductions in solution error, as measured by both the MSE and $L^{\infty}$ norms. Moreover, AutoBalance is orthogonal to and complementary with other popular PINN methodologies, amplifying their effectiveness on demanding benchmarks.
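"后组合"范式的要点是:每个损失项拥有独立的自适应优化器状态,先各自完成预条件化,再把更新量相加。下面是一个假设性的逐损失Adam式实现草图(非官方代码,超参数为常见默认值):

```python
import torch

def autobalance_step(params, losses, states, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Sketch of a post-combine update: each loss gets its own Adam-style
    preconditioner; preconditioned updates are aggregated afterwards."""
    updates = [torch.zeros_like(p) for p in params]
    for st, loss in zip(states, losses):
        grads = torch.autograd.grad(loss, params, retain_graph=True)
        st["t"] += 1
        for j, g in enumerate(grads):
            st["m"][j] = b1 * st["m"][j] + (1 - b1) * g
            st["v"][j] = b2 * st["v"][j] + (1 - b2) * g * g
            m_hat = st["m"][j] / (1 - b1 ** st["t"])
            v_hat = st["v"][j] / (1 - b2 ** st["t"])
            updates[j] += m_hat / (v_hat.sqrt() + eps)  # per-loss preconditioned update
    with torch.no_grad():
        for p, u in zip(params, updates):
            p -= lr * u  # combine *after* preconditioning

# toy usage: two loss terms with very different curvatures share one parameter
p = torch.randn(3, requires_grad=True)
states = [{"t": 0, "m": [torch.zeros(3)], "v": [torch.zeros(3)]} for _ in range(2)]
for _ in range(100):
    autobalance_step([p], [(p ** 2).sum(), 100 * ((p - 1) ** 2).sum()], states)
print(p)
```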
【10】Three Forms of Stochastic Injection for Improved Distribution-to-Distribution Generative Modeling
标题:用于改进分布到分布生成建模的三种随机注入形式
链接:https://arxiv.org/abs/2510.06634
作者:Shiye Su, Yuhui Zhang, Linqi Zhou, Rajesh Ranganath, Serena Yeung-Levy
摘要:Modeling transformations between arbitrary data distributions is a fundamental scientific challenge, arising in applications like drug discovery and evolutionary simulation. While flow matching offers a natural framework for this task, its use has thus far primarily focused on the noise-to-data setting, while its application in the general distribution-to-distribution setting is underexplored. We find that in the latter case, where the source is also a data distribution to be learned from limited samples, standard flow matching fails due to sparse supervision. To address this, we propose a simple and computationally efficient method that injects stochasticity into the training process by perturbing source samples and flow interpolants. On five diverse imaging tasks spanning biology, radiology, and astronomy, our method significantly improves generation quality, outperforming existing baselines by an average of 9 FID points. Our approach also reduces the transport cost between input and generated samples to better highlight the true effect of the transformation, making flow matching a more practical tool for simulating the diverse distribution transformations that arise in science.
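摘要点名的注入位置是源样本与流插值。下面的NumPy草图演示一个训练样本对的构造方式(噪声尺度为假设值,仅为概念示意):

```python
import numpy as np

def perturbed_flow_matching_pair(x_src, x_tgt, sigma_src=0.1, sigma_interp=0.05, rng=None):
    """Sketch of stochastic injection for distribution-to-distribution flow
    matching: perturb the source sample and the interpolant to densify
    the otherwise sparse supervision."""
    rng = rng or np.random.default_rng()
    x0 = x_src + sigma_src * rng.standard_normal(x_src.shape)  # noisy source sample
    t = rng.uniform()
    x_t = (1 - t) * x0 + t * x_tgt                             # straight-line interpolant
    x_t += sigma_interp * rng.standard_normal(x_t.shape)       # noisy interpolant
    v_target = x_tgt - x0                                      # velocity regression target
    return t, x_t, v_target

t, x_t, v = perturbed_flow_matching_pair(np.zeros(4), np.ones(4))
```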
【11】DPA-Net: A Dual-Path Attention Neural Network for Inferring Glycemic Control Metrics from Self-Monitored Blood Glucose Data
标题:DPA-Net:一种双路径注意力神经网络,用于从自我监控的血糖数据推断血糖控制指标
链接:https://arxiv.org/abs/2510.06623
作者:Canyu Lei, Benjamin Lobo, Jianxin Xie
备注:14 pages, 10 figures
摘要:Continuous glucose monitoring (CGM) provides dense and dynamic glucose profiles that enable reliable estimation of Ambulatory Glucose Profile (AGP) metrics, such as Time in Range (TIR), Time Below Range (TBR), and Time Above Range (TAR). However, the high cost and limited accessibility of CGM restrict its widespread adoption, particularly in low- and middle-income regions. In contrast, self-monitoring of blood glucose (SMBG) is inexpensive and widely available but yields sparse and irregular data that are challenging to translate into clinically meaningful glycemic metrics. In this work, we propose a Dual-Path Attention Neural Network (DPA-Net) to estimate AGP metrics directly from SMBG data. DPA-Net integrates two complementary paths: (1) a spatial-channel attention path that reconstructs a CGM-like trajectory from sparse SMBG observations, and (2) a multi-scale ResNet path that directly predicts AGP metrics. An alignment mechanism between the two paths is introduced to reduce bias and mitigate overfitting. In addition, we develop an active point selector to identify realistic and informative SMBG sampling points that reflect patient behavioral patterns. Experimental results on a large, real-world dataset demonstrate that DPA-Net achieves robust accuracy with low errors while reducing systematic bias. To the best of our knowledge, this is the first supervised machine learning framework for estimating AGP metrics from SMBG data, offering a practical and clinically relevant decision-support tool in settings where CGM is not accessible.
【12】Incoherence in goal-conditioned autoregressive models
标题:目标条件自回归模型中的不一致性
链接:https://arxiv.org/abs/2510.06545
作者:Jacek Karwowski, Raymond Douglas
摘要:We investigate mathematically the notion of incoherence: a structural issue with reinforcement learning policies derived by naive goal-conditioning of autoregressive models. We focus on the process of re-training models on their own actions, that is, fine-tuning offline-learned policies with online RL. We prove that it decreases incoherence and leads to an improvement in return, and we aim to characterize the resulting trajectory of policies. By re-framing standard notions of control-as-inference and soft Q learning, we establish a three-way correspondence with two other ways of understanding the iterative re-training process: as folding the posterior into the reward and, in the deterministic case, as decreasing the temperature parameter; the correspondence has computational content via the training-inference trade-off. Through soft-conditioning generative models, we discuss the link between incoherence and the effective horizon.
【13】Wide Neural Networks as a Baseline for the Computational No-Coincidence Conjecture
标题:宽神经网络作为计算无巧合猜想的基线
链接:https://arxiv.org/abs/2510.06527
作者:John Dunbar, Scott Aaronson
摘要:We establish that randomly initialized neural networks, with large width and a natural choice of hyperparameters, have nearly independent outputs exactly when their activation function is nonlinear with zero mean under the Gaussian measure: $\mathbb{E}_{z \sim \mathcal{N}(0,1)}[\sigma(z)]=0$. For example, this includes ReLU and GeLU with an additive shift, as well as tanh, but not ReLU or GeLU by themselves. Because of their nearly independent outputs, we propose neural networks with zero-mean activation functions as a promising candidate for the Alignment Research Center's computational no-coincidence conjecture -- a conjecture that aims to measure the limits of AI interpretability.
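判据 $\mathbb{E}_{z\sim\mathcal{N}(0,1)}[\sigma(z)]=0$ 很容易数值验证:由于 $\mathbb{E}[\mathrm{ReLU}(z)]=1/\sqrt{2\pi}$,给ReLU减去这一常数即得到满足条件的零均值激活。下面的小脚本用蒙特卡洛做示意性检查:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000_000)

relu = np.maximum(z, 0)
shift = 1 / np.sqrt(2 * np.pi)   # E[ReLU(z)] under N(0,1)

print(np.mean(relu))             # ~0.3989: ReLU alone is not zero-mean
print(np.mean(relu - shift))     # ~0: shifted ReLU satisfies the criterion
print(np.mean(np.tanh(z)))       # ~0: tanh is zero-mean by symmetry
```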
【14】Text-to-Image Models Leave Identifiable Signatures: Implications for Leaderboard Security
标题:文本到图像模型留下可识别签名:对排行榜安全性的影响
链接:https://arxiv.org/abs/2510.06525
作者:Ali Naseh, Anshuman Suri, Yuefeng Peng, Harsh Chaudhari, Alina Oprea, Amir Houmansadr
备注:Accepted at Lock-LLM Workshop, NeurIPS 2025
摘要:Generative AI leaderboards are central to evaluating model capabilities, but remain vulnerable to manipulation. Among key adversarial objectives is rank manipulation, where an attacker must first deanonymize the models behind displayed outputs -- a threat previously demonstrated and explored for large language models (LLMs). We show that this problem can be even more severe for text-to-image leaderboards, where deanonymization is markedly easier. Using over 150,000 generated images from 280 prompts and 19 diverse models spanning multiple organizations, architectures, and sizes, we demonstrate that simple real-time classification in CLIP embedding space identifies the generating model with high accuracy, even without prompt control or historical data. We further introduce a prompt-level separability metric and identify prompts that enable near-perfect deanonymization. Our results indicate that rank manipulation in text-to-image leaderboards is easier than previously recognized, underscoring the need for stronger defenses.
【15】Flexible Swarm Learning May Outpace Foundation Models in Essential Tasks
标题:灵活的群体学习可能在关键任务中超越基础模型
链接:https://arxiv.org/abs/2510.06349
作者:Moein E. Samadi, Andreas Schuppert
摘要:Foundation models have rapidly advanced AI, raising the question of whether their decisions will ultimately surpass human strategies in real-world domains. The exponential, and possibly super-exponential, pace of AI development makes such analysis elusive. Nevertheless, many application areas that matter for daily life and society show only modest gains so far; a prominent case is diagnosing and treating dynamically evolving disease in intensive care. The common challenge is adapting complex systems to dynamic environments. Effective strategies must optimize outcomes in systems composed of strongly interacting functions while avoiding shared side effects; this requires reliable, self-adaptive modeling. These tasks align with building digital twins of highly complex systems whose mechanisms are not fully or quantitatively understood. It is therefore essential to develop methods for self-adapting AI models with minimal data and limited mechanistic knowledge. As this challenge extends beyond medicine, AI should demonstrate clear superiority in these settings before assuming broader decision-making roles. We identify the curse of dimensionality as a fundamental barrier to efficient self-adaptation and argue that monolithic foundation models face conceptual limits in overcoming it. As an alternative, we propose a decentralized architecture of interacting small agent networks (SANs). We focus on agents representing the specialized substructure of the system, where each agent covers only a subset of the full system functions. Drawing on mathematical results on the learning behavior of SANs and evidence from existing applications, we argue that swarm-learning in diverse swarms can enable self-adaptive SANs to deliver superior decision-making in dynamic environments compared with monolithic foundation models, though at the cost of reduced reproducibility in detail.
【16】RVFL-X: A Novel Randomized Network Based on Complex Transformed Real-Valued Tabular Datasets
标题:RVFL-X:一种基于复数变换实值表格数据集的新型随机网络
链接:https://arxiv.org/abs/2510.06278
作者:M. Sajid, Mushir Akhtar, A. Quadir, M. Tanveer
摘要:Recent advancements in neural networks, supported by foundational theoretical insights, emphasize the superior representational power of complex numbers. However, their adoption in randomized neural networks (RNNs) has been limited due to the lack of effective methods for transforming real-valued tabular datasets into complex-valued representations. To address this limitation, we propose two methods for generating complex-valued representations from real-valued datasets: a natural transformation and an autoencoder-driven method. Building on these mechanisms, we propose RVFL-X, a complex-valued extension of the random vector functional link (RVFL) network. RVFL-X integrates complex transformations into real-valued datasets while maintaining the simplicity and efficiency of the original RVFL architecture. By leveraging complex components such as input, weights, and activation functions, RVFL-X processes complex representations and produces real-valued outputs. Comprehensive evaluations on 80 real-valued UCI datasets demonstrate that RVFL-X consistently outperforms both the original RVFL and state-of-the-art (SOTA) RNN variants, showcasing its robustness and effectiveness across diverse application domains.
【17】Covert Quantum Learning: Privately and Verifiably Learning from Quantum Data
标题:隐蔽量子学习:私下且可验证地从量子数据中学习
链接:https://arxiv.org/abs/2510.07193
作者:Abhishek Anand, Matthias C. Caro, Ari Karchmer, Saachi Mutreja
备注:16 + 54 pages
摘要:Quantum learning from remotely accessed quantum compute and data must address two key challenges: verifying the correctness of data and ensuring the privacy of the learner's data-collection strategies and resulting conclusions. The covert (verifiable) learning model of Canetti and Karchmer (TCC 2021) provides a framework for endowing classical learning algorithms with such guarantees. In this work, we propose models of covert verifiable learning in quantum learning theory and realize them without computational hardness assumptions for remote data access scenarios motivated by established quantum data advantages. We consider two privacy notions: (i) strategy-covertness, where the eavesdropper does not gain information about the learner's strategy; and (ii) target-covertness, where the eavesdropper does not gain information about the unknown object being learned. We show: Strategy-covert algorithms for making quantum statistical queries via classical shadows; Target-covert algorithms for learning quadratic functions from public quantum examples and private quantum statistical queries, for Pauli shadow tomography and stabilizer state learning from public multi-copy and private single-copy quantum measurements, and for solving Forrelation and Simon's problem from public quantum queries and private classical queries, where the adversary is a unidirectional or i.i.d. ancilla-free eavesdropper. The lattermost results in particular establish that the exponential separation between classical and quantum queries for Forrelation and Simon's problem survives under covertness constraints. Along the way, we design covert verifiable protocols for quantum data acquisition from public quantum queries which may be of independent interest. Overall, our models and corresponding algorithms demonstrate that quantum advantages are privately and verifiably achievable even with untrusted, remote data.
【18】Explaining Models under Multivariate Bernoulli Distribution via Hoeffding Decomposition
标题:利用Hoeffding分解解释多元伯努利分布下的模型
链接:https://arxiv.org/abs/2510.07088
作者:Baptiste Ferrere (EDF R\&D PRISME, IMT, SINCLAIR AI Lab), Nicolas Bousquet (EDF R\&D PRISME, SINCLAIR AI Lab, LPSM (UMR\_8001)), Fabrice Gamboa (IMT), Jean-Michel Loubes (IMT), Joseph Muré (EDF R\&D PRISME)
摘要:Explaining the behavior of predictive models with random inputs can be achieved through sub-model decomposition, where such sub-models have more easily interpretable features. Arising from the uncertainty quantification community, recent results have demonstrated the existence and uniqueness of a generalized Hoeffding decomposition for such predictive models when the stochastic input variables are correlated, based on concepts of oblique projection onto $L^2$ subspaces. This article focuses on the case where the input variables have Bernoulli distributions and provides a complete description of this decomposition. We show that in this case the underlying $L^2$ subspaces are one-dimensional and that the functional decomposition is explicit. This leads to a complete interpretability framework and theoretically allows reverse engineering. Explicit indicators of the influence of inputs on the output prediction (exemplified by Sobol' indices and Shapley effects) can be explicitly derived. Illustrated by numerical experiments, this type of analysis proves useful for addressing decision-support problems, based on binary decision diagrams, Boolean networks or binary neural networks. The article outlines perspectives for exploring high-dimensional settings and, beyond the case of binary inputs, extending these findings to models with finite countable inputs.
【19】Reconquering Bell sampling on qudits: stabilizer learning and testing, quantum pseudorandomness bounds, and more
标题:重新征服qudits上的贝尔采样:稳定子学习与测试、量子伪随机性界限等
链接:https://arxiv.org/abs/2510.06848
作者:Jonathan Allcock, Joao F. Doriguello, Gábor Ivanyos, Miklos Santha
备注:51 pages, 1 figure. Comments are welcome
摘要:Bell sampling is a simple yet powerful tool based on measuring two copies of a quantum state in the Bell basis, and has found applications in a plethora of problems related to stabiliser states and measures of magic. However, it was not known how to generalise the procedure from qubits to $d$-level systems -- qudits -- for all dimensions $d > 2$ in a useful way. Indeed, a prior work of the authors (arXiv'24) showed that the natural extension of Bell sampling to arbitrary dimensions fails to provide meaningful information about the quantum states being measured. In this paper, we overcome the difficulties encountered in previous works and develop a useful generalisation of Bell sampling to qudits of all $d\geq 2$. At the heart of our primitive is a new unitary, based on Lagrange's four-square theorem, that maps four copies of any stabiliser state $|\mathcal{S}\rangle$ to four copies of its complex conjugate $|\mathcal{S}^\ast\rangle$ (up to some Pauli operator), which may be of independent interest. We then demonstrate the utility of our new Bell sampling technique by lifting several known results from qubits to qudits for any $d\geq 2$: 1. Learning stabiliser states in $O(n^3)$ time with $O(n)$ samples; 2. Solving the Hidden Stabiliser Group Problem in $\tilde{O}(n^3/\varepsilon)$ time with $\tilde{O}(n/\varepsilon)$ samples; 3. Testing whether $|\psi\rangle$ has stabiliser size at least $d^t$ or is $\varepsilon$-far from all such states in $\tilde{O}(n^3/\varepsilon)$ time with $\tilde{O}(n/\varepsilon)$ samples; 4. Clifford circuits with at most $n/2$ single-qudit non-Clifford gates cannot prepare pseudorandom states; 5. Testing whether $|\psi\rangle$ has stabiliser fidelity at least $1-\varepsilon_1$ or at most $1-\varepsilon_2$ with $O(d^2/\varepsilon_2)$ samples if $\varepsilon_1 = 0$ or $O(d^2/\varepsilon_2^2)$ samples if $\varepsilon_1 = O(d^{-2})$.
【20】Q-Learning with Fine-Grained Gap-Dependent Regret
标题:具有细粒度差距相关遗憾的Q学习
链接:https://arxiv.org/abs/2510.06647
作者:Haochen Zhang, Zhong Zheng, Lingzhou Xue
摘要:We study fine-grained gap-dependent regret bounds for model-free reinforcement learning in episodic tabular Markov Decision Processes. Existing model-free algorithms achieve minimax worst-case regret, but their gap-dependent bounds remain coarse and fail to fully capture the structure of suboptimality gaps. We address this limitation by establishing fine-grained gap-dependent regret bounds for both UCB-based and non-UCB-based algorithms. In the UCB-based setting, we develop a novel analytical framework that explicitly separates the analysis of optimal and suboptimal state-action pairs, yielding the first fine-grained regret upper bound for UCB-Hoeffding (Jin et al., 2018). To highlight the generality of this framework, we introduce ULCB-Hoeffding, a new UCB-based algorithm inspired by AMB (Xu et al.,2021) but with a simplified structure, which enjoys fine-grained regret guarantees and empirically outperforms AMB. In the non-UCB-based setting, we revisit the only known algorithm AMB, and identify two key issues in its algorithm design and analysis: improper truncation in the $Q$-updates and violation of the martingale difference condition in its concentration argument. We propose a refined version of AMB that addresses these issues, establishing the first rigorous fine-grained gap-dependent regret for a non-UCB-based method, with experiments demonstrating improved performance over AMB.
【21】Adapting Quantum Machine Learning for Energy Dissociation of Bonds
标题:将量子机器学习用于化学键解离能
链接:https://arxiv.org/abs/2510.06563
作者:Swathi Chandrasekhar, Shiva Raj Pokhrel, Navneet Singh
摘要:Accurate prediction of bond dissociation energies (BDEs) underpins mechanistic insight and the rational design of molecules and materials. We present a systematic, reproducible benchmark comparing quantum and classical machine learning models for BDE prediction using a chemically curated feature set encompassing atomic properties (atomic numbers, hybridization), bond characteristics (bond order, type), and local environmental descriptors. Our quantum framework, implemented in Qiskit Aer on six qubits, employs ZZFeatureMap encodings with variational ansatz (RealAmplitudes) across multiple architectures Variational Quantum Regressors (VQR), Quantum Support Vector Regressors (QSVR), Quantum Neural Networks (QNN), Quantum Convolutional Neural Networks (QCNN), and Quantum Random Forests (QRF). These are rigorously benchmarked against strong classical baselines, including Support Vector Regression (SVR), Random Forests (RF), and Multi-Layer Perceptrons (MLP). Comprehensive evaluation spanning absolute and relative error metrics, threshold accuracies, and error distributions shows that top-performing quantum models (QCNN, QRF) match the predictive accuracy and robustness of classical ensembles and deep networks, particularly within the chemically prevalent mid-range BDE regime. These findings establish a transparent baseline for quantum-enhanced molecular property prediction and outline a practical foundation for advancing quantum computational chemistry toward near chemical accuracy.
【22】Diffusion-Guided Renormalization of Neural Systems via Tensor Networks
标题:利用张量网络进行神经系统的扩散引导重正化
链接:https://arxiv.org/abs/2510.06361
作者:Nathan X. Kodama
备注:Reformatted version of Dissertation submitted for the Doctor of Philosophy in Systems and Control Engineering at Case Western Reserve University, 2025
摘要:Far from equilibrium, neural systems self-organize across multiple scales. Exploiting multiscale self-organization in neuroscience and artificial intelligence requires a computational framework for modeling the effective non-equilibrium dynamics of stochastic neural trajectories. Non-equilibrium thermodynamics and representational geometry offer theoretical foundations, but we need scalable data-driven techniques for modeling collective properties of high-dimensional neural networks from partial subsampled observations. Renormalization is a coarse-graining technique central to studying emergent scaling properties of many-body and nonlinear dynamical systems. While widely applied in physics and machine learning, coarse-graining complex dynamical networks remains unsolved, affecting many computational sciences. Recent diffusion-based renormalization, inspired by quantum statistical mechanics, coarse-grains networks near entropy transitions marked by maximal changes in specific heat or information transmission. Here I explore diffusion-based renormalization of neural systems by generating symmetry-breaking representations across scales and offering scalable algorithms using tensor networks. Diffusion-guided renormalization bridges microscale and mesoscale dynamics of dissipative neural systems. For microscales, I developed a scalable graph inference algorithm for discovering community structure from subsampled neural activity. Using community-based node orderings, diffusion-guided renormalization generates renormalization group flow through metagraphs and joint probability functions. Towards mesoscales, diffusion-guided renormalization targets learning the effective non-equilibrium dynamics of dissipative neural trajectories occupying lower-dimensional subspaces, enabling coarse-to-fine control in systems neuroscience and artificial intelligence.
【23】Mass Conservation on Rails - Rethinking Physics-Informed Learning of Ice Flow Vector Fields
标题:轨道上的质量守恒--重新思考冰流矢量场的物理信息学习
链接:https://arxiv.org/abs/2510.06286
作者:Kim Bente, Roman Marchant, Fabio Ramos
备注:Accepted at the Tackling Climate Change with Machine Learning Workshop at NeurIPS 2025. 9 pages, 4 figures
摘要:To reliably project future sea level rise, ice sheet models require inputs that respect physics. Embedding physical principles like mass conservation into models that interpolate Antarctic ice flow vector fields from sparse & noisy measurements not only promotes physical adherence but can also improve accuracy and robustness. While physics-informed neural networks (PINNs) impose physics as soft penalties, offering flexibility but no physical guarantees, we instead propose divergence-free neural networks (dfNNs), which enforce local mass conservation exactly via a vector calculus trick. Our comparison of dfNNs, PINNs, and unconstrained NNs on ice flux interpolation over Byrd Glacier suggests that "mass conservation on rails" yields more reliable estimates, and that directional guidance, a learning strategy leveraging continent-wide satellite velocity data, boosts performance across models.
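二维情形下,实现局部质量守恒的标准矢量微积分技巧之一,是让网络输出标量流函数 $\psi$ 并取 $u=(\partial\psi/\partial y,\,-\partial\psi/\partial x)$,其散度恒为零。以下PyTorch草图(假设性实现,未必与论文的dfNN构造完全一致)演示该做法:

```python
import torch
import torch.nn as nn

psi = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))  # scalar stream function

def divergence_free_velocity(xy):
    """u = (dpsi/dy, -dpsi/dx) is divergence-free by construction."""
    xy = xy.requires_grad_(True)
    grad = torch.autograd.grad(psi(xy).sum(), xy, create_graph=True)[0]
    return torch.stack([grad[:, 1], -grad[:, 0]], dim=1)

# numerical check: div u = du_x/dx + du_y/dy vanishes up to float error
xy = torch.randn(5, 2)
u = divergence_free_velocity(xy)
div = sum(torch.autograd.grad(u[:, i].sum(), xy, retain_graph=True)[0][:, i]
          for i in range(2))
print(div)
```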
【24】Developing a Sequential Deep Learning Pipeline to Model Alaskan Permafrost Thaw Under Climate Change
标题:开发序列式深度学习管道以模拟气候变化下的阿拉斯加永久冻土融化
链接:https://arxiv.org/abs/2510.06258
作者:Addina Rahaman
备注:20 pages, 16 figures. Number of figures are tentative and will be reduced in the future
摘要:Changing climate conditions threaten the natural permafrost thaw-freeze cycle, leading to year-round soil temperatures above 0{\deg}C. In Alaska, the warming of the topmost permafrost layer, known as the active layer, signals elevated greenhouse gas release due to high carbon storage. Accurate soil temperature prediction is therefore essential for risk mitigation and stability assessment; however, many existing approaches overlook the numerous factors driving soil thermal dynamics. This study presents a proof-of-concept latitude-based deep learning pipeline for modeling yearly soil temperatures across multiple depths. The framework employs dynamic reanalysis feature data from the ERA5-Land dataset, static geologic and lithological features, sliding-window sequences for seasonal context, a derived scenario signal feature for long-term climate forcing, and latitude band embeddings for spatial sensitivity. Five deep learning models were tested: a Temporal Convolutional Network (TCN), a Transformer, a 1-Dimensional Convolutional Long-Short Term Memory (Conv1DLSTM), a Gated-Recurrent Unit (GRU), and a Bidirectional Long-Short Term Memory (BiLSTM). Results showed solid recognition of latitudinal and depth-wise temperature discrepancies, with the GRU performing best in sequential temperature pattern detection. Bias-corrected CMIP5 RCP data enabled recognition of sinusoidal temperature trends, though limited divergence between scenarios were observed. This study establishes an end-to-end framework for adopting deep learning in active layer temperature modeling, offering seasonal, spatial, and vertical temperature context without intrinsic restrictions on feature selection.
其他(42篇)
【1】Vibe Checker: Aligning Code Evaluation with Human Preference
标题:Vibe Checker:使代码评估与人类偏好对齐
链接:https://arxiv.org/abs/2510.07315
作者:Ming Zhong, Xiang Zhou, Ting-Yun Chang, Qingze Wang, Nan Xu, Xiance Si, Dan Garrette, Shyam Upadhyay, Jeremiah Liu, Jiawei Han, Benoit Schillings, Jiao Sun
备注:Preprint
摘要:Large Language Models (LLMs) have catalyzed vibe coding, where users leverage LLMs to generate and iteratively refine code through natural language interactions until it passes their vibe check. Vibe check is tied to real-world human preference and goes beyond functionality: the solution should feel right, read cleanly, preserve intent, and remain correct. However, current code evaluation remains anchored to pass@k and captures only functional correctness, overlooking the non-functional instructions that users routinely apply. In this paper, we hypothesize that instruction following is the missing piece underlying vibe check that represents human preference in coding besides functional correctness. To quantify models' code instruction following capabilities with measurable signals, we present VeriCode, a taxonomy of 30 verifiable code instructions together with corresponding deterministic verifiers. We use the taxonomy to augment established evaluation suites, resulting in Vibe Checker, a testbed to assess both code instruction following and functional correctness. Upon evaluating 31 leading LLMs, we show that even the strongest models struggle to comply with multiple instructions and exhibit clear functional regression. Most importantly, a composite score of functional correctness and instruction following correlates the best with human preference, with the latter emerging as the primary differentiator on real-world programming tasks. Our work identifies core factors of the vibe check, providing a concrete path for benchmarking and developing models that better align with user preferences in coding.
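"可验证代码指令+确定性验证器"的形态,可以用两个假设性的例子体会(指令与规则均为虚构,并非VeriCode分类法中的真实条目;要点在于验证器是纯确定性的程序检查):

```python
import ast

def verify_no_for_loops(code: str) -> bool:
    """Hypothetical verifiable instruction: 'do not use for-loops'."""
    return not any(isinstance(n, ast.For) for n in ast.walk(ast.parse(code)))

def verify_max_function_length(code: str, max_lines: int = 20) -> bool:
    """Hypothetical instruction: every function body stays under max_lines."""
    tree = ast.parse(code)
    return all(n.end_lineno - n.lineno < max_lines
               for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))

snippet = "def squares(xs):\n    return [x * x for x in xs]\n"
print(verify_no_for_loops(snippet), verify_max_function_length(snippet))  # True True
```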
【2】MLE-Smith: Scaling MLE Tasks with Automated Multi-Agent Pipeline
标题:MLE-Smith:利用自动多代理管道扩展MLE任务
链接:https://arxiv.org/abs/2510.07307
作者:Rushi Qiang, Yuchen Zhuang, Anikait Singh, Percy Liang, Chao Zhang, Sherry Yang, Bo Dai
摘要:While Language Models (LMs) have made significant progress in automating machine learning engineering (MLE), the acquisition of high-quality MLE training data is significantly constrained. Current MLE benchmarks suffer from low scalability and limited applicability because they rely on static, manually curated tasks, demanding extensive time and manual effort to produce. We introduce MLE-Smith, a fully automated multi-agent pipeline, to transform raw datasets into competition-style MLE challenges through an efficient generate-verify-execute paradigm for scaling MLE tasks with verifiable quality, real-world usability, and rich diversity. The proposed multi-agent pipeline in MLE-Smith drives structured task design and standardized refactoring, coupled with a hybrid verification mechanism that enforces strict structural rules and high-level semantic soundness. It further validates empirical solvability and real-world fidelity through interactive execution. We apply MLE-Smith to 224 real-world datasets and generate 606 tasks spanning multiple categories, objectives, and modalities, demonstrating that MLE-Smith can work effectively across a wide range of real-world datasets. Evaluation on the generated tasks shows that the performance of eight mainstream and cutting-edge LLMs on MLE-Smith tasks is strongly correlated with their performance on carefully human-designed tasks, highlighting the effectiveness of MLE-Smith in scaling up MLE tasks while maintaining task quality.
【3】Cocoon: A System Architecture for Differentially Private Training with Correlated Noises
标题:Cocoon:一种面向相关噪声差分隐私训练的系统架构
链接:https://arxiv.org/abs/2510.07304
作者:Donghwan Kim, Xin Gu, Jinho Baek, Timothy Lo, Younghoon Min, Kwangsik Shin, Jongryool Kim, Jongse Park, Kiwan Maeng
摘要:Machine learning (ML) models memorize and leak training data, causing serious privacy issues to data owners. Training algorithms with differential privacy (DP), such as DP-SGD, have been gaining attention as a solution. However, DP-SGD adds noise at each training iteration, which degrades the accuracy of the trained model. To improve accuracy, a new family of approaches adds carefully designed correlated noises, so that noises cancel out each other across iterations. We performed an extensive characterization study of these new mechanisms, for the first time to the best of our knowledge, and show they incur non-negligible overheads when the model is large or uses large embedding tables. Motivated by the analysis, we propose Cocoon, a hardware-software co-designed framework for efficient training with correlated noises. Cocoon accelerates models with embedding tables through pre-computing and storing correlated noises in a coalesced format (Cocoon-Emb), and supports large models through a custom near-memory processing device (Cocoon-NMP). On a real system with an FPGA-based NMP device prototype, Cocoon improves the performance by 2.33-10.82x (Cocoon-Emb) and 1.55-3.06x (Cocoon-NMP).
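"相关噪声"机制的直觉是:让各迭代注入的噪声在迭代间相互抵消,使累计进权重的噪声保持有界。下面用NumPy做一个与Cocoon本身无关的机制层面玩具演示(反相关噪声的前缀和可伸缩抵消,保持O(1);独立噪声的前缀和按 $\sqrt{T}$ 增长;隐私核算在此省略):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000
w = rng.standard_normal(T + 1)

iid_noise = rng.standard_normal(T)  # independent per-step noise (plain DP-SGD style)
corr_noise = w[1:] - w[:-1]         # anti-correlated noise: adjacent steps cancel

# the prefix sum is what accumulates in the trained weights
print(np.abs(np.cumsum(iid_noise)).max())   # grows like sqrt(T)
print(np.abs(np.cumsum(corr_noise)).max())  # stays O(1): telescopes to w_t - w_0
```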
【4】Online Rubrics Elicitation from Pairwise Comparisons
标题:基于成对比较的在线评分准则诱导
链接:https://arxiv.org/abs/2510.07284
作者:MohammadHossein Rezaei, Robert Vacareanu, Zihao Wang, Clinton Wang, Yunzhong He, Afra Feyza Akyürek
摘要:Rubrics provide a flexible way to train LLMs on open-ended long-form answers where verifiable rewards are not applicable and human preferences provide coarse signals. Prior work shows that reinforcement learning with rubric-based rewards leads to consistent gains in LLM post-training. Most existing approaches rely on rubrics that remain static over the course of training. Such static rubrics, however, are vulnerable to reward-hacking type behaviors and fail to capture emergent desiderata that arise during training. We introduce Online Rubrics Elicitation (OnlineRubrics), a method that dynamically curates evaluation criteria in an online manner through pairwise comparisons of responses from current and reference policies. This online process enables continuous identification and mitigation of errors as training proceeds. Empirically, this approach yields consistent improvements of up to 8% over training exclusively with static rubrics across AlpacaEval, GPQA, ArenaHard as well as the validation sets of expert questions and rubrics. We qualitatively analyze the elicited criteria and identify prominent themes such as transparency, practicality, organization, and reasoning.
【5】Dynamic Regret Bounds for Online Omniprediction with Long Term Constraints
标题:具有长期约束的在线全方位预测的动态遗憾界限
链接:https://arxiv.org/abs/2510.07266
作者:Yahav Bechavod, Jiuyao Lu, Aaron Roth
摘要:We present an algorithm guaranteeing dynamic regret bounds for online omniprediction with long term constraints. The goal in this recently introduced problem is for a learner to generate a sequence of predictions which are broadcast to a collection of downstream decision makers. Each decision maker has their own utility function, as well as a vector of constraint functions, each mapping their actions and an adversarially selected state to reward or constraint violation terms. The downstream decision makers select actions "as if" the state predictions are correct, and the goal of the learner is to produce predictions such that all downstream decision makers choose actions that give them worst-case utility guarantees while minimizing worst-case constraint violation. Within this framework, we give the first algorithm that obtains simultaneous \emph{dynamic regret} guarantees for all of the agents -- where regret for each agent is measured against a potentially changing sequence of actions across rounds of interaction, while also ensuring vanishing constraint violation for each agent. Our results do not require the agents themselves to maintain any state -- they only solve one-round constrained optimization problems defined by the prediction made at that round.
【6】Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
标题:混合强化:当奖励稀疏时,稠密更好
链接:https://arxiv.org/abs/2510.07242
作者:Leitian Tao, Ilia Kulikov, Swarnadeep Saha, Tianlu Wang, Jing Xu, Yixuan Li, Jason E Weston, Ping Yu
备注:20 pages
摘要:Post-training for reasoning of large language models (LLMs) increasingly relies on verifiable rewards: deterministic checkers that provide 0-1 correctness signals. While reliable, such binary feedback is brittle--many tasks admit partially correct or alternative answers that verifiers under-credit, and the resulting all-or-nothing supervision limits learning. Reward models offer richer, continuous feedback, which can serve as a complementary supervisory signal to verifiers. We introduce HERO (Hybrid Ensemble Reward Optimization), a reinforcement learning framework that integrates verifier signals with reward-model scores in a structured way. HERO employs stratified normalization to bound reward-model scores within verifier-defined groups, preserving correctness while refining quality distinctions, and variance-aware weighting to emphasize challenging prompts where dense signals matter most. Across diverse mathematical reasoning benchmarks, HERO consistently outperforms RM-only and verifier-only baselines, with strong gains on both verifiable and hard-to-verify tasks. Our results show that hybrid reward design retains the stability of verifiers while leveraging the nuance of reward models to advance reasoning.
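HERO"分层归一化"的关键在于:奖励模型分数只在验证器给定的组内做归一化,且两组被映射到互不交叠的区间,从而既保住正确性信号又保留质量区分。下面的NumPy草图示意这一点(区间端点为虚构假设,并非论文取值):

```python
import numpy as np

def hero_style_reward(verifier, rm_scores, bands=((0.0, 0.4), (0.6, 1.0))):
    """Sketch of stratified normalization: min-max normalize reward-model
    scores *within* each verifier-defined group, then map the groups to
    disjoint bands (band endpoints are illustrative assumptions)."""
    verifier = np.asarray(verifier)
    rm_scores = np.asarray(rm_scores, dtype=float)
    rewards = np.empty_like(rm_scores)
    for v, (lo, hi) in zip((0, 1), bands):
        mask = verifier == v
        if mask.any():
            s = rm_scores[mask]
            span = np.ptp(s) if np.ptp(s) > 0 else 1.0
            rewards[mask] = lo + (hi - lo) * (s - s.min()) / span
    return rewards

print(hero_style_reward([0, 0, 1, 1], [0.1, 0.9, 0.2, 0.8]))
# incorrect answers land in [0, 0.4], correct ones in [0.6, 1.0]:
# correctness is preserved while RM scores still rank quality within groups
```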
【7】A Broader View of Thompson Sampling
标题:汤普森抽样的更广泛视角
链接:https://arxiv.org/abs/2510.07208
作者:Yanlin Qu, Hongseok Namkoong, Assaf Zeevi
摘要:Thompson Sampling is one of the most widely used and studied bandit algorithms, known for its simple structure, low regret performance, and solid theoretical guarantees. Yet, in stark contrast to most other families of bandit algorithms, the exact mechanism through which posterior sampling (as introduced by Thompson) is able to "properly" balance exploration and exploitation, remains a mystery. In this paper we show that the core insight to address this question stems from recasting Thompson Sampling as an online optimization algorithm. To distill this, a key conceptual tool is introduced, which we refer to as "faithful" stationarization of the regret formulation. Essentially, the finite horizon dynamic optimization problem is converted into a stationary counterpart which "closely resembles" the original objective (in contrast, the classical infinite horizon discounted formulation, that leads to the Gittins index, alters the problem and objective in too significant a manner). The newly crafted time invariant objective can be studied using Bellman's principle which leads to a time invariant optimal policy. When viewed through this lens, Thompson Sampling admits a simple online optimization form that mimics the structure of the Bellman-optimal policy, and where greediness is regularized by a measure of residual uncertainty based on point-biserial correlation. This answers the question of how Thompson Sampling balances exploration-exploitation, and moreover, provides a principled framework to study and further improve Thompson's original idea.
【8】ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
标题:ELMUR:用于长时域强化学习的带更新/重写机制的外部层记忆
链接:https://arxiv.org/abs/2510.07151
作者:Egor Cherepanov, Alexey K. Kovalev, Aleksandr I. Panov
备注:22 pages, 7 figures
摘要:Real-world robotic agents must act under partial observability and long horizons, where key cues may appear long before they affect decision making. However, most modern approaches rely solely on instantaneous information, without incorporating insights from the past. Standard recurrent or transformer models struggle with retaining and leveraging long-term dependencies: context windows truncate history, while naive memory extensions fail under scale and sparsity. We propose ELMUR (External Layer Memory with Update/Rewrite), a transformer architecture with structured external memory. Each layer maintains memory embeddings, interacts with them via bidirectional cross-attention, and updates them through an Least Recently Used (LRU) memory module using replacement or convex blending. ELMUR extends effective horizons up to 100,000 times beyond the attention window and achieves a 100% success rate on a synthetic T-Maze task with corridors up to one million steps. In POPGym, it outperforms baselines on more than half of the tasks. On MIKASA-Robo sparse-reward manipulation tasks with visual observations, it nearly doubles the performance of strong baselines. These results demonstrate that structured, layer-local external memory offers a simple and scalable approach to decision making under partial observability.
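ELMUR每层的"读记忆、写记忆"回路可以用如下单层简化草图理解(凸混合写入最久未写的槽位;槽位数、维度与写入规则细节均为假设):

```python
import torch
import torch.nn as nn

class LayerMemory(nn.Module):
    """Sketch of ELMUR-style layer-local memory: tokens read the slots via
    cross-attention; the least-recently-written slot is then rewritten by
    convex blending with a summary of the current segment."""
    def __init__(self, n_slots=16, d_model=64, alpha=0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.alpha = alpha
        self.register_buffer("mem", 0.02 * torch.randn(1, n_slots, d_model))
        self.register_buffer("written_at", torch.zeros(n_slots))
        self.step = 0

    @torch.no_grad()
    def forward(self, tokens):                            # tokens: (1, seq, d_model)
        self.step += 1
        read, _ = self.attn(tokens, self.mem, self.mem)   # memory read
        lru = int(self.written_at.argmin())               # least recently used slot
        summary = tokens.mean(dim=1).squeeze(0)           # segment summary to store
        self.mem[0, lru] = self.alpha * self.mem[0, lru] + (1 - self.alpha) * summary
        self.written_at[lru] = self.step                  # LRU bookkeeping
        return tokens + read                              # memory-augmented features

out = LayerMemory()(torch.randn(1, 32, 64))
print(out.shape)
```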
【9】TRIM: Token-wise Attention-Derived Saliency for Data-Efficient Instruction Tuning
标题:TRIM:基于令牌的注意力派生显著性,用于数据高效的指令调优
链接:https://arxiv.org/abs/2510.07118
作者:Manish Nagaraj, Sakshi Choudhary, Utkarsh Saxena, Deepak Ravikumar, Kaushik Roy
摘要:Instruction tuning is essential for aligning large language models (LLMs) to downstream tasks and commonly relies on large, diverse corpora. However, small, high-quality subsets, known as coresets, can deliver comparable or superior results, though curating them remains challenging. Existing methods often rely on coarse, sample-level signals like gradients, an approach that is computationally expensive and overlooks fine-grained features. To address this, we introduce TRIM (Token Relevance via Interpretable Multi-layer Attention), a forward-only, token-centric framework. Instead of using gradients, TRIM operates by matching underlying representational patterns identified via attention-based "fingerprints" from a handful of target samples. Such an approach makes TRIM highly efficient and uniquely sensitive to the structural features that define a task. Coresets selected by our method consistently outperform state-of-the-art baselines by up to 9% on downstream tasks and even surpass the performance of full-data fine-tuning in some settings. By avoiding expensive backward passes, TRIM achieves this at a fraction of the computational cost. These findings establish TRIM as a scalable and efficient alternative for building high-quality instruction-tuning datasets.
【10】The Contingencies of Physical Embodiment Allow for Open-Endedness and Care
标题:物理具身的偶然性使开放性与关怀成为可能
链接:https://arxiv.org/abs/2510.07117
作者:Leonardo Christov-Moore (1), Arthur Juliani (1), Alex Kiefer (1 and 2 and 3), Nicco Reggente (1), B. Scott Rousse (4), Adam Safron (1 and 5), Nicolás Hinrichs (6 and 7), Daniel Polani (8), Antonio Damasio (9) ((1) Institute for Advanced Consciousness Studies, Santa Monica, CA, (2) VERSES, (3) Monash Centre for Consciousness and Contemplative Studies, (4) Allen Discovery Center, (5) Allen Discovery Center, (6) Okinawa Institute of Science and Technology, (7) Max Planck Institute for Human Cognitive and Brain Sciences, (8) University of Hertfordshire, (9) Brain and Creativity Institute)
备注:15 pages, 1 figure
摘要:Physical vulnerability and mortality are often seen as obstacles to be avoided in the development of artificial agents, which struggle to adapt to open-ended environments and provide aligned care. Meanwhile, biological organisms survive, thrive, and care for each other in an open-ended physical world with relative ease and efficiency. Understanding the role of the conditions of life in this disparity can aid in developing more robust, adaptive, and caring artificial agents. Here we define two minimal conditions for physical embodiment inspired by the existentialist phenomenology of Martin Heidegger: being-in-the-world (the agent is a part of the environment) and being-towards-death (unless counteracted, the agent drifts toward terminal states due to the second law of thermodynamics). We propose that from these conditions we can obtain both a homeostatic drive - aimed at maintaining integrity and avoiding death by expending energy to learn and act - and an intrinsic drive to continue to do so in as many ways as possible. Drawing inspiration from Friedrich Nietzsche's existentialist concept of will-to-power, we examine how intrinsic drives to maximize control over future states, e.g., empowerment, allow agents to increase the probability that they will be able to meet their future homeostatic needs, thereby enhancing their capacity to maintain physical integrity. We formalize these concepts within a reinforcement learning framework, which enables us to examine how intrinsically driven embodied agents learning in open-ended multi-agent environments may cultivate the capacities for open-endedness and care.
【11】Pseudo-MDPs: A Novel Framework for Efficiently Optimizing Last Revealer Seed Manipulations in Blockchains
标题:伪MDPs:一种有效优化区块链中最后揭示者种子操作的新型框架
链接:https://arxiv.org/abs/2510.07080
作者:Maxime Reynouard
摘要:This study tackles the computational challenges of solving Markov Decision Processes (MDPs) for a restricted class of problems. It is motivated by the Last Revealer Attack (LRA), which undermines fairness in some Proof-of-Stake (PoS) blockchains such as Ethereum (\$400B market capitalization). We introduce pseudo-MDPs (pMDPs), a framework that naturally models such problems, and propose two distinct problem reductions to standard MDPs. One problem reduction provides a novel, counter-intuitive perspective, and combining the two problem reductions enables significant improvements in dynamic programming algorithms such as value iteration. In the case of the LRA, whose size is parameterized by $\kappa$ (in Ethereum's case $\kappa = 32$), we reduce the computational complexity from $O(2^{\kappa}\,\kappa^{2^{(\kappa+2)}})$ to $O(\kappa^{4})$ (per iteration). This solution also provides the usual benefits of dynamic programming solutions: exponentially fast convergence toward the optimal solution is guaranteed. The dual perspective also simplifies policy extraction, making the approach well-suited for resource-constrained agents who can operate with very limited memory and computation once the problem has been solved. Furthermore, we generalize those results to a broader class of MDPs, enhancing their applicability. The framework is validated through two case studies: a fictional card game and the LRA on the Ethereum random seed consensus protocol. These applications demonstrate the framework's ability to solve large-scale problems effectively while offering actionable insights into optimal strategies. This work advances the study of MDPs and contributes to understanding security vulnerabilities in blockchain systems.
【12】Federated Unlearning in the Wild: Rethinking Fairness and Data Discrepancy
标题:野外联合放弃学习:重新思考公平和数据差异
链接:https://arxiv.org/abs/2510.07022
作者:ZiHeng Huang, Di Wu, Jun Bai, Jiale Zhang, Sicong Cao, Ji Zhang, Yingjie Hu
摘要:Machine unlearning is critical for enforcing data deletion rights like the "right to be forgotten." As a decentralized paradigm, Federated Learning (FL) also requires unlearning, but realistic implementations face two major challenges. First, fairness in Federated Unlearning (FU) is often overlooked. Exact unlearning methods typically force all clients into costly retraining, even those uninvolved. Approximate approaches, using gradient ascent or distillation, make coarse interventions that can unfairly degrade performance for clients with only retained data. Second, most FU evaluations rely on synthetic data assumptions (IID/non-IID) that ignore real-world heterogeneity. These unrealistic benchmarks obscure the true impact of unlearning and limit the applicability of current methods. We first conduct a comprehensive benchmark of existing FU methods under realistic data heterogeneity and fairness conditions. We then propose a novel, fairness-aware FU approach, Federated Cross-Client-Constrains Unlearning (FedCCCU), to explicitly address both challenges. FedCCCU offers a practical and scalable solution for real-world FU. Experimental results show that existing methods perform poorly in realistic settings, while our approach consistently outperforms them.
【13】Revisiting Mixout: An Overlooked Path to Robust Finetuning
标题:重温Mixout:通往稳健微调的被忽视之路
链接:https://arxiv.org/abs/2510.06982
作者:Masih Aminbeidokhti, Heitor Rapela Medeiros, Eric Granger, Marco Pedersoli
摘要:Finetuning vision foundation models often improves in-domain accuracy but comes at the cost of robustness under distribution shift. We revisit Mixout, a stochastic regularizer that intermittently replaces finetuned weights with their pretrained reference, through the lens of a single-run, weight-sharing implicit ensemble. This perspective reveals three key levers that govern robustness: the masking anchor, resampling frequency, and mask sparsity. Guided by this analysis, we introduce GMixout, which (i) replaces the fixed anchor with an exponential moving-average snapshot that adapts during training, and (ii) regulates the masking period via an explicit resampling-frequency hyperparameter. Our sparse-kernel implementation updates only a small fraction of parameters with no inference-time overhead, enabling training on consumer-grade GPUs. In experiments on benchmarks covering covariate shift, corruption, and class imbalance (ImageNet / ImageNet-LT, DomainNet, iWildCam, and CIFAR100-C), GMixout consistently improves in-domain accuracy beyond zero-shot performance while surpassing both Model Soups and strong parameter-efficient finetuning baselines under distribution shift.
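A minimal sketch of the two levers GMixout adds on top of Mixout (an adaptive EMA anchor and an explicit resampling period), assuming a plain PyTorch training loop; the hyperparameter names and the dense update below are illustrative, not the paper's sparse-kernel implementation.

```python
import torch

class GMixout:
    """Sketch of a GMixout-style regularizer. Assumptions: `mask_prob`,
    `resample_every`, and `ema_decay` are illustrative names; the paper's
    exact update rules are not reproduced here."""

    def __init__(self, model, mask_prob=0.9, resample_every=100, ema_decay=0.999):
        self.model = model
        self.mask_prob = mask_prob
        self.resample_every = resample_every
        self.ema_decay = ema_decay
        # EMA "anchor" snapshot that adapts during training.
        self.anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
        self.masks = {}
        self.step_count = 0

    @torch.no_grad()
    def step(self):
        self.step_count += 1
        # Update the moving-average anchor.
        for n, p in self.model.named_parameters():
            self.anchor[n].mul_(self.ema_decay).add_(p.detach(), alpha=1 - self.ema_decay)
        # Periodically resample which coordinates get masked back to the anchor.
        if (self.step_count - 1) % self.resample_every == 0:
            self.masks = {n: (torch.rand_like(p) < self.mask_prob)
                          for n, p in self.model.named_parameters()}
        # Swap masked coordinates to their anchor values.
        for n, p in self.model.named_parameters():
            p.copy_(torch.where(self.masks[n], self.anchor[n], p))
```

In use, `GMixout.step()` would be called once after each optimizer step.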
【14】High-Rate Mixout: Revisiting Mixout for Robust Domain Generalization
标题:高速Mixout:重新审视Mixout以实现稳健的领域概括
链接:https://arxiv.org/abs/2510.06955
作者:Masih Aminbeidokhti, Heitor Rapela Medeiros, Eric Granger, Marco Pedersoli
备注:WACV 2026: Winter Conference on Applications of Computer Vision 2026
摘要:Ensembling fine-tuned models initialized from powerful pre-trained weights is a common strategy to improve robustness under distribution shifts, but it comes with substantial computational costs due to the need to train and store multiple models. Dropout offers a lightweight alternative by simulating ensembles through random neuron deactivation; however, when applied to pre-trained models, it tends to over-regularize and disrupt critical representations necessary for generalization. In this work, we investigate Mixout, a stochastic regularization technique that provides an alternative to Dropout for domain generalization. Rather than deactivating neurons, Mixout mitigates overfitting by probabilistically swapping a subset of fine-tuned weights with their pre-trained counterparts during training, thereby maintaining a balance between adaptation and retention of prior knowledge. Our study reveals that achieving strong performance with Mixout on domain generalization benchmarks requires a notably high masking probability of 0.9 for ViTs and 0.8 for ResNets. While this may seem like a simple adjustment, it yields two key advantages for domain generalization: (1) higher masking rates more strongly penalize deviations from the pre-trained parameters, promoting better generalization to unseen domains; and (2) high-rate masking substantially reduces computational overhead, cutting gradient computation by up to 45% and gradient memory usage by up to 90%. Experiments across five domain generalization benchmarks, PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet, using ResNet and ViT architectures, show that our approach, High-rate Mixout, achieves out-of-domain accuracy comparable to ensemble-based methods while significantly reducing training costs.
【15】Grouped Differential Attention
标题:分组差异注意力
链接:https://arxiv.org/abs/2510.06949
作者:Junghwan Lim, Sungmin Lee, Dongseok Kim, Wai Ting Cheung, Beomgyu Kim, Taehwan Kim, Haesol Lee, Junhyeok Lee, Dongpin Oh, Eunhwan Park
摘要:The self-attention mechanism, while foundational to modern Transformer architectures, suffers from a critical inefficiency: it frequently allocates substantial attention to redundant or noisy context. Differential Attention addressed this by using subtractive attention maps for signal and noise, but its required balanced head allocation imposes rigid constraints on representational flexibility and scalability. To overcome this, we propose Grouped Differential Attention (GDA), a novel approach that introduces unbalanced head allocation between signal-preserving and noise-control groups. GDA significantly enhances signal focus by strategically assigning more heads to signal extraction and fewer to noise-control, stabilizing the latter through controlled repetition (akin to GQA). This design achieves stronger signal fidelity with minimal computational overhead. We further extend this principle to group-differentiated growth, a scalable strategy that selectively replicates only the signal-focused heads, thereby ensuring efficient capacity expansion. Through large-scale pretraining and continual training experiments, we demonstrate that moderate imbalance ratios in GDA yield substantial improvements in generalization and stability compared to symmetric baselines. Our results collectively establish that ratio-aware head allocation and selective expansion offer an effective and practical path toward designing scalable, computation-efficient Transformer architectures.
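The sketch below illustrates the core idea under stated assumptions: many signal-preserving heads, a few noise-control heads repeated GQA-style across groups, and a subtractive map as in Differential Attention. Tensor shapes and the scalar `lam` are illustrative; the paper's exact formulation is not reproduced.

```python
import torch
import torch.nn.functional as F

def grouped_differential_attention(q_sig, k_sig, q_noise, k_noise, v, lam=0.5):
    """Sketch of a grouped differential attention map.

    q_sig, k_sig:     (B, H_sig, T, d)   signal-preserving heads (the majority)
    q_noise, k_noise: (B, H_noise, T, d) noise-control heads, H_noise < H_sig
    v:                (B, H_sig, T, d)
    """
    d = q_sig.shape[-1]
    a_sig = F.softmax(q_sig @ k_sig.transpose(-2, -1) / d**0.5, dim=-1)
    a_noise = F.softmax(q_noise @ k_noise.transpose(-2, -1) / d**0.5, dim=-1)
    # Share the few noise maps across signal heads, akin to GQA head grouping.
    repeat = q_sig.shape[1] // q_noise.shape[1]        # e.g. 6 signal / 2 noise heads
    a_noise = a_noise.repeat_interleave(repeat, dim=1)
    # Subtractive combination, as in Differential Attention.
    return (a_sig - lam * a_noise) @ v
```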
【16】Multi-Dimensional Autoscaling of Stream Processing Services on Edge Devices
标题:边缘设备上流处理服务的多维自动扩展
链接:https://arxiv.org/abs/2510.06882
作者:Boris Sedlak, Philipp Raith, Andrea Morichetta, Víctor Casamayor Pujol, Schahram Dustdar
摘要:Edge devices have limited resources, which inevitably leads to situations where stream processing services cannot satisfy their needs. While existing autoscaling mechanisms focus entirely on resource scaling, Edge devices require alternative ways to sustain the Service Level Objectives (SLOs) of competing services. To address these issues, we introduce a Multi-dimensional Autoscaling Platform (MUDAP) that supports fine-grained vertical scaling across both service- and resource-level dimensions. MUDAP supports service-specific scaling tailored to available parameters, e.g., scale data quality or model size for a particular service. To optimize the execution across services, we present a scaling agent based on Regression Analysis of Structural Knowledge (RASK). The RASK agent efficiently explores the solution space and learns a continuous regression model of the processing environment for inferring optimal scaling actions. We compared our approach with two autoscalers, the Kubernetes VPA and a reinforcement learning agent, for scaling up to 9 services on a single Edge device. Our results showed that RASK can infer an accurate regression model in merely 20 iterations (i.e., observing 200s of processing). By increasingly adding elasticity dimensions, RASK sustained the highest request load with 28% fewer SLO violations, compared to baselines.
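As a rough illustration of regression-based scaling in the spirit of RASK (not the paper's actual model), one can fit a continuous surrogate of SLO fulfillment over multi-dimensional scaling actions and pick the action the surrogate scores best; the quadratic features below are an assumption.

```python
import numpy as np

def fit_scaling_model(configs, slo_fulfillment):
    """Fit a simple continuous regression of observed SLO fulfillment over
    scaling configurations (an illustrative stand-in for RASK's regression).

    configs: (n, d) array, each row a scaling action (e.g. data quality,
             model size, cores); slo_fulfillment: (n,) observed objective.
    """
    X = np.hstack([configs, configs**2, np.ones((len(configs), 1))])  # quadratic features
    coef, *_ = np.linalg.lstsq(X, slo_fulfillment, rcond=None)
    return coef

def best_action(coef, candidate_configs):
    """Score candidate scaling actions with the surrogate and return the best."""
    X = np.hstack([candidate_configs, candidate_configs**2,
                   np.ones((len(candidate_configs), 1))])
    return candidate_configs[np.argmax(X @ coef)]
```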
【17】Vectorized FlashAttention with Low-cost Exponential Computation in RISC-V Vector Processors
标题:RISC-V向量处理器中具有低成本指数计算的向量化FlashAttention
链接:https://arxiv.org/abs/2510.06834
作者:Vasileios Titopoulos, Kosmas Alexandridis, Giorgos Dimitrakopoulos
摘要:Attention is a core operation in numerous machine learning and artificial intelligence models. This work focuses on the acceleration of attention kernel using FlashAttention algorithm, in vector processors, particularly those based on the RISC-V instruction set architecture (ISA). This work represents the first effort to vectorize FlashAttention, minimizing scalar code and simplifying the computational complexity of evaluating exponentials needed by softmax used in attention. By utilizing a low-cost approximation for exponentials in floating-point arithmetic, we reduce the cost of computing the exponential function without the need to extend baseline vector ISA with new custom instructions. Also, appropriate tiling strategies are explored with the goal to improve memory locality. Experimental results highlight the scalability of our approach, demonstrating significant performance gains with the vectorized implementations when processing attention layers in practical applications.
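One well-known low-cost exponential scheme that fits this description is Schraudolph-style bit manipulation of the IEEE-754 representation; whether the paper uses this exact variant is an assumption, but the sketch shows how softmax exponentials can be approximated in floating-point arithmetic without new custom instructions.

```python
import numpy as np

def fast_exp(x):
    """Schraudolph-style exp approximation via IEEE-754 bit manipulation.

    Builds the float32 bit pattern whose exponent field encodes x/ln(2);
    relative error is a few percent (up to about 6% without a correction
    constant), which is often acceptable for softmax normalization.
    """
    x = np.asarray(x, dtype=np.float32)
    a = np.float32(2**23 / np.log(2))   # scales x into the exponent field
    b = np.float32(127 * 2**23)         # exponent bias shifted into place
    i = (a * x + b).astype(np.int32)    # assemble the bit pattern
    return i.view(np.float32)           # reinterpret bits as a float

x = np.linspace(-5, 5, 11, dtype=np.float32)
print(np.max(np.abs(fast_exp(x) - np.exp(x)) / np.exp(x)))  # a few percent
```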
【18】Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking
标题:用于大规模视觉语言重新排序的高效区分联合编码器
链接:https://arxiv.org/abs/2510.06820
作者:Mitchell Keren Taraday, Shahaf Wagner, Chaim Baskin
备注:preprint
摘要:Multimodal retrieval still leans on embedding-based models like CLIP for fast vector search over pre-computed image embeddings. Yet, unlike text retrieval, where joint-encoder rerankers are standard, comparable vision-language rerankers are largely absent. We find that seminal joint encoders such as BLIP are severely bottlenecked by an expensive visual feature-extraction stage, preventing practical deployment at scale. Motivated by this bottleneck, we introduce EDJE, an Efficient Discriminative Joint Encoder that precomputes vision tokens offline and compresses them via a lightweight attention-based adapter, so online inference runs only a compact joint encoder over a small set of visual tokens plus the text. EDJE preserves strong retrieval performance while drastically reducing storage and online compute, enabling high-throughput inference. Specifically, EDJE processes 50k image-text pairs per second while requiring 49kB of disk storage per image, matching prior art on Flickr (zero-shot) and COCO (fine-tuned) retrieval. The implementation and checkpoints will be made publicly available shortly.
【19】BlackboxNLP-2025 MIB Shared Task: Exploring Ensemble Strategies for Circuit Localization Methods
标题:BlackboxNLP-2025 MIB共享任务:探索电路定位方法的集成策略
链接:https://arxiv.org/abs/2510.06811
作者:Philipp Mondorf, Mingyang Wang, Sebastian Gerstner, Ahmad Dawar Hakimi, Yihong Liu, Leonor Veloso, Shijia Zhou, Hinrich Schütze, Barbara Plank
备注:The 8th BlackboxNLP Workshop (Shared Task), 6 pages
摘要:The Circuit Localization track of the Mechanistic Interpretability Benchmark (MIB) evaluates methods for localizing circuits within large language models (LLMs), i.e., subnetworks responsible for specific task behaviors. In this work, we investigate whether ensembling two or more circuit localization methods can improve performance. We explore two variants: parallel and sequential ensembling. In parallel ensembling, we combine attribution scores assigned to each edge by different methods, e.g., by averaging or taking the minimum or maximum value. In the sequential ensemble, we use edge attribution scores obtained via EAP-IG as a warm start for a more expensive but more precise circuit identification method, namely edge pruning. We observe that both approaches yield notable gains on the benchmark metrics, leading to a more precise circuit identification approach. Finally, we find that taking a parallel ensemble over various methods, including the sequential ensemble, achieves the best results. We evaluate our approach in the BlackboxNLP 2025 MIB Shared Task, comparing ensemble scores to official baselines across multiple model-task combinations.
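A parallel ensemble over edge attribution scores reduces to an elementwise combine; the sketch below assumes scores from different methods are on comparable scales (in practice they may need normalization first), and the edge keys are illustrative.

```python
import numpy as np

def parallel_ensemble(score_dicts, mode="mean"):
    """Combine per-edge attribution scores from several circuit-localization
    methods by averaging or taking the min/max, as described above.

    score_dicts: list of {edge: score} dicts, one per method.
    """
    combiners = {"mean": np.mean, "min": np.min, "max": np.max}
    edges = set().union(*score_dicts)            # all edges seen by any method
    combined = {}
    for e in edges:
        vals = [d.get(e, 0.0) for d in score_dicts]  # missing edges scored 0
        combined[e] = float(combiners[mode](vals))
    return combined
```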
【20】Function regression using the forward forward training and inferring paradigm
标题:使用前向-前向训练和推理范式的函数回归
链接:https://arxiv.org/abs/2510.06762
作者:Shivam Padmani, Akshay Joshi
备注:Keywords: Neural Networks, Forward Forward training, Function Regression, Physical Neural Networks, Analog Computing
摘要:Function regression/approximation is a fundamental application of machine learning. Neural networks (NNs) can be easily trained for function regression using a sufficient number of neurons and epochs. The forward-forward learning algorithm is a novel approach for training neural networks without backpropagation, and is well suited for implementation in neuromorphic computing and physical analogs for neural networks. To the best of the authors' knowledge, the Forward-Forward paradigm of training and inferencing NNs has so far been restricted to classification tasks. This paper introduces a new methodology for approximating functions (function regression) using the Forward-Forward algorithm. Furthermore, the paper evaluates the developed methodology on univariate and multivariate functions, and provides preliminary studies of extending the proposed Forward-Forward regression to Kolmogorov-Arnold Networks and Deep Physical Neural Networks.
【21】Distributed Algorithms for Multi-Agent Multi-Armed Bandits with Collision
标题:具有碰撞的多智能体多臂老虎机分布式算法
链接:https://arxiv.org/abs/2510.06683
作者:Daoyuan Zhou, Xuchuang Wang, Lin Yang, Yang Gao
备注:21 pages, 4 figures
摘要:We study the stochastic Multiplayer Multi-Armed Bandit (MMAB) problem, where multiple players select arms to maximize their cumulative rewards. Collisions occur when two or more players select the same arm, resulting in no reward, and are observed by the players involved. We consider a distributed setting without central coordination, where each player can only observe their own actions and collision feedback. We propose a distributed algorithm with an adaptive, efficient communication protocol. The algorithm achieves near-optimal group and individual regret, with a communication cost of only $\mathcal{O}(\log\log T)$. Our experiments demonstrate significant performance improvements over existing baselines. Compared to state-of-the-art (SOTA) methods, our approach achieves a notable reduction in individual regret. Finally, we extend our approach to a periodic asynchronous setting, proving the lower bound for this problem and presenting an algorithm that achieves logarithmic regret.
【22】Incremental Summarization for Customer Support via Progressive Note-Taking and Agent Feedback
标题:通过渐进式笔记和代理反馈为客户支持进行增量总结
链接:https://arxiv.org/abs/2510.06677
作者:Yisha Wu, Cen (Mia) Zhao, Yuanpei Cao, Xiaoqing Su, Yashar Mehdad, Mindy Ji, Claire Na Cheng
备注:Accepted at EMNLP 2025 Industry Track
摘要:We introduce an incremental summarization system for customer support agents that intelligently determines when to generate concise bullet notes during conversations, reducing agents' context-switching effort and redundant review. Our approach combines a fine-tuned Mixtral-8x7B model for continuous note generation with a DeBERTa-based classifier to filter trivial content. Agent edits refine the online notes generation and regularly inform offline model retraining, closing the agent edits feedback loop. Deployed in production, our system achieved a 3% reduction in case handling time compared to bulk summarization (with reductions of up to 9% in highly complex cases), alongside high agent satisfaction ratings from surveys. These results demonstrate that incremental summarization with continuous feedback effectively enhances summary quality and agent productivity at scale.
【23】XRPO: Pushing the limits of GRPO with Targeted Exploration and Exploitation
标题:XRPO:通过有针对性的探索和利用突破GRPO的极限
链接:https://arxiv.org/abs/2510.06672
作者:Udbhav Bamba, Minghao Fang, Yifan Yu, Haizhong Zheng, Fan Lai
摘要:Reinforcement learning algorithms such as GRPO have driven recent advances in large language model (LLM) reasoning. While scaling the number of rollouts stabilizes training, existing approaches suffer from limited exploration on challenging prompts and leave informative feedback signals underexploited, due to context-independent rollout allocation across prompts (e.g., generating 16 rollouts per prompt) and heavy reliance on sparse rewards. This paper presents XRPO (eXplore-eXploit GRPO), a unified framework that recasts policy optimization through the principled lens of rollout exploration-exploitation. To enhance exploration, XRPO introduces a mathematically grounded rollout allocator that adaptively prioritizes prompts with higher potential for uncertainty reduction. It further addresses stagnation on zero-reward prompts through an in-context seeding strategy that injects curated exemplars, steering the model into more difficult reasoning trajectories. To strengthen exploitation, XRPO develops a group-relative, novelty-aware advantage sharpening mechanism that leverages sequence likelihoods to amplify low-probability yet correct responses, thereby extending the policy's reach beyond sparse rewards. Experiments across diverse math and coding benchmarks on both reasoning and non-reasoning models demonstrate that XRPO outperforms existing methods (e.g., GRPO and GSPO) by up to 4% pass@1 and 6% cons@32, while accelerating training convergence by up to 2.7x.
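As one plausible instantiation of context-dependent rollout allocation (the paper's mathematically grounded allocator is not reproduced here), the sketch below gives more rollouts to prompts whose empirical pass rate is most uncertain; all names and the smoothing are assumptions.

```python
import numpy as np

def allocate_rollouts(success_counts, trial_counts, total_budget, min_per_prompt=1):
    """Uncertainty-driven rollout allocation sketch in the spirit of XRPO.

    success_counts, trial_counts: (n_prompts,) arrays of past outcomes.
    Returns an integer rollout budget per prompt.
    """
    p = (success_counts + 1) / (trial_counts + 2)    # Laplace-smoothed pass rate
    uncertainty = p * (1 - p) / (trial_counts + 1)   # variance of the rate estimate
    weights = uncertainty / uncertainty.sum()
    alloc = np.maximum(min_per_prompt, np.round(weights * total_budget)).astype(int)
    return alloc  # rounding means the sum may deviate slightly from total_budget
```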
【24】Rethinking Nonlinearity: Trainable Gaussian Mixture Modules for Modern Neural Architectures
标题:重新思考非线性:现代神经架构的可训练高斯混合模块
链接:https://arxiv.org/abs/2510.06660
作者:Weiguo Lu, Gangnan Yuan, Hong-kun Zhang, Shangyang Li
摘要:Neural networks in general, from MLPs and CNNs to attention-based Transformers, are constructed from layers of linear combinations followed by nonlinear operations such as ReLU, Sigmoid, or Softmax. Despite their strength, these conventional designs are often limited in introducing non-linearity by the choice of activation functions. In this work, we introduce Gaussian Mixture-Inspired Nonlinear Modules (GMNM), a new class of differentiable modules that draw on the universal density approximation property of Gaussian mixture models (GMMs) and the distance (metric space) properties of the Gaussian kernel. By relaxing probabilistic constraints and adopting a flexible parameterization of Gaussian projections, GMNM can be seamlessly integrated into diverse neural architectures and trained end-to-end with gradient-based methods. Our experiments demonstrate that incorporating GMNM into architectures such as MLPs, CNNs, attention mechanisms, and LSTMs consistently improves performance over standard baselines. These results highlight GMNM's potential as a powerful and flexible module for enhancing efficiency and accuracy across a wide range of machine learning applications.
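A minimal sketch of a GMNM-style module under stated assumptions: a trainable mixture of Gaussian bumps over learned centers, with probabilistic constraints relaxed (unconstrained mixture weights). The parameterization details are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class GMNM(nn.Module):
    """Sketch of a Gaussian Mixture-inspired Nonlinear Module: a learnable
    sum of Gaussian bumps used as a differentiable nonlinearity."""

    def __init__(self, in_dim, n_components=8):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(n_components, in_dim))  # centers
        self.log_sigma = nn.Parameter(torch.zeros(n_components))   # widths
        self.w = nn.Parameter(torch.randn(n_components))           # unconstrained weights

    def forward(self, x):                        # x: (batch, in_dim)
        d2 = torch.cdist(x, self.mu) ** 2        # squared distance to each center
        sigma2 = torch.exp(2 * self.log_sigma)   # per-component variance
        return torch.exp(-d2 / (2 * sigma2)) @ self.w  # (batch,) mixture output
```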
【25】Control-Augmented Autoregressive Diffusion for Data Assimilation
标题:数据同化的控制增强自回归扩散
链接:https://arxiv.org/abs/2510.06637
作者:Prakhar Srivastava, Farrin Marouf Sofian, Francesco Immorlano, Kushagra Pandey, Stephan Mandt
摘要:Despite recent advances in test-time scaling and finetuning of diffusion models, guidance in Auto-Regressive Diffusion Models (ARDMs) remains underexplored. We introduce an amortized framework that augments pretrained ARDMs with a lightweight controller network, trained offline by previewing future ARDM rollouts and learning stepwise controls that anticipate upcoming observations under a terminal cost objective. We evaluate this framework in the context of data assimilation (DA) for chaotic spatiotemporal partial differential equations (PDEs), a setting where existing methods are often computationally prohibitive and prone to forecast drift under sparse observations. Our approach reduces DA inference to a single forward rollout with on-the-fly corrections, avoiding expensive adjoint computations and/or optimizations during inference. We demonstrate that our method consistently outperforms four state-of-the-art baselines in stability, accuracy, and physical fidelity across two canonical PDEs and six observation regimes. We will release code and checkpoints publicly.
【26】The Markovian Thinker
标题:马尔科夫思想家
链接:https://arxiv.org/abs/2510.06557
作者:Milad Aghajohari, Kamran Chitsaz, Amirhossein Kazemnejad, Sarath Chandar, Alessandro Sordoni, Aaron Courville, Siva Reddy
摘要:Reinforcement learning (RL) has recently become a strong recipe for training reasoning LLMs that produce long chains of thought (LongCoT). Yet the standard RL "thinking environment", where the state is the prompt plus all prior reasoning tokens, makes the state unbounded and forces attention-based policies to pay quadratic compute as thoughts lengthen. We revisit the environment itself. We propose Markovian Thinking, a paradigm in which the policy advances reasoning while conditioning on a constant-size state, decoupling thinking length from context size. As an immediate consequence this yields linear compute with constant memory. We instantiate this idea with Delethink, an RL environment that structures reasoning into fixed-size chunks. Within each chunk, the model thinks as usual; at the boundary, the environment resets the context and reinitializes the prompt with a short carryover. Through RL, the policy learns to write a textual state near the end of each chunk sufficient for seamless continuation of reasoning after reset. Trained in this environment, an R1-Distill 1.5B model reasons in 8K-token chunks yet thinks up to 24K tokens, matching or surpassing LongCoT-RL trained with a 24K budget. With test-time scaling, Delethink continues to improve where LongCoT plateaus. The effect of linear compute is substantial: we empirically estimate at 96K average thinking length LongCoT-RL costs 27 H100-months vs. 7 for Delethink. Analysis at RL initialization shows off-the-shelf reasoning models (1.5B-120B) often sample Markovian traces zero-shot across diverse benchmarks, providing positive samples that make RL effective at scale. Our results show that redesigning the thinking environment is a powerful lever: it enables very long reasoning without quadratic overhead and opens a path toward efficient, scalable reasoning LLMs.
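A minimal sketch of the chunked rollout loop described above, with a hypothetical `generate` function, a character-level carryover standing in for token-level bookkeeping, and an assumed stop marker; none of these names come from the paper.

```python
def delethink_rollout(generate, prompt, chunk_tokens=8192, carryover_chars=2048,
                      max_chunks=3):
    """Sketch of Markovian, fixed-chunk reasoning in the spirit of Delethink.

    generate(text, max_new_tokens) is a hypothetical decode function. At each
    chunk boundary the context is reset and re-seeded with a short carryover,
    so the policy must write a textual state sufficient to continue.
    """
    context = prompt
    trace = []
    for _ in range(max_chunks):
        chunk = generate(context, max_new_tokens=chunk_tokens)
        trace.append(chunk)
        if "</answer>" in chunk:                 # assumed stop marker
            break
        carryover = chunk[-carryover_chars:]     # tail acts as the textual state
        context = prompt + "\n...\n" + carryover # reset: constant-size state
    return "".join(trace)
```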
【27】Scalable Policy-Based RL Algorithms for POMDPs
标题:POMDPs的可扩展的基于策略的RL算法
链接:https://arxiv.org/abs/2510.06540
作者:Ameya Anjarlekar, Rasoul Etesami, R Srikant
备注:36 pages, 3 Figures, Accepted at NeurIPS 2025
摘要:The continuous nature of belief states in POMDPs presents significant computational challenges in learning the optimal policy. In this paper, we consider an approach that solves a Partially Observable Reinforcement Learning (PORL) problem by approximating the corresponding POMDP model as a finite-state Markov Decision Process (MDP), called the Superstate MDP. We first derive theoretical guarantees, improving upon prior work, that relate the optimal value function of the transformed Superstate MDP to the optimal value function of the original POMDP. Next, we propose a policy-based learning approach with linear function approximation to learn the optimal policy for the Superstate MDP. Consequently, our approach shows that a POMDP can be approximately solved by treating it as an MDP whose state corresponds to a finite history, using TD-learning followed by policy optimization. We show that the approximation error decreases exponentially with the length of this history. To the best of our knowledge, our finite-time bounds are the first to explicitly quantify the error introduced when applying standard TD learning to a setting where the true dynamics are not Markovian.
【28】How NOT to benchmark your SITE metric: Beyond Static Leaderboards and Towards Realistic Evaluation
标题:如何不对您的SITE指标进行基准测试:超越静态排行榜,走向现实评估
链接:https://arxiv.org/abs/2510.06448
作者:Prabhant Singh, Sibylle Hess, Joaquin Vanschoren
摘要:Transferability estimation metrics are used to find a high-performing pre-trained model for a given target task without fine-tuning models and without access to the source dataset. Despite the growing interest in developing such metrics, the benchmarks used to measure their progress have gone largely unexamined. In this work, we empirically show the shortcomings of widely used benchmark setups to evaluate transferability estimation metrics. We argue that the benchmarks on which these metrics are evaluated are fundamentally flawed. We empirically demonstrate that their unrealistic model spaces and static performance hierarchies artificially inflate the perceived performance of existing metrics, to the point where simple, dataset-agnostic heuristics can outperform sophisticated methods. Our analysis reveals a critical disconnect between current evaluation protocols and the complexities of real-world model selection. To address this, we provide concrete recommendations for constructing more robust and realistic benchmarks to guide future research in a more meaningful direction.
【29】Making and Evaluating Calibrated Forecasts
标题:制定和评估经过校准的预测
链接:https://arxiv.org/abs/2510.06388
作者:Yuxuan Lu, Yifan Wu, Jason Hartline, Lunjia Hu
摘要:Calibrated predictions can be reliably interpreted as probabilities. An important step towards achieving better calibration is to design an appropriate calibration measure to meaningfully assess the miscalibration level of a predictor. A recent line of work initiated by Haghtalab et al. [2024] studies the design of truthful calibration measures: a truthful measure is minimized when a predictor outputs the true probabilities, whereas a non-truthful measure incentivizes the predictor to lie so as to appear more calibrated. All previous calibration measures were non-truthful until Hartline et al. [2025] introduced the first perfectly truthful calibration measures for binary prediction tasks in the batch setting. We introduce a perfectly truthful calibration measure for multi-class prediction tasks, generalizing the work of Hartline et al. [2025] beyond binary prediction. We study common methods of extending calibration measures from binary to multi-class prediction and identify ones that do or do not preserve truthfulness. In addition to truthfulness, we mathematically prove and empirically verify that our calibration measure exhibits superior robustness: it robustly preserves the ordering between dominant and dominated predictors, regardless of the choice of hyperparameters (bin sizes). This result addresses the non-robustness issue of binned ECE, which has been observed repeatedly in prior work.
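For reference, the binned ECE that the abstract criticizes for bin-size sensitivity can be computed as below; changing `n_bins` changes the reported value and can reorder predictors, which is the non-robustness at issue. The binary, equal-width-bin variant shown here is the standard one, not the paper's proposed measure.

```python
import numpy as np

def binned_ece(probs, labels, n_bins=10):
    """Standard binned Expected Calibration Error for binary prediction.

    probs: predicted probabilities of the positive class; labels: {0, 1}.
    """
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = np.abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap      # weight each bin by its mass
    return ece

rng = np.random.default_rng(0)
probs = rng.random(1000)
labels = (rng.random(1000) < probs).astype(int)   # perfectly calibrated data
print(binned_ece(probs, labels, 10), binned_ece(probs, labels, 15))  # bin-size dependent
```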
【30】Monte Carlo Permutation Search
标题:蒙特卡罗排列搜索
链接:https://arxiv.org/abs/2510.06381
作者:Tristan Cazenave
摘要:We propose Monte Carlo Permutation Search (MCPS), a general-purpose Monte Carlo Tree Search (MCTS) algorithm that improves upon the GRAVE algorithm. MCPS is relevant when deep reinforcement learning is not an option, or when the computing power available before play is not substantial, such as in General Game Playing, for example. The principle of MCPS is to include in the exploration term of a node the statistics on all the playouts that contain all the moves on the path from the root to the node. We extensively test MCPS on a variety of games: board games, a wargame, an investment game, a video game, and multi-player games. MCPS achieves better results than GRAVE in all the two-player games. It has equivalent results for multi-player games because these games are inherently balanced even when players have different strengths. We also show that using abstract codes for moves instead of exact codes can be beneficial to both MCPS and GRAVE, as they improve the permutation statistics and the AMAF statistics. We also provide a mathematical derivation of the formulas used for weighting the three sources of statistics. These formulas are an improvement on the GRAVE formula since they no longer use the bias hyperparameter of GRAVE. Moreover, MCPS is not sensitive to the ref hyperparameter.
【31】Lagrangian neural ODEs: Measuring the existence of a Lagrangian with Helmholtz metrics
标题:拉格朗日神经ODE:用Helmholtz度量测量拉格朗日的存在性
链接:https://arxiv.org/abs/2510.06367
作者:Luca Wolf, Tobias Buck, Bjoern Malte Schaefer
备注:Accepted for the NeurIPS 2025 Machine Learning and the Physical Sciences workshop. 6 pages, 3 figures
摘要:Neural ODEs are a widely used, powerful machine learning technique, particularly in physics. However, not every solution is physical, in the sense of being an Euler-Lagrange equation. We present Helmholtz metrics to quantify this resemblance for a given ODE and demonstrate their capabilities on several fundamental systems with noise. We combine them with a second-order neural ODE to form a Lagrangian neural ODE, which allows learning Euler-Lagrange equations directly and with zero additional inference cost. We demonstrate that, using only positional data, they can distinguish Lagrangian and non-Lagrangian systems and improve the neural ODE solutions.
【32】BlockGPT: Spatio-Temporal Modelling of Rainfall via Frame-Level Autoregression
标题:BlockGPT:通过帧级自回归进行降雨时空建模
链接:https://arxiv.org/abs/2510.06293
作者:Cristian Meo, Varun Sarathchandran, Avijit Majhi, Shao Hung, Carlo Saccardi, Ruben Imhoff, Roberto Deidda, Remko Uijlenhoet, Justin Dauwels
摘要:Predicting precipitation maps is a highly complex spatiotemporal modeling task, critical for mitigating the impacts of extreme weather events. Short-term precipitation forecasting, or nowcasting, requires models that are not only accurate but also computationally efficient for real-time applications. Current methods, such as token-based autoregressive models, often suffer from flawed inductive biases and slow inference, while diffusion models can be computationally intensive. To address these limitations, we introduce BlockGPT, a generative autoregressive transformer using a batched tokenization (Block) method that predicts full two-dimensional fields (frames) at each time step. Conceived as a model-agnostic paradigm for video prediction, BlockGPT factorizes space-time by using self-attention within each frame and causal attention across frames; in this work, we instantiate it for precipitation nowcasting. We evaluate BlockGPT on two precipitation datasets, viz. KNMI (Netherlands) and SEVIR (U.S.), comparing it to state-of-the-art baselines including token-based (NowcastingGPT) and diffusion-based (DiffCast+Phydnet) models. The results show that BlockGPT achieves superior accuracy, event localization as measured by categorical metrics, and inference speeds up to 31x faster than comparable baselines.
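The factorization described above corresponds to a block-structured attention mask: bidirectional within a frame, causal across frames. A sketch follows, assuming tokens are ordered frame by frame; this illustrates the masking pattern only, not the full model.

```python
import numpy as np

def blockwise_causal_mask(n_frames, tokens_per_frame):
    """Attention mask for frame-level autoregression (True = may attend):
    full self-attention within a frame, causal attention across frames."""
    frame_id = np.repeat(np.arange(n_frames), tokens_per_frame)
    # Token i may attend to token j iff j's frame is not in i's future.
    return frame_id[:, None] >= frame_id[None, :]

mask = blockwise_causal_mask(n_frames=3, tokens_per_frame=2)
print(mask.astype(int))  # 2x2 blocks of ones on and below the block diagonal
```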
【33】BuilderBench -- A benchmark for generalist agents
标题:BuilderBench:通用智能体的基准
链接:https://arxiv.org/abs/2510.06288
作者:Raj Ghugare, Catherine Ji, Kathryn Wantlin, Jin Schofield, Benjamin Eysenbach
备注:Project page: this https URL and Code: this https URL
摘要:Today's AI models learn primarily through mimicry and sharpening, so it is not surprising that they struggle to solve problems beyond the limits set by existing data. To solve novel problems, agents should acquire skills for exploring and learning through experience. Finding a scalable learning mechanism for developing agents that learn through interaction remains a major open problem. In this work, we introduce BuilderBench, a benchmark to accelerate research into agent pre-training that centers open-ended exploration. BuilderBench requires agents to learn how to build any structure using blocks. BuilderBench is equipped with (1) a hardware-accelerated simulator of a robotic agent interacting with various physical blocks, and (2) a task-suite with over 42 diverse target structures that are carefully curated to test an understanding of physics, mathematics, and long-horizon planning. During training, agents have to explore and learn general principles about the environment without any external supervision. During evaluation, agents have to build the unseen target structures from the task suite. Solving these tasks requires a sort of embodied reasoning that is not reflected in words but rather in actions, experimenting with different strategies and piecing them together. Our experiments show that many of these tasks challenge the current iteration of algorithms. Hence, we also provide a "training wheels" protocol, in which agents are trained and evaluated to build a single target structure from the task suite. Finally, we provide single-file implementations of six different algorithms as a reference point for researchers.
【34】Prakriti200: A Questionnaire-Based Dataset of 200 Ayurvedic Prakriti Assessments
标题:Prakriti200:一个包含200份阿育吠陀Prakriti评估的基于问卷的数据集
链接:https://arxiv.org/abs/2510.06262
作者:Aryan Kumar Singh, Janvi Singh
备注:4 pages, 4 figures
摘要:This dataset provides responses to a standardized, bilingual (English-Hindi) Prakriti Assessment Questionnaire designed to evaluate the physical, physiological, and psychological characteristics of individuals according to classical Ayurvedic principles. The questionnaire consists of 24 multiple-choice items covering body features, appetite, sleep patterns, energy levels, and temperament. It was developed following AYUSH/CCRAS guidelines to ensure comprehensive and accurate data collection. All questions are mandatory and neutrally phrased to minimize bias, and dosha labels (Vata, Pitta, Kapha) are hidden from participants. Data were collected via a Google Forms deployment, enabling automated scoring of responses to map individual traits to dosha-specific scores. The resulting dataset provides a structured platform for research in computational intelligence, Ayurvedic studies, and personalized health analytics, supporting analysis of trait distributions, correlations, and predictive modeling. It can also serve as a reference for future Prakriti-based studies and the development of intelligent health applications.
【35】Evaluating Embedding Frameworks for Scientific Domain
标题:科学领域嵌入框架的评价
链接:https://arxiv.org/abs/2510.06244
作者:Nouman Ahmed, Ronin Wu, Victor Botev
摘要:Finding an optimal word representation algorithm is particularly important for domain-specific data, as the same word can have different meanings and hence different representations depending on the domain and context. While generative AI and transformer architectures do a great job at generating contextualized embeddings for any given word, they are quite time- and compute-intensive, especially if one were to pre-train such a model from scratch. In this work, we focus on the scientific domain and on finding the optimal word representation algorithm, along with the tokenization method, that could be used to represent words in the scientific domain. The goal of this research is twofold: 1) finding the optimal word representation and tokenization methods that can be used in downstream scientific domain NLP tasks, and 2) building a comprehensive evaluation suite that could be used to evaluate various word representation and tokenization algorithms (even as new ones are introduced) in the scientific domain. To this end, we build an evaluation suite consisting of several downstream tasks and relevant datasets for each task. Furthermore, we use the constructed evaluation suite to test various word representation and tokenization algorithms.
【36】Milestone Determination for Autonomous Railway Operation
标题:铁路自主运营里程碑确定
链接:https://arxiv.org/abs/2510.06229
作者:Josh Hunter, John McDermid, Simon Burton, Poppy Fynes, Mia Dempster
备注:Submitted and partially accepted to ICART 2025; 8 pages, 1 figure, 2 tables
摘要:In the field of railway automation, one of the key challenges has been the development of effective computer vision systems due to the limited availability of high-quality, sequential data. Traditional datasets are restricted in scope, lacking the spatio-temporal context necessary for real-time decision-making, while alternative solutions introduce issues related to realism and applicability. By focusing on route-specific, contextually relevant cues, we can generate rich, sequential datasets that align more closely with real-world operational logic. The concept of milestone determination allows for the development of targeted, rule-based models that simplify the learning process by eliminating the need for generalized recognition of dynamic components, focusing instead on the critical decision points along a route. We argue that this approach provides a practical framework for training vision agents in controlled, predictable environments, facilitating safer and more efficient machine learning systems for railway automation.
【37】Enhancing Resilience for IoE: A Perspective of Networking-Level Safeguard
标题:增强IoE的弹性:网络级保障的视角
链接:https://arxiv.org/abs/2508.20504
作者:Guan-Yan Yang, Jui-Ning Chen, Farn Wang, Kuo-Hui Yeh
备注:To be published in IEEE Network Magazine, 2026
摘要:The Internet of Energy (IoE) integrates IoT-driven digital communication with power grids to enable efficient and sustainable energy systems. Still, its interconnectivity exposes critical infrastructure to sophisticated cyber threats, including adversarial attacks designed to bypass traditional safeguards. Unlike general IoT risks, IoE threats have heightened public safety consequences, demanding resilient solutions. From the networking-level safeguard perspective, we propose a Graph Structure Learning (GSL)-based safeguard framework that jointly optimizes graph topology and node representations to inherently resist adversarial manipulation of the network model. Through a conceptual overview, architectural discussion, and a case study on a security dataset, we demonstrate GSL's superior robustness over representative methods, offering practitioners a viable path to secure IoE networks against evolving attacks. This work highlights the potential of GSL to enhance the resilience and reliability of future IoE networks for practitioners managing critical infrastructure. Lastly, we identify key open challenges and propose future research directions in this novel research area.
【38】Quantum Sparse Recovery and Quantum Orthogonal Matching Pursuit
标题:量子稀疏恢复与量子正交匹配追踪
链接:https://arxiv.org/abs/2510.06925
作者:Armando Bellante, Stefano Vanerio, Stefano Zanero
摘要:We study quantum sparse recovery in non-orthogonal, overcomplete dictionaries: given coherent quantum access to a state and a dictionary of vectors, the goal is to reconstruct the state up to $\ell_2$ error using as few vectors as possible. We first show that the general recovery problem is NP-hard, ruling out efficient exact algorithms in full generality. To overcome this, we introduce Quantum Orthogonal Matching Pursuit (QOMP), the first quantum analogue of the classical OMP greedy algorithm. QOMP combines quantum subroutines for inner product estimation, maximum finding, and block-encoded projections with an error-resetting design that avoids iteration-to-iteration error accumulation. Under standard mutual incoherence and well-conditioned sparsity assumptions, QOMP provably recovers the exact support of a $K$-sparse state in polynomial time. As an application, we give the first framework for sparse quantum tomography with non-orthogonal dictionaries in $\ell_2$ norm, achieving query complexity $\widetilde{O}(\sqrt{N}/\epsilon)$ in favorable regimes and reducing tomography to estimating only $K$ coefficients instead of $N$ amplitudes. In particular, for pure-state tomography with $m=O(N)$ dictionary vectors and sparsity $K=\widetilde{O}(1)$ on a well-conditioned subdictionary, this circumvents the $\widetilde{\Omega}(N/\epsilon)$ lower bound that holds in the dense, orthonormal-dictionary setting, without contradiction, by leveraging sparsity together with non-orthogonality. Beyond tomography, we analyze QOMP in the QRAM model, where it yields polynomial speedups over classical OMP implementations, and provide a quantum algorithm to estimate the mutual incoherence of a dictionary of $m$ vectors in $O(m/\epsilon)$ queries, improving over both deterministic and quantum-inspired classical methods.
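For orientation, below is the classical OMP greedy loop that QOMP mirrors with quantum subroutines (inner-product estimation replaces the correlation step, quantum maximum finding replaces the argmax); this is the classical baseline, not the quantum algorithm itself.

```python
import numpy as np

def omp(D, y, k):
    """Classical Orthogonal Matching Pursuit.

    D: (n, m) dictionary with unit-norm columns, y: (n,) target,
    k: sparsity budget (assumed >= 1). Returns coefficients and support.
    """
    residual = y.copy()
    support = []
    for _ in range(k):
        # Greedy step: column most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # Orthogonal projection of y onto the selected columns.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x, support
```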
【39】Fitzpatrick Thresholding for Skin Image Segmentation
标题:用于皮肤图像分割的Fitzpatrick阈值化
链接:https://arxiv.org/abs/2510.06655
作者:Duncan Stothers, Sophia Xu, Carlie Reeves, Lia Gracey
备注:Accepted to MICCAI 2025 ISIC Workshop. 24 minute Oral presentation given. Awarded "Best Paper - Honorable Mention"
摘要:Accurate estimation of the body surface area (BSA) involved by a rash, such as psoriasis, is critical for assessing rash severity, selecting an initial treatment regimen, and following clinical treatment response. Attempts at segmentation of inflammatory skin diseases such as psoriasis perform markedly worse on darker skin tones, potentially impeding equitable care. We assembled a psoriasis dataset sourced from six public atlases, annotated for Fitzpatrick skin type, and added detailed segmentation masks for every image. Reference models based on U-Net, ResU-Net, and SETR-small are trained without tone information. On the tuning split we sweep decision thresholds and select (i) global optima and (ii) per-Fitzpatrick-skin-tone optima for Dice and binary IoU. Adapting Fitzpatrick-specific thresholds lifted segmentation performance for the darkest subgroup (Fitz VI) by up to +31% bIoU and +24% Dice on U-Net, with consistent, though smaller, gains in the same direction for ResU-Net (+25% bIoU, +18% Dice) and SETR-small (+17% bIoU, +11% Dice). Because Fitzpatrick skin tone classifiers trained on Fitzpatrick-17k now exceed 95% accuracy, the cost of the skin tone labeling required for this technique has fallen dramatically. Fitzpatrick thresholding is simple, model-agnostic, requires no architectural changes and no re-training, and is virtually cost-free. We put forward Fitzpatrick thresholding as a potential future fairness baseline.
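Per-tone thresholding itself is a few lines: sweep the decision threshold on a tuning split separately for each Fitzpatrick group and keep the Dice-maximizing value. The sketch below makes the procedure concrete; array names and the threshold grid are illustrative.

```python
import numpy as np

def per_tone_thresholds(probs, masks, tones, grid=np.linspace(0.05, 0.95, 19)):
    """Fitzpatrick-thresholding sketch: pick one decision threshold per
    skin-tone subgroup by maximizing mean Dice on a tuning split.

    probs: (n, H, W) predicted foreground probabilities
    masks: (n, H, W) binary ground-truth masks
    tones: (n,) Fitzpatrick type per image (1..6)
    """
    def dice(pred, gt):
        denom = pred.sum() + gt.sum()
        return 2 * (pred & gt).sum() / denom if denom else 1.0

    best = {}
    for tone in np.unique(tones):
        idx = np.where(tones == tone)[0]
        scores = [np.mean([dice(probs[i] > t, masks[i].astype(bool)) for i in idx])
                  for t in grid]
        best[int(tone)] = float(grid[int(np.argmax(scores))])
    return best  # at test time, threshold each image with best[its_tone]
```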
【40】A General Constructive Upper Bound on Shallow Neural Nets Complexity
标题:浅层神经网络复杂性的一般构造性上界
链接:https://arxiv.org/abs/2510.06372
作者:Frantisek Hakl, Vit Fojtik
摘要:We provide an upper bound on the number of neurons required in a shallow neural network to approximate a continuous function on a compact set with a given accuracy. This method, inspired by a specific proof of the Stone-Weierstrass theorem, is constructive and more general than previous bounds of this character, as it applies to any continuous function on any compact set.
【41】Dream2Image : An Open Multimodal EEG Dataset for Decoding and Visualizing Dreams with Artificial Intelligence
标题:Dream 2Image:一个开放的多模式脑电数据集,用于利用人工智能解码和可视化梦想
链接:https://arxiv.org/abs/2510.06252
作者:Yann Bellec
备注:7 Pages, 3 Figures, The Dream2Image dataset is openly available on Hugging Face at: this https URL
摘要:Dream2Image is the world's first dataset combining EEG signals, dream transcriptions, and AI-generated images. Based on 38 participants and more than 31 hours of dream EEG recordings, it contains 129 samples offering: the final seconds of brain activity preceding awakening (T-15, T-30, T-60, T-120), raw reports of dream experiences, and an approximate visual reconstruction of the dream. This dataset offers a novel resource for dream research: it can be used to study the neural correlates of dreaming, to develop models for decoding dreams from brain activity, and to explore new approaches in neuroscience, psychology, and artificial intelligence. Available in open access on Hugging Face and GitHub, Dream2Image provides a multimodal resource designed to support research at the interface of artificial intelligence and neuroscience. It was designed to inspire researchers and extend the current approaches to brain activity decoding. Limitations include the relatively small sample size and the variability of dream recall, which may affect generalizability.
【42】Neu-RadBERT for Enhanced Diagnosis of Brain Injuries and Conditions
标题:Neu-RadBERT用于脑损伤和疾病的强化诊断
链接:https://arxiv.org/abs/2510.06232
作者:Manpreet Singh (1), Sean Macrae (2), Pierre-Marc Williams (2), Nicole Hung (2), Sabrina Araujo de Franca (1), Laurent Letourneau-Guillon (2,3), François-Martin Carrier (2,4), Bang Liu (5), Yiorgos Alexandros Cavayas (1,2,6) ((1) Équipe de Recherche en Soins Intensifs, Centre de recherche du Centre intégré universitaire de santé et de services sociaux du Nord-de-l'Île-de-Montréal (2) Faculté de Médecine, Université de Montréal (3) Department of Radiology, Centre Hospitalier de l'Université de Montréal (4) Department of Anesthesia, Centre Hospitalier de l'Université de Montréal (5) Applied Research in Computer Linguistics Laboratory, Department of Computer Science and Operations Research, Université de Montréal (6) Division of Critical Care Medicine, Department of Medicine, Hôpital du Sacré-Cœur de Montréal)
备注:Both Manpreet Singh and Sean Macrae contributed equally and should be considered co-first authors. Corresponding author: Yiorgos Alexandros Cavayas
摘要:Objective: We sought to develop a classification algorithm to extract diagnoses from free-text radiology reports of brain imaging performed in patients with acute respiratory failure (ARF) undergoing invasive mechanical ventilation. Methods: We developed and fine-tuned Neu-RadBERT, a BERT-based model, to classify unstructured radiology reports. We extracted all the brain imaging reports (computed tomography and magnetic resonance imaging) from MIMIC-IV database, performed in patients with ARF. Initial manual labelling was performed on a subset of reports for various brain abnormalities, followed by fine-tuning Neu-RadBERT using three strategies: 1) baseline RadBERT, 2) Neu-RadBERT with Masked Language Modeling (MLM) pretraining, and 3) Neu-RadBERT with MLM pretraining and oversampling to address data skewness. We compared the performance of this model to Llama-2-13B, an autoregressive LLM. Results: The Neu-RadBERT model, particularly with oversampling, demonstrated significant improvements in diagnostic accuracy compared to baseline RadBERT for brain abnormalities, achieving up to 98.0% accuracy for acute brain injuries. Llama-2-13B exhibited relatively lower performance, peaking at 67.5% binary classification accuracy. This result highlights potential limitations of current autoregressive LLMs for this specific classification task, though it remains possible that larger models or further fine-tuning could improve performance. Conclusion: Neu-RadBERT, enhanced through target domain pretraining and oversampling techniques, offered a robust tool for accurate and reliable diagnosis of neurological conditions from radiology reports. This study underscores the potential of transformer-based NLP models in automatically extracting diagnoses from free text reports with potential applications to both research and patient care.
机器翻译由腾讯交互翻译提供,仅供参考
点击“阅读原文”获取带摘要的学术速递