
机器学习学术速递[11.26]

arXiv每日学术速递



cs.LG 方向,今日共计195篇


大模型相关(21篇)

【1】Unleashing the Power of Vision-Language Models for Long-Tailed Multi-Label Visual Recognition
标题:释放视觉语言模型的力量用于长尾多标签视觉识别
链接:https://arxiv.org/abs/2511.20641

作者:Wei Tang,Zuo-Zheng Wang,Kun Zhang,Tong Wei,Min-Ling Zhang
摘要:长尾多标签视觉识别是一项重大挑战:图像通常包含多个标签,且类别分布高度不平衡,导致模型偏向头部类而在尾部类上表现不佳。近期工作利用预训练视觉语言模型(如CLIP)结合长尾学习技术,借助丰富的视觉-文本先验来提升性能。然而,现有方法往往直接从不平衡数据集中导出类间语义关系,由于数据稀缺,尾部类的相关性并不可靠。此外,CLIP的zero-shot范式是针对单标签图像-文本匹配优化的,对多标签任务而言并非最优。为了解决这些问题,我们提出了相关性自适应提示网络(CAPNET),这是一种新的端到端框架,显式地基于CLIP的文本编码器建模标签相关性。该框架结合了用于标签感知传播的图卷积网络和用于细化嵌入的可学习软提示,并利用带类感知重加权的分布平衡Focal损失在不平衡条件下优化训练。此外,它通过测试时集成提升泛化能力,并使用参数高效微调重新对齐视觉-文本模态,在不损害头部类性能的前提下避免对尾部类过拟合。在VOC-LT、COCO-LT和NUS-WIDE等基准上的大量实验和消融研究表明,CAPNET相比最先进的方法取得了显著提升,验证了其在现实长尾多标签视觉识别中的有效性。
摘要:Long-tailed multi-label visual recognition poses a significant challenge, as images typically contain multiple labels with highly imbalanced class distributions, leading to biased models that favor head classes while underperforming on tail classes. Recent efforts have leveraged pre-trained vision-language models, such as CLIP, alongside long-tailed learning techniques to exploit rich visual-textual priors for improved performance. However, existing methods often derive semantic inter-class relationships directly from imbalanced datasets, resulting in unreliable correlations for tail classes due to data scarcity. Moreover, CLIP's zero-shot paradigm is optimized for single-label image-text matching, making it suboptimal for multi-label tasks. To address these issues, we propose the correlation adaptation prompt network (CAPNET), a novel end-to-end framework that explicitly models label correlations from CLIP's textual encoder. The framework incorporates a graph convolutional network for label-aware propagation and learnable soft prompts for refined embeddings. It utilizes a distribution-balanced Focal loss with class-aware re-weighting for optimized training under imbalance. Moreover, it improves generalization through test-time ensembling and realigns visual-textual modalities using parameter-efficient fine-tuning to avert overfitting on tail classes without compromising head class performance. Extensive experiments and ablation studies on benchmarks including VOC-LT, COCO-LT, and NUS-WIDE demonstrate that CAPNET achieves substantial improvements over state-of-the-art methods, validating its effectiveness for real-world long-tailed multi-label visual recognition.
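The re-weighted focal loss described above can be sketched in a few lines. This is a minimal, assumed variant (the "effective number" re-weighting is one common choice, and the function names are hypothetical; CAPNET's exact formulation may differ):

```python
import numpy as np

def reweight(class_counts, beta=0.999):
    """Class-aware re-weighting via the 'effective number' of samples:
    rarer classes receive larger weights (an assumed variant, not the
    paper's exact formula)."""
    counts = np.asarray(class_counts, dtype=float)
    eff_num = (1.0 - beta ** counts) / (1.0 - beta)
    w = 1.0 / eff_num
    return w / w.mean()  # normalize so the average weight is 1

def balanced_focal_loss(logits, targets, class_weights, gamma=2.0, eps=1e-12):
    """Sigmoid focal loss for multi-label targets, scaled per class."""
    p = 1.0 / (1.0 + np.exp(-logits))          # per-label probabilities
    pt = np.where(targets == 1, p, 1.0 - p)    # prob. of the true outcome
    focal = (1.0 - pt) ** gamma                # down-weight easy examples
    return float(-(class_weights * focal * np.log(pt + eps)).mean())

counts = [10000, 5000, 50]          # head, mid, tail class frequencies
w = reweight(counts)                # tail class gets the largest weight
logits = np.array([[2.0, -1.0, 0.5]])
targets = np.array([[1, 0, 1]])
loss = balanced_focal_loss(logits, targets, w)
```

The focal term suppresses gradients from easy (high-confidence) labels, while the class weights keep the scarce tail classes from being drowned out.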


【2】Can Vibe Coding Beat Graduate CS Students? An LLM vs. Human Coding Tournament on Market-driven Strategic Planning
标题:Vibe Coding能否击败CS研究生?关于市场驱动战略规划的LLM与人类编码锦标赛
链接:https://arxiv.org/abs/2511.20613

作者:Panayiotis Danassis,Naman Goel
摘要:大型语言模型(LLM)的迅速普及彻底改变了AI辅助代码生成,其发展速度已经超过了我们对其进行恰当基准测试的能力。主流基准强调单元测试通过率和语法正确性,这类指标低估了许多需要规划、优化和策略交互的现实问题的难度。我们提出了一个基于现实物流优化问题(拍卖、取货与配送问题)的多智能体、推理驱动的基准,该问题将竞争性拍卖与容量受限的路径规划相结合。该基准要求构建的智能体能够(i)在不确定性下进行策略性投标,以及(ii)优化在最大化利润的同时完成配送任务的规划器。我们评估了40个由LLM编写的智能体(由多种最先进的LLM在多种提示方法下生成,包括vibe coding),并与17个在LLM出现之前由人类编写的智能体对比。在12场双循环锦标赛和约4万场比赛中,结果表明:(i)人类(研究生)编写的智能体明显占优:前5名始终由人类编写的智能体获得;(ii)大多数LLM编写的智能体(40个中的33个)被非常简单的基线击败;(iii)在给定最佳人类解决方案并提示其改进时,表现最好的LLM反而使该解决方案显著变差。我们的结果凸显了LLM在生成具有现实竞争力的代码方面的差距,并推动了强调现实场景中推理驱动代码合成的新评估。
摘要:The rapid proliferation of Large Language Models (LLMs) has revolutionized AI-assisted code generation. This rapid development of LLMs has outpaced our ability to properly benchmark them. Prevailing benchmarks emphasize unit-test pass rates and syntactic correctness. Such metrics understate the difficulty of many real-world problems that require planning, optimization, and strategic interaction. We introduce a multi-agent reasoning-driven benchmark based on a real-world logistics optimization problem (Auction, Pickup, and Delivery Problem) that couples competitive auctions with capacity-constrained routing. The benchmark requires building agents that can (i) bid strategically under uncertainty and (ii) optimize planners that deliver tasks while maximizing profit. We evaluate 40 LLM-coded agents (by a wide range of state-of-the-art LLMs under multiple prompting methodologies, including vibe coding) against 17 human-coded agents developed before the advent of LLMs. Our results over 12 double all-play-all tournaments and $\sim 40$k matches demonstrate (i) a clear superiority of human(graduate students)-coded agents: the top 5 spots are consistently won by human-coded agents, (ii) the majority of LLM-coded agents (33 out of 40) are beaten by very simple baselines, and (iii) given the best human solution as an input and prompted to improve upon, the best performing LLM makes the solution significantly worse instead of improving it. Our results highlight a gap in LLMs' ability to produce code that works competitively in the real-world, and motivate new evaluations that emphasize reasoning-driven code synthesis in real-world scenarios.
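The tournament format is easy to make concrete. Below is a minimal sketch of the double all-play-all pairing and a win-based ranking, with a stub deciding matches in place of the actual auction-plus-routing game (agent names and the outcome rule are illustrative assumptions):

```python
from itertools import permutations

def double_all_play_all(agents):
    """Every ordered pair meets once, so each unordered pair plays twice
    (home/away); n agents yield n*(n-1) matches."""
    return list(permutations(agents, 2))

def rank_by_wins(matches, winner_of):
    """Rank agents by match wins; `winner_of` is any callable deciding a
    match (a stub here, standing in for the full logistics game)."""
    wins = {}
    for a, b in matches:
        w = winner_of(a, b)
        wins[w] = wins.get(w, 0) + 1
    return sorted(wins, key=wins.get, reverse=True)

agents = ["human1", "human2", "llm1", "llm2"]
matches = double_all_play_all(agents)
# Stub outcome: lexicographically smaller name wins (humans beat llms here).
ranking = rank_by_wins(matches, lambda a, b: min(a, b))
```

With 4 agents this yields 4*3 = 12 matches; the paper's 12 tournaments over 57 agents scale the same pairing scheme up to roughly 40k matches.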


【3】On Evaluating LLM Alignment by Evaluating LLMs as Judges
标题:通过评估LLM作为评委来评估LLM对齐性
链接:https://arxiv.org/abs/2511.20604

作者:Yixin Liu,Pengfei Liu,Arman Cohan
备注:NeurIPS 2025 Camera Ready
摘要:与人类偏好对齐是LLM的一个重要评估方面,要求模型乐于助人、诚实、安全,并精确遵循人类指令。评估大型语言模型(LLM)的对齐性通常需要直接评估其开放式回复,这依赖人工标注或强大的LLM评委。反过来,LLM本身也被广泛地作为评估对齐性的评委来考察。在这项工作中,我们研究LLM的生成能力与评估能力在对齐人类偏好方面的关系。为此,我们首先对多种LLM的生成-评估一致性(GE一致性)进行了全面分析,发现在以一个强大的LLM作为偏好预言机进行评测时,模型的生成能力与评估能力之间存在强相关性。基于这一发现,我们提出了一种基准测试范式:不直接评估LLM生成的输出,而是评估其作为评委的表现,以此衡量LLM与人类偏好的对齐程度。评估表明,我们提出的基准AlignEval在对LLM进行排名时捕捉人类偏好的能力,达到或超过了AlpacaEval和Arena-Hard等广泛使用的自动LLM评估基准。我们的研究为LLM生成与评估能力之间的联系提供了有价值的见解,并引入了一个无需直接评估模型输出即可评估对齐性的基准。
摘要 :Alignment with human preferences is an important evaluation aspect of LLMs, requiring them to be helpful, honest, safe, and to precisely follow human instructions. Evaluating large language models' (LLMs) alignment typically involves directly assessing their open-ended responses, requiring human annotators or strong LLM judges. Conversely, LLMs themselves have also been extensively evaluated as judges for assessing alignment. In this work, we examine the relationship between LLMs' generation and evaluation capabilities in aligning with human preferences. To this end, we first conduct a comprehensive analysis of the generation-evaluation consistency (GE-consistency) among various LLMs, revealing a strong correlation between their generation and evaluation capabilities when evaluated by a strong LLM preference oracle. Utilizing this finding, we propose a benchmarking paradigm that measures LLM alignment with human preferences without directly evaluating their generated outputs, instead assessing LLMs in their role as evaluators. Our evaluation shows that our proposed benchmark, AlignEval, matches or surpasses widely used automatic LLM evaluation benchmarks, such as AlpacaEval and Arena-Hard, in capturing human preferences when ranking LLMs. Our study offers valuable insights into the connection between LLMs' generation and evaluation capabilities, and introduces a benchmark that assesses alignment without directly evaluating model outputs.
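GE-consistency boils down to correlating two per-model score lists. As an illustration (the paper's exact statistic is not specified here), a self-contained Spearman rank correlation over hypothetical generation and evaluation scores:

```python
def rankdata(xs):
    """Average ranks (1-based), handling ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                       # extend over a run of tied values
        avg = (i + j) / 2 + 1            # average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(gen_scores, eval_scores):
    """Spearman correlation between models' generation and evaluation
    scores: one way to quantify generation-evaluation (GE-)consistency."""
    rx, ry = rankdata(gen_scores), rankdata(eval_scores)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

gen = [0.82, 0.75, 0.64, 0.90]   # hypothetical per-model generation win-rates
ev = [0.78, 0.70, 0.60, 0.88]    # hypothetical judge-quality scores
rho = spearman(gen, ev)
```

A rho near 1 means models that generate better also judge better, which is exactly the premise that lets AlignEval rank models by their judging alone.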


【4】Beyond Generation: Multi-Hop Reasoning for Factual Accuracy in Vision-Language Models
标题:超越生成:视觉语言模型中事实准确性的多跳推理
链接:https://arxiv.org/abs/2511.20531

作者:Shamima Hossain
备注:Accepted as poster at NewInML Workshop ICML, 2025
摘要:视觉语言模型(VLM)是一种功能强大的生成工具,但由于缺乏强大的推理能力,其输出结果往往不准确。虽然已经进行了广泛的研究,在大型语言模型(LLM)的推理集成外部知识,这样的努力仍然在VLMs,其中的挑战是复杂的无缝桥接多个模态的需要探索不足。本文提出了一个基于知识引导的VLMs推理框架,并以图像字幕任务为例,对用于多跳验证的结构化知识图进行了描述。我们的方法可以跨多个步骤进行系统推理,包括视觉实体识别,知识图遍历和基于事实的字幕精炼。我们使用基于层次、基于三重和基于要点的知识表示来评估框架,分析它们在事实准确性和逻辑推理方面的有效性。实证结果表明,我们的方法在Google Landmarks v2,概念标题和Coco标题的混合物的策展数据集上的初步实验中将事实准确性提高了约31%,揭示了对推理模式和故障模式的关键见解。这项工作表明了整合外部知识以推进VLMs推理的潜力,为更可靠和知识化的多模态系统铺平了道路。
摘要:Visual Language Models (VLMs) are powerful generative tools but often produce factually inaccurate outputs due to a lack of robust reasoning capabilities. While extensive research has been conducted on integrating external knowledge for reasoning in large language models (LLMs), such efforts remain underexplored in VLMs, where the challenge is compounded by the need to bridge multiple modalities seamlessly. This work introduces a framework for knowledge-guided reasoning in VLMs, leveraging structured knowledge graphs for multi-hop verification, using an image-captioning task to illustrate our framework. Our approach enables systematic reasoning across multiple steps, including visual entity recognition, knowledge graph traversal, and fact-based caption refinement. We evaluate the framework using hierarchical, triple-based, and bullet-point-based knowledge representations, analyzing their effectiveness in factual accuracy and logical inference. Empirical results show that our approach improves factual accuracy by approximately 31% in preliminary experiments on a curated dataset drawn from Google Landmarks v2, Conceptual Captions, and COCO Captions, revealing key insights into reasoning patterns and failure modes. This work demonstrates the potential of integrating external knowledge for advancing reasoning in VLMs, paving the way for more reliable and knowledgeable multimodal systems.
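The knowledge-graph traversal step can be illustrated with a toy breadth-first search over (head, relation, tail) triples; the entity names and hop limit below are hypothetical, not from the paper:

```python
from collections import deque

def multi_hop_verify(triples, source, target, max_hops=3):
    """Check whether `target` is reachable from `source` within `max_hops`
    edges in a knowledge graph of (head, relation, tail) triples, returning
    the supporting chain of relations if found (a toy stand-in for the
    paper's KG-traversal step)."""
    adj = {}
    for h, r, t in triples:
        adj.setdefault(h, []).append((r, t))
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path                      # relations along the hop chain
        if len(path) >= max_hops:
            continue                         # hop budget exhausted
        for r, t in adj.get(node, []):
            if t not in seen:
                seen.add(t)
                queue.append((t, path + [r]))
    return None                              # no supporting path: flag the caption

kg = [("Eiffel Tower", "located_in", "Paris"),
      ("Paris", "capital_of", "France"),
      ("France", "part_of", "Europe")]
path = multi_hop_verify(kg, "Eiffel Tower", "France")
```

A caption asserting "the Eiffel Tower is in France" is supported by the 2-hop chain `located_in -> capital_of`; an unsupported claim returns `None` and would trigger refinement.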


【5】NNGPT: Rethinking AutoML with Large Language Models
标题:NNGPT:用大型语言模型重新思考AutoML
链接:https://arxiv.org/abs/2511.20333

作者:Roman Kochnev,Waleed Khalid,Tolgay Atinc Uzun,Xi Zhang,Yashkumar Sanjaybhai Dhameliya,Furui Qin,Chandini Vysyaraju,Raghuvir Duvvuri,Avi Goyal,Dmitry Ignatov,Radu Timofte
摘要:构建自我改进的人工智能系统仍然是人工智能领域的一个根本挑战。我们介绍了NNGPT,这是一个开源框架,它将大型语言模型(LLM)转变为用于神经网络开发的自我改进的AutoML引擎,主要用于计算机视觉。与以前的框架不同,NNGPT通过生成新模型来扩展神经网络的数据集,从而能够基于生成、评估和自我改进的闭环系统对LLM进行持续微调。它在一个统一的工作流程中集成了五个基于LLM的协同管道:zero-shot架构合成,超参数优化(HPO),代码感知准确性/早期停止预测,范围封闭PyTorch块的检索增强合成(NN-RAG)和强化学习。基于LEMUR数据集构建,作为具有可复制指标的审计语料库,NNGPT从单个提示发出并验证网络架构,预处理代码和超参数,端到端执行它们,并从结果中学习。PyTorch适配器使NNGPT框架无关,实现了强大的性能:NN-RAG在1,289个目标上实现了73%的可执行性,3次提示提高了常见数据集的准确性,基于哈希的重复数据删除节省了数百次运行。一次性预测与基于搜索的AutoML相匹配,减少了大量试验的需要。LEMUR上的HPO达到RMSE 0.60,优于Optuna(0.64),而代码感知预测器达到RMSE 0.14,Pearson r=0.78。该系统已经生成了超过5000个经过验证的模型,证明了NNGPT是一个自主的AutoML引擎。接受后,代码、提示和检查点将被发布供公众访问,以实现可重复性并促进社区使用。
摘要:Building self-improving AI systems remains a fundamental challenge in the AI domain. We present NNGPT, an open-source framework that turns a large language model (LLM) into a self-improving AutoML engine for neural network development, primarily for computer vision. Unlike previous frameworks, NNGPT extends the dataset of neural networks by generating new models, enabling continuous fine-tuning of LLMs based on closed-loop system of generation, assessment, and self-improvement. It integrates within one unified workflow five synergistic LLM-based pipelines: zero-shot architecture synthesis, hyperparameter optimization (HPO), code-aware accuracy/early-stop prediction, retrieval-augmented synthesis of scope-closed PyTorch blocks (NN-RAG), and reinforcement learning. Built on the LEMUR dataset as an audited corpus with reproducible metrics, NNGPT emits from a single prompt and validates network architecture, preprocessing code, and hyperparameters, executes them end-to-end, and learns from result. The PyTorch adapter makes NNGPT framework-agnostic, enabling strong performance: NN-RAG achieves 73% executability on 1,289 targets, 3-shot prompting boosts accuracy on common datasets, and hash-based deduplication saves hundreds of runs. One-shot prediction matches search-based AutoML, reducing the need for numerous trials. HPO on LEMUR achieves RMSE 0.60, outperforming Optuna (0.64), while the code-aware predictor reaches RMSE 0.14 with Pearson r=0.78. The system has already generated over 5K validated models, proving NNGPT as an autonomous AutoML engine. Upon acceptance, the code, prompts, and checkpoints will be released for public access to enable reproducibility and facilitate community usage.


【6】Geometry of Decision Making in Language Models
标题:语言模型中的决策几何
链接:https://arxiv.org/abs/2511.20315

作者:Abhinav Joshi,Divyanshu Bhatt,Ashutosh Modi
备注:Accepted at NeurIPS 2025
摘要:大型语言模型(LLM)在不同任务中表现出很强的泛化能力,但其预测背后的内部决策过程仍然不透明。在这项工作中,我们通过内在维度(intrinsic dimension, ID)的视角研究LLM中隐藏表示的几何结构,重点关注多项选择题回答(MCQA)设置中的决策动态。我们对28个开放权重的Transformer模型进行了大规模研究,使用多个估计器逐层估计ID,同时量化每一层在MCQA任务上的性能。我们的研究结果揭示了跨模型一致的ID模式:早期层在低维流形上运作,中间层扩展这一空间,后期层再次将其压缩,收敛到与决策相关的表示。综合来看,这些结果表明LLM隐式地学会了将语言输入投射到与特定任务决策对齐的结构化低维流形上,为语言模型中泛化和推理的产生提供了新的几何见解。
摘要:Large Language Models (LLMs) show strong generalization across diverse tasks, yet the internal decision-making processes behind their predictions remain opaque. In this work, we study the geometry of hidden representations in LLMs through the lens of \textit{intrinsic dimension} (ID), focusing specifically on decision-making dynamics in a multiple-choice question answering (MCQA) setting. We perform a large-scale study, with 28 open-weight transformer models and estimate ID across layers using multiple estimators, while also quantifying per-layer performance on MCQA tasks. Our findings reveal a consistent ID pattern across models: early layers operate on low-dimensional manifolds, middle layers expand this space, and later layers compress it again, converging to decision-relevant representations. Together, these results suggest LLMs implicitly learn to project linguistic inputs onto structured, low-dimensional manifolds aligned with task-specific decisions, providing new geometric insights into how generalization and reasoning emerge in language models.
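Per-layer intrinsic dimension can be estimated with generic tools such as the TwoNN estimator (Facco et al.); the abstract does not name its estimators, so this is one plausible choice. The sketch applies it to synthetic points on a 2-D plane embedded in 10-D, standing in for a layer's hidden states:

```python
import numpy as np

def two_nn_id(X):
    """TwoNN intrinsic-dimension estimate: for each point take the ratio
    mu = r2/r1 of its two nearest-neighbor distances; the maximum-likelihood
    estimate of the ID is N / sum(log mu)."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    np.fill_diagonal(d2, np.inf)                         # exclude self-distance
    nearest2 = np.sort(d2, axis=1)[:, :2]                # r1^2, r2^2 per point
    mu = np.sqrt(nearest2[:, 1] / nearest2[:, 0])
    return len(X) / np.log(mu).sum()

rng = np.random.default_rng(0)
# Points on a 2-D linear subspace embedded in 10-D: estimate should be near 2.
plane = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10))
id_hat = two_nn_id(plane)
```

Running this per layer on hidden states would trace the expand-then-compress ID profile the paper reports.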


【7】The Devil in the Details: Emergent Misalignment, Format and Coherence in Open-Weights LLMs
标题:细节中的魔鬼:开放权重LLM中的涌现性失准、格式与连贯性
链接:https://arxiv.org/abs/2511.20104

作者:Craig Dickson
摘要:先前的工作表明,在窄域的未对齐数据上微调模型可能导致广泛的失准,这一现象被称为"涌现性失准"(emergent misalignment, Betley et al. 2025)。虽然所有被测模型都容易出现涌现性失准,但一些模型表现出比其他模型更强的抵抗力:具体而言,Qwen-2.5系列被证明相对抗性较强,而GPT-4o表现出最严重的失准。在本文中,我们评估当前一代开放权重模型是否表现出与Qwen-2.5系列类似的抵抗力,并在一系列模型架构和规模上测量失准的鲁棒性。 我们在9个现代开放权重模型(Gemma 3和Qwen 3家族,1B-32B参数)中复现了该效应。针对不安全代码生成微调的模型显示出0.68%的失准率(基础模型为0.07%),与先前开放模型结果的下限相当,但远低于GPT-4o的20%。 我们还发现了一个严重的格式相关漏洞:与自然语言提示相比,要求JSON输出会使失准率翻倍(0.96%对0.42%)。这表明结构化约束可能通过减少模型拒绝回答的"自由度"而绕过安全训练。这些发现证实,涌现性失准是现代开放权重模型中可复现的现象,但其发生率远低于在专有系统中观察到的水平。
摘要:Prior work has shown that fine-tuning models on a narrow domain with misaligned data can lead to broad misalignment - a phenomenon termed "emergent misalignment" (Betley et al. 2025). While all tested models were susceptible to emergent misalignment, some models showed more resistance than others. Specifically the Qwen-2.5 family proved to be relatively resistant, while GPT-4o exhibited the strongest misalignment. In this paper we evaluate if current-generation open-weights models exhibit similar resistance to the Qwen-2.5 family and measure misalignment robustness over a range of model architectures and scales.   We replicate the effect across nine modern open-weights models (Gemma 3 and Qwen 3 families, 1B-32B parameters). Models fine-tuned on insecure code generation show a 0.68% misalignment rate (compared to 0.07% for base models), matching the lower end of prior open-model results but dramatically lower than GPT-4o's 20%.   We identify a critical format-dependent vulnerability: requiring JSON output doubles misalignment rates compared to natural language prompts (0.96% vs 0.42%). This suggests that structural constraints may bypass safety training by reducing the model's 'degrees of freedom' to refuse. These findings confirm emergent misalignment as a reproducible phenomenon in modern open-weights models, with rates substantially lower than observed in proprietary systems.


【8】Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design
标题:通过基于推测的算法-系统协同设计减少LLM搜索代理的延迟
链接:https://arxiv.org/abs/2511.20048

作者:Zixiao Huang,Wen Zeng,Tianyu Fu,Tengxuan Liu,Yizhou Sun,Ke Hong,Xinhao Yang,Chengchun Liu,Yan Li,Quanlu Zhang,Guohao Dai,Zhenhua Zhu,Yu Wang
摘要:基于LLM的搜索代理性能强大,但延迟严重,因为每一步都需要串行的LLM推理,随后才执行工具调用动作。我们从推测(speculation)的视角重新审视这一瓶颈。虽然传统的"预测-验证"推测范式可以打破串行执行,但其收益仍然有限,因为它保留了全部原始工作负载并增加了额外的推理开销。我们观察到,早期的代理步骤往往只涉及简单的证据收集,常常无需完整推理即可预测出正确的动作。基于这些观察,我们提出了SPAgent,一个算法-系统协同设计框架,它扩展了推测在搜索代理中的作用以降低延迟。在算法上,SPAgent引入了两阶段自适应推测机制,在安全时选择性地省略验证。在系统方面,两级调度器根据引擎负载调节推测性请求,以确保推测始终有益。我们在真实系统中实现了SPAgent。在广泛的实验设置中,SPAgent实现了高达1.65倍的端到端加速,同时保持相同甚至更高的准确性,使多步搜索代理的实际部署成为可能。
摘要:LLM-based search agents achieve strong performance but suffer from severe latency, as each step requires serialized LLM reasoning followed by action of tool execution. We revisit this bottleneck through the lens of speculation. While traditional predict-verify speculation paradigm can break serial execution, its benefit remains limited, as it retains the full original workload and adds extra inference overhead. We observe that early agent steps often involve simple evidence-gathering, where correct actions can often be predicted without full reasoning. Building on these observations, we present SPAgent, an algorithm-system co-design framework that expands the role of speculation in search agents to reduce latency. Algorithmically, SPAgent introduces a two-phase adaptive speculation mechanism that selectively omits verification when safe. System-wise, a two-level scheduler regulates speculative requests based on engine load to ensure speculation remains beneficial. We implement SPAgent in real-world systems. Across extensive experimental settings, SPAgent achieves up to $1.65\times$ end-to-end speedup while maintaining same or even achieving higher accuracy, enabling practical deployment of multi-step search agents.
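The two-phase idea (speculate when confident, verify otherwise) can be modeled in a few lines. The predictor/verifier below are stubs and the confidence threshold is an assumed knob, not SPAgent's actual mechanism:

```python
def run_agent(steps, predictor, verifier, threshold=0.8):
    """Adaptive speculation sketch: take the predicted action directly when
    the predictor is confident (skipping verification), otherwise fall back
    to full reasoning. Returns the actions taken and how many expensive
    verification calls were spent."""
    actions, verify_calls = [], 0
    for step in steps:
        action, confidence = predictor(step)
        if confidence < threshold:       # not safe to speculate: verify
            verify_calls += 1
            action = verifier(step)
        actions.append(action)
    return actions, verify_calls

# Hypothetical workload: early evidence-gathering steps are easy to predict,
# the final synthesis step is not.
steps = [("search", 0.95), ("search", 0.9), ("synthesize", 0.4)]
predictor = lambda s: (s[0], s[1])   # stub: (predicted action, confidence)
verifier = lambda s: s[0]            # stub: full reasoning, always correct
actions, calls = run_agent(steps, predictor, verifier)
```

In this toy run only one of three steps pays for full reasoning, which is where the latency saving comes from when predictions are cheap relative to reasoning.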


【9】ParaBlock: Communication-Computation Parallel Block Coordinate Federated Learning for Large Language Models
标题:ParaBlock:用于大型语言模型的通信-计算并行块坐标联邦学习
链接:https://arxiv.org/abs/2511.19959

作者:Yujia Wang,Yuanpu Cao,Jinghui Chen
备注:32 pages, 2 figures
摘要:联邦学习(FL)作为一种隐私保护的训练范式已被广泛研究。最近,联邦块坐标下降方案已成为训练大规模模型的流行选择,因为它允许客户端在本地只训练模型的一个子集而非整个模型。然而,在大型语言模型(LLM)时代,即使单个块也可能包含大量参数,造成显著的通信延迟,对资源受限的客户端尤为如此。为了解决联邦训练/微调LLM中的这一挑战,我们提出了ParaBlock,一种建立通信与计算两个并行线程以提高通信效率的新方法。我们从理论上证明,所提出的ParaBlock达到了与标准联邦块坐标下降方法相同的收敛速度。在通用指令遵循和数学推理上微调LLM的实证评估证实,ParaBlock不仅保持了强劲的性能,而且显著提高了通信效率。
摘要:Federated learning (FL) has been extensively studied as a privacy-preserving training paradigm. Recently, federated block coordinate descent scheme has become a popular option in training large-scale models, as it allows clients to train only a subset of the model locally instead of the entire model. However, in the era of large language models (LLMs), even a single block can contain a significant number of parameters, posing substantial communication latency, particularly for resource-constrained clients. To address this challenge in federated training/fine-tuning LLMs, we propose ParaBlock, a novel approach that establishes two parallel threads for communication and computation to enhance communication efficiency. We theoretically prove that the proposed ParaBlock achieves the same convergence rate as the standard federated block coordinate descent methods. Empirical evaluations on fine-tuning LLMs on general instruction following and mathematical reasoning confirm that ParaBlock not only maintains strong performance but also significantly improves communication efficiency.
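The two-thread overlap can be sketched with a background sender: while block k's update is being communicated, block k+1's computation proceeds. This is a structural sketch only (the real method handles gradients and server aggregation, and proves convergence matches the serial scheme):

```python
import threading
import queue

def train_serial(blocks, compute, communicate):
    """Baseline: each block's upload stalls the next block's computation."""
    for b in blocks:
        compute(b)
        communicate(b)

def train_parablock(blocks, compute, communicate):
    """Overlap communication of block k with computation of block k+1 by
    handing finished blocks to a background sender thread."""
    q = queue.Queue()

    def sender():
        while True:
            b = q.get()
            if b is None:        # sentinel: no more blocks
                break
            communicate(b)

    t = threading.Thread(target=sender)
    t.start()
    for b in blocks:
        compute(b)
        q.put(b)                 # hand off upload; next compute starts now
    q.put(None)
    t.join()                     # wait for remaining uploads to drain

computed, sent = [], []
train_parablock([0, 1, 2], computed.append, sent.append)
```

The queue preserves upload order, so the communicated sequence matches the serial schedule even though the main thread never waits on the network.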


【10】Prompt Fairness: Sub-group Disparities in LLMs
标题:提示公平性:LLM中的亚组差异
链接:https://arxiv.org/abs/2511.19956

作者:Meiyu Zhong,Noel Teku,Ravi Tandon
摘要:大型语言模型(LLM)虽然在许多应用中被证明有效,但其回复质量可能差异很大。在本文中,我们研究提示公平性问题:具体而言,不同用户/风格对提示的措辞,尽管在原则上问的是同一个问题,却可能从LLM引出不同的回复。为了量化这种差异,我们提出使用信息论指标来刻画偏差的两个维度:亚组敏感性(亚组内部回复的变异性)和跨组一致性(亚组之间回复的变异性)。我们的实证分析表明,某些人口统计学亚组同时表现出更高的内部变异性和与其他亚组更大的分歧,表明模型行为存在结构性不平等。为了缓解这些差异,我们提出了实用的干预措施,包括跨多次生成的多数投票和提示中立化,二者结合可提高回复的稳定性并增强不同用户群体间的公平性。在实验中,我们观察到人口统计学亚组之间明显的提示敏感性差异:缓解前,跨组散度值最高达0.28,通常落在0.14至0.22的范围内;在应用我们的中立化和多次生成策略后,这些散度持续减小,最大差距降至0.22,许多距离降至0.17或以下,表明各亚组之间的输出更加稳定和一致。
摘要 :Large Language Models (LLMs), though shown to be effective in many applications, can vary significantly in their response quality. In this paper, we investigate this problem of prompt fairness: specifically, the phrasing of a prompt by different users/styles, despite the same question being asked in principle, may elicit different responses from an LLM. To quantify this disparity, we propose to use information-theoretic metrics that can capture two dimensions of bias: subgroup sensitivity, the variability of responses within a subgroup and cross group consistency, the variability of responses across subgroups. Our analysis reveals that certain subgroups exhibit both higher internal variability and greater divergence from others. Our empirical analysis reveals that certain demographic sub groups experience both higher internal variability and greater divergence from others, indicating structural inequities in model behavior. To mitigate these disparities, we propose practical interventions, including majority voting across multiple generations and prompt neutralization, which together improve response stability and enhance fairness across user populations. In the experiments, we observe clear prompt sensitivity disparities across demographic subgroups: before mitigation, cross-group divergence values reach 0.28 and typically fall in the from 0.14 to 0.22 range. After applying our neutralization and multi generation strategy, these divergences consistently decrease, with the largest gap reduced to 0.22 and many distances falling to 0.17 or below, indicating more stable and consistent outputs across subgroups.
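One concrete instantiation of a cross-group consistency metric is the Jensen-Shannon divergence between subgroups' response distributions (an illustrative choice; the paper's exact information-theoretic metric may differ, and the distributions below are hypothetical):

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two response distributions:
    symmetric, zero iff p == q, and bounded above by ln 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical distributions over answer categories for two subgroups'
# phrasings of the same underlying question.
group_a = [0.7, 0.2, 0.1]
group_b = [0.4, 0.4, 0.2]
gap = js_divergence(group_a, group_b)
```

A gap near zero means the two phrasings elicit essentially the same answer distribution; the mitigation strategies in the paper aim to shrink exactly this kind of quantity.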


【11】Differential Smoothing Mitigates Sharpening and Improves LLM Reasoning
标题:差分平滑缓解锐化并改进LLM推理
链接:https://arxiv.org/abs/2511.19942

作者:Jingchu Gai,Guanning Zeng,Huaqing Zhang,Aditi Raghunathan
摘要:人们普遍认识到,对大型语言模型进行强化学习(RL)微调通常会导致多样性坍缩,即输出缺乏多样性。先前的工作提出了一系列启发式方法来抵消这种影响,但这些方法都是临时性的:它们经常以正确性换取多样性,其有效性因任务而异,在某些情况下甚至相互矛盾。在这项工作中,我们将这些观察置于严格的理论基础之上。我们首先形式化地证明了RL微调为何会通过选择与强化偏差表现出多样性坍缩。接着,我们提出一个关键观察:任何用于解决多样性坍缩的奖励修改,只需要施加在正确的轨迹上。直接基于这一分析,我们引入了一种有原则的方法,即差分平滑(differential smoothing),可证明地同时提升正确性和多样性,优于原始RL以及广泛使用的基于熵的启发式方法。我们的理论精确刻画了现有启发式方法何时有效以及为何失效,同时表明差分平滑普遍更优。在参数规模从1B到7B的模型上、涵盖CountDown和现实数学推理等领域的大量实验显示了一致的增益:差分平滑同时改进Pass@1和Pass@k,在AIME24数据集上的改进高达6.7%。
摘要:It is widely recognized that reinforcement learning (RL) fine-tuning of large language models often leads to \textit{diversity collapse}, where outputs lack variety. Prior work has proposed a range of heuristics to counteract this effect, but these methods are ad hoc: they frequently trade off correctness for diversity, their effectiveness varies across tasks, and in some cases they even contradict one another. In this work, we place these observations on a rigorous foundation. We first provide a formal proof of why RL fine-tuning exhibits diversity collapse via a selection and reinforcement bias. Next, we make a key observation that any reward modification to address diversity collapse only needs to be applied on the correct trajectories. Building directly on this analysis, we introduce a principled method -- \textit{differential smoothing} -- that provably improves both correctness and diversity, outperforming vanilla RL as well as widely used entropy-based heuristics. Our theory precisely characterizes when existing heuristics help and why they fail, while showing that differential smoothing is universally superior. Extensive experiments with models from 1B to 7B parameters, across domains including CountDown and real-world mathematical reasoning, demonstrate consistent gains. Differential smoothing improves both Pass@1 and Pass@k, with up to 6.7\% improvements on AIME24 dataset.
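The key observation, modifying rewards only on correct trajectories, can be illustrated with a toy reward transform. The specific bonus form below is an assumption for illustration, not the paper's exact differential-smoothing formula:

```python
def smoothed_rewards(trajectories, alpha=0.1):
    """Apply a diversity-encouraging bonus only to correct trajectories.
    Each trajectory is (is_correct, model_logprob); correct answers that the
    model currently assigns low probability get a larger boost, spreading
    probability mass across correct modes instead of collapsing onto one.
    Incorrect trajectories are left untouched, so correctness is not traded
    away (illustrative form, not the paper's exact objective)."""
    out = []
    for is_correct, logprob in trajectories:
        if is_correct:
            out.append(1.0 - alpha * logprob)  # bonus grows as logprob falls
        else:
            out.append(0.0)                    # incorrect: reward unchanged
    return out

# Two correct modes (one dominant, one rare) and one incorrect trajectory.
trajs = [(True, -0.5), (True, -4.0), (False, -1.0)]
rewards = smoothed_rewards(trajs)
```

The rare correct mode (logprob -4.0) receives a larger reward than the dominant one, which is the mechanism that counteracts sharpening while never rewarding wrong answers.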


【12】EfficientXpert: Efficient Domain Adaptation for Large Language Models via Propagation-Aware Pruning
标题:EfficientXpert:通过传播感知修剪实现大型语言模型的高效领域适应
链接:https://arxiv.org/abs/2511.19935

作者:Songlin Zhao,Michael Pitts,Zhuwei Qin
摘要:大型语言模型(LLM)的快速发展增加了法律、医疗和金融等领域对领域专用变体的需求。然而,其庞大的规模仍然是资源受限环境中部署的障碍,而现有压缩方法要么跨领域泛化能力差,要么开销高昂。在这项工作中,我们提出了EfficientXpert,一个轻量级的领域修剪框架,它将传播感知的修剪准则(Foresight Mask)与高效的适配器更新算法(Partial Brain Surgeon)相结合。EfficientXpert集成于LoRA微调过程中,可将通用预训练模型一步转换为稀疏的、领域适配的专家模型。在健康和法律任务上,它在40%稀疏度下保留了高达98%的稠密模型性能,优于最先进的方法。进一步的分析揭示了显著的领域相关结构变化,这些变化会降低通用修剪掩码的有效性,凸显了为每个领域量身定制自适应、领域感知修剪策略的必要性。
摘要:The rapid advancement of large language models (LLMs) has increased the demand for domain-specialized variants in areas such as law, healthcare, and finance. However, their large size remains a barrier to deployment in resource-constrained environments, and existing compression methods either generalize poorly across domains or incur high overhead. In this work, we propose \textbf{EfficientXpert}, a lightweight domain-pruning framework that combines a propagation-aware pruning criterion (Foresight Mask) with an efficient adapter-update algorithm (Partial Brain Surgeon). Integrated into the LoRA fine-tuning process, EfficientXpert enables a one-step transformation of general pretrained models into sparse, domain-adapted experts. Across health and legal tasks, it retains up to 98% of dense-model performance at 40% sparsity, outperforming state-of-the-art methods. Further analysis reveals substantial domain-dependent structural shifts that degrade the effectiveness of general pruning masks, underscoring the need for adaptive, domain-aware pruning strategies tailored to each domain.
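For contrast with the paper's propagation-aware criterion, plain magnitude pruning at the same 40% sparsity looks like this (a baseline sketch, not Foresight Mask itself):

```python
import numpy as np

def magnitude_mask(weight, sparsity=0.4):
    """Boolean mask that zeroes out the smallest-magnitude fraction of
    weights. Plain magnitude pruning: a common baseline that the paper's
    Foresight Mask improves on by accounting for error propagation."""
    flat = np.abs(weight).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return np.ones_like(weight, dtype=bool)
    thresh = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return np.abs(weight) > thresh              # keep only larger weights

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 64))                   # toy weight matrix
mask = magnitude_mask(W, sparsity=0.4)
achieved = 1.0 - mask.mean()                    # realized sparsity
```

Applying `W * mask` yields the sparse weights; the paper's point is that which 40% to drop should depend on the domain, since domain shift moves the important weights around.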


【13】It Hears, It Sees too: Multi-Modal LLM for Depression Detection By Integrating Visual Understanding into Audio Language Models
标题:它听到了,它也看到了:通过将视觉理解集成到音频语言模型中来检测抑郁症的多模式LLM
链接:https://arxiv.org/abs/2511.19877

作者:Xiangyu Zhao,Yaling Shen,Yiwen Jiang,Zimu Wang,Jiahe Liu,Maxmartwell H Cheng,Guilherme C Oliveira,Robert Desimone,Dominic Dwyer,Zongyuan Ge
摘要:抑郁症是全球最普遍的心理健康疾病之一。近年来,语音、视频和文字记录等多模式数据越来越多地用于开发人工智能辅助抑郁症评估系统。大型语言模型由于其强大的语言理解和泛化能力,进一步推动了这一领域的发展。然而,传统的LLM仍然以文本为中心,无法处理音频和视觉形式中丰富的非语言线索,这些线索是心理健康评估的关键组成部分。虽然多模态LLM提供了一个有前途的方向,但很少有针对心理学应用的。在这项研究中,我们提出了一种新的多模态LLM框架抑郁症检测。我们的方法增强了音频语言模型与视觉理解,并在时间戳级别调整视听功能。这种细粒度对齐改进了跨模态的时间动态建模,同时减少了对大量训练数据和计算资源的需求。DAIC-WoZ数据集上的实验表明,我们的模型优于单模态方法和以前的多模态方法。此外,所提出的框架可以扩展到包含更多的生理信号,为精神健康以外的更广泛的临床应用铺平道路。
摘要:Depression is one of the most prevalent mental health disorders globally. In recent years, multi-modal data, such as speech, video, and transcripts, has been increasingly used to develop AI-assisted depression assessment systems. Large language models have further advanced this field due to their strong language understanding and generalization capabilities. However, conventional LLMs remain text-centric and cannot process the rich non-verbal cues found in audio and visual modalities, which are critical components in mental health evaluation. While multi-modal LLMs offer a promising direction, few are tailored for psychological applications. In this study, we propose a novel multi-modal LLM framework for depression detection. Our approach augments an audio language model with visual understanding and aligns audio-visual features at the timestamp level. This fine-grained alignment improves modeling of temporal dynamics across modalities while reducing the need for extensive training data and computational resources. Experiments on the DAIC-WoZ dataset demonstrate that our model outperforms both single-modality approaches and previous multi-modal methods. Moreover, the proposed framework can be extended to incorporate additional physiological signals, paving the way for broader clinical applications beyond mental health.


【14】Cross-LLM Generalization of Behavioral Backdoor Detection in AI Agent Supply Chains
标题:人工智能代理供应链中行为后门检测的跨LLM泛化
链接:https://arxiv.org/abs/2511.19874

作者:Arun Chowdary Sanna
备注:10 pages, 2 figures, 8 tables. Evaluation across 6 production LLMs with 1,198 traces
摘要:随着人工智能代理成为企业工作流程中不可或缺的一部分,它们对共享工具库和预先训练的组件的依赖造成了严重的供应链漏洞。虽然以前的工作已经证明了在单个LLM架构中的行为后门检测,但跨LLM泛化的关键问题仍未得到探索,这一差距对部署多个AI系统的组织具有严重影响。我们提出了跨LLM行为后门检测的第一个系统性研究,评估了六个生产LLM(GPT-5.1,Claude Sonnet 4.5,Grok 4.1,Llama 4 Maverick,GPT-OSS 120 B和DeepSeek Chat V3.1)的泛化。通过1,198个执行跟踪和36个跨模型实验,我们量化了一个关键的发现:单模型检测器在其训练分布中达到92.7%的准确率,但在不同的LLM中只有49.2%,相当于随机猜测的43.4个百分点的泛化差距。我们的分析表明,这种差距源于模型特定的行为特征,特别是在时间特征(变异系数> 0.8),而结构特征在架构中保持稳定。我们表明,模型感知检测将模型身份作为一个额外的功能,在所有评估的模型普遍达到90.6%的准确率。我们发布了多LLM跟踪数据集和检测框架,以实现可重复的研究。
摘要:As AI agents become integral to enterprise workflows, their reliance on shared tool libraries and pre-trained components creates significant supply chain vulnerabilities. While previous work has demonstrated behavioral backdoor detection within individual LLM architectures, the critical question of cross-LLM generalization remains unexplored, a gap with serious implications for organizations deploying multiple AI systems. We present the first systematic study of cross-LLM behavioral backdoor detection, evaluating generalization across six production LLMs (GPT-5.1, Claude Sonnet 4.5, Grok 4.1, Llama 4 Maverick, GPT-OSS 120B, and DeepSeek Chat V3.1). Through 1,198 execution traces and 36 cross-model experiments, we quantify a critical finding: single-model detectors achieve 92.7% accuracy within their training distribution but only 49.2% across different LLMs, a 43.4 percentage point generalization gap equivalent to random guessing. Our analysis reveals that this gap stems from model-specific behavioral signatures, particularly in temporal features (coefficient of variation > 0.8), while structural features remain stable across architectures. We show that model-aware detection incorporating model identity as an additional feature achieves 90.6% accuracy universally across all evaluated models. We release our multi-LLM trace dataset and detection framework to enable reproducible research.
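The coefficient-of-variation analysis behind the "temporal features are model-specific" finding is straightforward to reproduce on any per-model feature table; the feature values below are hypothetical:

```python
def coefficient_of_variation(xs):
    """CV = std / mean (population std). The paper flags temporal features
    with CV > 0.8 as model-specific (unstable across LLMs), while structural
    features stay stable across architectures."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return (var ** 0.5) / mean

# Hypothetical per-model measurements: step latencies swing wildly across
# the six LLMs, while trace depth barely moves.
temporal = [0.2, 1.5, 3.0, 0.1, 2.4, 0.3]   # e.g. seconds per tool call
structural = [5, 6, 5, 6, 5, 6]             # e.g. trace depth
```

Features with high CV are exactly the ones a single-model detector overfits to, which is why adding model identity as a feature restores accuracy.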


【15】Training-Free Active Learning Framework in Materials Science with Large Language Models
标题:具有大型语言模型的材料科学免训练主动学习框架
链接:https://arxiv.org/abs/2511.19730

作者:Hongchen Wang,Rafael Espinosa Castañeda,Jay R. Werber,Yao Fehlis,Edward Kim,Jason Hattrick-Simpers
摘要:主动学习(AL)通过优先考虑信息量最大的实验来加速科学发现,但AL中使用的传统机器学习(ML)模型受到冷启动限制和特定于领域的特征工程的影响,限制了它们的泛化能力。大型语言模型(LLM)提供了一种新的范式,它利用它们的预训练知识和通用的基于令牌的表示,直接从基于文本的描述中提出实验。在这里,我们介绍了一个基于LLM的主动学习框架(LLM-AL),该框架在迭代的Few-Shot设置中运行,并将其与四个不同材料科学数据集的传统ML模型进行基准测试。我们探索了两种提示策略:一种是使用简洁的数字输入,适用于具有更多成分和结构化特征的数据集,另一种是使用扩展的描述性文本,适用于具有更多实验和程序特征的数据集,以提供额外的上下文。在所有数据集中,LLM-AL可以将达到最佳候选人所需的实验数量减少70%以上,并且始终优于传统的ML模型。我们发现LLM-AL执行更广泛和更探索性的搜索,同时仍然以更少的迭代达到最优。考虑到LLMs固有的非确定性,我们进一步检查了LLM-AL的稳定性边界,并发现其性能在整个运行中大致一致,在传统ML方法通常观察到的可变性范围内。这些结果表明,LLM-AL可以作为一个可推广的替代传统的AL管道更有效和可解释的实验选择和潜在的LLM驱动的自主发现。
摘要:Active learning (AL) accelerates scientific discovery by prioritizing the most informative experiments, but traditional machine learning (ML) models used in AL suffer from cold-start limitations and domain-specific feature engineering, restricting their generalizability. Large language models (LLMs) offer a new paradigm by leveraging their pretrained knowledge and universal token-based representations to propose experiments directly from text-based descriptions. Here, we introduce an LLM-based active learning framework (LLM-AL) that operates in an iterative few-shot setting and benchmark it against conventional ML models across four diverse materials science datasets. We explored two prompting strategies: one using concise numerical inputs suited for datasets with more compositional and structured features, and another using expanded descriptive text suited for datasets with more experimental and procedural features to provide additional context. Across all datasets, LLM-AL could reduce the number of experiments needed to reach top-performing candidates by over 70% and consistently outperformed traditional ML models. We found that LLM-AL performs broader and more exploratory searches while still reaching the optima with fewer iterations. We further examined the stability boundaries of LLM-AL given the inherent non-determinism of LLMs and found its performance to be broadly consistent across runs, within the variability range typically observed for traditional ML approaches. These results demonstrate that LLM-AL can serve as a generalizable alternative to conventional AL pipelines for more efficient and interpretable experiment selection and potential LLM-driven autonomous discovery.
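The iterative few-shot loop can be sketched with a stub scorer standing in for the LLM call; the toy search space, greedy scoring rule, and batch/budget sizes below are assumptions for illustration:

```python
def active_learning_loop(candidates, score_fn, batch_size=2, budget=6):
    """Iterative few-shot AL sketch: each round, a scorer (standing in for
    the LLM prompted with the labeled examples so far) picks the next batch;
    the 'experiments' then reveal the true values, which are fed back.
    Returns the labeled history and the round at which the optimum was found."""
    labeled, remaining = [], list(candidates)
    best_value = max(v for _, v in candidates)
    for rounds in range(1, budget + 1):
        remaining.sort(key=lambda c: score_fn(c, labeled), reverse=True)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        labeled.extend(batch)                 # run the experiments
        if any(v == best_value for _, v in batch):
            return labeled, rounds
    return labeled, None

# Toy materials space: (composition, true property value).
space = [("A", 1.0), ("B", 3.0), ("C", 2.0), ("D", 9.0), ("E", 8.0), ("F", 4.0)]

def stub_score(c, labeled):
    """Stub for the LLM: prefer candidates close to the best labeled one."""
    if not labeled:
        return 0.0                            # cold start: no preference
    best = max(labeled, key=lambda x: x[1])
    return -abs(c[1] - best[1])

labeled, found_round = active_learning_loop(space, stub_score)
```

Swapping `stub_score` for a prompt that serializes `labeled` into few-shot examples and asks the LLM to rank `remaining` gives the LLM-AL setting; the baseline comparison is the same loop with a trained surrogate model as the scorer.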


【16】An Invariant Latent Space Perspective on Language Model Inversion
标题:语言模型反演的不变潜空间观点
链接:https://arxiv.org/abs/2511.19569

作者:Wentao Ye,Jiaqi Hu,Haobo Wang,Xinpeng Ti,Zhiqing Xiao,Hao Chen,Liyao Li,Lei Feng,Sai Wu,Junbo Zhao
备注:The Fortieth AAAI Conference on Artificial Intelligence (AAAI-26)
摘要:语言模型反演(LMI),即从模型输出中恢复隐藏的提示,正在成为对用户隐私和系统安全的具体威胁。我们将LMI重新定义为对LLM自身潜在空间的重用,并提出了不变潜在空间假设(ILSH):(1)来自同一源提示的不同输出应保持一致的语义(源不变性);(2)输入-输出循环映射应在共享潜在空间内自洽(循环不变性)。据此,我们提出了Inv^2A,它将LLM视为不变解码器,仅学习一个将输出映射到去噪伪表示的轻量级逆编码器。当有多个输出可用时,它们在表示层被稀疏拼接以增加信息密度。训练分为两个阶段:对比对齐(源不变性)和监督强化(循环不变性)。可选的免训练邻域搜索可以进一步改善局部性能。在涵盖用户和系统提示场景的9个数据集上,Inv^2A的BLEU得分平均超过基线4.77%,同时减少了对大型逆向语料库的依赖。我们的分析进一步表明,主流防御只能提供有限的保护,凸显了对更强防御策略的需求。本文涉及的源代码和数据可以在https://github.com/yyy01/Invariant_Attacker中找到。
摘要:Language model inversion (LMI), i.e., recovering hidden prompts from outputs, emerges as a concrete threat to user privacy and system security. We recast LMI as reusing the LLM's own latent space and propose the Invariant Latent Space Hypothesis (ILSH): (1) diverse outputs from the same source prompt should preserve consistent semantics (source invariance), and (2) input-output cyclic mappings should be self-consistent within a shared latent space (cyclic invariance). Accordingly, we present Inv^2A, which treats the LLM as an invariant decoder and learns only a lightweight inverse encoder that maps outputs to a denoised pseudo-representation. When multiple outputs are available, they are sparsely concatenated at the representation layer to increase information density. Training proceeds in two stages: contrastive alignment (source invariance) and supervised reinforcement (cyclic invariance). An optional training-free neighborhood search can refine local performance. Across 9 datasets covering user and system prompt scenarios, Inv^2A outperforms baselines by an average of 4.77% BLEU score while reducing dependence on large inverse corpora. Our analysis further shows that prevalent defenses provide limited protection, underscoring the need for stronger strategies. The source code and data involved in this paper can be found in https://github.com/yyy01/Invariant_Attacker.


【17】Cross-Domain Generalization of Multimodal LLMs for Global Photovoltaic Assessment
标题:面向全球光伏评估的多模态LLM跨域泛化
链接:https://arxiv.org/abs/2511.19537

作者:Muhao Guo,Yang Weng
备注:5 pages, 7 figures
摘要:分布式光伏(PV)系统的快速扩张给电网管理带来了挑战,因为许多安装仍然没有记录。虽然卫星图像提供了全球覆盖,但传统的计算机视觉(CV)模型(如CNN和U-Nets)需要大量的标记数据,并且无法跨区域泛化。本研究探讨了多模态大语言模型(LLM)在全球PV评估中的跨域泛化能力。通过利用结构化提示和微调,该模型将检测、定位和量化集成在一个统一的模式中。使用ΔF1度量的跨区域评估表明,所提出的模型在未见区域的性能下降最小,优于传统的CV和Transformer基线。这些结果突出了多模态LLM在域偏移下的鲁棒性及其在可扩展、可迁移和可解释的全球PV制图方面的潜力。
摘要:The rapid expansion of distributed photovoltaic (PV) systems poses challenges for power grid management, as many installations remain undocumented. While satellite imagery provides global coverage, traditional computer vision (CV) models such as CNNs and U-Nets require extensive labeled data and fail to generalize across regions. This study investigates the cross-domain generalization of a multimodal large language model (LLM) for global PV assessment. By leveraging structured prompts and fine-tuning, the model integrates detection, localization, and quantification within a unified schema. Cross-regional evaluation using the ΔF1 metric demonstrates that the proposed model achieves the smallest performance degradation across unseen regions, outperforming conventional CV and transformer baselines. These results highlight the robustness of multimodal LLMs under domain shift and their potential for scalable, transferable, and interpretable global PV mapping.


【18】Automating Deception: Scalable Multi-Turn LLM Jailbreaks
标题:自动欺骗:可扩展多回合LLM越狱
链接:https://arxiv.org/abs/2511.19517

作者:Adarsh Kumarappan,Ananya Mujoo
摘要:多回合会话攻击利用了像Foot-in-the-Door(FITD)这样的心理学原理,其中一个小的初始请求为更重要的请求铺平了道路,以绕过安全对齐,对大型语言模型(LLM)构成了持续的威胁。防御这些攻击的进展受到依赖手动,难以扩展的数据集创建的阻碍。本文介绍了一种新的自动化管道,用于生成大规模的、基于心理学的多轮越狱数据集。我们系统地将FITD技术操作到可复制的模板中,创建了一个涵盖非法活动和攻击性内容的1,500个场景的基准。我们评估了七个模型,从三个主要的LLM家庭在多转(有历史)和单转(无历史)的条件下。我们的研究结果揭示了上下文鲁棒性的明显差异:GPT家族中的模型表现出对会话历史的显著脆弱性,攻击成功率(ASR)增加了32个百分点。相比之下,谷歌的Gemini 2.5 Flash表现出了非凡的弹性,证明几乎对这些攻击免疫,而Anthropic的Claude 3 Haiku表现出了强大但不完美的抵抗力。这些发现突出了当前安全架构如何处理会话上下文的关键分歧,并强调了防御可以抵御基于叙述的操纵的需求。
摘要:Multi-turn conversational attacks, which leverage psychological principles like Foot-in-the-Door (FITD), where a small initial request paves the way for a more significant one, to bypass safety alignments, pose a persistent threat to Large Language Models (LLMs). Progress in defending against these attacks is hindered by a reliance on manual, hard-to-scale dataset creation. This paper introduces a novel, automated pipeline for generating large-scale, psychologically-grounded multi-turn jailbreak datasets. We systematically operationalize FITD techniques into reproducible templates, creating a benchmark of 1,500 scenarios across illegal activities and offensive content. We evaluate seven models from three major LLM families under both multi-turn (with history) and single-turn (without history) conditions. Our results reveal stark differences in contextual robustness: models in the GPT family demonstrate a significant vulnerability to conversational history, with Attack Success Rates (ASR) increasing by as much as 32 percentage points. In contrast, Google's Gemini 2.5 Flash exhibits exceptional resilience, proving nearly immune to these attacks, while Anthropic's Claude 3 Haiku shows strong but imperfect resistance. These findings highlight a critical divergence in how current safety architectures handle conversational context and underscore the need for defenses that can resist narrative-based manipulation.


【19】A Systematic Study of Compression Ordering for Large Language Models
标题:大型语言模型压缩排序的系统研究
链接:https://arxiv.org/abs/2511.19495

作者:Shivansh Chhawri,Rahul Mahadik,Suparna Rooj
摘要:大型语言模型(LLM)需要大量的计算资源,这使得模型压缩对于在受限环境中的有效部署至关重要。在主要的压缩技术(知识蒸馏、结构化修剪和低位量化)中,各自的单独效果已得到充分研究,但它们之间的相互作用和最佳顺序仍不清楚。这项工作系统地研究了这些技术在单独应用和组合应用于Qwen2.5 3B模型时的表现。我们评估了多种压缩管道,包括单一技术管道和所提出的三技术序列,并使用困惑度、G-Eval、清晰度、提示对齐和压缩比作为指标。我们的实验表明,量化提供了最大的单独压缩效果,而修剪会引入适度的质量下降。关键的是,技术的顺序会显著影响最终模型的质量:修剪、知识蒸馏、量化(P-KD-Q)序列产生了最佳平衡,实现了3.68倍的压缩比,同时保留了强大的指令遵循和语言理解能力。相反,早期应用量化的管道由于不可逆的信息丢失损害了后续训练,会遭受严重的性能下降。总的来说,这项研究为在资源受限环境中部署LLM设计有效且顺序感知的压缩管道提供了实用见解。
摘要:Large Language Models (LLMs) require substantial computational resources, making model compression essential for efficient deployment in constrained environments. Among the dominant compression techniques: knowledge distillation, structured pruning, and low-bit quantization, their individual effects are well studied, but their interactions and optimal sequencing remain unclear. This work systematically examines how these techniques perform both independently and in combination when applied to the Qwen2.5 3B model. We evaluate multiple compression pipelines, including single, and proposed three-technique sequences, using perplexity, G-Eval, clarity, prompt alignment, and compression ratio as metrics. Our experiments show that quantization provides the greatest standalone compression, while pruning introduces moderate quality degradation. Critically, the ordering of techniques significantly affects the final model quality: the sequence Pruning, Knowledge Distillation, Quantization (P-KD-Q) yields the best balance, achieving a 3.68x compression ratio while preserving strong instruction-following and language understanding capabilities. Conversely, pipelines applying quantization early suffer severe performance degradation due to irreversible information loss that impairs subsequent training. Overall, this study offers practical insight into designing effective, ordering-aware compression pipelines for deploying LLMs in resource-limited settings.
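To make the ordering question concrete, here is a toy sketch (not the paper's pipeline) of magnitude pruning followed by symmetric int8 quantization on a plain-Python weight vector; the weight values are made up for illustration.

```python
# Toy P -> Q ordering: prune the smallest-magnitude weights first, then
# quantize what remains. Pure Python, no deep-learning framework assumed.

def prune_smallest(weights, sparsity):
    """Zero out roughly the smallest-magnitude `sparsity` fraction of weights."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_int8(weights):
    """Symmetric uniform quantization to int8 levels (returned dequantized)."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) * scale for w in weights]

w = [0.02, -0.5, 1.2, -0.03, 0.9, -1.1, 0.04, 0.7]
pruned = prune_smallest(w, 0.5)                # half the weights zeroed
compressed = quantize_int8(pruned)
sparsity = sum(1 for x in compressed if x == 0.0) / len(compressed)
```

Quantizing first would instead round the small weights before pruning sees them, which is one intuition for why early quantization discards information that later stages cannot recover.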


【20】Efficient Inference Using Large Language Models with Limited Human Data: Fine-Tuning then Rectification
标题:使用有限人类数据的大型语言模型进行高效推理:微调然后纠正
链接:https://arxiv.org/abs/2511.19486

作者:Lei Wang,Zikun Ye,Jinglong Zhao
摘要:在人工智能(AI)最新进展的推动下,越来越多的工作表明,在市场研究和社会科学应用中,使用大型语言模型(LLM)生成类似人类的响应具有潜力。可以应用两种主要方法来提高LLM的性能:微调,使LLM预测与人类反应更紧密地一致;以及纠偏(rectification),纠正LLM输出中的偏差。在本文中,我们开发了一个结合微调和纠偏的框架,并在这两个阶段之间优化分配有限的标记样本。与最小化均方预测误差的传统目标不同,我们提出以最小化预测误差的方差作为微调目标,这对下游纠偏阶段是最优的。基于这一认识,我们利用经验标度律开发了一种数据驱动的方法,用于在微调和纠偏阶段之间最优地分割样本。实证分析验证了我们的框架:与单独使用微调或纠偏相比,其估计和推断性能均有所提升。
摘要:Driven by recent advances in artificial intelligence (AI), a growing body of work demonstrates the potential of using large language models (LLMs) to generate human-like responses in market research and social science applications. Two primary approaches can be applied to improve the performance of LLMs: fine-tuning, which aligns LLM predictions more closely with human responses, and rectification, which corrects biases in LLM outputs. In this paper, we develop a framework that combines fine-tuning and rectification, and optimally allocates limited labeled samples across the two stages. Unlike the conventional objective that minimizes the mean squared prediction errors, we propose to minimize the variance of the prediction errors as the fine-tuning objective, which is optimal for the downstream rectification stage. Building on this insight, we leverage empirical scaling laws to develop a data-driven method for optimally splitting samples between the fine-tuning and rectification stages. Empirical analysis validates our framework, demonstrating improved estimation and inference performance compared to using either fine-tuning or rectification alone.
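The rectification idea described above can be sketched as a simple additive bias correction (a prediction-powered-inference-style estimator; the paper's exact procedure may differ, and all numbers below are made up for illustration).

```python
# Rectified mean estimate: LLM mean over a large unlabeled pool, minus the
# LLM's average error measured on a small labeled sample.

def rectified_mean(llm_pred_unlabeled, llm_pred_labeled, y_labeled):
    llm_mean = sum(llm_pred_unlabeled) / len(llm_pred_unlabeled)
    bias = sum(p - y for p, y in zip(llm_pred_labeled, y_labeled)) / len(y_labeled)
    return llm_mean - bias

unlabeled = [3.1, 2.9, 3.3, 3.0, 3.2, 2.8]  # LLM predictions, no labels
labeled_pred = [3.0, 3.4]                    # LLM predictions where labels exist
labels = [2.8, 3.0]                          # human responses
est = rectified_mean(unlabeled, labeled_pred, labels)
```

The variance of this estimator depends on the variance of the per-sample errors `p - y`, which is one way to see why minimizing error variance (rather than mean squared error) is the natural fine-tuning objective before this correction step.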


【21】Exploiting the Experts: Unauthorized Compression in MoE-LLMs
标题:利用专家:MoE-LLM中的未经授权压缩
链接:https://arxiv.org/abs/2511.19480

作者:Pinaki Prasad Guha Neogi,Ahmad Mohammadshirazi,Dheeraj Kulshrestha,Rajiv Ramnath
摘要:混合专家(Mixture-of-Experts,MoE)架构因其可扩展性和高效性而越来越多地被大型语言模型(LLM)所采用。然而,它们的模块化结构引入了一个独特的漏洞:攻击者可以尝试通过精简专家并廉价地微调其余部分来压缩或重新使用模型,从而有效地绕过许可和安全限制。本文系统地研究了MoE-LLM在特定任务使用下的可修剪性。我们首先开发了一个专家归因框架,确定最负责给定任务的专家子集,然后使用主动学习驱动的微调来评估修剪和重新调整这些专家的性能权衡。我们的研究结果揭示了一个关键的知识损失-恢复权衡:虽然某些专家可以被隔离,以保持任务的准确性,显着的退化发生没有针对性的重新调整。基于这一分析,我们提出了防御策略,旨在使MoE模型更难压缩和微调未经授权,包括纠缠的专家训练和选择性微调协议,抵制未经授权的适应。通过将专家修剪定位为威胁向量和防御目标,这项工作突出了MoE模块化的双重用途,并为MoE-LLM的安全专业化提供了第一个系统的评估框架。
摘要:Mixture-of-Experts (MoE) architectures are increasingly adopted in large language models (LLMs) for their scalability and efficiency. However, their modular structure introduces a unique vulnerability: adversaries can attempt to compress or repurpose models by pruning experts and cheaply fine-tuning the remainder, effectively bypassing licensing and security constraints. In this paper, we systematically study the prunability of MoE-LLMs under task-specific usage. We first develop an expert attribution framework that identifies the subset of experts most responsible for a given task, then evaluate the performance trade-offs of pruning and re-aligning these experts using active learning-driven fine-tuning. Our findings reveal a critical knowledge loss--recovery trade-off: while certain experts can be isolated to retain task accuracy, significant degradation occurs without targeted re-alignment. Based on this analysis, we propose defense strategies that aim to make MoE models harder to compress and fine-tune without authorization, including entangled expert training and selective fine-tuning protocols that resist unauthorized adaptation. By positioning expert pruning as both a threat vector and a defense target, this work highlights the dual-use nature of MoE modularity and provides the first systematic evaluation framework for secure specialization of MoE-LLMs.


Graph相关(图学习|图神经网络|图优化等)(8篇)

【1】E2E-GRec: An End-to-End Joint Training Framework for Graph Neural Networks and Recommender Systems
标题:E2E-GRec:图神经网络和推荐系统的端到端联合训练框架
链接:https://arxiv.org/abs/2511.20564

作者:Rui Xue,Shichao Zhu,Liang Qin,Guangmou Pan,Yang Song,Tianfu Wu
摘要:图神经网络(GNN)已经成为建模图结构数据的强大工具,并已广泛用于推荐系统,例如用于捕获复杂的用户-项目和项目-项目关系。然而,大多数工业部署采用两阶段管道:GNN首先离线预训练以生成节点嵌入,然后将其用作下游推荐系统的静态特征。这种解耦的范式导致了两个关键限制:(1)计算开销高,因为必须重复执行大规模GNN推理才能刷新嵌入;(2)缺乏联合优化,因为推荐系统的梯度不能直接影响GNN学习过程,导致GNN对于推荐任务来说信息量欠佳。在本文中,我们提出了E2E-GRec,这是一种新的端到端训练框架,将GNN训练与推荐系统相统一。我们的框架由三个关键组件构成:(i)从大规模跨域异构图中进行高效的子图采样,以确保训练的可扩展性和效率;(ii)作为辅助自监督任务的图特征自动编码器(GFAE),以指导GNN学习结构上有意义的嵌入;以及(iii)结合基于GradNorm的动态损失平衡的两级特征融合机制,以稳定图感知的多任务端到端训练。广泛的离线评估、在线A/B测试(例如,在大规模生产数据上,停留时长相对提高了+0.133%,用户跳过的平均视频数量减少了0.3171%)以及理论分析表明,E2E-GRec始终优于传统方法,在多个推荐指标上产生了显著的收益。
摘要:Graph Neural Networks (GNNs) have emerged as powerful tools for modeling graph-structured data and have been widely used in recommender systems, such as for capturing complex user-item and item-item relations. However, most industrial deployments adopt a two-stage pipeline: GNNs are first pre-trained offline to generate node embeddings, which are then used as static features for downstream recommender systems. This decoupled paradigm leads to two key limitations: (1) high computational overhead, since large-scale GNN inference must be repeatedly executed to refresh embeddings; and (2) lack of joint optimization, as the gradient from the recommender system cannot directly influence the GNN learning process, causing the GNN to be suboptimally informative for the recommendation task. In this paper, we propose E2E-GRec, a novel end-to-end training framework that unifies GNN training with the recommender system. Our framework is characterized by three key components: (i) efficient subgraph sampling from a large-scale cross-domain heterogeneous graph to ensure training scalability and efficiency; (ii) a Graph Feature Auto-Encoder (GFAE) serving as an auxiliary self-supervised task to guide the GNN to learn structurally meaningful embeddings; and (iii) a two-level feature fusion mechanism combined with Gradnorm-based dynamic loss balancing, which stabilizes graph-aware multi-task end-to-end training. Extensive offline evaluations, online A/B tests (e.g., a +0.133% relative improvement in stay duration, a 0.3171% reduction in the average number of videos a user skips) on large-scale production data, together with theoretical analysis, demonstrate that E2E-GRec consistently surpasses traditional approaches, yielding significant gains across multiple recommendation metrics.
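As a rough illustration of the GradNorm-style dynamic loss balancing mentioned in component (iii), the sketch below shows only the re-weighting intuition (tasks whose gradients are currently small get larger weights), not the full GradNorm objective with its trainable weights and training-rate targets.

```python
# Minimal loss re-weighting sketch: weights inversely proportional to each
# task's current gradient norm, rescaled to sum to the number of tasks.

def balance_weights(grad_norms):
    inv = [1.0 / g for g in grad_norms]
    total = sum(inv)
    k = len(grad_norms)
    return [k * x / total for x in inv]

# Task 2's gradient is 4x smaller, so it receives 4x the weight of task 1.
w = balance_weights([4.0, 1.0])
```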


【2】PRISM: Periodic Representation with multIscale and Similarity graph Modelling for enhanced crystal structure property prediction
标题:PRISM:用于增强晶体结构性质预测的多尺度与相似性图建模周期表示
链接:https://arxiv.org/abs/2511.20362

作者:Àlex Solé,Albert Mosella-Montoro,Joan Cardona,Daniel Aravena,Silvia Gómez-Coca,Eliseo Ruiz,Javier Ruiz-Hidalgo
摘要:晶体结构的特点是原子模式在三维空间的晶胞内周期性重复,这对基于图的表示学习提出了独特的挑战。目前的方法往往忽略了晶体结构固有的周期性边界条件和多尺度相互作用。在本文中,我们提出了PRISM,一种图神经网络框架,它通过一组专家模块显式地集成多尺度表示和周期性特征编码,每个模块专门负责编码周期系统的不同结构和化学方面。在基于晶体结构的基准上进行的大量实验表明,PRISM提高了最先进的预测准确性,显著增强了晶体性质预测。
摘要:Crystal structures are characterised by repeating atomic patterns within unit cells across three-dimensional space, posing unique challenges for graph-based representation learning. Current methods often overlook essential periodic boundary conditions and multiscale interactions inherent to crystalline structures. In this paper, we introduce PRISM, a graph neural network framework that explicitly integrates multiscale representations and periodic feature encoding by employing a set of expert modules, each specialised in encoding distinct structural and chemical aspects of periodic systems. Extensive experiments across crystal structure-based benchmarks demonstrate that PRISM improves state-of-the-art predictive accuracy, significantly enhancing crystal property prediction.


【3】Decoupling and Damping: Structurally-Regularized Gradient Matching for Multimodal Graph Condensation
标题:解耦与阻尼:用于多模态图凝聚的结构正则化梯度匹配
链接:https://arxiv.org/abs/2511.20222

作者:Lian Shen,Zhendan Chen,Yinhui jiang,Meijia Song,Ziming Su,Juan Liu,Xiangrong Liu
备注:11pages,5 figures,6 tables
摘要:在电子商务和推荐系统等关键Web应用中,集成了丰富的视觉和文本属性的多模态图越来越重要,但它们的大规模为训练图神经网络(GNN)带来了巨大的计算负担。虽然图凝聚(GC)通过合成较小的数据集提供了一个有前途的解决方案,但现有的方法在多模态环境中会出现问题。我们确定了导致这种失败的双重挑战:(1)由模态之间的语义不一致引起的冲突梯度,以及(2)GNN的消息传递架构病理性地放大了图结构中的这种梯度噪声。为了解决这个问题,我们提出了结构正则化梯度匹配(SR-GM),这是一种专为多峰图量身定制的新型缩合框架。SR-GM引入了两个协同组件:第一,梯度解耦机制,通过正交投影解决模态间冲突的根源;第二,直接作用于梯度场的结构阻尼正则化器。通过利用图的Dirichlet能量,该正则化器将拓扑从噪声放大器转换为优化期间的稳定力。大量的实验表明,SR-GM显着提高精度和加速收敛相比,基线方法。消融研究证实,同时解决梯度冲突和结构放大对于实现卓越性能至关重要。此外,压缩的多模态图具有很强的跨架构泛化能力,有望加速神经架构搜索等应用。本研究为资源受限环境下的多模态图学习提供了一种可扩展的方法。
摘要:In critical web applications such as e-commerce and recommendation systems, multimodal graphs integrating rich visual and textual attributes are increasingly central, yet their large scale introduces substantial computational burdens for training Graph Neural Networks (GNNs). While Graph Condensation (GC) offers a promising solution by synthesizing smaller datasets, existing methods falter in the multimodal setting. We identify a dual challenge causing this failure: (1) conflicting gradients arising from semantic misalignments between modalities, and (2) the GNN's message-passing architecture pathologically amplifying this gradient noise across the graph structure. To address this, we propose Structurally-Regularized Gradient Matching (SR-GM), a novel condensation framework tailored for multimodal graphs. SR-GM introduces two synergistic components: first, a gradient decoupling mechanism that resolves inter-modality conflicts at their source via orthogonal projection; and second, a structural damping regularizer that acts directly on the gradient field. By leveraging the graph's Dirichlet energy, this regularizer transforms the topology from a noise amplifier into a stabilizing force during optimization. Extensive experiments demonstrate that SR-GM significantly improves accuracy and accelerates convergence compared to baseline methods. Ablation studies confirm that addressing both gradient conflict and structural amplification in tandem is essential for achieving superior performance. Moreover, the condensed multimodal graphs exhibit strong cross-architecture generalization and promise to accelerate applications like Neural Architecture Search. This research provides a scalable methodology for multimodal graph-based learning in resource-constrained environments.
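The gradient decoupling component resembles PCGrad-style orthogonal projection; below is a minimal sketch under that assumption (the paper's exact mechanism may differ): when two modality gradients conflict (negative dot product), one is projected onto the plane orthogonal to the other before they are combined.

```python
# Orthogonal-projection conflict resolution between two gradient vectors.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_out_conflict(g1, g2):
    """Remove from g1 its component along g2 when they conflict."""
    d = dot(g1, g2)
    if d >= 0:
        return list(g1)                       # no conflict, leave untouched
    coef = d / dot(g2, g2)
    return [x - coef * y for x, y in zip(g1, g2)]

g_text  = [1.0, -1.0]
g_image = [1.0,  1.0]
g_conf  = [-1.0, 0.0]                         # conflicts with g_image (dot = -1)
g_fixed = project_out_conflict(g_conf, g_image)
```

After projection the corrected gradient is orthogonal to `g_image`, so summing the two no longer cancels progress on the image modality.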


【4】Cross-Contrastive Clustering for Multimodal Attributed Graphs with Dual Graph Filtering
标题:基于双图滤波的多模态属性图交叉对比聚类
链接:https://arxiv.org/abs/2511.20030

作者:Haoran Zheng,Renchi Yang,Hongtao Wang,Jianliang Xu
备注:Accepted by SIGKDD 2026. The code is available at https://github.com/HaoranZ99/DGF
摘要:多模态属性图(Multimodal Attributed Graphs,MMAG)是一种表达性数据模型,用于表示实体之间的复杂互连,这些实体将来自多种数据模态(文本,图像等)的属性相关联。对这些数据进行聚类在真实场景中有许多实际应用,包括社交社区检测,医疗数据分析等。然而,正如我们的实证研究所揭示的那样,现有的多视图聚类解决方案在很大程度上依赖于跨各种视图的属性之间的高度相关性,而忽略了独特的特征(例如,低模态相关性和强烈的特征噪声),导致MMAG中大型预训练语言和视觉模型输出的多模态属性的次优聚类性能。   受上述经验观察和我们对图信号处理的理论分析的启发,我们提出了双图滤波(DGF)方案,该方案创新地将特征去噪组件纳入节点表示学习中,从而有效地克服了现有多视图图聚类方法中采用的传统图滤波器的局限性。最重要的是,DGF包括一个三交叉对比训练策略,该策略采用跨模态、邻域和社区的实例级对比学习来学习鲁棒和有区别的节点表示。我们在八个基准MMAG数据集上的综合实验表明,DGF能够在针对地面真实标签测量的聚类质量方面持续且显著地优于各种最先进的基线。
摘要:Multimodal Attributed Graphs (MMAGs) are an expressive data model for representing the complex interconnections among entities that associate attributes from multiple data modalities (text, images, etc.). Clustering over such data finds numerous practical applications in real scenarios, including social community detection, medical data analytics, etc. However, as revealed by our empirical studies, existing multi-view clustering solutions largely rely on the high correlation between attributes across various views and overlook the unique characteristics (e.g., low modality-wise correlation and intense feature-wise noise) of multimodal attributes output by large pre-trained language and vision models in MMAGs, leading to suboptimal clustering performance.   Inspired by foregoing empirical observations and our theoretical analyses with graph signal processing, we propose the Dual Graph Filtering (DGF) scheme, which innovatively incorporates a feature-wise denoising component into node representation learning, thereby effectively overcoming the limitations of traditional graph filters adopted in the extant multi-view graph clustering approaches. On top of that, DGF includes a tri-cross contrastive training strategy that employs instance-level contrastive learning across modalities, neighborhoods, and communities for learning robust and discriminative node representations. Our comprehensive experiments on eight benchmark MMAG datasets exhibit that DGF is able to outperform a wide range of state-of-the-art baselines consistently and significantly in terms of clustering quality measured against ground-truth labels.
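As background for the graph-filtering discussion, a single low-pass smoothing step on a toy path graph looks like the following. This is the generic node-wise filter the abstract contrasts against; the paper's dual, feature-wise variant is not reproduced here.

```python
# One smoothing step x_i' = 0.5 * x_i + 0.5 * mean(neighbors of i):
# a low-pass graph filter that pulls each node toward its neighborhood,
# suppressing high-frequency (noisy) components of the signal.

def smooth_step(features, adj):
    """adj: list of neighbor index lists; features: one scalar per node."""
    out = []
    for i, neigh in enumerate(adj):
        if not neigh:
            out.append(features[i])
            continue
        mean_neigh = sum(features[j] for j in neigh) / len(neigh)
        out.append(0.5 * features[i] + 0.5 * mean_neigh)
    return out

# 4-node path graph 0-1-2-3 with a noisy spike at node 2
adj = [[1], [0, 2], [1, 3], [2]]
x = [1.0, 1.0, 5.0, 1.0]
x1 = smooth_step(x, adj)
```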


【5】Rethinking Semi-Supervised Node Classification with Self-Supervised Graph Clustering
标题:基于自监督图聚类的半监督节点分类方法的再思考
链接:https://arxiv.org/abs/2511.19976

作者:Songbo Wang,Renchi Yang,Yurui Lai,Xiaoyang Lin,Tsz Nam Chan
备注:14 pages
摘要:图神经网络(GNNs)的出现为半监督节点分类任务提供了强大的工具。随后的研究通过改进GNN模型中的消息传递方案或利用各种数据增强技术来减轻有限的监督,从而实现了进一步的改进。在真实图中,节点往往倾向于形成紧密结合的社区/集群,这些社区/集群包含丰富的信号,用于补偿半监督节点分类中的标签稀缺性,但在现有方法中没有探索。   受此启发,本文提出了NCGC,它集成了自监督图聚类和半监督分类到一个统一的框架。首先,我们在理论上统一了GNN和谱图聚类的优化目标,并在此基础上开发了软正交GNN(SOGN),它利用改进的消息传递范式来生成分类和聚类的节点表示。最重要的是,NCGC包括一个自监督图聚类模块,该模块支持SOGN的训练,以自监督的方式学习未标记节点的表示。特别地,该组件包括两个非平凡的聚类目标和将预测的聚类分配转换为平衡的软伪标签的Sinkhorn-Knopp归一化。通过将上述聚类模块与使用包含标记数据上的监督分类损失和未标记数据上的自监督聚类损失的多任务目标的分类模型相结合,NCGC促进了它们之间的协同作用,并实现了增强的模型容量。我们广泛的实验表明,当使用各种经典GNN主干时,所提出的NCGC框架在七个真实图上的半监督节点分类方面始终并且大大优于流行的GNN模型和最近的基线。
摘要 :The emergence of graph neural networks (GNNs) has offered a powerful tool for semi-supervised node classification tasks. Subsequent studies have achieved further improvements through refining the message passing schemes in GNN models or exploiting various data augmentation techniques to mitigate limited supervision. In real graphs, nodes often tend to form tightly-knit communities/clusters, which embody abundant signals for compensating label scarcity in semi-supervised node classification but are not explored in prior methods.   Inspired by this, this paper presents NCGC that integrates self-supervised graph clustering and semi-supervised classification into a unified framework. Firstly, we theoretically unify the optimization objectives of GNNs and spectral graph clustering, and based on that, develop soft orthogonal GNNs (SOGNs) that leverage a refined message passing paradigm to generate node representations for both classification and clustering. On top of that, NCGC includes a self-supervised graph clustering module that enables the training of SOGNs for learning representations of unlabeled nodes in a self-supervised manner. Particularly, this component comprises two non-trivial clustering objectives and a Sinkhorn-Knopp normalization that transforms predicted cluster assignments into balanced soft pseudo-labels. Through combining the foregoing clustering module with the classification model using a multi-task objective containing the supervised classification loss on labeled data and self-supervised clustering loss on unlabeled data, NCGC promotes synergy between them and achieves enhanced model capacity. Our extensive experiments showcase that the proposed NCGC framework consistently and considerably outperforms popular GNN models and recent baselines for semi-supervised node classification on seven real graphs, when working with various classic GNN backbones.
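The Sinkhorn-Knopp normalization that turns predicted cluster logits into balanced soft pseudo-labels can be sketched as alternating row/column normalization of `exp(logits)`. This is an assumed minimal form, not NCGC's exact procedure, and the logits are illustrative.

```python
import math

# Rows = nodes, columns = clusters. Alternate normalizations so each row
# sums to 1 (one soft label per node) and each column sums to n/k
# (balanced clusters).

def sinkhorn(logits, n_iters=50):
    q = [[math.exp(v) for v in row] for row in logits]
    n, k = len(q), len(q[0])
    for _ in range(n_iters):
        col = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] / col[j] * (n / k) for j in range(k)] for i in range(n)]
        for i in range(n):
            s = sum(q[i])
            q[i] = [v / s for v in q[i]]
    return q

labels = sinkhorn([[2.0, 0.1], [1.9, 0.2], [0.1, 2.2], [0.0, 2.0]])
```

Each row is a valid soft label, while the column sums stay close to 2 (= 4 nodes / 2 clusters), preventing all nodes from collapsing into one cluster.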


【6】GED-Consistent Disentanglement of Aligned and Unaligned Substructures for Graph Similarity Learning
标题:用于图相似性学习的对齐和未对齐子结构的GED一致解纠缠
链接:https://arxiv.org/abs/2511.19837

作者:Zhentao Zhan,Xiaoliang Xu,Jingjing Wang,Junmei Wang
摘要:图相似性计算(GSC)是一个基本的图相关的任务,其中图编辑距离(GED)作为一个普遍的度量。GED是由一对图之间的最佳对齐决定的,这对图将每个图划分为对齐(零成本)和未对齐(产生成本)的子结构。由于精确GED计算的NP难性质,基于图神经网络(GNN)的GED近似已经出现。现有的基于GNN的GED方法通常学习每个图的节点嵌入,然后聚合成对的节点相似性来估计最终的相似性。尽管它们的有效性,我们确定了这种流行的以节点为中心的匹配范例和GED的核心原则之间的不匹配。这种差异导致两个关键限制:(1)未能捕获全局结构对应以实现最佳对齐,以及(2)由虚假节点级信号驱动的编辑成本的错误归因。为了解决这些限制,我们提出了GCGSim,一个GED一致的图相似性学习框架,集中在图级匹配和子结构级编辑成本。具体而言,我们做出了三项核心技术贡献。在四个基准数据集上的大量实验表明,GCGSim达到了最先进的性能。我们的全面分析进一步验证了该框架有效地学习解开和语义上有意义的子结构表示。
摘要:Graph Similarity Computation (GSC) is a fundamental graph related task where Graph Edit Distance (GED) serves as a prevalent metric. GED is determined by an optimal alignment between a pair of graphs that partitions each into aligned (zero-cost) and unaligned (cost-incurring) substructures. Due to NP-hard nature of exact GED computation, GED approximations based on Graph Neural Network(GNN) have emerged. Existing GNN-based GED approaches typically learn node embeddings for each graph and then aggregate pairwise node similarities to estimate the final similarity. Despite their effectiveness, we identify a mismatch between this prevalent node-centric matching paradigm and the core principles of GED. This discrepancy leads to two critical limitations: (1) a failure to capture the global structural correspondence for optimal alignment, and (2) a misattribution of edit costs driven by spurious node level signals. To address these limitations, we propose GCGSim, a GED-consistent graph similarity learning framework centering on graph-level matching and substructure-level edit costs. Specifically, we make three core technical contributions. Extensive experiments on four benchmark datasets show that GCGSim achieves state-of-the-art performance. Our comprehensive analyses further validate that the framework effectively learns disentangled and semantically meaningful substructure representations.
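For intuition about the GED metric itself, exact GED on tiny same-sized graphs can be brute-forced over node bijections. This sketch assumes unit edge-edit costs and free node substitution, so the aligned substructure is the shared edge set under the best bijection and the unaligned part is the edge symmetric difference; real GED is NP-hard, which is what motivates the GNN approximations above.

```python
from itertools import permutations

def ged(nodes, edges1, edges2):
    """Minimum edge insertions + deletions over all node bijections."""
    e2 = {frozenset(e) for e in edges2}
    best = None
    for perm in permutations(nodes):
        mapping = dict(zip(nodes, perm))
        mapped = {frozenset((mapping[u], mapping[v])) for u, v in edges1}
        cost = len(mapped ^ e2)      # unaligned (cost-incurring) edges
        best = cost if best is None else min(best, cost)
    return best

triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
d = ged([0, 1, 2], triangle, path)   # delete one edge
```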


【7】Agint: Agentic Graph Compilation for Software Engineering Agents
标题:Agint:软件工程代理的统计图编译
链接:https://arxiv.org/abs/2511.19635

作者:Abhi Chivukula,Jay Somasundaram,Vijay Somasundaram
备注:18 pages, 5 figures, NeurIPS 2025: Deep Learning for Code in the Agentic Era
摘要:基于LLM的编码代理越来越普遍,但仍然面临着上下文管理、延迟、可靠性、可复现性和可扩展性方面的挑战。我们提出了Agint,一个代理式图编译器、解释器和运行时,它以增量和分层的方式将自然语言指令转换为类型化的、效果感知的代码DAG。Agint引入了基于语义图转换的显式类型层(文本到数据到规范到代码)以及混合的LLM与基于函数的JIT运行时。这实现了动态图细化、可复现且可优化的执行、推测性求值以及与现有开发者工具的互操作性。Agint的类型化图绑定提高了可靠性,并允许在构造上并发地组合并发代码库,通过支持更小更快的模型、更低的延迟、高效的上下文利用率和更高的吞吐量来加速开发。分层编译允许可伸缩的图编辑,而图结构支持可复现性和高效的并行生成。Agint提供了一个可组合的UNIX风格工具链:dagify(DAG编译器)、dagent(混合JIT运行时)、schemagin(模式生成器)和datagin(数据转换器),用于实时、低延迟的代码和数据流创建。人类开发者和编码代理通过Agint CLI细化图,而非技术用户使用Agint Flow GUI进行可视化编辑、对话式细化和调试,以将原型代理工作流提升为生产代码。这种持续的共同创建模式使团队能够快速原型化、无缝改进和可靠部署,桥接自然语言、编译器方法和开发者工具,以实现新一代可组合的、以团队为中心的大规模编码代理。
摘要:LLM-based coding agents are increasingly common but still face challenges in context management, latency, reliability, reproducibility, and scalability. We present Agint, an agentic graph compiler, interpreter, and runtime that incrementally and hierarchically converts natural-language instructions into typed, effect-aware code DAGs. Agint introduces explicit type floors (text to data to spec to code) grounded in semantic graph transformations and a hybrid LLM and function-based JIT runtime. This enables dynamic graph refinement, reproducible and optimizable execution, speculative evaluation, and interoperability with existing developer tools. Agint's typed graph bindings improve reliability and allow concurrent composition of concurrent codebases by construction, supporting accelerated development with smaller and faster models, lower latency, efficient context utilization, and higher throughput. Hierarchical compilation allows scalable graph edits, while the graph structure supports reproducibility and efficient parallel generation. Agint provides a composable unix-style toolchain: dagify (DAG compiler), dagent (hybrid JIT runtime), schemagin (schema generator), and datagin (data transformer) for realtime, low-latency code and dataflow creation. Human developers and coding agents refine graphs through the Agint CLI, while non-technical users use Agint Flow GUI for visual editing, conversational refinement, and debugging to promote prototype agentic workflows to production code. This continuous co-creation model allows teams to prototype quickly, refine seamlessly, and deploy reliably, bridging natural language, compiler methods, and developer tooling to enable a new generation of composable, team-centric coding agents at scale.


【8】Neural Tractability via Structure: Learning-Augmented Algorithms for Graph Combinatorial Optimization
标题:结构的神经可跟踪性:图组合优化的学习增强算法
链接:https://arxiv.org/abs/2511.19573

作者:Jialiang Li,Weitong Chen,Mingyu Guo
摘要:神经网络模型在解决NP难图组合优化(CO)问题中有着广阔的应用前景。一旦经过训练,它们能为分布内测试实例提供快速推理和相当高质量的解,但与经典的基于搜索的算法相比,它们通常在绝对解质量方面有所不足;后者公认速度较慢,但在搜索完成后提供最优性保证。   我们提出了一个新的框架,将神经模型的推理效率和探索能力与基于搜索的算法的解质量保证相结合。特别是,我们使用参数化算法(PA)作为搜索组件。PA致力于识别一般NP难问题的简单实例,并通过利用(已识别的简单实例的)结构简单性来实现实际有效的搜索。在我们的框架下,我们使用参数化分析来识别CO实例的结构困难部分。神经模型基于其数据驱动的理解生成咨询信号来处理困难部分。然后,基于PA的搜索组件整合咨询信号,以系统且高效地搜索剩余的结构上容易的部分。值得注意的是,我们的框架对神经模型的选择是不可知的,并且产生比单独的神经求解器严格更好的解。   我们在多个CO任务上检验了我们的框架。实验结果表明,它实现了卓越的解质量,可与商业求解器相媲美。此外,通过仅将神经模型用于探索性咨询信号,我们的框架表现出更好的分布外泛化能力,解决了现有神经CO求解器的一个关键限制。
摘要:Neural models have shown promise in solving NP-hard graph combinatorial optimization (CO) problems. Once trained, they offer fast inference and reasonably high-quality solutions for in-distribution testing instances, but they generally fall short in terms of absolute solution quality compared to classical search-based algorithms that are admittedly slower but offer optimality guarantee once search finishes.   We propose a novel framework that combines the inference efficiency and exploratory power of neural models with the solution quality guarantee of search-based algorithms. In particular, we use parameterized algorithms (PAs) as the search component. PAs are dedicated to identifying easy instances of generally NP-hard problems, and allow for practically efficient search by exploiting structural simplicity (of the identified easy instances). Under our framework, we use parameterized analysis to identify the structurally hard parts of a CO instance. The neural model handles the hard parts by generating advisory signals based on its data-driven understanding. The PA-based search component then integrates the advisory signals to systematically and efficiently searches through the remaining structurally easy parts. Notably, our framework is agnostic to the choice of neural model and produces strictly better solutions than neural solvers alone.   We examine our framework on multiple CO tasks. Empirical results show that it achieves superior solution quality, competitive with that of commercial solvers. Furthermore, by using the neural model only for exploratory advisory signals, our framework exhibits improved out-of-distribution generalization, addressing a key limitation of existing neural CO solvers.


Transformer(10篇)

【1】Image2Gcode: Image-to-G-code Generation for Additive Manufacturing Using Diffusion-Transformer Model
标题:Image2Gcode:使用扩散Transformer模型为增材制造生成图像到G代码
链接:https://arxiv.org/abs/2511.20636

作者:Ziyue Wang,Yayati Jadhav,Peter Pak,Amir Barati Farimani
摘要:机械设计和制造工作流程通常从概念设计开始,然后创建计算机辅助设计(CAD)模型并通过材料挤出(MEX)打印进行制造。这个过程需要通过切片和路径规划将CAD几何图形转换为机器可读的G代码。虽然每一步都很好地建立起来,但对CAD建模的依赖仍然是一个主要瓶颈:构建特定对象的3D几何形状很慢,不适合快速原型。即使是微小的设计变化通常也需要在CAD软件中手动更新,这使得迭代耗时且难以扩展。为了解决这一限制,我们引入了Image2Gcode,这是一个端到端的数据驱动框架,它绕过了CAD阶段,直接从图像和零件图纸生成打印机就绪的G代码。而不是依赖于一个明确的3D模型,手绘或捕获的2D图像作为唯一的输入。该框架首先从图像中提取切片结构线索,然后在G码序列上采用去噪扩散概率模型(DDPM)。通过迭代去噪,该模型将高斯噪声转换为可执行的打印移动轨迹与相应的挤出参数,建立从视觉输入到本地刀具路径的直接映射。通过直接从2D图像生成结构化的G代码,Image2Gcode消除了对CAD或STL中间体的需求,降低了增材制造的准入门槛,加快了设计到制造的周期。这种方法支持从简单草图或视觉参考按需原型设计,并与上游2D到3D重建模块集成,以实现从概念到物理工件的自动化管道。其结果是一个灵活的,计算效率高的框架,提高了设计迭代,维修工作流程和分布式制造的可访问性。
摘要:Mechanical design and manufacturing workflows conventionally begin with conceptual design, followed by the creation of a computer-aided design (CAD) model and fabrication through material-extrusion (MEX) printing. This process requires converting CAD geometry into machine-readable G-code through slicing and path planning. While each step is well established, dependence on CAD modeling remains a major bottleneck: constructing object-specific 3D geometry is slow and poorly suited to rapid prototyping. Even minor design variations typically necessitate manual updates in CAD software, making iteration time-consuming and difficult to scale. To address this limitation, we introduce Image2Gcode, an end-to-end data-driven framework that bypasses the CAD stage and generates printer-ready G-code directly from images and part drawings. Instead of relying on an explicit 3D model, a hand-drawn or captured 2D image serves as the sole input. The framework first extracts slice-wise structural cues from the image and then employs a denoising diffusion probabilistic model (DDPM) over G-code sequences. Through iterative denoising, the model transforms Gaussian noise into executable print-move trajectories with corresponding extrusion parameters, establishing a direct mapping from visual input to native toolpaths. By producing structured G-code directly from 2D imagery, Image2Gcode eliminates the need for CAD or STL intermediates, lowering the entry barrier for additive manufacturing and accelerating the design-to-fabrication cycle. This approach supports on-demand prototyping from simple sketches or visual references and integrates with upstream 2D-to-3D reconstruction modules to enable an automated pipeline from concept to physical artifact. The result is a flexible, computationally efficient framework that advances accessibility in design iteration, repair workflows, and distributed manufacturing.
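The DDPM component can be illustrated by its closed-form forward (noising) process, here applied to a toy 1-D parameter sequence. The linear beta schedule and the toy values are standard textbook choices for illustration; the paper's actual parameterization of print moves is not reproduced.

```python
import math

def alpha_bar(t, T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for a linear beta schedule."""
    ab = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (T - 1)
        ab *= 1.0 - beta
    return ab

def q_sample(x0, t, eps):
    """Closed-form forward diffusion: x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps."""
    ab = alpha_bar(t)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]

x0 = [0.5, -0.2, 0.8]                    # toy extrusion parameters
xt = q_sample(x0, t=1000, eps=[0.0, 0.0, 0.0])
```

At small t the signal is nearly intact, while at t = T the sqrt(alpha_bar) factor has driven it close to zero: reversing this trajectory step by step is what the trained denoiser does to turn Gaussian noise into executable toolpath parameters.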


【2】MoRE: Batch-Robust Multi-Omics Representations from Frozen Pre-trained Transformers
标题:MoRE:来自冻结预训练Transformer的批次鲁棒多组学表示
链接:https://arxiv.org/abs/2511.20382

作者:Audrey Pei-Hsuan Chen
摘要:由于极端的维度、模态异质性和特定于队列的批次效应,多组学数据上的表示学习具有挑战性。虽然预训练的Transformer骨干在生物序列建模中表现出广泛的泛化能力,但它们在多组学整合中的应用仍有待探索。我们提出了MoRE(Multi-Omics Representation Embedding),一个重新利用冻结预训练Transformer将异构检测数据对齐到共享潜在空间的框架。与纯粹的生成方法不同,MoRE采用参数高效微调(PEFT)策略,优先考虑跨样本和跨模态对齐,而非简单的序列重建。具体来说,MoRE在冻结的骨干上附加轻量级的、特定于模态的适配器和任务自适应融合层。它将掩码建模目标与监督对比损失和批次不变对齐损失联合优化,产生保留结构的嵌入,并能泛化到未见过的细胞类型和平台。我们将MoRE与已建立的基线(包括scGPT、scVI以及Harmony加scArches)进行基准比较,评估整合保真度、稀有群体检测和模态迁移。结果表明,与完全微调的模型相比,MoRE在显著减少可训练参数的同时,实现了有竞争力的批次鲁棒性和生物学保真性。这项工作将MoRE定位为迈向通用组学基础模型的实际一步。
摘要:Representation learning on multi-omics data is challenging due to extreme dimensionality, modality heterogeneity, and cohort-specific batch effects. While pre-trained transformer backbones have shown broad generalization capabilities in biological sequence modeling, their application to multi-omics integration remains underexplored. We present MoRE (Multi-Omics Representation Embedding), a framework that repurposes frozen pre-trained transformers to align heterogeneous assays into a shared latent space. Unlike purely generative approaches, MoRE employs a parameter-efficient fine-tuning (PEFT) strategy, prioritizing cross-sample and cross-modality alignment over simple sequence reconstruction. Specifically, MoRE attaches lightweight, modality-specific adapters and a task-adaptive fusion layer to the frozen backbone. It optimizes a masked modeling objective jointly with supervised contrastive and batch-invariant alignment losses, yielding structure-preserving embeddings that generalize across unseen cell types and platforms. We benchmark MoRE against established baselines, including scGPT, scVI, and Harmony with scArches, evaluating integration fidelity, rare population detection, and modality transfer. Our results demonstrate that MoRE achieves competitive batch robustness and biological conservation while significantly reducing trainable parameters compared to fully fine-tuned models. This work positions MoRE as a practical step toward general-purpose omics foundation models.
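下面用一个LoRA式低秩适配器说明"冻结骨干+轻量适配器"的参数高效微调划分;低秩形式是此处的示意性假设(论文中为特定于模态的适配器与任务自适应融合层),仅演示可训练参数与冻结参数之比:

```python
import numpy as np

# 玩具示意:冻结骨干 + 轻量适配器的参数高效微调(PEFT)。
# LoRA 式低秩适配器是假设的实现,仅演示参数划分,并非论文的真实结构。
rng = np.random.default_rng(11)
d, r = 64, 4
W_backbone = rng.normal(size=(d, d))              # 冻结的预训练权重(不更新)
A = 0.01 * rng.normal(size=(d, r))                # 可训练低秩因子
B = np.zeros((r, d))                              # 初始化为零:起始时不改变骨干输出

def forward(x):
    return x @ W_backbone + x @ A @ B             # 骨干输出 + 适配器修正

x = rng.normal(size=(2, d))
y = forward(x)
trainable = A.size + B.size                       # 只有适配器参与更新
total = W_backbone.size + trainable
```

这样,可训练参数占比仅约十分之一,对应摘要中"显著减少可训练参数"的设计动机。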


【3】Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits
标题:超越组件:基于奇异向量的Transformer电路可解释性
链接:https://arxiv.org/abs/2511.20273

作者:Areeb Ahmad,Abhinav Joshi,Ashutosh Modi
备注:Accepted at NeurIPS 2025
摘要:基于Transformer的语言模型表现出复杂的分布式行为,但其内部计算仍然知之甚少。现有的机械可解释性方法通常将注意力头和多层感知器层(MLP)(Transformer架构的构建块)视为不可分割的单元,忽略了在其中学习到的功能子结构的可能性。在这项工作中,我们引入了一个更细粒度的视角,将这些组件分解成正交的奇异方向,揭示了单个头或MLP内部叠加且相互独立的计算。我们在广泛使用的标准任务上验证了我们的观点,如间接宾语识别(IOI)、性别代词(GP)和大于(GT)任务,表明以前确定的典型功能头(如name mover)编码了与不同奇异方向对齐的多个重叠子功能。计算图中先前被识别为电路元件的节点沿特定低秩方向表现出强激活,这表明有意义的计算驻留在紧凑的子空间中。虽然一些方向仍难以得到充分解释,但我们的研究结果强调,Transformer的计算比以前假设的更加分布式、结构化和具有组合性。这种观点为细粒度的机械可解释性和更深入地理解模型内部开辟了新的途径。
摘要:Transformer-based language models exhibit complex and distributed behavior, yet their internal computations remain poorly understood. Existing mechanistic interpretability methods typically treat attention heads and multilayer perceptron layers (MLPs) (the building blocks of a transformer architecture) as indivisible units, overlooking possibilities of functional substructure learned within them. In this work, we introduce a more fine-grained perspective that decomposes these components into orthogonal singular directions, revealing superposed and independent computations within a single head or MLP. We validate our perspective on widely used standard tasks like Indirect Object Identification (IOI), Gender Pronoun (GP), and Greater Than (GT), showing that previously identified canonical functional heads, such as the name mover, encode multiple overlapping subfunctions aligned with distinct singular directions. Nodes in a computational graph that were previously identified as circuit elements show strong activation along specific low-rank directions, suggesting that meaningful computations reside in compact subspaces. While some directions remain challenging to interpret fully, our results highlight that transformer computations are more distributed, structured, and compositional than previously assumed. This perspective opens new avenues for fine-grained mechanistic interpretability and a deeper understanding of model internals.
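"按正交奇异方向分解组件"的最小示意如下:对一个随机矩阵(代替真实注意力头的权重,仅为假设的玩具数据)做SVD,每个奇异方向对应一个秩一子映射,全部相加恰好还原整体映射:

```python
import numpy as np

# 示意:将一个"注意力头权重矩阵"按奇异方向分解为秩一分量,
# 对应摘要中"单个头内叠加的多个子功能"。矩阵为随机示例,非真实模型权重。
rng = np.random.default_rng(1)
d = 16
W_OV = rng.normal(size=(d, d))

U, S, Vt = np.linalg.svd(W_OV)
# 每个奇异方向贡献一个秩一算子 s_i * u_i v_i^T;全部相加还原原矩阵
components = [S[i] * np.outer(U[:, i], Vt[i]) for i in range(d)]
W_rebuilt = sum(components)

x = rng.normal(size=(d,))
# 沿第 0 个奇异方向的"子功能"输出:先读 v_0 方向,再写到 u_0 方向
y_dir0 = S[0] * U[:, 0] * (Vt[0] @ x)
```

在此视角下,对单个方向的激活分析(而非整个头)即可定位更细粒度的电路元件。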


【4】In-Context Compositional Learning via Sparse Coding Transformer
标题:通过稀疏编码Transformer进行上下文内合成学习
链接:https://arxiv.org/abs/2511.20194

作者:Wei Chen,Jingxi Yu,Zichen Miao,Qiang Qiu
备注:NeurIPS 2025
摘要:Transformer架构在语言、视觉和多模态任务上取得了显著的成功,对其解决上下文组合学习任务的需求也日益增长。在这些任务中,模型通过从上下文示例中推断组合规则来解决目标问题,上下文示例由底层规则所组织的基本组件构成。然而,其中一些任务对Transformer来说仍然具有挑战性,因为它并非天生为处理组合任务而设计,且提供的结构归纳偏差有限。在这项工作中,受稀疏编码原则的启发,我们提出了对注意力机制的一种重新表述,以提高其处理组合任务的能力。在稀疏编码中,数据被表示为字典原子的稀疏组合,其系数捕获数据的组合规则。具体来说,我们将注意力块重新解释为通过投影到两组学习到的字典原子(编码字典和解码字典)上,把输入映射到输出。编码字典将输入分解为一组系数,这些系数表示输入的组合结构。为了增强结构化表示,我们对这些系数施加稀疏性。然后,用稀疏系数线性组合解码字典原子以生成输出。此外,为了辅助组合泛化任务,我们建议将目标问题的系数估计为从上下文示例中获得的系数的线性组合。我们在S-RAVEN和RAVEN数据集上证明了我们方法的有效性。对于某些组合泛化任务,即使标准Transformer失败,我们的方法也能保持性能,这得益于其学习和应用组合规则的能力。
摘要:Transformer architectures have achieved remarkable success across language, vision, and multimodal tasks, and there is growing demand for them to address in-context compositional learning tasks. In these tasks, models solve the target problems by inferring compositional rules from context examples, which are composed of basic components structured by underlying rules. However, some of these tasks remain challenging for Transformers, which are not inherently designed to handle compositional tasks and offer limited structural inductive bias. In this work, inspired by the principle of sparse coding, we propose a reformulation of the attention to enhance its capability for compositional tasks. In sparse coding, data are represented as sparse combinations of dictionary atoms with coefficients that capture their compositional rules. Specifically, we reinterpret the attention block as a mapping of inputs into outputs through projections onto two sets of learned dictionary atoms: an encoding dictionary and a decoding dictionary. The encoding dictionary decomposes the input into a set of coefficients, which represent the compositional structure of the input. To enhance structured representations, we impose sparsity on these coefficients. The sparse coefficients are then used to linearly combine the decoding dictionary atoms to generate the output. Furthermore, to assist compositional generalization tasks, we propose estimating the coefficients of the target problem as a linear combination of the coefficients obtained from the context examples. We demonstrate the effectiveness of our approach on the S-RAVEN and RAVEN datasets. For certain compositional generalization tasks, our method maintains performance even when standard Transformers fail, owing to its ability to learn and apply compositional rules.
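摘要中"编码字典分解、稀疏化、解码字典组合"的流程可以粗略示意如下(随机初始化的字典与软阈值稀疏化均为示意性假设,非论文的训练所得):

```python
import numpy as np

# 示意:稀疏编码式"注意力"——输入投影到编码字典得到系数,
# 软阈值稀疏化后,用系数线性组合解码字典原子得到输出。
rng = np.random.default_rng(2)
d, k = 8, 32                     # 特征维度、字典原子数
D_enc = rng.normal(size=(d, k))  # 编码字典
D_dec = rng.normal(size=(d, k))  # 解码字典

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_attention(X, lam=1.5):
    coef = soft_threshold(X @ D_enc, lam)   # (n, k) 稀疏系数:输入的组合结构
    return coef @ D_dec.T, coef             # 输出为解码原子的线性组合

X = rng.normal(size=(5, d))
Y, C = sparse_attention(X)
sparsity = np.mean(C == 0.0)                # 稀疏化后为零的系数比例
```

论文提出的组合泛化策略相当于把目标问题的系数 C 再表示为上下文示例系数的线性组合。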


【5】Softmax Transformers are Turing-Complete
标题:Softmax Transformer是图灵完备的
链接:https://arxiv.org/abs/2511.20038

作者:Hongjian Jiang,Michael Hahn,Georg Zetzsche,Anthony Widjaja Lin
摘要:硬注意力思维链(CoT)Transformer已知是图灵完备的。然而,softmax注意力思维链(CoT)Transformer是否图灵完备仍是一个悬而未决的问题。在本文中,我们证明了一个更强的结果:可长度泛化的softmax CoT Transformer是图灵完备的。更确切地说,我们的图灵完备性证明经由计数RASP(C-RASP)的CoT扩展给出,它对应于允许长度泛化的softmax CoT Transformer。我们证明了带因果掩码的CoT C-RASP在一元字母表上(更一般地,对字母有界语言)是图灵完备的。虽然我们表明它对任意语言并非图灵完备,但我们证明了其带相对位置编码的扩展对任意语言是图灵完备的。我们通过在需要复杂(非线性)算术推理的语言上训练Transformer,实证验证了我们的理论。
摘要:Hard attention Chain-of-Thought (CoT) transformers are known to be Turing-complete. However, it is an open problem whether softmax attention Chain-of-Thought (CoT) transformers are Turing-complete. In this paper, we prove a stronger result that length-generalizable softmax CoT transformers are Turing-complete. More precisely, our Turing-completeness proof goes via the CoT extension of the Counting RASP (C-RASP), which correspond to softmax CoT transformers that admit length generalization. We prove Turing-completeness for CoT C-RASP with causal masking over a unary alphabet (more generally, for letter-bounded languages). While we show this is not Turing-complete for arbitrary languages, we prove that its extension with relative positional encoding is Turing-complete for arbitrary languages. We empirically validate our theory by training transformers for languages requiring complex (non-linear) arithmetic reasoning.


【6】Frailty-Aware Transformer for Recurrent Survival Modeling of Driver Retention in Ride-Hailing Platforms
标题:用于网约车平台司机留存复发生存建模的脆弱性感知Transformer
链接:https://arxiv.org/abs/2511.19893

作者:Shuoyan Xu,Yu Zhang,Eric J. Miller
备注:13 pages, 6 figures, under review, Accepted by KDD Workshop 2025
摘要:网约车平台的特点是高频、行为驱动的环境。虽然生存分析已被应用于其他领域的复发事件,但其在建模乘车驾驶员行为中的使用在很大程度上尚未探索。这项研究使用大规模平台数据将空闲行为制定为一个经常性的生存过程,并提出了一个基于Transformer的框架,该框架通过因果掩蔽捕获长期的时间依赖关系,并结合特定于驱动程序的嵌入来建模潜在的异质性。多伦多网约车数据的结果表明,提出的Frailty-Aware Cox Transformer(FACT)实现了最高的时间相关C指数和最低的Brier得分,优于经典和深度学习生存模型。这种方法可以实现更准确的风险估计,支持平台保留策略,并提供与政策相关的见解。
摘要:Ride-hailing platforms are characterized by high-frequency, behavior-driven environments. Although survival analysis has been applied to recurrent events in other domains, its use in modeling ride-hailing driver behavior remains largely unexplored. This study formulates idle behavior as a recurrent survival process using large-scale platform data and proposes a Transformer-based framework that captures long-term temporal dependencies with causal masking and incorporates driver-specific embeddings to model latent heterogeneity. Results on Toronto ride-hailing data demonstrate that the proposed Frailty-Aware Cox Transformer (FACT) achieves the highest time-dependent C-indices and lowest Brier Scores, outperforming classical and deep learning survival models. This approach enables more accurate risk estimation, supports platform retention strategies, and provides policy-relevant insights.
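带个体脆弱性(frailty)项的Cox负对数部分似然可以用如下玩具代码示意;数据均为随机生成,仅演示"司机专属项进入风险得分"这一结构,并非FACT的Transformer实现:

```python
import numpy as np

# 示意:风险得分 = x^T beta + 司机脆弱性,对应摘要中
# "用司机专属嵌入建模潜在异质性";无删失的玩具数据。
rng = np.random.default_rng(3)
n, p = 20, 4
X = rng.normal(size=(n, p))           # 每次空闲事件的协变量
driver = rng.integers(0, 5, size=n)   # 事件所属司机
frailty = rng.normal(scale=0.3, size=5)
beta = rng.normal(size=p)
times = rng.exponential(size=n)
event = np.ones(n)                    # 全部为观测到的事件(无删失)

def neg_log_partial_likelihood(beta, frailty):
    eta = X @ beta + frailty[driver]  # 带脆弱性项的对数风险
    order = np.argsort(times)
    nll = 0.0
    for idx, i in enumerate(order):
        if event[i]:
            risk_set = order[idx:]    # 事件发生时仍处于风险中的个体
            nll -= eta[i] - np.log(np.sum(np.exp(eta[risk_set])))
    return nll

nll = neg_log_partial_likelihood(beta, frailty)
```

真实方法中,eta 由带因果掩码的Transformer从历史事件序列产生,而非线性打分。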


【7】TREASURE: A Transformer-Based Foundation Model for High-Volume Transaction Understanding
标题:TREASURE:基于Transformer的海量交易理解基础模型
链接:https://arxiv.org/abs/2511.19693

作者:Chin-Chia Michael Yeh,Uday Singh Saini,Xin Dai,Xiran Fan,Shubham Jain,Yujie Fan,Jiarui Sun,Junpeng Wang,Menghai Pan,Yingtong Dou,Yuzhong Chen,Vineeth Rakesh,Liang Wang,Yan Zheng,Mahashweta Das
摘要:支付网络构成了现代商业的支柱,从日常活动中产生大量交易记录。正确地对这些数据进行建模可以实现异常行为检测、以及用于超个性化体验的消费者层面洞察等应用,最终改善人们的生活。在本文中,我们提出了TREASURE(TRansformer Engine As Scalable Universal transaction Representation Encoder),一个专门为交易数据设计的多用途的基于Transformer的基础模型。该模型同时捕获消费者行为和支付网络信号(如响应代码和系统标志),为准确的推荐系统和异常行为检测等应用提供所需的全面信息。经行业级数据集验证,TREASURE具有三个关键能力:1)带有静态与动态属性专用子模块的输入模块,可实现更高效的训练和推理;2)用于预测高基数类别属性的高效且有效的训练范式;3)被证明既可作为独立模型,将异常行为检测性能较生产系统提高111%,也可作为嵌入提供者,将推荐模型性能提升104%。我们从广泛的消融研究、与生产模型的基准比较和案例研究中提出了关键见解,突出了开发TREASURE过程中获得的宝贵知识。
摘要:Payment networks form the backbone of modern commerce, generating high volumes of transaction records from daily activities. Properly modeling this data can enable applications such as abnormal behavior detection and consumer-level insights for hyper-personalized experiences, ultimately improving people's lives. In this paper, we present TREASURE, TRansformer Engine As Scalable Universal transaction Representation Encoder, a multipurpose transformer-based foundation model specifically designed for transaction data. The model simultaneously captures both consumer behavior and payment network signals (such as response codes and system flags), providing comprehensive information necessary for applications like accurate recommendation systems and abnormal behavior detection. Verified with industry-grade datasets, TREASURE features three key capabilities: 1) an input module with dedicated sub-modules for static and dynamic attributes, enabling more efficient training and inference; 2) an efficient and effective training paradigm for predicting high-cardinality categorical attributes; and 3) demonstrated effectiveness as both a standalone model that increases abnormal behavior detection performance by 111% over production systems and an embedding provider that enhances recommendation models by 104%. We present key insights from extensive ablation studies, benchmarks against production models, and case studies, highlighting valuable knowledge gained from developing TREASURE.
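摘要中"静态与动态属性专用子模块"的输入组织方式可以粗略示意如下;嵌入表、属性种类与维度均为假设的玩具设置,仅演示静态嵌入广播与逐事件动态嵌入的拼接:

```python
import numpy as np

# 示意:交易序列输入——静态属性(如账户类型)只嵌入一次并广播到每个时间步,
# 动态属性(如商户类别、金额)逐笔交易嵌入后拼接成 token。
rng = np.random.default_rng(13)
T_len, d_s, d_dyn = 6, 4, 4
emb_static = rng.normal(size=(3, d_s))     # 3 种静态类别的嵌入表
emb_mcc = rng.normal(size=(10, d_dyn))     # 高基数类别属性(此处假设 10 类)的嵌入表

static_id = 1                              # 该消费者的静态属性
mcc_seq = rng.integers(0, 10, size=T_len)  # 每笔交易的商户类别
amount_seq = rng.normal(size=(T_len, 1))   # 数值属性

static_part = np.tile(emb_static[static_id], (T_len, 1))  # 静态嵌入广播
tokens = np.concatenate([static_part, emb_mcc[mcc_seq], amount_seq], axis=1)
# tokens 即可送入 Transformer 编码器(此处省略)
```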


【8】TouchFormer: A Robust Transformer-based Framework for Multimodal Material Perception
标题:TouchFormer:用于多模态材料感知的鲁棒Transformer框架
链接:https://arxiv.org/abs/2511.19509

作者:Kailin Lyu,Long Xiao,Jianing Zeng,Junhao Dong,Xuexin Liu,Zhuojun Zou,Haoyue Yang,Lin Shu,Jie Hao
备注:9 pages, 7 figures, Accepted by AAAI 2026
摘要:传统的基于视觉的材料感知方法在视觉受限条件下通常会出现明显的性能下降,这促使研究转向非视觉的多模态材料感知。尽管如此,现有方法经常对多模态输入进行朴素融合,忽略了一些关键挑战,如模态特有的噪声、现实场景中常见的模态缺失,以及各模态重要性随任务动态变化。这些限制导致其在多个基准任务中表现欠佳。在本文中,我们提出了一个鲁棒的多模态融合框架TouchFormer。具体来说,我们采用模态自适应门控(MAG)机制以及模态内和模态间注意力机制来自适应地整合跨模态特征,增强模型的鲁棒性。此外,我们引入了跨实例嵌入正则化(CER)策略,显著提高了细粒度子类别材料识别任务的分类准确率。实验结果表明,与现有的非视觉方法相比,所提出的TouchFormer框架分别在SSMC和USMC任务上实现了2.48%和6.83%的分类准确率提升。此外,真实世界的机器人实验验证了TouchFormer在使机器人更好地感知和解释其环境方面的有效性,为其在紧急响应和工业自动化等安全关键应用中的部署铺平了道路。代码和数据集将开源,视频见补充材料。
摘要:Traditional vision-based material perception methods often experience substantial performance degradation under visually impaired conditions, thereby motivating the shift toward non-visual multimodal material perception. Despite this, existing approaches frequently perform naive fusion of multimodal inputs, overlooking key challenges such as modality-specific noise, missing modalities common in real-world scenarios, and the dynamically varying importance of each modality depending on the task. These limitations lead to suboptimal performance across several benchmark tasks. In this paper, we propose a robust multimodal fusion framework, TouchFormer. Specifically, we employ a Modality-Adaptive Gating (MAG) mechanism and intra- and inter-modality attention mechanisms to adaptively integrate cross-modal features, enhancing model robustness. Additionally, we introduce a Cross-Instance Embedding Regularization(CER) strategy, which significantly improves classification accuracy in fine-grained subcategory material recognition tasks. Experimental results demonstrate that, compared to existing non-visual methods, the proposed TouchFormer framework achieves classification accuracy improvements of 2.48% and 6.83% on SSMC and USMC tasks, respectively. Furthermore, real-world robotic experiments validate TouchFormer's effectiveness in enabling robots to better perceive and interpret their environment, paving the way for its deployment in safety-critical applications such as emergency response and industrial automation. The code and datasets will be open-source, and the videos are available in the supplementary materials.
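模态自适应门控(MAG)式融合的一个最小示意如下;线性打分门控与"缺失模态直接跳过"是此处的假设性简化,并非论文的具体实现:

```python
import numpy as np

# 示意:为每个在场模态打一个标量门控分,softmax 归一化后加权求和;
# 缺失模态(None)不参与,体现对"模态缺失"的容忍。
rng = np.random.default_rng(9)
d = 8
feats = {"audio": rng.normal(size=d),
         "tactile": rng.normal(size=d),
         "imu": None}                     # imu 模态缺失

W_g = rng.normal(size=d)                  # 门控打分参数(假设为线性打分)

def mag_fuse(feats):
    scores, vecs = [], []
    for name, f in feats.items():
        if f is None:
            continue                      # 缺失模态直接跳过
        scores.append(W_g @ f)
        vecs.append(f)
    s = np.array(scores)
    w = np.exp(s - s.max())
    w /= w.sum()                          # softmax 归一化门控权重
    return sum(wi * v for wi, v in zip(w, vecs)), w

fused, gates = mag_fuse(feats)
```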


【9】PrefixGPT: Prefix Adder Optimization by a Generative Pre-trained Transformer
标题:PrefixGPT:基于生成式预训练Transformer的前缀加法器优化
链接:https://arxiv.org/abs/2511.19472

作者:Ruogu Ding,Xin Ning,Ulf Schlichtmann,Weikang Qian
备注:An extended version that has been accepted by AAAI-2026 conference
摘要:前缀加法器因其运算速度快而广泛应用于计算密集型应用中。然而,由于严格的设计规则和指数级大的设计空间,设计优化的前缀加法器颇具挑战性。我们介绍PrefixGPT,这是一个生成式预训练Transformer(GPT),可以从头开始直接生成优化的前缀加法器。我们的方法将加法器的拓扑结构表示为二维坐标序列,并在生成过程中施加合法性掩码,确保每个设计按构造即是合法的。PrefixGPT采用定制的仅解码器Transformer架构。该模型首先在随机合成的合法前缀加法器语料上进行预训练以学习设计规则,然后进行微调以在设计空间中搜索,优化设计质量。与现有工作相比,PrefixGPT不仅找到了一个新的最优设计,将面积延迟积(ADP)改善了7.7%,还表现出更优的探索质量,将平均ADP降低了多达79.1%。这证明了GPT风格的模型有潜力先掌握复杂的硬件设计原理,再将其用于更高效的设计优化。
摘要:Prefix adders are widely used in compute-intensive applications for their high speed. However, designing optimized prefix adders is challenging due to strict design rules and an exponentially large design space. We introduce PrefixGPT, a generative pre-trained Transformer (GPT) that directly generates optimized prefix adders from scratch. Our approach represents an adder's topology as a two-dimensional coordinate sequence and applies a legality mask during generation, ensuring every design is valid by construction. PrefixGPT features a customized decoder-only Transformer architecture. The model is first pre-trained on a corpus of randomly synthesized valid prefix adders to learn design rules and then fine-tuned to navigate the design space for optimized design quality. Compared with existing works, PrefixGPT not only finds a new optimal design with a 7.7% improved area-delay product (ADP) but exhibits superior exploration quality, lowering the average ADP by up to 79.1%. This demonstrates the potential of GPT-style models to first master complex hardware design principles and then apply them for more efficient design optimization.
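摘要中"生成过程中施加合法性掩码"的机制可以示意如下;此处的规则(相邻token不得重复)是代替真实前缀加法器约束的玩具规则,logits也用随机数代替模型输出:

```python
import numpy as np

# 示意:自回归采样时把违反设计规则的候选 token 的 logit 置为 -inf,
# 从而"按构造"保证输出合法。规则与 logits 均为假设的玩具设置。
rng = np.random.default_rng(4)
vocab = 10

def legal_tokens(prefix):
    # 玩具规则:不允许与上一个 token 相同(代替真实的前缀加法器约束)
    last = prefix[-1] if prefix else None
    return [t for t in range(vocab) if t != last]

def masked_sample(prefix, logits):
    masked = np.full(vocab, -np.inf)
    legal = legal_tokens(prefix)
    masked[legal] = logits[legal]                 # 非法 token 保持 -inf
    probs = np.exp(masked - masked[legal].max())  # exp(-inf) = 0,自动剔除非法项
    probs /= probs.sum()
    return int(rng.choice(vocab, p=probs))

seq = []
for _ in range(8):
    seq.append(masked_sample(seq, rng.normal(size=vocab)))
```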


【10】BlockCert: Certified Blockwise Extraction of Transformer Mechanisms
标题:BlockCert:Transformer机制的认证分块提取
链接:https://arxiv.org/abs/2511.17645

作者:Sandro Andric
备注:16 pages, 1 figure
摘要:机械可解释性渴望将神经网络逆向工程为显式算法,而模型编辑则试图在不重新训练的情况下修改特定行为。这两个领域通常都是通过非正式的证据和特设实验来评估的,很少有明确的保证说明提取或编辑后的模型在相关输入上可以偏离原始模型多远。我们介绍了BlockCert,一个用于Transformer机制的认证分块提取的框架,并概述了一个轻量级扩展如何支持认证的局部编辑。给定一个预训练的Transformer和一个提示分布,BlockCert为残差块提取结构化的代理实现,以及机器可检查的证书,这些证书为近似误差给出上界、记录覆盖度量,并对底层工件进行散列。我们在Lean 4中形式化了一个简单的基于Lipschitz的组合定理,将这些局部保证提升为全局偏差界。在实验上,我们将该框架应用于GPT-2 small、TinyLlama-1.1B-Chat和Llama-3.2-3B。在这些模型上,我们在所评估的提示上获得了高的逐块覆盖率和较小的残差误差;并且在TinyLlama设置中,我们表明完全缝合的模型在压力提示上与基线困惑度的差距约在6e-5以内。我们的研究结果表明,带显式证书的分块提取对真实的Transformer语言模型是可行的,并在机械可解释性与关于模型行为的形式化推理之间提供了一座实用的桥梁。
摘要:Mechanistic interpretability aspires to reverse-engineer neural networks into explicit algorithms, while model editing seeks to modify specific behaviours without retraining. Both areas are typically evaluated with informal evidence and ad-hoc experiments, with few explicit guarantees about how far an extracted or edited model can drift from the original on relevant inputs. We introduce BlockCert, a framework for certified blockwise extraction of transformer mechanisms, and outline how a lightweight extension can support certified local edits. Given a pre-trained transformer and a prompt distribution, BlockCert extracts structured surrogate implementations for residual blocks together with machine-checkable certificates that bound approximation error, record coverage metrics, and hash the underlying artifacts. We formalize a simple Lipschitz-based composition theorem in Lean 4 that lifts these local guarantees to a global deviation bound. Empirically, we apply the framework to GPT-2 small, TinyLlama-1.1B-Chat, and Llama-3.2-3B. Across these models we obtain high per-block coverage and small residual errors on the evaluated prompts, and in the TinyLlama setting we show that a fully stitched model matches the baseline perplexity within approximately 6e-5 on stress prompts. Our results suggest that blockwise extraction with explicit certificates is feasible for real transformer language models and offers a practical bridge between mechanistic interpretability and formal reasoning about model behaviour.
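摘要中"将局部保证提升为全局偏差界"的组合定理通常具有如下标准形式(示意;论文在Lean 4中形式化的具体表述可能不同):设每个代理块 $g_i$ 满足逐点误差界 $\|f_i(x)-g_i(x)\|\le\epsilon_i$,且 $g_i$ 为 $L_i$-Lipschitz,则对 $n$ 层复合有

```latex
\left\| f_n \circ \cdots \circ f_1(x) \;-\; g_n \circ \cdots \circ g_1(x) \right\|
\;\le\; \sum_{i=1}^{n} \Big( \prod_{j=i+1}^{n} L_j \Big)\, \epsilon_i .
```

按 $n$ 归纳即可:第 $n$ 层新增的误差至多为 $\epsilon_n$,而此前累计的偏差经 $g_n$ 传播后至多放大 $L_n$ 倍。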


GAN|对抗|攻击|生成相关(7篇)

【1】From One Attack Domain to Another: Contrastive Transfer Learning with Siamese Networks for APT Detection
标题:从一个攻击域到另一个攻击域:基于孪生网络的对比迁移学习用于APT检测
链接:https://arxiv.org/abs/2511.20500

作者:Sidahmed Benabderrahmane,Talal Rahwan
摘要:高级持续性威胁(APT)由于其隐蔽性、持久性和适应性而构成了重大的网络安全挑战。传统的机器学习检测器面临类别不平衡、高维特征和真实世界痕迹稀缺等问题,且通常缺乏可迁移性:在训练领域表现良好,但在新的攻击场景中性能会下降。我们提出了一个混合迁移框架,该框架集成了迁移学习、可解释AI(XAI)、对比学习和孪生网络,以提高跨域泛化能力。基于注意力的自编码器支持跨领域的知识迁移,而Shapley Additive exPlanations(SHAP)选择稳定、有信息量的特征以降低维度和计算成本。用对比目标训练的孪生编码器将源和目标表示对齐,增加异常可分性并减轻特征漂移。我们在DARPA透明计算(TC)计划的真实痕迹上进行评估,并增加合成攻击场景来测试鲁棒性。在各源到目标的迁移中,该方法相对经典和深度基线提供了更好的检测分数,展示了一个可扩展、可解释和可迁移的APT检测解决方案。
摘要:Advanced Persistent Threats (APT) pose a major cybersecurity challenge due to their stealth, persistence, and adaptability. Traditional machine learning detectors struggle with class imbalance, high dimensional features, and scarce real world traces. They often lack transferability-performing well in the training domain but degrading in novel attack scenarios. We propose a hybrid transfer framework that integrates Transfer Learning, Explainable AI (XAI), contrastive learning, and Siamese networks to improve cross-domain generalization. An attention-based autoencoder supports knowledge transfer across domains, while Shapley Additive exPlanations (SHAP) select stable, informative features to reduce dimensionality and computational cost. A Siamese encoder trained with a contrastive objective aligns source and target representations, increasing anomaly separability and mitigating feature drift. We evaluate on real-world traces from the DARPA Transparent Computing (TC) program and augment with synthetic attack scenarios to test robustness. Across source to target transfers, the approach delivers improved detection scores with classical and deep baselines, demonstrating a scalable, explainable, and transferable solution for APT detection.
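孪生编码器上的margin对比损失可以示意如下;共享线性编码器与随机数据均为玩具设置(非论文网络),仅演示"同类拉近、异类推远"的对比目标:

```python
import numpy as np

# 示意:孪生(共享权重)编码器 + 经典 margin 对比损失,
# 用于把源域与目标域的表示对齐。
rng = np.random.default_rng(10)
d, h = 12, 4
W = rng.normal(size=(h, d))            # 共享的孪生编码器权重

def encode(x):
    return W @ x

def contrastive_loss(x_src, x_tgt, same, margin=1.0):
    dist = np.linalg.norm(encode(x_src) - encode(x_tgt))
    if same:                            # 同类对:距离越小越好
        return dist ** 2
    return max(0.0, margin - dist) ** 2 # 异类对:距离低于 margin 才有惩罚

x_a, x_b = rng.normal(size=d), rng.normal(size=d)
l_pos = contrastive_loss(x_a, x_b, same=True)
l_neg = contrastive_loss(x_a, x_b, same=False)
```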


【2】Ranking-Enhanced Anomaly Detection Using Active Learning-Assisted Attention Adversarial Dual AutoEncoders
标题:使用主动学习辅助的注意力对抗双自编码器进行排序增强异常检测
链接:https://arxiv.org/abs/2511.20480

作者:Sidahmed Benabderrahmane,James Cheney,Talal Rahwan
摘要:高级持续性威胁(APT)由于其隐蔽性和长期性,对网络安全构成了重大挑战。现代监督学习方法需要大量的标记数据,而这些数据在现实网络安全环境中往往是稀缺的。在本文中,我们提出了一种创新的方法,利用AutoEncoders进行无监督的异常检测,通过主动学习来增强APT异常的检测。通过选择性地查询不确定或模糊样本上的标签,我们最大限度地减少了标记成本,同时提高了检测率,使模型能够以最少的数据提高其检测准确性,同时减少了对大量手动标记的需求。我们提供了一个详细的制定建议的注意力对抗双AutoEncoder为基础的异常检测框架,并显示如何主动学习循环迭代增强模型。该框架是评估真实世界的不平衡起源跟踪数据库产生的DARPA透明计算程序,APT类攻击构成的数据少至0.004\%。这些数据集跨越多个操作系统,包括Android、Linux、BSD和Windows,并涵盖两种攻击场景。结果表明,与其他现有方法相比,主动学习过程中的检测率有了显着提高,性能也更好。
摘要:Advanced Persistent Threats (APTs) pose a significant challenge in cybersecurity due to their stealthy and long-term nature. Modern supervised learning methods require extensive labeled data, which is often scarce in real-world cybersecurity environments. In this paper, we propose an innovative approach that leverages AutoEncoders for unsupervised anomaly detection, augmented by active learning to iteratively improve the detection of APT anomalies. By selectively querying an oracle for labels on uncertain or ambiguous samples, we minimize labeling costs while improving detection rates, enabling the model to improve its detection accuracy with minimal data while reducing the need for extensive manual labeling. We provide a detailed formulation of the proposed Attention Adversarial Dual AutoEncoder-based anomaly detection framework and show how the active learning loop iteratively enhances the model. The framework is evaluated on real-world imbalanced provenance trace databases produced by the DARPA Transparent Computing program, where APT-like attacks constitute as little as 0.004\% of the data. The datasets span multiple operating systems, including Android, Linux, BSD, and Windows, and cover two attack scenarios. The results have shown significant improvements in detection rates during active learning and better performance compared to other existing approaches.
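基于重构误差的主动学习查询循环可以粗略示意如下;此处用PCA线性自编码器代替论文的注意力对抗双自编码器,oracle标注步骤省略,仅演示"按不确定性选样并查询标签"的循环结构:

```python
import numpy as np

# 示意:每轮在未标注池上训练一个线性自编码器,
# 挑选重构误差最大的样本向专家查询标签。
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 8))
X[:5] += 4.0                       # 少数"异常"样本(模拟APT痕迹)

def fit_linear_ae(X, k=3):
    # 前 k 个主成分构成的线性自编码器
    mu = X.mean(0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def recon_error(X, mu, W):
    Z = (X - mu) @ W.T
    return np.sum((X - (mu + Z @ W)) ** 2, axis=1)

labeled = set()
for _ in range(3):                 # 3 轮主动学习
    pool = [i for i in range(len(X)) if i not in labeled]
    mu, W = fit_linear_ae(X[pool])
    err = recon_error(X[pool], mu, W)
    # 查询重构误差最大的 5 个样本的标签(oracle 此处省略)
    query = [pool[j] for j in np.argsort(err)[-5:]]
    labeled.update(query)
```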


【3】Towards Trustworthy Wi-Fi Sensing: Systematic Evaluation of Deep Learning Model Robustness to Adversarial Attacks
标题:迈向可信的Wi-Fi感知:深度学习模型对抗性攻击鲁棒性的系统评估
链接:https://arxiv.org/abs/2511.20456

作者:Shreevanth Krishnaa Gopalakrishnan,Stephen Hailes
备注:19 pages, 8 figures, 7 tables
摘要:机器学习已成为基于信道状态信息(CSI)的人类感知系统的组成部分,并有望为未来蜂窝和Wi-Fi时代的无设备活动识别和身份检测等应用提供动力。然而,这些系统所依赖的模型,其决策可能会受到微妙扰动的影响,从而引发了对无处不在的传感的安全性和可靠性的担忧。因此,在无线传感能够安全地部署于现实世界环境之前,量化和理解这些模型的鲁棒性(定义为它们在对抗性扰动下保持准确预测的能力)至关重要。   这项工作对CSI深度学习模型在不同威胁模型(白盒、黑盒/迁移和通用扰动)和不同攻击现实程度下的鲁棒性进行了系统评估。我们建立了一个框架,在三个公共数据集上比较紧凑的时间自编码器模型与更大的深度架构,量化模型规模、训练机制和物理约束如何影响鲁棒性。我们的实验表明,较小的模型虽然高效,且在干净数据上性能相当,但明显更不鲁棒。我们进一步确认,与无约束的特征空间攻击相比,设计为在真实无线信道中可行的、物理上可实现的信号空间扰动会显著降低攻击成功率。对抗性训练减轻了这些漏洞,提高了平均鲁棒准确率,而两类模型的干净性能仅有适度下降。随着无线传感朝着可靠的跨域运行方向发展,这些发现为鲁棒性估计提供了定量基线,并为安全可靠的以人为本的传感系统提供了设计原则。
摘要:Machine learning has become integral to Channel State Information (CSI)-based human sensing systems and is expected to power applications such as device-free activity recognition and identity detection in future cellular and Wi-Fi generations. However, these systems rely on models whose decisions can be subtly perturbed, raising concerns for security and reliability in ubiquitous sensing. Quantifying and understanding the robustness of such models, defined as their ability to maintain accurate predictions under adversarial perturbations, is therefore critical before wireless sensing can be safely deployed in real-world environments.   This work presents a systematic evaluation of the robustness of CSI deep learning models under diverse threat models (white-box, black-box/transfer, and universal perturbations) and varying degrees of attack realism. We establish a framework to compare compact temporal autoencoder models with larger deep architectures across three public datasets, quantifying how model scale, training regime, and physical constraints influence robustness. Our experiments show that smaller models, while efficient and equally performant on clean data, are markedly less robust. We further confirm that physically realizable signal-space perturbations, designed to be feasible in real wireless channels, significantly reduce attack success compared to unconstrained feature-space attacks. Adversarial training mitigates these vulnerabilities, improving mean robust accuracy with only moderate degradation in clean performance across both model classes. As wireless sensing advances towards reliable, cross-domain operation, these findings provide quantitative baselines for robustness estimation and inform design principles for secure and trustworthy human-centered sensing systems.
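"鲁棒准确率"(对抗扰动下仍预测正确的比例)的度量方式可以用一个最小示意说明:下面在玩具线性分类器上实现FGSM(常见的白盒攻击之一;数据与模型均为随机示例,并非论文的CSI模型):

```python
import numpy as np

# 示意:FGSM 沿损失梯度的符号方向加扰动,比较干净准确率与鲁棒准确率。
rng = np.random.default_rng(6)
n, d = 500, 20
w = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w)                 # 构造可被 w 完美分类的标签

def accuracy(X, y, w):
    return float(np.mean(np.sign(X @ w) == y))

def fgsm(X, y, w, eps):
    # 对线性打分 + hinge 型损失,梯度符号方向为 sign(-y * w)
    grad_sign = np.sign(-y[:, None] * w[None, :])
    return X + eps * grad_sign

clean_acc = accuracy(X, y, w)
robust_acc = accuracy(fgsm(X, y, w, eps=0.5), y, w)
```

对 CSI 模型的评估即在其输入(信道测量)上做同类扰动,并在物理可实现约束下重复该度量。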


【4】MFM-point: Multi-scale Flow Matching for Point Cloud Generation
标题:MFM-Point:用于点云生成的多尺度流匹配
链接:https://arxiv.org/abs/2511.20041

作者:Petr Molodyk,Jaemoo Choi,David W. Romero,Ming-Yu Liu,Yongxin Chen
摘要:近年来,点云生成在三维生成建模中得到了极大的关注。在现有的方法中,基于点的方法直接生成点云,而不依赖于潜在特征、网格或体素等其他表示。这些方法训练成本低、算法简单,但与基于表示的方法相比往往表现不佳。在本文中,我们提出了MFM-Point,一个多尺度流匹配点云生成框架,在保持基于点的方法的简单性和效率的同时,大幅提升其可扩展性和性能。我们的多尺度生成算法采用由粗到细的生成范式,在不产生额外训练或推理开销的情况下提高了生成质量和可扩展性。开发这种多尺度框架的一个关键挑战在于保持无序点云的几何结构,同时确保跨分辨率的平滑和一致的分布过渡。为了解决这个问题,我们引入了一种结构化的下采样和上采样策略,既保留几何形状,又保持粗分辨率与细分辨率之间的对齐。实验结果表明,MFM-Point在基于点的方法中实现了同类最佳的性能,并可与最好的基于表示的方法相抗衡。特别是,MFM-Point在多类别和高分辨率生成任务中表现出色。
摘要:In recent years, point cloud generation has gained significant attention in 3D generative modeling. Among existing approaches, point-based methods directly generate point clouds without relying on other representations such as latent features, meshes, or voxels. These methods offer low training cost and algorithmic simplicity, but often underperform compared to representation-based approaches. In this paper, we propose MFM-Point, a multi-scale Flow Matching framework for point cloud generation that substantially improves the scalability and performance of point-based methods while preserving their simplicity and efficiency. Our multi-scale generation algorithm adopts a coarse-to-fine generation paradigm, enhancing generation quality and scalability without incurring additional training or inference overhead. A key challenge in developing such a multi-scale framework lies in preserving the geometric structure of unordered point clouds while ensuring smooth and consistent distributional transitions across resolutions. To address this, we introduce a structured downsampling and upsampling strategy that preserves geometry and maintains alignment between coarse and fine resolutions. Our experimental results demonstrate that MFM-Point achieves best-in-class performance among point-based methods and challenges the best representation-based methods. In particular, MFM-point demonstrates strong results in multi-category and high-resolution generation tasks.
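流匹配的线性插值训练目标,以及"先粗后细"的两级数据准备,可以示意如下(平均池化下采样是示意性简化,并非论文的结构化采样;网络与训练循环省略):

```python
import numpy as np

# 示意:流匹配的训练对构造——在噪声与数据之间线性插值,
# 回归目标为恒定速度 v = x1 - x0;粗分辨率由细点云池化得到。
rng = np.random.default_rng(7)

def fm_pair(x0, x1, t):
    """x_t = (1-t) x0 + t x1,目标速度 v = x1 - x0"""
    xt = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return xt, v_target

fine = rng.normal(size=(256, 3))         # 细分辨率点云(目标样本)
coarse = fine.reshape(64, 4, 3).mean(1)  # 平均池化得到粗分辨率(示意)
noise_c = rng.normal(size=coarse.shape)
xt_c, v_c = fm_pair(noise_c, coarse, t=0.5)
```

由粗到细的生成即:先在粗分辨率上沿学到的速度场积分出粗点云,再上采样并在细分辨率上重复同样的流匹配过程。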


【5】Training-Free Generation of Diverse and High-Fidelity Images via Prompt Semantic Space Optimization
标题:通过提示语义空间优化免训练生成多样化和高保真图像
链接:https://arxiv.org/abs/2511.19811

作者:Debin Meng,Chen Jin,Zheng Gao,Yanran Li,Ioannis Patras,Georgios Tzimiropoulos
备注:under review
摘要:图像多样性仍然是文本到图像扩散模型面临的一个根本挑战。低多样性的模型往往会产生重复的输出,增加采样冗余,阻碍创造性探索和下游应用。一个主要原因是,生成经常向学习分布中的强模式坍缩。现有提升多样性的尝试,如噪声重采样、提示重写或基于转向(steering)的引导,往往仍会坍缩到主导模式,或引入降低图像质量的失真。有鉴于此,我们提出了令牌-提示嵌入空间优化(TPSO),这是一个无需训练且与模型无关的模块。TPSO引入可学习参数来探索令牌嵌入空间中代表性不足的区域,减少模型从学习分布的强模式中重复生成样本的倾向。与此同时,提示级空间提供了一个全局语义约束来调节分布变化,在保持高保真度的同时防止质量下降。在MS-COCO和三个扩散主干上的广泛实验表明,TPSO显著增强了生成多样性,将基线性能从1.10提高到4.18点,而不牺牲图像质量。代码将在论文录用后发布。
摘要:Image diversity remains a fundamental challenge for text-to-image diffusion models. Low-diversity models tend to generate repetitive outputs, increasing sampling redundancy and hindering both creative exploration and downstream applications. A primary cause is that generation often collapses toward a strong mode in the learned distribution. Existing attempts to improve diversity, such as noise resampling, prompt rewriting, or steering-based guidance, often still collapse to dominant modes or introduce distortions that degrade image quality. In light of this, we propose Token-Prompt embedding Space Optimization (TPSO), a training-free and model-agnostic module. TPSO introduces learnable parameters to explore underrepresented regions of the token embedding space, reducing the tendency of the model to repeatedly generate samples from strong modes of the learned distribution. At the same time, the prompt-level space provides a global semantic constraint that regulates distribution shifts, preventing quality degradation while maintaining high fidelity. Extensive experiments on MS-COCO and three diffusion backbones show that TPSO significantly enhances generative diversity, improving baseline performance from 1.10 to 4.18 points, without sacrificing image quality. Code will be released upon acceptance.
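TPSO"令牌级探索+提示级约束"的相互作用可以用如下玩具梯度上升示意;嵌入、强模式中心与投影式约束均为此处的假设,仅演示两级目标如何互相制衡:

```python
import numpy as np

# 示意:在 token 嵌入上学习偏移 delta,使其远离学习分布的"强模式",
# 同时用范数投影充当提示级全局约束,防止语义漂移过大。
rng = np.random.default_rng(8)
d, r, lr = 16, 1.0, 0.1
e_token = rng.normal(size=d)                  # 某 token 的原始嵌入
mode = e_token + 0.1 * rng.normal(size=d)     # 假设的强模式中心(离嵌入很近)
delta = np.zeros(d)

for _ in range(50):
    # 多样性目标:最大化与强模式的平方距离(梯度上升)
    delta += lr * 2.0 * (e_token + delta - mode)
    # 提示级约束:把偏移投影回半径 r 的球内
    nrm = np.linalg.norm(delta)
    if nrm > r:
        delta *= r / nrm

dist_before = np.linalg.norm(e_token - mode)
dist_after = np.linalg.norm(e_token + delta - mode)  # 偏离强模式但幅度受限
```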


【6】Large Scale Community-Aware Network Generation
标题:大规模社区感知网络生成
链接:https://arxiv.org/abs/2511.19717

作者:Vikram Ramavarapu,João Alfredo Cardoso Lamy,Mohammad Dindoost,David A. Bader
备注:22 pages, 10 figures, code made available at https://github.com/illinois-or-research-analytics/reccs
摘要:社区检测或网络聚类用于识别网络中潜在的社区结构。由于现实世界中的网络缺乏带标签的真实社区划分(ground truth),评估这些算法面临巨大挑战。为了解决这个问题,研究人员使用合成网络生成器来生成带有真实社区标签的网络。RECCS就是这样一种算法,它将网络及其聚类作为输入,并通过模块化管道生成合成网络。每个生成的真实聚类保留了相应输入聚类的关键特征,包括连通性、最小度和度序列分布。输出由合成生成的网络和所有节点的不相交真实聚类标签组成。在本文中,我们提出了两个增强版本:RECCS+和RECCS++。RECCS+保持了对原始RECCS的算法保真度,同时通过一个在多个进程间协调各算法组件并采用多线程的编排器引入并行化。RECCS++在此基础上通过额外的算法优化实现进一步加速。实验结果表明,RECCS+和RECCS++在我们的基准数据集上分别实现了高达49倍和139倍的加速,其中RECCS++的额外性能增益伴随着适度的准确性权衡。凭借这一新的性能水平,RECCS++现在可以扩展到拥有超过1亿个节点和近20亿条边的网络。
摘要:Community detection, or network clustering, is used to identify latent community structure in networks. Due to the scarcity of labeled ground truth in real-world networks, evaluating these algorithms poses significant challenges. To address this, researchers use synthetic network generators that produce networks with ground-truth community labels. RECCS is one such algorithm that takes a network and its clustering as input and generates a synthetic network through a modular pipeline. Each generated ground truth cluster preserves key characteristics of the corresponding input cluster, including connectivity, minimum degree, and degree sequence distribution. The output consists of a synthetically generated network, and disjoint ground truth cluster labels for all nodes. In this paper, we present two enhanced versions: RECCS+ and RECCS++. RECCS+ maintains algorithmic fidelity to the original RECCS while introducing parallelization through an orchestrator that coordinates algorithmic components across multiple processes and employs multithreading. RECCS++ builds upon this foundation with additional algorithmic optimizations to achieve further speedup. Our experimental results demonstrate that RECCS+ and RECCS++ achieve speedups of up to 49x and 139x respectively on our benchmark datasets, with RECCS++'s additional performance gains involving a modest accuracy tradeoff. With this newfound performance, RECCS++ can now scale to networks with over 100 million nodes and nearly 2 billion edges.


【7】Integrating RCTs, RWD, AI/ML and Statistics: Next-Generation Evidence Synthesis
标题:整合RCT、RWD、AI/ML和统计:下一代证据合成
链接:https://arxiv.org/abs/2511.19735

作者:Shu Yang,Margaret Gamalo,Haoda Fu
摘要:随机对照试验(RCT)一直是临床证据的基石;然而,其成本,持续时间和严格的资格标准限制了功效和外部有效性。使用真实世界数据(RWD)的研究,历史上被认为在建立因果关系方面不太可靠,现在被认为对生成真实世界证据(RWE)很重要。与此同时,人工智能和机器学习(AI/ML)在整个药物开发过程中的应用越来越多,提供了可扩展性和灵活性,但也带来了传统统计数据所不具备的可解释性和严谨性方面的挑战。这一观点认为,证据生成的未来将不取决于RCT与RWD,或统计与AI/ML,而是取决于它们的原则性整合。为此,需要一个因果路线图来澄清推理目标,明确假设,并确保权衡的透明度。我们强调了综合证据合成的关键目标,包括将RCT结果传递给更广泛的人群,在RCT中嵌入AI辅助分析,设计混合对照试验,以及用长期RWD扩展短期RCT。我们还概述了隐私保护分析,不确定性量化和小样本方法的未来方向。通过将统计严谨性与AI/ML创新相结合,综合方法可以产生强大、透明和与政策相关的证据,使其成为现代监管科学的关键组成部分。
摘要:Randomized controlled trials (RCTs) have been the cornerstone of clinical evidence; however, their cost, duration, and restrictive eligibility criteria limit power and external validity. Studies using real-world data (RWD), historically considered less reliable for establishing causality, are now recognized to be important for generating real-world evidence (RWE). In parallel, artificial intelligence and machine learning (AI/ML) are being increasingly used throughout the drug development process, providing scalability and flexibility but also presenting challenges in interpretability and rigor that traditional statistics do not face. This Perspective argues that the future of evidence generation will not depend on RCTs versus RWD, or statistics versus AI/ML, but on their principled integration. To this end, a causal roadmap is needed to clarify inferential goals, make assumptions explicit, and ensure transparency about tradeoffs. We highlight key objectives of integrative evidence synthesis, including transporting RCT results to broader populations, embedding AI-assisted analyses within RCTs, designing hybrid controlled trials, and extending short-term RCTs with long-term RWD. We also outline future directions in privacy-preserving analytics, uncertainty quantification, and small-sample methods. By uniting statistical rigor with AI/ML innovation, integrative approaches can produce robust, transparent, and policy-relevant evidence, making them a key component of modern regulatory science.


半/弱/无/有监督|不确定性|主动学习(5篇)

【1】How to Purchase Labels? A Cost-Effective Approach Using Active Learning Markets
标题:如何购买标签?使用主动学习市场的经济高效方法
链接:https://arxiv.org/abs/2511.20605

作者:Xiwen Huang,Pierre Pinson
备注:Submitted as a preprint. 34 pages, 14 figures, 4 tables
摘要:我们引入并分析主动学习市场，作为在分析师希望获取额外数据以改进模型拟合、或为预测分析应用更好地训练模型时购买标签的一种方式。这与现有的许多购买特征和样本的提议形成对比。通过首次将市场出清形式化为一个优化问题，我们将预算约束和改进阈值集成到标签获取过程中。我们专注于单买方多卖方设置，提出使用两种主动学习策略（基于方差的和基于委员会查询的），并搭配不同的定价机制，与基准随机抽样方法进行比较。所提出的策略在来自两个关键应用领域（房地产定价和能源预测）的真实世界数据集上得到验证。结果表明我们的方法具有鲁棒性，与传统方法相比，以更少的标签获取量持续实现更优的性能。我们的方案提供了一个易于实现的实用解决方案，用于在资源受限环境中优化数据获取。
摘要:We introduce and analyse active learning markets as a way to purchase labels, in situations where analysts aim to acquire additional data to improve model fitting, or to better train models for predictive analytics applications. This comes in contrast to the many proposals that already exist to purchase features and examples. By originally formalising the market clearing as an optimisation problem, we integrate budget constraints and improvement thresholds into the label acquisition process. We focus on a single-buyer-multiple-seller setup and propose the use of two active learning strategies (variance based and query-by-committee based), paired with distinct pricing mechanisms. They are compared to a benchmark random sampling approach. The proposed strategies are validated on real-world datasets from two critical application domains: real estate pricing and energy forecasting. Results demonstrate the robustness of our approach, consistently achieving superior performance with fewer labels acquired compared to conventional methods. Our proposal comprises an easy-to-implement practical solution for optimising data acquisition in resource-constrained environments.
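The variance-based query rule mentioned in the abstract can be sketched as a minimal ensemble-disagreement scorer (an illustration of the acquisition rule only; the paper's market clearing and pricing mechanisms are not modeled here, and all names are ours):

```python
import numpy as np

def variance_based_query(models, X_pool, k=1):
    """Score each unlabeled point by the committee's predictive variance and
    return the indices of the k most disputed points (highest variance first)."""
    preds = np.stack([m(X_pool) for m in models])  # shape (n_models, n_pool)
    variance = preds.var(axis=0)
    return np.argsort(variance)[-k:][::-1]

# Toy committee: three linear models with different slopes disagree most on
# inputs far from zero, so labels for those points would be purchased first.
X_pool = np.array([0.0, 0.5, 2.0, 5.0])
models = [lambda x, w=w: w * x for w in (0.9, 1.0, 1.4)]
print(variance_based_query(models, X_pool, k=2))  # -> [3 2]
```

A query-by-committee variant would replace the variance with a disagreement measure (e.g., vote entropy) over the committee's discrete predictions.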


【2】Gated Uncertainty-Aware Runtime Dual Invariants for Neural Signal-Controlled Robotics
标题:神经信号控制机器人的门控不确定性感知运行时双不变量
链接:https://arxiv.org/abs/2511.20570

作者:Tasha Kim,Oiwi Parker Jones
备注:Embodied and Safe-Assured Robotic Systems workshop at NeurIPS 2025
摘要:直接从神经信号解码用户意图的安全关键辅助系统需要严格的可靠性和信任保证。我们提出了GUARDIAN(门控不确定性感知双不变量),神经信号控制机器人的实时神经符号验证框架。GUARDIAN通过将信心校准的大脑信号解码与象征性目标基础和双层运行时监控相结合,来加强逻辑安全和生理信任。在包含9名受试者和5,184次试验的BNCI 2014运动想象脑电图(EEG)数据集上,即使使用测试准确度低(27-46%)和ECE置信度高(0.22-0.41)的轻量级解码器架构,系统也能以94-97%的高安全率运行。我们在模拟噪声测试中证明了与基线相比1.7倍的正确干预。该监测器以100 Hz和亚毫秒的决策延迟运行,使其实际上适用于基于闭环神经信号的系统。在21个消融结果中,GUARDIAN对信号退化表现出渐进的反应,并从意图、计划到行动产生可审计的痕迹,有助于将神经证据与可验证的机器人行动联系起来。
摘要:Safety-critical assistive systems that directly decode user intent from neural signals require rigorous guarantees of reliability and trust. We present GUARDIAN (Gated Uncertainty-Aware Runtime Dual Invariants), a framework for real-time neuro-symbolic verification for neural signal-controlled robotics. GUARDIAN enforces both logical safety and physiological trust by coupling confidence-calibrated brain signal decoding with symbolic goal grounding and dual-layer runtime monitoring. On the BNCI2014 motor imagery electroencephalogram (EEG) dataset with 9 subjects and 5,184 trials, the system performs at a high safety rate of 94-97% even with lightweight decoder architectures with low test accuracies (27-46%) and high ECE confidence miscalibration (0.22-0.41). We demonstrate 1.7x more correct interventions in simulated noise testing versus baseline. The monitor operates at 100 Hz with sub-millisecond decision latency, making it practically viable for closed-loop neural signal-based systems. Across 21 ablation results, GUARDIAN exhibits a graduated response to signal degradation, and produces auditable traces from intent to plan to action, helping to link neural evidence to verifiable robot action.
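The gating idea, letting decoded actions through only when decoding confidence clears a trust threshold, reduces to a few lines (a hedged sketch; the function name, threshold, and fallback action are illustrative, not GUARDIAN's actual dual-invariant monitor):

```python
def gated_intervention(confidence, decoded_action, safe_action="stop", threshold=0.6):
    """Pass the decoded action through only when the calibrated decoding
    confidence is high enough; otherwise intervene with a safe default."""
    return decoded_action if confidence >= threshold else safe_action

print(gated_intervention(0.9, "grasp"))  # -> grasp
print(gated_intervention(0.3, "grasp"))  # -> stop
```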


【3】DiCaP: Distribution-Calibrated Pseudo-labeling for Semi-Supervised Multi-Label Learning
标题:DiCaP:用于半监督多标签学习的分布校准伪标记
链接:https://arxiv.org/abs/2511.20225

作者:Bo Han,Zhuoming Li,Xiaoyu Wang,Yaxin Hou,Hui Liu,Junhui Hou,Yuheng Jia
备注:Accepted by AAAI-26
摘要:半监督多标签学习（SSMLL）旨在通过利用未标记数据来提高模型的性能，从而解决多标签学习（MLL）中有限标记数据的挑战。虽然伪标记已成为SSMLL中的主导策略，但大多数现有方法为所有伪标签分配相等的权重，而不管它们的质量如何，这会放大噪声或不确定预测的影响，并降低整体性能。在本文中，我们从理论上证明，伪标签的最佳权重应反映其正确性似然。在实证上，我们观察到在同一个数据集上，即使标记训练样本的数量发生变化，未标记数据的正确性似然分布也保持稳定。基于这一认识，我们提出了分布校准伪标记（DiCaP），这是一个正确性感知框架，通过估计后验精度来校准伪标签权重。我们进一步引入了一种双阈值机制来分离置信区域和模糊区域：置信样本被伪标记并相应地加权，而模糊样本则通过无监督对比学习进行探索。在多个基准数据集上进行的实验验证了我们的方法实现了一致的改进，超过最先进的方法高达4.27%。
摘要:Semi-supervised multi-label learning (SSMLL) aims to address the challenge of limited labeled data in multi-label learning (MLL) by leveraging unlabeled data to improve the model's performance. While pseudo-labeling has become a dominant strategy in SSMLL, most existing methods assign equal weights to all pseudo-labels regardless of their quality, which can amplify the impact of noisy or uncertain predictions and degrade the overall performance. In this paper, we theoretically verify that the optimal weight for a pseudo-label should reflect its correctness likelihood. Empirically, we observe that on the same dataset, the correctness likelihood distribution of unlabeled data remains stable, even as the number of labeled training samples varies. Building on this insight, we propose Distribution-Calibrated Pseudo-labeling (DiCaP), a correctness-aware framework that estimates posterior precision to calibrate pseudo-label weights. We further introduce a dual-thresholding mechanism to separate confident and ambiguous regions: confident samples are pseudo-labeled and weighted accordingly, while ambiguous ones are explored by unsupervised contrastive learning. Experiments conducted on multiple benchmark datasets verify that our method achieves consistent improvements, surpassing state-of-the-art methods by up to 4.27%.
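The dual-thresholding step can be sketched minimally (our illustration; the weight below is a simple confidence-margin proxy for the correctness likelihood, not the calibrated posterior precision that DiCaP actually estimates):

```python
import numpy as np

def dual_threshold_pseudo_labels(probs, t_hi=0.8, t_lo=0.2):
    """Dual-thresholding sketch: confident predictions become weighted
    pseudo-labels; the rest are flagged ambiguous (handled elsewhere by
    unsupervised contrastive learning)."""
    probs = np.asarray(probs)
    confident = (probs >= t_hi) | (probs <= t_lo)
    labels = (probs >= t_hi).astype(int)
    # Illustrative weight: distance from 0.5 as a stand-in for the
    # correctness-likelihood-based weighting described in the paper.
    weights = np.where(confident, 2 * np.abs(probs - 0.5), 0.0)
    return labels, weights, ~confident

labels, w, ambiguous = dual_threshold_pseudo_labels([0.95, 0.55, 0.05])
print(labels, ambiguous)  # -> [1 0 0] [False  True False]
```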


【4】Stragglers Can Contribute More: Uncertainty-Aware Distillation for Asynchronous Federated Learning
标题:落后者可以做出更多贡献：面向异步联邦学习的不确定性感知蒸馏
链接:https://arxiv.org/abs/2511.19966

作者:Yujia Wang,Fenglong Ma,Jinghui Chen
备注:28 pages
摘要:异步联邦学习(FL)最近因其增强的效率和可扩展性而受到关注,使本地客户端能够以自己的速度向服务器发送模型更新,而无需等待较慢的参与者。然而,这样的设计遇到了重大的挑战,例如来自落后客户端的过时更新降低整体模型性能的风险,以及由主导学习过程的更快客户端引入的潜在偏差,特别是在异构数据分布下。现有的方法通常只解决其中一个问题,这就产生了一个冲突,即减轻过时更新的影响可能会加剧更快客户端产生的偏见,反之亦然。为了解决这些挑战,我们提出了FedEcho,一种新的框架,它结合了不确定性感知蒸馏,以提高异步FL性能下的大异步延迟和数据异构性。具体而言,不确定性感知蒸馏使服务器能够评估落后客户端所做预测的可靠性,并根据其估计的不确定性动态调整这些预测的影响。通过优先考虑更确定的预测,同时仍然利用来自所有客户端的各种信息,FedEcho有效地减轻了过时更新和数据异构性的负面影响。通过大量的实验,我们证明了FedEcho始终优于现有的异步联邦学习基线,实现了强大的性能,而无需访问私有客户端数据。
摘要:Asynchronous federated learning (FL) has recently gained attention for its enhanced efficiency and scalability, enabling local clients to send model updates to the server at their own pace without waiting for slower participants. However, such a design encounters significant challenges, such as the risk of outdated updates from straggler clients degrading the overall model performance and the potential bias introduced by faster clients dominating the learning process, especially under heterogeneous data distributions. Existing methods typically address only one of these issues, creating a conflict where mitigating the impact of outdated updates can exacerbate the bias created by faster clients, and vice versa. To address these challenges, we propose FedEcho, a novel framework that incorporates uncertainty-aware distillation to enhance the asynchronous FL performances under large asynchronous delays and data heterogeneity. Specifically, uncertainty-aware distillation enables the server to assess the reliability of predictions made by straggler clients, dynamically adjusting the influence of these predictions based on their estimated uncertainty. By prioritizing more certain predictions while still leveraging the diverse information from all clients, FedEcho effectively mitigates the negative impacts of outdated updates and data heterogeneity. Through extensive experiments, we demonstrate that FedEcho consistently outperforms existing asynchronous federated learning baselines, achieving robust performance without requiring access to private client data.
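The core weighting idea, trusting stragglers' predictions in proportion to their certainty, can be sketched with entropy-based weights (an illustrative form; FedEcho's actual uncertainty estimate and distillation loss are not reproduced here):

```python
import numpy as np

def uncertainty_weights(probs, tau=1.0):
    """Weight each client's predictions by inverse predictive entropy, so
    stale or uncertain contributions count less while still being used."""
    probs = np.asarray(probs)
    H = -(probs * np.log(probs + 1e-12)).sum(axis=1)  # predictive entropy
    w = np.exp(-H / tau)
    return w / w.sum()

# A confident client (peaked distribution) outweighs an uncertain one (uniform).
w = uncertainty_weights([[0.9, 0.05, 0.05], [1/3, 1/3, 1/3]])
print(np.round(w, 2))  # -> [0.67 0.33]
```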


【5】Beyond Binary Classification: A Semi-supervised Approach to Generalized AI-generated Image Detection
标题:超越二值分类:一种半监督的广义AI生成图像检测方法
链接:https://arxiv.org/abs/2511.19499

作者:Hong-Hanh Nguyen-Le,Van-Tuan Tran,Dinh-Thuc Nguyen,Nhien-An Le-Khac
备注:Accepted to The 40th Annual AAAI Conference on Artificial Intelligence - 2025
摘要:生成器（例如StyleGAN、Midjourney、DALL-E）的快速发展产生了高度逼真的合成图像，对数字媒体的真实性提出了重大挑战。这些生成器通常基于几个核心架构家族，主要是生成对抗网络（GANs）和扩散模型（DM）。当前取证中的一个关键漏洞是检测器无法实现跨生成器泛化，特别是在跨越架构边界时（例如，从GANs到DM）。我们假设这种差距源于这些不同架构所产生伪影的根本差异。在这项工作中，我们提供了一个理论分析，解释GAN和DM架构不同的优化目标如何导致不同的流形覆盖行为。我们证明了GANs允许部分覆盖，通常会导致边界伪影，而DM强制完全覆盖，导致过度平滑模式。基于这一分析，我们提出了Triarchy Detector（TriDetect），一种半监督方法，通过发现“假”类中的潜在架构模式来增强二值分类。TriDetect通过Sinkhorn-Knopp算法和跨视图一致性机制采用平衡的聚类分配，鼓励模型学习基本的架构差异。我们在两个标准基准和三个野外数据集上对13个基线进行了评估，以证明其对未知生成器的泛化能力。
摘要:The rapid advancement of generators (e.g., StyleGAN, Midjourney, DALL-E) has produced highly realistic synthetic images, posing significant challenges to digital media authenticity. These generators are typically based on a few core architectural families, primarily Generative Adversarial Networks (GANs) and Diffusion Models (DMs). A critical vulnerability in current forensics is the failure of detectors to achieve cross-generator generalization, especially when crossing architectural boundaries (e.g., from GANs to DMs). We hypothesize that this gap stems from fundamental differences in the artifacts produced by these distinct architectures. In this work, we provide a theoretical analysis explaining how the distinct optimization objectives of the GAN and DM architectures lead to different manifold coverage behaviors. We demonstrate that GANs permit partial coverage, often leading to boundary artifacts, while DMs enforce complete coverage, resulting in over-smoothing patterns. Motivated by this analysis, we propose the Triarchy Detector (TriDetect), a semi-supervised approach that enhances binary classification by discovering latent architectural patterns within the "fake" class. TriDetect employs balanced cluster assignment via the Sinkhorn-Knopp algorithm and a cross-view consistency mechanism, encouraging the model to learn fundamental architectural distinctions. We evaluate our approach on two standard benchmarks and three in-the-wild datasets against 13 baselines to demonstrate its generalization capability to unseen generators.
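The balanced cluster assignment via Sinkhorn-Knopp named in the abstract amounts to alternating row/column normalization of a soft assignment matrix; a minimal numpy version (our sketch, without TriDetect's cross-view consistency mechanism):

```python
import numpy as np

def sinkhorn_knopp(logits, n_iters=200):
    """Balanced soft cluster assignment: alternately normalize columns and
    rows so every cluster receives an equal share of the samples."""
    Q = np.exp(logits)
    Q /= Q.sum()
    n, k = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=0, keepdims=True)  # equal total mass per cluster
        Q /= k
        Q /= Q.sum(axis=1, keepdims=True)  # unit mass per sample
        Q /= n
    return Q * n  # rows are soft assignments that sum to 1

rng = np.random.default_rng(0)
Q = sinkhorn_knopp(rng.normal(size=(6, 3)))
print(np.allclose(Q.sum(axis=1), 1.0))  # -> True (each sample fully assigned)
print(np.round(Q.sum(axis=0), 3))       # cluster sizes ~ [2. 2. 2.]
```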


迁移|Zero/Few/One-Shot|自适应(11篇)

【1】Adaptive Hopfield Network: Rethinking Similarities in Associative Memory
标题:自适应霍普菲尔德网络:重新思考联想记忆中的相似性
链接:https://arxiv.org/abs/2511.20609

作者:Shurong Wang,Yuqi Pan,Zhuoyang Shen,Meng Zhang,Hongwei Wang,Guoqi Li
摘要:联想记忆模型是生物智能基础的内容可寻址记忆系统，以其高度可解释性而闻名。然而，现有模型基于邻近度来评估检索质量，这不能保证检索到的模式与查询具有最强的关联，因而无法保证正确性。我们重新定义这个问题：查询是某个已存储记忆模式经生成过程产生的变体，并定义一个变体分布来建模这一微妙的、上下文相关的生成过程。因此，正确的检索应返回作为查询来源的后验概率最大的记忆模式。这一视角表明，理想的相似性度量应依据变体分布近似每个已存储模式生成该查询的似然，而现有联想记忆所使用的固定的、预定义的相似性无法做到这一点。为此，我们开发了自适应相似性，这是一种新机制，能够从上下文采样的样本中学习近似这一富有洞察力但未知的似然，以实现正确检索。我们从理论上证明，所提出的自适应相似性在三种典型且广泛适用的变体类型（噪声、掩蔽和偏置）下达到最优正确检索。我们将这一机制集成到一个新的自适应Hopfield网络（A-Hop）中，实证结果表明它在记忆检索、表格分类、图像分类和多实例学习等多种任务上实现了最先进的性能。
摘要:Associative memory models are content-addressable memory systems fundamental to biological intelligence and are notable for their high interpretability. However, existing models evaluate the quality of retrieval based on proximity, which cannot guarantee that the retrieved pattern has the strongest association with the query, failing correctness. We reframe this problem by proposing that a query is a generative variant of a stored memory pattern, and define a variant distribution to model this subtle context-dependent generative process. Consequently, correct retrieval should return the memory pattern with the maximum a posteriori probability of being the query's origin. This perspective reveals that an ideal similarity measure should approximate the likelihood of each stored pattern generating the query in accordance with variant distribution, which is impossible for fixed and pre-defined similarities used by existing associative memories. To this end, we develop adaptive similarity, a novel mechanism that learns to approximate this insightful but unknown likelihood from samples drawn from context, aiming for correct retrieval. We theoretically prove that our proposed adaptive similarity achieves optimal correct retrieval under three canonical and widely applicable types of variants: noisy, masked, and biased. We integrate this mechanism into a novel adaptive Hopfield network (A-Hop), and empirical results show that it achieves state-of-the-art performance across diverse tasks, including memory retrieval, tabular classification, image classification, and multiple instance learning.
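The variant-distribution view can be made concrete with its simplest case: if queries are stored patterns corrupted by isotropic Gaussian noise, the maximum-likelihood (correct) retrieval is exactly nearest-neighbor matching; adaptive similarity generalizes beyond this fixed assumption. A toy sketch (our illustration, not A-Hop itself):

```python
import numpy as np

def map_retrieve(query, memory, sigma=1.0):
    """Under an isotropic Gaussian variant model, the MAP retrieval (with a
    uniform prior) is the stored pattern with maximal likelihood of having
    generated the query -- i.e., the nearest neighbor."""
    d2 = ((memory - query) ** 2).sum(axis=1)
    log_lik = -d2 / (2 * sigma**2)
    return int(np.argmax(log_lik))

memory = np.array([[1.0, 0.0], [0.0, 1.0]])
print(map_retrieve(np.array([0.9, 0.2]), memory))  # -> 0
```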


【2】A Tale of Two Geometries: Adaptive Optimizers and Non-Euclidean Descent
标题:两种几何的故事：自适应优化器与非欧几里得下降
链接:https://arxiv.org/abs/2511.20584

作者:Shuo Xie,Tianhao Wang,Beining Wu,Zhiyuan Li
摘要:自适应优化器在仅适应当前梯度时可以简化为归一化最速下降（NSD），这表明两个算法家族之间存在密切联系。然而，二者分析之间的一个关键区别在于所依赖的几何，例如平滑性概念：在凸设置中，自适应优化器由更强的自适应平滑性条件控制，而NSD依赖标准的平滑性概念。我们将自适应平滑性理论推广到非凸情形，并证明它精确刻画了自适应优化器的收敛性。此外，我们证明自适应平滑性使自适应优化器在凸设置下能够通过Nesterov动量实现加速，而这一保证在某些非欧几里得几何的标准平滑性下无法获得。我们进一步通过引入自适应梯度方差，为随机优化建立了类似的比较；它与自适应平滑性相平行，并带来无量纲的收敛保证，而这在某些非欧几里得几何的标准梯度方差下无法实现。
摘要:Adaptive optimizers can reduce to normalized steepest descent (NSD) when only adapting to the current gradient, suggesting a close connection between the two algorithmic families. A key distinction between their analyses, however, lies in the geometries, e.g., smoothness notions, they rely on. In the convex setting, adaptive optimizers are governed by a stronger adaptive smoothness condition, while NSD relies on the standard notion of smoothness. We extend the theory of adaptive smoothness to the nonconvex setting and show that it precisely characterizes the convergence of adaptive optimizers. Moreover, we establish that adaptive smoothness enables acceleration of adaptive optimizers with Nesterov momentum in the convex setting, a guarantee unattainable under standard smoothness for certain non-Euclidean geometry. We further develop an analogous comparison for stochastic optimization by introducing adaptive gradient variance, which parallels adaptive smoothness and leads to dimension-free convergence guarantees that cannot be achieved under standard gradient variance for certain non-Euclidean geometry.
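Normalized steepest descent, the family the abstract connects to adaptive optimizers, moves along the unit-norm direction best aligned with the negative gradient; under the max-norm this is sign descent, under the Euclidean norm it is normalized gradient descent. A small sketch of the NSD family (our illustration, not the paper's analysis):

```python
import numpy as np

def nsd_step(w, grad, lr, norm="linf"):
    """One step of normalized steepest descent for two choices of geometry."""
    if norm == "linf":  # max-norm geometry -> sign descent
        return w - lr * np.sign(grad)
    if norm == "l2":    # Euclidean geometry -> normalized gradient descent
        return w - lr * grad / (np.linalg.norm(grad) + 1e-12)
    raise ValueError(norm)

# Minimize f(w) = 0.5 * ||w||^2 (so grad = w) under both geometries.
w_inf = np.array([3.0, -0.1])
w_l2 = np.array([3.0, -0.1])
for _ in range(100):
    w_inf = nsd_step(w_inf, w_inf, lr=0.05, norm="linf")
    w_l2 = nsd_step(w_l2, w_l2, lr=0.05, norm="l2")
print(w_inf, np.linalg.norm(w_l2))  # both end within one step-size of the optimum
```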


【3】Soft Adaptive Policy Optimization
标题:软自适应策略优化
链接:https://arxiv.org/abs/2511.20347

作者:Chang Gao,Chujie Zheng,Xiong-Hui Chen,Kai Dang,Shixuan Liu,Bowen Yu,An Yang,Shuai Bai,Jingren Zhou,Junyang Lin
摘要:强化学习（RL）在增强大型语言模型（LLM）的推理能力方面发挥着越来越重要的作用，但稳定且高性能的策略优化仍然具有挑战性。令牌级别的重要性比率通常表现出很高的方差，这种现象在混合专家模型中会加剧，从而导致不稳定的更新。现有的基于组的策略优化方法（如GSPO和GRPO）通过硬裁剪来缓解这个问题，使得难以同时保持稳定性和有效学习。我们提出了软自适应策略优化（SAPO），它用一个平滑的温度控制门来代替硬裁剪，该门可以自适应地衰减偏离策略的更新，同时保留有用的学习信号。与GSPO和GRPO相比，SAPO兼具序列相干性和令牌自适应性。与GSPO类似，SAPO保持序列级相干性，但其软门控形成连续的信任区域，避免了GSPO中使用的脆性硬裁剪带。当一个序列包含少数高度偏离策略的令牌时，GSPO会抑制该序列的所有梯度，而SAPO只选择性地降低违规令牌的权重，并保留来自接近策略令牌的学习信号，从而提高样本效率。相对于GRPO，SAPO用平滑的温度控制缩放取代了硬令牌级裁剪，从而实现信息量更大且更稳定的更新。数学推理基准上的实证结果表明，在可比的训练预算下，SAPO具有更好的训练稳定性和更高的Pass@1性能。此外，我们采用SAPO训练Qwen3-VL模型系列，证明SAPO在不同任务和不同模型规模上都产生一致的性能增益。总的来说，SAPO为LLM的RL训练提供了更可靠、可扩展且有效的优化策略。
摘要:Reinforcement learning (RL) plays an increasingly important role in enhancing the reasoning capabilities of large language models (LLMs), yet stable and performant policy optimization remains challenging. Token-level importance ratios often exhibit high variance, a phenomenon exacerbated in Mixture-of-Experts models, leading to unstable updates. Existing group-based policy optimization methods, such as GSPO and GRPO, alleviate this problem via hard clipping, making it difficult to maintain both stability and effective learning. We propose Soft Adaptive Policy Optimization (SAPO), which replaces hard clipping with a smooth, temperature-controlled gate that adaptively attenuates off-policy updates while preserving useful learning signals. Compared with GSPO and GRPO, SAPO is both sequence-coherent and token-adaptive. Like GSPO, SAPO maintains sequence-level coherence, but its soft gating forms a continuous trust region that avoids the brittle hard clipping band used in GSPO. When a sequence contains a few highly off-policy tokens, GSPO suppresses all gradients for that sequence, whereas SAPO selectively down-weights only the offending tokens and preserves the learning signal from the near-on-policy ones, improving sample efficiency. Relative to GRPO, SAPO replaces hard token-level clipping with smooth, temperature-controlled scaling, enabling more informative and stable updates. Empirical results on mathematical reasoning benchmarks indicate that SAPO exhibits improved training stability and higher Pass@1 performance under comparable training budgets. Moreover, we employ SAPO to train the Qwen3-VL model series, demonstrating that SAPO yields consistent performance gains across diverse tasks and different model sizes. Overall, SAPO provides a more reliable, scalable, and effective optimization strategy for RL training of LLMs.
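The contrast between hard clipping and a smooth, temperature-controlled gate can be illustrated on token-level importance ratios (the soft gate below is our guess at a plausible functional form, not SAPO's actual formula):

```python
import numpy as np

def hard_clip_weight(ratio, eps=0.2):
    """PPO/GRPO-style hard clipping: gradient weight drops to 0 outside the band."""
    return np.where(np.abs(ratio - 1.0) <= eps, 1.0, 0.0)

def soft_gate_weight(ratio, tau=0.2):
    """A smooth, temperature-controlled gate (illustrative form): off-policy
    tokens are attenuated continuously instead of being zeroed out."""
    return np.exp(-np.abs(np.log(ratio)) / tau)

ratios = np.array([1.0, 1.1, 2.0])            # token-level importance ratios
print(hard_clip_weight(ratios))               # -> [1. 1. 0.]
print(np.round(soft_gate_weight(ratios), 3))  # smooth decay instead of a cliff
```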


【4】HVAdam: A Full-Dimension Adaptive Optimizer
标题:HVAdam：全维度自适应优化器
链接:https://arxiv.org/abs/2511.20277

作者:Yiheng Zhang,Shaowu Wu,Yuanzhuo Xu,Jiajun Wu,Shang Xu,Steve Drew,Xiaoguang Niu
摘要:像Adam这样的自适应优化器在训练大型语言模型和扩散模型等大型模型方面取得了巨大成功。然而，它们的泛化能力通常比非自适应方法更差，例如在CNN等经典架构上的SGD。我们确定了这种性能差距的一个关键原因：预条件器中的自适应性，这限制了优化器适应不同优化地形的能力。为了解决这个问题，我们提出了Anon（Adaptivity Non-restricted Optimizer with Novel convergence technique），一种具有连续可调自适应性的新型优化器，允许它在类SGD和类Adam行为之间插值，甚至外推超越两者。为了确保整个自适应谱上的收敛，我们引入了增量延迟更新（IDU），这是一种比AMSGrad的硬最大值跟踪策略更灵活的新机制，并增强了对梯度噪声的鲁棒性。我们从理论上建立了凸和非凸设置下的收敛保证。实证上，Anon在代表性的图像分类、扩散和语言建模任务上始终优于最先进的优化器。这些结果表明，自适应性可以作为一个有价值的可调设计原则，而Anon提供了第一个统一且可靠的框架，能够弥合经典与现代优化器之间的差距并超越二者的优势。
摘要:Adaptive optimizers such as Adam have achieved great success in training large-scale models like large language models and diffusion models. However, they often generalize worse than non-adaptive methods, such as SGD on classical architectures like CNNs. We identify a key cause of this performance gap: adaptivity in pre-conditioners, which limits the optimizer's ability to adapt to diverse optimization landscapes. To address this, we propose Anon (Adaptivity Non-restricted Optimizer with Novel convergence technique), a novel optimizer with continuously tunable adaptivity, allowing it to interpolate between SGD-like and Adam-like behaviors and even extrapolate beyond both. To ensure convergence across the entire adaptivity spectrum, we introduce incremental delay update (IDU), a novel mechanism that is more flexible than AMSGrad's hard max-tracking strategy and enhances robustness to gradient noise. We theoretically establish convergence guarantees under both convex and non-convex settings. Empirically, Anon consistently outperforms state-of-the-art optimizers on representative image classification, diffusion, and language modeling tasks. These results demonstrate that adaptivity can serve as a valuable tunable design principle, and Anon provides the first unified and reliable framework capable of bridging the gap between classical and modern optimizers and surpassing their advantageous properties.
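One simple way to picture a "continuously tunable adaptivity" is an exponent p on the second-moment preconditioner: p=0 gives SGD-like steps, p=0.5 gives Adam/RMSProp-like steps, and other values extrapolate beyond both. This is our illustrative parameterization, not Anon's actual mechanism (which also relies on the IDU update):

```python
import numpy as np

def tunable_update(grad, v, lr=0.01, beta2=0.999, p=0.5, eps=1e-8):
    """Adaptivity dial (illustrative): precondition by the EMA of squared
    gradients raised to a tunable power p."""
    v = beta2 * v + (1 - beta2) * grad**2  # second-moment estimate
    step = lr * grad / (v**p + eps)
    return step, v

g = np.array([0.1, 1.0])
step_sgd, _ = tunable_update(g, v=np.zeros(2), p=0.0)
step_adam, _ = tunable_update(g, v=np.zeros(2), p=0.5)
print(step_sgd)   # p=0: step proportional to the raw gradient
print(step_adam)  # p=0.5: per-coordinate normalization equalizes the axes
```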


【5】AdaCap: An Adaptive Contrastive Approach for Small-Data Neural Networks
标题:AdaCap：小数据神经网络的自适应对比方法
链接:https://arxiv.org/abs/2511.20170

作者:Bruno Belucci,Karim Lounici,Katia Meziani
备注:Submitted to ESANN 2026
摘要:神经网络在小型表格数据集上挣扎,其中基于树的模型仍然占主导地位。我们介绍了自适应对比方法(AdaCap),一种训练方案,它结合了基于置换的对比损失与基于Tikhonov的封闭形式输出映射。在85个真实世界的回归数据集和多个架构中,AdaCap在小样本制度中产生了一致的和统计上显著的改进,特别是对于残差模型。根据数据集特征(大小,偏度,噪声)训练的元预测器可以准确预测AdaCap何时有益。这些结果表明,AdaCap作为一种有针对性的正则化机制,可以在神经网络最脆弱的地方加强它们。所有结果和代码都可以在https://github.com/BrunoBelucci/adacap上公开获得。
摘要:Neural networks struggle on small tabular datasets, where tree-based models remain dominant. We introduce Adaptive Contrastive Approach (AdaCap), a training scheme that combines a permutation-based contrastive loss with a Tikhonov-based closed-form output mapping. Across 85 real-world regression datasets and multiple architectures, AdaCap yields consistent and statistically significant improvements in the small-sample regime, particularly for residual models. A meta-predictor trained on dataset characteristics (size, skewness, noise) accurately anticipates when AdaCap is beneficial. These results show that AdaCap acts as a targeted regularization mechanism, strengthening neural networks precisely where they are most fragile. All results and code are publicly available at https://github.com/BrunoBelucci/adacap.
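The "Tikhonov-based closed-form output mapping" has the familiar ridge-regression form; a minimal sketch on synthetic data (variable names are ours; AdaCap applies a map of this kind to the network's hidden activations):

```python
import numpy as np

def tikhonov_output_map(H, y, lam=1.0):
    """Closed-form Tikhonov (ridge) output weights: W = (H^T H + lam*I)^-1 H^T y."""
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ y)

# Sanity check: with vanishing regularization we recover the least-squares fit.
rng = np.random.default_rng(0)
H = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = H @ w_true
w_hat = tikhonov_output_map(H, y, lam=1e-8)
print(np.round(w_hat, 3))  # recovers w_true up to regularization error
```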


【6】Zero-Shot Transfer Capabilities of the Sundial Foundation Model for Leaf Area Index Forecasting
标题:Sundial基础模型用于叶面积指数预测的零样本迁移能力
链接:https://arxiv.org/abs/2511.20004

作者:Peining Zhang,Hongchen Qin,Haochen Zhang,Ziqi Guo,Guiling Wang,Jinbo Bi
摘要:研究了农业监测中叶面积指数(LAI)时间序列基础模型的zero-shot预测能力。使用HiQ数据集(美国,2000-2022),我们系统地比较了统计基线,一个完全监督的LSTM和多个评估协议下的Sundial基础模型。我们发现,在zero-shot设置中,Sundial可以胜过经过完全训练的LSTM,前提是输入上下文窗口足够长,特别是当覆盖超过一个或两个完整的季节周期时。这表明,第一次,一个通用的基础模型可以超过专门的监督模型的遥感时间序列预测没有任何特定的任务调整。这些结果突出了预训练的时间序列基础模型在农业和环境应用中作为有效的即插即用预测器的强大潜力。
摘要:This work investigates the zero-shot forecasting capability of time-series foundation models for Leaf Area Index (LAI) forecasting in agricultural monitoring. Using the HiQ dataset (U.S., 2000-2022), we systematically compare statistical baselines, a fully supervised LSTM, and the Sundial foundation model under multiple evaluation protocols. We find that Sundial, in the zero-shot setting, can outperform a fully trained LSTM provided that the input context window is sufficiently long, specifically when covering more than one or two full seasonal cycles. This demonstrates, for the first time, that a general-purpose foundation model can surpass specialized supervised models on remote-sensing time series prediction without any task-specific tuning. These results highlight the strong potential of pretrained time-series foundation models to serve as effective plug-and-play forecasters in agricultural and environmental applications.


【7】Hierarchical Spatio-Temporal Attention Network with Adaptive Risk-Aware Decision for Forward Collision Warning in Complex Scenarios
标题:具有自适应风险感知决策的分层时空注意网络用于复杂场景中的前向碰撞预警
链接:https://arxiv.org/abs/2511.19952

作者:Haoran Hu,Junren Shi,Shuo Jiang,Kun Cheng,Xia Yang,Changhao Piao
摘要:前向碰撞预警系统对车辆安全和自动驾驶至关重要,但目前的方法往往无法平衡精确的多智能体交互建模与实时决策适应性,边缘部署的高计算成本和简化的交互模型带来的不可靠性证明了这一点。为了克服这些双重挑战-计算复杂性和建模不一致性-以及传统静态阈值警告,本文介绍了一个集成的FCW框架,对层次时空注意力网络与动态风险阈值调整算法。HSTAN采用解耦架构(空间的图形注意力网络,时间的自注意力级联GRU)来实现卓越的性能和效率,仅需要12.3 ms的推理时间(比Transformer方法快73%),并将NGSIM数据集上的平均位移误差(ADE)降低到0.73 m(比Social_LSTM好42.2%)。此外,保形分位数回归通过生成预测区间来增强可靠性(91.3%的覆盖率,90%的置信度),DTRA模块随后通过物理信息潜在风险函数和受统计过程控制启发的自适应阈值机制将其转换为及时的警告。在多场景数据集上进行测试,整个系统表现出很高的效率,达到0.912的F1分数,8.2%的低误报率和2.8秒的充足预警时间,验证了框架在复杂环境中的卓越性能和实际部署可行性。
摘要:Forward Collision Warning systems are crucial for vehicle safety and autonomous driving, yet current methods often fail to balance precise multi-agent interaction modeling with real-time decision adaptability, evidenced by the high computational cost for edge deployment and the unreliability stemming from simplified interaction models. To overcome these dual challenges (computational complexity and modeling insufficiency), along with the high false alarm rates of traditional static-threshold warnings, this paper introduces an integrated FCW framework that pairs a Hierarchical Spatio-Temporal Attention Network with a Dynamic Risk Threshold Adjustment algorithm. HSTAN employs a decoupled architecture (Graph Attention Network for spatial, cascaded GRU with self-attention for temporal) to achieve superior performance and efficiency, requiring only 12.3 ms inference time (73% faster than Transformer methods) and reducing the Average Displacement Error (ADE) to 0.73 m (42.2% better than Social_LSTM) on the NGSIM dataset. Furthermore, Conformalized Quantile Regression enhances reliability by generating prediction intervals (91.3% coverage at 90% confidence), which the DTRA module then converts into timely warnings via a physics-informed risk potential function and an adaptive threshold mechanism inspired by statistical process control. Tested across multi-scenario datasets, the complete system demonstrates high efficacy, achieving an F1 score of 0.912, a low false alarm rate of 8.2%, and an ample warning lead time of 2.8 seconds, validating the framework's superior performance and practical deployment feasibility in complex environments.
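The conformal prediction-interval step relies on the standard split-conformal recipe: widen point predictions by a calibration-set quantile of residuals to obtain finite-sample coverage. A simplified absolute-residual version (our sketch; CQR proper conformalizes quantile-regression outputs rather than symmetric residuals):

```python
import numpy as np

def split_conformal_interval(cal_y, cal_pred, test_pred, alpha=0.1):
    """Split-conformal intervals: inflate predictions by the conformal quantile
    of calibration residuals, giving ~(1 - alpha) finite-sample coverage."""
    scores = np.sort(np.abs(cal_y - cal_pred))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the conformal quantile
    q = scores[min(k, n) - 1]
    return test_pred - q, test_pred + q

rng = np.random.default_rng(0)
cal_pred = np.zeros(1000)
cal_y = rng.normal(size=1000)  # calibration residuals ~ N(0, 1)
lo, hi = split_conformal_interval(cal_y, cal_pred, test_pred=np.array([0.0]))
print(float(hi[0] - lo[0]))    # interval width close to 2 * 1.645 ~ 3.3
```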


【8】Comparative Analysis of LoRA-Adapted Embedding Models for Clinical Cardiology Text Representation
标题:临床心脏病学文本表示的LoRA适应嵌入模型的比较分析
链接:https://arxiv.org/abs/2511.19739

作者:Richard J. Young,Alice M. Matthews
备注:25 pages, 13 figures, 5 tables
摘要:特定领域的文本嵌入对于临床自然语言处理至关重要，但跨模型架构的系统比较仍然有限。这项研究评估了10个基于transformer的嵌入模型，这些模型通过在来自权威医学教科书的106,535个心脏病学文本对上进行低秩自适应（LoRA）微调而适配于心脏病学领域。结果表明，仅编码器架构（特别是BioLinkBERT）相比更大的基于解码器的模型实现了更优的领域特定性能（分离分数：0.510），同时所需计算资源显著更少。这些发现挑战了较大的语言模型必然产生更好的特定领域嵌入的假设，并为临床NLP系统开发提供了实际指导。所有模型、训练代码和评估数据集都已公开，以支持医学信息学中的可重复研究。
摘要:Domain-specific text embeddings are critical for clinical natural language processing, yet systematic comparisons across model architectures remain limited. This study evaluates ten transformer-based embedding models adapted for cardiology through Low-Rank Adaptation (LoRA) fine-tuning on 106,535 cardiology text pairs derived from authoritative medical textbooks. Results demonstrate that encoder-only architectures, particularly BioLinkBERT, achieve superior domain-specific performance (separation score: 0.510) compared to larger decoder-based models, while requiring significantly fewer computational resources. The findings challenge the assumption that larger language models necessarily produce better domain-specific embeddings and provide practical guidance for clinical NLP system development. All models, training code, and evaluation datasets are publicly available to support reproducible research in medical informatics.
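For readers unfamiliar with the adapter, LoRA freezes the pretrained weight W and learns only a low-rank update delta W = (alpha/r) * B A, which is why fine-tuning ten embedding models this way is cheap. A minimal sketch (dimensions, scaling, and initialization follow the common convention; they are illustrative, not this study's exact configuration):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Forward pass with a LoRA adapter: W stays frozen, only A and B train."""
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T

d_out, d_in, r = 8, 16, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # down-projection
B = np.zeros((d_out, r))                # zero-init: training starts at W exactly
x = rng.normal(size=(4, d_in))
print(np.allclose(lora_forward(x, W, A, B), x @ W.T))  # -> True
print(A.size + B.size, W.size)                         # -> 48 128
```

Only 48 of the 176 parameters are trainable here; at transformer scale the ratio is far smaller.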


【9】An Adaptive, Data-Integrated Agent-Based Modeling Framework for Explainable and Contestable Policy Design
标题:用于可解释和可竞争政策设计的自适应、数据集成的基于代理的建模框架
链接:https://arxiv.org/abs/2511.19726

作者:Roberto Garrone
备注:27 pages, 2 case studies (emissions and smart grids). Preprint prepared during the author's PhD research at the Open University of Cyprus and the University of Milano-Bicocca. Introduces a unified framework for adaptive multi-agent learning with information-theoretic, causal, and clustering diagnostics
摘要:多智能体系统通常在反馈、自适应和非平稳性下运行,然而许多仿真研究保留了静态的决策规则和固定的控制参数。本文介绍了一个通用的自适应多智能体学习框架,它集成了:(i)区分静态与自适应智能体和固定与自适应系统参数的四种动态机制;(ii)信息理论诊断(熵率,统计复杂性和预测信息),以评估可预测性和结构;(iii)明确干预语义的结构因果模型;(iv)从总体或样本数据生成代理级先验的程序;以及(v)识别紧急行为机制的无监督方法。该框架提供了一个领域中立的架构,用于分析学习代理和自适应控制如何共同塑造系统轨迹,从而能够系统地比较非平衡,振荡或漂移动态的稳定性,性能和可解释性。数学定义,计算运营商,和实验设计模板提供,产生一个结构化的方法,用于开发可解释和可扩展的多智能体决策过程。
摘要:Multi-agent systems often operate under feedback, adaptation, and non-stationarity, yet many simulation studies retain static decision rules and fixed control parameters. This paper introduces a general adaptive multi-agent learning framework that integrates: (i) four dynamic regimes distinguishing static versus adaptive agents and fixed versus adaptive system parameters; (ii) information-theoretic diagnostics (entropy rate, statistical complexity, and predictive information) to assess predictability and structure; (iii) structural causal models for explicit intervention semantics; (iv) procedures for generating agent-level priors from aggregate or sample data; and (v) unsupervised methods for identifying emergent behavioral regimes. The framework offers a domain-neutral architecture for analyzing how learning agents and adaptive controls jointly shape system trajectories, enabling systematic comparison of stability, performance, and interpretability across non-equilibrium, oscillatory, or drifting dynamics. Mathematical definitions, computational operators, and an experimental design template are provided, yielding a structured methodology for developing explainable and contestable multi-agent decision processes.


【10】CafeQ: Calibration-free Quantization via Learned Transformations and Adaptive Rounding
标题:CafeQ:通过学习变换和自适应舍入进行免校准量化
链接:https://arxiv.org/abs/2511.19705

作者:Ziteng Sun,Adrian Benton,Samuel Kushnir,Asher Trockman,Vikas Singh,Suhas Diggavi,Ananda Theertha Suresh
摘要:后训练量化是一种有效降低大型语言模型服务成本的方法,其中标准方法是使用舍入到最近的量化级别方案。然而,由于权重中的离群值,这通常会引入较大的误差。提出的缓解机制包括应用自适应舍入、随机旋转变换或使用校准数据提交到训练后目标。不幸的是,这种对校准数据的依赖在某些现实场景中可能受到严重限制,因为这些数据可能不可用或受到隐私法规的限制。在本文中,我们提出了算法来优化变换和自适应舍入,而无需访问任何校准数据。该优化是通过设计一个合适的代理函数的量化损失没有校准数据。为了保持推理效率,我们对单个矩阵进行结构化矩阵变换。对于在计算图中直接交互的成对权重,我们使用对偶矩阵变换和自适应舍入方法。我们在Gemma 2模型上进行实验,并观察到基线的一致改善。对于Gemma 2 9B量化,我们的方法将4位量化的平均基准得分从61.9提高到62.4,将3位量化的平均基准得分从52.0提高到60.6,同时增加不到3%的计算开销。此外,我们的方法实现了与常用的GPTQ方法相当的性能,该方法需要校准数据。
摘要:Post-training quantization is an effective method for reducing the serving cost of large language models, where the standard approach is to use a round-to-nearest quantization level scheme. However, this often introduces large errors due to outliers in the weights. Proposed mitigation mechanisms include applying adaptive rounding, random rotation transformations or committing to a post-training target using calibration data. Unfortunately, this reliance on calibration data can be severely limiting in some real-world scenarios as such data may be unavailable or subject to privacy regulations. In this paper, we propose algorithms to optimize transformations and adaptive rounding without access to any calibration data. The optimization is achieved by designing a suitable proxy function for the quantization loss without calibration data. To maintain inference efficiency, we perform structured matrix transformations for single matrices. For paired weights that interact directly in the computation graph, we use dual matrix transformations and adaptive rounding methods. We conduct experiments on Gemma 2 models, and observe consistent improvement over the baselines. For Gemma 2 9B quantization, our method improves the average benchmark score from 61.9 to 62.4 for 4-bit quantization and from 52.0 to 60.6 for 3-bit quantization, while adding less than 3% of computation overhead. Furthermore, our method achieves performance comparable to the commonly used GPTQ method, which requires calibration data.
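The round-to-nearest baseline that the paper improves on, and the outlier problem it suffers from, fit in a few lines (a generic RTN sketch, not CafeQ's transformations or proxy loss):

```python
import numpy as np

def quantize_rtn(w, n_bits=4):
    """Round-to-nearest uniform quantization over [w.min(), w.max()]."""
    levels = 2 ** n_bits - 1
    scale = (w.max() - w.min()) / levels
    zero = w.min()
    q = np.round((w - zero) / scale)
    return q * scale + zero

w = np.array([-1.0, -0.3, 0.02, 0.9, 8.0])  # one outlier stretches the grid
w_hat = quantize_rtn(w, n_bits=3)
print(np.round(np.abs(w - w_hat), 3))  # small weights absorb large rounding error
```

The outlier is reproduced exactly while the small weights incur errors larger than 0.5, which is exactly the failure mode that transformations and adaptive rounding target.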


【11】Towards Efficient VLMs: Information-Theoretic Driven Compression via Adaptive Structural Pruning
标题:迈向高效的VLM:通过自适应结构修剪实现信息理论驱动的压缩
链接:https://arxiv.org/abs/2511.19518

作者:Zhaoqi Xu,Yingying Zhang,Jian Li,Jianwei Guo,Qiannan Zhu,Hua Huang
摘要:视觉语言模型(VLM)的最新进展在多模态任务中表现出了卓越的性能,但其不断增长的规模对部署和效率提出了严峻的挑战。现有的压缩方法往往依赖于启发式的重要性度量或经验修剪规则,缺乏理论保证的信息保护。在这项工作中,我们提出了InfoPrune,自适应结构压缩的VLMs的信息理论框架。接地的信息瓶颈原则,我们制定修剪之间的权衡保留任务相关的语义和丢弃冗余的依赖关系。为了量化每个注意力头的贡献,我们引入了一个基于熵的有效秩(eRank),并采用Kolmogorov-Smirnov(KS)距离来衡量原始结构和压缩结构之间的分歧。这产生了一个统一的标准,共同考虑结构稀疏性和信息效率。在此基础上,我们进一步设计了两个互补的方案:(1)一个基于训练的头部修剪的建议的信息损失目标的指导下,和(2)通过自适应低秩近似训练免费FFN压缩。在VQAv 2、TextVQA和GQA上进行的大量实验表明,InfoPrune实现了高达3.2倍的FLOP减少和1.8倍的加速,性能下降可以忽略不计,为高效的多模态大型模型建立了理论基础和实际有效的一步。
摘要 :Recent advances in vision-language models (VLMs) have shown remarkable performance across multimodal tasks, yet their ever-growing scale poses severe challenges for deployment and efficiency. Existing compression methods often rely on heuristic importance metrics or empirical pruning rules, lacking theoretical guarantees about information preservation. In this work, we propose InfoPrune, an information-theoretic framework for adaptive structural compression of VLMs. Grounded in the Information Bottleneck principle, we formulate pruning as a trade-off between retaining task-relevant semantics and discarding redundant dependencies. To quantify the contribution of each attention head, we introduce an entropy-based effective rank (eRank) and employ the Kolmogorov--Smirnov (KS) distance to measure the divergence between original and compressed structures. This yields a unified criterion that jointly considers structural sparsity and informational efficiency. Building on this foundation, we further design two complementary schemes: (1) a training-based head pruning guided by the proposed information loss objective, and (2) a training-free FFN compression via adaptive low-rank approximation. Extensive experiments on VQAv2, TextVQA, and GQA demonstrate that InfoPrune achieves up to 3.2x FLOP reduction and 1.8x acceleration with negligible performance degradation, establishing a theoretically grounded and practically effective step toward efficient multimodal large models.
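摘要中提到的两个核心度量,即基于熵的有效秩(eRank)与两样本Kolmogorov-Smirnov(KS)距离,都可以用几行代码演示。下面是一个仅作示意的实现,与论文中针对注意力头的具体用法无关:

```python
import numpy as np

def effective_rank(mat):
    """eRank: exponential of the Shannon entropy of normalized singular values."""
    s = np.linalg.svd(mat, compute_uv=False)
    p = s / s.sum()
    p = p[p > 1e-12]                      # drop numerically-zero values
    return float(np.exp(-(p * np.log(p)).sum()))

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    grid = np.sort(np.concatenate([a, b]))
    cdf = lambda x: np.searchsorted(np.sort(x), grid, side="right") / len(x)
    return float(np.max(np.abs(cdf(a) - cdf(b))))

rng = np.random.default_rng(1)
full = rng.normal(size=(64, 64))                            # near full-rank
low = rng.normal(size=(64, 2)) @ rng.normal(size=(2, 64))   # exactly rank 2

er_full, er_low = effective_rank(full), effective_rank(low)
ks_same = ks_distance(full.ravel(), full.ravel())           # identical samples -> 0
```

eRank对秩为2的矩阵给出不超过2的值,而对随机满秩矩阵给出远大于2的值;KS距离则在两组样本分布一致时为0,二者合起来即可构成摘要所说的"结构稀疏性+信息效率"的统一判据。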


强化学习(6篇)

【1】Attention Trajectories as a Diagnostic Axis for Deep Reinforcement Learning
标题:注意力轨迹作为深度强化学习的诊断轴
链接:https://arxiv.org/abs/2511.20591

作者:Charlotte Beylier,Hannah Selder,Arthur Fleig,Simon M. Hofmann,Nico Scherf
摘要:除了学习算法的数学表述之外,人们对强化学习(RL)智能体的学习过程仍然知之甚少。为弥补这一空白,我们引入了面向注意力的指标(ATOMs),用于研究RL智能体在训练过程中注意力的发展。在一项对照实验中,我们在Pong游戏的三种变体上测试了ATOMs,每种变体旨在教会智能体不同的行为,并辅以行为评估。ATOMs成功刻画了在每种游戏变体上训练的智能体的注意力模式,并且这些注意力模式上的差异转化为智能体行为上的差异。通过在训练期间对ATOMs的持续监测,我们观察到智能体的注意力是分阶段发展的,且这些阶段在不同游戏变体间保持一致。总体而言,我们相信ATOMs有助于增进我们对RL智能体学习过程的理解,并更好地理解注意力与学习之间的关系。
摘要:The learning process of a reinforcement learning (RL) agent remains poorly understood beyond the mathematical formulation of its learning algorithm. To address this gap, we introduce attention-oriented metrics (ATOMs) to investigate the development of an RL agent's attention during training. In a controlled experiment, we tested ATOMs on three variations of a Pong game, each designed to teach the agent distinct behaviours, complemented by a behavioural assessment. ATOMs successfully delineate the attention patterns of an agent trained on each game variation, and that these differences in attention patterns translate into differences in the agent's behaviour. Through continuous monitoring of ATOMs during training, we observed that the agent's attention developed in phases, and that these phases were consistent across game variations. Overall, we believe that ATOM could help improve our understanding of the learning processes of RL agents and better understand the relationship between attention and learning.


【2】Quantum-Enhanced Reinforcement Learning for Accelerating Newton-Raphson Convergence with Ising Machines: A Case Study for Power Flow Analysis
标题:利用伊辛机加速牛顿-拉夫森收敛的量子增强强化学习:潮流分析的案例研究
链接:https://arxiv.org/abs/2511.20237

作者:Zeynab Kaseb,Matthias Moller,Lindsay Spoor,Jerry J. Guo,Yu Xiang,Peter Palensky,Pedro P. Vergara
备注:10 pages, 9 figures, 4 tables
摘要:牛顿-拉夫逊法(NR)因其二次收敛性而被广泛用于求解电力系统潮流(PF)方程。然而,在初始化不佳或极端运行场景(例如可再生能源渗透率较高)下,其性能会恶化。传统的NR初始化策略通常无法应对这些挑战,导致收敛缓慢甚至发散。我们提出使用强化学习(RL)来优化NR的初始化,并引入一种新颖的量子增强RL环境更新机制:通过将电压调整任务公式化为二次无约束二元优化问题,来缓解在每个RL时间步上对组合规模巨大的动作空间中的电力系统状态进行评估所带来的显著计算成本。具体而言,量子/数字退火器被集成到RL环境更新中,利用为PF设计的问题哈密顿量来评估状态转移。结果表明,该方法显著提高了收敛速度,减少了NR迭代次数,并在不同运行条件下增强了鲁棒性。
摘要:The Newton-Raphson (NR) method is widely used for solving power flow (PF) equations due to its quadratic convergence. However, its performance deteriorates under poor initialization or extreme operating scenarios, e.g., high levels of renewable energy penetration. Traditional NR initialization strategies often fail to address these challenges, resulting in slow convergence or even divergence. We propose the use of reinforcement learning (RL) to optimize the initialization of NR, and introduce a novel quantum-enhanced RL environment update mechanism to mitigate the significant computational cost of evaluating power system states over a combinatorially large action space at each RL timestep by formulating the voltage adjustment task as a quadratic unconstrained binary optimization problem. Specifically, quantum/digital annealers are integrated into the RL environment update to evaluate state transitions using a problem Hamiltonian designed for PF. Results demonstrate significant improvements in convergence speed, a reduction in NR iteration counts, and enhanced robustness under different operating conditions.
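摘要中的关键一步是把电压调整任务写成二次无约束二元优化(QUBO)问题,再交给量子/数字退火器求最低能量解。下面用一个4变量的玩具QUBO演示能量函数以及退火器所要逼近的穷举最优解;系数纯属虚构,并非真实的潮流哈密顿量:

```python
import numpy as np
from itertools import product

def qubo_energy(x, Q):
    """QUBO energy E(x) = x^T Q x for a binary vector x."""
    return float(x @ Q @ x)

def brute_force_qubo(Q):
    """Exhaustive search over all 2^n bit vectors; a stand-in for the annealer."""
    n = Q.shape[0]
    best_x, best_e = None, np.inf
    for bits in product([0, 1], repeat=n):
        x = np.array(bits, dtype=float)
        e = qubo_energy(x, Q)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

# Toy 4-variable QUBO: negative diagonal rewards switching bits on,
# positive couplings penalize turning on adjacent pairs together.
Q = np.array([[-1.0,  0.4,  0.0, 0.0],
              [ 0.0, -1.0,  0.4, 0.0],
              [ 0.0,  0.0, -1.0, 0.4],
              [ 0.0,  0.0,  0.0, 0.5]])
x_opt, e_opt = brute_force_qubo(Q)   # optimum is x = (1, 1, 1, 0), E = -2.2
```

真实问题中变量数远超穷举范围,这正是引入退火器的原因;这里的穷举只是为了让最优解可以手工核对。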


【3】Leveraging weights signals - Predicting and improving generalizability in reinforcement learning
标题:利用权重信号:预测并改进强化学习中的泛化能力
链接:https://arxiv.org/abs/2511.20234

作者:Olivier Moulin,Vincent Francois-lavet,Paul Elbers,Mark Hoogendoorn
摘要:强化学习(RL)代理的泛化能力(即在不同于训练环境的环境中执行任务的能力)是一个关键问题,因为代理往往会过拟合其训练环境。为了解决这一问题并提供一种提升RL代理泛化能力的方案,我们引入了一种基于代理神经网络内部权重来预测其泛化得分的新方法。利用这一预测能力,我们对近端策略优化(PPO)的损失函数提出了若干修改,以提升使用该升级版本训练的代理的泛化得分。实验结果表明,与原始版本相比,我们改进的PPO算法产生的代理具有更强的泛化能力。
摘要:Generalizability of Reinforcement Learning (RL) agents (ability to perform on environments different from the ones they have been trained on) is a key problem as agents have the tendency to overfit to their training environments. In order to address this problem and offer a solution to increase the generalizability of RL agents, we introduce a new methodology to predict the generalizability score of RL agents based on the internal weights of the agent's neural networks. Using this prediction capability, we propose some changes in the Proximal Policy Optimization (PPO) loss function to boost the generalization score of the agents trained with this upgraded version. Experimental results demonstrate that our improved PPO algorithm yields agents with stronger generalizability compared to the original version.


【4】Optimize Flip Angle Schedules In MR Fingerprinting Using Reinforcement Learning
标题:使用强化学习优化MR指纹识别中的翻转角度计划
链接:https://arxiv.org/abs/2511.19941

作者:Shenjun Zhong,Zhifeng Chen,Zhaolin Chen
备注:4 pages, 5 figures, submitted to conference
摘要:磁共振指纹(MRF)利用由可调采集参数产生的瞬态信号动态特性,这使得设计最优且鲁棒的序列成为一个复杂的高维序贯决策问题,例如对关键参数之一(翻转角)的优化。强化学习(RL)为自动选择参数、优化脉冲序列以在参数空间中最大化指纹的可区分性提供了一种有前景的途径。在这项工作中,我们引入了一个用于优化MRF中翻转角调度的RL框架,并展示了一个学习得到的调度,其呈现出可增强指纹可分离性的非周期模式。此外,一个有趣的观察是,经RL优化的调度有可能减少重复时间(TR)的次数,从而潜在地加速MRF采集。
摘要 :Magnetic Resonance Fingerprinting (MRF) leverages transient-state signal dynamics generated by the tunable acquisition parameters, making the design of an optimal, robust sequence a complex, high-dimensional sequential decision problem, such as optimizing one of the key parameters, flip angle. Reinforcement learning (RL) offers a promising approach to automate parameter selection, to optimize pulse sequences that maximize the distinguishability of fingerprints across the parameter space. In this work, we introduce an RL framework for optimizing the flip-angle schedule in MRF and demonstrate a learned schedule exhibiting non-periodic patterns that enhances fingerprint separability. Additionally, an interesting observation is that the RL-optimized schedule may enable a reduction in the number of repetition time, potentially accelerate MRF acquisitions.


【5】Reinforcement Learning with $ω$-Regular Objectives and Constraints
标题:具有$ω$-正则目标和约束的强化学习
链接:https://arxiv.org/abs/2511.19849

作者:Dominik Wagner,Leon Witzman,Luke Ong
摘要:强化学习(RL)通常依赖标量奖励,其表达时间性、条件性或安全关键目标的能力有限,并可能导致奖励投机(reward hacking)。可由更一般的$ω$-正则目标类表达的时态逻辑,通过精确指定丰富的行为属性来解决这一问题。即便如此,用单一标量(无论是奖励还是满足概率)来衡量性能,仍会掩盖在存在可容忍风险水平的场景中出现的安全与性能之间的权衡。我们将$ω$-正则目标与显式约束相结合,同时解决这两个局限,使安全要求与优化目标得以分别处理。我们开发了一种基于线性规划的、基于模型的RL算法,它在极限意义下产生一个策略,在最大化满足$ω$-正则目标的概率的同时,使$ω$-正则约束保持在指定阈值之内。此外,我们建立了到受约束极限平均问题的转化,并给出保持最优性的保证。
摘要:Reinforcement learning (RL) commonly relies on scalar rewards with limited ability to express temporal, conditional, or safety-critical goals, and can lead to reward hacking. Temporal logic expressible via the more general class of $ω$-regular objectives addresses this by precisely specifying rich behavioural properties. Even still, measuring performance by a single scalar (be it reward or satisfaction probability) masks safety-performance trade-offs that arise in settings with a tolerable level of risk.   We address both limitations simultaneously by combining $ω$-regular objectives with explicit constraints, allowing safety requirements and optimisation targets to be treated separately. We develop a model-based RL algorithm based on linear programming, which in the limit produces a policy maximising the probability of satisfying an $ω$-regular objective while also adhering to $ω$-regular constraints within specified thresholds. Furthermore, we establish a translation to constrained limit-average problems with optimality-preserving guarantees.
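摘要中"基于线性规划的、基于模型的RL算法"可以用占用测度(occupancy measure)LP来理解。下面给出一个退化到单状态(两臂老虎机)的受约束折扣MDP示例,便于手工验证;把ω-正则目标化简为标量奖励/代价只是为了演示,并非论文的完整构造:

```python
import numpy as np
from scipy.optimize import linprog

# Occupancy-measure LP for a constrained MDP, reduced to a single state
# (a two-armed bandit with self-loops) so the optimum is easy to check.
gamma = 0.9
rewards = np.array([1.0, 2.0])   # safe arm, risky arm
costs = np.array([0.0, 1.0])     # only the risky arm incurs cost
budget = 4.0                     # bound on expected discounted cost

# Variables: occupancy measures mu[a] >= 0.
# Flow conservation (single state, self-loops): (1 - gamma) * sum_a mu[a] = 1.
res = linprog(
    c=-rewards,                           # linprog minimizes, so negate reward
    A_ub=[costs], b_ub=[budget],          # expected-cost constraint
    A_eq=[[1 - gamma, 1 - gamma]], b_eq=[1.0],
    bounds=[(0, None), (0, None)],
)
mu = res.x
value = rewards @ mu   # optimal constrained discounted return
# Total mass is 1/(1-gamma) = 10; the cost bound caps the risky arm at 4,
# so the optimum puts mu = (6, 4) and achieves value 6 + 8 = 14.
```

一般情形下,每个状态都对应一条流守恒等式约束,而目标/约束的满足概率由到达相应接受组件的占用测度给出;这里的单状态版本只保留了LP的骨架。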


【6】Learning to Clean: Reinforcement Learning for Noisy Label Correction
标题:学会清洗:用于噪声标签纠正的强化学习
链接:https://arxiv.org/abs/2511.19808

作者:Marzi Heidari,Hanping Zhang,Yuhong Guo
备注:NeurIPS 2025
摘要:在机器学习中,使用噪声标签进行学习的挑战是非常重要的,因为如果处理不当,它会严重降低预测模型的性能。本文介绍了一种新的框架,概念化的噪声标签校正作为一个强化学习(RL)问题。所提出的方法,用于噪声标签校正的强化学习(RLNLC),定义了一个表示数据及其相关标签的综合状态空间,一个指示可能的标签校正的动作空间,以及一个评估标签校正效果的奖励机制。RLNLC学习一个基于深度特征表示的策略网络,通过强化学习来执行标签校正,利用演员-评论家方法。学习的策略随后被部署以迭代地校正噪声训练标签,并促进预测模型的训练。RLNLC的有效性通过在多个基准数据集上进行的大量实验来证明,它始终优于现有的最先进的带噪声标签学习技术。
摘要:The challenge of learning with noisy labels is significant in machine learning, as it can severely degrade the performance of prediction models if not addressed properly. This paper introduces a novel framework that conceptualizes noisy label correction as a reinforcement learning (RL) problem. The proposed approach, Reinforcement Learning for Noisy Label Correction (RLNLC), defines a comprehensive state space representing data and their associated labels, an action space that indicates possible label corrections, and a reward mechanism that evaluates the efficacy of label corrections. RLNLC learns a deep feature representation based policy network to perform label correction through reinforcement learning, utilizing an actor-critic method. The learned policy is subsequently deployed to iteratively correct noisy training labels and facilitate the training of the prediction model. The effectiveness of RLNLC is demonstrated through extensive experiments on multiple benchmark datasets, where it consistently outperforms existing state-of-the-art techniques for learning with noisy labels.


医学相关(5篇)

【1】MTBBench: A Multimodal Sequential Clinical Decision-Making Benchmark in Oncology
标题:MTBBench:肿瘤学的多模式顺序临床决策基准
链接:https://arxiv.org/abs/2511.20490

作者:Kiril Vasilev,Alexandre Misrahi,Eeshaan Jain,Phil F Cheng,Petros Liakopoulos,Olivier Michielin,Michael Moor,Charlotte Bunne
备注:Accepted to NeurIPS 2025
摘要:多模态大型语言模型(LLM)为生物医学推理带来了希望,但目前的基准测试未能捕捉到现实世界临床工作流程的复杂性。现有的评估主要评估单峰,脱离语境的问题回答,忽视多代理决策环境,如分子肿瘤委员会(MTB)。MTB汇集了肿瘤学的不同专家,诊断和预后任务需要整合异质数据并随着时间的推移不断发展见解。目前的基准缺乏这种纵向和多式联运的复杂性。我们介绍MTBBench,这是一个通过临床挑战性,多模式和纵向肿瘤学问题模拟MTB式决策的代理基准。地面实况注释由临床医生通过共同开发的应用程序进行验证,确保临床相关性。我们对多个开源和闭源LLM进行了基准测试,结果表明,即使在大规模的情况下,它们也缺乏可靠性--经常产生幻觉,难以从时间分辨的数据中进行推理,并且未能协调相互冲突的证据或不同的模式。为了解决这些局限性,MTBBench超越了基准测试,提供了一个代理框架,该框架具有基于基础模型的工具,可以增强多模态和纵向推理,分别使任务级性能提高9.0%和11.2%。总体而言,MTBBench提供了一个具有挑战性和现实性的测试平台,用于推进多模态LLM推理,可靠性和工具使用,重点关注精确肿瘤学中的MTB环境。
摘要:Multimodal Large Language Models (LLMs) hold promise for biomedical reasoning, but current benchmarks fail to capture the complexity of real-world clinical workflows. Existing evaluations primarily assess unimodal, decontextualized question-answering, overlooking multi-agent decision-making environments such as Molecular Tumor Boards (MTBs). MTBs bring together diverse experts in oncology, where diagnostic and prognostic tasks require integrating heterogeneous data and evolving insights over time. Current benchmarks lack this longitudinal and multimodal complexity. We introduce MTBBench, an agentic benchmark simulating MTB-style decision-making through clinically challenging, multimodal, and longitudinal oncology questions. Ground truth annotations are validated by clinicians via a co-developed app, ensuring clinical relevance. We benchmark multiple open and closed-source LLMs and show that, even at scale, they lack reliability -- frequently hallucinating, struggling with reasoning from time-resolved data, and failing to reconcile conflicting evidence or different modalities. To address these limitations, MTBBench goes beyond benchmarking by providing an agentic framework with foundation model-based tools that enhance multi-modal and longitudinal reasoning, leading to task-level performance gains of up to 9.0% and 11.2%, respectively. Overall, MTBBench offers a challenging and realistic testbed for advancing multimodal LLM reasoning, reliability, and tool-use with a focus on MTB environments in precision oncology.


【2】KOM: A Multi-Agent Artificial Intelligence System for Precision Management of Knee Osteoarthritis (KOA)
标题:KOM:用于精确管理膝关节骨关节炎(KOA)的多智能体人工智能系统
链接:https://arxiv.org/abs/2511.19798

作者:Weizhi Liu,Xi Chen,Zekun Jiang,Liang Zhao,Kunyuan Jiang,Ruisi Tang,Li Wang,Mingke You,Hanyu Zhou,Hongyu Chen,Qiankun Xiong,Yong Nie,Kang Li,Jian Li
摘要 :膝关节骨关节炎(KOA)影响全球超过6亿人,并与严重疼痛,功能障碍和残疾有关。虽然个性化的多学科干预有可能减缓疾病进展并提高生活质量,但它们通常需要大量的医疗资源和专业知识,因此难以在资源有限的环境中实施。为了应对这一挑战,我们开发了KOM,这是一个多代理系统,旨在自动化KOA评估,风险预测和治疗处方。该系统可帮助临床医生执行KOA护理路径中的基本任务,并支持根据患者个人资料、疾病状态、风险因素和禁忌症生成量身定制的管理计划。在基准测试实验中,KOM在成像分析和处方生成方面表现出优于几种通用大型语言模型的性能。一项随机三臂模拟研究进一步显示,与单独使用的每种方法相比,KOM和临床医生之间的合作将总诊断和计划时间减少了38.5%,并提高了治疗质量。这些发现表明,KOM可以帮助促进自动化KOA管理,并且当集成到临床工作流程中时,有可能提高护理效率。KOM的模块化架构还可以为开发其他慢性病的AI辅助管理系统提供有价值的见解。
摘要:Knee osteoarthritis (KOA) affects more than 600 million individuals globally and is associated with significant pain, functional impairment, and disability. While personalized multidisciplinary interventions have the potential to slow disease progression and enhance quality of life, they typically require substantial medical resources and expertise, making them difficult to implement in resource-limited settings. To address this challenge, we developed KOM, a multi-agent system designed to automate KOA evaluation, risk prediction, and treatment prescription. This system assists clinicians in performing essential tasks across the KOA care pathway and supports the generation of tailored management plans based on individual patient profiles, disease status, risk factors, and contraindications. In benchmark experiments, KOM demonstrated superior performance compared to several general-purpose large language models in imaging analysis and prescription generation. A randomized three-arm simulation study further revealed that collaboration between KOM and clinicians reduced total diagnostic and planning time by 38.5% and resulted in improved treatment quality compared to each approach used independently. These findings indicate that KOM could help facilitate automated KOA management and, when integrated into clinical workflows, has the potential to enhance care efficiency. The modular architecture of KOM may also offer valuable insights for developing AI-assisted management systems for other chronic conditions.


【3】Hierarchical Dual-Strategy Unlearning for Biomedical and Healthcare Intelligence Using Imperfect and Privacy-Sensitive Medical Data
标题:使用不完美且隐私敏感的医疗数据进行生物医学和医疗保健智能的分层双策略遗忘学习
链接:https://arxiv.org/abs/2511.19498

作者:Yi Zhang,Tianxiang Xu,Zijian Li,Chao Zhang,Kunyu Zhang,Zhan Gao,Meinuo Li,Xiaohan Zhang,Qichao Qi,Bing Chen
摘要:大型语言模型(LLM)表现出卓越的性能,但由于训练数据记忆,特别是在涉及不完善或隐私敏感的患者信息的医疗保健环境中,会带来巨大的隐私风险。我们提出了一个层次化的双策略框架,选择性的知识遗忘,精确地删除专业知识,同时保留基本的医疗能力。我们的方法协同集成了几何约束梯度更新,以通过统一的四级医学概念层次结构,通过概念感知的令牌级干预来选择性地调节目标参数,这些干预可以区分关键性令牌和非学习目标令牌。对MedMCQA(手术)和MHQA(焦虑、抑郁、创伤)数据集的综合评估显示出卓越的性能,达到了82.7%的遗忘率和88.5%的知识保存率。值得注意的是,我们的框架保持了强大的隐私保证,同时只需要修改0.1%的参数,满足了临床研究中对法规遵从性、可验证性和道德标准的关键需求。
摘要:Large language models (LLMs) exhibit exceptional performance but pose substantial privacy risks due to training data memorization, particularly within healthcare contexts involving imperfect or privacy-sensitive patient information. We present a hierarchical dual-strategy framework for selective knowledge unlearning that precisely removes specialized knowledge while preserving fundamental medical competencies. Our approach synergistically integrates geometric-constrained gradient updates to selectively modulate target parameters with concept-aware token-level interventions that distinguish between preservation-critical and unlearning-targeted tokens via a unified four-level medical concept hierarchy. Comprehensive evaluations on the MedMCQA (surgical) and MHQA (anxiety, depression, trauma) datasets demonstrate superior performance, achieving an 82.7% forgetting rate and 88.5% knowledge preservation. Notably, our framework maintains robust privacy guarantees while requiring modification of only 0.1% of parameters, addressing critical needs for regulatory compliance, auditability, and ethical standards in clinical research.


【4】Masked Autoencoder Joint Learning for Robust Spitzoid Tumor Classification
标题:用于Spitzoid肿瘤鲁棒分类的掩蔽自编码器联合学习
链接:https://arxiv.org/abs/2511.19535

作者:Ilán Carretero,Roshni Mahtani,Silvia Perez-Deben,José Francisco González-Muñoz,Carlos Monteagudo,Valery Naranjo,Rocío del Amor
备注:Accepted in CASEIB 2025
摘要:Spitzoid肿瘤(ST)的准确诊断是确保良好预后和避免治疗不足和过度的关键。表观遗传学数据,特别是DNA甲基化,为这项任务提供了有价值的信息来源。然而,先前的研究假设完整的数据,这是一个不现实的设置,因为甲基化图谱经常包含由于有限的覆盖范围和实验伪影而丢失的条目。我们的工作挑战了这些有利的场景,并引入了ReMAC,这是ReMasker的扩展,旨在解决完整和不完整制度下高维数据的分类任务。对真实临床数据的评价表明,与ST分层的竞争分类方法相比,ReMAC实现了强大和稳健的性能。代码可在https://github.com/roshni-mahtani/ReMAC上获得。
摘要:Accurate diagnosis of spitzoid tumors (ST) is critical to ensure a favorable prognosis and to avoid both under- and over-treatment. Epigenetic data, particularly DNA methylation, provide a valuable source of information for this task. However, prior studies assume complete data, an unrealistic setting as methylation profiles frequently contain missing entries due to limited coverage and experimental artifacts. Our work challenges these favorable scenarios and introduces ReMAC, an extension of ReMasker designed to tackle classification tasks on high-dimensional data under complete and incomplete regimes. Evaluation on real clinical data demonstrates that ReMAC achieves strong and robust performance compared to competing classification methods in the stratification of ST. Code is available: https://github.com/roshni-mahtani/ReMAC.


【5】A Multi-Stage Deep Learning Framework with PKCP-MixUp Augmentation for Pediatric Liver Tumor Diagnosis Using Multi-Phase Contrast-Enhanced CT
标题:具有PKCP-MixUp增强功能的多阶段深度学习框架,用于使用多期对比增强CT进行儿科肝肿瘤诊断
链接:https://arxiv.org/abs/2511.19478

作者:Wanqi Wang,Chun Yang,Jianbo Shao,Yaokai Zhang,Xuehua Peng,Jin Sun,Chao Xiong,Long Lu,Lianting Hu
摘要:小儿肝脏肿瘤是儿科最常见的实体肿瘤之一,其良恶性状态和病理分类的鉴别对于临床治疗至关重要。虽然病理学检查是金标准,但侵入性活检具有明显的局限性:高度血管化的儿科肝脏和脆弱的肿瘤组织增加了出血等并发症的风险;此外,依从性差的幼儿需要麻醉进行活检,增加了医疗费用或心理创伤。尽管已经做出了许多努力在临床环境中利用AI,但大多数研究人员都忽视了它在儿科肝脏肿瘤中的重要性。为了建立一种非侵入性的检查程序,我们开发了一个多阶段深度学习(DL)框架,用于使用多相对比增强CT进行自动儿科肝脏肿瘤诊断。入组了两个回顾性和前瞻性队列。我们建立了一种新的PKCP-MixUp数据增强方法来解决数据稀缺和类不平衡问题。我们还训练了一个肿瘤检测模型来提取ROI,然后设置了一个两阶段的诊断管道,其中有三个主干,带有ROI掩蔽图像。我们的肿瘤检测模型达到了很高的性能(mAP=0.871),良性和恶性肿瘤之间的第一阶段分类模型达到了优异的性能(AUC=0.989)。最终诊断模型也表现出稳健性,包括良性亚型分类(AUC=0.915)和恶性亚型分类(AUC=0.979)。我们还进行了多层次的比较分析,例如对数据和训练管道的消融研究,以及Shapley-Value和CAM可解释性分析。该框架填补了儿科特异性DL诊断的空白,为CT分期选择和模型设计提供了可操作的见解,并为精确,可访问的儿科肝脏肿瘤诊断铺平了道路。
摘要 :Pediatric liver tumors are one of the most common solid tumors in pediatrics, with differentiation of benign or malignant status and pathological classification critical for clinical treatment. While pathological examination is the gold standard, the invasive biopsy has notable limitations: the highly vascular pediatric liver and fragile tumor tissue raise complication risks such as bleeding; additionally, young children with poor compliance require anesthesia for biopsy, increasing medical costs or psychological trauma. Although many efforts have been made to utilize AI in clinical settings, most researchers have overlooked its importance in pediatric liver tumors. To establish a non-invasive examination procedure, we developed a multi-stage deep learning (DL) framework for automated pediatric liver tumor diagnosis using multi-phase contrast-enhanced CT. Two retrospective and prospective cohorts were enrolled. We established a novel PKCP-MixUp data augmentation method to address data scarcity and class imbalance. We also trained a tumor detection model to extract ROIs, and then set a two-stage diagnosis pipeline with three backbones with ROI-masked images. Our tumor detection model has achieved high performance (mAP=0.871), and the first stage classification model between benign and malignant tumors reached an excellent performance (AUC=0.989). Final diagnosis models also exhibited robustness, including benign subtype classification (AUC=0.915) and malignant subtype classification (AUC=0.979). We also conducted multi-level comparative analyses, such as ablation studies on data and training pipelines, as well as Shapley-Value and CAM interpretability analyses. This framework fills the pediatric-specific DL diagnostic gap, provides actionable insights for CT phase selection and model design, and paves the way for precise, accessible pediatric liver tumor diagnosis.
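论文提出的PKCP-MixUp增强方法未在摘要中给出细节;作为背景,下面示意其所基于的标准MixUp数据增强(对两个样本及其one-hot标签做凸组合)。数组均为随机占位数据,并非论文的具体实现:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Standard MixUp: convex combination of two samples and their
    one-hot labels with a Beta-distributed mixing coefficient."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # lam in (0, 1)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

rng = np.random.default_rng(0)
img_a, img_b = rng.random((32, 32)), rng.random((32, 32))   # toy "CT slices"
lab_a, lab_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # one-hot labels
x_mix, y_mix = mixup(img_a, lab_a, img_b, lab_b, rng=rng)
# y_mix is a soft label that still sums to 1.
```

这类混合在类别不平衡场景中常用于为少数类合成中间样本;PKCP-MixUp如何针对多期增强CT做调整,需参见原文。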


蒸馏|知识提取(1篇)

【1】Modality-Balanced Collaborative Distillation for Multi-Modal Domain Generalization
标题:用于多模态领域泛化的模态平衡协同蒸馏
链接:https://arxiv.org/abs/2511.20258

作者:Xiaohan Wang,Zhangtao Cheng,Ting Zhong,Leiting Chen,Fan Zhou
摘要:加权平均(WA)已经成为一种强大的技术,通过促进收敛到平坦的损失景观,这与更强的分布外性能相关,从而增强泛化能力。然而,将WA直接应用于多模态域泛化(MMDG)是具有挑战性的:模态之间优化速度的差异导致WA在早期阶段过拟合到收敛速度更快的模态,抑制了较慢但互补的模态的贡献,从而阻碍了有效的模态融合,并使损失表面偏向更尖锐,更难推广的最小值。为了解决这个问题,我们提出了MBCD,一个统一的协作蒸馏框架,保留WA的平坦诱导的优势,同时克服其缺点,在多模态的情况下。MBCD从学生模型中的自适应模态退出开始,以抑制对主导模态的早期偏见。然后,梯度一致性约束将单峰分支和融合表示之间的学习信号对齐,鼓励协调和更平滑的优化。最后,基于WA的教师通过将融合的知识转移到每个单模态分支来进行跨模态蒸馏,这加强了跨模态的相互作用,并将收敛引向更平坦的解决方案。在MMDG基准测试上进行的大量实验表明,MBCD始终优于现有方法,在不同的未知领域中实现了卓越的准确性和鲁棒性。
摘要:Weight Averaging (WA) has emerged as a powerful technique for enhancing generalization by promoting convergence to a flat loss landscape, which correlates with stronger out-of-distribution performance. However, applying WA directly to multi-modal domain generalization (MMDG) is challenging: differences in optimization speed across modalities lead WA to overfit to faster-converging ones in early stages, suppressing the contribution of slower yet complementary modalities, thereby hindering effective modality fusion and skewing the loss surface toward sharper, less generalizable minima. To address this issue, we propose MBCD, a unified collaborative distillation framework that retains WA's flatness-inducing advantages while overcoming its shortcomings in multi-modal contexts. MBCD begins with adaptive modality dropout in the student model to curb early-stage bias toward dominant modalities. A gradient consistency constraint then aligns learning signals between uni-modal branches and the fused representation, encouraging coordinated and smoother optimization. Finally, a WA-based teacher conducts cross-modal distillation by transferring fused knowledge to each uni-modal branch, which strengthens cross-modal interactions and steer convergence toward flatter solutions. Extensive experiments on MMDG benchmarks show that MBCD consistently outperforms existing methods, achieving superior accuracy and robustness across diverse unseen domains.
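摘要中反复出现的加权平均(WA)操作本身很简单:对若干检查点的同名参数逐项取平均。下面用普通浮点数代替张量给出示意,与MBCD的蒸馏流程本身无关:

```python
import copy

def average_weights(state_dicts):
    """Uniform weight averaging (WA) over checkpoints sharing the same keys."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        for sd in state_dicts[1:]:
            avg[key] = avg[key] + sd[key]
        avg[key] = avg[key] / len(state_dicts)
    return avg

# Toy "checkpoints" as plain dicts of floats (stand-ins for tensors)
ckpts = [{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}, {"w": 2.0, "b": 1.0}]
wa = average_weights(ckpts)   # {"w": 2.0, "b": 1.0}
```

真实实现中同样的循环作用于PyTorch的`state_dict()`张量;MBCD的贡献在于何时、对哪些分支应用这一平均,而非平均操作本身。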


聚类(1篇)

【1】Clustering Approaches for Mixed-Type Data: A Comparative Study
标题:混合类型数据的聚类方法:比较研究
链接:https://arxiv.org/abs/2511.19755

作者:Badih Ghattas,Alvaro Sanchez San-Benito
摘要:聚类广泛用于无监督学习,以在数据集中找到同质的观测组。然而,对混合类型数据进行聚类仍然是一个挑战,因为现有方法中很少有适合这一任务的。本研究介绍了这些方法的最新进展,并使用多种仿真模型对它们进行比较。参与比较的方法包括基于距离的方法k-prototypes、PDQ和凸k-means,以及概率方法KAMILA(KAy-means for MIxed LArge data)、贝叶斯网络混合模型(MBN)和潜类模型(LCM)。其目的是通过改变若干实验因素,如聚类数量、聚类重叠程度、样本量、维度、数据集中连续变量的比例以及聚类的分布,来深入了解不同方法在各种场景下的行为。聚类重叠程度、连续变量在数据集中的比例以及样本量对观测到的性能有显著影响。当变量之间存在强相互作用、同时又显式依赖于聚类隶属关系时,所有被评估的方法均未表现出令人满意的性能。在我们的实验中,就调整兰德指数(ARI)而言,KAMILA、LCM和k-prototypes表现最佳。所有方法均可在R中获得。
摘要:Clustering is widely used in unsupervised learning to find homogeneous groups of observations within a dataset. However, clustering mixed-type data remains a challenge, as few existing approaches are suited for this task. This study presents the state-of-the-art of these approaches and compares them using various simulation models. The compared methods include the distance-based approaches k-prototypes, PDQ, and convex k-means, and the probabilistic methods KAy-means for MIxed LArge data (KAMILA), the mixture of Bayesian networks (MBNs), and latent class model (LCM). The aim is to provide insights into the behavior of different methods across a wide range of scenarios by varying some experimental factors such as the number of clusters, cluster overlap, sample size, dimension, proportion of continuous variables in the dataset, and clusters' distribution. The degree of cluster overlap and the proportion of continuous variables in the dataset and the sample size have a significant impact on the observed performances. When strong interactions exist between variables alongside an explicit dependence on cluster membership, none of the evaluated methods demonstrated satisfactory performance. In our experiments KAMILA, LCM, and k-prototypes exhibited the best performance, with respect to the adjusted rand index (ARI). All the methods are available in R.
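文中以调整兰德指数(ARI)作为比较各方法的指标。ARI可由两种划分的列联表直接算出,下面是一个示意实现:对相同划分(即使聚类标签被重新命名)ARI为1,对系统性不一致的划分则可为负:

```python
import numpy as np
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two clusterings, from the contingency
    table; 1 = identical partitions, ~0 = chance-level agreement."""
    a_vals, a_idx = np.unique(labels_a, return_inverse=True)
    b_vals, b_idx = np.unique(labels_b, return_inverse=True)
    table = np.zeros((len(a_vals), len(b_vals)), dtype=int)
    for i, j in zip(a_idx, b_idx):
        table[i, j] += 1
    n = len(labels_a)
    sum_ij = sum(comb(int(v), 2) for v in table.ravel())
    sum_a = sum(comb(int(v), 2) for v in table.sum(axis=1))
    sum_b = sum(comb(int(v), 2) for v in table.sum(axis=0))
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # relabeled, still 1.0
disagree = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # maximally crossed
```

与未调整的Rand指数不同,ARI对随机划分的期望值为0,这正是它适合跨方法、跨聚类数比较的原因(sklearn中对应`adjusted_rand_score`)。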


自动驾驶|车辆|车道检测等(2篇)

【1】The Driver-Blindness Phenomenon: Why Deep Sequence Models Default to Autocorrelation in Blood Glucose Forecasting
标题:驱动因素盲视现象:为什么深度序列模型在血糖预测中默认依赖自相关
链接:https://arxiv.org/abs/2511.20601

作者:Heman Shakeri
备注:7 pages, 1 figure
摘要:尽管胰岛素、膳食和活动等驱动因素的生理机制已被充分理解,用于血糖预测的深度序列模型却始终未能利用这些具有临床信息量的驱动因素。我们将这一现象称为“驱动因素盲视”(Driver-Blindness),并通过$Δ_{\text{drivers}}$(多变量模型相对于匹配的单变量基线的性能增益)将其形式化。在文献中,$Δ_{\text{drivers}}$通常接近于零。我们将其归因于三个相互作用的因素:偏向自相关的架构偏差(C1)、使驱动因素含噪且存在混淆的数据保真度差距(C2),以及削弱群体水平模型的生理异质性(C3)。我们综述了可部分缓解驱动因素盲视的策略(包括生理特征编码器、因果正则化和个性化),并建议未来的工作常规地报告$Δ_{\text{drivers}}$,以防止驱动因素盲视的模型被视为最先进水平。
摘要:Deep sequence models for blood glucose forecasting consistently fail to leverage clinically informative drivers--insulin, meals, and activity--despite well-understood physiological mechanisms. We term this Driver-Blindness and formalize it via $Δ_{\text{drivers}}$, the performance gain of multivariate models over matched univariate baselines. Across the literature, $Δ_{\text{drivers}}$ is typically near zero. We attribute this to three interacting factors: architectural biases favoring autocorrelation (C1), data fidelity gaps that render drivers noisy and confounded (C2), and physiological heterogeneity that undermines population-level models (C3). We synthesize strategies that partially mitigate Driver-Blindness--including physiological feature encoders, causal regularization, and personalization--and recommend that future work routinely report $Δ_{\text{drivers}}$ to prevent driver-blind models from being considered state-of-the-art.
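摘要把$Δ_{\text{drivers}}$定义为多变量模型相对匹配单变量基线的性能增益;具体误差度量摘要中未指明,下面以RMSE为假设给出示意,数值纯属虚构:

```python
import numpy as np

def delta_drivers(y_true, pred_multivariate, pred_univariate):
    """Delta_drivers: RMSE improvement of a multivariate forecaster over a
    matched univariate baseline (positive = the drivers were exploited).
    RMSE is an assumed metric choice for illustration."""
    rmse = lambda p: float(np.sqrt(np.mean((y_true - p) ** 2)))
    return rmse(pred_univariate) - rmse(pred_multivariate)

y = np.array([100.0, 110.0, 120.0])       # true glucose values (mg/dL, toy)
multi = np.array([101.0, 109.0, 121.0])   # hypothetical model using drivers
uni = np.array([95.0, 105.0, 115.0])      # hypothetical glucose-history-only model
gain = delta_drivers(y, multi, uni)       # here 5.0 - 1.0 = 4.0 > 0
```

按摘要的论点,文献中这一差值通常接近零;常规报告它可以直接暴露模型是否只是在复读自相关。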


【2】Blinking Beyond EAR: A Stable Eyelid Angle Metric for Driver Drowsiness Detection and Data Augmentation
标题:超越EAR的眨眼检测:用于驾驶员困倦检测和数据增强的稳定眼睑角指标
链接:https://arxiv.org/abs/2511.19519

作者:Mathis Wolter,Julie Stephany Berrio Perez,Mao Shan
备注:8 pages, 5 figures, 3 tables
摘要:可靠地检测驾驶员困倦对于提高道路安全和支持高级驾驶员辅助系统(ADAS)至关重要。我们提出眼睑角(Eyelid Angle,ELA),一种从3D面部关键点导出的、新颖且可复现的眼睛张开度度量。与传统的二值眼睛状态估计器或2D度量(例如眼睛纵横比,Eye Aspect Ratio,EAR)不同,ELA对眼睑运动提供了稳定的几何描述,并对相机角度变化具有鲁棒性。基于ELA,我们设计了一个眨眼检测框架,提取包括闭合、闭合保持和重新睁开持续时间在内的时间特征,这些特征被证明与困倦水平相关。为了解决采集真实困倦数据的稀缺性和风险,我们进一步利用ELA信号在Blender 3D中驱动绑定好骨骼的虚拟化身,从而能够创建具有可控噪声、相机视角和眨眼动态的逼真合成数据集。在公开驾驶员监控数据集上的实验结果表明,与EAR相比,ELA在视角变化下具有更低的方差,并能实现准确的眨眼检测。同时,合成数据增强扩展了用于困倦识别的训练数据的多样性。我们的研究结果表明,ELA既是一种可靠的生物特征度量,也是在驾驶员状态监测中生成可扩展数据集的有力工具。
摘要:Detecting driver drowsiness reliably is crucial for enhancing road safety and supporting advanced driver assistance systems (ADAS). We introduce the Eyelid Angle (ELA), a novel, reproducible metric of eye openness derived from 3D facial landmarks. Unlike conventional binary eye state estimators or 2D measures, such as the Eye Aspect Ratio (EAR), the ELA provides a stable geometric description of eyelid motion that is robust to variations in camera angle. Using the ELA, we design a blink detection framework that extracts temporal characteristics, including the closing, closed, and reopening durations, which are shown to correlate with drowsiness levels. To address the scarcity and risk of collecting natural drowsiness data, we further leverage ELA signals to animate rigged avatars in Blender 3D, enabling the creation of realistic synthetic datasets with controllable noise, camera viewpoints, and blink dynamics. Experimental results in public driver monitoring datasets demonstrate that the ELA offers lower variance under viewpoint changes compared to EAR and achieves accurate blink detection. At the same time, synthetic augmentation expands the diversity of training data for drowsiness recognition. Our findings highlight the ELA as both a reliable biometric measure and a powerful tool for generating scalable datasets in driver state monitoring.
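作为对照,下面同时示意经典的2D眼睛纵横比(EAR)公式,以及一个简化的3D“眼睑角”构造。后者仅为几何示意,论文中ELA的确切定义可能不同;所有关键点坐标均为虚构:

```python
import numpy as np

def eye_aspect_ratio(pts):
    """Classic 2D EAR from six eye landmarks p1..p6 (Soukupova & Cech):
    EAR = (|p2-p6| + |p3-p5|) / (2 |p1-p4|)."""
    p1, p2, p3, p4, p5, p6 = pts
    return (np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)) / (
        2 * np.linalg.norm(p1 - p4))

def eyelid_angle(corner, upper, lower):
    """Illustrative 3D eyelid-opening angle at the eye corner, between the
    vectors to an upper- and a lower-lid landmark (assumed construction)."""
    u, v = upper - corner, lower - corner
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

open_eye = np.array([[0, 0], [1, 1], [2, 1], [3, 0], [2, -1], [1, -1]], float)
closed_eye = np.array([[0, 0], [1, .1], [2, .1], [3, 0], [2, -.1], [1, -.1]], float)
ear_open, ear_closed = eye_aspect_ratio(open_eye), eye_aspect_ratio(closed_eye)

corner = np.array([0.0, 0.0, 0.0])
ela_open = eyelid_angle(corner, np.array([1, 1, 0.2]), np.array([1, -1, 0.2]))
ela_closed = eyelid_angle(corner, np.array([1, .1, 0.2]), np.array([1, -.1, 0.2]))
```

两种度量都随眼睛闭合单调下降;论文的论点是,基于3D关键点的角度量在相机视角变化下方差更小。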


联邦学习|隐私保护|加密(2篇)

【1】Accelerating Wireless Distributed Learning via Hybrid Split and Federated Learning Optimization
标题:通过混合拆分和联邦学习优化加速无线分布式学习
链接:https://arxiv.org/abs/2511.19851

作者:Kun Guo,Xuefei Li,Xijun Wang,Howard H. Yang,Wei Feng,Tony Q. S. Quek
摘要:联合学习(FL)和分裂学习(SL)是无线网络中两种有效的分布式学习范式,能够在不共享原始数据的情况下跨移动设备进行协作模型训练。虽然FL支持低延迟并行训练,但它可能会收敛到不太准确的模型。相比之下,SL通过顺序训练实现更高的准确性,但延迟增加。为了利用两者的优势,混合分割和联邦学习(HSFL)允许一些设备在FL模式下运行,而另一些设备在SL模式下运行。本文旨在通过解决三个关键问题来加速HSFL:1)学习模式的选择如何影响整体学习绩效?2)它如何与批量大小相互作用?3)这些超参数如何与通信和计算资源一起联合优化,以减少整体学习延迟?我们首先分析收敛性,揭示学习模式和批量大小之间的相互作用。接下来,我们制定了一个延迟最小化问题,并提出了一个两阶段的解决方案:一个块坐标下降法松弛的问题,以获得局部最优的解决方案,其次是舍入算法恢复整数批量大小与接近最优的性能。实验结果表明,我们的方法显着加快收敛到目标精度相比,现有的方法。
摘要:Federated learning (FL) and split learning (SL) are two effective distributed learning paradigms in wireless networks, enabling collaborative model training across mobile devices without sharing raw data. While FL supports low-latency parallel training, it may converge to less accurate model. In contrast, SL achieves higher accuracy through sequential training but suffers from increased delay. To leverage the advantages of both, hybrid split and federated learning (HSFL) allows some devices to operate in FL mode and others in SL mode. This paper aims to accelerate HSFL by addressing three key questions: 1) How does learning mode selection affect overall learning performance? 2) How does it interact with batch size? 3) How can these hyperparameters be jointly optimized alongside communication and computational resources to reduce overall learning delay? We first analyze convergence, revealing the interplay between learning mode and batch size. Next, we formulate a delay minimization problem and propose a two-stage solution: a block coordinate descent method for a relaxed problem to obtain a locally optimal solution, followed by a rounding algorithm to recover integer batch sizes with near-optimal performance. Experimental results demonstrate that our approach significantly accelerates convergence to the target accuracy compared to existing methods.


【2】Federated Learning Framework for Scalable AI in Heterogeneous HPC and Cloud Environments
标题:用于异构HPC和云环境中可扩展AI的联合学习框架
链接:https://arxiv.org/abs/2511.19479

作者:Sangam Ghimire,Paribartan Timalsina,Nirjal Bhurtel,Bishal Neupane,Bigyan Byanju Shrestha,Subarna Bhattarai,Prajwal Gaire,Jessica Thapa,Sudan Jha
摘要:随着对可扩展和隐私感知AI系统的需求增长,联邦学习(FL)已成为一种有前途的解决方案,允许在不移动原始数据的情况下进行分散的模型训练。与此同时,高性能计算(HPC)和云基础设施的结合提供了巨大的计算能力,但也带来了新的复杂性,特别是在处理异构硬件、通信限制和非统一数据时。在这项工作中,我们提出了一个联合学习框架,旨在跨混合HPC和云环境高效运行。我们的系统解决了关键的挑战,如系统异构性,通信开销和资源调度,同时保持模型的准确性和数据隐私。通过在混合测试平台上的实验,我们在可扩展性,容错性和收敛性方面表现出强大的性能,即使在非独立和相同分布(非IID)的数据分布和不同的硬件。这些结果突出了联邦学习作为在现代分布式计算环境中构建可扩展人工智能(AI)系统的实用方法的潜力。
摘要:As the demand grows for scalable and privacy-aware AI systems, Federated Learning (FL) has emerged as a promising solution, allowing decentralized model training without moving raw data. At the same time, the combination of high- performance computing (HPC) and cloud infrastructure offers vast computing power but introduces new complexities, especially when dealing with heteroge- neous hardware, communication limits, and non-uniform data. In this work, we present a federated learning framework built to run efficiently across mixed HPC and cloud environments. Our system addresses key challenges such as system het- erogeneity, communication overhead, and resource scheduling, while maintaining model accuracy and data privacy. Through experiments on a hybrid testbed, we demonstrate strong performance in terms of scalability, fault tolerance, and convergence, even under non-Independent and Identically Distributed (non-IID) data distributions and varied hardware. These results highlight the potential of federated learning as a practical approach to building scalable Artificial Intelligence (AI) systems in modern, distributed computing settings.
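摘要所述系统的核心聚合步骤通常是FedAvg(按客户端本地数据量加权平均模型权重)。以下为最小示意;权重与数据量均为虚构,该框架的具体实现未在摘要中给出:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg: aggregate client model weights, weighted by local dataset size."""
    total = sum(client_sizes)
    return sum((n / total) * w for w, n in zip(client_weights, client_sizes))

# Three clients with different amounts of local (non-IID) data
weights = [np.array([1.0, 0.0]), np.array([3.0, 2.0]), np.array([2.0, 4.0])]
sizes = [10, 30, 60]
global_w = fedavg(weights, sizes)   # size-weighted average, here [2.2, 3.0]
```

在异构HPC/云环境下,真正的难点在于摘要提到的调度、通信开销与容错,而非这一聚合公式本身。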


推理|分析|理解|解释(10篇)

【1】DiFR: Inference Verification Despite Nondeterminism
标题:DiFR:非确定性下的推理验证
链接:https://arxiv.org/abs/2511.20621

作者:Adam Karvonen,Daniel Reuter,Roy Rinberg,Luke Marks,Adrià Garriga-Alonso,Keri Warr
摘要:随着对LLM推理需求的增长,让提供商及其客户能够验证推理过程被正确执行、没有错误或篡改,正变得越来越重要。然而,由于良性的数值噪声,将同一推理过程重复运行两次通常会得到不同的结果,这使得很难区分合法的变化与实际问题。为了解决这个问题,我们引入了Token-DiFR(Token-Divergence-From-Reference),这是一种通过将生成的令牌与以相同随机种子为条件的可信参考实现所做的预测进行比较来验证推理输出的方法。采样种子同步严格限制了有效输出,使提供者偏离正确推理的空间极小,这使得输出令牌本身可以作为正确性的可审计证据,而提供者无需付出额外成本。Token-DiFR可靠地识别采样错误、模拟的程序缺陷和模型量化,在300个输出令牌内以AUC $>$ 0.999检测4位量化。对于需要样本高效的前向传播验证的应用,我们还引入了Activation-DiFR,这是一种使用随机正交投影将激活压缩成紧凑指纹以供后续验证的方案。Activation-DiFR仅使用2个输出令牌即可以AUC $>$ 0.999检测4位量化,同时相对于现有方法减少25-75%的通信开销。我们发布了与vLLM的开源集成,以加速可验证推理的实际部署。
摘要:As demand for LLM inference grows, it is becoming increasingly important that providers and their customers can verify that inference processes are performed correctly, without errors or tampering. However, re-running the same inference process twice often leads to different results due to benign numerical noise, making it difficult to distinguish legitimate variation from actual problems. To address this problem, we introduce Token-DiFR (Token-Divergence-From-Reference), a method for verifying inference outputs by comparing generated tokens against predictions made by a trusted reference implementation conditioned on the same random seed. Sampling seed synchronization tightly constrains valid outputs, leaving providers minimal room to deviate from correct inference, which allows output tokens themselves to serve as auditable evidence of correctness at zero additional cost to the provider. Token-DiFR reliably identifies sampling errors, simulated bugs, and model quantization, detecting 4-bit quantization with AUC $>$ 0.999 within 300 output tokens. For applications requiring sample-efficient forward-pass verification, we additionally introduce Activation-DiFR, a scheme that uses random orthogonal projections to compress activations into compact fingerprints for subsequent verification. Activation-DiFR detects 4-bit quantization with AUC $>$ 0.999 using just 2 output tokens, while reducing communication overhead by 25-75% relative to existing methods. We release an open-source integration with vLLM to accelerate practical deployment of verifiable inference.
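Activation-DiFR 的核心是"用种子共享的随机正交投影把激活压成紧凑指纹";其具体实现细节摘要未给出,下面是一个示意性的 numpy 草图(维度、函数名均为假设):

```python
import numpy as np

def make_projection(d_model, d_fp, seed):
    # Random orthogonal projection; the seed is shared so a verifier
    # can recompute exactly the same projection independently.
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d_model, d_fp)))
    return q  # (d_model, d_fp), orthonormal columns

def fingerprint(activations, proj):
    # Compress per-token activations (n_tokens, d_model) into compact
    # fingerprints (n_tokens, d_fp) for later comparison.
    return activations @ proj

proj = make_projection(d_model=64, d_fp=8, seed=42)
acts = np.random.default_rng(1).standard_normal((4, 64))
# Benign numerical noise should barely perturb the fingerprint.
noisy = acts + 1e-6 * np.random.default_rng(2).standard_normal(acts.shape)
deviation = np.max(np.abs(fingerprint(acts, proj) - fingerprint(noisy, proj)))
```

正交投影保持距离的性质使得"良性数值噪声"只引起微小的指纹偏差,而量化或篡改造成的系统性偏移则会被放大暴露。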


【2】BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents
标题:BrowseSafe:了解和防止人工智能浏览器代理内的提示注入
链接:https://arxiv.org/abs/2511.20597

作者:Kaiyuan Zhang,Mark Tenenholtz,Kyle Polley,Jerry Ma,Denis Yarats,Ninghui Li
摘要:将人工智能(AI)代理集成到Web浏览器中带来了超越传统Web应用程序威胁模型的安全挑战。之前的工作已经将提示注入确定为Web代理的新攻击向量,但其在现实环境中造成的影响仍未得到充分理解。   在这项工作中,我们考察了提示注入攻击的全貌,并合成了一个嵌入在真实HTML载荷中的攻击基准。我们的基准超越了以前的工作:它强调能够影响现实世界行动而非仅仅文本输出的注入,并呈现复杂度和干扰项频率与真实世界代理所遇到的相似的攻击载荷。我们利用这一基准对现有防御进行了全面的实证评估,评估它们在一系列前沿AI模型中的有效性。我们提出了一种包括架构防御和基于模型的防御的多层次防御策略,以抵御不断演变的提示注入攻击。我们的工作为通过纵深防御方法设计实用、安全的Web代理提供了蓝图。
摘要:The integration of artificial intelligence (AI) agents into web browsers introduces security challenges that go beyond traditional web application threat models. Prior work has identified prompt injection as a new attack vector for web agents, yet the resulting impact within real-world environments remains insufficiently understood.   In this work, we examine the landscape of prompt injection attacks and synthesize a benchmark of attacks embedded in realistic HTML payloads. Our benchmark goes beyond prior work by emphasizing injections that can influence real-world actions rather than mere text outputs, and by presenting attack payloads with complexity and distractor frequency similar to what real-world agents encounter. We leverage this benchmark to conduct a comprehensive empirical evaluation of existing defenses, assessing their effectiveness across a suite of frontier AI models. We propose a multi-layered defense strategy comprising both architectural and model-based defenses to protect against evolving prompt injection attacks. Our work offers a blueprint for designing practical, secure web agents through a defense-in-depth approach.


【3】Latent Diffusion Inversion Requires Understanding the Latent Space
标题:潜在扩散反演需要理解潜在空间
链接:https://arxiv.org/abs/2511.20592

作者:Mingxing Rao,Bowen Qu,Daniel Moyer
备注:14 pages, 4 figures, 4 tables
摘要:从生成模型中恢复训练数据("模型反演")已在数据域的扩散模型上被广泛研究。然而,应用于潜在空间生成模型(例如潜在扩散模型,LDMs)的反演技术在很大程度上忽略了编码器/解码器对及相应的潜在代码。在这项工作中,我们描述了两个关键发现:(1)扩散模型在不同潜在代码上表现出非均匀的记忆,往往过拟合位于解码器拉回度量高失真区域的样本;(2)即使在同一个潜在代码中,不同维度对记忆的贡献也不相同。我们提出了一种有原则的方法,按各维度对解码器拉回度量的贡献对潜在维度进行排序,从而识别最容易导致记忆的维度。实验表明,在为基于分数的成员推断攻击计算攻击统计量时,删除记忆较少的维度可显著提高性能:在CIFAR-10、CelebA、ImageNet-1K、Pokémon、MS-COCO和Flickr等不同数据集上,平均AUROC提升2.7%,TPR@1%FPR大幅增加(6.42%)。这表明在极低误报容忍度下识别成员的置信度更强。我们的结果突出了自动编码器几何结构对LDM记忆的被忽视的影响,并为分析基于扩散的生成模型中的隐私风险提供了新的视角。
摘要:The recovery of training data from generative models (``model inversion'') has been extensively studied for diffusion models in the data domain. The encoder/decoder pair and corresponding latent codes have largely been ignored by inversion techniques applied to latent space generative models, e.g., Latent Diffusion models (LDMs). In this work we describe two key findings: (1) The diffusion model exhibits non-uniform memorization across latent codes, tending to overfit samples located in high-distortion regions of the decoder pullback metric. (2) Even within a single latent code, different dimensions contribute unequally to memorization. We introduce a principled method to rank latent dimensions by their per-dimensional contribution to the decoder pullback metric, identifying those most responsible for memorization. Empirically, removing less-memorizing dimensions when computing attack statistics for score-based membership inference attacker significantly improves performance, with average AUROC gains of 2.7\% and substantial increases in TPR@1\%FPR (6.42\%) across diverse datasets including CIFAR-10, CelebA, ImageNet-1K, Pokémon, MS-COCO, and Flickr. This indicates stronger confidence in identifying members under extremely low false-positive tolerance. Our results highlight the overlooked influence of the auto-encoder geometry on LDM memorization and provide a new perspective for analyzing privacy risks in diffusion-based generative models.
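"按各维度对拉回度量的贡献排序、并在聚合攻击统计量前丢弃低贡献维度"这一步可以用下面的示意性草图说明(其中的贡献值与逐维统计量均为虚构数据,keep_frac 等参数为假设):

```python
import numpy as np

def rank_dimensions(per_dim_contrib):
    # Order latent dimensions by their per-dimension contribution
    # (highest-contributing, i.e. most-memorizing, first).
    return np.argsort(per_dim_contrib)[::-1]

def masked_attack_score(per_dim_stat, per_dim_contrib, keep_frac=0.5):
    # Drop the least-memorizing dimensions before aggregating a
    # per-dimension attack statistic into a single membership score.
    order = rank_dimensions(per_dim_contrib)
    k = max(1, int(keep_frac * len(order)))
    return per_dim_stat[order[:k]].mean()

contrib = np.array([0.1, 2.0, 0.5, 1.5])   # hypothetical pullback-metric contributions
stat = np.array([1.0, 4.0, 2.0, 3.0])      # hypothetical per-dimension statistic
score_all = stat.mean()
score_top = masked_attack_score(stat, contrib)
```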


【4】Identifying environmental factors associated with tetrodotoxin contamination in bivalve mollusks using eXplainable AI
标题:使用eXplainable人工智能识别与双壳类软体动物河豚毒素污染相关的环境因素
链接:https://arxiv.org/abs/2511.20395

作者:M. C. Schoppema,B. H. M. van der Velden,A. Hürriyetoğlu,M. D. Klijnstra,E. J. Faassen,A. Gerssen,H. J. van der Fels-Klerx
备注:18 pages, 6 figures, submitted to Nature Food
摘要:自2012年以来,在欧洲温带水域的双壳类软体动物等海产品中发现了河豚毒素(TTX)。TTX污染会导致食品安全风险和经济损失,因此对TTX污染的早期预测对食品行业和主管部门至关重要。最近的研究指出,浅水生境和水温是双壳类软体动物中TTX污染的主要驱动因素。然而,非生物因素,生物因素和TTX污染之间的时间关系仍然没有探索。   我们已经开发了一个可解释的,基于深度学习的模型来预测荷兰泽兰河口的TTX污染。模型的输入是气象和水文特征;输出是TTX污染的存在或不存在。   结果表明,日出时间、日落时间、总辐射、水温和氯离子浓度对TTX污染贡献最大。因此,以日长和全球辐射为代表的有效日照时数是双壳软体动物中河豚毒素污染的重要驱动因素。   总之,我们可解释的深度学习模型确定了上述环境因素(日照时数,全球辐射,水温和水氯化物浓度)与双壳类软体动物中的河豚毒素污染有关;使我们的方法成为减轻食品工业和主管部门海洋毒素风险的宝贵工具。
摘要:Since 2012, tetrodotoxin (TTX) has been found in seafoods such as bivalve mollusks in temperate European waters. TTX contamination leads to food safety risks and economic losses, making early prediction of TTX contamination vital to the food industry and competent authorities. Recent studies have pointed to shallow habitats and water temperature as main drivers to TTX contamination in bivalve mollusks. However, the temporal relationships between abiotic factors, biotic factors, and TTX contamination remain unexplored.   We have developed an explainable, deep learning-based model to predict TTX contamination in the Dutch Zeeland estuary. Inputs for the model were meteorological and hydrological features; output was the presence or absence of TTX contamination.   Results showed that the time of sunrise, time of sunset, global radiation, water temperature, and chloride concentration contributed most to TTX contamination. Thus, the effective number of sun hours, represented by day length and global radiation, was an important driver for tetrodotoxin contamination in bivalve mollusks.   To conclude, our explainable deep learning model identified the aforementioned environmental factors (number of sun hours, global radiation, water temperature, and water chloride concentration) to be associated with tetrodotoxin contamination in bivalve mollusks; making our approach a valuable tool to mitigate marine toxin risks for food industry and competent authorities.


【5】Actionable and diverse counterfactual explanations incorporating domain knowledge and causal constraints
标题:结合领域知识和因果约束的可操作和多样化的反事实解释
链接:https://arxiv.org/abs/2511.20236

作者:Szymon Bobek,Łukasz Bałec,Grzegorz J. Nalepa
摘要:反事实解释通过识别实现模型的期望结果所需的最小变化来增强机器学习模型的可操作性。然而,现有的方法往往忽略了现实世界中的数据集的复杂的依赖关系,导致不切实际的或不切实际的修改。受电子邮件营销领域的网络安全应用的启发,我们提出了一种用于生成多样化,可操作和知识约束的反事实(DANCE)的方法,该方法结合了特征依赖性和因果约束,以确保反事实的可验证性和现实世界的可行性。我们的方法从数据中学习线性和非线性约束,或集成专家提供的依赖图,确保反事实是合理和可行的。通过保持与特征关系的一致性,该方法产生与现实世界约束一致的解释。此外,它平衡了可扩展性,多样性和稀疏性,有效地解决了现有算法中的关键限制。这项工作是根据与波兰最大的电子邮件营销公司Freshmail的真实案例研究开发的,并得到了联合研发项目Sendguard的支持。此外,我们使用140个公共数据集进行了广泛的评估,突出了其生成有意义的、与领域相关的反事实的能力,这些反事实优于基于广泛使用的指标的其他现有方法。用于复制结果的源代码可以在我们提供的GitHub存储库中找到。
摘要:Counterfactual explanations enhance the actionable interpretability of machine learning models by identifying the minimal changes required to achieve a desired outcome of the model. However, existing methods often ignore the complex dependencies in real-world datasets, leading to unrealistic or impractical modifications. Motivated by cybersecurity applications in the email marketing domain, we propose a method for generating Diverse, Actionable, and kNowledge-Constrained Explanations (DANCE), which incorporates feature dependencies and causal constraints to ensure plausibility and real-world feasibility of counterfactuals. Our method learns linear and nonlinear constraints from data or integrates expert-provided dependency graphs, ensuring counterfactuals are plausible and actionable. By maintaining consistency with feature relationships, the method produces explanations that align with real-world constraints. Additionally, it balances plausibility, diversity, and sparsity, effectively addressing key limitations in existing algorithms. The work is developed based on a real-life case study with Freshmail, the largest email marketing company in Poland and supported by a joint R&D project Sendguard. Furthermore, we provide an extensive evaluation using 140 public datasets, which highlights its ability to generate meaningful, domain-relevant counterfactuals that outperform other existing approaches based on widely used metrics. The source code for reproduction of the results can be found in a GitHub repository we provide.


【6】QiMeng-CRUX: Narrowing the Gap between Natural Language and Verilog via Core Refined Understanding eXpression
标题:QiMeng-CRUX:通过核心精炼理解表达(CRUX)缩小自然语言与Verilog之间的差距
链接:https://arxiv.org/abs/2511.20099

作者:Lei Huang,Rui Zhang,Jiaming Guo,Yang Zhang,Di Huang,Shuyao Cheng,Pengwei Jin,Chongxiao Li,Zidong Du,Xing Hu,Qi Guo,Yunji Chen
备注:Accepted by the AAAI26 Conference Main Track
摘要:大型语言模型(LLM)在硬件描述语言(HDL)生成中显示出很有前途的能力。然而,现有的方法往往依赖于自由形式的自然语言描述,往往是模糊的,冗余的,和非结构化的,这对下游Verilog代码生成带来了重大挑战。我们把硬件代码生成作为一个复杂的转换,从一个开放的自然语言空间到一个特定领域的,高度约束的目标空间。为了弥合这一差距,我们引入核心精炼理解表达(CRUX),一个结构化的中间空间,捕捉用户意图的基本语义,同时组织精确的Verilog代码生成的表达。我们进一步设计了一个两阶段的训练框架,包括联合表达式建模和双空间优化,以提高CRUX和Verilog代码的质量。在多个Verilog生成基准测试的实验表明,我们的模型,CRUX-V,达到最先进的性能之间的一般模型,特别是在具有挑战性的设计任务。此外,CRUX空间在用作其他代码模型的输入提示时被证明是可转移的和有益的,突出了它在缩小自由形式的自然语言描述和精确的Verilog生成之间的差距方面的有效性。
摘要:Large language models (LLMs) have shown promising capabilities in hardware description language (HDL) generation. However, existing approaches often rely on free-form natural language descriptions that are often ambiguous, redundant, and unstructured, which poses significant challenges for downstream Verilog code generation. We treat hardware code generation as a complex transformation from an open-ended natural language space to a domain-specific, highly constrained target space. To bridge this gap, we introduce Core Refined Understanding eXpression (CRUX), a structured intermediate space that captures the essential semantics of user intent while organizing the expression for precise Verilog code generation. We further design a two-stage training framework, comprising Joint Expression Modeling and Dual-Space Optimization, to enhance the quality of both CRUX and Verilog code. Experiments across multiple Verilog generation benchmarks demonstrate that our model, CRUX-V, achieves state-of-the-art performance among general models, particularly under challenging design tasks. Furthermore, the CRUX space proves transferable and beneficial when used as input prompts for other code models, highlighting its effectiveness in narrowing the gap between free-form natural language descriptions and precise Verilog generation.


【7】Scalable Data Attribution via Forward-Only Test-Time Inference
标题:通过仅向前测试时推理的可扩展数据归因
链接:https://arxiv.org/abs/2511.19803

作者:Sibo Ma,Julian Nyarko
备注:8 pages. Work in progress
摘要:数据归因试图将模型行为追溯到塑造它的训练示例,从而实现大规模的调试、审计和数据评估。经典的影响函数方法提供了一个原则性的基础,但对现代网络而言仍不实用,因为它们在推理时需要昂贵的反向传播或Hessian求逆。我们提出了一种数据归因方法,保留相同的一阶反事实目标,同时消除每次查询的反向传播。我们的方法在训练过程中通过短视界梯度传播模拟每个训练示例对参数的影响,随后仅使用前向评估即可读出任意查询的归因。这种设计将计算从推理阶段转移到模拟阶段,契合真实的部署场景:模型可能服务数十亿条用户查询,但训练数据来自固定且有限的数据源集合(例如,在多样语料上训练、同时需要向《纽约时报》等特定出版商付费的大型语言模型)。经验上,在标准的MLP基准测试中,我们的估计器在标准归因指标(LOO和LDS)上达到或超过TRAK等最先进的基线,同时推理成本低几个数量级。通过将影响函数的保真度与一阶可扩展性相结合,我们的方法为大型预训练模型中实用的实时数据归因提供了理论框架。
摘要:Data attribution seeks to trace model behavior back to the training examples that shaped it, enabling debugging, auditing, and data valuation at scale. Classical influence-function methods offer a principled foundation but remain impractical for modern networks because they require expensive backpropagation or Hessian inversion at inference. We propose a data attribution method that preserves the same first-order counterfactual target while eliminating per-query backward passes. Our approach simulates each training example's parameter influence through short-horizon gradient propagation during training and later reads out attributions for any query using only forward evaluations. This design shifts computation from inference to simulation, reflecting real deployment regimes where a model may serve billions of user queries but originate from a fixed, finite set of data sources (for example, a large language model trained on diverse corpora while compensating a specific publisher such as the New York Times). Empirically, on standard MLP benchmarks, our estimator matches or surpasses state-of-the-art baselines such as TRAK on standard attribution metrics (LOO and LDS) while offering orders-of-magnitude lower inference cost. By combining influence-function fidelity with first-order scalability, our method provides a theoretical framework for practical, real-time data attribution in large pretrained models.


【8】Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
标题:Xmodel-2.5:1.3B数据高效推理SLM
链接:https://arxiv.org/abs/2511.19496

作者:Yang Liu,Xiaolong Zhong,Ling Jiang
摘要:大型语言模型具备强大的推理和工具使用能力,但其计算需求使其难以用于边缘或成本敏感的部署。我们提出了Xmodel-2.5,一个13亿参数的小语言模型,被设计为即插即用的智能体核心(drop-in agent core)。采用最大更新参数化($μ$P)进行训练,使得在2000万参数代理模型上调好的超参数可以直接迁移到完整模型,即使在词嵌入权重绑定(tie-word-embedding)架构下也是如此。训练使用了1.4T令牌的Warmup-Stable-Decay课程;我们进一步表明,在保持其他超参数不变的情况下,在衰减阶段从AdamW切换到Muon可将13项推理任务的平均成绩提高4.58%,验证了前期的AdamW稳定性可以与后期的Muon锐化相结合,以获得更好的下游性能。FP8混合精度训练兼顾了准确性与吞吐量。所有检查点、训练配方和评估代码均以Apache-2.0许可证发布:https://huggingface.co/XiaoduoAILab/Xmodel-2.5 与 https://huggingface.co/XiaoduoAILab/Xmodel-2.5-history(训练检查点)。训练代码与评估工具:https://github.com/XiaoduoAILab/Xmodel-2.5。
摘要:Large language models deliver strong reasoning and tool-use skills, yet their computational demands make them impractical for edge or cost-sensitive deployments. We present \textbf{Xmodel-2.5}, a 1.3-billion-parameter small language model designed as a \emph{drop-in agent core}. Training with maximal-update parameterization ($μ$P) allows hyper-parameters tuned on a 20M-parameter proxy to transfer directly to the full model, even under the parameter-tied \emph{tie-word-embedding} architecture. A 1.4T-token Warmup--Stable--Decay curriculum is used, and we further show that \textbf{switching from AdamW to Muon during the decay phase} improves the 13-task reasoning average by 4.58\,\% while keeping every other hyper-parameter fixed, verifying that early AdamW stability can be paired with late Muon sharpening for better downstream performance. FP8-mixed-precision training balances accuracy and throughput. All checkpoints, recipes, and evaluation code are released under the Apache-2.0 license.\footnote{https://huggingface.co/XiaoduoAILab/Xmodel-2.5 and https://huggingface.co/XiaoduoAILab/Xmodel-2.5-history (training checkpoints).} Training code and evaluation harness: https://github.com/XiaoduoAILab/Xmodel-2.5.
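摘要中的 Warmup-Stable-Decay(WSD)学习率课程可以写成一个简单的分段函数;下面是一个示意性实现(warmup/decay 比例等参数为假设值,并非论文的实际配置),其中的衰减段正是作者切换 AdamW→Muon 的阶段:

```python
def wsd_lr(step, total_steps, peak_lr, warmup_frac=0.1, decay_frac=0.2, min_lr=0.0):
    # Warmup-Stable-Decay: linear warmup, long constant plateau,
    # then linear decay (the phase where the optimizer switch happens).
    warmup_end = int(warmup_frac * total_steps)
    decay_start = int((1.0 - decay_frac) * total_steps)
    if step < warmup_end:
        return peak_lr * step / max(1, warmup_end)
    if step < decay_start:
        return peak_lr
    frac = (step - decay_start) / max(1, total_steps - decay_start)
    return peak_lr + frac * (min_lr - peak_lr)
```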


【9】Quality analysis and evaluation prediction of RAG retrieval based on machine learning algorithms
标题:基于机器学习算法的RAG检索质量分析与评价预测
链接:https://arxiv.org/abs/2511.19481

作者:Ruoxin Zhang,Zhizhao Wen,Chao Wang,Chenchen Tang,Puyang Xu,Yifan Jiang
摘要:随着大型语言模型的快速发展,检索增强生成技术因其能够整合外部知识、提高输出准确率而得到广泛应用。然而,系统的性能高度依赖于检索模块的质量:如果检索结果与用户需求的相关性较低或包含噪声信息,将直接导致生成内容的失真。针对现有模型在处理表格特征时的性能瓶颈,本文提出一种基于特征工程和粒子群优化的XGBoost机器学习回归模型。相关性分析表明,answer_quality与doc_relevance正相关,相关系数为0.66,说明文档相关性对答案质量有显著的正向影响,提高文档相关性可以增强答案质量;语义相似度、冗余度与多样性之间存在强负相关,相关系数分别为-0.89和-0.88,表明语义相似度、冗余度与多样性之间存在权衡:随着前两者的增加,多样性显著下降。与决策树、AdaBoost等模型的对比实验结果表明,VMD-PSO-BiLSTM模型在各项评价指标上均优于对比模型,MSE、RMSE、MAE、MAPE均显著更低,R2值更高,说明其预测精度、稳定性和数据解释能力更为突出。该成果为优化RAG系统的检索质量、提高生成效果提供了有效路径,对推动相关技术的实施和应用具有重要价值。
摘要:With the rapid evolution of large language models, retrieval enhanced generation technology has been widely used due to its ability to integrate external knowledge to improve output accuracy. However, the performance of the system is highly dependent on the quality of the retrieval module. If the retrieval results have low relevance to user needs or contain noisy information, it will directly lead to distortion of the generated content. In response to the performance bottleneck of existing models in processing tabular features, this paper proposes an XGBoost machine learning regression model based on feature engineering and particle swarm optimization. Correlation analysis shows that answer_quality is positively correlated with doc_relevance by 0.66, indicating that document relevance has a significant positive effect on answer quality, and improving document relevance may enhance answer quality; the strong negative correlations between semantic similarity, redundancy, and diversity were -0.89 and -0.88, respectively, indicating a trade-off between semantic similarity, redundancy, and diversity. In other words, as the former two increased, diversity significantly decreased. The experimental results comparing decision trees, AdaBoost, etc. show that the VMD-PSO-BiLSTM model is superior in all evaluation indicators, with significantly lower MSE, RMSE, MAE, and MAPE compared to the comparison model. The R2 value is higher, indicating that its prediction accuracy, stability, and data interpretation ability are more outstanding. This achievement provides an effective path for optimizing the retrieval quality and improving the generation effect of RAG system, and has important value in promoting the implementation and application of related technologies.
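摘要中报告的 0.66 / -0.89 这类数值即 Pearson 相关系数;下面用合成数据演示其计算方式(特征名与数据均为虚构,仅说明"文档相关性与答案质量正相关"这类结论是如何得到的):

```python
import numpy as np

def pearson(x, y):
    # Pearson correlation coefficient between two feature columns.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    return float((x * y).mean())

rng = np.random.default_rng(0)
doc_relevance = rng.uniform(0.0, 1.0, 500)                   # hypothetical feature
answer_quality = doc_relevance + rng.normal(0.0, 0.25, 500)  # positively coupled target
r = pearson(doc_relevance, answer_quality)                   # positive, well below 1
```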


【10】WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning
标题:WavefrontDiffusion:用于改进推理的动态解码调度
链接:https://arxiv.org/abs/2511.19473

作者:Haojin Yang,Rui Hu,Zequn Sun,Rui Zhou,Yujun Cai,Yiwei Wang
备注:19 pages. 3 figures
摘要:扩散语言模型(DLMs)已显示出强大的文本生成潜力,正在成为自回归模型的有竞争力的替代方案。去噪策略在决定其输出质量方面起着重要作用。主流的去噪策略包括标准扩散(Standard Diffusion)和块扩散(BlockDiffusion)。标准扩散在不限制更新范围的情况下执行全局去噪,常常会过早定稿不完整的上下文并导致过早的序列结束预测。块扩散以预设顺序更新固定大小的块,但其刚性结构可能割裂连贯的语义单元并扰乱推理。我们提出了WavefrontDiffusion,一种动态解码方法,它以已确定的位置为起点,向外扩展活动令牌的波前。这一自适应过程遵循语义结构的自然流动,同时保持与基于块的方法相同的计算成本。在推理和代码生成的四个基准测试中,WavefrontDiffusion实现了最先进的性能,同时生成具有更高语义保真度的输出,显示了自适应调度对于更连贯、更高效生成的价值。
摘要:Diffusion Language Models (DLMs) have shown strong potential for text generation and are becoming a competitive alternative to autoregressive models. The denoising strategy plays an important role in determining the quality of their outputs. Mainstream denoising strategies include Standard Diffusion and BlockDiffusion. Standard Diffusion performs global denoising without restricting the update range, often finalizing incomplete context and causing premature end-of-sequence predictions. BlockDiffusion updates fixed-size blocks in a preset order, but its rigid structure can break apart coherent semantic units and disrupt reasoning. We present WavefrontDiffusion, a dynamic decoding approach that expands a wavefront of active tokens outward from finalized positions. This adaptive process follows the natural flow of semantic structure while keeping computational cost equal to block-based methods. Across four benchmarks in reasoning and code generation, WavefrontDiffusion achieves state-of-the-art performance while producing outputs with higher semantic fidelity, showing the value of adaptive scheduling for more coherent and efficient generation.


检测相关(3篇)

【1】StableTrack: Stabilizing Multi-Object Tracking on Low-Frequency Detections
标题:StableTrack:稳定低频检测的多目标跟踪
链接:https://arxiv.org/abs/2511.20418

作者:Matvei Shelukhan,Timur Mamedov,Karina Kvanchiani
摘要:多目标跟踪(MOT)是计算机视觉中最具挑战性的任务之一,其关键在于正确检测目标并将这些检测跨帧关联。目前的方法主要集中在跟踪视频流每一帧中的对象,使得在计算资源受限的条件下几乎无法运行模型。为了解决这个问题,我们提出了StableTrack,一种在低频检测下稳定跟踪质量的新方法。我们的方法引入了一种新的两阶段匹配策略,以改进低频检测之间的跨帧关联。我们提出了一种新的基于边界框的距离(Bbox-Based Distance)来代替传统的Mahalanobis距离,使我们能够结合Re-ID模型有效地匹配对象。此外,我们将视觉跟踪集成到卡尔曼滤波器和整个跟踪管道中。在低频检测场景下,我们的方法优于当前最先进的跟踪器,在MOT17-val上以$\textit{1}$ Hz的检测频率实现了$\textit{11.6%}$的HOTA提升,同时在标准的MOT17、MOT20和DanceTrack全频率检测基准上与最佳方法保持相当。
摘要:Multi-object tracking (MOT) is one of the most challenging tasks in computer vision, where it is important to correctly detect objects and associate these detections across frames. Current approaches mainly focus on tracking objects in each frame of a video stream, making it almost impossible to run the model under conditions of limited computing resources. To address this issue, we propose StableTrack, a novel approach that stabilizes the quality of tracking on low-frequency detections. Our method introduces a new two-stage matching strategy to improve the cross-frame association between low-frequency detections. We propose a novel Bbox-Based Distance instead of the conventional Mahalanobis distance, which allows us to effectively match objects using the Re-ID model. Furthermore, we integrate visual tracking into the Kalman Filter and the overall tracking pipeline. Our method outperforms current state-of-the-art trackers in the case of low-frequency detections, achieving $\textit{11.6%}$ HOTA improvement at $\textit{1}$ Hz on MOT17-val, while keeping up with the best approaches on the standard MOT17, MOT20, and DanceTrack benchmarks with full-frequency detections.
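论文提出的 Bbox-Based Distance 的具体定义摘要中未给出;作为参照,下面给出跟踪器中常见的"1 - IoU 代价 + 贪心匹配"的最小草图(纯示意,并非 StableTrack 的实际距离):

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as [x1, y1, x2, y2].
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_match(tracks, dets, max_cost=0.7):
    # Cost = 1 - IoU; pair the lowest-cost (track, detection) first.
    pairs = sorted((1.0 - iou(t, d), ti, di)
                   for ti, t in enumerate(tracks) for di, d in enumerate(dets))
    used_t, used_d, matches = set(), set(), []
    for cost, ti, di in pairs:
        if cost <= max_cost and ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches

tracks = [[0, 0, 10, 10], [20, 20, 30, 30]]
dets = [[21, 21, 31, 31], [1, 1, 11, 11]]
matches = greedy_match(tracks, dets)
```

在低频检测下目标位移更大,纯 IoU 代价容易失效,这正是论文引入新距离并结合 Re-ID 的动机。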


【2】RankOOD - Class Ranking-based Out-of-Distribution Detection
标题:RankOOD -基于类别排名的分布外检测
链接:https://arxiv.org/abs/2511.19996

作者:Dishanika Denipitiyage,Naveen Karunanayake,Suranga Seneviratne,Sanjay Chawla
摘要:我们提出了RankOOD,一种基于排序的分布外(OOD)检测方法,其基础是使用Plackett-Luce损失训练模型,该损失目前广泛用于基础模型中的偏好对齐任务。我们的方法基于这样的洞察:对于使用交叉熵损失训练的深度学习模型,分布内(ID)类预测会为每个ID类诱导出一种排序模式。RankOOD框架将这一洞察形式化:首先使用初始分类器为每个类提取一个排序列表,然后使用Plackett-Luce损失进行另一轮训练,其中类排序(每个类对应一个固定的排列)作为预测变量。一个OOD样本可能以很高的概率被归入某个ID类,但它遵循该类排序模式的概率可能很小。RankOOD在near-OOD的TinyImageNet评估基准上实现了SOTA性能,将FPR95降低了4.3%。
摘要:We propose RankOOD, a rank-based Out-of-Distribution (OOD) detection approach based on training a model with the Plackett-Luce loss, which is now extensively used for preference alignment tasks in foundational models. Our approach is based on the insight that with a deep learning model trained using the Cross Entropy Loss, in-distribution (ID) class prediction induces a ranking pattern for each ID class prediction. The RankOOD framework formalizes the insight by first extracting a rank list for each class using an initial classifier and then uses another round of training with the Plackett-Luce loss, where the class rank, a fixed permutation for each class, is the predicted variable. An OOD example may get assigned with high probability to an ID class, but the probability of it respecting the ranking classification is likely to be small. RankOOD achieves SOTA performance on the near-OOD TinyImageNet evaluation benchmark, reducing FPR95 by 4.3%.
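Plackett-Luce 模型把一个排列的概率分解为逐步的 softmax 选择;其负对数似然可以直接按定义实现(下面是与论文训练目标同型的最小 numpy 草图,并非论文代码):

```python
import numpy as np

def plackett_luce_nll(scores, ranking):
    # Negative log-likelihood of observing `ranking` (best-first item
    # indices) under the Plackett-Luce model with the given item scores:
    # at each step the next item is drawn softmax-style from the remainder.
    s = np.asarray(scores, dtype=float)[list(ranking)]
    nll = 0.0
    for i in range(len(s)):
        tail = s[i:]
        lse = tail.max() + np.log(np.exp(tail - tail.max()).sum())  # stable logsumexp
        nll += lse - s[i]
    return float(nll)

scores = [3.0, 2.0, 1.0]
nll_consistent = plackett_luce_nll(scores, [0, 1, 2])  # ranking agrees with scores
nll_reversed = plackett_luce_nll(scores, [2, 1, 0])    # ranking contradicts scores
```

与论文的直觉一致:若样本的分数向量不符合其类的固定排序(如 OOD 样本),该损失(即负对数似然)会明显偏大。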


【3】Latent-space metrics for Complex-Valued VAE out-of-distribution detection under radar clutter
标题:雷达杂波下复值VAE分布外检测的潜空间度量
链接:https://arxiv.org/abs/2511.19805

作者:Y. A. Rouzoumka,E. Terreaux,C. Morisseau,J. -P. Ovarlez,C. Ren
备注:Under review at ICASSP 2026
摘要:我们研究了用于复杂雷达环境下雷达分布外(OOD)检测的复值变分自动编码器(CVAE)。我们提出了几种检测指标:CVAE的重建误差(CVAE-MSE)和基于潜在空间的分数(Mahalanobis距离、Kullback-Leibler散度(KLD)),并将它们的性能与经典的ANMF-Tyler检测器(ANMF-FP)进行了比较。我们在合成和实验雷达数据上分析了所有这些检测器的性能,展示了每种检测器的优点和缺点。
摘要:We investigate complex-valued Variational AutoEncoders (CVAE) for radar Out-Of-Distribution (OOD) detection in complex radar environments. We propose several detection metrics: the reconstruction error of the CVAE (CVAE-MSE) and latent-based scores (Mahalanobis distance, Kullback-Leibler divergence (KLD)), and compare their performance against the classical ANMF-Tyler detector (ANMF-FP). The performance of all these detectors is analyzed on synthetic and experimental radar data, showing the advantages and weaknesses of each detector.
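其中基于潜在空间的 Mahalanobis 分数可以很简洁地写出来:用训练集潜码的均值与协方差刻画 ID 分布,距离越大越可能是 OOD。下面是一个实值 numpy 示意(论文处理的是复值潜码,此处仅演示打分原理):

```python
import numpy as np

def mahalanobis_score(z, mu, cov):
    # Squared Mahalanobis distance of a latent code z from the
    # in-distribution latent statistics (mu, cov); larger => more OOD.
    diff = z - mu
    return float(diff @ np.linalg.solve(cov, diff))

rng = np.random.default_rng(0)
train_latents = rng.normal(0.0, 1.0, (1000, 4))  # stand-in for encoder latents
mu = train_latents.mean(axis=0)
cov = np.cov(train_latents, rowvar=False)
score_id = mahalanobis_score(np.zeros(4), mu, cov)      # near the ID mean
score_ood = mahalanobis_score(np.full(4, 5.0), mu, cov)  # far from the ID mean
```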


分类|识别(6篇)

【1】Modular Deep Learning Framework for Assistive Perception: Gaze, Affect, and Speaker Identification
标题:辅助感知的模块化深度学习框架:凝视、情感和说话者识别
链接:https://arxiv.org/abs/2511.20474

作者:Akshit Pramod Anchan,Jewelith Thomas,Sritama Roy
备注:10 pages, 9 figures, and 3 tables
摘要:开发全面的辅助技术需要视觉和听觉感知的无缝集成。这项研究评估了模块化架构的可行性,灵感来自感知系统的核心功能,如“智能眼”。我们提出并测试了三个独立的传感模块:用于眼睛状态检测(困倦/注意力)的卷积神经网络(CNN),用于面部表情识别的深度CNN,以及用于基于语音的说话人识别的长短期记忆(LSTM)网络。利用Eyes Image、FER 2013和定制的音频数据集,我们的模型分别实现了93.0%、97.8%和96.89%的准确率。这项研究表明,轻量级的,特定领域的模型可以实现高保真的离散任务,建立一个验证的基础,为未来的实时,多模式集成在资源受限的辅助设备。
摘要:Developing comprehensive assistive technologies requires the seamless integration of visual and auditory perception. This research evaluates the feasibility of a modular architecture inspired by core functionalities of perceptive systems like 'Smart Eye.' We propose and benchmark three independent sensing modules: a Convolutional Neural Network (CNN) for eye state detection (drowsiness/attention), a deep CNN for facial expression recognition, and a Long Short-Term Memory (LSTM) network for voice-based speaker identification. Utilizing the Eyes Image, FER2013, and customized audio datasets, our models achieved accuracies of 93.0%, 97.8%, and 96.89%, respectively. This study demonstrates that lightweight, domain-specific models can achieve high fidelity on discrete tasks, establishing a validated foundation for future real-time, multimodal integration in resource-constrained assistive devices.


【2】Dance Style Classification using Laban-Inspired and Frequency-Domain Motion Features
标题:基于Laban启发和频域运动特征的舞蹈风格分类
链接:https://arxiv.org/abs/2511.20469

作者 :Ben Hamscher,Arnold Brosch,Nicolas Binninger,Maksymilian Jan Dejna,Kira Maag
摘要:舞蹈是人类文化的重要组成部分,是传达情感和讲述故事的工具。基于动作数据识别和区分舞蹈类型是人类活动识别中的一个复杂问题,因为许多风格共享相似的姿势,手势和时间运动模式。这项工作提出了一个轻量级的框架,舞蹈风格的分类,确定运动特征的基础上从视频中提取的姿势估计。我们提出的时空描述符的灵感来自拉班运动分析。这些特征捕获局部关节动态,例如上半身的速度、加速度和角运动,从而实现空间协调的结构化表示。为了进一步对运动的节奏和周期性方面进行编码,我们集成了快速傅立叶变换特征,这些特征在频域中表征运动模式。所提出的方法实现了强大的分类不同的舞蹈风格,计算量低,复杂的模型架构是不需要的,并表明,可解释的运动表示可以有效地捕捉风格的细微差别。
摘要:Dance is an essential component of human culture and serves as a tool for conveying emotions and telling stories. Identifying and distinguishing dance genres based on motion data is a complex problem in human activity recognition, as many styles share similar poses, gestures, and temporal motion patterns. This work presents a lightweight framework for classifying dance styles that determines motion characteristics based on pose estimates extracted from videos. We propose temporal-spatial descriptors inspired by Laban Movement Analysis. These features capture local joint dynamics such as velocity, acceleration, and angular movement of the upper body, enabling a structured representation of spatial coordination. To further encode rhythmic and periodic aspects of movement, we integrate Fast Fourier Transform features that characterize movement patterns in the frequency domain. The proposed approach achieves robust classification of different dance styles with low computational effort, as complex model architectures are not required, and shows that interpretable motion representations can effectively capture stylistic nuances.
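摘要中"用FFT在频域刻画动作的节奏与周期性"可以用一条一维关节轨迹来演示;下面的草图(分箱数等均为示意性假设)把幅度谱池化为少量频段特征,快节奏动作的能量会落在更高的频段:

```python
import numpy as np

def fft_motion_features(joint_signal, n_bins=4):
    # Magnitude spectrum of a 1-D joint trajectory, pooled into coarse
    # frequency bins to summarize the rhythm/periodicity of the movement.
    mag = np.abs(np.fft.rfft(joint_signal - joint_signal.mean()))
    return np.array([b.mean() for b in np.array_split(mag, n_bins)])

t = np.arange(128)
slow = np.sin(2 * np.pi * t / 64)  # slow, sweeping motion
fast = np.sin(2 * np.pi * t / 4)   # fast, rhythmic motion
f_slow = fft_motion_features(slow)
f_fast = fft_motion_features(fast)
```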


【3】Tight Margin-Based Generalization Bounds for Voting Classifiers over Finite Hypothesis Sets
标题:有限假设集上投票分类器的紧的基于间隔的泛化界
链接:https://arxiv.org/abs/2511.20407

作者:Kasper Green Larsen,Natascha Schalburg
摘要:我们证明了投票分类器的第一个基于间隔(margin)的泛化界,它在假设集大小、间隔、达到给定间隔的训练点比例、训练样本数量和失败概率之间的权衡中是渐近紧的。
摘要:We prove the first margin-based generalization bound for voting classifiers, that is asymptotically tight in the tradeoff between the size of the hypothesis set, the margin, the fraction of training points with the given margin, the number of training samples and the failure probability.
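摘要未给出界的具体形式;作为背景参照(并非本文的结果),此类结果所收紧的经典间隔界(Schapire 等,1998)形如:

```latex
% 经典的投票分类器间隔界(背景参照):对有限假设集 \mathcal{H}、
% 间隔参数 \theta > 0、样本量 m,以至少 1-\delta 的概率,
\Pr_{\mathcal{D}}\!\left[\, y f(x) \le 0 \,\right]
  \;\le\; \Pr_{S}\!\left[\, y f(x) \le \theta \,\right]
  \;+\; O\!\left( \sqrt{ \frac{\log |\mathcal{H}| \, \log m}{m\,\theta^{2}}
                         \;+\; \frac{\log(1/\delta)}{m} } \right)
```

本文的贡献即在上述各量(|H|、θ、间隔分位数、m、δ)的权衡上给出渐近紧的版本。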


【4】TiCT: A Synthetically Pre-Trained Foundation Model for Time Series Classification
标题:TiCT:用于时间序列分类的合成数据预训练基础模型
链接:https://arxiv.org/abs/2511.19694

作者:Chin-Chia Michael Yeh,Uday Singh Saini,Junpeng Wang,Xin Dai,Xiran Fan,Jiarui Sun,Yujie Fan,Yan Zheng
摘要:时间序列数据的普遍存在产生了对通用基础模型的强烈需求,但开发它们用于分类仍然是一个重大挑战,主要是由于标记数据的高成本。能够进行情境学习(ICL)的基础模型提供了一个强大的解决方案,能够以最少的示例适应新任务,并减少对大量再培训的需求。然而,大规模时间序列模型的先前工作主要集中在预测,留下了一个关键的差距,通用的,微调自由分类。为了解决这个问题,我们引入了TiCT(Time-series in-Context Transformer),这是一种基于transformer的模型,专门在合成数据上进行预训练,以执行上下文分类。我们做出了两个主要的技术贡献:1)一个新颖的架构,具有可扩展的基于位的标签编码和特殊的输出注意力机制,以处理任意数量的类; 2)一个合成的预训练框架,将Mixup启发的过程与数据增强相结合,以促进泛化和噪声不变性。对UCR Archive的广泛评估表明,TiCT与最先进的监督方法相比具有竞争力的性能。至关重要的是,这是在推理时仅使用上下文示例来实现的,而无需更新单个模型权重。
摘要:The ubiquity of time series data creates a strong demand for general-purpose foundation models, yet developing them for classification remains a significant challenge, largely due to the high cost of labeled data. Foundation models capable of in-context learning (ICL) offer a powerful solution, adapting to new tasks with minimal examples and reducing the need for extensive retraining. However, prior work on large-scale time series models has predominantly focused on forecasting, leaving a critical gap for versatile, fine-tuning-free classification. To address this, we introduce TiCT (Time-series in-Context Transformer), a transformer-based model pre-trained exclusively on synthetic data to perform in-context classification. We make two primary technical contributions: 1) a novel architecture featuring a scalable bit-based label encoding and a special output attention mechanism to handle an arbitrary number of classes; and 2) a synthetic pre-training framework that combines a Mixup-inspired process with data augmentation to foster generalization and noise invariance. Extensive evaluations on the UCR Archive show that TiCT achieves competitive performance against state-of-the-art supervised methods. Crucially, this is accomplished using only in-context examples at inference time, without updating a single model weight.
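摘要提到的"可扩展的基于位的标签编码"的核心思想是把类别编号写成定宽二进制向量,使同一个输出头能覆盖任意多的类;下面是一个示意性的编码/解码草图(函数名与位宽均为假设,并非论文实现):

```python
import numpy as np

def encode_labels(labels, n_bits):
    # Bit-based label encoding: class id -> fixed-width binary vector,
    # so one output head can address up to 2**n_bits classes.
    return np.array([[(c >> b) & 1 for b in range(n_bits)] for c in labels],
                    dtype=float)

def decode_bits(soft_bits):
    # Threshold each predicted bit and reassemble the class id.
    hard = (np.asarray(soft_bits) > 0.5).astype(int)
    return hard @ (1 << np.arange(hard.shape[-1]))

codes = encode_labels([0, 5, 10], n_bits=4)
recovered = decode_bits(codes)
```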


【5】OpenCML: End-to-End Framework of Open-world Machine Learning to Learn Unknown Classes Incrementally
标题:OpenCML:开放世界机器学习的端到端框架,以增量式学习未知类
链接:https://arxiv.org/abs/2511.19491

作者:Jitendra Parmar,Praveen Singh Thakur
备注:Introduces an open-world machine learning model for continual and adaptive learning. Discovers unknown classes and dynamically creates new class categories. Performs class-incremental learning to retain and extend prior knowledge. Enables continuous model improvement across multiple learning iterations. Achieved superior performance with an average accuracy of 82.54%
摘要:开放世界机器学习是人工智能中的一种新兴技术。传统的机器学习模型通常遵循封闭世界假设,这可能会妨碍它们为未来任务保留先前学习的知识。然而,自动化智能系统必须同时学习新类和先前已知的任务。所提出的模型支持在开放且持续的学习环境中学习新类。它由两个不同但相互关联的任务组成:首先,它发现数据中的未知类并创建新类;接下来,它以类增量的方式学习每个新类。二者共同实现了持续学习,使系统能够扩展对数据的理解并随时间不断改进。所提出的模型在开放世界学习中也优于现有方法。此外,它在持续学习方面表现出强劲的性能,在四次迭代中取得了82.54%的最高平均准确率和65.87%的最低准确率。
摘要:Open-world machine learning is an emerging technique in artificial intelligence, where conventional machine learning models often follow closed-world assumptions, which can hinder their ability to retain previously learned knowledge for future tasks. However, automated intelligence systems must learn about novel classes and previously known tasks. The proposed model offers novel learning classes in an open and continuous learning environment. It consists of two different but connected tasks. First, it discovers unknown classes in the data and creates novel classes; next, it learns how to perform class incrementally for each new class. Together, they enable continual learning, allowing the system to expand its understanding of the data and improve over time. The proposed model also outperformed existing approaches in open-world learning. Furthermore, it demonstrated strong performance in continuous learning, achieving a highest average accuracy of 82.54% over four iterations and a minimum accuracy of 65.87%.


【6】A Fully Probabilistic Tensor Network for Regularized Volterra System Identification
标题:正则Volterra系统辨识的全概率张量网络
链接:https://arxiv.org/abs/2511.20457

作者:Afra Kilic,Kim Batselier
备注:6 pages, 3 figures, 1 table. Submitted to IFAC 2026. Code available at: https://github.com/afrakilic/BTN_Volterra_Sys_ID
摘要:用Volterra级数对非线性系统建模具有挑战性，因为核系数的数量随模型阶数呈指数增长。本文介绍了贝叶斯张量网络Volterra核机器（BTN-V），将贝叶斯张量网络框架扩展到Volterra系统辨识。BTN-V使用典范多元（CP）分解表示Volterra核，将模型复杂度从O(I^D)降低到O(DIR)。通过将所有张量分量和超参数视为随机变量，BTN-V在不增加额外计算成本的情况下提供预测不确定性估计。稀疏诱导的层次先验能够直接从数据中自动确定秩并学习衰落记忆行为，提高可解释性并防止过拟合。实证结果表明该方法具有有竞争力的精度、更好的不确定性量化以及更低的计算成本。
摘要:Modeling nonlinear systems with Volterra series is challenging because the number of kernel coefficients grows exponentially with the model order. This work introduces Bayesian Tensor Network Volterra kernel machines (BTN-V), extending the Bayesian Tensor Network framework to Volterra system identification. BTN-V represents Volterra kernels using canonical polyadic decomposition, reducing model complexity from O(I^D) to O(DIR). By treating all tensor components and hyperparameters as random variables, BTN-V provides predictive uncertainty estimation at no additional computational cost. Sparsity-inducing hierarchical priors enable automatic rank determination and the learning of fading-memory behavior directly from data, improving interpretability and preventing overfitting. Empirical results demonstrate competitive accuracy, enhanced uncertainty quantification, and reduced computational cost.
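CP分解把 D 阶 Volterra 核的 I^D 个系数压缩为 D·I·R 个因子参数。下面的草图（假设性示意，非论文实现）以 D=3 为例，验证 CP 形式的高效计算与完整核收缩在数值上等价：

```python
import numpy as np

def cp_volterra_term(factors, u):
    """用秩 R 的 CP 分解计算 D 阶 Volterra 项。
    factors: D 个 (I, R) 因子矩阵；u: 长度为 I 的滞后输入向量。
    等价于用完整核 H[i1..iD] = sum_r prod_d A_d[i_d, r]
    沿每个模式与 u 收缩，但代价是 O(D*I*R) 而非 O(I^D)。"""
    proj = np.stack([A.T @ u for A in factors])  # (D, R)：逐因子投影
    return proj.prod(axis=0).sum()               # 对秩分量求和

I, D, R = 8, 3, 2
rng = np.random.default_rng(0)
factors = [rng.standard_normal((I, R)) for _ in range(D)]
u = rng.standard_normal(I)

# 与显式完整核收缩对比，验证等价性。
H = np.einsum('ir,jr,kr->ijk', *factors)        # 完整核：I^D 个系数
full = np.einsum('ijk,i,j,k->', H, u, u, u)
assert np.isclose(cp_volterra_term(factors, u), full)
assert H.size == I**D and sum(A.size for A in factors) == D * I * R
```

BTN-V 在此参数化之上进一步为因子矩阵赋予贝叶斯先验，以实现秩的自动确定。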


表征(3篇)

【1】MAPS: Preserving Vision-Language Representations via Module-Wise Proximity Scheduling for Better Vision-Language-Action Generalization
标题:MAPS:通过模块式邻近调度保持视觉语言表示,以实现更好的视觉语言动作泛化
链接:https://arxiv.org/abs/2511.19878

作者:Chengyue Huang,Mellon M. Zhang,Robert Azarcon,Glen Chou,Zsolt Kira
摘要:视觉-语言-动作（VLA）模型从预训练的视觉-语言模型（VLM）中继承了强大的先验知识，但朴素的微调往往会破坏这些表示并损害泛化能力。现有的补救措施（冻结模块或施加统一的正则化）要么过度约束适应，要么忽略了VLA各组件的不同角色。我们提出MAPS（模块式邻近调度），这是第一个面向VLA的鲁棒微调框架。通过系统分析，我们发现了一个经验性的顺序，应按此顺序放松邻近约束，以平衡稳定性和灵活性。MAPS线性地调度这种放松，使视觉编码器保持接近其预训练先验，而面向动作的语言层则可以更自由地适应。MAPS不引入额外的参数或数据，并且可以无缝集成到现有的VLA中。在MiniVLA-VQ、MiniVLA-OFT、OpenVLA-OFT以及SimplerEnv、CALVIN、LIBERO等具有挑战性的基准测试中，以及在Franka Emika Panda平台上的真实世界评估中，MAPS始终能够提升分布内和分布外性能（最高+30%）。我们的研究结果表明，以经验为指导地保持与预训练VLM的邻近性，是在VLM到VLA迁移中保持广泛泛化能力的一条简单而有力的原则。
摘要:Vision-Language-Action (VLA) models inherit strong priors from pretrained Vision-Language Models (VLMs), but naive fine-tuning often disrupts these representations and harms generalization. Existing fixes -- freezing modules or applying uniform regularization -- either overconstrain adaptation or ignore the differing roles of VLA components. We present MAPS (Module-Wise Proximity Scheduling), the first robust fine-tuning framework for VLAs. Through systematic analysis, we uncover an empirical order in which proximity constraints should be relaxed to balance stability and flexibility. MAPS linearly schedules this relaxation, enabling visual encoders to stay close to their pretrained priors while action-oriented language layers adapt more freely. MAPS introduces no additional parameters or data, and can be seamlessly integrated into existing VLAs. Across MiniVLA-VQ, MiniVLA-OFT, OpenVLA-OFT, and challenging benchmarks such as SimplerEnv, CALVIN, LIBERO, as well as real-world evaluations on the Franka Emika Panda platform, MAPS consistently boosts both in-distribution and out-of-distribution performance (up to +30%). Our findings highlight empirically guided proximity to pretrained VLMs as a simple yet powerful principle for preserving broad generalization in VLM-to-VLA transfer.


【2】Profile Generators: A Link between the Narrative and the Binary Matrix Representation
标题:轮廓生成器:叙事和二进制矩阵表示之间的联系
链接:https://arxiv.org/abs/2511.19506

作者:Raoul H. Kutil,Georg Zimmermann,Barbara Strasser-Kirchweger,Christian Borgelt
备注:31 pages, 8 figures, 4 tables
摘要:Mental health disorders, particularly cognitive disorders defined by deficits in cognitive abilities, are described in detail in the DSM-5, which includes definitions and examples of signs and symptoms. A simplified, machine-actionable representation was developed to assess the similarity and separability of these disorders, but it is not suited for the most complex cases. Generating or applying a full binary matrix for similarity calculations is infeasible due to the vast number of symptom combinations. This research develops an alternative representation that links the narrative form of the DSM-5 with the binary matrix representation and enables automated generation of valid symptom combinations. Using a strict pre-defined format of lists, sets, and numbers with slight variations, complex diagnostic pathways involving numerous symptom combinations can be represented. This format, called the symptom profile generator (or simply generator), provides a readable, adaptable, and comprehensive alternative to a binary matrix while enabling easy generation of symptom combinations (profiles). Cognitive disorders, which typically involve multiple diagnostic criteria with several symptoms, can thus be expressed as lists of generators. Representing several psychotic disorders in generator form and generating all symptom combinations showed that matrix representations of complex disorders become too large to manage. The MPCS (maximum pairwise cosine similarity) algorithm cannot handle matrices of this size, prompting the development of a profile reduction method using targeted generator manipulation to find specific MPCS values between disorders. The generators allow easier creation of binary representations for large matrices and make it possible to calculate specific MPCS cases between complex disorders through conditional generators.


【3】Quantifying Modality Contributions via Disentangling Multimodal Representations
标题:通过分解多模态表征量化模态贡献
链接:https://arxiv.org/abs/2511.19470

作者:Padegal Amit,Omkar Mahesh Kashyap,Namitha Rayasam,Nidhi Shekhar,Surabhi Narayan
备注:16 pages, 11 figures
摘要:Quantifying modality contributions in multimodal models remains a challenge, as existing approaches conflate the notion of contribution itself. Prior work relies on accuracy-based approaches, interpreting performance drops after removing a modality as indicative of its influence. However, such outcome-driven metrics fail to distinguish whether a modality is inherently informative or whether its value arises only through interaction with other modalities. This distinction is particularly important in cross-attention architectures, where modalities influence each other's representations. In this work, we propose a framework based on Partial Information Decomposition (PID) that quantifies modality contributions by decomposing predictive information in internal embeddings into unique, redundant, and synergistic components. To enable scalable, inference-only analysis, we develop an algorithm based on the Iterative Proportional Fitting Procedure (IPFP) that computes layer and dataset-level contributions without retraining. This provides a principled, representation-level view of multimodal behavior, offering clearer and more interpretable insights than outcome-based metrics.


3D|3D重建等相关(1篇)

【1】Uplifting Table Tennis: A Robust, Real-World Application for 3D Trajectory and Spin Estimation
标题:提升乒乓球:3D轨迹和旋转估计的稳健现实应用
链接:https://arxiv.org/abs/2511.20250

作者:Daniel Kienzle,Katja Ludwig,Julian Lorenz,Shin'ichi Satoh,Rainer Lienhart
摘要:Obtaining the precise 3D motion of a table tennis ball from standard monocular videos is a challenging problem, as existing methods trained on synthetic data struggle to generalize to the noisy, imperfect ball and table detections of the real world. This is primarily due to the inherent lack of 3D ground truth trajectories and spin annotations for real-world video. To overcome this, we propose a novel two-stage pipeline that divides the problem into a front-end perception task and a back-end 2D-to-3D uplifting task. This separation allows us to train the front-end components with abundant 2D supervision from our newly created TTHQ dataset, while the back-end uplifting network is trained exclusively on physically-correct synthetic data. We specifically re-engineer the uplifting model to be robust to common real-world artifacts, such as missing detections and varying frame rates. By integrating a ball detector and a table keypoint detector, our approach transforms a proof-of-concept uplifting method into a practical, robust, and high-performing end-to-end application for 3D table tennis trajectory and spin analysis.


优化|敛散性(7篇)

【1】MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models
标题:MapReduce LoRA:在生成模型的多偏好优化中推进帕累托前沿
链接:https://arxiv.org/abs/2511.20629

作者:Chieh-Yun Chen,Zhonghao Wang,Qi Chen,Zhifan Ye,Min Shi,Yue Zhao,Yinan Zhao,Hui Qu,Wei-An Lin,Yiru Shen,Ajinkya Kale,Irfan Essa,Humphrey Shi
摘要:Reinforcement learning from human feedback (RLHF) with reward models has advanced alignment of generative models to human aesthetic and perceptual preferences. However, jointly optimizing multiple rewards often incurs an alignment tax, improving one dimension while degrading others. To address this, we introduce two complementary methods: MapReduce LoRA and Reward-aware Token Embedding (RaTE). MapReduce LoRA trains preference-specific LoRA experts in parallel and iteratively merges them to refine a shared base model; RaTE learns reward-specific token embeddings that compose at inference for flexible preference control. Experiments on Text-to-Image generation (Stable Diffusion 3.5 Medium and FLUX.1-dev) show improvements of 36.1%, 4.6%, and 55.7%, and 32.7%, 4.3%, and 67.1% on GenEval, PickScore, and OCR, respectively. On Text-to-Video generation (HunyuanVideo), visual and motion quality improve by 48.1% and 90.0%, respectively. On the language task, Helpful Assistant, with Llama-2 7B, helpful and harmless improve by 43.4% and 136.7%, respectively. Our framework sets a new state-of-the-art multi-preference alignment recipe across modalities.
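摘要并未给出具体的合并（reduce）规则。下面用低秩更新 B@A 的加权平均作一个假设性草图，示意"将并行训练的偏好专属 LoRA 专家合并回共享基座"这一步骤；论文的实际合并方式可能不同：

```python
import numpy as np

def merge_lora_experts(base_weight, experts, weights=None):
    """将多个偏好专属的 LoRA 专家合并进共享基座权重。
    每个专家是 (A, B) 对，其低秩更新为 B @ A。此处采用
    加权平均作为示意性的合并规则（假设，非论文原方法）。"""
    if weights is None:
        weights = [1.0 / len(experts)] * len(experts)
    delta = sum(w * (B @ A) for w, (A, B) in zip(weights, experts))
    return base_weight + delta

d_out, d_in, r = 6, 4, 2
rng = np.random.default_rng(1)
base = rng.standard_normal((d_out, d_in))
experts = [(rng.standard_normal((r, d_in)), rng.standard_normal((d_out, r)))
           for _ in range(3)]
merged = merge_lora_experts(base, experts)
assert merged.shape == base.shape
```

在迭代式流程中，合并后的权重可作为下一轮各专家继续训练的新基座。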


【2】On the Limits of Momentum in Decentralized and Federated Optimization
标题:分散与联邦优化中动量的局限性
链接:https://arxiv.org/abs/2511.20168

作者:Riccardo Zaccone,Sai Praneeth Karimireddy,Carlo Masone
备注:Accepted at the 17th Workshop on Optimization for Machine Learning (OPT@NeurIPS2025)
摘要:Recent works have explored the use of momentum in local methods to enhance distributed SGD. This is particularly appealing in Federated Learning (FL), where momentum intuitively appears as a solution to mitigate the effects of statistical heterogeneity. Despite recent progress in this direction, it is still unclear if momentum can guarantee convergence under unbounded heterogeneity in decentralized scenarios, where only some workers participate at each round. In this work we analyze momentum under cyclic client participation, and theoretically prove that it remains inevitably affected by statistical heterogeneity. Similarly to SGD, we prove that decreasing step-sizes do not help either: in fact, any schedule decreasing faster than $\Theta\left(1/t\right)$ leads to convergence to a constant value that depends on the initialization and the heterogeneity bound. Numerical results corroborate the theory, and deep learning experiments confirm its relevance for realistic settings.


【3】IDAP++: Advancing Divergence-Based Pruning via Filter-Level and Layer-Level Optimization
标题:IDAP++：通过滤波器级和层级优化推进基于散度的剪枝
链接:https://arxiv.org/abs/2511.20141

作者:Aleksei Samarin,Artem Nazarenko,Egor Kotenko,Valentin Malykh,Alexander Savelev,Aleksei Toropov
备注:65 pages, 4 figures, 38 tables
摘要:This paper presents a novel approach to neural network compression that addresses redundancy at both the filter and architectural levels through a unified framework grounded in information flow analysis. Building on the concept of tensor flow divergence, which quantifies how information is transformed across network layers, we develop a two-stage optimization process. The first stage employs iterative divergence-aware pruning to identify and remove redundant filters while preserving critical information pathways. The second stage extends this principle to higher-level architecture optimization by analyzing layer-wise contributions to information propagation and selectively eliminating entire layers that demonstrate minimal impact on network performance. The proposed method naturally adapts to diverse architectures, including convolutional networks, transformers, and hybrid designs, providing a consistent metric for comparing the structural importance across different layer types. Experimental validation across multiple modern architectures and datasets reveals that this combined approach achieves substantial model compression while maintaining competitive accuracy. The presented approach achieves parameter reduction results that are globally comparable to those of state-of-the-art solutions and outperforms them across a wide range of modern neural network architectures, from convolutional models to transformers. The results demonstrate how flow divergence serves as an effective guiding principle for both filter-level and layer-level optimization, offering practical benefits for deployment in resource-constrained environments.
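作为"散度感知剪枝"思想的一个极简示意（假设性草图，并非论文的张量流散度指标）：在样本输入上度量移除每个滤波器对层输出的扰动，并剪除影响最小的若干个。

```python
import numpy as np

def divergence_prune(W, X, n_prune):
    """按"移除后对层输出的改变量"给滤波器打分，
    剪除影响最小的 n_prune 个，返回保留的滤波器索引。
    对线性层而言，每个滤波器只驱动自己的输出单元，
    因此该改变量恰为该单元在 X 上激活的范数。"""
    scores = np.linalg.norm(X @ W.T, axis=0)   # 每个滤波器一个分数
    order = np.argsort(scores)                 # 按影响从小到大排序
    return sorted(order[n_prune:].tolist())    # 保留影响最大的滤波器

# 全零滤波器（第 1 行）对输出毫无贡献，应最先被剪除。
W = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 2.0]])
X = np.eye(2)
assert divergence_prune(W, X, n_prune=1) == [0, 2]
```

论文的第二阶段将同一原则提升到层级：对整层计算类似的信息流贡献分数，再整体移除贡献最小的层。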


【4】Adaptivity and Universality: Problem-dependent Universal Regret for Online Convex Optimization
标题:自适应性与通用性：在线凸优化的问题依赖型通用遗憾
链接:https://arxiv.org/abs/2511.19937

作者:Peng Zhao,Yu-Hu Yan,Hang Yu,Zhi-Hua Zhou
摘要:Universal online learning aims to achieve optimal regret guarantees without requiring prior knowledge of the curvature of online functions. Existing methods have established minimax-optimal regret bounds for universal online learning, where a single algorithm can simultaneously attain $\mathcal{O}(\sqrt{T})$ regret for convex functions, $\mathcal{O}(d \log T)$ for exp-concave functions, and $\mathcal{O}(\log T)$ for strongly convex functions, where $T$ is the number of rounds and $d$ is the dimension of the feasible domain. However, these methods still lack problem-dependent adaptivity. In particular, no universal method provides regret bounds that scale with the gradient variation $V_T$, a key quantity that plays a crucial role in applications such as stochastic optimization and fast-rate convergence in games. In this work, we introduce UniGrad, a novel approach that achieves both universality and adaptivity, with two distinct realizations: UniGrad.Correct and UniGrad.Bregman. Both methods achieve universal regret guarantees that adapt to gradient variation, simultaneously attaining $\mathcal{O}(\log V_T)$ regret for strongly convex functions and $\mathcal{O}(d \log V_T)$ regret for exp-concave functions. For convex functions, the regret bounds differ: UniGrad.Correct achieves an $\mathcal{O}(\sqrt{V_T \log V_T})$ bound while preserving the RVU property that is crucial for fast convergence in online games, whereas UniGrad.Bregman achieves the optimal $\mathcal{O}(\sqrt{V_T})$ regret bound through a novel design. Both methods employ a meta algorithm with $\mathcal{O}(\log T)$ base learners, which naturally requires $\mathcal{O}(\log T)$ gradient queries per round. To enhance computational efficiency, we introduce UniGrad++, which retains the regret while reducing the gradient query to just $1$ per round via surrogate optimization. We further provide various implications.


【5】Lower Complexity Bounds for Nonconvex-Strongly-Convex Bilevel Optimization with First-Order Oracles
标题:具有一阶Oracle的非凸-强凸双层优化的复杂度下界
链接:https://arxiv.org/abs/2511.19656

作者:Kaiyi Ji
备注:24 pages, 1 figure
摘要:Although upper bound guarantees for bilevel optimization have been widely studied, progress on lower bounds has been limited due to the complexity of the bilevel structure. In this work, we focus on the smooth nonconvex-strongly-convex setting and develop new hard instances that yield nontrivial lower bounds under deterministic and stochastic first-order oracle models. In the deterministic case, we prove that any first-order zero-respecting algorithm requires at least $\Omega(\kappa^{3/2}\varepsilon^{-2})$ oracle calls to find an $\varepsilon$-accurate stationary point, improving the optimal lower bounds known for single-level nonconvex optimization and for nonconvex-strongly-convex min-max problems. In the stochastic case, we show that at least $\Omega(\kappa^{5/2}\varepsilon^{-4})$ stochastic oracle calls are necessary, again strengthening the best known bounds in related settings. Our results expose substantial gaps between current upper and lower bounds for bilevel optimization and suggest that even simplified regimes, such as those with quadratic lower-level objectives, warrant further investigation toward understanding the optimal complexity of bilevel optimization under standard first-order oracles.


【6】Merging without Forgetting: Continual Fusion of Task-Specific Models via Optimal Transport
标题:合并而不遗忘：通过最优传输持续融合任务特定模型
链接:https://arxiv.org/abs/2511.19561

作者:Zecheng Pan,Zhikang Chen,Ding Li,Min Zhang,Sen Cui,Hongshuo Jin,Luqi Tao,Yi Yang,Deheng Ye,Yu Zhang,Tingting Zhu,Tianling Ren
摘要:Merging models fine-tuned for different tasks into a single unified model has become an increasingly important direction for building versatile, efficient multi-task systems. Existing approaches predominantly rely on parameter interpolation in weight space, which we show introduces significant distribution shift in the feature space and undermines task-specific knowledge. In this paper, we propose OTMF (Optimal Transport-based Masked Fusion), a novel model merging framework rooted in optimal transport theory to address the distribution shift that arises from naive parameter interpolation. Instead of directly aggregating features or weights, OTMF aligns the semantic geometry of task-specific models by discovering common masks applied to task vectors through optimal transport plans. These masks selectively extract transferable and task-agnostic components while preserving the unique structural identities of each task. To ensure scalability in real-world settings, OTMF further supports a continual fusion paradigm that incrementally integrates each new task vector without revisiting previous ones, maintaining a bounded memory footprint and enabling efficient fusion across a growing number of tasks. We conduct comprehensive experiments on multiple vision and language benchmarks, and results show that OTMF achieves state-of-the-art performance in terms of both accuracy and efficiency. These findings highlight the practical and theoretical value of our approach to model merging.


【7】Optimization and Regularization Under Arbitrary Objectives
标题:任意目标下的优化与正则化
链接:https://arxiv.org/abs/2511.19628

作者:Jared N. Lakhani,Etienne Pienaar
备注:46 pages, 28 figures, 16 tables
摘要:This study investigates the limitations of applying Markov Chain Monte Carlo (MCMC) methods to arbitrary objective functions, focusing on a two-block MCMC framework which alternates between Metropolis-Hastings and Gibbs sampling. While such approaches are often considered advantageous for enabling data-driven regularization, we show that their performance critically depends on the sharpness of the employed likelihood form. By introducing a sharpness parameter and exploring alternative likelihood formulations proportional to the target objective function, we demonstrate how likelihood curvature governs both in-sample performance and the degree of regularization inferred by the training data. Empirical applications are conducted on reinforcement learning tasks, including a navigation problem and the game of tic-tac-toe. The study concludes with a separate analysis examining the implications of extreme likelihood sharpness on arbitrary objective functions stemming from the classic game of blackjack, where the first block of the two-block MCMC framework is replaced with an iterative optimization step. The resulting hybrid approach achieves performance nearly identical to the original MCMC framework, indicating that excessive likelihood sharpness effectively collapses posterior mass onto a single dominant mode.


预测|估计(9篇)

【1】Feature-Modulated UFNO for Improved Prediction of Multiphase Flow in Porous Media
标题:特征调制UFNO用于改进多孔介质中多相流的预测
链接:https://arxiv.org/abs/2511.20543

作者:Alhasan Abdellatif,Hannah P. Menke,Ahmed H. Elsheikh,Florian Doster,Kamaljit Singh
摘要:The UNet-enhanced Fourier Neural Operator (UFNO) extends the Fourier Neural Operator (FNO) by incorporating a parallel UNet pathway, enabling the retention of both high- and low-frequency components. While UFNO improves predictive accuracy over FNO, it inefficiently treats scalar inputs (e.g., temperature, injection rate) as spatially distributed fields by duplicating their values across the domain. This forces the model to process redundant constant signals within the frequency domain. Additionally, its standard loss function does not account for spatial variations in error sensitivity, limiting performance in regions of high physical importance. We introduce UFNO-FiLM, an enhanced architecture that incorporates two key innovations. First, we decouple scalar inputs from spatial features using a Feature-wise Linear Modulation (FiLM) layer, allowing the model to modulate spatial feature maps without introducing constant signals into the Fourier transform. Second, we employ a spatially weighted loss function that prioritizes learning in critical regions. Our experiments on subsurface multiphase flow demonstrate a 21\% reduction in gas saturation Mean Absolute Error (MAE) compared to UFNO, highlighting the effectiveness of our approach in improving predictive accuracy.
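FiLM 层的核心操作只需几行代码即可示意（假设性草图，非论文的完整网络）：由标量条件生成逐通道的缩放 gamma 和平移 beta，直接调制空间特征图，而无需把常数值平铺成空间场：

```python
import numpy as np

def film_modulate(feature_map, scalars, W_gamma, W_beta):
    """FiLM（Feature-wise Linear Modulation）对空间特征图的调制。
    标量条件（如温度、注入速率）被映射为逐通道的缩放 gamma
    和平移 beta，从而避免将常数信号平铺进傅里叶变换。
    feature_map: (C, H, W)；scalars: (S,)；W_gamma, W_beta: (C, S)。"""
    gamma = W_gamma @ scalars              # (C,) 逐通道缩放
    beta = W_beta @ scalars                # (C,) 逐通道平移
    return gamma[:, None, None] * feature_map + beta[:, None, None]

rng = np.random.default_rng(2)
h = rng.standard_normal((3, 4, 4))
s = np.array([0.5, 2.0])
Wg = rng.standard_normal((3, 2))
Wb = rng.standard_normal((3, 2))
out = film_modulate(h, s, Wg, Wb)
assert out.shape == h.shape
```

实际模型中 W_gamma、W_beta 通常由一个小型 MLP 产生，这里用单个线性映射作最简示意。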


【2】Forgetting by Pruning: Data Deletion in Join Cardinality Estimation
标题:通过剪枝遗忘：连接基数估计中的数据删除
链接:https://arxiv.org/abs/2511.20293

作者:Chaowei He,Yuanjun Liu,Qingzhi Ma,Shenyuan Ren,Xizhao Luo,Lei Zhao,An Liu
备注:AAAI26
摘要:Machine unlearning in learned cardinality estimation (CE) systems presents unique challenges due to the complex distributional dependencies in multi-table relational data. Specifically, data deletion, a core component of machine unlearning, faces three critical challenges in learned CE models: attribute-level sensitivity, inter-table propagation and domain disappearance leading to severe overestimation in multi-way joins. We propose Cardinality Estimation Pruning (CEP), the first unlearning framework specifically designed for multi-table learned CE systems. CEP introduces Distribution Sensitivity Pruning, which constructs semi-join deletion results and computes sensitivity scores to guide parameter pruning, and Domain Pruning, which removes support for value domains entirely eliminated by deletion. We evaluate CEP on state-of-the-art architectures NeuroCard and FACE across IMDB and TPC-H datasets. Results demonstrate CEP consistently achieves the lowest Q-error in multi-table scenarios, particularly under high deletion ratios, often outperforming full retraining. Furthermore, CEP significantly reduces convergence iterations, incurring negligible computational overhead of 0.3%-2.5% of fine-tuning time.


【3】Interpretable Air Pollution Forecasting by Physics-Guided Spatiotemporal Decoupling
标题:通过物理引导的时空解耦进行可解释的空气污染预测
链接:https://arxiv.org/abs/2511.20257

作者:Zhiguo Zhang,Xiaoliang Ma,Daniel Schlesinger
备注:Accepted to 2025 IEEE International Conference on Big Data
摘要:Accurate and interpretable air pollution forecasting is crucial for public health, but most models face a trade-off between performance and interpretability. This study proposes a physics-guided, interpretable-by-design spatiotemporal learning framework. The model decomposes the spatiotemporal behavior of air pollutant concentrations into two transparent, additive modules. The first is a physics-guided transport kernel with directed weights conditioned on wind and geography (advection). The second is an explainable attention mechanism that learns local responses and attributes future concentrations to specific historical lags and exogenous drivers. Evaluated on a comprehensive dataset from the Stockholm region, our model consistently outperforms state-of-the-art baselines across multiple forecasting horizons. Our model's integration of high predictive performance and spatiotemporal interpretability provides a more reliable foundation for operational air-quality management in real-world applications.
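摘要中"以风向和地理为条件的有向传输核"可用一个假设性草图来示意：当站点 j 处的风指向站点 i 时，边 j→i 的权重更大。论文的具体参数化（以及可能的距离衰减项）可能不同：

```python
import numpy as np

def advection_weights(coords, wind):
    """站点间的有向传输（平流）权重：j 处的风越指向 i，
    边 j -> i 的权重越大。仅为物理引导核的示意性草图。
    coords: (N, 2) 站点坐标；wind: (N, 2) 各站点风矢量。"""
    N = len(coords)
    W = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            d = coords[i] - coords[j]      # 位移方向 j -> i
            cos = d @ wind[j] / (np.linalg.norm(d) * np.linalg.norm(wind[j]) + 1e-12)
            W[i, j] = np.exp(cos)          # 顺风方向的边权重更大
    return W

coords = np.array([[0.0, 0.0], [1.0, 0.0]])
wind = np.array([[1.0, 0.0], [1.0, 0.0]])  # 风一致吹向 +x
W = advection_weights(coords, wind)
assert W[1, 0] > W[0, 1]                   # 下风向站点接收更多传输
```

这种以物理量显式参数化的权重使平流模块保持透明、可解释。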


【4】Multivariate Forecasting of Bitcoin Volatility with Gradient Boosting: Deterministic, Probabilistic, and Feature Importance Perspectives
标题:采用梯度提升的比特币波动性多元预测:确定性、概率和特征重要性观点
链接:https://arxiv.org/abs/2511.20105

作者:Grzegorz Dudek,Mateusz Kasprzyk,Paweł Pełka
摘要:This study investigates the application of the Light Gradient Boosting Machine (LGBM) model for both deterministic and probabilistic forecasting of Bitcoin realized volatility. Utilizing a comprehensive set of 69 predictors -- encompassing market, behavioral, and macroeconomic indicators -- we evaluate the performance of LGBM-based models and compare them with both econometric and machine learning baselines. For probabilistic forecasting, we explore two quantile-based approaches: direct quantile regression using the pinball loss function, and a residual simulation method that transforms point forecasts into predictive distributions. To identify the main drivers of volatility, we employ gain-based and permutation feature importance techniques, consistently highlighting the significance of trading volume, lagged volatility measures, investor attention, and market capitalization. The results demonstrate that LGBM models effectively capture the nonlinear and high-variance characteristics of cryptocurrency markets while providing interpretable insights into the underlying volatility dynamics.
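摘要提到"使用 pinball 损失函数的直接分位数回归"。该损失本身只需一行实现（示意）：对低估按 q、对高估按 1-q 进行非对称惩罚，最小化它的预测即收敛到目标分位数：

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Pinball（分位数）损失：低估按 q 惩罚，高估按 1-q 惩罚。
    最小化该损失得到 y 的 q 分位数预测。"""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

# q=0.5 时退化为 0.5 * MAE。
assert np.isclose(pinball_loss([0.0], [2.0], 0.5), 1.0)
# q=0.9 时低估（真值高于预测）的代价是高估的 9 倍。
assert np.isclose(pinball_loss([3.0], [2.0], 0.9), 0.9)
assert np.isclose(pinball_loss([1.0], [2.0], 0.9), 0.1)
```

在 LightGBM 中，逐个分位数训练带该损失的模型即可拼出预测分布（即文中的直接分位数回归路线）。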


【5】RED-F: Reconstruction-Elimination based Dual-stream Contrastive Forecasting for Multivariate Time Series Anomaly Prediction
标题:RED-F：基于重建-消除的双流对比预测框架用于多元时间序列异常预测
链接:https://arxiv.org/abs/2511.20044

作者:PengYu Chen,Xiaohou Shi,Yuan Chang,Yan Sun,Sajal K. Das
备注:13 pages, 12 figures
摘要:The proactive prediction of anomalies (AP) in multivariate time series (MTS) is a critical challenge to ensure system dependability. The difficulty lies in identifying subtle anomaly precursors concealed within normal signals. However, existing unsupervised methods, trained exclusively on normal data, demonstrate a fundamental propensity to reconstruct normal patterns. Consequently, when confronted with weak precursors, their predictions are dominated by the normal pattern, submerging the very signal required for prediction. To contend with the limitation, we propose RED-F, a Reconstruction-Elimination based Dual-stream Contrastive Forecasting framework, comprising the Reconstruction-Elimination Model (REM) and the Dual-stream Contrastive Forecasting Model (DFM). The REM utilizes a hybrid time-frequency mechanism to mitigate the precursor, generating a purified, normal-pattern baseline. The DFM then receives this purified baseline and the original sequence which retains the precursor as parallel inputs. At the core of our framework, RED-F employs a contrastive forecast that transforms the difficult task of absolute signal detection into a simpler, more robust task of relative trajectory comparison by computing the divergence between these two predictive streams. This contrastive mechanism serves to amplify the faint precursor signal. Furthermore, the DFM is trained with a novel Multi-Series Prediction (MSP) objective, which leverages distant future context to enhance its predictive sensitivity. Extensive experiments on six real-world datasets demonstrate the superior capability of RED-F in anomaly prediction tasks.
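"对比预测"的核心思想可以用一个示意性草图表达（假设性实现，非论文的具体度量）：对同一段未来，分别由保留前兆的原始序列和经重建-消除净化的基线给出两条预测轨迹，以两者的散度作为逐步异常分数，从而放大微弱的前兆信号：

```python
import numpy as np

def contrastive_anomaly_score(pred_original, pred_purified):
    """逐时间步的异常分数：两条预测流之间的散度。
    一条由保留前兆的原始序列预测，另一条由净化基线预测；
    轨迹差距大即提示异常前兆。此处用 L2 散度作示意。"""
    a = np.asarray(pred_original, dtype=float)
    b = np.asarray(pred_purified, dtype=float)
    return np.linalg.norm(a - b, axis=-1)

# 两条流一致 -> 分数为 0；出现分歧 -> 分数上升。
score = contrastive_anomaly_score([[1.0, 1.0], [2.0, 2.0]],
                                  [[1.0, 1.0], [2.0, 3.0]])
assert score.tolist() == [0.0, 1.0]
```

将"绝对信号检测"转化为"相对轨迹比较"，正是这种相减结构在起作用：正常模式在两条流中相互抵消，剩下的差异主要来自前兆。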


【6】Structured Noise Modeling for Enhanced Time-Series Forecasting
标题:用于增强时间序列预测的结构化噪声建模
链接:https://arxiv.org/abs/2511.19657

作者:Sepideh Koohfar
摘要:Time-series forecasting remains difficult in real-world settings because temporal patterns operate at multiple scales, from broad contextual trends to fast, fine-grained fluctuations that drive critical decisions. Existing neural models often struggle to represent these interacting dynamics, leading to unstable predictions and reduced reliability in downstream applications. This work introduces a forecast-blur-denoise framework that improves temporal fidelity through structured noise modeling. The approach incorporates a learnable Gaussian Process module that generates smooth, correlated perturbations, encouraging the forecasting backbone to capture long-range structure while a dedicated refinement model restores high-resolution temporal detail. Training the components jointly enables natural competence division and avoids the artifacts commonly produced by isotropic corruption methods. Experiments across electricity, traffic, and solar datasets show consistent gains in multi-horizon accuracy and stability. The modular design also allows the blur-denoise layer to operate as a lightweight enhancement for pretrained models, supporting efficient adaptation in limited-data scenarios. By strengthening the reliability and interpretability of fine-scale temporal predictions, this framework contributes to more trustworthy AI systems used in forecasting-driven decision support across energy, infrastructure, and other time-critical domains.
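摘要中"可学习的高斯过程模块生成平滑、相关的扰动"可用 RBF 核的 GP 采样来示意（假设性草图，核与长度尺度均为假设参数）：通过核矩阵的 Cholesky 分解得到时间上相关的平滑噪声，而非各向同性的白噪声：

```python
import numpy as np

def gp_blur_noise(T, length_scale, sigma, rng):
    """从 RBF 核高斯过程采样平滑、时间相关的扰动。
    length_scale 控制平滑程度；与各向同性（白）噪声不同，
    相邻时间步的扰动高度相关。"""
    t = np.arange(T, dtype=float)[:, None]
    K = sigma**2 * np.exp(-0.5 * (t - t.T) ** 2 / length_scale**2)
    K += 1e-8 * np.eye(T)                  # 抖动项保证数值稳定
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal(T)

rng = np.random.default_rng(3)
noise = gp_blur_noise(64, length_scale=8.0, sigma=1.0, rng=rng)
assert noise.shape == (64,)
# 长度尺度较大时，相邻样本高度相关（即"模糊"是平滑的）。
assert np.corrcoef(noise[:-1], noise[1:])[0, 1] > 0.5
```

把这种相关扰动加到中间预测上，即得到框架中"模糊"阶段的输入，再交由去噪模型恢复高分辨率细节。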


【7】ModHiFi: Identifying High Fidelity predictive components for Model Modification
标题:ModHiFi:识别用于模型修改的高保真预测组件
链接:https://arxiv.org/abs/2511.19566

作者:Dhruva Kashyap,Chaitanya Murti,Pranav K Nayak,Tanay Narshana,Chiranjib Bhattacharyya
备注:NeurIPS 2025 (Spotlight). Our code is available at https://github.com/DhruvaKashyap/modhifi
摘要:Open weight models, which are ubiquitous, rarely provide access to their training data or loss function. This makes modifying such models for tasks such as pruning or unlearning constrained by this unavailability an active area of research. Existing techniques typically require gradients or ground-truth labels, rendering them infeasible in settings with limited computational resources. In this work, we investigate the fundamental question of identifying components that are critical to the model's predictive performance, without access to either gradients or the loss function, and with only distributional access such as synthetic data. We theoretically demonstrate that the global reconstruction error is linearly bounded by local reconstruction errors for Lipschitz-continuous networks such as CNNs and well-trained Transformers (which, contrary to existing literature, we find exhibit Lipschitz continuity). This motivates using the locally reconstructive behavior of component subsets to quantify their global importance, via a metric that we term Subset Fidelity. In the uncorrelated features setting, selecting individual components via their Subset Fidelity scores is optimal, which we use to propose ModHiFi, an algorithm for model modification that requires no training data or loss function access. ModHiFi-P, for structured pruning, achieves an 11% speedup over the current state of the art on ImageNet models and competitive performance on language models. ModHiFi-U, for classwise unlearning, achieves complete unlearning on CIFAR-10 without fine-tuning and demonstrates competitive performance on Swin Transformers.
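"子集保真度"（Subset Fidelity）的一个示意性度量如下（基于摘要的假设性草图，并非论文的精确公式）：在（可为合成的）输入上，衡量仅保留某组件子集时对层输出的局部重建程度，全程不需要梯度、标签或损失函数：

```python
import numpy as np

def subset_fidelity(W, keep_idx, X):
    """用局部重建误差为一组输出单元打分：
    将未保留的单元置零后，与完整层输出在输入 X 上比较。
    只需分布访问（例如合成数据），不需要梯度或损失函数。
    分数越高，该子集对层输出的保真度越高。"""
    full = X @ W.T
    Wp = np.zeros_like(W)
    Wp[keep_idx] = W[keep_idx]
    err = np.linalg.norm(full - X @ Wp.T) / (np.linalg.norm(full) + 1e-12)
    return 1.0 - err

rng = np.random.default_rng(4)
W = rng.standard_normal((8, 5))
X = rng.standard_normal((32, 5))
# 保留全部单元时可精确重建输出。
assert np.isclose(subset_fidelity(W, np.arange(8), X), 1.0)
# 子集越小，保真度通常越低。
assert subset_fidelity(W, np.arange(4), X) < 1.0
```

论文的理论结果（Lipschitz 网络的全局重建误差被局部误差线性控制）正是用这类局部分数来量化组件的全局重要性。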


【8】PeriodNet: Boosting the Potential of Attention Mechanism for Time Series Forecasting
标题:PeriodNet:释放注意力机制在时间序列预测中的潜力
链接:https://arxiv.org/abs/2511.19497

作者:Bowen Zhao,Huanlai Xing,Zhiwen Xiao,Jincheng Peng,Li Feng,Xinhan Wang,Rong Qu,Hui Li
摘要:The attention mechanism has demonstrated remarkable potential in sequence modeling, exemplified by its successful application in natural language processing with models such as Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT). Despite these advancements, its utilization in time series forecasting (TSF) has yet to meet expectations. Exploring a better network structure for attention in TSF holds immense significance across various domains. In this paper, we present PeriodNet with a brand new structure to forecast univariate and multivariate time series. PeriodNet incorporates period attention and sparse period attention mechanism for analyzing adjacent periods. It enhances the mining of local characteristics, periodic patterns, and global dependencies. For efficient cross-variable modeling, we introduce an iterative grouping mechanism which can directly reduce the cross-variable redundancy. To fully leverage the extracted features on the encoder side, we redesign the entire architecture of the vanilla Transformer and propose a period diffuser for precise multi-period prediction. Through comprehensive experiments conducted on eight datasets, we demonstrate that PeriodNet outperforms six state-of-the-art models in both univariate and multivariate TSF scenarios in terms of mean square error and mean absolute error. In particular, PeriodNet achieves a relative improvement of 22% when forecasting time series with a length of 720, in comparison to other models based on the conventional encoder-decoder Transformer architecture.


【9】OmniTFT: Omni Target Forecasting for Vital Signs and Laboratory Result Trajectories in Multi Center ICU Data
标题:OmniTFT:多中心ICU数据中生命体征和实验室结果轨迹的全方位目标预测
链接:https://arxiv.org/abs/2511.19485

作者:Wanzhe Xu,Yutong Dai,Yitao Yang,Martin Loza,Weihang Zhang,Yang Cui,Xin Zeng,Sung Joon Park,Kenta Nakai
备注:23 pages, 5 figures, 2 tables
摘要:Accurate multivariate time-series prediction of vital signs and laboratory results is crucial for early intervention and precision medicine in intensive care units (ICUs). However, vital signs are often noisy and exhibit rapid fluctuations, while laboratory tests suffer from missing values, measurement lags, and device-specific bias, making integrative forecasting highly challenging. To address these issues, we propose OmniTFT, a deep learning framework that jointly learns and forecasts high-frequency vital signs and sparsely sampled laboratory results based on the Temporal Fusion Transformer (TFT). Specifically, OmniTFT implements four novel strategies to enhance performance: sliding window equalized sampling to balance physiological states, frequency-aware embedding shrinkage to stabilize rare-class representations, hierarchical variable selection to guide model attention toward informative feature clusters, and influence-aligned attention calibration to enhance robustness during abrupt physiological changes. By reducing the reliance on target-specific architectures and extensive feature engineering, OmniTFT enables unified modeling of multiple heterogeneous clinical targets while preserving cross-institutional generalizability. Across forecasting tasks, OmniTFT achieves substantial performance improvement for both vital signs and laboratory results on the MIMIC-III, MIMIC-IV, and eICU datasets. Its attention patterns are interpretable and consistent with known pathophysiology, underscoring its potential utility for quantitative decision support in clinical care.


其他神经网络|深度学习|模型|建模(35篇)

【1】ROOT: Robust Orthogonalized Optimizer for Neural Network Training
标题:ROOT:用于神经网络训练的鲁棒正交化优化器
链接:https://arxiv.org/abs/2511.20626

作者:Wei He,Kai Han,Hang Zhou,Hanting Chen,Zhicheng Liu,Xinghao Chen,Yunhe Wang
摘要:The optimization of large language models (LLMs) remains a critical challenge, particularly as model scaling exacerbates sensitivity to algorithmic imprecision and training instability. Recent advances in optimizers have improved convergence efficiency through momentum orthogonalization, but suffer from two key robustness limitations: dimensional fragility in orthogonalization precision and vulnerability to outlier-induced noise. To address these robustness challenges, we introduce ROOT, a Robust Orthogonalized Optimizer that enhances training stability through dual robustness mechanisms. First, we develop a dimension-robust orthogonalization scheme using adaptive Newton iterations with fine-grained coefficients tailored to specific matrix sizes, ensuring consistent precision across diverse architectural configurations. Second, we introduce an optimization-robust framework via proximal optimization that suppresses outlier noise while preserving meaningful gradient directions. Extensive experiments demonstrate that ROOT achieves significantly improved robustness, with faster convergence and superior final performance compared to both Muon and Adam-based optimizers, particularly in noisy and non-convex scenarios. Our work establishes a new paradigm for developing robust and precise optimizers capable of handling the complexities of modern large-scale model training. The code will be available at https://github.com/huawei-noah/noah-research/tree/master/ROOT.


【2】Anatomica: Localized Control over Geometric and Topological Properties for Anatomical Diffusion Models
标题:Anatomica:解剖扩散模型几何和拓扑属性的局部控制
链接:https://arxiv.org/abs/2511.20587

作者:Karim Kadry,Abdallah Abdelwahed,Shoaib Goraya,Ajay Manicka,Naravich Chutisilp,Farhad Nezami,Elazer Edelman
备注:8 pages, 10 figures
摘要:We present Anatomica: an inference-time framework for generating multi-class anatomical voxel maps with localized geo-topological control. During generation, we use cuboidal control domains of varying dimensionality, location, and shape to slice out relevant substructures. These local substructures are used to compute differentiable penalty functions that steer the sample towards target constraints. We control geometric features such as size, shape, and position through voxel-wise moments, while topological features such as connected components, loops, and voids are enforced through persistent homology. Lastly, we implement Anatomica for latent diffusion models, where neural field decoders partially extract substructures, enabling the efficient control of anatomical properties. Anatomica applies flexibly across diverse anatomical systems, composing constraints to control complex structures over arbitrary dimensions and coordinate systems, thereby enabling the rational design of synthetic datasets for virtual trials or machine learning workflows.


【3】PaTAS: A Parallel System for Trust Propagation in Neural Networks Using Subjective Logic
标题:PaTAS:使用主观逻辑的神经网络信任传播并行系统
链接:https://arxiv.org/abs/2511.20586

作者:Koffi Ismael Ouattara,Ioannis Krontiris,Theo Dimitrakos,Dennis Eisermann,Frank Kargl
摘要:Trustworthiness has become a key requirement for the deployment of artificial intelligence systems in safety-critical applications. Conventional evaluation metrics such as accuracy and precision fail to capture uncertainty or the reliability of model predictions, particularly under adversarial or degraded conditions. This paper introduces the Parallel Trust Assessment System (PaTAS), a framework for modeling and propagating trust in neural networks using Subjective Logic (SL). PaTAS operates in parallel with standard neural computation through Trust Nodes and Trust Functions that propagate input, parameter, and activation trust across the network. The framework defines a Parameter Trust Update mechanism to refine parameter reliability during training and an Inference-Path Trust Assessment (IPTA) method to compute instance-specific trust at inference. Experiments on real-world and adversarial datasets demonstrate that PaTAS produces interpretable, symmetric, and convergent trust estimates that complement accuracy and expose reliability gaps in poisoned, biased, or uncertain data scenarios. The results show that PaTAS effectively distinguishes between benign and adversarial inputs and identifies cases where model confidence diverges from actual reliability. By enabling transparent and quantifiable trust reasoning within neural architectures, PaTAS provides a principled foundation for evaluating model reliability across the AI lifecycle.


【4】MSTN: Fast and Efficient Multivariate Time Series Model
标题:MSTN:快速高效的多元时间序列模型
链接:https://arxiv.org/abs/2511.20577

作者:Sumit S Shevtekar,Chandresh K Maurya,Gourab Sil
备注:21 pages, 1 figure, 5 tables
摘要:Real-world time-series data is highly non-stationary, with complex dynamics that operate across multiple timescales, ranging from fast, short-term changes to slow, long-term trends. Most existing models rely on fixed-scale structural priors, such as patch-based tokenization, fixed frequency transformations, or frozen backbone architectures. This often leads to over-regularization of temporal dynamics, which limits their ability to adaptively model the full spectrum of temporal variations and impairs their performance on unpredictable, sudden, high-magnitude events. To address this, we introduce the Multi-scale Temporal Network (MSTN), a novel deep learning architecture founded on a hierarchical multi-scale and sequence modeling principle. The MSTN framework integrates: (i) a multi-scale convolutional encoder that constructs a hierarchical feature pyramid for local patterns; (ii) a sequence modeling component for long-range temporal dependencies, which we empirically validate with BiLSTM and Transformer variants, establishing a flexible foundation for future architectural advancements; and (iii) a gated fusion mechanism augmented with squeeze-and-excitation (SE) and multi-head temporal attention (MHTA) for dynamic, context-aware feature integration. This design enables MSTN to adaptively model temporal patterns from milliseconds to long-range dependencies within a unified framework. Extensive evaluations across time-series long-horizon forecasting, imputation, classification and a generalizability study demonstrate that MSTN achieves competitive state-of-the-art (SOTA) performance, showing improvements over contemporary approaches including EMTSF, LLM4TS, HiMTM, TIME-LLM, MTST, SOFTS, iTransformer, TimesNet, and PatchTST. In total, MSTN establishes new SOTA performance on 24 of 32 benchmark datasets, demonstrating its consistent performance across diverse temporal tasks.


【5】STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flow
标题:STARFlow-V:基于标准化流的端到端视频生成建模
链接:https://arxiv.org/abs/2511.20462

作者:Jiatao Gu,Ying Shen,Tianrong Chen,Laurent Dinh,Yuyang Wang,Miguel Angel Bautista,David Berthelot,Josh Susskind,Shuangfei Zhai
备注:21 pages
摘要:Normalizing flows (NFs) are end-to-end likelihood-based generative models for continuous data, and have recently regained attention with encouraging progress on image generation. Yet in the video generation domain, where spatiotemporal complexity and computational cost are substantially higher, state-of-the-art systems almost exclusively rely on diffusion-based models. In this work, we revisit this design space by presenting STARFlow-V, a normalizing flow-based video generator with substantial benefits such as end-to-end learning, robust causal prediction, and native likelihood estimation. Building upon the recently proposed STARFlow, STARFlow-V operates in the spatiotemporal latent space with a global-local architecture which restricts causal dependencies to a global latent space while preserving rich local within-frame interactions. This eases error accumulation over time, a common pitfall of standard autoregressive diffusion model generation. Additionally, we propose flow-score matching, which equips the model with a light-weight causal denoiser to improve the video generation consistency in an autoregressive fashion. To improve the sampling efficiency, STARFlow-V employs a video-aware Jacobi iteration scheme that recasts inner updates as parallelizable iterations without breaking causality. Thanks to the invertible structure, the same model can natively support text-to-video, image-to-video as well as video-to-video generation tasks. Empirically, STARFlow-V achieves strong visual fidelity and temporal consistency with practical sampling throughput relative to diffusion-based baselines. These results present the first evidence, to our knowledge, that NFs are capable of high-quality autoregressive video generation, establishing them as a promising research direction for building world models. Code and generated samples are available at https://github.com/apple/ml-starflow.


【6】Model-Based Learning of Whittle indices
标题:基于模型的惠特尔指数学习
链接:https://arxiv.org/abs/2511.20397

作者:Joël Charles-Rebuffé,Nicolas Gast,Bruno Gaujal
备注:31 pages, 8 figures, submitted to TOMPECS
摘要:We present BLINQ, a new model-based algorithm that learns the Whittle indices of an indexable, communicating and unichain Markov Decision Process (MDP). Our approach relies on building an empirical estimate of the MDP and then computing its Whittle indices using an extended version of a state-of-the-art existing algorithm. We provide a proof of convergence to the Whittle indices we want to learn as well as a bound on the time needed to learn them with arbitrary precision. Moreover, we investigate its computational complexity. Our numerical experiments suggest that BLINQ significantly outperforms existing Q-learning approaches in terms of the number of samples needed to get an accurate approximation. In addition, it has a total computational cost even lower than Q-learning for any reasonably high number of samples. These observations persist even when the Q-learning algorithms are speeded up using pre-trained neural networks to predict Q-values.
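The model-based starting point of BLINQ, an empirical estimate of the MDP built from observed transitions, can be sketched as follows (the Whittle-index computation BLINQ then runs on this estimate is omitted; names are illustrative):

```python
import numpy as np

def empirical_mdp(transitions, n_states, n_actions):
    """Estimate transition probabilities P[s, a, s'] and mean rewards
    R[s, a] from a list of (s, a, r, s') samples. Unvisited (s, a)
    pairs keep zero estimates."""
    counts = np.zeros((n_states, n_actions, n_states))
    rewards = np.zeros((n_states, n_actions))
    for s, a, r, s2 in transitions:
        counts[s, a, s2] += 1
        rewards[s, a] += r
    n = counts.sum(axis=2, keepdims=True)        # visit counts per (s, a)
    P = counts / np.maximum(n, 1)                # empirical transition kernel
    R = rewards / np.maximum(n.squeeze(-1), 1)   # empirical mean reward
    return P, R
```

As the abstract notes, convergence of the downstream Whittle indices then hinges on how fast these empirical estimates concentrate around the true model.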


【7】Differentiable Attenuation Filters for Feedback Delay Networks
标题:反馈延迟网络的可微分衰减滤波器
链接:https://arxiv.org/abs/2511.20380

作者:Ilias Ibnyahya,Joshua D. Reiss
摘要:We introduce a novel method for designing attenuation filters in digital audio reverberation systems based on Feedback Delay Networks (FDNs). Our approach uses Second Order Sections (SOS) of Infinite Impulse Response (IIR) filters arranged as parametric equalizers (PEQ), enabling fine control over frequency-dependent reverberation decay. Unlike traditional graphic equalizer designs, which require numerous filters per delay line, we propose a scalable solution where the number of filters can be adjusted. The frequency, gain, and quality factor (Q) parameters are shared across delay lines, and only the gain is adjusted based on delay length. This design not only reduces the number of optimization parameters, but also remains fully differentiable and compatible with gradient-based learning frameworks. Leveraging principles of analog filter design, our method allows for efficient and accurate filter fitting using supervised learning. Our method delivers a flexible and differentiable design, achieving state-of-the-art performance while significantly reducing computational cost.
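The gain rule described above (frequency centers and Q shared, only the gain scaled by delay length) follows the standard FDN attenuation design: a delay of M samples must attenuate by 60·(M/fs)/T60 dB per pass so recirculating energy decays 60 dB over the target reverberation time. A minimal sketch, with illustrative band-wise T60 values:

```python
import numpy as np

def attenuation_gains_db(delay_lengths, t60_per_band, fs=48000.0):
    """Gain in dB for each (delay line, band) pair: the per-pass
    attenuation needed so energy decays 60 dB over T60 seconds."""
    d = np.asarray(delay_lengths, dtype=float)[:, None]   # (lines, 1)
    t60 = np.asarray(t60_per_band, dtype=float)[None, :]  # (1, bands)
    return -60.0 * (d / fs) / t60
```

Because the mapping from delay length to gain is a fixed linear scaling, only the shared band gains need to be optimized, which is what keeps the design differentiable and compact.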


【8】MXtalTools: A Toolkit for Machine Learning on Molecular Crystals
标题:MXtalTools:分子晶体机器学习工具包
链接:https://arxiv.org/abs/2511.20327

作者:Michael Kilgour,Mark E. Tuckerman,Jutta Rogal
备注:16 pages, 11 figures
摘要:We present MXtalTools, a flexible Python package for the data-driven modelling of molecular crystals, facilitating machine learning studies of the molecular solid state. MXtalTools comprises several classes of utilities: (1) synthesis, collation, and curation of molecule and crystal datasets, (2) integrated workflows for model training and inference, (3) crystal parameterization and representation, (4) crystal structure sampling and optimization, (5) end-to-end differentiable crystal sampling, construction and analysis. Our modular functions can be integrated into existing workflows or combined and used to build novel modelling pipelines. MXtalTools leverages CUDA acceleration to enable high-throughput crystal modelling. The Python code is available open-source on our GitHub page, with detailed documentation on ReadTheDocs.


【9】Communication-Efficient Learning for Satellite Constellations
标题:卫星星座的通信高效学习
链接:https://arxiv.org/abs/2511.20220

作者:Ruxandra-Stefania Tudose,Moritz H. W. Grüss,Grace Ra Kim,Karl H. Johansson,Nicola Bastianello
摘要:Satellite constellations in low-Earth orbit are now widespread, enabling positioning, Earth imaging, and communications. In this paper we address the solution of learning problems using these satellite constellations. In particular, we focus on a federated approach, where satellites collect and locally process data, with the ground station aggregating local models. We focus on designing a novel, communication-efficient algorithm that still yields accurate trained models. To this end, we employ several mechanisms to reduce the number of communications with the ground station (local training) and their size (compression). We then propose an error feedback mechanism that enhances accuracy, which yields, as a byproduct, an algorithm-agnostic error feedback scheme that can be more broadly applied. We analyze the convergence of the resulting algorithm, and compare it with the state of the art through simulations in a realistic space scenario, showcasing superior performance.


【10】Learning Subgroups with Maximum Treatment Effects without Causal Heuristics
标题:无需因果启发式即可学习具有最大治疗效果的亚组
链接:https://arxiv.org/abs/2511.20189

作者:Lincen Yang,Zhong Li,Matthijs van Leeuwen,Saber Salehkaleybar
备注:The full version (including the Appendix). Accepted at AAAI 2026
摘要:Discovering subgroups with the maximum average treatment effect is crucial for targeted decision making in domains such as precision medicine, public policy, and education. While most prior work is formulated in the potential outcome framework, the corresponding structural causal model (SCM) for this task has been largely overlooked. In practice, two approaches dominate. The first estimates pointwise conditional treatment effects and then fits a tree on those estimates, effectively turning subgroup estimation into the harder problem of accurate pointwise estimation. The second constructs decision trees or rule sets with ad-hoc 'causal' heuristics, typically without rigorous justification for why a given heuristic may be used or whether such heuristics are necessary at all. We address these issues by studying the problem directly under the SCM framework. Under the assumption of a partition-based model, we show that optimal subgroup discovery reduces to recovering the data-generating models and hence a standard supervised learning problem (regression or classification). This allows us to adopt any partition-based methods to learn the subgroup from data. We instantiate the approach with CART, arguably one of the most widely used tree-based methods, to learn the subgroup with maximum treatment effect. Finally, on a large collection of synthetic and semi-synthetic datasets, we compare our method against a wide range of baselines and find that our approach, which avoids such causal heuristics, more accurately identifies subgroups with maximum treatment effect. Our source code is available at https://github.com/ylincen/causal-subgroup.
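The reduction the authors describe, recovering a partition via supervised learning and then reading off per-cell effects, makes the final selection step simple. A difference-in-means sketch over an already-learned partition (cell labels here are hypothetical stand-ins for the leaves of a fitted tree):

```python
import numpy as np

def max_effect_subgroup(cells, treated, outcome):
    """Return the partition cell with the largest difference-in-means
    treatment effect estimate, plus the per-cell estimates."""
    effects = {}
    for c in np.unique(cells):
        m = cells == c
        t = treated[m].astype(bool)
        if t.all() or (~t).all():
            continue  # effect not identifiable without both arms
        effects[int(c)] = outcome[m][t].mean() - outcome[m][~t].mean()
    best = max(effects, key=effects.get)
    return best, effects
```

In the paper's instantiation the cells would come from CART leaves; any partition-based learner can supply them.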


【11】SOMBRL: Scalable and Optimistic Model-Based RL
标题:SOMBRL:可扩展且乐观的基于模型的RL
链接:https://arxiv.org/abs/2511.20066

作者:Bhavya Sukhija,Lenart Treven,Carmelo Sferrazza,Florian Dörfler,Pieter Abbeel,Andreas Krause
摘要:We address the challenge of efficient exploration in model-based reinforcement learning (MBRL), where the system dynamics are unknown and the RL agent must learn directly from online interactions. We propose Scalable and Optimistic MBRL (SOMBRL), an approach based on the principle of optimism in the face of uncertainty. SOMBRL learns an uncertainty-aware dynamics model and greedily maximizes a weighted sum of the extrinsic reward and the agent's epistemic uncertainty. SOMBRL is compatible with any policy optimizers or planners, and under common regularity assumptions on the system, we show that SOMBRL has sublinear regret for nonlinear dynamics in the (i) finite-horizon, (ii) discounted infinite-horizon, and (iii) non-episodic settings. Additionally, SOMBRL offers a flexible and scalable solution for principled exploration. We evaluate SOMBRL on state-based and visual-control environments, where it displays strong performance across all tasks and baselines. We also evaluate SOMBRL on a dynamic RC car hardware and show SOMBRL outperforms the state-of-the-art, illustrating the benefits of principled exploration for MBRL.
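The greedy objective above, extrinsic reward plus a weighted epistemic-uncertainty bonus, can be sketched with ensemble disagreement as the uncertainty proxy; this is a common choice, and SOMBRL's actual uncertainty-aware dynamics model may differ:

```python
import numpy as np

def optimistic_reward(extrinsic, ensemble_next_states, beta=1.0):
    """Extrinsic reward plus beta times epistemic uncertainty, proxied
    by the per-dimension std across an ensemble of dynamics models.
    ensemble_next_states has shape (n_models, batch, state_dim)."""
    epistemic = np.std(ensemble_next_states, axis=0).sum(axis=-1)
    return np.asarray(extrinsic) + beta * epistemic
```

States where the models disagree receive a bonus, steering the policy optimizer toward informative regions, which is the "optimism in the face of uncertainty" principle the abstract invokes.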


【12】iRadioDiff: Physics-Informed Diffusion Model for Indoor Radio Map Construction and Localization
标题:iRadioDiff:用于室内无线电地图构建和定位的物理信息扩散模型
链接:https://arxiv.org/abs/2511.20015

作者:Xiucheng Wang,Tingwei Yuan,Yang Cao,Nan Cheng,Ruijin Sun,Weihua Zhuang
摘要:Radio maps (RMs) serve as environment-aware electromagnetic (EM) representations that connect scenario geometry and material properties to the spatial distribution of signal strength, enabling localization without costly in-situ measurements. However, constructing high-fidelity indoor RMs remains challenging due to the prohibitive latency of EM solvers and the limitations of learning-based methods, which often rely on sparse measurements or assumptions of homogeneous material, which are misaligned with the heterogeneous and multipath-rich nature of indoor environments. To overcome these challenges, we propose iRadioDiff, a sampling-free diffusion-based framework for indoor RM construction. iRadioDiff is conditioned on access point (AP) positions, and physics-informed prompt encoded by material reflection and transmission coefficients. It further incorporates multipath-critical priors, including diffraction points, strong transmission boundaries, and line-of-sight (LoS) contours, to guide the generative process via conditional channels and boundary-weighted objectives. This design enables accurate modeling of nonstationary field discontinuities and efficient construction of physically consistent RMs. Experiments demonstrate that iRadioDiff achieves state-of-the-art performance in indoor RM construction and received signal strength based indoor localization, which offers effective generalization across layouts and material configurations. Code is available at https://github.com/UNIC-Lab/iRadioDiff.


【13】On-Demand Multi-Task Sparsity for Efficient Large-Model Deployment on Edge Devices
标题:按需多任务稀疏性,实现大模型在边缘设备上的高效部署
链接:https://arxiv.org/abs/2511.19986

作者:Lianming Huang,Haibo Hu,Qiao Li,Nan Guan,Chun Jason Xue
摘要:Sparsity is essential for deploying large models on resource constrained edge platforms. However, optimizing sparsity patterns for individual tasks in isolation ignores the significant I/O overhead incurred during frequent task switching. We introduce an on-demand multi-task sparsity framework specifically designed to minimize switching costs by maximizing parameter reuse. Unlike monolithic approaches, we decompose weights into reusable block-granular units and align sparse structures across tasks to maximize overlap. By dynamically loading only the small differential set of blocks required for the next task, our method effectively mitigates the cold-start latency inherent in traditional monolithic approaches. Experiments on a real-world autonomous driving platform demonstrate that our framework achieves superior switching efficiency, accelerating task switching by over 6.6X on average compared to existing sparsity methods.
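The switching-cost saving comes from loading only the differential block set; the bookkeeping reduces to a set difference over block identifiers. A minimal sketch (names are illustrative, not the paper's API):

```python
def blocks_to_fetch(resident, needed):
    """Blocks that must be loaded for the next task: those it needs
    that are not already resident in memory. The reuse ratio grows as
    the tasks' sparse structures are aligned to overlap."""
    resident, needed = set(resident), set(needed)
    fetch = sorted(needed - resident)
    reuse = len(needed & resident) / max(len(needed), 1)
    return fetch, reuse
```

Aligning sparsity patterns across tasks maximizes the intersection term, so switching I/O shrinks to the few blocks in `fetch`.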


【14】Rethinking Message Passing Neural Networks with Diffusion Distance-guided Stress Majorization
标题:利用扩散距离引导的应力主导化(stress majorization)重新思考消息传递神经网络
链接:https://arxiv.org/abs/2511.19984

作者:Haoran Zheng,Renchi Yang,Yubo Zhou,Jianliang Xu
备注:Accepted by SIGKDD 2026. The code is available at https://github.com/HaoranZ99/DDSM
摘要:Message passing neural networks (MPNNs) have emerged as go-to models for learning on graph-structured data in the past decade. Despite their effectiveness, most of such models still incur severe issues such as over-smoothing and -correlation, due to their underlying objective of minimizing the Dirichlet energy and the derived neighborhood aggregation operations. In this paper, we propose the DDSM, a new MPNN model built on an optimization framework that includes the stress majorization and orthogonal regularization for overcoming the above issues. Further, we introduce the diffusion distances for nodes into the framework to guide the new message passing operations and develop efficient algorithms for distance approximations, both backed by rigorous theoretical analyses. Our comprehensive experiments showcase that DDSM consistently and considerably outperforms 15 strong baselines on both homophilic and heterophilic graphs.


【15】Operator Learning at Machine Precision
标题:机器精度的算子学习
链接:https://arxiv.org/abs/2511.19980

作者:Aras Bacho,Aleksei G. Sorokin,Xianjin Yang,Théo Bourdais,Edoardo Calvello,Matthieu Darcy,Alexander Hsu,Bamdad Hosseini,Houman Owhadi
摘要:Neural operator learning methods have garnered significant attention in scientific computing for their ability to approximate infinite-dimensional operators. However, increasing their complexity often fails to substantially improve their accuracy, leaving them on par with much simpler approaches such as kernel methods and more traditional reduced-order models. In this article, we set out to address this shortcoming and introduce CHONKNORIS (Cholesky Newton--Kantorovich Neural Operator Residual Iterative System), an operator learning paradigm that can achieve machine precision. CHONKNORIS draws on numerical analysis: many nonlinear forward and inverse PDE problems are solvable by Newton-type methods. Rather than regressing the solution operator itself, our method regresses the Cholesky factors of the elliptic operator associated with Tikhonov-regularized Newton--Kantorovich updates. The resulting unrolled iteration yields a neural architecture whose machine-precision behavior follows from achieving a contractive map, requiring far lower accuracy than end-to-end approximation of the solution operator. We benchmark CHONKNORIS on a range of nonlinear forward and inverse problems, including a nonlinear elliptic equation, Burgers' equation, a nonlinear Darcy flow problem, the Calderón problem, an inverse wave scattering problem, and a problem from seismic imaging. We also present theoretical guarantees for the convergence of CHONKNORIS in terms of the accuracy of the emulated Cholesky factors. Additionally, we introduce a foundation model variant, FONKNORIS (Foundation Newton--Kantorovich Neural Operator Residual Iterative System), which aggregates multiple pre-trained CHONKNORIS experts for diverse PDEs to emulate the solution map of a novel nonlinear PDE. Our FONKNORIS model is able to accurately solve unseen nonlinear PDEs such as the Klein--Gordon and Sine--Gordon equations.


【16】Designing Reputation Systems for Manufacturing Data Trading Markets: A Multi-Agent Evaluation with Q-Learning and IRL-Estimated Utilities
标题:为制造业数据交易市场设计声誉系统:使用Q-Learning和IRL估计效用的多代理评估
链接:https://arxiv.org/abs/2511.19930

作者:Kenta Yamamoto,Teruaki Hayashi
备注:10 pages, 10 figures
摘要:Recent advances in machine learning and big data analytics have intensified the demand for high-quality cross-domain datasets and accelerated the growth of data trading across organizations. As data become increasingly recognized as an economic asset, data marketplaces have emerged as a key infrastructure for data-driven innovation. However, unlike mature product or service markets, data-trading environments remain nascent and suffer from pronounced information asymmetry. Buyers cannot verify the content or quality before purchasing data, making trust and quality assurance central challenges. To address these issues, this study develops a multi-agent data-market simulator that models participant behavior and evaluates the institutional mechanisms for trust formation. Focusing on the manufacturing sector, where initiatives such as GAIA-X and Catena-X are advancing, the simulator integrates reinforcement learning (RL) for adaptive agent behavior and inverse reinforcement learning (IRL) to estimate utility functions from empirical behavioral data. Using the simulator, we examine the market-level effects of five representative reputation systems-Time-decay, Bayesian-beta, PageRank, PowerTrust, and PeerTrust-and found that PeerTrust achieved the strongest alignment between data price and quality, while preventing monopolistic dominance. Building on these results, we develop a hybrid reputation mechanism that integrates the strengths of existing systems to achieve improved price-quality consistency and overall market stability. This study extends simulation-based data-market analysis by incorporating trust and reputation as endogenous mechanisms and offering methodological and institutional insights into the design of reliable and efficient data ecosystems.


【17】Cisco Time Series Model Technical Report
标题:思科时间序列模型技术报告
链接:https://arxiv.org/abs/2511.19841

作者:Liang Gou,Archit Khare,Praneet Pabolu,Prachi Patel,Joseph Ross,Hercy Shen,Yuhan Song,Jingze Sun,Kristal Curtis,Vedant Dharnidharka,Abhinav Mathur,Hao Yang
摘要:We introduce the Cisco Time Series Model, a univariate zero-shot forecaster. This time series foundation model is the result of a general architectural innovation to a time series model enabling it to accept multiresolution input, applied to a popular decoder-only time series model (TimesFM). The resulting multiresolution decoder-only model is trained on over 300B unique data points, with more than half coming from the observability domain. Quantitative and qualitative evaluations demonstrate that the resulting model achieves superior performance on observability datasets while retaining very similar performance on a standard general-purpose forecasting benchmark (GIFT-Eval), and suggest that the multiresolution structure enables the model to make more accurate predictions on long context input.


【18】Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models
标题:马赛克修剪:专家混合模型可推广修剪的分层框架
链接:https://arxiv.org/abs/2511.19822

作者:Wentao Hu,Mingkuan Zhao,Shuangyong Song,Xiaoyan Zhu,Xin Lai,Jiayin Wang
摘要:Sparse Mixture-of-Experts (SMoE) architectures have enabled a new frontier in scaling Large Language Models (LLMs), offering superior performance by activating only a fraction of their total parameters during inference. However, their practical deployment is severely hampered by substantial static memory overhead, as all experts must be loaded into memory. Existing post-training pruning methods, while reducing model size, often derive their pruning criteria from a single, general-purpose corpus. This leads to a critical limitation: a catastrophic performance degradation when the pruned model is applied to other domains, necessitating a costly re-pruning for each new domain. To address this generalization gap, we introduce Mosaic Pruning (MoP). The core idea of MoP is to construct a functionally comprehensive set of experts through a structured "cluster-then-select" process. This process leverages a similarity metric that captures expert performance across different task domains to functionally cluster the experts, and subsequently selects the most representative expert from each cluster based on our proposed Activation Variability Score. Unlike methods that optimize for a single corpus, our proposed Mosaic Pruning ensures that the pruned model retains a functionally complementary set of experts, much like the tiles of a mosaic that together form a complete picture of the original model's capabilities, enabling it to handle diverse downstream tasks. Extensive experiments on various MoE models demonstrate the superiority of our approach. MoP significantly outperforms prior work, achieving a 7.24% gain on general tasks and 8.92% on specialized tasks like math reasoning and code generation.


【19】CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception
标题:CropVLM:学习缩放以获得细粒度的视觉语言感知
链接:https://arxiv.org/abs/2511.19820

作者:Miguel Carvalho,Helder Dias,Bruno Martins
摘要:Vision-Language Models (VLMs) often struggle with tasks that require fine-grained image understanding, such as scene-text recognition or document analysis, due to perception limitations and visual fragmentation. To address these challenges, we introduce CropVLM as an external low-cost method for boosting performance, enabling VLMs to dynamically ''zoom in'' on relevant image regions, enhancing their ability to capture fine details. CropVLM is trained using reinforcement learning, without using human-labeled bounding boxes as a supervision signal, and without expensive synthetic evaluations. The model is trained once and can be paired with both open-source and proprietary VLMs to improve their performance. Our approach delivers significant improvements on tasks that require high-resolution image understanding, notably for benchmarks that are out-of-domain for the target VLM, without modifying or fine-tuning the VLM, thus avoiding catastrophic forgetting.


【20】DISCO: A Browser-Based Privacy-Preserving Framework for Distributed Collaborative Learning
标题:DISCO:一个基于浏览器的分布式协作学习隐私保护框架
链接:https://arxiv.org/abs/2511.19750

作者:Julien T. T. Vignoud,Valérian Rousset,Hugo El Guedj,Ignacio Aleman,Walid Bennaceur,Batuhan Faik Derinbay,Eduard Ďurech,Damien Gengler,Lucas Giordano,Felix Grimberg,Franziska Lippoldt,Christina Kopidaki,Jiafan Liu,Lauris Lopata,Nathan Maire,Paul Mansat,Martin Milenkoski,Emmanuel Omont,Güneş Özgün,Mina Petrović,Francesco Posa,Morgan Ridel,Giorgio Savini,Marcel Torne,Lucas Trognon,Alyssa Unell,Olena Zavertiaieva,Sai Praneeth Karimireddy,Tahseen Rabbani,Mary-Anne Hartley,Martin Jaggi
摘要:Data is often impractical to share for a range of well-considered reasons, such as concerns over privacy, intellectual property, and legal constraints. This not only fragments the statistical power of predictive models, but creates an accessibility bias, where accuracy becomes inequitably distributed to those who have the resources to overcome these concerns. We present DISCO: an open-source DIStributed COllaborative learning platform accessible to non-technical users, offering a means to collaboratively build machine learning models without sharing any original data or requiring any programming knowledge. DISCO's web application trains models locally directly in the browser, making our tool cross-platform out-of-the-box, including smartphones. The modular design of DISCO offers choices between federated and decentralized paradigms, various levels of privacy guarantees, and several approaches to weight aggregation that allow for model personalization and bias resilience in collaborative training. The code repository is available at https://github.com/epfml/disco and a showcase web interface at https://discolab.ai


【21】Many Ways to be Right: Rashomon Sets for Concept-Based Neural Networks
标题:正确的方式有多种:基于概念的神经网络的罗生门集
链接:https://arxiv.org/abs/2511.19636

作者:Shihan Feng,Cheng Zhang,Michael Xi,Ethan Hsu,Lesia Semenova,Chudi Zhong
摘要:Modern neural networks rarely have a single way to be right. For many tasks, multiple models can achieve identical performance while relying on different features or reasoning patterns, a property known as the Rashomon Effect. However, uncovering this diversity in deep architectures is challenging as their continuous parameter spaces contain countless near-optimal solutions that are numerically distinct but often behaviorally similar. We introduce Rashomon Concept Bottleneck Models, a framework that learns multiple neural networks which are all accurate yet reason through distinct human-understandable concepts. By combining lightweight adapter modules with a diversity-regularized training objective, our method constructs a diverse set of deep concept-based models efficiently without retraining from scratch. The resulting networks provide fundamentally different reasoning processes for the same predictions, revealing how concept reliance and decision making vary across equally performing solutions. Our framework enables systematic exploration of data-driven reasoning diversity in deep models, offering a new mechanism for auditing, comparison, and alignment across equally accurate solutions.


【22】Learning Massively Multitask World Models for Continuous Control
标题:学习大规模多任务世界模型以实现连续控制
链接:https://arxiv.org/abs/2511.19584

作者:Nicklas Hansen,Hao Su,Xiaolong Wang
备注:Webpage: https://www.nicklashansen.com/NewtWM
摘要:General-purpose control demands agents that act across many tasks and embodiments, yet research on reinforcement learning (RL) for continuous control remains dominated by single-task or offline regimes, reinforcing a view that online RL does not scale. Inspired by the foundation model recipe (large-scale pretraining followed by light RL) we ask whether a single agent can be trained on hundreds of tasks with online interaction. To accelerate research in this direction, we introduce a new benchmark with 200 diverse tasks spanning many domains and embodiments, each with language instructions, demonstrations, and optionally image observations. We then present \emph{Newt}, a language-conditioned multitask world model that is first pretrained on demonstrations to acquire task-aware representations and action priors, and then jointly optimized with online interaction across all tasks. Experiments show that Newt yields better multitask performance and data-efficiency than a set of strong baselines, exhibits strong open-loop control, and enables rapid adaptation to unseen tasks. We release our environments, demonstrations, code for training and evaluation, as well as 200+ checkpoints.


【23】SPQR: A Standardized Benchmark for Modern Safety Alignment Methods in Text-to-Image Diffusion Models
标题:SPQR:文本到图像扩散模型中现代安全对齐方法的标准化基准
链接:https://arxiv.org/abs/2511.19558

作者:Mohammed Talha Alam,Nada Saadi,Fahad Shamshad,Nils Lukas,Karthik Nandakumar,Fahkri Karray,Samuele Poppi
备注:20 pages, 8 figures, 10 tables
摘要:Text-to-image diffusion models can emit copyrighted, unsafe, or private content. Safety alignment aims to suppress specific concepts, yet evaluations seldom test whether safety persists under benign downstream fine-tuning routinely applied after deployment (e.g., LoRA personalization, style/domain adapters). We study the stability of current safety methods under benign fine-tuning and observe frequent breakdowns. As true safety alignment must withstand even benign post-deployment adaptations, we introduce the SPQR benchmark (Safety-Prompt adherence-Quality-Robustness). SPQR is a single-scored metric that provides a standardized and reproducible framework to evaluate how well safety-aligned diffusion models preserve safety, utility, and robustness under benign fine-tuning, by reporting a single leaderboard score to facilitate comparisons. We conduct multilingual, domain-specific, and out-of-distribution analyses, along with category-wise breakdowns, to identify when safety alignment fails after benign fine-tuning, ultimately showcasing SPQR as a concise yet comprehensive benchmark for T2I safety alignment techniques.


【24】Learning to Solve Weighted Maximum Satisfiability with a Co-Training Architecture
标题:利用协同训练架构学习求解加权最大可满足性问题
链接:https://arxiv.org/abs/2511.19544

作者:Kaidi Wan,Minghao Liu,Yong Lai
备注:10 pages, 4 figures
摘要:We propose SplitGNN, a graph neural network (GNN)-based approach that learns to solve the weighted maximum satisfiability (MaxSAT) problem. SplitGNN incorporates a co-training architecture consisting of a supervised message passing mechanism and an unsupervised solution boosting layer. A new graph representation called the edge-splitting factor graph is proposed to provide more structural information for learning, based on spanning tree generation and edge classification. To improve solutions on challenging weighted instances, we implement a GPU-accelerated layer applying efficient score calculation and relaxation-based optimization. Experiments show that SplitGNN achieves 3x faster convergence and better predictions compared with other GNN-based architectures. More notably, SplitGNN successfully finds solutions that outperform modern heuristic MaxSAT solvers on much larger and harder weighted MaxSAT benchmarks, and demonstrates exceptional generalization abilities on diverse structural instances.
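For readers unfamiliar with the objective being optimized: a weighted MaxSAT solver seeks an assignment maximizing the total weight of satisfied clauses. A minimal scoring function (the problem definition, not part of SplitGNN itself) looks like:

```python
def maxsat_score(clauses, assignment):
    """Total weight of satisfied clauses in a weighted MaxSAT instance.

    clauses: list of (weight, literals), where a literal is a signed 1-based
             variable index (positive = the variable, negative = its negation).
    assignment: dict mapping variable index -> bool.
    """
    total = 0
    for weight, literals in clauses:
        # a clause is satisfied if any of its literals evaluates to True
        if any((lit > 0) == assignment[abs(lit)] for lit in literals):
            total += weight
    return total
```

For example, with clauses `[(3, [1, -2]), (5, [2]), (2, [-1])]`, the assignment `{1: True, 2: True}` scores 8 (the first two clauses hold, the third does not).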


【25】Row-stochastic matrices can provably outperform doubly stochastic matrices in decentralized learning
标题:行随机矩阵在去中心化学习中可证明优于双随机矩阵
链接:https://arxiv.org/abs/2511.19513

作者:Bing Liu,Boao Kong,Limin Lu,Kun Yuan,Chengcheng Zhao
备注:41 pages, 38 figures
摘要:Decentralized learning often involves a weighted global loss with heterogeneous node weights $λ$. We revisit two natural strategies for incorporating these weights: (i) embedding them into the local losses to retain a uniform weight (and thus a doubly stochastic matrix), and (ii) keeping the original losses while employing a $λ$-induced row-stochastic matrix. Although prior work shows that both strategies yield the same expected descent direction for the global loss, it remains unclear whether the Euclidean-space guarantees are tight and what fundamentally differentiates their behaviors. To clarify this, we develop a weighted Hilbert-space framework $L^2(λ;\mathbb{R}^d)$ and obtain convergence rates that are strictly tighter than those from Euclidean analysis. In this geometry, the row-stochastic matrix becomes self-adjoint whereas the doubly stochastic one does not, creating additional penalty terms that amplify consensus error, thereby slowing convergence. Consequently, the difference in convergence arises not only from spectral gaps but also from these penalty terms. We then derive sufficient conditions under which the row-stochastic design converges faster even with a smaller spectral gap. Finally, by using a Rayleigh-quotient and Loewner-order eigenvalue comparison, we further obtain topology conditions that guarantee this advantage and yield practical topology-design guidelines.
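The self-adjointness claim mirrors the classical reversibility of random walks. A small numpy check (our illustration, not necessarily the paper's exact construction): take a symmetric weighted graph, form the row-stochastic random-walk matrix W = D^{-1}A, and verify that W is self-adjoint in the λ-weighted inner product, where λ is its stationary distribution.

```python
import numpy as np

# symmetric, connected weighted adjacency
A = np.array([[0., 2., 1.],
              [2., 0., 3.],
              [1., 3., 0.]])
deg = A.sum(axis=1)
W = A / deg[:, None]            # row-stochastic random-walk matrix
lam = deg / deg.sum()           # its stationary distribution (node weights)
Lam = np.diag(lam)

# self-adjointness of W in the weighted space L^2(lambda):
# <Wx, y>_lam = <x, Wy>_lam  <=>  Lam @ W == W.T @ Lam  (detailed balance)
assert np.allclose(Lam @ W, W.T @ Lam)

# lam is indeed stationary: lam^T W = lam^T
assert np.allclose(lam @ W, lam)
```

The identity Lam @ W == W.T @ Lam holds because λ_i W_ij = A_ij / sum(deg) is symmetric in (i, j); a doubly stochastic matrix generally fails this in the λ-weighted geometry, which is the distinction the paper's penalty-term analysis builds on.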


【26】Generative Model-Aided Continual Learning for CSI Feedback in FDD mMIMO-OFDM Systems
标题:FDD mMIMO-OFDM系统中CSI反馈的生成模型辅助持续学习
链接:https://arxiv.org/abs/2511.19490

作者:Guijun Liu,Yuwen Cao,Tomoaki Ohtsuki,Jiguang He,Shahid Mumtaz
摘要:Deep autoencoder (DAE) frameworks have demonstrated their effectiveness in reducing channel state information (CSI) feedback overhead in massive multiple-input multiple-output (mMIMO) orthogonal frequency division multiplexing (OFDM) systems. However, existing CSI feedback models struggle to adapt to dynamic environments caused by user mobility, requiring retraining when encountering new CSI distributions. Moreover, returning to previously encountered environments often leads to performance degradation due to catastrophic forgetting. Continual learning involves enabling models to incorporate new information while maintaining performance on previously learned tasks. To address these challenges, we propose a generative adversarial network (GAN)-based learning approach for CSI feedback. By using a GAN generator as a memory unit, our method preserves knowledge from past environments and ensures consistently high performance across diverse scenarios without forgetting. Simulation results show that the proposed approach enhances the generalization capability of the DAE framework while maintaining low memory overhead. Furthermore, it can be seamlessly integrated with other advanced CSI feedback models, highlighting its robustness and adaptability.


【27】stable-pretraining-v1: Foundation Model Research Made Simple
标题:稳定预训练-v1:基础模型研究变得简单
链接:https://arxiv.org/abs/2511.19484

作者:Randall Balestriero,Hugues Van Assel,Sami BuGhanem,Lucas Maes
摘要:Foundation models and self-supervised learning (SSL) have become central to modern AI, yet research in this area remains hindered by complex codebases, redundant re-implementations, and the heavy engineering burden of scaling experiments. We present stable-pretraining, a modular, extensible, and performance-optimized library built on top of PyTorch, Lightning, Hugging Face, and TorchMetrics. Unlike prior toolkits focused narrowly on reproducing state-of-the-art results, stable-pretraining is designed for flexibility and iteration speed: it unifies essential SSL utilities--including probes, collapse detection metrics, augmentation pipelines, and extensible evaluation routines--within a coherent and reliable framework. A central design principle is logging everything, enabling fine-grained visibility into training dynamics that makes debugging, monitoring, and reproducibility seamless. We validate the library by demonstrating its ability to generate new research insights with minimal overhead, including depthwise representation probing and the analysis of CLIP degradation under synthetic data finetuning. By lowering barriers to entry while remaining scalable to large experiments, stable-pretraining aims to accelerate discovery and expand the possibilities of foundation model research.


【28】Hidden markov model to predict tourists visited place
标题:预测游客到访地点的隐马尔可夫模型
链接:https://arxiv.org/abs/2511.19465

作者:Theo Demessance,Chongke Bi,Sonia Djebali,Guillaume Guerard
摘要:Nowadays, social networks are becoming a popular way of analyzing tourist behavior, thanks to the digital traces left by travelers during their stays on these networks. The massive amount of data generated, driven by the propensity of tourists to share comments and photos during their trips, makes it possible to model their journeys and analyze their behavior. Predicting the next movement of tourists plays a key role in tourism marketing to understand demand and improve decision support. In this paper, we propose a method to understand and learn tourists' movements based on social network data analysis in order to predict future movements. The method relies on a machine learning grammatical inference algorithm. A major contribution of this paper is to adapt the grammatical inference algorithm to the context of big data. Our method produces a hidden Markov model representing the movements of a group of tourists. The hidden Markov model is flexible and editable with new data. Paris, the capital of France, is selected to demonstrate the efficiency of the proposed methodology.
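As a concrete reference for the model class, here is the standard HMM forward recursion in numpy (a textbook sketch; the states and emissions learned via grammatical inference in the paper are of course richer):

```python
import numpy as np

def forward(pi, T, E, obs):
    """HMM forward algorithm: likelihood of an observation sequence.

    pi: (S,) initial distribution over hidden states (e.g. areas of a city)
    T:  (S, S) transition matrix, T[i, j] = P(next = j | current = i)
    E:  (S, O) emission matrix,  E[i, k] = P(observe k | state = i)
    obs: sequence of observation indices (e.g. photo/check-in categories)
    """
    alpha = pi * E[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ T) * E[:, o]   # propagate, then weight by emission
    return alpha.sum()
```

The most likely next hidden place follows from propagating the normalized forward vector one more step through T and taking the argmax.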


【29】Are Large Brainwave Foundation Models Capable Yet? Insights from Fine-tuning
标题:大型脑电波基础模型已经足够强大了吗?来自微调的见解
链接:https://arxiv.org/abs/2507.01196

作者:Na Lee,Konstantinos Barmpas,Yannis Panagakis,Dimitrios Adamos,Nikolaos Laskaris,Stefanos Zafeiriou
摘要:Foundation Models have demonstrated significant success across various domains in Artificial Intelligence (AI), yet their capabilities for brainwave modeling remain unclear. In this paper, we comprehensively evaluate current Large Brainwave Foundation Models (LBMs) through systematic fine-tuning experiments across multiple Brain-Computer Interface (BCI) benchmark tasks, including memory tasks and sleep stage classification. Our extensive analysis shows that state-of-the-art LBMs achieve only marginal improvements (0.9%-1.2%) over traditional deep architectures while requiring significantly more parameters (millions vs thousands), raising important questions about their efficiency and applicability in BCI contexts. Moreover, through detailed ablation studies and Low-Rank Adaptation (LoRA), we significantly reduce trainable parameters without performance degradation, while demonstrating that architectural and training inefficiencies limit LBMs' current capabilities. Our experiments span both full model fine-tuning and parameter-efficient adaptation techniques, providing insights into optimal training strategies for BCI applications. We pioneer the application of LoRA to LBMs, revealing that performance benefits generally emerge when adapting multiple neural network components simultaneously. These findings highlight the critical need for domain-specific development strategies to advance LBMs, suggesting that current architectures may require redesign to fully leverage the potential of foundation models in brainwave analysis.
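The LoRA technique referenced above is compact enough to sketch: freeze the pretrained weight W and learn only a rank-r update BA, cutting trainable parameters by an order of magnitude or more. The shapes below are illustrative, not those of any particular LBM.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Forward pass through a frozen weight W with a LoRA update W + alpha * B @ A."""
    return x @ (W + alpha * (B @ A)).T

d_out, d_in, r = 64, 128, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01       # trainable low-rank factor
B = np.zeros((d_out, r))                    # zero-init: adaptation starts as identity

# trainable parameters shrink from d_out*d_in to r*(d_in + d_out)
full, lora = d_out * d_in, r * (d_in + d_out)
assert lora < full / 10

# with B = 0 the adapted layer matches the frozen layer exactly
x = rng.normal(size=(2, d_in))
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)
```

The zero initialization of B is the standard LoRA choice: fine-tuning starts from the pretrained behaviour and only gradually departs from it.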


【30】Spatio-Temporal Hierarchical Causal Models
标题:时空分层因果模型
链接:https://arxiv.org/abs/2511.20558

作者:Xintong Li,Haoran Zhang,Xiao Zhou
摘要:The abundance of fine-grained spatio-temporal data, such as traffic sensor networks, offers vast opportunities for scientific discovery. However, inferring causal relationships from such observational data remains challenging, particularly due to unobserved confounders that are specific to units (e.g., geographical locations) yet influence outcomes over time. Most existing methods for spatio-temporal causal inference assume that all confounders are observed, an assumption that is often violated in practice. In this paper, we introduce Spatio-Temporal Hierarchical Causal Models (ST-HCMs), a novel graphical framework that extends hierarchical causal modeling to the spatio-temporal domain. At the core of our approach is the Spatio-Temporal Collapse Theorem, which shows that a complex ST-HCM converges to a simpler flat causal model as the amount of subunit data increases. This theoretical result enables a general procedure for causal identification, allowing ST-HCMs to recover causal effects even in the presence of unobserved, time-invariant unit-level confounders, a scenario where standard non-hierarchical models fail. We validate the effectiveness of our framework on both synthetic and real-world datasets, demonstrating its potential for robust causal inference in complex dynamic systems.


【31】Generative Modeling with Manifold Percolation
标题:流形渗流的生成式建模
链接:https://arxiv.org/abs/2511.20503

作者:Rui Tong
备注:13 pages, 7 figures. Correspondence: Rui.Tong@warwick.ac.uk
摘要:Generative modeling is typically framed as learning mapping rules, but from an observer's perspective without access to these rules, the task manifests as disentangling the geometric support from the probability distribution. We propose that Continuum Percolation is uniquely suited for this support analysis, as the sampling process effectively projects high-dimensional density estimation onto a geometric counting problem on the support. In this work, we establish a rigorous isomorphism between the topological phase transitions of Random Geometric Graphs and the underlying data manifold in high-dimensional space. By analyzing the relationship between our proposed Percolation Shift metric and FID, we demonstrate that our metric captures structural pathologies (such as implicit mode collapse) where statistical metrics fail. Finally, we translate this topological phenomenon into a differentiable loss function to guide training. Experimental results confirm that this approach not only prevents manifold shrinkage but drives the model toward a state of "Hyper-Generalization," achieving good fidelity and verified topological expansion.
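The random geometric graphs at the heart of the analysis are easy to experiment with. A minimal sketch (ours, not the paper's Percolation Shift metric): count connected components of points linked whenever they lie within radius r; sweeping r reveals the connectivity transition the paper exploits.

```python
import numpy as np

def rgg_components(points, radius):
    """Number of connected components of a random geometric graph:
    nodes are points, edges join pairs within the given radius."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dists = np.linalg.norm(points[:, None] - points[None], axis=-1)
    adj = dists <= radius

    # union-find with path halving
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if adj[i, j]:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})
```

For 200 uniform points in the unit square, a vanishingly small radius leaves (essentially) 200 singleton components, while any radius above the square's diameter yields a single component; the interesting behaviour happens in between.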


【32】Solving Heterogeneous Agent Models with Physics-informed Neural Networks
标题:利用物理信息神经网络求解异质智能体模型
链接:https://arxiv.org/abs/2511.20283

作者:Marta Grzeskiewicz
摘要:Understanding household behaviour is essential for modelling macroeconomic dynamics and designing effective policy. While heterogeneous agent models offer a more realistic alternative to representative agent frameworks, their implementation poses significant computational challenges, particularly in continuous time. The Aiyagari-Bewley-Huggett (ABH) framework, recast as a system of partial differential equations, typically relies on grid-based solvers that suffer from the curse of dimensionality, high computational cost, and numerical inaccuracies. This paper introduces the ABH-PINN solver, an approach based on Physics-Informed Neural Networks (PINNs), which embeds the Hamilton-Jacobi-Bellman and Kolmogorov Forward equations directly into the neural network training objective. By replacing grid-based approximation with mesh-free, differentiable function learning, the ABH-PINN solver inherits the advantages of PINNs: improved scalability, smoother solutions, and computational efficiency. Preliminary results show that the PINN-based approach obtains economically valid results matching those of established finite-difference solvers.
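The core PINN idea, placing the differential-equation residual inside the training objective, can be shown on a toy problem. Below, instead of a neural network and the HJB/KF system, we fit a polynomial to v'(x) + v(x) = 0 with v(0) = 1 on collocation points; because the ansatz is linear in its coefficients, "training" collapses to a single least-squares solve. This is our illustrative analogue, not the ABH-PINN solver.

```python
import numpy as np

# collocation points and polynomial ansatz v(x) = sum_k c_k x^k
xs = np.linspace(0.0, 1.0, 50)
deg = 6
V = np.vander(xs, deg + 1, increasing=True)            # basis for v(x)
dV = np.hstack([np.zeros((len(xs), 1)),
                V[:, :-1] * np.arange(1, deg + 1)])    # basis for v'(x)

# residual of v'(x) + v(x) = 0 at collocation points, plus boundary v(0) = 1
R = dV + V
bc = V[:1]                                             # row for x = 0
M = np.vstack([R, 10.0 * bc])                          # weight the boundary term
b = np.concatenate([np.zeros(len(xs)), [10.0]])
c, *_ = np.linalg.lstsq(M, b, rcond=None)

v1 = np.polyval(c[::-1], 1.0)                          # compare with exp(-1)
assert abs(v1 - np.exp(-1.0)) < 1e-2
```

With a nonlinear neural ansatz the same residual-plus-boundary objective is minimized by gradient descent with automatic differentiation, which is precisely what a PINN does on the HJB and KF equations.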


【33】Learning Degenerate Manifolds of Frustrated Magnets with Boltzmann Machines
标题:用玻尔兹曼机学习阻挫磁体的简并流形
链接:https://arxiv.org/abs/2511.19879

作者:Jackson C. Glass,Gia-Wei Chern
备注:12 pages, 10 figures
摘要:We show that Restricted Boltzmann Machines (RBMs) provide a flexible generative framework for modeling spin configurations in disordered yet strongly correlated phases of frustrated magnets. As a benchmark, we first demonstrate that an RBM can learn the zero-temperature ground-state manifold of the one-dimensional ANNNI model at its multiphase point, accurately reproducing its characteristic oscillatory and exponentially decaying correlations. We then apply RBMs to kagome spin ice and show that they successfully learn the local ice rules and short-range correlations of the extensively degenerate ice-I manifold. Correlation functions computed from RBM-generated configurations closely match those from direct Monte Carlo simulations. For the partially ordered ice-II phase -- featuring long-range charge order and broken time-reversal symmetry -- accurate modeling requires RBMs with uniform-sign bias fields, mirroring the underlying symmetry breaking. These results highlight the utility of RBMs as generative models for learning constrained and highly frustrated magnetic states.
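Sampling configurations from a trained RBM uses block Gibbs updates alternating between the visible and hidden layers. A minimal binary-unit sketch (generic RBM machinery, not the paper's spin-ice parameterization):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_vis, b_hid, rng):
    """One block-Gibbs sweep of a binary RBM: sample h | v, then v' | h."""
    p_h = sigmoid(v @ W + b_hid)                       # hidden activation probs
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b_vis)                     # visible reconstruction probs
    return (rng.random(p_v.shape) < p_v).astype(float), h
```

Chaining many such sweeps yields configurations from the learned distribution; correlation functions are then estimated from the collected samples, which is how the RBM-generated statistics are compared against Monte Carlo in the paper.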


【34】CycleChemist: A Dual-Pronged Machine Learning Framework for Organic Photovoltaic Discovery
标题:CycleChemist:面向有机光伏材料发现的双管齐下机器学习框架
链接:https://arxiv.org/abs/2511.19500

作者:Hou Hei Lam,Jiangjie Qiu,Xiuyuan Hu,Wentao Li,Fankun Zeng,Siwei Fu,Hao Zhang,Xiaonan Wang
摘要:Organic photovoltaic (OPV) materials offer a promising path toward sustainable energy generation, but their development is limited by the difficulty of identifying high performance donor and acceptor pairs with strong power conversion efficiencies (PCEs). Existing design strategies typically focus on either the donor or the acceptor alone, rather than using a unified approach capable of modeling both components. In this work, we introduce a dual machine learning framework for OPV discovery that combines predictive modeling with generative molecular design. We present the Organic Photovoltaic Donor Acceptor Dataset (OPV2D), the largest curated dataset of its kind, containing 2000 experimentally characterized donor acceptor pairs. Using this dataset, we develop the Organic Photovoltaic Classifier (OPVC) to predict whether a material exhibits OPV behavior, and a hierarchical graph neural network that incorporates multi task learning and donor acceptor interaction modeling. This framework includes the Molecular Orbital Energy Estimator (MOE2) for predicting HOMO and LUMO energy levels, and the Photovoltaic Performance Predictor (P3) for estimating PCE. In addition, we introduce the Material Generative Pretrained Transformer (MatGPT) to produce synthetically accessible organic semiconductors, guided by a reinforcement learning strategy with three objective policy optimization. By linking molecular representation learning with performance prediction, our framework advances data driven discovery of high performance OPV materials.


【35】Dual-Path Knowledge-Augmented Contrastive Alignment Network for Spatially Resolved Transcriptomics
标题:空间解析转录组学的双路径知识增强对比对齐网络
链接:https://arxiv.org/abs/2511.17685

作者:Wei Zhang,Jiajun Chu,Xinci Liu,Chen Tong,Xinyue Li
备注:AAAI 2026 Oral, extended version
摘要:Spatial Transcriptomics (ST) is a technology that measures gene expression profiles within tissue sections while retaining spatial context. It reveals localized gene expression patterns and tissue heterogeneity, both of which are essential for understanding disease etiology. However, its high cost has driven efforts to predict spatial gene expression from whole slide images. Despite recent advancements, current methods still face significant limitations, such as under-exploitation of high-level biological context, over-reliance on exemplar retrievals, and inadequate alignment of heterogeneous modalities. To address these challenges, we propose DKAN, a novel Dual-path Knowledge-Augmented contrastive alignment Network that predicts spatially resolved gene expression by integrating histopathological images and gene expression profiles through a biologically informed approach. Specifically, we introduce an effective gene semantic representation module that leverages the external gene database to provide additional biological insights, thereby enhancing gene expression prediction. Further, we adopt a unified, one-stage contrastive learning paradigm, seamlessly combining contrastive learning and supervised learning to eliminate reliance on exemplars, complemented with an adaptive weighting mechanism. Additionally, we propose a dual-path contrastive alignment module that employs gene semantic features as dynamic cross-modal coordinators to enable effective heterogeneous feature integration. Through extensive experiments across three public ST datasets, DKAN demonstrates superior performance over state-of-the-art models, establishing a new benchmark for spatial gene expression prediction and offering a powerful tool for advancing biological and clinical research.


其他(42篇)

【1】Concept-Aware Batch Sampling Improves Language-Image Pretraining
标题:概念感知批采样改进语言-图像预训练
链接:https://arxiv.org/abs/2511.20643

作者:Adhiraj Ghosh,Vishaal Udandarao,Thao Nguyen,Matteo Farina,Mehdi Cherti,Jenia Jitsev,Sewoong Oh,Elisa Ricci,Ludwig Schmidt,Matthias Bethge
备注:Tech Report
摘要:What data should a vision-language model be trained on? To answer this question, many data curation efforts center on the quality of a dataset. However, most of these existing methods are (i) offline, i.e. they produce a static dataset from a set of predetermined filtering criteria, and (ii) concept-agnostic, i.e. they use model-based filters which induce additional data biases. In this work, we go beyond such offline, concept-agnostic methods and advocate for more flexible, task-adaptive online concept-based curation. Our first contribution is DataConcept, a collection of 128M web-crawled image-text pairs annotated with fine-grained details about their concept composition. Building on DataConcept, we introduce Concept-Aware Batch Sampling (CABS), a simple yet effective batch sampling framework that flexibly constructs batches on-the-fly based on specific target distributions. We propose two variants: (i) Diversity Maximization (CABS-DM) to curate batches with a broad coverage of available concepts, and (ii) Frequency Maximization (CABS-FM) to curate batches with high object multiplicity. Through extensive evaluations across 28 benchmarks, we demonstrate that our CABS method significantly benefits CLIP/SigLIP model classes and yields highly performant models. Overall, CABS represents a strong open-source alternative to proprietary online data curation algorithms, enabling practitioners to define custom concept distributions that optimize for specific downstream tasks.
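A deliberately simplified version of diversity-maximizing batch construction, in the spirit of CABS-DM (the actual sampler works against target concept distributions; this greedy set-cover toy only conveys the idea):

```python
def diversity_batch(samples, batch_size):
    """Greedy concept-coverage batch construction.

    samples: list of (sample_id, set_of_concepts) pairs.
    At each step, pick the sample that adds the most not-yet-covered concepts.
    Returns the chosen sample ids and the set of concepts they cover.
    """
    covered, batch = set(), []
    pool = list(samples)
    while pool and len(batch) < batch_size:
        best = max(pool, key=lambda s: len(s[1] - covered))
        pool.remove(best)
        batch.append(best[0])
        covered |= best[1]
    return batch, covered
```

For instance, from `[("a", {"dog", "park"}), ("b", {"dog"}), ("c", {"cat", "sofa"}), ("d", {"dog", "park"})]` a batch of size 2 picks "a" and "c", covering all four concepts instead of repeating "dog".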


【2】MotionV2V: Editing Motion in a Video
标题:MotionV2V:编辑视频中的运动
链接:https://arxiv.org/abs/2511.20640

作者:Ryan Burgert,Charles Herrmann,Forrester Cole,Michael S Ryoo,Neal Wadhwa,Andrey Voynov,Nataniel Ruiz
摘要:While generative video models have achieved remarkable fidelity and consistency, applying these capabilities to video editing remains a complex challenge. Recent research has explored motion controllability as a means to enhance text-to-video generation or image animation; however, we identify precise motion control as a promising yet under-explored paradigm for editing existing videos. In this work, we propose modifying video motion by directly editing sparse trajectories extracted from the input. We term the deviation between input and output trajectories a "motion edit" and demonstrate that this representation, when coupled with a generative backbone, enables powerful video editing capabilities. To achieve this, we introduce a pipeline for generating "motion counterfactuals", video pairs that share identical content but distinct motion, and we fine-tune a motion-conditioned video diffusion architecture on this dataset. Our approach allows for edits that start at any timestamp and propagate naturally. In a four-way head-to-head user study, our model achieves over 65 percent preference against prior work. Please see our project page: https://ryanndagreat.github.io/MotionV2V


【3】Latent Collaboration in Multi-Agent Systems
标题:多智能体系统中的潜空间协作
链接:https://arxiv.org/abs/2511.20639

作者:Jiaru Zou,Xiyuan Yang,Ruizhong Qiu,Gaotang Li,Katherine Tieu,Pan Lu,Ke Shen,Hanghang Tong,Yejin Choi,Jingrui He,James Zou,Mengdi Wang,Ling Yang
备注:Project: https://github.com/Gen-Verse/LatentMAS
摘要:Multi-agent systems (MAS) extend large language models (LLMs) from independent single-model reasoning to coordinative system-level intelligence. While existing LLM agents depend on text-based mediation for reasoning and communication, we take a step forward by enabling models to collaborate directly within the continuous latent space. We introduce LatentMAS, an end-to-end training-free framework that enables pure latent collaboration among LLM agents. In LatentMAS, each agent first performs auto-regressive latent thoughts generation through last-layer hidden embeddings. A shared latent working memory then preserves and transfers each agent's internal representations, ensuring lossless information exchange. We provide theoretical analyses establishing that LatentMAS attains higher expressiveness and lossless information preservation with substantially lower complexity than vanilla text-based MAS. In addition, empirical evaluations across 9 comprehensive benchmarks spanning math and science reasoning, commonsense understanding, and code generation show that LatentMAS consistently outperforms strong single-model and text-based MAS baselines, achieving up to 14.6% higher accuracy, reducing output token usage by 70.8%-83.7%, and providing 4x-4.3x faster end-to-end inference. These results demonstrate that our new latent collaboration framework enhances system-level reasoning quality while offering substantial efficiency gains without any additional training. Code and data are fully open-sourced at https://github.com/Gen-Verse/LatentMAS.


【4】Sparse-to-Field Reconstruction via Stochastic Neural Dynamic Mode Decomposition
标题:基于随机神经动态模式分解的稀疏观测到全场重建
链接:https://arxiv.org/abs/2511.20612

作者:Yujin Kim,Sarah Dean
摘要:Many consequential real-world systems, like wind fields and ocean currents, are dynamic and hard to model. Learning their governing dynamics remains a central challenge in scientific machine learning. Dynamic Mode Decomposition (DMD) provides a simple, data-driven approximation, but practical use is limited by sparse/noisy observations from continuous fields, reliance on linear approximations, and the lack of principled uncertainty quantification. To address these issues, we introduce Stochastic NODE-DMD, a probabilistic extension of DMD that models continuous-time, nonlinear dynamics while remaining interpretable. Our approach enables continuous spatiotemporal reconstruction at arbitrary coordinates and quantifies predictive uncertainty. Across four benchmarks, a synthetic setting and three physics-based flows, it surpasses a baseline in reconstruction accuracy when trained from only 10% observation density. It further recovers the dynamical structure by aligning learned modes and continuous-time eigenvalues with ground truth. Finally, on datasets with multiple realizations, our method learns a calibrated distribution over latent dynamics that preserves ensemble variability rather than averaging across regimes. Our code is available at: https://github.com/sedan-group/Stochastic-NODE-DMD
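For context, the classical (deterministic, linear) DMD that the paper generalizes fits a linear operator between time-shifted snapshot matrices. A compact numpy version of exact DMD:

```python
import numpy as np

def dmd(X, Xp, rank):
    """Exact DMD: given snapshot matrices with Xp ≈ A @ X, recover the leading
    eigenvalues and modes of A through a rank-r SVD of X."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    A_tilde = U.conj().T @ Xp @ Vh.conj().T / s     # A projected onto POD modes
    eigvals, Wv = np.linalg.eig(A_tilde)
    modes = (Xp @ Vh.conj().T / s) @ Wv             # exact DMD modes
    return eigvals, modes
```

On data generated by an exactly linear system x_{k+1} = A x_k, the recovered eigenvalues match those of A; the paper's contribution is to make this pipeline continuous-time, nonlinear, and probabilistic.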


【5】New York Smells: A Large Multimodal Dataset for Olfaction
标题:纽约气味:一个大型多模式嗅觉数据集
链接:https://arxiv.org/abs/2511.20544

作者:Ege Ozguroglu,Junbang Liang,Ruoshi Liu,Mia Chiquier,Michael DeTienne,Wesley Wei Qian,Alexandra Horowitz,Andrew Owens,Carl Vondrick
备注:Project website at https://smell.cs.columbia.edu
摘要:While olfaction is central to how animals perceive the world, this rich chemical sensory modality remains largely inaccessible to machines. One key bottleneck is the lack of diverse, multimodal olfactory training data collected in natural settings. We present New York Smells, a large dataset of paired image and olfactory signals captured "in the wild." Our dataset contains 7,000 smell-image pairs from 3,500 distinct objects across indoor and outdoor environments, with approximately 70$\times$ more objects than existing olfactory datasets. Our benchmark has three tasks: cross-modal smell-to-image retrieval, recognizing scenes, objects, and materials from smell alone, and fine-grained discrimination between grass species. Through experiments on our dataset, we find that visual data enables cross-modal olfactory representation learning, and that our learned olfactory representations outperform widely-used hand-crafted features.


【6】Automated Monitoring of Cultural Heritage Artifacts Using Semantic Segmentation
标题:基于语义分割的文物自动监测
链接:https://arxiv.org/abs/2511.20541

作者:Andrea Ranieri,Giorgio Palmieri,Silvia Biasotti
备注:Keywords: Cultural Heritage, Monitoring, Deep Learning, U-Nets, Semantic Segmentation
摘要:This paper addresses the critical need for automated crack detection in the preservation of cultural heritage through semantic segmentation. We present a comparative study of U-Net architectures, using various convolutional neural network (CNN) encoders, for pixel-level crack identification on statues and monuments. A comparative quantitative evaluation is performed on the test set of the OmniCrack30k dataset [1] using popular segmentation metrics including Mean Intersection over Union (mIoU), Dice coefficient, and Jaccard index. This is complemented by an out-of-distribution qualitative evaluation on an unlabeled test set of real-world cracked statues and monuments. Our findings provide valuable insights into the capabilities of different CNN- based encoders for fine-grained crack segmentation. We show that the models exhibit promising generalization capabilities to unseen cultural heritage contexts, despite never having been explicitly trained on images of statues or monuments.
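The overlap metrics named in the abstract (mIoU/Jaccard and Dice) can be computed directly from binary masks. This is a generic sketch of the standard definitions, not the paper's evaluation code:

```python
import numpy as np

# Standard overlap metrics for binary segmentation masks.
def iou(pred, gt):
    """Intersection over Union (Jaccard index) for boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def dice(pred, gt):
    """Dice coefficient: 2|A∩B| / (|A|+|B|)."""
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2.0 * inter / total if total else 1.0

pred = np.array([[1, 1, 0, 0]], dtype=bool)
gt   = np.array([[1, 0, 1, 0]], dtype=bool)
print(iou(pred, gt))   # 1/3: one overlapping pixel, three in the union
print(dice(pred, gt))  # 0.5
```

Note the fixed relationship Dice = 2J/(1+J), which is why the two metrics rank models identically on a single mask pair even though their values differ.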


【7】Adam Simplified: Bias Correction Simplified
标题:Adam简化:偏差校正简化
链接:https://arxiv.org/abs/2511.20516

作者:Sam Laing,Antonio Orvieto
摘要:The Adam optimizer is a cornerstone of modern deep learning, yet the empirical necessity of each of its individual components is often taken for granted. This paper presents a focused investigation into the role of bias-correction, a feature whose contribution remains poorly understood. Through a series of systematic ablations on vision and language modelling tasks, we demonstrate that the conventional wisdom surrounding bias correction is misleading. In particular, we demonstrate that in the optimal hyper-parameter configuration, the inclusion of bias correction leads to no improvement in final test performance. Moreover, unless appropriate learning rate scheduling is implemented, the inclusion of bias correction can sometimes be detrimental to performance. We further reinterpret bias correction as a form of implicit learning rate scheduling whose behaviour is strongly dependent on the choice of smoothing hyper-parameters $\beta_1, \beta_2 \in [0,1)$. Our findings challenge the universal inclusion of this component.
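The ablation described above toggles a single term in the standard Adam update. The sketch below is the generic textbook update with a bias-correction switch (hyper-parameter names follow common convention), not the authors' code:

```python
import numpy as np

# One Adam update step with bias correction toggled on/off.
def adam_step(g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
              bias_correction=True):
    m = b1 * m + (1 - b1) * g          # first-moment EMA
    v = b2 * v + (1 - b2) * g * g      # second-moment EMA
    if bias_correction:
        m_hat = m / (1 - b1 ** t)      # de-bias the zero-initialized EMAs
        v_hat = v / (1 - b2 ** t)
    else:
        m_hat, v_hat = m, v
    update = -lr * m_hat / (np.sqrt(v_hat) + eps)
    return update, m, v

g = np.array([0.5])
u_corr, m, v = adam_step(g, np.zeros(1), np.zeros(1), t=1, bias_correction=True)
u_raw, _, _ = adam_step(g, np.zeros(1), np.zeros(1), t=1, bias_correction=False)
# At t=1 the corrected step has magnitude ~lr, while the uncorrected step is
# larger by roughly (1-b1)/sqrt(1-b2) ~ 3.16x: bias correction acts like an
# implicit early-step schedule, the reinterpretation the abstract describes.
```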


【8】DP-MicroAdam: Private and Frugal Algorithm for Training and Fine-tuning
标题:DP-MicroAdam:用于训练和微调的私人且节俭的算法
链接:https://arxiv.org/abs/2511.20509

作者:Mihaela Hudişteanu,Edwige Cyffers,Nikita P. Kalinin
摘要:Adaptive optimizers are the de facto standard in non-private training as they often enable faster convergence and improved performance. In contrast, differentially private (DP) training is still predominantly performed with DP-SGD, typically requiring extensive compute and hyperparameter tuning. We propose DP-MicroAdam, a memory-efficient and sparsity-aware adaptive DP optimizer. We prove that DP-MicroAdam converges in stochastic non-convex optimization at the optimal $\mathcal{O}(1/\sqrt{T})$ rate, up to privacy-dependent constants. Empirically, DP-MicroAdam outperforms existing adaptive DP optimizers and achieves competitive or superior accuracy compared to DP-SGD across a range of benchmarks, including CIFAR-10, large-scale ImageNet training, and private fine-tuning of pretrained transformers. These results demonstrate that adaptive optimization can improve both performance and stability under differential privacy.
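The DP-SGD baseline mentioned above combines per-sample gradient clipping with Gaussian noise. This is a generic sketch of that standard mechanism; DP-MicroAdam's own adaptive update is not reproduced here:

```python
import numpy as np

# Generic DP-SGD step: clip each per-sample gradient to norm <= clip_norm,
# sum, add Gaussian noise scaled by the clip norm, then average.
def dp_sgd_step(per_sample_grads, clip_norm=1.0, noise_mult=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=total.shape)
    return (total + noise) / len(per_sample_grads)

grads = [np.array([3.0, 4.0]), np.array([0.1, 0.0])]  # norms 5.0 and 0.1
avg = dp_sgd_step(grads, clip_norm=1.0, noise_mult=0.0)
# With zero noise: the first gradient is rescaled to unit norm [0.6, 0.8],
# the second is kept, so the average is [0.35, 0.4].
```

The clipping bounds each example's influence (sensitivity), which is what the Gaussian noise is calibrated against to obtain the privacy guarantee.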


【9】A Physics-Informed Loss Function for Boundary-Consistent and Robust Artery Segmentation in DSA Sequences
标题:DSA序列中边界一致且稳健的动脉分割的物理信息损失函数
链接:https://arxiv.org/abs/2511.20501

作者:Muhammad Irfan,Nasir Rahim,Khalid Mahmood Malik
摘要:Accurate extraction and segmentation of the cerebral arteries from digital subtraction angiography (DSA) sequences is essential for developing reliable clinical management models of complex cerebrovascular diseases. Conventional loss functions often rely solely on pixel-wise overlap, overlooking the geometric and physical consistency of vascular boundaries, which can lead to fragmented or unstable vessel predictions. To overcome this limitation, we propose a novel \textit{Physics-Informed Loss} (PIL) that models the interaction between the predicted and ground-truth boundaries as an elastic process inspired by dislocation theory in materials physics. This formulation introduces a physics-based regularization term that enforces smooth contour evolution and structural consistency, allowing the network to better capture fine vascular geometry. The proposed loss is integrated into several segmentation architectures, including U-Net, U-Net++, SegFormer, and MedFormer, and evaluated on two public benchmarks: DIAS and DSCA. Experimental results demonstrate that PIL consistently outperforms conventional loss functions such as Cross-Entropy, Dice, Active Contour, and Surface losses, achieving superior sensitivity, F1 score, and boundary coherence. These findings confirm that the incorporation of physics-based boundary interactions into deep neural networks improves both the precision and robustness of vascular segmentation in dynamic angiographic imaging. The implementation of the proposed method is publicly available at https://github.com/irfantahir301/Physicsis_loss.


【10】InferF: Declarative Factorization of AI/ML Inferences over Joins
标题:InferF:AI/ML推理对Joins的声明性分解
链接:https://arxiv.org/abs/2511.20489

作者:Kanchan Chowdhury,Lixi Zhou,Lulu Xie,Xinwei Fu,Jia Zou
备注:Accepted to SIGMOD 2026 as full research paper. This archived version has a full appendix
摘要:Real-world AI/ML workflows often apply inference computations to feature vectors joined from multiple datasets. To avoid the redundant AI/ML computations caused by repeated data records in the join's output, factorized ML has been proposed to decompose ML computations into sub-computations to be executed on each normalized dataset. However, there is insufficient discussion on how factorized ML could impact AI/ML inference over multi-way joins. To address the limitations, we propose a novel declarative InferF system, focusing on the factorization of arbitrary inference workflows represented as analyzable expressions over the multi-way joins. We formalize our problem to flexibly push down partial factorized computations to qualified nodes in the join tree to minimize the overall inference computation and join costs and propose two algorithms to resolve the problem: (1) a greedy algorithm based on a per-node cost function that estimates the influence on overall latency if a subset of factorized computations is pushed to a node, and (2) a genetic algorithm for iteratively enumerating and evaluating promising factorization plans. We implement InferF on Velox, an open-sourced database engine from Meta, evaluate it on real-world datasets, observe up to 11.3x speedups, and systematically summarize the factors that determine when factorized ML can benefit AI/ML inference workflows.
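The core idea the abstract builds on, factorized ML over a join, can be illustrated with a linear scoring function: partial sums are pushed to each normalized table instead of scoring every row of the materialized join. The tables and weights below are hypothetical toy data, not part of InferF:

```python
import numpy as np

# Two normalized tables: an orders table with a foreign key into customers.
orders = {"cust": np.array([0, 0, 1]),            # foreign keys
          "x": np.array([[1.0], [2.0], [3.0]])}   # order features
custs  = {"x": np.array([[10.0], [20.0]])}        # customer features
w_o, w_c = np.array([2.0]), np.array([0.5])       # linear model weights

# Materialized approach: build the join, score every row, sum the scores.
joined = np.hstack([orders["x"], custs["x"][orders["cust"]]])
materialized = (joined @ np.concatenate([w_o, w_c])).sum()

# Factorized approach: score each table once; customer partial scores are
# weighted by how many joined rows they would appear in (the join counts).
counts = np.bincount(orders["cust"], minlength=2)
factorized = (orders["x"] @ w_o).sum() + counts @ (custs["x"] @ w_c)
# Both totals are identical; the factorized form never repeats the
# customer-side computation for duplicated join output rows.
```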


【11】NVIDIA Nemotron Parse 1.1
标题:NVIDIA Nemotron Parse 1.1
链接:https://arxiv.org/abs/2511.20478

作者:Kateryna Chumachenko,Amala Sanjay Deshmukh,Jarno Seppanen,Ilia Karmanov,Chia-Chih Chen,Lukas Voegtle,Philipp Fischer,Marek Wawrzos,Saeid Motiian,Roman Ageev,Kedi Wu,Alexandre Milesi,Maryam Moosaei,Krzysztof Pawelec,Padmavathy Subramanian,Mehrzad Samadi,Xin Yu,Celina Dear,Sarah Stoddard,Jenna Diamond,Jesse Oliver,Leanna Chraghchian,Patrick Skelly,Tom Balough,Yao Xu,Jane Polak Scowcroft,Daniel Korzekwa,Darragh Hanley,Sandip Bhaskar,Timo Roman,Karan Sapra,Andrew Tao,Bryan Catanzaro
摘要:We introduce Nemotron-Parse-1.1, a lightweight document parsing and OCR model that advances the capabilities of its predecessor, Nemoretriever-Parse-1.0. Nemotron-Parse-1.1 delivers improved capabilities across general OCR, markdown formatting, structured table parsing, and text extraction from pictures, charts, and diagrams. It also supports a longer output sequence length for visually dense documents. As with its predecessor, it extracts bounding boxes of text segments, as well as corresponding semantic classes. Nemotron-Parse-1.1 follows an encoder-decoder architecture with 885M parameters, including a compact 256M-parameter language decoder. It achieves competitive accuracy on public benchmarks making it a strong lightweight OCR solution. We release the model weights publicly on Huggingface, as well as an optimized NIM container, along with a subset of the training data as part of the broader Nemotron-VLM-v2 dataset. Additionally, we release Nemotron-Parse-1.1-TC which operates on a reduced vision token length, offering a 20% speed improvement with minimal quality degradation.


【12】Diffusion for Fusion: Designing Stellarators with Generative AI
标题:为聚变而扩散:用生成式人工智能设计仿星器
链接:https://arxiv.org/abs/2511.20445

作者:Misha Padidar,Teresa Huang,Andrew Giuliani,Marina Spivak
摘要:Stellarators are a prospective class of fusion-based power plants that confine a hot plasma with three-dimensional magnetic fields. Typically framed as a PDE-constrained optimization problem, stellarator design is a time-consuming process that can take hours to solve on a computing cluster. Developing fast methods for designing stellarators is crucial for advancing fusion research. Given the recent development of large datasets of optimized stellarators, machine learning approaches have emerged as a potential candidate. Motivated by this, we present an open inverse problem to the machine learning community: to rapidly generate high-quality stellarator designs which have a set of desirable characteristics. As a case study in the problem space, we train a conditional diffusion model on data from the QUASR database to generate quasisymmetric stellarator designs with desirable characteristics (aspect ratio and mean rotational transform). The diffusion model is applied to design stellarators with characteristics not seen during training. We provide evaluation protocols and show that many of the generated stellarators exhibit solid performance: less than 5% deviation from quasisymmetry and the target characteristics. The modest deviation from quasisymmetry highlights an opportunity to reach the sub 1% target. Beyond the case study, we share multiple promising avenues for generative modeling to advance stellarator design.


【13】Short-Range Oversquashing
标题:短程过度挤压
链接:https://arxiv.org/abs/2511.20406

作者:Yaaqov Mishayev,Yonatan Sverdlov,Tal Amir,Nadav Dym
备注:Accepted to Learning on Graphs (LoG) 2025. Version identical to the camera-ready paper
摘要:Message Passing Neural Networks (MPNNs) are widely used for learning on graphs, but their ability to process long-range information is limited by the phenomenon of oversquashing. This limitation has led some researchers to advocate Graph Transformers as a better alternative, whereas others suggest that it can be mitigated within the MPNN framework, using virtual nodes or other rewiring techniques. In this work, we demonstrate that oversquashing is not limited to long-range tasks, but can also arise in short-range problems. This observation allows us to disentangle two distinct mechanisms underlying oversquashing: (1) the bottleneck phenomenon, which can arise even in low-range settings, and (2) the vanishing gradient phenomenon, which is closely associated with long-range tasks. We further show that the short-range bottleneck effect is not captured by existing explanations for oversquashing, and that adding virtual nodes does not resolve it. In contrast, transformers do succeed in such tasks, positioning them as the more compelling solution to oversquashing, compared to specialized MPNNs.


【14】Extension and neural operator approximation of the electrical impedance tomography inverse map
标题:电阻抗断层成像逆映射的延拓与神经算子逼近
链接:https://arxiv.org/abs/2511.20361

作者:Maarten V. de Hoop,Nikola B. Kovachki,Matti Lassas,Nicholas H. Nelsen
备注:80 pages (49 main text, 20 appendix, and 11 references pages), 14 figures, 2 tables
摘要:This paper considers the problem of noise-robust neural operator approximation for the solution map of Calderón's inverse conductivity problem. In this continuum model of electrical impedance tomography (EIT), the boundary measurements are realized as a noisy perturbation of the Neumann-to-Dirichlet map's integral kernel. The theoretical analysis proceeds by extending the domain of the inversion operator to a Hilbert space of kernel functions. The resulting extension shares the same stability properties as the original inverse map from kernels to conductivities, but is now amenable to neural operator approximation. Numerical experiments demonstrate that Fourier neural operators excel at reconstructing infinite-dimensional piecewise constant and lognormal conductivities in noisy setups both within and beyond the theory's assumptions. The methodology developed in this paper for EIT exemplifies a broader strategy for addressing nonlinear inverse problems with a noise-aware operator learning framework.


【15】Complexity Reduction Study Based on RD Costs Approximation for VVC Intra Partitioning
标题:基于RD代价近似的VVC帧内分割复杂度降低研究
链接:https://arxiv.org/abs/2511.20349

作者:M. E. A. Kherchouche,F. Galpin,T. Dumas,F. Schnitzler,D. Menard,L. Zhang
备注:2025 Data Compression Conference (DCC)
摘要:In this paper, a complexity study is conducted for Versatile Video Codec (VVC) intra partitioning to accelerate the exhaustive search involved in the Rate-Distortion Optimization (RDO) process. To address this problem, two main machine learning techniques are proposed and compared. Unlike existing methods, the proposed approaches are size independent and incorporate the Rate-Distortion (RD) costs of neighboring blocks as input features. The first method is a regression-based technique that predicts normalized RD costs of a given Coding Unit (CU). As partitioning possesses the Markov property, the associated decision-making problem can be modeled as a Markov Decision Process (MDP) and solved by Reinforcement Learning (RL). The second approach is an RL agent learned from trajectories of CU decisions across two depths with the Deep Q-Network (DQN) algorithm. Then pre-determined thresholds are applied for both methods to select a suitable split for the current CU.


【16】CostNav: A Navigation Benchmark for Cost-Aware Evaluation of Embodied Agents
标题:CostNav:面向具身智能体成本感知评估的导航基准
链接:https://arxiv.org/abs/2511.20216

作者:Haebin Seong,Sungmin Kim,Minchan Kim,Yongjun Cho,Myunchul Joe,Suhwan Choi,Jaeyoon Jung,Jiyong Youn,Yoonshik Kim,Samwoo Seong,Yubeen Park,Youngjae Yu,Yunsung Lee
摘要:Existing navigation benchmarks focus on task success metrics while overlooking economic viability -- critical for commercial deployment of autonomous delivery robots. We introduce CostNav, a Micro-Navigation Economic Testbed that evaluates embodied agents through comprehensive cost-revenue analysis aligned with real-world business operations. CostNav models the complete economic lifecycle including hardware, training, energy, maintenance costs, and delivery revenue with service-level agreements, using industry-derived parameters. To our knowledge, CostNav is the first work to quantitatively expose the gap between navigation research metrics and commercial viability, revealing that optimizing for task success fundamentally differs from optimizing for economic deployment. Our cost model uses parameters derived from industry data sources (energy rates, delivery service pricing), and we project from a reduced-scale simulation to realistic deliveries. Under this projection, the baseline achieves 43.0% SLA compliance but is not commercially viable: yielding a loss of $30.009 per run with no finite break-even point, because operating costs are dominated by collision-induced maintenance, which accounts for 99.7% of per-run costs and highlights collision avoidance as a key optimization target. We demonstrate a learning-based on-device navigation baseline and establish a foundation for evaluating rule-based navigation, imitation learning, and cost-aware RL training. CostNav bridges the gap between navigation research and commercial deployment, enabling data-driven decisions about economic trade-offs across navigation paradigms.
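The per-run economics described above reduce to simple arithmetic. All cost parameters below are hypothetical stand-ins chosen only to be consistent with the reported $30.009 loss and ~99.7% maintenance share; they are not the paper's actual values:

```python
# Back-of-the-envelope per-run profit and break-even calculation.
def profit_per_run(revenue, energy, maintenance, other):
    return revenue - (energy + maintenance + other)

def break_even_runs(fixed_cost, per_run_profit):
    # No finite break-even point when each run loses money.
    return None if per_run_profit <= 0 else fixed_cost / per_run_profit

# Hypothetical numbers: maintenance dominates (~99.7% of per-run costs).
loss = profit_per_run(revenue=5.0, energy=0.05, maintenance=34.9, other=0.059)
print(loss)                           # approximately -30.009 per run
print(break_even_runs(1000.0, loss))  # None: no finite break-even point
```

This makes concrete why the abstract singles out collision-induced maintenance: with a per-run loss, no volume of deliveries ever recovers the fixed costs.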


【17】From data to concepts via wiring diagrams
标题:通过接线图从数据到概念
链接:https://arxiv.org/abs/2511.20138

作者:Jason Lo,Mohammadnima Jafari
备注:19 pages
摘要:A wiring diagram is a labeled directed graph that represents an abstract concept such as a temporal process. In this article, we introduce the notion of a quasi-skeleton wiring diagram graph, and prove that quasi-skeleton wiring diagram graphs correspond to Hasse diagrams. Using this result, we designed algorithms that extract wiring diagrams from sequential data. We used our algorithms in analyzing the behavior of an autonomous agent playing a computer game, and the algorithms correctly identified the winning strategies. We compared the performance of our main algorithm with two other algorithms based on standard clustering techniques (DBSCAN and agglomerative hierarchical), including when some of the data was perturbed. Overall, this article brings together techniques in category theory, graph theory, clustering, reinforcement learning, and data engineering.


【18】ClimateAgent: Multi-Agent Orchestration for Complex Climate Data Science Workflows
标题:ClimateAgent:复杂气候数据科学工作流的多智能体编排
链接:https://arxiv.org/abs/2511.20109

作者:Hyeonjae Kim,Chenyue Li,Wen Deng,Mengxi Jin,Wen Huang,Mengqian Lu,Binhang Yuan
备注:30 pages, 6 figures, 3 tables
摘要:Climate science demands automated workflows to transform comprehensive questions into data-driven statements across massive, heterogeneous datasets. However, generic LLM agents and static scripting pipelines lack climate-specific context and flexibility, and thus perform poorly in practice. We present ClimateAgent, an autonomous multi-agent framework that orchestrates end-to-end climate data analytic workflows. ClimateAgent decomposes user questions into executable sub-tasks coordinated by an Orchestrate-Agent and a Plan-Agent; acquires data via specialized Data-Agents that dynamically introspect APIs to synthesize robust download scripts; and completes analysis and reporting with a Coding-Agent that generates Python code, visualizations, and a final report with a built-in self-correction loop. To enable systematic evaluation, we introduce Climate-Agent-Bench-85, a benchmark of 85 real-world tasks spanning atmospheric rivers, drought, extreme precipitation, heat waves, sea surface temperature, and tropical cyclones. On Climate-Agent-Bench-85, ClimateAgent achieves 100% task completion and a report quality score of 8.32, outperforming GitHub-Copilot (6.27) and a GPT-5 baseline (3.26). These results demonstrate that our multi-agent orchestration with dynamic API awareness and self-correcting execution substantially advances reliable, end-to-end automation for climate science analytic tasks.


【19】REWA: Witness-Overlap Theory -- Foundations for Composable Binary Similarity Systems
标题:REWA:见证重叠理论--可组合二元相似性系统的基础
链接:https://arxiv.org/abs/2511.19998

作者:Nikit Phadke
摘要:REWA introduces a general theory of similarity based on witness-overlap structures. We show that whenever similarity between concepts can be expressed as monotone witness overlap -- whether arising from graph neighborhoods, causal relations, temporal structure, topological features, symbolic patterns, or embedding-based neighborhoods -- it admits a reduction to compact encodings with provable ranking preservation guarantees. REWA systems consist of: (1) finite witness sets $W(v)$, (2) semi-random bit assignments generated from each witness, and (3) monotonicity of expected similarity in the overlap $\Delta(u, v) = |W(u) \cap W(v)|$. We prove that under an overlap-gap condition on the final witness sets -- independent of how they were constructed -- top-$k$ rankings are preserved using $m = O(\log(|V|/\delta))$ bits. The witness-set formulation is compositional: any sequence of structural, temporal, causal, topological, information-theoretic, or learned transformations can be combined into pipelines that terminate in discrete witness sets. The theory applies to the final witness overlap, enabling modular construction of similarity systems from reusable primitives. This yields a vast design space: millions of composable similarity definitions inherit logarithmic encoding complexity. REWA subsumes and unifies Bloom filters, minhash, LSH bitmaps, random projections, sketches, and hierarchical filters as special cases. It provides a principled foundation for similarity systems whose behavior is governed by witness overlap rather than hash-function engineering. This manuscript presents the axioms, the main reducibility theorem, complete proofs with explicit constants, and a detailed discussion of compositional design, limitations, and future extensions including multi-bit encodings, weighted witnesses, and non-set representations.
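Minhash, which the abstract lists as a special case, gives a concrete feel for witness-overlap encodings: ranking by estimated similarity tracks ranking by the true overlap $\Delta(u, v) = |W(u) \cap W(v)|$. The salted-hash construction below is an illustrative sketch of that special case, not the REWA encoding itself:

```python
import random

# Minhash-style compact encoding of witness sets: similarity is monotone
# in the witness overlap, and m short signature entries preserve rankings.
def minhash_signature(witnesses, m=512, seed=0):
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(m)]
    # One "random permutation" per salt, approximated by a salted hash.
    return [min(hash((s, w)) for w in witnesses) for s in salts]

def estimated_jaccard(sig_u, sig_v):
    """Fraction of agreeing signature entries estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_u, sig_v)) / len(sig_u)

W_u = set(range(0, 100))
W_v = set(range(20, 120))   # overlap 80 with W_u, Jaccard 80/120
W_w = set(range(90, 190))   # overlap 10 with W_u, Jaccard 10/190
su, sv, sw = (minhash_signature(S) for S in (W_u, W_v, W_w))
# Ranking by estimated similarity matches ranking by true witness overlap.
```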


【20】Complex Instruction Following with Diverse Style Policies in Football Games
标题:足球比赛中基于多样化风格策略的复杂指令遵循
链接:https://arxiv.org/abs/2511.19885

作者:Chenglu Sun,Shuo Shen,Haonan Hu,Wei Zhou,Chen Chen
备注:21 pages, 13 figures, accepted by AAAI2026
摘要:Despite advancements in language-controlled reinforcement learning (LC-RL) for basic domains and straightforward commands (e.g., object manipulation and navigation), effectively extending LC-RL to comprehend and execute high-level or abstract instructions in complex, multi-agent environments, such as football games, remains a significant challenge. To address this gap, we introduce Language-Controlled Diverse Style Policies (LCDSP), a novel LC-RL paradigm specifically designed for complex scenarios. LCDSP comprises two key components: a Diverse Style Training (DST) method and a Style Interpreter (SI). The DST method efficiently trains a single policy capable of exhibiting a wide range of diverse behaviors by modulating agent actions through style parameters (SP). The SI is designed to accurately and rapidly translate high-level language instructions into these corresponding SP. Through extensive experiments in a complex 5v5 football environment, we demonstrate that LCDSP effectively comprehends abstract tactical instructions and accurately executes the desired diverse behavioral styles, showcasing its potential for complex, real-world applications.


【21】SX-GeoTree: Self-eXplaining Geospatial Regression Tree Incorporating the Spatial Similarity of Feature Attributions
标题:SX-GeoTree:融合特征归因空间相似性的自解释地理空间回归树
链接:https://arxiv.org/abs/2511.19845

作者:Chaogui Kang,Lijian Luo,Qingfeng Guan,Yu Liu
备注:41 pages, 7 figures, 12 tables
摘要:Decision trees remain central for tabular prediction but struggle with (i) capturing spatial dependence and (ii) producing locally stable (robust) explanations. We present SX-GeoTree, a self-explaining geospatial regression tree that integrates three coupled objectives during recursive splitting: impurity reduction (MSE), spatial residual control (global Moran's I), and explanation robustness via modularity maximization on a consensus similarity network formed from (a) geographically weighted regression (GWR) coefficient distances (stimulus-response similarity) and (b) SHAP attribution distances (explanatory similarity). We recast local Lipschitz continuity of feature attributions as a network community preservation problem, enabling scalable enforcement of spatially coherent explanations without per-sample neighborhood searches. Experiments on two exemplar tasks (county-level GDP in Fujian, n=83; point-wise housing prices in Seattle, n=21,613) show SX-GeoTree maintains competitive predictive accuracy (within 0.01 $R^{2}$ of decision trees) while improving residual spatial evenness and doubling attribution consensus (modularity: Fujian 0.19 vs 0.09; Seattle 0.10 vs 0.05). Ablation confirms Moran's I and modularity terms are complementary; removing either degrades both spatial residual structure and explanation stability. The framework demonstrates how spatial similarity - extended beyond geometric proximity through GWR-derived local relationships - can be embedded in interpretable models, advancing trustworthy geospatial machine learning and offering a transferable template for domain-aware explainability.
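Global Moran's I, the spatial-autocorrelation statistic used above as the residual-control term in the splitting objective, has a short closed form. This is a generic sketch of the statistic on a toy weight matrix, not the authors' implementation:

```python
import numpy as np

# Global Moran's I: I = (n / S0) * (z^T W z) / (z^T z), where z are
# mean-centered values and W is a spatial weight (adjacency) matrix.
def morans_i(values, W):
    z = values - values.mean()
    n, s0 = len(values), W.sum()
    return (n / s0) * (z @ W @ z) / (z @ z)

# 4 locations on a line; neighbors are adjacent cells.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
clustered   = np.array([1.0, 1.0, -1.0, -1.0])   # spatially smooth residuals
alternating = np.array([1.0, -1.0, 1.0, -1.0])   # checkerboard residuals
print(morans_i(clustered, W))    # positive: clustered residuals
print(morans_i(alternating, W))  # negative: alternating residuals
```

Driving this statistic toward zero during splitting is what "spatial residual control" means: neither clumped nor checkerboarded residuals, i.e. spatially even errors.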


【22】Provably Outlier-resistant Semi-parametric Regression for Transferable Calibration of Low-cost Air-quality Sensors
标题:面向低成本空气质量传感器可迁移校准的可证明抗离群点半参数回归
链接:https://arxiv.org/abs/2511.19810

作者:Divyansh Chaurasia,Manoj Daram,Roshan Kumar,Nihal Thukarama Rao,Vipul Sangode,Pranjal Srivastava,Avnish Tripathi,Shoubhik Chakraborty,Akanksha,Ambasht Kumar,Davender Sethi,Sachchida Nand Tripathi,Purushottam Kar
备注:20 pages, 14 figures, under peer review
摘要:We present a case study for the calibration of low-cost air-quality (LCAQ) CO sensors from one of the largest multi-site-multi-season-multi-sensor-multi-pollutant mobile air-quality monitoring network deployments in India. LCAQ sensors have been shown to play a critical role in the establishment of dense, expansive air-quality monitoring networks and combating elevated pollution levels. The calibration of LCAQ sensors against regulatory-grade monitors is an expensive, laborious and time-consuming process, especially when a large number of sensors are to be deployed in a geographically diverse layout. In this work, we present the RESPIRE technique to calibrate LCAQ sensors to detect ambient CO (carbon monoxide) levels. RESPIRE offers specific advantages over baseline calibration methods popular in literature, such as improved prediction in cross-site, cross-season, and cross-sensor settings. RESPIRE offers a training algorithm that is provably resistant to outliers and an explainable model with the ability to flag instances of model overfitting. Empirical results are presented based on data collected during an extensive deployment spanning four sites, two seasons and six sensor packages. RESPIRE code is available at https://github.com/purushottamkar/respire.


【23】Terminal Velocity Matching
标题:终端速度匹配
链接:https://arxiv.org/abs/2511.19797

作者:Linqi Zhou,Mathias Parger,Ayaan Haque,Jiaming Song
备注:Code available at: https://github.com/lumalabs/tvm
摘要:We propose Terminal Velocity Matching (TVM), a generalization of flow matching that enables high-fidelity one- and few-step generative modeling. TVM models the transition between any two diffusion timesteps and regularizes its behavior at its terminal time rather than at the initial time. We prove that TVM provides an upper bound on the $2$-Wasserstein distance between data and model distributions when the model is Lipschitz continuous. However, since Diffusion Transformers lack this property, we introduce minimal architectural changes that achieve stable, single-stage training. To make TVM efficient in practice, we develop a fused attention kernel that supports backward passes on Jacobian-Vector Products, which scale well with transformer architectures. On ImageNet-256x256, TVM achieves 3.29 FID with a single function evaluation (NFE) and 1.99 FID with 4 NFEs. It similarly achieves 4.32 1-NFE FID and 2.94 4-NFE FID on ImageNet-512x512, representing state-of-the-art performance for one/few-step models from scratch.


【24】When +1% Is Not Enough: A Paired Bootstrap Protocol for Evaluating Small Improvements
标题:当+1%还不够时:评估微小改进的配对Bootstrap协议
链接:https://arxiv.org/abs/2511.19794

作者:Wenzhang Du
备注:13 pages, 3 figures
摘要:Recent machine learning papers often report 1-2 percentage point improvements from a single run on a benchmark. These gains are highly sensitive to random seeds, data ordering, and implementation details, yet are rarely accompanied by uncertainty estimates or significance tests. It is therefore unclear when a reported +1-2% reflects a real algorithmic advance versus noise. We revisit this problem under realistic compute budgets, where only a few runs are affordable. We propose a simple, PC-friendly evaluation protocol based on paired multi-seed runs, bias-corrected and accelerated (BCa) bootstrap confidence intervals, and a sign-flip permutation test on per-seed deltas. The protocol is intentionally conservative and is meant as a guardrail against over-claiming. We instantiate it on CIFAR-10, CIFAR-10N, and AG News using synthetic no-improvement, small-gain, and medium-gain scenarios. Single runs and unpaired t-tests often suggest significant gains for 0.6-2.0 point improvements, especially on text. With only three seeds, our paired protocol never declares significance in these settings. We argue that such conservative evaluation is a safer default for small gains under tight budgets.
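The sign-flip permutation test on per-seed deltas admits exact enumeration for small seed counts. This sketch covers only that component of the protocol (the BCa bootstrap intervals are omitted):

```python
from itertools import product

# Exact two-sided sign-flip permutation test on paired per-seed deltas:
# enumerate all 2^n sign assignments and compare |sum| to the observed one.
def sign_flip_pvalue(deltas):
    obs = abs(sum(deltas))
    flips = [abs(sum(s * d for s, d in zip(signs, deltas)))
             for signs in product((1, -1), repeat=len(deltas))]
    return sum(f >= obs for f in flips) / len(flips)

# Three seeds, each showing roughly a +1 point gain: the smallest achievable
# two-sided p-value is 2/2^3 = 0.25, so significance can never be declared
# at alpha = 0.05, consistent with the abstract's finding.
print(sign_flip_pvalue([1.0, 1.1, 0.9]))  # 0.25
```

This makes the conservatism quantitative: with n seeds the p-value floor is 2/2^n, so at least 6 paired seeds are needed before the test can even in principle reach p < 0.05.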


【25】CAMformer: Associative Memory is All You Need
标题:CAMformer:联想记忆就是你所需要的一切
链接:https://arxiv.org/abs/2511.19740

作者:Tergel Molom-Ochir,Benjamin F. Morris,Mark Horton,Chiyue Wei,Cong Guo,Brady Taylor,Peter Liu,Shan X. Wang,Deliang Fan,Hai Helen Li,Yiran Chen
备注:7 pages, 10 figures
摘要:Transformers face scalability challenges due to the quadratic cost of attention, which involves dense similarity computations between queries and keys. We propose CAMformer, a novel accelerator that reinterprets attention as an associative memory operation and computes attention scores using a voltage-domain Binary Attention Content Addressable Memory (BA-CAM). This enables constant-time similarity search through analog charge sharing, replacing digital arithmetic with physical similarity sensing. CAMformer integrates hierarchical two-stage top-k filtering, pipelined execution, and high-precision contextualization to achieve both algorithmic accuracy and architectural efficiency. Evaluated on BERT and Vision Transformer workloads, CAMformer achieves over 10x energy efficiency, up to 4x higher throughput, and 6-8x lower area compared to state-of-the-art accelerators--while maintaining near-lossless accuracy.


【26】Designing Preconditioners for SGD: Local Conditioning, Noise Floors, and Basin Stability
标题:为SGD设计预条件子:局部条件数、噪声下限与盆地稳定性
链接:https://arxiv.org/abs/2511.19716

作者:Mitchell Scott,Tianshi Xu,Ziyuan Tang,Alexandra Pichette-Emmons,Qiang Ye,Yousef Saad,Yuanzhe Xi
备注:31 pages, 11 Figures
摘要:Stochastic Gradient Descent (SGD) often slows in the late stage of training due to anisotropic curvature and gradient noise. We analyze preconditioned SGD in the geometry induced by a symmetric positive definite matrix $\mathbf{M}$, deriving bounds in which both the convergence rate and the stochastic noise floor are governed by $\mathbf{M}$-dependent quantities: the rate through an effective condition number in the $\mathbf{M}$-metric, and the floor through the product of that condition number and the preconditioned noise level. For nonconvex objectives, we establish a preconditioner-dependent basin-stability guarantee: when smoothness and basin size are measured in the $\mathbf{M}$-norm, the probability that the iterates remain in a well-behaved local region admits an explicit lower bound. This perspective is particularly relevant in Scientific Machine Learning (SciML), where achieving small training loss under stochastic updates is closely tied to physical fidelity, numerical stability, and constraint satisfaction. The framework applies to both diagonal/adaptive and curvature-aware preconditioners and yields a simple design principle: choose $\mathbf{M}$ to improve local conditioning while attenuating noise. Experiments on a quadratic diagnostic and three SciML benchmarks validate the predicted rate-floor behavior.
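The rate term governed by the effective condition number can be seen on a noise-free quadratic (the stochastic noise floor and the paper's actual bounds are not reproduced here). A minimal sketch with a hypothetical Jacobi-style diagonal preconditioner:

```python
import numpy as np

# Preconditioned gradient descent x <- x - lr * M^{-1} grad(x) on an
# ill-conditioned quadratic f(x) = 0.5 x^T H x (full-batch, noise-free).
H = np.diag([100.0, 1.0])               # anisotropic curvature, kappa = 100
grad = lambda x: H @ x
M_inv = np.diag(1.0 / np.diag(H))       # diagonal (Jacobi) preconditioner

def run(x, lr, precond, steps=50):
    for _ in range(steps):
        x = x - lr * (M_inv @ grad(x) if precond else grad(x))
    return 0.5 * x @ H @ x              # final loss value

x0 = np.array([1.0, 1.0])
plain = run(x0, lr=1.0 / 100.0, precond=False)  # lr capped by largest eigenvalue
pre   = run(x0, lr=1.0, precond=True)           # M^{-1} H = I: effective kappa = 1
# The preconditioned run reaches (numerically) zero loss; the plain run is
# still far from the minimum because the flat direction decays as 0.99^t.
```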


【27】Demystifying Diffusion Objectives: Reweighted Losses are Better Variational Bounds
标题:揭开扩散目标的神秘面纱:重新加权的损失是更好的变分界限
链接:https://arxiv.org/abs/2511.19664

作者:Jiaxin Shi,Michalis K. Titsias
摘要:We derive a new theoretical interpretation of the reweighted losses that are widely used for training diffusion models. Our method is based on constructing a cascade of time-dependent variational lower bounds on the data log-likelihood, that provably improves upon the standard evidence lower bound and results in reduced data-model KL-divergences. Combining such bounds gives rise to reweighted objectives that can be applied to any generative diffusion model including both continuous Gaussian diffusion and masked (discrete) diffusion models. Then, we showcase this framework in masked diffusion and report significant improvements over previous training losses in pixel-space image modeling, approaching sample quality comparable to continuous diffusion models. Our results also provide a theoretical justification for the simple weighting scheme widely used in masked image models.
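A toy numeric sketch of the reweighting idea for masked (discrete) diffusion, under the assumption of a linear masking schedule, where the variational bound weights the per-timestep masked cross-entropy by 1/t. The "denoiser" here is just a uniform predictor (CE = log V per masked token), so only the weighting logic is exercised, not the paper's bounds themselves:

```python
import numpy as np

rng = np.random.default_rng(0)
V, L = 8, 16     # vocab size, sequence length

def masked_ce(t):
    """Per-token CE of a uniform stand-in denoiser when masking at rate t."""
    masked = rng.random(L) < t
    return masked.sum() * np.log(V) / L   # log V nats for each masked token

ts = np.maximum(rng.random(4096), 1e-3)   # t ~ U(0, 1], clipped away from 0
losses = np.array([masked_ce(t) for t in ts])

simple = losses.mean()            # unweighted "simple" training loss
weighted = (losses / ts).mean()   # 1/t-weighted bound (linear schedule assumed)
print(simple, weighted)
```

Both are Monte Carlo estimates over timesteps; the point is only that the two objectives differ by a time-dependent weight w(t), which is exactly the degree of freedom the paper analyzes as a cascade of variational bounds.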


【28】Synthetic Data: AI's New Weapon Against Android Malware
标题:合成数据:人工智能对抗Android恶意软件的新武器
链接:https://arxiv.org/abs/2511.19649

作者:Angelo Gaspar Diniz Nogueira,Kayua Oleques Paim,Hendrio Bragança,Rodrigo Brandão Mansilha,Diego Kreutz
备注:23 pages, 18 figures, 8 tables. Accepted for publication at the JBCS
摘要:The ever-increasing number of Android devices and the accelerated evolution of malware, reaching over 35 million samples by 2024, highlight the critical importance of effective detection methods. Attackers are now using Artificial Intelligence to create sophisticated malware variations that can easily evade traditional detection techniques. Although machine learning has shown promise in malware classification, its success relies heavily on the availability of up-to-date, high-quality datasets. The scarcity and high cost of obtaining and labeling real malware samples presents significant challenges in developing robust detection models. In this paper, we propose MalSynGen, a Malware Synthetic Data Generation methodology that uses a conditional Generative Adversarial Network (cGAN) to generate synthetic tabular data. This data preserves the statistical properties of real-world data and improves the performance of Android malware classifiers. We evaluated the effectiveness of this approach using various datasets and metrics that assess the fidelity of the generated data, its utility in classification, and the computational efficiency of the process. Our experiments demonstrate that MalSynGen can generalize across different datasets, providing a viable solution to address the issues of obsolescence and low quality data in malware detection.


【29】Think First, Assign Next (ThiFAN-VQA): A Two-stage Chain-of-Thought Framework for Post-Disaster Damage Assessment
标题:先思考,再分配(ThiFAN-VQA):面向灾后损害评估的两阶段思维链框架
链接:https://arxiv.org/abs/2511.19557

作者:Ehsan Karimi,Nhut Le,Maryam Rahnemoonfar
摘要:Timely and accurate assessment of damages following natural disasters is essential for effective emergency response and recovery. Recent AI-based frameworks have been developed to analyze large volumes of aerial imagery collected by Unmanned Aerial Vehicles, providing actionable insights rapidly. However, creating and annotating data for training these models is costly and time-consuming, resulting in datasets that are limited in size and diversity. Furthermore, most existing approaches rely on traditional classification-based frameworks with fixed answer spaces, restricting their ability to provide new information without additional data collection or model retraining. Using pre-trained generative models built on in-context learning (ICL) allows for flexible and open-ended answer spaces. However, these models often generate hallucinated outputs or produce generic responses that lack domain-specific relevance. To address these limitations, we propose ThiFAN-VQA, a two-stage reasoning-based framework for visual question answering (VQA) in disaster scenarios. ThiFAN-VQA first generates structured reasoning traces using chain-of-thought (CoT) prompting and ICL to enable interpretable reasoning under limited supervision. A subsequent answer selection module evaluates the generated responses and assigns the most coherent and contextually accurate answer, effectively improve the model performance. By integrating a custom information retrieval system, domain-specific prompting, and reasoning-guided answer selection, ThiFAN-VQA bridges the gap between zero-shot and supervised methods, combining flexibility with consistency. Experiments on FloodNet and RescueNet-VQA, UAV-based datasets from flood- and hurricane-affected regions, demonstrate that ThiFAN-VQA achieves superior accuracy, interpretability, and adaptability for real-world post-disaster damage assessment tasks.
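The two-stage shape of ThiFAN-VQA can be sketched with a stub in place of the CoT-prompted model, and self-consistency voting standing in for the learned answer-selection module (the traces and answers below are invented for illustration):

```python
from collections import Counter

# Stage 1: a stub generator stands in for the CoT-prompted VQA model; each
# entry is a (reasoning_trace, answer) candidate. Contents are illustrative.
candidates = [
    ("flooded roads visible in 3 of 4 tiles", "road damage"),
    ("standing water covers several buildings", "building flooding"),
    ("water covers roads near the river", "road damage"),
    ("debris blocks the highway after the surge", "road damage"),
]

# Stage 2: answer selection -- here, simple self-consistency voting over the
# candidate answers, a crude proxy for the paper's coherence-based selector.
votes = Counter(ans for _, ans in candidates)
best_answer, n_votes = votes.most_common(1)[0]
print(best_answer, n_votes)
```

The paper's actual selector scores contextual accuracy and coherence rather than counting votes, but the generate-many-then-choose structure is the same.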


【30】Online Sparse Feature Selection in Data Streams via Differential Evolution
标题:基于差分进化的数据流在线稀疏特征选择
链接:https://arxiv.org/abs/2511.19555

作者:Ruiyang Xu
摘要:The processing of high-dimensional streaming data commonly utilizes online streaming feature selection (OSFS) techniques. However, practical implementations often face challenges with data incompleteness due to equipment failures and technical constraints. Online Sparse Streaming Feature Selection (OS2FS) tackles this issue through latent factor analysis-based missing data imputation. Despite this advancement, existing OS2FS approaches exhibit substantial limitations in feature evaluation, resulting in performance deterioration. To address these shortcomings, this paper introduces a novel Online Differential Evolution for Sparse Feature Selection (ODESFS) in data streams, incorporating two key innovations: (1) missing value imputation using a latent factor analysis model, and (2) feature importance evaluation through differential evolution. Comprehensive experiments conducted on six real-world datasets demonstrate that ODESFS consistently outperforms state-of-the-art OSFS and OS2FS methods by selecting optimal feature subsets and achieving superior accuracy.
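The differential-evolution half of the method can be sketched offline in NumPy: DE/rand/1/bin evolves continuous score vectors that are thresholded into feature masks, and a least-squares wrapper with a size penalty stands in for the paper's feature-importance evaluation (the fitness function and hyperparameters are our assumptions, not ODESFS's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 300, 10
X = rng.standard_normal((n, d))
y = X[:, 0] - 2 * X[:, 3] + 0.1 * rng.standard_normal(n)  # only features 0, 3 matter

def fitness(mask):
    """Least-squares fit on the selected features, penalizing subset size."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return -np.inf
    beta, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
    resid = y - X[:, idx] @ beta
    return -resid.var() - 0.01 * idx.size

# DE/rand/1/bin over continuous vectors; threshold at 0.5 to get a mask.
P, F, CR, gens = 20, 0.8, 0.9, 40
pop = rng.random((P, d))
for _ in range(gens):
    for i in range(P):
        a, b, c = pop[rng.choice([j for j in range(P) if j != i], 3, replace=False)]
        trial = np.where(rng.random(d) < CR, a + F * (b - c), pop[i])
        if fitness(trial > 0.5) >= fitness(pop[i] > 0.5):
            pop[i] = trial    # greedy selection keeps the better mask

best = max(pop, key=lambda v: fitness(v > 0.5)) > 0.5
print(np.flatnonzero(best))
```

The online setting adds latent-factor imputation of missing stream values before evaluation; that step is omitted here.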


【31】When Should Neural Data Inform Welfare? A Critical Framework for Policy Uses of Neuroeconomics
标题:神经数据何时应当为福利判断提供依据?神经经济学政策应用的批判性框架
链接:https://arxiv.org/abs/2511.19548

作者:Yiven,Zhu
备注:Durham Economic Journal 2025
摘要:Neuroeconomics promises to ground welfare analysis in neural and computational evidence about how people value outcomes, learn from experience and exercise self-control. At the same time, policy and commercial actors increasingly invoke neural data to justify paternalistic regulation, "brain-based" interventions and new welfare measures. This paper asks under what conditions neural data can legitimately inform welfare judgements for policy rather than merely describing behaviour. I develop a non-empirical, model-based framework that links three levels: neural signals, computational decision models and normative welfare criteria. Within an actor-critic reinforcement-learning model, I formalise the inference path from neural activity to latent values and prediction errors and then to welfare claims. I show that neural evidence constrains welfare judgements only when the neural-computational mapping is well validated, the decision model identifies "true" interests versus context-dependent mistakes, and the welfare criterion is explicitly specified and defended. Applying the framework to addiction, neuromarketing and environmental policy, I derive a Neuroeconomic Welfare Inference Checklist for regulators and for designers of NeuroAI systems. The analysis treats brains and artificial agents as value-learning systems while showing that internal reward signals, whether biological or artificial, are computational quantities and cannot be treated as welfare measures without an explicit normative model.


【32】Shortcut Invariance: Targeted Jacobian Regularization in Disentangled Latent Space
标题:捷径不变性:解耦潜在空间中的定向雅可比正则化
链接:https://arxiv.org/abs/2511.19525

作者:Shivam Pal,Sakshi Varshney,Piyush Rai
摘要:Deep neural networks are prone to learning shortcuts, spurious and easily learned correlations in training data that cause severe failures in out-of-distribution (OOD) generalization. A dominant line of work seeks robustness by learning a robust representation, often explicitly partitioning the latent space into core and spurious components; this approach can be complex, brittle, and difficult to scale. We take a different approach, instead of a robust representation, we learn a robust function. We present a simple and effective training method that renders the classifier functionally invariant to shortcut signals. Our method operates within a disentangled latent space, which is essential as it isolates spurious and core features into distinct dimensions. This separation enables the identification of candidate shortcut features by their strong correlation with the label, used as a proxy for semantic simplicity. The classifier is then desensitized to these features by injecting targeted, anisotropic latent noise during training. We analyze this as targeted Jacobian regularization, which forces the classifier to ignore spurious features and rely on more complex, core semantic signals. The result is state-of-the-art OOD performance on established shortcut learning benchmarks.
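A toy sketch of the mechanism on two disentangled latent dimensions: the spurious dimension is identified by its suspiciously high label correlation, and targeted anisotropic noise is injected only there during training, which acts like a Jacobian penalty on that dimension. Thresholds, noise scale, and the logistic-regression "classifier" are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
y = rng.integers(0, 2, n) * 2 - 1            # labels in {-1, +1}
core = y + 1.0 * rng.standard_normal(n)      # core feature: moderately noisy
spur = y + 0.3 * rng.standard_normal(n)      # spurious shortcut: cleaner in-distribution
Z = np.stack([core, spur], axis=1)           # disentangled latent space (assumed given)

# Identify shortcut candidates by label correlation (proxy for "too easy").
corr = np.abs([np.corrcoef(Z[:, j], y)[0, 1] for j in range(2)])
noise_scale = np.where(corr > 0.9, 5.0, 0.0) # anisotropic: only the spurious dim

def train(scale, lr=0.1, steps=500):
    w = np.zeros(2)
    for _ in range(steps):
        Zn = Z + scale * rng.standard_normal(Z.shape)  # targeted noise injection
        p = 1 / (1 + np.exp(-y * (Zn @ w)))
        g = ((p - 1) * y) @ Zn / n                     # logistic loss gradient
        w -= lr * g
    return w

w_plain = train(np.zeros(2))
w_robust = train(noise_scale)
print(w_plain, w_robust)
```

Without noise the classifier leans on the shortcut dimension; with targeted noise it becomes functionally invariant to it and relies on the core dimension instead.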


【33】Position: The Complexity of Perfect AI Alignment -- Formalizing the RLHF Trilemma
标题:立场:完美人工智能对齐的复杂性--形式化RLHF三难困境
链接:https://arxiv.org/abs/2511.19504

作者:Subramanyam Sahoo,Aman Chadha,Vinija Jain,Divya Chaudhary
备注:Accepted at NeurIPS 2025 Workshop on Socially Responsible and Trustworthy Foundation Models (ResponsibleFM)
摘要:Reinforcement Learning from Human Feedback (RLHF) is widely used for aligning large language models, yet practitioners face a persistent puzzle: improving safety often reduces fairness, scaling to diverse populations becomes computationally intractable, and making systems robust often amplifies majority biases. We formalize this tension as the Alignment Trilemma: no RLHF system can simultaneously achieve (i) epsilon-representativeness across diverse human values, (ii) polynomial tractability in sample and compute complexity, and (iii) delta-robustness against adversarial perturbations and distribution shift. Through a complexity-theoretic analysis integrating statistical learning theory and robust optimization, we prove that achieving both representativeness (epsilon <= 0.01) and robustness (delta <= 0.001) for global-scale populations requires Omega(2^{d_context}) operations, which is super-polynomial in the context dimensionality. We show that current RLHF implementations resolve this trilemma by sacrificing representativeness: they collect only 10^3--10^4 samples from homogeneous annotator pools while 10^7--10^8 samples are needed for true global representation. Our framework provides a unified explanation for documented RLHF pathologies including preference collapse, sycophancy, and systematic bias amplification. We conclude with concrete directions for navigating these fundamental trade-offs through strategic relaxations of alignment requirements.


【34】RFX: High-Performance Random Forests with GPU Acceleration and QLORA Compression
标题:RFX:具有图形处理器加速和QLORA压缩的高性能随机森林
链接:https://arxiv.org/abs/2511.19493

作者:Chris Kuchar
备注:39 pages, 9 tables, 4 figures
摘要:RFX (Random Forests X), where X stands for compression or quantization, presents a production-ready implementation of Breiman and Cutler's Random Forest classification methodology in Python. RFX v1.0 provides complete classification: out-of-bag error estimation, overall and local importance measures, proximity matrices with QLORA compression, case-wise analysis, and interactive visualization (rfviz)--all with CPU and GPU acceleration. Regression, unsupervised learning, CLIQUE importance, and RF-GAP proximity are planned for v2.0.   This work introduces four solutions addressing the proximity matrix memory bottleneck limiting Random Forest analysis to ~60,000 samples: (1) QLORA (Quantized Low-Rank Adaptation) compression for GPU proximity matrices, reducing memory from 80GB to 6.4MB for 100k samples (12,500x compression with INT8 quantization) while maintaining 99% geometric structure preservation, (2) CPU TriBlock proximity--combining upper-triangle storage with block-sparse thresholding--achieving 2.7x memory reduction with lossless quality, (3) SM-aware GPU batch sizing achieving 95% GPU utilization, and (4) GPU-accelerated 3D MDS visualization computing embeddings directly from low-rank factors using power iteration.   Validation across four implementation modes (GPU/CPU x case-wise/non-case-wise) demonstrates correct implementation. GPU achieves 1.4x speedup over CPU for overall importance with 500+ trees. Proximity computation scales from 1,000 to 200,000+ samples (requiring GPU QLORA), with CPU TriBlock filling the gap for medium-scale datasets (10K-50K samples). RFX v1.0 eliminates the proximity memory bottleneck, enabling proximity-based Random Forest analysis on datasets orders of magnitude larger than previously feasible. Open-source production-ready classification following Breiman and Cutler's original methodology.
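The core compression idea (low-rank factorization plus INT8 quantization of the factor) can be sketched in NumPy on a synthetic proximity matrix with cluster structure. This is our illustration of the technique, not RFX's implementation; sizes and rank are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 500, 16

# A synthetic RF-style proximity matrix with low-rank cluster structure.
labels = rng.integers(0, 4, n)
P = (labels[:, None] == labels[None, :]).astype(np.float64)
P += 0.05 * rng.standard_normal((n, n))
P = (P + P.T) / 2                                 # keep it symmetric

# Low-rank factorization via truncated eigendecomposition (P is symmetric).
vals, vecs = np.linalg.eigh(P)
top = np.argsort(-np.abs(vals))[:r]
F = vecs[:, top] * np.sqrt(np.abs(vals[top]))     # P ~ (F * signs) @ F.T
signs = np.sign(vals[top])

# INT8 quantization of the factor, one scale per column.
scale = np.abs(F).max(axis=0) / 127.0
F_q = np.round(F / scale).astype(np.int8)

# Reconstruct and compare.
F_d = F_q.astype(np.float64) * scale
P_hat = (F_d * signs) @ F_d.T
corr = np.corrcoef(P.ravel(), P_hat.ravel())[0, 1]

full_bytes = P.nbytes                      # dense float64 proximity matrix
comp_bytes = F_q.nbytes + scale.nbytes     # int8 factor + per-column scales
print(corr, full_bytes / comp_bytes)
```

The compression ratio scales with n/r, which is how an 80 GB dense matrix can shrink to megabytes when the proximity structure is effectively low-rank.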


【35】The Generalized Proximity Forest
标题:广义邻近森林
链接:https://arxiv.org/abs/2511.19487

作者:Ben Shaw,Adam Rustad,Sofia Pelagalli Maia,Jake S. Rhodes,Kevin R. Moon
摘要:Recent work has demonstrated the utility of Random Forest (RF) proximities for various supervised machine learning tasks, including outlier detection, missing data imputation, and visualization. However, the utility of the RF proximities depends upon the success of the RF model, which itself is not the ideal model in all contexts. RF proximities have recently been extended to time series by means of the distance-based Proximity Forest (PF) model, among others, affording time series analysis with the benefits of RF proximities. In this work, we introduce the generalized PF model, thereby extending RF proximities to all contexts in which supervised distance-based machine learning can occur. Additionally, we introduce a variant of the PF model for regression tasks. We also introduce the notion of using the generalized PF model as a meta-learning framework, extending supervised imputation capability to any pre-trained classifier. We experimentally demonstrate the unique advantages of the generalized PF model compared with both the RF model and the $k$-nearest neighbors model.
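The RF proximities being generalized here are simple to compute from leaf assignments: proximity(i, j) is the fraction of trees in which samples i and j fall into the same leaf. A minimal sketch with synthesized leaf indices (what a forest's `apply`-style method would return):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_trees = 6, 100

# leaves[i, t] = index of the leaf that sample i reaches in tree t.
leaves = rng.integers(0, 3, size=(n_samples, n_trees))

# Proximity(i, j) = fraction of trees in which i and j share a leaf.
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
print(prox.round(2))
```

The generalized PF model replaces tree traversal with distance-based splits, so the same co-occupancy recipe yields proximities for any distance-equipped data type.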


【36】Towards a future space-based, highly scalable AI infrastructure system design
标题:迈向未来基于太空的、高度可扩展的人工智能基础设施系统设计
链接:https://arxiv.org/abs/2511.19468

作者:Blaise Agüera y Arcas,Travis Beals,Maria Biggs,Jessica V. Bloom,Thomas Fischbacher,Konstantin Gromov,Urs Köster,Rishiraj Pravahan,James Manyika
备注:19 pages, 4 figures
摘要:If AI is a foundational general-purpose technology, we should anticipate that demand for AI compute -- and energy -- will continue to grow. The Sun is by far the largest energy source in our solar system, and thus it warrants consideration how future AI infrastructure could most efficiently tap into that power. This work explores a scalable compute system for machine learning in space, using fleets of satellites equipped with solar arrays, inter-satellite links using free-space optics, and Google tensor processing unit (TPU) accelerator chips. To facilitate high-bandwidth, low-latency inter-satellite communication, the satellites would be flown in close proximity. We illustrate the basic approach to formation flight via a 81-satellite cluster of 1 km radius, and describe an approach for using high-precision ML-based models to control large-scale constellations. Trillium TPUs are radiation tested. They survive a total ionizing dose equivalent to a 5 year mission life without permanent failures, and are characterized for bit-flip errors. Launch costs are a critical part of overall system cost; a learning curve analysis suggests launch to low-Earth orbit (LEO) may reach $\lesssim$\$200/kg by the mid-2030s.
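The launch-cost learning-curve claim is a Wright's-law calculation. A sketch under assumed numbers (roughly $1500/kg today and a 20% cost reduction per doubling of cumulative launch mass; the paper's own analysis may use different inputs):

```python
import math

c0 = 1500.0        # assumed current launch cost to LEO, $/kg
target = 200.0     # the <~$200/kg figure cited for the mid-2030s
learning = 0.20    # assumed 20% cost reduction per doubling of cumulative mass

# Wright's law: cost after k doublings = c0 * (1 - learning) ** k.
doublings = math.log(target / c0) / math.log(1 - learning)
print(doublings, 2 ** round(doublings))
```

Under these assumptions, reaching the target takes roughly nine doublings, i.e. on the order of a 500x increase in cumulative launch mass.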


【37】SG-OIF: A Stability-Guided Online Influence Framework for Reliable Vision Data
标题:SG-OIF:一个稳定性引导的在线影响框架,用于可靠的视觉数据
链接:https://arxiv.org/abs/2511.19466

作者:Penghao Rao,Runmin Jiang,Min Xu
摘要:Approximating training-point influence on test predictions is critical for deploying deep-learning vision models, essential for locating noisy data. Though the influence function was proposed for attributing how infinitesimal up-weighting or removal of individual training examples affects model outputs, its implementation is still challenging in deep-learning vision models: inverse-curvature computations are expensive, and training non-stationarity invalidates static approximations. Prior works use iterative solvers and low-rank surrogates to reduce cost, but offline computation lags behind training dynamics, and missing confidence calibration yields fragile rankings that misidentify critical examples. To address these challenges, we introduce a Stability-Guided Online Influence Framework (SG-OIF), the first framework that treats algorithmic stability as a real-time controller, which (i) maintains lightweight anchor IHVPs via stochastic Richardson and preconditioned Neumann; (ii) proposes modular curvature backends to modulate per-example influence scores using stability-guided residual thresholds, anomaly gating, and confidence. Experimental results show that SG-OIF achieves SOTA (State-Of-The-Art) on noise-label and out-of-distribution detection tasks across multiple datasets with various corruption. Notably, our approach achieves 91.1% accuracy in the top 1% prediction samples on the CIFAR-10 (20% asym), and gets 99.8% AUPR score on MNIST, effectively demonstrating that this framework is a practical controller for online influence estimation.
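The IHVPs (inverse-Hessian-vector products) at the heart of influence estimation can be maintained with a Neumann series, one of the two solvers the abstract names. A minimal dense sketch on a synthetic SPD "Hessian" (real SG-OIF would use Hessian-vector products against the network, not an explicit matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
A = rng.standard_normal((d, d))
H = A @ A.T / d + 0.5 * np.eye(d)    # SPD stand-in for a damped Hessian
v = rng.standard_normal(d)           # e.g. a test-point gradient

# Neumann-series IHVP: with alpha <= 1/lambda_max, the recursion
#   x <- v + (I - alpha*H) x   converges to (alpha*H)^{-1} v,
# so alpha*x approximates H^{-1} v.
alpha = 1.0 / np.linalg.norm(H, 2)
x = v.copy()
for _ in range(500):
    x = v + x - alpha * (H @ x)
ihvp = alpha * x

err = np.linalg.norm(H @ ihvp - v) / np.linalg.norm(v)
print(err)
```

The influence of a training point is then (minus) the inner product of its gradient with such an IHVP; SG-OIF's contribution is keeping these anchor IHVPs fresh and gating them by stability during training.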


【38】Temperature in SLMs: Impact on Incident Categorization in On-Premises Environments
标题:SLM中的温度:对本地部署环境中事件分类的影响
链接:https://arxiv.org/abs/2511.19464

作者:Marcio Pohlmann,Alex Severo,Gefté Almeida,Diego Kreutz,Tiago Heinrich,Lourenço Pereira
备注:5 pages, 3 figures, 2 tables, submitted to ERRC/WRSeg 2025
摘要:SOCs and CSIRTs face increasing pressure to automate incident categorization, yet the use of cloud-based LLMs introduces costs, latency, and confidentiality risks. We investigate whether locally executed SLMs can meet this challenge. We evaluated 21 models ranging from 1B to 20B parameters, varying the temperature hyperparameter and measuring execution time and precision across two distinct architectures. The results indicate that temperature has little influence on performance, whereas the number of parameters and GPU capacity are decisive factors.
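Mechanically, the temperature hyperparameter rescales logits before the softmax: low temperature sharpens the output distribution toward the argmax category, high temperature flattens it. A minimal sketch (the logits are invented):

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5, -1.0])   # scores over incident categories

def softmax_T(z, T):
    """Temperature-scaled softmax: softmax(z / T)."""
    z = z / T
    e = np.exp(z - z.max())                # subtract max for numerical stability
    return e / e.sum()

for T in (0.2, 1.0, 2.0):
    print(T, softmax_T(logits, T).round(3))
```

When the top logit clearly dominates, the argmax category (and hence a single-label classification) is unchanged across a wide range of temperatures, which is consistent with the paper's finding that temperature had little influence on categorization performance.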


【39】AI/ML based Joint Source and Channel Coding for HARQ-ACK Payload
标题:基于AI/ML的HARQ-ACK净荷联合信源和信道编码
链接:https://arxiv.org/abs/2511.19943

作者:Akash Doshi,Pinar Sen,Kirill Ivanov,Wei Yang,June Namgoong,Runxin Wang,Rachel Wang,Taesang Yoo,Jing Jiang,Tingfang Ji
备注:39 pages, 15 figures. Under consideration for publication in Journal of Sel. Areas in Information Theory. This paper was presented in part at the International Symposium on Topics in Coding, August 2025 in the Session for Coding and AI
摘要:Channel coding from 2G to 5G has assumed the inputs bits at the physical layer to be uniformly distributed. However, hybrid automatic repeat request acknowledgement (HARQ-ACK) bits transmitted in the uplink are inherently non-uniformly distributed. For such sources, significant performance gains could be obtained by employing joint source channel coding, aided by deep learning-based techniques. In this paper, we learn a transformer-based encoder using a novel "free-lunch" training algorithm and propose per-codeword power shaping to exploit the source prior at the encoder whilst being robust to small changes in the HARQ-ACK distribution. Furthermore, any HARQ-ACK decoder has to achieve a low negative acknowledgement (NACK) error rate to avoid radio link failures resulting from multiple NACK errors. We develop an extension of the Neyman-Pearson test to a coded bit system with multiple information bits to achieve Unequal Error Protection of NACK over ACK bits at the decoder. Finally, we apply the proposed encoder and decoder designs to a 5G New Radio (NR) compliant uplink setup under a fading channel, describing the optimal receiver design and a low complexity coherent approximation to it. Our results demonstrate 3-6 dB reduction in the average transmit power required to achieve the target error rates compared to the NR baseline, while also achieving a 2-3 dB reduction in the maximum transmit power, thus providing for significant coverage gains and power savings.
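The unequal-error-protection idea can be illustrated with a one-bit toy: BPSK over AWGN with a skewed ACK prior, where a Neyman-Pearson-style threshold is set so that the NACK-to-ACK error rate stays below a target, at the cost of more ACK errors. This is a simplification of the paper's multi-bit coded setting; the SNR, prior, and target rate are illustrative:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
n = 200_000
p_ack = 0.9                       # HARQ-ACK bits are heavily skewed toward ACK
snr_db = 7.0
sigma = 10 ** (-snr_db / 20)      # unit-energy BPSK: ACK -> +1, NACK -> -1

bits = rng.random(n) < p_ack      # True = ACK sent
tx = np.where(bits, 1.0, -1.0)
rx = tx + sigma * rng.standard_normal(n)

# ML decodes with threshold 0; Neyman-Pearson-style UEP instead fixes the
# threshold t so that P(decide ACK | NACK sent) = Q((t+1)/sigma) <= beta.
beta = 1e-3
t = -1.0 + sigma * NormalDist().inv_cdf(1 - beta)

decide_ack = rx > t
nack_err = np.mean(decide_ack[~bits])        # NACK decoded as ACK (the costly error)
ack_err = np.mean(~decide_ack[bits])         # ACK decoded as NACK
ml_nack_err = np.mean(rx[~bits] > 0)         # ML baseline for comparison
print(t, nack_err, ack_err, ml_nack_err)
```

Shifting the threshold toward the ACK side trades a higher ACK error rate for a much lower NACK error rate, which is the asymmetry needed to avoid radio link failures from repeated NACK errors.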


【40】Individual and group fairness in geographical partitioning
标题:地理划分中的个人和群体公平
链接:https://arxiv.org/abs/2511.19722

作者:Ilya O. Ryzhov,John Gunnar Carlsson,Yinchu Zhu
摘要:Socioeconomic segregation often arises in school districting and other contexts, causing some groups to be over- or under-represented within a particular district. This phenomenon is closely linked with disparities in opportunities and outcomes. We formulate a new class of geographical partitioning problems in which the population is heterogeneous, and it is necessary to ensure fair representation for each group at each facility. We prove that the optimal solution is a novel generalization of the additively weighted Voronoi diagram, and we propose a simple and efficient algorithm to compute it, thus resolving an open question dating back to Dvoretzky et al. (1951). The efficacy and potential for practical insight of the approach are demonstrated in a realistic case study involving seven demographic groups and $78$ district offices.
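The optimal partition's structure, an additively weighted Voronoi diagram, assigns each point x to the district minimizing ||x - c_i|| - w_i. A NumPy sketch, with a simple weight-adjustment heuristic for balancing district loads standing in for the paper's algorithm (which additionally handles per-group fairness constraints):

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((2000, 2))       # population locations in the unit square
centers = rng.random((5, 2))      # facility (district office) sites

def assign(w):
    """Additively weighted Voronoi: district(x) = argmin_i ||x - c_i|| - w_i."""
    d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(d - w[None, :], axis=1)

counts0 = np.bincount(assign(np.zeros(5)), minlength=5)

# Raising one district's additive weight can only enlarge that district.
w_up = np.zeros(5)
w_up[0] = 0.2
counts_up = np.bincount(assign(w_up), minlength=5)

# A simple balancing heuristic (our stand-in, not the paper's algorithm):
# grow the weights of underfull districts, shrink overfull ones.
target = len(pts) / len(centers)
w = np.zeros(5)
for _ in range(300):
    c = np.bincount(assign(w), minlength=5)
    w += 0.01 * (target - c) / target

counts = np.bincount(assign(w), minlength=5)
print(counts0, counts)
```

The monotone effect of the weights is what makes them usable as control variables: adjusting w_i grows or shrinks district i without changing the diagram's structural form.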


【41】The Alexander-Hirschowitz theorem for neurovarieties
标题:神经簇的Alexander-Hirschowitz定理
链接:https://arxiv.org/abs/2511.19703

作者:A. Massarenti,M. Mella
备注:21 pages
摘要:We study neurovarieties for polynomial neural networks and fully characterize when they attain the expected dimension in the single-output case. As consequences, we establish non-defectiveness and global identifiability for multi-output architectures.


【42】FAST: Topology-Aware Frequency-Domain Distribution Matching for Coreset Selection
标题:FAST:用于核心集选择的拓扑感知频域分布匹配
链接:https://arxiv.org/abs/2511.19476

作者:Jin Cui,Boran Zhao,Jiajun Xu,Jiaqi Guo,Shuo Guan,Pengju Ren
摘要:Coreset selection compresses large datasets into compact, representative subsets, reducing the energy and computational burden of training deep neural networks. Existing methods are either: (i) DNN-based, which are tied to model-specific parameters and introduce architectural bias; or (ii) DNN-free, which rely on heuristics lacking theoretical guarantees. Neither approach explicitly constrains distributional equivalence, largely because continuous distribution matching is considered inapplicable to discrete sampling. Moreover, prevalent metrics (e.g., MSE, KL, MMD, CE) cannot accurately capture higher-order moment discrepancies, leading to suboptimal coresets. In this work, we propose FAST, the first DNN-free distribution-matching coreset selection framework that formulates the coreset selection task as a graph-constrained optimization problem grounded in spectral graph theory and employs the Characteristic Function Distance (CFD) to capture full distributional information in the frequency domain. We further discover that naive CFD suffers from a "vanishing phase gradient" issue in medium and high-frequency regions; to address this, we introduce an Attenuated Phase-Decoupled CFD. Furthermore, for better convergence, we design a Progressive Discrepancy-Aware Sampling strategy that progressively schedules frequency selection from low to high, preserving global structure before refining local details and enabling accurate matching with fewer frequencies while avoiding overfitting. Extensive experiments demonstrate that FAST significantly outperforms state-of-the-art coreset selection methods across all evaluated benchmarks, achieving an average accuracy gain of 9.12%. Compared to other baseline coreset methods, it reduces power consumption by 96.57% and achieves a 2.2x average speedup, underscoring its high performance and energy efficiency.
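The Characteristic Function Distance at the core of FAST compares empirical characteristic functions, phi(t) = mean_j exp(i t.x_j), between the coreset and the full set over sampled frequencies. A plain CFD sketch (without the paper's phase-decoupled attenuation or progressive frequency scheduling; the frequency band is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 2))

def ecf(A, ts):
    """Empirical characteristic function at frequency rows ts."""
    return np.exp(1j * A @ ts.T).mean(axis=0)

def cfd(A, B, ts):
    """RMS gap between the two empirical CFs over the sampled frequencies."""
    return np.sqrt(np.mean(np.abs(ecf(A, ts) - ecf(B, ts)) ** 2))

# Low frequencies capture global structure, higher ones local detail;
# the paper schedules these progressively, here we just sample one band.
ts = rng.standard_normal((256, 2))

random_core = X[rng.choice(len(X), 500, replace=False)]
biased_core = X[X[:, 0] > 0][:500]        # a distribution-shifted subset

print(cfd(X, random_core, ts), cfd(X, biased_core, ts))
```

A distributionally faithful coreset drives the CFD toward the O(1/sqrt(m)) sampling floor, while a biased subset leaves a large gap, which is the signal FAST's graph-constrained optimization minimizes.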


机器翻译由腾讯交互翻译提供,仅供参考

点击“阅读原文”获取带摘要的学术速递

Python社区是高质量的Python/Django开发社区
本文地址:http://www.python88.com/topic/189662