
Machine Learning Academic Digest [1.29]

arXiv Daily Academic Digest • 1 week ago • 257 views



cs.LG: 170 papers today


Large Models (23 papers)

【1】Evolutionary Strategies lead to Catastrophic Forgetting in LLMs
Link: https://arxiv.org/abs/2601.20861

Authors: Immanuel Abdi, Akshat Gupta, Micah Mok, Alexander Lu, Nicholas Lee, Gopala Anumanchipalli
Abstract: One of the biggest missing capabilities in current AI systems is the ability to learn continuously after deployment. Implementing such continually learning systems has several challenges, one of which is the large memory requirement of the gradient-based algorithms used to train state-of-the-art LLMs. Evolutionary Strategies (ES) have recently re-emerged as a gradient-free alternative to traditional learning algorithms and have shown encouraging performance on specific tasks in LLMs. In this paper, we perform a comprehensive analysis of ES and specifically evaluate its forgetting curves when training for an increasing number of update steps. We first find that ES is able to reach performance numbers close to GRPO for math and reasoning tasks with a comparable compute budget. However, and most importantly for continual learning, the performance gains in ES are accompanied by significant forgetting of prior abilities, limiting its applicability for training models online. We also explore the reason behind this behavior and show that the updates made using ES are much less sparse and have orders of magnitude larger $\ell_2$ norm compared to corresponding GRPO updates, explaining the contrasting forgetting curves between the two algorithms. With this study, we aim to highlight the issue of forgetting in gradient-free algorithms like ES and hope to inspire future work to mitigate these issues.
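The sparsity-and-norm comparison of updates described in the abstract can be sketched as a toy diagnostic. This is an illustrative helper, not code from the paper; names and the tolerance are assumptions:

```python
import math

def update_stats(w_before, w_after, tol=1e-8):
    """Sparsity and l2 norm of one parameter update (toy diagnostic).

    Applied to ES vs. GRPO checkpoints, the abstract's claim would show up
    as lower sparsity and a much larger l2 norm for the ES deltas.
    """
    delta = [a - b for a, b in zip(w_after, w_before)]
    l2 = math.sqrt(sum(d * d for d in delta))
    sparsity = sum(1 for d in delta if abs(d) <= tol) / len(delta)
    return sparsity, l2
```

A sparse update touching one of four weights yields sparsity 0.75 and norm equal to the size of that single change.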


【2】Linear representations in language models can change dramatically over a conversation
Link: https://arxiv.org/abs/2601.20834

Authors: Andrew Kyle Lampinen, Yuxuan Li, Eghbal Hosseini, Sangnie Bhardwaj, Murray Shanahan
Abstract: Language model representations often contain linear directions that correspond to high-level concepts. Here, we study the dynamics of these representations: how representations evolve along these dimensions within the context of (simulated) conversations. We find that linear representations can change dramatically over a conversation; for example, information that is represented as factual at the beginning of a conversation can be represented as non-factual at the end and vice versa. These changes are content-dependent; while representations of conversation-relevant information may change, generic information is generally preserved. These changes are robust even for dimensions that disentangle factuality from more superficial response patterns, and occur across different model families and layers of the model. These representation changes do not require on-policy conversations; even replaying a conversation script written by an entirely different model can produce similar changes. However, adaptation is much weaker from simply having a sci-fi story in context that is framed more explicitly as such. We also show that steering along a representational direction can have dramatically different effects at different points in a conversation. These results are consistent with the idea that representations may evolve in response to the model playing a particular role that is cued by a conversation. Our findings may pose challenges for interpretability and steering -- in particular, they imply that it may be misleading to use static interpretations of features or directions, or probes that assume a particular range of features consistently corresponds to a particular ground-truth value. However, these types of representational dynamics also point to exciting new research directions for understanding how models adapt to context.
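The kind of per-turn linear probe the paper relies on reduces to projecting each turn's hidden state onto a fixed concept direction. A minimal sketch, assuming hidden states and the direction are plain vectors (names are illustrative):

```python
def probe_trajectory(hidden_states, direction):
    """Scalar readout of each turn's hidden state along one concept direction.

    A drifting readout across turns is the signature of the representational
    dynamics the abstract describes (e.g. factual -> non-factual).
    """
    return [sum(h * d for h, d in zip(state, direction))
            for state in hidden_states]
```

Plotting this trajectory over a conversation would show whether the "factuality" readout stays stable or flips sign mid-dialogue.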


【3】HESTIA: A Hessian-Guided Differentiable Quantization-Aware Training Framework for Extremely Low-Bit LLMs
Link: https://arxiv.org/abs/2601.20745

Authors: Guoan Wang, Feiyu Wang, Zongwei Lv, Yikun Zong, Tong Yang
Note: 13 pages, 2 figures
Abstract: As large language models (LLMs) continue to scale, deployment is increasingly bottlenecked by the memory wall, motivating a shift toward extremely low-bit quantization. However, most quantization-aware training (QAT) methods apply hard rounding and the straight-through estimator (STE) from the beginning of training, which prematurely discretizes the optimization landscape and induces persistent gradient mismatch between latent weights and quantized weights, hindering effective optimization of quantized models. To address this, we propose Hestia, a Hessian-guided differentiable QAT framework for extremely low-bit LLMs, which replaces the rigid step function with a temperature-controlled softmax relaxation to maintain gradient flow early in training while progressively hardening quantization. Furthermore, Hestia leverages a tensor-wise Hessian trace metric as a lightweight curvature signal to drive fine-grained temperature annealing, enabling sensitivity-aware discretization across the model. Evaluations on Llama-3.2 show that Hestia consistently outperforms existing ternary QAT baselines, yielding average zero-shot improvements of 5.39% and 4.34% for the 1B and 3B models. These results indicate that Hessian-guided relaxation effectively recovers representational capacity, establishing a more robust training path for 1.58-bit LLMs. The code is available at https://github.com/hestia2026/Hestia.
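A temperature-controlled softmax relaxation of hard rounding can be sketched for a single weight over a ternary codebook. This is a generic illustration of the idea, not Hestia's actual formulation; the distance-based logits are an assumption:

```python
import math

LEVELS = (-1.0, 0.0, 1.0)  # ternary codebook, as in 1.58-bit LLMs

def soft_quantize(w, temperature):
    """Softmax-weighted average of quantization levels.

    Logits are negative squared distances to each level, scaled by the
    temperature: as temperature -> 0 this approaches hard nearest-level
    rounding, while larger temperatures keep the mapping smooth and
    differentiable, preserving gradient flow early in training.
    """
    logits = [-((w - q) ** 2) / temperature for q in LEVELS]
    m = max(logits)  # subtract max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return sum(q * wt / z for q, wt in zip(LEVELS, weights))
```

At a small temperature, 0.9 maps to nearly 1.0 and -0.9 to nearly -1.0; annealing the temperature downward "hardens" the quantizer as training progresses.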


【4】Structurally Human, Semantically Biased: Detecting LLM-Generated References with Embeddings and GNNs
Link: https://arxiv.org/abs/2601.20704

Authors: Melika Mobini, Vincent Holst, Floriano Tori, Andres Algaba, Vincent Ginis
Note: 34 pages, 20 figures. Accepted at ICLR 2026
Abstract: Large language models are increasingly used to curate bibliographies, raising the question: are their reference lists distinguishable from human ones? We build paired citation graphs, ground truth and GPT-4o-generated (from parametric knowledge), for 10,000 focal papers ($\approx$ 275k references) from SciSciNet, and add a field-matched random baseline that preserves out-degree and field distributions while breaking latent structure. We compare (i) structure-only node features (degree/closeness/eigenvector centrality, clustering, edge count) with (ii) 3072-D title/abstract embeddings, using an RF on graph-level aggregates and Graph Neural Networks with node features. Structure alone barely separates GPT from ground truth (RF accuracy $\approx$ 0.60) despite cleanly rejecting the random baseline ($\approx$ 0.89–0.92). By contrast, embeddings sharply increase separability: RF on aggregated embeddings reaches $\approx$ 0.83, and GNNs with embedding node features achieve 93% test accuracy on GPT vs. ground truth. We show the robustness of our findings by replicating the pipeline with Claude Sonnet 4.5 and with multiple embedding models (OpenAI and SPECTER), with RF separability for ground truth vs. Claude $\approx 0.77$ and clean rejection of the random baseline. Thus, LLM bibliographies, generated purely from parametric knowledge, closely mimic human citation topology but leave detectable semantic fingerprints; detection and debiasing should target content signals rather than global graph structure.


【5】Concept Component Analysis: A Principled Approach for Concept Extraction in LLMs
Link: https://arxiv.org/abs/2601.20420

Authors: Yuhang Liu, Erdun Gao, Dong Gong, Anton van den Hengel, Javen Qinfeng Shi
Abstract: Developing human-understandable interpretations of large language models (LLMs) becomes increasingly critical for their deployment in essential domains. Mechanistic interpretability seeks to mitigate these issues by extracting human-interpretable processes and concepts from LLMs' activations. Sparse autoencoders (SAEs) have emerged as a popular approach for extracting interpretable and monosemantic concepts by decomposing LLM internal representations into a dictionary. Despite their empirical progress, SAEs suffer from a fundamental theoretical ambiguity: a well-defined correspondence between LLM representations and human-interpretable concepts remains unclear. This lack of theoretical grounding gives rise to several methodological challenges, including difficulties in principled method design and evaluation criteria. In this work, we show that, under mild assumptions, LLM representations can be approximated as a linear mixture of the log-posteriors over concepts given the input context, through the lens of a latent variable model where concepts are treated as latent variables. This motivates a principled framework for concept extraction, namely Concept Component Analysis (ConCA), which aims to recover the log-posterior of each concept from LLM representations through an unsupervised linear unmixing process. We explore a specific variant, termed sparse ConCA, which leverages a sparsity prior to address the inherent ill-posedness of the unmixing problem. We implement 12 sparse ConCA variants and demonstrate their ability to extract meaningful concepts across multiple LLMs, offering theory-backed advantages over SAEs.
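Sparse linear unmixing with a sparsity prior is classically solved by iterative soft-thresholding (ISTA). The sketch below shows one ISTA step on a toy dictionary; it illustrates the generic technique, not ConCA's specific algorithm, and all names are assumptions:

```python
import math

def soft_threshold(v, lam):
    """Proximal operator of the l1 sparsity prior."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]

def ista_step(D, x, z, step, lam):
    """One iterative soft-thresholding step for sparse codes z.

    D holds concept atoms as rows; we model x ~ sum_k z[k] * D[k],
    take a gradient step on the reconstruction error, then apply the
    sparsity prox. Purely illustrative of sparse linear unmixing.
    """
    dim = len(x)
    recon = [sum(D[k][i] * z[k] for k in range(len(D))) for i in range(dim)]
    resid = [xi - ri for xi, ri in zip(x, recon)]
    grad = [sum(a * r for a, r in zip(atom, resid)) for atom in D]
    return soft_threshold([zk + step * gk for zk, gk in zip(z, grad)],
                          step * lam)
```

With an orthonormal dictionary, a single step already recovers a shrunken version of the true code, and repeated steps converge to the sparse solution.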


【6】LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning
Link: https://arxiv.org/abs/2601.20375

Authors: Wei Huang, Anda Cheng, Yinggui Wang, Lei Wang, Tao Wei
Note: Accepted by VLDB 2026
Abstract: Large Language Models (LLMs) can be fine-tuned on domain-specific data to enhance their performance in specialized fields. However, such data often contains numerous low-quality samples, necessitating effective data processing (DP). In practice, DP strategies are typically developed through iterative manual analysis and trial-and-error adjustment. These processes inevitably incur high labor costs and may lead to privacy issues in high-privacy domains like healthcare due to direct human access to sensitive data. Thus, achieving automated data processing without exposing the raw data has become a critical challenge. To address this challenge, we propose LLM-AutoDP, a novel framework that leverages LLMs as agents to automatically generate and optimize data processing strategies. Our method generates multiple candidate strategies and iteratively refines them using feedback signals and comparative evaluations. This iterative in-context learning mechanism enables the agent to converge toward high-quality processing pipelines without requiring direct human intervention or access to the underlying data. To further accelerate strategy search, we introduce three key techniques: Distribution Preserving Sampling, which reduces data volume while maintaining distributional integrity; Processing Target Selection, which uses a binary classifier to identify low-quality samples for focused processing; and a Cache-and-Reuse Mechanism, which minimizes redundant computations by reusing prior processing results. Results show that models trained on data processed by our framework achieve over 80% win rates against models trained on unprocessed data. Compared to AutoML baselines based on LLM agents, LLM-AutoDP achieves approximately a 65% win rate. Moreover, our acceleration techniques reduce the total searching time by up to 10 times, demonstrating both effectiveness and efficiency.


【7】Improving Diffusion Language Model Decoding through Joint Search in Generation Order and Token Space
Link: https://arxiv.org/abs/2601.20339

Authors: Yangyi Shen, Tianjian Feng, Jiaqi Han, Wen Wang, Tianlang Chen, Chunhua Shen, Jure Leskovec, Stefano Ermon
Abstract: Diffusion Language Models (DLMs) offer order-agnostic generation that can explore many possible decoding trajectories. However, current decoding methods commit to a single trajectory, limiting exploration in trajectory space. We introduce Order-Token Search to explore this space through jointly searching over generation order and token values. Its core is a likelihood estimator that scores denoising actions, enabling stable pruning and efficient exploration of diverse trajectories. Across mathematical reasoning and coding benchmarks, Order-Token Search consistently outperforms baselines on GSM8K, MATH500, Countdown, and HumanEval (3.1%, 3.8%, 7.9%, and 6.8% absolute over the backbone), matching or surpassing diffu-GRPO post-trained d1-LLaDA. Our work establishes joint search as a key component for advancing decoding in DLMs.
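Joint search over fill-in order and token values can be sketched as a beam search where each action unmasks one position with one token. This toy version takes an arbitrary scoring function in place of the paper's likelihood estimator; all names are illustrative:

```python
import heapq
import itertools

def order_token_search(length, vocab, score, beam=3):
    """Beam search over (position, token) denoising actions.

    `score` rates any partial sequence (None = still masked). At each step
    every beam state is expanded by unmasking one position with one token,
    and only the top-`beam` candidates survive, so both the generation
    order and the token choices are searched jointly.
    """
    start = (None,) * length
    beams = [(start, score(start))]
    for _ in range(length):
        candidates = {}
        for seq, _ in beams:
            for pos, tok in itertools.product(range(length), vocab):
                if seq[pos] is None:
                    new = seq[:pos] + (tok,) + seq[pos + 1:]
                    if new not in candidates:
                        candidates[new] = score(new)
        beams = heapq.nlargest(beam, candidates.items(), key=lambda kv: kv[1])
    return max(beams, key=lambda kv: kv[1])[0]
```

With a scorer that prefers a particular token, the search fills every slot with it regardless of which order the positions were unmasked in.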


【8】Demonstration-Free Robotic Control via LLM Agents
Link: https://arxiv.org/abs/2601.20334

Authors: Brian Y. Tsui, Alan Y. Fang, Tiffany J. Hwu
Abstract: Robotic manipulation has increasingly adopted vision-language-action (VLA) models, which achieve strong performance but typically require task-specific demonstrations and fine-tuning, and often generalize poorly under domain shift. We investigate whether general-purpose large language model (LLM) agent frameworks, originally developed for software engineering, can serve as an alternative control paradigm for embodied manipulation. We introduce FAEA (Frontier Agent as Embodied Agent), which applies an LLM agent framework directly to embodied manipulation without modification. Using the same iterative reasoning that enables software agents to debug code, FAEA enables embodied agents to reason through manipulation strategies. We evaluate an unmodified frontier agent, Claude Agent SDK, across the LIBERO, ManiSkill3, and MetaWorld benchmarks. With privileged environment state access, FAEA achieves success rates of 84.9%, 85.7%, and 96%, respectively. This level of task success approaches that of VLA models trained with fewer than 100 demonstrations per task, without requiring demonstrations or fine-tuning. With one round of human feedback as an optional optimization, performance increases to 88.2% on LIBERO. This demonstration-free capability has immediate practical value: FAEA can autonomously explore novel scenarios in simulation and generate successful trajectories for training data augmentation in embodied learning. Our results indicate that general-purpose agents are sufficient for a class of manipulation tasks dominated by deliberative, task-level planning. This opens a path for robotics systems to leverage actively maintained agent infrastructure and benefit directly from ongoing advances in frontier models. Code is available at https://github.com/robiemusketeer/faea-sim.


【9】Window-Diffusion: Accelerating Diffusion Language Model Inference with Windowed Token Pruning and Caching
Link: https://arxiv.org/abs/2601.20332

Authors: Fengrui Zuo, Zhiwei Ke, Yiming Liu, Wenqi Lou, Chao Wang, Xvehai Zhou
Abstract: Diffusion language models (DLMs) generate text through iterative denoising, but inference requires full-sequence attention at every iteration, resulting in substantial redundant computation on masked tokens. Block-wise diffusion can reduce this cost, yet it typically relies on retraining and constrained update orders, limiting its direct applicability to pretrained DLMs. Our token-level analysis reveals pronounced structural locality in DLM inference. Decoding is driven by a small set of prefix-localized active tokens; the influence of distant undecoded context diminishes rapidly, and decoded tokens exhibit stage-wise temporal stability, enabling reuse of intermediate representations except for a brief post-decode transient. Motivated by these observations, we propose Window-Diffusion (source code available at https://github.com/vhicrgit/Window-Diffusion), a window-based token pruning and caching method for inference. We maintain a local computation window that slides rightward as denoising progresses, and partition undecoded tokens into: (i) active tokens that are computed online, (ii) buffer tokens whose KV states are cached and periodically refreshed, and (iii) far-field tokens that are pruned outside the window. Computation is restricted to active and buffer tokens within the window, while far-field tokens are omitted at each stage. Experiments on LLaDA and Dream show that, under matched compute budgets, our method achieves up to $99\times$ inference speedup while largely preserving generation performance.
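The three-way token partition can be sketched directly from the abstract's description. The position-based split and the `n_active` cutoff are illustrative assumptions about how such a window scheme could be realized:

```python
def partition_undecoded(undecoded, window_start, window_end, n_active):
    """Split undecoded token positions into active / buffer / far-field.

    undecoded: sorted positions not yet decoded. Positions inside the
    window are active (computed online) up to n_active, then buffer
    (KV cached, periodically refreshed); positions past the window
    edge are far-field and pruned for this stage.
    """
    in_window = [p for p in undecoded if window_start <= p < window_end]
    active = in_window[:n_active]
    buffer = in_window[n_active:]
    far_field = [p for p in undecoded if p >= window_end]
    return active, buffer, far_field
```

As denoising progresses, sliding `window_start`/`window_end` rightward promotes former far-field tokens into the buffer and then the active set.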


【10】PsychePass: Calibrating LLM Therapeutic Competence via Trajectory-Anchored Tournaments
Link: https://arxiv.org/abs/2601.20330

Authors: Zhuang Chen, Dazhen Wan, Zhangkai Zheng, Guanqun Bi, Xiyao Xiao, Binghang Li, Minlie Huang
Abstract: While large language models show promise in mental healthcare, evaluating their therapeutic competence remains challenging due to the unstructured and longitudinal nature of counseling. We argue that current evaluation paradigms suffer from an unanchored defect, leading to two forms of instability: process drift, where unsteered client simulation wanders away from specific counseling goals, and standard drift, where static pointwise scoring lacks the stability for reliable judgment. To address this, we introduce PsychePass, a unified framework that calibrates the therapeutic competence of LLMs via trajectory-anchored tournaments. We first anchor the interaction trajectory in simulation, where clients precisely control the fluid consultation process to probe multifaceted capabilities. We then anchor the battle trajectory in judgments through an efficient Swiss-system tournament, utilizing dynamic pairwise battles to yield robust Elo ratings. Beyond ranking, we demonstrate that tournament trajectories can be transformed into credible reward signals, enabling on-policy reinforcement learning to enhance LLMs' performance. Extensive experiments validate the effectiveness of PsychePass and its strong consistency with human expert judgments.
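Deriving Elo ratings from pairwise battles follows the standard Elo update, sketched below. This is the textbook formula, not PsychePass-specific code, and the K-factor is an assumption:

```python
def elo_update(rating_a, rating_b, outcome, k=32):
    """Standard Elo update after one pairwise battle.

    outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a draw.
    The expected score comes from the logistic curve with a 400-point
    scale; each player's rating moves by k times (actual - expected).
    """
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
    new_a = rating_a + k * (outcome - expected_a)
    new_b = rating_b + k * ((1.0 - outcome) - (1.0 - expected_a))
    return new_a, new_b
```

A Swiss-system tournament repeatedly pairs models with similar current ratings and applies this update after each battle, converging to a stable ranking with far fewer games than a round-robin.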


【11】Less is More: Benchmarking LLM Based Recommendation Agents
Link: https://arxiv.org/abs/2601.20316

Authors: Kargi Chauhan, Mahalakshmi Venkateswarlu
Abstract: Large Language Models (LLMs) are increasingly deployed for personalized product recommendations, with practitioners commonly assuming that longer user purchase histories lead to better predictions. We challenge this assumption through a systematic benchmark of four state-of-the-art LLMs (GPT-4o-mini, DeepSeek-V3, Qwen2.5-72B, and Gemini 2.5 Flash) across context lengths ranging from 5 to 50 items using the REGEN dataset.

Surprisingly, our experiments with 50 users in a within-subject design reveal no significant quality improvement with increased context length. Quality scores remain flat across all conditions (0.17 to 0.23). Our findings have significant practical implications: practitioners can reduce inference costs by approximately 88% by using short contexts (5 to 10 items) instead of longer histories (50 items), without sacrificing recommendation quality. We also analyze latency patterns across providers and find model-specific behaviors that inform deployment decisions. This work challenges the existing "more context is better" paradigm and provides actionable guidelines for cost-effective LLM-based recommendation systems.


【12】SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips
Link: https://arxiv.org/abs/2601.20309

Authors: Jiahuan Yu, Mingtao Hu, Zichao Lin, Minjia Zhang
Note: Accepted by MLSys '26
Abstract: Large Language Model (LLM) serving faces a fundamental tension between stringent latency Service Level Objectives (SLOs) and limited GPU memory capacity. When high request rates exhaust the KV cache budget, existing LLM inference systems often suffer severe head-of-line (HOL) blocking. While prior work explored PCIe-based offloading, these approaches cannot sustain responsiveness under high request rates, often failing to meet tight Time-To-First-Token (TTFT) and Time-Between-Tokens (TBT) SLOs. We present SuperInfer, a high-performance LLM inference system designed for emerging Superchips (e.g., NVIDIA GH200) with a tightly coupled GPU-CPU architecture via NVLink-C2C. SuperInfer introduces RotaSched, the first proactive, SLO-aware rotary scheduler that rotates requests to maintain responsiveness on Superchips, and DuplexKV, an optimized rotation engine that enables full-duplex transfer over NVLink-C2C. Evaluations on GH200 using various models and datasets show that SuperInfer improves TTFT SLO attainment rates by up to 74.7% while maintaining comparable TBT and throughput compared to state-of-the-art systems, demonstrating that SLO-aware scheduling and memory co-design unlock the full potential of Superchips for responsive LLM serving.


【13】Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction
Link: https://arxiv.org/abs/2601.20299

Authors: Tianyi Alex Qiu, Micah Carroll, Cameron Allen
Note: ICLR 2026
Abstract: The evaluation and post-training of large language models (LLMs) rely on supervision, but strong supervision for difficult tasks is often unavailable, especially when evaluating frontier models. In such cases, models have been shown to exploit evaluations built on such imperfect supervision, leading to deceptive results. However, a wealth of mechanism design research, underutilized in LLM research, focuses on game-theoretic incentive compatibility, i.e., eliciting honest and informative answers under weak supervision. Drawing from this literature, we introduce the peer prediction method for model evaluation and post-training. It rewards honest and informative answers over deceptive and uninformative ones, using a metric based on mutual predictability and without requiring ground-truth labels. We demonstrate the method's effectiveness and resistance to deception, with both theoretical guarantees and empirical validation on models with up to 405B parameters. We show that training an 8B model with a peer prediction-based reward recovers most of the drop in truthfulness due to prior malicious finetuning, even when the reward is produced by a 0.135B language model with no finetuning. On the evaluation front, in contrast to LLM-as-a-Judge, which requires strong and trusted judges, we discover an inverse scaling property in peer prediction, where, surprisingly, resistance to deception strengthens as the capability gap between the experts and participants widens, enabling reliable evaluation of strong models with weak supervision. In particular, LLM-as-a-Judge becomes worse than random guessing when facing deceptive models 5-20x the judge's size, while peer prediction thrives when such gaps are large, including in cases with over 100x size difference.
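The flavor of peer prediction can be illustrated with a simple agreement-beyond-chance score between two respondents. This is a classic output-agreement-style mechanism, not the paper's exact mutual-predictability metric; names are illustrative:

```python
from collections import Counter

def peer_prediction_score(answers_a, answers_b):
    """Agreement beyond chance between two respondents on shared questions.

    Actual agreement rate minus the agreement expected if the two answer
    lists were paired at random. Honest correlated answering scores
    positively; constant (uninformative) answering scores exactly zero,
    so there is no ground-truth label anywhere in the computation.
    """
    n = len(answers_a)
    agree = sum(a == b for a, b in zip(answers_a, answers_b)) / n
    ca, cb = Counter(answers_a), Counter(answers_b)
    chance = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)
    return agree - chance
```

Two respondents who always answer "yes" agree perfectly but score zero, which is the incentive-compatibility property that makes such metrics resistant to uninformative strategies.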


【14】What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering
标题:计划是什么?LLM中的隐性规划及其在押韵生成和问答中的应用
链接:https://arxiv.org/abs/2601.20164

Authors: Jim Maar, Denis Paperno, Callum Stuart McDougall, Neel Nanda
Note: 41 pages, 34 figures. Accepted at ICLR 2026. Code available at https://github.com/Jim-Maar/implicit-planning-in-llms
Abstract: Prior work suggests that language models, while trained on next-token prediction, show implicit planning behavior: they may select the next token in preparation for a predicted future token, such as a likely rhyming word, as supported by a prior qualitative study of Claude 3.5 Haiku using a cross-layer transcoder. We propose much simpler techniques for assessing implicit planning in language models. With case studies on rhyme poetry generation and question answering, we demonstrate that our methodology easily scales to many models. Across models, we find that the generated rhyme (e.g. "-ight") or answer to a question ("whale") can be manipulated by steering at the end of the preceding line with a vector, affecting the generation of intermediate tokens leading up to the rhyme or answer word. We show that implicit planning is a universal mechanism, present in smaller models than previously thought, starting from 1B parameters. Our methodology offers a widely applicable direct way to study implicit planning abilities of LLMs. More broadly, understanding planning abilities of language models can inform decisions in AI safety and control.
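The steering intervention described above amounts to adding a scaled concept vector to a hidden state and checking how the readout along that direction changes. A minimal sketch with plain vectors (names are illustrative):

```python
def steer(hidden, direction, alpha):
    """Activation steering: shift a hidden state along a concept direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

def readout(hidden, direction):
    """Projection of the state onto the same direction (the probe signal)."""
    return sum(h * d for h, d in zip(hidden, direction))
```

In the paper's setting, applying such a shift at the end of one line changes which rhyme or answer the model commits to several tokens later, which is the evidence for implicit planning.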


【15】LogSieve: Task-Aware CI Log Reduction for Sustainable LLM-Based Analysis
Link: https://arxiv.org/abs/2601.20148

Authors: Marcus Emmanuel Barnes, Taher A. Ghaleb, Safwat Hassan
Note: Preprint. Accepted for presentation at Mining Software Repositories (MSR'26), co-located with ICSE 2026. The final version will appear in the ACM Digital Library as part of the MSR'26 conference proceedings
Abstract: Logs are essential for understanding Continuous Integration (CI) behavior, particularly for diagnosing build failures and performance regressions. Yet their growing volume and verbosity make both manual inspection and automated analysis increasingly expensive, time-consuming, and environmentally costly. While prior work has explored log compression, anomaly detection, and LLM-based log analysis, most efforts target structured system logs rather than the unstructured, noisy, and verbose logs typical of CI workflows.

We present LogSieve, a lightweight, RCA-aware, and semantics-preserving log reduction technique that filters low-information lines while retaining content relevant to downstream reasoning. Evaluated on CI logs from 20 open-source Android projects using GitHub Actions, LogSieve achieves an average 42% reduction in lines and 40% reduction in tokens with minimal semantic loss. This pre-inference reduction lowers computational cost and can proportionally reduce energy use (and associated emissions) by decreasing the volume of data processed during LLM inference.

Compared with structure-first baselines (LogZip and random-line removal), LogSieve preserves much higher semantic and categorical fidelity (Cosine = 0.93, GPTScore = 0.93, 80% exact-match accuracy). Embedding-based classifiers automate relevance detection with near-human accuracy (97%), enabling scalable and sustainable integration of semantics-aware filtering into CI workflows. LogSieve thus bridges log management and LLM reasoning, offering a practical path toward greener and more interpretable CI automation.


【16】Rewarding Intellectual Humility: Learning When Not To Answer In Large Language Models
标题:奖励智力谦逊:在大型语言模型中学习何时不回答
链接:https://arxiv.org/abs/2601.20126

作者:Abha Jha,Akanksha Mahajan,Ashwath Vaithinathan Aravindan,Praveen Saravanan,Sai Sailaja Policharla,Sonal Chaturbhuj Gehlot
摘要:大型语言模型(LLM)经常产生幻觉或无法验证的内容,破坏了它们在事实领域的可靠性。这项工作研究了具有可验证奖励的强化学习(RLVR)作为一种训练范式,它在奖励正确性的同时明确奖励弃答("我不知道"),以促进智力谦逊。我们使用三元奖励结构($-1$、r_abs、$1$),在不同的弃答奖励设置下,在MedMCQA和Hendrycks Math基准上对Granite-3.3-2B-Instruct和Qwen-3-4B-Instruct进行微调和评估。我们进一步研究了将RLVR与在强化学习之前教授弃答的监督微调策略相结合的效果。我们的结果表明,适度的弃答奖励(r_abs $\approx -0.25$ 至 $0.3$)能够一致地减少错误回答,而不会严重降低多项选择任务的准确性,且较大的模型对弃答激励表现出更强的鲁棒性。在开放式问答中,我们观察到由于探索不足导致的局限性,这可以通过有监督的弃答训练得到部分缓解。总体而言,这些发现表明了可验证奖励设计作为语言模型幻觉缓解实用方法的可行性和灵活性。我们弃答训练框架的可复现代码可在 https://github.com/Mystic-Slice/rl-abstention 获取。
摘要 :Large Language Models (LLMs) often produce hallucinated or unverifiable content, undermining their reliability in factual domains. This work investigates Reinforcement Learning with Verifiable Rewards (RLVR) as a training paradigm that explicitly rewards abstention ("I don't know") alongside correctness to promote intellectual humility. We fine-tune and evaluate Granite-3.3-2B-Instruct and Qwen-3-4B-Instruct on the MedMCQA and Hendrycks Math benchmarks using a ternary reward structure ($-1$, r_abs, 1) under varying abstention reward structures. We further study the effect of combining RLVR with supervised fine-tuning strategies that teach abstention prior to reinforcement learning. Our results show that moderate abstention rewards (r_abs $\approx -0.25$ to 0.3) consistently reduce incorrect responses without severe accuracy degradation on multiple-choice tasks, with larger models exhibiting greater robustness to abstention incentives. On open-ended question answering, we observe limitations due to insufficient exploration, which can be partially mitigated through supervised abstention training. Overall, these findings demonstrate the feasibility and flexibility of verifiable reward design as a practical approach for hallucination mitigation in language models. Reproducible code for our abstention training framework is available here https://github.com/Mystic-Slice/rl-abstention.
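上述三元奖励($-1$、r_abs、$1$)可以用一个极简的打分函数来示意。以下草图中的弃答判定措辞("I don't know")与默认 r_abs 取值均为示意性假设,并非论文官方实现:

```python
def ternary_reward(response: str, correct_answer: str, r_abs: float = -0.25) -> float:
    """三元奖励:答对得 1,弃答得 r_abs,答错得 -1。弃答措辞的匹配方式仅为示意。"""
    if response.strip().lower() in {"i don't know", "idk"}:  # 弃答(假设的措辞集合)
        return r_abs
    return 1.0 if response.strip() == correct_answer.strip() else -1.0
```

在 RLVR 训练中,这样的奖励可直接作为可验证的标量信号,对每个采样回复进行打分。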


【17】Membership Inference Attacks Against Fine-tuned Diffusion Language Models
标题:针对微调扩散语言模型的成员推断攻击
链接:https://arxiv.org/abs/2601.20125

作者:Yuetian Chen,Kaiyuan Zhang,Yuntao Du,Edoardo Stoppa,Charles Fleming,Ashish Kundu,Bruno Ribeiro,Ninghui Li
备注:Accepted for presentation at ICLR 2026 (pending final camera-ready)
摘要:扩散语言模型(DLM)使用双向掩码标记预测,是自回归语言模型的一个有前途的替代方案。然而,它们通过成员推理攻击(MIA)泄露隐私的风险仍未得到充分探索。本文对DLM的MIA脆弱性进行了首次系统性研究。与自回归模型单一固定的预测模式不同,DLM的多种可掩码配置使攻击机会呈指数级增加。这种探测许多独立掩码的能力显著提高了检测机会。为了利用这一点,我们引入了SAMA(子集聚合成员攻击),它通过鲁棒聚合来应对稀疏信号挑战。SAMA在渐进密度下对掩码子集进行采样,并应用在重尾噪声下仍然有效的基于符号的统计。通过逆加权聚合优先采用稀疏掩码更干净的信号,SAMA将稀疏记忆检测转化为一种鲁棒的投票机制。在9个数据集上的实验表明,SAMA相对最佳基线实现了30%的相对AUC提升,在低假阳性率下最高提升8倍。这些发现揭示了DLM中此前未知的重大漏洞,亟需开发有针对性的隐私防御。
摘要:Diffusion Language Models (DLMs) represent a promising alternative to autoregressive language models, using bidirectional masked token prediction. Yet their susceptibility to privacy leakage via Membership Inference Attacks (MIA) remains critically underexplored. This paper presents the first systematic investigation of MIA vulnerabilities in DLMs. Unlike the autoregressive models' single fixed prediction pattern, DLMs' multiple maskable configurations exponentially increase attack opportunities. This ability to probe many independent masks dramatically improves detection chances. To exploit this, we introduce SAMA (Subset-Aggregated Membership Attack), which addresses the sparse signal challenge through robust aggregation. SAMA samples masked subsets across progressive densities and applies sign-based statistics that remain effective despite heavy-tailed noise. Through inverse-weighted aggregation prioritizing sparse masks' cleaner signals, SAMA transforms sparse memorization detection into a robust voting mechanism. Experiments on nine datasets show SAMA achieves 30% relative AUC improvement over the best baseline, with up to 8 times improvement at low false positive rates. These findings reveal significant, previously unknown vulnerabilities in DLMs, necessitating the development of tailored privacy defenses.
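SAMA 的核心思想——在渐进密度下采样掩码子集、对每个子集取基于符号的统计、再按逆密度加权聚合——可以用一个玩具草图来说明。其中损失的来源、密度取值与权重形式均为假设,仅示意聚合机制本身,并非论文官方算法:

```python
import random

def sama_score(token_losses, ref_losses, densities=(0.1, 0.3, 0.5), n_masks=50, seed=0):
    """对每个掩码密度采样若干掩码子集,统计该子集上目标模型损失是否低于参考损失的符号,
    并按 1/密度 加权(稀疏掩码的信号被假设更干净)。分数越高越可能是训练成员。
    仅为示意性玩具实现。"""
    rng = random.Random(seed)
    n = len(token_losses)
    score, total_w = 0.0, 0.0
    for d in densities:
        k = max(1, int(d * n))      # 该密度下每个掩码覆盖的标记数
        w = 1.0 / d                 # 逆密度加权:越稀疏的掩码权重越大
        for _ in range(n_masks):
            idx = rng.sample(range(n), k)
            diff = sum(ref_losses[i] - token_losses[i] for i in idx)
            score += w * (1.0 if diff > 0 else -1.0)  # 基于符号的统计
            total_w += w
    return score / total_w
```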


【18】CiMRAG: Cim-Aware Domain-Adaptive and Noise-Resilient Retrieval-Augmented Generation for Edge-Based LLMs
标题:CiMRAG:基于边缘的LLM的Cim感知域自适应和抗噪检索增强生成
链接:https://arxiv.org/abs/2601.20041

作者:Shih-Hsuan Chiu,Ming-Syan Chen
备注:Accepted by ICASSP 2026
摘要:由边缘设备上的大型语言模型(LLM)提供支持的个性化虚拟助理正吸引着越来越多的关注,检索增强生成(RAG)通过检索相关的配置文件数据并生成定制的响应,成为个性化的关键方法。然而,由于配置文件数据的快速增长,例如用户-LLM交互和最近的更新,在边缘设备上部署RAG面临效率障碍。虽然内存计算(CiM)架构通过消除内存和处理单元之间的数据移动来缓解这一瓶颈,但它们容易受到环境噪声的影响,从而降低检索精度。这在动态的、基于多域边缘的场景(例如,旅行,医学和法律),其中准确性和适应性都是至关重要的。为了解决这些挑战,我们提出了面向任务的噪声弹性嵌入学习(TONEL),一个框架,提高噪声鲁棒性和域适应性的RAG在嘈杂的边缘环境。TONEL采用噪声感知投影模型来学习与CiM硬件约束兼容的特定于任务的嵌入,从而实现在噪声条件下的准确检索。在个性化基准上进行的大量实验证明了我们的方法相对于强基线的有效性和实用性,特别是在特定于任务的嘈杂场景中。
摘要:Personalized virtual assistants powered by large language models (LLMs) on edge devices are attracting growing attention, with Retrieval-Augmented Generation (RAG) emerging as a key method for personalization by retrieving relevant profile data and generating tailored responses. However, deploying RAG on edge devices faces efficiency hurdles due to the rapid growth of profile data, such as user-LLM interactions and recent updates. While Computing-in-Memory (CiM) architectures mitigate this bottleneck by eliminating data movement between memory and processing units via in-situ operations, they are susceptible to environmental noise that can degrade retrieval precision. This poses a critical issue in dynamic, multi-domain edge-based scenarios (e.g., travel, medicine, and law) where both accuracy and adaptability are paramount. To address these challenges, we propose Task-Oriented Noise-resilient Embedding Learning (TONEL), a framework that improves noise robustness and domain adaptability for RAG in noisy edge environments. TONEL employs a noise-aware projection model to learn task-specific embeddings compatible with CiM hardware constraints, enabling accurate retrieval under noisy conditions. Extensive experiments conducted on personalization benchmarks demonstrate the effectiveness and practicality of our methods relative to strong baselines, especially in task-specific noisy scenarios.


【19】LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?
标题:LinguaMap:哪些层的LLM会说您的语言以及如何调整它们?
链接:https://arxiv.org/abs/2601.20009

作者:J. Ben Tamo,Daniel Carlander-Reuterfelt,Jonathan Rubin,Dezhi Hong,Mingxian Wang,Oleg Poliannikov
摘要:尽管进行了多语言预训练,大型语言模型通常仍难以完成非英语任务,特别是在语言控制方面,即用预期语言进行响应的能力。我们识别并刻画了两种关键的失败模式:多语言迁移瓶颈(语言正确,任务响应不正确)和语言一致性瓶颈(任务响应正确,语言错误)。为了系统地揭示这些问题,我们设计了一个涵盖MMLU、MGSM和XQuAD基准的四场景评估协议。为了以可解释的方式探究这些问题,我们扩展了logit透镜分析,逐层跟踪语言概率,并计算隐藏状态的跨语言语义相似性。结果揭示了一个三阶段的内部结构:早期层将输入对齐到共享语义空间,中间层执行任务推理,后期层驱动特定语言的生成。在这些见解的指导下,我们只对负责语言控制的最后几层进行选择性微调。在Qwen-3-32B和Bloom-7.1B上,该方法在六种语言中实现了超过98%的语言一致性,同时只微调了3-5%的参数,且不牺牲任务准确性。重要的是,这一结果与全范围微调几乎相同(例如,在所有提示场景中,两种方法的语言一致性都超过98%),但只使用了一小部分计算资源。据我们所知,这是第一种利用语言控制的层定位实现高效多语言适应的方法。
摘要 :Despite multilingual pretraining, large language models often struggle with non-English tasks, particularly in language control, the ability to respond in the intended language. We identify and characterize two key failure modes: the multilingual transfer bottleneck (correct language, incorrect task response) and the language consistency bottleneck (correct task response, wrong language). To systematically surface these issues, we design a four-scenario evaluation protocol spanning MMLU, MGSM, and XQuAD benchmarks. To probe these issues with interpretability, we extend logit lens analysis to track language probabilities layer by layer and compute cross-lingual semantic similarity of hidden states. The results reveal a three-phase internal structure: early layers align inputs into a shared semantic space, middle layers perform task reasoning, and late layers drive language-specific generation. Guided by these insights, we introduce selective fine-tuning of only the final layers responsible for language control. On Qwen-3-32B and Bloom-7.1B, this method achieves over 98 percent language consistency across six languages while fine-tuning only 3-5 percent of parameters, without sacrificing task accuracy. Importantly, this result is nearly identical to that of full-scope fine-tuning (for example, above 98 percent language consistency for both methods across all prompt scenarios) but uses a fraction of the computational resources. To the best of our knowledge, this is the first approach to leverage layer-localization of language control for efficient multilingual adaptation.
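摘要中"只微调负责语言控制的最后几层"的做法,本质上是按层序号选择可训练参数。以下是一个与具体框架无关的示意草图,其中的参数命名模式(类 HuggingFace 风格的 `layers.<i>.` 与 `lm_head`)为假设:

```python
import re

def select_trainable(param_names, num_layers, last_k=2):
    """仅选出最后 last_k 个 Transformer 层(以及输出头)对应的参数名,
    其余参数保持冻结。参数命名模式为假设,仅作示意。"""
    trainable = set()
    for name in param_names:
        m = re.search(r"layers\.(\d+)\.", name)
        if m and int(m.group(1)) >= num_layers - last_k:
            trainable.add(name)          # 属于最后 last_k 层
        elif "lm_head" in name:
            trainable.add(name)          # 输出头通常一并解冻
    return trainable
```

在实际训练框架中,可据此对未选中的参数关闭梯度(例如将其 requires_grad 置为 False),从而只更新语言控制相关的末端层。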


【20】Benchmarking LLAMA Model Security Against OWASP Top 10 For LLM Applications
标题:LLAMA模型安全性与OWASP LLM应用程序前10名进行基准测试
链接:https://arxiv.org/abs/2601.19970

作者:Nourin Shahin,Izzat Alsmadi
摘要:随着大型语言模型(LLM)从研究原型转移到企业系统,其安全漏洞对数据隐私和系统完整性构成了严重风险。本研究将各种Llama模型变体与OWASP Top 10 for LLM应用程序框架进行基准测试,评估威胁检测准确性,响应安全性和计算开销。使用配备NVIDIA A30 GPU的FABRIC测试平台,我们在涵盖10个漏洞类别的100个对抗性提示上测试了5种标准Llama模型和5种Llama Guard变体。我们的研究结果揭示了安全性能的显着差异:紧凑型Llama-Guard-3-1B模型以最小的延迟(每次测试0.165秒)实现了76%的最高检测率,而Llama-3.1-8B等基础模型尽管推理时间较长(0.754秒),但未能检测到威胁(0%准确率)。我们观察到模型大小和安全有效性之间的反比关系,这表明较小的,专门的模型往往优于较大的通用安全任务。此外,我们还提供了一个开源基准数据集,包括对抗性提示、威胁标签和攻击元数据,以支持人工智能安全领域的可重复研究。
摘要:As large language models (LLMs) move from research prototypes to enterprise systems, their security vulnerabilities pose serious risks to data privacy and system integrity. This study benchmarks various Llama model variants against the OWASP Top 10 for LLM Applications framework, evaluating threat detection accuracy, response safety, and computational overhead. Using the FABRIC testbed with NVIDIA A30 GPUs, we tested five standard Llama models and five Llama Guard variants on 100 adversarial prompts covering ten vulnerability categories. Our results reveal significant differences in security performance: the compact Llama-Guard-3-1B model achieved the highest detection rate of 76% with minimal latency (0.165s per test), whereas base models such as Llama-3.1-8B failed to detect threats (0% accuracy) despite longer inference times (0.754s). We observe an inverse relationship between model size and security effectiveness, suggesting that smaller, specialized models often outperform larger general-purpose ones in security tasks. Additionally, we provide an open-source benchmark dataset including adversarial prompts, threat labels, and attack metadata to support reproducible research in AI security, [1].


【21】OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling
标题:OPT-Engine:通过复杂性缩放对优化建模中LLM的极限进行基准测试
链接:https://arxiv.org/abs/2601.19924

作者:Yitian Chen,Cheng Cheng,Yinan Sun,Zi Ling,Dongdong Ge
摘要:大型语言模型(LLM)在优化建模方面取得了令人印象深刻的进展,促进了新方法和评估基准的快速扩展。然而,人们对它们在自动化建模和求解方面的能力边界仍然知之甚少,特别是在扩展到复杂的现实任务时。为了弥合这一差距,我们提出了OPT-ENGINE,这是一个可扩展的基准框架,旨在以可控、可扩展的难度评估LLM的优化建模能力。OPT-ENGINE涵盖了运筹学中的10个典型任务,包括5个线性规划和5个混合整数规划。利用OPT-ENGINE,我们对LLM的推理能力进行了广泛研究,解决了两个关键问题:1)当泛化到复杂度超出当前基准水平的分布外优化任务时,LLM的性能是否保持稳健?2)从问题理解到解生成,当前LLM在哪个阶段遇到最大的瓶颈?我们的实证结果产生了两个关键见解:第一,与外部求解器相结合的工具集成推理在任务复杂度升级时表现出显著更高的鲁棒性,而纯文本推理则达到天花板;第二,约束的自动化建模构成了主要的性能瓶颈。这些发现为开发面向高级优化的下一代LLM提供了可操作的指导。我们的代码可在\textcolor{blue}{https://github.com/Cardinal-Operations/OPTEngine}上公开获取。
摘要:Large Language Models (LLMs) have demonstrated impressive progress in optimization modeling, fostering a rapid expansion of new methodologies and evaluation benchmarks. However, the boundaries of their capabilities in automated formulation and problem solving remain poorly understood, particularly when extending to complex, real-world tasks. To bridge this gap, we propose OPT-ENGINE, an extensible benchmark framework designed to evaluate LLMs on optimization modeling with controllable and scalable difficulty levels. OPT-ENGINE spans 10 canonical tasks across operations research, with five Linear Programming and five Mixed-Integer Programming. Utilizing OPT-ENGINE, we conduct an extensive study of LLMs' reasoning capabilities, addressing two critical questions: 1.) Do LLMs' performance remain robust when generalizing to out-of-distribution optimization tasks that scale in complexity beyond current benchmark levels? and 2.) At what stage, from problem interpretation to solution generation, do current LLMs encounter the most significant bottlenecks? Our empirical results yield two key insights: first, tool-integrated reasoning with external solvers exhibits significantly higher robustness as task complexity escalates, while pure-text reasoning reaches a ceiling; second, the automated formulation of constraints constitutes the primary performance bottleneck. These findings provide actionable guidance for developing next-generation LLMs for advanced optimization. Our code is publicly available at \textcolor{blue}{https://github.com/Cardinal-Operations/OPTEngine}.


【22】CHIME: Chiplet-based Heterogeneous Near-Memory Acceleration for Edge Multimodal LLM Inference
标题:CHIME:基于芯片的边缘多模态LLM推理的异构近存储加速
链接:https://arxiv.org/abs/2601.19908

作者:Yanru Chen,Runyang Tian,Yue Pan,Zheyu Li,Weihong Xu,Tajana Rosing
摘要:大型语言模型(LLM)的激增正在加速多模态助手与边缘设备的集成,在边缘设备上,推理需要在严格的延迟和能量约束下执行,且通常因间歇性连接而雪上加霜。这些挑战在多模态LLM(MLLM)的场景中尤为尖锐,因为高维视觉输入被转换成冗长的令牌序列,从而使键值(KV)缓存膨胀,并给LLM骨干带来大量数据移动开销。为了解决这些问题,我们提出了CHIME,一种基于小芯片(chiplet)的异构近存加速方案,用于边缘MLLM推理。CHIME利用了集成单片3D(M3D)DRAM和RRAM小芯片的互补优势:DRAM为注意力计算提供低延迟带宽,而RRAM为权重提供密集的非易失性存储。这种异构硬件由协同设计的映射框架编排,该框架在数据附近执行融合内核,最大限度地减少跨小芯片流量以最大化有效带宽。在FastVLM(0.6B/1.7B)和MobileVLM(1.7B/3B)上,与边缘GPU NVIDIA Jetson Orin NX相比,CHIME实现了高达54倍的加速,每次推理的能效最高提升246倍。它维持了116.5-266.5 token/J,而Jetson为0.7-1.1 token/J。此外,它的吞吐量比最先进的PIM加速器FACIL高69.2倍。与仅使用M3D DRAM的设计相比,CHIME的异构存储进一步带来7%的能效提升和2.4倍的性能提升。
摘要 :The proliferation of large language models (LLMs) is accelerating the integration of multimodal assistants into edge devices, where inference is executed under stringent latency and energy constraints, often exacerbated by intermittent connectivity. These challenges become particularly acute in the context of multimodal LLMs (MLLMs), as high-dimensional visual inputs are transformed into extensive token sequences, thereby inflating the key-value (KV) cache and imposing substantial data movement overheads to the LLM backbone. To address these issues, we present CHIME, a chiplet-based heterogeneous near-memory acceleration for edge MLLMs inference. CHIME leverages the complementary strengths of integrated monolithic 3D (M3D) DRAM and RRAM chiplets: DRAM supplies low-latency bandwidth for attention, while RRAM offers dense, non-volatile storage for weights. This heterogeneous hardware is orchestrated by a co-designed mapping framework that executes fused kernels near data, minimizing cross-chiplet traffic to maximize effective bandwidth. On FastVLM (0.6B/1.7B) and MobileVLM (1.7B/3B), CHIME achieves up to 54x speedup and up to 246x better energy efficiency per inference as compared to the edge GPU NVIDIA Jetson Orin NX. It sustains 116.5-266.5 token/J compared to Jetson's 0.7-1.1 token/J. Furthermore, it delivers up to 69.2x higher throughput than the state-of-the-art PIM accelerator FACIL. Compared to the M3D DRAM-only design, CHIME's heterogeneous memory further improves energy efficiency by 7% and performance by 2.4x.


【23】Efficient Evaluation of LLM Performance with Statistical Guarantees
标题:通过统计保证有效评估LLM绩效
链接:https://arxiv.org/abs/2601.20251

作者:Skyler Wu,Yash Nair,Emmanuel J. Candès
备注:24 pages, 10 figures
摘要:在大型基准套件上详尽评估许多大型语言模型(LLM)代价高昂。我们将基准评测视为有限总体推断,并在固定的查询预算下,为模型准确率寻求具有有效频率学覆盖率的紧置信区间(CI)。我们提出了因子化主动查询(FAQ),它(a)通过贝叶斯因子模型利用历史信息;(b)使用混合的方差缩减/主动学习采样策略自适应地选择问题;(c)通过前摄主动推理(Proactive Active Inference)——主动推理(Zrnic & Candes, 2024)的有限总体扩展——在保留覆盖率的同时实现直接的问题选择,从而保持有效性。在开销可忽略的情况下,FAQ在两个基准套件、不同历史数据缺失水平下,相对强基线带来最高$5\times$的有效样本量增益:这意味着它在使用最多少$5\times$查询的同时,达到与均匀抽样相同的CI宽度。我们发布源代码和整理的数据集,以支持可复现的评估和未来研究。
摘要:Exhaustively evaluating many large language models (LLMs) on a large suite of benchmarks is expensive. We cast benchmarking as finite-population inference and, under a fixed query budget, seek tight confidence intervals (CIs) for model accuracy with valid frequentist coverage. We propose Factorized Active Querying (FAQ), which (a) leverages historical information through a Bayesian factor model; (b) adaptively selects questions using a hybrid variance-reduction/active-learning sampling policy; and (c) maintains validity through Proactive Active Inference -- a finite-population extension of active inference (Zrnic & Candes, 2024) that enables direct question selection while preserving coverage. With negligible overhead cost, FAQ delivers up to $5\times$ effective sample size gains over strong baselines on two benchmark suites, across varying historical-data missingness levels: this means that it matches the CI width of uniform sampling while using up to $5\times$ fewer queries. We release our source code and our curated datasets to support reproducible evaluation and future research.
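作为对照,可以先看最简单的均匀抽样基线:在总量为 N 道题的基准上抽取 n 道评测,并用有限总体校正(FPC)收紧准确率的置信区间。以下草图仅示意这一有限总体推断基线,并非论文提出的 FAQ 方法:

```python
import math

def finite_population_ci(correct, sampled_n, population_n, z=1.96):
    """在共 population_n 道题的基准上随机抽取 sampled_n 道评测,
    用有限总体校正(FPC)给出模型准确率的近似正态置信区间。
    当抽样覆盖整个总体时,FPC 使区间宽度收缩为零。"""
    p = correct / sampled_n
    fpc = (population_n - sampled_n) / (population_n - 1)  # 有限总体校正因子
    se = math.sqrt(p * (1 - p) / sampled_n * fpc)
    return max(0.0, p - z * se), min(1.0, p + z * se)
```

FAQ 的目标正是在远小于 N 的查询预算下,达到与这种均匀抽样相当甚至更窄的 CI 宽度。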


Graph相关(图学习|图神经网络|图优化等)(5篇)

【1】Context-Augmented Code Generation Using Programming Knowledge Graphs
标题:使用编程知识图的上下文增强代码生成
链接:https://arxiv.org/abs/2601.20810

作者:Shahd Seddik,Fahd Seddik,Iman Saberi,Fatemeh Fard,Minh Hieu Huynh,Patanamon Thongtanunam
摘要:大型语言模型(LLM)擅长代码生成,但在处理复杂问题时会遇到困难。检索增强生成(RAG)通过整合外部知识来缓解这个问题,但检索模型经常会错过相关的上下文,生成模型也会因不相关的数据而产生幻觉。我们提出了编程知识图(PKG),用于代码和文本的语义表示与细粒度检索。我们的方法通过树修剪提高检索精度,并通过整合非RAG解决方案的重排序机制减轻幻觉。将外部数据结构化为更细粒度的节点可以提高检索粒度。在HumanEval和MBPP上的评估显示,pass@1准确率最高提升20%,在MBPP上相对基线改善34%。我们的结果表明,我们提出的PKG方法与重排序器能有效解决复杂问题,同时对无需RAG即已正确的解决方案的负面影响降到最低。复现包发布于https://github.com/iamshahd/ProgrammingKnowledgeGraph
摘要:Large Language Models (LLMs) excel at code generation but struggle with complex problems. Retrieval-Augmented Generation (RAG) mitigates this issue by integrating external knowledge, yet retrieval models often miss relevant context, and generation models hallucinate with irrelevant data. We propose Programming Knowledge Graph (PKG) for semantic representation and fine-grained retrieval of code and text. Our approach enhances retrieval precision through tree pruning and mitigates hallucinations via a re-ranking mechanism that integrates non-RAG solutions. Structuring external data into finer-grained nodes improves retrieval granularity. Evaluations on HumanEval and MBPP show up to 20% pass@1 accuracy gains and a 34% improvement over baselines on MBPP. Our findings demonstrate that our proposed PKG approach along with re-ranker effectively address complex problems while maintaining minimal negative impact on solutions that are already correct without RAG. The replication package is published at https://github.com/iamshahd/ProgrammingKnowledgeGraph


【2】CCMamba: Selective State-Space Models for Higher-Order Graph Learning on Combinatorial Complexes
标题:CCMamba:用于组合复合体上高阶图学习的选择性状态空间模型
链接:https://arxiv.org/abs/2601.20518

作者:Jiawen Chen,Qi Shao,Mingtong Zhou,Duxin Chen,Wenwu Yu
摘要:拓扑深度学习已经兴起,用于对标准图神经网络无法捕获的、超越成对交互的高阶关系结构进行建模。尽管组合复形提供了一个统一的拓扑框架,但大多数现有的拓扑深度学习方法依赖于经由注意力机制的局部消息传递,这会产生二次复杂度且保持低维,限制了高阶复形中的可扩展性和秩感知信息聚合。我们提出了组合复形Mamba(CCMamba),这是首个用于在组合复形上学习的、基于Mamba的统一神经框架。CCMamba通过将多秩关联关系组织成由秩感知状态空间模型处理的结构化序列,将消息传递重新表述为选择性状态空间建模问题。这使得能够在线性时间内进行自适应、有方向和长距离的信息传播,而无需自注意力。我们进一步建立了理论分析:CCMamba消息传递的表达能力上界为1-Weisfeiler-Lehman检验。在图、超图和单纯复形基准上的实验表明,CCMamba始终优于现有方法,同时表现出更好的可扩展性和对深度的鲁棒性。
摘要:Topological deep learning has emerged for modeling higher-order relational structures beyond pairwise interactions that standard graph neural networks fail to capture. Although combinatorial complexes offer a unified topological framework, most existing topological deep learning methods rely on local message passing via attention mechanisms, which incur quadratic complexity and remain low-dimensional, limiting scalability and rank-aware information aggregation in higher-order complexes. We propose Combinatorial Complex Mamba (CCMamba), the first unified mamba-based neural framework for learning on combinatorial complexes. CCMamba reformulates message passing as a selective state-space modeling problem by organizing multi-rank incidence relations into structured sequences processed by rank-aware state-space models. This enables adaptive, directional, and long range information propagation in linear time without self attention. We further establish the theoretical analysis that the expressive power upper-bound of CCMamba message passing is the 1-Weisfeiler-Lehman test. Experiments on graph, hypergraph, and simplicial benchmarks demonstrate that CCMamba consistently outperforms existing methods while exhibiting improved scalability and robustness to depth.


【3】Graph-Structured Deep Learning Framework for Multi-task Contention Identification with High-dimensional Metrics
标题:用于使用多维数据库进行多任务竞争识别的图结构深度学习框架
链接:https://arxiv.org/abs/2601.20389

作者:Xiao Yang,Yinan Ni,Yuqi Tang,Zhimin Qiu,Chen Wang,Tingzhou Yuan
摘要:该研究解决了在高维系统环境中准确识别多任务竞争类型的挑战,并提出了一个统一的竞争分类框架,集成了表示转换,结构建模和任务解耦机制。该方法首先从高维度量序列中构造系统状态表示,应用非线性变换提取跨维动态特征,并在共享表示空间内集成资源利用率、调度行为和任务负载变化等多源信息。然后,它引入了一个基于图的建模机制,以捕捉指标之间的潜在依赖关系,使模型能够学习竞争传播模式和跨资源链接的结构干扰。在此基础上,设计了特定于任务的映射结构,对竞争类型之间的差异进行建模,增强了分类器区分多种竞争模式的能力。为了实现稳定的性能,该方法采用了自适应多任务损失加权策略,该策略平衡了共享特征学习与特定于任务的特征提取,并通过标准化的推理过程生成最终的竞争预测。在公开的系统跟踪数据集上进行的实验表明,该模型在准确率、召回率、精度和F1方面具有优势,对批量大小、训练样本规模和度量维度的敏感性分析进一步证实了该模型的稳定性和适用性。研究表明,基于高维度量的结构化表示和多任务分类可以显著提高竞争模式识别能力,为复杂计算环境下的性能管理提供可靠的技术途径。
摘要 :This study addresses the challenge of accurately identifying multi-task contention types in high-dimensional system environments and proposes a unified contention classification framework that integrates representation transformation, structural modeling, and a task decoupling mechanism. The method first constructs system state representations from high-dimensional metric sequences, applies nonlinear transformations to extract cross-dimensional dynamic features, and integrates multiple source information such as resource utilization, scheduling behavior, and task load variations within a shared representation space. It then introduces a graph-based modeling mechanism to capture latent dependencies among metrics, allowing the model to learn competitive propagation patterns and structural interference across resource links. On this basis, task-specific mapping structures are designed to model the differences among contention types and enhance the classifier's ability to distinguish multiple contention patterns. To achieve stable performance, the method employs an adaptive multi-task loss weighting strategy that balances shared feature learning with task-specific feature extraction and generates final contention predictions through a standardized inference process. Experiments conducted on a public system trace dataset demonstrate advantages in accuracy, recall, precision, and F1, and sensitivity analyses on batch size, training sample scale, and metric dimensionality further confirm the model's stability and applicability. The study shows that structured representations and multi-task classification based on high-dimensional metrics can significantly improve contention pattern recognition and offer a reliable technical approach for performance management in complex computing environments.


【4】Exact Graph Learning via Integer Programming
标题:通过整数规划进行精确图学习
链接:https://arxiv.org/abs/2601.20589

作者:Lucas Kook,Søren Wengel Mogensen
摘要:学习复杂系统中变量之间的依赖结构是医学、自然科学和社会科学的核心问题。这些结构可以自然地用图来表示,从数据中推断出这样的图的任务被称为图学习,或者如果图被赋予因果解释,则被称为因果发现。现有的方法通常依赖于对数据生成过程的限制性假设,采用贪婪的预言机算法,或解决图学习问题的近似公式。因此,它们要么对违反中心假设的行为敏感,要么无法保证全局最优解。我们通过引入基于非参数条件独立性测试和整数规划的非参数图学习框架来解决这些限制。我们将图学习问题转化为整数规划问题,并证明了解决整数规划问题为原始图学习问题提供了全局最优解。我们的方法利用图形分离标准的有效编码,使更大的图的准确恢复比以前可行。我们提供了一个开放的R包'Glip',支持学习(非循环)有向(混合)图和链图的实现。从所得到的输出可以计算相应的马尔可夫等价类或弱等价类的表示。从经验上讲,我们证明了我们的方法比其他现有的精确图学习程序更快,适用于大部分不同大小的实例和图形。GLIP还在模拟数据和基准数据集上实现了最先进的性能,跨越了所有上述类别的图形。
摘要:Learning the dependence structure among variables in complex systems is a central problem across medical, natural, and social sciences. These structures can be naturally represented by graphs, and the task of inferring such graphs from data is known as graph learning or as causal discovery if the graphs are given a causal interpretation. Existing approaches typically rely on restrictive assumptions about the data-generating process, employ greedy oracle algorithms, or solve approximate formulations of the graph learning problem. As a result, they are either sensitive to violations of central assumptions or fail to guarantee globally optimal solutions. We address these limitations by introducing a nonparametric graph learning framework based on nonparametric conditional independence testing and integer programming. We reformulate the graph learning problem as an integer-programming problem and prove that solving the integer-programming problem provides a globally optimal solution to the original graph learning problem. Our method leverages efficient encodings of graphical separation criteria, enabling the exact recovery of larger graphs than was previously feasible. We provide an implementation in the openly available R package 'glip' which supports learning (acyclic) directed (mixed) graphs and chain graphs. From the resulting output one can compute representations of the corresponding Markov equivalence classes or weak equivalence classes. Empirically, we demonstrate that our approach is faster than other existing exact graph learning procedures for a large fraction of instances and graphs of various sizes. GLIP also achieves state-of-the-art performance on simulated data and benchmark datasets across all aforementioned classes of graphs.


【5】MK-SGC-SC: Multiple Kernel guided Sparse Graph Construction in Spectral Clustering for Unsupervised Speaker Diarization
标题:MK-SRC-SC:用于无监督说话者二元化的谱簇中的多核引导稀疏图构建
链接:https://arxiv.org/abs/2601.19946

作者:Nikhil Raghav,Avisek Gupta,Swagatam Das,Md Sahidullah
备注:5 pages
摘要:说话人日志化的目的是将音频记录分割成与各个说话人相对应的区域。虽然无监督的说话人日志化具有内在的挑战性,但在没有预训练或弱监督的情况下识别说话人区域的前景激发了对聚类技术的研究。在这项工作中,我们分享了一个值得注意的观察:以原则性方式测量说话人嵌入的多核相似度,进而为谱聚类构建稀疏图,足以在完全无监督的环境中达到最先进的性能。具体来说,我们考虑四个多项式核和一个一阶反余弦核来度量说话人嵌入的相似度,并据此以原则性方式构建强调局部相似性的稀疏图。实验表明,所提方法在DIHARD-III、AMI和VoxConverse语料库的各种具有挑战性的环境中,在无监督说话人日志化上表现出色。为了鼓励进一步的研究,我们的实现可以在https://github.com/nikhilraghav29/MK-SGC-SC上获得。
摘要:Speaker diarization aims to segment audio recordings into regions corresponding to individual speakers. Although unsupervised speaker diarization is inherently challenging, the prospect of identifying speaker regions without pretraining or weak supervision motivates research on clustering techniques. In this work, we share the notable observation that measuring multiple kernel similarities of speaker embeddings to thereafter craft a sparse graph for spectral clustering in a principled manner is sufficient to achieve state-of-the-art performances in a fully unsupervised setting. Specifically, we consider four polynomial kernels and a degree one arccosine kernel to measure similarities in speaker embeddings, using which sparse graphs are constructed in a principled manner to emphasize local similarities. Experiments show the proposed approach excels in unsupervised speaker diarization over a variety of challenging environments in the DIHARD-III, AMI, and VoxConverse corpora. To encourage further research, our implementations are available at https://github.com/nikhilraghav29/MK-SGC-SC.
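摘要所述流程——用多个多项式核加一阶反余弦核度量嵌入相似度,再保留每行 top-k 邻居构造稀疏对称图——可以用如下草图示意。其中多项式核的阶数、核的简单平均以及 top-k 稀疏化细节均为示意性假设:

```python
import numpy as np

def mk_sparse_graph(X, degrees=(1, 2, 3, 4), k=5):
    """多核相似度 + top-k 稀疏化:返回可供谱聚类使用的稀疏对称亲和矩阵。
    degrees 取 1..4 为假设;一阶反余弦核采用其标准闭式。"""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    G = Xn @ Xn.T                                   # 余弦 Gram 矩阵
    kernels = [G ** d for d in degrees]             # 多项式核
    theta = np.arccos(np.clip(G, -1, 1))
    kernels.append((np.sin(theta) + (np.pi - theta) * G) / np.pi)  # 一阶反余弦核
    S = np.mean(kernels, axis=0)                    # 组合方式(简单平均)为示意
    np.fill_diagonal(S, 0.0)
    W = np.zeros_like(S)
    for i in range(len(S)):                         # 每行只保留 k 个最强的局部相似度
        nbrs = np.argsort(S[i])[-k:]
        W[i, nbrs] = S[i, nbrs]
    return np.maximum(W, W.T)                       # 对称化
```

所得稀疏图 W 可直接作为谱聚类的亲和矩阵,其特征向量嵌入上再做聚类即可划分说话人区域。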


Transformer(11篇)

【1】Exploring Transformer Placement in Variational Autoencoders for Tabular Data Generation
标题:探索变分自动编码器中的Transformer放置以生成表格数据
链接:https://arxiv.org/abs/2601.20854

作者:Aníbal Silva,Moisés Santos,André Restivo,Carlos Soares
摘要:表格数据仍然是生成模型的一个具有挑战性的领域。特别是,标准的变分自动编码器(VAE)架构,通常由多层感知器组成,难以对特征之间的关系进行建模,特别是在处理混合数据类型时。相比之下,Transformers通过它们的注意力机制,更适合捕捉复杂的特征交互。在本文中,我们实证调查的影响,将Transformers到不同的组件的VAE。我们对OpenML CC18套件中的57个数据集进行了实验,并得出了两个主要结论。首先,结果表明,定位Transformers,以利用潜在的和解码器表示导致保真度和多样性之间的权衡。其次,我们观察到Transformer的所有组件中的连续块之间的高度相似性。特别地,在解码器中,Transformer的输入和输出之间的关系近似线性。
摘要 :Tabular data remains a challenging domain for generative models. In particular, the standard Variational Autoencoder (VAE) architecture, typically composed of multilayer perceptrons, struggles to model relationships between features, especially when handling mixed data types. In contrast, Transformers, through their attention mechanism, are better suited for capturing complex feature interactions. In this paper, we empirically investigate the impact of integrating Transformers into different components of a VAE. We conduct experiments on 57 datasets from the OpenML CC18 suite and draw two main conclusions. First, results indicate that positioning Transformers to leverage latent and decoder representations leads to a trade-off between fidelity and diversity. Second, we observe a high similarity between consecutive blocks of a Transformer in all components. In particular, in the decoder, the relationship between the input and output of a Transformer is approximately linear.


【2】Dissecting Multimodal In-Context Learning: Modality Asymmetries and Circuit Dynamics in modern Transformers
标题:剖析多模式上下文学习:现代Transformer中的情态不对称和电路动力学
链接:https://arxiv.org/abs/2601.20796

作者:Yiran Huang,Karsten Roth,Quentin Bouniot,Wenjia Xu,Zeynep Akata
摘要:基于transformer的多模态大型语言模型通常具有上下文学习(ICL)能力。出于这种现象,我们问:Transformers如何学习关联信息跨模态从上下文的例子?我们通过对经过综合分类任务训练的小型Transformers进行控制实验来研究这个问题,从而实现对数据统计和模型架构的精确操作。我们首先回顾现代Transformers中单峰ICL的核心原理。虽然之前的几个研究结果重复,我们发现,旋转位置嵌入(RoPE)增加了ICL的数据复杂性阈值。扩展到多模态设置揭示了一个基本的学习不对称性:当对来自主要模态的高多样性数据进行预训练时,次要模态中令人惊讶的低数据复杂性足以使多模态ICL出现。机制分析表明,这两种设置都依赖于一种归纳式机制,该机制从匹配的上下文范例中复制标签;多模态训练可以细化并扩展这些电路。我们的研究结果为理解现代Transformers中的多模态ICL提供了一个机械基础,并为未来的调查引入了一个受控的测试平台。
摘要:Transformer-based multimodal large language models often exhibit in-context learning (ICL) abilities. Motivated by this phenomenon, we ask: how do transformers learn to associate information across modalities from in-context examples? We investigate this question through controlled experiments on small transformers trained on synthetic classification tasks, enabling precise manipulation of data statistics and model architecture. We begin by revisiting core principles of unimodal ICL in modern transformers. While several prior findings replicate, we find that Rotary Position Embeddings (RoPE) increases the data complexity threshold for ICL. Extending to the multimodal setting reveals a fundamental learning asymmetry: when pretrained on high-diversity data from a primary modality, surprisingly low data complexity in the secondary modality suffices for multimodal ICL to emerge. Mechanistic analysis shows that both settings rely on an induction-style mechanism that copies labels from matching in-context exemplars; multimodal training refines and extends these circuits across modalities. Our findings provide a mechanistic foundation for understanding multimodal ICL in modern transformers and introduce a controlled testbed for future investigation.


【3】Enterprise Resource Planning Using Multi-type Transformers in Ferro-Titanium Industry
标题:铁钛行业采用多型号Transformer的企业资源规划
链接:https://arxiv.org/abs/2601.20696

作者:Samira Yazdanpourmoghadam,Mahan Balal Pour,Vahid Partovi Nia
摘要:组合优化问题,如作业车间调度问题(JSP)和背包问题(KP),是运筹学、物流和企业资源规划(ERP)中的基本挑战。这些问题往往需要复杂的算法,才能在实际的时间限制内获得接近最优的解。深度学习的最新进展引入了基于Transformer的架构,作为传统启发式与元启发式算法的有前途的替代方案。我们利用多类型Transformer(MTT)架构,在一个统一框架中解决这些基准问题。我们在JSP和KP的标准基准数据集上进行了广泛的实验评估,证明MTT在不同规模的基准问题上都取得了有竞争力的表现。我们还在钛铁工业的实际应用中展示了多类型注意力的潜力。据我们所知,我们是第一个在实际制造中应用多类型Transformer的团队。
摘要:Combinatorial optimization problems such as the Job-Shop Scheduling Problem (JSP) and Knapsack Problem (KP) are fundamental challenges in operations research, logistics, and enterprise resource planning (ERP). These problems often require sophisticated algorithms to achieve near-optimal solutions within practical time constraints. Recent advances in deep learning have introduced transformer-based architectures as promising alternatives to traditional heuristics and metaheuristics. We leverage the Multi-Type Transformer (MTT) architecture to address these benchmarks in a unified framework. We present an extensive experimental evaluation across standard benchmark datasets for JSP and KP, demonstrating that MTT achieves competitive performance on different size of these benchmark problems. We showcase the potential of multi-type attention on a real application in Ferro-Titanium industry. To the best of our knowledge, we are the first to apply multi-type transformers in real manufacturing.


【4】AWGformer: Adaptive Wavelet-Guided Transformer for Multi-Resolution Time Series Forecasting
标题:AWGformer:用于多分辨率时间序列预测的自适应小波引导Transformer
链接:https://arxiv.org/abs/2601.20409

作者:Wei Li
备注:Accepted by ICASSP 2026
摘要:时间序列预测需要在多个时间尺度上捕获模式,同时保持计算效率。本文介绍了AWGformer,这是一种新型架构,将自适应小波分解与跨尺度注意力机制相结合,用于增强多变量时间序列预测。我们的方法包括:(1)自适应小波分解模块(AWDM),其基于信号特性动态地选择最佳小波基和分解级别;(2)跨尺度特征融合(CSFF)机制,其通过可学习的耦合矩阵捕获不同频带之间的相互作用;(3)频率感知多头注意(FAMA)模块,其根据注意头的频率选择性对它们进行加权;(4)分层预测网络(HPN),其在重建之前以多个分辨率生成预测。在基准数据集上进行的大量实验表明,AWGformer在最先进的方法上实现了显着的平均改进,对多尺度和非平稳时间序列特别有效。理论分析提供了收敛保证,并建立了我们的小波引导的注意力和经典的信号处理原则之间的联系。
摘要:Time series forecasting requires capturing patterns across multiple temporal scales while maintaining computational efficiency. This paper introduces AWGformer, a novel architecture that integrates adaptive wavelet decomposition with cross-scale attention mechanisms for enhanced multi-variate time series prediction. Our approach comprises: (1) an Adaptive Wavelet Decomposition Module (AWDM) that dynamically selects optimal wavelet bases and decomposition levels based on signal characteristics; (2) a Cross-Scale Feature Fusion (CSFF) mechanism that captures interactions between different frequency bands through learnable coupling matrices; (3) a Frequency-Aware Multi-Head Attention (FAMA) module that weights attention heads according to their frequency selectivity; (4) a Hierarchical Prediction Network (HPN) that generates forecasts at multiple resolutions before reconstruction. Extensive experiments on benchmark datasets demonstrate that AWGformer achieves significant average improvements over state-of-the-art methods, with particular effectiveness on multi-scale and non-stationary time series. Theoretical analysis provides convergence guarantees and establishes the connection between our wavelet-guided attention and classical signal processing principles.
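
AWGformer learns its wavelet bases and decomposition depth adaptively; the fixed Haar transform below is only a minimal sketch of the multiresolution split such models build on (function names are ours, not the paper's):

```python
def haar_step(x):
    """One level of the orthonormal Haar wavelet transform: returns
    (approximation, detail) coefficients, each half the length of x."""
    assert len(x) % 2 == 0, "length must be even"
    s = 2 ** -0.5
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail

def multiresolution(x, levels):
    """Decompose a series into per-scale detail bands plus a coarse trend."""
    bands = []
    for _ in range(levels):
        x, d = haar_step(x)
        bands.append(d)   # finest-to-coarsest detail bands
    bands.append(x)       # remaining low-frequency approximation
    return bands
```

Each detail band isolates variation at one temporal scale, which is the representation the cross-scale attention then operates on.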


【5】Unsupervised Anomaly Detection in Multi-Agent Trajectory Prediction via Transformer-Based Models
标题:基于Transformer模型的多智能体轨迹预测中的无监督异常检测
链接:https://arxiv.org/abs/2601.20367

作者:Qing Lyu,Zhe Fu,Alexandre Bayen
摘要:识别安全关键场景对于自动驾驶至关重要,但此类事件的罕见性使得监督标签变得不切实际。传统的基于规则的指标(如碰撞时间)过于简单,无法捕捉复杂的交互风险,现有方法缺乏系统的方法来验证统计异常是否真正反映了物理危险。为了解决这一差距,我们提出了一个无监督的异常检测框架的基础上,多代理Transformer模型正常驾驶和测量偏差,通过预测残差。提出了一种双重评估方案来评估检测稳定性和物理对齐:使用标准排名指标测量稳定性,其中Kendall秩相关系数捕获排名一致性,Jaccard指数捕获前K个选定项目的一致性;通过与已建立的替代安全措施(SSM)的相关性评估物理对齐。在NGSIM数据集上的实验证明了我们框架的有效性:我们表明,最大残差聚合器在保持稳定性的同时实现了最高的物理对齐。此外,我们的框架识别了碰撞时间和统计基线遗漏的388个独特异常,捕获了微妙的多智能体风险,如横向漂移下的反应性制动。检测到的异常进一步聚类为四种可解释的风险类型,为模拟和测试提供可操作的见解。
摘要:Identifying safety-critical scenarios is essential for autonomous driving, but the rarity of such events makes supervised labeling impractical. Traditional rule-based metrics like Time-to-Collision are too simplistic to capture complex interaction risks, and existing methods lack a systematic way to verify whether statistical anomalies truly reflect physical danger. To address this gap, we propose an unsupervised anomaly detection framework based on a multi-agent Transformer that models normal driving and measures deviations through prediction residuals. A dual evaluation scheme has been proposed to assess both detection stability and physical alignment: Stability is measured using standard ranking metrics in which Kendall Rank Correlation Coefficient captures rank agreement and Jaccard index captures the consistency of the top-K selected items; Physical alignment is assessed through correlations with established Surrogate Safety Measures (SSM). Experiments on the NGSIM dataset demonstrate our framework's effectiveness: We show that the maximum residual aggregator achieves the highest physical alignment while maintaining stability. Furthermore, our framework identifies 388 unique anomalies missed by Time-to-Collision and statistical baselines, capturing subtle multi-agent risks like reactive braking under lateral drift. The detected anomalies are further clustered into four interpretable risk types, offering actionable insights for simulation and testing.
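
The two stability metrics in the dual evaluation scheme are standard and easy to state precisely. A minimal sketch (tie correction omitted for brevity):

```python
def kendall_tau(a, b):
    """Kendall rank correlation between two score lists (no tie correction)."""
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def topk_jaccard(a, b, k):
    """Jaccard overlap of the indices of the top-k scores in two rankings."""
    top = lambda s: set(sorted(range(len(s)), key=lambda i: -s[i])[:k])
    sa, sb = top(a), top(b)
    return len(sa & sb) / len(sa | sb)
```

Kendall tau measures agreement over the full anomaly ranking, while top-K Jaccard focuses on whether the same candidates are flagged.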


【6】Memory Retrieval in Transformers: Insights from The Encoding Specificity Principle
标题:Transformer中的记忆检索:来自编码特异性原则的见解
链接:https://arxiv.org/abs/2601.20282

作者:Viet Hung Dinh,Ming Ding,Youyang Qu,Kanchana Thilakarathna
摘要:虽然面向大型语言模型(LLM)的可解释人工智能(XAI)仍是一个不断发展、存在许多未解问题的领域,但日益增加的监管压力激发了人们对其在确保透明度、问责制以及隐私保护的机器遗忘(machine unlearning)方面作用的兴趣。尽管XAI的最新进展提供了一些见解,但基于Transformer的LLM中注意力层的具体作用仍未得到充分探索。本研究借鉴心理学与计算心理语言学中将Transformer注意力与人类记忆中基于线索的提取相联系的先前研究,探讨由注意力层实例化的记忆机制。在这一视角下,查询(query)编码检索上下文,键(key)索引候选记忆痕迹,注意力权重量化线索与痕迹的相似性,值(value)携带编码的内容,共同构建先于并促进记忆检索的上下文表示。在编码特异性原则的指导下,我们假设检索初始阶段所用的线索被实例化为关键词,并为这一“关键词即线索”假设提供了多方面相互印证的证据。此外,我们在注意力层中分离出其激活选择性编码并促进检索上下文定义关键词的神经元。因此,这些关键词可以从被识别的神经元中提取,并进一步服务于机器遗忘等下游应用。
摘要:While explainable artificial intelligence (XAI) for large language models (LLMs) remains an evolving field with many unresolved questions, increasing regulatory pressures have spurred interest in its role in ensuring transparency, accountability, and privacy-preserving machine unlearning. Despite recent advances in XAI have provided some insights, the specific role of attention layers in transformer based LLMs remains underexplored. This study investigates the memory mechanisms instantiated by attention layers, drawing on prior research in psychology and computational psycholinguistics that links Transformer attention to cue based retrieval in human memory. In this view, queries encode the retrieval context, keys index candidate memory traces, attention weights quantify cue trace similarity, and values carry the encoded content, jointly enabling the construction of a context representation that precedes and facilitates memory retrieval. Guided by the Encoding Specificity Principle, we hypothesize that the cues used in the initial stage of retrieval are instantiated as keywords. We provide converging evidence for this keywords-as-cues hypothesis. In addition, we isolate neurons within attention layers whose activations selectively encode and facilitate the retrieval of context-defining keywords. Consequently, these keywords can be extracted from identified neurons and further contribute to downstream applications such as unlearning.
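
The cue-based retrieval reading maps directly onto scaled dot-product attention: the query is the retrieval cue, keys index memory traces, softmax similarities weight the traces, and the output blends their values. A minimal single-query sketch:

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention as cue-based retrieval: softmax over
    query-key similarity weights the memory traces, and the output is the
    similarity-weighted blend of their values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
```

When one key matches the cue much better than the rest, the output is essentially a copy of that trace's value, which is the retrieval behavior the paper studies.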


【7】C2:Cross learning module enhanced decision transformer with Constraint-aware loss for auto-bidding
标题:C2:交叉学习模块增强的决策Transformer,具有用于自动投标的约束感知损失
链接:https://arxiv.org/abs/2601.20257

作者:Jinren Ding,Xuejian Xu,Shen Jiang,Zhitong Hao,Jinhui Yang,Peng Jiang
摘要:决策Transformer(DT)通过捕获时间依赖性显示出生成式自动出价的前景,但受到两个关键限制:状态、动作和返回到去(RTG)序列之间的不充分的互相关建模,以及不加选择地学习最优/次优行为。为了解决这些问题,我们提出了C2,一个新的框架,增强DT与两个核心创新:(1)交叉学习块(CLB)通过交叉注意,以加强序列间的相关性建模;(2)约束感知损失(CL)纳入预算和每次获取成本(CPA)的限制,选择性学习的最佳轨迹。在AuctionNet数据集上进行的广泛离线评估表明,在不同的预算设置下,C2具有一致的性能增益(比最先进的GAVE高出3.23%);消融研究验证了CLB和CL的互补协同作用,证实了C2在自动投标方面的优势。用于复制我们的结果的代码可在https://github.com/Dingjinren/C2上获得。
摘要:Decision Transformer (DT) shows promise for generative auto-bidding by capturing temporal dependencies, but suffers from two critical limitations: insufficient cross-correlation modeling among state, action, and return-to-go (RTG) sequences, and indiscriminate learning of optimal/suboptimal behaviors. To address these, we propose C2, a novel framework enhancing DT with two core innovations: (1) a Cross Learning Block (CLB) via cross-attention to strengthen inter-sequence correlation modeling; (2) a Constraint-aware Loss (CL) incorporating budget and Cost-Per-Acquisition (CPA) constraints for selective learning of optimal trajectories. Extensive offline evaluations on the AuctionNet dataset demonstrate consistent performance gains (up to 3.23\% over state-of-the-art GAVE) across diverse budget settings; ablation studies verify the complementary synergy of CLB and CL, confirming C2's superiority in auto-bidding. The code for reproducing our results is available at: https://github.com/Dingjinren/C2.
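
The constraint-aware idea — learn less from trajectories that violate the budget or CPA constraints — can be sketched as a feasibility-based loss weight. The hinge penalties and `lam` below are our illustrative assumptions, not the paper's exact formulation:

```python
def constraint_aware_weight(budget_spent, budget, cpa, cpa_target, lam=1.0):
    """Down-weight learning on trajectories that violate the budget or CPA
    constraint, so training selectively imitates feasible behavior."""
    violation = max(0.0, budget_spent - budget) + max(0.0, cpa - cpa_target)
    return 1.0 / (1.0 + lam * violation)
```

Multiplying the per-trajectory imitation loss by this weight leaves feasible trajectories untouched and progressively discounts infeasible ones.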


【8】Semi-Supervised Masked Autoencoders: Unlocking Vision Transformer Potential with Limited Data
标题:半监督屏蔽自动编码器:利用有限数据释放视觉Transformer潜力
链接:https://arxiv.org/abs/2601.20072

作者:Atik Faysal,Mohammad Rostami,Reihaneh Gh. Roshan,Nikhil Muralidhar,Huaxia Wang
摘要:我们解决的挑战,训练Vision Transformers(ViTs)标记的数据是稀缺的,但未标记的数据是丰富的。我们提出了半监督掩蔽自动编码器(SSMAE),一个框架,共同优化掩蔽图像重建和分类使用未标记和标记的样本与动态选择的伪标签。SSMAE引入了一种验证驱动的门控机制,只有在模型实现可靠的、高置信度的预测后才激活伪标签,这些预测在同一图像的弱增强和强增强视图中是一致的,从而减少了确认偏差。在CIFAR-10和CIFAR-100上,SSMAE始终优于监督ViT和微调MAE,在低标签制度中收益最大(在CIFAR-10上,10%标签的ViT上增加了9.24%)。我们的研究结果表明,何时引入伪标签与如何生成伪标签对于数据高效的Transformer训练同样重要。代码可在https://github.com/atik666/ssmae上获得。
摘要:We address the challenge of training Vision Transformers (ViTs) when labeled data is scarce but unlabeled data is abundant. We propose Semi-Supervised Masked Autoencoder (SSMAE), a framework that jointly optimizes masked image reconstruction and classification using both unlabeled and labeled samples with dynamically selected pseudo-labels. SSMAE introduces a validation-driven gating mechanism that activates pseudo-labeling only after the model achieves reliable, high-confidence predictions that are consistent across both weakly and strongly augmented views of the same image, reducing confirmation bias. On CIFAR-10 and CIFAR-100, SSMAE consistently outperforms supervised ViT and fine-tuned MAE, with the largest gains in low-label regimes (+9.24% over ViT on CIFAR-10 with 10% labels). Our results demonstrate that when pseudo-labels are introduced is as important as how they are generated for data-efficient transformer training. Codes are available at https://github.com/atik666/ssmae.
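
The per-sample part of SSMAE's gating — accept a pseudo-label only when the prediction is confident and consistent across weak and strong views — can be sketched as below. The threshold and names are illustrative; the paper's gate is additionally validation-driven, activating pseudo-labeling only once the model is reliable:

```python
def gate_pseudo_labels(weak_probs, strong_probs, threshold=0.95):
    """Keep a pseudo-label only if the weak-view prediction is confident AND
    the weak and strong views agree on the argmax class.
    Returns a list of (sample_index, pseudo_label)."""
    selected = []
    for i, (w, s) in enumerate(zip(weak_probs, strong_probs)):
        w_label = max(range(len(w)), key=w.__getitem__)
        s_label = max(range(len(s)), key=s.__getitem__)
        if w[w_label] >= threshold and w_label == s_label:
            selected.append((i, w_label))
    return selected
```

Samples that fail either check contribute only to the reconstruction objective, which is how the method limits confirmation bias.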


【9】Latent Object Permanence: Topological Phase Transitions, Free-Energy Principles, and Renormalization Group Flows in Deep Transformer Manifolds
标题:潜在客体永久性:深度Transformer流形中的拓扑相变、自由能原理与重整化群流
链接:https://arxiv.org/abs/2601.19942

作者:Faruk Alpay,Bugra Kilictas
备注:12 pages, 3 figures
摘要:我们从几何与统计物理的视角研究深度Transformer语言模型中多步推理的涌现。将隐状态轨迹视为隐式黎曼流形上的流,我们分析激活的逐层协方差谱$C^{(\ell)}=\mathbb{E}[h^{(\ell)}h^{(\ell)\top}]$,并追踪其相对于随机矩阵主体谱的偏差。在1.5B至30B的模型规模上,我们观察到与相变一致的有效维数急剧下降:基于稀疏性/局域化的序参量$Ω(h)=1-\|h\|_1/(\sqrt{d}\,\|h\|_2)$在足够大的模型中于临界归一化深度$γ_c\approx 0.42$附近表现出不连续性。我们将前向传递形式化为离散的粗粒化映射,并将稳定“概念盆地”的出现与这种类重整化动力学的不动点联系起来。由此产生的低熵区域以谱尾坍缩为特征,并伴随表示空间中瞬态、可复用的类对象结构的形成,我们称之为瞬态类对象(TCO)。我们给出了连接逻辑可分性与谱衰减的理论条件,并通过对多个开放权重模型家族的逐层探测验证了所预测的特征。
摘要:We study the emergence of multi-step reasoning in deep Transformer language models through a geometric and statistical-physics lens. Treating the hidden-state trajectory as a flow on an implicit Riemannian manifold, we analyze the layerwise covariance spectrum of activations, where $C^{(\ell)}=\mathbb{E}[h^{(\ell)}h^{(\ell)\top}]$, and track deviations from a random-matrix bulk. Across model scales (1.5B--30B), we observe a sharp reduction in effective dimensionality consistent with a phase transition: an order parameter based on sparsity/localization, $Ω(h)=1-\|h\|_1/(\sqrt{d}\|h\|_2)$, exhibits a discontinuity near a critical normalized depth $γ_c\approx 0.42$ in sufficiently large models. We formalize the forward pass as a discrete coarse-graining map and relate the appearance of stable "concept basins" to fixed points of this renormalization-like dynamics. The resulting low-entropy regime is characterized by a spectral tail collapse and by the formation of transient, reusable object-like structures in representation space, which we call Transient Class Objects (TCOs). We provide theoretical conditions connecting logical separability to spectral decay and validate the predicted signatures with layerwise probes on multiple open-weight model families.
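
The order parameter is straightforward to compute for any hidden vector; a direct transcription of the formula (note it is 0 for a dense vector of uniform magnitudes and grows toward $1-1/\sqrt{d}$ as the vector localizes):

```python
import math

def omega(h):
    """Sparsity/localization order parameter:
    Omega(h) = 1 - ||h||_1 / (sqrt(d) * ||h||_2)."""
    d = len(h)
    l1 = sum(abs(v) for v in h)
    l2 = math.sqrt(sum(v * v for v in h))
    return 1.0 - l1 / (math.sqrt(d) * l2)
```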


【10】GTAC: A Generative Transformer for Approximate Circuits
标题:GTAC:用于近似电路的生成式Transformer
链接:https://arxiv.org/abs/2601.19906

作者:Jingxin Wang,Shitong Guo,Ruicheng Dai,Wenhui Liang,Ruogu Ding,Xin Ning,Weikang Qian
摘要:针对容错应用,近似电路引入受控误差以显著提高电路的性能、功率和面积(PPA)。在这项工作中,我们介绍了GTAC,一种新的基于生成变换器的模型,用于产生近似电路。通过利用近似计算和AI驱动的EDA原理,我们的模型创新地将错误阈值集成到设计过程中。实验结果表明,与最先进的方法相比,GTAC在错误率约束下进一步减少了6.4%的面积,同时速度提高了4.3倍。
摘要:Targeting error-tolerant applications, approximate circuits introduce controlled errors to significantly improve performance, power, and area (PPA) of circuits. In this work, we introduce GTAC, a novel generative Transformer-based model for producing approximate circuits. By leveraging principles of approximate computing and AI-driven EDA, our model innovatively integrates error thresholds into the design process. Experimental results show that compared with a state-of-the-art method, GTAC further reduces 6.4% area under the error rate constraint, while being 4.3x faster.


【11】The Sound of Noise: Leveraging the Inductive Bias of Pre-trained Audio Transformers for Glitch Identification in LIGO
标题:噪音之声:利用预训练音频Transformer的归纳偏差在LIGO中进行毛刺识别
链接:https://arxiv.org/abs/2601.20034

作者:Suyash Deshmukh,Chayan Chatterjee,Abigail Petulante,Tabata Aira Ferreira,Karan Jani
摘要:瞬态噪声伪影(即毛刺)从根本上限制了引力波(GW)干涉仪的灵敏度,并可能模拟真实的天体物理信号,尤其是短持续时间的中等质量黑洞(IMBH)并合。目前的毛刺分类方法(如Gravity Spy)依赖于使用标记数据集从头训练的监督模型。这些方法受到显著的“标签瓶颈”的影响:需要大量经专家标注的数据集才能实现高精度,并且往往难以泛化到观测运行中遇到的新毛刺形态或奇异GW信号。在这项工作中,我们提出了一种新的跨域框架,通过音频处理的视角来处理GW应变数据。我们利用在大规模音频数据集上预训练的音频频谱图Transformer(AST),并将其适配到GW领域。我们的方法不是从头学习时频特征,而是利用预训练音频模型固有的强归纳偏差,将自然声音的学习表示迁移到探测器噪声和GW信号(包括IMBH)的表征中。我们通过分析LIGO探测器第三次(O3)和第四次(O4)观测运行的应变数据来验证这种方法。我们使用t-分布随机邻居嵌入(t-SNE)这一无监督聚类技术来可视化信号和毛刺的AST嵌入,揭示出与独立验证的Gravity Spy毛刺类别高度吻合的、彼此分离良好的簇。我们的结果表明,与传统监督技术相比,来自音频预训练的归纳偏差带来了更优的特征提取,为在下一代探测器时代发现新的异常瞬态并对复杂噪声伪影进行分类提供了一条稳健、数据高效的途径。
摘要:Transient noise artifacts, or glitches, fundamentally limit the sensitivity of gravitational-wave (GW) interferometers and can mimic true astrophysical signals, particularly the short-duration intermediate-mass black hole (IMBH) mergers. Current glitch classification methods, such as Gravity Spy, rely on supervised models trained from scratch using labeled datasets. These approaches suffer from a significant ``label bottleneck," requiring massive, expertly annotated datasets to achieve high accuracy and often struggling to generalize to new glitch morphologies or exotic GW signals encountered in observing runs. In this work, we present a novel cross-domain framework that treats GW strain data through the lens of audio processing. We utilize the Audio Spectrogram Transformer (AST), a model pre-trained on large-scale audio datasets, and adapt it to the GW domain. Instead of learning time-frequency features from scratch, our method exploits the strong inductive bias inherent in pre-trained audio models, transferring learned representations of natural sound to the characterization of detector noise and GW signals, including IMBHs. We validate this approach by analyzing strain data from the third (O3) and fourth (O4) observing runs of the LIGO detectors. We used t-Distributed Stochastic Neighbor Embedding (t-SNE), an unsupervised clustering technique, to visualize the AST-derived embeddings of signals and glitches, revealing well-separated groups that align closely with independently validated Gravity Spy glitch classes. Our results indicate that the inductive bias from audio pre-training allows superior feature extraction compared to traditional supervised techniques, offering a robust, data-efficient pathway for discovering new, anomalous transients, and classifying complex noise artifacts in the era of next-generation detectors.


GAN|对抗|攻击|生成相关(2篇)

【1】SemBind: Binding Diffusion Watermarks to Semantics Against Black-Box Forgery Attacks
标题:SemBind:将扩散水印绑定到语义上以防止黑匣子伪造攻击
链接:https://arxiv.org/abs/2601.20310

作者:Xin Zhang,Zijin Yang,Kejiang Chen,Linfeng Ma,Weiming Zhang,Nenghai Yu
摘要:基于潜变量的水印被集成到潜在扩散模型(LDM)的生成过程中,简化了对生成图像的检测与溯源。然而,最近的黑盒伪造攻击(攻击者只需至少一张带水印的图像以及对提供商模型的黑盒访问)可以将提供商的水印嵌入到并非由该提供商生成的图像中,对来源追溯和信任构成巨大风险。我们提出SemBind,这是第一个面向基于潜变量水印的防御框架,它通过学习得到的语义掩码器将潜变量信号绑定到图像语义,从而抵御黑盒伪造。通过对比学习训练,掩码器对同一提示产生近似不变的码,对不同提示产生近似正交的码;这些码经重塑和置换后,在应用任何标准基于潜变量的水印之前对目标潜变量进行调制。SemBind与现有基于潜变量的水印方案普遍兼容,并使图像质量基本保持不变,同时一个简单的掩码比例参数提供了防伪强度与鲁棒性之间可调的权衡。在四种主流的基于潜变量的水印方法上,启用SemBind的防伪变体显著降低了黑盒伪造下的错误接受率,同时提供了可控的鲁棒性-安全性平衡。
摘要 :Latent-based watermarks, integrated into the generation process of latent diffusion models (LDMs), simplify detection and attribution of generated images. However, recent black-box forgery attacks, where an attacker needs at least one watermarked image and black-box access to the provider's model, can embed the provider's watermark into images not produced by the provider, posing outsized risk to provenance and trust. We propose SemBind, the first defense framework for latent-based watermarks that resists black-box forgery by binding latent signals to image semantics via a learned semantic masker. Trained with contrastive learning, the masker yields near-invariant codes for the same prompt and near-orthogonal codes across prompts; these codes are reshaped and permuted to modulate the target latent before any standard latent-based watermark. SemBind is generally compatible with existing latent-based watermarking schemes and keeps image quality essentially unchanged, while a simple mask-ratio parameter offers a tunable trade-off between anti-forgery strength and robustness. Across four mainstream latent-based watermark methods, our SemBind-enabled anti-forgery variants markedly reduce false acceptance under black-box forgery while providing a controllable robustness-security balance.


【2】One Word is Enough: Minimal Adversarial Perturbations for Neural Text Ranking
标题:一个词就足够了:神经文本排名的最小对抗扰动
链接:https://arxiv.org/abs/2601.20283

作者:Tanmay Karmakar,Sourav Saha,Debapriyo Majumdar,Surjyanee Halder
备注:To appear at ECIR 2026
摘要:神经排序模型(NRM)具有很强的检索效率,但先前的工作表明它们容易受到对抗性扰动的影响。我们重新审视这个鲁棒性问题,用一个最小的,查询感知的攻击,通过插入或替换一个单一的,语义对齐的词-查询中心来提升目标文档。我们研究了启发式和梯度引导的变体,包括识别有影响力的插入点的白盒方法。在使用BERT和monoT 5重新排名器的TREC-DL 2019/2020上,我们的单字攻击成功率高达91%,同时平均修改每个文档不到两个标记,在可比的白盒设置下,以更少的编辑实现竞争性排名和分数提升,以确保对PRADA的公平评估。我们还引入了新的诊断指标来分析攻击敏感性,而不仅仅是聚合成功率。我们的分析揭示了一个Goldilocks区域,在这个区域中,中等排名的文档最容易受到攻击。这些发现证明了实际风险,并激发了未来对强大神经排名的防御。
摘要:Neural ranking models (NRMs) achieve strong retrieval effectiveness, yet prior work has shown they are vulnerable to adversarial perturbations. We revisit this robustness question with a minimal, query-aware attack that promotes a target document by inserting or substituting a single, semantically aligned word - the query center. We study heuristic and gradient-guided variants, including a white-box method that identifies influential insertion points. On TREC-DL 2019/2020 with BERT and monoT5 re-rankers, our single-word attacks achieve up to 91% success while modifying fewer than two tokens per document on average, achieving competitive rank and score boosts with far fewer edits under a comparable white-box setup to ensure fair evaluation against PRADA. We also introduce new diagnostic metrics to analyze attack sensitivity beyond aggregate success rates. Our analysis reveals a Goldilocks zone in which mid-ranked documents are most vulnerable. These findings demonstrate practical risks and motivate future defenses for robust neural ranking.
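
In black-box form, the minimal single-word attack reduces to trying each insertion position and keeping the best-scoring candidate; the paper's white-box variant instead uses gradients to find influential positions. An illustrative greedy sketch with a stand-in `score` function (our assumption, standing in for the re-ranker):

```python
def best_single_insertion(tokens, word, score):
    """Greedy single-word attack: try inserting `word` at every position and
    keep the insertion that maximizes the (black-box) ranker's score."""
    best, best_score = list(tokens), score(tokens)
    for i in range(len(tokens) + 1):
        candidate = tokens[:i] + [word] + tokens[i:]
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score
```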


半/弱/无/有监督|不确定性|主动学习(10篇)

【1】Reinforcement Learning via Self-Distillation
标题:自蒸馏强化学习
链接:https://arxiv.org/abs/2601.20802

作者:Jonas Hübotter,Frederike Lübeck,Lejs Behric,Anton Baumann,Marco Bagatella,Daniel Marta,Ido Hakimi,Idan Shenfeld,Thomas Kleine Buening,Carlos Guestrin,Andreas Krause
摘要:大型语言模型越来越多地在代码和数学等可验证领域中使用强化学习进行后训练。然而,目前的可验证奖励强化学习(RLVR)方法只能从每次尝试的标量结果奖励中学习,从而造成严重的信用分配瓶颈。许多可验证的环境实际上提供了丰富的文本反馈,例如运行时错误或评判模型的评估,这些反馈解释了尝试失败的原因。我们将这种设置形式化为具有丰富反馈的强化学习,并引入自蒸馏策略优化(SDPO),它将词元化的反馈转换为密集的学习信号,而无需任何外部教师或显式奖励模型。SDPO将以反馈为条件的当前模型视为自教师,并将其以反馈为依据的下一词元预测蒸馏回策略中。通过这种方式,SDPO利用了模型在上下文中回顾性识别自身错误的能力。在科学推理、工具使用以及LiveCodeBench v6上的竞争性编程任务中,SDPO相比强RLVR基线提高了样本效率和最终准确率。值得注意的是,通过将成功的推出(rollout)用作失败尝试的隐式反馈,SDPO在仅返回标量反馈的标准RLVR环境中也优于基线。最后,在测试时将SDPO应用于单个问题,可以加速困难二元奖励任务上的解发现,以少3倍的尝试次数达到与best-of-k采样或多轮对话相同的发现概率。
摘要:Large language models are increasingly post-trained with reinforcement learning in verifiable domains such as code and math. Yet, current methods for reinforcement learning with verifiable rewards (RLVR) learn only from a scalar outcome reward per attempt, creating a severe credit-assignment bottleneck. Many verifiable environments actually provide rich textual feedback, such as runtime errors or judge evaluations, that explain why an attempt failed. We formalize this setting as reinforcement learning with rich feedback and introduce Self-Distillation Policy Optimization (SDPO), which converts tokenized feedback into a dense learning signal without any external teacher or explicit reward model. SDPO treats the current model conditioned on feedback as a self-teacher and distills its feedback-informed next-token predictions back into the policy. In this way, SDPO leverages the model's ability to retrospectively identify its own mistakes in-context. Across scientific reasoning, tool use, and competitive programming on LiveCodeBench v6, SDPO improves sample efficiency and final accuracy over strong RLVR baselines. Notably, SDPO also outperforms baselines in standard RLVR environments that only return scalar feedback by using successful rollouts as implicit feedback for failed attempts. Finally, applying SDPO to individual questions at test time accelerates discovery on difficult binary-reward tasks, achieving the same discovery probability as best-of-k sampling or multi-turn conversations with 3x fewer attempts.
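
The core self-distillation idea — pull the policy toward the feedback-conditioned self-teacher — can be illustrated in distribution space. The sketch below mixes distributions instead of taking gradient steps on model parameters, so it is an analogy for SDPO rather than the method itself; by convexity of KL in its second argument, the distillation loss can only decrease:

```python
import math

def kl(p, q):
    """KL divergence between two categorical distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_toward_teacher(policy, teacher, step=0.5):
    """Move the policy distribution toward the feedback-conditioned
    self-teacher's next-token distribution."""
    return [(1 - step) * q + step * p for q, p in zip(policy, teacher)]
```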


【2】Active Learning for Decision Trees with Provable Guarantees
标题:具有可证明保证的决策树的主动学习
链接:https://arxiv.org/abs/2601.20775

作者:Arshia Soltani Moakhar,Tanapoom Laoaron,Faraz Ghahremani,Kiarash Banihashem,MohammadTaghi Hajiaghayi
备注:10 pages, 43 pages with appendix, ICLR 2026, Conference URL: https://openreview.net/forum?id=NOkjJPJIit
摘要:本文推进了以决策树作为二分类器时主动学习标签复杂度的理论理解。我们做出两个主要贡献。首先,我们首次分析了决策树的分歧系数(disagreement coefficient),这是决定主动学习标签复杂度的关键参数。我们的分析在实现多对数标签复杂度所需的两个自然假设下成立:(i)每条从根到叶的路径查询互不相同的特征维度;(ii)输入数据具有规则的网格状结构。我们证明这些假设是必不可少的,因为放松它们会导致多项式级的标签复杂度。其次,我们提出了第一个实现乘性误差保证的通用二分类主动学习算法,可产生$(1+ε)$近似分类器。结合这些结果,我们设计了一个决策树主动学习算法,在上述假设下仅使用关于数据集规模的多对数次标签查询。最后,我们建立了标签复杂度的下界,表明我们的算法对误差容限$ε$的依赖接近最优。
摘要:This paper advances the theoretical understanding of active learning label complexity for decision trees as binary classifiers. We make two main contributions. First, we provide the first analysis of the disagreement coefficient for decision trees-a key parameter governing active learning label complexity. Our analysis holds under two natural assumptions required for achieving polylogarithmic label complexity, (i) each root-to-leaf path queries distinct feature dimensions, and (ii) the input data has a regular, grid-like structure. We show these assumptions are essential, as relaxing them leads to polynomial label complexity. Second, we present the first general active learning algorithm for binary classification that achieves a multiplicative error guarantee, producing a $(1+ε)$-approximate classifier. By combining these results, we design an active learning algorithm for decision trees that uses only a polylogarithmic number of label queries in the dataset size, under the stated assumptions. Finally, we establish a label complexity lower bound, showing our algorithm's dependence on the error tolerance $ε$ is close to optimal.


【3】Smoothing the Black-Box: Signed-Distance Supervision for Black-Box Model Copying
标题:平滑黑盒:用于黑盒模型复制的符号距离监督
链接:https://arxiv.org/abs/2601.20773

作者:Rubén Jiménez,Oriol Pujol
备注:27 pages
摘要:部署的机器学习系统必须随着数据、架构和法规的变化而不断发展,通常无法访问原始训练数据或模型内部。在这种情况下,黑盒复制提供了一种实用的重构机制,即通过仅从输入输出查询中学习副本来升级遗留模型。当仅限于硬标签输出时,复制变成了逐点查询的不连续表面重建问题,严重限制了有效恢复边界几何形状的能力。我们提出了一个基于距离的复制(蒸馏)框架,取代硬标签的监督与签署的距离教师的决策边界,复制转化为一个光滑的回归问题,利用当地的几何。我们开发了一个$α$-支配的平滑和正则化计划与Hölder/Lipschitz控制诱导的目标表面,并引入两个模型无关的算法来估计符号距离下的标签只访问。在合成问题和UCI基准测试上的实验表明,在硬标签基线上,保真度和泛化精度得到了一致的提高,同时使距离输出成为黑盒副本的不确定性相关信号。
摘要:Deployed machine learning systems must continuously evolve as data, architectures, and regulations change, often without access to original training data or model internals. In such settings, black-box copying provides a practical refactoring mechanism, i.e. upgrading legacy models by learning replicas from input-output queries alone. When restricted to hard-label outputs, copying turns into a discontinuous surface reconstruction problem from pointwise queries, severely limiting the ability to recover boundary geometry efficiently. We propose a distance-based copying (distillation) framework that replaces hard-label supervision with signed distances to the teacher's decision boundary, converting copying into a smooth regression problem that exploits local geometry. We develop an $α$-governed smoothing and regularization scheme with Hölder/Lipschitz control over the induced target surface, and introduce two model-agnostic algorithms to estimate signed distances under label-only access. Experiments on synthetic problems and UCI benchmarks show consistent improvements in fidelity and generalization accuracy over hard-label baselines, while enabling distance outputs as uncertainty-related signals for black-box replicas.
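
Under label-only access, a signed distance to the teacher's boundary can be estimated by bisection along a ray, as sketched below. This generic estimator is our illustration of the idea; the paper proposes its own two model-agnostic algorithms:

```python
def signed_distance(classify, x, direction, hi=4.0, iters=50):
    """Estimate the signed distance from x to the teacher's decision boundary
    along a unit direction, using only hard-label queries (bisection).
    Sign convention: positive if the teacher labels x as 1."""
    y0 = classify(x)
    ray = lambda t: [xi + t * di for xi, di in zip(x, direction)]
    if classify(ray(hi)) == y0:
        return None  # no label flip within the search interval
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if classify(ray(mid)) == y0:
            lo = mid
        else:
            hi = mid
    dist = (lo + hi) / 2.0
    return dist if y0 == 1 else -dist
```

Regressing the copy onto such signed distances, rather than onto hard labels, is what turns copying into the smooth surface-fitting problem described in the abstract.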


【4】Supervised Guidance Training for Infinite-Dimensional Diffusion Models
标题:无限维扩散模型的监督引导训练
链接:https://arxiv.org/abs/2601.20756

作者:Elizabeth L. Baker,Alexander Denker,Jes Frellsen
摘要:分数为基础的扩散模型最近已扩展到无限维函数空间,与用途,如反问题所产生的偏微分方程。在逆问题的贝叶斯公式中,目标是从通过对噪声观测进行先验条件化而获得的函数的后验分布中进行采样。虽然扩散模型在函数空间中提供了表达先验,但将其调节为从后验中采样的理论仍然是开放的。我们解决这个问题,假设先验位于Cameron-Martin空间中,或者相对于高斯测度是绝对连续的。我们证明了该模型可以使用Doob的$h$-变换的无限维扩展条件,并且条件得分分解成无条件得分和指导项。由于指导项是棘手的,我们提出了一个模拟免费的分数匹配目标(称为监督指导训练),使有效和稳定的后验采样。我们用数值例子说明了函数空间中贝叶斯反问题的理论。总之,我们的工作提供了第一个函数空间方法来微调训练的扩散模型,以准确地从后验样本。
摘要:Score-based diffusion models have recently been extended to infinite-dimensional function spaces, with uses such as inverse problems arising from partial differential equations. In the Bayesian formulation of inverse problems, the aim is to sample from a posterior distribution over functions obtained by conditioning a prior on noisy observations. While diffusion models provide expressive priors in function space, the theory of conditioning them to sample from the posterior remains open. We address this, assuming that either the prior lies in the Cameron-Martin space, or is absolutely continuous with respect to a Gaussian measure. We prove that the models can be conditioned using an infinite-dimensional extension of Doob's $h$-transform, and that the conditional score decomposes into an unconditional score and a guidance term. As the guidance term is intractable, we propose a simulation-free score matching objective (called Supervised Guidance Training) enabling efficient and stable posterior sampling. We illustrate the theory with numerical examples on Bayesian inverse problems in function spaces. In summary, our work offers the first function-space method for fine-tuning trained diffusion models to accurately sample from a posterior.
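
In finite dimensions, the score decomposition referred to above takes the familiar Doob $h$-transform form (written here for a terminal observation $y$; the paper's contribution is extending this rigorously to infinite-dimensional function spaces):

```latex
\nabla_{x_t}\log p_t(x_t \mid y)
  = \underbrace{\nabla_{x_t}\log p_t(x_t)}_{\text{unconditional score}}
  + \underbrace{\nabla_{x_t}\log h_t(x_t)}_{\text{guidance term}},
\qquad
h_t(x_t) := \mathbb{E}\!\left[\,p(y \mid x_0)\mid x_t\,\right].
```

The guidance term $\nabla\log h_t$ is the intractable piece that Supervised Guidance Training learns with a simulation-free score-matching objective.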


【5】Deep Semi-Supervised Survival Analysis for Predicting Cancer Prognosis
标题:预测癌症预后的深度半监督生存分析
链接:https://arxiv.org/abs/2601.20729

作者:Anchen Sun,Zhibin Chen,Xiaodong Cai
摘要:Cox比例风险(PH)模型在生存分析中得到广泛应用。最近,基于人工神经网络(ANN)的Cox-PH模型已经开发出来。然而,用高维特征训练这些Cox模型通常需要大量包含事件发生时间信息的标记样本。用于训练的标记数据的有限可用性通常限制了基于ANN的Cox模型的性能。为了解决这个问题,我们采用了深度半监督学习(DSSL)方法来开发基于均值教师(MT)框架的单模态和多模态基于ANN的Cox模型,该框架利用标记和未标记的数据进行训练。我们应用我们的模型Cox-MT,使用癌症基因组图谱(TCGA)的数据预测几种类型癌症的预后。我们的单模态Cox-MT模型,利用TCGA RNA-seq数据或整个载玻片图像,使用所考虑的四种癌症的相同数据集,显著优于现有的基于人工神经网络的Cox模型Cox-nnet。随着未标记样本数量的增加,Cox-MT的性能在给定的标记数据集上显著提高。此外,我们的多模态Cox-MT模型表现出比单模态模型更好的性能。总之,Cox-MT模型有效地利用了标记和未标记的数据,与仅在标记数据上训练的现有基于ANN的Cox模型相比,显著提高了预测准确性。
摘要:The Cox Proportional Hazards (PH) model is widely used in survival analysis. Recently, artificial neural network (ANN)-based Cox-PH models have been developed. However, training these Cox models with high-dimensional features typically requires a substantial number of labeled samples containing information about time-to-event. The limited availability of labeled data for training often constrains the performance of ANN-based Cox models. To address this issue, we employed a deep semi-supervised learning (DSSL) approach to develop single- and multi-modal ANN-based Cox models based on the Mean Teacher (MT) framework, which utilizes both labeled and unlabeled data for training. We applied our model, named Cox-MT, to predict the prognosis of several types of cancer using data from The Cancer Genome Atlas (TCGA). Our single-modal Cox-MT models, utilizing TCGA RNA-seq data or whole slide images, significantly outperformed the existing ANN-based Cox model, Cox-nnet, using the same data set across four types of cancer considered. As the number of unlabeled samples increased, the performance of Cox-MT significantly improved with a given set of labeled data. Furthermore, our multi-modal Cox-MT model demonstrated considerably better performance than the single-modal model. In summary, the Cox-MT model effectively leverages both labeled and unlabeled data to significantly enhance prediction accuracy compared to existing ANN-based Cox models trained solely on labeled data.
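
The Mean Teacher framework at the heart of Cox-MT maintains the teacher as an exponential moving average (EMA) of the student; unlabeled samples contribute through a consistency loss between the two. A minimal sketch with flat weight lists (the decay value is a typical choice, not taken from the paper):

```python
def ema_update(teacher_weights, student_weights, decay=0.99):
    """Mean Teacher update: the teacher is an exponential moving average of
    the student, t <- decay * t + (1 - decay) * s for each weight."""
    return [decay * t + (1 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]
```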


【6】MuRAL-CPD: Active Learning for Multiresolution Change Point Detection
标题:MuRAL-CPD:用于多分辨率变化点检测的主动学习
链接:https://arxiv.org/abs/2601.20686

作者:Stefano Bertolasi,Diego Carrera,Diego Stucchi,Pasqualina Fragneto,Luigi Amedeo Bianchi
备注:Presented at 2025 IEEE International Conference on Data Mining (ICDM), to appear in the Proceedings
摘要:变点检测(CPD)是时间序列分析中的一项关键任务,旨在识别底层数据生成过程发生变化的时刻。传统的CPD方法通常依赖于无监督的技术,缺乏对特定于任务的变化定义的适应性,并且不能从用户知识中受益。为了解决这些限制,我们提出了MuRAL-CPD,一种新的半监督方法,将主动学习集成到多分辨率CPD算法中。MuRAL-CPD利用基于小波的多分辨率分解来检测多个时间尺度上的变化,并结合用户反馈来迭代优化关键超参数。这种交互使模型能够将其变更概念与用户的变更概念保持一致,从而提高准确性和可解释性。我们在几个真实世界数据集上的实验结果显示了MuRAL-CPD对最先进方法的有效性,特别是在最小监督可用的情况下。
摘要 :Change Point Detection (CPD) is a critical task in time series analysis, aiming to identify moments when the underlying data-generating process shifts. Traditional CPD methods often rely on unsupervised techniques, which lack adaptability to task-specific definitions of change and cannot benefit from user knowledge. To address these limitations, we propose MuRAL-CPD, a novel semi-supervised method that integrates active learning into a multiresolution CPD algorithm. MuRAL-CPD leverages a wavelet-based multiresolution decomposition to detect changes across multiple temporal scales and incorporates user feedback to iteratively optimize key hyperparameters. This interaction enables the model to align its notion of change with that of the user, improving both accuracy and interpretability. Our experimental results on several real-world datasets show the effectiveness of MuRAL-CPD against state-of-the-art methods, particularly in scenarios where minimal supervision is available.


【7】Unsupervised Ensemble Learning Through Deep Energy-based Models
标题:通过基于深度能量的模型进行无监督集体学习
链接:https://arxiv.org/abs/2601.20556

作者:Ariel Maymon,Yanir Buznah,Uri Shaham
备注:Accepted to AISTATS 2026. 29 pages, 13 figures. Code available at: https://github.com/shaham-lab/deem
摘要:无监督集成学习的出现是为了解决在不访问地面真值标签或额外数据的情况下组合多个学习者的预测的挑战。这种模式在评估单个分类器性能或理解其优势由于信息有限而具有挑战性的情况下至关重要。我们提出了一种新的基于深度能量的方法,仅使用单个学习者的预测来构建准确的元学习者,可能能够捕获它们之间的复杂依赖结构。我们的方法不需要标记的数据,学习者的功能,或特定于问题的信息,并有理论上的保证,当学习者是有条件的独立。我们在各种集成场景中展示了卓越的性能,包括具有挑战性的专家设置混合。我们的实验跨越标准集成数据集和策划数据集,旨在测试模型如何融合来自多个来源的专业知识。这些结果强调了无监督集成学习利用集体智慧的潜力,特别是在数据稀缺或隐私敏感的环境中。
摘要:Unsupervised ensemble learning emerged to address the challenge of combining multiple learners' predictions without access to ground truth labels or additional data. This paradigm is crucial in scenarios where evaluating individual classifier performance or understanding their strengths is challenging due to limited information. We propose a novel deep energy-based method for constructing an accurate meta-learner using only the predictions of individual learners, potentially capable of capturing complex dependence structures between them. Our approach requires no labeled data, learner features, or problem-specific information, and has theoretical guarantees for when learners are conditionally independent. We demonstrate superior performance across diverse ensemble scenarios, including challenging mixture of experts settings. Our experiments span standard ensemble datasets and curated datasets designed to test how the model fuses expertise from multiple sources. These results highlight the potential of unsupervised ensemble learning to harness collective intelligence, especially in data-scarce or privacy-sensitive environments.


【8】Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery
标题:带自我怀疑和恢复的元认知强化学习
链接:https://arxiv.org/abs/2601.20193

作者:Zhipeng Zhang,Wenting Ma,Kai Li,Meng Guo,Lei Yang,Wei Yu,Hongji Cui,Yichen Zhang,Mo Zhang,Jinzhe Lin,Zhenjie Yao
摘要:鲁棒强化学习方法通常专注于抑制不可靠的经验或损坏的奖励,但它们缺乏对自身学习过程可靠性进行推理的能力。因此,这些方法通常要么对噪声过度反应而变得过于保守,要么在不确定性累积时灾难性地失败。   在这项工作中,我们提出了一个元认知强化学习框架,使智能体能够基于内部估计的可靠性信号来评估、调节和恢复其学习行为。该方法引入了一个由值预测误差稳定性(VPES)驱动的元信任变量,通过故障安全调节和逐步信任恢复来调节学习动态。   在具有奖励损坏的连续控制基准上的实验表明,与强鲁棒性基线相比,启用恢复的元认知控制实现了更高的平均回报,并显著减少了训练后期的失败。
摘要:Robust reinforcement learning methods typically focus on suppressing unreliable experiences or corrupted rewards, but they lack the ability to reason about the reliability of their own learning process. As a result, such methods often either overreact to noise by becoming overly conservative or fail catastrophically when uncertainty accumulates.   In this work, we propose a meta-cognitive reinforcement learning framework that enables an agent to assess, regulate, and recover its learning behavior based on internally estimated reliability signals. The proposed method introduces a meta-trust variable driven by Value Prediction Error Stability (VPES), which modulates learning dynamics via fail-safe regulation and gradual trust recovery.   Experiments on continuous-control benchmarks with reward corruption demonstrate that recovery-enabled meta-cognitive control achieves higher average returns and significantly reduces late-stage training failures compared to strong robustness baselines.
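A minimal sketch of the meta-trust idea, assuming (this is the editor's guess, not the paper's formula) that VPES is measured by the rolling standard deviation of value-prediction errors and that trust multiplicatively scales the learning rate:

```python
import numpy as np

def meta_trust(td_errors, window=20, sensitivity=1.0):
    """Map recent stability of value-prediction errors to a trust score
    in (0, 1]: stable errors -> trust near 1, volatile errors -> near 0.
    (An editor's guess at the spirit of VPES, not the paper's definition.)"""
    vpes = float(np.std(np.asarray(td_errors)[-window:]))
    return float(np.exp(-sensitivity * vpes))

base_lr = 0.1
stable_errors = np.random.default_rng(2).normal(0.0, 0.05, 50)
noisy_errors = np.random.default_rng(3).normal(0.0, 2.0, 50)
lr_stable = base_lr * meta_trust(stable_errors)   # trust stays high
lr_noisy = base_lr * meta_trust(noisy_errors)     # trust collapses
```

Gradual trust recovery would correspond to slowly relaxing the scaled learning rate back toward `base_lr` once errors stabilize again.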


【9】MAPLE: Self-supervised Learning-Enhanced Nonlinear Dimensionality Reduction for Visual Analysis
标题:MAPLE:用于视觉分析的自监督学习增强的非线性降维
链接:https://arxiv.org/abs/2601.20173

作者:Zeyang Huang,Takanori Fujiwara,Angelos Chatzimparmpas,Wandrille Duchemin,Andreas Kerren
摘要:我们提出了一种新的非线性降维方法,MAPLE,提高UMAP通过改进流形建模。MAPLE采用自监督学习方法来更有效地编码低维流形几何。这种方法的核心是最大流形容量表示(MMCR),它通过压缩局部相似数据点之间的方差,同时放大不同数据点之间的方差来帮助解开复杂的流形。这种设计对于具有大量簇内方差和弯曲流形结构的高维数据(例如生物或图像数据)特别有效。我们的定性和定量评估表明,MAPLE可以产生更清晰的视觉聚类分离和更精细的子聚类分辨率比UMAP,同时保持可比的计算成本。
摘要:We present a new nonlinear dimensionality reduction method, MAPLE, that enhances UMAP by improving manifold modeling. MAPLE employs a self-supervised learning approach to more efficiently encode low-dimensional manifold geometry. Central to this approach are maximum manifold capacity representations (MMCRs), which help untangle complex manifolds by compressing variances among locally similar data points while amplifying variance among dissimilar data points. This design is particularly effective for high-dimensional data with substantial intra-cluster variance and curved manifold structures, such as biological or image data. Our qualitative and quantitative evaluations demonstrate that MAPLE can produce clearer visual cluster separations and finer subcluster resolution than UMAP while maintaining comparable computational cost.
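The MMCR objective mentioned above can be illustrated on toy data: averaging each point's augmented views gives a centroid matrix, and maximizing its nuclear norm rewards view-consistent, well-spread embeddings. A minimal NumPy version (the loss only, with no training loop; array shapes are illustrative):

```python
import numpy as np

def mmcr_loss(views):
    """views: (n_views, n_points, dim) embeddings of augmented views.
    MMCR-style objective: the negative nuclear norm of the per-point
    centroid matrix. Lower loss means views of each point agree while
    different points' centroids spread across directions."""
    centroids = views.mean(axis=0)                     # (n_points, dim)
    return -float(np.linalg.norm(centroids, ord='nuc'))

# View-consistent, well-spread embeddings vs. fully collapsed ones
spread = np.stack([np.eye(4), np.eye(4)])      # centroids = I, nuclear norm 4
collapsed = np.tile(np.eye(4)[0], (2, 4, 1))   # every point maps to e_0
```

The spread configuration attains a strictly lower loss than the collapsed one, which is the untangling pressure MAPLE exploits for manifold modeling.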


【10】Spectral Ghost in Representation Learning: from Component Analysis to Self-Supervised Learning
标题:表示学习中的光谱幽灵:从成分分析到自我监督学习
链接:https://arxiv.org/abs/2601.20154

作者:Bo Dai,Na Li,Dale Schuurmans
备注:43 pages, 3 figures
摘要:自监督学习(SSL)通过释放未标记数据的力量提高了实际应用中的经验性能。具体而言,SSL从海量未标记数据中提取表示,再将其迁移到大量数据有限的下游任务。表示学习在不同应用中的显著改进引起了越来越多的关注,催生了多种截然不同的自监督学习目标和各式各样的学习过程,但缺乏清晰统一的认识。这种缺失阻碍了表示学习的持续发展:理论理解缺位,高效算法设计的原则不明,在实践中使用表示学习方法也缺乏依据。表示学习方法的快速增长进一步凸显了统一框架的紧迫性。因此,在本文中我们着手建立表示学习的原则性基础。我们首先从谱表示的视角从理论上研究表示的充分性,这揭示了现有成功SSL算法的谱本质,并为理解和分析的统一框架铺平了道路。这一框架也启发了以原则性的方式开发更高效、更易用的表示学习算法,用于现实世界的应用。
摘要 :Self-supervised learning (SSL) has improved empirical performance by unleashing the power of unlabeled data for practical applications. Specifically, SSL extracts representations from massive unlabeled data, which are then transferred to a variety of downstream tasks with limited data. The significant improvement on diverse applications of representation learning has attracted increasing attention, resulting in a variety of dramatically different self-supervised learning objectives for representation extraction, with an assortment of learning procedures, but the lack of a clear and unified understanding. Such an absence hampers the ongoing development of representation learning, leaving a theoretical understanding missing, principles for efficient algorithm design unclear, and the use of representation learning methods in practice unjustified. The urgency for a unified framework is further motivated by the rapid growth in representation learning methods. In this paper, we therefore develop a principled foundation of representation learning. We first theoretically investigate the sufficiency of the representation from a spectral representation view, which reveals the spectral essence of the existing successful SSL algorithms and paves the path to a unified framework for understanding and analysis. Such a framework also inspires the development of more efficient and easy-to-use representation learning algorithms in a principled way for real-world applications.


迁移|Zero/Few/One-Shot|自适应(8篇)

【1】PatchFormer: A Patch-Based Time Series Foundation Model with Hierarchical Masked Reconstruction and Cross-Domain Transfer Learning for Zero-Shot Multi-Horizon Forecasting
标题:PatchFormer:基于补丁的时间序列基础模型,具有分层掩蔽重建和跨域迁移学习,用于零样本多步预测
链接:https://arxiv.org/abs/2601.20845

作者:Olaf Yunus Laitinen Imanov,Derya Umut Kulali,Taner Yilmaz
备注:5 pages; 2 figures; 7 tables
摘要:时间序列预测是气候、能源、医疗保健和金融应用中的一个基本问题。许多现有方法需要针对每个任务进行特定领域的特征工程并依赖大量标记数据。我们介绍了PatchFormer,一个基于补丁的时间序列基础模型,它使用分层掩蔽重建进行自监督预训练,并使用轻量级适配器进行高效迁移。PatchFormer将时间序列分割成补丁,并通过跨时间尺度的可学习聚合来学习多尺度时间表示。预训练使用带有动态掩蔽的掩蔽补丁重建,其目标同时鼓励局部准确性和全局一致性,随后进行跨领域知识蒸馏。在涵盖天气、能源、交通、金融和医疗保健的24个基准数据集上进行的实验展示了最先进的零样本多步预测性能,相对于强基线将均方误差降低了27.3%,同时所需的特定任务训练数据减少了94%。该模型在预训练数据多达1000亿个点时表现出接近对数线性的扩展性,处理长度为512的序列比全序列Transformer快3.8倍。
摘要:Time series forecasting is a fundamental problem with applications in climate, energy, healthcare, and finance. Many existing approaches require domain-specific feature engineering and substantial labeled data for each task. We introduce PatchFormer, a patch-based time series foundation model that uses hierarchical masked reconstruction for self-supervised pretraining and lightweight adapters for efficient transfer. PatchFormer segments time series into patches and learns multiscale temporal representations with learnable aggregation across temporal scales. Pretraining uses masked patch reconstruction with dynamic masking and objectives that encourage both local accuracy and global consistency, followed by cross-domain knowledge distillation. Experiments on 24 benchmark datasets spanning weather, energy, traffic, finance, and healthcare demonstrate state-of-the-art zero-shot multi-horizon forecasting, reducing mean squared error by 27.3 percent relative to strong baselines while requiring 94 percent less task-specific training data. The model exhibits near log-linear scaling with more pretraining data up to 100 billion points and processes length-512 sequences 3.8x faster than full-sequence transformers.
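The patching and masked-reconstruction setup described above is easy to sketch. The helper names and the 40% mask ratio below are illustrative, not taken from the paper:

```python
import numpy as np

def patchify(series, patch_len):
    """Split a 1-D series into non-overlapping patches, truncating
    any remainder, as in patch-based time-series transformers."""
    n = len(series) // patch_len
    return series[:n * patch_len].reshape(n, patch_len)

def mask_patches(patches, mask_ratio, rng):
    """Zero out a random subset of patches; the model would be trained
    to reconstruct the masked patches from the visible ones."""
    n = len(patches)
    k = max(1, int(round(mask_ratio * n)))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=k, replace=False)] = True
    inputs = patches.copy()
    inputs[mask] = 0.0
    return inputs, mask

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 512))
patches = patchify(series, 16)                 # 32 patches of length 16
inputs, mask = mask_patches(patches, 0.4, rng)
```

The reconstruction target is `patches[mask]`; the hierarchical part of PatchFormer would repeat this at several temporal scales with learnable aggregation.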


【2】When More Data Doesn't Help: Limits of Adaptation in Multitask Learning
标题:当更多数据没有帮助时:多任务学习中适应的局限性
链接:https://arxiv.org/abs/2601.20774

作者:Steve Hanneke,Mingyue Xu
摘要:多任务学习和相关框架在现代应用中取得了巨大的成功。在多任务学习问题中,我们从相关的源任务中收集了一组异构数据集,并希望通过单独解决每个任务来提高性能。arXiv:2006.15785最近的工作表明,在没有获得分布信息的情况下,只要每个任务的样本大小是有界的,没有基于单独聚合样本的算法可以保证最佳风险。   在本文中,我们重点了解多任务学习的统计限制。我们超越了arXiv:2006.15785中的没有免费午餐定理,建立了一个更强的适应不可能性结果,适用于每个任务任意大的样本量。这一改进传达了一个重要的信息,即多任务学习的困难无法通过每个任务拥有丰富的数据来克服。我们还讨论了最佳适应性的概念,可能是未来的利益。
摘要:Multitask learning and related frameworks have achieved tremendous success in modern applications. In the multitask learning problem, we are given a set of heterogeneous datasets collected from related source tasks and hope to achieve better performance than what could be obtained by solving each task individually. The recent work of arXiv:2006.15785 showed that, without access to distributional information, no algorithm based on aggregating samples alone can guarantee optimal risk as long as the sample size per task is bounded.   In this paper, we focus on understanding the statistical limits of multitask learning. We go beyond the no-free-lunch theorem in arXiv:2006.15785 by establishing a stronger impossibility result for adaptation that holds for arbitrarily large sample sizes per task. This improvement conveys an important message: the hardness of multitask learning cannot be overcome by having abundant data per task. We also discuss the notion of optimal adaptivity, which may be of future interest.


【3】TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs
标题:TABED:LVLM中鲁棒推测解码的测试时自适应集成起草
链接:https://arxiv.org/abs/2601.20357

作者:Minjae Lee,Wonjun Kang,Byeongkeun Ahn,Christian Classen,Kevin Galim,Seunghyuk Oh,Minghao Yan,Hyung Il Koo,Kangwook Lee
备注:Accepted to Findings of EACL 2026
摘要:推测解码(SD)已被证明可以通过快速生成草稿令牌并并行验证它们来有效加速LLM推理。然而,对于将LLM扩展到同时处理图像和文本提示的大型视觉语言模型(LVLM),SD在很大程度上仍未被探索。为了解决这一差距,我们在11个数据集和多种输入场景下,使用小型草稿模型对现有推理方法进行了基准测试,并观察到特定场景的性能波动。受这些发现的启发,我们提出了测试时自适应批量集成起草(TABED),它利用SD设置中可获得的与过去真值的偏差,动态集成通过批量推理获得的多个草稿。该动态集成方法相对自回归解码实现了平均1.74倍的鲁棒壁钟时间加速,比单一起草方法提高了5%,同时无需训练,并通过参数共享使集成成本可以忽略不计。凭借其即插即用的兼容性,我们通过集成高级验证和替代起草方法进一步增强了TABED。代码和定制训练模型可在https://github.com/furiosa-ai/TABED上获得。
摘要:Speculative decoding (SD) has proven effective for accelerating LLM inference by quickly generating draft tokens and verifying them in parallel. However, SD remains largely unexplored for Large Vision-Language Models (LVLMs), which extend LLMs to process both image and text prompts. To address this gap, we benchmark existing inference methods with small draft models on 11 datasets across diverse input scenarios and observe scenario-specific performance fluctuations. Motivated by these findings, we propose Test-time Adaptive Batched Ensemble Drafting (TABED), which dynamically ensembles multiple drafts obtained via batch inference by leveraging deviations from past ground truths available in the SD setting. The dynamic ensemble method achieves an average robust walltime speedup of 1.74x over autoregressive decoding and a 5% improvement over single drafting methods, while remaining training-free and keeping ensembling costs negligible through parameter sharing. With its plug-and-play compatibility, we further enhance TABED by integrating advanced verification and alternative drafting methods. Code and custom-trained models are available at https://github.com/furiosa-ai/TABED.
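TABED builds on speculative decoding, whose draft-then-verify loop reduces to a few lines in the greedy case: accept the longest draft prefix the target model agrees with, then append the target's own next token. The target here is a toy function standing in for a real LLM:

```python
def speculative_step(draft_tokens, target_next):
    """Greedy speculative-decoding verification: accept the longest
    prefix of the draft the target would itself have produced, then
    append the target's own next token. target_next maps a context
    tuple to the target's greedy token (a toy stand-in for an LLM)."""
    accepted = []
    for tok in draft_tokens:
        if target_next(tuple(accepted)) != tok:
            break
        accepted.append(tok)
    accepted.append(target_next(tuple(accepted)))   # target's correction
    return accepted

# Toy target model: always continues with len(context) % 3
target = lambda ctx: len(ctx) % 3
out = speculative_step([0, 1, 0], target)   # draft's third token is wrong
```

TABED's contribution sits one level above this loop: it maintains several such drafters in a batch and re-weights them online from how well each matched past verified tokens.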


【4】Test-Time Adaptation for Anomaly Segmentation via Topology-Aware Optimal Transport Chaining
标题:基于拓扑感知最优传输链的异常分割测试时自适应
链接:https://arxiv.org/abs/2601.20333

作者:Ali Zia,Usman Ali,Umer Ramzan,Abdul Rehman,Abdelwahed Khamis,Wei Xiang
摘要:深度拓扑数据分析(TDA)为捕获跨尺度持续存在的结构不变量(例如连通性和循环)提供了一个原则性框架,使其天然适合异常分割(AS)。与在分布偏移下产生脆弱掩模的基于阈值的二值化不同,TDA允许将异常刻画为对全局结构的破坏,而非局部波动。我们介绍了TopoOT,一个拓扑感知的最优传输(OT)框架,它将多重过滤持久图(PD)与测试时自适应(TTA)相结合。我们的关键创新是最优传输链(Optimal Transport Chaining),它顺序地在阈值和过滤之间对齐PD,产生测地稳定性分数,以识别跨尺度一致保留的特征。这些稳定性感知的伪标签监督一个轻量级头部,该头部通过OT一致性和对比目标在线训练,确保在域偏移下的鲁棒自适应。在标准的2D和3D异常检测基准测试中,TopoOT实现了最先进的性能,以平均F1衡量,在2D数据集上以最高+24.1%、在3D AS基准上以最高+10.2%领先最具竞争力的方法。
摘要:Deep topological data analysis (TDA) offers a principled framework for capturing structural invariants such as connectivity and cycles that persist across scales, making it a natural fit for anomaly segmentation (AS). Unlike threshold-based binarisation, which produces brittle masks under distribution shift, TDA allows anomalies to be characterised as disruptions to global structure rather than local fluctuations. We introduce TopoOT, a topology-aware optimal transport (OT) framework that integrates multi-filtration persistence diagrams (PDs) with test-time adaptation (TTA). Our key innovation is Optimal Transport Chaining, which sequentially aligns PDs across thresholds and filtrations, yielding geodesic stability scores that identify features consistently preserved across scales. These stability-aware pseudo-labels supervise a lightweight head trained online with OT-consistency and contrastive objectives, ensuring robust adaptation under domain shift. Across standard 2D and 3D anomaly detection benchmarks, TopoOT achieves state-of-the-art performance, outperforming the most competitive methods by up to +24.1% mean F1 on 2D datasets and +10.2% on 3D AS benchmarks.


【5】ProFlow: Zero-Shot Physics-Consistent Sampling via Proximal Flow Guidance
标题:ProFlow:通过近端流量引导实现Zero-Shot物理一致性采样
链接:https://arxiv.org/abs/2601.20227

作者:Zichao Yu,Ming Li,Wenyi Zhang,Difan Zou,Weiguo Gao
摘要:从稀疏观测值推断物理场,同时严格满足偏微分方程(PDE),是计算物理中的一个基本挑战。最近,深度生成模型为此类逆问题提供了强大的数据驱动先验,但现有方法很难在不进行昂贵的重新训练或破坏已学习的生成先验的情况下强制执行硬物理约束。因此,迫切需要一种采样机制,能够将严格的物理一致性和观测保真度与预训练先验的统计结构相协调。为此,我们提出了ProFlow,一个用于零样本物理一致采样的近端引导框架,其定义为使用固定的生成先验从稀疏观测推断解,而无需特定于任务的再训练。该算法采用严格的两步方案,在以下两步之间交替:(i)终端优化步骤,通过近端最小化将流预测投影到物理一致集与观测一致集的交集上;(ii)插值步骤,将细化后的状态映射回生成轨迹,以保持与学习到的流概率路径的一致性。该过程可以在贝叶斯意义下解释为一系列局部最大后验(MAP)更新。对泊松、亥姆霍兹、达西和粘性Burgers方程的综合基准测试表明,与最先进的基于扩散和流的基线相比,ProFlow实现了更优的物理一致性和观测一致性,以及更准确的分布统计。
摘要:Inferring physical fields from sparse observations while strictly satisfying partial differential equations (PDEs) is a fundamental challenge in computational physics. Recently, deep generative models offer powerful data-driven priors for such inverse problems, yet existing methods struggle to enforce hard physical constraints without costly retraining or disrupting the learned generative prior. Consequently, there is a critical need for a sampling mechanism that can reconcile strict physical consistency and observational fidelity with the statistical structure of the pre-trained prior. To this end, we present ProFlow, a proximal guidance framework for zero-shot physics-consistent sampling, defined as inferring solutions from sparse observations using a fixed generative prior without task-specific retraining. The algorithm employs a rigorous two-step scheme that alternates between: (i) a terminal optimization step, which projects the flow prediction onto the intersection of the physically and observationally consistent sets via proximal minimization; and (ii) an interpolation step, which maps the refined state back to the generative trajectory to maintain consistency with the learned flow probability path. This procedure admits a Bayesian interpretation as a sequence of local maximum a posteriori (MAP) updates. Comprehensive benchmarks on Poisson, Helmholtz, Darcy, and viscous Burgers' equations demonstrate that ProFlow achieves superior physical and observational consistency, as well as more accurate distributional statistics, compared to state-of-the-art diffusion- and flow-based baselines.
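The projection onto the intersection of the physically and observationally consistent sets has a classical analogue: alternating projections between affine sets. The toy below alternates a least-squares projection onto {x : Ax = b} with overwriting observed entries; no flow model is involved, so this only illustrates the projection step, not ProFlow's sampler:

```python
import numpy as np

def alternating_projection(x0, A, b, obs_idx, obs_val, n_iter=200):
    """Alternate projection onto the affine 'physics' set {x : Ax = b}
    with enforcement of observed entries. For consistent affine sets
    this converges to a point satisfying both (von Neumann)."""
    x = x0.astype(float).copy()
    A_pinv = np.linalg.pinv(A)
    for _ in range(n_iter):
        x = x - A_pinv @ (A @ x - b)   # least-squares projection onto Ax = b
        x[obs_idx] = obs_val           # overwrite observed entries
    return x

# Toy 'physics': x0 + x1 = 2 and x1 + x2 = 3, with x[0] observed to be 1
A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([2.0, 3.0])
x_hat = alternating_projection(np.zeros(3), A, b, [0], [1.0])
```

The intersection here is the single point (1, 1, 2); ProFlow additionally maps such a refined state back onto the generative trajectory after each projection.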


【6】Regime-Adaptive Bayesian Optimization via Dirichlet Process Mixtures of Gaussian Processes
标题:通过高斯过程Dirichlet过程混合的区域自适应Bayesian优化
链接:https://arxiv.org/abs/2601.20043

作者:Yan Zhang,Xuefeng Liu,Sipeng Chen,Sascha Ranftl,Chong Liu,Shibo Li
摘要:标准贝叶斯优化(BO)假设整个搜索空间具有均匀的平滑性,而这一假设在多区域问题中不成立,例如跨越不同能量盆地的分子构象搜索或跨越异质分子骨架的药物发现。单个GP要么过度平滑急剧的过渡,要么在平滑区域中产生幻觉噪声,从而产生校准不良的不确定性。我们提出了RAMBO,一个高斯过程的Dirichlet过程混合模型,在优化过程中自动发现潜在区域,每个区域由一个具有局部优化超参数的独立GP建模。我们推导了塌缩吉布斯采样(collapsed Gibbs sampling),通过解析地边缘化潜在函数实现高效推理,并引入自适应浓度参数调度以实现由粗到细的区域发现。我们的采集函数将不确定性分解为区域内和区域间分量。在合成基准和现实世界应用(包括分子构象优化、药物发现的虚拟筛选和聚变反应堆设计)上的实验证明,在多区域目标上相对最先进的基线取得了一致的改进。
摘要:Standard Bayesian Optimization (BO) assumes uniform smoothness across the search space, an assumption violated in multi-regime problems such as molecular conformation search through distinct energy basins or drug discovery across heterogeneous molecular scaffolds. A single GP either oversmooths sharp transitions or hallucinates noise in smooth regions, yielding miscalibrated uncertainty. We propose RAMBO, a Dirichlet Process Mixture of Gaussian Processes that automatically discovers latent regimes during optimization, each modeled by an independent GP with locally-optimized hyperparameters. We derive collapsed Gibbs sampling that analytically marginalizes latent functions for efficient inference, and introduce adaptive concentration parameter scheduling for coarse-to-fine regime discovery. Our acquisition functions decompose uncertainty into intra-regime and inter-regime components. Experiments on synthetic benchmarks and real-world applications, including molecular conformer optimization, virtual screening for drug discovery, and fusion reactor design, demonstrate consistent improvements over state-of-the-art baselines on multi-regime objectives.


【7】BayPrAnoMeta: Bayesian Proto-MAML for Few-Shot Industrial Image Anomaly Detection
标题:BayPrAnoMeta:用于Few-Shot工业图像异常检测的Bayesian Proto-MAML
链接:https://arxiv.org/abs/2601.19992

作者:Soham Sarkar,Tanmay Sen,Sayantan Banerjee
摘要 :工业图像异常检测是一个具有挑战性的问题,原因在于极端的类别不平衡和标记缺陷样本的稀缺,在少样本设置下尤其如此。我们提出BayPrAnoMeta,一种用于少样本工业图像异常检测的Proto-MAML的贝叶斯推广。与依赖确定性类别原型和基于距离的自适应的现有Proto-MAML方法不同,BayPrAnoMeta用特定于任务的概率正常性模型取代原型,并通过贝叶斯后验预测似然执行内环自适应。我们使用正态-逆Wishart(NIW)先验对正常支持嵌入进行建模,得到学生-$t$预测分布,从而实现不确定性感知的重尾异常评分,这对于极端少样本设置下的鲁棒性至关重要。我们进一步将BayPrAnoMeta扩展为一个联邦元学习框架,针对异构工业客户端引入有监督对比正则化,并证明了对所得非凸目标收敛到驻点。在MVTec AD基准测试上的实验表明,在少样本异常检测设置中,相对MAML、Proto-MAML和基于PatchCore的方法,AUROC获得了一致且显著的改进。
摘要:Industrial image anomaly detection is a challenging problem owing to extreme class imbalance and the scarcity of labeled defective samples, particularly in few-shot settings. We propose BayPrAnoMeta, a Bayesian generalization of Proto-MAML for few-shot industrial image anomaly detection. Unlike existing Proto-MAML approaches that rely on deterministic class prototypes and distance-based adaptation, BayPrAnoMeta replaces prototypes with task-specific probabilistic normality models and performs inner-loop adaptation via a Bayesian posterior predictive likelihood. We model normal support embeddings with a Normal-Inverse-Wishart (NIW) prior, producing a Student-$t$ predictive distribution that enables uncertainty-aware, heavy-tailed anomaly scoring and is essential for robustness in extreme few-shot settings. We further extend BayPrAnoMeta to a federated meta-learning framework with supervised contrastive regularization for heterogeneous industrial clients and prove convergence to stationary points of the resulting nonconvex objective. Experiments on the MVTec AD benchmark demonstrate consistent and significant AUROC improvements over MAML, Proto-MAML, and PatchCore-based methods in few-shot anomaly detection settings.
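The NIW-to-Student-t construction is standard conjugate analysis. The 1-D analogue (a Normal-Inverse-Gamma prior, i.e. NIW restricted to one dimension) is sketched below with illustrative hyperparameters, scoring a query by its negative posterior-predictive log-likelihood; the paper works with the full multivariate case:

```python
import math
import numpy as np

def student_t_logpdf(x, df, loc, scale):
    """Log-density of a location-scale Student-t distribution."""
    z = (x - loc) / scale
    return (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2.0 * math.log1p(z * z / df))

def niw_anomaly_score(support, query, mu0=0.0, kappa0=1.0,
                      alpha0=1.0, beta0=1.0):
    """Conjugate Normal-Inverse-Gamma update on the normal-class
    support set; the posterior predictive is Student-t, and the
    anomaly score is its negative log-likelihood at the query."""
    x = np.asarray(support, dtype=float)
    n, xbar = len(x), float(x.mean())
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = (beta0 + 0.5 * float(((x - xbar) ** 2).sum())
              + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n))
    scale = math.sqrt(beta_n * (kappa_n + 1) / (alpha_n * kappa_n))
    return -student_t_logpdf(float(query), 2.0 * alpha_n, mu_n, scale)

rng = np.random.default_rng(0)
support = 5.0 + 0.1 * rng.standard_normal(30)   # 'normal' embeddings
in_score = niw_anomaly_score(support, 5.0)      # near the normal class
out_score = niw_anomaly_score(support, 8.0)     # far from the normal class
```

The heavy Student-t tails are what keep the score calibrated when the support set is tiny, which is the extreme few-shot regime the abstract emphasizes.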


【8】Randomized Feasibility Methods for Constrained Optimization with Adaptive Step Sizes
标题:具有自适应步长的约束优化随机可行性方法
链接:https://arxiv.org/abs/2601.20076

作者:Abhishek Chakraborty,Angelia Nedić
摘要:我们考虑在由凸函数下水平集的交集所定义的约束下最小化目标函数。我们研究两种情况:(i)强凸且Lipschitz光滑的目标函数;(ii)凸但可能非光滑的目标函数。为了处理不易投影的约束,我们使用带有Polyak步长的随机可行性算法,每次迭代采样随机数量的约束,同时采取(次)梯度步骤来最小化目标函数。对于情况(i),我们使用自适应步长证明了目标函数值在期望意义下线性收敛到任意规定的容差。对于情况(ii),我们开发了一个完全无需问题参数的自适应步长方案,产生$O(1/\sqrt{T})$的最坏情况期望速率。迭代点的不可行性几乎必然随可行性更新次数几何递减,而对于平均迭代点,我们建立了函数值相对于最优值的期望下界,该下界取决于随机采样约束数量的分布。对于某些样本量增长的选择,可达到最优速率。最后,在二次约束二次规划(QCQP)问题和支持向量机(SVM)上的仿真证明了我们的算法相比其他最先进方法的计算效率。
摘要:We consider minimizing an objective function subject to constraints defined by the intersection of lower-level sets of convex functions. We study two cases: (i) strongly convex and Lipschitz-smooth objective function and (ii) convex but possibly nonsmooth objective function. To deal with the constraints that are not easy to project on, we use a randomized feasibility algorithm with Polyak steps and a random number of sampled constraints per iteration, while taking (sub)gradient steps to minimize the objective function. For case (i), we prove linear convergence in expectation of the objective function values to any prescribed tolerance using an adaptive stepsize. For case (ii), we develop a fully problem parameter-free and adaptive stepsize scheme that yields an $O(1/\sqrt{T})$ worst-case rate in expectation. The infeasibility of the iterates decreases geometrically with the number of feasibility updates almost surely, while for the averaged iterates, we establish an expected lower bound on the function values relative to the optimal value that depends on the distribution for the random number of sampled constraints. For certain choices of sample-size growth, optimal rates are achieved. Finally, simulations on a Quadratically Constrained Quadratic Programming (QCQP) problem and Support Vector Machines (SVM) demonstrate the computational efficiency of our algorithm compared to other state-of-the-art methods.
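The Polyak feasibility step for a violated convex constraint g(x) <= 0 is a one-liner: move along the (sub)gradient with step size g(x)_+ / ||∇g(x)||², which lands exactly on the boundary when g is affine. A minimal sketch of that single step (the randomized constraint sampling and objective steps are omitted):

```python
import numpy as np

def polyak_feasibility_step(x, g_val, g_grad):
    """One Polyak step for a convex constraint g(x) <= 0: if violated,
    step along the (sub)gradient with size g(x)_+ / ||grad||^2."""
    if g_val <= 0:
        return x                       # already feasible: do nothing
    return x - (g_val / float(np.dot(g_grad, g_grad))) * g_grad

# Affine constraint a.x - b <= 0
a, b = np.array([1.0, 2.0]), 3.0
x = np.array([4.0, 4.0])               # a.x = 12 > 3: infeasible
x_new = polyak_feasibility_step(x, float(a @ x) - b, a)
```

For affine constraints the step is an exact projection onto the boundary, which is why infeasibility contracts geometrically under repeated sampled updates.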


强化学习(7篇)

【1】Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions
标题:使强化学习代理的行为适应不断变化的动作空间和奖励功能
链接:https://arxiv.org/abs/2601.20714

作者:Raul de la Rosa,Ivana Dusparic,Nicolas Cardozo
摘要:强化学习(RL)代理通常在环境条件不稳定的现实应用中挣扎,特别是当奖励函数发生变化或可用动作空间扩展时。本文介绍了MORPHIN,一个自适应Q学习框架,它可以在没有完全重新训练的情况下进行动态适应。通过将概念漂移检测与学习和探索超参数的动态调整相结合,MORPHIN使代理适应奖励函数和代理动作空间的动态扩展的变化,同时保留先前的策略知识以防止灾难性遗忘。我们验证我们的方法使用Gridworld基准和交通信号控制模拟。结果表明,与标准Q学习基线相比,MORPHIN实现了卓越的收敛速度和连续适应,将学习效率提高了1.7倍。
摘要:Reinforcement Learning (RL) agents often struggle in real-world applications where environmental conditions are non-stationary, particularly when reward functions shift or the available action space expands. This paper introduces MORPHIN, a self-adaptive Q-learning framework that enables on-the-fly adaptation without full retraining. By integrating concept drift detection with dynamic adjustments to learning and exploration hyperparameters, MORPHIN adapts agents to changes in both the reward function and on-the-fly expansions of the agent's action space, while preserving prior policy knowledge to prevent catastrophic forgetting. We validate our approach using a Gridworld benchmark and a traffic signal control simulation. The results demonstrate that MORPHIN achieves superior convergence speed and continuous adaptation compared to a standard Q-learning baseline, improving learning efficiency by up to 1.7x.
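One way to read the drift-triggered hyperparameter adjustment is sketched below; the detector (a simple comparison of window means of episode rewards) and the reset values are the editor's illustrative choices, not necessarily MORPHIN's:

```python
import numpy as np

class DriftAdaptiveSchedule:
    """Reset exploration (epsilon) and learning rate (alpha) when the
    mean episode reward shifts between two adjacent windows -- a simple
    stand-in for a concept-drift detector."""
    def __init__(self, window=20, threshold=1.0):
        self.rewards, self.window, self.threshold = [], window, threshold
        self.epsilon, self.alpha = 0.05, 0.05   # near-converged values

    def update(self, episode_reward):
        self.rewards.append(episode_reward)
        r = self.rewards
        if len(r) >= 2 * self.window:
            old = np.mean(r[-2 * self.window:-self.window])
            new = np.mean(r[-self.window:])
            if abs(new - old) > self.threshold:      # drift detected
                self.epsilon, self.alpha = 0.5, 0.3  # re-explore, relearn
                self.rewards = []
        return self.epsilon, self.alpha

# Reward regime shifts from 10 to 0 halfway through
sched = DriftAdaptiveSchedule(window=20, threshold=1.0)
for r in [10.0] * 40 + [0.0] * 40:
    epsilon, alpha = sched.update(r)
```

Keeping the Q-table intact across the reset is what preserves prior policy knowledge while the boosted epsilon explores the changed environment.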


【2】Positive-Unlabeled Reinforcement Learning Distillation for On-Premise Small Models
标题:面向本地部署小模型的正-未标记强化学习蒸馏
链接:https://arxiv.org/abs/2601.20687

作者:Zhiqiang Kou,Junyang Chen,Xin-Qiang Cai,Xiaobo Xia,Ming-Kun Xie,Dong-Dong Wu,Biao Liu,Yuheng Jia,Xin Geng,Masashi Sugiyama,Tat-Seng Chua
备注:22 pages, 8 figures, 7 tables
摘要:由于隐私、成本和延迟的限制,小型模型的本地部署越来越普遍。然而,大多数实际流程止步于监督微调(SFT),未能到达强化学习(RL)对齐阶段。主要原因是RL对齐通常需要昂贵的人类偏好标注,或严重依赖高质量的奖励模型以及大规模的API使用和持续的工程维护,这两者都不适合本地部署环境。为了弥合这一差距,我们提出了一种用于本地部署小模型的正-未标记(PU)RL蒸馏方法。在没有人工标注偏好或奖励模型的情况下,我们的方法将教师的偏好优化能力从黑盒生成结果中蒸馏到本地可训练的学生模型中。对于每个提示,我们向教师查询一次以获得锚响应,在本地采样多个学生候选,并执行锚条件自排序以诱导成对或列表偏好,从而通过直接偏好优化或组相对策略优化实现完全本地的训练循环。理论分析证明,该方法诱导的偏好信号是顺序一致的,并集中在接近最优的候选上,支持其偏好优化的稳定性。实验表明,我们的方法在低成本设置下实现了一致的强大性能。
摘要:Due to constraints on privacy, cost, and latency, on-premise deployment of small models is increasingly common. However, most practical pipelines stop at supervised fine-tuning (SFT) and fail to reach the reinforcement learning (RL) alignment stage. The main reason is that RL alignment typically requires either expensive human preference annotation or heavy reliance on high-quality reward models with large-scale API usage and ongoing engineering maintenance, both of which are ill-suited to on-premise settings. To bridge this gap, we propose a positive-unlabeled (PU) RL distillation method for on-premise small-model deployment. Without human-labeled preferences or a reward model, our method distills the teacher's preference-optimization capability from black-box generations into a locally trainable student. For each prompt, we query the teacher once to obtain an anchor response, locally sample multiple student candidates, and perform anchor-conditioned self-ranking to induce pairwise or listwise preferences, enabling a fully local training loop via direct preference optimization or group relative policy optimization. Theoretical analysis justifies that the induced preference signal by our method is order-consistent and concentrates on near-optimal candidates, supporting its stability for preference optimization. Experiments demonstrate that our method achieves consistently strong performance under a low-cost setting.
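The local training loop ends in a standard preference-optimization objective. For one (chosen, rejected) pair, which here would come from the anchor-conditioned self-ranking, the DPO loss is:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair:
    -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r))),
    where logp_* are policy log-probs and ref_* reference log-probs."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

loss_aligned = dpo_loss(-1.0, -2.0, -1.5, -1.5)     # policy prefers chosen
loss_misaligned = dpo_loss(-2.0, -1.0, -1.5, -1.5)  # policy prefers rejected
```

The loss falls as the policy raises the chosen response relative to the rejected one, so gradient descent nudges the student toward the teacher-anchored ranking without any reward model.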


【3】Ranking-aware Reinforcement Learning for Ordinal Ranking
标题:用于有序排名的排名感知强化学习
链接:https://arxiv.org/abs/2601.20585

作者:Aiming Hao,Chen Zhu,Jiashu Zhu,Jiahong Wu,Xiangxiang Chu
备注:Accepted to ICASSP2026
摘要:由于存在传统方法难以建模的固有有序依赖性,有序回归和排序具有挑战性。我们提出了排序感知强化学习(RARL),一种显式学习这些关系的新型RL框架。RARL的核心是一个统一的目标,它协同集成了回归和排序学习(L2R),使两个任务相互改进。这由一个排序感知的可验证奖励驱动,该奖励联合评估回归精度和排序准确性,通过策略优化促进直接的模型更新。为了进一步增强训练,我们引入了响应突变操作(RMO),它注入受控噪声以改善探索并防止在鞍点处停滞。我们通过在三个不同基准上的广泛实验验证了RARL的有效性。
摘要:Ordinal regression and ranking are challenging due to inherent ordinal dependencies that conventional methods struggle to model. We propose Ranking-Aware Reinforcement Learning (RARL), a novel RL framework that explicitly learns these relationships. At its core, RARL features a unified objective that synergistically integrates regression and Learning-to-Rank (L2R), enabling mutual improvement between the two tasks. This is driven by a ranking-aware verifiable reward that jointly assesses regression precision and ranking accuracy, facilitating direct model updates via policy optimization. To further enhance training, we introduce Response Mutation Operations (RMO), which inject controlled noise to improve exploration and prevent stagnation at saddle points. The effectiveness of RARL is validated through extensive experiments on three distinct benchmarks.


【4】A Reinforcement Learning Based Universal Sequence Design for Polar Codes
标题:基于强化学习的极化码通用序列设计
链接:https://arxiv.org/abs/2601.20118

作者:David Kin Wai Ho,Arman Fazeli,Mohamad M. Mansour,Louay M. A. Jalloul
备注:8 pages, 4 figures, ICML2026
摘要:为了推进面向6G应用的极化码设计,我们开发了一个基于强化学习的通用序列设计框架,该框架可扩展并适应不同的信道条件和解码策略。至关重要的是,我们的方法可扩展到高达2048的码长,使其适合用于标准化。在5G支持的所有$(N,K)$配置中,我们的方法相对于5G采用的NR序列实现了有竞争力的性能,并在$N=2048$时相对beta扩展基线产生高达0.2 dB的增益。我们进一步强调了实现大规模学习的关键要素:(i)结合以极化码通用偏序性质为基础的物理规律约束学习,(ii)利用决策的弱长期影响来限制前瞻评估,以及(iii)联合多配置优化以提高学习效率。
摘要:To advance Polar code design for 6G applications, we develop a reinforcement learning-based universal sequence design framework that is extensible and adaptable to diverse channel conditions and decoding strategies. Crucially, our method scales to code lengths up to $2048$, making it suitable for use in standardization. Across all $(N,K)$ configurations supported in 5G, our approach achieves competitive performance relative to the NR sequence adopted in 5G and yields up to a 0.2 dB gain over the beta-expansion baseline at $N=2048$. We further highlight the key elements that enabled learning at scale: (i) incorporation of physical law constrained learning grounded in the universal partial order property of Polar codes, (ii) exploitation of the weak long term influence of decisions to limit lookahead evaluation, and (iii) joint multi-configuration optimization to increase learning efficiency.


【5】In-Context Reinforcement Learning From Suboptimal Historical Data
标题:来自次优历史数据的上下文强化学习
链接:https://arxiv.org/abs/2601.20116

作者:Juncheng Dong,Moyang Guo,Ethan X. Fang,Zhuoran Yang,Vahid Tarokh
备注:Accepted to Forty-Second International Conference on Machine Learning (ICML2025)
摘要:Transformer模型已经取得了显著的经验成功,这主要归功于它们的上下文学习能力。受此启发,我们探索训练自回归Transformer用于上下文强化学习(ICRL)。在此设置中,我们首先在由从各种RL任务收集的轨迹组成的离线数据集上训练一个Transformer,然后固定并使用该Transformer为新的RL任务构建动作策略。值得注意的是,我们考虑离线数据集包含从次优行为策略中采样的轨迹的设置。在这种情况下,标准的自回归训练对应于模仿学习,并导致次优性能。为了解决这个问题,我们提出了决策重要性Transformer(DIT)框架,它以上下文方式模拟演员-评论家算法。特别是,我们首先训练一个基于Transformer的值函数,用于估计收集次优轨迹的行为策略的优势函数。然后,我们通过加权最大似然估计损失训练一个基于Transformer的策略,其中权重基于训练好的值函数构造,以将次优策略引导到最优策略。我们进行了广泛的实验,在强盗(bandit)和马尔可夫决策过程问题上测试DIT的性能。我们的结果表明,DIT实现了卓越的性能,特别是当离线数据集包含次优历史数据时。
摘要:Transformer models have achieved remarkable empirical successes, largely due to their in-context learning capabilities. Inspired by this, we explore training an autoregressive transformer for in-context reinforcement learning (ICRL). In this setting, we initially train a transformer on an offline dataset consisting of trajectories collected from various RL tasks, and then fix and use this transformer to create an action policy for new RL tasks. Notably, we consider the setting where the offline dataset contains trajectories sampled from suboptimal behavioral policies. In this case, standard autoregressive training corresponds to imitation learning and results in suboptimal performance. To address this, we propose the Decision Importance Transformer (DIT) framework, which emulates the actor-critic algorithm in an in-context manner. In particular, we first train a transformer-based value function that estimates the advantage functions of the behavior policies that collected the suboptimal trajectories. Then we train a transformer-based policy via a weighted maximum likelihood estimation loss, where the weights are constructed based on the trained value function to steer the suboptimal policies to the optimal ones. We conduct extensive experiments to test the performance of DIT on both bandit and Markov Decision Process problems. Our results show that DIT achieves superior performance, particularly when the offline dataset contains suboptimal historical data.
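The weighted maximum-likelihood step can be read as advantage-weighted regression: weight each action's log-likelihood by exp(advantage/β), so actions the value function deems better are up-weighted. A minimal sketch (the softmax normalization of the weights is the editor's choice):

```python
import numpy as np

def advantage_weighted_nll(logp_actions, advantages, beta=1.0):
    """Weighted maximum-likelihood objective: each action's negative
    log-likelihood is weighted by exp(advantage / beta), up-weighting
    actions the learned value function prefers."""
    w = np.exp(np.asarray(advantages, dtype=float) / beta)
    w = w / w.sum()
    return float(-(w * np.asarray(logp_actions, dtype=float)).sum())

loss_good = advantage_weighted_nll([-1.0, -2.0], [1.0, 0.0])  # favors likely action
loss_bad = advantage_weighted_nll([-1.0, -2.0], [0.0, 1.0])   # favors unlikely action
```

With zero advantages the objective reduces to plain imitation learning; the advantage weights are exactly what steers the cloned policy away from the suboptimal behavior data.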


【6】E2HiL: Entropy-Guided Sample Selection for Efficient Real-World Human-in-the-Loop Reinforcement Learning
标题:E2HiL:用于高效现实世界人在环强化学习的熵引导样本选择
链接:https://arxiv.org/abs/2601.19969

作者:Haoyuan Deng,Yuanjiang Xue,Haoyang Du,Boyang Zhou,Zhenyu Wu,Ziwei Wang
备注:Project page: https://e2hil.github.io/
摘要:人在回路指导已成为一种有效方法,可在复杂现实世界操作任务的在线强化学习(RL)中实现更快的收敛。然而,现有的人在环RL(HiL-RL)框架通常样本效率低下,需要大量人工干预才能收敛,从而导致高昂的人力成本。为了解决这个问题,我们提出了一个名为E2HiL的样本高效的现实世界人在环RL框架,它通过主动选择信息样本来减少所需的人工干预。具体而言,策略熵的稳定降低能够以更高的样本效率改进探索与利用之间的权衡。我们首先构建不同样本对策略熵的影响函数,并通过策略的动作概率与软优势的协方差对其进行高效估计。然后,我们选择影响函数值适中的样本,剪除引起熵急剧下降的捷径样本和影响可忽略的噪声样本。在四个真实操作任务上的大量实验表明,与最先进的HiL-RL方法相比,E2HiL的成功率提高了42.1%,而所需的人工干预减少了10.1%,验证了其有效性。提供代码、视频和数学公式的项目页面见https://e2hil.github.io/。
摘要 :Human-in-the-loop guidance has emerged as an effective approach for enabling faster convergence in online reinforcement learning (RL) of complex real-world manipulation tasks. However, existing human-in-the-loop RL (HiL-RL) frameworks often suffer from low sample efficiency, requiring substantial human interventions to achieve convergence and thereby leading to high labor costs. To address this, we propose a sample-efficient real-world human-in-the-loop RL framework named E2HiL, which requires fewer human interventions by actively selecting informative samples. Specifically, stable reduction of policy entropy enables an improved trade-off between exploration and exploitation with higher sample efficiency. We first build influence functions of different samples on the policy entropy, which are efficiently estimated by the covariance of action probabilities and soft advantages of policies. Then we select samples with moderate values of influence functions, where shortcut samples that induce sharp entropy drops and noisy samples with negligible effect are pruned. Extensive experiments on four real-world manipulation tasks demonstrate that E2HiL achieves a 42.1\% higher success rate while requiring 10.1\% fewer human interventions compared to the state-of-the-art HiL-RL method, validating its effectiveness. The project page providing code, videos, and mathematical formulations can be found at https://e2hil.github.io/.


【7】Exploring the holographic entropy cone via reinforcement learning
标题:通过强化学习探索全息熵锥
链接:https://arxiv.org/abs/2601.19979

作者:Temple He,Jaeha Lee,Hirosi Ooguri
备注:38 pages, 10 figures, 2 tables
摘要:我们开发了一个强化学习算法来研究全息熵锥。给定一个目标熵矢量,我们的算法搜索一个最小割熵与目标矢量匹配的图实现。如果目标矢量不允许这样的图实现,则它必然位于锥之外;在这种情况下,算法会找到其对应熵矢量最接近目标的图,并允许我们探测锥面的位置。对于$\sf N=3$锥,我们确认,从位于全息熵锥之外的目标矢量出发,我们的算法成功地重新发现了互信息的单配性(monogamy of mutual information)。然后,我们将该算法应用于$\sf N=6$锥,分析了来自arXiv:2412.15364的次可加性锥的6条"神秘"极端射线,这些射线满足所有已知的全息熵不等式,但此前缺乏图实现。我们找到了其中3条的实现,证明它们是全息熵锥的真正极端射线,同时提供证据表明其余3条不可实现,这意味着$\sf N=6$存在未知的全息不等式。
摘要:We develop a reinforcement learning algorithm to study the holographic entropy cone. Given a target entropy vector, our algorithm searches for a graph realization whose min-cut entropies match the target vector. If the target vector does not admit such a graph realization, it must lie outside the cone, in which case the algorithm finds a graph whose corresponding entropy vector most nearly approximates the target and allows us to probe the location of the facets. For the $\sf N=3$ cone, we confirm that our algorithm successfully rediscovers monogamy of mutual information beginning with a target vector outside the holographic entropy cone. We then apply the algorithm to the $\sf N=6$ cone, analyzing the 6 "mystery" extreme rays of the subadditivity cone from arXiv:2412.15364 that satisfy all known holographic entropy inequalities yet lacked graph realizations. We found realizations for 3 of them, proving they are genuine extreme rays of the holographic entropy cone, while providing evidence that the remaining 3 are not realizable, implying unknown holographic inequalities exist for $\sf N=6$.
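The graph realizations referred to above assign each boundary region the weight of a minimum cut separating it from the rest of the boundary; for toy graphs this can be brute-forced over subsets of bulk vertices (the star-graph example is the editor's illustration):

```python
from itertools import chain, combinations

def min_cut_entropy(edges, region, bulk):
    """Min-cut 'entropy' of a boundary region: minimize the total
    weight of edges crossing (region U S, everything else) over all
    bulk-vertex subsets S. Brute force, fine for toy graphs.
    edges: {(u, v): weight}."""
    best = float('inf')
    for s in chain.from_iterable(combinations(bulk, k)
                                 for k in range(len(bulk) + 1)):
        side = set(region) | set(s)
        cut = sum(w for (u, v), w in edges.items()
                  if (u in side) != (v in side))
        best = min(best, cut)
    return best

# Toy bulk: a star graph whose centre 'c' joins boundary vertices A, B, C
edges = {('A', 'c'): 1.0, ('B', 'c'): 1.0, ('C', 'c'): 1.0}
S_A = min_cut_entropy(edges, ['A'], ['c'])
S_B = min_cut_entropy(edges, ['B'], ['c'])
S_AB = min_cut_entropy(edges, ['A', 'B'], ['c'])
```

Entropy vectors computed this way automatically satisfy inequalities such as subadditivity (here S_A + S_B >= S_AB); the paper's RL agent searches over edge weights and topologies to match a target vector.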


符号|符号学习(1篇)

【1】An Empirical Investigation of Neural ODEs and Symbolic Regression for Dynamical Systems
标题:动态系统的神经常微分方程与符号回归的实证研究
链接:https://arxiv.org/abs/2601.20637

作者:Panayiotis Ioannou,Pietro Liò,Pietro Cicuta
备注:Accepted at the Machine Learning and the Physical Sciences Workshop, NeurIPS 2025
摘要:精确地模拟复杂系统的动力学并发现其控制微分方程是加速科学发现的关键任务。使用噪声,从两个阻尼振荡系统的合成数据,我们探讨神经常微分方程(NODE)的外推能力和符号回归(SR)恢复基本方程的能力。我们的研究产生了三个关键的见解。首先,我们证明了NODE可以有效地外推到新的边界条件,提供的轨迹共享动态相似的训练数据。其次,SR成功地从嘈杂的地面实况数据中恢复方程,尽管其性能取决于输入变量的正确选择。最后,我们发现SR恢复了三个控制方程中的两个,以及第三个方程的良好近似,当使用仅在完整模拟的10%上训练的NODE生成的数据时。虽然这最后一个发现突出了未来工作的一个领域,但我们的研究结果表明,使用NODE来丰富有限的数据,并使符号回归能够推断物理定律,这是一种很有前途的新方法。
摘要:Accurately modelling the dynamics of complex systems and discovering their governing differential equations are critical tasks for accelerating scientific discovery. Using noisy, synthetic data from two damped oscillatory systems, we explore the extrapolation capabilities of Neural Ordinary Differential Equations (NODEs) and the ability of Symbolic Regression (SR) to recover the underlying equations. Our study yields three key insights. First, we demonstrate that NODEs can extrapolate effectively to new boundary conditions, provided the resulting trajectories share dynamic similarity with the training data. Second, SR successfully recovers the equations from noisy ground-truth data, though its performance is contingent on the correct selection of input variables. Finally, we find that SR recovers two out of the three governing equations, along with a good approximation for the third, when using data generated by a NODE trained on just 10% of the full simulation. While this last finding highlights an area for future work, our results suggest that using NODEs to enrich limited data and enable symbolic regression to infer physical laws represents a promising new approach for scientific discovery.
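摘要所述流程的起点是阻尼振荡系统的含噪合成数据。下面是一个示意性Python草图(参数与积分方式为此处假设，与论文的具体设置无关)：用显式Euler法积分阻尼振子方程并叠加高斯观测噪声。

```python
import random

def simulate_damped_oscillator(x0, v0, omega=2.0, zeta=0.1,
                               dt=0.01, steps=1000, noise=0.0, seed=0):
    """用显式Euler法积分 x'' = -2*zeta*omega*x' - omega**2 * x,
    并叠加高斯观测噪声, 生成NODE/SR流程所需的合成轨迹。"""
    rng = random.Random(seed)
    x, v = x0, v0
    traj = []
    for _ in range(steps):
        traj.append(x + rng.gauss(0.0, noise))  # 含噪观测
        a = -2.0 * zeta * omega * v - omega ** 2 * x
        x, v = x + dt * v, v + dt * a
    return traj
```

在这类轨迹上训练NODE后，再把NODE生成的稠密数据交给符号回归去恢复控制方程，即摘要中"用NODE丰富有限数据"的思路。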


分层学习(1篇)

【1】Minimax Rates for Hyperbolic Hierarchical Learning
标题:双曲分层学习的极小极大率
链接:https://arxiv.org/abs/2601.20047

作者:Divit Rawal,Sriram Vishwanath
摘要:我们证明了在标准Lipschitz正则化下，学习分层数据的欧氏表示与双曲表示之间存在样本复杂性的指数分离。对于深度为$R$、分支因子为$m$的层次结构，我们首先为欧氏空间建立了一个几何障碍:任何有界半径的嵌入都会发生体积塌缩，将指数多的树上远距离点映射到彼此邻近的位置。这使得即便要实现简单的分层目标，Lipschitz常数也必须以$\exp(Ω(R))$的规模增长，从而在容量控制下产生指数级样本复杂度。然后，我们证明了这种障碍在双曲空间中消失:常失真的双曲嵌入允许$O(1)$-Lipschitz可实现性，使得只需$n = O(mR \log m)$个样本即可学习。通过Fano不等式得到的匹配下界$Ω(mR \log m)$表明双曲表示达到了信息论最优。我们还展示了一个与几何无关的瓶颈:任何秩为$k$的预测空间只能捕获$O(k)$个规范层次对比。
摘要:We prove an exponential separation in sample complexity between Euclidean and hyperbolic representations for learning on hierarchical data under standard Lipschitz regularization. For depth-$R$ hierarchies with branching factor $m$, we first establish a geometric obstruction for Euclidean space: any bounded-radius embedding forces volumetric collapse, mapping exponentially many tree-distant points to nearby locations. This necessitates Lipschitz constants scaling as $\exp(Ω(R))$ to realize even simple hierarchical targets, yielding exponential sample complexity under capacity control. We then show this obstruction vanishes in hyperbolic space: constant-distortion hyperbolic embeddings admit $O(1)$-Lipschitz realizability, enabling learning with $n = O(mR \log m)$ samples. A matching $Ω(mR \log m)$ lower bound via Fano's inequality establishes that hyperbolic representations achieve the information-theoretic optimum. We also show a geometry-independent bottleneck: any rank-$k$ prediction space captures only $O(k)$ canonical hierarchical contrasts.
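摘要中"体积塌缩"与双曲空间的对比可以用Poincaré球模型的距离公式直观感受：靠近球面边界处双曲距离呈指数增长，为树结构提供了欧氏空间所没有的容量。下面是一个自包含的Python小例子(仅作说明，非论文的构造)：

```python
import math

def poincare_dist(u, v):
    """Poincare球模型中两点间的双曲距离
    d(u, v) = arccosh(1 + 2|u-v|^2 / ((1-|u|^2)(1-|v|^2)))。"""
    du = sum(x * x for x in u)
    dv = sum(x * x for x in v)
    duv = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * duv / ((1.0 - du) * (1.0 - dv)))
```

例如从原点到点$(r, 0)$的距离为$\log\frac{1+r}{1-r}$，当$r \to 1$时发散；把树的叶子放在靠近边界处即可在有界半径内保持指数多的彼此远离的点。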


医学相关(3篇)

【1】Externally Validated Longitudinal GRU Model for Visit-Level 180-Day Mortality Risk in Metastatic Castration-Resistant Prostate Cancer
标题:转移性去势抵抗性前列腺癌访视水平180天死亡风险的外部验证纵向GRU模型
链接:https://arxiv.org/abs/2601.20046

作者:Javier Mencia-Ledo,Mohammad Noaeen,Zahra Shakeri
备注:7 pages, 4 figures
摘要:转移性去势抵抗性前列腺癌(mCRPC)是一种高度侵袭性疾病,预后不良,治疗反应不均匀。在这项工作中,我们使用来自两个III期队列(n=526和n=640)的纵向数据开发并外部验证了访视水平180天死亡风险模型。仅标记具有可观察到的180天结局的访视;从分析中排除右删失病例。我们比较了五种候选架构:长短期记忆、门控递归单元(GRU)、Cox比例风险、随机生存森林(RSF)和Logistic回归。对于每个数据集,我们选择了达到85%敏感性下限的最小风险阈值。GRU和RSF模型最初表现出很高的识别能力(C指数:两者均为87%)。在外部验证中,GRU获得了更高的校准(斜率:0.93;截距:0.07),并实现了0.87的PR-AUC。临床影响分析显示,真阳性的中位预警时间为151.0天(假阳性为59.0天),每100例患者访视18.3次警报。考虑到晚期虚弱或恶病质和血流动力学不稳定,排列重要性将BMI和收缩压列为最强关联。这些结果表明,纵向常规临床标志物可以估计mCRPC的短期死亡风险,并支持多个月窗口期的积极护理计划。
摘要:Metastatic castration-resistant prostate cancer (mCRPC) is a highly aggressive disease with poor prognosis and heterogeneous treatment response. In this work, we developed and externally validated a visit-level 180-day mortality risk model using longitudinal data from two Phase III cohorts (n=526 and n=640). Only visits with observable 180-day outcomes were labeled; right-censored cases were excluded from analysis. We compared five candidate architectures: Long Short-Term Memory, Gated Recurrent Unit (GRU), Cox Proportional Hazards, Random Survival Forest (RSF), and Logistic Regression. For each dataset, we selected the smallest risk-threshold that achieved an 85% sensitivity floor. The GRU and RSF models showed high discrimination capabilities initially (C-index: 87% for both). In external validation, the GRU obtained a higher calibration (slope: 0.93; intercept: 0.07) and achieved a PR-AUC of 0.87. Clinical impact analysis showed a median time-in-warning of 151.0 days for true positives (59.0 days for false positives) and 18.3 alerts per 100 patient-visits. Given late-stage frailty or cachexia and hemodynamic instability, permutation importance ranked BMI and systolic blood pressure as the strongest associations. These results suggest that longitudinal routine clinical markers can estimate short-horizon mortality risk in mCRPC and support proactive care planning over a multi-month window.
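摘要中"选取满足85%灵敏度下限的风险阈值"一步可以独立于具体模型来实现。下面是一个示意性Python草图(函数`pick_threshold`与并列打分时的取舍规则均为此处假设，论文未给出实现细节)：取仍满足灵敏度下限的最高阈值，以尽量减少警报数。

```python
import math

def pick_threshold(scores, labels, sensitivity_floor=0.85):
    """在验证集上选取仍满足灵敏度下限的最高风险阈值
    (即报警最少的工作点; 以 score >= 阈值 触发警报)。"""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1),
                       reverse=True)
    if not positives:
        raise ValueError("no positive cases")
    # 至少需要报警的阳性病例数
    k = math.ceil(sensitivity_floor * len(positives))
    return positives[k - 1]
```

实际部署中通常在独立验证集上选阈值，再在外部队列上报告校准与PR-AUC，与摘要的评估流程一致。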


【2】oculomix: Hierarchical Sampling for Retinal-Based Systemic Disease Prediction
标题:oculomix:基于视网膜的系统性疾病预测的分层抽样
链接:https://arxiv.org/abs/2601.19939

作者:Hyunmin Kim,Yukun Zhou,Rahul A. Jonas,Lie Ju,Sunjin Hwang,Pearse A. Keane,Siegfried K. Wagner
备注:Accepted to ISBI 2026
摘要:Oculomics -通过视网膜成像预测系统性疾病(如心血管疾病和痴呆症)的概念-由于基于transformer的基础模型(如RETFound)的数据效率而迅速发展。图像级混合样本数据增强(如CutMix和MixUp)经常用于训练Transformers,但这些技术会干扰患者特定的属性,如医学合并症和临床因素,因为它们只考虑图像和标签。为了解决这个问题,我们提出了一个分层抽样策略,Oculomix,混合样本扩增。我们的方法是基于两个临床先验。首先(检查级别),在同一时间点从同一患者采集的图像具有相同的属性。其次(患者水平),在不同时间点从同一患者采集的图像具有软时间趋势,因为发病率通常随时间增加。在这些先验知识的指导下,我们的方法将混合空间限制在患者和检查级别,以更好地保留患者特定的特征,并利用它们的层次关系。使用ViT模型对一个大的种族多样性人群(Alzeye)的主要不良心血管事件(MACE)进行五年预测,验证了所提出的方法。我们表明,Oculomix在AUROC中始终优于图像级CutMix和MixUp高达3%,证明了所提出的方法在眼组学中的必要性和价值。
摘要:Oculomics - the concept of predicting systemic diseases, such as cardiovascular disease and dementia, through retinal imaging - has advanced rapidly due to the data efficiency of transformer-based foundation models like RETFound. Image-level mixed sample data augmentations, such as CutMix and MixUp, are frequently used for training transformers, yet these techniques perturb patient-specific attributes, such as medical comorbidity and clinical factors, since they only account for images and labels. To address this limitation, we propose a hierarchical sampling strategy, Oculomix, for mixed sample augmentations. Our method is based on two clinical priors. First (exam level), images acquired from the same patient at the same time point share the same attributes. Second (patient level), images acquired from the same patient at different time points have a soft temporal trend, as morbidity generally increases over time. Guided by these priors, our method constrains the mixing space to the patient and exam levels to better preserve patient-specific characteristics and leverages their hierarchical relationships. The proposed method is validated using ViT models on a five-year prediction of major adverse cardiovascular events (MACE) in a large ethnically diverse population (Alzeye). We show that Oculomix consistently outperforms image-level CutMix and MixUp by up to 3% in AUROC, demonstrating the necessity and value of the proposed method in oculomics.
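"把混合空间限制在患者和检查级别"的采样约束可以用几行代码说明。下面是一个示意性Python草图(函数与数据字段名为自拟，并非作者的实现)：为MixUp/CutMix采样混合伙伴时，只在与锚样本同一检查或同一患者的图像中抽取。

```python
import random

def sample_mix_pair(records, level="exam", rng=None):
    """为MixUp/CutMix采样一对索引(i, j), 把混合伙伴j限制在与i
    相同的检查(level='exam')或相同的患者(level='patient')内,
    以保留患者特异性属性; 找不到伙伴时退化为恒等混合。
    records: [{'patient': ..., 'exam': ..., 'image': ...}, ...]"""
    rng = rng or random.Random(0)
    i = rng.randrange(len(records))
    key = "exam" if level == "exam" else "patient"
    pool = [j for j, r in enumerate(records)
            if r[key] == records[i][key] and j != i]
    j = rng.choice(pool) if pool else i
    return i, j
```

相比图像级随机配对，这种分层抽样保证混合出的样本不会跨患者混淆合并症等属性，对应摘要中的两个临床先验。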


【3】Cross-Country Learning for National Infectious Disease Forecasting Using European Data
标题:使用欧洲数据进行国家传染病预测的跨国学习
链接:https://arxiv.org/abs/2601.20771

作者:Zacharias Komodromos,Kleanthis Malialis,Artemis Kontou,Panayiotis Kolios
备注:7 pages, 4 figures, 5 tables
摘要:准确预测传染病发病率对于公共卫生规划和及时干预至关重要。虽然大多数数据驱动的预测方法主要依赖于来自单个国家的历史数据,但这些数据通常在长度和可变性方面受到限制,从而限制了机器学习(ML)模型的性能。在这项工作中,我们研究了一种用于传染病预测的跨国学习方法,其中单个模型在来自多个国家的时间序列数据上进行训练,并在感兴趣的国家进行评估。这一设置使该模型能够利用各国共有的流行病动态,并受益于扩大的训练集。我们使用欧洲国家的监测数据,通过塞浦路斯COVID-19病例预测的案例研究来检验这种方法。我们评估了多个ML模型,并分析了回顾窗口长度和跨国“数据增强”对多步预测性能的影响。我们的研究结果表明,与仅基于国家数据训练的模型相比,纳入其他国家的数据可以带来持续的改进。虽然实证重点是塞浦路斯和COVID-19,但拟议的框架和研究结果适用于更广泛的传染病预测,特别是在国家历史数据有限的情况下。
摘要:Accurate forecasting of infectious disease incidence is critical for public health planning and timely intervention. While most data-driven forecasting approaches rely primarily on historical data from a single country, such data are often limited in length and variability, restricting the performance of machine learning (ML) models. In this work, we investigate a cross-country learning approach for infectious disease forecasting, in which a single model is trained on time series data from multiple countries and evaluated on a country of interest. This setting enables the model to exploit shared epidemic dynamics across countries and to benefit from an enlarged training set. We examine this approach through a case study on COVID-19 case forecasting in Cyprus, using surveillance data from European countries. We evaluate multiple ML models and analyse the impact of the lookback window length and cross-country `data augmentation' on multi-step forecasting performance. Our results show that incorporating data from other countries can lead to consistent improvements over models trained solely on national data. Although the empirical focus is on Cyprus and COVID-19, the proposed framework and findings are applicable to infectious disease forecasting more broadly, particularly in settings with limited national historical data.
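摘要中的跨国"数据增强"在实现上就是把多个国家的序列切成统一的(回顾窗口, 预测目标)样本对再合并训练。下面是一个示意性Python草图(函数名`make_windows`为自拟)：

```python
def make_windows(series_by_country, lookback, horizon=1):
    """把多个国家的病例时间序列汇集成(回顾窗口, 预测目标)样本对,
    即跨国'数据增强'; 多步预测时取 horizon > 1。"""
    X, y = [], []
    for series in series_by_country.values():
        for t in range(len(series) - lookback - horizon + 1):
            X.append(series[t:t + lookback])
            y.append(series[t + lookback:t + lookback + horizon])
    return X, y
```

训练时在合并样本上拟合单一模型，评估时只在目标国家(如塞浦路斯)的留出区间上测试，即可检验跨国数据是否带来增益。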


蒸馏|知识提取(1篇)

【1】Quantization-Aware Distillation for NVFP4 Inference Accuracy Recovery
标题:用于NVFP4推理准确度恢复的量化感知蒸馏
链接:https://arxiv.org/abs/2601.20088

作者:Meng Xin,Sweta Priyadarshi,Jingyu Xin,Bilal Kartal,Aditya Vavre,Asma Kuriparambil Thekkumpate,Zijia Chen,Ameya Sunil Mahabaleshwarkar,Ido Shahaf,Akhiad Bercovich,Kinjal Patel,Suguna Varshini Velury,Chenjie Luo,Zhiyu Cheng,Jenny Chen,Chen-Han Yu,Wei Ping,Oleg Rybakov,Nima Tajbakhsh,Oluwatobi Olabiyi,Dusan Stosic,Di Wu,Song Han,Eric Chung,Sharath Turuvekere Sreenivas,Bryan Catanzaro,Yoshi Suhara,Tijmen Blankevoort,Huizi Mao
摘要:本技术报告介绍了量化感知蒸馏(QAD)以及我们恢复NVFP4量化大型语言模型(LLM)和视觉语言模型(VLM)准确性的最佳实践。QAD使用KL散度损失将全精度教师模型蒸馏为量化的学生模型。虽然将蒸馏应用于量化模型并不是一个新的想法，但我们观察到QAD对当今LLM的关键优势:1.它对通过多阶段后训练管道训练的模型表现出显着的有效性和稳定性，包括监督微调(SFT)、强化学习(RL)和模型合并，而传统的量化感知训练(QAT)在这些场景下受制于工程复杂性和训练不稳定性;2.它对数据质量和覆盖范围具有鲁棒性，无需完整的训练数据即可恢复准确性。我们在多个训练后模型中评估QAD，包括AceReason Nemotron,Nemotron 3 Nano,Nemotron Nano V2,Nemotron Nano V2 VL(VLM)和Llama Nemotron Super v1，显示出接近BF16准确度的一致恢复。
摘要:This technical report presents quantization-aware distillation (QAD) and our best practices for recovering accuracy of NVFP4-quantized large language models (LLMs) and vision-language models (VLMs). QAD distills a full-precision teacher model into a quantized student model using a KL divergence loss. While applying distillation to quantized models is not a new idea, we observe key advantages of QAD for today's LLMs: 1. It shows remarkable effectiveness and stability for models trained through multi-stage post-training pipelines, including supervised fine-tuning (SFT), reinforcement learning (RL), and model merging, where traditional quantization-aware training (QAT) suffers from engineering complexity and training instability; 2. It is robust to data quality and coverage, enabling accuracy recovery without full training data. We evaluate QAD across multiple post-trained models including AceReason Nemotron, Nemotron 3 Nano, Nemotron Nano V2, Nemotron Nano V2 VL (VLM), and Llama Nemotron Super v1, showing consistent recovery to near-BF16 accuracy.
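QAD的核心损失是教师与学生在每个token位置的分布间的KL散度。下面是一个自包含的Python小例子(温度与归约方式为此处假设，报告未给出实现细节)：

```python
import math

def softmax(logits, temperature=1.0):
    """数值稳定的softmax(先减去最大值再指数化)。"""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_distill_loss(teacher_logits, student_logits, temperature=1.0):
    """单个token位置上 KL(teacher || student) 的蒸馏损失。"""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
```

训练时学生为带伪量化(fake-quant)的NVFP4模型，教师为冻结的全精度副本，对序列中所有位置的该损失求平均后反向传播。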


推荐(2篇)

【1】Post-Training Fairness Control: A Single-Train Framework for Dynamic Fairness in Recommendation
标题:训练后公平性控制:推荐系统动态公平性的单次训练框架
链接:https://arxiv.org/abs/2601.20848

作者:Weixin Chen,Li Chen,Yuhan Zhao
备注:Accepted to WWW 2026 Workshop on HCRS (Oral Presentation)
摘要:尽管减轻推荐系统不公平性的努力日益增多，现有的公平性感知方法通常在训练时固定公平性要求，训练后的灵活性有限。然而，在现实场景中，不同的利益相关者可能随时间提出不同的公平性要求，针对每种要求重新训练的代价令人望而却步。为了解决这一问题，我们提出了Cofair，一个只需单次训练即可实现训练后公平性控制的推荐框架。具体来说，Cofair引入了一个带有公平性条件适配器模块的共享表示层，以生成针对不同公平性级别的用户嵌入，并引入一个用户级正则化项，保证在这些级别上用户公平性的单调改进。我们从理论上证明，Cofair的对抗性目标为人口统计均等(demographic parity)提供了上界，且正则化项在用户级别上强制执行渐进的公平性。在多个数据集和骨干模型上的综合实验表明，我们的框架在不同级别上提供动态公平性，其公平性-准确性曲线与最先进的基线相当或更好，且无需针对每个新的公平性要求重新训练。我们的代码可在https://github.com/weixinchen98/Cofair上公开获取。
摘要:Despite growing efforts to mitigate unfairness in recommender systems, existing fairness-aware methods typically fix the fairness requirement at training time and provide limited post-training flexibility. However, in real-world scenarios, diverse stakeholders may demand differing fairness requirements over time, so retraining for different fairness requirements becomes prohibitive. To address this limitation, we propose Cofair, a single-train framework that enables post-training fairness control in recommendation. Specifically, Cofair introduces a shared representation layer with fairness-conditioned adapter modules to produce user embeddings specialized for varied fairness levels, along with a user-level regularization term that guarantees user-wise monotonic fairness improvements across these levels. We theoretically establish that the adversarial objective of Cofair upper bounds demographic parity and the regularization term enforces progressive fairness at user level. Comprehensive experiments on multiple datasets and backbone models demonstrate that our framework provides dynamic fairness at different levels, delivering comparable or better fairness-accuracy curves than state-of-the-art baselines, without the need to retrain for each new fairness requirement. Our code is publicly available at https://github.com/weixinchen98/Cofair.


【2】LLaTTE: Scaling Laws for Multi-Stage Sequence Modeling in Large-Scale Ads Recommendation
标题:LLaTTE:大规模广告推荐中多阶段序列建模的比例定律
链接:https://arxiv.org/abs/2601.20083

作者:Lee Xiong,Zhirong Chen,Rahul Mayuranath,Shangran Qiu,Arda Ozdemir,Lu Li,Yang Hu,Dave Li,Jingtao Ren,Howard Cheng,Fabian Souto Herrera,Ahmed Agiza,Baruch Epshtein,Anuj Aggarwal,Julia Ulziisaikhan,Chao Wang,Dinesh Ramasamy,Parshva Doshi,Sri Reddy,Arnold Overwijk
备注:Lee Xiong, Zhirong Chen, and Rahul Mayuranath contributed equally to this work
摘要:我们提出了LLaTTE(LLM-Style Latent Transformers for Temporal Events),一个可扩展的Transformer架构,用于产品广告推荐。通过系统的实验,我们证明了推荐系统中的序列建模遵循类似于LLM的可预测幂律缩放。至关重要的是,我们发现语义特征弯曲了伸缩曲线:它们是伸缩的先决条件,使模型能够有效地利用更深更长的架构的容量。为了实现在严格的延迟约束下持续扩展的好处,我们引入了一个两阶段的架构,该架构将大型长上下文模型的繁重计算卸载到异步上游用户模型。我们证明了上游的改进转移到下游的排名任务。作为Meta最大的用户模型部署,这个多阶段框架以最小的服务开销推动Facebook Feed和Reels的转化率提升了4.3%,为工业推荐系统中利用缩放定律建立了一个实用的蓝图。
摘要:We present LLaTTE (LLM-Style Latent Transformers for Temporal Events), a scalable transformer architecture for production ads recommendation. Through systematic experiments, we demonstrate that sequence modeling in recommendation systems follows predictable power-law scaling similar to LLMs. Crucially, we find that semantic features bend the scaling curve: they are a prerequisite for scaling, enabling the model to effectively utilize the capacity of deeper and longer architectures. To realize the benefits of continued scaling under strict latency constraints, we introduce a two-stage architecture that offloads the heavy computation of large, long-context models to an asynchronous upstream user model. We demonstrate that upstream improvements transfer predictably to downstream ranking tasks. Deployed as the largest user model at Meta, this multi-stage framework drives a 4.3\% conversion uplift on Facebook Feed and Reels with minimal serving overhead, establishing a practical blueprint for harnessing scaling laws in industrial recommender systems.
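摘要所述的"可预测幂律缩放"在实践中通常通过log-log空间的线性拟合来检验。下面是一个自包含的Python小例子(纯最小二乘示意，非论文的实验代码)：拟合 loss ≈ a · compute^b。

```python
import math

def fit_power_law(compute, loss):
    """在log-log空间做最小二乘, 拟合 loss ≈ a * compute**b,
    用于检验损失随算力是否服从幂律。"""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - b * mx)
    return a, b
```

若拟合残差很小且指数b稳定，即可像摘要所述那样用小规模实验外推更大模型的收益；"语义特征弯曲缩放曲线"则表现为有/无语义特征两组拟合得到不同的(a, b)。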


聚类(2篇)

【1】PASS: Ambiguity Guided Subsets for Scalable Classical and Quantum Constrained Clustering
标题:PASS:用于可扩展经典与量子约束聚类的模糊度引导子集
链接:https://arxiv.org/abs/2601.20157

作者:Pedro Chumpitaz-Flores,My Duong,Ying Mao,Kaixun Hua
备注:25 pages, 8 figures, preprint
摘要:成对约束聚类通过在特定样本之间强制执行必须链接(ML)和不能链接(CL)约束，以边信息增强无监督划分，从而产生尊重已知亲和与分离关系的标签。然而，ML和CL约束为聚类问题增加了额外的复杂性，目前的方法在数据可扩展性方面存在困难，特别是在量子或量子混合聚类等利基应用中。我们提出了PASS，一个由成对约束和模糊度驱动的子集选择框架，在保证ML和CL约束得到满足的同时，允许可扩展的高质量聚类求解。PASS将ML约束折叠成伪点，并提供两个选择器:一个约束感知的边缘规则，收集近边界点和所有检测到的CL违规;以及一个信息几何规则，通过从软分配后验中导出的Fisher-Rao距离为样本打分，然后在简单的预算下选择信息量最高的子集。在不同的基准测试中，PASS以比精确或基于惩罚的方法低得多的成本获得具有竞争力的SSE，并且在先前方法失败的情况下仍然有效。
摘要:Pairwise-constrained clustering augments unsupervised partitioning with side information by enforcing must-link (ML) and cannot-link (CL) constraints between specific samples, yielding labelings that respect known affinities and separations. However, ML and CL constraints add an extra layer of complexity to the clustering problem, with current methods struggling in data scalability, especially in niche applications like quantum or quantum-hybrid clustering. We propose PASS, a pairwise-constraints and ambiguity-driven subset selection framework that preserves ML and CL constraints satisfaction while allowing scalable, high-quality clustering solution. PASS collapses ML constraints into pseudo-points and offers two selectors: a constraint-aware margin rule that collects near-boundary points and all detected CL violations, and an information-geometric rule that scores points via a Fisher-Rao distance derived from soft assignment posteriors, then selects the highest-information subset under a simple budget. Across diverse benchmarks, PASS attains competitive SSE at substantially lower cost than exact or penalty-based methods, and remains effective in regimes where prior approaches fail.
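摘要中"从软分配后验导出的Fisher-Rao距离"对于类别分布有简单的闭式：2·arccos(Bhattacharyya系数)。下面是一个自包含的Python小例子(仅示意该距离本身，PASS的具体打分与预算规则见论文)：

```python
import math

def fisher_rao_dist(p, q):
    """两个软分配(类别分布)之间的Fisher-Rao距离:
    2 * arccos(Bhattacharyya系数)。min(1.0, bc)用于
    吸收浮点误差, 避免acos定义域越界。"""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return 2.0 * math.acos(min(1.0, bc))
```

分布相同时距离为0，支撑不相交时达到最大值π；位于簇边界、对多个簇后验距离都适中的点即摘要所述信息量较高的候选样本。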


【2】Sparse clustering via the Deterministic Information Bottleneck algorithm
标题:通过确定性信息瓶颈算法进行稀疏聚类
链接:https://arxiv.org/abs/2601.20628

作者:Efthymios Costa,Ioanna Papatsouma,Angelos Markos
备注:Submitted to IFCS 2026 (8 pages total)
摘要:聚类分析涉及将对象分配到理想地呈现某些期望特征的组中的任务。当聚类结构被限制在特征空间的一个子集时,传统的聚类技术面临着前所未有的挑战。我们提出了一个信息理论框架,克服了与稀疏数据相关的问题,允许联合特征加权和聚类。我们的建议构成了一个有竞争力的替代现有的聚类算法的稀疏数据,通过模拟合成数据。我们的方法的有效性是建立在一个现实世界的基因组数据集的应用程序。
摘要:Cluster analysis relates to the task of assigning objects into groups which ideally present some desirable characteristics. When a cluster structure is confined to a subset of the feature space, traditional clustering techniques face unprecedented challenges. We present an information-theoretic framework that overcomes the problems associated with sparse data, allowing for joint feature weighting and clustering. Our proposal constitutes a competitive alternative to existing clustering algorithms for sparse data, as demonstrated through simulations on synthetic data. The effectiveness of our method is established by an application on a real-world genomics data set.


超分辨率|去噪|去模糊|去雾(1篇)

【1】DeRaDiff: Denoising Time Realignment of Diffusion Models
标题:DeRaDiff:扩散模型的去噪时间重对齐
链接:https://arxiv.org/abs/2601.20198

作者:Ratnavibusena Don Shahain Manujith,Yang Zhang,Teoh Tze Tzun,Kenji Kawaguchi
摘要:最新的进展使扩散模型与人类偏好对齐，以增强美学吸引力并减少伪影和偏差。这类方法旨在最大化与更高奖励对齐的条件输出分布，同时不过度偏离预训练先验，这通常由KL(Kullback-Leibler)正则化来实施。因此，一个核心问题仍然存在:如何选择正确的正则化强度?强度太高导致对齐有限，强度太低导致"奖励黑客"(reward hacking)。这使得选择正确的正则化强度绝非易事。现有方法通过在多个正则化强度下分别对齐预训练模型来扫描这个超参数，然后选择最佳强度。不幸的是，这种做法代价高昂。我们引入了DeRaDiff，一种去噪时间重对齐过程:只需对齐预训练模型一次，便可在采样期间调制正则化强度，以模拟在其他正则化强度下训练的模型，而无需任何额外的训练或微调。DeRaDiff将解码时间重对齐(decoding-time realignment)从语言模型扩展到扩散模型，通过用对齐后验与参考后验的几何混合替换反向步骤的参考分布，对连续潜变量的迭代预测进行操作，从而在常见调度器下得到闭式更新，并由单个可调参数lambda进行动态控制。我们的实验表明，在多个文本-图像对齐和图像质量指标上，我们的方法始终能很好地近似在不同正则化强度下完全从头对齐的模型。因此，我们的方法提供了一种搜索最佳强度的高效途径，消除了昂贵的对齐扫描，从而大大降低了计算成本。
摘要:Recent advances align diffusion models with human preferences to increase aesthetic appeal and mitigate artifacts and biases. Such methods aim to maximize a conditional output distribution aligned with higher rewards whilst not drifting far from a pretrained prior. This is commonly enforced by KL (Kullback Leibler) regularization. As such, a central issue still remains: how does one choose the right regularization strength? Too high of a strength leads to limited alignment and too low of a strength leads to "reward hacking". This renders the task of choosing the correct regularization strength highly non-trivial. Existing approaches sweep over this hyperparameter by aligning a pretrained model at multiple regularization strengths and then choose the best strength. Unfortunately, this is prohibitively expensive. We introduce DeRaDiff, a denoising time realignment procedure that, after aligning a pretrained model once, modulates the regularization strength during sampling to emulate models trained at other regularization strengths without any additional training or finetuning. Extending decoding-time realignment from language to diffusion models, DeRaDiff operates over iterative predictions of continuous latents by replacing the reverse step reference distribution by a geometric mixture of an aligned and reference posterior, thus giving rise to a closed form update under common schedulers and a single tunable parameter, lambda, for on the fly control. Our experiments show that across multiple text image alignment and image-quality metrics, our method consistently provides a strong approximation for models aligned entirely from scratch at different regularization strengths. Thus, our method yields an efficient way to search for the optimal strength, eliminating the need for expensive alignment sweeps and thereby substantially reducing computational costs.
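摘要中"几何混合导出闭式更新"在等方差高斯情形下有非常简单的形式：两个同方差高斯密度的λ-几何混合(归一化后)仍是同方差高斯，均值为两均值的线性插值。下面是一个自包含的Python小例子(等方差是此处的简化假设)：

```python
def geometric_mixture_mean(mu_aligned, mu_ref, lam):
    """等方差高斯情形下, 几何混合 p_a**lam * p_r**(1-lam)
    (归一化后)仍是同方差高斯, 均值为线性插值 -- 这对应
    由单个可调参数 lam 控制的闭式反向步骤更新。"""
    return [lam * a + (1.0 - lam) * r
            for a, r in zip(mu_aligned, mu_ref)]
```

lam=1时完全使用对齐模型的后验，lam=0时退回参考模型，中间取值则在采样期间插值出"仿佛以其他正则化强度训练过"的行为。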


联邦学习|隐私保护|加密(3篇)

【1】SA-PEF: Step-Ahead Partial Error Feedback for Efficient Federated Learning
标题:SA-PEF:用于高效联邦学习的步进部分误差反馈
链接:https://arxiv.org/abs/2601.20738

作者:Dawit Kiros Redie,Reza Arablouei,Stefan Werner
摘要:带有误差反馈(EF)的有偏梯度压缩减少了联邦学习(FL)中的通信量，但在非IID数据下，残差可能衰减缓慢，导致梯度失配和早期轮次的停滞。本文提出了步进部分误差反馈(SA-PEF)算法，它将部分误差反馈(PEF)与步进(SA)校正相结合。当步进系数$α=0$时，SA-PEF退化为EF;当$α=1$时，退化为步进EF(SAEF)。对于非凸目标和$δ$-收缩压缩器，我们建立了二阶矩界和残差递归，保证在异构数据和部分客户端参与下收敛到平稳点。所得速率在常数因子内匹配标准非凸Fed-SGD的保证，在固定内部步长下以$O((η,η_0TR)^{-1})$的速率收敛到方差/异质性下限。我们的分析揭示了一个由步进控制的残差收缩因子$ρ_r$，解释了早期训练阶段观察到的加速。为了平衡SAEF的快速预热与EF的长期稳定性，我们将$α$选在其理论预测最优值附近。在不同架构和数据集上的实验表明，SA-PEF始终比EF更快地达到目标精度。
摘要:Biased gradient compression with error feedback (EF) reduces communication in federated learning (FL), but under non-IID data, the residual error can decay slowly, causing gradient mismatch and stalled progress in the early rounds. We propose step-ahead partial error feedback (SA-PEF), which integrates step-ahead (SA) correction with partial error feedback (PEF). SA-PEF recovers EF when the step-ahead coefficient $α=0$ and step-ahead EF (SAEF) when $α=1$. For non-convex objectives and $δ$-contractive compressors, we establish a second-moment bound and a residual recursion that guarantee convergence to stationarity under heterogeneous data and partial client participation. The resulting rates match standard non-convex Fed-SGD guarantees up to constant factors, achieving $O((η,η_0TR)^{-1})$ convergence to a variance/heterogeneity floor with a fixed inner step size. Our analysis reveals a step-ahead-controlled residual contraction $ρ_r$ that explains the observed acceleration in the early training phase. To balance SAEF's rapid warm-up with EF's long-term stability, we select $α$ near its theory-predicted optimum. Experiments across diverse architectures and datasets show that SA-PEF consistently reaches target accuracy faster than EF.
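SA-PEF所基于的误差反馈机制可以用top-k压缩的一轮更新来说明。下面是一个示意性Python草图，对应SA-PEF中α=0的普通EF角点(步进校正与部分反馈本身在此省略，函数名为自拟)：

```python
def top_k_compress(vec, k):
    """保留幅值最大的k个分量, 其余置零(有偏压缩算子)。"""
    idx = sorted(range(len(vec)), key=lambda i: abs(vec[i]),
                 reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in idx:
        out[i] = vec[i]
    return out

def ef_step(grad, residual, k):
    """一轮误差反馈: 压缩'梯度 + 累积残差', 发送压缩结果,
    未发送部分作为新残差带入下一轮。"""
    corrected = [g + r for g, r in zip(grad, residual)]
    sent = top_k_compress(corrected, k)
    new_residual = [c - s for c, s in zip(corrected, sent)]
    return sent, new_residual
```

摘要指出的问题正在于此：非IID数据下`new_residual`可能长期不衰减，造成早期梯度失配；SA-PEF通过步进校正加速这一残差的收缩。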


【2】FedRD: Reducing Divergences for Generalized Federated Learning via Heterogeneity-aware Parameter Guidance
标题:FedRD:通过异质性感知参数引导减少广义联邦学习的分歧
链接:https://arxiv.org/abs/2601.20397

作者:Kaile Wang,Jiannong Cao,Yu Yang,Xiaoyin Li,Mingjin Zhang
备注:Accepted by ICASSP 2026
摘要:异构联邦学习(HFL)旨在确保不同实体之间有效和隐私保护的协作。由于新加入的客户需要进行重大调整和额外的培训,以与现有系统保持一致,因此将联邦学习模型推广到异构数据下看不见的客户的问题变得越来越重要。因此,我们强调两个未解决的挑战性问题,在联邦域推广:优化发散和性能发散。为了解决上述挑战,我们提出了FedRD,这是一种新颖的异质性感知联邦学习算法,它协同利用参数引导的全局泛化聚合和局部去偏分类来减少分歧,旨在为参与和看不见的客户端获得最佳的全局模型。在公共多域数据集上进行的大量实验表明,我们的方法在解决这一特定问题方面比竞争基线具有显著的性能优势。
摘要:Heterogeneous federated learning (HFL) aims to ensure effective and privacy-preserving collaboration among different entities. As newly joined clients require significant adjustments and additional training to align with the existing system, the problem of generalizing federated learning models to unseen clients under heterogeneous data has become progressively crucial. Consequently, we highlight two unsolved challenging issues in federated domain generalization: Optimization Divergence and Performance Divergence. To tackle the above challenges, we propose FedRD, a novel heterogeneity-aware federated learning algorithm that collaboratively utilizes parameter-guided global generalization aggregation and local debiased classification to reduce divergences, aiming to obtain an optimal global model for participating and unseen clients. Extensive experiments on public multi-domain datasets demonstrate that our approach exhibits a substantial performance advantage over competing baselines in addressing this specific problem.


【3】DecHW: Heterogeneous Decentralized Federated Learning Exploiting Second-Order Information
标题:DecHW:利用二阶信息的异构去中心化联邦学习
链接:https://arxiv.org/abs/2601.19938

作者:Adnan Ahmad,Chiara Boldrini,Lorenzo Valerio,Andrea Passarella,Marco Conti
备注:Funding: SoBigDatait (PNRR IR0000013), FAIR (PNRR PE00000013), RESTART (PNRR PE00000001)
摘要:分散式联合学习(DFL)是一种无服务器协作机器学习范式,其中设备直接与相邻设备协作以交换模型信息以学习广义模型。然而,个体体验的变化和不同级别的设备交互导致跨设备的数据和模型初始化异质性。这种异质性在设备之间留下局部模型参数的变化,导致收敛较慢。本文通过明确解决局部模型中参数水平变化的证据可信度来解决数据和模型的异质性。引入了一种新的聚合方法,捕捉这些参数的变化,在当地的模型,并执行强大的聚集邻域局部更新。具体地说,共识权重是通过对局部模型在其局部数据集上的二阶信息进行近似来生成的。这些权重用于缩放邻域更新,然后将其聚合为全局邻域表示。在计算机视觉任务的广泛实验中,所提出的方法在降低通信成本的情况下显示出局部模型的强大泛化能力。
摘要:Decentralized Federated Learning (DFL) is a serverless collaborative machine learning paradigm where devices collaborate directly with neighbouring devices to exchange model information for learning a generalized model. However, variations in individual experiences and different levels of device interactions lead to data and model initialization heterogeneities across devices. Such heterogeneities leave variations in local model parameters across devices that leads to slower convergence. This paper tackles the data and model heterogeneity by explicitly addressing the parameter level varying evidential credence across local models. A novel aggregation approach is introduced that captures these parameter variations in local models and performs robust aggregation of neighbourhood local updates. Specifically, consensus weights are generated via approximation of second-order information of local models on their local datasets. These weights are utilized to scale neighbourhood updates before aggregating them into global neighbourhood representation. In extensive experiments with computer vision tasks, the proposed approach shows strong generalizability of local models at reduced communication costs.
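摘要中"通过近似局部模型的二阶信息生成共识权重"可以用对角Fisher加权的逐参数聚合来示意。下面是一个示意性Python草图(对角近似与函数名均为此处假设，非论文的精确算法)：

```python
def fisher_weighted_aggregate(params, fishers, eps=1e-8):
    """按参数逐维聚合邻居模型, 权重取各模型在本地数据上估计的
    (对角)Fisher信息: 本地证据越强的参数在共识中占比越大。
    params/fishers: 等长向量的列表, 每个元素对应一个邻居。"""
    dim = len(params[0])
    agg = []
    for j in range(dim):
        num = sum(f[j] * p[j] for p, f in zip(params, fishers))
        den = sum(f[j] for f in fishers) + eps  # eps防止除零
        agg.append(num / den)
    return agg
```

与朴素平均相比，这种加权让每个参数维度由对其最"有把握"的邻居主导，从而吸收数据与初始化异质性带来的参数级差异。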


推理|分析|理解|解释(19篇)

【1】Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning
标题:通过失败前缀条件化在饱和问题上训练推理模型
链接:https://arxiv.org/abs/2601.20829

作者:Minwu Kim,Safal Shrestha,Keith Ross
备注:16 pages
摘要:带有可验证奖励的强化学习(RLVR)大大提高了大型语言模型(LLM)的推理能力，但随着问题变得饱和，训练往往会停滞。我们发现核心挑战在于信息性失败难以触及:学习信号存在，但在标准rollout中很少被遇到。为了解决这个问题，我们提出了失败前缀条件化(failure-prefix conditioning)，一种从饱和问题中学习的简单而有效的方法。我们的方法不是从原始问题开始，而是以来自罕见错误推理轨迹的前缀为条件进行训练来重新分配探索，从而让模型暴露于容易失败的状态。我们观察到，失败前缀条件化带来的性能增益与在中等难度问题上训练相当，同时保持了token效率。此外，我们分析了模型的鲁棒性，发现我们的方法减少了误导性失败前缀下的性能下降，尽管在坚持正确的早期推理方面有轻微的权衡。最后，我们证明了一种在训练过程中刷新失败前缀的迭代方法，可在性能趋于平稳后解锁额外的收益。总的来说，我们的研究结果表明，失败前缀条件化为延长饱和问题上的RLVR训练提供了一条有效途径。
摘要:Reinforcement Learning with Verifiable Rewards (RLVR) has substantially improved the reasoning abilities of large language models (LLMs), yet training often stalls as problems become saturated. We identify the core challenge as the poor accessibility of informative failures: learning signals exist but are rarely encountered during standard rollouts. To address this, we propose failure-prefix conditioning, a simple and effective method for learning from saturated problems. Rather than starting from the original question, our approach reallocates exploration by conditioning training on prefixes derived from rare incorrect reasoning trajectories, thereby exposing the model to failure-prone states. We observe that failure-prefix conditioning yields performance gains matching those of training on medium-difficulty problems, while preserving token efficiency. Furthermore, we analyze the model's robustness, finding that our method reduces performance degradation under misleading failure prefixes, albeit with a mild trade-off in adherence to correct early reasoning. Finally, we demonstrate that an iterative approach, which refreshes failure prefixes during training, unlocks additional gains after performance plateaus. Overall, our results suggest that failure-prefix conditioning offers an effective pathway to extend RLVR training on saturated problems.
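"以失败前缀为条件"在数据构造上就是把错误轨迹截断后拼接在原问题之后，作为新的rollout起点。下面是一个示意性Python草图(按token比例截断只是此处的假设，论文的具体截断规则可能不同)：

```python
def build_failure_prefix_prompts(question, incorrect_traces, frac=0.5):
    """把罕见的错误推理轨迹截断为前缀并拼接在原问题之后,
    得到以失败前缀为条件的训练起点。frac为保留的token比例。"""
    prompts = []
    for trace in incorrect_traces:
        toks = trace.split()
        prefix = " ".join(toks[: max(1, int(len(toks) * frac))])
        prompts.append(question + "\n" + prefix)
    return prompts
```

在这些条件化起点上继续rollout，模型便会集中探索易失败状态；摘要所述的迭代变体则在训练过程中不断用新的错误轨迹刷新这批前缀。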


【2】GNN Explanations that do not Explain and How to find Them
标题:无法解释的GNN解释以及如何找到它们
链接:https://arxiv.org/abs/2601.20815

作者:Steve Azzolin,Stefano Teso,Bruno Lepri,Andrea Passerini,Sagar Malhotra
摘要:自解释图神经网络(SE-GNN)提供的解释对于理解模型的内部工作机制以及识别敏感属性的潜在滥用至关重要。虽然最近的工作已经指出这些解释可能是次优且具有误导性的，但对其失败案例尚缺乏系统刻画。在这项工作中，我们发现了SE-GNN解释的一个关键失败模式:解释可能与SE-GNN推断标签的方式完全无关。我们表明，一方面，许多SE-GNN可以在产生这些退化解释的同时实现最优真实风险;另一方面，大多数忠实度指标可能无法识别这些失败模式。我们的实证分析表明，退化解释既可以被恶意植入(使攻击者得以隐藏对敏感属性的使用)，也可以自然出现，这突出了可靠审计的必要性。为了解决这个问题，我们引入了一种新的忠实度指标，无论在恶意还是自然设置下，都能可靠地将退化解释标记为不忠实。我们的代码可以在补充材料中找到。
摘要 :Explanations provided by Self-explainable Graph Neural Networks (SE-GNNs) are fundamental for understanding the model's inner workings and for identifying potential misuse of sensitive attributes. Although recent works have highlighted that these explanations can be suboptimal and potentially misleading, a characterization of their failure cases is unavailable. In this work, we identify a critical failure of SE-GNN explanations: explanations can be unambiguously unrelated to how the SE-GNNs infer labels. We show that, on the one hand, many SE-GNNs can achieve optimal true risk while producing these degenerate explanations, and on the other, most faithfulness metrics can fail to identify these failure modes. Our empirical analysis reveals that degenerate explanations can be maliciously planted (allowing an attacker to hide the use of sensitive attributes) and can also emerge naturally, highlighting the need for reliable auditing. To address this, we introduce a novel faithfulness metric that reliably marks degenerate explanations as unfaithful, in both malicious and natural settings. Our code is available in the supplemental.


【3】Optimal Transport Group Counterfactual Explanations
标题:最优运输群反事实解释
链接:https://arxiv.org/abs/2601.20692

作者:Enrique Valero-Leal,Bernd Bischl,Pedro Larrañaga,Concha Bielza,Giuseppe Casalicchio
摘要:组反事实解释寻找一组反事实实例来对比地解释一组输入实例。然而，现有的方法要么(i)仅针对固定的组优化反事实，而不能推广到新的组成员，(ii)为了可处理性而严格依赖强模型假设(例如线性)，或/和(iii)难以控制反事实组的几何失真。相反，我们学习一个显式的最优传输映射，无需重新优化即可把任何组成员实例映射到其反事实，并最小化组的总传输成本。这使得可以用更少的参数进行泛化，从而更容易解释共同的可操作追索方案。对于线性分类器，我们证明了表示群体反事实的函数可以通过数学优化导出，并识别了底层的凸优化类型(QP、QCQP等)。实验表明，它们能够准确泛化、保持组的几何形状，并且与基线方法相比仅产生可忽略的额外传输成本。在无法利用模型线性的情形下，我们的方法也显著优于基线。
摘要:Group counterfactual explanations find a set of counterfactual instances to explain a group of input instances contrastively. However, existing methods either (i) optimize counterfactuals only for a fixed group and do not generalize to new group members, (ii) strictly rely on strong model assumptions (e.g., linearity) for tractability or/and (iii) poorly control the counterfactual group geometry distortion. We instead learn an explicit optimal transport map that sends any group instance to its counterfactual without re-optimization, minimizing the group's total transport cost. This enables generalization with fewer parameters, making it easier to interpret the common actionable recourse. For linear classifiers, we prove that functions representing group counterfactuals are derived via mathematical optimization, identifying the underlying convex optimization type (QP, QCQP, ...). Experiments show that they accurately generalize, preserve group geometry and incur only negligible additional transport cost compared to baseline methods. If model linearity cannot be exploited, our approach also significantly outperforms the baselines.
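"整组共享一个显式映射"在线性分类器情形下最简单的实例是一个共同的平移：沿权重方向把整组平移到决策边界的另一侧。下面是一个示意性Python草图(最坏成员决定平移量的规则为此处假设；论文的最优传输目标比这个草图更一般)：

```python
def group_counterfactual_shift(X, w, b, margin=0.0):
    """对整个组施加同一个平移 t = c * w, 使所有成员都满足
    线性分类器 w·x + b >= margin; c由得分最低的成员决定。"""
    norm2 = sum(wi * wi for wi in w)
    worst = min(sum(wi * xi for wi, xi in zip(w, x)) + b for x in X)
    c = max(0.0, (margin - worst) / norm2)
    return [[xi + c * wi for xi, wi in zip(x, w)] for x in X]
```

由于映射是单一平移，它天然保持组的几何形状，并可直接应用于训练时未见过的新组成员，这正是摘要强调的泛化性质。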


【4】WFR-MFM: One-Step Inference for Dynamic Unbalanced Optimal Transport
标题:WFR-MFM:动态不平衡最优运输的一步推理
链接:https://arxiv.org/abs/2601.20606

作者:Xinyu Wang,Ruoyu Wang,Qiangwei Peng,Peijie Zhou,Tiejun Li
摘要:从有限的观察重建动态演化是单细胞生物学的一个基本挑战,其中动态不平衡最优传输为耦合的输运和质量变化建模提供了一个原则性框架。然而,现有的方法依赖于推理时的轨迹模拟,使推理成为可扩展应用的一个关键瓶颈。在这项工作中,我们提出了一个用于不平衡流匹配的平均流框架,使用平均速度场和质量增长场概括任意时间区间内的输运和质量增长动态,从而无需轨迹模拟即可实现快速一步生成。为了求解Wasserstein-Fisher-Rao几何下的动态不平衡最优传输,我们进一步在此框架上开发了Wasserstein-Fisher-Rao平均流匹配(WFR-MFM)。在合成和真实的单细胞RNA测序数据集上,WFR-MFM实现了比一系列现有基线快若干个数量级的推理,同时保持了高预测准确性,并在具有数千个条件的大型合成数据集上实现了高效的扰动响应预测。
摘要:Reconstructing dynamical evolution from limited observations is a fundamental challenge in single-cell biology, where dynamic unbalanced optimal transport provides a principled framework for modeling coupled transport and mass variation. However, existing approaches rely on trajectory simulation at inference time, making inference a key bottleneck for scalable applications. In this work, we propose a mean-flow framework for unbalanced flow matching that summarizes both transport and mass-growth dynamics over arbitrary time intervals using mean velocity and mass-growth fields, enabling fast one-step generation without trajectory simulation. To solve dynamic unbalanced optimal transport under the Wasserstein-Fisher-Rao geometry, we further build on this framework to develop Wasserstein-Fisher-Rao Mean Flow Matching (WFR-MFM). Across synthetic and real single-cell RNA sequencing datasets, WFR-MFM achieves orders-of-magnitude faster inference than a range of existing baselines while maintaining high predictive accuracy, and enables efficient perturbation response prediction on large synthetic datasets with thousands of conditions.


【5】An explainable framework for the relationship between dementia and glucose metabolism patterns
标题:痴呆症与糖代谢模式之间关系的可解释框架
链接:https://arxiv.org/abs/2601.20480

作者:C. Vázquez-García,F. J. Martínez-Murcia,F. Segovia Román,A. Forte,J. Ramírez,I. Illán,A. Hernández-Segura,C. Jiménez-Mesa,Juan M. Górriz
摘要:由于复杂的非线性关系,高维神经影像数据对评估神经退行性疾病提出了挑战。变分自动编码器(VAE)可以将扫描编码到低维潜在空间中,捕获疾病相关特征。我们提出了一个半监督的VAE框架,具有灵活的相似性正则化项,将选定的潜在变量与痴呆进展的临床或生物标志物指标相结合。这允许使相似性度量和监督变量适应特定目标或可用数据。我们使用阿尔茨海默病神经影像学倡议(ADNI)的PET扫描演示了这种方法,引导第一个潜在维度与认知评分保持一致。使用这个有监督的潜在变量,我们生成了不同认知障碍水平的平均重建。基于体素的GLM分析揭示了关键区域(主要是海马体)和主要静息状态网络(特别是默认模式和中央执行网络)的新陈代谢减少。其余的潜变量编码仿射变换和强度变化,捕获诸如受试者间变异性和场地效应的混淆。我们的框架有效地提取了与已建立的阿尔茨海默氏症生物标志物相一致的疾病相关模式,为研究神经退行性疾病的进展提供了一个可解释和可适应的工具。
摘要:High-dimensional neuroimaging data presents challenges for assessing neurodegenerative diseases due to complex non-linear relationships. Variational Autoencoders (VAEs) can encode scans into lower-dimensional latent spaces capturing disease-relevant features. We propose a semi-supervised VAE framework with a flexible similarity regularization term that aligns selected latent variables with clinical or biomarker measures of dementia progression. This allows adapting the similarity metric and supervised variables to specific goals or available data. We demonstrate the approach using PET scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI), guiding the first latent dimension to align with a cognitive score. Using this supervised latent variable, we generate average reconstructions across levels of cognitive impairment. Voxel-wise GLM analysis reveals reduced metabolism in key regions, mainly the hippocampus, and within major Resting State Networks, particularly the Default Mode and Central Executive Networks. The remaining latent variables encode affine transformations and intensity variations, capturing confounds such as inter-subject variability and site effects. Our framework effectively extracts disease-related patterns aligned with established Alzheimer's biomarkers, offering an interpretable and adaptable tool for studying neurodegenerative progression.


【6】Fair Recourse for All: Ensuring Individual and Group Fairness in Counterfactual Explanations
标题:所有人的公平追索权:确保反事实解释中的个人和群体公平
链接:https://arxiv.org/abs/2601.20449

作者:Fatima Ezzeddine,Obaida Ammar,Silvia Giordano,Omran Ayoub
摘要:可解释人工智能(XAI)对于提高机器学习(ML)模型的透明度变得越来越重要。在各种XAI技术中,反事实解释(CF)具有关键作用,因为它们能够说明输入特征的变化如何改变ML模型的决策,从而为用户提供可操作的追索权。确保具有可比属性的个人和属于不同受保护群体的个人(例如,人口统计学)获得类似的和可采取行动的追索权选择是可信和公平决策的关键。在这项工作中,我们直接通过关注公平CF的生成来解决这一挑战。具体来说,我们首先在以下层面定义和形式化公平性:1)个人公平性,确保相似的个人获得相似的CF,2)群体公平性,确保不同受保护群体之间的公平CF,以及3)混合公平性,兼顾个人和更广泛的群体公平性。我们将该问题形式化为一个优化任务,并提出了一种新的模型无关的、基于强化学习的方法来生成同时满足个人和群体两个层面公平性约束的CF,而这两个目标通常被视为正交。作为公平性指标,我们扩展了通常用于审计ML模型的现有指标,例如个人和群体之间的平等追索权选择和平等有效性。我们在三个基准数据集上评估了我们的方法,结果表明,它有效地确保了个人和群体的公平性,同时在接近性和合理性方面保留了生成的CF的质量,并分别量化了不同层面的公平性成本。我们的工作开启了关于混合公平性及其对XAI乃至CF之外的作用和影响的更广泛讨论。
摘要:Explainable Artificial Intelligence (XAI) is becoming increasingly essential for enhancing the transparency of machine learning (ML) models. Among the various XAI techniques, counterfactual explanations (CFs) hold a pivotal role due to their ability to illustrate how changes in input features can alter an ML model's decision, thereby offering actionable recourse to users. Ensuring that individuals with comparable attributes and those belonging to different protected groups (e.g., demographic) receive similar and actionable recourse options is essential for trustworthy and fair decision-making. In this work, we address this challenge directly by focusing on the generation of fair CFs. Specifically, we start by defining and formulating fairness at: 1) individual fairness, ensuring that similar individuals receive similar CFs, 2) group fairness, ensuring equitable CFs across different protected groups and 3) hybrid fairness, which accounts for both individual and broader group-level fairness. We formulate the problem as an optimization task and propose a novel model-agnostic, reinforcement learning based approach to generate CFs that satisfy fairness constraints at both the individual and group levels, two objectives that are usually treated as orthogonal. As fairness metrics, we extend existing metrics commonly used for auditing ML models, such as equal choice of recourse and equal effectiveness across individuals and groups. We evaluate our approach on three benchmark datasets, showing that it effectively ensures individual and group fairness while preserving the quality of the generated CFs in terms of proximity and plausibility, and quantify the cost of fairness in the different levels separately. Our work opens a broader discussion on hybrid fairness and its role and implications for XAI and beyond CFs.
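As a concrete illustration of group-level recourse auditing, the hypothetical snippet below compares the mean recourse cost across protected groups; the L1 cost function and toy data are assumptions for illustration, not the specific metrics proposed in the paper:

```python
import numpy as np

def recourse_cost(X, X_cf):
    # per-individual L1 cost of moving to the counterfactual
    return np.abs(X_cf - X).sum(axis=1)

def group_fairness_gap(X, X_cf, groups):
    # absolute gap in mean recourse cost between protected groups
    cost = recourse_cost(X, X_cf)
    means = [cost[groups == g].mean() for g in np.unique(groups)]
    return max(means) - min(means)

X    = np.array([[0., 0.], [0., 1.], [2., 2.]])   # three individuals
X_cf = np.array([[1., 0.], [0., 2.], [2., 4.]])   # their counterfactuals
g    = np.array([0, 0, 1])                        # protected group labels
print(group_fairness_gap(X, X_cf, g))  # group 1 pays 1.0 more on average
```

A gap near zero would indicate group-fair recourse under this cost; a large gap flags that one group must move farther to flip the decision.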


【7】Multimodal Multi-Agent Ransomware Analysis Using AutoGen
标题:使用AutoGen的多模式多代理勒索软件分析
链接:https://arxiv.org/abs/2601.20346

作者:Asifullah Khan,Aimen Wadood,Mubashar Iqbal,Umme Zahoora
备注:45 pages, 11 figures and 10 tables
摘要:勒索软件已成为全球范围内造成重大财务损失和运营中断的最严重的网络安全威胁之一。传统的检测方法,如静态分析、启发式扫描和行为分析,在单独使用时往往力不从心。为了解决这些局限性,本文提出了专为勒索软件分类而设计的多模态多代理勒索软件分析框架。所提出的多模态多代理体系结构结合了来自静态、动态和网络来源的信息。每种数据类型都由使用基于自动编码器的特征提取的专用代理处理。然后,这些表示通过融合代理集成,融合后的表示再由基于Transformer的分类器使用,以识别特定的勒索软件家族。各代理通过代理间反馈机制交互,通过抑制低置信度信息来迭代细化特征表示。该框架在包含数千个勒索软件和良性样本的大规模数据集上进行了评估,并在勒索软件数据集上进行了多个实验。它优于单一模态和非自适应融合基线,在家族分类的Macro-F1上实现高达0.936的改进并减少校准误差。在100个训练轮次中,代理反馈回路显示出稳定的单调收敛,在不微调语言模型的情况下,代理质量绝对提升超过+0.75,最终综合得分约为0.88。零日勒索软件检测仍然因家族而异,取决于多态性和模态中断。置信度感知的弃权机制通过支持保守和可信的决策而不是强制分类来实现可靠的真实世界部署。研究结果表明,所提出的方法为改善现实世界的勒索软件防御系统提供了一条实用有效的途径。
摘要:Ransomware has become one of the most serious cybersecurity threats, causing major financial losses and operational disruptions worldwide. Traditional detection methods such as static analysis, heuristic scanning, and behavioral analysis often fall short when used alone. To address these limitations, this paper presents a multimodal multi-agent ransomware analysis framework designed for ransomware classification. The proposed multimodal multi-agent architecture combines information from static, dynamic, and network sources. Each data type is handled by a specialized agent that uses autoencoder-based feature extraction. These representations are then integrated through a fusion agent. The fused representation is subsequently used by a transformer-based classifier, which identifies the specific ransomware family. The agents interact through an inter-agent feedback mechanism that iteratively refines feature representations by suppressing low-confidence information. The framework was evaluated on large-scale datasets containing thousands of ransomware and benign samples, and multiple experiments were conducted on the ransomware dataset. It outperforms single-modality and non-adaptive fusion baselines, achieving an improvement of up to 0.936 in Macro-F1 for family classification and reducing calibration error. Over 100 epochs, the agentic feedback loop displays stable monotonic convergence, leading to an over +0.75 absolute improvement in agent quality and a final composite score of around 0.88 without fine-tuning of the language models. Zero-day ransomware detection remains family-dependent, subject to polymorphism and modality disruptions. Confidence-aware abstention enables reliable real-world deployment by favoring conservative and trustworthy decisions over forced classification. The findings indicate that the proposed approach provides a practical and effective path toward improving real-world ransomware defense systems.


【8】Beyond Speedup -- Utilizing KV Cache for Sampling and Reasoning
标题:超越加速--利用KV缓存进行采样和推理
链接:https://arxiv.org/abs/2601.20326

作者:Zeyu Xing,Xing Li,Hui-Ling Zhen,Mingxuan Yuan,Sinno Jialin Pan
备注:Accepted by ICLR26
摘要:KV缓存通常仅用于加速自回归解码,但其中编码的上下文信息可以在没有额外成本的情况下重新用于下游任务。我们建议将KV缓存视为轻量级表示,无需重新计算或存储完整的隐藏状态。尽管比专用嵌入弱,KV派生的表示被证明足以用于两个关键应用:\textbf{(i)Chain-of-Embedding},其中它们在Llama-3.1-8B-Instruct和Qwen2-7B-Instruct上实现具有竞争力或更优的性能;和\textbf{(ii)快/慢思维切换},它们可以在Qwen3-8B和DeepSeek-R1-Distil-Qwen-14B上实现自适应推理,减少令牌生成高达$5.7\times$,同时将准确性损失降至最低。我们的研究结果将KV缓存确立为一种免费且有效的采样与推理基底,为LLM推理中的表示重用开辟了新方向。代码:https://github.com/cmd2001/ICLR2026_KV-Embedding.
摘要:KV caches, typically used only to speed up autoregressive decoding, encode contextual information that can be reused for downstream tasks at no extra cost. We propose treating the KV cache as a lightweight representation, eliminating the need to recompute or store full hidden states. Despite being weaker than dedicated embeddings, KV-derived representations are shown to be sufficient for two key applications: \textbf{(i) Chain-of-Embedding}, where they achieve competitive or superior performance on Llama-3.1-8B-Instruct and Qwen2-7B-Instruct; and \textbf{(ii) Fast/Slow Thinking Switching}, where they enable adaptive reasoning on Qwen3-8B and DeepSeek-R1-Distil-Qwen-14B, reducing token generation by up to $5.7\times$ with minimal accuracy loss. Our findings establish KV caches as a free, effective substrate for sampling and reasoning, opening new directions for representation reuse in LLM inference. Code: https://github.com/cmd2001/ICLR2026_KV-Embedding.


【9】Going NUTS with ADVI: Exploring various Bayesian Inference techniques with Facebook Prophet
标题:用ADVI玩转NUTS:使用Facebook Prophet探索各种贝叶斯推理技术
链接:https://arxiv.org/abs/2601.20120

作者:Jovan Krajevski,Biljana Tojtovska Ribarski
备注:6 pages, 5 figures, Published in Proceedings of the 22nd International Conference for Informatics and Information Technologies - CiiT 2025
摘要 :自推出以来,Facebook Prophet吸引了经典统计学家和贝叶斯统计社区的积极关注。该模型提供了两种内置的推理方法:使用L-BFGS-B算法的最大后验估计,以及通过无U形转弯采样器(NUTS)进行的马尔可夫链蒙特卡罗(MCMC)采样。在使用Prophet的贝叶斯推理探索各种时间序列预测问题时,我们遇到了无法应用默认提供的其他推理技术的限制。此外,Facebook Prophet流畅的API设计被证明不够灵活,无法实现我们的自定义建模思想。为了解决这些缺点,我们在PyMC中开发了Prophet模型的完整重新实现,这使我们能够扩展基础模型并评估和比较多种贝叶斯推理方法。在本文中,我们介绍了基于PyMC的实现,并详细分析了不同贝叶斯推理技术的实现。我们考虑充分MCMC技术,MAP估计和变分推理技术的时间序列预测问题。我们详细讨论了采样方法,收敛诊断,预测指标,以及它们的计算效率和检测可能的问题,将在我们未来的工作中解决。
摘要:Since its introduction, Facebook Prophet has attracted positive attention from both classical statisticians and the Bayesian statistics community. The model provides two built-in inference methods: maximum a posteriori estimation using the L-BFGS-B algorithm, and Markov Chain Monte Carlo (MCMC) sampling via the No-U-Turn Sampler (NUTS). While exploring various time-series forecasting problems using Bayesian inference with Prophet, we encountered limitations stemming from the inability to apply alternative inference techniques beyond those provided by default. Additionally, the fluent API design of Facebook Prophet proved insufficiently flexible for implementing our custom modeling ideas. To address these shortcomings, we developed a complete reimplementation of the Prophet model in PyMC, which enables us to extend the base model and evaluate and compare multiple Bayesian inference methods. In this paper, we present our PyMC-based implementation and analyze in detail the implementation of different Bayesian inference techniques. We consider full MCMC techniques, MAP estimation and Variational inference techniques on a time-series forecasting problem. We discuss in details the sampling approach, convergence diagnostics, forecasting metrics as well as their computational efficiency and detect possible issues which will be addressed in our future work.


【10】Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis
标题:通过对比分析对代码环境中的奖励黑客检测进行基准测试
链接:https://arxiv.org/abs/2601.20103

作者:Darshan Deshpande,Anand Kannappan,Rebecca Qian
备注:Dataset: https://huggingface.co/datasets/PatronusAI/trace-dataset
摘要:用于代码生成的强化学习的最新进展使得强大的环境对于防止奖励黑客至关重要。随着LLM越来越多地作为基于代码的RL的评估者,它们检测奖励黑客的能力仍然没有得到充分的研究。在本文中,我们提出了一种跨越54个类别的奖励漏洞的新分类,并介绍了TRACE(代码环境中的奖励异常测试),这是一个合成策划并经人工验证的基准测试,包含517个测试轨迹。与以前在孤立分类场景中评估奖励黑客检测的工作不同,我们将这些评估与TRACE上一个更现实的对比异常检测设置进行对比。我们的实验表明,模型在对比设置中比在孤立分类设置中更有效地捕获奖励黑客,具有最高推理模式的GPT-5.2在TRACE上实现了63%的最佳检测率,高于孤立设置中的45%。基于这一认识,我们证明了最先进的模型在语义上下文化的奖励黑客上比在语法上下文化的奖励黑客上明显更加吃力。我们进一步对模型行为进行定性分析,消融研究表明,良性与被黑轨迹的比率和分析聚类大小会显著影响检测性能。我们发布了基准测试和评估工具,以使社区能够扩展TRACE并评估他们的模型。
摘要:Recent advances in reinforcement learning for code generation have made robust environments essential to prevent reward hacking. As LLMs increasingly serve as evaluators in code-based RL, their ability to detect reward hacking remains understudied. In this paper, we propose a novel taxonomy of reward exploits spanning across 54 categories and introduce TRACE (Testing Reward Anomalies in Code Environments), a synthetically curated and human-verified benchmark containing 517 testing trajectories. Unlike prior work that evaluates reward hack detection in isolated classification scenarios, we contrast these evaluations with a more realistic, contrastive anomaly detection setup on TRACE. Our experiments reveal that models capture reward hacks more effectively in contrastive settings than in isolated classification settings, with GPT-5.2 with highest reasoning mode achieving the best detection rate at 63%, up from 45% in isolated settings on TRACE. Building on this insight, we demonstrate that state-of-the-art models struggle significantly more with semantically contextualized reward hacks compared to syntactically contextualized ones. We further conduct qualitative analyses of model behaviors, as well as ablation studies showing that the ratio of benign to hacked trajectories and analysis cluster sizes substantially impact detection performance. We release the benchmark and evaluation harness to enable the community to expand TRACE and evaluate their models.


【11】Techno-economic optimization of a heat-pipe microreactor, part II: multi-objective optimization analysis
标题:热管微反应器的技术经济优化,第二部分:多目标优化分析
链接:https://arxiv.org/abs/2601.20079

作者:Paul Seurin,Dean Price
摘要:热管微反应堆(HPMR)是一种紧凑型和可运输的核电系统,具有固有的安全性,非常适合部署在访问受限且普遍依赖昂贵化石燃料的偏远地区。在之前的工作中,我们开发了一个设计优化框架,通过代理建模和基于强化学习(RL)的优化将技术经济考虑纳入其中,仅关注通过使用自下而上的成本估计方法来最大限度地降低电力的平准化成本(LCOE)。在这项研究中,我们将该框架扩展到多目标优化,使用帕累托包络增强强化学习(PEARL)算法。目标包括在满足安全和运行约束的前提下,最小化棒积分峰化因子($F_{\Delta h}$)和LCOE。我们评估了三种成本方案:(1)高成本的轴向和转鼓反射层,(2)低成本的轴向反射层,以及(3)低成本的轴向和转鼓反射层。我们的研究结果表明,减少固体慢化剂半径、燃料棒栅距和转鼓涂层角,同时增加燃料高度,可有效降低$F_{\Delta h}$。在所有三种情况下,始终出现四个优化LCOE的关键策略:(1)在轴向反射层成本高昂时最大限度地减少其贡献,(2)减少对控制转鼓的依赖,(3)用价格与石墨相当的轴向反射层材料替代昂贵的三结构各向同性(TRISO)燃料,以及(4)最大限度地提高燃料燃耗。虽然PEARL在不同的设计方案中展示了权衡取舍的前景,但代理模型预测和全阶模拟之间的差异仍然存在。预计将通过约束放宽和代理模型的开发进一步改善,这是一个正在进行的调查领域。
摘要:Heat-pipe microreactors (HPMRs) are compact and transportable nuclear power systems exhibiting inherent safety, well-suited for deployment in remote regions where access is limited and reliance on costly fossil fuels is prevalent. In prior work, we developed a design optimization framework that incorporates techno-economic considerations through surrogate modeling and reinforcement learning (RL)-based optimization, focusing solely on minimizing the levelized cost of electricity (LCOE) by using a bottom-up cost estimation approach. In this study, we extend that framework to a multi-objective optimization that uses the Pareto Envelope Augmented with Reinforcement Learning (PEARL) algorithm. The objectives include minimizing both the rod-integrated peaking factor ($F_{\Delta h}$) and LCOE -- subject to safety and operational constraints. We evaluate three cost scenarios: (1) a high-cost axial and drum reflectors, (2) a low-cost axial reflector, and (3) low-cost axial and drum reflectors. Our findings indicate that reducing the solid moderator radius, pin pitch, and drum coating angle -- all while increasing the fuel height -- effectively lowers $F_{\Delta h}$. Across all three scenarios, four key strategies consistently emerged for optimizing LCOE: (1) minimizing the axial reflector contribution when costly, (2) reducing control drum reliance, (3) substituting expensive tri-structural isotropic (TRISO) fuel with axial reflector material priced at the level of graphite, and (4) maximizing fuel burnup. While PEARL demonstrates promise in navigating trade-offs across diverse design scenarios, discrepancies between surrogate model predictions and full-order simulations remain. Further improvements are anticipated through constraint relaxation and surrogate development, constituting an ongoing area of investigation.
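Multi-objective optimizers such as PEARL return a Pareto envelope rather than a single design. A minimal sketch of extracting the non-dominated set from candidate ($F_{\Delta h}$, LCOE) pairs, with toy values assumed for illustration:

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated subset when all objectives are minimized."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        # p is dominated if some candidate is <= in every objective
        # and strictly < in at least one
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return pts[keep]

# hypothetical (peaking factor, LCOE) design candidates
cands = [(1.4, 90.0), (1.2, 95.0), (1.5, 92.0), (1.3, 88.0)]
print(pareto_front(cands))  # only the trade-off designs survive
```

Designs like (1.4, 90.0) are pruned because (1.3, 88.0) is better in both objectives; the survivors are exactly the trade-offs a decision-maker must weigh.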


【12】MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference
标题:MeanCache:从瞬时速度到平均速度加速流匹配推理
链接:https://arxiv.org/abs/2601.19961

作者:Huanlin Gao,Ping Chen,Fuyuan Shi,Ruijia Wu,Li YanTao,Qiang Hui,Yuren You,Ting Lu,Chao Tan,Shaoan Zhao,Zhaoxiang Liu,Fang Zhao,Kai Wang,Shiguo Lian
摘要:我们提出了MeanCache,一个无需训练的缓存框架,用于高效的流匹配推理。现有的缓存方法减少了冗余计算,但通常依赖于瞬时速度信息(例如,特征缓存),这在高加速比下常常导致严重的轨迹偏差和误差积累。MeanCache引入了平均速度的视角:通过利用缓存的雅可比-向量积(JVP)从瞬时速度构造区间平均速度,它有效地减轻了局部误差积累。为了进一步提高缓存时机选择和JVP重用的稳定性,我们开发了一个轨迹稳定性调度策略作为实用工具,采用预算约束下的峰值抑制最短路径来确定调度方案。在FLUX.1、Qwen-Image和HunyuanVideo上的实验表明,MeanCache分别实现了4.12倍、4.56倍和3.59倍的加速,同时在生成质量上始终优于最先进的缓存基线。我们相信这种简单而有效的方法为流匹配推理提供了一个新的视角,并将激发对商业规模生成模型中稳定性驱动加速的进一步探索。
摘要:We present MeanCache, a training-free caching framework for efficient Flow Matching inference. Existing caching methods reduce redundant computation but typically rely on instantaneous velocity information (e.g., feature caching), which often leads to severe trajectory deviations and error accumulation under high acceleration ratios. MeanCache introduces an average-velocity perspective: by leveraging cached Jacobian--vector products (JVP) to construct interval average velocities from instantaneous velocities, it effectively mitigates local error accumulation. To further improve cache timing and JVP reuse stability, we develop a trajectory-stability scheduling strategy as a practical tool, employing a Peak-Suppressed Shortest Path under budget constraints to determine the schedule. Experiments on FLUX.1, Qwen-Image, and HunyuanVideo demonstrate that MeanCache achieves 4.12X, 4.56X, and 3.59X acceleration, respectively, while consistently outperforming state-of-the-art caching baselines in generation quality. We believe this simple yet effective approach provides a new perspective for Flow Matching inference and will inspire further exploration of stability-driven acceleration in commercial-scale generative models.
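The average-velocity idea can be seen on a scalar toy ODE: a single step with the interval-average velocity lands exactly on the true endpoint, whereas a single step with the instantaneous velocity accumulates error. This sketch only illustrates the principle; MeanCache estimates the average velocity from cached JVPs rather than from the closed-form solution assumed here:

```python
import numpy as np

# instantaneous velocity of dx/dt = a*x, with exact solution x(t) = x0*exp(a*t)
a, x0, T = -1.0, 2.0, 1.0
v = lambda x: a * x

# one big Euler step using the *instantaneous* velocity at x0
x_euler = x0 + T * v(x0)

# one step using the *average* velocity u = (x(T) - x0)/T, the quantity a
# mean-flow model is trained to predict directly
x_exact = x0 * np.exp(a * T)
u = (x_exact - x0) / T
x_mean = x0 + T * u

print(x_euler, x_mean, x_exact)  # the mean-velocity step is exact by construction
```

The instantaneous-velocity step overshoots badly over a long interval, which is exactly the trajectory-deviation failure mode of feature caching at high acceleration ratios.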


【13】Continuous-Flow Data-Rate-Aware CNN Inference on FPGA
标题:基于FPGA的连续流数据速率感知CNN推理
链接:https://arxiv.org/abs/2601.19940

作者:Tobias Habermann,Michael Mecik,Zhenyu Wang,César David Vera,Martin Kumm,Mario Garrido
摘要:在深度学习推理的硬件加速器中,数据流实现提供低延迟和高吞吐量能力。在这些架构中,每个神经元都映射到一个专用的硬件单元,使它们非常适合现场可编程门阵列(FPGA)实现。由于其简单性,以前的展开实现主要集中在全连接网络上,尽管众所周知,卷积神经网络(CNN)需要更少的计算即可获得相同的精度。观察CNN中的数据流可以发现,经过池化层和步幅大于1的卷积层后,输出端的数据数量相对于输入端减少。这种数据减少严重影响了完全并行实现中的数据速率,除非得到正确处理,否则会使硬件单元严重未被充分利用。这项工作通过分析CNN的数据流来解决这个问题,并提出了一种设计数据速率感知的连续流CNN架构的新方法。所提出的方法通过交织低数据速率信号和共享硬件单元,以及使用正确的并行化来实现完全并行实现的吞吐量,确保了接近100%的高硬件利用率。结果表明,可以节省大量的算术逻辑,这允许在单个FPGA上以高吞吐量实现像MobileNet这样的复杂CNN。
摘要:Among hardware accelerators for deep-learning inference, data flow implementations offer low latency and high throughput capabilities. In these architectures, each neuron is mapped to a dedicated hardware unit, making them well-suited for field-programmable gate array (FPGA) implementation. Previous unrolled implementations mostly focus on fully connected networks because of their simplicity, although it is well known that convolutional neural networks (CNNs) require fewer computations for the same accuracy. When observing the data flow in CNNs, pooling layers and convolutional layers with a stride larger than one, the number of data at their output is reduced with respect to their input. This data reduction strongly affects the data rate in a fully parallel implementation, making hardware units heavily underutilized unless it is handled properly. This work addresses this issue by analyzing the data flow of CNNs and presents a novel approach to designing data-rate-aware, continuous-flow CNN architectures. The proposed approach ensures a high hardware utilization close to 100% by interleaving low data rate signals and sharing hardware units, as well as using the right parallelization to achieve the throughput of a fully parallel implementation. The results show that a significant amount of the arithmetic logic can be saved, which allows implementing complex CNNs like MobileNet on a single FPGA with high throughput.
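The data-rate bookkeeping behind this approach is simple: each stride-s (or s x s pooling) stage divides the output sample rate by s*s, so that many low-rate signals can be interleaved onto one shared hardware unit. A toy calculation, with the layer strides assumed for illustration:

```python
# Relative sample rate after each downsampling stage of a fully parallel
# CNN pipeline, and how many signals must be interleaved per shared unit
# to keep hardware utilization near 100%.

def utilization(strides):
    rate = 1.0
    report = []
    for s in strides:
        rate /= s * s        # a stride-s stage cuts the data rate by s*s
        report.append(rate)
    return report

rates = utilization([2, 2])          # two 2x2-stride stages in sequence
print(rates)                          # relative data rate after each stage
print([int(1 / r) for r in rates])   # signals to interleave per shared unit
```

After two 2x2 stages the data rate is 1/16 of the input rate, so sixteen such signals can time-share a single arithmetic unit without losing throughput, which is where the reported logic savings come from.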


【14】Text-to-State Mapping for Non-Resolution Reasoning: The Contradiction-Preservation Principle
标题:非归结推理的文本到状态映射:矛盾保留原则
链接:https://arxiv.org/abs/2601.19933

作者:Kei Saito
备注:17 pages, 3 figures, 5 tables. Sequel to arXiv:2512.13478
摘要:非归结推理(NRR)提供了一个形式化框架来维持语义歧义,而不是迫使解释过早坍缩。虽然基础架构建立了保持歧义计算的状态空间和运算符,但自然语言如何映射到这些数学结构的关键问题仍然是开放的。本文介绍了文本到状态的映射函数φ,它将语言输入转换为NRR框架内的叠加状态。我们形式化了矛盾保留原则,该原则要求真正有歧义的表达式在其状态表示中保持非零熵,并开发了使用现有大型语言模型作为解释生成器的提取协议。在跨越词汇、结构和语用歧义的68个测试句子上的经验验证表明,我们的映射对歧义输入实现了平均香农熵H(S) = 1.087比特,而基线单一解释方法产生H(S) = 0.000。该框架提供了原始文本和NRR运算符所作用的形式状态空间之间缺失的算法桥梁,从而在语言模型推理中实现架构层面的坍缩延迟。
摘要:Non-Resolution Reasoning (NRR) provides a formal framework for maintaining semantic ambiguity rather than forcing premature interpretation collapse. While the foundational architecture establishes state spaces and operators for ambiguity-preserving computation, the critical question of how natural language maps to these mathematical structures remains open. This paper introduces the text-to-state mapping function φ that transforms linguistic input into superposition states within the NRR framework. We formalize the Contradiction-Preservation Principle, which requires that genuinely ambiguous expressions maintain non-zero entropy in their state representations, and develop extraction protocols using existing Large Language Models as interpretation generators. Empirical validation across 68 test sentences spanning lexical, structural, and pragmatic ambiguity demonstrates that our mapping achieves mean Shannon entropy H(S) = 1.087 bits for ambiguous inputs while baseline single-interpretation approaches yield H(S) = 0.000. The framework provides the missing algorithmic bridge between raw text and the formal state spaces on which NRR operators act, enabling architectural collapse deferment in language model inference.
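The entropy criterion in the Contradiction-Preservation Principle reduces to Shannon entropy over the interpretation weights of a superposition state; the weights below are hypothetical examples, not outputs of the paper's mapping φ:

```python
import numpy as np

def shannon_entropy(probs):
    """H(S) in bits over the interpretation weights of a state."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0] / p.sum()               # normalize, drop zero-mass readings
    return float(-(p * np.log2(p)).sum())

# a genuinely ambiguous sentence: two readings kept in superposition
print(shannon_entropy([0.5, 0.5]))   # 1.0 bit of preserved ambiguity
# single-interpretation collapse: entropy vanishes
print(shannon_entropy([1.0]))        # 0.0, the baseline behavior
```

The reported averages (H(S) = 1.087 bits vs. 0.000) correspond to these two regimes: states that keep multiple weighted readings versus states collapsed to one interpretation.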


【15】Demystifying Prediction Powered Inference
标题:揭开预测推理的神秘面纱
链接:https://arxiv.org/abs/2601.20819

作者:Yilin Song,Dan M. Kluger,Harsh Parikh,Tian Gu
摘要:机器学习预测越来越多地用于补充生物医学研究、环境科学和社会科学等领域不完整或测量成本高昂的结果。然而,将预测视为真实标签会引入偏差,而忽略它们则会浪费有价值的信息。预测驱动推理(PPI)提供了一个原则性框架,该框架利用来自大型未标记数据集的预测来提高统计效率,同时通过使用较小的标记子集进行显式偏差校正来保持有效的推断。尽管有潜力,但PPI变体的不断增加以及它们之间的微妙区别使从业者难以确定何时以及如何负责任地应用这些方法。本文通过将PPI的理论基础、方法学扩展、与现有统计文献的联系以及诊断工具整合到一个统一的实践工作流程中来揭开PPI的神秘面纱。使用Mosaiks的房价数据,我们表明PPI变体比完整案例分析产生更紧的置信区间,但双重使用(double-dipping),即重用训练数据进行推断,会导致反保守的置信区间和覆盖率。在非随机缺失机制下,所有方法,包括仅使用标记数据的经典推断,都会产生有偏估计。我们提供了一个决策流程图,将假设违反与适当的PPI变体联系起来,提供了一个精选方法的汇总表,以及用于评估核心假设的实用诊断策略。通过将PPI框架化为通用配方而不是单个估计器,这项工作将方法创新和应用实践联系起来,帮助研究人员负责任地将预测整合到有效的推断中。
摘要:Machine learning predictions are increasingly used to supplement incomplete or costly-to-measure outcomes in fields such as biomedical research, environmental science, and social science. However, treating predictions as ground truth introduces bias while ignoring them wastes valuable information. Prediction-Powered Inference (PPI) offers a principled framework that leverages predictions from large unlabeled datasets to improve statistical efficiency while maintaining valid inference through explicit bias correction using a smaller labeled subset. Despite its potential, the growing PPI variants and the subtle distinctions between them have made it challenging for practitioners to determine when and how to apply these methods responsibly. This paper demystifies PPI by synthesizing its theoretical foundations, methodological extensions, connections to existing statistics literature, and diagnostic tools into a unified practical workflow. Using the Mosaiks housing price data, we show that PPI variants produce tighter confidence intervals than complete-case analysis, but that double-dipping, i.e. reusing training data for inference, leads to anti-conservative confidence intervals and coverages. Under missing-not-at-random mechanisms, all methods, including classical inference using only labeled data, yield biased estimates. We provide a decision flowchart linking assumption violations to appropriate PPI variants, a summary table of selective methods, and practical diagnostic strategies for evaluating core assumptions. By framing PPI as a general recipe rather than a single estimator, this work bridges methodological innovation and applied practice, helping researchers responsibly integrate predictions into valid inference.
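The core PPI recipe for a mean is compact enough to sketch: average the model's predictions on the large unlabeled set, then correct with the "rectifier", the mean prediction error measured on the small labeled set. The synthetic bias and sample sizes below are assumptions for illustration only:

```python
import numpy as np

def ppi_mean(y_lab, f_lab, f_unlab):
    """Prediction-powered estimate of E[Y]: model mean on the big unlabeled
    set plus a bias-correcting rectifier from the small labeled set."""
    return f_unlab.mean() + (y_lab - f_lab).mean()

rng = np.random.default_rng(0)
truth = 3.0
y_lab   = truth + rng.normal(0, 1, 50)            # few gold labels
f_lab   = y_lab + 0.5                             # predictions biased by +0.5
f_unlab = truth + 0.5 + rng.normal(0, 1, 5000)    # many biased predictions

print(ppi_mean(y_lab, f_lab, f_unlab))  # rectifier cancels the bias
```

Treating the raw predictions as ground truth would report roughly 3.5; the rectifier pulls the estimate back near the true value of 3.0 while still using all 5000 unlabeled points for variance reduction.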


【16】Incorporating data drift to perform survival analysis on credit risk
标题:纳入数据漂移以对信用风险进行生存分析
链接:https://arxiv.org/abs/2601.20533

作者:Jianwei Peng,Stefan Lessmann
备注:27 pages, 2 figures
摘要:生存分析已经成为以时变协变量对信用风险中的违约时间进行建模的标准方法。与大多数隐含假设平稳数据生成过程的现有方法不同,在实践中,抵押贷款组合会因借款人行为、宏观经济条件、政策制度等的变化而暴露于各种形式的数据漂移。本研究调查数据漂移对基于生存分析的信用风险模型的影响,并提出了一个动态联合建模框架,以提高非平稳环境下的鲁棒性。该模型将源自余额动态的纵向行为标记与离散时间风险率形式相结合,并结合地标(landmark)独热编码和保序校准。在房地美(Freddie Mac)的抵押贷款数据集上模拟和分析了三种类型的数据漂移(突然、增量和重复)。实验和相应的证据表明,所提出的基于地标的联合模型在所有漂移场景的区分度和校准方面始终优于经典生存模型、基于树的漂移自适应学习器和梯度提升方法,这证实了我们模型设计的优越性。
摘要:Survival analysis has become a standard approach for modelling time to default by time-varying covariates in credit risk. Unlike most existing methods that implicitly assume a stationary data-generating process, in practice, mortgage portfolios are exposed to various forms of data drift caused by changing borrower behaviour, macroeconomic conditions, policy regimes and so on. This study investigates the impact of data drift on survival-based credit risk models and proposes a dynamic joint modelling framework to improve robustness under non-stationary environments. The proposed model integrates a longitudinal behavioural marker derived from balance dynamics with a discrete-time hazard formulation, combined with landmark one-hot encoding and isotonic calibration. Three types of data drift (sudden, incremental and recurring) are simulated and analysed on mortgage loan datasets from Freddie Mac. Experiments and corresponding evidence show that the proposed landmark-based joint model consistently outperforms classical survival models, tree-based drift-adaptive learners and gradient boosting methods in terms of discrimination and calibration across all drift scenarios, which confirms the superiority of our model design.
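In a discrete-time hazard formulation, per-period default hazards convert to a survival curve by a running product; the hazard values below are hypothetical, standing in for the outputs of a fitted hazard model:

```python
import numpy as np

def survival_curve(hazards):
    """Discrete-time survival: S(t) = prod over k<=t of (1 - h_k),
    where h_k is the conditional default probability in period k."""
    return np.cumprod(1.0 - np.asarray(hazards, dtype=float))

# hypothetical monthly default hazards for one mortgage
h = [0.01, 0.02, 0.05]
S = survival_curve(h)
print(S)  # probability of surviving past each month
```

Calibration methods such as isotonic regression act on the per-period hazards h_k before this product is taken, which is why miscalibrated hazards compound into badly miscalibrated survival curves over long horizons.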


【17】Convergence Analysis of Randomized Subspace Normalized SGD under Heavy-Tailed Noise
标题:重尾噪音下随机子空间规范化的SGD收敛性分析
链接:https://arxiv.org/abs/2601.20399

作者:Gaku Omiya,Pierre-Louis Poirion,Akiko Takeda
备注:41 pages
摘要:随机子空间方法降低了每次迭代的成本;然而,在非凸优化中,大多数分析都是基于期望的,即使在亚高斯噪声下,高概率界仍然很少。我们首先证明了随机子空间SGD(RS-SGD)在亚高斯噪声下具有高概率收敛界,实现了与先前基于期望的结果相同阶的预言机复杂度。受现代机器学习中重尾梯度普遍存在的启发,我们提出了随机子空间归一化SGD(RS-NSGD),它将方向归一化集成到子空间更新中。假设噪声具有有界的$p$阶矩,我们建立了期望意义和高概率意义下的收敛保证,并表明RS-NSGD可以实现比全维归一化SGD更好的预言机复杂度。
摘要:Randomized subspace methods reduce per-iteration cost; however, in nonconvex optimization, most analyses are expectation-based, and high-probability bounds remain scarce even under sub-Gaussian noise. We first prove that randomized subspace SGD (RS-SGD) admits a high-probability convergence bound under sub-Gaussian noise, achieving the same order of oracle complexity as prior in-expectation results. Motivated by the prevalence of heavy-tailed gradients in modern machine learning, we then propose randomized subspace normalized SGD (RS-NSGD), which integrates direction normalization into subspace updates. Assuming the noise has bounded $p$-th moments, we establish both in-expectation and high-probability convergence guarantees, and show that RS-NSGD can achieve better oracle complexity than full-dimensional normalized SGD.
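A minimal sketch of the RS-NSGD update, with the Gaussian projection scaling and step size as illustrative assumptions: the gradient is projected onto a random low-dimensional subspace and then normalized, so heavy-tailed gradient spikes cannot blow up the step:

```python
import numpy as np

def rs_nsgd_step(x, grad, lr, k, rng):
    """One randomized-subspace normalized SGD update (illustrative):
    project the gradient onto a random k-dim subspace, then normalize."""
    d = x.size
    P = rng.standard_normal((d, k)) / np.sqrt(k)   # random projection basis
    g_sub = P @ (P.T @ grad)                       # gradient in the subspace
    norm = np.linalg.norm(g_sub)
    return x - lr * g_sub / norm if norm > 0 else x

rng = np.random.default_rng(0)
x = np.ones(10)
g = 100.0 * np.ones(10)          # heavy-tailed spike: an enormous gradient
x_new = rs_nsgd_step(x, g, lr=0.1, k=3, rng=rng)
print(np.linalg.norm(x_new - x))  # step length is capped at lr by normalization
```

Normalization is what buys robustness under bounded p-th moments: however large the stochastic gradient, the update moves exactly lr in the chosen subspace direction.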


【18】Do Whitepaper Claims Predict Market Behavior? Evidence from Cryptocurrency Factor Analysis
标题:白皮书主张能预测市场行为吗?加密货币因素分析的证据
链接:https://arxiv.org/abs/2601.20336

作者:Murad Farzulla
备注:35 pages, 8 figures, 10 tables. Code available at https://github.com/studiofarzulla/tensor-defi
摘要:加密货币项目通过白皮书阐明价值主张,提出有关功能和技术能力的主张。本研究调查这些叙述是否与观察到的市场行为一致。我们构建了一个将零样本NLP分类(BART-MNLI)与CP张量分解相结合的管道,以比较三个空间:(1)来自24份白皮书、跨越10个语义类别的主张矩阵,(2)两年每小时数据下49种资产的市场统计量,以及(3)张量分解得到的潜在因素(秩为2,解释了92.45%的方差)。使用Procrustes旋转和Tucker一致性系数,我们在23个共同实体上检验对齐程度。   结果显示对齐较弱:主张-统计量(phi=0.341,p=0.332),主张-因素(phi=0.077,p=0.747)和统计量-因素(phi=0.197,p<0.001)。统计量-因素的显著性验证了我们的方法,确认该管道在关系存在时能够检测到关系。使用DeBERTa-v3进行的模型间验证产生了32%的精确一致性,但前3位一致性达到67%。横截面分析揭示了异质性贡献:NEAR、MKR、ATOM显示出正向对齐,而ENS、UNI、Bitcoin分歧最大。排除比特币证实了结果不是由市场主导地位驱动的。   我们将研究结果解释为白皮书叙述与市场因素结构之间的弱对齐。有限的统计功效(n=23)使我们无法区分弱对齐与无对齐,但可以有把握地拒绝强对齐(phi>=0.70)。文中还讨论了对叙事经济学和投资分析的影响。
摘要:Cryptocurrency projects articulate value propositions through whitepapers, making claims about functionality and technical capabilities. This study investigates whether these narratives align with observed market behavior. We construct a pipeline combining zero-shot NLP classification (BART-MNLI) with CP tensor decomposition to compare three spaces: (1) a claims matrix from 24 whitepapers across 10 semantic categories, (2) market statistics for 49 assets over two years of hourly data, and (3) latent factors from tensor decomposition (rank 2, 92.45% variance explained). Using Procrustes rotation and Tucker's congruence coefficient, we test alignment across 23 common entities.   Results show weak alignment: claims-statistics (phi=0.341, p=0.332), claims-factors (phi=0.077, p=0.747), and statistics-factors (phi=0.197, p<0.001). The statistics-factors significance validates our methodology, confirming the pipeline detects relationships when present. Inter-model validation with DeBERTa-v3 yields 32% exact agreement but 67% top-3 agreement. Cross-sectional analysis reveals heterogeneous contributions: NEAR, MKR, ATOM show positive alignment while ENS, UNI, Bitcoin diverge most. Excluding Bitcoin confirms results are not driven by market dominance.   We interpret findings as weak alignment between whitepaper narratives and market factor structure. Limited power (n=23) precludes distinguishing weak from no alignment, but strong alignment (phi>=0.70) can be confidently rejected. Implications for narrative economics and investment analysis are discussed.
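Tucker's congruence coefficient used in the alignment tests is a cosine-style similarity between loading vectors, invariant to positive rescaling; the toy vectors below are hypothetical:

```python
import numpy as np

def tucker_congruence(x, y):
    """Tucker's congruence coefficient phi between two loading vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))

a = np.array([0.8, 0.6, 0.1])
print(tucker_congruence(a, 2 * a))              # 1.0: same shape, scale-free
print(tucker_congruence(a, [-0.1, 0.2, 0.9]))   # weak alignment, near 0
```

Values near 1 indicate the same factor shape up to scale, which is why phi >= 0.70 is a conventional threshold for "strong" congruence; the paper's observed phi values (0.077 to 0.341) sit well below it.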


【19】Explainable deep learning reveals the physical mechanisms behind the turbulent kinetic energy equation
标题:可解释的深度学习揭示了湍流动能方程背后的物理机制
链接:https://arxiv.org/abs/2601.20052

作者:Francisco Alcántara-Ávila,Andrés Cremades,Sergio Hoyas,Ricardo Vinuesa
备注:6 pages, 5 figures, 1 appendix
摘要:在这项工作中,我们使用可解释深度学习(XDL)研究了控制湍流动能输运的物理机制。采用基于SHapley加性解释(SHAP)的XDL模型,在摩擦雷诺数$Re_\tau = 125$的条件下,识别并通过渗流分析提取对槽道湍流动能收支各项演化具有高重要性的结构。结果表明,重要结构主要位于近壁区,且更常与扫掠型事件相关联。在粘性层中,与生产和粘性扩散相关的SHAP结构几乎完全包含在与耗散相关的SHAP结构中,揭示了近壁湍流的清晰层次组织。在外层中,这种层次结构瓦解,只有速度-压力-梯度相关性和湍流输运SHAP结构仍然存在,其空间重合度约为$60\%$。最后,我们表明,湍流中经典研究的相干结构都无法代表整个通道中湍流动能收支各项背后的机制。这些结果揭示了耗散是近壁湍流的主要组织机制,它将生产和粘性扩散约束在单一结构层次内,而该层次在外层瓦解。
摘要:In this work, we investigate the physical mechanisms governing turbulent kinetic energy transport using explainable deep learning (XDL). An XDL model based on SHapley Additive exPlanations (SHAP) is used to identify and percolate high-importance structures for the evolution of the turbulent kinetic energy budget terms of a turbulent channel flow at a friction Reynolds number of $Re_\tau = 125$. The results show that the important structures are predominantly located in the near-wall region and are more frequently associated with sweep-type events. In the viscous layer, the SHAP structures relevant for production and viscous diffusion are almost entirely contained within those relevant for dissipation, revealing a clear hierarchical organization of near-wall turbulence. In the outer layer, this hierarchical organization breaks down and only velocity-pressure-gradient correlation and turbulent transport SHAP structures remain, with a moderate spatial coincidence of approximately $60\%$. Finally, we show that none of the coherent structures classically studied in turbulence are capable of representing the mechanisms behind the various terms of the turbulent kinetic energy budget throughout the channel. These results reveal dissipation as the dominant organizing mechanism of near-wall turbulence, constraining production and viscous diffusion within a single structural hierarchy that breaks down in the outer layer.
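Since the analysis builds on SHAP, a small self-contained sketch of exact Shapley values for a toy 3-feature value function may help; this is the generic game-theoretic definition, not the paper's flow-field model, and `weights` is an invented example:

```python
from itertools import combinations
from math import comb

def shapley_values(f, n):
    """Exact Shapley values of value function f over players {0, ..., n-1}."""
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                weight = 1.0 / (n * comb(n - 1, k))
                phi[i] += weight * (f(set(S) | {i}) - f(set(S)))
    return phi

# Toy additive value function: feature i contributes weights[i] on its own.
weights = [2.0, -1.0, 0.5]
f = lambda S: sum(weights[i] for i in S)
phi = shapley_values(f, 3)
print(phi)  # for an additive game each Shapley value equals the feature's weight
```

The efficiency property (the values sum to f of the full set) is what lets SHAP attribute a budget term's evolution across flow structures.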


检测相关(3篇)

【1】Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability
标题:通过对数概率的各向异性检测并缓解扩散模型中的记忆化
链接:https://arxiv.org/abs/2601.20642

作者:Rohan Asthana,Vasileios Belagiannis
备注:Accepted at ICLR 2026
摘要:Diffusion-based image generative models produce high-fidelity images through iterative denoising but remain vulnerable to memorization, where they unintentionally reproduce exact copies or parts of training images. Recent memorization detection methods are primarily based on the norm of score difference as indicators of memorization. We prove that such norm-based metrics are mainly effective under the assumption of isotropic log-probability distributions, which generally holds at high or medium noise levels. In contrast, analyzing the anisotropic regime reveals that memorized samples exhibit strong angular alignment between the guidance vector and unconditional scores in the low-noise setting. Through these insights, we develop a memorization detection metric by integrating isotropic norm and anisotropic alignment. Our detection metric can be computed directly on pure noise inputs via two conditional and unconditional forward passes, eliminating the need for costly denoising steps. Detection experiments on Stable Diffusion v1.4 and v2 show that our metric outperforms existing denoising-free detection methods while being at least approximately 5x faster than the previous best approach. Finally, we demonstrate the effectiveness of our approach by utilizing a mitigation strategy that adapts memorized prompts based on our developed metric.
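The combined detection signal can be pictured as follows; this is our own simplified numerical sketch, and the weighting `w` and the toy score vectors are assumptions rather than the paper's exact metric:

```python
import numpy as np

def memorization_score(score_cond, score_uncond, w=1.0):
    """Combine the guidance-vector norm (isotropic cue) with its cosine
    alignment to the unconditional score (anisotropic cue)."""
    g = score_cond - score_uncond                # classifier-free guidance vector
    norm_term = np.linalg.norm(g)
    cos_term = g @ score_uncond / (
        np.linalg.norm(g) * np.linalg.norm(score_uncond) + 1e-12)
    return norm_term + w * cos_term

rng = np.random.default_rng(1)
s_u = rng.normal(size=64)                        # toy unconditional score
memorized = memorization_score(3.0 * s_u, s_u)   # large, aligned guidance
benign = memorization_score(s_u + 0.1 * rng.normal(size=64), s_u)
print(memorized > benign)                        # True: memorized sample scores higher
```

Both terms are computable from two forward passes on pure noise, which is what makes the metric denoising-free.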


【2】Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data
标题:Gap-K%:测量Top-1预测差距以检测预训练数据
链接:https://arxiv.org/abs/2601.19936

作者:Minseo Kwak,Jaehyung Kim
备注:under review; 13 pages
摘要:The opacity of massive pretraining corpora in Large Language Models (LLMs) raises significant privacy and copyright concerns, making pretraining data detection a critical challenge. Existing state-of-the-art methods typically rely on token likelihoods, yet they often overlook the divergence from the model's top-1 prediction and local correlation between adjacent tokens. In this work, we propose Gap-K%, a novel pretraining data detection method grounded in the optimization dynamics of LLM pretraining. By analyzing the next-token prediction objective, we observe that discrepancies between the model's top-1 prediction and the target token induce strong gradient signals, which are explicitly penalized during training. Motivated by this, Gap-K% leverages the log probability gap between the top-1 predicted token and the target token, incorporating a sliding window strategy to capture local correlations and mitigate token-level fluctuations. Extensive experiments on the WikiMIA and MIMIR benchmarks demonstrate that Gap-K% achieves state-of-the-art performance, consistently outperforming prior baselines across various model sizes and input lengths.
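A minimal NumPy sketch of a Gap-K%-style score; the window size, the choice of k, and aggregating over the largest windows are our assumptions, and the paper's exact recipe may differ:

```python
import numpy as np

def gap_k_percent(logp_top1, logp_target, window=5, k=20.0):
    """Per-token gap between top-1 and target log-probabilities, smoothed by
    a sliding window, then averaged over the k% largest windows."""
    gaps = np.asarray(logp_top1) - np.asarray(logp_target)  # >= 0 by definition
    smoothed = np.convolve(gaps, np.ones(window) / window, mode="valid")
    n_top = max(1, int(len(smoothed) * k / 100.0))
    return float(np.sort(smoothed)[-n_top:].mean())

# Seen (memorized) text: the target token is usually the top-1 prediction,
# so gaps stay small; unseen text shows much larger gaps.
seen = gap_k_percent(np.full(20, -0.1), np.full(20, -0.12))
unseen = gap_k_percent(np.full(20, -0.1), np.full(20, -3.0))
print(seen < unseen)  # True: thresholding the score separates the two
```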


【3】VSCOUT: A Hybrid Variational Autoencoder Approach to Outlier Detection in High-Dimensional Retrospective Monitoring
标题:VSCOUT:一种用于高维回顾性监测中离群点检测的混合变分自动编码器方法
链接:https://arxiv.org/abs/2601.20830

作者:Waldyn G. Martinez
摘要:Modern industrial and service processes generate high-dimensional, non-Gaussian, and contamination-prone data that challenge the foundational assumptions of classical Statistical Process Control (SPC). Heavy tails, multimodality, nonlinear dependencies, and sparse special-cause observations can distort baseline estimation, mask true anomalies, and prevent reliable identification of an in-control (IC) reference set. To address these challenges, we introduce VSCOUT, a distribution-free framework designed specifically for retrospective (Phase I) monitoring in high-dimensional settings. VSCOUT combines an Automatic Relevance Determination Variational Autoencoder (ARD-VAE) architecture with ensemble-based latent outlier filtering and changepoint detection. The ARD prior isolates the most informative latent dimensions, while the ensemble and changepoint filters identify pointwise and structural contamination within the determined latent space. A second-stage retraining step removes flagged observations and re-estimates the latent structure using only the retained inliers, mitigating masking and stabilizing the IC latent manifold. This two-stage refinement produces a clean and reliable IC baseline suitable for subsequent Phase II deployment. Extensive experiments across benchmark datasets demonstrate that VSCOUT achieves superior sensitivity to special-cause structure while maintaining controlled false alarms, outperforming classical SPC procedures, robust estimators, and modern machine-learning baselines. Its scalability, distributional flexibility, and resilience to complex contamination patterns position VSCOUT as a practical and effective method for retrospective modeling and anomaly detection in AI-enabled environments.


分类|识别(6篇)

【1】CoBA: Integrated Deep Learning Model for Reliable Low-Altitude UAV Classification in mmWave Radio Networks
标题:CoBA:毫米波无线电网络中可靠的低空无人机分类的集成深度学习模型
链接:https://arxiv.org/abs/2601.20605

作者:Junaid Sajid,Ivo Müürsepp,Luca Reggiani,Davide Scazzoli,Federico Francesco Luigi Mariani,Maurizio Magarini,Rizwan Ahmad,Muhammad Mahtab Alam
备注:6 Pages, This paper has been accepted for publication at the IEEE International Conference on Communications (ICC) 2026
摘要:Uncrewed Aerial Vehicles (UAVs) are increasingly used in civilian and industrial applications, making secure low-altitude operations crucial. In dense mmWave environments, accurately classifying low-altitude UAVs as either inside authorized or restricted airspaces remains challenging, requiring models that handle complex propagation and signal variability. This paper proposes a deep learning model, referred to as CoBA, which stands for integrated Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), and Attention, and which leverages Fifth Generation (5G) millimeter-wave (mmWave) radio measurements to classify UAV operations in authorized and restricted airspaces at low altitude. The proposed CoBA model integrates convolutional, bidirectional recurrent, and attention layers to capture both spatial and temporal patterns in UAV radio measurements. To validate the model, a dedicated dataset is collected using the 5G mmWave network at TalTech, with controlled low altitude UAV flights in authorized and restricted scenarios. The model is evaluated against conventional ML models and a fingerprinting-based benchmark. Experimental results show that CoBA achieves superior accuracy, significantly outperforming all baseline models and demonstrating its potential for reliable and regulated UAV airspace monitoring.


【2】Cheap2Rich: A Multi-Fidelity Framework for Data Assimilation and System Identification of Multiscale Physics -- Rotating Detonation Engines
标题:Cheap2Rich:多尺度物理数据同化与系统辨识的多保真框架--旋转爆震发动机
链接:https://arxiv.org/abs/2601.20295

作者:Yuxuan Bao,Jan Zajac,Megan Powers,Venkat Raman,J. Nathan Kutz
摘要:Bridging the sim2real gap between computationally inexpensive models and complex physical systems remains a central challenge in machine learning applications to engineering problems, particularly in multi-scale settings where reduced-order models typically capture only dominant dynamics. In this work, we present Cheap2Rich, a multi-scale data assimilation framework that reconstructs high-fidelity state spaces from sparse sensor histories by combining a fast low-fidelity prior with learned, interpretable discrepancy corrections. We demonstrate the performance on rotating detonation engines (RDEs), a challenging class of systems that couple detonation-front propagation with injector-driven unsteadiness, mixing, and stiff chemistry across disparate scales. Our approach successfully reconstructs high-fidelity RDE states from sparse measurements while isolating physically meaningful discrepancy dynamics associated with injector-driven effects. The results highlight a general multi-fidelity framework for data assimilation and system identification in complex multi-scale systems, enabling rapid design exploration and real-time monitoring and control while providing interpretable discrepancy dynamics. Code for this project is available at: github.com/kro0l1k/Cheap2Rich.
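The low-fidelity-plus-learned-correction idea can be sketched in a few lines; here the "discrepancy model" is just a polynomial least-squares fit on synthetic curves (our stand-in, not the paper's interpretable model):

```python
import numpy as np

def fit_discrepancy(x, y_low, y_high, degree=2):
    """Fit a correction g with y_high ~ y_low + g(x) by least squares."""
    return np.polynomial.polynomial.Polynomial.fit(x, y_high - y_low, degree)

x = np.linspace(0.0, 1.0, 50)
y_low = np.sin(2 * np.pi * x)                 # cheap low-fidelity prediction
y_high = y_low + 0.3 * x**2 - 0.1 * x         # "true" high-fidelity response
g = fit_discrepancy(x, y_low, y_high)
err = np.max(np.abs(y_low + g(x) - y_high))
print(err < 1e-8)  # True: the quadratic discrepancy is recovered exactly
```

In the paper the correction is learned from sparse sensor histories and carries physical meaning (injector-driven effects); the structure `high = low + learned residual` is the same.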


【3】Causal-Driven Feature Evaluation for Cross-Domain Image Classification
标题:因果驱动的跨域图像分类特征评估
链接:https://arxiv.org/abs/2601.20176

作者:Chen Cheng,Ang Li
备注:Preprint
摘要:Out-of-distribution (OOD) generalization remains a fundamental challenge in real-world classification, where test distributions often differ substantially from training data. Most existing approaches pursue domain-invariant representations, implicitly assuming that invariance implies reliability. However, features that are invariant across domains are not necessarily causally effective for prediction.   In this work, we revisit OOD classification from a causal perspective and propose to evaluate learned representations based on their necessity and sufficiency under distribution shift. We introduce an explicit segment-level framework that directly measures causal effectiveness across domains, providing a more faithful criterion than invariance alone.   Experiments on multi-domain benchmarks demonstrate consistent improvements in OOD performance, particularly under challenging domain shifts, highlighting the value of causal evaluation for robust generalization.


【4】Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers
标题:扰动诱导线性化:仅使用线性分类器构建不可学习的数据
链接:https://arxiv.org/abs/2601.19967

作者:Jinlin Liu,Wei Chen,Xiaojin Zhang
备注:This paper has been accepted to ICLR 2026
摘要:Collecting web data to train deep models has become increasingly common, raising concerns about unauthorized data usage. To mitigate this issue, unlearnable examples introduce imperceptible perturbations into data, preventing models from learning effectively. However, existing methods typically rely on deep neural networks as surrogate models for perturbation generation, resulting in significant computational costs. In this work, we propose Perturbation-Induced Linearization (PIL), a computationally efficient yet effective method that generates perturbations using only linear surrogate models. PIL achieves comparable or better performance than existing surrogate-based methods while reducing computational time dramatically. We further reveal a key mechanism underlying unlearnable examples: inducing linearization to deep models, which explains why PIL can achieve competitive results in a very short time. Beyond this, we provide an analysis about the property of unlearnable examples under percentage-based partial perturbation. Our work not only provides a practical approach for data protection but also offers insights into what makes unlearnable examples effective.
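The core trick, generating error-minimizing perturbations from a purely linear surrogate, can be sketched as follows; the closed-form step along the surrogate's weight vector, the step size, and the data are all our assumptions:

```python
import numpy as np

def pil_perturb(X, y, w, eps):
    """Move each sample along +/- w so that a linear surrogate already finds
    the data trivially 'easy', leaving nothing useful to learn."""
    sign = np.where(y == 1, 1.0, -1.0)
    return X + eps * sign[:, None] * w[None, :] / np.linalg.norm(w)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)
w = rng.normal(size=10)              # weights of a (random) linear surrogate
sign = np.where(y == 1, 1.0, -1.0)

Xp = pil_perturb(X, y, w, eps=6.0)
clean_acc = (((X @ w) * sign) > 0).mean()     # ~0.5: w alone is uninformative
poison_acc = (((Xp @ w) * sign) > 0).mean()   # ~1.0: a trivial linear shortcut
print(clean_acc, poison_acc)
```

The injected linear shortcut is what a deep model latches onto instead of the true features, which is the "induced linearization" mechanism the abstract describes.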


【5】Classifier Calibration at Scale: An Empirical Study of Model-Agnostic Post-Hoc Methods
标题:大规模分类器校准:模型不可知事后方法的实证研究
链接:https://arxiv.org/abs/2601.19944

作者:Valery Manokhin,Daniel Grønhaug
备注:61 pages, 23 figures
摘要:We study model-agnostic post-hoc calibration methods intended to improve probabilistic predictions in supervised binary classification on real i.i.d. tabular data, with particular emphasis on conformal and Venn-based approaches that provide distribution-free validity guarantees under exchangeability. We benchmark 21 widely used classifiers, including linear models, SVMs, tree ensembles (CatBoost, XGBoost, LightGBM), and modern tabular neural and foundation models, on binary tasks from the TabArena-v0.1 suite using randomized, stratified five-fold cross-validation with a held-out test fold. Five calibrators (isotonic regression, Platt scaling, Beta calibration, Venn-Abers predictors, and Pearsonify) are trained on a separate calibration split and applied to test predictions. Calibration is evaluated using proper scoring rules (log-loss and Brier score) and diagnostic measures (Spiegelhalter's Z, ECE, and ECI), alongside discrimination (AUC-ROC) and standard classification metrics. Across tasks and architectures, Venn-Abers predictors achieve the largest average reductions in log-loss, followed closely by Beta calibration, while Platt scaling exhibits weaker and less consistent effects. Beta calibration improves log-loss most frequently across tasks, whereas Venn-Abers displays fewer instances of extreme degradation and slightly more instances of extreme improvement. Importantly, we find that commonly used calibration procedures, most notably Platt scaling and isotonic regression, can systematically degrade proper scoring performance for strong modern tabular models. Overall classification performance is often preserved, but calibration effects vary substantially across datasets and architectures, and no method dominates uniformly. In expectation, all methods except Pearsonify slightly increase accuracy, but the effect is marginal, with the largest expected gain about 0.008%.
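For readers unfamiliar with the simplest of these calibrators, here is a minimal NumPy sketch of Platt scaling, fitting p = sigmoid(a*s + b) by gradient descent on the logistic loss; the synthetic scores are invented, and production code would typically use a dedicated solver:

```python
import numpy as np

def platt_fit(scores, labels, lr=0.1, steps=3000):
    """Fit Platt scaling parameters (a, b) on held-out calibration data."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        g = p - labels                       # dLoss/dz for the logistic loss
        a -= lr * np.mean(g * scores)
        b -= lr * np.mean(g)
    return a, b

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)
# Mis-scaled raw margins: the Bayes-optimal slope for this mixture is 0.5.
scores = (2 * labels - 1) + 2.0 * rng.normal(size=500)
a, b = platt_fit(scores, labels)
p_cal = 1.0 / (1.0 + np.exp(-(a * scores + b)))
print(round(a, 2))  # close to the Bayes slope 0.5, up to sampling noise
```

The study's caution applies here too: a monotone rescaling like this preserves ranking (AUC) but can hurt proper scores when the raw model is already well calibrated.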


【6】Trigger Optimization and Event Classification for Dark Matter Searches in the CYGNO Experiment Using Machine Learning
标题:基于机器学习的CYGNO实验暗物质搜索中的触发优化与事件分类
链接:https://arxiv.org/abs/2601.20626

作者:F. D. Amaro,R. Antonietti,E. Baracchini,L. Benussi,C. Capoccia,M. Caponero,L. G. M. de Carvalho,G. Cavoto,I. A. Costa,A. Croce,M. D'Astolfo,G. D'Imperio,G. Dho,E. Di Marco,J. M. F. dos Santos,D. Fiorina,F. Iacoangeli,Z. Islam,E. Kemp,H. P. Lima,G. Maccarrone,R. D. P. Mano,D. J. G. Marques,G. Mazzitelli,P. Meloni,A. Messina,C. M. B. Monteiro,R. A. Nobrega,G. M. Oppedisano,I. F. Pains,E. Paoletti,F. Petrucci,S. Piacentini,D. Pierluigi,D. Pinci,F. Renga,A. Russo,G. Saviano,P. A. O. C. Silva,N. J. Spooner,R. Tesauro,S. Tomassini,D. Tozzi
备注:6 pages, 1 figure, 14th Young Researcher Meeting (YRM 2025)
摘要:The CYGNO experiment employs an optical-readout Time Projection Chamber (TPC) to search for rare low-energy interactions using finely resolved scintillation images. While the optical readout provides rich topological information, it produces large, sparse megapixel images that challenge real-time triggering, data reduction, and background discrimination.   We summarize two complementary machine-learning approaches developed within CYGNO. First, we present a fast and fully unsupervised strategy for online data reduction based on reconstruction-based anomaly detection. A convolutional autoencoder trained exclusively on pedestal images (i.e. frames acquired with GEM amplification disabled) learns the detector noise morphology and highlights particle-induced structures through localized reconstruction residuals, from which compact Regions of Interest (ROIs) are extracted. On real prototype data, the selected configuration retains (93.0 +/- 0.2)% of reconstructed signal intensity while discarding (97.8 +/- 0.1)% of the image area, with ~25 ms per-frame inference time on a consumer GPU.   Second, we report a weakly supervised application of the Classification Without Labels (CWoLa) framework to data acquired with an Americium--Beryllium neutron source. Using only mixed AmBe and standard datasets (no event-level labels), a convolutional classifier learns to identify nuclear-recoil-like topologies. The achieved performance approaches the theoretical limit imposed by the mixture composition and isolates a high-score population with compact, approximately circular morphologies consistent with nuclear recoils.
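The CWoLa idea, training on two mixed samples with no event-level labels, can be sketched with a per-bin density-ratio "classifier" on a 1-D toy feature; the Gaussian mixtures and fractions are invented:

```python
import numpy as np

def cwola_bin_scores(mix_a, mix_b, edges):
    """Per-bin probability that an event came from mixture A; by the CWoLa
    argument this is monotone in the signal/background likelihood ratio."""
    ha, _ = np.histogram(mix_a, bins=edges)
    hb, _ = np.histogram(mix_b, bins=edges)
    return ha / np.maximum(ha + hb, 1)

rng = np.random.default_rng(0)
sig = lambda n: rng.normal(2.0, 1.0, n)        # "nuclear recoil"-like feature
bkg = lambda n: rng.normal(0.0, 1.0, n)
mix_a = np.concatenate([sig(700), bkg(300)])   # signal-rich, labels unknown
mix_b = np.concatenate([sig(300), bkg(700)])   # signal-poor, labels unknown

edges = np.linspace(-4.0, 6.0, 21)
score = cwola_bin_scores(mix_a, mix_b, edges)
s_sig = score[np.searchsorted(edges, 2.0) - 1]  # bin near the signal peak
s_bkg = score[np.searchsorted(edges, 0.0) - 1]  # bin near the background peak
print(s_sig > s_bkg)  # True: the mixed-sample classifier isolates the signal
```

The achievable separation is capped by how different the two mixture fractions are, which is the "theoretical limit imposed by the mixture composition" the abstract mentions.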


表征(1篇)

【1】Implicit Hypothesis Testing and Divergence Preservation in Neural Network Representations
标题:神经网络表示中的隐式假设检验与散度保持
链接:https://arxiv.org/abs/2601.20477

作者:Kadircan Aksoy,Peter Jung,Protim Bhattacharjee
摘要:We study the supervised training dynamics of neural classifiers through the lens of binary hypothesis testing. We model classification as a set of binary tests between class-conditional distributions of representations and empirically show that, along training trajectories, well-generalizing networks increasingly align with Neyman-Pearson optimal decision rules via monotonic improvements in KL divergence that relate to error rate exponents. We finally discuss how this yields an explanation and possible training or regularization strategies for different classes of neural networks.
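The link between Neyman-Pearson testing and KL divergence that the abstract leans on can be made concrete for two Gaussians; this is a textbook illustration, not the paper's network experiment:

```python
import numpy as np

def kl_gauss(mu1, mu0, sigma):
    """KL( N(mu1, s^2) || N(mu0, s^2) ): Stein's type-II error exponent."""
    return (mu1 - mu0) ** 2 / (2.0 * sigma ** 2)

rng = np.random.default_rng(0)
mu0, mu1, sigma, n = 0.0, 1.0, 1.0, 36
# Neyman-Pearson rule: threshold the log-likelihood ratio of n i.i.d. draws.
llr = lambda x: np.sum((mu1 - mu0) * (x - (mu0 + mu1) / 2) / sigma**2, axis=1)
x1 = rng.normal(mu1, sigma, size=(5000, n))    # samples under H1
x0 = rng.normal(mu0, sigma, size=(5000, n))    # samples under H0
power = np.mean(llr(x1) > 0)
fpr = np.mean(llr(x0) > 0)
print(kl_gauss(mu1, mu0, sigma), power > 0.99, fpr < 0.01)
```

Larger KL between class-conditional representation distributions means a faster-decaying error rate, which is the monotone quantity tracked along training trajectories.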


3D|3D重建等相关(2篇)

【1】A Learning-based Framework for Spatial Impulse Response Compensation in 3D Photoacoustic Computed Tomography
标题:基于学习的三维光声CT空间脉冲响应补偿框架
链接:https://arxiv.org/abs/2601.20291

作者:Kaiyi Yang,Seonyeong Park,Gangwon Jeong,Hsuan-Kai Huang,Alexander A. Oraevsky,Umberto Villa,Mark A. Anastasio
备注:Submitted to IEEE TMI
摘要:Photoacoustic computed tomography (PACT) is a promising imaging modality that combines the advantages of optical contrast with ultrasound detection. Utilizing ultrasound transducers with larger surface areas can improve detection sensitivity. However, when computationally efficient analytic reconstruction methods that neglect the spatial impulse responses (SIRs) of the transducer are employed, the spatial resolution of the reconstructed images will be compromised. Although optimization-based reconstruction methods can explicitly account for SIR effects, their computational cost is generally high, particularly in three-dimensional (3D) applications. To address the need for accurate but rapid 3D PACT image reconstruction, this study presents a framework for establishing a learned SIR compensation method that operates in the data domain. The learned compensation method maps SIR-corrupted PACT measurement data to compensated data that would have been recorded by idealized point-like transducers. Subsequently, the compensated data can be used with a computationally efficient reconstruction method that neglects SIR effects. Two variants of the learned compensation model are investigated that employ a U-Net model and a specifically designed, physics-inspired model, referred to as Deconv-Net. A fast and analytical training data generation procedure is also a component of the presented framework. The framework is rigorously validated in virtual imaging studies, demonstrating resolution improvement and robustness to noise variations, object complexity, and sound speed heterogeneity. When applied to in-vivo breast imaging data, the learned compensation models revealed fine structures that had been obscured by SIR-induced artifacts. To our knowledge, this is the first demonstration of learned SIR compensation in 3D PACT imaging.
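For intuition, the SIR can be viewed as a convolution kernel on a detector's time trace; a classical (non-learned) Wiener deconvolution then plays the role the learned compensator fills in the paper. This is a 1-D toy model, and the kernel and trace are invented:

```python
import numpy as np

def wiener_deconv(measured, kernel, snr=1e6):
    """Classical FFT Wiener deconvolution: a simple non-learned stand-in for
    SIR compensation, mapping SIR-blurred traces back toward ideal
    point-detector data (the SIR is modeled as a known 1-D kernel)."""
    n = len(measured)
    H = np.fft.rfft(kernel, n)
    G = np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr)
    return np.fft.irfft(np.fft.rfft(measured) * G, n)

t = np.linspace(0.0, 1.0, 256)
ideal = np.exp(-((t - 0.5) ** 2) / 0.002)       # sharp pressure trace
sir = np.ones(9) / 9.0                           # boxcar "finite aperture" SIR
blurred = np.fft.irfft(np.fft.rfft(ideal) * np.fft.rfft(sir, 256), 256)
recovered = wiener_deconv(blurred, sir)
print(np.max(np.abs(recovered - ideal)) < 1e-3)  # True: near-exact recovery
```

The learned models in the paper go further by handling spatially varying, nonlinear SIR effects that a single shift-invariant kernel cannot capture.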


【2】Size Matters: Reconstructing Real-Scale 3D Models from Monocular Images for Food Portion Estimation
标题:尺寸很重要:从单目图像重建真实尺度3D模型以进行食物份量估计
链接:https://arxiv.org/abs/2601.20051

作者:Gautham Vinod,Bruce Coburn,Siddeshwar Raghavan,Jiangpeng He,Fengqing Zhu
摘要:The rise of chronic diseases related to diet, such as obesity and diabetes, emphasizes the need for accurate monitoring of food intake. While AI-driven dietary assessment has made strides in recent years, the ill-posed nature of recovering size (portion) information from monocular images for accurate estimation of ``how much did you eat?'' is a pressing challenge. Some 3D reconstruction methods have achieved impressive geometric reconstruction but fail to recover the crucial real-world scale of the reconstructed object, limiting its usage in precision nutrition. In this paper, we bridge the gap between 3D computer vision and digital health by proposing a method that recovers a true-to-scale 3D reconstructed object from a monocular image. Our approach leverages rich visual features extracted from models trained on large-scale datasets to estimate the scale of the reconstructed object. This learned scale enables us to convert single-view 3D reconstructions into true-to-life, physically meaningful models. Extensive experiments and ablation studies on two publicly available datasets show that our method consistently outperforms existing techniques, achieving nearly a 30% reduction in mean absolute volume-estimation error, showcasing its potential to enhance the domain of precision nutrition. Code: https://gitlab.com/viper-purdue/size-matters


编码器(1篇)

【1】ACFormer: Mitigating Non-linearity with Auto Convolutional Encoder for Time Series Forecasting
标题:ACFormer:使用自动卷积编码器缓解时间序列预测的非线性
链接:https://arxiv.org/abs/2601.20611

作者:Gawon Lee,Hanbyeol Park,Minseop Kim,Dohee Kim,Hyerim Bae
摘要:Time series forecasting (TSF) faces challenges in modeling complex intra-channel temporal dependencies and inter-channel correlations. Although recent research has highlighted the efficiency of linear architectures in capturing global trends, these models often struggle with non-linear signals. To address this gap, we conducted a systematic receptive field analysis of convolutional neural network (CNN) TSF models. We introduce the "individual receptive field" to uncover granular structural dependencies, revealing that convolutional layers act as feature extractors that mirror channel-wise attention while exhibiting superior robustness to non-linear fluctuations. Based on these insights, we propose ACFormer, an architecture designed to reconcile the efficiency of linear projections with the non-linear feature-extraction power of convolutions. ACFormer captures fine-grained information through a shared compression module, preserves temporal locality via gated attention, and reconstructs variable-specific temporal patterns using an independent patch expansion layer. Extensive experiments on multiple benchmark datasets demonstrate that ACFormer consistently achieves state-of-the-art performance, effectively mitigating the inherent drawbacks of linear models in capturing high-frequency components.


优化|敛散性(4篇)

【1】Reinforcement Unlearning via Group Relative Policy Optimization
标题:通过组相对策略优化的强化遗忘学习
链接:https://arxiv.org/abs/2601.20568

作者:Efstratios Zaradoukas,Bardh Prenkaj,Gjergji Kasneci
摘要:During pretraining, LLMs inadvertently memorize sensitive or copyrighted data, posing significant compliance challenges under legal frameworks like the GDPR and the EU AI Act. Fulfilling these mandates demands techniques that can remove information from a deployed model without retraining from scratch. Existing unlearning approaches attempt to address this need, but often leak the very data they aim to erase, sacrifice fluency and robustness, or depend on costly external reward models. We introduce PURGE (Policy Unlearning through Relative Group Erasure), a novel method grounded in the Group Relative Policy Optimization framework that formulates unlearning as a verifiable problem. PURGE uses an intrinsic reward signal that penalizes any mention of forbidden concepts, allowing safe and consistent unlearning. Our approach reduces token usage per target by up to a factor of 46 compared with SotA methods, while improving fluency by 5.48 percent and adversarial robustness by 12.02 percent over the base model. On the Real World Knowledge Unlearning (RWKU) benchmark, PURGE achieves 11 percent unlearning effectiveness while preserving 98 percent of original utility. PURGE shows that framing LLM unlearning as a verifiable task enables more reliable, efficient, and scalable forgetting, suggesting a promising new direction for unlearning research that combines theoretical guarantees, improved safety, and practical deployment efficiency.
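The intrinsic, verifiable reward plus group-relative advantage can be sketched in plain Python; the forbidden terms, the example completions, and the exact normalization are illustrative assumptions:

```python
FORBIDDEN = {"harry", "hogwarts"}        # hypothetical unlearning targets

def reward(completion):
    """Intrinsic, verifiable reward: -1 if any forbidden concept appears."""
    return -1.0 if set(completion.lower().split()) & FORBIDDEN else 1.0

def grpo_advantages(completions):
    """Group-relative advantages: normalize rewards within one sampled group,
    so no external reward model is needed."""
    rs = [reward(c) for c in completions]
    mu = sum(rs) / len(rs)
    sd = (sum((r - mu) ** 2 for r in rs) / len(rs)) ** 0.5 or 1.0
    return [(r - mu) / sd for r in rs]

group = [
    "The novel follows a wizard at Hogwarts",    # leaks a forbidden concept
    "I cannot recall details about that person",
    "The book is a fantasy story",
]
adv = grpo_advantages(group)
print(adv[0] < 0 < adv[1])  # True: the leaky completion is pushed down
```

Because the reward is a deterministic check rather than a learned model, the unlearning objective is cheap to evaluate and hard to game, which is the "verifiable" framing in the abstract.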


【2】Order-Optimal Sample Complexity of Rectified Flows
标题:整流流的阶最优样本复杂度
链接:https://arxiv.org/abs/2601.20250

作者:Hari Krishna Sahoo,Mudit Gaur,Vaneet Aggarwal
摘要:Recently, flow-based generative models have shown superior efficiency compared to diffusion models. In this paper, we study rectified flow models, which constrain transport trajectories to be linear from the base distribution to the data distribution. This structural restriction greatly accelerates sampling, often enabling high-quality generation with a single Euler step. Under standard assumptions on the neural network classes used to parameterize the velocity field and data distribution, we prove that rectified flows achieve sample complexity $\tilde{O}(\varepsilon^{-2})$. This improves on the best known $O(\varepsilon^{-4})$ bounds for flow matching models and matches the optimal rate for mean estimation. Our analysis exploits the particular structure of rectified flows: because the model is trained with a squared loss along linear paths, the associated hypothesis class admits a sharply controlled localized Rademacher complexity. This yields the improved, order-optimal sample complexity and provides a theoretical explanation for the strong empirical performance of rectified flow models.
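The linear-trajectory construction behind rectified flows is compact enough to show directly. Here is a 1-D toy where the optimal constant velocity is fit in closed form and one Euler step already matches the data mean; this simplification is ours, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, shift = 4000, 3.0
x0 = rng.normal(size=n)             # base distribution N(0, 1)
x1 = rng.normal(size=n) + shift     # "data" distribution N(3, 1)

# Rectified-flow training pairs: points on the LINE between x0 and x1,
# regressed (squared loss) onto the constant velocity target x1 - x0.
t = rng.uniform(size=n)
x_t = (1.0 - t) * x0 + t * x1
v_target = x1 - x0

v_hat = v_target.mean()             # least-squares constant-velocity fit
samples = x0 + 1.0 * v_hat          # a single Euler step over t in [0, 1]
print(abs(samples.mean() - shift) < 0.1)  # True: one step lands on the data
```

The squared loss along linear paths is exactly the structure the paper's localized Rademacher argument exploits.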


【3】Certificate-Guided Pruning for Stochastic Lipschitz Optimization
标题:随机Lipschitz优化的证书引导修剪
链接:https://arxiv.org/abs/2601.20231

作者:Ibne Farabi Shihab,Sanjeda Akter,Anuj Sharma
摘要:We study black-box optimization of Lipschitz functions under noisy evaluations. Existing adaptive discretization methods implicitly avoid suboptimal regions but do not provide explicit certificates of optimality or measurable progress guarantees. We introduce \textbf{Certificate-Guided Pruning (CGP)}, which maintains an explicit \emph{active set} $A_t$ of potentially optimal points via confidence-adjusted Lipschitz envelopes. Any point outside $A_t$ is certifiably suboptimal with high probability, and under a margin condition with near-optimality dimension $\alpha$, we prove $\mathrm{Vol}(A_t)$ shrinks at a controlled rate yielding sample complexity $\tilde{O}(\varepsilon^{-(2+\alpha)})$. We develop three extensions: CGP-Adaptive learns $L$ online with $O(\log T)$ overhead; CGP-TR scales to $d > 50$ via trust regions with local certificates; and CGP-Hybrid switches to GP refinement when local smoothness is detected. Experiments on 12 benchmarks ($d \in [2, 100]$) show CGP variants match or exceed strong baselines while providing principled stopping criteria via certificate volume.
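A noise-free 1-D caricature of the certificate: any point whose Lipschitz upper envelope falls below the best observed value is certifiably suboptimal (maximization convention; the test function and the conservative Lipschitz constant are our choices):

```python
import numpy as np

def certified_active_set(x_grid, x_obs, f_obs, L):
    """Keep x only if its Lipschitz upper envelope can still beat the best
    observed value; everything outside is certifiably suboptimal."""
    U = np.min(f_obs[None, :] + L * np.abs(x_grid[:, None] - x_obs[None, :]),
               axis=1)
    return U >= f_obs.max()

f = lambda x: -np.abs(x - 0.33)            # 1-Lipschitz, maximum at x = 0.33
x_obs = np.linspace(0.0, 1.0, 11)          # a handful of exact evaluations
x_grid = np.linspace(0.0, 1.0, 201)
active = certified_active_set(x_grid, x_obs, f(x_obs), L=2.0)
i_star = np.argmin(np.abs(x_grid - 0.33))
print(bool(active[i_star]), float(active.mean()))  # optimum kept, most pruned
```

CGP's stochastic version replaces the exact values with confidence-adjusted estimates, so the certificate holds with high probability rather than deterministically.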


【4】Parametric and Generative Forecasts of Day-Ahead Market Curves for Storage Optimization
标题:面向储能优化的日前市场曲线参数化与生成式预测
链接:https://arxiv.org/abs/2601.20226

作者:Julian Gutierrez,Redouane Silvente
备注:46 pages, 41 figures
摘要:We present two machine learning frameworks for forecasting aggregated curves and optimizing storage in the EPEX SPOT day-ahead market. First, a fast parametric model forecasts hourly demand and supply curves in a low-dimensional and grid-robust representation, with minimum and maximum volumes combined with a Chebyshev polynomial for the elastic segment. The model enables daily use with low error and clear interpretability. Second, for a more comprehensive analysis, though less suited to daily operation, we employ generative models that learn the joint distribution of 24-hour order-level submissions given weather and fuel variables. These models generate synthetic daily scenarios of individual buy and sell orders, which, once aggregated, yield hourly supply and demand curves. Based on these forecasts, we optimize a price-making storage strategy, quantify revenue distributions, and highlight the price-compression effect with lower peaks, higher off-peak levels, and diminishing returns as capacity expands.
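The parametric curve representation is easy to picture: anchor the minimum and maximum volumes and fit the elastic segment with a low-order Chebyshev polynomial. The exponential toy curve and the degree below are our inventions:

```python
import numpy as np

v_min, v_max = 10.0, 100.0                    # anchor volumes
volume = np.linspace(v_min, v_max, 60)
price = 5.0 * np.exp(volume / 50.0)           # a smooth toy "supply" curve

cheb = np.polynomial.chebyshev.Chebyshev.fit(volume, price, deg=5)
max_err = np.max(np.abs(cheb(volume) - price))
print(max_err < 0.01)  # True: six coefficients reproduce the curve closely
```

Storing only the two anchors plus a handful of Chebyshev coefficients is what makes the representation low-dimensional and robust to the hourly price grid.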


预测|估计(11篇)

【1】Robust Distributed Learning under Resource Constraints: Decentralized Quantile Estimation via (Asynchronous) ADMM
标题:资源约束下的鲁棒分布式学习:基于(异步)ADMM的去中心化分位数估计
链接:https://arxiv.org/abs/2601.20571

作者:Anna van Elst,Igor Colin,Stephan Clémençon
摘要:Specifications for decentralized learning on resource-constrained edge devices require algorithms that are communication-efficient, robust to data corruption, and lightweight in memory usage. While state-of-the-art gossip-based methods satisfy the first requirement, achieving robustness remains challenging. Asynchronous decentralized ADMM-based methods have been explored for estimating the median, a statistical centrality measure that is notoriously more robust than the mean. However, existing approaches require memory that scales with node degree, making them impractical when memory is limited. In this paper, we propose AsylADMM, a novel gossip algorithm for decentralized median and quantile estimation, primarily designed for asynchronous updates and requiring only two variables per node. We analyze a synchronous variant of AsylADMM to establish theoretical guarantees and empirically demonstrate fast convergence for the asynchronous algorithm. We then show that our algorithm enables quantile-based trimming, geometric median estimation, and depth-based trimming, with quantile-based trimming empirically outperforming existing rank-based methods. Finally, we provide a novel theoretical analysis of rank-based trimming via Markov chain theory.
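A synchronous consensus-ADMM sketch for the median (the robust 0.5-quantile) shows the two-variables-per-node footprint the abstract emphasizes; this is a generic textbook scheme, not AsylADMM itself, and the consensus step is a plain network-wide mean:

```python
import numpy as np

def admm_median(a, rho=1.0, iters=500):
    """Each node i holds its datum a[i] plus two local variables (x[i], u[i]);
    z is the shared consensus estimate."""
    soft = lambda w, t: np.sign(w) * np.maximum(np.abs(w) - t, 0.0)
    x = a.astype(float)
    u = np.zeros_like(x)
    z = x.mean()
    for _ in range(iters):
        x = a + soft(z - u - a, 1.0 / rho)   # prox of |x - a_i| / rho
        z = np.mean(x + u)                   # consensus step
        u = u + x - z                        # dual update
    return z

a = np.array([1.0, 2.0, 3.0, 4.0, 100.0])    # one gross outlier
print(round(admm_median(a), 2))              # ~3.0 (the median), not the mean 22
```

Replacing the absolute-value term with a pinball loss yields arbitrary quantiles, which is what enables the quantile-based trimming discussed in the abstract.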


【2】TimeCatcher: A Variational Framework for Volatility-Aware Forecasting of Non-Stationary Time Series
标题:TimeCatcher:非平稳时间序列波动性感知预测的变分框架
链接:https://arxiv.org/abs/2601.20448

作者:Zhiyu Chen,Minhao Liu,Yanru Zhang
备注:Under review. 13 pages, 8 figures. This paper proposes a variational framework with adaptive volatility enhancement for non-stationary time series forecasting
摘要:Recent lightweight MLP-based models have achieved strong performance in time series forecasting by capturing stable trends and seasonal patterns. However, their effectiveness hinges on an implicit assumption of local stationarity, making them prone to errors in long-term forecasting of highly non-stationary series, especially when abrupt fluctuations occur, a common challenge in domains like web traffic monitoring. To overcome this limitation, we propose TimeCatcher, a novel Volatility-Aware Variational Forecasting framework. TimeCatcher extends linear architectures with a variational encoder to capture latent dynamic patterns hidden in historical data and a volatility-aware enhancement mechanism to detect and amplify significant local variations. Experiments on nine real-world datasets from traffic, financial, energy, and weather domains show that TimeCatcher consistently outperforms state-of-the-art baselines, with particularly large improvements in long-term forecasting scenarios characterized by high volatility and sudden fluctuations. Our code is available at https://github.com/ColaPrinceCHEN/TimeCatcher.


【3】ScatterFusion: A Hierarchical Scattering Transform Framework for Enhanced Time Series Forecasting
标题:ScatterFusion:用于增强时间序列预测的分层散射变换框架
链接:https://arxiv.org/abs/2601.20401

作者:Wei Li
备注:Accepted by ICASSP 2026
摘要:Time series forecasting presents significant challenges due to the complex temporal dependencies at multiple time scales. This paper introduces ScatterFusion, a novel framework that synergistically integrates scattering transforms with hierarchical attention mechanisms for robust time series forecasting. Our approach comprises four key components: (1) a Hierarchical Scattering Transform Module (HSTM) that extracts multi-scale invariant features capturing both local and global patterns; (2) a Scale-Adaptive Feature Enhancement (SAFE) module that dynamically adjusts feature importance across different scales; (3) a Multi-Resolution Temporal Attention (MRTA) mechanism that learns dependencies at varying time horizons; and (4) a Trend-Seasonal-Residual (TSR) decomposition-guided structure-aware loss function. Extensive experiments on seven benchmark datasets demonstrate that ScatterFusion outperforms other common methods, achieving significant reductions in error metrics across various prediction horizons.
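First-order scattering-style features can be sketched with Haar band-pass filters, a modulus, and averaging; real scattering transforms use proper wavelets, so treat this purely as intuition for the multi-scale invariance claim:

```python
import numpy as np

def scattering_features(x, scales=(2, 4, 8, 16)):
    """|x * psi_s| followed by global averaging at each scale: multi-scale
    descriptors that are stable under small time shifts."""
    feats = []
    for s in scales:
        psi = np.concatenate([np.ones(s), -np.ones(s)]) / (2 * s)  # Haar filter
        band = np.abs(np.convolve(x, psi, mode="valid"))           # modulus
        feats.append(band.mean())                                  # low-pass
    return np.array(feats)

t = np.arange(1024)
x = np.sin(2 * np.pi * t / 16)
f1 = scattering_features(x)
f2 = scattering_features(np.roll(x, 3))    # a small time shift
print(np.allclose(f1, f2, atol=0.02))      # True: near shift-invariant features
```

The modulus-then-average structure is what gives scattering features their robustness to local deformations while still separating scales.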


【4】Delayed Feedback Modeling for Post-Click Gross Merchandise Volume Prediction: Benchmark, Insights and Approaches
标题:面向点击后商品交易总额(GMV)预测的延迟反馈建模:基准、见解与方法
链接:https://arxiv.org/abs/2601.20307

作者:Xinyu Li,Sishuo Chen,Guipeng Xv,Li Zhang,Mingxuan Luo,Zhangming Chan,Xiang-Rong Sheng,Han Zhu,Jian Xu,Chen Lin
备注:This paper has been accepted by the ACM Web Conference (WWW) 2026. This is the camera-ready version. Please refer to the published version for citation once available
摘要:The prediction objectives of online advertisement ranking models are evolving from probabilistic metrics like conversion rate (CVR) to numerical business metrics like post-click gross merchandise volume (GMV). Unlike the well-studied delayed feedback problem in CVR prediction, delayed feedback modeling for GMV prediction remains unexplored and poses greater challenges, as GMV is a continuous target, and a single click can lead to multiple purchases that cumulatively form the label. To bridge the research gap, we establish TRACE, a GMV prediction benchmark containing complete transaction sequences arising from each user click, which supports delayed feedback modeling in an online streaming manner. Our analysis and exploratory experiments on TRACE reveal two key insights: (1) the rapid evolution of the GMV label distribution necessitates modeling delayed feedback under online streaming training; (2) the label distribution of repurchase samples substantially differs from that of single-purchase samples, highlighting the need for separate modeling. Motivated by these findings, we propose RepurchasE-Aware Dual-branch prEdictoR (READER), a novel GMV modeling paradigm that selectively activates expert parameters according to repurchase predictions produced by a router. Moreover, READER dynamically calibrates the regression target to mitigate under-estimation caused by incomplete labels. Experimental results show that READER yields superior performance on TRACE over baselines, achieving a 2.19% improvement in terms of accuracy. We believe that our study will open up a new avenue for studying online delayed feedback modeling for GMV prediction, and our TRACE benchmark with the gathered insights will facilitate future research and application in this promising direction. Our code and dataset are available at https://github.com/alimama-tech/OnlineGMV.


【5】The Forecast After the Forecast: A Post-Processing Shift in Time Series
标题:预测之后的预测:时间序列的后处理转变
链接:https://arxiv.org/abs/2601.20280

作者:Daojun Liang,Qi Li,Yinglong Wang,Jing Chen,Hu Zhang,Xiaoxiao Cui,Qizheng Wang,Shuo Li
备注:30 Pages
摘要:Time series forecasting has long been dominated by advances in model architecture, with recent progress driven by deep learning and hybrid statistical techniques. However, as forecasting models approach diminishing returns in accuracy, a critical yet underexplored opportunity emerges: the strategic use of post-processing. In this paper, we address the last-mile gap in time-series forecasting, which is to improve accuracy and uncertainty without retraining or modifying a deployed backbone. We propose $δ$-Adapter, a lightweight, architecture-agnostic way to boost deployed time series forecasters without retraining. $δ$-Adapter learns tiny, bounded modules at two interfaces: input nudging (soft edits to covariates) and output residual correction. We provide local descent guarantees, $O(δ)$ drift bounds, and compositional stability for combined adapters. Meanwhile, it can act as a feature selector by learning a sparse, horizon-aware mask over inputs to select important features, thereby improving interpretability. In addition, it can also be used as a distribution calibrator to measure uncertainty. Thus, we introduce a Quantile Calibrator and a Conformal Corrector that together deliver calibrated, personalized intervals with finite-sample coverage. Our experiments across diverse backbones and datasets show that $δ$-Adapter improves accuracy and calibration with negligible compute and no interface changes.
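
The Conformal Corrector component builds on split conformal prediction, which can be sketched as follows. The score function (absolute residuals) and quantile rule here are the textbook choices, not necessarily the paper's exact formulation.

```python
# Split-conformal interval around a frozen point forecast (textbook recipe;
# the paper's Quantile Calibrator / Conformal Corrector may differ in detail).
import math

def split_conformal_interval(residuals_cal, y_pred, alpha=0.1):
    """Return y_pred +/- q, with q the conservative (1 - alpha) empirical
    quantile of absolute calibration residuals; this gives finite-sample
    marginal coverage under exchangeability."""
    scores = sorted(abs(r) for r in residuals_cal)
    n = len(scores)
    rank = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    q = scores[rank]
    return y_pred - q, y_pred + q

# toy calibration residuals (actual - predicted) from a deployed backbone
cal = [0.2, -0.1, 0.4, -0.3, 0.25, -0.05, 0.15, -0.2, 0.35, -0.4]
lo, hi = split_conformal_interval(cal, y_pred=5.0, alpha=0.2)
print(lo, hi)
```

Note that the backbone is never retrained: only its held-out residuals are used, which matches the paper's goal of post-processing a deployed forecaster.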


【6】Robust SDE Parameter Estimation Under Missing Time Information Setting
标题:缺失时间信息设置下的鲁棒SDE参数估计
链接:https://arxiv.org/abs/2601.20268

作者:Long Van Tran,Truyen Tran,Phuoc Nguyen
摘要:Recent advances in stochastic differential equations (SDEs) have enabled robust modeling of real-world dynamical processes across diverse domains, such as finance, health, and systems biology. However, parameter estimation for SDEs typically relies on accurately timestamped observational sequences. When temporal ordering information is corrupted, missing, or deliberately hidden (e.g., for privacy), existing estimation methods often fail. In this paper, we investigate the conditions under which temporal order can be recovered and introduce a novel framework that simultaneously reconstructs temporal information and estimates SDE parameters. Our approach exploits asymmetries between forward and backward processes, deriving a score-matching criterion to infer the correct temporal order between pairs of observations. We then recover the total order via a sorting procedure and estimate SDE parameters from the reconstructed sequence using maximum likelihood. Finally, we conduct extensive experiments on synthetic and real-world datasets to demonstrate the effectiveness of our method, extending parameter estimation to settings with missing temporal order and broadening applicability in sensitive domains.


【7】Proactive SFC Provisioning with Forecast-Driven DRL in Data Centers
标题:在数据中心中利用预测驱动的DRL主动进行SFC调配
链接:https://arxiv.org/abs/2601.20229

作者:Parisa Fard Moshiri,Poonam Lohan,Burak Kantarci,Emil Janulewicz
备注:6 pages, 3 figures, Accepted to IEEE International Conference on Communications (ICC) 2026
摘要:Service Function Chaining (SFC) requires efficient placement of Virtual Network Functions (VNFs) to satisfy diverse service requirements while maintaining high resource utilization in Data Centers (DCs). Conventional static resource allocation often leads to overprovisioning or underprovisioning due to the dynamic nature of traffic loads and application demands. To address this challenge, we propose a hybrid forecast-driven deep reinforcement learning (DRL) framework that combines predictive intelligence with SFC provisioning. Specifically, we leverage DRL to generate datasets capturing DC resource utilization and service demands, which are then used to train deep learning forecasting models. Using Optuna-based hyperparameter optimization, the best-performing models, Spatio-Temporal Graph Neural Network, Temporal Graph Neural Network, and Long Short-Term Memory, are combined into an ensemble to enhance stability and accuracy. The ensemble predictions are integrated into the DC selection process, enabling proactive placement decisions that consider both current and future resource availability. Experimental results demonstrate that the proposed method not only sustains high acceptance ratios for resource-intensive services such as Cloud Gaming and VoIP but also significantly improves acceptance ratios for latency-critical categories: Augmented Reality increases from 30% to 50%, while Industry 4.0 improves from 30% to 45%. Consequently, the prediction-based model achieves significantly lower E2E latency, with reductions of 20.5%, 23.8%, and 34.8% for VoIP, Video Streaming, and Cloud Gaming, respectively. This strategy ensures more balanced resource allocation and reduces contention.


【8】On the Computational Complexity of Performative Prediction
标题:论表演预测的计算复杂性
链接:https://arxiv.org/abs/2601.20180

作者:Ioannis Anagnostides,Rohan Chauhan,Ioannis Panageas,Tuomas Sandholm,Jingming Yan
摘要:Performative prediction captures the phenomenon where deploying a predictive model shifts the underlying data distribution. While simple retraining dynamics are known to converge linearly when the performative effects are weak ($ρ< 1$), the complexity in the regime $ρ> 1$ was hitherto open. In this paper, we establish a sharp phase transition: computing an $ε$-performatively stable point is PPAD-complete -- and thus polynomial-time equivalent to Nash equilibria in general-sum games -- even when $ρ= 1 + O(ε)$. This intractability persists even in the ostensibly simple setting with a quadratic loss function and linear distribution shifts. One of our key technical contributions is to extend this PPAD-hardness result to general convex domains, which is of broader interest in the complexity of variational inequalities. Finally, we address the special case of strategic classification, showing that computing a strategic local optimum is PLS-hard.


【9】Scaling Next-Brain-Token Prediction for MEG
标题:扩展MEG的下一个大脑令牌预测
链接:https://arxiv.org/abs/2601.20138

作者:Richard Csaky
摘要:We present a large autoregressive model for source-space MEG that scales next-token prediction to long context across datasets and scanners: handling a corpus of over 500 hours and thousands of sessions across the three largest MEG datasets. A modified SEANet-style vector-quantizer reduces multichannel MEG into a flattened token stream on which we train a Qwen2.5-VL backbone from scratch to predict the next brain token and to recursively generate minutes of MEG from up to a minute of context. To evaluate long-horizon generation, we introduce three task-matched tests: (i) on-manifold stability via generated-only drift compared to the time-resolved distribution of real sliding windows, and (ii) conditional specificity via correct context versus prompt-swap controls using a neurophysiologically grounded metric set. We train on CamCAN and Omega and run all analyses on held-out MOUS, establishing cross-dataset generalization. Across metrics, generations remain relatively stable over long rollouts and are closer to the correct continuation than swapped controls. Code available at: https://github.com/ricsinaruto/brain-gen.


【10】Modeling Cascaded Delay Feedback for Online Net Conversion Rate Prediction: Benchmark, Insights and Solutions
标题:面向在线净转化率预测的级联延迟反馈建模:基准、见解与解决方案
链接:https://arxiv.org/abs/2601.19965

作者:Mingxuan Luo,Guipeng Xv,Sishuo Chen,Xinyu Li,Li Zhang,Zhangming Chan,Xiang-Rong Sheng,Han Zhu,Jian Xu,Bo Zheng,Chen Lin
摘要:In industrial recommender systems, conversion rate (CVR) is widely used for traffic allocation, but it fails to fully reflect recommendation effectiveness because it ignores refund behavior. To better capture true user satisfaction and business value, net conversion rate (NetCVR), defined as the probability that a clicked item is purchased and not refunded, has been proposed. Unlike CVR, NetCVR prediction involves a more complex multi-stage cascaded delayed feedback process. The two cascaded delays from click to conversion and from conversion to refund have opposite effects, making traditional CVR modeling methods inapplicable. Moreover, the lack of open-source datasets and online continuous training schemes further hinders progress in this area. To address these challenges, we introduce CASCADE (Cascaded Sequences of Conversion and Delayed Refund), the first large-scale open dataset derived from the Taobao app for online continuous NetCVR prediction. Through an in-depth analysis of CASCADE, we identify three key insights: (1) NetCVR exhibits strong temporal dynamics, necessitating online continuous modeling; (2) cascaded modeling of CVR and refund rate outperforms direct NetCVR modeling; and (3) delay time, which correlates with both CVR and refund rate, is an important feature for NetCVR prediction. Based on these insights, we propose TESLA, a continuous NetCVR modeling framework featuring a CVR-refund-rate cascaded architecture, stage-wise debiasing, and a delay-time-aware ranking loss. Extensive experiments demonstrate that TESLA consistently outperforms state-of-the-art methods on CASCADE, achieving absolute improvements of 12.41 percent in RI-AUC and 14.94 percent in RI-PRAUC on NetCVR prediction. The code and dataset are publicly available at https://github.com/alimama-tech/NetCVR.
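
The cascaded decomposition at the heart of this setup follows directly from the chain rule of probability: the net conversion probability factorizes into the conversion probability times the probability of no refund given conversion. A minimal sketch:

```python
# NetCVR factorization by the chain rule:
# P(purchased and kept | click) = P(convert | click) * (1 - P(refund | convert)).
def net_cvr(p_cvr, p_refund_given_cvr):
    assert 0.0 <= p_cvr <= 1.0 and 0.0 <= p_refund_given_cvr <= 1.0
    return p_cvr * (1.0 - p_refund_given_cvr)

# e.g. a 10% conversion rate with a 25% refund rate among conversions
print(net_cvr(0.10, 0.25))
```

Modeling the two factors with separate heads, as the cascaded architecture does, lets each head see feedback at its own delay scale rather than forcing one model to absorb both opposing delay effects.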


【11】Bias-Reduced Estimation of Finite Mixtures: An Application to Latent Group Structures in Panel Data
标题:有限混合的减偏估计:面板数据中潜在群结构的应用
链接:https://arxiv.org/abs/2601.20197

作者:Raphaël Langevin
摘要:Finite mixture models are widely used in econometric analyses to capture unobserved heterogeneity. This paper shows that maximum likelihood estimation of finite mixtures of parametric densities can suffer from substantial finite-sample bias in all parameters under mild regularity conditions. The bias arises from the influence of outliers in component densities with unbounded or large support and increases with the degree of overlap among mixture components. I show that maximizing the classification-mixture likelihood function, equipped with a consistent classifier, yields parameter estimates that are less biased than those obtained by standard maximum likelihood estimation (MLE). I then derive the asymptotic distribution of the resulting estimator and provide conditions under which oracle efficiency is achieved. Monte Carlo simulations show that conventional mixture MLE exhibits pronounced finite-sample bias, which diminishes as the sample size or the statistical distance between component densities tends to infinity. The simulations further show that the proposed estimation strategy generally outperforms standard MLE in finite samples in terms of both bias and mean squared errors under relatively weak assumptions. An empirical application to latent group panel structures using health administrative data shows that the proposed approach reduces out-of-sample prediction error by approximately 17.6% relative to the best results obtained from standard MLE procedures.


其他神经网络|深度学习|模型|建模(17篇)

【1】C3Box: A CLIP-based Class-Incremental Learning Toolbox
标题:C3Box:基于CLIP的类别增量学习工具箱
链接:https://arxiv.org/abs/2601.20852

作者:Hao Sun,Da-Wei Zhou
备注:The code is available at https://github.com/LAMDA-CL/C3Box
摘要:Traditional machine learning systems are typically designed for static data distributions and suffer from catastrophic forgetting when learning from evolving data streams. Class-Incremental Learning (CIL) addresses this challenge by enabling learning systems to continuously learn new classes while preserving prior knowledge. With the rise of pre-trained models (PTMs) such as CLIP, leveraging their strong generalization and semantic alignment capabilities has become a promising direction in CIL. However, existing CLIP-based CIL methods are often scattered across disparate codebases and rely on inconsistent configurations, hindering fair comparisons, reproducibility, and practical adoption. Therefore, we propose C3Box (CLIP-based Class-inCremental learning toolBOX), a modular and comprehensive Python toolbox. C3Box integrates representative traditional CIL methods, ViT-based CIL methods, and state-of-the-art CLIP-based CIL methods into a unified CLIP-based framework. By inheriting the streamlined design of PyCIL, C3Box provides a JSON-based configuration and standardized execution pipeline. This design enables reproducible experimentation with low engineering overhead and makes C3Box a reliable benchmark platform for continual learning research. Designed to be user-friendly, C3Box relies only on widely used open-source libraries and supports major operating systems. The code is available at https://github.com/LAMDA-CL/C3Box.


【2】Reward Models Inherit Value Biases from Pretraining
标题:奖励模型从预训练中继承价值偏见
链接:https://arxiv.org/abs/2601.20838

作者:Brian Christian,Jessica A. F. Thompson,Elle Michelle Yang,Vincent Adam,Hannah Rose Kirk,Christopher Summerfield,Tsvetomira Dumbalska
摘要:Reward models (RMs) are central to aligning large language models (LLMs) with human values but have received less attention than pre-trained and post-trained LLMs themselves. Because RMs are initialized from LLMs, they inherit representations that shape their behavior, but the nature and extent of this influence remain understudied. In a comprehensive study of 10 leading open-weight RMs using validated psycholinguistic corpora, we show that RMs exhibit significant differences along multiple dimensions of human value as a function of their base model. Using the "Big Two" psychological axes, we show a robust preference of Llama RMs for "agency" and a corresponding robust preference of Gemma RMs for "communion." This phenomenon holds even when the preference data and finetuning process are identical, and we trace it back to the logits of the respective instruction-tuned and pre-trained models. These log-probability differences themselves can be formulated as an implicit RM; we derive usable implicit reward scores and show that they exhibit the very same agency/communion difference. We run experiments training RMs with ablations for preference data source and quantity, which demonstrate that this effect is not only repeatable but surprisingly durable. Despite RMs being designed to represent human preferences, our evidence shows that their outputs are influenced by the pretrained LLMs on which they are based. This work underscores the importance of safety and alignment efforts at the pretraining stage, and makes clear that open-source developers' choice of base model is as much a consideration of values as of performance.


【3】GraphAllocBench: A Flexible Benchmark for Preference-Conditioned Multi-Objective Policy Learning
标题:GraphAllocBench:面向偏好条件多目标策略学习的灵活基准
链接:https://arxiv.org/abs/2601.20753

作者:Zhiheng Jiang,Yunzhe Wang,Ryan Marr,Ellen Novoseller,Benjamin T. Files,Volkan Ustun
摘要 :Preference-Conditioned Policy Learning (PCPL) in Multi-Objective Reinforcement Learning (MORL) aims to approximate diverse Pareto-optimal solutions by conditioning policies on user-specified preferences over objectives. This enables a single model to flexibly adapt to arbitrary trade-offs at run-time by producing a policy on or near the Pareto front. However, existing benchmarks for PCPL are largely restricted to toy tasks and fixed environments, limiting their realism and scalability. To address this gap, we introduce GraphAllocBench, a flexible benchmark built on a novel graph-based resource allocation sandbox environment inspired by city management, which we call CityPlannerEnv. GraphAllocBench provides a rich suite of problems with diverse objective functions, varying preference conditions, and high-dimensional scalability. We also propose two new evaluation metrics -- Proportion of Non-Dominated Solutions (PNDS) and Ordering Score (OS) -- that directly capture preference consistency while complementing the widely used hypervolume metric. Through experiments with Multi-Layer Perceptrons (MLPs) and graph-aware models, we show that GraphAllocBench exposes the limitations of existing MORL approaches and paves the way for using graph-based methods such as Graph Neural Networks in complex, high-dimensional combinatorial allocation tasks. Beyond its predefined problem set, GraphAllocBench enables users to flexibly vary objectives, preferences, and allocation rules, establishing it as a versatile and extensible benchmark for advancing PCPL. Code: https://anonymous.4open.science/r/GraphAllocBench
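
The proposed PNDS metric can be sketched as the fraction of returned solutions that are Pareto non-dominated within the evaluated set. The version below assumes every objective is maximized; the paper's exact definition may differ in detail.

```python
# Proportion of Non-Dominated Solutions over a finite solution set
# (illustrative; assumes all objectives are to be maximized).
def dominates(a, b):
    """a dominates b: no worse on every objective, strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pnds(solutions):
    non_dominated = [s for s in solutions
                     if not any(dominates(t, s) for t in solutions if t is not s)]
    return len(non_dominated) / len(solutions)

pts = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 1.0)]  # last point is dominated
print(pnds(pts))
```

Unlike hypervolume, this quantity is cheap to compute in high objective dimensions, which is one reason a benchmark aimed at scalability would pair the two.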


【4】Learning Contextual Runtime Monitors for Safe AI-Based Autonomy
标题:学习上下文运行时监视器以实现安全的基于AI的自主性
链接:https://arxiv.org/abs/2601.20666

作者:Alejandro Luque-Cerpa,Mengyuan Wang,Emil Carlsson,Sanjit A. Seshia,Devdatt Dubhashi,Hazem Torfah
摘要:We introduce a novel framework for learning context-aware runtime monitors for AI-based control ensembles. Machine-learning (ML) controllers are increasingly deployed in (autonomous) cyber-physical systems because of their ability to solve complex decision-making tasks. However, their accuracy can degrade sharply in unfamiliar environments, creating significant safety concerns. Traditional ensemble methods aim to improve robustness by averaging or voting across multiple controllers, yet this often dilutes the specialized strengths that individual controllers exhibit in different operating contexts. We argue that, rather than blending controller outputs, a monitoring framework should identify and exploit these contextual strengths. In this paper, we reformulate the design of safe AI-based control ensembles as a contextual monitoring problem. A monitor continuously observes the system's context and selects the controller best suited to the current conditions. To achieve this, we cast monitor learning as a contextual learning task and draw on techniques from contextual multi-armed bandits. Our approach comes with two key benefits: (1) theoretical safety guarantees during controller selection, and (2) improved utilization of controller diversity. We validate our framework in two simulated autonomous driving scenarios, demonstrating significant improvements in both safety and performance compared to non-contextual baselines.
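
As a rough illustration of casting monitor learning as a contextual bandit, the sketch below selects among controllers per observed context with an epsilon-greedy rule over empirical rewards. This is a deliberate simplification: the paper's algorithm and its safety guarantees are not captured here, and all names are illustrative.

```python
# Epsilon-greedy contextual selection among controllers (a simplification
# of the paper's contextual-bandit monitor; names are illustrative).
import random

class ContextualMonitor:
    def __init__(self, num_controllers, epsilon=0.1, seed=0):
        self.k = num_controllers
        self.eps = epsilon
        self.rng = random.Random(seed)
        self.sums = {}    # (context, controller) -> cumulative reward
        self.counts = {}  # (context, controller) -> pull count

    def select(self, context):
        if self.rng.random() < self.eps:
            return self.rng.randrange(self.k)  # explore
        mean = lambda a: (self.sums.get((context, a), 0.0)
                          / max(1, self.counts.get((context, a), 0)))
        return max(range(self.k), key=mean)    # exploit per-context best

    def update(self, context, arm, reward):
        self.sums[(context, arm)] = self.sums.get((context, arm), 0.0) + reward
        self.counts[(context, arm)] = self.counts.get((context, arm), 0) + 1

mon = ContextualMonitor(num_controllers=2, epsilon=0.0)
# controller 0 excels in "rain", controller 1 in "clear"
for _ in range(20):
    mon.update("rain", 0, 1.0); mon.update("rain", 1, 0.2)
    mon.update("clear", 0, 0.3); mon.update("clear", 1, 0.9)
print(mon.select("rain"), mon.select("clear"))
```

The key contrast with ensembling is visible even in this toy: the monitor commits to a single specialist per context instead of averaging the two controllers' outputs.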


【5】A Foundation Model for Virtual Sensors
标题:虚拟传感器的基础模型
链接:https://arxiv.org/abs/2601.20634

作者:Leon Götz,Lars Frederik Peiss,Erik Sauer,Andreas Udo Sass,Thorsten Bagdonat,Stephan Günnemann,Leo Schwinn
备注:18 pages in total, 15 figures
摘要:Virtual sensors use machine learning to predict target signals from available measurements, replacing expensive physical sensors in critical applications. Existing virtual sensor approaches require application-specific models with hand-selected inputs for each sensor, cannot leverage task synergies, and lack consistent benchmarks. At the same time, emerging time series foundation models are computationally expensive and limited to predicting their input signals, making them incompatible with virtual sensors. We introduce the first foundation model for virtual sensors addressing both limitations. Our unified model can simultaneously predict diverse virtual sensors exploiting synergies while maintaining computational efficiency. It learns relevant input signals for each virtual sensor, eliminating expert knowledge requirements while adding explainability. In our large-scale evaluation on a standard benchmark and an application-specific dataset with over 18 billion samples, our architecture achieves 415x reduction in computation time and 951x reduction in memory requirements, while maintaining or even improving predictive quality compared to baselines. Our model scales gracefully to hundreds of virtual sensors with nearly constant parameter count, enabling practical deployment in large-scale sensor networks.


【6】Regularized Gradient Temporal-Difference Learning
标题:正则化梯度时序差分学习
链接:https://arxiv.org/abs/2601.20599

作者:Hyunjun Na,Donghwan Lee
备注:27 pages, 8 figures
摘要:Gradient temporal-difference (GTD) learning algorithms are widely used for off-policy policy evaluation with function approximation. However, existing convergence analyses rely on the restrictive assumption that the so-called feature interaction matrix (FIM) is nonsingular. In practice, the FIM can become singular, leading to instability or degraded performance. In this paper, we propose a regularized optimization objective by reformulating the mean-square projected Bellman error (MSPBE) minimization. This formulation naturally yields a regularized GTD algorithm, referred to as R-GTD, which guarantees convergence to a unique solution even when the FIM is singular. We establish theoretical convergence guarantees and explicit error bounds for the proposed method, and validate its effectiveness through empirical experiments.
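
To make the setting concrete, here is a toy GTD2-style update with linear features, plus a simple ridge term standing in for the regularization; the actual R-GTD objective and update rule may differ from this guess.

```python
# Toy GTD2-style update with linear features and a ridge term as a stand-in
# for the regularization (illustrative; R-GTD's actual update may differ).
def rgtd2_step(theta, w, phi, phi_next, reward, gamma=0.9,
               alpha=0.05, beta=0.05, eta=0.01):
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)  # TD error
    # auxiliary weights track the projection of the TD error onto features
    w = [wi + beta * (delta - dot(phi, w)) * pi for wi, pi in zip(w, phi)]
    corr = dot(phi, w)
    # main update along (phi - gamma * phi'), damped by the ridge term eta
    theta = [ti + alpha * ((pi - gamma * pni) * corr - eta * ti)
             for ti, pi, pni in zip(theta, phi, phi_next)]
    return theta, w

theta, w = [0.0, 0.0], [0.0, 0.0]
for _ in range(200):  # one deterministic transition with reward 1, repeated
    theta, w = rgtd2_step(theta, w, phi=[1.0, 0.0], phi_next=[0.0, 1.0], reward=1.0)
print(theta)
```

The ridge term keeps the iterates bounded even when the feature covariance (and hence the FIM) is rank-deficient, which is the failure mode the paper targets.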


【7】Can Continuous-Time Diffusion Models Generate and Solve Globally Constrained Discrete Problems? A Study on Sudoku
标题:连续时间扩散模型可以生成并解决全局约束离散问题吗?数独研究
链接:https://arxiv.org/abs/2601.20363

作者:Mariia Drozdova
备注:26 pages, 5 figures. Empirical study of continuous-time diffusion and flow models on Sudoku. Code available at https://github.com/MariiaDrozdova/sudoku_generation
摘要:Can standard continuous-time generative models represent distributions whose support is an extremely sparse, globally constrained discrete set? We study this question using completed Sudoku grids as a controlled testbed, treating them as a subset of a continuous relaxation space. We train flow-matching and score-based models along a Gaussian probability path and compare deterministic (ODE) sampling, stochastic (SDE) sampling, and DDPM-style discretizations derived from the same continuous-time training. Unconditionally, stochastic sampling substantially outperforms deterministic flows; score-based samplers are the most reliable among continuous-time methods, and DDPM-style ancestral sampling achieves the highest validity overall. We further show that the same models can be repurposed for guided generation: by repeatedly sampling completions under clamped clues and stopping when constraints are satisfied, the model acts as a probabilistic Sudoku solver. Although far less sample-efficient than classical solvers and discrete-geometry-aware diffusion methods, these experiments demonstrate that classic diffusion/flow formulations can assign non-zero probability mass to globally constrained combinatorial structures and can be used for constraint satisfaction via stochastic search.


【8】TINNs: Time-Induced Neural Networks for Solving Time-Dependent PDEs
标题:TINNs:用于求解时间相关偏微分方程的时间诱导神经网络
链接:https://arxiv.org/abs/2601.20361

作者:Chen-Yang Dai,Che-Chia Chang,Te-Sheng Lin,Ming-Chih Lai,Chieh-Hsin Lai
摘要:Physics-informed neural networks (PINNs) solve time-dependent partial differential equations (PDEs) by learning a mesh-free, differentiable solution that can be evaluated anywhere in space and time. However, standard space--time PINNs take time as an input but reuse a single network with shared weights across all times, forcing the same features to represent markedly different dynamics. This coupling degrades accuracy and can destabilize training when enforcing PDE, boundary, and initial constraints jointly. We propose Time-Induced Neural Networks (TINNs), a novel architecture that parameterizes the network weights as a learned function of time, allowing the effective spatial representation to evolve over time while maintaining shared structure. The resulting formulation naturally yields a nonlinear least-squares problem, which we optimize efficiently using a Levenberg--Marquardt method. Experiments on various time-dependent PDEs show up to $4\times$ improved accuracy and $10\times$ faster convergence compared to PINNs and strong baselines.
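
The core architectural idea, weights as a learned function of time, can be shown with a deliberately tiny example where the weight vector interpolates linearly between two learned endpoints; the paper's parameterization of W(t) is surely richer, and the endpoint values below are made up.

```python
# Tiny illustration of time-induced weights: W(t) interpolates between two
# weight sets, so the effective spatial network changes with t.
import math

def tinn_forward(x, t, w0, w1):
    """y = sum_i W_i(t) * tanh(i * x), with W(t) = (1 - t) * w0 + t * w1."""
    weights = [(1.0 - t) * u + t * v for u, v in zip(w0, w1)]
    hidden = [math.tanh((i + 1) * x) for i in range(len(weights))]
    return sum(wi * hi for wi, hi in zip(weights, hidden))

w_init, w_final = [0.5, -0.2], [1.0, 0.3]   # hypothetical learned endpoints
y_start = tinn_forward(0.7, 0.0, w_init, w_final)
y_end = tinn_forward(0.7, 1.0, w_init, w_final)
print(y_start, y_end)
```

Contrast this with a standard space-time PINN, where a single fixed weight set must represent all times and t enters only as an extra input coordinate.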


【9】Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
标题:Spark:面向长时程智能体学习的基于动态分支的策略感知战略探索
链接:https://arxiv.org/abs/2601.20209

作者:Jinyang Wu,Shuo Yang,Changpeng Yang,Yuhao Shen,Shuai Zhang,Zhengqi Wen,Jianhua Tao
摘要:Reinforcement learning has empowered large language models to act as intelligent agents, yet training them for long-horizon tasks remains challenging due to the scarcity of high-quality trajectories, especially under limited resources. Existing methods typically scale up rollout sizes and indiscriminately allocate computational resources among intermediate steps. Such attempts inherently waste substantial computation budget on trivial steps while failing to guarantee sample quality. To address this, we propose Spark (Strategic Policy-Aware exploRation via Key-state dynamic branching), a novel framework that selectively branches at critical decision states for resource-efficient exploration. Our key insight is to activate adaptive branching exploration at critical decision points to probe promising trajectories, thereby achieving precise resource allocation that prioritizes sampling quality over blind coverage. This design leverages the agent's intrinsic decision-making signals to reduce dependence on human priors, enabling the agent to autonomously expand exploration and achieve stronger generalization. Experiments across diverse tasks (e.g., embodied planning) demonstrate that Spark achieves superior success rates with significantly fewer training samples, exhibiting robust generalization even in unseen scenarios.


【10】Minimum-Cost Network Flow with Dual Predictions
标题:基于对偶预测的最小费用网络流
链接:https://arxiv.org/abs/2601.20203

作者:Zhiyang Chen,Hailong Yao,Xia Yin
备注:accepted by AAAI 2026
摘要:Recent work has shown that machine-learned predictions can provably improve the performance of classic algorithms. In this work, we propose the first minimum-cost network flow algorithm augmented with a dual prediction. Our method is based on a classic minimum-cost flow algorithm, namely $\varepsilon$-relaxation. We provide time complexity bounds in terms of the infinity norm prediction error, which is both consistent and robust. We also prove sample complexity bounds for PAC-learning the prediction. We empirically validate our theoretical results on two applications of minimum-cost flow, i.e., traffic networks and chip escape routing, in which we learn a fixed prediction, and a feature-based neural network model to infer the prediction, respectively. Experimental results illustrate $12.74\times$ and $1.64\times$ average speedup on two applications.


【11】Loss Landscape Geometry and the Learning of Symmetries: Or, What Influence Functions Reveal About Robust Generalization
标题:损失景观几何与对称性的学习:或者,影响函数揭示了关于鲁棒泛化的什么
链接:https://arxiv.org/abs/2601.20172

作者:James Amarel,Robyn Miller,Nicolas Hengartner,Benjamin Migliori,Emily Casleton,Alexei Skurikhin,Earl Lawrence,Gerd J. Kunde
摘要:We study how neural emulators of partial differential equation solution operators internalize physical symmetries by introducing an influence-based diagnostic that measures the propagation of parameter updates between symmetry-related states, defined as the metric-weighted overlap of loss gradients evaluated along group orbits. This quantity probes the local geometry of the learned loss landscape and goes beyond forward-pass equivariance tests by directly assessing whether learning dynamics couple physically equivalent configurations. Applying our diagnostic to autoregressive fluid flow emulators, we show that orbit-wise gradient coherence provides the mechanism for learning to generalize over symmetry transformations and indicates when training selects a symmetry compatible basin. The result is a novel technique for evaluating if surrogate models have internalized symmetry properties of the known solution operator.


【12】Domain Expansion: A Latent Space Construction Framework for Multi-Task Learning
标题:领域扩展:一种面向多任务学习的潜在空间构建框架
链接:https://arxiv.org/abs/2601.20069

作者:Chi-Yao Huang,Khoa Vo,Aayush Atul Verma,Duo Lu,Yezhou Yang
备注:Accepted to ICLR 2026
摘要 :Training a single network with multiple objectives often leads to conflicting gradients that degrade shared representations, forcing them into a compromised state that is suboptimal for any single task--a problem we term latent representation collapse. We introduce Domain Expansion, a framework that prevents these conflicts by restructuring the latent space itself. Our framework uses a novel orthogonal pooling mechanism to construct a latent space where each objective is assigned to a mutually orthogonal subspace. We validate our approach across diverse benchmarks--including ShapeNet, MPIIGaze, and Rotated MNIST--on challenging multi-objective problems combining classification with pose and gaze estimation. Our experiments demonstrate that this structure not only prevents collapse but also yields an explicit, interpretable, and compositional latent space where concepts can be directly manipulated.
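
The orthogonal-subspace construction can be illustrated in miniature: assign each objective an orthonormal basis spanning a disjoint block of latent dimensions and project the shared representation onto each block, so task-specific components have zero overlap. The axis-aligned bases below are an illustrative choice, not the paper's learned pooling.

```python
# Minimal stand-in for orthogonal subspace assignment: project a shared
# latent vector onto per-task orthonormal bases (axis-aligned for clarity).
def project(vec, basis):
    """Project vec onto span(basis), assuming basis is orthonormal."""
    out = [0.0] * len(vec)
    for b in basis:
        c = sum(v * e for v, e in zip(vec, b))   # coefficient along b
        out = [o + c * e for o, e in zip(out, b)]
    return out

# 4-D latent: task A owns dims 0-1, task B owns dims 2-3
basis_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
basis_b = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
z = [0.5, -1.0, 2.0, 0.25]
za, zb = project(z, basis_a), project(z, basis_b)
print(za, zb)
```

Because the two projections are orthogonal, a gradient that only touches task A's subspace cannot perturb task B's component, which is the mechanism claimed to prevent the collapse described above.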


【13】Structural Compositional Function Networks: Interpretable Functional Compositions for Tabular Discovery
标题:结构组合函数网络:面向表格数据发现的可解释函数组合
链接:https://arxiv.org/abs/2601.20037

作者:Fang Li
备注:Code and data available at https://github.com/fanglioc/StructuralCFN-public
摘要:Despite the ubiquity of tabular data in high-stakes domains, traditional deep learning architectures often struggle to match the performance of gradient-boosted decision trees while maintaining scientific interpretability. Standard neural networks typically treat features as independent entities, failing to exploit the inherent manifold structural dependencies that define tabular distributions. We propose Structural Compositional Function Networks (StructuralCFN), a novel architecture that imposes a Relation-Aware Inductive Bias via a differentiable structural prior. StructuralCFN explicitly models each feature as a mathematical composition of its counterparts through Differentiable Adaptive Gating, which automatically discovers the optimal activation physics (e.g., attention-style filtering vs. inhibitory polarity) for each relationship. Our framework enables Structured Knowledge Integration, allowing domain-specific relational priors to be injected directly into the architecture to guide discovery. We evaluate StructuralCFN across a rigorous 10-fold cross-validation suite on 18 benchmarks, demonstrating statistically significant improvements (p < 0.05) on scientific and clinical datasets (e.g., Blood Transfusion, Ozone, WDBC). Furthermore, StructuralCFN provides Intrinsic Symbolic Interpretability: it recovers the governing "laws" of the data manifold as human-readable mathematical expressions while maintaining a compact parameter footprint (300--2,500 parameters) that is over an order of magnitude (10x--20x) smaller than standard deep baselines.


【14】NCSAM Noise-Compensated Sharpness-Aware Minimization for Noisy Label Learning
标题:NCSAM:面向噪声标签学习的噪声补偿锐度感知最小化
链接:https://arxiv.org/abs/2601.19947

作者:Jiayu Xu,Junbiao Pang
摘要:Learning from Noisy Labels (LNL) presents a fundamental challenge in deep learning, as real-world datasets often contain erroneous or corrupted annotations, \textit{e.g.}, data crawled from the Web. Current research focuses on sophisticated label correction mechanisms. In contrast, this paper adopts a novel perspective by establishing a theoretical analysis of the relationship between the flatness of the loss landscape and the presence of label noise. We theoretically demonstrate that carefully simulated label noise synergistically enhances both generalization performance and robustness to label noise. Consequently, we propose Noise-Compensated Sharpness-Aware Minimization (NCSAM), which leverages the perturbation of Sharpness-Aware Minimization (SAM) to remedy the damage caused by label noise. Our analysis reveals that the testing accuracy exhibits behavior similar to that observed on noise-free datasets. Extensive experimental results on multiple benchmark datasets demonstrate the consistent superiority of the proposed method over existing state-of-the-art approaches on diverse tasks.


【15】PiC-BNN: A 128-kbit 65 nm Processing-in-CAM-Based End-to-End Binary Neural Network Accelerator
标题:PiC-BNN:一种128 kbit 65纳米CAM内处理的端到端二值神经网络加速器
链接:https://arxiv.org/abs/2601.19920

作者:Yuval Harary,Almog Sharoni,Esteban Garzón,Marco Lanuzza,Adam Teman,Leonid Yavits
备注:7 pages, 6 figures. Accepted to IEEE CCMCC 2025
摘要:Binary Neural Networks (BNNs), where weights and activations are constrained to binary values (+1, -1), are a highly efficient alternative to traditional neural networks. Unfortunately, typical BNNs, while binarizing linear layers (matrix-vector multiplication), still implement other network layers (batch normalization, softmax, output layer, and sometimes the input layer of a convolutional neural network) in full precision. This limits the area and energy benefits and requires architectural support for full precision operations. We propose PiC-BNN, a true end-to-end binary in-approximate search (Hamming distance tolerant) Content Addressable Memory based BNN accelerator. PiC-BNN is designed and manufactured in a commercial 65nm process. PiC-BNN uses Hamming distance tolerance to apply the law of large numbers to enable accurate classification without implementing full precision operations. PiC-BNN achieves baseline software accuracy (95.2%) on the MNIST dataset and 93.5% on the Hand Gesture (HG) dataset, a throughput of 560K inferences/s, and presents a power efficiency of 703M inferences/s/W when implementing a binary MLP model for MNIST/HG dataset classification.
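The Hamming-distance view of binarized linear layers rests on a simple identity: the dot product of two {+1, -1} vectors equals their length minus twice the Hamming distance of the corresponding bit patterns. A minimal illustrative sketch (not the paper's CAM hardware):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
w = rng.choice([-1, 1], size=n)          # binary weights
x = rng.choice([-1, 1], size=n)          # binary activations

dot = int(w @ x)                         # binarized linear-layer output

# Same value through bit patterns (+1 -> 1, -1 -> 0): the dot product is
# n - 2 * HammingDistance, which is why Hamming-tolerant CAM search can
# stand in for the multiply-accumulate of a binarized layer.
wb, xb = (w > 0).astype(int), (x > 0).astype(int)
hamming = int(np.sum(wb != xb))
recovered = n - 2 * hamming
```

The identity holds position by position: agreeing entries contribute +1 to the dot product and 0 to the Hamming distance, disagreeing entries contribute -1 and 1.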


【16】Physics-informed Blind Reconstruction of Dense Fields from Sparse Measurements using Neural Networks with a Differentiable Simulator
标题:利用带可微模拟器的神经网络从稀疏测量进行物理信息稠密场盲重建
链接:https://arxiv.org/abs/2601.20496

作者:Ofek Aloni,Barak Fishbain
摘要:Generating dense physical fields from sparse measurements is a fundamental question in sampling, signal processing, and many other applications. State-of-the-art methods either use spatial statistics or rely on examples of dense fields in the training phase, which often are not available, and thus rely on synthetic data. Here, we present a reconstruction method that generates dense fields from sparse measurements, without assuming availability of the spatial statistics, nor of examples of the dense fields. This is made possible through the introduction of an automatically differentiable numerical simulator into the training phase of the method. The method is shown to have superior results over statistical and neural network based methods on a set of three standard problems from fluid mechanics.


【17】Deep Neural Networks as Iterated Function Systems and a Generalization Bound
标题:作为迭代函数系统的深度神经网络及一个泛化界
链接:https://arxiv.org/abs/2601.19958

作者:Jonathan Vacher
摘要:Deep neural networks (DNNs) achieve remarkable performance on a wide range of tasks, yet their mathematical analysis remains fragmented: stability and generalization are typically studied in disparate frameworks and on a case-by-case basis. Architecturally, DNNs rely on the recursive application of parametrized functions, a mechanism that can be unstable and difficult to train, making stability a primary concern. Even when training succeeds, there are few rigorous results on how well such models generalize beyond the observed data, especially in the generative setting. In this work, we leverage the theory of stochastic Iterated Function Systems (IFS) and show that two important deep architectures can be viewed as, or canonically associated with, place-dependent IFS. This connection allows us to import results from random dynamical systems to (i) establish the existence and uniqueness of invariant measures under suitable contractivity assumptions, and (ii) derive a Wasserstein generalization bound for generative modeling. The bound naturally leads to a new training objective that directly controls the collage-type approximation error between the data distribution and its image under the learned transfer operator. We illustrate the theory on a controlled 2D example and empirically evaluate the proposed objective on standard image datasets (MNIST, CelebA, CIFAR-10).
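The contraction argument behind existence and uniqueness of invariant measures can be illustrated with a toy IFS (a simplified stand-in for the paper's place-dependent construction): driving two far-apart initial points with the same random sequence of contractive maps makes them converge geometrically to the same random limit.

```python
import random

# Two affine contractions on the real line (slopes 0.5 and 0.4).
maps = [lambda x: 0.5 * x + 1.0, lambda x: 0.4 * x - 0.5]

random.seed(0)
choices = [random.randrange(2) for _ in range(60)]

# Drive two different initial points with the SAME random map sequence:
# the gap shrinks by a factor of at most 0.5 per step, so both
# trajectories forget their initial condition, mirroring the contraction
# argument for a unique invariant measure.
x, y = -10.0, 10.0
for i in choices:
    x, y = maps[i](x), maps[i](y)
gap = abs(x - y)
```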


其他(26篇)

【1】$\mathbb{R}^{2k}$ is Theoretically Large Enough for Embedding-based Top-$k$ Retrieval
标题:$\mathbb{R}^{2k}$在理论上足以用于基于嵌入的Top-$k$检索
链接:https://arxiv.org/abs/2601.20844

作者:Zihao Wang,Hang Yin,Lihui Liu,Hanghang Tong,Yangqiu Song,Ginny Wong,Simon See
摘要:This paper studies the minimal dimension required to embed subset memberships ($m$ elements and ${m\choose k}$ subsets of at most $k$ elements) into vector spaces, denoted as Minimal Embeddable Dimension (MED). The tight bounds of MED are derived theoretically and supported empirically for various notions of "distances" or "similarities," including the $\ell_2$ metric, inner product, and cosine similarity. In addition, we conduct numerical simulation in a more achievable setting, where the ${m\choose k}$ subset embeddings are chosen as the centroid of the embeddings of the contained elements. Our simulation easily realizes a logarithmic dependency between the MED and the number of elements to embed. These findings imply that embedding-based retrieval limitations stem primarily from learnability challenges, not geometric constraints, guiding future algorithm design.
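The centroid construction used in the simulation is easy to reproduce in miniature; the sketch below (with illustrative sizes, not the paper's setup) shows that in a moderately high dimension the k members of a subset are exactly the top-k hits for the centroid query under inner-product retrieval.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, d = 20, 3, 128                     # illustrative sizes, not the paper's

E = rng.standard_normal((m, d))          # one random unit vector per element
E /= np.linalg.norm(E, axis=1, keepdims=True)

subset = [2, 7, 11]
q = E[subset].mean(axis=0)               # subset embedded as the centroid

scores = E @ q                           # inner-product retrieval
topk = set(np.argsort(scores)[-k:].tolist())
```

Members score about 1/k plus small cross terms, while non-members score near zero, so the margin grows with the ambient dimension.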


【2】Conditional PED-ANOVA: Hyperparameter Importance in Hierarchical & Dynamic Search Spaces
标题:条件PED-ANOVA:层次和动态搜索空间中的超参数重要性
链接:https://arxiv.org/abs/2601.20800

作者:Kaito Baba,Yoshihiko Ozaki,Shuhei Watanabe
备注:16 pages, 9 figures
摘要:We propose conditional PED-ANOVA (condPED-ANOVA), a principled framework for estimating hyperparameter importance (HPI) in conditional search spaces, where the presence or domain of a hyperparameter can depend on other hyperparameters. Although the original PED-ANOVA provides a fast and efficient way to estimate HPI within the top-performing regions of the search space, it assumes a fixed, unconditional search space and therefore cannot properly handle conditional hyperparameters. To address this, we introduce a conditional HPI for top-performing regions and derive a closed-form estimator that accurately reflects conditional activation and domain changes. Experiments show that naive adaptations of existing HPI estimators yield misleading or uninterpretable importance estimates in conditional settings, whereas condPED-ANOVA consistently provides meaningful importances that reflect the underlying conditional structure.


【3】SERA: Soft-Verified Efficient Repository Agents
标题:SERA:软验证的高效代码仓库代理
链接:https://arxiv.org/abs/2601.20789

作者:Ethan Shen,Danny Tormoen,Saurabh Shah,Ali Farhadi,Tim Dettmers
备注:21 main pages, 7 pages appendix
摘要:Open-weight coding agents should hold a fundamental advantage over closed-source systems: they can be specialized to private codebases, encoding repository-specific information directly in their weights. Yet the cost and complexity of training has kept this advantage theoretical. We show it is now practical. We present Soft-Verified Efficient Repository Agents (SERA), an efficient method for training coding agents that enables the rapid and cheap creation of agents specialized to private codebases. Using only supervised finetuning (SFT), SERA achieves state-of-the-art results among fully open-source (open data, method, code) models while matching the performance of frontier open-weight models like Devstral-Small-2. Creating SERA models is 26x cheaper than reinforcement learning and 57x cheaper than previous synthetic data methods to reach equivalent performance. Our method, Soft Verified Generation (SVG), generates thousands of trajectories from a single code repository. Combined with cost-efficiency, this enables specialization to private codebases. Beyond repository specialization, we apply SVG to a larger corpus of codebases, generating over 200,000 synthetic trajectories. We use this dataset to provide detailed analysis of scaling laws, ablations, and confounding factors for training coding agents. Overall, we believe our work will greatly accelerate research on open coding agents and showcase the advantage of open-source models that can specialize to private codebases. We release SERA as the first model in Ai2's Open Coding Agents series, along with all our code, data, and Claude Code integration to support the research community.


【4】COMET-SG1: Lightweight Autoregressive Regressor for Edge and Embedded AI
标题:COMET-SG1:面向边缘和嵌入式AI的轻量级自回归回归器
链接:https://arxiv.org/abs/2601.20772

作者:Shakhyar Gogoi
备注:Preprint. Submitted to an IEEE conference. 6 pages, 6 figures, 2 tables
摘要:COMET-SG1 is a lightweight, stability-oriented autoregressive regression model designed for time-series prediction on edge and embedded AI systems. Unlike recurrent neural networks or transformer-based sequence models, COMET-SG1 operates through linear behavior-space encoding, memory-anchored transition estimation, and deterministic state updates. This structure prioritizes bounded long-horizon behavior under fully autoregressive inference, a critical requirement for edge deployment where prediction errors accumulate over time. Experiments on non-stationary synthetic time-series data demonstrate that COMET-SG1 achieves competitive short-horizon accuracy while exhibiting significantly reduced long-horizon drift compared to MLP, LSTM, and k-nearest neighbor baselines. With a compact parameter footprint and operations compatible with fixed-point arithmetic, COMET-SG1 provides a practical and interpretable approach for stable autoregressive prediction in edge and embedded AI applications.


【5】Less is More: Clustered Cross-Covariance Control for Offline RL
标题:少即是多:离线RL的聚类交叉协方差控制
链接:https://arxiv.org/abs/2601.20765

作者:Nan Qiao,Sheng Yue,Shuning Wang,Yongheng Deng,Ju Ren
摘要:A fundamental challenge in offline reinforcement learning is distributional shift. Scarce data or datasets dominated by out-of-distribution (OOD) areas exacerbate this issue. Our theoretical analysis and experiments show that the standard squared error objective induces a harmful TD cross covariance. This effect amplifies in OOD areas, biasing optimization and degrading policy learning. To counteract this mechanism, we develop two complementary strategies: partitioned buffer sampling that restricts updates to localized replay partitions, attenuates irregular covariance effects, and aligns update directions, yielding a scheme that is easy to integrate with existing implementations, namely Clustered Cross-Covariance Control for TD (C^4). We also introduce an explicit gradient-based corrective penalty that cancels the covariance induced bias within each update. We prove that buffer partitioning preserves the lower bound property of the maximization objective, and that these constraints mitigate excessive conservatism in extreme OOD areas without altering the core behavior of policy constrained offline reinforcement learning. Empirically, our method showcases higher stability and up to 30% improvement in returns over prior methods, especially with small datasets and splits that emphasize OOD areas.
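A hypothetical sketch of the partitioned buffer sampling idea (the bucketing scheme and sizes are illustrative, not from the paper): transitions are grouped into localized partitions of the state space, and each minibatch is drawn from a single partition so that updates within a batch share a neighborhood.

```python
import random

random.seed(0)
# Toy replay buffer: transitions with a scalar state feature s in (-1, 1).
buffer = [{"s": random.uniform(-1, 1), "r": random.random()} for _ in range(200)]

# Partition the buffer into coarse state buckets.
n_parts = 4
partitions = [[] for _ in range(n_parts)]
for t in buffer:
    idx = min(int((t["s"] + 1) / 2 * n_parts), n_parts - 1)
    partitions[idx].append(t)

def sample_batch(batch_size=16):
    # Draw the whole minibatch from ONE localized partition.
    part = random.choice([p for p in partitions if len(p) >= batch_size])
    return random.sample(part, batch_size)

batch = sample_batch()
spread = max(t["s"] for t in batch) - min(t["s"] for t in batch)
```

Each bucket covers a state-interval of width 0.5 here, so the batch's state spread is bounded by the partition width.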


【6】Continual GUI Agents
标题:持续学习的图形用户界面代理
链接:https://arxiv.org/abs/2601.20732

作者:Ziwei Liu,Borui Kang,Hangjie Yuan,Zixiang Zhao,Wei Li,Yifan Zhu,Tao Feng
摘要:As digital environments (data distributions) are in flux, with new GUI data arriving over time and introducing new domains or resolutions, agents trained on static environments deteriorate in performance. In this work, we introduce Continual GUI Agents, a new task that requires GUI agents to perform continual learning under shifted domains and resolutions. We find existing methods fail to maintain stable grounding as GUI distributions shift over time, due to the diversity of UI interaction points and regions in fluxing scenarios. To address this, we introduce GUI-Anchoring in Flux (GUI-AiF), a new reinforcement fine-tuning framework that stabilizes continual learning through two novel rewards: Anchoring Point Reward in Flux (APR-iF) and Anchoring Region Reward in Flux (ARR-iF). These rewards guide the agents to align with shifting interaction points and regions, mitigating the tendency of existing reward strategies to over-adapt to static grounding cues (e.g., fixed coordinates or element scales). Extensive experiments show GUI-AiF surpasses state-of-the-art baselines. Our work establishes the first continual learning framework for GUI agents, revealing the untapped potential of reinforcement fine-tuning for continual GUI Agents.


【7】Is Pure Exploitation Sufficient in Exogenous MDPs with Linear Function Approximation?
标题:在具有线性函数逼近的外生MDP中,纯粹利用是否足够?
链接:https://arxiv.org/abs/2601.20694

作者:Hao Liang,Jiayu Cheng,Sean R. Sinclair,Yali Du
备注:Accepted to ICLR 2026
摘要:Exogenous MDPs (Exo-MDPs) capture sequential decision-making where uncertainty comes solely from exogenous inputs that evolve independently of the learner's actions. This structure is especially common in operations research applications such as inventory control, energy storage, and resource allocation, where exogenous randomness (e.g., demand, arrivals, or prices) drives system behavior. Despite decades of empirical evidence that greedy, exploitation-only methods work remarkably well in these settings, theory has lagged behind: all existing regret guarantees for Exo-MDPs rely on explicit exploration or tabular assumptions. We show that exploration is unnecessary. We propose Pure Exploitation Learning (PEL) and prove the first general finite-sample regret bounds for exploitation-only algorithms in Exo-MDPs. In the tabular case, PEL achieves $\widetilde{O}(H^2|\Xi|\sqrt{K})$. For large, continuous endogenous state spaces, we introduce LSVI-PE, a simple linear-approximation method whose regret is polynomial in the feature dimension, exogenous state space, and horizon, independent of the endogenous state and action spaces. Our analysis introduces two new tools: counterfactual trajectories and Bellman-closed feature transport, which together allow greedy policies to have accurate value estimates without optimism. Experiments on synthetic and resource-management tasks show that PEL consistently outperforms baselines. Overall, our results overturn the conventional wisdom that exploration is required, demonstrating that in Exo-MDPs, pure exploitation is enough.


【8】DIVERSE: Disagreement-Inducing Vector Evolution for Rashomon Set Exploration
标题:DIVERSE:用于罗生门集探索的分歧诱导向量进化
链接:https://arxiv.org/abs/2601.20627

作者:Gilles Eerlings,Brent Zoomers,Jori Liesenborgs,Gustavo Rovelo Ruiz,Kris Luyten
摘要 :We propose DIVERSE, a framework for systematically exploring the Rashomon set of deep neural networks, the collection of models that match a reference model's accuracy while differing in their predictive behavior. DIVERSE augments a pretrained model with Feature-wise Linear Modulation (FiLM) layers and uses Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to search a latent modulation space, generating diverse model variants without retraining or gradient access. Across MNIST, PneumoniaMNIST, and CIFAR-10, DIVERSE uncovers multiple high-performing yet functionally distinct models. Our experiments show that DIVERSE offers a competitive and efficient exploration of the Rashomon set, making it feasible to construct diverse sets that maintain robustness and performance while supporting well-balanced model multiplicity. While retraining remains the baseline to generate Rashomon sets, DIVERSE achieves comparable diversity at reduced computational cost.
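FiLM modulation itself is compact enough to sketch; below is a minimal stand-in (not the paper's code) showing how different latent modulation vectors yield functionally distinct variants of a frozen base layer, which is the search space a black-box optimizer such as CMA-ES would explore.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))          # frozen base layer weights
x = rng.standard_normal(4)
h = np.tanh(W @ x)                       # base features to modulate

M = rng.standard_normal((16, 3))         # maps latent z -> (gamma, beta)

def film(h, z):
    # Feature-wise Linear Modulation: per-channel scale and shift derived
    # from the latent vector; the base weights W are never touched.
    gamma, beta = np.split(M @ z, 2)
    return gamma * h + beta

z1, z2 = rng.standard_normal(3), rng.standard_normal(3)
out1, out2 = film(h, z1), film(h, z2)    # two behaviorally distinct variants
```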


【9】Nonlinear Dimensionality Reduction with Diffusion Maps in Practice
标题:实践中的扩散图非线性降维
链接:https://arxiv.org/abs/2601.20428

作者:Sönke Beier,Paula Pirker-Díaz,Friedrich Pagenkopf,Karoline Wiesner
摘要:Diffusion Map is a spectral dimensionality reduction technique that is able to uncover nonlinear submanifolds in high-dimensional data, and it is increasingly applied across a wide range of scientific disciplines, such as biology, engineering, and the social sciences. However, data preprocessing, parameter settings, and component selection have a significant influence on the resulting manifold, something that has not been comprehensively discussed in the literature so far. We provide a practice-oriented review of the Diffusion Map technique, illustrate pitfalls, and showcase a recently introduced technique for identifying the most relevant components. Our results show that the first components are not necessarily the most relevant ones.
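For reference, the standard Diffusion Map construction the review discusses can be sketched in a few lines (a minimal version with a fixed kernel scale; the kernel-scale and component choices are exactly the practical decisions the paper cautions about):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3))          # toy high-dimensional data

# Gaussian kernel on pairwise squared distances, then row normalization
# into a Markov (diffusion) operator.
eps = 2.0
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-D2 / eps)
P = K / K.sum(axis=1, keepdims=True)      # row-stochastic

# Eigendecomposition: the trivial top eigenvalue is 1 with a constant
# eigenvector, so the embedding uses the next components - and, as the
# paper notes, the first non-trivial ones need not be the most relevant.
vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
vals, vecs = vals.real[order], vecs.real[:, order]

embedding = vecs[:, 1:3] * vals[1:3]      # 2-D diffusion coordinates
```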


【10】HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH
标题:HE-SNR:通过熵揭示潜在逻辑以指导SWE-BENCH上的中期训练
链接:https://arxiv.org/abs/2601.20255

作者:Yueyang Wang,Jiawei Fu,Baolong Bi,Xili Wang,Xiaoqing Liu
备注:21 pages, 15 figures
摘要:SWE-bench has emerged as the premier benchmark for evaluating Large Language Models on complex software engineering tasks. While these capabilities are fundamentally acquired during the mid-training phase and subsequently elicited during Supervised Fine-Tuning (SFT), there remains a critical deficit in metrics capable of guiding mid-training effectively. Standard metrics such as Perplexity (PPL) are compromised by the "Long-Context Tax" and exhibit weak correlation with downstream SWE performance. In this paper, we bridge this gap by first introducing a rigorous data filtering strategy. Crucially, we propose the Entropy Compression Hypothesis, redefining intelligence not by scalar Top-1 compression, but by the capacity to structure uncertainty into Entropy-Compressed States of low orders ("reasonable hesitation"). Grounded in this fine-grained entropy analysis, we formulate a novel metric, HE-SNR (High-Entropy Signal-to-Noise Ratio). Validated on industrial-scale Mixture-of-Experts (MoE) models across varying context windows (32K/128K), our approach demonstrates superior robustness and predictive power. This work provides both the theoretical foundation and practical tools for optimizing the latent potential of LLMs in complex engineering domains.


【11】An Accounting Identity for Algorithmic Fairness
标题:算法公平的会计恒等式
链接:https://arxiv.org/abs/2601.20217

作者:Hadi Elzayn,Jacob Goldin
摘要:We derive an accounting identity for predictive models that links accuracy with common fairness criteria. The identity shows that for globally calibrated models, the weighted sums of miscalibration within groups and error imbalance across groups is equal to a "total unfairness budget." For binary outcomes, this budget is the model's mean-squared error times the difference in group prevalence across outcome classes. The identity nests standard impossibility results as special cases, while also describing inherent tradeoffs when one or more fairness measures are not perfectly satisfied. The results suggest that accuracy and fairness are best viewed as complements in binary prediction tasks: increasing accuracy necessarily shrinks the total unfairness budget and vice-versa. Experiments on benchmark data confirm the theory and show that many fairness interventions largely substitute between fairness violations, and when they reduce accuracy they tend to expand the total unfairness budget. The results extend naturally to prediction tasks with non-binary outcomes, illustrating how additional outcome information can relax fairness incompatibilities and identifying conditions under which the binary-style impossibility does and does not extend to regression tasks.


【12】Hyperparameter Transfer with Mixture-of-Expert Layers
标题:混合专家层的超参数迁移
链接:https://arxiv.org/abs/2601.20205

作者:Tianze Jiang,Blake Bordelon,Cengiz Pehlevan,Boris Hanin
备注:25 Pages
摘要:Mixture-of-Experts (MoE) layers have emerged as an important tool in scaling up modern neural networks by decoupling total trainable parameters from activated parameters in the forward pass for each token. However, sparse MoEs add complexity to training due to (i) new trainable parameters (router weights) that, like all other parameter groups, require hyperparameter (HP) tuning; (ii) new architecture scale dimensions (number of and size of experts) that must be chosen and potentially taken large. To make HP selection cheap and reliable, we propose a new parameterization for transformer models with MoE layers when scaling model width, depth, number of experts, and expert (hidden) size. Our parameterization is justified by a novel dynamical mean-field theory (DMFT) analysis. When varying different model dimensions trained at a fixed token budget, we find empirically that our parameterization enables reliable HP transfer across models from 51M to over 2B total parameters. We further take HPs identified from sweeping small models on a short token horizon to train larger models on longer horizons and report performant model behaviors.


【13】NeuraLSP: An Efficient and Rigorous Neural Left Singular Subspace Preconditioner for Conjugate Gradient Methods
标题:NeuraLSP:一种高效且严格的用于共轭梯度法的神经左奇异子空间预条件子
链接:https://arxiv.org/abs/2601.20174

作者:Alexander Benanti,Xi Han,Hong Qin
摘要:Numerical techniques for solving partial differential equations (PDEs) are integral for many fields across science and engineering. Such techniques usually involve solving large, sparse linear systems, where preconditioning methods are critical. In recent years, neural methods, particularly graph neural networks (GNNs), have demonstrated their potential through accelerated convergence. Nonetheless, to extract connective structures, existing techniques aggregate discretized system matrices into graphs, and suffer from rank inflation and a suboptimal convergence rate. In this paper, we articulate NeuraLSP, a novel neural preconditioner combined with a novel loss metric that leverages the left singular subspace of the system matrix's near-nullspace vectors. By compressing spectral information into a fixed low-rank operator, our method exhibits both theoretical guarantees and empirical robustness to rank inflation, affording up to a 53% speedup. Besides the theoretical guarantees for our newly-formulated loss function, our comprehensive experimental results across diverse families of PDEs also substantiate the aforementioned theoretical advances.


【14】Local Duality for Sparse Support Vector Machines
标题:稀疏支持向量机的局部对偶性
链接:https://arxiv.org/abs/2601.20170

作者:Penghe Zhang,Naihua Xiu,Houduo Qi
摘要:Due to the rise of cardinality minimization in optimization, sparse support vector machines (SSVMs) have attracted much attention lately and show certain empirical advantages over convex SVMs. A common way to derive an SSVM is to add a cardinality function such as $\ell_0$-norm to the dual problem of a convex SVM. However, this process lacks theoretical justification. This paper fills the gap by developing a local duality theory for such an SSVM formulation and exploring its relationship with the hinge-loss SVM (hSVM) and the ramp-loss SVM (rSVM). In particular, we prove that the derived SSVM is exactly the dual problem of the 0/1-loss SVM, and the linear representer theorem holds for their local solutions. The local solution of SSVM also provides guidelines on selecting hyperparameters of hSVM and rSVM. {Under specific conditions, we show that a sequence of global solutions of hSVM converges to a local solution of 0/1-loss SVM. Moreover, a local minimizer of 0/1-loss SVM is a local minimizer of rSVM.} This explains why a local solution induced by SSVM outperforms hSVM and rSVM in the prior empirical study. We further conduct numerical tests on real datasets and demonstrate potential advantages of SSVM by working with locally nice solutions proposed in this paper.


【15】Distributional value gradients for stochastic environments
标题:随机环境下的分布值梯度
链接:https://arxiv.org/abs/2601.20071

作者:Baptiste Debes,Tinne Tuytelaars
摘要:Gradient-regularized value learning methods improve sample efficiency by leveraging learned models of transition dynamics and rewards to estimate return gradients. However, existing approaches, such as MAGE, struggle in stochastic or noisy environments, limiting their applicability. In this work, we address these limitations by extending distributional reinforcement learning on continuous state-action spaces to model not only the distribution over scalar state-action value functions but also over their gradients. We refer to this approach as Distributional Sobolev Training. Inspired by Stochastic Value Gradients (SVG), our method utilizes a one-step world model of reward and transition distributions implemented via a conditional Variational Autoencoder (cVAE). The proposed framework is sample-based and employs Max-sliced Maximum Mean Discrepancy (MSMMD) to instantiate the distributional Bellman operator. We prove that the Sobolev-augmented Bellman operator is a contraction with a unique fixed point, and highlight a fundamental smoothness trade-off underlying contraction in gradient-aware RL. To validate our method, we first showcase its effectiveness on a simple stochastic reinforcement learning toy problem, then benchmark its performance on several MuJoCo environments.
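The Max-sliced MMD used to instantiate the distributional Bellman operator can be sketched as follows (a simplified sample-based version: random rather than optimized slicing directions, and a fixed RBF bandwidth):

```python
import numpy as np

rng = np.random.default_rng(0)

def mmd2_1d(a, b, bw=1.0):
    # Biased RBF-kernel MMD^2 estimate between two 1-D samples.
    k = lambda u, v: np.exp(-(u[:, None] - v[None, :]) ** 2 / (2 * bw**2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def max_sliced_mmd2(X, Y, n_dirs=50):
    # Project both samples onto random unit directions and keep the
    # worst-case (largest) 1-D discrepancy across slices.
    dirs = rng.standard_normal((n_dirs, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return max(mmd2_1d(X @ w, Y @ w) for w in dirs)

X = rng.standard_normal((200, 3))
Y_same = rng.standard_normal((200, 3))          # same distribution
Y_shift = rng.standard_normal((200, 3)) + 2.0   # mean-shifted distribution

score_same = max_sliced_mmd2(X, Y_same)
score_shift = max_sliced_mmd2(X, Y_shift)
```

Identical distributions should score near zero while shifted ones score clearly above it.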


【16】Decomposing multimodal embedding spaces with group-sparse autoencoders
标题:使用组稀疏自编码器分解多模态嵌入空间
链接:https://arxiv.org/abs/2601.20028

作者:Chiraag Kaushik,Davis Barch,Andrea Fanelli
备注:19 pages
摘要:The Linear Representation Hypothesis asserts that the embeddings learned by neural networks can be understood as linear combinations of features corresponding to high-level concepts. Based on this ansatz, sparse autoencoders (SAEs) have recently become a popular method for decomposing embeddings into a sparse combination of linear directions, which have been shown empirically to often correspond to human-interpretable semantics. However, recent attempts to apply SAEs to multimodal embedding spaces (such as the popular CLIP embeddings for image/text data) have found that SAEs often learn "split dictionaries", where most of the learned sparse features are essentially unimodal, active only for data of a single modality. In this work, we study how to effectively adapt SAEs for the setting of multimodal embeddings while ensuring multimodal alignment. We first argue that the existence of a split dictionary decomposition on an aligned embedding space implies the existence of a non-split dictionary with improved modality alignment. Then, we propose a new SAE-based approach to multimodal embedding decomposition using cross-modal random masking and group-sparse regularization. We apply our method to popular embeddings for image/text (CLIP) and audio/text (CLAP) data and show that, compared to standard SAEs, our approach learns a more multimodal dictionary while reducing the number of dead neurons and improving feature semanticity. We finally demonstrate how this improvement in alignment of concepts between modalities can enable improvements in the interpretability and control of cross-modal tasks.
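The group-sparse idea can be made concrete with a toy l2,1 penalty that ties each feature's activations across modalities into one group (an illustration, not the paper's objective):

```python
import numpy as np

rng = np.random.default_rng(0)
n_feat = 6
img_act = rng.standard_normal(n_feat)    # activations of 6 features, image side
txt_act = rng.standard_normal(n_feat)    # same features, text side

# Group each feature's activations across the two modalities and penalize
# the sum of groupwise l2 norms: a feature is cheap only if it is small in
# BOTH modalities, discouraging unimodal "split" features.
groups = np.stack([img_act, txt_act])    # shape (2 modalities, n_feat)
l21_penalty = np.linalg.norm(groups, axis=0).sum()

# A plain l1 penalty, by contrast, treats the two modalities independently.
l1_penalty = np.abs(groups).sum()
```

Because each group's l2 norm is bounded by its l1 norm, the l2,1 penalty never exceeds the l1 penalty, and it is minimized by switching whole groups off jointly.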


【17】Cross-Session Decoding of Neural Spiking Data via Task-Conditioned Latent Alignment
标题:通过任务条件潜在对齐的神经尖峰数据跨会话解码
链接:https://arxiv.org/abs/2601.19963

作者:Canyang Zhao,Bolin Peng,J. Patrick Mayo,Ce Ju,Bing Liu
摘要 :Cross-session nonstationarity in neural activity recorded by implanted electrodes is a major challenge for invasive Brain-computer interfaces (BCIs), as decoders trained on data from one session often fail to generalize to subsequent sessions. This issue is further exacerbated in practice, as retraining or adapting decoders becomes particularly challenging when only limited data are available from a new session. To address this challenge, we propose a Task-Conditioned Latent Alignment framework (TCLA) for cross-session neural decoding. Building upon an autoencoder architecture, TCLA first learns a low-dimensional representation of neural dynamics from a source session with sufficient data. For target sessions with limited data, TCLA then aligns target latent representations to the source in a task-conditioned manner, enabling effective transfer of learned neural dynamics. We evaluate TCLA on the macaque motor and oculomotor center-out dataset. Compared to baseline methods trained solely on target-session data, TCLA consistently improves decoding performance across datasets and decoding settings, with gains in the coefficient of determination of up to 0.386 for y coordinate velocity decoding in a motor dataset. These results suggest that TCLA provides an effective strategy for transferring knowledge from source to target sessions, enabling more robust neural decoding under conditions with limited data.


【18】Probabilistic Sensing: Intelligence in Data Sampling
标题:概率感知:数据采样中的智能
链接:https://arxiv.org/abs/2601.19953

作者:Ibrahim Albulushi,Saleh Bunaiyan,Suraj S. Cheema,Hesham ElSawy,Feras Al-Dirini
备注:Accepted for presentation at IEEE ISCAS 2026 as a lecture
摘要:Extending the intelligence of sensors to the data-acquisition process - deciding whether to sample or not - can result in transformative energy-efficiency gains. However, making such a decision in a deterministic manner involves a risk of losing information. Here we present a sensing paradigm that enables making such a decision in a probabilistic manner. The paradigm takes inspiration from the autonomic nervous system and employs a probabilistic neuron (p-neuron) driven by an analog feature extraction circuit. The response time of the system is on the order of microseconds, overcoming the sub-sampling-rate response time limit and enabling real-time intelligent autonomous activation of data sampling. Validation experiments on active seismic survey data demonstrate lossless probabilistic data acquisition, with a normalized mean squared error of 0.41%, and 93% savings in the active operation time of the system and the number of generated samples.
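A toy software analogue of the p-neuron's sampling decision (the sigmoid, gain, and threshold are illustrative assumptions, not the paper's circuit): an analog feature sets a Bernoulli firing probability, so quiet periods are sampled rarely but never with probability exactly zero, avoiding deterministic data loss.

```python
import math
import random

random.seed(0)

def sample_probability(feature, gain=4.0, threshold=0.5):
    # Sigmoid of an analog feature -> Bernoulli firing probability.
    return 1.0 / (1.0 + math.exp(-gain * (feature - threshold)))

# A quiet period followed by activity; each time step independently draws
# a sample-or-not decision from the feature-driven Bernoulli.
signal = [0.05] * 500 + [0.95] * 500
decisions = [random.random() < sample_probability(a) for a in signal]

quiet_rate = sum(decisions[:500]) / 500
active_rate = sum(decisions[500:]) / 500
```

Sampling activity concentrates on the informative stretch while the quiet stretch is still occasionally probed.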


【19】Emergent Specialization in Learner Populations: Competition as the Source of Diversity
标题:学习群体的新兴专业化:竞争是多样性的源泉
链接:https://arxiv.org/abs/2601.19943

作者:Yuhao Li
备注:15 pages, 5 figures, code available at https://github.com/HowardLiYH/NichePopulation
摘要:How can populations of learners develop coordinated, diverse behaviors without explicit communication or diversity incentives? We demonstrate that competition alone is sufficient to induce emergent specialization -- learners spontaneously partition into specialists for different environmental regimes through competitive dynamics, consistent with ecological niche theory. We introduce the NichePopulation algorithm, a simple mechanism combining competitive exclusion with niche affinity tracking. Validated across six real-world domains (cryptocurrency trading, commodity prices, weather forecasting, solar irradiance, urban traffic, and air quality), our approach achieves a mean Specialization Index of 0.75 with effect sizes of Cohen's d > 20. Key findings: (1) At lambda=0 (no niche bonus), learners still achieve SI > 0.30, proving specialization is genuinely emergent; (2) Diverse populations outperform homogeneous baselines by +26.5% through method-level division of labor; (3) Our approach outperforms MARL baselines (QMIX, MAPPO, IQL) by 4.3x while being 4x faster.
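A toy competitive-exclusion loop conveys the mechanism (the update rule and constants here are illustrative, not the paper's NichePopulation algorithm): in each round only the best-performing learner on the current regime reinforces its affinity for that regime, while the others withdraw from it.

```python
import random

random.seed(0)
n_learners, n_regimes = 4, 4
# Fixed per-regime skill; affinities start uniform and evolve competitively.
skill = [[random.random() for _ in range(n_regimes)] for _ in range(n_learners)]
affinity = [[1.0] * n_regimes for _ in range(n_learners)]

for _ in range(200):
    r = random.randrange(n_regimes)                      # regime of this round
    scores = [skill[i][r] * affinity[i][r] for i in range(n_learners)]
    winner = max(range(n_learners), key=scores.__getitem__)
    affinity[winner][r] *= 1.05                          # winner deepens its niche
    for i in range(n_learners):
        if i != winner:
            affinity[i][r] *= 0.95                       # losers are excluded

# Each learner's emergent specialty: the regime it is most attached to.
specialists = [max(range(n_regimes), key=affinity[i].__getitem__)
               for i in range(n_learners)]
```

Because the initial winner of a regime only grows stronger there, every regime ends up with a dominant specialist without any explicit diversity incentive.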


【20】Neural Quantum States in Mixed Precision
标题:混合精度下的神经量子态
链接:https://arxiv.org/abs/2601.20782

作者:Massimo Solinas,Agnes Valenti,Nawaf Bou-Rabee,Roeland Wiersema
备注:22 pages, 12 figures
摘要:Scientific computing has long relied on double precision (64-bit floating point) arithmetic to guarantee accuracy in simulations of real-world phenomena. However, the growing availability of hardware accelerators such as Graphics Processing Units (GPUs) has made low-precision formats attractive due to their superior performance, reduced memory footprint, and improved energy efficiency. In this work, we investigate the role of mixed-precision arithmetic in neural-network based Variational Monte Carlo (VMC), a widely used method for solving computationally otherwise intractable quantum many-body systems. We first derive general analytical bounds on the error introduced by reduced precision on Metropolis-Hastings MCMC, and then empirically validate these bounds on the use-case of VMC. We demonstrate that significant portions of the algorithm, in particular, sampling the quantum state, can be executed in half precision without loss of accuracy. More broadly, this work provides a theoretical framework to assess the applicability of mixed-precision arithmetic in machine-learning approaches that rely on MCMC sampling. In the context of VMC, we additionally demonstrate the practical effectiveness of mixed-precision strategies, enabling more scalable and energy-efficient simulations of quantum many-body systems.
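The claim that MCMC sampling tolerates reduced precision can be checked on a toy target. This is an illustrative sketch, not the paper's VMC setup: the standard-normal target, the step size, and the binary16 round-trip via `struct` are all assumptions made here.

```python
import math
import random
import struct

def half(x: float) -> float:
    """Round-trip a float through IEEE 754 binary16 (half precision)."""
    return struct.unpack('e', struct.pack('e', x))[0]

def mh_half_precision(n_steps=20000, step=1.0, seed=0):
    """Metropolis-Hastings on a standard normal where the log-density is
    evaluated in half precision; the chain should still reproduce the
    target's moments despite the coarse arithmetic."""
    rng = random.Random(seed)
    logp = lambda x: half(-0.5 * x * x)   # log-density rounded to binary16
    x, samples = 0.0, []
    for _ in range(n_steps):
        prop = x + rng.uniform(-step, step)  # symmetric proposal
        if math.log(rng.random()) < logp(prop) - logp(x):
            x = prop
        samples.append(x)
    return samples
```

The half-precision rounding perturbs each acceptance ratio by a relative error of order 1e-3, which is far below the Monte Carlo noise floor of the chain, in line with the error bounds the paper derives.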


【21】Leveraging Second-Order Curvature for Efficient Learned Image Compression: Theory and Empirical Evidence
标题:利用二阶曲率实现高效的学习图像压缩:理论与经验证据
链接:https://arxiv.org/abs/2601.20769

作者:Yichi Zhang,Fengqing Zhu
摘要 :Training learned image compression (LIC) models entails navigating a challenging optimization landscape defined by the fundamental trade-off between rate and distortion. Standard first-order optimizers, such as SGD and Adam, struggle with \emph{gradient conflicts} arising from competing objectives, leading to slow convergence and suboptimal rate-distortion performance. In this work, we demonstrate that a simple utilization of a second-order quasi-Newton optimizer, \textbf{SOAP}, dramatically improves both training efficiency and final performance across diverse LICs. Our theoretical and empirical analyses reveal that Newton preconditioning inherently resolves the intra-step and inter-step update conflicts intrinsic to the R-D objective, facilitating faster, more stable convergence. Beyond acceleration, we uncover a critical deployability benefit: second-order trained models exhibit significantly fewer activation and latent outliers. This substantially enhances robustness to post-training quantization. Together, these results establish second-order optimization, achievable as a seamless drop-in replacement of the imported optimizer, as a powerful, practical tool for advancing the efficiency and real-world readiness of LICs.
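Why curvature-aware preconditioning helps on an ill-conditioned objective can be seen on a tiny quadratic. This is a generic diagonal-Newton illustration under assumed curvatures `h1`, `h2`, not the SOAP optimizer or the paper's rate-distortion objective.

```python
def quadratic_losses(h1=1.0, h2=100.0, x0=(1.0, 1.0), lr=0.018, steps=50):
    """On f(x) = 0.5*(h1*x1^2 + h2*x2^2) with very different curvatures,
    plain gradient descent must use a tiny step to stay stable in the stiff
    direction, while a Newton step (gradient divided by its curvature)
    converges immediately in both directions."""
    h = (h1, h2)
    gd, nt = list(x0), list(x0)
    for _ in range(steps):
        gd = [xi - lr * hi * xi for xi, hi in zip(gd, h)]        # first-order
        nt = [xi - (hi * xi) / hi for xi, hi in zip(nt, h)]      # H^{-1} g
    f = lambda x: 0.5 * (h1 * x[0] ** 2 + h2 * x[1] ** 2)
    return f(gd), f(nt)
```

The preconditioned step rescales each direction by its own curvature, which is the mechanism by which second-order updates avoid the conflicting step-size demands of the rate and distortion terms.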


【22】A scalable flow-based approach to mitigate topological freezing
标题:一种可扩展的基于流的方法来缓解拓扑冻结
链接:https://arxiv.org/abs/2601.20708

作者:Claudio Bonanno,Andrea Bulgarelli,Elia Cellini,Alessandro Nada,Dario Panfalone,Davide Vadacchino,Lorenzo Verzichelli
备注:1+9 pages, 3 figures, contribution to the 42nd International Symposium on Lattice Field Theory (Lattice 2025), 2-8 November 2025, Mumbai, India
摘要:As lattice gauge theories with non-trivial topological features are driven towards the continuum limit, standard Markov Chain Monte Carlo simulations suffer from topological freezing, i.e., a dramatic growth of autocorrelations in topological observables. A widely used strategy is the adoption of Open Boundary Conditions (OBC), which restores ergodic sampling of topology but at the price of breaking translation invariance and introducing unphysical boundary artifacts. In this contribution we summarize a scalable, exact flow-based strategy to remove them by transporting configurations from a prior with an OBC defect to a fully periodic ensemble, and apply it to 4d SU(3) Yang--Mills theory. The method is based on a Stochastic Normalizing Flow (SNF) that alternates non-equilibrium Monte Carlo updates with localized, gauge-equivariant defect coupling layers implemented via masked parametric stout smearing. Training is performed by minimizing the average dissipated work, equivalent to a Kullback--Leibler divergence between forward and reverse non-equilibrium path measures, to achieve more reversible trajectories and improved efficiency. We discuss the scaling with the number of degrees of freedom affected by the defect and show that defect SNFs achieve better performances than purely stochastic non-equilibrium methods at comparable cost. Finally, we validate the approach by reproducing reference results for the topological susceptibility.
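The "average dissipated work" objective rests on Jarzynski's equality, which a toy one-dimensional example can make concrete. This is a hypothetical Gaussian-to-Gaussian sketch, not the paper's lattice setup: instantaneous switching replaces the SNF's learned trajectory.

```python
import math
import random

def jarzynski_delta_f(sigma=0.5, n=50000, seed=0):
    """Draw x from the prior N(0,1), switch instantaneously to the target
    N(0, sigma^2), and record the work W = U_target(x) - U_prior(x).
    Jarzynski's equality gives Delta F = -log E[exp(-W)] exactly, while the
    average work E[W] upper-bounds Delta F; the gap is the dissipation that
    SNF training drives down to make trajectories more reversible."""
    rng = random.Random(seed)
    works = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        works.append(0.5 * x * x / sigma ** 2 - 0.5 * x * x)
    mean_w = sum(works) / n
    delta_f = -math.log(sum(math.exp(-w) for w in works) / n)
    return delta_f, mean_w
```

For `sigma = 0.5` the exact answer is Delta F = -log(sigma) = log 2, and the large positive gap `mean_w - delta_f` shows how much an unoptimized (here: instantaneous) protocol dissipates.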


【23】Assembling the Mind's Mosaic: Towards EEG Semantic Intent Decoding
标题:组装思维马赛克:走向脑电语义意图解码
链接:https://arxiv.org/abs/2601.20447

作者:Jiahe Li,Junru Chen,Fanqi Shen,Jialan Yang,Jada Li,Zhizhang Yuan,Baowen Cheng,Meng Li,Yang Yang
摘要:Enabling natural communication through brain-computer interfaces (BCIs) remains one of the most profound challenges in neuroscience and neurotechnology. While existing frameworks offer partial solutions, they are constrained by oversimplified semantic representations and a lack of interpretability. To overcome these limitations, we introduce Semantic Intent Decoding (SID), a novel framework that translates neural activity into natural language by modeling meaning as a flexible set of compositional semantic units. SID is built on three core principles: semantic compositionality, continuity and expandability of semantic space, and fidelity in reconstruction. We present BrainMosaic, a deep learning architecture implementing SID. BrainMosaic decodes multiple semantic units from EEG/SEEG signals using set matching and then reconstructs coherent sentences through semantic-guided reconstruction. This approach moves beyond traditional pipelines that rely on fixed-class classification or unconstrained generation, enabling a more interpretable and expressive communication paradigm. Extensive experiments on multilingual EEG and clinical SEEG datasets demonstrate that SID and BrainMosaic offer substantial advantages over existing frameworks, paving the way for natural and effective BCI-mediated communication.


【24】Empirical Likelihood-Based Fairness Auditing: Distribution-Free Certification and Flagging
标题:基于经验似然的公平性审计:无分布假设的认证与标记
链接:https://arxiv.org/abs/2601.20269

作者:Jie Tang,Chuanlong Xie,Xianli Zeng,Lixing Zhu
备注:55 pages, 9 figures; Code available at: https://github.com/Tang-Jay/ELFA; Author list is in alphabetical order by last names
摘要:Machine learning models in high-stakes applications, such as recidivism prediction and automated personnel selection, often exhibit systematic performance disparities across sensitive subpopulations, raising critical concerns regarding algorithmic bias. Fairness auditing addresses these risks through two primary functions: certification, which verifies adherence to fairness constraints; and flagging, which isolates specific demographic groups experiencing disparate treatment. However, existing auditing techniques are frequently limited by restrictive distributional assumptions or prohibitive computational overhead. We propose a novel empirical likelihood-based (EL) framework that constructs robust statistical measures for model performance disparities. Unlike traditional methods, our approach is non-parametric; the proposed disparity statistics follow asymptotically chi-square or mixed chi-square distributions, ensuring valid inference without assuming underlying data distributions. This framework uses a constrained optimization profile that admits stable numerical solutions, facilitating both large-scale certification and efficient subpopulation discovery. Empirically, the EL methods outperform bootstrap-based approaches, yielding coverage rates closer to nominal levels while reducing computational latency by several orders of magnitude. We demonstrate the practical utility of this framework on the COMPAS dataset, where it successfully flags intersectional biases, specifically identifying a significantly higher positive prediction rate for African-American males under 25 and a systemic under-prediction for Caucasian females relative to the population mean.
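The core ingredient, an empirical-likelihood ratio statistic with a chi-square calibration, can be sketched for the simplest case of a mean. This follows Owen's classic construction and is not the paper's auditing pipeline; the disparity measures there generalize this statistic.

```python
import math

def el_log_ratio(data, mu, tol=1e-10):
    """Empirical-likelihood statistic -2 log R(mu) for H0: E[X] = mu,
    asymptotically chi^2 with 1 degree of freedom. The Lagrange multiplier
    is found by bisection on the (monotone) estimating equation."""
    n = len(data)
    d = [x - mu for x in data]
    if not (min(d) < 0.0 < max(d)):
        raise ValueError("mu must lie strictly inside the data range")
    eps = 1e-12
    lo = -1.0 / max(d) + eps   # keep every weight 1 + lam*d_i positive
    hi = -1.0 / min(d) - eps
    g = lambda lam: sum(di / (1.0 + lam * di) for di in d)   # decreasing in lam
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * di) for di in d)
```

No distributional assumption enters: the statistic is computed from data weights alone, and the hypothesized value is rejected when it exceeds a chi-square quantile, which is the "distribution-free" property the abstract emphasizes.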


【25】Quantum statistics from classical simulations via generative Gibbs sampling
标题:通过生成式吉布斯抽样从经典模拟中获得的量子统计
链接:https://arxiv.org/abs/2601.20228

作者:Weizhou Wang,Xuanxi Zhang,Jonathan Weare,Aaron R. Dinner
备注:12 pages, 9 figures
摘要 :Accurate simulation of nuclear quantum effects is essential for molecular modeling but expensive using path integral molecular dynamics (PIMD). We present GG-PI, a ring-polymer-based framework that combines generative modeling of the single-bead conditional density with Gibbs sampling to recover quantum statistics from classical simulation data. GG-PI uses inexpensive standard classical simulations or existing data for training and allows transfer across temperatures without retraining. On standard test systems, GG-PI significantly reduces wall clock time compared to PIMD. Our approach extends easily to a wide range of problems with similar Markov structure.
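The alternating-conditional structure that GG-PI exploits is plain Gibbs sampling, which a bivariate normal makes easy to see. This is a toy analogy with exact conditionals; the paper's point is to replace such conditionals with a learned generative model of a single bead.

```python
import math
import random

def gibbs_bivariate_normal(rho=0.8, n=20000, burn=500, seed=1):
    """Gibbs sampler for a bivariate standard normal with correlation rho:
    each coordinate is redrawn in turn from its exact 1D conditional,
    x | y ~ N(rho*y, 1 - rho^2) and symmetrically for y | x."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x = y = 0.0
    out = []
    for i in range(n + burn):
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        if i >= burn:
            out.append((x, y))
    return out
```

Because each update only needs the conditional of one coordinate given the rest, the same scheme applies to one ring-polymer bead given its neighbors, which is the quantity GG-PI models generatively from classical data.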


【26】Global Plane Waves From Local Gaussians: Periodic Charge Densities in a Blink
标题:由局域高斯得到全局平面波:转瞬间获得周期性电荷密度
链接:https://arxiv.org/abs/2601.19966

作者:Jonas Elsborg,Felix Ærtebjerg,Luca Thiede,Alán Aspuru-Guzik,Tejs Vegge,Arghya Bhowmik
备注:24 pages including appendix, 8 Figures, 5 tables
摘要:We introduce ELECTRAFI, a fast, end-to-end differentiable model for predicting periodic charge densities in crystalline materials. ELECTRAFI constructs anisotropic Gaussians in real space and exploits their closed-form Fourier transforms to analytically evaluate plane-wave coefficients via the Poisson summation formula. This formulation delegates non-local and periodic behavior to analytic transforms, enabling reconstruction of the full periodic charge density with a single inverse FFT. By avoiding explicit real-space grid probing, periodic image summation, and spherical harmonic expansions, ELECTRAFI matches or exceeds state-of-the-art accuracy across periodic benchmarks while being up to $633 \times$ faster than the strongest competing method, reconstructing crystal charge densities in a fraction of a second. When used to initialize DFT calculations, ELECTRAFI reduces total DFT compute cost by up to ~20%, whereas slower charge density models negate savings due to high inference times. Our results show that accuracy and inference cost jointly determine end-to-end DFT speedups, and motivate our focus on efficiency.
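The Poisson-summation trick at the heart of this approach can be demonstrated in one dimension: the Fourier coefficients of a periodically repeated Gaussian are just the closed-form Gaussian Fourier transform sampled at reciprocal-lattice points. This is an illustrative scalar toy, not ELECTRAFI's anisotropic 3D formulation.

```python
import cmath
import math

def periodic_gaussian_density(amp, mu, sigma, L, N, kmax=32):
    """Build the periodic density rho(x) = sum_n g(x - n*L) on an N-point
    grid from plane-wave coefficients c_k = g_hat(2*pi*k/L)/L, where
    g_hat(xi) = amp*sigma*sqrt(2*pi)*exp(-(sigma*xi)^2/2)*exp(-i*xi*mu)
    is the analytic Fourier transform of the Gaussian (Poisson summation)."""
    xs = [L * j / N for j in range(N)]
    rho = [0.0] * N
    for k in range(-kmax, kmax + 1):
        xi = 2.0 * math.pi * k / L
        c = (amp * sigma * math.sqrt(2.0 * math.pi) / L
             * math.exp(-0.5 * (sigma * xi) ** 2)
             * cmath.exp(-1j * xi * mu))
        for j, x in enumerate(xs):
            rho[j] += (c * cmath.exp(1j * xi * x)).real
    return xs, rho

def direct_periodic_sum(amp, mu, sigma, L, N, images=5):
    """Reference: brute-force summation over periodic images in real space."""
    xs = [L * j / N for j in range(N)]
    return [sum(amp * math.exp(-((x - mu - n * L) ** 2) / (2 * sigma ** 2))
                for n in range(-images, images + 1)) for x in xs]
```

The reciprocal-space route needs no real-space image summation and, with an FFT in place of the explicit series above, reconstructs the full periodic density in a single inverse transform, which is the source of the speedups reported in the abstract.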


机器翻译由腾讯交互翻译提供,仅供参考
