
Machine Learning Academic Digest [2.2]

arXiv Daily Academic Digest • 3 days ago • 85 views



cs.LG: 262 papers today


Large models (39 papers)

【1】TEON: Tensorized Orthonormalization Beyond Layer-Wise Muon for Large Language Model Pre-Training
标题:TEON:超越逐层Muon的张量化正交归一化,用于大型语言模型预训练
链接:https://arxiv.org/abs/2601.23261

作者:Ruijie Zhang,Yequan Zhao,Ziyue Liu,Zhengyang Wang,Dongyang Li,Yupeng Su,Sijia Liu,Zheng Zhang
摘要:The Muon optimizer has demonstrated strong empirical performance in pre-training large language models by performing matrix-level gradient (or momentum) orthogonalization in each layer independently. In this work, we propose TEON, a principled generalization of Muon that extends orthogonalization beyond individual layers by modeling the gradients of a neural network as a structured higher-order tensor. We present TEON's improved convergence guarantee over layer-wise Muon, and further develop a practical instantiation of TEON based on the theoretical analysis with corresponding ablation. We evaluate our approach on two widely adopted architectures: GPT-style models, ranging from 130M to 774M parameters, and LLaMA-style models, ranging from 60M to 1B parameters. Experimental results show that TEON consistently improves training and validation perplexity across model scales and exhibits strong robustness under various approximate SVD schemes.
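
To make the key operation concrete: layer-wise Muon replaces each weight matrix's gradient/momentum with its nearest orthogonal (polar) factor, e.g. U V^T from an SVD. The numpy sketch below shows only that generic per-layer step, which TEON generalizes to a structured higher-order tensor across layers; it is not the authors' code, and the shapes, learning rate, and use of an exact SVD are illustrative assumptions.

    import numpy as np

    def orthogonalize(momentum: np.ndarray) -> np.ndarray:
        """Replace a 2-D momentum matrix with its polar factor U @ V^T,
        the matrix-level orthogonalization that layer-wise Muon applies."""
        u, _, vt = np.linalg.svd(momentum, full_matrices=False)
        return u @ vt

    # Toy usage: one update step for a single layer.
    rng = np.random.default_rng(0)
    weight = rng.normal(size=(64, 32))
    momentum = rng.normal(size=(64, 32))   # accumulated gradient momentum
    weight -= 0.02 * orthogonalize(momentum)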


【2】Training-Free Test-Time Adaptation with Brownian Distance Covariance in Vision-Language Models
标题:视觉语言模型中具有布朗距离协方差的免训练测试时间适应
链接:https://arxiv.org/abs/2601.23253

作者:Yi Zhang,Chun-Wun Cheng,Angelica I. Aviles-Rivero,Zhihai He,Liang-Jie Zhang
备注:Accepted in ICASSP 2026
摘要:Vision-language models suffer performance degradation under domain shift, limiting real-world applicability. Existing test-time adaptation methods are computationally intensive, rely on back-propagation, and often focus on single modalities. To address these issues, we propose Training-free Test-Time Adaptation with Brownian Distance Covariance (TaTa). TaTa leverages Brownian Distance Covariance-a powerful statistical measure that captures both linear and nonlinear dependencies via pairwise distances-to dynamically adapt VLMs to new domains without training or back-propagation. This not only improves efficiency but also enhances stability by avoiding disruptive weight updates. TaTa further integrates attribute-enhanced prompting to improve vision-language inference with descriptive visual cues. Combined with dynamic clustering and pseudo-label refinement, it effectively recalibrates the model for novel visual contexts. Experiments across diverse datasets show that TaTa significantly reduces computational cost while achieving state-of-the-art performance in domain and cross-dataset generalization.
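
The core statistic named here is well defined on its own: squared Brownian distance covariance is computed from double-centered pairwise-distance matrices of two samples. The sketch below implements that statistic only (not TaTa's adaptation pipeline); the sample sizes and data are toy placeholders.

    import numpy as np

    def distance_covariance_sq(x: np.ndarray, y: np.ndarray) -> float:
        """Empirical squared Brownian distance covariance between paired samples
        x: (n, p) and y: (n, q); nonzero under linear or nonlinear dependence."""
        def centered(z):
            d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
            return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()
        a, b = centered(x), centered(y)
        return float((a * b).mean())

    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 8))
    y = x ** 2 + 0.1 * rng.normal(size=(200, 8))   # nonlinearly dependent on x
    print(distance_covariance_sq(x, y))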


【3】Probing the Trajectories of Reasoning Traces in Large Language Models
标题:探索大型语言模型中推理痕迹的轨迹
链接:https://arxiv.org/abs/2601.23163

作者:Marthe Ballon,Brecht Verbeken,Vincent Ginis,Andres Algaba
备注:33 pages, 20 figures, 4 tables
摘要:Large language models (LLMs) increasingly solve difficult problems by producing "reasoning traces" before emitting a final response. However, it remains unclear how accuracy and decision commitment evolve along a reasoning trajectory, and whether intermediate trace segments provide answer-relevant information beyond generic length or stylistic effects. Here, we propose a protocol to systematically probe the trajectories of reasoning traces in LLMs by 1) generating a model's reasoning trace, 2) truncating it at fixed token-percentiles, and 3) injecting each partial trace back into the model (or a different model) to measure the induced distribution over answer choices via next-token probabilities. We apply this protocol to the open-source Qwen3-4B/-8B/-14B and gpt-oss-20b/-120b models across the multiple-choice GPQA Diamond and MMLU-Pro benchmarks. We find that accuracy and decision commitment consistently increase as the percentage of provided reasoning tokens grows. These gains are primarily driven by relevant content in the model generation rather than context length or generic "reasoning style" effects. Stronger models often backtrack successfully from incorrect partial traces, but immediate answers often remain anchored in the weaker model's incorrect response. More broadly, we show that trajectory probing provides diagnostics for efficient and safer deployment of reasoning models as the measurements can inform practical trace-handling and monitoring policies that improve reliability without assuming intermediate tokens are inherently faithful explanations.
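
A minimal sketch of the truncate-and-probe idea, assuming a Hugging Face causal LM (Qwen/Qwen3-4B is used purely as an example checkpoint) and a made-up multiple-choice prompt format; the authors' prompts, truncation grid, and answer extraction will differ.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "Qwen/Qwen3-4B"                      # any causal LM works for the sketch
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    def probe(question, trace, fraction, choices=("A", "B", "C", "D")):
        """Feed the question plus the first `fraction` of the reasoning trace and
        return the induced distribution over the answer letters."""
        trace_ids = tok(trace, add_special_tokens=False).input_ids
        partial = tok.decode(trace_ids[: int(len(trace_ids) * fraction)])
        prompt = f"{question}\n{partial}\nAnswer:"
        with torch.no_grad():
            logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]
        ids = [tok(f" {c}", add_special_tokens=False).input_ids[0] for c in choices]
        return torch.softmax(logits[ids], dim=-1)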


【4】No More, No Less: Least-Privilege Language Models
标题:不多也不少:最小特权语言模型
链接:https://arxiv.org/abs/2601.23157

作者:Paulius Rauba,Dominykas Seputis,Patrikas Vanagas,Mihaela van der Schaar
摘要 :Least privilege is a core security principle: grant each request only the minimum access needed to achieve its goal. Deployed language models almost never follow it, instead being exposed through a single API endpoint that serves all users and requests. This gap exists not because least privilege would be unhelpful; deployments would benefit greatly from reducing unnecessary capability exposure. The real obstacle is definitional and mechanistic: what does "access" mean inside a language model, and how can we enforce it without retraining or deploying multiple models? We take inspiration from least privilege in computer systems and define a class of models called least-privilege language models, where privilege is reachable internal computation during the forward pass. In this view, lowering privilege literally shrinks the model's accessible function class, as opposed to denying access via learned policies. We formalize deployment-time control as a monitor-allocator-enforcer stack, separating (i) request-time signals, (ii) a decision rule that allocates privilege, and (iii) an inference-time mechanism that selects privilege. We then propose Nested Least-Privilege Networks, a shape-preserving, rank-indexed intervention that provides a smooth, reversible control knob. We show that this knob yields policy-usable privilege-utility frontiers and enables selective suppression of targeted capabilities with limited collateral degradation across various policies. Most importantly, we argue for a new deployment paradigm that challenges the premise that language models can only be controlled at the output level.


【5】SPICE: Submodular Penalized Information-Conflict Selection for Efficient Large Language Model Training
标题:SPICE:用于高效大型语言模型训练的次模惩罚信息-冲突选择
链接:https://arxiv.org/abs/2601.23155

作者:Powei Chang,Jinpeng Zhang,Bowen Chen,Chenyu Wang,Chenlu Guo,Yixing Zhang,Yukang Gao,JianXiang Xiang,Yue Gao,Chaoqun Sun,Yiyi Chen,Dongying Kong
备注:39 pages, 9 figures, 15 tables (including appendices)
摘要:Information-based data selection for instruction tuning is compelling: maximizing the log-determinant of the Fisher information yields a monotone submodular objective, enabling greedy algorithms to achieve a $(1-1/e)$ approximation under a cardinality budget. In practice, however, we identify that alleviating gradient conflicts (misalignment between per-sample gradients) is a key factor that slows down the decay of marginal log-determinant information gains, thereby preventing significant loss of information. We formalize this via an $\varepsilon$-decomposition that quantifies the deviation from ideal submodularity as a function of conflict statistics, yielding data-dependent approximation factors that tighten as conflicts diminish. Guided by this analysis, we propose SPICE, a conflict-aware selector that maximizes information while penalizing misalignment, and that supports early stopping and proxy models for efficiency. Empirically, SPICE selects subsets with higher log-determinant information than the original criteria, and these informational gains translate into performance improvements: across 8 benchmarks with LLaMA2-7B and Qwen2-7B, SPICE uses only 10% of the data, yet matches or exceeds 6 methods including full-data tuning. This achieves performance improvements with substantially lower training cost.
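
The objective in the first sentence can be sketched directly: greedily pick the sample whose gradient feature gives the largest marginal gain log(1 + g^T A^{-1} g) to the log-determinant of the accumulated information matrix. The code below is that generic selector only (SPICE's conflict penalty is omitted), with random vectors standing in for per-sample gradients.

    import numpy as np

    def greedy_logdet_selection(grads: np.ndarray, budget: int, lam: float = 1e-3):
        """Greedy maximization of log det(lam*I + sum_{i in S} g_i g_i^T):
        the monotone submodular objective behind information-based selection."""
        n, d = grads.shape
        a_inv = np.eye(d) / lam          # inverse of the current information matrix
        selected = []
        for _ in range(budget):
            # argmax of g^T A^{-1} g equals argmax of the gain log(1 + g^T A^{-1} g)
            quad = np.einsum("nd,de,ne->n", grads, a_inv, grads)
            quad[selected] = -np.inf
            best = int(np.argmax(quad))
            selected.append(best)
            g = grads[best]
            ag = a_inv @ g               # Sherman-Morrison rank-one update of A^{-1}
            a_inv -= np.outer(ag, ag) / (1.0 + g @ ag)
        return selected

    rng = np.random.default_rng(0)
    print(greedy_logdet_selection(rng.normal(size=(500, 16)), budget=10))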


【6】Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data
标题:Behemoth:使用完全合成数据对LLM中的遗忘学习进行基准测试
链接:https://arxiv.org/abs/2601.23153

作者:Eugenia Iofinova,Dan Alistarh
摘要:As artificial neural networks, and specifically large language models, have improved rapidly in capabilities and quality, they have increasingly been deployed in real-world applications, from customer service to Google search, despite the fact that they frequently make factually incorrect or undesirable statements. This trend has inspired practical and academic interest in model editing, that is, in adjusting the weights of the model to modify its likely outputs for queries relating to a specific fact or set of facts. This may be done either to amend a fact or set of facts, for instance, to fix a frequent error in the training data, or to suppress a fact or set of facts entirely, for instance, in case of dangerous knowledge. Multiple methods have been proposed to do such edits. However, at the same time, it has been shown that such model editing can be brittle and incomplete. Moreover the effectiveness of any model editing method necessarily depends on the data on which the model is trained, and, therefore, a good understanding of the interaction of the training data distribution and the way it is stored in the network is necessary and helpful to reliably perform model editing. However, working with large language models trained on real-world data does not allow us to understand this relationship or fully measure the effects of model editing. We therefore propose Behemoth, a fully synthetic data generation framework. To demonstrate the practical insights from the framework, we explore model editing in the context of simple tabular data, demonstrating surprising findings that, in some cases, echo real-world results, for instance, that in some cases restricting the update rank results in a more effective update. The code is available at https://github.com/IST-DASLab/behemoth.git.


【7】CATTO: Balancing Preferences and Confidence in Language Models
标题:CATTO:平衡语言模型的偏好和信心
链接:https://arxiv.org/abs/2601.23096

作者:Nisarg Parikh,Kunjal Panchal,Ananya Sai,Pannaga Shivaswamy,Andrew Lan
摘要:Large language models (LLMs) often make accurate next token predictions but their confidence in these predictions can be poorly calibrated: high-confidence predictions are frequently wrong, and low-confidence predictions may be correct. This miscalibration is exacerbated by preference-based alignment methods breaking the link between predictive probability and correctness. We introduce a Calibration Aware Token-level Training Objective (CATTO), a calibration-aware objective that aligns predicted confidence with empirical prediction correctness, which can be combined with the original preference optimization objectives. Empirically, CATTO reduces Expected Calibration Error (ECE) by 2.22%-7.61% in-distribution and 1.46%-10.44% out-of-distribution compared to direct preference optimization (DPO), and by 0.22%-1.24% in-distribution and 1.23%-5.07% out-of-distribution compared to the strongest DPO baseline. This improvement in confidence does not come at a cost of losing task accuracy, where CATTO maintains or slightly improves multiple-choice question-answering accuracy on five datasets. We also introduce Confidence@k, a test-time scaling mechanism leveraging calibrated token probabilities for Bayes-optimal selection of output tokens.
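
For reference, the Expected Calibration Error reported above is the standard binned statistic sketched below; the bin count and toy inputs are arbitrary, and this is not CATTO's training objective.

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
        """Binned ECE: average |accuracy - mean confidence| over confidence bins,
        weighted by the fraction of predictions that fall in each bin."""
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
        return ece

    print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 0, 1, 1]))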


【8】Mano: Restriking Manifold Optimization for LLM Training
标题:Mano:重新激发流形优化用于LLM训练
链接:https://arxiv.org/abs/2601.23000

作者:Yufei Gu,Zeke Xie
摘要 :While large language models (LLMs) have emerged as a significant advancement in artificial intelligence, the hardware and computational costs for training LLMs are also significantly burdensome. Among the state-of-the-art optimizers, AdamW relies on diagonal curvature estimates and ignores structural properties, while Muon applies global spectral normalization at the expense of losing curvature information. In this study, we restriked manifold optimization methods for training LLMs, which may address both optimizers' limitations, while conventional manifold optimization methods have been largely overlooked due to the poor performance in large-scale model optimization. By innovatively projecting the momentum onto the tangent space of model parameters and constraining it on a rotational Oblique manifold, we propose a novel, powerful, and efficient optimizer **Mano** that is the first to bridge the performance gap between manifold optimization and modern optimizers. Extensive experiments on the LLaMA and Qwen3 models demonstrate that Mano consistently and significantly outperforms AdamW and Muon even with less memory consumption and computational complexity, respectively, suggesting an expanded Pareto frontier in terms of space and time efficiency.
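
To unpack "projecting the momentum onto the tangent space": on an Oblique manifold with unit-norm columns, the projection strips each column's component along the current point, and a retraction renormalizes the columns. The numpy sketch shows that generic operation under the unit-norm-columns assumption; it is not Mano's actual update rule.

    import numpy as np

    def project_to_tangent(w: np.ndarray, m: np.ndarray) -> np.ndarray:
        """Tangent-space projection on the Oblique manifold (unit-norm columns):
        remove, column by column, the component of m along w."""
        coeff = np.sum(w * m, axis=0, keepdims=True)   # per-column <w_j, m_j>
        return m - w * coeff

    def retract(w: np.ndarray) -> np.ndarray:
        """Map back onto the manifold by renormalizing every column."""
        return w / np.linalg.norm(w, axis=0, keepdims=True)

    rng = np.random.default_rng(0)
    w = retract(rng.normal(size=(128, 64)))   # a point with unit-norm columns
    momentum = rng.normal(size=(128, 64))
    w = retract(w - 0.01 * project_to_tangent(w, momentum))   # one constrained step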


【9】dgMARK: Decoding-Guided Watermarking for Diffusion Language Models
标题:dgMARK:扩散语言模型的解码引导水印
链接:https://arxiv.org/abs/2601.22985

作者:Pyo Min Hong,Albert No
备注:Project page: https://dgmark-watermarking.github.io
摘要:We propose dgMARK, a decoding-guided watermarking method for discrete diffusion language models (dLLMs). Unlike autoregressive models, dLLMs can generate tokens in arbitrary order. While an ideal conditional predictor would be invariant to this order, practical dLLMs exhibit strong sensitivity to the unmasking order, creating a new channel for watermarking. dgMARK steers the unmasking order toward positions whose high-reward candidate tokens satisfy a simple parity constraint induced by a binary hash, without explicitly reweighting the model's learned probabilities. The method is plug-and-play with common decoding strategies (e.g., confidence, entropy, and margin-based ordering) and can be strengthened with a one-step lookahead variant. Watermarks are detected via elevated parity-matching statistics, and a sliding-window detector ensures robustness under post-editing operations including insertion, deletion, substitution, and paraphrasing.
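
The detection side rests on a familiar ingredient: a keyed binary hash over tokens whose parity-match rate is near 1/2 on unwatermarked text and elevated on watermarked text. The sketch below shows only that generic z-statistic; the key, hash, target parity, and the way dgMARK actually steers the unmasking order are not represented.

    import hashlib

    def parity_bit(token: str, key: str = "secret") -> int:
        """Keyed binary hash of a token; roughly a fair coin on ordinary text."""
        return hashlib.sha256((key + token).encode()).digest()[0] & 1

    def watermark_z_score(tokens) -> float:
        """How far the parity-match count deviates from the n/2 expected without a
        watermark; larger z means stronger evidence of watermarking."""
        n = len(tokens)
        matches = sum(parity_bit(t) for t in tokens)
        return (matches - 0.5 * n) / (0.25 * n) ** 0.5

    print(watermark_z_score("a simple example sentence with several tokens".split()))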


【10】Relaxing Positional Alignment in Masked Diffusion Language Models
标题:掩蔽扩散语言模型中放松位置对齐
链接:https://arxiv.org/abs/2601.22947

作者:Mengyu Ye,Ryosuke Takahashi,Keito Kudo,Jun Suzuki
摘要:Masked diffusion language models (MDLMs) have emerged as a promising alternative to dominant autoregressive approaches. Although they achieve competitive performance on several tasks, a substantial gap remains in open-ended text generation. We hypothesize that one cause of this gap is that strict positional prediction makes MDLM decoding highly sensitive to token misalignment, and we show through controlled interventions that a one-position shift can severely disrupt semantics. This observation suggests that enforcing strict positional supervision during training is misaligned with the irreversible denoising dynamics of MDLM decoding. Motivated by this mismatch, we adopt an alignment-flexible supervision strategy during fine-tuning. Specifically, we introduce a special token  via the connectionist temporal classification objective. We apply this approach to the widely used MDLM model and conduct experiments on five open-ended text generation benchmarks. Our method consistently outperforms the original model and improves robustness to positional shifts, indicating that relaxing strict positional supervision is an important factor in improving generation quality in MDLMs.
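
For readers unfamiliar with the connectionist temporal classification (CTC) objective mentioned here: CTC scores a prediction by summing over all monotone alignments to the target, which is what makes the supervision tolerant to positional shifts. Below is a toy illustration of the objective itself via PyTorch's nn.CTCLoss; the shapes, blank index, and random inputs are placeholders, not the paper's fine-tuning setup.

    import torch
    import torch.nn as nn

    T, N, C, S = 20, 2, 30, 8          # time steps, batch, classes (0 = blank), target length
    log_probs = torch.randn(T, N, C).log_softmax(dim=-1)
    targets = torch.randint(1, C, (N, S))              # labels, excluding the blank index
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.full((N,), S, dtype=torch.long)
    loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
    print(loss)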


【11】LLMs Explain't: A Post-Mortem on Semantic Interpretability in Transformer Models
标题:LLM解释不了:Transformer模型中语义可解释性的事后剖析
链接:https://arxiv.org/abs/2601.22928

作者:Alhassan Abdelhalim,Janick Edinger,Sören Laue,Michaela Regneri
摘要:Large Language Models (LLMs) are becoming increasingly popular in pervasive computing due to their versatility and strong performance. However, despite their ubiquitous use, the exact mechanisms underlying their outstanding performance remain unclear. Different methods for LLM explainability exist, and many are, as a method, not fully understood themselves. We started with the question of how linguistic abstraction emerges in LLMs, aiming to detect it across different LLM modules (attention heads and input embeddings). For this, we used methods well-established in the literature: (1) probing for token-level relational structures, and (2) feature-mapping using embeddings as carriers of human-interpretable properties.   Both attempts failed for different methodological reasons: Attention-based explanations collapsed once we tested the core assumption that later-layer representations still correspond to tokens. Property-inference methods applied to embeddings also failed because their high predictive scores were driven by methodological artifacts and dataset structure rather than meaningful semantic knowledge. These failures matter because both techniques are widely treated as evidence for what LLMs supposedly understand, yet our results show such conclusions are unwarranted. These limitations are particularly relevant in pervasive and distributed computing settings where LLMs are deployed as system components and interpretability methods are relied upon for debugging, compression, and explaining models.


【12】BEAR: Towards Beam-Search-Aware Optimization for Recommendation with Large Language Models
标题:BEAR:采用大型语言模型进行Beam-Search-Aware推荐优化
链接:https://arxiv.org/abs/2601.22925

作者:Weiqin Yang,Bohao Wang,Zhenxiang Xu,Jiawei Chen,Shengjia Zhang,Jingbang Chen,Canghong Jin,Can Wang
摘要 :Recent years have witnessed a rapid surge in research leveraging Large Language Models (LLMs) for recommendation. These methods typically employ supervised fine-tuning (SFT) to adapt LLMs to recommendation scenarios, and utilize beam search during inference to efficiently retrieve $B$ top-ranked recommended items. However, we identify a critical training-inference inconsistency: while SFT optimizes the overall probability of positive items, it does not guarantee that such items will be retrieved by beam search even if they possess high overall probabilities. Due to the greedy pruning mechanism, beam search can prematurely discard a positive item once its prefix probability is insufficient.   To address this inconsistency, we propose BEAR (Beam-SEarch-Aware Regularization), a novel fine-tuning objective that explicitly accounts for beam search behavior during training. Rather than directly simulating beam search for each instance during training, which is computationally prohibitive, BEAR enforces a relaxed necessary condition: each token in a positive item must rank within the top-$B$ candidate tokens at each decoding step. This objective effectively mitigates the risk of incorrect pruning while incurring negligible computational overhead compared to standard SFT. Extensive experiments across four real-world datasets demonstrate that BEAR significantly outperforms strong baselines. Code will be released upon acceptance.
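
One plausible way to encode the "relaxed necessary condition" described here is a per-step hinge on the gap between the B-th largest logit and the positive token's logit, added to the usual SFT loss. The sketch below is that generic penalty under a hinge form and weight chosen for illustration; the abstract does not spell out BEAR's exact formulation.

    import torch
    import torch.nn.functional as F

    def top_b_margin_penalty(logits: torch.Tensor, target_ids: torch.Tensor, b: int = 8):
        """Penalize decoding steps where the target token's logit falls below the
        b-th largest logit, i.e., where beam search with width b could prune it.
        logits: (seq_len, vocab), target_ids: (seq_len,)."""
        kth_logit = logits.topk(b, dim=-1).values[:, -1]
        target_logit = logits.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)
        return torch.relu(kth_logit - target_logit).mean()

    logits = torch.randn(12, 1000)
    targets = torch.randint(0, 1000, (12,))
    loss = F.cross_entropy(logits, targets) + 0.1 * top_b_margin_penalty(logits, targets)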


【13】Evaluating Large Language Models for Security Bug Report Prediction
标题:评估大型语言模型以预测安全漏洞报告
链接:https://arxiv.org/abs/2601.22921

作者:Farnaz Soltaniani,Shoaib Razzaq,Mohammad Ghafari
摘要:Early detection of security bug reports (SBRs) is critical for timely vulnerability mitigation. We present an evaluation of prompt-based engineering and fine-tuning approaches for predicting SBRs using Large Language Models (LLMs). Our findings reveal a distinct trade-off between the two approaches. Prompted proprietary models demonstrate the highest sensitivity to SBRs, achieving a G-measure of 77% and a recall of 74% on average across all the datasets, albeit at the cost of a higher false-positive rate, resulting in an average precision of only 22%. Fine-tuned models, by contrast, exhibit the opposite behavior, attaining a lower overall G-measure of 51% but substantially higher precision of 75% at the cost of reduced recall of 36%. Though a one-time investment in building fine-tuned models is necessary, the inference on the largest dataset is up to 50 times faster than that of proprietary models. These findings suggest that further investigations to harness the power of LLMs for SBR prediction are necessary.


【14】Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation
标题:Quartet II:通过改进的无偏梯度估计在NVFP4中进行准确的LLM预训练
链接:https://arxiv.org/abs/2601.22813

作者:Andrei Panferov,Erik Schultheis,Soroush Tabesh,Dan Alistarh
摘要:The NVFP4 lower-precision format, supported in hardware by NVIDIA Blackwell GPUs, promises to allow, for the first time, end-to-end fully-quantized pre-training of massive models such as LLMs. Yet, existing quantized training methods still sacrifice some of the representation capacity of this format in favor of more accurate unbiased quantized gradient estimation by stochastic rounding (SR), losing noticeable accuracy relative to standard FP16 and FP8 training. In this paper, we improve the state of the art for quantized training in NVFP4 via a novel unbiased quantization routine for micro-scaled formats, called MS-EDEN, that has more than 2x lower quantization error than SR. We integrate it into a novel fully-NVFP4 quantization scheme for linear layers, called Quartet II. We show analytically that Quartet II achieves consistently better gradient estimation across all major matrix multiplications, both on the forward and on the backward passes. In addition, our proposal synergizes well with recent training improvements aimed specifically at NVFP4. We further validate Quartet II on end-to-end LLM training with up to 1.9B parameters on 38B tokens. We provide kernels for execution on NVIDIA Blackwell GPUs with up to 4.2x speedup over BF16. Our code is available at https://github.com/IST-DASLab/Quartet-II.
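
For context, the stochastic-rounding baseline referred to above is easy to state: round to the nearer grid point with probability proportional to proximity, so the rounding is unbiased in expectation. The numpy sketch below shows plain SR on a uniform grid; it is not MS-EDEN, and the grid spacing is arbitrary.

    import numpy as np

    def stochastic_round(x: np.ndarray, step: float, rng) -> np.ndarray:
        """Unbiased stochastic rounding to a grid of spacing `step`:
        round up with probability equal to the fractional position, so that
        E[stochastic_round(x)] == x."""
        scaled = x / step
        low = np.floor(scaled)
        p_up = scaled - low
        return (low + (rng.random(x.shape) < p_up)) * step

    rng = np.random.default_rng(0)
    x = np.full(100_000, 0.3)
    print(stochastic_round(x, step=1.0, rng=rng).mean())   # ~0.3; round-to-nearest gives 0.0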


【15】Clipping-Free Policy Optimization for Large Language Models
标题:大型语言模型的无剪切策略优化
链接:https://arxiv.org/abs/2601.22801

作者:Ömer Veysel Çağatan,Barış Akgün,Gözde Gül Şahin,Xuandong Zhao
备注:23 pages, 10 tables, 8 figures
摘要:Reinforcement learning has become central to post-training large language models, yet dominant algorithms rely on clipping mechanisms that introduce optimization issues at scale, including zero-gradient regions, reward hacking, and training instability. We propose Clipping-Free Policy Optimization (CFPO), which replaces heuristic clipping with a convex quadratic penalty derived from Total Variation divergence constraints, yielding an everywhere-differentiable objective that enforces stable policy updates without hard boundaries. We evaluate CFPO across both reasoning and alignment settings. In reasoning, CFPO matches clipping-based methods on downstream benchmarks while extending the stable training regime. In alignment, CFPO mitigates verbosity exploitation and reduces capability degradation, while achieving competitive instruction-following performance. CFPO requires only a one-line code change and no additional hyperparameters. Our results suggest that CFPO is a promising drop-in alternative to clipping-based methods for LLM post-training.
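
The abstract does not give CFPO's exact objective, so the sketch below only contrasts the standard clipped surrogate with one generic clipping-free alternative: the plain policy-gradient term plus a convex quadratic penalty on the probability ratio's deviation from 1, which keeps gradients nonzero everywhere. The penalty form and weight are assumptions of this sketch, not the paper's TV-derived objective.

    import torch

    def clipped_objective(ratio, adv, eps=0.2):
        """PPO-style clipped surrogate (the mechanism CFPO seeks to replace)."""
        return -torch.minimum(ratio * adv, ratio.clamp(1 - eps, 1 + eps) * adv).mean()

    def quadratic_penalty_objective(ratio, adv, lam=1.0):
        """A clipping-free surrogate: policy-gradient term plus a convex quadratic
        penalty on the ratio, differentiable everywhere (no zero-gradient region)."""
        return (-(ratio * adv) + lam * (ratio - 1.0) ** 2).mean()

    ratio = torch.tensor([0.5, 1.0, 1.8], requires_grad=True)
    adv = torch.tensor([1.0, -0.5, 2.0])
    quadratic_penalty_objective(ratio, adv).backward()
    print(ratio.grad)   # nonzero even where clipping would zero the gradient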


【16】Unveiling Scaling Behaviors in Molecular Language Models: Effects of Model Size, Data, and Representation
标题:揭示分子语言模型中的缩放行为:模型大小、数据和表示的影响
链接:https://arxiv.org/abs/2601.22757

作者:Dong Xu,Qihua Pan,Sisi Yuan,Jianqiang Li,Zexuan Zhu,Junkai Ji
备注:34 pages, 51 figures
摘要 :Molecular generative models, often employing GPT-style language modeling on molecular string representations, have shown promising capabilities when scaled to large datasets and model sizes. However, it remains unclear and subject to debate whether these models adhere to predictable scaling laws under fixed computational budgets, which is a crucial understanding for optimally allocating resources between model size, data volume, and molecular representation. In this study, we systematically investigate the scaling behavior of molecular language models across both pretraining and downstream tasks. We train 300 models and conduct over 10,000 experiments, rigorously controlling compute budgets while independently varying model size, number of training tokens, and molecular representation. Our results demonstrate clear scaling laws in molecular models for both pretraining and downstream transfer, reveal the substantial impact of molecular representation on performance, and explain previously observed inconsistencies in scaling behavior for molecular generation. Additionally, we publicly release the largest library of molecular language models to date to facilitate future research and development. Code and models are available at https://github.com/SZU-ADDG/MLM-Scaling.


【17】OSNIP: Breaking the Privacy-Utility-Efficiency Trilemma in LLM Inference via Obfuscated Semantic Null Space
标题:OSNIP:通过混淆语义零空间打破LLM推理中的隐私-效用-效率三难困境
链接:https://arxiv.org/abs/2601.22752

作者:Zhiyuan Cao,Zeyu Ma,Chenhao Yang,Han Zheng,Mingang Chen
摘要:We propose Obfuscated Semantic Null space Injection for Privacy (OSNIP), a lightweight client-side encryption framework for privacy-preserving LLM inference. Generalizing the geometric intuition of linear kernels to the high-dimensional latent space of LLMs, we formally define the ``Obfuscated Semantic Null Space'', a high-dimensional regime that preserves semantic fidelity while enforcing near-orthogonality to the original embedding. By injecting perturbations that project the original embedding into this space, OSNIP ensures privacy without any post-processing. Furthermore, OSNIP employs a key-dependent stochastic mapping that synthesizes individualized perturbation trajectories unique to each user. Evaluations on 12 generative and classification benchmarks show that OSNIP achieves state-of-the-art performance, sharply reducing attack success rates while maintaining strong model utility under strict security constraints.


【18】Breaking the Blocks: Continuous Low-Rank Decomposed Scaling for Unified LLM Quantization and Adaptation
标题:打破分块:用于统一LLM量化和适配的连续低秩分解缩放
链接:https://arxiv.org/abs/2601.22716

作者:Pingzhi Tang,Ruijie Zhou,Fanxu Meng,Wenjie Pei,Muhan Zhang
摘要:Current quantization methods for LLMs predominantly rely on block-wise structures to maintain efficiency, often at the cost of representational flexibility. In this work, we demonstrate that element-wise quantization can be made as efficient as block-wise scaling while providing strictly superior expressive power by modeling the scaling manifold as continuous low-rank matrices ($S = BA$). We propose Low-Rank Decomposed Scaling (LoRDS), a unified framework that rethinks quantization granularity through this low-rank decomposition. By "breaking the blocks" of spatial constraints, LoRDS establishes a seamless efficiency lifecycle: it provides high-fidelity PTQ initialization refined via iterative optimization, enables joint QAT of weights and scaling factors, and facilitates high-rank multiplicative PEFT adaptation. Unlike additive PEFT approaches such as QLoRA, LoRDS enables high-rank weight updates within a low-rank budget while incurring no additional inference overhead. Supported by highly optimized Triton kernels, LoRDS consistently outperforms state-of-the-art baselines across various model families in both quantization and downstream fine-tuning tasks. Notably, on Llama3-8B, our method achieves up to a 27.0% accuracy improvement at 3 bits over NormalFloat quantization and delivers a 1.5x inference speedup on NVIDIA RTX 4090 while enhancing PEFT performance by 9.6% on downstream tasks over 4bit QLoRA, offering a robust and integrated solution for unified compression and adaptation of LLMs.
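
The central object here, an element-wise scale field parameterized as a low-rank product S = BA, can be sketched in a few lines of fake quantization. The exp() used below to keep every scale positive is an assumption of this sketch (the abstract does not specify the parameterization), and the bit width, rank, and shapes are toys.

    import numpy as np

    def quantize_lowrank_scale(w, b, a, n_bits=4):
        """Element-wise fake quantization with a low-rank scale field S = exp(B @ A):
        each weight gets its own scale without storing a dense scale matrix."""
        s = np.exp(b @ a)                                 # (out, in) positive scales
        qmax = 2 ** (n_bits - 1) - 1
        q = np.clip(np.round(w / s), -qmax - 1, qmax)     # integer codes
        return q * s                                      # dequantized weights

    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256))
    b = 0.1 * rng.normal(size=(256, 4))
    a = 0.1 * rng.normal(size=(4, 256))
    print(np.abs(w - quantize_lowrank_scale(w, b, a)).mean())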


【19】Vision-Language Models Unlock Task-Centric Latent Actions
标题:视觉语言模型解锁以任务为中心的潜在动作
链接:https://arxiv.org/abs/2601.22714

作者:Alexander Nikulin,Ilya Zisman,Albina Klepach,Denis Tarasov,Alexander Derevyagin,Andrei Polubarov,Lyubaykin Nikita,Vladislav Kurenkov
备注:Preprint
摘要:Latent Action Models (LAMs) have rapidly gained traction as an important component in the pre-training pipelines of leading Vision-Language-Action models. However, they fail when observations contain action-correlated distractors, often encoding noise instead of meaningful latent actions. Humans, on the other hand, can effortlessly distinguish task-relevant motions from irrelevant details in any video given only a brief task description. In this work, we propose to utilize the common-sense reasoning abilities of Vision-Language Models (VLMs) to provide promptable representations, effectively separating controllable changes from the noise in unsupervised way. We use these representations as targets during LAM training and benchmark a wide variety of popular VLMs, revealing substantial variation in the quality of promptable representations as well as their robustness to different prompts and hyperparameters. Interestingly, we find that more recent VLMs may perform worse than older ones. Finally, we show that simply asking VLMs to ignore distractors can substantially improve latent action quality, yielding up to a six-fold increase in downstream success rates on Distracting MetaWorld.


【20】Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry
标题:重新思考LLM作为评判者:通过语义容量不对称,以小语言模型的表示作为评判者
链接:https://arxiv.org/abs/2601.22588

作者:Zhuochun Li,Yong Zhang,Ming Li,Yuelyu Ji,Yiming Zeng,Ning Cheng,Yun Zhu,Yanmeng Wang,Shaojun Wang,Jing Xiao,Daqing He
摘要 :Large language models (LLMs) are widely used as reference-free evaluators via prompting, but this "LLM-as-a-Judge" paradigm is costly, opaque, and sensitive to prompt design. In this work, we investigate whether smaller models can serve as efficient evaluators by leveraging internal representations instead of surface generation. We uncover a consistent empirical pattern: small LMs, despite with weak generative ability, encode rich evaluative signals in their hidden states. This motivates us to propose the Semantic Capacity Asymmetry Hypothesis: evaluation requires significantly less semantic capacity than generation and can be grounded in intermediate representations, suggesting that evaluation does not necessarily need to rely on large-scale generative models but can instead leverage latent features from smaller ones. Our findings motivate a paradigm shift from LLM-as-a-Judge to Representation-as-a-Judge, a decoding-free evaluation strategy that probes internal model structure rather than relying on prompted output. We instantiate this paradigm through INSPECTOR, a probing-based framework that predicts aspect-level evaluation scores from small model representations. Experiments on reasoning benchmarks (GSM8K, MATH, GPQA) show that INSPECTOR substantially outperforms prompting-based small LMs and closely approximates full LLM judges, while offering a more efficient, reliable, and interpretable alternative for scalable evaluation.
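
The probing step at the heart of "Representation-as-a-Judge" amounts to fitting a small classifier on intermediate hidden states. The sketch below uses synthetic stand-ins for hidden states and correctness labels (real ones would come from a small LM and a scored benchmark) and a plain logistic-regression probe; INSPECTOR's actual probe design may differ.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(1000, 768))                     # stand-in hidden states
    is_correct = (hidden[:, :5].sum(axis=1) > 0).astype(int)  # synthetic label for the demo

    probe = LogisticRegression(max_iter=1000).fit(hidden[:800], is_correct[:800])
    print("held-out probe accuracy:", probe.score(hidden[800:], is_correct[800:]))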


【21】HetCCL: Accelerating LLM Training with Heterogeneous GPUs
标题:HetCCL:利用异构GPU加速LLM训练
链接:https://arxiv.org/abs/2601.22585

作者:Heehoon Kim,Jaehwan Lee,Taejeoung Kim,Jongwon Park,Jinpyo Kim,Pyongwon Suh,Ryan H. Choi,Sangwoo Lee,Jaejin Lee
摘要:The rapid growth of large language models is driving organizations to expand their GPU clusters, often with GPUs from multiple vendors. However, current deep learning frameworks lack support for collective communication across heterogeneous GPUs, leading to inefficiency and higher costs. We present HetCCL, a collective communication library that unifies vendor-specific backends and enables RDMA-based communication across GPUs without requiring driver modifications. HetCCL introduces two novel mechanisms that enable cross-vendor communication while leveraging optimized vendor libraries, NVIDIA NCCL and AMD RCCL. Evaluations on a multi-vendor GPU cluster show that HetCCL matches NCCL and RCCL performance in homogeneous setups while uniquely scaling in heterogeneous environments, enabling practical, high-performance training with both NVIDIA and AMD GPUs without changes to existing deep learning applications.


【22】Are LLM Evaluators Really Narcissists? Sanity Checking Self-Preference Evaluations
标题:LLM评估者真的是自恋者吗?对自我偏好评估的合理性检验
链接:https://arxiv.org/abs/2601.22548

作者:Dani Roytburg,Matthew Bozoukov,Matthew Nguyen,Mackenzie Puig-Hall,Narmeen Oozeer
摘要:Recent research has shown that large language models (LLM) favor own outputs when acting as judges, undermining the integrity of automated post-training and evaluation workflows. However, it is difficult to disentangle which evaluation biases are explained by narcissism versus general experimental confounds, distorting measurements of self-preference bias. We discover a core methodological confound which could reduce measurement error by 89.6%. Specifically, LLM evaluators may deliver self-preferring verdicts when the judge responds to queries which they completed incorrectly themselves; this would be true regardless of whether one of their responses is their own. To decouple self-preference signals from noisy outputs on hard problems, we introduce an Evaluator Quality Baseline, which compares the probability that a judge incorrectly votes for itself against the probability that it votes for an incorrect response from another model. Evaluating this simple baseline on 37,448 queries, only 51% of initial findings retain statistical significance. Finally, we turn towards characterizing the entropy of "easy" versus "hard" evaluation votes from LLM judges. Our corrective baseline enables future research on self-preference by eliminating noisy data from potential solutions. More widely, this work contributes to the growing body of work on cataloging and isolating judge-bias effects.


【23】Unrewarded Exploration in Large Language Models Reveals Latent Learning from Psychology
标题:大型语言模型中的无奖励探索揭示了心理学中的潜在学习
链接:https://arxiv.org/abs/2601.22474

作者:Jian Xiong,Jingbo Zhou,Zihan Zhou,Yixiong Xiao,Le Zhang,Jingyong Ye,Rui Qian,Yang Zhou,Dejing Dou
备注:17pages, 1 figure
摘要:Latent learning, classically theorized by Tolman, shows that biological agents (e.g., rats) can acquire internal representations of their environment without rewards, enabling rapid adaptation once rewards are introduced. In contrast, from a cognitive science perspective, reward learning remains overly dependent on external feedback, limiting flexibility and generalization. Although recent advances in the reasoning capabilities of large language models (LLMs), such as OpenAI-o1 and DeepSeek-R1, mark a significant breakthrough, these models still rely primarily on reward-centric reinforcement learning paradigms. Whether and how the well-established phenomenon of latent learning in psychology can inform or emerge within LLMs' training remains largely unexplored. In this work, we present novel findings from our experiments that LLMs also exhibit the latent learning dynamics. During an initial phase of unrewarded exploration, LLMs display modest performance improvements, as this phase allows LLMs to organize task-relevant knowledge without being constrained by reward-driven biases, and performance is further enhanced once rewards are introduced. LLMs post-trained under this two-stage exploration regime ultimately achieve higher competence than those post-trained with reward-based reinforcement learning throughout. Beyond these empirical observations, we also provide theoretical analyses for our experiments explaining why unrewarded exploration yields performance gains, offering a mechanistic account of these dynamics. Specifically, we conducted extensive experiments across multiple model families and diverse task domains to establish the existence of the latent learning dynamics in LLMs.


【24】Tuning the Implicit Regularizer of Masked Diffusion Language Models: Enhancing Generalization via Insights from $k$-Parity
标题:调节掩蔽扩散语言模型的隐式正则化器:借助$k$-奇偶校验问题的见解增强泛化
链接:https://arxiv.org/abs/2601.22450

作者:Jianhao Huang,Baharan Mirzasoleiman
摘要 :Masked Diffusion Language Models have recently emerged as a powerful generative paradigm, yet their generalization properties remain understudied compared to their auto-regressive counterparts. In this work, we investigate these properties within the setting of the $k$-parity problem (computing the XOR sum of $k$ relevant bits), where neural networks typically exhibit grokking -- a prolonged plateau of chance-level performance followed by sudden generalization. We theoretically decompose the Masked Diffusion (MD) objective into a Signal regime which drives feature learning, and a Noise regime which serves as an implicit regularizer. By training nanoGPT using MD objective on the $k$-parity problem, we demonstrate that MD objective fundamentally alters the learning landscape, enabling rapid and simultaneous generalization without experiencing grokking. Furthermore, we leverage our theoretical insights to optimize the distribution of the mask probability in the MD objective. Our method significantly improves perplexity for 50M-parameter models and achieves superior results across both pre-training from scratch and supervised fine-tuning. Specifically, we observe performance gains peaking at $8.8\%$ and $5.8\%$, respectively, on 8B-parameter models, confirming the scalability and effectiveness of our framework in large-scale masked diffusion language model regimes.
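
The k-parity task itself is simple to materialize: the label is the XOR of k fixed "relevant" bits among many distractor bits, the setting in which grokking is usually observed. The generator below is a generic version of that task (sizes and seed are arbitrary), not the paper's exact training setup.

    import numpy as np

    def parity_dataset(n_samples: int, n_bits: int = 32, k: int = 4, seed: int = 0):
        """k-parity: y is the XOR (sum mod 2) of k fixed relevant bits of x."""
        rng = np.random.default_rng(seed)
        relevant = rng.choice(n_bits, size=k, replace=False)
        x = rng.integers(0, 2, size=(n_samples, n_bits))
        y = x[:, relevant].sum(axis=1) % 2
        return x, y, relevant

    x, y, relevant = parity_dataset(10_000)
    print("relevant bits:", relevant, " label balance:", float(y.mean()))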


【25】HeaPA: Difficulty-Aware Heap Sampling and On-Policy Query Augmentation for LLM Reinforcement Learning
标题:HeaPA:用于LLM强化学习的难度感知堆采样与同策略查询增强
链接:https://arxiv.org/abs/2601.22448

作者:Weiqi Wang,Xin Liu,Binxuan Huang,Hejie Cui,Rongzhi Zhang,Changlong Yu,Shuowei Jin,Jingfeng Yang,Qingyu Yin,Zhengyang Wang,Zheng Li,Yifan Gao,Priyanka Nigam,Bing Yin,Lihong Li,Yangqiu Song
摘要:RLVR is now a standard way to train LLMs on reasoning tasks with verifiable outcomes, but when rollout generation dominates the cost, efficiency depends heavily on which prompts you sample and when. In practice, prompt pools are often static or only loosely tied to the model's learning progress, so uniform sampling can't keep up with the shifting capability frontier and ends up wasting rollouts on prompts that are already solved or still out of reach. Existing approaches improve efficiency through filtering, curricula, adaptive rollout allocation, or teacher guidance, but they typically assume a fixed pool-which makes it hard to support stable on-policy pool growth-or they add extra teacher cost and latency. We introduce HeaPA (Heap Sampling and On-Policy Query Augmentation), which maintains a bounded, evolving pool, tracks the frontier using heap-based boundary sampling, expands the pool via on-policy augmentation with lightweight asynchronous validation, and stabilizes correlated queries through topology-aware re-estimation of pool statistics and controlled reinsertion. Across two training corpora, two training recipes, and seven benchmarks, HeaPA consistently improves accuracy and reaches target performance with fewer computations while keeping wall-clock time comparable. Our analyses suggest these gains come from frontier-focused sampling and on-policy pool growth, with the benefits becoming larger as model scale increases. Our code is available at https://github.com/horizon-rl/HeaPA.


【26】Towards Resiliency in Large Language Model Serving with KevlarFlow
标题:迈向利用KevlarFlow实现大型语言模型服务的容错弹性
链接:https://arxiv.org/abs/2601.22438

作者:Shangshu Qian,Kipling Liu,P. C. Sruthi,Lin Tan,Yongle Zhang
摘要:Large Language Model (LLM) serving systems remain fundamentally fragile, where frequent hardware faults in hyperscale clusters trigger disproportionate service outages in the software stack. Current recovery mechanisms are prohibitively slow, often requiring up to 10 minutes to reinitialize resources and reload massive model weights. We introduce KevlarFlow, a fault tolerant serving architecture designed to bridge the gap between hardware unreliability and service availability. KevlarFlow leverages 1) decoupled model parallelism initialization, 2) dynamic traffic rerouting, and 3) background KV cache replication to maintain high throughput during partial failures. Our evaluation demonstrates that KevlarFlow reduces mean-time-to-recovery (MTTR) by 20x and, under failure conditions, improves average latency by 3.1x, 99th percentile (p99) latency by 2.8x, average time-to-first-token (TTFT) by 378.9x, and p99 TTFT by 574.6x with negligible runtime overhead in comparison to state-of-the-art LLM serving systems.


【27】SP^2DPO: An LLM-assisted Semantic Per-Pair DPO Generalization
标题:SP^2DPO:一种LLM辅助的逐对语义DPO泛化
链接:https://arxiv.org/abs/2601.22385

作者:Chaoyue He,Xin Zhou,Di Wang,Hong Xu,Wei Liu,Chunyan Miao
备注:39 pages, 15 figures, 16 tables, 60 equations
摘要:Direct Preference Optimization (DPO) controls the trade-off between fitting preference labels and staying close to a reference model using a single global temperature beta, implicitly treating all preference pairs as equally informative. Real-world preference corpora are heterogeneous: they mix high-signal, objective failures (for example, safety, factuality, instruction violations) with low-signal or subjective distinctions (for example, style), and also include label noise. We introduce our method, SP2DPO (Semantic Per-Pair DPO), a generalization that replaces the global temperature with an instance-specific schedule beta_i pre-decided offline from structured semantic-gap annotations (category, magnitude, confidence) produced by teacher language models. We instantiate this procedure on the UltraFeedback preference corpus (59,960 pairs), enabling large-scale construction of an auditable beta_i artifact, and incur zero training-time overhead: the inner-loop optimizer remains standard DPO with beta set per pair. We focus our empirical study on AlpacaEval 2.0, reporting both raw win rate and length-controlled win rate. Across four open-weight, instruction-tuned student backbones (4B-8B), SP2DPO is competitive with a tuned global-beta DPO baseline and improves AlpacaEval 2.0 length-controlled win rate on two of four backbones, while avoiding per-model beta sweeps. All code, annotations, and artifacts will be released.
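
The change relative to standard DPO is easiest to see in the loss itself: the global temperature becomes a vector with one entry per preference pair. The PyTorch sketch below shows that per-pair form with toy log-probabilities; in SP2DPO the beta_i values would come from the offline semantic-gap annotations, which this sketch does not model.

    import torch
    import torch.nn.functional as F

    def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta):
        """DPO objective with a per-pair temperature `beta` (one entry per pair)
        instead of a single global scalar; logp_* are sequence log-probabilities of
        the chosen (w) and rejected (l) responses under the policy / reference."""
        margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
        return -F.logsigmoid(beta * margin).mean()

    logp_w = torch.tensor([-12.0, -8.5]); logp_l = torch.tensor([-13.0, -8.0])
    ref_w = torch.tensor([-12.5, -8.4]); ref_l = torch.tensor([-12.8, -8.1])
    beta_i = torch.tensor([0.30, 0.05])   # e.g., a high-signal pair vs. a stylistic one
    print(dpo_loss(logp_w, logp_l, ref_w, ref_l, beta_i))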


【28】Towards Solving the Gilbert-Pollak Conjecture via Large Language Models
标题:迈向借助大型语言模型解决Gilbert-Pollak猜想
链接:https://arxiv.org/abs/2601.22365

作者:Yisi Ke,Tianyu Huang,Yankai Shu,Di He,Jingchu Gai,Liwei Wang
备注:44 pages, 11 figures
摘要 :The Gilbert-Pollak Conjecture \citep{gilbert1968steiner}, also known as the Steiner Ratio Conjecture, states that for any finite point set in the Euclidean plane, the Steiner minimum tree has length at least $\sqrt{3}/2 \approx 0.866$ times that of the Euclidean minimum spanning tree (the Steiner ratio). A sequence of improvements through the 1980s culminated in a lower bound of $0.824$, with no substantial progress reported over the past three decades. Recent advances in LLMs have demonstrated strong performance on contest-level mathematical problems, yet their potential for addressing open, research-level questions remains largely unexplored. In this work, we present a novel AI system for obtaining tighter lower bounds on the Steiner ratio. Rather than directly prompting LLMs to solve the conjecture, we task them with generating rule-constrained geometric lemmas implemented as executable code. These lemmas are then used to construct a collection of specialized functions, which we call verification functions, that yield theoretically certified lower bounds of the Steiner ratio. Through progressive lemma refinement driven by reflection, the system establishes a new certified lower bound of 0.8559 for the Steiner ratio. The entire research effort involves only thousands of LLM calls, demonstrating the strong potential of LLM-based systems for advanced mathematical research.


【29】Understanding Efficiency: Quantization, Batching, and Serving Strategies in LLM Energy Use
标题:理解效率:LLM能源使用中的量化,分批和服务策略
链接:https://arxiv.org/abs/2601.22362

作者:Julien Delavande,Regis Pierrard,Sasha Luccioni
摘要:Large Language Models (LLMs) are increasingly deployed in production, contributing towards shifting the burden in terms of computational resources and energy demands from training to inference. While prior work has examined the energy cost of inference per prompt or per token, we highlight how \emph{system-level design choices} - such as numerical precision, batching strategy, and request scheduling - can lead to orders-of-magnitude differences in energy consumption for the same model. We perform a detailed empirical study of LLM inference energy and latency on NVIDIA H100 GPUs, analyzing the impact of quantization, batch size, and serving configuration (e.g., with Hugging Face's Text Generation Inference server). Our results reveal that lower-precision formats only yield energy gains in compute-bound regimes; that batching improves energy efficiency, especially in memory-bound phases like decoding; and that structured request timing (arrival shaping) can reduce per-request energy by up to 100 times. We argue that sustainable LLM deployment depends not only on model internals, but also on the orchestration of the serving stack. Our findings motivate phase-aware energy profiling and system-level optimizations for greener AI services.


【30】Failing to Explore: Language Models on Interactive Tasks
标题:未能探索:交互任务的语言模型
链接:https://arxiv.org/abs/2601.22345

作者:Mahdi JafariRaviz,Keivan Rezaei,Arshia Soltani Moakhar,Zahra Sodagar,Yize Cheng,Soheil Feizi
摘要:We evaluate language models on their ability to explore interactive environments under a limited interaction budget. We introduce three parametric tasks with controllable exploration difficulty, spanning continuous and discrete environments. Across state-of-the-art models, we find systematic under-exploration and suboptimal solutions, with performance often significantly worse than simple explore--exploit heuristic baselines and scaling weakly as the budget increases. Finally, we study two lightweight interventions: splitting a fixed budget into parallel executions, which surprisingly improves performance despite a no-gain theoretical result for our tasks, and periodically summarizing the interaction history, which preserves key discoveries and further improves exploration.


【31】Federate the Router: Learning Language Model Routers with Sparse and Decentralized Evaluations
标题:联邦化路由器:利用稀疏且去中心化的评估学习语言模型路由器
链接:https://arxiv.org/abs/2601.22318

作者:Baris Askin,Shivam Patel,Anupam Nayak,Andrea Vigano,Jiin Woo,Gauri Joshi,Carlee Joe-Wong
摘要:Large language models (LLMs) are increasingly accessed as remotely hosted services by edge and enterprise clients that cannot run frontier models locally. Since models vary widely in capability and price, routing queries to models that balance quality and inference cost is essential. Existing router approaches assume access to centralized query-model evaluation data. However, these data are often fragmented across clients, such as end users and organizations, and are privacy-sensitive, which makes centralizing data infeasible. Additionally, per-client router training is ineffective since local evaluation data is limited and covers only a restricted query distribution and a biased subset of model evaluations. We introduce the first federated framework for LLM routing, enabling clients to learn a shared routing policy from local offline query-model evaluation data. Our framework supports both parametric multilayer perceptron router and nonparametric K-means router under heterogeneous client query distributions and non-uniform model coverage. Across two benchmarks, federated collaboration improves the accuracy-cost frontier over client-local routers, both via increased effective model coverage and better query generalization. Our theoretical results also validate that federated training reduces routing suboptimality.


【32】Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents
标题:为什么推理无法规划:LLM代理长期决策的以规划为中心的分析
链接:https://arxiv.org/abs/2601.22311

作者:Zehong Wang,Fang Wu,Hongru Wang,Xiangru Tang,Bolian Li,Zhenfei Yin,Yijun Ma,Yiyang Li,Weixiang Sun,Xiusi Chen,Yanfang Ye
摘要 :Large language model (LLM)-based agents exhibit strong step-by-step reasoning capabilities over short horizons, yet often fail to sustain coherent behavior over long planning horizons. We argue that this failure reflects a fundamental mismatch: step-wise reasoning induces a form of step-wise greedy policy that is adequate for short horizons but fails in long-horizon planning, where early actions must account for delayed consequences. From this planning-centric perspective, we study LLM-based agents in deterministic, fully structured environments with explicit state transitions and evaluation signals. Our analysis reveals a core failure mode of reasoning-based policies: locally optimal choices induced by step-wise scoring lead to early myopic commitments that are systematically amplified over time and difficult to recover from. We introduce FLARE (Future-aware Lookahead with Reward Estimation) as a minimal instantiation of future-aware planning to enforce explicit lookahead, value propagation, and limited commitment in a single model, allowing downstream outcomes to influence early decisions. Across multiple benchmarks, agent frameworks, and LLM backbones, FLARE consistently improves task performance and planning-level behavior, frequently allowing LLaMA-8B with FLARE to outperform GPT-4o with standard step-by-step reasoning. These results establish a clear distinction between reasoning and planning.


【33】Predicting Intermittent Job Failure Categories for Diagnosis Using Few-Shot Fine-Tuned Language Models
标题:使用少样本微调的语言模型预测间歇性作业失败类别以辅助诊断
链接:https://arxiv.org/abs/2601.22264

作者:Henri Aïdasso,Francis Bordeleau,Ali Tizghadam
摘要:In principle, Continuous Integration (CI) pipeline failures provide valuable feedback to developers on code-related errors. In practice, however, pipeline jobs often fail intermittently due to non-deterministic tests, network outages, infrastructure failures, resource exhaustion, and other reliability issues. These intermittent (flaky) job failures lead to substantial inefficiencies: wasted computational resources from repeated reruns and significant diagnosis time that distracts developers from core activities and often requires intervention from specialized teams. Prior work has proposed machine learning techniques to detect intermittent failures, but does not address the subsequent diagnosis challenge. To fill this gap, we introduce FlaXifyer, a few-shot learning approach for predicting intermittent job failure categories using pre-trained language models. FlaXifyer requires only job execution logs and achieves 84.3% Macro F1 and 92.0% Top-2 accuracy with just 12 labeled examples per category. We also propose LogSift, an interpretability technique that identifies influential log statements in under one second, reducing review effort by 74.4% while surfacing relevant failure information in 87% of cases. Evaluation on 2,458 job failures from TELUS demonstrates that FlaXifyer and LogSift enable effective automated triage, accelerate failure diagnosis, and pave the way towards the automated resolution of intermittent job failures.


【34】A Systematic Literature Review on LLM Defenses Against Prompt Injection and Jailbreaking: Expanding NIST Taxonomy
标题:针对提示注入与越狱的LLM防御的系统性文献综述:扩展NIST分类法
链接:https://arxiv.org/abs/2601.22240

作者:Pedro H. Barcha Correia,Ryan W. Achjian,Diego E. G. Caetano de Oliveira,Ygor Acacio Maria,Victor Takashi Hayashi,Marcos Lopes,Charles Christian Miers,Marcos A. Simplicio
备注:27 pages, 14 figures, 11 tables, submitted to Elsevier Computer Science Review
摘要:The rapid advancement and widespread adoption of generative artificial intelligence (GenAI) and large language models (LLMs) has been accompanied by the emergence of new security vulnerabilities and challenges, such as jailbreaking and other prompt injection attacks. These maliciously crafted inputs can exploit LLMs, causing data leaks, unauthorized actions, or compromised outputs, for instance. As both offensive and defensive prompt injection techniques evolve quickly, a structured understanding of mitigation strategies becomes increasingly important. To address that, this work presents the first systematic literature review on prompt injection mitigation strategies, comprehending 88 studies. Building upon NIST's report on adversarial machine learning, this work contributes to the field through several avenues. First, it identifies studies beyond those documented in NIST's report and other academic reviews and surveys. Second, we propose an extension to NIST taxonomy by introducing additional categories of defenses. Third, by adopting NIST's established terminology and taxonomy as a foundation, we promote consistency and enable future researchers to build upon the standardized taxonomy proposed in this work. Finally, we provide a comprehensive catalog of the reviewed prompt injection defenses, documenting their reported quantitative effectiveness across specific LLMs and attack datasets, while also indicating which solutions are open-source and model-agnostic. This catalog, together with the guidelines presented herein, aims to serve as a practical resource for researchers advancing the field of adversarial machine learning and for developers seeking to implement effective defenses in production systems.


【35】DAJ: Data-Reweighted LLM Judge for Test-Time Scaling in Code Generation
标题:DAJ:用于代码生成测试时扩展的数据重加权LLM评判器
链接:https://arxiv.org/abs/2601.22230

作者:Peijia Qin,Ruiyi Zhang,Qi Cao,Pengtao Xie
摘要:Test-time scaling for code generation commonly relies on Best-of-N selection, in which multiple candidate solutions are sampled from a base model, and the best one is selected by an LLM judge. However, training reliable LLM judges is challenging due to severe distribution shifts, including imbalances between easy and hard problems, mismatches between training tasks and evaluation benchmarks, and trajectory mismatch arising from training data generated by cheaper models whose behavior differs from that of inference-time models. We propose DAJ, a reasoning-based LLM judge trained with verifiable rewards under a bi-level data-reweighted learning framework. The proposed framework learns data-importance weights (either domain-level or instance-level) to optimize generalization performance on a held-out meta set aligned with target benchmarks. To the best of our knowledge, this is the first application of data reweighting to LLM-as-a-Judge training for test-time scaling. Our approach automatically emphasizes hard problems, in-distribution samples, and trajectory-aligned data, without relying on hand-crafted heuristics. Empirically, DAJ achieves state-of-the-art performance on LiveCodeBench and BigCodeBench, outperforming strong test-time scaling baselines as well as leading proprietary models.


【36】Tacit Coordination of Large Language Models
标题:大型语言模型的隐性协调
链接:https://arxiv.org/abs/2601.22184

作者:Ido Aharon,Emanuele La Malfa,Michael Wooldridge,Sarit Kraus
备注:Code: https://github.com/EmanueleLM/focal-points
摘要 :In tacit coordination games with multiple outcomes, purely rational solution concepts, such as Nash equilibria, provide no guidance for which equilibrium to choose. Shelling's theory explains how, in these settings, humans coordinate by relying on focal points: solutions or outcomes that naturally arise because they stand out in some way as salient or prominent to all players. This work studies Large Language Models (LLMs) as players in tacit coordination games, and addresses how, when, and why focal points emerge. We compare and quantify the coordination capabilities of LLMs in cooperative and competitive games for which human experiments are available. We also introduce several learning-free strategies to improve the coordination of LLMs, with themselves and with humans. On a selection of heterogeneous open-source models, including Llama, Qwen, and GPT-oss, we discover that LLMs have a remarkable capability to coordinate and often outperform humans, yet fail on common-sense coordination that involves numbers or nuanced cultural archetypes. This paper constitutes the first large-scale assessment of LLMs' tacit coordination within the theoretical and psychological framework of focal points.


【37】Large Language Models: A Mathematical Formulation
标题:大型语言模型:数学公式
链接:https://arxiv.org/abs/2601.22170

作者:Ricardo Baptista,Andrew Stuart,Son Tran
备注:51 pages, 2 figures
摘要:Large language models (LLMs) process and predict sequences containing text to answer questions, and address tasks including document summarization, providing recommendations, writing software and solving quantitative problems. We provide a mathematical framework for LLMs by describing the encoding of text sequences into sequences of tokens, defining the architecture for next-token prediction models, explaining how these models are learned from data, and demonstrating how they are deployed to address a variety of tasks. The mathematical sophistication required to understand this material is not high, and relies on straightforward ideas from information theory, probability and optimization. Nonetheless, the combination of ideas resting on these different components from the mathematical sciences yields a complex algorithmic structure; and this algorithmic structure has demonstrated remarkable empirical successes. The mathematical framework established here provides a platform from which it is possible to formulate and address questions concerning the accuracy, efficiency and robustness of the algorithms that constitute LLMs. The framework also suggests directions for development of modified and new methodologies.


【38】In Vino Veritas and Vulnerabilities: Examining LLM Safety via Drunk Language Inducement
标题:In Vino Veritas与漏洞:通过醉酒语言诱导检验LLM安全性
链接:https://arxiv.org/abs/2601.22169

作者:Anudeex Shetty,Aditya Joshi,Salil S. Kanhere
备注:WIP
摘要:Humans are susceptible to undesirable behaviours and privacy leaks under the influence of alcohol. This paper investigates drunk language, i.e., text written under the influence of alcohol, as a driver for safety failures in large language models (LLMs). We investigate three mechanisms for inducing drunk language in LLMs: persona-based prompting, causal fine-tuning, and reinforcement-based post-training. When evaluated on 5 LLMs, we observe a higher susceptibility to jailbreaking on JailbreakBench (even in the presence of defences) and privacy leaks on ConfAIde, where both benchmarks are in English, as compared to the base LLMs as well as previously reported approaches. Via a robust combination of manual evaluation and LLM-based evaluators and analysis of error categories, our findings highlight a correspondence between human-intoxicated behaviour, and anthropomorphism in LLMs induced with drunk language. The simplicity and efficiency of our drunk language inducement approaches position them as potential counters for LLM safety tuning, highlighting significant risks to LLM safety.


【39】Dependence-Aware Label Aggregation for LLM-as-a-Judge via Ising Models
标题:通过伊辛模型实现LLM作为法官的依赖感知标签聚合
链接:https://arxiv.org/abs/2601.22336

作者:Krishnakumar Balasubramanian,Aleksandr Podkopaev,Shiva Prasad Kasiviswanathan
摘要:Large-scale AI evaluation increasingly relies on aggregating binary judgments from $K$ annotators, including LLMs used as judges. Most classical methods, e.g., Dawid-Skene or (weighted) majority voting, assume annotators are conditionally independent given the true label $Y\in\{0,1\}$, an assumption often violated by LLM judges due to shared data, architectures, prompts, and failure modes. Ignoring such dependencies can yield miscalibrated posteriors and even confidently incorrect predictions. We study label aggregation through a hierarchy of dependence-aware models based on Ising graphical models and latent factors. For class-dependent Ising models, the Bayes log-odds is generally quadratic in votes; for class-independent couplings, it reduces to a linear weighted vote with correlation-adjusted parameters. We present finite-$K$ examples showing that methods based on conditional independence can flip the Bayes label despite matching per-annotator marginals. We prove separation results demonstrating that these methods remain strictly suboptimal as the number of judges grows, incurring nonvanishing excess risk under latent factors. Finally, we evaluate the proposed method on three real-world datasets, demonstrating improved performance over the classical baselines.
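
The failure mode this abstract highlights, miscalibrated posteriors when judges are treated as conditionally independent, can be reproduced with a short simulation. The sketch below is an illustrative toy model of our own (a shared failure event that flips all judges at once), not the paper's Ising construction: the naive conditional-independence log-odds reports near-certainty on unanimous votes even though many of them are unanimously wrong.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 7, 20000           # judges, items
p_correct = 0.8           # per-judge accuracy when no shared failure occurs
rho = 0.2                 # probability of a shared failure that flips every judge

y = rng.integers(0, 2, size=n)                      # true labels
shared_fail = rng.random(n) < rho                   # correlated error event (one per item)
indep_correct = rng.random((n, K)) < p_correct      # independent per-judge correctness
correct = indep_correct & ~shared_fail[:, None]     # every judge is wrong when it fires
votes = np.where(correct, y[:, None], 1 - y[:, None])

# A conditional-independence aggregator only sees the marginal accuracy of each judge
acc = (1 - rho) * p_correct
llr = np.log(acc / (1 - acc))                       # per-vote log-likelihood ratio
logodds = llr * (2 * votes - 1).sum(axis=1)         # naive Bayes log-odds for Y = 1
naive_conf = 1 / (1 + np.exp(-np.abs(logodds)))     # confidence in the majority label
majority = (votes.mean(axis=1) > 0.5).astype(int)

sure = naive_conf > 0.95                            # items the naive model calls near-certain
print("naive mean confidence:", naive_conf[sure].mean())
print("actual accuracy      :", (majority[sure] == y[sure]).mean())
```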


Graph相关(图学习|图神经网络|图优化等)(15篇)

【1】Sequence Diffusion Model for Temporal Link Prediction in Continuous-Time Dynamic Graph
标题:连续时间动态图中时序链路预测的序列扩散模型
链接:https://arxiv.org/abs/2601.23233

作者:Nguyen Minh Duc,Viet Cuong Ta
摘要 :Temporal link prediction in dynamic graphs is a fundamental problem in many real-world systems. Existing temporal graph neural networks mainly focus on learning representations of historical interactions. Despite their strong performance, these models are still purely discriminative, producing point estimates for future links and lacking an explicit mechanism to capture the uncertainty and sequential structure of future temporal interactions. In this paper, we propose SDG, a novel sequence-level diffusion framework that unifies dynamic graph learning with generative denoising. Specifically, SDG injects noise into the entire historical interaction sequence and jointly reconstructs all interaction embeddings through a conditional denoising process, thereby enabling the model to capture more comprehensive interaction distributions. To align the generative process with temporal link prediction, we employ a cross-attention denoising decoder to guide the reconstruction of the destination sequence and optimize the model in an end-to-end manner. Extensive experiments on various temporal graph benchmarks show that SDG consistently achieves state-of-the-art performance in the temporal link prediction task.


【2】Learning to Execute Graph Algorithms Exactly with Graph Neural Networks
标题:学习使用图神经网络准确执行图算法
链接:https://arxiv.org/abs/2601.23207

作者:Muhammad Fetrat Qharabagh,Artur Back de Luca,George Giapitzakis,Kimon Fountoulakis
摘要:Understanding what graph neural networks can learn, especially their ability to learn to execute algorithms, remains a central theoretical challenge. In this work, we prove exact learnability results for graph algorithms under bounded-degree and finite-precision constraints. Our approach follows a two-step process. First, we train an ensemble of multi-layer perceptrons (MLPs) to execute the local instructions of a single node. Second, during inference, we use the trained MLP ensemble as the update function within a graph neural network (GNN). Leveraging Neural Tangent Kernel (NTK) theory, we show that local instructions can be learned from a small training set, enabling the complete graph algorithm to be executed during inference without error and with high probability. To illustrate the learning power of our setting, we establish a rigorous learnability result for the LOCAL model of distributed computation. We further demonstrate positive learnability results for widely studied algorithms such as message flooding, breadth-first and depth-first search, and Bellman-Ford.
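
To make the notion of the "local instruction of a single node" concrete, here is Bellman-Ford written as synchronous per-node relaxation steps, the kind of update rule the paper's MLP ensemble is trained to emulate inside a GNN. This is a plain reference implementation for illustration, not the learned model.

```python
import math

def bellman_ford_sync(n, weighted_edges, source):
    """Single-source shortest paths via synchronous local updates.
    Each node repeatedly applies the same local rule
        d[v] <- min(d[v], min over incoming edges (u, v) of d[u] + w(u, v)),
    i.e. the per-node instruction a learned update function would emulate."""
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(n - 1):                       # n - 1 synchronous rounds suffice
        new_dist = list(dist)
        for u, v, w in weighted_edges:
            if dist[u] + w < new_dist[v]:
                new_dist[v] = dist[u] + w
        dist = new_dist
    return dist

edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0)]
print(bellman_ford_sync(4, edges, source=0))     # [0.0, 3.0, 1.0, 4.0]
```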


【3】Securing Time in Energy IoT: A Clock-Dynamics-Aware Spatio-Temporal Graph Attention Network for Clock Drift Attacks and Y2K38 Failures
标题:能源物联网中的时间安全:时钟动态感知时空图注意力网络,用于应对时钟漂移攻击和Y2K38故障
链接:https://arxiv.org/abs/2601.23147

作者:Saeid Jamshidi,Omar Abdul Wahab,Rolando Herrero,Foutse Khomh
摘要:The integrity of time in distributed Internet of Things (IoT) devices is crucial for reliable operation in energy cyber-physical systems, such as smart grids and microgrids. However, IoT systems are vulnerable to clock drift, time-synchronization manipulation, and timestamp discontinuities, such as the Year 2038 (Y2K38) Unix overflow, all of which disrupt temporal ordering. Conventional anomaly-detection models, which assume reliable timestamps, fail to capture temporal inconsistencies. This paper introduces STGAT (Spatio-Temporal Graph Attention Network), a framework that models both temporal distortion and inter-device consistency in energy IoT systems. STGAT combines drift-aware temporal embeddings and temporal self-attention to capture corrupted time evolution at individual devices, and uses graph attention to model spatial propagation of timing errors. A curvature-regularized latent representation geometrically separates normal clock evolution from anomalies caused by drift, synchronization offsets, and overflow events. Experimental results on energy IoT telemetry with controlled timing perturbations show that STGAT achieves 95.7% accuracy, outperforming recurrent, transformer, and graph-based baselines with significant improvements (d > 1.8, p < 0.001). Additionally, STGAT reduces detection delay by 26%, achieving a 2.3-time-step delay while maintaining stable performance under overflow, drift, and physical inconsistencies.


【4】Adaptive Edge Learning for Density-Aware Graph Generation
标题:自适应边学习算法在密度感知图生成中的应用
链接:https://arxiv.org/abs/2601.23052

作者:Seyedeh Ava Razi Razavi,James Sargant,Sheridan Houghten,Renata Dividino
备注:Accepted at the 39th Canadian Conference on Artificial Intelligence
摘要:Generating realistic graph-structured data is challenging due to discrete structures, variable sizes, and class-specific connectivity patterns that resist conventional generative modelling. While recent graph generation methods employ generative adversarial network (GAN) frameworks to handle permutation invariance and irregular topologies, they typically rely on random edge sampling with fixed probabilities, limiting their capacity to capture complex structural dependencies between nodes. We propose a density-aware conditional graph generation framework using Wasserstein GANs (WGAN) that replaces random sampling with a learnable distance-based edge predictor. Our approach embeds nodes into a latent space where proximity correlates with edge likelihood, enabling the generator to learn meaningful connectivity patterns. A differentiable edge predictor determines pairwise relationships directly from node embeddings, while a density-aware selection mechanism adaptively controls edge density to match class-specific sparsity distributions observed in real graphs. We train the model using a WGAN with gradient penalty, employing a GCN-based critic to ensure generated graphs exhibit realistic topology and align with target class distributions. Experiments on benchmark datasets demonstrate that our method produces graphs with superior structural coherence and class-consistent connectivity compared to existing baselines. The learned edge predictor captures complex relational patterns beyond simple heuristics, generating graphs whose density and topology closely match real structural distributions. Our results show improved training stability and controllable synthesis, making the framework effective for realistic graph generation and data augmentation. Source code is publicly available at https://github.com/ava-12/Density_Aware_WGAN.git.
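
A minimal numpy sketch of the two ingredients this abstract describes, edge scores derived from latent-space proximity and a density-aware selection that keeps only enough top-scoring pairs to hit a target sparsity, is shown below. The Gaussian embeddings, RBF-style score, and target density are illustrative choices of our own, not the paper's learned predictor.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 8, 4
z = rng.normal(size=(n, d))                          # latent node embeddings

# Edge scores from latent-space proximity: closer nodes get higher scores
dist2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
scores = np.exp(-dist2)
iu = np.triu_indices(n, k=1)                         # undirected graph: upper triangle only

# Density-aware selection: keep the m highest-scoring pairs to match a target density
target_density = 0.3
m = int(round(target_density * len(iu[0])))
top = np.argsort(scores[iu])[::-1][:m]

adj = np.zeros((n, n), dtype=int)
adj[iu[0][top], iu[1][top]] = 1
adj += adj.T
print("edges kept:", adj.sum() // 2, "of", len(iu[0]), "candidate pairs")
```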


【5】Scalable Topology-Preserving Graph Coarsening with Graph Collapse
标题:基于图折叠的可扩展拓扑保持图粗化
链接:https://arxiv.org/abs/2601.22943

作者:Xiang Wu,Rong-Hua Li,Xunkai Li,Kangfei Zhao,Hongchao Qin,Guoren Wang
摘要 :Graph coarsening reduces the size of a graph while preserving certain properties. Most existing methods preserve either spectral or spatial characteristics. Recent research has shown that preserving topological features helps maintain the predictive performance of graph neural networks (GNNs) trained on the coarsened graph but suffers from exponential time complexity. To address these problems, we propose Scalable Topology-Preserving Graph Coarsening (STPGC) by introducing the concepts of graph strong collapse and graph edge collapse extended from algebraic topology. STPGC comprises three new algorithms, GStrongCollapse, GEdgeCollapse, and NeighborhoodConing based on these two concepts, which eliminate dominated nodes and edges while rigorously preserving topological features. We further prove that STPGC preserves the GNN receptive field and develop approximate algorithms to accelerate GNN training. Experiments on node classification with GNNs demonstrate the efficiency and effectiveness of STPGC.


【6】Full-Graph vs. Mini-Batch Training: Comprehensive Analysis from a Batch Size and Fan-Out Size Perspective
标题:全图与小批量训练:从批量大小和扇出大小角度进行综合分析
链接:https://arxiv.org/abs/2601.22678

作者:Mengfan Liu,Da Zheng,Junwei Su,Chuan Wu
摘要:Full-graph and mini-batch Graph Neural Network (GNN) training approaches have distinct system design demands, making it crucial to choose the appropriate approach to develop. A core challenge in comparing these two GNN training approaches lies in characterizing their model performance (i.e., convergence and generalization) and computational efficiency. While a batch size has been an effective lens in analyzing such behaviors in deep neural networks (DNNs), GNNs extend this lens by introducing a fan-out size, as full-graph training can be viewed as mini-batch training with the largest possible batch size and fan-out size. However, the impact of the batch and fan-out size for GNNs remains insufficiently explored. To this end, this paper systematically compares full-graph vs. mini-batch training of GNNs through empirical and theoretical analyses from the view points of the batch size and fan-out size. Our key contributions include: 1) We provide a novel generalization analysis using the Wasserstein distance to study the impact of the graph structure, especially the fan-out size. 2) We uncover the non-isotropic effects of the batch size and the fan-out size in GNN convergence and generalization, providing practical guidance for tuning these hyperparameters under resource constraints. Finally, full-graph training does not always yield better model performance or computational efficiency than well-tuned smaller mini-batch settings. The implementation can be found in the github link: https://github.com/LIUMENGFAN-gif/GNN_fullgraph_minibatch_training.


【7】Heterogeneous Graph Alignment for Joint Reasoning and Interpretability
标题:用于联合推理和可解释性的异构图对齐
链接:https://arxiv.org/abs/2601.22593

作者:Zahra Moslemi,Ziyi Liang,Norbert Fortin,Babak Shahbaba
摘要:Multi-graph learning is crucial for extracting meaningful signals from collections of heterogeneous graphs. However, effectively integrating information across graphs with differing topologies, scales, and semantics, often in the absence of shared node identities, remains a significant challenge. We present the Multi-Graph Meta-Transformer (MGMT), a unified, scalable, and interpretable framework for cross-graph learning. MGMT first applies Graph Transformer encoders to each graph, mapping structure and attributes into a shared latent space. It then selects task-relevant supernodes via attention and builds a meta-graph that connects functionally aligned supernodes across graphs using similarity in the latent space. Additional Graph Transformer layers on this meta-graph enable joint reasoning over intra- and inter-graph structure. The meta-graph provides built-in interpretability: supernodes and superedges highlight influential substructures and cross-graph alignments. Evaluating MGMT on both synthetic datasets and real-world neuroscience applications, we show that MGMT consistently outperforms existing state-of-the-art models in graph-level prediction tasks while offering interpretable representations that facilitate scientific discoveries. Our work establishes MGMT as a unified framework for structured multi-graph learning, advancing representation techniques in domains where graph-based data plays a central role.


【8】Non-Intrusive Graph-Based Bot Detection for E-Commerce Using Inductive Graph Neural Networks
标题:基于归纳图神经网络的电子商务非侵入式图Bot检测
链接:https://arxiv.org/abs/2601.22579

作者:Sichen Zhao,Zhiming Xue,Yalun Qi,Xianling Zeng,Zihan Yu
摘要:Malicious bots pose a growing threat to e-commerce platforms by scraping data, hoarding inventory, and perpetrating fraud. Traditional bot mitigation techniques, including IP blacklists and CAPTCHA-based challenges, are increasingly ineffective or intrusive, as modern bots leverage proxies, botnets, and AI-assisted evasion strategies. This work proposes a non-intrusive graph-based bot detection framework for e-commerce that models user session behavior through a graph representation and applies an inductive graph neural network for classification. The approach captures both relational structure and behavioral semantics, enabling accurate identification of subtle automated activity that evades feature-based methods. Experiments on real-world e-commerce traffic demonstrate that the proposed inductive graph model outperforms a strong session-level multilayer perceptron baseline in terms of AUC and F1 score. Additional adversarial perturbation and cold-start simulations show that the model remains robust under moderate graph modifications and generalizes effectively to previously unseen sessions and URLs. The proposed framework is deployment-friendly, integrates with existing systems without client-side instrumentation, and supports real-time inference and incremental updates, making it suitable for practical e-commerce security deployments.


【9】Variational Bayesian Flow Network for Graph Generation
标题:用于图生成的变分贝叶斯流网络
链接:https://arxiv.org/abs/2601.22524

作者:Yida Xiong,Jiameng Chen,Xiuwen Gong,Jia Wu,Shirui Pan,Wenbin Hu
摘要 :Graph generation aims to sample discrete node and edge attributes while satisfying coupled structural constraints. Diffusion models for graphs often adopt largely factorized forward-noising, and many flow-matching methods start from factorized reference noise and coordinate-wise interpolation, so node-edge coupling is not encoded by the generative geometry and must be recovered implicitly by the core network, which can be brittle after discrete decoding. Bayesian Flow Networks (BFNs) evolve distribution parameters and naturally support discrete generation. But classical BFNs typically rely on factorized beliefs and independent channels, which limit geometric evidence fusion. We propose Variational Bayesian Flow Network (VBFN), which performs a variational lifting to a tractable joint Gaussian variational belief family governed by structured precisions. Each Bayesian update reduces to solving a symmetric positive definite linear system, enabling coupled node and edge updates within a single fusion step. We construct sample-agnostic sparse precisions from a representation-induced dependency graph, thereby avoiding label leakage while enforcing node-edge consistency. On synthetic and molecular graph datasets, VBFN improves fidelity and diversity, and surpasses baseline methods.
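
The abstract states that each Bayesian update reduces to solving a symmetric positive definite (SPD) linear system. The snippet below shows what such an information-form Gaussian update looks like with a Cholesky solve; the prior precision pattern and observation model are placeholders of our own, not VBFN's structured precisions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6

# Structured prior precision (SPD): diagonal plus a sparse coupling pattern
Lambda = np.eye(d) * 2.0
Lambda[0, 1] = Lambda[1, 0] = 0.5
Lambda[2, 3] = Lambda[3, 2] = 0.5
mu = np.zeros(d)

# Noisy observation of the latent vector, with diagonal observation precision R
R = np.eye(d) * 4.0
y = rng.normal(size=d)

# Information-form Gaussian update: one SPD solve yields the coupled posterior mean
Lambda_post = Lambda + R
rhs = Lambda @ mu + R @ y
L = np.linalg.cholesky(Lambda_post)                  # SPD matrix -> Cholesky factor
m_post = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
print("posterior mean:", np.round(m_post, 3))
```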


【10】Temporal Graph Pattern Machine
标题:时态图模式机
链接:https://arxiv.org/abs/2601.22454

作者:Yijun Ma,Zehong Wang,Weixiang Sun,Yanfang Ye
摘要:Temporal graph learning is pivotal for deciphering dynamic systems, where the core challenge lies in explicitly modeling the underlying evolving patterns that govern network transformation. However, prevailing methods are predominantly task-centric and rely on restrictive assumptions -- such as short-term dependency modeling, static neighborhood semantics, and retrospective time usage. These constraints hinder the discovery of transferable temporal evolution mechanisms. To address this, we propose the Temporal Graph Pattern Machine (TGPM), a foundation framework that shifts the focus toward directly learning generalized evolving patterns. TGPM conceptualizes each interaction as an interaction patch synthesized via temporally-biased random walks, thereby capturing multi-scale structural semantics and long-range dependencies that extend beyond immediate neighborhoods. These patches are processed by a Transformer-based backbone designed to capture global temporal regularities while adapting to context-specific interaction dynamics. To further empower the model, we introduce a suite of self-supervised pre-training tasks -- specifically masked token modeling and next-time prediction -- to explicitly encode the fundamental laws of network evolution. Extensive experiments show that TGPM consistently achieves state-of-the-art performance in both transductive and inductive link prediction, demonstrating exceptional cross-domain transferability.


【11】MM-OpenFGL: A Comprehensive Benchmark for Multimodal Federated Graph Learning
标题:MM-OpenFGL:多模态联合图学习的综合基准
链接:https://arxiv.org/abs/2601.22416

作者:Xunkai Li,Yuming Ai,Yinlin Zhu,Haodong Lu,Yi Zhang,Guohao Fu,Bowen Fan,Qiangqiang Dai,Rong-Hua Li,Guoren Wang
备注:Under Review
摘要:Multimodal-attributed graphs (MMAGs) provide a unified framework for modeling complex relational data by integrating heterogeneous modalities with graph structures. While centralized learning has shown promising performance, MMAGs in real-world applications are often distributed across isolated platforms and cannot be shared due to privacy concerns or commercial constraints. Federated graph learning (FGL) offers a natural solution for collaborative training under such settings; however, existing studies largely focus on single-modality graphs and do not adequately address the challenges unique to multimodal federated graph learning (MMFGL). To bridge this gap, we present MM-OpenFGL, the first comprehensive benchmark that systematically formalizes the MMFGL paradigm and enables rigorous evaluation. MM-OpenFGL comprises 19 multimodal datasets spanning 7 application domains, 8 simulation strategies capturing modality and topology variations, 6 downstream tasks, and 57 state-of-the-art methods implemented through a modular API. Extensive experiments investigate MMFGL from the perspectives of necessity, effectiveness, robustness, and efficiency, offering valuable insights for future research on MMFGL.


【12】Graph is a Substrate Across Data Modalities
标题:图是跨数据模态的基底
链接:https://arxiv.org/abs/2601.22384

作者:Ziming Li,Xiaoming Wu,Zehong Wang,Jiazheng Li,Yijun Tian,Jinhe Bi,Yunpu Ma,Yanfang Ye,Chuxu Zhang
备注:Graph structure across data modalities
摘要:Graphs provide a natural representation of relational structure that arises across diverse domains. Despite this ubiquity, graph structure is typically learned in a modality- and task-isolated manner, where graph representations are constructed within individual task contexts and discarded thereafter. As a result, structural regularities across modalities and tasks are repeatedly reconstructed rather than accumulated at the level of intermediate graph representations. This motivates a representation-learning question: how should graph structure be organized so that it can persist and accumulate across heterogeneous modalities and tasks? We adopt a representation-centric perspective in which graph structure is treated as a structural substrate that persists across learning contexts. To instantiate this perspective, we propose G-Substrate, a graph substrate framework that organizes learning around shared graph structures. G-Substrate comprises two complementary mechanisms: a unified structural schema that ensures compatibility among graph representations across heterogeneous modalities and tasks, and an interleaved role-based training strategy that exposes the same graph structure to multiple functional roles during learning. Experiments across multiple domains, modalities, and tasks show that G-Substrate outperforms task-isolated and naive multi-task learning methods.


【13】Spatially-Adaptive Conformal Graph Transformer for Indoor Localization in Wi-Fi Driven Networks
标题:Wi-Fi驱动网络中用于室内定位的空间自适应共形图Transformer
链接:https://arxiv.org/abs/2601.22322

作者:Ayesh Abu Lehyeh,Anastassia Gharib,Safwan Wshah
备注:Accepted to IEEE ICC 2026
摘要:Indoor localization is a critical enabler for a wide range of location-based services in smart environments, including navigation, asset tracking, and safety-critical applications. Recent graph-based models leverage spatial relationships between Wireless Fidelity (Wi-Fi) Access Points (APs) and devices, offering finer localization granularity, but fall short in quantifying prediction uncertainty, a key requirement for real-world deployment. In this paper, we propose Spatially-Adaptive Conformal Graph Transformer (SAC-GT), a framework for accurate and reliable indoor localization. SAC-GT integrates a Graph Transformer (GT) model that captures the network's spatial topology and signal strength dynamics, with a novel Spatially-Adaptive Conformal Prediction (SACP) method that provides region-specific uncertainty estimates. This allows SAC-GT to produce not only precise two-dimensional (2D) location predictions but also statistically valid confidence regions tailored to varying environmental conditions. Extensive evaluations on a large-scale real-world dataset demonstrate that the proposed SAC-GT solution achieves state-of-the-art localization accuracy while delivering robust and spatially adaptive reliability guarantees.
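
As background on the conformal component, split-conformal calibration with region-specific quantiles is one simple way to obtain the kind of region-dependent confidence radii the abstract mentions. The sketch below uses a synthetic error distribution and should be read as a generic illustration, not the paper's SACP procedure.

```python
import numpy as np

def regionwise_conformal_radius(cal_err, cal_region, alpha=0.1):
    """Per-region conformal radius from calibration errors.
    cal_err[i]   : localization error of calibration point i (e.g. Euclidean distance)
    cal_region[i]: region id of calibration point i
    Returns {region: radius} so that, within each region, roughly a 1-alpha fraction
    of calibration errors fall inside the radius (finite-sample split-conformal quantile)."""
    radii = {}
    for r in np.unique(cal_region):
        errs = np.sort(cal_err[cal_region == r])
        k = int(np.ceil((len(errs) + 1) * (1 - alpha))) - 1   # 0-indexed quantile position
        radii[int(r)] = float(errs[min(k, len(errs) - 1)])
    return radii

rng = np.random.default_rng(0)
region = rng.integers(0, 3, size=600)
err = rng.gamma(shape=2.0, scale=0.5 + 0.5 * region)          # harder regions -> larger errors
print(regionwise_conformal_radius(err, region, alpha=0.1))
```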


【14】FlowSymm: Physics Aware, Symmetry Preserving Graph Attention for Network Flow Completion
标题:FlowSymm:用于网络流补全的物理感知、保对称图注意力
链接:https://arxiv.org/abs/2601.22317

作者:Ege Demirci,Francesco Bullo,Ananthram Swami,Ambuj Singh
摘要:Recovering missing flows on the edges of a network, while exactly respecting local conservation laws, is a fundamental inverse problem that arises in many systems such as transportation, energy, and mobility. We introduce FlowSymm, a novel architecture that combines (i) a group-action on divergence-free flows, (ii) a graph-attention encoder to learn feature-conditioned weights over these symmetry-preserving actions, and (iii) a lightweight Tikhonov refinement solved via implicit bilevel optimization. The method first anchors the given observation on a minimum-norm divergence-free completion. We then compute an orthonormal basis for all admissible group actions that leave the observed flows invariant and parameterize the valid solution subspace, which shows an Abelian group structure under vector addition. A stack of GATv2 layers then encodes the graph and its edge features into per-edge embeddings, which are pooled over the missing edges and produce per-basis attention weights. This attention-guided process selects a set of physics-aware group actions that preserve the observed flows. Finally, a scalar Tikhonov penalty refines the missing entries via a convex least-squares solver, with gradients propagated implicitly through Cholesky factorization. Across three real-world flow benchmarks (traffic, power, bike), FlowSymm outperforms state-of-the-art baselines in RMSE, MAE and correlation metrics.


【15】Graph Attention Network for Node Regression on Random Geometric Graphs with Erdős--Rényi contamination
标题:带Erdős–Rényi污染的随机几何图上节点回归的图注意力网络
链接:https://arxiv.org/abs/2601.23239

作者:Somak Laha,Suqi Liu,Morgane Austern
备注:62 pages, 2 figures, 2 tables
摘要:Graph attention networks (GATs) are widely used and often appear robust to noise in node covariates and edges, yet rigorous statistical guarantees demonstrating a provable advantage of GATs over non-attention graph neural networks (GNNs) are scarce. We partially address this gap for node regression with graph-based errors-in-variables models under simultaneous covariate and edge corruption: responses are generated from latent node-level covariates, but only noise-perturbed versions of the latent covariates are observed; and the sample graph is a random geometric graph created from the node covariates but contaminated by independent Erdős--Rényi edges. We propose and analyze a carefully designed, task-specific GAT that constructs denoised proxy features for regression. We prove that regressing the response variables on the proxies achieves lower error asymptotically in (a) estimating the regression coefficient compared to the ordinary least squares (OLS) estimator on the noisy node covariates, and (b) predicting the response for an unlabelled node compared to a vanilla graph convolutional network (GCN) -- under mild growth conditions. Our analysis leverages high-dimensional geometric tail bounds and concentration for neighbourhood counts and sample covariances. We verify our theoretical findings through experiments on synthetically generated data. We also perform experiments on real-world graphs and demonstrate the effectiveness of the attention mechanism in several node regression tasks.
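
The contaminated graph model analyzed here is easy to simulate: connect nodes whose latent covariates are within a radius (the random geometric graph) and add independent Erdős–Rényi noise edges on top. The parameters below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 2
radius, q = 0.15, 0.01          # RGG connection radius, ER contamination probability

x = rng.random((n, d))                                   # latent node covariates in [0, 1]^d
dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
rgg = (dist < radius) & ~np.eye(n, dtype=bool)           # geometric (signal) edges
er = np.triu(rng.random((n, n)) < q, k=1)                # independent Erdős–Rényi noise edges
er = er | er.T
adj = rgg | er
print("geometric edges:", rgg.sum() // 2, "| noise edges added:", (er & ~rgg).sum() // 2)
```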


Transformer(12篇)

【1】YuriiFormer: A Suite of Nesterov-Accelerated Transformers
标题:YuriiFormer:一套涅斯特罗夫加速Transformer
链接:https://arxiv.org/abs/2601.23236

作者:Aleksandr Zimin,Yury Polyanskiy,Philippe Rigollet
摘要:We propose a variational framework that interprets transformer layers as iterations of an optimization algorithm acting on token embeddings. In this view, self-attention implements a gradient step of an interaction energy, while MLP layers correspond to gradient updates of a potential energy. Standard GPT-style transformers emerge as vanilla gradient descent on the resulting composite objective, implemented via Lie--Trotter splitting between these two energy functionals. This perspective enables principled architectural design using classical optimization ideas. As a proof of concept, we introduce a Nesterov-style accelerated transformer that preserves the same attention and MLP oracles. The resulting architecture consistently outperforms a nanoGPT baseline on TinyStories and OpenWebText, demonstrating that optimization-theoretic insights can translate into practical gains.
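
The paper's central analogy is that a stack of transformer layers behaves like gradient-descent iterations on an energy, so acceleration techniques carry over to the architecture. The toy snippet below contrasts plain gradient descent with a Nesterov-style look-ahead update on a quadratic stand-in for that energy; it illustrates the optimization idea only, not the YuriiFormer layer itself.

```python
import numpy as np

def grad(x):                                   # toy quadratic energy as a stand-in
    return np.diag([1.0, 10.0]) @ x

def vanilla(x0, lr=0.09, steps=30):            # plain gradient descent (GPT-style stack analogy)
    x = x0.copy()
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def nesterov(x0, lr=0.09, steps=30):           # accelerated variant: gradient at a look-ahead point
    x, x_prev = x0.copy(), x0.copy()
    for k in range(steps):
        y = x + (k / (k + 3)) * (x - x_prev)   # momentum extrapolation
        x_prev, x = x, y - lr * grad(y)
    return x

x0 = np.array([5.0, 5.0])
print("vanilla  distance to optimum:", np.linalg.norm(vanilla(x0)))
print("nesterov distance to optimum:", np.linalg.norm(nesterov(x0)))
```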


【2】MeshGraphNet-Transformer: Scalable Mesh-based Learned Simulation for Solid Mechanics
标题:MeshGraphNet-Transformer:可扩展的基于网格的固体力学学习模拟
链接:https://arxiv.org/abs/2601.23177

作者:Mikel M. Iparraguirre,Iciar Alfaro,David Gonzalez,Elias Cueto
摘要:We present MeshGraphNet-Transformer (MGN-T), a novel architecture that combines the global modeling capabilities of Transformers with the geometric inductive bias of MeshGraphNets, while preserving a mesh-based graph representation. MGN-T overcomes a key limitation of standard MGN, the inefficient long-range information propagation caused by iterative message passing on large, high-resolution meshes. A physics-attention Transformer serves as a global processor, updating all nodal states simultaneously while explicitly retaining node and edge attributes. By directly capturing long-range physical interactions, MGN-T eliminates the need for deep message-passing stacks or hierarchical, coarsened meshes, enabling efficient learning on high-resolution meshes with varying geometries, topologies, and boundary conditions at an industrial scale. We demonstrate that MGN-T successfully handles industrial-scale meshes for impact dynamics, a setting in which standard MGN fails due to message-passing under-reaching. The method accurately models self-contact, plasticity, and multivariate outputs, including internal, phenomenological plastic variables. Moreover, MGN-T outperforms state-of-the-art approaches on classical benchmarks, achieving higher accuracy while maintaining practical efficiency, using only a fraction of the parameters required by competing baselines.


【3】Names Don't Matter: Symbol-Invariant Transformer for Open-Vocabulary Learning
标题:名字不重要:开放词汇学习的符号不变Transformer
链接:https://arxiv.org/abs/2601.23169

作者:İlker Işık,Wenchao Li
摘要:Current neural architectures lack a principled way to handle interchangeable tokens, i.e., symbols that are semantically equivalent yet distinguishable, such as bound variables. As a result, models trained on fixed vocabularies often struggle to generalize to unseen symbols, even when the underlying semantics remain unchanged. We propose a novel Transformer-based mechanism that is provably invariant to the renaming of interchangeable tokens. Our approach employs parallel embedding streams to isolate the contribution of each interchangeable token in the input, combined with an aggregated attention mechanism that enables structured information sharing across streams. Experimental results confirm the theoretical guarantees of our method and demonstrate substantial performance gains on open-vocabulary tasks that require generalization to novel symbols.


【4】Learnable Permutation for Structured Sparsity on Transformer Models
标题:Transformer模型上结构稀疏性的可学习排列
链接:https://arxiv.org/abs/2601.22980

作者:Zekai Li,Ji Liu,Guanchen Li,Yixing Xu,Ziqiong Liu,Xuanwu Yin,Dong Li,Emad Barsoum
摘要:Structured sparsity has emerged as a popular model pruning technique, widely adopted in various architectures, including CNNs, Transformer models, and especially large language models (LLMs) in recent years. A promising direction to further improve post-pruning performance is weight permutation, which reorders model weights into patterns more amenable to pruning. However, the exponential growth of the permutation search space with the scale of Transformer architectures forces most methods to rely on greedy or heuristic algorithms, limiting the effectiveness of reordering.   In this work, we propose a novel end-to-end learnable permutation framework. Our method introduces a learnable permutation cost matrix to quantify the cost of swapping any two input channels of a given weight matrix, a differentiable bipartite matching solver to obtain the optimal binary permutation matrix given a cost matrix, and a sparsity optimization loss function to directly optimize the permutation operator. We extensively validate our approach on vision and language Transformers, demonstrating that our method achieves state-of-the-art permutation results for structured sparsity.


【5】Matterhorn: Efficient Analog Sparse Spiking Transformer Architecture with Masked Time-To-First-Spike Encoding
标题:Matterhorn:具有屏蔽首次尖峰时间编码的高效模拟稀疏尖峰Transformer架构
链接:https://arxiv.org/abs/2601.22876

作者:Zhanglu Yan,Kaiwen Tang,Zixuan Zhu,Zhenyu Bai,Qianhui Liu,Weng-Fai Wong
摘要:Spiking neural networks (SNNs) have emerged as a promising candidate for energy-efficient LLM inference. However, current energy evaluations for SNNs primarily focus on counting accumulate operations, and fail to account for real-world hardware costs such as data movement, which can consume nearly 80% of the total energy. In this paper, we propose Matterhorn, a spiking transformer that integrates a novel masked time-to-first-spike (M-TTFS) encoding method to reduce spike movement and a memristive synapse unit (MSU) to eliminate weight access overhead. M-TTFS employs a masking strategy that reassigns the zero-energy silent state (a spike train of all 0s) to the most frequent membrane potential rather than the lowest. This aligns the coding scheme with the data distribution, minimizing spike movement energy without information loss. We further propose a `dead zone' strategy that maximizes sparsity by mapping all values within a given range to the silent state. At the hardware level, the MSU utilizes compute-in-memory (CIM) technology to perform analog integration directly within memory, effectively removing weight access costs. On the GLUE benchmark, Matterhorn establishes a new state-of-the-art, surpassing existing SNNs by 1.42% in average accuracy while delivering a 2.31 times improvement in energy efficiency.


【6】Hierarchical Shift Mixing -- Beyond Dense Attention in Transformers
标题:分层移位混合:超越Transformer中的密集注意力
链接:https://arxiv.org/abs/2601.22852

作者:Robert Forchheimer
备注:11 pages, 10 pdf figures
摘要:Since the introduction of the Transformer architecture for large language models, the softmax-based attention layer has faced increasing scrutiny due to its quadratic-time computational complexity. Attempts have been made to replace it with less complex methods, at the cost of reduced performance in most cases. We introduce Hierarchical Shift Mixing (HSM), a general framework for token mixing that distributes pairwise token interactions across Transformer layers rather than computing them densely within each layer. HSM enables linear-time complexity while remaining agnostic to the specific mixing function. We show that even simple HSM variants achieve performance close to softmax attention, and that hybrid architectures combining HSM with softmax attention can outperform a GPT-style Transformer baseline while reducing computational cost during both training and inference.


【7】Do Transformers Have the Ability for Periodicity Generalization?
标题:Transformer具备周期性泛化能力吗?
链接:https://arxiv.org/abs/2601.22690

作者:Huanyu Liu,Ge Li,Yihong Dong,Sihan Wu,Peixu Wang,Sihao Cheng,Taozhi Chen,Kechi Zhang,Hao Zhu,Tongxuan Liu
摘要:Large language models (LLMs) based on the Transformer have demonstrated strong performance across diverse tasks. However, current models still exhibit substantial limitations in out-of-distribution (OOD) generalization compared with humans. We investigate this gap through periodicity, one of the basic OOD scenarios. Periodicity captures invariance amid variation. Periodicity generalization represents a model's ability to extract periodic patterns from training data and generalize to OOD scenarios. We introduce a unified interpretation of periodicity from the perspective of abstract algebra and reasoning, including both single and composite periodicity, to explain why Transformers struggle to generalize periodicity. Then we construct Coper about composite periodicity, a controllable generative benchmark with two OOD settings, Hollow and Extrapolation. Experiments reveal that periodicity generalization in Transformers is limited, where models can memorize periodic data during training, but cannot generalize to unseen composite periodicity. We release the source code to support future research.


【8】Stabilizing Transformer Training Through Consensus
标题:通过共识稳定Transformer训练
链接:https://arxiv.org/abs/2601.22614

作者:Shyam Venkatasubramanian,Sean Moushegian,Michael Lin,Mir Park,Ankit Singhal,Connor Lee
摘要:Standard attention-based transformers are known to exhibit instability under learning rate overspecification during training, particularly at high learning rates. While various methods have been proposed to improve resilience to such overspecification by modifying the optimization procedure, fundamental architectural innovations to this end remain underexplored. In this work, we illustrate that the consensus mechanism, a drop-in replacement for attention, stabilizes transformer training across a wider effective range of learning rates. We formulate consensus as a graphical model and provide extensive empirical analysis demonstrating improved stability across learning rate sweeps on text, DNA, and protein modalities. We further propose a hybrid consensus-attention framework that preserves performance while improving stability. We provide theoretical analysis characterizing the properties of consensus.


【9】SpanNorm: Reconciling Training Stability and Performance in Deep Transformers
标题:SpanNorm:调和深度Transformer的训练稳定性与性能
链接:https://arxiv.org/abs/2601.22580

作者:Chao Wang,Bei Li,Jiaqi Zhang,Xinyu Liu,Yuchun Fan,Linkun Lyu,Xin Chen,Jingang Wang,Tong Xiao,Peng Pei,Xunliang Cai
摘要:The success of Large Language Models (LLMs) hinges on the stable training of deep Transformer architectures. A critical design choice is the placement of normalization layers, leading to a fundamental trade-off: the ``PreNorm'' architecture ensures training stability at the cost of potential performance degradation in deep models, while the ``PostNorm'' architecture offers strong performance but suffers from severe training instability. In this work, we propose SpanNorm, a novel technique designed to resolve this dilemma by integrating the strengths of both paradigms. Structurally, SpanNorm establishes a clean residual connection that spans the entire transformer block to stabilize signal propagation, while employing a PostNorm-style computation that normalizes the aggregated output to enhance model performance. We provide a theoretical analysis demonstrating that SpanNorm, combined with a principled scaling strategy, maintains bounded signal variance throughout the network, preventing the gradient issues that plague PostNorm models, and also alleviating the representation collapse of PreNorm. Empirically, SpanNorm consistently outperforms standard normalization schemes in both dense and Mixture-of-Experts (MoE) scenarios, paving the way for more powerful and stable Transformer architectures.
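
For context, the two layouts the abstract contrasts differ only in where layer normalization sits relative to the residual addition. The schematic sketch below (with a stand-in sublayer) shows standard PreNorm and PostNorm blocks; SpanNorm's own block-spanning formulation is not reproduced here, since the abstract does not give its exact equations.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def post_norm_block(x, sublayer):
    # PostNorm (original Transformer): normalize after the residual addition
    return layer_norm(x + sublayer(x))

def pre_norm_block(x, sublayer):
    # PreNorm (GPT-style): normalize the input, keep an identity residual path
    return x + sublayer(layer_norm(x))

x = np.random.default_rng(0).normal(size=(4, 16))
sublayer = lambda h: 0.5 * h                    # stand-in for an attention or MLP sublayer
print("post-norm output std:", post_norm_block(x, sublayer).std())
print("pre-norm  output std:", pre_norm_block(x, sublayer).std())
```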


【10】Shattered Compositionality: Counterintuitive Learning Dynamics of Transformers for Arithmetic
标题:破碎的组合性:算术Transformer的反直觉学习动力学
链接:https://arxiv.org/abs/2601.22510

作者:Xingyu Zhao,Darsh Sharma,Rheeya Uppaal,Yiqiao Zhong
备注:33 pages, 27 figures
摘要:Large language models (LLMs) often exhibit unexpected errors or unintended behavior, even at scale. While recent work reveals the discrepancy between LLMs and humans in skill compositions, the learning dynamics of skill compositions and the underlying cause of non-human behavior remain elusive. In this study, we investigate the mechanism of learning dynamics by training transformers on synthetic arithmetic tasks. Through extensive ablations and fine-grained diagnostic metrics, we discover that transformers do not reliably build skill compositions according to human-like sequential rules. Instead, they often acquire skills in reverse order or in parallel, which leads to unexpected mixing errors especially under distribution shifts--a phenomenon we refer to as shattered compositionality. To explain these behaviors, we provide evidence that correlational matching to the training data, rather than causal or procedural composition, shapes learning dynamics. We further show that shattered compositionality persists in modern LLMs and is not mitigated by pure model scaling or scratchpad-based reasoning. Our results reveal a fundamental mismatch between a model's learning behavior and desired skill compositions, with implications for reasoning reliability, out-of-distribution robustness, and alignment.


【11】Symmetry Breaking in Transformers for Efficient and Interpretable Training
标题:Transformer中的对称性破缺,以实现高效且可解释的训练
链接:https://arxiv.org/abs/2601.22257

作者:Eva Silverstein,Daniel Kunin,Vasudev Shyam
备注:22 pages, 3 figures
摘要 :The attention mechanism in its standard implementation contains extraneous rotational degrees of freedom that are carried through computation but do not affect model activations or outputs. We introduce a simple symmetry-breaking protocol that inserts a preferred direction into this rotational space through batchwise-sampled, unlearned query and value biases. This modification has two theoretically motivated and empirically validated consequences. First, it can substantially improve the performance of simple, memory-efficient optimizers, narrowing -- and in some cases closing -- the gap to successful but more complex memory-intensive adaptive methods. We demonstrate this by pretraining 124M parameter transformer models with four optimization algorithms (AdamW, SOAP, SGDM, and Energy Conserving Descent(ECD)) and evaluating both validation loss and downstream logical reasoning. Second, it enables an interpretable use of otherwise redundant rotational degrees of freedom, selectively amplifying semantically meaningful token classes within individual attention heads. Overall, our results show that minimal, principled architectural changes can simultaneously improve performance and interpretability.


【12】Attention Isn't All You Need for Emotion Recognition: Domain Features Outperform Transformers on the EAV Dataset
标题:情感识别需要的不仅仅是注意力:领域特征在EAV数据集上优于Transformer
链接:https://arxiv.org/abs/2601.22161

作者:Anmol Guragain
摘要:We present a systematic study of multimodal emotion recognition using the EAV dataset, investigating whether complex attention mechanisms improve performance on small datasets. We implement three model categories: baseline transformers (M1), novel factorized attention mechanisms (M2), and improved CNN baselines (M3). Our experiments show that sophisticated attention mechanisms consistently underperform on small datasets. M2 models achieved 5 to 13 percentage points below baselines due to overfitting and destruction of pretrained features. In contrast, simple domain-appropriate modifications proved effective: adding delta MFCCs to the audio CNN improved accuracy from 61.9% to 65.56% (+3.66pp), while frequency-domain features for EEG achieved 67.62% (+7.62pp over the paper baseline). Our vision transformer baseline (M1) reached 75.30%, exceeding the paper's ViViT result (74.5%) through domain-specific pretraining, and vision delta features achieved 72.68% (+1.28pp over the paper CNN). These findings demonstrate that for small-scale emotion recognition, domain knowledge and proper implementation outperform architectural complexity.


GAN|对抗|攻击|生成相关(12篇)

【1】VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation
标题:VideoGPA:蒸馏几何先验以生成3D一致的视频
链接:https://arxiv.org/abs/2601.23286

作者:Hongyang Du,Junjie Ye,Xiaoyan Cong,Runhao Li,Jingcheng Ni,Aman Agarwal,Zeqi Zhou,Zekun Li,Randall Balestriero,Yue Wang
摘要:While recent video diffusion models (VDMs) produce visually impressive results, they fundamentally struggle to maintain 3D structural consistency, often resulting in object deformation or spatial drift. We hypothesize that these failures arise because standard denoising objectives lack explicit incentives for geometric coherence. To address this, we introduce VideoGPA (Video Geometric Preference Alignment), a data-efficient self-supervised framework that leverages a geometry foundation model to automatically derive dense preference signals that guide VDMs via Direct Preference Optimization (DPO). This approach effectively steers the generative distribution toward inherent 3D consistency without requiring human annotations. VideoGPA significantly enhances temporal stability, physical plausibility, and motion coherence using minimal preference pairs, consistently outperforming state-of-the-art baselines in extensive experiments.


【2】Agnostic Language Identification and Generation
标题:不可知语言识别和生成
链接:https://arxiv.org/abs/2601.23258

作者:Mikael Møller Høgsgaard,Chirag Pabbaraju
摘要:Recent works on language identification and generation have established tight statistical rates at which these tasks can be achieved. These works typically operate under a strong realizability assumption: that the input data is drawn from an unknown distribution necessarily supported on some language in a given collection. In this work, we relax this assumption of realizability entirely, and impose no restrictions on the distribution of the input data. We propose objectives to study both language identification and generation in this more general "agnostic" setup. Across both problems, we obtain novel interesting characterizations and nearly tight rates.


【3】DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation
标题:DINO-SAE:用于高保真图像重建和生成的DINO球形自动编码器
链接:https://arxiv.org/abs/2601.22904

作者:Hun Chang,Byunghee Cha,Jong Chul Ye
备注:17 pages, and 11 figures
摘要 :Recent studies have explored using pretrained Vision Foundation Models (VFMs) such as DINO for generative autoencoders, showing strong generative performance. Unfortunately, existing approaches often suffer from limited reconstruction fidelity due to the loss of high-frequency details. In this work, we present the DINO Spherical Autoencoder (DINO-SAE), a framework that bridges semantic representation and pixel-level reconstruction. Our key insight is that semantic information in contrastive representations is primarily encoded in the direction of feature vectors, while forcing strict magnitude matching can hinder the encoder from preserving fine-grained details. To address this, we introduce Hierarchical Convolutional Patch Embedding module that enhances local structure and texture preservation, and Cosine Similarity Alignment objective that enforces semantic consistency while allowing flexible feature magnitudes for detail retention. Furthermore, leveraging the observation that SSL-based foundation model representations intrinsically lie on a hypersphere, we employ Riemannian Flow Matching to train a Diffusion Transformer (DiT) directly on this spherical latent manifold. Experiments on ImageNet-1K demonstrate that our approach achieves state-of-the-art reconstruction quality, reaching 0.37 rFID and 26.2 dB PSNR, while maintaining strong semantic alignment to the pretrained VFM. Notably, our Riemannian Flow Matching-based DiT exhibits efficient convergence, achieving a gFID of 3.47 at 80 epochs.


【4】Synthetic Time Series Generation via Complex Networks
标题:通过复杂网络生成合成时间序列
链接:https://arxiv.org/abs/2601.22879

作者:Jaime Vale,Vanessa Freitas Silva,Maria Eduarda Silva,Fernando Silva
摘要:Time series data are essential for a wide range of applications, particularly in developing robust machine learning models. However, access to high-quality datasets is often limited due to privacy concerns, acquisition costs, and labeling challenges. Synthetic time series generation has emerged as a promising solution to address these constraints. In this work, we present a framework for generating synthetic time series by leveraging complex networks mappings. Specifically, we investigate whether time series transformed into Quantile Graphs (QG) -- and then reconstructed via inverse mapping -- can produce synthetic data that preserve the statistical and structural properties of the original. We evaluate the fidelity and utility of the generated data using both simulated and real-world datasets, and compare our approach against state-of-the-art Generative Adversarial Network (GAN) methods. Results indicate that our quantile graph-based methodology offers a competitive and interpretable alternative for synthetic time series generation.
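
A bare-bones version of the quantile-graph round trip this abstract describes: discretize the series into quantile bins (graph nodes), count transitions between consecutive observations (weighted edges), then generate a synthetic series by walking that transition matrix and emitting values from the visited bins. The inverse mapping here is deliberately naive and only illustrates the idea; the paper's procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=500))               # example series (random walk)
Q = 10                                            # number of quantile bins = QG nodes

# Time series -> Quantile Graph: nodes are quantile bins, weighted edges count
# transitions between consecutive observations.
edges = np.quantile(x, np.linspace(0, 1, Q + 1))
bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, Q - 1)
T = np.zeros((Q, Q))
for a, b in zip(bins[:-1], bins[1:]):
    T[a, b] += 1
T /= T.sum(axis=1, keepdims=True)

# Naive inverse mapping: walk the transition matrix, emit a value from each visited bin.
state, synth = bins[0], []
for _ in range(len(x)):
    synth.append(rng.uniform(edges[state], edges[state + 1]))
    state = rng.choice(Q, p=T[state])
print("original std %.2f | synthetic std %.2f" % (x.std(), np.std(synth)))
```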


【5】Unconditional flow-based time series generation with equivariance-regularised latent spaces
标题:具有等变正则化潜在空间的无条件基于流的时间序列生成
链接:https://arxiv.org/abs/2601.22848

作者:Camilo Carvajal Reyes,Felipe Tobar
备注:Accepted at ICASSP 2026
摘要:Flow-based models have proven successful for time-series generation, particularly when defined in lower-dimensional latent spaces that enable efficient sampling. However, how to design latent representations with desirable equivariance properties for time-series generative modelling remains underexplored. In this work, we propose a latent flow-matching framework in which equivariance is explicitly encouraged through a simple regularisation of a pre-trained autoencoder. Specifically, we introduce an equivariance loss that enforces consistency between transformed signals and their reconstructions, and use it to fine-tune latent spaces with respect to basic time-series transformations such as translation and amplitude scaling. We show that these equivariance-regularised latent spaces improve generation quality while preserving the computational advantages of latent flow models. Experiments on multiple real-world datasets demonstrate that our approach consistently outperforms existing diffusion-based baselines in standard time-series generation metrics, while achieving orders-of-magnitude faster sampling. These results highlight the practical benefits of incorporating geometric inductive biases into latent generative models for time series.
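
One common way to write the kind of equivariance regulariser described here is to penalise the mismatch between "transform then autoencode" and "autoencode then transform". The snippet below shows that loss for a circular time shift with placeholder encoder/decoder functions; the paper's exact loss and set of transformations may differ.

```python
import numpy as np

def equivariance_loss(encode, decode, x, transform):
    """|| decode(encode(T(x))) - T(decode(encode(x))) ||^2 averaged over the batch.
    Small values mean the autoencoder commutes with the transformation T."""
    lhs = decode(encode(transform(x)))
    rhs = transform(decode(encode(x)))
    return float(np.mean((lhs - rhs) ** 2))

# Toy stand-ins: an identity autoencoder and a circular time shift
encode = lambda x: x
decode = lambda z: z
shift = lambda x: np.roll(x, 5, axis=-1)

x = np.random.default_rng(0).normal(size=(8, 128))    # batch of univariate series
print(equivariance_loss(encode, decode, x, shift))    # 0.0 for a perfectly equivariant AE
```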


【6】AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided Transcompilation
标题:AscendCraft:通过DSL引导的转译自动生成Ascend NPU内核
链接:https://arxiv.org/abs/2601.22760

作者:Zhongzhen Wen,Shudi Shao,Zhong Li,Yu Ge,Tongtong Xu,Yuanyi Lin,Tian Zhang
摘要:The performance of deep learning models critically depends on efficient kernel implementations, yet developing high-performance kernels for specialized accelerators remains time-consuming and expertise-intensive. While recent work demonstrates that large language models (LLMs) can generate correct and performant GPU kernels, kernel generation for neural processing units (NPUs) remains largely underexplored due to domain-specific programming models, limited public examples, and sparse documentation. Consequently, directly generating AscendC kernels with LLMs yields extremely low correctness, highlighting a substantial gap between GPU and NPU kernel generation.   We present AscendCraft, a DSL-guided approach for automatic AscendC kernel generation. AscendCraft introduces a lightweight DSL that abstracts non-essential complexity while explicitly modeling Ascend-specific execution semantics. Kernels are first generated in the DSL using category-specific expert examples and then transcompiled into AscendC through structured, constraint-driven LLM lowering passes. Evaluated on MultiKernelBench across seven operator categories, AscendCraft achieves 98.1% compilation success and 90.4% functional correctness. Moreover, 46.2% of generated kernels match or exceed PyTorch eager execution performance, demonstrating that DSL-guided transcompilation can enable LLMs to generate both correct and competitive NPU kernels. Beyond benchmarks, AscendCraft further demonstrates its generality by successfully generating two correct kernels for newly proposed mHC architecture, achieving performance that substantially surpasses PyTorch eager execution.


【7】Automating Forecasting Question Generation and Resolution for AI Evaluation
标题:面向AI评估的预测问题自动生成与判定
链接:https://arxiv.org/abs/2601.22444

作者:Nikos I. Bosse,Peter Mühlbacher,Jack Wildman,Lawrence Phillips,Dan Schwarz
备注:41 pages, 4 figures
摘要 :Forecasting future events is highly valuable in decision-making and is a robust measure of general intelligence. As forecasting is probabilistic, developing and evaluating AI forecasters requires generating large numbers of diverse and difficult questions, and accurately resolving them. Previous efforts to automate this laborious work relied on recurring data sources (e.g., weather, stocks), limiting diversity and utility. In this work, we present a system for generating and resolving high-quality forecasting questions automatically and at scale using LLM-powered web research agents. We use this system to generate 1499 diverse, real-world forecasting questions, and to resolve them several months later. We estimate that our system produces verifiable, unambiguous questions approximately 96% of the time, exceeding the rate of Metaculus, a leading human-curated forecasting platform. We also find that our system resolves questions at approximately 95% accuracy. We verify that forecasting agents powered by more intelligent LLMs perform better on these questions (Brier score of 0.134 for Gemini 3 Pro, 0.149 for GPT-5, and 0.179 for Gemini 2.5 Flash). Finally, we demonstrate how our system can be leveraged to directly improve forecasting, by evaluating a question decomposition strategy on a generated question set, yielding a significant improvement in Brier scores (0.132 vs. 0.141).
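
For reference, the Brier scores quoted above are simply the mean squared difference between forecast probabilities and binary outcomes (0 is perfect; always predicting 0.5 scores 0.25):

```python
import numpy as np

def brier_score(prob, outcome):
    """Mean squared error between forecast probabilities and binary outcomes (0 is perfect)."""
    prob, outcome = np.asarray(prob, dtype=float), np.asarray(outcome, dtype=float)
    return float(np.mean((prob - outcome) ** 2))

# Three resolved yes/no questions with probabilistic forecasts
print(brier_score([0.9, 0.2, 0.6], [1, 0, 1]))   # (0.01 + 0.04 + 0.16) / 3 = 0.07
```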


【8】Rethinking Anonymity Claims in Synthetic Data Generation: A Model-Centric Privacy Attack Perspective
标题:重新思考合成数据生成中的匿名声明:以模型为中心的隐私攻击角度
链接:https://arxiv.org/abs/2601.22434

作者:Georgi Ganev,Emiliano De Cristofaro
摘要:Training generative machine learning models to produce synthetic tabular data has become a popular approach for enhancing privacy in data sharing. As this typically involves processing sensitive personal information, releasing either the trained model or generated synthetic datasets can still pose privacy risks. Yet, recent research, commercial deployments, and privacy regulations like the General Data Protection Regulation (GDPR) largely assess anonymity at the level of an individual dataset.   In this paper, we rethink anonymity claims about synthetic data from a model-centric perspective and argue that meaningful assessments must account for the capabilities and properties of the underlying generative model and be grounded in state-of-the-art privacy attacks. This perspective better reflects real-world products and deployments, where trained models are often readily accessible for interaction or querying. We interpret the GDPR's definitions of personal data and anonymization under such access assumptions to identify the types of identifiability risks that must be mitigated and map them to privacy attacks across different threat settings. We then argue that synthetic data techniques alone do not ensure sufficient anonymization. Finally, we compare the two mechanisms most commonly used alongside synthetic data -- Differential Privacy (DP) and Similarity-based Privacy Metrics (SBPMs) -- and argue that while DP can offer robust protections against identifiability risks, SBPMs lack adequate safeguards. Overall, our work connects regulatory notions of identifiability with model-centric privacy attacks, enabling more responsible and trustworthy regulatory assessment of synthetic data systems by researchers, practitioners, and policymakers.


【9】Gaussian Process Bandit Optimization with Machine Learning Predictions and Application to Hypothesis Generation
标题:采用机器学习预测的高斯过程Bandit优化及其在假设生成中的应用
链接:https://arxiv.org/abs/2601.22315

作者:Xin Jennifer Chen,Yunjin Tong
摘要:Many real-world optimization problems involve an expensive ground-truth oracle (e.g., human evaluation, physical experiments) and a cheap, low-fidelity prediction oracle (e.g., machine learning models, simulations). Meanwhile, abundant offline data (e.g., past experiments and predictions) are often available and can be used to pretrain powerful predictive models, as well as to provide an informative prior. We propose Prediction-Augmented Gaussian Process Upper Confidence Bound (PA-GP-UCB), a novel Bayesian optimization algorithm that leverages both oracles and offline data to achieve provable gains in sample efficiency for the ground-truth oracle queries. PA-GP-UCB employs a control-variates estimator derived from a joint Gaussian process posterior to correct prediction bias and reduce uncertainty. We prove that PA-GP-UCB preserves the standard regret rate of GP-UCB while achieving a strictly smaller leading constant that is explicitly controlled by prediction quality and offline data coverage. Empirically, PA-GP-UCB converges faster than Vanilla GP-UCB and naive prediction-augmented GP-UCB baselines on synthetic benchmarks and on a real-world hypothesis evaluation task grounded in human behavioral data, where predictions are provided by large language models. These results establish PA-GP-UCB as a general and sample-efficient framework for hypothesis generation under expensive feedback.


【10】Stealthy Poisoning Attacks Bypass Defenses in Regression Settings
标题:隐形中毒攻击绕过回归设置中的防御
链接:https://arxiv.org/abs/2601.22308

作者:Javier Carnerero-Cano,Luis Muñoz-González,Phillippa Spencer,Emil C. Lupu
摘要:Regression models are widely used in industrial processes, engineering and in natural and physical sciences, yet their robustness to poisoning has received less attention. When it has, studies often assume unrealistic threat models and are thus less useful in practice. In this paper, we propose a novel optimal stealthy attack formulation that considers different degrees of detectability and show that it bypasses state-of-the-art defenses. We further propose a new methodology based on normalization of objectives to evaluate different trade-offs between effectiveness and detectability. Finally, we develop a novel defense (BayesClean) against stealthy attacks. BayesClean improves on previous defenses when attacks are stealthy and the number of poisoning points is significant.


【11】BayesFlow: A Probability Inference Framework for Meta-Agent Assisted Workflow Generation
标题:BayesFlow:一种元Agent辅助工作流生成的概率推理框架
链接:https://arxiv.org/abs/2601.22305

作者:Bo Yuan,Yun Zhou,Zhichao Xu,Kiran Ramnath,Aosong Feng,Balasubramaniam Srinivasan
备注:EACL 2026 Findings
摘要:Automatic workflow generation is the process of automatically synthesizing sequences of LLM calls, tool invocations, and post-processing steps for complex end-to-end tasks. Most prior methods cast this task as an optimization problem with limited theoretical grounding. We propose to cast workflow generation as Bayesian inference over a posterior distribution on workflows, and introduce \textbf{Bayesian Workflow Generation (BWG)}, a sampling framework that builds workflows step-by-step using parallel look-ahead rollouts for importance weighting and a sequential in-loop refiner for pool-wide improvements. We prove that, without the refiner, the weighted empirical distribution converges to the target posterior. We instantiate BWG as \textbf{BayesFlow}, a training-free algorithm for workflow construction. Across six benchmark datasets, BayesFlow improves accuracy by up to 9 percentage points over SOTA workflow generation baselines and by up to 65 percentage points over zero-shot prompting, establishing BWG as a principled upgrade to search-based workflow design. Code will be available on https://github.com/BoYuanVisionary/BayesFlow.


【12】FunPRM: Function-as-Step Process Reward Model with Meta Reward Correction for Code Generation
标题:FunPRM:用于代码生成的带元奖励校正的函数即步骤过程奖励模型
链接:https://arxiv.org/abs/2601.22249

作者:Ruiyi Zhang,Peijia Qin,Qi Cao,Eric Xue,Pengtao Xie
摘要:Code generation is a core application of large language models (LLMs), yet LLMs still frequently fail on complex programming tasks. Given its success in mathematical reasoning, test-time scaling approaches such as Process Reward Model (PRM)-based Best-of-N selection offer a promising way to improve performance. However, existing PRMs remain ineffective for code generation due to the lack of meaningful step decomposition in code and the noise of Monte Carlo-estimated partial-solution correctness scores (rewards). To address these challenges, we propose FunPRM. FunPRM prompts LLMs to encourage modular code generation organized into functions, with functions treated as PRM reasoning steps. Furthermore, FunPRM introduces a novel meta-learning-based reward correction mechanism that leverages clean final-solution rewards obtained via a unit-test-based evaluation system to purify noisy partial-solution rewards. Experiments on LiveCodeBench and BigCodeBench demonstrate that FunPRM consistently outperforms existing test-time scaling methods across five base LLMs, notably achieving state-of-the-art performance on LiveCodeBench when combined with O4-mini. Furthermore, FunPRM produces code that is more readable and reusable for developers.


半/弱/无/有监督|不确定性|主动学习(7篇)

【1】Unsupervised Hierarchical Skill Discovery
标题:无监督分层技能发现
链接:https://arxiv.org/abs/2601.23156

作者:Damion Harvey,Geraud Nangue Tasse,Branden Ingram,Benjamin Rosman,Steven James
备注:24 pages, 34 figures. Appendix by Damion Harvey. Damion Harvey is the primary author
摘要:We consider the problem of unsupervised skill segmentation and hierarchical structure discovery in reinforcement learning. While recent approaches have sought to segment trajectories into reusable skills or options, most rely on action labels, rewards, or handcrafted annotations, limiting their applicability. We propose a method that segments unlabelled trajectories into skills and induces a hierarchical structure over them using a grammar-based approach. The resulting hierarchy captures both low-level behaviours and their composition into higher-level skills. We evaluate our approach in high-dimensional, pixel-based environments, including Craftax and the full, unmodified version of Minecraft. Using metrics for skill segmentation, reuse, and hierarchy quality, we find that our method consistently produces more structured and semantically meaningful hierarchies than existing baselines. Furthermore, as a proof of concept for utility, we demonstrate that these discovered hierarchies accelerate and stabilise learning on downstream reinforcement learning tasks.


【2】Uncertainty-Aware Extrapolation in Bayesian Oblique Trees
标题:贝叶斯斜树中的不确定性感知外推
链接:https://arxiv.org/abs/2601.22899

作者:Viktor Andonovikj,Sašo Džeroski,Pavle Boškoski
摘要:Decision trees are widely used due to their interpretability and efficiency, but they struggle in regression tasks that require reliable extrapolation and well-calibrated uncertainty. Piecewise-constant leaf predictions are bounded by the training targets and often become overconfident under distribution shift. We propose a single-tree Bayesian model that extends VSPYCT by equipping each leaf with a GP predictor. Bayesian oblique splits provide uncertainty-aware partitioning of the input space, while GP leaves model local functional behaviour and enable principled extrapolation beyond the observed target range. We present an efficient inference and prediction scheme that combines posterior sampling of split parameters with GP posterior predictions, and a gating mechanism that activates GP-based extrapolation when inputs fall outside the training support of a leaf. Experiments on benchmark regression tasks show improvements in the predictive performance compared to standard variational oblique trees, and substantial performance gains in extrapolation scenarios.


【3】User-Adaptive Meta-Learning for Cold-Start Medication Recommendation with Uncertainty Filtering
标题:具有不确定性过滤的冷启动药物推荐的用户自适应元学习
链接:https://arxiv.org/abs/2601.22820

作者:Arya Hadizadeh Moghaddam,Mohsen Nayebi Kerdabadi,Dongjie Wang,Mei Liu,Zijun Yao
备注:IEEE International Conference on Data Engineering (ICDE) 2026 accepted paper
摘要 :Large-scale Electronic Health Record (EHR) databases have become indispensable in supporting clinical decision-making through data-driven treatment recommendations. However, existing medication recommender methods often struggle with a user (i.e., patient) cold-start problem, where recommendations for new patients are usually unreliable due to the lack of sufficient prescription history for patient profiling. While prior studies have utilized medical knowledge graphs to connect medication concepts through pharmacological or chemical relationships, these methods primarily focus on mitigating the item cold-start issue and fall short in providing personalized recommendations that adapt to individual patient characteristics. Meta-learning has shown promise in handling new users with sparse interactions in recommender systems. However, its application to EHRs remains underexplored due to the unique sequential structure of EHR data. To tackle these challenges, we propose MetaDrug, a multi-level, uncertainty-aware meta-learning framework designed to address the patient cold-start problem in medication recommendation. MetaDrug proposes a novel two-level meta-adaptation mechanism, including self-adaptation, which adapts the model to new patients using their own medical events as support sets to capture temporal dependencies; and peer-adaptation, which adapts the model using similar visits from peer patients to enrich new patient representations. Meanwhile, to further improve meta-adaptation outcomes, we introduce an uncertainty quantification module that ranks the support visits and filters out the unrelated information for adaptation consistency. We evaluate our approach on the MIMIC-III and Acute Kidney Injury (AKI) datasets. Experimental results on both datasets demonstrate that MetaDrug consistently outperforms state-of-the-art medication recommendation methods on cold-start patients.


【4】Decomposing Epistemic Uncertainty for Causal Decision Making
标题:分解认识不确定性以进行因果决策
链接:https://arxiv.org/abs/2601.22736

作者:Md Musfiqur Rahman,Ziwei Jiang,Hilaf Hasson,Murat Kocaoglu
摘要:Causal inference from observational data provides strong evidence for the best action in decision-making without performing expensive randomized trials. The effect of an action is usually not identifiable under unobserved confounding, even with an infinite amount of data. Recent work uses neural networks to obtain practical bounds to such causal effects, which is often an intractable problem. However, these approaches may overfit to the dataset and be overconfident in their causal effect estimates. Moreover, there is currently no systematic approach to disentangle how much of the width of causal effect bounds is due to fundamental non-identifiability versus how much is due to finite-sample limitations. We propose a novel framework to address this problem by considering a confidence set around the empirical observational distribution and obtaining the intersection of causal effect bounds for all distributions in this confidence set. This allows us to distinguish the part of the interval that can be reduced by collecting more samples, which we call sample uncertainty, from the part that can only be reduced by observing more variables, such as latent confounders or instrumental variables, but not with more data, which we call non-ID uncertainty. The upper and lower bounds to this intersection are obtained by solving min-max and max-min problems with neural causal models by searching over all distributions that the dataset might have been sampled from, and all SCMs that entail the corresponding distribution. We demonstrate via extensive experiments on synthetic and real-world datasets that our algorithm can determine when collecting more samples will not help determine the best action. This can guide practitioners to collect more variables or lean towards a randomized study for best action identification.


【5】UCPO: Uncertainty-Aware Policy Optimization
标题:UCPO:不确定性感知的策略优化
链接:https://arxiv.org/abs/2601.22648

作者:Xianzhou Zeng,Jing Huang,Chunmei Xie,Gongrui Nan,Siye Chen,Mengyu Lu,Weiqi Xiong,Qixuan Zhou,Junhao Zhang,Qiang Zhu,Yadong Li,Xingzhong Xu
摘要:The key to building trustworthy Large Language Models (LLMs) lies in endowing them with inherent uncertainty expression capabilities to mitigate the hallucinations that restrict their high-stakes applications. However, existing RL paradigms such as GRPO often suffer from Advantage Bias due to binary decision spaces and static uncertainty rewards, inducing either excessive conservatism or overconfidence. To tackle this challenge, this paper unveils the root causes of reward hacking and overconfidence in current RL paradigms incorporating uncertainty-based rewards, based on which we propose the UnCertainty-Aware Policy Optimization (UCPO) framework. UCPO employs Ternary Advantage Decoupling to separate and independently normalize deterministic and uncertain rollouts, thereby eliminating advantage bias. Furthermore, a Dynamic Uncertainty Reward Adjustment mechanism is introduced to calibrate uncertainty weights in real-time according to model evolution and instance difficulty. Experimental results in mathematical reasoning and general tasks demonstrate that UCPO effectively resolves the reward imbalance, significantly improving the reliability and calibration of the model beyond their knowledge boundaries.
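
As a rough illustration of the ternary advantage decoupling idea described above (not UCPO's exact formulation), the sketch below normalizes deterministic and uncertainty-expressing rollouts in separate groups; the rewards and `is_uncertain` flags are made-up values.

# Toy sketch: normalize advantages separately per rollout type to avoid bias.
import numpy as np

rewards      = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])   # outcome rewards per rollout
is_uncertain = np.array([False, False, True, True, False, False])

advantages = np.zeros_like(rewards)
for mask in (is_uncertain, ~is_uncertain):
    group = rewards[mask]
    if group.size > 1 and group.std() > 0:
        advantages[mask] = (group - group.mean()) / group.std()
    else:
        advantages[mask] = group - group.mean()

print(advantages)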


【6】A Random Matrix Theory of Masked Self-Supervised Regression
标题:掩蔽自我监督回归的随机矩阵理论
链接:https://arxiv.org/abs/2601.23208

作者:Arie Wortsman Zurich,Federica Gerace,Bruno Loureiro,Yue M. Lu
摘要:In the era of transformer models, masked self-supervised learning (SSL) has become a foundational training paradigm. A defining feature of masked SSL is that training aggregates predictions across many masking patterns, giving rise to a joint, matrix-valued predictor rather than a single vector-valued estimator. This object encodes how coordinates condition on one another and poses new analytical challenges. We develop a precise high-dimensional analysis of masked modeling objectives in the proportional regime where the number of samples scales with the ambient dimension. Our results provide explicit expressions for the generalization error and characterize the spectral structure of the learned predictor, revealing how masked modeling extracts structure from data. For spiked covariance models, we show that the joint predictor undergoes a Baik--Ben Arous--Péché (BBP)-type phase transition, identifying when masked SSL begins to recover latent signals. Finally, we identify structured regimes in which masked self-supervised learning provably outperforms PCA, highlighting potential advantages of SSL objectives over classical unsupervised methods.


【7】Asymptotic Theory of Iterated Empirical Risk Minimization, with Applications to Active Learning
标题:迭代经验风险最小化的渐近理论及其在主动学习中的应用
链接:https://arxiv.org/abs/2601.23031

作者:Hugo Cui,Yue M. Lu
摘要 :We study a class of iterated empirical risk minimization (ERM) procedures in which two successive ERMs are performed on the same dataset, and the predictions of the first estimator enter as an argument in the loss function of the second. This setting, which arises naturally in active learning and reweighting schemes, introduces intricate statistical dependencies across samples and fundamentally distinguishes the problem from classical single-stage ERM analyses. For linear models trained with a broad class of convex losses on Gaussian mixture data, we derive a sharp asymptotic characterization of the test error in the high-dimensional regime where the sample size and ambient dimension scale proportionally. Our results provide explicit, fully asymptotic predictions for the performance of the second-stage estimator despite the reuse of data and the presence of prediction-dependent losses. We apply this theory to revisit a well-studied pool-based active learning problem, removing oracle and sample-splitting assumptions made in prior work. We uncover a fundamental tradeoff in how the labeling budget should be allocated across stages, and demonstrate a double-descent behavior of the test error driven purely by data selection, rather than model size or sample count.


迁移|Zero/Few/One-Shot|自适应(12篇)

【1】Why GRPO Needs Normalization: A Local-Curvature Perspective on Adaptive Gradients
标题:为什么GRPO需要归一化:自适应梯度的局部曲率视角
链接:https://arxiv.org/abs/2601.23135

作者:Cheng Ge,Caitlyn Heqi Yin,Hao Liang,Jiawei Zhang
摘要:Reinforcement learning (RL) has become a key driver of language model reasoning. Among RL algorithms, Group Relative Policy Optimization (GRPO) is the de facto standard, avoiding the need for a critic by using per-prompt baselines and variance normalization. Yet why and when this normalization helps remains unclear. In this work, we provide an explanation through the lens of local curvature of the sequence-level policy gradient: standard deviation normalization implements an adaptive gradient. Theoretically, under mild conditions, GRPO enjoys a strictly improved convergence rate over unnormalized REINFORCE, with gains characterized by the average within-prompt reward standard deviation across prompts and iterations. Empirically, our analysis on GSM8K and MATH benchmarks reveals three distinct training phases governed by the interplay between feature orthogonality and reward variance: (I) an early acceleration phase where high variance and orthogonality favor adaptive scaling; (II) a relatively stable transition phase; and (III) a late-stage regime where the loss of orthogonality limits further gains. Together, these results provide a principled account of when std normalization helps in GRPO, and offer broader insights into the design of critic-free RL algorithms.


【2】ExplainerPFN: Towards tabular foundation models for model-free zero-shot feature importance estimations
标题:ExplainerPFN:面向无模型零样本特征重要性估计的表格基础模型
链接:https://arxiv.org/abs/2601.23068

作者:Joao Fonseca,Julia Stoyanovich
备注:18 pages, 7 figures
摘要:Computing the importance of features in supervised classification tasks is critical for model interpretability. Shapley values are a widely used approach for explaining model predictions, but require direct access to the underlying model, an assumption frequently violated in real-world deployments. Further, even when model access is possible, their exact computation may be prohibitively expensive. We investigate whether meaningful Shapley value estimations can be obtained in a zero-shot setting, using only the input data distribution and no evaluations of the target model. To this end, we introduce ExplainerPFN, a tabular foundation model built on TabPFN that is pretrained on synthetic datasets generated from random structural causal models and supervised using exact or near-exact Shapley values. Once trained, ExplainerPFN predicts feature attributions for unseen tabular datasets without model access, gradients, or example explanations.   Our contributions are fourfold: (1) we show that few-shot learning-based explanations can achieve high fidelity to SHAP values with as few as two reference observations; (2) we propose ExplainerPFN, the first zero-shot method for estimating Shapley values without access to the underlying model or reference explanations; (3) we provide an open-source implementation of ExplainerPFN, including the full training pipeline and synthetic data generator; and (4) through extensive experiments on real and synthetic datasets, we show that ExplainerPFN achieves performance competitive with few-shot surrogate explainers that rely on 2-10 SHAP examples.


【3】Avoiding Premature Collapse: Adaptive Annealing for Entropy-Regularized Structural Inference
标题:避免过早崩溃:用于熵正则化结构推理的自适应退火
链接:https://arxiv.org/abs/2601.23039

作者:Yizhi Liu
摘要:Differentiable matching layers, often implemented via entropy-regularized Optimal Transport, serve as a critical approximate inference mechanism in structural prediction. However, recovering discrete permutations via annealing $\varepsilon \to 0$ is notoriously unstable. We identify a fundamental mechanism for this failure: \textbf{Premature Mode Collapse}. By analyzing the non-normal dynamics of the Sinkhorn fixed-point map, we reveal a theoretical \textbf{thermodynamic speed limit}. Under standard exponential cooling, the shift in the target posterior ($O(1)$) outpaces the contraction rate of the inference operator, which degrades as $O(1/\varepsilon)$. This mismatch inevitably forces the inference trajectory into spurious local basins. To address this, we propose \textbf{Efficient PH-ASC}, an adaptive scheduling algorithm that monitors the stability of the inference process. By enforcing a linear stability law, we decouple expensive spectral diagnostics from the training loop, reducing overhead from $O(N^3)$ to amortized $O(1)$. Our implementation and interactive demo are available at https://github.com/xxx0438/torch-sinkhorn-asc and https://huggingface.co/spaces/leon0923/torch-sinkhorn-asc-demo.
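
A minimal entropic-OT annealing loop in the spirit of the abstract, assuming a toy cost matrix: the temperature is cooled quickly only when the transport plan has stabilized at the current epsilon. This is a generic stability proxy, not the paper's spectral PH-ASC criterion, and log-domain stabilization would be needed to push epsilon much lower.

# Sketch: anneal the Sinkhorn temperature adaptively based on plan drift.
import numpy as np

rng = np.random.default_rng(0)
C = rng.random((6, 6))                       # toy cost matrix
a = np.full(6, 1 / 6)
b = np.full(6, 1 / 6)

def sinkhorn(C, a, b, eps, iters=200):
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

eps = 1.0
P_prev = sinkhorn(C, a, b, eps)
while eps > 0.05:                            # stop early; see note above
    P = sinkhorn(C, a, b, eps)
    drift = np.abs(P - P_prev).sum()
    eps *= 0.5 if drift < 1e-2 else 0.9      # cool fast only when the plan is stable
    P_prev = P

print("near-discrete assignment:", P.argmax(axis=1))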


【4】FlexLoRA: Entropy-Guided Flexible Low-Rank Adaptation
标题:FlexLoRA:熵引导的灵活低秩适应
链接:https://arxiv.org/abs/2601.22905

作者:Muqing Liu,Chongjie Si,Yuheng Jia
备注:2026 ICLR. Codes in https://github.com/Chongjie-Si/Subspace-Tuning
摘要 :Large pre-trained models achieve remarkable success across diverse domains, yet full fine-tuning incurs prohibitive computational and memory costs. Parameter-efficient fine-tuning (PEFT) has thus become a mainstream paradigm. Among them, Low-Rank Adaptation (LoRA) introduces trainable low-rank matrices and shows strong performance; nevertheless, its fixed-rank design limits flexibility. Dynamic rank allocation methods mitigate this issue by pruning redundant directions; however, they often rely on heuristic, element-level metrics that globally sort rank directions without matrix-wise distinction, and they lack mechanisms to expand capacity in layers requiring additional adaptation. To overcome these limitations, we propose FlexLoRA, an entropy-guided flexible low-rank adaptation framework that (i) evaluates matrix importance via spectral energy entropy, (ii) supports rank pruning and expansion under a global budget, and (iii) employs zero-impact initialization for newly added singular directions to ensure stability. By addressing granularity, flexibility, and stability limitations, FlexLoRA provides a more principled solution for PEFT. Extensive experiments show that FlexLoRA consistently outperforms state-of-the-art baselines across benchmarks. Codes are available at https://github.com/Chongjie-Si/Subspace-Tuning.
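
A small sketch of the matrix-level score named in the abstract, spectral energy entropy, computed on a toy LoRA update; the shapes, the rank, and the comparison against the maximum entropy are illustrative rather than FlexLoRA's actual allocation rule.

# Sketch: entropy of the normalized singular-value energy of a LoRA update B @ A.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 64))      # LoRA down-projection (rank 8)
B = rng.normal(size=(64, 8))      # LoRA up-projection

def spectral_energy_entropy(delta_w):
    s = np.linalg.svd(delta_w, compute_uv=False)
    p = s ** 2 / np.sum(s ** 2)                  # normalized spectral energy
    return -np.sum(p * np.log(p + 1e-12))

score = spectral_energy_entropy(B @ A)
print(f"entropy-based importance: {score:.3f} (max {np.log(8):.3f} for rank 8)")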


【5】SQUAD: Scalable Quorum Adaptive Decisions via ensemble of early exit neural networks
标题:SQUAD:通过早期退出神经网络集成的可扩展法定人数自适应决策
链接:https://arxiv.org/abs/2601.22711

作者:Matteo Gambella,Fabrizio Pittorino,Giuliano Casale,Manuel Roveri
摘要:Early-exit neural networks have become popular for reducing inference latency by allowing intermediate predictions when sufficient confidence is achieved. However, standard approaches typically rely on single-model confidence thresholds, which are frequently unreliable due to inherent calibration issues. To address this, we introduce SQUAD (Scalable Quorum Adaptive Decisions), the first inference scheme that integrates early-exit mechanisms with distributed ensemble learning, improving uncertainty estimation while reducing the inference time. Unlike traditional methods that depend on individual confidence scores, SQUAD employs a quorum-based stopping criterion on early-exit learners by collecting intermediate predictions incrementally in order of computational complexity until a consensus is reached and halting the computation at that exit if the consensus is statistically significant. To maximize the efficacy of this voting mechanism, we also introduce QUEST (Quorum Search Technique), a Neural Architecture Search method to select early-exit learners with optimized hierarchical diversity, ensuring learners are complementary at every intermediate layer. This consensus-driven approach yields statistically robust early exits, improving the test accuracy up to 5.95% compared to state-of-the-art dynamic solutions with a comparable computational cost and reducing the inference latency up to 70.60% compared to static ensembles while maintaining a good accuracy.
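
A toy version of the quorum-based stopping rule described above: learners are queried from cheapest to most expensive and computation halts once `quorum` of them agree. The mocked string predictions and the absence of SQUAD's statistical-significance test make this a sketch, not the actual algorithm.

# Sketch: stop querying ensemble members once a quorum of votes agrees.
from collections import Counter

def quorum_predict(learner_preds, quorum=3):
    votes = Counter()
    for i, pred in enumerate(learner_preds):       # cheapest learner first
        votes[pred] += 1
        label, count = votes.most_common(1)[0]
        if count >= quorum:
            return label, i + 1                    # class, learners actually used
    return votes.most_common(1)[0][0], len(learner_preds)

label, used = quorum_predict(["cat", "cat", "dog", "cat", "cat"], quorum=3)
print(label, "after", used, "learners")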


【6】DART-ing Through the Drift: Dynamic Tracing of Knowledge Neurons for Adaptive Inference-Time Pruning
标题:DART穿越漂移:动态跟踪知识神经元以实现自适应推理时修剪
链接:https://arxiv.org/abs/2601.22632

作者:Abhishek Tyagi,Yunuo Cen,Shrey Dhorajiya,Bharadwaj Veeravalli,Xuanyao Fong
摘要:Large Language Models (LLMs) exhibit substantial parameter redundancy, particularly in Feed-Forward Networks (FFNs). Existing pruning methods suffer from two primary limitations. First, reliance on dataset-specific calibration introduces significant data dependency and computational overhead. Second, being predominantly static, they fail to account for the evolving subset of knowledge neurons in LLMs during autoregressive generation as the context evolves. To address this, we introduce DART (Dynamic Attention-Guided Runtime Tracing), a lightweight, training-free method that performs on-the-fly context-based pruning. DART monitors shifts in attention score distributions to infer context changes, dynamically updating neuron-level masks to retain salient parameters. Across ten benchmarks, DART outperforms prior dynamic baselines, achieving accuracy gains of up to 14.5% on LLAMA-3.1-8B at 70% FFN sparsity. Furthermore, DART achieves up to 3x better ROUGE-L scores with respect to static-masked pruning on summarization tasks, with its performance comparable to the original dense models. We conclusively demonstrate that the proposed framework effectively adapts to diverse semantic contexts and preserves model capabilities across both general and domain-specific tasks, while requiring less than 10 MB of memory for LLAMA-3.1-8B (16 GB) with 0.1% FLOPs overhead. The code is available at https://github.com/seeder-research/DART.
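
A minimal sketch of detecting a context shift from successive attention-score distributions, here using a Jensen-Shannon divergence with a hypothetical threshold; DART's actual tracing signal and neuron-mask update are not reproduced.

# Sketch: flag a context change when successive attention distributions diverge.
import numpy as np

def js_divergence(p, q, eps=1e-12):
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

prev_attn = np.array([0.70, 0.20, 0.05, 0.05])   # attention over context tokens
curr_attn = np.array([0.10, 0.15, 0.40, 0.35])

if js_divergence(prev_attn, curr_attn) > 0.1:    # hypothetical threshold
    print("context shift detected -> recompute neuron-level mask for this segment")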


【7】Cross-Domain Few-Shot Learning for Hyperspectral Image Classification Based on Mixup Foundation Model
标题:基于Mixup Foundation模型的跨域Few-Shot学习高光谱图像分类
链接:https://arxiv.org/abs/2601.22581

作者:Naeem Paeedeh,Mahardhika Pratama,Ary Shiddiqi,Zehong Cao,Mukesh Prasad,Wisnu Jatmiko
摘要:Although cross-domain few-shot learning (CDFSL) for hyperspectral image (HSI) classification has attracted significant research interest, existing works often rely on an unrealistic data augmentation procedure in the form of external noise to enlarge the sample size, thus greatly simplifying the issue of data scarcity. They involve a large number of parameters for model updates, being prone to the overfitting problem. To the best of our knowledge, none has explored the strength of the foundation model, having strong generalization power to be quickly adapted to downstream tasks. This paper proposes the MIxup FOundation MOdel (MIFOMO) for CDFSL of HSI classification. MIFOMO is built upon the concept of a remote sensing (RS) foundation model, pre-trained across a large scale of RS problems, thus featuring generalizable features. The notion of coalescent projection (CP) is introduced to quickly adapt the foundation model to downstream tasks while freezing the backbone network. The concept of mixup domain adaptation (MDM) is proposed to address the extreme domain discrepancy problem. Last but not least, the label smoothing concept is implemented to cope with noisy pseudo-label problems. Our rigorous experiments demonstrate the advantage of MIFOMO, where it beats prior arts by up to a 14% margin. The source code of MIFOMO is open-sourced at https://github.com/Naeem-Paeedeh/MIFOMO for reproducibility and convenient further study.


【8】Scalable Batch Correction for Cell Painting via Batch-Dependent Kernels and Adaptive Sampling
标题:通过批次相关核与自适应采样实现Cell Painting的可扩展批次校正
链接:https://arxiv.org/abs/2601.22331

作者:Aditya Narayan Ravi,Snehal Vadvalkar,Abhishek Pandey,Ilan Shomorony
备注:40 pages, many figures
摘要 :Cell Painting is a microscopy-based, high-content imaging assay that produces rich morphological profiles of cells and can support drug discovery by quantifying cellular responses to chemical perturbations. At scale, however, Cell Painting data is strongly affected by batch effects arising from differences in laboratories, instruments, and protocols, which can obscure biological signal. We present BALANS (Batch Alignment via Local Affinities and Subsampling), a scalable batch-correction method that aligns samples across batches by constructing a smoothed affinity matrix from pairwise distances. Given $n$ data points, BALANS builds a sparse affinity matrix $A \in \mathbb{R}^{n \times n}$ using two ideas. (i) For points $i$ and $j$, it sets a local scale using the distance from $i$ to its $k$-th nearest neighbor within the batch of $j$, then computes $A_{ij}$ via a Gaussian kernel calibrated by these batch-aware local scales. (ii) Rather than forming all $n^2$ entries, BALANS uses an adaptive sampling procedure that prioritizes rows with low cumulative neighbor coverage and retains only the strongest affinities per row, yielding a sparse but informative approximation of $A$. We prove that this sampling strategy is order-optimal in sample complexity and provides an approximation guarantee, and we show that BALANS runs in nearly linear time in $n$. Experiments on diverse real-world Cell Painting datasets and controlled large-scale synthetic benchmarks demonstrate that BALANS scales to large collections while improving runtime over native implementations of widely used batch-correction methods, without sacrificing correction quality.
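
A dense O(n^2) sketch of the batch-aware local scale described in the abstract: for pair (i, j), the scale is the distance from i to its k-th nearest neighbour within the batch of j. The adaptive subsampling and sparsification steps are omitted, and the exact kernel calibration may differ from the paper's.

# Sketch: Gaussian affinities with batch-dependent local scales (dense version).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))
batch = np.array([0] * 6 + [1] * 6)
k = 3

D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
A = np.zeros_like(D)
for i in range(len(X)):
    for j in range(len(X)):
        within = np.sort(D[i, batch == batch[j]])        # distances from i into j's batch
        sigma = within[min(k, len(within) - 1)]          # k-th nearest neighbour scale
        A[i, j] = np.exp(-D[i, j] ** 2 / (2 * sigma ** 2 + 1e-12))

print(A.round(2))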


【9】Conformal Prediction for Generative Models via Adaptive Cluster-Based Density Estimation
标题:通过自适应基于集群的密度估计实现生成模型的保形预测
链接:https://arxiv.org/abs/2601.22298

作者:Qidong Yang,Qianyu Julie Zhu,Jonathan Giezendanner,Youssef Marzouk,Stephen Bates,Sherrie Wang
摘要:Conditional generative models map input variables to complex, high-dimensional distributions, enabling realistic sample generation in a diverse set of domains. A critical challenge with these models is the absence of calibrated uncertainty, which undermines trust in individual outputs for high-stakes applications. To address this issue, we propose a systematic conformal prediction approach tailored to conditional generative models, leveraging density estimation on model-generated samples. We introduce a novel method called CP4Gen, which utilizes clustering-based density estimation to construct prediction sets that are less sensitive to outliers, more interpretable, and of lower structural complexity than existing methods. Extensive experiments on synthetic datasets and real-world applications, including climate emulation tasks, demonstrate that CP4Gen consistently achieves superior performance in terms of prediction set volume and structural simplicity. Our approach offers practitioners a powerful tool for uncertainty estimation associated with conditional generative models, particularly in scenarios demanding rigorous and interpretable prediction sets.
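
A generic split-conformal recipe over a density score on model samples, with a Gaussian KDE standing in for the paper's clustering-based density estimator; the 1-D setup, the variable names, and the omission of finite-sample corrections are simplifications rather than CP4Gen itself.

# Sketch: prediction set = generated samples whose estimated density exceeds a
# threshold calibrated on held-out ground-truth outcomes.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
generated = rng.normal(0.0, 1.0, size=500)          # samples from a generative model
calibration_truth = rng.normal(0.0, 1.0, size=200)  # held-out true outcomes
alpha = 0.1

kde = gaussian_kde(generated)                       # density estimate on samples
threshold = np.quantile(kde(calibration_truth), alpha)   # finite-sample correction omitted

prediction_set = generated[kde(generated) >= threshold]
print(f"kept {len(prediction_set)}/{len(generated)} samples in the prediction set")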


【10】Task-Uniform Convergence and Backward Transfer in Federated Domain-Incremental Learning with Partial Participation
标题:部分参与下联邦域增量学习中的任务一致收敛与后向迁移
链接:https://arxiv.org/abs/2601.22274

作者:Longtao Xu,Jian Li
摘要:Real-world federated systems seldom operate on static data: input distributions drift while privacy rules forbid raw-data sharing. We study this setting as Federated Domain-Incremental Learning (FDIL), where (i) clients are heterogeneous, (ii) tasks arrive sequentially with shifting domains, yet (iii) the label space remains fixed. Two theoretical pillars remain missing for FDIL under realistic deployment: a guarantee of backward knowledge transfer (BKT) and a convergence rate that holds across the sequence of all tasks with partial participation. We introduce SPECIAL (Server-Proximal Efficient Continual Aggregation for Learning), a simple, memory-free FDIL algorithm that adds a single server-side "anchor" to vanilla FedAvg: in each round, the server nudges the update from the uniformly sampled participating clients toward the previous global model with a lightweight proximal term. This anchor curbs cumulative drift without replay buffers, synthetic data, or task-specific heads, keeping communication and model size unchanged. Our theory shows that SPECIAL (i) preserves earlier tasks: a BKT bound caps any increase in prior-task loss by a drift-controlled term that shrinks with more rounds, local epochs, and participating clients; and (ii) learns efficiently across all tasks: the first communication-efficient non-convex convergence rate for FDIL with partial participation, O((E/NT)^(1/2)), with E local epochs, T communication rounds, and N participating clients per round, matching single-task FedAvg while explicitly separating optimization variance from inter-task drift. Experimental results further demonstrate the effectiveness of SPECIAL.
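
A minimal sketch of the server-side anchor as described: average the participating clients' models, then pull the result toward the previous global model with a proximal weight mu. The value of mu and the vector shapes are illustrative, not the paper's exact update.

# Sketch: FedAvg aggregation followed by a proximal pull toward the previous global model.
import numpy as np

def special_aggregate(client_weights, prev_global, mu=0.1):
    fedavg = np.mean(client_weights, axis=0)            # vanilla FedAvg step
    # Minimizer of ||w - fedavg||^2 + mu * ||w - prev_global||^2.
    return (fedavg + mu * prev_global) / (1.0 + mu)

prev_global = np.zeros(4)
clients = [np.array([1.0, 0.5, -0.2, 0.3]), np.array([0.8, 0.7, 0.1, -0.1])]
print(special_aggregate(clients, prev_global))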


【11】FedAdaVR: Adaptive Variance Reduction for Robust Federated Learning under Limited Client Participation
标题:FedAdaVR:在有限客户参与下实现稳健联邦学习的自适应方差减少
链接:https://arxiv.org/abs/2601.22204

作者:S M Ruhul Kabir Howlader,Xiao Chen,Yifei Xie,Lu Liu
摘要:Federated learning (FL) encounters substantial challenges due to heterogeneity, leading to gradient noise, client drift, and partial client participation errors, the last of which is the most pervasive but remains insufficiently addressed in current literature. In this paper, we propose FedAdaVR, a novel FL algorithm aimed at solving heterogeneity issues caused by sporadic client participation by incorporating an adaptive optimiser with a variance reduction technique. This method takes advantage of the most recent stored updates from clients, even when they are absent from the current training round, thereby emulating their presence. Furthermore, we propose FedAdaVR-Quant, which stores client updates in quantised form, significantly reducing the memory requirements (by 50%, 75%, and 87.5%) of FedAdaVR while maintaining equivalent model performance. We analyse the convergence behaviour of FedAdaVR under general nonconvex conditions and prove that our proposed algorithm can eliminate partial client participation error. Extensive experiments conducted on multiple datasets, under both independent and identically distributed (IID) and non-IID settings, demonstrate that FedAdaVR consistently outperforms state-of-the-art baseline methods.


【12】Adaptive Benign Overfitting (ABO): Overparameterized RLS for Online Learning in Non-stationary Time-series
标题:自适应良性过拟合(ABO):用于非平稳时间序列在线学习的过参数化RLS
链接:https://arxiv.org/abs/2601.22200

作者:Luis Ontaneda Mijares,Nick Firoozye
备注:32 pages, 3 figures, 10 tables
摘要 :Overparameterized models have recently challenged conventional learning theory by exhibiting improved generalization beyond the interpolation limit, a phenomenon known as benign overfitting. This work introduces Adaptive Benign Overfitting (ABO), extending the recursive least-squares (RLS) framework to this regime through a numerically stable formulation based on orthogonal-triangular updates. A QR-based exponentially weighted RLS (QR-EWRLS) algorithm is introduced, combining random Fourier feature mappings with forgetting-factor regularization to enable online adaptation under non-stationary conditions. The orthogonal decomposition prevents the numerical divergence associated with covariance-form RLS while retaining adaptability to evolving data distributions. Experiments on nonlinear synthetic time series confirm that the proposed approach maintains bounded residuals and stable condition numbers while reproducing the double-descent behavior characteristic of overparameterized models. Applications to forecasting foreign exchange and electricity demand show that ABO is highly accurate (comparable to baseline kernel methods) while achieving speed improvements of between 20 and 40 percent. The results provide a unified view linking adaptive filtering, kernel approximation, and benign overfitting within a stable online learning framework.
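
A compact sketch of exponentially weighted RLS in square-root (QR) form on a toy linear model: the triangular factor and right-hand side are re-triangularized at each step instead of propagating a covariance matrix, which is the kind of numerically stable update the abstract describes. The random Fourier feature map and the benign-overfitting analysis are not reproduced; `lam` and the dimensions are illustrative.

# Sketch: square-root (QR) exponentially weighted recursive least squares.
import numpy as np

rng = np.random.default_rng(0)
p, lam = 3, 0.99                       # feature dimension, forgetting factor
R = 1e-3 * np.eye(p)                   # upper-triangular factor (R^T R ~ information matrix)
q = np.zeros(p)
theta_true = np.array([0.5, -1.0, 2.0])

for t in range(500):
    x = rng.normal(size=p)
    y = x @ theta_true + 0.05 * rng.normal()
    A = np.vstack([np.sqrt(lam) * R, x[None, :]])        # weighted stacked system
    b = np.concatenate([np.sqrt(lam) * q, [y]])
    Q, R = np.linalg.qr(A)                                # re-triangularize
    q = Q.T @ b

theta = np.linalg.solve(R, q)
print("estimate:", theta.round(3))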


强化学习(13篇)

【1】Agile Reinforcement Learning through Separable Neural Architecture
标题:通过可分离神经架构的敏捷强化学习
链接:https://arxiv.org/abs/2601.23225

作者:Rajib Mostakim,Reza T. Batley,Sourav Saha
摘要:Deep reinforcement learning (RL) is increasingly deployed in resource-constrained environments, yet the go-to function approximators - multilayer perceptrons (MLPs) - are often parameter-inefficient due to an imperfect inductive bias for the smooth structure of many value functions. This mismatch can also hinder sample efficiency and slow policy learning in this capacity-limited regime. Although model compression techniques exist, they operate post-hoc and do not improve learning efficiency. Recent spline-based separable architectures - such as Kolmogorov-Arnold Networks (KANs) - have been shown to offer parameter efficiency but are widely reported to exhibit significant computational overhead, especially at scale.   In seeking to address these limitations, this work introduces SPAN (SPline-based Adaptive Networks), a novel function approximation approach to RL. SPAN adapts the low rank KHRONOS framework by integrating a learnable preprocessing layer with a separable tensor product B-spline basis. SPAN is evaluated across discrete (PPO) and high-dimensional continuous (SAC) control tasks, as well as offline settings (Minari/D4RL). Empirical results demonstrate that SPAN achieves a 30-50% improvement in sample efficiency and 1.3-9 times higher success rates across benchmarks compared to MLP baselines. Furthermore, SPAN demonstrates superior anytime performance and robustness to hyperparameter variations, suggesting it as a viable, high performance alternative for learning intrinsically efficient policies in resource-limited settings.


【2】On Safer Reinforcement Learning Policies for Sedation and Analgesia in Intensive Care
标题:关于重症监护中镇静和镇痛的更安全的强化学习策略
链接:https://arxiv.org/abs/2601.23154

作者:Joel Romero-Hernandez,Oscar Camara
备注:Submitted to the 48th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (IEEE EMBC 2026)
摘要:Pain management in intensive care usually involves complex trade-offs between therapeutic goals and patient safety, since both inadequate and excessive treatment may induce serious sequelae. Reinforcement learning can help address this challenge by learning medication dosing policies from retrospective data. However, prior work on sedation and analgesia has optimized for objectives that do not value patient survival while relying on algorithms unsuitable for imperfect information settings. We investigated the risks of these design choices by implementing a deep reinforcement learning framework to suggest hourly medication doses under partial observability. Using data from 47,144 ICU stays in the MIMIC-IV database, we trained policies to prescribe opioids, propofol, benzodiazepines, and dexmedetomidine according to two goals: reduce pain or jointly reduce pain and mortality. We found that, although the two policies were associated with lower pain, actions from the first policy were positively correlated with mortality, while those proposed by the second policy were negatively correlated. This suggests that valuing long-term outcomes could be critical for safer treatment policies, even if a short-term goal remains the primary objective.


【3】RN-D: Discretized Categorical Actors with Regularized Networks for On-Policy Reinforcement Learning
标题:RN-D:用于同策略强化学习的带正则化网络的离散化类别Actor
链接:https://arxiv.org/abs/2601.23075

作者:Yuexin Bian,Jie Feng,Tao Wang,Yijiang Li,Sicun Gao,Yuanyuan Shi
摘要:On-policy deep reinforcement learning remains a dominant paradigm for continuous control, yet standard implementations rely on Gaussian actors and relatively shallow MLP policies, often leading to brittle optimization when gradients are noisy and policy updates must be conservative. In this paper, we revisit policy representation as a first-class design choice for on-policy optimization. We study discretized categorical actors that represent each action dimension with a distribution over bins, yielding a policy objective that resembles a cross-entropy loss. Building on architectural advances from supervised learning, we further propose regularized actor networks, while keeping critic design fixed. Our results show that simply replacing the standard actor network with our discretized regularized actor yields consistent gains and achieve the state-of-the-art performance across diverse continuous-control benchmarks.


【4】From Absolute to Relative: Rethinking Reward Shaping in Group-Based Reinforcement Learning
标题:从绝对到相对:重新思考基于群体的强化学习中的奖励塑造
链接:https://arxiv.org/abs/2601.23058

作者:Wenzhe Niu,Wei He,Zongxia Xie,Jinpeng Ou,Huichuan Fan,Yuchen Ge,Yanru Sun,Ziyin Wang,Yizhao Sun,Chengshun Shi,Jiuchong Gao,Jinghua Hao,Renqing He
摘要:Reinforcement learning has become a cornerstone for enhancing the reasoning capabilities of Large Language Models, where group-based approaches such as GRPO have emerged as efficient paradigms that optimize policies by leveraging intra-group performance differences. However, these methods typically rely on absolute numerical rewards, introducing intrinsic limitations. In verifiable tasks, identical group evaluations often result in sparse supervision, while in open-ended scenarios, the score range instability of reward models undermines advantage estimation based on group means. To address these limitations, we propose Reinforcement Learning with Relative Rewards (RLRR), a framework that shifts reward shaping from absolute scoring to relative ranking. Complementing this framework, we introduce the Ranking Reward Model, a listwise preference model tailored for group-based optimization to directly generate relative rankings. By transforming raw evaluations into robust relative signals, RLRR effectively mitigates signal sparsity and reward instability. Experimental results demonstrate that RLRR yields consistent performance improvements over standard group-based baselines across reasoning benchmarks and open-ended generation tasks.
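
A toy conversion of raw group scores into rank-based relative rewards, the shift from absolute scoring to relative ranking described above; the scores are invented and the listwise Ranking Reward Model itself is not reproduced.

# Sketch: replace absolute reward-model scores with within-group ranks.
import numpy as np

raw_scores = np.array([7.2, 7.9, 3.1, 7.5])       # reward-model scores for one group
ranks = raw_scores.argsort().argsort()             # 0 = worst, G-1 = best
relative_rewards = ranks / (len(ranks) - 1)         # map ranks into [0, 1]
advantages = relative_rewards - relative_rewards.mean()
print(relative_rewards, advantages)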


【5】Automatic Constraint Policy Optimization based on Continuous Constraint Interpolation Framework for Offline Reinforcement Learning
标题:基于连续约束插值框架的离线强化学习自动约束策略优化
链接:https://arxiv.org/abs/2601.23010

作者:Xinchen Han,Qiuyang Fang,Hossam Afifi,Michel Marot
摘要:Offline Reinforcement Learning (RL) relies on policy constraints to mitigate extrapolation error, where both the constraint form and constraint strength critically shape performance. However, most existing methods commit to a single constraint family: weighted behavior cloning, density regularization, or support constraints, without a unified principle that explains their connections or trade-offs. In this work, we propose Continuous Constraint Interpolation (CCI), a unified optimization framework in which these three constraint families arise as special cases along a common constraint spectrum. The CCI framework introduces a single interpolation parameter that enables smooth transitions and principled combinations across constraint types. Building on CCI, we develop Automatic Constraint Policy Optimization (ACPO), a practical primal--dual algorithm that adapts the interpolation parameter via a Lagrangian dual update. Moreover, we establish a maximum-entropy performance difference lemma and derive performance lower bounds for both the closed-form optimal policy and its parametric projection. Experiments on D4RL and NeoRL2 demonstrate robust gains across diverse domains, achieving state-of-the-art performance overall.


【6】MTDrive: Multi-turn Interactive Reinforcement Learning for Autonomous Driving
标题:MTDrive:用于自动驾驶的多轮交互强化学习
链接:https://arxiv.org/abs/2601.22930

作者:Xidong Li,Mingyu Guo,Chenchao Xu,Bailin Li,Wenjing Zhu,Yangang Zou,Rui Chen,Zehuan Wang
摘要:Trajectory planning is a core task in autonomous driving, requiring the prediction of safe and comfortable paths across diverse scenarios. Integrating Multi-modal Large Language Models (MLLMs) with Reinforcement Learning (RL) has shown promise in addressing "long-tail" scenarios. However, existing methods are constrained to single-turn reasoning, limiting their ability to handle complex tasks requiring iterative refinement. To overcome this limitation, we present MTDrive, a multi-turn framework that enables MLLMs to iteratively refine trajectories based on environmental feedback. MTDrive introduces Multi-Turn Group Relative Policy Optimization (mtGRPO), which mitigates reward sparsity by computing relative advantages across turns. We further construct an interactive trajectory understanding dataset from closed-loop simulation to support multi-turn training. Experiments on the NAVSIM benchmark demonstrate superior performance compared to existing methods, validating the effectiveness of our multi-turn reasoning paradigm. Additionally, we implement system-level optimizations to reduce data transfer overhead caused by high-resolution images and multi-turn sequences, achieving 2.5x training throughput. Our data, models, and code will be made available soon.


【7】Offline Reinforcement Learning of High-Quality Behaviors Under Robust Style Alignment
标题:稳健风格对齐下高质量行为的离线强化学习
链接:https://arxiv.org/abs/2601.22823

作者:Mathieu Petitbois,Rémy Portelas,Sylvain Lamprier
摘要:We study offline reinforcement learning of style-conditioned policies using explicit style supervision via subtrajectory labeling functions. In this setting, aligning style with high task performance is particularly challenging due to distribution shift and inherent conflicts between style and reward. Existing methods, despite introducing numerous definitions of style, often fail to reconcile these objectives effectively. To address these challenges, we propose a unified definition of behavior style and instantiate it into a practical framework. Building on this, we introduce Style-Conditioned Implicit Q-Learning (SCIQL), which leverages offline goal-conditioned RL techniques, such as hindsight relabeling and value learning, and combine it with a new Gated Advantage Weighted Regression mechanism to efficiently optimize task performance while preserving style alignment. Experiments demonstrate that SCIQL achieves superior performance on both objectives compared to prior offline methods. Code, datasets and visuals are available in: https://sciql-iclr-2026.github.io/.


【8】MC-GRPO: Median-Centered Group Relative Policy Optimization for Small-Rollout Reinforcement Learning
标题:MC-GRPO:面向少量rollout强化学习的以中位数为中心的组相对策略优化
链接:https://arxiv.org/abs/2601.22582

作者:Youngeun Kim
摘要 :Group-relative policy optimization methods train language models by generating multiple rollouts per prompt and normalizing rewards with a shared mean reward baseline. In resource-constrained settings where the rollout budget is small, accuracy often degrades. We find that noise in the shared baseline induces advantage sign flips, where some rollouts receive an incorrect advantage sign, and the update direction is reversed. To address this, we propose Median-Centered Group Relative Policy Optimization (MC-GRPO), a simple and effective solution for small-rollout training. Our main idea is to replace the mean baseline with a median baseline: the median is far less sensitive to outlier rewards than the mean, mitigating the sign flips under small rollout size (G). We generate one additional rollout for median reference (G+1), and compute advantages using the group median. With an odd-sized group, exactly one completion is the median and receives zero advantage; we exclude this pivot rollout from backpropagation so the number of gradient-contributing samples per prompt remains G, preserving the core update cost of standard G-rollout training. Across various GRPO-family methods and a wide range of models and scales, this median-centered training consistently improves stability and final accuracy in the low-rollout regime, reducing the gap between G=2 and G=8 to within 1%. Code is available at https://github.com/lotusroot-kim/MC-GRPO
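
A toy median-centered advantage computation matching the recipe in the abstract: G+1 rollouts, rewards centred on the group median, and the single pivot rollout (the one whose reward sits at the median, ties broken by index) excluded from gradient updates; the binary rewards are made up.

# Sketch: median baseline with pivot exclusion for small rollout budgets.
import numpy as np

rewards = np.array([1.0, 0.0, 0.0, 1.0, 1.0])     # G+1 = 5 rollouts for one prompt
median = np.median(rewards)
advantages = rewards - median
pivot = int(np.where(rewards == median)[0][0])     # one rollout serves as the median pivot
train_idx = [i for i in range(len(rewards)) if i != pivot]
print("advantages:", advantages, "| rollouts used for gradients:", train_idx)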


【9】Demystifying Design Choices of Reinforcement Fine-tuning: A Batched Contextual Bandit Learning Perspective
标题:揭秘强化微调的设计选择:批量上下文Bandit学习视角
链接:https://arxiv.org/abs/2601.22532

作者:Hong Xie,Xiao Hu,Tao Tan,Haoran Gu,Xin Li,Jianyu Han,Defu Lian,Enhong Chen
摘要:The reinforcement fine-tuning area is undergoing an explosion of papers, largely on optimizing design choices. Though performance gains are often claimed, inconsistent conclusions also arise from time to time, making progress appear illusory. Reflecting on this illusion, we still lack principled answers to two fundamental questions: 1) what is the role of each design choice? 2) which ones are critical? This paper aims to shed light on them. The underlying challenge is that design choices are entangled together, making their contribution to learning and generalization difficult to attribute. To address this challenge, we first construct a minimalist baseline for disentangling factors: one rollout per query in each round, the outcome reward serving as the training signal without any advantage trick, and a batch size of thirty-two. This baseline connects to batched contextual bandit learning, which facilitates experimental analysis. Centering around this baseline, we design an experiment pipeline, examining the marginal gains of factors like advantage, number of rollouts, etc. Experiments on three base models and two datasets not only reveal new understanding of the role of various design choices in learning and generalization dynamics, but also identify critical ones that deserve more effort.


【10】Continual Policy Distillation from Distributed Reinforcement Learning Teachers
标题:来自分布式强化学习教师的持续策略蒸馏
链接:https://arxiv.org/abs/2601.22475

作者:Yuxuan Li,Qijun He,Mingqi Yuan,Wen-Tse Chen,Jeff Schneider,Jiayu Chen
备注:19 pages (8 pages main text)
摘要:Continual Reinforcement Learning (CRL) aims to develop lifelong learning agents to continuously acquire knowledge across diverse tasks while mitigating catastrophic forgetting. This requires efficiently managing the stability-plasticity dilemma and leveraging prior experience to rapidly generalize to novel tasks. While various enhancement strategies for both aspects have been proposed, achieving scalable performance by directly applying RL to sequential task streams remains challenging. In this paper, we propose a novel teacher-student framework that decouples CRL into two independent processes: training single-task teacher models through distributed RL and continually distilling them into a central generalist model. This design is motivated by the observation that RL excels at solving single tasks, while policy distillation -- a relatively stable supervised learning process -- is well aligned with large foundation models and multi-task learning. Moreover, a mixture-of-experts (MoE) architecture and a replay-based approach are employed to enhance the plasticity and stability of the continual policy distillation process. Extensive experiments on the Meta-World benchmark demonstrate that our framework enables efficient continual RL, recovering over 85% of teacher performance while constraining task-wise forgetting to within 10%.


【11】SAIR: Cost-Efficient Multi-Stage ML Pipeline Autoscaling via In-Context Reinforcement Learning
标题:SAIR:通过上下文内强化学习实现经济高效的多阶段ML管道自动扩展
链接:https://arxiv.org/abs/2601.22397

作者:Jianchang Su,Yifan Zhang,Shengkai Lin,Shizhen Zhao,Yusheng Zheng,Yiwei Yang,Wei Zhang
摘要:Multi-stage ML inference pipelines are difficult to autoscale due to heterogeneous resources, cross-stage coupling, and dynamic bottleneck migration. We present SAIR, an autoscaling framework that uses an LLM as an in-context reinforcement learning controller, improving its policy online from reward-labeled interaction histories without gradient updates. SAIR combines Pareto-dominance reward shaping with a provable separation margin, surprisal-guided experience retrieval for context efficiency, and fine-grained GPU rate control via user-space CUDA interception. We provide regret analysis decomposing error into retrieval coverage and LLM selection components. On four ML serving pipelines under three workload patterns, SAIR achieves the best or tied-best P99 latency and effective resource cost among deployed baselines, improving P99 by up to 50% and reducing effective cost by up to 97% (under GPU rate-control assumptions), with 86% bottleneck detection accuracy and no offline training.


【12】Quantum-Inspired Reinforcement Learning for Secure and Sustainable AIoT-Driven Supply Chain Systems
标题:基于量子启发的强化学习,用于安全和可持续的AIoT驱动的供应链系统
链接:https://arxiv.org/abs/2601.22339

作者:Muhammad Bilal Akram Dastagir,Omer Tariq,Shahid Mumtaz,Saif Al-Kuwari,Ahmed Farouk
摘要 :Modern supply chains must balance high-speed logistics with environmental impact and security constraints, prompting a surge of interest in AI-enabled Internet of Things (AIoT) solutions for global commerce. However, conventional supply chain optimization models often overlook crucial sustainability goals and cyber vulnerabilities, leaving systems susceptible to both ecological harm and malicious attacks. To tackle these challenges simultaneously, this work integrates a quantum-inspired reinforcement learning framework that unifies carbon footprint reduction, inventory management, and cryptographic-like security measures. We design a quantum-inspired reinforcement learning framework that couples a controllable spin-chain analogy with real-time AIoT signals and optimizes a multi-objective reward unifying fidelity, security, and carbon costs. The approach learns robust policies with stabilized training via value-based and ensemble updates, supported by window-normalized reward components to ensure commensurate scaling. In simulation, the method exhibits smooth convergence, strong late-episode performance, and graceful degradation under representative noise channels, outperforming standard learned and model-based references, highlighting its robust handling of real-time sustainability and risk demands. These findings reinforce the potential for quantum-inspired AIoT frameworks to drive secure, eco-conscious supply chain operations at scale, laying the groundwork for globally connected infrastructures that responsibly meet both consumer and environmental needs.


【13】Latent Spherical Flow Policy for Reinforcement Learning with Combinatorial Actions
标题:具有组合动作的强化学习的潜在球形流策略
链接:https://arxiv.org/abs/2601.22211

作者:Lingkai Kong,Anagha Satish,Hezi Jiang,Akseli Kangaslahti,Andrew Ma,Wenbo Chen,Mingxiao Song,Lily Xu,Milind Tambe
摘要:Reinforcement learning (RL) with combinatorial action spaces remains challenging because feasible action sets are exponentially large and governed by complex feasibility constraints, making direct policy parameterization impractical. Existing approaches embed task-specific value functions into constrained optimization programs or learn deterministic structured policies, sacrificing generality and policy expressiveness. We propose a solver-induced \emph{latent spherical flow policy} that brings the expressiveness of modern generative policies to combinatorial RL while guaranteeing feasibility by design. Our method, LSFlow, learns a \emph{stochastic} policy in a compact continuous latent space via spherical flow matching, and delegates feasibility to a combinatorial optimization solver that maps each latent sample to a valid structured action. To improve efficiency, we train the value network directly in the latent space, avoiding repeated solver calls during policy optimization. To address the piecewise-constant and discontinuous value landscape induced by solver-based action selection, we introduce a smoothed Bellman operator that yields stable, well-defined learning targets. Empirically, our approach outperforms state-of-the-art baselines by an average of 20.6% across a range of challenging combinatorial RL tasks.


元学习(1篇)

【1】Detect and Act: Automated Dynamic Optimizer through Meta-Black-Box Optimization
标题:检测与行动:通过元黑箱优化实现自动动态优化器
链接:https://arxiv.org/abs/2601.22542

作者:Zijian Gao,Yuanting Zhong,Zeyuan Ma,Yue-Jiao Gong,Hongshu Guo
摘要:Dynamic Optimization Problems (DOPs) are challenging to address due to their complex nature, i.e., dynamic environment variation. Evolutionary Computation methods are generally advantaged in solving DOPs since they resemble dynamic biological evolution. However, existing evolutionary dynamic optimization methods rely heavily on human-crafted adaptive strategy to detect environment variation in DOPs, and then adapt the searching strategy accordingly. These hand-crafted strategies may perform ineffectively at out-of-box scenarios. In this paper, we propose a reinforcement learning-assisted approach to enable automated variation detection and self-adaption in evolutionary algorithms. This is achieved by borrowing the bi-level learning-to-optimize idea from recent Meta-Black-Box Optimization works. We use a deep Q-network as optimization dynamics detector and searching strategy adapter: It is fed as input with current-step optimization state and then dictates desired control parameters to underlying evolutionary algorithms for next-step optimization. The learning objective is to maximize the expected performance gain across a problem distribution. Once trained, our approach could generalize toward unseen DOPs with automated environment variation detection and self-adaption. To facilitate comprehensive validation, we further construct an easy-to-difficult DOPs testbed with diverse synthetic instances. Extensive benchmark results demonstrate flexible searching behavior and superior performance of our approach in solving DOPs, compared to state-of-the-art baselines.


医学相关(6篇)

【1】Metric Hub: A metric library and practical selection workflow for use-case-driven data quality assessment in medical AI
标题:Metric Hub:用于医疗AI中用例驱动的数据质量评估的指标库和实用选择工作流程
链接:https://arxiv.org/abs/2601.22702

作者:Katinka Becker,Maximilian P. Oppelt,Tobias S. Zech,Martin Seyferth,Sandie Cabon,Vanja Miskovic,Ivan Cimrak,Michal Kozubek,Giuseppe D'Avenio,Ilaria Campioni,Jana Fehr,Kanjar De,Ismail Mahmoudi,Emilio Dolgener Cantu,Laurenz Ottmann,Andreas Klaß,Galaad Altares,Jackie Ma,Alireza Salehi M.,Nadine R. Lang-Richter,Tobias Schaeffter,Daniel Schwabe
摘要:Machine learning (ML) in medicine has transitioned from research to concrete applications aimed at supporting several medical purposes like therapy selection, monitoring and treatment. Acceptance and effective adoption by clinicians and patients, as well as regulatory approval, require evidence of trustworthiness. A major factor for the development of trustworthy AI is the quantification of data quality for AI model training and testing. We have recently proposed the METRIC-framework for systematically evaluating the suitability (fit-for-purpose) of data for medical ML for a given task. Here, we operationalize this theoretical framework by introducing a collection of data quality metrics - the metric library - for practically measuring data quality dimensions. For each metric, we provide a metric card with the most important information, including definition, applicability, examples, pitfalls and recommendations, to support the understanding and implementation of these metrics. Furthermore, we discuss strategies and provide decision trees for choosing an appropriate set of data quality metrics from the metric library given specific use cases. We demonstrate the impact of our approach exemplarily on the PTB-XL ECG-dataset. This is a first step to enable fit-for-purpose evaluation of training and test data in practice as the base for establishing trustworthy AI in medicine.


【2】SCOPE-PD: Explainable AI on Subjective and Clinical Objective Measurements of Parkinson's Disease for Precision Decision-Making
标题:SCOPE-PD:帕金森病主观和临床客观测量的可解释人工智能,以实现精确决策
链接:https://arxiv.org/abs/2601.22516

作者:Md Mezbahul Islam,John Michael Templeton,Masrur Sobhan,Christian Poellabauer,Ananda Mohan Mondal
备注:16 pages, 3 tables, 5 figures, to be published (full text online) in Springer (Springer CCIS series: electronic ISSN 1865-0937, print ISSN 1865-0929)
摘要:Parkinson's disease (PD) is a chronic and complex neurodegenerative disorder influenced by genetic, clinical, and lifestyle factors. Predicting this disease early is challenging because it depends on traditional diagnostic methods that face issues of subjectivity, which commonly delay diagnosis. Several objective analyses are currently in practice to help overcome the challenges of subjectivity; however, a proper explanation of these analyses is still lacking. While machine learning (ML) has demonstrated potential in supporting PD diagnosis, existing approaches often rely on subjective reports only and lack interpretability for individualized risk estimation. This study proposes SCOPE-PD, an explainable AI-based prediction framework, by integrating subjective and objective assessments to provide personalized health decisions. Subjective and objective clinical assessment data are collected from the Parkinson's Progression Markers Initiative (PPMI) study to construct a multimodal prediction framework. Several ML techniques are applied to these data, and the best ML model is selected to interpret the results. Model interpretability is examined using SHAP-based analysis. The Random Forest algorithm achieves the highest accuracy of 98.66 percent using combined features from both subjective and objective test data. Tremor, bradykinesia, and facial expression are identified as the top three contributing features from the MDS-UPDRS test in the prediction of PD.


【3】EvoEGF-Mol: Evolving Exponential Geodesic Flow for Structure-based Drug Design
标题:EvoEGF-Mol:用于基于结构的药物设计的演化指数测地流
链接:https://arxiv.org/abs/2601.22466

作者:Yaowei Jin,Junjie Wang,Cheng Cao,Penglei Wang,Duo An,Qian Shi
摘要:Structure-Based Drug Design (SBDD) aims to discover bioactive ligands. Conventional approaches construct probability paths separately in Euclidean and probabilistic spaces for continuous atomic coordinates and discrete chemical categories, leading to a mismatch with the underlying statistical manifolds. We address this issue from an information-geometric perspective by modeling molecules as composite exponential-family distributions and defining generative flows along exponential geodesics under the Fisher-Rao metric. To avoid the instantaneous trajectory collapse induced by geodesics directly targeting Dirac distributions, we propose Evolving Exponential Geodesic Flow for SBDD (EvoEGF-Mol), which replaces static Dirac targets with dynamically concentrating distributions, ensuring stable training via a progressive-parameter-refinement architecture. Our model approaches a reference-level PoseBusters passing rate (93.4%) on CrossDock, demonstrating remarkable geometric precision and interaction fidelity, while outperforming baselines on real-world MolGenBench tasks by recovering bioactive scaffolds and generating candidates that meet established MedChem filters.


【4】AgentScore: Autoformulation of Deployable Clinical Scoring Systems
标题:AgentScore:可部署临床评分系统的自动制定
链接:https://arxiv.org/abs/2601.22324

作者:Silas Ruhrberg Estévez,Christopher Chiu,Mihaela van der Schaar
摘要:Modern clinical practice relies on evidence-based guidelines implemented as compact scoring systems composed of a small number of interpretable decision rules. While machine-learning models achieve strong performance, many fail to translate into routine clinical use due to misalignment with workflow constraints such as memorability, auditability, and bedside execution. We argue that this gap arises not from insufficient predictive power, but from optimizing over model classes that are incompatible with guideline deployment. Deployable guidelines often take the form of unit-weighted clinical checklists, formed by thresholding the sum of binary rules, but learning such scores requires searching an exponentially large discrete space of possible rule sets. We introduce AgentScore, which performs semantically guided optimization in this space by using LLMs to propose candidate rules and a deterministic, data-grounded verification-and-selection loop to enforce statistical validity and deployability constraints. Across eight clinical prediction tasks, AgentScore outperforms existing score-generation methods and achieves AUC comparable to more flexible interpretable models despite operating under stronger structural constraints. On two additional externally validated tasks, AgentScore achieves higher discrimination than established guideline-based scores.
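
To make the "unit-weighted checklist" concrete: such a score is just the number of binary rules that fire, compared against a threshold. The sketch below (with invented rules, data, and threshold selection; not the AgentScore pipeline) shows the model class being searched over.

```python
# Minimal sketch of a unit-weighted clinical checklist: each rule contributes one point
# and the score is compared to a threshold. Rules and data here are invented for illustration.
import numpy as np

rules = [
    ("age >= 65",        lambda row: row["age"] >= 65),
    ("systolic_bp < 90", lambda row: row["systolic_bp"] < 90),
    ("resp_rate >= 30",  lambda row: row["resp_rate"] >= 30),
]

def checklist_score(row):
    return sum(int(rule(row)) for _, rule in rules)

def fit_threshold(rows, labels):
    """Pick the integer cutoff that maximizes accuracy on the training data."""
    scores = np.array([checklist_score(r) for r in rows])
    labels = np.array(labels)
    best_t, best_acc = 0, 0.0
    for t in range(len(rules) + 1):
        acc = np.mean((scores >= t) == labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

train = [
    {"age": 72, "systolic_bp": 85, "resp_rate": 32},
    {"age": 40, "systolic_bp": 120, "resp_rate": 16},
    {"age": 68, "systolic_bp": 110, "resp_rate": 22},
]
labels = [1, 0, 0]
t = fit_threshold(train, labels)
print("threshold:", t, "predictions:", [int(checklist_score(r) >= t) for r in train])
```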


【5】Neural Signals Generate Clinical Notes in the Wild
标题:神经信号在真实场景中生成临床记录
链接:https://arxiv.org/abs/2601.22197

作者:Jathurshan Pradeepkumar,Zheng Chen,Jimeng Sun
摘要:Generating clinical reports that summarize abnormal patterns, diagnostic findings, and clinical interpretations from long-term EEG recordings remains labor-intensive. We curate a large-scale clinical EEG dataset with $9{,}922$ reports paired with approximately $11{,}000$ hours of EEG recordings from $9{,}048$ patients. We therefore develop CELM, the first clinical EEG-to-Language foundation model capable of summarizing long-duration, variable-length EEG recordings and performing end-to-end clinical report generation at multiple scales, including recording description, background activity, epileptiform abnormalities, events/seizures, and impressions. Experimental results show that, with patient history supervision, our method achieves $70\%$--$95\%$ average relative improvements in standard generation metrics (e.g., ROUGE-1 and METEOR) from $0.2$--$0.3$ to $0.4$--$0.6$. In the zero-shot setting without patient history, CELM attains generation scores in the range of $0.43$--$0.52$, compared to baselines of $0.17$--$0.26$. CELM integrates pretrained EEG foundation models with language models to enable scalable multimodal learning. We release our model and benchmark construction pipeline at [URL].


【6】Scale-Cascaded Diffusion Models for Super-Resolution in Medical Imaging
标题:医学成像超分辨率的尺度级联扩散模型
链接:https://arxiv.org/abs/2601.23201

作者:Darshan Thaker,Mahmoud Mostapha,Radu Miron,Shihan Qiu,Mariappan Nadar
备注:Accepted at IEEE International Symposium for Biomedical Imaging (ISBI) 2026
摘要:Diffusion models have been increasingly used as strong generative priors for solving inverse problems such as super-resolution in medical imaging. However, these approaches typically utilize a diffusion prior trained at a single scale, ignoring the hierarchical scale structure of image data. In this work, we propose to decompose images into Laplacian pyramid scales and train separate diffusion priors for each frequency band. We then develop an algorithm to perform super-resolution that utilizes these priors to progressively refine reconstructions across different scales. Evaluated on brain, knee, and prostate MRI data, our approach both improves perceptual quality over baselines and reduces inference time through smaller coarse-scale networks. Our framework unifies multiscale reconstruction and diffusion priors for medical image super-resolution.
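
A minimal sketch of the Laplacian-pyramid decomposition that separate priors would be trained on; only the decomposition and exact reconstruction are shown, while the per-band diffusion models and the guided super-resolution step are omitted. The blur, sampling factor, and image are illustrative choices.

```python
# Sketch of a Laplacian-pyramid decomposition into frequency bands (decomposition only;
# the per-band diffusion priors and guided reconstruction are not shown).
import numpy as np
from scipy import ndimage

def downsample(img, sigma=1.0):
    return ndimage.gaussian_filter(img, sigma)[::2, ::2]

def upsample(img, shape, sigma=1.0):
    up = np.zeros(shape)
    up[::2, ::2] = img
    return ndimage.gaussian_filter(up, sigma) * 4.0   # compensate energy lost by zero insertion

def laplacian_pyramid(img, levels=3):
    bands, cur = [], img
    for _ in range(levels):
        low = downsample(cur)
        bands.append(cur - upsample(low, cur.shape))   # band = detail missing from the coarse scale
        cur = low
    bands.append(cur)                                   # coarsest residual
    return bands

def reconstruct(bands):
    cur = bands[-1]
    for band in reversed(bands[:-1]):
        cur = band + upsample(cur, band.shape)
    return cur

img = np.random.rand(64, 64)
bands = laplacian_pyramid(img)
print("max reconstruction error:", np.abs(reconstruct(bands) - img).max())
```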


蒸馏|知识提取(2篇)

【1】Stabilizing Consistency Training: A Flow Map Analysis and Self-Distillation
标题:稳定一致性训练:流映射分析与自蒸馏
链接:https://arxiv.org/abs/2601.22679

作者:Youngjoong Kim,Duhoe Kim,Woosung Kim,Jaesik Park
摘要:Consistency models have been proposed for fast generative modeling, achieving results competitive with diffusion and flow models. However, these methods exhibit inherent instability and limited reproducibility when training from scratch, motivating subsequent work to explain and stabilize these issues. While these efforts have provided valuable insights, the explanations remain fragmented, and the theoretical relationships remain unclear. In this work, we provide a theoretical examination of consistency models by analyzing them from a flow map-based perspective. This joint analysis clarifies how training stability and convergence behavior can give rise to degenerate solutions. Building on these insights, we revisit self-distillation as a practical remedy for certain forms of suboptimal convergence and reformulate it to avoid excessive gradient norms for stable optimization. We further demonstrate that our strategy extends beyond image generation to diffusion-based policy learning, without reliance on a pretrained diffusion model for initialization, thereby illustrating its broader applicability.


【2】Learn from A Rationalist: Distilling Intermediate Interpretable Rationales
标题:向理性者学习:蒸馏中间可解释的依据
链接:https://arxiv.org/abs/2601.22531

作者:Jiayi Dai,Randy Goebel
摘要:Because of the pervasive use of deep neural networks (DNNs), especially in high-stakes domains, the interpretability of DNNs has received increased attention. The general idea of rationale extraction (RE) is to provide an interpretable-by-design framework for DNNs via a select-predict architecture where two neural networks learn jointly to perform feature selection and prediction, respectively. Given only the remote supervision from the final task prediction, the process of learning to select subsets of features (or \emph{rationales}) requires searching in the space of all possible feature combinations, which is computationally challenging and even harder when the base neural networks are not sufficiently capable. To improve the predictive performance of RE models that are based on less capable or smaller neural networks (i.e., the students), we propose \textbf{REKD} (\textbf{R}ationale \textbf{E}xtraction with \textbf{K}nowledge \textbf{D}istillation) where a student RE model learns from the rationales and predictions of a teacher (i.e., a \emph{rationalist}) in addition to the student's own RE optimization. This structural adjustment to RE aligns well with how humans could learn effectively from interpretable and verifiable knowledge. Because of the neural-model agnostic nature of the method, any black-box neural network could be integrated as a backbone model. To demonstrate the viability of REKD, we conduct experiments with multiple variants of BERT and vision transformer (ViT) models. Our experiments across language and vision classification datasets (i.e., IMDB movie reviews, CIFAR 10 and CIFAR 100) show that REKD significantly improves the predictive performance of the student RE models.
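
A hedged sketch of what a REKD-style training objective could look like: the student's own task loss plus distillation terms on the teacher's soft predictions and rationale masks. The loss weights, temperature, and tensor shapes below are placeholders, not the paper's configuration.

```python
# Hedged sketch of a rationale-distillation objective: the student's own prediction loss plus
# (i) KL distillation on teacher predictions and (ii) agreement with teacher rationale masks.
# Networks, weights, and shapes are placeholders, not the paper's architecture.
import torch
import torch.nn.functional as F

def rekd_loss(student_logits, student_rationale, teacher_logits, teacher_rationale,
              labels, temp=2.0, w_kd=0.5, w_rat=0.5):
    task = F.cross_entropy(student_logits, labels)                       # student's own task loss
    kd = F.kl_div(F.log_softmax(student_logits / temp, dim=-1),
                  F.softmax(teacher_logits / temp, dim=-1),
                  reduction="batchmean") * temp ** 2                     # soften + rescale (standard KD)
    rat = F.binary_cross_entropy(student_rationale, teacher_rationale)   # imitate which features to select
    return task + w_kd * kd + w_rat * rat

batch, n_feat, n_cls = 4, 10, 3
student_logits = torch.randn(batch, n_cls, requires_grad=True)
student_rationale = torch.sigmoid(torch.randn(batch, n_feat, requires_grad=True))
teacher_logits = torch.randn(batch, n_cls)
teacher_rationale = torch.randint(0, 2, (batch, n_feat)).float()
labels = torch.randint(0, n_cls, (batch,))

loss = rekd_loss(student_logits, student_rationale, teacher_logits, teacher_rationale, labels)
loss.backward()
print("combined REKD-style loss:", float(loss))
```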


超分辨率|去噪|去模糊|去雾(1篇)

【1】Denoising the Deep Sky: Physics-Based CCD Noise Formation for Astronomical Imaging
标题:深空去噪:天文成像中基于物理的CCD噪声形成
链接:https://arxiv.org/abs/2601.23276

作者:Shuhong Liu,Xining Ge,Ziying Gu,Lin Gu,Ziteng Cui,Xuangeng Chu,Jun Liu,Dong Li,Tatsuya Harada
摘要:Astronomical imaging remains noise-limited under practical observing constraints, while standard calibration pipelines mainly remove structured artifacts and leave stochastic noise largely unresolved. Learning-based denoising is promising, yet progress is hindered by scarce paired training data and the need for physically interpretable and reproducible models in scientific workflows. We propose a physics-based noise synthesis framework tailored to CCD noise formation. The pipeline models photon shot noise, photo-response non-uniformity, dark-current noise, readout effects, and localized outliers arising from cosmic-ray hits and hot pixels. To obtain low-noise inputs for synthesis, we average multiple unregistered exposures to produce high-SNR bases. Realistic noisy counterparts synthesized from these bases using our noise model enable the construction of abundant paired datasets for supervised learning. We further introduce a real-world dataset across multi-bands acquired with two twin ground-based telescopes, providing paired raw frames and instrument-pipeline calibrated frames, together with calibration data and stacked high-SNR bases for real-world evaluation.
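
The noise-formation chain described above can be sketched directly in a few lines; the constants (exposure, PRNU level, dark current, read noise, hot-pixel rate) below are placeholder values rather than calibrated instrument parameters.

```python
# Minimal sketch of a physics-based CCD noise chain; all constants are placeholder values.
import numpy as np

def synthesize_ccd_noise(clean_e, exposure_s=60.0, prnu_sigma=0.01,
                         dark_e_per_s=0.1, read_sigma_e=5.0, hot_pixel_frac=1e-4, seed=0):
    """clean_e: noise-free image in expected electrons; returns a noisy frame in electrons."""
    rng = np.random.default_rng(seed)
    prnu = 1.0 + prnu_sigma * rng.normal(size=clean_e.shape)               # photo-response non-uniformity
    signal = rng.poisson(np.clip(clean_e * prnu, 0, None))                 # photon shot noise
    dark = rng.poisson(dark_e_per_s * exposure_s, size=clean_e.shape)      # dark-current noise
    read = read_sigma_e * rng.normal(size=clean_e.shape)                   # readout (Gaussian) noise
    noisy = signal + dark + read
    hot = rng.random(clean_e.shape) < hot_pixel_frac                       # hot pixels / cosmic-ray-like outliers
    noisy[hot] = noisy.max()
    return noisy

clean = 200.0 * np.exp(-((np.indices((128, 128)) - 64) ** 2).sum(0) / (2 * 10 ** 2))  # toy "star"
noisy = synthesize_ccd_noise(clean)
print("clean peak:", clean.max(), "noisy peak:", noisy.max())
```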


自动驾驶|车辆|车道检测等(3篇)

【1】FedDis: A Causal Disentanglement Framework for Federated Traffic Prediction
标题:FedDis:联邦交通预测的因果解耦框架
链接:https://arxiv.org/abs/2601.22578

作者:Chengyang Zhou,Zijian Zhang,Chunxu Zhang,Hao Miao,Yulin Zhang,Kedi Lyu,Juncheng Hu
摘要:Federated learning offers a promising paradigm for privacy-preserving traffic prediction, yet its performance is often challenged by the non-identically and independently distributed (non-IID) nature of decentralized traffic data. Existing federated methods frequently struggle with this data heterogeneity, typically entangling globally shared patterns with client-specific local dynamics within a single representation. In this work, we postulate that this heterogeneity stems from the entanglement of two distinct generative sources: client-specific localized dynamics and cross-client global spatial-temporal patterns. Motivated by this perspective, we introduce FedDis, a novel framework that, to the best of our knowledge, is the first to leverage causal disentanglement for federated spatial-temporal prediction. Architecturally, FedDis comprises a dual-branch design wherein a Personalized Bank learns to capture client-specific factors, while a Global Pattern Bank distills common knowledge. This separation enables robust cross-client knowledge transfer while preserving high adaptability to unique local environments. Crucially, a mutual information minimization objective is employed to enforce informational orthogonality between the two branches, thereby ensuring effective disentanglement. Comprehensive experiments conducted on four real-world benchmark datasets demonstrate that FedDis consistently achieves state-of-the-art performance, promising efficiency, and superior expandability.


【2】Keep Rehearsing and Refining: Lifelong Learning Vehicle Routing under Continually Drifting Tasks
标题:不断排练和完善:持续漂移任务下的终身学习车辆路径规划
链接:https://arxiv.org/abs/2601.22509

作者:Jiyuan Pei,Yi Mei,Jialin Liu,Mengjie Zhang,Xin Yao
摘要:Existing neural solvers for vehicle routing problems (VRPs) are typically trained either in a one-off manner on a fixed set of pre-defined tasks or in a lifelong manner on several tasks arriving sequentially, assuming sufficient training on each task. Both settings overlook a common real-world property: problem patterns may drift continually over time, yielding massive tasks sequentially arising while offering only limited training resources per task. In this paper, we study a novel lifelong learning paradigm for neural VRP solvers under continually drifting tasks over learning time steps, where sufficient training for any given task at any time is not available. We propose Dual Replay with Experience Enhancement (DREE), a general framework to improve learning efficiency and mitigate catastrophic forgetting under such drift. Extensive experiments show that, under such continual drift, DREE effectively learns new tasks, preserves prior knowledge, improves generalization to unseen tasks, and can be applied to diverse existing neural solvers.


【3】Aligning Microscopic Vehicle and Macroscopic Traffic Statistics: Reconstructing Driving Behavior from Partial Data
标题:协调微观车辆与宏观交通统计:从部分数据重建驾驶行为
链接:https://arxiv.org/abs/2601.22242

作者:Zhihao Zhang,Keith Redmill,Chengyang Peng,Bowen Weng
摘要:A driving algorithm that aligns with good human driving practices, or at the very least collaborates effectively with human drivers, is crucial for developing safe and efficient autonomous vehicles. In practice, two main approaches are commonly adopted: (i) supervised or imitation learning, which requires comprehensive naturalistic driving data capturing all states that influence a vehicle's decisions and corresponding actions, and (ii) reinforcement learning (RL), where the simulated driving environment either matches or is intentionally more challenging than real-world conditions. Both methods depend on high-quality observations of real-world driving behavior, which are often difficult and costly to obtain. State-of-the-art sensors on individual vehicles can gather microscopic data, but they lack context about the surrounding conditions. Conversely, roadside sensors can capture traffic flow and other macroscopic characteristics, but they cannot associate this information with individual vehicles on a microscopic level. Motivated by this complementarity, we propose a framework that reconstructs unobserved microscopic states from macroscopic observations, using microscopic data to anchor observed vehicle behaviors, and learns a shared policy whose behavior is microscopically consistent with the partially observed trajectories and actions and macroscopically aligned with target traffic statistics when deployed population-wide. Such constrained and regularized policies promote realistic flow patterns and safe coordination with human drivers at scale.


联邦学习|隐私保护|加密(2篇)

【1】Beyond Fixed Rounds: Data-Free Early Stopping for Practical Federated Learning
标题:超越固定回合:实用联邦学习的无数据早期停止
链接:https://arxiv.org/abs/2601.22669

作者:Youngjoon Lee,Hyukjoon Lee,Seungrok Jung,Andy Luo,Jinu Gong,Yang Cao,Joonhyuk Kang
备注:10 pages
摘要:Federated Learning (FL) facilitates decentralized collaborative learning without transmitting raw data. However, reliance on fixed global rounds or validation data for hyperparameter tuning hinders practical deployment by incurring high computational costs and privacy risks. To address this, we propose a data-free early stopping framework that determines the optimal stopping point by monitoring the task vector's growth rate using solely server-side parameters. The numerical results on skin lesion/blood cell classification demonstrate that our approach is comparable to validation-based early stopping across various state-of-the-art FL methods. In particular, the proposed framework spends an average of 47/20 (skin lesion/blood cell) rounds to achieve over 12.5%/10.3% higher performance than early stopping based on validation data. To the best of our knowledge, this is the first work to propose an early stopping framework for FL methods without using any validation data.
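
A minimal sketch of the stopping rule suggested by the abstract: the server tracks the task vector (current global parameters minus the initial ones) and stops once its relative growth stays small for a few rounds. The tolerance, patience, and toy update sequence are illustrative assumptions.

```python
# Sketch of a data-free stopping rule: monitor the task vector's relative growth on the server
# and stop after it stays below a tolerance for `patience` consecutive rounds.
import numpy as np

class TaskVectorEarlyStopper:
    def __init__(self, init_params, tol=1e-2, patience=3):
        self.theta0 = np.concatenate([p.ravel() for p in init_params])
        self.prev_norm, self.tol, self.patience, self.calm = None, tol, patience, 0

    def should_stop(self, params):
        theta = np.concatenate([p.ravel() for p in params])
        norm = np.linalg.norm(theta - self.theta0)            # task-vector magnitude
        if self.prev_norm is not None and self.prev_norm > 0:
            growth = (norm - self.prev_norm) / self.prev_norm
            self.calm = self.calm + 1 if abs(growth) < self.tol else 0
        self.prev_norm = norm
        return self.calm >= self.patience                      # stop after `patience` calm rounds

rng = np.random.default_rng(0)
params = [rng.normal(size=(10, 10))]
stopper = TaskVectorEarlyStopper(params)
for rnd in range(1, 100):
    params = [p + rng.normal(scale=1.0 / rnd ** 2, size=p.shape) for p in params]  # shrinking updates
    if stopper.should_stop(params):
        print("stop at round", rnd)
        break
```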


【2】ZK-HybridFL: Zero-Knowledge Proof-Enhanced Hybrid Ledger for Federated Learning
标题:ZK-HybridFL:用于联邦学习的零知识证明增强型混合账本
链接:https://arxiv.org/abs/2601.22302

作者:Amirhossein Taherpour,Xiaodong Wang
备注:Accepted for publication in IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
摘要:Federated learning (FL) enables collaborative model training while preserving data privacy, yet both centralized and decentralized approaches face challenges in scalability, security, and update validation. We propose ZK-HybridFL, a secure decentralized FL framework that integrates a directed acyclic graph (DAG) ledger with dedicated sidechains and zero-knowledge proofs (ZKPs) for privacy-preserving model validation. The framework uses event-driven smart contracts and an oracle-assisted sidechain to verify local model updates without exposing sensitive data. A built-in challenge mechanism efficiently detects adversarial behavior. In experiments on image classification and language modeling tasks, ZK-HybridFL achieves faster convergence, higher accuracy, lower perplexity, and reduced latency compared to Blade-FL and ChainFL. It remains robust against substantial fractions of adversarial and idle nodes, supports sub-second on-chain verification with efficient gas usage, and prevents invalid updates and orphanage-style attacks. This makes ZK-HybridFL a scalable and secure solution for decentralized FL across diverse environments.


推理|分析|理解|解释(18篇)

【1】Regularisation in neural networks: a survey and empirical analysis of approaches
标题:神经网络的正则化:方法的综述和实证分析
链接:https://arxiv.org/abs/2601.23131

作者:Christiaan P. Opperman,Anna S. Bosman,Katherine M. Malan
备注:15 pages, 4 figures, 4 tables and for associated to the code, see https://github.com/Christo08/Benchmarks-of-regularisation-techniques.git
摘要:Despite huge successes on a wide range of tasks, neural networks are known to sometimes struggle to generalise to unseen data. Many approaches have been proposed over the years to promote the generalisation ability of neural networks, collectively known as regularisation techniques. These are used as common practice under the assumption that any regularisation added to the pipeline would result in a performance improvement. In this study, we investigate whether this assumption holds in practice. First, we provide a broad review of regularisation techniques, including modern theories such as double descent. We propose a taxonomy of methods under four broad categories, namely: (1) data-based strategies, (2) architecture strategies, (3) training strategies, and (4) loss function strategies. Notably, we highlight the contradictions and correspondences between the approaches in these broad classes. Further, we perform an empirical comparison of the various regularisation techniques on classification tasks for ten numerical and image datasets applied to the multi-layer perceptron and convolutional neural network architectures. Results show that the efficacy of regularisation is dataset-dependent. For example, the use of a regularisation term only improved performance on numeric datasets, whereas batch normalisation improved performance on image datasets only. Generalisation is crucial to machine learning; thus, understanding the effects of applying regularisation techniques, and considering the connections between them is essential to the appropriate use of these methods in practice.


【2】Divide-and-Conquer CoT: RL for Reducing Latency via Parallel Reasoning
标题:分治CoT:通过并行推理减少延迟的RL
链接:https://arxiv.org/abs/2601.23027

作者:Arvind Mahankali,Kaiyue Wen,Tengyu Ma
备注:47 pages, 13 figures
摘要:Long chain-of-thought reasoning (Long CoT) is now fundamental to state-of-the-art LLMs, especially in mathematical reasoning. However, LLM generation is highly sequential, and long CoTs lead to a high latency. We propose to train Divide-and-Conquer CoT (DC-CoT) to reduce the latency. With DC-CoT, the model can act as a director that identifies distinct subtasks that can be performed in parallel in its reasoning process, and then spawns workers to execute the subtasks. Our goal is to achieve high accuracy, with a low longest path length, which is a theoretical measure of the latency needed for the response. We start with a long CoT base model (DeepScaleR-1.5B-Preview), and first use SFT with a small curated demonstration set to initialize its ability to spawn workers in a certain format. Because SFT degrades the accuracy significantly, we design a multi-stage RL algorithm, with various data filtering strategies, to recover the accuracy while decreasing the longest path length. Across several benchmarks including AIME 2024 and HMMT 2025, DC-CoT achieves similar accuracy as DeepScaleR-1.5B-Preview while decreasing longest path length by 35-40%. Our code, SFT dataset and models are publicly available at https://github.com/amahankali10/DC_CoT_RL_for_Low_Latency_CoT_with_Parallel_Reasoning.


【3】Understanding Generalization from Embedding Dimension and Distributional Convergence
标题:从嵌入维度和分布收敛理解泛化
链接:https://arxiv.org/abs/2601.22756

作者:Junjie Yu,Zhuoli Ouyang,Haotian Deng,Chen Wei,Wenxiao Ma,Jianyu Zhang,Zihan Deng,Quanying Liu
摘要:Deep neural networks often generalize well despite heavy over-parameterization, challenging classical parameter-based analyses. We study generalization from a representation-centric perspective and analyze how the geometry of learned embeddings controls predictive performance for a fixed trained model. We show that population risk can be bounded by two factors: (i) the intrinsic dimension of the embedding distribution, which determines the convergence rate of empirical embedding distribution to the population distribution in Wasserstein distance, and (ii) the sensitivity of the downstream mapping from embeddings to predictions, characterized by Lipschitz constants. Together, these yield an embedding-dependent error bound that does not rely on parameter counts or hypothesis class complexity. At the final embedding layer, architectural sensitivity vanishes and the bound is dominated by embedding dimension, explaining its strong empirical correlation with generalization performance. Experiments across architectures and datasets validate the theory and demonstrate the utility of embedding-based diagnostics.


【4】Is Softmax Loss All You Need? A Principled Analysis of Softmax-family Loss
标题:Softmax损失就是您所需要的吗?Softmax家族损失的原则性分析
链接:https://arxiv.org/abs/2601.22745

作者:Yuanhao Pu,Defu Lian,Enhong Chen
备注:34 pages, 3 figures
摘要:The Softmax loss is one of the most widely employed surrogate objectives for classification and ranking tasks. To elucidate its theoretical properties, the Fenchel-Young framework situates it as a canonical instance within a broad family of surrogates. Concurrently, another line of research has addressed scalability when the number of classes is exceedingly large, in which numerous approximations have been proposed to retain the benefits of the exact objective while improving efficiency. Building on these two perspectives, we present a principled investigation of the Softmax-family losses. We examine whether different surrogates achieve consistency with classification and ranking metrics, and analyze their gradient dynamics to reveal distinct convergence behaviors. We also introduce a systematic bias-variance decomposition for approximate methods that provides convergence guarantees, and further derive a per-epoch complexity analysis, showing explicit trade-offs between effectiveness and efficiency. Extensive experiments on a representative task demonstrate a strong alignment between consistency, convergence, and empirical performance. Together, these results establish a principled foundation and offer practical guidance for loss selections in large-class machine learning applications.


【5】Bi-MCQ: Reformulating Vision-Language Alignment for Negation Understanding
标题:Bi-MCQ:重构视觉-语言对齐以实现否定理解
链接:https://arxiv.org/abs/2601.22696

作者:Tae Hun Kim,Hyun Gyu Lee
备注:15 pages, 4 figures, Submitted to ICPR 2026 (under review)
摘要:Recent vision-language models (VLMs) achieve strong zero-shot performance via large-scale image-text pretraining and have been widely adopted in medical image analysis. However, existing VLMs remain notably weak at understanding negated clinical statements, largely due to contrastive alignment objectives that treat negation as a minor linguistic variation rather than a meaning-inverting operator. In multi-label settings, prompt-based InfoNCE fine-tuning further reinforces easy-positive image-prompt alignments, limiting effective learning of disease absence. To overcome these limitations, we reformulate vision-language alignment as a conditional semantic comparison problem, which is instantiated through a bi-directional multiple-choice learning framework(Bi-MCQ). By jointly training Image-to-Text and Text-to-Image MCQ tasks with affirmative, negative, and mixed prompts, our method implements fine-tuning as conditional semantic comparison instead of global similarity maximization. We further introduce direction-specific Cross-Attention fusion modules to address asymmetric cues required by bi-directional reasoning and reduce alignment interference. Experiments on ChestXray14, Open-I, CheXpert, and PadChest show that Bi-MCQ improves negation understanding by up to 0.47 AUC over the zero-shot performance of the state-of-the-art CARZero model, while achieving up to a 0.08 absolute gain on positive-negative combined (PNC) evaluation. Additionally, Bi-MCQ reduces the affirmative-negative AUC gap by an average of 0.12 compared to InfoNCE-based fine-tuning, demonstrating that objective reformulation can substantially enhance negation understanding in medical VLMs.


【6】Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification
标题:突破自然推理的界限:来自形式逻辑验证的交错奖励
链接:https://arxiv.org/abs/2601.22642

作者:Chuxue Cao,Jinluan Yang,Haoran Li,Kunhao Pan,Zijian Zhao,Zhengyu Chen,Yuchen Tian,Lijun Wu,Conghui He,Sirui Han,Yike Guo
摘要:Large Language Models (LLMs) show remarkable capabilities, yet their stochastic next-token prediction creates logical inconsistencies and reward hacking that formal symbolic systems avoid. To bridge this gap, we introduce a formal logic verification-guided framework that dynamically interleaves formal symbolic verification with the natural language generation process, providing real-time feedback to detect and rectify errors as they occur. Distinguished from previous neuro-symbolic methods limited by passive post-hoc validation, our approach actively penalizes intermediate fallacies during the reasoning chain. We operationalize this framework via a novel two-stage training pipeline that synergizes formal logic verification-guided supervised fine-tuning and policy optimization. Extensive evaluation on six benchmarks spanning mathematical, logical, and general reasoning demonstrates that our 7B and 14B models outperform state-of-the-art baselines by average margins of 10.4% and 14.2%, respectively. These results validate that formal verification can serve as a scalable mechanism to significantly push the performance boundaries of advanced LLM reasoning.


【7】Elastic Spectral State Space Models for Budgeted Inference
标题:面向预算受限推理的弹性谱状态空间模型
链接:https://arxiv.org/abs/2601.22488

作者:Dachuan Song,Xuan Wang
摘要:Foundation models are typically trained at a fixed computational capacity, while real-world applications require deployment across platforms with different resource constraints. Current approaches usually rely on training families of model variants or model distillation, which requires additional training and supports only a pre-selected set of sizes rather than fine-grained adaptation at runtime. In this paper, we propose Elastic Spectral State Space Models (ES-SSM), which require only one-time training at full capacity, but can be directly truncated into arbitrary scales for budgeted, runtime inference without retraining. Our ES-SSM builds on Hankel spectral filtering over a state space model (SSM), coupled with a lightweight input-adaptive gate trained under randomized spectral budgets. Using a shared masked normalization rule over the ordered spectral channels, we encourage predictive capability to concentrate in low-index components, while higher-index components act primarily as refinement. We test our algorithm across long-sequence benchmarks spanning text, logic, retrieval, vision, and audio. We demonstrate that a single ES-SSM model trained once can be truncated to provide competitive performance compared with modern Transformer and SSM baselines at similar parameter scales. Furthermore, by testing under various runtime budgets, we observe smooth and stable budget-performance curves over a wide range of truncation levels.


【8】Mitigating Cognitive Inertia in Large Reasoning Models via Latent Spike Steering
标题:通过潜在尖峰引导缓解大型推理模型中的认知惯性
链接:https://arxiv.org/abs/2601.22484

作者:Seojin Lee,ByeongJeong Kim,Hwanhee Lee
备注:21 pages, 6 figures
摘要:While Large Reasoning Models (LRMs) have achieved remarkable performance by scaling test-time compute, they frequently suffer from Cognitive Inertia, a failure pattern manifesting as either overthinking (inertia of motion) or reasoning rigidity (inertia of direction). Existing detection methods, typically relying on superficial textual heuristics like self-correction tokens, often fail to capture the model's unvoiced internal conflicts. To address this, we propose STARS (Spike-Triggered Adaptive Reasoning Steering), a training-free framework designed to rectify cognitive inertia by monitoring latent dynamics. STARS identifies Cognitive Pivots-critical moments of reasoning transition-by detecting distinct L2 distance spikes in the hidden states. Upon detection, the framework employs geometric trajectory analysis to diagnose the structural nature of the transition and injects state-aware language cues to steer the model in real-time. Our experiments across diverse benchmarks confirm that STARS efficiently curtails redundant loops while improving accuracy through the adaptive correction of erroneous trajectories. STARS offers a robust, unsupervised mechanism to optimize the reasoning process of LRMs without requiring additional fine-tuning.
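
The spike-detection step can be sketched as follows: compute L2 jumps between consecutive hidden states and flag steps whose jump exceeds a robust threshold. The threshold rule and the synthetic trajectory are assumptions; the steering stage that injects language cues is not shown.

```python
# Sketch of latent spike detection: L2 distances between consecutive hidden states,
# flagged against a robust (median + MAD) threshold. The steering stage is omitted.
import numpy as np

def detect_latent_spikes(hidden_states, k=3.0):
    """hidden_states: (T, d) array of per-step hidden states; returns indices of spike steps."""
    jumps = np.linalg.norm(np.diff(hidden_states, axis=0), axis=1)   # L2 distance between steps
    med = np.median(jumps)
    mad = np.median(np.abs(jumps - med)) + 1e-8                       # robust spread estimate
    return np.flatnonzero(jumps > med + k * 1.4826 * mad) + 1         # +1: spike is at the later step

rng = np.random.default_rng(0)
states = np.cumsum(rng.normal(scale=0.1, size=(50, 16)), axis=0)      # smooth trajectory
states[30:] += 5.0                                                     # abrupt "cognitive pivot" at step 30
print("spike steps:", detect_latent_spikes(states))
```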


【9】Score-based Integrated Gradient for Root Cause Explanations of Outliers
标题:基于分数的积分梯度用于离群值的根本原因解释
链接:https://arxiv.org/abs/2601.22399

作者:Phuoc Nguyen,Truyen Tran,Sunil Gupta,Svetha Venkatesh
备注:Accepted at ICDM 2025
摘要:Identifying the root causes of outliers is a fundamental problem in causal inference and anomaly detection. Traditional approaches based on heuristics or counterfactual reasoning often struggle under uncertainty and high-dimensional dependencies. We introduce SIREN, a novel and scalable method that attributes the root causes of outliers by estimating the score functions of the data likelihood. Attribution is computed via integrated gradients that accumulate score contributions along paths from the outlier toward the normal data distribution. Our method satisfies three of the four classic Shapley value axioms - dummy, efficiency, and linearity - as well as an asymmetry axiom derived from the underlying causal structure. Unlike prior work, SIREN operates directly on the score function, enabling tractable and uncertainty-aware root cause attribution in nonlinear, high-dimensional, and heteroscedastic causal models. Extensive experiments on synthetic random graphs and real-world cloud service and supply chain datasets show that SIREN outperforms state-of-the-art baselines in both attribution accuracy and computational efficiency.
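
A small sketch of score-based integrated-gradient attribution under a toy assumption that the data score is that of a diagonal Gaussian (standing in for a learned score model): contributions are accumulated along a straight path from the outlier back to a normal reference point.

```python
# Sketch of score-based integrated gradients: accumulate the score (gradient of the log-density)
# along a straight path from the outlier back to a normal reference. A diagonal-Gaussian score
# stands in for the learned score model.
import numpy as np

def gaussian_score(x, mu, var):
    return -(x - mu) / var                        # d/dx log N(x; mu, var)

def integrated_score_attribution(outlier, baseline, mu, var, steps=50):
    alphas = np.linspace(0.0, 1.0, steps)
    path = baseline[None, :] + alphas[:, None] * (outlier - baseline)[None, :]
    avg_score = gaussian_score(path, mu, var).mean(axis=0)
    return (outlier - baseline) * avg_score        # per-feature contribution to the log-density drop

mu, var = np.zeros(3), np.ones(3)
baseline = mu.copy()                               # a typical (normal) point
outlier = np.array([0.1, 6.0, -0.2])               # feature 1 is the anomalous root cause
attr = integrated_score_attribution(outlier, baseline, mu, var)
print("attributions:", attr, "-> root cause feature:", int(np.argmin(attr)))
```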


【10】MERMAID: Memory-Enhanced Retrieval and Reasoning with Multi-Agent Iterative Knowledge Grounding for Veracity Assessment
标题:MERMAID:具有多智能体迭代知识接地的记忆增强检索与推理,用于真实性评估
链接:https://arxiv.org/abs/2601.22361

作者:Yupeng Cao,Chengyang He,Yangyang Yu,Ping Wang,K. P. Subbalakshmi
摘要:Assessing the veracity of online content has become increasingly critical. Large language models (LLMs) have recently enabled substantial progress in automated veracity assessment, including automated fact-checking and claim verification systems. Typical veracity assessment pipelines break down complex claims into sub-claims, retrieve external evidence, and then apply LLM reasoning to assess veracity. However, existing methods often treat evidence retrieval as a static, isolated step and do not effectively manage or reuse retrieved evidence across claims. In this work, we propose MERMAID, a memory-enhanced multi-agent veracity assessment framework that tightly couples the retrieval and reasoning processes. MERMAID integrates agent-driven search, structured knowledge representations, and a persistent memory module within a Reason-Action style iterative process, enabling dynamic evidence acquisition and cross-claim evidence reuse. By retaining retrieved evidence in an evidence memory, the framework reduces redundant searches and improves verification efficiency and consistency. We evaluate MERMAID on three fact-checking benchmarks and two claim-verification datasets using multiple LLMs, including GPT, LLaMA, and Qwen families. Experimental results show that MERMAID achieves state-of-the-art performance while improving the search efficiency, demonstrating the effectiveness of synergizing retrieval, reasoning, and memory for reliable veracity assessment.


【11】Models Under SCOPE: Scalable and Controllable Routing via Pre-hoc Reasoning
标题:SCOPE之下的模型:通过事前推理实现可扩展且可控的路由
链接:https://arxiv.org/abs/2601.22323

作者:Qi Cao,Shuhao Zhang,Ruizhe Zhou,Ruiyi Zhang,Peijia Qin,Pengtao Xie
备注:We propose SCOPE, a model routing framework that predicts how accurate and how expensive each model will be before running it, allowing users to control cost-accuracy trade-offs and naturally handle new models
摘要:Model routing chooses which language model to use for each query. By sending easy queries to cheaper models and hard queries to stronger ones, it can significantly reduce inference cost while maintaining high accuracy. However, most existing routers treat this as a fixed choice among a small set of models, which makes them hard to adapt to new models or changing budget constraints. In this paper, we propose SCOPE (Scalable and Controllable Outcome Performance Estimator), a routing framework that goes beyond model selection by predicting their cost and performance. Trained with reinforcement learning, SCOPE makes reasoning-based predictions by retrieving how models behave on similar problems, rather than relying on fixed model names, enabling it to work with new, unseen models. Moreover, by explicitly predicting how accurate and how expensive a model will be, it turns routing into a dynamic decision problem, allowing users to easily control the trade-off between accuracy and cost. Experiments show that SCOPE is more than just a cost-saving tool. It flexibly adapts to user needs: it can boost accuracy by up to 25.7% when performance is the priority, or cut costs by up to 95.1% when efficiency matters most.
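
Once per-model accuracy and cost have been predicted, the routing decision itself is simple; the sketch below (with invented model names and numbers) shows the two control modes the abstract describes: meeting an accuracy target at minimum cost, or trading accuracy against cost with a user weight.

```python
# Sketch of a budget-aware routing decision given predicted (accuracy, cost) per model.
# Model names and numbers are invented for illustration.
def route(predictions, min_accuracy=None, cost_weight=0.0):
    """predictions: {model: (predicted_accuracy, predicted_cost_in_dollars)}."""
    candidates = predictions.items()
    if min_accuracy is not None:
        feasible = [(m, p) for m, p in candidates if p[0] >= min_accuracy]
        if feasible:                                   # cheapest model that meets the accuracy target
            return min(feasible, key=lambda mp: mp[1][1])[0]
    # otherwise maximize a utility that penalizes cost
    return max(candidates, key=lambda mp: mp[1][0] - cost_weight * mp[1][1])[0]

preds = {"small-model": (0.71, 0.002), "mid-model": (0.83, 0.010), "large-model": (0.90, 0.060)}
print(route(preds, min_accuracy=0.80))                 # -> mid-model (cheapest above the bar)
print(route(preds, cost_weight=1.0))                   # -> large-model under a mild cost penalty
```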


【12】SCALAR: Quantifying Structural Hallucination, Consistency, and Reasoning Gaps in Materials Foundation Models
标题:SCALAR:量化材料基础模型中的结构幻觉、一致性和推理差距
链接:https://arxiv.org/abs/2601.22312

作者:Can Polat,Erchin Serpedin,Mustafa Kurban,Hasan Kurban
摘要:Large language models are increasingly applied to materials science reasoning, yet their behavior under physically structured distribution shifts remains poorly understood. We introduce SCALAR (Structural Consistency And Logic Across Regimes), a benchmark for evaluating geometric scale generalization and its connection to structural hallucination, consistency, and reasoning in materials foundation models. Given canonical crystal representations, models must reason about derived nanoparticle structures obtained through supercell expansion and geometric truncation across length scales spanning a few atoms to over 18,000 atoms, totaling $\approx$100,000 structures from DFT-validated unit cells. SCALAR defines three tasks. (i) CIF to property prediction. (ii) A Chain-of-Thought variant with explicit physics-grounded reasoning. (iii) Inverse retrieval identifying crystals from candidates given target properties. Outputs are evaluated via structured metrics capturing numeric error, hallucination, cross-prompt consistency, monotonic reasoning, output validity, and retrieval regret. Experiments across diverse foundation models reveal large, model-dependent shifts under explicit reasoning, often reducing hallucination and error, but frequently destabilizing consistency or validity. These results demonstrate that geometric scale generalization cannot be inferred from accuracy alone. Supplementary materials are available at https://github.com/KurbanIntelligenceLab/SCALAR.


【13】Tabular Foundation Models Can Do Survival Analysis
标题:表格基础模型可以进行生存分析
链接:https://arxiv.org/abs/2601.22259

作者:Da In Kim,Wei Siang Lai,Kelly W. Zhang
摘要:While tabular foundation models have achieved remarkable success in classification and regression, adapting them to model time-to-event outcomes for survival analysis is non-trivial due to right-censoring, where data observations may end before the event occurs. We develop a classification-based framework that reformulates both static and dynamic survival analysis as a series of binary classification problems by discretizing event times. Censored observations are naturally handled as examples with missing labels at certain time points. This classification formulation enables existing tabular foundation models to perform survival analysis through in-context learning without explicit training. We prove that under standard censoring assumptions, minimizing our binary classification loss recovers the true survival probabilities as the training set size increases. We demonstrate through evaluation across $53$ real-world datasets that off-the-shelf tabular foundation models with this classification formulation outperform classical and deep learning baselines on average over multiple survival metrics.
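
A minimal sketch of the discretize-and-classify reformulation: each time bin becomes a binary "event by time t?" task, and right-censored subjects are simply unlabeled for bins beyond their censoring time. Bin edges and data are illustrative; any probabilistic classifier can then be fit per bin.

```python
# Sketch of survival analysis as a series of binary classification tasks over time bins,
# with right-censored subjects masked out once their label is unknown.
import numpy as np

def expand_to_binary_tasks(times, events, bin_edges):
    """Return per-bin (mask, label) pairs: mask marks subjects whose label is known at that bin."""
    tasks = []
    for t in bin_edges:
        label = (times <= t) & (events == 1)          # event observed by time t
        known = (times > t) | (events == 1)           # censored before t => label unknown (masked out)
        tasks.append((known, label.astype(int)))
    return tasks

times = np.array([2.0, 5.0, 3.0, 8.0])
events = np.array([1, 0, 1, 0])                        # 0 = right-censored
bin_edges = [1.0, 4.0, 7.0]

for t, (known, label) in zip(bin_edges, expand_to_binary_tasks(times, events, bin_edges)):
    print(f"t={t}: usable={known.astype(int)}, event_by_t={label}")
    # a classifier trained on the usable rows estimates P(event by t | x), i.e. 1 - S(t | x)
```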


【14】Nested Slice Sampling: Vectorized Nested Sampling for GPU-Accelerated Inference
标题:嵌套切片采样:用于GPU加速推理的向量化嵌套采样
链接:https://arxiv.org/abs/2601.23252

作者:David Yallup,Namu Kroupa,Will Handley
备注:54 pages, 11 figures
摘要:Model comparison and calibrated uncertainty quantification often require integrating over parameters, but scalable inference can be challenging for complex, multimodal targets. Nested Sampling is a robust alternative to standard MCMC, yet its typically sequential structure and hard constraints make efficient accelerator implementations difficult. This paper introduces Nested Slice Sampling (NSS), a GPU-friendly, vectorized formulation of Nested Sampling that uses Hit-and-Run Slice Sampling for constrained updates. A tuning analysis yields a simple near-optimal rule for setting the slice width, improving high-dimensional behavior and making per-step compute more predictable for parallel execution. Experiments on challenging synthetic targets, high dimensional Bayesian inference, and Gaussian process hyperparameter marginalization show that NSS maintains accurate evidence estimates and high-quality posterior samples, and is particularly robust on difficult multimodal problems where current state-of-the-art methods such as tempered SMC baselines can struggle. An open-source implementation is released to facilitate adoption and reproducibility.


【15】OneFlowSBI: One Model, Many Queries for Simulation-Based Inference
标题:OneFlowSBI:一个模型、多种查询的基于模拟的推理
链接:https://arxiv.org/abs/2601.22951

作者:Mayank Nautiyal,Li Ju,Melker Ernfors,Klara Hagland,Ville Holma,Maximilian Werkö Söderholm,Andreas Hellander,Prashant Singh
摘要:We introduce \textit{OneFlowSBI}, a unified framework for simulation-based inference that learns a single flow-matching generative model over the joint distribution of parameters and observations. Leveraging a query-aware masking distribution during training, the same model supports multiple inference tasks, including posterior sampling, likelihood estimation, and arbitrary conditional distributions, without task-specific retraining. We evaluate \textit{OneFlowSBI} on ten benchmark inference problems and two high-dimensional real-world inverse problems across multiple simulation budgets. \textit{OneFlowSBI} is shown to deliver competitive performance against state-of-the-art generalized inference solvers and specialized posterior estimators, while enabling efficient sampling with few ODE integration steps and remaining robust under noisy and partially observed data.


【16】GRANITE: A Generalized Regional Framework for Identifying Agreement in Feature-Based Explanations
标题:GRANITE:用于识别基于特征的解释间一致性的通用区域框架
链接:https://arxiv.org/abs/2601.22771

作者:Julia Herbinger,Gabriel Laberge,Maximilian Muschalik,Yann Pequignot,Marvin N. Wright,Fabian Fumagalli
摘要:Feature-based explanation methods aim to quantify how features influence the model's behavior, either locally or globally, but different methods often disagree, producing conflicting explanations. This disagreement arises primarily from two sources: how feature interactions are handled and how feature dependencies are incorporated. We propose GRANITE, a generalized regional explanation framework that partitions the feature space into regions where interaction and distribution influences are minimized. This approach aligns different explanation methods, yielding more consistent and interpretable explanations. GRANITE unifies existing regional approaches, extends them to feature groups, and introduces a recursive partitioning algorithm to estimate such regions. We demonstrate its effectiveness on real-world datasets, providing a practical tool for consistent and interpretable feature explanations.


【17】Simulation-based Bayesian inference with ameliorative learned summary statistics -- Part I
标题:具有改善性学习汇总统计量的基于模拟的贝叶斯推理——第一部分
链接:https://arxiv.org/abs/2601.22441

作者:Getachew K. Befekadu
备注:13 pages
摘要:This paper, which is Part 1 of a two-part paper series, considers a simulation-based inference with learned summary statistics, in which such a learned summary statistic serves as an empirical-likelihood with ameliorative effects in the Bayesian setting, when the exact likelihood function associated with the observation data and the simulation model is difficult to obtain in a closed form or computationally intractable. In particular, a transformation technique which leverages the Cressie-Read discrepancy criterion under moment restrictions is used for summarizing the learned statistics between the observation data and the simulation outputs, while preserving the statistical power of the inference. Here, such a transformation of data-to-learned summary statistics also allows the simulation outputs to be conditioned on the observation data, so that the inference task can be performed over certain sample sets of the observation data that are considered as an empirical relevance or believed to be particular importance. Moreover, the simulation-based inference framework discussed in this paper can be extended further, and thus handling weakly dependent observation data. Finally, we remark that such an inference framework is suitable for implementation in distributed computing, i.e., computational tasks involving both the data-to-learned summary statistics and the Bayesian inferencing problem can be posed as a unified distributed inference problem that will exploit distributed optimization and MCMC algorithms for supporting large datasets associated with complex simulation models.


【18】Amortized Simulation-Based Inference in Generalized Bayes via Neural Posterior Estimation
标题:基于神经后验估计的广义Bayes摊销模拟推理
链接:https://arxiv.org/abs/2601.22367

作者:Shiyi Sun,Geoff K. Nicholls,Jeong Eun Lee
摘要:Generalized Bayesian Inference (GBI) tempers a loss with a temperature $β>0$ to mitigate overconfidence and improve robustness under model misspecification, but existing GBI methods typically rely on costly MCMC or SDE-based samplers and must be re-run for each new dataset and each $β$ value. We give the first fully amortized variational approximation to the tempered posterior family $p_β(θ\mid x) \propto π(θ)\,p(x \mid θ)^β$ by training a single $(x,β)$-conditioned neural posterior estimator $q_φ(θ\mid x,β)$ that enables sampling in a single forward pass, without simulator calls or inference-time MCMC. We introduce two complementary training routes: (i) synthesize off-manifold samples $(θ,x) \sim π(θ)\,p(x \mid θ)^β$ and (ii) reweight a fixed base dataset $π(θ)\,p(x \mid θ)$ using self-normalized importance sampling (SNIS). We show that the SNIS-weighted objective provides a consistent forward-KL fit to the tempered posterior with finite weight variance. Across four standard simulation-based inference (SBI) benchmarks, including the chaotic Lorenz-96 system, our $β$-amortized estimator achieves competitive posterior approximations in standard two-sample metrics, matching non-amortized MCMC-based power-posterior samplers over a wide range of temperatures.


检测相关(4篇)

【1】PIDSMaker: Building and Evaluating Provenance-based Intrusion Detection Systems
标题:PIDSMaker:构建和评估基于源的入侵检测系统
链接:https://arxiv.org/abs/2601.22983

作者:Tristan Bilot,Baoxiang Jiang,Thomas Pasquier
摘要:Recent provenance-based intrusion detection systems (PIDSs) have demonstrated strong potential for detecting advanced persistent threats (APTs) by applying machine learning to system provenance graphs. However, evaluating and comparing PIDSs remains difficult: prior work uses inconsistent preprocessing pipelines, non-standard dataset splits, and incompatible ground-truth labeling and metrics. These discrepancies undermine reproducibility, impede fair comparison, and impose substantial re-implementation overhead on researchers. We present PIDSMaker, an open-source framework for developing and evaluating PIDSs under consistent protocols. PIDSMaker consolidates eight state-of-the-art systems into a modular, extensible architecture with standardized preprocessing and ground-truth labels, enabling consistent experiments and apples-to-apples comparisons. A YAML-based configuration interface supports rapid prototyping by composing components across systems without code changes. PIDSMaker also includes utilities for ablation studies, hyperparameter tuning, multi-run instability measurement, and visualization, addressing methodological gaps identified in prior work. We demonstrate PIDSMaker through concrete use cases and release it with preprocessed datasets and labels to support shared evaluation for the PIDS community.


【2】From Data Leak to Secret Misses: The Impact of Data Leakage on Secret Detection Models
标题:从数据泄露到机密漏检:数据泄露对机密检测模型的影响
链接:https://arxiv.org/abs/2601.22946

作者:Farnaz Soltaniani,Mohammad Ghafari
摘要:Machine learning models are increasingly used for software security tasks. These models are commonly trained and evaluated on large Internet-derived datasets, which often contain duplicated or highly similar samples. When such samples are split across training and test sets, data leakage may occur, allowing models to memorize patterns instead of learning to generalize. We investigate duplication in a widely used benchmark dataset of hard coded secrets and show how data leakage can substantially inflate the reported performance of AI-based secret detectors, resulting in a misleading picture of their real-world effectiveness.
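
A hedged sketch of the kind of duplication check the abstract implies: count test samples that are exact or near-duplicates of training samples after light normalization. The normalization and similarity threshold are illustrative choices, not the paper's protocol.

```python
# Sketch of a train/test leakage check: flag test samples that are exact or near-duplicates
# of training samples after light normalization. Threshold and normalization are illustrative.
import re
from difflib import SequenceMatcher

def normalize(s):
    return re.sub(r"\s+", " ", s.strip().lower())

def leaked_fraction(train, test, near_threshold=0.9):
    train_norm = {normalize(s) for s in train}
    leaks = 0
    for s in test:
        ns = normalize(s)
        if ns in train_norm or any(SequenceMatcher(None, ns, t).ratio() >= near_threshold
                                   for t in train_norm):
            leaks += 1
    return leaks / len(test)

train = ['password = "hunter2"', 'api_key = "AKIA1234"', "timeout = 30"]
test = ['PASSWORD = "hunter2"', 'api_key= "AKIA1234"  ', "retries = 5"]
print("leaked fraction of test set:", leaked_fraction(train, test))
```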


【3】When Anomalies Depend on Context: Learning Conditional Compatibility for Anomaly Detection
标题:当异常依赖于上下文时:学习异常检测的条件兼容性
链接:https://arxiv.org/abs/2601.22868

作者:Shashank Mishra,Didier Stricker,Jason Rambach
备注:Preprint. Submitted to ICML 2026. 8 pages main text, plus appendix
摘要:Anomaly detection is often formulated under the assumption that abnormality is an intrinsic property of an observation, independent of context. This assumption breaks down in many real-world settings, where the same object or action may be normal or anomalous depending on latent contextual factors (e.g., running on a track versus on a highway). We revisit \emph{contextual anomaly detection}, classically defined as context-dependent abnormality, and operationalize it in the visual domain, where anomaly labels depend on subject--context compatibility rather than intrinsic appearance. To enable systematic study of this setting, we introduce CAAD-3K, a benchmark that isolates contextual anomalies by controlling subject identity while varying context. We further propose a conditional compatibility learning framework that leverages vision--language representations to model subject--context relationships under limited supervision. Our method substantially outperforms existing approaches on CAAD-3K and achieves state-of-the-art performance on MVTec-AD and VisA, demonstrating that modeling context dependence complements traditional structural anomaly detection. Our code and dataset will be publicly released.


【4】Trackly: A Unified SaaS Platform for User Behavior Analytics and Real Time Rule Based Anomaly Detection
标题:Trackly:一个用于用户行为分析和基于规则的实时异常检测的统一SaaS平台
链接:https://arxiv.org/abs/2601.22800

作者:Md Zahurul Haque,Md. Hafizur Rahman,Yeahyea Sarker
摘要:Understanding user behavior is essential for improving digital experiences, optimizing business conversions, and mitigating threats like account takeovers, fraud, and bot attacks. Most platforms separate product analytics and security, creating fragmented visibility and delayed threat detection. Trackly, a scalable SaaS platform, unifies comprehensive user behavior analytics with real-time, rule-based anomaly detection. It tracks sessions, IP-based geo-location, device/browser fingerprints, and granular events such as page views, add-to-cart, and checkouts. Suspicious activities, such as logins from new devices or locations, impossible travel (Haversine formula), rapid bot-like actions, VPN/proxy usage, or multiple accounts per IP, are flagged via configurable rules with weighted risk scoring, enabling transparent, explainable decisions. A real-time dashboard provides global session maps, DAU/MAU, bounce rates, and session durations. Integration is simplified with a lightweight JavaScript SDK and secure REST APIs. Implemented on a multi-tenant microservices stack (ASP.NET Core, MongoDB, RabbitMQ, Next.js), Trackly achieved 98.1% accuracy, 97.7% precision, and 2.25% false positives on synthetic datasets, proving its efficiency for SMEs and ecommerce.
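
The "impossible travel" rule mentioned above reduces to a Haversine distance divided by the time between logins; the 900 km/h cutoff below is an illustrative threshold, not a value from the paper.

```python
# Sketch of an "impossible travel" rule: Haversine distance between two session locations
# divided by the time gap; flag speeds no commercial flight could reach.
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0                                   # mean Earth radius in km
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))

def impossible_travel(prev_login, new_login, max_kmh=900.0):
    """Each login: (lat, lon, unix_seconds). Returns True if the implied speed is infeasible."""
    dist = haversine_km(prev_login[0], prev_login[1], new_login[0], new_login[1])
    hours = max((new_login[2] - prev_login[2]) / 3600.0, 1e-6)
    return dist / hours > max_kmh

dhaka = (23.81, 90.41, 0)
london = (51.51, -0.13, 2 * 3600)               # same account, 2 hours later
print("flag as impossible travel:", impossible_travel(dhaka, london))  # ~8,000 km in 2 h -> True
```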


分类|识别(4篇)

【1】Leveraging Convolutional Sparse Autoencoders for Robust Movement Classification from Low-Density sEMG
标题:利用卷积稀疏自动编码器从低密度sEMG中进行鲁棒的运动分类
链接:https://arxiv.org/abs/2601.23011

作者:Blagoj Hristov,Zoran Hadzi-Velkov,Katerina Hadzi-Velkova Saneva,Gorjan Nadzinski,Vesna Ojleska Latkoska
摘要:Reliable control of myoelectric prostheses is often hindered by high inter-subject variability and the clinical impracticality of high-density sensor arrays. This study proposes a deep learning framework for accurate gesture recognition using only two surface electromyography (sEMG) channels. The method employs a Convolutional Sparse Autoencoder (CSAE) to extract temporal feature representations directly from raw signals, eliminating the need for heuristic feature engineering. On a 6-class gesture set, our model achieved a multi-subject F1-score of 94.3% $\pm$ 0.3%. To address subject-specific differences, we present a few-shot transfer learning protocol that improved performance on unseen subjects from a baseline of 35.1% $\pm$ 3.1% to 92.3% $\pm$ 0.9% with minimal calibration data. Furthermore, the system supports functional extensibility through an incremental learning strategy, allowing for expansion to a 10-class set with a 90.0% $\pm$ 0.2% F1-score without full model retraining. By combining high precision with minimal computational and sensor overhead, this framework provides a scalable and efficient approach for the next generation of affordable and adaptive prosthetic systems.


【2】Label-Efficient Monitoring of Classification Models via Stratified Importance Sampling
标题:通过分层重要性抽样对分类模型进行标签高效监控
链接:https://arxiv.org/abs/2601.22326

作者:Lupo Marsigli,Angel Lopez de Haro
备注:24 pages
摘要:Monitoring the performance of classification models in production is critical yet challenging due to strict labeling budgets, one-shot batch acquisition of labels and extremely low error rates. We propose a general framework based on Stratified Importance Sampling (SIS) that directly addresses these constraints in model monitoring. While SIS has previously been applied in specialized domains, our theoretical analysis establishes its broad applicability to the monitoring of classification models. Under mild conditions, SIS yields unbiased estimators with strict finite-sample mean squared error (MSE) improvements over both importance sampling (IS) and stratified random sampling (SRS). The framework does not rely on optimally defined proposal distributions or strata: even with noisy proxies and sub-optimal stratification, SIS can improve estimator efficiency compared to IS or SRS individually, though extreme proposal mismatch may limit these gains. Experiments across binary and multiclass tasks demonstrate consistent efficiency improvements under fixed label budgets, underscoring SIS as a principled, label-efficient, and operationally lightweight methodology for post-deployment model monitoring.
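
A rough sketch of how a stratified importance-sampling error estimate can be assembled: strata come from any grouping (confidence buckets here), items within each stratum are sampled under a proxy proposal, and importance weights keep the per-stratum estimates unbiased. The proxy, strata, and synthetic data are assumptions for illustration.

```python
# Sketch of stratified importance sampling (SIS) for estimating a deployed classifier's
# error rate under a small labeling budget. Strata are confidence buckets; within each
# stratum, items are sampled with probability proportional to a "likely wrong" proxy.
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
confidence = rng.beta(5, 1, size=N)                          # model confidence on production data
true_error = rng.random(N) < 0.5 * (1 - confidence)          # unknown ground-truth error indicators

strata = np.digitize(confidence, [0.5, 0.8, 0.95])           # 4 confidence buckets
budget_per_stratum = 50
estimate = 0.0
for s in np.unique(strata):
    idx = np.flatnonzero(strata == s)
    proxy = (1 - confidence[idx]) + 1e-6                     # sampling proposal inside the stratum
    q = proxy / proxy.sum()
    picked = rng.choice(idx, size=budget_per_stratum, replace=True, p=q)
    w = 1.0 / (len(idx) * q[np.searchsorted(idx, picked)])   # importance weights vs. uniform target
    mu_s = np.mean(true_error[picked] * w)                   # unbiased stratum error estimate
    estimate += (len(idx) / N) * mu_s

print(f"SIS estimate: {estimate:.4f}  true error rate: {true_error.mean():.4f}")
```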


【3】Privacy-Preserving Sensor-Based Human Activity Recognition for Low-Resource Healthcare Using Classical Machine Learning
标题:使用经典机器学习进行基于传感器的低资源医疗保健保护隐私的人类活动识别
链接:https://arxiv.org/abs/2601.22265

作者:Ramakant Kumar,Pravin Kumar
摘要:Limited access to medical infrastructure forces elderly and vulnerable patients to rely on home-based care, often leading to neglect and poor adherence to therapeutic exercises such as yoga or physiotherapy. To address this gap, we propose a low-cost and automated human activity recognition (HAR) framework based on wearable inertial sensors and machine learning. Activity data, including walking, walking upstairs, walking downstairs, sitting, standing, and lying, were collected using accelerometer and gyroscope measurements. Four classical classifiers, Logistic Regression, Random Forest, Support Vector Machine (SVM), and k-Nearest Neighbors (k-NN), were evaluated and compared with the proposed Support Tensor Machine (STM). Experimental results show that SVM achieved an accuracy of 93.33 percent, while Logistic Regression, Random Forest, and k-NN achieved 91.11 percent. In contrast, STM significantly outperformed these models, achieving a test accuracy of 96.67 percent and the highest cross-validation accuracy of 98.50 percent. Unlike conventional methods, STM leverages tensor representations to preserve spatio-temporal motion dynamics, resulting in robust classification across diverse activities. The proposed framework demonstrates strong potential for remote healthcare, elderly assistance, child activity monitoring, yoga feedback, and smart home wellness, offering a scalable solution for low-resource and rural healthcare settings.


【4】Multitask Learning for Earth Observation Data Classification with Hybrid Quantum Network
标题:混合量子网络对地观测数据分类的多任务学习
链接:https://arxiv.org/abs/2601.22195

作者:Fan Fan,Yilei Shi,Tobias Guggemos,Xiao Xiang Zhu
摘要:Quantum machine learning (QML) has gained increasing attention as a potential solution to address the challenges of computation requirements in the future. Earth observation (EO) has entered the era of Big Data, and the computational demands for effectively analyzing large EO data with complex deep learning models have become a bottleneck. Motivated by this, we aim to leverage quantum computing for EO data classification and explore its advantages despite the current limitations of quantum devices. This paper presents a hybrid model that incorporates multitask learning to assist efficient data encoding and employs a location weight module with quantum convolution operations to extract valid features for classification. The validity of our proposed model was evaluated using multiple EO benchmarks. Additionally, we experimentally explored the generalizability of our model and investigated the factors contributing to its advantage, highlighting the potential of QML in EO data analysis.


表征(6篇)

【1】Ensuring Semantics in Weights of Implicit Neural Representations through the Implicit Function Theorem
标题:通过隐式函数定理确保隐式神经表示权重的语义
链接:https://arxiv.org/abs/2601.23181

作者:Tianming Qiu,Christos Sonis,Hao Shen
摘要:Weight Space Learning (WSL), which frames neural network weights as a data modality, is an emerging field with potential for tasks like meta-learning or transfer learning. Particularly, Implicit Neural Representations (INRs) provide a convenient testbed, where each set of weights determines the corresponding individual data sample as a mapping from coordinates to contextual values. So far, a precise theoretical explanation for the mechanism of encoding semantics of data into network weights is still missing. In this work, we deploy the Implicit Function Theorem (IFT) to establish a rigorous mapping between the data space and its latent weight representation space. We analyze a framework that maps instance-specific embeddings to INR weights via a shared hypernetwork, achieving performance competitive with existing baselines on downstream classification tasks across 2D and 3D datasets. These findings offer a theoretical lens for future investigations into network weights.


【2】Local Intrinsic Dimension of Representations Predicts Alignment and Generalization in AI Models and Human Brain
标题:表示的局部内在维数预测AI模型和人脑中的对齐和泛化
链接:https://arxiv.org/abs/2601.22722

作者:Junjie Yu,Wenxiao Ma,Chen Wei,Jianyu Zhang,Haotian Deng,Zihan Deng,Quanying Liu
摘要:Recent work has found that neural networks with stronger generalization tend to exhibit higher representational alignment with one another across architectures and training paradigms. In this work, we show that models with stronger generalization also align more strongly with human neural activity. Moreover, generalization performance, model--model alignment, and model--brain alignment are all significantly correlated with each other. We further show that these relationships can be explained by a single geometric property of learned representations: the local intrinsic dimension of embeddings. Lower local dimension is consistently associated with stronger model--model alignment, stronger model--brain alignment, and better generalization, whereas global dimension measures fail to capture these effects. Finally, we find that increasing model capacity and training data scale systematically reduces local intrinsic dimension, providing a geometric account of the benefits of scaling. Together, our results identify local intrinsic dimension as a unifying descriptor of representational convergence in artificial and biological systems.
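
As a concrete illustration of the quantity being correlated with alignment and generalization, the sketch below estimates intrinsic dimension with the TwoNN estimator (Facco et al., 2017) applied inside local neighborhoods. The abstract does not specify the exact local-ID estimator, so this is a stand-in under that assumption.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def two_nn_intrinsic_dimension(x):
    """TwoNN estimate of intrinsic dimension from the ratio of each point's
    second- to first-nearest-neighbour distance (Pareto MLE form)."""
    dists, _ = NearestNeighbors(n_neighbors=3).fit(x).kneighbors(x)
    mu = dists[:, 2] / dists[:, 1]          # r2 / r1, column 0 is the point itself
    return len(mu) / np.sum(np.log(mu))

def local_intrinsic_dimension(embeddings, n_neighbors=64):
    """Estimate ID inside each point's local neighbourhood and average."""
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(embeddings)
    _, idx = nn.kneighbors(embeddings)
    return np.mean([two_nn_intrinsic_dimension(embeddings[i]) for i in idx])

# Synthetic check: a 5-dimensional linear subspace embedded in 64 ambient dimensions.
x = np.random.default_rng(0).normal(size=(2000, 5)) @ np.random.default_rng(1).normal(size=(5, 64))
print(local_intrinsic_dimension(x))   # roughly 5 despite the 64 ambient dimensions
```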


【3】PEFT-MuTS: A Multivariate Parameter-Efficient Fine-Tuning Framework for Remaining Useful Life Prediction based on Cross-domain Time Series Representation Model
标题:PEFT-MuTS:基于跨域时间序列表示模型的多元参数高效的剩余使用寿命预测微调框架
链接:https://arxiv.org/abs/2601.22631

作者:En Fu,Yanyan Hu,Changhua Hu,Zengwang Jin,Kaixiang Peng
摘要:The application of data-driven remaining useful life (RUL) prediction has long been constrained by the availability of large amounts of degradation data. Mainstream solutions such as domain adaptation and meta-learning still rely on large amounts of historical degradation data from equipment that is identical or similar to the target, which imposes significant limitations in practical applications. This study investigates PEFT-MuTS, a Parameter-Efficient Fine-Tuning framework for few-shot RUL prediction, built on cross-domain pre-trained time-series representation models. Contrary to the widely held view that knowledge transfer in RUL prediction can only occur within similar devices, we demonstrate that substantial benefits can be achieved through a pre-training process with large-scale cross-domain time series datasets. An independent feature tuning network and a meta-variable-based low-rank multivariate fusion mechanism are developed to enable the pre-trained univariate time-series representation backbone model to fully exploit the multivariate relationships in degradation data for the downstream RUL prediction task. Additionally, we introduce a zero-initialized regressor that stabilizes the fine-tuning process under few-shot conditions. Experiments on aero-engine and industrial bearing datasets demonstrate that our method can achieve effective RUL prediction even when less than 1% of the samples of the target equipment are used. Meanwhile, it substantially outperforms conventional supervised and few-shot approaches while markedly reducing the data required to achieve high predictive accuracy. Our code is available at https://github.com/fuen1590/PEFT-MuTS.


【4】Action-Sufficient Goal Representations
标题:动作充分的目标表示
链接:https://arxiv.org/abs/2601.22496

作者:Jinu Hyeon,Woobin Park,Hongjoon Ahn,Taesup Moon
摘要:Hierarchical policies in offline goal-conditioned reinforcement learning (GCRL) address long-horizon tasks by decomposing control into high-level subgoal planning and low-level action execution. A critical design choice in such architectures is the goal representation: the compressed encoding of goals that serves as the interface between these levels. Existing approaches commonly derive goal representations while learning value functions, implicitly assuming that preserving information sufficient for value estimation is adequate for optimal control. We show that this assumption can fail, even when the value estimation is exact, as such representations may collapse goal states that need to be differentiated for action learning. To address this, we introduce an information-theoretic framework that defines action sufficiency, a condition on goal representations necessary for optimal action selection. We prove that value sufficiency does not imply action sufficiency and empirically verify that the latter is more strongly associated with control success in a discrete environment. We further demonstrate that standard log-loss training of low-level policies naturally induces action-sufficient representations. Our experimental results on a popular benchmark demonstrate that our actor-derived representations consistently outperform representations learned via value estimation.


【5】Learning Policy Representations for Steerable Behavior Synthesis
标题:可操纵行为合成的学习策略表示
链接:https://arxiv.org/abs/2601.22350

作者:Beiming Li,Sergio Rozada,Alejandro Ribeiro
摘要:Given a Markov decision process (MDP), we seek to learn representations for a range of policies to facilitate behavior steering at test time. As policies of an MDP are uniquely determined by their occupancy measures, we propose modeling policy representations as expectations of state-action feature maps with respect to occupancy measures. We show that these representations can be approximated uniformly for a range of policies using a set-based architecture. Our model encodes a set of state-action samples into a latent embedding, from which we decode both the policy and its value functions corresponding to multiple rewards. We use a variational generative approach to induce a smooth latent space, and further shape it with contrastive learning so that latent distances align with differences in value functions. This geometry permits gradient-based optimization directly in the latent space. Leveraging this capability, we solve a novel behavior synthesis task, where policies are steered to satisfy previously unseen value function constraints without additional training.


【6】Molecular Representations in Implicit Functional Space via Hyper-Networks
标题:通过超网络在隐式功能空间中的分子表示
链接:https://arxiv.org/abs/2601.22327

作者:Zehong Wang,Xiaolong Han,Qi Yang,Xiangru Tang,Fang Wu,Xiaoguang Guo,Weixiang Sun,Tianyi Ma,Pietro Lio,Le Cong,Sheng Wang,Chuxu Zhang,Yanfang Ye
摘要 :Molecular representations fundamentally shape how machine learning systems reason about molecular structure and physical properties. Most existing approaches adopt a discrete pipeline: molecules are encoded as sequences, graphs, or point clouds, mapped to fixed-dimensional embeddings, and then used for task-specific prediction. This paradigm treats molecules as discrete objects, despite their intrinsically continuous and field-like physical nature. We argue that molecular learning can instead be formulated as learning in function space. Specifically, we model each molecule as a continuous function over three-dimensional (3D) space and treat this molecular field as the primary object of representation. From this perspective, conventional molecular representations arise as particular sampling schemes of an underlying continuous object. We instantiate this formulation with MolField, a hyper-network-based framework that learns distributions over molecular fields. To ensure physical consistency, these functions are defined over canonicalized coordinates, yielding invariance to global SE(3) transformations. To enable learning directly over functions, we introduce a structured weight tokenization and train a sequence-based hyper-network to model a shared prior over molecular fields. We evaluate MolField on molecular dynamics and property prediction. Our results show that treating molecules as continuous functions fundamentally changes how molecular representations generalize across tasks and yields downstream behavior that is stable to how molecules are discretized or queried.


编码器(1篇)

【1】Beyond Activation Patterns: A Weight-Based Out-of-Context Explanation of Sparse Autoencoder Features
标题:超越激活模式:稀疏自动编码器功能的基于权重的脱离上下文解释
链接:https://arxiv.org/abs/2601.22447

作者:Yiting Liu,Zhi-Hong Deng
摘要:Sparse autoencoders (SAEs) have emerged as a powerful technique for decomposing language model representations into interpretable features. Current interpretation methods infer feature semantics from activation patterns, but overlook that features are trained to reconstruct activations that serve computational roles in the forward pass. We introduce a novel weight-based interpretation framework that measures functional effects through direct weight interactions, requiring no activation data. Through three experiments on Gemma-2 and Llama-3.1 models, we demonstrate that (1) 1/4 of features directly predict output tokens, (2) features actively participate in attention mechanisms with depth-dependent structure, and (3) semantic and non-semantic feature populations exhibit distinct distribution profiles in attention circuits. Our analysis provides the missing out-of-context half of SAE feature interpretability.


优化|敛散性(11篇)

【1】End-to-end Optimization of Belief and Policy Learning in Shared Autonomy Paradigms
标题:共享自治范式中信念和政策学习的端到端优化
链接:https://arxiv.org/abs/2601.23285

作者:MH Farhadi,Ali Rabiee,Sima Ghafoori,Anna Cetera,Andrew Fisher,Reza Abiri
摘要:Shared autonomy systems require principled methods for inferring user intent and determining appropriate assistance levels. This is a central challenge in human-robot interaction, where systems must be successful while being mindful of user agency. Previous approaches relied on static blending ratios or separated goal inference from assistance arbitration, leading to suboptimal performance in unstructured environments. We introduce BRACE (Bayesian Reinforcement Assistance with Context Encoding), a novel framework that fine-tunes Bayesian intent inference and context-adaptive assistance through an architecture enabling end-to-end gradient flow between intent inference and assistance arbitration. Our pipeline conditions collaborative control policies on environmental context and complete goal probability distributions. We provide analysis showing (1) optimal assistance levels should decrease with goal uncertainty and increase with environmental constraint severity, and (2) integrating belief information into policy learning yields a quadratic expected regret advantage over sequential approaches. We validated our algorithm against SOTA methods (IDA, DQN) using a three-part evaluation progressively isolating distinct challenges of end-effector control: (1) core human-interaction dynamics in a 2D human-in-the-loop cursor task, (2) non-linear dynamics of a robotic arm, and (3) integrated manipulation under goal ambiguity and environmental constraints. We demonstrate improvements over SOTA, achieving 6.3% higher success rates and 41% increased path efficiency, and 36.3% success rate and 87% path efficiency improvement over unassisted control. Our results confirmed that integrated optimization is most beneficial in complex, goal-ambiguous scenarios, and is generalizable across robotic domains requiring goal-directed assistance, advancing the SOTA for adaptive shared autonomy.


【2】Optimal Fair Aggregation of Crowdsourced Noisy Labels using Demographic Parity Constraints
标题:使用人口平价约束的众包噪音标签的最佳公平聚集
链接:https://arxiv.org/abs/2601.23221

作者:Gabriel Singer,Samuel Gruffaz,Olivier Vo Van,Nicolas Vayatis,Argyris Kalogeratos
摘要:As acquiring reliable ground-truth labels is usually costly, or infeasible, crowdsourcing and aggregation of noisy human annotations is the typical resort. Aggregating subjective labels, though, may amplify individual biases, particularly regarding sensitive features, raising fairness concerns. Nonetheless, fairness in crowdsourced aggregation remains largely unexplored, with no existing convergence guarantees and only limited post-processing approaches for enforcing $\varepsilon$-fairness under demographic parity. We address this gap by analyzing the fairness of crowdsourced aggregation methods within the $\varepsilon$-fairness framework, for Majority Vote and Optimal Bayesian aggregation. In the small-crowd regime, we derive an upper bound on the fairness gap of Majority Vote in terms of the fairness gaps of the individual annotators. We further show that the fairness gap of the aggregated consensus converges exponentially fast to that of the ground-truth under interpretable conditions. Since ground-truth itself may still be unfair, we generalize a state-of-the-art multiclass fairness post-processing algorithm from the continuous to the discrete setting, which enforces strict demographic parity constraints to any aggregation rule. Experiments on synthetic and real datasets demonstrate the effectiveness of our approach and corroborate the theoretical insights.
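
The two quantities the analysis revolves around, the majority-vote consensus and its demographic parity gap, are simple to compute; a minimal sketch with synthetic annotations follows.

```python
import numpy as np

def majority_vote(annotations):
    """annotations: (n_items, n_annotators) array of 0/1 labels."""
    return (annotations.mean(axis=1) >= 0.5).astype(int)

def demographic_parity_gap(y_hat, sensitive):
    """|P(y_hat = 1 | A = 0) - P(y_hat = 1 | A = 1)| for a binary sensitive attribute."""
    p0 = y_hat[sensitive == 0].mean()
    p1 = y_hat[sensitive == 1].mean()
    return abs(p0 - p1)

# Toy example: 5 annotators, 200 items, random binary sensitive attribute.
rng = np.random.default_rng(0)
ann = rng.integers(0, 2, size=(200, 5))
a = rng.integers(0, 2, size=200)
print(demographic_parity_gap(majority_vote(ann), a))
```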


【3】Value-at-Risk Constrained Policy Optimization
标题:受风险价值约束的政策优化
链接:https://arxiv.org/abs/2601.22993

作者:Rohan Tangri,Jan-Peter Calliess
摘要:We introduce the Value-at-Risk Constrained Policy Optimization algorithm (VaR-CPO), a sample efficient and conservative method designed to optimize Value-at-Risk (VaR) constraints directly. Empirically, we demonstrate that VaR-CPO is capable of safe exploration, achieving zero constraint violations during training in feasible environments, a critical property that baseline methods fail to uphold. To overcome the inherent non-differentiability of the VaR constraint, we employ the one-sided Chebyshev inequality to obtain a tractable surrogate based on the first two moments of the cost return. Additionally, by extending the trust-region framework of the Constrained Policy Optimization (CPO) method, we provide rigorous worst-case bounds for both policy improvement and constraint violation during the training process.
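
The abstract's use of the one-sided Chebyshev (Cantelli) inequality admits a short worked form. One plausible instantiation, under the assumption that the VaR constraint asks for $\mathbb{P}(C \ge d) \le 1-\alpha$ with $C$ the cost return and $d$ the cost limit, is the moment-based surrogate below; the paper's exact surrogate may be parameterized differently.

```latex
% Cantelli's inequality bounds the upper tail of the cost return C using only
% its mean \mu_C and standard deviation \sigma_C:
\mathbb{P}\!\left(C \ge \mu_C + t\right) \le \frac{\sigma_C^2}{\sigma_C^2 + t^2}, \qquad t > 0.
% Choosing t = \sigma_C\sqrt{\alpha/(1-\alpha)} makes the right-hand side equal to 1-\alpha,
% so the smooth constraint on the first two moments
\mu_C + \sigma_C\sqrt{\tfrac{\alpha}{1-\alpha}} \;\le\; d
% implies \mathbb{P}(C \ge d) \le 1-\alpha, i.e. the VaR constraint is satisfied.
```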


【4】OptiMAG: Structure-Semantic Alignment via Unbalanced Optimal Transport
标题:OptiMAG:通过非平衡最优传输实现结构-语义对齐
链接:https://arxiv.org/abs/2601.22856

作者:Yilong Zuo,Xunkai Li,Zhihan Zhang,Qiangqiang Dai,Ronghua Li,Guoren Wang
摘要:Multimodal Attributed Graphs (MAGs) have been widely adopted for modeling complex systems by integrating multi-modal information, such as text and images, on nodes. However, we identify a discrepancy between the implicit semantic structure induced by different modality embeddings and the explicit graph structure. For instance, neighbors in the explicit graph structure may be close in one modality but distant in another. Since existing methods typically perform message passing over the fixed explicit graph structure, they inadvertently aggregate dissimilar features, introducing modality-specific noise and impeding effective node representation learning. To address this, we propose OptiMAG, an Unbalanced Optimal Transport-based regularization framework. OptiMAG employs the Fused Gromov-Wasserstein distance to explicitly guide cross-modal structural consistency within local neighborhoods, effectively mitigating structural-semantic conflicts. Moreover, a KL divergence penalty enables adaptive handling of cross-modal inconsistencies. This framework can be seamlessly integrated into existing multimodal graph models, acting as an effective drop-in regularizer. Experiments demonstrate that OptiMAG consistently outperforms baselines across multiple tasks, ranging from graph-centric tasks (e.g., node classification, link prediction) to multimodal-centric generation tasks (e.g., graph2text, graph2image). The source code will be available upon acceptance.


【5】A Step Back: Prefix Importance Ratio Stabilizes Policy Optimization
标题:退一步:前置重要性比稳定政策优化
链接:https://arxiv.org/abs/2601.22718

作者:Shiye Lei,Zhihao Cheng,Dacheng Tao
摘要:Reinforcement learning (RL) post-training has increasingly demonstrated strong ability to elicit reasoning behaviors in large language models (LLMs). For training efficiency, rollouts are typically generated in an off-policy manner using an older sampling policy and then used to update the current target policy. To correct the resulting discrepancy between the sampling and target policies, most existing RL objectives rely on a token-level importance sampling ratio, primarily due to its computational simplicity and numerical stability. However, we observe that token-level correction often leads to unstable training dynamics when the degree of off-policyness is large. In this paper, we revisit LLM policy optimization under off-policy conditions and show that the theoretically rigorous correction term is the prefix importance ratio, and that relaxing it to a token-level approximation can induce instability in RL post-training. To stabilize LLM optimization under large off-policy drift, we propose a simple yet effective objective, Minimum Prefix Ratio (MinPRO). MinPRO replaces the unstable cumulative prefix ratio with a non-cumulative surrogate based on the minimum token-level ratio observed in the preceding prefix. Extensive experiments on both dense and mixture-of-experts LLMs, across multiple mathematical reasoning benchmarks, demonstrate that MinPRO substantially improves training stability and peak performance in off-policy regimes.
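
To make the distinction concrete, the sketch below computes, for a single rollout, the token-level ratios used by most objectives, the cumulative prefix ratio identified as the theoretically rigorous correction, and the non-cumulative MinPRO-style surrogate built from the running minimum of token ratios. Whether the minimum includes the current token is an assumption of this sketch.

```python
import numpy as np

def importance_ratios(logp_target, logp_sampler):
    """Per-token, prefix, and minimum-prefix importance ratios for one rollout.

    logp_target / logp_sampler: per-token log-probabilities of the generated
    tokens under the current (target) and the stale (sampling) policy.
    """
    token_ratio = np.exp(logp_target - logp_sampler)      # token-level correction
    prefix_ratio = np.cumprod(token_ratio)                 # theoretically exact, but high variance
    min_prefix = np.minimum.accumulate(token_ratio)        # non-cumulative surrogate: min over the prefix
    return token_ratio, prefix_ratio, min_prefix

# Toy example with a mildly off-policy rollout of 6 tokens.
rng = np.random.default_rng(0)
lp_smp = np.log(rng.uniform(0.2, 0.9, size=6))
lp_tgt = lp_smp + rng.normal(scale=0.3, size=6)
print(importance_ratios(lp_tgt, lp_smp))
```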


【6】DRL-Enabled Trajectory Planning for UAV-Assisted VLC: Optimal Altitude and Reward Design
标题:面向无人机辅助VLC的DRL轨迹规划:最优高度与奖励设计
链接:https://arxiv.org/abs/2601.22512

作者:Tian-Tian Lin,Yi Liu,Xiao-Wei Tang,Yunmei Shi,Yi Huang,Zhongxiang Wei,Qingqing Wu,Yuhan Dong
摘要:Recently, the integration of unmanned aerial vehicle (UAV) and visible light communication (VLC) technologies has emerged as a promising solution to offer flexible communication and efficient lighting. This letter investigates the three-dimensional trajectory planning in a UAV-assisted VLC system, where a UAV is dispatched to collect data from ground users (GUs). The core objective is to develop a trajectory planning framework that minimizes UAV flight distance, which is equivalent to maximizing the data collection efficiency. This issue is formulated as a challenging mixed-integer non-convex optimization problem. To tackle it, we first derive a closed-form optimal flight altitude under specific VLC channel gain threshold. Subsequently, we optimize the UAV horizontal trajectory by integrating a novel pheromone-driven reward mechanism with the twin delayed deep deterministic policy gradient algorithm, which enables adaptive UAV motion strategy in complex environments. Simulation results validate that the derived optimal altitude effectively reduces the flight distance by up to 35% compared to baseline methods. Additionally, the proposed reward mechanism significantly shortens the convergence steps by approximately 50%, demonstrating notable efficiency gains in the context of UAV-assisted VLC data collection.


【7】AsyncMesh: Fully Asynchronous Optimization for Data and Pipeline Parallelism
标题:AsyncMesh:数据与流水线并行的完全异步优化
链接:https://arxiv.org/abs/2601.22442

作者:Thalaiyasingam Ajanthan,Sameera Ramasinghe,Gil Avraham,Hadi Mohaghegh Dolatabadi,Chamin P Hewa Koneputugodage,Violetta Shevchenko,Yan Zuo,Alexander Long
摘要:Data and pipeline parallelism are key strategies for scaling neural network training across distributed devices, but their high communication cost necessitates co-located computing clusters with fast interconnects, limiting their scalability. We address this communication bottleneck by introducing asynchronous updates across both parallelism axes, relaxing the co-location requirement at the expense of introducing staleness between pipeline stages and data parallel replicas. To mitigate staleness, for pipeline parallelism, we adopt a weight look-ahead approach, and for data parallelism, we introduce an asynchronous sparse averaging method equipped with an exponential moving average based correction mechanism. We provide convergence guarantees for both sparse averaging and asynchronous updates. Experiments on large-scale language models (up to 1B parameters) demonstrate that our approach matches the performance of the fully synchronous baseline, while significantly reducing communication overhead.


【8】Optimization, Generalization and Differential Privacy Bounds for Gradient Descent on Kolmogorov-Arnold Networks
标题:Kolmogorov-Arnold网络上梯度下降的优化、泛化与差分隐私界
链接:https://arxiv.org/abs/2601.22409

作者:Puyu Wang,Junyu Zhou,Philipp Liznerski,Marius Kloft
备注:41 pages, 3 figures
摘要:Kolmogorov--Arnold Networks (KANs) have recently emerged as a structured alternative to standard MLPs, yet a principled theory for their training dynamics, generalization, and privacy properties remains limited. In this paper, we analyze gradient descent (GD) for training two-layer KANs and derive general bounds that characterize their training dynamics, generalization, and utility under differential privacy (DP). As a concrete instantiation, we specialize our analysis to logistic loss under an NTK-separable assumption, where we show that polylogarithmic network width suffices for GD to achieve an optimization rate of order $1/T$ and a generalization rate of order $1/n$, with $T$ denoting the number of GD iterations and $n$ the sample size. In the private setting, we characterize the noise required for $(\varepsilon,\delta)$-DP and obtain a utility bound of order $\sqrt{d}/(n\varepsilon)$ (with $d$ the input dimension), matching the classical lower bound for general convex Lipschitz problems. Our results imply that polylogarithmic width is not only sufficient but also necessary under differential privacy, revealing a qualitative gap between non-private (sufficiency only) and private (necessity also emerges) training regimes. Experiments further illustrate how these theoretical insights can guide practical choices, including network width selection and early stopping.


【9】Purely Agentic Black-Box Optimization for Biological Design
标题:生物设计的纯智能体黑盒优化
链接:https://arxiv.org/abs/2601.22382

作者:Natalie Maus,Yimeng Zeng,Haydn Thomas Jones,Yining Huang,Gaurav Ng Goel,Alden Rose,Kyurae Kim,Hyun-Su Lee,Marcelo Der Torossian Torres,Fangping Wan,Cesar de la Fuente-Nunez,Mark Yatskar,Osbert Bastani,Jacob R. Gardner
摘要:Many key challenges in biological design-such as small-molecule drug discovery, antimicrobial peptide development, and protein engineering-can be framed as black-box optimization over vast, complex structured spaces. Existing methods rely mainly on raw structural data and struggle to exploit the rich scientific literature. While large language models (LLMs) have been added to these pipelines, they have been confined to narrow roles within structure-centered optimizers. We instead cast biological black-box optimization as a fully agentic, language-based reasoning process. We introduce Purely Agentic BLack-box Optimization (PABLO), a hierarchical agentic system that uses scientific LLMs pretrained on chemistry and biology literature to generate and iteratively refine biological candidates. On both the standard GuacaMol molecular design and antimicrobial peptide optimization tasks, PABLO achieves state-of-the-art performance, substantially improving sample efficiency and final objective values over established baselines. Compared to prior optimization methods that incorporate LLMs, PABLO achieves competitive token usage per run despite relying on LLMs throughout the optimization loop. Beyond raw performance, the agentic formulation offers key advantages for realistic design: it naturally incorporates semantic task descriptions, retrieval-augmented domain knowledge, and complex constraints. In follow-up in vitro validation, PABLO-optimized peptides showed strong activity against drug-resistant pathogens, underscoring the practical potential of PABLO for therapeutic discovery.


【10】Riemannian Lyapunov Optimizer: A Unified Framework for Optimization
标题:黎曼李雅普诺夫优化器:一个统一的优化框架
链接:https://arxiv.org/abs/2601.22284

作者:Yixuan Wang,Omkar Sudhir Patil,Warren E. Dixon
备注:22 pages, 4 figures
摘要:We introduce Riemannian Lyapunov Optimizers (RLOs), a family of optimization algorithms that unifies classic optimizers within one geometric framework. Unlike heuristic improvements to existing optimizers, RLOs are systematically derived from a novel control-theoretic framework that reinterprets optimization as an extended state discrete-time controlled dynamical system on a Riemannian parameter manifold. Central to this framework is the identification of a Normally Attracting Invariant Manifold (NAIM), which organizes training dynamics into two distinct stages: rapid alignment of the speed state to a target graph, followed by controlled evolution within it. We formalize this by constructing a strict Lyapunov function that certifies convergence to a target manifold. This perspective yields a constructive "optimizer generator" that not only recovers classic algorithms but enables the principled design of RLOs. We validate our theory via geometric diagnostics and demonstrate that grounding optimizer design in control theory yields state-of-the-art performance in large-scale benchmarks. Overall, RLOs bridge control theory and modern machine learning optimization, providing a unified language and a systematic toolkit for designing stable, effective optimizers.


【11】Is Hierarchical Quantization Essential for Optimal Reconstruction?
标题:分层量化对于最佳重建至关重要吗?
链接:https://arxiv.org/abs/2601.22244

作者:Shirin Reyhanian,Laurenz Wiskott
备注:To appear in the Proceedings of ICPRAM 2026. Code available at: https://github.com/wiskott-lab/single-vs-hier-recon
摘要 :Vector-quantized variational autoencoders (VQ-VAEs) are central to models that rely on high reconstruction fidelity, from neural compression to generative pipelines. Hierarchical extensions, such as VQ-VAE2, are often credited with superior reconstruction performance because they split global and local features across multiple levels. However, since higher levels derive all their information from lower levels, they should not carry additional reconstructive content beyond what the lower-level already encodes. Combined with recent advances in training objectives and quantization mechanisms, this leads us to ask whether a single-level VQ-VAE, with matched representational budget and no codebook collapse, can equal the reconstruction fidelity of its hierarchical counterpart. Although the multi-scale structure of hierarchical models may improve perceptual quality in downstream tasks, the effect of hierarchy on reconstruction accuracy, isolated from codebook utilization and overall representational capacity, remains empirically underexamined. We revisit this question by comparing a two-level VQ-VAE and a capacity-matched single-level model on high-resolution ImageNet images. Consistent with prior observations, we confirm that inadequate codebook utilization limits single-level VQ-VAEs and that overly high-dimensional embeddings destabilize quantization and increase codebook collapse. We show that lightweight interventions such as initialization from data, periodic reset of inactive codebook vectors, and systematic tuning of codebook hyperparameters significantly reduce collapse. Our results demonstrate that when representational budgets are matched, and codebook collapse is mitigated, single-level VQ-VAEs can match the reconstruction fidelity of hierarchical variants, challenging the assumption that hierarchical quantization is inherently superior for high-quality reconstructions.
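
One of the lightweight interventions mentioned, periodic reset of inactive codebook vectors, can be sketched as follows; the usage threshold and the choice to re-seed dead codes from current encoder outputs are illustrative assumptions rather than the paper's exact procedure.

```python
import torch

@torch.no_grad()
def reset_dead_codes(codebook, usage_counts, encoder_outputs, min_usage=1.0):
    """Re-initialize rarely used codebook vectors from current encoder outputs.

    codebook:        (K, D) tensor of code vectors (modified in place)
    usage_counts:    (K,) running (e.g. EMA) usage counts per code
    encoder_outputs: (N, D) flattened pre-quantization encoder vectors from the batch
    """
    dead = torch.nonzero(usage_counts < min_usage).squeeze(-1)
    if dead.numel() == 0:
        return 0
    picks = torch.randint(0, encoder_outputs.shape[0], (dead.numel(),))
    codebook[dead] = encoder_outputs[picks]      # initialization from data
    usage_counts[dead] = min_usage               # avoid immediately re-resetting the same codes
    return dead.numel()

# Toy usage: 8 of 512 codes fall below the usage threshold and get re-seeded.
cb = torch.randn(512, 64)
usage = torch.ones(512); usage[:8] = 0.0
print(reset_dead_codes(cb, usage, torch.randn(4096, 64)))  # -> 8
```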


预测|估计(12篇)

【1】Distribution-informed Efficient Conformal Prediction for Full Ranking
标题:基于分布的有效保形预测,用于全排名
链接:https://arxiv.org/abs/2601.23128

作者:Wenbo Liao,Huipeng Huang,Chen Jia,Huajun Xi,Hao Zeng,Hongxin Wei
备注:21 pages, 8 figures
摘要:Quantifying uncertainty is critical for the safe deployment of ranking models in real-world applications. Recent work offers a rigorous solution using conformal prediction in a full ranking scenario, which aims to construct prediction sets for the absolute ranks of test items based on the relative ranks of calibration items. However, relying on upper bounds of non-conformity scores renders the method overly conservative, resulting in substantially large prediction sets. To address this, we propose Distribution-informed Conformal Ranking (DCR), which produces efficient prediction sets by deriving the exact distribution of non-conformity scores. In particular, we find that the absolute ranks of calibration items follow Negative Hypergeometric distributions, conditional on their relative ranks. DCR thus uses the rank distribution to derive non-conformity score distribution and determine conformal thresholds. We provide theoretical guarantees that DCR achieves improved efficiency over the baseline while ensuring valid coverage under mild assumptions. Extensive experiments demonstrate the superiority of DCR, reducing average prediction set size by up to 36%, while maintaining valid coverage.


【2】To See Far, Look Close: Evolutionary Forecasting for Long-term Time Series
标题:看得远,看得近:长期时间序列的进化预测
链接:https://arxiv.org/abs/2601.23114

作者:Jiaming Ma,Siyuan Mu,Ruilin Tang,Haofeng Ma,Qihe Huang,Zhengyang Zhou,Pengkun Wang,Binwu Wang,Yang Wang
摘要:The prevailing Direct Forecasting (DF) paradigm dominates Long-term Time Series Forecasting (LTSF) by forcing models to predict the entire future horizon in a single forward pass. While efficient, this rigid coupling of output and evaluation horizons necessitates computationally prohibitive re-training for every target horizon. In this work, we uncover a counter-intuitive optimization anomaly: models trained on short horizons-when coupled with our proposed Evolutionary Forecasting (EF) paradigm-significantly outperform those trained directly on long horizons. We attribute this success to the mitigation of a fundamental optimization pathology inherent in DF, where conflicting gradients from distant futures cripple the learning of local dynamics. We establish EF as a unified generative framework, proving that DF is merely a degenerate special case of EF. Extensive experiments demonstrate that a singular EF model surpasses task-specific DF ensembles across standard benchmarks and exhibits robust asymptotic stability in extreme extrapolation. This work propels a paradigm shift in LTSF: moving from passive Static Mapping to autonomous Evolutionary Reasoning.


【3】Deep Learning-Based Early-Stage IR-Drop Estimation via CNN Surrogate Modeling
标题:基于CNN代理建模的深度学习早期IR压降估计
链接:https://arxiv.org/abs/2601.22707

作者:Ritesh Bhadana
备注:13 pages, 5 figures, 2 tables. Code and live demo available at https://github.com/riteshbhadana/IR-Drop-Predictor
摘要:IR-drop is a critical power integrity challenge in modern VLSI designs that can cause timing degradation, reliability issues, and functional failures if not detected early in the design flow. Conventional IR-drop analysis relies on physics-based signoff tools, which provide high accuracy but incur significant computational cost and require near-final layout information, making them unsuitable for rapid early-stage design exploration. In this work, we propose a deep learning-based surrogate modeling approach for early-stage IR-drop estimation using a CNN. The task is formulated as a dense pixel-wise regression problem, where spatial physical layout features are mapped directly to IR-drop heatmaps. A U-Net-based encoder-decoder architecture with skip connections is employed to effectively capture both local and global spatial dependencies within the layout. The model is trained on a physics-inspired synthetic dataset generated by us, which incorporates key physical factors including power grid structure, cell density distribution, and switching activity. Model performance is evaluated using standard regression metrics such as Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR). Experimental results demonstrate that the proposed approach can accurately predict IR-drop distributions with millisecond-level inference time, enabling fast pre-signoff screening and iterative design optimization. The proposed framework is intended as a complementary early-stage analysis tool, providing designers with rapid IR-drop insight prior to expensive signoff analysis. The implementation, dataset generation scripts, and the interactive inference application are publicly available at: https://github.com/riteshbhadana/IR-Drop-Predictor. The live application can be accessed at: https://ir-drop-predictor.streamlit.app/.


【4】Local-Global Multimodal Contrastive Learning for Molecular Property Prediction
标题:用于分子性质预测的局部-全局多模态对比学习
链接:https://arxiv.org/abs/2601.22610

作者:Xiayu Liu,Zhengyi Lu,Yunhong Liao,Chan Fan,Hou-biao Li
备注:16 pages, 9 figures. Submitted to Briefings in Bioinformatics
摘要:Accurate molecular property prediction requires integrating complementary information from molecular structure and chemical semantics. In this work, we propose LGM-CL, a local-global multimodal contrastive learning framework that jointly models molecular graphs and textual representations derived from SMILES and chemistry-aware augmented texts. Local functional group information and global molecular topology are captured using AttentiveFP and Graph Transformer encoders, respectively, and aligned through self-supervised contrastive learning. In addition, chemically enriched textual descriptions are contrasted with original SMILES to incorporate physicochemical semantics in a task-agnostic manner. During fine-tuning, molecular fingerprints are further integrated via Dual Cross-attention multimodal fusion. Extensive experiments on MoleculeNet benchmarks demonstrate that LGM-CL achieves consistent and competitive performance across both classification and regression tasks, validating the effectiveness of unified local-global and multimodal representation learning.


【5】Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction
标题:利用数据说不:内存增强即插即用选择性预测
链接:https://arxiv.org/abs/2601.22570

作者:Aditya Sarkar,Yi Li,Jiacheng Cheng,Shlok Mishra,Nuno Vasconcelos
备注:ICLR 2026
摘要:Selective prediction aims to endow predictors with a reject option, to avoid low confidence predictions. However, existing literature has primarily focused on closed-set tasks, such as visual question answering with predefined options or fixed-category classification. This paper considers selective prediction for visual language foundation models, addressing a taxonomy of tasks ranging from closed to open set and from finite to unbounded vocabularies, as in image captioning. We seek training-free approaches of low-complexity, applicable to any foundation model and consider methods based on external vision-language model embeddings, like CLIP. This is denoted as Plug-and-Play Selective Prediction (PaPSP). We identify two key challenges: (1) instability of the visual-language representations, leading to high variance in image-text embeddings, and (2) poor calibration of similarity scores. To address these issues, we propose a memory augmented PaPSP (MA-PaPSP) model, which augments PaPSP with a retrieval dataset of image-text pairs. This is leveraged to reduce embedding variance by averaging retrieved nearest-neighbor pairs and is complemented by the use of contrastive normalization to improve score calibration. Through extensive experiments on multiple datasets, we show that MA-PaPSP outperforms PaPSP and other selective prediction baselines for selective captioning, image-text matching, and fine-grained classification. Code is publicly available at https://github.com/kingston-aditya/MA-PaPSP.
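
A rough sketch of the two ingredients the abstract describes, nearest-neighbour averaging against a retrieval memory and contrastive normalization of the similarity score, is given below. This is one plausible reading of the mechanism; the retrieval key, the averaging rule, and the normalization reference set are all assumptions.

```python
import numpy as np

def smoothed_similarity(img_emb, txt_emb, mem_img, mem_txt, k=8):
    """Variance-reduced image-text score via nearest-neighbour averaging,
    followed by a simple contrastive normalization against the memory texts.

    All embeddings are assumed L2-normalized; mem_img / mem_txt hold the
    retrieval dataset of paired image/text embeddings.
    """
    # Retrieve the k memory pairs whose images are closest to the query image.
    nn = np.argsort(-mem_img @ img_emb)[:k]
    img_avg = (img_emb + mem_img[nn].sum(0)) / (k + 1)
    img_avg /= np.linalg.norm(img_avg)
    txt_avg = (txt_emb + mem_txt[nn].sum(0)) / (k + 1)
    txt_avg /= np.linalg.norm(txt_avg)
    score = img_avg @ txt_avg
    # Contrastive normalization: subtract the mean similarity to memory texts.
    baseline = (mem_txt @ img_avg).mean()
    return score - baseline

rng = np.random.default_rng(0)
bank = rng.normal(size=(1000, 512)); bank /= np.linalg.norm(bank, axis=1, keepdims=True)
q_img, q_txt = bank[0], bank[1]
print(smoothed_similarity(q_img, q_txt, bank, bank[::-1]))
```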


【6】ReNCE: Learning to Reason by Noise Contrastive Estimation
标题:ReNCE:通过噪声对比估计学习推理
链接:https://arxiv.org/abs/2601.22432

作者:Wenzheng Zhang,Karl Stratos
摘要:GRPO is a standard approach to endowing pretrained LLMs with reasoning capabilities. It estimates the advantage of an outcome from a group of $K$ outcomes, and promotes those with positive advantages inside a trust region. Since GRPO discriminates between good and bad outcomes softly, it benefits from additional refinements such as asymmetric clipping and zero-variance data filtering. While effective, these refinements require significant empirical insight and can be challenging to identify. We instead propose an explicit contrastive learning approach. Instead of estimating advantages, we bifurcate $K$ outcomes into positive and negative sets, then maximize the likelihood of positive outcomes. Our approach can be viewed as an online instantiation of (multi-label) noise contrastive estimation for LLM reasoning. We validate our method by demonstrating competitive performance on a suite of challenging math benchmarks against strong baselines such as DAPO and online DPO.
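
One plausible reading of the group-wise objective, maximizing the softmax probability mass assigned to the positive outcomes among all K rollouts, can be sketched as follows; the exact scoring and normalization used by the paper may differ.

```python
import torch

def grouped_contrastive_loss(seq_logprobs, is_positive):
    """Push probability mass onto the positive outcomes of a rollout group.

    seq_logprobs: (K,) policy log-probabilities (e.g. length-normalized) of K rollouts
    is_positive:  (K,) boolean mask from the reward-based bifurcation
    Assumes at least one positive outcome in the group.
    """
    log_denom = torch.logsumexp(seq_logprobs, dim=0)                # all K outcomes
    log_num = torch.logsumexp(seq_logprobs[is_positive], dim=0)     # positive outcomes only
    return -(log_num - log_denom)

scores = torch.tensor([-4.0, -2.5, -3.0, -5.0], requires_grad=True)
mask = torch.tensor([True, True, False, False])
print(grouped_contrastive_loss(scores, mask))
```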


【7】CoDCL: Counterfactual Data Augmentation Contrastive Learning for Continuous-Time Dynamic Network Link Prediction
标题:CoDCL:用于连续时间动态网络链接预测的反事实数据增强对比学习
链接:https://arxiv.org/abs/2601.22427

作者:Hantong Feng,Yonggang Wu,Duxin Chen,Wenwu Yu
摘要:The rapid growth and continuous structural evolution of dynamic networks make effective predictions increasingly challenging. To enable prediction models to adapt to complex temporal environments, they need to be robust to emerging structural changes. We propose a dynamic network learning framework, CoDCL, which combines counterfactual data augmentation with contrastive learning to address this deficiency. Furthermore, we devise a comprehensive strategy to generate high-quality counterfactual data, combining a dynamic treatment design with efficient structural neighborhood exploration to quantify the temporal changes in interaction patterns. Crucially, the entire CoDCL framework is designed as a plug-and-play universal module that can be seamlessly integrated into various existing temporal graph models without requiring architectural modifications. Extensive experiments on multiple real-world datasets demonstrate that CoDCL significantly outperforms state-of-the-art baseline models in the field of dynamic networks, confirming the critical role of integrating counterfactual data augmentation into dynamic representation learning.


【8】Matrix Factorization for Practical Continual Mean Estimation Under User-Level Differential Privacy
标题:用户级差分隐私下实用连续均值估计的矩阵分解
链接:https://arxiv.org/abs/2601.22320

作者:Nikita P. Kalinin,Ali Najar,Valentin Roth,Christoph H. Lampert
摘要:We study continual mean estimation, where data vectors arrive sequentially and the goal is to maintain accurate estimates of the running mean. We address this problem under user-level differential privacy, which protects each user's entire dataset even when they contribute multiple data points. Previous work on this problem has focused on pure differential privacy. While important, this approach limits applicability, as it leads to overly noisy estimates. In contrast, we analyze the problem under approximate differential privacy, adopting recent advances in the Matrix Factorization mechanism. We introduce a novel mean-estimation-specific factorization, which is both efficient and accurate, achieving asymptotically lower mean-squared error bounds in continual mean estimation under user-level differential privacy.


【9】Smart Routing with Precise Link Estimation: DSEE-Based Anypath Routing for Reliable Wireless Networking
标题:具有精确链路估计的智能路由:基于DSEE的Anypath路由,实现可靠的无线组网
链接:https://arxiv.org/abs/2405.10377

作者:Narjes Nourzad,Bhaskar Krishnamachari
备注:ICMLCN 2024
摘要:In dynamic and resource-constrained environments, such as multi-hop wireless mesh networks, traditional routing protocols often falter by relying on predetermined paths that prove ineffective in unpredictable link conditions. Shortest Anypath routing offers a solution by adapting routing decisions based on real-time link conditions. However, the effectiveness of such routing is fundamentally dependent on the quality and reliability of the available links, and predicting these variables with certainty is challenging. This paper introduces a novel approach that leverages the Deterministic Sequencing of Exploration and Exploitation (DSEE), a multi-armed bandit algorithm, to address the need for accurate and real-time estimation of link delivery probabilities. This approach augments the reliability and resilience of the Shortest Anypath routing in the face of fluctuating link conditions. By coupling DSEE with Anypath routing, this algorithm continuously learns and ensures accurate delivery probability estimation and selects the most suitable way to efficiently route packets while maintaining a provable near-logarithmic regret bound. We also theoretically prove that our proposed scheme offers better regret scaling with respect to the network size than the previously proposed Thompson Sampling-based Opportunistic Routing (TSOR).
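
For context, DSEE interleaves a deterministic, logarithmically growing exploration schedule with greedy exploitation of the empirically best arm. A minimal simulation over candidate links follows; the schedule constant and the round-robin exploration order are illustrative choices, not the paper's exact parameters.

```python
import numpy as np

def dsee(link_success_prob, horizon=5000, w=3.0, seed=0):
    """Deterministic Sequencing of Exploration and Exploitation over candidate links.

    link_success_prob: true delivery probabilities (unknown to the algorithm),
    used here only to simulate packet outcomes.
    """
    rng = np.random.default_rng(seed)
    k = len(link_success_prob)
    pulls = np.zeros(k)
    successes = np.zeros(k)
    explore_pulls = 0
    for t in range(1, horizon + 1):
        if explore_pulls < w * k * np.log(t + 1):    # deterministic logarithmic exploration budget
            arm = explore_pulls % k                   # round-robin exploration
            explore_pulls += 1
        else:
            arm = int(np.argmax(successes / np.maximum(pulls, 1)))  # exploit best empirical link
        reward = rng.random() < link_success_prob[arm]
        pulls[arm] += 1
        successes[arm] += reward
    return successes / np.maximum(pulls, 1)           # estimated delivery probabilities

print(dsee(np.array([0.3, 0.7, 0.5])))
```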


【10】Solving Inverse Problems with Flow-based Models via Model Predictive Control
标题:通过模型预测控制用基于流的模型解决反问题
链接:https://arxiv.org/abs/2601.23231

作者:George Webber,Alexander Denker,Riccardo Barbano,Andrew J Reader
摘要:Flow-based generative models provide strong unconditional priors for inverse problems, but guiding their dynamics for conditional generation remains challenging. Recent work casts training-free conditional generation in flow models as an optimal control problem; however, solving the resulting trajectory optimisation is computationally and memory intensive, requiring differentiation through the flow dynamics or adjoint solves. We propose MPC-Flow, a model predictive control framework that formulates inverse problem solving with flow-based generative models as a sequence of control sub-problems, enabling practical optimal control-based guidance at inference time. We provide theoretical guarantees linking MPC-Flow to the underlying optimal control objective and show how different algorithmic choices yield a spectrum of guidance algorithms, including regimes that avoid backpropagation through the generative model trajectory. We evaluate MPC-Flow on benchmark image restoration tasks, spanning linear and non-linear settings such as in-painting, deblurring, and super-resolution, and demonstrate strong performance and scalability to massive state-of-the-art architectures via training-free guidance of FLUX.2 (32B) in a quantised setting on consumer hardware.


【11】Generative and Nonparametric Approaches for Conditional Distribution Estimation: Methods, Perspectives, and Comparative Evaluations
标题:条件分布估计的生成性和非参数方法:方法、观点和比较评估
链接:https://arxiv.org/abs/2601.22650

作者:Yen-Shiu Chin,Zhi-Yu Jou,Toshinari Morimoto,Chia-Tse Wang,Ming-Chung Chang,Tso-Jung Yen,Su-Yun Huang,Tailen Hsing
备注:22 pages, 2 figures, 2 tables
摘要:The inference of conditional distributions is a fundamental problem in statistics, essential for prediction, uncertainty quantification, and probabilistic modeling. A wide range of methodologies have been developed for this task. This article reviews and compares several representative approaches spanning classical nonparametric methods and modern generative models. We begin with the single-index method of Hall and Yao (2005), which estimates the conditional distribution through a dimension-reducing index and nonparametric smoothing of the resulting one-dimensional cumulative conditional distribution function. We then examine the basis-expansion approaches, including FlexCode (Izbicki and Lee, 2017) and DeepCDE (Dalmasso et al., 2020), which convert conditional density estimation into a set of nonparametric regression problems. In addition, we discuss two recent generative simulation-based methods that leverage modern deep generative architectures: the generative conditional distribution sampler (Zhou et al., 2023) and the conditional denoising diffusion probabilistic model (Fu et al., 2024; Yang et al., 2025). A systematic numerical comparison of these approaches is provided using a unified evaluation framework that ensures fairness and reproducibility. The performance metrics used for the estimated conditional distribution include the mean-squared errors of conditional mean and standard deviation, as well as the Wasserstein distance. We also discuss their flexibility and computational costs, highlighting the distinct advantages and limitations of each approach.


【12】It's all the (Exponential) Family: An Equivalence between Maximum Likelihood Estimation and Control Variates for Sketching Algorithms
标题:一切尽在(指数)族中:草图算法中最大似然估计与控制变量之间的等价性
链接:https://arxiv.org/abs/2601.22378

作者:Keegan Kang,Kerong Wang,Ding Zhang,Rameshwar Pratap,Bhisham Dev Verma,Benedict H. W. Wong
备注:36 pages, 15 figures
摘要 :Maximum likelihood estimators (MLE) and control variate estimators (CVE) have been used in conjunction with known information across sketching algorithms and applications in machine learning. We prove that under certain conditions in an exponential family, an optimal CVE will achieve the same asymptotic variance as the MLE, giving an Expectation-Maximization (EM) algorithm for the MLE. Experiments show the EM algorithm is faster and numerically stable compared to other root finding algorithms for the MLE for the bivariate Normal distribution, and we expect this to hold across distributions satisfying these conditions. We show how the EM algorithm leads to reproducibility for algorithms using MLE / CVE, and demonstrate how the EM algorithm leads to finding the MLE when the CV weights are known.


其他神经网络|深度学习|模型|建模(24篇)

【1】Particle-Guided Diffusion Models for Partial Differential Equations
标题:偏微分方程的粒子引导扩散模型
链接:https://arxiv.org/abs/2601.23262

作者:Andrew Millard,Fredrik Lindsten,Zheng Zhao
摘要:We introduce a guided stochastic sampling method that augments sampling from diffusion models with physics-based guidance derived from partial differential equation (PDE) residuals and observational constraints, ensuring generated samples remain physically admissible. We embed this sampling procedure within a new Sequential Monte Carlo (SMC) framework, yielding a scalable generative PDE solver. Across multiple benchmark PDE systems as well as multiphysics and interacting PDE systems, our method produces solution fields with lower numerical error than existing state-of-the-art generative methods.


【2】How well do generative models solve inverse problems? A benchmark study
标题:生成式模型解决逆问题的效果如何?基准研究
链接:https://arxiv.org/abs/2601.23238

作者:Patrick Krüger,Patrick Materne,Werner Krebs,Hanno Gottschalk
备注:32 pages, 11 figures, 5 tables
摘要:Generative learning generates high-dimensional data based on low-dimensional conditions, also called prompts. Therefore, generative learning algorithms are eligible for solving (Bayesian) inverse problems. In this article we compare a traditional Bayesian inverse approach, based on a forward regression model and a prior sampled with the Markov Chain Monte Carlo method, with three state-of-the-art generative learning models, namely conditional Generative Adversarial Networks, Invertible Neural Networks, and Conditional Flow Matching. We apply them to a problem of gas turbine combustor design where we map six independent design parameters to three performance labels. We propose several metrics for the evaluation of these inverse design approaches and measure the accuracy of the labels of the generated designs along with their diversity. We also study the performance as a function of the training dataset size. Our benchmark has a clear winner, as Conditional Flow Matching consistently outperforms all competing approaches.


【3】Manifold-Aware Perturbations for Constrained Generative Modeling
标题:约束生成建模的流形感知扰动
链接:https://arxiv.org/abs/2601.23151

作者:Katherine Keegan,Lars Ruthotto
摘要:Generative models have enjoyed widespread success in a variety of applications. However, they encounter inherent mathematical limitations in modeling distributions where samples are constrained by equalities, as is frequently the setting in scientific domains. In this work, we develop a computationally cheap, mathematically justified, and highly flexible distributional modification for combating known pitfalls in equality-constrained generative models. We propose perturbing the data distribution in a constraint-aware way such that the new distribution has support matching the ambient space dimension while still implicitly incorporating underlying manifold geometry. Through theoretical analyses and empirical evidence on several representative tasks, we illustrate that our approach consistently enables data distribution recovery and stable sampling with both diffusion models and normalizing flows.


【4】Machine Learning for Energy-Performance-aware Scheduling
标题:用于能源性能感知调度的机器学习
链接:https://arxiv.org/abs/2601.23134

作者:Zheyuan Hu,Yifei Shi
备注:Zheyuan Hu and Yifei Shi contributed equally to this work
摘要:In the post-Dennard era, optimizing embedded systems requires navigating complex trade-offs between energy efficiency and latency. Traditional heuristic tuning is often inefficient in such high-dimensional, non-smooth landscapes. In this work, we propose a Bayesian Optimization framework using Gaussian Processes to automate the search for optimal scheduling configurations on heterogeneous multi-core architectures. We explicitly address the multi-objective nature of the problem by approximating the Pareto Frontier between energy and time. Furthermore, by incorporating Sensitivity Analysis (fANOVA) and comparing different covariance kernels (e.g., Matérn vs. RBF), we provide physical interpretability to the black-box model, revealing the dominant hardware parameters driving system performance.
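
A minimal sketch of the described loop, a Gaussian process surrogate with a Matérn kernel driving expected-improvement acquisition over scheduling configurations, is given below. The objective is a synthetic placeholder for a scalarized energy/latency cost, and this single-objective loop does not reproduce the paper's Pareto-frontier or fANOVA analyses.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(gp, X_cand, y_best, xi=0.01):
    """EI for minimization, computed from the GP posterior mean and std."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu - xi) / sigma
    return (y_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def objective(x):
    """Placeholder scalarized energy/latency cost of a 2-knob scheduling configuration."""
    return np.sin(3 * x[0]) + 0.5 * (x[1] - 0.3) ** 2

rng = np.random.default_rng(0)
X = rng.random((5, 2)); y = np.array([objective(x) for x in X])
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):
    gp.fit(X, y)
    cand = rng.random((256, 2))                               # random candidate configurations
    x_next = cand[np.argmax(expected_improvement(gp, cand, y.min()))]
    X = np.vstack([X, x_next]); y = np.append(y, objective(x_next))
print("best config:", X[np.argmin(y)], "cost:", y.min())
```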


【5】PlatoLTL: Learning to Generalize Across Symbols in LTL Instructions for Multi-Task RL
标题:PlatoLTL:学习在多任务RL的LTL指令中跨符号进行概括
链接:https://arxiv.org/abs/2601.22891

作者:Jacques Cloete,Mathias Jackermeier,Ioannis Havoutis,Alessandro Abate
备注:11 pages, 3 figures (main paper). 14 pages, 10 figures (appendix)
摘要 :A central challenge in multi-task reinforcement learning (RL) is to train generalist policies capable of performing tasks not seen during training. To facilitate such generalization, linear temporal logic (LTL) has recently emerged as a powerful formalism for specifying structured, temporally extended tasks to RL agents. While existing approaches to LTL-guided multi-task RL demonstrate successful generalization across LTL specifications, they are unable to generalize to unseen vocabularies of propositions (or "symbols"), which describe high-level events in LTL. We present PlatoLTL, a novel approach that enables policies to zero-shot generalize not only compositionally across LTL formula structures, but also parametrically across propositions. We achieve this by treating propositions as instances of parameterized predicates rather than discrete symbols, allowing policies to learn shared structure across related propositions. We propose a novel architecture that embeds and composes predicates to represent LTL specifications, and demonstrate successful zero-shot generalization to novel propositions and tasks across challenging environments.


【6】MoVE: Mixture of Value Embeddings -- A New Axis for Scaling Parametric Memory in Autoregressive Models
标题:MoVE:价值嵌入的混合--自回归模型中缩放参数记忆的新轴
链接:https://arxiv.org/abs/2601.22887

作者:Yangyan Li
摘要:Autoregressive sequence modeling stands as the cornerstone of modern Generative AI, powering results across diverse modalities ranging from text generation to image generation. However, a fundamental limitation of this paradigm is the rigid structural coupling of model capacity to computational cost: expanding a model's parametric memory -- its repository of factual knowledge or visual patterns -- traditionally requires deepening or widening the network, which incurs a proportional rise in active FLOPs. In this work, we introduce $\textbf{MoVE (Mixture of Value Embeddings)}$, a mechanism that breaks this coupling and establishes a new axis for scaling capacity. MoVE decouples memory from compute by introducing a global bank of learnable value embeddings shared across all attention layers. For every step in the sequence, the model employs a differentiable soft gating mechanism to dynamically mix retrieved concepts from this bank into the standard value projection. This architecture allows parametric memory to be scaled independently of network depth by simply increasing the number of embedding slots. We validate MoVE through strictly controlled experiments on two representative applications of autoregressive modeling: Text Generation and Image Generation. In both domains, MoVE yields consistent performance improvements over standard and layer-wise memory baselines, enabling the construction of "memory-dense" models that achieve lower perplexity and higher fidelity than their dense counterparts at comparable compute budgets.
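
A minimal sketch of the mechanism described, a globally shared bank of value embeddings mixed into the standard value projection through a per-token soft gate, might look as follows; dimensions and gating details are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MemoryAugmentedValue(nn.Module):
    """Value projection augmented with a shared bank of learnable value embeddings.

    The bank would be shared across all attention layers; only the mixing is
    shown here. Sizes and the linear gate are assumptions of this sketch.
    """
    def __init__(self, d_model, n_slots):
        super().__init__()
        self.value_bank = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
        self.gate = nn.Linear(d_model, n_slots)
        self.w_v = nn.Linear(d_model, d_model)

    def forward(self, x):                                  # x: (batch, seq, d_model)
        mix = torch.softmax(self.gate(x), dim=-1) @ self.value_bank   # retrieved concepts per token
        return self.w_v(x) + mix                           # memory-augmented values for attention

v = MemoryAugmentedValue(d_model=64, n_slots=128)(torch.randn(2, 10, 64))
print(v.shape)  # torch.Size([2, 10, 64])
```

Scaling parametric memory then amounts to increasing `n_slots` without touching depth or width, which is the decoupling the abstract emphasizes.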


【7】Decomposing and Composing: Towards Efficient Vision-Language Continual Learning via Rank-1 Expert Pool in a Single LoRA
标题:分解和合成:通过单个LoRA中的Rank-1专家池实现高效的视觉语言持续学习
链接:https://arxiv.org/abs/2601.22828

作者:Zhan Fa,Yue Duan,Jian Zhang,Lei Qi,Wanqi Yang,Yinghuan Shi
摘要:Continual learning (CL) in vision-language models (VLMs) faces significant challenges in improving task adaptation and avoiding catastrophic forgetting. Existing methods usually incur a heavy inference burden or rely on external knowledge, while Low-Rank Adaptation (LoRA) has shown potential in reducing these issues by enabling parameter-efficient tuning. However, since directly using LoRA to alleviate catastrophic forgetting is non-trivial, we introduce a novel framework that restructures a single LoRA module as a decomposable Rank-1 Expert Pool. Our method learns to dynamically compose a sparse, task-specific update by selecting from this expert pool, guided by the semantics of the [CLS] token. In addition, we propose an Activation-Guided Orthogonal (AGO) loss that orthogonalizes critical parts of LoRA weights across tasks. This sparse composition and orthogonalization enable fewer parameter updates, resulting in domain-aware learning while minimizing inter-task interference and maintaining downstream task performance. Extensive experiments across multiple settings demonstrate state-of-the-art results in all metrics, surpassing zero-shot upper bounds in generalization. Notably, it reduces trainable parameters by 96.7% compared to the baseline method, eliminating reliance on external datasets or task-ID discriminators. The merged LoRAs retain fewer weights and incur no inference latency, making our method computationally lightweight.
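
A hedged sketch of the "single LoRA as a rank-1 expert pool" idea (dimensions, top-k size, and routing details are assumptions, not the authors' code): each rank of the LoRA is treated as one expert, and a [CLS]-conditioned gate sparsely composes the update that is added to the frozen layer's output.
```python
# Minimal sketch: a rank-r LoRA viewed as r rank-1 experts with a sparse,
# [CLS]-conditioned gate. The AGO loss from the abstract is not reproduced here.
import torch
import torch.nn as nn

class Rank1ExpertLoRA(nn.Module):
    def __init__(self, d_in, d_out, rank=8, top_k=2):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)   # rank-1 "down" vectors
        self.B = nn.Parameter(torch.zeros(d_out, rank))          # rank-1 "up" vectors
        self.router = nn.Linear(d_in, rank)                      # gate driven by [CLS] semantics
        self.top_k = top_k

    def forward(self, x, cls_token):
        logits = self.router(cls_token)                          # (batch, rank)
        topk = torch.topk(logits, self.top_k, dim=-1)
        gates = torch.zeros_like(logits).scatter_(-1, topk.indices, torch.softmax(topk.values, -1))
        # Sparse composition: only the selected rank-1 experts contribute.
        down = torch.einsum("bnd,rd->bnr", x, self.A) * gates.unsqueeze(1)
        return torch.einsum("bnr,or->bno", down, self.B)         # added to the frozen layer output

lora = Rank1ExpertLoRA(d_in=64, d_out=64)
x, cls = torch.randn(2, 10, 64), torch.randn(2, 64)
print(lora(x, cls).shape)  # torch.Size([2, 10, 64])
```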


【8】SOMBRERO: Measuring and Steering Boundary Placement in End-to-End Hierarchical Sequence Models
标题:SOMBRERO:测量和引导端到端分层序列模型中的边界放置
链接:https://arxiv.org/abs/2601.22805

作者:Pit Neitemeier,Alessio Serra,Jiaze Li,Sascha Wirges,Lukas Balles,Jan Hendrik Metzen
摘要:Hierarchical sequence models replace fixed tokenization with learned segmentations that compress long byte sequences for efficient autoregressive modeling. While recent end-to-end methods can learn meaningful boundaries from the language-modeling objective alone, it remains difficult to quantitatively assess and systematically steer where compute is spent. We introduce a router-agnostic metric of boundary quality, boundary enrichment B, which measures how strongly chunk starts concentrate on positions with high next-byte surprisal. Guided by this metric, we propose Sombrero, which steers boundary placement toward predictive difficulty via a confidence-alignment boundary loss and stabilizes boundary learning by applying confidence-weighted smoothing at the input level rather than on realized chunks. On 1B scale, across UTF-8 corpora covering English and German text as well as code and mathematical content, Sombrero improves the accuracy-efficiency trade-off and yields boundaries that more consistently align compute with hard-to-predict positions.
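
The boundary-enrichment idea can be made concrete with a small statistic of the same flavor (the paper's exact definition of B is not reproduced here): compare how often chunk starts land on high-surprisal byte positions against what random placement would give.
```python
# Minimal sketch of a boundary-enrichment-style statistic.
import numpy as np

def boundary_enrichment(surprisal, boundary_mask, top_frac=0.25):
    """surprisal: per-byte next-byte surprisal; boundary_mask: 1 where a chunk starts.
    Returns the fraction of boundaries on top-`top_frac` surprisal positions,
    normalized by `top_frac` (1.0 = no better than random placement)."""
    surprisal = np.asarray(surprisal, dtype=float)
    boundary_mask = np.asarray(boundary_mask, dtype=bool)
    threshold = np.quantile(surprisal, 1.0 - top_frac)
    hits = np.mean(surprisal[boundary_mask] >= threshold)
    return hits / top_frac

# Toy example with made-up numbers: boundaries placed on the two most
# surprising bytes are strongly enriched.
s = np.array([0.1, 3.2, 0.2, 0.3, 2.9, 0.1, 0.2, 0.4])
b = np.array([0, 1, 0, 0, 1, 0, 0, 0])
print(round(boundary_enrichment(s, b), 2))
```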


【9】Float8@2bits: Entropy Coding Enables Data-Free Model Compression
标题:Float8@2bits:熵编码实现无数据模型压缩
链接:https://arxiv.org/abs/2601.22787

作者:Patrick Putzky,Martin Genzel,Mattes Mollenhauer,Sebastian Schulze,Thomas Wollmann,Stefan Dietzel
摘要:Post-training compression is currently divided into two contrasting regimes. On the one hand, fast, data-free, and model-agnostic methods (e.g., NF4 or HQQ) offer maximum accessibility but suffer from functional collapse at extreme bit-rates below 4 bits. On the other hand, techniques leveraging calibration data or extensive recovery training achieve superior fidelity but impose high computational constraints and face uncertain robustness under data distribution shifts. We introduce EntQuant, the first framework to unite the advantages of these distinct paradigms. By matching the performance of data-dependent methods with the speed and universality of data-free techniques, EntQuant enables practical utility in the extreme compression regime. Our method decouples numerical precision from storage cost via entropy coding, compressing a 70B parameter model in less than 30 minutes. We demonstrate that EntQuant not only achieves state-of-the-art results on standard evaluation sets and models, but also retains functional performance on more complex benchmarks with instruction-tuned models, all at modest inference overhead.
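
A toy sketch of the precision/storage decoupling (not EntQuant itself): weights keep an 8-bit representation, and the entropy of the resulting code distribution, rather than the code width, determines the achievable storage cost.
```python
# Minimal sketch: quantize weights to 8-bit codes, then estimate storage from the
# code distribution's entropy; a peaked distribution costs far fewer bits per weight.
import numpy as np
import zlib

rng = np.random.default_rng(0)
weights = rng.standard_normal(1_000_000).astype(np.float32) * 0.02  # peaked, LLM-like

# Uniform 8-bit quantization to integer codes (precision stays "8-bit").
scale = np.abs(weights).max() / 127.0
codes = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)

# Shannon lower bound on bits/weight for an ideal entropy coder.
_, counts = np.unique(codes, return_counts=True)
p = counts / counts.sum()
shannon_bits = -(p * np.log2(p)).sum()

# A concrete (suboptimal) entropy coder as a sanity check.
zlib_bits = 8 * len(zlib.compress(codes.tobytes(), 9)) / codes.size

print(f"entropy bound: {shannon_bits:.2f} bits/weight, zlib: {zlib_bits:.2f} bits/weight")
```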


【10】Discovering Scaling Exponents with Physics-Informed Müntz-Szász Networks
标题:利用物理信息Müntz-Szász网络发现缩放指数
链接:https://arxiv.org/abs/2601.22751

作者:Gnankan Landry Regis N'guessan,Bum Jun Kim
备注:26 pages, 6 figures
摘要:Physical systems near singularities, interfaces, and critical points exhibit power-law scaling, yet standard neural networks leave the governing exponents implicit. We introduce physics-informed Müntz-Szász Networks (MSN-PINN), a power-law basis network that treats scaling exponents as trainable parameters. The model outputs both the solution and its scaling structure. We prove identifiability, or unique recovery, and show that, under these conditions, the squared error between learned and true exponents scales as $O(|\mu - \alpha|^2)$. Across experiments, MSN-PINN achieves single-exponent recovery with 1--5% error under noise and sparse sampling. It recovers corner singularity exponents for the two-dimensional Laplace equation with 0.009% error, matches the classical result of Kondrat'ev (1967), and recovers forcing-induced exponents in singular Poisson problems with 0.03% and 0.05% errors. On a 40-configuration wedge benchmark, it reaches a 100% success rate with 0.022% mean error. Constraint-aware training encodes physical requirements such as boundary condition compatibility and improves accuracy by three orders of magnitude over naive training. By combining the expressiveness of neural networks with the interpretability of asymptotic analysis, MSN-PINN produces learned parameters with direct physical meaning.
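
A minimal sketch of a power-law basis layer with trainable exponents (illustrative only, not the authors' implementation): fitting such a layer to data generated from x^{2/3} recovers the exponent as a learned parameter.
```python
# Minimal sketch of a Müntz-Szász-style layer: the output is expanded in power-law
# terms x^{mu_i} with *trainable* exponents mu_i, which can be read off after fitting.
import torch
import torch.nn as nn

class MuntzLayer(nn.Module):
    def __init__(self, n_terms=4):
        super().__init__()
        self.mu = nn.Parameter(torch.linspace(0.3, 2.0, n_terms))  # trainable exponents
        self.coef = nn.Parameter(torch.randn(n_terms) * 0.1)

    def forward(self, x):                       # x > 0, shape (N, 1)
        return (self.coef * x.clamp_min(1e-8) ** self.mu).sum(-1, keepdim=True)

# Toy recovery of a single exponent alpha = 2/3 from noiseless samples.
torch.manual_seed(0)
x = torch.rand(256, 1) + 0.01
y = x ** (2.0 / 3.0)
model = MuntzLayer(n_terms=1)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
print("recovered exponent:", model.mu.item())   # should approach 0.667
```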


【11】Layerwise Progressive Freezing Enables STE-Free Training of Deep Binary Neural Networks
标题:分层渐进冻结实现深度二进制神经网络的无STE训练
链接:https://arxiv.org/abs/2601.22660

作者:Evan Gibson Smith,Bashima Islam
摘要:We investigate progressive freezing as an alternative to straight-through estimators (STE) for training binary networks from scratch. Under controlled training conditions, we find that while global progressive freezing works for binary-weight networks, it fails for full binary neural networks due to activation-induced gradient blockades. We introduce StoMPP (Stochastic Masked Partial Progressive Binarization), which uses layerwise stochastic masking to progressively replace differentiable clipped weights/activations with hard binary step functions, while only backpropagating through the unfrozen (clipped) subset (i.e., no straight-through estimator). Under a matched minimal training recipe, StoMPP improves accuracy over a BinaryConnect-style STE baseline, with gains that increase with depth (e.g., for ResNet-50 BNN: +18.0 on CIFAR-10, +13.5 on CIFAR-100, and +3.8 on ImageNet; for ResNet-18: +3.1, +4.7, and +1.3). For binary-weight networks, StoMPP achieves 91.2% accuracy on CIFAR-10 and 69.5% on CIFAR-100 with ResNet-50. We analyze training dynamics under progressive freezing, revealing non-monotonic convergence and improved depth scaling under binarization constraints.
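
One way to picture mask-based progressive binarization without an STE (an interpretation of the abstract, not the authors' code): frozen weight entries use a hard sign with no gradient path, while the remaining entries stay clipped and differentiable, and the frozen fraction grows over training.
```python
# Minimal sketch: gradients flow only through the still-clipped (unfrozen) subset,
# so no straight-through estimator is needed for the frozen, hard-binary entries.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StoMaskedBinaryLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.05)
        self.register_buffer("frozen", torch.zeros(d_out, d_in, dtype=torch.bool))

    @torch.no_grad()
    def freeze_more(self, frac):
        """Stochastically freeze an additional fraction of still-unfrozen weights."""
        candidates = ~self.frozen
        self.frozen |= candidates & (torch.rand_like(self.weight) < frac)

    def forward(self, x):
        soft = self.weight.clamp(-1.0, 1.0)            # differentiable (clipped) weights
        hard = torch.sign(self.weight.detach())        # binary, no gradient
        w = torch.where(self.frozen, hard, soft)
        return F.linear(x, w)

layer = StoMaskedBinaryLinear(16, 8)
for step in range(5):                                   # training loop omitted
    layer.freeze_more(0.3)                              # a layerwise schedule in practice
    print(f"step {step}: frozen fraction {layer.frozen.float().mean():.2f}")
```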


【12】GUDA: Counterfactual Group-wise Training Data Attribution for Diffusion Models via Unlearning
标题:GUDA:通过取消学习进行扩散模型的反事实分组训练数据归因
链接:https://arxiv.org/abs/2601.22651

作者:Naoki Murata,Yuhta Takida,Chieh-Hsin Lai,Toshimitsu Uesaka,Bac Nguyen,Stefano Ermon,Yuki Mitsufuji
摘要:Training-data attribution for vision generative models aims to identify which training data influenced a given output. While most methods score individual examples, practitioners often need group-level answers (e.g., artistic styles or object classes). Group-wise attribution is counterfactual: how would a model's behavior on a generated sample change if a group were absent from training? A natural realization of this counterfactual is Leave-One-Group-Out (LOGO) retraining, which retrains the model with each group removed; however, it becomes computationally prohibitive as the number of groups grows. We propose GUDA (Group Unlearning-based Data Attribution) for diffusion models, which approximates each counterfactual model by applying machine unlearning to a shared full-data model instead of training from scratch. GUDA quantifies group influence using differences in a likelihood-based scoring rule (ELBO) between the full model and each unlearned counterfactual. Experiments on CIFAR-10 and artistic style attribution with Stable Diffusion show that GUDA identifies primary contributing groups more reliably than semantic similarity, gradient-based attribution, and instance-level unlearning approaches, while achieving a 100x speedup on CIFAR-10 over LOGO retraining.
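
The counterfactual scoring step reduces to a simple difference of likelihood-style scores; the sketch below (with made-up numbers and group names) shows only that bookkeeping, leaving the unlearning procedure itself aside.
```python
# Minimal sketch of the attribution scoring: a group's influence on a generated
# sample is the drop in an ELBO-like score after unlearning that group.
from typing import Callable, Dict

def group_influence(score_full: Callable[[object], float],
                    score_unlearned: Dict[str, Callable[[object], float]],
                    sample) -> Dict[str, float]:
    """score_full(sample): score under the full-data model.
    score_unlearned[g](sample): score under the model with group g unlearned."""
    base = score_full(sample)
    return {g: base - fn(sample) for g, fn in score_unlearned.items()}

# Toy stand-in scores for two hypothetical style groups.
scores = group_influence(lambda s: -1.00,
                         {"ukiyo-e": lambda s: -1.40, "cubism": lambda s: -1.05},
                         sample="generated image")
print(max(scores, key=scores.get), scores)   # the group whose removal hurts most
```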


【13】Learning to Defer in Non-Stationary Time Series via Switching State-Space Models
标题:通过切换状态空间模型学习在非平稳时间序列中推迟
链接:https://arxiv.org/abs/2601.22538

作者:Yannis Montreuil,Letian Yu,Axel Carlier,Lai Xing Ng,Wei Tsang Ooi
摘要:We study Learning to Defer for non-stationary time series with partial feedback and time-varying expert availability. At each time step, the router selects an available expert, observes the target, and sees only the queried expert's prediction. We model signed expert residuals using L2D-SLDS, a factorized switching linear-Gaussian state-space model with context-dependent regime transitions, a shared global factor enabling cross-expert information transfer, and per-expert idiosyncratic states. The model supports expert entry and pruning via a dynamic registry. Using one-step-ahead predictive beliefs, we propose an IDS-inspired routing rule that trades off predicted cost against information gained about the latent regime and shared factor. Experiments show improvements over contextual-bandit baselines and a no-shared-factor ablation.


【14】Gradual Fine-Tuning for Flow Matching Models
标题:流匹配模型的逐步微调
链接:https://arxiv.org/abs/2601.22495

作者:Gudrun Thorkelsdottir,Arindam Banerjee
备注:Preprint. Submitted to ICML. 8 pages, 5 figures (+ appendix)
摘要:Fine-tuning flow matching models is a central challenge in settings with limited data, evolving distributions, or strict efficiency demands, where unconstrained fine-tuning can erode the accuracy and efficiency gains learned during pretraining. Prior work has produced theoretical guarantees and empirical advances for reward-based fine-tuning formulations, but these methods often impose restrictions on permissible drift structure or training techniques. In this work, we propose Gradual Fine-Tuning (GFT), a principled framework for fine-tuning flow-based generative models when samples from the target distribution are available. For stochastic flows, GFT defines a temperature-controlled sequence of intermediate objectives that smoothly interpolate between the pretrained and target drifts, approaching the true target as the temperature approaches zero. We prove convergence results for both marginal and conditional GFT objectives, enabling the use of suitable (e.g., optimal transport) couplings during GFT while preserving correctness. Empirically, GFT improves convergence stability and shortens probability paths, resulting in faster inference, while maintaining generation quality comparable to standard fine-tuning. Our results position GFT as a theoretically grounded and practically effective alternative for scalable adaptation of flow matching models under distribution shift.


【15】MetaLead: A Comprehensive Human-Curated Leaderboard Dataset for Transparent Reporting of Machine Learning Experiments
标题:MetaLead:一个全面的人工策划排行榜数据集,用于透明报告机器学习实验
链接:https://arxiv.org/abs/2601.22420

作者:Roelien C. Timmer,Necva Bölücü,Stephen Wan
备注:EACL 2026
摘要:Leaderboards are crucial in the machine learning (ML) domain for benchmarking and tracking progress. However, creating leaderboards traditionally demands significant manual effort. In recent years, efforts have been made to automate leaderboard generation, but existing datasets for this purpose capture only the best results from each paper and carry limited metadata. We present MetaLead, a fully human-annotated ML leaderboard dataset that captures all experimental results for result transparency and contains extra metadata, such as the experiment type of each result (baseline, proposed method, or variation of the proposed method) for experiment-type-guided comparisons, and explicitly separates train and test datasets for cross-domain assessment. This enriched structure makes MetaLead a powerful resource for more transparent and nuanced evaluations across ML research.


【16】FIRE: Multi-fidelity Regression with Distribution-conditioned In-context Learning using Tabular Foundation Models
标题:FIRE:使用表格基础模型进行分布条件内上下文学习的多保真回归
链接:https://arxiv.org/abs/2601.22371

作者:Rosen Ting-Ying Yu,Nicholas Sung,Faez Ahmed
摘要:Multi-fidelity (MF) regression often operates in regimes of extreme data imbalance, where the commonly used Gaussian-process (GP) surrogates struggle with cubic scaling costs and overfit to sparse high-fidelity observations, limiting efficiency and generalization in real-world applications. We introduce FIRE, a training-free MF framework that couples tabular foundation models (TFMs) to perform zero-shot in-context Bayesian inference via a high-fidelity correction model conditioned on the low-fidelity model's posterior predictive distributions. This cross-fidelity information transfer via distributional summaries captures heteroscedastic errors, enabling robust residual learning without model retraining. Across 31 benchmark problems spanning synthetic and real-world tasks (e.g., DrivAerNet, LCBench), FIRE delivers a stronger performance-time trade-off than seven state-of-the-art GP-based or deep learning MF regression methods, ranking highest in accuracy and uncertainty quantification with runtime advantages. Limitations include context window constraints and dependence on the quality of the pre-trained TFMs.
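
A rough analogue of the cross-fidelity conditioning, using scikit-learn GPs as stand-ins for the tabular foundation models the paper actually relies on: the high-fidelity correction model sees each input augmented with the low-fidelity model's predictive mean and standard deviation.
```python
# Minimal sketch: condition a high-fidelity model on the low-fidelity model's
# posterior summaries (mean, std). The simulators and data sizes are invented.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
f_lo = lambda x: np.sin(2 * np.pi * x)                    # cheap, biased simulator
f_hi = lambda x: np.sin(2 * np.pi * x) + 0.3 * x**2       # expensive ground truth

X_lo = rng.uniform(0, 1, (60, 1)); y_lo = f_lo(X_lo).ravel()
X_hi = rng.uniform(0, 1, (8, 1));  y_hi = f_hi(X_hi).ravel()   # extreme imbalance

lo_model = GaussianProcessRegressor(normalize_y=True).fit(X_lo, y_lo)

def with_lo_summary(X):
    """Augment inputs with the LF model's predictive mean and std (distributional summary)."""
    mu, std = lo_model.predict(X, return_std=True)
    return np.column_stack([X, mu, std])

hi_model = GaussianProcessRegressor(normalize_y=True).fit(with_lo_summary(X_hi), y_hi)

X_test = np.linspace(0, 1, 5).reshape(-1, 1)
print(np.round(hi_model.predict(with_lo_summary(X_test)), 3))
print(np.round(f_hi(X_test).ravel(), 3))
```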


【17】PoSafeNet: Safe Learning with Poset-Structured Neural Nets
标题:PoSafeNet:使用偏集结构神经网络的安全学习
链接:https://arxiv.org/abs/2601.22356

作者:Kiwan Wong,Wei Xiao,Daniela Rus
摘要:Safe learning is essential for deploying learning-based controllers in safety-critical robotic systems, yet existing approaches often enforce multiple safety constraints uniformly or via fixed priority orders, leading to infeasibility and brittle behavior. In practice, safety requirements are heterogeneous and admit only partial priority relations, where some constraints are comparable while others are inherently incomparable. We formalize this setting as poset-structured safety, modeling safety constraints as a partially ordered set and treating safety composition as a structural property of the policy class. Building on this formulation, we propose PoSafeNet, a differentiable neural safety layer that enforces safety via sequential closed-form projection under poset-consistent constraint orderings, enabling adaptive selection or mixing of valid safety executions while preserving priority semantics by construction. Experiments on multi-obstacle navigation, constrained robot manipulation, and vision-based autonomous driving demonstrate improved feasibility, robustness, and scalability over unstructured and differentiable quadratic program-based safety layers.
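
The projection machinery can be illustrated with plain half-space constraints (a simplification; PoSafeNet's layer is differentiable and poset-aware): each constraint admits a closed-form projection, and visiting constraints in a priority-consistent order means the last, highest-priority constraints are guaranteed to hold at the output.
```python
# Minimal sketch: sequentially project a nominal action onto half-space safety
# constraints g_i . a <= b_i in a priority-consistent order.
import numpy as np

def halfspace_project(a, g, b):
    """Closed-form projection of a onto {x : g.x <= b}."""
    violation = g @ a - b
    return a if violation <= 0 else a - (violation / (g @ g)) * g

def sequential_safety_layer(a_nominal, constraints):
    """constraints: list of (g, b), sorted so higher-priority ones come last and
    therefore cannot be undone by later, lower-priority projections."""
    a = np.asarray(a_nominal, dtype=float)
    for g, b in constraints:
        a = halfspace_project(a, np.asarray(g, dtype=float), float(b))
    return a

# Toy example: a comfort constraint (low priority), then a collision constraint (high priority).
a = sequential_safety_layer([2.0, 0.0],
                            [(np.array([1.0, 0.0]), 1.5),   # comfort: a_x <= 1.5
                             (np.array([1.0, 1.0]), 1.0)])  # collision: a_x + a_y <= 1.0
print(np.round(a, 3))
```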


【18】Knowledge Gradient for Preference Learning
标题:偏好学习的知识梯度
链接:https://arxiv.org/abs/2601.22335

作者:Kaiwen Wu,Jacob R. Gardner
摘要:The knowledge gradient is a popular acquisition function in Bayesian optimization (BO) for optimizing black-box objectives with noisy function evaluations. Many practical settings, however, allow only pairwise comparison queries, yielding a preferential BO problem where direct function evaluations are unavailable. Extending the knowledge gradient to preferential BO is hindered by its computational challenge. At its core, the look-ahead step in the preferential setting requires computing a non-Gaussian posterior, which was previously considered intractable. In this paper, we address this challenge by deriving an exact and analytical knowledge gradient for preferential BO. We show that the exact knowledge gradient performs strongly on a suite of benchmark problems, often outperforming existing acquisition functions. In addition, we also present a case study illustrating the limitation of the knowledge gradient in certain scenarios.


【19】DP-$λ$CGD: Efficient Noise Correlation for Differentially Private Model Training
标题:DP-$λ$CGD:用于差分隐私模型训练的高效噪声相关性
链接:https://arxiv.org/abs/2601.22334

作者:Nikita P. Kalinin,Ryan McKenna,Rasmus Pagh,Christoph H. Lampert


【20】Learning Reward Functions for Cooperative Resilience in Multi-Agent Systems
标题:多智能体系统中协作弹性的学习奖励函数
链接:https://arxiv.org/abs/2601.22292

作者:Manuela Chacon-Chamorro,Luis Felipe Giraldo,Nicanor Quijano
备注:Supplementary material in https://github.com/mavivi95/supplementary_files/blob/main/Learning_TAI___Supplementary_File.pdf


【21】Demystifying Mergeability: Interpretable Properties to Predict Model Merging Success
标题:揭开可合并性的神秘面纱:预测模型合并成功的可解释属性
链接:https://arxiv.org/abs/2601.22285

作者:Luca Zhou,Bo Zhao,Rose Yu,Emanuele Rodolà
备注:8 pages of main paper, 3 figures in the main paper, 4 tables in the main paper, many more figures and tables in the appendix


【22】SurrogateSHAP: Training-Free Contributor Attribution for Text-to-Image (T2I) Models
标题:SurrogateSHAP:文本到图像(T2I)模型的免训练贡献者归因
链接:https://arxiv.org/abs/2601.22276

作者:Mingyu Lu,Soham Gadgil,Chris Lin,Chanwoo Kim,Su-In Lee


【23】Causal Imitation Learning Under Measurement Error and Distribution Shift
标题:测量误差和分布漂移下的因果模仿学习
链接:https://arxiv.org/abs/2601.22206

作者:Shi Bo,AmirEmad Ghassami
备注:28 pages, 3 figures


【24】Corrected Samplers for Discrete Flow Models
标题:离散流模型的修正采样器
链接:https://arxiv.org/abs/2601.22519

作者:Zhengyan Wan,Yidong Ouyang,Liyan Xie,Fang Fang,Hongyuan Zha,Guang Cheng


其他(57篇)

【1】Decoupled Diffusion Sampling for Inverse Problems on Function Spaces
标题:函数空间上反问题的脱钩扩散抽样
链接:https://arxiv.org/abs/2601.23280

作者:Thomas Y. L. Lin,Jiachen Yao,Lufang Chiang,Julius Berner,Anima Anandkumar


【2】FOCUS: DLLMs Know How to Tame Their Compute Bound
标题:焦点:DLLM知道如何驯服他们的计算界限
链接:https://arxiv.org/abs/2601.23278

作者:Kaihua Liang,Xin Tan,An Zhong,Hong Xu,Marco Canini
备注:22 pages, 15 figures


【3】Tackling air quality with SAPIENS
标题:利用SAPIENS解决空气质量问题
链接:https://arxiv.org/abs/2601.23215

作者:Marcella Bona,Nathan Heatley,Jia-Chen Hua,Adriana Lara,Valeria Legaria-Santiago,Alberto Luviano Juarez,Fernando Moreno-Gomez,Jocelyn Richardson,Natan Vilchis,Xiwen Shirley Zheng
备注:24 pages, 13 figures


【4】TriSpec: Ternary Speculative Decoding via Lightweight Proxy Verification
标题:TriSpec:通过轻量级代理验证进行三元推测解码
链接:https://arxiv.org/abs/2601.23180

作者:Haoyun Jiang,Junqi He,Feng Hong,Xinlong Yang,Jianwei Zhang,Zheng Li,Zhengyang Zhuge,Zhiyong Chen,Bo Han,Junyang Lin,Jiangchao Yao


【5】Beyond Fixed Frames: Dynamic Character-Aligned Speech Tokenization
标题:超越固定帧:动态字符对齐语音令牌化
链接:https://arxiv.org/abs/2601.23174

作者:Luca Della Libera,Cem Subakan,Mirco Ravanelli
备注:18 pages, 3 figures


【6】Stochastic Linear Bandits with Parameter Noise
标题:具有参数噪音的随机线性盗贼
链接:https://arxiv.org/abs/2601.23164

作者:Daniel Ezer,Alon Peled-Cohen,Yishay Mansour
备注:8 pages


【7】Safer Policy Compliance with Dynamic Epistemic Fallback
标题:通过动态认识回撤实现更安全的政策合规
链接:https://arxiv.org/abs/2601.23094

作者:Joseph Marvin Imperial,Harish Tayyar Madabushi


【8】SplineFlow: Flow Matching for Dynamical Systems with B-Spline Interpolants
标题:SplineFlow:具有B样条插值的动态系统的流匹配
链接:https://arxiv.org/abs/2601.23072

作者:Santanu Subhash Rathod,Pietro Liò,Xiao Zhang
备注:36 pages, 35 tables, 22 figures


【9】On the Impact of Code Comments for Automated Bug-Fixing: An Empirical Study
标题:关于代码注释对自动错误修复的影响:一项实证研究
链接:https://arxiv.org/abs/2601.23059

作者:Antonio Vitale,Emanuela Guglielmi,Simone Scalabrino,Rocco Oliveto
备注:Accepted at the 34th IEEE/ACM International Conference on Program Comprehension (ICPC 2026)


【10】Causal Characterization of Measurement and Mechanistic Anomalies
标题:测量和机制异常的原因描述
链接:https://arxiv.org/abs/2601.23026

作者:Hendrik Suhr,David Kaltenpoth,Jilles Vreeken


【11】Mem-T: Densifying Rewards for Long-Horizon Memory Agents
标题:Mem-T:为长程记忆智能体密集化奖励
链接:https://arxiv.org/abs/2601.23014

作者:Yanwei Yue,Guibin Zhang,Boci Peng,Xuanbo Fan,Jiaxin Guo,Qiankun Li,Yan Zhang


【12】Stabilizing the Q-Gradient Field for Policy Smoothness in Actor-Critic
标题:稳定Q梯度场以确保演员批评者的政策平稳性
链接:https://arxiv.org/abs/2601.22970

作者:Jeong Woon Lee,Kyoleen Kwak,Daeho Kim,Hyoseok Hwang


【13】Improved Algorithms for Nash Welfare in Linear Bandits
标题:线性盗贼纳什福利的改进算法
链接:https://arxiv.org/abs/2601.22969

作者:Dhruv Sarkar,Nishant Pandey,Sayak Ray Chowdhury


【14】Perplexity Cannot Always Tell Right from Wrong
标题:困惑不能总是分辨是非
链接:https://arxiv.org/abs/2601.22950

作者:Petar Veličković,Federico Barbero,Christos Perivolaropoulos,Simon Osindero,Razvan Pascanu
备注:11 pages, 4 figures


【15】Environment-Conditioned Tail Reweighting for Total Variation Invariant Risk Minimization
标题:用于总变差不变风险最小化的环境条件尾部重加权
链接:https://arxiv.org/abs/2601.22944

作者:Wang Yuanchao,Lai Zhao-Rong,Zhong Tianqi,Li Fengnan
备注:8 pages


【16】DC-LA: Difference-of-Convex Langevin Algorithm
标题:DC-LA:凸差Langevin算法
链接:https://arxiv.org/abs/2601.22932

作者:Hoang Phuc Hau Luu,Zhongjian Wang


【17】Calibrated Multivariate Distributional Regression with Pre-Rank Regularization
标题:具有预秩正则化的校准多元分布回归
链接:https://arxiv.org/abs/2601.22895

作者:Aya Laajil,Elnura Zhalieva,Naomi Desobry,Souhaib Ben Taieb
备注:arXiv admin note: text overlap with arXiv:2510.21273


【18】DiffuSpeech: Silent Thought, Spoken Answer via Unified Speech-Text Diffusion
标题:扩散言语:无声的思想,通过统一的语音-文本扩散说出的答案
链接:https://arxiv.org/abs/2601.22889

作者:Yuxuan Lou,Ziming Wu,Yaochen Wang,Yong Liu,Yingxuan Ren,Fuming Lai,Shaobing Lian,Jie Tang,Yang You


【19】Cascaded Flow Matching for Heterogeneous Tabular Data with Mixed-Type Features
标题:具有混合类型特征的异类表格数据的级联流匹配
链接:https://arxiv.org/abs/2601.22816

作者:Markus Mueller,Kathrin Gruber,Dennis Fok


【20】Compact Hypercube Embeddings for Fast Text-based Wildlife Observation Retrieval
标题:用于快速基于文本的野生动物观察检索的紧凑超立方体嵌入
链接:https://arxiv.org/abs/2601.22783

作者:Ilyass Moummad,Marius Miron,David Robinson,Kawtar Zaher,Hervé Goëau,Olivier Pietquin,Pierre Bonnet,Emmanuel Chemla,Matthieu Geist,Alexis Joly


【21】Sparse Attention as Compact Kernel Regression
标题:稀疏注意力作为紧凑核回归
链接:https://arxiv.org/abs/2601.22766

作者:Saul Santos,Nuno Gonçalves,Daniel C. McNamee,André F. T Martins
备注:16 pages, 5 figures


【22】Beauty and the Beast: Imperceptible Perturbations Against Diffusion-Based Face Swapping via Directional Attribute Editing
标题:美女与野兽:通过方向属性编辑针对基于扩散的面部交换的不可感知的扰动
链接:https://arxiv.org/abs/2601.22744

作者:Yilong Huang,Songze Li


【23】A Unified Study of LoRA Variants: Taxonomy, Review, Codebase, and Empirical Evaluation
标题:LoRA变体的统一研究:分类、评论、代码库和经验评估
链接:https://arxiv.org/abs/2601.22708

作者:Haonan He,Jingqi Ye,Minglei Li,Zhengbo Wang,Tao Chen,Lei Bai,Peng Ye
备注:Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence, Under Review


【24】ScholarPeer: A Context-Aware Multi-Agent Framework for Automated Peer Review
标题:ScholarPeer:用于自动同行评审的上下文感知多代理框架
链接:https://arxiv.org/abs/2601.22638

作者:Palash Goyal,Mihir Parmar,Yiwen Song,Hamid Palangi,Tomas Pfister,Jinsung Yoon


【25】TTCS: Test-Time Curriculum Synthesis for Self-Evolving
标题:TTCS:用于自我进化的测试时课程合成
链接:https://arxiv.org/abs/2601.22628

作者:Chengyi Yang,Zhishang Xiang,Yunbo Tang,Zongpei Teng,Chengsong Huang,Fei Long,Yuhan Liu,Jinsong Su
备注:10 pages, 4 figures, Our code and implementation details are available at https://github.com/XMUDeepLIT/TTCS


【26】Lethe:Adapter-Augmented Dual-Stream Update for Persistent Knowledge Erasure in Federated Unlearning
标题:Lethe:用于联邦取消学习中持久知识擦除的适配器增强双流更新
链接:https://arxiv.org/abs/2601.22601

作者:Hanwei Tan,Wentai Hu,Ligang He,Yijun Quan


【27】FedCARE: Federated Unlearning with Conflict-Aware Projection and Relearning-Resistant Recovery
标题:FedCARE:具有冲突感知投影和抗重新学习恢复的联邦取消学习
链接:https://arxiv.org/abs/2601.22589

作者:Yue Li,Mingmin Chu,Xilei Yang,Da Xiao,Ziqi Xu,Wei Shao,Qipeng Song,Hui Li
备注:9 pages, 4 figures. Submitted to IJCAI 2026


【28】EUGens: Efficient, Unified, and General Dense Layers
标题:EUGens:高效、统一且通用的密集层
链接:https://arxiv.org/abs/2601.22563

作者:Sang Min Kim,Byeongchan Kim,Arijit Sehanobish,Somnath Basu Roy Chowdhury,Rahul Kidambi,Dongseok Shim,Avinava Dubey,Snigdha Chaturvedi,Min-hwan Oh,Krzysztof Choromanski
备注:NeurIPS 2025. Encompasses results of arXiv:2410.09771


【29】Exo-Plore: Exploring Exoskeleton Control Space through Human-aligned Simulation
标题:Exo-Plore:通过人性化模拟探索外骨骼控制空间
链接:https://arxiv.org/abs/2601.22550

作者:Geonho Leem,Jaedong Lee,Jehee Lee,Seungmoon Song,Jungdam Won
备注:10 pages, 9 figures, ICLR 2026 accepted


【30】Benchmarking Long Roll-outs of Auto-regressive Neural Operators for the Compressible Navier-Stokes Equations with Conserved Quantity Correction
标题:具有保守量修正的可压缩Navier-Stokes方程的自回归神经运算符的长期推出基准
链接:https://arxiv.org/abs/2601.22541

作者:Sean Current,Chandan Kumar,Datta Gaitonde,Srinivasan Parthasarathy


【31】Neural-Inspired Posterior Approximation (NIPA)
标题:神经启发的后验逼近(NIPA)
链接:https://arxiv.org/abs/2601.22539

作者:Babak Shahbaba,Zahra Moslemi
备注:13 pages, 4 tables


【32】Transform-Augmented GRPO Improves Pass@k
标题:变形增强的GRPO改善了Pass@k
链接:https://arxiv.org/abs/2601.22478

作者:Khiem Le,Youssef Mroueh,Phuc Nguyen,Chi-Heng Lin,Shangqian Gao,Ting Hua,Nitesh V. Chawla


【33】Toward Non-Expert Customized Congestion Control
标题:迈向非专家定制拥堵控制
链接:https://arxiv.org/abs/2601.22461

作者:Mingrui Zhang,Hamid Bagheri,Lisong Xu
备注:Accepted manuscript (AAM) of IEEE ICC 2025 paper. DOI: 10.1109/ICC52391.2025.11160790


【34】Machine Unlearning in Low-Dimensional Feature Subspace
标题:低维特征子空间中的机器去学习
链接:https://arxiv.org/abs/2601.22456

作者:Kun Fang,Qinghua Tao,Junxu Liu,Yaxin Xiao,Qingqing Ye,Jian Sun,Haibo Hu


【35】Weak Diffusion Priors Can Still Achieve Strong Inverse-Problem Performance
标题:弱扩散先验仍然可以实现强大的逆问题性能
链接:https://arxiv.org/abs/2601.22443

作者:Jing Jia,Wei Yuan,Sifan Liu,Liyue Shen,Guanyang Wang


【36】Bifocal Attention: Harmonizing Geometric and Spectral Positional Embeddings for Algorithmic Generalization
标题:双焦点注意力:协调几何和谱位置嵌入以实现算法泛化
链接:https://arxiv.org/abs/2601.22402

作者:Kanishk Awadhiya


【37】The Unseen Threat: Residual Knowledge in Machine Unlearning under Perturbed Samples
标题:看不见的威胁:受干扰样本下机器取消学习中的剩余知识
链接:https://arxiv.org/abs/2601.22359

作者:Hsiang Hsu,Pradeep Niroula,Zichang He,Ivan Brugere,Freddy Lecue,Chun-Fu Chen
备注:Presented at NeurIPS 2025


【38】Small Talk, Big Impact: The Energy Cost of Thanking AI
标题:闲聊,大影响:感谢人工智能的能源成本
链接:https://arxiv.org/abs/2601.22357

作者:Julien Delavande,Regis Pierrard,Sasha Luccioni


【39】Relative Wasserstein Angle and the Problem of the $W_2$-Nearest Gaussian Distribution
标题:相对Wasserstein角和$W_2$-最近高斯分布问题
链接:https://arxiv.org/abs/2601.22355

作者:Binshuai Wang,Peng Wei


【40】Recoverability Has a Law: The ERR Measure for Tool-Augmented Agents
标题:可恢复性有规律:工具增强代理的ERR指标
链接:https://arxiv.org/abs/2601.22352

作者:Sri Vatsa Vuddanti,Satwik Kumar Chittiprolu
备注:Preprint for ICML Submission


【41】MixQuant: Pushing the Limits of Block Rotations in Post-Training Quantization
标题:MixQuant:突破训练后量化中块旋转的极限
链接:https://arxiv.org/abs/2601.22347

作者:Sai Sanjeet,Ian Colbert,Pablo Monteagudo-Lago,Giuseppe Franco,Yaman Umuroglu,Nicholas J. Fraser


【42】Knowledge-Informed Kernel State Reconstruction for Interpretable Dynamical System Discovery
标题:可解释动态系统发现的知识引导核状态重建
链接:https://arxiv.org/abs/2601.22328

作者:Luca Muscarnera,Silas Ruhrberg Estévez,Samuel Holt,Evgeny Saveliev,Mihaela van der Schaar


【43】Hair-Trigger Alignment: Black-Box Evaluation Cannot Guarantee Post-Update Alignment
标题:一触即发的对齐:黑匣子评估无法保证更新后的对齐
链接:https://arxiv.org/abs/2601.22313

作者:Yavuz Bakman,Duygu Nur Yaldiz,Salman Avestimehr,Sai Praneeth Karimireddy


【44】Exact closed-form Gaussian moments of residual layers
标题:残差层的精确闭式高斯矩
链接:https://arxiv.org/abs/2601.22307

作者:Simon Kuang,Xinfan Lin


【45】ParalESN: Enabling parallel information processing in Reservoir Computing
标题:ParalESN:在水库计算中实现并行信息处理
链接:https://arxiv.org/abs/2601.22296

作者:Matteo Pinna,Giacomo Lagomarsini,Andrea Ceni,Claudio Gallicchio
备注:17 pages, 6 figures, 9 tables


【46】JAF: Judge Agent Forest
标题:法官代理森林
链接:https://arxiv.org/abs/2601.22269

作者:Sahil Garg,Brad Cheezum,Sridhar Dutta,Vishal Agarwal


【47】Do Open-Vocabulary Detectors Transfer to Aerial Imagery? A Comparative Evaluation
标题:开放词汇检测器能否迁移到航空图像?一项比较评估
链接:https://arxiv.org/abs/2601.22164

作者:Christos Tsourveloudis


【48】Compressed BC-LISTA via Low-Rank Convolutional Decomposition
标题:通过低秩卷积分解压缩BC-LISTA
链接:https://arxiv.org/abs/2601.23148

作者:Han Wang,Yhonatan Kvich,Eduardo Pérez,Florian Römer,Yonina C. Eldar
备注:Inverse Problems, Model Compression, Compressed Sensing, Deep Unrolling, Computational Imaging


【49】Neural Backward Filtering Forward Guiding
标题:神经后向滤波前向制导
链接:https://arxiv.org/abs/2601.23030

作者:Gefan Yang,Frank van der Meulen,Stefan Sommer


【50】Approximating $f$-Divergences with Rank Statistics
标题:用秩统计量逼近$f$-散度
链接:https://arxiv.org/abs/2601.22784

作者:Viktor Stein,José Manuel de Frutos
备注:42 pages, 10 figures, 4 tables, submitted to ICML'26. Comments welcome!


【51】Bayesian Matrix Completion Under Geometric Constraints
标题:几何约束下的Bayesian矩阵补全
链接:https://arxiv.org/abs/2601.22765

作者:Rohit Varma Chiluvuri,Santosh Nannuru
备注:4 pages, 3 figures, Accepted to ICASSP 2026


【52】Spectral Gradient Descent Mitigates Anisotropy-Driven Misalignment: A Case Study in Phase Retrieval
标题:谱梯度下降缓解各向异性驱动的失调:相位恢复的案例研究
链接:https://arxiv.org/abs/2601.22652

作者:Guillaume Braun,Han Bao,Wei Huang,Masaaki Imaizumi
备注:53 pages, 8 figures


【53】RPWithPrior: Label Differential Privacy in Regression
标题:RPWithPrior:回归中的标签差分隐私
链接:https://arxiv.org/abs/2601.22625

作者:Haixia Liu,Ruifan Huang
备注:20 pages


【54】An Efficient Algorithm for Thresholding Monte Carlo Tree Search
标题:一种用于阈值化蒙特卡罗树搜索的高效算法
链接:https://arxiv.org/abs/2601.22600

作者:Shoma Nameki,Atsuyoshi Nakamura,Junpei Komiyama,Koji Tabata


【55】AI Decodes Historical Chinese Archives to Reveal Lost Climate History
标题:人工智能解密中国历史档案揭示失落的气候历史
链接:https://arxiv.org/abs/2601.22458

作者:Sida He,Lingxi Xie,Xiaopeng Zhang,Qi Tian
备注:60 pages, 4 figures in the main text, 25 figures and 10 tables in the appendix


【56】Minimal-Action Discrete Schrödinger Bridge Matching for Peptide Sequence Design
标题:用于肽序列设计的最小作用量离散薛定谔桥匹配
链接:https://arxiv.org/abs/2601.22408

作者:Shrey Goel,Pranam Chatterjee


【57】SCENE: Semantic-aware Codec Enhancement with Neural Embeddings
标题:场景:使用神经嵌入的语义感知编解码器增强
链接:https://arxiv.org/abs/2601.22189

作者:Han-Yu Lin,Li-Wei Chen,Hung-Shin Lee
备注:Accepted to ICASSP 2026


机器翻译由腾讯交互翻译提供,仅供参考

点击“阅读原文”获取带摘要的学术速递
