Py学习  »  机器学习算法

机器学习学术速递[1.13]

arXiv每日学术速递 • 3 周前 • 308 次点击  

点击阅读原文访问arxivdaily.com,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏等功能!


cs.LG 方向,今日共计259篇


大模型相关(42篇)

【1】The Confidence Trap: Gender Bias and Predictive Certainty in LLMs
标题:信心陷阱:法学硕士中的性别偏见和预测模糊性
链接:https://arxiv.org/abs/2601.07806

作者:Ahmed Sabir,Markus Kängsepp,Rajesh Sharma
备注:AAAI 2026 (AISI Track), Oral. Project page: https://bit.ly/4p8OKQD
摘要:The increased use of Large Language Models (LLMs) in sensitive domains leads to growing interest in how their confidence scores correspond to fairness and bias. This study examines the alignment between LLM-predicted confidence and human-annotated bias judgments. Focusing on gender bias, the research investigates probability confidence calibration in contexts involving gendered pronoun resolution. The goal is to evaluate if calibration metrics based on predicted confidence scores effectively capture fairness-related disparities in LLMs. The results show that, among the six state-of-the-art models, Gemma-2 demonstrates the worst calibration according to the gender bias benchmark. The primary contribution of this work is a fairness-aware evaluation of LLMs' confidence calibration, offering guidance for ethical deployment. In addition, we introduce a new calibration metric, Gender-ECE, designed to measure gender disparities in resolution tasks.


【2】Are LLM Decisions Faithful to Verbal Confidence?
标题:LLM决策是否忠实于口头信心?
链接:https://arxiv.org/abs/2601.07767

作者:Jiawei Wang,Yanfei Zhou,Siddartha Devic,Deqing Fu
摘要:Large Language Models (LLMs) can produce surprisingly sophisticated estimates of their own uncertainty. However, it remains unclear to what extent this expressed confidence is tied to the reasoning, knowledge, or decision making of the model. To test this, we introduce $\textbf{RiskEval}$: a framework designed to evaluate whether models adjust their abstention policies in response to varying error penalties. Our evaluation of several frontier models reveals a critical dissociation: models are neither cost-aware when articulating their verbal confidence, nor strategically responsive when deciding whether to engage or abstain under high-penalty conditions. Even when extreme penalties render frequent abstention the mathematically optimal strategy, models almost never abstain, resulting in utility collapse. This indicates that calibrated verbal confidence scores may not be sufficient to create trustworthy and interpretable AI systems, as current models lack the strategic agency to convert uncertainty signals into optimal and risk-sensitive decisions.


【3】Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference
标题:LLM推理中分层令牌修剪的自适应层选择
链接:https://arxiv.org/abs/2601.07667

作者:Rei Taniguchi,Yuyang Dong,Makoto Onizuka,Chuan Xiao
备注:Source code is available at https://github.com/TANIGUCHIREI/ASL
摘要:Due to the prevalence of large language models (LLMs), key-value (KV) cache reduction for LLM inference has received remarkable attention. Among numerous works that have been proposed in recent years, layer-wise token pruning approaches, which select a subset of tokens at particular layers to retain in KV cache and prune others, are one of the most popular schemes. They primarily adopt a set of pre-defined layers, at which tokens are selected. Such design is inflexible in the sense that the accuracy significantly varies across tasks and deteriorates in harder tasks such as KV retrieval. In this paper, we propose ASL, a training-free method that adaptively chooses the selection layer for KV cache reduction, exploiting the variance of token ranks ordered by attention score. The proposed method balances the performance across different tasks while meeting the user-specified KV budget requirement. ASL operates during the prefilling stage and can be jointly used with existing KV cache reduction methods such as SnapKV to optimize the decoding stage. By evaluations on the InfiniteBench, RULER, and NIAH benchmarks, we show that equipped with one-shot token selection, where tokens are selected at a layer and propagated to deeper layers, ASL outperforms state-of-the-art layer-wise token selection methods in accuracy while maintaining decoding speed and KV cache reduction.


【4】GRPO with State Mutations: Improving LLM-Based Hardware Test Plan Generation
标题:具有状态突变的GRPO:改进基于LLM的硬件测试计划生成
链接:https://arxiv.org/abs/2601.07593

作者:Dimple Vijay Kochar,Nathaniel Pinckney,Guan-Ting Liu,Chia-Tung Ho,Chenhui Deng,Haoxing Ren,Brucek Khailany
摘要 :RTL design often relies heavily on ad-hoc testbench creation early in the design cycle. While large language models (LLMs) show promise for RTL code generation, their ability to reason about hardware specifications and generate targeted test plans remains largely unexplored. We present the first systematic study of LLM reasoning capabilities for RTL verification stimuli generation, establishing a two-stage framework that decomposes test plan generation from testbench execution. Our benchmark reveals that state-of-the-art models, including DeepSeek-R1 and Claude-4.0-Sonnet, achieve only 15.7-21.7% success rates on generating stimuli that pass golden RTL designs. To improve LLM generated stimuli, we develop a comprehensive training methodology combining supervised fine-tuning with a novel reinforcement learning approach, GRPO with State Mutation (GRPO-SMu), which enhances exploration by varying input mutations. Our approach leverages a tree-based branching mutation strategy to construct training data comprising equivalent and mutated trees, moving beyond linear mutation approaches to provide rich learning signals. Training on this curated dataset, our 7B parameter model achieves a 33.3% golden test pass rate and a 13.9% mutation detection rate, representing a 17.6% absolute improvement over baseline and outperforming much larger general-purpose models. These results demonstrate that specialized training methodologies can significantly enhance LLM reasoning capabilities for hardware verification tasks, establishing a foundation for automated sub-unit testing in semiconductor design workflows.


【5】d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation
标题:d3 LLM:使用伪轨迹蒸馏的超快扩散LLM
链接:https://arxiv.org/abs/2601.07568

作者:Yu-Yang Qian,Junda Su,Lanxiang Hu,Peiyuan Zhang,Zhijie Deng,Peng Zhao,Hao Zhang
摘要:Diffusion large language models (dLLMs) offer capabilities beyond those of autoregressive (AR) LLMs, such as parallel decoding and random-order generation. However, realizing these benefits in practice is non-trivial, as dLLMs inherently face an accuracy-parallelism trade-off. Despite increasing interest, existing methods typically focus on only one-side of the coin, targeting either efficiency or performance. To address this limitation, we propose d3LLM (Pseudo-Distilled Diffusion Large Language Model), striking a balance between accuracy and parallelism: (i) during training, we introduce pseudo-trajectory distillation to teach the model which tokens can be decoded confidently at early steps, thereby improving parallelism; (ii) during inference, we employ entropy-based multi-block decoding with a KV-cache refresh mechanism to achieve high parallelism while maintaining accuracy. To better evaluate dLLMs, we also introduce AUP (Accuracy Under Parallelism), a new metric that jointly measures accuracy and parallelism. Experiments demonstrate that our d3LLM achieves up to 10$\times$ speedup over vanilla LLaDA/Dream and 5$\times$ speedup over AR models without much accuracy drop. Our code is available at https://github.com/hao-ai-lab/d3LLM.


【6】FROAV: A Framework for RAG Observation and Agent Verification - Lowering the Barrier to LLM Agent Research
标题:FROAV:RAG观察和代理验证框架-降低LLM代理研究的障碍
链接:https://arxiv.org/abs/2601.07504

作者:Tzu-Hsuan Lin,Chih-Hsuan Kao
备注:8 pages, 1 figure, 3 tables
摘要:The rapid advancement of Large Language Models (LLMs) and their integration into autonomous agent systems has created unprecedented opportunities for document analysis, decision support, and knowledge retrieval. However, the complexity of developing, evaluating, and iterating on LLM-based agent workflows presents significant barriers to researchers, particularly those without extensive software engineering expertise. We present FROAV (Framework for RAG Observation and Agent Verification), an open-source research platform that democratizes LLM agent research by providing a plug-and-play architecture combining visual workflow orchestration, a comprehensive evaluation framework, and extensible Python integration. FROAV implements a multi-stage Retrieval-Augmented Generation (RAG) pipeline coupled with a rigorous "LLM-as-a-Judge" evaluation system, all accessible through intuitive graphical interfaces. Our framework integrates n8n for no-code workflow design, PostgreSQL for granular data management, FastAPI for flexible backend logic, and Streamlit for human-in-the-loop interaction. Through this integrated ecosystem, researchers can rapidly prototype RAG strategies, conduct prompt engineering experiments, validate agent performance against human judgments, and collect structured feedback-all without writing infrastructure code. We demonstrate the framework's utility through its application to financial document analysis, while emphasizing its material-agnostic architecture that adapts to any domain requiring semantic analysis. FROAV represents a significant step toward making LLM agent research accessible to a broader scientific community, enabling researchers to focus on hypothesis testing and algorithmic innovation rather than system integration challenges.


【7】ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs
标题:HARQuant:通过LLM的增强剩余通道增强NVFP 4量化
链接:https://arxiv.org/abs/2601.07475

作者:Haoqian Meng,Yilun Luo,Yafei Zhao,Wenyuan Liu,Peng Zhang,Xindian Ma
摘要:The emergence of fine-grained numerical formats like NVFP4 presents new opportunities for efficient Large Language Model (LLM) inference. However, it is difficult to adapt existing Post-Training Quantization (PTQ) strategies to these formats: rotation-based methods compromise fine-grained block isolation; smoothing techniques struggle with significant 4-bit quantization errors; and mixed-precision approaches often conflict with hardware constraints on unified-precision computation. To address these challenges, we propose ARCQuant, a framework that boosts NVFP4 performance via Augmented Residual Channels. Distinct from methods that compromise block isolation or hardware uniformity, ARCQuant maintains a strictly unified NVFP4 format by augmenting the activation matrix with quantized residual channels. This design integrates the error compensation process directly into the matrix reduction dimension, enabling the use of standard, highly optimized GEMM kernels with minimal overhead. Theoretical analysis confirms that the worst-case error bound of our dual-stage NVFP4 quantization is comparable to that of standard 8-bit formats such as MXFP8. Extensive experiments on LLaMA and Qwen models demonstrate that ARCQuant achieves state-of-the-art accuracy, comparable to full-precision baselines in perplexity and downstream tasks. Furthermore, deployment on RTX 5090 and RTX PRO 6000 GPUs confirms practical benefits, achieving up to 3x speedup over FP16. Our code is available at https://github.com/actypedef/ARCQuant .


【8】SCALPEL: Selective Capability Ablation via Low-rank Parameter Editing for Large Language Model Interpretability Analysis
标题:SCALPLE:通过低级参数编辑进行大型语言模型可解释性分析的选择性能力消融
链接:https://arxiv.org/abs/2601.07411

作者:Zihao Fu,Xufeng Duan,Zhenguang G. Cai
摘要 :Large language models excel across diverse domains, yet their deployment in healthcare, legal systems, and autonomous decision-making remains limited by incomplete understanding of their internal mechanisms. As these models integrate into high-stakes systems, understanding how they encode capabilities has become fundamental to interpretability research. Traditional approaches identify important modules through gradient attribution or activation analysis, assuming specific capabilities map to specific components. However, this oversimplifies neural computation: modules may contribute to multiple capabilities simultaneously, while single capabilities may distribute across multiple modules. These coarse-grained analyses fail to capture fine-grained, distributed capability encoding. We present SCALPEL (Selective Capability Ablation via Low-rank Parameter Editing for Large language models), a framework representing capabilities as low-rank parameter subspaces rather than discrete modules. Our key insight is that capabilities can be characterized by low-rank modifications distributed across layers and modules, enabling precise capability removal without affecting others. By training LoRA adapters to reduce distinguishing correct from incorrect answers while preserving general language modeling quality, SCALPEL identifies low-rank representations responsible for particular capabilities while remaining disentangled from others. Experiments across diverse capability and linguistic tasks from BLiMP demonstrate that SCALPEL successfully removes target capabilities while preserving general capabilities, providing fine-grained insights into capability distribution across parameter space. Results reveal that capabilities exhibit low-rank structure and can be selectively ablated through targeted parameter-space interventions, offering nuanced understanding of capability encoding in LLMs.


【9】SEE: Signal Embedding Energy for Quantifying Noise Interference in Large Audio Language Models
标题:SEE:用于量化大型音频语言模型中的噪音干扰的信号嵌入能量
链接:https://arxiv.org/abs/2601.07331

作者:Yuanhe Zhang,Jiayu Tian,Yibo Zhang,Shilinlu Yan,Liang Lin,Zhenhong Zhou,Li Sun,Sen Su
摘要:Large Audio Language Models (LALMs) have been widely applied in real-time scenarios, such as in-car assistants and online meeting comprehension. In practice, audio inputs are often corrupted by device and environmental noise, leading to performance degradation. However, existing LALM studies on noise lack quantitative analysis and rely mainly on intuition and empirical observation, thus failing to understand practical robustness. To address this issue, we introduce Signal Embedding Energy (SEE), a method for quantifying the impact of noise intensity on LALM inputs, enabling the differentiation of LALM robustness in real-world deployments. SEE introduces a perspective based on structured activation subspaces derived from the model's internal representations, which more accurately captures its perception of noise than raw audio features. Across experiments, SEE exhibits a strong correlation with LALM performance, achieving a correlation of 0.98. Surprisingly, traditional audio denoising methods are only marginally effective for LALMs, and, in some cases, even increase SEE and impair performance. This suggests a mismatch between speech-centric denoising objectives and the noise sensitivity of modern LALMs. Therefore, we propose a mitigation strategy derived from SEE to denoise LALM inputs, outperforming existing denoising methods. This paper introduces a novel metric for noise quantification in LALMs, providing guidance for robustness improvements in real-world deployments.


【10】Segmental Advantage Estimation: Enhancing PPO for Long-Context LLM Training
标题:细分优势估计:增强PPO以进行长背景LLM训练
链接:https://arxiv.org/abs/2601.07320

作者:Xue Gong,Qi Yi,Ziyuan Nan,Guanhua Huang,Kejiao Li,Yuhao Jiang,Ruibin Xiong,Zenan Xu,Jiaming Guo,Shaohui Peng,Bo Zhou
摘要:Training Large Language Models (LLMs) for reasoning tasks is increasingly driven by Reinforcement Learning with Verifiable Rewards (RLVR), where Proximal Policy Optimization (PPO) provides a principled framework for stable policy updates. However, the practical application of PPO is hindered by unreliable advantage estimation in the sparse-reward RLVR regime. This issue arises because the sparse rewards in RLVR lead to inaccurate intermediate value predictions, which in turn introduce significant bias when aggregated at every token by Generalized Advantage Estimation (GAE). To address this, we introduce Segmental Advantage Estimation (SAE), which mitigates the bias that GAE can incur in RLVR. Our key insight is that aggregating $n$-step advantages at every token(as in GAE) is unnecessary and often introduces excessive bias, since individual tokens carry minimal information. Instead, SAE first partitions the generated sequence into coherent sub-segments using low-probability tokens as heuristic boundaries. It then selectively computes variance-reduced advantage estimates only from these information-rich segment transitions, effectively filtering out noise from intermediate tokens. Our experiments demonstrate that SAE achieves superior performance, with marked improvements in final scores, training stability, and sample efficiency. These gains are shown to be consistent across multiple model sizes, and a correlation analysis confirms that our proposed advantage estimator achieves a higher correlation with an approximate ground-truth advantage, justifying its superior performance.


【11】ARM: Role-Conditioned Neuron Transplantation for Training-Free Generalist LLM Agent Merging
标题:ARM:用于免训练多面手LLM代理合并的角色条件神经元移植
链接:https://arxiv.org/abs/2601.07309

作者:Zhuoka Feng,Kang Chen,Sihan Zhao,Kai Xiong,Yaoning Wang,Minshen Yu,Junjie Nian,Changyi Xiao,Yixin Cao,Yugang Jiang
备注:17 pages, 12 figures. Project page: https://arkazhuo.github.io/ARM-homepage/
摘要:Interactive large language model agents have advanced rapidly, but most remain specialized to a single environment and fail to adapt robustly to other environments. Model merging offers a training-free alternative by integrating multiple experts into a single model. In this paper, we propose Agent-Role Merging (ARM), an activation-guided, role-conditioned neuron transplantation method for model merging in LLM agents. ARM improves existing merging methods from static natural language tasks to multi-turn agent scenarios, and over the generalization ability across various interactive environments. This is achieved with a well designed 3-step framework: 1) constructing merged backbones, 2) selection based on its role-conditioned activation analysis, and 3) neuron transplantation for fine-grained refinements. Without gradient-based optimization, ARM improves cross-benchmark generalization while enjoying efficiency. Across diverse domains, the model obtained via ARM merging outperforms prior model merging methods and domain-specific expert models, while demonstrating strong out-of-domain generalization.


【12】Learning to Trust the Crowd: A Multi-Model Consensus Reasoning Engine for Large Language Models
标题:学会信任人群:大型语言模型的多模型共识推理引擎
链接:https://arxiv.org/abs/2601.07245

作者:Pranav Kallem
摘要 :Large language models (LLMs) achieve strong aver- age performance yet remain unreliable at the instance level, with frequent hallucinations, brittle failures, and poorly calibrated confidence. We study reliability through the lens of multi-model consensus: given responses from several heterogeneous LLMs, can we learn which answer is most likely correct for a given query? We introduce a Multi-Model Consensus Reasoning Engine that treats the set of LLM outputs as input to a supervised meta-learner. The system maps natural language responses into structured features using semantic embeddings, pairwise similarity and clustering statistics, lexical and structural cues, reasoning-quality scores, confidence estimates, and model-specific priors, and then applies gradient-boosted trees, listwise ranking, and graph neural networks over similarity graphs of answers. Using three open-weight LLMs evaluated on compact, resource- constrained subsets of GSM8K, ARC-Challenge, HellaSwag, and TruthfulQA, our best graph-attention-based consensus model improves macro-average accuracy by 4.6 percentage points over the strongest single LLM and by 8.1 points over majority vote, while also yielding lower Brier scores and fewer TruthfulQA hal- lucinations. Ablation and feature-importance analyses show that semantic agreement and clustering features are most influential, with reasoning-quality and model-prior features providing com- plementary gains, suggesting supervised multi-model consensus is a practical route toward more reliable LLM behavior, even in a modest single-machine setup.


【13】Safeguarding LLM Fine-tuning via Push-Pull Distributional Alignment
标题:通过推挽式分布对准保护LLM微调
链接:https://arxiv.org/abs/2601.07200

作者:Haozhong Wang,Zhuo Li,Yibo Yang,He Zhao,Hongyuan Zha,Dandan Guo
摘要:The inherent safety alignment of Large Language Models (LLMs) is prone to erosion during fine-tuning, even when using seemingly innocuous datasets. While existing defenses attempt to mitigate this via data selection, they typically rely on heuristic, instance-level assessments that neglect the global geometry of the data distribution and fail to explicitly repel harmful patterns. To address this, we introduce Safety Optimal Transport (SOT), a novel framework that reframes safe fine-tuning from an instance-level filtering challenge to a distribution-level alignment task grounded in Optimal Transport (OT). At its core is a dual-reference ``push-pull'' weight-learning mechanism: SOT optimizes sample importance by actively pulling the downstream distribution towards a trusted safe anchor while simultaneously pushing it away from a general harmful reference. This establishes a robust geometric safety boundary that effectively purifies the training data. Extensive experiments across diverse model families and domains demonstrate that SOT significantly enhances model safety while maintaining competitive downstream performance, achieving a superior safety-utility trade-off compared to baselines.


【14】Beyond Variance: Knowledge-Aware LLM Compression via Fisher-Aligned Subspace Diagnostics
标题:超越方差:通过Fisher对齐子空间诊断进行知识知晓的LLM压缩
链接:https://arxiv.org/abs/2601.07197

作者:Ibne Farabi Shihab,Sanjeda Akter,Anuj Sharma
摘要:Post-training activation compression is essential for deploying Large Language Models (LLMs) on resource-constrained hardware. However, standard methods like Singular Value Decomposition (SVD) are gradient-blind: they preserve high-variance dimensions regardless of their impact on factual knowledge preservation. We introduce Fisher-Aligned Subspace Compression (FASC), a knowledge-aware compression framework that selects subspaces by directly modeling activation-gradient coupling, minimizing a second-order surrogate of the loss function. FASC leverages the Fisher Information Matrix to identify dimensions critical for factual knowledge, which often reside in low-variance but high-gradient-sensitivity subspaces. We propose the Dependence Violation Score (\r{ho}) as a general-purpose diagnostic metric that quantifies activation-gradient coupling, revealing where factual knowledge is stored within transformer architectures. Extensive experiments on Mistral-7B and Llama-3-8B demonstrate that FASC preserves 6-8% more accuracy on knowledge-intensive benchmarks (MMLU, LAMA) compared to variance-based methods at 50% rank reduction, effectively enabling a 7B model to match the factual recall of a 13B uncompressed model. Our analysis reveals that \r{ho} serves as a fundamental signal of stored knowledge, with high-\r{ho} layers emerging only when models internalize factual associations during training.


【15】AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units
标题:AscendKernelGen:神经处理单元基于LLM的核生成的系统研究
链接:https://arxiv.org/abs/2601.07160

作者:Xinzi Cao,Jianyang Zhai,Pengfei Li,Zhiheng Hu,Cen Yan,Bingxu Mu,Guanghuan Fang,Bin She,Jiayu Li,Yihan Su,Dongyang Tao,Xiansong Huang,Fan Xu,Feidiao Yang,Yao Lu,Chang-Dong Wang,Yutong Lu,Weicheng Xue,Bin Zhou,Yonghong Tian
备注:33 pages,7 figures,16 tables
摘要:To meet the ever-increasing demand for computational efficiency, Neural Processing Units (NPUs) have become critical in modern AI infrastructure. However, unlocking their full potential requires developing high-performance compute kernels using vendor-specific Domain-Specific Languages (DSLs), a task that demands deep hardware expertise and is labor-intensive. While Large Language Models (LLMs) have shown promise in general code generation, they struggle with the strict constraints and scarcity of training data in the NPU domain. Our preliminary study reveals that state-of-the-art general-purpose LLMs fail to generate functional complex kernels for Ascend NPUs, yielding a near-zero success rate. To address these challenges, we propose AscendKernelGen, a generation-evaluation integrated framework for NPU kernel development. We introduce Ascend-CoT, a high-quality dataset incorporating chain-of-thought reasoning derived from real-world kernel implementations, and KernelGen-LM, a domain-adaptive model trained via supervised fine-tuning and reinforcement learning with execution feedback. Furthermore, we design NPUKernelBench, a comprehensive benchmark for assessing compilation, correctness, and performance across varying complexity levels. Experimental results demonstrate that our approach significantly bridges the gap between general LLMs and hardware-specific coding. Specifically, the compilation success rate on complex Level-2 kernels improves from 0% to 95.5% (Pass@10), while functional correctness achieves 64.3% compared to the baseline's complete failure. These results highlight the critical role of domain-specific reasoning and rigorous evaluation in automating accelerator-aware code generation.


【16】Enhancing Cloud Network Resilience via a Robust LLM-Empowered Multi-Agent Reinforcement Learning Framework
标题:通过强大的LLM授权的多Agent强化学习框架增强云网络弹性
链接:https://arxiv.org/abs/2601.07122

作者:Yixiao Peng,Hao Hu,Feiyang Li,Xinye Cao,Yingchang Jiang,Jipeng Tang,Guoshun Nan,Yuling Liu
摘要:While virtualization and resource pooling empower cloud networks with structural flexibility and elastic scalability, they inevitably expand the attack surface and challenge cyber resilience. Reinforcement Learning (RL)-based defense strategies have been developed to optimize resource deployment and isolation policies under adversarial conditions, aiming to enhance system resilience by maintaining and restoring network availability. However, existing approaches lack robustness as they require retraining to adapt to dynamic changes in network structure, node scale, attack strategies, and attack intensity. Furthermore, the lack of Human-in-the-Loop (HITL) support limits interpretability and flexibility. To address these limitations, we propose CyberOps-Bots, a hierarchical multi-agent reinforcement learning framework empowered by Large Language Models (LLMs). Inspired by MITRE ATT&CK's Tactics-Techniques model, CyberOps-Bots features a two-layer architecture: (1) An upper-level LLM agent with four modules--ReAct planning, IPDRR-based perception, long-short term memory, and action/tool integration--performs global awareness, human intent recognition, and tactical planning; (2) Lower-level RL agents, developed via heterogeneous separated pre-training, execute atomic defense actions within localized network regions. This synergy preserves LLM adaptability and interpretability while ensuring reliable RL execution. Experiments on real cloud datasets show that, compared to state-of-the-art algorithms, CyberOps-Bots maintains network availability 68.5% higher and achieves a 34.7% jumpstart performance gain when shifting the scenarios without retraining. To our knowledge, this is the first study to establish a robust LLM-RL framework with HITL support for cloud defense. We will release our framework to the community, facilitating the advancement of robust and autonomous defense in cloud networks.


【17】HAS-VQ: Hessian-Adaptive Sparse Vector Quantization for High-Fidelity LLM Compression
标题:HAS-VQ:用于高保真LLM压缩的Hessian自适应稀疏载体量化
链接:https://arxiv.org/abs/2601.06959

作者:Vladimer Khasia
摘要:Post-training quantization is essential for deploying Large Language Models (LLMs) on resource- constrained devices. However, standard integer quantization (e.g., INT4) fundamentally degrades per- formance by imposing a uniform grid on the heavy-tailed distribution of weight parameters, particularly in smaller-scale models (e.g., <2B parameters). We introduce HAS-VQ (Hessian-Adaptive Sparse Vec- tor Quantization), a compression framework that strictly decouples high-sensitivity outliers from the bulk weight distribution using second-order sensitivity analysis. HAS-VQ employs a Hessian-Masked Decoupling strategy to isolate sensitive parameters, followed by robust Vector Quantization (VQ) of the remaining dense body. Crucially, we introduce a residual sparse feedback mechanism that corrects quan- tization errors in the most sensitive dimensions, ensuring exact reconstruction of outliers. We evaluate HAS-VQ on SmolLM2-1.7B, demonstrating two distinct regimes of superiority: (1) Pareto Dominance over Integer Baselines: At 4.23 effective bits-per-parameter (BPP), we achieve a perplexity of 14.23, significantly outperforming the standard INT4 baseline (20.03 PPL at 4.71 BPP). (2) High-Fidelity Compression: Relative to the FP16 baseline, HAS-VQ achieves a 2.3x reduction in model size (7.03 BPP) while maintaining statistically indistinguishable perplexity (10.12 vs. 10.04), effectively offering a lossless compression alternative for bandwidth-constrained environments. The code is available at https://github.com/VladimerKhasia/HASVQ


【18】mind_call: A Dataset for Mental Health Function Calling with Large Language Models
标题:mind_call:使用大型语言模型的心理健康功能调用数据集
链接:https://arxiv.org/abs/2601.06937

作者:Fozle Rabbi Shafi,M. Anwar Hossain,Salimur Choudhury
摘要:Large Language Model (LLM)-based systems increasingly rely on function calling to enable structured and controllable interaction with external data sources, yet existing datasets do not address mental health-oriented access to wearable sensor data. This paper presents a synthetic function-calling dataset designed for mental health assistance grounded in wearable health signals such as sleep, physical activity, cardiovascular measures, stress indicators, and metabolic data. The dataset maps diverse natural language queries to standardized API calls derived from a widely adopted health data schema. Each sample includes a user query, a query category, an explicit reasoning step, a normalized temporal parameter, and a target function. The dataset covers explicit, implicit, behavioral, symptom-based, and metaphorical expressions, which reflect realistic mental health-related user interactions. This resource supports research on intent grounding, temporal reasoning, and reliable function invocation in LLM-based mental health agents and is publicly released to promote reproducibility and future work.


【19】Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models
标题:分布清晰度:大型语言模型中RL一致性的隐藏驱动力
链接:https://arxiv.org/abs/2601.06911

作者:Shaoning Sun,Mingzhu Cai,Huang He,Bingjin Chen,Siqi Bao,Yujiu Yang,Hua Wu,Haifeng Wang
摘要:Language model families exhibit striking disparity in their capacity to benefit from reinforcement learning: under identical training, models like Qwen achieve substantial gains, while others like Llama yield limited improvements. Complementing data-centric approaches, we reveal that this disparity reflects a hidden structural property: \textbf{distributional clarity} in probability space. Through a three-stage analysis-from phenomenon to mechanism to interpretation-we uncover that RL-friendly models exhibit intra-class compactness and inter-class separation in their probability assignments to correct vs. incorrect responses. We quantify this clarity using the \textbf{Silhouette Coefficient} ($S$) and demonstrate that (1) high $S$ correlates strongly with RL performance; (2) low $S$ is associated with severe logic errors and reasoning instability. To confirm this property, we introduce a Silhouette-Aware Reweighting strategy that prioritizes low-$S$ samples during training. Experiments across six mathematical benchmarks show consistent improvements across all model families, with gains up to 5.9 points on AIME24. Our work establishes distributional clarity as a fundamental, trainable property underlying RL-Friendliness.


【20】Paraphrasing Adversarial Attack on LLM-as-a-Reviewer
标题:解释对LLM作为评审员的对抗攻击
链接:https://arxiv.org/abs/2601.06884

作者:Masahiro Kaneko
摘要:The use of large language models (LLMs) in peer review systems has attracted growing attention, making it essential to examine their potential vulnerabilities. Prior attacks rely on prompt injection, which alters manuscript content and conflates injection susceptibility with evaluation robustness. We propose the Paraphrasing Adversarial Attack (PAA), a black-box optimization method that searches for paraphrased sequences yielding higher review scores while preserving semantic equivalence and linguistic naturalness. PAA leverages in-context learning, using previous paraphrases and their scores to guide candidate generation. Experiments across five ML and NLP conferences with three LLM reviewers and five attacking models show that PAA consistently increases review scores without changing the paper's claims. Human evaluation confirms that generated paraphrases maintain meaning and naturalness. We also find that attacked papers exhibit increased perplexity in reviews, offering a potential detection signal, and that paraphrasing submissions can partially mitigate attacks.


【21】Artificial Entanglement in the Fine-Tuning of Large Language Models
标题:大型语言模型微调中的人为纠缠
链接:https://arxiv.org/abs/2601.06788

作者:Min Chen,Zihan Wang,Canyu Chen,Zeguan Wu,Manling Li,Junyu Liu
备注:41 pages, many figures
摘要:Large language models (LLMs) can be adapted to new tasks using parameter-efficient fine-tuning (PEFT) methods that modify only a small number of trainable parameters, often through low-rank updates. In this work, we adopt a quantum-information-inspired perspective to understand their effectiveness. From this perspective, low-rank parameterizations naturally correspond to low-dimensional Matrix Product States (MPS) representations, which enable entanglement-based characterizations of parameter structure. Thereby, we term and measure "Artificial Entanglement", defined as the entanglement entropy of the parameters in artificial neural networks (in particular the LLMs). We first study the representative low-rank adaptation (LoRA) PEFT method, alongside full fine-tuning (FFT), using LLaMA models at the 1B and 8B scales trained on the Tulu3 and OpenThoughts3 datasets, and uncover: (i) Internal artificial entanglement in the updates of query and value projection matrices in LoRA follows a volume law with a central suppression (termed as the "Entanglement Valley"), which is sensitive to hyper-parameters and is distinct from that in FFT; (ii) External artificial entanglement in attention matrices, corresponding to token-token correlations in representation space, follows an area law with logarithmic corrections and remains robust to LoRA hyper-parameters and training steps. Drawing a parallel to the No-Hair Theorem in black hole physics, we propose that although LoRA and FFT induce distinct internal entanglement signatures, such differences do not manifest in the attention outputs, suggesting a "no-hair" property that results in the effectiveness of low rank updates. We further provide theoretical support based on random matrix theory, and extend our analysis to an MPS Adaptation PEFT method, which exhibits qualitatively similar behaviors.


【22】Cross-Border Data Security and Privacy Risks in Large Language Models and IoT Systems
标题:大型语言模型和物联网系统中的跨境数据安全和隐私风险
链接:https://arxiv.org/abs/2601.06612

作者:Chalitha Handapangoda
备注:Final project for CS-GY 6813 at NYU Tandon School of Engineering
摘要:The reliance of Large Language Models and Internet of Things systems on massive, globally distributed data flows creates systemic security and privacy challenges. When data traverses borders, it becomes subject to conflicting legal regimes, such as the EU's General Data Protection Regulation and China's Personal Information Protection Law, compounded by technical vulnerabilities like model memorization. Current static encryption and data localization methods are fragmented and reactive, failing to provide adequate, policy-aligned safeguards. This research proposes a Jurisdiction-Aware, Privacy-by-Design architecture that dynamically integrates localized encryption, adaptive differential privacy, and real-time compliance assertion via cryptographic proofs. Empirical validation in a multi-jurisdictional simulation demonstrates this architecture reduced unauthorized data exposure to below five percent and achieved zero compliance violations. These security gains were realized while maintaining model utility retention above ninety percent and limiting computational overhead. This establishes that proactive, integrated controls are feasible for secure and globally compliant AI deployment.


【23】Detecting LLM-Generated Text with Performance Guarantees
标题:检测LLM生成的文本并保证性能
链接:https://arxiv.org/abs/2601.06586

作者:Hongyi Zhou,Jin Zhu,Ying Yang,Chengchun Shi
摘要:Large language models (LLMs) such as GPT, Claude, Gemini, and Grok have been deeply integrated into our daily life. They now support a wide range of tasks -- from dialogue and email drafting to assisting with teaching and coding, serving as search engines, and much more. However, their ability to produce highly human-like text raises serious concerns, including the spread of fake news, the generation of misleading governmental reports, and academic misconduct. To address this practical problem, we train a classifier to determine whether a piece of text is authored by an LLM or a human. Our detector is deployed on an online CPU-based platform https://huggingface.co/spaces/stats-powered-ai/StatDetectLLM, and contains three novelties over existing detectors: (i) it does not rely on auxiliary information, such as watermarks or knowledge of the specific LLM used to generate the text; (ii) it more effectively distinguishes between human- and LLM-authored text; and (iii) it enables statistical inference, which is largely absent in the current literature. Empirically, our classifier achieves higher classification accuracy compared to existing detectors, while maintaining type-I error control, high statistical power, and computational efficiency.


【24】Mosaic: Unlocking Long-Context Inference for Diffusion LLMs via Global Memory Planning and Dynamic Peak Taming
标题:马赛克:通过全局内存规划和动态峰值驯服解锁扩散LLM的长上下文推理
链接:https://arxiv.org/abs/2601.06562

作者 :Liang Zheng,Bowen Shi,Yitao Hu,Jiawei Zhang,Ruofan Li,Sheng Chen,Wenxin Li,Keqiu Li
备注:11 pages, 18 figures
摘要:Diffusion-based large language models (dLLMs) have emerged as a promising paradigm, utilizing simultaneous denoising to enable global planning and iterative refinement. While these capabilities are particularly advantageous for long-context generation, deploying such models faces a prohibitive memory capacity barrier stemming from severe system inefficiencies. We identify that existing inference systems are ill-suited for this paradigm: unlike autoregressive models constrained by the cumulative KV-cache, dLLMs are bottlenecked by transient activations recomputed at every step. Furthermore, general-purpose memory reuse mechanisms lack the global visibility to adapt to dLLMs' dynamic memory peaks, which toggle between logits and FFNs. To address these mismatches, we propose Mosaic, a memory-efficient inference system that shifts from local, static management to a global, dynamic paradigm. Mosaic integrates a mask-only logits kernel to eliminate redundancy, a lazy chunking optimizer driven by an online heuristic search to adaptively mitigate dynamic peaks, and a global memory manager to resolve fragmentation via virtual addressing. Extensive evaluations demonstrate that Mosaic achieves an average 2.71$\times$ reduction in the memory peak-to-average ratio and increases the maximum inference sequence length supportable on identical hardware by 15.89-32.98$\times$. This scalability is achieved without compromising accuracy and speed, and in fact reducing latency by 4.12%-23.26%.


【25】SimLLM: Fine-Tuning Code LLMs for SimPy-Based Queueing System Simulation
标题:SimLLM:用于基于Simpy的排队系统模拟的微调代码LLM
链接:https://arxiv.org/abs/2601.06543

作者:Jun-Qi Chen,Kun Zhang,Rui Zheng,Ying Zhong
备注:33 pages, 10 figures
摘要:The Python package SimPy is widely used for modeling queueing systems due to its flexibility, simplicity, and smooth integration with modern data analysis and optimization frameworks. Recent advances in large language models (LLMs) have shown strong ability in generating clear and executable code, making them powerful and suitable tools for writing SimPy queueing simulation code. However, directly employing closed-source models like GPT-4o to generate such code may lead to high computational costs and raise data privacy concerns. To address this, we fine-tune two open-source LLMs, Qwen-Coder-7B and DeepSeek-Coder-6.7B, on curated SimPy queueing data, which enhances their code-generating performance in executability, output-format compliance, and instruction-code consistency. Particularly, we proposed a multi-stage fine-tuning framework comprising two stages of supervised fine-tuning (SFT) and one stage of direct preference optimization (DPO), progressively enhancing the model's ability in SimPy-based queueing simulation code generation. Extensive evaluations demonstrate that both fine-tuned models achieve substantial improvements in executability, output-format compliance, and instruct consistency. These results confirm that domain-specific fine-tuning can effectively transform compact open-source code models into reliable SimPy simulation generators which provide a practical alternative to closed-source LLMs for education, research, and operational decision support.


【26】Teach Diffusion Language Models to Learn from Their Own Mistakes
标题:教授扩散语言模型从自己的错误中学习
链接:https://arxiv.org/abs/2601.06428

作者:Liming Liu,Binxuan Huang,Xin Liu,Bing Yin,Tuo Zhao
备注:18 pages
摘要:Masked Diffusion Language Models (DLMs) achieve significant speed by generating multiple tokens in parallel. However, this parallel sampling approach, especially when using fewer inference steps, will introduce strong dependency errors and cause quality to deteriorate rapidly as the generation step size grows. As a result, reliable self-correction becomes essential for maintaining high-quality multi-token generation. To address this, we propose Decoupled Self-Correction (DSC), a novel two-stage methodology. DSC first fully optimizes the DLM's generative ability before freezing the model and training a specialized correction head. This decoupling preserves the model's peak SFT performance and ensures the generated errors used for correction head training are of higher quality. Additionally, we introduce Future-Context Augmentation (FCA) to maximize the correction head's accuracy. FCA generalizes the error training distribution by augmenting samples with ground-truth tokens, effectively training the head to utilize a richer, future-looking context. This mechanism is used for reliably detecting the subtle errors of the high-fidelity base model. Our DSC framework enables the model, at inference time, to jointly generate and revise tokens, thereby correcting errors introduced by multi-token generation and mitigating error accumulation across steps. Experiments on mathematical reasoning and code generation benchmarks demonstrate that our approach substantially reduces the quality degradation associated with larger generation steps, allowing DLMs to achieve both high generation speed and strong output fidelity.


【27】Evaluating Robustness of Large Language Models in Enterprise Applications: Benchmarks for Perturbation Consistency Across Formats and Languages
标题:评估企业应用程序中大型语言模型的稳健性:跨格式和语言的扰动一致性基准
链接:https://arxiv.org/abs/2601.06341

作者:Tara Bogavelli,Oluwanifemi Bamgbose,Gabrielle Gauthier Melançon,Fanny Riols,Roshnee Sharma
摘要:Enterprise LLM applications require consistently high quality and reliable performance across diverse scenarios, demanding robustness to minor variations. Existing research shows that even small prompt changes can lead to substantial differences in output, but has mainly focused on a narrow set of perturbations with small academic datasets, limiting their relevance to real-world applications. To address this, we present a comprehensive benchmark suite that evaluates robustness across multiple perturbation types, including general text edits (e.g., punctuation, whitespace), formatting changes (e.g., JSON, YAML), multilingual and cross-lingual inputs, and positional variations in instructions. Evaluating 11 models ranging from 4B to 120B+ parameters, we find that minor perturbations reduce performance by up to 40 percentage points on key enterprise metrics. Critically, we demonstrate that the relationship between model size and robustness is more nuanced than conventional assumptions suggest: an 8B parameter model (Ministral 3 8B) outperforms most larger models, while another 8B model (Llama 3.1 8B) performs worst overall.


【28】How well can off-the-shelf LLMs elucidate molecular structures from mass spectra using chain-of-thought reasoning?
标题:现成的LLM使用思想链推理从光谱中阐明分子结构的效果如何?
链接:https://arxiv.org/abs/2601.06289

作者:Yufeng Wang,Lu Wei,Lin Liu,Hao Xu,Haibin Ling
摘要:Mass spectrometry (MS) is a powerful analytical technique for identifying small molecules, yet determining complete molecular structures directly from tandem mass spectra (MS/MS) remains a long-standing challenge due to complex fragmentation patterns and the vast diversity of chemical space. Recent progress in large language models (LLMs) has shown promise for reasoning-intensive scientific tasks, but their capability for chemical interpretation is still unclear. In this work, we introduce a Chain-of-Thought (CoT) prompting framework and benchmark that evaluate how LLMs reason about mass spectral data to predict molecular structures. We formalize expert chemists' reasoning steps-such as double bond equivalent (DBE) analysis, neutral loss identification, and fragment assembly-into structured prompts and assess multiple state-of-the-art LLMs (Claude-3.5-Sonnet, GPT-4o-mini, and Llama-3 series) in a zero-shot setting using the MassSpecGym dataset. Our evaluation across metrics of SMILES validity, formula consistency, and structural similarity reveals that while LLMs can produce syntactically valid and partially plausible structures, they fail to achieve chemical accuracy or link reasoning to correct molecular predictions. These findings highlight both the interpretive potential and the current limitations of LLM-based reasoning for molecular elucidation, providing a foundation for future work that combines domain knowledge and reinforcement learning to achieve chemically grounded AI reasoning.


【29】AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving
标题:AIBITator:针对多框架LLM服务的闪电般的配置优化
链接:https://arxiv.org/abs/2601.06288

作者:Tianhao Xu,Yiming Liu,Xianglong Lu,Yijia Zhao,Xuting Zhou,Aichen Feng,Yiyi Chen,Yi Shen,Qin Zhou,Xumeng Chen,Ilya Sherstyuk,Haorui Li,Rishi Thakkar,Ben Hamm,Yuanzhe Li,Xue Huang,Wenpeng Wu,Anish Shanbhag,Harry Kim,Chuan Chen,Junjie Lai
摘要:Optimizing Large Language Model (LLM) inference in production systems is increasingly difficult due to dynamic workloads, stringent latency/throughput targets, and a rapidly expanding configuration space. This complexity spans not only distributed parallelism strategies (tensor/pipeline/expert) but also intricate framework-specific runtime parameters such as those concerning the enablement of CUDA graphs, available KV-cache memory fractions, and maximum token capacity, which drastically impact performance. The diversity of modern inference frameworks (e.g., TRT-LLM, vLLM, SGLang), each employing distinct kernels and execution policies, makes manual tuning both framework-specific and computationally prohibitive. We present AIConfigurator, a unified performance-modeling system that enables rapid, framework-agnostic inference configuration search without requiring GPU-based profiling. AIConfigurator combines (1) a methodology that decomposes inference into analytically modelable primitives - GEMM, attention, communication, and memory operations while capturing framework-specific scheduling dynamics; (2) a calibrated kernel-level performance database for these primitives across a wide range of hardware platforms and popular open-weights models (GPT-OSS, Qwen, DeepSeek, LLama, Mistral); and (3) an abstraction layer that automatically resolves optimal launch parameters for the target backend, seamlessly integrating into production-grade orchestration systems. Evaluation on production LLM serving workloads demonstrates that AIConfigurator identifies superior serving configurations that improve performance by up to 40% for dense models (e.g., Qwen3-32B) and 50% for MoE architectures (e.g., DeepSeek-V3), while completing searches within 30 seconds on average. Enabling the rapid exploration of vast design spaces - from cluster topology down to engine specific flags.


【30】Projecting Out the Malice: A Global Subspace Approach to LLM Detoxification
标题:投射恶意:LLM去实体化的全球子空间方法
链接:https://arxiv.org/abs/2601.06226

作者:Zenghao Duan,Zhiyi Yin,Zhichao Shi,Liang Pang,Shaoling Jing,Zihe Huang,Jiayi Wu,Yu Yan,Jingcheng Deng,Huawei Shen,Xueqi Cheng
摘要:Large language models (LLMs) exhibit exceptional performance but pose inherent risks of generating toxic content, restricting their safe deployment. While traditional methods (e.g., alignment) adjust output preferences, they fail to eliminate underlying toxic regions in parameters, leaving models vulnerable to adversarial attacks. Prior mechanistic studies characterize toxic regions as "toxic vectors" or "layer-wise subspaces", yet our analysis identifies critical limitations: i) Removed toxic vectors can be reconstructed via linear combinations of non-toxic vectors, demanding targeting of entire toxic subspace; ii) Contrastive objective over limited samples inject noise into layer-wise subspaces, hindering stable extraction. These highlight the challenge of identifying robust toxic subspace and removing them. Therefore, we propose GLOSS (GLobal tOxic Subspace Suppression), a lightweight method that mitigates toxicity by identifying and eliminating this global subspace from FFN parameters. Experiments on LLMs (e.g., Qwen3) show GLOSS achieves SOTA detoxification while preserving general capabilities without requiring large-scale retraining. WARNING: This paper contains context which is toxic in nature.


【31】Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space
标题:打破模型锁定:通过通用潜在空间的经济高效Zero-ShotLLM路由
链接:https://arxiv.org/abs/2601.06220

作者:Cheng Yan,Wuyang Zhang,Zhiyuan Ning,Fan Xu,Ziyang Tao,Lu Zhang,Bing Yin,Yanyong Zhang
摘要:The rapid proliferation of Large Language Models (LLMs) has led to a fragmented and inefficient ecosystem, a state of ``model lock-in'' where seamlessly integrating novel models remains a significant bottleneck. Current routing frameworks require exhaustive, costly retraining, hindering scalability and adaptability. We introduce ZeroRouter, a new paradigm for LLM routing that breaks this lock-in. Our approach is founded on a universal latent space, a model-agnostic representation of query difficulty that fundamentally decouples the characterization of a query from the profiling of a model. This allows for zero-shot onboarding of new models without full-scale retraining. ZeroRouter features a context-aware predictor that maps queries to this universal space and a dual-mode optimizer that balances accuracy, cost, and latency. Our framework consistently outperforms all baselines, delivering higher accuracy at lower cost and latency.


【32】Cyber Threat Detection and Vulnerability Assessment System using Generative AI and Large Language Model
标题:使用生成式人工智能和大型语言模型的网络威胁检测和脆弱性评估系统
链接:https://arxiv.org/abs/2601.06213

作者:Keerthi Kumar. M,Swarun Kumar Joginpelly,Sunil Khemka,Lakshmi. S R,Navin Chhibber
摘要:Background: Cyber-attacks have evolved rapidly in recent years, many individuals and business owners have been affected by cyber-attacks in various ways. Cyber-attacks include various threats such as ransomware, malware, phishing, and Denial of Service (DoS)-related attacks. Challenges: Traditional models such as Generative Artificial Intelligence (AI) and Security Bidirectional Encoder Representations from Transformers (BERT) were implemented to detect cyber threats. However, the existing Security BERT model has a limited contextual understanding of text data, which has less impact on detecting cyber-attacks. Proposed Methodology: To overcome the above-mentioned challenges, Robustly Optimized Bidirectional Encoder Representations from Transformers Pretraining Approach (RoBERTa) model is proposed which consists of diverse words of vocabulary understanding. Initially, data are extracted from a Packet Capture (PCAP) file and encrypted using Fully Harmonic Encryption (FHE). Subsequently, a Byte-level and Byte Pair Encoding (BBPE) tokenizer was used to generate tokens and help maintain the vocabulary for the encrypted values. Then, these values are applied to the RoBERTa model of the transformer with extensive training. Finally, Softmax is used for the detection and classification of attacks. The proposed RoBERTa model achieved better results than the existing BERT model in terms of accuracy (0.99), recall (0.91), and precision (0.89) respectively.


【33】Manifold-based Sampling for In-Context Hallucination Detection in Large Language Models
标题:基于流形采样的大语言模型中的上下文幻觉检测
链接:https://arxiv.org/abs/2601.06196

作者:Bodla Krishna Vamshi,Rohan Bhatnagar,Haizhao Yang
摘要:Large language models (LLMs) frequently generate factually incorrect or unsupported content, commonly referred to as hallucinations. Prior work has explored decoding strategies, retrieval augmentation, and supervised fine-tuning for hallucination detection, while recent studies show that in-context learning (ICL) can substantially influence factual reliability. However, existing ICL demonstration selection methods often rely on surface-level similarity heuristics and exhibit limited robustness across tasks and models.   We propose MB-ICL, a manifold-based demonstration sampling framework for selecting in-context demonstrations that leverages latent representations extracted from frozen LLMs. By jointly modeling local manifold structure and class-aware prototype geometry, MB-ICL selects demonstrations based on their proximity to learned prototypes rather than lexical or embedding similarity alone.   Across factual verification (FEVER) and hallucination detection (HaluEval) benchmarks, MB-ICL outperforms standard ICL selection baselines in the majority of evaluated settings, with particularly strong gains on dialogue and summarization tasks. The method remains robust under temperature perturbations and model variation, indicating improved stability compared to heuristic retrieval strategies. While lexical retrieval can remain competitive in certain question-answering regimes, our results demonstrate that manifold-based prototype selection provides a reliable and training light approach for hallucination detection without modifying LLM parameters, offering a principled direction for improved ICL demonstration selection.


【34】MLB: A Scenario-Driven Benchmark for Evaluating Large Language Models in Clinical Applications
标题:MLB:一个用于评估临床应用中大型语言模型的场景驱动基准
链接:https://arxiv.org/abs/2601.06193

作者:Qing He,Dongsheng Bi,Jianrong Lu,Minghui Yang,Zixiao Chen,Jiacheng Lu,Jing Chen,Nannan Du,Xiao Cu,Sijing Wu,Peng Xiang,Yinyin Hu,Yi Guo,Chunpu Li,Shaoyang Li,Zhuo Dong,Ming Jiang,Shuai Guo,Liyun Feng,Jin Peng,Jian Wang,Jinjie Gu,Junwei Liu
备注:11 pages, 4 figures, 5 tables
摘要:The proliferation of Large Language Models (LLMs) presents transformative potential for healthcare, yet practical deployment is hindered by the absence of frameworks that assess real-world clinical utility. Existing benchmarks test static knowledge, failing to capture the dynamic, application-oriented capabilities required in clinical practice. To bridge this gap, we introduce a Medical LLM Benchmark MLB, a comprehensive benchmark evaluating LLMs on both foundational knowledge and scenario-based reasoning. MLB is structured around five core dimensions: Medical Knowledge (MedKQA), Safety and Ethics (MedSE), Medical Record Understanding (MedRU), Smart Services (SmartServ), and Smart Healthcare (SmartCare). The benchmark integrates 22 datasets (17 newly curated) from diverse Chinese clinical sources, covering 64 clinical specialties. Its design features a rigorous curation pipeline involving 300 licensed physicians. Besides, we provide a scalable evaluation methodology, centered on a specialized judge model trained via Supervised Fine-Tuning (SFT) on expert annotations. Our comprehensive evaluation of 10 leading models reveals a critical translational gap: while the top-ranked model, Kimi-K2-Instruct (77.3% accuracy overall), excels in structured tasks like information extraction (87.8% accuracy in MedRU), performance plummets in patient-facing scenarios (61.3% in SmartServ). Moreover, the exceptional safety score (90.6% in MedSE) of the much smaller Baichuan-M2-32B highlights that targeted training is equally critical. Our specialized judge model, trained via SFT on a 19k expert-annotated medical dataset, achieves 92.1% accuracy, an F1-score of 94.37%, and a Cohen's Kappa of 81.3% for human-AI consistency, validating a reproducible and expert-aligned evaluation protocol. MLB thus provides a rigorous framework to guide the development of clinically viable LLMs.


【35】Rational Synthesizers or Heuristic Followers? Analyzing LLMs in RAG-based Question-Answering
标题:理性合成者还是启发式追随者?分析基于RAG的信息服务中的LLM
链接:https://arxiv.org/abs/2601.06189

作者:Atharv Naphade
备注:13 pages, 9 figures, ACL ARR submission
摘要 :Retrieval-Augmented Generation (RAG) is the prevailing paradigm for grounding Large Language Models (LLMs), yet the mechanisms governing how models integrate groups of conflicting retrieved evidence remain opaque. Does an LLM answer a certain way because the evidence is factually strong, because of a prior belief, or merely because it is repeated frequently? To answer this, we introduce GroupQA, a curated dataset of 1,635 controversial questions paired with 15,058 diversely-sourced evidence documents, annotated for stance and qualitative strength. Through controlled experiments, we characterize group-level evidence aggregation dynamics: Paraphrasing an argument can be more persuasive than providing distinct independent support; Models favor evidence presented first rather than last, and Larger models are increasingly resistant to adapt to presented evidence. Additionally, we find that LLM explanations to group-based answers are unfaithful. Together, we show that LLMs behave consistently as vulnerable heuristic followers, with direct implications for improving RAG system design.


【36】Neuro-Symbolic Compliance: Integrating LLMs and SMT Solvers for Automated Financial Legal Analysis
标题:神经象征合规:集成LLM和SMT解决者以实现自动化金融法律分析
链接:https://arxiv.org/abs/2601.06181

作者:Yung-Shen Hsia,Fang Yu,Jie-Hong Roland Jiang
备注:10 pages, 6 tables, 3 figures, accepted by the 2nd ACM AIware Conference
摘要:Financial regulations are increasingly complex, hindering automated compliance-especially the maintenance of logical consistency with minimal human oversight. We introduce a Neuro-Symbolic Compliance Framework that integrates Large Language Models (LLMs) with Satisfiability Modulo Theories (SMT) solvers to enable formal verifiability and optimization-based compliance correction. The LLM interprets statutes and enforcement cases to generate SMT constraints, while the solver enforces consistency and computes the minimal factual modification required to restore legality when penalties arise. Unlike transparency-oriented methods, our approach emphasizes logic-driven optimization, delivering verifiable, legally consistent reasoning rather than post-hoc explanation. Evaluated on 87 enforcement cases from Taiwan's Financial Supervisory Commission (FSC), the system attains 86.2% correctness in SMT code generation, improves reasoning efficiency by over 100x, and consistently corrects violations-establishing a preliminary foundation for optimization-based compliance applications.


【37】ECLIPTICA - A Framework for Switchable LLM Alignment via CITA - Contrastive Instruction-Tuned Alignment
标题:ECLIPTICA -通过CITA实现可切换LLM对齐的框架-对比教学调整对齐
链接:https://arxiv.org/abs/2601.06157

作者:Kapil Wanaskar,Gaytri Jena,Vinija Jain,Aman Chadha,Amitava Das
摘要:Alignment in large language models (LLMs) is still largely static: after training, the policy is frozen. DPO, GRPO methods typically imprint one behavior into the weights, leaving little runtime control beyond prompt hacks or expensive re-alignment. We introduce ECLIPTICA, which treats alignment as instruction-driven and runtime-controllable: natural-language alignment instructions act as an explicit behavioral contract (stance, refusal boundary, verbosity) that modulates behavior on the fly under evolving safety requirements, user roles, and governance constraints.   We introduce CITA (Contrastive Instruction-Tuned Alignment), combining SFT with contrastive preference optimization under an explicit geometric anchor to a reference model. This yields a stable Riemannian chart and keeps instruction updates within a shared neighborhood, so regimes stay nearby and traversable for reliable switching.   To isolate policy switching from ordinary instruction following, we release the ECLIPTICA benchmark: 3000 controlled cases (300 prompts x 10 instruction types) where the user request is fixed and only the alignment instruction changes. On Llama-3.1-8B across five suites (ECLIPTICA, TruthfulQA, Conditional Safety, Length Control, LITMUS), CITA reaches 86.7% instruction-alignment efficiency, beating DPO (56.1%), GRPO (36.1%), and PPO (20.4%).


【38】LLM Flow Processes for Text-Conditioned Regression
标题:文本条件回归的LLM流程流程
链接:https://arxiv.org/abs/2601.06147

作者:Felix Biggs,Samuel Willis
摘要:Meta-learning methods for regression like Neural (Diffusion) Processes achieve impressive results, but with these models it can be difficult to incorporate expert prior knowledge and information contained in metadata. Large Language Models (LLMs) are trained on giant corpora including varied real-world regression datasets alongside their descriptions and metadata, leading to impressive performance on a range of downstream tasks. Recent work has extended this to regression tasks and is able to leverage such prior knowledge and metadata, achieving surprisingly good performance, but this still rarely matches dedicated meta-learning methods. Here we introduce a general method for sampling from a product-of-experts of a diffusion or flow matching model and an `expert' with binned probability density; we apply this to combine neural diffusion processes with LLM token probabilities for regression (which may incorporate textual knowledge), exceeding the empirical performance of either alone.


【39】Filtering Beats Fine Tuning: A Bayesian Kalman View of In Context Learning in LLMs
标题:过滤节拍微调:LLM中上下文学习的Bayesian卡尔曼观点
链接:https://arxiv.org/abs/2601.06100

作者:Andrew Kiruluta
摘要:We present a theory-first framework that interprets inference-time adaptation in large language models (LLMs) as online Bayesian state estimation. Rather than modeling rapid adaptation as implicit optimization or meta-learning, we formulate task- and context-specific learning as the sequential inference of a low-dimensional latent adaptation state governed by a linearized state-space model. Under Gaussian assumptions, adaptation follows a Kalman recursion with closed-form updates for both the posterior mean and covariance.   This perspective elevates epistemic uncertainty to an explicit dynamical variable. We show that inference-time learning is driven by covariance collapse, i.e., rapid contraction of posterior uncertainty induced by informative tokens, which typically precedes convergence of the posterior mean. Using observability conditions on token-level Jacobians, we establish stability of the Bayesian filter, prove exponential covariance contraction rates, and derive mean-square error bounds. Gradient descent, natural-gradient methods, and meta-learning updates arise as singular, noise-free limits of the filtering dynamics, positioning optimization-based adaptation as a degenerate approximation of Bayesian inference.   The resulting theory provides a unified probabilistic account of in-context learning, parameter-efficient adaptation, and test-time learning without parameter updates. It yields explicit guarantees on stability and sample efficiency, offers a principled interpretation of prompt informativeness via information accumulation, and clarifies the role of uncertainty dynamics absent from existing accounts. Minimal illustrative experiments corroborate the qualitative predictions of the theory.


【40】CrossTrafficLLM: A Human-Centric Framework for Interpretable Traffic Intelligence via Large Language Model
标题:CrossAtlantic LLM:通过大型语言模型实现可解释交通智能的以人为本的框架
链接:https://arxiv.org/abs/2601.06042

作者:Zeming Du,Qitan Shao,Hongfei Liu,Yong Zhang
摘要:While accurate traffic forecasting is vital for Intelligent Transportation Systems (ITS), effectively communicating predicted conditions via natural language for human-centric decision support remains a challenge and is often handled separately. To address this, we propose CrossTrafficLLM, a novel GenAI-driven framework that simultaneously predicts future spatiotemporal traffic states and generates corresponding natural language descriptions, specifically targeting conditional abnormal event summaries. We tackle the core challenge of aligning quantitative traffic data with qualitative textual semantics by leveraging Large Language Models (LLMs) within a unified architecture. This design allows generative textual context to improve prediction accuracy while ensuring generated reports are directly informed by the forecast. Technically, a text-guided adaptive graph convolutional network is employed to effectively merge high-level semantic information with the traffic network structure. Evaluated on the BJTT dataset, CrossTrafficLLM demonstrably surpasses state-of-the-art methods in both traffic forecasting performance and text generation quality. By unifying prediction and description generation, CrossTrafficLLM delivers a more interpretable, and actionable approach to generative traffic intelligence, offering significant advantages for modern ITS applications.


【41】Large Language Models for Physics Instrument Design
标题:物理仪器设计的大型语言模型
链接:https://arxiv.org/abs/2601.07580

作者:Sara Zoccheddu,Shah Rukh Qasim,Patrick Owen,Nicola Serra
摘要:We study the use of large language models (LLMs) for physics instrument design and compare their performance to reinforcement learning (RL). Using only prompting, LLMs are given task constraints and summaries of prior high-scoring designs and propose complete detector configurations, which we evaluate with the same simulators and reward functions used in RL-based optimization. Although RL yields stronger final designs, we find that modern LLMs consistently generate valid, resource-aware, and physically meaningful configurations that draw on broad pretrained knowledge of detector design principles and particle--matter interactions, despite having no task-specific training. Based on this result, as a first step toward hybrid design workflows, we explore pairing the LLMs with a dedicated trust region optimizer, serving as a precursor to future pipelines in which LLMs propose and structure design hypotheses while RL performs reward-driven optimization. Based on these experiments, we argue that LLMs are well suited as meta-planners: they can design and orchestrate RL-based optimization studies, define search strategies, and coordinate multiple interacting components within a unified workflow. In doing so, they point toward automated, closed-loop instrument design in which much of the human effort required to structure and supervise optimization can be reduced.


【42】PriceSeer: Evaluating Large Language Models in Real-Time Stock Prediction
标题:PriceSeer:评估实时股票预测中的大型语言模型
链接:https://arxiv.org/abs/2601.06088

作者:Bohan Liang,Zijian Chen,Qi Jia,Kaiwei Zhang,Kaiyuan Ji,Guangtao Zhai
备注:7 pages, 6 figures
摘要:Stock prediction, a subject closely related to people's investment activities in fully dynamic and live environments, has been widely studied. Current large language models (LLMs) have shown remarkable potential in various domains, exhibiting expert-level performance through advanced reasoning and contextual understanding. In this paper, we introduce PriceSeer, a live, dynamic, and data-uncontaminated benchmark specifically designed for LLMs performing stock prediction tasks. Specifically, PriceSeer includes 110 U.S. stocks from 11 industrial sectors, with each containing 249 historical data points. Our benchmark implements both internal and external information expansion, where LLMs receive extra financial indicators, news, and fake news to perform stock price prediction. We evaluate six cutting-edge LLMs under different prediction horizons, demonstrating their potential in generating investment strategies after obtaining accurate price predictions for different sectors. Additionally, we provide analyses of LLMs' suboptimal performance in long-term predictions, including the vulnerability to fake news and specific industries. The code and evaluation data will be open-sourced at https://github.com/BobLiang2113/PriceSeer.


Graph相关(图学习|图神经网络|图优化等)(7篇)

【1】Graph Inference Towards ICD Coding
标题:对ICD编码的图形推理
链接:https://arxiv.org/abs/2601.07496

作者:Xiaoxiao Deng
备注:6 pages, 2 figures, 2 tables
摘要:Automated ICD coding involves assigning standardized diagnostic codes to clinical narratives. The vast label space and extreme class imbalance continue to challenge precise prediction. To address these issues, LabGraph is introduced -- a unified framework that reformulates ICD coding as a graph generation task. By combining adversarial domain adaptation, graph-based reinforcement learning, and perturbation regularization, LabGraph effectively enhances model robustness and generalization. In addition, a label graph discriminator dynamically evaluates each generated code, providing adaptive reward feedback during training. Experiments on benchmark datasets demonstrate that LabGraph consistently outperforms previous approaches on micro-F1, micro-AUC, and P@K.


【2】Kernel Alignment-based Multi-view Unsupervised Feature Selection with Sample-level Adaptive Graph Learning
标题:基于核对齐的多视图无监督特征选择和样本级自适应图学习
链接:https://arxiv.org/abs/2601.07288

作者 :Yalan Tan,Yanyong Huang,Zongxin Shen,Dongjie Wang,Fengmao Lv,Tianrui Li
摘要:Although multi-view unsupervised feature selection (MUFS) has demonstrated success in dimensionality reduction for unlabeled multi-view data, most existing methods reduce feature redundancy by focusing on linear correlations among features but often overlook complex nonlinear dependencies. This limits the effectiveness of feature selection. In addition, existing methods fuse similarity graphs from multiple views by employing sample-invariant weights to preserve local structure. However, this process fails to account for differences in local neighborhood clarity among samples within each view, thereby hindering accurate characterization of the intrinsic local structure of the data. In this paper, we propose a Kernel Alignment-based multi-view unsupervised FeatUre selection with Sample-level adaptive graph lEarning method (KAFUSE) to address these issues. Specifically, we first employ kernel alignment with an orthogonal constraint to reduce feature redundancy in both linear and nonlinear relationships. Then, a cross-view consistent similarity graph is learned by applying sample-level fusion to each slice of a tensor formed by stacking similarity graphs from different views, which automatically adjusts the view weights for each sample during fusion. These two steps are integrated into a unified model for feature selection, enabling mutual enhancement between them. Extensive experiments on real multi-view datasets demonstrate the superiority of KAFUSE over state-of-the-art methods.


【3】†DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems
标题:†DAGGER:用于数学问题中可执行推理的干扰感知图生成
链接:https://arxiv.org/abs/2601.06853

作者:Zabir Al Nazi,Shubhashis Roy Dipta,Sudipta Kar
摘要:Chain-of-Thought (CoT) prompting is widely adopted for mathematical problem solving, including in low-resource languages, yet its behavior under irrelevant context remains underexplored. To systematically study this challenge, we introduce DISTRACTMATH-BN, a Bangla benchmark that augments MGSM and MSVAMP with semantically coherent but computationally irrelevant information. Evaluating seven models ranging from 3B to 12B parameters, we observe substantial performance degradation under distractors: standard models drop by up to 41 points, while reasoning-specialized models decline by 14 to 20 points despite consuming five times more tokens. We propose †DAGGER, which reformulates mathematical problem solving as executable computational graph generation with explicit modeling of distractor nodes. Fine-tuning Gemma-3 models using supervised fine-tuning followed by Group Relative Policy Optimization achieves comparable weighted accuracy on augmented benchmarks while using 89 percent fewer tokens than reasoning models. Importantly, this robustness emerges without explicit training on distractor-augmented examples. Our results suggest that enforcing structured intermediate representations improves robustness and inference efficiency in mathematical reasoning compared to free-form approaches, particularly in noisy, low-resource settings.


【4】Graph Neural Network with One-side Edge Sampling for Fraud Detection
标题:用于欺诈检测的单边边缘采样图神经网络
链接:https://arxiv.org/abs/2601.06800

作者:Hoang Hiep Trieu
摘要:Financial fraud is always a major problem in the field of finance, as it can cause significant consequences. As a result, many approaches have been designed to detect it, and lately Graph Neural Networks (GNNs) have been demonstrated as a competent candidate. However, when trained with a large amount of data, they are slow and computationally demanding. In addition, GNNs may need a deep architecture to detect complex fraud patterns, but doing so may make them suffer from problems such as over-fitting or over-smoothing. Over-fitting leads to reduced generalisation of the model on unseen data, while over-smoothing causes all nodes' features to converge to a fixed point due to excessive aggregation of information from neighbouring nodes. In this research, I propose an approach called One-Side Edge Sampling (OES) that can potentially reduce training duration as well as the effects of over-smoothing and over-fitting. The approach leverages predictive confidence in an edge classification task to sample edges from the input graph during a certain number of epochs. To explain why OES can alleviate over-smoothing, I perform a theoretical analysis of the proposed approach. In addition, to validate the effect of OES, I conduct experiments using different GNNs on two datasets. The results show that OES can empirically outperform backbone models in both shallow and deep architectures while also reducing training time.


【5】Predicting Student Success with Heterogeneous Graph Deep Learning and Machine Learning Models
标题:利用异类图深度学习和机器学习模型预测学生的成功
链接:https://arxiv.org/abs/2601.06729

作者:Anca Muresan,Mihaela Cardei,Ionut Cardei
摘要:Early identification of student success is crucial for enabling timely interventions, reducing dropout rates, and promoting on time graduation. In educational settings, AI powered systems have become essential for predicting student performance due to their advanced analytical capabilities. However, effectively leveraging diverse student data to uncover latent and complex patterns remains a key challenge. While prior studies have explored this area, the potential of dynamic data features and multi category entities has been largely overlooked. To address this gap, we propose a framework that integrates heterogeneous graph deep learning models to enhance early and continuous student performance prediction, using traditional machine learning algorithms for comparison. Our approach employs a graph metapath structure and incorporates dynamic assessment features, which progressively influence the student success prediction task. Experiments on the Open University Learning Analytics (OULA) dataset demonstrate promising results, achieving a 68.6% validation F1 score with only 7% of the semester completed, and reaching up to 89.5% near the semester's end. Our approach outperforms top machine learning models by 4.7% in validation F1 score during the critical early 7% of the semester, underscoring the value of dynamic features and heterogeneous graph representations in student success prediction.


【6】Reinforcement Learning-Guided Dynamic Multi-Graph Fusion for Evacuation Traffic Prediction
标题:强化学习引导的动态多图融合疏散交通预测
链接:https://arxiv.org/abs/2601.06664

作者:Md Nafees Fuad Rafi,Samiul Hasan
摘要 :Real-time traffic prediction is critical for managing transportation systems during hurricane evacuations. Although data-driven graph-learning models have demonstrated strong capabilities in capturing the complex spatiotemporal dynamics of evacuation traffic at a network level, they mostly consider a single dimension (e.g., travel-time or distance) to construct the underlying graph. Furthermore, these models often lack interpretability, offering little insight into which input variables contribute most to their predictive performance. To overcome these limitations, we develop a novel Reinforcement Learning-guided Dynamic Multi-Graph Fusion (RL-DMF) framework for evacuation traffic prediction. We construct multiple dynamic graphs at each time step to represent heterogeneous spatiotemporal relationships between traffic detectors. A dynamic multi-graph fusion (DMF) module is employed to adaptively learn and combine information from these graphs. To enhance model interpretability, we introduce RL-based intelligent feature selection and ranking (RL-IFSR) method that learns to mask irrelevant features during model training. The model is evaluated using a real-world dataset of 12 hurricanes affecting Florida from 2016 to 2024. For an unseen hurricane (Milton, 2024), the model achieves a 95% accuracy (RMSE = 293.9) for predicting the next 1-hour traffic flow. Moreover, the model can forecast traffic flow for up to next 6 hours with 90% accuracy (RMSE = 426.4). The RL-DMF framework outperforms several state-of-the-art traffic prediction models. Furthermore, ablation experiments confirm the effectiveness of dynamic multi-graph fusion and RL-IFSR approaches for improving model performance. This research provides a generalized and interpretable model for real-time evacuation traffic forecasting, with significant implications for evacuation traffic management.


【7】Hierarchical Pooling and Explainability in Graph Neural Networks for Tumor and Tissue-of-Origin Classification Using RNA-seq Data
标题:使用RN-seq数据进行肿瘤和起源组织分类的图神经网络分层池和可解释性
链接:https://arxiv.org/abs/2601.06381

作者:Thomas Vaitses Fontanari,Mariana Recamonde-Mendoza
摘要:This study explores the use of graph neural networks (GNNs) with hierarchical pooling and multiple convolution layers for cancer classification based on RNA-seq data. We combine gene expression data from The Cancer Genome Atlas (TCGA) with a precomputed STRING protein-protein interaction network to classify tissue origin and distinguish between normal and tumor samples. The model employs Chebyshev graph convolutions (K=2) and weighted pooling layers, aggregating gene clusters into 'supernodes' across multiple coarsening levels. This approach enables dimensionality reduction while preserving meaningful interactions. Saliency methods were applied to interpret the model by identifying key genes and biological processes relevant to cancer. Our findings reveal that increasing the number of convolution and pooling layers did not enhance classification performance. The highest F1-macro score (0.978) was achieved with a single pooling layer. However, adding more layers resulted in over-smoothing and performance degradation. However, the model proved highly interpretable through gradient methods, identifying known cancer-related genes and highlighting enriched biological processes, and its hierarchical structure can be used to develop new explainable architectures. Overall, while deeper GNN architectures did not improve performance, the hierarchical pooling structure provided valuable insights into tumor biology, making GNNs a promising tool for cancer biomarker discovery and interpretation


Transformer(3篇)

【1】DDT: A Dual-Masking Dual-Expert Transformer for Energy Time-Series Forecasting
标题:DDT:一种用于能源时间序列预测的双屏蔽双专家Transformer
链接:https://arxiv.org/abs/2601.07250

作者:Mingnan Zhu,Qixuan Zhang,Yixuan Cheng,Fangzhou Gu,Shiming Lin
摘要:Accurate energy time-series forecasting is crucial for ensuring grid stability and promoting the integration of renewable energy, yet it faces significant challenges from complex temporal dependencies and the heterogeneity of multi-source data. To address these issues, we propose DDT, a novel and robust deep learning framework for high-precision time-series forecasting. At its core, DDT introduces two key innovations. First, we design a dual-masking mechanism that synergistically combines a strict causal mask with a data-driven dynamic mask. This novel design ensures theoretical causal consistency while adaptively focusing on the most salient historical information, overcoming the rigidity of traditional masking techniques. Second, our architecture features a dual-expert system that decouples the modeling of temporal dynamics and cross-variable correlations into parallel, specialized pathways, which are then intelligently integrated through a dynamic gated fusion module. We conducted extensive experiments on 7 challenging energy benchmark datasets, including ETTh, Electricity, and Solar. The results demonstrate that DDT consistently outperforms strong state-of-the-art baselines across all prediction horizons, establishing a new benchmark for the task.


【2】Generalization Bounds for Transformer Channel Decoders
标题:Transformer通道译码器的广义界
链接:https://arxiv.org/abs/2601.06969

作者:Qinshan Zhang,Bin Chen,Yong Jiang,Shu-Tao Xia
备注:18 pages, 3 figures
摘要:Transformer channel decoders, such as the Error Correction Code Transformer (ECCT), have shown strong empirical performance in channel decoding, yet their generalization behavior remains theoretically unclear. This paper studies the generalization performance of ECCT from a learning-theoretic perspective. By establishing a connection between multiplicative noise estimation errors and bit-error-rate (BER), we derive an upper bound on the generalization gap via bit-wise Rademacher complexity. The resulting bound characterizes the dependence on code length, model parameters, and training set size, and applies to both single-layer and multi-layer ECCTs. We further show that parity-check-based masked attention induces sparsity that reduces the covering number, leading to a tighter generalization bound. To the best of our knowledge, this work provides the first theoretical generalization guarantees for this class of decoders.


【3】Circuit Mechanisms for Spatial Relation Generation in Diffusion Transformers
标题:扩散变换器中空间关系生成的电路机制
链接:https://arxiv.org/abs/2601.06338

作者:Binxu Wang,Jingxuan Fan,Xu Pan
备注:31 pages, 23 figures
摘要:Diffusion Transformers (DiTs) have greatly advanced text-to-image generation, but models still struggle to generate the correct spatial relations between objects as specified in the text prompt. In this study, we adopt a mechanistic interpretability approach to investigate how a DiT can generate correct spatial relations between objects. We train, from scratch, DiTs of different sizes with different text encoders to learn to generate images containing two objects whose attributes and spatial relations are specified in the text prompt. We find that, although all the models can learn this task to near-perfect accuracy, the underlying mechanisms differ drastically depending on the choice of text encoder. When using random text embeddings, we find that the spatial-relation information is passed to image tokens through a two-stage circuit, involving two cross-attention heads that separately read the spatial relation and single-object attributes in the text prompt. When using a pretrained text encoder (T5), we find that the DiT uses a different circuit that leverages information fusion in the text tokens, reading spatial-relation and single-object information together from a single text token. We further show that, although the in-domain performance is similar for the two settings, their robustness to out-of-domain perturbations differs, potentially suggesting the difficulty of generating correct relations in real-world scenarios.


GAN|对抗|攻击|生成相关(7篇)

【1】Self-Creating Random Walks for Decentralized Learning under Pac-Man Attacks
标题:吃豆人攻击下的去中心化学习自创建随机行走
链接:https://arxiv.org/abs/2601.07674

作者:Xingran Chen,Parimal Parag,Rohit Bhagat,Salim El Rouayheb
备注:arXiv admin note: substantial text overlap with arXiv:2508.05663
摘要:Random walk (RW)-based algorithms have long been popular in distributed systems due to low overheads and scalability, with recent growing applications in decentralized learning. However, their reliance on local interactions makes them inherently vulnerable to malicious behavior. In this work, we investigate an adversarial threat that we term the ``Pac-Man'' attack, in which a malicious node probabilistically terminates any RW that visits it. This stealthy behavior gradually eliminates active RWs from the network, effectively halting the learning process without triggering failure alarms. To counter this threat, we propose the CREATE-IF-LATE (CIL) algorithm, which is a fully decentralized, resilient mechanism that enables self-creating RWs and prevents RW extinction in the presence of Pac-Man. Our theoretical analysis shows that the CIL algorithm guarantees several desirable properties, such as (i) non-extinction of the RW population, (ii) almost sure boundedness of the RW population, and (iii) convergence of RW-based stochastic gradient descent even in the presence of Pac-Man with a quantifiable deviation from the true optimum. Moreover, the learning process experiences at most a linear time delay due to Pac-Man interruptions and RW regeneration. Our extensive empirical results on both synthetic and public benchmark datasets validate our theoretical findings.


【2】Generating readily synthesizable small molecule fluorophore scaffolds with reinforcement learning
标题:通过强化学习生成易于合成的小分子荧光团支架
链接:https://arxiv.org/abs/2601.07145

作者:Ruhi Sayana,Kate Callon,Jennifer Xu,Jonathan Deutsch,Steven Chu,James Zou,John Janetzko,Rabindra V. Shivnaraine,Kyle Swanson
摘要:Developing new fluorophores for advanced imaging techniques requires exploring new chemical space. While generative AI approaches have shown promise in designing novel dye scaffolds, prior efforts often produced synthetically intractable candidates due to a lack of reaction constraints. Here, we developed SyntheFluor-RL, a generative AI model that employs known reaction libraries and molecular building blocks to create readily synthesizable fluorescent molecule scaffolds via reinforcement learning. To guide the generation of fluorophores, SyntheFluor-RL employs a scoring function built on multiple graph neural networks (GNNs) that predict key photophysical properties, including photoluminescence quantum yield, absorption, and emission wavelengths. These outputs are dynamically weighted and combined with a computed pi-conjugation score to prioritize candidates with desirable optical characteristics and synthetic feasibility. SyntheFluor-RL generated 11,590 candidate molecules, which were filtered to 19 structures predicted to possess dye-like properties. Of the 19 molecules, 14 were synthesized and 13 were experimentally confirmed. The top three were characterized, with the lead compound featuring a benzothiadiazole chromophore and exhibiting strong fluorescence (PLQY = 0.62), a large Stokes shift (97 nm), and a long excited-state lifetime (11.5 ns). These results demonstrate the effectiveness of SyntheFluor-RL in the identification of synthetically accessible fluorophores for further development.


【3】Reward-Preserving Attacks For Robust Reinforcement Learning
标题:鲁棒强化学习的奖励保护攻击
链接:https://arxiv.org/abs/2601.07118

作者:Lucas Schott,Elies Gherbi,Hatem Hajri,Sylvain Lamprier
备注:19 pages, 6 figures, 4 algorithms, preprint
摘要:Adversarial robustness in RL is difficult because perturbations affect entire trajectories: strong attacks can break learning, while weak attacks yield little robustness, and the appropriate strength varies by state. We propose $α$-reward-preserving attacks, which adapt the strength of the adversary so that an $α$ fraction of the nominal-to-worst-case return gap remains achievable at each state. In deep RL, we use a gradient-based attack direction and learn a state-dependent magnitude $η\le η_{\mathcal B}$ selected via a critic $Q^π_α((s,a),η)$ trained off-policy over diverse radii. This adaptive tuning calibrates attack strength and, with intermediate $α$, improves robustness across radii while preserving nominal performance, outperforming fixed- and random-radius baselines.


【4】ALFA: A Safe-by-Design Approach to Mitigate Quishing Attacks Launched via Fancy QR Codes
标题:ALFA:一种缓解通过Fancy QR码发起的Quishing攻击的设计安全方法
链接:https://arxiv.org/abs/2601.06768

作者:Muhammad Wahid Akram,Keshav Sood,Muneeb Ul Hassan,Dhananjay Thiruvady
备注 :LNCS Springer Template (19 pages, 5 figures, 4 tables). This paper is currently submitted to 31st European Symposium on Research in Computer Security (ESORICS) 2026 for publication
摘要:Phishing with Quick Response (QR) codes is termed as Quishing. The attackers exploit this method to manipulate individuals into revealing their confidential data. Recently, we see the colorful and fancy representations of QR codes, the 2D matrix of QR codes which does not reflect a typical mixture of black-white modules anymore. Instead, they become more tempting as an attack vector for adversaries which can evade the state-of-the-art deep learning visual-based and other prevailing countermeasures. We introduce "ALFA", a safe-by-design approach, to mitigate Quishing and prevent everyone from accessing the post-scan harmful payload of fancy QR codes. Our method first converts a fancy QR code into the replica of binary grid and then identify the erroneous representation of modules in that grid. Following that, we present "FAST" method which can conveniently recover erroneous modules from that binary grid. Afterwards, using this binary grid, our solution extracts the structural features of fancy QR code and predicts its legitimacy using a pre-trained model. The effectiveness of our proposal is demonstrated by the experimental evaluation on a synthetic dataset (containing diverse variations of fancy QR codes) and achieve a FNR of 0.06% only. We also develop the mobile app to test the practical feasibility of our solution and provide a performance comparison of the app with the real-world QR readers. This comparison further highlights the classification reliability and detection accuracy of this solution in real-world environments.


【5】Leveraging Soft Prompts for Privacy Attacks in Federated Prompt Tuning
标题:在联合即时调优中利用软脚本进行隐私攻击
链接:https://arxiv.org/abs/2601.06641

作者:Quan Minh Nguyen,Min-Seon Kim,Hoang M. Ngo,Trong Nghia Hoang,Hyuk-Yoon Kwon,My T. Thai
摘要:Membership inference attack (MIA) poses a significant privacy threat in federated learning (FL) as it allows adversaries to determine whether a client's private dataset contains a specific data sample. While defenses against membership inference attacks in standard FL have been well studied, the recent shift toward federated fine-tuning has introduced new, largely unexplored attack surfaces. To highlight this vulnerability in the emerging FL paradigm, we demonstrate that federated prompt-tuning, which adapts pre-trained models with small input prefixes to improve efficiency, also exposes a new vector for privacy attacks. We propose PromptMIA, a membership inference attack tailored to federated prompt-tuning, in which a malicious server can insert adversarially crafted prompts and monitors their updates during collaborative training to accurately determine whether a target data point is in a client's private dataset. We formalize this threat as a security game and empirically show that PromptMIA consistently attains high advantage in this game across diverse benchmark datasets. Our theoretical analysis further establishes a lower bound on the attack's advantage which explains and supports the consistently high advantage observed in our empirical results. We also investigate the effectiveness of standard membership inference defenses originally developed for gradient or output based attacks and analyze their interaction with the distinct threat landscape posed by PromptMIA. The results highlight non-trivial challenges for current defenses and offer insights into their limitations, underscoring the need for defense strategies that are specifically tailored to prompt-tuning in federated settings.


【6】AIS-CycleGen: A CycleGAN-Based Framework for High-Fidelity Synthetic AIS Data Generation and Augmentation
标题:AIS-CycleGen:一个基于CycleGAN的框架,用于高保真合成AIS数据生成和增强
链接:https://arxiv.org/abs/2601.06127

作者:SM Ashfaq uz Zaman,Faizan Qamar,Masnizah Mohd,Nur Hanis Sabrina Suhaimi,Amith Khandakar
备注:25 pages, 16 figures
摘要:Automatic Identification System (AIS) data are vital for maritime domain awareness, yet they often suffer from domain shifts, data sparsity, and class imbalance, which hinder the performance of predictive models. In this paper, we propose a robust data augmentation method, AISCycleGen, based on Cycle-Consistent Generative Adversarial Networks (CycleGAN), which is tailored for AIS datasets. Unlike traditional methods, AISCycleGen leverages unpaired domain translation to generate high-fidelity synthetic AIS data sequences without requiring paired source-target data. The framework employs a 1D convolutional generator with adaptive noise injection to preserve the spatiotemporal structure of AIS trajectories, enhancing the diversity and realism of the generated data. To demonstrate its efficacy, we apply AISCycleGen to several baseline regression models, showing improvements in performance across various maritime domains. The results indicate that AISCycleGen outperforms contemporary GAN-based augmentation techniques, achieving a PSNR value of 30.5 and an FID score of 38.9. These findings underscore AISCycleGen's potential as an effective and generalizable solution for augmenting AIS datasets, improving downstream model performance in real-world maritime intelligence applications.


【7】Stress Testing Machine Learning at $10^{10}$ Scale: A Comprehensive Study of Adversarial Robustness on Algebraically Structured Integer Streams
标题:压力测试机器学习$10 ##规模:对代数结构化数据流的对抗鲁棒性的全面研究
链接:https://arxiv.org/abs/2601.06117

作者:HyunJun Jeon
摘要:This paper presents a large-scale stress test of machine learning systems using structured mathematical data as a benchmark. We evaluate the robustness of tree-based classifiers at an unprecedented scale, utilizing ten billion deterministic samples and five billion adversarial counterexamples. Our framework introduces three primary contributions:   first, a high-throughput pipeline that reformulates Pythagorean triple generation into a single-parameter index stream, significantly improving computational efficiency over classical methods;   second, the Hypothesis-driven Negative Dataset (HND), which categorizes nine classes of adversarial attacks designed to exploit both arithmetic precision and structural patterns; and   third, a fault-tolerant infrastructure for reliable large-scale training. Experimental results demonstrate that while LightGBM achieves 99.99% accuracy, feature attribution reveals that the model prioritizes underlying quadratic patterns over direct algebraic verification.   These findings suggest that learned heuristics can effectively identify structural representations in numerical data, potentially serving as efficient preprocessors for formal verification methods.


半/弱/无/有监督|不确定性|主动学习(7篇)

【1】Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation
标题:故障感知RL:可靠的离线到在线强化学习,具有现实世界操纵的自我恢复
链接:https://arxiv.org/abs/2601.07821

作者:Huanyu Li,Kun Lei,Sheng Zang,Kaizhe Hu,Yongyuan Liang,Bo An,Xiaoli Li,Huazhe Xu
备注:Project page: https://failure-aware-rl.github.io
摘要:Post-training algorithms based on deep reinforcement learning can push the limits of robotic models for specific objectives, such as generalizability, accuracy, and robustness. However, Intervention-requiring Failures (IR Failures) (e.g., a robot spilling water or breaking fragile glass) during real-world exploration happen inevitably, hindering the practical deployment of such a paradigm. To tackle this, we introduce Failure-Aware Offline-to-Online Reinforcement Learning (FARL), a new paradigm minimizing failures during real-world reinforcement learning. We create FailureBench, a benchmark that incorporates common failure scenarios requiring human intervention, and propose an algorithm that integrates a world-model-based safety critic and a recovery policy trained offline to prevent failures during online exploration. Extensive simulation and real-world experiments demonstrate the effectiveness of FARL in significantly reducing IR Failures while improving performance and generalization during online reinforcement learning post-training. FARL reduces IR Failures by 73.1% while elevating performance by 11.3% on average during real-world RL post-training. Videos and code are available at https://failure-aware-rl.github.io.


【2】AntiPaSTO: Self-Supervised Steering of Moral Reasoning
标题:AntiPaSTO:道德推理的自我监督引导
链接:https://arxiv.org/abs/2601.07473

作者:Michael J. Clark
摘要:As models grow more capable, human supervision breaks down: labels don't scale, outputs can be gamed, and training doesn't generalize. Scalable oversight requires steering methods that are internal, self-supervised, and transfer out-of-distribution; existing methods satisfy some but not all three. We introduce AntiPaSTO, which separates representations along an anti-parallel axis ($α=\pm1$ produce opposite shifts), with coherence constraints preventing collapse. Human input is minimal: two contrasting words inserted into template sentences, no preference labels. Using 800 such pairs on Gemma-3-1B, AntiPaSTO beats prompting baselines by $6.9\times$ on DailyDilemmas and maintains bidirectional control where prompting triggers refusal.   Code is available at https://github.com/wassname/AntiPaSTO.


【3】On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training
标题:训练后监督微调和强化学习的非脱钩
链接:https://arxiv.org/abs/2601.07389

作者:Xueyan Niu,Bo Bai,Wei Han,Weixi Zhang
摘要:Post-training of large language models routinely interleaves supervised fine-tuning (SFT) with reinforcement learning (RL). These two methods have different objectives: SFT minimizes the cross-entropy loss between model outputs and expert responses, while RL maximizes reward signals derived from human preferences or rule-based verifiers. Modern reasoning models have widely adopted the practice of alternating SFT and RL training. However, there is no theoretical account of whether they can be decoupled. We prove that decoupling is impossible in either order: (1) SFT-then-RL coupling: RL increases SFT loss under SFT optimality and (2) RL-then-SFT coupling: SFT lowers the reward achieved by RL. Experiments on Qwen3-0.6B confirm the predicted degradation, verifying that SFT and RL cannot be separated without loss of prior performance in the post-training


【4】Active Learning Strategies for Efficient Machine-Learned Interatomic Potentials Across Diverse Material Systems
标题:跨不同材料系统的高效机器学习原子间势的主动学习策略
链接:https://arxiv.org/abs/2601.06916

作者:Mohammed Azeez Khan,Aaron D'Souza,Vijay Choyal
备注:16 pages, 3 figures, 2 tables
摘要:Efficient discovery of new materials demands strategies to reduce the number of costly first-principles calculations required to train predictive machine learning models. We develop and validate an active learning framework that iteratively selects informative training structures for machine-learned interatomic potentials (MLIPs) from large, heterogeneous materials databases, specifically the Materials Project and OQMD. Our framework integrates compositional and property-based descriptors with a neural network ensemble model, enabling real-time uncertainty quantification via Query-by-Committee. We systematically compare four selection strategies: random sampling (baseline), uncertainty-based sampling, diversity-based sampling (k-means clustering with farthest-point refinement), and a hybrid approach balancing both objectives. Experiments across four representative material systems (elemental carbon, silicon, iron, and a titanium-oxide compound) with 5 random seeds per configuration demonstrate that diversity sampling consistently achieves competitive or superior performance, with particularly strong advantages on complex systems like titanium-oxide (10.9% improvement, p=0.008). Our results show that intelligent data selection strategies can achieve target accuracy with 5-13% fewer labeled samples compared to random baselines. The entire pipeline executes on Google Colab in under 4 hours per system using less than 8 GB of RAM, thereby democratizing MLIP development for researchers globally with limited computational resources. Our open-source code and detailed experimental configurations are available on GitHub. This multi-system evaluation establishes practical guidelines for data-efficient MLIP training and highlights promising future directions including integration with symmetry-aware neural network architectures.


【5】UMLoc: Uncertainty-Aware Map-Constrained Inertial Localization with Quantified Bounds
标题:UMLoc:具有量化边界的不确定性感知地图约束惯性定位
链接:https://arxiv.org/abs/2601.06602

作者:Mohammed S. Alharbi,Shinkyu Park
摘要 :Inertial localization is particularly valuable in GPS-denied environments such as indoors. However, localization using only Inertial Measurement Units (IMUs) suffers from drift caused by motion-process noise and sensor biases. This paper introduces Uncertainty-aware Map-constrained Inertial Localization (UMLoc), an end-to-end framework that jointly models IMU uncertainty and map constraints to achieve drift-resilient positioning. UMLoc integrates two coupled modules: (1) a Long Short-Term Memory (LSTM) quantile regressor, which estimates the specific quantiles needed to define 68%, 90%, and 95% prediction intervals serving as a measure of localization uncertainty and (2) a Conditioned Generative Adversarial Network (CGAN) with cross-attention that fuses IMU dynamic data with distance-based floor-plan maps to generate geometrically feasible trajectories. The modules are trained jointly, allowing uncertainty estimates to propagate through the CGAN during trajectory generation. UMLoc was evaluated on three datasets, including a newly collected 2-hour indoor benchmark with time-aligned IMU data, ground-truth poses and floor-plan maps. Results show that the method achieves a mean drift ratio of 5.9% over a 70 m travel distance and an average Absolute Trajectory Error (ATE) of 1.36 m, while maintaining calibrated prediction bounds.


【6】Supervised and Unsupervised Neural Network Solver for First Order Hyperbolic Nonlinear PDEs
标题:一阶双曲非线性偏出方程的有监督和无监督神经网络求解器
链接:https://arxiv.org/abs/2601.06388

作者:Zakaria Baba,Alexandre M. Bayen,Alexi Canesse,Maria Laura Delle Monache,Martin Drieux,Zhe Fu,Nathan Lichtlé,Zihe Liu,Hossein Nick Zinat Matin,Benedetto Piccoli
摘要:We present a neural network-based method for learning scalar hyperbolic conservation laws. Our method replaces the traditional numerical flux in finite volume schemes with a trainable neural network while preserving the conservative structure of the scheme. The model can be trained both in a supervised setting with efficiently generated synthetic data or in an unsupervised manner, leveraging the weak formulation of the partial differential equation. We provide theoretical results that our model can perform arbitrarily well, and provide associated upper bounds on neural network size. Extensive experiments demonstrate that our method often outperforms efficient schemes such as Godunov's scheme, WENO, and Discontinuous Galerkin for comparable computational budgets. Finally, we demonstrate the effectiveness of our method on a traffic prediction task, leveraging field experimental highway data from the Berkeley DeepDrive drone dataset.


【7】Future-as-Label: Scalable Supervision from Real-World Outcomes
标题:未来标签:来自现实世界结果的可扩展监督
链接:https://arxiv.org/abs/2601.06336

作者:Benjamin Turtel,Paul Wilczewski,Danny Franklin,Kris Skothiem
摘要:Many real-world prediction problems lack labels observable at prediction time, creating a temporal gap between prediction and outcome that yields supervision only after events resolve. To address this setting, we extend reinforcement learning with verifiable rewards to temporally resolved real-world prediction, and use it to train language models to make probabilistic forecasts under causally masked information with retrospective evaluation using proper scoring rules. Supervision is derived solely from post-resolution outcomes, preserving delayed-reward semantics. On real-world forecasting benchmarks, Qwen3-32B trained using Foresight Learning improves Brier score by 27% and halves calibration error relative to its pretrained baseline, and outperforms Qwen3-235B on both constructed future-event prediction tasks and the Metaculus benchmark despite a 7x parameter disadvantage.


迁移|Zero/Few/One-Shot|自适应(11篇)

【1】Free-RBF-KAN: Kolmogorov-Arnold Networks with Adaptive Radial Basis Functions for Efficient Function Learning
标题:Free-RBS-KAN:具有自适应放射基函数的Kolmogorov-Arnold网络用于高效的函数学习
链接:https://arxiv.org/abs/2601.07760

作者:Shao-Ting Chiu,Siu Wun Cheung,Ulisses Braga-Neto,Chak Shing Lee,Rui Peng Li
摘要:Kolmogorov-Arnold Networks (KANs) have shown strong potential for efficiently approximating complex nonlinear functions. However, the original KAN formulation relies on B-spline basis functions, which incur substantial computational overhead due to De Boor's algorithm. To address this limitation, recent work has explored alternative basis functions such as radial basis functions (RBFs) that can improve computational efficiency and flexibility. Yet, standard RBF-KANs often sacrifice accuracy relative to the original KAN design. In this work, we propose Free-RBF-KAN, a RBF-based KAN architecture that incorporates adaptive learning grids and trainable smoothness to close this performance gap. Our method employs freely learnable RBF shapes that dynamically align grid representations with activation patterns, enabling expressive and adaptive function approximation. Additionally, we treat smoothness as a kernel parameter optimized jointly with network weights, without increasing computational complexity. We provide a general universality proof for RBF-KANs, which encompasses our Free-RBF-KAN formulation. Through a broad set of experiments, including multiscale function approximation, physics-informed machine learning, and PDE solution operator learning, Free-RBF-KAN achieves accuracy comparable to the original B-spline-based KAN while delivering faster training and inference. These results highlight Free-RBF-KAN as a compelling balance between computational efficiency and adaptive resolution, particularly for high-dimensional structured modeling tasks.


【2】Improving Domain Generalization in Contrastive Learning using Adaptive Temperature Control
标题:使用自适应温度控制改进对比学习中的领域概括
链接:https://arxiv.org/abs/2601.07748

作者:Robert Lewis,Katie Matton,Rosalind W. Picard,John Guttag
备注:NeurIPS SSL Workshop 2023
摘要 :Self-supervised pre-training with contrastive learning is a powerful method for learning from sparsely labeled data. However, performance can drop considerably when there is a shift in the distribution of data from training to test time. We study this phenomenon in a setting in which the training data come from multiple domains, and the test data come from a domain not seen at training that is subject to significant covariate shift. We present a new method for contrastive learning that incorporates domain labels to increase the domain invariance of learned representations, leading to improved out-of-distribution generalization. Our method adjusts the temperature parameter in the InfoNCE loss -- which controls the relative weighting of negative pairs -- using the probability that a negative sample comes from the same domain as the anchor. This upweights pairs from more similar domains, encouraging the model to discriminate samples based on domain-invariant attributes. Through experiments on a variant of the MNIST dataset, we demonstrate that our method yields better out-of-distribution performance than domain generalization baselines. Furthermore, our method maintains strong in-distribution task performance, substantially outperforming baselines on this measure.


【3】Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration
标题:巩固还是适应?PRism:通过梯度浓度解开SFT和RL数据
链接:https://arxiv.org/abs/2601.07224

作者:Yang Zhao,Yangou Ouyang,Xiao Ding,Hepeng Wang,Bibo Cai,Kai Xiong,Jinglong Gao,Zhouhao Sun,Li Du,Bing Qin,Ting Liu
摘要:While Hybrid Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has become the standard paradigm for training LLM agents, effective mechanisms for data allocation between these stages remain largely underexplored. Current data arbitration strategies often rely on surface-level heuristics that fail to diagnose intrinsic learning needs. Since SFT targets pattern consolidation through imitation while RL drives structural adaptation via exploration, misaligning data with these functional roles causes severe optimization interference. We propose PRISM, a dynamics-aware framework grounded in Schema Theory that arbitrates data based on its degree of cognitive conflict with the model's existing knowledge. By analyzing the spatial geometric structure of gradients, PRISM identifies data triggering high spatial concentration as high-conflict signals that require RL for structural restructuring. In contrast, data yielding diffuse updates is routed to SFT for efficient consolidation. Extensive experiments on WebShop and ALFWorld demonstrate that PRISM achieves a Pareto improvement, outperforming state-of-the-art hybrid methods while reducing computational costs by up to 3.22$\times$. Our findings suggest that disentangling data based on internal optimization regimes is crucial for scalable and robust agent alignment.


【4】MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization
标题:MAESTRO:奖励优化的规模化权衡的元学习自适应估计
链接:https://arxiv.org/abs/2601.07208

作者:Yang Zhao,Hepeng Wang,Xiao Ding,Yangou Ouyang,Bibo Cai,Kai Xiong,Jinglong Gao,Zhouhao Sun,Li Du,Bing Qin,Ting Liu
摘要:Group-Relative Policy Optimization (GRPO) has emerged as an efficient paradigm for aligning Large Language Models (LLMs), yet its efficacy is primarily confined to domains with verifiable ground truths. Extending GRPO to open-domain settings remains a critical challenge, as unconstrained generation entails multi-faceted and often conflicting objectives - such as creativity versus factuality - where rigid, static reward scalarization is inherently suboptimal. To address this, we propose MAESTRO (Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization), which introduces a meta-cognitive orchestration layer that treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck to perceive task-specific priorities. We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal. Across seven benchmarks, MAESTRO consistently outperforms single-reward and static multi-objective baselines, while preserving the efficiency advantages of GRPO, and in some settings even reducing redundant generation.


【5】Offline Meta-Reinforcement Learning with Flow-Based Task Inference and Adaptive Correction of Feature Overgeneralization
标题:具有基于流的任务推理和特征过度概括的自适应纠正的离线元强化学习
链接:https://arxiv.org/abs/2601.07164

作者:Min Wang,Xin Li,Mingzhong Wang,Hasnaa Bennis
摘要:Offline meta-reinforcement learning (OMRL) combines the strengths of learning from diverse datasets in offline RL with the adaptability to new tasks of meta-RL, promising safe and efficient knowledge acquisition by RL agents. However, OMRL still suffers extrapolation errors due to out-of-distribution (OOD) actions, compromised by broad task distributions and Markov Decision Process (MDP) ambiguity in meta-RL setups. Existing research indicates that the generalization of the $Q$ network affects the extrapolation error in offline RL. This paper investigates this relationship by decomposing the $Q$ value into feature and weight components, observing that while decomposition enhances adaptability and convergence in the case of high-quality data, it often leads to policy degeneration or collapse in complex tasks. We observe that decomposed $Q$ values introduce a large estimation bias when the feature encounters OOD samples, a phenomenon we term ''feature overgeneralization''. To address this issue, we propose FLORA, which identifies OOD samples by modeling feature distributions and estimating their uncertainties. FLORA integrates a return feedback mechanism to adaptively adjust feature components. Furthermore, to learn precise task representations, FLORA explicitly models the complex task distribution using a chain of invertible transformations. We theoretically and empirically demonstrate that FLORA achieves rapid adaptation and meta-policy improvement compared to baselines across various environments.


【6】Stable On-Policy Distillation through Adaptive Target Reformulation
标题:通过自适应目标重构实现稳定的按策略蒸馏
链接:https://arxiv.org/abs/2601.07155

作者:Ijun Jang,Jewon Yeom,Juan Yeo,Hyunggu Lim,Taesup Kim
备注:10 pages, 5 figures
摘要 :Knowledge distillation (KD) is a widely adopted technique for transferring knowledge from large language models to smaller student models; however, conventional supervised KD often suffers from a distribution mismatch between training and inference. While on-policy KD approaches attempt to mitigate this issue by learning directly from student-generated outputs, they frequently encounter training instabilities because the distributional gap between the novice student and the expert teacher is often too wide to bridge directly. These challenges manifest as pathological gradients in forward KL objectives or diversity collapse in reverse KL regimes. To address these limitations, we propose Veto, an objective-level reformulation that constructs a geometric bridge in the logit space. Unlike prior methods that mix data samples, Veto creates an intermediate target distribution that promotes alignment between the teacher and the student. By introducing a tunable parameter beta, Veto serves as an Adaptive Gradient Veto that stabilizes optimization by suppressing harmful gradients on low-confidence tokens, while simultaneously acting as a Decisiveness Knob to balance reward-driven performance with output diversity. Extensive experiments across various reasoning and generation tasks demonstrate that Veto consistently outperforms supervised fine-tuning and existing on-policy baselines.


【7】U-MASK: User-adaptive Spatio-Temporal Masking for Personalized Mobile AI Applications
标题:U-MASK:个性化移动人工智能应用程序的用户自适应时空掩蔽
链接:https://arxiv.org/abs/2601.06867

作者:Shiyuan Zhang,Yilai Liu,Yuwei Du,Ruoxuan Yang,Dong In Kim,Hongyang Du
备注:18 pages, 9 figures
摘要:Personalized mobile artificial intelligence applications are widely deployed, yet they are expected to infer user behavior from sparse and irregular histories under a continuously evolving spatio-temporal context. This setting induces a fundamental tension among three requirements, i.e., immediacy to adapt to recent behavior, stability to resist transient noise, and generalization to support long-horizon prediction and cold-start users. Most existing approaches satisfy at most two of these requirements, resulting in an inherent impossibility triangle in data-scarce, non-stationary personalization. To address this challenge, we model mobile behavior as a partially observed spatio-temporal tensor and unify short-term adaptation, long-horizon forecasting, and cold-start recommendation as a conditional completion problem, where a user- and task-specific mask specifies which coordinates are treated as evidence. We propose U-MASK, a user-adaptive spatio-temporal masking method that allocates evidence budgets based on user reliability and task sensitivity. To enable mask generation under sparse observations, U-MASK learns a compact, task-agnostic user representation from app and location histories via U-SCOPE, which serves as the sole semantic conditioning signal. A shared diffusion transformer then performs mask-guided generative completion while preserving observed evidence, so personalization and task differentiation are governed entirely by the mask and the user representation. Experiments on real-world mobile datasets demonstrate consistent improvements over state-of-the-art methods across short-term prediction, long-horizon forecasting, and cold-start settings, with the largest gains under severe data sparsity. The code and dataset will be available at https://github.com/NICE-HKU/U-MASK.


【8】PRISP: Privacy-Safe Few-Shot Personalization via Lightweight Adaptation
标题:PRISP:通过轻量级适应实现隐私安全的Few-Shot个性化
链接:https://arxiv.org/abs/2601.06471

作者:Junho Park,Dohoon Kim,Taesup Moon
备注:16 pages, 9 figures
摘要:Large language model (LLM) personalization aims to adapt general-purpose models to individual users. Most existing methods, however, are developed under data-rich and resource-abundant settings, often incurring privacy risks. In contrast, realistic personalization typically occurs after deployment under (i) extremely limited user data, (ii) constrained computational resources, and (iii) strict privacy requirements. We propose PRISP, a lightweight and privacy-safe personalization framework tailored to these constraints. PRISP leverages a Text-to-LoRA hypernetwork to generate task-aware LoRA parameters from task descriptions, and enables efficient user personalization by optimizing a small subset of task-aware LoRA parameters together with minimal additional modules using few-shot user data. Experiments on a few-shot variant of the LaMP benchmark demonstrate that PRISP achieves strong overall performance compared to prior approaches, while reducing computational overhead and eliminating privacy risks.


【9】One-Shot Hierarchical Federated Clustering
标题:一次分层联合集群
链接:https://arxiv.org/abs/2601.06404

作者:Shenghong Cai,Zihua Yang,Yang Lu,Mengke Li,Yuzhu Ji,Yiqun Zhang,Yiu-Ming Cheung
摘要:Driven by the growth of Web-scale decentralized services, Federated Clustering (FC) aims to extract knowledge from heterogeneous clients in an unsupervised manner while preserving the clients' privacy, which has emerged as a significant challenge due to the lack of label guidance and the Non-Independent and Identically Distributed (non-IID) nature of clients. In real scenarios such as personalized recommendation and cross-device user profiling, the global cluster may be fragmented and distributed among different clients, and the clusters may exist at different granularities or even nested. Although Hierarchical Clustering (HC) is considered promising for exploring such distributions, the sophisticated recursive clustering process makes it more computationally expensive and vulnerable to privacy exposure, thus relatively unexplored under the federated learning scenario. This paper introduces an efficient one-shot hierarchical FC framework that performs client-end distribution exploration and server-end distribution aggregation through one-way prototype-level communication from clients to the server. A fine partition mechanism is developed to generate successive clusterlets to describe the complex landscape of the clients' clusters. Then, a multi-granular learning mechanism on the server is proposed to fuse the clusterlets, even when they have inconsistent granularities generated from different clients. It turns out that the complex cluster distributions across clients can be efficiently explored, and extensive experiments comparing state-of-the-art methods on ten public datasets demonstrate the superiority of the proposed method.


【10】Parent-Guided Adaptive Reliability (PGAR): A Behavioural Meta-Learning Framework for Stable and Trustworthy AI
标题:家长引导的自适应可靠性(PGAR):稳定且值得信赖的人工智能的行为元学习框架
链接:https://arxiv.org/abs/2601.06167

作者:Anshum Rankawat
备注:9 pages, 8 figures, 2 tables. Submitted to IEEE Transactions on Artificial Intelligence
摘要 :Parent-Guided Adaptive Reliability (PGAR) is a lightweight behavioural meta-learning framework that adds a supervisory "parent" layer on top of a standard learner to improve stability, calibration, and recovery under disturbances. PGAR computes three reflex-level signals (incident detection, overconfidence correction, and recovery memory) and fuses them into a bounded reliability index in [0,1]. This index continuously modulates the learner's effective learning rate, reducing update magnitude during instability and restoring it as reliability improves. We provide a Lyapunov-based proof sketch establishing bounded adaptation of the reliability dynamics under mild assumptions (smooth loss, descent direction, and bounded reflex outputs). Empirical evaluations on representative learning tasks show improved calibration, reduced loss variance, and faster recovery compared to standard optimizers, while retaining computational simplicity. PGAR functions as a plug-in reliability layer for existing optimization and learning pipelines, supporting interpretable reliability traces in safety-relevant settings.


【11】Attention in Geometry: Scalable Spatial Modeling via Adaptive Density Fields and FAISS-Accelerated Kernels
标题:关注几何:通过自适应密度场和FAISS加速核进行可扩展空间建模
链接:https://arxiv.org/abs/2601.06135

作者:Zhaowen Fan
备注:Indepented Study. 22 pages, 2 figures. Includes full mathematical derivation of Adaptive Density Fields (ADF), implementation of FAISS-accelerated kernels, and a physics-informed trajectory POI detection pipeline
摘要:This work introduces Adaptive Density Fields (ADF), a geometric attention framework that formulates spatial aggregation as a query-conditioned, metric-induced attention operator in continuous space. By reinterpreting spatial influence as geometry-preserving attention grounded in physical distance, ADF bridges concepts from adaptive kernel methods and attention mechanisms. Scalability is achieved via FAISS-accelerated inverted file indices, treating approximate nearest-neighbor search as an intrinsic component of the attention mechanism. We demonstrate the framework through a case study on aircraft trajectory analysis in the Chengdu region, extracting trajectory-conditioned Zones of Influence (ZOI) to reveal recurrent airspace structures and localized deviations.


强化学习(5篇)

【1】Stagewise Reinforcement Learning and the Geometry of the Regret Landscape
标题:阶段强化学习和遗憾景观的几何学
链接:https://arxiv.org/abs/2601.07524

作者:Chris Elliott,Einar Urdshals,David Quarel,Matthew Farrugia-Roberts,Daniel Murfet
备注:50 pages, 14 figures
摘要:Singular learning theory characterizes Bayesian learning as an evolving tradeoff between accuracy and complexity, with transitions between qualitatively different solutions as sample size increases. We extend this theory to deep reinforcement learning, proving that the concentration of the generalized posterior over policies is governed by the local learning coefficient (LLC), an invariant of the geometry of the regret function. This theory predicts that Bayesian phase transitions in reinforcement learning should proceed from simple policies with high regret to complex policies with low regret. We verify this prediction empirically in a gridworld environment exhibiting stagewise policy development: phase transitions over SGD training manifest as "opposing staircases" where regret decreases sharply while the LLC increases. Notably, the LLC detects phase transitions even when estimated on a subset of states where the policies appear identical in terms of regret, suggesting it captures changes in the underlying algorithm rather than just performance.


【2】Puzzle it Out: Local-to-Global World Model for Offline Multi-Agent Reinforcement Learning
标题:解谜:离线多智能体强化学习的本地到全球世界模型
链接:https://arxiv.org/abs/2601.07463

作者:Sijia li,Xinran Li,Shibo Chen,Jun Zhang
摘要:Offline multi-agent reinforcement learning (MARL) aims to solve cooperative decision-making problems in multi-agent systems using pre-collected datasets. Existing offline MARL methods primarily constrain training within the dataset distribution, resulting in overly conservative policies that struggle to generalize beyond the support of the data. While model-based approaches offer a promising solution by expanding the original dataset with synthetic data generated from a learned world model, the high dimensionality, non-stationarity, and complexity of multi-agent systems make it challenging to accurately estimate the transitions and reward functions in offline MARL. Given the difficulty of directly modeling joint dynamics, we propose a local-to-global (LOGO) world model, a novel framework that leverages local predictions-which are easier to estimate-to infer global state dynamics, thus improving prediction accuracy while implicitly capturing agent-wise dependencies. Using the trained world model, we generate synthetic data to augment the original dataset, expanding the effective state-action space. To ensure reliable policy learning, we further introduce an uncertainty-aware sampling mechanism that adaptively weights synthetic data by prediction uncertainty, reducing approximation error propagation to policies. In contrast to conventional ensemble-based methods, our approach requires only an additional encoder for uncertainty estimation, significantly reducing computational overhead while maintaining accuracy. Extensive experiments across 8 scenarios against 8 baselines demonstrate that our method surpasses state-of-the-art baselines on standard offline MARL benchmarks, establishing a new model-based baseline for generalizable offline multi-agent learning.


【3】Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy
标题:与Delta一起思考:通过差异视觉推理政策激励强化学习
链接:https://arxiv.org/abs/2601.06801

作者:Shujian Gao,Yuan Wang,Jiangtao Yan,Zuxuan Wu,Yu-Gang Jiang
备注:24 pages, 10 tables, 4 figures
摘要 :Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced reasoning capabilities in Large Language Models. However, adapting RLVR to multimodal domains suffers from a critical \textit{perception-reasoning decoupling}. Existing paradigms, driven by text-centric outcome rewards, reasoning in language medium, inadvertently encourage models to bypass visual perception. We empirically validate this through blind experiments: state-of-the-art policies maintain or surprisingly improve performance even when visual inputs are entirely removed. This reveals that these models degenerate into \textit{blind reasoners}, exploiting linguistic priors to generate plausible answers instead of attending to visual evidence. In response, we propose \textbf{Thinking with Deltas}, a framework driven by a \textbf{Differential Visual Reasoning Policy (DVRP)}. DVRP introduces intrinsic supervision via visual triplets, comprising original, masked, and perturbed inputs. It optimizes the model to maximize reasoning divergence from masked inputs (enforcing \textit{visual sensitivity}) while minimizing divergence from perturbed inputs (ensuring \textit{visual robustness}). By aligning reasoning variations strictly with the \textit{Delta} of visual information, DVRP inherently bolsters visual understanding capabilities and significantly outperforms state-of-the-art methods on both general and medical benchmarks, without requiring external annotations or auxiliary tools.


【4】Reinforcement Learning for Chain of Thought Compression with One-Domain-to-All Generalization
标题:具有一域对所有概括的思维链压缩强化学习
链接:https://arxiv.org/abs/2601.06052

作者:Hanyu Li,Jiangshan Duo,Bofei Gao,Hailin Zhang,Sujian Li,Xiaotie Deng,Liang Zhao
摘要:Chain-of-thought reasoning in large language models often creates an "overthinking trap," leading to excessive computational cost and latency for unreliable accuracy gains. Prior work has typically relied on global, static controls that risk penalizing necessary reasoning. We introduce a sample-level, soft reinforcement learning compression method that penalizes inefficiently long rollouts, but only on problems where the model has already mastered and already produced a more concise rollout. Our experiments show that this method reduces average response length by 20-40% with comparable or higher accuracy. Crucially, the compression exhibits strong cross-domain generalization; a model trained on math spontaneously shortens responses on unseen tasks like code, instruction following, and general knowledge QA, with stable or improved accuracy. We demonstrate a stable post-training curriculum (accuracy-compression-accuracy) that can ultimately produce models that are more accurate and reason more concisely, arguing that such compression method should be a standard phase in developing efficient reasoning models.


【5】Reinforcement Learning for Micro-Level Claims Reserving
标题:微观层面索赔保留的强化学习
链接:https://arxiv.org/abs/2601.07637

作者:Benjamin Avanzi,Ronald Richman,Bernard Wong,Mario Wüthrich,Yagebu Xie
摘要:Outstanding claim liabilities are revised repeatedly as claims develop, yet most modern reserving models are trained as one-shot predictors and typically learn only from settled claims. We formulate individual claims reserving as a claim-level Markov decision process in which an agent sequentially updates outstanding claim liability (OCL) estimates over development, using continuous actions and a reward design that balances accuracy with stable reserve revisions. A key advantage of this reinforcement learning (RL) approach is that it can learn from all observed claim trajectories, including claims that remain open at valuation, thereby avoiding the reduced sample size and selection effects inherent in supervised methods trained on ultimate outcomes only. We also introduce practical components needed for actuarial use -- initialisation of new claims, temporally consistent tuning via a rolling-settlement scheme, and an importance-weighting mechanism to mitigate portfolio-level underestimation driven by the rarity of large claims. On CAS and SPLICE synthetic general insurance datasets, the proposed Soft Actor-Critic implementation delivers competitive claim-level accuracy and strong aggregate OCL performance, particularly for the immature claim segments that drive most of the liability.


元学习(2篇)

【1】Performance of models for monitoring sustainable development goals from remote sensing: A three-level meta-regression
标题:遥感监测可持续发展目标模型的性能:三级元回归
链接:https://arxiv.org/abs/2601.06178

作者:Jonas Klingwort,Nina M. Leach,Joep Burger
备注:33 pages, 10 figures, 5 tables
摘要:Machine learning (ML) is a tool to exploit remote sensing data for the monitoring and implementation of the United Nations' Sustainable Development Goals (SDGs). In this paper, we report on a meta-analysis to evaluate the performance of ML applied to remote sensing data to monitor SDGs. Specifically, we aim to 1) estimate the average performance; 2) determine the degree of heterogeneity between and within studies; and 3) assess how study features influence model performance. Using PRISMA guidelines, a search was performed across multiple academic databases to identify potentially relevant studies. A random sample of 200 was screened by three reviewers, resulting in 86 trials within 20 studies with 14 study features. Overall accuracy was the most reported performance metric. It was analyzed using double arcsine transformation and a three-level random effects model. The average overall accuracy of the best model was 0.90 [0.86, 0.92]. There was considerable heterogeneity in model performance, 64% of which was between studies. The only significant feature was the prevalence of the majority class, which explained 61% of the between-study heterogeneity. None of the other thirteen features added value to the model. The most important contributions of this paper are the following two insights. 1) Overall accuracy is the most popular performance metric, yet arguably the least insightful. Its sensitivity to class imbalance makes it necessary to normalize it, which is far from common practice. 2) The field needs to standardize the reporting. Reporting of the confusion matrix for independent test sets is the most important ingredient for between-study comparisons of ML classifiers. These findings underscore the need for robust and comparable evaluation metrics in machine learning applications to ensure reliable and actionable insights for effective SDG monitoring and policy formulation.


【2】Temporal-Aligned Meta-Learning for Risk Management: A Stacking Approach for Multi-Source Credit Scoring
标题:风险管理的时间一致元学习:多来源信用评分的堆叠方法
链接:https://arxiv.org/abs/2601.07588

作者:O. Didkovskyi,A. Vidali,N. Jean,G. Le Pera
摘要 :This paper presents a meta-learning framework for credit risk assessment of Italian Small and Medium Enterprises (SMEs) that explicitly addresses the temporal misalignment of credit scoring models.   The approach aligns financial statement reference dates with evaluation dates, mitigating bias arising from publication delays and asynchronous data sources. It is based on a two-step temporal decomposition that at first estimates annual probabilities of default (PDs) anchored to balance-sheet reference dates (December 31st) through a static model. Then it models the monthly evolution of PDs using higher-frequency behavioral data. Finally, we employ stacking-based architecture to aggregate multiple scoring systems, each capturing complementary aspects of default risk, into a unified predictive model. In this way, first level model outputs are treated as learned representations that encode non-linear relationships in financial and behavioral indicators, allowing integration of new expert-based features without retraining base models. This design provides a coherent and interpretable solution to challenges typical of low-default environments, including heterogeneous default definitions and reporting delays. Empirical validation shows that the framework effectively captures credit risk evolution over time, improving temporal consistency and predictive stability relative to standard ensemble methods.


医学相关(8篇)

【1】Neural Architecture for Fast and Reliable Coagulation Assessment in Clinical Settings: Leveraging Thromboelastography
标题:用于临床环境中快速可靠的凝血评估的神经结构:利用血栓弹性成像
链接:https://arxiv.org/abs/2601.07618

作者:Yulu Wang,Ziqian Zeng,Jianjun Wu,Zhifeng Tang
备注:This paper has been accepted by AAAI26
摘要:In an ideal medical environment, real-time coagulation monitoring can enable early detection and prompt remediation of risks. However, traditional Thromboelastography (TEG), a widely employed diagnostic modality, can only provide such outputs after nearly 1 hour of measurement. The delay might lead to elevated mortality rates. These issues clearly point out one of the key challenges for medical AI development: Mak-ing reasonable predictions based on very small data sets and accounting for variation between different patient populations, a task where conventional deep learning methods typically perform poorly. We present Physiological State Reconstruc-tion (PSR), a new algorithm specifically designed to take ad-vantage of dynamic changes between individuals and to max-imize useful information produced by small amounts of clini-cal data through mapping to reliable predictions and diagnosis. We develop MDFE to facilitate integration of varied temporal signals using multi-domain learning, and jointly learn high-level temporal interactions together with attentions via HLA; furthermore, the parameterized DAM we designed maintains the stability of the computed vital signs. PSR evaluates with 4 TEG-specialized data sets and establishes remarkable perfor-mance -- predictions of R2 > 0.98 for coagulation traits and error reduction around half compared to the state-of-the-art methods, and halving the inferencing time too. Drift-aware learning suggests a new future, with potential uses well be-yond thrombophilia discovery towards medical AI applica-tions with data scarcity.


【2】Contextual Discrepancy-Aware Contrastive Learning for Robust Medical Time Series Diagnosis in Small-Sample Scenarios
标题:上下文差异感知对比学习用于小样本场景中稳健的医学时间序列诊断
链接:https://arxiv.org/abs/2601.07548

作者:Kaito Tanaka,Aya Nakayama,Masato Ito,Yuji Nishimura,Keisuke Matsuda
摘要:Medical time series data, such as EEG and ECG, are vital for diagnosing neurological and cardiovascular diseases. However, their precise interpretation faces significant challenges due to high annotation costs, leading to data scarcity, and the limitations of traditional contrastive learning in capturing complex temporal patterns. To address these issues, we propose CoDAC (Contextual Discrepancy-Aware Contrastive learning), a novel framework that enhances diagnostic accuracy and generalization, particularly in small-sample settings. CoDAC leverages external healthy data and introduces a Contextual Discrepancy Estimator (CDE), built upon a Transformer-based Autoencoder, to precisely quantify abnormal signals through context-aware anomaly scores. These scores dynamically inform a Dynamic Multi-views Contrastive Framework (DMCF), which adaptively weights different temporal views to focus contrastive learning on diagnostically relevant, discrepant regions. Our encoder combines dilated convolutions with multi-head attention for robust feature extraction. Comprehensive experiments on Alzheimer's Disease EEG, Parkinson's Disease EEG, and Myocardial Infarction ECG datasets demonstrate CoDAC's superior performance across all metrics, consistently outperforming state-of-the-art baselines, especially under low label availability. Ablation studies further validate the critical contributions of CDE and DMCF. CoDAC offers a robust and interpretable solution for medical time series diagnosis, effectively mitigating data scarcity challenges.


【3】Variational Autoencoder with Normalizing flow for X-ray spectral fitting
标题:具有标准化流程的变分自动编码器,用于X射线光谱匹配
链接:https://arxiv.org/abs/2601.07440

作者:Fiona Redmen,Ethan Tregidga,James F. Steiner,Cecilia Garraffo
备注:7 pages, 1 table, 3 figures. Accepted as a workshop paper to Machine Learning and the Physical Sciences at NeurIPS 2025
摘要:Black hole X-ray binaries (BHBs) can be studied with spectral fitting to provide physical constraints on accretion in extreme gravitational environments. Traditional methods of spectral fitting such as Markov Chain Monte Carlo (MCMC) face limitations due to computational times. We introduce a probabilistic model, utilizing a variational autoencoder with a normalizing flow, trained to adopt a physical latent space. This neural network produces predictions for spectral-model parameters as well as their full probability distributions. Our implementations result in a significant improvement in spectral reconstructions over a previous deterministic model while performing three orders of magnitude faster than traditional methods.


【4】Computing patient similarity based on unstructured clinical notes
标题:基于非结构化临床笔记计算患者相似性
链接:https://arxiv.org/abs/2601.07385

作者:Petr Zelina,Marko Řeháček,Jana Halámková,Lucia Bohovicová,Martin Rusinko,Vít Nováček
备注:This is a preprint and has not undergone peer review. Final version was presented at the Text, Speech, and Dialogue 2025 conference. The Version of Record is available at https://doi.org/10.1007/978-3-032-02551-7_13
摘要 :Clinical notes hold rich yet unstructured details about diagnoses, treatments, and outcomes that are vital to precision medicine but hard to exploit at scale. We introduce a method that represents each patient as a matrix built from aggregated embeddings of all their notes, enabling robust patient similarity computation based on their latent low-rank representations. Using clinical notes of 4,267 Czech breast-cancer patients and expert similarity labels from Masaryk Memorial Cancer Institute, we evaluate several matrix-based similarity measures and analyze their strengths and limitations across different similarity facets, such as clinical history, treatment, and adverse events. The results demonstrate the usefulness of the presented method for downstream tasks, such as personalized therapy recommendations or toxicity warnings.


【5】BEAT-Net: Injecting Biomimetic Spatio-Temporal Priors for Interpretable ECG Classification
标题:BEAT-Net:注入仿生时空先验以实现可解释的心电图分类
链接:https://arxiv.org/abs/2601.07316

作者:Runze Ma,Caizhi Liao
备注:8 pages, 4 figures and 2 tables
摘要:Although deep learning has advanced automated electrocardiogram (ECG) diagnosis, prevalent supervised methods typically treat recordings as undifferentiated one-dimensional (1D) signals or two-dimensional (2D) images. This formulation compels models to learn physiological structures implicitly, resulting in data inefficiency and opacity that diverge from medical reasoning. To address these limitations, we propose BEAT-Net, a Biomimetic ECG Analysis with Tokenization framework that reformulates the problem as a language modeling task. Utilizing a QRS tokenization strategy to transform continuous signals into biologically aligned heartbeat sequences, the architecture explicitly decomposes cardiac physiology through specialized encoders that extract local beat morphology while normalizing spatial lead perspectives and modeling temporal rhythm dependencies. Evaluations across three large-scale benchmarks demonstrate that BEAT-Net matches the diagnostic accuracy of dominant convolutional neural network (CNN) architectures while substantially improving robustness. The framework exhibits exceptional data efficiency, recovering fully supervised performance using only 30 to 35 percent of annotated data. Moreover, learned attention mechanisms provide inherent interpretability by spontaneously reproducing clinical heuristics, such as Lead II prioritization for rhythm analysis, without explicit supervision. These findings indicate that integrating biological priors offers a computationally efficient and interpretable alternative to data-intensive large-scale pre-training.


【6】$\texttt{AMEND++}$: Benchmarking Eligibility Criteria Amendments in Clinical Trials
标题:$ extttt {AMND ++}$:临床试验中的资格标准修正案基准
链接:https://arxiv.org/abs/2601.06300

作者:Trisha Das,Mandis Beigi,Jacob Aptekar,Jimeng Sun
摘要:Clinical trial amendments frequently introduce delays, increased costs, and administrative burden, with eligibility criteria being the most commonly amended component. We introduce \textit{eligibility criteria amendment prediction}, a novel NLP task that aims to forecast whether the eligibility criteria of an initial trial protocol will undergo future amendments. To support this task, we release $\texttt{AMEND++}$, a benchmark suite comprising two datasets: $\texttt{AMEND}$, which captures eligibility-criteria version histories and amendment labels from public clinical trials, and $\verb|AMEND_LLM|$, a refined subset curated using an LLM-based denoising pipeline to isolate substantive changes. We further propose $\textit{Change-Aware Masked Language Modeling}$ (CAMLM), a revision-aware pretraining strategy that leverages historical edits to learn amendment-sensitive representations. Experiments across diverse baselines show that CAMLM consistently improves amendment prediction, enabling more robust and cost-effective clinical trial design.


【7】A Multimodal Deep Learning Framework for Predicting ICU Deterioration: Integrating ECG Waveforms with Clinical Data and Clinician Benchmarking
标题:用于预测ICU恶化的多模式深度学习框架:将心电图波与临床数据和临床医生基准集成
链接:https://arxiv.org/abs/2601.06645

作者:Juan Miguel López Alcaraz,Xicoténcatl López Moran,Erick Dávila Zaragoza,Claas Händel,Richard Koebe,Wilhelm Haverkamp,Nils Strodthoff
备注:23 pages, 8 figures, source code under https://github.com/AI4HealthUOL/MDS-ICU
摘要:Artificial intelligence holds strong potential to support clinical decision making in intensive care units where timely and accurate risk assessment is critical. However, many existing models focus on isolated outcomes or limited data types, while clinicians integrate longitudinal history, real time physiology, and heterogeneous clinical information. To address this gap, we developed MDS ICU, a unified multimodal machine learning framework that fuses routinely collected data including demographics, biometrics, vital signs, laboratory values, ECG waveforms, surgical procedures, and medical device usage to provide continuous predictive support during ICU stays. Using 63001 samples from 27062 patients in MIMIC IV, we trained a deep learning architecture that combines structured state space S4 encoders for ECG waveforms with multilayer perceptron RealMLP encoders for tabular data to jointly predict 33 clinically relevant outcomes spanning mortality, organ dysfunction, medication needs, and acute deterioration. The model achieved strong discrimination with AUROCs of 0.90 for 24 hour mortality, 0.92 for sedative administration, 0.97 for invasive mechanical ventilation, and 0.93 for coagulation dysfunction. Calibration analysis showed close agreement between predicted and observed risks, with consistent gains from ECG waveform integration. Comparisons with clinicians and large language models showed that model predictions alone outperformed both, and that providing model outputs as decision support further improved their performance. These results demonstrate that multimodal AI can deliver clinically meaningful risk stratification across diverse ICU outcomes while augmenting rather than replacing clinical expertise, establishing a scalable foundation for precision critical care decision support.


【8】Computational Mapping of Reactive Stroma in Prostate Cancer Yields Interpretable, Prognostic Biomarkers
标题:前列腺癌反应性间质的计算绘图产生可解释、预后的生物标志物
链接:https://arxiv.org/abs/2601.06360

作者:Mara Pleasure,Ekaterina Redekop,Dhakshina Ilango,Zichen Wang,Vedrana Ivezic,Kimberly Flores,Israa Laklouk,Jitin Makker,Gregory Fishbein,Anthony Sisk,William Speier,Corey W. Arnold
摘要 :Current histopathological grading of prostate cancer relies primarily on glandular architecture, largely overlooking the tumor microenvironment. Here, we present PROTAS, a deep learning framework that quantifies reactive stroma (RS) in routine hematoxylin and eosin (H&E) slides and links stromal morphology to underlying biology. PROTAS-defined RS is characterized by nuclear enlargement, collagen disorganization, and transcriptomic enrichment of contractile pathways. PROTAS detects RS robustly in the external Prostate, Lung, Colorectal, and Ovarian (PLCO) dataset and, using domain-adversarial training, generalizes to diagnostic biopsies. In head-to-head comparisons, PROTAS outperforms pathologists for RS detection, and spatial RS features predict biochemical recurrence independently of established prognostic variables (c-index 0.80). By capturing subtle stromal phenotypes associated with tumor progression, PROTAS provides an interpretable, scalable biomarker to refine risk stratification.


蒸馏|知识提取(3篇)

【1】Softly Induced Functional Simplicity Implications for Neural Network Generalisation, Robustness, and Distillation
标题:软诱导的功能简单性对神经网络概括、鲁棒性和蒸馏的影响
链接:https://arxiv.org/abs/2601.06584

作者:Maciej Glowacki
摘要:Learning robust and generalisable abstractions from high-dimensional input data is a central challenge in machine learning and its applications to high-energy physics (HEP). Solutions of lower functional complexity are known to produce abstractions that generalise more effectively and are more robust to input perturbations. In complex hypothesis spaces, inductive biases make such solutions learnable by shaping the loss geometry during optimisation. In a HEP classification task, we show that a soft symmetry respecting inductive bias creates approximate degeneracies in the loss, which we identify as pseudo-Goldstone modes. We quantify functional complexity using metrics derived from first principles Hessian analysis and via compressibility. Our results demonstrate that solutions of lower complexity give rise to abstractions that are more generalisable, robust, and efficiently distillable.


【2】When Smaller Wins: Dual-Stage Distillation and Pareto-Guided Compression of Liquid Neural Networks for Edge Battery Prognostics
标题:当较小的获胜时:用于边缘电池预测的液体神经网络的双级蒸馏和帕累托引导压缩
链接:https://arxiv.org/abs/2601.06227

作者:Dhivya Dharshini Kannan,Wei Li,Zhang Wei,Jianbiao Wang,Zhi Wei Seh,Man-Fai Ng
备注:Submitted to International Conference on Pattern Recognition, ICPR 2026
摘要:Battery management systems increasingly require accurate battery health prognostics under strict on-device constraints. This paper presents DLNet, a practical framework with dual-stage distillation of liquid neural networks that turns a high-capacity model into compact and edge-deployable models for battery health prediction. DLNet first applies Euler discretization to reformulate liquid dynamics for embedded compatibility. It then performs dual-stage knowledge distillation to transfer the teacher model's temporal behavior and recover it after further compression. Pareto-guided selection under joint error-cost objectives retains student models that balance accuracy and efficiency. We evaluate DLNet on a widely used dataset and validate real-device feasibility on an Arduino Nano 33 BLE Sense using int8 deployment. The final deployed student achieves a low error of 0.0066 when predicting battery health over the next 100 cycles, which is 15.4% lower than the teacher model. It reduces the model size from 616 kB to 94 kB with 84.7% reduction and takes 21 ms per inference on the device. These results support a practical smaller wins observation that a small model can match or exceed a large teacher for edge-based prognostics with proper supervision and selection. Beyond batteries, the DLNet framework can extend to other industrial analytics tasks with strict hardware constraints.


【3】PromptPort: A Reliability Layer for Cross-Model Structured Extraction
标题:EntPort:跨模型结构化提取的可靠性层
链接:https://arxiv.org/abs/2601.06151

作者:Varun Kotte
备注:12 pages, 4 figures
摘要:Structured extraction with LLMs fails in production not because models lack understanding, but because output formatting is unreliable across models and prompts. A prompt that returns clean JSON on GPT-4 may produce fenced, prose-wrapped, or malformed output on Llama, causing strict parsers to reject otherwise correct extractions. We formalize this as format collapse and introduce a dual-metric evaluation framework: ROS (strict parsing, measuring operational reliability) and CSS (post-canonicalization, measuring semantic capability). On a 37,346-example camera metadata benchmark across six model families, we find severe format collapse (for example, Gemma-2B: ROS 0.116 versus CSS 0.246) and large cross-model portability gaps (0.4 to 0.6 F1). We then present PromptPort, a reliability layer combining deterministic canonicalization with a lightweight verifier (DistilBERT) and a safe-override policy. PromptPort recovers format failures (plus 6 to 8 F1), adds verifier-driven semantic selection (plus 14 to 16 F1 beyond canonicalization), and approaches per-field oracle performance (0.890 versus 0.896 in zero-shot) without modifying base models. The method generalizes to held-out model families and provides explicit abstention when uncertain, enabling reliable structured extraction in production deployments.


推荐(2篇)

【1】Pragya: An AI-Based Semantic Recommendation System for Sanskrit Subhasitas
标题:Pragya:梵语Subhasitas基于人工智能的语义推荐系统
链接:https://arxiv.org/abs/2601.06607

作者:Tanisha Raorane,Prasenjit Kole
备注:Preprint
摘要 :Sanskrit Subhasitas encapsulate centuries of cultural and philosophical wisdom, yet remain underutilized in the digital age due to linguistic and contextual barriers. In this work, we present Pragya, a retrieval-augmented generation (RAG) framework for semantic recommendation of Subhasitas. We curate a dataset of 200 verses annotated with thematic tags such as motivation, friendship, and compassion. Using sentence embeddings (IndicBERT), the system retrieves top-k verses relevant to user queries. The retrieved results are then passed to a generative model (Mistral LLM) to produce transliterations, translations, and contextual explanations. Experimental evaluation demonstrates that semantic retrieval significantly outperforms keyword matching in precision and relevance, while user studies highlight improved accessibility through generated summaries. To our knowledge, this is the first attempt at integrating retrieval and generation for Sanskrit Subhasitas, bridging cultural heritage with modern applied AI.


【2】PixRec: Leveraging Visual Context for Next-Item Prediction in Sequential Recommendation
标题:PixRec:在顺序推荐中利用视觉上下文进行下一项预测
链接:https://arxiv.org/abs/2601.06458

作者:Sayak Chakrabarty,Souradip Pal
备注:9 pages, 2 figures
摘要:Large Language Models (LLMs) have recently shown strong potential for usage in sequential recommendation tasks through text-only models, which combine advanced prompt design, contrastive alignment, and fine-tuning on downstream domain-specific data. While effective, these approaches overlook the rich visual information present in many real-world recommendation scenarios, particularly in e-commerce. This paper proposes PixRec - a vision-language framework that incorporates both textual attributes and product images into the recommendation pipeline. Our architecture leverages a vision-language model backbone capable of jointly processing image-text sequences, maintaining a dual-tower structure and mixed training objective while aligning multi-modal feature projections for both item-item and user-item interactions. Using the Amazon Reviews dataset augmented with product images, our experiments demonstrate $3\times$ and 40% improvements in top-rank and top-10 rank accuracy over text-only recommenders respectively, indicating that visual features can help distinguish items with similar textual descriptions. Our work outlines future directions for scaling multi-modal recommenders training, enhancing visual-text feature fusion, and evaluating inference-time performance. This work takes a step toward building software systems utilizing visual information in sequential recommendation for real-world applications like e-commerce.


聚类(4篇)

【1】TFEC: Multivariate Time-Series Clustering via Temporal-Frequency Enhanced Contrastive Learning
标题:TFEC:基于时频增强对比学习的多变量时间序列聚类
链接:https://arxiv.org/abs/2601.07550

作者:Zexi Tan,Tao Xie,Haoyi Xiao,Baoyao Yang,Yuzhu Ji,An Zeng,Xiang Zhang,Yiqun Zhang
备注:Submitted to ICASSP 2026
摘要:Multivariate Time-Series (MTS) clustering is crucial for signal processing and data analysis. Although deep learning approaches, particularly those leveraging Contrastive Learning (CL), are prominent for MTS representation, existing CL-based models face two key limitations: 1) neglecting clustering information during positive/negative sample pair construction, and 2) introducing unreasonable inductive biases, e.g., destroying time dependence and periodicity through augmentation strategies, compromising representation quality. This paper, therefore, proposes a Temporal-Frequency Enhanced Contrastive (TFEC) learning framework. To preserve temporal structure while generating low-distortion representations, a temporal-frequency Co-EnHancement (CoEH) mechanism is introduced. Accordingly, a synergistic dual-path representation and cluster distribution learning framework is designed to jointly optimize cluster structure and representation fidelity. Experiments on six real-world benchmark datasets demonstrate TFEC's superiority, achieving 4.48% average NMI gains over SOTA methods, with ablation studies validating the design. The code of the paper is available at: https://github.com/yueliangy/TFEC.


【2】Surrogate-based Optimization via Clustering for Box-Constrained Problems
标题:基于代理的箱形约束聚类优化算法
链接:https://arxiv.org/abs/2601.07442

作者:Maaz Ahmad,Iftekhar A. Karimi
备注:34 pages, 4 Figures, 8 Tables
摘要:Global optimization of large-scale, complex systems such as multi-physics black-box simulations and real-world industrial systems is important but challenging. This work presents a novel Surrogate-Based Optimization framework based on Clustering, SBOC for global optimization of such systems, which can be used with any surrogate modeling technique. At each iteration, it uses a single surrogate model for the entire domain, employs k-means clustering to identify unexplored domain, and exploits a local region around the surrogate optimum to potentially add three new sample points in the domain. SBOC has been tested against sixteen promising benchmarking algorithms using 52 analytical test functions of varying input dimensionalities and shape profiles. It successfully identified a global minimum for most test functions with substantially lower computational effort than other algorithms. It worked especially well on test functions with four or more input variables. It was also among the top six algorithms in approaching a global minimum closely. Overall, SBOC is a robust, reliable, and efficient algorithm for global optimization of box-constrained systems.


【3】LDTC: Lifelong deep temporal clustering for multivariate time series
标题:LDTC:多元时间序列的终身深度时间聚集
链接:https://arxiv.org/abs/2601.06221

作者:Zhi Wang,Yanni Li,Pingping Zheng,Yiyuan Jiao
摘要 :Clustering temporal and dynamically changing multivariate time series from real-world fields, called temporal clustering for short, has been a major challenge due to inherent complexities. Although several deep temporal clustering algorithms have demonstrated a strong advantage over traditional methods in terms of model learning and clustering results, the accuracy of the few algorithms are not satisfactory. None of the existing algorithms can continuously learn new tasks and deal with the dynamic data effectively and efficiently in the sequential tasks learning. To bridge the gap and tackle these issues, this paper proposes a novel algorithm \textbf{L}ifelong \textbf{D}eep \textbf{T}emporal \textbf{C}lustering (\textbf{LDTC}), which effectively integrates dimensionality reduction and temporal clustering into an end-to-end deep unsupervised learning framework. Using a specifically designed autoencoder and jointly optimizing for both the latent representation and clustering objective, the LDTC can achieve high-quality clustering results. Moreover, unlike any previous work, the LDTC is uniquely equipped with the fully dynamic model expansion and rehearsal-based techniques to effectively learn new tasks and to tackle the dynamic data in the sequential tasks learning without the catastrophic forgetting or degradation of the model accuracy. Experiments on seven real-world multivariate time series datasets show that the LDTC is a promising method for dealing with temporal clustering issues effectively and efficiently.


【4】Nonparametric Kernel Clustering with Bandit Feedback
标题:具有Bandit反馈的非参数核集群
链接:https://arxiv.org/abs/2601.07535

作者:Victor Thuot,Sebastian Vogt,Debarghya Ghoshdastidar,Nicolas Verzelen
摘要:Clustering with bandit feedback refers to the problem of partitioning a set of items, where the clustering algorithm can sequentially query the items to receive noisy observations. The problem is formally posed as the task of partitioning the arms of an N-armed stochastic bandit according to their underlying distributions, grouping two arms together if and only if they share the same distribution, using samples collected sequentially and adaptively. This setting has gained attention in recent years due to its applicability in recommendation systems and crowdsourcing. Existing works on clustering with bandit feedback rely on a strong assumption that the underlying distributions are sub-Gaussian. As a consequence, the existing methods mainly cover settings with linearly-separable clusters, which has little practical relevance. We introduce a framework of ``nonparametric clustering with bandit feedback'', where the underlying arm distributions are not constrained to any parametric, and hence, it is applicable for active clustering of real-world datasets. We adopt a kernel-based approach, which allows us to reformulate the nonparametric problem as the task of clustering the arms according to their kernel mean embeddings in a reproducing kernel Hilbert space (RKHS). Building on this formulation, we introduce the KABC algorithm with theoretical correctness guarantees and analyze its sampling budget. We introduce a notion of signal-to-noise ratio for this problem that depends on the maximum mean discrepancy (MMD) between the arm distributions and on their variance in the RKHS. Our algorithm is adaptive to this unknown quantity: it does not require it as an input yet achieves instance-dependent guarantees.


超分辨率|去噪|去模糊|去雾(1篇)

【1】DaQ-MSA: Denoising and Qualifying Diffusion Augmentations for Multimodal Sentiment Analysis
标题:DaQ-GMA:用于多模式情绪分析的去噪和定性扩散增强
链接:https://arxiv.org/abs/2601.06870

作者:Jiazhang Liang,Jianheng Dai,Miaosen Luo,Menghua Jiang,Sijie Mai
备注:11 pages, 4 figures
摘要:Multimodal large language models (MLLMs) have demonstrated strong performance on vision-language tasks, yet their effectiveness on multimodal sentiment analysis remains constrained by the scarcity of high-quality training data, which limits accurate multimodal understanding and generalization. To alleviate this bottleneck, we leverage diffusion models to perform semantics-preserving augmentation on the video and audio modalities, expanding the multimodal training distribution. However, increasing data quantity alone is insufficient, as diffusion-generated samples exhibit substantial quality variation and noisy augmentations may degrade performance. We therefore propose DaQ-MSA (Denoising and Qualifying Diffusion Augmentations for Multimodal Sentiment Analysis), which introduces a quality scoring module to evaluate the reliability of augmented samples and assign adaptive training weights. By down-weighting low-quality samples and emphasizing high-fidelity ones, DaQ-MSA enables more stable learning. By integrating the generative capability of diffusion models with the semantic understanding of MLLMs, our approach provides a robust and generalizable automated augmentation strategy for training MLLMs without any human annotation or additional supervision.


自动驾驶|车辆|车道检测等(3篇)

【1】Studying the Role of Synthetic Data for Machine Learning-based Wireless Networks Traffic Forecasting
标题:研究合成数据在基于机器学习的无线网络流量预测中的作用
链接:https://arxiv.org/abs/2601.07646

作者:José Pulido,Francesc Wilhelmi,Sergio Fortes,Alfonso Fernández-Durán,Lorenzo Galati Giordano,Raquel Barco
摘要:Synthetic data generation is an appealing tool for augmenting and enriching datasets, playing a crucial role in advancing artificial intelligence (AI) and machine learning (ML). Not only does synthetic data help build robust AI/ML datasets cost-effectively, but it also offers privacy-friendly solutions and bypasses the complexities of storing large data volumes. This paper proposes a novel method to generate synthetic data, based on first-order auto-regressive noise statistics, for large-scale Wi-Fi deployments. The approach operates with minimal real data requirements while producing statistically rich traffic patterns that effectively mimic real Access Point (AP) behavior. Experimental results show that ML models trained on synthetic data achieve Mean Absolute Error (MAE) values within 10 to 15 of those obtained using real data when trained on the same APs, while requiring significantly less training data. Moreover, when generalization is required, synthetic-data-trained models improve prediction accuracy by up to 50 percent compared to real-data-trained baselines, thanks to the enhanced variability and diversity of the generated traces. Overall, the proposed method bridges the gap between synthetic data generation and practical Wi-Fi traffic forecasting, providing a scalable, efficient, and real-time solution for modern wireless networks.


【2】qAttCNN - Self Attention Mechanism for Video QoE Prediction in Encrypted Traffic
标题:qAttCNN -加密流量中视频Qoe预测的自我注意机制
链接:https://arxiv.org/abs/2601.06862

作者:Michael Sidorov,Ofer Hadar
摘要 :The rapid growth of multimedia consumption, driven by major advances in mobile devices since the mid-2000s, has led to widespread use of video conferencing applications (VCAs) such as Zoom and Google Meet, as well as instant messaging applications (IMAs) like WhatsApp and Telegram, which increasingly support video conferencing as a core feature. Many of these systems rely on the Web Real-Time Communication (WebRTC) protocol, enabling direct peer-to-peer media streaming without requiring a third-party server to relay data, reducing the latency and facilitating a real-time communication. Despite WebRTC's potential, adverse network conditions can degrade streaming quality and consequently reduce users' Quality of Experience (QoE). Maintaining high QoE therefore requires continuous monitoring and timely intervention when QoE begins to deteriorate. While content providers can often estimate QoE by directly comparing transmitted and received media, this task is significantly more challenging for internet service providers (ISPs). End-to-end encryption, commonly used by modern VCAs and IMAs, prevent ISPs from accessing the original media stream, leaving only Quality of Service (QoS) and routing information available. To address this limitation, we propose the QoE Attention Convolutional Neural Network (qAttCNN), a model that leverages packet size parameter of the traffic to infer two no-reference QoE metrics viz. BRISQUE and frames per second (FPS). We evaluate qAttCNN on a custom dataset collected from WhatsApp video calls and compare it against existing QoE models. Using mean absolute error percentage (MAEP), our approach achieves 2.14% error for BRISQUE and 7.39% for FPS prediction.


【3】Time-Series Anomaly Classification for Launch Vehicle Propulsion Systems: Fast Statistical Detectors Enhancing LSTM Accuracy and Data Quality
标题:运载火箭推进系统的时间序列异常分类:快速统计检测器提高LSTM准确性和数据质量
链接:https://arxiv.org/abs/2601.06186

作者:Sean P. Engelstad,Sameul R. Darr,Matthew Taliaferro,Vinay K. Goyal
备注:19 pages and 12 figures
摘要:Supporting Go/No-Go decisions prior to launch requires assessing real-time telemetry data against redline limits established during the design qualification phase. Family data from ground testing or previous flights is commonly used to detect initiating failure modes and their timing; however, this approach relies heavily on engineering judgment and is more error-prone for new launch vehicles. To address these limitations, we utilize Long-Term Short-Term Memory (LSTM) networks for supervised classification of time-series anomalies. Although, initial training labels derived from simulated anomaly data may be suboptimal due to variations in anomaly strength, anomaly settling times, and other factors. In this work, we propose a novel statistical detector based on the Mahalanobis distance and forward-backward detection fractions to adjust the supervised training labels. We demonstrate our method on digital twin simulations of a ground-stage propulsion system with 20.8 minutes of operation per trial and O(10^8) training timesteps. The statistical data relabeling improved precision and recall of the LSTM classifier by 7% and 22% respectively.


联邦学习|隐私保护|加密(5篇)

【1】Proof of Reasoning for Privacy Enhanced Federated Blockchain Learning at the Edge
标题:隐私推理证明增强边缘联邦区块链学习
链接:https://arxiv.org/abs/2601.07134

作者:James Calo,Benny Lo
备注:8 Pages, 5 figues, 9 tables, journal paper
摘要:Consensus mechanisms are the core of any blockchain system. However, the majority of these mechanisms do not target federated learning directly nor do they aid in the aggregation step. This paper introduces Proof of Reasoning (PoR), a novel consensus mechanism specifically designed for federated learning using blockchain, aimed at preserving data privacy, defending against malicious attacks, and enhancing the validation of participating networks. Unlike generic blockchain consensus mechanisms commonly found in the literature, PoR integrates three distinct processes tailored for federated learning. Firstly, a masked autoencoder (MAE) is trained to generate an encoder that functions as a feature map and obfuscates input data, rendering it resistant to human reconstruction and model inversion attacks. Secondly, a downstream classifier is trained at the edge, receiving input from the trained encoder. The downstream network's weights, a single encoded datapoint, the network's output and the ground truth are then added to a block for federated aggregation. Lastly, this data facilitates the aggregation of all participating networks, enabling more complex and verifiable aggregation methods than previously possible. This three-stage process results in more robust networks with significantly reduced computational complexity, maintaining high accuracy by training only the downstream classifier at the edge. PoR scales to large IoT networks with low latency and storage growth, and adapts to evolving data, regulations, and network conditions.


【2】Federated Continual Learning for Privacy-Preserving Hospital Imaging Classification
标题:联邦连续学习用于隐私保护的医院成像分类
链接:https://arxiv.org/abs/2601.06742

作者:Anay Sinhal,Arpana Sinhal,Amit Sinhal
摘要:Deep learning models for radiology interpretation increasingly rely on multi-institutional data, yet privacy regulations and distribution shift across hospitals limit central data pooling. Federated learning (FL) allows hospitals to collaboratively train models without sharing raw images, but current FL algorithms typically assume a static data distribution. In practice, hospitals experience continual evolution in case mix, annotation protocols, and imaging devices, which leads to catastrophic forgetting when models are updated sequentially. Federated continual learning (FCL) aims to reconcile these challenges but existing methods either ignore the stringent privacy constraints of healthcare or rely on replay buffers and public surrogate datasets that are difficult to justify in clinical settings. We study FCL for chest radiography classification in a setting where hospitals are clients that receive temporally evolving streams of cases and labels. We introduce DP-FedEPC (Differentially Private Federated Elastic Prototype Consolidation), a method that combines elastic weight consolidation (EWC), prototype-based rehearsal, and client-side differential privacy within a standard FedAvg framework. EWC constrains updates along parameters deemed important for previous tasks, while a memory of latent prototypes preserves class structure without storing raw images. Differentially private stochastic gradient descent (DP-SGD) at each client adds calibrated Gaussian noise to clipped gradients, providing formal privacy guarantees for individual radiographs.


【3】Certified Unlearning in Decentralized Federated Learning
标题:去中心化联邦学习中的认证取消学习
链接:https://arxiv.org/abs/2601.06436

作者:Hengliang Wu,Youming Tao,Anhao Zhou,Shuzhen Chen,Falko Dressler,Dongxiao Yu
摘要 :Driven by the right to be forgotten (RTBF), machine unlearning has become an essential requirement for privacy-preserving machine learning. However, its realization in decentralized federated learning (DFL) remains largely unexplored. In DFL, clients exchange local updates only with neighbors, causing model information to propagate and mix across the network. As a result, when a client requests data deletion, its influence is implicitly embedded throughout the system, making removal difficult without centralized coordination. We propose a novel certified unlearning framework for DFL based on Newton-style updates. Our approach first quantifies how a client's data influence propagates during training. Leveraging curvature information of the loss with respect to the target data, we then construct corrective updates using Newton-style approximations. To ensure scalability, we approximate second-order information via Fisher information matrices. The resulting updates are perturbed with calibrated noise and broadcast through the network to eliminate residual influence across clients. We theoretically prove that our approach satisfies the formal definition of certified unlearning, ensuring that the unlearned model is difficult to distinguish from a retrained model without the deleted data. We also establish utility bounds showing that the unlearned model remains close to retraining from scratch. Extensive experiments across diverse decentralized settings demonstrate the effectiveness and efficiency of our framework.


【4】Federated Learning and Class Imbalances
标题:联邦学习和班级失衡
链接:https://arxiv.org/abs/2601.06348

作者:Siqi Zhu,Joshua D. Kaggie
摘要:Federated Learning (FL) enables collaborative model training across decentralized devices while preserving data privacy. However, real-world FL deployments face critical challenges such as data imbalances, including label noise and non-IID distributions. RHFL+, a state-of-the-art method, was proposed to address these challenges in settings with heterogeneous client models. This work investigates the robustness of RHFL+ under class imbalances through three key contributions: (1) reproduction of RHFL+ along with all benchmark algorithms under a unified evaluation framework; (2) extension of RHFL+ to real-world medical imaging datasets, including CBIS-DDSM, BreastMNIST and BHI; (3) a novel implementation using NVFlare, NVIDIA's production-level federated learning framework, enabling a modular, scalable and deployment-ready codebase. To validate effectiveness, extensive ablation studies, algorithmic comparisons under various noise conditions and scalability experiments across increasing numbers of clients are conducted.


【5】Causal and Federated Multimodal Learning for Cardiovascular Risk Prediction under Heterogeneous Populations
标题:因果和联邦多模式学习用于非均匀人群心血管风险预测
链接:https://arxiv.org/abs/2601.06140

作者:Rohit Kaushik,Eva Kaushik
备注:9 pages, 5 figures
摘要:Cardiovascular disease (CVD) continues to be the major cause of death globally, calling for predictive models that not only handle diverse and high-dimensional biomedical signals but also maintain interpretability and privacy. We create a single multimodal learning framework that integrates cross modal transformers with graph neural networks and causal representation learning to measure personalized CVD risk. The model combines genomic variation, cardiac MRI, ECG waveforms, wearable streams, and structured EHR data to predict risk while also implementing causal invariance constraints across different clinical subpopulations.   To maintain transparency, we employ SHAP based feature attribution, counterfactual explanations and causal latent alignment for understandable risk factors. Besides, we position the design in a federated, privacy, preserving optimization protocol and establish rules for convergence, calibration and uncertainty quantification under distributional shift. Experimental studies based on large-scale biobank and multi institutional datasets reveal state discrimination and robustness, exhibiting fair performance across demographic strata and clinically distinct cohorts. This study paves the way for a principled approach to clinically trustworthy, interpretable and privacy respecting CVD prediction at the population level.


推理|分析|理解|解释(15篇)

【1】DT-ICU: Towards Explainable Digital Twins for ICU Patient Monitoring via Multi-Modal and Multi-Task Iterative Inference
标题:DT-ICU:通过多模式和多任务迭代推理为ICU患者监护打造可解释的数字孪生
链接:https://arxiv.org/abs/2601.07778

作者:Wen Guo
摘要:We introduce DT-ICU, a multimodal digital twin framework for continuous risk estimation in intensive care. DT-ICU integrates variable-length clinical time series with static patient information in a unified multitask architecture, enabling predictions to be updated as new observations accumulate over the ICU stay. We evaluate DT-ICU on the large, publicly available MIMIC-IV dataset, where it consistently outperforms established baseline models under different evaluation settings. Our test-length analysis shows that meaningful discrimination is achieved shortly after admission, while longer observation windows further improve the ranking of high-risk patients in highly imbalanced cohorts. To examine how the model leverages heterogeneous data sources, we perform systematic modality ablations, revealing that the model learnt a reasonable structured reliance on interventions, physiological response observations, and contextual information. These analyses provide interpretable insights into how multimodal signals are combined and how trade-offs between sensitivity and precision emerge. Together, these results demonstrate that DT-ICU delivers accurate, temporally robust, and interpretable predictions, supporting its potential as a practical digital twin framework for continuous patient monitoring in critical care. The source code and trained model weights for DT-ICU are publicly available at https://github.com/GUO-W/DT-ICU-release.


【2】The Practicality of Normalizing Flow Test-Time Training in Bayesian Inference for Agent-Based Models
标题:基于代理的模型的Bayesian推理中规范流测试时训练的实用性
链接:https://arxiv.org/abs/2601.07413

作者:Junyao Zhang,Jinglai Li,Junqi Tang
摘要 :Agent-Based Models (ABMs) are gaining great popularity in economics and social science because of their strong flexibility to describe the realistic and heterogeneous decisions and interaction rules between individual agents. In this work, we investigate for the first time the practicality of test-time training (TTT) of deep models such as normalizing flows, in the parameters posterior estimations of ABMs. We propose several practical TTT strategies for fine-tuning the normalizing flow against distribution shifts. Our numerical study demonstrates that TTT schemes are remarkably effective, enabling real-time adjustment of flow-based inference for ABM parameters.


【3】Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning
标题:数学推理中细粒度学分分配的以结果为基础的优势重塑
链接:https://arxiv.org/abs/2601.07408

作者:Ziheng Li,Liu Kang,Feng Xiao,Luxi Xing,Qingyi Si,Zhuoran Li,Weikang Gong,Deqing Yang,Yanghua Xiao,Hongcheng Guo
摘要:Group Relative Policy Optimization (GRPO) has emerged as a promising critic-free reinforcement learning paradigm for reasoning tasks. However, standard GRPO employs a coarse-grained credit assignment mechanism that propagates group-level rewards uniformly to to every token in a sequence, neglecting the varying contribution of individual reasoning steps. We address this limitation by introducing Outcome-grounded Advantage Reshaping (OAR), a fine-grained credit assignment mechanism that redistributes advantages based on how much each token influences the model's final answer. We instantiate OAR via two complementary strategies: (1) OAR-P, which estimates outcome sensitivity through counterfactual token perturbations, serving as a high-fidelity attribution signal; (2) OAR-G, which uses an input-gradient sensitivity proxy to approximate the influence signal with a single backward pass. These importance signals are integrated with a conservative Bi-Level advantage reshaping scheme that suppresses low-impact tokens and boosts pivotal ones while preserving the overall advantage mass. Empirical results on extensive mathematical reasoning benchmarks demonstrate that while OAR-P sets the performance upper bound, OAR-G achieves comparable gains with negligible computational overhead, both significantly outperforming a strong GRPO baseline, pushing the boundaries of critic-free LLM reasoning.


【4】Forward versus Backward: Comparing Reasoning Objectives in Direct Preference Optimization
标题:向前与向后:比较直接偏好优化中的推理目标
链接:https://arxiv.org/abs/2601.07199

作者:Murtaza Nikzad,Raghuram Ramanujan
摘要:Large language models exhibit impressive reasoning capabilities yet frequently generate plausible but incorrect solutions, a phenomenon commonly termed hallucination. This paper investigates the effect of training objective composition on reasoning reliability through Direct Preference Optimization. Two complementary training signals are examined: forward chain-of-thought generation, which trains the model to produce correct reasoning traces, and backward verification, which trains the model to verify and acknowledge errors in candidate solutions. Experiments on GSM8K reveal a fundamental trade-off between these objectives. Forward-only DPO training achieves the highest accuracy improvement, increasing from 83.1% to 86.6% (+3.5 percentage points), while backward-only training yields minimal accuracy gains but substantially reduces the false positive rate from 13.4% to 4.3%. Notably, both training variants reduce acknowledgement rate compared to the baseline, suggesting that preference optimization increases model confidence in its outputs. These findings indicate that forward and backward reasoning objectives provide distinct and complementary learning signals: forward training improves problem-solving capability, while backward training improves verification calibration. The complete training and evaluation pipeline, implemented efficiently through Low-Rank Adaptation, is released to facilitate further research.


【5】Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers
标题:中期思考:通过代币级触发器进行免训练的中间预算推理
链接:https://arxiv.org/abs/2601.07036

作者:Wang Yang,Debargha Ganguly,Xinpeng Li,Chaoda Song,Shouren Wang,Vikash Singh,Vipin Chaudhary,Xiaotian Han
摘要:Hybrid reasoning language models are commonly controlled through high-level Think/No-think instructions to regulate reasoning behavior, yet we found that such mode switching is largely driven by a small set of trigger tokens rather than the instructions themselves. Through attention analysis and controlled prompting experiments, we show that a leading ``Okay'' token induces reasoning behavior, while the newline pattern following ``'' suppresses it. Based on this observation, we propose Mid-Think, a simple training-free prompting format that combines these triggers to achieve intermediate-budget reasoning, consistently outperforming fixed-token and prompt-based baselines in terms of the accuracy-length trade-off. Furthermore, applying Mid-Think to RL training after SFT reduces training time by approximately 15% while improving final performance of Qwen3-8B on AIME from 69.8% to 72.4% and on GPQA from 58.5% to 61.1%, demonstrating its effectiveness for both inference-time control and RL-based reasoning training.


【6】Explainable Deep Radiogenomic Molecular Imaging for MGMT Methylation Prediction in Glioblastoma
标题:用于预测胶质母细胞瘤MGMT甲基化的可解释深度放射基因组学分子成像
链接:https://arxiv.org/abs/2601.07035

作者:Hasan M Jamil
备注:14 pages
摘要:Glioblastoma (GBM) is a highly aggressive primary brain tumor with limited therapeutic options and poor prognosis. The methylation status of the O6-methylguanine-DNA methyltransferase (MGMT) gene promoter is a critical molecular biomarker that influences patient response to temozolomide chemotherapy. Traditional methods for determining MGMT status rely on invasive biopsies and are limited by intratumoral heterogeneity and procedural risks. This study presents a radiogenomic molecular imaging analysis framework for the non-invasive prediction of MGMT promoter methylation using multi-parametric magnetic resonance imaging (mpMRI).   Our approach integrates radiomics, deep learning, and explainable artificial intelligence (XAI) to analyze MRI-derived imaging phenotypes and correlate them with molecular labels. Radiomic features are extracted from FLAIR, T1-weighted, T1-contrast-enhanced, and T2-weighted MRI sequences, while a 3D convolutional neural network learns deep representations from the same modalities. These complementary features are fused using both early fusion and attention-based strategies and classified to predict MGMT methylation status.   To enhance clinical interpretability, we apply XAI methods such as Grad-CAM and SHAP to visualize and explain model decisions. The proposed framework is trained on the RSNA-MICCAI Radiogenomic Classification dataset and externally validated on the BraTS 2021 dataset. This work advances the field of molecular imaging by demonstrating the potential of AI-driven radiogenomics for precision oncology, supporting non-invasive, accurate, and interpretable prediction of clinically actionable molecular biomarkers in GBM.


【7】Tight Analysis of Decentralized SGD: A Markov Chain Perspective
标题:去中心化新元的严密分析:马尔科夫链的视角
链接:https://arxiv.org/abs/2601.07021

作者:Lucas Versini,Paul Mangold,Aymeric Dieuleveut
摘要:We propose a novel analysis of the Decentralized Stochastic Gradient Descent (DSGD) algorithm with constant step size, interpreting the iterates of the algorithm as a Markov chain. We show that DSGD converges to a stationary distribution, with its bias, to first order, decomposable into two components: one due to decentralization (growing with the graph's spectral gap and clients' heterogeneity) and one due to stochasticity. Remarkably, the variance of local parameters is, at the first-order, inversely proportional to the number of clients, regardless of the network topology and even when clients' iterates are not averaged at the end. As a consequence of our analysis, we obtain non-asymptotic convergence bounds for clients' local iterates, confirming that DSGD has linear speed-up in the number of clients, and that the network topology only impacts higher-order terms.


【8】GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO
标题:GanitLLM:通过课程了解困难的孟加拉数学推理-GRPO
链接:https://arxiv.org/abs/2601.06767

作者:Shubhashis Roy Dipta,Khairul Mahbub,Nadia Najjar
摘要:We present a Bengali mathematical reasoning model called GanitLLM (named after the Bangla word for mathematics, "Ganit"), together with a new difficulty-aware Bengali math corpus and a curriculum-based GRPO pipeline. Bengali is one of the world's most widely spoken languages, yet existing LLMs either reason in English and then translate, or simply fail on multi-step Bengali math, in part because reinforcement learning recipes are tuned for high-resource languages and collapse under reward sparsity in low-resource settings. To address this, we construct Ganit, a rigorously filtered and decontaminated Bengali math dataset with automatic difficulty tags derived from the pass@k of a strong evaluator model. Building on this dataset, we propose Curriculum-GRPO, which combines multi-stage training (SFT + GRPO) with difficulty-aware sampling and verifiable rewards for format, numerical correctness, and Bengali reasoning. On Bn-MGSM and Bn-MSVAMP, GanitLLM-4B improves over its Qwen3-4B base by +8 and +7 accuracy points, respectively, while increasing the percentage of Bengali reasoning tokens from 14% to over 88% and reducing average solution length from 943 to 193 words.


【9】Plasticity vs. Rigidity: The Impact of Low-Rank Adapters on Reasoning on a Micro-Budget
标题:可塑性与刚性:低级适应者对微观预算推理的影响
链接:https://arxiv.org/abs/2601.06677

作者:Zohaib Khan,Omer Tafveez,Zoha Hayat Bhatti
备注:9 pages, 4 figures, 2 tables
摘要:Recent advances in mathematical reasoning typically rely on massive scale, yet the question remains: can strong reasoning capabilities be induced in small language models ($\leq1.5\text{B}$) under extreme constraints? We investigate this by training models on a single A40 GPU (48GB) for under 24 hours using Reinforcement Learning with Verifiable Rewards (RLVR) and Low-Rank Adaptation (LoRA). We find that the success of this ``micro-budget" regime depends critically on the interplay between adapter capacity and model initialization. While low-rank adapters ($r=8$) consistently fail to capture the complex optimization dynamics of reasoning, high-rank adapters ($r=256$) unlock significant plasticity in standard instruction-tuned models. Our best result achieved an impressive 40.0\% Pass@1 on AIME 24 (an 11.1\% absolute improvement over baseline) and pushed Pass@16 to 70.0\%, demonstrating robust exploration capabilities. However, this plasticity is not universal: while instruction-tuned models utilized the budget to elongate their chain-of-thought and maximize reward, heavily math-aligned models suffered performance collapse, suggesting that noisy, low-budget RL updates can act as destructive interference for models already residing near a task-specific optimum.


【10】SourceNet: Interpretable Sim-to-Real Inference on Variable-Geometry Sensor Arrays for Earthquake Source Inversion
标题:Source Net:用于地震源倒置的可变几何传感器阵列的可解释的Sim-to-Real推理
链接:https://arxiv.org/abs/2601.06320

作者:Zhe Jia,Xiaotian Zhang,Junpeng Li
摘要:Inferring high-dimensional physical states from sparse, ad-hoc sensor arrays is a fundamental challenge across AI for Science, as they are complicated by irregular geometries and the profound Sim-to-Real gap in physical modeling. Taking earthquake source characterization as a representative challenge, we address limitations in conventional deep learning: CNNs demand fixed grids, while pooling-based architectures (e.g., DeepSets) struggle to capture the relational wave physics. Here, we propose SourceNet, a Transformer-based framework that treats the sensor array as a flexible set to model arbitrary geometries. To bridge the reality gap, we introduce Physics-Structured Domain Randomization (PSDR). Instead of forcing feature alignment, PSDR randomizes the governing physical dynamics by varying velocity structures, propagation effects, and sensor availability, to force the model to learn robust representations invariant to unmodeled environmental heterogeneity. By pre-training on 100,000 synthetic events and fine-tuning on ~2,000 real world events, SourceNet achieves state-of-the-art precision on held-out real data. This demonstrates exceptional data efficiency, and matches classical solvers while enabling real-time processing. Remarkably, interpretability analysis reveals that the model shows scientific-agent-like features: it autonomously discovers geometric information bottlenecks and learns an attention policy that prioritizes sparse sensor placements, effectively recovering principles of optimal experimental design from data alone.


【11】Triadic Concept Analysis for Logic Interpretation of Simple Artificial Networks
标题:简单人工网络逻辑解释的三重概念分析
链接:https://arxiv.org/abs/2601.06229

作者 :Ingo Schmitt
摘要:An artificial neural network (ANN) is a numerical method used to solve complex classification problems. Due to its high classification power, the ANN method often outperforms other classification methods in terms of accuracy. However, an ANN model lacks interpretability compared to methods that use the symbolic paradigm. Our idea is to derive a symbolic representation from a simple ANN model trained on minterm values of input objects. Based on ReLU nodes, the ANN model is partitioned into cells. We convert the ANN model into a cell-based, three-dimensional bit tensor. The theory of Formal Concept Analysis applied to the tensor yields concepts that are represented as logic trees, expressing interpretable attribute interactions. Their evaluations preserve the classification power of the initial ANN model.


【12】GroupSegment-SHAP: Shapley Value Explanations with Group-Segment Players for Multivariate Time Series
标题:GroupSegment-SHAP:Shapley与多元时间序列集团细分参与者的价值解释
链接:https://arxiv.org/abs/2601.06114

作者:Jinwoong Kim,Sangjin Park
备注:12 pages
摘要:Multivariate time-series models achieve strong predictive performance in healthcare, industry, energy, and finance, but how they combine cross-variable interactions with temporal dynamics remains unclear. SHapley Additive exPlanations (SHAP) are widely used for interpretation. However, existing time-series variants typically treat the feature and time axes independently, fragmenting structural signals formed jointly by multiple variables over specific intervals. We propose GroupSegment SHAP (GS-SHAP), which constructs explanatory units as group-segment players based on cross-variable dependence and distribution shifts over time, and then quantifies each unit's contribution via Shapley attribution. We evaluate GS-SHAP across four real-world domains: human activity recognition, power-system forecasting, medical signal analysis, and financial time series, and compare it with KernelSHAP, TimeSHAP, SequenceSHAP, WindowSHAP, and TSHAP. GS-SHAP improves deletion-based faithfulness (DeltaAUC) by about 1.7x on average over time-series SHAP baselines, while reducing wall-clock runtime by about 40 percent on average under matched perturbation budgets. A financial case study shows that GS-SHAP identifies interpretable multivariate-temporal interactions among key market variables during high-volatility regimes.


【13】Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning
标题:AdamW风格洗发水的收敛率分析:统一单边和双边预处理
链接:https://arxiv.org/abs/2601.07326

作者:Huan Li,Yiming Dong,Zhouchen Lin
摘要:This paper studies the AdamW-style Shampoo optimizer, an effective implementation of classical Shampoo that notably won the external tuning track of the AlgoPerf neural network training algorithm competition. Our analysis unifies one-sided and two-sided preconditioning and establishes the convergence rate $\frac{1}{K}\sum_{k=1}^K E\left[\|\nabla f(X_k)\|_*\right]\leq O(\frac{\sqrt{m+n}C}{K^{1/4}})$ measured by nuclear norm, where $K$ represents the iteration number, $(m,n)$ denotes the size of matrix parameters, and $C$ matches the constant in the optimal convergence rate of SGD. Theoretically, we have $\|\nabla f(X)\|_F\leq \|\nabla f(X)\|_*\leq \sqrt{m+n}\|\nabla f(X)\|_F$, supporting that our convergence rate can be considered to be analogous to the optimal $\frac{1}{K}\sum_{k=1}^KE\left[\|\nabla f(X_k)\|_F\right]\leq O(\frac{C}{K^{1/4}})$ convergence rate of SGD in the ideal case of $\|\nabla f(X)\|_*= Θ(\sqrt{m+n})\|\nabla f(X)\|_F$.


【14】Variational Approximations for Robust Bayesian Inference via Rho-Posteriors
标题:基于Rho后验的稳健Bayesian推理的变分逼近
链接:https://arxiv.org/abs/2601.07325

作者:EL Mahdi Khribch,Pierre Alquier
备注:53 pages including the proofs in appendices, 16 figures
摘要:The $ρ$-posterior framework provides universal Bayesian estimation with explicit contamination rates and optimal convergence guarantees, but has remained computationally difficult due to an optimization over reference distributions that precludes intractable posterior computation. We develop a PAC-Bayesian framework that recovers these theoretical guarantees through temperature-dependent Gibbs posteriors, deriving finite-sample oracle inequalities with explicit rates and introducing tractable variational approximations that inherit the robustness properties of exact $ρ$-posteriors. Numerical experiments demonstrate that this approach achieves theoretical contamination rates while remaining computationally feasible, providing the first practical implementation of $ρ$-posterior inference with rigorous finite-sample guarantees.


【15】Inference-Time Alignment for Diffusion Models via Doob's Matching
标题:通过Doob匹配实现扩散模型的推理时间对齐
链接:https://arxiv.org/abs/2601.06514

作者:Jinyuan Chang,Chenguang Duan,Yuling Jiao,Yi Xu,Jerry Zhijian Yang
摘要:Inference-time alignment for diffusion models aims to adapt a pre-trained diffusion model toward a target distribution without retraining the base score network, thereby preserving the generative capacity of the base model while enforcing desired properties at the inference time. A central mechanism for achieving such alignment is guidance, which modifies the sampling dynamics through an additional drift term. In this work, we introduce Doob's matching, a novel framework for guidance estimation grounded in Doob's $h$-transform. Our approach formulates guidance as the gradient of logarithm of an underlying Doob's $h$-function and employs gradient-penalized regression to simultaneously estimate both the $h$-function and its gradient, resulting in a consistent estimator of the guidance. Theoretically, we establish non-asymptotic convergence rates for the estimated guidance. Moreover, we analyze the resulting controllable diffusion processes and prove non-asymptotic convergence guarantees for the generated distributions in the 2-Wasserstein distance.


检测相关(2篇)

【1】A High-Recall Cost-Sensitive Machine Learning Framework for Real-Time Online Banking Transaction Fraud Detection
标题:用于实时在线银行交易欺诈检测的高召回、成本敏感的机器学习框架
链接:https://arxiv.org/abs/2601.07276

作者:Karthikeyan V. R.,Premnath S.,Kavinraaj S.,J. Sangeetha
备注:7 pages, 5 figures. Submitted to arXiv as a preprint
摘要:Fraudulent activities on digital banking services are becoming more intricate by the day, challenging existing defenses. While older rule driven methods struggle to keep pace, even precision focused algorithms fall short when new scams are introduced. These tools typically overlook subtle shifts in criminal behavior, missing crucial signals. Because silent breaches cost institutions far more than flagged but legitimate actions, catching every possible case is crucial. High sensitivity to actual threats becomes essential when oversight leads to heavy losses. One key aim here involves reducing missed fraud cases without spiking incorrect alerts too much. This study builds a system using group learning methods adjusted through smart threshold choices. Using real world transaction records shared openly, where cheating acts rarely appear among normal activities, tests are run under practical skewed distributions. The outcomes reveal that approximately 91 percent of actual fraud is detected, outperforming standard setups that rely on unchanging rules when dealing with uneven examples across classes. When tested in live settings, the fraud detection system connects directly to an online banking transaction flow, stopping questionable activities before they are completed. Alongside this setup, a browser add on built for Chrome is designed to flag deceptive web links and reduce threats from harmful sites. These results show that adjusting decisions by cost impact and validating across entire systems makes deployment more stable and realistic for today's digital banking platforms.


【2】CEEMDAN-Based Multiscale CNN for Wind Turbine Gearbox Fault Detection
标题:基于CEEMDAN的多尺度CNN风力发电机齿轮箱故障检测
链接:https://arxiv.org/abs/2601.06217

作者:Nejad Alagha,Anis Salwa Mohd Khairuddin,Obada Al-Khatib,Abigail Copiaco
备注:conference paper
摘要:Wind turbines play a critical role in the shift toward sustainable energy generation. Their operation relies on multiple interconnected components, and a failure in any of these can compromise the entire system's functionality. Detecting faults accurately is challenging due to the intricate, non-linear, and non-stationary nature of vibration signals, influenced by dynamic loading, environmental variations, and mechanical interactions. As such, effective signal processing techniques are essential for extracting meaningful features to enhance diagnostic accuracy. This study presents a hybrid approach for fault detection in wind turbine gearboxes, combining Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and a Multiscale Convolutional Neural Network (MSCNN). CEEMDAN is employed to decompose vibration signals into intrinsic mode functions, isolating critical features at different time-frequency scales. These are then input into the MSCNN, which performs deep hierarchical feature extraction and classification. The proposed method achieves an F1 Score of 98.95\%, evaluated on real-world datasets, and demonstrates superior performance in both detection accuracy and computational speed compared to existing approaches. This framework offers a balanced solution for reliable and efficient fault diagnosis in wind turbine systems.


分类|识别(2篇)

【1】Towards Automated Diagnosis of Inherited Arrhythmias: Combined Arrhythmia Classification Using Lead-Aware Spatial Attention Networks
标题:遗传性心律失常的自动诊断:使用导联感知空间注意力网络的心律失常组合分类
链接:https://arxiv.org/abs/2601.07124

作者:Sophie Sigfstead,River Jiang,Brianna Davies,Zachary W. M. Laksman,Julia Cadrin-Tourigny,Rafik Tadros,Habib Khan,Joseph Atallah,Christian Steinberg,Shubhayan Sanatani,Mario Talajic,Rahul Krishnan,Andrew D. Krahn,Christopher C. Cheung
备注:34 pages, 2 figures, 4 tables
摘要:Arrhythmogenic right ventricular cardiomyopathy (ARVC) and long QT syndrome (LQTS) are inherited arrhythmia syndromes associated with sudden cardiac death. Deep learning shows promise for ECG interpretation, but multi-class inherited arrhythmia classification with clinically grounded interpretability remains underdeveloped. Our objective was to develop and validate a lead-aware deep learning framework for multi-class (ARVC vs LQTS vs control) and binary inherited arrhythmia classification, and to determine optimal strategies for integrating ECG foundation models within arrhythmia screening tools. We assembled a 13-center Canadian cohort (645 patients; 1,344 ECGs). We evaluated four ECG foundation models using three transfer learning approaches: linear probing, fine-tuning, and combined strategies. We developed lead-aware spatial attention networks (LASAN) and assessed integration strategies combining LASAN with foundation models. Performance was compared against the established foundation model baselines. Lead-group masking quantified disease-specific lead dependence. Fine-tuning outperformed linear probing and combined strategies across all foundation models (mean macro-AUROC 0.904 vs 0.825). The best lead-aware integrations achieved near-ceiling performance (HuBERT-ECG hybrid: macro-AUROC 0.990; ARVC vs control AUROC 0.999; LQTS vs control AUROC 0.994). Lead masking demonstrated physiologic plausibility: V1-V3 were most critical for ARVC detection (4.54% AUROC reduction), while lateral leads were preferentially important for LQTS (2.60% drop). Lead-aware architectures achieved state-of-the-art performance for inherited arrhythmia classification, outperforming all existing published models on both binary and multi-class tasks while demonstrating clinically aligned lead dependence. These findings support potential utility for automated ECG screening pending validation.


【2】A Unified Shape-Aware Foundation Model for Time Series Classification
标题:时间序列分类的统一形状感知基础模型
链接:https://arxiv.org/abs/2601.06429

作者:Zhen Liu,Yucheng Wang,Boyuan Li,Junhao Zheng,Emadeldeen Eldele,Min Wu,Qianli Ma
备注:Accepted in AAAI 2026
摘要 :Foundation models pre-trained on large-scale source datasets are reshaping the traditional training paradigm for time series classification. However, existing time series foundation models primarily focus on forecasting tasks and often overlook classification-specific challenges, such as modeling interpretable shapelets that capture class-discriminative temporal features. To bridge this gap, we propose UniShape, a unified shape-aware foundation model designed for time series classification. UniShape incorporates a shape-aware adapter that adaptively aggregates multiscale discriminative subsequences (shapes) into class tokens, effectively selecting the most relevant subsequence scales to enhance model interpretability. Meanwhile, a prototype-based pretraining module is introduced to jointly learn instance- and shape-level representations, enabling the capture of transferable shape patterns. Pre-trained on a large-scale multi-domain time series dataset comprising 1.89 million samples, UniShape exhibits superior generalization across diverse target domains. Experiments on 128 UCR datasets and 30 additional time series datasets demonstrate that UniShape achieves state-of-the-art classification performance, with interpretability and ablation analyses further validating its effectiveness.


表征(2篇)

【1】Pseudodata-guided Invariant Representation Learning Boosts the Out-of-Distribution Generalization in Enzymatic Kinetic Parameter Prediction
标题:伪数据引导的不变表示学习增强了酶动力学参数预测中的分布外推广
链接:https://arxiv.org/abs/2601.07261

作者:Haomin Wu,Zhiwei Nie,Hongyu Zhang,Zhixiang Ren
摘要:Accurate prediction of enzyme kinetic parameters is essential for understanding catalytic mechanisms and guiding enzyme engineering.However, existing deep learning-based enzyme-substrate interaction (ESI) predictors often exhibit performance degradation on sequence-divergent, out-of-distribution (OOD) cases, limiting robustness under biologically relevant perturbations.We propose O$^2$DENet, a lightweight, plug-and-play module that enhances OOD generalization via biologically and chemically informed perturbation augmentation and invariant representation learning.O$^2$DENet introduces enzyme-substrate perturbations and enforces consistency between original and augmented enzyme-substrate-pair representations to encourage invariance to distributional shifts.When integrated with representative ESI models, O$^2$DENet consistently improves predictive performance for both $k_{cat}$ and $K_m$ across stringent sequence-identity-based OOD benchmarks, achieving state-of-the-art results among the evaluated methods in terms of accuracy and robustness metrics.Overall, O$^2$DENet provides a general and effective strategy to enhance the stability and deployability of data-driven enzyme kinetics predictors for real-world enzyme engineering applications.


【2】Variational decomposition autoencoding improves disentanglement of latent representations
标题:变分分解自动编码改进了潜在表示的解纠缠
链接:https://arxiv.org/abs/2601.06844

作者:Ioannis Ziogas,Aamna Al Shehhi,Ahsan H. Khandoker,Leontios J. Hadjileontiadis
备注:Supplementary information file at: https://drive.google.com/drive/folders/1sZl2AcCtRK-1oav7XZSaxlu0Cq0-3MMs?usp=sharing
摘要:Understanding the structure of complex, nonstationary, high-dimensional time-evolving signals is a central challenge in scientific data analysis. In many domains, such as speech and biomedical signal processing, the ability to learn disentangled and interpretable representations is critical for uncovering latent generative mechanisms. Traditional approaches to unsupervised representation learning, including variational autoencoders (VAEs), often struggle to capture the temporal and spectral diversity inherent in such data. Here we introduce variational decomposition autoencoding (VDA), a framework that extends VAEs by incorporating a strong structural bias toward signal decomposition. VDA is instantiated through variational decomposition autoencoders (DecVAEs), i.e., encoder-only neural networks that combine a signal decomposition model, a contrastive self-supervised task, and variational prior approximation to learn multiple latent subspaces aligned with time-frequency characteristics. We demonstrate the effectiveness of DecVAEs on simulated data and three publicly available scientific datasets, spanning speech recognition, dysarthria severity evaluation, and emotional speech classification. Our results demonstrate that DecVAEs surpass state-of-the-art VAE-based methods in terms of disentanglement quality, generalization across tasks, and the interpretability of latent encodings. These findings suggest that decomposition-aware architectures can serve as robust tools for extracting structured representations from dynamic signals, with potential applications in clinical diagnostics, human-computer interaction, and adaptive neurotechnologies.


编码器(3篇)

【1】Land-then-transport: A Flow Matching-Based Generative Decoder for Wireless Image Transmission
标题:先陆后运:一种基于流匹配的无线图像传输生成解码器
链接:https://arxiv.org/abs/2601.07512

作者:Jingwen Fu,Ming Xiao,Mikael Skoglund,Dong In Kim
摘要:Due to strict rate and reliability demands, wireless image transmission remains difficult for both classical layered designs and joint source-channel coding (JSCC), especially under low latency. Diffusion-based generative decoders can deliver strong perceptual quality by leveraging learned image priors, but iterative stochastic denoising leads to high decoding delay. To enable low-latency decoding, we propose a flow-matching (FM) generative decoder under a new land-then-transport (LTT) paradigm that tightly integrates the physical wireless channel into a continuous-time probability flow. For AWGN channels, we build a Gaussian smoothing path whose noise schedule indexes effective noise levels, and derive a closed-form teacher velocity field along this path. A neural-network student vector field is trained by conditional flow matching, yielding a deterministic, channel-aware ODE decoder with complexity linear in the number of ODE steps. At inference, it only needs an estimate of the effective noise variance to set the ODE starting time. We further show that Rayleigh fading and MIMO channels can be mapped, via linear MMSE equalization and singular-value-domain processing, to AWGN-equivalent channels with calibrated starting times. Therefore, the same probability path and trained velocity field can be reused for Rayleigh and MIMO without retraining. Experiments on MNIST, Fashion-MNIST, and DIV2K over AWGN, Rayleigh, and MIMO demonstrate consistent gains over JPEG2000+LDPC, DeepJSCC, and diffusion-based baselines, while achieving good perceptual quality with only a few ODE steps. Overall, LTT provides a deterministic, physically interpretable, and computation-efficient framework for generative wireless image decoding across diverse channels.


【2】Deriving Decoder-Free Sparse Autoencoders from First Principles
标题:从第一原理推导无解码器的稀疏自动编码器
链接:https://arxiv.org/abs/2601.06478

作者:Alan Oursland
备注:22 pages, 3 figures, 9 tables
摘要:Gradient descent on log-sum-exp (LSE) objectives performs implicit expectation--maximization (EM): the gradient with respect to each component output equals its responsibility. The same theory predicts collapse without volume control analogous to the log-determinant in Gaussian mixture models. We instantiate the theory in a single-layer encoder with an LSE objective and InfoMax regularization for volume control. Experiments confirm the theory's predictions. The gradient--responsibility identity holds exactly; LSE alone collapses; variance prevents dead components; decorrelation prevents redundancy. The model exhibits EM-like optimization dynamics in which lower loss does not correspond to better features and adaptive optimizers offer no advantage. The resulting decoder-free model learns interpretable mixture components, confirming that implicit EM theory can prescribe architectures.


【3】Continual Quantum Architecture Search with Tensor-Train Encoding: Theory and Applications to Signal Processing
标题:张量序列编码的连续量子结构搜索:理论及其在信号处理中的应用
链接:https://arxiv.org/abs/2601.06392

作者:Jun Qi,Chao-Han Huck Yang,Pin-Yu Chen,Javier Tejedor,Ling Li,Min-Hsiu Hsieh
备注:In submission
摘要:We introduce CL-QAS, a continual quantum architecture search framework that mitigates the challenges of costly amplitude encoding and catastrophic forgetting in variational quantum circuits. The method uses Tensor-Train encoding to efficiently compress high-dimensional stochastic signals into low-rank quantum feature representations. A bi-loop learning strategy separates circuit parameter optimization from architecture exploration, while an Elastic Weight Consolidation regularization ensures stability across sequential tasks. We derive theoretical upper bounds on approximation, generalization, and robustness under quantum noise, demonstrating that CL-QAS achieves controllable expressivity, sample-efficient generalization, and smooth convergence without barren plateaus. Empirical evaluations on electrocardiogram (ECG)-based signal classification and financial time-series forecasting confirm substantial improvements in accuracy, balanced accuracy, F1 score, and reward. CL-QAS maintains strong forward and backward transfer and exhibits bounded degradation under depolarizing and readout noise, highlighting its potential for adaptive, noise-resilient quantum learning on near-term devices.


优化|敛散性(14篇)

【1】Optimal Learning Rate Schedule for Balancing Effort and Performance
标题:平衡努力和绩效的最佳学习率计划
链接:https://arxiv.org/abs/2601.07830

作者:Valentina Njaradi,Rodrigo Carrasco-Davis,Peter E. Latham,Andrew Saxe
摘要:Learning how to learn efficiently is a fundamental challenge for biological agents and a growing concern for artificial ones. To learn effectively, an agent must regulate its learning speed, balancing the benefits of rapid improvement against the costs of effort, instability, or resource use. We introduce a normative framework that formalizes this problem as an optimal control process in which the agent maximizes cumulative performance while incurring a cost of learning. From this objective, we derive a closed-form solution for the optimal learning rate, which has the form of a closed-loop controller that depends only on the agent's current and expected future performance. Under mild assumptions, this solution generalizes across tasks and architectures and reproduces numerically optimized schedules in simulations. In simple learning models, we can mathematically analyze how agent and task parameters shape learning-rate scheduling as an open-loop control solution. Because the optimal policy depends on expectations of future performance, the framework predicts how overconfidence or underconfidence influence engagement and persistence, linking the control of learning speed to theories of self-regulated learning. We further show how a simple episodic memory mechanism can approximate the required performance expectations by recalling similar past learning experiences, providing a biologically plausible route to near-optimal behaviour. Together, these results provide a normative and biologically plausible account of learning speed control, linking self-regulated learning, effort allocation, and episodic memory estimation within a unified and tractable mathematical framework.


【2】Near-Optimal Private Linear Regression via Iterative Hessian Mixing
标题:通过迭代Hessian混合的近优私有线性回归
链接:https://arxiv.org/abs/2601.07545

作者:Omri Lev,Moshe Shenfeld,Vishwak Srinivasan,Katrina Ligett,Ashia C. Wilson
摘要:We study differentially private ordinary least squares (DP-OLS) with bounded data. The dominant approach, adaptive sufficient-statistics perturbation (AdaSSP), adds an adaptively chosen perturbation to the sufficient statistics, namely, the matrix $X^{\top}X$ and the vector $X^{\top}Y$, and is known to achieve near-optimal accuracy and to have strong empirical performance. In contrast, methods that rely on Gaussian-sketching, which ensure differential privacy by pre-multiplying the data with a random Gaussian matrix, are widely used in federated and distributed regression, yet remain relatively uncommon for DP-OLS. In this work, we introduce the iterative Hessian mixing, a novel DP-OLS algorithm that relies on Gaussian sketches and is inspired by the iterative Hessian sketch algorithm. We provide utility analysis for the iterative Hessian mixing as well as a new analysis for the previous methods that rely on Gaussian sketches. Then, we show that our new approach circumvents the intrinsic limitations of the prior methods and provides non-trivial improvements over AdaSSP. We conclude by running an extensive set of experiments across standard benchmarks to demonstrate further that our approach consistently outperforms these prior baselines.


【3】Simulated Annealing-based Candidate Optimization for Batch Acquisition Functions
标题:批量采集函数的基于模拟软化的候选优化
链接:https://arxiv.org/abs/2601.07258

作者:Sk Md Ahnaf Akif Alvi,Raymundo Arróyave,Douglas Allaire
摘要 :Bayesian Optimization with multi-objective acquisition functions such as q-Expected Hypervolume Improvement (qEHVI) requires efficient candidate optimization to maximize acquisition function values. Traditional approaches rely on continuous optimization methods like Sequential Least Squares Programming (SLSQP) for candidate selection. However, these gradient-based methods can become trapped in local optima, particularly in complex or high-dimensional objective landscapes. This paper presents a simulated annealing-based approach for candidate optimization in batch acquisition functions as an alternative to conventional continuous optimization methods. We evaluate our simulated annealing approach against SLSQP across four benchmark multi-objective optimization problems: ZDT1 (30D, 2 objectives), DTLZ2 (7D, 3 objectives), Kursawe (3D, 2 objectives), and Latent-Aware (4D, 2 objectives). Our results demonstrate that simulated annealing consistently achieves superior hypervolume performance compared to SLSQP in most test functions. The improvement is particularly pronounced for DTLZ2 and Latent-Aware problems, where simulated annealing reaches significantly higher hypervolume values and maintains better convergence characteristics. The histogram analysis of objective space coverage further reveals that simulated annealing explores more diverse and optimal regions of the Pareto front. These findings suggest that metaheuristic optimization approaches like simulated annealing can provide more robust and effective candidate optimization for multi-objective Bayesian optimization, offering a promising alternative to traditional gradient-based methods for batch acquisition function optimization.


【4】PRPO: Aligning Process Reward with Outcome Reward in Policy Optimization
标题:PRPO:政策优化中的流程奖励与结果奖励保持一致
链接:https://arxiv.org/abs/2601.07182

作者:Ruiyi Ding,Yongxuan Lv,Xianhui Meng,Jiahe Song,Chao Wang,Chen Jiang,Yuan Cheng
备注:8 pages, 2 figures
摘要:Policy optimization for large language models often suffers from sparse reward signals in multi-step reasoning tasks. Critic-free methods like GRPO assign a single normalized outcome reward to all tokens, providing limited guidance for intermediate reasoning . While Process Reward Models (PRMs) offer dense feedback, they risk premature collapse when used alone, as early low-reward tokens can drive policies toward truncated outputs. We introduce Process Relative Policy Optimization (PRPO), which combines outcome reliability with process-level guidance in a critic-free framework. PRPO segments reasoning sequences based on semantic clues, normalizes PRM scores into token-level advantages, and aligns their distribution with outcome advantages through location-parameter shift. On MATH500, PRPO improves Qwen2.5-Math-1.5B accuracy from 61.2% to 64.4% over GRPO using only eight rollouts and no value network, demonstrating efficient fine-grained credit assignment within critic-free optimization.


【5】WFR-FM: Simulation-Free Dynamic Unbalanced Optimal Transport
标题:WFR-FM:无模拟动态不平衡最佳传输
链接:https://arxiv.org/abs/2601.06810

作者:Qiangwei Peng,Zihan Wang,Junda Ying,Yuhao Sun,Qing Nie,Lei Zhang,Tiejun Li,Peijie Zhou
摘要:The Wasserstein-Fisher-Rao (WFR) metric extends dynamic optimal transport (OT) by coupling displacement with change of mass, providing a principled geometry for modeling unbalanced snapshot dynamics. Existing WFR solvers, however, are often unstable, computationally expensive, and difficult to scale. Here we introduce WFR Flow Matching (WFR-FM), a simulation-free training algorithm that unifies flow matching with dynamic unbalanced OT. Unlike classical flow matching which regresses only a transport vector field, WFR-FM simultaneously regresses a vector field for displacement and a scalar growth rate function for birth-death dynamics, yielding continuous flows under the WFR geometry. Theoretically, we show that minimizing the WFR-FM loss exactly recovers WFR geodesics. Empirically, WFR-FM yields more accurate and robust trajectory inference in single-cell biology, reconstructing consistent dynamics with proliferation and apoptosis, estimating time-varying growth fields, and applying to generative dynamics under imbalanced data. It outperforms state-of-the-art baselines in efficiency, stability, and reconstruction accuracy. Overall, WFR-FM establishes a unified and efficient paradigm for learning dynamical systems from unbalanced snapshots, where not only states but also mass evolve over time.


【6】Structure-preserving learning and prediction in optimal control of collective motion
标题:集体运动最优控制中的结构保持学习和预测
链接:https://arxiv.org/abs/2601.06770

作者:Sofiia Huraka,Vakhtang Putkaradze
备注:55 pages, 9 figures
摘要:Wide-spread adoption of unmanned vehicle technologies requires the ability to predict the motion of the combined vehicle operation from observations. While the general prediction of such motion for an arbitrary control mechanism is difficult, for a particular choice of control, the dynamics reduces to the Lie-Poisson equations [33,34]. Our goal is to learn the phase-space dynamics and predict the motion solely from observations, without any knowledge of the control Hamiltonian or the nature of interaction between vehicles. To achieve that goal, we propose the Control Optimal Lie-Poisson Neural Networks (CO-LPNets) for learning and predicting the dynamics of the system from data. Our methods learn the mapping of the phase space through the composition of Poisson maps, which are obtained as flows from Hamiltonians that could be integrated explicitly. CO-LPNets preserve the Poisson bracket and thus preserve Casimirs to machine precision. We discuss the completeness of the derived neural networks and their efficiency in approximating the dynamics. To illustrate the power of the method, we apply these techniques to systems of $N=3$ particles evolving on ${\rm SO}(3)$ group, which describe coupled rigid bodies rotating about their center of mass, and ${\rm SE}(3)$ group, applicable to the movement of unmanned air and water vehicles. Numerical results demonstrate that CO-LPNets learn the dynamics in phase space from data points and reproduce trajectories, with good accuracy, over hundreds of time steps. The method uses a limited number of points ($\sim200$/dimension) and parameters ($\sim 1000$ in our case), demonstrating potential for practical applications and edge deployment.


【7】Pareto-Optimal Model Selection for Low-Cost, Single-Lead EMG Control in Embedded Systems
标题:嵌入式系统中低成本单导联肌电信号控制的Pareto最优模型选择
链接:https://arxiv.org/abs/2601.06516

作者:Carl Vincent Ladres Kho
备注 :15 pages main text, 51 pages total including appendices. 18 figures. Code and dataset available at: https://github.com/CarlKho-Minerva/v2-emg-muscle
摘要:Consumer-grade biosensors offer a cost-effective alternative to medical-grade electromyography (EMG) systems, reducing hardware costs from thousands of dollars to approximately $13. However, these low-cost sensors introduce significant signal instability and motion artifacts. Deploying machine learning models on resource-constrained edge devices like the ESP32 presents a challenge: balancing classification accuracy with strict latency (<100ms) and memory (<320KB) constraints. Using a single-subject dataset comprising 1,540 seconds of raw data (1.54M data points, segmented into ~1,300 one-second windows), I evaluate 18 model architectures, ranging from statistical heuristics to deep transfer learning (ResNet50) and custom hybrid networks (MaxCRNN). While my custom "MaxCRNN" (Inception + Bi-LSTM + Attention) achieved the highest safety (99% Precision) and robustness, I identify Random Forest (74% accuracy) as the Pareto-optimal solution for embedded control on legacy microcontrollers. I demonstrate that reliable, low-latency EMG control is feasible on commodity hardware, with Deep Learning offering a path to near-perfect reliability on modern Edge AI accelerators.


【8】Neural Nonmyopic Bayesian Optimization in Dynamic Cost Settings
标题:动态费用环境下的神经网络非近视贝叶斯优化
链接:https://arxiv.org/abs/2601.06505

作者:Sang T. Truong,Duc Q. Nguyen,Willie Neiswanger,Ryan-Rhys Griffiths,Stefano Ermon,Nick Haber,Sanmi Koyejo
备注:32 pages, 20 figures, 13 tables
摘要:Bayesian optimization (BO) is a common framework for optimizing black-box functions, yet most existing methods assume static query costs and rely on myopic acquisition strategies. We introduce LookaHES, a nonmyopic BO framework designed for dynamic, history-dependent cost environments, where evaluation costs vary with prior actions, such as travel distance in spatial tasks or edit distance in sequence design. LookaHES combines a multi-step variant of $H$-Entropy Search with pathwise sampling and neural policy optimization, enabling long-horizon planning beyond twenty steps without the exponential complexity of existing nonmyopic methods. The key innovation is the integration of neural policies, including large language models, to effectively navigate structured, combinatorial action spaces such as protein sequences. These policies amortize lookahead planning and can be integrated with domain-specific constraints during rollout. Empirically, LookaHES outperforms strong myopic and nonmyopic baselines across nine synthetic benchmarks from two to eight dimensions and two real-world tasks: geospatial optimization using NASA night-light imagery and protein sequence design with constrained token-level edits. In short, LookaHES provides a general, scalable, and cost-aware solution for robust long-horizon optimization in complex decision spaces, which makes it a useful tool for researchers in machine learning, statistics, and applied domains. Our implementation is available at https://github.com/sangttruong/nonmyopia.


【9】COVR:Collaborative Optimization of VLMs and RL Agent for Visual-Based Control
标题:COVR:基于视觉的控制的VLM和RL Agent的协同优化
链接:https://arxiv.org/abs/2601.06122

作者:Canming Xia,Peixi Peng,Guang Tan,Zhan Su,Haoran Xu,Zhenxian Liu,Luntong Li
备注:The paper was accepted by the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26)
摘要:Visual reinforcement learning (RL) suffers from poor sample efficiency due to high-dimensional observations in complex tasks. While existing works have shown that vision-language models (VLMs) can assist RL, they often focus on knowledge distillation from the VLM to RL, overlooking the potential of RL-generated interaction data to enhance the VLM. To address this, we propose COVR, a collaborative optimization framework that enables the mutual enhancement of the VLM and RL policies. Specifically, COVR fine-tunes the VLM with RL-generated data to enhance the semantic reasoning ability consistent with the target task, and uses the enhanced VLM to further guide policy learning via action priors. To improve fine-tuning efficiency, we introduce two key modules: (1) an Exploration-Driven Dynamic Filter module that preserves valuable exploration samples using adaptive thresholds based on the degree of exploration, and (2) a Return-Aware Adaptive Loss Weight module that improves the stability of training by quantifying the inconsistency of sampling actions via return signals of RL. We further design a progressive fine-tuning strategy to reduce resource consumption. Extensive experiments show that COVR achieves strong performance across various challenging visual control tasks.


【10】Tree-Preconditioned Differentiable Optimization and Axioms as Layers
标题:树预条件可微优化和公理作为层
链接:https://arxiv.org/abs/2601.06036

作者:Yuexin Liao
备注:Comments and collaboration are highly welcome
摘要:This paper introduces a differentiable framework that embeds the axiomatic structure of Random Utility Models (RUM) directly into deep neural networks. Although projecting empirical choice data onto the RUM polytope is NP-hard in general, we uncover an isomorphism between RUM consistency and flow conservation on the Boolean lattice. Leveraging this combinatorial structure, we derive a novel Tree-Preconditioned Conjugate Gradient solver. By exploiting the spanning tree of the constraint graph, our preconditioner effectively "whitens" the ill-conditioned Hessian spectrum induced by the Interior Point Method barrier, achieving superlinear convergence and scaling to problem sizes previously deemed unsolvable. We further formulate the projection as a differentiable layer via the Implicit Function Theorem, where the exact Jacobian propagates geometric constraints during backpropagation. Empirical results demonstrate that this "Axioms-as-Layers" paradigm eliminates the structural overfitting inherent in penalty-based methods, enabling models that are jointly trainable, provably rational, and capable of generalizing from sparse data regimes where standard approximations fail.


【11】Learning to bin: differentiable and Bayesian optimization for multi-dimensional discriminants in high-energy physics
标题:学习分类:高能物理中多维辨别式的可微和Bayesian优化
链接:https://arxiv.org/abs/2601.07756

作者:Johannes Erdmann,Nitish Kumar Kasaraguppe,Florian Mausolf
备注 :13 pages, 5 figures
摘要:Categorizing events using discriminant observables is central to many high-energy physics analyses. Yet, bin boundaries are often chosen by hand. A simple, popular choice is to apply argmax projections of multi-class scores and equidistant binning of one-dimensional discriminants. We propose a binning optimization for signal significance directly in multi-dimensional discriminants. We use a Gaussian Mixture Model (GMM) to define flexible bin boundary shapes for multi-class scores, while in one dimension (binary classification) we move bin boundaries directly. On this binning model, we study two optimization strategies: a differentiable and a Bayesian optimization approach. We study two toy setups: a binary classification and a three-class problem with two signals and backgrounds. In the one-dimensional case, both approaches achieve similar gains in signal sensitivity compared to equidistant binnings for a given number of bins. In the multi-dimensional case, the GMM-based binning defines sensitive categories as well, with the differentiable approach performing best. We show that, in particular for limited separability of the signal processes, our approach outperforms argmax classification even with optimized binning in the one-dimensional projections. Both methods are released as lightweight Python plugins intended for straightforward integration into existing analyses.


【12】Optimal Transport under Group Fairness Constraints
标题:群体公平约束下的最优运输
链接:https://arxiv.org/abs/2601.07144

作者:Linus Bleistein,Mathieu Dagréou,Francisco Andrade,Thomas Boudou,Aurélien Bellet
摘要:Ensuring fairness in matching algorithms is a key challenge in allocating scarce resources and positions. Focusing on Optimal Transport (OT), we introduce a novel notion of group fairness requiring that the probability of matching two individuals from any two given groups in the OT plan satisfies a predefined target. We first propose \texttt{FairSinkhorn}, a modified Sinkhorn algorithm to compute perfectly fair transport plans efficiently. Since exact fairness can significantly degrade matching quality in practice, we then develop two relaxation strategies. The first one involves solving a penalised OT problem, for which we derive novel finite-sample complexity guarantees. This result is of independent interest as it can be generalized to arbitrary convex penalties. Our second strategy leverages bilevel optimization to learn a ground cost that induces a fair OT solution, and we establish a bound guaranteeing that the learned cost yields fair matchings on unseen data. Finally, we present empirical results that illustrate the trade-offs between fairness and performance.


【13】Robust Bayesian Optimization via Tempered Posteriors
标题:通过调整后验的鲁棒Bayesian优化
链接:https://arxiv.org/abs/2601.07094

作者:Jiguang Li,Hengrui Luo
摘要:Bayesian optimization (BO) iteratively fits a Gaussian process (GP) surrogate to accumulated evaluations and selects new queries via an acquisition function such as expected improvement (EI). In practice, BO often concentrates evaluations near the current incumbent, causing the surrogate to become overconfident and to understate predictive uncertainty in the region guiding subsequent decisions. We develop a robust GP-based BO via tempered posterior updates, which downweight the likelihood by a power $α\in (0,1]$ to mitigate overconfidence under local misspecification. We establish cumulative regret bounds for tempered BO under a family of generalized improvement rules, including EI, and show that tempering yields strictly sharper worst-case regret guarantees than the standard posterior $(α=1)$, with the most favorable guarantees occurring near the classical EI choice.   Motivated by our theoretic findings, we propose a prequential procedure for selecting $α$ online: it decreases $α$ when realized prediction errors exceed model-implied uncertainty and returns $α$ toward one as calibration improves. Empirical results demonstrate that tempering provides a practical yet theoretically grounded tool for stabilizing BO surrogates under localized sampling.


【14】Constrained Density Estimation via Optimal Transport
标题:通过最佳运输进行约束密度估计
链接:https://arxiv.org/abs/2601.06830

作者:Yinan Hu,Estaban Tabak
摘要:A novel framework for density estimation under expectation constraints is proposed. The framework minimizes the Wasserstein distance between the estimated density and a prior, subject to the constraints that the expected value of a set of functions adopts or exceeds given values. The framework is generalized to include regularization inequalities to mitigate the artifacts in the target measure. An annealing-like algorithm is developed to address non-smooth constraints, with its effectiveness demonstrated through both synthetic and proof-of-concept real world examples in finance.


预测|估计(19篇)

【1】PLANET v2.0: A comprehensive Protein-Ligand Affinity Prediction Model Based on Mixture Density Network
标题:PLANET v2.0:基于混合密度网络的综合蛋白-配体亲和力预测模型
链接:https://arxiv.org/abs/2601.07415

作者:Haotian Gao,Xiangying Zhang,Jingyuan Li,Xinchong Chen,Haojie Wang,Yifei Qi,Renxiao Wang
摘要 :Drug discovery represents a time-consuming and financially intensive process, and virtual screening can accelerate it. Scoring functions, as one of the tools guiding virtual screening, have their precision closely tied to screening efficiency. In our previous study, we developed a graph neural network model called PLANET (Protein-Ligand Affinity prediction NETwork), but it suffers from the defect in representing protein-ligand contact maps. Incorrect binding modes inevitably lead to poor affinity predictions, so accurate prediction of the protein-ligand contact map is desired to improve PLANET. In this study, we have proposed PLANET v2.0 as an upgraded version. The model is trained via multi-objective training strategy and incorporates the Mixture Density Network to predict binding modes. Except for the probability density distributions of non-covalent interactions, we innovatively employ another Gaussian mixture model to describe the relationship between distance and energy of each interaction pair and predict protein-ligand affinity like calculating the mathematical expectation. As on the CASF-2016 benchmark, PLANET v2.0 demonstrates excellent scoring power, ranking power, and docking power. The screening power of PLANET v2.0 gets notably improved compared to PLANET and Glide SP and it demonstrates robust validation on a commercial ultra-large-scale dataset. Given its efficiency and accuracy, PLANET v2.0 can hopefully become one of the practical tools for virtual screening workflows. PLANET v2.0 is freely available at https://www.pdbbind-plus.org.cn/planetv2.


【2】Explaining Machine Learning Predictive Models through Conditional Expectation Methods
标题:通过条件期望方法解释机器学习预测模型
链接:https://arxiv.org/abs/2601.07313

作者:Silvia Ruiz-España,Laura Arnal,François Signol,Juan-Carlos Perez-Cortes,Joaquim Arlandis
备注:24 pages, 15 figures. Silvia Ruiz-España and Laura Arnal contributed equally to this work
摘要:The rapid adoption of complex Artificial Intelligence (AI) and Machine Learning (ML) models has led to their characterization as black boxes due to the difficulty of explaining their internal decision-making processes. This lack of transparency hinders users' ability to understand, validate and trust model behavior, particularly in high-risk applications. Although explainable AI (XAI) has made significant progress, there remains a need for versatile and effective techniques to address increasingly complex models. This work introduces Multivariate Conditional Expectation (MUCE), a model-agnostic method for local explainability designed to capture prediction changes from feature interactions. MUCE extends Individual Conditional Expectation (ICE) by exploring a multivariate grid of values in the neighborhood of a given observation at inference time, providing graphical explanations that illustrate the local evolution of model predictions. In addition, two quantitative indices, stability and uncertainty, summarize local behavior and assess model reliability. Uncertainty is further decomposed into uncertainty+ and uncertainty- to capture asymmetric effects that global measures may overlook. The proposed method is validated using XGBoost models trained on three datasets: two synthetic (2D and 3D) to evaluate behavior near decision boundaries, and one transformed real-world dataset to test adaptability to heterogeneous feature types. Results show that MUCE effectively captures complex local model behavior, while the stability and uncertainty indices provide meaningful insight into prediction confidence. MUCE, together with the ICE modification and the proposed indices, offers a practical contribution to local explainability, enabling both graphical and quantitative insights that enhance the interpretability of predictive models and support more trustworthy and transparent decision-making.


【3】CalPro: Prior-Aware Evidential--Conformal Prediction with Structure-Aware Guarantees for Protein Structures
标题:CalPro:先验感知证据--具有蛋白质结构结构感知保证的保形预测
链接:https://arxiv.org/abs/2601.07201

作者:Ibne Farabi Shihab,Sanjeda Akter,Anuj Sharma
摘要:Deep protein structure predictors such as AlphaFold provide confidence estimates (e.g., pLDDT) that are often miscalibrated and degrade under distribution shifts across experimental modalities, temporal changes, and intrinsically disordered regions. We introduce CalPro, a prior-aware evidential-conformal framework for shift-robust uncertainty quantification. CalPro combines (i) a geometric evidential head that outputs Normal-Inverse-Gamma predictive distributions via a graph-based architecture; (ii) a differentiable conformal layer that enables end-to-end training with finite-sample coverage guarantees; and (iii) domain priors (disorder, flexibility) encoded as soft constraints. We derive structure-aware coverage guarantees under distribution shift using PAC-Bayesian bounds over ambiguity sets, and show that CalPro maintains near-nominal coverage while producing tighter intervals than standard conformal methods in regions where priors are informative. Empirically, CalPro exhibits at most 5% coverage degradation across modalities (vs. 15-25% for baselines), reduces calibration error by 30-50%, and improves downstream ligand-docking success by 25%. Beyond proteins, CalPro applies to structured regression tasks in which priors encode local reliability, validated on non-biological benchmarks.


【4】Towards Operational Streamflow Forecasting in the Limpopo River Basin using Long Short-Term Memory Networks
标题:使用长短期记忆网络进行林波波河流域的业务径流预测
链接:https://arxiv.org/abs/2601.06941

作者:James Tlhomole,Edoardo Borgomeo,Karthikeyan Matheswaran,Mariangel Garcia Andarcia
备注:14 pages, 6 figures, 2 tables
摘要:Robust hydrological simulation is key for sustainable development, water management strategies, and climate change adaptation. In recent years, deep learning methods have been demonstrated to outperform mechanistic models at the task of hydrological discharge simulation. Adoption of these methods has been catalysed by the proliferation of large sample hydrology datasets, consisting of the observed discharge and meteorological drivers, along with geological and topographical catchment descriptors. Deep learning methods infer rainfall-runoff characteristics that have been shown to generalise across catchments, benefitting from the data diversity in large datasets. Despite this, application to catchments in Africa has been limited. The lack of adoption of deep learning methodologies is primarily due to sparsity or lack of the spatiotemporal observational data required to enable downstream model training. We therefore investigate the application of deep learning models, including LSTMs, for hydrological discharge simulation in the transboundary Limpopo River basin, emphasising application to data scarce regions. We conduct a number of computational experiments primarily focused on assessing the impact of varying the LSTM model input data on performance. Results confirm that data constraints remain the largest obstacle to deep learning applications across African river basins. We further outline the impact of human influence on data-driven modelling which is a commonly overlooked aspect of data-driven large-sample hydrology approaches and investigate solutions for model adaptation under smaller datasets. Additionally, we include recommendations for future efforts towards seasonal hydrological discharge prediction and direct comparison or inclusion of SWAT model outputs, as well as architectural improvements.


【5】Analyzing the effect of prediction accuracy on the distributionally-robust competitive ratio
标题:分析预测准确性对分布稳健竞争比的影响
链接:https://arxiv.org/abs/2601.06813

作者:Toru Yoshinaga,Yasushi Kawase
摘要 :The field of algorithms with predictions aims to improve algorithm performance by integrating machine learning predictions into algorithm design. A central question in this area is how predictions can improve performance, and a key aspect of this analysis is the role of prediction accuracy. In this context, prediction accuracy is defined as a guaranteed probability that an instance drawn from the distribution belongs to the predicted set. As a performance measure that incorporates prediction accuracy, we focus on the distributionally-robust competitive ratio (DRCR), introduced by Sun et al.~(ICML 2024). The DRCR is defined as the expected ratio between the algorithm's cost and the optimal cost, where the expectation is taken over the worst-case instance distribution that satisfies the given prediction and accuracy requirement. A known structural property is that, for any fixed algorithm, the DRCR decreases linearly as prediction accuracy increases. Building on this result, we establish that the optimal DRCR value (i.e., the infimum over all algorithms) is a monotone and concave function of prediction accuracy. We further generalize the DRCR framework to a multiple-prediction setting and show that monotonicity and concavity are preserved in this setting. Finally, we apply our results to the ski rental problem, a benchmark problem in online optimization, to identify the conditions on prediction accuracies required for the optimal DRCR to attain a target value. Moreover, we provide a method for computing the critical accuracy, defined as the minimum accuracy required for the optimal DRCR to strictly improve upon the performance attainable without any accuracy guarantee.


【6】Short-term electricity load forecasting with multi-frequency reconstruction diffusion
标题:多频重建扩散的短期电力负荷预测
链接:https://arxiv.org/abs/2601.06533

作者:Qi Dong,Rubing Huang,Ling Zhou,Dave Towey,Jinyu Tian,Jianzhou Wang
摘要:Diffusion models have emerged as a powerful method in various applications. However, their application to Short-Term Electricity Load Forecasting (STELF) -- a typical scenario in energy systems -- remains largely unexplored. Considering the nonlinear and fluctuating characteristics of the load data, effectively utilizing the powerful modeling capabilities of diffusion models to enhance STELF accuracy remains a challenge. This paper proposes a novel diffusion model with multi-frequency reconstruction for STELF, referred to as the Multi-Frequency-Reconstruction-based Diffusion (MFRD) model. The MFRD model achieves accurate load forecasting through four key steps: (1) The original data is combined with the decomposed multi-frequency modes to form a new data representation; (2) The diffusion model adds noise to the new data, effectively reducing and weakening the noise in the original data; (3) The reverse process adopts a denoising network that combines Long Short-Term Memory (LSTM) and Transformer to enhance noise removal; and (4) The inference process generates the final predictions based on the trained denoising network. To validate the effectiveness of the MFRD model, we conducted experiments on two data platforms: Australian Energy Market Operator (AEMO) and Independent System Operator of New England (ISO-NE). The experimental results show that our model consistently outperforms the compared models.


【7】Improving Day-Ahead Grid Carbon Intensity Forecasting by Joint Modeling of Local-Temporal and Cross-Variable Dependencies Across Different Frequencies
标题:通过对不同频率的本地-时间和交叉变量依赖性进行联合建模来改进未来一天电网碳强度预测
链接:https://arxiv.org/abs/2601.06530

作者:Bowen Zhang,Hongda Tian,Adam Berry,A. Craig Roussac
备注:2026 40th AAAI Conference on Artificial Intelligence
摘要:Accurate forecasting of the grid carbon intensity factor (CIF) is critical for enabling demand-side management and reducing emissions in modern electricity systems. Leveraging multiple interrelated time series, CIF prediction is typically formulated as a multivariate time series forecasting problem. Despite advances in deep learning-based methods, it remains challenging to capture the fine-grained local-temporal dependencies, dynamic higher-order cross-variable dependencies, and complex multi-frequency patterns for CIF forecasting. To address these issues, we propose a novel model that integrates two parallel modules: 1) one enhances the extraction of local-temporal dependencies under multi-frequency by applying multiple wavelet-based convolutional kernels to overlapping patches of varying lengths; 2) the other captures dynamic cross-variable dependencies under multi-frequency to model how inter-variable relationships evolve across the time-frequency domain. Evaluations on four representative electricity markets from Australia, featuring varying levels of renewable penetration, demonstrate that the proposed method outperforms the state-of-the-art models. An ablation study further validates the complementary benefits of the two proposed modules. Designed with built-in interpretability, the proposed model also enables better understanding of its predictive behavior, as shown in a case study where it adaptively shifts attention to relevant variables and time intervals during a disruptive event.


【8】Hybrid LSTM-UKF Framework: Ankle Angle and Ground Reaction Force Estimation
标题:混合LSTM-UKF框架:踝角和地面反力估计
链接:https://arxiv.org/abs/2601.06473

作者:Mundla Narasimhappa,Praveen Kumar
备注:8
摘要:Accurate prediction of joint kinematics and kinetics is essential for advancing gait analysis and developing intelligent assistive systems such as prosthetics and exoskeletons. This study presents a hybrid LSTM-UKF framework for estimating ankle angle and ground reaction force (GRF) across varying walking speeds. A multimodal sensor fusion strategy integrates force plate data, knee angle, and GRF signals to enrich biomechanical context. Model performance was evaluated using RMSE and $R^2$ under subject-specific validation. The LSTM-UKF consistently outperformed standalone LSTM and UKF models, achieving up to 18.6\% lower RMSE for GRF prediction at 3 km/h. Additionally, UKF integration improved robustness, reducing ankle angle RMSE by up to 22.4\% compared to UKF alone at 1 km/h. These results underscore the effectiveness of hybrid architectures for reliable gait prediction across subjects and walking conditions.


【9】Can we Improve Prediction of Psychotherapy Outcomes Through Pretraining With Simulated Data?
标题:我们可以通过模拟数据的预训练来提高心理治疗结果的预测吗?
链接:https://arxiv.org/abs/2601.06159

作者:Niklas Jacobs,Manuel C. Voelkle,Norbert Kathmann,Kevin Hilbert
摘要 :In the context of personalized medicine, machine learning algorithms are growing in popularity. These algorithms require substantial information, which can be acquired effectively through the usage of previously gathered data. Open data and the utilization of synthetization techniques have been proposed to address this. In this paper, we propose and evaluate alternative approach that uses additional simulated data based on summary statistics published in the literature. The simulated data are used to pretrain random forests, which are afterwards fine-tuned on a real dataset. We compare the predictive performance of the new approach to random forests trained only on the real data. A Monte Carlo Cross Validation (MCCV) framework with 100 iterations was employed to investigate significance and stability of the results. Since a first study yielded inconclusive results, a second study with improved methodology (i.e., systematic information extraction and different prediction outcome) was conducted. In Study 1, some pretrained random forests descriptively outperformed the standard random forest. However, this improvement was not significant (t(99) = 0.89, p = 0.19). Contrary to expectations, in Study 2 the random forest trained only with the real data outperformed the pretrained random forests. We conclude with a discussion of challenges, such as the scarcity of informative publications, and recommendations for future research.


【10】A Foundation Model Approach for Fetal Stress Prediction During Labor From cardiotocography (CTG) recordings
标题:根据心胎心电图(CTG)记录预测分娩期间胎儿压力的基础模型方法
链接:https://arxiv.org/abs/2601.06149

作者:Naomi Fridman,Berta Ben Shachar
摘要:Intrapartum cardiotocography (CTG) is widely used for fetal monitoring during labor, yet its interpretation suffers from high inter-observer variability and limited predictive accuracy. Deep learning approaches have been constrained by the scarcity of CTG recordings with clinical outcome labels. We present the first application of self-supervised pre-training to intrapartum CTG analysis, leveraging 2,444 hours of unlabeled recordings for masked pre-training followed by fine-tuning on the 552-recording CTU-UHB benchmark. Using a PatchTST transformer architecture with a channel-asymmetric masking scheme designed for fetal heart rate reconstruction, we achieve an area under the receiver operating characteristic curve of 0.83 on the full test set and 0.853 on uncomplicated vaginal deliveries, exceeding previously reported results on this benchmark (0.68-0.75). Error analysis reveals that false-positive alerts typically correspond to CTG patterns judged concerning on retrospective clinical review, suggesting clinically meaningful predictions even when umbilical pH is normal. We release standardized dataset splits and model weights to enable reproducible benchmarking. Our results demonstrate that self-supervised pre-training can address data scarcity in fetal monitoring, offering a path toward reliable decision support in the labor room.


【11】RainBalance: Alleviating Dual Imbalance in GNSS-based Precipitation Nowcasting via Continuous Probability Modeling
标题:RainBalance:通过连续概率建模缓解基于GNSS的降水临近预报中的双重不平衡
链接:https://arxiv.org/abs/2601.06137

作者:Yifang Zhang,Shengwu Xiong,Henan Wang,Wenjie Yin,Jiawang Peng,Duan Zhou,Yuqiang Zhang,Chen Zhou,Hua Chen,Qile Zhao,Pengfei Duan
备注:11pages,6 figures
摘要:Global navigation satellite systems (GNSS) station-based Precipitation Nowcasting aims to predict rainfall within the next 0-6 hours by leveraging a GNSS station's historical observations of precipitation, GNSS-PWV, and related meteorological variables, which is crucial for disaster mitigation and real-time decision-making. In recent years, time-series forecasting approaches have been extensively applied to GNSS station-based precipitation nowcasting. However, the highly imbalanced temporal distribution of precipitation, marked not only by the dominance of non-rainfall events but also by the scarcity of extreme precipitation samples, significantly limits model performance in practical applications. To address the dual imbalance problem in precipitation nowcasting, we propose a continuous probability modeling-based framework, RainBalance. This plug-and-play module performs clustering for each input sample to obtain its cluster probability distribution, which is further mapped into a continuous latent space via a variational autoencoder (VAE). By learning in this continuous probabilistic space, the task is reformulated from fitting single and imbalance-prone precipitation labels to modeling continuous probabilistic label distributions, thereby alleviating the imbalance issue. We integrate this module into multiple state-of-the-art models and observe consistent performance gains. Comprehensive statistical analysis and ablation studies further validate the effectiveness of our approach.


【12】Learning Minimally-Congested Drive Times from Sparse Open Networks: A Lightweight RF-Based Estimator for Urban Roadway Operations
标题:从稀疏开放网络中学习最小拥堵驾驶时间:城市道路运营的轻量级基于RF的估计
链接:https://arxiv.org/abs/2601.06124

作者:Adewumi Augustine Adepitan,Christopher J. Haruna,Morayo Ogunsina,Damilola Olawoyin Yussuf,Ayooluwatomiwa Ajiboye
摘要:Accurate roadway travel-time prediction is foundational to transportation systems analysis, yet widespread reliance on either data-intensive congestion models or overly naïve heuristics limits scalability and practical adoption in engineering workflows. This paper develops a lightweight estimator for minimally-congested car travel times that integrates open road-network data, speed constraints, and sparse control/turn features within a random forest framework to correct bias from shortest-path traversal-time baselines. Using an urban testbed, the pipeline: (i) constructs drivable networks from volunteered geographic data; (ii) solves Dijkstra routes minimizing edge traversal time; (iii) derives sparse operational features (signals, stops, crossings, yield, roundabouts; left/right/slight/U-turn counts); and (iv) trains a regression ensemble on limited high-quality reference times to generalize predictions beyond the training set. Out-of-sample evaluation demonstrates marked improvements over traversal-time baselines across mean absolute error, mean absolute percentage error, mean squared error, relative bias, and explained variance, with no significant mean bias under minimally congested conditions and consistent k-fold stability indicating negligible overfitting. The resulting approach offers a practical middle ground for transportation engineering: it preserves point-to-point fidelity at metropolitan scale, reduces resource requirements, and supplies defensible performance estimates where congestion feeds are inaccessible or cost-prohibitive, supporting planning, accessibility, and network performance applications under low-traffic operating regimes.


【13】One if by Land, Two if by Sea, Three if by Four Seas, and More to Come -- Values of Perception, Prediction, Communication, and Common Sense in Decision Making
标题:陆路一个,海路二个,四大洋三个,以及未来更多--决策中的感知、预测、沟通和常识价值
链接:https://arxiv.org/abs/2601.06077

作者:Aolin Xu
摘要:This work aims to rigorously define the values of perception, prediction, communication, and common sense in decision making. The defined quantities are decision-theoretic, but have information-theoretic analogues, e.g., they share some simple but key mathematical properties with Shannon entropy and mutual information, and can reduce to these quantities in particular settings. One interesting observation is that, the value of perception without prediction can be negative, while the value of perception together with prediction and the value of prediction alone are always nonnegative. The defined quantities suggest answers to practical questions arising in the design of autonomous decision-making systems. Example questions include: Do we need to observe and predict the behavior of a particular agent? How important is it? What is the best order to observe and predict the agents? The defined quantities may also provide insights to cognitive science and neural science, toward the understanding of how natural decision makers make use of information gained from different sources and operations.


【14】Physics-Informed Singular-Value Learning for Cross-Covariances Forecasting in Financial Markets
标题:金融市场交叉协方差预测的物理信息奇异值学习
链接:https://arxiv.org/abs/2601.07687

作者:Efstratios Manolakis,Christian Bongiorno,Rosario Nunzio Mantegna
摘要:A new wave of work on covariance cleaning and nonlinear shrinkage has delivered asymptotically optimal analytical solutions for large covariance matrices. Building on this progress, these ideas have been generalized to empirical cross-covariance matrices, whose singular-value shrinkage characterizes comovements between one set of assets and another. Existing analytical cross-covariance cleaners are derived under strong stationarity and large-sample assumptions, and they typically rely on mesoscopic regularity conditions such as bounded spectra; macroscopic common modes (e.g., a global market factor) violate these conditions. When applied to real equity returns, where dependence structures drift over time and global modes are prominent, we find that these theoretically optimal formulas do not translate into robust out-of-sample performance. We address this gap by designing a random-matrix-inspired neural architecture that operates in the empirical singular-vector basis and learns a nonlinear mapping from empirical singular values to their corresponding cleaned values. By construction, the network can recover the analytical solution as a special case, yet it remains flexible enough to adapt to non-stationary dynamics and mode-driven distortions. Trained on a long history of equity returns, the proposed method achieves a more favorable bias-variance trade-off than purely analytical cleaners and delivers systematically lower out-of-sample cross-covariance prediction errors. Our results demonstrate that combining random-matrix theory with machine learning makes asymptotic theories practically effective in realistic time-varying markets.


【15】Dual-Level Models for Physics-Informed Multi-Step Time Series Forecasting
标题:基于物理信息的多步时间序列预测的双层模型
链接:https://arxiv.org/abs/2601.07640

作者:Mahdi Nasiri,Johanna Kortelainen,Simo Särkkä
摘要:This paper develops an approach for multi-step forecasting of dynamical systems by integrating probabilistic input forecasting with physics-informed output prediction. Accurate multi-step forecasting of time series systems is important for the automatic control and optimization of physical processes, enabling more precise decision-making. While mechanistic-based and data-driven machine learning (ML) approaches have been employed for time series forecasting, they face significant limitations. Incomplete knowledge of process mathematical models limits mechanistic-based direct employment, while purely data-driven ML models struggle with dynamic environments, leading to poor generalization. To address these limitations, this paper proposes a dual-level strategy for physics-informed forecasting of dynamical systems. On the first level, input variables are forecast using a hybrid method that integrates a long short-term memory (LSTM) network into probabilistic state transition models (STMs). On the second level, these stochastically predicted inputs are sequentially fed into a physics-informed neural network (PINN) to generate multi-step output predictions. The experimental results of the paper demonstrate that the hybrid input forecasting models achieve a higher log-likelihood and lower mean squared errors (MSE) compared to conventional STMs. Furthermore, the PINNs driven by the input forecasting models outperform their purely data-driven counterparts in terms of MSE and log-likelihood, exhibiting stronger generalization and forecasting performance across multiple test cases.


【16】PIDT: Physics-Informed Digital Twin for Optical Fiber Parameter Estimation
标题:PIDT:用于光纤参数估计的物理信息数字双胞胎
链接:https://arxiv.org/abs/2601.07436

作者:Zicong Jiang,Magnus Karlsson,Erik Agrell,Christian Häger
备注:The paper will be appeared in Optical Fiber Communications Conference and Exhibition (OFC) 2026
摘要:We propose physics-informed digital twin (PIDT): a fiber parameter estimation approach that combines a parameterized split-step method with a physics-informed loss. PIDT improves accuracy and convergence speed with lower complexity compared to previous neural operators.


【17】Robust Mean Estimation under Quantization
标题:量化下的稳健均值估计
链接:https://arxiv.org/abs/2601.07074

作者:Pedro Abdalla,Junren Chen
摘要:We consider the problem of mean estimation under quantization and adversarial corruption. We construct multivariate robust estimators that are optimal up to logarithmic factors in two different settings. The first is a one-bit setting, where each bit depends only on a single sample, and the second is a partial quantization setting, in which the estimator may use a small fraction of unquantized data.


【18】Conditional Normalizing Flows for Forward and Backward Joint State and Parameter Estimation
标题:前向和后向联合状态和参数估计的条件正规化流
链接:https://arxiv.org/abs/2601.07013

作者:Luke S. Lagunowich,Guoxiang Grayson Tong,Daniele E. Schiavazzi
摘要:Traditional filtering algorithms for state estimation -- such as classical Kalman filtering, unscented Kalman filtering, and particle filters - show performance degradation when applied to nonlinear systems whose uncertainty follows arbitrary non-Gaussian, and potentially multi-modal distributions. This study reviews recent approaches to state estimation via nonlinear filtering based on conditional normalizing flows, where the conditional embedding is generated by standard MLP architectures, transformers or selective state-space models (like Mamba-SSM). In addition, we test the effectiveness of an optimal-transport-inspired kinetic loss term in mitigating overparameterization in flows consisting of a large collection of transformations. We investigate the performance of these approaches on applications relevant to autonomous driving and patient population dynamics, paying special attention to how they handle time inversion and chained predictions. Finally, we assess the performance of various conditioning strategies for an application to real-world COVID-19 joint SIR system forecasting and parameter estimation.


【19】Diffusion Models with Heavy-Tailed Targets: Score Estimation and Sampling Guarantees
标题:具有重尾目标的扩散模型:分数估计和抽样保证
链接:https://arxiv.org/abs/2601.06715

作者:Yifeng Yu,Lu Yu
摘要:Score-based diffusion models have become a powerful framework for generative modeling, with score estimation as a central statistical bottleneck. Existing guarantees for score estimation largely focus on light-tailed targets or rely on restrictive assumptions such as compact support, which are often violated by heavy-tailed data in practice. In this work, we study conventional (Gaussian) score-based diffusion models when the target distribution is heavy-tailed and belongs to a Sobolev class with smoothness parameter $β>0$. We consider both exponential and polynomial tail decay, indexed by a tail parameter $γ$. Using kernel density estimation, we derive sharp minimax rates for score estimation, revealing a qualitative dichotomy: under exponential tails, the rate matches the light-tailed case up to polylogarithmic factors, whereas under polynomial tails the rate depends explicitly on $γ$. We further provide sampling guarantees for the associated continuous reverse dynamics. In total variation, the generated distribution converges at the minimax optimal rate $n^{-β/(2β+d)}$ under exponential tails (up to logarithmic factors), and at a $γ$-dependent rate under polynomial tails. Whether the latter sampling rate is minimax optimal remains an open question. These results characterize the statistical limits of score estimation and the resulting sampling accuracy for heavy-tailed targets, extending diffusion theory beyond the light-tailed setting.


其他神经网络|深度学习|模型|建模(47篇)

【1】Hidden Monotonicity: Explaining Deep Neural Networks via their DC Decomposition
标题:隐藏的单调性:通过DC分解解释深度神经网络
链接:https://arxiv.org/abs/2601.07700

作者:Jakob Paul Zimmermann,Georg Loho
摘要:It has been demonstrated in various contexts that monotonicity leads to better explainability in neural networks. However, not every function can be well approximated by a monotone neural network. We demonstrate that monotonicity can still be used in two ways to boost explainability. First, we use an adaptation of the decomposition of a trained ReLU network into two monotone and convex parts, thereby overcoming numerical obstacles from an inherent blowup of the weights in this procedure. Our proposed saliency methods -- SplitCAM and SplitLRP -- improve on state of the art results on both VGG16 and Resnet18 networks on ImageNet-S across all Quantus saliency metric categories. Second, we exhibit that training a model as the difference between two monotone neural networks results in a system with strong self-explainability properties.


【2】Tab-TRM: Tiny Recursive Model for Insurance Pricing on Tabular Data
标题:Tab-TRM:表格数据保险定价的微型回归模型
链接:https://arxiv.org/abs/2601.07675

作者:Kishan Padayachy,Ronald Richman,Mario V. Wüthrich
备注:30 pages
摘要:We introduce Tab-TRM (Tabular-Tiny Recursive Model), a network architecture that adapts the recursive latent reasoning paradigm of Tiny Recursive Models (TRMs) to insurance modeling. Drawing inspiration from both the Hierarchical Reasoning Model (HRM) and its simplified successor TRM, the Tab-TRM model makes predictions by reasoning over the input features. It maintains two learnable latent tokens - an answer token and a reasoning state - that are iteratively refined by a compact, parameter-efficient recursive network. The recursive processing layer repeatedly updates the reasoning state given the full token sequence and then refines the answer token, in close analogy with iterative insurance pricing schemes. Conceptually, Tab-TRM bridges classical actuarial workflows - iterative generalized linear model fitting and minimum-bias calibration - on the one hand, and modern machine learning, in terms of Gradient Boosting Machines, on the other.


【3】Learning to accelerate Krasnosel'skii-Mann fixed-point iterations with guarantees
标题:学习在保证的情况下加速Krasnosel的skiii-Mann定点迭代
链接:https://arxiv.org/abs/2601.07665

作者:Andrea Martin,Giuseppe Belgioioso
摘要 :We introduce a principled learning to optimize (L2O) framework for solving fixed-point problems involving general nonexpansive mappings. Our idea is to deliberately inject summable perturbations into a standard Krasnosel'skii-Mann iteration to improve its average-case performance over a specific distribution of problems while retaining its convergence guarantees. Under a metric sub-regularity assumption, we prove that the proposed parametrization includes only iterations that locally achieve linear convergence-up to a vanishing bias term-and that it encompasses all iterations that do so at a sufficiently fast rate. We then demonstrate how our framework can be used to augment several widely-used operator splitting methods to accelerate the solution of structured monotone inclusion problems, and validate our approach on a best approximation problem using an L2O-augmented Douglas-Rachford splitting algorithm.


【4】Beyond Sharpness: A Flatness Decomposition Framework for Efficient Continual Learning
标题:超越尖锐:高效持续学习的平坦分解框架
链接:https://arxiv.org/abs/2601.07636

作者:Yanan Chen,Tieliang Gong,Yunjiao Zhang,Wen Wen
备注:Accepted by AAAI 2026
摘要:Continual Learning (CL) aims to enable models to sequentially learn multiple tasks without forgetting previous knowledge. Recent studies have shown that optimizing towards flatter loss minima can improve model generalization. However, existing sharpness-aware methods for CL suffer from two key limitations: (1) they treat sharpness regularization as a unified signal without distinguishing the contributions of its components. and (2) they introduce substantial computational overhead that impedes practical deployment. To address these challenges, we propose FLAD, a novel optimization framework that decomposes sharpness-aware perturbations into gradient-aligned and stochastic-noise components, and show that retaining only the noise component promotes generalization. We further introduce a lightweight scheduling scheme that enables FLAD to maintain significant performance gains even under constrained training time. FLAD can be seamlessly integrated into various CL paradigms and consistently outperforms standard and sharpness-aware optimizers in diverse experimental settings, demonstrating its effectiveness and practicality in CL.


【5】An adjoint method for training data-driven reduced-order models
标题:训练数据驱动降阶模型的伴随方法
链接:https://arxiv.org/abs/2601.07579

作者:Donglin Liu,Francisco García Atienza,Mengwu Guo
摘要:Reduced-order modeling lies at the interface of numerical analysis and data-driven scientific computing, providing principled ways to compress high-fidelity simulations in science and engineering. We propose a training framework that couples a continuous-time form of operator inference with the adjoint-state method to obtain robust data-driven reduced-order models. This method minimizes a trajectory-based loss between reduced-order solutions and projected snapshot data, which removes the need to estimate time derivatives from noisy measurements and provides intrinsic temporal regularization through time integration. We derive the corresponding continuous adjoint equations to compute gradients efficiently and implement a gradient based optimizer to update the reduced model parameters. Each iteration only requires one forward reduced order solve and one adjoint solve, followed by inexpensive gradient assembly, making the method attractive for large-scale simulations. We validate the proposed method on three partial differential equations: viscous Burgers' equation, the two-dimensional Fisher-KPP equation, and an advection-diffusion equation. We perform systematic comparisons against standard operator inference under two perturbation regimes, namely reduced temporal snapshot density and additive Gaussian noise. For clean data, both approaches deliver similar accuracy, but in situations with sparse sampling and noise, the proposed adjoint-based training provides better accuracy and enhanced roll-out stability.


【6】Task Prototype-Based Knowledge Retrieval for Multi-Task Learning from Partially Annotated Data
标题:基于任务原型的知识检索用于部分注释数据的多任务学习
链接:https://arxiv.org/abs/2601.07474

作者:Youngmin Oh,Hyung-Il Kim,Jung Uk Kim
备注:Accepted at AAAI 2026
摘要:Multi-task learning (MTL) is critical in real-world applications such as autonomous driving and robotics, enabling simultaneous handling of diverse tasks. However, obtaining fully annotated data for all tasks is impractical due to labeling costs. Existing methods for partially labeled MTL typically rely on predictions from unlabeled tasks, making it difficult to establish reliable task associations and potentially leading to negative transfer and suboptimal performance. To address these issues, we propose a prototype-based knowledge retrieval framework that achieves robust MTL instead of relying on predictions from unlabeled tasks. Our framework consists of two key components: (1) a task prototype embedding task-specific characteristics and quantifying task associations, and (2) a knowledge retrieval transformer that adaptively refines feature representations based on these associations. To achieve this, we introduce an association knowledge generating (AKG) loss to ensure the task prototype consistently captures task-specific characteristics. Extensive experiments demonstrate the effectiveness of our framework, highlighting its potential for robust multi-task learning, even when only a subset of tasks is annotated.


【7】CompNO: A Novel Foundation Model approach for solving Partial Differential Equations
标题:CompNO:求解偏微分方程的一种新的基础模型方法
链接:https://arxiv.org/abs/2601.07384

作者:Hamda Hmida,Hsiu-Wen Chang Joly,Youssef Mesri
备注:Under review at MDPI
摘要 :Partial differential equations (PDEs) govern a wide range of physical phenomena, but their numerical solution remains computationally demanding, especially when repeated simulations are required across many parameter settings. Recent Scientific Foundation Models (SFMs) aim to alleviate this cost by learning universal surrogates from large collections of simulated systems, yet they typically rely on monolithic architectures with limited interpretability and high pretraining expense. In this work we introduce Compositional Neural Operators (CompNO), a compositional neural operator framework for parametric PDEs. Instead of pretraining a single large model on heterogeneous data, CompNO first learns a library of Foundation Blocks, where each block is a parametric Fourier neural operator specialized to a fundamental differential operator (e.g. convection, diffusion, nonlinear convection). These blocks are then assembled, via lightweight Adaptation Blocks, into task-specific solvers that approximate the temporal evolution operator for target PDEs. A dedicated boundary-condition operator further enforces Dirichlet constraints exactly at inference time. We validate CompNO on one-dimensional convection, diffusion, convection--diffusion and Burgers' equations from the PDEBench suite. The proposed framework achieves lower relative L2 error than strong baselines (PFNO, PDEFormer and in-context learning based models) on linear parametric systems, while remaining competitive on nonlinear Burgers' flows. The model maintains exact boundary satisfaction with zero loss at domain boundaries, and exhibits robust generalization across a broad range of Peclet and Reynolds numbers. These results demonstrate that compositional neural operators provide a scalable and physically interpretable pathway towards foundation models for PDEs.


【8】Innovation Capacity of Dynamical Learning Systems
标题:动态学习系统的创新能力
链接:https://arxiv.org/abs/2601.07257

作者:Anthony M. Polloreno
备注:12 pages, 3 figures
摘要:In noisy physical reservoirs, the classical information-processing capacity $C_{\mathrm{ip}}$ quantifies how well a linear readout can realize tasks measurable from the input history, yet $C_{\mathrm{ip}}$ can be far smaller than the observed rank of the readout covariance. We explain this ``missing capacity'' by introducing the innovation capacity $C_{\mathrm{i}}$, the total capacity allocated to readout components orthogonal to the input filtration (Doob innovations, including input-noise mixing). Using a basis-free Hilbert-space formulation of the predictable/innovation decomposition, we prove the conservation law $C_{\mathrm{ip}}+C_{\mathrm{i}}=\mathrm{rank}(Σ_{XX})\le d$, so predictable and innovation capacities exactly partition the rank of the observable readout dimension covariance $Σ_{XX}\in \mathbb{R}^{\rm d\times d}$. In linear-Gaussian Johnson-Nyquist regimes, $Σ_{XX}(T)=S+T N_0$, the split becomes a generalized-eigenvalue shrinkage rule and gives an explicit monotone tradeoff between temperature and predictable capacity. Geometrically, in whitened coordinates the predictable and innovation components correspond to complementary covariance ellipsoids, making $C_{\mathrm{i}}$ a trace-controlled innovation budget. A large $C_{\mathrm{i}}$ forces a high-dimensional innovation subspace with a variance floor and under mild mixing and anti-concentration assumptions this yields extensive innovation-block differential entropy and exponentially many distinguishable histories. Finally, we give an information-theoretic lower bound showing that learning the induced innovation-block law in total variation requires a number of samples that scales with the effective innovation dimension, supporting the generative utility of noisy physical reservoirs.


【9】XBTorch: A Unified Framework for Modeling and Co-Design of Crossbar-Based Deep Learning Accelerators
标题:XBTorch:基于交叉开关的深度学习加速器建模和协同设计的统一框架
链接:https://arxiv.org/abs/2601.07086

作者:Osama Yousuf,Andreu L. Glasmann,Martin Lueker-Boden,Sina Najmaei,Gina C. Adam
摘要:Emerging memory technologies have gained significant attention as a promising pathway to overcome the limitations of conventional computing architectures in deep learning applications. By enabling computation directly within memory, these technologies - built on nanoscale devices with tunable and nonvolatile conductance - offer the potential to drastically reduce energy consumption and latency compared to traditional von Neumann systems. This paper introduces XBTorch (short for CrossBarTorch), a novel simulation framework that integrates seamlessly with PyTorch and provides specialized tools for accurately and efficiently modeling crossbar-based systems based on emerging memory technologies. Through detailed comparisons and case studies involving hardware-aware training and inference, we demonstrate how XBTorch offers a unified interface for key research areas such as device-level modeling, cross-layer co-design, and inference-time fault tolerance. While exemplar studies utilize ferroelectric field-effect transistor (FeFET) models, the framework remains technology-agnostic - supporting other emerging memories such as resistive RAM (ReRAM), as well as enabling user-defined custom device models. The code is publicly available at: https://github.com/ADAM-Lab-GW/xbtorch


【10】MoE-DisCo:Low Economy Cost Training Mixture-of-Experts Models
标题:MoE-DisCo:低经济成本训练混合专家模型
链接:https://arxiv.org/abs/2601.06857

作者:Xin Ye,Daning Cheng,Boyang Zhang,Yunquan Zhang
摘要:Training large-scale Mixture-of-Experts (MoE) models typically requires high-memory, high-bandwidth GPUs (e.g., A100), and their high cost has become a major barrier to large-model training. In contrast, affordable hardware is low-cost but constrained by memory capacity and bandwidth, making it unsuitable for direct LLM training. To address this, we propose MoE-DisCo (Mixture-of-Experts with Disentangled Clustering and Coordination), a staged training framework. MoE-DisCo decomposes the MoE model into multiple dense submodels, each consisting of a shared backbone and a single expert, and partitions the training data into subsets using unsupervised clustering. Each submodel is trained independently and in parallel on its assigned data subset using low-cost devices, without any inter-device communication. Subsequently, all experts are integrated into a complete MoE model and fine-tuned globally for a short period on high-memory, high-bandwidth GPUs. Experiments show that our method matches or even surpasses full-parameter training in performance across multiple downstream tasks, loss function, and perplexity (PPL), while reducing training cost by 47.6 percent to 69.5 percent on Qwen1.5-MoE-2.7B and Llama-MoE-3.5B across different datasets.


【11】Cross-Modal Computational Model of Brain-Heart Interactions via HRV and EEG Feature
标题:基于HRV和EEG特征的脑-心相互作用跨模态计算模型
链接:https://arxiv.org/abs/2601.06792

作者:Malavika Pradeep,Akshay Sasi,Nusaibah Farrukh,Rahul Venugopal,Elizabeth Sherly
备注:6 pages, 2 figures, Code available at: https://github.com/Malavika-pradeep/Computational-model-for-Brain-heart-Interaction-Analysis. Presented at AIHC (not published)
摘要 :The electroencephalogram (EEG) has been the gold standard for quantifying mental workload; however, due to its complexity and non-portability, it can be constraining. ECG signals, which are feasible on wearable equipment pieces such as headbands, present a promising method for cognitive state monitoring. This research explores whether electrocardiogram (ECG) signals are able to indicate mental workload consistently and act as surrogates for EEG-based cognitive indicators. This study investigates whether ECG-derived features can serve as surrogate indicators of cognitive load, a concept traditionally quantified using EEG. Using a publicly available multimodal dataset (OpenNeuro) of EEG and ECG recorded during working-memory and listening tasks, features of HRV and Catch22 descriptors are extracted from ECG, and spectral band-power with Catch22 features from EEG. A cross-modal regression framework based on XGBoost was trained to map ECG-derived HRV representations to EEG-derived cognitive features. In order to address data sparsity and model brain-heart interactions, we integrated the PSV-SDG to produce EEG-conditioned synthetic HRV time series.This addresses the challenge of inferring cognitive load solely from ECG-derived features using a combination of multimodal learning, signal processing, and synthetic data generation. These outcomes form a basis for light, interpretable machine learning models that are implemented through wearable biosensors in non-lab environments. Synthetic HRV inclusion enhances robustness, particularly in sparse data situations. Overall, this work is an initiation for building low-cost, explainable, and real-time cognitive monitoring systems for mental health, education, and human-computer interaction, with a focus on ageing and clinical populations.


【12】A Backpropagation-Free Feedback-Hebbian Network for Continual Learning Dynamics
标题:一种无反向传播的连续学习动态反馈Hebbian网络
链接:https://arxiv.org/abs/2601.06758

作者:Josh Li
备注:8 pages, 10 figures
摘要:Feedback-rich neural architectures can regenerate earlier representations and inject temporal context, making them a natural setting for strictly local synaptic plasticity. We ask whether a minimal, backpropagation-free feedback--Hebbian system can already express interpretable continual-learning--relevant behaviors under controlled training schedules. We introduce a compact prediction--reconstruction architecture with two feedforward layers for supervised association learning and two dedicated feedback layers trained to reconstruct earlier activity and re-inject it as additive temporal context. All synapses are updated by a unified local rule combining centered Hebbian covariance, Oja-style stabilization, and a local supervised drive where targets are available, requiring no weight transport or global error backpropagation. On a small two-pair association task, we characterize learning through layer-wise activity snapshots, connectivity trajectories (row/column means of learned weights), and a normalized retention index across phases. Under sequential A->B training, forward output connectivity exhibits a long-term depression (LTD)-like suppression of the earlier association while feedback connectivity preserves an A-related trace during acquisition of B. Under deterministic interleaving A,B,A,B,..., both associations are concurrently maintained rather than sequentially suppressed. Architectural controls and rule-term ablations isolate the role of dedicated feedback in regeneration and co-maintenance, and the role of the local supervised term in output selectivity and unlearning. Together, the results show that a compact feedback pathway trained with local plasticity can support regeneration and continual-learning--relevant dynamics in a minimal, mechanistically transparent setting.


【13】Why are there many equally good models? An Anatomy of the Rashomon Effect
标题:为什么有很多同样好的模特?罗生门效应剖析
链接:https://arxiv.org/abs/2601.06730

作者:Harsh Parikh
摘要:The Rashomon effect -- the existence of multiple, distinct models that achieve nearly equivalent predictive performance -- has emerged as a fundamental phenomenon in modern machine learning and statistics. In this paper, we explore the causes underlying the Rashomon effect, organizing them into three categories: statistical sources arising from finite samples and noise in the data-generating process; structural sources arising from non-convexity of optimization objectives and unobserved variables that create fundamental non-identifiability; and procedural sources arising from limitations of optimization algorithms and deliberate restrictions to suboptimal model classes. We synthesize insights from machine learning, statistics, and optimization literature to provide a unified framework for understanding why the multiplicity of good models arises. A key distinction emerges: statistical multiplicity diminishes with more data, structural multiplicity persists asymptotically and cannot be resolved without different data or additional assumptions, and procedural multiplicity reflects choices made by practitioners. Beyond characterizing causes, we discuss both the challenges and opportunities presented by the Rashomon effect, including implications for inference, interpretability, fairness, and decision-making under uncertainty.


【14】DS-CIM: Digital Stochastic Computing-In-Memory Featuring Accurate OR-Accumulation via Sample Region Remapping for Edge AI Models
标题:DS-TIM:数字随机内存计算,通过边缘AI模型的样本区域重新映射实现准确的OR累积
链接:https://arxiv.org/abs/2601.06724

作者:Kunming Shao,Liang Zhao,Jiangnan Yu,Zhipeng Liao,Xiaomeng Wang,Yi Zou,Tim Kwang-Ting Cheng,Chi-Ying Tsui
备注:Accepted by 2026 Design, Automation and Test in Europe Conference (DATE)
摘要:Stochastic computing (SC) offers hardware simplicity but suffers from low throughput, while high-throughput Digital Computing-in-Memory (DCIM) is bottlenecked by costly adder logic for matrix-vector multiplication (MVM). To address this trade-off, this paper introduces a digital stochastic CIM (DS-CIM) architecture that achieves both high accuracy and efficiency. We implement signed multiply-accumulation (MAC) in a compact, unsigned OR-based circuit by modifying the data representation. Throughput is enhanced by replicating this low-cost circuit 64 times with only a 1x area increase. Our core strategy, a shared Pseudo Random Number Generator (PRNG) with 2D partitioning, enables single-cycle mutually exclusive activation to eliminate OR-gate collisions. We also resolve the 1s saturation issue via stochastic process analysis and data remapping, significantly improving accuracy and resilience to input sparsity. Our high-accuracy DS-CIM1 variant achieves 94.45% accuracy for INT8 ResNet18 on CIFAR-10 with a root-mean-squared error (RMSE) of just 0.74%. Meanwhile, our high-efficiency DS-CIM2 variant attains an energy efficiency of 3566.1 TOPS/W and an area efficiency of 363.7 TOPS/mm^2, while maintaining a low RMSE of 3.81%. The DS-CIM capability with larger models is further demonstrated through experiments with INT8 ResNet50 on ImageNet and the FP8 LLaMA-7B model.


【15】Beyond Perfect Scores: Proof-by-Contradiction for Trustworthy Machine Learning
标题:超越满分:值得信赖的机器学习的矛盾证明
链接:https://arxiv.org/abs/2601.06704

作者:Dushan N. Wadduwage,Dineth Jayakody,Leonidas Zimianitis
备注:13 pages, 6 figures
摘要:Machine learning (ML) models show strong promise for new biomedical prediction tasks, but concerns about trustworthiness have hindered their clinical adoption. In particular, it is often unclear whether a model relies on true clinical cues or on spurious hierarchical correlations in the data. This paper introduces a simple yet broadly applicable trustworthiness test grounded in stochastic proof-by-contradiction. Instead of just showing high test performance, our approach trains and tests on spurious labels carefully permuted based on a potential outcomes framework. A truly trustworthy model should fail under such label permutation; comparable accuracy across real and permuted labels indicates overfitting, shortcut learning, or data leakage. Our approach quantifies this behavior through interpretable Fisher-style p-values, which are well understood by domain experts across medical and life sciences. We evaluate our approach on multiple new bacterial diagnostics to separate tasks and models learning genuine causal relationships from those driven by dataset artifacts or statistical coincidences. Our work establishes a foundation to build rigor and trust between ML and life-science research communities, moving ML models one step closer to clinical adoption.


【16】Explainability of Complex AI Models with Correlation Impact Ratio
标题:具有相关影响比的复杂人工智能模型的可解释性
链接:https://arxiv.org/abs/2601.06701

作者:Poushali Sengupta,Rabindra Khadka,Sabita Maharjan,Frank Eliassen,Yan Zhang,Shashi Raj Pandey,Pedro G. Lind,Anis Yazidi
摘要:Complex AI systems make better predictions but often lack transparency, limiting trustworthiness, interpretability, and safe deployment. Common post hoc AI explainers, such as LIME, SHAP, HSIC, and SAGE, are model agnostic but are too restricted in one significant regard: they tend to misrank correlated features and require costly perturbations, which do not scale to high dimensional data. We introduce ExCIR (Explainability through Correlation Impact Ratio), a theoretically grounded, simple, and reliable metric for explaining the contribution of input features to model outputs, which remains stable and consistent under noise and sampling variations. We demonstrate that ExCIR captures dependencies arising from correlated features through a lightweight single pass formulation. Experimental evaluations on diverse datasets, including EEG, synthetic vehicular data, Digits, and Cats-Dogs, validate the effectiveness and stability of ExCIR across domains, achieving more interpretable feature explanations than existing methods while remaining computationally efficient. To this end, we further extend ExCIR with an information theoretic foundation that unifies the correlation ratio with Canonical Correlation Analysis under mutual information bounds, enabling multi output and class conditioned explainability at scale.


【17】Object-Centric World Models Meet Monte Carlo Tree Search
标题:以对象为中心的世界模型满足蒙特卡洛树搜索
链接:https://arxiv.org/abs/2601.06604

作者:Rodion Vakhitov,Leonid Ugadiarov,Aleksandr Panov
摘要:In this paper, we introduce ObjectZero, a novel reinforcement learning (RL) algorithm that leverages the power of object-level representations to model dynamic environments more effectively. Unlike traditional approaches that process the world as a single undifferentiated input, our method employs Graph Neural Networks (GNNs) to capture intricate interactions among multiple objects. These objects, which can be manipulated and interact with each other, serve as the foundation for our model's understanding of the environment. We trained the algorithm in a complex setting teeming with diverse, interactive objects, demonstrating its ability to effectively learn and predict object dynamics. Our results highlight that a structured world model operating on object-centric representations can be successfully integrated into a model-based RL algorithm utilizing Monte Carlo Tree Search as a planning module.


【18】A novel RF-enabled Non-Destructive Inspection Method through Machine Learning and Programmable Wireless Environments
标题:通过机器学习和可编程无线环境的新型RF无损检测方法
链接:https://arxiv.org/abs/2601.06512

作者:Stavros Tsimpoukis,Dimitrios Tyrovolas,Sotiris Ioannidis,Maria Kafesaki,Ian F. Akyildiz,George K. Karagiannidis,Christos K. Liaskos
摘要:Contemporary industrial Non-Destructive Inspection (NDI) methods require sensing capabilities that operate in occluded, hazardous, or access restricted environments. Yet, the current visual inspection based on optical cameras offers limited quality of service to that respect. In that sense, novel methods for workpiece inspection, suitable, for smart manufacturing are needed. Programmable Wireless Environments (PWE) could help towards that direction, by redefining the wireless Radio Frequency (RF) wave propagation as a controllable inspector entity. In this work, we propose a novel approach to Non-Destructive Inspection, leveraging an RF sensing pipeline based on RF wavefront encoding for retrieving workpiece-image entries from a designated database. This approach combines PWE-enabled RF wave manipulation with machine learning (ML) tools trained to produce visual outputs for quality inspection. Specifically, we establish correlation relationships between RF wavefronts and target industrial assets, hence yielding a dataset which links wavefronts to their corresponding images in a structured manner. Subsequently, a Generative Adversarial Network (GAN) derives visual representations closely matching the database entries. Our results indicate that the proposed method achieves an SSIM 99.5% matching score in visual outputs, paving the way for next-generation quality control workflows in industry.


【19】StablePDENet: Enhancing Stability of Operator Learning for Solving Differential Equations
标题:StablePDENet:增强求解方程的操作员学习的稳定性
链接:https://arxiv.org/abs/2601.06472

作者:Chutian Huang,Chang Ma,Kaibo Wang,Yang Xiang
摘要 :Learning solution operators for differential equations with neural networks has shown great potential in scientific computing, but ensuring their stability under input perturbations remains a critical challenge. This paper presents a robust self-supervised neural operator framework that enhances stability through adversarial training while preserving accuracy. We formulate operator learning as a min-max optimization problem, where the model is trained against worst-case input perturbations to achieve consistent performance under both normal and adversarial conditions. We demonstrate that our method not only achieves good performance on standard inputs, but also maintains high fidelity under adversarial perturbed inputs. The results highlight the importance of stability-aware training in operator learning and provide a foundation for developing reliable neural PDE solvers in real-world applications, where input noise and uncertainties are inevitable.


【20】FlexAct: Why Learn when you can Pick?
标题:FlexAct:当您可以选择时,为什么要学习?
链接:https://arxiv.org/abs/2601.06441

作者:Ramnath Kumar,Kyle Ritscher,Junmin Judy,Lawrence Liu,Cho-Jui Hsieh
备注:Under Review
摘要:Learning activation functions has emerged as a promising direction in deep learning, allowing networks to adapt activation mechanisms to task-specific demands. In this work, we introduce a novel framework that employs the Gumbel-Softmax trick to enable discrete yet differentiable selection among a predefined set of activation functions during training. Our method dynamically learns the optimal activation function independently of the input, thereby enhancing both predictive accuracy and architectural flexibility. Experiments on synthetic datasets show that our model consistently selects the most suitable activation function, underscoring its effectiveness. These results connect theoretical advances with practical utility, paving the way for more adaptive and modular neural architectures in complex learning scenarios.


【21】Monkey Jump : MoE-Style PEFT for Efficient Multi-Task Learning
标题:Monkey Jump:教育部式PEFT,用于高效的多任务学习
链接:https://arxiv.org/abs/2601.06356

作者:Nusrat Jahan Prottasha,Md Kowsher,Chun-Nam Yu,Chen Chen,Ozlem Garibay
摘要:Mixture-of-experts variants of parameter-efficient fine-tuning enable per-token specialization, but they introduce additional trainable routers and expert parameters, increasing memory usage and training cost. This undermines the core goal of parameter-efficient fine-tuning. We propose Monkey Jump, a method that brings mixture-of-experts-style specialization to parameter-efficient fine-tuning without introducing extra trainable parameters for experts or routers. Instead of adding new adapters as experts, Monkey Jump treats the adapters already present in each Transformer block (such as query, key, value, up, and down projections) as implicit experts and routes tokens among them. Routing is performed using k-means clustering with exponentially moving averaged cluster centers, requiring no gradients and no learned parameters. We theoretically show that token-wise routing increases expressivity and can outperform shared adapters by avoiding cancellation effects. Across multi-task experiments covering 14 text, 14 image, and 19 video benchmarks, Monkey Jump achieves competitive performance with mixture-of-experts-based parameter-efficient fine-tuning methods while using 7 to 29 times fewer trainable parameters, up to 48 percent lower memory consumption, and 1.5 to 2 times faster training. Monkey Jump is architecture-agnostic and can be applied to any adapter-based parameter-efficient fine-tuning method.


【22】Dynamics-inspired Structure Hallucination for Protein-protein Interaction Modeling
标题:蛋白质-蛋白质相互作用建模的动力学结构幻觉
链接:https://arxiv.org/abs/2601.06214

作者:Fang Wu,Stan Z. Li
摘要:Protein-protein interaction (PPI) represents a central challenge within the biology field, and accurately predicting the consequences of mutations in this context is crucial for drug design and protein engineering. Deep learning (DL) has shown promise in forecasting the effects of such mutations, but is hindered by two primary constraints. First, the structures of mutant proteins are often elusive to acquire. Secondly, PPI takes place dynamically, which is rarely integrated into the DL architecture design. To address these obstacles, we present a novel framework named Refine-PPI with two key enhancements. First, we introduce a structure refinement module trained by a mask mutation modeling (MMM) task on available wild-type structures, which is then transferred to produce the inaccessible mutant structures. Second, we employ a new kind of geometric network, called the probability density cloud network (PDC-Net), to capture 3D dynamic variations and encode the atomic uncertainty associated with PPI. Comprehensive experiments on SKEMPI.v2 substantiate the superiority of Refine-PPI over all existing tools for predicting free energy change. These findings underscore the effectiveness of our hallucination strategy and the PDC module in addressing the absence of mutant protein structure and modeling geometric uncertainty.


【23】Towards Public Administration Research Based on Interpretable Machine Learning
标题:基于可解释机器学习的公共管理研究
链接:https://arxiv.org/abs/2601.06205

作者:Zhanyu Liu,Yang Yu
摘要 :Causal relationships play a pivotal role in research within the field of public administration. Ensuring reliable causal inference requires validating the predictability of these relationships, which is a crucial precondition. However, prediction has not garnered adequate attention within the realm of quantitative research in public administration and the broader social sciences. The advent of interpretable machine learning presents a significant opportunity to integrate prediction into quantitative research conducted in public administration. This article delves into the fundamental principles of interpretable machine learning while also examining its current applications in social science research. Building upon this foundation, the article further expounds upon the implementation process of interpretable machine learning, encompassing key aspects such as dataset construction, model training, model evaluation, and model interpretation. Lastly, the article explores the disciplinary value of interpretable machine learning within the field of public administration, highlighting its potential to enhance the generalization of inference, facilitate the selection of optimal explanations for phenomena, stimulate the construction of theoretical hypotheses, and provide a platform for the translation of knowledge. As a complement to traditional causal inference methods, interpretable machine learning ushers in a new era of credibility in quantitative research within the realm of public administration.


【24】EntroLnn: Entropy-Guided Liquid Neural Networks for Operando Refinement of Battery Capacity Fade Trajectories
标题:EntroLnn:用于电池容量衰落轨迹的操作细化的熵引导液体神经网络
链接:https://arxiv.org/abs/2601.06195

作者:Wei Li,Wei Zhang,Qingyu Yan
备注:8 pages, 4 figures
摘要:Battery capacity degradation prediction has long been a central topic in battery health analytics, and most studies focus on state of health (SoH) estimation and end of life (EoL) prediction. This study extends the scope to online refinement of the entire capacity fade trajectory (CFT) through EntroLnn, a framework based on entropy-guided transformable liquid neural networks (LNNs). EntroLnn treats CFT refinement as an integrated process rather than two independent tasks for pointwise SoH and EoL. We introduce entropy-based features derived from online temperature fields, applied for the first time in battery analytics, and combine them with customized LNNs that model temporal battery dynamics effectively. The framework enhances both static and dynamic adaptability of LNNs and achieves robust and generalizable CFT refinement across different batteries and operating conditions. The approach provides a high fidelity battery health model with lightweight computation, achieving mean absolute errors of only 0.004577 for CFT and 18 cycles for EoL prediction. This work establishes a foundation for entropy-informed learning in battery analytics and enables self-adaptive, lightweight, and interpretable battery health prediction in practical battery management systems.


【25】Data-Driven Reduced-Complexity Modeling of Fluid Flows: A Community Challenge
标题:数据驱动的流体流简化复杂性建模:社区挑战
链接:https://arxiv.org/abs/2601.06183

作者:Oliver T. Schmidt,Aaron Towne,Adrian Lozano-Duran,Scott T. M. Dawson,Ricardo Vinuesa
摘要:We introduce a community challenge designed to facilitate direct comparisons between data-driven methods for compression, forecasting, and sensing of complex aerospace flows. The challenge is organized into three tracks that target these complementary capabilities: compression (compact representations for large datasets), forecasting (predicting future flow states from a finite history), and sensing (inferring unmeasured flow states from limited measurements). Across these tracks, multiple challenges span diverse flow datasets and use cases, each emphasizing different model requirements. The challenge is open to anyone, and we invite broad participation to build a comprehensive and balanced picture of what works and where current methods fall short. To support fair comparisons, we provide standardized success metrics, evaluation tools, and baseline implementations, with one classical and one machine-learning baseline per challenge. Final assessments use blind tests on withheld data. We explicitly encourage negative results and careful analyses of limitations. Outcomes will be disseminated through an AIAA Journal Virtual Collection and invited presentations at AIAA conferences.


【26】MixDPO: Modeling Preference Strength for Pluralistic Alignment
标题:MixDPO:多元对齐的偏好强度建模
链接:https://arxiv.org/abs/2601.06180

作者:Saki Imai,Pedram Heydari,Anthony Sicilia,Asteria Kaeberlein,Katherine Atwell,Malihe Alikhani
摘要:Preference based alignment objectives implicitly assume that all human preferences are expressed with equal strength. In practice, however, preference strength varies across individuals and contexts -- a phenomenon established in behavioral economics and discrete choice theory. This mismatch limits the ability of existing objectives to faithfully capture heterogeneous human judgments. Inspired by this literature, we introduce Mixed Logit Direct Preference Optimization (MixDPO), a generalization of Direct Preference Optimization that models variation in preference strength. MixDPO enables alignment objectives to capture heterogeneity in how strongly preferences are expressed across training examples. We evaluate MixDPO on three preference datasets using two open-weight language models. Across datasets, MixDPO improves aggregate alignment performance (+11.2 points on Pythia-2.8B) while preserving subgroup level preferences, with the largest gains appearing in settings with higher inferred preference heterogeneity. MixDPO makes preference heterogeneity explicit through learned strength distributions. We release our code for reproducibility.


【27】Forget Many, Forget Right: Scalable and Precise Concept Unlearning in Diffusion Models
标题:忘记许多,忘记正确:扩散模型中可扩展和精确的概念遗忘
链接:https://arxiv.org/abs/2601.06162

作者:Kaiyuan Deng,Gen Li,Yang Xiao,Bo Hui,Xiaolong Ma
摘要:Text-to-image diffusion models have achieved remarkable progress, yet their use raises copyright and misuse concerns, prompting research into machine unlearning. However, extending multi-concept unlearning to large-scale scenarios remains difficult due to three challenges: (i) conflicting weight updates that hinder unlearning or degrade generation; (ii) imprecise mechanisms that cause collateral damage to similar content; and (iii) reliance on additional data or modules, creating scalability bottlenecks. To address these, we propose Scalable-Precise Concept Unlearning (ScaPre), a unified framework tailored for large-scale unlearning. ScaPre introduces a conflict-aware stable design, integrating spectral trace regularization and geometry alignment to stabilize optimization, suppress conflicts, and preserve global structure. Furthermore, an Informax Decoupler identifies concept-relevant parameters and adaptively reweights updates, strictly confining unlearning to the target subspace. ScaPre yields an efficient closed-form solution without requiring auxiliary data or sub-models. Comprehensive experiments on objects, styles, and explicit content demonstrate that ScaPre effectively removes target concepts while maintaining generation quality. It forgets up to $\times \mathbf{5}$ more concepts than the best baseline within acceptable quality limits, achieving state-of-the-art precision and efficiency for large-scale unlearning.


【28】DeeperBrain: A Neuro-Grounded EEG Foundation Model Towards Universal BCI
标题:DeeperBrain:面向通用BCI的神经接地脑电基础模型
链接:https://arxiv.org/abs/2601.06134

作者:Jiquan Wang,Sha Zhao,Yangxuan Zhou,Yiming Kang,Shijian Li,Gang Pan
备注:Preprint.Under Review
摘要:Electroencephalography (EEG) foundation models hold significant promise for universal Brain-Computer Interfaces (BCIs). However, existing approaches often rely on end-to-end fine-tuning and exhibit limited efficacy under frozen-probing protocols, lacking the intrinsic universality required for broad generalization. This limitation stems from adapting general-purpose sequence architectures that overlook the biophysical and dynamical principles of neural activity. To bridge this gap, we propose DeeperBrain, a neuro-grounded foundation model integrating domain-specific inductive biases into its model design and learning objectives. Architecturally, DeeperBrain incorporates a volume conduction-aware channel encoding to model spatial mixing via 3D geometry, and a neurodynamics-aware temporal encoding capturing slow adaptations using oscillatory and exponential bases. For pretraining, we introduce a dual-objective strategy combining Masked EEG Reconstruction (MER) for local fidelity and Neurodynamics Statistics Prediction (NSP). NSP enforces alignment with macroscopic brain states by predicting interpretable order parameters, including spectral power, functional connectivity, cross-frequency coupling, and dynamic complexity. Extensive experiments demonstrate that DeeperBrain achieves state-of-the-art or highly competitive performance under end-to-end fine-tuning. Crucially, it maintains superior efficacy under a rigorous frozen-probing protocol, verifying that embedding neuroscientific first principles endows learned representations with the intrinsic universality essential for universal BCI. The code will be publicly available.


【29】L2CU: Learning to Complement Unseen Users
标题:L2 CU:学会补充隐形用户
链接:https://arxiv.org/abs/2601.06119

作者:Dileepa Pitawela,Gustavo Carneiro,Hsiang-Ting Chen
备注:Published in IEEE Access (https://ieeexplore.ieee.org/document/11314492)
摘要:Recent research highlights the potential of machine learning models to learn to complement (L2C) human strengths; however, generalizing this capability to unseen users remains a significant challenge. Existing L2C methods oversimplify interaction between human and AI by relying on a single, global user model that neglects individual user variability, leading to suboptimal cooperative performance. Addressing this, we introduce L2CU, a novel L2C framework for human-AI cooperative classification with unseen users. Given sparse and noisy user annotations, L2CU identifies representative annotator profiles capturing distinct labeling patterns. By matching unseen users to these profiles, L2CU leverages profile-specific models to complement the user and achieve superior joint accuracy. We evaluate L2CU on datasets (CIFAR-10N, CIFAR-10H, Fashion-MNIST-H, Chaoyang and AgNews), demonstrating its effectiveness as a model-agnostic solution for improving human-AI cooperative classification.


【30】CBMAS: Cognitive Behavioral Modeling via Activation Steering
标题:CBMAS:通过激活引导进行认知行为建模
链接:https://arxiv.org/abs/2601.06109

作者:Ahmed H. Ismail,Anthony Kuang,Ayo Akinkugbe,Kevin Zhu,Sean O'Brien
备注:Accepted to CogInterp @ NeurIPS 2025. Equal contribution by Ahmed H. Ismail and Anthony Kuang
摘要:Large language models (LLMs) often encode cognitive behaviors unpredictably across prompts, layers, and contexts, making them difficult to diagnose and control. We present CBMAS, a diagnostic framework for continuous activation steering, which extends cognitive bias analysis from discrete before/after interventions to interpretable trajectories. By combining steering vector construction with dense α-sweeps, logit lens-based bias curves, and layer-site sensitivity analysis, our approach can reveal tipping points where small intervention strengths flip model behavior and show how steering effects evolve across layer depth. We argue that these continuous diagnostics offer a bridge between high-level behavioral evaluation and low-level representational dynamics, contributing to the cognitive interpretability of LLMs. Lastly, we provide a CLI and datasets for various cognitive behaviors at the project repository, https://github.com/shimamooo/CBMAS.


【31】Judge Model for Large-scale Multimodality Benchmarks
标题:大规模多模式基准的判断模型
链接:https://arxiv.org/abs/2601.06106

作者:Min-Han Shih,Yu-Hsin Wu,Yu-Wei Chen
摘要:We propose a dedicated multimodal Judge Model designed to provide reliable, explainable evaluation across a diverse suite of tasks. Our benchmark spans text, audio, image, and video modalities, drawing from carefully sampled public datasets with fixed seeds to ensure reproducibility and minimize train test leakage. Instead of simple scoring, our framework aggregates multimodal judgments, analyzes the quality and reasoning consistency of model outputs, and generates diagnostic feedback. We evaluate several MLLMs, including Gemini 2.5, Phi 4, and Qwen 2.5, across 280 multimodal samples and compare judge model assessments with human annotators. Results show strong alignment between the Judge Model and human scores, demonstrating its potential as a scalable, interpretable evaluation pipeline for future multimodal AI research.


【32】The Hessian of tall-skinny networks is easy to invert
标题:瘦高网络中的黑森人很容易颠倒
链接:https://arxiv.org/abs/2601.06096

作者:Ali Rahimi
摘要 :We describe an exact algorithm for solving linear systems $Hx=b$ where $H$ is the Hessian of a deep net. The method computes Hessian-inverse-vector products without storing the Hessian or its inverse in time and storage that scale linearly in the number of layers. Compared to the naive approach of first computing the Hessian, then solving the linear system, which takes storage that's quadratic in the number of parameters and cubically many operations, our Hessian-inverse-vector product method scales roughly like Pearlmutter's algorithm for computing Hessian-vector products.


【33】Leveraging Foundation Models for Calibration-Free c-VEP BCIs
标题:利用基础模型进行免校准c-VEP BCI
链接:https://arxiv.org/abs/2601.06028

作者:Mohammadreza Behboodi,Eli Kinney-Lang,Ali Etemad,Adam Kirton,Hatem Abou-Zeid
备注:8 Pages, 2 figures, Accepted and Presented at the IEEE SMC Conference 2025


【34】Personalized Spiking Neural Networks with Ferroelectric Synapses for EEG Signal Processing
标题:基于铁电突触的个性化Spiking神经网络脑电信号处理
链接:https://arxiv.org/abs/2601.00020

作者:Nikhil Garg,Anxiong Song,Niklas Plessnig,Nathan Savoia,Laura Bégon-Lours


【35】Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment
标题:通过错位来调整:多模式调整的边界感知课程学习
链接:https://arxiv.org/abs/2511.08399

作者:Hua Ye,Hang Ding,Siyuan Chen,Yiyang Jiang,Changyuan Zhang,Xuan Zhang
备注:24 pages, 6 figures, 5 tables. Submitted to NeurIPS 2025


【36】Riesz Representer Fitting under Bregman Divergence: A Unified Framework for Debiased Machine Learning
标题:Bregman Divergence下的Riesz代表体匹配:去偏置机器学习的统一框架
链接:https://arxiv.org/abs/2601.07752

作者:Masahiro Kato


【37】A Framework for Feature Discovery in Intracranial Pressure Monitoring Data Using Neural Network Attention
标题:使用神经网络注意力发现颅压监测数据特征的框架
链接:https://arxiv.org/abs/2601.07691

作者:Jonathan D. Socha,Seyed F. Maroufi,Dipankar Biswas,Richard Um,Aruna S. Rao,Mark G. Luciano
备注:12 pages, 18 figures


【38】Learning About Learning: A Physics Path from Spin Glasses to Artificial Intelligence
标题:学习学习:从旋转眼镜到人工智能的物理之路
链接:https://arxiv.org/abs/2601.07635

作者:Denis D. Caprioti,Matheus Haas,Constantino F. Vasconcelos,Mauricio Girardi-Schappo
备注:18 pages, 11 figures


【39】Machine learning nonequilibrium phase transitions in charge-density wave insulators
标题:机器学习电荷密度波绝缘体的非平衡相转变
链接:https://arxiv.org/abs/2601.07583

作者:Yunhao Fan,Sheng Zhang,Gia-Wei Chern
备注:11 pages, 4 figures


【40】Multi-environment Invariance Learning with Missing Data
标题:缺失数据的多环境不变性学习
链接:https://arxiv.org/abs/2601.07247

作者:Yiran Jia


【41】Local EGOP for Continuous Index Learning
标题:持续指数学习的本地EGOP
链接:https://arxiv.org/abs/2601.07061

作者:Alex Kokot,Anand Hemmady,Vydhourie Thiyageswaran,Marina Meila


【42】Match Made with Matrix Completion: Efficient Learning under Matching Interference
标题:用矩阵完成进行匹配:匹配干扰下的高效学习
链接:https://arxiv.org/abs/2601.06982

作者:Zhiyuan Tang,Wanning Chen,Kan Xu


【43】The Impact of Anisotropic Covariance Structure on the Training Dynamics and Generalization Error of Linear Networks
标题:各向异性协方差结构对线性网络训练动态性和推广误差的影响
链接:https://arxiv.org/abs/2601.06961

作者:Taishi Watanabe,Ryo Karakida,Jun-nosuke Teramae
备注:18 pages


【44】Deep Learning Based Channel Extrapolation for Dual-Band Massive MIMO Systems
标题:基于深度学习的双频段大规模CDMA系统的通道外推
链接:https://arxiv.org/abs/2601.06858

作者:Qikai Xiao,Kehui Li,Binggui Zhou,Shaodan Ma


【45】Dimension-reduced outcome-weighted learning for estimating individualized treatment regimes in observational studies
标题:在观察性研究中估计个体化治疗方案的任务减少的结果加权学习
链接:https://arxiv.org/abs/2601.06782

作者:Sungtaek Son,Eardi Lila,Kwun Chuen Gary Chan
备注:54 pages, 9 figures


【46】On a Gradient Approach to Chebyshev Center Problems with Applications to Function Learning
标题:切比雪夫中心问题的梯度方法及其在函数学习中的应用
链接:https://arxiv.org/abs/2601.06434

作者:Abhinav Raghuvanshi,Mayank Baranwal,Debasish Chatterjee
备注:Accepted to TMLR


【47】Hard Constraint Projection in a Physics Informed Neural Network
标题:物理信息神经网络中的硬约束投影
链接:https://arxiv.org/abs/2601.06244

作者:Miranda J. S. Horne,Peter K. Jimack,Amirul Khan,He Wang
备注:6 pages, 3 figures, Accepted manuscript of the paper presented at ParCFD2024


其他(45篇)

【1】Active Evaluation of General Agents: Problem Definition and Comparison of Baseline Algorithms
标题:通用代理的主动评估:问题定义和基线算法的比较
链接:https://arxiv.org/abs/2601.07651

作者:Marc Lanctot,Kate Larson,Ian Gemp,Michael Kaisers
备注:AAMAS 2026


【2】Controlling Multimodal Conversational Agents with Coverage-Enhanced Latent Actions
标题:使用覆盖增强的潜在动作控制多模式对话代理
链接:https://arxiv.org/abs/2601.07516

作者:Yongqi Li,Hao Lang,Tieyun Qian,Yongbin Li


【3】The Secretary Problem with Predictions and a Chosen Order
标题:预测和选择命令的秘书问题
链接:https://arxiv.org/abs/2601.07482

作者:Helia Karisani,Mohammadreza Daneshvaramoli,Hedyeh Beyhaghi,Mohammad Hajiesmaili,Cameron Musco
备注:Accepted to the International Conference on Innovations in Theoretical Computer Science (ITCS 2026)


【4】Improving Video Question Answering through query-based frame selection
标题:通过基于查询的帧选择改进视频问题回答
链接:https://arxiv.org/abs/2601.07459

作者:Himanshu Patil,Geo Jolly,Ramana Raja Buddala,Ganesh Ramakrishnan,Rohit Saluja


【5】OceanSAR-2: A Universal Feature Extractor for SAR Ocean Observation
标题:OceanSAR-2:一种用于SAR海洋观测的通用特征提取器
链接:https://arxiv.org/abs/2601.07392

作者:Alexandre Tuel,Thomas Kerdreux,Quentin Febvre,Alexis Mouche,Antoine Grouazel,Jean-Renaud Miadana,Antoine Audras,Chen Wang,Bertrand Chapron
备注:accepted at EUSAR 2026


【6】Standardization of Post-Publication Code Verification by Journals is Possible with the Support of the Community
标题:在社区的支持下,期刊出版后代码验证的标准化是可能的
链接:https://arxiv.org/abs/2601.07189

作者:Susana Lopez-Moreno,Eric Dolores-Cuenca,Sangil Kim
备注:10 pages, 1 figure


【7】When Should We Introduce Safety Interventions During Pretraining?
标题:我们什么时候应该在预训练期间引入安全干预措施?
链接:https://arxiv.org/abs/2601.07087

作者:Dylan Sam,Sachin Goyal,Pratyush Maini,Alexander Robey,J. Zico Kolter


【8】Hallucinations Live in Variance
标题:幻觉生活在差异中
链接:https://arxiv.org/abs/2601.07058

作者:Aaron R. Flouro,Shawn P. Chadwick
备注:8 pages, 3 figures


【9】Fine-Tuning vs. RAG for Multi-Hop Question Answering with Novel Knowledge
标题:利用新颖知识进行多跳问题解答的微调与RAG
链接:https://arxiv.org/abs/2601.07054

作者:Zhuoyi Yang,Yurun Song,Iftekhar Ahmed,Ian Harris


【10】A Robust Certified Machine Unlearning Method Under Distribution Shift
标题:分布转移下的鲁棒认证机器去学习方法
链接:https://arxiv.org/abs/2601.06967

作者:Jinduo Guo,Yinzhi Cao


【11】X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests
标题:X-Coder:通过完全合成的任务、解决方案和测试推进竞争性编程
链接:https://arxiv.org/abs/2601.06953

作者:Jie Wu,Haoling Li,Xin Zhang,Jiani Guo,Jane Luo,Steven Liu,Yangyu Huang,Ruihang Chu,Scarlett Li,Yujiu Yang
备注:Project: https://github.com/JieWu02/X-Coder


【12】Forgetting Similar Samples: Can Machine Unlearning Do it Better?
标题:忘记类似样本:机器放弃学习可以做得更好吗?
链接:https://arxiv.org/abs/2601.06938

作者:Heng Xu,Tianqing Zhu,Dayong Ye,Lefeng Zhang,Le Wang,Wanlei Zhou


【13】Tractable Multinomial Logit Contextual Bandits with Non-Linear Utilities
标题:具有非线性效用的易于处理的多项Logit上下文盗贼
链接:https://arxiv.org/abs/2601.06913

作者:Taehyun Hwang,Dahngoon Kim,Min-hwan Oh
备注:Accepted at NeurIPS 2025


【14】Applying Embedding-Based Retrieval to Airbnb Search
标题:将基于嵌入的检索应用于Airbnb搜索
链接:https://arxiv.org/abs/2601.06873

作者:Mustafa Abdool,Soumyadip Banerjee,Moutupsi Paul,Do-kyum Kim,Xioawei Liu,Bin Xu,Tracy Yu,Hui Gao,Karen Ouyang,Huiji Gao,Liwei He,Stephanie Moyerman,Sanjeev Katariya
备注:14 pages, 9 figures


【15】CliffordNet: All You Need is Geometric Algebra
标题:CliffordNet:你需要的只是几何代数
链接:https://arxiv.org/abs/2601.06793

作者:Zhongping Ji
备注:15 pages


【16】Comparative Separation: Evaluating Separation on Comparative Judgment Test Data
标题:比较分离:根据比较判断测试数据评估分离
链接:https://arxiv.org/abs/2601.06761

作者:Xiaoyin Xi,Neeku Capak,Kate Stockwell,Zhe Yu
备注:10 pages, 8 tables, 1 figure


【17】Logic-Driven Semantic Communication for Resilient Multi-Agent Systems
标题:弹性多智能体系统的逻辑驱动语义通信
链接:https://arxiv.org/abs/2601.06733

作者:Tamara Alshammari,Mehdi Bennis


【18】Revisiting Training Scale: An Empirical Study of Token Count, Power Consumption, and Parameter Efficiency
标题:重新审视训练规模:代币计数、功耗和参数效率的实证研究
链接:https://arxiv.org/abs/2601.06649

作者:Joe Dwyer


【19】KASER: Knowledge-Aligned Student Error Simulator for Open-Ended Coding Tasks
标题:KASER:开放式编码任务的知识对齐学生错误模拟器
链接:https://arxiv.org/abs/2601.06633

作者:Zhangqi Duan,Nigel Fernandez,Andrew Lan


【20】Lower Bounds for the Algorithmic Complexity of Learned Indexes
标题:学习指标数学复杂性的下限
链接:https://arxiv.org/abs/2601.06629

作者:Luis Alberto Croquevielle,Roman Sokolovskii,Thomas Heinis


【21】CEDAR: Context Engineering for Agentic Data Science
标题:CEDART:统计数据科学的上下文工程
链接:https://arxiv.org/abs/2601.06606

作者:Rishiraj Saha Roy,Chris Hinze,Luzian Hahn,Fabian Kuech
备注:Accepted at ECIR 2026


【22】Implicit bias as a Gauge correction: Theory and Inverse Design
标题:隐性偏差作为规范修正:理论和反向设计
链接:https://arxiv.org/abs/2601.06597

作者:Nicola Aladrah,Emanuele Ballarin,Matteo Biagetti,Alessio Ansuini,Alberto d'Onofrio,Fabio Anselmi
备注:v1


【23】Hellinger Multimodal Variational Autoencoders
标题:Hellinger多模式变分自动编码器
链接:https://arxiv.org/abs/2601.06572

作者:Huyen Khanh Vo,Isabel Valera


【24】ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
标题:ArenaRL:通过基于锦标赛的相对排名为开放式代理商扩展RL
链接:https://arxiv.org/abs/2601.06487

作者:Qiang Zhang,Boli Chen,Fanrui Zhang,Ruixue Ding,Shihang Wang,Qiuchen Wang,Yinfeng Huang,Haonan Zhang,Rongxiang Zhu,Pengyong Wang,Ailin Ren,Xin Li,Pengjun Xie,Jiawei Liu,Ning Guo,Jingren Zhou,Zheng-Jun Zha


【25】Gecko: An Efficient Neural Architecture Inherently Processing Sequences with Arbitrary Lengths
标题:Gecko:一种有效的神经架构,本质上处理具有任意长度的序列
链接:https://arxiv.org/abs/2601.06463

作者:Xuezhe Ma,Shicheng Wen,Linghao Jin,Bilge Acun,Ruihang Lai,Bohan Hou,Will Lin,Hao Zhang,Songlin Yang,Ryan Lee,Mengxi Wu,Jonathan May,Luke Zettlemoyer,Carole-Jean Wu
备注:13 pages, 5 figure and 3 tables


【26】Physics-Informed Tree Search for High-Dimensional Computational Design
标题:多维计算设计的物理信息树搜索
链接:https://arxiv.org/abs/2601.06444

作者:Suvo Banik,Troy D. Loeffler,Henry Chan,Sukriti Manna,Orcun Yildiz,Tom Peterka,Subramanian Sankaranarayanan


【27】A Fast and Effective Method for Euclidean Anticlustering: The Assignment-Based-Anticlustering Algorithm
标题:一种快速有效的欧几里得反集群方法:基于分配的反集群算法
链接:https://arxiv.org/abs/2601.06351

作者:Philipp Baumann,Olivier Goldschmidt,Dorit S. Hochbaum,Jason Yang


【28】SPINAL -- Scaling-law and Preference Integration in Neural Alignment Layers
标题:SPINAL --神经对齐层中的比例律和偏好集成
链接:https://arxiv.org/abs/2601.06238

作者:Arion Das,Partha Pratim Saha,Amit Dhanda,Vinija Jain,Aman Chadha,Amitava Das


【29】TimeGNN-Augmented Hybrid-Action MARL for Fine-Grained Task Partitioning and Energy-Aware Offloading in MEC
标题:TimeGNN增强的混合动作MARL,用于MEC中的细粒度任务划分和能源感知卸载
链接:https://arxiv.org/abs/2601.06191

作者:Wei Ai,Yun Peng,Yuntao Shou,Tao Meng,Keqin Li


【30】Forget-It-All: Multi-Concept Machine Unlearning via Concept-Aware Neuron Masking
标题:忘掉一切:通过概念感知神经元掩蔽的多概念机器去学习
链接:https://arxiv.org/abs/2601.06163

作者:Kaiyuan Deng,Bo Hui,Gen Li,Jie Ji,Minghai Qin,Geng Yuan,Xiaolong Ma


【31】Is Sanskrit the most token-efficient language? A quantitative study using GPT, Gemini, and SentencePiece
标题:梵语是最有效的语言吗?使用GPT、Gemini和SentencePiece的定量研究
链接:https://arxiv.org/abs/2601.06142

作者:Anshul Kumar
备注:9 pages, 4 figures. Code and dataset available at: https://github.com/anshulkr713/sanskrit-token-efficiency


【32】A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control
标题:可扩展机器人控制的在线扩散策略RL算法综述
链接:https://arxiv.org/abs/2601.06133

作者:Wonhyeok Choi,Minwoo Choi,Jungwan Woo,Kyumin Hwang,Jaeyeul Kim,Sunghoon Im


【33】Latent Space Communication via K-V Cache Alignment
标题:通过K-V缓存对齐实现潜在空间通信
链接:https://arxiv.org/abs/2601.06123

作者:Lucio M. Dery,Zohar Yahav,Henry Prior,Qixuan Feng,Jiajun Shen,Arthur Szlam
备注:15 pages, 6 figures, 4 tables


【34】Australian Bushfire Intelligence with AI-Driven Environmental Analytics
标题:澳大利亚丛林大火情报,具有人工智能驱动的环境分析
链接:https://arxiv.org/abs/2601.06105

作者:Tanvi Jois,Hussain Ahmad,Fatima Noor,Faheem Ullah


【35】The Impact of Post-training on Data Contamination
标题:后训练对数据污染的影响
链接:https://arxiv.org/abs/2601.06103

作者:Muhammed Yusuf Kocyigit,Caglar Yildirim


【36】Dynamic Intelligence Ceilings: Measuring Long-Horizon Limits of Planning and Creativity in Artificial Systems
标题:动态智能天花板:测量人工系统规划和创造力的长期限制
链接:https://arxiv.org/abs/2601.06102

作者:Truong Xuan Khanh,Truong Quynh Hoa
备注:This paper introduces a trajectory-centric evaluation framework for analyzing long-horizon intelligence limits in artificial systems, focusing on developmental behavior, planning, and structural creativity rather than proposing new learning algorithms. 11 pages, 2 figures


【37】Enabling Long FFT Convolutions on Memory-Constrained FPGAs via Chunking
标题:通过分块在内存限制的VGA上启用长快速傅立叶卷积
链接:https://arxiv.org/abs/2601.06065

作者:Peter Wang,Neelesh Gupta,Viktor Prasanna
备注:2 pages, submitted to 2025 HiPC Conference


【38】A Complete Decomposition of Stochastic Differential Equations
标题:随机方程的完全分解
链接:https://arxiv.org/abs/2601.07834

作者:Samuel Duffield


【39】PFT: Phonon Fine-tuning for Machine Learned Interatomic Potentials
标题:PFT:机器学习原子间势的音素微调
链接:https://arxiv.org/abs/2601.07742

作者:Teddy Koker,Abhijeet Gangan,Mit Kotak,Jaime Marian,Tess Smidt


【40】Backward Reconstruction of the Chafee--Infante Equation via Physics-Informed WGAN-GP
标题:通过了解物理知识的WGAN-GP向后重建Chafe-Infante方程
链接:https://arxiv.org/abs/2601.07733

作者:Joseph L. Shomberg
备注:5 pages, 9 figures


【41】Position: Don't be Afraid of Over-Smoothing And Over-Squashing
标题:立场:不要害怕过度平滑和过度挤压
链接:https://arxiv.org/abs/2601.07419

作者:Niklas Kormann,Benjamin Doerr,Johannes F. Lutzeyer
备注:Preprint. Copyright 2026 by the authors


【42】Covariance-Driven Regression Trees: Reducing Overfitting in CART
标题:协方差驱动的回归树:减少CART中的过拟
链接:https://arxiv.org/abs/2601.07281

作者:Likun Zhang,Wei Ma


【43】On Lie Groups Preserving Subspaces of Degenerate Clifford Algebras
标题:退化Clifford代数的保李群子空间
链接:https://arxiv.org/abs/2601.07191

作者:E. R. Filimoshina,D. S. Shirokov


【44】Unity Forests: Improving Interaction Modelling and Interpretability in Random Forests
标题:Unity Forests:改进随机森林中的交互建模和可解释性
链接:https://arxiv.org/abs/2601.07003

作者:Roman Hornung,Alexander Hapfelmeier
备注:33 pages, 12 figures


【45】Physics-informed Gaussian Process Regression in Solving Eigenvalue Problem of Linear Operators
标题:物理信息高斯过程回归在求解线性运算符特征值问题中的应用
链接:https://arxiv.org/abs/2601.06462

作者:Tianming Bai,Jiannan Yang
备注:17 pages, 7 figures


机器翻译由腾讯交互翻译提供,仅供参考

点击“阅读原文”获取带摘要的学术速递

Python社区是高质量的Python/Django开发社区
本文地址:http://www.python88.com/topic/191643