
Machine Learning Academic Digest [4.14]

arXiv Daily Academic Digest • 3 weeks ago • 994 views



cs.LG: 290 papers today


Large models (31 papers)

【1】A Mechanistic Analysis of Looped Reasoning Language Models
Link: https://arxiv.org/abs/2604.11791

Authors: Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Michael M. Bronstein, Xiaowen Dong
Comments: 39 pages, 63 figures
Abstract: Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM's layers in the latent dimension, resulting in looped reasoning language models. Despite promising results, few works have investigated how their internal dynamics differ from those of standard feedforward models. In this paper, we conduct a mechanistic analysis of the latent states in looped language models, focusing in particular on how the stages of inference observed in feedforward models compare to those observed in looped ones. To this end, we analyze cyclic recurrence and show that for many of the studied models each layer in the cycle converges to a distinct fixed point; consequently, the recurrent block follows a consistent cyclic trajectory in the latent space. We provide evidence that as these fixed points are reached, attention-head behavior stabilizes, leading to constant behavior across recurrences. Empirically, we discover that recurrent blocks learn stages of inference that closely mirror those of feedforward models, repeating these stages in depth with each iteration. We study how recurrent block size, input injection, and normalization influence the emergence and stability of these cyclic fixed points. We believe these findings help translate mechanistic insights into practical guidance for architectural design.
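
As a toy illustration of the cyclic fixed-point behavior this abstract describes (not the paper's models or analysis code), the sketch below loops a two-layer scalar block with input injection and records each layer's output per recurrence. The function name `run_looped_block`, the weights, and the tanh layers are all hypothetical choices made so that the block is a contraction and is therefore guaranteed to converge.

```python
import math


def run_looped_block(x_inject, n_loops=60):
    """Loop a toy 2-layer block and return [(layer1_out, layer2_out), ...] per recurrence.

    Hypothetical weights: both layers are contractions (slopes 0.5 and 0.8),
    so the composite map has a unique fixed point.
    """
    h = 0.0
    trace = []
    for _ in range(n_loops):
        h1 = math.tanh(0.5 * h + x_inject)  # layer 1, with input injection
        h2 = math.tanh(-0.8 * h1 + 0.1)     # layer 2
        trace.append((h1, h2))
        h = h2                              # feed the block output back in
    return trace


trace = run_looped_block(0.3)
layer1_final, layer2_final = trace[-1]
# Each layer settles at its own value, so the block traces the same cycle
# (layer 1 value, then layer 2 value) on every further recurrence.
print(layer1_final, layer2_final)
```

In this toy setting each layer reaches a different fixed point, which is the sense in which the looped block follows a repeating trajectory rather than collapsing to a single state.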


【2】Evaluating Cooperation in LLM Social Groups through Elected Leadership
Link: https://arxiv.org/abs/2604.11721

Authors: Ryan Faulkner, Anushka Deshpande, David Guzman Piedrahita, Joel Z. Leibo, Zhijing Jin
Comments: Main text: 11 pages, 4 figures, 4 tables
Abstract: Governing common-pool resources requires agents to develop enduring strategies through cooperation and self-governance to avoid collective failure. While foundation models have shown potential for cooperation in these settings, existing multi-agent research provides little insight into whether structured leadership and election mechanisms can improve collective decision making. The lack of such a critical organizational feature, ubiquitous in human society, is a significant shortcoming of current methods. In this work we aim to directly address whether leadership and elections can support improved social welfare and cooperation through multi-agent simulation with LLMs. We present our open-source framework that simulates leadership through elected personas and candidate-driven agendas and carry out an empirical study of LLMs under controlled governance conditions. Our experiments demonstrate that having elected leadership improves social welfare scores by 55.4% and survival time by 128.6% across a range of high-performing LLMs. Through the construction of an agent social graph, we compute centrality metrics to assess the social influence of leader personas and also analyze rhetorical and cooperative tendencies revealed through a sentiment analysis on leader utterances. This work lays the foundation for further study of election mechanisms in multi-agent systems toward navigating complex social dilemmas.
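
The abstract mentions centrality metrics on an agent social graph without naming which ones. As a minimal sketch, assuming normalized degree centrality on an undirected graph (one common choice, not necessarily the authors'), the computation could look like this. The function name and the agent labels are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as (a, b) pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Divide by the maximum possible number of neighbors, n - 1.
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical interaction graph: the leader talks to everyone, a and b also talk.
edges = [("leader", "a"), ("leader", "b"), ("leader", "c"), ("a", "b")]
print(degree_centrality(edges))
```

A score of 1.0 means the agent interacts with every other agent in the graph.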


【3】Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization
Link: https://arxiv.org/abs/2604.11510

Authors: Jiashu Yao, Heyan Huang, Chuwei Luo, Daiqing Wu, Zeming Liu, Yuhang Guo, Yangyang Kang
Comments: preprint
Abstract: To encourage diverse exploration in reinforcement learning (RL) for large language models (LLMs) without compromising accuracy, we propose Policy Split, a novel paradigm that bifurcates the policy into normal and high-entropy modes with a high-entropy prompt. While sharing model parameters, the two modes undergo collaborative dual-mode entropy regularization tailored to distinct objectives. Specifically, the normal mode optimizes for task correctness, while the high-entropy mode incorporates a preference for exploration, and the two modes learn collaboratively. Extensive experiments demonstrate that our approach consistently outperforms established entropy-guided RL baselines across various model sizes in general and creative tasks. Further analysis reveals that Policy Split facilitates dual-mode exploration, where the high-entropy mode generates behavioral patterns distinct from those of the normal mode, providing unique learning signals.


【4】Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference
Link: https://arxiv.org/abs/2604.11496

Authors: Imanol Miranda, Ander Salaberria, Eneko Agirre, Gorka Azkune
Abstract: Dual-encoder Vision-Language Models (VLMs) such as CLIP are often characterized as bag-of-words systems due to their poor performance on compositional benchmarks. We argue that this limitation may stem less from deficient representations than from the standard inference protocol based on global cosine similarity. First, through controlled diagnostic experiments, we show that explicitly enforcing fine-grained region-segment alignment at inference dramatically improves compositional performance without updating pretrained encoders. We then introduce a lightweight transformer that learns such alignments directly from frozen patch and token embeddings. Comparing against full fine-tuning and prior end-to-end compositional training methods, we find that although these approaches improve in-domain retrieval, their gains do not consistently transfer under distribution shift. In contrast, learning localized alignment over frozen representations matches full fine-tuning on in-domain retrieval while yielding substantial improvements on controlled out-of-domain compositional benchmarks. These results identify global embedding matching as a key bottleneck in dual-encoder VLMs and highlight the importance of alignment mechanisms for robust compositional generalization.


【5】CAGenMol: Condition-Aware Diffusion Language Model for Goal-Directed Molecular Generation
Link: https://arxiv.org/abs/2604.11483

Authors: Yanting Li, Zhuoyang Jiang, Enyan Dai, Lei Wang, Wen-Cai Ye, Li Liu
Abstract: Goal-directed molecular generation requires satisfying heterogeneous constraints such as protein-ligand compatibility and multi-objective drug-like properties, yet existing methods often optimize these constraints in isolation, failing to reconcile conflicting objectives (e.g., affinity vs. safety), and struggle to navigate the non-differentiable chemical space without compromising structural validity. To address these challenges, we propose CAGenMol, a condition-aware discrete diffusion framework over molecular sequences that formulates molecular design as conditional denoising guided by heterogeneous structural and property signals. By coupling discrete diffusion with reinforcement learning, the model aligns the generation trajectory with non-differentiable objectives while preserving chemical validity and diversity. The non-autoregressive nature of the diffusion language model further enables iterative refinement of molecular fragments at inference time. Experiments on structure-conditioned, property-conditioned, and dual-conditioned benchmarks demonstrate consistent improvements over state-of-the-art methods in binding affinity, drug-likeness, and success rate, highlighting the effectiveness of our framework.


【6】Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration
Link: https://arxiv.org/abs/2604.11446

Authors: Zhipeng Chen, Tao Qian, Wayne Xin Zhao, Ji-Rong Wen
Comments: Working in progress
Abstract: Recently, scaling reinforcement learning with verifiable rewards (RLVR) for large language models (LLMs) has emerged as an effective training paradigm for significantly improving model capabilities. However, it requires guiding the model through extensive exploration and learning, and the resulting computational overhead has become a key challenge. To reduce the number of training steps, prior work performs linear extrapolation of model parameters. However, the dynamics of model parameter updates during RLVR training remain insufficiently understood. To further investigate the evolution of LLMs during RLVR training, we conduct empirical experiments and find that the rank-1 subspace of the model does not evolve linearly, and its dominance over the original parameters is further amplified during LoRA training. Based on the above insights, we propose the Nonlinear Extrapolation of low-rank trajectories (NExt), a novel framework that models and extrapolates low-rank parameter trajectories in a nonlinear manner. Concretely, we first train the model using LoRA and extract the rank-1 subspace of parameter differences at multiple training steps, which is then used for the subsequent nonlinear extrapolation. Afterward, we use the extracted rank-1 subspace to train a predictor, which can model the trajectory of parameter updates during RLVR, and then perform the predict-extend process to extrapolate model parameters, achieving the acceleration of RLVR. To further study and understand NExt, we conduct comprehensive experiments that demonstrate the effectiveness and robustness of the method. Our method reduces computational overhead by approximately 37.5% while remaining compatible with a wide range of RLVR algorithms and tasks. We release our code at https://github.com/RUCAIBox/NExt.


【7】The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems
Link: https://arxiv.org/abs/2604.11309

Authors: Yihao Zhang, Kai Wang, Jiangrong Wu, Haolin Wu, Yuxuan Zhou, Zeming Wei, Dongxian Wu, Xun Chen, Jun Sun, Meng Sun
Abstract: Large Language Models (LLMs) face prominent security risks from jailbreaking, a practice that manipulates models to bypass built-in security constraints and generate unethical or unsafe content. Among various jailbreak techniques, multi-turn jailbreak attacks are more covert and persistent than single-turn counterparts, exposing critical vulnerabilities of LLMs. However, existing multi-turn jailbreak methods suffer from two fundamental limitations that affect the actual impact in real-world scenarios: (a) As models become more context-aware, any explicit harmful trigger is increasingly likely to be flagged and blocked; (b) Successful final-step triggers often require finely tuned, model-specific contexts, making such attacks highly context-dependent. To fill this gap, we propose Salami Slicing Risk, which operates by chaining numerous low-risk inputs that individually evade alignment thresholds but cumulatively build up harmful intent to ultimately trigger high-risk behaviors, without heavy reliance on pre-designed contextual structures. Building on this risk, we develop Salami Attack, an automatic framework universally applicable to multiple model types and modalities. Rigorous experiments demonstrate its state-of-the-art performance across diverse models and modalities, achieving over 90% Attack Success Rate on GPT-4o and Gemini, as well as robustness against real-world alignment defenses. We also propose a defense strategy to constrain the Salami Attack by at least 44.8% while achieving a maximum blocking rate of 64.8% against other multi-turn jailbreak attacks. Our findings provide critical insights into the pervasive risks of multi-turn jailbreaking and offer actionable mitigation strategies to enhance LLM security.


【8】Frugal Knowledge Graph Construction with Local LLMs: A Zero-Shot Pipeline, Self-Consistency and Wisdom of Artificial Crowds
Link: https://arxiv.org/abs/2604.11104

Authors: Pierre Jourlin
Comments: Source code and raw results available: https://github.com/jourlin/synsynth (licence Hypocratic)
Abstract: This paper presents an empirical study of a multi-model zero-shot pipeline for knowledge graph construction and exploitation, executed entirely through local inference on consumer-grade hardware. We propose a reproducible evaluation framework integrating two external benchmarks (DocRED, HotpotQA), WebQuestionsSP-style synthetic data, and the RAGAS evaluation framework in an automated pipeline. On 500 document-level relations, our system achieves an F1 of 0.70 ± 0.041 in zero-shot, compared to 0.80 for supervised DREEAM. Text-to-query achieves an accuracy of 0.80 ± 0.06 on 200 samples. Multi-hop reasoning achieves an Exact Match (EM) of 0.46 ± 0.04 on 500 HotpotQA questions, with a RAGAS faithfulness of 0.96 ± 0.04 on 50 samples. Beyond the pipeline, we study diversity mechanisms for difficult multi-hop reasoning. On 181 questions unsolvable at zero temperature, self-consistency (k=5, T=0.7) recovers up to 23% EM with a single Mixture-of-Experts (MoE) model, but the cross-model oracle (3 architectures × 5 samples) reaches 46.4%. We highlight an agreement paradox: strong consensus among samples signals collective hallucination rather than a reliable answer, echoing the work of Moussaïd et al. on the wisdom of crowds. Extending to the full pipeline (500 questions), self-consistency (k=3) raises EM from 0.46 to 0.48 ± 0.04. A confidence-routing cascade mechanism (Phi-4 → GPT-OSS, k=5) achieves an EM of 0.55 ± 0.04, the best result obtained, with 45.4% of questions rerouted. Finally, we show that V3 prompt engineering applied to other models does not reproduce the gains observed with Gemma-4, confirming the specific prompt/model interaction. The entire system runs in ~5 h on a single RTX 3090, without any training, for an estimated carbon footprint of 0.09 kg CO2 eq.
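
For readers unfamiliar with the two aggregation schemes compared in this abstract, here is a minimal sketch of self-consistency (majority vote over k sampled answers) and of an oracle that counts a question as solved if any sample matches the gold answer. The function names and the lowercase/strip normalization are assumptions for illustration; the paper's own matching rules may differ.

```python
from collections import Counter


def _norm(answer):
    # Hypothetical normalization; real exact-match scoring is usually stricter.
    return answer.strip().lower()


def self_consistency(samples):
    """Return (majority answer, agreement ratio) over k sampled answers."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(_norm(s) for s in samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


def oracle_hit(samples, gold):
    """True if any sample matches the gold answer (upper bound on voting)."""
    return _norm(gold) in {_norm(s) for s in samples}


answer, agreement = self_consistency(["Paris", "paris ", "Lyon", "Paris", "Nice"])
print(answer, agreement)
```

Note that the agreement ratio returned here is only a vote share. Per the abstract's agreement paradox, a high value on hard questions should not be read as evidence that the answer is correct.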


【9】CausalGaze: Unveiling Hallucinations via Counterfactual Graph Intervention in Large Language Models
Link: https://arxiv.org/abs/2604.11087

Authors: Linggang Kong, Lei Wu, Yunlong Zhang, Xiaofeng Zhong, Zhen Wang, Yongjie Wang, Yao Pan
Comments: Accepted as ACL2026 Findings
Abstract: Despite the groundbreaking advancements made by large language models (LLMs), hallucination remains a critical bottleneck for their deployment in high-stakes domains. Existing classification-based methods mainly rely on static and passive signals from internal states, which often capture noise and spurious correlations while overlooking the underlying causal mechanisms. To address this limitation, we shift the paradigm from passive observation to active intervention by introducing CausalGaze, a novel hallucination detection framework based on structural causal models (SCMs). CausalGaze models LLMs' internal states as dynamic causal graphs and employs counterfactual interventions to disentangle causal reasoning paths from incidental noise, thereby enhancing model interpretability. Extensive experiments across four datasets and three widely used LLMs demonstrate the effectiveness of CausalGaze, especially achieving over 5.2% improvement in AUROC on the TruthfulQA dataset compared to state-of-the-art baselines.


【10】Flow-Controlled Scheduling for LLM Inference with Provable Stability Guarantees
Link: https://arxiv.org/abs/2604.11001

Authors: Zhuolun Dong, Junyu Cao
Abstract: Large language models (LLMs) have been widely adopted due to their great performance across a wide range of applications. ChatGPT and Gemini now serve hundreds of millions of active users and handle billions of user requests per day, which puts optimizing LLM inference into the spotlight. A key challenge in LLM inference is that decode lengths are unknown. The memory usage for each request grows with generated tokens, which may lead to overflow and cause system instability. To address this concern, we propose a simple flow-control framework that controls the rate at which prompts join the active set. We derive a necessary condition that any stable system must satisfy and establish sufficient conditions under which our algorithm provably achieves stability. Experiments show that, compared to commonly used strategies in practice, our approach achieves higher token and request throughput, lower average and tail latency, and more stable KV cache utilization.
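
The abstract does not spell out the admission rule, so the following is only a toy simulation of the general idea: prompts wait in a queue and join the active set only while current cache usage stays under a headroom threshold, leaving room for active requests to grow. The threshold rule, the `simulate` function, and all parameters are assumptions for illustration and are not the paper's algorithm. Decode lengths are used only by the simulator to end requests, never by the admission decision.

```python
from collections import deque


def simulate(prompts, capacity, headroom=0.5, max_steps=10_000):
    """prompts: list of (prompt_len, decode_len). Returns (steps, peak_cache_usage)."""
    waiting = deque(prompts)
    active = []  # each entry: [tokens_in_cache, tokens_left_to_decode]
    peak = 0
    steps = 0
    while (waiting or active) and steps < max_steps:
        used = sum(tokens for tokens, _ in active)
        # Flow control: admit only while usage stays under the headroom threshold.
        while waiting and used + waiting[0][0] <= headroom * capacity:
            prompt_len, decode_len = waiting.popleft()
            active.append([prompt_len, decode_len])
            used += prompt_len
        # One decode step: every active request appends one token to its cache.
        for request in active:
            request[0] += 1
            request[1] -= 1
        peak = max(peak, sum(tokens for tokens, _ in active))
        active = [r for r in active if r[1] > 0]  # finished requests free memory
        steps += 1
    return steps, peak


steps, peak = simulate([(10, 5)] * 6, capacity=100, headroom=0.5)
print(steps, peak)
```

In this toy rule the headroom is a fixed buffer, so it offers no overflow guarantee for long generations; the paper's contribution is precisely the conditions under which stability can be proven.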


【11】Generating Multiple-Choice Knowledge Questions with Interpretable Difficulty Estimation using Knowledge Graphs and Large Language Models
Link: https://arxiv.org/abs/2604.10748

Authors: Mehmet Can Şakiroğlu, H. Altay Güvenir, Kamer Kaya
Abstract: Generating multiple-choice questions (MCQs) with difficulty estimation remains challenging in automated MCQ-generation systems used in adaptive, AI-assisted education. This study proposes a novel methodology for generating MCQs with difficulty estimation from the input documents by utilizing knowledge graphs (KGs) and large language models (LLMs). Our approach uses an LLM to construct a KG from input documents, from which MCQs are then systematically generated. Each MCQ is generated by selecting a node from the KG as the key, sampling a related triple or quintuple (optionally augmented with an extra triple), and prompting an LLM to generate a corresponding stem from these graph components. Distractors are then selected from the KG. For each MCQ, nine difficulty signals are computed and combined into a unified difficulty score using a data-driven approach. Experimental results demonstrate that our method generates high-quality MCQs whose difficulty estimation is interpretable and aligns with human perceptions. Our approach improves automated MCQ generation by integrating structured knowledge representations with LLMs and a data-driven difficulty estimation model.


【12】Bringing Value Models Back: Generative Critics for Value Modeling in LLM Reinforcement Learning
Link: https://arxiv.org/abs/2604.10701

Authors: Zikang Shan, Han Zhong, Liwei Wang, Li Zhao
Comments: 16 pages including appendix, 4 figures
Abstract: Credit assignment is a central challenge in reinforcement learning (RL). Classical actor-critic methods address this challenge through fine-grained advantage estimation based on a learned value function. However, learned value models are often avoided in modern large language model (LLM) RL because conventional discriminative critics are difficult to train reliably. We revisit value modeling and argue that this difficulty is partly due to limited expressiveness. In particular, representation complexity theory suggests that value functions can be hard to approximate under the one-shot prediction paradigm used by existing value models, and our scaling experiments show that such critics do not improve reliably with scale. Motivated by this observation, we propose Generative Actor-Critic (GenAC), which replaces one-shot scalar value prediction with a generative critic that performs chain-of-thought reasoning before producing a value estimate. We further introduce In-Context Conditioning, which helps the critic remain calibrated to the current actor throughout training. GenAC improves value approximation, ranking reliability, and out-of-distribution generalization, and these gains translate into stronger downstream RL performance than both value-based and value-free baselines. Overall, our results suggest that stronger value modeling is a promising direction for improving credit assignment in LLM reinforcement learning.


【13】Attention Sinks as Internal Signals for Hallucination Detection in Large Language Models
Link: https://arxiv.org/abs/2604.10697

Authors: Jakub Binkowski, Kamil Adamczewski, Tomasz Kajdanowicz
Abstract: Large language models frequently exhibit hallucinations: fluent and confident outputs that are factually incorrect or unsupported by the input context. While recent hallucination detection methods have explored various features derived from attention maps, the underlying mechanisms they exploit remain poorly understood. In this work, we propose SinkProbe, a hallucination detection method grounded in the observation that hallucinations are deeply entangled with attention sinks (tokens that accumulate disproportionate attention mass during generation), indicating a transition from distributed, input-grounded attention to compressed, prior-dominated computation. Importantly, although sink scores are computed solely from attention maps, we find that the classifier preferentially relies on sinks whose associated value vectors have large norms. Moreover, we show that previous methods implicitly depend on attention sinks by establishing their mathematical relationship to sink scores. Our findings yield a novel hallucination detection method grounded in theory that produces state-of-the-art results across popular datasets and LLMs.
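
The abstract says sink scores are computed solely from attention maps but does not give the formula. As a minimal sketch under an assumed definition (the mean attention a key token receives from every query position that can see it under a causal mask), a per-token score for one attention head could be computed as follows. The function name and the example matrix are hypothetical.

```python
def sink_scores(attn):
    """attn[i][j]: attention weight from query i to key j for one head.

    Assumes a causal mask (attn[i][j] == 0 for j > i) and rows that sum to 1.
    Returns, for each key j, the mean attention it receives from queries i >= j.
    """
    n = len(attn)
    scores = []
    for j in range(n):
        received = [attn[i][j] for i in range(j, n)]
        scores.append(sum(received) / len(received))
    return scores


# Hypothetical 3-token attention map in which token 0 absorbs most of the mass.
attn = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.1, 0.1],
]
print(sink_scores(attn))
```

Under this assumed definition, a token whose score is far above the others behaves as a sink. Such scores could then be fed to a classifier, which is the role the abstract assigns to SinkProbe.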


【14】Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents
Link: https://arxiv.org/abs/2604.10674

Authors: Hao Wang, Guozhi Wang, Han Xiao, Yufeng Zhou, Yue Pan, Jichao Wang, Ke Xu, Yafei Wen, Xiaohu Ruan, Xiaoxin Chen, Honggang Qi
Comments: Project page: https://k1xe.github.io/skill-sd/
Abstract: Reinforcement learning (RL) has been widely used to train LLM agents for multi-turn interactive tasks, but its sample efficiency is severely limited by sparse rewards and long horizons. On-policy self-distillation (OPSD) alleviates this by providing dense token-level supervision from a privileged teacher that has access to ground-truth answers. However, such fixed privileged information cannot capture the diverse valid strategies in agent tasks, and naively combining OPSD with RL often leads to training collapse. To address these limitations, we introduce Skill-SD, a framework that turns the agent's own trajectories into dynamic training-only supervision. Completed trajectories are summarized into compact natural language skills that describe successful behaviors, mistakes, and workflows. These skills serve as dynamic privileged information conditioning only the teacher, while the student always acts under the plain task prompt and learns to internalize the guidance through distillation. To stabilize the training, we derive an importance-weighted reverse-KL loss to provide gradient-correct token-level distillation, and dynamically synchronize the teacher with the improving student. Experimental results on agentic benchmarks demonstrate that Skill-SD substantially outperforms the standard RL baseline, improving both vanilla GRPO (+14.0%/+10.9% on AppWorld/Sokoban) and vanilla OPD (+42.1%/+40.6%).


【15】Learning and Enforcing Context-Sensitive Control for LLMs
Link: https://arxiv.org/abs/2604.10667

Authors: Mohammad Albinhassan, Pranava Madhyastha, Mark Law, Alessandra Russo
Comments: ACL 2025 Student Research Workshop
Abstract: Controlling the output of Large Language Models (LLMs) through context-sensitive constraints has emerged as a promising approach to overcome the limitations of Context-Free Grammars (CFGs) in guaranteeing generation validity. However, such constraints typically require manual specification, a significant barrier demanding specialized expertise. We introduce a framework that automatically learns context-sensitive constraints from LLM interactions through a two-phase process: syntactic exploration to gather diverse outputs for constraint learning, followed by constraint exploitation to enforce these learned rules during generation. Experiments demonstrate that our method enables even small LLMs (1B parameters) to learn and generate with perfect constraint adherence, outperforming larger counterparts and state-of-the-art reasoning models. This work represents the first integration of context-sensitive grammar learning with LLM generation, eliminating manual specification while maintaining generation validity.


【16】MoEITS: A Green AI approach for simplifying MoE-LLMs
Link: https://arxiv.org/abs/2604.10603

Authors: Luis Balderas, Miguel Lastra, José M. Benítez
Abstract: Large language models are transforming all areas of academia and industry, attracting the attention of researchers, professionals, and the general public. In the quest for more powerful architectures, Mixture-of-Experts models, inspired by ensemble models, have emerged as one of the most effective approaches. However, this implies a high computational burden for both training and inference. To reduce the computing and memory footprint as well as the energy consumption, simplification methods have arisen as very effective procedures. In this paper, an original algorithm, MoEITS, for MoE-LLM simplification is presented. The algorithm is characterized by a refined simplicity, underpinned by standardized Information Theoretic frameworks. MoEITS is analyzed in depth from theoretical and practical points of view. Its computational complexity is studied. Its performance, in terms of the accuracy of the simplified LLMs and the reduction rate achieved, is assessed through thoroughly designed experiments. This empirical evaluation includes a comparison with state-of-the-art MoE-LLM pruning methods applied to Mixtral 8×7B, Qwen1.5-2.7B, and DeepSeek-V2-Lite. The extensive experimentation conducted demonstrates that MoEITS outperforms state-of-the-art techniques by generating models that are both effective across all benchmarks and computationally efficient. The code implementing the method will be available at https://github.com/luisbalru/MoEITS.


【17】Calibration Collapse Under Sycophancy Fine-Tuning: How Reward Hacking Breaks Uncertainty Quantification in LLMs
Link: https://arxiv.org/abs/2604.10585

Authors: Subramanyam Sahoo
Comments: Accepted at the AISTATS 2026 Workshop on Towards Trustworthy Predictions: Theory and Applications of Calibration for Modern AI. 14 Pages
Abstract: Modern large language models (LLMs) are increasingly fine-tuned via reinforcement learning from human feedback (RLHF) or related reward optimisation schemes. While such procedures improve perceived helpfulness, we investigate whether sycophantic reward signals degrade calibration, a property essential for reliable uncertainty quantification. We compare Qwen3-8B under three regimes: no fine-tuning (base), neutral supervised fine-tuning (SFT) on TriviaQA, and sycophancy-inducing Group Relative Policy Optimisation (GRPO) that rewards agreement with planted wrong answers. Evaluating on 1,000 MMLU items across five subject domains with bootstrap confidence intervals and permutation testing, we find that sycophantic GRPO produces consistent directional calibration degradation (ECE rises by +0.006 relative to the base model and MCE increases by +0.010 relative to neutral SFT), though the effect does not reach statistical significance (p = 0.41) at this training budget. Post-hoc matrix scaling applied to all three models reduces ECE by 40-64% and improves accuracy by 1.5-3.0 percentage points. However, the sycophantic model retains the highest post-scaling ECE relative to the neutral SFT control (0.042 vs. 0.037), suggesting that reward-induced miscalibration leaves a structured residual even after affine correction. These findings establish a methodology for evaluating the calibration impact of reward hacking and motivate calibration-aware training objectives.
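
For reference, the two calibration metrics reported in this abstract can be computed with the standard equal-width binning recipe sketched below: expected calibration error (ECE) is the sample-weighted mean gap between accuracy and confidence per bin, and maximum calibration error (MCE) is the largest such gap. The bin count and function name are assumptions; the paper's exact binning is not stated in the abstract.

```python
def calibration_errors(confidences, correct, n_bins=10):
    """Return (ECE, MCE) using equal-width confidence bins on [0, 1].

    confidences: predicted probability of the chosen answer, one per item.
    correct: 1 if the chosen answer was right, else 0.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, hit))
    total = len(confidences)
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        gap = abs(accuracy - avg_conf)
        ece += (len(members) / total) * gap
        mce = max(mce, gap)
    return ece, mce


ece, mce = calibration_errors([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
print(ece, mce)
```

Because ECE is a weighted average and MCE a maximum, a fine-tuning regime can move one noticeably while barely shifting the other, which is why the abstract reports both.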


【18】IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs
Link: https://arxiv.org/abs/2604.10539

Authors: Yuzhen Mao, Qitong Wang, Martin Ester, Ke Li
Abstract: Key-Value (KV) cache plays a crucial role in accelerating inference in large language models (LLMs) by storing intermediate attention states and avoiding redundant computation during autoregressive generation. However, its memory footprint scales linearly with sequence length, often leading to severe memory bottlenecks on resource-constrained hardware. Prior work has explored offloading KV cache to the CPU while retaining only a subset on the GPU, but these approaches often rely on imprecise token selection and suffer performance degradation in long-generation tasks such as chain-of-thought reasoning. In this paper, we propose a novel KV cache management strategy, IceCache, which integrates semantic token clustering with PagedAttention. By organizing semantically related tokens into contiguous memory regions managed by a hierarchical, dynamically updatable data structure, our method enables more efficient token selection and better utilization of memory bandwidth during CPU-GPU transfers. Experimental results on LongBench show that, with a 256-token budget, IceCache maintains 99% of the original accuracy achieved by the full KV cache model. Moreover, compared to other offloading-based methods, IceCache attains competitive or even superior latency and accuracy while using only 25% of the KV cache token budget, demonstrating its effectiveness in long-sequence scenarios. The code is available on our project website at https://yuzhenmao.github.io/IceCache/.


【19】Latent Instruction Representation Alignment: defending against jailbreaks, backdoors and undesired knowledge in LLMs
Link: https://arxiv.org/abs/2604.10403

Authors: Eric Easley, Sebastian Farquhar
Comments: 33 pages, 6 figures
Abstract: We address jailbreaks, backdoors, and unlearning for large language models (LLMs). Unlike prior work, which trains LLMs based on their actions when given malign instructions, our method specifically trains the model to change how it interprets instructions. Our method, Latent Instruction Representation Alignment (LIRA), greatly improves generalization. We further boost generalization through an internally adversarial training algorithm. Our methods block over 99% of PEZ jailbreak attacks; remove a challenging insecure code backdoor; and achieve optimal forgetting on WMDP cyber with negligible loss of benign capabilities.


【20】New Hybrid Fine-Tuning Paradigm for LLMs: Algorithm Design and Convergence Analysis Framework
Link: https://arxiv.org/abs/2604.09940

Authors: Shaocong Ma, Peiran Yu, Heng Huang
Comments: Accepted by ICLR 2026
Abstract: Fine-tuning Large Language Models (LLMs) typically involves either full fine-tuning, which updates all model parameters, or Parameter-Efficient Fine-Tuning (PEFT), which adjusts a small subset of parameters. However, both approaches have inherent limitations: full fine-tuning is computationally expensive, while PEFT often struggles to learn new knowledge and exhibits suboptimal performance. To overcome these issues, we propose a novel hybrid fine-tuning approach that jointly updates both LLMs and PEFT modules using a combination of zeroth-order and first-order optimization methods. To analyze our new algorithm, we develop a theoretical framework centered on the concept of a hybrid smoothness condition, which accounts for the heterogeneous nature of the optimization landscape in joint LLM and PEFT training. We derive a rigorous convergence analysis of a reshuffling-type SGD algorithm under multiple learning rates and demonstrate its effectiveness through extensive empirical studies across various downstream tasks and model architectures. On the practical side, our results demonstrate consistent performance improvement, making the approach a viable solution for large-scale language model fine-tuning.
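
To make the idea of mixing zeroth-order and first-order updates with separate learning rates concrete, here is a toy sketch on a quadratic objective. It assumes, without confirmation from the abstract, that the large parameter group `w` (standing in for the base model) gets a two-point zeroth-order estimate along a random unit direction, while the small group `a` (standing in for a PEFT module) gets an exact gradient step. The objective, names, and learning rates are all hypothetical.

```python
import math
import random


def loss(w, a):
    # Toy objective with minimum at w = (1, -2), a = 0.5.
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2 + (a - 0.5) ** 2


def hybrid_train(steps=500, lr_zo=0.05, lr_fo=0.1, eps=1e-3, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]  # "base model" parameters: zeroth-order updates
    a = 0.0         # "PEFT" parameter: first-order updates
    for _ in range(steps):
        # Zeroth-order step: probe the loss along a random unit direction u.
        theta = rng.uniform(0.0, 2.0 * math.pi)
        u = (math.cos(theta), math.sin(theta))
        w_plus = [w[i] + eps * u[i] for i in range(2)]
        w_minus = [w[i] - eps * u[i] for i in range(2)]
        slope = (loss(w_plus, a) - loss(w_minus, a)) / (2.0 * eps)
        w = [w[i] - lr_zo * slope * u[i] for i in range(2)]
        # First-order step: analytic gradient of the loss with respect to a.
        a -= lr_fo * 2.0 * (a - 0.5)
    return w, a


w, a = hybrid_train()
print(w, a, loss(w, a))
```

The zeroth-order step needs only two forward evaluations and no backpropagation through `w`, which is the usual motivation for applying it to the memory-heavy parameter group.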


【21】A Tale of Two Temperatures: Simple, Efficient, and Diverse Sampling from Diffusion Language Models
Link: https://arxiv.org/abs/2604.09921

Authors: Theo X. Olausson, Metod Jazbec, Xi Wang, Armando Solar-Lezama, Christian A. Naesseth, Stephan Mandt, Eric Nalisnick
Comments: 24 pages, 11 figures
Abstract: Much work has been done on designing fast and accurate sampling for diffusion language models (dLLMs). However, these efforts have largely focused on the tradeoff between speed and quality of individual samples; how to additionally ensure diversity across samples remains less well understood. In this work, we show that diversity can be increased by using softened, tempered versions of familiar confidence-based remasking heuristics, retaining their computational benefits and offering simple implementations. We motivate this approach by introducing an idealized formal model of fork tokens and studying the impact of remasking on the expected entropy at the forks. Empirically, the proposed tempered heuristics close the exploration gap (pass@k) between existing confidence-based and autoregressive sampling, hence outperforming both when controlling for cost (pass@NFE). We further study how the increase in diversity translates to downstream post-training and test-time compute scaling. Overall, our findings demonstrate that simple, efficient, and diverse sampling from dLLMs is possible.
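
One way to picture a "softened, tempered" confidence heuristic is to replace greedy top-k selection of positions to unmask with sampling from a temperature-scaled softmax over the confidences. The sketch below does this without replacement. It is an assumed form for illustration only; the paper's exact heuristics may differ, and the function name and parameters are hypothetical.

```python
import math
import random


def tempered_select(confidences, k, temperature, rng):
    """Pick k distinct positions to unmask, sampling with softmax(confidence / T).

    As temperature approaches 0 this approaches the greedy top-k choice;
    higher temperatures spread the choice over less confident positions.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    remaining = list(range(len(confidences)))
    chosen = []
    for _ in range(min(k, len(remaining))):
        top = max(confidences[i] for i in remaining)  # subtract max for stability
        weights = [math.exp((confidences[i] - top) / temperature) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


confidences = [0.1, 0.9, 0.5, 0.7]
print(tempered_select(confidences, k=2, temperature=1e-3, rng=random.Random(0)))
print(tempered_select(confidences, k=2, temperature=1.0, rng=random.Random(0)))
```

The selection still costs one pass over the confidence scores per step, which matches the abstract's point that the tempered variants keep the computational benefits of the original heuristics.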


【22】MEMENTO: Teaching LLMs to Manage Their Own Context
标题:MEMENTO:教LLM管理自己的上下文
链接:https://arxiv.org/abs/2604.09852

作者:Vasilis Kontonis,Yuchen Zeng,Shivam Garg,Lingjiao Chen,Hao Tang,Ziyan Wang,Ahmed Awadallah,Eric Horvitz,John Langford,Dimitris Papailiopoulos
摘要:Reasoning models think in long, unstructured streams with no mechanism for compressing or organizing their own intermediate state. We introduce MEMENTO: a method that teaches models to segment reasoning into blocks, compress each block into a memento, i.e., a dense state summary, and reason forward by attending only to mementos, reducing context, KV cache, and compute. To train MEMENTO models, we release OpenMementos, a public dataset of 228K reasoning traces derived from OpenThoughts-v3, segmented and annotated with intermediate summaries. We show that a two-stage SFT recipe on OpenMementos is effective across different model families (Qwen3, Phi-4, Olmo 3) and scales (8B-32B parameters). Trained models maintain strong accuracy on math, science, and coding benchmarks while achieving ${\sim}2.5\times$ peak KV cache reduction. We extend vLLM to support our inference method, achieving ${\sim}1.75\times$ throughput improvement while also enabling us to perform RL and further improve accuracy. Finally, we identify a dual information stream: information from each reasoning block is carried both by the memento text and by the corresponding KV states, which retain implicit information from the original block. Removing this channel drops accuracy by 15 pp on AIME24.


【23】Steered LLM Activations are Non-Surjective
标题:受引导的LLM激活是非满射的
链接:https://arxiv.org/abs/2604.09839

作者:Aayush Mishra,Daniel Khashabi,Anqi Liu
备注:10 pages main text. ICLR 2026 Workshops (Sci4DL, Re-Align)
摘要:Activation steering is a popular white-box control technique that modifies model activations to elicit an abstract change in output behavior. It has also become a standard tool in interpretability (e.g., probing truthfulness, or translating activations into human-readable explanations) and safety research (e.g., studying jailbreakability). However, it is unclear whether steered activation states are realizable by any textual prompt. In this work, we cast this question as a surjectivity problem: for a fixed model, does every steered activation admit a pre-image under the model's natural forward pass? Under practical assumptions, we prove that activation steering pushes the residual stream off the manifold of states reachable from discrete prompts. Almost surely, no prompt can reproduce the same internal behavior induced by steering. We also illustrate this finding empirically across three widely used LLMs. Our results establish a formal separation between white-box steerability and black-box prompting. We therefore caution against interpreting the ease and success of activation steering as evidence of prompt-based interpretability or vulnerability, and argue for evaluation protocols that explicitly decouple white-box and black-box interventions.


【24】Pioneer Agent: Continual Improvement of Small Language Models in Production
标题:先锋代理:持续改进生产中的小型语言模型
链接:https://arxiv.org/abs/2604.09791

作者:Dhruv Atreja,Julia White,Nikhil Nayak,Kelton Zhang,Henrijs Princis,George Hurn-Maloney,Ash Lewis,Urchade Zaratiana
备注:43 pages, 10 figures, 14 tables
摘要:Small language models are attractive for production deployment due to their low cost, fast inference, and ease of specialization. However, adapting them to a specific task remains a challenging engineering loop, driven not by training itself but by surrounding decisions: data curation, failure diagnosis, regression avoidance, and iteration control. We present Pioneer Agent, a closed-loop system that automates this lifecycle. In cold-start mode, given only a natural-language task description, the agent acquires data, constructs evaluation sets, and iteratively trains models by jointly optimizing data, hyperparameters, and learning strategy. In production mode, given a deployed model with labeled failures, it diagnoses error patterns, constructs targeted training data, and retrains under explicit regression constraints. To evaluate this setting, we introduce AdaptFT-Bench, a benchmark of synthetic inference logs with progressively increasing noise, designed to test the full adaptation loop: diagnosis, curriculum synthesis, retraining, and verification. Across eight cold-start benchmarks spanning reasoning, math, code generation, summarization, and classification, Pioneer Agent improves over base models by 1.6-83.8 points. On AdaptFT-Bench, it improves or preserves performance in all seven scenarios, while naive retraining degrades by up to 43 points. On two production-style deployments built from public benchmark tasks, it raises intent classification from 84.9% to 99.3% and Entity F1 from 0.345 to 0.810. Beyond performance gains, the agent often discovers effective training strategies, including chain-of-thought supervision, task-specific optimization, and quality-focused data curation, purely from downstream feedback.


【25】ExecTune: Effective Steering of Black-Box LLMs with Guide Models
标题:ExecTune:通过引导模型有效引导黑盒LLM
链接:https://arxiv.org/abs/2604.09741

作者:Vijay Lingam,Aditya Golatkar,Anwesan Pal,Ben Vo,Narayanan Sadagopan,Alessandro Achille,Jun Huan,Anoop Deoras,Stefano Soatto
备注:Accepted at Lifelong Agents Workshop at ICLR 2026
摘要:For large language models deployed through black-box APIs, recurring inference costs often exceed one-time training costs. This motivates composed agentic systems that amortize expensive reasoning into reusable intermediate representations. We study a broad class of such systems, termed Guide-Core Policies (GCoP), in which a guide model generates a structured strategy that is executed by a black-box core model. This abstraction subsumes base, supervised, and advisor-style approaches, which differ primarily in how the guide is trained. We formalize GCoP under a cost-sensitive utility objective and show that end-to-end performance is governed by guide-averaged executability: the probability that a strategy generated by the guide can be faithfully executed by the core. Our analysis shows that existing GCoP instantiations often fail to optimize executability under deployment constraints, resulting in brittle strategies and inefficient computation. Motivated by these insights, we propose ExecTune, a principled training recipe that combines teacher-guided acceptance sampling, supervised fine-tuning, and structure-aware reinforcement learning to directly optimize syntactic validity, execution success, and cost efficiency. Across mathematical reasoning and code-generation benchmarks, GCoP with ExecTune improves accuracy by up to 9.2% over prior state-of-the-art baselines while reducing inference cost by up to 22.4%. It enables Claude Haiku 3.5 to outperform Sonnet 3.5 on both math and code tasks, and to come within 1.7% absolute accuracy of Sonnet 4 at 38% lower cost. Beyond efficiency, GCoP also supports modular adaptation by updating the guide without retraining the core.


【26】Human-like Working Memory Interference in Large Language Models
标题:大型语言模型中的类人工作记忆干扰
链接:https://arxiv.org/abs/2604.09670

作者:Hua-Dong Xiong,Li Ji-An,Jiaqi Huang,Robert C. Wilson,Kwonjoon Lee,Xue-Xin Wei
摘要:Intelligent systems must maintain and manipulate task-relevant information online to adapt to dynamic environments and changing goals. This capacity, known as working memory, is fundamental to human reasoning and intelligence. Despite having on the order of 100 billion neurons, both biological and artificial systems exhibit limitations in working memory. This raises a key question: why do large language models (LLMs) show such limitations, given that transformers have full access to prior context through attention? We find that although a two-layer transformer can be trained to solve working memory tasks perfectly, a diverse set of pretrained LLMs continues to show working memory limitations. Notably, LLMs reproduce interference signatures observed in humans: performance degrades with increasing memory load and is biased by recency and stimulus statistics. Across models, stronger working memory capacity correlates with broader competence on standard benchmarks, mirroring its link to general intelligence in humans. Yet despite substantial variability in working memory performance, LLMs surprisingly converge on a common computational mechanism. Rather than directly copying the relevant memory item from context, models encode multiple memory items in entangled representations, such that successful recall depends on interference control -- actively suppressing task-irrelevant content to isolate the target for readout. Moreover, a targeted intervention that suppresses stimulus content information improves performance, providing causal support for representational interference. Together, these findings identify representational interference as a core constraint on working memory in pretrained LLMs, suggesting that working-memory limits in biological and artificial systems may reflect a shared computational challenge: selecting task-relevant information under interference.


【27】FlowHijack: A Dynamics-Aware Backdoor Attack on Flow-Matching Vision-Language-Action Models
标题:FlowHijack:对流匹配视觉-语言-动作模型的动态感知后门攻击
链接:https://arxiv.org/abs/2604.09651

作者:Xinyuan An,Tao Luo,Gengyun Peng,Yaobing Wang,Kui Ren,Dongxia Wang
备注:Accepted at CVPR 2026
摘要:Vision-Language-Action (VLA) models are emerging as a cornerstone for robotics, with flow-matching policies like $π_0$ showing great promise in generating smooth, continuous actions. As these models advance, their unique action generation mechanism - the vector field dynamics - presents a critical yet unexplored security vulnerability, particularly backdoor vulnerabilities. Existing backdoor attacks designed for autoregressive discretization VLAs cannot be directly applied to this new continuous dynamics. We introduce FlowHijack, the first backdoor attack framework to systematically target the underlying vector-field dynamics of flow-matching VLAs. Our method combines a novel $τ$-conditioned injection strategy, which manipulates the initial phase of the action generation, with a dynamics mimicry regularizer. Experiments demonstrate that FlowHijack achieves high attack success rates using stealthy, context-aware triggers where prior works failed. Crucially, it preserves benign task performance and, by enforcing kinematic similarity, generates malicious actions that are behaviorally indistinguishable from normal actions. Our findings reveal a significant vulnerability in continuous embodied models, highlighting the urgent need for defenses targeting the model's internal generative dynamics.


【28】Self-Calibrating Language Models via Test-Time Discriminative Distillation
标题:通过测试时判别式蒸馏实现自校准语言模型
链接:https://arxiv.org/abs/2604.09624

作者:Mohamed Rissal Hedna,Jan Strich,Martin Semmann,Chris Biemann
备注:Submitted to ACL March 26
摘要:Large language models (LLMs) are systematically overconfident: they routinely express high certainty on questions they often answer incorrectly. Existing calibration methods either require labeled validation data, degrade under distribution shifts, or incur substantial inference costs. Recent work has shown that LLMs already contain a better-calibrated signal than the one they verbalize: the token probability of "True" when the model is asked "Is this answer correct?" ($P(\text{True})$) consistently outperforms their stated confidence, a gap that is theoretically grounded as generative error is lower-bounded by roughly twice the corresponding discriminative error. We introduce SECL (SElf-Calibrating Language Models), a test-time training (TTT) pipeline that exploits this gap as label-free self-supervision, requiring no labeled data or human supervision. SECL adapts only when the input distribution shifts, training on just 6-26% of the question stream at lower cost than the baseline it distills from. Across four small language models from three model families and four diverse domains, SECL reduces Expected Calibration Error (ECE) by 56-78%, outperforming its own supervision signal and matching or outperforming recent inference-time methods. SECL is the first method to apply TTT to calibration; seven ablations covering signal quality, gating strategy, weight accumulation, loss design, domain ordering, hyperparameter sensitivity, and layer selection confirm that each component is crucial and robust across configurations. Code: https://anonymous.4open.science/r/secl-emnlp26-submission-C890
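
For readers unfamiliar with the metric reported above, the following is a minimal sketch of Expected Calibration Error using the common equal-width binning definition. The abstract does not state the bin count or binning scheme used in the paper, so `n_bins=10` and the function name are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sum over bins of (bin share) * |accuracy - mean confidence|.

    confidences: floats in [0, 1]; correct: 0/1 flags of the same length.
    The bin count is an assumption; the paper's exact setting is not given here.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0.0] * n_bins
    for conf, hit in zip(confidences, correct):
        b = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        counts[b] += 1
        conf_sums[b] += conf
        hit_sums[b] += hit
    ece = 0.0
    for count, conf_sum, hit_sum in zip(counts, conf_sums, hit_sums):
        if count:
            ece += (count / n) * abs(hit_sum / count - conf_sum / count)
    return ece
```

For example, four answers each stated with confidence 0.9, of which three are correct, fall into one bin with accuracy 0.75 and mean confidence 0.9, so the gap for that bin is 0.15.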


【29】LLMs for Text-Based Exploration and Navigation Under Partial Observability
标题:部分可观测性下基于文本的探索和导航LLM
链接:https://arxiv.org/abs/2604.09604

作者:Stephan Sandfuchs,Maximilian Melchert,Jörg Frochte
备注:15 pages; to be published in Springer Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST)
摘要:Exploration and goal-directed navigation in unknown layouts are central to inspection, logistics, and search-and-rescue. We ask whether large language models (LLMs) can function as text-only controllers under partial observability, without code execution, tools, or program synthesis. We introduce a reproducible benchmark with oracle localisation in fixed ASCII gridworlds: each step reveals only a local $5\times5$ window around the agent, and the model must select one of UP/RIGHT/DOWN/LEFT. Nine contemporary LLMs, spanning open and proprietary, dense and Mixture-of-Experts, and instruction-tuned and reasoning-tuned models, are evaluated on two tasks across three layouts of increasing difficulty: Exploration (maximising revealed cells) and Navigation (reaching the goal on the shortest path). Results are assessed with quantitative metrics, including success rate, efficiency measures such as normalised coverage, and path length relative to the oracle, as well as qualitative analysis. Reasoning-tuned models reliably complete navigation across all layouts, yet remain less efficient than oracle paths. Few-shot demonstrations in the prompt chiefly help these reasoning-tuned models by reducing invalid moves and shortening paths, while classic dense instruction models remain inconsistent. We observe characteristic action priors (UP/RIGHT) that can induce looping under partial observability. Overall, training regimen and test-time deliberation predict control ability better than raw parameter count. These findings suggest lightweight hybridisation with classical online planners as a practical route to deployable partial map systems.
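
The observation model described above (a local 5x5 window and four discrete moves) can be sketched as follows. The cell symbols, the padding of out-of-bounds cells with walls, and the function names are assumptions for illustration; the benchmark's actual encoding is not specified in the abstract.

```python
# Row/column offsets for the four actions named in the abstract.
MOVES = {"UP": (-1, 0), "RIGHT": (0, 1), "DOWN": (1, 0), "LEFT": (0, -1)}


def local_window(grid, row, col, size=5, pad="#"):
    """Return the size x size ASCII window centred on (row, col).

    Cells outside the grid are rendered as `pad` (an assumed convention).
    """
    half = size // 2
    lines = []
    for r in range(row - half, row + half + 1):
        chars = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[r])
            chars.append(grid[r][c] if inside else pad)
        lines.append("".join(chars))
    return lines


def apply_move(grid, row, col, move, wall="#"):
    """Apply one action; moves into walls or off the grid leave the agent in place."""
    dr, dc = MOVES[move]
    nr, nc = row + dr, col + dc
    blocked = not (0 <= nr < len(grid) and 0 <= nc < len(grid[nr])) or grid[nr][nc] == wall
    return (row, col) if blocked else (nr, nc)
```

A text-only controller in this setting would receive the joined window lines as its prompt at each step and reply with one of the four action names.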


【30】Generative UI: LLMs are Effective UI Generators
标题:生成式UI:LLM是有效的UI生成器
链接:https://arxiv.org/abs/2604.09577

作者:Yaniv Leviathan,Dani Valevski,Matan Kalman,Danny Lumen,Eyal Segalis,Eyal Molad,Shlomi Pasternak,Vishnu Natchu,Valerie Nygaard,Srinivasan Venkatachary,James Manyika,Yossi Matias
摘要:AI models excel at creating content, but typically render it with static, predefined interfaces. Specifically, the output of LLMs is often a markdown "wall of text". Generative UI is a long-standing promise, where the model generates not just the content, but the interface itself. Until now, Generative UI was not possible in a robust fashion. We demonstrate that when properly prompted and equipped with the right set of tools, a modern LLM can robustly produce high-quality custom UIs for virtually any prompt. When ignoring generation speed, results generated by our implementation are overwhelmingly preferred by humans over the standard LLM markdown output. In fact, while the results generated by our implementation are worse than those crafted by human experts, they are at least comparable in 50% of cases. We show that this ability for robust Generative UI is emergent, with substantial improvements from previous models. We also create and release PAGEN, a novel dataset of expert-crafted results to aid in evaluating Generative UI implementations, as well as the results of our system for future comparisons. Interactive examples can be seen at https://generativeui.github.io


【31】DDO-RM for LLM Preference Optimization: A Minimal Held-Out Benchmark against DPO
标题:用于LLM偏好优化的DDO-RM:针对DPO的最小留出基准
链接:https://arxiv.org/abs/2604.11119

作者:Tiantian Zhang,Jierui Zuo,Wenping Wang
备注:8 pages, 4 figures
摘要:This paper reorganizes the current manuscript around the DPO versus DDO-RM preference-optimization project and focuses on two parts: the algorithmic view and the preliminary held-out benchmark. The benchmark asks a narrow question: even in a minimal pairwise chosen-versus-rejected setting, can a reward-guided decision-distribution update outperform a direct pairwise objective? We compare Direct Preference Optimization (DPO) against DDO-RM on EleutherAI/pythia-410m using HuggingFaceH4/ultrafeedback_binarized, evaluate on the held-out test_prefs split, and report results for seeds 42, 13, and 3407. Algorithmically, DDO-RM treats each prompt as a finite decision problem over candidate responses. Instead of optimizing only a binary chosen-rejected relation, it forms a policy distribution over candidates, centers reward-model scores under that distribution, and distills a reward-guided target distribution back into the policy. In the current public benchmark, DDO-RM improves mean pair accuracy from 0.5238 to 0.5602, AUC from 0.5315 to 0.5382, and mean margin from 0.1377 to 0.5353 relative to DPO. These are encouraging but still preliminary results: the study covers one model family, one dataset, one held-out evaluation split, and three seeds.
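
The three steps named in the abstract (policy distribution over candidates, reward centering under that distribution, distillation of a reward-guided target) can be sketched as below. The abstract does not give the exact form of the target, so the exponential tilting `target ∝ policy * exp(centered / beta)`, the parameter `beta`, and all function names are assumptions and should not be read as the paper's definition.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward_guided_target(policy_logits, rewards, beta=1.0):
    """Return (policy, centered rewards, target distribution) for one prompt.

    Centering subtracts the policy-expected reward from each candidate's score.
    The exponential tilt used to build the target is an assumed form.
    """
    policy = softmax(policy_logits)
    baseline = sum(p * r for p, r in zip(policy, rewards))
    centered = [r - baseline for r in rewards]
    weights = [p * math.exp(a / beta) for p, a in zip(policy, centered)]
    total = sum(weights)
    target = [w / total for w in weights]
    return policy, centered, target


def distillation_loss(policy, target):
    """Cross-entropy of the policy against the (fixed) target distribution."""
    return -sum(q * math.log(p) for q, p in zip(target, policy))
```

With only two candidates (chosen and rejected), this reduces to shifting probability mass toward whichever response the reward model scores higher, which is the minimal setting the benchmark evaluates.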


Graph相关(图学习|图神经网络|图优化等)(11篇)

【1】Learning How Much to Think: Difficulty-Aware Dynamic MoEs for Graph Node Classification
标题:学习思考多少:用于图节点分类的难度感知动态MoE
链接:https://arxiv.org/abs/2604.11473

作者:Jiajun Zhou,Yadong Li,Xuanze Chen,Chen Ma,Chuang Zhao,Shanqing Yu,Qi Xuan
摘要:Mixture-of-Experts (MoE) architectures offer a scalable path for Graph Neural Networks (GNNs) in node classification tasks but typically rely on static and rigid routing strategies that enforce a uniform expert budget or coarse-grained expert toggles on all nodes. This limitation overlooks the varying discriminative difficulty of nodes and leads to under-fitting for hard nodes and redundant computation for easy ones. To resolve this issue, we propose D2MoE, a novel framework that shifts the focus from static expert selection to node-wise expert resource allocation. By using predictive entropy as a real-time proxy for difficulty, D2MoE employs a difficulty-driven top-p routing mechanism to adaptively concentrate expert resources on hard nodes while reducing overhead for easy ones, achieving continuous and fine-grained expert budget scaling for node classification. Experiments on 13 benchmarks demonstrate that D2MoE achieves consistent state-of-the-art performance, surpassing leading baselines by up to 7.92% in accuracy on heterophilous graphs. Notably, on large-scale graphs, it reduces memory consumption by up to 73.07% and training time by 46.53% compared to the best-performing Graph MoE, thereby validating its superior efficiency.
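
A minimal sketch of entropy-driven top-p routing, as described in the abstract above, is given here. The linear mapping from normalized entropy to the cumulative-probability threshold, the bounds `p_min` and `p_max`, and the function names are assumptions; the paper's actual schedule may differ.

```python
import math


def normalized_entropy(class_probs):
    """Predictive entropy scaled to [0, 1]; requires at least two classes."""
    entropy = -sum(p * math.log(p) for p in class_probs if p > 0)
    return min(1.0, entropy / math.log(len(class_probs)))


def route_top_p(gate_probs, class_probs, p_min=0.3, p_max=0.9):
    """Pick the smallest set of experts whose gate mass reaches a per-node threshold.

    The threshold grows with the node's predictive entropy, so uncertain (hard)
    nodes activate more experts. The linear schedule is an assumed form.
    """
    threshold = p_min + (p_max - p_min) * normalized_entropy(class_probs)
    order = sorted(range(len(gate_probs)), key=lambda i: gate_probs[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        chosen.append(i)
        mass += gate_probs[i]
        if mass >= threshold:
            break
    return chosen
```

Under these assumed settings, a node with a confident prediction such as [0.98, 0.01, 0.01] is served by a single expert, while a node with a uniform prediction over three classes activates three of the four experts for gate probabilities [0.6, 0.25, 0.1, 0.05].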


【2】Learning Discrete Diffusion of Graphs via Free-Energy Gradient Flows
标题:通过自由能梯度流学习图的离散扩散
链接:https://arxiv.org/abs/2604.11311

作者:Dario Rancati,Jan Maas,Francesco Locatello
摘要:Diffusion-based models on continuous spaces have seen substantial recent progress through the mathematical framework of gradient flows, leveraging the Wasserstein-2 (${W}_2$) metric via the Jordan-Kinderlehrer-Otto (JKO) scheme. Despite the increasing popularity of diffusion models on discrete spaces using continuous-time Markov chains, a parallel theoretical framework based on gradient flows has remained elusive due to intrinsic challenges in translating the ${W}_2$ distance directly into these settings. In this work, we propose the first computational approach addressing these challenges, leveraging an appropriate metric $W_K$ on the simplex of probability distributions, which enables us to interpret widely used discrete diffusion paths, such as the discrete heat equation, as gradient flows of specific free-energy functionals. Through this theoretical insight, we introduce a novel methodology for learning diffusion dynamics over discrete spaces, which recovers the underlying functional directly by leveraging first-order optimality conditions for the JKO scheme. The resulting method optimizes a simple quadratic loss, trains extremely fast, does not require individual sample trajectories, and only needs a numerical preprocessing computing $W_K$-geodesics. We validate our method through extensive numerical experiments on synthetic data, showing that we can recover the underlying functional for a variety of graph classes.


【3】Unified Graph Prompt Learning via Low-Rank Graph Message Prompting
标题:通过低秩图消息提示实现统一图提示学习
链接:https://arxiv.org/abs/2604.11257

作者:Beibei Wang,Bo Jiang,Ziyan Zhang,Jin Tang
摘要:Graph Data Prompt (GDP), which introduces specific prompts in graph data for efficiently adapting pre-trained GNNs, has become a mainstream approach to the graph fine-tuning problem. However, existing GDPs have each been designed for a distinct graph component (e.g., node features, edge features, edge weights) and thus operate within limited prompt spaces for graph data. To the best of our knowledge, there is still no unified prompter that targets all graph components simultaneously. To address this challenge, in this paper, we first reinterpret a wide range of existing GDPs through the Graph Message Prompt (GMP) paradigm. Based on GMP, we then introduce a novel graph prompt learning approach, termed Low-Rank GMP (LR-GMP), which leverages a low-rank prompt representation to achieve effective and compact graph prompt learning. Unlike traditional GDPs that target distinct graph components separately, LR-GMP concurrently performs prompting on all graph components in a unified manner, thereby achieving significantly superior generalization and robustness on diverse downstream tasks. Extensive experiments on several graph benchmark datasets demonstrate the effectiveness and advantages of our proposed LR-GMP.


【4】Panoptic Pairwise Distortion Graph
标题:全景成对失真图
链接:https://arxiv.org/abs/2604.11004

作者:Muhammad Kamran Janjua,Abdul Wahab,Bahador Rashidi
备注:Accepted to ICLR 2026
摘要:In this work, we introduce a new perspective on comparative image assessment by representing an image pair as a structured composition of its regions. In contrast, existing methods focus on whole image analysis, while implicitly relying on region-level understanding. We extend the intra-image notion of a scene graph to inter-image, and propose a novel task of Distortion Graph (DG). DG treats paired images as a structured topology grounded in regions, and represents dense degradation information such as distortion type, severity, comparison and quality score in a compact interpretable graph structure. To realize the task of learning a distortion graph, we contribute (i) a region-level dataset, PandaSet, (ii) a benchmark suite, PandaBench, with varying region-level difficulty, and (iii) an efficient architecture, Panda, to generate distortion graphs. We demonstrate that PandaBench poses a significant challenge for state-of-the-art multimodal large language models (MLLMs) as they fail to understand region-level degradations even when fed with explicit region cues. We show that training on PandaSet or prompting with DG elicits region-wise distortion understanding, opening a new direction for fine-grained, structured pairwise image assessment.


【5】DIB-OD: Preserving the Invariant Core for Robust Heterogeneous Graph Adaptation via Decoupled Information Bottleneck and Online Distillation
标题:DIB-OD:通过解耦信息瓶颈和在线蒸馏保持鲁棒异构图自适应的不变核
链接:https://arxiv.org/abs/2604.10882

作者:Yang Yan,Qiuyan Wang,Tianjin Huang,Qiudong Yu,Kexin Zhang
摘要:Graph Neural Network pretraining is pivotal for leveraging unlabeled graph data. However, generalizing across heterogeneous domains remains a major challenge due to severe distribution shifts. Existing methods primarily focus on intra-domain patterns, failing to disentangle task-relevant invariant knowledge from domain-specific redundant noise, leading to negative transfer and catastrophic forgetting. To this end, we propose DIB-OD, a novel framework designed to preserve the invariant core for robust heterogeneous graph adaptation through a Decoupled Information Bottleneck and Online Distillation. Our core innovation is the explicit decomposition of representations into orthogonal invariant and redundant subspaces. By utilizing an Information Bottleneck teacher-student distillation mechanism and the Hilbert-Schmidt Independence Criterion, we isolate a stable invariant core that transcends domain boundaries. Furthermore, a self-adaptive semantic regularizer is introduced to protect this core from corruption during target-domain adaptation by dynamically gating label influence based on predictive confidence. Extensive experiments across chemical, biological, and social network domains demonstrate that DIB-OD significantly outperforms state-of-the-art methods, particularly in challenging inter-type domain transfers, showcasing superior generalization and anti-forgetting performance.


【6】Topology-Aware PAC-Bayesian Generalization Analysis for Graph Neural Networks
标题:图神经网络的拓扑感知PAC-Bayesian泛化分析
链接:https://arxiv.org/abs/2604.10553

作者:Xinping Yi
摘要:Graph neural networks have demonstrated excellent applicability to a wide range of domains, including social networks, biological systems, recommendation systems, and wireless communications. Yet a principled theoretical understanding of their generalization behavior remains limited, particularly for graph classification tasks where complex interactions between model parameters and graph structure play a crucial role. Among existing theoretical tools, PAC-Bayesian norm-based generalization bounds provide a flexible and data-dependent framework; however, current results for GNNs often restrict the exploitation of graph structures. In this work, we propose a topology-aware PAC-Bayesian norm-based generalization framework for graph convolutional networks (GCNs) that extends a previously developed framework to graph-structured models. Our approach reformulates the derivation of generalization bounds as a stochastic optimization problem and introduces sensitivity matrices that measure the response of classification outputs with respect to structured weight perturbations. By imposing different structures on sensitivity matrices from both spatial and spectral perspectives, we derive a family of generalization error bounds with graph structures explicitly embedded. Such bounds could recover existing results as special cases, while yielding bounds that are tighter than state-of-the-art PAC-Bayesian bounds for GNNs. Notably, the proposed framework explicitly integrates graph structural properties into the generalization analysis, enabling a unified inspection of GNN generalization behavior from both spatial aggregation and spectral filtering viewpoints.


【7】A Diffusion-Contrastive Graph Neural Network with Virtual Nodes for Wind Nowcasting in Unobserved Regions
标题:用于未观测地区风临近预报的具有虚拟节点的扩散对比图神经网络
链接:https://arxiv.org/abs/2604.10328

作者:Jie Shi,Siamak Mehrkanoon
备注:25 pages, 7 figures
摘要:Accurate weather nowcasting remains one of the central challenges in atmospheric science, with critical implications for climate resilience, energy security, and disaster preparedness. Since it is not feasible to deploy observation stations everywhere, some regions lack dense observational networks, resulting in unreliable short-term wind predictions across those unobserved areas. Here we present a deep graph self-supervised framework that extends nowcasting capability into such unobserved regions without requiring new sensors. Our approach introduces "virtual nodes" into a diffusion and contrastive-based graph neural network, enabling the model to learn wind condition (i.e., speed, direction and gusts) in places with no direct measurements. Using high-temporal resolution weather station data across the Netherlands, we demonstrate that this approach reduces nowcast mean absolute error (MAE) of wind speed, gusts, and direction in unobserved regions by more than 30% - 46% compared with interpolation and regression methods. By enabling localized nowcasts where no measurements exist, this method opens new pathways for renewable energy integration, agricultural planning, and early-warning systems in data-sparse regions.


【8】Virtual Smart Metering in District Heating Networks via Heterogeneous Spatial-Temporal Graph Neural Networks
标题:基于异构时空图神经网络的区域供热网络虚拟智能计量
链接:https://arxiv.org/abs/2604.10166

作者:Keivan Faghih Niresi,Christian Møller Jensen,Carsten Skovmose Kallesøe,Rafael Wisniewski,Olga Fink
摘要:Intelligent operation of thermal energy networks aims to improve energy efficiency, reliability, and operational flexibility through data-driven control, predictive optimization, and early fault detection. Achieving these goals relies on sufficient observability, requiring continuous and well-distributed monitoring of thermal and hydraulic states. However, district heating systems are typically sparsely instrumented and frequently affected by sensor faults, limiting monitoring. Virtual sensing offers a cost-effective means to enhance observability, yet its development and validation remain limited in practice. Existing data-driven methods generally assume dense synchronized data, while analytical models rely on simplified hydraulic and thermal assumptions that may not adequately capture the behavior of heterogeneous network topologies. Consequently, modeling the coupled nonlinear dependencies between pressure, flow, and temperature under realistic operating conditions remains challenging. In addition, the lack of publicly available benchmark datasets hinders systematic comparison of virtual sensing approaches. To address these challenges, we propose a heterogeneous spatial-temporal graph neural network (HSTGNN) for constructing virtual smart heat meters. The model incorporates the functional relationships inherent in district heating networks and employs dedicated branches to learn graph structures and temporal dynamics for flow, temperature, and pressure measurements, thereby enabling the joint modeling of cross-variable and spatial correlations. To support further research, we introduce a controlled laboratory dataset collected at the Aalborg Smart Water Infrastructure Laboratory, providing synchronized high-resolution measurements representative of real operating conditions. Extensive experiments demonstrate that the proposed approach significantly outperforms existing baselines.


【9】A Temporally Augmented Graph Attention Network for Affordance Classification
标题:用于可供性分类的时间增强图注意力网络
链接:https://arxiv.org/abs/2604.10149

作者:Ami Chopra,Supriya Bordoloi,Shyamanta M. Hazarika
备注:6 pages, 6 figures. Accepted at 3rd IEEE Guwahati Subsection Conference (GCON 2026)
摘要:Graph attention networks (GATs) provide one of the best frameworks for learning node representations in relational data; however, existing GAT variants mainly operate on static graphs and rely on implicit temporal aggregation when applied to sequential data. In this paper, we introduce Electroencephalography-temporal Graph Attention Network (EEG-tGAT), a temporally augmented formulation of GATv2 that is tailored for affordance classification from interaction sequences. The proposed model incorporates temporal attention to modulate the contribution of different time segments and temporal dropout to regularize learning across temporally correlated observations. The design reflects the assumption that temporal dimensions in affordance data are not semantically uniform and that discriminative information may be unevenly distributed across time. Experimental results on affordance datasets show that EEG-tGAT achieves improved classification performance compared to GATv2. The observed gains suggest that explicitly encoding temporal importance and enforcing temporal robustness introduce inductive biases that are better aligned with the structure of affordance-driven interaction data. These findings show that modest architectural changes to graph attention models can yield consistent benefits when temporal relationships play a nontrivial role in the task.


【10】Graph-RHO: Critical-path-aware Heterogeneous Graph Network for Long-Horizon Flexible Job-Shop Scheduling
标题:Graph-RHO:面向长时域柔性作业车间调度的关键路径感知异构图网络
链接:https://arxiv.org/abs/2604.10073

作者:Yujie Li,Jiuniu Wang,Mugen Peng,Guangzuo Li,Wenjia Xu
备注:8 pages, 3 figures; Accepted by IJCNN 2026
摘要:Long-horizon Flexible Job-Shop Scheduling (FJSP) presents a formidable combinatorial challenge due to complex, interdependent decisions spanning extended time horizons. While learning-based Rolling Horizon Optimization (RHO) has emerged as a promising paradigm to accelerate solving by identifying and fixing invariant operations, its effectiveness is hindered by the structural complexity of FJSP. Existing methods often fail to capture intricate graph-structured dependencies and ignore the asymmetric costs of prediction errors, in which misclassifying critical-path operations is significantly more detrimental than misclassifying non-critical ones. Furthermore, dynamic shifts in predictive confidence during the rolling process make static pruning thresholds inadequate. To address these limitations, we propose Graph-RHO, a novel critical-path-aware graph-based RHO framework. First, we introduce a topology-aware heterogeneous graph network that encodes subproblems as operation-machine graphs with multi-relational edges, leveraging edge-feature-aware message passing to predict operation stability. Second, we incorporate a critical-path-aware mechanism that injects inductive biases during training to distinguish highly sensitive bottleneck operations from robust ones. Third, we devise an adaptive thresholding strategy that dynamically calibrates decision boundaries based on online uncertainty estimation to align model predictions with the solver's search space. Extensive experiments on standard benchmarks demonstrate that Graph-RHO establishes a new state of the art in solution quality and computational efficiency. Remarkably, it exhibits exceptional zero-shot generalization, reducing solve time by over 30% on large-scale instances (2000 operations) while achieving superior solution quality. Our code is available at https://github.com/IntelliSensing/Graph-RHO.


【11】K-STEMIT: Knowledge-Informed Spatio-Temporal Efficient Multi-Branch Graph Neural Network for Subsurface Stratigraphy Thickness Estimation from Radar Data
标题:K-STEMIT:基于知识引导的时空高效多分支图神经网络,用于从雷达数据估计地下地层厚度
链接:https://arxiv.org/abs/2604.09922

作者:Zesheng Liu,Maryam Rahnemoonfar
摘要:Subsurface stratigraphy contains important spatio-temporal information about accumulation, deformation, and layer formation in polar ice sheets. In particular, variations in internal ice layer thickness provide valuable constraints for snow mass balance estimation and projections of ice sheet change. Although radar sensors can capture these layered structures as depth-resolved radargrams, convolutional neural networks applied directly to radar images are often sensitive to speckle noise and acquisition artifacts. In addition, purely data-driven methods may underuse physical knowledge, leading to unrealistic thickness estimates under spatial or temporal extrapolation. To address these challenges, we develop K-STEMIT, a novel knowledge-informed, efficient, multi-branch spatio-temporal graph neural network that combines a geometric framework for spatial learning with temporal convolution to capture temporal dynamics, and incorporates physical data synchronized from the Model Atmospheric Regional physical weather model. An adaptive feature fusion strategy is employed to dynamically combine features learned from different branches. Extensive experiments have been conducted to compare K-STEMIT against current state-of-the-art methods in both knowledge-informed and non-knowledge-informed settings, as well as other existing methods. Results show that K-STEMIT consistently achieves the highest accuracy while maintaining near-optimal efficiency. Most notably, incorporating adaptive feature fusion and physical priors reduces the root mean-squared error by 21.01% with negligible additional cost compared to its conventional multi-branch variants. Additionally, our proposed K-STEMIT achieves consistently lower per-year relative MAE, enabling reliable, continuous spatiotemporal assessment of snow accumulation variability across large spatial regions.


Transformer(15篇)

【1】Legal2LogicICL: Improving Generalization in Transforming Legal Cases to Logical Formulas via Diverse Few-Shot Learning
标题:Legal2LogicICL:通过多样化的Few-Shot学习提高将法律案例转化为逻辑公式的泛化能力
链接:https://arxiv.org/abs/2604.11699

作者:Jieying Xue,Phuong Minh Nguyen,Ha Thanh Nguyen,May Myo Zin,Ken Satoh
备注:Accepted at ICAIL 2026
摘要:This work aims to improve the generalization of logic-based legal reasoning systems by integrating recent advances in NLP with legal-domain adaptive few-shot learning techniques using LLMs. Existing logic-based legal reasoning pipelines typically rely on fine-tuned models to map natural-language legal cases into logical formulas before forwarding them to a symbolic reasoner. However, such approaches are heavily constrained by the scarcity of high-quality annotated training data. To address this limitation, we propose a novel LLM-based legal reasoning framework that enables effective in-context learning through retrieval-augmented generation. Specifically, we introduce Legal2LogicICL, a few-shot retrieval framework that balances diversity and similarity of exemplars at both the latent semantic representation level and the legal text structure level. In addition, our method explicitly accounts for legal structure by mitigating entity-induced retrieval bias in legal texts, where lengthy and highly specific entity mentions often dominate semantic representations and obscure legally meaningful reasoning patterns. Our Legal2LogicICL constructs informative and robust few-shot demonstrations, leading to accurate and stable logical rule generation without requiring additional training. In addition, we construct a new dataset, named Legal2Proleg, which is annotated with alignments between legal cases and PROLEG logical formulas to support the evaluation of legal semantic parsing. Experimental results on both open-source and proprietary LLMs demonstrate that our approach significantly improves accuracy, stability, and generalization in transforming natural-language legal case descriptions into logical representations, highlighting its effectiveness for interpretable and reliable legal reasoning. Our code is available at https://github.com/yingjie7/Legal2LogicICL.


【2】Layerwise Dynamics for In-Context Classification in Transformers
标题:Transformer中上下文内分类的分层动力学
链接:https://arxiv.org/abs/2604.11613

作者:Patrick Lutz,Themistoklis Haris,Arjun Chandra,Aditya Gangrade,Venkatesh Saligrama
摘要:Transformers can perform in-context classification from a few labeled examples, yet the inference-time algorithm remains opaque. We study multi-class linear classification in the hard no-margin regime and make the computation identifiable by enforcing feature- and label-permutation equivariance at every layer. This enables interpretability while maintaining functional equivalence and yields highly structured weights. From these models we extract an explicit depth-indexed recursion: an end-to-end identified, emergent update rule inside a softmax transformer, to our knowledge the first of its kind. Attention matrices formed from mixed feature-label Gram structure drive coupled updates of training points, labels, and the test probe. The resulting dynamics implement a geometry-driven algorithmic motif, which can provably amplify class separation and yields robust expected class alignment.


【3】Transformers Learn Latent Mixture Models In-Context via Mirror Descent
标题:Transformer通过镜像下降在上下文中学习潜在混合模型
链接:https://arxiv.org/abs/2604.10848

作者:Francesco D'Angelo,Nicolas Flammarion
摘要:Sequence modelling requires determining which past tokens are causally relevant from the context and their importance: a process inherent to the attention layers in transformers, yet whose underlying learned mechanisms remain poorly understood. In this work, we formalize the task of estimating token importance as an in-context learning problem by introducing a framework based on Mixture of Transition Distributions, where a latent variable determines the influence of past tokens on the next. The distribution over this latent variable is parameterized by unobserved mixture weights that transformers must learn in-context. We demonstrate that transformers can implement Mirror Descent to learn these weights from the context. Specifically, we give an explicit construction of a three-layer transformer that exactly implements one step of Mirror Descent and prove that the resulting estimator is a first-order approximation of the Bayes-optimal predictor. Corroborating our construction and its learnability via gradient descent, we empirically show that transformers trained from scratch learn solutions consistent with our theory: their predictive distributions, attention patterns, and learned transition matrix closely match the construction, while deeper models achieve performance comparable to multi-step Mirror Descent.
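
The abstract above describes learning mixture weights with Mirror Descent. As a rough illustration only, the sketch below shows one entropic mirror descent step (the exponentiated-gradient update) on the probability simplex for a negative log-likelihood loss. It is not the paper's three-layer transformer construction; the function names and the per-component likelihood values are hypothetical and chosen purely for illustration.

```python
import math


def nll_grad(weights, likelihoods):
    """Gradient of -log(sum_i w_i * p_i) with respect to the mixture weights."""
    mix = sum(w * p for w, p in zip(weights, likelihoods))
    return [-p / mix for p in likelihoods]


def mirror_descent_step(weights, grad, lr):
    """One entropic mirror descent step: multiplicative update, then renormalise."""
    unnorm = [w * math.exp(-lr * g) for w, g in zip(weights, grad)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical example: three mixture components, uniform initial weights.
# likelihoods[i] is an assumed probability of the observed token under component i.
weights = [1 / 3, 1 / 3, 1 / 3]
likelihoods = [0.1, 0.6, 0.3]

new_weights = mirror_descent_step(weights, nll_grad(weights, likelihoods), lr=0.5)
print(new_weights)  # weight shifts toward the component with the highest likelihood
```

The multiplicative form keeps the weights positive, and the renormalisation keeps them on the simplex, which is why this variant of mirror descent is a natural fit for mixture weights.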


【4】Position-Agnostic Pre-Projection for Transformer Attention: Nonlinear Feature Construction and Content Skip Before Q/K/V
标题:Transformer注意力的位置不可知预投影:Q/K/V之前的非线性特征构建和内容跳过
链接:https://arxiv.org/abs/2604.10791

作者:Chirag Shinde
备注:7 pages, 2 figures, 5 tables. Code: https://github.com/cs-cmyk/preprojection
摘要:We propose two complementary modifications to transformer attention blocks. First, a non-linear pre-projection MLP is inserted between layer norm and Q/K/V projections, constructing richer features in a position-agnostic manner before any positional encoding is applied. Second, a content skip connection routes the pre-projection's features around the attention mechanism, allowing content information to bypass position-aware attention where beneficial. In frozen-probe experiments on Pythia-160M and 410M, the combined approach achieves the strongest results across methods: +40.6% LAMBADA accuracy and -39% perplexity at 160M scale. Learned skip connection weights reveal a consistent pattern across model sizes: later transformer layers activate the content bypass more strongly than earlier layers, suggesting that deeper layers benefit from content information that does not pass through positional attention. All modifications add no K/V cache overhead.


【5】INCRT: An Incremental Transformer That Determines Its Own Architecture
标题:INCRT:一个决定自己架构的增量Transformer
链接:https://arxiv.org/abs/2604.10703

作者:Giansalvo Cirrincione
备注:19 pages, 6 figures, 5 theorems. Submitted to Neurocomputing (Elsevier)
摘要:Transformer architectures are designed by trial and error: the number of attention heads, the depth, and the head size are fixed before training begins, with no mathematical principle to guide the choice. The result is systematic structural redundancy -- between half and four-fifths of all heads in a trained model can be removed without measurable loss -- because the architecture allocates capacity without reference to the actual requirements of the task. This paper introduces INCRT (Incremental Transformer), an architecture that determines its own structure during training. Starting from a single head, INCRT adds one attention head at a time whenever its current configuration is provably insufficient, and prunes heads that have become redundant. Each growth decision is driven by a single, online-computable geometric quantity derived from the task's directional structure, requiring no separate validation phase and no hand-tuned schedule. Two theorems form the theoretical backbone. The first (homeostatic convergence) establishes that the system always reaches a finite stopping configuration that is simultaneously minimal (no redundant heads) and sufficient (no uncaptured directional energy above the threshold). The second (compressed-sensing analogy) provides a geometric upper bound on the number of heads that this configuration can contain, as a function of the spectral complexity of the task. Experiments on SARS-CoV-2 variant classification and SST-2 sentiment analysis confirm both results: the predicted and observed head counts agree within 12% across all benchmarks, and the final architectures match or exceed BERT-base on distribution-specific tasks while using between three and seven times fewer parameters and no pre-training.


【6】Sense Less, Infer More: Agentic Multimodal Transformers for Edge Medical Intelligence
标题:少感知,多推理:面向边缘医疗智能的智能体式多模态Transformer
链接:https://arxiv.org/abs/2604.10404

作者:Chengwei Zhou,Zhaoyan Jia,Haotian Yu,Xuming Chen,Brandon Lee,Christopher Pulliam,Steve Majerus,Massoud Pedram,Gourav Datta
备注:7 figures, 4 tables
摘要:Edge-based multimodal medical monitoring requires models that balance diagnostic accuracy with severe energy constraints. Continuous acquisition of ECG, PPG, EMG, and IMU streams rapidly drains wearable batteries, often limiting operation to under 10 hours, while existing systems overlook the high temporal redundancy present in physiological signals. We introduce Adaptive Multimodal Intelligence (AMI), an end-to-end framework that jointly learns when to sense and how to infer. AMI integrates three components: (1) a lightweight Agentic Modality Controller that uses differentiable Gumbel-Sigmoid gating to dynamically select active sensors based on model confidence and task relevance; (2) a Learned Sigma-Delta Sensing module that applies patch-wise Delta-Sigma operations with learnable thresholds to skip temporally redundant samples; and (3) a Foundation-backed Multimodal Prediction Model built on unimodal foundation encoders and a cross-modal transformer with temporal context, enabling robust fusion even under gated or missing inputs. These components are trained jointly via a multi-objective loss combining classification accuracy, sparsity regularization, cross-modal alignment, and predictive coding. AMI is hardware-aware, supporting dynamic computation graphs and masked operations, leading to real energy and latency savings. Across MHEALTH, HMC Sleep, and WESAD datasets, it reduces sensor usage by 48.8% while improving state-of-the-art accuracy by 1.9% on average.
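
The Gumbel-Sigmoid gating mentioned in the abstract can be pictured with a small forward-pass sketch. This is a generic version of the relaxation, not the AMI controller itself: it adds logistic noise (the difference of two Gumbel samples) to a gate logit and squashes the result with a temperature-scaled sigmoid. The sensor names follow the abstract, but the logit values, temperature, and function name are assumptions made for illustration.

```python
import math
import random


def gumbel_sigmoid(logit, tau, rng):
    """Relaxed Bernoulli sample: sigmoid((logit + logistic noise) / tau)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
    noise = math.log(u) - math.log(1.0 - u)  # Logistic(0, 1) = Gumbel - Gumbel
    return 1.0 / (1.0 + math.exp(-(logit + noise) / tau))


rng = random.Random(0)

# Hypothetical controller logits, one per sensor stream.
sensor_logits = {"ecg": 2.0, "ppg": 0.0, "emg": -2.0, "imu": 1.0}

soft_gates = {name: gumbel_sigmoid(l, tau=0.5, rng=rng) for name, l in sensor_logits.items()}
active = [name for name, g in soft_gates.items() if g > 0.5]  # hard decision at inference
print(soft_gates, active)
```

In a trainable model the soft gate would be computed inside an autodiff framework so that gradients reach the controller; the plain-Python version here only shows the sampling rule.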


【7】Tracing the Thought of a Grandmaster-level Chess-Playing Transformer
标题:追踪特级大师级国际象棋Transformer的思维过程
链接:https://arxiv.org/abs/2604.10158

作者:Rui Lin,Zhenyu Jin,Guancheng Zhou,Xuyang Ge,Wentao Shu,Jiaxing Wu,Junxuan Wang,Zhengfu He,Junping Zhang,Xipeng Qiu
摘要:While modern transformer neural networks achieve grandmaster-level performance in chess and other reasoning tasks, their internal computation process remains largely opaque. Focusing on Leela Chess Zero (LC0), we introduce a sparse decomposition framework to interpret its internal computation by decomposing its MLP and attention modules with sparse replacement layers, which capture the primary computation process of LC0. We conduct a detailed case study showing that these pathways expose rich, interpretable tactical considerations that are empirically verifiable. We further introduce three quantitative metrics and show that LC0 exhibits parallel reasoning behavior consistent with the inductive bias of its policy head architecture. To the best of our knowledge, this is the first work to decompose the internal computation of a transformer on both MLP and attention modules for interpretability. Combining sparse replacement layers and causal interventions in LC0 provides a comprehensive understanding of advanced tactical reasoning, offering critical insights into the underlying mechanisms of superhuman systems. Our code is available at https://github.com/JacklE0niden/Leela-SAEs.


【8】Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
标题:Transformer中的注意力汇(Attention Sink):利用、解释与缓解综述
链接:https://arxiv.org/abs/2604.10098

作者:Zunhai Su,Hengyuan Zhang,Wei Wu,Yifan Zhang,Yaxiu Liu,He Xiao,Qingyao Yang,Yuxuan Sun,Rui Yang,Chao Zhang,Keyu Fan,Weihao Ye,Jing Xiong,Hui Shen,Chaofan Tao,Taiqiang Wu,Zhongwei Wan,Yulei Qian,Yuchen Xie,Ngai Wong
摘要:As the foundational architecture of modern machine learning, Transformers have driven remarkable progress across diverse AI domains. Despite their transformative impact, a persistent challenge across various Transformers is Attention Sink (AS), in which a disproportionate amount of attention is focused on a small subset of specific yet uninformative tokens. AS complicates interpretability, significantly affecting the training and inference dynamics, and exacerbates issues such as hallucinations. In recent years, substantial research has been dedicated to understanding and harnessing AS. However, a comprehensive survey that systematically consolidates AS-related research and offers guidance for future advancements remains lacking. To address this gap, we present the first survey on AS, structured around three key dimensions that define the current research landscape: Fundamental Utilization, Mechanistic Interpretation, and Strategic Mitigation. Our work provides a pivotal contribution by clarifying key concepts and guiding researchers through the evolution and trends of the field. We envision this survey as a definitive resource, empowering researchers and practitioners to effectively manage AS within the current Transformer paradigm, while simultaneously inspiring innovative advancements for the next generation of Transformers. The paper list of this work is available at https://github.com/ZunhaiSu/Awesome-Attention-Sink.


【9】Attention-Guided Dual-Stream Learning for Group Engagement Recognition: Fusing Transformer-Encoded Motion Dynamics with Scene Context via Adaptive Gating
标题:用于群体参与识别的注意力引导双流学习:通过自适应门控将Transformer编码的运动动力学与场景上下文融合
链接:https://arxiv.org/abs/2604.10078

作者:Saniah Kayenat Chowdhury,Muhammad E. H. Chowdhury
摘要:Student engagement is crucial for improving learning outcomes in group activities. Highly engaged students both perform better individually and contribute to overall group success. However, most existing automated engagement recognition methods are designed for online classrooms or estimate engagement at the individual level. Addressing this gap, we propose DualEngage, a novel two-stream framework for group-level engagement recognition from in-classroom videos. It models engagement as a joint function of both individual and group-level behaviors. The primary stream models person-level motion dynamics by detecting and tracking students, extracting dense optical flow with the Recurrent All-Pairs Field Transforms network, encoding temporal motion patterns using a transformer encoder, and finally aggregating per-student representations through attention pooling into a unified representation. The secondary stream captures scene-level spatiotemporal information from the full video clip, leveraging a pretrained three-dimensional Residual Network. The two-stream representations are combined via softmax-gated fusion, which dynamically weights each stream's contribution based on the joint context of both features. DualEngage learns a joint representation of individual actions with overarching group dynamics. We evaluate the proposed approach using fivefold cross-validation on the Classroom Group Engagement Dataset developed by Ocean University of China, achieving an average classification accuracy of 0.9621+/-0.0161 with a macro-averaged F1 of 0.9530+/-0.0204. To understand the contribution of each branch, we further conduct an ablation study comparing single-stream variants against the two-stream model. This work is among the first in classroom engagement recognition to adopt a dual-stream design that explicitly leverages motion cues as an estimator.
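
To make the softmax-gated fusion step concrete, here is a minimal sketch under stated assumptions: both stream embeddings have already been projected to the same dimension, and a single linear gate reads their concatenation and emits two logits. The gate parameters, feature values, and function names are hypothetical; the abstract does not specify DualEngage's actual gate architecture.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(motion, scene, gate_weights, gate_bias):
    """Weight two equal-length feature vectors by a softmax gate on their concatenation."""
    joint = motion + scene  # list concatenation: the gate sees both streams
    logits = [
        sum(w * x for w, x in zip(row, joint)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]
    a_motion, a_scene = softmax(logits)
    fused = [a_motion * m + a_scene * s for m, s in zip(motion, scene)]
    return fused, (a_motion, a_scene)


# Hypothetical 2-dimensional stream features and gate parameters.
motion_feat = [1.0, 0.0]
scene_feat = [0.0, 1.0]
gate_w = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
gate_b = [0.0, 0.0]

fused, stream_weights = gated_fusion(motion_feat, scene_feat, gate_w, gate_b)
print(fused, stream_weights)
```

Because the gate weights come from a softmax, they are non-negative and sum to one, so the fused vector is always a convex combination of the two streams.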


【10】Transformers Learn the Optimal DDPM Denoiser for Multi-Token GMMs
标题:Transformer学习多令牌GMM的最优DDPM去噪器
链接:https://arxiv.org/abs/2604.10074

作者:Hongkang Li,Hancheng Min,Rene Vidal
摘要:Transformer-based diffusion models have demonstrated remarkable performance at generating high-quality samples. However, our theoretical understanding of the reasons for this success remains limited. For instance, existing models are typically trained by minimizing a denoising objective, which is equivalent to fitting the score function of the training data. However, we do not know why transformer-based models can match the score function for denoising, or why gradient-based methods converge to the optimal denoising model despite the non-convex loss landscape. To the best of our knowledge, this paper provides the first convergence analysis for training transformer-based diffusion models. More specifically, we consider the population Denoising Diffusion Probabilistic Model (DDPM) objective for denoising data that follow a multi-token Gaussian mixture distribution. We theoretically quantify the required number of tokens per data point and training iterations for the global convergence towards the Bayes optimal risk of the denoising objective, thereby achieving a desired score matching error. A deeper investigation reveals that the self-attention module of the trained transformer implements a mean denoising mechanism that enables the trained model to approximate the oracle Minimum Mean Squared Error (MMSE) estimator of the injected noise in the diffusion steps. Numerical experiments validate these findings.


【11】Relational Preference Encoding in Looped Transformer Internal States
标题:循环Transformer内部状态中的关系偏好编码
链接:https://arxiv.org/abs/2604.09870

作者:Jan Kirin
摘要:We investigate how looped transformers encode human preference in their internal iteration states. Using Ouro-2.6B-Thinking, a 2.6B-parameter looped transformer with iterative refinement, we extract hidden states from each loop iteration and train lightweight evaluator heads (~5M parameters) to predict human preference on the Anthropic HH-RLHF dataset. Our pairwise evaluator achieves 95.2% test accuracy on 8,552 unseen examples, surpassing a full-batch L-BFGS probe (84.5%) while the base model remains completely frozen. Our central finding is that loop states encode preference predominantly relationally: a linear probe on pairwise differences achieves 84.5%, the best nonlinear independent evaluator reaches only 65% test accuracy, and linear independent classification scores 21.75%, below chance and with inverted polarity. Interpreted precisely, the evaluator functions as a model-internal consistency probe, measuring how stably Ouro's own learned value system organizes its representations rather than how well it predicts noisy human annotations. We also document a systematic architecture search that established a genuine 70% ceiling for independent scoring, and show how the 50% argument-swap protocol required to prevent degenerate pairwise solutions deflated pairwise training metrics by about 31 points at peak, creating the false appearance that pairwise and pointwise evaluators shared the same ceiling. Finally, we show that a cosine learning-rate dead zone at epoch 2 accidentally acted as early stopping, preserving the generalization peak before overfitting degraded test accuracy from 95.2% to 62.4% by epoch 5. Cross-epoch flip-test analysis shows that antisymmetry correlation remains stable while strict sign-flip rate mainly tracks scorer bias. We propose the flip test as a mandatory diagnostic for pairwise preference evaluators.


【12】Improving DNS Exfiltration Detection via Transformer Pretraining
标题:通过Transformer预训练改进DNS数据外泄检测
链接:https://arxiv.org/abs/2604.09849

作者:Miloš Tomić,Aleksa Cvetanović,Predrag Tadić
备注:This is the preprint version of the paper. The final version of the paper has been presented at the TELFOR 2025 conference. The paper has 4 pages, 1 figure and 3 tables
摘要:We study whether in-domain pretraining of a Bidirectional Encoder Representations from Transformers (BERT) model improves subdomain-level detection of exfiltration at low false positive rates. While previous work mostly examines fine-tuned generic Transformers, it does not aim to isolate the effect of pretraining on the downstream task of classification. To address this gap, we develop a controlled pipeline where we freeze operating points on validation and transfer them to the test set, thus enabling clean ablations across different label and pretraining budgets. Our results show significant improvements in the left tail of the Receiver Operating Characteristic (ROC) curve, especially against a randomly initialized baseline. Additionally, within pretrained model variants, increasing the number of pretraining steps helps the most when more labeled data are available for fine-tuning.
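
The idea of freezing an operating point on validation data and transferring it to the test set can be sketched as follows. This is one plausible reading of that protocol, not the authors' pipeline: a score threshold is chosen so that the validation false positive rate stays at or below a target, and the same threshold is then applied unchanged to held-out data. All scores, labels, and function names are made-up illustrations.

```python
def threshold_at_fpr(benign_scores, target_fpr):
    """Pick a threshold so that at most target_fpr of benign scores lie strictly above it."""
    ranked = sorted(benign_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # benign samples permitted above the threshold
    return ranked[allowed]


def rates(scores, labels, threshold):
    """Return (true positive rate, false positive rate) when flagging score > threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg


# Hypothetical validation scores for benign traffic only.
val_benign = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.40, 0.55, 0.90]
threshold = threshold_at_fpr(val_benign, target_fpr=0.1)  # frozen from here on

# Hypothetical test set: label 1 = exfiltration, 0 = benign.
test_scores = [0.95, 0.70, 0.50, 0.60, 0.10, 0.20, 0.30, 0.05]
test_labels = [1, 1, 1, 0, 0, 0, 0, 0]
tpr, fpr = rates(test_scores, test_labels, threshold)
print(threshold, tpr, fpr)
```

Note that the false positive rate measured on the test set need not equal the validation target. Reporting it at a frozen threshold, rather than re-tuning on test data, is what keeps comparisons between model variants clean.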


【13】Sustainable Transformer Neural Network Acceleration with Stochastic Photonic Computing
标题:利用随机光子计算实现可持续的Transformer神经网络加速
链接:https://arxiv.org/abs/2604.09759

作者:S. Afifi,O. Alo,I. Thakkar,S. Pasricha
摘要:Transformers achieve state-of-the-art performance in natural language processing, vision, and scientific computing, but demand high computation and memory. To address these challenges, we present ASTRA, the first silicon-photonic accelerator leveraging stochastic computing for transformers. ASTRA employs novel optical stochastic multipliers and unary/analog homodyne accumulation in a crosstalk-minimal organization to efficiently process dynamic tensor computations. Evaluations show at least 7.6x speedup and 1.3x lower energy overheads compared to state-of-the-art accelerators, highlighting ASTRA's potential for efficient, scalable, and sustainable transformer inference.


【14】PASTA: Vision Transformer Patch Aggregation for Weakly Supervised Target and Anomaly Segmentation
标题:PASTA:用于弱监督目标和异常分割的Vision Transformer补丁聚合
链接:https://arxiv.org/abs/2604.09701

作者:Melanie Neubauer,Elmar Rueckert,Christian Rauch
摘要:Detecting unseen anomalies in unstructured environments presents a critical challenge for industrial and agricultural applications such as material recycling and weeding. Existing perception systems frequently fail to satisfy the strict operational requirements of these domains, specifically real-time processing, pixel-level segmentation precision, and robust accuracy, due to their reliance on exhaustively annotated datasets. To address these limitations, we propose a weakly supervised pipeline for object segmentation and classification using weak image-level supervision called 'Patch Aggregation for Segmentation of Targets and Anomalies' (PASTA). By comparing an observed scene with a nominal reference, PASTA identifies Target and Anomaly objects through distribution analysis in self-supervised Vision Transformer (ViT) feature spaces. Our pipeline utilizes semantic text-prompts via the Segment Anything Model 3 to guide zero-shot object segmentation. Evaluations on a custom steel scrap recycling dataset and a plant dataset demonstrate a 75.8% reduction in training time for our approach relative to domain-specific baselines. While being domain-agnostic, our method achieves superior Target (up to 88.3% IoU) and Anomaly (up to 63.5% IoU) segmentation performance in the industrial and agricultural domains.


【15】Generating Hadamard matrices with transformers
标题:用Transformer生成Hadamard矩阵
链接:https://arxiv.org/abs/2604.11101

作者:Geordie Williamson,Oded Yacobi,Paul Zinn-Justin
摘要:We present a new method for constructing Hadamard matrices that combines transformer neural networks with local search in the PatternBoost framework. Our approach is designed for extremely sparse combinatorial search problems and is particularly effective for Hadamard matrices of Goethals-Seidel type, where Fourier methods permit fast scoring and optimisation. For orders between 100 and 250, it produces large numbers of inequivalent Hadamard matrices, and in harder cases it succeeds where local search from random initialisation fails. The largest example found by our method has order 244. In addition to these new constructions, our experiments reveal that the transformer can discover and exploit useful hidden symmetry in the search space.
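
For readers unfamiliar with the object being searched for: a Hadamard matrix of order n is an n x n matrix with entries +1 or -1 whose rows are mutually orthogonal. The sketch below checks that property and, as a simple source of test inputs, builds matrices with the classical Sylvester doubling construction. Sylvester matrices only cover orders that are powers of two and are unrelated to the Goethals-Seidel type or the PatternBoost search described in the abstract; the function names are illustrative.

```python
def sylvester_hadamard(k):
    """Return the 2**k x 2**k Sylvester Hadamard matrix as a list of rows."""
    h = [[1]]
    for _ in range(k):
        # Doubling step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def is_hadamard(m):
    """True if m is square, has only +1/-1 entries, and its rows are pairwise orthogonal."""
    n = len(m)
    if any(len(row) != n or any(x not in (1, -1) for x in row) for row in m):
        return False
    for i in range(n):
        for j in range(i + 1, n):
            if sum(a * b for a, b in zip(m[i], m[j])) != 0:
                return False
    return True


h8 = sylvester_hadamard(3)
print(is_hadamard(h8))  # True
```

A verifier of this kind is cheap compared with the search itself, which is part of why constructing Hadamard matrices is a convenient target for generate-and-check methods.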


GAN|对抗|攻击|生成相关(10篇)

【1】RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
标题:RationalRewards:推理式奖励在训练和测试阶段扩展视觉生成
链接:https://arxiv.org/abs/2604.11626

作者:Haozhe Wang,Cong Wei,Weiming Ren,Jiaming Liu,Fangzhen Lin,Wenhu Chen
备注:Project Page: https://tiger-ai-lab.github.io/RationalRewards/ ; Code, Dataset, Models are released
摘要:Most reward models for visual generation reduce rich human judgments to a single unexplained score, discarding the reasoning that underlies preference. We show that teaching reward models to produce explicit, multi-dimensional critiques before scoring transforms them from passive evaluators into active optimization tools, improving generators in two complementary ways: at training time, structured rationales provide interpretable, fine-grained rewards for reinforcement learning; at test time, a Generate-Critique-Refine loop turns critiques into targeted prompt revisions that improve outputs without any parameter updates. To train such a reward model without costly rationale annotations, we introduce Preference-Anchored Rationalization (PARROT), a principled framework that recovers high-quality rationales from readily available preference data through anchored generation, consistency filtering, and distillation. The resulting model, RationalRewards (8B), achieves state-of-the-art preference prediction among open-source reward models, competitive with Gemini-2.5-Pro, while using 10-20x less training data than comparable baselines. As an RL reward, it consistently improves text-to-image and image-editing generators beyond scalar alternatives. Most strikingly, its test-time critique-and-refine loop matches or exceeds RL-based fine-tuning on several benchmarks, suggesting that structured reasoning can unlock latent capabilities in existing generators that suboptimal prompts fail to elicit.


【2】Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo
标题:Synthius-Mem:受大脑启发的抗幻觉人格记忆在LoCoMo上实现94.4%的记忆准确性和99.6%的对抗稳健性
链接:https://arxiv.org/abs/2604.11563

作者:Artem Gadzhiev,Andrew Kislov
摘要:Providing AI agents with reliable long-term memory that does not hallucinate remains an open problem. Current approaches to memory for LLM agents -- sliding windows, summarization, embedding-based RAG, and flat fact extraction -- each reduce token cost but introduce catastrophic information loss, semantic drift, or uncontrolled hallucination about the user. The structural reason is architectural: every published memory system on the LoCoMo benchmark treats conversation as a retrieval problem over raw or lightly summarized dialogue segments, and none reports adversarial robustness, the ability to refuse questions about facts the user never disclosed. We present Synthius-Mem, a brain-inspired structured persona memory system that takes a fundamentally different approach. Instead of retrieving what was said, Synthius-Mem extracts what is known about the person: a full persona extraction pipeline decomposes conversations into six cognitive domains (biography, experiences, preferences, social circle, work, psychometrics), consolidates and deduplicates per domain, and retrieves structured facts via CategoryRAG at 21.79 ms latency. On the LoCoMo benchmark (ACL 2024, 10 conversations, 1,813 questions), Synthius-Mem achieves 94.37% accuracy, exceeding all published systems including MemMachine (91.69%, adversarial score is not reported) and human performance (87.9 F1). Core memory fact accuracy reaches 98.64%. Adversarial robustness, the hallucination resistance metric that no competing system reports, reaches 99.55%. Synthius-Mem reduces token consumption by ~5x compared to full-context replay while achieving higher accuracy. Synthius-Mem achieves state-of-the-art results on LoCoMo and is, to our knowledge, the only persona memory system that both exceeds human-level performance and reports adversarial robustness.


【3】Continuous Adversarial Flow Models
Title: Continuous Adversarial Flow Models
Link: https://arxiv.org/abs/2604.11521

Authors: Shanchuan Lin, Ceyuan Yang, Zhijie Lin, Hao Chen, Haoqi Fan
Abstract: We propose continuous adversarial flow models, a type of continuous-time flow model trained with an adversarial objective. Unlike flow matching, which uses a fixed mean-squared-error criterion, our approach introduces a learned discriminator to guide training. This change in objective induces a different generalized distribution, which empirically produces samples that are better aligned with the target data distribution. Our method is primarily proposed for post-training existing flow-matching models, although it can also train models from scratch. On the ImageNet 256px generation task, our post-training substantially improves the guidance-free FID of latent-space SiT from 8.26 to 3.63 and of pixel-space JiT from 7.17 to 3.57. It also improves guided generation, reducing FID from 2.06 to 1.53 for SiT and from 1.86 to 1.80 for JiT. We further evaluate our approach on text-to-image generation, where it achieves improved results on both the GenEval and DPG benchmarks.


【4】Active Bayesian Inference for Robust Control under Sensor False Data Injection Attacks
Title: Active Bayesian Inference for Robust Control under Sensor False Data Injection Attacks
Link: https://arxiv.org/abs/2604.11410

Authors: Axel Andersson, György Dán
Comments: 8 pages, 4 figures. This work has been submitted to the IEEE for possible publication
Abstract: We present a framework for bridging the gap between sensor attack detection and recovery in cyber-physical systems. The proposed framework models modern-day, complex perception pipelines as bipartite graphs, which, combined with anomaly detector alerts, define a Bayesian network for inferring compromised sensors. An active probing strategy exploits system nonlinearities to maximize distinguishability between attack hypotheses, while compromised sensors are selectively disabled to maintain reliable state estimation. We propose a threshold-based probing strategy and show its effectiveness via a simplified partially observable Markov decision process (POMDP) formulation. Experiments on an inverted pendulum under single and multi-sensor attacks show that our method significantly outperforms outlier-robust and prediction-based baselines, especially under prolonged attacks.


【5】Robust Adversarial Policy Optimization Under Dynamics Uncertainty
Title: Robust Adversarial Policy Optimization Under Dynamics Uncertainty
Link: https://arxiv.org/abs/2604.10974

Authors: Mintae Kim, Koushil Sreenath
Comments: 33 pages, 8 figures
Abstract: Reinforcement learning (RL) policies often fail under dynamics that differ from training, a gap not fully addressed by domain randomization or existing adversarial RL methods. Distributionally robust RL provides a formal remedy but still relies on surrogate adversaries to approximate intractable primal problems, leaving blind spots that potentially cause instability and over-conservatism. We propose a dual formulation that directly exposes the robustness-performance trade-off. At the trajectory level, a temperature parameter from the dual problem is approximated with an adversarial network, yielding efficient and stable worst-case rollouts within a divergence bound. At the model level, we employ Boltzmann reweighting over dynamics ensembles, focusing on environments that are more adverse to the current policy rather than sampling uniformly. The two components act independently and complement each other: trajectory-level steering ensures robust rollouts, while model-level sampling provides policy-sensitive coverage of adverse dynamics. The resulting framework, robust adversarial policy optimization (RAPO), outperforms robust RL baselines, improving resilience to uncertainty and generalization to out-of-distribution dynamics while maintaining dual tractability.
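The model-level Boltzmann reweighting mentioned in this abstract can be illustrated with a minimal sketch. It assumes each ensemble member is scored by the current policy's average return under it; the function name `boltzmann_weights`, the `temperature` parameter, and the example returns are hypothetical and are not taken from the paper.

```python
import math


def boltzmann_weights(returns, temperature=1.0):
    """Return sampling probabilities over ensemble members.

    Members under which the policy earns a lower return (more adverse
    dynamics) receive a higher probability. `temperature` is an assumed knob.
    """
    # Negate returns so that adverse (low-return) models get larger logits.
    logits = [-r / temperature for r in returns]
    # Subtract the max logit for numerical stability before exponentiating.
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical average returns of the current policy under three dynamics models.
ensemble_returns = [10.0, 4.0, 7.0]
weights = boltzmann_weights(ensemble_returns, temperature=2.0)
```

A large temperature recovers nearly uniform sampling, while a small one concentrates sampling on the most adverse model.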


【6】QShield: Securing Neural Networks Against Adversarial Attacks using Quantum Circuits
Title: QShield: Securing Neural Networks Against Adversarial Attacks using Quantum Circuits
Link: https://arxiv.org/abs/2604.10933

Authors: Navid Azimi, Aditya Prakash, Yao Wang, Li Xiong
Abstract: Deep neural networks remain highly vulnerable to adversarial perturbations, limiting their reliability in security- and safety-critical applications. To address this challenge, we introduce QShield, a modular hybrid quantum-classical neural network (HQCNN) architecture designed to enhance the adversarial robustness of classical deep learning models. QShield integrates a conventional convolutional neural network (CNN) backbone for feature extraction with a quantum processing module that encodes the extracted features into quantum states, applies structured entanglement operations under realistic noise models, and outputs a hybrid prediction through a dynamically weighted fusion mechanism implemented via a lightweight multilayer perceptron (MLP). We systematically evaluate both classical and hybrid quantum-classical models on the MNIST, OrganAMNIST, and CIFAR-10 datasets, using a comprehensive set of robustness, efficiency, and computational performance metrics. Our results demonstrate that classical models are highly vulnerable to adversarial attacks, whereas the proposed hybrid models with entanglement patterns maintain high predictive accuracy while substantially reducing attack success rates across a wide range of adversarial attacks. Furthermore, the proposed hybrid architecture significantly increases the computational cost required to generate adversarial examples, thereby introducing an additional layer of defense. These findings indicate that the proposed modular hybrid architecture achieves a practical balance between predictive accuracy and adversarial robustness, positioning it as a promising approach for secure and reliable machine learning in sensitive and safety-critical applications.


【7】FedRio: Personalized Federated Social Bot Detection via Cooperative Reinforced Contrastive Adversarial Distillation
Title: FedRio: Personalized Federated Social Bot Detection via Cooperative Reinforced Contrastive Adversarial Distillation
Link: https://arxiv.org/abs/2604.10678

Authors: Yingguang Yang, Hao Liu, Xin Zhang, Yunhui Liu, Yutong Xia, Qi Wu, Hao Peng, Taoran Liang, Bin Chong, Tieke He, Philip S. Yu
Comments: 17 pages, 6 figures
Abstract: Social bot detection is critical to the stability and security of online social platforms. However, current state-of-the-art bot detection models are largely developed in isolation, overlooking the benefits of leveraging shared detection patterns across platforms to improve performance and promptly identify emerging bot variants. The heterogeneity of data distributions and model architectures further complicates the design of an effective cross-platform and cross-model detection framework. To address these challenges, we propose the FedRio (Personalized Federated Social Bot Detection with Cooperative Reinforced Contrastive Adversarial Distillation) framework. We first introduce an adaptive message-passing module as the graph neural network backbone for each client. To facilitate efficient knowledge sharing of global data distributions, we design a federated knowledge extraction mechanism based on generative adversarial networks. Additionally, we employ a multi-stage adversarial contrastive learning strategy to enforce feature space consistency among clients and reduce divergence between local and global models. Finally, we adopt adaptive server-side parameter aggregation and reinforcement learning-based client-side parameter control to better accommodate data heterogeneity in heterogeneous federated settings. Extensive experiments on two real-world social bot detection benchmarks demonstrate that FedRio consistently outperforms state-of-the-art federated learning baselines in detection accuracy, communication efficiency, and feature space consistency, while remaining competitive with published centralized results under substantially stronger privacy constraints.


【8】A Queueing-Theoretic Framework for Dynamic Attack Surfaces: Data-Integrated Risk Analysis and Adaptive Defense
Title: A Queueing-Theoretic Framework for Dynamic Attack Surfaces: Data-Integrated Risk Analysis and Adaptive Defense
Link: https://arxiv.org/abs/2604.10427

Authors: Jihyeon Yun, Abdullah Yasin Etcibasi, Ming Shi, C. Emre Koksal
Abstract: We develop a queueing-theoretic framework to model the temporal evolution of cyber-attack surfaces, where the number of active vulnerabilities is represented as the backlog of a queue. Vulnerabilities arrive as they are discovered or created, and leave the system when they are patched or successfully exploited. Building on this model, we study how automation affects attack and defense dynamics by introducing an AI amplification factor that scales arrival, exploit, and patching rates. Our analysis shows that even symmetric automation can increase the rate of successful exploits. We validate the model using vulnerability data collected from an open source software supply chain and show that it closely matches real-world attack surface dynamics. Empirical results reveal heavy-tailed patching times, which we prove induce long-range dependence in vulnerability backlog and help explain persistent cyber risk. Utilizing our queueing abstraction for the attack surface, we develop a systematic approach for cyber risk mitigation. We formulate the dynamic defense problem as a constrained Markov decision process with resource-budget and switching-cost constraints, and develop a reinforcement learning (RL) algorithm that achieves provably near-optimal regret. Numerical experiments validate the approach and demonstrate that our adaptive RL-based defense policies significantly reduce successful exploits and mitigate heavy-tail queue events. Using trace-driven experiments on the ARVO dataset, we show that the proposed RL-based defense policy reduces the average number of active vulnerabilities in a software supply chain by over 90% compared to existing defense practices, without increasing the overall maintenance budget. Our results allow defenders to quantify cumulative exposure risk under long-range dependent attack dynamics and to design adaptive defense strategies with provable efficiency.
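The claim that symmetric automation can raise the exploit rate can be seen in a toy model. The sketch below assumes Poisson arrivals and independent exponential patch and exploit clocks per vulnerability (an infinite-server queue); this simplification and all names are assumptions for illustration only, and the paper's model, which accounts for heavy-tailed patching times, is richer.

```python
def exploit_rate(arrival, patch, exploit, amplification=1.0):
    """Long-run rate of successful exploits in the toy model."""
    a = amplification
    lam, mu, eps = a * arrival, a * patch, a * exploit
    # Competing exponential clocks: a vulnerability is exploited before
    # it is patched with probability eps / (mu + eps).
    return lam * eps / (mu + eps)


def mean_backlog(arrival, patch, exploit, amplification=1.0):
    """Steady-state mean number of active vulnerabilities (M/M/infinity)."""
    a = amplification
    lam, mu, eps = a * arrival, a * patch, a * exploit
    # Each vulnerability leaves at total rate mu + eps.
    return lam / (mu + eps)


# Hypothetical rates per unit time.
base_rate = exploit_rate(2.0, 0.5, 0.1)
amplified_rate = exploit_rate(2.0, 0.5, 0.1, amplification=3.0)
```

In this toy setting, scaling all three rates by the same factor leaves the mean backlog unchanged but multiplies the exploit rate by that factor.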


【9】Membership Inference Attacks Expose Participation Privacy in ECG Foundation Encoders
Title: Membership Inference Attacks Expose Participation Privacy in ECG Foundation Encoders
Link: https://arxiv.org/abs/2604.10424

Authors: Ziyu Wang, Elahe Khatibi, Ankita Sharma, Krishnendu Chakrabarty, Sanaz Rahimi Moosavi, Farshad Firouzi, Amir Rahmani
Abstract: Foundation-style ECG encoders pretrained with self-supervised learning are increasingly reused across tasks, institutions, and deployment contexts, often through model-as-a-service interfaces that expose scalar scores or latent representations. While such reuse improves data efficiency and generalization, it raises a participation privacy concern: can an adversary infer whether a specific individual or cohort contributed ECG data to pretraining, even when raw waveforms and diagnostic labels are never disclosed? In connected-health settings, training participation itself may reveal institutional affiliation, study enrollment, or sensitive health context. We present an implementation-grounded audit of membership inference attacks (MIAs) against modern self-supervised ECG foundation encoders, covering contrastive objectives (SimCLR, TS2Vec) and masked reconstruction objectives (CNN- and Transformer-based MAE). We evaluate three realistic attacker interfaces: (i) score-only black-box access to scalar outputs, (ii) adaptive learned attackers that aggregate subject-level statistics across repeated queries, and (iii) embedding-access attackers that probe latent representation geometry. Using a subject-centric protocol with window-to-subject aggregation and calibration at fixed false-positive rates under a cross-dataset auditing setting, we observe heterogeneous and objective-dependent participation leakage: leakage is most pronounced in small or institution-specific cohorts and, for contrastive encoders, can saturate in embedding space, while larger and more diverse datasets substantially attenuate operational tail risk. Overall, our results show that restricting access to raw signals or labels is insufficient to guarantee participation privacy, underscoring the need for deployment-aware auditing of reusable biosignal foundation encoders in connected-health systems.


【10】SLM Finetuning for Natural Language to Domain Specific Code Generation in Production
Title: SLM Finetuning for Natural Language to Domain Specific Code Generation in Production
Link: https://arxiv.org/abs/2604.09952

Authors: Renjini R. Nair, Damian K. Kowalczyk, Marco Gaudesi, Chhaya Methani
Comments: 11 pages (including appendix), 5 tables, 1 figure. Submitted to arXiv as a preprint
Abstract: Many applications today use large language models for code generation; however, production systems have strict latency requirements that can be difficult to meet with large models. Small language models with a few billion parameters are resource-efficient but may suffer from limited reasoning, hallucinations, or poor retention of longer context. Fine-tuning improves task-specific accuracy by embedding domain knowledge directly into model weights, reducing reliance on runtime context. We previously implemented a baseline natural-language-to-code generation approach using a retrieval-augmented generation pipeline that dynamically selected few-shot examples to embed domain-specific language context for a large language model. In this study, we evaluate small language models for generating domain-specific language from natural language by fine-tuning variants of Mistral and other models on a dataset of natural language-code pairs. Our results show that the fine-tuned models achieve improved performance and latency on test datasets compared to larger models. We also demonstrate that the trained model can be further fine-tuned for customer-specific scenarios without degrading general performance, helping resolve production issues. Load testing followed by production deployment confirmed optimal performance in terms of latency and quality. These findings demonstrate that task-specific fine-tuning with small language models provides an efficient, faster, and cost-effective alternative to large language models for domain-specific language generation.


Semi-/Weakly/Un-/Fully Supervised | Uncertainty | Active Learning (10 papers)

【1】Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach
Title: Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach
Link: https://arxiv.org/abs/2604.11547

Authors: Haolin Li, Shuyang Jiang, Ruipeng Zhang, Jiangchao Yao, Ya Zhang, Yanfeng Wang
Comments: Accepted to ACL 2026 as a Findings paper
Abstract: While large language models hold promise for complex medical applications, their development is hindered by the scarcity of high-quality reasoning data. To address this issue, existing approaches typically distill chain-of-thought reasoning traces from large proprietary models via supervised fine-tuning, then conduct reinforcement learning (RL). These methods exhibit limited improvement on underrepresented domains like rare diseases while incurring substantial costs from generating complex reasoning chains. To efficiently enhance medical reasoning, we propose MedSSR, a Medical Knowledge-enhanced data Synthesis and Semi-supervised Reinforcement learning framework. Our framework first employs rare disease knowledge to synthesize distribution-controllable reasoning questions. We then utilize the policy model itself to generate high-quality pseudo-labels. This enables a two-stage, intrinsic-to-extrinsic training paradigm: self-supervised RL on the pseudo-labeled synthetic data, followed by supervised RL on the human-annotated real data. MedSSR scales model training efficiently without relying on costly trace distillation. Extensive experiments on Qwen and Llama demonstrate that our method outperforms existing methods across ten medical benchmarks, achieving up to +5.93% gain on rare-disease tasks. Our code is available at https://github.com/tdlhl/MedSSR.


【2】CoRe-ECG: Advancing Self-Supervised Representation Learning for 12-Lead ECG via Contrastive and Reconstructive Synergy
Title: CoRe-ECG: Advancing Self-Supervised Representation Learning for 12-Lead ECG via Contrastive and Reconstructive Synergy
Link: https://arxiv.org/abs/2604.11359

Authors: Zehao Qin, Xiaojian Lin, Ping Zhang, Hongliang Wu, Xinkang Wang, Guangling Liu, Bo Chen, Wenming Yang, Guijin Wang
Abstract: Accurate interpretation of electrocardiogram (ECG) remains challenging due to the scarcity of labeled data and the high cost of expert annotation. Self-supervised learning (SSL) offers a promising solution by enabling models to learn expressive representations from unlabeled signals. Existing ECG SSL methods typically rely on either contrastive learning or reconstructive learning. However, each approach in isolation provides limited supervisory signals and suffers from additional limitations, including non-physiological distortions introduced by naive augmentations and trivial correlations across multiple leads that models may exploit as shortcuts. In this work, we propose CoRe-ECG, a unified contrastive and reconstructive pretraining paradigm that establishes a synergistic interaction between global semantic modeling and local structural learning. CoRe-ECG aligns global representations during reconstruction, enabling instance-level discriminative signals to guide local waveform recovery. To further enhance pretraining, we introduce Frequency Dynamic Augmentation (FDA) to adaptively perturb ECG signals based on their frequency-domain importance, and Spatio-Temporal Dual Masking (STDM) to break linear dependencies across leads, increasing the difficulty of reconstructive tasks. Our method achieves state-of-the-art performance across multiple downstream ECG datasets. Ablation studies further demonstrate the necessity and complementarity of each component. This approach provides a robust and physiologically meaningful representation learning framework for ECG analysis.


【3】Uncertainty-Guided Attention and Entropy-Weighted Loss for Precise Plant Seedling Segmentation
Title: Uncertainty-Guided Attention and Entropy-Weighted Loss for Precise Plant Seedling Segmentation
Link: https://arxiv.org/abs/2604.10823

Authors: Mohamed Ehab, Ali Hamdi
Abstract: Plant seedling segmentation supports automated phenotyping in precision agriculture. Standard segmentation models face difficulties due to intricate background images and fine structures in leaves. We introduce UGDA-Net (Uncertainty-Guided Dual Attention Network with Entropy-Weighted Loss and Deep Supervision). Three novel components make up UGDA-Net. The first component is Uncertainty-Guided Dual Attention (UGDA), which uses channel variance to modulate feature maps. The second component is an entropy-weighted hybrid loss function that focuses on high-uncertainty boundary pixels. The third component employs deep supervision for intermediate encoder layers. We performed a systematic ablation study on two widely used architectures, U-Net and LinkNet, analyzing five incremental configurations: Baseline, Loss-only, Attention-only, Deep Supervision, and UGDA-Net. We trained UGDA-Net using a high-resolution plant seedling image dataset containing 432 images. We demonstrate improved segmentation performance and accuracy, with an increase in Dice coefficient of 9.3% above baseline. LinkNet's variance is 13.2% above baseline. Qualitative overlays show reduced false positives at the leaf boundary. Uncertainty heatmaps are consistent with the complex morphology. UGDA-Net aids in the segmentation of delicate structures in plants and provides a high-definition solution. The results show that uncertainty-guided attention and uncertainty-weighted loss are two complementary components.
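The entropy-weighting idea in this abstract can be sketched as follows. This is a minimal illustration that covers only a cross-entropy term, assumes per-pixel foreground probabilities, and uses a hypothetical weighting form `1 + alpha * normalized_entropy`; the paper's hybrid loss is not specified in the abstract and may differ.

```python
import math


def binary_entropy(p, eps=1e-7):
    """Entropy (in nats) of a Bernoulli prediction with probability p."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def pixel_weight(p, alpha=1.0):
    """Assumed weight: 1 for confident pixels, up to 1 + alpha at p = 0.5."""
    return 1.0 + alpha * binary_entropy(p) / math.log(2.0)


def entropy_weighted_bce(probs, targets, alpha=1.0, eps=1e-7):
    """Mean binary cross-entropy with uncertain pixels up-weighted."""
    total = 0.0
    for p, y in zip(probs, targets):
        q = min(max(p, eps), 1.0 - eps)
        bce = -(y * math.log(q) + (1 - y) * math.log(1.0 - q))
        total += pixel_weight(p, alpha) * bce
    return total / len(probs)


# Hypothetical predictions for three pixels; the middle one is maximally uncertain.
probs = [0.9, 0.5, 0.2]
targets = [1, 1, 0]
loss = entropy_weighted_bce(probs, targets, alpha=1.0)
```

Setting `alpha=0` recovers the unweighted cross-entropy, which makes the effect of the weighting easy to ablate.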


【4】Adaptive Multi-Expert Reasoning via Difficulty-Aware Routing and Uncertainty-Guided Aggregation
Title: Adaptive Multi-Expert Reasoning via Difficulty-Aware Routing and Uncertainty-Guided Aggregation
Link: https://arxiv.org/abs/2604.10335

Authors: Mohamed Ehab, Ali Hamdi
Abstract: Large language models (LLMs) demonstrate strong performance in math reasoning benchmarks, but their performance varies inconsistently across problems with varying levels of difficulty. This paper describes Adaptive Multi-Expert Reasoning (AMR), a framework that focuses on problem complexity by reasoning with dynamically adapted strategies. An agile routing system that focuses on problem text predicts each problem's difficulty and uncertainty and guides a reconfigurable sampling mechanism to manage the breadth of generation. Three specialized experts create candidate responses, which are modified during multiple correction and finalization phases. A neural verifier assesses the correctness of responses, while a clustering-based aggregation technique identifies the final candidate answer based on a combination of consensus and answer quality. When evaluated on the GSM8K dataset, AMR achieved 75.28% accuracy while only using the original training data. This result outperformed the majority of comparable 7B models that were trained on synthetic data. This shows that difficulty-based routing and uncertainty-driven aggregation are efficient and effective in improving the robustness of math reasoning models.
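The aggregation step, which combines consensus with answer quality, might look like the sketch below. It assumes clusters are formed by exact match on the final answer and that a verifier returns scores in [0, 1]; the `consensus_weight` mixing parameter and all names are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def aggregate(candidates, consensus_weight=0.5):
    """Pick a final answer from (answer, verifier_score) pairs."""
    clusters = defaultdict(list)
    for answer, score in candidates:
        clusters[answer].append(score)
    n = len(candidates)

    def cluster_score(scores):
        consensus = len(scores) / n          # share of candidates in this cluster
        quality = sum(scores) / len(scores)  # mean verifier score in this cluster
        return consensus_weight * consensus + (1.0 - consensus_weight) * quality

    return max(clusters, key=lambda answer: cluster_score(clusters[answer]))


# Hypothetical candidates: three agree on "42", one high-scoring outlier says "41".
candidates = [("42", 0.6), ("42", 0.7), ("41", 0.9), ("42", 0.5)]
final_answer = aggregate(candidates)
```

With `consensus_weight=0` the rule degenerates to trusting the verifier alone, and with `consensus_weight=1` it becomes plain majority voting.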


【5】Training-Free Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations
Title: Training-Free Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations
Link: https://arxiv.org/abs/2604.10123

Authors: Bernard Muller, Antonio Armando Ortiz Barrañón, LaVonne Roberts
Comments: Submitted to PLOS Digital Health
Abstract: Dysarthric speech severity assessment typically requires trained clinicians or supervised models built from labelled pathological speech, limiting scalability across languages and clinical settings. We present a training-free method that quantifies dysarthria severity by measuring degradation in phonological feature subspaces within frozen HuBERT representations. No supervised severity model is trained; feature directions are estimated from healthy control speech using a pretrained forced aligner. For each speaker, we extract phone-level embeddings via Montreal Forced Aligner, compute d-prime scores along phonological contrast directions (nasality, voicing, stridency, sonorance, manner, and four vowel features) derived exclusively from healthy controls, and construct a 12-dimensional phonological profile. Evaluating 890 speakers across 10 corpora, 5 languages (English, Spanish, Dutch, Mandarin, French), and 3 primary aetiologies (Parkinson's disease, cerebral palsy, ALS), we find that all five consonant d-prime features correlate significantly with clinical severity (random-effects meta-analysis rho = -0.50 to -0.56, p < 2e-4; pooled Spearman rho = -0.47 to -0.55 with bootstrap 95% CIs not crossing zero). The effect replicates within individual corpora, survives FDR correction, and remains robust to leave-one-corpus-out removal and alignment quality controls. Nasality d-prime decreases monotonically from control to severe in 6 of 7 severity-graded corpora. Mann-Whitney U tests confirm that all 12 features distinguish controls from severely dysarthric speakers (p < 0.001). The method requires no dysarthric training data and applies to any language with an existing MFA acoustic model (currently 29 languages). We release the full pipeline and phone feature configurations for six languages.
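For readers unfamiliar with d-prime, the sketch below computes the common equal-weight form: the difference in class means divided by the root of the averaged variances. It assumes the inputs are already 1-D projections of phone embeddings onto a contrast direction; the example values are made up, and the paper's exact estimator may differ.

```python
import math
import statistics


def d_prime(class_a, class_b):
    """Separation between two 1-D samples in pooled-standard-deviation units."""
    mean_diff = statistics.fmean(class_a) - statistics.fmean(class_b)
    # Average the two sample variances, then take the square root.
    pooled_sd = math.sqrt(
        (statistics.variance(class_a) + statistics.variance(class_b)) / 2.0
    )
    return mean_diff / pooled_sd


# Hypothetical projections of nasal vs. oral phones onto a "nasality" direction.
nasal = [1.0, 2.0, 3.0]
oral = [4.0, 5.0, 6.0]
separation = d_prime(nasal, oral)
```

A magnitude near zero means the two phone classes overlap along that direction, which is the kind of degradation the abstract associates with higher severity.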


【6】A Hybrid Intelligent Framework for Uncertainty-Aware Condition Monitoring of Industrial Systems
Title: A Hybrid Intelligent Framework for Uncertainty-Aware Condition Monitoring of Industrial Systems
Link: https://arxiv.org/abs/2604.09932

Authors: Maryam Ahang, Todd Charter, Masoud Jalayer, Homayoun Najjaran
Abstract: Hybrid approaches that combine data-driven learning with physics-based insight have shown promise for improving the reliability of industrial condition monitoring. This work develops a hybrid condition monitoring framework that integrates primary sensor measurements, lagged temporal features, and physics-informed residuals derived from nominal surrogate models. Two hybrid integration strategies are examined. The first is a feature-level fusion approach that augments the input space with residual and temporal information. The second is a model-level ensemble approach in which machine learning classifiers trained on different feature types are combined at the decision level. Both hybrid approaches of the condition monitoring framework are evaluated on a continuous stirred-tank reactor (CSTR) benchmark using several machine learning models and ensemble configurations. Both feature-level and model-level hybridization improve diagnostic accuracy relative to single-source baselines, with the best model-level ensemble achieving a 2.9% improvement over the best baseline ensemble. To assess predictive reliability, conformal prediction is applied to quantify coverage, prediction-set size, and abstention behavior. The results show that hybrid integration enhances uncertainty management, producing smaller and well-calibrated prediction sets at matched coverage levels. These findings demonstrate that lightweight physics-informed residuals, temporal augmentation, and ensemble learning can be combined effectively to improve both accuracy and decision reliability in nonlinear industrial systems.
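The conformal prediction step can be illustrated with a minimal split-conformal sketch for classification. It assumes the nonconformity score is one minus the predicted probability of a class, which is one common choice; the abstract does not state which score the authors use, and the calibration values below are made up.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Finite-sample quantile of calibration nonconformity scores."""
    n = len(calibration_scores)
    # Rank of the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return float("inf")  # too few calibration points: include every class
    return sorted(calibration_scores)[k - 1]


def prediction_set(class_probs, threshold):
    """All classes whose nonconformity (1 - probability) is within the threshold."""
    return [c for c, p in enumerate(class_probs) if 1.0 - p <= threshold]


# Hypothetical calibration scores: 1 - probability assigned to the true class.
calibration_scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.9]
threshold = conformal_threshold(calibration_scores, alpha=0.25)
confident_case = prediction_set([0.7, 0.25, 0.05], threshold)
ambiguous_case = prediction_set([0.45, 0.42, 0.13], threshold)
```

Smaller sets at the same target coverage indicate a sharper classifier, which is the sense in which the abstract reports smaller prediction sets at matched coverage levels.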


【7】SemEnrich: Self-Supervised Semantic Enrichment of Radiology Reports for Vision-Language Learning
Title: SemEnrich: Self-Supervised Semantic Enrichment of Radiology Reports for Vision-Language Learning
Link: https://arxiv.org/abs/2604.09887

Authors: Halil Ibrahim Gulluk, Olivier Gevaert
Abstract: Medical vision-language datasets are often limited in size and biased toward negative findings, as clinicians mostly report abnormalities but might omit some positive/neutral findings that they consider irrelevant to the patient's condition. We propose a self-supervised data enrichment method that leverages semantic clustering of report sentences. We then enrich the findings in the medical reports in the training set by adding positive/neutral observations from different clusters in a self-supervised manner. Our approach yields consistent gains in supervised fine-tuning (5.63%, 3.04%, 7.40%, 5.30%, 7.47% average gains on COMET score, BERTScore, Sentence BLEU, CheXbert-F1 and RadGraph-F1 scores respectively). Ablation studies confirm that improvements stem from semantic clustering rather than random augmentation. Furthermore, we introduce a way to incorporate semantic cluster information into the reward design for GRPO training, which leads to further performance gains (2.78%, 3.14%, 12.80% average gains on COMET score, BERTScore and Sentence BLEU scores respectively). We share our code at https://anonymous.4open.science/r/SemEnrich-75CF


【8】Below-ground Fungal Biodiversity Can be Monitored Using Self-Supervised Learning Satellite Features
Title: Below-ground Fungal Biodiversity Can be Monitored Using Self-Supervised Learning Satellite Features
Link: https://arxiv.org/abs/2604.09818

Authors: Robin Young, Michael E. Van Nuland, E. Toby Kiers, Tomáš Větrovský, Petr Kohout, Petr Baldrian, Srinivasan Keshav
Abstract: Mycorrhizal fungi are vital to terrestrial ecosystem functioning. Yet monitoring their biodiversity at landscape scales is often unfeasible due to time and cost constraints. Current predictions suggest that 90% of mycorrhizal diversity hotspots remain unprotected, opening questions of how to broadly and effectively map underground fungal communities. Here, we show that self-supervised learning (SSL) applied to satellite imagery can predict below-ground ectomycorrhizal fungal richness across diverse environments. Our models explain over half the variance in species richness across ~12,000 field samples spanning Europe and Asia. SSL-derived features prove to be the single most informative predictor, subsuming the majority of information contained in climate, soil, and land cover datasets. Using this approach, we achieve a 10,000-fold increase in spatial resolution over existing techniques, moving from 1 km landscape averages to 10 m habitat-scale observations with nearly no systematic bias. As satellite observations are dynamic rather than static, this enables temporal monitoring of below-ground biodiversity at landscape scales for the first time. We analyze multi-year trends in predicted fungal richness across UK National Park woodlands, finding that ancient forests may be losing ectomycorrhizal diversity at disproportionate rates. These results establish SSL satellite features as a scalable tool for extending sparse field observations to continuous, high-resolution biodiversity maps for monitoring the invisible half of terrestrial ecosystems.
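The 10,000-fold figure is consistent with counting resolution per unit area rather than per unit length. The short check below works under that assumption, which is an interpretation and not something stated in the abstract.

```python
# Pixel edge lengths in metres, taken from the abstract.
coarse_pixel_m = 1_000
fine_pixel_m = 10

# Gain along one axis, then gain in pixels per unit area.
linear_gain = coarse_pixel_m // fine_pixel_m
area_gain = linear_gain ** 2
```

Each 1 km cell is covered by 100 x 100 cells at 10 m, which gives the stated factor.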


【9】Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model
Title: Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model
Link: https://arxiv.org/abs/2604.09665

Authors: Pankayaraj Pathmanathan, Furong Huang
Abstract: While the wide adoption of refusal training in large language models (LLMs) has showcased improvements in model safety, recent works have highlighted shortcomings due to the shallow nature of these alignment methods. To this end, the work on deliberative alignment proposed distilling reasoning capabilities from stronger reasoning models, thereby instilling deeper safety in LLMs. In this work, we study the impact of deliberative alignment in language models. First, we show that despite the teacher being larger in model size and stronger in safety capability, there exists an alignment gap between teacher and student language models, which affects both the safety and general utility of the student model. Furthermore, we show that models aligned through deliberative alignment can retain unsafe behaviors from the base model despite learning the reasoning patterns of larger reasoning models. Building upon this observation, we propose a BoN sampling method that attributes the unsafe behavior back to the base LLMs in the latent space, thereby down-ranking unsafe responses to gain a meaningful improvement in model safety across multiple safety benchmarks with minimal loss in utility. In particular, across 7 teacher models and 6 student models of different classes and sizes, we show an average attack success rate (ASR) reduction of 28.2% in DAN, 31.3% in WildJailbreak and 35.4% in StrongREJECT benchmarks. We further show that these safety gains persist after RL training, thus highlighting the uncertainty in safety reasoning and its explicit attribution to the base model.
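The down-ranking idea behind best-of-N (BoN) sampling can be sketched generically. The sketch assumes each sampled response already carries a utility score and an "unsafe attribution" score; how the paper computes that attribution in latent space is not described in the abstract, so both fields and the `penalty` parameter are hypothetical placeholders.

```python
def best_of_n(candidates, penalty=1.0):
    """Return the candidate with the best penalized score.

    Each candidate is a dict with 'text', 'utility', and
    'unsafe_attribution' (all placeholder fields for illustration).
    """
    def score(candidate):
        # Down-rank responses whose behavior is attributed to unsafe base-model patterns.
        return candidate["utility"] - penalty * candidate["unsafe_attribution"]

    return max(candidates, key=score)


# Hypothetical scored samples for a single prompt.
samples = [
    {"text": "response A", "utility": 0.9, "unsafe_attribution": 0.8},
    {"text": "response B", "utility": 0.7, "unsafe_attribution": 0.1},
    {"text": "response C", "utility": 0.5, "unsafe_attribution": 0.0},
]
chosen = best_of_n(samples, penalty=1.0)
```

With `penalty=0` the selection reduces to ordinary best-of-N on utility alone, so the penalty controls the safety-utility trade-off.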


【10】Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
Title: Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
Link: https://arxiv.org/abs/2604.11507

Authors: I. Esra Buyuktahtakin
Abstract: Artificial intelligence (AI) is moving increasingly beyond prediction to support decisions in complex, uncertain, and dynamic environments. This shift creates a natural intersection with operations research and management sciences (OR/MS), which have long offered conceptual and methodological foundations for sequential decision-making under uncertainty. At the same time, recent advances in deep learning, including feedforward neural networks, LSTMs, transformers, and deep reinforcement learning, have expanded the scope of data-driven modeling and opened new possibilities for large-scale decision systems. This tutorial presents an OR/MS-centered perspective on deep learning for sequential decision-making under uncertainty. Its central premise is that deep learning is valuable not as a replacement for optimization, but as a complement to it. Deep learning brings adaptability and scalable approximation, whereas OR/MS provides the structural rigor needed to represent constraints, recourse, and uncertainty. The tutorial reviews key decision-making foundations, connects them to the major neural architectures in modern AI, and discusses leading approaches to integrating learning and optimization. It also highlights emerging impact in domains such as supply chains, healthcare and epidemic response, agriculture, energy, and autonomous operations. More broadly, it frames these developments as part of a wider transition from predictive AI toward decision-capable AI and highlights the role of OR/MS in shaping the next generation of integrated learning-optimization systems.


Transfer | Zero/Few/One-Shot | Adaptive (9 papers)

【1】Sheaf Diffusion with Adaptive Local Structure for Spatio-Temporal Forecasting
Title: Sheaf Diffusion with Adaptive Local Structure for Spatio-Temporal Forecasting
Link: https://arxiv.org/abs/2604.11275

Authors: Abeer Mostafa, Raneen Younis, Zahra Ahmadi
Abstract: Spatio-temporal systems often exhibit highly heterogeneous and non-intuitive responses to localized disruptions, limiting the effectiveness of conventional message passing approaches in modeling higher-order interactions under local heterogeneity. This paper reformulates spatio-temporal forecasting as the problem of learning information flow over locally structured spaces, rather than propagating globally aligned node representations. We introduce a spatio-temporal sheaf diffusion graph neural network (ST-Sheaf GNN) that embeds graph topology into sheaf-theoretic vector spaces connected by learned linear restriction maps. Unlike prior work that relies on static or globally shared transformations, our model learns dynamic restriction maps that evolve over time and adapt to local spatio-temporal patterns to enable substantially more expressive interactions. By explicitly modeling latent local structure, the proposed framework efficiently mitigates the oversmoothing phenomenon in deep GNN architectures. We evaluate our framework on a diverse set of real-world spatio-temporal forecasting benchmarks spanning multiple domains. Experimental results demonstrate state-of-the-art performance, highlighting the effectiveness of sheaf-theoretic topological representations as a powerful foundation for spatio-temporal graph learning. The code is available at: https://anonymous.4open.science/r/ST-SheafGNN-6523/.


【2】Mycelium-Index: A Streaming Approximate Nearest Neighbor Index with Myelial Edge Decay, Traffic-Driven Reinforcement, and Adaptive Living Hierarchy
标题:Mycelium-Index:具有Myelial边衰减、流量驱动强化和自适应活体层次结构的流式近似最近邻索引
链接:https://arxiv.org/abs/2604.11274

作者:Anton Pakhunov
备注:10 pages, 10 tables, 1 appendix
摘要:We present mycelium-index, a streaming approximate nearest neighbor (ANN) index for high-dimensional vector spaces, inspired by the adaptive growth patterns of biological mycelium. The system continuously adapts its topology through myelial edge decay and reinforcement, a traffic-driven living hierarchy, and hybrid deletion combining O(1) bypass for cold nodes with O(k) beam-search repair for hub nodes. Experimental evaluation on SIFT-1M demonstrates that mycelium achieves 0.927 +/- 0.028 recall@5 under FreshDiskANN's 100%-turnover benchmark protocol -- within the measurement confidence interval of FreshDiskANN's ~0.95 -- while using 5.7x less RAM (88 MB vs. >500 MB) and achieving 4.7x higher QPS (2,795 vs. ~600). On the static index, at ef=192, mycelium matches HNSW M=16 recall (0.962 vs. 0.965) at 5.2x less RAM (163 MB vs. 854 MB). Performance optimizations including NEON SIMD distance computation, Vec-backed node storage, and bitset visited tracking yield a cumulative 2.7x QPS improvement. A systematic study of ten streaming repair mechanisms finds that geometric heuristics universally fail in high dimensions, while topological mechanisms succeed -- a principle we term the topological repair invariance of high-dimensional ANN graphs.


【3】SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting
标题:SCOPE:采用双路径自适应加权的信号校准同策略蒸馏增强
链接:https://arxiv.org/abs/2604.10688

作者:Binbin Zheng,Xing Ma,Yiheng Liang,Jingqing Ruan,Xiaoliang Fu,Kepeng Lin,Benchang Zhu,Ke Zeng,Xunliang Cai
摘要:On-policy reinforcement learning has become the dominant paradigm for reasoning alignment in large language models, yet its sparse, outcome-level rewards make token-level credit assignment notoriously difficult. On-Policy Distillation (OPD) alleviates this by introducing dense, token-level KL supervision from a teacher model, but typically applies this supervision uniformly across all rollouts, ignoring fundamental differences in signal quality. We propose Signal-Calibrated On-Policy Distillation Enhancement (SCOPE), a dual-path adaptive training framework that routes on-policy rollouts by correctness into two complementary supervision paths. For incorrect trajectories, SCOPE performs teacher-perplexity-weighted KL distillation to prioritize instances where the teacher demonstrates genuine corrective capability, while down-weighting unreliable guidance. For correct trajectories, it applies student-perplexity-weighted MLE to concentrate reinforcement on low-confidence samples at the capability boundary rather than over-reinforcing already mastered ones. Both paths employ a group-level normalization to adaptively calibrate weight distributions, accounting for the intrinsic difficulty variance across prompts. Extensive experiments on six reasoning benchmarks show that SCOPE achieves an average relative improvement of 11.42% in Avg@32 and 7.30% in Pass@32 over competitive baselines, demonstrating its consistent effectiveness.


【4】SpectralLoRA: Is Low-Frequency Structure Sufficient for LoRA Adaptation? A Spectral Analysis of Weight Updates
标题:SpectralLoRA:低频结构足以适应LoRA吗?权重更新的谱分析
链接:https://arxiv.org/abs/2604.10649

作者:Rajveer Singh
备注:11 pages, 6 figures, 7 tables. Indian Institute of Technology Roorkee
摘要:We present a systematic empirical study of the spectral structure of LoRA weight updates. Through 2D Discrete Cosine Transform (DCT) analysis of trained adaptation matrices across BERT-base and RoBERTa-base on four GLUE benchmarks (SST-2, MNLI, CoLA, QQP), we establish that LoRA updates are universally dominated by low-frequency components: on average, just 33% of DCT coefficients capture 90% of total spectral energy. Retaining only 10% of frequency coefficients reduces adapter storage by 10x while sacrificing only 1.95pp on SST-2. Notably, frequency masking at k=50% improves over full LoRA on 3 of 8 model-task pairs, suggesting high-frequency components act as adaptation noise. We further discover that RoBERTa-base is systematically more spectrally compressible than BERT-base across all tasks, and that task complexity governs spectral sensitivity -- NLI tasks require more frequency budget than sentiment classification. These findings motivate a new design principle for PEFT: spectral sparsity in adaptation.
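
The energy statistic quoted in this abstract can be illustrated with a small sketch. It is not the authors' code: the 8x8 matrix below is a made-up smooth stand-in for a LoRA update, the DCT is a slow reference implementation, and counting coefficients by largest magnitude (rather than by frequency order) is an assumption, since the abstract does not say how the 33% figure is measured.

```python
import math

def dct2(matrix):
    """Orthonormal 2D DCT-II of a small matrix (list of rows); O(n^4) reference version."""
    rows, cols = len(matrix), len(matrix[0])

    def scale(k, n):
        # Orthonormal scaling, so total energy is preserved (Parseval).
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for i in range(rows):
                for j in range(cols):
                    total += (matrix[i][j]
                              * math.cos(math.pi * (2 * i + 1) * u / (2 * rows))
                              * math.cos(math.pi * (2 * j + 1) * v / (2 * cols)))
            out[u][v] = scale(u, rows) * scale(v, cols) * total
    return out

def fraction_for_energy(coeffs, target=0.90):
    """Smallest fraction of coefficients (largest energy first) that holds `target` of the energy."""
    energies = sorted((c * c for row in coeffs for c in row), reverse=True)
    total = sum(energies)
    running = 0.0
    for count, energy in enumerate(energies, start=1):
        running += energy
        if running >= target * total:
            return count / len(energies)
    return 1.0

# Hypothetical smooth "weight update" used only for illustration.
delta_w = [[math.cos(0.2 * i) * math.cos(0.1 * j) for j in range(8)] for i in range(8)]
print(fraction_for_energy(dct2(delta_w)))
```

A smooth matrix concentrates its energy in a few coefficients, so the printed fraction is small; a noisy matrix would need a much larger fraction.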


【5】LoDAdaC: a unified local training-based decentralized framework with adaptive gradients and compressed communication
标题:LoDAdaC:一个统一的基于本地训练的去中心化框架,具有自适应梯度和压缩通信
链接:https://arxiv.org/abs/2604.09970

作者:Wei Liu,Anweshit Panda,Ujwal Pandey,Haven Cook,George M. Slota,Naigang Wang,Jie Chen,Yangyang Xu
备注:Accepted by TMLR
摘要:In the decentralized distributed learning, achieving fast convergence and low communication cost is essential for scalability and high efficiency. Adaptive gradient methods, such as Adam, have demonstrated strong practical performance in deep learning and centralized distributed settings. However, their convergence properties remain largely unexplored in decentralized settings involving multiple local training steps, such as federated learning. To address this limitation, we propose LoDAdaC, a unified multiple Local Training (MLT) Decentralized framework with Adam-type updates and Compressed communication (CC). LoDAdaC accommodates a broad class of optimizers for its local adaptive updates, including AMSGrad, Adam, and AdaGrad; it is compatible with standard (possibly biased) compressors such as low-bit quantization and sparsification. MLT and CC enable LoDAdaC to achieve multiplied reduction of communication cost, while the technique of adaptive updates enables fast convergence. We rigorously prove the combined advantage through complexity analysis. In addition, experiments on image classification and GPT-style language model training validate our theoretical findings and show that LoDAdaC significantly outperforms existing decentralized algorithms in terms of convergence speed and communication efficiency.
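
For readers unfamiliar with the compressors named in this abstract, here is a minimal sketch of two textbook variants: top-k sparsification and symmetric uniform quantization. These are generic illustrations, not LoDAdaC's implementation; the function names and the max-absolute-value scaling are assumptions.

```python
def top_k_sparsify(vec, k):
    """Keep the k largest-magnitude entries and zero the rest (a biased compressor)."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(vec)]

def quantize(vec, bits):
    """Symmetric uniform quantization onto 2**(bits-1) - 1 levels per sign (bits >= 2)."""
    scale = max(abs(x) for x in vec)
    if scale == 0:
        return list(vec)
    levels = 2 ** (bits - 1) - 1
    return [round(x / scale * levels) / levels * scale for x in vec]

gradient = [0.1, -3.0, 2.0, 0.5]  # hypothetical local update
print(top_k_sparsify(gradient, 2))
print(quantize(gradient, 4))
```

Only the surviving entries (or the low-bit integer codes plus one scale) would need to be sent to neighbors, which is where the communication saving comes from.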


【6】Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning
标题:Muon$^2$:通过自适应二阶矩预处理增强Muon
链接:https://arxiv.org/abs/2604.09967

作者:Ziyue Liu,Ruijie Zhang,Zhengyang Wang,Yequan Zhao,Yupeng Su,Zi Yang,Zheng Zhang
备注:Preprint, subject to update
摘要:Muon has emerged as a promising optimizer for large-scale foundation model pre-training by exploiting the matrix structure of neural network updates through iterative orthogonalization. However, its practical efficiency is limited by the need for multiple Newton-Schulz (NS) iterations per optimization step, which introduces non-trivial computation and communication overhead. We propose Muon$^2$, an extension of Muon that applies Adam-style adaptive second-moment preconditioning before orthogonalization. Our key insight is that the core challenge of polar approximation in Muon lies in the ill-conditioned momentum matrix, of which the spectrum is substantially improved by Muon$^2$, leading to faster convergence toward a practically sufficient orthogonalization. We further characterize the practical orthogonalization quality via directional alignment, under which Muon$^2$ demonstrates dramatic improvement over Muon at each polar step. Across GPT and LLaMA pre-training experiments from 60M to 1.3B parameters, Muon$^2$ consistently outperforms Muon and recent Muon variants while reducing NS iterations by 40%. We further introduce Muon$^2$-F, a memory-efficient factorized variant that preserves most of the gains of Muon$^2$ with negligible memory overhead.


【7】Regularized Entropy Information Adaptation with Temporal-Awareness Networks for Simultaneous Speech Translation
标题:基于时间感知网络的正则化熵信息自适应同步语音翻译
链接:https://arxiv.org/abs/2604.09916

作者:Joseph Liu,Nameer Hirschkind,Xiao Yu,Mahesh Kumar Nandwana
备注:Under review at Interspeech 2026
摘要:Simultaneous Speech Translation (SimulST) requires balancing high translation quality with low latency. Recent work introduced REINA, a method that trains a Read/Write policy based on estimating the information gain of reading more audio. However, we find that information-based policies often lack temporal context, leading the policy to bias itself toward reading most of the audio before starting to write. We improve REINA using two distinct strategies: a supervised alignment network (REINA-SAN) and a timestep-augmented network (REINA-TAN). Our results demonstrate that while both methods significantly outperform the baseline and resolve stability issues, REINA-TAN provides a slightly superior Pareto frontier for streaming efficiency, whereas REINA-SAN offers more robustness against 'read loops'. Applied to Whisper, both methods improve the pareto frontier of streaming efficiency as measured by Normalized Streaming Efficiency (NoSE) scores up to 7.1% over existing competitive baselines.


【8】A Modular Zero-Shot Pipeline for Accident Detection, Localization, and Classification in Traffic Surveillance Video
标题:用于交通监控视频中事故检测、定位和分类的模块化Zero-Shot管道
链接:https://arxiv.org/abs/2604.09685

作者:Amey Thakur,Sarvesh Talele
备注:9 pages, 7 figures, 2 tables. Submitted to the ACCIDENT @ CVPR 2026 Workshop. Source code and notebook available at https://www.kaggle.com/code/ameythakur20/zero-shot-cctv-traffic-accident-understanding/
摘要:We describe a zero-shot pipeline developed for the ACCIDENT @ CVPR 2026 challenge. The challenge requires predicting when, where, and what type of traffic accident occurs in surveillance video, without labeled real-world training data. Our method separates the problem into three independent modules. The first module localizes the collision in time by running peak detection on z-score normalized frame-difference signals. The second module finds the impact location by computing the weighted centroid of cumulative dense optical flow magnitude maps using the Farneback algorithm. The third module classifies collision type by measuring cosine similarity between CLIP image embeddings of frames near the detected peak and text embeddings built from multi-prompt natural language descriptions of each collision category. No domain-specific fine-tuning is involved; the pipeline processes each video using only pre-trained model weights. Our implementation is publicly available as a Kaggle notebook.
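
The first two modules described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' notebook: the per-frame difference signal and the flow-magnitude map are taken as precomputed inputs (no video decoding or Farneback flow here), and the threshold of 2.0 and all function names are hypothetical.

```python
from statistics import mean, pstdev

def zscore(signal):
    """Standardize a 1D signal; a constant signal maps to all zeros."""
    mu, sigma = mean(signal), pstdev(signal)
    if sigma == 0:
        return [0.0] * len(signal)
    return [(x - mu) / sigma for x in signal]

def detect_peak(frame_diffs, threshold=2.0):
    """Index of the strongest local maximum whose z-score exceeds `threshold`, else None."""
    z = zscore(frame_diffs)
    candidates = [
        t for t in range(1, len(z) - 1)
        if z[t] > threshold and z[t] >= z[t - 1] and z[t] >= z[t + 1]
    ]
    return max(candidates, key=lambda t: z[t]) if candidates else None

def weighted_centroid(magnitude):
    """(x, y) centroid of a 2D map (list of rows), weighted by the map values."""
    total = sum(sum(row) for row in magnitude)
    if total == 0:
        return None
    x = sum(w * col for row in magnitude for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(magnitude) for w in row) / total
    return x, y

# Hypothetical inputs: a quiet scene with one abrupt change at frame 12,
# and a flow-magnitude map with all motion at column 3, row 2.
diffs = [1.0] * 20
diffs[12] = 10.0
flow = [[0.0] * 5 for _ in range(4)]
flow[2][3] = 5.0
print(detect_peak(diffs), weighted_centroid(flow))
```

Z-scoring makes the threshold independent of each video's overall motion level, which is one plausible reason to normalize before peak detection.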


【9】Adaptive H-EFT-VA: A Provably Safe Trajectory Through the Trainability-Expressibility Landscape of Variational Quantum Algorithms
标题:自适应H-EFT-VA:通过变分量子算法的可训练性-表达性景观的可证明安全轨迹
链接:https://arxiv.org/abs/2604.10607

作者:Eyad I. B. Hamid
备注:17 figures
摘要:H-EFT-VA established a physics-informed solution to the Barren Plateau (BP) problem via a hierarchical EFT UV-cutoff, guaranteeing gradient variance in Omega(1/poly(N)). However, localization restricts the ansatz to a polynomial subspace, creating a reference-state gap for states distant from |0>^N. We introduce Adaptive H-EFT-VA (A-H-EFT) to navigate the trainability-expressibility tradeoff by expanding the reachable Hilbert space along a safe trajectory. Gradient variance is maintained in Omega(1/poly(N)) if sigma(t) <= 0.5/sqrt(LN) (Theorem 1). A Safe Expansion Corollary and Monotone Growth Lemma confirm expansion without discontinuous jumps. Benchmarking across 16 experiments (up to N=14) shows A-H-EFT achieves fidelity F=0.54, doubling static H-EFT-VA (F=0.27) and outperforming HEA (F~0.01), with gradient variance >= 0.5 throughout. For Heisenberg XXZ (Delta_ref=1), A-H-EFT identifies the negative ground state while static methods fail. Results are statistically significant (p < 10^-37). Robustness over three decades of hyperparameters enables deployment without search. This is the first rigorously bounded trajectory through the VQA landscape.


强化学习(8篇)

【1】Solving Physics Olympiad via Reinforcement Learning on Physics Simulators
标题:通过物理模拟器上的强化学习解决物理奥林匹克竞赛
链接:https://arxiv.org/abs/2604.11805

作者:Mihir Prabhudesai,Aryan Satpathy,Yangmin Li,Zheyang Qin,Nikash Bhardwaj,Amir Zadeh,Chuan Li,Katerina Fragkiadaki,Deepak Pathak
备注:Project Webpage - https://sim2reason.github.io/
摘要:We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathematics. In contrast, other sciences such as physics lack large-scale QA datasets to effectively train reasoning-capable models. In this work, we show that physics simulators can serve as a powerful alternative source of supervision for training LLMs for physical reasoning. We generate random scenes in physics engines, create synthetic question-answer pairs from simulated interactions, and train LLMs using reinforcement learning on this synthetic data. Our models exhibit zero-shot sim-to-real transfer to real-world physics benchmarks: for example, training solely on synthetic simulated data improves performance on IPhO (International Physics Olympiad) problems by 5-10 percentage points across model sizes. These results demonstrate that physics simulators can act as scalable data generators, enabling LLMs to acquire deep physical reasoning skills beyond the limitations of internet-scale QA data. Code available at: https://sim2reason.github.io/.


【2】Autonomous Diffractometry Enabled by Visual Reinforcement Learning
标题:通过视觉强化学习实现的自主衍射测量
链接:https://arxiv.org/abs/2604.11773

作者:J. Oppliger,M. Stifter,A. Rüegg,I. Biało,L. Martinelli,P. G. Freeman,D. Prabhakaran,J. Zhao,Q. Wang,J. Chang
备注:20 pages, 16 figures
摘要:Automation underpins progress across scientific and industrial disciplines. Yet, automating tasks requiring interpretation of abstract visual information remains challenging. For example, crystal alignment strongly relies on humans with the ability to comprehend diffraction patterns. Here we introduce an autonomous system that aligns single crystals without access to crystallography and diffraction theory. Using a model-free reinforcement learning framework, an agent learns to identify and navigate towards high-symmetry orientations directly from Laue diffraction patterns. Despite the absence of human supervision, the agent develops human-like strategies to achieve time-efficient alignment across different crystal symmetry classes. With this, we provide a computational framework for intelligent diffractometers. As such, our approach advances the development of automated experimental workflows in materials science.


【3】MADQRL: Distributed Quantum Reinforcement Learning Framework for Multi-Agent Environments
标题:MADQRL:用于多智能体环境的分布式量子强化学习框架
链接:https://arxiv.org/abs/2604.11131

作者:Abhishek Sawaika,Samuel Yen-Chi Chen,Udaya Parampalli,Rajkumar Buyya
备注:Accepted in QC4C3 Workshop at IEEE QCNC, 2026
摘要:Reinforcement learning (RL) is one of the most practical ways to learn from real-life use cases. Its motivation in the cognitive methods used by humans makes it a widely accepted strategy in the field of artificial intelligence. Most environments used for RL are high-dimensional, and traditional RL algorithms become computationally expensive and struggle to learn effectively from such systems. Recent advancements in the practical demonstration of quantum computing (QC) theories, such as compact encoding, enhanced representation and learning algorithms, random sampling, or the inherent stochastic nature of quantum systems, have opened up new directions to tackle these challenges. Quantum reinforcement learning (QRL) has gained significant traction over the past few years. However, the current state of quantum hardware is not sufficient for such high-dimensional environments with complex multi-agent setups. To tackle this issue, we propose a distributed framework for QRL where multiple agents learn independently, distributing the load of joint training from individual machines. Our method works well for environments with disjoint sets of action and observation spaces, but can also be extended to other systems with reasonable approximations. We analyze the proposed method on the cooperative-pong environment, and our results indicate a ~10% improvement over other distribution strategies and a ~5% improvement over classical models of policy representation.


【4】EvoNash-MARL: A Closed-Loop Multi-Agent Reinforcement Learning Framework for Medium-Horizon Equity Allocation
标题:EvoNash-MARL:用于中期股票配置的闭环多智能体强化学习框架
链接:https://arxiv.org/abs/2604.10911

作者:Chongliu Jia,Yi Luo,Sipeng Han,Pengwei Li,Jie Ding,Youshuang Hu,Yimiao Qian,Qiya Wang
摘要:Medium-to-long-horizon stock allocation presents significant challenges due to weak predictive structures, non-stationary market regimes, and the degradation of signals following the application of transaction costs, capacity limits, and tail-risk constraints. Conventional approaches commonly rely on a single predictor or loosely coupled prediction-to-allocation pipeline, limiting robustness under distribution shift. This work addresses a targeted design question: whether coupling reinforcement learning (RL), multi-agent policy populations, Policy-Space Response Oracle (PSRO)-style aggregation, league best-response training, evolutionary replacement, and execution-aware checkpoint selection within a unified walk-forward loop improves allocator robustness at medium to long horizons. The proposed framework, EvoNash-MARL, integrates these components within an execution-aware allocation loop and further introduces a layered policy architecture comprising a direction head and a risk head, nonlinear signal enhancement, feature-quality reweighting, and constraint-aware checkpoint selection. Under a 120-window walk-forward protocol, the resolved v21 configuration achieves mean excess Sharpe 0.7600 and robust score -0.0203, ranking first among internal controls; on aligned daily out-of-sample returns from 2014-01-02 to 2024-01-05, it delivers 19.6% annualized return versus 11.7% for SPY, and in an extended walk-forward evaluation through 2026-02-10 it delivers 20.5% versus 13.5%. The framework maintains positive performance under realistic stress constraints and exhibits structured cross-market generalization; however, global strong significance under White's Reality Check (WRC) and SPA-lite testing is not established. Therefore, the results are presented as evidence supporting a more stable medium- to long-horizon training and selection paradigm, rather than as proof of universally superior market-timing performance.


【5】PokeRL: Reinforcement Learning for Pokemon Red
标题:PokeRL:Pokemon Red的强化学习
链接:https://arxiv.org/abs/2604.10812

作者:Dheeraj Mudireddy,Sai Patibandla
摘要:Pokemon Red is a long-horizon JRPG with sparse rewards, partial observability, and quirky control mechanics that make it a challenging benchmark for reinforcement learning. While recent work has shown that PPO agents can clear the first two gyms using heavy reward shaping and engineered observations, training remains brittle in practice, with agents often degenerating into action loops, menu spam, or unproductive wandering. In this paper, we present PokeRL, a modular system that trains deep reinforcement learning agents to complete early game tasks in Pokemon Red, including exiting the player's house, exploring Pallet Town to reach tall grass, and winning the first rival battle. Our main contributions are a loop-aware environment wrapper around the PyBoy emulator with map masking, a multi-layer anti-loop and anti-spam mechanism, and a dense hierarchical reward design. We argue that practical systems like PokeRL, which explicitly model failure modes such as loops and spam, are a necessary intermediate step between toy benchmarks and full Pokemon League champion agents. Code is available at https://github.com/reddheeraj/PokemonRL


【6】MAVEN-T: Multi-Agent enVironment-aware Enhanced Neural Trajectory predictor with Reinforcement Learning
标题:MAVEN-T:多智能体环境感知的增强型神经轨迹预测器,具有强化学习
链接:https://arxiv.org/abs/2604.10169

作者:Wenchang Duan
摘要:Trajectory prediction remains a critical yet challenging component in autonomous driving systems, requiring sophisticated reasoning capabilities while meeting strict real-time deployment constraints. While knowledge distillation has demonstrated effectiveness in model compression, existing approaches often fail to preserve complex decision-making capabilities, particularly in dynamic multi-agent scenarios. This paper introduces MAVEN-T, a teacher-student framework that achieves state-of-the-art trajectory prediction through complementary architectural co-design and progressive distillation. The teacher employs hybrid attention mechanisms for maximum representational capacity, while the student uses efficient architectures optimized for deployment. Knowledge transfer is performed via multi-granular distillation with adaptive curriculum learning that dynamically adjusts complexity based on performance. Importantly, the framework incorporates reinforcement learning to overcome the imitation ceiling of traditional distillation, enabling the student to verify, refine, and optimize teacher knowledge through dynamic environmental interaction, potentially achieving more robust decision-making than the teacher itself. Extensive experiments on NGSIM and highD datasets demonstrate 6.2x parameter compression and 3.7x inference speedup while maintaining state-of-the-art accuracy, establishing a new paradigm for deploying sophisticated reasoning models under resource constraints.


【7】A Comparative Theoretical Analysis of Entropy Control Methods in Reinforcement Learning
标题:强化学习中的熵控制方法的比较理论分析
链接:https://arxiv.org/abs/2604.09676

作者:Ming Lei,Christophe Baehr
备注:13 pages
摘要:Reinforcement learning (RL) has become a key approach for enhancing reasoning in large language models (LLMs), yet scalable training is often hindered by the rapid collapse of policy entropy, which leads to premature convergence and performance saturation. This paper provides a comparative theoretical analysis of two entropy control strategies: traditional entropy regularization and the recently proposed covariance-based mechanism. We establish a unified framework for entropy dynamics under softmax parameterization, showing that entropy change is governed by the covariance between log-probabilities and logit updates. Our analysis reveals that traditional entropy regularization introduces a dense, persistent bias that modifies the stationary condition, leading to suboptimal policies, while covariance-based methods selectively regularize a sparse subset of high-covariance tokens and achieve asymptotic unbiasedness when the regularization coefficient is annealed. These results provide principled guidelines for entropy control in LLM posttraining, with implications for scaling RL to larger models and more complex reasoning tasks.
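
The covariance statement in this abstract can be checked numerically for a single softmax distribution. To first order in a logit update Δz, the entropy change is ΔH ≈ -Cov_p(log p, Δz). The sketch below compares that estimate with the exact change; the logits and the update are made-up values, and this covers only the identity itself, not the covariance-based regularizer analyzed in the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def predicted_entropy_change(logits, delta):
    """First-order estimate of the entropy change: -Cov_p(log p, delta)."""
    p = softmax(logits)
    logp = [math.log(pi) for pi in p]
    mean_logp = sum(pi * lp for pi, lp in zip(p, logp))
    mean_delta = sum(pi * d for pi, d in zip(p, delta))
    cov = sum(pi * (lp - mean_logp) * (d - mean_delta)
              for pi, lp, d in zip(p, logp, delta))
    return -cov

# Hypothetical logits and a small update that boosts the most likely token.
logits = [2.0, 1.0, 0.0, -1.0]
delta = [1e-4, 0.0, 0.0, 0.0]
updated = [z + d for z, d in zip(logits, delta)]
actual = entropy(softmax(updated)) - entropy(softmax(logits))
print(actual, predicted_entropy_change(logits, delta))
```

Raising the logit of an already likely token gives a positive covariance and therefore a negative entropy change, which matches the intuition behind entropy collapse during policy optimization.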


【8】Belief-State RWKV for Reinforcement Learning under Partial Observability
标题:部分可观测下的信念状态RWKV强化学习
链接:https://arxiv.org/abs/2604.09671

作者:Liu Xiao
摘要:We propose a stronger formulation of RL on top of RWKV-style recurrent sequence models, in which the fixed-size recurrent state is explicitly interpreted as a belief state rather than an opaque hidden vector. Instead of conditioning policy and value on a single summary h_t, we maintain a compact uncertainty-aware state b_t = (μ_t, Σ_t) derived from RWKV-style recurrent statistics and let control depend on both memory and uncertainty. This design targets a key weakness of plain fixed-state policies in partially observed settings: they may store evidence, but not necessarily confidence. We present the method, a theoretical program, and a pilot RL experiment with hidden episode-level observation noise together with a test-time noise sweep. The pilot shows that belief-state policies nearly match the best recurrent baseline overall while slightly improving return on the hardest in-distribution regime and under a held-out noise shift. Additional ablations show that this simple belief readout is currently stronger than two more structured extensions, namely gated memory control and privileged belief targets, underscoring the need for richer benchmarks.


符号|符号学习(1篇)

【1】NSFL: A Post-Training Neuro-Symbolic Fuzzy Logic Framework for Boolean Operators in Neural Embeddings
标题:NSFL:神经嵌入中布尔运算符的训练后神经符号模糊逻辑框架
链接:https://arxiv.org/abs/2604.10604

作者:Vladi Vexler,Ofer Idan,Gil Lederman,Dima Sivov
备注:23 pages (16 main + 7 appendix), 2 figures, 10 tables, 1 algorithm
摘要:Standard dense retrievers lack a native calculus for multi-atom logical constraints. We introduce Neuro-Symbolic Fuzzy Logic (NSFL), a framework that adapts formal t-norms and t-conorms to neural embedding spaces without requiring retraining. NSFL operates as a first-order hybrid calculus: it anchors logical operations on isolated zero-order similarity scores while actively steering representations using Neuro-Symbolic Deltas (NS-Delta) -- the first-order marginal differences derived from contextual fusion. This preserves pure atomic meaning while capturing domain reliance, preventing the representation collapse and manifold escape endemic to traditional geometric baselines. For scalable real-time retrieval, Spherical Query Optimization (SQO) leverages Riemannian optimization to project these fuzzy formulas into manifold-stable query vectors. Validated across six distinct encoder configurations and two modalities (including zero-shot and SOTA fine-tuned models), NSFL yields mAP improvements up to +81%. Notably, NSFL provides an additive 20% average and up to 47% boost even when applied to encoders explicitly fine-tuned for logical reasoning. By establishing a training-free, order-aware calculus for high-dimensional spaces, this framework lays the foundation for future dynamic scaling and learned manifold logic.
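
As background for the t-norms and t-conorms mentioned above, here is a minimal sketch of fuzzy Boolean operators applied to per-atom similarity scores. It uses the standard product t-norm and its dual probabilistic sum; it does not reproduce NSFL's NS-Delta steering or Spherical Query Optimization, and the documents, atom names, scores, and cosine rescaling are all hypothetical.

```python
def to_unit(cosine):
    """Map a cosine similarity in [-1, 1] to a truth value in [0, 1] (an assumed rescaling)."""
    return (cosine + 1.0) / 2.0

def fuzzy_and(a, b):
    """Product t-norm."""
    return a * b

def fuzzy_or(a, b):
    """Probabilistic sum, the t-conorm dual to the product t-norm."""
    return a + b - a * b

def fuzzy_not(a):
    """Standard fuzzy complement."""
    return 1.0 - a

def query_score(atoms):
    """Score the query: cat AND outdoors AND NOT night."""
    return fuzzy_and(fuzzy_and(atoms["cat"], atoms["outdoors"]), fuzzy_not(atoms["night"]))

# Hypothetical per-atom truth values, already rescaled to [0, 1].
docs = {
    "doc_a": {"cat": 0.9, "outdoors": 0.8, "night": 0.1},
    "doc_b": {"cat": 0.9, "outdoors": 0.2, "night": 0.7},
}
ranking = sorted(docs, key=lambda name: query_score(docs[name]), reverse=True)
print(ranking)
```

Because each atom is scored separately before the operators are applied, a document that matches one atom strongly cannot compensate for violating a negated atom, which a single fused query embedding does not guarantee.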


医学相关(9篇)

【1】MosaicMRI: A Diverse Dataset and Benchmark for Raw Musculoskeletal MRI
标题:MosaicMRI:原始肌肉骨骼MRI的多样化数据集和基准
链接:https://arxiv.org/abs/2604.11762

作者:Paula Arguello,Berk Tinaz,Mohammad Shahab Sepehri,Maryam Soltanolkotabi,Mahdi Soltanolkotabi
备注:15 pages, 6 figures, preliminary version
摘要:Deep learning underpins a wide range of applications in MRI, including reconstruction, artifact removal, and segmentation. However, progress has been driven largely by public datasets focused on brain and knee imaging, shaping how models are trained and evaluated. As a result, careful studies of the reliability of these models across diverse anatomical settings remain limited. In this work, we introduce MosaicMRI, a large and diverse collection of fully sampled raw musculoskeletal (MSK) MR measurements designed for training and evaluating machine-learning-based methods. MosaicMRI is the largest open-source raw MSK MRI dataset to date, comprising 2,671 volumes and 80,156 slices. The dataset offers substantial diversity in volume orientation (e.g., axial, sagittal), imaging contrasts (e.g., PD, T1, T2), anatomies (e.g., spine, knee, hip, ankle, and others), and numbers of acquisition coils. Using VarNet as a baseline for the accelerated reconstruction task, we perform a comprehensive set of experiments to study scaling behavior with respect to both model capacity and dataset size. Interestingly, models trained on the combined anatomies significantly outperform anatomy-specific models in low-sample regimes, highlighting the benefits of anatomical diversity and the presence of exploitable cross-anatomical correlations. We further evaluate robustness and cross-anatomy generalization by training models on one anatomy (e.g., spine) and testing them on another (e.g., knee). Notably, we identify groups of body parts (e.g., foot and elbow) that generalize well with each other, and highlight that performance under domain shifts depends on training set size, anatomy, and protocol-specific factors.


【2】From Answers to Arguments: Toward Trustworthy Clinical Diagnostic Reasoning with Toulmin-Guided Curriculum Goal-Conditioned Learning
标题:从答案到论证:通过图尔明引导的课程目标条件学习迈向值得信赖的临床诊断推理
链接:https://arxiv.org/abs/2604.11137

作者:Chen Zhan,Xiaoyu Tan,Gengchen Ma,Yu-Jie Xiong,Xiaoyan Jiang,Xihe Qiu
备注:Accepted at ACL 2026 (Main Conference)
摘要:The integration of Large Language Models (LLMs) into clinical decision support is critically obstructed by their opaque and often unreliable reasoning. In the high-stakes domain of healthcare, correct answers alone are insufficient; clinical practice demands full transparency to ensure patient safety and enable professional accountability. A pervasive and dangerous weakness of current LLMs is their tendency to produce "correct answers through flawed reasoning." This issue is far more than a minor academic flaw; such process errors signal a fundamental lack of robust understanding, making the model prone to broader hallucinations and unpredictable failures when faced with real-world clinical complexity. In this paper, we establish a framework for trustworthy clinical argumentation by adapting the Toulmin model to the diagnostic process. We propose a novel training pipeline: Curriculum Goal-Conditioned Learning (CGCL), designed to progressively train LLM to generate diagnostic arguments that explicitly follow this Toulmin structure. CGCL's progressive three-stage curriculum systematically builds a solid clinical argument: (1) extracting facts and generating differential diagnoses; (2) justifying a core hypothesis while rebutting alternatives; and (3) synthesizing the analysis into a final, qualified conclusion. We validate CGCL using T-Eval, a quantitative framework measuring the integrity of the diagnosis reasoning. Experiments show that our method achieves diagnostic accuracy and reasoning quality comparable to resource-intensive Reinforcement Learning (RL) methods, while offering a more stable and efficient training pipeline.


【3】Learning Preference-Based Objectives from Clinical Narratives for Sequential Treatment Decision-Making
标题:从临床叙述中学习基于偏好的目标以进行顺序治疗决策
链接:https://arxiv.org/abs/2604.10783

作者:Daniel J. Tan,Kay Choong See,Mengling Feng
摘要:Designing reward functions remains a central challenge in reinforcement learning (RL) for healthcare, where outcomes are sparse, delayed, and difficult to specify. While structured data capture physiological states, they often fail to reflect the overall quality of a patient's clinical trajectory, including recovery dynamics, treatment burden, and stability. Clinical narratives, in contrast, summarize longitudinal reasoning and implicitly encode evaluations of treatment effectiveness. We propose Clinical Narrative-informed Preference Rewards (CN-PR), a framework for learning reward functions directly from discharge summaries by treating them as scalable supervision for trajectory-level preferences. Using a large language model, we derive trajectory quality scores (TQS) and construct pairwise preferences over patient trajectories, enabling reward learning via a structured preference-based objective. To account for variability in narrative informativeness, we incorporate a confidence signal that weights supervision based on its relevance to the decision-making task. The learned reward aligns strongly with trajectory quality (Spearman rho = 0.63) and enables policies that are consistently associated with improved recovery-related outcomes, including increased organ support-free days and faster shock resolution, while maintaining comparable performance on mortality. These effects persist under external validation. Our results demonstrate that narrative-derived supervision provides a scalable and expressive alternative to handcrafted or outcome-based reward design for dynamic treatment regimes.


【4】Lung Cancer Detection Using Deep Learning
标题:使用深度学习检测肺癌
链接:https://arxiv.org/abs/2604.10765

作者:Imama Ajmi,Abhishek Das
备注:8 pages
摘要:Lung cancer, the second leading cause of cancer-related deaths, is primarily linked to long-term tobacco smoking (85% of cases). Surprisingly, 10-15% of cases occur in non-smokers. In 2020, approximately 2 million people were affected globally, resulting in 1.5 million deaths. The survival rate, at around 20%, lags behind other cancers, partly due to late-stage symptom manifestation. This necessitates early and accurate detection for effective treatment. In this paper, we discuss methodologies for lung cancer detection using different deep learning algorithms (InceptionV3, MobileNetV2, VGG16, ResNet152), which are explored for their efficacy in classifying lung cancer cases. Performance metrics such as accuracy, precision, recall (sensitivity), and F1-score are computed to provide a comprehensive evaluation of each model's capabilities. By comparing these metrics, this study offers insights into the strengths and limitations of each approach, contributing to the advancement of lung cancer detection techniques. Our proposed model is a 16-layer CNN-based architecture, and it exhibits several key highlights that contribute to its novelty. By integrating multiple layer types such as convolutional, pooling, flatten, dropout, fully connected and dense layers, the model leverages the strengths of each layer to enhance its predictive capabilities. The novelty of our proposed model is that its accuracy increases consistently with the number of epochs; we have tested the model performance up to epoch 30. Our proposed model also overcomes the overfitting problem.


【5】CARE-ECG: Causal Agent-based Reasoning for Explainable and Counterfactual ECG Interpretation
标题:CARE-ECG:基于因果智能体的推理,用于可解释和反事实心电图解读
链接:https://arxiv.org/abs/2604.10420

作者:Elahe Khatibi,Ziyu Wang,Ankita Sharma,Krishnendu Chakrabarty,Sanaz Rahimi Moosavi,Farshad Firouzi,Amir Rahmani
摘要:Large language models (LLMs) enable waveform-to-text ECG interpretation and interactive clinical questioning, yet most ECG-LLM systems still rely on weak signal-text alignment and retrieval without explicit physiological or causal structure. This limits grounding, temporal reasoning, and counterfactual "what-if" analysis central to clinical decision-making. We propose CARE-ECG, a causally structured ECG-language reasoning framework that unifies representation learning, diagnosis, and explanation in a single pipeline. CARE-ECG encodes multi-lead ECGs into temporally organized latent biomarkers, performs causal graph inference for probabilistic diagnosis, and supports counterfactual assessment via structural causal models. To improve faithfulness, CARE-ECG grounds language outputs through causal retrieval-augmented generation and a modular agentic pipeline that integrates history, diagnosis, and response with verification. Across multiple ECG benchmarks and expert QA settings, CARE-ECG improves diagnostic accuracy and explanation faithfulness while reducing hallucinations (e.g., 0.84 accuracy on Expert-ECG-QA and 0.76 on SCP-mapped PTB-XL under GPT-4). Overall, CARE-ECG provides traceable reasoning by exposing key latent drivers, causal evidence paths, and how alternative physiological states would change outcomes.


【6】End-to-end Automated Deep Neural Network Optimization for PPG-based Blood Pressure Estimation on Wearables
标题:端到端自动化深度神经网络优化,用于可穿戴设备上基于PPG的血压估计
链接:https://arxiv.org/abs/2604.10117

作者:Francesco Carlucci,Giovanni Pollo,Xiaying Wang,Massimo Poncino,Enrico Macii,Luca Benini,Sara Vinco,Alessio Burrello,Daniele Jahier Pagliari
摘要:Photoplethysmography (PPG)-based blood pressure (BP) estimation is a challenging task, particularly on resource-constrained wearable devices. However, fully on-board processing is desirable to ensure user data confidentiality. Recent deep neural networks (DNNs) have achieved high BP estimation accuracy by reconstructing BP waveforms or directly regressing BP values, but their large memory, computation, and energy requirements hinder deployment on wearables. This work introduces a fully automated DNN design pipeline that combines hardware-aware neural architecture search (NAS), pruning, and mixed-precision search (MPS) to generate accurate yet compact BP prediction models optimized for ultra-low-power multicore systems-on-chip (SoCs). Starting from state-of-the-art baseline models on four public datasets, our optimized networks achieve up to 7.99% lower error with a 7.5x parameter reduction, or up to 83x fewer parameters with negligible accuracy loss. All models fit within 512 kB of memory on our target SoC (GreenWaves' GAP8), requiring less than 55 kB and achieving an average inference latency of 142 ms and energy consumption of 7.25 mJ. Patient-specific fine-tuning further improves accuracy by up to 64%, enabling fully autonomous, low-cost BP monitoring on wearables.


【7】Improving Pediatric Emergency Department Triage with Modality Dropout in Late Fusion Multimodal EHR Models
标题:在后期融合多模态EHR模型中使用模态丢弃改善儿科急诊分诊
链接:https://arxiv.org/abs/2604.09905

作者:Tyler Yang,Romal Mitr
备注:10 pages, 4 figures, 4 tables
摘要:Emergency department triage relies heavily on both quantitative vital signs and qualitative clinical notes, yet multimodal machine learning models predicting triage acuity often suffer from modality collapse by over-relying on structured tabular data. This limitation severely hinders demographic generalizability, particularly for pediatric patients where developmental variations in vital signs make unstructured clinical narratives uniquely crucial. To address this gap, we propose a late-fusion multimodal architecture that processes tabular vitals via XGBoost and unstructured clinical text via Bio_ClinicalBERT, combined through a Logistic Regression meta-classifier to predict the 5-level Emergency Severity Index. To explicitly target the external validity problem, we train our model exclusively on adult encounters from the MIMIC-IV and NHAMCS datasets and evaluate its zero-shot generalization on a traditionally overlooked pediatric cohort. Furthermore, we employ symmetric modality dropout during training to prevent the ensemble from overfitting to adult-specific clinical correlations. Our results demonstrate that the multimodal framework significantly outperforms single-modality baselines. Most notably, applying a 30-40% symmetric modality dropout rate yielded steep performance improvements in the unseen pediatric cohort, elevating the Quadratic Weighted Kappa to 0.351. These findings highlight modality dropout as a critical regularization technique for mitigating modality collapse and enhancing cross-demographic generalization in clinical AI.
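As a rough illustration of modality dropout in a late-fusion setup, the sketch below builds the meta-classifier input from two base-model probability vectors and, with a given rate, blanks out exactly one of them. The zero-fill and the reading of "symmetric" as "either modality is equally likely to be dropped" are assumptions, not details from the abstract; `modality_dropout` is a hypothetical name.

```python
import random

def modality_dropout(tabular_probs, text_probs, rate, rng):
    """Return the fused meta-classifier input for one training sample.

    With probability `rate`, exactly one modality is replaced by zeros.
    Both modalities are equally likely to be dropped (assumed meaning of
    "symmetric"); zero-filling is also an assumption.
    """
    tab, txt = list(tabular_probs), list(text_probs)
    if rng.random() < rate:
        if rng.random() < 0.5:
            tab = [0.0] * len(tab)
        else:
            txt = [0.0] * len(txt)
    return tab + txt

# Hypothetical 5-level ESI probabilities from the two base models.
rng = random.Random(0)
tabular = [0.1, 0.2, 0.4, 0.2, 0.1]
text = [0.3, 0.3, 0.2, 0.1, 0.1]
fused = modality_dropout(tabular, text, rate=0.35, rng=rng)
```

The dropout would be applied only while fitting the meta-classifier; at evaluation time both modalities are passed through unchanged (rate 0).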


【8】Robust Fair Disease Diagnosis in CT Images
标题:CT图像中稳健公平的疾病诊断
链接:https://arxiv.org/abs/2604.09710

作者:Justin Li,Daniel Ding,Asmita Yuki Pritha,Aryana Hou,Xin Wang,Shu Hu
备注:8 pages, 3 figures, 2 tables. Accepted at the 3rd Workshop on New Trends in AI-Generated Media and Security (AIMS) @ CVPR 2026
摘要:Automated diagnosis from chest CT has improved considerably with deep learning, but models trained on skewed datasets tend to perform unevenly across patient demographics. However, the situation is worse than simple demographic bias. In clinical data, class imbalance and group underrepresentation often coincide, creating compound failure modes that neither standard rebalancing nor fairness corrections can fix alone. We introduce a two-level objective that targets both axes of this problem. Logit-adjusted cross-entropy loss operates at the sample level, shifting decision margins by class frequency with provable consistency guarantees. Conditional Value at Risk aggregation operates at the group level, directing optimization pressure toward whichever demographic group currently has the higher loss. We evaluate on the Fair Disease Diagnosis benchmark using a 3D ResNet-18 pretrained on Kinetics-400, classifying CT volumes into Adenocarcinoma, Squamous Cell Carcinoma, COVID-19, and Normal groups with patient sex annotations. The training set illustrates the compound problem concretely: squamous cell carcinoma has 84 samples total, 5 of them female. The combined loss reaches a gender-averaged macro F1 of 0.8403 with a fairness gap of 0.0239, a 13.3% improvement in score and 78% reduction in demographic disparity over the baseline. Ablations show that each component alone falls short. The code is publicly available at https://github.com/Purdue-M2/Fair-Disease-Diagnosis.
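The two-level objective described here can be sketched in a few lines. This is an illustration under stated assumptions: the sample-level term uses the common logit-adjustment form (adding `tau * log(prior)` to each logit before the softmax), and the group-level term takes the mean of the worst `ceil(alpha * n)` per-group losses. The paper's exact formulation, hyperparameters, and function names are not given in the abstract.

```python
import math

def logit_adjusted_ce(logits, label, class_priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(class prior)."""
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, class_priors)]
    m = max(adjusted)
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[label]

def cvar(group_losses, alpha):
    """Mean of the worst ceil(alpha * n) group losses, for 0 < alpha <= 1."""
    k = max(1, math.ceil(alpha * len(group_losses)))
    worst = sorted(group_losses, reverse=True)[:k]
    return sum(worst) / k

def two_level_loss(samples, class_priors, alpha=0.5, tau=1.0):
    """samples: iterable of (logits, label, group) tuples."""
    per_group = {}
    for logits, label, group in samples:
        loss = logit_adjusted_ce(logits, label, class_priors, tau)
        per_group.setdefault(group, []).append(loss)
    group_means = [sum(v) / len(v) for v in per_group.values()]
    return cvar(group_means, alpha)

# Hypothetical two-class batch with a rare class (prior 0.1) and two groups.
priors = [0.9, 0.1]
batch = [
    ([2.0, 0.0], 0, "male"),
    ([0.0, 0.0], 1, "female"),
]
loss = two_level_loss(batch, priors, alpha=0.5)
```

With two groups and `alpha = 0.5`, the aggregation reduces to the loss of the worse-off group, which matches the abstract's description of directing optimization pressure toward the group that currently has the higher loss.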


【9】Investigating Vaccine Buyer's Remorse: Post-Vaccination Decision Regret in COVID-19 Social Media Using Politically Diverse Human Annotation
标题:调查疫苗购买者的悔恨:COVID-19社交媒体中使用政治多元化的人类注释的疫苗接种后决策悔恨
链接:https://arxiv.org/abs/2604.09626

作者:Miles Stanley,Soumyajit Datta,Ashutosh Kumar,Ashiqur R. KhudaBukhsh
摘要:A significant gap exists in datasets regarding post-COVID-19 vaccination experiences, particularly "vaccine buyer's remorse". Understanding the prevalence and nature of vaccine regret, whether based on personal or vicarious experiences, is vital for addressing vaccine hesitancy and refining public health communication. In this paper, we curate a novel dataset from a large YouTube news corpus capturing COVID-19 vaccination experiences, and construct a benchmark subset focused on vaccine regret, annotated by a politically diverse panel to account for the subjective and often politicized nature of the topic. We utilize large language models (LLMs) to identify posts expressing vaccine regret, analyze the reasons behind this regret, and quantify its occurrence in both first- and second-person accounts. This paper aims to (1) quantify the prevalence of vaccine regret; (2) identify common reasons for this sentiment; (3) analyze differences between first-person and vicarious experiences; and (4) assess potential biases introduced by different LLMs. We find that while vaccine buyer's remorse appears in fewer than 2% of public discourse, it is disproportionately concentrated in vaccine-skeptic influencer communities and is predominantly expressed through first-person narratives citing adverse health events.


蒸馏|知识提取(3篇)

【1】CapBench: A Multi-PDK Dataset for Machine-Learning-Based Post-Layout Capacitance Extraction
标题:CapBench:一个用于基于机器学习的布局后电容提取的多PDK数据集
链接:https://arxiv.org/abs/2604.11202

作者:Hector R. Rodriguez,Jiechen Huang,Wenjian Yu
备注:Accepted at the 63rd ACM/IEEE Design Automation Conference (DAC '26). 7 pages, 5 figures
摘要:We present CapBench, a fully reproducible, multi-PDK dataset for capacitance extraction. The dataset is derived from open-source designs, including single-core CPUs, systems-on-chip, and media accelerators. All designs are fully placed and routed using 14 independent OpenROAD flow runs spanning three technology nodes: ASAP7, NanGate45, and Sky130HD. From these layouts, we extract 61,855 3D windows across three size tiers to enable transfer learning and scalability studies. High-fidelity capacitance labels are generated using RWCap, a state-of-the-art random-walk solver, and validated against the industry-standard Raphael, achieving a mean absolute error of 0.64% for total capacitance. Each window is pre-processed into density maps, graph representations, and point clouds. We evaluate 10 machine learning architectures that illustrate dataset usage and serve as baselines, including convolutional neural networks (CNNs), point cloud transformers, and graph neural networks (GNNs). CNNs demonstrate the lowest errors (1.75%), while GNNs are up to 41.4x faster but exhibit larger errors (10.2%), illustrating a clear accuracy-speed trade-off. Code and dataset are available at https://github.com/THU-numbda/CapBench.


【2】Quantum-Gated Task-interaction Knowledge Distillation for Pre-trained Model-based Class-Incremental Learning
标题:用于基于预训练模型的类增量学习的量子门控任务交互知识蒸馏
链接:https://arxiv.org/abs/2604.11112

作者:Linjie Li,Huiyu Xiao,Jiarui Cao,Zhenyu Wu,Yang Ji
备注:Accepted to CVPR2026
摘要:Class-incremental learning (CIL) aims to continuously accumulate knowledge from a stream of tasks and construct a unified classifier over all seen classes. Although pretrained models (PTMs) have shown promising performance in CIL, they still struggle with the entanglement of multi-task subspaces, leading to catastrophic forgetting when task routing parameters are poorly calibrated or task-level representations are rigidly fixed. To address this issue, we propose a novel Quantum-Gated Task-interaction Knowledge Distillation (QKD) framework that leverages quantum gating to guide inter-task knowledge transfer. Specifically, we introduce a quantum-gated task modulation mechanism to model the relational dependencies among task embeddings, dynamically capturing the sample-to-task relevance for both joint training and inference across streaming tasks. Guided by the quantum gating outputs, we perform task-interaction knowledge distillation from old to new adapters using these task-embedding-level correlation weights, enabling the model to bridge the representation gaps between independent task subspaces. Extensive experiments demonstrate that QKD effectively mitigates forgetting and achieves state-of-the-art performance.


【3】Omnimodal Dataset Distillation via High-order Proxy Alignment
标题:通过高阶代理对齐的全模态数据集蒸馏
链接:https://arxiv.org/abs/2604.10666

作者:Yuxuan Gao,Xiaohao Liu,Xiaobo Xia,Tongliang Liu
摘要:Dataset distillation compresses large-scale datasets into compact synthetic sets while preserving training performance, but existing methods are largely restricted to single-modal or bimodal settings. Extending dataset distillation to scenarios involving more than two modalities, i.e., Omnimodal Dataset Distillation, remains underexplored and challenging due to increased heterogeneity and complex cross-modal interactions. In this work, we identify the key determinant that bounds the endpoint discrepancy in the omnimodal setting, which is exacerbated with an increasing number of modalities. To this end, we propose HoPA, a unified method that captures high-order cross-modal alignments via a compact proxy, which is compatible with trajectory matching as well. By abstracting omnimodal alignment with a shared similarity structure, our method avoids the combinatorial complexity of pairwise modality modeling and enables scalable joint distillation across heterogeneous modalities. Theoretical analysis from the spectral perspective reveals the rationality of our proposed method against bimodal dataset distillation techniques. Extensive experiments on various benchmarks demonstrate that the proposed method achieves superior compression-performance trade-offs compared to existing competitors. The source code will be publicly released.


聚类(2篇)

【1】Distributionally Robust K-Means Clustering
标题:分布鲁棒K均值聚类
链接:https://arxiv.org/abs/2604.11118

作者:Vikrant Malik,Taylan Kargin,Babak Hassibi
摘要:K-means clustering is a workhorse of unsupervised learning, but it is notoriously brittle to outliers, distribution shifts, and limited sample sizes. Viewing k-means as Lloyd--Max quantization of the empirical distribution, we develop a distributionally robust variant that protects against such pathologies. We posit that the unknown population distribution lies within a Wasserstein-2 ball around the empirical distribution. In this setting, one seeks cluster centers that minimize the worst-case expected squared distance over this ambiguity set, leading to a minimax formulation. A tractable dual yields a soft-clustering scheme that replaces hard assignments with smoothly weighted ones. We propose an efficient block coordinate descent algorithm with provable monotonic decrease and local linear convergence. Experiments on standard benchmarks and large-scale synthetic data demonstrate substantial gains in outlier detection and robustness to noise.


【2】CodeQuant: Unified Clustering and Quantization for Enhanced Outlier Smoothing in Low-Precision Mixture-of-Experts
标题:CodeQuant:统一聚类和量化,以增强低精度混合专家中的离群值平滑
链接:https://arxiv.org/abs/2604.10496

作者:Xiangyang Yin,Xingyu Liu,Tianhua Xia,Bo Bao,Vithursan Thangarasa,Valavan Manohararajah,Eric Sather,Sai Qian Zhang
摘要:Outliers have emerged as a fundamental bottleneck in preserving accuracy for low-precision large models, particularly within Mixture-of-Experts (MoE) architectures that are increasingly central to large-scale language modeling. Under post-training quantization (PTQ), these outliers induce substantial quantization errors, leading to severe accuracy degradation. While recent rotation-based smoothing techniques alleviate the problem by redistributing outlier magnitudes, residual errors remain and continue to impede reliable low-precision deployment. In this work, we tackle this challenge by introducing CodeQuant, a unified quantization-and-clustering scheme that smooths activation outliers via learnable rotation and absorbs weight outliers into fine-tuned cluster centroids for MoE. This design reduces the influence of extreme values by fitting them within cluster centroids, thereby lowering quantization error while maintaining expressive capacity. Coupled with a dedicated kernel design for GPU and CPU, CodeQuant achieves up to $4.15\times$ speedup while delivering significantly higher accuracy than state-of-the-art quantization approaches across diverse MoE models. Our results highlight CodeQuant as a promising direction for efficient and accurate deployment of MoE-based large language models under low-precision constraints. Our code is available at https://github.com/SAI-Lab-NYU/CodeQuant.


超分辨率|去噪|去模糊|去雾(1篇)

【1】Cross-Validated Cross-Channel Self-Attention and Denoising for Automatic Modulation Classification
标题:交叉验证的跨通道自注意和去噪用于自动调制分类
链接:https://arxiv.org/abs/2604.10054

作者:Prakash Suman,Yanzhen Qu
摘要:This study addresses a key limitation in deep learning Automatic Modulation Classification (AMC) models, which perform well at high signal-to-noise ratios (SNRs) but degrade under noisy conditions due to conventional feature extraction suppressing both discriminative structure and interference. The goal was to develop a feature-preserving denoising method that mitigates the loss of modulation class separation. A deep learning AMC model was proposed, incorporating a cross-channel self-attention block to capture dependencies between in-phase and quadrature components, along with dual-path deep residual shrinkage denoising blocks to suppress noise. Experiments using the RML2018.01a dataset employed stratified sampling across 24 modulation types and 26 SNR levels. Results showed that denoising depth strongly influences robustness at low and moderate SNRs. Compared to benchmark models PET-CGDNN, MCLDNN, and DAE, the proposed model achieved notable accuracy improvements across -8 dB to +2 dB SNR, with increases of 3%, 2.3%, and 14%, respectively. Cross-validation confirmed the model's robustness, yielding a mean accuracy of 62.6%, macro precision of 65.8%, macro-recall of 62.6%, and macro-F1 score of 62.9%. The architecture advances interference-aware AMC by formalizing baseband modeling as orthogonal subproblems and introducing cross-channel attention as a generalized complex interaction operator, with ablations confirming the critical role of feature-preserving denoising for robustness at low-to-medium SNR.


自动驾驶|车辆|车道检测等(3篇)

【1】Human Centered Non Intrusive Driver State Modeling Using Personalized Physiological Signals in Real World Automated Driving
标题:在现实世界自动驾驶中使用个性化生理信号的以人为本的非侵入性驾驶员状态建模
链接:https://arxiv.org/abs/2604.11549

作者:David Puertas-Ramirez,Raul Fernandez-Matellan,David Martin Gomez,Jesus G. Boticario
备注:17 pages (including references), 4 Figures, 4 Tables
摘要:In vehicles with partial or conditional driving automation (SAE Levels 2-3), the driver remains responsible for supervising the system and responding to take-over requests. Therefore, reliable driver monitoring is essential for safe human-automation collaboration. However, most existing Driver Monitoring Systems rely on generalized models that ignore individual physiological variability. In this study, we examine the feasibility of personalized driver state modeling using non-intrusive physiological sensing during real-world automated driving. We conducted experiments in an SAE Level 2 vehicle using an Empatica E4 wearable sensor to capture multimodal physiological signals, including electrodermal activity, heart rate, temperature, and motion data. To leverage deep learning architectures designed for images, we transformed the physiological signals into two-dimensional representations and processed them using a multimodal architecture based on pre-trained ResNet50 feature extractors. Experiments across four drivers demonstrate substantial interindividual variability in physiological patterns related to driver awareness. Personalized models achieved an average accuracy of 92.68%, whereas generalized models trained on multiple users dropped to an accuracy of 54%, revealing substantial limitations in cross-user generalization. These results underscore the necessity of adaptive, personalized driver monitoring systems for future automated vehicles and imply that autonomous systems should adapt to each driver's unique physiological profile.


【2】Towards Situation-aware State Modeling for Air Traffic Flow Prediction
标题:空中交通流量预测的态势感知状态建模
链接:https://arxiv.org/abs/2604.11198

作者:Anqi Liu,Bin Wang,Jiangtao Zhao,Dechuan Ma,Guiyuan Jiang,Feng Hong,Yanwei Yu,Tianrui Li
摘要:Accurate air traffic prediction in the terminal airspace (TA) is pivotal for proactive air traffic management (ATM). However, existing data-driven approaches predominantly rely on time series-based forecasting paradigms, which inherently overlook critical aircraft state information, such as real-time kinematics and proximity to airspace boundaries. To address this limitation, we propose AeroSense, a direct state-to-flow modeling framework for air traffic prediction. Unlike classical time series-based methods that first aggregate aircraft trajectories into macroscopic flow sequences before modeling, AeroSense explicitly represents the real-time airspace situation as a dynamic set of aircraft states, enabling the direct processing of a variable number of aircraft instead of time series as inputs. Specifically, we introduce a situation-aware state representation that enables AeroSense to sense the instantaneous terminal airspace situation directly from microscopic aircraft states. Furthermore, we design a model architecture that incorporates masked self-attention to capture inter-aircraft interactions, together with two decoupled prediction heads to model heterogeneous flow dynamics across two key functional areas of the TA. Extensive experiments on a large-scale real-world airport dataset demonstrate that AeroSense consistently achieves state-of-the-art performance, validating that direct modeling of microscopic aircraft states yields substantially higher predictive fidelity than time series-based baselines. Moreover, the proposed framework exhibits superior robustness during peak traffic periods, achieves Pareto-optimal performance under dayparting multi-object evaluation, and provides meaningful interpretability through attention-based visualizations.


【3】Enhancing Cross-Problem Vehicle Routing via Federated Learning
标题:通过联邦学习增强跨问题车辆路线
链接:https://arxiv.org/abs/2604.10652

作者:Xiangchi Meng,Jianan Zhou,Jie Gao,Yifan Lu,Yaoxin Wu,Gonglin Yuan,Yaqing Hou
摘要:Vehicle routing problems (VRPs) constitute a core optimization challenge in modern logistics and supply chain management. Recent neural combinatorial optimization (NCO) methods have demonstrated superior efficiency over some traditional algorithms. Although cross-problem learning serves as a primary NCO approach for solving general VRPs, current paradigms are still subject to performance degradation and generalizability decay when transferring from simple VRP variants to those involving different and complex constraints. To strengthen these paradigms, this paper offers an innovative "Multi-problem Pre-train, then Single-problem Fine-tune" framework with Federated Learning (MPSF-FL). This framework exploits the common knowledge of a federated global model to foster efficient cross-problem knowledge sharing and transfer among local models for single-problem fine-tuning. In this way, local models effectively retain common VRP knowledge from the up-to-date global model, while being efficiently adapted to downstream VRPs with heterogeneous complex constraints. Experimental results demonstrate that our framework not only enhances performance across diverse VRPs, but also improves generalizability to unseen problems.


点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】WOODELF-HD: Efficient Background SHAP for High-Depth Decision Trees
标题:WOODELF-HD:高深度决策树的高效背景SHAP
链接:https://arxiv.org/abs/2604.10569

作者:Ron Wettenstein,Alexander Nadel,Udi Boker
备注:15 pages (including 6-page appendix), 9 figures
摘要:Decision-tree ensembles are a cornerstone of predictive modeling, and SHAP is a standard framework for interpreting their predictions. Among its variants, Background SHAP offers high accuracy by modeling missing features using a background dataset. Historically, this approach did not scale well, as the time complexity for explaining n instances using m background samples included an O(mn) component. Recent methods such as Woodelf and PLTreeSHAP reduce this to O(m+n), but introduce a preprocessing bottleneck that grows as 3^D with tree depth D, making them impractical for deep trees. We address this limitation with WoodelfHD, a Woodelf extension that reduces the 3^D factor to 2^D. The key idea is a Strassen-like multiplication scheme that exploits the structure of Woodelf matrices, reducing matrix-vector multiplication from O(k^2) to O(k*log(k)) via a fully vectorized, non-recursive implementation. In addition, we merge path nodes with identical features, reducing cache size and memory usage. When running on standard environments, WoodelfHD enables exact Background SHAP computation for trees with depths up to 21, where previous methods fail due to excessive memory usage. For ensembles of depths 12 and 15, it achieves speedups of 33x and 162x, respectively, over the state-of-the-art.


联邦学习|隐私保护|加密(6篇)

【1】Representation-Aligned Multi-Scale Personalization for Federated Learning
标题:面向联邦学习的表示对齐多尺度个性化
链接:https://arxiv.org/abs/2604.11278

作者:Wenfei Liang,Wee Peng Tay
摘要:In federated learning (FL), accommodating clients with diverse resource constraints remains a significant challenge. A widely adopted approach is to use a shared full-size model, from which each client extracts a submodel aligned with its computational budget. However, regardless of the specific scoring strategy, these methods rely on the same global backbone, limiting both structural diversity and representational adaptation across clients. This paper presents FRAMP, a unified framework for personalized and resource-adaptive federated learning. Instead of relying on a fixed global model, FRAMP generates client-specific models from compact client descriptors, enabling fine-grained adaptation to both data characteristics and computational budgets. Each client trains a tailored lightweight submodel and aligns its learned representation with others to maintain global semantic consistency. Extensive experiments on vision and graph benchmarks demonstrate that FRAMP enhances generalization and adaptivity across a wide range of client settings.


【2】A Full Compression Pipeline for Green Federated Learning in Communication-Constrained Environments
标题:通信受限环境中绿色联邦学习的完整压缩管道
链接:https://arxiv.org/abs/2604.11146

作者:Elouan Colybes,Shririn Salehi,Anke Schmeink
备注:This work was accepted at IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN), 2026
摘要:Federated Learning (FL) enables collaborative model training across distributed clients without sharing raw data, thereby preserving privacy. However, FL often suffers from significant communication and computational overhead, limiting its scalability and sustainability. In this work, we introduce a Full Compression Pipeline (FCP) for FL in communication-constrained environments. FCP integrates three complementary deep compression techniques (pruning, quantization, and Huffman encoding) into a unified end-to-end framework. By compressing local models and communication payloads, FCP substantially reduces transmission costs and resource consumption while maintaining competitive accuracy. To quantify its impact, we develop an evaluation framework that captures both communication and computation overheads as a unified model cost, allowing a holistic assessment of efficiency trade-offs. The pipeline is evaluated in both independent and identically distributed (IID) and non-IID data settings. In one representative scenario, training a ResNet-12 model on the CIFAR-10 dataset with ten clients and a 2 Mbps bandwidth, the FCP achieves a more than 11$\times$ reduction in model size, with only a 2% drop in accuracy compared to the uncompressed baseline. This results in FL training that is more than 60% faster.
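The three stages named in this abstract (pruning, quantization, Huffman encoding) can be chained in a small stdlib-only sketch. The details are assumptions chosen for illustration, not taken from the paper: magnitude pruning, symmetric uniform quantization, and a 32-bit float baseline for the size comparison.

```python
import heapq
from collections import Counter

def prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def quantize(weights, bits):
    """Symmetric uniform quantization to signed integers of `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    peak = max(abs(w) for w in weights)
    if peak == 0:
        return [0] * len(weights), 1.0
    scale = peak / levels
    return [round(w / scale) for w in weights], scale

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap items: (frequency, unique tie-breaker, {symbol: depth so far}).
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

def compressed_bits(weights, sparsity, bits):
    """Payload size in bits after prune -> quantize -> Huffman coding."""
    codes, _scale = quantize(prune(weights, sparsity), bits)
    lengths = huffman_code_lengths(codes)
    return sum(lengths[c] for c in codes)

# Hypothetical layer of eight weights, compared against 32-bit floats.
layer = [0.01, -0.02, 0.5, -0.7, 0.03, 0.9, -0.04, 0.6]
payload = compressed_bits(layer, sparsity=0.5, bits=4)
baseline = 32 * len(layer)
```

Pruning makes zero the dominant symbol, so Huffman coding assigns it a short code. The sketch counts only the coded weights and ignores the code table and scale factor that a real payload would also carry.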


【3】Task2vec Readiness: Diagnostics for Federated Learning from Pre-Training Embeddings
标题:Task2vec就绪度:基于预训练嵌入的联邦学习诊断
链接:https://arxiv.org/abs/2604.10849

作者:Cristiano Mafuz,Rodrigo Silva
摘要:Federated learning (FL) performance is highly sensitive to heterogeneity across clients, yet practitioners lack reliable methods to anticipate how a federation will behave before training. We propose readiness indices, derived from Task2Vec embeddings, that quantify the alignment of a federation prior to training and correlate with its eventual performance. Our approach computes unsupervised metrics -- such as cohesion, dispersion, and density -- directly from client embeddings. We evaluate these indices across diverse datasets (CIFAR-10, FEMNIST, PathMNIST, BloodMNIST) and client counts (10--20), under Dirichlet heterogeneity levels spanning $\alpha \in \{0.05,\dots,5.0\}$ and the FedAVG aggregation strategy. Correlation analyses show consistent and significant Pearson and Spearman coefficients between some of the Task2Vec-based readiness indices and final performance, with values often exceeding 0.9 across dataset$\times$client configurations, validating this approach as a robust proxy for FL outcomes. These findings establish Task2Vec-based readiness as a principled, pre-training diagnostic for FL that may offer both predictive insight and actionable guidance for client selection in heterogeneous federations.


【4】Communication-Efficient Gluon in Federated Learning
标题:联邦学习中通信高效的Gluon
链接:https://arxiv.org/abs/2604.10689

作者:Xun Qian,Alexander Gaponov,Grigory Malinovsky,Peter Richtárik
备注:48 pages, 8 figures
摘要:Recent developments have shown that Muon-type optimizers based on linear minimization oracles (LMOs) over non-Euclidean norm balls have the potential to achieve better practical performance than Adam-type methods in the training of large language models. Since large-scale neural networks are trained across many machines, communication cost becomes the bottleneck. To address this bottleneck, we investigate Gluon, which is an extension of Muon under the more general layer-wise $(L^0, L^1)$-smooth setting, with both unbiased and contraction compressors. To reduce the compression error, we employ the variance-reduction technique from SARAH in our compressed methods. Convergence rates and improved communication costs are established under certain conditions. As a byproduct, a new variance-reduced algorithm with a faster convergence rate than Gluon is obtained. We also incorporate momentum variance reduction (MVR) into these compressed algorithms, and a comparable communication cost is derived under weaker conditions when $L_i^1 \neq 0$. Finally, several numerical experiments are conducted to verify the superior performance of our compressed algorithms in terms of communication cost.
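The two compressor families mentioned in this abstract have standard textbook representatives, shown below as a sketch. Top-k is a contraction compressor and scaled rand-k is an unbiased one. Whether the paper uses these particular operators is not stated in the abstract, so treat the choice as an assumption.

```python
import itertools
import random

def sq_norm(x):
    """Squared Euclidean norm."""
    return sum(v * v for v in x)

def top_k(x, k):
    """Contraction compressor: keep the k largest-magnitude entries."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]

def rand_k_from_indices(x, keep):
    """Keep the given coordinates, rescaled by d/k so the mean is preserved."""
    d, k = len(x), len(keep)
    kept = set(keep)
    return [v * d / k if i in kept else 0.0 for i, v in enumerate(x)]

def rand_k(x, k, rng):
    """Unbiased compressor: keep k coordinates chosen uniformly at random."""
    return rand_k_from_indices(x, rng.sample(range(len(x)), k))

def expected_rand_k(x, k):
    """Exact expectation of rand_k, by enumerating every index subset."""
    subsets = list(itertools.combinations(range(len(x)), k))
    total = [0.0] * len(x)
    for keep in subsets:
        for i, v in enumerate(rand_k_from_indices(x, keep)):
            total[i] += v
    return [t / len(subsets) for t in total]

# Hypothetical gradient vector.
g = [3.0, -1.0, 0.5, 2.0]
sparse = top_k(g, 2)
error = sq_norm([a - b for a, b in zip(sparse, g)])
bound = (1 - 2 / len(g)) * sq_norm(g)
sample = rand_k(g, 2, random.Random(0))
```

Top-k satisfies the contraction bound `||C(x) - x||^2 <= (1 - k/d) ||x||^2`, while rand-k has expectation exactly `x` at the cost of higher variance. That variance is the kind of compression error that variance-reduction techniques aim to offset.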


【5】Energy-Efficient Federated Edge Learning For Small-Scale Datasets in Large IoT Networks
标题:大型物联网网络中小规模数据集的节能联邦边缘学习
链接:https://arxiv.org/abs/2604.10662

作者:Haihui Xie,Wenkun Wen,Shuwu Chen,Zhaogang Shu,Minghua Xia
备注:16 pages, 9 figures. To appear in IEEE TWC
摘要:Large-scale Internet of Things (IoT) networks enable intelligent services such as smart cities and autonomous driving, but often face resource constraints. Collecting heterogeneous sensory data, especially in small-scale datasets, is challenging, and independent edge nodes can lead to inefficient resource utilization and reduced learning performance. To address these issues, this paper proposes a collaborative optimization framework for energy-efficient federated edge learning with small-scale datasets. We first derive an expected learning loss to quantify the relationship between the number of training samples and learning objectives. A stochastic online learning algorithm is then designed to adapt to data variations, and a resource optimization problem with a convergence bound is formulated. Finally, an online distributed algorithm efficiently solves large-scale optimization problems with high scalability. Extensive simulations and autonomous navigation case studies with collision avoidance demonstrate that the proposed approach significantly improves learning performance and resource efficiency compared to state-of-the-art benchmarks.


【6】FEDBUD: Joint Incentive and Privacy Optimization for Resource-Constrained Federated Learning
标题:FEDBUD:联合激励和隐私优化的资源受限联邦学习
链接:https://arxiv.org/abs/2604.10499

作者:Tao Liu,Xuehe Wang
摘要:Federated learning has become a popular paradigm for privacy protection and edge-based machine learning. However, defending against differential attacks and devising incentive strategies remain significant bottlenecks in this field. Despite recent works on privacy-aware incentive mechanism design for federated learning, few of them consider both data volume and noise level. In this paper, we propose a novel federated learning system called FEDBUD, which combines privacy and economic concerns together by considering the joint influence of data volume and noise level on incentive strategy determination. In this system, the cloud server controls monetary payments to edge nodes, while edge nodes control data volume and noise level that potentially impact the model performance of the cloud server. To determine the mutually optimal strategies for both sides, we model FEDBUD as a two-stage Stackelberg Game and derive the Nash Equilibrium using the mean-field estimator and virtual queue. Experimental results on real-world datasets demonstrate the outstanding performance of FEDBUD.


Reasoning | Analysis | Understanding | Explanation (16 papers)

【1】Towards Autonomous Mechanistic Reasoning in Virtual Cells
Link: https://arxiv.org/abs/2604.11661

Authors: Yunhui Jang, Lu Zhu, Jake Fawkes, Alisandra Kaye Denton, Dominique Beaini, Emmanuel Noutahi
Abstract: Large language models (LLMs) have recently gained significant attention as a promising approach to accelerate scientific discovery. However, their application in open-ended scientific domains such as biology remains limited, primarily due to the lack of factually grounded and actionable explanations. To address this, we introduce a structured explanation formalism for virtual cells that represents biological reasoning as mechanistic action graphs, enabling systematic verification and falsification. Building upon this, we propose VCR-Agent, a multi-agent framework that integrates biologically grounded knowledge retrieval with a verifier-based filtering approach to generate and validate mechanistic reasoning autonomously. Using this framework, we release the VC-TRACES dataset, which consists of verified mechanistic explanations derived from the Tahoe-100M atlas. Empirically, we demonstrate that training with these explanations improves factual precision and provides a more effective supervision signal for downstream gene expression prediction. These results underscore the importance of reliable mechanistic reasoning for virtual cells, achieved through the synergy of a multi-agent framework and rigorous verification.


【2】Inter-Layer Hessian Analysis of Neural Networks with DAG Architectures
Link: https://arxiv.org/abs/2604.11639

Authors: Maxim Bolshim, Alexander Kugaevskikh
Comments: 45 pages, 9 figures, 17 tables. Submitted to Neural Networks (Elsevier). Code: https://github.com/comiam/dag-hesse
Abstract: Modern automatic differentiation frameworks (JAX, PyTorch) return the Hessian of the loss function as a monolithic tensor, without exposing the internal structure of inter-layer interactions. This paper presents an analytical formalism that explicitly decomposes the full Hessian into blocks indexed by the DAG of an arbitrary architecture. The canonical decomposition $H = H^{GN} + H^T$ separates the Gauss-Newton component (convex part) from the tensor component (residual curvature responsible for saddle points). For piecewise-linear activations (ReLU), the tensor component of the input Hessian vanishes ($H^{T}_{v,w} \equiv 0$ a.e., $H^f_{v,w} = H^{GN}_{v,w} \succeq 0$); the full parametric Hessian contains residual terms that do not reduce to the GGN. Building on this decomposition, we introduce diagnostic metrics (inter-layer resonance $\mathcal{R}$, geometric coupling $\mathcal{C}$, stable rank $\mathcal{D}$, GN-Gap) that are estimated stochastically in $O(P)$ time and reveal structural curvature interactions between layers. The theoretical analysis explains exponential decay of resonance in vanilla networks and its preservation under skip connections; empirical validation spans fully connected MLPs (Exp. 1-5) and convolutional architectures (ResNet-18, ${\sim}11$M parameters, Exp. 6). When the architecture reduces to a single node, all definitions collapse to the standard Hessian $\nabla^2_\theta\mathcal{L}(\theta)\in\mathbb{R}^{p\times p}$.
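
For orientation, the single-node case of this split can be written out with the chain rule. This is the standard Gauss-Newton decomposition of a composite loss, not the paper's DAG-indexed block formalism; the symbols $f$, $\ell$, $J$, and $m$ are notation assumed here for illustration.

```latex
% Assumed notation: network output f(\theta) \in \mathbb{R}^m,
% loss \mathcal{L}(\theta) = \ell(f(\theta)), Jacobian J = \partial f / \partial \theta.
\nabla^2_\theta \mathcal{L}(\theta)
  = \underbrace{J^\top \, \nabla^2_f \ell \, J}_{H^{GN}}
  + \underbrace{\sum_{k=1}^{m} \frac{\partial \ell}{\partial f_k} \, \nabla^2_\theta f_k}_{H^{T}}
```

The first term is positive semidefinite whenever $\ell$ is convex in $f$. The second term carries the curvature of the network map itself; for a piecewise-linear network viewed as a function of its input, each $\nabla^2_x f_k$ is zero almost everywhere, which matches the abstract's statement about the input Hessian.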


【3】A Triadic Suffix Tokenization Scheme for Numerical Reasoning
Link: https://arxiv.org/abs/2604.11582

Authors: Olga Chetverina
Comments: 8 pages, 1 figure. This is a theoretical proposal of a novel number tokenization scheme for LLMs. The code is available on GitHub. Previous version archived at Zenodo: DOI 10.5281/zenodo.18999577
Abstract: Standard subword tokenization methods fragment numbers inconsistently, causing large language models (LLMs) to lose positional and decimal structure, a primary driver of errors in arithmetic and scientific reasoning. We introduce Triadic Suffix Tokenization (TST), a deterministic scheme that partitions digits into three-digit triads and annotates each triad with an explicit magnitude marker. Critically, the scheme defines a fixed, one-to-one mapping between suffixes and orders of magnitude for the integer part (thousands, millions, billions, etc.) and a parallel system of replicated markers for fractional depth (tenths, thousandths, millionths, etc.). Unlike approaches that rely on positional inference, this method provides a consistent gradient signal, which should ensure stable convergence. Two implementation variants are proposed: (1) a vocabulary-based approach that adds at most 10,000 fixed tokens to an existing vocabulary, covering 33 orders of magnitude ($10^{-15}$ to $10^{18}$); and (2) a suffix-marker approach that uses a small set of special tokens to denote magnitude dynamically. Both variants preserve exact digits while making order-of-magnitude relationships transparent at the token level. The framework is inherently scalable, allowing for linear vocabulary expansion to accommodate arbitrary precision and range. TST is architecture-agnostic and can be integrated as a drop-in preprocessing step. Experimental validation is deferred to future work.
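
The triad-plus-marker idea can be illustrated with a minimal sketch for the integer part only. The marker strings (`<U>`, `<K>`, ...) and the function name `triadic_tokens` are hypothetical; the abstract does not give the paper's actual token names, and fractional-depth markers are left out here.

```python
# Hypothetical magnitude markers: units, thousands, millions, billions, trillions.
# The paper's real marker vocabulary is not stated in the abstract.
INT_MARKERS = ["<U>", "<K>", "<M>", "<B>", "<T>"]


def triadic_tokens(integer: str) -> list:
    """Split a non-negative decimal integer into triads tagged with magnitude markers."""
    if not integer.isdigit():
        raise ValueError("expected a non-negative integer written in decimal digits")
    digits = integer.lstrip("0") or "0"
    # Group from the right so every triad except possibly the leading one has three digits.
    triads = []
    while digits:
        triads.append(digits[-3:])
        digits = digits[:-3]
    if len(triads) > len(INT_MARKERS):
        raise ValueError("number exceeds the marker table of this sketch")
    # triads[0] is the units group; emit the most significant group first.
    return [f"{triad}{INT_MARKERS[i]}" for i, triad in reversed(list(enumerate(triads)))]


print(triadic_tokens("1234567"))
```

Each token keeps its exact digits, and the marker alone tells the model which order of magnitude the triad belongs to, independent of its position in the sequence.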


【4】Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books
Link: https://arxiv.org/abs/2604.11435

Authors: Argyrios Papoudakis, Mirella Lapata, Frank Keller
Comments: 20 pages, 16 tables, 1 figure
Abstract: Character description generation is an important capability for narrative-focused applications such as summarization, story analysis, and character-driven simulations. However, generating accurate character descriptions from long-form narratives (e.g., novels) is challenging: models must track evolving attributes (e.g., relationships and events), integrate evidence scattered across the text, and infer implicit details. Despite the success of reasoning-enabled LLMs on many benchmarks, we find that for character description generation their performance improves when built-in reasoning is disabled (i.e., an empty reasoning trace). Motivated by this, we propose a training framework that decouples reasoning from generation. Our approach, which can be applied on top of long-context LLMs or chunk-based methods, consists of a reasoning model that produces a structured QA reasoning trace and a generation model that conditions on this trace to produce the final character description. Experiments on two datasets (BookWorm and CroSS) show that QA-guided reasoning improves faithfulness, informativeness, and grounding over strong long-context baselines.


【5】Rethinking Token-Level Credit Assignment in RLVR: A Polarity-Entropy Analysis
Link: https://arxiv.org/abs/2604.11056

Authors: Yuhang He, Haodong Wu, Siyi Liu, Hongyu Ge, Hange Zhou, Keyi Wu, Zhuo Zheng, Qihong Lin, Zixin Zhong, Yongqi Zhang
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has substantially improved the reasoning ability of Large Language Models (LLMs). However, its sparse outcome-based rewards pose a fundamental credit assignment problem. We analyze this problem through the joint lens of reward polarity and token entropy. Our diagnostic tool, the Four Quadrant Decomposition, isolates token updates by polarity and entropy, and controlled ablations show that reasoning improvements concentrate in the high-entropy quadrants. To justify this observation theoretically, we adapt Conditional Mutual Information to the autoregressive RLVR setting and prove that the credit a token can carry is upper-bounded by its entropy. This view yields testable predictions that reasoning gains arise primarily from high-entropy tokens, with unique roles for positive and negative updates. A gradient analysis of GRPO further reveals how uniform reward broadcast dilutes signal at high-entropy positions while over-crediting deterministic tokens. Grounded in these insights, we propose Entropy-Aware Policy Optimization (EAPO) that modulates token-level learning signals accordingly. Extensive experiments demonstrate that EAPO outperforms strong baselines across two model families.
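
A rough sketch of how token positions could be sorted into four polarity/entropy quadrants is given below. The fixed entropy threshold, the quadrant labels, and the function names are assumptions made for illustration; the abstract does not say how the paper draws the high/low entropy boundary.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def quadrant(reward_positive, entropy, threshold):
    """Label a token by the response's reward polarity and its own entropy level."""
    polarity = "positive" if reward_positive else "negative"
    level = "high" if entropy >= threshold else "low"
    return f"{polarity}/{level}-entropy"


def decompose(distributions, reward_positive, threshold=0.5):
    """Group the token positions of one sampled response by quadrant.

    `distributions` holds one next-token distribution per position; the outcome
    reward is shared by every token of the response, so polarity is per response.
    The threshold value is a hypothetical choice for this sketch.
    """
    groups = {}
    for position, probs in enumerate(distributions):
        key = quadrant(reward_positive, token_entropy(probs), threshold)
        groups.setdefault(key, []).append(position)
    return groups


# A deterministic token (entropy 0) and a coin-flip token (entropy ln 2).
print(decompose([[1.0, 0.0], [0.5, 0.5]], reward_positive=True))
```

With such a grouping in hand, one could mask or rescale the update for each quadrant separately, which is the kind of controlled ablation the abstract describes.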


【6】Continuous-time Online Learning via Mean-Field Neural Networks: Regret Analysis in Diffusion Environments
Link: https://arxiv.org/abs/2604.10958

Authors: Erhan Bayraktar, Bingyan Han, Ziqing Zhang
Comments: 64 pages, 5 figures
Abstract: We study continuous-time online learning where data are generated by a diffusion process with unknown coefficients. The learner employs a two-layer neural network, continuously updating its parameters in a non-anticipative manner. The mean-field limit of the learning dynamics corresponds to a stochastic Wasserstein gradient flow adapted to the data filtration. We establish regret bounds for both the mean-field limit and finite-particle system. Our analysis leverages the logarithmic Sobolev inequality, Polyak-Lojasiewicz condition, Malliavin calculus, and uniform-in-time propagation of chaos. Under displacement convexity, we obtain a constant static regret bound. In the general non-convex setting, we derive explicit linear regret bounds characterizing the effects of data variation, entropic exploration, and quadratic regularization. Finally, our simulations demonstrate that the online approach outperforms the alternatives considered and show the impact of network width and regularization parameters.


【7】CASK: Core-Aware Selective KV Compression for Reasoning Traces
Link: https://arxiv.org/abs/2604.10900

Authors: Buseong Kim, Heejun Gwon
Comments: 25 pages, 8 figures, 3 main tables, appendices included
Abstract: In large language models performing long-form reasoning, the KV cache grows rapidly with decode length, creating bottlenecks in memory and inference stability. Existing reasoning-oriented KV compression has mostly followed an eviction-centered view: estimate token importance more accurately, then discard lower-ranked entries. Our analysis suggests that scorer refinement alone often fails to substantially reorganize the actual keep-set and may therefore not be the main lever for preserving reasoning behavior. We instead frame reasoning KV compression as a behavior-preserving structured consolidation problem. CASK partitions the decode-time reasoning trace into a protected core that anchors answer formation and intermediate state, and mergeable scratch with high redundancy. The core is preserved, while selective consolidation is applied only to the scratch. To address prompt-heavy regimes where the prefix can exhaust the budget before decode-stage compression becomes active, CASK further uses a two-stage design: prefix eviction followed by decode-stage consolidation. On the H100 reasoning gate, CASK shows higher full-KV continuation fidelity than TriAttention at matched budgets on both AIME24 and AIME25, with recurring cask@384 > triattention@512 crossings. In prompt-heavy replay, multi_news and vcsum act as decode-active witnesses, while qmsum and gov_report expose the prefix_budget_exhausted boundary. The overall evidence supports a simple conclusion: effective reasoning KV compression depends less on more elaborate scorer engineering than on combining core preservation with selective scratch consolidation to lower the usable budget frontier.


【8】Verify Before You Fix: Agentic Execution Grounding for Trustworthy Cross-Language Code Analysis
Link: https://arxiv.org/abs/2604.10800

Authors: Jugal Gajjar
Comments: 20 pages (13 main + 7 appendices), 9 figures, 10 tables. Submitted to NeurIPS 2026
Abstract: Learned classifiers deployed in agentic pipelines face a fundamental reliability problem: predictions are probabilistic inferences, not verified conclusions, and acting on them without grounding in observable evidence leads to compounding failures across downstream stages. Software vulnerability analysis makes this cost concrete and measurable. We address this through a unified cross-language vulnerability lifecycle framework built around three LLM-driven reasoning stages (hybrid structural-semantic detection, execution-grounded agentic validation, and validation-aware iterative repair) governed by a strict invariant: no repair action is taken without execution-based confirmation of exploitability. Cross-language generalization is achieved via a Universal Abstract Syntax Tree (uAST) normalizing Java, Python, and C++ into a shared structural schema, combined with a hybrid fusion of GraphSAGE and Qwen2.5-Coder-1.5B embeddings through learned two-way gating, whose per-sample weights provide intrinsic explainability at no additional cost. The framework achieves 89.84-92.02% intra-language detection accuracy and 74.43-80.12% zero-shot cross-language F1, resolving 69.74% of vulnerabilities end-to-end at a 12.27% total failure rate. Ablations establish necessity: removing uAST degrades cross-language F1 by 23.42%, while disabling validation increases unnecessary repairs by 131.7%. These results demonstrate that execution-grounded closed-loop reasoning is a principled and practically deployable mechanism for trustworthy LLM-driven agentic AI.


【9】SpecMoE: A Fast and Efficient Mixture-of-Experts Inference via Self-Assisted Speculative Decoding
Link: https://arxiv.org/abs/2604.10152

Authors: Jehyeon Bang, Eunyeong Cho, Ranggi Hwang, Jinha Chung, Minsoo Rhu
Comments: This is an extended version of our work, which is accepted for publication at the 63rd ACM/IEEE Design Automation Conference (DAC), 2026
Abstract: The Mixture-of-Experts (MoE) architecture has emerged as a promising approach to mitigate the rising computational costs of large language models (LLMs) by selectively activating parameters. However, its high memory requirements and sub-optimal parameter efficiency pose significant challenges for efficient deployment. Although CPU-offloaded MoE inference systems have been proposed in the literature, they offer limited efficiency, particularly for large batch sizes. In this work, we propose SpecMoE, a memory-efficient MoE inference system based on our self-assisted speculative decoding algorithm. SpecMoE demonstrates the effectiveness of applying speculative decoding to MoE inference without requiring additional model training or fine-tuning. Our system improves inference throughput by up to $4.30\times$, while significantly reducing bandwidth requirements of both memory and interconnect on memory-constrained systems.
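
For readers unfamiliar with speculative decoding, one draft-and-verify round under greedy decoding can be sketched as follows. This is the generic technique, not SpecMoE's self-assisted variant, whose details the abstract does not give; `draft_next` and `target_next` are hypothetical callables that map a token list to the next token.

```python
def speculative_step(prefix, draft_next, target_next, k):
    """Run one greedy draft-and-verify round and return the newly accepted tokens.

    The result always equals what the target model alone would have produced,
    so speculation changes speed, not output.
    """
    # Draft k tokens autoregressively with the cheap model.
    drafted = []
    for _ in range(k):
        drafted.append(draft_next(prefix + drafted))

    # Verify: keep drafted tokens while they match the target's greedy choice.
    # A real system scores all k positions in one batched target pass;
    # the calls are sequential here only for clarity.
    accepted = []
    for token in drafted:
        expected = target_next(prefix + accepted)
        if token != expected:
            accepted.append(expected)  # correct the first mismatch and stop
            return accepted
        accepted.append(token)

    # Every draft token was accepted, so the target pass yields one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


# Toy models: the draft disagrees with the target only at sequence length 3.
target = lambda seq: len(seq) % 5
draft = lambda seq: 9 if len(seq) == 3 else len(seq) % 5
print(speculative_step([0], draft, target, k=4))
```

The throughput gain comes from accepting several tokens per expensive target pass whenever the draft agrees with the target.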


【10】Explainable Human Activity Recognition: A Unified Review of Concepts and Mechanisms
Link: https://arxiv.org/abs/2604.09799

Authors: Mainak Kundu, Catherine Chen, Rifatul Islam, Ismail Uysal, Ria Kanjilal
Abstract: Human activity recognition (HAR) has become a key component of intelligent systems for healthcare monitoring, assistive living, smart environments, and human-computer interaction. Although deep learning has substantially improved HAR performance on multivariate sensor data, the resulting models often remain opaque, limiting trust, reliability, and real-world deployment. Explainable artificial intelligence (XAI) has therefore emerged as a critical direction for making HAR systems more transparent and human-centered. This paper presents a comprehensive review of explainable HAR methods across wearable, ambient, physiological, and multimodal sensing settings. We introduce a unified perspective that separates conceptual dimensions of explainability from algorithmic explanation mechanisms, reducing ambiguities in prior surveys. Building on this distinction, we present a mechanism-centric taxonomy of XAI-HAR methods covering major explanation paradigms. The review examines how these methods address the temporal, multimodal, and semantic complexities of HAR, and summarizes their interpretability objectives, explanation targets, and limitations. In addition, we discuss current evaluation practices, highlight key challenges in achieving reliable and deployable XAI-HAR, and outline directions toward trustworthy activity recognition systems that better support human understanding and decision-making.


【11】Active Inference with a Self-Prior in the Mirror-Mark Task
Link: https://arxiv.org/abs/2604.09673

Authors: Dongmin Kim, Hoshinori Kanazawa, Yasuo Kuniyoshi
Comments: 7 pages, 5 figures
Abstract: The mirror self-recognition test evaluates whether a subject touches a mark on its own body that is visible only in a mirror, and is widely used as an indicator of self-awareness. In this study, we present a computational model in which this behavior emerges spontaneously through a single mechanism, the self-prior, without any external reward. The self-prior, implemented with a Transformer, learns the density of familiar multisensory experiences; when a novel mark appears, the discrepancy from this learned distribution drives mark-directed behavior through active inference. A simulated infant, relying solely on vision and proprioception without tactile input, discovered a sticker placed on its own face in the mirror and removed it in approximately 70% of cases without any explicit instruction. Expected free energy decreased significantly after sticker removal, confirming that the self-prior operates as an internal criterion for distinguishing self from non-self. Cross-modal sampling further demonstrated that the self-prior captures visual-proprioceptive associations, functioning as a probabilistic body schema. These results provide a concise computational account of the key behavior observed in the mirror test and suggest that the free energy principle can serve as a unifying hypothesis for investigating the developmental origins of self-awareness. Code is available at: https://github.com/kim135797531/self-prior-mirror


【12】OOWM: Structuring Embodied Reasoning and Planning via Object-Oriented Programmatic World Modeling
Link: https://arxiv.org/abs/2604.09580

Authors: Hongyu Chen, Liang Lin, Guangrun Wang
Abstract: Standard Chain-of-Thought (CoT) prompting empowers Large Language Models (LLMs) with reasoning capabilities, yet its reliance on linear natural language is inherently insufficient for effective world modeling in embodied tasks. While text offers flexibility, it fails to explicitly represent the state-space, object hierarchies, and causal dependencies required for robust robotic planning. To address these limitations, we propose Object-Oriented World Modeling (OOWM), a novel framework that structures embodied reasoning through the lens of software engineering formalisms. We redefine the world model not as a latent vector space, but as an explicit symbolic tuple $W = \langle S, T \rangle$: a State Abstraction ($G_\text{state}$) instantiating the environmental state $S$, coupled with a Control Policy ($G_\text{control}$) representing the transition logic $T: S \times A \rightarrow S'$. OOWM leverages the Unified Modeling Language (UML) to materialize this definition: it employs Class Diagrams to ground visual perception into rigorous object hierarchies, and Activity Diagrams to operationalize planning into executable control flows. Furthermore, we introduce a three-stage training pipeline combining Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). Crucially, this method utilizes outcome-based rewards from the final plan to implicitly optimize the underlying object-oriented reasoning structure, enabling effective learning even with sparse annotations. Extensive evaluations on the MRoom-30k benchmark demonstrate that OOWM significantly outperforms unstructured textual baselines in planning coherence, execution success, and structural fidelity, establishing a new paradigm for structured embodied reasoning.


【13】Improving understanding and trust in AI: How users benefit from interval-based counterfactual explanations
Link: https://arxiv.org/abs/2604.09573

Authors: Tabea E. Röber, Paul Festor, Rob Goedhart, S. İlker Birbil, Aldo Faisal
Abstract: Experimental user studies evaluating the effectiveness of different subtypes of post-hoc explanations for black-box models are largely nonexistent. Therefore, the aim of this study was to investigate and evaluate how different types of counterfactual explanations, namely single point explanations and interval-based explanations, affect both model understanding and (demonstrated) trust. We conducted an online user study using a within-subjects experimental design, where the experimental arms were (i) no explanation (control), (ii) feature importance scores, (iii) point counterfactual explanations, and (iv) interval counterfactual explanations. Our results clearly show the superiority of interval explanations over other tested explanation types in increasing both model understanding and demonstrated trust in the AI. We could not support findings of some previous studies showing an effect of point counterfactual explanations compared to the control group. Our results further highlight the role of individual differences, for example in cognitive style or personality, in explanation effectiveness.


【14】Seven simple steps for log analysis in AI systems
Link: https://arxiv.org/abs/2604.09563

Authors: Magda Dubois, Ekin Zorer, Maia Hamin, Joe Skinner, Alexandra Souly, Jerome Wynne, Harry Coppock, Lucas Satos, Sayash Kapoor, Sunischal Dev, Keno Juchems, Kimberly Mai, Timo Flesch, Lennart Luettgau, Charles Teague, Eric Patey, JJ Allaire, Lorenzo Pacchiardi, Jose Hernandez-Orallo, Cozmin Ududec
Abstract: AI systems produce large volumes of logs as they interact with tools and users. Analysing these logs can help understand model capabilities, propensities, and behaviours, or assess whether an evaluation worked as intended. Researchers have started developing methods for log analysis, but a standardised approach is still missing. Here we suggest a pipeline based on current best practices. We illustrate it with concrete code examples in the Inspect Scout library, provide detailed guidance on each step, and highlight common pitfalls. Our framework provides researchers with a foundation for rigorous and reproducible log analysis.


【15】Regional Explanations: Bridging Local and Global Variable Importance
Link: https://arxiv.org/abs/2604.11223

Authors: Salim I. Amoukou, Nicolas J-B. Brunel
Comments: Accepted at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Abstract: We analyze two widely used local attribution methods, Local Shapley Values and LIME, which aim to quantify the contribution of a feature value $x_i$ to a specific prediction $f(x_1, \dots, x_p)$. Despite their widespread use, we identify fundamental limitations in their ability to reliably detect locally important features, even under ideal conditions with exact computations and independent features. We argue that a sound local attribution method should not assign importance to features that neither influence the model output (e.g., features with zero coefficients in a linear model) nor exhibit statistical dependence with functionality-relevant features. We demonstrate that both Local SV and LIME violate this fundamental principle. To address this, we propose R-LOCO (Regional Leave Out COvariates), which bridges the gap between local and global explanations and provides more accurate attributions. R-LOCO segments the input space into regions with similar feature importance characteristics. It then applies global attribution methods within these regions, deriving an instance's feature contributions from its regional membership. This approach delivers more faithful local attributions while avoiding local explanation instability and preserving instance-specific detail often lost in global methods.


【16】Byzantine-Robust Distributed SGD: A Unified Analysis and Tight Error Bounds
Link: https://arxiv.org/abs/2604.10179

Authors: Boyuan Ruan, Xiaoyu Wang, Ya-Feng Liu
Abstract: Byzantine-robust distributed optimization relies on robust aggregation rules to mitigate the influence of malicious Byzantine workers. Despite the proliferation of such rules, a unified convergence analysis framework that accommodates general data heterogeneity is lacking. In this work, we provide a thorough convergence theory of Byzantine-robust distributed stochastic gradient descent (SGD), analyzing variants both with and without local momentum. We establish the convergence rates for nonconvex smooth objectives and those satisfying the Polyak-Lojasiewicz condition under a general data heterogeneity assumption. Our analysis reveals that while stochasticity and data heterogeneity introduce unavoidable error floors, local momentum provably reduces the error component induced by stochasticity. Furthermore, we derive matching lower bounds to demonstrate that the upper bounds obtained in our analysis are tight and characterize the fundamental limits of Byzantine resilience under stochasticity and data heterogeneity. Empirical results support our theoretical findings.
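
As a concrete picture of what a robust aggregation rule does, the sketch below implements two well-known examples, the coordinate-wise median and the coordinate-wise trimmed mean. They are chosen here for illustration only; the abstract does not say which rules the paper's analysis covers, and the function names are made up for this sketch.

```python
from statistics import median


def coordinate_median(gradients):
    """Aggregate worker gradients by taking the median of each coordinate."""
    return [median(column) for column in zip(*gradients)]


def trimmed_mean(gradients, trim):
    """Average each coordinate after dropping the `trim` smallest and largest values."""
    if 2 * trim >= len(gradients):
        raise ValueError("trim removes every worker")
    result = []
    for column in zip(*gradients):
        kept = sorted(column)[trim:len(column) - trim]
        result.append(sum(kept) / len(kept))
    return result


# Three honest workers and one Byzantine worker sending an extreme gradient.
workers = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [1000.0, -1000.0]]
print(coordinate_median(workers))
print(trimmed_mean(workers, trim=1))
```

A plain average of the same four gradients would be dominated by the Byzantine worker, while both rules above stay within the range of the honest values.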


Detection (5 papers)

【1】BRIDGE and TCH-Net: Heterogeneous Benchmark and Multi-Branch Baseline for Cross-Domain IoT Botnet Detection
Link: https://arxiv.org/abs/2604.11324

Authors: Ammar Bhilwarawala, Likhamba Rongmei, Harsh Sharma, Arya Jena, Kaushal Singh, Jayashree Piri, Raghunath Dey
Comments: 21 pages, 8 figures, submitted to Journal of Network and Computer Applications
Abstract: IoT botnet detection has advanced, yet most published systems are validated on a single dataset and rarely generalise across environments. Heterogeneous feature spaces make multi-dataset training practically impossible without discarding semantic interpretability or introducing data integrity violations. No prior work has addressed both problems with a formally specified, reproducible methodology. This paper does. We introduce BRIDGE (Benchmark Reference for IoT Domain Generalisation Evaluation), the first formally specified heterogeneous multi-dataset benchmark for IoT intrusion detection, unifying CICIDS-2017, CIC-IoT-2023, Bot-IoT, Edge-IIoTset, and N-BaIoT through a 46-feature semantic canonical vocabulary grounded in CICFlowMeter nomenclature, with genuine-equivalence-only feature mapping, explicit zero-filling, and per-dataset coverage from 15% to 93%. A leave-one-dataset-out (LODO) protocol makes the generalisation gap precisely measurable: all five evaluated architectures achieve mean LODO F1 between 0.39 and 0.47, and we establish the first community generalisation baseline at mean LODO F1 = 0.5577, a result that shifts the agenda from single-benchmark optimisation toward cross-environment generalisation. We propose TCH-Net, a multi-branch network fusing a three-path Temporal branch (residual convolutional-BiGRU, stride-downsampled BiGRU, pre-LayerNorm Transformer), a provenance-conditioned Contextual branch, and a Statistical branch via Cross-Branch Gated Attention Fusion (CB-GAF) with learnable sigmoid gates for dynamic feature-wise mixing. Across five random seeds, TCH-Net achieves F1 = 0.8296 +/- 0.0028, AUC = 0.9380 +/- 0.0025, and MCC = 0.6972 +/- 0.0056, outperforming all twelve baselines (p < 0.05, Wilcoxon) and recording the highest LODO F1 overall. BRIDGE and the full pipeline are at https://github.com/Ammar-ss/TCH-Net.
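
The leave-one-dataset-out protocol itself is simple to sketch. The code below is a generic illustration, not the BRIDGE pipeline; `lodo_splits`, `mean_lodo_score`, and the `fit`/`score` callables are hypothetical names, and the data is a toy stand-in.

```python
def lodo_splits(datasets):
    """Yield (held_out_name, train_rows, test_rows), holding out one dataset at a time."""
    for held_out in datasets:
        train = [row for name, rows in datasets.items() if name != held_out for row in rows]
        yield held_out, train, list(datasets[held_out])


def mean_lodo_score(datasets, fit, score):
    """Train on all datasets but one, score on the held-out one, and average the scores."""
    scores = []
    for _, train, test in lodo_splits(datasets):
        model = fit(train)
        scores.append(score(model, test))
    return sum(scores) / len(scores)


# Toy data: three named datasets with integer "rows".
toy = {"A": [1, 2], "B": [3], "C": [4, 5]}
for name, train, test in lodo_splits(toy):
    print(name, train, test)
```

Because the held-out dataset contributes nothing to training, the averaged score measures generalisation to an unseen environment rather than in-distribution accuracy.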


【2】Learning to Test: Physics-Informed Representation for Dynamical Instability Detection
Link: https://arxiv.org/abs/2604.10967

Authors: Minxing Zheng, Zewei Deng, Liyan Xie, Shixiang Zhu
Abstract: Many safety-critical scientific and engineering systems evolve according to differential-algebraic equations (DAEs), where dynamical behavior is constrained by physical laws and admissibility conditions. In practice, these systems operate under stochastically varying environmental inputs, so stability is not a static property but must be reassessed as the context distribution shifts. Repeated large-scale DAE simulation, however, is computationally prohibitive in high-dimensional or real-time settings. This paper proposes a test-oriented learning framework for stability assessment under distribution shift. Rather than re-estimating physical parameters or repeatedly solving the underlying DAE, we learn a physics-informed latent representation of contextual variables that captures stability-relevant structure and is regularized toward a tractable reference distribution. Trained on baseline data from a certified safe regime, the learned representation enables deployment-time safety monitoring to be formulated as a distributional hypothesis test in latent space, with controlled Type I error. By integrating neural dynamical surrogates, uncertainty-aware calibration, and uniformity-based testing, our approach provides a scalable and statistically grounded method for detecting instability risk in stochastic constrained dynamical systems without repeated simulation.


【3】Retinal Cyst Detection from Optical Coherence Tomography Images
Link: https://arxiv.org/abs/2604.10843

Authors: Abhishek Dharmaratnakar, Aadheeshwar Vijayakumar, Suchand Dayanand
Comments: 13 pages, 9 figures
Abstract: Retinal cysts are formed by leakage and accumulation of fluid in the retina due to the incompetence of the retinal vasculature. These cystic spaces are significant in several ocular diseases, such as age-related macular degeneration and diabetic macular edema. Optical coherence tomography is one of the predominant diagnostic techniques for imaging retinal pathologies. Segmentation and quantification of intraretinal cysts play a vital role in predicting visual acuity. Several methods have been proposed in the literature for automatic segmentation of intraretinal cysts. As cystoid macular edema becomes a major problem, it needs to be quantified accurately and operated on, or it may cause many problems later on. Although research is being carried out in this area, little progress has been made, and the accuracy achieved so far is 68%, which is low. Moreover, the methods depend on image quality and perform poorly on high-noise images such as Topcon images. This work uses a ResNet CNN (Convolutional Neural Network) approach that performs segmentation by patchwise classification, training on the image set from the cyst segmentation challenge dataset and testing on the test data set provided by 2 different graders for all 4 vendors in the challenge. It also compares these methods using the first publicly available cyst segmentation challenge dataset. The methods were evaluated using quantitative measures to assess their robustness against the challenges of intraretinal cyst segmentation. The results are better than previous state-of-the-art approaches, giving a Dice coefficient of more than 70% on all vendors irrespective of image quality.
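
The Dice coefficient used to report these results is the standard overlap measure between a predicted mask and a reference mask. A minimal sketch on flat binary masks follows; treating two empty masks as a perfect match is a convention assumed here, not something the abstract specifies.

```python
def dice_coefficient(predicted, reference):
    """Dice = 2 * |P and R| / (|P| + |R|) for two binary masks given as flat 0/1 sequences."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, r in zip(predicted, reference) if p and r)
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# One overlapping pixel out of two predicted and two reference pixels.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground pixel, so the reported figure of more than 70% corresponds to values above 0.7 on this scale.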


【4】Masked Contrastive Pre-Training Improves Music Audio Key Detection
Link: https://arxiv.org/abs/2604.10021

Authors: Ori Yonay, Tracy Hammond, Tianbao Yang
Comments: Code and models available at github.com/echo-cipher/keymyna
Abstract: Self-supervised music foundation models underperform on key detection, which requires pitch-sensitive representations. In this work, we present the first systematic study showing that the design of self-supervised pretraining directly impacts pitch sensitivity, and demonstrate that masked contrastive embeddings uniquely enable state-of-the-art (SOTA) performance in key detection in the supervised setting. First, we discover that linear evaluation after masking-based contrastive pretraining on Mel spectrograms leads to competitive performance on music key detection out of the box. This leads us to train shallow but wide multi-layer perceptrons (MLPs) on features extracted from our base model, leading to SOTA performance without the need for sophisticated data augmentation policies. We further analyze robustness and show empirically that the learned representations naturally encode common augmentations. Our study establishes self-supervised pretraining as an effective approach for pitch-sensitive MIR tasks and provides insights for designing and probing music foundation models.


【5】Real-Time Voicemail Detection in Telephony Audio Using Temporal Speech Activity Features
Link: https://arxiv.org/abs/2604.09675

Authors: Kumar Saurav
Comments: 16 pages, 5 tables. Preprint
Abstract: Outbound AI calling systems must distinguish voicemail greetings from live human answers in real time to avoid wasted agent interactions and dropped calls. We present a lightweight approach that extracts 15 temporal features from the speech activity pattern of a pre-trained neural voice activity detector (VAD), then classifies with a shallow tree-based ensemble. Across two evaluation sets totaling 764 telephony recordings, the system achieves a combined 96.1% accuracy (734/764), with 99.3% (139/140) on an expert-labeled test set and 95.4% (595/624) on a held-out production set. In production validation over 77,000 calls, it maintained a 0.3% false positive rate and 1.3% false negative rate. End-to-end inference completes in 46 ms on a commodity dual-core CPU with no GPU, supporting 380+ concurrent WebSocket calls. In our search over 3,780 model, feature, and threshold combinations, feature importance was concentrated in three temporal variables. Adding transcription keywords or beep-based features did not improve the best real-time configuration and increased latency substantially. Our results suggest that temporal speech patterns are a strong signal for distinguishing voicemail greetings from live human answers.


分类|识别(3篇)

【1】Ambivalence/Hesitancy Recognition in Videos for Personalized Digital Health Interventions
标题:个性化数字健康干预的视频中的矛盾/犹豫识别
链接:https://arxiv.org/abs/2604.11730

作者:Manuela González-González,Soufiane Belharbi,Muhammad Osama Zeeshan,Masoumeh Sharafi,Muhammad Haseeb Aslam,Lorenzo Sia,Nicolas Richet,Marco Pedersoli,Alessandro Lameiras Koerich,Simon L Bacon,Eric Granger
备注:13 pages, 3 figures
摘要:Using behavioural science, health interventions focus on behaviour change by providing a framework to help patients acquire and maintain healthy habits that improve medical outcomes. In-person interventions are costly and difficult to scale, especially in resource-limited regions. Digital health interventions offer a cost-effective approach, potentially supporting independent living and self-management. Automating such interventions, especially through machine learning, has gained considerable attention recently. Ambivalence and hesitancy (A/H) play a primary role in leading individuals to delay, avoid, or abandon health interventions. A/H are subtle and conflicting emotions that place a person in a state between positive and negative evaluations of a behaviour, or between acceptance and refusal to engage in it. They manifest as affective inconsistency across modalities or within a modality, such as language, facial and vocal expressions, and body language. While experts can be trained to recognize A/H, integrating them into digital health interventions is costly and less effective. Automatic A/H recognition is therefore critical for the personalization and cost-effectiveness of digital health interventions. Here, we explore the application of deep learning models for A/H recognition in videos, a multi-modal task by nature. In particular, this paper covers three learning setups: supervised learning, unsupervised domain adaptation for personalization, and zero-shot inference via large language models (LLMs). Our experiments are conducted on the unique and recently published BAH video dataset for A/H recognition. Our results show limited performance, suggesting that more adapted multi-modal models are required for accurate A/H recognition. Better methods for modeling spatio-temporal and multimodal fusion are necessary to leverage conflicts within/across modalities.


【2】Not All Forgetting Is Equal: Architecture-Dependent Retention Dynamics in Fine-Tuned Image Classifiers
标题:并非所有的遗忘都是平等的:微调图像分类器中依赖架构的保留动态
链接:https://arxiv.org/abs/2604.11508

作者:Miit Daga,Swarna Priya Ramu
摘要:Fine-tuning pretrained image classifiers is standard practice, yet which individual samples are forgotten during this process, and whether forgetting patterns are stable or architecture dependent, remains unclear. Understanding these dynamics has direct implications for curriculum design, data pruning, and ensemble construction. We track per-sample correctness at every epoch during fine-tuning of ResNet-18 and DeiT-Small on a retinal OCT dataset (7 classes, 56:1 imbalance) and CUB-200-2011 (200 bird species), fitting Ebbinghaus-style exponential decay curves to each sample's retention trace. Five findings emerge. First, the two architectures forget fundamentally different samples: Jaccard overlap of the top 10 percent most-forgotten is 0.34 on OCTDL and 0.15 on CUB-200. Second, ViT forgetting is more structured (mean $R^2 = 0.74$) than CNN forgetting ($R^2 = 0.52$). Third, per-sample forgetting is stochastic across random seeds (Spearman $ρ\approx 0.01$), challenging the assumption that sample difficulty is an intrinsic property. Fourth, class-level forgetting is consistent and semantically interpretable: visually similar species are forgotten most, distinctive ones least. Fifth, a sample's loss after head warmup predicts its long-term decay constant ($ρ= 0.30$ to $0.50$, $p < 10^{-45}$). These findings suggest that architectural diversity in ensembles provides complementary retention coverage, and that curriculum or pruning methods based on per-sample difficulty may not generalize across runs. A spaced repetition sampler built on these decay constants does not outperform random sampling, indicating that static scheduling cannot exploit unstable per-sample signals.
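The Jaccard overlap of the "top 10 percent most-forgotten" samples mentioned above can be illustrated with a small sketch. The helper names (`top_fraction`, `jaccard`) and the toy forgetting counts are hypothetical and are not taken from the paper; the abstract does not specify how forgetting is scored per sample.

```python
def top_fraction(scores, fraction):
    """Return the IDs of the highest-scoring `fraction` of samples (at least one)."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy per-sample forgetting counts for two models (made-up numbers).
cnn_forget = {f"s{i}": c for i, c in enumerate([9, 8, 1, 0, 2, 3, 1.5, 0.5, 2.5, 3.5])}
vit_forget = {f"s{i}": c for i, c in enumerate([9, 1, 8, 0, 2, 3, 1.5, 0.5, 2.5, 3.5])}

# Top 20% here (2 of 10 samples) so the toy example has more than one element.
overlap = jaccard(top_fraction(cnn_forget, 0.2), top_fraction(vit_forget, 0.2))
```

A low overlap means the two models forget largely different samples, which is the sense in which the abstract reports 0.34 and 0.15.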


【3】Towards Green Wearable Computing: A Physics-Aware Spiking Neural Network for Energy-Efficient IMU-based Human Activity Recognition
标题:迈向绿色可穿戴计算:一种物理感知脉冲神经网络,用于节能的基于IMU的人类活动识别
链接:https://arxiv.org/abs/2604.10458

作者:Naichuan Zheng,Hailun Xia,Zepeng Sun,Weiyi Li,Yinze Zhou
摘要:Wearable IMU-based Human Activity Recognition (HAR) relies heavily on Deep Neural Networks (DNNs), which are burdened by immense computational and buffering demands. Their power-hungry floating-point operations and rigid requirement to process complete temporal windows severely cripple battery-constrained edge devices. While Spiking Neural Networks (SNNs) offer extreme event-driven energy efficiency, standard architectures struggle with complex biomechanical topologies and temporal gradient degradation. To bridge this gap, we propose the Physics-Aware Spiking Neural Network (PAS-Net), a fully multiplier-free architecture explicitly tailored for Green HAR. Spatially, an adaptive symmetric topology mixer enforces human-joint physical constraints. Temporally, an $O(1)$-memory causal neuromodulator yields context-aware dynamic threshold neurons, adapting actively to non-stationary movement rhythms. Furthermore, we leverage a temporal spike error objective to unlock a flexible early-exit mechanism for continuous IMU streams. Evaluated across seven diverse datasets, PAS-Net achieves state-of-the-art accuracy while replacing dense operations with sparse 0.1 pJ integer accumulations. Crucially, its confidence-driven early-exit capability drastically reduces dynamic energy consumption by up to 98%. PAS-Net establishes a robust, ultra-low-power neuromorphic standard for always-on wearable sensing.


表征(1篇)

【1】A Minimal Model of Representation Collapse: Frustration, Stop-Gradient, and Dynamics
标题:表示崩溃的最小模型:阻挫、停止梯度和动力学
链接:https://arxiv.org/abs/2604.09979

作者:Louie Hong Yao,Yuhao Li,Shengchao Liu
备注:20 pages, 13 figures
摘要:Self-supervised representation learning is central to modern machine learning because it extracts structured latent features from unlabeled data and enables robust transfer across tasks and domains. However, it can suffer from representation collapse, a widely observed failure mode in which embeddings lose discriminative structure and distinct inputs become indistinguishable. To understand the mechanisms that drive collapse and the ingredients that prevent it, we introduce a minimal embedding-only model whose gradient-flow dynamics and fixed points can be analyzed in closed form, using a classification-representation setting as a concrete playground where collapse is directly quantified through the contraction of label-embedding geometry. We illustrate that the model does not collapse when the data are perfectly classifiable, while a small fraction of frustrated samples that cannot be classified consistently induces collapse through an additional slow time scale that follows the early performance gain. Within the same framework, we examine collapse prevention by adding a shared projection head and applying stop-gradient at the level of the training dynamics. We analyze the resulting fixed points and develop a dynamical mean-field style self-consistency description, showing that stop-gradient enables non-collapsed solutions and stabilizes finite class separation under frustration. We further verify empirically that the same qualitative dynamics and collapse-prevention effects appear in a linear teacher-student model, indicating that the minimal theory captures features that persist beyond the pure embedding setting.


编码器(2篇)

【1】KL Divergence Between Gaussians: A Step-by-Step Derivation for the Variational Autoencoder Objective
标题:高斯分布之间的KL散度:变分自编码器目标的逐步推导
链接:https://arxiv.org/abs/2604.11744

作者:Andrés Muñoz,Rodrigo Ramele
备注:8 pages, no figures. Derivation of the KL divergence between Gaussian distributions with application to Variational Autoencoders (VAEs)
摘要:Kullback-Leibler (KL) divergence is a fundamental concept in information theory that quantifies the discrepancy between two probability distributions. In the context of Variational Autoencoders (VAEs), it serves as a central regularization term, imposing structure on the latent space and thereby enabling the model to exhibit generative capabilities. In this work, we present a detailed derivation of the closed-form expression for the KL divergence between Gaussian distributions, a case of particular importance in practical VAE implementations. Starting from the general definition for continuous random variables, we derive the expression for the univariate case and extend it to the multivariate setting under the assumption of diagonal covariance. Finally, we discuss the interpretation of each term in the resulting expression and its impact on the training dynamics of the model.
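For readers who want the end result in executable form, the standard closed-form expressions can be sketched as follows. This is a minimal illustration of the well-known formulas, not code from the paper; the function names are hypothetical, and the second function assumes the common VAE setup of a diagonal-covariance posterior measured against a standard normal prior.

```python
import math


def kl_univariate(mu1, sigma1, mu2, sigma2):
    """KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ) in nats."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)


def kl_diag_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m ** 2 - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# With diagonal covariance the multivariate KL is the sum of per-dimension terms.
mu = [0.3, -1.2]
log_var = [0.1, -0.4]
kl_total = kl_diag_to_standard_normal(mu, log_var)
kl_sum = sum(kl_univariate(m, math.exp(0.5 * lv), 0.0, 1.0)
             for m, lv in zip(mu, log_var))
```

Parameterizing by the log-variance, as many VAE implementations do, keeps the variance positive without an explicit constraint.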


【2】NeuroFlow: Toward Unified Visual Encoding and Decoding from Neural Activity
标题:NeuroFlow:从神经活动走向统一的视觉编码和解码
链接:https://arxiv.org/abs/2604.09817

作者:Weijian Mai,Mu Nan,Yu Zhu,Jiahang Cao,Rui Zhang,Yuqin Dai,Chunfeng Song,Andrew F. Luo,Jiamin Wu
备注:Accepted to CVPR 2026. Project page: https://michaelmaiii.github.io/NeuroFlow-S
摘要:Visual encoding and decoding models act as gateways to understanding the neural mechanisms underlying human visual perception. Typically, visual encoding models that predict brain activity from stimuli and decoding models that reproduce stimuli from brain activity are treated as distinct tasks, requiring separate models and training procedures. This separation is inefficient and fails to model the consistency between encoding and decoding processes. To address this limitation, we propose NeuroFlow, the first unified framework that jointly models visual encoding and decoding from neural activity within a single flow model. NeuroFlow introduces two key components: (1) NeuroVAE is designed as a variational backbone to model neural variability and establish a compact, semantically structured latent space for bidirectional modeling across visual and neural modalities. (2) Cross-modal Flow Matching (XFM) bypasses the typical paradigm of noise-to-data diffusion guided by a specific modality condition, instead learning a reversibly consistent flow model between visual and neural latent distributions. For the first time, visual encoding and decoding are reformulated as a time-dependent, reversible process within a shared latent space for unified modeling. Empirical results demonstrate that NeuroFlow achieves superior overall performance in visual encoding and decoding tasks with higher computational efficiency compared to any isolated methods. We further analyze principal factors that steer the model toward encoding-decoding consistency and, through brain functional analyses, demonstrate that NeuroFlow captures consistent activation patterns underlying neural variability. NeuroFlow marks a major step toward unified visual encoding and decoding from neural activity, providing mechanistic insights that inform future bidirectional visual brain-computer interfaces.


优化|敛散性(8篇)

【1】Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search
标题:Record-Remix-Replay:使用进化搜索的分层GPU内核优化
链接:https://arxiv.org/abs/2604.11109

作者:Daniel Nichols,Konstantinos Parasyris,Caetano Melone,Tal Ben-Nun,Giorgis Georgakoudis,Harshitha Menon
摘要:As high-performance computing and AI workloads become increasingly dependent on GPUs, maintaining high performance across rapidly evolving hardware generations has become a major challenge. Developers often spend months tuning scientific applications to fully exploit new architectures, navigating a complex optimization space that spans algorithm design, source implementation, compiler flags and pass sequences, and kernel launch parameters. Existing approaches can effectively search parts of this space in isolation, such as launch configurations or compiler settings, but optimizing across the full space still requires substantial human expertise and iterative manual effort.   In this paper, we present Record-Remix-Replay (R^3), a hierarchical optimization framework that combines LLM-driven evolutionary search, Bayesian optimization, and record-replay compilation techniques to efficiently explore GPU kernel optimizations from source-level implementation choices down to compiler pass ordering and runtime configuration. By making candidate evaluation fast and scalable, our approach enables practical end-to-end search over optimization dimensions that are typically treated separately. We show that Record-Remix-Replay can optimize full scientific applications better than traditional approaches over kernel parameters and compiler flags, while also being nearly an order of magnitude faster than modern evolutionary search approaches.


【2】Optimal Stability of KL Divergence under Gaussian Perturbations
标题:高斯扰动下KL散度的最优稳定性
链接:https://arxiv.org/abs/2604.11026

作者:Jialu Pan,Yufeng Zhang,Nan Hu,Keqin Li
摘要:We study the problem of characterizing the stability of Kullback-Leibler (KL) divergence under Gaussian perturbations beyond Gaussian families. Existing relaxed triangle inequalities for KL divergence critically rely on the assumption that all involved distributions are Gaussian, which limits their applicability in modern applications such as out-of-distribution (OOD) detection with flow-based generative models. In this paper, we remove this restriction by establishing a sharp stability bound between an arbitrary distribution and Gaussian families under mild moment conditions. Specifically, let $P$ be a distribution with finite second moment, and let $\mathcal{N}_1$ and $\mathcal{N}_2$ be multivariate Gaussian distributions. We show that if $KL(P||\mathcal{N}_1)$ is large and $KL(\mathcal{N}_1||\mathcal{N}_2)$ is at most $ε$, then $KL(P||\mathcal{N}_2) \ge KL(P||\mathcal{N}_1) - O(\sqrt{ε})$. Moreover, we prove that this $\sqrt{ε}$ rate is optimal in general, even within the Gaussian family. This result reveals an intrinsic stability property of KL divergence under Gaussian perturbations, extending classical Gaussian-only relaxed triangle inequalities to general distributions. The result is non-trivial due to the asymmetry of KL divergence and the absence of a triangle inequality in general probability spaces. As an application, we provide a rigorous foundation for KL-based OOD analysis in flow-based models, removing strong Gaussian assumptions used in prior work. More broadly, our result enables KL-based reasoning in non-Gaussian settings arising in deep learning and reinforcement learning.


【3】UniPROT: Uniform Prototype Selection via Partial Optimal Transport with Submodular Guarantees
标题:UniPROT:通过具有次模保证的部分最优传输进行均匀原型选择
链接:https://arxiv.org/abs/2604.10952

作者:Prateek Chanda,Prayas Agrawal,Karthik S. Gurumoorthy,Ganesh Ramakrishnan,Bamdev Mishra,Pratik Jawanpuria
备注:25 pages, 31 figures. Accepted as a poster at AISTATS 2026
摘要:Selecting prototypical examples from a source distribution to represent a target data distribution is a fundamental problem in machine learning. Existing subset selection methods often rely on implicit importance scores, which can be skewed towards majority classes and lead to low-quality prototypes for minority classes. We present UniPROT, a novel subset selection framework that minimizes the optimal transport (OT) distance between a uniformly weighted prototypical distribution and the target distribution. While intuitive, this formulation leads to a cardinality-constrained maximization of a super-additive objective, which is generally intractable to approximate efficiently. To address this, we propose a principled reformulation of the OT marginal constraints, yielding a partial optimal transport-based submodular objective. We prove that this reformulation enables a greedy algorithm with a $(1-1/e)$ approximation guarantee relative to the original super-additive maximization problem. Empirically, we showcase that enforcing uniform prototype weights in UniPROT consistently improves minority-class representation in imbalanced classification benchmarks without compromising majority-class accuracy. In both finetuning and pretraining regimes for large language models under domain imbalance, UniPROT enforces uniform source contributions, yielding robust performance gains. Our results establish UniPROT as a scalable, theoretically grounded solution for uniform-weighted prototype selection. Our code is publicly available on GitHub: https://github.com/efficiency-learning/UniPROT


【4】Online Covariance Estimation in Averaged SGD: Improved Batch-Mean Rates and Minimax Optimality via Trajectory Regression
标题:平均SGD的在线协方差估计:通过轨迹回归改进批均值速率并达到极小极大最优性
链接:https://arxiv.org/abs/2604.10814

作者:Yijin Ni,Xiaoming Huo
摘要:We study online covariance matrix estimation for Polyak--Ruppert averaged stochastic gradient descent (SGD). The online batch-means estimator of Zhu, Chen and Wu (2023) achieves an operator-norm convergence rate of $O(n^{-(1-α)/4})$, which yields $O(n^{-1/8})$ at the optimal learning-rate exponent $α\rightarrow 1/2^+$. A rigorous per-block bias analysis reveals that re-tuning the block-growth parameter improves the batch-means rate to $O(n^{-(1-α)/3})$, achieving $O(n^{-1/6})$. The modified estimator requires no Hessian access and preserves $O(d^2)$ memory. We provide a complete error decomposition into variance, stationarity bias, and nonlinearity bias components. A weighted-averaging variant that avoids hard truncation is also discussed. We establish the minimax rate $Θ(n^{-(1-α)/2})$ for Hessian-free covariance estimation from the SGD trajectory: a Le Cam lower bound gives $Ω(n^{-(1-α)/2})$, and a trajectory-regression estimator--which estimates the Hessian by regressing SGD increments on iterates--achieves $O(n^{-(1-α)/2})$, matching the lower bound. The construction reveals that the bottleneck is the sublinear accumulation of information about the Hessian from the SGD drift.
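As a quick check of the exponents quoted above, substituting the limiting learning-rate exponent $α \to 1/2^+$ into each rate gives the following. The first two lines reproduce the figures in the abstract; the last line is plain arithmetic on the stated minimax rate and is not a figure quoted there.

```latex
\begin{align*}
n^{-(1-\alpha)/4}\Big|_{\alpha \to 1/2^+} &= n^{-1/8}, \\
n^{-(1-\alpha)/3}\Big|_{\alpha \to 1/2^+} &= n^{-1/6}, \\
n^{-(1-\alpha)/2}\Big|_{\alpha \to 1/2^+} &= n^{-1/4}.
\end{align*}
```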


【5】WaterAdmin: Orchestrating Community Water Distribution Optimization via AI Agents
标题:WaterAdmin:通过人工智能代理协调社区供水优化
链接:https://arxiv.org/abs/2604.10343

作者:Jiaqi Wen,Pingbo Tang,Shaolei Ren,Jianyi Yang
摘要:We study the operation of community water systems, where pumps and valves must be scheduled to reliably meet water demands while minimizing energy consumption. While existing optimization-based methods are effective under well-modeled environments, real-world community scenarios exhibit highly dynamic contexts, such as human activities and weather variations, that significantly affect water demand patterns and operational targets across different zones. Traditional optimization approaches struggle to aggregate and adapt to such heterogeneous and rapidly evolving contextual information in real time. While Large Language Model (LLM) agents offer strong capabilities for understanding heterogeneous community context, they are not suitable for directly producing reliable real-time control actions. To address these challenges, we propose a bi-level AI-agent-based framework, WaterAdmin, which integrates LLM-based community context abstraction at the upper level with optimization-based operational control at the lower level. This design leverages the complementary strengths of both paradigms to enable adaptive and reliable operation. We implement WaterAdmin on the hydraulic simulation platform EPANET and demonstrate superior performance in maintaining pressure reliability and reducing energy consumption under highly dynamic community contexts.


【6】From Recency Bias to Stable Convergence: Block Kaczmarz Methods for Online Preference Learning in Matchmaking Applications
标题:从近因偏差到稳定收敛:匹配应用中在线偏好学习的块Kaczmarz方法
链接:https://arxiv.org/abs/2604.09964

作者:James Nguyen
摘要:We present a family of Kaczmarz-based preference learning algorithms for real-time personalized matchmaking in reciprocal recommender systems. Post-step L2 normalization, common in Kaczmarz-inspired online learners, induces exponential recency bias: the influence of the t-th interaction decays as eta^(n - t), reaching approximately 1e-6 after just 20 swipes at eta = 0.5. We resolve this by replacing the normalization step with a Tikhonov-regularized projection denominator that bounds step size analytically without erasing interaction history. When candidate tag vectors are not pre-normalized, as in realistic deployments where candidates vary in tag density, the Tikhonov denominator ||a||^2 + alpha produces genuinely per-candidate adaptive step sizes, making it structurally distinct from online gradient descent with any fixed learning rate. We further derive a block variant that processes full swipe sessions as a single Gram matrix solve. Population-scale simulation over 6,400 swipes reveals that Block Normalized Kaczmarz (BlockNK), which combines the batch Gram solve with post-session L2 normalization, achieves the highest preference alignment (Align@20 = 0.698), the strongest inter-session direction stability (delta = 0.994), and the flattest degradation profile under label noise across flip ratios p_flip in [0.10, 0.35]. Experiments under cosine similarity subsampling further show that adaptively filtering the candidate pool toward the current preference direction substantially improves asymptotic alignment, at the cost of introducing a feedback loop that may slow recovery from miscalibration. The sequential Tikhonov-Kaczmarz method performs comparably to K-NoNorm under our simulation conditions, suggesting the dominant practical gain over normalized Kaczmarz is the removal of per-step normalization rather than the Tikhonov constant alpha itself.
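The recency-bias figure and the Tikhonov denominator described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' code: the function names are hypothetical, and the update rule is the textbook Kaczmarz projection with `alpha` added to the denominator, which is assumed here to match the paper's "Tikhonov-regularized projection denominator".

```python
def recency_weight(eta, n, t):
    """Influence of the t-th interaction after n interactions: eta^(n - t)."""
    return eta ** (n - t)


def tikhonov_kaczmarz_step(w, a, b, alpha):
    """One Kaczmarz-style update toward a.w = b with step 1 / (||a||^2 + alpha)."""
    residual = b - sum(wi * ai for wi, ai in zip(w, a))
    denom = sum(ai * ai for ai in a) + alpha
    return [wi + residual / denom * ai for wi, ai in zip(w, a)]


# 20 swipes at eta = 0.5: 0.5 ** 20 is about 9.5e-7, i.e. roughly 1e-6.
weight_after_20 = recency_weight(0.5, 20, 0)

# alpha = 0 recovers the exact projection; alpha > 0 shortens the step.
w_exact = tikhonov_kaczmarz_step([0.0, 0.0], [3.0, 4.0], 5.0, alpha=0.0)
w_damped = tikhonov_kaczmarz_step([0.0, 0.0], [3.0, 4.0], 5.0, alpha=25.0)
```

Because the denominator depends on `||a||^2`, candidates with denser tag vectors receive smaller effective steps, which is the per-candidate adaptivity the abstract refers to.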


【7】Last-Iterate Convergence of Randomized Kaczmarz and SGD with Greedy Step Size
标题:采用贪婪步长的随机Kaczmarz和SGD的末次迭代收敛
链接:https://arxiv.org/abs/2604.09909

作者:Michał Dereziński,Xiaoyu Dong
摘要:We study last-iterate convergence of SGD with greedy step size over smooth quadratics in the interpolation regime, a setting which captures the classical Randomized Kaczmarz algorithm as well as other popular iterative linear system solvers. For these methods, we show that the $t$-th iterate attains an $O(1/t^{3/4})$ convergence rate, addressing a question posed by Attia, Schliserman, Sherman, and Koren, who gave an $O(1/t^{1/2})$ guarantee for this setting. In the proof, we introduce the family of stochastic contraction processes, whose behavior can be described by the evolution of a certain deterministic eigenvalue equation, which we analyze via a careful discrete-to-continuous reduction.
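For context, the classical Randomized Kaczmarz iteration that this setting captures can be sketched as below. This is the textbook algorithm (rows sampled with probability proportional to their squared norm), not the paper's analysis or code; the function name and the toy system are illustrative assumptions. The step `residual / ||a_i||^2` is the one that zeroes the residual of the sampled equation.

```python
import random


def randomized_kaczmarz(A, b, iters=500, seed=0):
    """Solve a consistent system A x = b by random row projections."""
    rng = random.Random(seed)
    row_norms_sq = [sum(v * v for v in row) for row in A]
    x = [0.0] * len(A[0])
    for _ in range(iters):
        # Sample a row with probability proportional to its squared norm.
        i = rng.choices(range(len(A)), weights=row_norms_sq)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        step = residual / row_norms_sq[i]
        # Project x onto the hyperplane {x : A[i] . x = b[i]}.
        x = [xj + step * a for xj, a in zip(x, A[i])]
    return x


A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -2.0]  # consistent system with solution x* = [1, -1]
x_hat = randomized_kaczmarz(A, b)
```

The value returned is the last iterate, which is the quantity whose convergence rate the abstract discusses.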


【8】Cost-optimal Sequential Testing via Doubly Robust Q-learning
标题:通过双重稳健Q学习进行成本最优的顺序测试
链接:https://arxiv.org/abs/2604.11165

作者:Doudou Zhou,Yiran Zhang,Dian Jin,Yingye Zheng,Lu Tian,Tianxi Cai
摘要:Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is correctly specified. We establish oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations demonstrate improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates how the method reduces testing cost without compromising predictive accuracy.


预测|估计(15篇)

【1】Physics-Informed State Space Models for Reliable Solar Irradiance Forecasting in Off-Grid Systems
标题:用于离网系统中可靠太阳辐射率预测的物理知情状态空间模型
链接:https://arxiv.org/abs/2604.11807

作者:Mohammed Ezzaldin Babiker Abdullah
摘要:The stable operation of autonomous off-grid photovoltaic systems dictates reliance on solar forecasting algorithms that respect atmospheric thermodynamics. Contemporary deep learning models consistently exhibit critical anomalies, primarily severe temporal phase lags during cloud transients and physically impossible nocturnal power generation. To resolve this divergence between data-driven modeling and deterministic celestial mechanics, this research introduces the Thermodynamic Liquid Manifold Network. The proposed methodology projects 15 meteorological and geometric variables into a Koopman-linearized Riemannian manifold to systematically map complex climatic dynamics. The architecture integrates a Spectral Calibration unit and a multiplicative Thermodynamic Alpha-Gate. This system synthesizes real-time atmospheric opacity with theoretical clear-sky boundary models, structurally enforcing strict celestial geometry compliance. This completely neutralizes phantom nocturnal generation while maintaining zero-lag synchronization during rapid weather shifts. Validated against a rigorous five-year testing horizon in a severe semi-arid climate, the framework achieves an RMSE of 18.31 Wh/m2 and a Pearson correlation of 0.988. The model strictly maintains a zero-magnitude nocturnal error across all 1826 testing days and exhibits a sub-30-minute phase response during high-frequency transients. Comprising exactly 63,458 trainable parameters, this ultra-lightweight design establishes a robust, thermodynamically consistent standard for edge-deployable microgrid controllers.


【2】TempusBench: An Evaluation Framework for Time-Series Forecasting
标题:TempusBench:时间序列预测的评估框架
链接:https://arxiv.org/abs/2604.11529

作者:Denizalp Goktas,Gerardo Riaño-Briceño,Alif Abdullah,Aryan Nair,Chenkai Shen,Beatriz de Lucio,Alexandra Magnusson,Farhan Mashrur,Ahmed Abdulla,Shawrna Sen,Mahitha Thippireddy,Gregory Schwartz,Amy Greenwald
摘要:Foundation models have transformed natural language processing and computer vision, and a rapidly growing literature on time-series foundation models (TSFMs) seeks to replicate this success in forecasting. While recent open-source models demonstrate the promise of TSFMs, the field lacks a comprehensive and community-accepted model evaluation framework. We see at least four major issues impeding progress on the development of such a framework. First, current evaluation frameworks consist of benchmark forecasting tasks derived from often outdated datasets (e.g., M3), many of which lack clear metadata and overlap with the corpora used to pre-train TSFMs. Second, existing frameworks evaluate models along a narrowly defined set of benchmark forecasting tasks such as forecast horizon length or domain, but overlook core statistical properties such as non-stationarity and seasonality. Third, domain-specific models (e.g., XGBoost) are often compared unfairly, as existing frameworks neglect a systematic and consistent hyperparameter tuning convention for all models. Fourth, visualization tools for interpreting comparative performance are lacking. To address these issues, we introduce TempusBench, an open-source evaluation framework for TSFMs. TempusBench consists of 1) new datasets which are not included in existing TSFM pretraining corpora, 2) a set of novel benchmark tasks that go beyond existing ones, 3) a model evaluation pipeline with a standardized hyperparameter tuning protocol, and 4) a tensorboard-based visualization interface. We provide access to our code on GitHub: https://github.com/Smlcrm/TempusBench.


【3】AbLWR: A Context-Aware Listwise Ranking Framework for Antibody-Antigen Binding Affinity Prediction via Positive-Unlabeled Learning
标题:AbLWR:通过正-无标记(PU)学习预测抗体-抗原结合亲和力的上下文感知列表排序框架
链接:https://arxiv.org/abs/2604.11272

作者:Fan Xu,Zhi-an Huang,Haohuai He,Yidong Song,Wei Liu,Dongxu Zhang,Yao Hu,Kay Chen Tan
摘要:Accurate prediction of antibody-antigen binding affinity is fundamental to therapeutic design, yet remains constrained by severe label sparsity and the complexity of antigenic variations. In this paper, we propose AbLWR (Antibody-antigen binding affinity List-Wise Ranking), a novel framework that reformulates the conventional affinity regression task as a listwise ranking problem. To mitigate label sparsity, AbLWR incorporates a PU (Positive-Unlabeled) learning mechanism leveraging a dual-level contrastive objective and meta-optimized label refinement to learn robust representations. Furthermore, we address antigenic variation by employing a homologous antigen sampling strategy where Multi-Head Self-Attention (MHSA) explicitly models inter-sample relationships within training lists to capture subtle affinity nuances. Extensive experiments demonstrate that AbLWR significantly outperforms state-of-the-art baselines, improving the Precision@1 (P@1) by over 10% in randomized cross-validation experiments. Notably, case studies on Influenza and IL-33 validate its practical utility, demonstrating robust ranking consistency in distinguishing subtle viral mutations and efficiently prioritizing top-tier candidates for wet-lab screening.


【4】ShapShift: Explaining Model Prediction Shifts with Subgroup Conditional Shapley Values
标题:ShapShift:用子组条件Shapley值解释模型预测偏移
链接:https://arxiv.org/abs/2604.11200

作者:Tom Bewley,Salim I. Amoukou,Emanuele Albini,Saumitra Mishra,Manuela Veloso
摘要:Changes in input distribution can induce shifts in the average predictions of machine learning models. Such prediction shifts may impact downstream business outcomes (e.g. a bank's loan approval rate), so understanding their causes can be crucial. We propose ShapShift: a Shapley value method for attributing prediction shifts to changes in the conditional probabilities of interpretable subgroups of data, where these subgroups are defined by the structure of decision trees. We initially apply this method to single decision trees, providing exact explanations based on conditional probability changes at split nodes. Next, we extend it to tree ensembles by selecting the most explanatory tree and accounting for residual effects. Finally, we propose a model-agnostic variant using surrogate trees grown with a novel objective function, allowing application to models like neural networks. While exact computation can be intensive, approximation techniques enable practical application. We show that ShapShift provides simple, faithful, and near-complete explanations of prediction shifts across model classes, aiding model monitoring in dynamic environments.


【5】K-Way Energy Probes for Metacognition Reduce to Softmax in Discriminative Predictive Coding Networks
标题:判别式预测编码网络中用于元认知的K路能量探针可归约为Softmax
链接:https://arxiv.org/abs/2604.11011

作者:Jon-Paul Cacioli
备注:33 pages, 3 figures
摘要:We present this as a negative result with an explanatory mechanism, not as a formal upper bound.   Predictive coding networks (PCNs) admit a K-way energy probe in which each candidate class is fixed as a target, inference is run to settling, and the per-hypothesis settled energies are compared. The probe appears to read a richer signal source than softmax, since the per-hypothesis energy depends on the entire generative chain.   We argue this appearance is misleading under the standard Pinchetti-style discriminative PC formulation. We present an approximate reduction showing that with target-clamped CE-energy training and effectively-feedforward latent dynamics, the K-way energy margin decomposes into a monotone function of the log-softmax margin plus a residual that is not trained to correlate with correctness. The decomposition predicts that the structural probe should track softmax from below.   We test this across six conditions on CIFAR-10: extended deterministic training, direct measurement of latent movement during inference, a post-hoc decoder fairness control on a backpropagation network, a matched-budget PC vs BP comparison, a five-point Langevin temperature sweep, and trajectory-integrated MCPC training. In every condition the probe sat below softmax. The gap was stable across training procedures within the discriminative PC family. Final-state and trajectory-integrated training produced probes whose AUROC_2 values differed by less than 10^-3 at deterministic evaluation.   The empirical regime is small: single seed, 2.1M-parameter network, 1280 test images. We frame the result as a preprint inviting replication. We discuss conditions under which the decomposition does not apply (bidirectional PC, prospective configuration, generative PC, non-CE energy formulations) and directions for productive structural probing the analysis does not foreclose.


【6】WaveMoE: A Wavelet-Enhanced Mixture-of-Experts Foundation Model for Time Series Forecasting
标题:WaveMoE:用于时间序列预测的小波增强型专家混合基础模型
链接:https://arxiv.org/abs/2604.10544

作者:Shunyu Wu,Jiawei Huang,Weibin Feng,Boxin Li,Xiao Zhang,Erli Meng,Dan Li,Jian Lou,See-Kiong Ng
备注:Presented at ICLR 2026 TSALM Workshop (1st Workshop on Time Series in the Age of Large Models)
摘要:Time series foundation models (TSFMs) have recently achieved remarkable success in universal forecasting by leveraging large-scale pretraining on diverse time series data. Complementing this progress, incorporating frequency-domain information yields promising performance in enhancing the modeling of complex temporal patterns, such as periodicity and localized high-frequency dynamics, which are prevalent in real-world time series. To advance this direction, we propose a new perspective that integrates explicit frequency-domain representations into scalable foundation models, and introduce WaveMoE, a wavelet-enhanced mixture-of-experts foundation model for time series forecasting. WaveMoE adopts a dual-path architecture that jointly processes time series tokens and wavelet tokens aligned along a unified temporal axis, and coordinates them through a shared expert routing mechanism that enables consistent expert specialization while efficiently scaling model capacity. Preliminary experimental results on 16 diverse benchmark datasets indicate that WaveMoE has the potential to further improve forecasting performance by incorporating wavelet-domain corpora.


【7】Integrating SAINT with Tree-Based Models: A Case Study in Employee Attrition Prediction
标题:将SAINT与基于树的模型集成:员工流失预测的案例研究
链接:https://arxiv.org/abs/2604.10337

作者:Adil Derrazi,Javad Pourmostafa Roshan Sharami
备注:Accepted at IntelliSys 2025 (Springer LNNS)
摘要:Employee attrition presents a major challenge for organizations, increasing costs and reducing productivity. Predicting attrition accurately enables proactive retention strategies, but existing machine learning models often struggle to capture complex feature interactions in tabular HR datasets. While tree-based models such as XGBoost and LightGBM perform well on structured data, traditional encoding techniques like one-hot encoding can introduce sparsity and fail to preserve semantic relationships between categorical features.   This study explores a hybrid approach by integrating SAINT (Self-Attention and Intersample Attention Transformer)-generated embeddings with tree-based models to enhance employee attrition prediction. SAINT leverages self-attention mechanisms to model intricate feature interactions. In this study, we explore SAINT both as a standalone classifier and as a feature extractor for tree-based models. We evaluate the performance, generalizability, and interpretability of standalone models (SAINT, XGBoost, LightGBM) and hybrid models that combine SAINT embeddings with tree-based classifiers.   Experimental results show that standalone tree-based models outperform both the standalone SAINT model and the hybrid approaches in predictive accuracy and generalization. Contrary to expectations, the hybrid models did not improve performance. One possible explanation is that tree-based models struggle to utilize dense, high-dimensional embeddings effectively. Additionally, the hybrid approach significantly reduced interpretability, making model decisions harder to explain. These findings suggest that transformer-based embeddings, while capturing feature relationships, do not necessarily enhance tree-based classifiers. Future research should explore alternative fusion strategies for integrating deep learning with structured data.


【8】Not Your Stereo-Typical Estimator: Combining Vision and Language for Volume Perception
标题:并非典型的立体估计器:结合视觉和语言实现体积感知
链接:https://arxiv.org/abs/2604.09886

作者:Gautham Vinod,Bruce Coburn,Siddeshwar Raghavan,Fengqing Zhu
摘要:Accurate volume estimation of objects from visual data is a long-standing challenge in computer vision with significant applications in robotics, logistics, and smart health. Existing methods often rely on complex 3D reconstruction pipelines or struggle with the ambiguity inherent in single-view images. To address these limitations, we introduce a new method that fuses implicit 3D cues from stereo vision with explicit prior knowledge from natural language text. Our approach extracts deep features from a stereo image pair and a descriptive text prompt that contains the object's class and an approximate volume, then integrates them using a simple yet effective projection layer into a unified, multi-modal representation for regression. We conduct extensive experiments on public datasets demonstrating that our text-guided approach significantly outperforms vision-only baselines. Our findings show that leveraging even simple textual priors can effectively guide the volume estimation task, paving the way for more context-aware visual measurement systems. Code: https://gitlab.com/viper-purdue/stereo-typical-estimator.


【9】STaR-DRO: Stateful Tsallis Reweighting for Group-Robust Structured Prediction
标题:STaR-DRO:有状态Tsallis重新加权以实现群体稳健结构化预测
链接:https://arxiv.org/abs/2604.09737

作者:Samah Fodeh,Ganesh Puthiaraju,Elyas Irankhah,Linhai Ma,Srivani Talakokkul,Afshan Khan,Sreeraj Ramachandran,Jordan Alpert,Sarah Schellhorn
摘要:Structured prediction requires models to generate ontology-constrained labels, grounded evidence, and valid structure under ambiguity, label skew, and heterogeneous group difficulty. We present a two-part framework for controllable inference and robust fine-tuning. First, we introduce a task-agnostic prompting strategy that combines XML-based instruction structure, disambiguation rules, verification-style reasoning, schema constraints, and self-validation to address format drift, label ambiguity, evidence hallucination, and metadata-conditioned confusion in in-context structured generation. Second, we introduce STaR-DRO, a stateful robust optimization method for group heterogeneity. It combines Tsallis mirror descent with momentum-smoothed, centered group-loss signals and bounded excess-only multipliers so that only persistently hard groups above a neutral baseline are upweighted, concentrating learning where it is most needed while avoiding volatile, dense exponentiated-gradient reweighting and unnecessary loss from downweighting easier groups. We evaluate the combined framework on EPPC Miner, a benchmark for extracting hierarchical labels and evidence spans from patient-provider secure messages. Prompt engineering improves zero-shot by +15.44 average F1 across Code, Sub-code, and Span over four Llama models. Building on supervised fine-tuning, STaR-DRO further improves the hardest semantic decisions: on Llama-3.3-70B-Instruct, Code F1 rises from 79.24 to 81.47 and Sub-code F1 from 67.78 to 69.30, while preserving Span performance and reducing group-wise validation cross-entropy by up to 29.6% on the most difficult clinical categories. Because these rare and difficult groups correspond to clinically consequential communication behaviors, these gains are not merely statistical improvements: they directly strengthen communication mining reliability for patient-centered care analysis.


【10】ML-Based Real-Time Downlink Performance Prediction in Standalone 5G NR Using Smartphones
标题:使用智能手机的独立5G NR中基于ML的实时下行链路性能预测
链接:https://arxiv.org/abs/2604.09632

作者:Md Mahfuzur Rahman,Jareen Shuva,Nishith Tripathi,Jeffrey H. Reed,Lingjia Liu
摘要:We propose a machine learning (ML)-based framework for downlink performance prediction in 5G networks using real-time measurements from commercial off-the-shelf (COTS) user equipment (UE). Our experimental platform integrates the srsRAN 5G New Radio (NR) stack deployed on a Dell desktop serving as the 5G next generation nodeB (gNB), operating at 3.4 GHz. Two Google Pixel 7a smartphones are used to collect physical layer characteristics such as channel quality indicator (CQI), modulation and coding scheme (MCS), bit rate, transmission time interval (TTI), and block error rate (BLER), which are leveraged as predictors in model training. We use commercial-grade traffic generation tools, including Ookla, for stationary and mobility measurements under line-of-sight (LOS) and non-line-of-sight (nLOS) conditions. Test data includes global Ookla servers (e.g., USA, Portugal, Ghana, Egypt, Japan), iperf TCP/UDP data, and video streaming sessions from YouTube. To analyze inter-user interference, we also include scenarios with multiple UEs at the same location. We evaluate the predictive performance of five supervised regression models: linear regression, decision tree regression, random forest regression, extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM). Our results demonstrate that throughput and BLER can be accurately predicted using COTS hardware and standard ML techniques in diverse real-world 5G scenarios.


【11】Signal-Aware Conditional Diffusion Surrogates for Transonic Wing Pressure Prediction
标题:跨音速机翼压力预测的信号感知条件扩散代理
链接:https://arxiv.org/abs/2604.11263

作者:Víctor Francés-Belda,Carlos Sanmiguel Vila,Rodrigo Castellanos
备注:18 pages, 9 figures
摘要:Accurate and efficient surrogate models for aerodynamic surface pressure fields are essential for accelerating aircraft design and analysis, yet deterministic regressors trained with pointwise losses often smooth sharp nonlinear features. This work presents a conditional denoising diffusion probabilistic model for predicting surface pressure distributions on the NASA Common Research Model wing under varying conditions of Mach number, angle of attack, and four control surface deflections. The framework operates on unstructured surface data through a principal component representation used as a non-truncated, reversible linear reparameterization of the pressure field, enabling a fully connected architecture. A signal-aware training objective is derived by propagating a reconstruction loss through the diffusion process, yielding a timestep-dependent weighting that improves fidelity in regions with strong pressure gradients. The stochastic sampling process is analyzed through repeated conditional generations, and two diagnostic metrics are introduced, the Local Reliability Index and Global Reliability Index, to relate sampling-induced spread to reconstruction error. Relative to the considered deterministic baselines, the proposed formulation reduces mean absolute error and improves the reconstruction of suction peaks, shock structures, and control surface discontinuities. The sampling-induced spread exhibits strong correspondence with surrogate error, supporting its interpretation as a qualitative reliability indicator rather than calibrated uncertainty quantification.


【12】Probabilistic Prediction of Neural Dynamics via Autoregressive Flow Matching
标题:基于自回归流匹配的神经动力学概率预测
链接:https://arxiv.org/abs/2604.11178

作者:Nicole Rogalla,Yuzhen Qin,Mario Senden,Ahmed El-Gazzar,Marcel van Gerven
备注:25 pages, 4 figures
摘要:Forecasting neural activity in response to naturalistic stimuli remains a key challenge for understanding brain dynamics and enabling downstream neurotechnological applications. Here, we introduce a generative forecasting framework for modeling neural dynamics based on autoregressive flow matching (AFM). Building on recent advances in transport-based generative modeling, our approach probabilistically predicts neural responses at scale from multimodal sensory input. Specifically, we learn the conditional distribution of future neural activity given past neural dynamics and concurrent sensory input, explicitly modeling neural activity as a temporally evolving process in which future states depend on recent neural history. We evaluate our framework on the Algonauts project 2025 challenge functional magnetic resonance imaging dataset using subject-specific models. AFM significantly outperforms both a non-autoregressive flow-matching baseline and the official challenge general linear model baseline in predicting short-term parcel-wise blood oxygenation level-dependent (BOLD) activity, demonstrating improved generalization and widespread cortical prediction performance. Ablation analyses show that access to past BOLD dynamics is a dominant driver of performance, while autoregressive factorization yields consistent, modest gains under short-horizon, context-rich conditions. Together, these findings position autoregressive flow-based generative modeling as an effective approach for short-term probabilistic forecasting of neural dynamics with promising applications in closed-loop neurotechnology.


【13】One-Step Score-Based Density Ratio Estimation
标题:基于分数的一步密度比估计
链接:https://arxiv.org/abs/2604.10672

作者:Wei Chen,Qibin Zhao,John Paisley,Junmei Yang,Delu Zeng
摘要:Density ratio estimation (DRE) is a useful tool for quantifying discrepancies between probability distributions, but existing approaches often involve a trade-off between estimation quality and computational efficiency. Classical direct DRE methods are usually efficient at inference time, yet their performance can seriously deteriorate when the discrepancy between distributions is large. In contrast, score-based DRE methods often yield more accurate estimates in such settings, but they typically require considerable repeated function evaluations and numerical integration. We propose One-step Score-based Density Ratio Estimation (OS-DRE), a partly analytic and solver-free framework designed to combine these complementary advantages. OS-DRE decomposes the time score into spatial and temporal components, representing the latter with an analytic radial basis function (RBF) frame. This formulation converts the otherwise intractable temporal integral into a closed-form weighted sum, thereby removing the need for numerical solvers and enabling DRE with only one function evaluation. We further analyze approximation conditions for the analytic frame, and establish approximation error bounds for both finitely and infinitely smooth temporal kernels, grounding the framework in existing approximation theory. Experiments across density estimation, continual Kullback-Leibler and mutual information estimation, and near out-of-distribution detection demonstrate that OS-DRE offers a favorable balance between estimation quality and inference efficiency.
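
The step from a temporal integral to a closed-form weighted sum can be illustrated numerically. The sketch below is not the OS-DRE implementation: it assumes Gaussian RBFs, an integration interval of [0, 1], and made-up weights, centers and bandwidth. It only shows that once a temporal function is written as a weighted sum of Gaussian RBFs, its integral follows from the error function with no numerical solver.

```python
import math


def rbf_sum(t, weights, centers, sigma):
    """Evaluate g(t) = sum_k w_k * exp(-(t - c_k)^2 / (2 * sigma^2))."""
    return sum(
        w * math.exp(-((t - c) ** 2) / (2.0 * sigma ** 2))
        for w, c in zip(weights, centers)
    )


def rbf_sum_integral(weights, centers, sigma, lower=0.0, upper=1.0):
    """Closed-form integral of g over [lower, upper] using the error function.

    Each Gaussian term integrates to
    sigma * sqrt(pi / 2) * (erf((upper - c) / (sigma * sqrt(2)))
                            - erf((lower - c) / (sigma * sqrt(2)))).
    """
    scale = sigma * math.sqrt(math.pi / 2.0)
    denom = sigma * math.sqrt(2.0)
    return sum(
        w * scale * (math.erf((upper - c) / denom) - math.erf((lower - c) / denom))
        for w, c in zip(weights, centers)
    )


def midpoint_quadrature(fn, lower, upper, steps):
    """Plain midpoint rule, used here only as a numerical cross-check."""
    width = (upper - lower) / steps
    return width * sum(fn(lower + (i + 0.5) * width) for i in range(steps))


# Hypothetical coefficients; in a learned model these would come from a network.
weights = [0.7, -0.3, 1.2]
centers = [0.1, 0.5, 0.9]
sigma = 0.2

closed_form = rbf_sum_integral(weights, centers, sigma)
numeric = midpoint_quadrature(
    lambda t: rbf_sum(t, weights, centers, sigma), 0.0, 1.0, 10_000
)
```

The closed form needs a single pass over the coefficients, whereas the quadrature check evaluates the function 10,000 times; the two values agree to within the quadrature error.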


【14】Predicting Associations between Solar Flares and Coronal Mass Ejections Using SDO/HMI Magnetograms and a Hybrid Neural Network
标题:使用SDO/HMI磁图和混合神经网络预测太阳耀斑与日冕物质抛射之间的关联
链接:https://arxiv.org/abs/2604.10016

作者:Jialiang Li,Vasyl Yurchyshyn,Jason T. L. Wang,Haimin Wang,Manolis K. Georgoulis,Wen He,Yasser Abduallah,Hameedullah A. Farooki,Yan Xu
备注:14 pages, 8 figures
摘要:Solar eruptions, including flares and coronal mass ejections (CMEs), have a significant impact on Earth. Some flares are associated with CMEs, and some are not. The association between flares and CMEs is not always obvious. In this study, we propose a new deep learning method, specifically a hybrid neural network (HNN) that combines a vision transformer with long short-term memory, to predict associations between flares and CMEs. HNN finds spatio-temporal patterns in the time series of line-of-sight magnetograms of solar active regions (ARs) collected by the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory and uses the patterns to predict whether a flare projected to occur within the next 24 hours will be eruptive (i.e., CME-associated) or confined (i.e., not CME-associated). Our experimental results demonstrate the good performance of the HNN method. Furthermore, the results show that magnetic flux cancellation in polarity inversion line regions may well play a role in triggering flare-associated CMEs, a finding consistent with the literature.


【15】Dynamic Forecasting and Temporal Feature Evolution of Stock Repurchases in Listed Companies Using Attention-Based Deep Temporal Networks
标题:基于注意力的深度时间网络的上市公司股票回购动态预测和时间特征演变
链接:https://arxiv.org/abs/2604.09650

作者:Xiang Ao,Jingxuan Zhang,Xinyu Zhao
备注:16 pages, 8 figures
摘要:Accurately predicting stock repurchases is crucial for quantitative investment and risk management, yet traditional static models fail to capture the complex temporal dependencies of corporate financial conditions. This paper proposes a dynamic early warning system integrating economic theory with deep temporal networks. Using Chinese A-share panel data (2014-2024), we employ a hybrid Temporal Convolutional Network (TCN) and Attention-based LSTM to capture long- and short-term financial evolutionary patterns. Rolling-window cross-validation demonstrates our model significantly outperforms static baselines like Logistic Regression and XGBoost. Furthermore, utilizing Explainable AI (XAI), we reveal the temporal dynamics of repurchase decisions: prolonged "undervaluation" serves as the long-term underlying motive, while a sharp increase in "cash flow" acts as the decisive short-term trigger. This study provides a robust deep learning paradigm for financial forecasting and offers dynamic empirical support for classic corporate finance hypotheses.


其他神经网络|深度学习|模型|建模(45篇)

【1】LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling
标题:LangFlow:连续扩散在语言建模中媲美离散扩散
链接:https://arxiv.org/abs/2604.11748

作者:Yuxin Chen,Chumeng Liang,Hangke Sui,Ruihan Guo,Chaoran Cheng,Jiaxuan You,Ge Liu
摘要:Continuous diffusion models have achieved strong performance across domains such as images. However, in language modeling, prior continuous diffusion language models (DLMs) lag behind discrete counterparts. In this work, we close this gap with LangFlow, the first continuous DLM to rival discrete diffusion. Our approach connects embedding-space DLMs to Flow Matching via Bregman divergence and introduces three key innovations: (1) a novel ODE-based NLL bound for principled evaluation of continuous flow-based language models; (2) an information-uniform principle for noise scheduling, motivating a learnable scheduler based on a Gumbel distribution; and (3) an improved training protocol incorporating self-conditioning, which enhances both likelihood and sample quality. LangFlow achieves strong performance across benchmarks, reaching a perplexity (PPL) of 30.0 on LM1B and 24.6 on OpenWebText. It matches top discrete DLMs at comparable scale and surpasses autoregressive baselines in zero-shot transfer across multiple benchmarks. LangFlow provides clear evidence that continuous diffusion is a competitive and promising paradigm for language modeling. Code: https://github.com/nealchen2003/LangFlow


【2】Fairness is Not Flat: Geometric Phase Transitions Against Shortcut Learning
标题:公平不是平坦的:对抗捷径学习的几何相变
链接:https://arxiv.org/abs/2604.11704

作者:Nicolas Rodriguez-Alvarez,Fernando Rodriguez-Merino
摘要:Deep Neural Networks are highly susceptible to shortcut learning, frequently memorizing low-dimensional spurious correlations instead of underlying causal mechanisms. This phenomenon not only degrades out-of-distribution robustness but also induces severe demographic biases in sensitive applications. In this paper, we propose a geometric a priori methodology to mitigate shortcut learning. By deploying a zero-hidden-layer ($N=1$) Topological Auditor, we mathematically isolate features that monopolize the gradient without human intervention. We empirically demonstrate a Capacity Phase Transition: once linear shortcuts are pruned, networks are forced to utilize higher geometric capacity ($N \geq 16$) to curve the decision boundary and learn ethical representations. Our approach outperforms L1 Regularization -- which collapses into demographic bias -- and operates at a fraction of the computational cost of post-hoc methods like Just Train Twice (JTT), successfully reducing counterfactual gender vulnerability from 21.18% to 7.66%.


【3】Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind
标题:将计就计:通过心智理论学习用于信念引导的双面间谍防御者
链接:https://arxiv.org/abs/2604.11666

作者:Hanqi Xiao,Vaidehi Patil,Zaid Khan,Hyunji Lee,Elias Stengel-Eskin,Mohit Bansal
备注:First two authors contributed equally. Code: https://github.com/The-Inscrutable-X/AIDoubleAgentDefenders
摘要:As large language models (LLMs) become the engine behind conversational systems, their ability to reason about the intentions and states of their dialogue partners (i.e., form and use a theory-of-mind, or ToM) becomes increasingly critical for safe interaction with potentially adversarial partners. We propose a novel privacy-themed ToM challenge, ToM for Steering Beliefs (ToM-SB), in which a defender must act as a Double Agent to steer the beliefs of an attacker with partial prior knowledge within a shared universe. To succeed on ToM-SB, the defender must engage with and form a ToM of the attacker, with a goal of fooling the attacker into believing they have succeeded in extracting sensitive information. We find that strong frontier models like Gemini3-Pro and GPT-5.4 struggle on ToM-SB, often failing to fool attackers in hard scenarios with partial attacker prior knowledge, even when prompted to reason about the attacker's beliefs (ToM prompting). To close this gap, we train models on ToM-SB to act as AI Double Agents using reinforcement learning, testing both fooling and ToM rewards. Notably, we find a bidirectionally emergent relationship between ToM and attacker-fooling: rewarding fooling success alone improves ToM, and rewarding ToM alone improves fooling. Across four attackers with different strengths, six defender methods, and both in-distribution and out-of-distribution (OOD) evaluation, we find that gains in ToM and attacker-fooling are well-correlated, highlighting belief modeling as a key driver of success on ToM-SB. AI Double Agents that combine both ToM and fooling rewards yield the strongest fooling and ToM performance, outperforming Gemini3-Pro and GPT-5.4 with ToM prompting on hard scenarios. We also show that ToM-SB and AI Double Agents can be extended to stronger attackers, demonstrating generalization to OOD settings and the upgradability of our task.


【4】GPU Acceleration of Sparse Fully Homomorphic Encrypted DNNs
标题:稀疏全同态加密DNN的GPU加速
链接:https://arxiv.org/abs/2604.11659

作者:Lara D'Agata,Carlos Agulló-Domingo,Óscar Vera-López,Kaustubh Shivdikar,Ardhi W. B. Yudha,Ferhat Yaman,David Kaeli,José L. Abellán,Ian Colbert,José Cano
备注:Accepted to the 6th Workshop on Machine Learning and Systems (EuroMLSys) co-located with EuroSys '26
摘要:Fully homomorphic encryption (FHE) has recently attracted significant attention as both a cryptographic primitive and a systems challenge. Given the latest advances in accelerated computing, FHE presents a promising opportunity for progress, with applications ranging from machine learning to information security. We target the most computationally intensive operation in deep neural networks from a hardware perspective, matrix multiplication (matmul), and adapt it for execution on AMD GPUs. We propose a new optimized method that improves the runtime and complexity of ciphertext matmul by using FIDESlib, a recent open-source FHE library designed specifically for GPUs. By exploiting sparsity in both operands, our sparse matmul implementation outperforms its CPU counterpart by up to $3.0\times$ and reduces the time complexity from cubic to semi-linear, demonstrating an improvement over existing FHE matmul implementations.


【5】SCNO: Spiking Compositional Neural Operator -- Towards a Neuromorphic Foundation Model for Nuclear PDE Solving
标题:SCNO:脉冲组合神经算子--迈向用于核领域偏微分方程求解的神经形态基础模型
链接:https://arxiv.org/abs/2604.11625

作者:Samrendra Roy,Souvik Chakraborty,Rizwan-uddin,Syed Bahauddin Alam
摘要:Neural operators have emerged as powerful surrogates for partial differential equation (PDE) solvers, yet they are typically trained as monolithic models for individual PDEs, require energy-intensive GPU hardware, and must be retrained from scratch when new physics emerge. We introduce the Spiking Compositional Neural Operator (SCNO), a modular architecture combining spiking and conventional components that addresses all three limitations. SCNO maintains a library of small spiking neural operator blocks, each trained on a single elementary differential operator (convection, diffusion, reaction), and composes them through a lightweight input-conditioned aggregator to solve coupled PDEs not seen during block training. A small correction network learns cross-coupling residuals while keeping all blocks and the aggregator frozen, preserving zero-forgetting modular expansion by construction. We evaluate SCNO on eight PDE families including five coupled systems and a nuclear-relevant 1-group neutron diffusion equation. SCNO with correction achieves the lowest relative $L^2$ error on four of five coupled PDEs, outperforming both a monolithic spiking DeepONet (by up to 62%, mean over 3 seeds) and a standard ANN DeepONet (by up to 65%), while requiring only 95K trainable parameters versus 462K for the monolithic baseline. To our knowledge, this is the first compositional spiking neural operator and the first proof-of-concept for modular neuromorphic PDE solving with built-in forgetting-free expansion.


【6】bacpipe: a Python package to make bioacoustic deep learning models accessible
标题:bacpipe:一个使生物声学深度学习模型易于访问的Python包
链接:https://arxiv.org/abs/2604.11560

作者:Vincent S. Kather,Sylvain Haupert,Burooj Ghani,Dan Stowell
摘要:1. Natural sounds have been recorded for millions of hours over the previous decades using passive acoustic monitoring. Improvements in deep learning models have vastly accelerated the analysis of large portions of this data. While new models advance the state-of-the-art, accessing them using tools to harness their full potential is not always straightforward. Here we present bacpipe, a collection of bioacoustic deep learning models and evaluation pipelines accessible through a graphical and programming interface, designed for both ecologists and computer scientists. Bacpipe is a modular software package intended as a point of convergence for bioacoustic models.   2. Bacpipe streamlines the usage of state-of-the-art models on custom audio datasets, generating acoustic feature vectors (embeddings) and classifier predictions. A modular design allows evaluation and benchmarking of models through interactive visualizations, clustering and probing.   3. We believe that access to new deep learning models is important. By designing bacpipe to target a wide audience, researchers will be enabled to answer new ecological and evolutionary questions in bioacoustics.   4. In conclusion, we believe accessibility to developments in deep learning to a wider audience benefits the ecological questions we are trying to answer.


【7】Structural Consequences of Policy-Based Interventions on the Global Supply Chain Network
标题:全球供应链网络政策干预的结构性后果
链接:https://arxiv.org/abs/2604.11479

作者:Lea Karbevska,Liming Xu,Zehui Dai,Sara AlMahri,Alexandra Brintrup
摘要:As global political tensions rise and the anticipation of additional tariffs from the United States on international trade increases, the issues of economic independence and supply chain resilience become more prominent. The importance of supply chain resilience has been further underscored by disruptions caused by the COVID-19 pandemic and the ongoing war in Ukraine. In light of these challenges, ranging from geopolitical instability to product supply uncertainties, governments are increasingly focused on adopting new trade policies. This study explores the impact of several of these policies on the global electric vehicle (EV) supply chain network, with a particular focus on their effects on country clusters and the broader structure of international trade. Specifically, we analyse three key policies: Country Plus One, Friendshoring, and Reshoring. Our findings show that Friendshoring, contrary to expectations, leads to greater globalisation by increasing the number of supply links across friendly countries, potentially raising transaction costs. The Country Plus One policy similarly enhances network density through redundant links, while the Reshoring policy creates challenges in the EV sector due to the high number of irreplaceable products. Additionally, the effects of these policies vary across industries; for instance, mining goods are less affected under the Country Plus One policy than under the Friendshoring policy.


【8】Emulating Non-Differentiable Metrics via Knowledge-Guided Learning: Introducing the Minkowski Image Loss
标题:通过知识引导学习模拟不可微指标:引入Minkowski图像损失
链接:https://arxiv.org/abs/2604.11422

作者:Filippo Quarenghi,Ryan Cotsakis,Tom Beucler
摘要:The "differentiability gap" presents a primary bottleneck in Earth system deep learning: since models cannot be trained directly on non-differentiable scientific metrics and must rely on smooth proxies (e.g., MSE), they often fail to capture high-frequency details, yielding "blurry" outputs. We develop a framework that bridges this gap using two different methods to deal with non-differentiable functions: the first is to analytically approximate the original non-differentiable function into a differentiable equivalent one; the second is to learn differentiable surrogates for scientific functionals. We formulate the analytical approximation by relaxing discrete topological operations using temperature-controlled sigmoids and continuous logical operators. Conversely, our neural emulator uses Lipschitz-convolutional neural networks to stabilize gradient learning via: (1) spectral normalization to bound the Lipschitz constant; and (2) hard architectural constraints enforcing geometric principles. We demonstrate this framework's utility by developing the Minkowski image loss, a differentiable equivalent for the integral-geometric measures of surface precipitation fields (area, perimeter, connected components). Validated on the EUMETNET OPERA dataset, our constrained neural surrogate achieves high emulation accuracy, completely eliminating the geometric violations observed in unconstrained baselines. However, applying these differentiable surrogates to a deterministic super-resolution task reveals a fundamental trade-off: while strict Lipschitz regularization ensures optimization stability, it inherently over-smooths gradient signals, restricting the recovery of highly localized convective textures. This work highlights the necessity of coupling such topological constraints with stochastic generative architectures to achieve full morphological realism.
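
The analytical relaxation mentioned in this abstract can be pictured with the simplest of the three measures, area. The sketch below is an assumption-laden illustration, not the paper's Minkowski image loss: the toy precipitation field, the threshold, and the function names are hypothetical. It replaces the hard "pixel exceeds threshold" test with a temperature-controlled sigmoid so that the resulting area varies smoothly with the pixel values.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hard_area(field, threshold):
    """Non-differentiable area: number of pixels strictly above the threshold."""
    return sum(1 for row in field for value in row if value > threshold)


def soft_area(field, threshold, temperature):
    """Smooth surrogate: each pixel contributes sigmoid((value - threshold) / T).

    Lower temperatures approach the hard count; higher temperatures give
    smoother (but more biased) gradients with respect to the pixel values.
    """
    return sum(
        sigmoid((value - threshold) / temperature) for row in field for value in row
    )


# Hypothetical 3x3 precipitation field (arbitrary units) and threshold.
field = [
    [0.0, 0.2, 1.5],
    [2.0, 0.1, 3.0],
    [0.0, 0.0, 0.9],
]
threshold = 1.0

exact = hard_area(field, threshold)
nearly_hard = soft_area(field, threshold, temperature=0.01)
smooth = soft_area(field, threshold, temperature=1.0)
```

With a small temperature the soft area is close to the hard pixel count, while a larger temperature trades accuracy for smoothness, which is the kind of tension between stability and sharpness that the abstract reports for its surrogates.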


【9】Exact Certification of Neural Networks and Partition Aggregation Ensembles against Label Poisoning
标题:神经网络和分区聚集集成针对标签中毒的精确认证
链接:https://arxiv.org/abs/2604.11416

作者:Ajinkya Mohgaonkar,Lukas Gosch,Mahalakshmi Sabanayagam,Debarghya Ghoshdastidar,Stephan Günnemann
备注:Workshop on Principled Design for Trustworthy AI @ ICLR 2026
摘要:Label-flipping attacks, which corrupt training labels to induce misclassifications at inference, remain a major threat to supervised learning models. This drives the need for robustness certificates that provide formal guarantees about a model's robustness under adversarially corrupted labels. Existing certification frameworks rely on ensemble techniques such as smoothing or partition-aggregation, but treat the corresponding base classifiers as black boxes, yielding overly conservative guarantees. We introduce EnsembleCert, the first certification framework for partition-aggregation ensembles that utilizes white-box knowledge of the base classifiers. Concretely, EnsembleCert yields tighter guarantees than black-box approaches by aggregating per-partition white-box certificates to compute ensemble-level guarantees in polynomial time. To extract white-box knowledge from the base classifiers efficiently, we develop ScaLabelCert, a method that leverages the equivalence between sufficiently wide neural networks and kernel methods using the neural tangent kernel. ScaLabelCert yields the first exact, polynomial-time calculable certificate for neural networks against label-flipping attacks. EnsembleCert is either on par with, or significantly outperforms, the existing partition-based black-box certificates. For example, on CIFAR-10, our method can certify up to +26.5% more label flips in median over the test set compared to the existing black-box approach while requiring 100 times fewer partitions, thus challenging the prevailing notion that heavy partitioning is a necessity for strong certified robustness.
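For context, the black-box baseline that the abstract contrasts against can be sketched with the usual vote-gap argument: with disjoint partitions, one flipped label can change at most one base classifier's vote, so each flip shrinks the gap between the winning class and any other class by at most two. The code below is a sketch of that generic argument only, not of EnsembleCert or ScaLabelCert; the function name and the tie-breaking rule (smaller class index wins) are assumptions made for illustration.

```python
def black_box_certified_flips(votes):
    """Certified number of label flips under the generic vote-gap argument.

    votes[c] is the number of base classifiers (one per disjoint partition)
    voting for class c. Ties are assumed to be broken toward the smaller index.
    """
    # Winning class: most votes, smaller index on ties.
    top = max(range(len(votes)), key=lambda c: (votes[c], -c))
    # A rival with a smaller index would win a tie, so it needs one vote less to overtake.
    gap = min(
        votes[top] - (votes[c] + (1 if c < top else 0))
        for c in range(len(votes))
        if c != top
    )
    # Each flip changes at most one vote, moving the gap by at most 2.
    return top, gap // 2

prediction_a, radius_a = black_box_certified_flips([10, 3, 2])
prediction_b, radius_b = black_box_certified_flips([3, 10, 2])
```

Because this bound assumes every affected base classifier changes its vote after a single flip, it is conservative, which is the gap the white-box certificates described above aim to close.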


【10】THEIA: Learning Complete Kleene Three-Valued Logic in a Pure-Neural Modular Architecture
标题:THEIA:在纯神经模块化架构中学习完整的Kleene三值逻辑
链接:https://arxiv.org/abs/2604.11284

作者:Augustus Haoyang Li
备注:14 pages, 10 tables. Manuscript under review at the 2nd Workshop on Compositional Learning (CompLearn), ICML 2026
摘要:We present THEIA, a modular neural architecture that learns complete Kleene three-valued logic (K3) end-to-end without any external symbolic solver, and investigate what architectural prior enables compositional generalization under uncertainty. THEIA processes four mathematical domains (arithmetic, order, set membership, propositional logic) through dedicated engines that converge in a final logic module. Trained on a 2M-sample dataset with input space ~3.4x10^13, it achieves 12/12 Kleene K3 rule coverage across 5 seeds in 9.2 +/- 3.5 minutes (5.6x faster than a parameter-comparable Transformer under matched settings). A mod-3 sequential composition experiment generalizes from 5-step training to 500-step evaluation at 99.97% +/- 0.02% -- a result that critically depends on structured inductive bias: replacing the four-engine backbone with a flat MLP collapses length generalization to chance by 50 steps regardless of capacity (both 0.80M and parameter-matched 2.75M variants fail), while a pre-LN TF8LTuned Transformer baseline (3,582,147 params) trained under the identical protocol reaches 99.24% at 500 steps (Appendix D). Mechanistic probing reveals that modularity induces a delayed verdict: upstream engines encode domain-specific variables without committing to the final truth value (probe accuracy <= 74% uncertainty-only ceiling), with the verdict emerging only at the Logic Engine boundary -- causally confirmed by activation patching (100% flip rate on 986 matched pairs, replicated across n=5 seeds; 100.0% aggregate). The Transformer baseline reaches equivalent correctness through a qualitatively different representational trajectory (contraction then expansion), suggesting that modular and monolithic architectures implement distinct compositional strategies.


【11】3DTV: A Feedforward Interpolation Network for Real-Time View Synthesis
标题:3DTV:用于实时视图合成的前馈插值网络
链接:https://arxiv.org/abs/2604.11211

作者:Stefan Schulz,Fernando Edelstein,Hannah Dröge,Matthias B. Hullin,Markus Plack
摘要:Real-time free-viewpoint rendering requires balancing multi-camera redundancy with the latency constraints of interactive applications. We address this challenge by combining lightweight geometry with learning and propose 3DTV, a feedforward network for real-time sparse-view interpolation. A Delaunay-based triplet selection ensures angular coverage for each target view. Building on this, we introduce a pose-aware depth module that estimates a coarse-to-fine depth pyramid, enabling efficient feature reprojection and occlusion-aware blending. Unlike methods that require scene-specific optimization, 3DTV runs feedforward without retraining, making it practical for AR/VR, telepresence, and interactive applications. Our experiments on challenging multi-view video datasets demonstrate that 3DTV consistently achieves a strong balance of quality and efficiency, outperforming recent real-time novel-view baselines. Crucially, 3DTV avoids explicit proxies, enabling robust rendering across diverse scenes. This makes it a practical solution for low-latency multi-view streaming and interactive rendering. Project Page: https://stefanmschulz.github.io/3DTV_webpage/


【12】Gradient-Variation Regret Bounds for Unconstrained Online Learning
标题:无约束在线学习的梯度变化遗憾界
链接:https://arxiv.org/abs/2604.11151

作者:Yuheng Zhao,Andrew Jacobsen,Nicolò Cesa-Bianchi,Peng Zhao
摘要:We develop parameter-free algorithms for unconstrained online learning with regret guarantees that scale with the gradient variation $V_T(u) = \sum_{t=2}^T \|\nabla f_t(u)-\nabla f_{t-1}(u)\|^2$. For $L$-smooth convex losses, we provide fully-adaptive algorithms achieving regret of order $\widetilde{O}(\|u\|\sqrt{V_T(u)} + L\|u\|^2+G^4)$ without requiring prior knowledge of the comparator norm $\|u\|$, the Lipschitz constant $G$, or the smoothness $L$. The update in each round can be computed efficiently via a closed-form expression. Our results extend to dynamic regret and have immediate implications for the stochastically-extended adversarial (SEA) model, where they significantly improve upon the previous best-known result [Wang et al., 2025].
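To make the quantity $V_T(u)$ concrete, the sketch below evaluates it for a short sequence of quadratic losses $f_t(x) = \tfrac{1}{2}\|x - c_t\|^2$, whose gradient at $u$ is $u - c_t$. The helper name `gradient_variation`, the centers `c_t`, and the comparator `u` are hypothetical choices for illustration; the sketch computes only the quantity in the bound, not the paper's algorithm.

```python
import numpy as np

def gradient_variation(grad_fns, u):
    """V_T(u) = sum_{t=2}^T ||grad f_t(u) - grad f_{t-1}(u)||^2."""
    grads = [g(u) for g in grad_fns]
    return float(sum(np.sum((grads[t] - grads[t - 1]) ** 2)
                     for t in range(1, len(grads))))

# Hypothetical loss sequence: f_t(x) = 0.5 * ||x - c_t||^2, so grad f_t(x) = x - c_t.
centers = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([1.0, 2.0])]
grad_fns = [lambda x, c=c: x - c for c in centers]

u = np.array([3.0, -1.0])           # arbitrary comparator point
v_T = gradient_variation(grad_fns, u)
```

For these quadratics the value reduces to the summed squared distances between consecutive centers and does not depend on `u`, so a slowly drifting sequence gives a small $V_T(u)$ and hence a small leading term in the regret bound.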


【13】AIM: Intent-Aware Unified world action Modeling with Spatial Value Maps
标题:AIM:具有空间价值图的意图感知统一世界动作建模
链接:https://arxiv.org/abs/2604.11135

作者:Liaoyuan Fan,Zetian Xu,Chen Cao,Wenyao Zhang,Mingqi Yuan,Jiayu Chen
摘要:Pretrained video generation models provide strong priors for robot control, but existing unified world action models still struggle to decode reliable actions without substantial robot-specific training. We attribute this limitation to a structural mismatch: while video models capture how scenes evolve, action generation requires explicit reasoning about where to interact and the underlying manipulation intent. We introduce AIM, an intent-aware unified world action model that bridges this gap via an explicit spatial interface. Instead of decoding actions directly from future visual representations, AIM predicts an aligned spatial value map that encodes task-relevant interaction structure, enabling a control-oriented abstraction of future dynamics. Built on a pretrained video generation model, AIM jointly models future observations and value maps within a shared mixture-of-transformers architecture. It employs intent-causal attention to route future information to the action branch exclusively through the value representation. We further propose a self-distillation reinforcement learning stage that freezes the video and value branches and optimizes only the action head using dense rewards derived from projected value-map responses together with sparse task-level signals. To support training and evaluation, we construct a simulation dataset of 30K manipulation trajectories with synchronized multi-view observations, actions, and value-map annotations. Experiments on RoboTwin 2.0 benchmark show that AIM achieves a 94.0% average success rate, significantly outperforming prior unified world action baselines. Notably, the improvement is more pronounced in long-horizon and contact-sensitive manipulation tasks, demonstrating the effectiveness of explicit spatial-intent modeling as a bridge between visual world modeling and robot control.


【14】A Faster Path to Continual Learning
标题:更快的持续学习之路
链接:https://arxiv.org/abs/2604.11064

作者:Wei Li,Hangjie Yuan,Zixiang Zhao,Borui Kang,Ziwei Liu,Tao Feng
摘要:Continual Learning (CL) aims to train neural networks on a dynamic stream of tasks without forgetting previously learned knowledge. Among optimization-based approaches, C-Flat has emerged as a promising solution due to its plug-and-play nature and its ability to encourage uniformly low-loss regions for both new and old tasks. However, C-Flat requires three additional gradient computations per iteration, imposing substantial overhead on the optimization process. In this work, we propose C-Flat Turbo, a faster yet stronger optimizer that significantly reduces the training cost. We show that the gradients associated with first-order flatness contain direction-invariant components relative to the proxy-model gradients, enabling us to skip redundant gradient computations in the perturbed ascent steps. Moreover, we observe that these flatness-promoting gradients progressively stabilize across tasks, which motivates a linear scheduling strategy with an adaptive trigger to allocate larger turbo steps for later tasks. Experiments show that C-Flat Turbo is 1.0$\times$ to 1.25$\times$ faster than C-Flat across a wide range of CL methods, while achieving comparable or even improved accuracy.


【15】Pando: Do Interpretability Methods Work When Models Won't Explain Themselves?
标题:Pando:当模型不解释自己时,可解释性方法有效吗?
链接:https://arxiv.org/abs/2604.11061

作者:Ziqian Zhong,Aashiq Muhamed,Mona T. Diab,Virginia Smith,Aditi Raghunathan
摘要:Mechanistic interpretability is often motivated for alignment auditing, where a model's verbal explanations can be absent, incomplete, or misleading. Yet many evaluations do not control whether black-box prompting alone can recover the target behavior, so apparent gains from white-box tools may reflect elicitation rather than internal signal; we call this the elicitation confounder. We introduce Pando, a model-organism benchmark that breaks this confound via an explanation axis: models are trained to produce either faithful explanations of the true rule, no explanation, or confident but unfaithful explanations of a disjoint distractor rule. Across 720 finetuned models implementing hidden decision-tree rules, agents predict held-out model decisions from 10 labeled query-response pairs, optionally augmented with one interpretability tool output. When explanations are faithful, black-box elicitation matches or exceeds all white-box methods; when explanations are absent or misleading, gradient-based attribution improves accuracy by 3-5 percentage points, and relevance patching, RelP, gives the largest gains, while logit lens, sparse autoencoders, and circuit tracing provide no reliable benefit. Variance decomposition suggests gradients track decision computation, which fields causally drive the output, whereas other readouts are dominated by task representation, biases toward field identity and value. We release all models, code, and evaluation infrastructure.


【16】Learning to Adapt: In-Context Learning Beyond Stationarity
标题:学会适应:超越平稳性的上下文学习
链接:https://arxiv.org/abs/2604.10946

作者:Zhen Qin,Jiachen Jiang,Zhihui Zhu
摘要:Transformer models have become foundational across a wide range of scientific and engineering domains due to their strong empirical performance. A key capability underlying their success is in-context learning (ICL): when presented with a short prompt from an unseen task, transformers can perform per-token and next-token predictions without any parameter updates. Recent theoretical efforts have begun to uncover the mechanisms behind this phenomenon, particularly in supervised regression settings. However, these analyses predominantly assume stationary task distributions, which overlook a broad class of real-world scenarios where the target function varies over time. In this work, we bridge this gap by providing a theoretical analysis of ICL under non-stationary regression problems. We study how the gated linear attention (GLA) mechanism adapts to evolving input-output relationships and rigorously characterize its advantages over standard linear attention in this dynamic setting. To model non-stationarity, we adopt a first-order autoregressive process and show that GLA achieves lower training and testing errors by adaptively modulating the influence of past inputs -- effectively implementing a learnable recency bias. Our theoretical findings are further supported by empirical results, which validate the benefits of gating mechanisms in non-stationary ICL tasks.
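The "learnable recency bias" attributed to gating can be seen in a stripped-down recurrence. The sketch below assumes a single fixed scalar gate and a vector-valued state for scalar regression targets; the paper's GLA uses learned gates, so the function name, the constant gate, and the toy data here are illustrative assumptions rather than the authors' model.

```python
import numpy as np

def gated_linear_attention_state(xs, ys, gate):
    """Run S_t = gate * S_{t-1} + y_t * x_t over the in-context examples.

    With gate = 1 this is plain linear attention (an unweighted sum of y_i * x_i);
    with 0 < gate < 1 example i is down-weighted by gate ** (T - i).
    """
    state = np.zeros(xs.shape[1])
    for x, y in zip(xs, ys):
        state = gate * state + y * x
    return state

# Toy context: three one-hot inputs with targets 1, so each state entry is that example's weight.
xs = np.eye(3)
ys = np.ones(3)

plain = gated_linear_attention_state(xs, ys, gate=1.0)   # every example weighted equally
gated = gated_linear_attention_state(xs, ys, gate=0.5)   # older examples decay geometrically

query = np.array([1.0, 1.0, 1.0])
prediction = float(gated @ query)    # the prediction for a query is an inner product with the state
```

When the target function drifts over time, down-weighting old examples in this way reduces the influence of stale input-output pairs, which matches the intuition the abstract gives for the advantage over ungated linear attention.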


【17】Progressive Deep Learning for Automated Spheno-Occipital Synchondrosis Maturation Assessment
标题:渐进式深度学习用于自动蝶枕软骨结合成熟评估
链接:https://arxiv.org/abs/2604.10945

作者:Omid Halimi Milani,Amanda Nikho,Marouane Tliba,Lauren Mills,Emadeldeen Hamdan,Ahmet Enis Cetin,Mohammed H. Elnagar
摘要:Accurate assessment of spheno-occipital synchondrosis (SOS) maturation is a key indicator of craniofacial growth and a critical determinant for orthodontic and surgical timing. However, SOS staging from cone-beam CT (CBCT) relies on subtle, continuously evolving morphological cues, leading to high inter-observer variability and poor reproducibility, especially at transitional fusion stages. We frame SOS assessment as a fine-grained visual recognition problem and propose a progressive representation-learning framework that explicitly mirrors how expert clinicians reason about synchondral fusion: from coarse anatomical structure to increasingly subtle patterns of closure. Rather than training a full-capacity network end-to-end, we sequentially grow the model by activating deeper blocks over time, allowing early layers to first encode stable cranial base morphology before higher-level layers specialize in discriminating adjacent maturation stages. This yields a curriculum over network depth that aligns deep feature learning with the biological continuum of SOS fusion. Extensive experiments across convolutional and transformer-based architectures show that this expert-inspired training strategy produces more stable optimization and consistently higher accuracy than standard training, particularly for ambiguous intermediate stages. Importantly, these gains are achieved without changing network architectures or loss functions, demonstrating that training dynamics alone can substantially improve anatomical representation learning. The proposed framework establishes a principled link between expert dental intuition and deep visual representations, enabling robust, data-efficient SOS staging from CBCT and offering a general strategy for modeling other continuous biological processes in medical imaging.


【18】Efficient Process Reward Modeling via Contrastive Mutual Information
标题:基于对比互信息的高效过程奖励建模
链接:https://arxiv.org/abs/2604.10660

作者:Nakyung Lee,Sangwoo Hong,Jungwoo Lee
备注:Accepted at ACL 2026 Main Conference
摘要:Recent research has devoted considerable effort to verifying the intermediate reasoning steps of chain-of-thought (CoT) trajectories using process reward models (PRMs) and other verifier models. However, training a PRM typically requires human annotators to assign reward scores to each reasoning step, which is both costly and time-consuming. Existing automated approaches, such as Monte Carlo (MC) estimation, also demand substantial computational resources due to repeated LLM rollouts. To overcome these limitations, we propose contrastive pointwise mutual information (CPMI), a novel automatic reward labeling method that leverages the model's internal probability to infer step-level supervision while significantly reducing the computational burden of annotating dataset. CPMI quantifies how much a reasoning step increases the mutual information between the step and the correct target answer relative to hard-negative alternatives. This contrastive signal serves as a proxy for the step's contribution to the final solution and yields a reliable reward. The experimental results show that CPMI-based labeling reduces dataset construction time by 84% and token generation by 98% compared to MC estimation, while achieving higher accuracy on process-level evaluations and mathematical reasoning benchmarks.


【19】ReadMOF: Structure-Free Semantic Embeddings from Systematic MOF Nomenclature for Machine Learning
标题:ReadMOF:基于系统性MOF命名法的无结构语义嵌入,用于机器学习
链接:https://arxiv.org/abs/2604.10568

作者:Kewei Zhu,Cameron Wilson,Bartosz Mazur,Yi Li,Ashleigh M. Chester,Peyman Z. Moghadam
备注:29 pages, 8 figures
摘要:Systematic chemical names, such as IUPAC-style nomenclature for metal-organic frameworks (MOFs), contain rich structural and compositional information in a standardized textual format. Here we introduce ReadMOF, which is, to our knowledge, the first structure-free machine learning framework that leverages these names to model structure-property relationships without requiring atomic coordinates or connectivity graphs. By employing pretrained language models, ReadMOF converts systematic MOF names from the Cambridge Structural Database (CSD) into vector embeddings that closely represent traditional structure-based descriptors. These embeddings enable applications in materials informatics, including property prediction, similarity retrieval, and clustering, with performance comparable to geometry-dependent methods. When combined with large language models, ReadMOF also establishes chemically meaningful reasoning ability with textual input only. Our results show that structured chemical language, interpreted through modern natural language processing techniques, can provide a scalable, interpretable, and geometry-independent alternative to conventional molecular representations. This approach opens new opportunities for language-driven discovery in materials science.


【20】Heterogeneous Connectivity in Sparse Networks: Fan-in Profiles, Gradient Hierarchy, and Topological Equilibria
标题:稀疏网络中的异质连接性:扇入分布、梯度层次结构和拓扑平衡
链接:https://arxiv.org/abs/2604.10560

作者:Nikodem Tomczak
摘要:Profiled Sparse Networks (PSN) replace uniform connectivity with deterministic, heterogeneous fan-in profiles defined by continuous, nonlinear functions, creating neurons with both dense and sparse receptive fields. We benchmark PSN across four classification datasets spanning vision and tabular domains, input dimensions from 54 to 784, and network depths of 2--3 hidden layers. At 90% sparsity, all static profiles, including the uniform random baseline, achieve accuracy within 0.2-0.6% of dense baselines on every dataset, demonstrating that heterogeneous connectivity provides no accuracy advantage when hub placement is arbitrary rather than task-aligned. This result holds across sparsity levels (80-99.9%), profile shapes (eight parametric families, lognormal, and power-law), and fan-in coefficients of variation from 0 to 2.5. Internal gradient analysis reveals that structured profiles create a 2-5x gradient concentration at hub neurons compared to the ~1x uniform distribution in random baselines, with the hierarchy strength predicted by fan-in coefficient of variation ($r = 0.93$). When PSN fan-in distributions are used to initialise RigL dynamic sparse training, lognormal profiles matched to the equilibrium fan-in distribution consistently outperform standard ERK initialisation, with advantages growing on harder tasks, achieving +0.16% on Fashion-MNIST ($p = 0.036$, $d = 1.07$), +0.43% on EMNIST, and +0.49% on Forest Cover. RigL converges to a characteristic fan-in distribution regardless of initialisation. Starting at this equilibrium allows the optimiser to refine weights rather than rearrange topology. Which neurons become hubs matters more than the degree of connectivity variance, i.e., random hub placement provides no advantage, while optimisation-driven placement does.


【21】PepBenchmark: A Standardized Benchmark for Peptide Machine Learning
标题:PepBenchmark:肽机器学习的标准化基准
链接:https://arxiv.org/abs/2604.10531

作者:Jiahui Zhang,Rouyi Wang,Kuangqi Zhou,Tianshu Xiao,Lingyan Zhu,Yaosen Min,Yang Wang
摘要:Peptide therapeutics are widely regarded as the "third generation" of drugs, yet progress in peptide Machine Learning (ML) is hindered by the absence of standardized benchmarks. Here we present PepBenchmark, which unifies datasets, preprocessing, and evaluation protocols for peptide drug discovery. PepBenchmark comprises three components: (1) PepBenchData, a well-curated collection comprising 29 canonical-peptide and 6 non-canonical-peptide datasets across 7 groups, systematically covering key aspects of peptide drug development, representing, to the best of our knowledge, the most comprehensive AI-ready dataset resource to date; (2) PepBenchPipeline, a standardized preprocessing pipeline that ensures consistent dataset cleaning, construction, splitting, and feature transformation, mitigating quality issues common in ad hoc pipelines; and (3) PepBenchLeaderboard, a unified evaluation protocol and leaderboard with strong baselines across 4 major methodological families: Fingerprint-based, GNN-based, PLM-based, and SMILES-based models. Together, PepBenchmark provides the first standardized and comparable foundation for peptide drug discovery, facilitating methodological advances and translation into real-world applications. The data and code are publicly available at https://github.com/ZGCI-AI4S-Pep/PepBenchmark/.


【22】Rethinking the Diffusion Model from a Langevin Perspective
标题:从朗之万角度重新思考扩散模型
链接:https://arxiv.org/abs/2604.10465

作者:Candi Zheng,Yuan Lan
备注:20 pages, 7 figures
摘要:Diffusion models are often introduced from multiple perspectives, such as VAEs, score matching, or flow matching, accompanied by dense and technically demanding mathematics that can be difficult for beginners to grasp. One classic question is: how does the reverse process invert the forward process to generate data from pure noise? This article systematically organizes the diffusion model from a fresh Langevin perspective, offering a simpler, clearer, and more intuitive answer. We also address the following questions: how can ODE-based and SDE-based diffusion models be unified under a single framework? Why are diffusion models theoretically superior to ordinary VAEs? Why is flow matching not fundamentally simpler than denoising or score matching, but equivalent under maximum-likelihood? We demonstrate that the Langevin perspective offers clear and straightforward answers to these questions, bridging existing interpretations of diffusion models, showing how different formulations can be converted into one another within a common framework, and offering pedagogical value for both learners and experienced researchers seeking deeper intuition.


【23】Battery health prognosis using Physics-informed neural network with Quantum Feature mapping
标题:使用具有量子特征映射的物理信息神经网络进行电池健康预测
链接:https://arxiv.org/abs/2604.10362

作者:Muhammad Imran Hossain,Md Fazley Rafy,Sarika Khushlani Solanki,Anurag K. Srivastava
摘要:Accurate battery health prognosis using State of Health (SOH) estimation is essential for the reliability of multi-scale battery energy storage, yet existing methods are limited in generalizability across diverse battery chemistries and operating conditions. The inability of standard neural networks to capture the complex, high-dimensional physics of battery degradation is a major contributor to these limitations. To address this, a physics-informed neural network with the Quantum Feature Mapping (QFM) technique (QPINN) is proposed. QPINN projects raw battery sensor data into a high-dimensional Hilbert space, creating a highly expressive feature set that effectively captures subtle, non-linear degradation patterns using the Nyström method. These quantum-enhanced features are then processed by a physics-informed network that enforces physical constraints. The proposed method achieves an average SOH estimation accuracy of 99.46% across different datasets, substantially outperforming state-of-the-art baselines, with reductions in MAPE and RMSE of up to 65% and 62%, respectively. This method was validated on a large-scale, multi-chemistry dataset of 310,705 samples from 387 cells, and further showed notable adaptability in cross-validation settings, successfully transferring from one chemistry to another without relying on target-domain SOH labels.


【24】Anatomy-Informed Deep Learning for Abdominal Aortic Aneurysm Segmentation
标题:用于腹主动脉瘤分割的解剖学知情深度学习
链接:https://arxiv.org/abs/2604.10312

作者:Osamah Sufyan,Martin Brückmann,Ralph Wickenhöfer,Babette Dellen,Uwe Jaekel
备注:International Conference on Computational Science
摘要:In CT angiography, the accurate segmentation of abdominal aortic aneurysms (AAAs) is difficult due to large anatomical variability, low-contrast vessel boundaries, and the close proximity of organs whose intensities resemble vascular structures, often leading to false positives. To address these challenges, we propose an anatomy-aware segmentation framework that integrates organ exclusion masks derived from TotalSegmentator into the training process. These masks encode explicit anatomical priors by identifying non-vascular organs and penalizing aneurysm predictions within these regions, thereby guiding the U-Net to focus on the aorta and its pathological dilation while suppressing anatomically implausible predictions. Despite being trained on a relatively small dataset, the anatomy-aware model achieves high accuracy, substantially reduces false positives, and improves boundary consistency compared to a standard U-Net baseline. The results demonstrate that incorporating anatomical knowledge through exclusion masks provides an efficient mechanism to enhance robustness and generalization, enabling reliable AAA segmentation even with limited training data.


【25】Descriptor-Injected Cross-Modal Learning: A Systematic Exploration of Audio-MIDI Alignment via Spectral and Melodic Features
标题:描述符注入的跨模态学习:通过频谱和旋律特征对音频-MIDI对齐的系统探索
链接:https://arxiv.org/abs/2604.10283

作者:Mariano Fernández Méndez
备注:26 pages, 11 figures, 20 tables. Companion paper to "Harmonic Information Theory: Foundations" (2026). Code: https://github.com/AlterMundi/Phideus
摘要:Cross-modal retrieval between audio recordings and symbolic music representations (MIDI) remains challenging because continuous waveforms and discrete event sequences encode different aspects of the same performance. We study descriptor injection, the augmentation of modality-specific encoders with hand-crafted domain features, as a bridge across this gap. In a three-phase campaign covering 13 descriptor-mechanism combinations, 6 architectural families, and 3 training schedules, the best configuration reaches a mean S of 84.0 percent across five independent seeds, improving the descriptor-free baseline by 8.8 percentage points. Causal ablation shows that the audio descriptor A4, based on octave-band energy dynamics, drives the gain in the top dual models, while the MIDI descriptor D4 has only a weak inference-time effect despite improving training dynamics. We also introduce reverse cross-attention, where descriptor tokens query encoder features, reducing attention operations relative to the standard formulation while remaining competitive. CKA analysis shows that descriptors substantially increase audio-MIDI transformer layer alignment, indicating representational convergence rather than simple feature concatenation. Perturbation analysis identifies high-frequency octave bands as the dominant discriminative signal. All experiments use MAESTRO v3.0.0 with an evaluation protocol controlling for composer and piece similarity.


【26】The Phase Is the Gradient: Equilibrium Propagation for Frequency Learning in Kuramoto Networks
标题:相位即梯度:仓本(Kuramoto)网络中用于频率学习的平衡传播
链接:https://arxiv.org/abs/2604.10272

作者:Mani Rash Ahmadi
备注:15 pages, 5 figures, 8 tables. Code and data at https://github.com/caliburlabs/phasegrad
摘要:We prove that in a coupled Kuramoto oscillator network at stable equilibrium, the physical phase displacement under weak output nudging is the gradient of the loss with respect to natural frequencies, with equality as the nudging strength beta tends to zero. Prior oscillator equilibrium propagation work explicitly set aside natural frequency as a learnable parameter; we show that on sparse layered architectures, frequency learning outperforms coupling-weight learning among converged seeds (96.0% vs. 83.3% at matched parameter counts, p = 1.8e-12). The approximately 50% convergence failure rate under random initialization is a loss-landscape property, not a gradient error; topology-aware spectral seeding eliminates it in all settings tested (46/100 to 100/100 seeds on the primary task; 50/50 on a second task, K-only training, and a larger architecture).


【27】A Multi-head Attention Fusion Network for Industrial Prognostics under Discrete Operational Conditions
标题:离散操作条件下工业预测的多头注意力融合网络
链接:https://arxiv.org/abs/2604.10248

作者:Yuqi Su,Xiaolei Fang
摘要:Complex systems such as aircraft engines, turbines, and industrial machinery often operate under dynamically changing conditions. These varying operating conditions can substantially influence degradation behavior and make prognostic modeling more challenging, as accurate prediction requires explicit consideration of operational effects. To address this issue, this paper proposes a novel multi-head attention-based fusion neural network. The proposed framework explicitly models and integrates three signal components: (1) the monotonic degradation trend, which reflects the underlying deterioration of the system; (2) discrete operating states, identified through clustering and encoded into dense embeddings; and (3) residual random noise, which captures unexplained variation in sensor measurements. The core strength of the framework lies in its architecture, which combines BiLSTM networks with attention mechanisms to better capture complex temporal dependencies. The attention mechanism allows the model to adaptively weight different time steps and sensor signals, improving its ability to extract prognostically relevant information. In addition, a fusion module is designed to integrate the outputs from the degradation-trend branch and the operating-state embeddings, enabling the model to capture their interactions more effectively. The proposed method is validated using a dataset from the NASA repository, and the results demonstrate its effectiveness.


【28】Wolkowicz-Styan Upper Bound on the Hessian Eigenspectrum for Cross-Entropy Loss in Nonlinear Smooth Neural Networks
标题:非线性光滑神经网络中交叉熵损失的Hessian特征谱的Wolkowicz-Styan上界
链接:https://arxiv.org/abs/2604.10202

作者:Yuto Omae,Kazuki Sakai,Yohei Kakimoto,Makoto Sasaki,Yusuke Sakai,Hirotaka Takahashi
备注:19 pages
摘要:Neural networks (NNs) are central to modern machine learning and achieve state-of-the-art results in many applications. However, the relationship between loss geometry and generalization is still not well understood. The local geometry of the loss function near a critical point is well-approximated by its quadratic form, obtained through a second-order Taylor expansion. The coefficients of the quadratic term correspond to the Hessian matrix, whose eigenspectrum allows us to evaluate the sharpness of the loss at the critical point. Extensive research suggests flat critical points generalize better, while sharp ones lead to higher generalization error. However, sharpness requires the Hessian eigenspectrum, but general matrix characteristic equations have no closed-form solution. Therefore, most existing studies on evaluating loss sharpness rely on numerical approximation methods. Existing closed-form analyses of the eigenspectrum are primarily limited to simplified architectures, such as linear or ReLU-activated networks; consequently, theoretical analysis of smooth nonlinear multilayer neural networks remains limited. Against this background, this study focuses on nonlinear, smooth multilayer neural networks and derives a closed-form upper bound for the maximum eigenvalue of the Hessian with respect to the cross-entropy loss by leveraging the Wolkowicz-Styan bound. Specifically, the derived upper bound is expressed as a function of the affine transformation parameters, hidden layer dimensions, and the degree of orthogonality among the training samples. The primary contribution of this paper is an analytical characterization of loss sharpness in smooth nonlinear multilayer neural networks via a closed-form expression, avoiding explicit numerical eigenspectrum computation. We hope that this work provides a small yet meaningful step toward unraveling the mysteries of deep learning.
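The Wolkowicz-Styan bound itself is easy to state for a real symmetric $n \times n$ matrix $A$: with $m = \operatorname{tr}(A)/n$ and $s^2 = \operatorname{tr}(A^2)/n - m^2$, the largest eigenvalue satisfies $m + s/\sqrt{n-1} \le \lambda_{\max} \le m + s\sqrt{n-1}$. The sketch below checks this general matrix inequality numerically on a random symmetric matrix that merely stands in for a Hessian; it does not reproduce the paper's closed-form expression in terms of network parameters, and the function name is a hypothetical one.

```python
import numpy as np

def wolkowicz_styan_bounds(a):
    """Trace-based lower and upper bounds on the largest eigenvalue of a symmetric matrix."""
    n = a.shape[0]
    m = np.trace(a) / n                      # mean of the eigenvalues
    s2 = np.trace(a @ a) / n - m ** 2        # variance of the eigenvalues
    s = np.sqrt(max(s2, 0.0))                # guard against tiny negative round-off
    lower = m + s / np.sqrt(n - 1)
    upper = m + s * np.sqrt(n - 1)
    return lower, upper

rng = np.random.default_rng(0)
b = rng.standard_normal((6, 6))
hessian_like = (b + b.T) / 2.0               # random symmetric stand-in, not a real Hessian

lower, upper = wolkowicz_styan_bounds(hessian_like)
lambda_max = float(np.linalg.eigvalsh(hessian_like)[-1])   # eigvalsh returns ascending eigenvalues
```

The bound needs only two traces, which is why it can yield a closed-form sharpness estimate without an explicit eigendecomposition.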


【29】RF-LEGO: Modularized Signal Processing-Deep Learning Co-Design for RF Sensing via Deep Unrolling
Link: https://arxiv.org/abs/2604.10183

Authors: Luca Jiang-Tao Yu, Chenshu Wu
Comments: Accepted by The 32nd Annual International Conference on Mobile Computing and Networking (MobiCom '26), October 26-30, 2026, Austin, TX, USA. 16 pages
Abstract: Wireless sensing, traditionally relying on signal processing (SP) techniques, has recently shifted toward data-driven deep learning (DL) to achieve performance breakthroughs. However, existing deep wireless sensing models are typically end-to-end and task-specific, lacking reusability and interpretability. We propose RF-LEGO, a modular co-design framework that transforms interpretable SP algorithms into trainable, physics-grounded DL modules through deep unrolling. By replacing hand-tuned parameters with learnable ones while preserving core processing structures and mathematical operators, RF-LEGO ensures modularity, cascadability, and structure-aligned interpretability. Specifically, we introduce three deep-unrolled modules for critical RF sensing tasks: frequency transform, spatial angle estimation, and signal detection. Extensive experiments using real-world data for Wi-Fi, millimeter-wave, UWB, and 6G sensing demonstrate that RF-LEGO significantly outperforms existing SP and DL baselines, both standalone and when integrated into multiple downstream tasks. RF-LEGO pioneers a novel SP-DL co-design paradigm for wireless sensing via deep unrolling, shedding light on efficient and interpretable deep wireless sensing solutions. Our code is available at https://github.com/aiot-lab/RF-LEGO.


【30】Global monitoring of methane point sources using deep learning on hyperspectral radiance measurements from EMIT
Link: https://arxiv.org/abs/2604.10094

Authors: Vishal V. Batchu, Michelangelo Conserva, Alex Wilson, Anna M. Michalak, Varun Gulshan, Philip G. Brodrick, Andrew K. Thorpe, Christopher V. Arsdale
Comments: 43 pages, 27 figures, 4 tables


【31】Engineering Resource-constrained Software Systems with DNN Components: a Concept-based Pruning Approach
Link: https://arxiv.org/abs/2604.09988

Authors: Federico Formica, Andrea Rota, Aurora Francesca Zanenga, Andrea Bombarda, Mark Lawford, Lionel C. Briand, Claudio Menghi


【32】Sharpness-Aware Surrogate Training for On-Sensor Spiking Neural Networks
Link: https://arxiv.org/abs/2604.09696

Authors: Maximilian Nicholson
Comments: Currently under review at a conference workshop


【33】Isomorphic Functionalities between Ant Colony and Ensemble Learning: Part III -- Gradient Descent, Neural Plasticity, and the Emergence of Deep Intelligence
Link: https://arxiv.org/abs/2604.09677

Authors: Ernest Fokoué, Gregory Babbitt, Yuval Levental
Comments: 25 pages, 10 figures, 3 tables


【34】Fairboard: a quantitative framework for equity assessment of healthcare models
Link: https://arxiv.org/abs/2604.09656

Authors: James K. Ruffle, Samia Mohinta, Chris Foulon, Mohamad Zeina, Zicheng Wang, Sebastian Brandner, Harpreet Hyare, Parashkev Nachev
Comments: 30 pages, 6 figures, 109 extended data figures (ancillary file)


【35】Leveraging Machine Learning Techniques to Investigate Media and Information Literacy Competence in Tackling Disinformation
Link: https://arxiv.org/abs/2604.09635

Authors: José Manuel Alcalde-Llergo, Mariana Buenestado Fernández, Carlos Enrique George-Reyes, Andrea Zingoni, Enrique Yeguas-Bolívar
Comments: 20 pages. 1 figure. 4 tables


【36】VTC: DNN Compilation with Virtual Tensors for Data Movement Elimination
Link: https://arxiv.org/abs/2604.09558

Authors: Muyan Hu, Ahan Gupta, Jiachen Yuan, Vima Gupta, Taeksang Kim, Xin Xu, Janardhan Kulkarni, Ofer Dekel, Vikram Adve, Charith Mendis
Comments: Accepted to OSDI'26


【37】Minimizing classical resources in variational measurement-based quantum computation for generative modeling
Link: https://arxiv.org/abs/2604.11578

Authors: Arunava Majumder, Hendrik Poulsen Nautrup, Hans J. Briegel
Comments: 14 pages


【38】Machine-learning modeling of magnetization dynamics in quasi-equilibrium and driven metallic spin systems
Link: https://arxiv.org/abs/2604.11513

Authors: Gia-Wei Chern, Yunhao Fan, Sheng Zhang, Puhan Zhang
Comments: 19 pages, 12 figures


【39】Neural Generalized Mixed-Effects Models
Link: https://arxiv.org/abs/2604.10976

Authors: Yuli Slavutsky, Sebastian Salazar, David M. Blei


【40】bioLeak: Leakage-Aware Modeling and Diagnostics for Machine Learning in R
Link: https://arxiv.org/abs/2604.10965

Authors: Selçuk Korkmaz
Comments: 35 pages, 4 figures


【41】A Deep Generative Approach to Stratified Learning
Link: https://arxiv.org/abs/2604.10650

Authors: Randy Martinez, Rong Tang, Lizhen Lin
Comments: 79 pages, 5 figures


【42】Orthogonal machine learning for conditional odds and risk ratios
Link: https://arxiv.org/abs/2604.10412

Authors: Jiacheng Ge, Iván Díaz


【43】Daily Predictions of F10.7 and F30 Solar Indices with Deep Learning
Link: https://arxiv.org/abs/2604.10045

Authors: Zhenduo Wang, Yasser Abduallah, Jason T. L. Wang, Haimin Wang, Yan Xu, Vasyl Yurchyshyn, Vincent Oria, Khalid A. Alobaid, Xiaoli Bai
Comments: 23 pages, 12 figures


【44】Learning What's Real: Disentangling Signal and Measurement Artifacts in Multi-Sensor Data, with Applications to Astrophysics
Link: https://arxiv.org/abs/2604.09787

Authors: Pablo Mercader-Perez, Carolina Cuesta-Lazaro, Daniel Muthukrishna, Jeroen Audenaert, V. Ashley Villar, David W. Hogg, Marc Huertas-Company, William T. Freeman
Comments: Accepted at the 2nd Workshop on Foundation Models for Science at ICLR 2026. 10 pages, 6 figures, plus appendix


【45】Differentiable free energy surface: a variational approach to directly observing rare events using generative deep-learning models
Link: https://arxiv.org/abs/2604.09769

Authors: Shuo-Hui Li, Chen Chen, Yao-Wen Zhang, Ding Pan
Comments: Main text: 20 pages, 5 figures. Supplement: 12 pages


Other (75 papers)

【1】ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
Link: https://arxiv.org/abs/2604.11784

Authors: Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen


【2】CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead
Link: https://arxiv.org/abs/2604.11615

Authors: Jinpeng Ye, Chongxi Wang, Wenqing Li, Bin Yuan, Shiyi Wang, Fenglu Zhang, Junyu Yue, Jianan Xie, Yunhao Ye, Haoyu Deng, Yingkun Zhou, Xin Cheng, Fuxin Zhang, Jian Wang
Comments: Accepted to DAC 2026


【3】Utilizing and Calibrating Hindsight Process Rewards via Reinforcement with Mutual Information Self-Evaluation
Link: https://arxiv.org/abs/2604.11611

Authors: Jiashu Yao, Heyan Huang, Zeming Liu, Yuhang Guo
Comments: preprint


【4】Generative Path-Finding Method for Wasserstein Gradient Flow
Link: https://arxiv.org/abs/2604.11519

Authors: Chengyu Liu, Xiang Zhou
Comments: Due to the arXiv notice that "The Abstract field cannot be longer than 1,920 characters", the abstract shown here is shortened. For the full abstract, please download the article


【5】The Price of Ignorance: Information-Free Quotation for Data Retention in Machine Unlearning
Link: https://arxiv.org/abs/2604.11511

Authors: Bin Han, Di Feng, Zexin Fang, Jie Wang, Hans D. Schotten
Comments: Submitted to IEEE Transactions on Mobile Computing


【6】Quantization Dominates Rank Reduction for KV-Cache Compression
Link: https://arxiv.org/abs/2604.11501

Authors: Samuel Salfati
Comments: 16 pages, 3 figures


【7】From Attribution to Action: A Human-Centered Application of Activation Steering
Link: https://arxiv.org/abs/2604.11467

Authors: Tobias Labarta, Maximilian Dreyer, Katharina Weitz, Wojciech Samek, Sebastian Lapuschkin


【8】Select Smarter, Not More: Prompt-Aware Evaluation Scheduling with Submodular Guarantees
Link: https://arxiv.org/abs/2604.11328

Authors: Xiaoyu Ma, Yiwen Li, Haoyue Liu, Zhichao Wang, Ye Chen, Yongxin Guo, Xiaoying Tang


【9】S$^3$: Structured Sparsity Specification
Link: https://arxiv.org/abs/2604.11315

Authors: Ayoub Ghriss
Comments: 8 pages main text, 12 pages appendix


【10】Beyond Fixed False Discovery Rates: Post-Hoc Conformal Selection with E-Variables
Link: https://arxiv.org/abs/2604.11305

Authors: Meiyi Zhu, Osvaldo Simeone
Comments: 32 pages, 29 figures


【11】The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
Link: https://arxiv.org/abs/2604.11297

Authors: Yang Liu, Enxi Wang, Yufei Gao, Weixin Zhang, Bo Wang, Zhiyuan Zeng, Yikai Zhang, Yining Zheng, Xipeng Qiu


【12】Transactional Attention: Semantic Sponsorship for KV-Cache Retention
Link: https://arxiv.org/abs/2604.11288

Authors: Abhinaba Basu


【13】Reducing Hallucination in Enterprise AI Workflows via Hybrid Utility Minimum Bayes Risk (HUMBR)
Link: https://arxiv.org/abs/2604.11141

Authors: Chenhao Fang, Jordi Mola, Mark Harman, Jason Nawrocki, Vaibhav Shrivastava, Yue Cheng, Jay Minesh Shah, Katayoun Zand, Mansi Tripathi, Arya Pudota, Matthew Becker, Hervé Robert, Abhishek Gulati


【14】Efficient Transceiver Design for Aerial Image Transmission and Large-scale Scene Reconstruction
Link: https://arxiv.org/abs/2604.11098

Authors: Zeyi Ren, Jialin Dong, Wei Zuo, Yikun Wang, Bingyang Cheng, Sheng Zhou, Zhisheng Niu
Comments: 6 pages, 6 figures, submitted to IEEE ISIT-w


【15】Bottleneck Tokens for Unified Multimodal Retrieval
Link: https://arxiv.org/abs/2604.11095

Authors: Siyu Sun, Jing Ren, Zhaohe Liao, Dongxiao Mao, Xiangyuan Ren, Yiyi Zhang, Haohua Zhao, Weixiong Lin, Jiang Shaohua, Liqing Zhang, Yuchao Zheng


【16】Lightweight Low-Light Image Enhancement via Distribution-Normalizing Preprocessing and Depthwise U-Net
Link: https://arxiv.org/abs/2604.11071

Authors: Shimon Murai, Teppei Kurita, Ryuta Satoh, Yusuke Moriuchi
Comments: Technical report for the NTIRE 2026 Efficient Low-Light Image Enhancement Challenge (CVPR 2026 Workshops), 4th place solution


【17】RTMC: Step-Level Credit Assignment via Rollout Trees
Link: https://arxiv.org/abs/2604.11037

Authors: Tao Wang, Suhang Zheng, Xiaoxiao Xu


【18】Min-$k$ Sampling: Decoupling Truncation from Temperature Scaling via Relative Logit Dynamics
Link: https://arxiv.org/abs/2604.11012

Authors: Yuanhao Ding, Meimingwei Li, Esteban Garces Arias, Matthias Aßenmacher, Christian Heumann, Chongsheng Zhang
Comments: Accepted at ACL 2026 (Main Conference)


【19】Sanity Checks for Agentic Data Science
Link: https://arxiv.org/abs/2604.11003

Authors: Zachary T. Rewolinski, Austin V. Zane, Hao Huang, Chandan Singh, Chenglong Wang, Jianfeng Gao, Bin Yu


【20】Tracking High-order Evolutions via Cascading Low-rank Fitting
Link: https://arxiv.org/abs/2604.10980

Authors: Zhao Song


【21】Hypergraph Neural Diffusion: A PDE-Inspired Framework for Hypergraph Message Passing
Link: https://arxiv.org/abs/2604.10955

Authors: Zhiheng Zhou, Mengyao Zhou, Xixun Lin, Xingqin Qi, Guiying Yan


【22】Generative Design for Direct-to-Chip Liquid Cooling for Data Centers
Link: https://arxiv.org/abs/2604.10941

Authors: Zheng Liu
Comments: 5 pages, 2 figures


【23】Query Lower Bounds for Diffusion Sampling
Link: https://arxiv.org/abs/2604.10857

Authors: Zhiyang Xun, Eric Price


【24】Slithering Through Gaps: Capturing Discrete Isolated Modes via Logistic Bridging
Link: https://arxiv.org/abs/2604.10821

Authors: Pinaki Mohanty, Ruqi Zhang


【25】Differentially Private Verification of Distribution Properties
Link: https://arxiv.org/abs/2604.10819

Authors: Elbert Du, Cynthia Dwork, Pranay Tankala, Linjun Zhang


【26】Mitigating Privacy Risk via Forget Set-Free Unlearning
Link: https://arxiv.org/abs/2604.10636

Authors: Aviraj Newatia, Michael Cooper, Viet Nguyen, Rahul G. Krishnan
Comments: 50 pages, 20 figures, Published at The Fourteenth International Conference on Learning Representations


【27】Multimodal Dataset Normalization and Perceptual Validation for Music-Taste Correspondences
Link: https://arxiv.org/abs/2604.10632

Authors: Matteo Spanio, Valentina Frezzato, Antonio Rodà
Comments: Submitted to SMC2026


【28】Distributionally Robust PAC-Bayesian Control
Link: https://arxiv.org/abs/2604.10588

Authors: Domagoj Herceg, Duarte Antunes


【29】Preventing Latent Rehearsal Decay in Online Continual SSL with SOLAR
Link: https://arxiv.org/abs/2604.10586

Authors: Giacomo Cignoni, Simone Magistri, Andrew D. Bagdanov, Antonio Carta


【30】Exact Finite-Sample Variance Decomposition of Subagging: A Spectral Filtering Perspective
Link: https://arxiv.org/abs/2604.10469

Authors: Ye Su, Mingrui Ye, Yining Wang, Jipeng Guo, Yong Liu


【31】Replicable Composition
Link: https://arxiv.org/abs/2604.10423

Authors: Kiarash Banihashem, MohammadHossein Bateni, Hossein Esfandiari, Samira Goudarzi, MohammadTaghi Hajiaghayi
Comments: Abstract shortened due to arXiv requirements


【32】Neural Stochastic Processes for Satellite Precipitation Refinement
Link: https://arxiv.org/abs/2604.10414

Authors: Shunya Nagashima, Takumi Bannai, Shuitsu Koyama, Tomoya Mitsui, Shuntaro Suzuki


【33】Intent-aligned Formal Specification Synthesis via Traceable Refinement
Link: https://arxiv.org/abs/2604.10392

Authors: Zhe Ye, Aidan Z. H. Yang, Huangyuan Su, Zhenyu Liao, Samuel Tenka, Zhizhen Qin, Udaya Ghai, Dawn Song, Soonho Kong


【34】Structural Gating and Effect-aligned Lag-resolved Temporal Causal Discovery Framework with Application to Heat-Pollution Extremes
Link: https://arxiv.org/abs/2604.10371

Authors: Rui Chen, Jinsong Wu


【35】The Amazing Agent Race: Strong Tool Users, Weak Navigators
Link: https://arxiv.org/abs/2604.10261

Authors: Zae Myung Kim, Dongseok Lee, Jaehyung Kim, Vipul Raheja, Dongyeop Kang


【36】Exploring the impact of fairness-aware criteria in AutoML
Link: https://arxiv.org/abs/2604.10224

Authors: Joana Simões, João Correia


【37】Mild Over-Parameterization Benefits Asymmetric Tensor PCA
Link: https://arxiv.org/abs/2604.10208

Authors: Shihong Ding, Weicheng Lin, Cong Fang


【38】FatigueFusion: Latent Space Fusion for Fatigue-Driven Motion Synthesis
Link: https://arxiv.org/abs/2604.10199

Authors: Iliana Loi, Konstantinos Moustakas
Comments: 13 pages, 9 figures. This work has been submitted to the IEEE for possible publication


【39】Tessera: Unlocking Heterogeneous GPUs through Kernel-Granularity Disaggregation
Link: https://arxiv.org/abs/2604.10180

Authors: Tiancheng Hu, Jin Qin, Zheng Wang, Junhao Hu, Yuzheng Wang, Lei Chen, Yizhou Shan, Mingxing Zhang, Ting Cao, Chunwei Xia, Huimin Cui, Tao Xie, Chenxi Wang


【40】A Modularized Framework for Piecewise-Stationary Restless Bandits
Link: https://arxiv.org/abs/2604.10177

Authors: Kuan-Ta Li, Chia-Chun Lin, Ping-Chun Hsieh, Yu-Chih Huang


【41】"bot lane noob" Towards Deployment of NLP-based Toxicity Detectors in Video Games
Link: https://arxiv.org/abs/2604.10175

Authors: Jonas Ave, Irdin Pekaric, Matthias Frohner, Giovanni Apruzzese
Comments: Accepted to ESORICS'26


【42】Consensus-based Recursive Multi-Output Gaussian Process
Link: https://arxiv.org/abs/2604.10146

Authors: Yogesh Prasanna Kumar Rao, Tamas Keviczky, Raj Thilak Rajan
Comments: Submitted to International Workshop on Signal Processing and Artificial Intelligence in Wireless Communications (IEEE SPAWC 2026)


【43】When Can You Poison Rewards? A Tight Characterization of Reward Poisoning in Linear MDPs
Link: https://arxiv.org/abs/2604.10062

Authors: Jose Efraim Aguilar Escamilla, Haoyang Hong, Jiawei Li, Haoyu Zhao, Xuezhou Zhang, Sanghyun Hong, Huazheng Wang


【44】Closed-Form Concept Erasure via Double Projections
Link: https://arxiv.org/abs/2604.10032

Authors: Chi Zhang, Jingpu Cheng, Zhixian Wang, Ping Liu


【45】LVSum: A Benchmark for Timestamp-Aware Long Video Summarization
Link: https://arxiv.org/abs/2604.10024

Authors: Alkesh Patel, Melis Ozyildirim, Ying-Chang Cheng, Ganesh Nagarajan
Comments: 25 pages, 5 tables, 3 figures


【46】Towards Multi-Source Domain Generalization for Sleep Staging with Noisy Labels
Link: https://arxiv.org/abs/2604.10009

Authors: Kening Wang, Di Wen, Yufan Chen, Ruiping Liu, Junwei Zheng, Jiale Wei, Kailun Yang, Rainer Stiefelhagen, Kunyu Peng
Comments: The benchmark and code will be made publicly available at https://github.com/KNWang970918/FF-TRUST.git


【47】Reproduction Beyond Benchmarks: ConstBERT and ColBERT-v2 Across Backends and Query Distributions
Link: https://arxiv.org/abs/2604.09982

Authors: Utshab Kumar Ghosh, Ashish David, Shubham Chatterjee
Comments: 10 pages, 9 tables. Accepted to the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2026)


【48】Vestibular reservoir computing
Link: https://arxiv.org/abs/2604.09943

Authors: Smita Deb, Shirin Panahi, Mulugeta Haile, Ying-Cheng Lai
Comments: 24 pages, 11 figures


【49】CableTract: A Co-Designed Cable-Driven Field Robot for Low-Compaction, Off-Grid Capable Agriculture
Link: https://arxiv.org/abs/2604.09938

Authors: Ozgur Yilmaz


【50】Efficient Personalization of Generative User Interfaces
Link: https://arxiv.org/abs/2604.09876

Authors: Yi-Hao Peng, Samarth Das, Jeffrey P. Bigham, Jason Wu


【51】COMPOSITE-Stem
Link: https://arxiv.org/abs/2604.09836

Authors: Kyle Waters, Lucas Nuzzi, Tadhg Looram, Alessandro Tomasiello, Ariel Ghislain Kemogne Kamdoum, Bikun Li, Damien Sileo, Egor Kretov, Francesco Fournier-Facio, Georgios Soloupis, Haile Kassahun, Hew Wolff, Jiaqi Cai, Lianghui Li, Marc Roth, Mohinder Naiya, Naixu Guo, Qicheng Tang, Richard Wheeler, Samuele Sala, Serguei Popov, Steven Dillman, Yuqi Li


【52】Spectral Kernel Dynamics via Maximum Caliber: Fixed Points, Geodesics, and Phase Transitions
Link: https://arxiv.org/abs/2604.09745

Authors: Jnaneshwar Das
Comments: 15 pages, 7 figures


【53】Efficient Matrix Implementation for Rotary Position Embedding
Link: https://arxiv.org/abs/2604.09742

Authors: Chen Minqi, Zhongqi Yue, Shihao Zhang, Yun Xu, Peng Wu, kaixiang Xu, Zeyi Huang, Hanwang Zhang


【54】Face Density as a Proxy for Data Complexity: Quantifying the Hardness of Instance Count
Link: https://arxiv.org/abs/2604.09689

Authors: Abolfazl Mohammadi-Seif, Ricardo Baeza-Yates
Comments: Accepted for publication at IEEE CAI 2026


【55】Digital hybridity and relics in cultural heritage: using corpus linguistics to inform design in emerging technologies from AI to VR
Link: https://arxiv.org/abs/2604.09669

Authors: Emma McClaughlin, Glenn McGarry, Alan Chamberlain, Geert De Wilde, Oliver Butler
Comments: This is a (ACM J.5 Arts & Humanities Paper) relating to Hybrid Technologies, Language, AI, VR, Interaction and Experience. 24 pages. Int J Digit Humanities (2026)


【56】NeuroPath: Practically Adopting Motor Imagery Decoding through EEG Signals
Link: https://arxiv.org/abs/2604.09654

Authors: Jiani Cao, Kun Wang, Yang Liu, Zhenjiang Li


【57】ECHO: Elastic Speculative Decoding with Sparse Gating for High-Concurrency Scenarios
Link: https://arxiv.org/abs/2604.09603

Authors: Xinyi Hu, Yuhao Shen, Baolin Zhang, Hengxin Zhang, Jun Dai, Shuang Ge, Lei Chen, Yue Li, Mingcheng Wan


【58】Spatial Competence Benchmark
Link: https://arxiv.org/abs/2604.09594

Authors: Jash Vira, Ashley Harris
Comments: Accepted at the ICLR 2026 Workshop on Efficient Spatial Reasoning


【59】Persistent Identity in AI Agents: A Multi-Anchor Architecture for Resilient Memory and Continuity
Link: https://arxiv.org/abs/2604.09588

Authors: Prahlad G. Menon
Comments: 18 pages, 2 figures. Submitting to arXiv cs.ET (Emerging Technologies)


【60】MobiFlow: Real-World Mobile Agent Benchmarking through Trajectory Fusion
Link: https://arxiv.org/abs/2604.09587

Authors: Yunfei Feng, Xi Zhao, Cheng Zhang, Dahu Feng, Daolin Cheng, Jianqi Yu, Yubin Xia, Erhu Feng


【61】Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
Link: https://arxiv.org/abs/2604.09574

Authors: Jiachen Zhu, Lingyu Yang, Rong Shan, Congmin Zheng, Zeyu Zheng, Weiwen Liu, Yong Yu, Weinan Zhang, Jianghao Lin


【62】The Diffusion-Attention Connection
Link: https://arxiv.org/abs/2604.09560

Authors: Julio Candanedo


【63】LABBench2: An Improved Benchmark for AI Systems Performing Biology Research
Link: https://arxiv.org/abs/2604.09554

Authors: Jon M Laurent, Albert Bou, Michael Pieler, Conor Igoe, Alex Andonian, Siddharth Narayanan, James Braza, Alexandros Sanchez Vassopoulos, Jacob L Steenwyk, Blake Lash, Andrew D White, Samuel G Rodriques


【64】Universality of first-order methods on random and deterministic matrices
Link: https://arxiv.org/abs/2604.11729

Authors: Nicola Gorini, Chris Jones, Dmitriy Kunisky, Lucas Pesenti


【65】Computation of Least Trimmed Squares: A Branch-and-Bound framework with Hyperplane Arrangement Enhancements
Link: https://arxiv.org/abs/2604.11584

Authors: Xiang Meng, Andrés Gómez, Rahul Mazumder


【66】ADD for Multi-Bit Image Watermarking
Link: https://arxiv.org/abs/2604.11491

Authors: An Luo, Jie Ding


【67】GlobalCY I: A JAX Framework for Globally Defined and Symmetry-Aware Neural Kähler Potentials
Link: https://arxiv.org/abs/2604.11404

Authors: Abdul Rahman
Comments: Initial draft


【68】Trustworthy Feature Importance Avoids Unrestricted Permutations
Link: https://arxiv.org/abs/2604.11253

Authors: Emanuele Borgonovo, Francesco Cappelli, Xuefei Lu, Elmar Plischke, Cynthia Rudin


【69】Harnessing Photonics for Machine Intelligence
Link: https://arxiv.org/abs/2604.10841

Authors: Hanqing Zhu, Shupeng Ning, Hongjian Zhou, Ziang Yin, Ray T. Chen, Jiaqi Gu, David Z. Pan
Comments: 20 pages


【70】Tail-Aware Information-Theoretic Generalization for RLHF and SGLD
Link: https://arxiv.org/abs/2604.10727

Authors: Huiming Zhang, Binghan Li, Wan Tian, Qiang Sun
Comments: 65 pages, 9 figures


【71】Shuffling the Data, Stretching the Step-size: Sharper Bias in constant step-size SGD
Link: https://arxiv.org/abs/2604.10373

Authors: Konstantinos Emmanouilidis, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Rene Vidal
Comments: Accepted in ICLR 2026 Conference


【72】Continuous PT-Symmetry Breaking as a Design Variable for Giant Altermagnetic Spin Splitting
Link: https://arxiv.org/abs/2604.10173

Authors: Kichan Chun, Gunn Kim
Comments: 15 pages, 5 figures


【73】Accelerated Dopant Screening in Oxide Semiconductors via Multi-Fidelity Contextual Bandits and a Three-Tier DFT Validation Funnel
Link: https://arxiv.org/abs/2604.10157

Authors: Abhinaba Basu


【74】Discrete Flow Maps
Link: https://arxiv.org/abs/2604.09784

Authors: Peter Potaptchik, Jason Yim, Adhi Saravanan, Peter Holderrieth, Eric Vanden-Eijnden, Michael S. Albergo


【75】SHANG++: Robust Stochastic Acceleration under Multiplicative Noise
Link: https://arxiv.org/abs/2603.09355

Authors: Yaxin Yu, Long Chen, Minfu Feng
Comments: 33 pages, 19 figures, 2 tables

