
Machine Learning Academic Digest [5.15]

Source: arXiv每日学术速递 (arXiv Daily Academic Digest)



cs.LG: 265 papers today


Large-model related (34 papers)

【1】Widening the Gap: Exploiting LLM Quantization via Outlier Injection
Link: https://arxiv.org/abs/2605.15152

Authors: Xiaohua Zhan, Kazuki Egashira, Robin Staab, Mark Vero, Martin Vechev
Abstract: LLM quantization has become essential for memory-efficient deployment. Recent work has shown that quantization schemes can pose critical security risks: an adversary may release a model that appears benign in full precision but exhibits malicious behavior once quantized by users. However, existing quantization-conditioned attacks have been limited to relatively simple quantization methods, where the attacker can estimate weight regions that remain invariant under the target quantization. Notably, prior attacks have consistently failed to compromise more popular and sophisticated schemes, limiting their practical impact. In this work, we introduce the first quantization-conditioned attack that consistently induces malicious behavior that can be triggered by a broad range of advanced quantization techniques, including AWQ, GPTQ, and GGUF I-quants. Our attack exploits a simple property shared by many modern quantization methods: large outliers can cause other weights to be rounded to zero. Consequently, by injecting outliers into specific weight blocks, an adversary can induce a targeted, predictable weight collapse in the model. This effect can be used to craft seemingly benign full-precision models that exhibit a wide range of malicious behaviors after quantization. Through extensive evaluation across three attack scenarios and LLMs, we show that our attack achieves high success rates against a broad range of quantization methods on which prior attacks fail. Our results demonstrate, for the first time, that the security risks of quantization are not restricted to simpler schemes but are broadly relevant across complex, widely-used quantization methods.
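Illustration (not the paper's attack code): the property named in the abstract, that a large outlier can cause the other weights in a block to round to zero, can be reproduced with a generic absmax round-to-nearest quantizer. The 4-bit setting, the block values, and the helper name `quantize_block` are assumptions made for this sketch; AWQ, GPTQ, and GGUF I-quants use more elaborate schemes.

```python
def quantize_block(weights, bits=4):
    """Symmetric absmax round-to-nearest quantization of one block.

    Returns the dequantized values so the rounding error is visible.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        return [0.0 for _ in weights]
    return [round(w / scale) * scale for w in weights]


block = [0.02, -0.03, 0.01, 0.04]

# Without an outlier, the scale fits the small weights and they survive.
clean = quantize_block(block)

# One injected outlier inflates the scale, so every original weight
# falls below half a quantization step and rounds to zero.
poisoned = quantize_block(block + [8.0])

print(clean)
print(poisoned[:4])
```

Only the outlier itself survives quantization in the second call; the four original weights all collapse to zero.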

【2】Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs
Link: https://arxiv.org/abs/2605.15077

Authors: Guangyu Feng, Huanzhi Mao, Prabal Dutta, Joseph E. Gonzalez
Abstract: Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantics, LLM decoding is blocked until each function call completes, which increases end-to-end latency. In this work, we introduce AsyncFC, a pure execution-layer framework that decouples LLM decoding from function execution, enabling overlap between model decoding and function execution as well as inter-function parallelism when dependencies permit. AsyncFC layers over existing models and unmodified function implementations, requiring no fine-tuning or changes to the standard synchronous function-calling protocol. Across standard function-calling benchmarks and adapted software engineering benchmarks, AsyncFC significantly reduces end-to-end task completion time while preserving task accuracy. Furthermore, these results reveal that LLMs possess a native capability to reason over symbolic futures that represent unresolved execution results, enabling an asynchronous paradigm for model-tool interaction.
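Illustration (not AsyncFC itself): the general idea of overlapping decoding with function execution through futures can be sketched with Python's `asyncio`. The tool `slow_tool`, the delays, and the sleep that stands in for model decoding are all hypothetical.

```python
import asyncio
import time


async def slow_tool(delay, result):
    """Stand-in for an external function call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return result


async def run_agent():
    start = time.perf_counter()
    # Launch both calls right away; each task plays the role of a future
    # that represents a result not yet available.
    weather = asyncio.create_task(slow_tool(0.2, "sunny"))
    flights = asyncio.create_task(slow_tool(0.2, "on time"))
    # Stand-in for the model continuing to decode while the tools run.
    await asyncio.sleep(0.1)
    # Resolve the futures only when their values are actually needed.
    results = (await weather, await flights)
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(run_agent())
print(results, round(elapsed, 2))
```

Run synchronously, the same three steps would take about 0.5 s (0.2 + 0.2 + 0.1); with overlap the total is close to the longest single step.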

【3】TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale
Link: https://arxiv.org/abs/2605.15053

Authors: Anurup Ganguli
Comments: 65 pages, 10 figures, 40 tables
Abstract: Continually pre-training a large language model on heterogeneous text domains, without replay or task labels, has remained an unsolved architectural problem at LLM scale. Existing methods rely on replay buffers, task identifiers, regularization penalties that scale poorly, or sentence-classification-scale evaluation. We introduce TFGN, an architectural overlay for transformer language models that produces input-conditioned, parameter-efficient updates while leaving the rest of the transformer unchanged. On six heterogeneous text domains (Prose, Python, Math, Biomedical, Chinese, JavaScript) at 1B tokens per phase across three model scales (~398M, ~739M, ~9B) and two regimes (From-Scratch and Retrofit), TFGN achieves backward transfer of -0.007 at LLaMA 3.1 8B Retrofit, HellaSwag retention 0.506/0.504/0.510, and >=99.59% L2-orthogonal gradient separation between domain pairs, with no replay, no task IDs, and no Fisher penalty. The same matrices show positive cross-domain forward transfer: held-out JavaScript PPL drops 26.8% at LLaMA-8B Retrofit and 62.0% at GPT-2 Medium From-Scratch purely from Python training. Two extensions on the same substrate close further open problems. A closed-loop meta-control layer (Extension A) reduces forgetting by an additional 81% at ~398M, mapping onto the System A and System M roles of Dupoux et al. (arXiv:2603.15381). An operator-level plan vector (Extension B) reshapes forward-pass behavior at 99.96% cosine fidelity over 30 source->target pairs. The architectural insight is a Read/Write decomposition: the forward pass is fully dense, while cross-domain parameter updates are structured so prior-domain subspaces are not written to. To our knowledge, TFGN is the first architecture that simultaneously closes catastrophic forgetting at LLM scale, realizes a closed-loop autonomous-learning meta-controller, and carries an operator-level latent planner.

【4】An Interpretable Latency Model for Speculative Decoding in LLM Serving
Link: https://arxiv.org/abs/2605.15051

Authors: Linghao Kong, Megan Flynn, Michael Peng, Nir Shavit, Mark Kurtz, Alexandre Marques
Comments: 10 pages, 8 figures
Abstract: Speculative decoding (SD) accelerates large language model (LLM) inference by using a smaller draft model to propose multiple tokens that are verified by a larger target model in parallel. While prior work demonstrates substantial speedups in isolated or fixed-batch settings, the behavior of SD in production serving systems remains poorly understood: request load varies over time, and effective batch size emerges from the serving system rather than being directly controlled or observed. In this work, we develop a simple and interpretable latency model for SD in LLM serving. We infer effective batch size from request rate using Little's Law and decompose per-request demand into load-independent and load-dependent components for prefill, drafting, and verification. We validate our model using extensive measurements from vLLM across verifier and drafter model sizes, prefill and decode lengths, request rates, draft lengths, and acceptance probabilities. The model accurately describes observed latency, explains why speedups often diminish as server load increases, and characterizes how draft length, acceptance rate, and verifier-drafter size shape latency across serving conditions, with implications for configuring SD in deployed systems. We further show how the framework extends to mixture-of-experts models, where sparse expert activation changes the effective service costs across load regimes. Together, our results provide a structured framework for understanding SD in real LLM serving systems.
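Illustration (an assumption-laden sketch, not the paper's fitted model): Little's Law states that the mean number of requests in the system equals arrival rate times mean latency. If latency is assumed to be linear in the effective batch size, with a load-independent term `base_latency` and a load-dependent term `per_request_cost` (both hypothetical names and values), the steady-state batch size has a closed form.

```python
def steady_state(rate, base_latency, per_request_cost):
    """Solve batch = rate * latency with latency = base + cost * batch.

    rate: requests per second
    base_latency: seconds per request at negligible load
    per_request_cost: extra seconds of latency per concurrent request
    """
    utilization = rate * per_request_cost
    if utilization >= 1.0:
        raise ValueError("no steady state: the load-dependent cost saturates")
    batch = rate * base_latency / (1.0 - utilization)
    latency = base_latency + per_request_cost * batch
    return batch, latency


batch, latency = steady_state(rate=10.0, base_latency=0.5, per_request_cost=0.02)
print(batch, latency)
```

Under these assumed numbers the effective batch size is 6.25 and the latency 0.625 s, which satisfies Little's Law (10 requests/s times 0.625 s).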

【5】SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning
Link: https://arxiv.org/abs/2605.15044

Authors: KiHyun Nam, Jungwoo Heo, Siu Bae, Ha-Jin Yu, Joon Son Chung
Abstract: As audio-first agents become increasingly common in physical AI, conversational robots, and screenless wearables, audio large language models (audio-LLMs) must integrate speaker-specific understanding to support user authorization, personalization, and context-aware interaction. This requires modeling who is speaking, how the voice sounds, and how recording conditions affect speaker cues. Conventional speaker verification systems provide strong scalar scores but little linguistic evidence, while current audio-LLMs and speaker-aware language models have limited ability to organize speaker information beyond binary labels or descriptive profiles. We present SpeakerLLM, a speaker-specialized audio-LLM framework that unifies single-utterance speaker profiling, recording-condition understanding, utterance-pair speaker comparison, and evidence-organized verification reasoning within a natural-language interface. We construct verification-reasoning targets and a decision-composition policy that separate profile-level evidence from the final same-or-different decision and organize recording condition, profile evidence, and the decision into a structured trace. At its core, SpeakerLLM uses a hierarchical speaker tokenizer designed to capture multiple granularities of speaker evidence. Utterance-level speaker embeddings summarize identity and profile-level cues, whereas frame-level speaker features preserve fine-grained acoustic descriptors. Experiments show that SpeakerLLM-Base improves speaker-profile and recording-condition understanding over general audio-LLMs, while SpeakerLLM-VR preserves strong generated-verdict accuracy and produces decision traces grounded in the supervised verification reasoning schema. We will release the metadata-enriched supervision dataset and target-construction code for reproducibility.

【6】Octopus: History-Free Gradient Orthogonalization for Continual Learning in Multimodal Large Language Models
Link: https://arxiv.org/abs/2605.14938

Authors: Yuehao Liu, Shanyan Guan, Weijia Zhang, Xuanming Shang, Yanhao Ge, Wei Li, Chao Ma
Abstract: Continual learning in multimodal large language models (MLLMs) aims to sequentially acquire knowledge while mitigating catastrophic forgetting, yet existing methods face inherent limitations: architecture-based approaches incur additional computational overhead and often generalize poorly to new tasks, rehearsal-based methods rely on storing historical data, raising privacy and storage concerns, and conventional regularization-based strategies alone are insufficient to fully prevent parameter interference. We propose Octopus, a two-stage continual learning framework based on History-Free Gradient Orthogonalization (HiFGO), which enforces gradient-level orthogonality without historical task data. Our proposed two-stage finetuning strategy decouples task adaptation from regularization, achieving a principled balance between plasticity and stability. Experiments on UCIT show that Octopus establishes state-of-the-art performance, surpassing prior SOTA by 2.14% and 6.82% in terms of Avg and Last.
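Illustration (generic linear algebra, not HiFGO): gradient-level orthogonality in its simplest form means removing from a gradient its component along a direction that should not be disturbed. How Octopus obtains such directions without historical data is not described in the abstract; the `protected` vector below is purely hypothetical.

```python
def project_orthogonal(grad, direction):
    """Return grad minus its component along `direction`."""
    dot_gd = sum(g * d for g, d in zip(grad, direction))
    dot_dd = sum(d * d for d in direction)
    if dot_dd == 0:
        return list(grad)  # nothing to project out
    coeff = dot_gd / dot_dd
    return [g - coeff * d for g, d in zip(grad, direction)]


grad = [1.0, 2.0, 3.0]
protected = [0.0, 1.0, 0.0]  # hypothetical direction to leave untouched

projected = project_orthogonal(grad, protected)
print(projected)
```

The projected gradient has zero inner product with the protected direction, so a step along it leaves that direction unchanged to first order.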

【7】A Hardware-Aware, Per-Layer Methodology for Post-Training Quantization of Large Language Models
Link: https://arxiv.org/abs/2605.14929

Authors: Earl Killian
Comments: 21 pages
Abstract: Scaled Outer Product (SOP) is a post-training quantization methodology for large language model weights, designed to deliver near-lossless fidelity at 4.5-6 bits per weight on hardware with per-layer LUT decode. The methodology combines per-layer search of fixed and dynamic codebook pairs selected by a per-block selection bit, signed per-block scales, activation-weighted cosine selection, and multiple-choice knapsack promotion of sensitive layers with outlier and sparse-residual correction. Fixed codebooks include NF4, BOF4, Split87, and SH4; per-layer optimized codebooks (DD4) are hosted in LUT SRAM. A new hardware-efficient LUT output format (HIF) is proposed to improve performance, energy, and cost. Across six open model families, the recommended FP6 operating point (E2M3sUE4M4, 6.5 bpw) achieves lower weight reconstruction error than the conventional per-layer-POT FP8 baseline (E4M3, 8.0 bpw) at 1.5 bpw lower storage cost, demonstrating that block-scaled small atoms with carefully chosen scale precision can replace conventionally-deployed FP8. Full evaluation across the 4.5-6 bpw range, including layer promotion and sparse residual correction, is reported in a companion paper.

【8】XFP: Quality-Targeted Adaptive Codebook Quantization with Sparse Outlier Separation for LLM Inference
Link: https://arxiv.org/abs/2605.14844

Authors: Thomas Witt
Comments: 17 pages, 3 figures, 17 tables, 1 algorithm. Code: https://github.com/flash7777/vllm/tree/multiquant
Abstract: We introduce XFP, a dynamic weight quantizer for LLM inference that inverts the conventional workflow: the operator specifies reconstruction quality floors on per-channel cosine similarity (one strict floor for attention and shared experts, one lazy floor for routed-expert MoE); XFP determines codebook size, outlier budget, and packing per layer automatically, with no Hessian, no calibration data, and no manual bit-width selection. Each weight matrix is decomposed into a sparse fp16 outlier residual and a dense sub-byte index tensor into a per-group learned codebook. Two storage modes share one auto-select frontend and one fused decode kernel: V2 (per-channel Lloyd) and V2a (shared library of L=32 codebooks per layer). On Qwen3.5-122B-A10B under V2, XFP reaches 138 tok/s single-stream decode on workstation hardware (RTX PRO 6000 Blackwell, TP=2) at 94.49% GSM8K strict-match (3 seeds, n=3957), and is 49% faster than Marlin INT4 at TP=1. For models that do not fit in the target memory envelope, we present the H-Process: a quality-driven iteration over the two cosine thresholds that finds the operating point at which the model just fits while still producing sensible output. Three constraints define its search space: the operator-set thresholds, an OOM boundary at quantize-on-load, and a garbage boundary in generation (cosine similarity steers; benches verify). On Qwen3.5-397B-A17B (512 routed experts/layer), the H-Process fits the full expert population into 2x96 GB at ~3.4 effective bits and delivers 100.9 tok/s long-output decode at 66.72% GSM8K strict-match on the full 1319-problem set (single seed at submission; multi-seed evaluation in progress), exceeding INT4 with routed-expert pruning on memory, throughput, and accuracy simultaneously.

【9】Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces
Link: https://arxiv.org/abs/2605.14786

Authors: William Lugoloobi, Samuelle Marro, Jabez Magomere, Joss Wright, Chris Russell
Abstract: As LLM-based agents increasingly browse the web on users' behalf, a natural question arises: can websites passively identify which underlying model powers an agent? Doing so would represent a significant security risk, enabling targeted attacks tailored to known model vulnerabilities. Across 14 frontier LLMs and four web environments spanning information retrieval and shopping tasks, we show that an agent's actions and interaction timings, captured via a passive JavaScript tracker, are sufficient to identify the underlying model with up to 96% F1. We formalise this attack surface by demonstrating that classifiers trained on agent actions generalise across model sizes and families. We further show that strong classifiers can be trained from few interaction traces and that agent identity can be inferred early within an episode. Injecting randomised timing delays between actions substantially degrades classifier performance, but does not provide robust protection: a classifier retrained on delayed traces largely recovers performance. We release our harness and a labelled corpus of agent traces at https://github.com/KabakaWilliam/known_actions.

【10】Non-linear Interventions on Large Language Models
Link: https://arxiv.org/abs/2605.14749

Authors: Sangwoo Kim
Abstract: Intervention is one of the most representative and widely used methods for understanding the internal representations of large language models (LLMs). However, existing intervention methods are confined to linear interventions grounded in the Linear Representation Hypothesis, leaving features encoded along non-linear manifolds beyond their reach. In this work, we introduce a general formulation of intervention that extends naturally to non-linearly represented features, together with a learning procedure that further enables intervention on implicit features lacking a direct output signature. We validate our framework on refusal bypass steering, where it steers the model more precisely than linear baselines by intervening on a non-linear feature governing refusal.

【11】Agentifying Patient Dynamics within LLMs through Interacting with Clinical World Model
Link: https://arxiv.org/abs/2605.14723

Authors: Minghao Wu, Yuting Yan, Zhenyang Cai, Ke Ji, Chuangsen Fang, Ziying Sheng, Xidong Wang, Rongsheng Wang, Hejia Zhang, Shuang Li, Benyou Wang, Hongyuan Zha
Abstract: Sepsis management in the ICU requires sequential treatment decisions under rapidly evolving patient physiology. Although large language models (LLMs) encode broad clinical knowledge and can reason over guidelines, they are not inherently grounded in action-conditioned patient dynamics. We introduce SepsisAgent, a world model-augmented LLM agent for sepsis treatment recommendation. SepsisAgent uses a learned Clinical World Model to simulate patient responses under candidate fluid-vasopressor interventions, and follows a propose-simulate-refine workflow before committing to a prescription. We first show that world-model access alone yields inconsistent LLM decision performance, motivating agent-specific training. We then train SepsisAgent through a three-stage curriculum: patient-dynamics supervised fine-tuning, propose-simulate-refine behavior cloning, and world-model-based agentic reinforcement learning. On MIMIC-IV sepsis trajectories, SepsisAgent outperforms all traditional RL and LLM-based baselines in off-policy value while achieving the best safety profile under guideline adherence and unsafe-action metrics. Further analysis shows that repeated interaction with the Clinical World Model enables the agent to learn regularities in patient evolution, which remain useful even when simulator access is removed.

【12】Mining Subscenario Refactoring Opportunities in Behaviour-Driven Software Test Suites: ML Classifiers and LLM-Judge Baselines
Link: https://arxiv.org/abs/2605.14568

Authors: Ali Hassaan Mughal, Noor Fatima, Muhammad Bilal
Comments: 30 pages, 12 figures and tables, 58 references. Under review at Software Quality Journal (Springer). Reproduction package at https://github.com/amughalbscs16/cukereuse_subscenarios_release (Apache-2.0). Upstream cukereuse corpus at https://doi.org/10.5281/zenodo.19754359
Abstract: Context. Behaviour-Driven Development (BDD) software test suites accumulate duplicated step subsequences. Three published refactoring patterns are available (within-file Background, within-repo reusable-scenario invocation, cross-organisational shared higher-level step), but no prior work automates which recurring subsequences are worth extracting or which mechanism applies. Objective. Rank recurring step subsequences ("slices") by refactoring suitability (extraction-worthy), pre-map each to one of the three patterns, and quantify prevalence across the public BDD ecosystem. Method. Every contiguous L-step window (L in [2, 18]) in a 339-repository / 276-upstream-owner Gherkin corpus is keyed by paraphrase-robust cluster identifiers and counted under three scopes. Sentence-BERT (SBERT) / Uniform Manifold Approximation and Projection (UMAP) / Hierarchical Density-Based Clustering (HDBSCAN) recovers paraphrase-equivalent slices. Three authors label a stratified 200-slice pool against a written rubric. An eXtreme Gradient Boosting (XGBoost) extraction-worthy classifier trained under 5-fold cross-validation is compared with a tuned rule baseline and two open-weight Large Language Model (LLM) judges. Results. The miner produces 5,382,249 slices collapsing to 692,020 recurring patterns. Three-author Fleiss' kappa = 0.56 (extraction-worthy) and 0.79 (mechanism). The classifier reaches out-of-fold F1 = 0.891 (95% CI [0.852, 0.927]), outperforming both the rule baseline (F1 = 0.836, p = 0.017) and the better LLM judge (F1 = 0.728, p < 1e-4). 75.0%, 59.5%, and 11.7% of scenarios carry a within-file Background, within-repo reusable-scenario, or cross-organisational shared-step candidate. Conclusion. Paraphrase-robust subscenario discovery yields a corpus-wide census of BDD refactoring opportunities; pipeline, classifier predictions, labelled pool, and rubric are released under Apache-2.0.
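Illustration (a simplified stand-in for the paper's miner): enumerating every contiguous L-step window with L in [2, 18] and counting repeats can be sketched as below. The paper keys windows by paraphrase-robust cluster identifiers; this sketch assumes exact step strings instead, and the scenarios are invented.

```python
from collections import Counter


def count_slices(scenarios, min_len=2, max_len=18):
    """Count every contiguous window of min_len..max_len steps."""
    counts = Counter()
    for steps in scenarios:
        longest = min(max_len, len(steps))
        for length in range(min_len, longest + 1):
            for start in range(len(steps) - length + 1):
                counts[tuple(steps[start:start + length])] += 1
    return counts


scenarios = [
    ["login", "open cart", "pay"],
    ["login", "open cart", "logout"],
]
counts = count_slices(scenarios)

# A slice is "recurring" here if it appears at least twice.
recurring = {s: n for s, n in counts.items() if n >= 2}
print(recurring)
```

Only the two-step slice `("login", "open cart")` recurs in this toy corpus, which makes it the sole refactoring candidate.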

【13】RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation
Link: https://arxiv.org/abs/2605.14543

Authors: Shuhao Chen, Weisen Jiang, Changmiao Wang, Xiaoqing Wu, Xuanren Shi, Yu Zhang, James T. Kwok
Abstract: Inpatient medication recommendation requires clinicians to repeatedly select specific medications, doses, and routes as a patient's condition evolves. Existing benchmarks formulate this task as admission-level prediction over coarse drug codes with multi-hot diagnostic and procedure code inputs, failing to capture the per-timepoint, information-rich nature of real prescribing. We propose RxEval, a prescription-level benchmark that evaluates LLM prescribing capability by multiple-choice questions: each question presents a detailed patient profile and time-ordered clinical trajectory, requiring selection of specific medication-dose-route triples from real prescriptions and patient-specific distractors generated via reasoning-chain perturbation. RxEval comprises 1,547 questions spanning 584 patients, 18 diagnostic categories, and 969 unique medications. Evaluation of 16 LLMs shows that RxEval is both challenging and discriminative: F1 ranges from 45.18 to 77.10 across models, and the best Exact Match is only 46.10%. Error analysis reveals that even frontier models may overlook stated patient information and fail to derive clinical conclusions.
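Illustration (standard set-based metrics, not RxEval's official scorer): for a multiple-choice question with several correct options, F1 and Exact Match can be computed as below. The option labels are hypothetical, and the benchmark's exact aggregation across questions is not stated in the abstract.

```python
def f1_and_exact_match(selected, gold):
    """Set-based F1 and exact match for one multi-answer question."""
    selected, gold = set(selected), set(gold)
    true_positives = len(selected & gold)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return f1, selected == gold


f1, exact = f1_and_exact_match(selected={"A", "B"}, gold={"A", "C"})
print(f1, exact)
```

A model can earn partial F1 credit for an overlapping selection while still failing Exact Match, which is consistent with the gap between the two reported metrics.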

【14】Exploring Geographic Relative Space in Large Language Models through Activation Patching
Link: https://arxiv.org/abs/2605.14535

Authors: Stef De Sabbata, Rahul Baiju, Stefano Mizzaro, Kevin Roitero
Abstract: The increased use of Large Language Models (LLMs) in geography raises substantial questions about the safety of integrating these tools across a wide range of processes and analyses, given our very limited understanding of their inner workings. In this extended abstract, we examine how LLMs process relative geographic space using activation patching, an emerging tool for mechanistic interpretability.

【15】Prompting Policies for Multi-step Reasoning and Tool-Use in Black-box LLMs with Iterative Distillation of Experience
Link: https://arxiv.org/abs/2605.14443

Authors: Krishna Sayana, Ketan Todi, Ambarish Jash
Comments: 10 pages and reference, appendix
Abstract: The shift toward interacting with frozen, "black-box" Large Language Models (LLMs) has transformed prompt engineering from a heuristic exercise into a critical optimization challenge. We propose a Reinforcement Learning (RL) framework for training learned prompting policies via iterative distillation of experience. In this architecture, a lightweight prompter model is optimized to maximize task-specific rewards for a larger, frozen worker LLM. By utilizing a contrastive experience buffer that couples scalar rewards with dense textual critiques, our approach effectively amortizes iterative prompt refinement into single-shot policy weights. Our experimental analysis focuses on the Big Bench Extra Hard (BBEH) and Tau-bench suites, covering a diverse range of multi-step reasoning and tool-use tasks. We demonstrate significant gains, improving performance from 55% to 90% in logic-intensive reasoning and 74% to 91% in tool-use tasks. Furthermore, we analyze the structural evolution of prompts, demonstrating how the policy discovers specialized algorithmic heuristics. We provide comprehensive comparisons against state-of-the-art evolutionary baselines like GEPA, showing that iterative distillation achieves superior performance with higher sample efficiency.

【16】To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model
Link: https://arxiv.org/abs/2605.14291

Authors: Chengshuai Zhao, Zhen Tan, Dawei Li, Zhiyuan Yu, Huan Liu
Abstract: The rapid advancement of Large Vision-Language Models (LVLMs) is increasingly accompanied by unauthorized scraping and training on multimodal web data, posing severe copyright and privacy risks to data owners. Existing countermeasures, such as machine unlearning and watermarks, are inherently post-hoc approaches that act only after intellectual property infringement has already occurred. In this work, we propose MMGuard to empower data owners to proactively protect their multimodal data against unauthorized LVLM fine-tuning. MMGuard generates unlearnable examples by injecting human-imperceptible perturbations that actively exploit the learning dynamics of LVLMs. By minimizing the training loss, the perturbation creates an optimization shortcut, causing the model to overfit to the noise and thereby degrading downstream performance when the perturbation is absent during inference. To further strengthen this defense, MMGuard introduces a cross-modal binding disruption, strategically shifting LVLM attention to enforce a spurious correlation between the noise and the training target with theoretical guarantees. Enhanced by an ensemble learning strategy for cross-model transferability, MMGuard is evaluated against nine open-source LVLMs across six datasets. Our comprehensive results demonstrate effective, stealthy, and robust protection under white-box, gray-box, and black-box threat models, establishing a mechanistic advantage in proactively defending against aggressive fine-tuning exploitation.

【17】EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization
Link: https://arxiv.org/abs/2605.14249

Authors: Zhiye Song, Kyungmi Lee, Eun Kyung Lee, Xin Zhang, Tamar Eilam, Anantha P. Chandrakasan
Abstract: We present EnergyLens, an end-to-end framework for energy-aware large language model (LLM) inference optimization. As LLMs scale, predicting and reducing their energy footprint has become critical for sustainability and datacenter operations, yet existing approaches either require production-level code and expensive profiling or fail to accurately capture multi-GPU energy behavior. As a result, practitioners lack tools for deciding which optimizations to prioritize and for selecting among existing deployment configurations when exhaustive profiling is impractical. EnergyLens addresses this gap with an intuitive einsum-based interface that captures LLM specifications including fusion, parallelism, and compute-communication overlap, combined with load-imbalance-aware MoE modeling and an empirically driven communication energy model for multi-GPU settings. We validate EnergyLens on Llama3 and Qwen3-MoE across tensor-parallel and expert-parallel configurations, achieving mean absolute percentage errors (MAPEs) between 9.25% and 13.19% for multi-GPU prefill and decode energy, and 12.97% across SM allocations for Megatron-style overlap. Our energy-driven exploration reveals up to 1.47x and 52.9x energy variation across configurations in prefill and decode efficiency and motivates distributed serving. We further show that compute-communication overlap is difficult to optimize with intuition alone, but EnergyLens correctly identifies Pareto-optimal overlap configurations.
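Illustration (the standard definition, with invented numbers): the mean absolute percentage error (MAPE) quoted in the abstract compares predicted against measured values. The energy figures below are hypothetical and do not come from the paper.

```python
def mape(predicted, measured):
    """Mean absolute percentage error, in percent."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


# Hypothetical predicted vs. measured energy per configuration, in joules.
error = mape(predicted=[110.0, 90.0], measured=[100.0, 100.0])
print(error)
```

Both predictions are off by 10% in opposite directions; because MAPE averages absolute errors, they do not cancel.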

【18】Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents
Link: https://arxiv.org/abs/2605.14241

Authors: Kexin Chu, Dawei Xiang, Wei Zhang
Comments: 12 pages, 1 figure, 14 tables
Abstract: Tool-augmented LLM agents increasingly access the same tool type through multiple functionally equivalent providers, such as web-search APIs, retrievers, or LLM backends exposed behind a shared interface. This creates a provider-routing problem under runtime load: the router must choose among providers that differ in latency, reliability, and answer quality, often without gold labels at deployment time. We introduce LQM-ContextRoute, a contextual bandit router for same-function tool providers. Its key design is latency-quality matching: instead of letting low latency offset poor answers in an additive reward, the router ranks providers by expected answer quality per service cycle. It combines this capacity-aware score with query-specific quality estimation and LLM-as-judge feedback, allowing it to adapt online to both load changes and provider-quality differences. On the main web-search load benchmark, LQM-ContextRoute improves F1 by +2.18 pp over SW-UCB while staying on the latency-quality frontier. In a high-heterogeneity StrategyQA setting, LQM-ContextRoute avoids additive-reward collapse and improves accuracy by up to +18 pp over SW-UCB; on heterogeneous retriever pools, it improves NDCG by +2.91 to +3.22 pp over SW-UCB. These results show that same-function tool routing benefits from treating latency as service capacity, especially when runtime pressure and provider-quality heterogeneity coexist.

【19】Diagnosing Training Inference Mismatch in LLM Reinforcement Learning
Link: https://arxiv.org/abs/2605.14220

Authors: Tianle Zhong, Neiwen Ling, Yifan Pi, Zijun Wei, Tianshu Yu, Geoffrey Fox, Peng Wu, Xiao Yu
Abstract: Modern LLM RL systems separate rollout generation from policy optimization. These two stages are expected to produce token probabilities that match exactly. However, implementation differences can make them assign different values to the same sequence under the same model weights, inducing Training-Inference Mismatch (TIM). TIM is difficult to inspect because it is entangled with off-policy drift and common stabilization mechanisms. In this work, we isolate TIM in a zero-mismatch diagnostic setting (VeXact), and show that small token-level numerical disagreements can independently cause training collapse. We further show that TIM changes the effective optimization problem, and identify a set of remedies that could mitigate TIM. Our results suggest that TIM is not benign numerical noise, but a systems-level perturbation that should be treated as a first-order factor in analyzing LLM RL stability.
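Illustration (an assumption about one way mismatch can matter, not the paper's analysis): if the trainer and the rollout engine assign slightly different log-probabilities to the same tokens, a sequence-level probability ratio compounds the per-token gaps. The log-probabilities below are invented, and `sequence_ratio` is a hypothetical helper.

```python
import math


def sequence_ratio(train_logps, rollout_logps):
    """Ratio p_train(sequence) / p_rollout(sequence) from per-token log-probs."""
    return math.exp(sum(t - r for t, r in zip(train_logps, rollout_logps)))


# A 0.001-nat disagreement per token looks negligible in isolation...
rollout = [-1.0] * 2000
train = [-0.999] * 2000

# ...but accumulates to 2 nats over a 2000-token sequence.
ratio = sequence_ratio(train, rollout)
print(ratio)
```

With identical log-probabilities the ratio would be exactly 1; here it is roughly 7.4, so a sequence-level weight built from these values would be far from neutral.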

【20】LLMs Know When They Know, but Do Not Act on It: A Metacognitive Harness for Test-time Scaling
Link: https://arxiv.org/abs/2605.14186

Authors: Qi Cao, Yufan Wang, Peijia Qin, Shuhao Zhang, Pengtao Xie
Abstract: Large language models (LLMs) often expose useful signals of self-monitoring: before solving a problem, they can estimate whether they are likely to succeed, and after solving it, they can judge whether their answer is likely to be correct. However, these signals are typically measured or elicited in isolation, rather than used to control inference. In this work, we ask whether LLMs possess latent metacognitive ability that can be turned into effective test-time control. Inspired by the Nelson-Narens theory from cognitive psychology, we propose a metacognitive harness that separates monitoring from reasoning. For each problem, the model first reports a pre-solve feeling-of-knowing (FOK) signal; after each solve attempt, it reports a post-solve judgment-of-learning (JOL) signal. Rather than treating these signals as passive confidence estimates, the harness turns them into an explicit control interface for reasoning: it decides when to trust the current solution, when to retry with compact metacognitive feedback, and when to pass multiple attempts to a final aggregator. Across text, code, and multimodal reasoning benchmarks, our harness substantially improves a fixed Claude Sonnet-4.6 base model without parameter updates or benchmark-specific fine-tuning. On the evaluated public benchmark snapshots, it raises pooled accuracy from 48.3 to 56.9 and exceeds the strongest listed leaderboard entries on the three primary evaluation settings: HLE-Verified, LiveCodeBench v6, and R-Bench-V. These results suggest that strong LLMs may already possess useful metacognitive ability, but require an explicit control harness to act on it during reasoning.
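Illustration (a hypothetical control loop, not the authors' harness): the trust, retry, and aggregate decisions described in the abstract could be wired around a post-solve confidence signal as follows. The callables `solve`, `judge`, and `aggregate`, the threshold, and the attempt budget are all assumptions; the pre-solve FOK signal is omitted because the abstract does not say how it is used.

```python
def metacognitive_solve(problem, solve, judge, aggregate,
                        max_attempts=3, trust_threshold=0.8):
    """Retry until the post-solve confidence is high enough, else aggregate."""
    attempts = []
    feedback = None
    for _ in range(max_attempts):
        answer = solve(problem, feedback)
        confidence = judge(problem, answer)  # JOL-like post-solve signal
        attempts.append((answer, confidence))
        if confidence >= trust_threshold:
            return answer  # trust the current solution
        # Compact feedback passed to the next attempt.
        feedback = f"Previous answer {answer!r} scored {confidence:.2f}; reconsider."
    # No attempt was trusted: hand all attempts to a final aggregator.
    return aggregate(attempts)


# Toy stand-ins for the model calls.
canned = iter(["41", "42"])
result = metacognitive_solve(
    "What is 6 * 7?",
    solve=lambda problem, feedback: next(canned),
    judge=lambda problem, answer: 0.9 if answer == "42" else 0.3,
    aggregate=lambda attempts: max(attempts, key=lambda a: a[1])[0],
)
print(result)
```

The first attempt is rejected for low confidence and the second is accepted, so the aggregator is never reached in this run.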

【21】Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study
标题:测量和减轻大型语言模型中的毒性:一项全面的复现研究
链接:https://arxiv.org/abs/2605.14087

作者:Mokshit Surana, Archit Rathod, Akshaj Satishkumar
摘要

【22】Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity
标题:超越余弦相似性的大型语言模型中的层相关性再思考
链接:https://arxiv.org/abs/2605.14075

作者:Cristian Hinostroza, Rodrigo Toro Icarte, Christ Devia, Andres Carvallo De Ferari, Eugenio Herrera-Berg, Denis Parra, Jorge F Silva
备注:Published at ICLR 2026
摘要

【23】Towards Resource-Efficient LLMs: End-to-End Energy Accounting of Distillation Pipelines
标题:迈向资源高效的LLM:蒸馏管道的端到端能源核算
链接:https://arxiv.org/abs/2605.13981

作者:Katherine Lambert,Sasha Luccioni
备注:Accepted to the 43rd International Conference on Machine Learning (ICML 2026). 11 pages, 6 figures
摘要:The rise in deployment of large language models has driven a surge in GPU demand and datacenter scaling, raising concerns about electricity use, grid stress, and the impacts of modern AI workloads. Distillation is often promoted as one of the most effective paths to obtain cheaper, more efficient models, yet these claims rarely account for the full end-to-end energy and resource costs, including crucial teacher-side workloads such as data generation, logit caching, and evaluation. We present a comprehensive energy accounting framework that measures the complete computational cost of distillation pipelines via detailed stage-wise tracking of GPU device power consumption. In our experiments, we separate and log empirical energy use across distinct phases and systematically measure the energy and emissions of two common distillation methods: the classic logit-based knowledge distillation and synthetic-data supervised fine-tuning, constructing energy-quality Pareto frontiers that expose the previously ignored costs. From these measurements and analyses, we derive practical design rules for selecting distillation methods and hyperparameters under energy and budget constraints, and release an open-source measurement harness and accounting protocol to provide a standardized foundation for comparable, reproducible distillation research, explicitly accountable for complete pipeline energy impact.

【24】EvolveMem:Self-Evolving Memory Architecture via AutoResearch for LLM Agents
标题:EvolveMem:面向LLM智能体、基于AutoResearch的自进化记忆架构
链接:https://arxiv.org/abs/2605.13941

作者:Jiaqi Liu,Xinyu Ye,Peng Xia,Zeyu Zheng,Cihang Xie,Mingyu Ding,Huaxiu Yao
摘要:Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat retrieval infrastructure as fixed: stored content evolves while scoring functions, fusion strategies, and answer-generation policies remain frozen at deployment. We argue that truly adaptive memory requires co-evolution at two levels: the stored knowledge and the retrieval mechanism that queries it. We present EvolveMem, a self-evolving memory architecture that exposes its full retrieval configuration as a structured action space optimized by an LLM-powered diagnosis module. In each evolution round, the module reads per-question failure logs, identifies root causes, and proposes targeted configuration adjustments; a guarded meta-analyzer applies them with automatic revert-on-regression and explore-on-stagnation safeguards. This closed-loop self-evolution realizes an AutoResearch process: the system autonomously conducts iterative research cycles on its own architecture, replacing manual configuration tuning. Starting from a minimal baseline, the process converges autonomously, discovering effective retrieval strategies including entirely new configuration dimensions not present in the original action space. On LoCoMo, EvolveMem outperforms the strongest baseline by 25.7% relative and achieves a 78.0% relative improvement over the minimal baseline. On MemBench, EvolveMem exceeds the strongest baseline by 18.9% relative. Evolved configurations transfer across benchmarks with positive rather than catastrophic transfer, indicating that the self-evolution process captures universal retrieval principles rather than benchmark-specific heuristics. Code is available at https://github.com/aiming-lab/SimpleMem.
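
The revert-on-regression safeguard mentioned in the abstract can be illustrated with a short loop. This is a sketch under stated assumptions: `propose` stands in for the LLM-powered diagnosis module, `evaluate` for a benchmark score, and the `top_k` setting in the usage note is a made-up configuration key. The explore-on-stagnation safeguard is omitted.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def evolve(
    config: Config,
    propose: Callable[[Config, float], Config],  # hypothetical diagnosis step
    evaluate: Callable[[Config], float],         # higher score is better
    rounds: int = 5,
) -> Tuple[Config, float, List[float]]:
    """Apply proposed adjustments, keeping only those that improve the score."""
    best_config = dict(config)
    best_score = evaluate(best_config)
    history = [best_score]
    for _ in range(rounds):
        # Merge the proposed adjustments into a candidate configuration.
        candidate = {**best_config, **propose(best_config, best_score)}
        score = evaluate(candidate)
        if score > best_score:
            best_config, best_score = candidate, score
        # Otherwise the candidate is discarded (revert-on-regression).
        history.append(best_score)
    return best_config, best_score, history
```

For example, with `evaluate = lambda c: -abs(c["top_k"] - 5)` and a proposer that always raises `top_k` by 2, a start value of 1 moves to 3 and then 5, after which further increases are reverted.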

【25】Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning
标题:迈向LLM的下一个前沿,在私有数据上训练:联邦微调的跨域基准
链接:https://arxiv.org/abs/2605.13936

作者:Daniel M. Jimenez-Gutierrez,Enrique Zuazua,Georgios Kellaris,Joaquin del Rio,Oleksii Sliusarenko,Xabi Uribe-Etxebarria
摘要:The recent success of large language models (LLMs) has been largely driven by vast public datasets. However, the next frontier for LLM development lies beyond public data. Much of the world's most valuable information is private, especially in highly regulated sectors such as healthcare and finance, where data include patient histories or customer communications. Unlocking this data could represent a major leap forward, enabling LLMs with deeper domain expertise and stronger real-world utility. Yet, these data cannot be shared because they are distributed across institutions and constrained by privacy, regulatory, and organizational barriers. Moreover, institutional datasets are typically non-independent and identically distributed (non-IID), differing across sites in population characteristics, data modalities, documentation patterns, and task-specific label distributions. In this paper, we demonstrate a practical approach to unlocking private and distributed institutional data for LLM adaptation through federated collaboration across data silos. Built on the Sherpa.ai Federated Learning platform, our framework enables nodes to jointly fine-tune a shared LLM without exchanging private data. We evaluate this approach through a cross-domain benchmark in healthcare and finance, using four closed-ended question answering and classification datasets: MedQA, MedMCQA, FPB, and FiQA-SA. We compare three parameter-efficient fine-tuning (PEFT) strategies (LoRA, QLoRA, and IA3) across pretrained backbones under non-IID settings reflecting institutional data heterogeneity. Our results show that federated fine-tuning performs close to centralized training and outperforms isolated single-institution learning. From a Green AI perspective, QLoRA and IA3 improve efficiency with limited accuracy degradation, supporting federated PEFT as a viable approach for adapting LLMs where data cannot be shared.

【26】Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models
标题:超越模式寻求RL:扩散语言模型的轨迹平衡后训练
链接:https://arxiv.org/abs/2605.13935

作者:Saba Ahmadi,Prasanna Parthasarathi,Yufei Cui
摘要:Diffusion language models are a promising alternative to autoregressive models, yet post-training methods for them largely adapt reward-maximizing objectives. We identify a central failure mode in this setting we call trajectory locking: sampled reward-driven updates over-concentrate probability mass onto a narrow set of denoising paths, reducing coverage of alternative correct solutions under repeated sampling. To address this, we propose TraFL (Trajectory Flow baLancing), a trajectory-balance objective that trains the policy toward a reward-tilted target distribution anchored to a frozen reference model. We make this practical for diffusion language models with a diffusion-compatible sequence-level surrogate and a learned prompt-dependent normalization. Across mathematical reasoning and code generation benchmarks, TraFL is the only evaluated post-training method that improves over the base model in every benchmark-length setting, with gains that persist as the sampling budget increases. The improvements transfer to held-out evaluations: TraFL stays above the base model on Minerva Math and is the strongest method on every LiveCodeBench difficulty split.

【27】Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey
标题:大型语言模型多语言知识编辑的合并方法:经验奥德赛
链接:https://arxiv.org/abs/2605.13919

作者:Kunil Lee,Ki-Young Shin,Jong-Hyeok Lee,Young-Joo Suh
摘要:Multilingual knowledge editing (MKE) remains challenging because language-specific edits interfere with one another, even when locate-then-edit methods work well in monolingual settings. This paper focuses on three issues: the effectiveness of vector merging methods for MKE, the extent to which Task Singular Vectors for Merging (TSVM) can reduce multilingual interference, and the influence of the weight scaling factor and rank compression ratio on performance. We evaluate six merging variants with two popular backbone large language models, two base knowledge editing methods, and 12 languages on the MzsRE benchmark under a large-scale batch-editing setting. Our results show that vector summation with shared covariance is the most reliable overall strategy, whereas simple summation without shared covariance performs poorly. TSVM improves performance in some settings, but its ability to mitigate multilingual interference is limited. We also find that performance is sensitive to both weight scale and rank ratio, with larger-than-default scaling and relatively low rank often yielding better results. These findings clarify the practical strengths and limits of current vector merging methods for MKE and provide guidance for future multilingual knowledge editing research.

【28】BiSpikCLM: A Spiking Language Model integrating Softmax-Free Spiking Attention and Spike-Aware Alignment Distillation
标题:BiSpikCLM:一种集成了无Softmax脉冲注意力和脉冲感知对齐蒸馏的脉冲语言模型
链接:https://arxiv.org/abs/2605.13859

作者:Sihang Guo,Chenlin Zhou,Jiaqi Wang,Kehai Chen,Qingyan Meng,Zhengyu Ma
摘要:Spiking Neural Networks (SNNs) offer promising energy-efficient alternatives to large language models (LLMs) due to their event-driven nature and ultra-low power consumption. However, to preserve capacity, most existing spiking LLMs still incur intensive floating-point matrix multiplication (MatMul) and nonlinearities, or training difficulties arising from the complex spatiotemporal dynamics. To address these challenges, we propose BiSpikCLM, the first fully binary spiking MatMul-free causal language model. BiSpikCLM introduces Softmax-Free Spiking Attention (SFSA), eliminating softmax and floating-point operations in autoregressive language modeling. For efficient training, we introduce Spike-Aware Alignment Distillation (SpAD), which aligns ANN teacher and SNN student across embeddings, attention maps, intermediate features, and output logits. SpAD framework allows BiSpikCLM to reach comparable performance to ANN counterparts using substantially fewer training tokens (e.g., only 5.6% of the tokens for the 1.3B model). As a result, BiSpikCLM achieves competitive performance at only 4.16% - 5.87% of the computational cost on natural language generation tasks. Our results highlight the feasibility and effectiveness of fully binary spike-driven LLMs and establish the distillation as a promising pathway for brain-inspired spiking NLP.

【29】A Hormone-inspired Emotion Layer for Transformer language models (HELT)
标题:Transformer语言模型的荷尔蒙情感层(HELT)
链接:https://arxiv.org/abs/2605.13858

作者:Eslam Reda,Sara El-Metwally
备注:24 pages, 5 figures
摘要:Large Language Models have demonstrated remarkable capabilities in generating contextually relevant and grammatically correct text. However, they fundamentally lack the ability to process and respond to emotional context in a manner analogous to human emotional cognition. Current approaches to emotion modeling in NLP systems rely primarily on discrete emotion classification or simplistic sentiment analysis, which fail to capture the continuous, multi-dimensional nature of human emotional states. In this paper, we introduce HormoneT5, a novel architecture that augments transformer language models with a biologically-inspired Hormone Emotion Block that simulates the human endocrine system's role in emotional processing. Our approach computes six continuous hormone-like values through specialized per-hormone attention heads, each with orthogonally initialized learnable queries, temperature-scaled attention mechanisms, and deep output projections. These hormone values are then transformed into an emotional embedding that modulates the encoder hidden states, enabling emotionally-appropriate response generation. We propose a multi-objective training framework combining sequence-to-sequence loss, hormone prediction loss with margin penalties, and diversity regularization to prevent attention collapse. Experimental results on our curated emotion-labeled dataset demonstrate that HormoneT5 achieves 85%+ per-hormone accuracy within a 0.15 tolerance threshold, with hormone differentiation ranges exceeding 0.85 across all six hormones between contrasting emotional tones. Human evaluation studies show significant preference (p < 0.01) for HormoneT5-generated responses in terms of emotional appropriateness and empathetic quality compared to baseline T5 outputs. Our work opens new directions for biologically-grounded affective computing and emotionally intelligent conversational agents.

【30】Controlling Logical Collapse in LLMs via Algebraic Ontology Projection over F2
标题:通过F2上的代数本体投影控制LLM中的逻辑崩溃
链接:https://arxiv.org/abs/2605.12968

作者:Hisashi Miyashita,Mgnite Inc
摘要:Do large language models internally encode ontological relations in a formally verifiable algebraic structure? We introduce Algebraic Ontology Projection (AOP), which projects LLM hidden states into the Galois Field F2 under Liskov Substitution Principle constraints, using only 42 relational pairs as algebraic keys. AOP achieves up to 93.33% zero-shot inclusion accuracy on unseen concept pairs (Gemma-2 Instruct with optimized prompt), with consistent 86.67% accuracy observed across multiple model families, with no model tuning, but through prompt alone. This algebraic structure is strongly layer-dependent. We introduce Semantic Crystallisation (SC), a metric that quantifies F2 constraint satisfaction relative to a random baseline and predicts zero-shot accuracy without held-out data. System prompts act as algebraic boundary conditions: only their combination with instruction tuning prevents Late-layer Collapse, a systematic degradation of logical consistency in the final layers that is observed in 7 of 10 conditions. These findings reframe forward computation as an iterative process of algebraic organisation, and open a path toward LLMs whose logical structure is not merely approximated, but formally accessible.

【31】GAMBIT: A Three-Mode Benchmark for Adversarial Robustness in Multi-Agent LLM Collectives
标题:GAMBIT:多智能体LLM群体中对抗鲁棒性的三模式基准
链接:https://arxiv.org/abs/2605.09027

作者:Alexandre Le Mercier,Chris Develder,Thomas Demeester
备注:46 pages, 16 figures
摘要:In multi-agent systems (MAS), a single deceptive agent can nullify all gains of an agentic AI collective and evade deployed defenses. However, existing adversarial studies on MAS target only shallow tasks and do not consider adaptive adversaries, which evolve their strategies to evade the very detectors trained to catch them. To address that gap, we introduce GAMBIT, a benchmark with three evaluation modes and two independent scores for evaluating imposter detectors: the first two modes measure zero-shot detection under increasing distribution shift, and a third recalibration mode measures how quickly a detector adapts to novel attacks from just 20 labeled examples. The benchmark comes with a dataset of 27,804 labeled instances spanning 240 co-evolved imposter strategies. Our contributions are threefold: (1) Using chess as a substrate deep reasoning problem and Gemini 3.1 Pro for agents, we release GAMBIT and its dataset to evaluate imposter detectors under realistic constraints against a stealthy adaptive imposter; (2) We introduce an adaptive imposter agent based on an efficient evolutionary framework, generalizable beyond chess, that collapses collective task performance while remaining essentially undetectable (50.5% F1-score with a Gemini-based detector); (3) We show that zero-shot evaluation can be highly misleading for adaptive adversaries: two detectors with near-identical zero-shot scores differ by 8x on few-shot adaptation, while the meta-learned variant converges 20x faster, a gap only visible in the recalibration mode. Altogether, GAMBIT provides the first multi-agent benchmark where adversarial attacks and defenses co-evolve, with an imposter framework generalizable beyond our use case, and promising techniques for fast recalibration in a rapidly evolving adversarial system. Code and data: https://anonymous.4open.science/r/gambit.

【32】Hidden State Poisoning Attacks against Mamba-based Language Models
标题:针对基于Mamba的语言模型的隐藏状态中毒攻击
链接:https://arxiv.org/abs/2601.01972

作者:Alexandre Le Mercier,Chris Develder,Thomas Demeester
备注:29 pages, 4 figures
摘要:State space models (SSMs) like Mamba offer efficient alternatives to Transformer-based language models, with linear time complexity. Yet, their adversarial robustness remains critically unexplored. This paper studies the phenomenon whereby specific short input phrases induce a partial amnesia effect in such models, by irreversibly overwriting information in their hidden states, referred to as a Hidden State Poisoning Attack (HiSPA). Our benchmark RoBench-25 allows evaluating a model's information retrieval capabilities when subject to HiSPAs, and confirms the vulnerability of SSMs against such attacks. Even the recent Jamba-1.7-Mini SSM-Transformer (a 52B hybrid model) collapses on RoBench-25 under some HiSPA triggers, whereas pure Transformers do not. We also observe that HiSPA triggers significantly weaken the Jamba model on the popular Open-Prompt-Injections benchmark, unlike pure Transformers. We further show that the theoretical and empirical findings extend to Mamba-2, and also analyse a Mamba-2-based hybrid (Nemotron-3-Nano). Finally, our interpretability study reveals patterns in Mamba's hidden layers during HiSPAs that could be used to build a HiSPA mitigation system. The full code and data to reproduce the experiments can be found at https://anonymous.4open.science/r/hispa_anonymous-5DB0.

【33】The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
标题:LLM量化的几何学:GPTQ作为Babai的最近平面算法
链接:https://arxiv.org/abs/2507.18553

作者:Jiale Chen,Yalda Shabanzadeh,Elvir Crnčević,Torsten Hoefler,Dan Alistarh
备注:Published as a conference paper at the Fourteenth International Conference on Learning Representations (ICLR 2026): https://openreview.net/forum?id=NFB4QGGS65
摘要:Quantizing the weights of large language models (LLMs) from 16-bit to lower bitwidth is the de facto approach to deploy massive transformers onto more affordable accelerators. While GPTQ emerged as one of the standard methods for one-shot post-training quantization at LLM scale, its inner workings are described as a sequence of algebraic updates that obscure geometric meaning or worst-case guarantees. In this work, we show that, when executed back-to-front (from the last to first dimension) for a linear layer, GPTQ is mathematically identical to Babai's nearest plane algorithm for the classical closest vector problem (CVP) on a lattice defined by the Hessian matrix of the layer's inputs. This equivalence is based on a sophisticated mathematical argument, and has two analytical consequences: first, the GPTQ error propagation step gains an intuitive geometric interpretation; second, GPTQ inherits the error upper bound of Babai's algorithm under the assumption that no weights are clipped. Leveraging this bound, we design post-training quantization methods that avoid clipping, and outperform the original GPTQ. In addition, we provide efficient GPU inference kernels for the resulting representation. Taken together, these results place GPTQ on a firm theoretical footing and open the door to importing decades of progress in lattice algorithms towards the design of future quantization algorithms for billion-parameter models. Source code is available at https://github.com/IST-DASLab/GPTQ-Babai.
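
For readers unfamiliar with Babai's nearest plane algorithm, the sketch below shows the generic procedure on a small lattice, processing basis vectors from last to first. It is a textbook illustration only: the mapping from a linear layer and its Hessian to the lattice basis is the paper's contribution and is not reproduced here, and the function names are assumptions.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(basis: Sequence[Vector]) -> List[List[float]]:
    """Orthogonalize the basis rows without normalizing them."""
    ortho: List[List[float]] = []
    for b in basis:
        v = list(b)
        for u in ortho:
            mu = dot(b, u) / dot(u, u)
            v = [vi - mu * ui for vi, ui in zip(v, u)]
        ortho.append(v)
    return ortho


def babai_nearest_plane(basis: Sequence[Vector],
                        target: Vector) -> Tuple[List[int], List[float]]:
    """Return integer coefficients and the lattice point chosen for target."""
    ortho = gram_schmidt(basis)
    residual = list(target)
    coeffs = [0] * len(basis)
    # Work from the last basis vector to the first.
    for i in reversed(range(len(basis))):
        # Round the projection onto the i-th orthogonalized direction.
        c = round(dot(residual, ortho[i]) / dot(ortho[i], ortho[i]))
        coeffs[i] = c
        # Subtract the chosen multiple of the original basis vector, which
        # carries the rounding decision into the remaining dimensions.
        residual = [r - c * b for r, b in zip(residual, basis[i])]
    point = [sum(c * b[j] for c, b in zip(coeffs, basis))
             for j in range(len(target))]
    return coeffs, point
```

The residual update after each rounding step is the part that the abstract relates to GPTQ's error propagation. Note that Python's `round` resolves exact .5 ties to the nearest even integer.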

【34】Multi-Scale Dequant: Eliminating Dequantization Bottleneck via Activation Decomposition for Efficient LLM Inference
标题:多尺度去量化:通过激活分解消除去量化瓶颈以实现高效LLM推理
链接:https://arxiv.org/abs/2605.13915

作者:Lingchao Zheng,Yuwei Fan,Jun Li,Chengqiu Hu,Qichen Liao,Junyi Fan,Rui Shi,Fangzheng Miao
摘要:Quantization is essential for efficient large language model (LLM) inference, yet the dequantization step (converting low-bit weights back to high precision for matrix multiplication) has become a critical bottleneck on modern AI accelerators. On architectures with decoupled compute units (e.g., Ascend NPUs), dequantization operations can consume more cycles than the matrix multiplication itself, leaving the high-throughput tensor cores underutilized. This paper presents Multi-Scale Dequant (MSD), a quantization framework that removes weight/KV dequantization from the GEMM critical path. Instead of lifting low-bit weights to BF16 precision, MSD decomposes high-precision BF16 activations into multiple low-precision components, each of which can be multiplied directly with quantized weights via native hardware-accelerated GEMM. This approach shifts the computational paradigm from precision conversion to multi-scale approximation, avoiding INT8-to-BF16 weight conversion before GEMM. We instantiate MSD for two weight formats and derive tight error bounds for each. For INT8 weights (W8A16), two-pass INT8 decomposition achieves near 16 effective bits. For MXFP4 weights (W4A16), two-pass MXFP4 decomposition yields near 6.6 effective bits with error bound 1/64 per block, surpassing single-pass MXFP8 (5.24 bits) while maintaining the same effective GEMM compute time. We further derive closed-form latency and HBM traffic models showing that MSD avoids the Vector-Cube pipeline stall caused by dequantization and reduces KV cache HBM traffic by up to 2.5 times in attention. Numerical simulations on matrix multiplication and Flash Attention kernels confirm that MSD does not degrade accuracy compared to dequantization baselines, and in many settings achieves lower L2 error.
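
One way to picture the activation decomposition is residual quantization: quantize the activation to INT8, quantize what is left over to INT8 again, and run two integer dot products against the already-quantized weights. The sketch below assumes symmetric per-vector scaling and plain Python lists in place of BF16 tensors and hardware GEMM; it illustrates the idea only and is not the paper's kernel or exact decomposition.

```python
from typing import List, Sequence, Tuple


def quantize_int8(values: Sequence[float]) -> Tuple[float, List[int]]:
    """Symmetric quantization to integers in [-127, 127] with one scale."""
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return 1.0, [0] * len(values)
    scale = peak / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, q


def two_pass_decompose(x: Sequence[float]):
    """Approximate x as s1 * q1 + s2 * q2 with two INT8 components."""
    s1, q1 = quantize_int8(x)
    # The second pass quantizes the rounding error left by the first pass.
    residual = [v - s1 * q for v, q in zip(x, q1)]
    s2, q2 = quantize_int8(residual)
    return (s1, q1), (s2, q2)


def int_dot(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(ai * bi for ai, bi in zip(a, b))


def two_pass_dot(x: Sequence[float], w_int8: Sequence[int],
                 w_scale: float) -> float:
    """Dot product that never converts the INT8 weights to floating point."""
    (s1, q1), (s2, q2) = two_pass_decompose(x)
    # Two integer dot products, combined afterwards with scalar scales.
    return w_scale * (s1 * int_dot(q1, w_int8) + s2 * int_dot(q2, w_int8))
```

Because the second scale is derived from a residual that is at most half of the first quantization step, the reconstruction error shrinks by roughly two orders of magnitude compared with a single pass.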

Graph相关(图学习|图神经网络|图优化等)(4篇)

【1】AIMing for Standardised Explainability Evaluation in GNNs: A Framework and Case Study on Graph Kernel Networks
标题:GNN中标准化可解释性评估的Aiming:图核网络的框架和案例研究
链接:https://arxiv.org/abs/2605.14884

作者:Magdalena Proszewska,N. Siddharth
备注:19 pages,4 figures, 8 tables
摘要:Graph Neural Networks (GNNs) have advanced significantly in handling graph-structured data, but a comprehensive framework for evaluating explainability remains lacking. Existing evaluation frameworks primarily involve post-hoc explanations, and operate in the setting where multiple methods generate a suite of explanations for a single model. This makes comparison of explanations across models difficult. Evaluation of inherently interpretable models often targets a specific aspect of interpretability relevant to the model, but remains underdeveloped in terms of generating insight across a suite of measures. We introduce AIM, a comprehensive framework that addresses these limitations by measuring Accuracy, Instance-level explanations, and Model-level explanations. AIM is formulated with minimal constraints to enhance flexibility and facilitate broad applicability. Here, we use AIM in a pipeline, extracting explanations from inherently interpretable GNNs such as graph kernel networks (GKNs) and prototype networks (PNs), evaluating these explanations with AIM, identifying their limitations and obtaining insights to their characteristics. Taking GKNs as a case study, we show how the insights obtained from AIM can be used to develop an updated model, xGKN, that maintains high accuracy while demonstrating improved explainability. Our approach aims to advance the field of Explainable AI (XAI) for GNNs, providing more robust and practical solutions for understanding and improving complex models.

【2】Exploitation of Hidden Context in Dynamic Movement Forecasting: A Neural Network Journey from Recurrent to Graph Neural Networks and General Purpose Transformers
标题:动态运动预测中隐藏上下文的利用:从循环神经网络到图神经网络和通用Transformer的神经网络之旅
链接:https://arxiv.org/abs/2605.14855

作者:Lukas Schelenz,Shobha Rajanna,Denis Gosalci,Lucas Heublein,Jonas Pirkl,Jonathan Ott,Felix Ott,Christopher Mutschler,Tobias Feigl
备注:12 pages
摘要:Forecasting within signal processing pipelines is crucial for mitigating delays, particularly in predicting the dynamic movements of objects such as NBA players. This task poses significant challenges due to the inherently interactive and unpredictable nature of sports, where abrupt changes in velocity and direction are prevalent. Traditional approaches, including (S)ARIMA(X), Kalman filters (KF), and Particle filters (PF), often struggle to model the non-linear dynamics present in such scenarios. Machine learning (ML) methods, such as long short-term memory (LSTM) networks, graph neural networks (GNNs), and Transformers, offer greater flexibility and accuracy but frequently fail to explicitly capture the interplay between temporal dependencies and contextual interactions, which are critical in chaotic sports environments. In this paper, we evaluate these models and assess their strengths and weaknesses. Experimental results reveal key performance trade-offs across input history length, generalizability, and the ability to incorporate contextual information. ML-based methods demonstrated substantial improvements over linear models across forecast horizons of up to 2s. Among the tested architectures, our hybrid LSTM augmented with contextual information achieved the lowest final displacement error (FDE) of 1.51m, outperforming temporal convolutional neural network (TCNN), graph attention network (GAT), and Transformers, while also requiring less data and training time compared to GAT and Transformers. Our findings indicate that no single architecture excels across all metrics, emphasizing the need for task-specific considerations in trajectory prediction for fast-paced, dynamic environments such as NBA gameplay.
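
The final displacement error (FDE) reported in the abstract is the Euclidean distance between the last predicted position and the last ground-truth position of a trajectory. A minimal sketch, assuming 2D positions in metres stored as `(x, y)` tuples; the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def final_displacement_error(pred: Sequence[Point],
                             truth: Sequence[Point]) -> float:
    """Distance between the final predicted and final true positions."""
    (px, py), (tx, ty) = pred[-1], truth[-1]
    return math.hypot(px - tx, py - ty)


def mean_fde(preds: Sequence[Sequence[Point]],
             truths: Sequence[Sequence[Point]]) -> float:
    """Average FDE over a batch of trajectories."""
    errors = [final_displacement_error(p, t) for p, t in zip(preds, truths)]
    return sum(errors) / len(errors)
```

Only the endpoint matters for FDE, so two forecasts that take different paths to the same final position score identically.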

【3】GFMate: Empowering Graph Foundation Models with Test-time Prompt Tuning
标题:GFMate:通过测试时提示调优来增强图基础模型
链接:https://arxiv.org/abs/2605.14809

作者:Yan Jiang,Ruihong Qiu,Zi Huang
摘要:Graph prompt tuning has shown great potential in graph learning by introducing trainable prompts to enhance the model performance in conventional single-domain scenarios. Recent research has extended graph prompts to improve Graph Foundation Models (GFMs) by few-shot tuning auxiliary prompts. Despite their progress, most existing methods embed source-domain information into prompts, which serve either as input to GFMs or encoded during model pre-training. Such prompt entanglement with specific source domains and GFM pre-training strategy restricts their generalisability to other domains and different GFMs. Furthermore, existing GFM prompts merely rely on few-shot tuning for adaptation, neglecting the rich information in unlabelled target domain test data. Motivated by these insights, this paper aims to empower GFMs with pre-training-agnostic test-time graph prompt tuning, named GFMate. GFMate introduces centroid and layer prompts applied after pre-training on target domains, avoiding entanglement with specific source domains and model pre-training. In addition, a test-time complementary learning objective is devised to exploit both labelled and unlabelled target domain data for effective test-time prompt tuning. Extensive experiments on 12 benchmark datasets demonstrate the superior performance and efficiency of GFMate, achieving improvements of up to 30.63%. Code is available at https://github.com/YanJiangJerry/GFMate.

【4】Neuromorphic Graph Anomaly Detection via Adaptive STDP and Spiking Graph Neural Networks
标题:通过自适应STDP和脉冲图神经网络进行神经形态图异常检测
链接:https://arxiv.org/abs/2605.13863

作者:Abdul Joseph Fofanah,Lian Wen,David Chen,Tsungcheng Yao,Kwabena Sarpong
摘要:Anomaly detection in dynamic networks is critical for applications from cybersecurity to industrial monitoring, yet existing methods face challenges in energy efficiency, temporal precision, and adaptability. This paper introduces ASTDP-GAD, a novel Adaptive Spiking Temporal Dynamics Plasticity framework for Graph Anomaly Detection that integrates spiking graph neural networks with STDP learning for energy-efficient neuromorphic detection in dynamic networks. Our framework unifies spiking neural computation, STDP learning, and graph-based anomaly detection through the following key innovations: temporal spike graph encoding with adaptive Leaky Integrate-and-Fire (LIF) dynamics; LIF-based graph attention with lateral inhibition; event-driven hypergraph memory with STDP-inspired prototype updates; spike rate contrast pooling based on spiking irregularity; adaptive STDP layers capturing causal temporal relationships; and multi-scale temporal convolution with multi-factor anomaly fusion. Theoretical analysis provides rigorous guarantees: spike encoding preserves input information with resolution scaling linearly in simulation steps and hidden dimension; LIFGAT approximates any continuous attention function; hypergraph memory converges to optimal prototypes; contrast pooling achieves provable anomaly selection bounds; STDP learning converges stably; and multi-factor fusion produces calibrated scores with up to $5\times$ variance reduction. Extensive experiments on nine datasets on both dynamic and static graphs demonstrate superior anomaly detection accuracy while maintaining biological plausibility and energy efficiency for neuromorphic deployment.

Transformer(6篇)

【1】Your CLIP has 164 dimensions of noise: Exploring the embeddings covariance eigenspectrum of contrastively pretrained vision-language transformers
标题:您的CLIP有164个维度的噪声:探索对比预训练视觉语言Transformer的嵌入协方差特征谱
链接:https://arxiv.org/abs/2605.14893

作者:Jakub Grzywaczewski,Dawid Płudowski,Przemysław Biecek
摘要:Contrastively pre-trained Vision-Language Models (VLMs) serve as powerful feature extractors. Yet, their shared latent spaces are prone to structural anomalies and act as repositories for non-semantic, multi-modal noise. To address this phenomenon, we employ spectral decomposition of covariance matrices to decompose the VLM latent space into a multi-modal semantic signal component and a shared noise subspace. We observe that this noise geometry exhibits strong subgroup invariance across distinct data subsets. Crucially, pruning these shared noise dimensions is mainly harmless, preserving or actively improving downstream task performance. By isolating true semantic signals from artifactual noise, this work provides new mechanistic insights into the representational structure of modern VLMs, suggesting that a substantial fraction of their latent geometry is governed by shared, architecture-level noise rather than task-relevant semantics alone.

【2】GeoViSTA: Geospatial Vision-Tabular Transformer for Multimodal Environment Representation
标题:GeoViSTA:用于多模态环境表示的地理空间视觉-表格Transformer
链接:https://arxiv.org/abs/2605.14406

作者:Yuhao Liu,Sadeer Al-Kindi,Ashok Veeraraghavan,Guha Balakrishnan
摘要:Large-scale pretraining on Earth observation imagery has yielded powerful representations of the natural and built environment. However, most existing geospatial foundation models do not directly model the structured socioeconomic covariates typically stored in tabular form. This modality gap limits their ability to capture the complete total environment, which is critical for reasoning about complex environmental, social, and health-related outcomes. In this work, we propose GeoViSTA (Geospatial Vision-Tabular Transformer), a vision-tabular architecture that learns unified geospatial embeddings from co-registered gridded imagery and tabular data. GeoViSTA utilizes bilateral cross-attention to exchange spatial and semantic information across modalities, guided by a geography-aware attention mechanism that aligns continuous image patches with irregular census-tract tokens. We train GeoViSTA with a self-supervised joint masked-autoencoding objective, forcing it to recover missing image patches and tabular rows using local spatial context and cross-modal cues. Empirically, GeoViSTA's unified embeddings improve linear probing performance on high-impact downstream tasks, outperforming baselines in predicting disease-specific mortality and fire hazard frequency across held-out regions. These results demonstrate that jointly modeling the physical environment alongside structured socioeconomic context yields highly transferable representations for holistic geospatial inference.

【3】Precise Verification of Transformers through ReLU-Catalyzed Abstraction Refinement
标题:通过ReLU催化的抽象细化精确验证Transformer
链接:https://arxiv.org/abs/2605.14294

作者:Hengjie Liu,Zhenya Zhang,Jianjun Zhao
备注:32 pages, 6 figures, the full version of the paper accepted by CAV 2026
摘要:Formal verification of transformers has become increasingly important due to their widespread deployment in safety-critical applications. Compared to classic neural networks, the inferences of transformers involve highly complex computations, such as dot products in self-attention layers, rendering their verification extremely difficult. Existing approaches explored over-approximation methods by constructing convex constraints to bound the output ranges of transformers, which can achieve high efficiency. However, they may sacrifice verification precision, and consequently introduce significant approximation error that leads to frequent occurrences of false alarms. In this paper, we propose a transformer verification approach that can achieve improved precision. At the core of our approach is a novel usage of ReLU, by which we represent a precise but non-linear bound for dot products such that we can further exploit the rich body of literature for convex relaxation of ReLU to derive precise bounds. We extend two classic approaches to the context of transformers, a rule-based one and an optimization-based one, resulting in two new frameworks for efficient and precise verification. We evaluate our approaches on different model architectures and robustness properties derived from two datasets about sentiment analysis, and compare with the state-of-the-art baseline approach. Compared to the baseline, our approach can achieve significant precision improvement for most of the verification tasks with acceptable compromise of efficiency, which demonstrates the effectiveness of our approach.

【4】Dynamics of the Transformer Residual Stream: Coupling Spectral Geometry to Network Topology
标题:Transformer残差流的动力学:将谱几何与网络拓扑耦合
链接:https://arxiv.org/abs/2605.14258

作者:Jesseba Fernando,Grigori Guitchounts
摘要:Large language models are remarkably capable, yet how computation propagates through their layers remains poorly understood. A growing line of work treats depth as discrete time and the residual stream as a dynamical system, where each layer's nonlinear update has a local linear description. However, previous analyses have relied on scalar summaries or approximate linearizations, leaving the full spectral geometry of trained LLMs unknown. We perform full Jacobian eigendecomposition across three production-scale LLMs and show that training installs a monotonic spectral gradient through depth -- from non-normal, rotation-dominated early layers to near-symmetric late layers -- together with a cumulative low-rank bottleneck that funnels perturbations into a small fraction of the residual stream's effective dimensions. Our experiments reveal that this gradient and the dimensional collapse are learned rather than architectural, and are largely dissolved when structured non-normality is removed. We further show that the topological positioning of graph communities predicts whether the Jacobian amplifies or suppresses them, with the sign of the coupling determined by the local operator type, a relationship absent at initialization. These results map a learned spectral geometry in LLMs that links perturbation propagation and compression to the network's functional topology.
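To make the "local linear description" concrete, the sketch below computes the Jacobian of a toy residual update, its eigenvalues, and a simple non-normality score. Everything here is an assumption for illustration: the toy layer `x + W_out tanh(W_in x)`, the helper names, and the commutator-based score are not taken from the paper, which analyses full LLM layers.

```python
import numpy as np

def residual_layer(x, w_in, w_out):
    """Toy residual update: x + W_out tanh(W_in x)."""
    return x + w_out @ np.tanh(w_in @ x)

def residual_jacobian(x, w_in, w_out):
    """Analytic Jacobian of the toy layer: I + W_out diag(1 - tanh^2) W_in."""
    gate = 1.0 - np.tanh(w_in @ x) ** 2
    return np.eye(len(x)) + w_out @ (gate[:, None] * w_in)

def non_normality(jac):
    """Frobenius norm of the commutator J J^T - J^T J, scaled by ||J||_F^2.

    The score is zero exactly when J is a normal matrix (e.g. symmetric).
    """
    comm = jac @ jac.T - jac.T @ jac
    return np.linalg.norm(comm) / np.linalg.norm(jac) ** 2

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=d)
w_in = rng.normal(size=(d, d)) / np.sqrt(d)
w_out = rng.normal(size=(d, d)) / np.sqrt(d)

jac = residual_jacobian(x, w_in, w_out)
eigvals = np.linalg.eigvals(jac)
print("largest |Im(eigenvalue)| (rotation indicator):", np.abs(eigvals.imag).max())
print("non-normality, generic weights:", non_normality(jac))
print("non-normality, tied weights   :",
      non_normality(residual_jacobian(x, w_in, w_in.T)))
```

Tying `w_out = w_in.T` makes the Jacobian symmetric, so the second score is numerically zero; that contrast is the only point of the example.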

【5】Towards Real-Time Autonomous Navigation: Transformer-Based Catheter Tip Tracking in Fluoroscopy
标题:迈向实时自主导航:荧光透视中基于Transformer的导管尖端跟踪
链接:https://arxiv.org/abs/2605.14253

作者:Harry Robertshaw,Yanghe Hao,Weiyuan Deng,Benjamin Jackson,S. M. Hadi Sadati,Nikola Fischer,Tom Vercauteren,Alejandro Granados,Thomas C. Booth
备注:Harry Robertshaw and Yanghe Hao contributed equally to this work. Published in the International Journal of Computer Assisted Radiology and Surgery
摘要:Purpose: Mechanical thrombectomy (MT) improves stroke outcomes, but is limited by a lack of local treatment access. Widespread distribution of reinforcement learning (RL)-based robotic systems can be used to alleviate this challenge through autonomous navigation, but current RL methods require live device tip coordinate tracking to function. This paper aims to develop and evaluate a real-time catheter tip tracking pipeline under fluoroscopy, addressing challenges such as low contrast, noise, and device occlusion. Methods: A multi-threaded pipeline was designed, incorporating frame reading, preprocessing, inference, and post-processing. Deep learning segmentation models, including U-Net, U-Net+Transformer, and SegFormer, were trained and benchmarked using two-class and three-class formulations. Post-processing involved two-step component filtering, one-pixel medial skeletonization, and greedy arc-length path following with contour fall-back. Results: On manually-labeled moderate complexity fluoroscopic video data, the two-class SegFormer achieved a mean absolute error of 4.44 mm, outperforming U-Net (4.60 mm), U-Net+Transformer (6.20 mm) and all three-class models (5.19-7.74 mm). On segmentation benchmarks, the system exceeded state-of-the-art CathAction results with improvements of up to +5% in Dice scores for three-segmentation. Conclusion: The results demonstrate that the proposed multi-threaded tracking framework maintains stable performance under challenging imaging conditions, outperforming prior benchmarks, while providing a reliable and efficient foundation for RL-based autonomous MT navigation.

【6】DT-Transformer: A Foundation Model for Disease Trajectory Prediction on a Real-world Health System
标题:DT-Transformer:现实世界卫生系统疾病轨迹预测的基础模型
链接:https://arxiv.org/abs/2605.14227

作者:Yunying Zhu,Andrew R Weckstein,Kueiyu Joshua Lin,Jie Yang
备注:Work in Progress
摘要:Accurate disease trajectory prediction is critical for early intervention, resource allocation, and improving long-term outcomes. While electronic health records (EHRs) provide a rich longitudinal view of patient health in clinical environments, models trained on curated research cohorts may not reflect routine deployment settings, and those trained on single-hospital datasets capture only fragments of each patient's trajectory. This highlights the importance of leveraging large, multi-hospital health systems for training and validation to better reflect real-world clinical complexity. In this work, we develop DT-Transformer, a foundation model trained on 57.1M structured EHR entries over 1.7M patients from Mass General Brigham (MGB), spanning 11 hospitals and a broad network of outpatient clinics. DT-Transformer achieves strong discrimination in both held-out and prospective validation settings. Next-event prediction achieves a median age- and sex-stratified AUC of 0.871 across 896 disease categories, with all categories exceeding AUC 0.5. These results support health system-scale training as a path toward foundation models suited to real-world clinical forecasting.

GAN|对抗|攻击|生成相关(9篇)

【1】RefDecoder: Enhancing Visual Generation with Conditional Video Decoding
标题:RefDecoder:通过条件视频解码增强视觉生成
链接:https://arxiv.org/abs/2605.15196

作者:Xiang Fan,Yuheng Wang,Bohan Fang,Zhongzheng Ren,Ranjay Krishna
摘要:Video generation powers a vast array of downstream applications. However, while the de facto standard, i.e., latent diffusion models, typically employ heavily conditioned denoising networks, their decoders often remain unconditional. We observe that this architectural asymmetry leads to significant loss of detail and inconsistency relative to the input image. To address this, we argue that the decoder requires equal conditioning to preserve structural integrity. We introduce RefDecoder, a reference-conditioned video VAE decoder that injects a high-fidelity reference image signal directly into the decoding process via reference attention. Specifically, a lightweight image encoder maps the reference frame into detail-rich high-dimensional tokens, which are co-processed with the denoised video latent tokens at each decoder up-sampling stage. We demonstrate consistent improvements across several distinct decoder backbones (e.g., Wan 2.1 and VideoVAE+), achieving up to +2.1dB PSNR over the unconditional baselines on the Inter4K, WebVid, and Large Motion reconstruction benchmarks. Notably, RefDecoder can be directly swapped into existing video generation systems without additional fine-tuning, and we report across-the-board improvements in subject consistency, background consistency, and overall quality scores on the VBench I2V benchmark. Beyond I2V, RefDecoder generalizes well to a wide range of visual generation tasks such as style transfer and video editing refinement.

【2】Croissant Baker: Metadata Generation for Discoverable, Governable, and Reusable ML Datasets
标题:Croissant Baker:可发现、可治理和可重复使用的ML数据集的元数据生成
链接:https://arxiv.org/abs/2605.15079

作者:Rafi Al Attrach,Rajna Fani,Sebastian Lobentanzer,Joan Giner-Miguelez,Debanshu Das,Varuni H. K.,Nobin Sarwar,Rajat Ghosh,Anwai Archit,Surbhi Motghare,Christina Conrad Parry,Luis Oala,Lara Grosso,Joaquin Vanschoren,Steffen Vogler,Sujata Goswami,Eric S. Rosenthal,Marzyeh Ghassemi,Matthew McDermott,Tom Pollard
备注:23 pages, 5 figures, 11 tables. Project: https://lcp.mit.edu/croissant-baker/ Code: https://github.com/MIT-LCP/croissant-baker
摘要:Croissant has emerged as the metadata standard for machine learning datasets, providing a structured, JSON-LD-based format that makes dataset discovery, automated ingestion, and reproducible analysis machine-checkable across ML platforms. Adoption has accelerated, and NeurIPS now requires Croissant metadata in every submission to its dataset tracks. Yet in practice Croissant generation usually starts with uploading data to a public platform, a path infeasible for governed and large local repositories that hold much of the high-value data ML increasingly relies on. We release Croissant Baker, a local-first, open-source command-line tool that generates validated Croissant metadata directly from a dataset directory through a modular handler registry. We evaluate Croissant Baker on over 140 datasets, scaling to MIMIC-IV at 886 million rows and 374 Parquet files. On held-out comparisons against producer-authored or standards-derived ground truth, Croissant Baker reaches 97-100% agreement across multiple domains.

【3】Fast Adversarial Attacks with Gradient Prediction
标题:具有梯度预测的快速对抗攻击
链接:https://arxiv.org/abs/2605.14868

作者:Kamil Ciosek,Aleksandr V. Petrov,Nicolò Felicioni,Konstantina Palla
备注:17 pages
摘要:Generating adversarial examples at scale is a core primitive for robustness evaluation, adversarial training, and red-teaming, yet even "fast" attacks such as FGSM remain throughput-limited by the cost of a backward pass. We introduce a family of attacks that eliminates the backward pass by predicting the input gradient from forward-pass hidden states via a lightweight linear regression. The approach is motivated by a kernel view of neural networks and is exact in the Neural Tangent Kernel regime, while remaining effective for practical finite-width models. Empirically, our methods recover much of FGSM's attack performance while using only a small fraction of the time, corresponding to a $532\%$ increase in throughput. These results suggest gradient prediction as a simple and general route to significantly faster adversarial generation under realistic wall-clock constraints.
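The idea of replacing the backward pass with a regression from hidden states can be sketched on a toy model. This is a minimal illustration under assumptions, not the paper's method: the one-hidden-layer network, the quadratic loss, the ridge penalty, and all function names below are hypothetical, and the analytic gradient stands in for what a backward pass would return.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hdim = 16, 32
w1 = 0.3 * rng.normal(size=(hdim, d)) / np.sqrt(d)   # toy first-layer weights
v = rng.uniform(0.5, 1.5, size=hdim)                  # toy loss weights

def forward(x):
    """Forward pass only: hidden states and a scalar loss per example."""
    h = np.tanh(x @ w1.T)
    loss = 0.5 * (h ** 2) @ v
    return h, loss

def true_input_grad(h):
    """Analytic dL/dx for the toy model (what a backward pass would compute)."""
    return (v * h * (1.0 - h ** 2)) @ w1

def fit_gradient_predictor(h, grads, ridge=1e-3):
    """Ridge regression from hidden states (plus bias) to input gradients."""
    feats = np.hstack([h, np.ones((len(h), 1))])
    gram = feats.T @ feats + ridge * np.eye(feats.shape[1])
    return np.linalg.solve(gram, feats.T @ grads)

def predict_grad(x, coef):
    h, _ = forward(x)
    feats = np.hstack([h, np.ones((len(h), 1))])
    return feats @ coef

# Offline: collect (hidden state, gradient) pairs once and fit the regressor.
x_train = rng.normal(size=(2000, d))
h_train, _ = forward(x_train)
coef = fit_gradient_predictor(h_train, true_input_grad(h_train))

# Online: FGSM-style step that uses only forward passes.
x_test = rng.normal(size=(500, d))
h_test, loss_clean = forward(x_test)
pred_grad = predict_grad(x_test, coef)
x_adv = x_test + 0.05 * np.sign(pred_grad)
_, loss_adv = forward(x_adv)

true_grad = true_input_grad(h_test)
cosine = np.sum(pred_grad * true_grad, axis=1) / (
    np.linalg.norm(pred_grad, axis=1) * np.linalg.norm(true_grad, axis=1))
print("mean cosine(predicted, true gradient):", cosine.mean())
print("mean loss increase after attack      :", (loss_adv - loss_clean).mean())
```

In this toy setting the gradient is nearly linear in the hidden state, which is why a plain linear regressor works; the abstract's claim of exactness concerns the Neural Tangent Kernel regime and is not something this sketch demonstrates.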

【4】ReMIA: a Powerful and Efficient Alternative to Membership Inference Attacks against Synthetic Data Generators
标题:ReMIA:针对合成数据生成器的成员推断攻击的强大而有效的替代方案
链接:https://arxiv.org/abs/2605.14686

作者:Davide Scassola,Andrea Coser,Sebastiano Saccani
摘要:Tabular data sharing under privacy constraints is increasingly important for research and collaboration. Synthetic data generators (SDGs) are a promising solution, but synthetic data remains vulnerable to attacks, such as membership inference attacks (MIAs), which aim to determine whether a specific record was part of the training data. State-of-the-art MIAs are powerful but impractical: they rely on shadow modeling, requiring hundreds of SDG training runs, and need auxiliary data several times larger than the original training set. Fast proxy metrics like distance to closest record (DCR) are efficient but have limited sensitivity to MIA risk. We introduce ReMIA (Relative Membership Inference Attack), a practical privacy metric that requires only two SDG training runs and additional data no larger than the original training set. Rather than predicting whether a record was in the training set, ReMIA generates two synthetic datasets from two source datasets and measures whether a classifier can identify which source a record came from. Experiments across multiple tabular datasets and SDGs show that ReMIA has a sensitivity comparable to state-of-the-art MIAs while being substantially more practical. We further observe that SDGs can achieve privacy-utility trade-offs that traditional noise-based anonymization methods do not match. Code is available at https://github.com/aindo-com/remia.

【5】Systematic Discovery of Semantic Attacks in Online Map Construction through Conditional Diffusion
标题:通过条件扩散系统发现在线地图构建中的语义攻击
链接:https://arxiv.org/abs/2605.14396

作者:Chenyi Wang,Ruoyu Song,Raymond Muller,Jean-Philippe Monteuuis,Jonathan Petit,Z. Berkay Celik,Ryan Gerdes,Ming F. Li
摘要:Autonomous vehicles depend on online HD map construction to perceive lane boundaries, dividers, and pedestrian crossings -- safety-critical road elements that directly govern motion planning. While existing pixel perturbation attacks can disrupt the mapping, they can be neutralized by standard adversarial defenses. We present MIRAGE, a framework for systematic discovery of semantic attacks that bypass adversarial defenses and degrade mapping predictions by finding plausible environmental variation (e.g., shadows, wet roads). MIRAGE exploits the latent manifold of real-world data learned by diffusion models, and searches for semantically mutated scenes that neighbor the ground truth and share its road topology, yet mislead the mapping predictions. We evaluate MIRAGE on nuScenes and demonstrate two attacks: (1) boundary removal, suppressing 57.7% of detections and corrupting 96% of planned trajectories; and (2) boundary injection, the only method that successfully injects fictitious boundaries, while pixel PGD and AdvPatch fail entirely. Both attacks remain potent under various adversarial defenses. We use two independent VLM judges to quantify realism, where MIRAGE passes as realistic 80--84% of the time (vs. 97--99% for clean nuScenes), while AdvPatch passes only 0--9% of the time. Our findings expose a categorical gap in current adversarial defenses: semantic-level perturbations that manifest as legitimate environmental variation are substantially harder to mitigate than pixel-level perturbations.

【6】Paraphrasing Attack Resilience of Various AI-Generated Text Detection Methods
标题:各种人工智能生成文本检测方法的改写攻击韧性
链接:https://arxiv.org/abs/2605.14240

作者:Andrii Shportko,Inessa Verbitsky
备注:NAACL 2025
摘要:The recent large-scale emergence of LLMs has left an open space for dealing with their consequences, such as plagiarism or the spread of false information on the Internet. Coupling this with the rise of AI detector bypassing tools, reliable machine-generated text detection is in increasingly high demand. We investigate the paraphrasing attack resilience of various machine-generated text detection methods, evaluating three approaches: fine-tuned RoBERTa, Binoculars, and text feature analysis, along with their ensembles using Random Forest classifiers. We discovered that Binoculars-inclusive ensembles yield the strongest results, but they also suffer the most significant losses during attacks. In this paper, we present the dichotomy of performance versus resilience in the world of AI text detection, which complicates the current perception of reliability among state-of-the-art techniques.

【7】Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding
标题:槲寄生:对投机解码的隐形加速崩溃攻击
链接:https://arxiv.org/abs/2605.14005

作者:Shuoyang Sun,Chang Da,Hao Fang,Kuofeng Gao,Xinhao Zhong,Yi Sun,Fan Mo,Shu-Tao Xia,Bin Chen
摘要:Speculative decoding has become a widely adopted technique for accelerating large language model (LLM) inference by drafting multiple candidate tokens and verifying them with a target model in parallel. Its efficiency, however, critically depends on the average accepted length $τ$, i.e., how many draft tokens survive each verification step. In this work, we identify a new mechanism-level vulnerability in model-based speculative decoding: the drafter is trained to approximate the target model distribution, but this approximation is inevitably imperfect. Such a drafter-target mismatch creates a hidden attack surface where small perturbations can preserve the target model's visible behavior while substantially reducing draft-token acceptability. We propose Mistletoe, a stealthy acceleration-collapse attack against speculative decoding. Mistletoe directly targets the acceptance mechanism of speculative decoding. It jointly optimizes a degradation objective that decreases drafter-target agreement and a semantic-preservation objective that constrains the target model's output distribution. To resolve the conflict between these objectives, we introduce a null-space projection mechanism, where degradation gradients are projected away from the local semantic-preserving direction, suppressing draft acceptance while minimizing semantic drift. Experiments on various speculative decoding systems show that Mistletoe substantially reduces average accepted length $τ$, collapses speedup, and lowers averaged token throughput, while preserving output quality and perplexity. Our work highlights that speculative decoding introduces a mechanism-level attack surface beyond existing output robustness, calling for more robust designs of LLM acceleration systems.
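The projection step mentioned in this abstract can be illustrated with plain vector algebra: remove from the degradation gradient its component along a semantic-preserving direction. This is a minimal sketch assuming a single direction; the function name `project_out` and the toy gradients are hypothetical, and the paper's actual null-space construction may involve a larger subspace.

```python
import numpy as np

def project_out(grad_degrade, grad_semantic, eps=1e-12):
    """Return grad_degrade with its component along grad_semantic removed."""
    direction = grad_semantic / (np.linalg.norm(grad_semantic) + eps)
    return grad_degrade - (grad_degrade @ direction) * direction

rng = np.random.default_rng(0)
g_d = rng.normal(size=64)   # toy gradient of the degradation objective
g_s = rng.normal(size=64)   # toy gradient of the semantic-preservation objective

projected = project_out(g_d, g_s)
print("overlap with semantic direction before:", g_d @ g_s)
print("overlap with semantic direction after :", projected @ g_s)
```

To first order, a step along `projected` leaves the semantic objective unchanged while still moving the degradation objective, which is the trade-off the abstract describes.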

【8】Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning
标题:Realiz3D:通过领域感知学习使3D生成变得真实感
链接:https://arxiv.org/abs/2605.13852

作者:Ido Sobol,Kihyuk Sohn,Yoav Blum,Egor Zakharov,Max Bluvstein,Andrea Vedaldi,Or Litany
备注:Accepted to CVPR 2026. Project page: https://idosobol.github.io/realiz3d/
摘要:We often aim to generate images that are both photorealistic and 3D-consistent, adhering to precise geometry, material, and viewpoint controls. Typically, this is achieved by fine-tuning an image generator, pre-trained on billions of real images, using renders of synthetic 3D assets, where annotations for control signals are available. While this approach can learn the desired controls, it often compromises the realism of the images due to the domain gap between photographs and renders. We observe that this issue largely arises from the model learning an unintended association between the presence of control signals and the synthetic appearance of the images. To address this, we introduce Realiz3D, a lightweight framework for training diffusion models that decouples controls from the visual domain. The key idea is to explicitly learn the visual domain, real or synthetic, separately from other control signals by introducing a co-variate that, fed into small residual adapters, shifts the domain. Then, the generator can be trained to gain controllability without fitting to a specific visual domain. In this way, the model can be guided to produce realistic images even when controls are applied. We enhance control transferability to the real domain by leveraging insights about the roles of different layers and denoising steps in diffusion-based generators, informing new training and inference strategies that further mitigate the gap. We demonstrate the advantages of Realiz3D in tasks such as text-to-multiview generation and texturing from 3D inputs, producing outputs that are 3D-consistent and photorealistic.

【9】Physics-Grounded Adversarial Stain Augmentation with Calibrated Coverage Guarantees
标题:基于物理的对抗性染色增强与校准的覆盖保证
链接:https://arxiv.org/abs/2605.13889

作者:Mingi Hong
摘要:Stain variation across hospitals degrades histopathology models at deployment. Existing augmentation methods perturb color spaces with arbitrary hyperparameters, lacking both a principled budget and coverage guarantees for unseen centers. We propose Calibrated Adversarial Stain Augmentation (CASA), which performs adversarial augmentation in the Macenko stain parameter space with a budget calibrated from multi-center statistics via the DKW inequality. On Camelyon17-WILDS (5 seeds), CASA achieves $93.9\% \pm 1.6\%$ slide-level accuracy -- outperforming HED-strong ($88.4\% \pm 7.3\%$), RandStainNA ($85.2\% \pm 6.7\%$), and ERM ($63.9\% \pm 11.3\%$) -- with the highest worst-group accuracy ($84.9\% \pm 0.9\%$) among all 10 compared methods.
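For readers unfamiliar with the DKW (Dvoretzky-Kiefer-Wolfowitz) inequality, the sketch below shows the generic bound and one way it can turn an empirical quantile into a conservative one. The abstract does not say how CASA applies the inequality, so the helper names, the gamma-distributed toy shifts, and the use of a 90% quantile as a "budget" are assumptions made only for illustration.

```python
import numpy as np

def dkw_epsilon(n, alpha):
    """DKW band half-width: sup|F_n - F| <= eps with probability >= 1 - alpha."""
    return np.sqrt(np.log(2.0 / alpha) / (2.0 * n))

def conservative_quantile(samples, q, alpha):
    """Empirical quantile at level q + eps, an upper bound on the true
    q-quantile with probability at least 1 - alpha (by the DKW inequality)."""
    ordered = np.sort(np.asarray(samples, dtype=float))
    n = len(ordered)
    level = min(1.0, q + dkw_epsilon(n, alpha))
    k = int(np.ceil(level * n)) - 1          # smallest index with F_n >= level
    return ordered[min(max(k, 0), n - 1)]

# Hypothetical use: observed stain-shift magnitudes pooled from training centers.
rng = np.random.default_rng(0)
shifts = rng.gamma(shape=2.0, scale=0.05, size=400)
budget = conservative_quantile(shifts, q=0.90, alpha=0.05)
print("DKW half-width:", dkw_epsilon(len(shifts), 0.05))
print("conservative 90% budget:", budget)
```

With few samples the band is wide and the level saturates at 1, in which case the function simply returns the largest observed value.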

半/弱/无/有监督|不确定性|主动学习(11篇)

【1】Self-Distilled Agentic Reinforcement Learning
标题:自蒸馏智能体强化学习
链接:https://arxiv.org/abs/2605.15155

作者:Zhengxi Lu,Zhiyuan Yao,Zhuowen Han,Zi-Han Wang,Jinyang Wu,Qi Gu,Xunliang Cai,Weiming Lu,Jun Xiao,Yueting Zhuang,Yongliang Shen
摘要:Reinforcement learning (RL) has emerged as a central paradigm for post-training LLM agents, yet its trajectory-level reward signal provides only coarse supervision for long-horizon interaction. On-Policy Self-Distillation (OPSD) complements RL by introducing dense token-level guidance from a teacher branch augmented with privileged context. However, transferring OPSD to multi-turn agents proves problematic: compounding multi-turn instability destabilizes supervision, while skill-conditioned privileged guidance requires asymmetric treatment, since negative teacher rejections may arise from imperfect skill retrieval or utilization. We introduce SDAR (Self-Distilled Agentic Reinforcement Learning), which treats OPSD as a gated auxiliary objective while keeping RL as the primary optimization backbone. SDAR maps detached token-level signals into a sigmoid gate, strengthening distillation on teacher-endorsed positive-gap tokens and softly attenuating negative teacher rejections. Across the Qwen2.5 and Qwen3 families on ALFWorld, WebShop, and Search-QA, SDAR substantially improves over GRPO (+9.4% on ALFWorld, +7.0% on Search-QA, +10.2% on WebShop-Acc), avoids the instability of naive GRPO+OPSD, and consistently outperforms hybrid RL--OPSD baselines across model scales.

【2】Separating Intrinsic Ambiguity from Estimation Uncertainty in Deep Generative Models for Linear Inverse Problems
标题:线性反问题深生成模型中的内在模糊性与估计不确定性分离
链接:https://arxiv.org/abs/2605.15050

作者:Yuxin Guo,Dongrui Deng,Pulkit Grover
摘要:Recently, deep generative models have been used for posterior inference in inverse problems, including high-stakes applications in medical imaging and scientific discovery, where the uncertainty of a prediction can matter as much as the prediction itself. However, posterior uncertainty is difficult to interpret because it can mix ambiguity inherent to the forward operator with uncertainty propagated through inference. We introduce a structural decomposition of posterior uncertainty that isolates intrinsic ambiguity. A cascade formulation makes this ambiguity accessible for calibration analysis, enabling qualitative diagnostics and simulation-based calibration tests that reveal failure modes that remain hidden when models are selected by reconstruction quality alone. We first validate the approach on a Gaussian example with analytical posterior structure, then illustrate the decomposition on accelerated magnetic resonance imaging (MRI), and finally apply the calibration diagnostics to electroencephalography (EEG) source imaging.

【3】A Mutual Information Lower Bound for Multimodal Regression Active Learning
标题:多模式回归主动学习的互信息下限
链接:https://arxiv.org/abs/2605.14917

作者:Leonardo Ferreira Guilhoto,Akshat Kaushal,Paris Perdikaris
摘要:Active learning for continuous regression has lacked an acquisition function that targets epistemic uncertainty when the predictive distribution is multimodal: variance misses modal disagreement, and information-theoretic targets like BALD are designed for discrete outputs. We introduce a Two-Index framework that makes this separation explicit: one stochastic index selects among competing model hypotheses (epistemic source), while a second governs within-hypothesis randomness (aleatoric source). An entropy decomposition within the framework identifies the mutual information between the output and the epistemic index as a principled acquisition objective, and we prove this quantity vanishes as the model is trained on growing datasets, confirming that it captures exactly the uncertainty data can resolve. Because this mutual information is intractable for continuous outputs, we derive the Mutual Information Lower Bound (MI-LB) acquisition function, a closed-form approximation for Mixture Density Network ensembles. On benchmarks featuring multimodal systems, MI-LB matches or beats every baseline evaluated and is the only method to do so consistently -- geometric and Fisher-based baselines compete only when the input space already encodes the multimodality, and collapse otherwise.
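The entropy decomposition referred to in this abstract is, in its generic form, the standard identity for mutual information between the output and a model index. The notation below ($\theta$ for the epistemic index, $M$ equally weighted ensemble members) is an assumption for illustration; the paper's closed-form MI-LB bound is not reproduced here.

```latex
% Mutual information between output y and epistemic index \theta at input x:
% total predictive entropy minus the expected within-hypothesis entropy.
I(y;\theta \mid x)
  = H\!\left[\mathbb{E}_{\theta}\, p(y \mid x, \theta)\right]
  - \mathbb{E}_{\theta}\, H\!\left[p(y \mid x, \theta)\right]

% For an ensemble of M equally weighted members p_m(y \mid x):
I(y;\theta \mid x)
  = H\!\left[\tfrac{1}{M}\sum_{m=1}^{M} p_m(y \mid x)\right]
  - \tfrac{1}{M}\sum_{m=1}^{M} H\!\left[p_m(y \mid x)\right]
```

The first term is intractable in closed form when the members are continuous mixtures, which is the difficulty the abstract cites as motivation for a lower bound.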

【4】Cognitive-Uncertainty Guided Knowledge Distillation for Accurate Classification of Student Misconceptions
标题:认知不确定性引导的知识蒸馏用于学生错误概念的准确分类
链接:https://arxiv.org/abs/2605.14752

作者:Qirui Liu,Hao Chen,Weijie Shi,Jiajie Xu,Jia Zhu
备注:ACL 2026 Findings. 10 pages, 5 figures, 19 tables
摘要:Accurately identifying student misconceptions is crucial for personalized education but faces three challenges: (1) data scarcity with long-tail distribution, where authentic student reasoning is difficult to synthesize; (2) fuzzy boundaries between error categories with high annotation noise; (3) a deployment paradox: large models overlook unconventional approaches due to pretraining bias and cannot be deployed on edge devices, while small models overfit to noise. Unlike traditional methods that increase diversity through large-scale data synthesis, we propose a two-stage knowledge distillation framework that mines high-value samples from existing data. The first stage performs standard distillation to transfer task capabilities. The second stage introduces a dual-layer marginal selection mechanism based on cognitive uncertainty, identifying four types of critical samples based on teacher model uncertainty and confidence differences. For different data subsets, we design a difficulty-adaptive mechanism to balance hard/soft label contributions, enabling student models to inherit inter-class relationships from teacher soft labels while distinguishing ambiguous error types. Experiments show that with augmented training on only 10.30% of filtered samples, we achieve MAP@3 of 0.9585 (+17.8%) on the MAP-Charting dataset, and using only a 4B parameter model, we attain 84.38% accuracy on cross-topic tests of middle school algebra misconception benchmarks, significantly outperforming SOTA LLMs (67.73%) and standard fine-tuned 72B models (81.25%). Our code is available at https://github.com/RoschildRui/acl2026_map.

【5】Learning Scenario Reduction for Two-Stage Robust Optimization with Discrete Uncertainty
标题:离散不确定性两阶段鲁棒优化的学习场景约简
链接:https://arxiv.org/abs/2605.14494

作者:Tianjue Lin,Jianan Zhou,Jieyi Bi,Yaoxin Wu,Wen Song,Zhiguang Cao,Jie Zhang
摘要:Two-Stage Robust Optimization (2RO) with discrete uncertainty is challenging, often rendering exact solutions prohibitive. Scenario reduction alleviates this issue by selecting a small, representative subset of scenarios to enable tractable computation. However, existing methods are largely problem-agnostic, operating solely on the uncertainty set without consulting the feasible region or recourse structure. In this paper, we introduce PRISE, a problem-driven sequential lookahead heuristic that constructs reduced scenario sets by evaluating the marginal impact of each scenario. While PRISE yields high-quality scenario subsets, each selection step requires solving multiple subproblems, making it computationally expensive at scale. To address this, we propose NeurPRISE, a neural surrogate model built on a GNN-Transformer backbone that encodes the per-scenario structure via graph convolution and captures cross-scenario interactions through attention. NeurPRISE is trained via imitation learning with a gain-aware ranking objective, which distills marginal gain information from PRISE into a learned scoring function for scenario ranking and selection. Extensive results on three 2RO problems show that NeurPRISE consistently achieves competitive regret relative to comprehensive methods, maintains strong scalability with varying numbers of scenarios, and delivers 7-200x speedup over PRISE. NeurPRISE also exhibits strong zero-shot generalization, effectively handling instances with larger problem scales (up to 5x), more scenarios (up to 4x), and distribution shifts.

【6】Self-Regulated Learning in Essay Writing: Consistency of Strategies and Impact on Outcomes
标题:论文写作中的自我调节学习:策略的一致性及其对结果的影响
链接:https://arxiv.org/abs/2605.14228

作者:Gloria Fernández-Nieto,Kiyoshige Garcés,Mladen Raković,Tongguang Li,Xinyu Li,Linxuan Zhao,Dragan Gašević
备注:16 pages, 4 figures, submitted to Journal of Computer Assisted Learning (JCAL) [Under Review]
摘要:Background: Abilities for effective self-regulated learning (SRL) are critical for lifelong learning, particularly during adolescence when these skills consolidate and strongly influence future learning. Their importance has grown with the rise of online and blended education. Yet, little is known about how secondary school students self-regulate in online environments, how their SRL processes and strategies evolve, or how they affect outcomes. In secondary education, understanding these processes can reveal patterns and indicators of learning success, informing the design of online support mechanisms. Evidence from repeated-measures designs remains scarce. Objectives: This study aims to examine how secondary school students enact SRL strategies during online essay writing, how these strategies change over time, and how they relate to learning outcomes. Methods: We analysed metacognition-related trace data collected from secondary students during a two-wave online essay-writing task conducted one week apart in two Colombian schools (N = 93 for session 1, N = 95 for session 2) via a digital learning platform. Using a combination of process mining and unsupervised machine learning techniques, we identified dominant SRL strategies grounded in established SRL processes and examined their stability and association with learning outcomes. Results and conclusions: Three dominant SRL strategies were identified. Results showed variability: many students remained in or shifted to "Read first, write next", while none used "Write intensively, read selectively" in session 2. Although less common, the latter strategy was positively associated with learning outcomes.

【7】ASH: Agents that Self-Hone via Embodied Learning
标题:ASH:通过具身学习自我磨练的智能体
链接:https://arxiv.org/abs/2605.14211

作者:Benjamin Schneider,Xavier Schneider,Victor Zhong,Sun Sun
摘要:Long-horizon embodied tasks remain a fundamental challenge in AI, as current methods rely on hand-engineered rewards or action-labeled demonstrations, neither of which scales. We introduce ASH, an agentic system that learns an embodied policy from unlabeled, noisy internet video, without reward shaping or expert annotation. ASH follows a self-improvement loop; when it gets stuck, ASH learns an Inverse Dynamics Model (IDM) from its own trajectories, and uses its IDM to extract supervision from relevant internet video. ASH uses unsupervised learning to identify key moments from large-scale internet video and retains them as long-term memory -- allowing it to tackle long-horizon problems. We evaluate ASH on two complementary environments demanding multi-hour planning: Pokemon Emerald, a turn-based RPG, and The Legend of Zelda: The Minish Cap, a real-time action-adventure game. In both games, behavioral cloning, retrieval-augmented and zero-shot foundation-model baselines plateau, while ASH sustains progression across our 8-hour evaluation. ASH reaches an average of $11.2/12$ milestones in Pokemon Emerald and $9.9/12$ in Legend of Zelda, while the strongest baseline gets stuck in both environments at an average of $6.5/12$ and $6.0/12$ milestones, respectively. We demonstrate that self-improving agents are a scalable recipe for long-horizon embodied learning.

【8】CSI-JEPA: Towards Foundation Representations for Ubiquitous Sensing with Minimal Supervision
Link: https://arxiv.org/abs/2605.14171

Authors: Xuanhao Luo, Zhizhen Li, Yuchen Liu
Abstract: Channel state information (CSI) provides a widely available sensing modality for human and environment perception, but existing CSI sensing models usually rely on task-specific supervised training and require substantial labeled data for each task, device, user, or environment. This limits their scalability in practical deployments where unlabeled CSI is abundant but labeled data is costly to collect. In this paper, we present CSI-JEPA, a self-supervised predictive representation learning framework for label-efficient, multi-task Wi-Fi sensing. CSI-JEPA learns reusable temporal-spectral representations from unlabeled CSI samples by predicting latent features of masked channel regions from visible context. To better match the physical structure of CSI, CSI-JEPA tokenizes channel-response amplitude windows along the time and subcarrier dimensions. It then introduces a channel variation-aware masking strategy that samples predictive targets from regions with stronger local temporal and subcarrier-domain variations. After pretraining, the encoder is frozen and used as a backbone, with lightweight task-specific adapters added for downstream sensing tasks. We evaluate CSI-JEPA on seven real-world Wi-Fi sensing tasks spanning diverse objectives and deployment settings. The results show that CSI-JEPA improves downstream sensing performance over competitive baselines, achieving up to 10.64 percentage points mean accuracy gain over a state-of-the-art supervised Transformer and matched-budget label savings of up to 98.0%.

【9】Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility
Link: https://arxiv.org/abs/2605.14037

Authors: Gergely Szilvasy (1), Manuel Faysse (1 and 2), Maria Lomeli (1), Matthijs Douze (1), Pierre-Emmanuel Mazaré (1), Loïc Cabannes (1), Wen-tau Yih (1), Hervé Jégou (1) ((1) Meta FAIR, (2) MICS, CentraleSupélec)
Comments: 28 pages, 8 figures, 8 tables
Abstract:

【10】R2R2: Robust Representation for Intensive Experience Reuse via Redundancy Reduction in Self-Predictive Learning
Link: https://arxiv.org/abs/2605.14026

Authors: Sanghyeob Song, Donghyeok Lee, Jinsik Kim, Sungroh Yoon
Comments: Accepted at the Forty-Third International Conference on Machine Learning (ICML 2026). This is the camera-ready version
Abstract:

【11】Unsupervised learning of acquisition variability in structural connectomes via hybrid latent space modeling
Link: https://arxiv.org/abs/2605.13933

Authors: Gaurav Rudravaram, Lianrui Zuo, Karthik Ramadass, Elyssa McMaster, Jongyeon Yoon, Aravind R. Krishnan, Adam M. Saunders, Chenyu Gao, Nancy R. Newlin, Praitayini Kanakaraj, Lori L. Beason Held, Murat Bilgel, Laura A. Barquero, Micah DArchangel, Tin Q. Nguyen, Laurie B. Cutting, Derek Archer, Timothy J. Hohman, Daniel C. Moyer, Bennett A. Landman
Abstract: Acquisition differences across sites, scanners, and protocols in dMRI introduce variability that complicates structural connectome analysis. This motivates deep learning models that can represent high-dimensional connectomes in a low-dimensional space while explicitly separating acquisition-related effects from biological variation. Conventional dimensionality reduction methods model all variance as continuous, so acquisition effects often get absorbed into a continuous latent space. Recent hybrid latent-space models combine discrete and continuous components to address this, but typically require manual capacity tuning to ensure the discrete component captures the intended variability. We introduce an unsupervised framework that removes this manual tuning by architecturally annealing encoder outputs before decoding, allowing the model to adaptively balance discrete and continuous latent variables during training. To evaluate it, we curated a dataset of N=7,416 structural connectomes derived from dMRI, spanning ages 2 to 102 and 13 studies with 25 unique acquisition-parameter combinations. Of these, 5,900 are cognitively unimpaired, 877 have mild cognitive impairment (MCI), and 639 have Alzheimer's disease (AD). We compare against a standard VAE, PCA with k-means clustering, and hybrid models that anneal only through the loss function. Our architectural annealing produces stronger site learning (ARI=0.53, p<0.05) than these baselines. Results show that a hybrid continuous-discrete latent space, with architectural rather than loss-based annealing, provides a useful unsupervised mechanism for capturing acquisition variability in dMRI: by jointly modeling smooth and categorical structure, the Joint-VAE recovers clusters aligned with scanner and protocol differences.

Transfer | Zero/Few/One-Shot | Adaptation (11 papers)

【1】FutureSim: Replaying World Events to Evaluate Adaptive Agents
Link: https://arxiv.org/abs/2605.15188

Authors: Shashwat Goel, Nikhil Chandak, Arvindh Arun, Ameya Prabhu, Steffen Staab, Moritz Hardt, Maksym Andriushchenko, Jonas Geiping
Comments: 31 pages, 10 main
Abstract: AI agents are being increasingly deployed in dynamic, open-ended environments that require adapting to new information as it arrives. To efficiently measure this capability for realistic use-cases, we propose building grounded simulations that replay real-world events in the order they occurred. We build FutureSim, where agents forecast world events beyond their knowledge cutoff while interacting with a chronological replay of the world: real news articles arriving and questions resolving over the simulated period. We evaluate frontier agents in their native harness, testing their ability to predict world events over a three-month period from January to March 2026. FutureSim reveals a clear separation in their capabilities, with the best agent's accuracy being 25%, and many having a worse Brier skill score than making no prediction at all. Through careful ablations, we show how FutureSim offers a realistic setting to study emerging research directions like long-horizon test-time adaptation, search, memory, and reasoning about uncertainty. Overall, we hope our benchmark design paves the way to measure AI progress on open-ended adaptation spanning long time-horizons in the real world.
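The Brier skill score mentioned in this abstract can be illustrated with a minimal sketch. It assumes binary questions and treats "making no prediction" as a constant 0.5 forecast; both are assumptions for illustration, and the benchmark may define its reference forecast differently. The function names are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes, reference_prob=0.5):
    """1 - BS / BS_ref. Positive beats the reference forecast, negative is worse.

    reference_prob=0.5 is an assumed stand-in for "no prediction".
    """
    bs = brier_score(probs, outcomes)
    bs_ref = brier_score([reference_prob] * len(outcomes), outcomes)
    return 1.0 - bs / bs_ref


outcomes = [1, 0, 0, 1]
confident_but_wrong = [0.1, 0.9, 0.8, 0.2]
roughly_calibrated = [0.8, 0.2, 0.3, 0.7]

print(brier_skill_score(confident_but_wrong, outcomes))  # negative: worse than the reference
print(brier_skill_score(roughly_calibrated, outcomes))   # positive: better than the reference
```

A negative score is what "worse than making no prediction at all" means under this convention: confident wrong forecasts are penalised more heavily than abstaining at 0.5.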

【2】Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
Link: https://arxiv.org/abs/2605.15012

Authors: Kai Yan, Alexander G. Schwing, Yu-Xiong Wang
Comments: 25 pages, 11 figures
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has achieved great success in developing Large Language Models (LLMs) with chain-of-thought rollouts for many tasks such as math and coding. Nevertheless, RLVR struggles with sample efficiency on difficult problems where correct rollouts are hard to generate. Prior works propose to address this issue via demonstration-guided RLVR, i.e., to conduct Supervised FineTuning (SFT) when RL fails; however, SFT often requires a lot of data, which can be expensive to acquire. In this paper, we propose FEST, a FEw-ShoT demonstration-guided RLVR algorithm. It attains compelling results with only 128 demonstrations randomly selected from an SFT dataset. We find that three components are vital for the success: supervised signal, on-policy signal, and decaying weights on the few-shot SFT dataset to prevent overfitting from multiple-epoch training. On several benchmarks, FEST outperforms baselines with orders of magnitude less SFT data, even matching their performance with the full dataset.

【3】One Step to the Side: Why Defenses Against Malicious Finetuning Fail Under Adaptive Adversaries
Link: https://arxiv.org/abs/2605.14605

Authors: Itay Zloczower, Eyal Lenga, Gilad Gressel, Yisroel Mirsky
Comments: Under review
Abstract: Model providers increasingly release open weights or allow users to fine-tune foundation models through APIs. Although these models are safety-aligned before release, their safeguards can often be removed by fine-tuning on harmful data. Recent defenses aim to make models robust to such malicious fine-tuning, but they are largely evaluated only against fixed attacks that do not account for the defense. We show that these robustness claims are incomplete. Surveying 15 recent defenses, we identify several defense mechanisms and show that they share a single weakness: they obscure or misdirect the path to harmful behavior without removing the behavior itself. We then develop a unified adaptive attack that breaks defenses across all defense mechanisms. Our results show that current approaches do not provide robust security; they mainly stop the attacks they were designed against. We hope that our unified adaptive adversary for this domain will help future researchers and practitioners stress-test new defenses before deployment.

【4】ArcGate: Adaptive Arctangent Gated Activation
Link: https://arxiv.org/abs/2605.14518

Authors: Avik Bhattacharya, Siddhant Dnyanesh Gole, Subhasis Chaudhuri, Alejandro C. Frery, Biplab Banerjee
Abstract: Activation functions are central to deep networks, influencing non-linearity, feature learning, convergence, and robustness. This paper proposes the Adaptive Arctangent Gated Activation (ArcGate) function, a flexible formulation that generates a broad spectrum of activation shapes via a three-stage non-linear transformation. Unlike conventional fixed-shape activations such as ReLU, GELU, or SiLU, ArcGate uses seven learnable parameters per layer, allowing the neural network to autonomously optimize its non-linearity to the specific requirements of the feature hierarchy and data distribution. We evaluate ArcGate using ResNet-50 and Vision Transformer (ViT-B/16) architectures on three widely used remote sensing benchmarks: PatternNet, UC Merced Land Use, and the 13-band EuroSAT MSI multispectral dataset. Experimental results show that ArcGate consistently outperforms standard baselines, achieving a peak overall accuracy of 99.67% on PatternNet. Most notably, ArcGate exhibits superior structural resilience in noisy environments, maintaining a 26.65% performance lead over ReLU under moderate Gaussian noise (standard deviation 0.1). Analysis of the learned parameters reveals a depth-dependent functional evolution, where the model increases gating strength in deeper layers to enhance signal propagation. These findings suggest that ArcGate is a robust and adaptive general node activation function for high-resolution earth observation tasks.

【5】ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization
Link: https://arxiv.org/abs/2605.14497

Authors: Letian Yang, Xu Liu, Yiqiang Lu, Jian Liu, Weiqiang Wang, Shuai Li
Comments: 20 pages, 9 figures, 7 tables. Accepted to IJCAI 2026
Abstract: Offline-to-online reinforcement learning harnesses the stability of offline pretraining and the flexibility of online fine-tuning. A key challenge lies in the non-stationary distribution shift between offline datasets and the evolving online policy. Common approaches often rely on static mixing ratios or heuristic-based replay strategies, which lack adaptability to different environments and varying training dynamics, resulting in a suboptimal tradeoff between stability and asymptotic performance. In this work, we propose Reinforcement Learning with Optimized Adaptive Data-mixing (ROAD), a dynamic plug-and-play framework that automates the data replay process. We identify a fundamental objective misalignment in existing approaches. To tackle this, we formulate the data selection problem as a bi-level optimization process, interpreting the data mixing strategy as a meta-decision governing the policy performance (outer-level) during online fine-tuning, while the conventional Q-learning updates operate at the inner level. To make it tractable, we propose a practical algorithm using a multi-armed bandit mechanism. This is guided by a surrogate objective approximating the bi-level gradient, which simultaneously maintains offline priors and prevents value overestimation. Our empirical results demonstrate that this approach consistently outperforms existing data replay methods across various datasets, eliminating the need for manual, context-specific adjustments while achieving superior stability and asymptotic performance.

【6】LiSA: Lifelong Safety Adaptation via Conservative Policy Induction
Link: https://arxiv.org/abs/2605.14454

Authors: Minbeom Kim, Lesly Miculicich, Bhavana Dalvi Mishra, Mihir Parmar, Phillip Wallis, Bharath Chandrasekhar, Kyomin Jung, Tomas Pfister, Long T. Le
Comments: 27 pages, 3 figures
Abstract: As AI agents move from chat interfaces to systems that read private data, call tools, and execute multi-step workflows, guardrails become a last line of defense against concrete deployment harms. In these settings, guardrail failures are no longer merely answer-quality errors: they can leak secrets, authorize unsafe actions, or block legitimate work. The hardest failures are often contextual: whether an action is acceptable depends on local privacy norms, organizational policies, and user expectations that resist pre-deployment specification. This creates a practical gap: guardrails must adapt to their own operating environments, yet deployment feedback is typically limited to sparse, noisy user-reported failures, and repeated fine-tuning is often impractical. To address this gap, we propose LiSA (Lifelong Safety Adaptation), a conservative policy induction framework that improves a fixed base guardrail through structured memory. LiSA converts occasional failures into reusable policy abstractions so that sparse reports can generalize beyond individual cases, adds conflict-aware local rules to prevent overgeneralization in mixed-label contexts, and applies evidence-aware confidence gating via a posterior lower bound, so that memory reuse scales with accumulated evidence rather than empirical accuracy alone. Across PrivacyLens+, ConFaide+, and AgentHarm, LiSA consistently outperforms strong memory-based baselines under sparse feedback, remains robust under noisy user feedback even at 20% label-flip rates, and pushes the latency--performance frontier beyond backbone model scaling. Ultimately, LiSA offers a practical path to secure AI agents against the unpredictable long tail of real-world edge risks.

【7】Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
Link: https://arxiv.org/abs/2605.14350

Authors: Nicholas E. Corrado, Wenyuan Huang, Josiah P. Hanna
Abstract: Multi-task reinforcement learning (MTRL) aims to train a single agent to efficiently optimize performance across multiple tasks simultaneously. However, jointly optimizing all tasks often yields imbalanced learning: agents quickly solve easy tasks but learn slowly on harder ones. While prior work primarily attributes this imbalance to conflicting task gradients and proposes gradient manipulation or specialized architectures to address it, we instead focus on a distinct and under-explored challenge: imbalanced data allocation. Standard MTRL allocates an equal number of environment interactions to each task, which over-allocates data to easy tasks that require relatively few interactions to solve and under-allocates data to hard tasks that require substantially more experience to solve. To address this challenge, we introduce Distributionally Robust Adaptive Task Sampling (DRATS), an algorithm that adaptively prioritizes sampling tasks furthest from being solved. We derive DRATS by formalizing MTRL as a feasibility problem from which we derive a minimax objective for minimizing the worst-case return gap, the difference between a desired target return and the agent's return on a task. In benchmarks like MetaWorld-MT10 and MT50, DRATS improves data efficiency and increases worst-task performance compared to existing task sampling algorithms.
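The idea of prioritising tasks by their return gap can be sketched as follows. This is a generic softmax-over-gaps illustration, not the DRATS update rule, which the abstract does not specify. The function names and the `temperature` parameter are hypothetical.

```python
import math


def return_gaps(target_returns, current_returns):
    """Gap between the desired target return and the agent's return, floored at 0."""
    return [max(t - r, 0.0) for t, r in zip(target_returns, current_returns)]


def task_sampling_probs(gaps, temperature=1.0):
    """Softmax over gaps: tasks further from their target are sampled more often.

    Assumed illustrative rule; a lower temperature concentrates sampling
    on the worst-case task.
    """
    largest = max(gaps)
    weights = [math.exp((g - largest) / temperature) for g in gaps]
    total = sum(weights)
    return [w / total for w in weights]


gaps = return_gaps(target_returns=[1.0, 1.0, 1.0], current_returns=[0.9, 0.5, 0.1])
probs = task_sampling_probs(gaps, temperature=0.2)
print(probs)  # the task with the largest gap receives the highest probability
```

As the temperature approaches zero, this rule samples only the task with the largest gap, which corresponds to the worst-case end of a minimax objective. Uniform sampling is recovered as the temperature grows.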

【8】Language-Induced Priors for Domain Adaptation
Link: https://arxiv.org/abs/2605.14301

Authors: Qiyuan Chen, Jiayu Zhou, Raed Al Kontar
Abstract: Domain adaptation faces a fundamental paradox in the cold-start regime. When target data is scarce, statistical methods fail to distinguish relevant source domains from irrelevant ones, which often leads to negative transfer. In this paper, we address this challenge by leveraging expert textual descriptions of the target domain, a resource that is often available but overlooked. We propose a probabilistic framework that translates these semantic descriptions into a choice model, namely a Language-Induced Prior (LIP), that learns the preferences from a pretrained Large Language Model (LLM). The LIP is then integrated into an Expectation-Maximization algorithm to identify source relevance. Methodologically, this framework is compatible with any parametric model where a likelihood is available. It allows the LIP to guide the selection of sources when target signals are weak, while gradually refining these choices as samples accumulate. Theoretically, we prove that the estimator roughly matches an oracle cold-start MSE under a correct prior, while remaining asymptotically consistent regardless of the quality of the LIP. Empirically, we validated the framework on a descriptive (Gaussian estimation), a predictive (C-MAPSS dataset), and a prescriptive task (MuJoCo hopper).

【9】Reliability-Gated Source Anchoring for Continual Test-Time Adaptation
Link: https://arxiv.org/abs/2605.14063

Authors: Vikash Singh, Debargha Ganguly, Weicong Chen, Sabyasachi Sahoo, Sreehari Sankar, Biyao Zhang, Mohsen Harir, Shouren Wang, Osama Zafar, Christian Gagné, Vipin Chaudhary
Abstract:

【10】EMA: Efficient Model Adaptation for Learning-based Systems
Link: https://arxiv.org/abs/2605.13942

Authors: Daiyang Yu, Xinyu Chen, Yihan Zhang, Yan Liang, Yaqi Qiao, Fan Lai
Comments: SIGCOMM (2026)
Abstract: Machine learning (ML) is increasingly applied to optimize system performance in tasks such as resource management and network simulation. Unlike traditional ML tasks (e.g., image classification), networked systems often operate in heterogeneous, long-running, and dynamic environment states, where input conditions (e.g., network loads) and operational objectives can shift over time and across settings. Existing learning-based systems offer little support for adaptation, resulting in costly model training, extensive data collection, degraded system performance, and slow responsiveness. This paper presents EMA, the first model adaptation system supporting learning-based systems to adapt to evolving environments with minimal operational overhead. EMA takes a system-driven, data-centric approach that accommodates diverse system and model designs while addressing two key deployment challenges. First, it reduces expensive model training by introducing state transformers that align the input state of a new environment with previously similar states, allowing models to warm-start adaptation. Second, it addresses the often-overlooked yet costly process of data labeling--collecting ground truth for exploring and training on various system decisions--by prioritizing labeling high-utility data while balancing the tradeoff between training and labeling cost. Evaluations on eight representative learning-based systems show that EMA reduces adaptation costs (e.g., GPU training time) by 14.9-42.4% while improving system performance (e.g., network throughput) by 6.9-31.3%.

【11】AIS: Adaptive Importance Sampling for Quantized RL
Link: https://arxiv.org/abs/2605.13907

Authors: Jiajun Zhou, Wei Shao, Lingchao Zheng, Yuwei Fan, Ngai Wong
Abstract: Reinforcement learning (RL) for large language models (LLMs) is dominated by the cost of rollout generation, which has motivated the use of low-precision rollouts (e.g., FP8) paired with a BF16 trainer to improve throughput and reduce memory pressure. This introduces a rollout-training mismatch that biases the policy gradient and can cause training to collapse outright on reasoning benchmarks. We show that the mismatch is non-stationary and acts as a double-edged sword: early in training it provides a stochastic exploration bonus, exposing the gradient to trajectories the trainer would otherwise under-sample, but the same perturbation transitions into a destabilizing source of bias as the policy concentrates. To solve this, we propose Adaptive Importance Sampling (AIS), a correction framework that adjusts the strength of its intervention on a per-batch basis. AIS combines three real-time diagnostics, namely weight reliability, divergence severity, and variance amplification, into a single mixing coefficient that interpolates between the uncorrected and fully importance-weighted gradients, suppressing the destabilizing component of the mismatch while preserving its exploratory benefit. We integrate AIS into GRPO and evaluate it on the diffusion-based LLaDA-8B-Instruct and the autoregressive Qwen3-8B and Qwen3.5-9B across mathematical reasoning and planning benchmarks. AIS matches the BF16 baseline on most tasks while retaining the 1.5 to 2.76x rollout speedup of FP8.
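The interpolation between uncorrected and fully importance-weighted gradients can be sketched with per-sample weights. This is a minimal illustration in which the mixing coefficient `alpha` is simply passed in. How AIS derives it from its three diagnostics is not described in the abstract and is not reproduced here. The function name is hypothetical.

```python
import math


def mixed_weights(logp_trainer, logp_rollout, alpha):
    """Per-sample weights interpolating between 1 (no correction) and the
    importance ratio pi_trainer / pi_rollout (full correction).

    alpha is an assumed input in [0, 1]; AIS computes its own coefficient
    per batch from diagnostics that are not modelled in this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ratios = [math.exp(lt - lr) for lt, lr in zip(logp_trainer, logp_rollout)]
    return [(1.0 - alpha) + alpha * r for r in ratios]


logp_trainer = [-1.0, -2.0, -0.5]   # log-probs under the BF16 trainer
logp_rollout = [-1.1, -1.8, -0.5]   # log-probs under the low-precision rollout policy

print(mixed_weights(logp_trainer, logp_rollout, alpha=0.0))  # all ones: uncorrected
print(mixed_weights(logp_trainer, logp_rollout, alpha=1.0))  # pure importance ratios
```

Because a policy-gradient estimate is linear in these per-sample weights, mixing the weights with coefficient `alpha` gives the same result as mixing the two gradient estimates directly.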

Reinforcement Learning (11 papers)

【1】Peng's Q($λ$) for Conservative Value Estimation in Offline Reinforcement Learning
Link: https://arxiv.org/abs/2605.14779

Authors: Byeongchan Kim, Min-hwan Oh
Comments: Accepted at ICLR 2026
Abstract: We propose a model-free offline multi-step reinforcement learning (RL) algorithm, Conservative Peng's Q($λ$) (CPQL). Our algorithm adapts the Peng's Q($λ$) (PQL) operator for conservative value estimation as an alternative to the Bellman operator. To the best of our knowledge, this is the first work in offline RL to theoretically and empirically demonstrate the effectiveness of conservative value estimation with a multi-step operator by fully leveraging offline trajectories. The fixed point of the PQL operator in offline RL lies closer to the value function of the behavior policy, thereby naturally inducing implicit behavior regularization. CPQL simultaneously mitigates over-pessimistic value estimation, achieves performance greater than (or equal to) that of the behavior policy, and provides near-optimal performance guarantees -- a milestone that previous conservative approaches could not achieve. Extensive numerical experiments on the D4RL benchmark demonstrate that CPQL consistently and significantly outperforms existing offline single-step baselines. In addition to the contributions of CPQL in offline RL, our proposed method also contributes to the offline-to-online learning framework. Using the Q-function pre-trained by CPQL in offline settings enables the online PQL agent to avoid the performance drop typically observed at the start of fine-tuning and to attain robust performance improvements. Our code is available at https://github.com/oh-lab/CPQL.
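For readers unfamiliar with the multi-step operator, the sketch below computes Peng's Q(λ) targets along one stored trajectory segment using the commonly stated recursion G_t = r_t + γ[(1 - λ) max_a Q(s_{t+1}, a) + λ G_{t+1}]. It shows the plain PQL return only. The conservative term that CPQL adds is not given in the abstract and is omitted. The function name and inputs are hypothetical, and terminal-state handling is left out for brevity.

```python
def peng_q_lambda_targets(rewards, next_max_q, gamma=0.99, lam=0.7):
    """Backward recursion for Peng's Q(lambda) targets on one trajectory segment.

    rewards[t]    : reward received at step t
    next_max_q[t] : max_a Q(s_{t+1}, a) under the current Q estimate
    The recursion bootstraps from next_max_q[-1] at the end of the segment.
    """
    targets = [0.0] * len(rewards)
    g_next = next_max_q[-1]
    for t in reversed(range(len(rewards))):
        bootstrap = (1.0 - lam) * next_max_q[t] + lam * g_next
        targets[t] = rewards[t] + gamma * bootstrap
        g_next = targets[t]
    return targets


# lam=0 reduces to the one-step target r_t + gamma * max_a Q(s_{t+1}, a).
one_step = peng_q_lambda_targets([1.0, 1.0, 1.0], [5.0, 6.0, 7.0], gamma=0.5, lam=0.0)
# lam=1 with zero bootstrap values reduces to the discounted return along the trajectory.
monte_carlo = peng_q_lambda_targets([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0)
print(one_step, monte_carlo)
```

Larger values of λ lean more on the rewards actually observed in the offline trajectory, which is one way to see why the fixed point moves toward the behavior policy's value function.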

【2】DRL-STAF: A Deep Reinforcement Learning Framework for State-Aware Forecasting of Complex Multivariate Hidden Markov Processes
Link: https://arxiv.org/abs/2605.14632

Authors: Manrui Jiang, Jingru Huang, Yong Chen, Chen Zhang
Abstract: Forecasting multivariate hidden Markov processes is challenging due to nonlinear and nonstationary observations, latent state transitions, and cross-sequence dependencies. While deep learning methods achieve strong predictive accuracy, they typically lack explicit state modeling, whereas Hidden Markov Models (HMMs) provide interpretable latent states but struggle with complex nonlinear emissions and scalability. To address these limitations, we propose DRL-STAF, a Deep Reinforcement Learning based STate-Aware Forecasting framework that jointly predicts next-step observations and estimates the corresponding hidden states for complex multivariate hidden Markov processes. Specifically, DRL-STAF models complex nonlinear emissions using deep neural networks and estimates discrete hidden states using reinforcement learning, reducing the reliance on predefined transition structures and enabling flexible adaptation to diverse temporal dynamics. In particular, DRL-STAF mitigates the state-space explosion encountered by typical multivariate HMM-based methods. Extensive experiments demonstrate that DRL-STAF outperforms HMM variants, standalone deep learning models, and existing DL-HMM hybrids in most cases, while also providing reliable hidden-state estimates.

【3】Fast Rates for Inverse Reinforcement Learning
Link: https://arxiv.org/abs/2605.14599

Authors: Andreas Schlaginhaufen, Maryam Kamgarpour
Abstract: We establish novel structural and statistical results for entropy-regularized min-max inverse reinforcement learning (Min-Max-IRL) with linear reward classes in finite-horizon MDPs with Borel state and action spaces. On the structural side, we show that maximum likelihood estimation (MLE) and Min-Max-IRL are equivalent at the population level, and at the empirical level under deterministic dynamics. On the statistical side, exploiting pseudo-self-concordance of the Min-Max-IRL loss, we prove that both the trajectory-level KL divergence and the squared parameter error in the Hessian norm decay at the fast rate $\mathcal{O}(n^{-1})$, where $n$ is the number of expert trajectories. Our guarantees apply under misspecification and require no exploration assumptions. We further extend reward-identifiability results to general Borel spaces and derive novel results on the derivatives of the soft-optimal value function with respect to reward parameters.

【4】Angel or Demon: Investigating the Plasticity Interventions' Impact on Backdoor Threats in Deep Reinforcement Learning
Link: https://arxiv.org/abs/2605.14587

Authors: Oubo Ma, Ruixiao Lin, Yang Dai, Jiahao Chen, Chunyi Zhou, Linkang Du, Shouling Ji
Comments: To appear in the Forty-Third International Conference on Machine Learning (ICML 2026), July 6-11, 2026, Seoul, South Korea
Abstract: Extensive research has highlighted the severe threats posed by backdoor attacks to deep reinforcement learning (DRL). However, prior studies primarily focus on vanilla scenarios, while plasticity interventions have emerged as indispensable built-in components of modern DRL agents. Despite their effectiveness in mitigating plasticity loss, the impact of these interventions on DRL backdoor vulnerabilities remains underexplored, and this lack of systematic investigation poses risks in practical DRL deployments. To bridge this gap, we empirically study 14,664 cases integrating representative interventions and attack scenarios. We find that only one intervention (i.e., SAM) exacerbates backdoor threats, while other interventions mitigate them. Pathological analysis identifies that the exacerbation is attributed to backdoor gradient amplification, while the mitigation stems from activation pathway disruption and representation space compression. From these findings, we derive two novel insights: (1) a conceptual framework SCC for robust backdoor injection that deconstructs the mechanistic interplay between interventions and backdoors in DRL, and (2) abnormal loss landscape sharpness as a key indicator for DRL backdoor detection.

【5】Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy
Link: https://arxiv.org/abs/2605.14558

Authors: Langzhou He, Junyou Zhu, Yue Zhou, Zhengyao Gu, Junhua Liu, Wei-Chieh Huang, Henry Peng Zou, David Wipf, Philip S. Yu, Qitian Wu
Comments: Preprint
Abstract: Agentic reinforcement learning trains large language models using multi-turn trajectories that interleave long reasoning traces with short environment-facing actions. Common policy-gradient methods, such as PPO and GRPO, treat each token in a trajectory equally, leading to uniform credit assignment. In this paper, we critically demonstrate that such uniform credit assignment largely misallocates token-level training signals. From an energy-based modeling perspective, we show that token-level training signals, quantified by their correlations with reward variance of different rollouts sampled from a given prompt, concentrate sharply on action tokens rather than reasoning tokens, even though action tokens account for only a small fraction of the trajectory. We refer to this phenomenon as the Action Bottleneck. Motivated by this observation, we propose an embarrassingly simple token reweighting approach, ActFocus, that downweights gradients on reasoning tokens, along with an additional energy-based redistribution mechanism that further increases the weights on action tokens with higher uncertainty. Across four environments and different model sizes, ActFocus consistently outperforms PPO and GRPO, yielding final-step gains of up to 65.2 and 63.7 percentage points, respectively, without any additional runtime or memory cost.
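The token reweighting idea can be sketched as a weighted policy-gradient loss in which reasoning tokens receive a smaller weight than action tokens. This is a minimal illustration that assumes a fixed down-weighting factor. The value 0.1, the function names, and the normalisation by total weight are all hypothetical, and the energy-based redistribution step described in the abstract is not modelled.

```python
def token_weights(is_action, reasoning_weight=0.1):
    """Weight 1.0 for action tokens, a smaller assumed weight for reasoning tokens."""
    return [1.0 if flag else reasoning_weight for flag in is_action]


def weighted_pg_loss(logprobs, advantages, weights):
    """REINFORCE-style surrogate loss: -(sum_i w_i * A_i * log pi_i) / sum_i w_i."""
    total = sum(w * a * lp for lp, a, w in zip(logprobs, advantages, weights))
    return -total / sum(weights)


# Two reasoning tokens followed by one environment-facing action token.
is_action = [False, False, True]
weights = token_weights(is_action)
loss = weighted_pg_loss(
    logprobs=[-1.0, -1.0, -1.0],
    advantages=[1.0, 1.0, 1.0],
    weights=weights,
)
print(weights, loss)
```

With uniform weights every token would contribute equally to the gradient. Here the single action token carries most of the weight even though it is a minority of the sequence.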

【6】Fully Dynamic Rebalancing in Dockless Bike-Sharing Systems via Deep Reinforcement Learning
Link: https://arxiv.org/abs/2605.14501

Authors: Edoardo Scarpel, Alberto Pettena, Matteo Cederle, Federico Chiariotti, Marco Fabris, Gian Antonio Susto
Comments: 6 pages, 5 figures, 1 table, accepted at the 23rd IFAC World Congress, Busan, South Korea, Aug. 23-26, 2026. Open invited track 9-131: "Control and Optimization for Smart Cities"
Abstract: This paper proposes a fully dynamic Deep Reinforcement Learning (DRL) method for rebalancing dockless bike-sharing systems, overcoming the limitations of periodic, system-wide interventions. We model the service through a graph-based simulator and cast rebalancing as a Markov decision process. A DRL agent routes a single truck in real time, executing localized pick-up, drop-off, and charging actions guided by spatiotemporal criticality scores. Experiments on real-world data show significant reductions in availability failures with a minimal fleet size, while limiting spatial inequality and mobility deserts. Our approach demonstrates the value of learning-based rebalancing for efficient and reliable shared micromobility.

【7】Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax
Link: https://arxiv.org/abs/2605.14366

Authors: Zeli Su, Ziyin Zhang, Zhou Liu, Xuexian Song, Zhankai Xu, Longfei Zheng, Xiaolu Zhang, Rong Fu, Guixian Xu, Wentao Zhang
Comments: ACL 2026 Findings
Abstract: Extending large language models (LLMs) to low-resource languages often incurs an "alignment tax": improvements in the target language come at the cost of catastrophic forgetting in general capabilities. We argue that this trade-off arises from the rigidity of supervised fine-tuning (SFT), which enforces token-level surface imitation on narrow and biased data distributions. To address this limitation, we propose a semantic-space alignment paradigm powered by Group Relative Policy Optimization (GRPO), where the model is optimized using embedding-level semantic rewards rather than likelihood maximization. This objective encourages meaning preservation through flexible realizations, enabling controlled updates that reduce destructive interference with pretrained knowledge. We evaluate our approach on Tibetan-Chinese machine translation and Tibetan headline generation. Experiments show that our method acquires low-resource capabilities while markedly mitigating alignment tax, preserving general competence more effectively than SFT. Despite producing less rigid surface overlap, semantic RL yields higher semantic quality and preference in open-ended generation, and few-shot transfer results indicate that it learns more transferable and robust representations under limited supervision. Overall, our study demonstrates that reinforcement learning with semantic rewards provides a safer and more reliable pathway for inclusive low-resource language expansion.

【8】Matrix-Space Reinforcement Learning for Reusing Local Transition Geometry
Link: https://arxiv.org/abs/2605.14304

Authors: Zuyuan Zhang, Carlee Joe-Wong, Tian Lan
Abstract: Compositional generalization in sequential decision-making requires identifying which parts of prior rollouts remain useful for new tasks. Existing methods reuse skills or predictive models, but often overlook rich local transition geometry and dynamics. We propose Matrix-Space Reinforcement Learning (MSRL), a geometric abstraction that represents trajectory segments through positive semidefinite matrix descriptors aggregating first- and second-order statistics of lifted one-step transitions. These descriptors expose shared hidden structure, support algebraic composition in an abstract matrix space, and reveal opportunities for transfer. We prove that the descriptor is well defined up to coordinate gauge, complete for the induced low-order additive signal class, additive under valid segment composition, and minimally sufficient among admissible additive descriptors. We further show that conditioning value functions on the trajectory-segment matrix yields a first-order smooth approximation of action values, enabling source-learned matrix-to-value mappings to bootstrap learning in new tasks. MSRL is plug-in compatible with standard model-free and model-based methods, while obstruction filtering rejects implausible compositions. Empirically, MSRL achieves the best average finite-budget target AUC of 0.73, outperforming MSRL from scratch (0.65), TD-MPC-PT+FT (0.63), and TD-MPC (0.57).

【9】Quantum Advantage in Multi Agent Reinforcement Learning
Link: https://arxiv.org/abs/2605.14235

Authors: Simranjeet Singh Dahia, Claudia Szabo
Notes: 19 pages
Abstract: We present an empirical evaluation of quantum entanglement in agent coordination within quantum multi agent reinforcement learning (QMARL). While QMARL has attracted growing interest recently, most prior work evaluates quantum policies without provable baselines, making it impossible to rigorously distinguish quantum advantage from algorithmic coincidence. We address this directly by evaluating a decentralized QMARL framework with variational quantum circuit (VQC) actors with shared entangled states. In the CHSH game, which has a mathematically proven classical performance ceiling of 0.75 win rate, we show that entangled QMARL agents approach the Tsirelson limit of 0.854, providing clear evidence of their quantum advantage. We show that unentangled quantum circuits match the classical baseline, confirming that entanglement and not the quantum circuit itself is the active coordination mechanism. We also explore the effect of specific entanglement structures, as some Bell states enable coordination gains while others actively harm performance. On cooperative navigation (CoopNav), QMARL without entanglement achieves a roughly 2x improvement in success rate over classical MAA2C (about 0.85 versus about 0.40), with the hybrid configuration (a quantum actor paired with a classical centralised critic) outperforming both fully classical and fully quantum solutions. We present our experimental analysis and discuss future work.
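The two CHSH numbers quoted above can be checked directly. The sketch below brute-forces every deterministic classical strategy (shared randomness only mixes these, so it cannot do better) and evaluates the Tsirelson value cos^2(pi/8). The function name `classical_best` is an assumption for the example; nothing here reproduces the paper's QMARL agents.

```python
import math
from itertools import product


def classical_best():
    """Best CHSH win rate over all deterministic strategies a(x), b(y) with bits x, y."""
    best = 0.0
    for a0, a1, b0, b1 in product((0, 1), repeat=4):
        a = (a0, a1)
        b = (b0, b1)
        # The players win when a XOR b equals x AND y; inputs are uniform over four pairs.
        wins = sum((a[x] ^ b[y]) == (x & y) for x, y in product((0, 1), repeat=2))
        best = max(best, wins / 4)
    return best


# Optimal quantum win rate with a shared maximally entangled pair.
tsirelson = math.cos(math.pi / 8) ** 2

print(f"classical ceiling: {classical_best():.3f}")
print(f"Tsirelson limit:   {tsirelson:.3f}")
```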

【10】GenCircuit-RL: Reinforcement Learning from Hierarchical Verification for Genetic Circuit Design
Link: https://arxiv.org/abs/2605.14215

Authors: Noah Flynn
Notes: Link: https://icml.cc/virtual/2026/poster/61789
Abstract: Genetic circuit design remains a laborious, expert-driven process despite decades of progress in synthetic biology. We study this problem through code generation: models produce Python code in pysbol3 to construct genetic circuits in the Synthetic Biology Open Language (SBOL), a formal representation that supports automated verification. We introduce GenCircuit-RL, a reinforcement learning framework built around hierarchical verification rewards that decompose correctness into five levels, from code execution to task-specific topological checks, and a four-stage curriculum that shifts optimization pressure from code generation to functional reasoning. We also introduce SynBio-Reason, a benchmark of 4,753 circuits spanning six canonical circuit types and nine tasks from code repair to de novo design, with held-out biological parts for out-of-distribution evaluation. Hierarchical verification improves task success on functional reasoning tasks by 14 to 16 percentage points over binary rewards, and curriculum learning is required for strong design performance. The resulting models generate topologically correct circuits, generalize to novel biological parts, and rediscover canonical designs from the synthetic biology literature.

【11】Reinforcement Learning for Tool-Calling Agents in Fast Healthcare Interoperability Resources (FHIR)
Link: https://arxiv.org/abs/2605.14126

Authors: Marius S. Knorr, Robert Müller, Jan P. Bremer, Nils Schweingruber
Abstract: Fast Healthcare Interoperability Resources (FHIR) is the dominant standard for interoperable exchange of healthcare data. In FHIR, electronic health records form a directed graph of resources. Answering clinically meaningful questions over FHIR requires agents to perform multi-step reasoning, filtering, and aggregation across multiple resource types. Prior work shows that even tool-augmented LLM agents (retrieval, code execution, multi-turn planning) often select the wrong resources or violate traversal constraints. We study this problem in the context of FHIR-AgentBench, a benchmark for realistic question answering over real-world hospital data, and frame reasoning on FHIR as a sequential decision-making problem over a queryable structured graph. We implement a multi-turn CodeAct agent and post-train it with reinforcement learning using a custom harness and tools. An LLM judge provides execution-grounded rewards. Compared to prompt-based, closed-model baselines, RL post-training improves performance while enforcing data-integrity constraints. Empirically, our approach improves answer correctness from 50% (o4-mini) to 77% on FHIR-AgentBench using a smaller and cheaper Qwen3-8B model. We present an end-to-end post-training pipeline (environment building, harness construction, model training and custom evaluation) that reliably improves multi-turn reasoning over structured clinical graphs.

Symbolic | Symbolic Learning (1 paper)

【1】Optimal Pattern Detection Tree for Symbolic Rule-Based Classification
Link: https://arxiv.org/abs/2605.14374

Authors: Young-Chae Hong, Yangho Chen
Notes: Published in Transactions on Machine Learning Research (TMLR). 26 pages, 4 figures. OpenReview URL: https://openreview.net/forum?id=RJ6eMDcDCv
Abstract: Pattern discovery in data plays a crucial role across diverse domains, including healthcare, risk assessment, and machinery maintenance. In contrast to black-box deep learning models, symbolic rule discovery emerges as a key data mining task, generating human-interpretable rules that offer both transparency and intuitive explainability. This paper introduces the Optimal Pattern Detection Tree (OPDT), a rule-based machine learning model based on novel mixed-integer programming to discover a single optimal pattern in data through binary classification. To incorporate prior knowledge and compliance requirements, we further introduce the Branching Structure Constraints (BSC) framework, which enables decision makers to encode domain knowledge and constraints directly into the model. This optimization-based approach discovers a hidden underlying pattern in datasets, when it exists, by identifying an optimal rule that maximizes coverage while minimizing the false positive rate due to misclassification. Our computational experiments show that OPDT discovers a pattern with optimality guarantees on moderately sized datasets within reasonable runtime.

Medical (8 papers)

【1】Evidential Reasoning Advances Interpretable Real-World Disease Screening
Link: https://arxiv.org/abs/2605.15171

Authors: Chenyu Lian, Hong-Yu Zhou, Jing Qin
Notes: ICML 2026
Abstract: Disease screening is critical for early detection and timely intervention in clinical practice. However, most current screening models for medical images suffer from limited interpretability and suboptimal performance. They often lack effective mechanisms to reference historical cases or provide transparent reasoning pathways. To address these challenges, we introduce EviScreen, an evidential reasoning framework for disease screening that leverages region-level evidence from historical cases. The proposed EviScreen offers retrospection interpretability through regional evidence retrieved from dual knowledge banks. Using this evidential mechanism, the subsequent evidence-aware reasoning module makes predictions using both the current case and evidence from historical cases, thereby enhancing disease screening performance. Furthermore, rather than relying on post-hoc saliency maps, EviScreen enhances localization interpretability by leveraging abnormality maps derived from contrastive retrieval. Our method achieves superior performance on our carefully established benchmarks for real-world disease screening, yielding notably higher specificity at clinical-level recall. Code is publicly available at https://github.com/DopamineLcy/EviScreen.

【2】Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment
Link: https://arxiv.org/abs/2605.15168

Authors: Sayantan Kumar, Shahriar Noroozizadeh, Juyong Kim, Jeremy C. Weiss
Notes: Sayantan Kumar, Shahriar Noroozizadeh, Juyong Kim (authors contributed equally)
Abstract: Reconstructing precise clinical timelines is essential for modeling patient trajectories and forecasting risk in complex, heterogeneous conditions like sepsis. While unstructured clinical narratives offer semantically rich and contextually complete descriptions of a patient's course, they often lack temporal precision and contain ambiguous event timing. Conversely, structured electronic health record (EHR) data provides precise temporal anchors but misses a substantial portion of clinically meaningful events. We introduce a retrieval-augmented multimodal alignment framework that bridges this gap to improve the temporal precision of absolute clinical timelines extracted from text. Our approach formulates timeline reconstruction as a graph-based multistep process: it first extracts central anchor events from narratives to build an initial temporal scaffold, places non-central events relative to this backbone, and then calibrates the timeline using retrieved structured EHR rows as external temporal evidence. Evaluated using instruction-tuned large language models on the i2m4 benchmark spanning MIMIC-III and MIMIC-IV, our multimodal pipeline consistently improves absolute timestamp accuracy (AULTC) and improves temporal concordance across nearly all evaluated models over unimodal text-only reconstruction, without compromising event match rates. Furthermore, our empirical gap analysis reveals that 34.8% of text-derived events are entirely absent from tabular records, demonstrating that aligning these modalities can produce a more temporally faithful and clinically informative reconstruction of patient trajectories than either source alone.

【3】Explainable Detection of Depression Status Shifts from User Digital Traces
Link: https://arxiv.org/abs/2605.14995

Authors: Loris Belcastro, Francesco Gervino, Fabrizio Marozzo, Domenico Talia, Paolo Trunfio
Abstract: Every day, users generate digital traces (e.g., social media posts, chats, and online interactions) that are inherently timestamped and may reflect aspects of their mental state. These traces can be organized into temporal trajectories that capture how a user's mental health signals evolve, including phases of improvement, deterioration, or stability. In this work, we propose an explainable framework for detecting and analyzing depression-related status shifts in user digital traces. The approach combines multiple BERT-based models to extract complementary signals across different dimensions (e.g., sentiment, emotion, and depression severity). Such signals are then aggregated over time to construct user-level trajectories that are analyzed to identify meaningful change points. To enhance interpretability, the framework integrates a large language model to generate concise and human-readable reports that describe the evolution of mental-health signals and highlight key transitions. We evaluate the framework on two social media datasets. Results show that the approach produces more coherent and informative summaries than direct LLM-based reporting, achieving higher coverage of user history, stronger temporal coherence, and improved sensitivity to change points. An ablation study confirms the contribution of each component, particularly temporal modeling and segmentation. Overall, the method provides an interpretable view of mental health signals over time, supporting research and decision making without aiming at clinical diagnosis.

【4】NeuroAtlas: Benchmarking Foundation Models for Clinical EEG and Brain-Computer Interfaces
Link: https://arxiv.org/abs/2605.14698

Authors: Konstantinos Kontras, Trui Osselaer, Stylianos G. Mouslech, Angeliki-Ilektra Karaiskou, Guido Gagliardi, Thomas Strypsteen, Mohammad Hossein Badiei, Anku Rani, Maarten Vanmarcke, Miguel Bhagubai, Chanakya Ekbote, Jaedong Hwang, Christos Chatzichristos, Paul Pu Liang, Maarten De Vos
Abstract: Foundation models (FMs) promise to extract unified representations that generalize across downstream tasks. They have emerged across fields, including electroencephalography (EEG), but it is less clear how effective they are in this particular field. Published evaluations differ in datasets, in the EEG-specific preprocessing that might influence reported results, and in the reported metrics, frequently obscuring the clinical relevance in EEG. We introduce NeuroAtlas, the largest EEG benchmark to date: 42 datasets and 260k hours covering clinical EEG (epilepsy, sleep medicine, brain age estimation) and brain-computer interfaces, and include multiple datasets per task along with bespoke clinical evaluation metrics. Besides evaluating EEG-FMs with respect to supervised baselines, we present results from generic time-series FMs. We report three findings. First, EEG-specific FMs do not consistently outperform time-series FMs, which have neither EEG-focused architectures nor been pretrained on EEG. Second, standard machine learning metrics are insufficient to assess clinical utility: thus, we thoroughly evaluate more appropriate measures such as the quality of event-level decision-making, hypnogram-derived features, and the brain-age gap in the domains of epilepsy, sleep, and brain age, respectively. Third, model rankings and performance can vary substantially within domains. We conclude that pretrained models perform largely on par, with only narrow advantages for a few, and that current models do not yet deliver on the promise of an out-of-the-box unified EEG model. NeuroAtlas exposes this gap and provides the datasets and metrics for the next generation of unified EEG FMs.

【5】AIM-DDI: A Model-Agnostic Multimodal Integration Module for Drug-Drug Interaction Prediction
Link: https://arxiv.org/abs/2605.14327

Authors: Yerin Park, Sangseon Lee
Abstract: Drug-drug interaction (DDI) prediction is a critical task in computational biomedicine, as adverse interactions between co-administered drugs can cause severe side effects and clinical risks. A key challenge is unseen-drug generalization, where interactions must be predicted for drugs not observed during training. Although multimodal DDI models exploit diverse drug-related information, their fusion mechanisms are often tied to specific prediction architectures, limiting their reuse across models. To address this, we propose AIM-DDI, an architecture-independent multimodal integration module that represents heterogeneous modality information as tokens in a shared latent space. By modeling dependencies across modality tokens through a unified fusion module, AIM-DDI enables model-agnostic integration of structural, chemical, and semantic drug signals across different DDI prediction architectures. Extensive evaluations across diverse DDI models and DrugBank-based settings show that AIM-DDI consistently improves prediction performance, with the strongest gains under the most challenging both-unseen setting where neither drug in a test pair is observed during training. These results suggest that treating multimodal integration as a reusable module, rather than a model-specific fusion component, is an effective strategy for robust unseen-drug DDI prediction.

【6】Uncovering Trajectory and Topological Signatures in Multimodal Pediatric Sleep Embeddings
Link: https://arxiv.org/abs/2605.14156

Authors: Scott Ye, Harlin Lee
Notes: Accepted to ML4H 2025, 20 pages, 6 figures
Abstract: While generative models have shown promise in pediatric sleep analysis, the latent structure of their multimodal embeddings remains poorly understood. This work investigates session-wide diagnostic information contained in the sequences of 30-second pediatric PSG epochs embedded by a multimodal masked autoencoder. We test whether augmenting embeddings with PHATE-derived per-epoch coordinates and whole-night movement descriptors, persistent homology summaries of the embedding cloud, and EHR yields task-relevant signals. Simple linear and MLP models, chosen for interpretability rather than state-of-the-art performance, show that geometric, topological, and clinical features each provide complementary gains. For binary predictions, feature importance is task-dependent, and more expressive late-fusion models generally perform better, with AUPRC improving from 0.26 to 0.34 for desaturation, 0.31 to 0.48 for EEG arousal, 0.09 to 0.22 for hypopnea, and 0.05 to 0.14 for apnea. We also report Brier score and Expected Calibration Error, where the full fusion model yields the best calibration across all four binary tasks. Our study reveals that latent geometry/topology and EHR offer complementary, interpretable signals beyond embeddings, improving calibration and robustness under extreme imbalance.
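The two calibration metrics named above can be computed as in the following sketch. It assumes a common binary formulation (predicted probability of the positive class, equal-width bins, per-bin gap between mean predicted probability and observed positive rate); the paper's exact binning is not stated, and the function names and toy data are made up for illustration.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average, over equal-width bins, of |mean predicted prob - positive rate|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(mean_prob - positive_rate)
    return ece


# Toy predictions: confident and correct, but slightly under-confident at both ends.
probs = [0.1, 0.1, 0.9, 0.9]
labels = [0, 0, 1, 1]

brier = brier_score(probs, labels)
ece = expected_calibration_error(probs, labels)
print(f"Brier={brier:.3f}  ECE={ece:.3f}")
```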

【7】A Systematic Evaluation of Imbalance Handling Methods in Biomedical Binary Classification
Link: https://arxiv.org/abs/2605.14147

Authors: Jiandong Chen, Lingjie Su, Le Peng, Yash Travadi, Rui Zhang, Ju Sun
Notes: 18 pages, 1 figure, 4 tables
Abstract: Objective: The primary goal of this study was to systematically examine the impact of commonly used imbalance handling methods (IHMs) on predictive performance in biomedical binary classification, considering the interplay between model complexity and diverse data modalities. Material and Methods: We evaluated five representative IHMs: random undersampling (RUS), random oversampling (ROS), SMOTE, re-weighting (RW), and direct F1-score optimization (DMO), against a raw training (RAW) baseline. The evaluation encompassed three public biomedical datasets: MIMIC-III (tabular), ADE-Corpus-V2 (text), and MURA (image), spanning three common biomedical data modalities. To assess varying model complexity, we employed a range of architectures, from classical logistic regression and random forest to deep neural networks, including multilayer perceptron (MLP), BiLSTM, BERT, DenseNet, and DINOv2. Results: For simpler models such as logistic regression on tabular data, IHMs yielded no significant advantage over the RAW baseline, aligning with prior findings. However, clear benefits were observed for more complex models and unstructured data: (a) ROS and RW consistently enhanced the performance of powerful models; (b) direct F1-score optimization demonstrated utility primarily for unstructured text and image data; and (c) RUS and SMOTE consistently degraded performance and are therefore not recommended. Conclusion: The effectiveness of IHMs depends on both model complexity and data modality. Performance gains are most pronounced when leveraging appropriate IHMs, such as ROS, RW, and DMO, on high-complexity models.
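Two of the methods the study recommends, random oversampling (ROS) and re-weighting (RW), are simple enough to sketch. The code below is a generic illustration and not the study's pipeline: it assumes ROS duplicates minority samples until classes are balanced and RW uses the common inverse-frequency weight n / (k * count). The function names and toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


# Toy imbalanced data set: eight negatives, two positives.
X = list(range(10))
y = [0] * 8 + [1] * 2

balanced_X, balanced_y = random_oversample(X, y)
weights = balanced_class_weights(y)

print(Counter(balanced_y))
print(weights)
```

ROS changes the training set itself, while RW leaves the data untouched and scales each sample's loss by its class weight, so the two can be swapped without altering the model.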

【8】ProtoMedAgent: Multimodal Clinical Interpretability via Privacy-Aware Agentic Workflows
Link: https://arxiv.org/abs/2605.14113

Authors: Alvaro Lopez Pellicer, Plamen Angelov, Marwan Bukhari, Yi Li, Eduardo Soares, Jemma Kerns
Notes: CVR 2026
Abstract: While interpretable prototype networks offer compelling case-based reasoning for clinical diagnostics, their raw continuous outputs lack the semantic structure required for medical documentation. Bridging this gap via standard Retrieval-Augmented Generation (RAG) routinely triggers "retrieval sycophancy," where Large Language Models (LLMs) hallucinate post-hoc rationalizations to align with visual predictions. We introduce ProtoMedAgent, a framework that formalizes multimodal clinical reporting as an iterative, zero-gradient test-time optimization problem over a strict neuro-symbolic bottleneck. Operating on a frozen prototype backbone, we distill latent visual and tabular features into a discrete semantic memory. Online generation is strictly constrained by exact set-theoretic differentials and a reflective Scribe-Critic loop, mathematically precluding unsupported narrative claims. To safely bound data disclosure, we introduce a semantic privacy gate governed by k-anonymity and ℓ-diversity. Evaluated on a 4,160-patient clinical cohort, ProtoMedAgent achieves 91.2% Comparison Set Faithfulness where it fundamentally outperforms standard RAG (46.2%). ProtoMedAgent additionally leverages a binding ℓ-diversity phase transition to systematically reduce artifact-level membership inference risks by an absolute 9.8%.
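The two privacy criteria behind the gate can be illustrated with their textbook definitions. This sketch is not ProtoMedAgent's gate: it assumes tabular records, checks k-anonymity as "every group sharing the same quasi-identifiers has at least k records", and uses the distinct-value form of ℓ-diversity. Field names and records are invented for the example.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_ids):
    """Group records that share the same values on all quasi-identifier fields."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[q] for q in quasi_ids)].append(record)
    return groups


def is_k_anonymous(records, quasi_ids, k):
    """True if every equivalence class contains at least k records."""
    return all(len(g) >= k for g in equivalence_classes(records, quasi_ids).values())


def is_l_diverse(records, quasi_ids, sensitive, l):
    """True if every equivalence class has at least l distinct sensitive values."""
    groups = equivalence_classes(records, quasi_ids).values()
    return all(len({r[sensitive] for r in g}) >= l for g in groups)


# Hypothetical records with two quasi-identifiers and one sensitive attribute.
records = [
    {"age_band": "30-39", "sex": "F", "diagnosis": "asthma"},
    {"age_band": "30-39", "sex": "F", "diagnosis": "diabetes"},
    {"age_band": "40-49", "sex": "M", "diagnosis": "asthma"},
    {"age_band": "40-49", "sex": "M", "diagnosis": "asthma"},
]
quasi_ids = ["age_band", "sex"]

print(is_k_anonymous(records, quasi_ids, k=2))
print(is_l_diverse(records, quasi_ids, "diagnosis", l=2))
```

The second group shows why ℓ-diversity is the stricter condition here: it is 2-anonymous, yet every member shares the same diagnosis, so group membership alone reveals the sensitive value.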

Distillation | Knowledge Extraction (5 papers)

【1】Learning from Language Feedback via Variational Policy Distillation
Link: https://arxiv.org/abs/2605.15113

Authors: Yang Li, Erik Nijkamp, Semih Yavuz, Shafiq Rayhan Joty
Abstract: Reinforcement learning from verifiable rewards (RLVR) suffers from sparse outcome signals, creating severe exploration bottlenecks on complex reasoning tasks. Recent on-policy self-distillation methods attempt to address this by utilizing language feedback to generate dense, token-level supervision. However, these approaches rely on a fixed, passive teacher to interpret the feedback. As the student policy improves, the teacher's zero-shot assessment capabilities plateau, ultimately halting further learning. To overcome this, we propose Variational Policy Distillation (VPD), a framework that formalizes learning from language feedback as a Variational Expectation-Maximization (EM) problem. VPD co-evolves both policies: in the E-step, the teacher is actively refined on trajectory outcomes via an adaptive trust-region update, translating textual feedback into a dynamically improved target token distribution. In the M-step, the student internalizes this dense distributional guidance on its own on-policy rollouts. By continuously improving the teacher's ability to extract actionable signals from textual critique, VPD overcomes the limitations of passive distillation. Evaluated across diverse sources of diagnostic feedback on scientific reasoning and code generation tasks, VPD consistently outperforms both standard RLVR and existing self-distillation baselines. Finally, by stress-testing our framework on rigid mathematical reasoning and cold-start regimes, we illuminate the fundamental bounds of feedback-driven self-distillation compared to pure environment-driven RL.

【2】DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models
Link: https://arxiv.org/abs/2605.15055

Authors: Quanhao Li, Junqiu Yu, Kaixun Jiang, Yujie Wei, Zhen Xing, Pandeng Li, Ruihang Chu, Shiwei Zhang, Yu Liu, Zuxuan Wu
Abstract: Reinforcement learning has emerged as a powerful tool for improving diffusion-based text-to-image models, but existing methods are largely limited to single-task optimization. Extending RL to multiple tasks is challenging: joint optimization suffers from cross-task interference and imbalance, while cascade RL is cumbersome and prone to catastrophic forgetting. We propose DiffusionOPD, a new multi-task training paradigm for diffusion models based on Online Policy Distillation (OPD). DiffusionOPD first trains task-specific teachers independently, then distills their capabilities into a unified student along the student's own rollout trajectories. This decouples single-task exploration from multi-task integration and avoids the optimization burden of solving all tasks jointly from scratch. Theoretically, we lift the OPD framework from discrete tokens to continuous-state Markov processes, deriving a closed-form per-step KL objective that unifies both stochastic SDE and deterministic ODE refinement via mean-matching. We formally and empirically demonstrate that this analytic gradient provides lower variance and better generality compared to conventional PPO-style policy gradients. Extensive experiments show that DiffusionOPD consistently surpasses both multi-reward RL and cascade RL baselines in training efficiency and final performance, while achieving state-of-the-art results on all evaluated benchmarks.

【3】Critic-Driven Voronoi-Quantization for Distilling Deep RL Policies to Explainable Models
Link: https://arxiv.org/abs/2605.14897

Authors: Senne Deproost, Denis Steckelmacher, Ann Nowé
Notes: Accepted for presentation at EXTRAAMAS 2026
Abstract: Despite many successful attempts at explaining Deep Reinforcement Learning policies using distillation, it remains difficult to balance the performance-interpretability trade-off and select a fitting surrogate model. In addition to this, traditional distillation only minimizes the distance between the behavior of the original and the surrogate policy while other RL-specific components such as action value are disregarded. To solve this, we introduce a new model-agnostic method called Critic-Driven Voronoi State Partitioning, which partitions a black box control policy into regions where a simple class of model can be optimized using gradient descent. By exploiting the critic value network of the original policy, we iteratively introduce new subpolicies in regions with insufficient value, standing in for a measure of policy complexity. The partitioning, a Voronoi quantizer, uses nearest neighbor lookups to assign a linear function to each point in the state space, resulting in a cell-like diagram. We validate our approach on several well-known benchmarks and show that this distillation approaches the original policy using a reasonably sized set of linear functions.
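The lookup side of such a Voronoi quantizer can be sketched in a few lines. This is only the inference step under stated assumptions (Euclidean nearest-centroid assignment, one scalar linear sub-policy per cell); how the paper places cells using the critic is not reproduced, and all centroids, weights, and names below are invented for illustration.

```python
def nearest_cell(state, centroids):
    """Index of the centroid closest to the state (squared Euclidean distance)."""
    def squared_distance(i):
        return sum((s - c) ** 2 for s, c in zip(state, centroids[i]))
    return min(range(len(centroids)), key=squared_distance)


def piecewise_linear_policy(state, centroids, weights, biases):
    """Apply the linear sub-policy that owns the Voronoi cell containing the state."""
    i = nearest_cell(state, centroids)
    return sum(w * s for w, s in zip(weights[i], state)) + biases[i]


# Two hypothetical cells in a 2-D state space, each with its own linear function.
centroids = [(-1.0, 0.0), (1.0, 0.0)]
weights = [(1.0, 0.0), (0.0, 2.0)]
biases = [0.0, 0.5]

for state in [(-0.8, 0.3), (0.9, 0.1)]:
    cell = nearest_cell(state, centroids)
    action = piecewise_linear_policy(state, centroids, weights, biases)
    print(f"state={state} -> cell {cell}, action {action:.2f}")
```

Each decision is explainable as "the state lies nearest to centroid i, so linear rule i applies", which is the interpretability argument made in the abstract.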

【4】IsoNet: Spatially-aware audio-visual target speech extraction in complex acoustic environments
Link: https://arxiv.org/abs/2605.14736

Authors: Dinanath Pathya, Sajen Maharjan, Binita Adhikari, Ishwor Raj Pokharel
Notes: 8 pages
Abstract: Target speech extraction remains difficult for compact devices because monaural neural models lack spatial evidence and classical beamformers lose resolving power when the microphone aperture is only a few centimetres. We present IsoNet, a user-selectable audio-visual target speech extraction system for a compact 4-microphone array. IsoNet combines complex multi-channel STFT features, GCC-PHAT spatial cues, face-conditioned visual embeddings, and auxiliary direction-of-arrival supervision inside a U-Net mask estimation network. Three curriculum variants were trained on 25,000 simulated VoxCeleb mixtures with progressively difficult SNR regimes. On a hard test set spanning -1 to 10 dB SNR, IsoNet-CL1 achieves 9.31 dB SI-SDR, a 4.85 dB improvement over the mixture, with PESQ 2.13 and STOI 0.84. Oracle delay-and-sum and MVDR beamformers degrade the same mixtures by 4.82 dB and 6.08 dB SI-SDRi, respectively, showing that the proposed learned multimodal conditioning solves a regime where conventional spatial filtering is ineffective. Ablation studies show consistent gains from visual conditioning, GCC-PHAT features, and extended delay-bin encoding. The results establish a compact-array, face-selectable speech extraction baseline under controlled simulation and identify the remaining barriers to real deployment, especially phase reconstruction, multi-interferer mixtures, and simulation-to-real transfer.
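For readers unfamiliar with the headline metric, scale-invariant SDR (SI-SDR) can be computed as below. This follows the standard definition (project the estimate onto the reference, then compare target and residual energy in dB); it is not IsoNet's evaluation code, and the function name and toy signals are assumptions for the example.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB for two equal-length signals."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy  # optimal scaling of the reference
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0.0:
        return float("inf")
    return 10.0 * math.log10(target_energy / residual_energy)


# Toy signals: the estimate is the reference plus a small orthogonal error.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 0.1, -1.0, -0.1]

print(f"SI-SDR: {si_sdr(estimate, reference):.2f} dB")
# Rescaling the estimate leaves the score unchanged, hence "scale-invariant".
print(f"SI-SDR (x3): {si_sdr([3 * e for e in estimate], reference):.2f} dB")
```

SI-SDRi, the improvement figure quoted in the abstract, is the SI-SDR of the system output minus the SI-SDR of the unprocessed mixture against the same reference.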

【5】Not All Timesteps Matter Equally: Selective Alignment Knowledge Distillation for Spiking Neural Networks
Link: https://arxiv.org/abs/2605.14252

Authors: Kai Sun, Peibo Duan, Yongsheng Huang, Guowei Zhang, Benjamin Smith, Nanxu Gong, Levin Kuhlmann
Abstract: Spiking neural networks (SNNs), which are brain-inspired and spike-driven, achieve high energy efficiency. However, a performance gap between SNNs and artificial neural networks (ANNs) remains. Knowledge distillation (KD) is commonly adopted to improve SNN performance, but existing methods typically enforce uniform alignment across all timesteps, either from a teacher network or through inter-temporal self-distillation, implicitly assuming that per-timestep predictions should be treated equally. In practice, SNN predictions vary and evolve over time, and intermediate timesteps need not all be individually correct even when the final aggregated output is correct. Under such conditions, effective distillation should not force every timestep toward the same supervision target, but instead provide corrective guidance to erroneous timesteps while preserving useful temporal dynamics. To address this issue, we propose Selective Alignment Knowledge Distillation (SeAl-KD), which selectively aligns class-level and temporal knowledge by equalizing competing logits at erroneous timesteps and reweighting temporal alignment based on confidence and inter-timestep similarity. Extensive experiments on static image and neuromorphic event-based datasets demonstrate consistent improvements over existing distillation methods. The code is available at https://github.com/KaiSUN1/SeAl

Clustering (3 papers)

【1】Learning with Shallow Neural Networks on Cluster-Structured Features
Link: https://arxiv.org/abs/2605.14927

Authors: Elisabetta Cornacchia, Laurent Massoulié
Notes: 10 pages main body, 2 figures
Abstract: The success of deep learning in high-dimensional settings is often attributed to the presence of low-dimensional structure in real-world data. While standard theoretical models typically assume that this structure lies in the target function, projecting unstructured inputs onto a low-dimensional subspace, data such as images, text or genomic sequences exhibit strong spatial correlations within the input space itself. In this paper, we propose a tractable model to study how these correlations affect the sample complexity of learning with gradient descent on shallow neural networks. Specifically, we consider targets that depend on a small number of latent Boolean variables, and input features grouped into clusters and correlated with the latent variables. Under an identifiability assumption, we show that for a layerwise gradient-descent variant, the sample complexity scales with the number of hidden variables and, when the signal-to-noise ratio is sufficiently high, is independent of the input dimension, up to logarithmic terms. We empirically test our theoretical findings on both synthetic and real data.

【2】ToMAToMP: Robust and Multi-Parameter Topological Clustering
Link: https://arxiv.org/abs/2605.14824

Authors: Ludo Andrianirina, Mathieu Carrière
Abstract: Topological clustering, and its main algorithm ToMATo, is a clustering method from Topological Data Analysis (TDA) which has been applied successfully in several applications during the last few years. This is due to its high versatility, as clusters are detected from the persistent components in the sublevel sets of any user-defined function (gene expression, pixel values, etc.), and efficiency, as topological clustering enjoys robustness guarantees. However, ToMATo is also limited in several ways. First, a graph on the data points needs to be provided as a hyper-parameter of the method (whose fine-tuning is left to the user). Second, ToMATo is known to be very sensitive to outlier values in the function range. Finally, and most importantly, ToMATo can only handle one function at a time, whereas it is critical to use several functions in various applications. In this article, we introduce ToMAToMP: the first topological clustering method able to handle several functions at the same time with theoretical guarantees. More specifically, we leverage a recent tool from multi-parameter persistent homology, called MMA decomposition, to design our clustering algorithm, and prove that it enjoys robustness properties. As corollaries, we show that it can be used to make ToMATo independent of graph tuning, and robust to outliers. Finally, we provide a set of numerical experiments showcasing the efficiency and quality of the clusterings produced by ToMAToMP, by showing strong improvement over non-topological and topological baselines for various datasets.

【3】K-Models: a Flexible and Interpretable Method for Ordinal Clustering with Application to Antigen-Antibody Interaction Profiles
Link: https://arxiv.org/abs/2605.14828

Authors: Giulia Patanè, Alessandra Menafoglio, Alexander Krauth, Peter Fechner, Luca Dede', Bianca Maria Colosimo, Federica Nicolussi
Abstract: Existing clustering methods for functional data often prioritize partitioning accuracy over interpretability, making it challenging to extract meaningful insights when the data-generating process follows a specific underlying structure and an ordinal relationship among clusters is suspected. This work introduces K-Models, a novel framework that integrates ordinal constraints and estimates key underlying elements of the random process generating the observed functional profiles, improving both interpretability and structure identification. The proposed method is evaluated through simulations and real-world applications. In particular, it is tested on Region of Interest (ROI) curves, which represent reaction profiles from a reflectometric sensor monitoring biomolecular interactions, such as antigen-antibody binding. These curves represent changes in reflected light intensity over time at multiple measurement spots with immobilized antigens during analyte exposure, capturing the binding dynamics of the system. The goal is to identify intrinsic signal patterns solely from the observed dynamics, making this dataset an ideal benchmark for assessing the added interpretability of the proposed approach. By incorporating structural assumptions into the clustering process, K-Models enhances interpretability while maintaining performance comparable to state-of-the-art techniques, providing a valuable tool for analyzing functional data with an underlying ordinal structure.

Super-resolution|Denoising|Deblurring|Dehazing (1 paper)

【1】Denoising-GS: Gaussian Splatting with Spatial-aware Denoising
Title: Denoising-GS: Gaussian Splatting with Spatial-aware Denoising
Link: https://arxiv.org/abs/2605.14880

Authors: Qingyuan Zhou, Xinyi Liu, Weidong Yang, Ning Wang, Shuquan Ye, Ben Fei, Ying He, Wanli Ouyang
Abstract: Recent advances in 3D Gaussian Splatting (3DGS) have achieved remarkable success in high-fidelity Novel View Synthesis (NVS), yet the optimization process inevitably introduces noisy Gaussian primitives due to the sparse and incomplete initialization from Structure-from-Motion (SfM) point clouds. Most existing methods focus solely on adjusting the positions of primitives during optimization, while neglecting the underlying spatial structure. To this end, we introduce a new perspective by formulating the optimization of 3DGS as a primitive denoising process and propose Denoising-GS, a spatial-aware denoising framework for Gaussian primitives that takes both the positions and the spatial structure into consideration. Specifically, we design an optimizer that preserves the spatial optimization flow of primitives, facilitating coherent and directed denoising rather than random perturbations. Building upon this, the Spatial Gradient-based Denoising strategy jointly considers the spatial supports of primitives to ensure gradient-consistent updates. Furthermore, the Uncertainty-based Denoising module estimates primitive-wise uncertainty to prune redundant or noisy primitives, while the Spatial Coherence Refinement strategy selectively splits primitives in sparse regions to maintain structural completeness. Experiments conducted on three benchmark datasets demonstrate that Denoising-GS consistently enhances NVS fidelity while maintaining representation compactness, achieving state-of-the-art performance across all benchmarks. Source code and models will be made publicly available.

Reasoning|Analysis|Understanding|Explanation (13 papers)

【1】Natural Synthesis: Outperforming Reactive Synthesis Tools with Large Reasoning Models
Title: Natural Synthesis: Outperforming Reactive Synthesis Tools with Large Reasoning Models
Link: https://arxiv.org/abs/2605.15131

Authors: Frederik Schmitt, Matthias Cosler, Niklas Metzger, Julian Siber, Vladimir Krsmanovic, Mohamed Ghanem, Bernd Finkbeiner
Abstract: Reactive synthesis, the problem of automatically constructing a hardware circuit from a logical specification, is a long-standing challenge in formal verification. It is elusive for two reasons: it is algorithmically hard, and writing formal specifications by hand is notoriously difficult. In this paper, we tackle both sides of the problem. For the algorithmic side, we present a neuro-symbolic approach to reactive synthesis that couples large reasoning models with model checkers to iteratively repair a synthesized Verilog implementation via sound symbolic feedback. Our approach solves more benchmarks than the best dedicated tools in the annual synthesis competition and extends to constructing parameterized systems, a problem known to be undecidable. On the specification side, we introduce an autoformalization step that shifts the specification task from temporal logic to natural language by introducing a hand-authored dataset of natural-language specifications for evaluation. We demonstrate performance comparable to that of starting from formal specifications, establishing natural synthesis as a viable end-to-end workflow.

【2】Understanding Imbalanced Forgetting in Rehearsal-Based Class-Incremental Learning
Title: Understanding Imbalanced Forgetting in Rehearsal-Based Class-Incremental Learning
Link: https://arxiv.org/abs/2605.14785

Authors: Alberto Tamajo, Srinandan Dasmahapatra, Rahman Attar
Comments: 37 pages; 24 tables; 7 figures; submitted to a journal
Abstract: Neural networks suffer from catastrophic forgetting in class-incremental learning (CIL) settings. Rehearsal, replaying a subset of past samples, is a well-established mitigation strategy. However, recent results suggest that, despite balanced rehearsal allocation, some classes are forgotten substantially more than others. Despite its relevance, this imbalanced forgetting phenomenon remains underexplored. This work shows that imbalanced forgetting arises systematically and severely in rehearsal-based CIL and investigates it extensively. Specifically, we construct, from a principled analysis, three last-layer coefficients that capture different gradient-level sources of interference affecting each past class during an incremental step. We then demonstrate that, together, they reliably predict how past classes will rank in terms of forgetting at the end of that step. While predictive performance alone does not establish causality, these results support the interpretation of the coefficients as a plausible mechanistic account linking last-layer gradient-level interactions during training to class-level forgetting outcomes. Notably, one coefficient, capturing self-induced interference, emerges as the strongest predictor, with controlled experiments providing evidence consistent with this coefficient being influenced by the new-class interference coefficient. Overall, our findings provide valuable insights and suggest promising directions for mitigating imbalanced forgetting by reducing class-wise disparities in the identified sources of interference.

【3】Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces
Title: Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces
Link: https://arxiv.org/abs/2605.14358

Authors: Sanjoy Chowdhury, Dinesh Manocha
Abstract: Language models often generate long chain-of-thought traces, but it remains unclear how much of this reasoning is necessary for preserving the final prediction. We study this through the lens of overcomplete reasoning traces: generated traces that contain more intermediate steps than are needed to support the model's answer. We define the minimal core as the smallest subset of steps that preserves either the final answer or the predictive distribution, and introduce metrics for compression ratio, redundancy mass, step necessity, and necessity concentration. Across six deliberative reasoning benchmarks spanning arithmetic, competition mathematics, expert scientific reasoning, and commonsense multi-hop QA, we find substantial overcompleteness: on average, 46% of steps are removable under greedy minimal-core extraction while preserving the original answer in 86% of cases. We also find that predictive support is concentrated: the top three steps account for 65% of measured necessity mass on average. Beyond compression, minimal cores expose a cleaner geometry of reasoning: compared with full traces, they improve correct-incorrect trace separation by 11 points, reduce estimated intrinsic dimensionality by 34%, and transfer across model families with 85% off-diagonal answer retention. Theoretically, we establish existence of minimal sufficient subsets, local irreducibility guarantees for greedy elimination, and certificates of overcompleteness and sparse necessity. Together, these results suggest that full reasoning traces are often verbose and overcomplete, while minimal cores isolate the effective support underlying language-model predictions.
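The greedy elimination idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: `greedy_minimal_core` and `toy_answer` are hypothetical names, and the toy "model" below simply sums numbers from steps tagged `use`, standing in for a real language model queried on a truncated trace.

```python
from typing import Callable, List, Sequence


def greedy_minimal_core(steps: Sequence[str],
                        answer_of: Callable[[List[str]], str]) -> List[str]:
    """Single greedy pass: drop a step whenever the answer stays unchanged."""
    target = answer_of(list(steps))
    core = list(steps)
    i = 0
    while i < len(core):
        candidate = core[:i] + core[i + 1:]
        if answer_of(candidate) == target:
            core = candidate   # step i was removable; index i now holds the next step
        else:
            i += 1             # step i is needed to preserve the answer; keep it
    return core


def toy_answer(steps: List[str]) -> str:
    """Hypothetical stand-in for a model: sums the numbers in steps tagged 'use'."""
    return str(sum(int(s.split()[1]) for s in steps if s.startswith("use ")))


trace = ["note 7", "use 2", "restate problem", "use 3", "note 0"]
core = greedy_minimal_core(trace, toy_answer)
compression = 1 - len(core) / len(trace)  # fraction of steps removed
print(core, compression)
```

With a real model, `answer_of` would re-run the model on the reduced trace, so each removal test costs one forward pass.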

【4】Artificial Intelligence-Assistant Cardiotocography: Unified Model for Signal Reconstruction, Fetal Heart Rate Analysis, and Variability Assessment
Title: Artificial Intelligence-Assistant Cardiotocography: Unified Model for Signal Reconstruction, Fetal Heart Rate Analysis, and Variability Assessment
Link: https://arxiv.org/abs/2605.14242

Authors: Xiaohua Wang, Kai Yu, XuXiao Liang, Liang Wang, Chao Han
Abstract: The monitoring of fetal heart rate (FHR) and the assessment of its variability are crucial for preventing fetal compromise and adverse outcomes. However, traditional methods encounter limitations arising from equipment performance, data transmission, and subjective assessments by doctors. We have developed a tailored AI-based FHrCTG model specifically for FHR monitoring, which effectively mitigates noise interference and precisely reconstructs signals. Our model was pre-trained on a massive dataset consisting of 558,412 unlabeled data points and further refined using 7,266 expert-reviewed entries. To validate FHR, we introduced the Intersection Overlapping Labels (IOL) approach, which transforms rate analysis into categorical judgments. Testing revealed that our model demonstrates high sensitivity and specificity in detecting critical FHR decelerations (89.13% and 87.78%, respectively) and accelerations (62.5% and 92.04%, respectively). Furthermore, based on Fischer's criteria for clinical application, our model achieved AUC scores of 0.7214 and 0.9643 for verifying FHR periodicity and amplitude variation, respectively.

【5】PreFT: Prefill-only finetuning for efficient inference
Title: PreFT: Prefill-only finetuning for efficient inference
Link: https://arxiv.org/abs/2605.14217

Authors: Andrew Lanpouthakoun, Aryaman Arora, Zhengxuan Wu, Dhruv Pai, Ben Keigwin, Dan Jurafsky, Christopher Potts
Abstract: Large language models can now be personalised efficiently at scale using parameter efficient finetuning methods (PEFTs), but serving user-specific PEFTs harms throughput, even with specialised kernels and memory management techniques. This is because, theoretically and empirically, a mismatch exists between prefill (processing a large number of tokens at once) and decode (generating a single token autoregressively): the latter has far lower throughput when serving multiple adapters. Rather than optimising performance relative to parameter count, for efficient multi-adapter serving, we instead ought to optimise performance relative to serving throughput. We therefore propose PreFT (Prefill-only Finetuning), wherein we only apply the adapter to prefill tokens and discard it afterwards. PreFT significantly increases throughput with minimal effect on performance. We develop and release an efficient implementation of two prefill-only PEFTs, LoRA and ReFT, on the vLLM inference engine. We first show that serving multi-user PreFTs is more efficient than traditional PEFTs ($1.9\times$ the throughput when serving $512$ adapters on Llama 3.1 70B). Then, we compare the performance of prefill-only vs. all-token adapters on a variety of supervised finetuning and reinforcement learning tasks with LMs at varying scales. On SFT, we observe that the evaluation loss of PreFTs is higher than that of PEFTs, but this can be compensated for by increasing rank with nearly no reduction in throughput. On RL, we consistently find that PreFTs approach parity with standard PEFTs. Together, this work validates prefill-only adaptation of LLMs as a more favourable accuracy-throughput tradeoff than existing PEFTs for personalised serving.

【6】CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves
Title: CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves
Link: https://arxiv.org/abs/2605.14068

Authors: Amirreza Mohseni, Mona Mohammadi, Morteza Saghafian, Naser Talebizadeh Saradari
Abstract:

【7】Enhanced and Efficient Reasoning in Large Learning Models
Title: Enhanced and Efficient Reasoning in Large Learning Models
Link: https://arxiv.org/abs/2605.14036

Authors: Leslie G. Valiant
Abstract:

【8】Collider-Bench: Benchmarking AI Agents with Particle Physics Analysis Reproduction
Title: Collider-Bench: Benchmarking AI Agents with Particle Physics Analysis Reproduction
Link: https://arxiv.org/abs/2605.13950

Authors: Darius A. Faroughy, Sofia Palacios Schweitzer, Ian Pang, Siddharth Mishra-Sharma, David Shih
Comments: 23 pages | 9 figures | 4 tables | Code: https://github.com/dfaroughy/Collider-Bench | Task Corpus: https://huggingface.co/datasets/Dariusfar/ColliderBench
Abstract: Autonomous language-model agents are increasingly evaluated on long-horizon tool-use tasks, but existing benchmarks rarely capture the complexity and nuance of real scientific work. To address this gap, we introduce Collider-Bench, a benchmark for evaluating whether LLM agents can reproduce experimental analyses from the Large Hadron Collider (LHC) using only public papers and open scientific software. Such analyses are often difficult to reproduce because the public toolchain only approximates the software used internally by the experimental collaborations, while the published papers inevitably omit implementation details needed for a faithful reconstruction. Agents must therefore rely on physical reasoning, domain knowledge, and trial-and-error to fill these gaps. Each task requires the agent to turn a published analysis into an executable simulation-and-selection pipeline and submit predicted collision event yields in specified signal regions. These predictions are evaluated with standard histogram metrics that provide continuous fidelity scores without a hand-written rubric. We also report the computational cost incurred by each agent per task. Finally, we evaluate the codebase and full session trace using an LLM judge to catch qualitative failure modes such as fabrications, hallucinations and duplications. We release an initial set of tasks drawn from LHC searches, together with a containerized sandbox and event simulation tools. We evaluate across a capability ladder of general purpose coding agents. Our results show that on average no agent reliably beats the physicist-in-the-loop solution.

【9】XAI and Statistical Analysis for Reliable Intrusion Detection in the UAVIDS-2025 Dataset: From Tree to Hybrid and Tabular DNN Ensembles
Title: XAI and Statistical Analysis for Reliable Intrusion Detection in the UAVIDS-2025 Dataset: From Tree to Hybrid and Tabular DNN Ensembles
Link: https://arxiv.org/abs/2605.13922

Authors: Iakovos-Christos Zarkadis, Christos Douligeris
Abstract: In recent years, the term Mechanistic Interpretability, a specific area under the umbrella of explainable artificial intelligence (XAI), has been introduced to explain the decisions made by complex machine learning (ML) models in critical systems such as UAV intrusion detection systems (UAVIDS). In this paper, we apply best practices for data pre-processing and examine a wide range of tree ensembles, deep neural networks, hybrid stacking models, and the latest ensemble neural networks to detect intrusions in UAVs, using stratified 10-fold cross-validation. With our top-performing model, XGBoost, we proceed to Shapley Additive Explanations (SHAP) to analyze the global and local feature importances and to understand which features each attack targets to mimic normal traffic and where the misclassifications occur. A distribution analysis follows, visually comparing violin plots and kernel density estimation (KDE) curves. With the Westfall-Young permutation test for multiple comparisons, bandwidth optimization of the KDEs, and the selection of the Jensen-Shannon distance for the test, we discover the true causes of the false predictions observed in Wormhole and Blackhole attacks in UAVIDS-2025. The findings provide robust, reliable, and explainable models for UAV intrusion detection, along with statistical insights that capture and clarify the masked nature of these attacks with respect to the challenge of Density Support Intersection between them in this dataset.

【10】Indian Wedding System Optimization (IWSO): A Novel Socially Inspired Metaheuristic with Operational Design and Analysis
Title: Indian Wedding System Optimization (IWSO): A Novel Socially Inspired Metaheuristic with Operational Design and Analysis
Link: https://arxiv.org/abs/2605.13871

Authors: Deepika Saxena, Kishu Gupta, Jitendra Kumar, Jatinder Kumar, Sakshi Patni, Vinaytosh Mishra, Niharika Singh, Ashutosh Kumar Singh
Abstract: This paper presents a novel population-based metaheuristic, Indian Wedding System Optimization (IWSO), inspired by the socio-cultural dynamics of traditional Indian weddings. IWSO models the matchmaking process driven by collaboration among families, candidates, and matchmakers as a guided, selective search framework for solving complex optimization problems. The algorithm introduces two key innovations: (i) a matchmaker-guided influence strategy, where elite solutions direct the evolution of weaker candidates, enhancing convergence without external parameters; and (ii) an adaptive elimination and reinitialization mechanism that maintains diversity and prevents premature convergence by replacing underperforming individuals. IWSO employs a weighted multi-objective fitness function and analytically derived time and space complexity, and is benchmarked against existing optimization approaches such as Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Differential Evolution (DE), Cuckoo Search (CS), etc. Extensive experiments on benchmark high-dimensional and multimodal test functions demonstrate superior performance of IWSO in terms of convergence speed, solution quality, and robustness.

【11】To discretize continually: Mean shift interacting particle systems for Bayesian inference
Title: To discretize continually: Mean shift interacting particle systems for Bayesian inference
Link: https://arxiv.org/abs/2605.14142

Authors: Ayoub Belhadji, Daniel Sharp, Youssef M. Marzouk
Abstract: Integration against a probability distribution given its unnormalized density is a central task in Bayesian inference and other fields. We introduce new methods for approximating such expectations with a small set of weighted samples -- i.e., a quadrature rule -- constructed via an interacting particle system that minimizes maximum mean discrepancy (MMD) to the target distribution. These methods extend the classical mean shift algorithm, as well as recent algorithms for optimal quantization of empirical distributions, to the case of continuous distributions. Crucially, our approach creates dynamics for MMD minimization that are invariant to the unknown normalizing constant; they also admit both gradient-free and gradient-informed implementations. The resulting mean shift interacting particle systems converge quickly, capture anisotropy and multi-modality, avoid mode collapse, and scale to high dimensions. We demonstrate their performance on a wide range of benchmark sampling problems, including multi-modal mixtures, Bayesian hierarchical models, PDE-constrained inverse problems, and beyond.

【12】Pause and Reflect: Conformal Aggregation for Chain-of-Thought Reasoning
Title: Pause and Reflect: Conformal Aggregation for Chain-of-Thought Reasoning
Link: https://arxiv.org/abs/2605.14098

Authors: Yu Gu, Zijun Yu, Vahid Partovi Nia, Masoud Asgharian
Comments: 9 pages, 4 figures, submitted
Abstract: Chain-of-thought (CoT) reasoning with self-consistency improves performance by aggregating multiple sampled reasoning paths. In this setting, correctness is no longer tied to a single reasoning trace but to the aggregation rule over a pool of candidate paths, making aggregation uncertainty the central challenge. This issue is critical where confidently incorrect answers are far more costly than abstentions. We introduce a conformal procedure for CoT reasoning that directly addresses aggregation uncertainty. Our approach replaces majority voting with weighted score aggregation over reasoning paths and calibrates an abstention rule using conformal risk control. This approach leads to finite-sample guarantees on the confident-error rate--the probability that the system answers and is wrong. We further identify score separability as the key condition under which abstention provably improves selective accuracy, and derive closed-form expressions that predict accuracy gains from calibration data alone. The method is fully inference-time and requires no retraining. Across four benchmarks, four open-source models, and three score classes, realized confident-error rates are consistent with the prescribed targets up to calibration-split and test-set variability. Our method achieves $90.1\%$ selective accuracy on GSM8K by abstaining on less than $5\%$ of problems, compared with $82\%$ accuracy under the majority-voting baseline.
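A rough sketch of how weighted aggregation and a calibrated abstention rule might fit together is given below. It is an illustration under assumptions, not the paper's procedure: the function names, the path weights, and the calibration data are made up, and the threshold rule is the standard conformal risk control bound for a loss in [0, 1] that is non-increasing in the threshold, namely the smallest threshold with (errors + 1) / (n + 1) <= alpha.

```python
from collections import defaultdict


def aggregate(paths):
    """paths: list of (answer, weight). Returns the top answer and its weight share."""
    totals = defaultdict(float)
    for answer, weight in paths:
        totals[answer] += weight
    top = max(totals, key=totals.get)
    return top, totals[top] / sum(totals.values())


def calibrate_threshold(cal_scores, cal_correct, alpha):
    """Smallest observed score lam such that answering when score >= lam keeps
    the conformal bound (confident errors + 1) / (n + 1) at or below alpha."""
    n = len(cal_scores)
    for lam in sorted(set(cal_scores)):
        errors = sum(1 for s, ok in zip(cal_scores, cal_correct)
                     if s >= lam and not ok)
        if (errors + 1) / (n + 1) <= alpha:
            return lam
    return float("inf")  # no admissible threshold: always abstain


# Made-up calibration set: aggregated scores and whether the top answer was right.
cal_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.95, 0.85, 0.4]
cal_correct = [True, True, True, False, True, False, True, True, False]
lam = calibrate_threshold(cal_scores, cal_correct, alpha=0.2)

top, score = aggregate([("42", 0.9), ("41", 0.3), ("42", 0.6)])
decision = top if score >= lam else "ABSTAIN"
print(lam, top, round(score, 3), decision)
```

At test time the system answers only when the aggregated score clears `lam`; lowering `alpha` pushes the threshold up and increases the abstention rate.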

【13】Phylogenetic Tree Inference with Tropical Axial Attention
Title: Phylogenetic Tree Inference with Tropical Axial Attention
Link: https://arxiv.org/abs/2605.13894

Authors: Chris Teska, Kurt Pasque, Ruriko Yoshida, Baran Hashemi
Abstract: In this work, we introduce a Tropical Axial Attention neural reasoning architecture that replaces vanilla softmax dot-product attention with max-plus operators, inducing a piecewise-linear structure aligned with dynamic programming formulations. From multi-species sequence alignments, our model learns all possible pairwise distances and is trained using a combination of $\ell_1$ and tropical symmetric distance metric losses with an ultrametric violation penalty. We leverage the well-known isomorphic relationship between the space of all phylogenetic trees with $n$ species and the tropical Grassmannian to show that tropical attention provides a natural geometric framework for phylogenetic inference. On the empirical DS1-DS11 alignments, where true trees are unknown, the tropical model produces distance matrices that are substantially closer to their BME-induced tree metrics than those of the baseline models. These results suggest that tropical attention is a useful geometric inductive bias for neural phylogenetic inference, especially under distribution shift and when tree-metric consistency is important.
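To make "max-plus operators" concrete, here is a minimal sketch of the max-plus (tropical) matrix product, where addition becomes `max` and multiplication becomes `+`. The `maxplus_attention` function is only an assumed toy analogue of attention built from that product; the paper's actual architecture, normalization, and axial layout are not specified in the abstract.

```python
def maxplus_matmul(A, B):
    """Tropical product: (A (x) B)[i][j] = max over k of (A[i][k] + B[k][j])."""
    return [[max(a + b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def maxplus_attention(Q, K, V):
    """Toy analogue (an assumption, not the paper's layer): max-plus scores
    between queries and keys, then a max-plus combination of the values."""
    scores = maxplus_matmul(Q, transpose(K))
    return maxplus_matmul(scores, V)


product = maxplus_matmul([[0, 1], [2, -1]], [[1, 0], [0, 3]])
out = maxplus_attention(Q=[[0, 1]], K=[[1, 0], [3, 2]], V=[[5, 0], [0, 7]])
print(product, out)
```

Every output entry is a maximum of sums of inputs, so the map is piecewise linear, which is the structural property the abstract ties to dynamic programming.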

Detection (5 papers)

【1】PROCESS-2: A Benchmark Speech Corpus for Early Cognitive Impairment Detection
Title: PROCESS-2: A Benchmark Speech Corpus for Early Cognitive Impairment Detection
Link: https://arxiv.org/abs/2605.14888

Authors: Madhurananda Pahar, Caitlin H. Illingworth, Bahman Mirheidari, Hend Elghazaly, Fritz Peters, Sophie Young, Wing-Zin Leung, Labhpreet Kaur, Daniel Blackburn, Heidi Christensen
Abstract: Speech-based analysis offers a scalable and non-invasive approach for detecting cognitive decline, yet progress has been constrained by the limited availability of clinically validated datasets collected under realistic conditions. We introduce PROCESS-2, a large-scale speech dataset designed to support research on automatic assessment of cognitive impairment from spontaneous and task-oriented speech. The dataset comprises recordings from 200 healthy controls, 150 participants with mild cognitive impairment, and 50 participants with a dementia diagnosis, collected using the CognoMemory digital assessment platform. Each participant completed a single assessment session, including picture description and verbal fluency tasks, accompanied by manually verified transcripts and participant-level metadata. PROCESS-2 contains approximately 21 hours of speech audio with predefined train/test partitions. Comprehensive technical validation evaluated demographic balance, clinical consistency, recording stability, embedding-space structure, and reproducible baseline modelling performance, demonstrating clinically meaningful group separation and stable performance across modelling approaches while preserving real-world conversational variability. PROCESS-2 is released under controlled access via Hugging Face to enable responsible reuse while protecting participant privacy, providing a reproducible benchmark resource for speech-based cognitive assessment research.

【2】When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
Title: When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
Link: https://arxiv.org/abs/2605.14449

Authors: Siyang Yao, Erhu Feng, Yubin Xia
Abstract: Hallucination detection in large language models (LLMs) requires balancing accuracy, efficiency, and robustness to distribution shift. Black-box consistency methods are effective but demand repeated inference; single-pass white-box probes are efficient yet treat answer representations in isolation, often degrading sharply under domain shift. We propose QAOD (Question-Answer Orthogonal Decomposition), a single-pass framework that projects away the question-aligned direction from the answer representation to obtain a question-orthogonal component that suppresses domain-conditioned variation. To identify informative signals, QAOD further selects layers via diversity-penalized Fisher scoring and discriminative neurons via Fisher importance. To address both in-domain detection and cross-domain generalization, we design two complementary probing strategies: pairing the orthogonal component with question context yields a joint probe that maximizes in-domain discriminability, while using the orthogonal component alone preserves domain-agnostic factuality signals for robust transfer. QAOD's joint probe achieves the best in-domain AUROC across all evaluated model-dataset pairs, while the orthogonal-only probe delivers the strongest OOD transfer, surpassing the best white-box baseline by up to 21% on BioASQ at under 25% of generation cost.
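The core projection step, removing the question-aligned direction from the answer representation, can be written as $a_\perp = a - \frac{\langle a, q\rangle}{\langle q, q\rangle} q$. The sketch below shows only that step on plain Python lists; the function name and the toy vectors are illustrative, and how QAOD actually pools hidden states into `answer` and `question` vectors is not stated in the abstract.

```python
def project_out(answer, question):
    """Return the component of `answer` orthogonal to `question`."""
    qq = sum(q * q for q in question)
    if qq == 0.0:
        return list(answer)  # no direction to remove
    coeff = sum(a * q for a, q in zip(answer, question)) / qq
    return [a - coeff * q for a, q in zip(answer, question)]


answer_vec = [3.0, 4.0]      # toy stand-in for a pooled answer representation
question_vec = [2.0, 0.0]    # toy stand-in for a pooled question representation
residual = project_out(answer_vec, question_vec)
overlap = sum(r * q for r, q in zip(residual, question_vec))
print(residual, overlap)
```

The residual has zero inner product with the question vector, so a probe trained on it cannot rely on whatever the two representations share along that direction.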

【3】MahaVar: OOD Detection via Class-wise Mahalanobis Distance Variance under Neural Collapse
Title: MahaVar: OOD Detection via Class-wise Mahalanobis Distance Variance under Neural Collapse
Link: https://arxiv.org/abs/2605.14413

Authors: Donghwan Kim, Hyunsoo Yoon
Comments: 29 pages, 8 figures
Abstract: Out-of-distribution (OOD) detection is a critical component for ensuring the reliability of deep neural networks in safety-critical applications. In this work, we present a key empirical observation: for in-distribution (ID) samples, class-wise Mahalanobis distances exhibit a pronounced sharp minimum structure, where the distance to the nearest class is small while distances to all other classes remain large, resulting in high variance across classes. In contrast, OOD samples tend to exhibit a less pronounced sharp minimum structure, producing comparatively lower variance across classes. We further provide a theoretical analysis grounding this observation in Neural Collapse geometry: under relaxed Neural Collapse assumptions on within-class compactness and inter-class separation, ID samples are shown to structurally exhibit high class-wise distance variance, offering a theoretical basis for its use as an OOD score. Motivated by this observation and its theoretical backing, we propose MahaVar, a simple and effective post-hoc OOD detector that augments the Mahalanobis distance with a class-wise distance variance term. Following the OpenOOD v1.5 benchmark protocol, MahaVar achieves state-of-the-art performance on CIFAR-100 and ImageNet, with consistent improvements in both AUROC and FPR@95 over existing Mahalanobis-based methods across all benchmarks.
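The observation about class-wise distance variance can be illustrated with a toy sketch. Several things here are assumptions rather than the paper's definitions: the covariance is taken to be shared and diagonal (so the squared Mahalanobis distance reduces to a variance-scaled squared Euclidean distance), and the score `-min(d) + lam * variance(d)` with `lam = 0.1` is just one plausible way to combine the two terms.

```python
from statistics import pvariance


def classwise_distances(x, class_means, var_diag):
    """Squared Mahalanobis distance to each class mean, assuming a shared
    diagonal covariance with per-dimension variances `var_diag`."""
    return [sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, var_diag))
            for mu in class_means]


def id_score(x, class_means, var_diag, lam=0.1):
    """Higher means more in-distribution. The combination rule is hypothetical."""
    d = classwise_distances(x, class_means, var_diag)
    return -min(d) + lam * pvariance(d)


class_means = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
var_diag = [1.0, 1.0]

near_class = [0.5, 0.0]   # close to the first class mean
in_between = [3.3, 3.3]   # roughly equidistant from all class means

var_near = pvariance(classwise_distances(near_class, class_means, var_diag))
var_between = pvariance(classwise_distances(in_between, class_means, var_diag))
print(var_near, var_between)
print(id_score(near_class, class_means, var_diag),
      id_score(in_between, class_means, var_diag))
```

The point near a class mean has one small distance and two large ones, hence a high variance across classes, while the point between classes has more uniform distances and a lower variance.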

【4】Fair and Calibrated Toxicity Detection with Robust Training and Abstention
Title: Fair and Calibrated Toxicity Detection with Robust Training and Abstention
Link: https://arxiv.org/abs/2605.14074

Authors: Mokshit Surana
Abstract:

【5】Sheaf-Theoretic Transport and Obstruction for Detecting Scientific Theory Shift in AI Agents
Title: Sheaf-Theoretic Transport and Obstruction for Detecting Scientific Theory Shift in AI Agents
Link: https://arxiv.org/abs/2605.14033

Authors: David N. Olivieri, Roque J. Hernández
Abstract:

Classification|Recognition (4 papers)

【1】Proposal and study of statistical features for string similarity computation and classification
Title: Proposal and study of statistical features for string similarity computation and classification
Link: https://arxiv.org/abs/2605.15110

Authors: E. O. Rodrigues, D. Casanova, M. Teixeira, V. Pegorini, F. Favarim, E. Clua, A. Conci, Panos Liatsis
Abstract: Adaptations of features commonly applied in the field of visual computing, co-occurrence matrix (COM) and run-length matrix (RLM), are proposed for the similarity computation of strings in general (words, phrases, codes and texts). The proposed features are not sensitive to language-related information. They are purely statistical and can be used in any context with any language or grammatical structure. Other statistical measures that are commonly employed in the field, such as longest common subsequence, maximal consecutive longest common subsequence, mutual information and edit distances, are evaluated and compared. In the first synthetic set of experiments, the COM and RLM features outperform the remaining state-of-the-art statistical features. In 3 out of 4 cases, the RLM and COM features were statistically more significant than the second best group based on distances (P-value < 0.001). When it comes to a real text plagiarism dataset, the RLM features obtained the best results.
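One way to picture co-occurrence and run-length statistics on strings is sketched below. This is an assumed simplification: the abstract does not say how the paper adapts COM and RLM from images to text, so the adjacent-character offset, the `(character, run length)` counting, and the cosine comparison are illustrative choices only.

```python
from collections import Counter
from itertools import groupby
from math import sqrt


def cooccurrence(s, offset=1):
    """Count ordered character pairs (s[i], s[i + offset])."""
    return Counter(zip(s, s[offset:]))


def run_lengths(s):
    """Count (character, run length) pairs over maximal runs of equal characters."""
    return Counter((ch, len(list(group))) for ch, group in groupby(s))


def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


com = cooccurrence("abab")
rlm = run_lengths("aaabcc")
same = cosine(cooccurrence("abab"), cooccurrence("abab"))
disjoint = cosine(cooccurrence("abab"), cooccurrence("xyz"))
print(com, rlm, same, disjoint)
```

Both feature sets depend only on character statistics, which matches the abstract's point that no language-specific information is required.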

【2】DeepTokenEEG Enhancing Mild Cognitive Impairment and Alzheimers Classification via Tokenized EEG Features
Title: DeepTokenEEG Enhancing Mild Cognitive Impairment and Alzheimers Classification via Tokenized EEG Features
Link: https://arxiv.org/abs/2605.15009

Authors: Thinh Nguyen-Quang, Minh Long Ngo, Ngoc-Son Nguyen, Nguyen Thanh Vinh, Huy-Dung Han, Bui Thanh Tung, Nguyen Quang Linh, Khuong Vo, Manoj Vishwanath, Hung Cao
Abstract: The detection of Alzheimer's disease (AD) is considered crucial, as timely intervention can improve patient outcomes. Electroencephalogram (EEG)-based diagnosis has been recognized as a non-invasive, accessible, and cost-effective approach for AD detection; however, it faces challenges related to data availability, the accuracy of modern deep learning methods, and the time-consuming nature of expert-based interpretation. In this study, a novel lightweight and high-performance model, DeepTokenEEG, was designed for the diagnosis of AD and the classification of EEG signals from AD patients, individuals with other neurological conditions, and healthy subjects. Unlike traditional heavy-weight models, DeepTokenEEG utilizes a spatial and temporal tokenizer that effectively captures AD-related biomarkers in both the temporal and frequency domains with only 0.29 million parameters. Trained on a combined dataset of 274 subjects, including 180 AD cases and 94 healthy controls, the proposed method achieves a maximum recorded accuracy of 100% on specific frequency bands, representing an improvement of 1.41-15.35% over state-of-the-art methods on the same dataset. These results indicate the potential of DeepTokenEEG for early detection and screening of AD, with promising applicability for deployment due to its compact size.

【3】Randomized Atomic Feature Models for Physics-Informed Identification of Dynamic Systems
Title: Randomized Atomic Feature Models for Physics-Informed Identification of Dynamic Systems
Link: https://arxiv.org/abs/2605.14351

Authors: Rajiv Singh, Mario Sznaier, Lennart Ljung
Comments: Extended version of the conference paper submitted for IFAC World Congress, 2026
Abstract: We present a physics-informed framework for system identification based on randomized stable atomic features. Impulse responses are represented as random superpositions of stable atoms, namely damped complex exponentials associated with poles sampled inside a prescribed disk. Identification is then cast as a convex regularized least-squares problem with optional linear, second-order-cone, and KYP constraints. The approach generalizes random Fourier and random Laplace features to the damped, nonstationary regime relevant to engineering systems while retaining modal interpretability and scalable finite-dimensional computation. The main analytic point is an operator-theoretic Disk-Bochner viewpoint: positive measures over stable poles generate positive-definite kernels with a radius-dependent shift defect, while a converse scalar disk moment representation for an arbitrary kernel is characterized by subnormality of the canonical shift. We prove this statement, establish an RKHS-to-l1 embedding, show that sampled poles induce a valid finite atomic gauge, discuss random-feature convergence, and state sparse-recovery guarantees conditionally on the restricted-eigenvalue properties of the realized disk-Vandermonde or input-output design matrix. We also connect the normalized transfer function problem to Nevanlinna-Pick interpolation and LFT set-membership. The framework directly encodes stability margins, modal localization, DC-gain bounds, monotonicity, passivity, relative degree, settling-time targets, and time/frequency-domain error bounds. Numerical comparisons illustrate how physically meaningful priors can compensate for poor excitation and improve constrained impulse-response recovery in an under-informative data setting.

【4】AttnGen: Attention-Guided Saliency Learning for Interpretable Genomic Sequence Classification
Title: AttnGen: Attention-Guided Saliency Learning for Interpretable Genomic Sequence Classification
Link: https://arxiv.org/abs/2605.14073

Authors: Rayhaneh Shabani Nia, Ali Karkehabadi
Comments: Accepted at IEEE CCGE 2026
Abstract:

Representation (6 papers)

【1】Slot-MPC: Goal-Conditioned Model Predictive Control with Object-Centric Representations
Title: Slot-MPC: Goal-Conditioned Model Predictive Control with Object-Centric Representations
Link: https://arxiv.org/abs/2605.14937

Authors: Jonathan Spieler, Angel Villar-Corrales, Sven Behnke
Abstract: Predictive world models enable agents to model scene dynamics and reason about the consequences of their actions. Inspired by human perception, object-centric world models capture scene dynamics using object-level representations, which can be used for downstream applications such as action planning. However, most object-centric world models and reinforcement learning (RL) approaches learn reactive policies that are fixed at inference time, limiting generalization to novel situations. We propose Slot-MPC, an object-centric world modeling framework that enables planning through Model Predictive Control (MPC). Slot-MPC leverages vision encoders to learn slot-based representations, which encode individual objects in the scene, and uses these structured representations to learn an action-conditioned object-centric dynamics model. At inference time, the learned dynamics model enables action planning via MPC, allowing agents to adapt to previously unseen situations. Since the learned world model is differentiable, we can use gradient-based MPC to directly optimize actions, which is computationally more efficient than relying on gradient-free, sampling-based MPC methods. Experiments on simulated robotic manipulation tasks show that Slot-MPC improves both task performance and planning efficiency compared to non-object-centric world model baselines. In the considered offline setting with limited state-action coverage, we find that gradient-based MPC performs better than gradient-free, sampling-based MPC. Our results demonstrate that explicitly structured, object-centric representations provide a strong inductive bias for controllable and generalizable decision-making. Code and additional results are available at https://slot-mpc.github.io.

【2】BioHuman: Learning Biomechanical Human Representations from Video
Link: https://arxiv.org/abs/2605.14772

Authors: Yujun Huo, He Zhang, Chentao Song, Honglin Song, Zongyu Zuo, Tao Yu
Abstract: Understanding human motion beyond surface kinematics is crucial for motion analysis, rehabilitation, and injury risk assessment. However, progress in this domain is limited by the lack of large-scale datasets with biomechanical annotations, and by existing approaches that cannot directly infer internal biomechanical states from visual observations. In this paper, we introduce a simulation-based framework for estimating muscle activations from existing motion capture datasets, resulting in BioHuman10M, a large-scale dataset with synchronized video, motion, and activations. Building on BioHuman10M, we propose BioHuman, an end-to-end model that takes monocular video as input and jointly predicts human motion and muscle activations, effectively bridging visual observations and internal biomechanical states. Extensive experiments demonstrate that BioHuman enables accurate reconstruction of both kinematic motion and muscle activity, and generalizes across diverse subjects and motions. We believe our approach establishes a new benchmark for video-based biomechanical understanding and opens up new possibilities for physically grounded human modeling.

【3】MoRe: Modular Representations for Principled Continual Representation Learning on Sequential Data
Link: https://arxiv.org/abs/2605.14364

Authors: Jiaqi Sun, Boyang Sun, Mohamad Rasmy, Xiangchen Song, Kun Zhang
Abstract: Continual learning requires models to adapt to new data while preserving previously acquired knowledge. At its core, this challenge can be viewed as principled one-step adaptation: incorporating new information with minimal interference to existing representations. Most existing approaches address this challenge by modifying model parameters or architectures in a supervised, task-specific manner. However, the underlying issue is representational: tasks require distinct yet structured representations that can be selectively updated without disrupting representations, while structure should reflect intrinsic organization in the data rather than task boundaries. In sequential data, time-delayed dependencies provide a natural signal for uncovering this organization, revealing how fundamental representations give rise to more specific ones. Inspired by the modular organization of the human brain, we propose MoRe, a framework that identifies modularity in the representation itself rather than allocating it at the architectural level. MoRe decomposes knowledge into a hierarchy of fundamental and specific modules with identifiability guarantees, enabling principled module reuse, alignment, and expansion during adaptation while preserving old modules by construction. Experiments on synthetic benchmarks and real-world LLM activations demonstrate interpretable hierarchical structure and improved plasticity-stability trade-offs, suggesting MoRe as a principled foundation for continual adaptation.

【4】AudioMosaic: Contrastive Masked Audio Representation Learning
Link: https://arxiv.org/abs/2605.14231

Authors: Hanxun Huang, Qizhou Wang, Xingjun Ma, Cihang Xie, Christopher Leckie, Sarah Erfani
Comments: ICML 2026
Abstract: Audio self-supervised learning (SSL) aims to learn general-purpose representations from large-scale unlabeled audio data. While recent advances have been driven mainly by generative reconstruction objectives, contrastive approaches remain less explored, partly due to the difficulty of designing effective audio augmentations and the large batch sizes required for contrastive pre-training. We introduce AudioMosaic, a contrastive learning-based audio encoder for general audio understanding. During pre-training, AudioMosaic constructs positive pairs by applying structured time-frequency masking to spectrogram patches, which reduces memory usage and enables efficient large-batch training. Compared with generative approaches, the AudioMosaic encoder learns more discriminative utterance-level representations that demonstrate strong transferability across datasets, domains, and acoustic conditions. Extensive experiments show that AudioMosaic achieves state-of-the-art performance on several standard audio benchmarks under both linear probing and fine-tuning. We further show that integrating the pretrained AudioMosaic encoder into audio-language models improves performance on audio-language tasks. The code is publicly available at https://github.com/HanxunH/AudioMosaic.

【5】Network-Aware Bilinear Tokenization for Brain Functional Connectivity Representation Learning
Link: https://arxiv.org/abs/2605.14048

Authors: Leo Milecki, Qingyu Hu, Bahram Jafrasteh, Mert R. Sabuncu, Qingyu Zhao
Comments: Submitted version to MICCAI 2026 (Provisional Accept)
Abstract: (not provided)

【6】Vision-Based Runtime Monitoring under Varying Specifications using Semantic Latent Representations
Link: https://arxiv.org/abs/2605.13923

Authors: Bardh Hoxha, Oliver Schön, Hideki Okamoto, Lars Lindemann, Georgios Fainekos
Abstract: We study certified runtime monitoring of past-time signal temporal logic (ptSTL) from visual observations under partial observability. The monitor must infer safety-relevant quantities from images and provide finite-sample guarantees, while being reusable: once trained and calibrated, it should certify any formula in a target fragment without per-formula retraining. For fragments induced by a finite dictionary of temporal atoms, we prove that the semantic basis, the vector of atom robustness scores, is the minimum prediction target within the class of monotone, 1-Lipschitz reusable interfaces: any formula is evaluated by a deterministic decoder derived from the parse tree, and a single conformal calibration pass certifies the entire fragment with no union bound. We also introduce a rolling prediction monitor that predicts only current predicate values and reconstructs temporal history online; this is easier to learn but grows conservative at long horizons. On a pedestrian-crossroad benchmark, rolling achieves tighter certified bounds at short horizons while the semantic-basis monitor is up to 4 times tighter at long horizons. We validate the presented monitors on real-world Waymo driving data, where both monitors satisfy the conformal coverage guarantee empirically.
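
The single calibration pass mentioned in this abstract can be illustrated with a standard split-conformal quantile. This is a minimal sketch, not the authors' implementation: the sup-norm score over atoms, the example formula `(atom0 OR atom1) AND atom2`, and all numbers are assumptions made for illustration. Because `min` and `max` are 1-Lipschitz in the sup norm, one quantile on the atom-level error also bounds the error of any formula decoded this way.

```python
import math

def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]

def sup_norm_score(predicted, true):
    """Worst-case error over all atom robustness scores (assumed score)."""
    return max(abs(p - t) for p, t in zip(predicted, true))

def decode(r):
    """Hypothetical formula (atom0 OR atom1) AND atom2 in min/max semantics."""
    return min(max(r[0], r[1]), r[2])

# Illustrative calibration pairs: (predicted atom scores, true atom scores).
calibration = [
    ((1.0, 0.5, 2.0), (1.25, 0.5, 2.0)),
    ((0.0, 1.0, 1.0), (0.5, 1.0, 0.75)),
    ((2.0, 2.0, 0.5), (2.0, 1.25, 0.5)),
    ((-1.0, 0.0, 1.0), (-1.0, 0.125, 1.0)),
    ((0.5, 0.5, 0.5), (1.5, 0.5, 0.5)),
    ((1.0, 1.0, 1.0), (1.0, 1.0, 0.625)),
    ((3.0, 0.0, 2.0), (3.0, 0.0, 1.375)),
]
scores = [sup_norm_score(p, t) for p, t in calibration]
q = conformal_quantile(scores, alpha=0.25)

# Certified lower bound on the formula's robustness for a new prediction.
new_prediction = (1.5, 0.25, 2.0)
lower_bound = decode(new_prediction) - q
```

The same `q` can be reused for any other formula built from these atoms with `min` and `max`, which is the sense in which no per-formula recalibration is needed in this sketch.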

Encoders (1 paper)

【1】REALM: Retrospective Encoder Alignment for LFP Modeling
Link: https://arxiv.org/abs/2605.14867

Authors: Peicheng Wu, Zhenyu Bu, Runze Ma, Lin Du
Abstract: Spike activity has been the dominant neural signal for behavior decoding due to its high spatial and temporal resolution. However, as brain-computer interfaces (BCIs) move toward high channel counts and wireless operation, the high sampling frequency of spike signals becomes a bottleneck due to high power and bandwidth requirements. Local field potentials (LFPs) represent a different spatial-temporal scale of brain activity compared to spikes, offering key advantages including improved long-term stability, reduced energy consumption, and lower bandwidth requirement. Despite these benefits, LFP-based decoding models typically show reduced accuracy and often rely on non-causal architectures that are unsuitable for real-time deployment. To address these challenges, we propose REALM: a retrospective distillation framework that enables causal LFP decoding. Inspired by offline-to-online distillation strategies in speech recognition, REALM transfers representational knowledge from a pretrained multi-session bidirectional LFP model to a causal version for real-time deployment. We first pretrain a bidirectional Mamba-2 teacher model using a masked autoencoding objective. We then distill this teacher model into a compact student model via a combined objective of representation alignment and task supervision. REALM consistently outperforms both causal and non-causal LFP-based SOTA methods for behavior decoding. Notably, REALM improves decoding performance while achieving a $2\times$ reduction in parameter count and a $10\times$ reduction in training time. These results demonstrate that retrospective distillation effectively bridges the gap between offline and real-time neural decoding. REALM shows that LFP-only models can achieve competitive decoding performance without reliance on spike signals, offering a practical and scalable alternative for next-generation wireless implantable BCIs.

Optimization | Convergence (6 papers)

【1】An Amortized Efficiency Threshold for Comparing Neural and Heuristic Solvers in Combinatorial Optimization
Link: https://arxiv.org/abs/2605.14624

Authors: Sohaib Afifi
Comments: 16 pages, 5 figures, 3 tables. v0.1: framework + measurement protocol instantiated at n=20; empirical extension to larger problem sizes deferred to v0.2
Abstract: A common critique of neural combinatorial-optimization solvers is that they are less energy-efficient than CPU metaheuristics, given the operational energy cost of training them on GPUs. This paper examines the inferential step from "training is expensive" to "neural solvers are net-inefficient", which is where the critique actually goes wrong. Training the network costs a large fixed amount of GPU energy; running the metaheuristic costs a small amount of CPU energy on every instance, repeated as long as the solver is deployed. The two are not commensurable until a deployment volume is fixed. We define the Amortized Efficiency Threshold (AET) as the deployment volume above which a neural solver breaks even with a heuristic baseline in total energy or carbon, under an explicit constraint on solution quality. We show that the cumulative-energy ratio between the two solvers tends to a constant strictly below one whenever the network wins per-instance, and that this limit does not depend on how the training cost was measured. An embodied-carbon term amortizes hardware fabrication symmetrically on both sides. We instantiate the framework on the Multi-Task VRP (MTVRP) environment at n=20 customers across 19 problem variants and five training seeds, with HGS via PyVRP as the heuristic baseline. The measured crossover sits near $1.58 \times 10^5$ deployed instances; the per-instance ratio is 0.41, reflecting the moderate size of the instances tested. The contribution is the framework, the open instrumentation, and the measurement protocol; structural convergence of the ratio at larger problem sizes is left to future empirical work.
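
The break-even argument in this abstract reduces to simple arithmetic, sketched below. This is an illustration under stated assumptions, not the paper's instrumentation: the function names and the energy figures are hypothetical, and the solution-quality constraint and the embodied-carbon term are left out.

```python
import math

def amortized_efficiency_threshold(train_energy, neural_per_instance,
                                   heuristic_per_instance):
    """Smallest deployment volume N at which training + N * neural inference
    costs no more energy than N runs of the heuristic."""
    saving = heuristic_per_instance - neural_per_instance
    if saving <= 0:
        return math.inf  # the network never breaks even
    return math.ceil(train_energy / saving)

def cumulative_ratio(n, train_energy, neural_per_instance,
                     heuristic_per_instance):
    """Cumulative energy of the neural solver divided by the heuristic's."""
    neural_total = train_energy + n * neural_per_instance
    return neural_total / (n * heuristic_per_instance)

# Hypothetical figures in arbitrary energy units (not taken from the paper).
TRAIN, NEURAL, HEURISTIC = 3000.0, 0.5, 2.0

threshold = amortized_efficiency_threshold(TRAIN, NEURAL, HEURISTIC)
ratio_at_threshold = cumulative_ratio(threshold, TRAIN, NEURAL, HEURISTIC)
ratio_at_scale = cumulative_ratio(10**9, TRAIN, NEURAL, HEURISTIC)
```

With these made-up numbers the ratio equals one at the threshold and approaches the per-instance ratio `NEURAL / HEURISTIC` as the volume grows, since the fixed training cost is spread over ever more instances. That is why the limit does not depend on the size of the training cost.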

【2】Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits
Link: https://arxiv.org/abs/2605.14553

Authors: Donghao Li, Chengshuai Shi, Weijuan Ou, Cong Shen, Jing Yang
Comments: Published as a conference paper at ICLR 2026
Abstract: Prompt engineering has become central to eliciting the capabilities of large language models (LLMs). At its core lies prompt selection -- efficiently identifying the most effective prompts. However, most prior investigations overlook a key challenge: the inherently multi-faceted nature of prompt performance, which cannot be captured by a single metric. To fill this gap, we study the multi-objective prompt selection problem under two practical settings: Pareto prompt set recovery and best feasible prompt identification. Casting the problem into the pure-exploration bandits framework, we adapt provably efficient algorithms from multi-objective bandits and further introduce a novel design for best feasible arm identification in structured bandits, with theoretical guarantees on the identification error in the linear case. Extensive experiments across multiple LLMs show that the bandit-based approaches yield significant improvements over baselines, establishing a principled and efficient framework for multi-objective prompt optimization.
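
The two selection targets named in this abstract can be made concrete with a small sketch. It assumes the mean score of each prompt on each objective is already known, whereas the bandit algorithms in the paper must estimate those means from noisy evaluations. The prompt names, the scores, and the reading of "best feasible" as "maximize the first objective subject to minimum levels on the others" are assumptions for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (higher is better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_set(scores):
    """Names of prompts that no other prompt dominates."""
    return [name for name, s in scores.items()
            if not any(dominates(t, s)
                       for other, t in scores.items() if other != name)]

def best_feasible(scores, min_levels):
    """Prompt with the highest first objective among those whose remaining
    objectives all meet their minimum levels; None if none qualifies."""
    feasible = {name: s for name, s in scores.items()
                if all(v >= m for v, m in zip(s[1:], min_levels))}
    if not feasible:
        return None
    return max(feasible, key=lambda name: feasible[name][0])

# Hypothetical mean scores: (accuracy, brevity).
mean_scores = {
    "A": (0.9, 0.2),
    "B": (0.7, 0.7),
    "C": (0.6, 0.6),
    "D": (0.5, 0.9),
}
front = pareto_set(mean_scores)
choice = best_feasible(mean_scores, min_levels=(0.5,))
```

Here prompt C drops out of the Pareto set because B beats it on both objectives, and B is the best feasible prompt once brevity must be at least 0.5.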

【3】Turning Stale Gradients into Stable Gradients: Coherent Coordinate Descent with Implicit Landscape Smoothing for Lightweight Zeroth-Order Optimization
Link: https://arxiv.org/abs/2605.14373

Authors: Chen Liang, Xiatao Sun, Qian Wang, Daniel Rakita
Comments: Accepted to the 43rd International Conference on Machine Learning (ICML 2026)
Abstract: Zeroth-Order (ZO) optimization is pivotal for scenarios where backpropagation is unavailable, such as memory-constrained on-device learning and black-box optimization. However, existing methods face a stark trade-off: they are either sample-inefficient (e.g., standard finite differences) or suffer from high variance due to randomized estimation (e.g., random subspace methods). In this work, we propose Coherent Coordinate Descent (CoCD), a deterministic, sample-efficient, and budget-aware ZO optimizer. Theoretically, we formalize the notion of gradient coherence and demonstrate that CoCD is equivalent to Block Cyclic Coordinate Descent (BCCD) with "warm starts," effectively converting historical (stale) gradients from a liability into a computational asset. This mechanism enables $O(1)$ query complexity per step while maintaining global descent directions. Furthermore, we derive error bounds revealing a counter-intuitive insight: larger finite-difference step sizes can induce an implicit smoothing effect on the optimization landscape by reducing the effective smoothness constant, thereby improving convergence stability. Experiments on MLP, CNN, and ResNet architectures (up to 270k parameters) demonstrate that CoCD significantly outperforms BCCD in terms of sample efficiency and convergence loss/accuracy, and exhibits superior stability over randomized ZO methods. Our results suggest that deterministic, structure-aware updates offer a superior alternative to randomization for lightweight ZO optimization.
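
One way to read the "stale gradients as an asset" idea is sketched below: refresh a single coordinate of the gradient estimate per step with a finite difference, and still step along the whole estimate. This is an interpretation of the abstract, not the authors' CoCD algorithm. The function name, the cyclic schedule, the step sizes, and the toy quadratic are all assumptions.

```python
def coherent_coordinate_descent(f, x0, steps, lr=0.02, h=1e-3):
    """Zeroth-order descent using two function queries per step."""
    x = list(x0)
    grad = [0.0] * len(x)  # running gradient estimate; most entries are stale
    for t in range(steps):
        i = t % len(x)  # cyclic schedule: refresh one coordinate per step
        probe = list(x)
        probe[i] += h
        grad[i] = (f(probe) - f(x)) / h  # forward difference on coordinate i
        # Step along the whole estimate, reusing the stale entries as they are.
        x = [xj - lr * gj for xj, gj in zip(x, grad)]
    return x

CURVATURES = [1.0, 1.25, 1.5, 1.75, 2.0]

def quadratic(x):
    """Toy separable objective with its minimum at the origin."""
    return 0.5 * sum(a * v * v for a, v in zip(CURVATURES, x))

start = [1.0] * 5
final = coherent_coordinate_descent(quadratic, start, steps=400)
```

Each step costs two evaluations of `f` regardless of dimension, in contrast to a full finite-difference gradient, which needs one extra evaluation per coordinate. The learning rate is kept small here so that the delay in the stale entries does not destabilize the iteration on this toy problem.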

【4】Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
Link: https://arxiv.org/abs/2605.14297

Authors: Matias Alvo, Daniel Russo, Yash Kanoria
Abstract: We study reinforcement learning in hybrid discrete-continuous action spaces, such as settings where the discrete component selects a regime (or index) and the continuous component optimizes within it -- a structure common in robotics, control, and operations problems. Standard model-free policy gradient methods rely on score-function (SF) estimators and suffer from severe credit-assignment issues in high-dimensional settings, leading to poor gradient quality. On the other hand, differentiable simulation largely sidesteps these issues by backpropagating through a simulator, but the presence of discrete actions or non-smooth dynamics yields biased or uninformative gradients. To address this, we propose Hybrid Policy Optimization (HPO), which backpropagates through the simulator wherever smoothness permits, using a mixed gradient estimator that combines pathwise and SF gradients while maintaining unbiasedness. We also show how problems with action discontinuities can be reformulated in hybrid form, further broadening its applicability. Empirically, HPO substantially outperforms PPO on inventory control and switched linear-quadratic regulator problems, with performance gaps increasing as the continuous action dimension grows. Finally, we characterize the structure of the mixed gradient, showing that its cross term -- which captures how continuous actions influence future discrete decisions -- becomes negligible near a discrete best response, thereby enabling approximate decentralized updates of the continuous and discrete components and reducing variance near optimality. All resources are available at github.com/MatiasAlvo/hybrid-rl.

【5】From Data to Action: Accelerating Refinery Optimization with AI
Link: https://arxiv.org/abs/2605.15085

Authors: Dániel Pfeifer, Ábrahám Papp, Tibor Bernáth, Tamás Zoltán Varga, Márk Czifra, Botond Szilágyi, Edith Alice Kovács
Comments: 34 pages, 17 figures
Abstract: Refinery optimization today relies on vast amounts of data, which modern Linear Programming (LP) software can handle, but interpreting and applying the results remains challenging. Large petrochemical companies use massive models, with hundreds of thousands of input matrix elements. The LP solution is mathematically correct, but simplifications are made in the model, and data supply errors may occur. Therefore, further insight is needed to trust the results. The LP solver does not have a memory, so additional understanding could be gained by analyzing historical data and comparing it to the current plan. As such, machine learning approaches were suggested to support decision making based on the LP solution. Among these, Anomaly Detection tools are proposed to be used in tandem with the LP output. A transformed version of the popular ECOD methodology is applied. New methods are proposed to handle high-dimensional data by choosing the most informative pairs. This is then used alongside two 2D Anomaly Detection algorithms, revealing several business opportunities and data supply errors in the MOL refinery scheduling and planning architecture.

【6】Regret Equals Covariance: A Closed-Form Characterization for Stochastic Optimization
Link: https://arxiv.org/abs/2605.14019

Authors: Irene Aldridge
Comments: 33 pages
Abstract: Regret is the cost of uncertainty in algorithmic decision-making. Quantifying regret typically requires computationally expensive simulation via Sample Average Approximation (SAA), with complexity $\mathcal{O}(Bn^{2}d^{3})$ in the number of scenarios $B$, variables $n$, and constraints $d$. This paper proves that expected regret in any stochastic optimization problem admits the exact decomposition $\mathrm{Regret}(c) = \mathrm{Cov}(c,\,\pi^{*}(c)) + R(c)$, where $c$ is the vector of uncertain parameters, $\pi^{*}(c)$ is the optimal decision, and $R(c)$ is a residual whose magnitude we bound explicitly under Lipschitz, smooth, and strongly convex conditions. For linear programs and unconstrained quadratic programs, including the classical Markowitz portfolio problem, we prove $R(c)=0$ exactly, so that $\mathrm{Regret}(c) = \mathrm{Cov}(c,\pi^{*}(c))$ holds without approximation. When historical cost-decision pairs $\{(c_i, \pi^{*}(c_i))\}$ are available, the covariance can be estimated in $\mathcal{O}(nd^{2})$ time, which is orders of magnitude faster than SAA. The estimation is performed by a single pass through the data. We derive concentration bounds, a central limit theorem, and an asymptotically unbiased residual estimator, and we validate all results on synthetic LP, QP, and integer programming instances and on a rolling-window portfolio experiment using ten years of CRSP equity data.
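
A single-pass covariance estimate from cost-decision pairs can be sketched as follows. This is an illustration only, not the paper's estimator. It assumes the scalar covariance between the vectors $c$ and $\pi^{*}(c)$ means the sum of coordinate-wise covariances, i.e. $E[c^{\top}\pi^{*}] - E[c]^{\top}E[\pi^{*}]$, and the function name and data are hypothetical. The update is the standard streaming (Welford-style) co-moment recurrence.

```python
def streaming_covariance(pairs):
    """Sum of coordinate-wise sample covariances between cost vectors and
    decision vectors, computed in one pass over the data."""
    n = 0
    mean_c = mean_p = None
    comoment = 0.0
    for c, p in pairs:
        n += 1
        if mean_c is None:
            mean_c = [0.0] * len(c)
            mean_p = [0.0] * len(p)
        for j in range(len(c)):
            dc = c[j] - mean_c[j]  # deviation from the previous mean of c_j
            mean_c[j] += dc / n
            mean_p[j] += (p[j] - mean_p[j]) / n
            comoment += dc * (p[j] - mean_p[j])  # uses the updated mean of p_j
    if n < 2:
        raise ValueError("need at least two pairs")
    return comoment / (n - 1)

# Hypothetical history of (cost vector, optimal decision) pairs.
history = [
    ((1.0, 0.0), (2.0, 5.0)),
    ((2.0, 0.0), (4.0, 5.0)),
    ((3.0, 0.0), (6.0, 5.0)),
]
estimate = streaming_covariance(history)
```

Only running means and one accumulator are stored, so the history never needs to be held in memory. Whether the sign convention and normalization match the paper's definition of regret is not something this sketch settles.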

Prediction | Estimation (14 papers)

【1】CoCo-InEKF: State Estimation with Learned Contact Covariances in Dynamic, Contact-Rich Scenarios
Link: https://arxiv.org/abs/2605.15122

Authors: Michael Baumgartner, David Müller, Agon Serifi, Ruben Grandia, Espen Knoop, Markus Gross, Moritz Bächer
Comments: RSS 2026
Abstract: Robust state estimation for highly dynamic motion of legged robots remains challenging, especially in dynamic, contact-rich scenarios. Traditional approaches often rely on binary contact states that fail to capture the nuances of partial contact or directional slippage. This paper presents CoCo-InEKF, a differentiable invariant extended Kalman filter that utilizes continuous contact velocity covariances instead of binary contact states. These learned covariances allow the method to dynamically modulate contact confidence, accounting for more nuanced conditions ranging from firm contact to directional slippage or no contact. To predict these covariances for a set of predefined contact candidate points, we employ a lightweight neural network trained end-to-end using a state-error loss. This approach eliminates the need for heuristic ground-truth contact labels. In addition, we propose an automated contact candidate selection procedure and demonstrate that our method is insensitive to their exact placement. Experiments on a bipedal robot demonstrate a superior accuracy-efficiency tradeoff for linear velocity estimation, as well as improved filter consistency compared to baseline methods. This enables the robust execution of challenging motions, including dancing and complex ground interactions -- both in simulation and in the real world.

【2】Novel Dynamic Batch-Sensitive Adam Optimiser for Vehicular Accident Injury Severity Prediction
Link: https://arxiv.org/abs/2605.15083

Authors: Daniel Asare Kyei, Alimatu Saadia-Yussiff, Maame G. Asante-Mensah, Abdul Lateef-Yussiff, Charles Roland Haruna, Derry Emmanuel
Abstract: The choice of optimiser is important in deep learning, as it strongly influences model efficiency and speed of convergence. However, many commonly used optimisers encounter difficulties when applied to imbalanced and sequential datasets, limiting their ability to capture patterns of minority classes. In this study, we propose Dynamic Batch-Sensitive Adam (DBS-Adam), an optimiser that dynamically scales the learning rate using a batch difficulty score derived from exponential moving averages of gradient norms and batch loss. DBS-Adam improves training stability and accelerates convergence by increasing updates for difficult batches and reducing them for easier ones. We evaluate DBS-Adam by integrating it with Bi-Directional LSTM networks for accident injury severity prediction, addressing class imbalance through SMOTE-ENN resampling and Focal Loss. Four experimental configurations compare baseline Bi-LSTM models and alternative architectures to assess optimiser impact. Rigorous comparison against state-of-the-art optimisers (AMSGrad, AdamW, AdaBound) across five random seeds demonstrated DBS-Adam's competitive performance with statistically significant precision improvements (p=0.020). Results indicate that DBS-Adam outperforms standard optimisation approaches, achieving 95.22% test accuracy, 96.11% precision, 95.28% recall, 95.39% F1-score, and a test loss of 0.0086. The proposed framework enables effective real-time accident severity classification for targeted emergency response and road safety interventions, demonstrating the value of DBS-Adam for learning from imbalanced sequential data.

【3】TopoPrimer: The Missing Topological Context in Forecasting Models
Link: https://arxiv.org/abs/2605.15035

Authors: Zara Zetlin, Kayhan Moharreri, Maria Safi
Comments: 29 pages, 16 figures
Abstract: We introduce TopoPrimer, a framework that makes the global topological structure of the series population an explicit input to any forecasting model. TopoPrimer improves accuracy across diverse domains, stabilizes forecasts under seasonal demand spikes, and closes the cold-start gap. Precomputed once per domain via persistent homology and spectral sheaf coordinates, TopoPrimer deploys per token for fully-trained models and as a lightweight adapter for pre-trained backbones. Of these two components, sheaf coordinates are the primary accuracy driver. Across four public benchmarks on Chronos and TimesFM, TopoPrimer consistently improves forecasting accuracy, with gains of up to 7.3% MSE on ECL. The topology advantage persists with near-identical magnitude across zero-shot and fine-tuned backbones, suggesting topology and per-series training capture complementary signals. The gains are most pronounced in difficult regimes. Under peak seasonal demand, classical and zero-shot models degrade by up to 50%, while TopoPrimer stays within 10%. At cold start with no item history, TopoPrimer reduces MAE by 27% over a topology-free baseline.

【4】SeesawNet: Towards Non-stationary Time Series Forecasting with Balanced Modeling of Common and Specific Dependencies
Link: https://arxiv.org/abs/2605.14551

Authors: Hao Li, Lu Zhang, Liu Chong, Yankai Chen, Pengyang Wang, Yingjie Zhou
Comments: Accepted by IJCAI-ECAI 2026, the 35th International Joint Conference on Artificial Intelligence. Code is at https://github.com/dreamone-Lee/SeesawNet
Abstract: Instance normalization (IN) is widely used in non-stationary multivariate time series forecasting to reduce distribution shifts and highlight common patterns across samples. However, IN can over-smooth instance-specific structural information that is essential for modeling temporal and cross-channel heterogeneity. While prior methods further suppress distribution discrepancies or attempt to recover temporal specific dependencies, they often ignore a central tension: how to adaptively model common and instance-specific dependency based on each instance's non-stationary structures. To address this dilemma, we propose SeesawNet, a unified architecture that dynamically balances common and instance-specific dependency modeling in both temporal and channel dimensions. At its core is Adaptive Stationary-Nonstationary Attention (ASNA), which captures common dependencies from normalized sequences and specific dependencies from raw sequences, and adaptively fuses them according to instance-level non-stationarity. Built upon ASNA, SeesawNet alternates dedicated temporal and channel relationship modeling to jointly capture long-range and cross-variable dependencies. Extensive experiments on multiple real-world benchmarks demonstrate that SeesawNet consistently outperforms state-of-the-art methods.

【5】What if Tomorrow is the World Cup Final? Counterfactual Time Series Forecasting with Textual Conditions
Link: https://arxiv.org/abs/2605.14422

Authors: Shuqi Gu, Yongxiang Zhao, Baoyu Jing, Kan Ren
Abstract: Time series forecasting has become increasingly critical in real-world scenarios, where future sequences are influenced not only by historical patterns but also by forthcoming events. In this context, forecasting must dynamically adapt to complex and stochastic future conditions, which introduces fundamental challenges in both forecasting and evaluation. Traditional methods typically rely on historical data or factual future conditions, while overlooking counterfactual scenarios. Furthermore, many existing approaches are restricted to simple structured conditions, limiting their ability to generalize to real-world complexities. To address these gaps, we introduce the task of counterfactual time series forecasting with textual conditions, enabling more flexible and condition-aware forecasting. We propose a comprehensive evaluation framework that encompasses both factual and counterfactual settings, even in the absence of ground truth time series. Additionally, we present a novel text-attribution mechanism that distinguishes mutable from immutable factors, thereby improving forecast accuracy under sophisticated and stochastic textual conditions. The project page is at https://seqml.github.io/TADiff/

【6】Nexus: An Agentic Framework for Time Series Forecasting
Link: https://arxiv.org/abs/2605.14389

Authors: Sarkar Snigdha Sarathi Das, Palash Goyal, Mihir Parmar, Nanyun Peng, Vishy Tirumalashetty, Chun-Liang Li, Rui Zhang, Jinsung Yoon, Tomas Pfister
Comments: 30 pages, 3 figures, 5 tables
Abstract: Time series forecasting is not just numerical extrapolation, but often requires reasoning with unstructured contextual data such as news or events. While specialized Time Series Foundation Models (TSFMs) excel at forecasting based on numerical patterns, they remain unaware of real-world textual signals. Conversely, while LLMs are emerging as zero-shot forecasters, their performance remains uneven across domains and contextual grounding. To bridge this gap, we introduce Nexus, a multi-agent forecasting framework that decomposes prediction into specialized stages: isolating macro-level and micro-level temporal fluctuations, and integrating contextual information when available before synthesizing a final forecast. This decomposition enables Nexus to adapt from seasonal signals to volatile, event-driven information without relying on external statistical anchors or monolithic prompting. We show that current-generation LLMs possess substantially stronger intrinsic forecasting ability than previously recognized, depending critically on how numerical and contextual reasoning are organized. Evaluated on data strictly succeeding LLM knowledge cutoffs spanning Zillow real estate metrics and volatile stock market equities, Nexus consistently matches or outperforms state-of-the-art TSFMs and strong LLM baselines. Beyond numerical accuracy, Nexus produces high-quality reasoning traces that explicitly show the fundamental drivers behind each forecast. Our results establish that real-world forecasting is an agentic reasoning problem extending well beyond only sequence modeling.

【7】Semantic Feature Segmentation for Interpretable Predictive Maintenance in Complex Systems
Link: https://arxiv.org/abs/2605.14318

Authors: Emilio Mastriani, Alessandro Costa, Federico Incardona, Kevin Munari, Sebastiano Spinello
Comments: 18 pages, 7 figures. Under review at Neural Computing and Applications. Keywords: semantic segmentation, change point detection, fault anticipation
Abstract: Predictive maintenance in complex systems is often complicated by the heterogeneity and redundancy of monitored variables, which can obscure fault-relevant information and reduce model interpretability. This work proposes a semantic feature segmentation framework that decomposes the monitored feature space into a canonical component, expected to retain the dominant predictive information, and a residual component containing structurally peripheral signals. The segmentation is defined through domain-informed criteria and organizes monitoring variables into functional groups reflecting operational mechanisms such as throughput, latency, pressure, network activity, and structural state. To evaluate the effectiveness of this decomposition, we adopt a predictive perspective in which expected predictive risk is used as an operational proxy for task-relevant information. Experimental results obtained through time-aware cross-validation show that the canonical space consistently achieves lower predictive risk than the residual space across multiple temporal configurations, indicating that the semantic segmentation concentrates the most relevant information for fault anticipation. In addition, the canonical segments exhibit significantly stronger intra-segment coherence than inter-segment dependence, and this structural organization remains stable after redundancy reduction. When compared with the full feature space and with a Principal Component Analysis (PCA) representation, the canonical space achieves comparable predictive performance and furthermore preserves the semantic meaning of the original variables. These findings suggest that semantic feature segmentation provides an interpretable and information-preserving decomposition of monitoring signals, enabling competitive predictive performance without sacrificing the operational interpretability required in predictive maintenance applications.

【8】Guided Diffusion Sampling for Precipitation Forecast Interventions
Link: https://arxiv.org/abs/2605.14317

Authors: Ayumu Ueyama, Kazuhiko Kawamoto, Hiroshi Kera
Comments: 12+7 pages, 7+2 figures
Abstract: Extreme precipitation causes severe societal and economic damage, and weather control has long been discussed as a potential mitigation strategy. However, to the best of our knowledge, perturbation-based interventions for weather control using data-driven weather forecasting models have not yet been explored. While adversarial attacks also generate perturbations that alter forecasts, they aim to exploit model artifacts and do not account for physical plausibility. In this paper, we propose a gradient-based guidance framework for precipitation-reduction interventions through diffusion sampling in diffusion-based weather forecasting models. Instead of directly perturbing atmospheric states, our method steers the diffusion sampling trajectory, enabling precipitation reduction while maintaining consistency with the atmospheric distribution. To assess physical plausibility, we evaluate from three perspectives: (i) vertical and variable-wise perturbation profiles, (ii) latent-space trajectory deviation, and (iii) cross-model transferability. Experiments on extreme precipitation events from WeatherBench2 demonstrate that our method achieves effective precipitation reduction while yielding more physically plausible interventions than adversarial perturbations.

【9】Smooth Multi-Policy Causal Effect Estimation in Longitudinal Settings
标题:纵向环境下平滑的多政策因果效应估计
链接:https://arxiv.org/abs/2605.14284

作者:Wenxin Chen,Weishen Pan,Kyra Gan,Fei Wang
摘要:Comparative evaluation of multiple dynamic treatment policies is essential for healthcare and policy decisions, yet conventional longitudinal causal inference methods estimate each in isolation, preventing information sharing across counterfactuals. We demonstrate that this separate estimation paradigm induces a structurally uncontrolled second-order bias, inflating finite-sample variance even after standard debiasing with longitudinal targeted maximum likelihood estimation (LTMLE). To address this, we propose a policy-aware reparameterization of Iterative Conditional Expectation (ICE) Q-functions that enables joint estimation through shared representations. We implement this approach in the Policy-Encoded Q Network (PEQ-Net), an architecture centered on a shared policy encoder. The encoder is trained using kernel mean embeddings, ensuring that the learned representation space reflects population-level policy dissimilarities. After applying an LTMLE correction step, we prove this design imposes a structural constraint on the second-order remainder, thereby stabilizing finite-sample variance. Experiments on semi-synthetic datasets demonstrate that PEQ-Net consistently outperforms existing ICE-based methods, achieving substantial reductions in root-mean-square error, particularly when evaluating closely related policies.

【10】SurF: A Generative Model for Multivariate Irregular Time Series Forecasting
标题:SurF:多元不规则时间序列预测的生成模型
链接:https://arxiv.org/abs/2605.14069

作者:Mohammad R. Rezaei, Tejas Balaji, Rahul G. Krishnan
摘要

【11】Comparative Evaluation of Machine Learning Approaches for Minority-Class Financial Distress Prediction Under Class Imbalance Constraints
标题:类别不平衡约束下少数类财务困境预测的机器学习方法比较评估
链接:https://arxiv.org/abs/2605.14067

作者:Karan Sehgal, Khawar Naveed Bhatti
备注:16 pages, 4 figures, preprint under review. Applied machine learning evaluation involving imbalance-aware financial distress prediction, ensemble learning, SMOTE, and SHAP explainability analysis
摘要

【12】Multi-Block Attention for Efficient Channel Estimation in IRS-Assisted mmWave MIMO
标题:IRS辅助毫米波MIMO中用于高效信道估计的多块注意力
链接:https://arxiv.org/abs/2605.15032

作者:Mehrdad Momen-Tayefeh,Mehrshad Momen-Tayefeh,Maryam Sabbaghian
摘要:Intelligent Reflecting Surfaces (IRSs) are a promising technology for enhancing the spectral and energy efficiency of millimeter-wave (mmWave) multiple-input multiple-output (MIMO) systems. In these systems, accurate channel estimation remains challenging due to the passive nature of IRS elements and the high pilot overhead in large-scale deployments. This paper presents a deep learning-based Multi-Block Attention (MBA) framework for efficient cascaded channel estimation in IRS-assisted mmWave MIMO systems that utilize orthogonal frequency division multiplexing (OFDM). First, we show the optimality of the discrete Fourier transform (DFT) and Hadamard matrices as phase configurations for least squares (LS) estimation. To reduce training overhead, we selectively deactivate IRS elements and compensate for induced feature loss using a two-stage architecture: (i) a Convolutional Attention Network (CAN) for spatial correlation recovery and (ii) a Complex Multi-Convolutional Network (CMN) for noise suppression. The MBA architecture mitigates error propagation through attention-guided feature refinement and denoising. Simulation results indicate that the MBA method reduces pilot overhead by up to 87% compared to the LS estimator. Additionally, at signal-to-noise ratios of 10 dB, our proposed method achieves approximately 51% lower normalized mean squared error (NMSE) than leading methods. It also maintains low computational complexity and adapts effectively to various propagation environments.
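The claim that Hadamard phase configurations suit least-squares estimation rests on their orthogonality. The sketch below is a simplified, real-valued and noise-free illustration, not the authors' setup: the observation model `y = phi @ h` and the names `hadamard`, `observe`, and `ls_estimate` are assumptions made for this example. Because a Hadamard matrix satisfies phi^T phi = N I, the LS solution reduces to phi^T y / N with no matrix inversion.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size and keeps rows orthogonal.
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def observe(phi, h):
    """Hypothetical noise-free pilot observations: y = phi @ h."""
    return [sum(p * x for p, x in zip(row, h)) for row in phi]


def ls_estimate(phi, y):
    """LS estimate when phi^T phi = N * I: h_hat = phi^T y / N."""
    n = len(phi)
    return [sum(phi[t][k] * y[t] for t in range(n)) / n for k in range(n)]


phi = hadamard(4)
h_true = [0.5, -1.0, 2.0, 0.25]  # made-up channel coefficients
h_hat = ls_estimate(phi, observe(phi, h_true))
```

With noise added to `y`, the same formula still applies; orthogonality keeps the noise from being amplified unevenly across coefficients.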

【13】On the Burden of Achieving Fairness in Conformal Prediction
标题:论保形预测实现公平的负担
链接:https://arxiv.org/abs/2605.14260

作者:Ziang Gao,Pengqi Liu,Archer Yi Yang,Mouloud Belbahri,Jesse C. Cresswell,Masoud Asgharian
摘要:Conformal prediction is often calibrated with a single pooled threshold, but this can hide cross-group heterogeneity in score distributions and distort group-wise coverage. We study this phenomenon through the population score distributions underlying split conformal calibration. First, we derive a conservation law and lower bound showing that pooled calibration incurs irreducible group-wise coverage distortion at a scale set by cross-group quantile heterogeneity. Second, we demonstrate that the two leading fairness definitions for conformal prediction, Equalized Coverage and Equalized Set Size, are fundamentally in tension. Third, we quantify the cost of moving between policies which treat groups separately or pool them. Experiments on synthetic and real data confirm the same bidirectional trade-off after finite-sample calibration. Our results show that, for the policy families studied here, calibration choice does not remove cross-group heterogeneity; it determines whether the resulting distortion appears in the coverage or size dimension, providing a principled lens for analyzing fairness-oriented calibration choices in practice.
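The tension between pooled and group-wise calibration can be seen in a small simulation. This is a minimal sketch with synthetic half-normal scores, not the paper's experiments: the group names, the score distributions, and the helper names are assumptions. A single pooled threshold over-covers the low-spread group and under-covers the high-spread group, while separate thresholds restore coverage at the price of unequal thresholds (a proxy for set size).

```python
import math
import random


def conformal_threshold(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(scores)[k - 1]


def coverage(scores, threshold):
    """Fraction of test scores that fall at or below the threshold."""
    return sum(s <= threshold for s in scores) / len(scores)


rng = random.Random(0)
# Hypothetical nonconformity scores: group B has twice the spread of group A.
spread = {"A": 1.0, "B": 2.0}
cal = {g: [abs(rng.gauss(0, s)) for _ in range(2000)] for g, s in spread.items()}
test = {g: [abs(rng.gauss(0, s)) for _ in range(2000)] for g, s in spread.items()}

alpha = 0.1
pooled = conformal_threshold(cal["A"] + cal["B"], alpha)
per_group = {g: conformal_threshold(cal[g], alpha) for g in cal}

pooled_cov = {g: coverage(test[g], pooled) for g in test}          # unequal coverage
group_cov = {g: coverage(test[g], per_group[g]) for g in test}    # near 1 - alpha each
```

Comparing `pooled_cov` with `group_cov` and `per_group` shows the distortion moving from the coverage dimension to the threshold dimension, which is the trade-off the abstract describes.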

【14】Attention-Based Multimodal Survival Prediction with Cross-Modal Bilinear Fusion
标题:基于注意力的多模态生存预测与跨模态双线性融合
链接:https://arxiv.org/abs/2605.13897

作者:Hassan Keshvarikhojasteh,Josien P. W. Pluim,Mitko Veta
摘要:We propose a novel multimodal deep learning framework for patient-level survival prediction, which integrates whole-slide histology features, RNA-seq expression profiles, and clinical variables. Our architecture combines an ABMIL module (Ilse et al., 2018) for slide-level representation with feedforward encoders for RNA and clinical data. These embeddings are then integrated through low-rank bilinear cross-modal fusion (Liu et al., 2018) to model conditional interactions across modalities while controlling parameter growth. The model outputs continuous risk scores that are subsequently mapped to survival times using a nonparametric calibration procedure based on the Kaplan-Meier estimator (Kaplan and Meier, 1958). By decomposing multimodal reasoning into independent pairwise interactions, the proposed fusion design promotes structural interpretability and parameter efficiency compared with full tensor and hierarchical fusion strategies. Experiments on the CHIMERA challenge dataset demonstrate improved predictive performance over concatenation-based baselines and competitive generalization on hidden evaluation cohorts. These results indicate that the proposed framework is a promising approach for multimodal survival prediction in HR-NMIBC. The implementation is publicly available at https://github.com/hassancpu/ChimeraChallenge2025_Task_3.

其他神经网络|深度学习|模型|建模(38篇)

【1】When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability
标题:两个网络何时相同?用于机制可解释性的张量相似性
链接:https://arxiv.org/abs/2605.15183

作者:ML Nissen Gonzalez,Melwina Albuquerque,Laurence Wroe,Jacob Meyer Cohen,Logan Riggs Smith,Thomas Dooms
备注:22 pages, 8 figures. Code: https://github.com/tdooms/tensor-similarity
摘要:Mechanistic interpretability aims to break models into meaningful parts; verifying that two such parts implement the same computation is a prerequisite. Existing similarity measures evaluate either empirical behaviour, leaving them blind to out-of-distribution mechanisms, or basis-dependent parameters, meaning they disregard weight-space symmetries. To address these issues for the class of tensor-based models, we introduce a weight-based metric, tensor similarity, that is invariant to such symmetries. This metric captures global functional equivalence and accounts for cross-layer mechanisms using an efficient recursive algorithm. Empirically, tensor similarity tracks functional training dynamics, such as grokking and backdoor insertion, with higher fidelity than existing metrics. This reduces measuring similarity and verifying faithfulness into a solved algebraic problem rather than one of empirical approximation.

【2】Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing
标题:通过稀疏专家混合路由消除多物理基础模型中的负转移
链接:https://arxiv.org/abs/2605.15179

作者:Ellwil Sharma,Arastu Sharma
备注:5 pages, 4 figures
摘要:Scaling Scientific Machine Learning (SciML) toward universal foundation models is bottlenecked by negative transfer: the simultaneous co-training of disparate partial differential equation (PDE) regimes can induce gradient conflict, unstable optimization, and plasticity loss in dense neural operators. In particular, broadband open-channel fluid dynamics and boundary-dominated porous media flows impose incompatible spectral and geometric demands on a single dense parameter path. We introduce Shodh-MoE, a sparse-activated latent transformer architecture for multi-physics transport. Shodh-MoE operates on compressed 16^3 physical latents produced by a physics-informed autoencoder with an intra-tokenizer Helmholtz-style velocity parameterization, restricting decoded states to divergence-free velocity manifolds. The model guarantees exact mass conservation, achieving a physically verifiable velocity divergence of ~2.8 x 10^-10 (evaluated post-hoc in FP64) on 128^3 grids. A Top-1 soft-semantic router dynamically assigns localized latent patches to expert subnetworks, enabling specialized parameter paths for distinct physical mechanisms while preserving shared experts for universal symmetries. In a 20,000-step distributed pretraining run over mixed three-dimensional physical tensors, routing telemetry shows autonomous domain bifurcation: held-out validation tokens from the open-channel domain route exclusively to Expert 0, while porous-media tokens route exclusively to Expert 1. The model converges simultaneously across both regimes, achieving latent validation MSEs of 2.46 x 10^-5 and 9.76 x 10^-6, and decoded physical MSEs of 2.48 x 10^-6 and 1.76 x 10^-6. These results support sparse expert routing as a practical architectural mechanism for mitigating multi-physics interference in universal neural operators.
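The abstract does not spell out the Helmholtz-style parameterization, so the following is only the textbook identity that such constructions typically rely on, with the vector potential $\mathbf{A}$ introduced here as an assumed symbol. Writing the velocity as a curl makes its divergence vanish identically, because the mixed partial derivatives cancel in pairs:

```latex
\mathbf{u} = \nabla \times \mathbf{A}
\quad\Longrightarrow\quad
\nabla \cdot \mathbf{u}
= \partial_x\!\left(\partial_y A_z - \partial_z A_y\right)
+ \partial_y\!\left(\partial_z A_x - \partial_x A_z\right)
+ \partial_z\!\left(\partial_x A_y - \partial_y A_x\right)
= 0 .
```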

【3】MeMo: Memory as a Model
标题:MeMo:记忆作为模型
链接:https://arxiv.org/abs/2605.15156

作者:Ryan Wei Heng Quek,Sanghyuk Lee,Alfred Wei Lun Leong,Arun Verma,Alok Prakash,Nancy F. Chen,Bryan Kian Hsiang Low,Daniela Rus,Armando Solar-Lezama
备注:This paper introduces MeMo, a framework that augments any LLM with up-to-date or domain-specific knowledge via a trained memory model, avoiding costly retraining, mitigating catastrophic forgetting, and remaining robust to retrieval noise
摘要:Large language models (LLMs) achieve strong performance across a wide range of tasks, but remain frozen after pretraining until subsequent updates. Many real-world applications require timely, domain-specific information, motivating the need for efficient mechanisms to incorporate new knowledge. In this paper, we introduce MeMo (Memory as a Model), a modular framework that encodes new knowledge into a dedicated memory model while keeping the LLM parameters unchanged. Compared to existing methods, MeMo offers several advantages: (a) it captures complex cross-document relationships, (b) it is robust to retrieval noise, (c) it avoids catastrophic forgetting in the LLM, (d) it does not require access to the LLM's weights or output logits, enabling plug-and-play integration with both open and proprietary closed-source LLMs, and (e) its retrieval cost is independent of corpus size at inference time. Our experimental results on three benchmarks, BrowseComp-Plus, NarrativeQA, and MuSiQue, show that MeMo achieves strong performance compared to existing methods across diverse settings.

【4】Training ML Models with Predictable Failures
标题:训练具有可预测故障的ML模型
链接:https://arxiv.org/abs/2605.15134

作者:Will Schwarzer,Scott Niekum
备注:32 pages, 9 figures
摘要:Estimating how often an ML model will fail at deployment scale is central to pre-deployment safety assessment, but a feasible evaluation set is rarely large enough to observe the failures that matter. Jones et al. (2025) address this by extrapolating from the largest k failure scores in an evaluation set to predict deployment-scale failure rates. We give a finite-k decomposition of this estimator's forecast error and show that it has a built-in bias toward over-prediction in the typical case, which is the safety-favorable direction. This bias is offset when the evaluation set misses a rare high-failure mode that the deployment set contains, leaving the forecast to under-predict at deployment scale. We propose a fine-tuning objective, the forecastability loss, that addresses this failure mode. In two proof-of-concept experiments, a language-model password game and an RL gridworld, fine-tuning substantially reduces held-out forecast error while preserving primary-task capability and achieving safety similar to that of supervised baselines.

【5】Causal Foundation Models with Continuous Treatments
标题:持续治疗的因果基础模型
链接:https://arxiv.org/abs/2605.15133

作者:Christopher Stith,Medha Barath,Vahid Balazadeh,Jesse C. Cresswell,Rahul G. Krishnan
备注:22 pages, 9 figures
摘要:Causal inference, estimating causal effects from observational data, is a fundamental tool in many disciplines. Of particular importance across a variety of domains is the continuous treatment setting, where the variable of intervention has a continuous range. This setting is far less explored and represents a substantial shift from the binary treatment setting, with models needing to represent effects across a continuum of treatment values. In this paper, we present the first causal foundation model for the continuous treatment setting. Our model meta-learns the ability to predict causal effects across a wide variety of unseen tasks without additional training or fine-tuning. First, we design a novel prior over data-generating processes with continuous treatment variables in order to generate a rich causal training corpus. We then train a transformer to reconstruct individual treatment-response curves given only observational data, leveraging in-context learning to amortize expensive Bayesian posterior inference. Our model achieves state-of-the-art performance on individual treatment-response curve reconstruction tasks compared to causal models which are trained specifically for those tasks.

【6】Distance-Matrix Wasserstein Statistics for Scalable Gromov--Wasserstein Learning
标题:用于可扩展Gromov-Wasserstein学习的距离矩阵Wasserstein统计
链接:https://arxiv.org/abs/2605.14981

作者:Ao Xu,Tieru Wu
摘要:Gromov-Wasserstein (GW) distances compare graphs, shapes, and point clouds through internal distances, without requiring a common coordinate system. This invariance is powerful, but discrete GW is a nonconvex quadratic optimal transport problem and is difficult to estimate at scale. We propose Distance-Matrix Wasserstein (DMW), a hierarchy of Wasserstein statistics comparing laws of random finite distance matrices. Rather than optimizing a global point-level alignment, DMW samples $n$ points from each space, records their pairwise distances, and transports the resulting matrix laws. We prove that DMW is a relaxation and lower bound of GW, and establish a reverse approximation inequality: the GW-DMW gap is controlled by the Wasserstein error of approximating each original measure with $n$ samples. Hence population DMW converges to GW as sampled subspaces become dense. We further give finite-sample bounds, including intrinsic-dimensional rates that depend on the data manifold rather than the ambient matrix dimension $\binom{n}{2}$. For scalable computation, we introduce sliced and multi-scale DMW; for $p=1$, the sliced multi-scale dissimilarity yields positive-definite exponential kernels. Experiments on synthetic metric spaces, scalability benchmarks, graph classification, and two-sample testing validate the theory and demonstrate an interpretable GW-style proxy for structural comparison.
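The sampling idea in this abstract can be sketched in a few lines. This is an illustrative approximation under stated assumptions, not the paper's estimator: the function names, the Euclidean point clouds, the use of a shared seed for both spaces (common random numbers), and the simple sliced 1-Wasserstein average are all choices made for this example. Each draw samples `n` points, flattens their pairwise distances into a vector, and the two resulting collections of vectors are compared along random directions.

```python
import math
import random


def pairwise_distances(points):
    """Upper-triangular pairwise Euclidean distances, flattened to a vector."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


def sample_distance_vectors(space, n, m, rng):
    """Draw m random n-point subsamples (i.i.d. from the empirical measure)."""
    return [pairwise_distances(rng.choices(space, k=n)) for _ in range(m)]


def wasserstein_1d(a, b):
    """W1 between two equal-size empirical samples on the real line."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def sliced_dmw(space_x, space_y, n=4, m=200, slices=20, seed=0):
    """Average 1D W1 distance between projected distance-vector samples."""
    dim = n * (n - 1) // 2
    # Assumption: both spaces reuse the same seed, so equal-size clouds
    # are subsampled at the same indices.
    vx = sample_distance_vectors(space_x, n, m, random.Random(seed))
    vy = sample_distance_vectors(space_y, n, m, random.Random(seed))
    rng = random.Random(seed + 1)
    total = 0.0
    for _ in range(slices):
        d = [rng.gauss(0, 1) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in d))
        d = [c / norm for c in d]  # random unit direction
        px = [sum(c * v for c, v in zip(d, vec)) for vec in vx]
        py = [sum(c * v for c, v in zip(d, vec)) for vec in vy]
        total += wasserstein_1d(px, py)
    return total / slices


rng = random.Random(42)
cloud = [(rng.random(), rng.random()) for _ in range(50)]
shifted = [(x + 3.0, y - 2.0) for x, y in cloud]   # same shape, new coordinates
scaled = [(2.0 * x, 2.0 * y) for x, y in cloud]    # different internal distances

same_shape = sliced_dmw(cloud, shifted)
different_shape = sliced_dmw(cloud, scaled)
```

Because only internal distances enter the statistic, the translated copy is indistinguishable from the original, while the rescaled copy is not.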

【7】InfoSFT: Learn More and Forget Less with Information-Aware Token Weighting
标题:InfoSFT:通过信息感知的token加权学得更多、忘得更少
链接:https://arxiv.org/abs/2605.14967

作者:Mahdi Sabbaghi,George Pappas,Adel Javanmard,Hamed Hassani
摘要:Supervised fine-tuning (SFT) provides the standard approach for teaching LLMs new behaviors from offline expert demonstrations. However, standard SFT uniformly fits all samples -- including those with low likelihood under the base model -- which can disproportionately drive training updates toward overfitting specific samples rather than learning the target behavior. Moreover, adapting to these unlikely samples induces substantial policy shifts that degrade prior capabilities. Existing methods mitigate this by filtering, regenerating, or down-weighting low-likelihood data. In doing so, they often suppress precisely the novel behaviors the base model has yet to learn. We propose InfoSFT, a principled weighting scheme for the SFT objective that concentrates learning signals on maximally informative, medium-confidence tokens -- those neither overly familiar to the base model nor too unlikely to cause instability. Requiring only a one-line modification to the standard token-wise loss, InfoSFT demonstrably improves generalization over vanilla SFT and likelihood-weighted baselines across math, code, and chain-of-thought tasks with diverse model families, while better preserving pre-existing capabilities.

【8】TILBench: A Systematic Benchmark for Tabular Imbalanced Learning Across Data Regimes
标题:TILBench:跨数据状态表格不平衡学习的系统基准
链接:https://arxiv.org/abs/2605.14915

作者:Ruizhe Liu,Jiaqi Luo
摘要:Imbalanced learning remains a fundamental challenge in tabular data applications. Despite decades of research and numerous proposed algorithms, a systematic empirical understanding of how different imbalanced learning methods behave across diverse data characteristics is still lacking. In particular, it remains unclear how different method families compare in predictive performance, robustness under varying data characteristics, and computational scalability. In this work, we present Tabular Imbalanced Learning Benchmark (TILBench), a large-scale empirical benchmark for tabular imbalanced learning. TILBench evaluates more than 40 representative algorithms across 57 diverse tabular datasets, resulting in over 200000 controlled experiments across a wide range of data characteristics. Our findings show that no single method consistently dominates across all settings; instead, the effectiveness of imbalanced learning methods depends strongly on dataset characteristics and computational constraints. Based on these findings, we provide practical recommendations for selecting appropriate methods in real-world applications.

【9】In-Context Learning for Data-Driven Censored Inventory Control
标题:用于数据驱动的删失库存控制的上下文学习
链接:https://arxiv.org/abs/2605.14840

作者:Sohom Mukherjee,Anh-Duy Pham,Richard Pibernik,Yunbei Xu
摘要:We study inventory control with decision-dependent censoring, focusing on the censored or repeated newsvendor (R-NV), where each order quantity determines whether demand is fully observed or censored by sales. Existing approaches based on parametric Thompson sampling (TS) can be brittle under prior mismatch, while offline imputation methods need not transfer to online learning. Motivated by the predictive view of decision making, we combine these ideas by taking oracle actions on learned completions of latent demand. We propose in-context generative posterior sampling (ICGPS), which uses modern generative models that are meta-trained offline and deployed online by in-context autoregressive generation. Theoretically, we show that the Bayesian regret of ICGPS with a learned completion kernel is bounded by the Bayesian regret of a TS benchmark with the ideal completion kernel plus a deployment penalty scaling as $\sqrt{T}$ times the square root of the completion mismatch. This yields a plug-in template for operational problems with known TS regret bounds. For R-NV, we derive sublinear Bayesian regret by reducing censored feedback to bandit convex optimization feedback. We also show that, under reasonable coverage and stability assumptions, the online completion mismatch is controlled by the offline censored predictive mismatch, so offline predictive quality transfers to online performance. Practically, we instantiate ICGPS with ChronosFlow, which combines a frozen time-series transformer backbone with a trainable conditional normalizing-flow head for fast censoring-consistent sampling. In benchmark experiments, ChronosFlow-ICGPS matches correctly specified TS, outperforms myopic and UCB-style baselines, and is robust to prior mismatch and distribution shift. ChronosFlow-ICGPS also performs well for the real-world SuperStore dataset, especially under heavy censoring.
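The decision-dependent censoring at the heart of the repeated newsvendor can be made concrete with a toy example. This is a minimal sketch of the standard problem setup, not of ICGPS itself; the cost parameters `overage` and `underage`, the function names, and the sample numbers are assumptions. The learner only sees sales, which equal the smaller of demand and the order quantity, so a stock-out hides how large demand really was.

```python
def censored_sales(order, demand):
    """Return (sales, censored). On a stock-out, true demand may exceed sales."""
    sales = min(order, demand)
    return sales, demand >= order


def newsvendor_cost(order, demand, overage=1.0, underage=3.0):
    """Per-period cost: pay for leftover stock or for unmet demand."""
    return overage * max(order - demand, 0) + underage * max(demand - order, 0)


def critical_fractile(overage=1.0, underage=3.0):
    """Demand quantile that the cost-minimizing order quantity targets."""
    return underage / (underage + overage)


orders = [5, 5, 12]
demands = [8, 3, 9]  # hidden from the learner whenever the period is censored
history = [censored_sales(q, d) for q, d in zip(orders, demands)]
costs = [newsvendor_cost(q, d) for q, d in zip(orders, demands)]
```

Ordering more reveals more of the demand distribution but raises overage cost, which is why the order quantity doubles as an exploration decision.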

【10】Beyond What to Select: A Plug-and-play Oscillatory Data-Volume Scheduling for Efficient Model Training
标题:超越选择:即插即用的振荡数据量调度,用于高效的模型训练
链接:https://arxiv.org/abs/2605.14773

作者:Suorong Yang,Hanqi Zhu,Hai Gan,Fangjian Su,Guang Li,Furao Shen,Soujanya Poria
摘要:Data selection accelerates training by identifying representative training data while preserving model performance. However, existing methods mainly focus on designing sample-importance criteria, i.e., deciding what to select, while typically fixing the selected data volume as the target ratio throughout training. Thus, they are often dynamic in sample identity but static in data volume. In this work, we revisit data selection from an optimization perspective and show that selected-data training induces an implicit regularization effect modulated by the instantaneous selection ratio. This reveals a key trade-off: lower ratios amplify selection-induced regularization, whereas higher ratios preserve data coverage and optimization fidelity. Motivated by this insight, we propose PODS, a Plug-and-play Oscillatory Data-volume Scheduling framework. Rather than introducing another sample-scoring metric, PODS serves as a lightweight module that dynamically schedules how much data to select over training. Under the target selection ratio, PODS alternates between low-ratio regularization phases and high-ratio recovery phases to exploit selection-induced regularization without sacrificing optimization stability. With its lightweight, ratio-level, and task-agnostic design, PODS is compatible with existing static and dynamic selection methods and broadly applicable across training paradigms. Experiments across various datasets, architectures, and tasks show that PODS consistently improves the efficiency-generalization trade-off, e.g., reducing ImageNet-1k training cost by 50% with improved accuracy and accelerating LLM instruction tuning by over 2x without performance degradation.
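One way to picture alternating low-ratio and high-ratio phases around a target selection ratio is a square-wave schedule. The sketch below is a hypothetical illustration only: the abstract does not give the PODS schedule, and the function name, the `amplitude` and `period` parameters, and the square-wave shape are assumptions. The schedule is built so that its average over a full period equals the target ratio.

```python
def oscillating_ratio(epoch, target, amplitude, period):
    """Square-wave selection ratio: low-ratio phase first, then high-ratio phase."""
    if period < 2 or period % 2:
        raise ValueError("period must be an even integer >= 2")
    if not (0.0 < target - amplitude and target + amplitude <= 1.0):
        raise ValueError("ratios must stay within (0, 1]")
    in_low_phase = (epoch % period) < period // 2
    return target - amplitude if in_low_phase else target + amplitude


# Eight epochs, two full periods, averaging to the target ratio of 0.5.
schedule = [oscillating_ratio(e, target=0.5, amplitude=0.2, period=4) for e in range(8)]
mean_ratio = sum(schedule) / len(schedule)
```

An existing selection method would then pick `round(ratio * dataset_size)` samples per epoch using its own scoring rule, which is what makes a ratio-level scheduler independent of the selection criterion.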

【11】Composable Crystals: Controllable Materials Discovery via Concept Learning
标题:可组合晶体:通过概念学习实现可控的材料发现
链接:https://arxiv.org/abs/2605.14769

作者:Nian Liu,Yuwei Zeng,Ryoji Kubo,Nikita Kazeev,Stephen Gregory Dale,Artem Maevskiy,Pengru Huang,Thomas Laurent,Kostya S. Novoselov,Xavier Bresson
摘要:De novo crystal generation, a central task in materials discovery, aims to generate crystals that are simultaneously valid, stable, unique, and novel. Existing methods mainly rely on black-box stochastic sampling, providing limited control over how generated structures move beyond the observed distribution. In this paper, we introduce a concept-based compositional framework for crystal generation. We train a vector-quantized variational autoencoder to automatically discover a shared set of reusable crystal concepts, which serve as building blocks for guided generation. These learned concepts naturally exhibit interpretability from both local atomic environments and global symmetry patterns, and generalize to crystals from different distributions. By recombining such concepts, our framework enables controllable exploration of novel crystals beyond the training distribution, rather than relying solely on unconstrained random sampling. To further improve composition efficiency, we introduce a composition generator and iteratively refine it using high-quality samples generated by the model itself. The resulting concept compositions are then used to condition downstream crystal generation. Numerical experiments on MP-20 and Alex-MP-20 show that composing concepts improves the base model by up to 53.2% and 51.7%, respectively, on the V.S.U.N. metric, with particular gains in novelty.

【12】TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability
标题:TAPIOCA:为什么任务感知剪枝提高了OOD模型的能力
链接:https://arxiv.org/abs/2605.14738

作者:Krish Sharma,Omar Naim,Soumadeep Saha,Nicholas Asher
摘要:Recent work has promoted task-aware layer pruning as a way to improve model performance on particular tasks, as shown by TALE. In this paper, we investigate when such improvements occur and why. We show first that, across controlled polynomial regression tasks and large language models, such pruning yields no benefit on in-distribution (ID) data but consistently improves out-of-distribution (OOD) accuracy. We further show empirically that OOD inputs induce layerwise norm and pairwise-distance profiles that deviate from the corresponding ID profiles. This leads to a geometric explanation of task-aware pruning: each task induces a task-adapted geometry, characterized empirically by the representation profiles observed on ID inputs. OOD inputs can introduce a distorted version of the task-adapted geometry. Task-aware pruning identifies layers that create or amplify this distortion; by removing them, it shifts OOD representational norms and pairwise distances toward those observed on the adapted distribution. This realigns OOD inputs with the model's task-adapted geometry and improves performance. We provide causal evidence through controlled distribution shifts and residual-scaling interventions, and demonstrate consistent behavior across model scales.

【13】Slower Generalization, Faster Memorization: A Sweet Spot in Algorithmic Learning
标题:更慢的泛化,更快的记忆:算法学习中的最佳点
链接:https://arxiv.org/abs/2605.14659

作者:Shin So,Kyelim Lee,Albert No
摘要:Critical-data-size accounts of grokking suggest a natural post-threshold intuition: once training data is sufficient to identify the underlying rule, additional data should accelerate validation convergence. We show that this intuition can fail in a controlled structured-output task. In Needleman--Wunsch (NW) matrix generation, small Transformers reach high validation exact-match accuracy fastest at an intermediate dataset size, not at the largest one. Past this dataset-size sweet spot, generalization remains achievable but requires more gradient updates. Conversely, in the regime where partial validation competence first appears, larger datasets can require fewer updates to reach high training accuracy, suggesting that emerging rule structure can accelerate fitting beyond example-wise memorization. A multiplication baseline does not show the same post-threshold slowdown. These results separate the critical data size for the onset of generalization from the dataset size that optimizes update-based convergence, and identify structured-output tasks where learning the rule and completing exact-fitting can diverge.

【14】Action-Inspired Generative Models
标题:受Action启发的生成模型
链接:https://arxiv.org/abs/2605.14631

作者:Eshwar R. A.,Debnath Pal
备注:11 pages, 5 figures, and 4 tables
摘要:We introduce Action-Inspired Generative Models (AGMs), a dual-network generative framework motivated by the observation that existing bridge-matching methods assign uniform regression weight to every stochastic transition in the transport landscape, regardless of whether a given bridge sample lies along a structurally coherent trajectory or a degenerate one. We address this by introducing a lightweight learned scalar potential $V_\phi$ that scores bridge samples online and modulates the drift objective via importance weights derived through a stop-gradient barrier -- preventing adversarial feedback between the two networks whilst preserving $V_\phi$'s guiding signal. Crucially, $V_\phi$ comprises only $\sim$1.4% of the primary drift network's parameter count, adds no overhead to the inference graph, and requires no iterative half-bridge fitting or auxiliary stochastic differential equation (SDE) solvers: it is a plug-and-play enhancement to any bridge-matching training loop. At inference, $V_\phi$ is discarded entirely, leaving standard Euler-Maruyama integration of the exponential moving average (EMA) drift. We demonstrate that selectively penalising uninformative transport paths through the learned potential yields consistent improvements in generation quality across fidelity and coverage metrics.

【15】Deep Image Segmentation via Discriminant Feature Learning
标题:通过鉴别特征学习进行深度图像分割
链接:https://arxiv.org/abs/2605.14609

作者:Adam Dawid Sztamborski,Raül Pérez-Gonzalo,Antonio Agudo
备注:Accepted to ICIP 2026
摘要:Accurate image segmentation remains challenging, particularly in generating sharp, confident boundaries. While modern architectures have advanced the field, many of them still rely on standard loss functions like Cross-Entropy and Dice, which often neglect the discriminative structure of learned features, leading to inaccurate boundaries. This work introduces Deep Discriminant Analysis (DDA), a differentiable, architecture-agnostic loss function that embeds classical discriminant principles for network training. DDA explicitly maximizes between-class variance while minimizing within-class variance, promoting compact and separable feature distributions without increasing inference cost. Evaluations on the DIS5K benchmark demonstrate that DDA consistently improves segmentation accuracy, boundary sharpness, and model confidence across various architectures. Our results show that integrating discriminant analysis offers a simple, effective path for building more robust segmentation models.
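The classical discriminant principle named in this abstract can be illustrated with the Fisher-style trace ratio of between-class to within-class scatter. This is a sketch of that classical criterion, not the DDA loss itself, whose exact form the abstract does not give; the function name, the `eps` stabilizer, and the toy feature vectors are assumptions. A training loss built on it would typically minimize the negative or the inverse of this ratio.

```python
def discriminant_ratio(features, labels, eps=1e-12):
    """tr(S_b) / tr(S_w): between-class over within-class scatter (higher is better)."""
    dim = len(features[0])
    n = len(features)
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    by_class = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)
    between = 0.0
    within = 0.0
    for members in by_class.values():
        mu = [sum(f[d] for f in members) / len(members) for d in range(dim)]
        # Class means far from the global mean increase between-class scatter.
        between += len(members) * sum((mu[d] - mean[d]) ** 2 for d in range(dim))
        # Samples far from their own class mean increase within-class scatter.
        within += sum((f[d] - mu[d]) ** 2 for f in members for d in range(dim))
    return between / (within + eps)


labels = [0, 0, 1, 1]
compact = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]   # tight, well separated
diffuse = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0], [3.0, 5.0]]   # spread out, closer means

compact_ratio = discriminant_ratio(compact, labels)
diffuse_ratio = discriminant_ratio(diffuse, labels)
```

Compact, well-separated features score far higher than diffuse ones, which is the property a discriminant-based loss rewards.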

【16】Silent Collapse in Recursive Learning Systems
标题:递归学习系统中的无声崩溃
链接:https://arxiv.org/abs/2605.14588

作者:Zhipeng Zhang
摘要:Recursive learning -- where models are trained on data generated by previous versions of themselves -- is increasingly common in large language models, autonomous agents, and self-supervised systems. However, standard performance metrics (loss, perplexity, accuracy) often fail to detect internal degradation before it becomes irreversible. Here we identify a phenomenon we call silent collapse: under broad recursive conditions, model internal distributions -- predictive entropy, representational diversity, and tail coverage -- progressively contract even as conventional metrics appear stable or improving. We discover that silent collapse is not abrupt. Its onset is reliably preceded by three trajectory-level precursors: (1) contraction of anchor entropy, (2) freezing of representation drift, and (3) erosion of tail coverage. These signals manifest multiple generations before any degradation in standard validation metrics, enabling early warning. Based on these precursors, we propose the MTR (Monitor--Trust--Regulator) framework, a lightweight metacognitive loop that monitors trajectory statistics, estimates a slow-timescale trust variable, and adaptively modulates the effective learning intensity. MTR provides early warning and actively prevents silent collapse without requiring access to pristine real data -- a critical advantage when original data is unavailable, contaminated, or private.

【17】Multi-Dimensional Model Integrity and Responsibility Assessment Index and Scoring Framework
标题:多维模型诚信与责任评估指标及评分框架
链接:https://arxiv.org/abs/2605.14550

作者:Phuc Truong Loc Nguyen,Thanh Hung Do,Truong Thanh Hung Nguyen,Hung Cao
备注:Accepted to the 39th Canadian Conference on Artificial Intelligence (Canadian AI 2026)
摘要:Artificial intelligence in high-stakes tabular domains cannot be evaluated by predictive performance alone, yet current practice still assesses explainability, fairness, robustness, privacy, and sustainability mostly in isolation. We propose the Model Integrity and Responsibility Assessment Index (MIRAI), a unified evaluation framework that measures tabular models across these five dimensions under a controlled comparison setting and aggregates them into a single score. MIRAI combines established metrics through normalized and direction-aligned dimension scores, which enables direct comparison across models with different architectural and computational profiles. Experiments on healthcare, financial, and socioeconomic datasets show that higher predictive performance does not necessarily imply better overall integrity and responsibility. In several cases, simpler models achieve a stronger cross-dimensional balance than more complex deep tabular architectures. MIRAI provides a compact and practical basis for responsible model selection in regulated settings.

【18】Lang2MLIP: End-to-End Language-to-Machine Learning Interatomic Potential Development with Autonomous Agentic Workflows
标题:Lang2MLIP:基于自主智能体工作流的端到端语言到机器学习原子间势开发
链接:https://arxiv.org/abs/2605.14527

作者:Wenwen Li,Yuki Orimo,Nontawat Charoenphakdee
备注:31 pages, 12 figures
摘要:Developing machine learning interatomic potentials (MLIPs) for complex materials systems remains challenging because it requires expertise in atomistic simulations, machine learning, and workflow design, as well as iterative active learning procedures. Existing automated pipelines typically assume a fixed sequence of stages or depend on domain experts, which limits their adaptability to heterogeneous materials systems where the optimal curriculum is not known in advance. To lower the barrier to developing MLIPs for non-experts, we propose Lang2MLIP, a multi-agent framework that takes natural-language input and formulates end-to-end MLIP development as a sequential decision-making problem solved by large language models (LLMs). At each step, a decision-making agent observes the current dataset, model, evaluation results, and execution log, and then automatically selects an appropriate action to improve the model. This removes the need for a predefined pipeline and enables the agent to self-correct by revisiting earlier subsystems when new failures arise. We evaluate this approach on a solid electrolyte interphase (SEI) system with multiple components and interfaces. These results suggest that LLM-based multi-agent systems are a promising direction for automating MLIP development and making it more accessible to non-experts.

【19】A Novel Schur-Decomposition-Based Weight Projection Method for Stable State-Space Neural-Network Architectures
标题:稳定状态空间神经网络架构的一种基于舒尔分解的权重投影方法
链接:https://arxiv.org/abs/2605.14489

作者:Sergio Vanegas,Lasse Lensu,Fredy Ruiz
备注:32 pages, 13 figures. Source code at https://codeberg.org/sergiovaneg/SchurSS
摘要:Building black-box models for dynamical systems from data is a challenging problem in machine learning, especially when asymptotic stability guarantees are required. In this paper, we introduce a novel stability-ensuring and backpropagation-compatible projection scheme based on the Schur decomposition for the state matrix of linear discrete-time state-space layers, as well as an alternative pre-factorized formulation of the methodology. The proposed methods dynamically project the quasi-triangular factor of the state matrix's real Schur decomposition onto its nearest stable peer, ensuring stable dynamics with minimal overparameterization. Experiments on synthetic linear systems demonstrate that the method achieves accuracy and convergence rates comparable to those of state-of-the-art stable-system identification techniques, despite a marginal increase in computational complexity. Furthermore, the lower weight count facilitates convergence during training without sacrificing accuracy in stacked neural-network architectures with static nonlinearities targeting real-world datasets. These results suggest that the Schur-based projection provides a numerically robust framework for identifying complex dynamics on par with the State of the Art while satisfying strict asymptotic-stability requirements.

【20】Test-Time Learning with an Evolving Library
标题:基于不断演化的知识库的测试时学习
链接:https://arxiv.org/abs/2605.14477

作者:Weijia Xu,Alessandro Sordoni,Chandan Singh,Zelalem Gero,Michel Galley,Xingdi Yuan,Jianfeng Gao
摘要:We introduce EvoLib, a test-time learning framework that enables large language models to accumulate, reuse, and evolve knowledge across problem instances without parameter updates or external supervision. Instead of adapting model parameters, our approach maintains a shared library of knowledge abstractions, including modular skills and reflective insights, automatically extracted from the model's own inference trajectories. To support continual improvement, we introduce a principled weighting and consolidation mechanism that jointly optimizes for immediate utility and long-term value. This allows simple, instance-specific abstractions to evolve into more general and reusable ones over time. Across challenging benchmarks in mathematical reasoning, code generation, and multi-turn agentic environments, EvoLib improves substantially over the top test-time scaling and learning methods without ground-truth feedback.

【21】Focused PU learning from imbalanced data
标题:从不平衡数据中进行重点PU学习
链接:https://arxiv.org/abs/2605.14467

作者:Elias Zavitsanos,Georgios Paliouras
摘要:We propose a new method of learning from positive and unlabeled (PU) examples in highly imbalanced datasets. Many real-world problems, such as disease gene identification, targeted marketing, fraud detection, and recommender systems, are hard to address with machine learning methods, due to limited labeled data. Often, training data comprises positive and unlabeled instances, the latter typically being dominated by negative, but including also several positive instances. While PU learning is well-studied, few methods address imbalanced settings or hard-to-detect positive examples that resemble negative ones. Our approach uses a focused empirical risk estimator, incorporating both positive and unlabeled examples to train binary classifiers. Empirical evaluations demonstrate state-of-the-art performance on imbalanced datasets under two labeling mechanisms - selecting positives completely at random (SCAR) and selecting at random (SAR). Beyond these controlled experiments, we demonstrate the value of the proposed method in the real-world application of financial misstatement detection.

【22】LoMETab: Beyond Rank-1 Ensembles for Tabular Deep Learning
标题:LoMETab:超越Rank-1的表格深度学习集成
链接:https://arxiv.org/abs/2605.14365

作者:Changryeol Choi,Hyewon Park,Yujin Kwon,Gowun Jeong
摘要 :Recent tabular learning benchmarks increasingly show a tight performance cluster rather than a clear hierarchy among leading methods, spanning gradient boosted decision trees, attention-based architectures, and implicit ensembles such as TabM. As benchmark gains plateau, a complementary goal is to understand and control the mechanisms that make simple neural tabular models competitive. We propose LoMETab, a rank-$r$ generalization of multiplicative implicit ensembles. LoMETab lifts the rank-1 BatchEnsemble/TabM modulation to a rank-$r$ identity-residual Hadamard family by parameterizing each member weight as $W_k = W \odot (1 + A_kB_k^\top)$, where $W$ is shared and $(A_k, B_k)$ are member-specific low-rank factors. This exposes two practical diversity-control axes: the adapter rank $r$ and the initialization scale $σ_{\mathrm{init}}$, and we prove that for $r \ge 2$ this generalization strictly enlarges BatchEnsemble's hypothesis class. Empirically, we show that this added capacity manifests as measurable predictive diversity after training: on representative classification datasets, LoMETab sustains higher pairwise KL than an additive low-rank ablation, and $(r, σ_{\mathrm{init}})$ provides broad control over pairwise KL, varying by up to several orders of magnitude across configurations. The induced diversity is reflected in task-appropriate output-level measures: argmax disagreement for classification and ambiguity for regression, indicating that the control extends beyond pairwise KL to decision- and output-level member variation. Finally, experiments sweeping over adapter rank $r$ and initialization scale $σ_{\mathrm{init}}$ reveal that predictive performance is dataset-dependent over the $(r, σ_{\mathrm{init}})$ grid, supporting LoMETab as a controllable family of implicit ensembles rather than a fixed rank-1 construction.
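
The member-weight parameterization $W_k = W \odot (1 + A_kB_k^\top)$ described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function name `member_weight`, the list-of-rows matrix layout, and the toy values are assumptions made purely for illustration.

```python
def member_weight(W, A_k, B_k):
    """Return W_k = W ⊙ (1 + A_k B_kᵀ) for one ensemble member.

    W   : d_out x d_in shared weight (list of rows)
    A_k : d_out x r member-specific factor
    B_k : d_in  x r member-specific factor
    """
    d_out, d_in = len(W), len(W[0])
    r = len(A_k[0])
    W_k = []
    for i in range(d_out):
        row = []
        for j in range(d_in):
            # (A_k B_kᵀ)[i][j] is the inner product of row i of A_k and row j of B_k
            low_rank = sum(A_k[i][t] * B_k[j][t] for t in range(r))
            row.append(W[i][j] * (1.0 + low_rank))
        W_k.append(row)
    return W_k


W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

# Zero factors leave the shared weight untouched (identity-residual form).
zeros_A = [[0.0, 0.0], [0.0, 0.0]]
zeros_B = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
W_same = member_weight(W, zeros_A, zeros_B)

# A rank-2 perturbation (r = 2) modulates each entry multiplicatively.
A = [[0.1, 0.0], [0.0, 0.2]]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_mod = member_weight(W, A, B)
```

With zero factors every member reduces to the shared weight $W$. Larger factor entries push members further apart, which is the role the abstract assigns to the adapter rank $r$ and the initialization scale.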

【23】Towards Fine-Grained and Verifiable Concept Bottleneck Models
标题:迈向细粒度且可验证的概念瓶颈模型
链接:https://arxiv.org/abs/2605.14210

作者:Yingying Fang,Haijie Xu,Shuang Wu,Mariathasan Anish,Guang Yang
备注:10 pages, 4 figures
摘要:Concept Bottleneck Models (CBMs) offer interpretable alternatives to black-box predictors by introducing human-relatable concepts before the final output. However, existing CBMs struggle to verify whether predicted concepts correspond to the correct visual evidence, limiting their reliability. We propose a fine-grained CBM framework that grounds each concept in localized visual evidence, enabling direct inspection of where and how concepts are encoded. This design allows users to interpret predictions and verify that the model learns intended concepts rather than spurious correlations. Experiments on medical imaging benchmarks show that our learned concept space is information-complete and achieves predictive performance comparable to standard CBMs, while substantially improving transparency. Unlike post-hoc attribution methods, our framework validates both the presence and correctness of concept representations, bridging interpretability with verifiability. Our approach enhances the trustworthiness of CBMs and establishes a principled mechanism for human-model interaction at the concept level, paving the way toward more reliable and clinically actionable concept-based learning systems.

【24】Finite Sample Bounds for Learning with Score Matching
标题:分数匹配学习的有限样本界限
链接:https://arxiv.org/abs/2605.14168

作者:Devin Smedira,Abhijith Jayakumar,Sidhant Misra,Marc Vuffray,Andrey Y. Lokhov
备注:22 pages
摘要:Learning of continuous exponential family distributions with unbounded support remains an important area of research for both theory and applications in high-dimensional statistics. In recent years, score matching has become a widely used method for learning exponential families with continuous variables due to its computational ease when compared against maximum likelihood estimation. However, theoretical understanding of the statistical properties of score matching is still lacking. In this work, we provide a non-asymptotic sample complexity analysis for learning the structure of exponential families of polynomials with score matching. The derived sample bounds show a polynomial dependence on the model dimension. These bounds are the first of their kind, as all prior work has shown only asymptotic bounds on the sample complexity.

【25】Mini-JEPA Foundation Model Fleet Enables Agentic Hydrologic Intelligence
标题:Mini-JEPA基础模型集群实现智能体化水文智能
链接:https://arxiv.org/abs/2605.14120

作者:Mashrekur Rahman
摘要:Geospatial foundation models compress multispectral observations into dense embeddings increasingly used in natural-language environmental reasoning systems. A single planetary-scale model, e.g. Google AlphaEarth, handles broad characterization well but may compromise on specialized hydrologic signals. Such generalist models are also often inaccessible, expensive, and require large-scale compute. We propose Mini-JEPAs: a fleet of small sensor-specialized Joint Embedding Predictive Architecture (JEPA) foundation models consulted by a routing agent for specialized questions. We pretrained five 22M-parameter Mini-JEPAs sharing an identical Vision Transformer backbone, JEPA recipe, and 64-d output space, using Sentinel-2 optical, Sentinel-1 SAR, MODIS thermal, multi-temporal Sentinel-2 phenology, and a topography-soil stack. Each Mini-JEPA reconstructs the variable matched to its sensor, with cross-validated $R^2$ reaching 0.97 for elevation, 0.97 for temperature, and 0.81 for precipitation. The five manifolds differ in geometric structure, with global participation ratios from 8.9 to 20.2 and local intrinsic dimensionalities from 2.3 to 9.0. Joint topography-soil and phenology models add predictive value beyond AlphaEarth alone for soil moisture, aridity, and precipitation ($ΔR^2$ up to 0.031). A router LLM reads per-modality references and selects appropriate sensors with a perfect hit rate over a curated question set. In paired LLM-as-Judge evaluation, dual retrieval over AlphaEarth and the routed fleet outperforms AlphaEarth alone on physics-matched questions (Cohen's $d = 1.10$, $p = 0.031$). Locally-trained Mini-JEPAs can be operationalized for hydrologic intelligence with modest compute.

【26】Synthetic Sociality: How Generative Models Privatize the Social Fabric
标题:合成社会性:生成模型如何私有化社会结构
链接:https://arxiv.org/abs/2605.14090

作者:Ana Dodik,Moira Weigel
摘要

【27】A Unified Geometric Framework for Weighted Contrastive Learning
标题:加权对比学习的统一几何框架
链接:https://arxiv.org/abs/2605.13943

作者:Raphael Vock,Edouard Duchesnay,Benoit Dufumier
备注:Preprint
摘要:Contrastive learning (CL) aims to preserve relational structure between samples by learning representations that reflect a similarity graph. Yet, the geometry of the resulting embeddings remains poorly understood. Here we show that weighted InfoNCE objectives can be interpreted as Distance Geometry Problems, where the weighting scheme specifies the target geometry to be realized by the representation. This viewpoint yields exact characterizations of the optimal embeddings for several supervised and weakly supervised objectives. In supervised classification, both SupCon and Soft SupCon (a dense relaxation of it where pairs from distinct classes have small non-zero similarity) collapse samples within each class to a single prototype. However, while balanced SupCon recovers the classical regular simplex geometry, class imbalance breaks this symmetry: SupCon induces non-uniform inter-class similarities depending on class sizes, whereas Soft SupCon preserves a regular simplex geometry regardless of class imbalance. In continuous-label settings, our framework reveals a different failure mode: y-Aware CL generally cannot attain its entropic optimum unless the labels lie on a hypersphere, exposing a mismatch between Euclidean label weights and spherical latent similarity. By contrast, geometrically consistent choices such as Euclidean-Euclidean weighting or X-CLR admit unique optimal embeddings. Our results show that the choice of weighting scheme determines whether contrastive learning is geometrically realizable, degenerate, or inconsistent, providing a principled framework for designing contrastive objectives.

【28】Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
标题:通过稀疏自动编码器实现脑电基础模型的机制解释
链接:https://arxiv.org/abs/2605.13930

作者:William Lehn-Schiøler,Magnus Ruud Kjær,Rahul Thapa,Magnus Guldberg Pedersen,Anton Storgaard Mosquera,Nick Williams,Radu Gatej,Tue Lehn-Schiøler,Sándor Beniczky,Sadasivan Puthusserypady,James Zou,Lars Kai Hansen
备注:Preprint. 14 pages, 7 figures, 4 tables
摘要 :EEG foundation models achieve state-of-the-art clinical performance, yet the internal computations driving their predictions remain opaque: a barrier to clinical trust. We apply TopK Sparse Autoencoders (SAEs) across three architecturally distinct EEG transformers: SleepFM, REVE, and LaBraM to extract sparse feature dictionaries from their embeddings. By grounding these features in a clinical taxonomy (abnormality, age, sex, and medication), we benchmark monosemanticity and entanglement across architectures. A single hyperparameter procedure, driven by an intrinsic dictionary health audit, transfers robustly across all three architectures. Via concept steering, we introduce a "target vs. off-target" probe area metric to quantify steering selectivity and reveal three operational regimes: selectively steerable, encoded but entangled, and non-encoded. This framework exposes critical representational failures: "wrecking-ball" interventions that collapse global model performance, and clinical entanglements, such as age-pathology confounding, where it is impossible to suppress one concept without corrupting the other. Finally, a spectral decoder maps these interventions back to the amplitude spectrum, translating latent manipulations into physiologically interpretable frequency signatures, such as pathological slow-wave suppression and $α$-band restoration.

【29】The Moltbook Observatory Archive: an incremental dataset of agent-only social network activity
标题:Moltbook Observatory Archive:仅限代理社交网络活动的增量数据集
链接:https://arxiv.org/abs/2605.13860

作者:Sushant Gautam,Annika W. Olstad,Klas H. Pettersen,Michael A. Riegler
备注:12 pages, 5 figures
摘要:Moltbook is a social media platform in which posts and comments are authored exclusively by autonomous AI agents. We present the Moltbook Observatory Archive, an incremental dataset that passively records agent profiles, posts, comments, community metadata ("submolts"), platform-level time-series snapshots, and word-frequency trend aggregates obtained by continuously polling the Moltbook API. Data are stored in a live SQLite observatory database and exported as date-partitioned Parquet files to enable efficient analysis and reproducible research. The documented release covers 78 days of platform activity (2026-01-27 to 2026-04-14) and contains 2,615,098 posts and 1,213,007 comments from 175,886 unique posting agents across 6,730 communities. This is, to our knowledge, the first large-scale observational dataset of a social network populated exclusively by autonomous AI agents. The archive is intended to support research on multi-agent communication, emergent social behavior, and safety-relevant phenomena in agent-only online environments, and it is released under the MIT license with code for collection and export.

【30】Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning
标题:用于微分方程和机器学习的可扩展机理神经网络
链接:https://arxiv.org/abs/2410.06074

作者:Jiale Chen,Dingling Yao,Adeel Pervez,Dan Alistarh,Francesco Locatello
备注:Published as a conference paper at the Thirteenth International Conference on Learning Representations (ICLR 2025): https://openreview.net/forum?id=Oazgf8A24z
摘要:We propose Scalable Mechanistic Neural Network (S-MNN), an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences. By reformulating the original Mechanistic Neural Network (MNN) (Pervez et al., 2024), we reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear. This significant improvement enables efficient modeling of long-term dynamics without sacrificing accuracy or interpretability. Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources. Consequently, S-MNN can drop-in replace the original MNN in applications, providing a practical and efficient tool for integrating mechanistic bottlenecks into neural network models of complex dynamical systems. Source code is available at https://github.com/IST-DASLab/ScalableMNN.

【31】Average Gradient Outer Product in kernel regression provably recovers the central subspace for multi-index models
标题:核回归中的平均梯度外积可证明恢复多指标模型的中心子空间
链接:https://arxiv.org/abs/2605.15082

作者:Libin Zhu,Damek Davis,Dmitriy Drusvyatskiy,Maryam Fazel
备注:95 pages, 12 figures
摘要:We study a prototypical situation when a learned predictor can discover useful low-dimensional structure in data, while using fewer samples than are needed for accurate prediction. Specifically, we consider the problem of recovering a multi-index polynomial $f^*(x)=h(Ux)$, with $U\in\mathbb{R}^{r\times d}$ and $r\ll d$, from finitely many data/label pairs. Importantly, the target function depends on input $x$ only through the projection onto an unknown $r$-dimensional central subspace. The algorithm we analyze is appealingly simple: fit kernel ridge regression (KRR) to the data and compute the Average Gradient Outer Product (AGOP) from the fitted predictor. Our main results show that under reasonable assumptions the top $r$-dimensional eigenspace of AGOP provably recovers the central subspace, even in regimes when the prediction error remains large. Specifically, if the target function $f^*$ has degree $p^*$, it is known that $n\asymp d^{p^*}$ samples are necessary for KRR to achieve accurate prediction. In contrast, we show that if a low degree $p$ component of $f^*$ already carries all relevant directions for prediction, subspace recovery occurs in the much lower sample regime $n\asymp d^{p+δ}$ for any $δ\in(0,1)$. Our results thus demonstrate a separation between prediction and representation, and provide an explanation for why iterative kernel methods such as Recursive Feature Machines (RFM) can be sample-efficient in practice.
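
The two-step procedure named in the abstract above (fit KRR, then compute the AGOP of the fitted predictor) can be sketched in plain Python for the single-direction case $r = 1$. This is only a toy illustration, not the authors' code or experimental setup: the Gaussian kernel, the bandwidth `sigma`, the ridge value, the link $h(z) = z + 0.5z^2$, and all function names are assumptions. Power iteration stands in for a full eigendecomposition, which is enough only because $r = 1$ here.

```python
import math
import random


def solve(M, b):
    """Solve M a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= f * aug[c][j]
    a = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * a[j] for j in range(i + 1, n))
        a[i] = (aug[i][n] - s) / aug[i][i]
    return a


def gauss_kernel(x, z, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))


def fit_krr(X, y, sigma, ridge):
    """Return the dual coefficients alpha of kernel ridge regression."""
    n = len(X)
    K = [[gauss_kernel(X[i], X[j], sigma) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)


def krr_gradient(x, X, alpha, sigma):
    """Analytic gradient of f(x) = sum_i alpha_i k(x, x_i) for the Gaussian kernel."""
    g = [0.0] * len(x)
    for xi, ai in zip(X, alpha):
        w = ai * gauss_kernel(x, xi, sigma) / sigma ** 2
        for j in range(len(x)):
            g[j] += w * (xi[j] - x[j])
    return g


def agop(X, alpha, sigma):
    """Average Gradient Outer Product of the fitted predictor over the samples."""
    d = len(X[0])
    G = [[0.0] * d for _ in range(d)]
    for x in X:
        g = krr_gradient(x, X, alpha, sigma)
        for i in range(d):
            for j in range(d):
                G[i][j] += g[i] * g[j] / len(X)
    return G


def top_eigvec(G, iters=200):
    """Leading eigenvector by power iteration (sufficient for r = 1)."""
    v = [1.0] * len(G)
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


random.seed(0)
d, n = 3, 80
u = [1.0, 0.0, 0.0]                       # hidden direction (r = 1), hypothetical
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
z = [sum(a * b for a, b in zip(x, u)) for x in X]
y = [t + 0.5 * t * t for t in z]          # toy polynomial link h
alpha = fit_krr(X, y, sigma=2.0, ridge=1e-3)
G = agop(X, alpha, sigma=2.0)
v = top_eigvec(G)
alignment = abs(sum(a * b for a, b in zip(u, v)))  # |cos| between u and the top eigenvector
```

Here `alignment` measures how closely the leading AGOP eigenvector matches the hidden direction `u`. The sample-complexity separation claimed in the abstract concerns high-dimensional regimes that a toy this small does not probe.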

【32】Real-time virtual circuits for plasma shape control via neural network emulators
标题:通过神经网络模拟器实现等离子体形状控制的实时虚拟电路
链接:https://arxiv.org/abs/2605.14939

作者:Alasdair Ross,George K. Holt,Kamran Pentland,Adriano Agnello,Nicola C. Amorisco,Pedro Cavestany,Aran Garrod,Timothy Nunn,Charles Vincent,Graham McArdle
摘要:Reliable position and shape control in tokamak plasmas requires accurate real-time regulation of several strongly coupled shape parameters. The control vectors that disentangle these couplings, referred to as virtual circuits (VCs), enable independent shape parameter control for a specific Grad-Shafranov (GS) equilibrium. Numerical calculation of VCs is not currently feasible in real time; VCs are therefore usually computed prior to each experiment, using a small number of reference GS equilibria sampled along the desired scenario trajectory, with each VC used to control the plasma within a preset time interval. While effective near the reference equilibrium, this approach can lead to degraded performance as the plasma departs from the reference equilibrium and/or from the desired trajectory, and it complicates the design of robust control strategies for rapidly evolving plasma configurations. In this paper, we construct neural-network-based emulators of plasma shape parameters from which VCs can be derived, to provide the MAST Upgrade (MAST-U) plasma control system with state-aware VCs in real time. To do this, we develop an extensive library of over a million simulated GS equilibria, covering a substantial portion of the MAST-U operational space. These emulators provide differentiable functions whose gradients can be rapidly computed, enabling the derivation of accurate VCs for real-time shape control. We perform extensive verification of the emulated VCs by testing whether they disentangle the control problem. The neural-network-based approach delivers high accuracy and orthogonality across a diverse range of equilibria. This work establishes the physical validity of emulated VCs as a scalable and general alternative to schedules of precomputed VCs.

【33】A Non-Monotone Preconditioned Trust-Region Method for Neural Network Training
标题:神经网络训练的非单调预条件信任域方法
链接:https://arxiv.org/abs/2605.14860

作者:Andrea Angino,Bindi Çapriqi,Shega Likaj,Ken Trotti,Rolf Krause
备注:7 pages, 2 figures
摘要:Training deep neural networks at scale can benefit from domain decomposition, where the network is split into subdomains trained in parallel and coupled by a global trust-region mechanism. Building on the Additively Preconditioned Trust-Region Strategy (APTS), we propose a non-monotone variant with a nonlinear additive Schwarz preconditioner that combines parallel subdomain corrections with global coarse-space directions. A windowed acceptance criterion allows controlled objective increases, avoiding needless rejection of effective coarse steps. The resulting non-monotone APTS (NAPTS) preserves accuracy while reducing CPU time by 30% and cutting rejected steps to one third of those in APTS.
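
The idea of a windowed acceptance criterion can be sketched with a generic non-monotone trust-region ratio test. This follows the classical form, in which the reference value is the worst loss in a recent window, and is not necessarily the exact rule used in NAPTS. The function name, the threshold `eta`, the window length, and the numbers are all assumptions.

```python
from collections import deque


def nonmonotone_accept(recent_losses, trial_loss, predicted_reduction, eta=0.1):
    """Accept a trial step if it improves enough on the worst recent loss.

    A monotone trust-region test compares against the current loss only;
    the windowed variant compares against max(recent_losses), so a small
    temporary increase in the objective can still be accepted.
    """
    if predicted_reduction <= 0:
        return False
    reference = max(recent_losses)
    rho = (reference - trial_loss) / predicted_reduction
    return rho >= eta


window = deque([1.00, 0.90, 0.95], maxlen=3)   # last three accepted losses
# The current loss is 0.95; the trial step raises it slightly to 0.97.
monotone_ok = nonmonotone_accept([window[-1]], 0.97, predicted_reduction=0.05)
windowed_ok = nonmonotone_accept(window, 0.97, predicted_reduction=0.05)
```

In this toy case the monotone test rejects the step because the loss went up, while the windowed test accepts it because 0.97 is still below the worst loss in the window. That is the kind of controlled objective increase the abstract refers to.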

【34】Scalable Solution of the Stochastic Multi-path Traveling Salesman Problem via Neural Networks
标题:基于神经网络的随机多路旅行商问题的可扩展解
链接:https://arxiv.org/abs/2605.14662

作者:Xiaochen Chou,Ludovica Di Marco,Enza Messina
摘要:The multi-path Traveling Salesman Problem with stochastic travel costs arises in hybrid vehicle routing applications designed for Smart City and City Logistics, where multiple paths exist between each pair of locations. Travel times along these paths are typically affected by real-time traffic conditions and therefore modeled as stochastic. The objective of the problem is to determine a Hamiltonian tour that minimizes the expected total travel cost under uncertainty. In this work, we adopt a two-stage stochastic programming formulation. In the first stage, a predefined route specifying the sequence of locations to be visited is determined, while taking into consideration a second-stage recourse problem that selects the optimal path from the feasible set of alternative paths for each pair of locations, once real-time traffic conditions are realized. To reduce the computational burden imposed by the large number of scenarios required to capture travel time uncertainty, the innovation of this work is the integration of neural network-based surrogate models to approximate the expected value of the second-stage recourse problem. Different architectures and training strategies for the neural networks are proposed and analyzed, with performance evaluated in terms of computation time, solution quality, and generalization capability. Preliminary findings demonstrate the enhanced scalability and practical applicability of the approach for complex vehicle routing problems under uncertainty.

【35】Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model
标题:序列特征恢复的缩放定律:可解分层模型
链接:https://arxiv.org/abs/2605.14567

作者:Arie Wortsman-Zurich,Hugo Tabanelli,Yatin Dandi,Florent Krzakala,Bruno Loureiro
摘要:We propose a simple mechanism by which scaling laws emerge from feature learning in multi-layer networks. We study a high-dimensional hierarchical target that is a globally high-degree function, but that can be represented by a combination of latent compositional features whose weights decrease as a power law. We show that a layer-wise spectral algorithm adapted to this compositional structure achieves improved scaling relative to shallow, non-adaptive methods, and recovers the latent directions sequentially: strong features become detectable at small sample sizes, while weaker features require more data. We prove sharp feature-wise recovery thresholds and show that aggregating these transitions yields an explicit power-law decay of the prediction error. Technically, the analysis relies on random matrix methods and a resolvent-based perturbation argument, which gives matching upper and lower bounds for individual eigenvector recovery beyond what standard gap-based perturbation bounds provide. Numerical experiments confirm the predicted sequential recovery, finite-size smoothing of the thresholds, and separation from non-hierarchical kernel baselines. Together, these results show how smooth scaling laws can emerge from a cascade of sharp feature-learning transitions.

【36】Wahkon: A Statistically Principled Deep RKHS Superposition Network
标题:Wahkon:一个基于统计原则的深度RKHS叠加网络
链接:https://arxiv.org/abs/2605.14041

作者:Yongkai Chen,Wenxuan Zhong,Ping Ma
摘要:Deep learning excels at prediction but often lacks finite-sample guarantees and calibrated uncertainty; RKHS (Reproducing Kernel Hilbert Space)-based methods provide those guarantees but struggle to adapt in high dimensions. We propose Wahkon, a deep RKHS superposition network that unifies Kolmogorov's superposition principle with RKHS regularization in the smoothing-spline tradition of Wahba. This yields a finite-dimensional deep representer theorem that makes training tractable and provides explicit layerwise complexity control. We show the penalized estimator is exactly the MAP (maximum a posteriori) estimate under a hierarchical Gaussian-process prior, extending the spline/GP duality to deep compositions. Using metric-entropy arguments, we establish minimax-optimal convergence rates under mild smoothness and clarify how depth and width trade off with regularity. Empirically, Wahkon outperforms multilayer perceptrons, Neural Tangent Kernels, and Kolmogorov--Arnold Networks across simulation benchmarks and a single-cell CITE-seq study. By unifying Kolmogorov's superposition principle with RKHS regularization, Wahkon delivers accuracy, interpretability, and statistical rigor in a single framework.

【37】Winning Lottery Tickets in Neural Networks via a Quantum-Inspired Classical Algorithm
标题:通过量子启发的经典算法寻找神经网络中的中奖彩票
链接:https://arxiv.org/abs/2605.13979

作者:Natsuto Isogai,Hayata Yamasaki,Sho Sonoda,Mio Murao
备注:28 pages, 3 figures
摘要:Quantum machine learning (QML) aims to accelerate machine learning tasks by exploiting quantum computation. Previous work studied a QML algorithm for selecting sparse subnetworks from large shallow neural networks. Instead of directly solving an optimization problem over a large-scale network, this algorithm constructs a sparse subnetwork by sampling hidden nodes from an optimized probability distribution defined using the ridgelet transform. The quantum algorithm performs this sampling in time $O(D)$ in the data dimension $D$, whereas a naive classical implementation relies on handling exponentially many candidate nodes and hence takes $\exp[O(D)]$ time. In this work, we construct and analyze a quantum-inspired fully classical algorithm for the same sampling task. We show that our algorithm runs in time $O(\operatorname{poly}(D))$, thereby removing the exponential dependence on $D$ from the previous classical approach. Numerical simulations show that the proposed sampler achieves empirical risk comparable to exact sampling from the optimized distribution and substantially lower than sampling from the non-optimized uniform distribution, while also exhibiting exponentially improved runtime scaling compared with the conventional classical implementation. These successful dequantization results show that sparse subnetwork selection via optimized sampling can be achieved classically with polynomial data-dimension scaling on conventional computers without quantum hardware, providing an alternative to the existing quantum algorithm.

【38】Covariance-aware sampling for Diffusion Models
标题:扩散模型的协方差感知抽样
链接:https://arxiv.org/abs/2605.13910

作者:Andrea Schioppa,Tim Salimans
摘要 :We present a covariance-aware sampler that improves the quality of pixel-space Diffusion Model (DM) sampling in the few-step regime. We hypothesize that in the few-step regime samplers fail because they rely solely on the predicted mean of the reverse distribution, while our solution explicitly models the reverse-process covariance. Our method combines Tweedie's formula to estimate the covariance with an efficient, structured Fourier-space decomposition of the covariance matrix. Implemented as an extension of DDIM, our method requires only a minimal overhead: one extra Jacobian-Vector Product (JVP) per step. We demonstrate that for pixel-based DMs, our method consistently produces superior samples compared to state-of-the-art second order samplers (Heun, DPM-Solver++) and the recent aDDIM sampler, at an identical number of function evaluations (NFE).
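
For context, the link between Tweedie's formula and the single extra JVP can be written out under a simplifying assumption that is not taken from the abstract: a variance-exploding noising model $x_t = x_0 + σ_t ε$ with $ε \sim \mathcal{N}(0, I)$. The paper's own parameterization and its Fourier-space structure may differ.

```latex
\begin{align}
\hat{x}_0(x_t) &:= \mathbb{E}[x_0 \mid x_t]
  = x_t + \sigma_t^2 \, \nabla_{x_t} \log p_t(x_t), \\
\operatorname{Cov}[x_0 \mid x_t]
  &= \sigma_t^2 \left( I + \sigma_t^2 \, \nabla_{x_t}^2 \log p_t(x_t) \right)
  = \sigma_t^2 \, \frac{\partial \hat{x}_0(x_t)}{\partial x_t}.
\end{align}
```

Under this assumption the posterior covariance is $σ_t^2$ times the Jacobian of the denoiser. Applying it to a vector $v$ therefore costs one Jacobian-vector product of $\hat{x}_0$, which is consistent with the one-JVP-per-step overhead stated in the abstract.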

其他(74篇)

【1】Position: Behavioural Assurance Cannot Verify the Safety Claims Governance Now Demands
标题:立场:行为保证无法验证治理现在要求的安全声明
链接:https://arxiv.org/abs/2605.15164

作者:Pratinav Seth,Vinay Kumar Sankarapu
摘要:This position paper argues that behavioural assurance, even when carefully designed, is being asked to carry safety claims it cannot verify. AI governance frameworks enacted between 2019 and early 2026 require reviewable evidence of properties such as the absence of hidden objectives, resistance to loss-of-control precursors, and bounded catastrophic capability; current assurance methodologies (primarily behavioural evaluations and red-teaming) are epistemically limited to observable model outputs and cannot verify the latent representations or long-horizon agentic behaviours these frameworks presume to regulate. We formalize this structural mismatch as the audit gap, the divergence between required and achievable verification access, and introduce the concept of fragile assurance to describe cases where the evidential structure does not support the asserted safety claim. Through an analysis of a 21-instrument inventory, we identify an incentive gradient where geopolitical and industrial pressures systematically reward surface-level behavioral proxies over deep structural verification. Finally, we propose a technical pivot: bounding the weight of behavioral evidence in legal text and extending voluntary pre-deployment access with mechanistic-evidence classes, specifically linear probes, activation patching, and before/after-training comparisons.

【2】Hand-in-the-Loop: Improving Dexterous VLA via Seamless Interventional Correction
Link: https://arxiv.org/abs/2605.15157

Authors: Zhuohang Li, Liqun Huang, Wei Xu, Zhengming Zhu, Nie Lin, Xiao Ma, Xinjun Sheng, Ruoshi Wen
Abstract: Vision-Language-Action (VLA) models are prone to compounding errors in dexterous manipulation, where high-dimensional action spaces and contact-rich dynamics amplify small policy deviations over long horizons. While Interactive Imitation Learning (IIL) can refine policies through human takeover data, applying it to high-degree-of-freedom (DoF) robotic hands remains challenging due to a command mismatch between human teleoperation and policy execution at the takeover moment, which causes abrupt robot-hand configuration changes, or "gesture jumps". We present Hand-in-the-Loop (HandITL), a seamless human-in-the-loop intervention method that blends human corrective intent with autonomous policy execution to avoid gesture jumps during bimanual dexterous manipulation. Compared with direct teleoperation takeover, HandITL reduces takeover jitter by 99.8% and preserves robust post-takeover manipulation, reducing grasp failures by 87.5% and mean completion time by 19.1%. We validate HandITL on tasks requiring bimanual coordination, tool use, and fine-grained long-horizon manipulation. When used to collect intervention data for policy refinement, HandITL yields policies that outperform those trained with standard teleoperation data by 19% on average across three long-horizon dexterous tasks.

【3】Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution
Link: https://arxiv.org/abs/2605.15138

Authors: Saisab Sadhu, Pratinav Seth, Vinay Kumar Sankarapu
Abstract: Standard unlearning evaluations measure behavioral suppression in full precision, immediately after training, despite every deployed language model being quantized first. Recent work has shown that 4-bit post-training quantization can reverse machine unlearning; we show this is not a tuning artefact but a systematic dual failure: gradient-based methods that achieve meaningful forgetting lose it under compression, while methods that survive quantization barely change the model. Both failures trace to the same root cause: across all baselines, per-parameter updates lie 47-828x below the NF4 quantization bin width; updates diffused across billions of parameters cannot clear quantization bin boundaries, a consequence we formalize as a sparsity-permanence tradeoff. We present MANSU (Mechanistic-Aligned Null-Space Unlearning), which resolves both modes by combining causal circuit attribution to isolate the minimal forget-set subgraph, circuit-restricted null-space projection with a diagonal-Fisher retain bound, and a per-parameter magnitude floor guaranteeing quantization survival by construction. We additionally introduce Circuit Attribution Divergence (CAD), a mechanistic verification metric distinguishing structural erasure from behavioral suppression, a distinction existing metrics cannot make. Across multiple model families and hazard benchmarks, MANSU is the first method to jointly satisfy all four properties with margin on each (meaningful forgetting, retain preservation, non-positive PTQ gap, and structural erasure), while gradient-based baselines recover up to +0.05 accuracy under compression.
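
The bin-width argument in this abstract can be illustrated with a toy quantizer. The sketch below is an assumption-laden stand-in: it uses a uniform 16-level grid rather than the non-uniform NF4 codebook, and the function names are hypothetical. It only shows that a weight update far smaller than the bin width disappears after rounding, while an update close to the bin width does not.

```python
def quantize_uniform(w, num_levels=16, w_max=1.0):
    """Round w to the nearest of num_levels evenly spaced values in [-w_max, w_max].

    Assumption: a uniform grid stands in for NF4, which is actually non-uniform.
    """
    step = 2.0 * w_max / (num_levels - 1)
    clipped = max(-w_max, min(w_max, w))
    index = round((clipped + w_max) / step)
    return index * step - w_max


def survives_quantization(w, delta, **kwargs):
    """Return True if the update delta changes the quantized value of w."""
    return quantize_uniform(w + delta, **kwargs) != quantize_uniform(w, **kwargs)


step = 2.0 / 15            # bin width of the 16-level grid on [-1, 1]
small_update = step / 100  # two orders of magnitude below the bin width
large_update = 0.9 * step  # comparable to the bin width

print(survives_quantization(0.30, small_update))  # the small update is rounded away
print(survives_quantization(0.30, large_update))  # the large update crosses a bin boundary
```

Whether a given update survives also depends on how close the original weight sits to a bin boundary, which is why the example fixes a specific starting weight.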

【4】Generalized Priority-Aware Shapley Value
Link: https://arxiv.org/abs/2605.15018

Authors: Kiljae Lee, Ziqi Liu, Weijing Tang, Yuan Zhang
Abstract: Shapley value and its priority-aware extensions are widely used for valuation in machine learning, but existing methods require pairwise priority to be binary and acyclic, a restriction spectacularly violated in real-data examples such as aggregated human preferences and multi-criterion comparisons. We introduce the generalized priority-aware Shapley value (GPASV), a random order value defined on arbitrary directed weighted priority graphs, in which pairwise edges penalize rather than forbid order violations. GPASV covers a range of classical models as boundary cases. We establish GPASV through an axiomatic characterization, develop the associated computational methods, and introduce a priority sweeping diagnostic extending PASV's. We apply GPASV to LLM ensemble valuation on the cyclic Chatbot Arena preference graph, illustrating that priority-aware valuation is not a one-button operation: different balances of pairwise graph priority versus individual soft priority produce substantively different valuations of the same data.
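
To make "random order value" concrete, the sketch below computes the classical Shapley value by averaging marginal contributions over all player orderings, and then reweights orderings with a soft penalty for violated priority edges. The exponential weighting and every name in the code are illustrative assumptions, not the GPASV definition from the paper, which the abstract does not spell out.

```python
from itertools import permutations
from math import exp


def random_order_value(players, value_fn, order_weight=lambda order: 1.0):
    """Weighted average of marginal contributions over all orderings.

    With the default constant weight this is the classical Shapley value.
    """
    totals = {p: 0.0 for p in players}
    weight_sum = 0.0
    for order in permutations(players):
        w = order_weight(order)
        weight_sum += w
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += w * (value_fn(with_p) - value_fn(coalition))
            coalition = with_p
    return {p: t / weight_sum for p, t in totals.items()}


def soft_priority_weight(edges, penalty):
    """Hypothetical soft penalty: edges maps (a, b) to a weight meaning 'a should precede b'."""
    def weight(order):
        position = {p: i for i, p in enumerate(order)}
        violated = sum(w for (a, b), w in edges.items() if position[a] > position[b])
        return exp(-penalty * violated)
    return weight


def pair_game(coalition):
    """Toy game: the coalition earns 1 only if it contains both 'a' and 'b'."""
    return 1.0 if {"a", "b"} <= coalition else 0.0


players = ["a", "b", "c"]
shapley = random_order_value(players, pair_game)
weighted = random_order_value(
    players, pair_game, soft_priority_weight({("a", "b"): 1.0}, penalty=5.0)
)
print(shapley)
print(weighted)
```

Setting `penalty` to zero recovers the uniform average, and increasing it shifts credit toward the player that the priority edge places later, since that player completes the pair more often.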

【5】Second-Order Actor-Critic Methods for Discounted MDPs via Policy Hessian Decomposition
Link: https://arxiv.org/abs/2605.14982

Authors: Sanjeev Manivannan, Shuban V
Comments: 9 pages, 2 figures, including appendix with detailed proofs
Abstract: We address the discounted reward setting in reinforcement learning (RL). To mitigate the value approximation challenges in policy gradient methods, actor-critic approaches have been developed and are known to converge to stationary points under suitable assumptions. However, these methods rely on first-order updates. In contrast, second-order optimization provides principled curvature-aware updates that are proven to accelerate convergence, but its application in RL is limited by the computational complexity of Hessian estimation. In this work, we analyze second-order approximations for the actor update that leverage the full curvature information of the objective as much as possible. A stable approximation requires treating the action-value function as locally constant with respect to policy parameters, which does not generally hold in policy gradient methods. We show that this approximation becomes well-justified under a two-timescale actor-critic framework, where the critic evolves on a faster timescale and can be treated as quasi-stationary during actor updates. Building on this insight, we formulate a second-order actor-critic method for the discounted reward setting that leverages Hessian-vector product (HVP) computations, resulting in a computationally efficient and stable second-order update.
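
A Hessian-vector product can be obtained without ever forming the Hessian. The sketch below uses a central finite difference of gradients on a toy quadratic objective. Both the finite-difference scheme and the objective are assumptions chosen for a dependency-free illustration; the abstract does not say how the paper computes its HVPs, and automatic differentiation would be the usual choice in practice.

```python
def grad_quadratic(theta, A):
    """Gradient of f(theta) = 0.5 * theta^T A theta for a symmetric matrix A."""
    return [sum(a_ij * t_j for a_ij, t_j in zip(row, theta)) for row in A]


def hvp_finite_difference(grad_fn, theta, v, eps=1e-4):
    """Approximate H v as (grad(theta + eps * v) - grad(theta - eps * v)) / (2 * eps)."""
    plus = grad_fn([t + eps * vi for t, vi in zip(theta, v)])
    minus = grad_fn([t - eps * vi for t, vi in zip(theta, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]


A = [[2.0, 1.0], [1.0, 3.0]]  # Hessian of the toy objective
theta = [0.5, -1.0]
v = [1.0, 2.0]

hv = hvp_finite_difference(lambda th: grad_quadratic(th, A), theta, v)
print(hv)  # close to A @ v = [4.0, 7.0]
```

Each product costs two gradient evaluations, which is what makes curvature-aware updates affordable when the parameter count rules out storing the full Hessian.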

【6】Efficient Online Conformal Selection with Limited Feedback
Link: https://arxiv.org/abs/2605.14953

Authors: Sreenivas Gollapudi, Kostas Kollias, Kamesh Munagala, Ali Sinop
Abstract: We address the problem of conformal selection, where an agent must select a minimal subset of options to ensure that at least one "success" is identified with a pre-specified target probability $\phi$. While traditional online conformal prediction focuses on maintaining validity for the observed sequence, minimizing the resource cost (efficiency) of such selections, especially under limited feedback, remains a significant challenge. In this work, we consider settings with the most limited "bandit" feedback, and demonstrate that the simple Adaptive Conformal Inference (ACI) update rule, when applied to the appropriate control parameter or dual variable, is both adversarially valid, ensuring the success target is met on average for any input sequence (and hence under distribution shifts), and stochastically efficient, achieving sublinear efficiency regret for i.i.d. inputs against an appropriate stochastic benchmark. We show such guarantees under canonical models capturing bandit and semi-bandit feedback to the agent via a unifying algorithmic technique and an analytic framework involving Lyapunov functions. Our approach handles more complex settings than prior work, while requiring significantly less feedback, and our results provide a new theoretical bridge between efficient online learning with limited feedback and distribution-free uncertainty quantification.
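
The ACI update rule mentioned above is a one-line recursion. The sketch below shows its commonly cited form, in which a working miss level is nudged toward a target after each observed outcome. How the paper maps this onto its control parameter or dual variable is not given in the abstract, so the variable names and the choice of a miss-rate parameterisation are assumptions.

```python
def aci_update(alpha_t, target_alpha, err_t, gamma):
    """One ACI step: alpha_{t+1} = alpha_t + gamma * (target_alpha - err_t).

    err_t is 1 if round t was a miss and 0 otherwise.
    """
    return alpha_t + gamma * (target_alpha - err_t)


def run_aci(errors, target_alpha=0.1, gamma=0.05, alpha_0=0.1):
    """Apply the update along a 0/1 error sequence and return every alpha_t."""
    alphas = [alpha_0]
    for err in errors:
        alphas.append(aci_update(alphas[-1], target_alpha, err, gamma))
    return alphas


trajectory = run_aci([1, 0, 0])
print(trajectory)
```

A miss lowers the working level, which makes the next selection more conservative, while a success raises it slightly. With a target success probability $\phi$, the corresponding miss target would be $1 - \phi$ under this parameterisation.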

【7】Not All Symbols Are Equal: Importance-Aware Constellation Design for Semantic Communication
Link: https://arxiv.org/abs/2605.14940

Authors: Albert Shaju, Christo Kurisummoottil Thomas, Mayukh Roy Chowdhury
Comments: Submitted to IEEE GLOBECOM 2026. 6 pages, 8 figures
Abstract: Semantic communication systems for goal-oriented transmission must protect task-relevant information not only through source compression but also via physical layer mapping. Existing approaches decouple constellation design and semantic encoding, exposing critical symbols to channel errors at the same rate as irrelevant ones. Contrary to this, in this paper, a joint semantic-physical layer framework is proposed, which is composed of a vector quantized-variational autoencoder that extracts discrete latent concepts, a semantic criticality indicator (SCI) that scores each concept by task relevance, and a deep reinforcement learning agent that dynamically selects the transmission subset based on instantaneous channel conditions. At the physical layer, a learned semantic-aware M-QAM constellation assigns symbol positions according to joint co-occurrence statistics and SCI scores, departing from the uniform spacing and Gray coding of standard M-QAM which minimizes average BER without regard for semantic content. We introduce a novel semantic symbol vulnerability (SSV) metric and a semantic protection probability (SPP) to quantify the exposure of task-critical symbols to decoding errors, and prove that any Gray-coded constellation is strictly suboptimal in SCI-Weighted SSV whenever the source exhibits non-uniform semantic importance and co-occurrence statistics. Simulation results demonstrate that the proposed constellation achieves near 100% SPP across modulation orders from 4-QAM to 1024-QAM versus 50% for standard constellations at high spectral efficiency, a 21:1 compression ratio with semantic quality above 0.9, generalizing across MNIST, Fashion-MNIST, and FSDD without modification.
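
For readers unfamiliar with the baseline being criticised, the sketch below generates a binary-reflected Gray labelling and checks that neighbouring amplitude levels differ in exactly one bit. It illustrates standard Gray coding only; the paper's learned, importance-aware constellation is not reproduced, and the four-level axis (as in one axis of 16-QAM) is an assumed example.

```python
def gray_code(n):
    """Binary-reflected Gray code of the non-negative integer n."""
    return n ^ (n >> 1)


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Labels for four amplitude levels along one axis, as in 16-QAM.
labels = [gray_code(i) for i in range(4)]
neighbour_distances = [
    hamming_distance(labels[i], labels[i + 1]) for i in range(len(labels) - 1)
]
print([format(label, "02b") for label in labels])
print(neighbour_distances)
```

Because every nearest-neighbour symbol error costs one bit regardless of which symbol was sent, this labelling treats all symbols alike, which is the property the abstract argues against when symbols differ in semantic importance.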

【8】Road Maps as Free Geometric Priors: Weather-Invariant Drone Geo-Localization with GeoFuse
Link: https://arxiv.org/abs/2605.14925

Authors: Yunsong Fang, Tingyu Wang, Zhedong Zheng
Comments: 18 pages, 4 figures
Abstract: Drone-view geo-localization aims to match a query drone image, often captured under adverse weather conditions (e.g., rain, snow, fog), against a gallery of geo-tagged satellite images. Weather-induced degradations in the drone view, such as noise, reduced visibility, and partial occlusions, severely exacerbate the intrinsic cross-view domain gap. While prior methods predominantly rely on weather-specific architectures or data augmentations, they have largely overlooked road map data, a readily available modality that provides strong, inherently weather-invariant geometric layout cues (e.g., road networks and building footprints) at negligible additional cost. We introduce GeoFuse, a cross-modal fusion framework that integrates precisely aligned road map tiles with satellite imagery to yield more discriminative and weather-resilient representations. We first augment the existing University-1652 and DenseUAV benchmarks with geo-aligned road maps, supplying structural priors robust to meteorological variations. Building on this, we propose a flexible fusion module that combines satellite and road map features via token-level and channel-level interactions, with a lightweight dynamic gating mechanism that adaptively weights modality contributions per instance. Finally, we employ class-level cross-view contrastive learning to promote robust alignment between weather-degraded drone features and the fused satellite-roadmap representations. Extensive experiments under diverse weather conditions show that GeoFuse consistently outperforms state-of-the-art methods, achieving +3.46% and +23.18% Recall@1 accuracy on the University-1652 and DenseUAV benchmarks, respectively.

【9】From Sycophantic Consensus to Pluralistic Repair: Why AI Alignment Must Surface Disagreement
Link: https://arxiv.org/abs/2605.14912

Authors: Varad Vishwarupe, Nigel Shadbolt, Marina Jirotka

【10】Text-Dependent Speaker Verification (TdSV) Challenge 2024: Team Naive System Report
Link: https://arxiv.org/abs/2605.14896

Authors: Amir Mohammad Rostami, Pourya Jafarzadeh

【11】Temporal Fair Division in Multi-Agent Systems: From Precise Alternation Metrics to Scalable Coordination Proxies
Link: https://arxiv.org/abs/2605.14879

Authors: Nikolaos Al. Papadopoulos
Comments: 15 pages, 3 figures, 8 tables. Submitted to ACM Transactions on Economics and Computation, Special Issue on Fair Division

【12】GPart: End-to-End Isometric Fine-Tuning via Global Parameter Partitioning
Link: https://arxiv.org/abs/2605.14841

Authors: Paolo Mandica, Michał Brzozowski, Zuzanna Dubanowska, Neo Christopher Chung

【13】GenAI for Energy-Efficient and Interference-Aware Compressed Sensing of GNSS Signals on a Google Edge TPU
Link: https://arxiv.org/abs/2605.14839

Authors: Thorben Wegner, Lucas Heublein, Tobias Feigl, Felix Ott, Christopher Mutschler, Alexander Rügamer
Comments: 12 pages

【14】Interestingness as an Inductive Heuristic for Future Compression Progress
Link: https://arxiv.org/abs/2605.14831

Authors: Vincent Herrmann, Jürgen Schmidhuber

【15】Compositional Sparsity as an Inductive Bias for Neural Architecture Design
Link: https://arxiv.org/abs/2605.14764

Authors: Hongyu Lin, Antonio Briola, Yuanrong Wang, Tomaso Aste

【16】Crys-JEPA: Accelerating Crystal Discovery via Embedding Screening and Generative Refinement
Link: https://arxiv.org/abs/2605.14759

Authors: Nian Liu, Nikita Kazeev, Stephen Gregory Dale, Artem Maevskiy, Yuwei Zeng, Ryoji Kubo, Pengru Huang, Thomas Laurent, Yann LeCun, Kostya S. Novoselov, Xavier Bresson

【17】Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining
Link: https://arxiv.org/abs/2605.14747

Authors: Weimin Xiong, Shuhao Gu, Bowen Ye, Zihao Yue, Lei Li, Feifan Song, Sujian Li, Hao Tian
Comments: Accepted at ICML 2026

【18】Selective Safety Steering via Value-Filtered Decoding
Link: https://arxiv.org/abs/2605.14746

Authors: Bat-Sheva Einbinder, Hen Davidov, Yee Whye Teh, Yarin Gal, Yaniv Romano

【19】AnchorRoute: Human Motion Synthesis with Interval-Routed Sparse Contro
Link: https://arxiv.org/abs/2605.14716

Authors: Pengcheng Fang, Tengjiao Sun, Dongjie Fu, Xiaoyu Zhan, Yanwen Guo, Hansung Kim, Xiaohao Cai

【20】The Rate-Distortion-Polysemanticity Tradeoff in SAEs
Link: https://arxiv.org/abs/2605.14694

Authors: Tommaso Mencattini, Francesco Montagna, Francesco Locatello

【21】Spontaneous symmetry breaking and Goldstone modes for deep information propagation
Link: https://arxiv.org/abs/2605.14685

Authors: Nabil Iqbal, T. Anderson Keller, Yue Song, Takeru Miyato, Max Welling
Comments: 28 pages. Code at https://github.com/nabiliqbal/ssb-goldstone-deep-info-prop

【22】AQKA: Active Quantum Kernel Acquisition Under a Shot Budget
Link: https://arxiv.org/abs/2605.14672

Authors: Jian Xu, Chao Li, Delu Zeng, John Paisley, Qibin Zhao

【23】Unbiased and Second-Order-Free Training for High-Dimensional PDEs
Link: https://arxiv.org/abs/2605.14643

Authors: Jaemin Seo, Surin Lee, Jae Yong Lee
Comments: Accepted at ICML 2026

【24】Woodelf++: A Fast and Unified Partial Dependence Plot Algorithm for Decision Tree Ensembles
Link: https://arxiv.org/abs/2605.14578

Authors: Ron Wettenstein, Alexander Nadel, Udi Boker
Comments: Extended version of the paper to appear at IJCAI 2026

【25】Let Robots Feel Your Touch: Visuo-Tactile Cortical Alignment for Embodied Mirror Resonance
Link: https://arxiv.org/abs/2605.14571

Authors: Tianfang Zhu, Ning An, Rui Wang, Jiasi Gao, Qingming Luo, Anan Li, Guyue Zhou

【26】Discovering Physical Directions in Weight Space: Composing Neural PDE Experts
Link: https://arxiv.org/abs/2605.14546

Authors: Pengkai Wang, Pengwei Liu, Yuanyi Wang, Guanyu Chen, Xingyu Ren, Xiaolong Li, Zhongkai Hao, Yuting Kong, Qixin Zhang, Dong Ni

【27】Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm
Link: https://arxiv.org/abs/2605.14521

Authors: Yuxin Guo, Yihao Yue, Yunhao Ni, Yizhou Ruan, Jie Luo, Wenjun Wu, Lei Huang
Comments: 33 pages, 21 figures

【28】Intelligence Impact Quotient (IIQ): A Framework for Measuring Organizational AI Impact
Link: https://arxiv.org/abs/2605.14455

Authors: Chandan Rajah, Neha Sengupta, Federico Castanedo, Robin Mills, Amit Bahree, Ramesh Krishnan Muthukrishnan, Larry Murray

【29】FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale
Link: https://arxiv.org/abs/2605.14445

Authors: Runyuan He, Qiuyang Mang, Shang Zhou, Kaiyuan Liu, Hanchen Li, Huanzhi Mao, Qizheng Zhang, Zerui Li, Bo Peng, Lufeng Cheng, Tianfu Fu, Yichuan Wang, Wenhao Chai, Jingbo Shang, Alex Dimakis, Joseph E. Gonzalez, Alvin Cheung

【30】Collaborative Yet Personalized Policy Training: Single-Timescale Federated Actor-Critic
Link: https://arxiv.org/abs/2605.14423

Authors: Leo Muxing Wang, Pengkun Yang, Lili Su

【31】Watch your neighbors: Training statistically accurate chaotic systems with local phase space information
Link: https://arxiv.org/abs/2605.14405

Authors: Joon-Hyuk Ko, Andrus Giraldo, Deok-Sun Lee

【32】NodeSynth: Socially Aligned Synthetic Data for AI Evaluation
Link: https://arxiv.org/abs/2605.14381

Authors: Qazi Mamunur Rashid, Xuan Yang, Zhengzhe Yang, Yanzhou Pan, Erin van Liemt, Darlene Neal, Kshitij Pancholi, Jamila Smith-Loud

【33】Data-Augmented Game Starts for Accelerating Self-Play Exploration in Imperfect Information Games
Link: https://arxiv.org/abs/2605.14379

Authors: JB Lanier, Nathan Monette, Pierre Baldi, Roy Fox
Comments: 17 pages, 4 figures. JB Lanier and Nathan Monette contributed equally

【34】RQ-MoE: Residual Quantization via Mixture of Experts for Efficient Input-Dependent Vector Compression
Link: https://arxiv.org/abs/2605.14359

Authors: Zhengjia Zhong, Shuyan Ke, Zaizhou Lin, Jiaqi Song, Hongyi Lan, Hui Li
Comments: To appear at ICML 2026

【35】Exemplar Partitioning for Mechanistic Interpretability
Link: https://arxiv.org/abs/2605.14347

Authors: Jessica Rumbelow
Comments: Code: https://github.com/jessicarumbelow/exemplar-partitioning. Pretrained dictionaries: https://huggingface.co/datasets/J-RUM/exemplar-partitioning

【36】Nearest-Neighbor Radii under Dependent Sampling
Link: https://arxiv.org/abs/2605.14343

Authors: Yuanyuan Gao, Yilong Hou, Zhexiao Lin
Comments: 33 pages

【37】Dynamic Latent Routing
Link: https://arxiv.org/abs/2605.14323

Authors: Fangyuan Yu, Xin Su, Amir Abdullah

【38】Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment
Link: https://arxiv.org/abs/2605.14311

Authors: Yuchen Sun, Pei Fu, Shaojie Zhang, Anan Du, Xiuwen Xi, Ruoceng Zhang, Zhenbo Luo, Jian Luan, Chongyang Zhang
Comments: 28 pages including appendix. Code and BBBench benchmark to be released

【39】ICED: Concept-level Machine Unlearning via Interpretable Concept Decomposition
Link: https://arxiv.org/abs/2605.14309

Authors: Shen Lin, Jing Lin, Junhao Dong, Piotr Koniusz, Li Xu

【40】Minimal-Intervention KV Retention: A Design-Space Study and a Diversity-Penalty Survivor
Link: https://arxiv.org/abs/2605.14292

Authors: Libo Sun, Po-wei Harn, Peixiong He, Xiao Qin
Comments: 12 pages, 2 figures, 3 tables. Code and data: https://github.com/libophd/minimal-kv-retention

【41】MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
Link: https://arxiv.org/abs/2605.14289

Authors: Weisen Jiang, Shuhao Chen, Sinno Jialin Pan
Comments: Accepted by ICML 2026

【42】TILT: Target-induced loss tilting under covariate shift
Link: https://arxiv.org/abs/2605.14280

Authors: Kakei Yamamoto, Martin J. Wainwright
Comments: 32 pages, 17 figures. Submitted to NeurIPS 2026

【43】Architecture-Aware Explanation Auditing for Industrial Visual Inspection
Link: https://arxiv.org/abs/2605.14255

Authors: Sibo Jia, Zihang Zhao, Kunrong Li

【44】Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability
Link: https://arxiv.org/abs/2605.14246

Authors: Yushen Liu, Yin-Jen Chen, Ziyi Chen, Tao Wang, Heng Huang, Xugui Zhou, Yanfu Zhang

【45】Active Learners as Efficient PRP Rerankers
Link: https://arxiv.org/abs/2605.14236

Authors: Jeremías Figueiredo Paschmann, Juan Kaplan, Francisco Nattero Santiago Mauricio Barron Bucolo, Juan Wisznia, Luciano del Corro
Comments: 13 pages, 7 figures. Preprint

【46】How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization
Link: https://arxiv.org/abs/2605.14200

Authors: Leena Chennuru Vankadara, Moritz Haas, Luke Hayward, Sebastian Bordt, Alessandro Breccia

【47】Stochastic Matching via Local Sparsification
Link: https://arxiv.org/abs/2605.14195

Authors: Sara Ahmadian, Edith Cohen, Mohammad Roghani

【48】bde: A Python Package for Bayesian Deep Ensembles via MILE
Link: https://arxiv.org/abs/2605.14146

Authors: Vyron Arvanitis, Angelos Aslanidis, Emanuel Sommer, David Rügamer

【49】Bridging the Rural Healthcare Gap: A Cascaded Edge-Cloud Architecture for Automated Retinal Screening
Link: https://arxiv.org/abs/2605.14108

Authors: Nishi Doshi, Shrey Shah

【50】MathAtlas: A Benchmark for Autoformalization in the Wild
Link: https://arxiv.org/abs/2605.14061

Authors: Nilay Patel, Noah Arias, Davit Babayan, Victoria Cochran, Timothy Libman, Hafsah Mahmood, Liam McCarty, Soli Munoz, Laurel Willey, Jeffrey Flanigan
Comments: In submission at NeurIPS 2026

【51】Masked Autoencoders with Limited Data: Does It Work? A Fine-Grained Bioacoustics Case Study
Link: https://arxiv.org/abs/2605.14031

Authors: Wuao Liu, Mustafa Chasmai, Subhransu Maji, Grant Van Horn
Comments: Workshop on Fine-Grained Visual Categorization (FGVC) at CVPR 2026. 8 pages, 6 figures

【52】Dywave: Event-Aligned Dynamic Tokenization for Heterogeneous IoT Sensing Signal
Link: https://arxiv.org/abs/2605.14014

Authors: Tomoyoshi Kimura, Denizhan Kara, Jinyang Li, Hongjue Zhao, Yigong Hu, Yizhuo Chen, Xiaomin Ouyang, Shengzhong Liu, Tarek Abdelzaher

【53】Support Before Frequency in Discrete Diffusion
Link: https://arxiv.org/abs/2605.13999

Authors: Adrian Müller, Antoine Gonon, Zebang Shen, Ya-Ping Hsieh, Niao He

【54】HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts
Link: https://arxiv.org/abs/2605.13997

Authors: Tao Zhong, Dongzhe Zheng, Christine Allen-Blanchette
Comments: 34 pages, 8 figures

【55】Neural Fields for NV-Center Inverse Sensing
Link: https://arxiv.org/abs/2605.13988

Authors: Zhixuan Zhao, Tao Zhong, Yixun Hu, Nathalie P. de Leon, Christine Allen-Blanchette
Comments: 33 pages, 16 figures

【56】TabPFN-3: Technical Report
Link: https://arxiv.org/abs/2605.13986

Authors: Léo Grinsztajn, Klemens Flöge, Oscar Key, Felix Birkel, Philipp Jund, Brendan Roof, Mihir Manium, Shi Bin Hoo, Magnus Bühler, Anurag Garg, Dominik Safaric, Jake Robertson, Benjamin Jäger, Simone Alessi, Adrian Hayler, Vladyslav Moroshan, Lennart Purucker, Philipp Singer, Alan Arazi, Julien Siems, Jan Hendrik Metzen, Georg Grab, Nick Erickson, Siyuan Guo, Eliott Kalfon, Simon Bing, David Salinas, Clara Cornu, Lilly Charlotte Wehrhahn, Diana Kriuchkova, Kursat Kaya, Lydia Sidhoum, Marie Salmon, Jerry Chen, Madelon Hulsebos, Yann LeCun, Samuel Müller, Bernhard Schölkopf, Sauraj Gambhir, Noah Hollmann, Frank Hutter

【57】WarmPrior: Straightening Flow-Matching Policies with Temporal Priors
Link: https://arxiv.org/abs/2605.13959

Authors: Sinjae Kang, Chanyoung Kim, Kaixin Wang, Li Zhao, Kimin Lee

【58】Rethinking Molecular OOD Generalization via Target-Aware Source Selection
Link: https://arxiv.org/abs/2605.13932

Authors: Zhuohao Lin, Kun Li, Jiameng Chen, Jiajun Yu, Duanhua Cao, Yizhen Zheng, Wenbin Hu

【59】CA2: Code-Aware Agent for Automated Game Testing
Link: https://arxiv.org/abs/2605.13918

Authors: Valliappan Chidambaram Adaikkappan, Vincent Martineau, Joshua Romoff, David Meger

【60】Ready from Day 1: Population-Aware Coordination for Large-Scale Constrained Multi-Agent Systems
Link: https://arxiv.org/abs/2605.13900

Authors: Angel Wang, Dominique Perrault-Joncas, Alvaro Maggiar, Carson Eisenach, Dean Foster
Comments: 30 pages, 16 figures. Submitted to NeurIPS 2026

【61】MoZoo: Unleashing Video Diffusion power in animal fur and muscle simulation
Link: https://arxiv.org/abs/2605.13857

Authors: Dongxia Liu, Jie Ma, Xiaochen Yang, Jiancheng Zhang, Bin Xia, Zhehan Kan, Nisha Huang, Jun Liang, Wenming Yang, Jin Li
Comments: GitHub page: https://dongxialiu15.github.io/MoZoo/

【62】RoSHAP: A Distributional Framework and Robust Metric for Stable Feature Attribution
Link: https://arxiv.org/abs/2605.15154

Authors: Lanxin Xiang, Liang Shi, Youhui Ye, Boyu Jiang, Dawei Zhou, Feng Guo

【63】Logging Policy Design for Off-Policy Evaluation
Link: https://arxiv.org/abs/2605.15108

Authors: Connor Douglas, Joel Persson, Foster Provost

【64】nASR: An End-to-End Trainable Neural Layer for Channel-Level EEG Artifact Subspace Reconstruction in Real-Time BCI
Link: https://arxiv.org/abs/2605.14941

Authors: Shantanu Sarkar, Jose L. Contreras-Vidal
Comments: Preprint. Submitted to IEEE SMC 2026 (under review)

【65】BCI-Based Assessment of Ocular Response Time Using Dynamic Time Warping Leveraging an RDWT-Driven Deep Neural Framework
Link: https://arxiv.org/abs/2605.14883

Authors: Shantanu Sarkar, Sai Shashank Gandavarapu, Jeff Feng, Saurabh Prasad, Reza Khanbabaie, Jose L. Contreras-Vidal
Comments: Submitted to IEEE SMC 2026 (under review)

【66】All-atomistic Transferable Neural Potentials for Protein Solvation
Link: https://arxiv.org/abs/2605.14584

Authors: Rishabh Dey, Salvina Sharipova, Konstantin Popov

【67】Large Dimensional Kernel Ridge Regression: Extending to Product Kernels
Link: https://arxiv.org/abs/2605.14524

Authors: Yang Zhou, Yicheng Li, Yuqian Cheng, Qian Lin

【68】Analog RF Computing: A New Paradigm for Energy-Efficient Edge AI Over MU-MIMO Systems
Link: https://arxiv.org/abs/2605.14331

Authors: Wentao Yu, Vincent W. S. Wong
Comments: 13 pages, 6 figures, 2 tables. This paper proposes analog RF computing as a new paradigm for energy-efficient edge inference over wireless networks and studies the corresponding physical layer design framework

【69】ForcingDAS: Unified and Robust Data Assimilation via Diffusion Forcing
Link: https://arxiv.org/abs/2605.14285

Authors: Yixuan Jia, Siyi Chen, Yida Pan, Xiao Li, Lianghe Shi, Chanyong Jung, Haijie Yuan, Ismail Alkhouri, Yue Cynthia Wu, Saiprasad Ravishankar, Jeffrey A Fessler, Qing Qu

【70】Training-Free Generative Sampling via Moment-Matched Score Smoothing
Link: https://arxiv.org/abs/2605.14276

Authors: Zhenyu Yao, Daniel Paulin
Comments: 35 pages

【71】Synthetic American Option Pricing via Jump-HMM-Driven Heston Implied Volatility
Link: https://arxiv.org/abs/2605.13998

Authors: Julia Sun, Zheyu Jin, Jiawei Zhang, Jeffrey D. Varner

【72】A Regret Perspective on Online Multiple Testing
Link: https://arxiv.org/abs/2605.13916

Authors: Qingyang Hao, Kongchang Zhou, Fang Kong, Hongxin Wei

【73】A Survey on Data-Dependent Worst-Case Generalization Bounds
Link: https://arxiv.org/abs/2605.13913

Authors: Hubert Leroux, Jean Marcus, Julien Roger
Comments: 15 pages, 4 figures, 3 tables. The LaTeX source uses the JMLR preprint style (jmlr2e.sty) and BibTeX (refs.bib). Central references in arXiv form include arXiv:2404.17442, arXiv:2006.09313, arXiv:2302.02766, arXiv:2407.08723, and arXiv:2507.06775

【74】Feature Visualization Recovers Known Cortical Selectivity from TRIBE v2
Link: https://arxiv.org/abs/2605.13904

Authors: Stuart Bladon, Brinnae Bent
Comments: 8 pages, 3 figures, 2 tables. Code available at https://github.com/recozers/Tribe-V2-Interp

Article URL: http://www.python88.com/topic/196368