Machine Learning Academic Digest [5.21]

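Visit arxivdaily.com for coverage of CS | Physics | Mathematics | Economics | Statistics | Finance | Biology | Electrical Engineering, plus search and bookmarking features.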

cs.LG: 289 papers today

Large models (36 papers)

【1】EvoStruct: Bridging Evolutionary and Structural Priors for Antibody CDR Design via Protein Language Model Adaptation
Link: https://arxiv.org/abs/2605.21485

Authors: Mansoor Ahmed,Sujin Lee,Umar Khayaz,Murray Patterson
Abstract: Equivariant graph neural network (GNN) methods for antibody complementarity-determining region (CDR) design achieve the highest sequence recovery but suffer from severe vocabulary collapse. The current best GNN methods over-predict a few amino acids, such as tyrosine and glycine, while ignoring functionally important residues. We trace this failure to GNN encoders learning amino acid distributions de novo from limited structural data, discarding the substitution patterns encoded in evolutionary databases. To resolve this, we propose EvoStruct, which bridges a frozen protein language model (PLM) with 3D structural context from an E(3)-equivariant GNN via a cross-attention adapter. Unlike prior PLM-structure adapters for general protein design, EvoStruct targets the vocabulary collapse problem specific to CDR design through progressive PLM unfreezing and R-Drop consistency regularization. On the CHIMERA-Bench dataset, EvoStruct achieves the highest amino acid recovery and lowest perplexity among several antibody design methods, improving sequence recovery by 16% and reducing perplexity by 43% relative to the best GNN baselines, while recovering 2.3x greater amino acid diversity and achieving the highest binding-pair correlation with the ground truth.
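R-Drop, named above as EvoStruct's consistency regularizer, is usually described as passing the same input through the model twice with different dropout masks and penalizing the symmetric KL divergence between the two predicted distributions. The sketch below shows that penalty for one residue position, assuming both softmax outputs are already available as probability lists. The function names and the weight `alpha` are hypothetical illustrations, not EvoStruct's code.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length probability lists."""
    # Terms with p_i == 0 contribute nothing; eps guards against log(0) when q_i == 0.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def r_drop_penalty(p1, p2):
    """Symmetric KL between two dropout-perturbed predictions of the same input."""
    return 0.5 * (kl_divergence(p1, p2) + kl_divergence(p2, p1))


def r_drop_loss(nll1, nll2, p1, p2, alpha=1.0):
    """Average task loss of both passes plus the weighted consistency penalty.

    alpha is a hypothetical weight; the abstract does not report one.
    """
    return 0.5 * (nll1 + nll2) + alpha * r_drop_penalty(p1, p2)
```

Identical predictions incur no penalty, so the regularizer only pushes back when dropout noise changes the predicted amino-acid distribution.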

【2】You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories
Link: https://arxiv.org/abs/2605.21468

Authors: Zhepei Wei,Xinyu Zhu,Wei-Lin Chen,Chengsong Huang,Jiaxin Huang,Yu Meng
Comments: Preprint. Code: https://github.com/weizhepei/RELEX
Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying geometry of the resulting parameter trajectories remains underexplored. In this work, we demonstrate that RLVR weight trajectories are extremely low-rank and highly predictable. Specifically, we find that the majority of downstream performance gains are captured by a rank-1 approximation of the parameter deltas, where the magnitude of this projection evolves near-linearly with training steps. Motivated by this, we propose a simple and compute-efficient method, RELEX (REinforcement Learning EXtrapolation), which estimates the rank-1 subspace from a short observation window and extrapolates future checkpoints via linear regression, with no learned model required. Across three models (Qwen2.5-Math-1.5B, Qwen3-4B-Base, and Qwen3-8B-Base), RELEX produces checkpoints that match or exceed RLVR performance on both in-domain and out-of-domain benchmarks, requiring as few as 15% of the steps of full RLVR training. Remarkably, RELEX can extrapolate far beyond the observation window at no training cost, predicting checkpoints up to 10-20$\times$ beyond the observed prefix with continued improvement (e.g., observing only the first 50 steps and extrapolating to 1000 steps). Our ablation analysis confirms the minimalist sufficiency of RELEX: neither increasing the subspace rank nor employing non-linear modeling yields further gains in extrapolation. Finally, we show that RELEX's success stems from a "denoising" effect: by projecting updates onto the rank-1 subspace, RELEX discards stochastic optimization noise that would otherwise degrade performance during extrapolation. Our code is available at https://github.com/weizhepei/RELEX.
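The extrapolation idea described above can be sketched in plain Python under one reading of the abstract. We flatten all parameters into a single vector, stack the deltas from the observation window as rows of a matrix, take their leading singular direction by power iteration, fit a line to each delta's projection onto that direction against the step count, and rebuild a checkpoint at a later step. Per-matrix rank-1 decompositions, or any other detail of the released RELEX code, may differ. All names here are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def leading_direction(deltas, iters=100):
    """Top right singular vector of the T x P matrix whose rows are parameter deltas.

    Power iteration on D^T D; assumes the start vector is not orthogonal to it.
    """
    p = len(deltas[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        u = [dot(d, v) for d in deltas]                                       # u = D v
        w = [sum(u_t * d[j] for u_t, d in zip(u, deltas)) for j in range(p)]  # w = D^T u
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def extrapolate_checkpoint(theta0, checkpoints, steps, target_step):
    """Predict flattened parameters at target_step from an observation window."""
    deltas = [[c - t for c, t in zip(ckpt, theta0)] for ckpt in checkpoints]
    v = leading_direction(deltas)
    coeffs = [dot(d, v) for d in deltas]  # magnitude of each delta along v
    a, b = fit_line(steps, coeffs)        # assumed near-linear in the step count
    scale = a + b * target_step
    return [t + scale * v_j for t, v_j in zip(theta0, v)]
```

Because only the projection onto one direction is extrapolated, any delta component orthogonal to that direction is dropped. This mirrors the "denoising" effect the abstract describes.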

【3】What Twelve LLM Agent Benchmark Papers Disclose About Themselves: A Pilot Audit and an Open Scoring Schema
Link: https://arxiv.org/abs/2605.21404

Authors: Mahdi Naser Moghadasi,Faezeh Ghaderi
Comments: Pilot audit of 12 LLM agent benchmark papers; schema, codebook, and per-paper scoring sheet released. Submitted to IEEE Big Data 2026
Abstract: We read twelve well-known LLM agent benchmark papers and recorded, dimension by dimension, what each paper actually says about how its evaluation was run. The motivation came from a familiar frustration: two papers report results on the same benchmark with the same model name and disagree, and you cannot tell why (the scaffold, the sampling settings, the subset, or the evaluator version). In many cases the published artifact does not let you answer. This paper is an implementation report on that attempt. We designed a small audit schema (five fields: benchmark identity, harness specification, inference settings, cost reporting, failure breakdown), wrote a scoring codebook covering the boundary cases we hit during pilot scoring, applied it to twelve canonical papers (eight agent benchmarks, four classical static benchmarks), and recorded what we saw. We score the disclosure of an agent run, not its correctness, and make no claim that disclosure implies a trustworthy result. The mean audit score is 0.38 (out of 1.0) across the eight agent-benchmark papers and 0.66 across the four classical static benchmarks; the largest gaps are on cost (none of the eight agent benchmark papers disclose inference cost in any form) and on harness specification (none fully disclose a content-addressed container image of the evaluation environment). We release the schema as a JSON Schema file, the codebook as a Markdown document, and the raw scoring sheet as a CSV. The scoring was performed by a single auditor in one pass; a multi-rater audit is the natural next step, and we discuss what we think it would change.

【4】Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents
Link: https://arxiv.org/abs/2605.21347

Authors: Akshay Manglik,Apaar Shanker,Kaustubh Deshpande,Jason Qin,Yash Maurya,Veronica Chatrath,Vijay S. Kalmath,Levi Lentz,Yuan,Xue
Comments: Under review
Abstract: Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patterns that only emerge across trace populations and does not scale to production corpora, where individual traces span tens of thousands of tokens. We formalize the problem of corpus-level trace diagnostics: given a corpus of execution traces, the goal is to produce grounded natural-language insights that characterize systematic behavioral patterns across trace groups, each linked to supporting evidence. We present the Insights Generator (IG), a multi-agent system that answers diagnostic questions by proposing and testing hypotheses across the trace corpus to produce an evidence-backed insights report. We evaluate IG along qualitative and objective dimensions, spanning rubric-based report assessment and downstream performance improvements achieved by implementing IG insights. Human experts using IG reports improve scaffold performance by 30.4 percentage points over the unmodified baseline scaffold, and coding agents leveraging IG-derived insights show consistent and stable gains. Across benchmarks, IG's scout-investigator architecture produces findings comparable in detection coverage to competing approaches, while domain experts rated IG reports highest in depth and evidence quality.

【5】Frontier: Towards Comprehensive and Accurate LLM Inference Simulation
Link: https://arxiv.org/abs/2605.21312

Authors: Yicheng Feng,Xin Tan,Yangtao Deng,Yimin Jiang,Yibo Zhu,Hong Xu
Abstract: Modern LLM serving is no longer homogeneous or monolithic. Production systems now combine disaggregated execution, complex parallelism, runtime optimizations, and stateful workloads such as reasoning, agents, and RL rollouts. Simulation is attractive for exploring this growing design space, yet existing simulators lack the architectural completeness and decision-grade fidelity it demands. Their monolithic-replica abstractions are ill-suited to disaggregated serving, while average-case analytical proxies can distort SLA predictions and even reverse optimization conclusions. We present Frontier, a discrete-event simulator for modern LLM inference serving. Frontier is built on a disaggregated abstraction. It captures the structure and dynamics of modern serving systems by modeling co-location, Prefill-Decode Disaggregation (PDD), and Attention-FFN Disaggregation (AFD) with role-specific cluster workers, incorporating key runtime optimizations (e.g., CUDA Graphs, speculative decoding) within the scheduler-batch-engine loop, and supporting stateful requests for emerging workloads. It further provides accurate and generalizable predictions of computation, communication, and memory costs across diverse serving scenarios with complex workload compositions. On a 16-GPU H800 testbed, Frontier achieves an average throughput error below 4%. Compared with state-of-the-art simulators, it reduces end-to-end latency error from 44.9% to 6.4% under co-location and from 51.7% to 2.6% under disaggregation. It scales to over 1K GPUs on commodity CPUs and enables new use cases such as SLA-dependent Pareto frontier exploration, heterogeneous disaggregated allocation, agentic reasoning scheduling validation, and RL post-training reconfiguration.

【6】TimeSRL: Generalizable Time-Series Behavioral Modeling via Semantic RL-Tuned LLMs -- A Case Study in Mental Health
Link: https://arxiv.org/abs/2605.21295

Authors: Yuang Fan,Lilin Xu,Millie Wu,Jingping Nie,Qingyu Chen,Yuzhe Yang,Zhuo Zhang,Xin Liu,Subigya Nepal,Xiaofan Jiang,Xuhai "Orson" Xu
Abstract: Longitudinal passive sensing enables continuous health prediction, yet models often fail under cross-dataset distribution shifts. Traditional ML overfits cohort-specific artifacts, while Large Language Models (LLMs) struggle to reason reliably over long, heterogeneous time series. We introduce TimeSRL, a two-stage LLM framework that routes predictions through an explicit semantic bottleneck. The model first abstracts raw signals into high-level natural language, then predicts behavioral outcomes from these abstractions alone. This forces the model to reason over semantic concepts, which we argue generalize better than raw numbers. We optimize this process end-to-end using Group Relative Policy Optimization (GRPO) with Reinforcement Learning from Verifiable Rewards (RLVR), learning outcome-aligned abstractions without gold intermediate annotations. Instantiated on mental-health prediction, TimeSRL achieves state-of-the-art performance on a benchmark designed to stress-test cross-cohort generalization under a rigorous leave-one-dataset-out (LOSO) protocol, reducing mean absolute error (MAE) relative to strong non-LLM ML and LLM baselines by 3.1-10.1% and 9.5-44.1%, respectively, for anxiety, and by 3.2-9.6% and 27.4-57.6% for depression (all $p < 0.05$). TimeSRL significantly outperforms prior methods in cross-benchmark transfer across different sensing pipelines, rivaling its own within-domain performance without target-domain fine-tuning. These results demonstrate that semantic abstractions are reusable and point to a new direction for generalizable behavior modeling via RL-tuned LLMs.
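GRPO, which TimeSRL uses for optimization, replaces a learned critic with a group-relative baseline. Each sampled rollout's reward is standardized against the mean and standard deviation of the other rollouts for the same prompt. The sketch below shows only that advantage computation. The negative-absolute-error reward is a hypothetical stand-in for a verifiable reward on a numeric mental-health score; the abstract does not specify TimeSRL's actual reward.

```python
import statistics


def verifiable_reward(predicted, target):
    """Hypothetical reward: negative absolute error against the observed outcome."""
    return -abs(predicted - target)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one group of rollouts."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

For example, three rollouts predicting 5, 7, and 9 for an observed score of 7 get rewards of -2, 0, and -2. The exact prediction receives the only positive advantage, and a group where every rollout scores the same yields all-zero advantages and no learning signal.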

【7】APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents
Link: https://arxiv.org/abs/2605.21240

Authors: Yibo Li,Jiashuo Yang,Zhi Zheng,Zhiyuan Hu,Yuan Sui,Shizun Wang,Yufei He,Bryan Hooi
Abstract: LLM agents have shown strong performance across a wide range of complex tasks, including interactive environments that require long-horizon decision making. Yet these agents cannot learn on the fly at test time. Self-evolving agents address this by accumulating memory and reflection across episodes rather than requiring model-weight updates. However, such agents often suffer from exploration collapse: as memory grows, behavior concentrates around familiar high-reward routines, reducing the chance of discovering better alternatives. To address this problem, we propose Autonomous Policy EXploration (APEX), which builds and maintains an explicit strategy space through a strategy map, a directed acyclic graph of milestones with prerequisite dependency edges. In APEX, Fork Discovery expands the map with evidence-grounded unexplored directions, while Policy Selection balances exploration and exploitation during planning. Evaluated on nine Jericho text-adventure games and WebArena, a realistic web interaction benchmark, APEX outperforms all baselines. Extensive ablations validate each component's contribution and show robustness across diverse settings, confirming APEX's effectiveness for sustained exploration in self-evolving agents.

【8】Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models
Link: https://arxiv.org/abs/2605.21154

Authors: Fernando Ortega,Raúl Lara-Cabrera,Jorge Dueñas-Lerín,Alejandro de la Torre-Luque,Mercé Salvador Robert,Enrique Baca-García
Abstract: Mental health has become a global priority, creating a massive administrative burden in the coding of clinical diagnoses. This study proposes automating psychiatric diagnostic analysis by mapping free-text descriptions to the International Classification of Diseases (ICD) using Natural Language Processing (NLP) and Machine Learning (ML) techniques. Using a specialized dataset of 145,513 Spanish psychiatric descriptions, the study evaluates text representation paradigms ranging from classical frequency-based models (BoW, TF-IDF) to state-of-the-art Large Language Models (LLMs) such as e5_large, BioLORD, and Llama-3-8B. Results indicate that transformer-based embeddings consistently outperform traditional methods by capturing implicit semantic cues and nuanced medical terminology. The e5_large model, with end-to-end fine-tuning, achieved the highest performance, with a micro-averaged F1 score ($F1_{micro}$) of 0.866. This research demonstrates that adapting LLMs to specific clinical nomenclature is essential for overcoming the challenges of "long-tail" label distributions and the inherent ambiguity of psychiatric discourse.
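Micro-averaged F1, the headline metric above, pools true positives, false positives, and false negatives over all descriptions before computing precision and recall. Frequent codes therefore dominate the score, which is relevant to the long-tail concern. The abstract does not say whether each description carries one or several ICD codes. The sketch below assumes a set of codes per description; for single-label classification, micro-F1 reduces to accuracy. The ICD-10 codes in the test are illustrative only.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over lists of code sets, one set per description."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # codes predicted and present in the gold set
        fp += len(p - g)  # codes predicted but not in the gold set
        fn += len(g - p)  # gold codes the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```
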

【9】Multimodal LLMs under Pairwise Modalities
Link: https://arxiv.org/abs/2605.21059

Authors: Yan Li,Yunlong Deng,Yuewen Sun,Gongxu Luo,Kun Zhang,Guangyi Chen
Abstract: Despite the impressive results achieved by multimodal large language models (MLLMs), their training typically relies on jointly curated multimodal data, requiring substantial human effort to construct multi-way aligned datasets and thereby limiting scalability across domains. In this work, we explore training MLLMs using only multiple paired modalities as a surrogate for the full joint multimodal distribution. We first provide a theoretical analysis of the conditions under which the representations are identifiable when only pairwise modalities are observed. Building on this analysis, we propose a representation learning framework for aligning latent representations across modalities using only pairwise data. The framework consists of two stages: latent representation alignment and cross-modal recomposition. In the first stage, we learn a shared latent space across modalities through both self-modal reconstruction and pairwise contrastive learning. We also incorporate an inductive bias into the contrastive learning process through partial alignment and minimal latent specification. In the second stage, we integrate the encoders of newly introduced modalities with the decoders of the pre-trained modalities to facilitate cross-modal transfer and generation. We evaluate our method by adding 3D point cloud and tactile modalities to pre-trained MLLMs with three modality pairs and show that, by learning an aligned latent representation space, our model achieves strong cross-modal performance.

【10】Calibration vs Decision Making: Revisiting the Reliability Paradox in Unlearned Language Models
Link: https://arxiv.org/abs/2605.20915

Authors: Divyaksh Shukla,Ashutosh Modi
Comments: Accepted at SRW, ACL 2026; 17 pages (9 + 2 + 6)
Abstract: Machine unlearning aims to remove the influence of specific training data from a model while preserving reliable behavior on the remaining data, making reliable prediction and uncertainty estimation essential for evaluation. Calibration is commonly used as a proxy for reliability in language models, but low calibration error does not necessarily imply reliable decision rules, as models may rely on spurious correlations while remaining well calibrated. We investigate this gap in generative language models using the multiple-choice question-answering evaluation protocol on the TOFU benchmark, measuring probabilistic reliability with calibration metrics (ECE, MCE, Brier) and decision-rule reliability via attribution-based shortcut detection with Integrated Gradients and Local Mutual Information. We find that fine-tuned models achieve low calibration error (ECE ~ 0.04) compared to pretrained models (ECE > 0.5), and that models after unlearning retain similarly low calibration error despite reduced accuracy on the forget split, while attribution analysis shows increased reliance on correlation-based tokens. These results demonstrate that good calibration can coexist with shortcut-based decision rules after unlearning, extending the reliability paradox to the machine unlearning setting.
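The three calibration metrics named above can be sketched for multiple-choice answers as follows. ECE is the sample-weighted average gap between accuracy and mean confidence across confidence bins, MCE is the largest such gap, and the Brier score is the mean squared error between the option probabilities and the one-hot correct answer. The 10 equal-width bins, the use of top-option confidence, and the multi-class Brier convention are assumptions; the abstract does not state the paper's exact settings.

```python
def calibration_metrics(probs, labels, n_bins=10):
    """Return (ECE, MCE, Brier) for per-question option probabilities and correct indices."""
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    brier = 0.0
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes into the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b].append((conf, pred == y))
        # Multi-class Brier: squared error against the one-hot correct option.
        brier += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    ece, mce = 0.0, 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        acc = sum(1 for _, ok in items if ok) / len(items)
        gap = abs(acc - avg_conf)
        ece += len(items) / n * gap
        mce = max(mce, gap)
    return ece, mce, brier / n
```

As the paper points out, a model can score well on all three metrics while still relying on shortcut tokens. These numbers describe probabilities, not the reasons behind them.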

【11】PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models
Link: https://arxiv.org/abs/2605.20873

Authors: Ziliang Zhao,Zenan Xu,Shuting Wang,Hongjin Qian,Yan Lei,Minda Hu,Zhao Wang,Shihan Dou,Zhicheng Dou,Pluto Zhou
Abstract: Planning is a fundamental capability for large language models (LLMs) because complex tasks require models to coordinate goals, constraints, resources, and long-term consequences into executable and verifiable solutions. Existing planning benchmarks, however, usually treat planning data as fixed collections of instances rather than controllable generation targets. This limits scenario coverage, ties difficulty to surface-level proxies rather than structural sources, and offers limited support for scalable generation, automatic verification, or planning-oriented training. We introduce PlanningBench, a framework for generating scalable, diverse, and verifiable planning data for both evaluation and training. PlanningBench starts from real planning scenarios and abstracts practical workflows into a structured taxonomy of more than 30 task types, subtasks, constraint families, and difficulty factors. Guided by this taxonomy, a constraint-driven synthesis pipeline instantiates self-contained planning problems with adaptive difficulty control, quality filtering, and instance-level verification checklists. This shifts planning data construction from fixed benchmark collection to controllable generation while preserving realistic task grounding. We use PlanningBench to evaluate open-source and closed-source frontier LLMs and find that current models still struggle to produce complete solutions under coupled constraints. Beyond evaluation, reinforcement learning on verified PlanningBench data improves performance on unseen planning benchmarks and broader instruction-following tasks. Further analysis suggests that determinate or well-specified optimal solutions provide clearer reward signals and more stable training dynamics. Overall, PlanningBench provides a controllable source of planning data for diagnosing and improving generalizable planning abilities in LLMs.

【12】PlexRL: Cluster-Level Orchestration of Serviceized LLM Execution for RLVR
Link: https://arxiv.org/abs/2605.20863

Authors: Yiqi Zhang,Fangzheng Jiao,Tian Tang,Boyu Tian,Hangyu Wang,Qiaoling Chen,Guoteng Wang,Zhen Jiang,Peng Sun,Ping Zhang,Xiaohe Hu,Ziming Liu,Menghao Zhang,Yanmin Jia,Yang You,Siyuan Feng
Abstract: Reinforcement learning with verifiable rewards (RLVR) has recently unlocked strong reasoning capabilities in large language models (LLMs), triggering rapid exploration of new algorithms and data. However, RLVR training is notoriously inefficient: long-tailed rollouts, tool-induced stalls, and asymmetric resource requirements between rollout and training introduce substantial idle time that cannot be eliminated by job-local optimizations such as synchronous pipelining, asynchronous rollout, or colocated execution. We argue that this inefficiency is structural. While idle gaps are unavoidable within individual RLVR jobs, they are largely anti-correlated across jobs and therefore exploitable at the cluster level. Leveraging this observation, we present PlexRL, a cluster-level runtime for multiplexing unified LLM services across RLVR jobs. By centrally managing model placement, state transitions, and function-level scheduling under strict affinity constraints, PlexRL time-slices LLM execution across jobs to fill otherwise idle periods without expensive model migration. Our implementation and evaluations demonstrate that PlexRL significantly improves effective cluster capacity and reduces user GPU-hour cost by up to 37.58% while preserving algorithmic flexibility and introducing minimal per-job overhead.

【13】GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval
Link: https://arxiv.org/abs/2605.20815

Authors: Peter Fernandes,Ria Kanjilal
Comments: 9 pages, 1 figure, 5 tables
Abstract: Graph-based Retrieval Augmented Generation (GraphRAG) extends retrieval-augmented generation to support structured reasoning over complex corpora, but its reliability under resource-constrained, privacy-sensitive deployments remains unclear. In healthcare, where Electronic Health Record (EHR) data is complex and strictly regulated, reliance on cloud-based large language models (LLMs) introduces challenges in cost, latency, and compliance. In this work, we present a systematic evaluation of GraphRAG for EHR schema retrieval using locally deployed open-source LLMs. We implement the Microsoft GraphRAG pipeline on real-world EHR schema documentation and benchmark four models, namely Llama 3.1 (8B), Mistral (7B), Qwen 2.5 (7B), and Phi-4-mini (3.8B), each deployed via Ollama on a single consumer GPU (8 GB VRAM). We evaluate indexing efficiency, knowledge graph construction, query latency, answer quality, and hallucination under both global and local retrieval modes. Our results reveal substantial differences: Llama 3.1 produces the richest knowledge graph (1,172 entities), Qwen 2.5 achieves the best answer quality (3.3/5), Phi-4-mini fails to complete the pipeline due to structured-output errors, and Mistral exhibits degenerate repetition behavior. We further show that GraphRAG exhibits a practical capacity threshold: models below approximately 7B parameters fail to reliably produce valid structured outputs and cannot complete the pipeline. In addition, indexing quality and answer quality are decoupled across models, and local retrieval consistently outperforms global summarization in both latency and factual grounding, with reduced hallucination. These findings demonstrate that GraphRAG is feasible on consumer hardware while highlighting the importance of model selection and retrieval design for robust deployment in regulated settings.

【14】The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study
Link: https://arxiv.org/abs/2605.20767

Authors: Victoria Lin,Taedong Yun,Maja Matarić,John Canny,Arthur Gretton,Alexander D'Amour
Abstract: Large language models (LLMs) show potential as simulators of human behavior, offering a scalable way to study responses to interventions. However, because LLMs are trained largely on observational data, interventions in experiments with LLM-simulated synthetic users can induce unintended shifts in latent user attributes. This causes user drift, where the implicit simulated population differs across treatment conditions, potentially distorting effect estimates. We formalize the confounding or selection bias that can arise from user drift and show how intervention-dependent shifts can inflate or attenuate observed differences in user responses under intervention. To diagnose confounding, we propose using negative control outcomes (attributes that should remain invariant under intervention) to identify distribution shifts across intervention conditions, providing evidence of user drift. To mitigate drift, we study adjusting the persona specification by eliciting additional confounders, finding that targeted, setting-relevant confounders can substantially reduce bias across survey-style and multi-turn agent evaluations.
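A negative control outcome check, as described above, compares an attribute that the intervention should not affect (for example, a simulated user's stated age) across the treatment and control arms. A systematic difference then signals user drift rather than a treatment effect. The sketch below uses the standardized mean difference (SMD) with a 0.1 threshold. That threshold is a common balance rule of thumb from observational studies and is an assumption here, not the paper's diagnostic; the paper may use a different test.

```python
import math
import statistics


def standardized_mean_difference(control, treated):
    """SMD of a negative-control attribute between two simulated arms."""
    m0, m1 = statistics.fmean(control), statistics.fmean(treated)
    pooled_sd = math.sqrt((statistics.variance(control) + statistics.variance(treated)) / 2)
    if pooled_sd == 0:
        # Constant attribute in both arms: any mean difference is an unbounded shift.
        return 0.0 if m0 == m1 else math.copysign(math.inf, m1 - m0)
    return (m1 - m0) / pooled_sd


def flags_user_drift(control, treated, threshold=0.1):
    """Flag drift when |SMD| exceeds an assumed rule-of-thumb threshold."""
    return abs(standardized_mean_difference(control, treated)) > threshold
```
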

【15】Correcting Stochastic Update Bias in Preconditioned Language Model Optimizers
Link: https://arxiv.org/abs/2605.20756

Authors: Nikhil Nayak,Julia White,Urchade Zaratiana,Kelton Zhang,Henrijs Princis,Dhruv Atreja,Henry Fawcett,Matthew Thomas,George Hurn-Maloney,Ash Lewis
Comments: 32 pages, 3 figures, 13 tables
Abstract: Preconditioned optimizers are central to language model training, but their stochastic update rules are usually treated as direct approximations to population preconditioned descent. We show that this view misses two finite-sample biases. First, the gradient and preconditioner are typically estimated from the same minibatch, introducing gradient-preconditioner coupling bias. Second, even when the preconditioner estimate is unbiased, its inverse or inverse root is generally biased because inversion is nonlinear. We propose a single-batch bias-correction framework that addresses both effects: cross-fitted preconditioning estimates the numerator and preconditioner from independent microbatch groups, while variance-corrected inversion uses microbatch variability to subtract the leading delta-method bias term. The framework applies to diagonal moment, diagonal curvature, and matrix preconditioning methods, instantiated in AdamW, Sophia, and Shampoo. Bias correction reduces held-out pretraining loss on Qwen2.5-0.5B by $0.15$, $0.07$, and $0.11$ nats, respectively; the effects on mixed-quality pretraining and downstream instruction tuning are consistently neutral-to-positive. Together, these results establish bias correction as a practical mechanism for reducing finite-sample update bias and improving the performance of preconditioned optimizers.
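The two corrections can be illustrated on a toy diagonal, Adam-like step with no momentum and no EMA. This simplification is ours; the paper instantiates the framework inside AdamW, Sophia, and Shampoo. For cross-fitting, microbatch gradients are split into two halves: each half's mean gradient is preconditioned by the other half's second-moment estimate, then the roles are swapped and the results averaged. For variance-corrected inversion, a second-order Taylor expansion of $f(v) = (v+\epsilon)^{-1/2}$ gives $\mathbb{E}[f(\hat v)] \approx f(v) + \tfrac{1}{2} f''(v)\,\mathrm{Var}(\hat v)$ with $f''(v) = \tfrac{3}{4}(v+\epsilon)^{-5/2}$, so the sketch subtracts $\tfrac{3}{8}(\hat v+\epsilon)^{-5/2}\,\widehat{\mathrm{Var}}(\hat v)$, evaluated at the plug-in estimate. The clamp at zero and all function names are hypothetical.

```python
import statistics


def inv_sqrt_corrected(sq_grads, eps=1e-8):
    """Per-coordinate (v_hat + eps)^(-1/2) minus the leading delta-method bias term.

    sq_grads: per-microbatch squared-gradient vectors from one group.
    """
    m = len(sq_grads)
    out = []
    for col in zip(*sq_grads):
        v_hat = statistics.fmean(col)
        var_v_hat = statistics.variance(col) / m if m > 1 else 0.0  # variance of the mean
        base = (v_hat + eps) ** -0.5
        correction = 0.375 * (v_hat + eps) ** -2.5 * var_v_hat      # (3/8) f'' term
        out.append(max(base - correction, 0.0))                    # hypothetical safeguard
    return out


def cross_fitted_direction(micro_grads, eps=1e-8):
    """Diagonal preconditioned direction with cross-fitting (even microbatch count assumed)."""
    half = len(micro_grads) // 2
    group_a, group_b = micro_grads[:half], micro_grads[half:]

    def mean_vec(vecs):
        return [statistics.fmean(col) for col in zip(*vecs)]

    def squares(vecs):
        return [[g * g for g in vec] for vec in vecs]

    g_a, g_b = mean_vec(group_a), mean_vec(group_b)
    p_a = inv_sqrt_corrected(squares(group_a), eps)
    p_b = inv_sqrt_corrected(squares(group_b), eps)
    # Numerator from one group, preconditioner from the other, then swap and average.
    return [0.5 * (ga * pb + gb * pa) for ga, gb, pa, pb in zip(g_a, g_b, p_a, p_b)]
```

When every microbatch gradient is identical, the variance term vanishes and the direction reduces to roughly the sign of the gradient, as in an uncorrected Adam-style step without momentum.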

【16】Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression
Link: https://arxiv.org/abs/2605.20740

Authors: Jungsoo Park,Hyungjoo Chae,Ethan Mendes,Jay DeYoung,Varsha Kishore,Wei Xu,Alan Ritter
Comments: 21 pages, 5 figures
Abstract: Large language models can predict real-valued quantities from heterogeneous inputs such as text, code, and molecular strings, but most training objectives score each decoded floating-point number independently, improving point estimates without ensuring calibrated predictive distributions. This limits applications that require candidate ranking or uncertainty estimation. We introduce Distribution-Aware Reward, an on-policy reinforcement learning objective whose main contribution is to train language models to produce better predictive distributions for regression tasks, rather than only optimizing individual decoded outputs against scalar targets. Our method treats multiple decoded samples as an empirical predictive distribution, evaluates it with the Continuous Ranked Probability Score, and assigns leave-one-out credit based on each rollout's marginal contribution to distribution quality, rewarding predictions that are both accurate and appropriately dispersed. We evaluate our method on a controlled Gaussian-mixture task, code performance prediction, and molecular property prediction from SMILES strings. Across tasks, our method improves over supervised fine-tuning and pointwise reinforcement learning baselines, with strong rank-correlation gains, including a 6-point Spearman improvement on KBSS. On MoleculeNet, it uses only SMILES strings yet remains competitive with strong graph-based and 3D molecular models. Further analyses show that our method mitigates rollout diversity collapse and improves uncertainty diagnostics, suggesting that directly optimizing predictive distributions makes language model regression more robust and better calibrated.
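The reward design above can be sketched as follows. The empirical CRPS of $n$ samples $x_i$ against a target $y$ is $\frac{1}{n}\sum_i |x_i - y| - \frac{1}{2n^2}\sum_{i,j} |x_i - x_j|$: the first term rewards accuracy and the second rewards spread. Lower is better. We read "leave-one-out credit" as the change in CRPS when one rollout is removed, so a rollout whose removal makes the distribution worse earns positive credit. This reading, the plug-in (rather than "fair") CRPS estimator, and the function names are assumptions, not the paper's exact formulation.

```python
def empirical_crps(samples, y):
    """Plug-in CRPS of the empirical distribution of samples against target y (lower is better)."""
    n = len(samples)
    accuracy_term = sum(abs(x - y) for x in samples) / n
    spread_term = sum(abs(a - b) for a in samples for b in samples) / (2 * n * n)
    return accuracy_term - spread_term


def leave_one_out_credit(samples, y):
    """Credit for rollout i = CRPS without i minus CRPS with all rollouts (needs >= 2 samples)."""
    full = empirical_crps(samples, y)
    return [empirical_crps(samples[:i] + samples[i + 1:], y) - full for i in range(len(samples))]
```

With rollouts `[1.0, 2.0, 10.0]` and target `2.0`, the full CRPS is 1.0 and the credits are 1.0, 1.25, and -0.75. The exact hit earns the most, the near miss still helps, and the outlier is penalized.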

【17】Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU
Link: https://arxiv.org/abs/2605.20706

Authors: Reese Levine,Rithik Sharma,Nikhil Jain,Abhijit Ramesh,Zheyuan Chen,Neha Abbas,James Contini,Tyler Sorensen
Comments: 19 pages, 11 figures, 5 tables
Abstract: Running language models in the browser presents a unique opportunity to build efficient, private, and portable AI applications, but requires contending with constrained memory availability and heterogeneous hardware targets. To realize this opportunity, we present Llamas on the Web (LlamaWeb), a WebGPU backend for llama.cpp that enables memory-efficient and performance-portable LLM inference across a wide range of model weight formats in the browser. Our design significantly reduces memory overhead through static memory planning and efficient model loading, addresses cross-device variability through a tunable kernel library, and introduces templated GPU kernels that support performant implementations of numerous quantization formats, enabling broad model support and extensibility to new formats. We evaluate LlamaWeb on 16 devices from 8 vendors, collecting data from 10 language models and four model weight formats. We compare LlamaWeb against existing browser-based LLM frameworks and find that LlamaWeb requires 29-33% less memory across several combinations of device, browser, and operating system. We also evaluate LlamaWeb's performance against these frameworks and find that it increases decode throughput by 45-69% across four GPUs from separate vendors. In addition, we compare LlamaWeb's performance against other llama.cpp backends, where it is competitive with, and on some devices even beats, vendor-specific backend performance.

【18】Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs
Link: https://arxiv.org/abs/2605.20641

Authors: Yifei Wang,Tianlin Li,Xiaohan Zhang,Yida Yang,Xiaoyu Zhang,Li Pan
Comments: 20 pages, 3 figures
Abstract: Inference optimization is a vital technique for deploying LLMs at scale, and compilation is the most widely adopted optimization technique for LLMs. While compilation assumes semantic equivalence between the original and compiled graphs, we are the first to show that its numerical side effects can be maliciously exploited to implant stealthy backdoors in LLMs. We propose a unified optimization-triggered attack framework comprising two complementary strategies. Without any modification to the compiler or hardware, one strategy flips predictions for specific inputs only when the model is compiled, while the other uses a universal trigger that remains dormant under uncompiled execution but hijacks arbitrary inputs once compilation optimization is applied. Both attacks bypass standard safety evaluations run without compilation. We empirically demonstrate that these optimization-triggered backdoors achieve an average attack success rate of 90% across four mainstream open-source LLMs and four tasks, while clean accuracy is preserved at nearly 100% under all settings. Our findings reveal a novel attack surface at the intersection of optimization and security in the LLM deployment pipeline, and we investigate practical defenses to mitigate this threat.

【19】Complementing reinforcement learning with SFT through logit averaging in the post training of LLMs
Link: https://arxiv.org/abs/2605.20555

Authors: Xingwei Gan,Ying Zhu
Abstract: We introduce a novel method that averages the logits of a frozen reference policy (e.g., an SFT model) and a trainable policy, and we incorporate it into Group Relative Policy Optimization (GRPO). Unlike typical Reinforcement Learning with Verifiable Rewards (RLVR) methods, our proposal requires neither Kullback-Leibler (KL) regularization nor a critic; instead, the trainable policy and the reference anchor are coupled through the logit-averaging structure, leveraging the reasoning expertise of the trainable policy while maintaining the formatting advantage of SFT. We evaluate the method on MATH, cn-k12, and MMLU, and the results show accuracy that is higher than, or at least comparable to, that of canonical KL-regularized GRPO.
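The coupling described above can be sketched at the level of a single next-token distribution. The sampling policy is the softmax of a convex combination of the frozen reference logits and the trainable logits. With equal weights, this equals the normalized geometric mean of the two distributions, so tokens the reference strongly disfavors stay unlikely without an explicit KL penalty. The `weight` parameter is a hypothetical generalization; the abstract only says the logits are averaged, which corresponds to the default of 0.5.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def averaged_policy(ref_logits, train_logits, weight=0.5):
    """Token distribution from averaged frozen-reference and trainable logits.

    weight is hypothetical; weight=0.5 is the plain average described in the abstract.
    """
    mixed = [(1 - weight) * r + weight * t for r, t in zip(ref_logits, train_logits)]
    return softmax(mixed)
```

In training, gradients would flow only through `train_logits`; the reference logits act as a fixed anchor.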

【20】AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
标题：AgentAtlas：超越结果排行榜的LLM智能体评估
链接：https://arxiv.org/abs/2605.20530

作者：Parsa Mazaheri,Kasra Mazaheri
摘要：Large language model agents now act on codebases, browsers, operating systems, calendars, files, and tool ecosystems, but the benchmarks used to evaluate them are fragmented: each emphasizes a different unit of measurement (final task success, tool-call validity, repeated-pass consistency, trajectory safety, or attack robustness). A line of 2024-2025 work has converged on the diagnosis that a single accuracy column is no longer the right unit of comparison for deployable agents. AgentAtlas extends this line of work with four components: (i) a six-state control-decision taxonomy (Act / Ask / Refuse / Stop / Confirm / Recover); (ii) a nine-category trajectory-failure taxonomy with two orthogonal hierarchical labels (primary_error_source, impact); (iii) a taxonomy-aware vs. taxonomy-blind methodology that measures how much of a model's apparent capability comes from the supervision in the prompt; and (iv) a benchmark-coverage audit mapping fifteen agent benchmarks against six behavioral axes. To demonstrate the methodology, we run a small, fixed set of eight models (four closed frontier and four open-weight) on 1,342 generated items under both prompt modes. Removing the explicit label menu drops every model's trajectory accuracy by 14-40 pp, to a tight 0.54-0.62 floor regardless of model family, and no single model wins on all three of control accuracy, trajectory diagnosis, and tool-context utility retention. We treat this synthetic run as a demonstration of the measurement protocol, not as a benchmark release.

【21】Machine-Learning-Enhanced Non-Invasive Testing for MASLD Fibrosis: Shallow-Deep Neural Networks Versus FIB-4, Tabular Foundation Models, and Large Language Models
标题：MASLD纤维化的机器学习增强型无创检测：浅-深神经网络与FIB-4、表格基础模型及大型语言模型的比较
链接：https://arxiv.org/abs/2605.20523

作者：Athanasios Angelakis,Gabriele De Vito,Eleni-Myrto Trifylli,Filomena Ferrucci
备注：26 pages, 4 figures, 3 tables. Preprint
摘要：Advanced fibrosis is a major determinant of liver-related morbidity in metabolic dysfunction-associated steatotic liver disease (MASLD). FIB-4 is widely used as a first-line non-invasive test, but its fixed formula may underuse the diagnostic information contained in age, aspartate aminotransferase (AST), alanine aminotransferase (ALT), and platelet count. We evaluated whether machine-learning-enhanced non-invasive testing (MLE-NIT) can improve advanced fibrosis detection while preserving this FIB-4 variable space. We used three biopsy-confirmed MASLD cohorts from China, Malaysia, and India (n=784). The Chinese cohort was split into 486 training and 54 internal validation/tuning patients; final performance was reported only on the Malaysian and Indian external cohorts. Models used five variables: age, FIB-4, AST, platelet count, and ALT. We compared FIB-4 with a shallow-deep neural network (s-DNN), TabPFN, and a fine-tuned gpt-4o-2024-08-06. FIB-4 achieved external ROC-AUCs of 0.75 and 0.60 in Malaysia and India, respectively. TabPFN achieved 0.69 and 0.66, fine-tuned GPT-4o achieved 0.75 and 0.63, and the s-DNN achieved 0.77 and 0.67, respectively. The s-DNN contained only 354 trainable parameters, compared with 7,244,554 for TabPFN, yet provided a more balanced external operating profile. Calibration showed s-DNN Brier scores of 0.18 and 0.22, and permutation importance identified AST and FIB-4 as the dominant variables. Compact non-linear MLE-NITs may enhance FIB-4-based fibrosis assessment without increasing clinical data requirements.
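
For reference, FIB-4 is the standard closed-form index (age × AST) / (platelet count × √ALT). The sketch below computes it and assembles the five-variable input that the abstract describes. The function name and dictionary keys are hypothetical, and the sketch does not reproduce any of the compared models.

```python
import math

def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l):
    """FIB-4 = (age * AST) / (platelets * sqrt(ALT)).

    Units: age in years, AST and ALT in U/L, platelet count in 10^9/L.
    """
    if min(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l) <= 0:
        raise ValueError("all inputs must be positive")
    return (age_years * ast_u_per_l) / (platelets_1e9_per_l * math.sqrt(alt_u_per_l))

# Hypothetical five-feature record: the four raw variables plus FIB-4 itself.
features = {"age": 50, "ast": 40, "alt": 25, "platelets": 200}
features["fib4"] = fib4(features["age"], features["ast"], features["alt"], features["platelets"])
print(features["fib4"])  # 2.0
```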

【22】ZEBRA: Zero-shot Budgeted Resource Allocation for LLM Orchestration
标题：ZEBRA：面向LLM编排的零样本预算资源分配
链接：https://arxiv.org/abs/2605.20485

作者：May Hamri,Inbal Talgam-Cohen
摘要：As autonomous agents increasingly execute end-to-end tasks under fixed monetary budgets, the pressing open question shifts from whether the budget is respected to how to spend it effectively. Existing budget-aware methods typically control reasoning step by step within a single agent, or learn resource-allocation policies via RL. None addresses how to split a budget across the constituent phases of a multi-agent pipeline at inference time. We propose ZEBRA, a zero-shot framework that reduces multi-phase budget allocation to a continuous nonlinear knapsack problem: an LLM controller estimates per-phase utility curves, and a water-filling search on the Lagrange multiplier returns the per-phase split. Additive and multiplicative aggregations are unified under the same solver. On a $150$-task APPS coding benchmark, both ZEBRA variants outperform LLM-direct (budget allocation performed directly by an LLM) on every aggregate metric. At a budget of $\alpha = 0.5$ of the unconstrained spend, ZEBRA recovers $94.4\%$ of unconstrained quality, versus $88.1\%$ for LLM-direct. The advantage is statistically significant and transfers beyond coding: on a $3$-phase HotpotQA pipeline, ZEBRA beats LLM-direct by $14.3$ pp, with allocations empirically robust to curve-estimation noise. On HotpotQA, ZEBRA arrives at a near-balanced budget split, whereas on APPS its split is skewed toward a refinement phase, showing that it adapts to the pipeline structure. More broadly, we show that lightweight algorithmic guidance at inference time can improve the economic behavior of autonomous multi-agent systems.
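
The water-filling step can be shown on a toy problem. This sketch assumes concave per-phase utilities u_i(b) = w_i log(1 + b) with hand-picked weights, whereas ZEBRA has an LLM controller estimate the utility curves. Only additive aggregation is shown. For a given multiplier λ, each phase keeps spending until its marginal utility w_i / (1 + b_i) falls to λ. Bisection on λ then finds the level at which total spend equals the budget. All function names are hypothetical.

```python
def phase_allocations(weights, lam):
    """Maximizer of w * log(1 + b) - lam * b over b >= 0, for each phase."""
    return [max(0.0, w / lam - 1.0) for w in weights]

def water_fill(weights, budget, iters=200):
    """Split `budget` across phases whose utilities are w_i * log(1 + b_i).

    Bisects on the Lagrange multiplier `lam` (the "water level") until the
    per-phase KKT allocations sum to the budget.
    """
    if budget <= 0:
        return [0.0] * len(weights)
    if max(weights) <= 0:
        raise ValueError("at least one phase needs a positive weight")
    hi = max(weights)  # at lam = max(w), every allocation is zero
    lo = hi
    while sum(phase_allocations(weights, lo)) < budget:
        lo /= 2.0      # lower the price until the budget would be overspent
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(phase_allocations(weights, mid)) > budget:
            lo = mid   # overspending: raise the price
        else:
            hi = mid   # within budget: try a lower price
    return phase_allocations(weights, hi)

split = water_fill([1.0, 2.0, 4.0], budget=3.0)
print([round(b, 4) for b in split])  # [0.0, 0.6667, 2.3333]
```

Phases whose weight falls below the final water level (here λ = 1.2) receive nothing. This is the characteristic behavior of water-filling.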

【23】LLM Pretraining Shapes a Generalizable Manifold: Insights into Cross-Modal Transfer to Time Series
标题：LLM预训练塑造可泛化的流形：关于向时间序列跨模态迁移的见解
链接：https://arxiv.org/abs/2605.20449

作者：Alexis Roger,Prateek Humane,Zhenghan Tai,Gwen Legate,Andrei Mircea,Vasilii Feofanov,Irina Rish
摘要：Can language-pretrained transformers become effective time-series forecasters, and why? In this paper, we show that cross-modal transfer arises because language pretraining preconditions time series training with a reusable manifold. A linear probe on frozen LLM states decodes realistic time-series trajectories without paired supervision, and retrieval in this projected space yields competitive forecasts, showing that structure and dynamics exist before finetuning. Pretrained initialization also improves optimization, producing coherent gradients and a highly anisotropic loss landscape unlike random initialization. Finetuning then acts as low-dimensional alignment, reusing existing directions rather than learning temporal primitives from scratch, as evidenced by low-rank updates, subspace alignment, and shared features for periodicity, trend, and repetition. Together, these results support a geometric account of LLM-to-time-series transfer: language pretraining builds the manifold, and finetuning projects numerical dynamics onto task-relevant directions.

【24】Do Vision-Language Models Understand 3D Scenes or Just Catalogue Objects?
标题：视觉-语言模型理解3D场景还是只是对对象进行编目？
链接：https://arxiv.org/abs/2605.20448

作者：Animesh Maheshwari,Divyansh Sahu,Nishit Verma
摘要：Vision-language models reliably name objects in a scene, but do they represent the 3D layout those objects inhabit? We introduce a 3,034-sample human-curated benchmark targeting three components of spatial understanding: depth-ordered occlusion (probed via three independent counterfactual operationalisations), optical-geometry inference over visible reflections, and volumetric rearrangement planning. Six frontier and open-weight VLMs, scored by trained annotators on 18,204 responses with no LLM-as-judge, reveal a sharp dissociation: models that plan rearrangements over visible layouts at 53-97% accuracy and rarely violate collision constraints fall to 6-45% on occlusion and below 7% on reflections. An embodied-reasoning model reproduces the same profile. White-box analysis on Qwen3-VL-8B-Thinking localises the failure to the visual-token merger: spatial information recoverable throughout the vision encoder becomes inaccessible after token compression and only stabilises again when clean post-merger activations are patched into the language decoder.

【25】Miller-Index-Based Latent Crystallographic Fracture Plane Reasoning with Vision-Language Models
标题：基于视觉语言模型的米勒指数潜在晶体学断裂面推理
链接：https://arxiv.org/abs/2605.20416

作者：Qinwu Xu,Yifan Jiang
摘要：We study whether multimodal large language models (MLLMs) can leverage crystallographic plane indices (Miller indices) as a structured latent representation for reasoning about fracture geometry. We formulate Miller indices $z = (h,k,l)$ as a latent variable governing idealized planar fracture and evaluate two complementary capabilities: (i) latent inference, where the model maps visual observations to plane hypotheses under physically valid conditions, and (ii) latent applicability assessment, where the model determines whether such a representation is meaningful for a given fracture image. Through extensive experiments spanning synthetic data, controlled 2D-3D geometric pairs, and real-world fracture images across multiple material classes (including ceramics, glass, metals, and concrete), we show that MLLMs can reliably perform latent inference in idealized settings and, critically, can reject the latent representation when the underlying physics does not support it. These results suggest that MLLMs can act as physics-aware reasoning systems conditioned on structured latent priors, provided that the domain of validity is explicitly modeled.

【26】Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor
标题：分解LLM强化学习中的MXFP4量化误差：可消减的偏差、可恢复的死区与不可消减的下限
链接：https://arxiv.org/abs/2605.20402

作者：Xiaocan Li,Shiliang Wu,Zheng Shen
摘要：MXFP4 arithmetic can dramatically accelerate reinforcement learning (RL) post-training of large language models (LLMs), yet its quantization error causes severe accuracy degradation. Existing work treats quantization error as a monolithic noise term, missing the distinct mechanisms through which it damages training. We prove an exact three-way decomposition of quantization error and show that each component dominates a distinct RL training pathway. Our theoretical and empirical analysis decomposes the MXFP4 quantization error into three additive components: "scale bias" from power-of-two rounding, "deadzone truncation" from zeroing small values, and "grid noise" from rounding to the nearest 4-bit grid point. Each component drives a distinct RL failure mode: scale bias accumulates multiplicatively through the backward pass, degrading gradient accuracy; deadzone truncation degrades rollout quality; and grid noise raises the policy's entropy. We combine corrections that target specific RL failure modes but are not component-exclusive: macro-block scaling to reduce scale bias; Outlier Fallback to recover deadzone entries, which also partially reduces the error induced by scale bias; and Adaptive Quantization Noise (AQN) to control policy entropy. On the dense Qwen2.5-3B model and the mixture-of-experts Qwen3-30B-A3B-Base model, the targeted corrections recover BF16 accuracy to within 0.7% and 3.0%, respectively.
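
One way to make an exact three-way split concrete is to compare the power-of-two-scaled quantizer with an idealized quantizer that uses the real-valued scale amax / 6. The scheme below is a hypothetical illustration, not the paper's decomposition. Its scale-rounding rule (round up, so no element clips) is an assumption, and real MXFP4 implementations may derive the shared exponent differently. Ties in grid rounding are resolved toward the smaller magnitude.

```python
import math

# Non-negative magnitudes representable by the FP4 E2M1 element type used in MXFP4.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def round_to_grid(v):
    """Round v to the nearest signed E2M1 value (ties go to the smaller magnitude)."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(v) - g))
    return math.copysign(mag, v)

def quantize(block, scale):
    return [scale * round_to_grid(x / scale) for x in block]

def decompose_error(block):
    """Split the error q_pow2(x) - x into three additive parts (hypothetical scheme)."""
    amax = max(abs(x) for x in block)
    if amax == 0:
        zeros = [0.0] * len(block)
        return {"total": zeros, "scale_bias": zeros, "deadzone": zeros, "grid_noise": zeros}
    s_real = amax / 6.0                           # ideal scale maps amax onto 6.0
    s_pow2 = 2.0 ** math.ceil(math.log2(s_real))  # assumed rule: round the scale up
    q_real = quantize(block, s_real)
    q_pow2 = quantize(block, s_pow2)
    return {
        "total": [q - x for q, x in zip(q_pow2, block)],
        # Cost of forcing the shared scale to a power of two.
        "scale_bias": [qp - qr for qp, qr in zip(q_pow2, q_real)],
        # Entries flushed to zero even under the ideal scale.
        "deadzone": [-x if qr == 0 else 0.0 for x, qr in zip(block, q_real)],
        # Rounding error of the surviving entries on the 4-bit grid.
        "grid_noise": [0.0 if qr == 0 else qr - x for x, qr in zip(block, q_real)],
    }

block = [0.05, -0.3, 1.2, 2.9, -0.01, 0.7, 0.0, -1.9]  # toy block; real MX blocks hold 32 values
parts = decompose_error(block)
for name in ("scale_bias", "deadzone", "grid_noise"):
    print(name, round(sum(e * e for e in parts[name]), 4))
```

The split is exact by construction. The deadzone and grid-noise terms together equal the idealized quantizer's error, and the scale-bias term accounts for the remaining gap to the power-of-two quantizer.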

【27】DEL: Digit Entropy Loss for Numerical Learning of Large Language Models
标题：DEL：面向大型语言模型数值学习的数字熵损失
链接：https://arxiv.org/abs/2605.20369

作者：Zhaohui Zheng,Chenhang He,Shihao Wang,Yuxuan Li,Ming-Ming Cheng,Lei Zhang
摘要：Number prediction is a fundamental capability of large language models (LLMs) in mathematical problem-solving and code generation, yet the widely adopted maximum likelihood estimation (MLE) objective for LLM training is not tailored to it. Recently, penalty-driven approaches, e.g., Number Token Loss and Discretized Distance Loss, introduce an inductive bias of numerical distance but induce over-sharpened and over-flattened digit distributions, respectively. In this paper, we analyze LLM numerical learning in depth and show that existing numerical learning methods conceptually follow a criterion-distance formulation, where the criterion term represents the optimization pattern and the distance term instills a geometric prior. Building on this, we present Digit Entropy Loss (DEL) for auto-regressive numerical learning, which reformulates conventional unsupervised entropy optimization through three key designs: leveraging digit conditional probability and binary cross-entropy to make the entropy optimization supervised; dropping the distance term to bypass the issue of numerical distance; and generalizing integer-based numerical learning to floating-point number optimization, enabling more accurate number prediction. Our DEL formulation can incorporate integers, decimals, and decimal points, expanding the learning objective from a single digit to the floating-point number domain. Experiments on seven mathematical reasoning benchmarks with four representative LLMs (CodeLlama, Mistral, DeepSeek, and Qwen-2.5) demonstrate that DEL consistently outperforms its counterparts in both overall prediction accuracy and numerical distance. Source code is available at https://github.com/PolyU-VCLab/DEL

【28】WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents
标题：WildRoadBench：面向视觉语言模型与自主智能体的真实场景航拍道路损伤定位基准
链接：https://arxiv.org/abs/2605.20306

作者：Bingnan Liu,Chenhang Cui,Rui Huang,Jiani Luo,Zhirong Shen,Tinghao Wang,Xiande Huang,Lingbei Meng,Fei Shen,An Zhang
备注：Preprint. Under review. 4 figures, 6 tables
摘要：We introduce WildRoadBench, a wild aerial road-damage grounding benchmark that couples direct visual grounding by vision-language models with autonomous research-and-engineering by LLM-driven agents on a single professionally annotated UAV corpus. The same image set and the same per-class AP_50 metric are evaluated under two protocols. The VLM Track measures whether a fixed VLM can localise domain-specific damage from one image and one short prompt under a unified prompting, decoding and parsing pipeline. The Agent Track measures whether an autonomous agent, given only a written task brief, a small exploratory slice and a fixed interaction budget, can search the public web, adapt pretrained components, write training and inference code, and submit predictions through a scalar-feedback oracle on a hidden holdout. We benchmark a broad pool of closed-source frontier models and open-source VLMs together with several frontier LLM-driven agents. Both routes remain far from reliable performance in this wild setting: closed-source frontier models lead the VLM leaderboard but still leave more than half of the metric on the table; open-source grounders plateau well below them, and newer generations or reasoning-style variants do not consistently improve grounding; small targets collapse for every open-source model; agents lag the strongest VLM despite richer affordances, and several fail to land a valid submission within the budget. We release the code and data at https://anonymous.4open.science/r/wildroadbench-0607 to support reproducible follow-up research.

【29】Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization
标题：Quant.npu：通过完全静态量化为设备上LLM实现高效的移动NPU推理
链接：https://arxiv.org/abs/2605.20295

作者：Jinghe Zhang,Daliang Xu,Chenghua Wang,Weikai Xie,Tao Qi,Yun Ma,Mengwei Xu,Gang Huang
摘要：Large language models (LLMs) are increasingly deployed on mobile devices, where Neural Processing Units (NPUs) require fully static quantization for optimal inference efficiency. However, existing post-training quantization (PTQ) methods predominantly rely on dynamic activation quantization, which makes them incompatible with NPU hardware constraints. To bridge the gap between high-fidelity PTQ and NPU-constrained inference, we propose Quant.npu, an integer-only, fully static quantization framework. It incorporates learnable quantization parameters and rotation matrices, enabling low-bit activation-weight quantization without recomputing quantization parameters at runtime. Crucially, we find that the initialization and selective optimization of quantization parameters are pivotal for optimization stability: improper initialization and naive joint optimization induce gradient instability that disrupts the optimization of the rotation matrices. To address this, we propose a rotation- and bit-width-aware initialization tailored to diverse activation profiles, and a distribution-aware selective optimization (a two-stage quantization pipeline) tailored to rotated and unrotated tensors. Furthermore, we introduce a sensitivity-guided adaptive mixed-precision scheme to balance accuracy against inference efficiency. Extensive experiments on real-world mobile NPUs demonstrate that Quant.npu achieves accuracy comparable to state-of-the-art methods while reducing inference latency by up to 15.1%.
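
The static-versus-dynamic distinction can be shown with symmetric per-tensor quantization. A static scheme fixes the activation scale offline from calibration data, so the runtime path needs only integer rounding and clamping. The 8-bit width, max-based calibration, and helper names are assumptions for illustration. Quant.npu's learnable parameters, rotations, and mixed-precision scheme are not modeled.

```python
def calibrate_scale(calibration_batches, n_bits=8):
    """Offline step: one symmetric per-tensor scale from the largest |activation| observed."""
    qmax = 2 ** (n_bits - 1) - 1
    amax = max(abs(v) for batch in calibration_batches for v in batch)
    return amax / qmax if amax > 0 else 1.0

def quantize_static(x, scale, n_bits=8):
    """Runtime step: reuse the frozen scale; values outside the range saturate."""
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in x]

def dequantize(q, scale):
    return [scale * v for v in q]

scale = calibrate_scale([[0.5, -1.27], [1.0, 0.2]])  # computed once, shipped with the model
q = quantize_static([0.3, 2.0], scale)               # no per-input statistics at runtime
print(q)  # [30, 127]: 2.0 lies outside the calibrated range and saturates
```

A dynamic scheme would instead recompute the scale from each runtime tensor's own maximum. The abstract identifies this runtime recomputation as the step that NPUs do not accommodate.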

【30】Adaptive Probe-based Steering for Robust LLM Jailbreaking
标题：基于自适应探针的转向，实现稳健的LLM越狱
链接：https://arxiv.org/abs/2605.20286

作者：Junxi Chen,Junhao Dong,Xiaohua Xie
备注：19 pages, 13 figures, accepted by ICML 2026
摘要：Recent work has demonstrated the potential of contrastive steering for jailbreaking Large Language Models (LLMs). However, existing methods rely on limited and inherently biased contrastive prompts and require laborious manual tuning of steering strength, which limits their robustness and effectiveness. In this paper, we leverage the idea of model extraction to guide the learned steering vectors toward the ideal one, and we propose tuning the steering strength adaptively based on statistics of the contrastive activations. Experiments demonstrate that our method notably improves the effectiveness and robustness of probe-based steering without any extra contrastive prompts or laborious manual tuning. As an attack paper, this work focuses on revealing how fortified LLMs break down, raising the average harmfulness score from 6% to 70%. Our code is available at https://github.com/fhdnskfbeuv/adaptiveSteering.

【31】Introspective X Training: Feedback Conditioning Improves Scaling Across all LLM Training Stages
标题：内省X训练：反馈条件化提升LLM所有训练阶段的扩展效率
链接：https://arxiv.org/abs/2605.20285

作者：Brandon Cui,Ximing Lu,Jaehun Jung,Syeda Nahida Akter,Hyunwoo Kim,Yuxiao Qu,David Acuna,Shrimai Prabhumoye,Yejin Choi,Prithviraj Ammanabrolu
摘要：We tackle the question of how to scale more efficiently across the many, ever-growing stages of current LLM training pipelines. Our guiding intuition is that the dynamics of later stages of the pipeline, e.g., post-training, can be used to inform earlier stages such as pre-training. To this end, we propose Introspective Training (IXT), inspired by offline reward-conditioned reinforcement learning and applicable to any stage of training. IXT uses a thinking reward model to annotate data with natural-language, critique-based feedback, enabling quality-aware training from the earliest stages of the pipeline. Models are then trained on data prefix-conditioned on the generated feedback, ensuring that not all tokens are treated equally, starting much earlier in training than usual. Comprehensive experiments on 7.5-12B transformer-based dense LLMs, trained from scratch on up to 18 trillion tokens, show that our method bends scaling curves, yielding up to 2.8x greater compute efficiency overall, and reaches performance levels that otherwise-trained models cannot achieve in domains such as math and code.

【32】Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs
标题：共形选择性行动：面向RLVR训练LLM的随时有效风险控制
链接：https://arxiv.org/abs/2605.20270

作者：Hamed Khosravi,Xiaoming Huo
摘要：A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with a per-deployment error budget $\alpha$. The operator needs a safety certificate for this deployment's stream at every round: no pooling across deployments, no waiting for a long-run average. Existing wrappers cannot deliver this on adaptive, online-updated streams: offline conformal-risk methods require exchangeability; online-conformal methods bound only long-run averages; non-exchangeable extensions are marginally valid; and the closest anytime wrapper, A-RCPS, controls marginal rather than selective risk. Using a (test statistic, validity guarantee, deployment rule) framework, we identify one empty cell forced by deployment requirements: an e-process per threshold, selective risk, anytime-pathwise validity, and a max-certified-threshold rule. Conformal Selective Acting (CSA) fills this cell as a per-round wrapper that maintains a Ville-type e-process per threshold on a Bonferroni grid, evaluated against the RLVR filtration. Under predictable updates and isotonic-calibrated monotone risk, we prove (i) an anytime-pathwise selective-risk bound $R_T^{\mathrm{act}} \le \alpha + O(N_T^{-1/2})$, (ii) rate-optimal certification matching $\Theta(\bar{\eta}^{-2}\log(1/\delta))$, and (iii) a horizon-independent release-rate gap. Across eight specialist benchmarks ($480$ streams), sixteen adversarial distribution-shift cells ($160$ streams), and five live Expert-Iteration RLVR cells with online LoRA over four base models in three architecture families ($10{,}300$ rounds), CSA is the only method among the ten compared that satisfies pathwise validity and non-refusing deployment on every cell. We do not propose a new LLM, training algorithm, or policy class; CSA is the deployment-side complement, orthogonal to the model, for operators who cannot use a frontier API.
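
A generic betting-style e-process conveys the idea of per-threshold certification. This is a simplified sketch, not CSA itself, and it rests on stated assumptions: losses in [0, 1], a fixed betting fraction `lam`, and a Bonferroni correction over the threshold grid. It omits isotonic calibration, the RLVR filtration, and the paper's deployment rule, and all names in it are hypothetical. A threshold counts as certified once its wealth reaches K / δ. Ville's inequality is what makes that certificate anytime-valid.

```python
class ThresholdEProcess:
    """Betting e-process for H0: mean loss on acted rounds >= alpha.

    Wealth_t = prod_i (1 + lam * (alpha - L_i)) over rounds where the model acted.
    For losses in [0, 1] and 0 <= lam <= 1 / (1 - alpha), every factor is >= 0, and
    under H0 each factor has conditional mean <= 1. Ville's inequality therefore
    gives P(sup_t Wealth_t >= 1 / delta) <= delta.
    """

    def __init__(self, threshold, alpha, lam=0.5):
        if not 0 < alpha < 1:
            raise ValueError("alpha must lie in (0, 1)")
        if not 0 <= lam <= 1 / (1 - alpha):
            raise ValueError("lam out of range")
        self.threshold, self.alpha, self.lam = threshold, alpha, lam
        self.wealth = 1.0
        self.peak = 1.0

    def update(self, confidence, loss):
        if confidence >= self.threshold:  # selective: only acted rounds count
            self.wealth *= 1 + self.lam * (self.alpha - loss)
            self.peak = max(self.peak, self.wealth)

def certified_thresholds(stream, thresholds, alpha=0.1, delta=0.05):
    """Thresholds whose e-process ever reached len(thresholds) / delta (Bonferroni)."""
    procs = [ThresholdEProcess(t, alpha) for t in thresholds]
    for confidence, loss in stream:
        for p in procs:
            p.update(confidence, loss)
    level = len(thresholds) / delta
    return [p.threshold for p in procs if p.peak >= level]

stream = [(0.9, 0.0), (0.3, 1.0)] * 200  # toy stream of (confidence, loss) pairs
print(certified_thresholds(stream, [0.2, 0.5, 0.8]))  # [0.5, 0.8]
```

In this toy stream every low-confidence round is an error. Only the thresholds that skip those rounds accumulate enough evidence to be certified.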

【33】Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding
标题：Chronicle：一个用于联合语言和时间序列理解的多模态基础模型
链接：https://arxiv.org/abs/2605.20268

作者：Paul Quinlan,Jeremy Levasseur,Qingguo Li,Xiaodan Zhu
摘要：Real-world time series come with text: metadata, descriptions, news, reports. Yet time series foundation models process numerical sequences in isolation, and the multimodal text-and-time-series models that attempt to bridge the two all adapt a pretrained language model post hoc, inheriting representations shaped without ever seeing temporal data. These models are also evaluated almost exclusively against other multimodal baselines, not against the strongest unimodal foundation models in either domain, leaving open whether joint training is needed at all. We present Chronicle, a compact 324M-parameter decoder-only transformer trained from scratch on natural language and time series within a single unified architecture. Both modalities share the same transformer blocks, attention mechanism, and residual stream; the bulk of pretraining uses unimodal batches so cross-modal capability emerges purely from shared parameters, with a short alignment stage that interleaves the two. To our knowledge, Chronicle is the first model jointly pretrained on text and time series from scratch, and the first multimodal model evaluated against dedicated foundation models in both domains. It matches Gemma-3-270M-PT on 19 NLU tasks, sets a new bar for frozen-embedding time series classification on 24 UCR/UEA datasets, and produces multimodal forecasts on Time-MMD that beat every supervised fusion baseline, all from a single backbone.

【34】It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs
标题：二者缺一不可：面向LLM上下文完整性的互补自蒸馏
链接：https://arxiv.org/abs/2605.20258

作者：Sangwoo Park,Woongyeong Yeo,Seanie Lee,Yumin Choi,Hyomin Lee,Kangsan Kim,Jinheon Baek,Seong Joon Oh,Sung Ju Hwang
备注：28 pages, 16 figures
摘要：Contextual Integrity (CI) defines privacy not merely as keeping information hidden, but as governing information flows according to the norms of a given context. As large language models are increasingly deployed as personal agents handling sensitive workflows, adhering to CI becomes critical. However, even frontier models remain unreliable in making disclosure decisions, and existing mitigation strategies often degrade underlying task performance. To overcome this privacy-utility trade-off, we propose SELFCI, a complementary self-distillation framework that decouples information suppression from task resolution. SELFCI jointly optimizes two independent reverse KL divergences over distinct teacher distributions derived from feedback: one encourages preserving task-relevant information for utility, while the other enforces minimal and appropriate disclosure. This complementary formulation induces a Product-of-Experts (PoE) target, aligning the policy with the intersection of capability and privacy requirements. Empirical evaluations demonstrate that SELFCI, without relying on costly external supervision, consistently outperforms competitive baselines such as online reinforcement learning algorithms (e.g., GRPO). These trends further extend to out-of-domain settings involving agentic workflows and accumulated private context, suggesting that SELFCI provides a practical path toward CI alignment.
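
A short derivation shows why two reverse-KL terms yield a product-of-experts target. It assumes equal weights and writes $p_{\mathrm{u}}$ and $p_{\mathrm{p}}$ for the utility and privacy teacher distributions. This notation is introduced here and is not the paper's.

```latex
\begin{aligned}
\mathcal{J}(\pi)
  &= \mathrm{KL}(\pi \,\|\, p_{\mathrm{u}}) + \mathrm{KL}(\pi \,\|\, p_{\mathrm{p}})
   = \sum_y \pi(y)\log\frac{\pi(y)^2}{p_{\mathrm{u}}(y)\,p_{\mathrm{p}}(y)} \\
  &= 2\sum_y \pi(y)\log\frac{\pi(y)}{q(y)} - 2\log Z
   = 2\,\mathrm{KL}(\pi \,\|\, q) - 2\log Z,
\end{aligned}
\qquad
q(y) = \frac{\sqrt{p_{\mathrm{u}}(y)\,p_{\mathrm{p}}(y)}}{Z},\quad
Z = \sum_{y'} \sqrt{p_{\mathrm{u}}(y')\,p_{\mathrm{p}}(y')}.
```

The objective is therefore minimized by $\pi^\star = q$, the normalized geometric mean of the two teachers. Because $q$ vanishes wherever either teacher assigns zero probability, this matches the "intersection of capability and privacy requirements" reading in the abstract. With unequal weights $\beta_1, \beta_2$, the exponents become $\beta_i / (\beta_1 + \beta_2)$ instead of $1/2$.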

【35】Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction
标题：并行LLM推理，用于抗偏、稳健的概念抽象
链接：https://arxiv.org/abs/2605.20194

作者：Aisvarya Adeseye,Jouni Isoaho,Adeyemi Adeseye
备注：Accepted to be Published in 12th Intelligent Systems Conference 2026, 3-4 September 2026 in Amsterdam, The Netherlands
摘要：Large language models (LLMs) have been increasingly used to analyze text. However, they are often plagued with contextual reasoning limitations when analyzing long documents. When long documents are processed sequentially, early or dominant concepts can overshadow less visible but meaningful interpretations, leading to cumulative analytical bias, omission error, and over-generalization. Additionally, independently generated outputs are often merged without systematic grounding, introducing redundancy, conceptual drift, and unsupported claims. This study proposes a structured framework combining parallel chunk-level processing with evidence-anchored consolidation. Texts are first divided into semantically coherent chunks and processed independently in parallel to remove influence from earlier processing. The independently generated interpretations are then consolidated using explicit evidence anchoring and prioritization that reduces dominance and over-generalization while improving traceability. Experiments with multiple model types and sizes indicate that parallel processing significantly reduces omission error by approximately 84%, increases evidence traceability by up to 130%, and reduces unsupported claims by up to 91%. Smaller models benefited most, suggesting that efficient parallel chunking and consolidation play a critical role in achieving reliable and scalable textual analysis.

【36】Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment
标题：基于协作对齐的LLM联邦LoRA微调
链接：https://arxiv.org/abs/2605.21217

作者：Shuaida He,Liwen Chen,Long Feng
摘要：Low-rank adaptation (LoRA) has emerged as a powerful tool for parameter-efficient fine-tuning of large language models (LLMs). This paper studies LoRA in a federated learning setting, enabling collaborative fine-tuning across clients while preserving parameter efficiency. We focus on a highly heterogeneous regime in which clients share only partial structure and a substantial subset may be contaminated. We propose Collaborative Low-rank Alignment and Identifiable Recovery (CLAIR), a contamination-aware framework that relies only on preliminary local estimators. Its formulation applies broadly, from linear regression to neural network and LLM modules, whenever local adaptation can be represented by matrix-valued updates. CLAIR recovers the shared LoRA subspace and detects contaminated clients via a structured low-rank plus block-sparse decomposition. We prove exact recovery of the shared LoRA subspace in the noiseless case, stable recovery under preliminary estimation error, and consistent collaborative-set recovery under mild separation conditions. We further quantify the gain from CLAIR refinement: it reduces off-subspace estimation error through cross-client averaging while preserving client-specific variation within the shared LoRA subspace, and thus improves on local fine-tuning whenever this oracle gain outweighs the costs of subspace estimation and benign-client heterogeneity. Empirically, we demonstrate the benefits of CLAIR by fine-tuning a Transformer architecture on a text-copying task. The results show accurate contamination detection and improved benign-client performance compared with local fine-tuning and non-robust federated averaging.

Graph相关(图学习|图神经网络|图优化等)(10篇)

【1】Is Fixing Schema Graphs Necessary? Full-Resolution Graph Structure Learning for Relational Deep Learning
标题：固定模式图是必要的吗？面向关系深度学习的全分辨率图结构学习
链接：https://arxiv.org/abs/2605.21475

作者：Yi Huang,Qingyun Sun,Jia Li,Xingcheng Fu,Jianxin Li
备注：Accepted by the Forty-third International Conference on Machine Learning (ICML2026)
摘要：Relational prediction tasks are fundamental in many real-world applications, where data are naturally stored in relational databases (RDBs). Relational Deep Learning (RDL) addresses this problem by modeling RDBs as graphs and applying graph neural networks (GNNs) for end-to-end learning. However, the full-resolution property is commonly adopted as a design principle in graph construction for RDBs to preserve relational semantics, which leads most existing methods to rely on fixed graph structures. In this paper, we propose FROG, a Full-Resolution and Optimizable Graph Structure Learning framework for RDL that formulates relational structure learning as a learnable table-role modeling problem, allowing tables to contribute as nodes and edges in message passing. We further design role-driven message passing mechanisms to capture relational semantics, enabling joint optimization of graph structure and GNN representations. To ensure semantic consistency, we introduce functional dependency constraints that regularize representations across table and entity levels. Extensive experiments demonstrate that our method outperforms existing approaches and reveal how table roles affect downstream tasks, offering new insights into graph construction for RDL.

【2】Graph Navier Stokes Networks
标题：图Navier Stokes网络
链接：https://arxiv.org/abs/2605.21247

作者：Zexing Zhao,Guangsi Shi,Yu Gong,Tianyu Wang,Shirui Pan,Hongye Cheng,Yuxiao Li
摘要：Graph Neural Networks (GNNs) have emerged as a cornerstone of deep learning, with most existing methods rooted in graph signal processing and diffusion equations to model message passing. However, these approaches inherently suffer from the oversmoothing problem, where node features become indistinguishable as the network depth increases. Inspired by the Navier Stokes equations, we introduce Graph Navier Stokes Networks (GNSN), a novel architecture that transcends conventional diffusion-based message passing by incorporating convection into graph structures. GNSN defines a dynamic velocity field on the graph to govern convection, enabling more efficient and direct message propagation. By adaptively balancing convection and diffusion, GNSN is able to efficiently handle datasets with varying levels of homophily. Extensive evaluations across twelve real-world datasets demonstrate that GNSN consistently outperforms state-of-the-art baselines in classification accuracy. Moreover, experimental results further emphasize its effectiveness in alleviating the oversmoothing problem.
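
The difference between diffusion-only message passing and convection can be shown on scalar node features. The update below is a generic graph convection-diffusion step, not GNSN's operator. Its velocity field is fixed by hand (GNSN defines a dynamic one), the explicit-Euler integrator and coefficients are assumptions, and the function name is hypothetical. Diffusion only pulls neighboring values toward each other, which is the mechanism behind oversmoothing. The directed transport term, by contrast, carries features along chosen edges.

```python
def convection_diffusion_step(x, edges, velocity, diffusivity=0.1, dt=0.1):
    """One explicit-Euler step of graph convection + diffusion on scalar features.

    x: list of node values; edges: undirected (i, j) pairs;
    velocity: dict mapping a directed pair (i, j) to a non-negative flow rate from i to j.
    """
    dx = [0.0] * len(x)
    for i, j in edges:
        # Diffusion: symmetric exchange proportional to the difference.
        flux = diffusivity * (x[j] - x[i])
        dx[i] += flux
        dx[j] -= flux
    for (i, j), v in velocity.items():
        # Convection: node i pushes v * x[i] along the directed edge to j.
        transport = v * x[i]
        dx[i] -= transport
        dx[j] += transport
    return [xi + dt * d for xi, d in zip(x, dx)]

x = [1.0, 0.0, 0.0]  # path graph 0 - 1 - 2, all mass starts at node 0
edges = [(0, 1), (1, 2)]
velocity = {(0, 1): 0.5, (1, 2): 0.5}  # hand-picked downstream flow
for _ in range(20):
    x = convection_diffusion_step(x, edges, velocity)
print([round(v, 3) for v in x])
```

Both terms only move mass between the two endpoints of an edge, so the sum of node values is conserved at every step.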

【3】Point Cloud Sequence Encoding for Material-conditioned Graph Network Simulators
标题：实物图网络模拟器的点云序列编码
链接：https://arxiv.org/abs/2605.20978

作者：Philipp Dahlinger,Balázs Gyenes,Niklas Freymuth,Luca Geminiani,Tobias Würth,Johannes Mitsch,Nadja Klein,Luise Kärger,Gerhard Neumann
备注：9 pages + appendix, 7 figures. Submitted to the 40th Conference on Neural Information Processing Systems (NeurIPS 2026)
摘要：Graph Network Simulators (GNSs) have emerged as powerful surrogates for complex physics-based simulation, offering inherent differentiability and orders-of-magnitude speedups over traditional solvers. However, GNSs typically assume access to the underlying material parameters, such as stiffness or viscosity, severely limiting their utility in realistic experimental settings. While recent meta-learning approaches address the parameter dependency by inferring properties from mesh trajectories, reconstructing a mesh from an observed scene is challenging. In this work, we introduce Point Cloud Encoding for Accurate Context Handling (PEACH), a novel framework that applies in-context learning on point clouds to adapt a learned simulator to unseen physical properties during inference. Our approach relies on a novel spatio-temporal point cloud sequence encoder, as well as two forms of auxiliary supervision to help improve simulation fidelity. We demonstrate that PEACH is capable of accurate zero-shot sim-to-real transfer on a challenging, dynamic scene. Experiments on simulation scenes show that PEACH even outperforms mesh-based baselines on prediction accuracy, while being much more practical for real-world deployment.

【4】NeighborDiv: Training-free Zero-shot Generalist Graph Anomaly Detection via Neighbor Diversity
Link: https://arxiv.org/abs/2605.20879

Authors: Kaifeng Wei,Teng Liu,Liang Dong,Xiubo Liang,Yuke Li
Abstract: Graph Anomaly Detection (GAD) is increasingly shifting to Generalist GAD (GGAD) for cross-domain "one-for-all" detection, but existing GGAD methods predominantly rely on the neighbor consistency principle, falling into the Node-to-Neighbor Consistency Paradigm for anomaly quantification. These methods suffer from complex training pipelines, heavy training-data dependency, high computational costs, and unstable cross-domain generalization. To address these limitations, we propose NeighborDiv, a training-free generalist graph anomaly detection framework based on neighbor diversity. Departing from the dominant Node-to-Neighbor Consistency Paradigm, we shift the focus to the Neighbor-to-Neighbor Diversity Paradigm and uncover that the internal structural dispersion of a node's neighbor set is a powerful, independently discriminative anomaly signal. We quantify neighbor diversity via the variance of inter-neighbor feature similarities, which captures how a node organizes its local graph environment and operates independently of conventional node-to-neighbor consistency frameworks. Extensive experiments under two standard GGAD evaluation paradigms show that NeighborDiv achieves state-of-the-art performance, with relative gains over the second-best baseline of 10.25% in average AUC and 17.78% in average AP under Single-Domain Independent Training (SDIT), and of 6.89% and 9.58% in AUC and AP, respectively, under Unified Multi-Domain Training (UMDT). Notably, NeighborDiv yields zero performance volatility across all datasets, eliminating training-set dependency and establishing a lightweight and highly practical GGAD framework.
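A minimal sketch of the neighbor-diversity score as the abstract describes it: for each node, compute the similarity between every pair of its neighbors' features and take the variance of those similarities. Using cosine similarity, the population variance, and a score of 0.0 for nodes with fewer than two neighbor pairs are our assumptions; the abstract does not say here whether high or low diversity is flagged as anomalous, so the function only returns scores.

```python
import itertools
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def neighbor_diversity(adj, feats):
    """Variance of pairwise cosine similarities among each node's neighbors.

    adj:   dict node -> list of neighbor nodes
    feats: dict node -> feature vector (list of floats)
    """
    scores = {}
    for node, nbrs in adj.items():
        # Similarities between every unordered pair of this node's neighbors.
        sims = [cosine(feats[a], feats[b]) for a, b in itertools.combinations(nbrs, 2)]
        # Fewer than two pairs: no meaningful spread, so score 0.0 (our convention).
        scores[node] = statistics.pvariance(sims) if len(sims) >= 2 else 0.0
    return scores
```

No training is involved: the score is a fixed function of the graph and its features, which is what makes this kind of detector training-free.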

【5】WaveGraphNet: Physics-Consistent Guided-Wave Damage Localization through Coupled Inverse-Forward Graph Learning
Link: https://arxiv.org/abs/2605.20311

Authors: Vinay Sharma,Aditya Bharade,Olga Fink
Abstract: Guided-wave structural health monitoring enables damage localization in composite plates using sparse networks of bonded piezoelectric transducers. However, inferring the spatial location of defects from pitch-catch measurements remains weakly constrained when only a limited set of damage locations is available for training. As a result, models trained to predict defect locations may perform well on seen cases but generalize poorly to unseen regions of the structure. This paper proposes WaveGraphNet, a coupled inverse-forward graph learning framework for guided-wave damage localization in Carbon Fiber Reinforced Polymer (CFRP) plates. The sensing layout is explicitly modeled as a graph, where transducers are represented as nodes and measured propagation paths define the graph connectivity. An inverse branch maps graph-structured spectral descriptors of differential guided-wave responses to a damage location, while a forward branch predicts the path-wise energy-deviation patterns of measured wave responses associated with a candidate location. During training, the forward branch serves as a physics-consistent regularizer, discouraging location estimates that are numerically plausible but inconsistent with the measured redistribution of wave-response energy. This coupling encourages agreement between inferred damage coordinates and the underlying wave propagation behavior. Within this benchmark, the proposed graph-based formulation provides a strong localization model for sparse guided-wave sensing and demonstrates improved robustness in extrapolation to held-out regions compared to both non-graph and graph baselines. These results highlight the potential of coupled inverse-forward graph learning as an effective strategy for guided-wave localization under limited spatial coverage.

【6】Graph Transductive Sharpening: Leveraging Unlabeled Predictions in Node Classification
Link: https://arxiv.org/abs/2605.20248

Authors: Brown Zaz,Mar Gonzàlez I Català,Ferran Hernandez Caralt,Moshe Eliasof,Pietro Liò
Comments: 19 pages, 4 figures, 17 tables
Abstract: In the transductive setting, where the full graph is observed but node labels are only partially available, progress in semi-supervised node classification has largely focused on architectural innovation. In this paper, we revisit an orthogonal axis: the training objective. We start from a simple observation: transductive models produce predictions for every node during training, including nodes without labels. These unlabeled-node predictions may contain useful training signal, but standard supervised objectives discard them because no ground-truth labels are available. Inspired by the decomposition of cross-entropy into a label-dependent alignment term and a label-independent entropy term, we propose prediction confidence as a natural way to extract this signal in the absence of labels. This motivates Transductive Sharpening (TS): a loss-level modification that minimizes prediction entropy on unlabeled nodes while counterbalancing this effect on labeled nodes. We evaluate Transductive Sharpening across a wide range of node-classification benchmarks and observe consistent performance improvements without requiring any changes to the backbone architecture. Code is available at https://github.com/transductive-sharpening/tunedGNN.
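A minimal sketch of the core idea, assuming a softmax output per node: standard cross-entropy on labeled nodes plus an entropy penalty on unlabeled nodes, weighted by a hypothetical coefficient `alpha`. The abstract does not specify the form of the counterbalancing term on labeled nodes, so it is deliberately left out; this is an illustration, not the authors' loss.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def sharpened_loss(logits, labels, alpha=0.1):
    """Mean cross-entropy on labeled nodes + alpha * mean entropy on unlabeled nodes.

    logits: list of per-node logit vectors (every node in the graph)
    labels: dict node_index -> class index (only labeled nodes appear)
    alpha:  hypothetical weight of the sharpening (entropy) term
    """
    ce, ent = [], []
    for i, z in enumerate(logits):
        p = softmax(z)
        if i in labels:
            ce.append(-math.log(p[labels[i]]))  # supervised term
        else:
            ent.append(entropy(p))  # label-free confidence term
    ce_term = sum(ce) / len(ce) if ce else 0.0
    ent_term = sum(ent) / len(ent) if ent else 0.0
    return ce_term + alpha * ent_term
```

Because the entropy term needs no labels, minimizing it pushes every unlabeled prediction toward a confident (sharper) distribution while the backbone stays unchanged.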

【7】GraphDiffMed: Knowledge-Constrained Differential Attention with Pharmacological Graph Priors for Medication Recommendation
Link: https://arxiv.org/abs/2605.20188

Authors: Krati Saxena,Tomohiro Shibata
Abstract: Recommending safe and effective medication combinations from electronic health records (EHRs) is a core clinical AI problem, yet it remains difficult because patient trajectories are long, noisy, and clinically heterogeneous. Existing methods typically excel at either temporal modeling across visits or pharmacological knowledge integration (e.g., drug-drug interactions, DDIs), but rarely achieve both while robustly suppressing noise. We present GraphDiffMed, a knowledge-constrained medication recommendation framework built on dual-scale Differential Attention v2. Differential attention is applied at both intra-visit and inter-visit levels to filter spurious signals within encounters and across longitudinal history, while pharmacological constraints are incorporated during learning. Experiments on MIMIC-III and ablation studies show that this design consistently improves recommendation quality and ranking over strong baselines while achieving a more favorable safety-performance balance. We further find that the strongest-performing configuration uses only demographic auxiliary features under our experimental setting. Overall, GraphDiffMed demonstrates that combining noise-aware attention with pharmacological constraints yields more reliable and clinically meaningful medication recommendation. We open-source our code at https://github.com/saxenakrati09/GraphDiffMed.
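For readers unfamiliar with differential attention, the sketch below implements the core formula of the original Differential Transformer: two softmax attention maps are computed from separate query/key projections, the second is subtracted with weight `lam`, and the difference is applied to the values. GraphDiffMed builds on a later "v2" variant and adds pharmacological constraints, neither of which is reproduced here; this pure-Python version for small matrices only illustrates the basic mechanism.

```python
import math


def matmul(a, b):
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)
        e = [math.exp(v - mx) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def differential_attention(q1, k1, q2, k2, v, lam):
    """(softmax(Q1 K1^T / sqrt d) - lam * softmax(Q2 K2^T / sqrt d)) V."""
    scale = 1.0 / math.sqrt(len(q1[0]))
    a1 = softmax_rows([[x * scale for x in r] for r in matmul(q1, transpose(k1))])
    a2 = softmax_rows([[x * scale for x in r] for r in matmul(q2, transpose(k2))])
    # Subtracting the second map cancels attention mass common to both ("noise").
    diff = [[x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    return matmul(diff, v)
```

With `lam = 0` this reduces to ordinary scaled dot-product attention; with identical projections and `lam = 1` everything cancels, which shows how shared attention mass is removed.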

【8】Velocityformer: Broken-Symmetry-Matched Equivariant Graph Transformers for Cosmological Velocity Reconstruction
Link: https://arxiv.org/abs/2605.21483

Authors: Tilman Tröster,David Mirkovic,Veronika Oehl,Arne Thomsen
Abstract: Precise measurement of the kinematic Sunyaev-Zel'dovich (kSZ) effect, a probe of the large-scale distribution of baryonic matter and a key observable for cosmological inference, requires accurate reconstruction of galaxy velocities from spectroscopic surveys. The signal-to-noise ratio (SNR) of kSZ measurements scales directly with the correlation coefficient $r$ between reconstructed and true velocities. We introduce Velocityformer, an equivariant graph transformer architecture designed to match the specific symmetry of the observational data. While the underlying physics is equivariant with respect to translations and rotations, observational effects break this symmetry due to the preferred line-of-sight direction. Matching the model's inductive bias to the data's broken symmetry consistently improves performance across all model sizes and training volumes, with Velocityformer improving $r$ by 35% over the standard linear theory baseline and outperforming ML baselines at every data volume. By matching the model's inductive bias to the data and conditioning on the physics-based long-wavelength solution, Velocityformer is highly data-efficient, training to high accuracy on as few as 4 low-fidelity simulations, and generalises zero-shot across input geometry, cosmological parameters, and galaxy sample. On high-fidelity simulated galaxy catalogues, this yields a 30% improvement in $r$ over the physical baseline, directly translating to the same SNR gain on observational data.
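The figure of merit here is the Pearson correlation coefficient $r$ between reconstructed and true velocities. A minimal pure-Python sketch follows; treating each galaxy's (line-of-sight) velocity as one scalar sample is our assumption about how the inputs are arranged.

```python
import math


def correlation_coefficient(v_rec, v_true):
    """Pearson r between reconstructed and true velocities (equal-length sequences)."""
    n = len(v_rec)
    mr = sum(v_rec) / n
    mt = sum(v_true) / n
    cov = sum((a - mr) * (b - mt) for a, b in zip(v_rec, v_true))
    var_r = sum((a - mr) ** 2 for a in v_rec)
    var_t = sum((b - mt) ** 2 for b in v_true)
    return cov / math.sqrt(var_r * var_t)
```

Because the kSZ SNR scales directly with $r$, a relative improvement in $r$ (such as the reported 30%) carries over one-to-one to the SNR.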

【9】Spectral bandits for smooth graph functions with applications in recommender systems
Link: https://arxiv.org/abs/2605.20552

Authors: Tomáš Kocák,Michal Valko,Rémi Munos,Branislav Kveton,Shipra Agrawal
Comments: Published at AAAI 2014 - SDMBD
Abstract: Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each recommended item is a node and its expected rating is similar to those of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly in this dimension. Our experiments on real-world content recommendation problems show that a good estimator of user preferences for thousands of items can be learned from evaluations of just tens of nodes.

【10】Contradiction Graphs Determine VC Dimension
Link: https://arxiv.org/abs/2605.20434

Authors: Jesse Campbell,Daniel Ibaibarriaga,Lev Reyzin
Abstract: We study the contradiction graphs associated with binary concept classes. For a class $H \subseteq \{0,1\}^X$, the order-$m$ contradiction graph $G_m(H)$ has as vertices the $H$-realizable labeled sequences of length $m$, with two vertices adjacent when the two sequences assign opposite labels to some common domain point. Our main result is that the single graph $G_m(H)$ determines the threshold predicate $\mathrm{VCdim}(H)\ge m$. Consequently, the full sequence $(G_m(H))_{m \ge 1}$ determines the exact VC dimension and, in particular, detects finite versus infinite VC dimension, answering a question posed by Alon et al. (2024).
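To make the definitions concrete, the brute-force sketch below builds $G_m(H)$ for a finite domain and a finite class (each hypothesis given as a dict from points to labels) and, separately, computes the VC dimension by checking which subsets are shattered. It only constructs the objects in the definitions; it does not implement the paper's argument for reading $\mathrm{VCdim}(H)\ge m$ off the graph. The example class (thresholds on three points) is our choice.

```python
import itertools


def realizable(seq, H):
    """True if some hypothesis h in H agrees with every (x, y) pair in seq."""
    return any(all(h[x] == y for x, y in seq) for h in H)


def contradiction_graph(H, X, m):
    """Vertices: H-realizable labeled sequences of length m.
    Edges: pairs of sequences giving opposite labels to a common domain point."""
    vertices = []
    for xs in itertools.product(X, repeat=m):
        for ys in itertools.product((0, 1), repeat=m):
            seq = tuple(zip(xs, ys))
            if realizable(seq, H):
                vertices.append(seq)
    edges = set()
    for s, t in itertools.combinations(vertices, 2):
        if any(x1 == x2 and y1 != y2 for x1, y1 in s for x2, y2 in t):
            edges.add((s, t))
    return vertices, edges


def vc_dimension(H, X):
    """Largest k such that some k-subset of X is shattered by H."""
    best = 0
    for k in range(1, len(X) + 1):
        if any(len({tuple(h[x] for x in S) for h in H}) == 2 ** k
               for S in itertools.combinations(X, k)):
            best = k
        else:
            break  # subsets of shattered sets are shattered, so no larger k works
    return best


# Example: threshold functions h_t(x) = 1 iff x >= t on X = {0, 1, 2}.
X = [0, 1, 2]
H = [{x: int(x >= t) for x in X} for t in range(4)]
```

For this class every single labeled point is realizable (6 vertices in $G_1$), the only contradictions are the 3 same-point opposite-label pairs, and the VC dimension is 1.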

Transformer (13 papers)

【1】Fast and Stable Triangular Inversion for Delta-Rule Linear Transformers
Link: https://arxiv.org/abs/2605.21325

Authors: Aleksandros Sobczyk,Gioele Gottardo,Christos K. Matzoros,Mirko De Vita,Filip Skogh,Anastasios Zouzias,Jiawei Zhuang
Comments: Preprint
Abstract: Linear attention has emerged as a cornerstone for efficient long-context architectures, as evidenced by its integration into state-of-the-art open-source models including Qwen3.5/3.6, Kimi Linear, and RWKV-7. Models that incorporate linear attention layers with the so-called Delta-Rule involve the inversion of triangular matrices as a core subroutine. This operation often forms a performance bottleneck and, due to its high sensitivity to numerical errors, can significantly deteriorate end-to-end model accuracy if it is not carefully implemented. This work provides a systematic analysis of both direct and iterative triangular inversion algorithms, targeting methods that are rich in matrix products and therefore have the potential to utilize modern hardware efficiently. To that end, our analysis covers a broad spectrum of mathematical and practical aspects, with a heavy focus on numerical stability, computational complexity, and, ultimately, hardware efficiency and practical considerations. We provide a rigorous experimental evaluation to verify these properties in practical scenarios and in low-precision floating-point representations, highlighting the strengths and limitations of each method. Performance benchmarks on NPUs reveal up to a $4.3\times$ speed-up over the state-of-the-art SGLang implementations of triangular matrix inversion, leading to significant performance improvements at the level of the entire layer, while maintaining full end-to-end model accuracy.
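As one textbook example of a matrix-product-rich inversion method (not necessarily one the paper analyzes), write a unit lower-triangular matrix as $L = I + N$ with $N$ strictly lower triangular. Since $N^n = 0$, $L^{-1} = \sum_{k<n}(-N)^k = \prod_{j<J}\bigl(I + (-N)^{2^j}\bigr)$ whenever $2^J \ge n$, which costs about $2\lceil\log_2 n\rceil$ matrix products. That delta-rule kernels need inverses of exactly this unit-lower-triangular form is our assumption about the setting; the sketch checks the doubling scheme against plain forward substitution.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inv_unit_lower_doubling(L):
    """Invert unit lower-triangular L = I + N via (I + A)(I + A^2)(I + A^4)..., A = -N."""
    n = len(L)
    A = [[-L[i][j] if i > j else 0.0 for j in range(n)] for i in range(n)]
    result = identity(n)
    power = A   # holds A^(2^j)
    span = 1    # Neumann-series terms A^0 .. A^(span-1) covered so far
    while span < n:
        factor = [[(1.0 if i == j else 0.0) + power[i][j] for j in range(n)] for i in range(n)]
        result = matmul(result, factor)
        power = matmul(power, power)
        span *= 2
    return result


def inv_unit_lower_forward(L):
    """Reference: solve L x = e_col column by column with forward substitution."""
    n = len(L)
    X = identity(n)
    for col in range(n):
        for i in range(n):
            rhs = 1.0 if i == col else 0.0
            X[i][col] = rhs - sum(L[i][k] * X[k][col] for k in range(i))
    return X
```

Forward substitution is inherently sequential, whereas the doubling scheme is a handful of large matrix products; on the other hand, powers of $N$ can grow quickly, which is one way the numerical-stability concerns in the abstract arise.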

【2】OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization
Link: https://arxiv.org/abs/2605.21226

Authors: Mark Boss,Vikram Voleti,Simon Donné,Shimon Vainer
Abstract: The key-value (KV) cache dominates memory bandwidth and footprint in long-context autoregressive inference. Recent rotation-preconditioned codecs (TurboQuant, PolarQuant) show that a structured random rotation followed by a per-coordinate scalar quantizer matched to an analytically tractable marginal is a near-optimal recipe for KV compression. OCTOPUS advances this paradigm through joint quantization of rotated coordinate triplets. Each triplet's direction is mapped to a square via an octahedral parameterization, and the two resulting coordinates and the triplet norm are Lloyd-Max quantized against implementation-matched marginals. Optimizing the per-triplet squared error gives a strictly non-uniform bit allocation that depends only on the total dimensionality of the keys. Using sweeps, we find that the finite-dimensional quality optimum is constant across every real decoder we test. The codec is data-oblivious, online, and deterministic given a seed. Across text, video, and audio, OCTOPUS matches or beats every prior rotation codec at every reported bit width and metric, with a lead that grows as the bit width drops toward extreme compression. Furthermore, a fused Triton implementation reconstructs keys on the fly without materializing the uncompressed key, so the codec adds no decode-time bandwidth or latency over the existing dequantization. Project Page: https://octopus-quant.github.io/
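The octahedral step can be illustrated with the standard octahedral unit-vector encoding from computer graphics: project a 3-vector onto the L1 unit sphere (an octahedron), then fold the lower half onto the square $[-1,1]^2$. In the sketch below the triplet norm is kept separately, as the abstract describes. The uniform quantizer is a simplified stand-in for the Lloyd-Max quantizers matched to implementation-specific marginals, so this illustrates the parameterization rather than reproducing OCTOPUS.

```python
import math


def _sign(a):
    return 1.0 if a >= 0 else -1.0  # convention: sign(0) = +1


def oct_encode(x, y, z):
    """Map a nonzero 3-vector to (norm, u, w) with (u, w) in [-1, 1]^2."""
    norm = math.sqrt(x * x + y * y + z * z)
    l1 = abs(x) + abs(y) + abs(z)
    px, py, pz = x / l1, y / l1, z / l1  # point on the octahedron |x|+|y|+|z| = 1
    if pz < 0:  # fold the lower hemisphere over the square's diagonals
        px, py = (1 - abs(py)) * _sign(px), (1 - abs(px)) * _sign(py)
    return norm, px, py


def oct_decode(norm, u, w):
    z = 1 - abs(u) - abs(w)
    x, y = u, w
    if z < 0:  # undo the fold
        x, y = (1 - abs(w)) * _sign(u), (1 - abs(u)) * _sign(w)
    scale = norm / math.sqrt(x * x + y * y + z * z)
    return x * scale, y * scale, z * scale


def uniform_quantize(a, bits, lo=-1.0, hi=1.0):
    """Stand-in scalar quantizer: snap to the midpoint of one of 2**bits equal bins."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    idx = min(levels - 1, max(0, int((a - lo) / step)))
    return lo + (idx + 0.5) * step
```

The appeal of the square is that two bounded scalars describe a direction on the sphere with no wasted codes, so each can be handed to an ordinary scalar quantizer.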

【3】HORST: Composing Optimizer Geometries for Sparse Transformer Training
Link: https://arxiv.org/abs/2605.21104

Authors: Tom Jacobs,Rohan Jain,Rebekka Burkholz
Comments: 22 pages, 8 figures
Abstract: Sparsifying transformers remains a fundamental challenge, as standard optimizers fail to simultaneously encourage sparsity and maintain training stability. Effective adaptive optimizers exhibit an implicit $L_{\infty}$ bias favoring stability, yet sparsity requires an $L_1$ bias. To integrate sparsity, we propose a composition of optimizer steps, which we cast as non-commutative operators to analyze and combine their optimization geometry in a principled way. This yields HORST (Hyperbolic Operator for Robust Sparse Training), a modular optimizer that inherits stability from adaptive methods while inducing an $L_1$ sparsity bias through a hyperbolic mirror map. Our experiments demonstrate its utility for sparse training of transformers on both vision and language tasks. HORST consistently and significantly outperforms AdamW baselines across all sparsity levels, with large gains at higher sparsity.
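HORST's exact operator composition is not given in the abstract. As background, the sketch below shows one well-known hyperbolic mirror map, often called the hyperbolic-entropy potential, whose gradient is $\operatorname{arcsinh}(w/\beta)$, applied to a plain gradient step. How HORST composes such a step with an adaptive optimizer is not reproduced, and `beta` and `lr` are illustrative parameters.

```python
import math


def hyperbolic_mirror_step(w, grad, lr, beta):
    """One mirror-descent step under the hyperbolic-entropy mirror map.

    The mirror map's gradient is arcsinh(w / beta): the step is taken in that
    dual space and mapped back with beta * sinh(.). Per coordinate, the effective
    step size scales like sqrt(w**2 + beta**2), so weights near zero move slowly
    while large weights move almost multiplicatively: an L1-like, sparsity-favoring bias.
    """
    return [beta * math.sinh(math.asinh(wi / beta) - lr * gi) for wi, gi in zip(w, grad)]


# Same gradient, very different movement for a small and a large weight.
new = hyperbolic_mirror_step([0.001, 1.0], [1.0, 1.0], lr=0.01, beta=1e-3)
```

A smaller `beta` strengthens the effect: coordinates near zero barely move, which is why this geometry is used to encourage sparse solutions.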

【4】Musical Attention Transformer: Music Generation Using a Music-Specific Attention Model
Link: https://arxiv.org/abs/2605.21081

Authors: Shinnosuke Taksuka,Hideo Mukai
Comments: 32 pages, 13 figures
Abstract: This study aims to enhance the quality of music generation using Transformers by incorporating meta-information. While Transformer-based approaches are effective at capturing long-term dependencies in musical compositions, the music they generate often suffers from issues such as excessive repetition or duplication of notes, leading to unnatural melodies. To address these limitations, we propose Musical Attention, a mechanism that incorporates meta-information such as bar numbers, key signatures, and tempos into the attention process. Musical Attention explicitly leverages both the structural properties of music and its associated metadata, enabling the Transformer's attention mechanism to operate more effectively and thereby improving the quality of the generated output. In our framework, each musical note is represented as a combination of five events (pitch, bar number, onset, duration, and velocity) in addition to the three metadata elements. The attention mechanism is then modified to reflect the correlations among these eight features, allowing the model to better capture the inherent characteristics of musical composition. Experimental results demonstrate that the model incorporating Musical Attention outperforms prior methods, such as Full Attention and Strided Attention, in terms of musical coherence, variation, and overall quality. Notably, it significantly reduces repetition and enhances the model's ability to generate diverse, harmonically consistent melodies. Musical Attention thus represents a meaningful advancement in AI-driven music generation, facilitating the creation of more natural and expressive compositions.

【5】Genetic Programming with Transformer-Based Mutation for Approximate Circuit Design
Link: https://arxiv.org/abs/2605.21055

Authors: Ondrej Galeta,Lukas Sekanina
Comments: To appear at IEEE World Congress on Computational Intelligence, Congress on Evolutionary Computation, Maastricht, NL, 2026
Abstract: A recent trend is to leverage machine learning models to improve the evolutionary design and optimization process. We propose a novel transformer-based mutation operator for Cartesian genetic programming (CGP) for the automated design of approximate arithmetic circuits. We introduce a hybrid scheme for CGP in which the proposed mutation operator alternates with the standard mutation operator to prevent stagnation of the circuit approximation process. We also develop a new training scheme for the underlying transformer that uses training vectors composed of thousands of CGP chromosomes representing various approximate multipliers. For several target error constraints, the approximate multipliers evolved with CGP using the transformer-based mutation achieve better trade-offs than the highly optimized designs available in the state-of-the-art EvoApproxLib library of approximate circuits. Although both the training and evolutionary processes are computationally demanding, they appear to be necessary steps for improving existing approximate circuits and producing new, potentially patentable circuit designs.

【6】A Sharper Picture of Generalization in Transformers
Link: https://arxiv.org/abs/2605.20988

Authors: Paul Lintilhac,Sair Shaikh
Abstract: We study transformers' generalization behavior on boolean domains from the perspective of the Fourier spectra of their target functions. In contrast to prior work (Edelman et al., 2022; Trauger and Tewari, 2024), which derived generalization bounds from Rademacher complexity, we investigate the feasibility of obtaining generalization bounds via PAC-Bayes theory. We show that sparse spectra concentrated on low-degree components enable low-sharpness constructions with good generalization properties. Our idea is to show the existence of flat minima implementing any boolean function of sparsity no greater than the context length, and then apply a PAC-Bayes bound to an idealized low-sharpness learner, resulting in a non-vacuous generalization bound. We evaluate these predictions empirically and conduct a mechanistic interpretability study to support the realism of our theoretical construction in real transformers.
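To see what a "sparse, low-degree Fourier spectrum" means concretely: every $f:\{-1,1\}^n\to\{-1,1\}$ expands as $f(x)=\sum_{S}\hat f(S)\prod_{i\in S}x_i$ with $\hat f(S)=\mathbb{E}_x\bigl[f(x)\prod_{i\in S}x_i\bigr]$. The brute-force sketch below computes these coefficients exactly for small $n$, along with sparsity (number of nonzero coefficients) and degree (largest $|S|$ with a nonzero coefficient). It illustrates the standard definitions only, not the paper's construction or bound.

```python
import itertools
from fractions import Fraction


def fourier_coefficients(f, n):
    """Exact Fourier coefficients hat f(S) of f: {-1,1}^n -> {-1,1}, keyed by index tuples S."""
    points = list(itertools.product((-1, 1), repeat=n))
    coeffs = {}
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            total = 0
            for x in points:
                chi = 1  # parity character chi_S(x) = prod_{i in S} x_i
                for i in S:
                    chi *= x[i]
                total += f(x) * chi
            coeffs[S] = Fraction(total, len(points))
    return coeffs


def sparsity_and_degree(coeffs):
    support = [S for S, c in coeffs.items() if c != 0]
    return len(support), max((len(S) for S in support), default=0)


# Majority of 3 bits: 1/2 on each singleton and -1/2 on the full set.
maj = lambda x: 1 if sum(x) > 0 else -1
```

Parity on all $n$ bits is the extreme case: sparsity 1 but degree $n$, whereas majority spreads its weight over several coefficients.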

【7】Markovian Circuit Tracing for Transformer State Dynamic
Link: https://arxiv.org/abs/2605.20824

Authors: Abdullah X
Abstract: Many sequence computations are easier to study as movement through internal states than as isolated local circuits. We introduce Markovian Circuit Tracing (MCT), a diagnostic pipeline for testing whether transformer activations contain coarse state-transition structure. The benchmark uses synthetic Hidden Markov Model (HMM) tasks where latent states, transition matrices, Bayesian belief vectors, Bayes-optimal predictions, and forced-state counterfactual targets are known exactly. Across six HMM families and three seeds per family, tiny causal transformers learn near-Bayes next-token predictors, with a mean excess loss over Bayes of 0.0138. Residual activations contain partial Bayesian belief information in this controlled synthetic benchmark. State abstractions extracted from these activations recover a coarse transition signal, strongest in persistent and lower-state regimes and weaker in ambiguous-emission and six-state regimes. The clearest result comes from state forcing. Patching a recovered-state centroid reduces the KL divergence to the exact HMM counterfactual target from 0.1957 in the unpatched model to 0.0532 on average, beating wrong-state, mean-activation, random-activation, and shuffled-label controls. The contribution is a controlled benchmark and evaluation framework for transformer state-dynamics interpretability, with MCT as a simple reference pipeline.
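Because the HMM parameters are known, the Bayesian belief vectors and Bayes-optimal next-token predictions the benchmark compares against follow from standard forward filtering. A minimal sketch, assuming the belief is over the state that emitted the latest token and that "forcing" a state means replacing the belief with a one-hot vector (our reading of the forced-state counterfactual, not necessarily the paper's exact protocol):

```python
import math


def predict_next(belief, T, E):
    """Bayes-optimal next-observation distribution and next-state prior.

    T[i][j] = P(next state j | state i), E[j][o] = P(observation o | state j).
    """
    n = len(belief)
    prior = [sum(belief[i] * T[i][j] for i in range(n)) for j in range(n)]
    n_obs = len(E[0])
    return [sum(prior[j] * E[j][o] for j in range(n)) for o in range(n_obs)], prior


def update_belief(belief, T, E, obs):
    """One forward-filtering step: propagate through T, then condition on obs."""
    _, prior = predict_next(belief, T, E)
    unnorm = [prior[j] * E[j][obs] for j in range(len(prior))]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# A forced-state counterfactual target is then predict_next(one_hot_state, T, E)[0].
```

A patched model can then be scored with `kl(target, model_prediction)`, which is the kind of quantity the reported 0.1957 to 0.0532 reduction measures.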

【8】Most Transformer Modifications Still Do Not Transfer at 1-3B: A 2020-2026 Update to Narang et al. (2021) with Downstream Evaluation and a Noise Floor
Link: https://arxiv.org/abs/2605.20798

Authors: Yang Zhao,Jiahao Lu,Bin Huang,Guhua Zhang,Jie Zhou
Comments: 19 pages, 3 figures, under review at EMNLP 2026
Abstract: Narang et al. (2021) evaluated 40+ Transformer modifications at T5-base scale and concluded that most did not transfer. Five years later, the typical working regime has moved to 1-3B parameters, downstream evaluation has replaced pretraining perplexity, and a substantially different catalogue of modifications has emerged. We revisit their question by testing 20 post-2021 Transformer modifications at 1.2B and 3B under strict iso-data, iso-compute, iso-recipe control, with a multi-seed baseline noise floor and CLIMB-12 downstream evaluation as the primary metric. The central finding reproduces theirs on this curated set: most modifications do not transfer. Of the 20 modifications, only two clear Bonferroni correction at 1.2B; one of those two further fails to train stably at 3B under the shared recipe. We also find that the loss-downstream gap reported by Tay et al. (2023) enlarges several-fold for attention-output modifications: two significant failures converge to within 2-3% of baseline validation loss yet drop 6-16 CLIMB points. We conclude that noise-floor reporting, downstream evaluation, and cross-scale stability testing are now prerequisites for architecture comparisons at 1-3B.
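The two statistical guards mentioned above can be sketched as follows. Bonferroni divides the significance level by the number of comparisons (20 modifications here, so a nominal 0.05 becomes 0.0025). The noise-floor check, with its hypothetical multiplier `k` on the baseline's across-seed standard deviation, is our illustration rather than the paper's exact criterion.

```python
import statistics


def bonferroni_significant(p_values, alpha=0.05):
    """Flag which tests survive Bonferroni correction; also return the threshold."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values], threshold


def exceeds_noise_floor(baseline_scores, score, k=2.0):
    """Hypothetical rule: score must beat the multi-seed baseline mean by k sample stds."""
    mean = statistics.mean(baseline_scores)
    std = statistics.stdev(baseline_scores)  # needs at least two seeds
    return score > mean + k * std
```

For example, a modification with $p = 0.01$ would look significant in isolation but does not survive the corrected threshold of 0.0025.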

【9】LT2: Linear-Time Looped Transformers
Link: https://arxiv.org/abs/2605.20670

Authors: Chunyuan Deng,Yizhe Zhang,Rui-Jie Zhu,Yuanyuan Xu,Jiarui Liu,T. S. Eugene Ng,Hanjie Chen
Abstract: Looped Transformers (LT) have emerged as a powerful architecture that iterates its layers multiple times before decoding the final token. However, pairing them with full attention retains quadratic complexity, making them computationally expensive and slow. We introduce LT2 (Linear-Time Looped Transformers), a family of looped architectures that replace quadratic softmax attention with subquadratic, linear-time attention. We study two variants: LT2-linear with linear attention and LT2-sparse with sparse attention. We find that looping uniquely synergizes with these variants: it enables iterative memory refinement in linear attention and progressively expands the effective receptive field in sparse attention. We formalize these benefits theoretically and demonstrate consistent empirical gains across controlled recall, state-tracking, and language modeling tasks. We then explore LT2-hybrid, which combines different attention variants in a looped setting. Two variants are especially promising: LT2-hybrid (GDN+DSA), which interleaves linear and sparse attention to maximize efficiency and matches the standard looped transformer's quality at fully linear-time cost; and LT2-hybrid (Full+GDN), which interleaves GDN with a small fraction of full attention layers to maximize quality, surpassing the standard looped transformer in both performance and efficiency. We also show how to convert a pre-trained LT into an LT2-hybrid model. With about 1B tokens of training, our converted model, Ouro-hybrid-1.4B, outperforms industry-level 1B models and is competitive with industry-level 4B models while retaining the speed benefits of linear-time attention. Together, these results show a clear path toward making looped transformers more scalable and advancing efficient, capable small language models.

【10】RoPeSLR: 3D RoPE-driven Sparse-LowRank Attention for Efficient Diffusion Transformers
Link: https://arxiv.org/abs/2605.20659

Authors: Yuxi Liu,Zekun Zhang,Yixiang Cai,Renjia Deng,Yutong He,Kun Yuan
Abstract: Diffusion Transformers (DiTs) have revolutionized high-fidelity video generation, yet their $\mathcal{O}(L^2)$ attention complexity poses a formidable bottleneck for long-sequence synthesis. While recent sparse-linear attention hybrids aim to mitigate this, their performance degrades severely at extreme sparsity due to the "RoPE Dilemma": standard linear attention fails to preserve the orthogonal relative-position structure of 3D Rotary Position Embeddings (RoPE), neutralizing vital distance awareness. To address this, we propose RoPeSLR, a 3D RoPE-guided Sparse-LowRank attention framework. We establish that, under empirically validated assumptions, the DiT attention manifold admits a decoupling into a high-frequency semantic spike set (bounded by $\mathcal{O}(L^{3/2})$ sparsity) and an extremely low-rank ($\mathcal{O}(d_h \log L)$) background continuum. Guided by this structural prior, RoPeSLR eschews standard linear attention for a head-wise low-rank parameterization equipped with a learnable 3D Absolute Positional Embedding (PE) injection, seamlessly synthesizing long-range relative distance decay. By guaranteeing sub-quadratic sparsity and sub-linear rank growth, RoPeSLR is exceptionally well suited for scaling to ultra-long video inference. Extensive evaluations validate this scalability: at 90% sparsity, RoPeSLR achieves up to $10\times$ fewer FLOPs on Wan2.1-1.3B and delivers a $2.26\times$ end-to-end inference speedup on the ultra-long 100K+ token sequences of HunyuanVideo-13B, all while maintaining near-lossless generation fidelity (less than 1.3% average VBench degradation).
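The "RoPE Dilemma" rests on a property that is easy to check numerically: after rotary embedding, the query-key dot product depends only on the relative offset between positions. The 1D sketch below (standard RoPE with base 10000, applied pairwise to an even-length vector) verifies this. A 3D variant applies the same kind of rotation to separate slices of the head dimension for time, height, and width, and passing the rotated vectors through a positive feature map, as kernelized linear attention typically does, generally breaks the property. This is background only, not RoPeSLR's low-rank parameterization.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate consecutive coordinate pairs of an even-length vector by pos * frequency."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # frequency base^(-2j/d) for pair j = i/2
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]
# Same offset (4) at different absolute positions gives the same score.
same = abs(dot(rope(q, 3), rope(k, 7)) - dot(rope(q, 10), rope(k, 14))) < 1e-9
```

Algebraically, $\langle R_m q, R_n k\rangle = q^\top R_{n-m} k$ because rotations are orthogonal, which is exactly the structure that a nonlinear feature map applied after rotation does not preserve.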

【11】Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics
Link: https://arxiv.org/abs/2605.20441

Authors: Lucky Verma
Comments: 28 pages, 11 figures, 5 tables. Code and aggregate JSONs: https://github.com/lucky-verma/grokking-diagnostics. Per-run JSONs: https://huggingface.co/datasets/lucky-verma/grokking-diagnostics-runs. Lean 4/mathlib v4.29.0 formal checks available in the code repository
Abstract: Transformers trained on modular arithmetic exhibit sharp transitions between memorization, generalization, and collapse. We show that weight decay acts as a scalar empirical control parameter for these regimes, and introduce two cheap online diagnostics, mean pairwise attention-head cosine similarity and entropy standard deviation, that track training dynamics from attention activations alone and complement loss-landscape diagnostics at lower compute cost. Across eleven experimental conditions and three model scales (0.82M to 85M parameters), the weight-decay axis separates memorization, developmental grokking, and collapse. A near-transition logistic fit localizes the memorization-to-developmental boundary at $λ_c=0.0158$ (95% CI [0.0109, 0.0200], N=210); a power-law fit gives an empirical exponent $ν=0.757$ (CI [0.725, 0.799]). Reference exponents $ν=1/2$ and 3D Ising $ν\approx 0.63$ lie outside this empirical CI under our four-bin grid, so we report $ν$ as empirical and defer universality-class identification to denser finite-size-scaling work. A horizon-matched multi-task replication (n=280, four modular operations) preserves the weight-decay control pattern; a paired attention-head re-initialization experiment at $λ=0.05$ changes Phase-2 amplitude (Cohen's $d=-1.190$, n=10, $p_t=4.5 \times 10^{-3}$), while matched weight-norm clipping does not. Three cross-architecture probes (4L MLP, 4L LSTM, and 4L Mamba; each n=70) replicate the weight-decay-controlled transition with architecture-specific $λ_c$ values. Main diagnostic claims are scoped to modular arithmetic in small transformer attention models; the non-attention experiments are scope probes, and architecture-wide, language-model, and universality-class claims are out of scope.
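Both diagnostics can be computed from attention weights alone. The abstract does not spell out the exact aggregation, so the sketch below assumes that each head's attention pattern is flattened into one vector for the cosine similarity, and that "entropy standard deviation" means the population standard deviation, across heads, of each head's mean per-query attention entropy. At least two heads are required.

```python
import itertools
import math
import statistics


def _flatten(m):
    return [v for row in m for v in row]


def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def head_diagnostics(attn):
    """attn: list over heads of (query x key) attention matrices whose rows sum to 1.

    Returns (mean pairwise cosine similarity between flattened head patterns,
             population std across heads of each head's mean row entropy).
    """
    flat = [_flatten(h) for h in attn]
    cos = [_cos(a, b) for a, b in itertools.combinations(flat, 2)]
    ent = [statistics.fmean(-sum(p * math.log(p) for p in row if p > 0) for row in h)
           for h in attn]
    return statistics.fmean(cos), statistics.pstdev(ent)
```

High mean cosine similarity signals redundant heads, and a low entropy spread signals heads with similar focus; both need only a forward pass, which is what makes them cheap to log online.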

【12】Plug-and-Play Spiking Operators: Breaking the Nonlinearity Bottleneck in Spiking Transformers
Link: https://arxiv.org/abs/2605.20289

Authors: Xinzhe Yuan,Xiang Peng,Bin Gu,Huan Xiong
Comments: Accepted to ICML 2026. 9 pages main paper, 8 pages appendix, 6 figures, 5 tables. Correspondence to Bin Gu and Huan Xiong
Abstract: ANN-to-SNN conversion offers a practical, training-free route to spiking large language models. However, current pipelines primarily focus on spike-driven realizations of Transformer linear-algebra operations, while providing limited support for key nonlinear operators. This gap limits compatibility with neuromorphic-style execution constraints, where such nonlinearities typically require division, exponentiation, or norm computations that are not naturally supported by standard leaky integrate-and-fire dynamics. To solve this problem, we propose a plug-and-play framework that implements spike-friendly approximations of Transformer nonlinearities and integrates into existing ANN-to-SNN pipelines. Our method decomposes these nonlinear computations into three recurring primitives (division, exponentiation, and $\ell_2$ norms) and realizes them via population computation using LIF neuron groups, combined with lightweight bit-shift scaling to avoid floating-point arithmetic. By composing these primitives as modular operator blocks, our framework supports common Transformer nonlinearities (e.g., Softmax, SiLU, and normalization) without any fine-tuning. Experiments on a range of Transformer LLMs show that selectively replacing the targeted nonlinear operators incurs less than a $1\%$ accuracy drop across all evaluated tasks.

【13】Large-Step Training Dynamics of a Two-Factor Linear Transformer Model
Link: https://arxiv.org/abs/2605.21292

Authors: Krishnakumar Balasubramanian
Abstract: Gradient-flow analyses show that simplified linear transformers can learn the in-context linear-regression algorithm, but they do not explain the finite-step behavior of gradient descent at large learning rates. Motivated by empirical work on high-learning-rate transformer instabilities and by the cubic-map phase diagram for quadratic regression, we study an exactly reducible one-prompt linear-transformer training problem. After normalization, the dynamics reduce to a two-factor product map with an effective step-size parameter $μ$. On the balanced slice, this map recovers the known scalar cubic transition from monotone convergence to catapult convergence, periodic and chaotic bounded nonconvergence, and divergence. We then analyze the full two-dimensional system and show that, for \(0
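As a toy illustration of the regimes named above (the paper's exact normalized map is not given in this digest), consider gradient descent with step size `mu` on the standard two-factor objective $\tfrac12(ab-1)^2$. Its update is $a \leftarrow a-\mu(ab-1)b$, $b \leftarrow b-\mu(ab-1)a$, and on the balanced slice $a=b=x$ it reduces to the cubic map $x \leftarrow x-\mu(x^2-1)x$. The function below iterates the map and reports a regime using thresholds of our own choosing.

```python
def gd_two_factor(a, b, mu, steps=2000, blowup=1e6, tol=1e-8):
    """Iterate GD on 0.5 * (a*b - 1)**2 and classify the outcome (toy thresholds)."""
    for _ in range(steps):
        r = a * b - 1.0  # residual of the product
        a, b = a - mu * r * b, b - mu * r * a  # simultaneous update
        if abs(a) > blowup or abs(b) > blowup:
            return "diverged", a, b
    status = "converged" if abs(a * b - 1.0) < tol else "bounded, not converged"
    return status, a, b
```

Small steps converge monotonically (for example from $a=b=0.5$ with $\mu=0.1$), while a large step from a poor start (for example $a=b=2$ with $\mu=1.5$) diverges within a few iterations; the intermediate catapult and oscillatory regimes can be explored by sweeping `mu`.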

GAN|Adversarial|Attack|Generation-related (7 papers)

【1】Disentangling Generation and Regression in Stochastic Interpolants for Controllable Image Restoration
Link: https://arxiv.org/abs/2605.21381

Authors: Yi Liu,Jia Ma,Wengen Li,Jihong Guan,Shuigeng Zhou,Yichao Zhang
Comments: 44 pages, 16 figures, 16 tables
Abstract: Recent advances in Image Restoration (IR) have been largely driven by generative methods such as Diffusion Models and Flow Matching, which excel at synthesizing realistic textures but suffer from slow multi-step inference and compromised pixel fidelity. In contrast, classical regression-based IR methods excel precisely in these aspects, offering single-step efficiency and high pixel-level reconstruction fidelity. To bridge this gap, we propose DiSI, a unified framework that Disentangles the underlying Stochastic Interpolant process into independent generation and regression components. This decoupling endows DiSI with remarkable versatility, enabling a continuous and controllable transition from a pure regression process to a fully generative one. Technically, we instantiate this framework with two specific sampling trajectories, accompanied by a unified sampler for high-quality, few-step inference on arbitrary trajectories. Furthermore, we design a dual-branch U-Net-style transformer network in pixel space, using a dedicated branch to enhance conditional guidance while ensuring high throughput. Extensive experiments demonstrate that DiSI efficiently achieves competitive results on various IR tasks, while uniquely offering the inference-time flexibility to control the distortion-perception trade-off within a single model.

【2】Domain-Adaptable Reinforcement Learning for Code Generation with Dense Rewards
标题：用于具有密集奖励的代码生成的域自适应强化学习
链接：https://arxiv.org/abs/2605.21180

作者：Erfan Aghadavoodi Jolfaei,Daniel Maninger,Abhinav Anand,Mert Tiftikci,Mira Mezini
备注：10 pages, 2 figures, under review
摘要：Large language models show strong potential for automated code generation, but lack guarantees for correctness, quality, safety, and domain-specific constraints. For instance, in robotics, where code generation is increasingly being used for planning and executing actions, awareness of the environment and physical constraints is critical. To facilitate the adaptation of code-generating LLMs to diverse requirements, including domain-specific ones, we present a reinforcement learning framework that fine-tunes pre-trained LLMs using proximal policy optimization. Our customizable execution-aware reward formula captures and optimizes syntax, functional correctness, code style, security, and simulator executability. A token-level reward mapping mechanism enables effective credit assignment from execution outcomes to generated tokens. The framework is evaluated on general-purpose code generation (MBPP/MBPP+) and robotic program synthesis (RoboEval). The results show substantial improvements in functional correctness and simulator executability, including an absolute pass@1 increase of 19% on MBPP and a reduction in execution failures by 51% on RoboEval. These findings demonstrate that structured reinforcement learning can effectively align language models to correct program generation and domain-specific requirements.
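A minimal sketch of how an execution-aware composite reward and a token-level mapping could fit together, assuming each criterion has already been scored in [0, 1] by external tools (parser, tests, linter, security scanner, simulator). The weights, the names `RewardWeights`, `composite_reward` and `token_rewards`, and the rule "sequence reward on the last token, plus a penalty at a known error token" are hypothetical illustrations, not the paper's formula.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RewardWeights:
    # Hypothetical weights; they sum to 1 so the composite reward stays in [0, 1].
    syntax: float = 0.1
    correctness: float = 0.5
    style: float = 0.1
    security: float = 0.1
    executability: float = 0.2


def composite_reward(scores: Dict[str, float], w: RewardWeights) -> float:
    """Weighted sum of per-criterion scores, each assumed to lie in [0, 1]."""
    return (w.syntax * scores["syntax"]
            + w.correctness * scores["correctness"]
            + w.style * scores["style"]
            + w.security * scores["security"]
            + w.executability * scores["executability"])


def token_rewards(num_tokens: int, reward: float,
                  error_token: Optional[int] = None,
                  penalty: float = -1.0) -> List[float]:
    """Spread a sequence-level reward over tokens for PPO-style credit assignment.

    The full reward goes on the final token; if the executor reports the token
    where an error originated, an extra penalty is placed on that token.
    """
    rewards = [0.0] * num_tokens
    rewards[-1] = reward
    if error_token is not None:
        rewards[error_token] += penalty
    return rewards
```

Placing the penalty at the offending token, rather than only at the end, is what lets the advantage estimate point back at the specific span that broke execution.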

【3】Q-SYNTH: Hybrid Quantum-Classical Adversarial Augmentation for Imbalanced Fraud Detection
标题：Q-SYNTH：用于不平衡欺诈检测的混合量子-经典对抗增强
链接：https://arxiv.org/abs/2605.21164

作者：Adam Innan,Mansour El Alami,Nouhaila Innan,Muhammad Shafique,Mohamed Bennai
备注：13 pages, 6 figures
摘要：Credit card fraud detection is fundamentally challenged by extreme class imbalance, where fraudulent transactions are rare yet operationally critical. This imbalance often biases supervised learners toward the legitimate class, leading to high overall accuracy but weaker fraud-class recall and F1-score. This paper introduces Q-SYNTH, a hybrid classical-quantum generative adversarial framework in which a parameterized quantum circuit serves as the generator and a classical neural network serves as the discriminator. Q-SYNTH is designed for minority-class fraud synthesis in tabular data and is evaluated along two dimensions: statistical fidelity to real fraud samples and downstream performance for fraud detection. To this end, generated samples are assessed using distributional similarity measures based on Kolmogorov-Smirnov statistics and Wasserstein distances, real-vs-synthetic detectability measured by AUC-ROC, and downstream classification performance across both quantum and classical classifiers. Under the reported protocol, Q-SYNTH reduces marginal distribution mismatch relative to a classical GAN baseline while maintaining competitive downstream fraud-detection performance. Although SMOTE achieves the strongest feature-wise similarity and the classical GAN attains the highest downstream performance in several settings, Q-SYNTH offers a favorable compromise between distributional fidelity and downstream performance, supporting the feasibility of hybrid quantum augmentation for imbalanced fraud detection.
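The two distributional fidelity measures named above can be computed per feature with a few lines of NumPy. This sketch assumes real and synthetic fraud samples are stored as `(n_samples, n_features)` arrays with the same number of rows; the helper names are hypothetical, and `scipy.stats.ks_2samp` and `scipy.stats.wasserstein_distance` are the usual library equivalents.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    grid = np.concatenate([a, b])  # the sup is attained at one of the sample points
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def wasserstein_1d(a, b):
    """1-Wasserstein distance for equal-size 1-D samples: mean gap between sorted values."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    if a.size != b.size:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(a - b)))


def per_feature_report(real, synth):
    """Average KS statistic and Wasserstein distance over the feature columns."""
    ks = [ks_statistic(real[:, j], synth[:, j]) for j in range(real.shape[1])]
    wd = [wasserstein_1d(real[:, j], synth[:, j]) for j in range(real.shape[1])]
    return float(np.mean(ks)), float(np.mean(wd))
```

Both measures are marginal (one feature at a time), which is why they are complemented by a real-vs-synthetic detectability AUC that can catch mismatched correlations between features.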

【4】DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation
标题：DISC：通过策略生成将指令与状态条件控制脱钩
链接：https://arxiv.org/abs/2605.20856

作者：Hanxiang Ren,Pei Zhou,Xunzhe Zhou,Yanchao Yang
摘要：Language-conditioned manipulation policies typically process instructions and observations through shared network parameters. This task-state entanglement provides a pathway for observation leakage: networks learn scene-to-action shortcuts that bypass language grounding entirely. DISC eliminates this failure structurally. Rather than conditioning a universal policy on language, DISC uses a hypernetwork to generate the entire parameter set of a task-specific visuomotor policy from the instruction alone. The generated policy never directly accesses language; therefore, its task-awareness must come from the language. Consequently, observation leakage has no pathway to emerge. On the other hand, generating coherent high-dimensional policy weights is itself a challenging problem. We address it with a two-stage hypernetwork whose refinement stage embeds the structure of gradient-based optimization as a feed-forward inductive bias, producing globally consistent parameters without actual gradient computation. Trained entirely from scratch on standard data budgets, DISC outperforms all entangled baselines on LIBERO-90 and Meta-World, with advantages that widen on complex, long-horizon tasks, and it surpasses the large-scale pretrained $π_0$ despite using no external pretraining data. On a real-world benchmark where all tasks share identical visual context, DISC substantially outperforms entangled alternatives, directly confirming that language-generated policy parameters, not visual shortcuts, drive behavior. The hypernetwork further learns a semantically structured parameter manifold that enables few-shot adaptation from minimal demonstrations and robust generalization across paraphrased instructions. Our code is available at: https://github.com/ReNginx/DISC.
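To make the decoupling concrete: in the sketch below (hypothetical names, NumPy only) a single linear hypernetwork maps an instruction embedding to the full weight vector of a tiny linear policy, and the generated policy only ever receives observations. DISC's actual hypernetwork is two-stage, with an optimization-inspired refinement stage, which is not reproduced here.

```python
import numpy as np


class LinearHypernetwork:
    """Generates all weights of a tiny linear policy from an instruction embedding."""

    def __init__(self, emb_dim, obs_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.obs_dim, self.act_dim = obs_dim, act_dim
        n_params = act_dim * obs_dim + act_dim  # policy weight matrix plus bias
        self.H = rng.normal(scale=0.1, size=(n_params, emb_dim))

    def generate(self, instruction_emb):
        flat = self.H @ instruction_emb
        split = self.act_dim * self.obs_dim
        W = flat[:split].reshape(self.act_dim, self.obs_dim)
        b = flat[split:]
        return W, b


def policy(obs, W, b):
    # The generated policy sees only the observation, never the instruction,
    # so any task dependence must enter through the generated (W, b).
    return np.tanh(W @ obs + b)
```

With identical observations, two different instructions can only produce different actions through the weights they generate, which is the structural argument against observation leakage.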

【5】Code Generation by Differential Test Time Scaling
标题：基于差分测试时扩展的代码生成
链接：https://arxiv.org/abs/2605.20473

作者：Yifeng He,Ethan Wang,Jicheng Wang,Xuanxin Ouyang,Hao Chen
备注：16 pages main text, 21 pages with references
摘要：Test-time scaling has emerged as a promising approach for improving code generation by exploring large solution spaces at inference time. However, existing methods often rely on public test cases that are unavailable in practice, or require extensive LLM inference for candidate selection, leading to significant token consumption and time overhead. We present DiffCodeGen, a novel test-time scaling method for code generation based on coverage-guided differential analysis. DiffCodeGen generates diverse code candidates using various sampling and prompting strategies, then applies coverage-guided fuzzing to synthesize inputs without requiring any existing tests or large language models. By executing all candidates on these inputs, DiffCodeGen captures their dynamic behavior and clusters candidates based on behavioral similarity. DiffCodeGen selects the medoid of the largest cluster as the final output. Unlike prior test-time scaling methods that invoke additional LLM inference for candidate selection, DiffCodeGen performs selection without any extra model calls, incurring little to no additional token consumption. DiffCodeGen is fully asynchronous, naturally suited to the current trend of agentic coding, and is thus efficient and highly scalable. We evaluate DiffCodeGen across 4 large language models, demonstrating consistent improvements over baselines. Compared to state-of-the-art test-time scaling methods, DiffCodeGen achieves competitive or superior performance while using only a fraction of time and tokens. DiffCodeGen is model-agnostic and can be combined with reasoning models to further boost performance.
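The selection step described above (execute candidates on shared inputs, cluster by behavior, return the medoid of the largest cluster) can be sketched as follows. Assumptions: the inputs are given rather than produced by coverage-guided fuzzing, behavioral distance is the fraction of inputs on which two candidates disagree, and clustering is a simple threshold-based single linkage; all names are hypothetical.

```python
def run_all(candidates, inputs):
    """Execute each candidate on every input; exceptions become a sentinel output."""
    outputs = []
    for fn in candidates:
        row = []
        for x in inputs:
            try:
                row.append(repr(fn(x)))
            except Exception as exc:
                row.append(f"<error:{type(exc).__name__}>")
        outputs.append(row)
    return outputs


def behavior_distance(r1, r2):
    """Fraction of inputs on which two candidates produce different outputs."""
    return sum(a != b for a, b in zip(r1, r2)) / len(r1)


def select_medoid_of_largest_cluster(candidates, inputs, threshold=0.0):
    rows = run_all(candidates, inputs)
    n = len(rows)
    dist = [[behavior_distance(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    # Single-linkage clustering: candidates within `threshold` of each other share a cluster.
    unassigned, clusters = set(range(n)), []
    while unassigned:
        stack, cluster = [unassigned.pop()], []
        while stack:
            i = stack.pop()
            cluster.append(i)
            near = [j for j in unassigned if dist[i][j] <= threshold]
            for j in near:
                unassigned.remove(j)
            stack.extend(near)
        clusters.append(cluster)
    largest = max(clusters, key=len)
    # Medoid: the member with the smallest total distance to the rest of its cluster.
    medoid = min(largest, key=lambda i: sum(dist[i][j] for j in largest))
    return medoid, largest
```

No model calls are involved in selection: the only costs are executing candidates and generating inputs.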

【6】Causal Unlearning in Collaborative Optimization: Exact and Approximate Influence Reversal under Adversarial Contributions
标题：协同优化中的因果遗忘：对抗性贡献下的精确与近似影响逆转
链接：https://arxiv.org/abs/2605.20341

作者：Ali Mahdavi,Azadeh Zamanifar,Amirfarhad Farhadi,Omid Kashefi
摘要：Federated learning systems must support data deletion requests to comply with privacy regulations, yet retraining from scratch after each deletion is computationally prohibitive. We present HF-KCU, a method that removes a client's contribution by approximating the influence function through conjugate gradient iterations in Krylov subspaces, reducing complexity from O(d^3) to O(kd) where k<
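The Krylov-subspace idea is that an inverse-Hessian-vector product H^{-1} g can be obtained with conjugate gradient using only Hessian-vector products, so H is never formed or inverted; k iterations cost k such products. The sketch below applies one such Newton-style step to remove a client's data from a ridge-regression model, where the quadratic loss makes the step exact. It is an illustration under these assumptions, not HF-KCU itself, and the function names are hypothetical.

```python
import numpy as np


def conjugate_gradient(hvp, b, max_iter=50, tol=1e-10):
    """Solve H x = b for symmetric positive-definite H using only Hessian-vector products."""
    x = np.zeros_like(b)
    r = b - hvp(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def ridge_fit(X, y, lam):
    """Minimiser of 0.5*||X theta - y||^2 + 0.5*lam*||theta||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def unlearn_client(theta, X_keep, X_drop, y_drop, lam):
    # At the full-data optimum, grad L_keep(theta) = -g_drop, so a Newton step on the
    # remaining objective is theta + H_keep^{-1} g_drop.
    g_drop = X_drop.T @ (X_drop @ theta - y_drop)
    hvp = lambda v: X_keep.T @ (X_keep @ v) + lam * v  # H_keep v without forming H_keep
    return theta + conjugate_gradient(hvp, g_drop)
```

For non-quadratic losses the same step is only a first-order approximation, which is where the exact-versus-approximate distinction in the title comes in.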

【7】AirfoilGen: A valid-by-construction and performance-aware latent diffusion model for airfoil generation
标题：AirfoilGen：一种构造即有效且性能感知的翼型生成潜在扩散模型
链接：https://arxiv.org/abs/2605.20303

作者：Zhijie Yang,Min Tang,Qiang Zou
备注：15 pages
摘要：Airfoil shape design is a fundamental task in aerospace engineering, with a direct impact on flight stability and fuel consumption. Deep learning has recently emerged as a promising tool for this task, but existing deep generative approaches remain limited in both geometric validity and physical controllability. They offer little control over the generated shapes, yielding invalid geometries, and they typically do not condition effectively on aerodynamic performance. To address these issues, this paper proposes AirfoilGen, a valid-by-construction and performance-aware latent diffusion model for airfoil generation. It first introduces a novel airfoil representation scheme, the circle sweeping representation, to constrain the generative process so that output shapes respect essential airfoil characteristics. It then enables explicit control over aerodynamic performance (e.g., lift and drag coefficients) by operating in a learned latent space: a transformer model encodes airfoil shapes into vector embeddings, and a conditional diffusion model denoises Gaussian noise into these latent embeddings while incorporating target aerodynamic performance. In addition, this paper presents a new dataset of over 200,000 airfoils, which is substantially larger than the widely used UIUC airfoil dataset (1,650 airfoils) and more suitable for training modern deep generative models. Experiments demonstrate that AirfoilGen enables airfoil generation with far greater geometric validity and aerodynamic performance controllability than previously achievable, with an average performance-conditioning accuracy of 98.41%.

半/弱/无/有监督|不确定性|主动学习(11篇)

【1】Data-Efficient Neural Operator Training via Physics-Based Active Learning
标题：通过基于物理的主动学习实现数据高效的神经算子训练
链接：https://arxiv.org/abs/2605.21348

作者：Alicja Polanska,Lorenzo Zanisi,Vignesh Gopakumar,Stanislas Pamela
备注：Presented at the ICLR 2026 Workshop on Artificial Intelligence and Partial Differential Equations
摘要：Solving partial differential equations with neural operators significantly reduces computational costs but remains bottlenecked by high training data requirements. Active learning offers a natural framework to mitigate this by selectively acquiring the most informative samples in an iterative manner. We introduce physics-based acquisition - a novel physics-informed active learning algorithm that leverages the partial differential equation residual to guide data selection. We validate the method by presenting numerical experiments for the 1D Burgers equation and the 2D compressible Navier-Stokes equations. We show that, in our experiments, physics-based acquisition consistently outperforms random acquisition and matches the state of the art in data efficiency. At the same time, it has the unique advantage of injecting a physics inductive bias into the training process, ensuring that simulation cost is spent where the model's physical understanding is weakest.
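One plausible reading of residual-guided acquisition for the 1D Burgers equation u_t + u u_x = ν u_xx: evaluate the current surrogate on a pool of candidate inputs, measure how badly each predicted field violates the PDE, and send the worst offenders to the solver for labeling. The finite-difference residual, the mean-squared aggregation and the top-k rule below are assumptions for illustration, and the names are hypothetical.

```python
import numpy as np


def burgers_residual(u, dt, dx, nu):
    """Mean squared residual of u_t + u*u_x - nu*u_xx for a field u on a (n_t, n_x) grid."""
    u_t = np.gradient(u, dt, axis=0)
    u_x = np.gradient(u, dx, axis=1)
    u_xx = np.gradient(u_x, dx, axis=1)
    r = u_t + u * u_x - nu * u_xx
    return float(np.mean(r ** 2))


def select_by_residual(predictions, k, dt, dx, nu):
    """Indices of the k predicted fields with the largest PDE residual (most physics-violating)."""
    scores = np.array([burgers_residual(p, dt, dx, nu) for p in predictions])
    return np.argsort(scores)[::-1][:k].tolist()
```

The acquisition score needs no ground-truth labels, only the model's own predictions, which is what makes it usable before any new simulation is run.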

【2】A Unified Framework for Uncertainty-Aware Explainable Artificial Intelligence: A Case Study in Power Quality Disturbance Classification
标题：不确定性感知可解释人工智能的统一框架：电能质量扰动分类的案例研究
链接：https://arxiv.org/abs/2605.21114

作者：Yinsong Chen,Samson S. Yu,Zhong Li,Chee Peng Lim
摘要：Post-hoc explainable AI (XAI) methods typically produce deterministic attribution maps, whereas Bayesian neural networks (BNNs) induce a distribution over explanations. Capturing the variability of this distribution is important for uncertainty-aware decision-making. This paper formalises the explanation distribution as the push-forward measure of the BNN posterior through any Lipschitz-continuous attribution operator. It further proposes the uncertainty-aware relevance attribution operator (UA-RAO), a general family of operators that summarises the explanation distribution using the mean, variance, coefficient of variation, quantiles, and set-theoretic aggregation measures. Theoretical support is provided through Monte Carlo accessibility and Wasserstein approximation bounds. The framework is evaluated on a 15-class power quality disturbance (PQD) classification benchmark, comparing three BNN approximations paired with three attribution operators using relevance mass accuracy and intersection-over-union as localisation metrics. Results show that deep ensembles with the mean UA-RAO improve localisation over the deterministic baseline, while other UA-RAO summaries reveal uncertainty patterns absent from point-estimate attributions. Qualitative results on measured signals further suggest that these patterns generalise beyond the synthetic training distribution. The framework is domain-agnostic and can be applied to any BNN paired with a Lipschitz-continuous attribution operator.
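Given Monte Carlo samples of attribution maps (one per posterior draw, for example one per deep-ensemble member), the summaries listed above reduce to simple reductions over the sample axis. This sketch assumes the maps are already computed and stacked into an array; the top-k intersection is one example of a set-theoretic aggregate chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def ua_rao_summaries(attributions, q=(0.05, 0.95), eps=1e-12):
    """attributions: (n_posterior_samples, n_features) array of attribution maps."""
    A = np.asarray(attributions, float)
    mean = A.mean(axis=0)
    var = A.var(axis=0, ddof=1)              # unbiased sample variance across draws
    cv = np.sqrt(var) / (np.abs(mean) + eps)  # coefficient of variation per feature
    lo, hi = np.quantile(A, q, axis=0)
    return {"mean": mean, "var": var, "cv": cv, "q_lo": lo, "q_hi": hi}


def topk_intersection(attributions, k):
    """Features that appear in the top-k of every sampled attribution map."""
    sets = [set(np.argsort(-row)[:k].tolist()) for row in np.asarray(attributions)]
    return set.intersection(*sets)
```

The mean map plays the role of a conventional attribution, while variance, CV and the quantile band flag features whose relevance is unstable across posterior draws.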

【3】Cumulative Meta-Learning from Active Learning Queries for Robustness to Spurious Correlations
标题：基于主动学习查询的累积元学习：提升对虚假相关性的鲁棒性
链接：https://arxiv.org/abs/2605.20771

作者：Kin Whye Chew,Jingxian Wang
备注：Under review. 26 pages, 7 figures
摘要：Spurious correlations in real-world datasets cause machine learning models to rely on irrelevant patterns, undermining reliability, generalization, and fairness. Active learning offers a promising way to address this failure mode by querying informative samples that distinguish core features from spurious ones. However, standard active-learning methods simply append queried examples to the labeled set, effectively updating only the likelihood term. In deep learning regimes, the influence of these informative samples can be diluted by the larger labeled set and memorized by overparameterized models. We propose Cumulative Active Meta-Learning (CAML), an active-learning framework that uses queried examples to meta-learn the prior, or inductive bias, governing how the model adapts. CAML casts each active-learning round as a meta-learning task: the current labeled set serves as meta-train data for adaptation, while the newly queried batch serves as meta-test data for evaluating generalization. Unlike conventional meta-learning, which treats tasks as independent and identically distributed, CAML exploits the sequential dependence between active-learning rounds by maintaining a cumulative inductive bias that is progressively refined. Theoretically, we show that this cumulative formulation introduces interaction terms that couple earlier meta-learned inductive biases with later query-induced objectives, capturing dependencies absent from standard meta-learning. Empirically, CAML improves minority-group accuracy across spurious-correlation benchmarks and acquisition strategies, with gains of up to 27.8% on Dominoes, 29.9% on Waterbirds, 14.3% on SpuCo, and 24.0% on CivilComments.

【4】Unsupervised clustering and classification of upper limb EMG signals during functional movements: a data-driven
标题：功能性运动期间上肢肌电信号的无监督聚类与分类：数据驱动
链接：https://arxiv.org/abs/2605.20599

作者：L. F. Salazar Álvarez,D. Escobar-Saltarén,M. B. Salazar Sánchez,S. C. Henao-Aguirre
备注：19 Congreso Colombiano de Computación (19CCC)
摘要：This study presents a comprehensive approach for the clustering and classification of upper-limb surface electromyography (sEMG) signals during functional reach and grasp movements. The methodology was applied to the NINAPRO DB4 dataset, which provides multichannel EMG recordings of 52 gestures. A four-stage pipeline was designed, including signal preprocessing, feature extraction, gesture selection via hierarchical clustering, and comparative model evaluation. Preprocessing involved a fourth-order low-pass filter (0.6 Hz) and Hilbert envelope transformation, effectively reducing noise and enhancing signal clarity. Feature extraction yielded 26 temporal and frequency-domain metrics, which were later refined using visual analysis, mutual information, principal component analysis, and decision tree importance scores. A final subset of five key features was selected for classification tasks. Gesture selection was performed through hierarchical clustering using Mahalanobis distance, resulting in six representative movements that balanced biomechanical diversity and computational efficiency. A 200 ms window was identified as optimal for temporal segmentation based on stability and physiological plausibility. Classifier models were evaluated in two stages. Automated comparison using PyCaret identified Extra Trees (ET) and Artificial Neural Networks (ANN) as top performers. Subsequent independent training confirmed their stability and generalization capacity, with ANN showing progressive learning and ET maintaining robust, consistent results. The findings support the implementation of adaptive, low-latency control strategies for myoelectric prostheses and provide a scalable pipeline for future real-time applications.
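Two of the preprocessing steps above, the Hilbert envelope and segmentation into 200 ms windows, can be sketched with NumPy alone. The envelope uses the standard FFT construction of the analytic signal (the same approach as `scipy.signal.hilbert`); the low-pass filter and feature-extraction stages are omitted, and the function names are hypothetical.

```python
import numpy as np


def hilbert_envelope(x):
    """Magnitude of the analytic signal: keep DC (and Nyquist), double positive frequencies."""
    x = np.asarray(x, float)
    n = x.size
    X = np.fft.fft(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(X * h))


def sliding_windows(x, fs, win_ms=200, step_ms=None):
    """Split a 1-D signal sampled at fs Hz into windows of win_ms milliseconds.

    Windows are non-overlapping unless step_ms is given; a trailing partial window is dropped.
    """
    win = int(round(fs * win_ms / 1000))
    step = win if step_ms is None else int(round(fs * step_ms / 1000))
    starts = range(0, len(x) - win + 1, step)
    return np.stack([x[s:s + win] for s in starts])
```

Features such as RMS or mean envelope amplitude would then be computed per row of the window matrix.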

【5】\ECUAS{n}: A family of metrics for principled evaluation of uncertainty-augmented systems
链接：https://arxiv.org/abs/2605.20490

作者：Lautaro Estienne,Erik Ernst,Matías Vera,Pablo Piantanida,Luciana Ferrer
备注：pre-print, 9-pages paper, 25 pages total
摘要：In high-stakes automated decision-making, access to predictive uncertainty is essential for enabling users (human or downstream systems) to accept or reject predictions based on application-specific cost trade-offs. Such uncertainty-augmented (UA) systems, i.e., systems that output both predictions and uncertainty scores, are currently being assessed in the literature in a variety of ways, using separate metrics to evaluate the predictions and the uncertainty scores, setting a cost function with a fixed rejection cost or integrating over a coverage-risk curve. We argue that these evaluation approaches are inadequate for assessing overall performance of the UA system for decision making under uncertainty and propose a novel family of metrics, \ECUAS{n}, formulated as proper scoring rules for the task of interest. The parameter $n$ controls the trade-off between the cost of incorrect predictions and imperfect uncertainties depending on the needs of the use-case. We demonstrate the advantages of the \ECUAS{n} metrics both theoretically and empirically, through experiments on diverse classification and generation datasets, including a manually annotated subset of TriviaQA.

【6】Oracle Supervision Transfers for Hyperparameter Prediction in Model-Based Image Denoising
标题：基于模型的图像去噪中超参数预测的Oracle监督转移
链接：https://arxiv.org/abs/2605.20479

作者：Jianmin Liao,Lixin Shen,Yuesheng Xu
摘要：Hyperparameter prediction is a critical practical bottleneck for model-based image denoisers, ranging from classical TV/TGV variational solvers to modern diffusion-based models such as DiffPIR. While existing learned predictors can achieve near-oracle performance, this approach scales poorly: each new configuration conventionally requires its own oracle-labeled training set, and each label requires a hierarchical grid search evaluated against clean ground truth. We therefore ask whether oracle supervision collected on source configurations can transfer to target configurations with few or no target oracle labels. We propose HyperDn, a single configuration-conditioned predictor that pools oracle supervision across source configurations and predicts heterogeneous hyperparameters for new denoiser-noise configurations. In a cross-paradigm experiment, HyperDn transfers from relatively cheap TV/TGV variational sources to more expensive diffusion-based DiffPIR. With only 2 target oracle labels, it reaches 30.23 dB, within 0.90 dB of the oracle, and outperforms the 64-label per-configuration predictor trained from scratch, using 1/32 as many target labels as that baseline point. Without any target oracle labels, HyperDn also reaches near-oracle PSNR on two unseen mixtures of seen noise types and on transfer from relatively cheap 96×96 source images to 512×768 targets. Together, these results show that expensive oracle supervision for hyperparameter prediction can be transferred from source to new target configurations, reducing the need to rebuild oracle labels for each new denoising configuration.
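The oracle labels mentioned above come from a coarse-to-fine grid search scored against the clean image. A generic version for a single positive hyperparameter (for example a TV regularization weight) might look like this; the grid size, number of levels and log spacing are assumptions, `score` stands in for the PSNR of the denoised output, and the function name is hypothetical.

```python
import math


def hierarchical_grid_search(score, lo, hi, n_points=9, levels=3, log_scale=True):
    """Coarse-to-fine search for the value in [lo, hi] that maximises score(value)."""
    best = None
    for _ in range(levels):
        if log_scale:
            a, b = math.log10(lo), math.log10(hi)
            grid = [10 ** (a + (b - a) * i / (n_points - 1)) for i in range(n_points)]
        else:
            grid = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
        values = [score(g) for g in grid]  # one full denoise-and-evaluate per grid point
        i = max(range(n_points), key=values.__getitem__)
        best = grid[i]
        # Zoom into the interval spanned by the neighbours of the current best point.
        lo, hi = grid[max(i - 1, 0)], grid[min(i + 1, n_points - 1)]
    return best
```

Each label here costs `n_points * levels` full solver runs, which is exactly the expense that transferring oracle supervision across configurations aims to avoid.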

【7】CASCADE Conformal Prediction: Uncertainty-Adaptive Prediction Intervals for Two-Stage Clinical Decision Support
标题：CASCADE保形预测：两阶段临床决策支持的不确定性自适应预测区间
链接：https://arxiv.org/abs/2605.20468

作者：Ricardo Diaz-Rincon,Muxuan Liang,Adolfo Ramirez-Zamora,Benjamin Shickel
备注：Accepted to ICML 2026 AgenticUQ Workshop. 14 Pages, 3 Figures
摘要：Effective medication management in Parkinson's Disease (PD) is challenging due to heterogeneous disease progression, variable patient response, and medication side effects. While AI models can forecast levodopa equivalent daily dose (LEDD) as a measure of medication needs, standard uncertainty quantification often fails to communicate the reliability of these predictions, treating high and low confidence clinical decisions identically. We introduce CASCADE (Calibrated Adaptive Scaling via Conformal And Distributional Estimation), a novel conformal prediction framework that propagates epistemic uncertainty from a screening classifier to adapt downstream predictions. Unlike standard conformal methods that rely on auxiliary residual regression, we leverage epistemic uncertainty from a primary classification task (identifying whether a medication change is needed) to dynamically scale the prediction intervals of a secondary regression task (predicting how much change). By mapping Venn-Abers multi-probabilistic uncertainty directly to non-conformity scores, our framework achieves continuous risk adaptation. We demonstrate that this "cascade effect" produces highly efficient intervals for confident patients (38.9% narrower than standard conformal baselines) while automatically expanding intervals to ensure robust coverage for uncertain cases, bridging the gap between discrete clinical decision-making and continuous dose forecasting in PD.
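A minimal sketch of the cascade idea using normalized split conformal prediction: an uncertainty score u(x) from the screening classifier (here the width of a Venn-Abers interval [p0, p1], assumed to be computed elsewhere) scales the regression residuals, so the calibrated interval yhat ± q·(β + u) widens for uncertain patients. The offset β, the width-as-uncertainty choice and all names are assumptions, not CASCADE's exact non-conformity score.

```python
import numpy as np


def venn_abers_width(p0, p1):
    """Width of a Venn-Abers probability interval, used here as an epistemic-uncertainty proxy."""
    return np.abs(np.asarray(p1, float) - np.asarray(p0, float))


def calibrate_scale(y_cal, yhat_cal, u_cal, alpha=0.1, beta=0.05):
    """Split-conformal quantile of normalised residuals |y - yhat| / (beta + u)."""
    scores = np.sort(np.abs(y_cal - yhat_cal) / (beta + u_cal))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample corrected rank
    return float(scores[k - 1]) if k <= n else float("inf")


def predict_interval(yhat, u, q, beta=0.05):
    """Interval whose half-width grows linearly with the upstream uncertainty u."""
    half = q * (beta + u)
    return yhat - half, yhat + half
```

Under exchangeability the marginal coverage guarantee of split conformal still holds; the uncertainty score only redistributes interval width between confident and uncertain cases.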

【8】Supervised Latent Restructuring for Small-Data Quantum Learning in Plant Phenomics
标题：植物表型组学中小数据量子学习的监督潜在重组
链接：https://arxiv.org/abs/2605.20413

作者：Alakananda Mitra,David H. Fleisher,Vangimalla Reddy,Chittaranjan Ray
备注：11 pages, 4 Tables, 3 Figures
摘要：High-dimensional biological data often exhibit a severe mismatch between feature dimensionality and sample size, making reliable classification difficult in extremely small-data regimes. In these settings, kernel methods can lose discriminative power when latent compression fails to preserve class-separating structure. We study this problem in fine-grained plant phenomics and propose a hybrid workflow that compresses 1280-dimensional deep image embeddings into a 64-dimensional PCA space and then restructures them into an 11-dimensional supervised latent space using Linear Discriminant Analysis (LDA), followed by GPU-accelerated Quantum Kernel Alignment (QKA) on NVIDIA L40S hardware. Empirically, supervised latent restructuring substantially improves the geometric separability of the compressed representation, increasing the Silhouette coefficient from 0.003 in the raw embedding space and -0.006 in PCA-64 to 0.197 in the supervised LDA-11 space. However, downstream classical evaluation reveals a clear compression trade-off: Linear SVM and XGBoost improve in the restructured latent space, whereas RBF-SVM and Random Forest degrade under the same 11-dimensional bottleneck. Under a constrained optimization budget, QKA in this regime remains challenging, indicating that latent geometry alone is not sufficient for strong trainable quantum performance. These findings position representation geometry as a central design variable in small-data quantum learning and expose the practical difficulty of recovering nonlinear discriminative structure from aggressively compressed biological representations.

【9】OmniISR: A Unified Framework for Centralized and Federated Learning via Intermediate Supervision and Regularization
标题：OmniISR：通过中间监督和正则化实现集中式与联邦学习的统一框架
链接：https://arxiv.org/abs/2605.20276

作者：Wei-Bin Kou,Guangxu Zhu,Ming Tang,Chen Zhang,Lisheng Wu,Lei Zhou,Yujiu Yang
备注：18 pages
摘要：The global deployment of edge intelligence operates across heterogeneous legal frameworks. While some regions permit centralized learning (CL) via cloud data aggregation, others enforce strict data localization, necessitating federated learning (FL). This operational dichotomy introduces two incompatible optimization regimes (i.e., unbiased global gradients coupled with internal covariate shift in CL versus biased, drift-prone local updates in FL), so any naive integration of the two lacks rigorous theoretical guarantees. To fill this gap, we propose OmniISR, a unified framework that fuses pure CL, pure FL, and hybrid CL-FL training modes by attaching intermediate supervision and regularization (ISR) signals at multiple hidden layers. Specifically, we propose (i) to use mutual information (MI) as intermediate supervision to align shifting internal covariates in CL and client-drifting representations in FL, and (ii) to adopt negative entropy (NE) as an intermediate regularizer to penalize overconfident predictions, preserve representational uncertainty, and avoid device-specific collapse. On the theory side, we derive (i) a unified, ISR-agnostic, and non-asymptotic $O(1/\sqrt{T})$ convergence bound showing that the introduced ISR does not violate standard SGD convergence, (ii) a federated drift bound that quantifies the ISR-reduced client drift, (iii) a gradient-alignment guarantee that ensures non-conflicting CL and FL updates under mild bias, and (iv) an explicit escape-time bound indicating that CL-FL hybrid mixing enlarges effective stochasticity and accelerates escape from strict saddles. Extensive experiments demonstrate that OmniISR consistently improves model performance in both centralized and federated paradigms, reduces the CL-FL gap by 22.60%, and yields 37/48 paired metric wins across multiple FL algorithms.
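The negative-entropy regularizer is simple to state: adding λ·Σ_k p_k log p_k to the loss raises the loss for peaked (overconfident) softmax outputs. The sketch below applies it to a single head's logits; OmniISR attaches such terms at multiple hidden layers alongside an MI-based supervision term, neither of which is shown, and λ is an arbitrary placeholder. Function names are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def negative_entropy(logits):
    """Batch mean of sum_k p_k log p_k; closer to 0 means more confident predictions."""
    p = softmax(np.asarray(logits, float))
    return float(np.mean(np.sum(p * np.log(p + 1e-12), axis=-1)))


def loss_with_ne(logits, labels, lam=0.1):
    """Cross-entropy plus lam * negative entropy (penalises overconfidence)."""
    p = softmax(np.asarray(logits, float))
    ce = -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))
    return float(ce + lam * negative_entropy(logits))
```

Negative entropy is bounded between -log K (uniform over K classes) and 0 (one-hot), so with a small λ it tempers confidence without overriding the supervised term.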

【10】Multi-Agent Reinforcement Learning for Safe Autonomous Driving Under Pedestrian Behavioral Uncertainty
标题：行人行为不确定性下安全自动驾驶的多智能体强化学习
链接：https://arxiv.org/abs/2605.20255

作者：Prakash Aryan,Kaushik Raghupathruni,Timo Kehrer,Sebastiano Panichella
备注：Submitted to ICRA 2026 Workshop "8th Workshop on Long-term Human Motion Prediction"
摘要：Simulation-based testing of self-driving cars (SDCs) typically relies on scripted or simplified pedestrian models that do not capture the heterogeneity and uncertainty of real human crossing behavior. This limits the realism of safety assessments, especially in scenarios involving jaywalking, which is governed by latent personality traits that the vehicle cannot observe. We hypothesize that jointly training pedestrians and the SDC with multi-agent reinforcement learning (MARL) produces more realistic interaction scenarios than training the SDC against fixed pedestrian policies, and that the resulting behavior gap between predictable and unpredictable crossings can be measured directly from trajectories. This paper describes a MARL environment in which an SDC and 12 pedestrians are co-trained using Multi-Agent Proximal Policy Optimization (MAPPO). Pedestrian locomotion follows scripted Dijkstra pathfinding, while an RL policy controls high-level go/wait decisions. Jaywalking probability depends on a per-pedestrian personality trait sampled at episode start and hidden from the SDC. In 500-episode evaluations, the co-trained SDC reached 78% of goals with a 14% collision rate, compared to 35% goals and 33% collisions for the best rule-based baseline. A speed differential metric shows that the SDC traveled 2.65 m/s faster near jaywalkers than near crosswalk users at close range (0-3 m), indicating that jaywalking encounters were not anticipated. Jaywalking accounted for 13% of crossing events but was associated with 62% of collisions. Co-training with MARL pedestrians reduced collisions by 30% relative to single-agent RL, as pedestrians learned to wait when the SDC approached at speed.
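The speed differential metric compares the SDC's speed near jaywalkers with its speed near crosswalk users at close range. Assuming one logged row per (timestep, nearest pedestrian) holding the SDC speed, the distance to that pedestrian and whether the pedestrian is jaywalking, it reduces to a difference of two masked means; the log layout and names are hypothetical.

```python
import numpy as np


def speed_differential(speeds, distances, near_jaywalker, max_dist=3.0):
    """Mean SDC speed near jaywalkers minus mean speed near crosswalk users, within max_dist metres."""
    speeds = np.asarray(speeds, float)
    distances = np.asarray(distances, float)
    jay = np.asarray(near_jaywalker, bool)
    close = (distances >= 0.0) & (distances <= max_dist)
    return float(speeds[close & jay].mean() - speeds[close & ~jay].mean())
```

A positive value means the vehicle approached jaywalkers faster than crosswalk users at the same range, i.e. it did not slow down in anticipation.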

【11】SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation
标题：SOlar：一个用于终身学习和持续适应的自我优化开放式自治代理
链接：https://arxiv.org/abs/2605.20189

作者：Nitin Vetcha,Dianbo Liu
备注：Accepted at "Association for the Advancement of Artificial Intelligence 2026 Conference" in Streaming Continual Learning Bridge. Published in CEUR Workshop Proceedings (Original version at https://ceur-ws.org/Vol-4183/paper2.pdf)
摘要：Despite the remarkable success of large language models (LLMs), they still face bottlenecks when deployed in dynamic, real-world settings, the primary challenges being concept drift and the high cost of gradient-based adaptation. Traditional fine-tuning (FT) struggles to adapt to non-stationary data streams without causing catastrophic forgetting or requiring extensive manual data curation. To address these limitations within the streaming and continual learning paradigm, we propose the Self-Optimizing Lifelong Autonomous Reasoner (SOLAR), an open-ended autonomous agent that leverages parameter-level meta-learning to self-improve, treating model weights as an environment for exploration. It begins by consolidating a strong prior over common-sense knowledge, making it effective for transfer learning. By utilizing a multi-level reinforcement learning approach, SOLAR autonomously discovers adaptation strategies, enabling efficient test-time adaptation to unseen domains. Crucially, SOLAR maintains an evolving knowledge base of valid modification strategies, implicitly acting as an episodic memory buffer to balance plasticity (adaptation to new tasks) and stability (retention of meta-knowledge). Experiments demonstrate that SOLAR outperforms strong baselines on common-sense, mathematical, medical, coding, social and logical reasoning tasks, marking a significant step toward autonomous agents capable of lifelong adaptation in evolving environments.

迁移|Zero/Few/One-Shot|自适应(7篇)

【1】Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate
标题：量化超参数迁移与嵌入层学习率的重要性
链接：https://arxiv.org/abs/2605.21486

作者：Dayal Singh Kalra,Maissam Barkeshli
备注：10+28 pages, 5+17 figures
摘要：Hyperparameter transfer allows extrapolating optimal optimization hyperparameters from small to large scales, making it critical for training large language models (LLMs). This is done either by fitting a scaling law to the hyperparameters or by a judicious choice of parameterization, such as Maximal Update ($μ$P), that renders optimal hyperparameters approximately scale invariant. In this paper, we first develop a framework to quantify hyperparameter transfer through three metrics: (1) the quality of the scaling law fit, (2) the robustness to extrapolation errors, and (3) the asymptotic loss penalty due to choice of parameterization. Next, we investigate through a comprehensive series of ablations why $μ$P appears to offer high-quality learning rate transfer relative to standard parameterization (SP), as existing theory is inadequate. We find that the overwhelming benefit of $μ$P relative to SP when training with AdamW arises simply from maximizing the learning rate of the embedding layer. In SP, the embedding layer learning rate acts as a bottleneck that induces training instabilities; increasing it by a factor of width to match $μ$P dramatically smooths out training while improving hyperparameter transfer. We also find that weight decay improves the scaling law fits, while, in the fixed token-per-parameter setting, it hurts the robustness of the extrapolation.
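Metric (1) above, the quality of a scaling-law fit for the optimal learning rate, can be illustrated with a power law lr*(width) = c · width^b fitted by least squares in log-log space and scored by R² in that space. The power-law form, the R² criterion and the function names are assumptions for illustration; the paper's exact fitting procedure may differ.

```python
import numpy as np


def fit_power_law(widths, best_lrs):
    """Fit log(lr*) = log(c) + b*log(width) by least squares; return (c, b)."""
    b, log_c = np.polyfit(np.log(widths), np.log(best_lrs), deg=1)  # slope first, then intercept
    return float(np.exp(log_c)), float(b)


def extrapolate(c, b, width):
    """Predicted optimal learning rate at a (larger) target width."""
    return c * width ** b


def log_r2(widths, best_lrs, c, b):
    """Coefficient of determination of the fit, measured in log space."""
    y = np.log(best_lrs)
    pred = np.log(c) + b * np.log(widths)
    return float(1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2))
```

Because extrapolation happens in log space, a small error in the fitted exponent b compounds multiplicatively with width, which is why robustness to extrapolation errors is tracked as a separate metric.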

【2】Adaptive Signal Resuscitation: Channel-wise Post-Pruning Repair for Sparse Vision Networks
标题：自适应信号复苏：稀疏视觉网络的逐通道剪枝后修复
链接：https://arxiv.org/abs/2605.21426

作者：Qishi Zhan,Ziheng Chen,Minxuan Hu
摘要：One-shot magnitude pruning can cause severe accuracy collapse in the high-sparsity regime, even when the pruning mask preserves the largest weights. We argue that this failure reflects a granularity mismatch in post-pruning repair. Under global magnitude pruning, nearly collapsed channels can coexist with channels that retain informative activation variance within the same layer. Existing layer-wise activation repair methods apply a single correction to the whole layer, and can therefore over-amplify damaged channels while trying to restore the layer-level signal. We propose Adaptive Signal Resuscitation (ASR), a training-free channel-wise repair method that matches the granularity of repair to the granularity of damage. ASR estimates a variance-matching correction for each output channel and stabilizes it with a data-driven shrinkage rule, suppressing unreliable corrections for channels with weak post-pruning signal while preserving corrections for healthier channels. Applied before BatchNorm recalibration, ASR requires only forward passes on a small calibration set and no retraining. Across three datasets, four convolutional architectures, and both unstructured and structured sparsity settings, ASR generally improves over layer-wise repair, with the clearest gains in high-sparsity regimes. On ResNet-50 at 90% sparsity, ASR recovers 55.6% top-1 accuracy on CIFAR-10, compared with 41.0% for layer-wise repair and 28.0% for BatchNorm-only recalibration. Ablations show that naive channel-wise variance matching is insufficient, and that shrinkage stabilizes post-pruning repair.
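A sketch of channel-wise variance matching with shrinkage, assuming activations of the dense and the pruned layer are collected on the same calibration batch. Each channel gets a raw scale sqrt(var_dense / var_pruned), which is then shrunk toward 1 with weight var_pruned / (var_pruned + τ), so nearly collapsed channels are barely amplified. This shrinkage rule, the default τ (median pruned variance) and the function name are hypothetical stand-ins for ASR's data-driven rule; the resulting scales would multiply each output channel before BatchNorm recalibration.

```python
import numpy as np


def channel_repair_scales(dense_act, pruned_act, tau=None, eps=1e-8):
    """Per-channel variance-matching scales, shrunk toward 1 for weak channels.

    dense_act, pruned_act: (n_samples, n_channels) activations on a calibration batch.
    """
    var_d = dense_act.var(axis=0)
    var_p = pruned_act.var(axis=0)
    raw = np.sqrt((var_d + eps) / (var_p + eps))  # naive variance-matching correction
    if tau is None:
        tau = np.median(var_p)  # hypothetical data-driven shrinkage strength
    w = var_p / (var_p + tau)  # near 0 for collapsed channels, near 1 for healthy ones
    return w * raw + (1 - w) * 1.0
```

Without the shrinkage term a channel whose variance fell by six orders of magnitude would be scaled by roughly 1000, amplifying mostly noise; this mirrors the abstract's observation that naive channel-wise matching is insufficient.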

【3】CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation
标题：CAdam：生成式蒸馏中三维高斯致密化的上下文自适应矩估计
链接：https://arxiv.org/abs/2605.20872

作者：SeungJeh Chung,Geonho Park,Misong Kim,HyeongYeop Kang
备注：Accepted to SIGGRAPH 2026 Conference Papers. 12 pages, 8 figures
摘要：Adaptive densification is the engine of 3D Gaussian Splatting (3DGS). However, when transposed to the optimization-based Generative Distillation paradigm, this reconstruction-native mechanism reveals fundamental limitations, resulting in inefficient representations cluttered with redundant primitives. We diagnose this failure as a Densification Dilemma stemming from the stochastic nature of generative guidance: the standard magnitude-based accumulation indiscriminately aggregates transient noise alongside geometric signals, making it difficult to strike a balance between over-densification and under-fitting. To resolve this, we introduce Context-Adaptive Moment Estimation (CAdam), a novel framework that reinterprets densification as a statistically grounded signal verification problem. CAdam leverages the first moment of gradients to exploit the interference principle, where stochastic fluctuations cancel out via destructive interference while consistent geometric drifts accumulate via constructive interference, effectively disentangling the underlying signal from the generative noise floor. This is further augmented by a quantile-based context awareness and an intrinsic Signal-to-Noise Ratio (SNR) gating mechanism, which ensure robust adaptation across optimization stages and enable the soft termination of densification. Extensive experiments across diverse objectives (SDS, ISM, VFDS) and strong generative 3DGS backbones show that CAdam reduces Gaussian count by 85%-97% relative to standard densification while preserving comparable overall perceptual quality. These results highlight signal-aware density control as a practical way to improve memory efficiency in optimization-based generative distillation.
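The interference argument can be checked numerically with a toy example. Summing gradient magnitudes grows under pure noise. By contrast, the magnitude of a running first moment (an exponential moving average, as in Adam) stays small unless the gradients share a consistent direction. This stdlib sketch uses scalar gradients, an assumed `beta`, and synthetic noise. CAdam's quantile-based context awareness and SNR gate are not reproduced.

```python
import random


def densification_stats(grads, beta=0.99):
    """Compare magnitude accumulation with a first-moment (EMA) statistic.

    grads: sequence of scalar positional gradients seen by one Gaussian.
    Returns (mean |g|, |EMA of g|, ratio |EMA| / mean |g|).
    """
    mag_sum, m = 0.0, 0.0
    for g in grads:
        mag_sum += abs(g)                # magnitude accumulation: noise adds up
        m = beta * m + (1.0 - beta) * g  # first moment: opposite signs cancel
    mean_mag = mag_sum / len(grads)
    return mean_mag, abs(m), abs(m) / (mean_mag + 1e-12)


rng = random.Random(0)
noise_only = [rng.gauss(0.0, 1.0) for _ in range(500)]      # transient guidance noise
drifting = [1.0 + rng.gauss(0.0, 1.0) for _ in range(500)]  # consistent drift + noise
print(densification_stats(noise_only))
print(densification_stats(drifting))
```

Mean |g| is large for both streams (about 0.8 versus 1.2). The first moment, however, is near 0 for pure noise and near 1 for the drifting stream, so a gate on the returned ratio separates the two far more clearly.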

【4】AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback
标题：AGPO：具有双重统计反馈的自适应组策略优化
链接：https://arxiv.org/abs/2605.20722

作者：Miaobo Hu,Shuhao Hu,Bokun Wang,Ruohan Wang,Xin Wang,Xiaobo Guo,Daren Zha,Jun Xiao
摘要：Reinforcement learning improves LLM reasoning, but PPO/GRPO typically use fixed clipping and decoding temperature, which makes training brittle and tuning-heavy. We propose Adaptive Group Policy Optimization (AGPO), a critic-free refinement of GRPO that uses group-level statistics to control both update magnitude and exploration. AGPO uses a shared probe-derived statistical state to drive two controllers: (i) adaptive clipping, which sets the trust-region size from reward dispersion and skewness, probe vote entropy, policy entropy, and step-wise KL drift; and (ii) bidirectional adaptive temperature sampling, which heats or cools decoding around a base temperature according to centered uncertainty relative to a running baseline. On nine English and Chinese math/STEM benchmarks, Qwen2.5-14B trained with AGPO outperforms PPO/GRPO under the same generated-token budget, reaching 67.3% on GSM8K and 40.5% on MATH. Gains transfer to Llama-3-8B and Gemma-2-9B, and ablations confirm both modules are complementary. Our implementation is publicly available at https://github.com/wandugu/paper_agpo.

【5】Decision-Path Patterns as Tree Reliability Signals: Path-based Adaptive Weighting for Random Forest Classification
标题：决策路径模式作为树可靠性信号：随机森林分类中基于路径的自适应加权
链接：https://arxiv.org/abs/2605.20716

作者：Youngjoon Park
备注：16 pages, 1 figure. Code and data: https://github.com/DavidParkYJ/dwarfp
摘要：Random forests aggregate tree votes by simple majority, treating all trees as equally informative. We observe that the topological pattern along each tree's root-to-leaf decision path -- where and how often the dominant class label flips along it -- carries a signal of tree reliability that is exploitable for per-sample reweighting. The naive use of this signal is structurally confounded with the predicted class, so we propose a class-conditional ratio weighting that guarantees zero expected class bias by construction. On 30 binary classification benchmarks under a shared-forest, shared-split protocol with 30 repeats, the proposed method is the only one among four compared schemes -- RF, weighted RF, KNORA-Eliminate, KNORA-Union -- to yield a statistically significant accuracy improvement over RF (Wilcoxon p = 0.018), while the three alternatives all fail to do so (p > 0.5). It is also the only scheme without majority-recall regressions, with minority-recall regressions limited to 3/30 datasets -- a one-sided loss to which classical dynamic ensemble selection methods are susceptible. The gain is robust across forest sizes from 100 to 1000 trees.
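Here is a small sketch of the two ingredients, under stated assumptions. The first counts how often the dominant class label flips along a root-to-leaf path. The second is a class-conditional ratio weight: each tree's reliability score is divided by the mean score of trees predicting the same class, estimated on reference data such as out-of-bag samples. The flip-to-score mapping `1 / (1 + flips)` and this normalization are illustrative assumptions, not the paper's exact definitions.

```python
from collections import defaultdict


def label_flips(path_labels):
    """Count changes of the dominant class label along a root-to-leaf path."""
    return sum(1 for a, b in zip(path_labels, path_labels[1:]) if a != b)


def path_score(path_labels):
    """Assumed reliability score: fewer flips -> higher score."""
    return 1.0 / (1.0 + label_flips(path_labels))


def estimate_class_means(reference):
    """Mean score per predicted class over (pred, path) pairs, e.g. out-of-bag."""
    sums, counts = defaultdict(float), defaultdict(int)
    for pred, path in reference:
        sums[pred] += path_score(path)
        counts[pred] += 1
    return {c: sums[c] / counts[c] for c in sums}


def weighted_vote(tree_preds, tree_paths, class_means):
    """Class-conditional ratio weighting: score / E[score | predicted class]."""
    votes = defaultdict(float)
    for pred, path in zip(tree_preds, tree_paths):
        votes[pred] += path_score(path) / class_means[pred]
    return max(votes, key=votes.get), dict(votes)
```

Consider two trees voting 0 along paths with three label flips each and one tree voting 1 along a flip-free path. If both classes have mean score 0.5, `weighted_vote([0, 0, 1], [[0, 1, 0, 1], [0, 1, 0, 1], [1, 1, 1]], {0: 0.5, 1: 0.5})` returns class 1 with votes `{0: 1.0, 1: 2.0}`, whereas a simple majority would return class 0.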

【6】AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals
标题：AVSD：通过平衡共识和教师特定的特权信号进行自适应视角的自我蒸馏
链接：https://arxiv.org/abs/2605.20643

作者：Duy Nguyen,Hanqi Xiao,Archiki Prasad,Zaid Khan,Anirban Das,Austin Zhang,Sambit Sahu,Hyunji Lee,Elias Stengel-Eskin,Mohit Bansal
备注：Code: https://github.com/duykhuongnguyen/AVSD
摘要：Self-distillation enables language models to learn on-policy from their own trajectories by using the same model as both student and teacher, with the teacher being conditioned on privileged information unavailable to the student. Such information can come in different types or views, such as solutions, demonstrations, feedback, or final answers. This setup provides dense token-level feedback without relying on a separate external model, but creates a fundamental asymmetry: the teacher may rely on view-specific information that the student cannot access at inference time. Moreover, the best type of privileged information is often task-dependent, making it difficult to choose a single teacher view. In this work, we address both these challenges jointly by introducing AVSD (Adaptive-View Self-Distillation), a novel method of self-distillation with multiple privileged-information views, which reconstructs token-level supervision by separating stable cross-view consensus from view-specific residual signals. AVSD identifies the consensus signal shared across views, which provides a reliable update direction, and then selectively adds the view-specific residual signal to adjust the update magnitude when it both aligns with the consensus direction and remains proportionate to the consensus signal. Experiments on math competition benchmarks (AIME24, AIME25, and HMMT25) show that AVSD consistently outperforms both single-view self-distillation baselines and GRPO, achieving average Avg@8 gains of 3.1% and 2.2% over the strongest baselines on Qwen3-8B and Qwen3-4B, respectively. Moreover, on code-generation benchmarks (Codeforces, LiveCodeBench v6) using Qwen3-8B, AVSD outperforms the single-view self-distillation baseline by 2.4% on average.
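Here is one hedged reading of the consensus/residual split for a single token. The per-view signal (assumed here to be a teacher-minus-student log-probability gap), the sign-agreement consensus, and the proportionality factor `rho` are all illustrative choices, not AVSD's exact definitions.

```python
def combine_views(view_signals, rho=1.0):
    """Combine one token's signals from several privileged-information views.

    view_signals: per-view scalar signal for this token (assumed to be a
    teacher-minus-student log-probability gap in this sketch).
    """
    signs = {(s > 0) - (s < 0) for s in view_signals}
    if len(signs) != 1 or 0 in signs:
        return 0.0  # views disagree on direction: no reliable consensus
    sign = signs.pop()
    consensus = sign * min(abs(s) for s in view_signals)  # shared by every view
    accepted = []
    for s in view_signals:
        residual = s - consensus  # view-specific part
        # With this consensus, residuals never oppose it; the alignment test
        # only drops zero residuals but mirrors the description.
        aligned = residual * consensus > 0
        proportionate = abs(residual) <= rho * abs(consensus)
        if aligned and proportionate:
            accepted.append(residual)
    return consensus + sum(accepted) / len(view_signals)
```

With signals `[0.4, 0.5, 2.0]`, the consensus is 0.4. The 0.1 residual is kept, and the 1.6 residual is rejected as disproportionate, giving about 0.433. With `[0.4, -0.2]`, the views disagree and the token receives no update.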

【7】Sample Complexity of Transfer Learning: An Optimal Transport Approach
标题：迁移学习的样本复杂度：一种最优传输方法
链接：https://arxiv.org/abs/2605.20545

作者：Haoyang Cao,Xin Guo,Wenpin Tang,Guan Wang
摘要：Transfer learning is an essential technique for many machine learning/AI models of complex structures such as large language models and generative AI. The essence of transfer learning is to leverage knowledge from resolved source tasks for a new target task, especially when the sample size $m$ of the training data for the latter is low. In this work, we rigorously analyze the potential benefit of transfer learning in terms of sample efficiency. Specifically, taking an optimal transport viewpoint of transfer learning, we find that when the data dimension $d$ is higher than $3$, the sample complexity for transfer learning is $O(m^{-(α+1)/d})$, with $α$ indicating the smoothness of the data distribution, as opposed to the $O(m^{-p/d})$ sample complexity for direct learning with $p$ indicating the smoothness of the optimal target model. Our finding theoretically supports a better sample efficiency for transfer learning, when the target task is optimizing over a family of not-so-smooth models (i.e., highly complex networks with the possible use of non-smooth activation functions). Using image classification as an example, we numerically demonstrate the sample efficiency for transfer learning, that is, in the data hungry regime, the model performance can be significantly improved by transfer learning.
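Here is a worked comparison of the two rates. The values are chosen purely for arithmetic (they are not from the paper), and constants are ignored. Take $d = 8$, $\alpha = 2$, $p = 1$ and $m = 10^4$:

```latex
m^{-(\alpha+1)/d} = \left(10^{4}\right)^{-3/8} = 10^{-1.5} \approx 0.032,
\qquad
m^{-p/d} = \left(10^{4}\right)^{-1/8} = 10^{-0.5} \approx 0.316
```

At this sample size, the transfer-learning rate is therefore about ten times smaller. More generally, because both exponents share the denominator $d$, the transfer rate decays faster exactly when $\alpha + 1 > p$.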

强化学习(15篇)

【1】DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards
标题：DelTA：面向可验证奖励强化学习的判别式词元信用分配
链接：https://arxiv.org/abs/2605.21467

作者：Kaiyi Zhang,Wei Wu,Yankai Lin
摘要：Reinforcement learning from verifiable rewards (RLVR) has emerged as a central technique for improving the reasoning capabilities of large language models. Despite its effectiveness, how response-level rewards translate into token-level probability changes remains poorly understood. We introduce a discriminator view of RLVR updates, showing that the policy-gradient update direction implicitly acts as a linear discriminator over token-gradient vectors and thereby determines which token probabilities are increased or decreased during learning. Under standard sequence-level RLVR, this discriminator is constructed from positive- and negative-side centroids formed by advantage-weighted averaging of token-gradient vectors. However, such centroid construction can be dominated by shared high-frequency patterns, such as formatting tokens, diluting sparse yet discriminative directions that better distinguish high-reward responses from low-reward ones. To address this limitation, we propose DelTA, a discriminative token credit assignment method that estimates token coefficients to amplify side-specific token-gradient directions and downweight shared or weakly discriminative ones. These coefficients reweight a self-normalized RLVR surrogate, making the effective side-wise centroids more contrastive and thereby reshaping the RLVR update direction. On seven mathematical benchmarks, DelTA outperforms the strongest same-scale baselines by 3.26 and 2.62 average points on Qwen3-8B-Base and Qwen3-14B-Base, respectively. Additional results on code generation, a different backbone, and out-of-domain evaluations further demonstrate the generalization ability of DelTA.
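Here is a toy version of the discriminator view, assuming token-gradient vectors are given as short lists. The sequence-level update is the sum of advantage-weighted token gradients. It equals `W_pos * c_pos - W_neg * c_neg` and is proportional to `c_pos - c_neg` whenever the positive and negative advantage mass balance (as with group-mean-centred advantages). To first order, each token's log-probability moves by its gradient dotted with the update. DelTA's coefficient estimation is not reproduced here.

```python
def weighted_centroid(vectors, weights):
    """Weighted average of equal-length vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[k] for v, w in zip(vectors, weights)) / total for k in range(dim)]


def discriminator_direction(token_grads, advantages):
    """w = c_pos - c_neg from advantage-weighted token-gradient centroids."""
    pos = [(g, a) for g, a in zip(token_grads, advantages) if a > 0]
    neg = [(g, -a) for g, a in zip(token_grads, advantages) if a < 0]
    c_pos = weighted_centroid([g for g, _ in pos], [a for _, a in pos])
    c_neg = weighted_centroid([g for g, _ in neg], [a for _, a in neg])
    return [p - n for p, n in zip(c_pos, c_neg)]


def logprob_moves(token_grads, w):
    """First-order change of each token's log-prob along w (up to step size)."""
    return [sum(gi * wi for gi, wi in zip(g, w)) for g in token_grads]
```

With gradients `[[1.0, 0.2], [1.0, 0.0], [1.0, -0.2], [1.0, 0.0]]` and advantages `[1, 1, -1, -1]`, the direction is about `[0.0, 0.2]`. In this balanced toy, the shared first coordinate (standing in for formatting tokens) cancels. The positive-side token with a 0.2 second coordinate is pushed up, and its negative-side counterpart is pushed down.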

【2】DeCoR: Design and Control Co-Optimization for Urban Streets Using Reinforcement Learning
标题：DeCoR：使用强化学习的城市街道设计和控制协同优化
链接：https://arxiv.org/abs/2605.21311

作者：Bibek Poudel,Lei Zhu,Kevin Heaslip,Sai Swaminathan,Weizi Li
备注：22 pages, 8 figures
摘要：Modern vision systems can detect, track, and forecast urban actors at scale, yet translating perception outputs to urban design remains limited. We introduce DeCoR, a two-stage reinforcement learning framework that leverages flow observations to co-optimize crosswalk layout and network-level signal control. The design stage encodes the pedestrian network as a graph and learns a generative policy that parameterizes a Gaussian mixture model over crosswalk location and width, from which new crosswalks are sampled. For each layout, a shared control policy learns adaptive signal timings to minimize joint pedestrian and vehicle delay. On a 750 m real-world urban corridor with demand sensed from video and Wi-Fi logs, DeCoR learns a layout that reduces pedestrian arrival time to their nearest crosswalk by 23% while using fewer crosswalks than existing configurations. On the control side, DeCoR reduces pedestrian and vehicle wait time by 79% and 65%, respectively, relative to fixed-time signalization. Further, the control policy generalizes to demands outside of training and is robust to layout changes without retraining.
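Sampling new crosswalks from a Gaussian mixture over location and width can be sketched as follows. In DeCoR, the mixture parameters come from a learned generative policy over the pedestrian-network graph. Here they are fixed, hypothetical numbers, and clipping to the 750 m corridor is an added assumption.

```python
import random


def sample_crosswalks(weights, means, stds, k, corridor_m=750.0, rng=None):
    """Draw k (location_m, width_m) crosswalks from a diagonal 2-D Gaussian mixture."""
    rng = rng or random.Random()
    crosswalks = []
    for _ in range(k):
        j = rng.choices(range(len(weights)), weights=weights)[0]  # pick a component
        loc = rng.gauss(means[j][0], stds[j][0])
        width = rng.gauss(means[j][1], stds[j][1])
        # Keep locations on the corridor and widths non-negative.
        crosswalks.append((min(max(loc, 0.0), corridor_m), max(width, 0.0)))
    return crosswalks


layout = sample_crosswalks(
    weights=[0.7, 0.3],
    means=[(120.0, 4.0), (540.0, 3.0)],
    stds=[(15.0, 0.5), (20.0, 0.5)],
    k=3,
    rng=random.Random(1),
)
print(layout)
```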

【3】Behavior-Consistent Deep Reinforcement Learning
标题：行为一致的深度强化学习
链接：https://arxiv.org/abs/2605.21214

作者：Marcel Hussing,Liv G. d'Aliberti,Claas Voelcker,Benjamin Eysenbach,Eric Eaton
摘要：Reinforcement learning (RL) often exhibits high variance across training runs, leading to unreliable performance and posing a major challenge to deployment in real-world domains. In this work, we address the challenge of cross-run policy divergence by formalizing the problem of behavior-consistent RL, where the objective is to obtain policies that are both high-performing and distributionally similar across training runs. Our key observation is that maximum-entropy RL provides a direct mechanism for controlling behavioral divergence by anchoring runs to a common (uniform) prior. We prove that, for Boltzmann policies, choosing the temperature proportional to $Q$-function disagreement bounds the pairwise KL divergence between the induced policies. However, we also show that naïvely increasing entropy might impair policy optimization while amplifying off-policy error. Building upon these observations, we propose $Q$-value Expectile Disagreement (QED), a state-dependent temperature schedule that uses double-critic disagreement as a single-run proxy for cross-run disagreement. Empirically, we demonstrate that across 18 continuous-control tasks, QED reduces across-run divergence by two orders of magnitude without sacrificing performance, resulting in a considerable reduction in return variance at modest sample-efficiency costs.
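Here is a sketch of why a disagreement-proportional temperature bounds divergence. It relies on a standard softmax fact rather than the paper's proof. Suppose two critics differ by at most Δ on every action. Then the log-ratio of their Boltzmann policies at temperature T is at most 2Δ/T, so KL ≤ 2Δ/T, and setting T = κΔ caps the KL at 2/κ. Using the max over actions, `kappa`, and a discrete action set are assumptions. QED itself uses an expectile of double-critic disagreement.

```python
import math


def boltzmann(q_values, temp):
    """Softmax policy over discrete actions at temperature temp."""
    top = max(q_values)
    exps = [math.exp((q - top) / temp) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def disagreement_temperature(q1, q2, kappa=4.0, floor=1e-6):
    """State-dependent temperature proportional to the max critic disagreement."""
    return max(floor, kappa * max(abs(a - b) for a, b in zip(q1, q2)))


q1, q2 = [1.0, 2.0, 0.5], [1.3, 1.6, 0.7]  # hypothetical twin-critic values
temp = disagreement_temperature(q1, q2)
kl = kl_divergence(boltzmann(q1, temp), boltzmann(q2, temp))
print(temp, kl)  # kl stays below the 2 / kappa = 0.5 bound
```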

【4】Reinforcement Learning-based Control via Y-wise Affine Neural Networks: Comparative Case Studies for Chemical Processes
标题：通过Y向仿射神经网络的基于强化学习的控制：化学过程的比较案例研究
链接：https://arxiv.org/abs/2605.21211

作者：Austin Braniff,Yuhe Tian
备注：Accepted for publication at the 23rd IFAC World Congress, 2026
摘要：In this work we present an efficient and practically implementable approach for the application of reinforcement learning (RL)-based control in chemical process systems. This is an area that has yet to widely adopt RL-based control largely due to inherent challenges in trusting RL algorithms and the time-consuming process of training reliable agents. To address these challenges, we leverage a class of RL algorithms termed Y-wise Affine Neural Network (YANN)-RL, which we have developed in our prior work (Braniff and Tian, 2025a). By strategically initializing actor and critic networks, YANN-RL algorithms provide confident and interpretable starting points within control schemes. We apply this RL-based control approach to three different process engineering case studies publicly available on the PC-Gym library (Bloor et al., 2026): (i) a continuous stirred tank reactor (CSTR), (ii) a four-tank system, and (iii) a multistage extraction column. Our approach is compared to several popular RL algorithms (PPO, SAC, DDPG, and TD3) and is benchmarked against nonlinear model predictive control (NMPC). These case studies demonstrate that YANN-RL can greatly reduce the training time and data needed, can be deployed with confidence for chemical process systems, and can approach the performance of NMPC without the knowledge of a full nonlinear model.

【5】Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning
标题：通过反向生成数据与引导式强化学习学习首次积分
链接：https://arxiv.org/abs/2605.21160

作者：Jingfeng Zhong,Zhengxiang Liu,Zhijie Wang,Shuai Li
备注：17 pages, 2 figures, 3 tables
摘要：The discovery of first integrals is of fundamental scientific importance for understanding conservation laws in dynamical systems. However, existing symbolic computation tools and Large Language Models (LLMs) remain limited on this task because high-quality training data are scarce and successful solutions often depend on mathematical intuition. This paper presents FISolver, an LLM-based solver developed to address this challenge. First, we introduce a "Backward Generation" algorithm that systematically builds large-scale datasets of (differential equation, first integral) pairs by deriving differential equations from sampled integrals, thereby alleviating the data scarcity bottleneck. Second, we apply supervised fine-tuning to a compact mathematical model and further improve its performance through reinforcement learning with a Levenshtein Distance-based shaped reward. In addition, we design data synthesis and blending strategies that support effective adaptation to difficult problem families from sparse examples. Experiments show that FISolver, while requiring substantially lower computational cost, significantly outperforms larger mathematical LLMs and commercial solvers such as Mathematica on challenging benchmarks, indicating a new data-driven route for automated discovery of first integrals.
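Here is a minimal Levenshtein-shaped reward for comparing a generated first integral with a reference expression as strings. The `verified` flag (for example, from a symbolic check that the candidate is conserved) and the 0.5 cap on partial credit are hypothetical choices, not FISolver's published reward.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def shaped_reward(pred: str, target: str, verified: bool) -> float:
    """Full reward for a verified first integral, else capped partial credit."""
    if verified:
        return 1.0
    longest = max(len(pred), len(target), 1)
    return 0.5 * (1.0 - levenshtein(pred, target) / longest)
```

Capping partial credit below the verified reward ensures that a near-miss string cannot outscore a correct answer that happens to be written differently.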

【6】Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving
标题：蒸馏以思考，预见以行动：面向自动驾驶的认知-物理强化学习
链接：https://arxiv.org/abs/2605.21139

作者：Yang Wu,Qiang Meng,Zhaojiang Liu,Youquan Liu,Jian Yang,Jin Xie
摘要：Current end-to-end autonomous driving models are fundamentally constrained by the behavioral cloning ceiling of imitation learning. While reinforcement learning offers a path to smarter autonomy, it demands two missing pieces of infrastructure: (1) a cognitive foundation that understands traffic semantics and driving intent, and (2) a foresighted physical environment that can anticipate the consequences of candidate actions. To this end, we propose CoPhy, a Cognitive-Physical reinforcement learning framework for autonomous driving. To distill to think, we distill VLM knowledge into the BEV encoder and then discard the VLM entirely, retaining cognitive ability at zero inference cost while releasing the cognitive channel as a pluggable interface for optional human language commands. To foresee to act, we build an auto-regressive BEV world model that explicitly predicts future semantic maps conditioned on candidate actions, serving as an interpretable physical sandbox from which safety metrics are directly derived. Built upon this dual infrastructure, we optimize the driving policy via GRPO with a novel dual-reward mechanism: a physical reward derived from BEV rollouts enforces hard safety constraints, while a cognitive reward from a language-aligned scorer ensures intent compliance. Extensive experiments demonstrate that CoPhy not only achieves state-of-the-art results on NAVSIM v1 and v2 benchmarks, but also enables safer driving via cognitively informed scene compliance and flexible intent control through user-defined language instructions.

【7】Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards
标题：具有可验证奖励的强化学习的多步似然比修正
链接：https://arxiv.org/abs/2605.20865

作者：Deokgyu Yoon,Hyungkyu Kang,Joongkyu Lee,Byeongchan Kim,Gyungin Shin,Sungrae Park,Min-hwan Oh
摘要：Reinforcement learning with verifiable rewards (RLVR) plays a pivotal role in improving the reasoning ability of large language models. However, widely used PPO surrogate objectives are fundamentally local, as they rely on a local approximation of the exact policy gradient objective. While this approximation improves stability by reducing the variance induced by importance sampling, it also introduces structural bias into the surrogate objective, which must be controlled through trust region mechanisms. In this work, we introduce the $N$-step forward trace, which augments the PPO surrogate objective using the cumulative likelihood ratio of the next $N-1$ tokens. Building on this idea, we propose $N$-Step Forward-Trace Policy Optimization (NFPO), a practical RLVR algorithm that integrates the $N$-step forward trace into the masked policy gradient framework. NFPO provides a continuous bridge between the PPO surrogate objective and the exact policy gradient objective, offering a principled mechanism for controlling the bias-variance trade-off. Our theoretical analysis shows that, with an appropriate choice of $N$, the proposed objective yields a tighter policy-improvement bound than the standard PPO surrogate. Experiments on comprehensive reasoning benchmarks demonstrate that NFPO consistently improves performance, supporting our theoretical findings.
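The N-step forward trace can be sketched as a product of per-token likelihood ratios (new over old policy probability) over the next N-1 tokens. The products are computed in log space with prefix sums. How NFPO combines this trace with each token's own ratio, masking, and clipping is not shown; the function below only computes the trace as described.

```python
import math


def forward_traces(ratios, n):
    """For each position t, the product of likelihood ratios of tokens t+1 .. t+n-1.

    ratios: per-token pi_new / pi_old for one sampled response.
    n: trace length N; n = 1 gives an empty product (1.0) everywhere.
    """
    T = len(ratios)
    prefix = [0.0]  # prefix[k] = sum of log-ratios of tokens 0 .. k-1
    for r in ratios:
        prefix.append(prefix[-1] + math.log(r))
    traces = []
    for t in range(T):
        end = min(T, t + n)  # stop at the end of the response
        traces.append(math.exp(prefix[end] - prefix[t + 1]))
    return traces
```

Consider ratios `[1.1, 0.9, 1.2, 1.0]`. `N = 1` returns all ones, so each token keeps only its own ratio, as in the PPO surrogate. `N = 2` returns `[0.9, 1.2, 1.0, 1.0]`, and `N = 4` gives about `1.08` for the first token. Larger N multiplies in more future ratios, reducing the local approximation's bias at the cost of importance-sampling variance.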

【8】Design for Manufacturing: A Manufacturability Knowledge-Integrated Reinforcement Learning Framework for Free-Form Pipe Routing in Aeroengines
标题：面向制造的设计：航空发动机自由形式管道布线的可制造性知识集成强化学习框架
链接：https://arxiv.org/abs/2605.20644

作者：Caicheng Wang,Zili Wang,Shuyou Zhang,Yongzhe Xiang,Zheyi Li,Liangyou Li,Jianrong Tan
摘要：Design for manufacturing plays a critical role in advanced aeroengine development, where complex components necessitate careful consideration of manufacturability. However, current practices in pipe routing remain largely decoupled from downstream manufacturing, leading to labor-intensive, trial-and-error iterations to achieve manufacturable designs. To address this problem, this study proposes the Frenet-based pipe routing optimization (FPRO) framework, a manufacturability knowledge-integrated reinforcement learning approach for free-form pipe design in aeroengines. FPRO formulates the routing problem as a boundary value problem in the Frenet frame. In this framework, the pipe path is represented by curvature and torsion profiles, which are generated using cubic Hermite interpolation. To integrate design and manufacturing, domain-specific manufacturing knowledge is embedded as constraints on the permissible ranges of curvature and torsion. The path optimization is performed using the proximal policy optimization algorithm with stochastic exploration and a stage-guided reward mechanism. A unified mapping formulation then translates the optimized path into motion trajectories for the bending die, enabling direct fabrication on a six-axis free-bending machine. Experimental results demonstrate that FPRO consistently generates collision-free, manufacturable paths with smoother geometric profiles compared to Cartesian-based methods. It also achieves faster convergence and superior performance in terminal alignment, path length, obstacle avoidance, and manufacturability compared to state-of-the-art reinforcement learning baselines. Real-world validation confirms the close geometric correspondence between the manufactured pipe and its digital design, validating the practical feasibility of FPRO.
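Here is a sketch of a curvature profile built from cubic Hermite interpolation between knots along arc length. A bound check stands in for the manufacturability constraint. Knot positions, values, slopes, and `kappa_max` are hypothetical. Torsion would be handled the same way, and integrating the Frenet-Serret equations to obtain the 3D path is omitted.

```python
def cubic_hermite(p0, p1, m0, m1, t):
    """Evaluate one cubic Hermite segment at t in [0, 1] (m0, m1 scaled by length)."""
    t2, t3 = t * t, t * t * t
    h00 = 2 * t3 - 3 * t2 + 1
    h10 = t3 - 2 * t2 + t
    h01 = -2 * t3 + 3 * t2
    h11 = t3 - t2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1


def curvature_profile(knots_s, kappa, slopes, s):
    """Piecewise cubic Hermite curvature kappa(s) along arc length s."""
    for i in range(len(knots_s) - 1):
        s0, s1 = knots_s[i], knots_s[i + 1]
        if s0 <= s <= s1:
            h = s1 - s0
            t = (s - s0) / h
            return cubic_hermite(kappa[i], kappa[i + 1], h * slopes[i], h * slopes[i + 1], t)
    raise ValueError("s outside the knot range")


def within_limits(profile, s_values, kappa_max):
    """Manufacturability check: |kappa(s)| <= kappa_max at sampled arc lengths."""
    return all(abs(profile(s)) <= kappa_max for s in s_values)
```

Hermite segments can overshoot their knot values. With knots `[0, 1, 2]`, curvatures `[0, 0.1, 0]` and a slope of 0.5 at the middle knot, the profile reaches about 0.15 near s = 1.2. A bound of 0.1 would reject it, so the constraint has to be checked between knots, not only at them.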

【9】Compositional Transduction with Latent Analogies for Offline Goal-Conditioned Reinforcement Learning
标题：面向离线目标条件强化学习的基于潜在类比的组合式转导
链接：https://arxiv.org/abs/2605.20609

作者：Junseok Kim,Dohyeong Kim,Mineui Hong,Songhwai Oh
备注：ICML 2026
摘要：Compositional generalization is essential for reaching unseen goals under novel contextual variations in offline goal-conditioned reinforcement learning (GCRL), where a generalist goal-reaching agent must be learned from limited data. Most prior approaches pursue this via trajectory stitching over temporally contiguous segments, which limits composing behaviors across varying contexts. To overcome this limitation, we formalize analogy transduction as synthesizing new plans by composing task-endogenous analogies with given contexts and propose a novel analogy representation tailored for it. Grounded in our theory, this analogy representation captures what changes under optimal task execution, remains invariant to contextual variations, and is sufficient for optimal goal reaching. We further contend that generalization to unseen analogy-context pairs is a practical obstacle in analogy transduction, and introduce a new approach for offline GCRL that enables analogy transduction beyond seen pairs to unseen combinations. We empirically demonstrate the effectiveness of our approach on OGBench manipulation environments, substantially outperforming prior methods that do not perform analogy transduction. Project page: https://rllab-snu.github.io/projects/CTA/

【10】ReversedQ: Opportunities for Faster Q-Learning in Episodic Online Reinforcement Learning
标题：ReversedQ：回合制在线强化学习中加速Q学习的机会
链接：https://arxiv.org/abs/2605.20592

作者：Sofia R. Miskala-Dinc,Aviva Prins
备注：This paper contains 5 pages and 2 figures. To be presented at the Adaptive and Learning Agents workshop (ALA 2026) at AAMAS 2026
摘要：We study model-free Q-learning in finite-horizon episodic Markov Decision Processes (MDPs) with stationary dynamics across episodes. We identify a central issue in nascent model-free posterior-sampling works: the reliance on delayed learning in order to prove theoretical guarantees. In particular, we identify three opportunities for faster learning - (i) value-function update order, (ii) update frequencies, and (iii) value-function initialization. Using Wang et al.'s RandomizedQ as a basis, we illustrate these changes and their individual (as well as cumulative) impact in multiple empirical studies. We find that our combined modifications, termed ReversedQ, improve scaled mean cumulative reward compared to RandomizedQ, from 9.53% to 78.78% in the Bidirectional Diabolical Combination Lock (BDCL), and from 21.76% to 61.81% in a chain MDP.
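Here is a sketch that isolates the first opportunity, update order, in tabular finite-horizon Q-learning. After an episode, sweeping the stored transitions from the last step back to the first lets a terminal reward reach step 0 in one pass. A forward sweep, by contrast, moves it back by only one step per episode. The step size of 1 and the toy chain are assumptions, and RandomizedQ's randomized learning rates are not modelled.

```python
from collections import defaultdict


def sweep_episode(Q, episode, actions, alpha=1.0, reverse=True):
    """Apply one-step finite-horizon Q-learning updates to a stored episode.

    episode: list of (h, s, a, r, s_next) for h = 0 .. H-1.
    Q: defaultdict(float) keyed by (h, s, a); the value after step H-1 is 0.
    """
    H = len(episode)
    steps = reversed(episode) if reverse else episode
    for h, s, a, r, s_next in steps:
        future = 0.0 if h == H - 1 else max(Q[(h + 1, s_next, b)] for b in actions)
        Q[(h, s, a)] += alpha * (r + future - Q[(h, s, a)])
    return Q


H = 5
chain = [(h, h, "R", 1.0 if h == H - 1 else 0.0, h + 1) for h in range(H)]
q_rev = sweep_episode(defaultdict(float), chain, ["R"], reverse=True)
q_fwd = sweep_episode(defaultdict(float), chain, ["R"], reverse=False)
print(q_rev[(0, 0, "R")], q_fwd[(0, 0, "R")])  # reversed sweep reaches step 0
```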

【11】Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX
标题：Mahjax：基于JAX、面向强化学习的GPU加速麻将模拟器
链接：https://arxiv.org/abs/2605.20577

作者：Soichiro Nishimori,Shinri Okano,Keigo Habara,Sotetsu Koyamada,Eason Yu,Masashi Sugiyama
摘要：Riichi Mahjong is a multi-player, imperfect-information game characterized by stochasticity and high-dimensional state spaces. These attributes present a unique combination of challenges that mirror complex real-world decision-making problems in reinforcement learning. While prior research has heavily relied on supervised learning from human play logs to pre-train the policy, algorithms capable of learning tabula rasa (from scratch) offer greater potential for general applicability, as evidenced by the AlphaZero lineage. To facilitate such research, we introduce Mahjax, a fully vectorized Riichi Mahjong environment implemented in JAX to enable large-scale rollout parallelization on Graphics Processing Units (GPUs). We also provide a high-quality visualization tool to streamline debugging and interaction with trained agents. Experimental results demonstrate that Mahjax achieves throughputs of up to 2 million and 1 million steps per second on eight NVIDIA A100 GPUs under the no-red and red rules, respectively. Furthermore, we validate the environment's utility for reinforcement learning by showing that agents can be trained effectively to improve their rank against baseline policies.

【12】ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison
标题：ClaimDiff-RL：通过视觉声明比较的细粒度字幕强化学习
链接：https://arxiv.org/abs/2605.20278

作者：Tianle Li,Xuyang Shen,Yan Ma,Rongxin Guo,Shaoxiang Chen,Jiacheng Chen,Haochen Wang,Hongyang Tang,Yucong Zhou,Yu Cheng
摘要：Long-form image captioning exposes a reward granularity problem in RL: captions are judged as whole sequences, while the important errors occur at the level of individual visual claims. A good dense caption should be both faithful and informative, avoiding hallucination without omitting salient details. Yet pairwise preferences, reference-based metrics, and holistic scalar rewards compress these local errors into a single sequence-level signal, obscuring the tradeoff between factuality and coverage. We introduce ClaimDiff-RL, a framework that uses reference-conditioned atomic claim differences as the reward unit for caption RL. Given an image, an actor caption, and a reference caption, a multimodal judge enumerates visually grounded differences, verifies each difference against the image, assigns open-vocabulary error types and severity levels, and produces per-difference statistics for reward composition. This makes hallucinated claims and omitted salient facts separately measurable and tunable. Experiments show that holistic scalar rewards can reduce hallucination by increasing missing facts, while ClaimDiff-RL exposes this faithfulness and coverage tradeoff and enables more balanced operating points. On a 160-image human-labeled diagnostic benchmark, public captioning benchmarks, and VQA benchmarks, ClaimDiff-RL improves the balance between hallucinations and missing facts, preserves general capability, and even surpasses Gemini-3-Pro-Preview on several fine-grained capability dimensions such as object counting, spatial relations, and scene recognition. These results suggest that typed, verifiable claim differences are an effective reward unit for fine-grained and diagnosable caption RL.

【13】Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning
标题：更小的抽象状态空间实现强化学习中的跨规模泛化
链接：https://arxiv.org/abs/2605.20272

作者：Nasehatul Mustakim,Lucas Lehnert
摘要：While humans readily generalize abstract concepts to more complex or larger tasks, building Reinforcement Learning (RL) systems with this ability remains elusive. Here, we present the first theoretical model of how such Out-of-Distribution (OOD) generalization can be achieved in RL agents. Our approach considers Partially Observable Markov Decision Processes (POMDPs) and assumes that an intelligent agent uses an abstraction function to determine which experiences can be treated as equivalent and which must be distinguished. First, we extend the existing state abstraction framework and proof techniques to POMDPs. Then, we define a successor-weighted model reduction, a model reduction variant that enables compression into smaller abstract spaces than prior definitions allow. We derive a bound on the agent's OOD test performance, thereby defining the conditions under which OOD generalization is achievable. This bound decomposes an agent's performance loss into approximation and estimation errors, revealing how reducing an agent's abstract state space size improves test performance and OOD generalization. Our analysis suggests that constraining an agent to operate over a small, finite set of abstract states is necessary for achieving generalization to more complex tasks. Our results motivate further research into learning RL architectures that scale across tasks of varying complexity levels.

【14】FBOS-RL: Feedback-Driven Bi-Objective Synergistic Reinforcement Learning
标题：FBOS-RL：反馈驱动的双目标协同强化学习
链接：https://arxiv.org/abs/2605.20256

作者：Xikai Zhang,Yongzhi Li,Likang Xiao,Yingze Zhang,Yanhua Cheng,Quan Chen,Peng Jiang,Wenjun Wu,Liu Liu
摘要：Reinforcement learning has become a cornerstone for aligning and unlocking the reasoning capabilities of large-scale models. At its core, the training loop of GRPO and its variants alternates between rollout sampling and policy update. Unlike supervised learning, where each gradient step is anchored to an explicit ground-truth target, the optimal gradient direction for updating model parameters in this setting is not known a priori; the high-quality rollouts drawn during the sampling stage therefore act as the implicit "teacher" that guides every parameter update. However, GRPO adopts a simple sampling scheme that conditions all rollouts on the same original prompt. When a task lies beyond the policy model's current capability, this sampling scheme rarely yields a high-quality rollout, leaving the policy model without a meaningful gradient direction when updating its parameters, which causes training to stall. To address this issue, we propose FBOS-RL, a Feedback-Driven Bi-Objective Synergistic reinforcement learning framework. Specifically, we let the model perform Feedback-Guided Exploration Enhancement based on the feedback provided by the environment, and on top of this we design two mutually reinforcing training objectives: Exploitation-oriented Policy Alignment (EPA) and Exploration-oriented Capability Cultivation (ECC). Extensive experiments demonstrate that EPA and ECC can mutually reinforce each other, forming a positive flywheel effect that significantly improves both the training efficiency and the final performance ceiling of reinforcement learning. Specifically, under an identical number of rollouts, FBOS-RL learns substantially faster than GRPO and feedback-based baselines and ultimately attains a higher performance ceiling, while exhibiting higher policy entropy and lower gradient norms throughout training.
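The stall described above follows from how GRPO normalizes rewards within a group of rollouts for the same prompt. Below is a minimal sketch of the standard group-relative advantage; some implementations use the sample rather than the population standard deviation. If every rollout fails, all advantages are zero and the group contributes no gradient.

```python
import statistics


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (r - mean) / (std + eps) within one prompt's group."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


print(group_advantages([0, 0, 0, 0]))  # every rollout failed: no learning signal
print(group_advantages([0, 0, 0, 1]))  # one success: a usable direction appears
```

For rewards `[0, 0, 0, 1]`, the single success gets a positive advantage of about 1.73 and each failure about -0.58. A single high-quality rollout is therefore enough to restore a gradient direction.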

【15】Enhanced Reinforcement Learning-based Process Synthesis via Quantum Computing
标题：通过量子计算增强的基于强化学习的流程综合
链接：https://arxiv.org/abs/2605.21213

作者：Austin Braniff,Fengqi You,Yuhe Tian
摘要：In this work, we present quantum reinforcement learning (RL) as a solution strategy for process synthesis problems. Building on our prior work, we develop a generalized framework that formally poses process synthesis as a Markov decision process and introduces quantum-enhanced RL algorithms to solve it with improved scalability. Earlier implementations of quantum-based RL for process synthesis were limited by qubit requirements, which scaled poorly with problem complexity. This work overcomes this challenge by introducing state encoding algorithms that decouple qubit requirements from problem size. A classical RL-based solution strategy is used as a baseline to benchmark the quantum algorithms under identical training conditions. All algorithms are evaluated on a flowsheet synthesis problem with an increasing number of units to analyze their performance and scalability. Results show that all approaches can identify the optimal flowsheet designs in small design spaces. For moderate unit counts, the quantum approaches show competitive performance on a per-episode basis and improved efficiency on a per-parameter basis relative to the classical RL benchmark. This work provides a foundation for future quantum computing applications within process systems engineering, establishes a controlled benchmark for comparing classical and quantum algorithms, and shows that the proposed quantum variants remain competitive for the process synthesis problem examined in this work.

符号|符号学习(1篇)

【1】Sutra: Tensor-Op RNNs as a Compilation Target for Vector Symbolic Architectures
标题：Sutra：张量运算RNN作为向量符号架构的编译目标
链接：https://arxiv.org/abs/2605.20919

作者：Emma Leonhart
备注：Modified NeurIPS submission, see AI declaration and replication materials at end of paper
摘要：Sutra is a typed, purely functional programming language whose compiled forward pass is a PyTorch neural network. The compiler beta-reduces the whole program (primitives, control flow, and string I/O) to one fused tensor-op graph over a frozen embedding substrate. Rotation binding, unbind, bundle, polynomial Kleene three-valued logic, and tail-recursive loops all lower to tensor operations; the Kleene connectives are Lagrange-interpolated polynomials that are exact on the {-1, 0, +1} truth grid. A single claim is validated in two ways. (1) The same program runs on four frozen embeddings spanning two modalities, namely three text encoders (nomic-embed-text, all-minilm, mxbai-embed-large) and one protein language model (ESM-2), and decodes bundles with 100% accuracy up to width k=8 on every substrate, a width at which the textbook Hadamard product has already collapsed (2.5% on mxbai-embed-large, 7.5% on all-minilm). (2) PyTorch autograd flows through the actually compiled graph: a fuzzy-rule classifier written in .su trains from random initialization (18.7 ± 9.5%; chance = 20%, five classes) to 100.0 ± 0.0% (three seeds) by backpropagating through the emitted graph, with the symbolic source unmodified. A weighted variant additionally trains a scalar cosine gain and writes it back into the .su source as a numeric literal; recompiling reproduces the trained behaviour to within ~2e-7 per logit, so the trained model is itself legible, recompilable code. The same artifact is therefore both a logic program and a trainable neural network.
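
The idea of Kleene connectives as "Lagrange-interpolated polynomials exact on the {-1, 0, +1} truth grid" can be illustrated in plain Python. Encode false = -1, unknown = 0, and true = +1. Under that encoding, strong Kleene AND and OR are min and max, and NOT is negation. Interpolating a 3x3 truth table with the one-dimensional Lagrange basis on {-1, 0, 1} gives the unique polynomial of degree at most 2 in each argument that matches the table on the grid. This is a sketch of the general technique only. The helper names are hypothetical, and Sutra's compiled tensor form is not shown.

```python
from itertools import product

GRID = (-1, 0, 1)  # Kleene truth values: -1 = false, 0 = unknown, +1 = true


def lagrange_basis(v, x):
    """Lagrange basis polynomial on nodes {-1, 0, 1}: equals 1 at node v, 0 at the others."""
    if v == -1:
        return x * (x - 1) / 2
    if v == 0:
        return 1 - x * x
    return x * (x + 1) / 2


def interpolate2(table):
    """Return the bivariate polynomial p(a, b) that reproduces table[(i, j)] on the 3x3 grid."""
    def p(a, b):
        return sum(table[(i, j)] * lagrange_basis(i, a) * lagrange_basis(j, b)
                   for i, j in product(GRID, GRID))
    return p


# Strong Kleene conjunction/disjunction are min/max under this encoding.
k_and = interpolate2({(i, j): min(i, j) for i, j in product(GRID, GRID)})
k_or = interpolate2({(i, j): max(i, j) for i, j in product(GRID, GRID)})


def k_not(a):
    """Kleene negation is already a (linear) polynomial in this encoding."""
    return -a
```

Because `k_and` and `k_or` are ordinary polynomials, they are also defined and smooth between grid points. That smoothness is what lets gradients flow through them when the surrounding graph is trained.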

医学相关(7篇)

【1】Automatic Discovery of Disease Subgroups by Contrasting with Healthy Controls
标题：通过与健康对照组对比自动发现疾病亚组
链接：https://arxiv.org/abs/2605.21301

作者：Robin Louiset,Edouard Duchesnay,Benoit Dufumier,Antoine Grigis,Pietro Gori
备注：Accepted to Data Mining and Knowledge Discovery, ECML-PKDD 2026 Journal Track
摘要：In biomedical Subgroup Discovery, practitioners are interested in discovering interpretable and homogeneous subgroups within a group of patients. In this paper, assuming that healthy subjects (i.e., controls) share common but irrelevant factors of variation with the patients, we motivate and develop a Contrastive Subgroup Discovery method, entitled Deep UCSL. By contrasting patients with controls, Deep UCSL identifies subgroups driven solely by pathological factors, ignoring common variability shared with healthy subjects. Our framework employs a deep feature extractor to learn a discriminative representation space. Mathematically, we derive a novel loss based on the conditional joint likelihood of latent clusters and patient/control labels, optimized via an Expectation-Maximization strategy alternating between subgroup inference and feature encoder updates. A regularization term further encourages representations to capture disease-specific variability while ignoring variability shared with controls. Compared to previous related works, our approach quantitatively improves the quality of the estimated subgroups, as demonstrated on an MNIST example and four distinct real medical imaging datasets. Code and datasets are available at: https://github.com/rlouiset/deep_ucsl.

【2】Training distribution determines the ceiling of drug-blind cancer sensitivity prediction
标题：训练分布决定药物盲癌症敏感性预测的上限
链接：https://arxiv.org/abs/2605.20885

作者：Taekyung Heo
摘要：Precision oncology requires predicting which drugs will suppress a specific tumor from its molecular profile, but drug-blind sensitivity prediction has plateaued despite increasingly complex drug representations. Here we show that this stagnation reflects a metric artifact rather than a representational bottleneck. The standard benchmark, global Pearson r, is dominated by between-drug potency differences that a trivial drug-mean predictor captures without any cell-specific learning. Per-drug Pearson r, which isolates within-drug cell ranking, reveals that no drug encoding improves over cell-only features across four independent datasets. A controlled experiment that channels mechanism-of-action (MoA) identity either as a drug feature or as a training-distribution constraint identifies the cause. Supplying MoA as a feature yields negligible benefit, whereas using it to stratify training raises per-drug r substantially for targeted kinase inhibitors, because pan-cancer co-training suppresses pathway-specific sensitivity signals. Mechanism-stratified training and response matching from pilot observations provide two deployable strategies that together recover the principal sources of predictive gain in drug-blind sensitivity prediction.
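
The metric artifact can be reproduced with toy numbers. In the sketch below, a hypothetical drug-mean predictor never looks at the cell line. It still scores a high global Pearson r, because the drugs differ in potency. Its per-drug r, however, is undefined: the predictor has zero variance within each drug, so it carries no within-drug cell ranking. The responses are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson r; returns nan when either input has zero variance (r is undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


# Toy responses[drug] = sensitivity per cell line; drugs differ mostly in potency.
responses = {
    "drug_A": [-1.0, 0.0, 1.0],
    "drug_B": [6.0, 5.0, 4.0],
    "drug_C": [10.0, 11.0, 9.0],
}

# Trivial drug-mean predictor: the same value for every cell line of a drug.
preds = {d: [sum(v) / len(v)] * len(v) for d, v in responses.items()}

# Global r pools all (drug, cell) pairs; per-drug r looks within each drug only.
global_r = pearson(sum(preds.values(), []), sum(responses.values(), []))
per_drug_r = {d: pearson(preds[d], responses[d]) for d in responses}
```

Here `global_r` is about 0.98 even though the predictor has learned nothing about cell lines. Every entry of `per_drug_r` is nan, which exposes the absence of cell-specific signal.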

【3】NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding
标题：NeuroQA：3D大脑MRI理解的大规模基于图像的基准
链接：https://arxiv.org/abs/2605.20525

作者：Mohammad H. Abbasi,Favour Nerrise,Shaurnav Ghosh,Ridvan Yesiloglu,Yuncong Mao,Bailey Trang,Mohammad Asadi,Merryn Daniel,Gustavo Chau Loo Kung,Ken Chang,Pavan Pinkesh Shah,Adam Turnbull,Kyan Younes,Seena Dehkharghani,Ehsan Adeli
备注：30 pages, dataset and benchmark release
摘要：We present NeuroQA, a large-scale benchmark for visual question answering in 3D brain magnetic resonance imaging (MRI), with 56,953 QA pairs from 12,977 subjects across 12 datasets. It spans ages 5-104 and five clinical domains: Alzheimer's, Parkinson's, tumors, white matter disease, and neurodevelopment. Unlike prior medical Visual Question Answering (VQA) efforts that operate on 2D slices or rely on narrow diagnostic labels, NeuroQA pairs every item with a full 3D volume. It evaluates 11 clinically grounded reasoning skills across Yes/No, multiple-choice, and open-ended formats. Of the 203 templates, 131 are image-grounded (answerable from a 3-plane viewer) and 72 are image-informed (ground truth from quantitative volumetry or clinical instruments). To remove text-only shortcuts, we apply answer-distribution refinement, reducing closed-format text-only accuracy from >80% to 44.6%; image necessity is assessed separately through an image-grounding protocol released with the benchmark. A 38-rule deterministic pipeline and two rounds of expert review verify every QA pair against FreeSurfer measurements, metadata, or radiology report fields, with zero same-subject contradictions across templates. We conduct a clinician evaluation in which two clinicians independently assess 100 frozen test items on a three-plane viewer. On closed-format (Yes/No + multiple-choice) test-public items, the best zero-shot vision-language model and a supervised 3D CNN baseline reach 47.5% and 43.7% accuracy respectively, both below the 49.4% text-only majority-template floor. NeuroQA adopts a two-tier release with public QA pairs for open-access datasets and reproducible generation scripts for datasets restricted by data use agreements (DUAs), plus subject-level splits, a held-out private test set, and an online leaderboard.
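
One plausible reading of the "text-only majority-template floor" is a baseline that answers each question with the most frequent training answer for its template and never sees the MRI volume. The sketch below works under that assumption. The template IDs and answers are hypothetical, and NeuroQA's exact definition may differ.

```python
from collections import Counter, defaultdict


def majority_template_floor(train, test):
    """Accuracy of predicting, per template, the most common training answer.

    Items are (template_id, answer) pairs; no image is ever consulted.
    """
    by_template = defaultdict(Counter)
    for template_id, answer in train:
        by_template[template_id][answer] += 1
    correct = 0
    for template_id, answer in test:
        counts = by_template.get(template_id)  # .get avoids creating empty entries
        guess = counts.most_common(1)[0][0] if counts else None
        correct += guess == answer
    return correct / len(test)


# Hypothetical templates and answers, for illustration only.
train_items = [("ventricles_enlarged", "yes")] * 3 + [("ventricles_enlarged", "no")] + [
    ("tumor_side", "left"), ("tumor_side", "right"), ("tumor_side", "left")]
test_items = [("ventricles_enlarged", "yes"), ("ventricles_enlarged", "no"),
              ("tumor_side", "left"), ("tumor_side", "right")]
floor = majority_template_floor(train_items, test_items)
```

A model that cannot beat this number is not yet extracting answer-relevant information from the images.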

【4】MedCRP-CL: Continual Medical Image Segmentation via Bayesian Nonparametric Semantic Modality Discovery
标题：MedCRP-CL：基于贝叶斯非参数语义模态发现的持续医学图像分割
链接：https://arxiv.org/abs/2605.20297

作者：Ziyuan Gao
备注：Accepted by ICML 2026
摘要：Medical image segmentation faces a fundamental challenge in continual learning: data arrives sequentially from heterogeneous sources, yet effective continual learning requires discovering which tasks share sufficient structure to benefit from joint learning. Existing methods either apply uniform constraints across all tasks, causing catastrophic forgetting when tasks conflict, or require predefined task groupings that cannot anticipate future task diversity. We introduce MedCRP-CL, a framework that performs online task structure discovery and structure-aware continual learning. Leveraging the Chinese Restaurant Process (CRP), our method dynamically infers task groupings from clinical text prompts as tasks arrive, without requiring predefined cluster counts or access to future tasks. We term these discovered groupings semantic modalities, as they capture finer-grained structure than physical imaging modalities by integrating anatomical region and pathological context. Guided by this discovered structure, we maintain semantic modality-specific LoRA adapters regularized by intra-modality EWC, ensuring parameter isolation across dissimilar task groups while facilitating knowledge transfer within similar ones. The framework is also replay-free, storing only aggregate statistics rather than raw patient data. Experiments on 16 medical segmentation tasks across four imaging modalities demonstrate that MedCRP-CL achieves 73.3% Dice score with only 4.1% forgetting, outperforming the best baseline by 8.0% while requiring 6× fewer parameters. Code is available at https://github.com/zygao930/MedCRP-CL.
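
The CRP step can be pictured as an online "seat the next task" decision. Each arriving task either joins an existing group, with weight proportional to the group's size times how well the task matches it, or opens a new group, with weight proportional to the concentration α. The sketch makes a greedy, MAP-style choice instead of sampling or performing full Bayesian inference. The cosine-similarity likelihood, `kappa`, and `base_sim` are hypothetical stand-ins for whatever likelihood MedCRP-CL places on clinical text-prompt embeddings.

```python
from math import exp, sqrt


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def crp_assign(embeddings, alpha=1.0, kappa=10.0, base_sim=0.5):
    """Greedy online CRP-style grouping of task embeddings.

    Existing group k scores n_k * exp(kappa * cos(x, centroid_k));
    a new group scores alpha * exp(kappa * base_sim) (hypothetical likelihoods).
    """
    groups = []  # each group is a list of member embeddings
    labels = []
    for x in embeddings:
        scores = []
        for members in groups:
            centroid = [sum(col) / len(members) for col in zip(*members)]
            scores.append(len(members) * exp(kappa * cosine(x, centroid)))
        scores.append(alpha * exp(kappa * base_sim))  # the "new table" option
        k = max(range(len(scores)), key=scores.__getitem__)
        if k == len(groups):
            groups.append([x])
        else:
            groups[k].append(x)
        labels.append(k)
    return labels


# Two similar tasks share a group; a dissimilar third task opens a new one.
labels = crp_assign([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
```

Note that the number of groups is never fixed in advance, which mirrors the abstract's point that no predefined cluster count is needed.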

【5】TreeText-CTS: Compact, Source-Traceable Tree-Path Evidence for Irregular Clinical Time-Series Prediction
标题：TreeText-CTS：用于不规则临床时间序列预测的紧凑、源可追溯的树路径证据
链接：https://arxiv.org/abs/2605.20292

作者：Kwanhyung Lee,Juhwan Choi,Jongheon Kim,Joohyung Lee,Hyeongwon Jang,Eunho Yang
备注：27 pages, 4 figures
摘要：Numerical time-series models can effectively process irregular electronic health record (EHR) trajectories, but they do not naturally expose the measurements and temporal patterns supporting each risk estimate as readable evidence. Existing text-based interfaces improve readability, but typically rely on either raw serialization, which is lengthy and redundant, or patient-level free-form summaries, which are difficult to trace to source measurements and time windows. To bridge this gap, we introduce TreeText-CTS (Clinical Time-Series), which converts irregular EHR trajectories into human-readable, compact, source-traceable tree-path evidence units without patient-level summarization or inference-time autoregressive decoding. TreeText-CTS routes multi-scale window summaries through frozen XGBoost models and verbalizes activated tree paths as deterministic, source-traceable evidence units composed of threshold conditions. An evidence selector assembles an informative subset of these units, which a language-model encoder then integrates for prediction. Across PhysioNet 2012 mortality, MIMIC-III mortality, and PhysioNet 2019 sepsis-onset forecasting, TreeText-CTS achieves the best AUROC and AUPRC among evaluated text-based EHR time-series interfaces, improving AUPRC by 6.0 to 9.7 absolute percentage points over the strongest prior text-based interface while remaining competitive with numerical time-series models. Ablations show that tree-path evidence construction, evidence selection, and language-model composition each contribute to performance. Because every span passed to the language-model encoder is constructed from activated tree-path threshold conditions, TreeText-CTS makes the evidence supplied to the final predictor inspectable and source-traceable.
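
To make "verbalizes activated tree paths as ... threshold conditions" concrete: walk one tree for one patient and record each split the sample passes. The result is a readable rule whose conditions trace back to named source windows. The tree, the feature names, and the `<=`-goes-left convention below are hypothetical. TreeText-CTS routes features through frozen XGBoost models, and a real port should follow that library's own split rule.

```python
# A hypothetical depth-2 tree; internal nodes test "feature <= threshold".
TREE = {
    "feature": "heart_rate_max_6h", "threshold": 110.0,
    "left": {"feature": "lactate_last_24h", "threshold": 2.0,
             "left": {"leaf": -0.4}, "right": {"leaf": 0.3}},
    "right": {"leaf": 0.8},
}


def activated_path(tree, sample):
    """Walk the tree for one sample and verbalize each split it passes as a condition."""
    conditions = []
    node = tree
    while "leaf" not in node:
        f, t = node["feature"], node["threshold"]
        if sample[f] <= t:
            conditions.append(f"{f} <= {t:g}")
            node = node["left"]
        else:
            conditions.append(f"{f} > {t:g}")
            node = node["right"]
    return " AND ".join(conditions), node["leaf"]


# Hypothetical multi-window summaries for one patient.
sample = {"heart_rate_max_6h": 96.0, "lactate_last_24h": 3.1}
rule, leaf = activated_path(TREE, sample)
# rule == "heart_rate_max_6h <= 110 AND lactate_last_24h > 2", leaf == 0.3
```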

【6】AIMBio-Mat: An AI-Native FAIR Platform for Closed-Loop Materials Discovery and Biomedical Translation
标题：AIMBio-Mat：面向闭环材料发现与生物医学转化的AI原生FAIR平台
链接：https://arxiv.org/abs/2605.21083

作者：D. -M. Mei,K. Acharya,C. M. Adhikari,M. Adhikari,S. Aryal,B. V. Benson,K. Bhatta,S. Bhattarai,N. Budhathoki,A. M. Castillo,D. Chakraborty,S. Chhetri,S. Choudhury,T. A. Chowdhury,R. D. Cruz,B. Cui,S. Dhital,K. -M. Dong,R. Gapuz,A. Ghasemi,E. Z. Gnimpieba,B. D. S. Gurung,H. A. Hashim,R. I. Harry,K. -E. Hasin,M. K. Hassanzadeh,M. K. Jha,D. Kim,K. -C. Kong,B. Lama,A. Mahat,N. Maharjan,A. Majeed,J. Mammo,M. M. Masud,K. S. Moore,A. Nawaz,H. Oli,S. A. Panamaldeniya,L. Pandey,R. Pandey,Z. Peng,A. Prem,M. M. Rana,K. Rana Magar,R. Rizk,C. S. Tadi,L. -W. Wang,Y. Yang,G. -L. Yin,C. -X. Yu,D. Zeng,M. Zhou,Q. Zhou
备注：35 pages, 4 figures, and 12 tables
摘要：Materials discovery and biomedical translation increasingly require models that can reason across composition, processing, structure, biological response, manufacturability, safety, and governance constraints. Existing materials and biomedical data ecosystems are powerful but remain poorly coupled for AI-guided discovery. Here we present AIMBio, a conceptual framework for an AI-native, FAIR, and governance-aware decision layer that links materials provenance, biomedical context, knowledge graphs, uncertainty-aware machine learning, and human-in-the-loop active learning. The framework formulates biomedical-materials discovery as constrained multi-objective optimization under uncertainty and introduces practical requirements for metadata, model documentation, risk-tiered governance, evaluation metrics, and phased implementation. To make the roadmap testable, we add a minimum viable prototype specification and a worked pilot for AI-guided nanomaterials for drug delivery. AIMBio is positioned as exploratory and preclinical discovery infrastructure, not as clinical decision-support software; any clinical or regulated-device use would require separate validation, change control, and regulatory review. The central contribution is a publishable platform blueprint for converting fragmented materials and biomedical records into auditable, experimentally actionable, and translationally responsible discovery workflows.

【7】Motion-Robust Deep Reconstruction for Free-Breathing Cardiac Cine MRI
标题：自由呼吸心脏电影MRI的运动稳健深度重建
链接：https://arxiv.org/abs/2605.20687

作者：Mahmut Yurt,Kanghyun Ryu,Zhitao Li,Xucheng Zhu,Xianglun Mao,Martin Janich,Marcus Alley,Kawin Setsompop,John Pauly,Shreyas Vasanawala,Ali Syed
摘要：Conventional cardiac cine MRI relies on breath-hold Cartesian acquisitions, which are vulnerable to motion artifacts and can be uncomfortable or infeasible, particularly for pediatric and other noncompliant patients who cannot reliably hold their breath. Free-breathing radial acquisitions can alleviate these limitations, but robust reconstruction at high acceleration remains challenging due to prominent streak artifacts. To address these limitations, we propose Cine-DL, a clinically oriented framework that couples targeted k-space preprocessing with fast, model-based deep reconstruction. In this pipeline, raw free-breathing radial data undergo retrospective cardiac binning and respiratory gating to resolve cardiac phases and discard motion-corrupted spokes. We then introduce Streak Optimized Coil Compression (SOC), which explicitly preserves cardiac signals while suppressing peripheral interference that typically drives the streak artifacts. The resulting 2D+t cine series is reconstructed with an unrolled network that alternates a ResNet proximal operator with physics-based data consistency updates solved via conjugate gradient. We further employ a memory-efficient training strategy that reduces peak memory usage. We evaluate Cine-DL on free-breathing volunteer data against established baselines (k-t SENSE and iGRASP) and demonstrate clinical translation via hospital deployment on newly acquired patient data. Our experiments show that Cine-DL consistently improves quantitative metrics and visual fidelity, supporting a practical route toward routine, time-sensitive clinical adoption of free-breathing cine MRI.
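
In unrolled reconstructions, the data-consistency step is commonly written as solving (A^T A + λI) x = A^T y + λz. Here z is the output of the learned proximal (denoiser) block and A is the forward operator. Conjugate gradient solves this system using only products with A and A^T. The pure-Python sketch below assumes a small real-valued matrix A and an arbitrary λ. In Cine-DL's setting, A would be a complex multicoil radial encoding operator and A^T its conjugate transpose.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def data_consistency_cg(A, y, z, lam, iters=50, tol=1e-10):
    """Solve (A^T A + lam*I) x = A^T y + lam*z with conjugate gradient.

    z is the proximal (denoiser) output; A is a small real-valued stand-in
    for the forward operator. The normal matrix is never formed explicitly.
    """
    At = transpose(A)

    def normal_op(v):  # v -> (A^T A + lam*I) v
        return [a + lam * b for a, b in zip(matvec(At, matvec(A, v)), v)]

    b = [a + lam * c for a, c in zip(matvec(At, y), z)]
    x = list(z)  # warm start from the denoiser output
    r = [bi - ai for bi, ai in zip(b, normal_op(x))]
    p = list(r)
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol * tol:  # residual norm below tol
            break
        Ap = normal_op(p)
        step = rs / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Usage: one data-consistency step for a toy 3x2 operator.
A = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]]
y = [1.0, 2.0, 3.0]
z = [0.0, 0.0]
x = data_consistency_cg(A, y, z, lam=0.1)
```

In an unrolled network, this solve alternates with the learned proximal block for a fixed number of iterations.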

蒸馏|知识提取(4篇)

【1】Optimized Federated Knowledge Distillation with Distributed Neural Architecture Search
标题：结合分布式神经架构搜索的优化联邦知识蒸馏
链接：https://arxiv.org/abs/2605.21322

作者：Chaimaa Medjadji,Sylvain Kubler,Yves Le Traon,Guilain Leduc,Sadi Alawadi,Feras M. Awaysheh
摘要：Federated Learning (FL) enables collaborative model training without centralizing data. However, real-world deployments must simultaneously address statistical heterogeneity across client data (non-IID), system heterogeneity in device capabilities, and communication efficiency. Existing FL approaches mitigate these challenges through improved aggregation, personalization, or knowledge distillation, but they almost universally assume a fixed client architecture, limiting adaptability to heterogeneous data complexity and hardware constraints. This architectural constraint often leads to suboptimal trade-offs between accuracy and efficiency in real-world FL systems. This work introduces FedKDNAS, a distillation-driven FL framework that combines client-side neural architecture selection with distillation of server-coordinated knowledge. Each client autonomously selects a lightweight model under accuracy-resource constraints. It then trains it locally using a hybrid objective combining supervised learning and knowledge distillation and shares only predictions on a public reference set. The server then aggregates and smooths these predictions, optionally combining them with a teacher model, to produce stable distillation targets for the next round. Extensive evaluation on six datasets against six representative FL baselines (FedAvg, Ditto, FedMD, FedDF, FedDistill, Local-KD) demonstrates that FedKDNAS consistently achieves superior Pareto efficiency, improving accuracy by up to 15% under non-IID conditions, reducing client CPU usage by approximately 28%, and decreasing communication overhead by up to 44 times while maintaining lightweight logit-based communication.

【2】PACD-Net: Pseudo-Augmented Contrastive Distillation for Glycemic Control Estimation from SMBG
标题：PACD-Net：用于从SMBG估计血糖控制的伪增广对比蒸馏
链接：https://arxiv.org/abs/2605.20751

作者：Canyu Lei,David Repaske,Jianxin Xie
摘要：Effective diabetes management requires continuous monitoring of glycemic levels. Clinically, glycemic control is assessed using metrics such as Time in Range (TIR), Time Below Range (TBR), and Time Above Range (TAR), typically derived from continuous glucose monitoring (CGM). However, many patients rely on self-monitoring of blood glucose (SMBG) due to the high cost and limited accessibility of CGM. Unlike CGM, SMBG provides sparse and irregular measurements, making accurate estimation of these metrics challenging. Conventional supervised learning approaches struggle under such sparsity, leading to poor generalization and unstable performance. To address this, we propose PACD-Net, a self-supervised contrastive knowledge distillation framework for estimating glycemic control from SMBG. Pseudo-SMBG samples with richer temporal coverage are used as teacher signals to guide learning from sparse observations. In addition, multi-view contrastive learning enforces representation consistency across diverse sampling patterns. The model adopts a hybrid Swin Transformer-CNN backbone to capture temporal dependencies in sparse SMBG sequences. Experimental results demonstrate that PACD-Net consistently outperforms existing methods in estimating TAR, TIR, and TBR from real-world SMBG data, achieving improved accuracy as well as enhanced stability and generalization under extremely sparse observation settings. The proposed framework provides a practical tool for clinical SMBG interpretation and offers a generalizable approach for learning from sparse and irregularly sampled sensor data in broader applications.
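
The three clinical targets are simple fractions of time. The sketch below assumes the widely used consensus cut-offs: TBR below 70 mg/dL, TIR from 70 to 180 mg/dL inclusive, and TAR above 180 mg/dL. It also assumes equally spaced CGM readings, and the readings themselves are invented. Recomputing the metrics on every third reading mimics how sparse fingerstick (SMBG) sampling can distort them.

```python
def glycemic_metrics(readings_mg_dl, low=70.0, high=180.0):
    """Percent of readings below, within, and above the target range.

    Assumes equally spaced readings (as from CGM), so each reading stands for equal time.
    """
    n = len(readings_mg_dl)
    tbr = sum(g < low for g in readings_mg_dl) / n * 100
    tar = sum(g > high for g in readings_mg_dl) / n * 100
    tir = 100 - tbr - tar
    return {"TBR": tbr, "TIR": tir, "TAR": tar}


# Invented CGM trace (mg/dL) and a sparse subsample standing in for SMBG.
cgm = [65, 90, 120, 150, 190, 210, 160, 100]
smbg = cgm[::3]  # [65, 150, 160]

cgm_metrics = glycemic_metrics(cgm)    # {"TBR": 12.5, "TIR": 62.5, "TAR": 25.0}
smbg_metrics = glycemic_metrics(smbg)  # TAR drops to 0.0: both high excursions were missed
```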

【3】SAVER: Selective As-Needed Vision Evidence for Multimodal Information Extraction
标题：SAVER：面向多模态信息抽取的选择性按需视觉证据
链接：https://arxiv.org/abs/2605.20713

作者：Miaobo Hu,Shuhao Hu,Bokun Wang,Rui Chen,Xin Wang,Xiaobo Guo,Daren Zha,Jun Xiao
摘要：Multimodal information extraction (IE) in social media is difficult because a post may attach multiple images that are weakly related, redundant, or even misleading with respect to the text. In this setting, always-on multimodal fusion wastes computation and can amplify spurious visual cues. The core challenge is to decide, for each candidate span or marked entity pair, whether vision should be consulted at all and, if so, which small subset of images provides trustworthy evidence. We propose SAVER, a selective vision-as-needed framework for multimodal named entity recognition (MNER) and multimodal relation extraction (MRE). SAVER uses a Conformal Groundability Gate (CGG) to estimate span-level visual groundability in MNER, derive pair-level activation in MRE from the two marked entities, and calibrate the activation threshold on a held-out split via a conformal-style procedure with Clopper–Pearson upper bounds. When activated, a submodular relevance–diversity selector chooses a compact evidence subset across images, which is then aggregated by a Set Transformer. An energy-inspired joint scoring head combines text, optional visual evidence, text–image consistency, and sparse routing for entity typing or relation classification. Experiments show that SAVER consistently improves F1 over strong text-only and always-on multimodal baselines, while reducing AURC, increasing activation coverage at a fixed risk level, and lowering FLOPs and P90 latency.
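
The calibration step can be sketched as follows. On a held-out split, pick the smallest gate threshold whose activated set has a one-sided Clopper–Pearson upper confidence bound on its error rate no larger than the target risk. The exact bound is computed by bisection on the binomial CDF, which keeps the sketch dependency-free. Several details are assumptions: what counts as an error (here, a mispredicted activated item), the risk level, and δ. SAVER's procedure may define them differently.

```python
from math import comb


def binom_cdf(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def cp_upper(k, n, delta=0.05):
    """One-sided Clopper-Pearson upper bound: the p with P(X <= k | n, p) = delta."""
    if k >= n:
        return 1.0
    lo, hi = k / n, 1.0
    for _ in range(100):  # bisection; binom_cdf is decreasing in p
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def calibrate_threshold(scores, errors, risk=0.2, delta=0.05):
    """Smallest gate threshold whose activated set has a certified error bound <= risk.

    scores: gate score per held-out item; errors: 1 if the activated item was mispredicted.
    """
    for t in sorted(set(scores)):
        active = [e for s, e in zip(scores, errors) if s >= t]
        if active and cp_upper(sum(active), len(active), delta) <= risk:
            return t
    return None  # no threshold certifies the target risk


# Toy held-out split: the six lowest-scoring items are the mispredicted ones.
scores = [i / 20 for i in range(20)]
errors = [1] * 6 + [0] * 14
t = calibrate_threshold(scores, errors)  # first threshold whose bound falls below 0.2
```

With zero errors, the bound needs at least 14 activated items to fall below 0.2. That requirement is why the chosen threshold is `scores[6]` rather than `scores[5]`.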

【4】Consistently Informative Soft-Label Temperature for Knowledge Distillation
标题：知识蒸馏的一致信息软标签温度
链接：https://arxiv.org/abs/2605.20357

作者：Hoang-Chau Luong,Nghia Van Vo,Kaiqi Zhao,Lingwei Chen
摘要：Knowledge distillation (KD) transfers knowledge from a high-capacity teacher to a compact student by matching their predictive distributions, with temperature scaling serving as a central mechanism for smoothing teacher predictions and exposing informative "dark knowledge" beyond the hard label. However, the standard fixed-temperature design is inherently sample-agnostic. Since samples differ in logit scale and learning difficulty, a single global temperature produces teacher soft labels with highly inconsistent entropy: some predictions remain overly sharp and provide limited inter-class information, whereas others become over-smoothed and lose class-discriminative information. Moreover, sharing the same temperature between teacher and student further imposes rigid logit-scale alignment despite their capacity mismatch. To address these limitations, we propose CIST (Consistently Informative Soft-label Temperature), which assigns separate sample-wise adaptive temperatures to the teacher and student. This design produces consistently informative teacher soft labels while relaxing rigid teacher–student logit-scale matching. It also reweights the distillation objective according to teacher confidence and student learning difficulty. Theoretically, we show that teacher-label entropy is largely governed by the ratio between the maximum teacher logit and the temperature, providing a principled basis for adaptive smoothing. Empirically, CIST mitigates the inconsistency induced by fixed temperature, and experiments on both vision and language distillation tasks show consistent improvements over standard KD and strong baselines with negligible computational overhead.
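
The abstract observes that teacher-label entropy is governed by the ratio of the maximum logit to the temperature. That observation suggests choosing a temperature per sample. One hypothetical rule, not necessarily CIST's, exploits the fact that the entropy of softmax(z / T) increases monotonically with T. Bisection can then find, for each sample, the T that yields a common target entropy. Sharp, large-margin logits end up with a higher temperature than flat ones. The target of 0.8 nats and the search range are assumptions.

```python
from math import exp, log


def softmax(logits, T):
    m = max(logits)
    exps = [exp((z - m) / T) for z in logits]  # shift by max for numerical stability
    s = sum(exps)
    return [e / s for e in exps]


def entropy(p):
    return -sum(q * log(q) for q in p if q > 0)


def temperature_for_entropy(logits, target, lo=1e-3, hi=1e3, iters=100):
    """Bisection on T: softmax entropy rises monotonically with temperature,
    so find the per-sample T whose soft label has the target entropy (in nats)."""
    for _ in range(iters):
        mid = (lo * hi) ** 0.5  # geometric midpoint: T spans several orders of magnitude
        if entropy(softmax(logits, mid)) < target:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5


t_sharp = temperature_for_entropy([12.0, 2.0, 1.0, 0.0], 0.8)  # confident teacher
t_flat = temperature_for_entropy([2.0, 1.8, 1.5, 1.2], 0.8)    # uncertain teacher
```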

推荐(3篇)

【1】HiRes: Inspectable Precedent Memory for Reaction Condition Recommendation
标题：HiRes：用于反应条件推荐的可检视先例记忆
链接：https://arxiv.org/abs/2605.21420

作者：Shreyas Vinaya Sathyanarayana,Raja Sekhar Pappala,Deepak Warrier
摘要：Reaction condition recommendation sits immediately after retrosynthetic disconnection selection, and in practice, chemists require both accurate predictions and the precedents that justify them. We present HiRes (Hierarchical Reaction Representations), a retrieval-augmented condition recommendation system whose learned reaction space serves as both a classifier feature and an inspectable precedent memory. The model combines a graph encoder, transformation-aware cross-attention, multi-stream reaction fusion, and a k-NN retrieval layer. HiRes achieves state-of-the-art performance among primary-slot USPTO-Condition models, reaching Catalyst, Solvent, and Reagent top-1 accuracies (Acc@1) of 0.929, 0.534, and 0.530 respectively. It ties the best reported baseline on Catalyst while outperforming models such as REACON on Solvent and Reagent. Furthermore, paired bootstrap analysis demonstrates that integrating retrieval with learned condition heads provides statistically significant gains for solvent and reagent selection over purely parametric approaches. Ultimately, HiRes bridges the gap between predictive accuracy and chemical interpretability, offering a single representation that supplies both competitive recommendations and the concrete chemical precedents necessary for practical synthesis planning.

【2】Robust Personalized Recommendation under Hidden Confounding in MNAR
标题：MNAR中隐藏混淆下的稳健个性化推荐
链接：https://arxiv.org/abs/2605.21066

作者：Zongyu Li,Wanting Su,Tianyu Xia
摘要：Recommender systems often rely on observational user–item interaction data, which is prone to selection bias because users interact with items selectively. Inverse propensity weighting and doubly robust estimators effectively mitigate selection bias under observed confounding, but are unreliable in the presence of hidden confounders. Existing approaches relying on randomized controlled trials (RCTs) or global sensitivity bounds are constrained in practice: RCTs demand costly experimental data, while global sensitivity bounds, obtained through sensitivity analysis, presume that unmeasured confounders have a uniformly bounded effect on propensities, thereby neglecting heterogeneity across user–item interactions. To overcome this limitation, we propose the Personalized Unobserved-Confounding-aware Interaction Deconfounder (PUID), a framework that estimates user–item-level sensitivity bounds, thereby substantially relaxing the homogeneity assumption inherent in global sensitivity bounds. To ensure both robustness and predictive accuracy, we further develop an adversarial optimization strategy and propose a benchmark-guided variant (BPUID) that incorporates pre-trained models as stabilizing references. Extensive experiments on three real-world datasets demonstrate that our approach significantly outperforms global methods under hidden confounding, without requiring RCT data.
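
For reference, the standard estimators that the abstract contrasts against fit in a few lines. The naive estimator averages over observed pairs only. Inverse propensity scoring (IPS) reweights observed errors by 1/propensity. The doubly robust (DR) estimator adds an IPS-weighted correction to imputed errors. Both IPS and DR rely on the propensities being right. PUID's contribution, per-pair sensitivity bounds on those propensities plus adversarial training, is not shown. The toy numbers are invented.

```python
def ips_and_dr(errors_obs, observed, propensity, imputed):
    """Naive, IPS, and doubly robust estimates of the average error over all user-item pairs.

    errors_obs[i]: prediction error on pair i (only used where observed[i] == 1)
    propensity[i]: estimated P(pair i is observed); imputed[i]: imputed error for pair i
    """
    n = len(observed)
    naive = sum(e for e, o in zip(errors_obs, observed) if o) / sum(observed)
    ips = sum(o * e / p for e, o, p in zip(errors_obs, observed, propensity)) / n
    dr = sum(h + o * (e - h) / p
             for e, o, p, h in zip(errors_obs, observed, propensity, imputed)) / n
    return naive, ips, dr


# Four pairs; the last is unobserved, so its error entry is ignored by all three estimators.
naive, ips, dr = ips_and_dr(errors_obs=[0.2, 0.2, 0.8, 0.0], observed=[1, 1, 1, 0],
                            propensity=[0.8, 0.8, 0.5, 0.5], imputed=[0.3, 0.3, 0.7, 0.7])
```

Here naive = 0.4, IPS = 0.525, and DR = 0.4875. The gap between naive and IPS is the selection-bias correction, which is only as good as the propensities.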

【3】Robust Recommendation from Noisy Implicit Feedback: A GMM-Weighted Bayes-label Transition Matrix Framework
标题：基于含噪隐式反馈的稳健推荐：GMM加权的贝叶斯标签转移矩阵框架
链接：https://arxiv.org/abs/2605.20721

作者：Zongyu Li,Xuanyu Liu,Gongce Cao,Shirui Sun,Yaqi Fang,Yongshuai Yu
摘要：Learning from implicit feedback in recommender systems is fundamentally challenged by pervasive label noise. While conventional denoising approaches often discard noisy instances to ensure robustness, this strategy inevitably suffers from low data utilization. Alternative methods that employ a Bayes-label transition matrix (BLTM) can leverage all available data, but their estimates tend to be biased in practical recommendation scenarios. To address these limitations, this paper proposes a Robust GMM-weighted Bayes-label Transition Matrix framework (RGBT). Our solution utilizes a Gaussian Mixture Model (GMM) to derive instance-specific reliability scores, which systematically calibrate the BLTM estimation to mitigate bias. Theoretical analysis confirms that our approach, by leveraging the BLTM framework with GMM calibration, simultaneously ensures full sample utilization, delivers consistent estimation, and critically, achieves a significant reduction in estimation variance. Extensive experiments on multiple real-world and synthetically flipped datasets demonstrate that RGBT not only utilizes noisy samples more effectively than mainstream reliable sample-based denoising methods, but also achieves significantly superior calibration capability of the transition matrix compared to state-of-the-art transition matrix-based denoising approaches.
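
A common way to turn per-sample training losses into reliability scores is to fit a two-component Gaussian mixture to the losses. Each sample's score is then its posterior probability of belonging to the low-loss component. The EM sketch below illustrates that general recipe, assuming RGBT's GMM is fit on scalar per-interaction losses. The loss values are invented, and the way RGBT uses these scores to calibrate the transition-matrix estimate is not shown.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, var):
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)


def gmm_clean_prob(losses, iters=50, min_var=1e-4):
    """Fit a 2-component 1D GMM to per-sample losses by EM and return, for each
    sample, the posterior probability of the low-mean ("clean") component."""
    lo, hi = min(losses), max(losses)
    mu = [lo, hi]  # initialize one component at each extreme
    var = [max((hi - lo) ** 2 / 4, min_var)] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        resp = []
        for x in losses:
            p = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p) or 1e-300
            resp.append([pk / s for pk in p])
        # M-step: re-estimate mixture weights, means, and (floored) variances
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            w[k] = nk / len(losses)
            mu[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, losses)) / nk, min_var)
    clean = 0 if mu[0] <= mu[1] else 1
    return [r[clean] for r in resp]


# Invented per-interaction losses: four small (likely clean), three large (likely noisy).
losses = [0.05, 0.1, 0.08, 0.12, 0.9, 1.1, 0.95]
reliability = gmm_clean_prob(losses)
```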

自动驾驶|车辆|车道检测等(2篇)

【1】Closed Loop Dynamic Driving Data Mixture for Real-Synthetic Co-Training
标题：用于真实-合成数据联合训练的闭环动态驾驶数据混合
链接：https://arxiv.org/abs/2605.21372

作者：Hongzhi Ruan,Pei Liu,Weiliang Ma,Zhengning Li,Xueyang Zhang,Jun Ma,Dan Xu,Kun Zhan
摘要：Data scaling is fundamental to modern deep learning and grows increasingly critical as autonomous driving shifts to end-to-end learning. Real-world driving data is expensive to annotate and scene-biased, making real-synthetic co-training with near-infinite synthetic data a promising direction. However, naively incorporating all available synthetic data is inefficient and induces distribution shift, and optimizing the data mixture under practical training budgets remains a critical yet under-explored problem. We therefore argue that the training data mixture needs explicit guidance on which scene types to include and in what quantities. In this work, we approximate data mixing as a dynamic optimization process that iteratively adjusts the training data mixture to maximize model performance, guided by closed-loop evaluation feedback, and propose AutoScale, a fully automated closed-loop data engine that unifies scene representation, data mixture optimization and retrieval, and model training and evaluation. Specifically, we propose a Graph Regularized AutoEncoder (Graph-RAE) for driving scene representation, introduce Cluster-aware Gradient Ascent (Cluster-GA) for cluster-wise importance estimation and reweighting, and perform cluster-guided vector retrieval to select high-value samples. Experiments on NavSim demonstrate that AutoScale outperforms vanilla co-training and cross-domain baselines, achieving better performance with fewer synthetic samples under constrained budgets.

【2】STELLAR: Scaling 3D Perception Large Models for Autonomous Driving
标题：STELLAR：扩展面向自动驾驶的3D感知大模型
链接：https://arxiv.org/abs/2605.20390

作者：Yingwei Li,Xin Huang,Yang Liu,Yang Fu,Alex Zihao Zhu,Chen Song,Junwen Yao,Anant Subramanian,Hao Xiang,Weijing Shi,Yuliang Zou,Tom Hoddes,Zhaoqi Leng,Govind Thattai,Dragomir Anguelov,Mingxing Tan
摘要：Model scaling has demonstrated remarkable success through large-scale training on diverse datasets. It remains an open question whether the same paradigm applies to autonomous driving perception systems, given unique challenges such as fusing heterogeneous sensor data and the need for sophisticated 3D spatial understanding. To bridge this gap, we present a systematic study of the impact of scale on these systems. We build our STELLAR model on the Sparse Window Transformer, extending its input modalities to LiDAR, radar, camera, and map priors. We train models with up to 500 million parameters on a large-scale dataset of 50 million driving examples. Our large-scale experiments reveal empirical scaling trends that connect model performance to model size, data, and compute. The resulting model establishes a new state of the art on the Waymo Open Dataset challenge, outperforming prior work by a large margin. Our work demonstrates that large-scale training is a highly promising path for advancing the capabilities of perception models for autonomous driving.
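
Empirical scaling trends are usually summarized by fitting a power law, L ≈ a·N^(-b), which is a straight line in log-log space. The abstract does not say which functional form the authors fit, so this is a generic sketch on synthetic, noise-free points.

```python
from math import exp, log


def fit_power_law(sizes, losses):
    """Least-squares fit of log L = log a - b * log N, i.e. L ~ a * N^(-b). Returns (a, b)."""
    xs = [log(n) for n in sizes]
    ys = [log(loss) for loss in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return exp(my - slope * mx), -slope


# Synthetic points generated from a = 2.0, b = 0.1 (model sizes up to 5e8 parameters).
sizes = [1e6, 1e7, 1e8, 5e8]
losses = [2.0 * n ** -0.1 for n in sizes]
a, b = fit_power_law(sizes, losses)  # recovers a close to 2.0 and b close to 0.1
```

With real, noisy runs, the same fit is typically repeated separately for model size, data, and compute.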

联邦学习|隐私保护|加密(4篇)

【1】FedCritic: Serverless Federated Critic Learning-based Resource Allocation for Multi-Cell OFDMA in 6G
标题：FedCritic：6G中基于无服务器联邦Critic学习的多小区OFDMA资源分配
链接：https://arxiv.org/abs/2605.21418

作者：Amin Farajzadeh,Melike Erol-Kantarci
备注：Submitted to IEEE for possible publication
摘要：In sixth-generation (6G) ultra-dense networks, aggressive frequency reuse amplifies inter-cell interference (ICI), making multi-cell orthogonal frequency-division multiple access (OFDMA) scheduling and power control strongly coupled across neighboring cells. We study distributed downlink resource management -- joint subcarrier scheduling and power allocation -- under interference coupling and long-term per-user quality-of-service (QoS) minimum-rate constraints. By using virtual-queue deficit weights to enforce long-term QoS, we develop FedCritic, a serverless federated multi-agent actor-critic framework with decentralized execution. Unlike centralized training with decentralized execution (CTDE) approaches that require centralized critic learning and joint trajectory aggregation, FedCritic federates the critic through lightweight gossip-based parameter averaging over the interference graph, enabling stable value estimation without a central coordinator while keeping policies local. Simulations in an interference-rich reuse-1 setting show that FedCritic improves mean signal-to-interference-plus-noise ratio (SINR) and cell-edge rate, increases network-wide average sum-rate and fairness relative to non-coordinated and CTDE baselines, and achieves more stable training with lower coordination overhead.
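
Two ingredients named in this abstract, gossip-based parameter averaging over a graph and virtual-queue deficit weights, can be sketched with standard textbook choices. This is an illustration under assumptions, not FedCritic itself: the Metropolis mixing weights, the line-shaped graph, and the helper names are ours, and each agent's critic is reduced to a single scalar.

```python
import numpy as np

def metropolis_weights(adj):
    """Doubly stochastic mixing matrix built from a symmetric 0/1 adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                w[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        w[i, i] = 1.0 - w[i].sum()  # self-weight makes each row sum to 1
    return w

def gossip_average(params, adj, rounds):
    """Each agent repeatedly replaces its critic parameters with a weighted neighbor average."""
    w = metropolis_weights(adj)
    x = np.asarray(params, dtype=float)
    for _ in range(rounds):
        x = w @ x
    return x

def update_virtual_queue(q, min_rate, achieved_rate):
    """Deficit queue that grows while a user's rate stays below its QoS target."""
    return max(q + min_rate - achieved_rate, 0.0)

# Line graph 0-1-2-3 as a stand-in for an interference graph; one scalar parameter per cell.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
params = [[1.0], [2.0], [3.0], [6.0]]
print(gossip_average(params, adj, rounds=200).ravel())  # all entries approach the mean 3.0
```

Because the mixing matrix is symmetric and doubly stochastic on a connected graph, repeated mixing drives every agent to the network-wide average without a central coordinator.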

【2】Automated Byzantine-Resilient Clustered Decentralized Federated Learning for Battery Intelligence in Connected EVs
标题：面向网联电动汽车电池智能的自动化拜占庭弹性聚类去中心化联邦学习
链接：https://arxiv.org/abs/2605.21115

作者：Mouhamed Amine Bouchiha,Abdelaziz Amara Korba,Yacine Ghamri-Doudane
备注：16 pages, 11 figures, under review for IEEE T-ITS
摘要：Federated learning (FL) has emerged as a promising paradigm for managing electric vehicle (EV) battery data in intelligent transportation systems (ITS), enabling privacy-preserving tasks such as anomaly detection and capacity estimation. However, most existing frameworks rely on centralized aggregation schemes, which pose critical limitations in terms of security and trust. To address these challenges, we propose ABC-DFL, an automated Byzantine-resilient clustered decentralized federated learning (C-DFL) framework for connected EVs. The proposed incentive-driven C-DFL system replaces the central server with an open-permissioned blockchain, featuring a new dynamic Quorum Byzantine Fault Tolerance (QBFT) protocol and an oracle-based aggregation layer, to enhance trust, security, and automation. At the core of ABC-DFL lies FLECA (Filtered Layered Enhanced Clustering Aggregation), a robust hierarchical aggregation protocol that mitigates Byzantine attacks by having each EV filter malicious updates using an adaptive threshold based on deviations from its reference model update. Oracle nodes, responsible for inter-group aggregation, employ robust clustering to isolate and aggregate model updates from trustworthy EV groups. Comprehensive experimental evaluations demonstrate that FLECA matches FedProx convergence under benign conditions and significantly outperforms existing defenses with attack impact scores below 0.10 in adaptive adversarial scenarios. Furthermore, several learning experiments with multitask models confirm the effectiveness and fairness of the incentive mechanism. Finally, on-chain and off-chain benchmarks validate the practicality of ABC-DFL.
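
The per-EV filtering step ("an adaptive threshold based on deviations from its reference model update") can be illustrated with a robust median-plus-MAD rule. This is a hypothetical sketch: FLECA's actual threshold, distance, and layering are not specified here, and `filter_and_aggregate` and the factor `k` are our own choices.

```python
import numpy as np

def filter_and_aggregate(own_update, peer_updates, k=2.0):
    """Keep peer updates whose L2 distance to this node's own (reference) update is
    within median + k * MAD of all peer distances, then average the survivors."""
    own = np.asarray(own_update, dtype=float)
    peers = np.asarray(peer_updates, dtype=float)
    dists = np.linalg.norm(peers - own, axis=1)
    med = np.median(dists)
    mad = np.median(np.abs(dists - med))
    keep = dists <= med + k * mad
    accepted = np.vstack([own[None, :], peers[keep]])
    return accepted.mean(axis=0), keep

own = [1.0, 1.0]
peers = [[1.1, 0.9], [0.9, 1.0], [1.0, 1.2], [20.0, -20.0]]  # last peer is Byzantine
agg, keep = filter_and_aggregate(own, peers)
print(keep)  # [ True  True  True False]
```

Median and MAD are used because, unlike mean and standard deviation, a single extreme update cannot drag the threshold far enough to admit itself.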

【3】A Typed Tensor Language for Federated Learning
标题：一种用于联邦学习的类型化张量语言
链接：https://arxiv.org/abs/2605.21103

作者：Theofilos Mailis,Kalliopi-Christina Despotidou,Konstantinos Filippopolitis,Yannis Foufoulas,Thanasis-Michail Karampatsis,Andreas Ktenidis,Evdokia Mailli,Theodore Papamarkou,Yannis Ioannidis
摘要：Federated learning and analytics are often described as collections of separate protocols, even when they share the same mathematical form: client-local tensor computation, mergeable aggregation into shared state, and shared-only post-processing. We introduce a typed tensor language that formalizes this structure. The language distinguishes federated tensors, whose records are partitioned across clients along a tracked record axis, from shared tensors, which are available globally. Its semantics are defined by comparison with a virtual global tensor, used only as a reference object. The main result is a shared-state factorization theory. We show that typed one-round programs factor through fixed-dimensional shared state whose size is independent of the number of clients and records, computed from client-local tensor expressions and merged across clients. We also prove a converse representability result; factorizations whose encoders and decoders are expressible in the language are realized by typed one-round programs, and the correspondence extends to iterative programs whose cross-round state is shared. This gives a formal account of the computations in the language that can be expressed as encode, merge, and decode procedures. We then develop a differentiable fragment for learning. If a per-record loss and its per-record gradient are represented by client-local tensor expressions, the global gradient is represented by record-axis summation of the federated gradient tensor. This yields typed iterative programs for server-side gradient descent and shared-linear-algebra second-order updates. The framework characterizes a broad class of federated learning computations whose communication passes through fixed-dimensional shared state.
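
A familiar instance of the encode, merge, and decode pattern is computing a global mean and variance from per-client sufficient statistics. The sketch below is written in plain NumPy, not in the paper's typed language. Its shared state (count, sum, sum of squares) has size 3 regardless of how many clients or records there are.

```python
import numpy as np

def encode(records):
    """Client-local: map one client's records (1-D array) to fixed-size shared state."""
    x = np.asarray(records, dtype=float)
    return np.array([x.size, x.sum(), (x ** 2).sum()])

def merge(states):
    """Mergeable aggregation: shared states combine by element-wise addition."""
    return np.sum(states, axis=0)

def decode(state):
    """Shared-only post-processing: global mean and population variance."""
    n, s, ss = state
    mean = s / n
    return mean, ss / n - mean ** 2

clients = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
mean, var = decode(merge([encode(c) for c in clients]))
print(mean, var)  # same as the mean and variance of the pooled records 1..6
```

The raw sum-of-squares form is numerically fragile for large values. A production version would merge (count, mean, M2) with a pairwise update instead, which follows the same factorization.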

【4】Choose Wisely and Privately: Proactive Client Selection for Fair and Efficient Federated Learning
标题：明智且私密地选择：面向公平高效联邦学习的主动式客户端选择
链接：https://arxiv.org/abs/2605.20975

作者：Adda Akram Bendoukha,Heber Hwang Arcolezi,Nesrine Kaaniche,Aymen Boudguiga
摘要：Federated Learning enables collaborative model training across decentralized data sources without data transfer. Averaging-based FL is limited by the presence of non-IID data, which negatively impacts convergence speed and final model accuracy. Conventional alternatives suffer from significant inefficiency. Clients with noisy or highly heterogeneous data contribute expensive gradient computations that are either discarded or heavily down-weighted before aggregation. These reactive approaches waste computational resources, require more communication rounds and result in unnecessary privacy exposure. In this paper, we propose a proactive client selection framework that aims to find an optimal federation of clients whose combined data match utility and fairness requirements before training begins. Our method relies on mutual information computed from differentially private contingency tables to quantify the relevance of cross-feature correlations in the union dataset. We introduce a Potential Federation Loss (PFL) over the set of fixed-size federations, which balances two objectives: maximizing collective data utility and ensuring fair cross-feature correlations to prevent group unfairness. Client selection is expressed as an optimal subset search problem over the PFL objective, which we solve using simulated annealing under strong differential privacy guarantees for clients' local statistics. Experimental results on four benchmarks show faster, fairer, and more accurate models trained on optimally found federations, compared to uniform sampling, even when state-of-the-art adaptive aggregation or sampling strategies are employed.
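
The statistic this method builds on, mutual information from a differentially private contingency table, can be sketched with the Laplace mechanism. The choice of Laplace noise, the clipping step, and the helper names are assumptions for illustration. The PFL objective and the simulated-annealing search are not shown.

```python
import numpy as np

def noisy_contingency(a, b, n_a, n_b, epsilon, rng):
    """Laplace-mechanism contingency table of two categorical columns.
    Adding or removing one record changes one cell by 1, so the L1 sensitivity is 1."""
    table = np.zeros((n_a, n_b))
    np.add.at(table, (np.asarray(a), np.asarray(b)), 1.0)
    noisy = table + rng.laplace(scale=1.0 / epsilon, size=table.shape)
    return np.clip(noisy, 1e-9, None)  # post-processing keeps the DP guarantee and avoids log(0)

def mutual_information(table):
    """Mutual information (in nats) of the joint distribution given by a count table."""
    p = table / table.sum()
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    return float(np.sum(p * np.log(p / (pa * pb))))

rng = np.random.default_rng(0)
feature = np.array([0] * 500 + [1] * 500)
# Two perfectly correlated columns: MI should be close to log(2) for a weak-noise budget.
print(mutual_information(noisy_contingency(feature, feature, 2, 2, epsilon=100.0, rng=rng)))
```

Smaller `epsilon` means stronger privacy and noisier tables, so MI estimates for small clients degrade first.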

推理|分析|理解|解释(21篇)

【1】Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning
标题：均衡推理器：学习吸引子实现可扩展推理
链接：https://arxiv.org/abs/2605.21488

作者：Benhao Huang,Zhengyang Geng,Zico Kolter
备注：ICML 2026
摘要：Scaling test-time compute by iteratively updating a latent state has emerged as a powerful paradigm for reasoning. Yet the internal mechanisms that enable these iterative models to generalize beyond memorized patterns remain unclear. We hypothesize that generalizable reasoning arises from learning task-conditioned attractors: latent dynamical systems whose stable fixed points correspond to valid solutions. We formalize this process through Equilibrium Reasoners (EqR), which enable test-time scaling without external verifiers or task-specific priors. EqR scales internal dynamics along two axes: depth, by running more iterations, and breadth, by aggregating stochastic trajectories from multiple initializations. Empirically, gains from test-time scaling are tightly coupled with stronger convergence toward solution-aligned attractors. This attractor perspective allows neural networks to adaptively allocate test-time compute based on task difficulty. While simple cases converge within 1 to 5 iteration steps, harder cases benefit from massive test-time scaling. By unrolling up to the equivalent of 40,000 layers, scalable latent reasoning boosts accuracy from 2.6% for feedforward models to over 99% on Sudoku-Extreme. These results suggest that learned attractor landscapes provide a useful mechanistic lens for understanding scalable reasoning in iterative latent models.
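
The two scaling axes can be illustrated on a toy latent map. "Depth" means iterating to a fixed point with an adaptive stopping rule. "Breadth" means repeating the solve from several random initializations and aggregating the results. This is a sketch under assumptions, not EqR: the contraction `step`, the stopping tolerance, and the averaging rule are ours. For discrete answers such as Sudoku cells, a majority vote would be the natural analogue of averaging.

```python
import numpy as np

def solve_fixed_point(f, z0, tol=1e-6, max_iters=1000):
    """Depth: iterate z <- f(z) until the update norm falls below tol."""
    z = np.asarray(z0, dtype=float)
    for k in range(1, max_iters + 1):
        z_next = f(z)
        if np.linalg.norm(z_next - z) < tol:
            return z_next, k
        z = z_next
    return z, max_iters

def breadth_scaled(f, dim, n_inits, rng, **kwargs):
    """Breadth: solve from several random starts; return the averaged endpoint and iteration counts."""
    results = [solve_fixed_point(f, rng.normal(size=dim), **kwargs) for _ in range(n_inits)]
    endpoints = np.array([z for z, _ in results])
    return endpoints.mean(axis=0), [k for _, k in results]

# Toy contraction: tanh is 1-Lipschitz and ||W|| = 0.5, so there is a unique attractor.
W = 0.5 * np.eye(4)
b = np.array([0.3, -1.0, 2.0, 0.0])  # stands in for an encoded task input

def step(z):
    return np.tanh(W @ z + b)

z_star, iters = breadth_scaled(step, dim=4, n_inits=8, rng=np.random.default_rng(0))
print(iters)  # iterations used per start; harder inputs would need more
```

The returned iteration counts show the adaptive-compute idea: the loop stops as soon as the state settles, so easy inputs use few iterations.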

【2】Comparative Analysis of Military Detection Using Drone Imagery Across Multiple Visual Spectrums
标题：基于多视觉光谱无人机图像的军事目标检测比较分析
链接：https://arxiv.org/abs/2605.21157

作者：Sourov Roy Shuvo,Prajwal Panth,Rajesh Chowdhury,Sorup Chakraborty,Sudip Chakrabarty,Prasant Kumar Pattnaik
备注：6 pages, 7 figures. Accepted at the 16th International Conference on Computing, Communication and Networking Technologies (ICCCNT), July 6-11, 2025, IIT Indore. Proceedings pending publication
摘要：In modern warfare, drones are becoming an essential part of intelligence gathering and carrying out precise attacks in different kinds of hostile environments. Their ability to operate in real time and in hostile environments from a safe distance makes them invaluable for surveillance and military operations. The KIIT-MiTA dataset comprises drone images of different military scenarios and provides a foundation for detecting military objects, but it does not take into account the variety of real-world conditions. To evaluate how models perform under varying conditions, four dataset variants are created: Gray Scale, Thermal Vision, Night Vision, and Obscura Vision. These simulate real-world conditions such as low visibility, heat-based imagery, and nighttime operation. The YOLOv11-small model is trained and used to detect objects across these diverse settings. This research improves the performance and reliability of drone-based operations by contributing to the development of advanced detection systems for both defensive and offensive missions.
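
The abstract does not say how the spectrum variants are generated. The sketch below shows one plausible recipe for two of them, using BT.601 luma weights for grayscale and a darkened, noisy, green-rendered image for night vision. The gain and noise values are assumptions, and Thermal and Obscura Vision are not attempted.

```python
import numpy as np

def to_grayscale(rgb):
    """ITU-R BT.601 luma; rgb is an H x W x 3 float array with values in [0, 1]."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def to_night_vision(rgb, gain=0.4, noise_std=0.05, rng=None):
    """Crude night-vision look (hypothetical recipe): darken, add sensor noise, render in green."""
    rng = np.random.default_rng() if rng is None else rng
    lum = to_grayscale(rgb) * gain
    lum = np.clip(lum + rng.normal(0.0, noise_std, lum.shape), 0.0, 1.0)
    out = np.zeros_like(rgb)
    out[..., 1] = lum  # only the green channel carries signal
    return out

img = np.random.default_rng(1).random((8, 8, 3))
gray = to_grayscale(img)  # shape (8, 8)
night = to_night_vision(img, rng=np.random.default_rng(0))  # shape (8, 8, 3)
```

Generating each variant from the same source image keeps the bounding-box labels valid, so only the appearance changes between datasets.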

【3】CoarseSoundNet: Building a reliable model for ecological soundscape analysis
标题：CoarseSoundNet：构建可靠的生态声景分析模型
链接：https://arxiv.org/abs/2605.21143

作者：Alexander Gebhard,Andreas Triantafyllopoulos,Dominik Arend,Sandra Müller,Svenja Schmidt,Michael Scherer-Lorenzen,Björn W. Schuller
备注：Currently under review
摘要：A soundscape is composed of three types of sound: biophony (sounds made by animals), geophony (natural abiotic sounds) and anthropophony (sounds made by humans). A key research question in the field of soundscape ecology is how these components interact with each other, specifically how biophony responds to geophony and anthropophony. Nevertheless, few analytical tools available today enable separate quantification of these elements. Recent machine learning (ML) approaches aim to support automated analysis but often rely on task-specific or clean data, limiting generalisation to noisy passive acoustic monitoring (PAM) recordings. This study presents a clear and reproducible structure to build ML models for coarse soundscape classification and introduces CoarseSoundNet, a deep learning model trained to distinguish biophony, geophony, and anthropophony under realistic PAM conditions. We systematically investigate model architectures, the influence of an additional training class, data composition, and evaluation strategies. Our findings suggest that model performance improves with additional PAM data, especially when similar to the target domain, and by introducing an explicit silence class during training. Class-specific decision thresholds and duration-based constraints further enhance performance, particularly for anthropophony and geophony. Error analyses reveal challenges for anthropophony due to masking effects, and confusions with silence and insect sounds for geophony and biophony. Finally, we conduct an ecological case study which shows that pre-filtering recordings with CoarseSoundNet yields acoustic index trends comparable to ground-truth filtering, supporting its use as an effective preprocessing tool for ecoacoustic analyses.
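
Class-specific thresholds and duration constraints are a common post-processing pattern for frame-level audio classifiers. A minimal sketch, assuming frame-wise probabilities and a minimum run length per class. The threshold and duration values are placeholders, not the paper's tuned settings.

```python
import numpy as np

def postprocess(probs, thresholds, min_frames):
    """probs: (T, C) frame-level class probabilities.
    Binarize with a per-class threshold, then drop active runs shorter than min_frames[c]."""
    probs = np.asarray(probs, dtype=float)
    active = probs >= np.asarray(thresholds)[None, :]
    out = np.zeros_like(active)
    n_frames, n_classes = active.shape
    for c in range(n_classes):
        t = 0
        while t < n_frames:
            if active[t, c]:
                start = t
                while t < n_frames and active[t, c]:
                    t += 1
                if t - start >= min_frames[c]:  # keep only sufficiently long events
                    out[start:t, c] = True
            else:
                t += 1
    return out

# Two hypothetical classes (e.g. anthropophony, geophony) over six frames.
probs = [[0.6, 0.1], [0.7, 0.4], [0.2, 0.4], [0.9, 0.5], [0.1, 0.1], [0.1, 0.2]]
print(postprocess(probs, thresholds=[0.5, 0.3], min_frames=[2, 3]).astype(int))
```

Here the isolated frame 3 detection of class 0 is discarded as too short, while the three-frame run of class 1 survives its lower threshold.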

【4】Reasoning-Trace Collapse: Evaluating the Loss of Explicit Reasoning During Fine-Tuning
标题：推理痕迹崩溃：评估微调期间显式推理的损失
链接：https://arxiv.org/abs/2605.21127

作者：Lukas Twist,Helen Yannakoudakis,Jie M. Zhang
备注：22 pages, 3 tables, 3 figures
摘要：Explicit reasoning models are trained to produce intermediate reasoning traces before final answers, but downstream fine-tuning is often performed on ordinary instruction-response data that contains no such traces. We show that this mismatch can induce reasoning-trace collapse: a fine-tuned model continues to produce plausible final answers while losing the structurally valid explicit reasoning traces that made it a reasoning model in the first place. We introduce a structural evaluation framework that separates answer correctness from reasoning-trace validity, measuring valid, empty, missing, and truncated reasoning alongside reasoning-conditioned task performance. Using this framework, we study four open-weight reasoning models and find that standard supervised fine-tuning can rapidly suppress valid reasoning traces, and that answer-only metrics can substantially obscure this failure: in several settings, performance conditional on valid reasoning remains high while the rate of valid reasoning falls sharply. We further show that simple loss-masking strategies can substantially mitigate collapse without requiring teacher-generated reasoning traces. These results suggest that evaluations of fine-tuned reasoning models should report structural reasoning reliability metrics in addition to final-answer performance, especially when adaptation data does not contain explicit reasoning traces.
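
The structural categories (valid, empty, missing, truncated) can be operationalized by parsing delimiter tags. The sketch assumes `<think>...</think>` delimiters, which open reasoning models such as DeepSeek-R1 and Qwen3 use; the paper's exact parsing rules may differ.

```python
from collections import Counter

def classify_trace(output, open_tag="<think>", close_tag="</think>"):
    """Label the reasoning segment of one model output."""
    start = output.find(open_tag)
    if start == -1:
        return "missing"      # no reasoning block at all
    end = output.find(close_tag, start + len(open_tag))
    if end == -1:
        return "truncated"    # block opened but never closed
    body = output[start + len(open_tag):end]
    return "empty" if not body.strip() else "valid"

def trace_rates(outputs):
    """Fraction of outputs in each structural category."""
    counts = Counter(classify_trace(o) for o in outputs)
    return {k: counts[k] / len(outputs) for k in ("valid", "empty", "missing", "truncated")}

outputs = ["<think>2+2=4</think>4", "<think> </think>4", "4", "<think>2+2"]
print(trace_rates(outputs))  # each category appears once in this toy batch
```

Reporting these rates next to answer accuracy is what exposes the collapse: accuracy can stay flat while the valid rate falls.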

【5】Towards Understanding Self-Pretraining for Sequence Classification
标题：理解序列分类中的自预训练
链接：https://arxiv.org/abs/2605.21070

作者：Omar Coser,Loredana Zollo,Paolo Soda,Antonio Orvieto
备注：v1: Preliminary, extension of the version accepted at ICML 2025 Workshop MOSS
摘要：Amos et al. (2024) showed that the accuracy of Transformer models in sequence classification can be significantly improved by first pretraining with a masked token prediction objective without external data or augmentation, a procedure referred to as self-pretraining (SPT). While the primary objective of Amos et al. (2024) was to showcase that Transformers can achieve strong performance on the Long-Range Arena (LRA), their pipeline raises more fundamental questions: How does SPT drive optimization to better solutions? Why can standard supervised training fail in Transformers? To better understand this, we replicate and systematically ablate the findings of Amos et al. (2024). Our ablations suggest that a central bottleneck in the studied settings is not depth or generalization alone, but the ability of label supervision to learn useful query-key Attention patterns from random initialization. With a minimal setup, we identify learning proximity interactions - turning absolute positional encodings into proximity-biased Attention scores - as a key source of the improvements brought by SPT. Finally, in a simplified theoretical setup, we show that label supervision can be locally blind to certain Attention-score directions that are instead detectable through masked reconstruction.
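
Self-pretraining reuses the downstream sequences themselves as a masked-prediction corpus. A minimal sketch of building such inputs and targets. The masking rate is a placeholder, BERT's 80/10/10 replacement scheme is omitted, and `-100` mirrors the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
import numpy as np

IGNORE = -100  # label meaning "no loss at this position"

def mask_tokens(tokens, mask_id, rate, rng):
    """Hide a random subset of positions; the model must reconstruct them from the
    same unlabeled sequence, so no external data or augmentation is needed."""
    tokens = np.asarray(tokens)
    mask = rng.random(tokens.shape) < rate
    inputs = np.where(mask, mask_id, tokens)
    targets = np.where(mask, tokens, IGNORE)
    return inputs, targets

rng = np.random.default_rng(0)
inputs, targets = mask_tokens([5, 6, 7, 8, 9, 10], mask_id=0, rate=0.5, rng=rng)
```

After this phase the same encoder is fine-tuned with the classification labels, which is the two-stage pipeline the ablations dissect.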

【6】Finite-Time Regret Analysis of Retry-Aware Bandits
标题：重试感知老虎机的有限时间遗憾分析
链接：https://arxiv.org/abs/2605.20854

作者：Bingkui Tong,Junpei Komiyama,Soichiro Nishimori,Paavo Parmas
备注：38 pages
摘要：We study a stochastic bandit algorithm motivated by retry-aware objectives that value the best outcome among multiple attempts, such as pass@$k$ and max@$k$. Given a posterior over arm values, ReMax chooses a sampling distribution that maximizes the posterior expected maximum reward over $M$ virtual draws. Although this objective was introduced in reinforcement learning as an exploration mechanism under uncertainty, its regret properties in bandit problems have remained unclear. For Gaussian rewards and the first nontrivial case $M=2$, we characterize the optimal ReMax distribution through an expected-improvement balance condition and prove the first sublinear regret bound for ReMax. Our analysis separates the usual saturation behavior of suboptimal arms from a ReMax-specific underestimation effect, in which the optimal arm may be sampled too rarely after an unfavorable estimate. This explains why ReMax can be more exploitative than Thompson sampling (TS) and why its regret analysis is technically delicate. Experiments support this picture: ReMax often outperforms KL-UCB and Thompson sampling under mild underestimation, while posterior-variance scaling empirically mitigates severe underestimation.
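
For M = 2 the ReMax objective can be evaluated in closed form if one assumes independent Gaussian posteriors per arm. It also assumes a "virtual draw" picks an arm i.i.d. from the sampling distribution p and takes that arm's posterior value. Both readings, and the helper names, are our assumptions. Pairs with i = j contribute the arm's posterior mean, and pairs with i ≠ j use the standard closed form for the expected maximum of two independent Gaussians.

```python
import itertools
import math

def _phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_max_gauss(m1, s1, m2, s2):
    """E[max(X, Y)] for independent X ~ N(m1, s1^2) and Y ~ N(m2, s2^2)."""
    theta = math.sqrt(s1 ** 2 + s2 ** 2)
    if theta == 0.0:
        return max(m1, m2)
    a = (m1 - m2) / theta
    return m1 * _Phi(a) + m2 * _Phi(-a) + theta * _phi(a)

def remax2_objective(p, means, stds):
    """Posterior expected max over two virtual draws I, J ~ p (i.i.d.)."""
    total = 0.0
    for i, j in itertools.product(range(len(p)), repeat=2):
        if i == j:
            val = means[i]  # both draws hit the same arm
        else:
            val = expected_max_gauss(means[i], stds[i], means[j], stds[j])
        total += p[i] * p[j] * val
    return total

# Grid search over two-arm distributions: a confident arm vs. an uncertain one.
means, stds = [0.0, -0.1], [0.1, 2.0]
best_value, best_q = max((remax2_objective([q, 1 - q], means, stds), q)
                         for q in [k / 100 for k in range(101)])
```

A high-variance arm can be worth sampling even with a lower mean, because it raises the expected best of two draws. This is the retry-aware exploration incentive the abstract analyzes.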

【7】Interaction Locality in Hierarchical Recursive Reasoning
标题：层次递归推理中的交互局部性
链接：https://arxiv.org/abs/2605.20784

作者：Yosuke Miyanishi,Tetsuro Morimura
摘要：Spatial reasoning requires both location-bound computation and location-invariant structure: agents must make local moves while preserving route, object, or constraint-level plans. We propose interaction locality, a task-geometry-aware framework for measuring whether information flow stays within nearby cells or semantic segments, or crosses them. We instantiate the framework with sparse-autoencoder feature ablations and finite-noise activation patching, with structural Jacobian and attention checks reported in the appendix, and apply it to HRM and TRM, two compact hierarchical and recursive reasoning models, on Maze-Hard, Sudoku Extreme, and ARC-AGI. Across these models, activation patching gives the clearest architectural fingerprint: high-level recurrent states tend to write information within nearby cells or same-segment units, while repeated recursive updates accumulate these local writes into broader solution structure. This pattern holds across maze paths, Sudoku constraints, and ARC-AGI object neighborhoods, with the strongest concentration in TRM. To test whether interaction locality extends beyond toy-yet-challenging grid benchmarks, we also apply it to MTU3D, a large-scale embodied 3D scene-grounding model. In this MTU3D setting, causal spatial locality appears primarily at the transition where visual scene features are handed to the downstream grounding module, rather than uniformly throughout the visual encoder. This contrast suggests that the local-to-global handoff observed in HRM and TRM is tied to explicit recursive reasoning dynamics, while embodied 3D models may concentrate causal spatial structure at module boundaries. Interaction locality turns the intuitive local-execution/global-planning story into a reproducible measurement framework for recursive and embodied spatial reasoning.

【8】Causal Machine Learning Is Not a Panacea: A Roadmap for Observational Causal Inference in Health
标题：因果机器学习不是万能药：健康观察因果推断的路线图
链接：https://arxiv.org/abs/2605.20782

作者：Donna Tjandra,Trenton Chang,Sonali Parbhoo,Rajesh Ranganath,Andre Kurepa Waschka,William Mitchell,Maggie Makar,Shalmali Joshi,Finale Doshi-Velez,Leo Anthony Celi,Jenna Wiens
摘要：Objective: The growing availability of large-scale observational clinical datasets and challenges in conducting randomized controlled trials have spurred enthusiasm in using causal machine learning (ML) for causal inference in observational data. We present a roadmap for applying causal ML to observational data. Materials and methods: We outline the importance of assessing validity assumptions within available data and applying causal ML responsibly for clinical experts using causal ML and ML practitioners with limited clinical expertise. Observations: Despite advances in causal ML, its limitations remain largely under-appreciated across disciplines. This gap in shared knowledge may impact the validity of findings. Discussion: Causal assumptions must be satisfied and modeling choices justified. Otherwise, these approaches risk producing biased or misleading results, with consequences for clinical research and patient care. Conclusion: Causal ML can be a powerful tool for generating causal hypotheses. We provide a template to strengthen the rigor and interpretability of causal analyses.

【9】Memory-Efficient Partitioned DNN Inference on Resource-Constrained Android Crowds
标题：资源受限Android人群的内存高效分区DNN推理
链接：https://arxiv.org/abs/2605.20723

作者：Lakshani Manamperi,Disumi Pathirana,Thiwanka Pathirana,Nipun Premarathna,Kutila Gunasekera
备注：6 pages, 3 figures, 4 tables. Accepted at the ICML 2026 Workshop on Machine Learning for the Global South
摘要：Deploying large deep neural networks on memory-constrained mobile devices is a central challenge in edge ML. While compression, pruning, and quantization reduce per-parameter cost, transformer-based models remain too large for the 3.3-7.4 GB RAM envelope of commodity Android handsets. We present the DNN pipeline scheduling subsystem of CROWDio, which achieves practical ONNX inference across resource-constrained Android workers without model modification, by distributing memory pressure across devices via five mechanisms: JIT deferred partition loading, a single-partition-resident constraint, a 4-tier affinity scheduler, a zlib-compressed tensor transport, and a streaming 1:1 dependency model. Evaluated on DistilBERT (Sanh et al., 2019) (approximately 67 M parameters, SST-2) across five Android handsets over ten runs, our system holds peak per-device RSS to 43±2 MB and limits battery draw to 50±3 mAh per run, while streaming concurrency cuts batch latency 34% below barrier synchronisation.
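
A zlib-compressed tensor transport can be as simple as serializing an array with its dtype and shape, then compressing the bytes. This is a minimal sketch only; the `.npy` wire format, the compression level, and the helper names are assumptions, and CROWDio's actual encoding is not described in the abstract.

```python
import io
import zlib

import numpy as np

def pack_tensor(arr, level=6):
    """Serialize an ndarray (dtype and shape included via .npy) and zlib-compress it."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)
    return zlib.compress(buf.getvalue(), level)

def unpack_tensor(payload):
    """Inverse of pack_tensor."""
    return np.load(io.BytesIO(zlib.decompress(payload)), allow_pickle=False)

# A batch shaped like DistilBERT hidden states (hidden size 768).
hidden = np.zeros((64, 768), dtype=np.float32)
payload = pack_tensor(hidden)
print(len(payload), hidden.nbytes)  # compressed size vs. raw size
```

The all-zero example compresses extremely well. Real float32 activations have high-entropy mantissas and typically compress far less, so the benefit depends on the partition boundary chosen.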

【10】Quadratic Characterizations for Reachability Analysis of Neural Networks
标题：神经网络可达性分析的二次刻画
链接：https://arxiv.org/abs/2605.20482

作者：Elias Khalife,Mazen Farhood,Pierre-Loic Garoche
摘要：Quadratic constraints (QCs) are widely used to characterize nonlinearities and uncertainties, but generic analytical characterizations can be conservative on bounded domains. This paper develops a framework for constructing verified quadratic characterizations of scalar relations in the two-dimensional real plane. Candidate quadratic inequalities are locally generated by solving convex quadratic programs using samples from the relation and exterior sample points. They are then verified globally using sum-of-squares certificates over an exact semialgebraic description or, in the case of nonpolynomial relations, over relaxed polynomial descriptions. The resulting verified constraints define a sound overapproximation of the scalar relations over the considered domains. These constraints are directly compatible with existing analysis frameworks based on QCs and pointwise integral quadratic constraints (IQCs) for static nonlinearities and uncertainties, and they can also be embedded in QC-based semidefinite programs for reachability and safety analysis of feedforward neural networks. For smooth activations such as $\tanh$, the method yields domain-dependent quadratic characterizations that constitute an alternative to generic sector- or slope-based descriptions. For ReLU networks, we give methods to reduce conservatism in QC-based reachability analysis of feedforward networks by exploiting dependencies between neurons and tighter local bounds. Numerical examples demonstrate improved reachability results for smooth activations, reduced conservatism for ReLU networks, and applicability beyond neural networks through an example involving saturation.
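
For reference, the generic sector description that this paper tightens looks like this for tanh on a bounded domain [-a, a]. There, tanh(x)/x lies in [tanh(a)/a, 1], which gives the quadratic constraint (y - αx)(βx - y) ≥ 0 with α = tanh(a)/a and β = 1. The check below only samples points, which is not a sum-of-squares certificate like the paper's verification step, and the helper names are ours.

```python
import numpy as np

def tanh_sector(a):
    """Local sector [alpha, beta] for tanh on [-a, a]."""
    return np.tanh(a) / a, 1.0

def sector_qc_value(x, y, alpha, beta):
    """(y - alpha*x) * (beta*x - y); the sector QC requires this to be >= 0."""
    return (y - alpha * x) * (beta * x - y)

a = 2.0
alpha, beta = tanh_sector(a)
xs = np.linspace(-a, a, 2001)
vals = sector_qc_value(xs, np.tanh(xs), alpha, beta)
print(vals.min() >= -1e-12)  # True: holds on the sampled domain (round-off tolerance)
```

The constraint is domain-dependent: at x = 4, outside [-2, 2], the same α makes the value negative. This is why tighter, domain-aware characterizations can reduce conservatism.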

【11】Efficient Table QA via TableGrid Navigation and Progressive Inference Prompting
标题：基于TableGrid导航与渐进式推理提示的高效表格问答
链接：https://arxiv.org/abs/2605.20254

作者：Amritansh Maurya,Navjot Singh,Mohammed Javed,Omar Moured
备注：Accepted for Presentation in ICDAR 2026, Vienna, Austria
摘要：Large Language Models (LLMs) have shown promising results on NLP tasks; however, their performance on tabular data still needs research attention, because Table Question-Answering (TQA) requires precise cell retrieval and multi-step structured reasoning. Existing work improves TQA by fine-tuning or training LLMs on task-specific tabular data, but often lacks verifiable control over how the model navigates tables and derives answers. In this work, we propose a training-free TQA approach with two structured prompting frameworks: TableGrid Navigation (TGN), which iteratively navigates rows and columns via a three-module loop to locate evidence and refine answers, and Progressive Inference Prompting (PIP), which first identifies the relevant columns and then progressively selects rows under explicit constraints derived from the query. We evaluate 17 LLMs against 6 baselines on the TableBench and FeTaQA datasets. On TableBench, TGN improves over the strongest baseline by 3.8 points, and on FeTaQA, PIP achieves SOTA performance over ReAct and Chain-of-Thought. Beyond inference-time gains, PIP and TGN can also serve as supervision templates to fine-tune small models, narrowing the performance gap to much larger architectures in resource-constrained settings and offering a versatile and cost-efficient solution for TQA.

【12】Automated Kernel Discovery Towards Understanding High-dimensional Bayesian Optimization
标题：面向理解高维贝叶斯优化的自动化核发现
链接：https://arxiv.org/abs/2605.20249

作者：Taeyoung Yun,Woocheol Shin,Inhyuck Song,Jaewoo Lee,Jinkyoo Park
备注：36 pages, 27 figures, 12 tables
摘要：Gaussian Process (GP) kernels are central to Bayesian optimization (BO), yet designing effective kernels for high-dimensional problems still relies on extensive manual engineering. Existing automated approaches struggle in high dimensions because of two bottlenecks: their kernel search space is limited to additions and multiplications of base kernels, and LLM-based approaches require conditioning on raw observations, which becomes infeasible due to context-length limits and the difficulty of extracting meaningful patterns. We introduce Kernel Discovery, an LLM-driven evolutionary framework for high-dimensional BO that searches a broader kernel space beyond predefined composition rules and does not require conditioning on observations. Motivated by the observation that directly prompting an LLM to generate kernel code yields syntactically varied but functionally identical kernels, we adopt a two-stage approach: an LLM first proposes novel mathematical forms, then a second LLM call converts each form into validated, executable code. We also propose a leave-one-out continuous ranked probability score (LOO-CRPS) as a selection criterion that penalizes overfitted kernels. On five high-dimensional BO benchmarks, our method achieves an average rank of 1.2 out of 17, outperforming competitive baselines. We further analyze the discovered kernels to identify which kernels lead to improvements in high-dimensional BO.
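
A LOO-CRPS score for a candidate kernel can be computed without refitting the GP n times. The leave-one-out predictive distribution of a GP has a closed form: with A = K + σ²I, the mean is μᵢ = yᵢ - [A⁻¹y]ᵢ / [A⁻¹]ᵢᵢ and the variance is 1 / [A⁻¹]ᵢᵢ. The Gaussian CRPS also has a closed form. The sketch assumes a zero-mean GP with fixed hyperparameters and plain averaging over points. The paper's exact definition may differ, and `rbf_kernel` is only an example candidate.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)

def gaussian_crps(y, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) evaluated at y (lower is better)."""
    z = (y - mu) / sigma
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def loo_crps(K, y, noise_var):
    """Mean leave-one-out CRPS of a zero-mean GP with kernel matrix K."""
    A = K + noise_var * np.eye(len(y))
    A_inv = np.linalg.inv(A)  # a Cholesky solve would be preferable at scale
    diag = np.diag(A_inv)
    mu = y - (A_inv @ y) / diag
    sigma = np.sqrt(1.0 / diag)
    return float(np.mean(gaussian_crps(y, mu, sigma)))

def rbf_kernel(x, lengthscale):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

x = np.linspace(0.0, 1.0, 6)
y = np.sin(3.0 * x)
scores = {ls: loo_crps(rbf_kernel(x, ls), y, 0.01) for ls in (0.05, 0.3, 1.0)}
```

Because each point is scored by a model that never saw it, a kernel that merely interpolates the data gets no credit for it. This is the overfitting penalty the abstract refers to.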

【13】Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning
标题：基于代理上下文思维链微调的长上下文推理
链接：https://arxiv.org/abs/2605.20201

作者：Miao Li,Irina Saparina,Alexander Gurung,Mirella Lapata
备注：Long, ACL 2026 (Main conference)
摘要：Recent large language models support inputs of up to 10 million tokens, yet they perform poorly on long-context tasks that require complex reasoning. Such tasks can be solved using only a subset of the input -- a proxy context -- rather than the full sequence. Despite sharing the same underlying reasoning process, models exhibit a significant performance disparity between proxy and full contexts. To improve long-context reasoning, we propose ProxyCoT, a novel training framework that transfers reasoning capabilities from short proxy contexts to full long contexts. Specifically, we first obtain high-quality chain-of-thought reasoning traces on proxy contexts through reinforcement learning or distillation from a larger teacher model, and then ground the generated traces in full long contexts with supervised fine-tuning. Experiments across different datasets demonstrate that ProxyCoT consistently outperforms strong baselines with reduced computational overhead. Furthermore, models trained with ProxyCoT generalize their long-context reasoning capabilities to out-of-domain tasks.

【14】Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification
标题：利用多轮提示验证提升量化模型在定性分析中的性能
链接：https://arxiv.org/abs/2605.20193

作者：Aisvarya Adeseye,Jouni Isoaho,Adeyemi Adeseye
备注：Accepted to publish in 12th Intelligent Systems Conference 2026; 3-4 September 2026 in Amsterdam, The Netherlands
摘要：Quantized Large Language Models (LLMs) are used more often in qualitative analysis because they run fast and need fewer computing resources. This study examines how different low-bit quantization levels (8-bit, 4-bit, 3-bit, and 2-bit) and quantization types affect the performance of LLaMA-3.1 (8B) on qualitative analysis. The study uses expert and non-expert responses from 82 interview transcripts. Low-bit models often produce higher levels of hallucinations and unstable results, especially when reading non-expert language with unclear terms. To improve performance, we propose a quantization-aware multi-pass prompt verification method. This method guides the model through controlled steps that reduce hallucinations. It removes unreliable content and passes the results to the next transcript after verification, improving accuracy. To validate performance, human coders analyzed transcripts using NVivo and BF16 LLaMA. BF16 LLaMA-3.1 produced high-precision output but had semantic drift and hallucination. These errors were corrected manually. The corrected BF16 output and NVivo human coding were combined to create a gold-standard ground truth (GSGT) for thematic extraction and frequency analysis. The results show that 8-bit models stay closest to the GSGT. The 4-bit models lose accuracy but become stable when the proposed method is applied. The 3-bit and 2-bit models drop in performance because of heavy compression, but they improve with the proposed prompt design and verification. The study also finds that models at the same bit level behave differently depending on quantization type. Overall, the method helps low-resource LLMs become more stable, accurate, and suitable for qualitative research at lower cost.

【15】Theoretical guidelines for annealed Langevin dynamics in compositional simulation-based inference
标题：组合式基于模拟推理中退火朗之万动力学的理论指南
链接：https://arxiv.org/abs/2605.21253

作者：Camille Touron,Gabriel V. Cardoso,Julyan Arbel,Pedro L. C. Rodrigues
摘要：Compositional score-based approaches to simulation-based inference (SBI) approximate the posterior over a shared parameter given $n$ independent observations by aggregating individually learned posterior scores: currently, there are two main propositions of such methods (Geffner et al. (2023), Linhart et al. (2026)). As the resulting composite score does not correspond to the score of any distribution along the forward diffusion path of the true multi-observation posterior, sampling from it via a reverse SDE leads to an irreducible bias. Annealed Langevin dynamics provides a principled alternative: it treats the composite score as the genuine score of a sequence of tractable bridging densities and samples from them in succession. When properly tuned, it could lead to a controllable bias. However, its hyperparameters, namely step sizes, the number of steps per level, and the number of annealing levels, have so far been chosen empirically. We derive Wasserstein bounds for annealed Langevin with approximate scores and translate them into explicit decision rules for these hyperparameters that guarantee a prescribed sampling accuracy, while highlighting different theoretical aspects of each composite score formulation. In the Gaussian setting, we obtain closed-form expressions for all relevant quantities and prove that the bridging densities of Linhart et al. (2026) consistently admit larger step sizes and require fewer total Langevin steps than those of Geffner et al. (2023). Furthermore, we show empirically that the tuning obtained in the Gaussian setting generalizes to more complex problems, thus providing a well-understood and theoretically grounded starting point for practitioners using compositional score-based approaches.
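
The sampler whose hyperparameters this paper tunes is annealed (unadjusted) Langevin dynamics. At each level it iterates x ← x + ε·s(x) + √(2ε)·ξ with ξ ~ N(0, I). The sketch runs it on a 1-D Gaussian toy target N(m, s²) whose bridging densities are N(m, s² + σ²). These are Gaussian-smoothed targets rather than the composite-score bridges of Geffner et al. or Linhart et al. The step-size rule ε = 0.05·(s² + σ²) and the level schedule are heuristic assumptions, not the paper's derived decision rules.

```python
import numpy as np

def annealed_langevin(score, x, levels, step_sizes, steps_per_level, rng):
    """Unadjusted Langevin dynamics run through a sequence of bridging densities."""
    for sigma, eps in zip(levels, step_sizes):
        for _ in range(steps_per_level):
            x = x + eps * score(x, sigma) + np.sqrt(2.0 * eps) * rng.normal(size=x.shape)
    return x

m, s = 1.0, 0.5  # toy target N(m, s^2)

def score(x, sigma):
    """Score of the level-sigma bridge N(m, s^2 + sigma^2)."""
    return -(x - m) / (s ** 2 + sigma ** 2)

levels = np.geomspace(3.0, 0.01, num=10)  # coarse-to-fine annealing schedule
step_sizes = [0.05 * (s ** 2 + sig ** 2) for sig in levels]
rng = np.random.default_rng(0)
samples = annealed_langevin(score, rng.normal(0.0, 3.0, size=5000), levels, step_sizes, 100, rng)
print(samples.mean(), samples.std())  # close to 1.0 and 0.5
```

Even with exact scores, a finite step size leaves a bias: here the stationary variance at the last level is inflated by a factor of about 1/(1 - ε/(2v)), where v = s² + σ² is that level's variance. Step size, steps per level, and the number of levels trade this bias against cost, and the paper's bounds make that trade-off explicit.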

【16】Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference
标题：保形推理中错误发现比例的处处有效界
链接：https://arxiv.org/abs/2605.20726

作者：Ziang Song,Ying Jin,Emmanuel J. Candès
备注：31 pages, 12 figures. Code available at https://github.com/sza919/everywhere-valid-fdp-bounds-in-conformal-inference
摘要：Modern applications of conformal inference to multiple testing problems, such as outlier detection and candidate selection, often involve selecting test samples whose conformal p-values fall below a threshold. The quality of such methods is often measured by the false discovery proportion (FDP), defined as the fraction of incorrect selections. Existing approaches typically control the expected value of the FDP, using methods such as the Benjamini-Hochberg procedure. This approach fails to provide high-probability bounds on the realized false discovery proportion and invalidates statistical guarantees if the rejection threshold is selected after inspecting the data. This paper establishes finite-sample, distribution-free upper bounds on the FDP that hold simultaneously over all possible rejection thresholds, enabling arbitrary post hoc selection of the threshold. Simultaneous validity is achieved by constructing a high-probability envelope for the empirical distribution function of null conformal p-values by sampling from their joint distribution. Furthermore, our framework allows practitioners to modulate the envelope's shape, thereby producing tight bounds in rejection regions of primary interest. We use this flexible approach to derive simultaneous FDP upper bounds for both outlier detection and conformal selection. We demonstrate through synthetic and real-data experiments that the resulting bounds are both valid and substantially less conservative than those derived from existing approaches.
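
The core Monte Carlo step, sampling from the joint null distribution of conformal p-values that share one calibration set, can be sketched as follows. This minimal version assumes every test point is null and calibrates a constant-width envelope sup_t [F_m(t) - t] <= c. The paper's shaped envelopes and its handling of an unknown null set are not reproduced.

```python
import bisect
import math
import random


def null_conformal_pvalues(n_calib, m_test, rng):
    """Draw m conformal p-values sharing one calibration set; all test points null (assumption).

    Nonconformity scores are i.i.d. Uniform(0, 1) (only their ranks matter);
    a larger score means more anomalous.
    """
    calib = sorted(rng.random() for _ in range(n_calib))
    pvals = []
    for _ in range(m_test):
        s = rng.random()
        n_geq = n_calib - bisect.bisect_left(calib, s)  # calibration scores >= s
        pvals.append((1 + n_geq) / (n_calib + 1))
    return pvals


def sup_excess(pvals):
    """sup over t of F_m(t) - t, where F_m is the empirical CDF of the p-values."""
    ps = sorted(pvals)
    m = len(ps)
    return max(0.0, max((k + 1) / m - p for k, p in enumerate(ps)))


def calibrate_envelope(n_calib, m_test, delta, n_sims=2000, seed=0):
    """Monte Carlo (1 - delta)-quantile c of sup_t [F_m(t) - t] under the joint null."""
    rng = random.Random(seed)
    stats = sorted(sup_excess(null_conformal_pvalues(n_calib, m_test, rng))
                   for _ in range(n_sims))
    idx = min(n_sims - 1, math.ceil((1 - delta) * n_sims) - 1)
    return stats[idx]
```

On the event sup_t [F_m(t) - t] <= c, which has probability of roughly 1 - delta or more, at most m(t + c) null p-values fall at or below every threshold t simultaneously. That simultaneity is what allows the threshold to be picked after looking at the data.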

【17】Scale-Calibrated Median-of-Means for Robust Distributed Principal Component Analysis
标题：用于稳健分布式主成分分析的尺度校准均值中位数估计
链接：https://arxiv.org/abs/2605.20681

作者：Kisung You
摘要：Distributed principal component analysis (PCA) produces node-level estimates of both a mean vector and a principal subspace. Robustly aggregating these heterogeneous objects requires a relative scale between mean error and subspace error. We study a scale-calibrated median-of-means estimator for this problem using the product geometry of Euclidean space and the Grassmann manifold. A node-level PCA expansion shows that the mean component has the usual linear influence, whereas the subspace component is an eigengap-weighted covariance perturbation. We prove a local reduction showing that the proposed product-manifold median-of-means estimator is asymptotically equivalent to a scaled spatial median of node influence errors. This yields fixed-node non-Gaussian limits, growing-node Gaussian limits with finite-block bias, and an explicit scale-dependent covariance formula. We propose robust block-scale and inference-optimal calibration rules, establish high-probability median-of-means bounds, characterize factorwise bad-node influence, and prove node-bootstrap validity. Simulations and large-scale single-cell RNA-seq data show that scale calibration adapts to eigengap-driven subspace uncertainty and provides a robust distributed PCA summary.
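
The code below gives a Euclidean stand-in for the aggregation step. It stacks each node's estimate, multiplies the subspace part by a relative scale, and takes the spatial (geometric) median with Weiszfeld iterations. Two things here are assumptions of this sketch: representing a subspace by a flat coordinate vector (for example an entry-wise flattened projection matrix), and the parameter name `scale`. The paper instead works on the product of Euclidean space and the Grassmann manifold and calibrates the scale from data.

```python
import math


def weiszfeld(points, iters=200, eps=1e-12):
    """Geometric (spatial) median of points in R^d via Weiszfeld iterations."""
    d = len(points[0])
    y = [sum(p[j] for p in points) / len(points) for j in range(d)]  # start at the mean
    for _ in range(iters):
        num, den = [0.0] * d, 0.0
        for p in points:
            dist = math.sqrt(sum((p[j] - y[j]) ** 2 for j in range(d)))
            w = 1.0 / max(dist, eps)  # inverse-distance weight; eps avoids division by zero
            den += w
            for j in range(d):
                num[j] += w * p[j]
        y = [v / den for v in num]
    return y


def scaled_spatial_median(node_means, node_subspaces, scale):
    """Aggregate node estimates (mean vector, flattened subspace summary).

    The subspace block is multiplied by `scale` (hypothetical relative-scale
    parameter) before taking the spatial median, then divided back out.
    """
    stacked = [list(m) + [scale * s for s in u] for m, u in zip(node_means, node_subspaces)]
    med = weiszfeld(stacked)
    k = len(node_means[0])
    return med[:k], [v / scale for v in med[k:]]
```

Because the spatial median has a 50% breakdown point, a minority of corrupted nodes cannot drag the aggregate far. The scale decides how much a subspace disagreement counts relative to a mean disagreement.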

【18】Understanding Deterioration Random Effects for Causal Discovery in Infrastructure Management
标题：理解劣化随机效应：面向基础设施管理的因果发现
链接：https://arxiv.org/abs/2605.20400

作者：Takato Yasuno
备注：20 pages, 7 figures, 4 tables
摘要：Infrastructure deterioration poses significant challenges for asset management, yet existing approaches rely on population-averaged models that overlook equipment-specific heterogeneity. We present a novel framework that combines Bayesian hierarchical hazard modeling with causal discovery to identify operational patterns that drive heterogeneous deterioration rates in pump equipment. Our approach first estimates pump-specific random effects $u_i$ using GPU-accelerated No-U-Turn Sampling (NUTS), achieving 3--5$\times$ speedup over CPU implementations. We then employ DirectLiNGAM to discover causal relationships between 22 engineered time-series features and deterioration rates, stratified by positive ($u_i > 0$, faster deterioration) versus negative ($u_i \leq 0$, slower deterioration) random effects. Analyzing 112 pumps with 92,861 observations over 650 days, we uncover striking heterogeneity: the negative group exhibits causal effects 400$\times$ larger than the positive group, with standard deviation (std) showing a strong positive causal effect ($+1.515$) on deterioration rates in low-risk equipment. We validate linearity assumptions through NonlinearLiNGAM comparison and demonstrate practical scalability through GPU acceleration. Our findings enable targeted maintenance strategies by revealing that different operational regimes require fundamentally distinct management approaches, advancing predictive maintenance from population-averaged to heterogeneity-aware decision making.

【19】Corrected Integrated Laplace Approximation for Bayesian Inference in Latent Gaussian Models
标题：潜高斯模型贝叶斯推断中的修正积分拉普拉斯近似
链接：https://arxiv.org/abs/2605.20345

作者：Jinlin Lai,Charles C. Margossian,Daniel R. Sheldon
摘要：Latent Gaussian models (LGMs) are a popular class of Bayesian hierarchical models that include Gaussian processes, as well as certain spatial models and mixed-effect models. Efficient Bayesian inference of LGMs often requires marginalizing out the latent variables. For LGMs with a non-Gaussian likelihood, exact marginalization is not possible and a popular approach is to do approximate marginalization with an integrated Laplace approximation (ILA). Using ILA produces an approximate posterior which, in some settings, can differ significantly from the correct posterior, which impacts downstream applications. We propose an importance sampling scheme to correct the error introduced by ILA. By increasing the number of samples in importance sampling, the posterior with ILA converges to the correct posterior. This idea is realized with various techniques, including pseudo-marginalization, quasi-Monte Carlo and randomized quasi-Monte Carlo. We implement our methods in an automatic differentiation framework to support gradient-based algorithms when doing inference on the hyperparameters. For the latter, we specifically consider the use of Hamiltonian Monte Carlo. We demonstrate the benefits of reduced error in various applied models.
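
The correction idea can be sketched in one dimension. First build the Laplace (Gaussian) approximation of the latent posterior, then use it as an importance-sampling proposal to estimate the marginal likelihood that ILA approximates. This is plain Monte Carlo importance sampling under stated assumptions. The paper's pseudo-marginal, quasi-Monte Carlo, and autodiff-based hyperparameter machinery are not reproduced.

```python
import math
import random


def laplace_1d(grad, hess, x0=0.0, iters=50):
    """Laplace approximation of p(x | y) from derivatives of log p(y, x):
    Newton's method finds the mode; -1 / hessian at the mode is the variance."""
    x = x0
    for _ in range(iters):
        x -= grad(x) / hess(x)
    return x, -1.0 / hess(x)


def laplace_marginal(log_joint, mode, var):
    """Laplace estimate of p(y), the integral of p(y, x) over x."""
    return math.exp(log_joint(mode)) * math.sqrt(2.0 * math.pi * var)


def is_marginal(log_joint, mode, var, n_samples, rng):
    """Importance-sampling estimate of p(y) = E_q[p(y, x) / q(x)] with q = N(mode, var).
    Unbiased whenever q > 0 wherever p(y, x) > 0; its variance depends on how
    well q matches the true posterior."""
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mode, math.sqrt(var))
        log_q = -0.5 * math.log(2.0 * math.pi * var) - (x - mode) ** 2 / (2.0 * var)
        total += math.exp(log_joint(x) - log_q)
    return total / n_samples
```

In a fully Gaussian test case the Laplace approximation is exact, so every importance weight equals p(y). With a non-Gaussian likelihood (for example Poisson counts) the two estimates differ, and the importance-sampling estimate converges to the true marginal as the number of samples grows.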

【20】The Economics of AI Inference: Inflation Dynamics, Welfare Costs, and Optimal Monetary Policy under the Inference-Cost Phillips Curve
标题：人工智能推理的经济学：推理成本菲利普斯曲线下的通货膨胀动态、福利成本与最优货币政策
链接：https://arxiv.org/abs/2605.20281

作者：Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov
备注：6 pages, 5 tables
摘要：We develop a unified microeconomic and monetary theory of artificial intelligence inference costs and their pass-through to inflation, welfare, and optimal monetary policy. We introduce the Inference-Cost Phillips Curve (ICPC), an augmented New Keynesian Phillips curve in which firm-level marginal costs of producing differentiated goods include a non-trivial AI inference component $\bar{\lambda}$, and prove a closed-form structural slope $\kappa^*_{\mathrm{inf}} = \bar{\lambda}\,\kappa$, where $\kappa$ is the standard Calvo-Yun slope. We derive a welfare-relevant Hicks-Kaldor decomposition of consumer welfare under inference-cost shocks, prove a generalized Taylor principle for the inference-augmented economy, and characterize the optimal monetary policy response coefficient $\psi^*_{\mathrm{inf}} = (1 + \phi\rho)\,\bar{\lambda}\,\kappa$ under commitment. A second-order welfare loss formula closes the model in closed form. We confront the theory with U.S. monthly data 2022:M01-2026:M04 using a two-step GMM estimator with Newey-West HAC standard errors and Hansen J-test, recovering an empirical slope $\hat{\kappa}_{\mathrm{inf}} = 0.087$ (HAC s.e. 0.021) which lies within one standard error of the structural prediction. A scaling regression over 50 rolling-window subwindows yields $\hat{b} = 0.987$ ($R^2 = 0.998$), consistent with a near-unit-elasticity pass-through. A G7 reduced-form panel with Driscoll-Kraay HAC standard errors yields $\hat{b}^{\mathrm{G7}} = 0.094$ (s.e. 0.026), and a Wald test fails to reject cross-country homogeneity ($p = 0.78$). The framework provides a single equilibrium scaffold for the joint study of AI inference cost dynamics, monetary policy under generative-AI shocks, and the welfare cost of inference-driven inflation.

【21】E-PCN: Jet Tagging with Explainable Particle Chebyshev Networks Using Kinematic Features
标题：E-PCN：基于运动学特征的可解释粒子Chebyshev网络喷注标记
链接：https://arxiv.org/abs/2512.07420

作者：Md Raqibul Islam,Adrita Khan,Mir Sazzat Hossain,Choudhury Ben Yamin Siddiqui,Md. Zakir Hossan,Tanjib Khan,M. Arshad Momen,Amin Ahsan Ali,AKM Mahbubur Rahman
备注：25 pages, 3 figures
摘要：The identification and classification of collimated particle sprays, or jets, are essential for interpreting data from high-energy collider experiments. While deep learning has improved jet classification, it often lacks interpretability. We introduce the Explainable Particle Chebyshev Network (E-PCN), a graph neural network extending the Particle Chebyshev Network (PCN). E-PCN integrates kinematic variables into jet classification by constructing four graph representations per jet, each weighted by a distinct variable: angular separation ($Δ$), transverse momentum ($k_T$), momentum fraction ($z$), and invariant mass squared ($m^2$). We use the concept of Gradient-weighted Class Activation Mapping (Grad-CAM) to determine which kinematic variables dominate classification outcomes. Analysis reveals that angular separation and transverse momentum collectively account for approximately 76% of classification decisions (40.72% and 35.67%, respectively), with momentum fraction and invariant mass contributing the remaining 24%. Evaluated on the JetClass dataset with 10 signal classes, E-PCN achieves a macro-accuracy of 94.67%, macro-AUC of 96.78%, and macro-AUPR of 86.79%, representing improvements of 2.36%, 4.13%, and 24.88% respectively over the baseline PCN implementation, while demonstrating physically interpretable feature learning.
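
The two ingredients can be illustrated together. The sketch below computes the four pairwise kinematic quantities for two massless particles given as (p_T, η, φ), then applies a Chebyshev polynomial graph filter using the standard three-term recurrence. The exact definitions, in particular k_T = min(p_T,i, p_T,j)·Δ and z = min(p_T,i, p_T,j)/(p_T,i + p_T,j), follow common jet-tagging usage and are assumptions here, since the abstract does not spell out E-PCN's normalization.

```python
import math


def pairwise_kinematics(p1, p2):
    """Edge quantities for massless particles p = (pt, eta, phi): (delta, kt, z, m2).
    Definitions follow common jet-tagging usage (assumption)."""
    pt1, eta1, phi1 = p1
    pt2, eta2, phi2 = p2
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    delta = math.hypot(eta1 - eta2, dphi)
    kt = min(pt1, pt2) * delta
    z = min(pt1, pt2) / (pt1 + pt2)
    # Summed four-momentum; massless: E = pt*cosh(eta), pz = pt*sinh(eta)
    e = pt1 * math.cosh(eta1) + pt2 * math.cosh(eta2)
    px = pt1 * math.cos(phi1) + pt2 * math.cos(phi2)
    py = pt1 * math.sin(phi1) + pt2 * math.sin(phi2)
    pz = pt1 * math.sinh(eta1) + pt2 * math.sinh(eta2)
    m2 = e * e - px * px - py * py - pz * pz
    return delta, kt, z, m2


def chebyshev_filter(l_scaled, x, thetas):
    """sum_k thetas[k] * T_k(L~) x with T_0 x = x, T_1 x = L~ x,
    T_k x = 2 L~ T_{k-1} x - T_{k-2} x (L~ is the rescaled graph Laplacian)."""
    def matvec(a, v):
        return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

    t_prev, t_curr = x, matvec(l_scaled, x)
    out = [thetas[0] * v for v in t_prev]
    if len(thetas) > 1:
        out = [o + thetas[1] * v for o, v in zip(out, t_curr)]
    for k in range(2, len(thetas)):
        t_next = [2 * a - b for a, b in zip(matvec(l_scaled, t_curr), t_prev)]
        out = [o + thetas[k] * v for o, v in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out
```

In a four-graph design like the one described, each quantity would weight the edges of its own graph before the Laplacian is formed and filtered.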

检测相关(3篇)

【1】GenAI-Driven Threat Detection with Microsoft Security Copilot
标题：基于Microsoft Security Copilot的GenAI驱动威胁检测
链接：https://arxiv.org/abs/2605.20896

作者：Scott Freitas,Amir Gharib
摘要：Defending against today's increasingly sophisticated cyberattacks requires security analysts to continuously translate evolving attacker tradecraft into detection logic. This places defenders in a reactive posture, requiring constantly updated expertise across an increasingly fragmented security landscape. We introduce the Dynamic Threat Detection Agent (DTDA), an always-on adaptive agent that continuously investigates security incidents across Microsoft Defender to uncover hidden threats and generate explainable detections when attack-story gaps are found. DTDA combines: (1) a unified activity timeline spanning alerts, events, user and entity behavior analytics, and threat intelligence; (2) versioned LLM prompt contracts with schema validation, grounding requirements, bounded retries, and fail-closed suppression; (3) a planner-executor investigation loop that generates attack-specific hypotheses and gathers supporting and refuting evidence; and (4) dynamic alert generation with a context-relevant title, severity, MITRE mappings, remediation guidance, implicated entities, and natural-language attack description. Integrated into Microsoft Security Copilot and deployed across tens of thousands of Defender customers, DTDA operates continuously at industry scale. In a 120-day online evaluation, DTDA achieves 80.1% precision from customer feedback while generating novel alerts for approximately 15% of investigated incidents. In offline evaluation, DTDA recovers hidden malicious activity with 0.78 F1 using GPT-5.4, improving over GPT-4.1 by 0.12 F1 and outperforming the baseline by 0.26 F1 points. Operationally, DTDA processes single-incident investigations end-to-end in a median of 28 minutes at a median token cost of USD 2.04, with a 0.38% job-level failure rate. These results demonstrate that autonomous agents can identify missed malicious activity at a production scale.
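
The "schema validation, grounding requirements, bounded retries, and fail-closed suppression" contract can be sketched as a small wrapper around any text-generating callable. Everything here is hypothetical: the field names, the severity set, and the `llm` callable are illustrative and do not reflect DTDA's actual prompt contracts or any Security Copilot API.

```python
import json
from typing import Callable, Optional

# Hypothetical output contract for one generated alert.
REQUIRED_FIELDS = {"title": str, "severity": str, "entities": list, "description": str}
ALLOWED_SEVERITIES = {"low", "medium", "high"}


def validate_alert(raw: str, timeline_entities: set) -> Optional[dict]:
    """Return the parsed alert if it satisfies the contract, else None."""
    try:
        alert = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(alert, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(alert.get(field), expected_type):
            return None
    if alert["severity"] not in ALLOWED_SEVERITIES:
        return None
    entities = alert["entities"]
    if not entities or not all(isinstance(e, str) for e in entities):
        return None
    # Grounding requirement: every implicated entity must appear in the activity timeline.
    if not set(entities) <= timeline_entities:
        return None
    return alert


def generate_alert(llm: Callable[[str], str], prompt: str,
                   timeline_entities: set, max_retries: int = 2) -> Optional[dict]:
    """Call the model at most 1 + max_retries times; if no response passes
    validation, suppress the alert (fail closed) by returning None."""
    for _ in range(1 + max_retries):
        alert = validate_alert(llm(prompt), timeline_entities)
        if alert is not None:
            return alert
    return None
```

Failing closed trades recall for precision. An invalid or ungrounded generation produces no alert at all rather than a malformed one.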

【2】Tippett-minimum Fusion of Representation-space Diffusion Models for Multi-Encoder Out-of-Distribution Detection
标题：面向多编码器分布外检测的表示空间扩散模型Tippett最小值融合
链接：https://arxiv.org/abs/2605.20502

作者：Neelkamal Bhuyan
备注：14 pages
摘要：We address out-of-distribution (OOD) detection across the full spectrum of distribution shifts -- global domain changes, semantic divergence, texture differences, and covariate corruptions -- through a multi-encoder fusion of per-encoder representation-space diffusion models (RDMs). We statistically identify each encoder's sensitivity to specific shift types from ID data alone and introduce EncMin2L -- an encoder-agnostic two-level $\min(\cdot)$-gate that combines and calibrates per-encoder diffusion-based likelihood detectors without OOD labels, outperforming monolithic multi-encoder baselines at $2.3\times$ lower parameter cost. Two ID-data diagnostics, $η^2$ (class-conditional F-test) and $Δμ$ (log-likelihood shift under synthetic corruptions), quantify encoder specialization, while a Tippett minimum $p$-value combination aggregates per-encoder scores into a single, calibration-stable OOD signal. EncMin2L achieves $\geq 0.94$ AUROC across all four shift types simultaneously, outperforming the state-of-the-art representation-space diffusion OOD detectors across overlapping benchmarks.
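
Tippett's method can be sketched in a few lines. Each encoder's raw score is converted to an empirical p-value against held-out ID scores, and the minimum is then combined as 1 - (1 - p_min)^k. The combined value is an exact p-value only if the k per-encoder p-values are independent and uniform under the null. Encoders fed the same image are generally dependent, so without independence it is best read as a monotone score. The orientation "higher score = more OOD" is an assumption of this sketch.

```python
import bisect


def empirical_pvalue(score, id_scores_sorted):
    """Conformal-style p-value: share of ID calibration scores >= the test score.
    Assumes higher score = more OOD (e.g. negative log-likelihood)."""
    n = len(id_scores_sorted)
    n_geq = n - bisect.bisect_left(id_scores_sorted, score)
    return (1 + n_geq) / (n + 1)


def tippett_combine(pvalues):
    """Tippett: 1 - (1 - min p)^k; exact under independent uniform nulls."""
    k = len(pvalues)
    return 1.0 - (1.0 - min(pvalues)) ** k


def fused_ood_pvalue(test_scores, id_scores_per_encoder):
    """One score per encoder for the same input -> single fused p-value."""
    pvals = [empirical_pvalue(s, sorted(ids))
             for s, ids in zip(test_scores, id_scores_per_encoder)]
    return tippett_combine(pvals)
```

Taking the minimum lets whichever encoder is most sensitive to the current shift type drive the decision. The (1 - p)^k correction accounts for having looked at k encoders.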

【3】Latent Geometry as a Structural Monitor: Eigenspace Alignment for Anomaly Detection in Anonymity Networks
标题：潜在几何作为结构监测器：匿名网络异常检测中的特征空间对齐
链接：https://arxiv.org/abs/2605.20391

作者：Vaibhav Chhabra
备注：14 pages, 5 figures, 1 table
摘要：Traditional anomaly detection marks events when measured signals cross predefined thresholds. This captures the moment of transition but not the structural pressure that precedes it. We propose treating large behavioral populations as geometric energy landscapes whose deformation can be measured before and during major transitions. The central thesis is that structure precedes geometry: the structural organization of the population is the signal, and geometric metrics are instruments for measuring it. Applied to the Tor anonymity network across 67 consecutive daily observation windows, the dual-observer pipeline identifies a stable nine-dimensional load-bearing subspace invariant across the observation period and validates this structure by Monte Carlo simulation at 16.8 sigma above the noise floor. Primary detection gates achieve 0.0% false positive rate on 24 confirmed stable windows. Forensic analysis of the February 20, 2026 confirmed infrastructure event formally falsifies the relay-departure hypothesis, identifying connectivity degradation without topology change as a detectable network failure mode. The result is a candidate structural-monitoring framework for behavioral populations with sufficient telemetry.

分类|识别(7篇)

【1】Polynomial-Time Robust Multiclass Linear Classification under Gaussian Marginals
标题：高斯边缘分布下的多项式时间鲁棒多类线性分类
链接：https://arxiv.org/abs/2605.21428

作者：Ilias Diakonikolas,Giannis Iakovidis,Mingchen Ma
摘要：We study the task of agnostic learning of multiclass linear classifiers under the Gaussian distribution. Given labeled examples $(x, y)$ from a distribution over $\mathbb{R}^d \times [k]$, with Gaussian $x$-marginal, the goal is to output a hypothesis whose error is comparable to that of the best $k$-class linear classifier. While the binary case $k=2$ has a well-developed algorithmic theory, much less is known for $k \ge 3$. Even for $k=3$, prior robust algorithms incur exponential dependence on the inverse of the desired accuracy in both complexity and representation size. In this work, we develop new structural results for multiclass linear classifiers and use them to design fully polynomial-time robust learners with dimension-independent error guarantees. Our first result shows that the standard multiclass perceptron algorithm requires super-polynomially many samples and updates, even with clean labels and Gaussian marginals, revealing a basic obstruction absent in the binary case. Our main positive result is a pairwise improper-learning framework which yields an efficient learner with error $\widetilde O(k^{3/2}\sqrt{\mathrm{opt}})+ε$ for general $k$. Additionally, we develop a sharper localization-based framework which leads to error $O(\mathrm{opt})+ε$ for $k=3$, and error $\mathrm{poly}(k)\mathrm{opt}+ε$ for geometrically regular $k$-class linear classifiers.
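
For reference, here is the textbook multiclass perceptron that the negative result concerns. It keeps one weight vector per class and predicts the arg-max score. On a mistake it adds x to the true class's weights and subtracts it from the predicted class's. The toy usage assumes a constant bias feature and separable data, so the loop terminates. The paper's point is that even with clean labels and Gaussian marginals the required samples and updates can be super-polynomial.

```python
def predict(weights, x):
    """Arg-max class score (ties go to the lowest class index)."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in weights]
    return max(range(len(weights)), key=lambda c: scores[c])


def multiclass_perceptron(data, k, d, max_epochs=100):
    """Standard multiclass perceptron; returns (weights, number of updates)."""
    weights = [[0.0] * d for _ in range(k)]
    updates = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            y_hat = predict(weights, x)
            if y_hat != y:
                weights[y] = [wi + xi for wi, xi in zip(weights[y], x)]
                weights[y_hat] = [wi - xi for wi, xi in zip(weights[y_hat], x)]
                mistakes += 1
                updates += 1
        if mistakes == 0:  # a full clean pass: data separated
            break
    return weights, updates
```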

【2】Classification of Single and Mixed Partial Discharges under Switching Voltage Using an AWA-CNN Framework
标题：基于AWA-CNN框架的开关电压下单一与混合局部放电分类
链接：https://arxiv.org/abs/2605.21352

作者：Md Rafid Kaysar Shagor,Zannatul Ferdousy Mouri,Farhina Haque,Anindya Bijoy Das
摘要：The growing use of fast-switching power electronics has made partial discharge (PD) analysis under switching-voltage excitation increasingly important, yet more challenging than under sinusoidal conditions due to activity concentrated at voltage transitions. This work presents an Amplitude-Width-Area (AWA) pattern representation for source-oriented PD analysis under switching-voltage excitation. In the proposed method, time domain PD pulses are characterized using pulse amplitude, width, and area, and mapped into a visual pattern where amplitude and area define the coordinate axes and width is encoded by color. The generated AWA patterns are used to distinguish six single and mixed PD source conditions: corona, internal, surface, corona+internal, corona+surface, and internal+surface. To evaluate the classification capability of the proposed representation, a Random Forest baseline and two Convolutional Neural Network (CNN) models, InceptionV3 and ResNet-18, are compared. The AWA patterns show distinguishable source-dependent distributions, and CNN-based classification achieves testing accuracy above 96%, compared with 73.33% for Random Forest. The results indicate that AWA patterns provide a visual representation of PD pulses suitable for multi-class PD source classification under switching-voltage excitation.
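
The per-pulse features can be sketched for one sampled pulse. Amplitude is the peak magnitude and area is the trapezoidal integral of |v|. Width is taken as the full width at half maximum between the first and last samples at or above half the peak. The FWHM definition (without interpolation) is an assumption, because the abstract does not define width.

```python
def awa_features(t, v):
    """Return (amplitude, width, area) for one sampled PD pulse.

    width: full width at half maximum on the sample grid (assumed definition).
    area: trapezoidal integral of |v| over the pulse window.
    """
    mags = [abs(x) for x in v]
    amplitude = max(mags)
    half = amplitude / 2.0
    above = [i for i, m in enumerate(mags) if m >= half]
    width = t[above[-1]] - t[above[0]]
    area = sum((t[i + 1] - t[i]) * (mags[i] + mags[i + 1]) / 2.0 for i in range(len(t) - 1))
    return amplitude, width, area
```

Each pulse then becomes one point in the AWA pattern, with amplitude and area as coordinates and width encoded as color.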

【3】Efficient Banzhaf-Based Data Valuation for $k$-Nearest Neighbors Classification
标题：面向$k$近邻分类的高效Banzhaf数据估值
链接：https://arxiv.org/abs/2605.21033

作者：Guangyi Zhang,Lutz Oettershagen,Lixu Wang,Aristides Gionis
备注：To appear at VLDB 2026
摘要：Data valuation, the task of quantifying the contribution of individual data points to model performance, has emerged as a fundamental challenge in machine learning. Game-theoretic approaches, such as the Banzhaf value, offer principled frameworks for fair data valuation; however, they suffer from exponential computational complexity. We address this challenge by developing efficient algorithms specifically tailored for computing Banzhaf values in $k$-nearest neighbor ($k$NN) classifiers. We first establish the theoretical hardness of the problem by proving that it is \#P-hard. Despite this intractability, we exploit the locality properties of $k$NN classifiers to develop practical exact algorithms. Our main contribution is a dynamic programming framework that achieves significant computational improvements: we present a pseudo-polynomial algorithm with $O(Wkn^2)$ time complexity for weighted $k$NN classifiers, where $W$ is the maximum sum of top-$k$ weights, and a specialized algorithm for unweighted $k$NN that achieves $O(nk^2)$ time complexity, that is, linear in the number of data points. We also offer efficient Monte Carlo estimation methods. Extensive experiments on real-world datasets demonstrate the practical efficiency of our approach and its effectiveness in data valuation applications.
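
The following makes the computed quantity concrete. It gives the Banzhaf value of each training point for a single test point under a common kNN utility (the fraction of the k nearest retained neighbors that carry the test label), evaluated by brute force over all 2^(n-1) coalitions. The utility choice is an assumption, and this exponential reference is only for checking small cases. It is not the paper's O(nk^2) dynamic program.

```python
from itertools import combinations


def knn_utility(subset, dists, labels, y_test, k):
    """Fraction of the k nearest points within `subset` carrying the test label.
    Divides by k even when |subset| < k; U(empty set) = 0 (assumed utility)."""
    if not subset:
        return 0.0
    nearest = sorted(subset, key=lambda i: dists[i])[:k]
    return sum(labels[i] == y_test for i in nearest) / k


def banzhaf_bruteforce(dists, labels, y_test, k):
    """phi_i = 2^-(n-1) * sum over S not containing i of [U(S with i) - U(S)]."""
    n = len(dists)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for r in range(len(others) + 1):
            for subset in combinations(others, r):
                with_i = knn_utility(list(subset) + [i], dists, labels, y_test, k)
                without_i = knn_utility(list(subset), dists, labels, y_test, k)
                total += with_i - without_i
        values.append(total / 2 ** (n - 1))
    return values
```

With k = 1, two correctly labeled points at distances 1 and 2 each get value 0.5, and a mislabeled point at distance 3 gets 0. The DP in the paper exploits exactly this locality, since only a point's rank among neighbors matters.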

【4】Activation-Free Backbones for Image Recognition: Polynomial Alternatives within MetaFormer-Style Vision Models
标题：免激活的图像识别主干网络：MetaFormer风格视觉模型中的多项式替代方案
链接：https://arxiv.org/abs/2605.20839

作者：Jeffrey Wang,Jonathan Gregory,Grigorios G. Chrysos
备注：Accepted to ICML 2026
摘要：Modern vision backbones treat pointwise activations (e.g., ReLU, GELU) and exponential softmax as essential sources of nonlinearity, but we demonstrate they are not required within MetaFormer-style vision backbones. We design activation-free polynomial alternatives for three core primitives (MLPs, convolutions, and attention), where Hadamard products replace standard nonlinearities to yield polynomial functions of the input. These modules integrate seamlessly into existing architectures: instantiated within MetaFormer, a modular framework for vision backbones, our PolyNeXt models match or exceed activation-based counterparts across model scales on ImageNet classification, ADE20K semantic segmentation, and out-of-distribution robustness. We also substantially outperform prior polynomial networks at reduced computational cost, showing that polynomial variants of standard modules beat complex custom architectures.
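
Here is a minimal sketch of an activation-free MLP block. Two linear branches are multiplied element-wise (Hadamard product) and then projected back. This exact wiring is an assumption consistent with the description, not the paper's module. Without biases the output is a homogeneous quadratic in the input, which the check f(2x) = 4 f(x) confirms.

```python
def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def poly_mlp(x, w1, w2, w3):
    """Activation-free block (assumed wiring): y = W3 ((W1 x) * (W2 x)),
    with * the Hadamard product; no biases, so y is degree-2 homogeneous in x."""
    h = [a * b for a, b in zip(matvec(w1, x), matvec(w2, x))]
    return matvec(w3, h)
```

Stacking such blocks raises the polynomial degree multiplicatively, which is how depth supplies expressiveness without pointwise activations.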

【5】Lowering the Barrier to IREX Participation: Open-Source Algorithms, Toolkit, and Benchmarking for Iris Recognition
标题：降低IREX参与的障碍：虹膜识别的开源算法、工具包和基准测试
链接：https://arxiv.org/abs/2605.20735

作者：Siamul Karim Khan,Patrick J. Flynn,Adam Czajka
摘要：This paper proposes two new open-source iris recognition algorithms, providing both Python and IREX-compliant C++ implementations to be submitted to the official IREX X program. This work has two primary goals: (a) to conduct the first-ever assessment of open-source iris recognition solutions according to IREX testing protocols, and (b) to offer a model C++ submission that significantly facilitates the entry of other teams' open-source methods into the IREX evaluation. The new methods consist of two Neural Networks trained with: (i) Triplet loss with Batch-Hard Triplet mining (TripletIris), and (ii) ArcFace loss (ArcIris). The paper also provides open-source IREX-compliant C++ implementations of two existing methods: (a) an iris image filtering-based algorithm utilizing human saliency-driven kernels (HDBIF), and (b) a human-interpretable algorithm for detecting and comparing Fuchs' crypts (CRYPTS). Except for CRYPTS, which faced timing constraints during 1:N search, these methods have undergone the official IREX X evaluation and have also been assessed using several popular academic benchmarks: Quality-Face/Iris Research Ensemble, Warsaw-Biobase Post-Mortem Iris, CASIA-Iris-Thousand-V4, CASIA-Iris-Lamp-V4, IIT Delhi Iris Database, IIITD Contact Lens Iris Database, NDIris3D, and Notre Dame Variable Iris Image Quality Release 2. Finally, this paper also provides open-source models for iris segmentation and circle estimation that can be incorporated into any new iris recognition method.

【6】Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach
标题：无需微调的模块化多模态分类：一种简单的组合方法
链接：https://arxiv.org/abs/2605.20674

作者：Herman Bergström,Aditya Mehrotra,Rahul G. Krishnan
备注：30 pages, 17 figures
摘要：We introduce CoMET (Composing Modality Encoders with Tabular foundation models), a simple yet highly competitive method for multimodal classification: pass each modality through a frozen pre-trained backbone, compress the resulting embeddings with PCA, and concatenate as input into a Tabular Foundation Model (TFM) for prediction. We show that PCA alone suffices to act as an adaptor yielding strong, robust performance across modalities. When the CLS tokens of the foundation model align poorly with downstream tasks, we propose PALPooling, a lightweight adaptive token pooler that consistently improves representation quality. By composing strong frozen representation learning backbones with TFMs, our approach achieves state-of-the-art results across diverse multimodal benchmarks without any training. On hierarchical tasks with large fine-grained class spaces, our approach enables fast and scalable classification, handling datasets with over 500,000 samples and 2,000 classes without any fine-tuning. Overall, our results show that the composition of foundation models is a simple, yet powerful, out-of-the-box solution for multimodal learning, challenging the necessity of complex, end-to-end training pipelines for new problems.

【7】AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI
标题：AMAR：基于Wi-Fi CSI的轻量级注意力多用户活动识别
链接：https://arxiv.org/abs/2605.20649

作者：Amirhossein Mohammadi,Hina Tabassum
备注：25 pages, 6 figures, 3 tables
摘要：Wi-Fi-based human activity recognition (HAR) has emerged as a promising approach for contactless sensing, leveraging channel state information (CSI) collected from wireless transceivers. While existing studies have primarily concentrated on single-user scenarios, real-world deployments often involve multi-user settings where concurrent users' movements induce overlapping CSI patterns that challenge conventional classification methods. To address this limitation, this paper introduces an attention-based multi-user activity recognition (AMAR) framework that formulates HAR as a set prediction problem. The transformer-based architecture in AMAR leverages learnable query embeddings acting as specialized activity detectors, enabling the simultaneous identification of multiple activities from composite CSI representations. Moreover, to address deployment constraints, AMAR is designed in an edge-cloud split architecture form where lightweight convolutional networks on edge devices perform initial feature extraction, followed by residual vector quantization that achieves substantial bandwidth reduction while preserving activity-discriminative information. The cloud component performs final activity prediction through attention-based set matching, enabling the system to handle varying occupancy levels. Across classroom, meeting-room, and empty-room environments, on average AMAR nearly doubles the rate of perfectly predicting all concurrent activities compared to the best baseline. Moreover, it achieves an $F_1$-score of 53.4% compared to 45.6% for the best benchmark, and reduces occupancy estimation error by 74%, while minimizing bandwidth substantially.
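
Residual vector quantization, the bandwidth-reduction step between the edge and cloud components, can be sketched in a few lines. Each stage quantizes what the previous stages left over, and only the stage indices are transmitted. The codebooks in the usage check are hand-picked for illustration (an assumption); in AMAR they would be learned on CSI features.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))


def rvq_encode(v, codebooks):
    """Each stage quantizes the residual left by the previous stages."""
    residual = list(v)
    codes = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Sum the selected codewords of every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out
```

With a 3-entry and a 5-entry codebook, a 2-D feature costs about log2(3) + log2(5) ≈ 3.9 bits instead of 64 bits for two float32 values, and each added stage shrinks the remaining residual.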

表征(3篇)

【1】A Dialogue between Causal and Traditional Representation Learning: Toward Mutual Benefits in a Unified Formulation
标题：因果表征学习与传统表征学习的对话：在统一表述下实现互利
链接：https://arxiv.org/abs/2605.21058

作者：Yan Li,Yuewen Sun,Shaoan Xie,Gongxu Luo,Yunlong Deng,Kun Zhang,Guangyi Chen
摘要：Causal representation learning (CRL) and traditional representation learning have largely developed along different trajectories. Traditional representation learning has been driven mainly by applications and empirical objectives, whereas CRL has focused more on theoretical questions, particularly identifiability. This difference in emphasis has created a gap between the two fields in terminology, problem formulation, and evaluation, limiting communication and sometimes leading to disconnected or redundant efforts. In this paper, we argue that these two fields should be brought into dialogue rather than treated as separate paradigms. To this end, we introduce a unified formulation in which representation learning is characterized by two components: a task component, which specifies what information the learned representation is required to preserve, and a constraint component, which specifies what structure is imposed on the latent space. Under this formulation, the benefits run in both directions. CRL provides theoretical tools for understanding when structured latent constraints are useful or necessary, while traditional representation learning offers practical insights on task design and objective choice that can improve the development of CRL methods. To illustrate this interaction, we experimentally study how different task components affect the behavior of CRL methods under different structured constraints. Results on CausalVerse show that the effectiveness of causal constraints depends strongly on the tasks with which they are paired.

【2】Learning to Think in Physics: Breaking Shortcut Learning in Scientific Diffusion via Representation Alignment
标题：学会以物理方式思考：通过表示对齐打破科学扩散模型中的捷径学习
链接：https://arxiv.org/abs/2605.20780

作者：Haozhe Jia,Pengyu Yin,Wenshuo Chen,Shaofeng Liang,Lei Wang,Bowen Tian,Xiucheng Wang,Nanqian Jia,Yutao Yue
摘要：Physics-informed diffusion models typically enforce PDE constraints only on final outputs, leaving intermediate representations unconstrained and prone to shortcut learning under shifted boundary conditions. We introduce REPA-P, a teacher-free, architecture-agnostic framework that aligns intermediate features with physical states using first-principles residuals. REPA-P attaches lightweight $1{\times}1$ projection heads to selected layers, decodes hidden activations into physical quantities, and applies PDE residual losses during training. These heads are discarded at inference, introducing zero overhead. Across four PDE tasks, including Darcy flow, topology optimization, electrostatic potential, and turbulent channel flow, REPA-P accelerates convergence by up to $2{\times}$, reduces physics residuals by up to $66.4\%$, and improves out-of-distribution robustness by up to $49.3\%$, with consistent gains on both U-Net and Diffusion Transformer backbones. Ablations show that supervising a small set of intermediate layers captures most benefits and complements output-level physics losses. Code is available at https://github.com/Hxxxz0/REPA-P.
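
The following is a minimal sketch of the mechanism. A 1x1 projection head is a per-pixel linear mix of channels, and the physics loss penalizes a finite-difference PDE residual of the decoded field. The Poisson-type residual laplacian(u) = f is an assumption standing in for the paper's task-specific residuals, and the head weights would be learned in practice.

```python
def project_1x1(features, weights, bias):
    """1x1 projection head: mixes C channel maps (each H x W) into one field u[i][j]."""
    n_ch, h, w = len(features), len(features[0]), len(features[0][0])
    return [[bias + sum(weights[c] * features[c][i][j] for c in range(n_ch))
             for j in range(w)] for i in range(h)]


def poisson_residual_loss(u, f, dx):
    """Mean squared residual of (laplacian(u) - f) over interior grid points,
    using the 5-point finite-difference stencil with spacing dx (assumed PDE)."""
    h, w = len(u), len(u[0])
    total, count = 0.0, 0
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            lap = (u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1]
                   - 4.0 * u[i][j]) / dx ** 2
            total += (lap - f[i][j]) ** 2
            count += 1
    return total / count
```

Because the head and the loss touch only training-time activations, dropping the head afterwards leaves the inference path unchanged.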

【3】TriForces: Augmenting Atomistic GNNs for Transferable Representations
标题：TriForces：增强原子GNN以实现可转移表示
链接：https://arxiv.org/abs/2605.20581

作者：Ali Ramlaoui,Alexandre Duval,Hannah Bull,Victor Schmidt,Hugues Talbot,Fragkiskos D. Malliaros,Joseph Musielewicz
备注：28 pages, 11 figures. Accepted at ICML 2026
摘要：Machine learning interatomic potentials (MLIPs) achieve excellent accuracy when trained on large Density Functional Theory (DFT) data. To be useful in practice, they must often be adapted to target chemistries using small and expensive task-specific datasets. However, MLIPs transfer inconsistently across domains, with representations that often lose accessible composition and structure information. To address this, we present TriForces, a model-agnostic three-stream framework that separates composition and structure information, combined with self-supervised learning to preserve transferable representations. TriForces improves performance on MatBench and QM9 over baselines without needing DFT labels and enables efficient similar-structure retrieval through its learned latent space. On OMat24, in a limited-data training regime, TriForces reduces energy MAE by 57% with only 20K samples and improves force MAE across sample sizes. We release pretrained TriForces variants across multiple MLIP architectures with code at https://github.com/Ramlaoui/triforces.

编码器(1篇)

【1】Nonlocal operator learning for fMRI encoding and decoding tasks
标题：fMRI编码和解码任务的非局部算子学习
链接：https://arxiv.org/abs/2605.20389

作者：Andreas Kramer,Saugat Acharya,Alice Giola,Emanuele Zappala
备注：18 pages, 4 figures, 5 tables. Comments are welcome!
摘要：Functional MRI data exhibit high-dimensional spatiotemporal structure, making both prediction and decoding challenging. In this work, we investigate neural integral-operator-based models for encoding and decoding tasks in fMRI, with particular emphasis on the role of nonlocal spatiotemporal context. We implement a latent neural integral operator framework that performs fixed-point iterations in an auxiliary space, from which a decoder performs classification and stimulus prediction. We evaluate our model on two open-source fMRI datasets. Our experiments examine both decoding of stimuli from fMRI recordings and encoding of fMRI dynamics from stimulus representations. A main focus is the effect of spatiotemporal context: we systematically compare short and long temporal windows, as well as visual-cortex versus whole-brain recordings, and analyze their influence on performance and latent-space geometry. Across tasks and datasets, larger temporal windows generally improve results and produce more structured learned representations. In decoding experiments, the learned latent space often provides clearer class separation than the raw data. In encoding experiments, although absolute performance remains moderate due to the difficulty of the task, longer temporal windows still yield consistent gains. These findings suggest that neural integral operators provide a promising framework for modeling fMRI dynamics and that broader spatiotemporal context can be beneficial for both prediction and representation learning. More broadly, the results indicate that exploiting distributed nonlocal structure in brain dynamics requires model architectures specifically designed to capture such dependencies.
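
The fixed-point iteration can be illustrated with a discretized nonlinear integral equation z(t) = g(t) + ∫ K(t, s) σ(z(s)) ds, solved by Picard iteration with a quadrature weight. In the paper this runs in a learned latent space with learned kernels. Here the kernel, the quadrature rule, and the nonlinearity are fixed, illustrative assumptions. The iteration converges when the discretized map is a contraction.

```python
import math


def picard_integral_solve(g, kernel, quad_w, sigma, tol=1e-10, max_iter=500):
    """Solve z_i = g_i + sum_j quad_w * kernel[i][j] * sigma(z_j) by fixed-point
    (Picard) iteration. Kernel and quadrature are fixed here (assumption).
    Returns (z, iterations used)."""
    n = len(g)
    z = list(g)
    for it in range(1, max_iter + 1):
        z_new = [g[i] + sum(quad_w * kernel[i][j] * sigma(z[j]) for j in range(n))
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(z_new, z)) < tol:
            return z_new, it
        z = z_new
    return z, max_iter


if __name__ == "__main__":
    g = [1.0, 2.0, 3.0, 4.0]
    kernel = [[0.5] * 4 for _ in range(4)]
    print(picard_integral_solve(g, kernel, 0.25, math.tanh))
```

Every output z_i depends on all points s through the kernel, which is the nonlocal coupling the abstract emphasizes. Widening the window adds more terms to each sum.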

优化|敛散性(20篇)

【1】TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization
标题：TextReg：通过正则化文本空间优化缓解提示词分布过拟合
链接：https://arxiv.org/abs/2605.21318

作者：Lucheng Fu,Ye Yu,Yiyang Wang,Yiqiao Jin,Haibo Jin,B. Aditya Prakash,Haohan Wang
备注：Code: https://github.com/luchengfu6/TextReg
摘要：Large language models (LLMs) are highly sensitive to the prompts used to specify task objectives and behavioral constraints. Many recent prompt optimization methods iteratively rewrite prompts using LLM-generated feedback, but the resulting prompts often become longer, accumulate narrow sample-specific rules, and generalize poorly beyond the training distribution. We study this failure mode as prompt distributional overfitting and argue that it reflects a lack of representation control in discrete text-space optimization. We formalize this view through representational inefficiency, a dual-factor measure that decomposes prompt inefficiency into capacity cost and scope narrowness, attributing prompt distributional overfitting to their coupled growth during optimization. We propose TextReg, a regularization framework that realizes a soft-penalty objective through regularized textual gradients, combining Dual-Evidence Gradient Purification, Semantic Edit Regularization, and Regularization-Guided Prompt Update. Across multiple reasoning benchmarks, TextReg substantially improves out-of-distribution (OOD) generalization, with accuracy gains of up to +11.8% over TextGrad and +16.5% over REVOLVE.

【2】How Much Online RL is Enough? Informative Rollouts for Offline Preference Optimization in RLVR
标题：在线RL需要多少才够？RLVR中用于离线偏好优化的信息丰富的rollout
链接：https://arxiv.org/abs/2605.21266

作者：Richa Verma,Balaraman Ravindran
摘要：Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a powerful paradigm for reasoning in language models, with GRPO as its primary example. However, GRPO requires continuous online rollout generation, making it computationally expensive and difficult to scale. While Direct Preference Optimization (DPO) offers a stable and efficient offline alternative, it is typically expected to underperform online RL methods such as GRPO when trained on rollouts from a cold supervised fine-tuned (SFT) policy. We introduce G2D (GRPO to DPO), a three-stage pipeline that performs a short GRPO warm-up, constructs a static preference dataset, and fine-tunes a model offline with DPO. Across several values of K, the number of online GRPO steps, on Qwen2.5-7B and Llama-3.1-8B, we find that offline DPO with moderate warm-up matches or outperforms GRPO at substantially lower compute cost in our setting. On Qwen2.5-7B, G2D at K=150 achieves 62.4% on MATH-500, outperforming GRPO (51.6%) by 10.8 percentage points at ~4x lower compute. On Llama-3.1-8B, G2D at K=500 achieves 49.4%, surpassing GRPO in our experimental setting. We show that performance is governed not by the number of preference pairs, which varies little with K, but by their informativeness. Moderate warm-up produces rollouts with calibrated uncertainty, yielding a stronger contrastive signal, while excessive warm-up leads to overconfident policies and less informative data. Our results recast the offline-online gap in RLVR as primarily a data-informativeness problem and identify a short online RL warm-up, with appropriate difficulty calibration of the fine-tuning dataset, as a compute-efficient alternative to online RL.
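G2D's second stage turns warm-up rollouts into a static preference dataset. The abstract does not say how pairs are formed, so the sketch below assumes a common recipe for binary verifiable rewards: pair each correct response with an incorrect one for the same prompt. The `informativeness` score $p(1-p)$, the Bernoulli variance of a prompt's empirical pass rate, is our own hypothetical proxy for the "calibrated uncertainty" the authors describe; it is zero for prompts the policy always or never solves, which also yield no pairs. All function names here are illustrative.

```python
from itertools import product


def build_preference_pairs(rollouts, max_pairs_per_prompt=4):
    """Hypothetical pair construction from verified rollouts.

    rollouts maps each prompt to a list of (response, reward) with reward in {0, 1}.
    """
    pairs = []
    for prompt, samples in rollouts.items():
        chosen = [resp for resp, reward in samples if reward == 1]
        rejected = [resp for resp, reward in samples if reward == 0]
        # All-correct or all-incorrect groups produce no contrastive pair.
        for good, bad in list(product(chosen, rejected))[:max_pairs_per_prompt]:
            pairs.append({"prompt": prompt, "chosen": good, "rejected": bad})
    return pairs


def informativeness(samples):
    """Bernoulli variance p * (1 - p) of the empirical pass rate (a hypothetical proxy)."""
    p = sum(reward for _, reward in samples) / len(samples)
    return p * (1 - p)


rollouts = {
    "q1": [("a", 1), ("b", 0), ("c", 0)],
    "q2": [("d", 1), ("e", 1)],  # already solved: no pair, zero informativeness
}
pairs = build_preference_pairs(rollouts)
```

Under this reading, an overconfident policy that solves or fails most prompts outright leaves few prompts with mixed outcomes, which is one way excessive warm-up could produce less informative data.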

【3】ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning
标题：ChunkFT：基于字节流的优化，实现内存高效的全参数微调
链接：https://arxiv.org/abs/2605.21177

作者：Yongkang Liu,Zijing Wang,Mengjie Zhao,Ercong Nie,Mingyang Wang,Qian Li,Feiliang Ren,Shi Feng,Daling Wang,Hinrich Schütze
摘要：This work presents ChunkFT, a memory-efficient fine-tuning framework that reformulates full-parameter fine-tuning around a dynamically activated working set. ChunkFT enables gradient computation for arbitrary sub-tensors without modifying the network architecture, providing an algorithmic foundation for optimizing arbitrary sub-networks while avoiding standard dense gradient computation. We provide a theoretical convergence analysis of ChunkFT in the deterministic setting. Empirically, we apply ChunkFT to fine-tune Llama 3-8B and Llama 3-70B using a single RTX 4090 (24 GB) GPU and 2$\times$ H800 (80 GB) GPUs, respectively. Full-parameter fine-tuning of a 7B model with a 1K input length requires only 13.72 GB of GPU memory. The results demonstrate the effectiveness of ChunkFT in memory usage, running time, and optimization quality. Moreover, downstream evaluations on language understanding, mathematical reasoning, and MT-Bench show that ChunkFT consistently outperforms existing memory-efficient baselines. Notably, ChunkFT achieves performance comparable to, and in some cases exceeding, full-parameter fine-tuning. Our repository is available at https://github.com/misonsky/chunk.

【4】Advantage Collapse in Group Relative Policy Optimization: Diagnosis and Mitigation
标题：组相对策略优化中的优势崩溃：诊断与缓解
链接：https://arxiv.org/abs/2605.21125

作者：Xixiang He,Qiyao Sun,Ao Cheng,Xingming Li,Xuanyu Ji,Hailun Lu,Runke Huang,Qingyong Hu
备注：26 pages, 12 figures. Accepted at the International Conference on Machine Learning (ICML 2026)
摘要：Group Relative Policy Optimization (GRPO), a prominent algorithm within the Reinforcement Learning from Verifiable Rewards (RLVR) framework, has achieved strong results in improving the reasoning capabilities of large language models (LLMs). However, GRPO is prone to advantage collapse, a failure mode where homogeneous rewards within a group (e.g., all correct or all incorrect answers) yield near-zero advantages and vanishing gradients. To address this, we introduce the Advantage Collapse Rate (ACR), the first diagnostic metric quantifying the proportion of training batches with ineffective gradients. Across models from 0.5B to 14B parameters on mathematical reasoning benchmarks, we show that ACR strongly predicts training stagnation and final performance. We then propose Adaptive Virtual Sample Policy Optimization (AVSPO), a lightweight extension of GRPO that injects virtual reward samples, guided by real-time ACR monitoring, to enable learning from homogeneous groups without additional model rollouts. AVSPO reduces advantage collapse by 58-63% relative to GRPO and yields consistent accuracy gains of 4-6 percentage points across all model scales, while maintaining generalization on the evaluated out-of-domain task. Code and datasets are available at https://qingyonghu.github.io/AVSPO.
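The collapse mechanism is easy to see in GRPO's group-normalized advantage, $A_i = (r_i - \bar r)/(\sigma_r + \epsilon)$: when every reward in a group is equal, every $A_i$ is exactly zero and the group contributes no policy gradient. The sketch below computes these advantages and a collapse rate. The paper defines ACR over training batches; measuring the fraction of collapsed *groups*, as done here, is our simplifying assumption, and AVSPO's virtual-sample injection is not shown.

```python
import statistics


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: rewards standardized within one group of rollouts."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def advantage_collapse_rate(groups, tol=1e-8):
    """Fraction of groups whose advantages are all (near) zero, i.e. give no gradient signal."""
    collapsed = sum(
        1 for rewards in groups if max(abs(a) for a in group_advantages(rewards)) < tol
    )
    return collapsed / len(groups)


groups = [
    [1, 1, 1, 1],  # all correct   -> collapsed
    [0, 0, 0, 0],  # all incorrect -> collapsed
    [1, 0, 1, 0],
    [1, 1, 0, 1],
]
acr = advantage_collapse_rate(groups)  # 0.5
```

Monitoring a quantity like `acr` during training is cheap because it needs only the rewards that GRPO already computes.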

【5】Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models
标题：线性DPO：扩散和流匹配生成模型的线性直接偏好优化
链接：https://arxiv.org/abs/2605.21123

作者：Kesong Li,Yixuan Xu,Kuo-kun Tseng,Weiyi Lu,Kan Liu,Tao Lan
备注：Code and models are available at: https://github.com/Whynot0101/Linear-DPO . Work done during an internship at Alibaba Group
摘要：Direct Preference Optimization (DPO) has been successful for aligning LLMs but still faces challenges in text-to-image generation. Existing studies are confined to denoising diffusion models, overlooking flow matching, and suffer from an objective mismatch when the discrete, NLP-based DPO objective is applied to regression-based generative tasks. In this paper, we derive a generalized DPO objective that covers both diffusion and flow matching via a unified reverse-time SDE framework, and we show from a gradient perspective that the standard DPO objective is suboptimal for text-to-image generation. Consequently, we propose Linear-DPO, which replaces the aggressive sigmoid-based utility function with a sustained linear utility and incorporates an EMA-updated reference model. Qualitative and quantitative experiments on diffusion models (SD1.5, SDXL) and a flow-matching model (SD3-Medium) demonstrate the superiority of our approach over existing baselines.
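The gradient argument against the sigmoid utility can be seen on a scalar DPO margin $m$ (the policy-minus-reference log-likelihood gap between the preferred and rejected sample). The standard loss $-\log\sigma(\beta m)$ has derivative $-\beta\,\sigma(-\beta m)$, which vanishes once a pair is already ranked correctly, whereas a linear utility keeps a constant gradient. This sketch assumes the linear loss is simply $-\beta m$; the paper's actual objective, its diffusion/flow-matching form of $m$, and its EMA reference model are not reproduced.

```python
import math


def stable_sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_dpo(margin, beta=1.0):
    """Standard DPO on a scalar margin: loss = -log sigmoid(beta * margin).

    Returns (loss, dloss/dmargin).
    """
    z = beta * margin
    loss = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))  # stable softplus(-z)
    grad = -beta * stable_sigmoid(-z)
    return loss, grad


def linear_utility(margin, beta=1.0):
    """Assumed linear variant: loss = -beta * margin, whose gradient never decays."""
    return -beta * margin, -beta


for m in (0.0, 2.0, 10.0):
    print(m, sigmoid_dpo(m)[1], linear_utility(m)[1])
```

The printed gradients show the sigmoid version shrinking toward zero as the margin grows while the linear version stays at $-\beta$, which is the "sustained" signal the abstract contrasts with the "aggressive" sigmoid.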

【6】Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction
标题：通过自收缩改进约束在线凸优化的保证
链接：https://arxiv.org/abs/2605.21107

作者：Dhruv Sarkar,Abhishek Sinha
摘要：We consider Constrained Online Convex Optimization (COCO) with adversarially chosen constraints. At each round, the learner chooses an action before observing the loss and constraint functions for that round. The goal is to achieve small static regret against the best point satisfying all constraints while also controlling the cumulative constraint violation ($\mathsf{CCV}$). For strongly convex losses, state-of-the-art algorithms achieve $O(\log T)$ regret and $O(\sqrt{T \log T})$ $\mathsf{CCV}$. The corresponding best-known bounds for convex losses are $O(\sqrt{T})$ regret and $O(\sqrt{T} \log T)$ $\mathsf{CCV}$. In this paper, we give a simple projection-based algorithm that simultaneously achieves $O(\log T)$ regret and $O(\log T)$ $\mathsf{CCV}$ for strongly convex losses, yielding an exponential improvement in the $\mathsf{CCV}$. For convex losses, our algorithm improves the $\mathsf{CCV}$ to $O(\sqrt{T})$ while maintaining the optimal $O(\sqrt{T})$ regret. The key to our improvement is a recent geometric result for self-contracted curves, which may be of independent interest.

【7】UOTIP: Unbalanced Optimal Transport Map for Unpaired Inverse Problems
标题：UOTIP：面向非配对反问题的非平衡最优传输映射
链接：https://arxiv.org/abs/2605.21094

作者：Donggyu Lee,Taekyung Lee,Jaewoong Choi
备注：Accepted at ICML 2026
摘要：We investigate unpaired image inverse problems, a challenging setting where only independent, non-paired sets of noisy measurements and clean target signals are available for training. We propose a novel inverse problem solver based on Unbalanced Optimal Transport, called Unbalanced Optimal Transport Map for Inverse Problems (UOTIP). Our method formulates the reconstruction task, predicting clean target signals from noisy measurements, as learning a UOT Map from noisy measurement distribution to clean signal distribution by incorporating a likelihood-based cost function. By relaxing the exact marginal constraint, the UOT framework provides key advantages to our model: robustness to multi-level observation noise, adaptability to class imbalance between noisy and clean datasets, and generalizability to diverse noise-type scenarios. Furthermore, we theoretically demonstrate that incorporating a quadratic cost term ensures the existence and uniqueness of the transport map by satisfying the twist condition, even for ill-posed inverse problems. Our experiments demonstrate that UOTIP achieves state-of-the-art performance on unpaired image inverse problem benchmarks, across linear and nonlinear inverse problems.

【8】Modeling Temporal scRNA-seq Data with Latent Gaussian Process and Optimal Transport
标题：利用潜在高斯过程和最优传输对时序scRNA-seq数据建模
链接：https://arxiv.org/abs/2605.20989

作者：Mehmet Yigit Balik,Harri Lähdesmäki
摘要：Single-cell RNA sequencing provides insights into gene expression at single-cell resolution, yet inferring temporal processes from these static snapshot measurements remains a fundamental challenge. Current approaches based on neural differential equations and flows are prone to overfitting and do not carefully account for biological variability. In this work, we propose a generative framework that models population trends using a latent heteroscedastic Gaussian process (GP) approximated by Hilbert space methods. To address the absence of genuine cell trajectories, we leverage an optimal transport (OT) objective that aligns generated and observed population distributions. Our method explicitly captures biological heterogeneity by incorporating cell-specific latent time and cell-type conditioning to disentangle temporal asynchrony and trajectories toward different cell types. We demonstrate state-of-the-art performance on complex interpolation and extrapolation benchmarks and introduce a novel gradient-based strategy for inferring perturbation trajectories.

【9】Learning fMRI activations dictionaries across individual geometries via optimal transport
标题：通过最优传输跨个体几何结构学习fMRI激活字典
链接：https://arxiv.org/abs/2605.20883

作者：Sonia Mazelet,Rémi Flamary,Bertrand Thirion
摘要：Dictionary learning is a powerful tool for creating interpretable representations. When applied to functional magnetic resonance imaging (fMRI) data, the resulting patterns of brain activity can be used for various downstream tasks, such as brain state classification or population-level analysis. However, a major challenge is the variability in brain geometry across individuals. This is usually addressed by projecting each individual brain geometry onto a common template, which removes subject-specific information. In this work, we introduce a novel approach to dictionary learning on fMRI data that explicitly accounts for this variability. We use the optimal transport-based Fused Gromov-Wasserstein (FGW) distance to compare graphs with different geometries and features. To address the challenge of computing multiple FGW distances for large graphs such as those arising from fMRI data, we rely on amortized optimization to learn a neural network that predicts an approximation of the optimal transport plans, which substantially reduces the computational cost. Additionally, we learn dictionary atoms that depend on the FGW trade-off parameter, which controls the balance between feature alignment and structural consistency. Numerical experiments on the HCP dataset demonstrate that the proposed approach captures different levels of geometric variability in the data and provides representations that preserve essential information.

【10】Beyond Numerical Features: CNN-Driven Algorithm Selection via Contour Plots for Continuous Black-Box Optimization
标题：超越数值特征：基于等高线图的CNN驱动连续黑盒优化算法选择
链接：https://arxiv.org/abs/2605.20797

作者：Yiliang Yuan,Xiang Shi,Mustafa Misir
摘要：The present paper introduces a new representation-driven approach to per-instance algorithm selection, applied to black-box optimization, for automatically choosing the most promising solver from a fixed portfolio. Prior work in continuous optimization largely relies on numerical descriptors, including Exploratory Landscape Analysis features and learned embeddings such as Deep-ELA. This work studies a complementary representation: contour-map visualizations of probed landscapes. A CNN regressor takes multiple instance-specific contour views (stacked or encoded per view and aggregated) and predicts per-solver performance, enabling selection by the predicted best value. On the standard BBOB 2009 single-objective protocol, the resulting selectors significantly outperform the single best solver (SBS) and are competitive with feature-based baselines. A subsequent bi-objective evaluation under the DeepELA setting further indicates that the same image-based principle can be competitive when using windowed contour views. Overall, the results suggest that simple vision models can exploit spatial structure in probed landscapes for algorithm selection without handcrafted ELA features.

【11】ShapeBench: A Scalable Benchmark and Diagnostic Suite for Standardized Evaluation in Aerodynamic Shape Optimization
标题：ShapeBench：用于气动外形优化标准化评估的可扩展基准和诊断套件
链接：https://arxiv.org/abs/2605.20763

作者：Shaghayegh Fazliani,Krissh Chawla,Jack Guo,Yiren Shen,Matthias Ihme,Madeleine Udell
摘要：Rapid progress in aerodynamic shape optimization (ASO) has outpaced currently available standardized evaluation frameworks. Fair comparison requires a unified benchmark spanning diverse shape classes, objective formulations, and matched-budget state-of-the-art baselines. We introduce ShapeBench, an open-source ASO benchmark with a unified API spanning 103 tasks across eight shape categories and multiple optimization regimes. Each ShapeBench task includes a validated surrogate for fast search; when feasible, a high-fidelity Computational Fluid Dynamics (CFD) pipeline for final verification is available, enabling systematic fidelity-gap analysis. ShapeBench provides a reproducible protocol with well-configured baselines to compare fairly using a consistent budget metric, allowing for comparison among both classical and LLM-driven methods, including general-purpose optimizers and a new domain-specialized evolutionary LLM baseline, ShapeEvolve. Results on ShapeBench demonstrate substantial variance in optimizer rankings across shape categories and problem formulations, with mean pairwise Spearman $\rho = 0.013$, so single-task conclusions do not reliably generalize across problem classes. The benchmark is also far from saturation; classical methods are rarely applicable across all shape categories and tasks, further highlighting the need for more general-purpose approaches.
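The headline statistic is a mean pairwise Spearman rank correlation between optimizer rankings. The sketch below assumes it is computed by scoring the same set of optimizers on each task, taking Spearman's $\rho$ (the Pearson correlation of average ranks, so ties are handled) for every pair of tasks, and averaging; the abstract does not state whether the pairs range over tasks or over shape categories. A task on which all optimizers tie would make $\rho$ undefined and would have to be excluded.

```python
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise_spearman(scores_per_task):
    """scores_per_task[t][o] = score of optimizer o on task t (same optimizer order for every task)."""
    pairs = list(combinations(scores_per_task, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)
```

A value near zero, such as the reported 0.013, means that knowing an optimizer's rank on one task says almost nothing about its rank on another.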

【12】Distributed Direct Preference Optimization
标题：分布式直接偏好优化
链接：https://arxiv.org/abs/2605.20696

作者：Zhanhong Jiang
备注：29 pages, 12 figures
摘要：Preference-based reinforcement learning (RL) is a key paradigm for aligning policies with human judgments, yet its theoretical behavior in distributed settings where preference data are fragmented across heterogeneous users remains poorly understood. Direct Preference Optimization (DPO) avoids explicit reward modeling but lacks convergence guarantees under federated and decentralized training, where communication constraints and non-IID preferences fundamentally alter optimization dynamics. We provide the first convergence and time-complexity analysis of DPO in distributed environments. Modeling personalized offline RL with user-specific preference distributions, we characterize the induced global optimization landscape. For federated DPO, we derive convergence rates that quantify the impact of client drift, communication frequency, and preference heterogeneity; for decentralized DPO, we establish convergence over general communication graphs and show how spectral connectivity governs optimization speed and consensus. Empirically, we corroborate our theoretical insights on standard alignment benchmarks, demonstrating that our proposed methods not only enjoy strong theoretical guarantees but also deliver robust and scalable performance in practice. The code base is available here.
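The role of spectral connectivity can be illustrated with plain gossip averaging, a generic building block of decentralized optimization rather than the paper's decentralized DPO algorithm. On a ring where each node averages itself and its two neighbours, the mixing matrix is symmetric and doubly stochastic, so the Euclidean norm of the disagreement from the global mean is multiplied by at most $\lambda_2$, the second-largest eigenvalue modulus, per round. For this ring $\lambda_2 = \max_{k \ne 0} |1 + 2\cos(2\pi k/n)|/3$, which approaches 1 as the ring grows, so larger, sparser graphs reach consensus more slowly.

```python
import math


def ring_mixing_matrix(n):
    """Doubly stochastic gossip matrix: each node averages itself and its two ring neighbours."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def gossip(x, W, steps):
    """Apply `steps` rounds of x <- W x (synchronous neighbour averaging)."""
    for _ in range(steps):
        x = [sum(wij * xj for wij, xj in zip(row, x)) for row in W]
    return x


def ring_lambda2(n):
    """Second-largest eigenvalue modulus of the ring matrix, from its circulant spectrum."""
    return max(abs((1 + 2 * math.cos(2 * math.pi * k / n)) / 3) for k in range(1, n))


x0 = [float(i) for i in range(8)]
W = ring_mixing_matrix(8)
x10 = gossip(x0, W, 10)
```

In decentralized training, local gradient steps are interleaved with such averaging rounds, and a smaller $\lambda_2$ (a larger spectral gap) keeps the local models closer together.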

【13】Ada2MS: A Hybrid Optimization Algorithm Based on Exponential Mixing of Elementwise and Global Second-Moment Estimates
标题：Ada2MS：基于逐元素与全局二阶矩估计指数混合的混合优化算法
链接：https://arxiv.org/abs/2605.20533

作者：Meng Zhu,Quan Xiao,Weidong Min
摘要：Optimization algorithms are core methods by which machine learning models iteratively minimize loss functions, update parameters, learn from data, and improve performance. Momentum SGD and AdamW represent two important optimization paradigms. AdamW produces stable updates and usually has strong robustness across training scenarios, but its generalization performance is sometimes weaker than that of momentum methods. Momentum SGD can often obtain better generalization after careful tuning, but it is more sensitive to gradient-scale variation and hyperparameter settings. To balance the strengths and weaknesses of the two paradigms, this paper proposes Ada2MS, an optimization algorithm that achieves a smooth transition between AdamW-like behavior and momentum-SGD-like behavior through continuous exponential interpolation between elementwise second-moment estimates and global second-moment estimates. On the visual tasks evaluated in this study, Ada2MS obtains competitive results under a unified optimizer-comparison protocol. The code will be released at https://github.com/mengzhu0308/Ada2MS
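One way to read "exponential interpolation" of the two second-moment estimates is a geometric mix $v_{\text{mix}} = v_{\text{elem}}^{1-\alpha}\,\bar v^{\alpha}$, where $\bar v$ is a single global second moment. The step below is a hypothetical sketch under that reading, not the released Ada2MS code: the name `hybrid_step`, the exponent `alpha`, the choice of $\bar v$ as the mean of the elementwise estimates, and the bias correction are all our assumptions. At `alpha=0` it reduces to an AdamW-style step; at `alpha=1` every coordinate shares one scale, so the update direction is that of momentum SGD.

```python
import math


def hybrid_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
                alpha=0.5, eps=1e-8, weight_decay=0.0):
    """One hypothetical update mixing elementwise and global second moments.

    alpha=0 -> AdamW-like (per-coordinate scale); alpha=1 -> one global scale
    (momentum-SGD-like direction). `state` holds "t", "m", "v" and is updated in place.
    """
    state["t"] += 1
    t, m, v = state["t"], state["m"], state["v"]
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
    v_global = sum(v) / len(v)  # assumed global estimate: mean of elementwise ones
    bc1, bc2 = 1 - beta1 ** t, 1 - beta2 ** t  # Adam-style bias corrections
    new_params = []
    for p, mi, vi in zip(params, m, v):
        v_mix = (vi ** (1 - alpha)) * (v_global ** alpha)  # geometric interpolation
        p = p - lr * weight_decay * p  # decoupled weight decay, as in AdamW
        new_params.append(p - lr * (mi / bc1) / (math.sqrt(v_mix / bc2) + eps))
    return new_params


params, grads = [1.0, 1.0], [0.5, -2.0]
adam_like = hybrid_step(params, grads, {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}, lr=0.1, alpha=0.0)
sgd_like = hybrid_step(params, grads, {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}, lr=0.1, alpha=1.0)
```

On this first step the AdamW-like update moves each coordinate by about `lr` regardless of gradient size, while the global-scale update keeps the gradients' 1:-4 ratio, which illustrates the two behaviours the interpolation is meant to blend.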

【14】Group-Algebraic Tensors: Provably-optimal Equivariant Learning and Physical Symmetry Discovery
标题：群代数张量：可证明最优的等变学习与物理对称性发现
链接：https://arxiv.org/abs/2605.20440

作者：Paulina Hoyos,Shashanka Ubaru,Dongsung Huh,Vasileios Kalantzis,Kenneth L. Clarkson,Misha Kilmer,Haim Avron,Lior Horesh
摘要：We introduce the $\star_G$ tensor algebra, in which any finite group $G$ defines the multiplication rule, making equivariance an intrinsic algebraic property rather than an architectural constraint. The framework rests on three machine-verified theoretical pillars: (i)~an Eckart-Young optimality guarantee for the $\star_G$-SVD: the first such result for symmetry-preserving tensor approximation, exact and polynomial-time; (ii)~a Kronecker factorization that composes multiple symmetries by replacing $F_G$ with $F_{G_1} \otimes F_{G_2}$ with no architectural redesign; and (iii)~a 600-line Lean~4 formalization of the $\star_G$ algebra. The framework provides capabilities that equivariant neural networks (ENNs) structurally cannot: a closed-form per-irreducible-representation decomposition of every prediction, and data-driven discovery of the symmetry group that best fits a dataset. As a non-trivial empirical demonstration, decomposing QM9 molecular geometry over the chiral octahedral subgroup of SO(3) recovers the Wigner--Eckart selection rules of angular momentum from data alone, with no quantum mechanical input: scalar properties are A$_1$-dominated, dipole components are T$_1$-dominated, the isotropic polarizability is uniquely insensitive to $l\!=\!1$ as the rank-2-trace decomposition $l\!=\!0 \oplus l\!=\!2$ requires, and the T$_1$/A$_1$ predictive-power ratio separates vector observables from scalar observables by a factor of five. On full QM9 (130{,}831 molecules), $\star_G$-SVD with ridge regression provides closed form predictions at $\sim50-90\times$ fewer parameters than parameter-matched MLPs. Algebraic equivariance thus complements architectural equivariance not as a faster-better-cheaper alternative but as a different mathematical affordance: provably-optimal symmetry-preserving compression, per-irrep interpretability, and data-driven physical discovery.
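For the simplest case $G = \mathbb{Z}_n$ (our choice for illustration; the paper treats any finite group), the group-algebra product of two length-$n$ tubes is circular convolution. The discrete Fourier transform, which plays the role of $F_G$ here, turns that product into an elementwise product, and the product commutes with the group action (cyclic shifts), which is the sense in which equivariance is built into the algebra. The sketch checks both facts on vectors. In the cyclic case, the tensor-level product combines such tube products with ordinary matrix multiplication over the first two modes, as in the t-product; that step is omitted here.

```python
import cmath


def cyclic_group_product(a, b):
    """Group-algebra (convolution) product over Z_n: c[k] = sum_m a[m] * b[(k - m) mod n]."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]


def dft(x):
    """Naive discrete Fourier transform, i.e. the Fourier transform on Z_n."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n)) for k in range(n)]


def shift(x, s):
    """Action of the group element s on a tube: cyclic shift by s positions."""
    n = len(x)
    return [x[(k - s) % n] for k in range(n)]


a = [1, 2, 3, 4]
b = [2, -1, 0, 3]
c = cyclic_group_product(a, b)
```

The Fourier-side diagonalization is what makes SVD-type factorizations in such algebras cheap: one ordinary matrix decomposition per frequency (per irreducible representation, for a general group).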

【15】Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search
标题：Lean Refactor：通过智能体策略搜索实现多目标可控的证明优化
链接：https://arxiv.org/abs/2605.20244

作者：Jialin Lu,Soonho Kong,Rodrigo Stehling,Kaiyu Yang,Zhangyang Wang,Weiran Sun,Wuyang Chen
摘要：We present Lean Refactor, a plug-and-play retrieval-augmented agentic framework for multi-objective, controllable, and version-robust refactoring of Lean proofs. LLM-generated proofs are notoriously correct-but-verbose and brittle across library versions, yet existing refactoring works overlook three practical challenges: 1) Lean refactoring is natively multi-objective (proof length, compilation cost, and version compatibility are often in tension); 2) Lean repositories have fragile compatibility, whereas LLM releases are unaware of Lean/Mathlib versions; 3) Training-based pipelines require repeated fine-tuning with each new LLM release, scaling neither with model churn nor with Lean's release cycle. Lean Refactor steers a frozen agentic LLM with retrievals from a curated database of multi-objective refactoring strategies, each densely annotated with metadata such as supported Lean/Mathlib versions and expected compilation-cost reduction. Experiments show over $70\%$ token-level compression on competition benchmarks, over $20\%$ on research repositories, and up to $60\%$ compilation-time reduction, outperforming prior work and Claude Code. Version-filtered retrieval further improves compression on the target Lean version, and refactored miniF2F proofs exhibit stronger zero-shot version transfer to future Lean releases than their unrefactored counterparts.

【16】Memorisation, convergence and generalisation in generative models
标题：生成模型中的记忆、收敛与泛化
链接：https://arxiv.org/abs/2605.21402

作者：Antoine Maillard,Sebastian Goldt
摘要：Generative neural networks learn how to produce highly realistic images from a large, but finite number of examples - or do they simply memorise their training set? To settle this question, Kadkhodaie, Guth, Simoncelli and Mallat (ICLR '24) trained diffusion models independently on disjoint subsets of a dataset and showed that they converge to nearly the same density when the number of training images is large enough. This result raises two basic questions: how much data do you need for convergence, and what does convergence capture about learning the data distribution? Here, we address these questions by providing an exact analytical characterisation of the transition from memorisation to generalisation in linear generative models. We find that these models memorise at small load, while convergence emerges continuously when the number of samples is linear in the input dimension. Strikingly, we find that convergence is insensitive to recovery of the principal latent factors of the data, which are recovered in a sharp transition. After extending our approach to data with power-law spectra, we find the same distinction between convergence and latent recovery in our experiments with convolutional denoisers and in the data of Kadkhodaie et al. We thus show that generalisation in generative models decomposes into at least two distinct objectives: matching the bulk of the data distribution and recovering the principal latent factors. These objectives correspond to two different distances between true and learnt data distribution, and only the first one is captured by convergence.

【17】Time-Dependent PDE-Constrained Optimization via Weak-Form Latent Dynamics
标题：基于弱形式潜在动力学的时变偏微分方程约束优化
链接：https://arxiv.org/abs/2605.20639

作者：April Tran,Terry Haut,David Bortz,Youngsoo Choi
摘要：Optimization problems constrained by high-dimensional, time-dependent partial differential equations require repeated forward and sensitivity solves, making high-fidelity optimization computationally prohibitive in many-query design and control settings. We present a weak-form latent-space reduced-order modeling framework for accelerating gradient-based PDE-constrained optimization. The proposed approach builds on Weak-form Latent Space Dynamics Identification (WLaSDI), which compresses high-dimensional solution trajectories into a low-dimensional latent representation and identifies parametric latent dynamics using weak-form system identification. By avoiding explicit numerical differentiation of training trajectories, the weak-form improves robustness to noisy data and yields more reliable surrogate dynamics for optimization. We formulate the resulting reduced PDE-constrained optimization problem and derive both direct-sensitivity and adjoint-based gradient expressions for the learned latent dynamics, enabling scalable gradient evaluation with respect to design parameters. The framework is demonstrated on three time-dependent benchmark problems: thermal radiative transfer for optimal hohlraum design, the two-stream instability Vlasov-Poisson system, and the inviscid Burgers equation. Across these examples, WLaSDI produces accurate optimal designs, remains robust under noisy training data, and delivers substantial computational savings, including speedups of up to five orders of magnitude relative to full-order optimization. These results demonstrate that weak-form latent dynamics provide an efficient and noise-robust surrogate foundation for gradient-based optimization of complex time-dependent PDE systems.
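The weak-form idea behind WLaSDI can be shown on a one-dimensional toy. To identify $a$ in $\dot z = a z$ without differentiating sampled data, multiply by a test function $\phi$ that vanishes at both ends of a window and integrate by parts, which gives $a\int\phi z\,dt = -\int\phi' z\,dt$. Collecting these equations over several windows yields a least-squares estimate of $a$. This is only the identification step for a scalar linear ODE; WLaSDI's latent compression, parametric latent dynamics, and sensitivity/adjoint gradients are not shown, and the polynomial test function and the name `weak_form_rate` are our choices.

```python
import math
import random


def trapz(y, h):
    """Trapezoid rule on a uniform grid with spacing h."""
    return h * (sum(y) - 0.5 * (y[0] + y[-1]))


def weak_form_rate(t, z, n_windows=4):
    """Estimate a in dz/dt = a*z without numerically differentiating z.

    Test functions phi(s) = (s - t0)^2 (t1 - s)^2 vanish at t0 and t1, so
    integration by parts gives  a * <phi, z> = -<phi', z>  on each window.
    """
    h = t[1] - t[0]
    width = (len(t) - 1) // n_windows
    num = den = 0.0
    for w in range(n_windows):
        i0, i1 = w * width, (w + 1) * width
        t0, t1 = t[i0], t[i1]
        ts, zs = t[i0:i1 + 1], z[i0:i1 + 1]
        phi = [(s - t0) ** 2 * (t1 - s) ** 2 for s in ts]
        dphi = [2 * (s - t0) * (t1 - s) ** 2 - 2 * (s - t0) ** 2 * (t1 - s) for s in ts]
        g = trapz([p * v for p, v in zip(phi, zs)], h)
        b = -trapz([d * v for d, v in zip(dphi, zs)], h)
        num += g * b
        den += g * g
    return num / den  # least-squares solution of a * g_w = b_w over all windows


t = [0.01 * i for i in range(401)]
z = [math.exp(-0.5 * s) for s in t]
a_hat = weak_form_rate(t, z)  # close to -0.5
random.seed(0)
z_noisy = [v + random.gauss(0.0, 0.01) for v in z]
a_noisy = weak_form_rate(t, z_noisy)
```

Because noise enters only through integrals against smooth test functions, it is averaged out rather than amplified, which is the robustness property the abstract attributes to the weak form.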

【18】The Economics of Model Collapse: Equilibrium, Welfare, and Optimal Provenance Subsidies in Synthetic Data Markets
标题：模型崩溃的经济学：合成数据市场中的均衡、福利和最佳出处补贴
链接：https://arxiv.org/abs/2605.20279

作者：Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov
备注：7 pages, 5 tables, 1 algorithm; IEEEtran conference format; submitted to IEEE BigData 2026
摘要：Generative artificial intelligence is rapidly transforming the supply side of training data: an increasing share of new tokens, images, and structured records is produced by previous-generation models rather than by human originators. Recursive training on such synthetic content induces a measurable and often irreversible loss of distributional fidelity, a phenomenon known as model collapse. We develop the first unified microeconomic theory of synthetic data markets under model collapse. We introduce the Synthetic Data Contamination Equilibrium (SDCE), prove existence and generic uniqueness, derive a welfare decomposition $W = W_{\mathrm{prod}} + W_{\mathrm{cons}} - L_{\mathrm{coll}} - L_{\mathrm{info}}$, establish a Wasserstein-gradient-flow mean-field collapse limit, prove an impossibility result for information-constrained implementation, and obtain closed-form expressions for the welfare-maximizing provenance subsidy $s^* = \mathrm{KL}(q\|p)/(2\kappa)$ and the welfare-maximizing watermark strength $w^* = (1-\psi)\,\mathrm{KL}(q\|p)/(2\kappa\psi)$. We prove an information-theoretic Cramér-Rao lower bound on any provenance estimator that uses only producer-side observations and show that the Provenance-Market Iterative Retraining (PMIR) algorithm attains this bound up to constants while converging to an $\epsilon$-SDCE in $O(\epsilon^{-2}\log T)$ iterations. A reduced-form OLS estimation on a C4-synthetic benchmark over ten retraining generations yields a collapse-rate coefficient $\hat{b} = 0.181$ (HAC s.e. 0.024), within one standard error of the structural prediction 0.183. Calibrated experiments raise generation-ten model quality by 23.1 percent over the unregulated benchmark while lowering the 2-Wasserstein drift on a held-out diversity probe from 0.318 to 0.142. Scaling experiments over generations $t \in \{1,\dots,10\}$ recover a logarithmic-in-$t$ collapse law $\log Q_t = \log Q_0 - 0.183\, t \rho^2$ with $R^2 = 0.962$.
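The two closed-form policies above are direct to evaluate once $\mathrm{KL}(q\|p)$ is known. The abstract does not define $q$, $p$, $\kappa$, or $\psi$ beyond their roles in these formulas, so the sketch below simply treats $q$ and $p$ as discrete distributions on a shared support and $\kappa$, $\psi$ as given positive scalars (our assumption). It also makes visible that $w^* = s^*\,(1-\psi)/\psi$.

```python
import math


def kl_divergence(q, p):
    """KL(q || p) for discrete distributions on a shared support (q_i = 0 terms contribute 0)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def optimal_subsidy(q, p, kappa):
    """s* = KL(q || p) / (2 kappa), as stated in the abstract."""
    return kl_divergence(q, p) / (2 * kappa)


def optimal_watermark_strength(q, p, kappa, psi):
    """w* = (1 - psi) KL(q || p) / (2 kappa psi), as stated in the abstract."""
    return (1 - psi) * kl_divergence(q, p) / (2 * kappa * psi)


q, p = [0.5, 0.5], [0.25, 0.75]
s_star = optimal_subsidy(q, p, kappa=2.0)
w_star = optimal_watermark_strength(q, p, kappa=2.0, psi=0.25)
```

Both quantities vanish when the two distributions coincide and grow linearly with their divergence.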

【19】Multi-Head Attention as Ensemble Nadaraya-Watson Estimation: Variance Reduction, Decorrelation, and Optimal Head Diversity
标题：作为集成Nadaraya-Watson估计的多头注意力：方差缩减、去相关与最优头多样性
链接：https://arxiv.org/abs/2605.20271

作者：Ernest Fokoué
备注：14 pages
摘要：We develop a rigorous statistical theory of multi-head attention (MHA) as an ensemble of Nadaraya-Watson (NW) kernel regression estimators. Building on the algebraic identity between single-head softmax attention and the NW estimator, we prove that MHA is a structured ensemble of H NW estimators, each operating in a distinct learned projection subspace of the key space. We derive an explicit Bias-Variance-Covariance decomposition of the MHA mean squared error, showing that variance reduction depends not merely on the number of heads H but fundamentally on the decorrelation of head outputs. Decorrelation is governed by the principal angles between learned projection subspaces: orthogonal projections yield maximum variance reduction; aligned projections yield none. We introduce the Head Diversity Index (HDI), a computable spectral measure of inter-head decorrelation, and prove that MHA mean squared error is monotonically decreasing in HDI. This provides the first rigorous theoretical explanation for the empirically observed specialization of attention heads. Under a fixed total-dimension budget $D = H d_k$, we solve the optimal head-dimension allocation problem, deriving the MSE-minimizing pair $(H^*, d_k^*)$ from the data distribution and regression smoothness. The solution yields a new architectural scaling law: the optimal per-head dimension grows logarithmically with training set size, while the optimal number of heads grows nearly linearly with the total budget D. Our framework unifies three strands of prior work: the NW theory of single-head attention, the general weighting theory for ensemble learning, and the decorrelation-variance-reduction isomorphism between biological and computational ensembles. Multi-head attention is the Transformer's instantiation of a universal principle: identical agents plus diversity-enforcing mechanisms yield emergent optimality.
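The single-head identity can be checked numerically. Softmax attention with scalar values is an NW estimator with kernel $\exp(q\cdot k_i/\tau)$, and when all keys share the same norm its weights coincide with Gaussian-kernel NW weights $\exp(-\|q-k_i\|^2/(2\tau))$, because $-\|q-k\|^2/2 = q\cdot k - \|q\|^2/2 - \|k\|^2/2$ and terms that do not vary with $i$ cancel under normalization. The multi-head readout below, an unweighted average of per-head estimates on projected queries and keys, is a simplifying assumption standing in for the learned output projection; the names are illustrative.

```python
import math


def softmax_attention(q, keys, values, tau=1.0):
    """Single-head attention with scalar values: an NW estimate with kernel exp(q.k / tau)."""
    scores = [sum(a * b for a, b in zip(q, k)) / tau for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def gaussian_nw(x, xs, ys, tau=1.0):
    """Nadaraya-Watson regression with a Gaussian kernel of variance tau."""
    weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2 * tau)) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def project(P, v):
    """Matrix-vector product with P given as a list of rows."""
    return [sum(pij * vj for pij, vj in zip(row, v)) for row in P]


def multi_head(q, keys, values, projections, tau=1.0):
    """Ensemble of per-head NW estimates; plain averaging stands in for the output projection."""
    heads = [
        softmax_attention(project(P, q), [project(P, k) for k in keys], values, tau)
        for P in projections
    ]
    return sum(heads) / len(heads)


keys = [[math.cos(th), math.sin(th)] for th in (0.0, 1.0, 2.0, 3.0)]  # unit-norm keys
values = [1.0, 2.0, 3.0, 4.0]
query = [0.3, 0.8]
```

Heads with identical projections produce identical estimates, so averaging them gives no variance reduction; this is the degenerate "aligned projections" case in the abstract.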

【20】Quantum End-to-End Learning for Contextual Combinatorial Optimization
标题：用于上下文组合优化的量子端到端学习
链接：https://arxiv.org/abs/2605.20222

作者：Jaehwan Lee,Changhyun Kwon
备注：23 pages, 2 figures, preprint
摘要：Contextual combinatorial optimization (CCO) plays a critical role in decision-making under uncertainty, yet remains a significant challenge. We present Quantum End-to-End Learning (QEL), the first quantum computing-based end-to-end learning framework for CCO that leverages Quantum Approximate Optimization Algorithms. Inspired by the integration of state preparation and evolution in data re-uploading, we propose a context re-uploading phase-separator that jointly captures the complex relations among contexts, uncertain coefficients, and optimal solutions. This allows a contextual encoder to be seamlessly integrated within a quantum surrogate policy, enabling joint end-to-end training with a stationarity guarantee. Exploiting an optimization-aware structure grounded in physical principles that classical methods cannot readily leverage, our approach demonstrates practicality by directly training on task loss despite the discreteness and nonconvexity, while avoiding calls to NP-hard optimization solvers. QEL empirically achieves competitive performance while requiring substantially fewer parameters than classical benchmarks, highlighting its industrial-level potential for the future quantum era.

预测|估计(12篇)

【1】Reviving Error Correction in Modern Deep Time-Series Forecasting
标题：重振现代深度时间序列预测中的误差修正
链接：https://arxiv.org/abs/2605.21088

作者：Minh Hoang Nguyen,Dai Do,Huu Hiep Nguyen,Dung Nguyen,Kien Do,Hung Le
备注：27 pages
摘要：Modern deep-learning models have achieved remarkable success in time-series forecasting. Yet, their performance degrades in long-term prediction due to error accumulation in autoregressive inference, where predictions are recursively used as inputs. While classical error correction mechanisms (ECMs) have long been used in statistical methods, their applicability to deep learning models remains limited or ineffective. In this work, we revisit the error accumulation problem in deep time-series forecasting and investigate the role and necessity of ECMs in this new context. We propose a simple, architecture-agnostic error correction model that can be integrated with any existing forecaster without requiring retraining. By explicitly decomposing predictions into trend and seasonal components and training the corrector to adjust each separately, we introduce the Universal Error Corrector with Seasonal-Trend Decomposition (UEC-STD), which significantly improves correction accuracy and robustness across 4 backbones and 10 datasets. Our findings provide a practical tool for enhancing forecasts while offering new insights into mitigating autoregressive errors in deep time-series models. Code is available at https://github.com/DA2I2-SLM/UEC-STD.
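The seasonal-trend split can be illustrated with a classical moving-average decomposition. This is an illustration only, not the authors' UEC-STD code. The window size `k` and the edge-replication padding are assumptions. The `correct` helper is hypothetical: it applies separate per-component fixes and recombines the result, and it stands in for the learned corrector, which is not reproduced here.

```python
def moving_average(x, k):
    """Centered moving average with edge replication; k must be odd."""
    if k % 2 == 0:
        raise ValueError("k must be odd")
    half = k // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [sum(padded[i:i + k]) / k for i in range(len(x))]


def seasonal_trend_decompose(x, k=5):
    """Additive split: trend = moving average, seasonal = remainder."""
    trend = moving_average(x, k)
    seasonal = [xi - ti for xi, ti in zip(x, trend)]
    return trend, seasonal


def correct(forecast, trend_fix, seasonal_fix, k=5):
    """Apply separate (hypothetical) corrections per component, then recombine."""
    trend, seasonal = seasonal_trend_decompose(forecast, k)
    return [trend_fix(t) + seasonal_fix(s) for t, s in zip(trend, seasonal)]
```

For example, `correct(forecast, lambda t: t - 0.5, lambda s: 0.9 * s)` removes a constant trend bias and damps the seasonal swing independently. With identity functions it returns the original forecast.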

【2】Hybrid Machine Learning Model for Forest Height Estimation from TanDEM-X and Landsat Data
标题：利用TanDEM-X和Landsat数据估计森林高度的混合机器学习模型
链接：https://arxiv.org/abs/2605.20997

作者：Islam Mansour,Ronny Haensch,Irena Hajnsek,Konstantinos Papathanassiou
摘要：Integrating machine learning (ML) with physical models (PM) has emerged as a promising way of retrieving geophysical parameters from remote sensing data. In this context, an ML model for estimating forest height from TanDEM-X interferometric coherence measurements, which constrains the learning process through a PM, has recently been proposed. Although the features used for training and inversion were selected to ensure the physical consistency of the solutions, they could not resolve all height/structure and baseline/terrain-slope ambiguities in the data. To address this, the feature space is extended with optical Landsat data, which can provide complementary information on forest type or structure. The extended model is applied to and validated on several TanDEM-X acquisitions over the Lopé National Park site in Gabon and assessed against airborne LiDAR measurements. Results show a 13.5% reduction in RMSE and a 16.6% reduction in MAE compared to the original hybrid model, confirming the added value of multispectral inputs.

【3】Dynamic TMoE: A Drift-Aware Dynamic Mixture of Experts Framework for Non-Stationary Time Series Forecasting
标题：动态TMoE：用于非平稳时间序列预测的具有漂移意识的动态混合专家框架
链接：https://arxiv.org/abs/2605.20678

作者：Jiawen Zhu,Shuhan Liu,Di Weng,Yingcai Wu
备注：27 pages, 7 figures. Accepted to ICML 2026
摘要：Non-stationary time series forecasting is challenged by evolving distribution shifts that static models struggle to capture. While Mixture-of-Experts (MoE) architectures offer a promising paradigm for decoupling complex drift patterns, existing approaches are limited by fixed expert pools and memoryless routing, hampering their ability to adapt to abrupt regime shifts. To address this, we propose Dynamic TMoE, a framework that unifies architectural evolution with temporal continuity during the learning phase. By detecting distribution shifts via Maximum Mean Discrepancy (MMD), we dynamically instantiate heterogeneous experts and prune redundant ones to optimize capacity. Additionally, a temporal memory router leverages recurrent states and an anomaly repository to ensure stable, context-aware expert selection without requiring test-time updates. Experiments on nine benchmarks demonstrate state-of-the-art performance, reducing MSE by 10.4% and MAE by 7.8%. Code is available at https://github.com/andone-07/Dynamic-TMoE.
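The shift-detection step relies on MMD, which can be sketched as the biased (V-statistic) estimate of squared MMD with a Gaussian (RBF) kernel between a reference window and a recent window. The kernel bandwidth `gamma`, the fixed `threshold`, and the `drift_detected` helper are assumptions. The paper's actual detection rule and its expert instantiation and pruning logic are not reproduced.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2_biased(X, Y, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples X and Y."""
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / len(X) ** 2
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2.0 * kxy


def drift_detected(reference, recent, threshold, gamma=1.0):
    """Hypothetical trigger: flag a shift when MMD^2 exceeds a fixed threshold."""
    return mmd2_biased(reference, recent, gamma) > threshold
```

In practice the threshold would typically be calibrated, for instance by a permutation test, rather than fixed by hand.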

【4】Online Conformal Prediction with Corrupted Feedback
标题：反馈受损的在线保形预测
链接：https://arxiv.org/abs/2605.20515

作者：Bowen Wang,Matteo Zecchin,Osvaldo Simeone
摘要：Modern artificial intelligence systems require calibrated uncertainty estimates that remain reliable in sequential and non-stationary environments. Online conformal prediction (OCP) addresses this challenge through adaptively updated prediction sets that provide deterministic long-run miscoverage guarantees. These guarantees, however, hinge on the assumption of perfect feedback about the coverage of past prediction sets. In practice, the observed miscoverage indicator may be corrupted by noise, communication failures, or adversarial manipulation, which can severely degrade OCP's calibration guarantees. In this paper, we study OCP under corrupted feedback. We first model feedback corruption as an arbitrary binary flip sequence, and analyze how feedback corruption affects and degrades the miscoverage performance of standard OCP. We then propose two robust schemes: robust OCP via filtering, which leverages the structural properties of the predicted threshold to filter corrupted feedback, and robust OCP via active compensation, which incorporates an active compensation mechanism to mitigate the effect of corrupted feedback. For both methods, we establish explicit miscoverage guarantees, which are further specialized for an independent stochastic flip model and for an arbitrary error model with memory bounds. Experiments on real-world datasets validate the proposed approach, showing markedly improved calibration and significantly smaller prediction sets compared with baseline OCP methods under corrupted feedback.
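The effect of corrupted feedback shows up in the standard online conformal threshold update, sketched below in the style of adaptive conformal inference with feedback flipped independently at random. The update drives the *observed* miscoverage rate to alpha, so flipped labels push the *true* rate elsewhere. The scores, step size `eta`, flip probability, and function name are illustrative. The paper's filtering and active-compensation schemes are not shown.

```python
import random


def online_conformal(scores, alpha=0.1, eta=0.05, flip_prob=0.0, seed=0):
    """Online threshold update q <- q + eta * (err_obs - alpha).

    The prediction set at step t is {y : score(y) <= q}. Each miscoverage
    indicator is flipped with probability flip_prob before the learner sees
    it. Returns the true long-run miscoverage rate.
    """
    rng = random.Random(seed)
    q, true_errs = 0.0, 0
    for s in scores:
        err = 1 if s > q else 0          # true miscoverage indicator
        true_errs += err
        if rng.random() < flip_prob:     # corrupted feedback
            err = 1 - err
        q += eta * (err - alpha)
    return true_errs / len(scores)
```

Under i.i.d. flips with probability p, the observed rate p + (1 - 2p)·m is driven to alpha. The true rate m therefore settles roughly near (alpha - p)/(1 - 2p), which is about 0.056 for alpha = 0.1 and p = 0.05, so the prediction sets end up larger than necessary.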

【5】Closed-form predictive coding via hierarchical Gaussian filters
标题：基于分层高斯滤波器的闭式预测编码
链接：https://arxiv.org/abs/2605.20293

作者：Aleksandrs Baskakovs,Sylvain Estebe,Kenneth Enevoldsen,Kristoffer Nielbo,Chris Mathys,Nicolas Legrand
摘要：Predictive coding (PC) offers a local and biologically grounded alternative to backpropagation in the training of artificial neural networks, yet to date, it remains slower, and performance degrades sharply as network depth increases. We trace both problems to a single simplification: current PC networks fix the precision matrix to the identity, discarding precision-weighted prediction errors that the variational derivation requires to be fast, local, and Bayesian. We close this gap by expressing predictive coding networks as deep hierarchical Gaussian filters (HGFs) and restore precision-weighted message passing, yielding dynamic uncertainty estimates and Hebbian-compatible update rules at every layer. The resulting networks can simultaneously learn activations, weights, and precisions under a single free-energy objective, with no global error signal, and resolve inference without requiring iterations or automatic differentiation. On FashionMNIST, our solution approaches backpropagation in epoch-level wall-clock cost while converging in fewer epochs, and outperforms it on online-learning, data-efficiency, and concept-drift tasks. We thus establish that closed-form variational inference with online precision learning provides a tractable foundation for deep predictive coding networks, retaining biological and interpretative advantages, without requiring iterative relaxation or global error signals.
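The abstract restores the precision-weighted prediction error. In the one-dimensional Gaussian case this reduces to the conjugate update below. The belief moves toward the observation by the prediction error, scaled by the ratio of observation precision to posterior precision. This is a minimal sketch under a single-node Gaussian assumption. The coupling between HGF levels and the network weights is not shown, and `precision_weighted_update` is a hypothetical name.

```python
def precision_weighted_update(mu_prior, pi_prior, u, pi_u):
    """One Gaussian belief update driven by a precision-weighted prediction error.

    mu_prior, pi_prior: prior mean and precision (inverse variance)
    u, pi_u:            observation and its precision
    Returns (posterior mean, posterior precision).
    """
    pi_post = pi_prior + pi_u          # precisions add for Gaussians
    prediction_error = u - mu_prior
    gain = pi_u / pi_post              # weight given to the prediction error
    return mu_prior + gain * prediction_error, pi_post
```

With both precisions fixed to 1, which is the identity-precision simplification the abstract criticizes, the gain is always 0.5 regardless of how reliable the observation is. A more precise observation (`pi_u = 3`) instead receives a gain of 0.75.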

【6】FusionCell: Cross-Attentive Fusion of Layout Geometry and Netlist Topology for Standard-Cell Performance Prediction
标题：FusionCell：面向标准单元性能预测的版图几何与网表拓扑交叉注意力融合
链接：https://arxiv.org/abs/2605.20287

作者：Haoyi Zhang,Kairong Guo,Bojie Zhang,Yibo Lin,Runsheng Wang
摘要：Standard cells form the building blocks of digital circuits, so their delay and power critically influence chip-level performance; yet characterization still relies on slow simulation sweeps, and many fast predictors ignore layout geometry, missing coupling and layout-dependent effects. The challenge is to jointly represent layout geometry and netlist topology so models capture fine-grained spatial details together with structural connectivity for accurate performance prediction. We introduce FusionCell, a dual-modality predictor that treats routed layout geometry and netlist topology as inputs and fuses them explicitly in a unified model. A DeiT encoder processes three-layer routed layouts, while a graph transformer models heterogeneous device/net graphs. The modalities are integrated through a topology-guided mechanism, where the netlist acts as a structural "map" to actively query relevant physical regions in the layout for joint geometric and topological reasoning. We build a 7nm dataset based on the ASAP7 PDK with over 19.5k cells spanning 149 types using automatic tools, targeting six metrics: signal rise/fall delay, transition, and power. Experimental results demonstrate that FusionCell reduces regression error, with an average MAPE of 0.92 percent, and improves Spearman/Kendall ranking over baselines, while accelerating the characterization process by orders of magnitude compared to circuit simulation.

【7】Instance Discrimination for Link Prediction
标题：链接预测的实例辨别
链接：https://arxiv.org/abs/2605.20257

作者：Valentin Cuzin-Rambaud,Mathieu Lefort,Rémy Cazabet
摘要：Recently, instance discrimination models have emerged as a major solution for self-supervised learning. Having already demonstrated its effectiveness in the image domain, instance discrimination learning is now proving equally convincing in the graph domain, in particular for node classification. However, fewer contributions have tackled the link prediction task. In this contribution, we propose to adapt existing methods to this context. We first provide a rigorous evaluation of existing self-supervised models for link prediction, showing that performance depends mainly on the augmentation process (as in computer vision). We then propose a new structural augmentation, based on community structure, that is relevant for link prediction. Our main contribution introduces two new models, L-GRACE and L-BGRL, based on link representations instead of node representations. These models improve on existing methods, especially on unattributed graphs, and we show that they perform on par with the state of the art in both supervised and self-supervised contexts.

【8】Data Scaling as Progressive Coverage of a Predictive Contribution Spectrum
标题：数据缩放作为预测贡献谱的渐进覆盖
链接：https://arxiv.org/abs/2605.20196

作者：Zihui Song,Shihao Ji,Hongxi Li,Shuaizhi Cheng,Chunlin Huang
备注：8 pages, 6 figures
摘要：We investigate the hypothesis that real-data scaling laws are governed by progressive coverage of a latent predictive contribution spectrum rather than by token-frequency tails alone. We work with a suffix-automaton representation of text corpora and define a data-intrinsic global-KL predictive contribution spectrum, in which each state contributes according to its empirical mass times its KL deviation from a global next-token baseline. Across 12 real corpora, the tail slope of this spectrum is already strongly correlated with the empirical data-scaling exponent of a fixed small GPT learner. We then go beyond slope correlation and define, for each training size N, an effective truncation rank K(N) by matching the observed excess loss to the residual tail mass of the prepared 1000k global-KL spectrum. Empirically, log K is close to linear in log N, with pooled R^2 about 0.96 for the raw spectrum and R^2 about 0.90 for the smoothed spectrum. These findings provide strong empirical support for a simple mechanism picture: training scale advances an effective frontier through a predictive state spectrum, and the residual tail mass of that spectrum tracks the remaining excess loss.
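The two quantitative steps in the abstract can be sketched with the standard library. The first is an ordinary least-squares fit of log K against log N with its R^2. The second is an effective truncation rank: the smallest K whose residual tail mass does not exceed a given excess loss. Both functions are hypothetical helpers, not the authors' code, and the inputs in the test are synthetic rather than the paper's spectra.

```python
import math


def loglog_fit(ns, ks):
    """OLS fit of log K = c + s * log N; returns (slope, intercept, R^2)."""
    x = [math.log(n) for n in ns]
    y = [math.log(k) for k in ks]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def effective_rank(spectrum, excess_loss):
    """Smallest K whose residual tail mass sum(sorted spectrum[K:]) <= excess_loss."""
    contrib = sorted(spectrum, reverse=True)
    tail = sum(contrib)
    for k, c in enumerate(contrib):
        if tail <= excess_loss:
            return k
        tail -= c
    return len(contrib)
```

Computing `effective_rank` for each training size N and passing the resulting (N, K) pairs to `loglog_fit` mirrors the abstract's check that log K is close to linear in log N.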

【9】Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models
标题：掩蔽离散序列模型中成对互信息的神经估计
链接：https://arxiv.org/abs/2605.20187

作者：Jai Sharma,Yifan Wang,Bryan Li
备注：6 pages, 3 figures; submitting to ICML 2026
摘要：Understanding dependencies between variables is critical for interpretability and efficient generation in masked diffusion models (MDMs), yet these models primarily expose marginal conditional distributions and do not explicitly represent inter-variable dependence. We propose a neural framework for estimating pairwise conditional mutual information (MI) directly from the hidden states of a pretrained MDM, using ground-truth MI computed from the model's own conditional distributions for supervision. The resulting estimator captures the model's internal belief about dependency structure and predicts the full MI matrix in a single forward pass, enabling MI-guided parallel decoding by identifying conditionally independent subsets of variables. We evaluate our approach on Sudoku and protein sequence generation with ESM-C, where the MI maps recover known structural constraints and enable a 3-5x reduction in inference-time forward passes compared to sequential decoding, while preserving generative quality and outperforming entropy-based parallelization methods.
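The sketch below computes pairwise MI for two discrete positions with a known joint distribution and then greedily selects mutually low-MI positions. A plain probability table stands in for the model's own conditionals, which the paper uses as supervision. The threshold `tau` and the greedy rule are hypothetical and are not the paper's decoding procedure.

```python
import math


def mutual_information(joint):
    """MI (in nats) of two discrete variables from a joint probability table."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi


def greedy_parallel_set(mi, tau):
    """Greedily pick positions whose MI with every already-picked one is below tau."""
    chosen = []
    for i in range(len(mi)):
        if all(mi[i][j] < tau for j in chosen):
            chosen.append(i)
    return chosen
```

Positions returned together by `greedy_parallel_set` are treated as approximately conditionally independent, so they could be unmasked in the same decoding step.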

【10】Neural Negative Binomial Regression for Weekly Seismicity Forecasting: Per-Cell Dispersion Estimation and Tail Risk Assessment
标题：每周地震活动预测的神经负二项回归：每单元离散度估计和尾部风险评估
链接：https://arxiv.org/abs/2605.21437

作者：Alim Igilik
备注：28 pages, 9 figures. Source code available at https://github.com/Al1mkaYandere/seismic-probabilistic-modeling
摘要：Standard approaches to forecasting the weekly number of earthquakes on a spatial grid rely on the Poisson distribution with a single global dispersion assumption. We show that this assumption is systematically violated in seismic data from Central Asia (2010-2024), where a likelihood-ratio test with boundary correction strongly rejects the Poisson hypothesis (p < 10^{-179}). The main contribution of this work is the EarthquakeNet architecture, which provides an endogenous per-cell estimate of the overdispersion parameter alpha via a neural network (spatial embeddings + MLP), without explicit spatial covariance specification. In contrast to existing negative binomial regression approaches in seismological forecasting, which typically assume a single global alpha, the proposed per-cell formulation allows the model to identify spatial heterogeneity in seismic clustering and to construct probabilistic risk-aware alerts via quantiles of the predicted distribution. A walk-forward evaluation (2018-2023) over four systems shows an 8.6 percent reduction in mean pinball deviation (MPD) relative to a negative binomial GLM baseline. The strongest improvements are observed in the tail regime (Y >= 5), where the continuous ranked probability score (CRPS) of the proposed model is 12.5 percent lower than that of the baseline, indicating improved calibration in extreme-event forecasting.
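The probabilistic pieces in the abstract can be sketched with the standard library:

- a negative binomial log-pmf with mean mu and dispersion alpha;
- a quantile-based alert threshold;
- the pinball loss used in quantile evaluation.

The NB2 convention Var = mu + alpha·mu^2 is an assumption about the paper's parameterisation. In EarthquakeNet, each cell's alpha comes from a network; here alpha is just an argument, so a per-cell model would call these functions with a different (mu, alpha) per cell.

```python
import math


def nb2_logpmf(y, mu, alpha):
    """Log-pmf of a negative binomial with mean mu and Var = mu + alpha * mu**2."""
    r = 1.0 / alpha                                    # NB "size" parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def nb2_quantile(tau, mu, alpha, max_y=10_000):
    """Smallest count y whose CDF reaches tau (e.g. a risk-aware alert level)."""
    cdf = 0.0
    for y in range(max_y + 1):
        cdf += math.exp(nb2_logpmf(y, mu, alpha))
        if cdf >= tau:
            return y
    return max_y


def pinball(y, q, tau):
    """Pinball (quantile) loss of predicting quantile q at level tau for outcome y."""
    return tau * (y - q) if y >= q else (1.0 - tau) * (q - y)
```

As alpha approaches 0 the distribution approaches a Poisson with mean mu. A larger alpha for a cell widens its upper quantiles, and with them any alert threshold derived from `nb2_quantile`.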

【11】Semiparametric Efficient Bilevel Gradient Estimation
标题：半参数高效双层梯度估计
链接：https://arxiv.org/abs/2605.21341

作者：Fares El Khoury,Houssam Zenati,Nathan Kallus,Michael Arbel,Aurélien Bibaut
摘要：Functional bilevel methods estimate a lower-level function and plug it into a hypergradient, but this plug-in gradient can retain first-order bias when the lower-level problem is learned nonparametrically. To remove this bias, we develop a semiparametric debiasing theory for population bilevel gradients based on the efficient influence function. This perspective leads to a cross-fitted orthogonal hypergradient estimator for which we establish asymptotic normality together with uniform control over the outer parameter. Under quadratic losses, the estimator reduces to a simple doubly robust score based on conditional mean nuisances. On synthetic bilevel benchmarks with known ground truth, the method tracks the oracle efficient-gradient benchmark and improves over plug-in functional hypergradients and regularized kernel bilevel baselines.

【12】Group-Aware Matrix Estimation and Latent Subspace Recovery
标题：群感知矩阵估计与潜在子空间恢复
链接：https://arxiv.org/abs/2605.20559

作者：Hamza Golubovic,Matthew Shen,Genevera I. Allen,Tarek M. Zikry
备注：12 pages, 6 main figures, 1 main algorithm
摘要：Modern matrix completion problems often involve heterogeneous data whose rows simultaneously belong to many meta-categories, such as demographic and age groups in recommendation systems, or region and recording session labels in neural electrophysiological experiments. Standard low-rank estimators impose a single global latent geometry, which can recover average structure but may smooth away subgroup-specific variation, especially when observations are unevenly distributed across groups. We introduce Group-Aware Matrix Estimation (GAME), a convex estimator for overlapping subgroup-wise low-rank matrix estimation. GAME regularizes category-specific submatrices through overlapping nuclear-norm penalties, allowing related groups to borrow information while preserving local latent structure in a shared coordinate system. We provide finite-sample guarantees for both reconstruction error and subgroup-specific subspace recovery, showing how performance depends on sampling density, subgroup rank, and overlap structure. Experiments on synthetic, recommendation, ecological, and neuroscience datasets show that GAME is most beneficial in structured missingness regimes, where subgroup-aware regularization improves both reconstruction accuracy and latent subspace fidelity. Across these benchmarks, GAME is competitive or best among global low-rank, side-information, and modern imputation baselines, with the largest gains when subgroups exhibit distinct low-rank structure.

其他神经网络|深度学习|模型|建模(38篇)

【1】A Machine Learning Framework for Weighted Least Squares GNSS Positioning based on Activation Functions
标题：基于激活函数的加权最小二乘GNSS定位机器学习框架
链接：https://arxiv.org/abs/2605.21461

作者：Pin-Hsun Lee,Harry Leib
摘要：Global Navigation Satellite Systems (GNSS) are widely used to provide position, velocity, and timing (PVT) information for various applications, including transportation, location-based communication services, and intelligent agriculture. In urban canyons, high-rise buildings and narrow streets can cause signal obstruction, non-line-of-sight (NLOS) reception, and multipath effects that introduce errors in GNSS pseudorange measurements. Although multi-constellation GNSS effectively increases the number of available satellites, including degraded signals can lead to severe positioning errors. This study proposes a machine learning framework for the weighted least squares (WLS) algorithm that incorporates activation functions to enhance positioning accuracy. Several signal quality indicators are employed as training features for ensemble learning algorithms, which identify poor-quality signals by assigning quality scores. Activation functions then transform the predicted scores into appropriate weights for WLS positioning. To evaluate the performance of our approach, experiments are conducted using real-world datasets from urban areas of Hong Kong and Tokyo. Comparative analysis of activation functions reveals that sigmoid functions consistently yield the greatest improvements across different machine learning algorithms and GNSS constellation configurations. The proposed algorithm demonstrates substantial reductions in positioning errors for both single- and multi-constellation scenarios. Furthermore, our results indicate that the proposed algorithm exhibits strong geographical transferability: it maintains a comparable level of performance when trained on data from other regions with similar levels of urbanization.
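The score-to-weight idea can be sketched as follows: a sigmoid maps per-measurement quality scores to weights, and a weighted least squares fit then downweights poor measurements. Real GNSS WLS is nonlinear in the receiver position and is usually solved iteratively by linearising the pseudoranges. As a simplified stand-in, the sketch below fits a straight line. The sigmoid steepness `k`, the midpoint `c`, and the example scores are hypothetical values, not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def scores_to_weights(scores, k=10.0, c=0.5):
    """Map quality scores in [0, 1] to WLS weights via a steep sigmoid."""
    return [sigmoid(k * (s - c)) for s in scores]


def wls_line(xs, ys, ws):
    """Weighted least squares fit of y = a + b*x via the 2x2 normal equations."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * sxx - sx * sx
    a = (sxx * sy - sx * sxy) / det
    b = (sw * sxy - sx * sy) / det
    return a, b
```

A measurement with a low quality score (for example 0.05, a likely NLOS signal) receives a weight near 0.01. Its influence on the fit almost vanishes, whereas unweighted least squares would let it dominate.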

【2】Approximation Theory for Neural Networks: Old and New
标题：神经网络逼近理论：经典与新进展
链接：https://arxiv.org/abs/2605.21451

作者：Soumendu Sundar Mukherjee,Himasish Talukdar
备注：31 pages, 4 figures
摘要：Universal approximation theorems provide a mathematical explanation for the expressive power of neural networks. They assert that, under mild conditions on the activation function, feedforward neural networks are dense in broad function classes, such as continuous functions on compact subsets of $\mathbb{R}^d$, $L^p$ spaces, or Sobolev spaces. Over the past four decades, these qualitative universality results have evolved into a rich quantitative theory addressing approximation rates, parameter efficiency, and the role of architectural features such as depth and width. This survey presents several glimpses into this theory. We review classical density results for single-hidden-layer networks, as well as quantitative bounds that relate approximation error to network size and smoothness assumptions on target functions. Particular emphasis is placed on depth--width trade-offs and on results demonstrating that deeper architectures can achieve superior parameter efficiency for structured function classes. In addition to standard feedforward neural networks, we also review recent developments on Kolmogorov--Arnold Networks (KANs), which offer an alternative architectural paradigm and whose approximation-theoretic properties have begun to attract significant theoretical attention.

【3】Gaussian Sheaf Neural Networks
标题：高斯层（Sheaf）神经网络
链接：https://arxiv.org/abs/2605.21435

作者：André Ribeiro,Ana Luiza Tenório,Tiago da Silva,Diego Mesquita
摘要：Graph Neural Networks (GNNs) have become the de facto standard for learning on relational data. While traditional GNNs' message passing is well suited for vector-valued node features, there are cases in which node features are better represented by probability distributions than real vectors. Concretely, when node features are Gaussians, characterized by a mean and a covariance matrix, naively concatenating their parameters into a single vector and applying standard message passing discards the geometric and algebraic structure that governs means and covariances. We propose Gaussian Sheaf Neural Networks (GSNNs), a principled framework that incorporates these inductive biases into graph-based learning. Building on the theory of cellular sheaves, we derive a new Laplacian operator that generalizes the sheaf Laplacian to this setting and preserves its key properties. We complement our theoretical contributions with experiments on synthetic and real-world data that illustrate the practical relevance of GSNNs.

【4】Towards Resilient and Autonomous Networks: A BlueSky Vision on AI-Native 6G
标题：迈向弹性和自主网络：人工智能原生6G的蓝天愿景
链接：https://arxiv.org/abs/2605.21395

作者：Liang Wu,Kelly Wan,Mayank Darbari,Liangjie Hong
备注：Accepted at KDD 2026
摘要：The proliferation of emerging applications, such as autonomous driving and immersive experiences, demands cellular networks that are not only faster, but fundamentally more resilient and autonomous. This paper presents a BlueSky vision of how Artificial Intelligence will be natively integrated into 6G, shifting the paradigm from "Network for AI" to "AI for Network". We envision that, unlike 5G's reliance on scattered, ad-hoc models each trained for a single task, native AI in the 6G era will be anchored by a foundation model and orchestrated via collaborative multi-agent systems, framing network management as a unified, multi-modal, multi-task optimization problem. Built on this vision, we outline two transformative directions. The first focuses on developing a 6G foundation model as a unified backbone, with task-specific knowledge distilled into compact models suited for diverse edge deployments. The second advances multi-agent systems designed to autonomously diagnose, maintain, and recover networks with minimal human intervention. These directions chart a roadmap for 6G to evolve into an intelligent, self-sustaining communication infrastructure.

【5】On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures
标题：关于PDE诱导测度的一步Wasserstein引导生成模型的正则性与泛化性
链接：https://arxiv.org/abs/2605.21388

作者：Likun Lin,Zhongjian Wang,Jack Xin,Zhiwen Zhang
摘要：Despite the remarkable empirical success of generative models, the available theory on their statistical accuracy in scientific computing remains largely pessimistic. This paper develops a theoretical framework for understanding the regularity of transport maps and the generalization properties of one-step Wasserstein-guided generative models for PDE-induced probability measures. We consider normalized target densities associated with linear elliptic and parabolic equations on bounded domains, as well as diffusion and Fokker--Planck equations on the torus. Under standard structural assumptions, we prove that these target measures satisfy doubling conditions. By combining this fact with regularity theory for optimal transport between doubling measures, we show that the optimal transport map from a uniform source measure to the target measure is Hölder continuous. This regularity yields an approximation-theoretic justification for one-step generative models that learn PDE-induced distributions via a single pushforward map. As a representative instance, we study DeepParticle and derive excess-risk bounds characterizing the discrepancy between the learned map and the population-optimal map. We also establish a robustness estimate under target shift and illustrate the theory with experiments which support the derived rates.

【6】A New Framework to Analyse the Distributional Robustness of Deep Neural Networks
标题：分析深度神经网络分布稳健性的新框架
链接：https://arxiv.org/abs/2605.21313

作者：Divij Khaitan,Subhashis Banerjee
备注：9 pages, 6 figures, 3 tables
摘要：Deep neural networks have achieved impressive performance on a variety of tasks, but their brittleness to distributional shifts remains a significant barrier to real-world deployment. In this paper, we propose a framework to analyse and quantify the distributional robustness of neural networks by studying the interactions between layer weights and activations. We model these interactions using Bernoulli distributions and use the separation between classes as a diagnostic proxy for robustness. We demonstrate the usefulness of this framework through models trained on CIFAR-10 and ImageNet. We show that our proposed metrics can distinguish between networks that have memorised their training data and those that have not. We also perform analogous experiments in the activation space and find that the same properties do not hold. Additionally, we investigate the behaviour of our metrics under various distribution shifts and show that these shifts reduce separation under our path-based diagnostics. Our results suggest that this framework provides useful model-level diagnostics of representation structure and robustness.

【7】A Mechanistic Study of Tabular Foundation Models
标题：表格基础模型的机制研究
链接：https://arxiv.org/abs/2605.21288

作者：Marin Biloš,James T. Wilson,Anderson Schneider,Yuriy Nevmyvaka
摘要：Tabular foundation models with different architectures converge in accuracy across a range of classification and regression tasks. This raises questions a leaderboard cannot answer: (i) whether the models execute the same in-context algorithm, (ii) where row, column, and class-permutation invariances originate, and (iii) how robust they are under perturbations engineered against the inferred mechanism. We characterize all three. The model families realize qualitatively distinct similarity-based readouts: from an attention-weighted vote over context labels to a class-conditional mean readout, each confirmed by causal intervention. We find that the representation collapse highlighted in prior work is not a practical concern for them. Each model's permutation invariances trace to specific positional parameters whose removal preserves accuracy and makes approximate invariance exact. Perturbations engineered against each readout reproduce predicted failure modes; hub and rank attacks isolate them from refit baselines. Together these results give a mechanistic account of contemporary tabular foundation models and identify which inductive biases govern both their accuracy and characteristic failures.

【8】Nonparametric Learning and Earning with One-Point Feedback under Nonstationarity
标题：非平稳环境下基于单点反馈的非参数学习与收益
链接：https://arxiv.org/abs/2605.21263

作者：Xiangyu Yang,Feng Xu,Jian-Qiang Hu,Jiaqiao Hu
摘要：Firms increasingly rely on dynamic pricing to respond to evolving customer demand, yet in many applications they observe only the revenue generated by a single posted price in each period. At the same time, market conditions may shift gradually or abruptly due to changes in customer preferences, competition, or external shocks. These features create two intertwined challenges: learning the revenue--demand relationship from limited feedback and adapting pricing decisions to a changing environment. We study how a seller can learn and earn effectively under these constraints, without assuming a specific parametric form for demand. We develop a learning framework that updates prices using revenue-based gradient approximations constructed from one observation per period. To address environmental changes, we incorporate a restarting mechanism that periodically refreshes the learning process so that outdated information is discounted. When the degree of nonstationarity is unknown, we further introduce a meta-learning layer to adaptively hedge across multiple restarting schedules. We provide performance guarantees for our approach, showing how cumulative revenue loss relative to a fully informed benchmark depends on both the time horizon and the magnitude of market variation. Simulation experiments using synthetic and real-world data illustrate the effectiveness of the proposed procedures.
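One-point feedback can be handled with a classical zeroth-order estimator:

1. Post a perturbed price p + delta·u with a random sign u.
2. Observe only that period's revenue r.
3. Use r·u/delta as the gradient estimate. Its expectation is the symmetric difference quotient (R(p + delta) - R(p - delta))/(2·delta).

The sketch below is minimal. It assumes a known price range, a fixed step size, and an optional fixed restart period. The revenue curve in the test is hypothetical, and the paper's step-size schedule and its meta-learning layer over restart schedules are not reproduced.

```python
import random


def one_point_pricing(revenue, horizon, p0=2.0, lo=1.0, hi=10.0,
                      delta=0.5, eta=0.0005, restart_every=None, seed=0):
    """Projected gradient ascent on price using one revenue observation per period.

    revenue(price, t) returns the (possibly noisy) revenue at the posted price.
    If restart_every is set, the price is reset to p0 at that period, so
    information from before the restart is discarded.
    Returns (final price, cumulative revenue).
    """
    rng = random.Random(seed)
    p, total = p0, 0.0
    for t in range(horizon):
        if restart_every and t > 0 and t % restart_every == 0:
            p = p0                                    # restart the learner
        u = rng.choice((-1.0, 1.0))
        posted = min(hi, max(lo, p + delta * u))      # the single price posted
        r = revenue(posted, t)                        # only this revenue is seen
        total += r
        grad = r * u / delta                          # one-point gradient estimate
        p = min(hi, max(lo, p + eta * grad))          # projected ascent step
    return p, total
```

For the hypothetical revenue curve p·(10 - p), with optimum p = 5, the average revenue per period climbs well above the 16 earned by keeping the starting price of 2. The estimator's high variance is why the step size is kept small.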

【9】On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective
标题：论思维链的成本和收益：学习理论的视角
链接：https://arxiv.org/abs/2605.21260

作者：Yue Zhang,Zhiyi Dong,Tommaso Cesari,Yongyi Mao
摘要：We develop a learning-theoretic framework for understanding Chain of Thought (CoT). We model CoT as the interaction between an answer map and a chain rule that generates intermediate questions autoregressively, and define the reasoning risk of a hypothesis under this interaction. Our first result is a tight canonical decomposition of this risk into two terms with opposing roles: an oracle-trajectory risk (OTR), which captures the benefit of CoT and reduces to a target-domain risk in a domain adaptation problem, and a trajectory-mismatch risk (TMR), which captures the cost of CoT through error accumulation along mismatched reasoning trajectories. We then show that this cost is unavoidable without structure: if any one of the loss, the hypothesis answer map, or the chain rule lacks stability, the TMR can be arbitrarily large even when the OTR is zero and the hypothesis is uniformly close to the ground truth. Conversely, under stability, we prove a tight upper bound on the TMR governed by an exact amplification factor that identifies bounded, linear, and exponential error-growth regimes. Together, these results give a precise theory of when CoT helps, when it hurts, and what controls the transition between the two.
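The bounded, linear, and exponential error-growth regimes can be illustrated with a toy scalar recursion. At each reasoning step, the inherited error is multiplied by a stability constant g and a fresh per-step error epsilon is added, so after T steps the error is epsilon·(1 - g^T)/(1 - g) for g ≠ 1 and epsilon·T for g = 1. This recursion is an illustrative assumption, not the paper's amplification factor or its bound.

```python
def accumulated_error(gain, step_error, steps):
    """Toy recursion e_{t+1} = gain * e_t + step_error, starting from e_0 = 0."""
    e = 0.0
    for _ in range(steps):
        e = gain * e + step_error
    return e
```

The three regimes behave as follows:

- With g < 1 the error stays below epsilon/(1 - g) (bounded regime).
- With g = 1 it grows linearly in the number of steps.
- With g > 1 it grows exponentially, so long chains amplify small per-step mistakes.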

【10】Divide and Contrast: Learning Robust Temporal Features without Augmentation
标题：划分与对比：无需增强即可学习稳健的时间特征
链接：https://arxiv.org/abs/2605.21241

作者：Abdul-Kazeem Shamba,Kerstin Bach,Gavin Taylor
备注：Published in the 43rd International Conference on Machine Learning (ICML 2026)
摘要：Self-supervised learning for time-series representation aims to reduce reliance on labeled data while maintaining strong downstream performance, yet many existing approaches incur high computational costs or rely on assumptions that do not hold across diverse temporal dynamics. In this work, we introduce Divide and Contrast (Di-COT), an unsupervised framework that avoids data augmentation and multiple encoder passes by contrasting informative substructures within a window rather than individual timesteps. Di-COT stochastically partitions each window into a small number of overlapping sub-blocks per iteration, enabling efficient and meaningful contrast while mitigating false positives during temporal transitions. To further improve scalability, we adopt a contrastive objective whose computation depends on the batch size and the number of sub-blocks, making loss computation independent of sequence length. Extensive experiments on six large-scale real-world datasets, as well as the UCR and UEA benchmarks, demonstrate that Di-COT learns semantically structured and transferable representations, achieving state-of-the-art performance on classification, clustering, $k$NN, and cross-dataset transfer, while substantially reducing training time. The source code is publicly available at https://github.com/sfi-norwai/Di-COT.

【11】Efficient Learning of Deep State Space Models via Importance Smoothing
标题：通过重要性平滑高效学习深度状态空间模型
链接：https://arxiv.org/abs/2605.21108

作者：John-Joseph Brady,Nikolas Nusken,Yunpeng Li
备注：Accepted to the proceedings of ICML 2026
摘要：Latent state space systems are ubiquitous in statistical modelling, arising naturally when a time series is observed through a noisy measurement function; however, training deep state space models (DSSMs) at scale remains difficult. Two largely distinct strategies, each with its own literature, have developed around the training of DSSMs. First, auto-encoding approaches train generative DSSMs by optimising a variational lower bound. Second, DSSMs can be trained by back-propagating through the outputs of a classical sequential Monte Carlo (SMC) algorithm. Such approaches can train DSSMs for discriminative as well as generative tasks, but because their forward pass is sequential, they scale poorly on modern hardware. We propose a new training method, parallel variational Monte Carlo (PVMC), that bridges the gap between the two paradigms and can be used robustly to train DSSMs for both discriminative and generative tasks. Our method matches or exceeds state-of-the-art results on a set of baseline experiments and trains $10\times$ faster than the fastest competing SMC approach.
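For context on why SMC-based training is hard to parallelise, here is a textbook bootstrap particle filter for a 1-D linear-Gaussian state space model. It is the kind of classical SMC baseline the abstract contrasts with, not PVMC. The model, its parameters (`phi`, `q`, `r`) and the function name are illustrative assumptions.

```python
import numpy as np


def bootstrap_pf_loglik(y, phi=0.9, q=1.0, r=0.5, num_particles=1000, seed=0):
    """Estimate log p(y_1:T) for x_t = phi*x_{t-1} + N(0, q), y_t = x_t + N(0, r),
    with x_0 ~ N(0, 1), using a bootstrap particle filter."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=num_particles)  # particles for x_0
    loglik = 0.0
    for y_t in y:  # the time loop is inherently sequential
        # 1. Propagate particles through the transition model.
        x = phi * x + rng.normal(0.0, np.sqrt(q), size=num_particles)
        # 2. Weight each particle by the observation density N(y_t; x, r).
        logw = -0.5 * (np.log(2 * np.pi * r) + (y_t - x) ** 2 / r)
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())  # log of the average weight, computed stably
        # 3. Multinomial resampling.
        idx = rng.choice(num_particles, size=num_particles, p=w / w.sum())
        x = x[idx]
    return loglik
```

Each iteration needs the resampled particles from the previous step, so the loop cannot be parallelised across time steps. Back-propagating through it inherits the same sequential dependency.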

【12】Winfree Oscillatory Neural Network
标题：Winfree振荡神经网络
链接：https://arxiv.org/abs/2605.20922

作者：Jiawen Dai,Yue Song
备注：Project page: https://jiawen-dai.github.io/WONN_Project_Page/
摘要：Oscillations and synchronization are widely believed to play a fundamental role in representation and computation. However, existing machine learning approaches based on synchronization dynamics have largely been confined to specialized settings such as object discovery, with limited evidence of scalability to standard vision benchmarks or logic reasoning tasks. We propose the Winfree Oscillatory Neural Network (WONN), a dynamical neural architecture based on generalized Winfree dynamics. WONN evolves representations on the torus $(S^1)^d$ through structured oscillatory interactions, combining phase-based inductive biases with flexible and hierarchical interaction mechanisms instantiated as either fixed trigonometric mappings or learnable neural networks. We evaluate WONN on image recognition and complex reasoning tasks, including CIFAR, ImageNet, Maze-hard, and Sudoku. Across these domains, WONN achieves competitive or superior performance with strong parameter efficiency. In particular, WONN is, to our knowledge, the first synchronization-based oscillatory architecture to scale competitively to ImageNet-1K. Furthermore, on Maze-hard, WONN achieves 80.1% accuracy using only 1% of the parameters of prior state-of-the-art models. These results suggest that structured oscillatory dynamics provide a scalable and parameter-efficient alternative to conventional neural architectures.
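For readers unfamiliar with Winfree dynamics, the sketch below integrates the classical Winfree model:

dθ_i/dt = ω_i + (κ/N) Σ_j P(θ_j) R(θ_i), with P(θ) = 1 + cos θ and R(θ) = -sin θ.

It keeps the phases on the torus by wrapping them modulo 2π. This is the textbook model only. WONN's generalized dynamics, hierarchical interactions and learnable couplings are not reproduced, and the function names and parameter values are hypothetical.

```python
import numpy as np


def winfree_step(theta, omega, kappa, dt):
    """One explicit-Euler step of the classical Winfree model
    dtheta_i/dt = omega_i + (kappa/N) * sum_j P(theta_j) * R(theta_i),
    with influence P(t) = 1 + cos(t) and sensitivity R(t) = -sin(t)."""
    mean_field = np.mean(1.0 + np.cos(theta), axis=-1, keepdims=True)
    dtheta = omega + kappa * mean_field * (-np.sin(theta))
    return np.mod(theta + dt * dtheta, 2 * np.pi)  # stay on the torus (S^1)^d


def order_parameter(theta):
    """Kuramoto order parameter r in [0, 1]; r = 1 means full phase synchrony."""
    return np.abs(np.mean(np.exp(1j * theta), axis=-1))


def simulate(d=64, steps=2000, kappa=1.0, dt=0.01, seed=0):
    """Integrate d oscillators with slightly heterogeneous natural frequencies."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, size=d)
    omega = 1.0 + 0.1 * rng.standard_normal(d)
    for _ in range(steps):
        theta = winfree_step(theta, omega, kappa, dt)
    return theta


if __name__ == "__main__":
    print("order parameter r =", order_parameter(simulate()))
```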

【13】For How Long Should We Be Punching? Learning Action Duration in Fighting Games
标题：我们应该出拳多久？在格斗游戏中学习动作持续时间
链接：https://arxiv.org/abs/2605.20911

作者：Hoang Hai Nguyen,Kurt Driessens,Dennis J. N. J. Soemers
备注：Accepted at Computers and Games 2026

【14】LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging
标题：LOSCAR-SGD：具有通信-计算重叠和延迟校正稀疏模型平均的本地SGD
链接：https://arxiv.org/abs/2605.20866

作者：Yassine Maziane,Ammar Mahran,Artavazd Maranjyan,Peter Richtárik

【15】OlmoEarth v1.1: A more efficient family of OlmoEarth models
标题：OlmoEarth v1.1：更高效的OlmoEarth模型系列
链接：https://arxiv.org/abs/2605.20804

作者：Gabriel Tseng,Yawen Zhang,Favyen Bastani,Henry Herzog,Joseph Redmon,Hadrien Sablon,Piper Wolters,Patrick Alan Johnson,Christopher Wilhelm,Patrick Beukema

【16】Tunable MAGMAX: Preference-Aware Model Merging for Continual Learning
标题：可调MAGMAX：偏好感知模型合并以实现持续学习
链接：https://arxiv.org/abs/2605.20803

作者：Kei Hiroshima,Kento Uchida,Shinichi Shirakawa
备注：17 pages, 4 figures. Accepted at ICPR 2026

【17】Conflict-Aware Additive Guidance for Flow Models under Compositional Rewards
标题：组合奖励下流模型的冲突感知加性引导
链接：https://arxiv.org/abs/2605.20758

作者：Xuehui Yu,Fucheng Cai,Meiyi Wang,Xiaopeng Fan,Harold Soh
备注：Forty-Third International Conference on Machine Learning (ICML 2026)

【18】Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models
标题：使用自回归扩散模型加速视频逆问题求解
链接：https://arxiv.org/abs/2605.20624

作者：Taesung Kwon,Jonghyun Park,Hyungjin Chung,Jong Chul Ye
备注：Project page is available here: https://avis-project.github.io/

【19】Matryoshka Concept Bottleneck Models
标题：Matryoshka概念瓶颈模型
链接：https://arxiv.org/abs/2605.20612

作者：Ziye Chen,Hongbin Lin,Xinyue Xu,Jie Li,Lijie Hu

【20】Mechanistic Interpretability for Learning Assurance of a Vision-Based Landing System
标题：面向基于视觉的着陆系统学习保证的机制可解释性
链接：https://arxiv.org/abs/2605.20607

作者：Romeo Valentin,Olivia Beyer Bruvik,Marc R. Schlichting,Mykel J. Kochenderfer
备注：10 pages, 4 figures

【21】Deep Learning Surrogates for Emulating Stochastic Climate Tipping Dynamics
标题：用于模拟随机气候临界点动力学的深度学习代理模型
链接：https://arxiv.org/abs/2605.20580

作者：Adeline Hillier,Jennifer Sleeman,Jay Brett,Caroline Tang,Jenelle Millison,Anand Gnanadesikan

【22】Axiomatizing Neural Networks via Pursuit of Subspaces
标题：通过子空间追踪将神经网络公理化
链接：https://arxiv.org/abs/2605.20534

作者：Mehmet Yamac,Mert Duman,Ugur Akpinar,Felix Rojas Casadiego,Serkan Kiranyaz,Marcel van Gerven,Moncef Gabbouj
备注：43 pages, 25 figures. Code and additional materials will be released

【23】An exponential mechanism based on quadratic approximations for fine-tuning machine learning models with privacy guarantees
标题：基于二次逼近的指数机制，用于微调具有隐私保证的机器学习模型
链接：https://arxiv.org/abs/2605.20521

作者：Hoang Tran,Jorge Ramirez,Jiayi Wang,Alberto Bocchinfuso,Christopher Stanley,M. Paul Laiu

【24】Training Language Agents to Learn from Experience
标题：训练语言代理从经验中学习
链接：https://arxiv.org/abs/2605.20477

作者：Yuval Shalev,Zifeng Ding,Mateja Jamnik

【25】SMA-DP: Spectral Memory-Aware Differential Privacy for Deep Learning
标题：SMA-DP：用于深度学习的谱记忆感知差分隐私
链接：https://arxiv.org/abs/2605.20450

作者：Mohammad Partohaghighi,Roummel Marcia

【26】Score-Based Causal Discovery of Latent Variable Causal Models
标题：潜变量因果模型的基于评分的因果发现
链接：https://arxiv.org/abs/2605.20396

作者：Ignavier Ng,Xinshuai Dong,Haoyue Dai,Biwei Huang,Peter Spirtes,Kun Zhang
备注：ICML 2024

【27】Symmetrization of Loss Functions for Robust Training of Neural Networks in the Presence of Noisy Labels
标题：存在噪声标签时用于神经网络鲁棒训练的损失函数对称化
链接：https://arxiv.org/abs/2605.20347

作者：Alexandre Lemire Paquin,Brahim Chaib-Draa,Philippe Giguère
备注：28 pages, 1 figure, 4 tables

【28】Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases
标题：更少的数据，更快的训练：重复较小的数据集通过抽样偏差加快学习
链接：https://arxiv.org/abs/2605.20314

作者：Jingwen Liu,Ezra Edelman,Surbhi Goel,Bingbin Liu
备注：ICML 2026

【29】SDM: A Powerful Tool for Evaluating Model Robustness
标题：SDM：一个评价模型鲁棒性的有力工具
链接：https://arxiv.org/abs/2605.20308

作者：Xinlei Liu,Tao Hu,Jichao Xie,Peng Yi,Hailong Ma,Baolin Li
备注：16 pages

【30】Neural Collapse by Design: Learning Class Prototypes on the Hypersphere
标题：设计出的神经崩溃：在超球面上学习类原型
链接：https://arxiv.org/abs/2605.20302

作者：Panagiotis Koromilas,Theodoros Giannakopoulos,Mihalis A. Nicolaou,Yannis Panagakis
备注：43rd International Conference on Machine Learning (ICML 2026); Code: https://github.com/pakoromilas/nc_by_design

【31】Robust Subspace-Constrained Quadratic Models for Low-Dimensional Structure Learning
标题：用于低维结构学习的稳健子空间约束二次模型
链接：https://arxiv.org/abs/2605.20300

作者：Zheng Zhai,Xiaohui Li

【32】Mechanisms of Misgeneralization in Physical Sequence Modeling
标题：物理序列建模中的错误泛化机制
链接：https://arxiv.org/abs/2605.20299

作者：Kento Nishi,Raphael Tang,Karun Kumar,Core Francisco Park,Hidenori Tanaka
备注：Preprint. kentonishi.com/physical-misgeneralization

【33】Physics-informed convolutional neural networks for fluid flow through porous media
标题：用于多孔介质中流体流动的物理信息卷积神经网络
链接：https://arxiv.org/abs/2605.20250

作者：Rafał Topolnicki,Paweł Dłotko,Maciej Matyka
备注：14 pages, supplement, dedicated github repo

【34】CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning
标题：CP-MoE：面向持续学习的一致性保持混合专家模型
链接：https://arxiv.org/abs/2605.20247

作者：Yang Liu,Toan Nguyen,Flora D. Salim

【35】GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents
标题：GROW：面向开放世界VLM智能体，将GRPO与状态-动作建模对齐
链接：https://arxiv.org/abs/2605.20246

作者：Xiongbin Wu,Zhihao Luo,Shanzhe Lei,Lechao Zhang,Xuhong Wang,Jie Yang,Zhonglong Zheng,Yuanjie Zheng,Xin Tan,Wei Liu

【36】Provably Learning Diffusion Models under the Manifold Hypothesis: Collapse and Refine
标题：流形假设下可证明地学习扩散模型：坍缩与精炼
链接：https://arxiv.org/abs/2605.20235

作者：Wei Huang,Andi Han,Mingyuan Bai,Huanjian Zhou,Qixin Zhang,Taiji Suzuki,Kenji Fukumizu
备注：3 figures

【37】Pseudo-Siamese Network for Planning in Target-Oriented Proactive Dialogues
标题：面向目标导向主动对话规划的伪孪生网络
链接：https://arxiv.org/abs/2605.20195

作者：Xinyue Kang,Maodong Li,Yibin Zheng,Fang Kong
备注：ICASSP 2026

【38】A Rigorous, Tractable Measure of Model Complexity
标题：一种严格且易于处理的模型复杂度度量
链接：https://arxiv.org/abs/2605.21167

作者：Oskar Allerbo,Thomas B. Schön

其他(64篇)

【1】Variance Reduction for Expectations with Diffusion Teachers
标题：基于扩散教师模型的期望估计方差缩减
链接：https://arxiv.org/abs/2605.21489

作者：Jesse Bettencourt,Xindi Wu,Matan Atzmon,James Lucas,Jonathan Lorraine
备注：Project page: https://research.nvidia.com/labs/sil/projects/CARV/

【2】AiraXiv: An AI-Driven Open-Access Platform for Human and AI Scientists
标题：AiraXiv：人类和人工智能科学家的人工智能驱动开放访问平台
链接：https://arxiv.org/abs/2605.21481

作者：Junshu Pan,Panzhong Lu,Yixuan Weng,Qiyao Sun,Fang Guo,Zijie Yang,Qiji Zhou,Yue Zhang

【3】Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling
标题：用于延迟优化Web代理规划和调度的Agent JIT编译
链接：https://arxiv.org/abs/2605.21470

作者：Caleb Winston,Ron Yifeng Wang,Azalia Mirhoseini,Christos Kozyrakis
备注：Accepted at ICML 2026

【4】Mind the Sim-to-Real Gap & Think Like a Scientist
标题：注意模拟与真实的差距，像科学家一样思考
链接：https://arxiv.org/abs/2605.21458

作者：Harsh Parikh,Gabriel Levin-Konigsberg,Dominique Perrault-Joncas,Alexander Volfovsky

【5】Mitigating Label Bias with Interpretable Rubric Embeddings
标题：使用可解释的评分细则嵌入减轻标签偏差
链接：https://arxiv.org/abs/2605.21455

作者：Calvin Isley,Johann D. Gaebler,Sharad Goel

【6】torchtune: PyTorch native post-training library
标题：torchtune：PyTorch原生后训练库
链接：https://arxiv.org/abs/2605.21442

作者：Mark Obozov,Maxime Griot,Joseph Cummings,Evan Smothers,Felipe Mello,Rafi Ayub,Philip John Bontrager,Salman Mohammadi,Ariel Kwiatkowski,Nathan Azrak,Mircea Mironenco
备注：14 pages

【7】roto 2.0: The Robot Tactile Olympiad
标题：roto 2.0：机器人触觉奥林匹克竞赛
链接：https://arxiv.org/abs/2605.21429

作者：Elle Miller,Jayaram Reddy,Ayush Deshmukh,Trevor McInroe,David Abel,Oisin Mac Aodha,Sethu Vijayakumar
备注：Accepted to 7th ViTac Workshop, ICRA 2026

【8】Preference-aware Influence-function-based Data Selection Method for Efficient Fine-Tuning
标题：基于偏好感知影响函数的数据选择方法，用于高效微调
链接：https://arxiv.org/abs/2605.21422

作者：Qihao Lin,Guanxu Chen,Dongrui Liu,Jing Shao
备注：13 pages, 3 figures

【9】CRAFT: Conflict-Resolved Aggregation for Federated Training
标题：CRAFT：面向联邦训练的冲突消解聚合
链接：https://arxiv.org/abs/2605.21317

作者：Ziqi Wang,Qiang Liu,Nils Thuerey

【10】From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach
标题：从电路证据到机制理论：一种归纳逻辑方法
链接：https://arxiv.org/abs/2605.21303

作者：Nura Aljaafari,Danilo S. Carvalho,Andre Freitas
备注：27 pages, 10 Figures, 14 Tables

【11】\textit{Stochastic} MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent
标题：随机MeanFlow策略：基于熵镜像下降的一步生成控制
链接：https://arxiv.org/abs/2605.21282

作者：Zeyuan Wang,Da Li,Yulin Chen,Yuehu Gong,Yanming Guo,Ye Shi,Liang Bai,Tianyuan Yu,Yanwei Fu

【12】FedCoE: Bridging Generalization and Personalization via Federated Coordinated Dual-level MoEs
标题：FedCoE：通过联邦协调的双层混合专家模型（MoE）兼顾泛化与个性化
链接：https://arxiv.org/abs/2605.21264

作者：Penglin Dai,Fulian Li,Xincao Xu,Junhua Wang,Lixin Duan,Xiao Wu

【13】PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment
标题：PREFINE：基于偏好的隐式奖励与成本微调以实现安全对齐
链接：https://arxiv.org/abs/2605.21225

作者：Richa Verma,Bavish Kulur,Sanjay Chawla,Balaraman Ravindran
备注：Accepted at AAMAS 2026 as a full paper

【14】SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning
标题：SMoA：用于参数高效微调的频谱调制适配器
链接：https://arxiv.org/abs/2605.21147

作者：Yongkang Liu,Xing Li,Mengjie Zhao,Shanru Zhang,Zijing Wang,Qian Li,Shi Feng,Feiliang Ren,Daling Wang,Hinrich Schütze

【15】Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints
标题：将通信与策略解耦：带宽约束下的鲁棒MARL
链接：https://arxiv.org/abs/2605.21085

作者：Alexi Canesse,Benoît Goupil,Jesse Read,Sonia Vanier

【16】SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining
标题：SpectralEarth-FM：将高光谱图像纳入多模态地球观测预训练
链接：https://arxiv.org/abs/2605.21075

作者：Nassim Ait Ali Braham,Aaron Banze,Conrad M. Albrecht,Julien Mairal,Jocelyn Chanussot,Xiao Xiang Zhu

【17】Divide et Calibra: Multiclass Local Calibration via Vector Quantization
标题：Divide et Calibra：通过向量量化实现多类局部校准
链接：https://arxiv.org/abs/2605.21060

作者：Cesare Barbera,Lorenzo Perini,Giovanni De Toni,Andrea Passerini,Andrea Pugnana

【18】Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy
标题：扮演魔鬼代言人：在谄媚问题上，现成的人格向量可与定向引导相媲美
链接：https://arxiv.org/abs/2605.21006

作者：Ishaan Kelkar,Nebras Alam,Vikram Kakaria,Madhur Panwar,Vasu Sharma,Maheep Chaudhary

【19】Beyond the Bellman Recursion: A Pontryagin-Guided Framework for Non-Exponential Discounting
标题：超越贝尔曼递归：庞特里亚金引导的非指数折扣框架
链接：https://arxiv.org/abs/2605.20996

作者：Hojin Ko,Jeonggyu Huh

【20】Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory
标题：诊断调度操作中的开销：跨架构观测平台
链接：https://arxiv.org/abs/2605.20982

作者：Bole Ma,Jan Eitzinger,Harald Koestler,Gerhard Wellein

【21】A Deployment Audit of Release-Side Risk in Conformal Triage under Prevalence Shift
标题：患病率偏移下保形分诊中发布侧风险的部署审计
链接：https://arxiv.org/abs/2605.20956

作者：Chengze Li,Xiao Liu,Hanrong Zhang,Haiyang Peng,Yanghao Ruan,Huanhuan Ma,Chunyu Miao,Qichao Zhou,Xiangrong Qi,Philip Yu
备注：18 pages, 4 figures, 5 tables

【22】DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU
标题：DASH：在单个GPU上几分钟内完成混合注意力的快速可微架构搜索
链接：https://arxiv.org/abs/2605.20936

作者：Weizhe Chen,Miao Zhang,Junpeng Jiang,Yaping Li,Weili Guan,Liqiang Nie
备注：19 pages, 7 figures

【23】CIG: Exploration via Conditional Information Gain
标题：CIG：通过条件信息增益进行探索
链接：https://arxiv.org/abs/2605.20878

作者：Tim Joseph,Marcus Fechner,Philipp Stegmaier,Karam Daaboul,J. Marius Zöllner
备注：28 pages, 10 figures, 3 tables

【24】Runtime-Certified Bounded-Error Quantized Attention
标题：运行时认证的有界误差量化注意力
链接：https://arxiv.org/abs/2605.20868

作者：Dean Calver
备注：32 pages, 1 figure

【25】Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment
标题：DPO与RLHF的条件等价性：隐含假设、失败模式与可证明的对齐
链接：https://arxiv.org/abs/2605.20834

作者：Zhiqin Yang,Yonggang Zhang,Wei Xue,Dong Fang,Bo Han,Yike Guo
备注：49 pages

【26】Instant GPU Efficiency Visibility at Fleet Scale
标题：集群规模下GPU效率的即时可见性
链接：https://arxiv.org/abs/2605.20799

作者：Connor Pedersen,Dong H. Ahn,Michel Migdal,Collin Neale,Nik Konyuchenko
备注：12 pages, 7 figures, 3 tables

【27】The Devil is in the Condition Numbers: Why is GLU Better than non-GLU Structure?
标题：魔鬼就在条件数中：为什么GLU比非GLU结构更好？
链接：https://arxiv.org/abs/2605.20749

作者：Xingyu Lyu,Qianqian Xu,Zhiyong Yang,Peisong Wen,Qingming Huang
备注：Accepted by ICML 2026

【28】The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering
标题：验证器严格性的隐藏信号：通过选择性潜在引导控制并改进逐步验证
链接：https://arxiv.org/abs/2605.20745

作者：Yefan Zhou,Yilun Zhou,Austin Xu,Soroush Vosoughi,Shafiq Joty,Jiang Gui

【29】Hack-Verifiable Environments: Towards Evaluating Reward Hacking at Scale
标题：黑客可验证环境：大规模评估奖励黑客
链接：https://arxiv.org/abs/2605.20744

作者：Amit Roth,Ankur Samanta,Matan Halevy,Yoav Levine,Yonathan Efroni
备注：Project Page - https://majoroth.github.io/hack-verifiable-environments/

【30】DIVE: Embedding Compression via Self-Limiting Gradient Updates
标题：DIVE：通过自限梯度更新进行嵌入压缩
链接：https://arxiv.org/abs/2605.20689

作者：Dongfang Zhao

【31】On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists
标题：关于人工智能审稿人的局限与机遇：与45位专家科学家一起审视Nature系列论文的审稿意见
链接：https://arxiv.org/abs/2605.20668

作者：Seungone Kim,Dongkeun Yoon,Kiril Gashteovski,Juyoung Suk,Jinheon Baek,Pranjal Aggarwal,Ian Wu,Viktor Zaverkin,Spase Petkoski,Daniel R. Schrider,Ilija Dukovski,Francesco Santini,Biljana Mitreska,Yong Jeong,Kyeongha Kwon,Young Min Sim,Dragana Manasova,Arthur Porto,Biljana Mojsoska,Makoto Takamoto,Marko Shuntov,Ruoqi Liu,Hyunjoo Jenny Lee,Niyazi Ulas Dinç,Yehhyun Jo,Sunkyu Han,Chungwoo Lee,Huishan Li,Esther H. R. Tsai,Ergun Simsek,Khushboo Shafi,Yeonseung Chung,Jihye Park,Aleksandar Shulevski,Henrik Christiansen,Yoosang Son,Elly Knight,Amanda Montoya,Jeongyoun Ahn,Christian Langkammer,Heera Moon,Changwon Yoon,Nikola Stikov,Mooseok Jang,Edward Choi,Junhan Kim,Yeon Sik Jung,Woo Youn Kim,Jae Kyoung Kim,Ishraq Md Anjum,Hyun Uk Kim,Drew Bridges,Carolin Lawrence,Xiang Yue,Alice Oh,Akari Asai,Sean Welleck,Graham Neubig
备注：Work in progress

【32】REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak
标题：REFLECTOR：内化逐步反思以抵御间接越狱
链接：https://arxiv.org/abs/2605.20654

作者：Jiachen Ma,Jiawen Zhang,Xiangtian Li,Bo Zou,Chaochao Lu,Chao Yang
备注：ICML 2026

【33】Same Target, Different Basins: Hard vs. Soft Labels for Annotator Distributions
标题：相同的目标，不同的盆地：面向标注者分布的硬标签与软标签
链接：https://arxiv.org/abs/2605.20642

作者：Mirerfan Gheibi,Gashin Ghazizadeh
备注：14 pages, 12 figures. Accepted to the 2nd Workshop on Epistemic Intelligence in Machine Learning (EIML @ ICML 2026)

【34】The General Theory of Localization Methods
标题：局部化方法的一般理论
链接：https://arxiv.org/abs/2605.20635

作者：Congwei Song
备注：74 + 7 pages, ~30 figures, 6 tables

【35】Dynamic Shapley Computation
标题：动态Shapley计算
链接：https://arxiv.org/abs/2605.20620

作者：Xuan Yang,Hsi-Wen Chen,Ming-Syan Chen,Jian Pei

【36】SURF: Steering the Scalarization Weight to Uniformly Traverse the Pareto Front
标题：SURF：引导标量化权重以均匀遍历帕累托前沿
链接：https://arxiv.org/abs/2605.20619

作者：Liuyuan Jiang,Chentong Huang,Lisha Chen

【37】Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies
标题：自我训练不会扁平化语言，而是重组语言：表层标记被放大，深层句法消亡
链接：https://arxiv.org/abs/2605.20602

作者：Ming Liu
备注：19 pages (14 main + 5 appendix), 8 figures, 3 tables

【38】Multi-agent Collaboration with State Management
标题：基于状态管理的多智能体协作
链接：https://arxiv.org/abs/2605.20563

作者：Mengyang Liu,Taozhi Chen,Zhenhua Xu,Xue Jiang,Yihong Dong

【39】Latent Process Generator Matching
标题：潜在过程生成器匹配
链接：https://arxiv.org/abs/2605.20547

作者：Lukas Billera,Hedwig Nora Nordlinder,Ben Murrell
备注：18 pages, 1 figure

【40】OpenSeisML: Open Large-Scale Real Seismic and well-log Dataset for Generative AI
标题：OpenSeisML：面向生成式人工智能的开放大规模真实地震与测井数据集
链接：https://arxiv.org/abs/2605.20539

作者：Ipsita Bhar,Huseyin Tuna Erdinc,Thales Souza,Charles Jones,Felix J. Herrmann
备注：5 pages, 8 figures

【41】Pseudo-Formalization for Automatic Proof Verification
标题：自动证明验证的伪形式化
链接：https://arxiv.org/abs/2605.20531

作者：Slim Barkallah,Luke Bailey,Kaiyue Wen,Mohammed Abouzaid,Tengyu Ma
备注：31 pages, code available at https://github.com/Slim205/pseudo-formalization

【42】Fast Reconstruction of Exact Maxwell Dynamics from Sparse Data
标题：稀疏数据下精确Maxwell动力学的快速重建
链接：https://arxiv.org/abs/2605.20514

作者：Dan DeGenaro,Xin Li,Obed Amo,Michael Pokojovy,Sarah Adel Bargal,Markus Lange-Hegermann,Bogdan Raiţă
备注：31 pages, 8 figures

【43】Reinforcing Human Behavior Simulation via Verbal Feedback
标题：通过言语反馈强化人类行为模拟
链接：https://arxiv.org/abs/2605.20506

作者：Weiwei Sun,Xuhui Zhou,Jiarui Liu,Weihua Du,Haojia Sun,Yiqing Xie,Qianou Ma,Sihao Chen,Mengting Wan,Longqi Yang,Pei Zhou,Sherry Wu,Sean Welleck,Graham Neubig,Yiming Yang,Maarten Sap

【44】A 10,000-Year Global Stochastic Tropical Cyclone Catalog with Wind-Dependent Track Transitions (WHITS)
标题：具有风相关路径转移的10,000年全球随机热带气旋目录（WHITS）
链接：https://arxiv.org/abs/2605.20494

作者：Jennifer Nakamura,Upmanu Lall

【45】Can Conversational XAI Improve User Performance? An Experimental Study
标题：对话式XAI能否提高用户性能？实验研究
链接：https://arxiv.org/abs/2605.20439

作者：Sven Kruschel,Julian Rosenberger,Lasse Bohlen,Mathias Kraus,Patrick Zschech
备注：Accepted at Thirty-Fourth European Conference on Information Systems (ECIS 2026), Milan, Italy

【46】Spectral Souping: A Unified Framework for Online Preference Alignment
标题：Spectral Souping：在线偏好对齐的统一框架
链接：https://arxiv.org/abs/2605.20408

作者：Yinlam Chow,Guy Tennenholtz,Ted Yun,James Harrison,Arthur Gretton,Andre Barreto,Bo Dai

【47】Proximal State Nudging: Reducing Skill Atrophy from AI Assistance
标题：近端状态推动：减少人工智能援助带来的技能萎缩
链接：https://arxiv.org/abs/2605.20355

作者：Megha Srivastava,Jonathan Ouyang,Eric Zhou,Andrew Silva,Emily Sumner,Dorsa Sadigh,Yuchen Cui,Deepak Gopinath,Guy Rosman
备注：9 pages

【48】Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining
标题：光谱不遗忘：事后恢复受损能力而无需重新训练
链接：https://arxiv.org/abs/2605.20296

作者：Aarash Abro,Muhammad Tahir

【49】Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection
标题：Weasel：通过重要性-多样性数据选择实现Web代理的域外泛化
链接：https://arxiv.org/abs/2605.20291

作者：Fatemeh Pesaran zadeh,Seyeon Choi,Xing Han Lù,Siva Reddy,Gunhee Kim
备注：ICML 2026. Code is released at https://github.com/fatemehpesaran310/weasel

【50】JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA
标题：JUDO：用于工业异常QA的并置面向领域的多模式推理器
链接：https://arxiv.org/abs/2605.20284

作者：Hyunju Kang,Woohyun Lee,Jaewon Kim,Hogun Park
备注：Published at ICLR 2026

【51】Modality-Decoupled Online Recursive Editing
标题：模态解耦的在线递归编辑
链接：https://arxiv.org/abs/2605.20273

作者：Siyuan Li,Youyuan Zhang,Fangming Liu,Jing Li

【52】Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity
标题：捕捉移动子空间：超越平稳性的低秩老虎机
链接：https://arxiv.org/abs/2605.20269

作者：Hamed Khosravi,Xiaoming Huo

【53】Residual Paving: Diagnosing the Routing Bottleneck in Selective Refusal Editing
标题：残差铺设：诊断选择性拒绝编辑中的路由瓶颈
链接：https://arxiv.org/abs/2605.20262

作者：Bryce Hinkley,Peyman Najafirad

【54】Prism: Structural Symmetry Scanning via Duality-Constrained Laplacian Projection
标题：Prism：通过对偶约束拉普拉斯投影进行结构对称性扫描
链接：https://arxiv.org/abs/2605.20245

作者：Jiatong Xie
备注：10 pages, 4 tables, 1 figure. This work presents a first-principles unsupervised network structural diagnosis framework based on symmetric involution operator and Laplacian commutator constraint. It achieves noise-robust community detection and early structural risk detection in financial time-series networks without supervised training data

【55】LEAP: A closed-loop framework for perovskite precursor additive discovery
标题：LEAP：钙钛矿前体添加剂发现的闭环框架
链接：https://arxiv.org/abs/2605.20242

作者：Xin-De Wang,Zhi-Rui Chen,Ze-Feng Gao,Peng-Jie Guo,Cheng Mu,Zhong-Yi Lu
备注：30 pages; 11 figures

【56】Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry
标题：Geometry-Lite：通过分层边缘几何进行可解释的安全探测
链接：https://arxiv.org/abs/2605.20241

作者：Woo Seob Sim,Yu Rang Park

【57】MagBridge-Battery: A Synthetic Bridge Dataset for Li-ion Magnetometry and State-of-Health Diagnostics
标题：MagBridge-Battery：用于锂离子磁力测量和健康状况诊断的合成桥数据集
链接：https://arxiv.org/abs/2605.20240

作者：Sakthi Prabhu Gunasekar,Prasanna Kumar Rangarajan
备注：10 pages, 3 figures, 4 tables. Synthetic dataset and benchmark suite for battery magnetometry and state-of-health diagnostics; dataset released on Zenodo and code available on GitHub

【58】TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data
标题：TabPFN-MT：面向表格数据的原生多任务上下文学习器
链接：https://arxiv.org/abs/2605.20234

作者：Cormac Cureton,Narges Armanfard
备注：24 pages, 7 figures

【59】Advanced Scientific Methodology Plays Rossini
标题：先进科学方法论演奏罗西尼
链接：https://arxiv.org/abs/2605.20220

作者：Silvia Licciardi,Daniela Macchione,Emmanuel Caronna,Elisa Francomano

【60】NaP-Control: Navigating Diffusion Prior for Versatile and Fast Character Control
标题：NaP-Control：在扩散先验中导航，实现多功能且快速的角色控制
链接：https://arxiv.org/abs/2605.20209

作者：Chia-Wen Chen,Yan Wu,Korrawe Karunratanakul,Siyu Tang

【61】Augmented Analytics and Decision Quality: The Role of Trust among Non-Technical BI Users
标题：增强分析和决策质量：非技术BI用户之间信任的作用
链接：https://arxiv.org/abs/2605.20198

作者：Thuy Pham Thi Phuong,Ha Nguyen Manh,Ngan Nguyen Thi Thuy,Lan Hoang Thi
备注：13 pages, 1 figure, 4 tables

【62】Stimulus symmetries can confound representational similarity analyses
标题：刺激对称性可能混淆表征相似性分析
链接：https://arxiv.org/abs/2605.21324

作者：Farhad Pashakhanloo,Jacob A. Zavatone-Veth
备注：40 pages

【63】Conditioning Gaussian Processes on Almost Anything
标题：以几乎任何事物为条件的高斯过程
链接：https://arxiv.org/abs/2605.21041

作者：Henry Moss,Lachlan Astfalck,Thomas Cowperthwaite,Colin Doumont,Sam Willis,Philipp Hennig,Christopher Nemeth,Andrew Zammit-Mangion

【64】Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise
标题：重尾马尔科夫噪音下一般随机逼近的集中性
链接：https://arxiv.org/abs/2605.20999

作者：Shubhada Agrawal,Siva Theja Maguluri,Martin Zubeldia
备注：67 pages

机器翻译由腾讯交互翻译提供，仅供参考
