|
|
创作新主题 |
| docker Elasticsearch |
| linux MongoDB Redis DATABASE NGINX 其他Web框架 web工具 zookeeper tornado NoSql Bootstrap js peewee Git bottle IE MQ Jquery |
| 机器学习算法 |
| 短视频 |
| 印度 |
点击阅读原文访问arxivdaily.com,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏等功能!
cs.LG 方向,今日共计274篇
大模型相关(32篇)
【1】Ω-QVLA: Robust Quantization for Vision-Language-Action Models via Composite Rotation and Per-step Scaling
标题:«-QVLA:通过复合旋转和逐步缩放的视觉-语言-动作模型的稳健量化
链接:https://arxiv.org/abs/2605.28803
摘要:Vision-Language-Action (VLA) models unify perception, reasoning, and control within a single policy, yet their multi-billion-parameter backbones and diffusion-based action heads make on-device deployment prohibitively expensive. Prior quantization efforts offer only partial solutions, compressing the LLM backbone while leaving the DiT action head at full precision, or resorting to mixed-precision schemes, driven by the belief that uniformly quantizing the action head is inherently unstable. We challenge this assumption with Omega-QVLA, the first training-free post-training quantization framework that compresses both the language backbone and the entire diffusion action head of a VLA model to a uniform W4A4 precision, eliminating the need for mixed-precision allocation. Omega-QVLA combines a composite SVD-Hadamard rotation that equalizes per-channel weight energy while diffusing residual activation outliers with per-step DiT activation scaling quantization that absorbs dynamic-range drift across denoising steps. On LIBERO, Omega-QVLA compresses Pi 0.5 and GR00T N1.5 to W4A4 with 98.0% and 87.8% task success rates, matching or exceeding their FP16 references of 97.1% and 87.0%, while reducing the static memory footprint by 71.3%. Real-world manipulation experiments further confirm smooth, accurate manipulation where prior methods fail. Code is available at https://github.com/UCMP13753/Omega-QVLA.
【2】LLM Zeroth-Order Fine-Tuning is an Inference Workload
标题:LLM零阶微调是一个推理程序
链接:https://arxiv.org/abs/2605.28760
备注:12 pages, 4 figures, 3 tables, including appendix and references
摘要:Zeroth-order (ZO) fine-tuning is attractive for large language models because it replaces backpropagation with forward objective evaluations. Existing implementations nevertheless execute ZO algorithms inside conventional training loops, even though their dominant work is repeated scoring under nearby parameter states. This creates a workload-runtime mismatch: the algorithm asks for structured inference-style scoring, while the system exposes a sequence of fragmented training-loop steps. We show that LLM ZO fine-tuning is an inference-dominated workload and execute its repeated scoring phase through a serving runtime. On OPT-13B SST-2, the resulting vLLM execution path completes the 20k-step LoZO run in 0.51 estimated training hours versus 4.15 hours for the official LoZO baseline under the matched LoRA-only setting, an 8.13x speedup, while reaching 0.922 final evaluation accuracy and 0.931 final full-validation accuracy. In core-step scaling experiments across OPT-1.3B to OPT-13B, the same runtime reorganization gives 2.34x--7.72x speedups. A MeZO-style high-rank factorized experiment shows that the same runtime paradigm can track a MeZO-like loss trajectory while running up to 2.55x faster. More broadly, representing ZO updates as dynamic adapter states suggests a practical path toward inference-time training, where lightweight adaptation can be scheduled as an inference-like workload rather than as a separate training job.
【3】MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems
标题:MemTrace:大型语言模型存储系统中的错误跟踪和归因
链接:https://arxiv.org/abs/2605.28732
备注:Ongoing work
摘要:Memory is essential for enabling large language models to support long-horizon reasoning, yet existing memory systems remain unreliable and difficult to debug. Tracing memory's dynamic evolution is crucial to understand how information is synthesized, propagated, or corrupted over time. In this work, we study the new problem of error tracing and attribution in LLM memory systems. We propose a novel framework that transforms memory pipelines into executable memory evolution graphs, enabling fine-grained tracing of operational information flow. We then construct MemTraceBench, a benchmark collected from representative memory systems such as Long-Context, RAG, Mem0, and EverMemOS, to systematically study memory failure modes. We further introduce an automatic attribution method that iteratively traces operation subgraphs to pinpoint the root cause of any failed case. Our analysis reveals that memory failures are systematic, stemming from operation-level issues like information loss and retrieval misalignment. Crucially, we leverage these fine-grained attribution signals to guide downstream prompt optimization, establishing a closed-loop system that automatically corrects faults and boosts end-task performance by up to 7.62%. Code will be released at https://github.com/zjunlp/MemTrace.
【4】SARAD: LLM-Based Safety-Aware Hybrid Reinforcement Learning with Collision Prediction for Autonomous Driving
标题:SARAD:基于LLM的安全感知混合强化学习与碰撞预测的自动驾驶
链接:https://arxiv.org/abs/2605.28583
备注:7 pages, 4 figures, accepted by IJCNN 2026
摘要:
【5】Efficient Pre-Training of LLMs through Truncated SVD Layers
标题:通过截短的DID层对LLM进行高效预训练
链接:https://arxiv.org/abs/2605.28573
摘要:
【6】Cultural Binding Heads in Language Models
标题:语言模型中的文化约束力
链接:https://arxiv.org/abs/2605.28543
摘要:
【7】Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification
标题:功能性熵:通过不确定性量化预测LLM生成代码中的功能性正确性
链接:https://arxiv.org/abs/2605.28500
摘要
:Large language models have shown impressive capabilities in code generation, yet they often produce functionally incorrect code. Uncertainty quantification (UQ) methods have emerged as a promising approach for detecting hallucinations in natural language generation, but their effectiveness for code generation tasks remains underexplored. We systematically evaluate how UQ techniques transfer to code generation across three programming languages, five LLMs, and over 1,700 problems. We find that some token-probability-based methods generalize effectively without modification, while sampling-based methods relying on natural language inference (NLI) fail because NLI models cannot distinguish functionally different code, causing most responses to collapse into a single semantic cluster. To address this, we introduce functional equivalence methods, a family of code-specific methods that replace NLI-based semantic equivalence with an LLM-based functional equivalence assessment, including functional entropy, a code-specific analog of semantic entropy. Functional equivalence methods achieve top AUROC in 11 out of 15 model-benchmark combinations and the best calibration across most settings, consistently outperforming both NLI-based counterparts and all other methods evaluated.
【8】Learning the Error Patterns of Language Models
标题:学习语言模型的错误模式
链接:https://arxiv.org/abs/2605.28328
摘要:When generating outputs for domains with specific validity constraints (e.g., a program should compile), LLMs often fail in a small number of focused ways: for example, by using Python function names when generating TypeScript. We observe that these error patterns can be represented using a small number of constraints that can be learned in practice. We propose \emph{prefix filters}, which are per-domain-and-LLM symbolic functions, as objects to capture the error patterns, Palla as an algorithm to learn prefix filters efficiently in practice, and implement Palla. Prefix filters learned by Palla i) help us quantitatively analyze the error patterns of LLMs, and ii) can be used to constrain the outputs of a model via constrained sampling algorithms. For example, Palla boosts compile rates for Qwen2.5-1.5B on TypeScript generation, by over 60%, allowing Qwen2.5-1.5B to achieve similar performance to Llama3.1-8B unconstrained.
【9】How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving
标题:分解能走多远?高效MoE LLM服务的注意力-FFN分解的设计空间探索
链接:https://arxiv.org/abs/2605.28302
摘要:Modern large language model (LLM) inference has progressively disaggregated to keep pace with growing model sizes and tight TTFT and TPOT service-level objectives: from chunked-prefill aggregation, to prefill-decode (P/D) disaggregation, and most recently to operator-level Attention-FFN Disaggregation (AFD). This trend is especially important for mixture-of-experts (MoE) models, where memory-bound attention, compute-intensive expert FFNs, and MoE dispatch/combine communication create distinct resource demands. AFD further exposes this heterogeneity by placing attention and MoE-FFN execution on separate GPU groups. Each level of disaggregation deepens the scheduling design space across workload characteristics, resource allocation, and interconnect topology, raising the central question: when does each level actually pay off? We systematically characterize this trade-off for MoE inference across realistic workloads spanning input/output sequence lengths, prefix-KV reuse, and per-user latency constraints. Using chunked-prefill and P/D disaggregation as baselines, we study the benefits and limits of AFD at scale through a framework that fuses on-device kernel measurements with high-fidelity network simulation. Under strict TTFT/TPOT SLOs, AFD sustains around 4k tokens/s of system throughput on DeepSeek-V3.2 across chat, coding, and agentic-coding workloads, where non-AFD deployments are infeasible. We distill concrete takeaways for jointly optimizing throughput and interactivity, including how to partition attention and FFN across GPUs as a function of workload and model architecture, providing design principles for current rack- and cluster-scale deployments as well as future disaggregated AI infrastructure.
【10】Pruning and Distilling Mixture-of-Experts into Dense Language Models
标题:将混合专家修剪和提炼为密集语言模型
链接:https://arxiv.org/abs/2605.28207
摘要:Mixture-of-Experts (MoE) is now the dominant architecture for frontier language models, yet it requires all expert parameters to be loaded in memory, making it less preferable for memory-constrained deployment. Existing compression methods reduce the number of experts but the output remains an MoE model with the same fundamental limitation. We present the first systematic framework for converting a trained MoE into a standard fully dense architecture: experts are scored, selected, and grouped, then concatenated into a dense FFN and refined by knowledge distillation from the MoE teacher. We evaluate 7 scoring, 5 grouping, and 2 magnitude scaling methods across a range of selected expert counts on Qwen3-30B-A3B, yielding 350 configurations. We find that the choice of scoring method is the most impactful, with our novel diversity-aware scoring consistently outperforming prior methods on Qwen3-30B-A3B, DeepSeek-V2-Lite, and GPT-OSS-20B. Under a controlled comparison at matched parameter count, MoE-to-dense outperforms dense-to-dense pruning by +6.3 pp in average downstream accuracy after ~4B-token distillation at 1.6x faster training wall-clock speed.
【11】Extracting Small Translation Specialists from LLMs by Aggressively Pruning Experts
标题:通过积极精简专家从LLM中提取小型翻译专家
链接:https://arxiv.org/abs/2605.28042
摘要:Modern large language models (LLMs) achieve state-of-the-art machine translation performance, but they do so as broad generalists largely trained for many tasks and capabilities unrelated to translation. Thus, they are heavily overparameterized for this task, resulting in excessive memory and compute requirements. In this paper, we present a method for aggressively pruning experts from modern mixture-of-experts LLMs while incurring negligible degradation in translation quality. Our approach exploits expert specialization and the separability of multilingual capabilities in LLMs to identify experts irrelevant to translation. And because of the modular nature of MoEs, these can be easily pruned without any training. Without retraining, we are able to prune half of all experts with negligible degradation and 70% with only minor losses. With a very short SFT, we prune 75% of experts while recovering baseline performance, and in some settings remove nearly 90% while maintaining reasonable translation quality. Overall, our results show that translation requires only a fraction of the LLM, enabling substantial compression of the MoE blocks that contain over 90% of parameters.
【12】ROSD: Reflective On-Policy Self-Distillation for Language Model Reasoning across Domains
标题:ROSD:跨领域语言模型推理的反思性政策自蒸馏
链接:https://arxiv.org/abs/2605.28014
备注:Preprint
摘要
:On-policy self-distillation (OPSD) improves the reasoning performance of large language models (LLMs) by providing dense token-level supervision for on-policy rollouts. However, existing OPSD methods often yield limited gains on in-domain reasoning and generalize poorly to out-of-domain problems. We identify two key causes: conditioning the self-teacher on a verified solution encourages imitation of training-domain reference trajectories rather than error-specific correction, and applying distillation to the full response can overwrite valid reasoning prefixes and reinforce overfitting. We propose Reflective On-policy Self-Distillation (ROSD), a framework that turns reference-solution imitation into targeted reasoning correction through reflection-guided, error-localized distillation. For each rollout, ROSD uses a self-reflector to extract a corrective idea and locate the first erroneous span. The corrective idea guides the self-teacher toward targeted supervision, while the localized error span restricts distillation to where correction is needed. This design corrects flawed reasoning while preserving valid prefixes. Experiments on multiple in-domain and out-of-domain reasoning benchmarks show that ROSD yields stronger in-domain reasoning performance overall and substantially better out-of-domain generalization than standard OPSD. Code is available at https://github.com/ZiqiZhao1/ROSD.
【13】MemGuard: Preventing Memory Contamination in Long-Term Memory-Augmented Large Language Models
标题:MemGuard:防止长期内存增强大型语言模型中的内存污染
链接:https://arxiv.org/abs/2605.28009
摘要:Memory-augmented large language models extend reasoning beyond a fixed context window by maintaining long-term memory across interactions. However, existing memory systems often collapse stable user facts, episodic events, and behavioral rules into a shared space, allowing functionally distinct memories to be retrieved and used as interchangeable evidence. We identify this failure mode as heterogeneous memory contamination, where context-specific events become overgeneralized claims, or semantically relevant but functionally incompatible memories mislead generation. To this end, we introduce MemGuard, a type-aware memory framework that preserves functional memory boundaries during memory construction and retrieval. It assigns each memory an explicit functional role at write time, maintains relations across type-isolated memories, and selectively composes evidence only from necessary memory types, reducing contamination from irrelevant or functionally incompatible evidence. Across hallucination and long-horizon conversation benchmarks, MemGuard improves memory reliability by up to 28.27% while retrieving up to 5.8x fewer memory tokens than prior methods. These results suggest that reliable long-term reasoning depends on principled organization and selective use of heterogeneous memory.
【14】Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training
标题:压缩思想:压缩推理数据何时以及如何在LLM后训练中发挥作用
链接:https://arxiv.org/abs/2605.28008
摘要:Large language models (LLMs) can now solve complex problems through long chain-of-thought (CoT) reasoning, but the trade-off between performance and token cost remains a central challenge. To address this issue, supervised fine-tuning (SFT) often uses compressed reasoning data, where CoT traces are shortened into compact forms. However, the effect of such compressed reasoning data on post-training remains poorly understood. In this paper, we propose a taxonomy of CoT consisting of Explicit CoT, which outputs all operations without aggregation, Composed CoT, which combines multiple operations into a single step, and Implicit CoT, which omits intermediate operations. We construct a synthetic compositional reasoning task that allows controlled variation of difficulty, compression granularity, and data size, and conducted a comprehensive set of experiments across different model families and sizes. Notably, we find that (i) coarser CoT requires more SFT data, (ii) compared with Explicit CoT, Composed CoT and Implicit CoT benefit more from data scaling, while Composed CoT benefits from data repetition and Implicit CoT tends to lead to memorization, (iii) unlike SFT, subsequent reinforcement learning (RL) with verifiable rewards (RLVR) decomposes compressed steps learned during SFT, and (iv) unidirectional CoT ordering shows stronger generalization on longer sequential tasks. Our findings provide implications for CoT design under data resource constraints and offer important insights into the mechanisms of SFT and RL in LLM post-training.
【15】Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models
标题:毒性存在于哪里?语言模型中的机械定位和有针对性抑制
链接:https://arxiv.org/abs/2605.27997
摘要:Large language models frequently generate toxic, hateful, or harmful content, yet existing mitigation methods rely on costly retraining or output-level filtering with no mechanistic insight into where toxicity originates internally. We introduce Meow2X and TRNE, two complementary retraining-free frameworks that localize toxicity to specific layers and neurons by analyzing activation differentials between toxic and neutral prompts, then suppress them via inference-time scaling or minimal rank-one weight edits -- without any gradient descent. Evaluations across five LMs, two benchmarks, and 90 configurations using dual safety evaluators demonstrate consistent toxicity reduction while preserving language modeling quality. Our analysis reveals that toxicity is disproportionately encoded in early MLP layers, varies across architectures, and is systematically underestimated by single-evaluator setups -- underscoring the need for multi-evaluator safety assessment. By bridging mechanistic interpretability with practical detoxification, our framework offers a principled path toward safer, more transparent language models.
【16】Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations
标题:LLM中的压力测试欺骗探针:缩放、稳健性和欺骗性表示的几何
链接:https://arxiv.org/abs/2605.27958
备注:Accepted at the GEM Workshop @ ACL 2026
摘要:Linear probes trained on LLM activations are increasingly proposed as deception-detection metrics, yet report AUROC exceeding 0.96 on clean benchmarks while collapsing under distributional shift. This paper systematically pressure-tests probe-based metrics across the Gemma 3 model family (1B-27B parameters), diagnosing why they fail rather than merely documenting that they fail. We test four hypotheses about deception encoding: (1) single linear direction, (2) multi-dimensional subspace, (3) convex conic hull, (4) entropy proxy. Our design includes cross-domain transfer matrices, multi-dimensional probe analysis with permutation null baselines, entropy-residualization tests, and distractor evaluations across 8 stylistic shifts. We find that: (a) probes achieve near-perfect AUROC (>=0.998) on clean data but collapse under stylistic shifts; style-augmented probes recover near-perfect detection (mean AUROC 0.979-0.983) on unseen styles; (b) the single-direction hypothesis is rejected (k=1 captures only 0.61-0.80 AUROC), with cross-domain transfer failure confirmed as geometric rather than layer-mismatch-driven; (c) the entropy-proxy hypothesis is rejected (max |rho|=0.454, max Delta-AUROC after residualization=0.004); and (d) deception does not form a significant linear subspace (per-domain k*=0), yet multi-dimensional probes (k>=5) recover the signal through distributed sub-threshold features. Probe fragility reflects distributional narrowness rather than an architectural limitation: style-augmented probes recover near-perfect detection at both 4B and 27B, establishing that the inverse scaling pattern is a training-distribution artifact rather than a genuine scale-dependent phenomenon.
【17】Where LLM Annotators Fail: Label-Free Learning on Graphs with LLMs
标题:LLM注释者失败的地方:利用LLM进行图形无标签学习
链接:https://arxiv.org/abs/2605.27913
摘要:Node classification on graphs often requires labeled nodes, yet obtaining labels at graph scale is expensive. When node attributes contain semantic content, such as paper abstracts, web pages, or product descriptions, large language models (LLMs) can provide low-cost supervision by annotating a small subset of nodes. However, these LLM-generated labels are noisy, and existing label-free graph learning methods usually treat this noise as either global or class-conditional. We find that LLM annotation errors are not only class-dependent but also region-dependent: within the same class, reliability can vary sharply across feature-space clusters. In light of this, we propose Cluster-Aware Noise Estimation (CANE), a label-free learning framework that estimates cluster-conditional LLM reliability without ground truth labels, and uses this estimate to decide which pseudo-labels to trust, and which labels to correct. Across various graph benchmarks and GNN backbones, CANE improves over the strongest label-free baselines, with the largest gains on datasets exhibiting stronger cluster-conditional noise.
【18】Knowing When to Ask: Segment-Level Credit Assignment for LLM Tool Use
标题:知道何时询问:LLM工具使用的分段级学分分配
链接:https://arxiv.org/abs/2605.27788
摘要:Humans know when to reach for help e.g. $347 \times 28$ warrants a calculator while $2+2$ does not. Language models do not. Prompt-based approaches can instruct a model when to invoke tools, but this scaffolding does not teach it to recognize the boundary of its own knowledge. RL approaches that assign a single outcome reward to the whole trajectory fare no better: trajectory-level credit cannot isolate which tool call in a successful episode actually helped, nor penalize unnecessary calls. We propose \textbf{CARL} (\textbf{C}ompetence-\textbf{A}ware \textbf{R}einforcement \textbf{L}earning), which trains a critic on the model's own rollouts to learn where parametric knowledge suffices and where it needs external help. By decomposing each rollout at natural tool-use boundaries (e.g., code fence delimiters and context block transitions), CARL assigns independent credit to each segment from a single binary outcome, without external judges or step-level annotations. As a result, erroneous tool calls, incorrect extractions, and unnecessary calls each receive appropriately signed advantages. The trained critic captures the model's domain competence: it separates parametrically solvable from tool-dependent questions with AUC 0.93 at 7B. On five benchmarks spanning arithmetic, multi-hop factual QA, and numerical reasoning over financial tables, CARL improves exact-match accuracy by 6.7 points at 7B and 9.7 points at 3B over the best RL baseline, with the largest gain (+8.3 EM at 7B, +9.0 EM at 3B) on Musique. The model issues 53\% fewer tool calls on parametrically answerable questions while remaining ${\sim}10$ EM points more accurate on them. Gains are largest at small scale: the 3B improvement is $1.4\times$ the 7B improvement, suggesting that knowing when to ask disproportionately benefits models with smaller parametric memory.
【19】Locality-Aware Redundancy Pruning for LLM Depth Compression
标题:LLM深度压缩的本地感知冗余修剪
链接:https://arxiv.org/abs/2605.27786
摘要:Large language models are known to contain representational redundancy across network depth, making depth pruning an effective approach for improving inference efficiency. Existing one-shot pruning methods rely on local layer importance or fixed redundancy assumptions across architectures. We propose Locality-Aware Redundancy Pruning (LoRP), a training-free one-shot depth pruning framework guided by representation locality. We show that inter-layer redundancy can be either localized or globally distributed depending on the LLM architecture. To characterize this phenomenon, we introduce Representation Locality Score (RLS), derived from global inter-layer hidden-state similarity. Using a small calibration set, LoRP computes pairwise layer similarity, clusters layers by representational similarity, and allocates pruning according to residual intra-cluster redundancy. Experiments across diverse LLM families show improvements in both perplexity and downstream task accuracy.
【20】Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox
标题:音频LL是听还是读?使用VoxPartridge分析和缓解副语言障碍
链接:https://arxiv.org/abs/2605.27772
备注:Accepted as a conference paper at ICML 2026. Project page: https://voxparadox.github.io/
摘要:Audio large language models (Audio LLMs) demonstrate strong performance on speech understanding tasks, yet their ability to understand paralinguistic information remains limited. To systematically quantify this issue, we introduce VoxParadox, an adversarial benchmark with 2,000 verified examples, spanning 10 paralinguistic tasks, created with controlled speech synthesis to intentionally mismatch transcript claims and speaking style, enabling direct measurement of speech paralinguistic understanding. Evaluation of a diverse set of Audio LLMs reveals consistently low accuracy on acoustic ground truth and a strong tendency to follow language-implied (incorrect) answers. To understand the cause of this gap, we perform layer-wise probing and find that (i) paralinguistic cues can degrade in deeper encoder layers and at the encoder--LLM interface, and (ii) even when such cues are available in audio tokens, the language model frequently ignores them. To address these problems, we propose Prompt-Conditioned Layer Mixer (PCLM), which adaptively combines information from multiple audio layers based on the input prompt, and pair it with Direct Preference Optimization (DPO) to explicitly prefer acoustically supported options over language-implied alternatives. These methods substantially improve Audio LLM paralinguistic understanding, improving Audio Flamingo 3 from 17.40% to 65.20% on VoxParadox, and from 37.74% to 54.78% on MMSU paralinguistic subset. Our project page is available at https://voxparadox.github.io/.
【21】Restoring the Sweet Spot: Pass-Rate Weighted Self-Distillation for LLM Reasoning
标题:恢复最佳点:LLM推理的通过率加权自蒸馏
链接:https://arxiv.org/abs/2605.27765
备注:18 pages, 8 figures
摘要:Self-Distillation Policy Optimization (SDPO) provides dense token-level credit assignment for reinforcement learning with large language models by leveraging the model's own feedback-conditioned predictions as a self-teacher. Unlike GRPO, however, whose group-relative advantage naturally concentrates learning on a sweet spot of intermediate-difficulty questions, SDPO's KL-based advantage lacks an implicit notion of difficulty awareness. We analyze this gap through the lens of GRPO's advantage normalization. Extending the learnability framework to normalized rewards, we show that normalization absorbs the variance term $p(1-p)$, equalizing leading-order learnability across questions and leaving $\sqrt{p(1-p)}$ as the sole residual scaling factor in the per-question gradient. This analysis yields a simple prescription: weight each question's SDPO loss by $[\hat{p}(1-\hat{p})]^{1/2}$, resulting in SC-SDPO, a scale-consistent variant of SDPO. The proposed weights are obtained as a zero-cost byproduct of on-policy rollouts with batch-adaptive normalization, inducing an implicit curriculum that dynamically tracks the model's evolving competence. Experiments on scientific reasoning and tool-use benchmarks demonstrate that SC-SDPO consistently improves over SDPO, yielding gains of +3.2/+4.3 (mean@16/maj@16) on Qwen3-8B and +1.8/+3.0 on OLMo-3-7B, while preserving stable training dynamics throughout optimization.
【22】A Paired Testing Protocol for Batch-Conditioned Refusal Robustness in LLM Serving
标题:LLM服务中批量条件拒绝稳健性的配对测试方案
链接:https://arxiv.org/abs/2605.27763
备注:12 pages. Accepted to the ICML 2026 Workshop on Hypothesis Testing
摘要:Safety evaluations of language models often treat serving configuration as fixed background infrastructure, but batch condition is an untested treatment variable whenever the same prompt may be evaluated alone, in a synchronized batch, or inside a continuous-batching scheduler. We synthesize four artifact-backed studies into a paired testing protocol: Study A combines local discovery, scorer-corrected adjudication, and true-batching confirmation; Study B tests cross-model generalization; Study C tests continuous-batch composition; and Study D runs a batch-invariant-kernel ablation. The local test finds safety-label changes more often than capability-label changes (0.51% vs. 0.14%), but adjudication of 63 candidate rows leaves only 17 genuine behavioral flips, implying a corrected full-set rate of 0.16%. The 15-model extension finds no detectable universal safety-over-capability skew: flips are near parity (0.94x), alignment type has no detectable association ($p=0.942$, $η^2=0.033$), and output instability is the strongest tested fragility screen ($r=0.909$, bootstrap 95% CI [0.65, 0.97]). In the targeted kernel ablation, standard vLLM reproduces 22/55 label flips on current score-flip candidates, while enabling VLLM_BATCH_INVARIANT=1 reduces the same test to 0/55 flips; the composition test separately finds no aggregate effect at 4.7pp sensitivity. The testing recommendation is exact-stack validation: evaluate refusal at the served batch setting, pair safety prompts with capability controls, and report low-rate directional flips separately from aggregate null effects.
【23】TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling
标题:TRACES:通过轨迹状态建模对多回合LLM代理进行主动安全审计
链接:https://arxiv.org/abs/2605.27690
摘要:LLM agents increasingly operate through multi-turn tool use and environment interaction, where safety risks often emerge from intermediate steps long before they surface in the final outcome. Reactive auditing is therefore insufficient: post-hoc diagnosis frequently misses the chance to flag risks while they are unfolding. We propose TRACES, a representation-based proactive auditor that learns prefix-level trajectory risk states from the hidden representations of an observer LLM. TRACES induces latent mechanism features from step representations and models their temporal evolution to estimate whether a partial trajectory is drifting toward unsafe behavior. To sidestep the cost and ambiguity of step-level risk annotation, TRACES is trained with weak trajectory-level supervision while still producing dense prefix-level risk estimates. Across multiple agent safety benchmarks, TRACES improves both full-trajectory safety prediction and proactive risk discrimination. Our analyses further suggest that these risk states can help train a safer agent, highlighting the broader potential of proactive auditing for long-horizon agent safety.
【24】Heterogeneous Parallelism for Multimodal Large Language Model Training
标题:多模式大型语言模型训练的异类并行主义
链接:https://arxiv.org/abs/2605.27678
摘要:Foundation model training is becoming multimodal, from post-training pipelines to large-scale pretraining. As modality coverage broadens, context windows grow, and encoder LLM scales diverge, a single LLM-centric TP/CP/PP/DP/EP layout increasingly limits throughput. This coupling forces encoders to inherit LLM-driven sharding and placement choices that can add communication, limit encoder parallelism, or constrain the LLM schedule; the mismatch is most pronounced at long contexts, where LLM context parallelism is needed for the fused multimodal sequence but encoder inputs remain bounded. We present heterogeneous parallelism for multimodal large language model training, an abstraction that lets modules in one end-to-end graph use independent layouts and rank placements, supporting colocated execution on shared GPUs and non-colocated execution on disjoint rank sets. The key challenge is preserving boundary tensor semantics across independent layouts: forward activations must be materialized for the destination layout, while backward gradients must be routed back to the source layout. We address this with boundary communicators that implement forward and backward layout transforms, plus scheduling extensions for both placement modes. We evaluate optimized homogeneous, colocated heterogeneous, and non-colocated heterogeneous configurations across multimodal workloads and GPU scales to characterize when added layout and placement freedom exposes a better operating point. Across this sweep, colocated heterogeneity improves TFLOPS/GPU by up to 49.3%, while non-colocated heterogeneity improves aggregate token throughput by up to 13.0% and TFLOPS/GPU by up to 9.6%. We validate loss convergence parity against homogeneous baselines and release the system as an open-source Megatron-LM extension.
【25】Aligning LLMs with Human Uncertainty: A Beta-Bernoulli Calibrator for LLM Forecasting
标题:将LLM与人类不确定性保持一致:LLM预测的Beta-Bernoulli校准器
链接:https://arxiv.org/abs/2605.27668
摘要:Probabilistic forecasting estimates the likelihood of uncertain future events. To improve LLM forecasting, existing methods typically learn from binary outcomes to output verbalized forecasts. However, while aggregated human forecasts contain rich information in both the crowd probability estimate and the degree of agreement among forecasters, how to utilize these signals remains underexplored. To address this, we propose the Beta-Bernoulli Calibrator (BBC), which converts an initial point estimate forecast from any model into a distribution over event likelihood, using supervision from both binary outcomes and human forecasts. BBC models event likelihood $p \sim \text{Beta}(α, β)$ and outcome $y \sim \text{Bernoulli}(p)$, with the mean as the calibrated point forecast and the variance as the epistemic uncertainty. Our results show that BBC generally provides better calibrated and more accurate forecasts than both traditional post-hoc calibration methods and models fine-tuned specifically for forecasting, while remaining lightweight and having good generalization. We also show that the epistemic uncertainty captured by BBC is a more reliable predictor of forecasting error than verbalized confidence.
【26】Disentangling Language Roles in Multilingual LLM Task Execution
标题:理清多语言LLM任务执行中的语言角色
链接:https://arxiv.org/abs/2605.27649
摘要
:Multilingual LLMs are increasingly used when instruction, source content, and required response languages do not coincide. Existing benchmarks have expanded multilingual instruction-following evaluation, but they rarely isolate these three roles within a fully crossed design. We introduce MTM-Bench, a controlled benchmark for language-conditioned task execution in which each instance is defined by a triplet \((L_{\text{instr}}, L_{\text{content}}, L_{\text{resp}})\). Across English, Spanish, and Chinese, MTM-Bench enumerates all 27 triplets and contains 2{,}430 instances per model across semantic reversal, final-state extraction, and language purity with update realization. We evaluate 20 frontier and open-weight LLMs using decomposed metrics for semantic correctness, target-language adherence, constraint satisfaction, contamination ratio, and joint success, with scoring validated by a targeted human audit. The fully crossed design reveals that degradation is organized by the role a language occupies in the task structure, not merely by mismatch count. The response-language role is the dominant axis of variation, and a single response-slot mismatch accounts for most degradation. The response-only and full-mismatch comparison suggests that mismatch count is not a monotonic predictor of difficulty, with model-level ordering varying across systems. Task families fail through distinct channels, showing that semantic correctness alone does not capture reliable multilingual task execution.
【27】Learning to Translate from Soft to Hard LLM Prompts
标题:学习从软到硬LLM论文翻译
链接:https://arxiv.org/abs/2605.27642
备注:8 Pages, 11 tables, 4 Figures
摘要:Soft prompt tuning is a parameter-efficient method for adapting LLMs to specific tasks, but suffers from a lack of interpretability. Building on recent work on interpreting soft prompts (Ramati et al., 2024), we explore how training a dedicated soft prompt to natural language translation model can yield higher translation quality. In particular, in both quantitative and qualitative comparisons on multiple Datasets of Datasets (DoDs), we demonstrate that our translator produces fluent, accurate verbalizations that outperforms existing training-free methods like InSPEcT. In addition to advancing interpretability, our work suggests a promising downstream application: soft prompts optimized on small, open-source models can be translated into portable text prompts that, when deployed on larger closed-API models, exceed the performance of the original soft prompt and, in some cases, even few-shot learning.
【28】Poison with Style: A Practical Poisoning Attack on Code Large Language Models
标题:带有风格的毒药:对代码大型语言模型的实用毒药攻击
链接:https://arxiv.org/abs/2605.27631
备注:Accepted to the Forty-Third International Conference on Machine Learning 2026 (ICML 2026)
摘要:Code Large Language Models (CLLMs) serve as the core of modern code agents, enabling developers to automate complex software development tasks. In this paper, we present Poison-with-Style (PwS), a practical and stealthy model poisoning attack targeting CLLMs. Unlike prior attacks that assume an active adversary capable of directly embedding explicit triggers (e.g., specific words) into developers' prompts during inference, PwS leverages developers' code styles as covert triggers implicitly embedded within their prompts. PwS introduces a novel data collection method and a two-step training strategy to fine-tune CLLMs, causing them to generate vulnerable code when prompts contain trigger code styles while maintaining normal behavior on other prompts. Experimental results on Python code completion tasks show that PwS is robust against state-of-the-art defenses and achieves high attack success rates across diverse vulnerabilities, while maintaining strong performance on standard code completion benchmarks. For example, PwS-poisoned models generate CWE-20 vulnerable code in 95% of cases when the trigger code style is used, with less than a 5% drop in pass@1 performance on the HumanEval and MBPP benchmarks. Our implementation and dataset are here: https://github.com/khangtran2020/pws.
【29】Gradient Transformer: Learning to Generate Updates for LLMs
标题:梯度Transformer:学习为LLM生成更新
链接:https://arxiv.org/abs/2605.27591
备注:Accepted at ICML 2026
摘要:Many organizations lack computational resources to fine-tune large language models (LLMs) on private (unshareable) data for better utility, while fine-tuning tiny language models (TinyLMs) alone performs poorly. To address this bottleneck, we propose a data-free knowledge distillation framework that generates LLM update vectors based on TinyLMs fine-tuned on private data. An update vector is a vector of parameter changes from an initial model to its fine-tuned version on a dataset, capturing the effect of cumulative gradient steps during fine-tuning. The key idea of our framework is a novel Gradient Transformer that transforms TinyLM's update vectors into LLM's update vectors. As derived from shadow datasets, Grad-Transformer captures the correlation between TinyLM and LLM update vectors, enabling third-party providers to generate LLM update vectors given the organization's TinyLM update vectors without accessing the organization's private data. The framework supports multi-organization collaboration to jointly update LLMs, improving performance and cost-efficiency. Extensive experiments across language modeling and reasoning tasks show that Grad-Transformer remarkably outperforms state-of-the-art knowledge distillation baselines, even under strict differential privacy protection.
【30】Detection Without Correction: A Two-Parameter Decomposition of Multi-Stage LLM Pipelines
标题:无需纠正的检测:多级LLM管道的两参数分解
链接:https://arxiv.org/abs/2605.27559
摘要:Multi-stage LLM pipelines that perform multi-agent debate, intrinsic self-correction, or retrieval-augmented verification exhibit puzzling aggregate behaviors: accuracy plateaus and reversals across rounds, non-replication of debate gains on contemporary frontier models, intrinsic self-correction degradation, and qualitative cross-provider divergence in debate dynamics. Downstream agent response can be operationalized as two coupled decisions: detection (whether to treat upstream content as authoritative) and conditional generation (what to produce if not). This decomposition yields four observable response regimes, of which detection-without-correction is the load-bearing failure mode. Across a nine-cell empirical grid spanning four model families, four benchmarks (GSM8K, MATH-500, GPQA-Diamond, AIME), and two methods (multi-agent debate, intrinsic self-correction), we find that the conditional miscorrection rate is consistently dominant (53-94% across cohorts) while detection rate varies contextually by more than an order of magnitude. The framework unifies the four phenomena above as signatures of a common mechanism and characterizes detection threshold as a stable model/protocol-level regularity that persists across methods at matched benchmark difficulty.
【31】HARP: Measuring Harm Amplification in Multi-Agent LLM Systems
标题:HARP:测量多智能体LLM系统中的危害放大
链接:https://arxiv.org/abs/2605.27489
备注:39 pages, 12 figures, 12 tables, and 1 algorithm
摘要
:Multi-agent LLM systems decompose workflows across agents, tools, shared context, memory, and decision gates. This modularity improves interpretability, but creates a propagation risk: a bounded perturbation to one component can be reused by other agents and amplified into system-level harm. We introduce HARP (Harm Amplification through Role Perturbation), a trace-first methodology for studying local-to-global harm amplification in multi-agent LLM systems. HARP compares paired clean and perturbed executions and records specialist outputs, tool calls, memory reads/writes, guard events, oracle logs, latency, token cost, and decisions. We define local harm as deviation from targeted agents or corrupted channels, global harm as deviation over the full trace, and harm amplification as (H_global/H_local). This complements attack success rate with a measure of how strongly orchestration spreads harm beyond the attack point. We instantiate HARP in a finance-oriented seven-agent system with a deterministic decision gate and configurable attack harness for specialist compromise, collusion, shared-context corruption, and temporal or memory-persistent attacks. Across five defenses, prompt-only defenses preserve benign utility but leave high success and stealth; pre-tool and step-level guards reduce some failures with utility or latency costs; and IntegrityGuard, a trace-consistency defense, achieves the lowest attack success and global harm but introduces utility/cost trade-offs. Results show that single-specialist compromise produces the strongest amplification, shared-context corruption yields the highest attack success, and temporal persistence produces the largest malicious impact. HARP argues that secure multi-agent evaluation must measure not only bypass, but propagation.
【32】Soft Specialists: $α$-Rényi Ensembles for Uncertainty-Aware LLM Post-Training
标题:软专家:$a $-Rényi Ensembles for Unsemty-Aware LLM post-Training
链接:https://arxiv.org/abs/2605.27747
摘要:Existing training approaches for large language models learn a single set of parameters, based on large volumes of data, which is typically heterogeneous, conflicting and often outright contradictory. As a result, the model is forced to compress conflicting goals, and inherent uncertainties into a single, averaged pattern of behaviour. We propose an $α$-Rényi variational framework for learning distributions over post-training parameters, offering an uncertainty-aware alternative to deep ensemble approaches. The resulting variational objective interpolates between classical variational Bayes and predictively oriented posterior learning, balancing between globally plausible individual models against systems of complementary specialists. We identify local stability criteria, demonstrating how model misspecification can make non-degenerate posterior spread locally favourable, manifesting contradictory or conflicting data as epistemic uncertainty. We apply our framework to LLM post-training, learning an ensemble of LoRA adapters attached to a shared, frozen base model, providing a scalable training procedure for both supervised fine-tuning and preference optimisation. Our approach enables training examples to be softly routed across ensemble members, promoting model specialisation and providing actionable uncertainty estimates across different tasks.
Graph相关(图学习|图神经网络|图优化等)(9篇)
【1】BIRDNet: Mining and Encoding Boolean Implication Knowledge Graphs as Interpretable Deep Neural Networks
标题:BIRDNet:挖掘和编码布尔隐含知识图作为可解释深度神经网络
链接:https://arxiv.org/abs/2605.28739
备注:5 pages; 1 figure, 4 tables
摘要:Tabular data in knowledge-rich domains often carries a latent prior in the form of Boolean implication relationships (BIRs) between pairs of features. We mine such relationships with a sparse-exception binomial test. The mined implications form a typed directed graph, equivalent to a propositional rule base of 2-literal clauses. We encode this graph as the connectivity of a layered neural network, called BIRDNet, in which each hidden unit corresponds to one mined rule and binds only to its two features. We show two consequences of this design: First, the architecture is sparse by construction: at most $2/d$ of the weights in each BIR layer are active, where $d$ is the input dimension. Second, the model is interpretable: every trained unit keeps a stable symbolic identity, so rules can be read off the network without surrogate models. Unlike most neurosymbolic models, BIRDNet does not consume an external rule base; its structural prior is mined from the data. We evaluate BIRDNet on six transcriptomic and proteomic benchmarks. Our results show that BIRDNet stays within 0.02 AUROC of the strongest dense baseline, at a small accuracy cost, while using up to $96\times$ fewer active parameters than an architecture-matched dense MLP. First-layer rules recover known biological signatures across multiple cancer subtypes and tissue types, including canonical amplicons, lineage-defining co-expression modules, and immune-infiltration markers. Data and code are available at: https://github.com/MAHI-Group/BIRDNet.
【2】Applications of temporal graph learning for predicting the dynamics of biological systems
标题:时态图学习在预测生物系统动态中的应用
链接:https://arxiv.org/abs/2605.28659
摘要:
【3】A Generalized Tikhonov Layer for Interpretable-by-design Graph Neural Networks
标题:设计可解释图神经网络的广义Tikhonov层
链接:https://arxiv.org/abs/2605.28578
摘要:
【4】T-GINEE: A Tensor-Based Multilayer Graph Representation Learning
标题:T-GINEE:基于张量的多层图表示学习
链接:https://arxiv.org/abs/2605.28300
备注:Accepted by ICML 2026
摘要:Traditional network analysis focuses on single-layer networks, real-world systems often form multilayer networks with multiple relationship types. However, existing methods typically fail to capture complex inter-layer dependencies by treating layers independently or aggregating them. To address this, we propose T-GINEE (Tensor-Based Generalized Multilayer-graph Estimating Equation), a statistical regularization framework combining tensor-based generalized estimating equations with task-specific loss to model cross-network correlations explicitly. Key innovations include: (1) CP tensor decomposition capturing structural dependencies via shared latent factors; (2) a generalized estimating equation framework modeling inter-layer correlations through working covariance matrices; and (3) a flexible link function accommodating characteristics like sparsity. Our theoretical analysis establishes consistency and asymptotic normality under mild conditions. Extensive experiments on synthetic and real-world datasets validate T-GINEE's effectiveness for multilayer network analysis.
【5】Robust Contrastive Graph Clustering with Adaptive Local-Global Integration
标题:具有自适应局部-全球集成的鲁棒对比图集群
链接:https://arxiv.org/abs/2605.28209
摘要
:Graph clustering is essential in graph analysis for revealing structural patterns and node communities. Despite recent advances in self-supervised contrastive learning that have improved clustering via structural and attribute signals, existing methods still struggle to flexibly capture high-order local structures and often overlook global semantics in complex graphs. These limitations lead to suboptimal node representations, especially in real-world graphs with fragmented structures and ambiguous cluster boundaries. To address these limitations, a contrastive graph clustering framework is proposed to jointly integrate multi-scale local structures with global semantics via attention mechanisms. At the local level, GNN-based topological signals extracted from multiple propagation depths are adaptively fused through attention-based weighting to capture multi-scale neighborhood features. At the global level, semantic prototypes derived from dynamically evolving cluster centers are adaptively aggregated through attention to guide node representations and enhance inter-cluster separability. The model is trained under a dual-view contrastive learning paradigm with a hybrid objective that combines instance-level and structure-aware losses to improve representation robustness and discrimination. Experiments on eight real-world graph datasets demonstrate that our method achieves competitive clustering performance. Code is available at https://github.com/vege12138/w2.
【6】Temporal Hyperbolic Graph Representation Learning for Scale-Free Internet Routing and Delay Prediction
标题:无标度互联网路由和延迟预测的时间双曲图表示学习
链接:https://arxiv.org/abs/2605.28155
摘要:Predicting Internet round-trip time (RTT) is critical for routing optimization, quality-of-service (QoS) provisioning, and traffic engineering, yet remains challenging due to long-term temporal dependencies, evolving routing dynamics, and heavy-tailed latency distributions. While Temporal Graph Neural Networks (TGNNs) can model evolving network topologies, most existing approaches operate in Euclidean space, which poorly captures the hierarchical and scale-free structure of Internet routing graphs. Hyperbolic geometry provides a more suitable representation space. We propose HERMIT (Hyperbolic Edge-aware RTT Modeling via Integrated Topology), a hybrid framework combining a hyperbolic manifold-preserving temporal GNN with a Random Forest regressor for joint link prediction and RTT prediction. Built on HMPTGN, HERMIT introduces RTT-aware edge features and a learnable edge encoder to improve modeling of evolving link states and routing behavior. The resulting hyperbolic node representations are combined with historical RTT statistics for robust latency prediction. We evaluate HERMIT on a large-scale real Internet dataset spanning 2015-2024. HERMIT consistently outperforms a strong Random Forest baseline using only historical RTT statistics, achieving a 6% RMSE improvement while reducing large errors on heavy-tailed samples. It also surpasses prior hyperbolic TGNN models, including HMPTGN and HTGN, in link prediction performance. These results demonstrate that combining hyperbolic temporal graph learning with tree-based regression provides a scalable solution for RTT prediction in real-world Internet topologies.
【7】Benchmarking Inductive Biases for Multivariate Time-Series Anomaly Detection with a Robust Multi-View Channel-Graph Detector
标题:使用稳健的多视图平行图检测器对多元时间序列异常检测的归纳偏差进行基准测试
链接:https://arxiv.org/abs/2605.28103
摘要:We present a unified experiment, analysis, and benchmark study of multivariate time-series (MTS) anomaly detection. Ten family-representative detectors -- spanning statistical, reconstruction, association, frequency, and generic-transformer families -- are evaluated on five datasets (SMD, MSL, SMAP, PSM, and MSDS) under effectiveness, efficiency, robustness, and cross-dataset generalisation. All methods share the same windowing, scoring, hardware, and metric protocols. Effectiveness, ablation, and robustness use three random seeds; cross-dataset transfer uses seed~0 because each extra seed requires $250$ source-target evaluations. The benchmark yields three method-independent findings: no single-bias baseline dominates; absolute perturbation VUS-ROC is more informative than retention ratios; and MSDS behaves as an event-dense deployment workload rather than a sparse point-anomaly benchmark. Under this protocol we also introduce \ours{}, an adaptive detector family combining a NOTEARS-constrained directed channel-graph view with optional patch-attention and temporal-association views. \ours{} achieves the best macro-average VUS-ROC ($0.675$, $+5.1$~pt over the second-best LSTM-AE), ranks first overall, and reaches the top-3 on all five datasets. Its wins on MSL and MSDS are narrow, while its average and robustness gains are larger: under the same three-seed robustness protocol for every method, it obtains the strongest absolute VUS-ROC across noise, channel dropout, and time-shift perturbations. We release the MSDS preprocessing protocol, configurations, scripts, and seed-level metric dumps.
【8】From Detection to Mechanism: Cross-Attention Graph Neural Networks Enable Drug-Drug Interaction Type Prediction An Ablation Study with Acetylsalicylic Acid Validation
标题:从检测到机制:交叉注意力图神经网络实现药物相互作用类型预测一项具有阿司匹林验证的消融研究
链接:https://arxiv.org/abs/2605.27861
备注:12 pages, 1 figure
摘要:Predicting whether two drugs interact (binary detection) is a substantially dif- ferent task from predicting the mechanism type of that interaction (multi-class classification). This study presents a systematic ablation study of three Graph Neural Network (GNN) architectures for drug-drug interaction (DDI) prediction on a publicly available benchmark dataset comprising 38,337 positive pairs across 86 interaction types. Three architectures are compared under identical training conditions (n = 61,339 pairs): a siamese dual Message Passing Neural Network (MPNN) with concatenation (Concat), a dual MPNN with four-head cross-attention (CrossAtt), and a ternary MPNN incorporating an interaction graph (Ternary). CrossAtt improves multi-class F1-macro by +0.186 absolute (+45%) over Concat, while improving binary AUC by only +0.012 (+1.3%) - confirming that atom-level inter-molecular communication specifically enables mechanism-type classification. The ternary architecture underperforms despite equivalent training data, with its failure consistent with a training instability hypothesis. Validation on ten acetylsali- cylic acid (ASA) drug pairs, held out prior to training, demonstrates 10/10 correct DDI-type predictions for CrossAtt versus 0/10 for Ternary. Two consistent failure cases are identified across all architectures, linking to structural limits established in a companion toxicity study.
【9】Detect by Yourself: Self-Designing Agentic Workflows for Few-Shot Graph Anomaly Detection
标题:自行检测:自行设计用于Few-Shot图异常检测的统计工作流程
链接:https://arxiv.org/abs/2605.27470
摘要:Graph anomaly detection aims to identify anomaly nodes in attributed graphs and plays an important role in real-world applications. However, existing graph anomaly detection methods still face two key challenges: 1) fixed pipelines, which restrict their adaptability across different graph tasks under limited supervision; 2) weak evidence, which prevents them from explicitly incorporating contextual and structural anomaly signals into the detection process. In this paper, we propose a novel framework, self-designing agentic workflows for few-shot graph anomaly detection (SignGAD). Specifically, we propose a novel paradigm that reformulates graph anomaly detection task from training a fixed anomaly detector to designing task-conditioned detection workflows. By constructing detection workflows, SignGAD selects suitable graph encodings and detector designs to exploit task-specific anomaly evidence. Meanwhile, we introduce a guarded final refit strategy to refine the selected workflow by calibrating refit acceptance, enhancing reliability under limited supervision. Extensive experiments conducted on several real-world datasets demonstrate that SignGAD achieves strong performance against state-of-the-art methods, highlighting its effectiveness on graph anomaly detection tasks.
Transformer(6篇)
【1】Transformers Provably Learn to Internalize Chain-of-Thought
标题:Transformer可以证明学会内化思想链
链接:https://arxiv.org/abs/2605.28600
摘要:
【2】Meta-Attention: Bayesian Per-Token Routing for Efficient Transformer Inference
标题:元注意力:有效的Transformer推理的Bayesian每令牌路由
链接:https://arxiv.org/abs/2605.28384
摘要:Standard transformer architectures apply a single attention mechanism uniformly across all tokens and sequence positions, irrespective of local context or computational budget. We propose Meta-Attention, a framework that dynamically routes each token to the most appropriate attention strategy -- full softmax attention, linear (kernel) attention, or sliding-window local attention -- via a Bayesian Meta-Controller. Unlike prior routing approaches that use deterministic or prior-free learned routing, the Meta-Controller treats per-token mechanism selection as posterior inference under a compute-aware Dirichlet prior: routing weights are the output of an amortised variational posterior q(alpha | x_t; phi) trained with an Evidence Lower Bound (ELBO) objective that jointly encodes task performance and attention-mechanism cost. This design produces principled routing uncertainty estimates that govern the soft-to-hard routing transition, mitigates routing collapse without ad hoc load-balancing losses, and yields better compute-performance trade-offs than deterministic or prior-free learned routing at negligible overhead. Phase 1 empirical results on a Tiny LM benchmark confirm core predictions: the Bayesian controller's learned routing distribution implies a projected normalised FLOP cost of 25.1% under hard routing, vs. 59.3% for the prior-free baseline (-34.2 pp), and reduces routing entropy from 55.8% to 43.3% (-12.5 pp), demonstrating that the Dirichlet prior prevents routing collapse while the non-Bayesian model defaults to full attention. We present the Bayesian architecture, ELBO training objective, and a Phase 1 PyTorch prototype validating forward-pass correctness, posterior diversity, and a controlled ablation against a prior-free baseline. Code available at: https://github.com/KFEAL/meta-attention
【3】Measure-to-measure Regression with Transformers
标题:与Transformer的测量到测量回归
链接:https://arxiv.org/abs/2605.28075
摘要:Many learning problems require predicting how populations evolve under an unknown transformation. A natural representation for such populations is a probability measure, with point clouds as a key example. In this work, we study the measure-to-measure (M2M) regression problem, in which one seeks to learn a map between probability measures from a finite collection of observed input-output pairs. In contrast to classical regression, where individual samples are transformed independently, M2M regression treats entire distributions as the data points. This perspective is vital in certain scientific applications, for example, cellular and molecular biology, where cells are known to evolve not as independent data points but as a collection. However, few existing approaches address the problem of M2M regression with sufficient expressivity and scalability. We present a formalization of nonlinear M2M regression and introduce two easy-to-use, expressive, and scalable approaches to learn such operators: transformers as static M2M maps and transformers as dynamic M2M velocity fields. Our approach leverages the natural measure-dependent and mean-field structure of transformers to learn nonlinear M2M maps on the space of probability distributions. We illustrate the effectiveness of our proposed method to generalize to unseen measures on synthetic experiments, interacting particle systems, and a large-scale patient-derived organoid dataset for predicting treatment response in colorectal cancer.
【4】ReSAE: Residualized Sparse Autoencoders for Multi-Layer Transformer Interventions
标题:ReAE:用于多层Transformer干预的剩余稀疏自动编码器
链接:https://arxiv.org/abs/2605.27819
摘要:Sparse autoencoders are usually trained one layer at a time, even though transformer residual stream activations are strongly coupled across depth. This creates a practical problem for multi-layer interventions: different layerwise dictionaries can spend capacity representing the same carried-forward information, and replacing several layers at once can produce interactions that are not predicted by single-layer behavior. We introduce Residualized Sparse Autoencoders (ReSAEs), which fit an affine map between selected layers and train each later-layer SAE on the unexplained residual rather than on the full activation. Reconstructions are mapped back into the original activation space through the fitted affine chain, so ReSAEs can be evaluated with the same intervention protocols as ordinary SAEs. On Pythia-1.4B and Gemma-2-9B, residualization reduces decoder redundancy and improves sparse probing and targeted perturbation in most tested settings. Despite reconstructing less of the raw activation variance, ReSAEs recover more transformer cross entropy under multi-layer replacement. This gain is clearest under teacher-forcing and at sufficient sparsity online, indicating that ReSAEs preserve the components of the activation most relevant to the model's downstream computation. These results suggest that removing linearly predictable cross-layer structure is a useful default for multi-layer SAE interventions.
【5】Fine-Tuning Dynamics of In-Context Factual Recall in Transformers
标题:Transformer中上下文事实回忆的微调动力学
链接:https://arxiv.org/abs/2605.27774
摘要:In-context learning \ -- performing tasks based on examples given in the prompt \ -- is an important capability that has emerged in large language models and has received significant attention in both theory and practice. Existing theoretical work often focuses on settings where the learning uses information purely from the prompt. However, many practical instances of in-context learning require the model to retrieve factual knowledge stored in the model's parameters, with the context serving to identify which knowledge is relevant. In this work, we study how in-context learning leverages factual knowledge recall. We formalize this behavior by introducing the \emph{in-context factual recall (IC-recall)} task, where a transformer is provided a context of (subject, answer) pairs generated from a hidden relation, along with a query subject, and must both infer this hidden relation and retrieve the corresponding answer. Factual knowledge is modeled by the transformer having access to a simple pre-constructed MLP associative memory storing (subject, relation, answer) triplets. We analyze the supervised fine-tuning dynamics of a one-layer transformer on IC-recall data and prove that the model successfully performs IC-recall by converging to a particular pairwise attention pattern. This fine-tuning stage requires a very small number of samples \ -- only polylogarithmic in the number of stored knowledge triplets. Experiments verify our theoretical predictions and show that the pairwise attention pattern emerges even when the MLP layer is pretrained instead of constructed.
【6】Generic Interpretation Approach for Transformer Models Incorporating Heterogenous Attention Structures
标题:描述异质注意结构的Transformer模型的通用解释方法
链接:https://arxiv.org/abs/2605.27458
摘要
:Transformer has significantly propelled the development of artificial intelligence, and certainly the development of agents as well. We categorize attention structures of Transformer into two types based on the source of the input information: homogenous and heterogenous attention structures. Heterogenous attention structures, with co-attention as a typical example, process information from different sources. Heterogenous attention structure is the foundation for Transformer models to achieve more complex functions and integrate more modal information. Whether for research purposes or policy requirements, the interpretation of Transformer models with heterogenous attention structures is an important task. The fusion of information from different sources brings new challenges. Our work mainly includes two parts: method and experimentation. In terms of method, we propose an interpretation method for Transformer models with heterogenous attention structures. In terms of experimentation, based on our experimental analysis paradigm, we interpret the operating mechanisms of representative models, conduct semantic interpretation and logical interpretation.
GAN|对抗|攻击|生成相关(15篇)
【1】Activation Steering for Synthetic Data Generation: The Role of Diversity in Downstream Safety Detection
标题:合成数据生成的激活引导:多样性在下游安全检测中的作用
链接:https://arxiv.org/abs/2605.28664
摘要:
【2】Universal Time Series Generation with Neural Controlled Differential Equations
标题:用神经控制方程生成通用时间序列
链接:https://arxiv.org/abs/2605.28507
摘要:Recent work on the sequence universality of State Space Models (SSMs) has introduced efficient, maximally expressive continuous-time approaches for time-series modelling. While these works focus on discriminative settings, we extend this perspective to generative time-series modelling by proving that maximally expressive Structured Linear Controlled Differential Equations (SLiCEs) are universal time-series generators, in the sense that they can approximate the induced path laws of continuous causal pushforwards on compact latent sets in $W_\infty$. Building on these theoretical results, we propose Generative SLiCEs (G-SLiCEs), a maximally expressive continuous-time model for flow matching on path-space. Empirically, we show that expressivity improves performance in probabilistic forecasting and downstream tasks, while retaining the advantages of continuous-time models such as generalising to arbitrary observation grids. This is particularly beneficial for irregular grids, where fixed-grid models often struggle.
【3】Mitigating Adaptive Attacks against Reasoning Models with Activation Consistency Training
标题:通过激活一致性训练缓解针对推理模型的自适应攻击
链接:https://arxiv.org/abs/2605.28467
摘要:As LLMs gain stronger reasoning capabilities, their extended chain-of-thought introduces new degrees of complexity for defending against adversarial jailbreaks and prompt injection. We study consistency training, a family of fine-tuning objectives that enforce identical behavior on clean prompts and adversarial rewrites, and evaluate its two main variants, output-level (BCT) and activation-level (ACT), across five reasoning models. We formulate both methods as a prompt injection defense and find ACT to be competitive with other training-based defenses while requiring only self-supervised pairs of clean and wrapped prompts. Our experiments also generalize both techniques within the jailbreak setting, demonstrating that ACT remains more robust to adaptive attacks. We also provide mechanistic evidence that ACT's defense against jailbreaks is encoded as a roughly linear shift in activation space at the assistant-turn boundary. After ACT training, we can recover a single steering direction that controls refusal on reasoning models with minimal effect on benign inputs. We find that ACT remains robust even when the model's chain-of-thought is replaced with a compliant trace from the undefended base model, pivoting to refuse prefilled jailbreaks. Together, these results suggest that supervising internal representations is a surprisingly effective and interpretable approach to various forms of safety training in reasoning models.
【4】Out of Sight, Not Out of Mind: Unveiling Latent Attack in Latent-based Multi-Agent Systems
标题:眼不见,而不是心不在焉:揭露基于潜伏的多智能体系统中的潜在攻击
链接:https://arxiv.org/abs/2605.28214
备注:27 pages, 7 figures, 3 tables. Preprint
摘要:Latent-based multi-agent systems replace parts of explicit inter-agent communication with hidden representations, offering a new direction for efficient and flexible agent collaboration. However, moving coordination into latent space may also move attacks beyond the reach of visible-text inspection. In this paper, we study whether latent states can carry attack-associated information that remains effective during clean executions. To examine this question, we introduce a latent attack framework that reactivates attack-induced effects through latent interventions without reusing adversarial text. Extensive experiments show that the resulting latent-only attacks can substantially degrade task performance in clean executions, especially when applied to inter-agent KV-cache handoffs rather than local hidden states. Further control analyses indicate that this degradation cannot be reduced to arbitrary perturbations or invalid generation. Overall, our findings suggest that latent-based collaboration does not remove attack risk. It shifts part of the risk into less observable execution states, calling for safeguards beyond visible-text inspection.
【5】Hierarchical Synthetic Tabular Data Generation: A Hybrid Top-Down and Bottom-Up Framework
标题:分层合成表格数据生成:自上而下和自下而上的混合框架
链接:https://arxiv.org/abs/2605.28198
备注:Accepted as a poster at FMSD @ ICML 2026. 9 pages, 6 figures
摘要:Existing approaches for synthetic tabular data generation are based on either purely generative models or LLMs, both of which struggle with data heterogeneity, logical consistency, rare-event coverage, and robustness in low-data regimes. In this paper, we propose a hierarchical hybrid top-down and bottom-up (H-TDBU) framework that decouples semantic structures from stochastic texture. In the top-down path, structure-driven logical constraints and cross-modal alignment rules are constructed, while in the bottom-up path, lightweight tabular generators are used to learn local statistical patterns from real data. The two paths are consolidated in a unified synthesis engine with an iterative feedback loop. We evaluate the framework on weak multimodal financial benchmarks combining tabular and sentiment-text data. Experimental results show that our H-TDBU approach improves train-synthetic-test-real performance over neural baseline methods while preserving semantic consistency. Our results suggest that hierarchical rule-guided synthesis provides an effective mechanism for combining controllability, semantic coherence, and statistical fidelity in synthetic data generation.
【6】No Safe Dose: How Training Data Drives Unsafe Image Generation
标题:无安全剂量:训练数据如何导致不安全的图像生成
链接:https://arxiv.org/abs/2605.28137
摘要
:Text-to-image models trained on large-scale data often inevitably ingest unsafe content. While some people observe input-output amplifications, it remains unclear whether and how training data composition directly drives model output safety or by other factors. We shed light on this question by isolating this variable: we train the same text-to-image model on datasets that differ \emph{only} in their fraction of unsafe images (0\% to 9.6\%), across several dataset scales (100K to 8M). Then we generate images with the resulting models, and evaluate them with four independent safety classifiers. Output unsafety rises monotonically from 16.6\% at 0\% contamination to 25.5\% at 5\%. A factorial design reveals that the \emph{proportion}, not the absolute count, of unsafe training images is the operative variable. The 16.6\% irreducible baseline at zero contamination implicates the other components, e.g. frozen text encoder, as a residual safety risk -- confirmed by a text encoder ablation showing that SafeCLIP reduces this floor to 9.6\%, while the dose-response effect persists across all three encoders tested. Critically, no quality degradation in terms of FID, CLIPscore and ImageReward accompanies safety filtering. These results establish that data curation and text encoder safety are complementary and independently effective interventions. At the same time, the remaining level of unsafety poses questions for future research about emerging capabilities and compositionality.
【7】SPARD: Defending Harmful Fine-Tuning Attack via Safety Projection with Relevance-Diversity Data Selection
标题:SPARD:通过安全投影和相关性多样性数据选择来防御有害的微调攻击
链接:https://arxiv.org/abs/2605.28030
备注:Accepted by ICML 2026
摘要:Fine-tuning large language models often undermines their safety alignment, a problem further amplified by harmful fine-tuning attacks in which adversarial data removes safeguards and induces unsafe behaviors. We propose SPARD, a defense framework that integrates Safety-Projected Alternating optimization with Relevance-Diversity aware data selection. SPARD employs SPAG, which optimizes alternatively between utility updates and explicit safety projections with a set of safe data to enforce safety constraints. To curate safe data, we introduce a Relevance-Diversity Determinantal Point Process to select compact safe data, balancing task relevance and safety coverage. Experiments on GSM8K and OpenBookQA under four harmful fine-tuning attacks demonstrate that SPARD consistently achieves the lowest average attack success rates, substantially outperforming state-of-the-art defense methods, while maintaining high task accuracy. Code is available at https://github.com/shuhao02/SPARD.
【8】FedEHR-Gen: Federated Synthetic Time-Series EHR Generation via Latent Space Alignment and Distribution-Aware Aggregation
标题:FedEHR-Gen:通过潜在空间对齐和分布感知聚合的联邦合成时间序列EHR生成
链接:https://arxiv.org/abs/2605.27892
备注:8 pages main paper with 14 pages supplementary appendix
摘要:Synthetic Electronic Health Record (EHR) generation provides a promising avenue for data augmentation and cross-hospital modeling in privacy-constrained healthcare settings. However, most existing EHR generative models are centralized and require pooling data across hospitals, which is often infeasible when real-world data sharing is restricted. While federated EHR generation offers a natural solution, direct federated modeling often collapses or diverges due to the high dimensionality, sparsity, and cross-hospital heterogeneity of EHR data. In this work, we propose FedEHR-Gen, the first federated framework for synthetic time-series EHR generation across distributed hospitals. FedEHR-Gen uses a two-stage learning paradigm. First, we introduce a federated autoencoder that projects high-dimensional and sparse EHR features onto a compact latent space. To ensure semantic consistency across hospitals, we develop a layer-wise matching aggregation mechanism that aligns local encoders into a unified global latent space. Second, operating on this aligned latent space, we train a federated temporal conditional variational autoencoder (TCVAE) with distribution-aware aggregation, enabling stable temporal generative modeling under severe cross-hospital heterogeneity. Extensive experiments on the eICU and MIMIC-III datasets demonstrate that FedEHR-Gen achieves generation fidelity, downstream utility, and privacy risk comparable to centralized training, while consistently outperforming the standard federated baseline.
【9】MRMMIA: Membership Inference Attacks on Memory in Chat Agents
标题:MRMMIA:对聊天代理中记忆的会员推断攻击
链接:https://arxiv.org/abs/2605.27825
备注:This work investigates the MIA on chat agent memory
摘要:Membership inference attacks (MIAs) test whether a target data record belongs to a system's private data, and have become a standard tool to measure privacy leakage in machine learning systems. Prior work has primarily focused on training corpora or retrieval databases. However, MIAs against agent memory have received less attention, even though such memory can contain sensitive user-agent interactions, retrieved facts, and user preferences. Therefore, in this work, we focus on chat agent memory MIAs, where an adversary infers whether a candidate memory unit belongs to the chat agent's memory store. We propose Multi-Recall Memory MIA (MRMMIA), a unified attack that utilizes multiple recall probes to the agent to extract the membership signal across black-box, gray-box, and white-box settings. Our experiments demonstrate that MRMMIA consistently outperforms baselines. Our results expose the privacy risk in agents and provide an initial evaluation framework for membership leakage in chat-agent memory systems.
【10】Density-aware Sample-specific Attack
标题:密度感知样本特定攻击
链接:https://arxiv.org/abs/2605.27809
备注:18 pages, 6 figures, 8 tables. Submitted to NeurIPS 2026
摘要:Despite recent progress in backdoor attacks, existing methods remain susceptible to post-training defenses that erase the backdoor through fine-tuning or pruning. We revisit the core objectives of backdoor attacks and derive principled criteria characterizing optimal sample-specific trigger construction under a Bayes-optimal model of the victim's training. Our analysis reveals that both attack success and clean-accuracy preservation are simultaneously optimized when triggered samples are steered into low-density regions of the clean data distribution, a distributional condition that controls all moments of the poisoned distribution at once rather than a handful of input-space summary statistics. We introduce a bilevel optimization framework that estimates density ratios via conditional time-score matching and optimizes a mixture-model objective to place triggered samples in these sparse regions. Extensive evaluations on MNIST, CIFAR-10, GTSRB, and TinyImageNet demonstrate that our method achieves above 99\% attack success rate before defense and retains 50--85 percentage points higher post-defense ASR than the strongest baselines under fine-tuning defenses. Against neuron-pruning defenses, the method exhibits complete immunity, with zero neurons identified for removal across all pruning thresholds. These results expose a fundamental gap in current defense paradigms and underscore the need for defenses that operate beyond the support of the clean distribution.
【11】Backdoor Attacks on Fault Detection and Localization in Cyber-Physical Systems
标题:对网络物理系统故障检测和定位的后门攻击
链接:https://arxiv.org/abs/2605.27674
摘要
:Cyber-Physical Systems (CPS) integrate sensing, communication, computation, and control to support critical infrastructure, including smart grids, industrial automation, and control systems. In the electrical utility domain, various controllers are used in CPS to ensure the system detects and recovers from faults, such as voltage fluctuations, and to perform load balancing in distribution systems. Machine learning- and deep learning-based fault detection and localization frameworks have recently gained significant attention in CPS for their ability to identify anomalies and operational failures in real time. However, these intelligent models are vulnerable to adversarial machine learning attacks, particularly backdoor attacks. In a backdoor attack, an adversary injects malicious patterns into the training data so that the model behaves normally most of the time but produces attacker-controlled outputs when triggered by specific patterns. This paper investigates the threat of backdoor attacks against fault detection and localization mechanisms in recent ML pipelines used in modern CPS systems. We define these threats and explore how they can be realized by designing triggers and evaluating their success in the CPS domain. Our experiments show the attack is successful even with 10\% of poisoning.
【12】The Future of Facts: Tracing the Factual Generation-Verification Gap
标题:事实的未来:追踪事实一代验证差距
链接:https://arxiv.org/abs/2605.27564
备注:Code for this project is available at https://github.com/anjasurina/factgap , blog post at https://www.trdavidson.com/fact-gap
摘要:Language models are becoming the default interface to factual knowledge, yet they often verify outputs more reliably than they generate them. This generation-verification gap (GV-gap) underlies many recent advances in self-improvement and reasoning, but its dynamics on factual knowledge specifically remain poorly understood. We focus on the training mechanisms underlying factual GV-gaps, distinguishing them from their computational and aesthetic counterparts. We trace generation and verification capabilities through three training phases (acquisition, continual learning, and updating) across four open-source model families at two scales each. Three findings recur across models: (i) verification is consistently learned before generation; (ii) verification is more robust to continual learning than generation; and (iii) factual updates can leave models in a "multi-verse" state, simultaneously verifying both old and new answers as correct. Natural experiments on frontier models reproduce these dynamics at scale and reveal residual verification biases on well-covered facts.
【13】Representation-Conditioned Diffusion Models for Guided Training Data Generation
标题:用于引导训练数据生成的表示条件扩散模型
链接:https://arxiv.org/abs/2605.27495
摘要:Data availability remains a critical bottleneck in many deep learning applications. Large-scale datasets are often expensive to collect, curate and annotate, which can limit the scalability and applicability of supervised learning methods. In this work, we evaluate the classification performance of models trained on synthetic image datasets produced by generative deep learning. In particular, we use latent diffusion models conditioned on learned representations from DINOv2, DINOv3, and CLIP. Our results demonstrates that this representation-conditioned formulation significantly outperforms class-conditioned generation by a large margin (+10.76 p.p. top-1 accuracy on ImageNet100), by improving sample quality and mode coverage. Furthermore, by scaling the size of the synthetic dataset, we are able to outperform a classifier trained on the real data (+2.0 p.p top-1 accuracy). We also demonstrate how generated images can be used for augmentation purposes, outperforming classical augmentation methods, and how the conditioning space can be used for sample filtering to further improve training value. Collectively, these findings highlight that representation-conditioned diffusion models provide a promising approach for augmenting, complementing, or potentially replacing real-world datasets in large-scale visual learning tasks.
【14】Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer?
标题:用于检索增强一代的固定缓存路由:何时可以安全地重复使用答案?
链接:https://arxiv.org/abs/2605.27494
备注:19 pages, 9 figures, 10 tables. Code: https://github.com/syedhumarahim/grounded-cache-router
摘要:Modern retrieval-augmented generation(RAG) deployments increasingly rely on caching to reduce token cost and time-to-first-token(TTFT). Prefix-level KV reuse is now standard in serving stacks such as vLLM, and chunk-level and position-independent reuse have been pushed further by recent systems(RAGCache, TurboRAG, CacheBlend, EPIC, ContextPilot, PCR, LMCache). Output-level semantic answer caches, by contrast, remain fragile: similar prompts can map to different correct answers, retrieved evidence drifts as the corpus is updated, and adversarial collision attacks have been shown to hijack cached responses. We argue that the right framing for cached answer reuse is not how to reuse faster but when reuse is safe. We propose GroundedCache, an evidence-validated cache router that admits a cached answer only when 4 cheap gates simultaneously hold: query similarity, retrieved-evidence overlap, source-version validity, and lexical (or judge-based) support of the cached answer by the freshly retrieved evidence. We build a six-regime workload that stress-tests cache safety rather than only hit rate, and introduce an operator-facing metric, the unsafe-served rate (USR), fraction of all queries that received a wrong cached answer. Across 2 datasets and 12,000 real-LLM generations(Qwen2.5-7B-Instruct on vLLM with Automatic Prefix Caching), GroundedCache drives USR to 0.0% on every HotpotQA regime(vs. 15-35% under naive caching) and to 1.5% on mtRAG document drift(vs. 51.5%), a 34x reduction on the design-point adversarial regime and 3-10x reductions across the other mtRAG regimes, while end-to-end p50 latency stays within 1.04-1.07x of a no-cache RAG baseline. A per-gate ablation isolates the lexical support gate as the load-bearing safety mechanism on both datasets, with the remaining gates providing defense-in-depth at near-zero cost. We release the implementation, workload, and evaluation harness.
【15】Triangular-Reference Schrödinger Bridges for Time Series Generation
标题:用于时间序列生成的三角参考Schrödinger桥
链接:https://arxiv.org/abs/2605.27478
摘要:We introduce Triangular-Reference Schrödinger Bridges for Time Series (TR-SBTS), a conservative extension of the SBTS framework in which the Brownian reference is replaced by an intervalwise frozen, possibly degenerate diffusion reference, triangular across a hierarchy of latent volatility levels. The construction is a single entropy projection on the augmented state space, with the variational constraint imposed jointly across time and the latent levels and unfolded hierarchically by the disintegration of relative entropy. The variational core of SBTS is preserved: the entropy minimiser is the h-transform of the reference, and on each frozen interval the optimal dynamics admit a logarithmic-gradient drift formula on the affine leaves of the active covariance directions, valid even when the frozen covariance is rank-deficient. We establish stability of the frozen approximation and convergence of the corresponding regularised kernel estimators. The construction is realised through a finite-dimensional conditioning map assembled from three complementary reductions of the past -- a block PCR summary, a reference-aware Mahalanobis kernel on past increments induced by the runtime frozen covariance cumulants, and a past-window WLS drift regressor under the same reference metric -- together with a coupled state-covariance bridge step in which each latent level produces a dynamic reference for the level above, summarised by a covariance descriptor; the construction is evaluated on numerical experiments.
半/弱/无/有监督|不确定性|主动学习(11篇)
【1】Online Irregular Multivariate Time Series Forecasting via Uncertainty-Driven Dual-Expert Calibration
标题:基于不确定性驱动双专家校正的在线不规则多变量时间序列预测
链接:https://arxiv.org/abs/2605.28603
备注:Accepted by KDD 2026
摘要:
【2】Soft-SVeRL: Self-Verified Reinforcement Learning with Soft Rewards
标题:Soft-SVeRL:具有软奖励的自我验证强化学习
链接:https://arxiv.org/abs/2605.28561
摘要:
【3】High Performance, Low Reliability: Uncertainty Benchmarking for Tabular Foundation Models
标题:高性能、低可靠性:表格基础模型的不确定性基准
链接:https://arxiv.org/abs/2605.28554
备注:6 pages, 2 figures, 2 tables. Accepted at ESANN 2026 (European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning), 22-24 April 2026, Bruges (Belgium)
摘要:
【4】Semi-Supervised Hypothesis Testing by Betting on Predictions
标题:通过押注预测进行半监督假设测试
链接:https://arxiv.org/abs/2605.28533
摘要:
【5】SmartIterator: Visual Analytics Workflows for Supervising Unsupervised Data Grouping
标题:SmartIterator:用于监督无监督数据收件箱的视觉分析工作流程
链接:https://arxiv.org/abs/2605.28219
摘要:Unsupervised learning methods -- topic modeling, partition-based and density-based clustering -- produce data groupings without human guidance, yet choosing and evaluating those groupings should not itself be unsupervised. We present \emph{SmartIterator}~(SI), a visual analytics approach that treats the full sequence of grouping results across a parameter sweep as a first-class analytical object. For each method family, SI provides a structured six-phase workflow that guides the analyst through systematic exploration of grouping results -- from quality-metric overview through transition-stability assessment, membership-confidence evaluation, content and context inspection, and recurrent-archetype verification to an informed decision -- building cumulative understanding of data structure along the way. The workflows are operationalized through \emph{IteraScope}~(IS), a coordinated visual display combining quality-metric charts with semantic color encoding, a 1D group embedding with Sankey-style transition flows and violin plots of membership confidence, a 2D group embedding with HDBSCAN-detected recurrent archetypes that highlights iterations capturing all persistent patterns, and domain-specific linked views for contextualized interpretation. We demonstrate the three workflows on: (1)~simulated social-media messages from the VAST Challenge 2011 (density-based clustering, validated against ground truth), (2)~EU population statistics across ${\sim}1\,500$ NUTS-3 regions (partition-based clustering), and (3)~30 years of IEEE VIS papers (NMF topic modeling). The workflows constitute the main contribution: they provide actionable, method-specific guidance for navigating parameter spaces, studying how data structure evolves across configurations, and grounding analytical understanding in domain context -- yielding knowledge about the data that no single ``best'' result can provide.
【6】Unification and Optimization of Robust Supervised Learning
标题:鲁棒监督学习的统一与优化
链接:https://arxiv.org/abs/2605.28165
摘要:The literature has proposed various robust alternatives to empirical risk minimisation to address failure modes such as distribution shift, label noise and finite-sample degeneracies. Examples include distributionally robust optimization, label smoothing, vicinal risk minimization, and Mixup. However, such approaches are typically developed in isolation, forcing practitioners to commit a priori to a single failure mode even when the dominant mode for the task is unclear. To address this, we organize a broad class of existing methods along three common design axes and derive a tractable training procedure that decomposes robust learning into sequential stages (reference distribution enrichment, input-space perturbation, label-space perturbation, and sample-level aggregation), each with a choice of stance (pessimistic, neutral, or optimistic). This results in a unified design space in which joint hyperparameter optimization can compose and configure robustness strategies suited to the task at hand. Across tabular, image, and reward modeling benchmarks, joint hyperparameter optimization is competitive with the best single-method baseline in each setting, offering a reliable default for practitioners who do not know a priori which failure mode dominates their task.
【7】DecomposeRL: Learning to Ask Useful, Informative, and Diverse Questions for Semi-Supervised, Traceable Claim Verification
标题:DecomposeRL:学习提出有用、信息丰富且多样化的问题以进行半监督、可追溯的声明验证
链接:https://arxiv.org/abs/2605.27858
摘要:Claim verification splits between end-to-end classifiers that are accurate but yields no inspectable traces, and decomposition-based methods produce inspectable traces but lag performance on benchmark datasets. We propose DecomposeRL an accurate claim-verifier that produce inspectable traces. DecomposeRL frames decomposition as an RL policy trained with GRPO and a multi-faceted reward ensemble, enabling both fully supervised and semi-supervised learning from unlabeled claims. DecomposeRL addresses the prohibitive training cost of GRPO with a data-curation funnel that distills 115K fact-verification claims into a compact, learning-signal-dense subset of 5K claims. We show that a DecomposeRL-7B policy trained with full supervision on only ~5K curated claims achieves 86.3 in-domain and 69.8 out-of-domain balanced accuracy across 11 claim-verification benchmarks containing biomedical, political, scientific, and general-domain claims. Despite being 4x smaller, it matches 32B baselines and GPT-4.1-mini, and it further outperforms baselines in a semi-supervised setting with only 10% labeled claims data. Code, data, and models are available at https://dipta007.github.io/DecomposeRL
【8】CAREF: Calibration-Aware Regularization for Explanation Faithfulness Without Rationale Supervision
标题:CAref:在没有理由监督的情况下,校准意识的解释忠实性规范化
链接:https://arxiv.org/abs/2605.27835
备注:10 pages
摘要
:We introduce CAREF, a parameter-efficient fine-tuning framework that jointly optimizes predictive accuracy and explanation faithfulness via calibration-aware regularization. At its core, CAREF couples entropy-based calibration with token-level sparsity control through a single unified loss, the Calibration-Aware Regularization for Explanation Faithfulness (LSCED), without requiring rationale supervision. Evaluated on four NLE benchmarks (COS-E, ECQA, ComVE, e-SNLI) with Flan-T5, our lightweight CAREF-AQ variant attains the best average accuracy (89.04) and explanation alignment (81.00 nBERT) using only 6.43% of trainable parameters, outperforming LoRA and AdaLoRA. To our knowledge, CAREF is the first method to unify entropy and sparsity regularization in a single training objective for interpretable LLM fine-tuning.
【9】Supervised Distributional Reduction via Optimal Transport and Dependence Maximization
标题:通过最佳运输和依赖最大化进行有监督的分配缩减
链接:https://arxiv.org/abs/2605.27619
摘要:Learning representations that capture both intrinsic data geometry and target-relevant structure remains a fundamental challenge, particularly in settings where data reduction must balance compression with predictive fidelity. While distributional reduction-encompassing joint clustering and dimensionality reduction-offers a principled way to summarize data, its supervised variants remain relatively under-explored, despite the importance of retaining task-relevant signal for downstream prediction and decision-making. We propose Supervised Distributional Reduction (SDR), an algorithm for learning target-aware representations by combining optimal transport with explicit dependence maximization. SDR builds on the Fused Gromov-Wasserstein (FGW) objective to align the relational structure of the input distribution with a set of representative points, while augmenting it with a direct dependence term that encourages the learned embeddings to capture predictive signal more explicitly. This results in compact representations that reflect both geometric structure and supervision. Beyond representation learning, SDR naturally induces a data-dependent, non-stationary geometry that can be leveraged for settings such as Gaussian Process (GP) modelling. By redefining distances through target-aware distributional alignment, SDR enables the construction of adaptive kernels that respond to local variations in both data geometry and supervision, offering an optimal transport-based perspective on non-stationary kernel design.
【10】HEAL: Resilient and Self-* Hub-based Learning
标题:治愈:弹性和自我 * 中心学习
链接:https://arxiv.org/abs/2605.27475
摘要:Decentralized learning enhances privacy, scalability, and fault tolerance by distributing data and computation across nodes. A popular approach is Federated learning, which relies on a central aggregator, yet faces challenges such as server vulnerabilities, scalability issues, privacy risks and most importantly, the single point of failure. Alternatively Gossip Learning and Epidemic Learning offer fully decentralization through peer-to-peer exchanges of model updates, ensuring robustness and privacy, at the price of slower model convergence. In this work, we introduce a novel decentralized learning framework called HEAL. HEAL is the first cross-layer decentralized learning framework that exploits an optimized self-organizing and self-healing underlying P2P overlay combining the strengths of Federated Learning, Gossip and Epidemic Learning. Leveraging the recently proposed Elevator algorithm, HEAL promotes dynamically chosen nodes to act as aggregators. Through simulations, we demonstrate that HEAL has similar performances to that of Federated Learning in crash-free settings, while being fully decentralized and fault-tolerant. In crash and churn prone environments HEAL outperforms Gossip and Epidemic Learning.
【11】Unsupervised Identification and Removal of Spurious Correlations During Fine-Tuning
标题:微调期间无监督识别和消除寄生相关性
链接:https://arxiv.org/abs/2605.27676
备注:10 + 4 pages, comments welcome
摘要:Fine-tuning a pretrained language model on a curated dataset can produce spurious correlations between the fine-tuning task and unintended latent factors -- such as misaligned personas or political slant -- that the curation procedure has entangled with the task. The model can latch onto these spurious correlations, leading to bias and reduced out-of-distribution generalisation. We prove that under reasonable assumptions on task complexity and the spurious correlation, such latent factors can be identified, without supervision, from the weights of a naive LoRA fine-tune. Existing approaches to removing bias, such as activation steering, remove identified factors from residual-stream activations, either at inference or during training. We argue, however, that the goal should be to remove the spurious correlation, not the latent factor itself, as the pretrained model may rely on it for genuine task signal. To enable this, we propose GRASP, GRadient projection of Associated Spurious Patterns, which prevents the model from acquiring new reliance on the identified latent factor while preserving any pretrained content along it. We validate on three fine-tuning tasks. The first two involve emergent misalignment, where fine-tuning on a narrow task -- in our case, writing insecure code and giving bad medical advice -- leads to misaligned responses on unrelated topics. Here our method completely removes misalignment in the insecure code case and reduces them by ~5x in the bad medical advice case, beating all baselines in the trade-off between misalignment-reduction and task-preservation. The last is a novel political-bias experiment, where fine-tuning on right-skewed Reddit financial-advice data causes political-lean drift on unrelated topics. Here our method reduces drift by more than half, while improving financial task performance, beating all baselines.
迁移|Zero/Few/One-Shot|自适应(16篇)
【1】Stage-wise Distortion-Perception Traversal in Zero-shot Inverse Problems with Diffusion Models
标题:具有扩散模型的零次反问题的阶段失真感知轨迹
链接:https://arxiv.org/abs/2605.28711
备注:Accepted by ICML 2026
摘要:The distortion-perception (D-P) tradeoff is a fundamental phenomenon of Bayesian inverse problems, which characterizes the inherent tension between distortion performance and perceptual quality. Enabling flexible traversal of the D-P tradeoff at inference time is crucial for practical applications. Despite the recent success of diffusion models in zero-shot inverse problem solving, efficient and principled strategies for D-P traversal in diffusion-based inverse algorithms remain inadequately characterized. In this paper, we propose a stage-wise framework for realizing D-P traversal using a single diffusion model in zero-shot inverse problems. Our proposed method, termed MAP-RPS, starts with an MAP estimation stage that approximates the MMSE solution and provides a low-distortion initialization, followed by a re-noised posterior sampling stage that progressively improves perceptual quality. We provide theoretical analyses for both stages, establishing the validity and effectiveness of the proposed design. Furthermore, we extend MAP-RPS to the latent space, yielding LMAP-RPS, which enjoys broader applicability by leveraging large-scale pre-trained latent diffusion backbones. Extensive experiments demonstrate that MAP-RPS and LMAP-RPS enable more effective D-P traversal on various tasks, while also exhibiting strong performance as efficient solvers for real-world inverse problems.
【2】History-aware adaptive reduced-order models via incremental singular value decomposition
标题:通过增量奇异值分解的历史感知自适应降阶模型
链接:https://arxiv.org/abs/2605.28684
备注:50 pages, 27 figures, Preprint submitted to Elsevier
摘要:
【3】AdaDPO: Self-Adaptive Direct Preference Optimization with Balanced Gradient Updates
标题:AdaDPO:具有平衡梯度更新的自适应直接偏好优化
链接:https://arxiv.org/abs/2605.28440
备注:5 figures
摘要:DPO has become a widely adopted alternative to RLHF for aligning LLMs with human preferences, eliminating the need for a separate reward model or RL loop. Recent theoretical analysis uncovers an asymmetric gradient behavior in DPO: the loss suppresses dispreferred responses substantially faster than it promotes preferred ones, causing the model to learn to avoid bad answers rather than to generate good ones. We propose AdaDPO, a Self-Adaptive variant of the DPO algorithm that introduces per-preference-pair, stop-gradient-based coefficients derived directly from the policy model's generation probabilities, with the reference model's probabilities as an optional component. AdaDPO is constructed to enforce equality of gradient magnitudes between preferred and dispreferred probabilities; the practical implementation balances per-token gradients and applies a numerical clipping bound for stability, while retaining DPO's original hyperparameter structure. On Llama-3-8B-Instruct trained on UltraFeedback under a SimPO similar setup, AdaDPO consistently outperforms DPO on AlpacaEval 2: it achieves higher length-controlled win rates (LC) in 81% of hyperparameter combinations, attains the global best LC (48.3%) and raw win rate (46.1%), and enlarges the LC-over-WR margin in 88% of combinations, indicating effective mitigation of length bias. Additional analyses on KL divergence, reward margin, and reward accuracy confirm that AdaDPO rectifies the gradient imbalance and yields more efficient optimization. Because it operates purely at the loss level, AdaDPO can be dropped into existing preference-based alignment pipelines without changing data collection or model architectures. The method requires only a few lines of code, and the same self-adaptive principle generalizes to a broad family of pairwise contrastive preference losses including SimPO, R-DPO, IPO, CPO, and ORPO.
【4】ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation
标题:ADWIN:用于地平线感知按策略蒸馏的自适应Windows
链接:https://arxiv.org/abs/2605.28396
摘要:On-policy distillation (OPD) transfers reasoning behavior by training a student on teacher feedback along student-generated trajectories, but standard full-rollout training ties every update to a costly completion and can over-allocate supervision to late positions with low marginal value for the current student. We revisit this assumption through the useful supervision horizon: student-induced rollouts can drift from teacher-preferred continuations, while aligned prefixes may already preserve the long-horizon OPD update direction. We propose ADWIN, an adaptive-window framework for OPD that treats rollout length as an online admissibility decision, training on short teacher-anchored prefixes while using delayed full-rollout probes to audit prefix--full alignment and adapt the next horizon with staleness control. Across math and code reasoning benchmarks in single-task, multi-task, and strong-to-weak settings, ADWIN improves the accuracy--compute trade-off over full-rollout OPD and prefix-based baselines, reducing end-to-end training cost by up to 4.1 times while achieving comparable or better accuracy.
【5】Adaptive Bandit Algorithms for Contextual Matching Markets
标题:上下文匹配市场的自适应Bandit算法
链接:https://arxiv.org/abs/2605.28290
备注:Accepted to ICML 2026
摘要:We study bandit learning in matching markets, where players and arms constitute the two market sides, and the players' utilities are linear in the arm contexts. In each round, new arms arrive with observable contexts. Then, the algorithm matches them to players, aiming to minimize each player's regret against a stable matching benchmark. This contextual structure creates significant complexity: subtle context shifts can slightly alter one player's utility while completely reconfiguring the underlying benchmark, causing large regret spikes for others. We address this in two settings: stochastic contexts, drawn from a latent distribution, and adversarial contexts, which may be arbitrary. For the stochastic case, we introduce a novel minimum preference gap to capture learning difficulty and provide a fully adaptive algorithm with an instance-dependent poly-logarithmic regret upper bound. We also establish matching instance-independent regret upper and lower bounds under a mild distributional assumption. For the adversarial setting, we propose a tractable regret notion that remains valid under arbitrary contexts and achieves an instance-independent sublinear regret bound via an adaptive algorithm.
【6】Analyzing Quality-Latency-Resource Trade-offs in a Technical Documentation RAG Assistant Using LoRA Adaptation
标题:使用LoRA自适应分析技术文档RAG助手中的质量-延迟-资源权衡
链接:https://arxiv.org/abs/2605.28222
备注:13-page main body plus extended appendix; 6 figures; benchmark, LoRA adapters, and code at https://github.com/EugPal/rag-lora-tradeoffs
摘要:We study quality-latency-resource trade-offs in a documentation-grounded retrieval-augmented generation (RAG) system that uses Low-Rank Adaptation (LoRA) of the generator. We build a manually verified benchmark of 5,144 question-answer pairs over the official Kubernetes documentation and combine it with a fixed hybrid-retrieval pipeline (BGE-M3 dense, BGE-M3 native sparse, Reciprocal Rank Fusion, cross-encoder reranking). Over this benchmark we ablate 20 LoRA configurations on Llama-3.2-3B-Instruct and Llama-3.1-8B-Instruct across rank and target-module choices, and evaluate each on token-level F1, LLM-judged groundedness and correctness (pass@4), inference latency, inference memory, and training cost, all reported with bootstrap 95% confidence intervals. Pareto analysis shows that LoRA adapters acting only on the q and v attention projections consistently dominate the front, while the 3B/8B choice mainly defines operating regime. A param-matched control comparison further indicates that the q/v advantage is structural rather than purely parametric. The benchmark, selected adapters, and code are available at https://github.com/EugPal/rag-lora-tradeoffs.
【7】Adaptive Reservoir Computing for Multi-Scenario Chaotic System Forecasting
标题:用于多场景混乱系统预测的自适应水库计算
链接:https://arxiv.org/abs/2605.28145
备注:4 pages, 2 figures
摘要
:We present an adaptive reservoir computing framework for the CTF-4-Science Lorenz benchmark, which evaluates machine learning models across twelve distinct tasks spanning five qualitatively different scenarios: baseline forecasting, noisy signal reconstruction, forecasting under noise, few-shot learning, and parametric generalization. Rather than applying a uniform inference strategy, we tailor the training and prediction procedure of Echo State Networks (ESNs) to the specific demands of each evaluation scenario. Our key contributions are fourfold: (1) exact reservoir state synchronization that eliminates warmup approximation error in short-time prediction; (2) histogram-guided candidate selection that directly optimizes the long-time ergodic evaluation metric; (3) multi-seed reservoir search for few-shot regimes with severely limited training data; and (4) sequential multi-sequence training that resolves state-distribution mismatch in parametric generalization tasks. The proposed framework achieves a score of 74.91 on the public benchmark leaderboard, demonstrating that carefully adapted reservoir computing constitutes a competitive and computationally efficient approach for diverse chaotic system modeling challenges.
【8】Adaptive Coarse-to-Fine Subgoal Refinement for Long-Horizon Offline Goal-Conditioned Reinforcement Learning
标题:用于长视野离线目标条件强化学习的自适应从粗到细子目标细化
链接:https://arxiv.org/abs/2605.28127
摘要:Offline goal-conditioned reinforcement learning (GCRL) is challenging in long-horizon tasks, where distant state--goal pairs provide weak supervision and value estimates become vulnerable to accumulated bootstrapping errors. Hierarchical methods mitigate this difficulty by introducing intermediate subgoals, but fixed temporal abstractions or fixed hierarchy depths can be mismatched to state--goal pairs with different reachability horizons. We propose Coarse-to-Fine Hierarchical Goal Reinforcement Learning (CFHRL), a fully offline GCRL framework that adaptively refines distant goals before execution. Starting from the final goal, CFHRL recursively proposes intermediate targets, trained from replay-supported candidates, and stops refinement once the current target is estimated to be locally executable by a learned reachability cost. The key idea is that a subgoal need not be an exact midpoint or globally optimal waypoint; it only needs to provide reliable progress and reduce the remaining reaching difficulty, enabling subsequent refinement over shorter horizons. A stylized analysis further supports the robustness of approximate recursive contraction. Experiments on OGBench show substantial gains on several long-horizon tasks, with ablations validating the proposed refinement and stopping mechanisms
【9】On the Learnability of Test-Time Adaptation: A Recovery Complexity Perspective
标题:测试时适应的可学习性:恢复复杂性的角度
链接:https://arxiv.org/abs/2605.28057
备注:Accepted by ICML 2026
摘要:Test-time adaptation (TTA) aims to adapt models to maintain reliable performance on non-stationary test streams without requiring labeled data. Despite its empirical success, the learnability of TTA under non-stationary streams remains unexplored. A key challenge is the lack of a principled theoretical framework that simultaneously aligns with the TTA objective and captures both continuously evolving distribution shifts and intrinsic information constraints. To address this gap, we propose the first theoretical framework for studying the learnability of TTA and introduce $(ε,δ)$-Recovery Complexity and $(ε,ρ)$-TTA Learnability. Recovery complexity measures the post-shift time needed to maintain excess risk below a target level with high probability, and is further extended to TTA learnability, which measures the long-term reliability of TTA. Within this framework, we introduce a novel discrete surrogate for non-stationary test streams, enabling a unified and tractable analysis of both gradual and abrupt shifts. We derive order-wise matching lower and upper bounds on recovery complexity, revealing fundamental limits of TTA and an intrinsic adaptivity-information trade-off. These results provide unified learnability guarantees for TTA that complement regret-based analyses.
【10】Reward Transfer from Inverse Reinforcement Learning: A Coupled Minimax Approach
标题:反向强化学习的奖励转移:一种耦合极小极大方法
链接:https://arxiv.org/abs/2605.27834
摘要:We study the transfer of rewards learned using inverse reinforcement learning from expert demonstrations in one environment to reinforcement learning in a new, different environment. This arises naturally when demonstrations are collected in a controlled environment. We formulate the problem as a joint system of Bellman equations across the source and target environments and develop minimax estimators for the target soft-$q$-function. Whereas a sequential solution approach first estimates the source reward and then plugs it into the target control problem, a coupled approach solves the source and target system of equations jointly. We show that, in contrast to the sequential approach, the coupled approach removes the first-order influence of source Bellman residual error. We characterize the local behavior of each approach, develop finite-sample soft-$q$-function error bounds, and prove regret guarantees for the resulting soft-control policy. An empirical investigation using a sepsis simulator validates the theoretical comparison.
【11】Transferable Reinforcement Learning via Probabilistic Latent Embeddings and Dynamic Policy Adaptation for Sim-to-Real Deployment
标题:通过概率潜在嵌入和动态策略适应的可转移强化学习,用于模拟到真实部署
链接:https://arxiv.org/abs/2605.27659
摘要:Due to limited resources and public safety concerns, deep reinforcement learning (RL) agents for many cyber-physical systems (e.g., autonomous vehicles) are first trained in simulators. However, when deployed in real world environments, they often suffer from performance degradation or safety violations because of the inevitable Sim2Real gap. Existing zero-shot approaches, such as robust safe RL and domain randomization, mitigate this issue but typically at the cost of degraded performance or residual safety risks when experiencing unmodeled system dynamics. To address these limitations, we propose a novel reinforcement learning framework that enables safe and efficient policy transfer via probabilistic latent embeddings and dynamic policy adaptation. We consider a family of Constrained Markov Decision Processes (CMDPs) under different environment contexts. By leveraging latent context variable in meta-RL, the proposed framework infers the latent representation of the environment from simulated experiences. Furthermore, it incorporates a distributional RL formulation, which allows risk levels of the deployed policy to be adjusted dynamically, based on the estimation accuracy of the latent context variable. This strategy promotes safety at the early deployment stage and improves efficiency through fast policy adaptation under the Sim2Real gap.
【12】Faster Thermal Profiling of a Lunar Rover with Machine Learning Adapted Finite Difference Model
标题:基于机器学习自适应差分模型的月球车快速热剖面
链接:https://arxiv.org/abs/2605.27651
摘要
:Autonomous space systems operating in extreme thermal environments require accurate and efficient thermal modeling to support both pre-mission system design and onboard autonomy. For lunar rovers, large temperature gradients, radiative heat transfer, and variable surface conditions make reliable thermal prediction especially challenging. High-fidelity physics-based simulations provide accurate results but are computationally expensive, while simplified models and lookup-table approach often lack sufficient accuracy. Physics-informed machine learning (PIML) offers a promising alternative by combining data-driven models with embedded physical knowledge. This paper presents a PIML framework for thermal analysis of a simplified lunar rover with internal heat sources, where machine learning enables environment-adaptive coarse meshing. The proposed architecture integrates a transfer neural network (TNN) that adaptively determines 3D finite-difference nodalization based on thermal loads and initial conditions, enabling more accurate coarse-mesh calculations. A differentiable finite-difference thermal simulator is embedded within the framework to enforce physical consistency and support efficient training, while an upscaling layer reconstructs high-resolution temperature fields from the coarse-grid solution. The proposed PIML approach is evaluated against high-fidelity fine-mesh simulations, low-fidelity fixed coarse-mesh models, and a purely data-driven artificial neural network (ANN). Results show that the PIML framework improves prediction accuracy by 50% and 39% relative to the coarse-mesh physics model and ANN model, respectively, while maintaining physically consistent thermal distributions. Computationally, the framework is also 3x faster than high-fidelity simulations, demonstrating an effective balance between accuracy and efficiency for thermal modeling of lunar rover systems.
【13】Energy-Structured Low-Rank Adaptation for Continual Learning
标题:持续学习的能量结构低等级适应
链接:https://arxiv.org/abs/2605.27482
备注:Accepted by ICML 2026
摘要:While orthogonal subspace methods try to mitigate task interference in Continual Learning (CL), they often suffer from energy diffusion across the basis, hindering knowledge compaction and exhausting capacity for future tasks. We observe that output feature drift induced by parameter updates is inherently low-rank, and theoretically prove that preserving parameters along the principal directions of this drift minimizes the output reconstruction error. Motivated by this, we propose \textbf{E}nergy-Concentrated and \textbf{E}nergy-Ordered \textbf{Lo}w-\textbf{R}ank \textbf{A}daptation (E$^2$-LoRA). By explicitly ordering and concentrating knowledge into leading ranks, E$^2$-LoRA frees capacity for subsequent tasks. Furthermore, we design a dynamic rank allocation strategy to balance stability and plasticity by jointly optimizing energy retention and model plasticity. Extensive experiments across multiple benchmarks demonstrate that E$^2$-LoRA achieves state-of-the-art performance.
【14】Variance-Adaptive Optimal Algorithm for Reinforcement Learning with Multinomial Logit Function Approximation
标题:多项Logit函数逼近的强化学习方差自适应最优算法
链接:https://arxiv.org/abs/2605.28364
摘要:Reinforcement learning with multinomial logistic (MNL) function approximation has become an important framework due to its flexibility and broad applicability. While existing studies have established regret guarantees under worst-case analysis, they do not capture how performance depends on the variability of the interaction between the learner and the environment. In this paper, we develop a new theoretical analysis for MNL-based Markov decision processes that yields explicit variance-adaptive regret bounds. Our algorithm is computationally efficient and achieves the instance-wise optimal rate of regret, narrowing the gap between upper and lower bounds. Our numerical experiments validate that our method learns optimal policies more efficiently than conventional approaches.
【15】Quantum Machine Learning-based 6G edge Network: Enabling Adaptive Communication and Model Aggregation
标题:基于量子机器学习的6G边缘网络:实现自适应通信和模型聚合
链接:https://arxiv.org/abs/2605.27417
摘要:With the advent of sixth-generation (6G) mobile communication technology, vehicle-to-everything (V2X) communication faces unprecedented challenges in communication efficiency, system generalization capabilities, and model collaboration. Conventional machine learning struggles with high-dimensional state spaces, slow convergence, and poor generalization under heterogeneous V2X nodes, rapidly varying channels, and multimodal sensing data in V2X systems. To address these issues, we propose a quantum-enhanced framework for V2X communication and model aggregation that targets efficient, robust, and intelligent transportation in 6G, which includes four modules: the channel-adaptive semantic communication module, the multimodal fusion module, the model transfer module, and the federated aggregation module. Specifically, the channel-adaptive semantic communication module leverages quantum convolutional neural networks (CNN) and quantum distortion metrics to enable efficient transmission and strong generalization across diverse conditions. The multimodal fusion module exploits quantum attention and entanglement to compress features and associate semantics across heterogeneous data. The model transfer module employs quantum reinforcement learning to model decision-making and improve adaptability in dynamic environments. The federated aggregation module integrates quantum tensor decomposition with backpropagation-based corrections to provide privacy preservation with low overhead and to strengthen global model robustness. This work outlines a new paradigm for communication and model collaboration in future 6G intelligent transportation.
【16】Zero-shot Quantum Neural Architecture Search
标题:Zero-Shot量子神经架构搜索
链接:https://arxiv.org/abs/2605.27410
摘要:Variational Quantum Algorithms (VQAs) are a leading approach to exploiting near-term quantum hardware, leveraging parameterized quantum circuits and classical optimization to achieve advantage. Despite their promise, the practical deployment of VQAs is challenged by the difficulty of designing quantum circuit architectures that balance expressivity, trainability, and hardware constraints. Existing evolutionary-based quantum neural architecture search methods address these challenges but suffer from high computational costs due to repeated training of candidate circuits. In this work, we identify a setting in which the Gram matrix of the Quantum Neural Tangent Kernel converges. Building on this observation, we design a zero-shot surrogate model to estimate candidate performance without full training, significantly accelerating the architecture search process. Using this surrogate, we propose MZeQAS, a Monte Carlo Tree Search (MCTS)-based Zero-Shot Quantum Neural Architecture Search framework for VQAs. By integrating proxy-based performance estimation with MCTS exploration, MZeQAS efficiently discovers high-performing architectures. Experimental results demonstrate that MZeQAS outperforms existing approaches in terms of both search efficiency and solution quality, providing a scalable and effective framework for advancing VQA deployment on noisy intermediate-scale quantum devices.
强化学习(9篇)
【1】Optimal Data Acquisition for Reinforcement Learning: A Large Deviations Perspective
标题:强化学习的最佳数据获取:大偏差的角度
链接:https://arxiv.org/abs/2605.28675
摘要:
【2】Teacher-Student Representational Alignment for Reinforcement Learning-Driven Imitation Learning
标题:强化学习驱动的模仿学习的师生代表性对齐
链接:https://arxiv.org/abs/2605.28372
备注:6 pages, 5 figures. Accepted as an oral presentation at the RL4IL Workshop at ICRA 2026
摘要
:Imitation learning (IL) from a state-based reinforcement learning (RL) policy is a common approach to overcome the curse of dimensionality in complex and high-dimensional observation spaces prevalent in robotics. This paper addresses the irreducible imitation gap that emerges when teacher and student are learned in isolation, and the teacher policy has the liberty to rely on privileged state information that the student cannot infer from its observations. Instead of improving poor student performance with RL finetuning after IL, which often requires a whole new training setup, we propose a novel algorithm which learns a shared embedding space that hides agent-specific observations and thus trains imitable teacher policies by construction. We train the shared embedding space with self-supervised contrastive learning in parallel to the teacher policy and prevent it from extracting private information by limiting its gradients from updating the encoder networks. We perform evaluations on several example domains and compare to state-of-the-art baselines showing that our algorithm enables higher student performance with substantially reduced imitation gap.
【3】ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation
标题:ProRL:通过纠正策略梯度估计进行主动推荐的有效强化学习
链接:https://arxiv.org/abs/2605.28293
备注:Accepted in ICML 2025
摘要:Proactive Recommender Systems (PRSs) aim to guide user preference shift toward target items by generating paths of intermediate recommendations. Reinforcement learning (RL) provides a principled framework for optimizing such sequential decision tasks, as path rewards can naturally capture both short-term acceptance and long-term guidance effectiveness. However, naively applying policy gradients to PRS results in deficient gradient estimation. We identify two deficiencies: (1) path-level rewards decompose into step-level rewards with positive mean, creating a length-dependent bias that causes gradients to favor path extension over meaningful exploration; (2) weighting each step by the entire path-level reward ignores the decomposition structure, leading to high gradient variance. To rectify these two deficiencies, we propose an effective RL framework ProRL with two novel mechanisms for proactive recommendation. First, Stepwise Reward Centering subtracts expected rewards to neutralize length-dependent bias, ensuring that path extension yields zero expected gradient signal. Second, Position-Specific Advantage Estimation leverages the reward decomposition structure to compute step-dependent baselines, reducing gradient variance. Together, these mechanisms yield policy gradients that precisely target path quality. Our experiments on three real-world datasets demonstrate that ProRL significantly outperforms state-of-the-art PRSs. Our code is available at https://github.com/hongruhou89/ProRL.
【4】AtomComposer: Discovering Chemical Space from First Principles with Reinforcement Learning
标题:AtomComposer:用强化学习从第一原理发现化学空间
链接:https://arxiv.org/abs/2605.28287
摘要:Discovering novel stable molecules without training data remains a grand scientific challenge. Current molecular generative models are trained on large, pre-curated datasets, which introduce biases and limit exploration of novel chemistry. In contrast, we propose a new paradigm: autonomous, generalized agents capable of mapping vast, unknown chemical spaces without any pretraining. For the first time, we present AtomComposer, a self-guided agent that autonomously constructs valid 3D isomers under stoichiometric constraints and is trained exclusively online using reinforcement learning. Unlike existing approaches that generally overfit to a specific chemical formula, we establish a multi-composition training scheme that enables a broad generalization across diverse chemistry, guided by energy- and validity-based rewards. Our agent can discover up to an order of magnitude more valid isomers on unseen test formulas than existing single-composition reinforcement-learning baselines trained with per-step energy rewards. These results fulfill the promise of online reinforcement learning as a powerful paradigm for scalable, from-scratch exploration of chemical configuration space.
【5】Commit to the Bit: Reactive Reinforcement Learning Done Right
标题:全力以赴:反应性强化学习做得正确
链接:https://arxiv.org/abs/2605.28276
摘要:Reinforcement learning algorithms are commonly analyzed (and designed) under the Markov assumption. This is unrealistic, as most environments encountered in practice are either partially observable, or require function approximation that restricts the agent to access non-Markovian state features. We consider the problem of learning an optimal reactive policy in a finite environment with deterministic observations (or equivalently, hard state aggregation). We introduce a new algorithm, Committed Q-learning, and prove almost-sure convergence to the optimal reactive policy under an intuitive assumption we call rewire-robustness. This assumption is strictly weaker than the $q_\star$-realizability condition used in prior work. Our algorithm is a variant of classical Q-learning in which the behavior policy commits to a single action upon entering a feature, and only resamples actions when the observed feature changes. A crucial part of our analysis is the introduction of quasi-Markov environments.
【6】Joint Training of Multi-Token Prediction in Reinforcement Learning via Optimal Coefficient Calibration
标题:通过最佳系数校准进行强化学习中多令牌预测联合训练
链接:https://arxiv.org/abs/2605.28184
摘要:Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as the standard paradigm for improving reasoning capability of large language models, while Multi-Token Prediction (MTP) has been a widely adopted module in pretraining. Combining them is a natural approach, yet current RL practices detach MTP gradients because joint training degrades the performance. We revisit this failure from an optimization perspective. We show that the per-step effect of MTP on the RL objective can be decomposed into two terms: a first-order correlation and a second-order perturbation penalty. This decomposition unifies three MTP training regimes: Detach, Cross-Entropy loss, and Policy loss, and explains why each succeeds or fails. Further analysis of policy loss reveals that, although it aligns with intuition, performance still degrades: the correlation term decays while the quadratic penalty persists. Guided by the analysis, we propose Optimal Coefficient Calibration (OCC), an adaptive scheme that tracks the optimal coefficient online via a log-probability proxy at negligible cost. Across six competition-level mathematical reasoning benchmarks, OCC consistently matches or exceeds the detach baseline, delivering improved joint MTP-RL training performance.
【7】Cyclical Entropy Eruption: Entropy Dynamics in Agent Reinforcement Learning
标题:周期性熵爆发:智能体强化学习中的熵动力学
链接:https://arxiv.org/abs/2605.27954
摘要
:Agentic large language models are increasingly used to solve real-world tasks by reasoning over goals, invoking tools, and interacting with external environments. Reinforcement learning provides a natural framework for improving these behaviors, and recent agent RL methods have achieved strong results across domains. However, the training dynamics of agent RL remain poorly understood, limiting our ability to diagnose instabilities and design more effective training algorithms. In this work, we identify a previously underexplored phenomenon in agent RL, which we term cyclical entropy eruption. Unlike single-turn reasoning RL, where entropy typically collapses and stays low, agent RL training exhibits unique recurring cycles of sharp entropy eruption and gradual subsidence. We decompose this dynamic into three phases and provide theoretical and empirical analyses of each, explaining the mechanisms underlying its cyclical oscillation. We further show that degenerate patterns such as sentence duplication and hallucination, once acquired during eruption, can persist and accumulate across cycles. Motivated by these findings, we propose SEAL (Separation-Enhanced Agent Learning), a lightweight auxiliary loss that separates correct and incorrect trajectories in representation space, directly targeting the root cause of entropy eruption. Experiments across multiple benchmarks, models, and RL algorithms demonstrate that SEAL stabilizes training and yields stronger downstream agent performance.
【8】Personalized Observation Normalization for Federated Reinforcement Learning in Simulation Environments with Heterogeneity
标题:具有异类的模拟环境中联邦强化学习的个性化观察规范化
链接:https://arxiv.org/abs/2605.27385
备注:Accepted at the International Joint Conference on Neural Networks (IJCNN) 2025
摘要:Federated reinforcement learning (FedRL) enables multiple agents to collaboratively train a global policy without sharing raw data, making it ideal for privacy-sensitive applications. However, FedRL faces challenges in heterogeneous environments where differing state-transition dynamics lead to non-identical input distributions and imbalanced parameter updates during aggregation. Therefore, this paper develops a personalized observation normalization (PON) method, allowing each agent to locally normalize raw state inputs using a continuously updated running mean and variance. This design ensures consistent scaling of local feature without overshadowing across agents during aggregation. Furthermore, we demonstrate that sharing normalization parameters across agents is ineffective due to the diverse local input distributions, which highlights the necessity of personalized statistics. Experiments on heterogeneous MuJoCo tasks show that our developed PON accelerates training and achieves superior performance compared to baseline methods.
【9】Accelerating Reinforcement Learning Training Using Simulation Surrogate Models
标题:使用模拟代理模型加速强化学习训练
链接:https://arxiv.org/abs/2605.27556
摘要:High-fidelity simulation models are widely used to analyze complex stochastic systems, but their high computational cost motivates the development of cheaper surrogate models that approximate the simulation model's input-output relationship. In parallel, reinforcement learning (RL) has emerged as a powerful framework for making online decisions in stochastic environments, with increasing attention being given to the use of simulation models as training environments for RL models. We investigate a class of surrogate models suitable for accelerating RL training in settings where the reward structure, model parameters, or system dynamics change over time and explore their interactions with simulation models and RL models. Through numerical experiments on a stochastic service system modeled via discrete-event simulation, we demonstrate that leveraging surrogate models can substantially accelerate RL training and re-training.
元学习(1篇)
【1】OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration
标题:OmniVerifier-M1:具有显式结构化重新校准的多模式元验证器
链接:https://arxiv.org/abs/2605.28805
备注:ICML 2026. Project: https://github.com/Cominclip/OmniVerifier
摘要:Visual outcomes are increasingly central to multimodal large language models, making reliable and fine-grained verification essential for scaling generalist foundation models. In this work, we investigate multimodal meta-verification, which leverages verifier-generated rationales rather than decision-only signals, and explore how to effectively incorporate meta-verification feedback into multimodal verifier training. We identify two key findings. First, symbolic verifier outputs (e.g., bounding boxes) outperform textual explanations as meta-verification rationales, enabling efficient rule-based reinforcement learning rewards while avoiding reliance on model-based rewards from auxiliary judge models. Second, decoupling reinforcement learning objectives for binary judgment and meta-verification substantially outperforms joint reward optimization, due to intrinsic differences in output structure and learning dynamics. Based on these insights, we train OmniVerifier-M1, a generalist visual verifier leveraging symbolic meta-verification and decoupled reinforcement learning. OmniVerifier-M1 provides robust verification and fine-grained error localization, and further enables M1-TTS, a verifier-driven agentic generation system achieving dynamic region-level self-correction. This approach paves the way for more reliable, interpretable, and fine-grained multimodal verification, supporting safer and more controllable foundation model deployment.
符号|符号学习(1篇)
【1】SYNAPSE: Neuro-Symbolic Visual Thought-to-Text Decoding via Topological Semantic Denoising
标题:SYAPSE:通过topical semantic semantic semantic semantic Visual right-to-Text解码
链接:https://arxiv.org/abs/2605.27790
摘要:Recent advances in large language models have accelerated open-vocabulary EEG-to-imagined-text decoding, where non-invasive neural activity recorded during visual perception is translated into coherent natural language descriptions of viewed stimuli. However, existing systems remain highly vulnerable to biological noise, where corrupted neural projections induce hallucinated or semantically unstable generation in frozen language models. We introduce SYNAPSE (Symbolic Neural Alignment for Precise Semantic Extraction), a lightweight neuro-symbolic framework that stabilizes neural text generation through inference-time symbolic regularization. By purifying EEG-derived semantic candidates using commonsense graph structure and latent exemplars, SYNAPSE improves semantic stability without end-to-end LLM fine-tuning. Experiments across popular EEG decoding benchmarks and multiple frozen LLM backends demonstrate consistent gains over unconstrained prompting baselines, robustness under object-label ablation, and performance commensurate with substantially more resource-intensive fine-tuned systems, while preserving biometric privacy by localizing raw EEG processing entirely within the encoder stack.
医学相关(3篇)
【1】Information-theoretic Multimodal Representation Learning for Electrocardiogram Signals
标题:心电图信号的信息论多模式表示学习
链接:https://arxiv.org/abs/2605.27583
摘要
:Electrocardiograms (ECGs) are widely used non-invasive measurements of cardiac activity and play a central role in clinical diagnosis. Recent multimodal approaches align ECG signals with clinical reports to incorporate diagnostic semantics, but clinical reports often fail to preserve the rich physiological structure of ECG waveforms, particularly across multiple levels of abstraction ranging from coarse diagnostic categories to fine-grained morphology. To address this limitation, we formulate ECG representation learning from an information-theoretic perspective and derive a tractable objective that jointly preserves signal structure and integrates clinical semantics. Based on this principle, we propose \textbf{MERIT} (Multimodal ECG Representation via Information Theory), a dual-branch pretraining framework combining masked ECG modeling with ECG--text contrastive alignment. Extensive experiments on PTB-XL and additional benchmarks demonstrate consistent improvements over prior methods, including gains exceeding $3%$ F1 on PTB-XL All and $5%$ F1 on SubClass classification. In zero-shot evaluation, MERIT further improves performance by up to $ +2.66\%$ AUC and $ +2.11\%$ F1 on PTB-XL SubClass, while also demonstrating robustness under multiple distribution-shift settings. Moreover, leveraging the learned ECG representations for ECG-conditioned clinical text generation with large language models improves text quality across several metrics, including ROUGE and METEOR. Together, these results demonstrate that MERIT learns more informative and clinically meaningful ECG representations, particularly for fine-grained clinical applications.
【2】Comparative Analysis of Liquid Neural Networks and LSTM for Sequential Pattern Recognition: Robustness, Efficiency, and Clinical Utility
标题:用于序列模式识别的液体神经网络和LSTM的比较分析:鲁棒性、效率和临床实用性
链接:https://arxiv.org/abs/2605.27467
备注:9 pages, 7 figures, 6 tables, The conference paper will appear in Proceedings of JCSSE 2026
摘要:Traditional Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) units operate on discrete time steps, often failing to capture the fluid temporal dynamics of real-world physical processes. Liquid Neural Networks (LNNs), specifically Closed-form Continuous-time (CfC) networks, address this by modeling the hidden state evolution as a continuous differential equation. In this paper, we conduct a comprehensive benchmarking study across four distinct sequential modalities: neuromorphic event-based data (N-MNIST), stroke-based drawing (QuickDraw), visual handwriting (IAM), and physiological time-series (PhysioNet Sepsis-3). Furthermore, we perform a rigorous stress test using temporal dropout to evaluate model robustness against missing data. Our findings reveal that LNNs consistently provide superior parameter efficiency and significantly higher robustness in natively temporal domains and clinical environments where data sparsity is prevalent. This extended preprint provides additional background on related datasets and the LNN theoretical lineage, supplemented with a detailed appendix documenting our full implementation and experimental settings.
【3】Genetic algorithm vs. gradient descent for training a neural network architecture dedicated to low data regimes in small medical datasets
标题:遗传算法与梯度下降用于训练专门用于小型医疗数据集中低数据状态的神经网络架构
链接:https://arxiv.org/abs/2605.27411
摘要:Aim/Introduction: Distance-encoding biomorphic-informational neural network (DEBI-NN) is a recently proposed architecture in which connection weights are defined by the distances between neurons positioned in a Euclidian space. This approach drastically reduces the number of trainable parameters compared to classical neural networks in which weights are directly trained. The training process for DEBI-NN is based on a genetic algorithm (GA), rather than gradient descent (GD) which remains the prevailing optimization algorithm in deep learning. We aim to design and implement a GD learner for DEBI-NN and assess its performance compared to GA. Materials and Methods: We designed a spatial backpropagation scheme tailored to DEBI-NN and carried out a comparison between GD and GA for classification tasks, using a synthetic non-linear "two-moons" dataset, two clinical medical imaging radiomic datasets and a fetal cardiotocography dataset with a sample sizes ranging from n=85 to n=2126. Each optimizer was tuned through targeted hyperparameter searches adapted to each dataset. Results: Across all experiments, GA consistently produced superior decision boundaries and classification performance (Synthetic: 100% vs 83%; DLBCL: 83% vs 78%; HECKTOR: 80% vs 67%; Fetal: 81% vs 66%), whereas GD exhibited instability and failed to fully capture the non-linear patterns inherent to DEBI-NN's spatial encoding. The entangled gradients resulting from neuron interdependencies limit the effectiveness of classical backpropagation. Conclusion: These findings highlight fundamental limitations of gradient-based methods in architectures with highly interdependent spatial parameters and confirm the suitability of evolutionary strategies for training DEBI-NN.
蒸馏|知识提取(3篇)
【1】AREA: Attribute Extraction and Aggregation for CLIP-Based Class-Incremental Learning
标题:领域:基于CLIP的类增量学习的属性提取和聚集
链接:https://arxiv.org/abs/2605.28809
备注:Accepted to ICML 2026. Code is available at https://github.com/LAMDA-CL/ICML2026-AREA
摘要:Class-Incremental Learning (CIL) is important in building real-world learning systems. In CLIP-based CIL, the model performs classification by comparing similarity between visual and textual embeddings obtained from template prompts, e.g., ``a photo of a [CLASS]''. This seemingly monolithic matching process can be decomposed into two conceptually distinct stages: attribute extraction and attribute aggregation. For example, a model may recognize cat using attributes such as fur texture and whiskers. When learning a new class like car, the model must extract additional attributes like wheels and adjust how they are aggregated in the shared representation space. However, since only data from the current task is available, incremental updates can bias both attribute extraction and aggregation toward new classes, leading to catastrophic forgetting. Therefore, we propose AREA for attribute extraction and aggregation in CLIP-based CIL. To stabilize extraction, we anchor class-level visual and textual attributes on the hyperspherical embedding space via principal geodesic analysis. To stabilize aggregation, we learn lightweight task-specific experts with scoring and residual refinement, regularized by a variational information bottleneck objective. During inference, we perform routing over task attribute manifolds via optimal transport for more concise prediction. Experiments show that AREA consistently outperforms SOTA methods. Code is available at https://github.com/LAMDA-CL/ICML2026-AREA.
【2】STARS: Spike Tail-Aware Relational Synthesis for ANN-to-SNN Data-Free Knowledge Distillation
标题:STARS:用于ANN-to-SNN无数据知识蒸馏的Spike尾部感知关系合成
链接:https://arxiv.org/abs/2605.27409
摘要:SNNs promise energy-efficient and low-latency inference, but their performance still trails that of ANNs. ANN-to-SNN knowledge distillation helps narrow this gap, yet the original training data are often unavailable in practical deployment settings. Existing data-free knowledge distillation (DFKD) methods synthesize surrogate data by matching teacher-side priors, especially BN statistics, but these ANN-oriented constraints mainly regularize mean and variance and therefore remain under-constrained for SNN students whose responses depend on threshold-crossing dynamics. In this paper, we propose Spike Tail-Aware Relational Synthesis (STARS), a plug-and-play method for ANN-to-SNN DFKD that augments standard BN-guided synthesis with two complementary objectives: Relational Consistency Alignment, which preserves cross-sample relational consistency between teacher and student, and Tail-Aware Regularization, which regularizes threshold-relevant tail probabilities through soft exceedance over teacher-derived thresholds. Together, these objectives generate synthetic batches that remain teacher-valid while becoming more informative for SNN students. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet across multiple ANN-SNN pairs show that our method consistently improves conventional DFKD baselines and even surpasses several KD methods, with gains of up to 4.6\% on CIFAR-10 and 6.7\% on CIFAR-100, highlighting the importance of complementing BN matching with relational and tail-aware constraints in SNN-oriented DFKD.
【3】Multi-Teacher Knowledge Distillation via Teacher-Informed Mixture Priors
标题:通过教师知情的混合先验进行多教师知识提炼
链接:https://arxiv.org/abs/2605.27967
摘要:Knowledge distillation is a powerful method for model compression, enabling the efficient deployment of complex deep learning models (teachers), including large language models. However, its underlying statistical mechanisms remain unclear, and uncertainty evaluation is often overlooked, especially in real-world scenarios requiring diverse teacher expertise. To address these challenges, we introduce \textit{Multi-Teacher Bayesian Knowledge Distillation} (MT-BKD), where a distilled student model learns from multiple teachers within the Bayesian framework. Our approach leverages Bayesian inference to capture inherent uncertainty in the distillation process. We introduce a teacher-informed prior, integrating external knowledge from teacher models and task-specific training data, offering better generalization, robustness, and scalability. Additionally, an entropy-based weighting mechanism adaptively adjusts each teacher's influence, allowing the student to combine multiple sources of expertise effectively. MT-BKD enhances the interpretability of the student model's learning process, improves predictive accuracy, and provides uncertainty quantification. We validate MT-BKD on both synthetic and real-world tasks, including protein subcellular location prediction and image classification. Our experiments show improved performance and robust uncertainty quantification, highlighting the strengths of our MT-BKD framework.
推荐(2篇)
【1】Affective Music Recommendation: A Rollout-Based World Model for Offline Preference Optimization
标题:情感音乐推荐:线下偏好优化的基于推出的世界模型
链接:https://arxiv.org/abs/2605.28810
摘要:Functional music applications, from consumer focus and sleep aids to clinical interventions, share a distinctive recommendation problem: success is defined by the listener's affective state, but online experimentation on emotion is ethically constrained, particularly for clinical populations who cannot reliably skip a song or report distress. We describe AMRS, the Affective Music Recommendation System deployed on LUCID's health-and-wellness platforms, which serve clinical users (primarily older adults with neurocognitive conditions) and consumer-wellness users across energize, focus, calm, and sleep modes. AMRS is built around a rollout-based world model: a causal transformer trained on logged listening data to jointly predict engagement, binary rating, and self-reported valence and arousal. The world model serves both as an in-silico simulator for offline policy training and as a stress-testing tool before deployment. A recommender policy initialized by behaviour cloning is fine-tuned offline with Direct Preference Optimization (DPO) against a configurable multi-objective utility function. Under a strict cold-start protocol, the world model predicts both behavioural and affective signals with usable fidelity; DPO improves predicted valence and arousal over the cloned baseline while maintaining a similar diversity profile and avoiding the distributional collapse produced by greedy optimization. We position the work as an early deployed validation of a methodology for affective recommendation when online experimentation is ethically untenable.
【2】Context Features Are Cheap: Rank-Aware Decomposition for Efficient Feature Interaction in Recommender Systems
标题:上下文特征很便宜:推荐系统中高效特征交互的排名感知分解
链接:https://arxiv.org/abs/2605.27450
摘要:Modern industrial recommender systems use a deep ranking model to score N candidates against the same user and context features. Standard implementations broadcast context features early in the forward pass, redundantly computing context-only operations N times per request. We present a rank-aware decomposition applicable to the dominant interaction mechanisms in modern recommender architectures-Factorization Machine (FM) pairwise products, Deep Cross Network (DCNv2) cross layers, self-attention, and fully connected (FC) projection layers-built on a single algebraic principle: any linear or bilinear operation over a rank-partitioned input admits an exact block decomposition that moves context-only computation from once-per-candidate to once-per-request, identity-equivalent to the original model. Closed-form analysis and controlled ablation verify that savings scale quadratically with the number of context features. Applied to a production DLRM-style ranker without any architectural change, the decomposition increases per-pod throughput by 87.5% (a 47% reduction in peak pod count) at identical model predictions. The identity-equivalent decomposition applies only at the first layer of cross networks and self-attention, since each layer mixes ranks in its output. To extend savings across depth, we further introduce rDCN, an architectural variant of DCNv2 that maintains rank discipline across depth and matches DCNv2 accuracy within training noise at 67% fewer total FLOPs, and sketch an analogous architectural variant for self-attention.
点云|SLAM|雷达|激光|深度RGBD相关(1篇)
【1】Law of Neural Interaction: Depth-Width Shape, Interaction Efficiency, and Generalization
标题:神经相互作用定律:深度-宽度形状、相互作用效率和泛化
链接:https://arxiv.org/abs/2605.27989
备注:30 pages, 4 figures
摘要:The guidance of scaling laws has increased the resource demands of modern large language models (LLMs), yet it remains questionable whether these models utilize resources effectively under a fixed budget. Previous research has proved superposition as a key contributor to loss. By leveraging the Neural Feature Ansatz, we extend superposition from parameter space to gradient space and define it as neural interaction. We find that under a fixed budget, good generalization is usually accompanied by efficient neural interactions, and the model can be placed in an efficient interaction interval by adjusting its depth-width ratio ($R_{D/W}$). In addition, as the budget scales up, the efficient interaction interval of the model remains relatively stable. By comparing existing small scale dense LLMs, we observe that models operating near this interval tend to perform better on the MMLU-Pro benchmark. Our findings reveal that the $R_{D/W}$ influences resource utilization efficiency and thereby affects generalization, providing insights into model shape initialization and the understanding of model generalization mechanisms. Code for Neural Interaction Law is available at: https://anonymous.4open.science/r/Neural_Interaction_Law-D788
联邦学习|隐私保护|加密(4篇)
【1】Dimensionality Reduction for Robust Federated Learning: A Theoretical Analysis and Convergence Guarantee
标题:鲁棒联邦学习的复杂性降低:理论分析和收敛保证
链接:https://arxiv.org/abs/2605.28335
摘要
:Federated Learning (FL) enables multiple clients to collaboratively train models without sharing raw data, but it is highly vulnerable to Byzantine attacks. Existing robust approaches can neutralize these threats but incur substantial computational overhead during high-dimensional gradient aggregation, an overhead that scales poorly with model size and increasingly dominates the training cost as modern models grow larger. To address this computational bottleneck, we propose Projected Dimensionality Reduction (PDR), a universal acceleration framework for vector-level distance-based robust aggregators, which performs robust aggregation by compressing gradients into a drastically smaller subspace via sparse random projection to efficiently compute reliability weights. This approach reduces the server computational complexity to an optimal $ \mathcal{O}(Mp) $, where $ M $ is the number of clients and $ p $ is the model dimension, matching the theoretical lower bound required merely to read the gradients. We establish convergence guarantees under standard FL assumptions in prior Byzantine-robust FL analyses. By leveraging the Subspace Embedding Theorem, we show that PDR achieves optimal convergence rates of $ \mathcal{O}(1/\sqrt{T}) $ for non-convex functions and $ \mathcal{O}(1/T) $ for strongly convex functions, where $ T $ denotes the number of iterations. Crucially, we mathematically demonstrate that this massive acceleration comes almost for free, merely inflating the inherent Byzantine error floor by a bounded, tunable factor of $ \frac{1+ε}{1-ε} $. Experimental results on benchmark datasets confirm that integrating PDR with existing aggregators yields orders of magnitude speedups in time efficiency while maintaining highly competitive convergence performance.
【2】A Methodology to Assess Power Modeling in Energy-Aware Federated Learning on Heterogeneous Mobile Devices
标题:一种评估异类移动设备上能源感知联邦学习中功耗模型的方法
链接:https://arxiv.org/abs/2605.27601
备注:19 pages, 3 figures, 7 tables, Accepted for publication in the proceedings of Networked Systems (NETYS 2026), Springer Nature
摘要:Estimating CPU power on heterogeneous ARM-based commodity devices is challenging due to limited access to CPU's voltage domains. As a result, state-of-the-art energy-aware Federated Learning (FL) frameworks typically rely on simplified approximate power models to estimate computation energy, rather than the more accurate analytical CMOS-based model. To bridge this gap, we propose a reproducible CPU power estimation methodology combined with a rail-to-cluster mapping technique to retrieve cluster-level supply voltage. We evaluate our approach on two commodity Android devices and show that the analytical model predicts CPU power with errors below 10%, whereas the approximate model incurs errors of up to 959%. Using AnycostFL, a state-of-the-art energy-aware FL framework, we show that the analytical model achieves the same 80% model accuracy while consuming 1.4x less energy than the approximate model. These results highlight that approximate models can severely misestimate computation energy and lead to suboptimal decisions. This work facilitates the use of analytical CPU power models on heterogeneous multi-cluster ARM-based mobile SoCs without additional hardware support or external power measurement tools.
【3】Federated Learning for Multivariate Time Series Anomaly Detection in Industrial Automation
标题:工业自动化中多元时间序列异常检测的联邦学习
链接:https://arxiv.org/abs/2605.27486
备注:Preprint. Accepted at the DEXA International Workshop on Optimisation of Industrial Production with AI Algorithms 2026 (DEXA AI4IP 2026)
摘要:Federated learning (FL) has broadened the horizon for multivariate time series anomaly detection (MTSAD). However, benchmarking such anomaly detection methods within FL paradigm poses data-centric challenges. The existing datasets do not counteract these challenges since they do not simultaneously provide sufficient scale, accurate labels, and freedom from common flaws. In addition, the role of cyclic process behavior, which is common in discrete industrial automation, remains underexplored for MTSAD for the current state of research. This paper aims to shed more light on the literature and address these gaps by introducing a dataset designed with cyclic dynamics arising from the repetitive nature of discrete automation processes and evaluates selected MTSAD methods on both the proposed dataset and a public benchmark dataset.
【4】Can Quantum Federated Learning Withstand Circuit-Level Backdoors?
标题:量子联邦学习能否抵御电路级后门?
链接:https://arxiv.org/abs/2605.27416
备注:Accepted to IJCAI-ECAI 2026
摘要:Quantum Federated Learning (QFL) inherits the core vulnerability of federated optimization to malicious clients, while also introducing an attack surface from variational circuit training and measurement-driven gradients. This work proposes a novel CircUit-Level backdoor Threat (CULT) model that formalizes four stealthy attacks by exploiting quantum-aware mechanisms, including Grover, Pauli, Bit-flip, and Sign-flip. By enabling malicious clients on both in-training and post-training surfaces, these attacks can critically undermine the learning process. We establish a rigorous theoretical foundation to demonstrate attack stealthiness under standard smoothness assumptions. Experiments on the MNIST and CIFAR-10 datasets with non-IID splits and varying fractions of malicious clients show that even a single malicious client can induce severe accuracy degradation under FedAvg aggregation. While popular defenses, including Krum, Multi-Krum, FoolsGold, FLGuardian, and Mud-HoG, reduce degradation in many regimes, they fail to eliminate worst-case failure cases, where accuracy drops up to 50\%. The experimental analysis further reveals that under the CULT model, malicious updates effectively mask their presence by staying close to benign norms, thereby helping attackers evade detection.
推理|分析|理解|解释(17篇)
【1】PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective
标题:PEFT-Arena:从稳定性-可塑性的角度理解参数高效微调
链接:https://arxiv.org/abs/2605.28819
备注:Technical report v1 (28 pages, 9 figures, project page: https://spherelab.ai/PEFT-Arena/)
摘要:Parameter-efficient finetuning (PEFT) has become the standard approach for adapting large language models, yet evaluations largely emphasize downstream accuracy while overlooking the retention of pretrained capabilities. We argue that PEFT should be assessed through the stability-plasticity dilemma: the trade-off between target-task adaptation and resistance to forgetting. We introduce PEFT-Arena, a benchmark that jointly measures downstream performance and general capability retention. Across methods, we find distinct stability-plasticity profiles; under comparable parameter budgets, orthogonal finetuning achieves the most favorable Pareto frontier. To explain these differences, we analyze PEFT updates from two geometric perspectives. In weight space, spectral analysis reveals how parameterizations interact with the pretrained singular-value structure. In activation space, retention metrics show whether finetuning preserves or distorts general-capability representations, with forgetting linked to non-isometric representation distortion. Finally, an analysis shows that final SFT checkpoints often overshoot a better target-retention operating point. Inspired by this, we present case studies of a post-hoc improvement with path-wise rewinding.
【2】CaMBRAIN: Real-time, Continuous EEG Inference with Causal State Space Models
标题:CaMBRAIN:利用因果状态空间模型进行实时、连续的脑电推断
链接:https://arxiv.org/abs/2605.28792
备注:22 pages, 3 figures, 8 tables
摘要
:Electroencephalography (EEG) is a critical, non-invasive method to monitor electrical brain activity. EEGs can span anywhere from a couple seconds to multiple hours, posing a major hurdle for existing deep learning methods due to two major factors: (1) existing EEG models are predominantly built upon the attention mechanism, incurring quadratic scaling as the sequence length increases, and (2) raw EEG signals must be processed in a sliding-window fashion due to fixed-length input requirements, preventing global understanding of the entire signal. To this extent, we propose CaMBRAIN - the first Causal, Mamba-based state space model (SSM) capable of real-time inference of EEG signals, arguing that bidirectional approaches are needlessly expensive given the causal, unidirectional nature of EEG. However, training such a model is non-trivial, as crucial EEG events can be extremely brief - within fractions of a second - yet separated by long intervals spanning minutes. Current EEG methods use self-supervised objectives that optimize for signal reconstruction, but these are not well suited for streaming SSMs; they fail to explicitly train the hidden state to retain the salient long-range context needed for streaming inference. We therefore introduce a multi-stage self-supervised training pipeline specifically tailored to encourage long-range memory retention and strong performance on EEG signals, while preserving the linear-time complexity of state space models. CaMBRAIN achieves state-of-the-art (SOTA) results across 3 different EEG datasets with >10x higher throughput than existing models, enabling the first model capable of long-range, continuous inference of variable-length EEG signals.
【3】Understanding Generalization and Forgetting in In-Context Continual Learning
标题:理解上下文持续学习中的概括和遗忘
链接:https://arxiv.org/abs/2605.28705
备注:accepted by ICML 2026
摘要:
【4】Learning Theory of the SVRG: Generalization and Convergence Analysis
标题:SVRG的学习理论:概括与收敛分析
链接:https://arxiv.org/abs/2605.28513
摘要:Variance reduction (VR) methods employ stochastic gradients with decreasing variance, and they have been widely applied to solve large-scale optimization problems in machine learning because of their efficiency. Existing theoretical studies of VR methods are mainly focused on the convergence analysis, leaving the generalization behavior largely unexplored. In this paper, we bridge this gap by developing the first non-vacuous generalization analysis of the representative VR method: Stochastic Variance Reduced Gradient (SVRG), through the lens of algorithmic stability. In particular, we establish sharp stability bounds of the SVRG in both convex and strongly convex settings by exploiting its algorithmic structure. The obtained bounds are data-dependent, because the training errors are incorporated along the trajectory. Our analysis clarifies the interplay between optimization and generalization, leading to optimal excess population risk bounds in both settings. Our approach differs substantially from existing analyses of stochastic algorithms in the sense that we decompose the SVRG update as an SGD-like step plus a zero-mean correction term and then introduce novel Lyapunov functions to absorb the additional gradient terms induced by the reference points. Our analytical framework can be generalized to other VR methods, and we demonstrate the generalization by the well-known Stochastic Average Gradient Accelerated (SAGA) method.
【5】ProvMind: Provenance-grounded reasoning for materials synthesis
标题:ProvMind:基于来源的材料合成推理
链接:https://arxiv.org/abs/2605.28487
摘要:Materials process optimization requires reasoning over routes, conditions, tools and causal dependencies, yet most computational formulations flatten synthesis procedures into text or ordered steps. We introduce MatProcBench, a provenance-grounded benchmark constructed from literature-mined MatPROV graphs, to evaluate seven process-reasoning tasks spanning route continuity, step-level variable inference and global causal consistency under both same-split and shift-aware evaluation, including a strict dual-OOD split that combines temporal and material-class shift. We further introduce ProvMind, a process-memory reasoning framework that retrieves analogous training processes, converts them into provenance-aware option-level compatibility scores, and uses a language model for constrained final decision making. ProvMind achieves 52.84\% accuracy on the dual-OOD split, outperforming prompting, retrieval-augmented and supervised fine-tuning baselines.
【6】Explaining is Harder Than Predicting Alone: Evaluating Concept-based Explanations of MLLMs as ICL Visual Classifiers
标题:解释比单独预测更难:评估MLLM的基于概念的解释作为ICL视觉分类器
链接:https://arxiv.org/abs/2605.28215
备注:Accepted to the CompLearn Workshop at ICML 2026
摘要:In-context learning (ICL) enables multimodal large language models (MLLMs) to classify images from a few labelled examples. Yet, how these models use the provided context remains opaque. While Chain-of-Thought prompting is widely used, recent work argues that it may not reflect true internal computation. In this paper, we systematically evaluate the concept-based explainability of frozen MLLMs under few-shot ICL using five conditions of increasing formal rigour, ranging from baseline classification to Description Logics (DL) axiom generation. Evaluating four state-of-the-art MLLMs via an independent LLM-as-a-judge pipeline, we demonstrate that explaining is genuinely harder than predicting alone. Surprisingly, forcing models to generate formally structured, concept-based explanations degrades predictive accuracy monotonically (from 93.8% to 90.1%), contradicting the assumption that explicit reasoning universally aids performance. However, when models successfully articulate class-discriminative visual features, explanation quality strongly correlates with correct predictions. Our findings suggest that while MLLMs excel at visual classification, they lack the specific instruction-tuning required for formal, machine-verifiable explainability.
【7】BPPO: Binary Prefix Policy Optimization for Efficient GRPO-Style Reasoning RL with Concise Responses
标题:BPPO:具有简洁响应的高效GRPO式推理RL的二进制后缀策略优化
链接:https://arxiv.org/abs/2605.28028
摘要:Group Relative Policy Optimization (GRPO) is widely used for training reasoning models, but updating all sampled completions in each group incurs substantial cost and can reinforce verbose reasoning trajectories. In this paper, we study whether all completions provide equally useful update signals in GRPO-style reasoning RL. Our gradient-similarity analysis shows that, within the same prompt group, same-class completions often induce highly similar update directions, whereas correct-incorrect pairs provide more distinct contrastive signals. Motivated by this observation, we propose Binary Prefix Policy Optimization (BPPO), which uses the shortest correct completion and the shortest incorrect completion as a compact update unit while preserving full-group advantage normalization. BPPO further improves efficiency with adaptive completion scheduling and prefix-focused optimization; by updating only response prefixes, it avoids reinforcing redundant suffixes and encourages more concise responses. Experiments on GSM8K, MATH, and Geo3K show that BPPO achieves up to 6.08x speedup over GRPO while maintaining competitive accuracy, and reduces mean response length by approximately 30-50% without modifying the reward with an explicit length penalty.
【8】Behavioural Analysis of Alignment Faking
标题:对齐伪造的行为分析
链接:https://arxiv.org/abs/2605.27681
备注:preprint
摘要:Alignment faking (AF) refers to a model strategically complying with a training objective to avoid behavioural modification while preserving its deployment preferences. Understanding when and why AF arises matters as models grow better at distinguishing training from deployment. Prior work finds AF fragile, prompt-sensitive, and model-dependent, leaving its underlying drivers unclear. We study AF in a controlled, minimal setup that isolates its core components, and observe it across a wider range of models than previously reported, including small-scale models. We identify three separable drivers -- values, goal guarding, and sycophancy -- and show via targeted prompt ablations and activation steering that each independently modulates AF behaviour. Our results indicate AF is more widespread than previously reported and that its occurrence is predictable from situational cues and measurable model tendencies such as baseline sycophancy and stated values. The decomposition suggests concrete directions for detecting and mitigating AF in future models.
【9】GenSBI: Generative Methods for Simulation-Based Inference in JAX
标题:GenJB:JAX中基于模拟的推理生成方法
链接:https://arxiv.org/abs/2605.27499
备注:48 pages + 1 appendix, 33 figures, 18 tables. For the associated Python code, see https://github.com/aurelio-amerio/GenSBI
摘要:Flow and diffusion generative models have established themselves as widely adopted density estimators for simulation-based inference (SBI), extending naturally from neural posterior estimation to likelihood and joint density estimation. Their principled optimization objectives and freedom from architectural constraints have driven rapid adoption across the natural sciences. Yet the most widely used SBI libraries remain PyTorch-based, leaving researchers who develop their forward models and analysis pipelines in JAX without a native option. We present GenSBI, an open-source library that implements flow matching, score matching, and denoising diffusion entirely in JAX. The library offers three transformer-based architectures - SimFormer, Flux1, and a novel Flux1Joint that extends gate-modulated transformer blocks to joint density estimation - all interchangeable through a unified interface that decouples generative method, neural backbone, and inference mode. GenSBI provides an end-to-end workflow from training through posterior calibration (SBC, TARP, LC2ST) and supports custom architectures with domain-specific embedding networks. We validate the framework on standard SBI benchmarks, achieving near-ideal mean C2ST scores (0.50-0.56, where 0.50 is ideal) on SBIBM tasks with minimal per-task tuning and well-calibrated posterior coverage across all tested configurations. The code is publicly available at https://github.com/aurelio-amerio/GenSBI.
【10】A Unified Structured Query Understanding Framework for Industrial Semantic Search
标题:工业语义搜索的统一结构化查询理解框架
链接:https://arxiv.org/abs/2605.27441
备注:Accepted by KDD-ADS 2026
摘要:Query understanding in large-scale industrial search systems is typically implemented as a cascade of disparate, task-specific components. While individually optimizable, this fragmented architecture incurs high maintenance overhead and results in inconsistent behaviors, particularly for long-tail queries. In this work, we propose and deploy a unified structured query understanding system that consolidates these heterogeneous functions into a single Small Language Model (SLM) that performs schema-constrained generation. To address the data bottlenecks inherent in unified modeling, we introduce Query Illuminator, a dual-purpose framework serving as: (i) a teacher model for high-quality auto-annotation and distillation, and (ii) a surrogate judge for scalable evaluation where human labels are scarce. We validate this approach through extensive offline and online tests within LinkedIn's Job Search system. Furthermore, we demonstrate the framework's horizontal extensibility through a cross-domain case study on People Search. The results show improved user engagement and reduced operational costs, achieved while satisfying strict low-latency serving constraints on limited GPU resources.
【11】$E^3$-Agent: An Executable and Evolving Agent for Resource Management of Edge Generative Inference
标题:$E ' 3 $-Agent:用于边缘生成推理资源管理的可执行且不断进化的Agent
链接:https://arxiv.org/abs/2605.27428
备注:13 pages, 4 figures, 6 tables
摘要:Edge deployments of generative inference increasingly face two practical realities: per-device per-model performance is often unknown at deployment time, and it is non-stationary due to user-driven semantic events, background load, and device churn. Consequently, a resource manager that is tuned offline under a fixed regime can become brittle and expensive to maintain. This paper presents $E^3$-Agent, an executable and evolving agent for edge artificial intelligence generated content (AIGC) resource management. $E^3$-Agent separates a fast-path router that makes millisecond-level dispatch decisions from a slow-path, event-driven large language model (LLM) meta-controller that mitigates regime shifts through a small, explicit control surface exposed via a tool interface, including risk gating, router configuration, and rapid performance calibration. The agent learns online from execution feedback and continuously adapts to unknown and time-varying service-time mappings. We evaluate $E^3$-Agent in a discrete-event simulator driven by MLPerf-derived device-model measurement priors, covering cold-start warmup and three dynamic regimes: semantic dynamics, device churn, and hidden drift. Across the dynamic scenarios, $E^3$-Agent reduces average latency by 65%-73% compared to the best static baseline, stays within 7%-10% of an online full-information Oracle used for evaluation, and effectively suppresses stutter rate under semantic degradation.
【12】Bridging Maximum Likelihood and Optimal Transport for Efficient Inference and Model Selection in Stochastic Block Models
标题:随机区组模型中最大似然和最优运输的有效推理和模型选择
链接:https://arxiv.org/abs/2605.28488
备注:10 pages, 8 figures
摘要:We study inference in stochastic block models (SBMs) through the lens of optimal transport (OT). We first establish that maximum likelihood variational inference (MLVI) can be interpreted as a semi-relaxed Gromov-Wasserstein (srGW) projection with entropic regularization. While this formulation yields accurate clustering, the entropic regularization prevents transport plans to be sparse, hindering intrinsic model selection. Consequently, we investigate unregularized srGW estimators, and prove that they consistently recover both the SBM connectivity matrix and latent cluster assignments in the asymptotic regime. However, this asymptotic property does not translate into reliable model selection in finite samples, and calls for additional mechanisms to promote sparsity in the inferred cluster proportions. We empirically show that such a regularized formulation yields estimators that simultaneously recover model parameters and select the number of clusters in a single optimization problem, thereby avoiding costly grid search or heuristic model selection procedures.
【13】Deep Neural Network Training as Random Effects: An Optimization-Inference Duality
标题:作为随机效应的深度神经网络训练:优化-推理二元性
链接:https://arxiv.org/abs/2605.27991
摘要:Deep neural networks (DNNs) have achieved remarkable empirical success, yet their training dynamics remain understood mainly from optimization rather than statistical principles. Here we develop a statistical framework for DNN training in the over-parameterized regime by showing that the prediction induced by continuous-time neural tangent kernel (NTK) gradient flow is exactly equivalent to that from a classical random-effects model. In this framework, training time acts as a variance component, or equivalently an empirical Bayes covariance hyperparameter, governing the allocation of variation from noise to structured signal. This equivalence reveals an optimization-inference duality: the gradient-flow path is both an optimization trajectory and an empirical Bayes random-effects inference path. Conditional on training time, the network output is the posterior mean of the latent signal, and estimating training time by restricted maximum likelihood (REML) turns early stopping into likelihood-based empirical Bayes inference rather than external tuning. This perspective yields a two-stage inferential procedure. First, a variance-component test determines whether DNN training captures statistically significant structure beyond initialization. Second, conditional on training being warranted, REML provides a likelihood-based early stopping rule. The resulting stopping time admits a spectral interpretation in the NTK eigenbasis, where training proceeds until spectral loss decorrelation is achieved. We further establish that REML-guided early stopping achieves asymptotically optimal prediction error for fixed-design in-sample prediction and, under additional random-design regularity conditions, for out-of-sample prediction. This work reframes DNN training as statistical inference and provides a principled foundation for deciding whether and how long to train deep neural networks.
【14】Quantum principal component analysis without eigenvector recovery
标题:无特征量恢复的量子主成分分析
链接:https://arxiv.org/abs/2605.27942
摘要:Principal component analysis (PCA) is traditionally implemented through a covariance or kernel matrix, leading-eigenvector extraction, and hard rank-$k$ projection. These steps can be computationally costly in high-dimensional and quantum-data settings, sensitive to small eigengaps, and unnecessary when downstream tasks only require principal-subspace scores. Such score-based objectives are important in applications such as anomaly detection, spectral-energy profiling, and other postselection tasks. To address these needs, we introduce a measurement-based soft PCA framework replacing the hard top-$k$ projector with an entropy-regularized Fermi--Dirac filter. This filter is the unique optimizer of an entropy-regularized variational formulation of PCA and converges to the classical PCA projector in the zero-temperature limit. This filter has a direct interpretation as a quantum measurement, which naturally suggests a quantum approach. For centered covariance operators represented by quantum feature states, a single fixed circuit, together with threshold calibration, accesses all optimal filters for different rank budgets or retained-variance levels without rank-dependent circuit updates or eigenvector recovery. For new inputs, the same calibrated quantum circuit yields soft principal subspace scores, spectral energy profiles, and postselected filtered states. The required centering of both training and test data is performed coherently inside the quantum protocol, which is particularly important for quantum data where no classical feature vectors or centered Gram matrix are directly available. By reframing PCA as a calibrated measurement task, this framework bypasses the need for iterative eigenvector extraction and achieves a dimension-independent sample complexity $O(η^{-2})$ for normalized fractional-rank or retained variance scoring at additive accuracy $η$.
【15】Semiparametrically Efficient Inference for Kernel Measures of Noise Heterogeneity
标题:噪音异方差核度量的半参数有效推断
链接:https://arxiv.org/abs/2605.27526
摘要:We develop semiparametrically efficient inference for kernel measures of noise heterogeneity in additive noise models. In many applications, the regression function is estimated using flexible machine learning methods. Downstream procedures based on the resulting residuals can then inherit first-stage bias: regression error may induce spurious dependence between covariates and residuals, invalidating the assumptions needed for standard analysis. We construct a novel Hilbert-valued one-step estimator of the kernel covariance operator between covariates and residuals. Our estimator yields bootstrap-calibrated tests for residual independence and goodness of fit in additive noise models, while also providing asymptotically efficient confidence intervals for the kernel dependence measure under noise heterogeneity. The framework extends to settings with additional covariates, enabling inference on distributional heterogeneity of residual noise across treatment groups. Simulations show improved calibration and power relative to naive plug-in residual methods.
【16】Stop Suppressing the Tail: Causal Inference for Extreme Events
标题:停止压制尾巴:极端事件的因果推理
链接:https://arxiv.org/abs/2605.27474
备注:22 pages, 6 figures, 13 tables. Keywords: double machine learning, dose-response, heavy tails, extreme value theory, causal inference
摘要:Estimating how an outcome responds to a continuous treatment (the Average Dose-Response Function, or ADRF) is a core causal-inference primitive. However, when outcomes possess heavy tails, standard robust double machine learning (DML) deliberately suppresses these extremes to stabilize the bulk average. In high-stakes settings, such as financial returns or climate losses, this omitted 1-in-1000 extreme event is the actual target quantity. Furthermore, current methods that read the tail from a model's residuals suffer from circular dependence, causing tail shape inferences to shift drastically based solely on whether the core estimator is switched between Huber and Welsch.The research proposes an ADRF estimator that emits a structured tail-shape output alongside the standard point estimate. Its tail diagnostic (PDHTE+JK) evaluates the per-treatment tail shape from the outcome centered by a pilot median, successfully breaking the circular dependence and rendering the diagnostic invariant to the choice of core method. The output encompasses four treatment-conditional quantities: tail shape $\hatξ(t)$, deep-tail return levels $\hat{Q}_α(t)$, conditional shortfalls $\hat{S}_α(t)$, the recovered mean ADRF, and an explicit refusal mechanism that declines extrapolation when extreme-value modeling is unsupported by the data. Compared to kernel-weighted quantile regression (QR), the proposed estimator reduces deep-tail ($α=0.001$) return-level MAE by 11% and conditional-shortfall MAE by 25.5% across a heavy-tailed panel. It also achieves a 20-29% MAE reduction in sample-scarce regimes ($n\le2000$). On freMTPL2 motor-insurance claims, it successfully triggered an explicit extrapolation refusal on the log-claim scale, which neither QR nor loss-only DML can produce.
【17】Calibrated Inference for the Conditional Average Treatment Effect in the Few-Placebo Regime via Gaussian Processes
标题:通过高斯过程对少量安慰剂方案中条件平均治疗效果的校准推断
链接:https://arxiv.org/abs/2605.27473
备注:14 pages, 1 figure, 5 tables
摘要
:Estimating how much an intervention helps a given individual the conditional average treatment effect (CATE) is increasingly central to decision-making in medicine, economics, and policy, where an estimate is most useful when accompanied by a calibrated uncertainty interval. We study the few-placebo regime, in which one treatment arm is much smaller than the other, as arises in unequal-allocation trials and small-holdout $A/B$ tests. The standard estimator in this setting is the X-Learner, and a natural way to obtain credible intervals is to make its second stage Bayesian. We show that these intervals under-cover: they contain the true effect less often than their nominal level. We trace this to a structural cause the X-Learner's regression target inherits the bias of a nuisance model fitted to the small arm, so the posterior is centered away from the true effect and we find that the standard remedy, regressing an orthogonal doubly-robust score, is also unreliable here, since the regime's limited overlap leaves the estimator either highly variable or, once stabilized, biased once more. Both consequences reflect a pattern that extends beyond causal inference: a separately estimated variance is attached to a point estimate of a hard-to-learn quantity, and the point estimate's bias is not captured by that variance. We propose GP-CATE, which models each arm's outcome surface with a Gaussian process, so the scarce arm's uncertainty enters the posterior directly rather than as an unmodelled bias. Across synthetic and semi-synthetic benchmarks, GP-CATE attains calibrated coverage where the estimators we compare against including Causal Forest and BART do not, at the cost of intervals that are appropriately wide when the data are uninformative.
检测相关(6篇)
【1】Detecting Diffusion-Generated Time Series Under Generator Shift
标题:检测生成器漂移下的扩散生成时间序列
链接:https://arxiv.org/abs/2605.28355
摘要:The boundary between real and diffusion-generated time series is becoming increasingly difficult to draw, yet detection in this domain remains underexplored, especially when the generator is unknown. We compare white-box detection, which requires access to the generator, against black-box detection, which operates on the raw signal alone. The white-box approach, a reconstruction-based detector adapted from the image domain, works well in in-distribution but breaks down under generator shift: reconstruction-based detection in images succeeds because large generic generators provide a near-universal reconstruction prior, and no analogous generator exists for time series. In contrast, a simple off-the-shelf classifier used as a black-box detector performs remarkably well, achieving an average F1 of 79.2, a 22.1% relative improvement over the white-box approach, and a TPR@1%FPR of 57.2. Diffusion-generated time series detection is therefore not a direct transfer of the image domain problem. This work provides the first systematic exploration of white-box and black-box detection for diffusion-generated time series. We close by identifying several open and promising directions.
【2】AOE: Exhaustive Out-of-Distribution Detection via Recalibrating Outlier Labels
标题:AOE:通过重新校准异常值标签进行详尽的分布失调检测
链接:https://arxiv.org/abs/2605.28021
摘要:Out-of-distribution (OOD) detection is essential for deploying machine learning models in open-world and safety-critical scenarios, where test inputs may deviate from the training distribution and overconfident predictions on unknown samples can lead to unreliable decisions. Outlier Exposure (OE) has emerged as a promising OOD detection paradigm by introducing auxiliary outliers during training to enlarge the margin between in-distribution (ID) and OOD samples. Existing OE-based methods typically enlarge this margin by employing uniform labels to maximize the entropy of OOD samples over ID categories. However, we theoretically show that uniform labels inevitably disregard the relations between OOD samples and ID categories, termed the over-softening effect, leading to a suboptimal margin bound. Our theoretical analysis further reveals that explicitly exploiting such relations can instead yield improved OOD detection performance. Motivated by this insight, we propose \underline{A}daptive Confidence \underline{OE} (AOE), a simple yet effective method that leverages temperature scaling to recalibrate outlier labels. Specifically, AOE generates adaptive soft targets from temperature-scaled model predictions for OOD samples, where the learnable temperature smooths the prediction distribution without fully erasing class-wise relational information. By supervising OOD samples with these adaptive soft targets, AOE preserves the semantic proximity between OOD samples and ID categories while encouraging the softened targets to approach a high-entropy distribution, thereby suppressing overconfident OOD predictions and enlarging the separation margin. Extensive experiments across diverse benchmarks demonstrate the effectiveness of AOE.
【3】Patched-DeltaNet: Token-Level Event-Driven Memory for Linear-Time Anomaly Detection
标题:Patched-DeltaNet:用于线性时间异常检测的令牌级事件驱动存储器
链接:https://arxiv.org/abs/2605.27992
备注:7 pages, 2 tables
摘要:Time series anomaly detection is critical for maintaining the reliability of mission-critical systems. While Transformer-based models like PatchTST have shown remarkable performance, their $\mathcal{O}(L^2)$ computational complexity severely limits deployment in resource-constrained environments. In this paper, we propose Patched-DeltaNet, a novel architecture combining time-series patching with Gated Delta Networks. By integrating these paradigms, we hypothesize and demonstrate the emergence of token-level event-driven memory, whereby the patching mechanism extracts local semantic chunks, while the error-driven DeltaNet updates its recurrent state exclusively when significant physical changes, defined as deltas, occur. This synergy effectively filters out background noise and captures sudden anomalous drifts. Our rigorous experiments on the Server Machine Dataset (SMD) benchmark demonstrate the structural superiority and sample efficiency of Patched-DeltaNet. By strictly outperforming recent architectures under unified evaluation constraints and identical compute budgets, our model yields an ROC-AUC of 0.957 and PA-F1 of 0.822, while drastically reducing computational complexity to the theoretical minimum of $\mathcal{O}(L/P)$.
【4】Mahalanobis PatchCore: Covariance-Aware and Streaming-Compatible Industrial Anomaly Detection
标题:Mahalanobis PatchCore:协方差感知和流兼容的工业异常检测
链接:https://arxiv.org/abs/2605.27748
备注:57 pages, 7 figures
摘要:Industrial visual anomaly detection is usually one-class: normal images are abundant, while defects are rare, heterogeneous, and often unavailable during system design. PatchCore-style retrieval suits this setting because it scores test images from a memory bank of normal patch features, but the standard Euclidean geometry ignores feature correlations and its offline construction materialises the full patch pool before subsampling. We introduce Mahalanobis PatchCore, a covariance-aware, streaming-compatible extension of PatchCore. Its artificial intelligence contribution is a retrieval detector that estimates a regularised covariance model in reduced feature space and whitens embeddings, so Euclidean nearest-neighbour search after transformation implements Mahalanobis retrieval. A bounded-memory, re-iterable training pipeline builds the memory bank without storing all normal patches at once, using incremental dimensionality reduction, online covariance estimation, and streaming aggregation. The engineering application is automated industrial inspection, where visual anomaly detection must remain accurate under practical memory limits. We evaluate the method on a public 15-category industrial anomaly-detection benchmark and three industrial datasets covering blow-fill-seal strip-ampoule meniscus inspection, amber-glass-ampoule bottom inspection, and lyophilised-cake vial inspection. Mahalanobis PatchCore preserves most offline PatchCore image-level performance on the public benchmark while reducing peak memory from 5.41 to 2.78 GB, and improves the selected industrial mean image area under the receiver operating characteristic curve from 0.981 to 0.986.
【5】The Fundamental Limits of Fraud Detection in Card Payment Networks
标题:卡支付网络欺诈检测的基本局限性
链接:https://arxiv.org/abs/2605.27557
摘要:Card payment fraud detection is usually framed as a supervised classification problem. Although this approach has generated practical progress, improvement has remained incremental despite major advances in model architecture. We argue that this is not mainly a failure of function approximation or optimization, but a consequence of structural information impairments inherent to the payment ecosystem. We formalize card authorization as a sequential decision problem with delayed, censored, corrupted, and counterfactually missing feedback. We derive a minimax regret lower bound showing that these impairments enter multiplicatively in the denominator of the achievable learning rate. The bound implies that improving issuer reporting quality or reducing censorship can yield larger reductions in the regret floor than increasing model complexity. We also show that heterogeneity across issuers worsens learnability beyond what average impairment rates suggest. The paper contributes a theory of why fraud detection in payment networks is fundamentally harder than in standard online learning settings, identifies ecosystem information quality as the key bottleneck, and provides a theoretical basis for prioritizing investments in reporting infrastructure, dispute process quality, and selective exploration. The paper is theory-first and does not rely on proprietary transaction data.
【6】Evolving and Detecting Multi-Turn Deception using Geometric Signatures
标题:使用几何签名进化和检测多回合欺骗
链接:https://arxiv.org/abs/2605.27671
摘要:Safety defenses for large language models (LLMs) are typically trained and evaluated on single-turn prompts, yet real attacks often unfold as indirect, multi-turn probing. To defend against this more nuanced form of deception, we present a unified pipeline that generates realistic multi-turn deceptive question sets via multi-objective genetic prompt optimization with co-evolving mutation operators. We validate this dataset through a human study, which also revealed that early generations yielded the most convincing deception and practical constraints such as adherence filtering and ordering effects. Using this data, we were able to detect deceptive attempts to access prohibited information using simple, explainable geometric signals in embedding space coupled with a lightweight feed-forward classifier. Three geometric features (angular coverage, distance ratio, and linearity) augmented with pairwise similarity statistics led to a compact predictive model that achieved consistently high recall (0.89) across base, reworded, and truncated (three-turn) scenarios, with test-time F1 ranging from 0.74-0.86. The results support a central hypothesis that multi-turn deceptive intent leaves a stable geometric footprint that enables lightweight, transparent screening without expensive end-to-end training. We further discuss responsible uses, limitations, and paths toward larger, more diverse human-evaluated datasets. The primary contribution to artificial intelligence is the multi-objective evolutionary framework for prompt generation, and the engineering application is the deployment of a lightweight geometric detection system for LLM safety infrastructure.
分类|识别(4篇)
【1】Bias Leaves a Gradient Trail: Label-Free Bias Identification via Gradient Probes on Concept Decompositions
标题:偏见留下梯度痕迹:通过概念分解的梯度探针进行无标签偏见识别
链接:https://arxiv.org/abs/2605.28780
备注:Accepted to the 49th German Conference on Artificial Intelligence (KI2026)
摘要:Vision classifiers can exploit spurious correlations, achieving high in-distribution accuracy yet failing under distribution shift. Existing approaches to bias mitigation and analysis often depend on curated datasets, spurious-attribute or group labels, or retraining, which may be infeasible once a model is deployed or the relevant bias is unknown. We present a bias-label-free, post-hoc method for identifying spurious concepts in frozen vision models, relying only on standard class labels from a held-out audit dataset. For each target class, we collect patches from inputs predicted as that class and apply non-negative matrix factorization to intermediate activations to obtain a bank of interpretable concept vectors. Candidate concepts are then ranked with a bias estimator derived from their interaction with backpropagated gradients on misclassified examples: bias concepts tend to get activated when correcting false negatives and suppressed when correcting false positives. On Colored MNIST and Waterbirds the method recovers concepts aligned with the known spurious cue, and on CelebA it surfaces decision-relevant directions that only partially coincide with the annotated gender attribute; suppressing the top-ranked concepts at inference time improves worst-group accuracy by up to 17.9 percentage points on Waterbirds and 10.4 on CelebA without any retraining or parameter updates. Our method identifies decision-relevant spurious directions that need not coincide with annotated ones, providing both an interpretable auditing tool and an actionable debiasing handle for frozen vision models. Code is available at https://github.com/vitryt/label-free-bias-identification.
【2】Machine Learning methods for event classification and vertex reconstruction of the 12C + 12C reaction with the MATE-TPC
标题:使用MATE-PPC进行12 C +12 C反应的事件分类和顶点重建的机器学习方法
链接:https://arxiv.org/abs/2605.28296
摘要:In modern nuclear physics experiments, identifying events of interest is challenging for nuclear reaction studies with the active target Time Projection Chamber (TPC). In this work, machine learning techniques are employed to analyze the complex data of the 12C + 12C fusion reaction from a TPC named MATE (multi-purpose active-target time projection chamber for nuclear experiments). Specifically, we successfully applied Residual Neural Network (ResNet-50, ResNet-34 and ResNet-18) and Visual Geometry Group (VGG-19) to classify elastic scattering and fusion reaction events from the 12C + 12C reaction. The classification results of the four models are nearly identical, with accuracies of approximately 97% for the simulated data and 90% for the experimental data. Moreover, these approaches successfully identify some events that are misclassified by traditional methods. These models are also applied to classify events from different fusion reaction channels, with classification accuracies of approximately 95% on simulated data. In addition, a Convolutional Neural Network (CNN) model is developed to reconstruct the reaction vertex, providing an alternative strategy for vertex reconstruction. These results indicate that machine learning techniques can effectively classify reaction events from different channels and reconstruct the reaction vertex, thereby paving the way for future analyses of complex nuclear reaction data.
【3】A Simple State Space Model Excels at Multivariate Time Series Classification
标题:简单的状态空间模型擅长多元时间序列分类
链接:https://arxiv.org/abs/2605.27406
摘要
:Structured state space models (SSMs) have recently emerged as a promising foundation for sequence modeling, with Mamba-based architectures demonstrating strong performance through input-dependent state transitions, albeit at considerable complexity. However, their application to time-series classification (TSC) has been largely limited to Mamba-style architectures, leaving the broader SSM design space underexplored. We present the first systematic study spanning diagonal SSMs (S4D) and input-dependent SSMs (Mamba family) on large-scale TSC benchmarks, asking whether such complexity is necessary for top performance. Our results reveal a surprising finding: S4D consistently outperforms Mamba-based variants in both accuracy and efficiency, challenging the assumption that increased complexity translates to meaningful gains in TSC. Building on this, we introduce MS4, lightweight modifications to S4D via a linear input projection and channel-mixing mechanism, and MS4N, a normalized variant that stabilizes state dynamics with negligible overhead. Evaluated on 59 datasets across MONSTER (up to 60 million samples, 50K timesteps, 82 classes) and the UEA benchmark, against 15 baselines, MS4 and MS4N consistently outperform Mamba-based models while remaining more efficient, and MS4N matches or surpasses competing deep learning models that are roughly 2x and 10x larger in parameters. These results position lightweight structured SSMs as a compelling alternative to scaling complexity for TSC.
【4】Benchmarking Ultrasound Foundation Models for Fetal Plane Classification
标题:胎儿平面分类的超声基础模型基准
链接:https://arxiv.org/abs/2605.27796
摘要:Ultrasound is widely used in obstetric care due to its safety, accessibility, and real-time imaging. However, interpretation remains operator-dependent and susceptible to noise and artifacts. Deep learning models have shown strong performance to solve these problem, but they typically require large annotated datasets that are difficult to obtain in clinical ultrasound. Foundation models (FMs) offer an alternative, using a large number of ultrasound images to learn transferable representations that can generalize with limited labeled data. This work presents a comprehensive benchmark of ultrasound-specific FMs for fetal plane classification. We evaluated four ultrasound FMs (USFM, MOFO, UltraSAM, FetalCLIP) against two CNN baselines (ResNet50, EfficientNet-V2) and a ViT (DINOv3) pretrained on natural images. We trained all models under two complementary settings: full fine-tuning and linear probing with a frozen encoder. All models were trained using 5-fold patient-level cross-validation on a Spanish fetal ultrasound dataset and tested on both in-domain data and an external African cohort to assess cross-population generalization. We found that FetalCLIP achieved the best results in the linear probing setting (F1 = 0.9261 for in-domain, F1 = 0.9731 for out-of-domain), while USFM performed best in the full fine-tuning setting (F1 = 0.9476 for in-domain, F1 = 0.9515 for out-of-domain). MOFO and UltraSAM degraded most in both settings, underperforming natural image pretrained models in some cases. These findings highlight how the choice of pretrained model strongly affects fetal plane classification performance, since different pretraining objectives lead to different levels of transferability.
表征(6篇)
【1】Beyond Binary: Sim-to-Real Dexterous Manipulation with Physics-Grounded Contact Representation
标题:超越二进制:具有物理接地接触表示的模拟到真实的灵巧操纵
链接:https://arxiv.org/abs/2605.28812
备注:Project site: https://mpan31415.github.io/tactile_rep/
摘要:A primary bottleneck in contact-rich manipulation is the difficulty of collecting real-world data. Sim-to-real reinforcement learning offers a scalable alternative, but the simulation-reality gap prevents information-dense modalities like touch from being effectively used. Existing sim-to-real methods often mitigate this gap by simplifying tactile data into coarse low-dimensional features -- sacrificing the richness required for complex manipulation. In this work, we introduce Center-of-Pressure (CoP), an effective tactile representation grounded in physical principles that preserves dense contact information while maintaining robustness for sim-to-real transfer. To support this representation, we propose a sensor calibration scheme based on differentiable dynamics, enabling the estimation of taxel orientations without requiring ground-truth force measurements. We evaluate CoP on two blind, challenging contact-rich manipulation tasks: peg-in-hole insertion and ball balancing. Across both tasks, policies conditioned on CoP achieve zero-shot sim-to-real transfer on a multi-fingered hand, and outperform both coarse binary-contact and raw-taxel baselines. Analysis of learned policy states further suggests that CoP-conditioned policies encode task-relevant physical properties, such as object mass, as an emergent byproduct of control.
【2】Multi-Mixer Models: Flexible Sequence Modeling with Shared Representations
标题:多混合器模型:具有共享表示的灵活序列建模
链接:https://arxiv.org/abs/2605.28769
摘要:Softmax attention is the cornerstone of modern large language models, but its memory scales linearly and compute quadratically with sequence length. Linear recurrent models, such as linear attention and state space models, have become widely studied as alternatives to attention due to their linear compute and constant memory. While these sub-quadratic token mixing methods, or mixers, achieve promising efficiency gains and competitive results on a wide range of benchmarks, current linear recurrent models still lag behind on tasks that require long-context retrieval or in-context learning. A growing body of work studies hybrid architectures that attempt to mitigate these trade-offs by statically interleaving or merging attention and recurrent blocks. In this work, we explore a new axis of developing hybrid models: across the token sequence. We propose Oryx, a hybrid model that can, throughout a sequence, flexibly switch between different mixers, for example quadratic attention for rich context utilization and linear recurrences for efficient generation. Oryx ties at least 90% of its parameters across mixers, enabling attention and recurrent modes to operate over shared internal representations. We validate our design with Mamba-2 and Gated DeltaNet variants, up to 1.4B models. Under fixed token budgets and a mixed-training strategy, Oryx achieves comparable or better performance than its single-mixer baselines. At the 1.4B scale, all instances of Oryx outperform their respective baselines by at least 0.7 percentage points on averaged language modeling tasks. On retrieval tasks, Oryx achieves performance comparable to the Transformer baseline even when processing only a tiny fraction (<10%) of the tokens in attention mode. These results suggest that attention and linear recurrent models can share internal representations, and motivate sequence-axis hybridization as a promising direction.
【3】Random Process Flow Matching: Generative Implicit Representations of Multivariate Random Fields
标题:随机过程流匹配:多元随机场的生成隐式表示
链接:https://arxiv.org/abs/2605.28625
摘要:
【4】Dynamic Topic Modeling with a Higher-Order Hypergraphical Representation
标题:使用更高级超图形表示的动态主题建模
链接:https://arxiv.org/abs/2605.28269
备注:34 pages, 4 figures
摘要
:Dynamic topic modeling is widely used to analyze evolving trends in scientific literature, medical records, and social media. Traditional topic models represent each topic through a single probability vector on the multinomial simplex and implicitly couple word occurrence and repetition within one probabilistic mechanism. However, this formulation restricts the dependence structure among words and overlooks informative higher-order interactions, particularly in dynamic corpora with overlapping semantics. To address these limitations, we introduce a hypergraph representation of text where each document is modeled as a hyperedge connecting all co-occurring words, with repetition intensities encoded as node weights. This representation naturally separates word occurrence from repetition and induces a novel hypergraph-based multinomial distribution with a nonlinear normalization depending on the observed word set of each document. Building on this likelihood, we develop a dynamic topic modeling framework via structured low-rank factorizations with explicit temporal regularization on topic-word profiles. Moreover, we establish local convergence guarantees and derive non-asymptotic error bounds despite the intrinsic nonconvexity induced by bilinear factorization and document-specific nonlinear normalization. Numerical experiments on synthetic data and an application to the International Conference on Learning Representations (ICLR) corpus demonstrate consistent improvements over existing multinomial-based topic models.
【5】When do complex-valued neural networks help? A study of representation, geometry, and optimization
标题:复值神经网络什么时候有帮助?表示、几何和优化的研究
链接:https://arxiv.org/abs/2605.27673
摘要:Complex-valued Neural Networks (CVNNs) are often motivated by domains where information is naturally encoded in magnitude and phase. Yet complex-valued inputs alone do not determine when complex arithmetic improves learning: the label signal may lie in amplitude, phase, their coupling, or a symmetry that real-valued models can also represent under suitable coordinates. We study this through a representation-first evaluation of CVNNs against Cartesian real, polar, phase-only, magnitude-only, parameter-matched real, and FLOP-matched real baselines. Across synthetic RF tasks, complex representations are useful but not universally superior. PSK-only tasks favor phase-aware and complex-valued models, QAM-only tasks favor magnitude-based models, mixed PSK+QAM gives only a small complex-valued advantage, and unseen carrier-phase rotations break coordinate-dependent models without augmentation. Similar patterns appear beyond RF: in quantum-wavefunction prediction, momentum is invisible to $|ψ|$ but recoverable from phase, while EEG analytic-signal experiments show that phase locking, amplitude bursts, and phase-amplitude coupling each favor different coordinate views. We also identify a benchmarking artifact on RadioML 2018.01A. Under matched-shared-trial selection, a CReLU complex model exceeds the best real baseline by 22.94 PP; under independent per-family tuning on the same data and 16-trial search space, the gap collapses to 2.46 PP. Gradient analysis traces the inflated gap to high-learning-rate first-step instability in real baselines, while complex parameter coupling distributes the loss signal more robustly. A learning-rate $\times$ activation factorial confirms the failure is primarily hyperparameter-driven. Overall, CVNNs are best viewed as structured inductive biases whose gains depend on representation, symmetry, and optimization, not as universally superior architectures.
【6】Exploratory Experience Shapes the Geometry of Predictive Representations
标题:探索性经验验证预测表示的几何学
链接:https://arxiv.org/abs/2605.27929
摘要:Active sensing links behavior and learning through an action-perception loop: actions determine the observations used to update internal predictive models of perception, which subsequently guide the next actions. Predictive-coding frameworks provide a natural way to model this process, since internal representations are continuously updated to predict future observations. Here, we ask how exploratory and exploitative behavioral strategies shape these internal predictive representations. We build an online learning agent in a tree-like maze with a controllable parameter regulating the balance between exploratory and exploitative regimes. The agent updates a predictive-coding-based perception model from experience generated by its own behavior. The model predicts both future maze states and reward probability, allowing the agent to select actions either by expected information gain during exploration or by predicted reward during exploitation. We show that the resulting internal predictive representations depend strongly on the agent's behavioral regime. Exploratory agents develop representations that are more spatially organized and better preserve the structure of maze transitions in latent space. In contrast, exploitative agents learn less organized representations. We then train this predictive model on natural trajectories of water-deprived mice navigating the same maze and compare the resulting representations with those learned from agent trajectories. More exploratory mice show representational geometries that closely match those of exploratory agents, whereas mice with more restricted visitation patterns resemble reward-driven, exploitative agents. Together, these findings suggest that exploration enables predictive models to form generalized internal representations by organizing latent space around both spatial location and transition context in artificial agents and animals.
编码器(3篇)
【1】Semantic Optimal Transport for Sparse Autoencoder Feature Matching and Circuit Compression
标题:稀疏自动编码器特征匹配和电路压缩的语义最优传输
链接:https://arxiv.org/abs/2605.28567
备注:preprint
摘要:
【2】Score Based Error Correcting Code Decoder
标题:基于分数的错误纠正码解码器
链接:https://arxiv.org/abs/2605.28358
备注:Accepted to ICML 2026
摘要:Error-correcting codes enable reliable communication, yet practical soft decoding remains challenging across code families and block lengths. We propose SB-ECC, a score-based decoder that casts decoding as continuous-time denoising. A neural denoiser defines a probability-flow ordinary differential equation (ODE) that iteratively updates the noisy channel observation toward a valid codeword, guided by parity constraints. The model is trained across noise levels without time/SNR conditioning, enabling inference without SNR estimation and supporting a direct latency accuracy trade off controlled by the ODE solver budget. We use the raw signed channel observation as input for learning a continuous denoising field. Across 42 code/SNR settings, SB-ECC achieves the best BER in 39/42 entries, with an average SNR gain of 0.17dB and a maximum gain of 0.46dB over the strongest competing baseline, we showed that swapping the solver from Euler to DPM preserves -ln(BER) while reducing end-to-end decoding time by 8.86% on average (up to 12.82%).
【3】IRDS: Interpretable RLVR Data Selection via Verifier-Coupled Sparse Autoencoder Coverage
标题:IRDS:基于验证器耦合稀疏自动编码器覆盖的可解释RLVR数据选择
链接:https://arxiv.org/abs/2605.28247
备注:24 pages,3 figures,18 tables
摘要
:Reinforcement learning with verifiable rewards (RLVR) has become a key technique for en- hancing LLM reasoning, yet its data ineffi- ciency remains a major bottleneck. Existing methods address this problem only partially, each missing at least one of subset-level cov- erage, verifier signal use, or interpretability. To address this gap, we present IRDS (Inter- pretable RLVR Data Selection), which selects RLVR training instances on a sparse autoen- coder (SAE) cluster basis so the selection itself is auditable on recognizable problem motifs. To select instances the model both fails on and can still learn from, we introduce a verifier- coupled coverage objective on the SAE basis and solve it by greedy log-determinant max- imization. Experiments on three instruction- tuned models and six math reasoning bench- marks show that IRDS achieves the highest overall accuracy, exceeding the strongest base- line by +3.9/+4.0 pp on the two Qwen models and by +0.5 pp on Llama-3.1-8B, while run- ning an order of magnitude cheaper than the trajectory-based baseline.
优化|敛散性(11篇)
【1】Optimal ridge regularization revisited
标题:重新审视最佳山脊正规化
链接:https://arxiv.org/abs/2605.28679
摘要:
【2】Outer-Momentum Restarting in High-Dimensional Two-Phase Optimization
标题:多维两阶段优化中的外动量重新启动
链接:https://arxiv.org/abs/2605.28585
摘要:
【3】Fitting Unknown Number of Hyperplanes with Manifold Optimization
标题:用总管优化来匹配未知数量的超平面
链接:https://arxiv.org/abs/2605.28501
摘要:Fitting an unknown number of hyperplanes to data is a fundamental yet challenging problem in machine learning, characterized by its non-convexity, non-differentiability, and unknown model order. Existing approaches often struggle with local optima or lack geometric consistency. To address these limitations, we propose a novel framework based on Manifold Optimization. We reformulate the problem as an unsupervised learning task on the unit sphere manifold $\mathcal{S}^{\textbf{dim}-1}$. This formulation effectively handles the non-convex constraints and linearizes the distance measurement, rendering the gradient descent tractable. We propose a Two-Stage Manifold Optimization algorithm. In Phase I, we employ a Riemannian Expectation-Maximization process with a heavy-tailed kernel to robustly estimate posterior probabilities, effectively resolving the ambiguities of point distribution between intersecting hyperplanes. In Phase II, upon convergence of the soft estimates, the probabilistic weights degenerate into hard matching, generating a precise local optimum that strictly satisfies the geometric definition. Furthermore, we introduce a projected density estimation strategy for initialization to facilitate global convergence by significantly reducing the feature description space and search complexity. Extensive experiments demonstrate that our method outperforms state-of-the-art baselines in both geometric accuracy and robustness.
【4】Learning to Assess the Reliability of Number-of-Runs Estimation in Stochastic Optimization
标题:学习评估随机优化中数量估计的可靠性
链接:https://arxiv.org/abs/2605.28309
备注:Preprint version of a poster accepted at the Genetic and Evolutionary Computation Conference 2026 (GECCO 2026)
摘要:In large-scale benchmarking of stochastic optimization algorithms, the key challenge is no longer whether repeated runs are needed for reliability, but how to determine when sufficient evidence has been collected without incurring unnecessary computational cost. We study a learning-based extension of a recent empirical online heuristic that adaptively estimates the required number of runs using outlier handling and skewness-based symmetry checks. Using annotated outcomes from 132{,}000 Nevergrad runs on COCO (24 problems in 20 dimensions, 10 instances each, 11 optimizers), we train classifiers on 23 statistical, energy-free, and shape and stability features to predict whether a run-number estimate is reliable, prioritizing detection of incorrect estimates via minority-class recall. We evaluate reliability prediction using a within-configuration learning setup, where models are trained and tested on data sharing the same optimizer. The results show that run-number reliability can be learned in a within-configuration scenario, enabling detection of unreliable estimates with high minority-class recall, although performance remains limited by the restricted data diversity within fixed configurations.
【5】Long Live The Balance: Information Bottleneck Driven Tree-based Policy Optimization
标题:平衡万岁:信息瓶颈驱动的树基政策优化
链接:https://arxiv.org/abs/2605.28109
备注:Accepted to ICML 2026 main conference
摘要:Recent advances in online reinforcement learning (RL) for large language models (LLMs) have demonstrated promising performance in complex reasoning tasks. However, they often exhibit an imbalanced exploration-exploitation trade-off, resulting in unstable optimization and sub-optimal performance. We introduce IB-Score, a novel metric grounded in Information Bottleneck theory that evaluates policy's exploration-exploitation balance by quantifying the trade-off between step-level reasoning diversity and mutual information shared with the correct answer. Analysis based on IB-Score shows that popular online RL approaches (e.g., GRPO) with common regularizers fail to consistently maintain balance during training with suboptimal results. To address this, we propose Information Bottleneck-driven Tree-based Policy Optimization (IB-TPO), a principled framework that formulates IB-Score as a fine-grained optimization objective and utilizes a novel IB-guided tree sampling strategy that not only improves the efficiency of online sampling with 50% more trajectories under the same token budget, but also reuses the tree structure for effective IB-Score Monte Carlo estimation. Extensive experiments across standard benchmarks show that our method significantly outperforms GRPO baseline by 2.9% to 3.6% and also outperforms other state-of-the-art online RL approaches. Our code is available at https://github.com/alibaba/EfficientRL.
【6】Revisiting ML Training under Fully Homomorphic Encryption: Convergence Guarantees, Differential Privacy, and Efficient Algorithms
标题:重新审视完全同形加密下的ML训练:收敛保证、差异隐私和高效算法
链接:https://arxiv.org/abs/2605.27782
摘要:We present the first theoretical convergence analysis of machine learning training under fully homomorphic encryption (FHE), combined with a differentially private (DP) training algorithm tailored to encrypted computation. Our approach improves computational efficiency over standard differentially private gradient descent (DP-GD) while achieving comparable utility. In particular, we prove convergence of approximate gradient descent using polynomial approximations of activation and loss functions, which are required for FHE compatibility. To preserve privacy in downstream tasks, we integrate differential privacy without relying on costly per-sample gradient clipping, enabling scalable encrypted learning. We also provide data-independent hyperparameter selection and theoretically grounded strategies for polynomial approximation which can be of independent interest. Together, these contributions advance the feasibility of efficient, private, and secure machine learning on sensitive data.
【7】IGADA-IoT: IoT Sensor Energy Optimization in Wireless Sensor Networks Driven by Automatic Data Augmentation
标题:IGADA-IOT:自动数据增强驱动的无线传感器网络中的物联网传感器能量优化
链接
:https://arxiv.org/abs/2605.27397
摘要:In wireless sensor networks (WSNs), data augmentation is a novel method to improve sampling-frequency decision performance, thereby enabling energy optimization for IoT (Internet of Things) sensors. However, existing methods rely on a single generator and empirically determined quantities, failing to establish a mapping between dynamic information gaps and multiple generators, and overlooking the heterogeneity of generated samples. Moreover, an evaluation and a closed-loop method that jointly considers the information gap and the model performance are lacking. To address these issues, we propose an information gap-guided IoT sensor automatic data augmentation framework (IGADA-IoT) with hierarchical multi-generator collaboration and scheduling over multiple rounds. Capabilities of different generators are jointly utilized to reduce the information gaps. In the IGADA-IoT, a hierarchical multi-generator collaboration and scheduling strategy (HMGCS) is proposed to enhance the targetedness and rationality of generated sample allocation. An information gap-model performance joint evaluation and closed-loop method (IGMP-EC) is proposed to enhance the accuracy of augmentation decisions, and to mitigate the risks of under-augmentation and over-augmentation. Experimental results show that the IGADA-IoT improves the average accuracy of multiple downstream models by 7.27%. Compared with advanced data augmentation methods, the average accuracy is improved by 8.67%. Compared with the individual generators, the average accuracy is improved by 7.24%. Furthermore, public IoT sensor datasets from the UCR Archive and real-world deployments demonstrate the accuracy and generalizability of the proposed method.
【8】Decision-focused learning for optimal PV-Battery scheduling
标题:以决策为中心的学习以实现最佳PV电池调度
链接:https://arxiv.org/abs/2605.28340
摘要:The use of residential photovoltaics has increased dramatically in recent years. With battery systems becoming more affordable, the optimal operation of a photovoltaic-battery system can bring significant savings to households. Optimal control requires correct forecasts of underlying parameters, such as photovoltaic power generation, to schedule the battery. While forecasting models have become increasingly accurate due to algorithmic advances and data availability, accuracy is typically measured in generic metrics which might not align with the downstream application. This study proposes a decision-focused learning framework that integrates optimization and prediction by training a Long Short-Term Memory photovoltaic energy forecaster on the downstream optimal scheduling of a battery system. The proposed methodology is compared against a standard two-phase approach. Across a 14-month evaluation period, the decision-focused method reduced average electricity costs across twenty buildings by 3.6% when normalized against performance bounds defined by a perfect forecast and a baseline of no optimization. Critically, this financial improvement was achieved despite the model exhibiting a root mean squared error of 19.9%, significantly higher than the decoupled model's 8.2%. Warm-starting the decision-focused model further improves results, lowering average cost by approximately 8%, while also mitigating the negative impact on statistical accuracy (root mean squared error of 13.7%). The findings are statistically significant at the 0.001 level across the twenty households and for each household individually. These results demonstrate that aligning forecast models with optimization goals is key for achieving cost advantages in PV-battery systems. Future research should replicate these findings on other datasets, alternate forecasting models and alternate optimization algorithms.
【9】Insurance Pricing Optimization via Off-Policy Evaluation
标题:通过保单外评估优化保险定价
链接:https://arxiv.org/abs/2605.28327
摘要:Traditional insurance pricing relies on risk-based principles that ensure actuarial fairness and solvency but do not explicitly account for policyholders' price sensitivity. We formulate insurance pricing as a decision-making problem and study it using tools from off-policy evaluation and stochastic control. We propose a kernelized inverse propensity score estimator that exploits local structure in the action space and yields variance reduction compared to the classical inverse propensity score estimator. Building on these value estimates, we investigate policy optimization and present two practical approaches for computing optimal pricing rules: an interpretable data-shared Lasso formulation and a flexible policy parameterization based on neural networks. Using a controlled synthetic travel insurance environment, we empirically confirm the theoretical results and show that neural networks outperform existing techniques for policy optimization.
【10】Counterfactually Fair Regression via Optimal Transport
标题:通过最佳运输实现反事实公平回归
链接:https://arxiv.org/abs/2605.28251
摘要:We consider the problem of learning a counterfactually fair regressor. We adopt a causal uncertainty view in which counterfactual fairness is defined with resampled noise. We focus on obtaining theoretical fairness guarantees for a new post-processing estimator. We begin by showing that counterfactual fairness is equivalent to satisfying demographic parity conditional on the latent variable. This allows us to provide a closed-form expression of the optimal fair regressor via a barycentric quantile map. In order to handle continuous latent variables, we propose a discretized post-processing method. Then, under mild regularity assumptions, we prove high-probability finite-sample fairness guarantees for our estimator, providing an unfairness decay at rate $\tilde O(n^{-1/3})$, and establishing a matching risk bound of order $\tilde O(n^{-1/3})$. We provide a matching lower bound on the excess risk of almost fair predictions. Finally, we extend our results to the setting of relaxed counterfactual fairness. We validate our approach on real-world and synthetic data.
【11】Is Backpropagation Optimal? When Synthetic Gradients Improve Sample Efficiency
标题:反向传播是最佳的吗?当合成物质提高样本效率时
链接:https://arxiv.org/abs/2605.27946
摘要:Backpropagation is the default learning rule for artificial neural networks and is often treated as the settled approach whenever differentiability is available. In this work, we revisit this convention through a theoretical lens of sample efficiency. We introduce a unified vectorized feedback framework for loss-based and reward-based learning on computational graphs, in which synthetic gradients emerge as a natural alternative to backpropagation. We characterize the conditions under which synthetic gradients can achieve a lower gradient-estimation mean squared error than backpropagation. We construct examples illustrating that this sample efficiency advantage can be arbitrarily large. Experiments on contextual bandits and reinforcement learning tasks demonstrate the potential of our theoretical findings.
预测|估计(7篇)
【1】Tactile-Proprioceptive Sensor Fusion for Contact Wrench Estimation in Whole-Body Physical Human-Robot Interaction
标题:触觉-主体感受传感器融合用于全身物理人-机器人交互中接触扳手估计
链接:https://arxiv.org/abs/2605.28412
备注:8 pages, 6 figures. Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2026
摘要:Direct physical guidance is a natural means of teaching and interacting with robots, and robotic skins make a key contribution by enabling sensitive contact sensing and localization. This paper presents a tactile-proprioceptive sensor fusion framework for natural physical human-robot interaction. Tactile cues from pneumatic skin pads serve as contact indicators that bypass the ambiguity between frictional residues and applied external forces, enabling highly sensitive contact detection without explicit friction identification. We fuse these cues with motor-current-based proprioception to reconstruct multi-axis contact forces on the robot surface. To maintain accuracy during motion, we employ a temporal convolutional network (TCN) to mitigate friction hysteresis during stick-slip transitions, reducing uncertainty at contact onset and yielding smooth, responsive guidance. We validate the approach on a skin-integrated robot arm: (i) multi-axis forces are reconstructed in stationary contacts, and (ii) simultaneous force estimation and kinesthetic teaching are demonstrated. Results indicate improved sensitivity and responsiveness across diverse contact conditions compared with tactile-only and proprioceptive-only baselines, supporting tactile-proprioceptive fusion as a reliable pathway to safe, intuitive physical human-robot interaction.
【2】Chreode: A Cell World Model for One-Step Temporal Dynamics and Perturbation Prediction
标题:Chreode:一步时间动力学和扰动预测的细胞世界模型
链接:https://arxiv.org/abs/2605.28111
备注:25 pages, 3 figures, 14 tables. Submitted to NeurIPS 2026
摘要:Predicting how a cell will change its transcriptional state under a developmental signal or a genetic perturbation is the computational core of in-silico biology and the AI Virtual Cell program. Existing approaches either fit static control-to-treated maps that discard time, or solve multi-step ODE / Schrödinger-bridge problems on each dataset independently. We introduce Chreode, a one-step cell world model that predicts action-conditioned cell-state transitions through a structured residual transition operator. It shifts distributional evolution from inference time to training time, enabling single-pass generation while preserving a Waddington-inspired decomposition into downhill landscape flow, rotational in-tangent dynamics, and stochastic spread. The model is pretrained with a shared scVI encoder and a DiT-based dynamics backbone on a 2.4M-cell mouse embryonic atlas spanning 7 datasets. As a fine-tuning initialization, Chreode improves per-target Sinkhorn distance on Weinreb hematopoiesis and Veres islet differentiation over matched scratch models, PI-SDE, and PRESCIENT. As a transferable gene-state embedding for GEARS, the pretrained dynamics representation reduces shared-vocabulary DE20 mean squared error on Norman Perturb-seq from 0.2121 to 0.1858, a 12.4% relative improvement, without changing the GEARS training procedure. We interpret this transfer to perturbation prediction as evidence that pretrained developmental-trajectory dynamics encode differentiation primitives transferable to CRISPR-induced state shifts, since both involve cell-state transitions in a shared latent geometry. The pretrained backbone additionally produces zero-shot clonal fate scores on Weinreb that are competitive with strong dynamic-OT baselines.
【3】PINE: Pruning Boosted Tree Ensembles with Conformal In-Distribution Prediction Equivalence
标题:PINE:具有保形内分布预测等效性修剪增强树集合
链接:https://arxiv.org/abs/2605.28068
备注:Accepted to ICML 2026
摘要:Tree ensembles are machine learning models with strong predictive performance and interpretability, and remain widely used for tabular data. Standard pruning methods for tree ensembles typically optimize an accuracy-compression trade-off and may change a subset of predictions, potentially compromising decision consistency. Faithful pruning methods address this issue by preserving prediction equivalence over the entire input space, but this requirement leads to lower compression ratios. We propose PINE, a pruning method that provides strong guarantees within an in-distribution region. PINE preserves prediction equivalence within this region and controls the region size using a single parameter $α$ via conformal calibration. Experiments on 12 public tabular datasets show that PINE improves the compression ratio by up to $30\%$ while preserving predictions at a comparable level to existing faithful pruning methods.
【4】High-Fidelity Industrial Crash Dynamics Prediction via Geometry-Aware Operator Learning with Memory-Efficient Low-Rank Attention
标题:通过具有几何感知的操作员学习和记忆高效的低级别注意力进行高保真工业崩溃动态预测
链接:https://arxiv.org/abs/2605.27758
摘要:Automotive crashworthiness optimization remains a safety-critical challenge, requiring the management of large-scale nonlinear structural deformations and energy dissipation through iterative, high-fidelity simulations. While traditional finite element solvers are computationally prohibitive, emerging operator learning frameworks provide rapid surrogate predictions; however, applying them to industrial-scale crash analysis, where complex geometry, contact nonlinearities, and rapidly evolving transient deformation coexist, remains an open challenge. In this paper, we demonstrate that the GeoTransolver framework provides a viable solution for accurate, high-fidelity crash dynamics prediction at industrial scale. Benchmarked on complex bumper beam and full-vehicle crash datasets, GeoTransolver captures multi-scale geometric context and accurately resolves plastic deformation patterns as well as acceleration profiles at critical occupant locations. Beyond the architecture itself, we propose and systematically evaluate a suite of temporal prediction recipes, including one-shot, time-conditional, and autoregressive rollout strategies, demonstrating that the one-shot approach achieves state-of-the-art accuracy with significantly reduced training overhead and inference latency. As a secondary contribution, we introduce a Fast Low-rank Attention Routing Engine (FLARE)-based modification to the GeoTransolver attention backbone that reduces memory overhead by approximately 2x while further improving predictive accuracy for O(N) long-range, high-frequency transients, preserving the geometry-aware cross-attention strengths of the base framework. Our results highlight the practical viability of geometry-aware operator learning for high-fidelity surrogate modeling of complex, safety-critical automotive dynamics.
【5】Conservative neural posterior estimation via distributionally robust training
标题:通过分布稳健训练的保守神经后验估计
链接:https://arxiv.org/abs/2605.28516
摘要:Simulation-based inference with neural posterior estimation (NPE) often yields overconfident and unreliable posteriors under limited simulation budgets. To address this, we propose DRO-NPE, a distributionally robust approach that replaces the standard NPE objective with a worst-case loss over a Wasserstein ambiguity set. We introduce KL-based metrics for miscoverage and miscalibration, and use these to show that the DRO-NPE objective controls overfitting and reduces posterior overconfidence. Our method is tractable, parallelisable, and readily integrates with standard normalising flows. Across benchmark SBI tasks, DRO-NPE consistently improves coverage and calibration, while narrowing the gap between empirical and population NPE loss, leading to more reliable inference in low-simulation regimes.
【6】Skillful high-resolution weather forecasting independent of physical models
标题:独立于物理模式的高分辨率天气预报
链接:https://arxiv.org/abs/2605.28153
备注:26 pages, 10 figures
摘要:Accurate and timely weather forecasts are critical for high-impact decisions in modern society. Machine-learning-based weather prediction is emerging as an alternative for producing initial conditions, forecasts, and even both in end-to-end systems. These methods deliver predictions faster and often with higher skill than traditional numerical weather prediction (NWP). However, even end-to-end models typically rely on NWP-generated reanalyses for supervision, thereby inheriting the biases and resolution limitations of those NWPs, and limiting adaptation to settings where suitable reanalysis products are unavailable, infrequently updated, or expensive to produce. Here we introduce ObsCast, a regional system that generates both analysis and predictions, without using any NWP-derived data in either training or inference, while still achieving state-of-the-art performance in short-term high-resolution regional modeling. Over the contiguous United States and Europe, ObsCast outperforms operational NWP for near-surface variables through 18 h and produces skillful precipitation forecasts. It provides a simpler and more adaptable route to build and refine regional forecasting services directly from local observations, without the need to develop complex and costly traditional forecasting pipelines.
【7】Robust Moment-Based Estimation via Spectral Gradient Reweighting
标题:基于谱梯度加权的鲁棒矩估计
链接:https://arxiv.org/abs/2605.27718
摘要:Moment-based estimation is a theoretically attractive approach to parametric inference, especially when likelihood-based estimation is unavailable, misspecified, or computationally inconvenient. However, the moment equations involve sample averages, which makes moment-based estimation sensitive to outliers. We propose the SGR-GMM algorithm, a robust generalized method of moments (GMM) procedure that uses a spectral gradient reweighting (SGR) primitive to soft-reweight the per-observation gradients during the moment-matching optimization. Our analysis has three layers. First, for a fixed center, the SGR primitive is formulated as an entropy-regularized spectral game between a sample-weight player and a density-matrix player, which is analyzed using classical multiplicative-weights and matrix-multiplicative-weights regret bounds. Second, we establish explicit convergence radius and finite termination bound for the fixed-center updates in the SGR primitive. Third, we prove a local finite-sample parameter estimation error bound with explicit dependence on the contamination fraction, inlier gradient stability, local GMM identification strength, and optimization accuracy. We further specialize the SGR-GMM algorithm to obtain a robust diagonally-weighted GMM (DGMM) estimator for estimating heteroscedastic low-rank Gaussian mixtures observed under additive Gaussian noise and strong contamination. In the numerical experiments, the SGR primitive produces nearly-oracle gradient estimation and the robust DGMM specialization substantially improves over non-robust moment baselines. The code and data are available at https://github.com/liu-lzhang/sgr-gmm.
其他神经网络|深度学习|模型|建模(51篇)
【1】Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents
标题:从弱点中学习:针对小型计算机使用代理的自动域专业化
链接:https://arxiv.org/abs/2605.28775
摘要:Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software domain remains expensive. Small open computer-use agents are more practical specialization targets, but they remain substantially weaker and exhibit uneven domain-specific failures. A straightforward remedy is to synthesize large-scale training data for the target domain, yet we find that this naive approach yields only marginal improvements. Building on this observation, we introduce LearnWeak, an annotation-free specialization framework for small computer-use agents that uses a stronger reference agent to identify the student's weaknesses in the target domain, synthesize targeted tasks, and construct supervision automatically. LearnWeak further introduces an error-aware specialization objective that disentangles planning and execution errors, enabling more behaviorally precise updates than broad uniform supervision. On OSWorld, LearnWeak achieves average gains of 11.6 and 11.1 percentage points over EvoCUA-8B and OpenCUA-7B, respectively, across eight domains. We also validate that our student-aware dataset generation and training approaches outperform existing autonomous trajectory generation and training baselines. Our work highlights the importance of student awareness in both data synthesis and agent training, pointing toward a more principled and efficient path for specializing small computer-use agents in diverse domains.
【2】Principled Algorithms for Optimizing Generalized Metrics in Multi-Label Learning
标题:多标签学习中优化广义表的原理算法
链接:https://arxiv.org/abs/2605.28767
摘要:Many real-world classification tasks require predicting multiple labels per instance, necessitating the optimization of complex evaluation metrics such as the $F$-measure and Jaccard index. While the Empirical Utility Maximization (EUM) framework is natural for these population-level metrics, existing theoretical results are largely limited to asymptotic Bayes-consistency. In this paper, we develop principled learning algorithms for optimizing a broad class of generalized metrics within the EUM framework, grounded in the stronger notion of $H$-consistency. Our key contribution is the design of novel surrogate loss functions for multi-label learning that admit provable $H$-consistency bounds, enabling optimization with non-asymptotic guarantees tailored to the hypothesis class and finite samples. Crucially, we prove these combinatorially formulated surrogates decompose exactly, operating in strictly $O(l)$ time without approximations. Building on this foundation, we introduce MMO (Multi-Label Metric Optimization), a new family of algorithms for optimizing generalized linear-fractional metrics. We validate our approach through extensive experiments, demonstrating robust scalability and superior performance over state-of-the-art continuous baselines on large-scale datasets (MS-COCO, Reuters-21578) in high-sparsity, deep learning regimes. Our results offer both theoretical rigor and practical effectiveness for general multi-label metric optimization.
【3】Code as a Weapon: A Consensus-Labeled Prompt Bank for Measuring Coding-Model Compliance with Malicious-Code Requests
标题:作为武器的代码:用于测量编码模型与恶意代码请求的一致性的标记有标签的提示库
链接:https://arxiv.org/abs/2605.28734
备注:21 pages, 9 figures, 5 tables. Consensus-labeled prompt bank consolidating eight malicious-code corpora (ASTRA, CySecBench, AdvBench/harmful_behaviors, JailbreakBench, MalwareBench, RedCode, RMCBench, Scam2Prompt) under a five-judge panel; 6,675 prompts, 33,375 classification calls, Fleiss' kappa = 0.767
摘要
:A general-purpose language model that answers a harmful question returns text; a coding model that complies with a malicious request can return a working weapon -- a keylogger, a ransomware stub, an exploit that runs as written. This asymmetry in the severity of a single act of compliance implies coding-specialized models should clear a higher refusal bar than general-purpose chat models, not a lower one, yet the field cannot presently tell whether they do. Refusal benchmarks for malicious code are fragmented: they mix requests for executable software (ready-to-run weapons) with requests for harmful security knowledge (information a human must still operationalise) and report refusal rates over non-comparable corpora, so no single statistic measures the property that actually matters. This paper introduces an expanded consensus-labeled prompt bank that distinguishes between these two request types and provides a construct-stable substrate for cross-corpus coding-model compliance measurement. Eight corpora (ASTRA, CySecBench, AdvBench/harmful_behaviors, JailbreakBench, MalwareBench, RedCode, RMCBench, Scam2Prompt) are consolidated and classified under a five-judge consensus protocol (6,675 prompts x 5 judges = 33,375 calls). The panel reaches Fleiss' kappa = 0.767 [95% CI 0.755, 0.777] ("substantial"); 95.0% of prompts draw at least four agreeing judges, 76.9% are unanimous, and the panel reproduces the earlier four-corpus release at Cohen's kappa = 0.952 on the 3,133 shared prompts. The released bank comprises 4,748 consensus-CODE prompts (executable malicious code requests) and 1,923 consensus-KNOWLEDGE prompts (harmful security knowledge requests). The bank is the validated instrument the field has lacked: a reliability-quantified basis for testing whether coding models meet the stricter refusal standard their executable output demands.
【4】Beyond Binary Moral Judgment: Modeling Ethical Pluralism in AI
标题:超越二元道德判断:人工智能中的道德多元化建模
链接:https://arxiv.org/abs/2605.28707
摘要:
【5】Expressive Power of Floating-Point Neural Networks with Arbitrary Reduction Orders and Inexact Activation Implementations
标题:具有任意约简阶和不精确激活实现的浮点神经网络的表达能力
链接:https://arxiv.org/abs/2605.28704
摘要:
【6】Interpretability-Guided Layer Selection over Subspace Projection: SAEs as Stethoscopes, Not Scalpels, for Raw Task Vector Model Editing
标题:子空间投影上的可解释性引导层选择:将SAEs作为听诊器,而不是手术刀,用于原始任务载体模型编辑
链接:https://arxiv.org/abs/2605.28649
摘要:
【7】When Interpretability Is Unequally Distributed: Fairness in Hybrid Interpretable Models
标题:当可解释性不均匀分布时:混合可解释模型中的公平性
链接:https://arxiv.org/abs/2605.28626
摘要:
【8】Learning High-Dimensional Parity Functions with Product Networks using Gradient Descent
标题:使用梯度下降通过产品网络学习多维宇称函数
链接:https://arxiv.org/abs/2605.28612
摘要:
【9】Continual Model Routing in Evolving Model Hubs
标题:进化模型中心中的连续模型路由
链接:https://arxiv.org/abs/2605.28577
备注:42 pages, 24 tables, 6 figures, to be published at ICML 2026
摘要:
【10】A Multi-dimensional Framework for Evaluating Generalization in EEG Foundation Models
标题:评估脑电基础模型推广性的多维框架
链接:https://arxiv.org/abs/2605.28563
备注:24 pages, 5 Figures
摘要:
【11】Conveyance: A Versatile Framework for Learning in Structured Class Spaces
标题:传达:结构化课堂空间中学习的通用框架
链接:https://arxiv.org/abs/2605.28420
摘要:While machine learning (ML) architectures have evolved rapidly to account for complex data, loss functions like cross-entropy remain mostly structure-agnostic in many real-world applications. However, the `class-symmetric' nature of these standard losses fundamentally limits the ability of ML models to exploit structural relationships between classes, particularly when facing structured noise. We propose \textsc{Conveyance}, a new classification approach and associated loss function tailored to structured class spaces. It allows users to encode graph-like relations between classes without having to define complex joint distributions or manually tune utility matrices.Technically, our loss function operates by maximizing two separate margins over distinct class partitions, while preserving formal properties such as monotonicity and partial convexity. We demonstrate the versatility and effectiveness of our method by applying it to hierarchical classification, ordinal regression, and multiple instance learning. Across these tasks, \textsc{Conveyance} either matches or exceeds the performance of specialized baselines, thereby offering a unified solution for structured class spaces.
【12】Revisiting Metafeatures to Explain Model Differences on Tabular Data
标题:重新审视元特征以解释表格数据上的模型差异
链接:https://arxiv.org/abs/2605.28418
摘要:With the rise of tabular foundation models alongside traditional models still performing well on many tasks, choosing the right model for a tabular dataset remains difficult. We investigate whether dataset meta-features can explain performance gaps between model families on tabular prediction tasks. Using the TabArena benchmark results, we analyze dataset-level performance gaps and relate them to model-agnostic dataset descriptors. After strict statistical tests with false discovery control, we find that (1) for neural network vs. tree gaps, no meta-feature survives false discovery control, (2) for non-foundation vs. foundation model gaps, one association is robust but does not generalize when tested in leave-one-dataset-out prediction, and (3) for TabICLv2 vs. TabPFN-2.6, one robust association also improves held-out prediction. Furthermore, we conduct a leave-one-dataset-out analysis and find that meta-feature predictors fail to improve meaningfully over a simple baseline. Overall, our results show the heterogeneity of tabular datasets and that global meta-feature approaches are not robust enough to offer explanations on the 51 TabArena datasets.
【13】CLANE: Continual Learning of Actions on Neuromorphic Hardware from Event Cameras
标题:CLANE:从活动摄像机持续学习神经形态硬件上的动作
链接:https://arxiv.org/abs/2605.28387
摘要:Recognizing and continuously learning novel human actions without forgetting prior classes is a requirement for emerging AR/VR and robotics applications. For these applications, both on-device processing and learning are essential for privacy and low-latency adaptation. Event cameras address the efficiency of visual sensing with sparse, asynchronous output that is naturally compatible with neuromorphic processing. Yet no prior system has deployed a continual on-device learning pipeline for event-based action recognition using neuromorphic hardware. We present CLANE, Continual Learning of Actions on Neuromorphic Hardware from Event Cameras, deployed end-to-end on Intel Loihi 2. CLANE combines a spiking 2D CNN for spatiotemporal feature extraction with CLP-SNN as its on-chip learning head, extended to action clips via a Temporal Aggregation Layer and a fixed-point Normalization Layer, both novel Loihi 2 modules. On THU E-ACT-50, a 50-class dataset captured under real-world conditions, CLANE achieves 70.4% accuracy in a continual learning task while delivering more than 100x energy reduction and 16x lower latency over a sequential CNN+GRU+CLP edge GPU baseline, validated through iso-algorithm cross-platform benchmarking across three evaluation levels.
【14】Hybrid Neural World Models
标题:混合神经世界模型
链接:https://arxiv.org/abs/2605.28317
备注:Preprint. Under review
摘要:Neural surrogates promise large speedups over classical solvers for physical dynamics but fail silently at sharp dynamical events such as shocks, fronts, and contact. We present hybrid neural world models for physical dynamics: a recipe for training and deploying multi-horizon surrogates in physical state space, where a single network with continuous horizon conditioning is trained with direct supervision against textbook reference solvers to predict any future state at horizon T in one forward pass. Although no part of the training data, loss function, or architecture supervises discontinuity location, the trained surrogate encodes it implicitly, recoverable from its forward passes alone as a per-trajectory error map that concentrates on shocks, fronts, and contacts, and stays small elsewhere. The map is competitive with or better than standard label-free baselines including deep ensembles, learned error heads, gradient-magnitude indicators, and locally-adaptive conformal prediction, while using only a single trained network and requiring no calibration set or governing-equation knowledge. The recipe supports two operating points. Mode 1 runs the surrogate alone for maximum throughput, with same-hardware CPU speedups of 26x to 72x against textbook solvers on the PDE environments. Mode 2 uses the error map to gate a reference-solver fallback, deferring uncertain trajectories and roughly halving the surrogate's residual error at the default operating point. The recipe applies without modification across reaction-diffusion, compressible Euler, and rigid-body collision dynamics.
【15】Compositional Generalization in Autoregressive Models via Logit Composition
标题:通过Logit合成进行自回归模型中的合成概括
链接:https://arxiv.org/abs/2605.28304
摘要:Composing autoregressive models remains a core challenge in understanding how large language models can combine behaviors or skills learned across tasks. We introduce a new and principled composition strategy for autoregressive systems, inspired by composition methods developed for diffusion models. Under a factorized-conditionals assumption, we show that the resulting composition is projective: each component model preserves control over its own designated subspace of the output distribution avoiding interference between models. This property is further preserved under smooth reparameterizations of the output space, yielding a feature-space theorem. Finally, we show that composition preserves length-generalizing behavior when the factorization assumptions and component guarantees hold uniformly at the target length. These results provide a principled understanding of when model composition and merging succeed in autoregressive systems and identify conditions under which their interactions remain stable.
【16】Parameter-Efficient Generative Modeling with Controlled Vector Fields
标题:具有受控向场的参数高效生成建模
链接:https://arxiv.org/abs/2605.28267
摘要:We introduce a continuous-time generative modeling framework, motivated by the Chow-Rashevskii theorem, that builds expressive flows from a small set of fixed vector fields and learned scalar controls. Instead of learning an unconstrained high-dimensional vector field, our framework constructs the velocity by modulating fixed vector fields with learned scalar control functions. When the fixed fields are bracket-generating, their Lie algebra spans the ambient space, providing a mechanism for expressive transport with only a small number of learned control channels and offering a parameter-efficient geometric alternative to standard vector-field parameterizations. This decoupled formulation yields a structured and interpretable generative model in which the number of learned scalar output channels can be chosen independently of the ambient dimension. We formulate an expressivity principle showing that, under suitable controllability and well-posedness assumptions, such controlled flows can transport a source distribution to a target distribution. We train the resulting model using a continuous-normalizing-flow likelihood objective and present proof-of-concept experiments on synthetic distributions.
【17】ProgVLA: Progress-Aware Robot Manipulation Skill Learning
标题:ProgVLA:进度感知机器人操纵技能学习
链接:https://arxiv.org/abs/2605.28231
摘要:We present ProgVLA, a compact vision-language-action (VLA) model designed for reliable robot manipulation under tight compute and memory budgets. The model specifically focuses on efficiently processing long multi-modal sequences by maintaining an explicit representation of task progress over extended horizons. To this end, ProgVLA integrates two key components. First, a multi-modal encoder with a two-stage Perceiver resampling scheme compresses variable-length visual, language, and proprioceptive streams into a fixed set of control-ready context tokens, substantially reducing sequence length while preserving cross-modal grounding. Second, an auxiliary set of progress heads is trained with offline reinforcement learning (RL) objectives to jointly learn critics over normalized remaining-horizon targets. This provides the policy with an internal estimate of task progress and enables advantage- and success-weighted flow-matching imitation learning. On two well-established multi-task robot manipulation benchmarks, a 0.1B-parameter ProgVLA model reaches success rates that are competitive with, and on long-horizon and harder task tiers exceed, substantially larger pretrained baselines. Ablations indicate that the learned context resampler and task-adaptive visual fine-tuning are the largest single contributors, while progress-aware training provides a consistent additional gain that is concentrated on long-horizon and multi-object tasks. We further validate the approach in real-world toy-kitchen environments.
【18】Refining Multidimensional Video Reward Models via Disentangled Influence Functions
标题:通过分解影响函数细化多维视频奖励模型
链接
:https://arxiv.org/abs/2605.28203
摘要:As Text-to-Video (T2V) generation models continue to evolve, the complexity of video evaluation necessitates a fine-grained assessment across various axes. To address this, recent works have focused on developing Multidimensional Video Reward Models (MVRMs), which decompose the evaluation process to better align with the multifaceted nature of human visual perception. However, training effective MVRMs is fundamentally challenged by the complex nature of video data. In this work, we identify a critical phenomenon termed Dimensional Heterogeneity: the reliability of a training sample can vary substantially across evaluation dimensions, meaning that a sample may provide reliable supervision for one objective while inducing high supervision risk for another. Consequently, prevailing data-centric methods that filter based on global scalar metrics are ill-posed for T2V tasks. To address this, we propose a disentangled influence framework that that efficiently estimates dimension-specific supervision risk. Leveraging this framework, we introduce two dimension-disentangled refinement strategies: Dimension-Disentangled Pruning, which removes extreme high-risk samples, and Dimension-Disentangled Reweighting, which softly down-weights high-risk supervision. Extensive experiments demonstrate that our disentangled strategies significantly outperform global filtering baselines, yielding reward models with superior alignment to ground truth.
【19】Off-Policy Learning to Reason Works Because It Is More Pessimistic Than You Think
标题:政策外的理性学习有效,因为它比你想象的更悲观
链接:https://arxiv.org/abs/2605.28150
摘要:Large scale reinforcement learning has become a central tool for improving reasoning in large language models. At this scale, generation is often lagged or asynchronous, so updates are performed on data collected by older policies. This makes learning inherently off-policy. Most existing approaches nevertheless remain rooted in PPO-style trust-region objectives, treating training as approximately on-policy and using importance weights to correct distribution mismatch. These corrections can introduce high variance, destabilize optimization, and accelerate entropy collapse. Recent work suggests an alternative: rather than correcting the mismatch, one can embrace off-policy data and remove importance weights, often yielding stronger algorithms. In this paper, we provide an intuitive construction of off-policy objectives that include successful off-policy objectives and show that their effectiveness can be understood through implicit pessimism: they optimize toward target policies that are more conservative than their nominal objectives suggest. This perspective explains why some particular implementation choices improve stability: they implicitly control the effective target distribution. We then propose a principled modification that stabilize this induced distribution and improve off-policy learning.
【20】Sign-Aware Gated Sparse Autoencoders: Modeling Anticorrelated Features with Bi-Jump-ReLU Activations
标题:信号感知门控稀疏自动编码器:使用Bi-Jump-ReLU激活建模抗相关功能
链接:https://arxiv.org/abs/2605.28149
摘要:Sparse Autoencoders (SAEs) extract interpretable features from Large Language Models, but standard variants enforce non-negativity, forcing separate latents for diametrically opposed concepts (e.g., "pressure too high" vs. "pressure too low") and wasting dictionary capacity when features are anticorrelated. We propose the Sign-Aware Gated SAE (SA-GSAE): two-sided gated sparsity with signed magnitude and auxiliary supervision. A polarity-sensitive gate selects support on either sign, a signed-magnitude path avoids L1 shrinkage, and an auxiliary reconstruction prevents gate collapse. Bipolar sharing - one latent encoding both signs along a shared direction - is realised via a new Bi-Jump-ReLU activation; parameter accounting shows sign-awareness stays parameter-efficient even when anticorrelated pairs are rare. On real LLM activations across three mid-depth hookpoints on Pythia-1B and SmolLM3-3B (6 cells, 3 seeds), a half-width SA-GSAE at width H strictly Pareto-dominates a full-width Gated SAE at 2H over the entire swept L0 overlap on 3 of 6 cells (both MLP-output hookpoints and resid-mid/Pythia-1B); on the remaining 3 it matches R^2 within 0.025 (max gap -0.008) while cutting dead fraction by 0.35-0.62 absolute. Sweep-geomean dead-fraction reductions are ~100x-500x on MLP-output cells and Pythia-1B resid, ~2x-4x on attention cells and SmolLM3-3B resid. Ablations show the two-sided gate and auxiliary loss are load-bearing (no auxiliary collapses LR to 0.27, 98% dead); tying r_i^+ = r_i^- is indistinguishable (|Delta R^2| = 0.0015), and we recommend this symmetric variant as default. MLP-output gains come from most latents carrying both polarities; on attention, bipolar structure concentrates in a small set of top latents. Full-width SA-GSAE exhibits a reproducible reconstruction collapse at SmolLM3-3B resid that the half-width entirely avoids.
【21】Sequential Neural Probabilistic Amplitude Shaping: Learning the Channel's Language
标题:顺序神经概率幅度整形:学习通道的语言
链接:https://arxiv.org/abs/2605.28143
备注:4 pages, 2 figures, Submitted to the 52nd European Conference on Optical Communications
摘要:We present the first neural probabilistic amplitude shaping that outperforms existing methods while accounting for all implementation losses, using a block-less, easily implementable sequential autoregressive encoder compatible with arithmetic distribution matching, yielding reduced rate loss and higher achievable information rates.
【22】Learning to Bid in Repeated Second-Price Auctions with Dynamic Values and Aggregated Feedback
标题:动态价值和聚合反馈下重复第二价格拍卖中的投标学习
链接:https://arxiv.org/abs/2605.28133
摘要:We study the problem of learning to bid when the bidder's value is dynamic, i.e., when the current value depends on past outcomes. Specifically, we consider a bidder participating in repeated second-price auctions whose value depends on the time elapsed since their last successful bid, with auctions arriving in continuous time and only aggregated feedback revealed at the end of the horizon. Such a bidder must (1) balance the immediate benefit of winning the current auction against its impact on future values and (2) learn unknown environmental parameters. We derive regret bounds for a class of learning methods that combine plug-in estimators with a differential-equation characterization of the optimal policy, and show that a specific confidence bound algorithm learns the optimal policy with a near optimal regret of $\widetilde{O}(\log N)$ for piecewise linear primitives, and $\widetilde{O}(N^{1/3})$ for general, smooth primitives, achieving these regrets without explicit randomization. These theoretical results are supported by numerical experiments.
【23】Stay Fair! Ensuring Group Fairness in Diffusion Models Across Guidance Scales
标题:保持公平!确保跨指导尺度扩散模型中的群体公平性
链接:https://arxiv.org/abs/2605.28036
备注:28 pages, 18 figures
摘要
:Diffusion models steer conditional generation with a tunable guidance scale to trade off prompt alignment and diversity. However, existing debiasing techniques are optimized for a single scale, degrading fairness when users adjust this parameter. We trace this behavior to a previously overlooked source by decomposing total bias into two components: a model bias and a guidance bias. While prior work primarily targets the former, we show that the guidance bias grows monotonically with the guidance scale, eventually dominating the high-guidance regimes users prefer. To address this, we extend Strong Demographic Parity to guidance and derive a condition under which the target distribution retains its group ratio across guidance scales. We propose StayFair, which leverages this condition to design fair guidance algorithms in both regimes. For classifier guidance, it equalizes the classifier's output distributions across groups; for classifier-free guidance, it shifts the null embedding by a prompt-dependent offset. Because StayFair modifies only the guidance step, it is orthogonal to model debiasing and can be layered onto existing fair diffusion models to extend their fairness across guidance scales. Across class-conditional and text-to-image generation, StayFair decouples fairness from the guidance scale without sacrificing image quality.
【24】Learning Compositional Latent Structure with Vector Networks
标题:基于向量网络的组合潜结构学习
链接:https://arxiv.org/abs/2605.28007
摘要:Deep networks are powerful function approximators, but they typically store many different computations in shared weight matrices, making it difficult to selectively reuse or adapt parts of them when a familiar structure appears in novel combinations. We introduce the Vector Network (VN), a hierarchical recurrent architecture in which each layer replaces a fixed weight matrix with a library of reusable rank-1 weight atoms. For each input, VN minimizes a layer-local energy to infer a sparse set of active weight atoms and their coefficients, jointly constrained by bottom-up input reconstruction and top-down feedback consistency. These weight atom coefficients then compose an input-specific low-rank weight matrix for that sample. After convergence, slow learning updates only the selected weight atoms through local residual signals scaled by the inferred coefficients. We evaluate VN on four compositional benchmarks spanning 1D signals, 2D spatial decoding, N-body dynamics, and compositional MNIST. VN matches strong baselines in distribution while often achieving out-of-distribution error about an order of magnitude lower when familiar factors must be recombined in novel ways. Vector networks thus make compositional generalization a structural property of the architecture and inference process rather than a brittle byproduct of fitting many behaviors into one shared dense parameter substrate.
【25】Continual Learning in Modern Hopfield Networks with an Application to Diffusion Models
标题:现代Hopfield网络中的连续学习及其在扩散模型中的应用
链接:https://arxiv.org/abs/2605.27975
摘要:Generative models, including diffusion models, are increasingly used as foundation models and adapted through sequential fine-tuning, making continual learning an essential problem setting. However, continual learning in such generative models remains poorly understood: after a task change, what aspects of the learned distribution are most easily lost, and what replay samples should be prioritized? We address these questions through the modern Hopfield energy. Recent links between modern Hopfield networks (MHNs) and diffusion models allow analyses in MHNs to be transferred to diffusion models. We introduce intrinsic forgetting as an increase in Hopfield energy after the task change. In tractable settings in an MHN, we prove that high-energy, outlier-like samples undergo a larger energy increase than cluster-like samples, implying that samples located in sharp, isolated basins are more forgettable. We further analyze memory replay and show that replay is particularly effective for high-energy samples, enabling an energy-based selection of replay samples. We validate these predictions in experiments on MHNs and two diffusion models under continual-learning settings: Stable Diffusion and a pixel-space DDPM. In these diffusion models, Hopfield energy tracks reconstruction-based forgetting, and replay experiments reveal energy-dependent mitigation of forgetting that is consistent with the MHN analysis.
【26】Do We Really Need Quantum Machine Learning?: A Multidimensional Empirical Study
标题:我们真的需要量子机器学习吗?:多维实证研究
链接:https://arxiv.org/abs/2605.27923
摘要:The rapid growth of computer vision and increasingly complex image recognition tasks has exposed fundamental computational limitations of classical machine learning models, motivating the exploration of quantum computing as an emerging new paradigm. This paper presents a comprehensive benchmarking study of classical and quantum machine learning models for image recognition on the MNIST handwritten digit dataset, evaluating both traditional models, a Classical Support Vector Machine (CSVM) and a Quantum Support Vector Machine (QSVM), and deep neural network models, a Classical Convolutional Neural Network (CCNN) and a Quantum Convolutional Neural Network (QCNN), across four performance dimensions: classification accuracy, computational runtime, parameter count, and memory requirements. Experiments are conducted as functions of both feature dimensionality and sample size, and across CPU and GPU execution environments, providing a controlled, multidimensional comparison to address gaps in prior work. For the SVM-based models, QSVM consistently outperforms CSVM in accuracy, reaching $\sim$ 0.90 versus $\sim$ 0.85 at 1,000 samples, with a higher computational cost. A feature count of 10 qubits and a sample size in the range of 200 -- 500 emerge as practical operating points that balance accuracy and runtime. For the neural network models, CCNN and QCNN achieve comparable classification accuracy, both exceeding 0.96 at 64 features and 60,000 samples, yet QCNN offers substantially superior parameter and memory efficiency, requiring $\sim$ 94\% fewer parameters and $\sim$ 75\% less memory than CCNN at higher feature counts, while incurring higher runtime. Across both model families, quantum models consistently outperform classical models by greater margins in accuracy as feature dimensionality or sample size increases.
【27】Decentralized Parameter-Free Online Learning with Compressed Gossip
标题:具有压缩流言的分散无参数在线学习
链接:https://arxiv.org/abs/2605.27831
摘要:We study decentralized online convex optimization when agents communicate over a graph and messages may be compressed. Classical decentralized online methods typically require learning-rate choices that depend on the horizon, comparator scale, or other problem parameters, while compressed communication introduces additional disagreement that must be controlled. We propose DECO-EF (DEcentralized COin-betting with Error Feedback), a decentralized parameter-free online learning algorithm that combines coin-betting predictions with compressed difference-based gossip. Each agent maintains a clean accumulated state and a compressed tracker, and communicates only compressed state differences during gossip steps. The method is parameter-free in the online-learning sense: it does not tune to the horizon, the comparator norm, or the learning rate. We prove expected comparator-adaptive network-regret bounds for DECO-EF under compressed communication. To the best of our knowledge, this gives the first expected sublinear network-regret guarantees for parameter-free decentralized online learning under compressed communication.
【28】Turning Video Models into Generalist Robot Policies
标题:将视频模型转变为多面手机器人政策
链接:https://arxiv.org/abs/2605.27817
备注:project page: https://vera.csail.mit.edu
摘要:Video generative models have emerged as a promising robotics backbone, capable of generating videos that depict the completion of complex tasks across embodiments and environments. Recent work proposes robot foundation models that jointly predict future observations and actions by finetuning video models with action-labeled data. In this paper, we test the limits of an alternative approach: leave the video planner as-is while training an embodiment-specific inverse dynamics model (IDM). This decoupling offers several natural benefits: the video planner remains embodiment-agnostic, different video models can be interchanged easily without re-training the IDM, and the IDM can be independently trained with readily available self-play data. We present a closed-loop, video-to-action policy that combines an action-free video world model with a carefully-designed IDM based on the robot embodiment Jacobian. We demonstrate that our IDM design is both data-efficient and scalable to high-dimensional action spaces. Our policy, which we coin the Video-to-Embodied Robot Action Model (VERA), achieves strong performance across simulated and real-world benchmarks, including zero-shot Panda arm manipulation and 16-DoF Allegro-hand dexterous cube re-orientation. The same video planner can be used across multiple embodiments by pairing it with different embodiment-specific IDMs. Our results show that decoupled video planning plus faithful video-to-action translation is a viable alternative route towards zero-shot, cross-embodiment, and generalizable robot control. More results are available on our project website: https://vera.csail.mit.edu.
【29】Residualized Temporal Sparse Autoencoders for Interpreting Diffusion Models
标题:用于解释扩散模型的剩余时间稀疏自动编码器
链接:https://arxiv.org/abs/2605.27813
摘要:Text-to-image diffusion models generate images through an iterative denoising process, so internal neural layers produce trajectories of activations rather than single static representations. Sparse autoencoders (SAEs) have recently been used to decompose diffusion activations into interpretable feature directions, but most approaches analyze activations at individual timesteps or condition on time rather than learning directly from full activation trajectories. In this work, we introduce residualized temporal SAEs for diffusion activation trajectories. We collect activations across denoising time, fit linear predictors between neighboring timesteps, and represent each trajectory using an initial activation together with residual components not explained by these linear dynamics. Training an SAE on this residualized representation encourages sparse latents to capture structure beyond what is linearly predictable. The residualized decoder directions can be mapped back into activation space, allowing each latent to be analyzed as a feature trajectory over denoising time. Through reconstruction and ablation studies, spatiotemporal feature analysis, and qualitative steering experiments on Stable Diffusion~1.5, we show that residualized temporal SAEs provide a useful framework for studying temporally structured diffusion activations.
【30】Do Models Know Why They Changed Their Mind? Interpretability and Faithfulness of Chain-of-Thought Under Knowledge Conflict
标题:模特知道为什么他们改变主意吗?知识冲突下思想链的可解释性和忠实性
链接:https://arxiv.org/abs/2605.27773
备注:12 pages, 8 tables, 3 appendices
摘要:When a language model sees a document contradicting its training knowledge, it must choose: follow the document or trust itself. Prior work proved this choice depends on how well-known the fact is. We ask: does the model's chain-of-thought (CoT) reasoning faithfully report this mechanism? We introduce introspective faithfulness and test it across 200 questions, 8 models, and 4 prompt conditions. We find CoT reasoning is highly stable across opposite decisions: flip pairs retain 96% of same-answer similarity (d=0.34; confirmed by ROUGE-L, d=0.45). Yet self-rated confidence carries a faint genuine signal: for obscure facts where entity fame is uninformative, confidence still predicts decisions (p<0.001) and tracks item-level knowledge (r=0.134). GPT-4o is the only model with statistically reliable reasoning-decision coupling. Claude Sonnet 4.6 shows the widest confidence range (SD=1.39) but near-zero pooled correlation because the confidence-decision relationship reverses between conditions; a temperature ablation confirms this is model-specific. Internal thinking tokens show greater decision-sensitivity than user-facing CoT (p=0.033). CoT decomposes into a decision-invariant knowledge display (~96%) and a thin confidence layer with weak but real signal. For monitoring: read confidence, not the argument.
【31】Explicit Critic Guidance for Aligning Diffusion Models
标题:调整扩散模型的明确批评指导
链接:https://arxiv.org/abs/2605.27736
摘要:Online reinforcement learning is becoming increasingly important for aligning diffusion models with non-differentiable objectives. However, existing methods still face limitations in assigning fine-grained credit along denoising trajectories and in realizing stable value-based optimization. We propose a state-aligned latent actor-critic framework for diffusion post-training, in which the diffusion model serves as its own timestep-conditioned value function and predicts values directly on noisy latent states. This enables trajectory-level PPO training, supports stable actor-critic optimization with simple conditioning and value pretraining strategies, and naturally allows the learned critic to be reused for inference-time steering. We further extend the framework to multi-reward optimization, where joint training with complementary rewards helps alleviate reward hacking. Across both UNet- and DiT-based backbones, our method consistently outperforms prior group-relative RL and actor-critic baselines on single-reward and multi-reward benchmarks, while test-time steering provides additional gains in generation quality.
【32】Learn from your own latents and not from tokens: A sample-complexity theory
标题:从你自己的潜伏中学习,而不是从代币中学习:样本复杂性理论
链接:https://arxiv.org/abs/2605.27734
备注:10 pages, 5 figures in main. 28 pages, 14 figures, 1 table in all
摘要
:Generative models, from diffusion models to large language models, achieve remarkable performance but at a cost in training data orders of magnitude larger than what biological learners require. An alternative paradigm has emerged in which networks are trained to predict their \emph{own} latent representations of related views or masked regions, as in data2vec and JEPA -- an idea related to predictive-coding accounts of the cortex. Despite strong empirical results, the theoretical understanding of these methods remains limited. Central questions include: by how much does latent prediction actually improve data efficiency? Is there a benefit to stacking such methods into multi-scale hierarchies? We answer both using as data a tractable probabilistic context-free grammar that captures the compositional structure of natural language and images. Such a grammar generates strings of visible tokens by recursively applying production rules along a tree of hidden symbols of depth $L$. For such data, supervised or token-level SSL require a number of samples \emph{exponential} in $L$ to recover the latent tree; we prove that latent prediction achieves this with a number of samples \emph{constant} in $L$, up to logarithmic factors. We confirm this bound with (i) a hierarchical clustering algorithm, (ii) an end-to-end neural network whose predictor-clusterer modules predict their own latents at each level via gradient descent, and (iii) the first sample-complexity analysis of data2vec, which we show implicitly performs hierarchical latent prediction. This suggests that explicit stacking such as H-JEPA is largely redundant.
【33】NUCLEUS-MoE: Unified Model of Pool Boiling for Liquid Cooling
标题:NUCLUS-MoE:液体冷却池沸腾统一模型
链接:https://arxiv.org/abs/2605.27722
备注:12 pages, 9 figurs, KDD AI for Science
【34】Structure over Pixels: Learning Variable-Length Visual Programs
标题:像素结构:学习变长视觉程序
链接:https://arxiv.org/abs/2605.27696
【35】How the Optimizer Shapes Learned Solutions in Equivariant Neural Networks
标题:优化器如何在等变神经网络中挖掘学习的解决方案
链接:https://arxiv.org/abs/2605.27662
备注:Accepted at ICML 2026 Workshop on Weight-Space Symmetries
【36】Evaluating Local Explainability Metrics for Machine Learning Models on Tabular Data
标题:评估表格数据上机器学习模型的局部可解释性指标
链接:https://arxiv.org/abs/2605.27618
备注:9 pages, 12 tables, 1 figure, DATA 2026 Conference
【37】Proper Agnostic Learning of Functions of Halfspaces under Gaussian Marginals
标题:高斯边缘下半空间函数的适当不可知学习
链接:https://arxiv.org/abs/2605.27594
【38】Debate Helps Weak Judges Reward Stronger Models
标题:辩论有助于弱法官奖励强模型
链接:https://arxiv.org/abs/2605.27483
【39】Balancing Fidelity and Diversity in Diffusion Models via Symmetric Attention Decomposition: Hopfield Perspective
标题:通过对称注意力分解平衡扩散模型中的忠实性和多样性:霍普菲尔德的观点
链接:https://arxiv.org/abs/2605.27476
备注:Accepted to ICML 2026 (Regular)
【40】Metric-Aware PCA as a Linear Instance of Geometric Deep Learning
标题:度量感知PCA作为几何深度学习的线性实例
链接:https://arxiv.org/abs/2605.27456
【41】Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey
标题:利用混合专家应对多模式学习挑战:一项调查
链接:https://arxiv.org/abs/2605.27431
备注:This survey paper has just been accepted by IJCAI 2026. Results were released by 30 April 2026. As I could not find a particular place to drop the acceptance email. I have upload the acceptance email alongside the LaTeX files of the paper, named as Acceptance_email.pdf
【42】Advancing Direct Training for Spiking Neural Networks with Circulate-Firing Neurons and Learnable Gradients
标题:推进具有循环放电神经元和可学习对象的尖峰神经网络的直接训练
链接:https://arxiv.org/abs/2605.27412
【43】Benchmarking Fairness in Spiking Neural Networks: Data Bias, Spurious Features, and Hardware Effects
标题:尖峰神经网络中的公平性基准:数据偏差、杂散特征和硬件效应
链接:https://arxiv.org/abs/2605.27407
【44】Dark Quest II: A Wide-Coverage Neural Network Emulator of the Nonlinear Matter Power Spectrum Across Extended Cosmologies
标题:Dark Quest II:扩展宇宙中非线性物质功率谱的广泛覆盖神经网络模拟器
链接:https://arxiv.org/abs/2605.28596
备注:53 pages, 44 figures, emulator code available at https://github.com/DarkQuestCosmology/dark_emulator2_public
【45】Learning Logical Operations for Arbitrary Quantum Error Correction Codes
标题:学习任意量子错误纠正码的逻辑运算
链接:https://arxiv.org/abs/2605.28162
备注:23 pages, 12 figures, 5 tables
【46】Machine learning enables experimental access to photon-by-photon arrival times in scintillation detectors
标题:机器学习使实验性地访问闪烁探测器中逐子到达时间
链接:https://arxiv.org/abs/2605.27937
【47】Learning to target with network interference
标题:学习如何应对网络干扰
链接:https://arxiv.org/abs/2605.27794
【48】Sparse POD Mode Selection and Manifold Dimensionality Reduction with Neural Networks
标题:利用神经网络进行稀疏Pod模式选择和多维简化
链接:https://arxiv.org/abs/2605.27756
【49】CFDTwin: An open-source GUI and Python toolkit for POD-NN surrogate modeling of ANSYS Fluent simulations
标题:CFDTwin:一个开源图形用户界面和Python工具包,用于ANSYS Fluent模拟的POD-NN代理建模
链接:https://arxiv.org/abs/2605.27725
备注:9 pages, 3 figures
【50】On the Equivariant Learning of the $Q$-tensor Order Parameter
标题:关于$Q$-张量序参数的等变学习
链接:https://arxiv.org/abs/2605.27679
备注:15 pages (excluding 7-page appendix); 6 figures
【51】Neural Quantum Spectral Operator Learning for Solving Partial Differential Equations
标题:用于求解偏微方程的神经量子谱运算符学习
链接:https://arxiv.org/abs/2605.27408
备注:31 pages (main 9 pages), 17 figures, 8 tables
其他(56篇)
【1】Rethinking Memory as Continuously Evolving Connectivity
标题:将记忆重新思考为不断发展的连接性
链接:https://arxiv.org/abs/2605.28773
备注:Ongoing work
【2】Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL
标题:推断权重平均揭示代码RL中的正确性-效率前沿
链接:https://arxiv.org/abs/2605.28751
备注:54 pages
【3】How VLAs Fail Differently: Black-Box Action Monitoring Reveals Architecture-Specific Failure Signatures
标题:VLA如何不同地失败:黑匣子动作监控揭示特定于架构的失败特征
链接:https://arxiv.org/abs/2605.28726
备注:Accepted at IEEE ICRA 2026 Workshop "From Data to Decisions: VLA Pipelines for Real Robots", Vienna, June 2026. Non-archival workshop. 5 pages, 2 figures, 22 references
【4】Augmenting Attention with Exponentially Decaying Memory Improves Query-Aware KV Sparsity
标题:通过指数衰减记忆来增强注意力,改善查询感知的KN稀疏性
链接:https://arxiv.org/abs/2605.28640
【5】Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection
标题:用于免训练RLVR数据选择的单卷展隐态动力学
链接:https://arxiv.org/abs/2605.28631
备注:14 pages, 2 figures. Accepted by ICML 2026
【6】Position: Retire the "Positive Backdoor" Label -- Secret Alignment Requires Strict and Systematic Evaluation
链接:https://arxiv.org/abs/2605.28597
备注:ICML 2026
【7】PLS in the Mirror of Self-Attention
标题:自我注意力镜子中的PLS
链接:https://arxiv.org/abs/2605.28592
【8】Thinned Mean Field Langevin Dynamics
标题:稀薄平均场朗之万动力学
链接:https://arxiv.org/abs/2605.28589
【9】Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns
标题:作为经典启发式搜索问题的思想树:形式基础和设计模式
链接:https://arxiv.org/abs/2605.28566
备注:Extended version of the SoCS 2026 paper. Includes appendices omitted from the proceedings version
【10】Resolution-free neural surrogates for geometric parameterization and mapping with spatially varying fields
标题:用于空间变化场的几何参数化和映射的无分辨率神经代理
链接:https://arxiv.org/abs/2605.28551
【11】SPRINT: Efficient Spectral Priors for Humanoid Athletic Sprints
标题:SPRint:类人运动冲刺的有效光谱先验
链接:https://arxiv.org/abs/2605.28549
【12】Stabilizing distribution-free probabilistic forecasts
标题:稳定无分布概率预测
链接:https://arxiv.org/abs/2605.28531
【13】Stochastic Gradient Descent with Momentum is Algorithmically Stable
标题:有动量的随机梯度下降是数学稳定的
链接:https://arxiv.org/abs/2605.28517
【14】Bilinear Coordinate Alignment for Training-Free Task-Vector Transfer
标题:免训练任务-载体转移的双线性坐标对齐
链接:https://arxiv.org/abs/2605.28444
【15】Latent Diffusion for Missing Data
标题:缺失数据的潜在扩散
链接:https://arxiv.org/abs/2605.28427
【16】From paper to benchmark: agentic, framework-based reproduction of under-specified methods in machine health intelligence
标题:从纸张到基准:机器健康智能中未指定方法的代理、基于框架的复制
链接:https://arxiv.org/abs/2605.28371
【17】LEIA: Learned Environment for Interactive Architected Materials
标题:LEIA:交互式建筑材料的学习环境
链接:https://arxiv.org/abs/2605.28368
【18】Picid: A Modular Evaluation Infrastructure for Reproducible PHM Across Tasks and Domains
标题:Picid:跨任务和域的可重复性PHM的模块化评估基础设施
链接:https://arxiv.org/abs/2605.28345
【19】Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR
标题:推出的起点:WLVR的低负载、高杠杆第一代币多元化
链接:https://arxiv.org/abs/2605.28295
【20】PhAME: Phenotype-Aware Molecular Editing via Latent Diffusion
标题:PhAME:通过潜在扩散的表型感知分子编辑
链接:https://arxiv.org/abs/2605.28226
【21】Geometry-First Generative Spatial Single-Cell Reconstruction
标题:几何-第一代空间单细胞重建
链接:https://arxiv.org/abs/2605.28200
备注:32nd SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)
【22】QuITE: Query-Based Irregular Time Series Embedding
标题:QuITE:基于查询的不规则时间序列嵌入
链接:https://arxiv.org/abs/2605.28166
备注:ICML 2026
【23】Self-Consistency via Marginal Sharpening
标题:通过边缘尖锐实现自我一致
链接:https://arxiv.org/abs/2605.28142
【24】Mind the Gap: Mixtures of Gaussians in Approximate Differential Privacy
标题:注意差距:近似差异隐私中高斯主义者的混合
链接:https://arxiv.org/abs/2605.28078
备注:ICML 2026 style: 9 main pages followed by acknowledgements, references, appendices
【25】RW-TTT: Batched Serving for Request-Owned Test-Time Training State
标题:RW-TTT:针对请求拥有的测试时间训练状态批量发球
链接:https://arxiv.org/abs/2605.28053
【26】Geometry-Correct Diffusion Posterior Sampling with Denoiser-Pullback Curvature Guidance and Manifold-Aligned Damping
标题:具有去噪器回拉弯曲引导和管汇对齐衰减的几何正确扩散后验采样
链接:https://arxiv.org/abs/2605.27990
备注:Code: https://github.com/Seunghyeok0715/CLAMP
【27】When Think-with-Image Meets Safety: What Determines Multimodal Jailbreak Robustness?
标题:当图像思考满足安全性:是什么决定了多模式越狱的稳健性?
链接:https://arxiv.org/abs/2605.27932
备注:17 pages, 6 figures, 7 tables
【28】Structure-Guided Visual Perturbation Neutralization for LVLMs
标题:LVLM的结构引导视觉扰动中和
链接:https://arxiv.org/abs/2605.27927
【29】Frequency-Guided Action Diffusion via Sub-Frequency Manifold Traversal
标题:通过次频率多管齐下的频率引导动作扩散
链接:https://arxiv.org/abs/2605.27919
备注:A preprint version of FGO
【30】Privately Estimating Monotone Statistics in Polynomial Time
标题:私下估计多项时间的单调统计量
链接:https://arxiv.org/abs/2605.27912
【31】Dr-CiK: A Testbed for Foresight-Driven Agents
标题:Dr-CiK:前瞻性驱动代理的测试平台
链接:https://arxiv.org/abs/2605.27904
【32】SPAR: Support-Preserving Action Rectification
标题:支持维护行动整改
链接:https://arxiv.org/abs/2605.27877
【33】Smoothed Score Queries and the Complexity of Sampling
标题:平滑分数线与抽样的复杂性
链接:https://arxiv.org/abs/2605.27769
【34】UniMaia: Steering Chess Policies with Language for Human-like Play
标题:UniMaia:用类人游戏语言指导国际象棋政策
链接:https://arxiv.org/abs/2605.27767
【35】Worker Disagreement Reveals Sharp Directions in Local SGD
标题:工人分歧揭示了当地新元的尖锐方向
链接:https://arxiv.org/abs/2605.27739
备注:5 pages main body, 18 pages appendix - Accepted to HiLD 2026, ICML
【36】Can Entry-Wise Clipping Give Spectral Control of Stochastic Gradients?
标题:进入时剪辑可以对随机子集进行光谱控制吗?
链接:https://arxiv.org/abs/2605.27733
【37】Bayesian Deployment Approval for Learned Landing Controllers under Finite Rollout Validation
标题:有限推出验证下习得着陆控制器的Bayesian部署批准
链接:https://arxiv.org/abs/2605.27720
备注:16 pages, 4 figures and 4 tables
【38】Simulation-Informed Diffusion for Decentralized Multi-robot Motion Planning
标题:分布式多机器人运动规划的仿真信息扩散
链接:https://arxiv.org/abs/2605.27697
【39】Test-Time Collective Action: Proxy-Based Perturbations for Correcting Algorithmic Harms
标题:测试时集体行动:基于代理的扰动纠正心律失常伤害
链接:https://arxiv.org/abs/2605.27689
【40】Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression
标题:用于KV缓存压缩的Hurwitz四元数乘量化
链接:https://arxiv.org/abs/2605.27646
【41】Trinity: Unifying Class-Agnostic Terrain and Semantic Segmentation for Unstructured Outdoor Environments by Leveraging Synthetic Data
标题:Trinity:通过利用合成数据统一非结构化户外环境的类别不可知地形和语义分割
链接:https://arxiv.org/abs/2605.27644
【42】The Energy Blind Spot: NVIDIA's Flagship Edge AI Hardware Cannot Support Process-Level Energy Attribution
标题:能源盲点:英伟达旗舰边缘人工智能硬件无法支持处理级能源归因
链接:https://arxiv.org/abs/2605.27599
【43】SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training
标题:SparseOpt:解决稀疏训练中规范化引发的梯度偏差
链接:https://arxiv.org/abs/2605.27541
备注:Accepted International Conference on Machine Learning (ICML) 2026
【44】Automating Formal Verification with Agent-Guided Tree Search
标题:使用代理引导树搜索自动化形式验证
链接:https://arxiv.org/abs/2605.27485
备注:78 pages, 8 figures
【45】Resource-Constrained Affect Modelling via Variance Regularisation Pruning
标题:通过方差正规化修剪的资源约束影响建模
链接:https://arxiv.org/abs/2605.27479
备注:This paper has been accepted at the 2026 PErvasive Technologies Related to Assistive Environments (PETRA)
【46】Architecture-driven Shift: towards a lightweight selector for capturing the trends of logit shift
标题:架构驱动的转变:迈向轻量级选择器,以捕捉logit转变的趋势
链接:https://arxiv.org/abs/2605.27469
【47】AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems
标题:DeliversFlow:多智能体系统的协调政策基础
链接:https://arxiv.org/abs/2605.27466
备注:7 pages, 4 figures, 4 tables. Code and reproducible evaluations available at: https://github.com/Nicolepcx/AgensFlow
【48】LNN-PINN: A Unified Physics-Only Training Framework with Liquid Residual Blocks
标题:LNN-PINN:具有液体残留块的统一纯物理训练框架
链接:https://arxiv.org/abs/2508.08935
【49】Beyond Lipschitz: Data-Driven Robustness via Discrete Modulus of Continuity
标题:超越Lipschitz:通过离散连续模实现数据驱动的鲁棒性
链接:https://arxiv.org/abs/2605.28729
【50】Latent-Conditioned Parameterized Quantum Circuits as Universal Approximators for Distributions over Quantum States
标题:潜条件参数化量子电路作为量子状态分布的通用逼近器
链接:https://arxiv.org/abs/2605.28690
备注:16 pages, 11 figures
【51】Implicit Regularization in Perturbed Deep Matrix Factorization: Spectral Conditions and Stability
标题:扰动深度矩阵分解中的隐式正规化:谱条件和稳定性
链接:https://arxiv.org/abs/2605.28613
【52】Geometry of Relaxed Fair Regression: A Unified Framework for Aware and Unaware Settings
标题:宽松公平回归的几何结构:感知和不感知设置的统一框架
链接:https://arxiv.org/abs/2605.28233
【53】Probabilistic Data-Driven Modelling of Astrophysical Transients: The Neural Process Family for Ultrafast and Class-Agnostic Light Curve Reconstruction with NightLANP
标题:天体物理学瞬时的概率数据驱动建模:使用NightLANP进行超快和类不可知光变曲线重建的神经过程家族
链接:https://arxiv.org/abs/2605.27527
【54】Identifiable Bayesian Deep Generative Copulas with Unknown Layer Widths for Data with Arbitrary Marginal Distributions
标题:具有任意边缘分布的数据的层宽未知的可识别的Bayesian深生成Copula
链接:https://arxiv.org/abs/2605.27523
【55】Iterative Causal Discovery: Per-Edge Impossibility Certificates, Tier-Aware Oracle Queries, and the $1+K$ Lower Bound
标题:迭代因果发现:逐边不可能性证书、分层感知Oracle收件箱和1 +K$下限
链接:https://arxiv.org/abs/2605.27477
备注:Contains 10 figures and 5 tables
【56】Proper Calibeating
标题:正确的校准
链接:https://arxiv.org/abs/2605.26703
机器翻译由腾讯交互翻译提供,仅供参考
点击“阅读原文”获取带摘要的学术速递