cs.LG: 179 papers in total today
Large models (14 papers)
【1】When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity
Link: https://arxiv.org/abs/2509.20293
Abstract: LLM-judged benchmarks are increasingly used to evaluate complex model behaviors, yet their design introduces failure modes absent in conventional ground-truth based benchmarks. We argue that without tight objectives and verifiable constructions, benchmark rankings can produce high-confidence rankings that are in fact largely noise. We introduce two mechanisms to diagnose these issues. Schematic adherence quantifies how much of a judge's overall verdict is explained by the explicit evaluation schema, revealing unexplained variance when judges deviate from their own rubric. Psychometric validity aggregates internal consistency and discriminant validity signals to quantify irreducible uncertainty in any benchmarking run. Applying these tools to Arena-Hard Auto, we find severe schema incoherence and factor collapse across popular judges: for example, unexplained variance exceeding 90 percent for DeepSeek-R1-32B and factor correlations above 0.93 for most criteria. We also show that the ELO-style aggregation used by Arena-Hard Auto collapses and masks genuine ranking uncertainty. Our results highlight design failures that undermine validity and offer actionable principles for building better-scoped, reliability-aware LLM-judged benchmarks. We release our code at https://anonymous.4open.science/r/judgment-to-noise-947D/README.md
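The schematic-adherence idea lends itself to a compact illustration: regress each judge's overall verdict on its own rubric scores and report the unexplained variance. A minimal numpy sketch on synthetic scores follows; the paper's exact estimator is not specified in the abstract, so this is only the underlying idea.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 4                        # 500 judged items, 4 rubric criteria
criteria = rng.normal(size=(n, k))   # the judge's per-criterion scores
# A judge that mostly ignores its own rubric: weak dependence plus noise.
overall = criteria @ np.array([0.2, 0.1, 0.05, 0.0]) \
    + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), criteria])         # intercept + criteria
beta, *_ = np.linalg.lstsq(X, overall, rcond=None)  # OLS fit
resid = overall - X @ beta
unexplained = resid.var() / overall.var()           # 1 - R^2
print(f"unexplained variance: {unexplained:.1%}")   # high => schema incoherence
```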
【2】Beyond Sharp Minima: Robust LLM Unlearning via Feedback-Guided Multi-Point Optimization
Link: https://arxiv.org/abs/2509.20230
Abstract: Current LLM unlearning methods face a critical security vulnerability that undermines their fundamental purpose: while they appear to successfully remove sensitive or harmful knowledge, this "forgotten" information remains precariously recoverable through relearning attacks. We identify that the root cause is that conventional methods optimizing the forgetting loss at individual data points will drive model parameters toward sharp minima in the loss landscape. In these unstable regions, even minimal parameter perturbations can drastically alter the model's behaviors. Consequently, relearning attacks exploit this vulnerability by using just a few fine-tuning samples to navigate the steep gradients surrounding these unstable regions, thereby rapidly recovering knowledge that was supposedly erased. This exposes a critical robustness gap between apparent unlearning and actual knowledge removal. To address this issue, we propose StableUN, a bi-level feedback-guided optimization framework that explicitly seeks more stable parameter regions via neighborhood-aware optimization. It integrates forgetting feedback, which uses adversarial perturbations to probe parameter neighborhoods, with remembering feedback to preserve model utility, aligning the two objectives through gradient projection. Experiments on WMDP and MUSE benchmarks demonstrate that our method is significantly more robust against both relearning and jailbreaking attacks while maintaining competitive utility performance.
【3】Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment
Link: https://arxiv.org/abs/2509.20214
Comments: NeurIPS 2025
Abstract: We study weight-only post-training quantization (PTQ), which quantizes the weights of a large language model (LLM) without retraining, using little or no calibration data. Weight-only PTQ is crucial for reducing the memory footprint and latency of LLM inference, especially in memory-bound, small-batch inference scenarios, such as personalized inference on edge devices. Despite its importance, irregular weight distributions with heavy-tailed outliers in LLMs complicate quantization, recently motivating rotation-based methods that transform weights into near-Gaussian distributions, which are more regular with fewer outliers, thereby reducing quantization error. In this work, we first derive the information-theoretically optimal bit allocation for Gaussianized weights under given bit budgets, revealing that fine-grained fractional-bit quantizers approaching the Gaussian distortion-rate bound are essential to achieve near-optimal quantization performance. To bridge this theoretical insight and practical implementation, we introduce Q-Palette, a versatile collection of fractional-bit quantizers that range from trellis-coded quantizers offering near-optimal distortion to simpler vector and scalar quantizers optimized for faster inference, all efficiently implemented with optimized CUDA kernels across various bitwidths. Furthermore, leveraging Q-Palette as a foundational component, we propose a novel mixed-scheme quantization framework, jointly optimizing quantizer choices and layer fusion decisions given resource constraints. The code is available at https://github.com/snu-mllab/Q-Palette.
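The distortion-rate argument can be made concrete. For a Gaussian source, $D(b) = \sigma^2 2^{-2b}$, and minimizing total distortion under a bit budget yields fractional per-group allocations that equalize distortion. A sketch under these textbook assumptions (ignoring non-negativity clipping; this illustrates the bound, not Q-Palette's actual allocator):

```python
import numpy as np

def allocate_bits(variances, avg_bits):
    """Closed-form allocation equalizing per-group distortion (no clipping)."""
    variances = np.asarray(variances, dtype=float)
    geo_mean = np.exp(np.mean(np.log(variances)))
    return avg_bits + 0.5 * np.log2(variances / geo_mean)

sigmas2 = np.array([4.0, 1.0, 0.25, 0.0625])   # per-group weight variances
bits = allocate_bits(sigmas2, avg_bits=3.0)
print(bits)                          # [4.5 3.5 2.5 1.5]: fractional bit-widths
print(sigmas2 * 2.0 ** (-2 * bits))  # equal distortion across groups
```

Note that the optimal bit-widths are generically fractional, which is why fine-grained fractional-bit quantizers matter for approaching the bound.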
【4】Universal Camouflage Attack on Vision-Language Models for Autonomous Driving
Link: https://arxiv.org/abs/2509.20196
Abstract: Visual language modeling for automated driving is emerging as a promising research direction with substantial improvements in multimodal reasoning capabilities. Despite its advanced reasoning abilities, VLM-AD remains vulnerable to serious security threats from adversarial attacks, which involve misleading model decisions through carefully crafted perturbations. Existing attacks have obvious challenges: 1) Physical adversarial attacks primarily target vision modules. They are difficult to directly transfer to VLM-AD systems because they typically attack low-level perceptual components. 2) Adversarial attacks against VLM-AD have largely concentrated on the digital level. To address these challenges, we propose the first Universal Camouflage Attack (UCA) framework for VLM-AD. Unlike previous methods that focus on optimizing the logit layer, UCA operates in the feature space to generate physically realizable camouflage textures that exhibit strong generalization across different user commands and model architectures. Motivated by the observed vulnerability of encoder and projection layers in VLM-AD, UCA introduces a feature divergence loss (FDL) that maximizes the representational discrepancy between clean and adversarial images. In addition, UCA incorporates a multi-scale learning strategy and adjusts the sampling ratio to enhance its adaptability to changes in scale and viewpoint diversity in real-world scenarios, thereby improving training stability. Extensive experiments demonstrate that UCA can induce incorrect driving commands across various VLM-AD models and driving scenarios, significantly surpassing existing state-of-the-art attack methods (improving 30% in 3-P metrics). Furthermore, UCA exhibits strong attack robustness under diverse viewpoints and dynamic conditions, indicating high potential for practical deployment.
【5】Probability Signature: Bridging Data Semantics and Embedding Structure in Language Models
Link: https://arxiv.org/abs/2509.20124
Abstract: The embedding space of language models is widely believed to capture the semantic relationships; for instance, embeddings of digits often exhibit an ordered structure that corresponds to their natural sequence. However, the mechanisms driving the formation of such structures remain poorly understood. In this work, we interpret the embedding structures via the data distribution. We propose a set of probability signatures that reflect the semantic relationships among tokens. Through experiments on the composite addition tasks using the linear model and feedforward network, combined with theoretical analysis of gradient flow dynamics, we reveal that these probability signatures significantly influence the embedding structures. We further generalize our analysis to large language models (LLMs) by training the Qwen2.5 architecture on the subsets of the Pile corpus. Our results show that the probability signatures are faithfully aligned with the embedding structures, particularly in capturing strong pairwise similarities among embeddings. Our work uncovers the mechanism of how data distribution guides the formation of embedding structures, establishing a novel understanding of the relationship between embedding organization and semantic patterns.
【6】PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning
Link: https://arxiv.org/abs/2509.19894
Comments: Preprint
Abstract: Large language models (LLMs) are evolving from conversational systems into strong reasoners for tasks such as Olympiad mathematics and competitive programming. While scaling parameters and test-time computation has driven progress, a key bottleneck is the lack of high-quality training problems: human-curated datasets are costly and limited, while existing synthetic corpora are often too easy or narrow. PromptCoT 1.0 showed that injecting rationales into prompt synthesis increases problem difficulty. Building on this, we present PromptCoT 2.0, a scalable framework that replaces hand-crafted heuristics with an expectation-maximization (EM) loop, where rationales are iteratively refined to guide prompt construction. This produces problems that are both harder and more diverse than prior corpora. The synthetic prompts support two post-training regimes: (1) Self-Play, where strong models improve autonomously via verifiable feedback without stronger teachers; and (2) Supervised Fine-Tuning (SFT), where weaker models learn from teacher-distilled traces. Extensive experiments demonstrate the effectiveness of this approach. In self-play, applying PromptCoT 2.0 to Qwen3-30B-A3B-Thinking-2507 sets new state-of-the-art results at the 30B scale, with +4.4, +4.8, and +5.3 on AIME 24/25 and HMMT 25, +6.1 and +5.0 on LiveCodeBench v5/v6, and +35 Elo on Codeforces. In SFT, training Qwen2.5-7B-Instruct solely on synthetic prompts boosts accuracy to 73.1 (AIME 24), 65.6 (AIME 25), and 53.4 (LiveCodeBench v5), surpassing models trained on human or hybrid data. Analyses further confirm that PromptCoT 2.0 yields fundamentally harder and distributionally distinct problems. These results establish prompt synthesis as a new axis for scaling reasoning and position PromptCoT 2.0 as a scalable foundation for future open-source models. The implementation is available at https://github.com/inclusionAI/PromptCoT.
【7】VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
Link: https://arxiv.org/abs/2509.19803
Abstract: Policy-based reinforcement learning currently plays an important role in improving LLMs on mathematical reasoning tasks. However, existing rollout-based reinforcement learning methods (GRPO, DAPO, GSPO, etc.) fail to explicitly consider LLMs' learning ability for samples of different difficulty levels, which is contrary to the human cognitive process of mathematical reasoning tasks from easy to difficult. Intuitively, we find that the variance of the rollout group's reward in RLVR partly reflects the difficulty of the current sample for LLMs. Samples that are too easy or too difficult have a lower variance, while samples with moderate difficulty have a higher variance. Based on this, we propose VCRL, a curriculum reinforcement learning framework that dynamically controls the difficulty of training samples based on the variance of group rewards. Experiments on five mathematical benchmarks and two models reveal the advantages of VCRL over the current LLM RL baselines.
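The variance signal itself is simple to compute. A toy sketch, assuming binary verifier rewards per rollout group (VCRL's actual scheduling policy is not detailed in the abstract):

```python
import numpy as np

rng = np.random.default_rng(1)

# 3 prompts x 8 rollouts of binary verifier rewards: too easy, moderate, too hard.
groups = {
    "easy":     rng.binomial(1, 0.95, size=8),
    "moderate": rng.binomial(1, 0.50, size=8),
    "hard":     rng.binomial(1, 0.05, size=8),
}
# For Bernoulli rewards, variance p(1-p) peaks at p = 0.5 (moderate difficulty),
# matching the intuition in the abstract.
scores = {name: float(np.var(r)) for name, r in groups.items()}
batch = sorted(scores, key=scores.get, reverse=True)  # high-variance first
print(scores, "->", batch)
```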
【8】A Foundation Chemical Language Model for Comprehensive Fragment-Based Drug Discovery
Link: https://arxiv.org/abs/2509.19586
Abstract: We introduce FragAtlas-62M, a specialized foundation model trained on the largest fragment dataset to date. Built on the complete ZINC-22 fragment subset comprising over 62 million molecules, it achieves unprecedented coverage of fragment chemical space. Our GPT-2 based model (42.7M parameters) generates 99.90% chemically valid fragments. Validation across 12 descriptors and three fingerprint methods shows generated fragments closely match the training distribution (all effect sizes < 0.4). The model retains 53.6% of known ZINC fragments while producing 22% novel structures with practical relevance. We release FragAtlas-62M with training code, preprocessed data, documentation, and model weights to accelerate adoption.
【9】Confidence Calibration in Large Language Model-Based Entity Matching
Link: https://arxiv.org/abs/2509.19557
Comments: 9 pages, 2 figures. UncertaiNLP 2025 Workshop @ EMNLP Camera Ready
Abstract: This research aims to explore the intersection of Large Language Models and confidence calibration in Entity Matching. To this end, we perform an empirical study to compare baseline RoBERTa confidences for an Entity Matching task against confidences that are calibrated using Temperature Scaling, Monte Carlo Dropout and Ensembles. We use the Abt-Buy, DBLP-ACM, iTunes-Amazon and Company datasets. The findings indicate that the proposed modified RoBERTa model exhibits a slight overconfidence, with Expected Calibration Error scores ranging from 0.0043 to 0.0552 across datasets. We find that this overconfidence can be mitigated using Temperature Scaling, reducing Expected Calibration Error scores by up to 23.83%.
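Temperature scaling and ECE are standard tools; a self-contained sketch for a binary matching setup with synthetic logits (the paper's binning and validation-set protocol are assumptions here):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nll(T, logits, labels):
    """Negative log-likelihood of temperature-scaled sigmoid probabilities."""
    p = 1.0 / (1.0 + np.exp(-logits / T))
    eps = 1e-12
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

def ece(probs, labels, n_bins=10):
    """Reliability-diagram ECE over the positive-class probability."""
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        m = bins == b
        if m.any():
            err += m.mean() * abs(probs[m].mean() - labels[m].mean())
    return err

rng = np.random.default_rng(2)
labels = rng.binomial(1, 0.5, size=2000)
# Deliberately overconfident logits: the optimal temperature here is ~2.
logits = 4.0 * (2 * labels - 1) + rng.normal(scale=4.0, size=2000)

T = minimize_scalar(nll, bounds=(0.5, 10.0), args=(logits, labels),
                    method="bounded").x
print(f"T={T:.2f}")
print(f"ECE before={ece(1 / (1 + np.exp(-logits)), labels):.4f}, "
      f"after={ece(1 / (1 + np.exp(-logits / T)), labels):.4f}")
```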
【10】Cognitive Load Limits in Large Language Models: Benchmarking Multi-Hop Reasoning
Link: https://arxiv.org/abs/2509.19517
Abstract: The scaling of Large Language Models (LLMs) has exposed a critical gap between their performance on static benchmarks and their fragility in dynamic, information-rich environments. While models excel at isolated tasks, the computational limits that govern their reasoning under cognitive load remain poorly understood. In this work, we introduce a formal theory of computational cognitive load, positing that extraneous, task-irrelevant information (Context Saturation) and interference from task-switching (Attentional Residue) are key mechanisms that degrade performance. We designed the Interleaved Cognitive Evaluation (ICE), a deconfounded benchmark to systematically manipulate these load factors on challenging multi-hop reasoning tasks. A comprehensive study (N = 10 replications per item across 200 questions) revealed significant performance variations across five instruction-tuned models. Smaller open-source architectures (Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.2) exhibited baseline brittleness, achieving 0% accuracy (SEM = 0.0) across all conditions, including clean controls, on this high-intrinsic-load task. In contrast, Gemini-2.0-Flash-001 showed partial resilience, achieving 85% accuracy in control conditions, with a statistically significant degradation under context saturation ($\beta = -0.003$ per % load, $p < 0.001$). These findings provide preliminary evidence that cognitive load is a key contributor to reasoning failures, supporting theories of hallucination-as-guessing under uncertainty. We conclude that dynamic, cognitive-aware stress testing, as exemplified by the ICE benchmark, is essential for evaluating the true resilience and safety of advanced AI systems.
【11】OmniVLA: An Omni-Modal Vision-Language-Action Model for Robot Navigation
Link: https://arxiv.org/abs/2509.19480
Comments: 9 pages, 7 figures, 6 tables
Abstract: Humans can flexibly interpret and compose different goal specifications, such as language instructions, spatial coordinates, or visual references, when navigating to a destination. In contrast, most existing robotic navigation policies are trained on a single modality, limiting their adaptability to real-world scenarios where different forms of goal specification are natural and complementary. In this work, we present a training framework for robotic foundation models that enables omni-modal goal conditioning for vision-based navigation. Our approach leverages a high-capacity vision-language-action (VLA) backbone and trains with three primary goal modalities: 2D poses, egocentric images, and natural language, as well as their combinations, through a randomized modality fusion strategy. This design not only expands the pool of usable datasets but also encourages the policy to develop richer geometric, semantic, and visual representations. The resulting model, OmniVLA, achieves strong generalization to unseen environments, robustness to scarce modalities, and the ability to follow novel natural language instructions. We demonstrate that OmniVLA outperforms specialist baselines across modalities and offers a flexible foundation for fine-tuning to new modalities and tasks. We believe OmniVLA provides a step toward broadly generalizable and flexible navigation policies, and a scalable path for building omni-modal robotic foundation models. We present videos showcasing OmniVLA performance and will release its checkpoints and training code on our project page.
【12】Uncertainty Quantification of Large Language Models using Approximate Bayesian Computation
Link: https://arxiv.org/abs/2509.19375
Abstract: Despite their widespread applications, Large Language Models (LLMs) often struggle to express uncertainty, posing a challenge for reliable deployment in high stakes and safety critical domains like clinical diagnostics. Existing standard baseline methods such as model logits and elicited probabilities produce overconfident and poorly calibrated estimates. In this work, we propose Approximate Bayesian Computation (ABC), a likelihood-free Bayesian inference, based approach that treats LLMs as a stochastic simulator to infer posterior distributions over predictive probabilities. We evaluate our ABC approach on two clinically relevant benchmarks: a synthetic oral lesion diagnosis dataset and the publicly available GretelAI symptom-to-diagnosis dataset. Compared to standard baselines, our approach improves accuracy by up to 46.9%, reduces Brier scores by 74.4%, and enhances calibration as measured by Expected Calibration Error (ECE) and predictive entropy.
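A minimal ABC rejection sampler in this spirit, with a hypothetical stand-in for the LLM simulator; the summary statistic, prior, and tolerance are all assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_diagnosis(p_correct, n_queries=20):
    """Hypothetical stand-in: n resampled LLM answers, correct w.p. p_correct."""
    return rng.binomial(1, p_correct, size=n_queries).mean()

observed_rate = 0.7   # agreement rate measured from actual LLM samples
tolerance = 0.05
accepted = []
for _ in range(20000):
    theta = rng.uniform(0, 1)          # prior over the predictive probability
    if abs(simulate_diagnosis(theta) - observed_rate) <= tolerance:
        accepted.append(theta)         # accept: simulation matches the data

post = np.array(accepted)
print(f"posterior mean={post.mean():.3f}, "
      f"95% interval=({np.quantile(post, 0.025):.3f}, "
      f"{np.quantile(post, 0.975):.3f})")
```

The accepted samples approximate a posterior over the predictive probability, which is what allows calibrated uncertainty rather than a single overconfident point estimate.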
【13】LLM-Assisted Topic Reduction for BERTopic on Social Media Data
Link: https://arxiv.org/abs/2509.19365
Comments: 13 pages, 8 figures. To be published in the Post-Workshop proceedings of the ECML PKDD 2025 Conference
Abstract: The BERTopic framework leverages transformer embeddings and hierarchical clustering to extract latent topics from unstructured text corpora. While effective, it often struggles with social media data, which tends to be noisy and sparse, resulting in an excessive number of overlapping topics. Recent work explored the use of large language models for end-to-end topic modelling. However, these approaches typically require significant computational overhead, limiting their scalability in big data contexts. In this work, we propose a framework that combines BERTopic for topic generation with large language models for topic reduction. The method first generates an initial set of topics and constructs a representation for each. These representations are then provided as input to the language model, which iteratively identifies and merges semantically similar topics. We evaluate the approach across three Twitter/X datasets and four different language models. Our method outperforms the baseline approach in enhancing topic diversity and, in many cases, coherence, with some sensitivity to dataset characteristics and initial parameter selection.
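The reduction step can be sketched as an iterative merge loop around a pairwise LLM judgment. `llm_says_same_topic` below is a hypothetical placeholder (a toy keyword-overlap rule standing in for a real model call); the paper's prompt and merge policy are not shown in the abstract.

```python
from itertools import combinations

def llm_says_same_topic(rep_a: str, rep_b: str) -> bool:
    """Hypothetical stand-in for an LLM call; here, a toy keyword-overlap rule."""
    return len(set(rep_a.split()) & set(rep_b.split())) >= 2

def reduce_topics(topics):
    merged = True
    while merged:                    # iterate until no pair of topics merges
        merged = False
        for a, b in combinations(sorted(topics), 2):
            if llm_says_same_topic(topics[a], topics[b]):
                topics[a] = topics[a] + " " + topics[b]   # fold b into a
                del topics[b]
                merged = True
                break                # restart over the updated topic set
    return topics

topics = {0: "climate warming emissions",
          1: "global warming emissions policy",
          2: "football transfer rumours"}
print(reduce_topics(topics))  # topics 0 and 1 merge; topic 2 survives
```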
【14】GAUSS: Benchmarking Structured Mathematical Skills for Large Language Models
Link: https://arxiv.org/abs/2509.18122
Comments: 120 pages (including appendix)
Abstract: We introduce GAUSS (General Assessment of Underlying Structured Skills in Mathematics), a benchmark that evaluates LLMs' mathematical abilities across twelve core skill dimensions, grouped into three domains: knowledge and understanding, problem solving and communication, and meta-skills and creativity. By categorizing problems according to cognitive skills and designing tasks that isolate specific abilities, GAUSS constructs comprehensive, fine-grained, and interpretable profiles of models' mathematical abilities. These profiles faithfully represent their underlying mathematical intelligence. To exemplify how to use the GAUSS benchmark, we have derived the skill profile of GPT-5-thinking, revealing its strengths and weaknesses as well as its differences relative to o4-mini-high, thereby underscoring the value of multidimensional, skill-based evaluation.
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (6 papers)
【1】Spatio-Temporal Directed Graph Learning for Account Takeover Fraud Detection
Link: https://arxiv.org/abs/2509.20339
Comments: This paper has been accepted at NeurIPS 2025 workshop New Perspective in Graph Machine Learning (NPGML)
Abstract: Account Takeover (ATO) fraud poses a significant challenge in consumer banking, requiring high recall under strict latency while minimizing friction for legitimate users. Production systems typically rely on tabular gradient-boosted decision trees (e.g., XGBoost) that score sessions independently, overlooking the relational and temporal structure of online activity that characterizes coordinated attacks and "fraud rings." We introduce ATLAS (Account Takeover Learning Across Spatio-Temporal Directed Graph), a framework that reformulates ATO detection as spatio-temporal node classification on a time-respecting directed session graph. ATLAS links entities via shared identifiers (account, device, IP) and regulates connectivity with time-window and recency constraints, enabling causal, time-respecting message passing and latency-aware label propagation that uses only labels available at scoring time, non-anticipative and leakage-free. We operationalize ATLAS with inductive GraphSAGE variants trained via neighbor sampling, at scale on a sessions graph with more than 100M nodes and around 1B edges. On a high-risk digital product at Capital One, ATLAS delivers 6.38 percent AUC improvement and more than 50 percent reduction in customer friction, improving fraud capture while reducing user friction.
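The time-respecting graph construction can be illustrated directly: link sessions that share an identifier, oriented earlier-to-later and pruned by a time window. The field names and window below are illustrative assumptions, not the production schema.

```python
from collections import defaultdict

WINDOW = 3600  # only link sessions within one hour of each other

sessions = [  # (session_id, timestamp, identifiers)
    ("s1", 100, {"ip:10.0.0.1", "dev:A"}),
    ("s2", 900, {"ip:10.0.0.1"}),
    ("s3", 5000, {"dev:A"}),
]

by_identifier = defaultdict(list)
for sid, ts, idents in sessions:
    for ident in idents:
        by_identifier[ident].append((ts, sid))

edges = set()
for ident, nodes in by_identifier.items():
    nodes.sort()
    for i, (ts_i, sid_i) in enumerate(nodes):
        for ts_j, sid_j in nodes[i + 1:]:
            if ts_j - ts_i > WINDOW:
                break                         # recency constraint
            edges.add((sid_i, sid_j, ident))  # directed: earlier -> later

print(sorted(edges))  # s1->s2 via shared IP; s1->s3 exceeds the window
```

Orienting edges from past to future is what keeps message passing causal: a session's score can only depend on sessions (and labels) that existed before it.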
【2】Uncovering Graph Reasoning in Decoder-only Transformers with Circuit Tracing
Link: https://arxiv.org/abs/2509.20336
Comments: Accepted by the Workshop on Efficient Reasoning, Neurips 2025
Abstract: Transformer-based LLMs demonstrate strong performance on graph reasoning tasks, yet their internal mechanisms remain underexplored. To uncover these reasoning process mechanisms in a fundamental and unified view, we set the basic decoder-only transformers and explain them using the circuit-tracer framework. Through this lens, we visualize reasoning traces and identify two core mechanisms in graph reasoning: token merging and structural memorization, which underlie both path reasoning and substructure extraction tasks. We further quantify these behaviors and analyze how they are influenced by graph density and model size. Our study provides a unified interpretability framework for understanding structural reasoning in decoder-only Transformers.
【3】Graph Variate Neural Networks
Link: https://arxiv.org/abs/2509.20311
Abstract: Modelling dynamically evolving spatio-temporal signals is a prominent challenge in the Graph Neural Network (GNN) literature. Notably, GNNs assume an existing underlying graph structure. While this underlying structure may not always exist or is derived independently from the signal, a temporally evolving functional network can always be constructed from multi-channel data. Graph Variate Signal Analysis (GVSA) defines a unified framework consisting of a network tensor of instantaneous connectivity profiles against a stable support usually constructed from the signal itself. Building on GVSA and tools from graph signal processing, we introduce Graph-Variate Neural Networks (GVNNs): layers that convolve spatio-temporal signals with a signal-dependent connectivity tensor combining a stable long-term support with instantaneous, data-driven interactions. This design captures dynamic statistical interdependencies at each time step without ad hoc sliding windows and admits an efficient implementation with linear complexity in sequence length. Across forecasting benchmarks, GVNNs consistently outperform strong graph-based baselines and are competitive with widely used sequence models such as LSTMs and Transformers. On EEG motor-imagery classification, GVNNs achieve strong accuracy highlighting their potential for brain-computer interface applications.
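A rough numpy sketch of the core idea, convolving the signal with a signal-dependent connectivity tensor one time step at a time, with no sliding windows. The specific gating, support construction, and nonlinearity here are assumptions for illustration, not the published layer.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 200, 8                       # time steps, channels
X = rng.normal(size=(T, N))

# Stable support: e.g., thresholded long-term correlation of the signal itself.
C = np.corrcoef(X.T)
W = (np.abs(C) > 0.1).astype(float)
np.fill_diagonal(W, 0.0)

theta = rng.normal(scale=0.1, size=(N, N))   # learnable mixing weights
out = np.empty_like(X)
for t in range(T):
    inst = np.outer(X[t], X[t])              # instantaneous connectivity profile
    G_t = W * inst                           # gate it by the stable support
    out[t] = np.tanh(G_t @ (theta @ X[t]))   # one graph-variate "convolution"

print(out.shape)  # (200, 8): per-step outputs, linear in sequence length
```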
【4】PGCLODA: Prompt-Guided Graph Contrastive Learning for Oligopeptide-Infectious Disease Association Prediction
Link: https://arxiv.org/abs/2509.20290
Comments: 12 pages, 8 figures
Abstract: Infectious diseases continue to pose a serious threat to public health, underscoring the urgent need for effective computational approaches to screen novel anti-infective agents. Oligopeptides have emerged as promising candidates in antimicrobial research due to their structural simplicity, high bioavailability, and low susceptibility to resistance. Despite their potential, computational models specifically designed to predict associations between oligopeptides and infectious diseases remain scarce. This study introduces a prompt-guided graph-based contrastive learning framework (PGCLODA) to uncover potential associations. A tripartite graph is constructed with oligopeptides, microbes, and diseases as nodes, incorporating both structural and semantic information. To preserve critical regions during contrastive learning, a prompt-guided graph augmentation strategy is employed to generate meaningful paired views. A dual encoder architecture, integrating Graph Convolutional Network (GCN) and Transformer, is used to jointly capture local and global features. The fused embeddings are subsequently input into a multilayer perceptron (MLP) classifier for final prediction. Experimental results on a benchmark dataset indicate that PGCLODA consistently outperforms state-of-the-art models in AUROC, AUPRC, and accuracy. Ablation and hyperparameter studies confirm the contribution of each module. Case studies further validate the generalization ability of PGCLODA and its potential to uncover novel, biologically relevant associations. These findings offer valuable insights for mechanism-driven discovery and oligopeptide-based drug development. The source code of PGCLODA is available online at https://github.com/jjnlcode/PGCLODA.
【5】Graph-based Neural Space Weather Forecasting
Link: https://arxiv.org/abs/2509.19605
Comments: 20 pages, 18 figures. Accepted to the NeurIPS 2025 Workshop on Machine Learning and the Physical Sciences
Abstract: Accurate space weather forecasting is crucial for protecting our increasingly digital infrastructure. Hybrid-Vlasov models, like Vlasiator, offer physical realism beyond that of current operational systems, but are too computationally expensive for real-time use. We introduce a graph-based neural emulator trained on Vlasiator data to autoregressively predict near-Earth space conditions driven by an upstream solar wind. We show how to achieve both fast deterministic forecasts and, by using a generative model, produce ensembles to capture forecast uncertainty. This work demonstrates that machine learning offers a way to add uncertainty quantification capability to existing space weather prediction systems, and make hybrid-Vlasov simulation tractable for operational use.
【6】Graph-Based Spatio-temporal Attention and Multi-Scale Fusion for Clinically Interpretable, High-Fidelity Fetal ECG Extraction
Link: https://arxiv.org/abs/2509.19308
Comments: 6 pages, ACM BCB 2025
Abstract: Congenital Heart Disease (CHD) is the most common neonatal anomaly, highlighting the urgent need for early detection to improve outcomes. Yet, fetal ECG (fECG) signals in abdominal ECG (aECG) are often masked by maternal ECG and noise, challenging conventional methods under low signal-to-noise ratio (SNR) conditions. We propose FetalHealthNet (FHNet), a deep learning framework that integrates Graph Neural Networks with a multi-scale enhanced transformer to dynamically model spatiotemporal inter-lead correlations and extract clean fECG signals. On benchmark aECG datasets, FHNet consistently outperforms long short-term memory (LSTM) models, standard transformers, and state-of-the-art models, achieving R2>0.99 and RMSE = 0.015 even under severe noise. Interpretability analyses highlight physiologically meaningful temporal and lead contributions, supporting model transparency and clinical trust. FHNet illustrates the potential of AI-driven modeling to advance fetal monitoring and enable early CHD screening, underscoring the transformative impact of next-generation biomedical signal processing.
Transformers (4 papers)
【1】Pi-Transformer: A Physics-informed Attention Mechanism for Time Series Anomaly Detection
Link: https://arxiv.org/abs/2509.19985
Abstract: Anomalies in multivariate time series often arise from temporal context and cross-channel coordination rather than isolated outliers. We present Pi-Transformer, a physics-informed transformer with two attention pathways: a data-driven series attention and a smoothly evolving prior attention that encodes temporal invariants such as scale-related self-similarity and phase synchrony. The prior acts as a stable reference that calibrates reconstruction error. During training, we pair a reconstruction objective with a divergence term that encourages agreement between the two attentions while keeping them meaningfully distinct; the prior is regularised to evolve smoothly and is lightly distilled towards dataset-level statistics. At inference, the model combines an alignment-weighted reconstruction signal (Energy) with a mismatch signal that highlights timing and phase disruptions, and fuses them into a single score for detection. Across five benchmarks (SMD, MSL, SMAP, SWaT, and PSM), Pi-Transformer achieves state-of-the-art or highly competitive F1, with particular strength on timing and phase-breaking anomalies. Case analyses show complementary behaviour of the two streams and interpretable detections around regime changes. Embedding physics-informed priors into attention yields a calibrated and robust approach to anomaly detection in complex multivariate systems. Code is publicly available at https://github.com/sepehr-m/Pi-Transformer.
【2】Linear Transformers Implicitly Discover Unified Numerical Algorithms
Link: https://arxiv.org/abs/2509.19702
Comments: To appear at NeurIPS 2025
Abstract: We train a linear attention transformer on millions of masked-block matrix completion tasks: each prompt is a masked low-rank matrix whose missing block may be (i) a scalar prediction target or (ii) an unseen kernel slice of Nyström extrapolation. The model sees only input-output pairs and a mean-squared loss; it is given no normal equations, no handcrafted iterations, and no hint that the tasks are related. Surprisingly, after training, algebraic unrolling reveals the same parameter-free update rule across three distinct computational regimes (full visibility, rank-limited updates, and distributed computation). We prove that this rule achieves second-order convergence on full-batch problems, cuts distributed iteration complexity, and remains accurate with rank-limited attention. Thus, a transformer trained solely to patch missing blocks implicitly discovers a unified, resource-adaptive iterative solver spanning prediction, estimation, and Nyström extrapolation, highlighting a powerful capability of in-context learning.
【3】Transformer Modeling for Both Scalability and Performance in Multivariate Time Series
Link: https://arxiv.org/abs/2509.19471
Abstract: Variable count is among the main scalability bottlenecks for transformer modeling in multivariate time series (MTS) data. On top of this, a growing consensus in the field points to indiscriminate inter-variable mixing as a potential source of noise-accumulation and performance degradation. This is likely exacerbated by sparsity of informative signals characteristic of many MTS systems coupled with representational misalignment stemming from indiscriminate information mixing between (heterogeneous) variables. While scalability and performance are often seen as competing interests in transformer design, we show that both can be improved simultaneously in MTS by strategically constraining the representational capacity of inter-variable mixing. Our proposed method, transformer with Delegate Token Attention (DELTAformer), constrains inter-variable modeling through what we call delegate tokens which are then used to perform full, unconstrained, inter-temporal modeling. Delegate tokens act as an implicit regularizer that forces the model to be highly selective about what inter-variable information is allowed to propagate through the network. Our results show that DELTAformer scales linearly with variable-count while actually outperforming standard transformers, achieving state-of-the-art performance across benchmarks and baselines. In addition, DELTAformer can focus on relevant signals better than standard transformers in noisy MTS environments and overall exhibit superior noise-resilience. Overall, results across various experiments confirm that by aligning our model design to leverage domain-specific challenges in MTS to our advantage, DELTAformer can simultaneously achieve linear scaling while actually improving its performance against standard, quadratic transformers.
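The delegate-token bottleneck can be sketched as two small attentions: variables write to k delegates, then read back from them, giving O(Nk) rather than O(N^2) inter-variable mixing cost. Shapes only; the actual DELTAformer layer layout is not specified in the abstract.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(5)
N, k, d = 64, 4, 32                     # variables, delegates, model dim
x = rng.normal(size=(N, d))             # per-variable embeddings (one time step)
delegates = rng.normal(size=(k, d))     # learned delegate tokens

# Delegates gather from all variables (k x N attention)...
gathered = softmax(delegates @ x.T / np.sqrt(d)) @ x
# ...then variables read back only from the k delegates (N x k attention).
mixed = softmax(x @ gathered.T / np.sqrt(d)) @ gathered

print(mixed.shape)  # (64, 32); cost is O(N*k), linear in variable count
```

Because everything crossing variables must pass through k slots, the bottleneck itself acts as the implicit regularizer the abstract describes.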
【4】Holographic Transformers for Complex-Valued Signal Processing: Integrating Phase Interference into Self-Attention
Link: https://arxiv.org/abs/2509.19331
Abstract: Complex-valued signals encode both amplitude and phase, yet most deep models treat attention as real-valued correlation, overlooking interference effects. We introduce the Holographic Transformer, a physics-inspired architecture that incorporates wave interference principles into self-attention. Holographic attention modulates interactions by relative phase and coherently superimposes values, ensuring consistency between amplitude and phase. A dual-headed decoder simultaneously reconstructs the input and predicts task outputs, preventing phase collapse when losses prioritize magnitude over phase. We demonstrate that holographic attention implements a discrete interference operator and maintains phase consistency under linear mixing. Experiments on PolSAR image classification and wireless channel prediction show strong performance, achieving high classification accuracy and F1 scores, low regression error, and increased robustness to phase perturbations. These results highlight that enforcing physical consistency in attention leads to generalizable improvements in complex-valued learning and provides a unified, physics-based framework for coherent signal modeling. The code is available at https://github.com/EonHao/Holographic-Transformers.
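The core mechanism, phase-modulated scores with coherent superposition of complex values, can be sketched in a few lines. This illustrates the stated principle only; the paper's exact layer (projections, normalization, heads) is an open assumption here.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(6)
n, d = 10, 16
q = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
k = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
v = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))

# Real part of the Hermitian inner product: relative phase modulates the
# score (constructive vs. destructive interference), unlike |q.k| alone.
scores = np.real(q @ k.conj().T) / np.sqrt(d)
attn = softmax(scores)       # real, non-negative attention weights
out = attn @ v               # coherent superposition of complex values

print(out.dtype, out.shape)  # complex128 (10, 16): amplitude and phase preserved
```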
GANs | Adversarial | Attacks | Generation (9 papers)
【1】VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation
Link: https://arxiv.org/abs/2509.20322
Comments: Website: https://visualmimic.github.io
Abstract: Humanoid loco-manipulation in unstructured environments demands tight integration of egocentric perception and whole-body control. However, existing approaches either depend on external motion capture systems or fail to generalize across diverse tasks. We introduce VisualMimic, a visual sim-to-real framework that unifies egocentric vision with hierarchical whole-body control for humanoid robots. VisualMimic combines a task-agnostic low-level keypoint tracker -- trained from human motion data via a teacher-student scheme -- with a task-specific high-level policy that generates keypoint commands from visual and proprioceptive input. To ensure stable training, we inject noise into the low-level policy and clip high-level actions using human motion statistics. VisualMimic enables zero-shot transfer of visuomotor policies trained in simulation to real humanoid robots, accomplishing a wide range of loco-manipulation tasks such as box lifting, pushing, football dribbling, and kicking. Beyond controlled laboratory settings, our policies also generalize robustly to outdoor environments. Videos are available at: https://visualmimic.github.io .
【2】Benchmarking Web API Integration Code Generation
Link: https://arxiv.org/abs/2509.20172
Comments: To be published in Proceedings of 2nd ACM International Conference on AI-powered Software, Benchmark & Dataset Track (AIware '25)
Abstract: API integration is a cornerstone of our digital infrastructure, enabling software systems to connect and interact. However, as shown by many studies, writing or generating correct code to invoke APIs, particularly web APIs, is challenging. Although large language models (LLMs) have become popular in software development, their effectiveness in automating the generation of web API integration code remains unexplored. In order to address this, we present a dataset and evaluation pipeline designed to assess the ability of LLMs to generate web API invocation code. Our experiments with several open-source LLMs reveal that generating API invocations poses a significant challenge, resulting in hallucinated endpoints, incorrect argument usage, and other errors. None of the evaluated open-source models were able to solve more than 40% of the tasks.
【3】Beyond Slater's Condition in Online CMDPs with Stochastic and Adversarial Constraints
Link: https://arxiv.org/abs/2509.20114
Abstract: We study online episodic Constrained Markov Decision Processes (CMDPs) under both stochastic and adversarial constraints. We provide a novel algorithm whose guarantees greatly improve those of the state-of-the-art best-of-both-worlds algorithm introduced by Stradi et al. (2025). In the stochastic regime, i.e., when the constraints are sampled from fixed but unknown distributions, our method achieves $\widetilde{\mathcal{O}}(\sqrt{T})$ regret and constraint violation without relying on Slater's condition, thereby handling settings where no strictly feasible solution exists. Moreover, we provide guarantees on the stronger notion of positive constraint violation, which does not allow to recover from large violation in the early episodes by playing strictly safe policies. In the adversarial regime, i.e., when the constraints may change arbitrarily between episodes, our algorithm ensures sublinear constraint violation without Slater's condition, and achieves sublinear $\alpha$-regret with respect to the unconstrained optimum, where $\alpha$ is a suitably defined multiplicative approximation factor. We further validate our results through synthetic experiments, showing the practical effectiveness of our algorithm.
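For reference, the two quantities these guarantees are stated over can be written as follows; the notation is assumed, since the abstract does not fix it:

```latex
% Episodic regret against the best fixed policy, and positive constraint
% violation (no cancellation across episodes), with [x]^+ = \max(x, 0):
\mathrm{Regret}(T) = \max_{\pi \in \Pi} \sum_{t=1}^{T} r_t(\pi) - \sum_{t=1}^{T} r_t(\pi_t),
\qquad
\mathrm{Viol}^{+}(T) = \sum_{t=1}^{T} \big[ c_t(\pi_t) \big]^{+}.
```

The positive part inside the violation sum is what makes this notion stronger: an algorithm cannot offset large early violations by later playing strictly safe policies.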
【4】Latent Iterative Refinement Flow: A Geometric-Constrained Approach for Few-Shot Generation
Link: https://arxiv.org/abs/2509.19903
Abstract: Few-shot generation, the synthesis of high-quality and diverse samples from limited training data, remains a significant challenge in generative modeling. Existing methods trained from scratch often fail to overcome overfitting and mode collapse, and fine-tuning large models can inherit biases while neglecting the crucial geometric structure of the latent space. To address these limitations, we introduce Latent Iterative Refinement Flow (LIRF), a novel approach that reframes few-shot generation as the progressive densification of a geometrically structured manifold. LIRF establishes a stable latent space using an autoencoder trained with our novel manifold-preservation loss $L_{\text{manifold}}$. This loss ensures that the latent space maintains the geometric and semantic correspondence of the input data. Building on this, we propose an iterative generate-correct-augment cycle. Within this cycle, candidate samples are refined by a geometric correction operator, a provably contractive mapping that pulls samples toward the data manifold while preserving diversity. We also provide the Convergence Theorem demonstrating a predictable decrease in Hausdorff distance between generated and true data manifold. We also demonstrate the framework's scalability by generating coherent, high-resolution images on AFHQ-Cat. Ablation studies confirm that both the manifold-preserving latent space and the contractive correction mechanism are critical components of this success. Ultimately, LIRF provides a solution for data-scarce generative modeling that is not only theoretically grounded but also highly effective in practice.
【5】PPGFlowECG: Latent Rectified Flow with Cross-Modal Encoding for PPG-Guided ECG Generation and Cardiovascular Disease Detection
Link: https://arxiv.org/abs/2509.19774
Abstract: In clinical practice, electrocardiography (ECG) remains the gold standard for cardiac monitoring, providing crucial insights for diagnosing a wide range of cardiovascular diseases (CVDs). However, its reliance on specialized equipment and trained personnel limits feasibility for continuous routine monitoring. Photoplethysmography (PPG) offers accessible, continuous monitoring but lacks definitive electrophysiological information, preventing conclusive diagnosis. Generative models present a promising approach to translate PPG into clinically valuable ECG signals, yet current methods face substantial challenges, including the misalignment of physiological semantics in generative models and the complexity of modeling in high-dimensional signals. To this end, we propose PPGFlowECG, a two-stage framework that aligns PPG and ECG in a shared latent space via the CardioAlign Encoder and employs latent rectified flow to generate ECGs with high fidelity and interpretability. To the best of our knowledge, this is the first study to experiment on MCMED, a newly released clinical-grade dataset comprising over 10 million paired PPG-ECG samples from more than 118,000 emergency department visits with expert-labeled cardiovascular disease annotations. Results demonstrate the effectiveness of our method for PPG-to-ECG translation and cardiovascular disease detection. Moreover, cardiologist-led evaluations confirm that the synthesized ECGs achieve high fidelity and improve diagnostic reliability, underscoring our method's potential for real-world cardiovascular screening.
【6】TIMED: Adversarial and Autoregressive Refinement of Diffusion-Based Time Series Generation
Link: https://arxiv.org/abs/2509.19638
Comments: Accepted to the IEEE International Conference on Data Mining (ICDM) 2025
Abstract: Generating high-quality synthetic time series is a fundamental yet challenging task across domains such as forecasting and anomaly detection, where real data can be scarce, noisy, or costly to collect. Unlike static data generation, synthesizing time series requires modeling both the marginal distribution of observations and the conditional temporal dependencies that govern sequential dynamics. We propose TIMED, a unified generative framework that integrates a denoising diffusion probabilistic model (DDPM) to capture global structure via a forward-reverse diffusion process, a supervisor network trained with teacher forcing to learn autoregressive dependencies through next-step prediction, and a Wasserstein critic that provides adversarial feedback to ensure temporal smoothness and fidelity. To further align the real and synthetic distributions in feature space, TIMED incorporates a Maximum Mean Discrepancy (MMD) loss, promoting both diversity and sample quality. All components are built using masked attention architectures optimized for sequence modeling and are trained jointly to effectively capture both unconditional and conditional aspects of time series data. Experimental results across diverse multivariate time series benchmarks demonstrate that TIMED generates more realistic and temporally coherent sequences than state-of-the-art generative models.
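The MMD term is standard; below is a biased RBF-kernel estimate between real and synthetic batches (the bandwidth choice and how the term is weighted inside TIMED are assumptions here):

```python
import numpy as np

def mmd2_rbf(x, y, bandwidth=1.0):
    """Biased MMD^2 estimate with an RBF kernel over flattened series."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(7)
real = rng.normal(size=(64, 24))               # 64 series of length 24
fake_good = rng.normal(size=(64, 24))          # matches the real distribution
fake_bad = rng.normal(loc=2.0, size=(64, 24))  # shifted distribution

print(f"MMD^2 good: {mmd2_rbf(real, fake_good):.4f}")  # near zero
print(f"MMD^2 bad:  {mmd2_rbf(real, fake_bad):.4f}")   # clearly larger
```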
【7】Frame-based Equivariant Diffusion Models for 3D Molecular Generation
标题:基于框架的3D分子生成等变扩散模型
链接:https://arxiv.org/abs/2509.19506
摘要:最近的分子生成方法面临一个权衡:要么以昂贵的架构强制执行严格的等变性,要么放松等变性以换取可扩展性和灵活性。我们提出了一个基于框架(frame)的扩散范式,在将对称性处理与主干网络解耦的同时,实现确定性的E(3)-等变性。基于这种范式,我们研究了三种变体:全局框架扩散(GFD),它分配一个共享的分子框架;局部框架扩散(LFD),它构建节点特定的框架,并受益于额外的对齐约束;以及不变框架扩散(IFD),它依赖于预规范化的不变表示。为了增强表达能力,我们进一步利用EdgeDiT,一个具有边缘感知注意力的扩散Transformer。在QM9数据集上,带有EdgeDiT的GFD实现了最先进的性能,标准尺度下的测试NLL为-137.97,双尺度下为-141.85,原子稳定性为98.98%,分子稳定性为90.51%。这些结果超过了所有等变基线,同时保持了高有效性和唯一性,并且与EDM相比,采样速度快了近2倍。总之,我们的研究确立了基于框架的扩散作为一种可扩展、灵活且物理接地的分子生成范式,突出了全局结构保护的关键作用。
摘要:Recent methods for molecular generation face a trade-off: they either enforce strict equivariance with costly architectures or relax it to gain scalability and flexibility. We propose a frame-based diffusion paradigm that achieves deterministic E(3)-equivariance while decoupling symmetry handling from the backbone. Building on this paradigm, we investigate three variants: Global Frame Diffusion (GFD), which assigns a shared molecular frame; Local Frame Diffusion (LFD), which constructs node-specific frames and benefits from additional alignment constraints; and Invariant Frame Diffusion (IFD), which relies on pre-canonicalized invariant representations. To enhance expressivity, we further utilize EdgeDiT, a Diffusion Transformer with edge-aware attention. On the QM9 dataset, GFD with EdgeDiT achieves state-of-the-art performance, with a test NLL of -137.97 at standard scale and -141.85 at double scale, alongside atom stability of 98.98%, and molecular stability of 90.51%. These results surpass all equivariant baselines while maintaining high validity and uniqueness and nearly 2x faster sampling compared to EDM. Altogether, our study establishes frame-based diffusion as a scalable, flexible, and physically grounded paradigm for molecular generation, highlighting the critical role of global structure preservation.
【8】ROPA: Synthetic Robot Pose Generation for RGB-D Bimanual Data Augmentation
标题:ROPA:用于RGB-D双手数据增强的合成机器人姿态生成
链接:https://arxiv.org/abs/2509.19454
摘要:通过模仿学习训练鲁棒的双手操作策略,需要演示数据广泛覆盖机器人姿态、接触和场景上下文。然而,收集多样且精确的真实世界演示既昂贵又耗时,这阻碍了可扩展性。先前的工作通过数据增强来解决这个问题,但通常针对具有RGB输入的手眼(手腕相机)设置,或只生成没有配对动作的新图像,使得带有新动作标签的眼到手(第三人称)RGB-D训练的增强较少被探索。在本文中,我们提出了用于RGB-D双手数据增强的合成机器人姿态生成方法(ROPA),这是一种离线模仿学习数据增强方法,通过微调Stable Diffusion来合成新机器人姿态的第三人称RGB和RGB-D观测。我们的方法同时生成相应的关节空间动作标签,并采用约束优化,通过适当的夹爪-物体接触约束在双手场景中强制物理一致性。我们在5个模拟任务和3个真实世界任务上评估了我们的方法。在2625次模拟试验和300次真实世界试验中的结果表明,ROPA优于基线和消融实验,显示了其在眼到手双手操作中进行可扩展RGB和RGB-D数据增强的潜力。我们的项目网站是:https://ropaaug.github.io/。
摘要:Training robust bimanual manipulation policies via imitation learning requires demonstration data with broad coverage over robot poses, contacts, and scene contexts. However, collecting diverse and precise real-world demonstrations is costly and time-consuming, which hinders scalability. Prior works have addressed this with data augmentation, typically for either eye-in-hand (wrist camera) setups with RGB inputs or for generating novel images without paired actions, leaving augmentation for eye-to-hand (third-person) RGB-D training with new action labels less explored. In this paper, we propose Synthetic Robot Pose Generation for RGB-D Bimanual Data Augmentation (ROPA), an offline imitation learning data augmentation method that fine-tunes Stable Diffusion to synthesize third-person RGB and RGB-D observations of novel robot poses. Our approach simultaneously generates corresponding joint-space action labels while employing constrained optimization to enforce physical consistency through appropriate gripper-to-object contact constraints in bimanual scenarios. We evaluate our method on 5 simulated and 3 real-world tasks. Our results across 2625 simulation trials and 300 real-world trials demonstrate that ROPA outperforms baselines and ablations, showing its potential for scalable RGB and RGB-D data augmentation in eye-to-hand bimanual manipulation. Our project website is available at: https://ropaaug.github.io/.
【9】A Spatio-Temporal Feature Fusion EEG Virtual Channel Signal Generation Network and Its Application in Anxiety Assessment
标题:时空特征融合脑电虚拟通道信号发生网络及其在焦虑评估中的应用
链接:https://arxiv.org/abs/2509.19334
摘要:针对便携式脑电设备通道有限、信息采集不足的问题,提出了一种基于时空特征融合的脑电虚拟通道信号生成网络。该网络以额叶4个通道的脑电信号为基础,生成其他13个重要脑区的虚拟通道脑电信号。该网络的架构是一个二维卷积神经网络,包括一个用于时域和空域特征提取的并行模块,以及随后的特征融合模块。选择包含119名受试者多通道脑电信号的公共PRED+CT数据库来验证所构建的网络。结果表明,生成的虚拟通道脑电信号与原始真实脑电信号的平均相关系数为0.6724,平均绝对误差为3.9470。将13个虚拟通道的脑电信号与4个脑区的原始脑电信号相结合,利用支持向量机进行焦虑分类。实验结果表明,该网络生成的虚拟脑电信号不仅与真实通道脑电信号具有高度的一致性,而且显著提高了机器学习算法的焦虑分类性能。该研究有效地缓解了少通道便携式脑电设备信息采集不足的问题。
摘要:To address the issue of limited channels and insufficient information collection in portable EEG devices, this study explores an EEG virtual channel signal generation network using a novel spatio-temporal feature fusion strategy. Based on the EEG signals from four frontal lobe channels, the network aims to generate virtual channel EEG signals for other 13 important brain regions. The architecture of the network is a two-dimensional convolutional neural network and it includes a parallel module for temporal and spatial domain feature extraction, followed by a feature fusion module. The public PRED+CT database, which includes multi-channel EEG signals from 119 subjects, was selected to verify the constructed network. The results showed that the average correlation coefficient between the generated virtual channel EEG signals and the original real signals was 0.6724, with an average absolute error of 3.9470. Furthermore, the 13 virtual channel EEG signals were combined with the original EEG signals of four brain regions and then used for anxiety classification with a support vector machine. The results indicate that the virtual EEG signals generated by the constructed network not only have a high degree of consistency with the real channel EEG signals but also significantly enhance the performance of machine learning algorithms for anxiety classification. This study effectively alleviates the problem of insufficient information acquisition by portable EEG devices with few channels.
半/弱/无/有监督|不确定性|主动学习(12篇)
【1】Multilingual Hope Speech Detection: A Comparative Study of Logistic Regression, mBERT, and XLM-RoBERTa with Active Learning
标题:多语言希望言语检测:逻辑回归、mBERT和XLM-RoBERTa结合主动学习的比较研究
链接:https://arxiv.org/abs/2509.20315
摘要:希望言语(hope speech),即传递鼓励和乐观情绪的语言,在促进网上积极话语方面起着至关重要的作用。然而,它的检测仍然具有挑战性,特别是在多语言和低资源环境中。本文提出了一个使用主动学习方法和基于Transformer的模型(包括mBERT和XLM-RoBERTa)进行希望言语检测的多语言框架。实验在英语、西班牙语、德语和乌尔都语的数据集上进行,包括来自最近共享任务的基准测试集。我们的研究结果表明,Transformer模型显著优于传统基线,其中XLM-RoBERTa实现了最高的整体精度。此外,我们的主动学习策略即使在小的标注数据集上也保持了良好的性能。这项研究强调了将多语言Transformer与数据高效训练策略相结合用于希望言语检测的有效性。
摘要:Hope speech language that fosters encouragement and optimism plays a vital role in promoting positive discourse online. However, its detection remains challenging, especially in multilingual and low-resource settings. This paper presents a multilingual framework for hope speech detection using an active learning approach and transformer-based models, including mBERT and XLM-RoBERTa. Experiments were conducted on datasets in English, Spanish, German, and Urdu, including benchmark test sets from recent shared tasks. Our results show that transformer models significantly outperform traditional baselines, with XLM-RoBERTa achieving the highest overall accuracy. Furthermore, our active learning strategy maintained strong performance even with small annotated datasets. This study highlights the effectiveness of combining multilingual transformers with data-efficient training strategies for hope speech detection.
【2】Table Detection with Active Learning
标题:通过主动学习进行表格检测
链接:https://arxiv.org/abs/2509.20003
备注:Accepted in ICDAR 2025
摘要:高效的数据标注仍然是机器学习中的一个关键挑战,特别是对于需要大量标记数据的目标检测任务。主动学习(AL)通过选择最具信息量的样本来最大限度地减少标注成本,已成为一种很有前途的解决方案。虽然传统的AL方法主要依赖于基于不确定性的选择,但最近的进展表明,结合基于多样性的策略可以提高目标检测任务的采样效率。我们的方法确保选择具有代表性的样本,从而提高模型的泛化能力。我们使用最先进的表格检测架构CascadeTabNet和YOLOv9,在两个基准数据集(TableBank-LaTeX、TableBank-Word)上评估了我们的方法。我们的研究结果表明,基于AL的样本选择显著优于随机采样,在有限的预算下减少标注工作量,同时保持与完全监督模型相当的性能。我们的方法在相同的标注预算内获得了更高的mAP分数。
摘要:Efficient data annotation remains a critical challenge in machine learning, particularly for object detection tasks requiring extensive labeled data. Active learning (AL) has emerged as a promising solution to minimize annotation costs by selecting the most informative samples. While traditional AL approaches primarily rely on uncertainty-based selection, recent advances suggest that incorporating diversity-based strategies can enhance sampling efficiency in object detection tasks. Our approach ensures the selection of representative examples that improve model generalization. We evaluate our method on two benchmark datasets (TableBank-LaTeX, TableBank-Word) using state-of-the-art table detection architectures, CascadeTabNet and YOLOv9. Our results demonstrate that AL-based example selection significantly outperforms random sampling, reducing annotation effort given a limited budget while maintaining comparable performance to fully supervised models. Our method achieves higher mAP scores within the same annotation budget.
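作为摘要中"不确定性与多样性相结合"这一选择思路的假设性草图,下面用NumPy实现一个简化的混合打分:以k-center贪心距离衡量多样性,并与不确定性分数加权混合。函数名、alpha权重等均为示意,并非论文的原始算法。

```python
import numpy as np

def select_samples(features, uncertainties, budget, alpha=0.5):
    """混合不确定性与多样性的贪心样本选择(示意)。
    features: (N, d) 未标注样本特征; uncertainties: (N,) 不确定性分数。"""
    selected = [int(np.argmax(uncertainties))]  # 从最不确定的样本开始
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(budget - 1):
        # 距已选集合越远(多样性)且越不确定的样本得分越高
        div = min_dist / (min_dist.max() + 1e-8)
        score = alpha * div + (1 - alpha) * uncertainties
        score[selected] = -np.inf
        idx = int(np.argmax(score))
        selected.append(idx)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(features - features[idx], axis=1))
    return selected
```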
【3】Towards Self-Supervised Foundation Models for Critical Care Time Series
标题:面向重症监护时间序列的自监督基础模型
链接:https://arxiv.org/abs/2509.19885
备注:Accepted to NeurIPS 2025 workshop Learning from Time Series for Health (TS4H)
摘要:近年来,医疗保健领域的特定基础模型迅速扩展,但由于数据集的规模和可用性有限,重症监护时间序列的基础模型仍然相对欠缺。在这项工作中,我们介绍了一个基于双轴Transformer(BAT)的早期预训练基础模型,该模型在汇集的电子健康记录数据集上针对重症监护时间序列进行训练。我们通过在与训练来源不同的数据集上微调模型进行死亡率预测,证明了有效的迁移学习:该模型优于监督基线,特别是在小数据集($<5,000$)上。这些贡献突出了重症监护时间序列自监督基础模型在资源有限环境中支持可推广且稳健临床应用的潜力。
摘要:Domain-specific foundation models for healthcare have expanded rapidly in recent years, yet foundation models for critical care time series remain relatively underexplored due to the limited size and availability of datasets. In this work, we introduce an early-stage pre-trained foundation model for critical care time-series based on the Bi-Axial Transformer (BAT), trained on pooled electronic health record datasets. We demonstrate effective transfer learning by fine-tuning the model on a dataset distinct from the training sources for mortality prediction, where it outperforms supervised baselines, particularly for small datasets ($<5,000$). These contributions highlight the potential of self-supervised foundation models for critical care times series to support generalizable and robust clinical applications in resource-limited settings.
【4】Symbol-Temporal Consistency Self-supervised Learning for Robust Time Series Classification
标题:用于鲁棒时间序列分类的符号时间一致性自监督学习
链接:https://arxiv.org/abs/2509.19654
备注:4 pages, 2 figures, IEEE-EMBS BSN 2025
摘要:时间序列在数字健康领域的重要性激增,需要先进的方法来提取有意义的模式和表示。自监督对比学习已成为直接从原始数据中学习的一种有前途的方法。然而,数字健康中的时间序列数据被认为噪声很大,本质上涉及概念漂移,给训练可泛化的深度学习模型带来了挑战。在本文中,我们特别关注由不同人类行为引起的数据分布偏移,并提出了一个感知符号袋(bag-of-symbol)表示的自监督学习框架。符号袋表示以其对时间序列数据中存在的数据扭曲、位置偏移和噪声不敏感而闻名,这使得它在引导深度学习获得抵抗此类数据偏移的表示方面可能至关重要。我们证明,在存在显著数据偏移的情况下,所提出的方法可以实现显著更好的性能。
摘要:The surge in the significance of time series in digital health domains necessitates advanced methodologies for extracting meaningful patterns and representations. Self-supervised contrastive learning has emerged as a promising approach for learning directly from raw data. However, time series data in digital health is known to be highly noisy, inherently involves concept drifting, and poses a challenge for training a generalizable deep learning model. In this paper, we specifically focus on data distribution shift caused by different human behaviors and propose a self-supervised learning framework that is aware of the bag-of-symbol representation. The bag-of-symbol representation is known for its insensitivity to data warping, location shifts, and noise existed in time series data, making it potentially pivotal in guiding deep learning to acquire a representation resistant to such data shifting. We demonstrate that the proposed method can achieve significantly better performance where significant data shifting exists.
【5】Adaptive von Mises-Fisher Likelihood Loss for Supervised Deep Time Series Hashing
标题:用于监督深度时间序列哈希的自适应von Mises-Fisher似然损失
链接:https://arxiv.org/abs/2509.19625
备注:6 pages, 6 figures, Conference: ICMLA 2025
摘要:通过构建紧凑的二进制表示来索引时间序列是时间序列数据挖掘中的一项基本任务。最近,基于深度学习的哈希方法已被证明可以基于语义而不仅仅是原始相似性来有效地索引时间序列。深度哈希的目的是将具有相同语义的样本映射到相同的二进制哈希码,从而实现更高效的搜索和检索。与其他监督表示学习方法不同,监督深度哈希需要一个离散化步骤来将实值表示转换为二进制代码,但这可能会导致显著的信息丢失。在本文中,我们提出了一种von Mises-Fisher(vMF)哈希损失。所提出的深度哈希模型将数据映射到M维超球面空间以有效地减少信息丢失,并将每个数据类建模为服从不同vMF分布的点。所设计的损失旨在最大化各建模vMF分布之间的分离,从而更好地扩大语义不同的数据样本之间的间隔。实验结果表明,我们的方法优于现有的基线。该实现可在https://github.com/jmpq97/vmf-hashing上公开获取
摘要:Indexing time series by creating compact binary representations is a fundamental task in time series data mining. Recently, deep learning-based hashing methods have proven effective for indexing time series based on semantic meaning rather than just raw similarity. The purpose of deep hashing is to map samples with the same semantic meaning to identical binary hash codes, enabling more efficient search and retrieval. Unlike other supervised representation learning methods, supervised deep hashing requires a discretization step to convert real-valued representations into binary codes, but this can induce significant information loss. In this paper, we propose a von Mises-Fisher (vMF) hashing loss. The proposed deep hashing model maps data to an M-dimensional hyperspherical space to effectively reduce information loss and models each data class as points following distinct vMF distributions. The designed loss aims to maximize the separation between each modeled vMF distribution to provide a better way to maximize the margin between each semantically different data sample. Experimental results show that our method outperforms existing baselines. The implementation is publicly available at https://github.com/jmpq97/vmf-hashing
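摘要中的vMF哈希损失可以粗略地理解为带集中度参数的超球面原型分类:嵌入与每类均值方向都归一化到单位球面,kappa缩放的余弦相似度正比于vMF对数似然。下面是这一思想的简化PyTorch草图(共享kappa、等先验),并非论文的确切损失形式。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HypersphericalVMFLoss(nn.Module):
    # 将嵌入映射到单位超球面,以可学习原型作为各类vMF分布的均值方向;
    # logits = kappa * cos(z, mu) 正比于vMF对数似然(忽略归一化常数)。
    def __init__(self, num_classes, dim, kappa=10.0):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, dim))
        self.kappa = kappa

    def forward(self, embeddings, labels):
        z = F.normalize(embeddings, dim=-1)
        mu = F.normalize(self.prototypes, dim=-1)
        logits = self.kappa * z @ mu.t()
        # 交叉熵同时拉近样本与本类均值方向、推开其他类,从而增大类间间隔
        return F.cross_entropy(logits, labels)
```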
【6】Uncertainty in Semantic Language Modeling with PIXELS
标题:使用PIXELS进行语义语言建模的不确定性
链接:https://arxiv.org/abs/2509.19563
备注:9 pages, 6 figures, UncertaiNLP 2025 Workshop @ EMNLP Camera Ready
摘要:基于像素的语言模型旨在解决语言建模中的词汇瓶颈问题,但不确定性量化的挑战仍然存在。这项工作的新颖之处在于分析了基于像素的语言模型在18种语言和7种文字上的不确定性和置信度,全部涉及3项语义上具有挑战性的任务。这是通过Monte Carlo Dropout、Transformer注意力和集成学习等多种方法实现的。结果表明,基于像素的模型在重建补丁时低估了不确定性。不确定性还受到文字系统的影响,使用拉丁文字的语言表现出较低的不确定性。集成学习的研究结果显示,在16种语言的命名实体识别和问答任务中应用超参数调优时,性能更好。
摘要:Pixel-based language models aim to solve the vocabulary bottleneck problem in language modeling, but the challenge of uncertainty quantification remains open. The novelty of this work consists of analysing uncertainty and confidence in pixel-based language models across 18 languages and 7 scripts, all part of 3 semantically challenging tasks. This is achieved through several methods such as Monte Carlo Dropout, Transformer Attention, and Ensemble Learning. The results suggest that pixel-based models underestimate uncertainty when reconstructing patches. The uncertainty is also influenced by the script, with Latin languages displaying lower uncertainty. The findings on ensemble learning show better performance when applying hyperparameter tuning during the named entity recognition and question-answering tasks across 16 languages.
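摘要提到的Monte Carlo Dropout是一种常见的不确定性估计手段:推理时保持dropout激活,对同一输入做多次随机前向,用预测分布的熵作为不确定性。下面给出一个通用的PyTorch草图,n_samples等为示意取值,与论文的具体设置无关。

```python
import torch

def mc_dropout_predict(model, x, n_samples=30):
    """MC Dropout:多次随机前向取平均,预测熵作为不确定性(示意)。"""
    model.train()  # 保持dropout开启;注意BatchNorm等层也会受train()影响
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    mean_probs = probs.mean(dim=0)
    entropy = -(mean_probs * torch.log(mean_probs + 1e-12)).sum(dim=-1)
    return mean_probs, entropy
```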
【7】Self-evolved Imitation Learning in Simulated World
标题:模拟世界中的自进化模仿学习
链接:https://arxiv.org/abs/2509.19460
摘要:模仿学习近来已成为一种趋势,但要跨多个任务训练一个通才代理,仍然需要大规模的专家演示,其收集成本高且劳动密集。为了应对有限监督的挑战,我们提出了自进化模仿学习(SEIL),这一框架通过与模拟器的交互逐步改进few-shot模型。模型首先在模拟器中尝试任务,并从中收集成功的轨迹作为新的演示,用于迭代细化。为了增强这些演示的多样性,SEIL采用了双层增强:(i)模型级,使用指数移动平均(EMA)模型与主模型协作;(ii)环境级,在物体初始位置中引入轻微变化。我们还引入了一个轻量级选择器,从生成池中筛选互补且信息丰富的轨迹,以确保演示质量。这些精心挑选的样本使模型能够以更少的训练样本实现有竞争力的性能。在LIBERO基准上的大量实验表明,SEIL在few-shot模仿学习场景中实现了新的最先进性能。代码可在https://github.com/Jasper-aaa/SEIL.git上获得。
摘要:Imitation learning has been a trend recently, yet training a generalist agent across multiple tasks still requires large-scale expert demonstrations, which are costly and labor-intensive to collect. To address the challenge of limited supervision, we propose Self-Evolved Imitation Learning (SEIL), a framework that progressively improves a few-shot model through simulator interactions. The model first attempts tasks in the simulator, from which successful trajectories are collected as new demonstrations for iterative refinement. To enhance the diversity of these demonstrations, SEIL employs dual-level augmentation: (i) Model-level, using an Exponential Moving Average (EMA) model to collaborate with the primary model, and (ii) Environment-level, introducing slight variations in initial object positions. We further introduce a lightweight selector that filters complementary and informative trajectories from the generated pool to ensure demonstration quality. These curated samples enable the model to achieve competitive performance with far fewer training examples. Extensive experiments on the LIBERO benchmark show that SEIL achieves a new state-of-the-art performance in few-shot imitation learning scenarios. Code is available at https://github.com/Jasper-aaa/SEIL.git.
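摘要中模型级增强所用的指数移动平均(EMA)模型,其参数更新规则十分简洁;下面是一个通用的PyTorch草图,decay取值为常见默认,并非论文的具体超参数。

```python
import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    # ema_param <- decay * ema_param + (1 - decay) * param
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1 - decay)
```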
【8】Analyzing Uncertainty Quantification in Statistical and Deep Learning Models for Probabilistic Electricity Price Forecasting
标题:分析概率电价预测的统计和深度学习模型中的不确定性量化
链接:https://arxiv.org/abs/2509.19417
摘要:精确的概率预测是能源风险管理的基础,为此已有各种统计和机器学习模型。这些概率模型本身都包含某种形式的不确定性量化。然而,大多数模型并没有捕捉到不确定性的全部来源:不确定性不仅来自数据本身,也来自模型和分布的选择。在这项研究中,我们考察了德国市场电价预测中最先进的统计和深度学习概率预测模型的不确定性量化。特别地,我们考虑深度分布神经网络(DDNN),并用集成方法、蒙特卡罗(MC)dropout和共形预测对其进行增强,以考虑模型不确定性。此外,我们考虑LASSO估计自回归(LEAR)方法与分位数回归平均(QRA)、广义自回归条件异方差(GARCH)和共形预测相结合。在一系列性能指标上,我们发现基于LEAR的模型在概率预测方面表现良好,与不确定性量化方法无关。此外,我们发现DDNN从同时纳入数据和模型不确定性中获益,点预测和概率预测均得到改善。不确定性本身似乎被使用共形预测的模型捕获得最好。总体而言,我们的广泛研究表明,所有考虑中的模型都具有竞争力,但它们的相对性能取决于点预测和概率预测指标的选择。
摘要:Precise probabilistic forecasts are fundamental for energy risk management, and there is a wide range of both statistical and machine learning models for this purpose. Inherent to these probabilistic models is some form of uncertainty quantification. However, most models do not capture the full extent of uncertainty, which arises not only from the data itself but also from model and distributional choices. In this study, we examine uncertainty quantification in state-of-the-art statistical and deep learning probabilistic forecasting models for electricity price forecasting in the German market. In particular, we consider deep distributional neural networks (DDNNs) and augment them with an ensemble approach, Monte Carlo (MC) dropout, and conformal prediction to account for model uncertainty. Additionally, we consider the LASSO-estimated autoregressive (LEAR) approach combined with quantile regression averaging (QRA), generalized autoregressive conditional heteroskedasticity (GARCH), and conformal prediction. Across a range of performance metrics, we find that the LEAR-based models perform well in terms of probabilistic forecasting, irrespective of the uncertainty quantification method. Furthermore, we find that DDNNs benefit from incorporating both data and model uncertainty, improving both point and probabilistic forecasting. Uncertainty itself appears to be best captured by the models using conformal prediction. Overall, our extensive study shows that all models under consideration perform competitively. However, their relative performance depends on the choice of metrics for point and probabilistic forecasting.
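摘要中反复出现的共形预测,其最常见的分裂式(split conformal)做法可以用几行代码说明:在校准集上取绝对残差的分位数,为任意点预测构造具有有限样本覆盖保证的区间。以下NumPy草图仅为通用示意,并非论文所用的具体变体。

```python
import numpy as np

def split_conformal_interval(cal_y, cal_pred, test_pred, alpha=0.1):
    """分裂共形预测:基于校准残差分位数构造(1 - alpha)预测区间。"""
    residuals = np.abs(cal_y - cal_pred)
    n = len(residuals)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # 有限样本修正
    q = np.quantile(residuals, q_level)
    return test_pred - q, test_pred + q
```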
【9】Unsupervised Outlier Detection in Audit Analytics: A Case Study Using USA Spending Data
标题:审计分析中的无监督异常值检测:使用美国支出数据的案例研究
链接:https://arxiv.org/abs/2509.19366
摘要:本研究调查了审计分析中无监督离群值检测方法的有效性,利用美国卫生与公众服务部(DHHS)的美国支出数据作为案例。我们采用和比较多个离群值检测算法,包括基于直方图的离群值得分(HBOS),鲁棒主成分分析(PCA),最小协方差行列式(MCD)和K-最近邻(KNN),以确定联邦支出模式的异常。该研究解决了大规模政府数据集中日益增长的高效、准确的异常检测需求,而传统的审计方法可能无法满足这一需求。我们的方法包括数据准备,算法实现,以及使用精度,召回率和F1分数进行性能评估。实验结果表明,结合多种检测策略的混合方法,提高了复杂金融数据中离群点识别的鲁棒性和准确性。这项研究有助于审计分析领域的各种离群值检测模型的比较有效性的见解,并展示了无监督学习技术在提高审计质量和效率的潜力。这些发现对寻求在政府财务监督和风险管理中利用高级分析的审计师、政策制定者和研究人员具有影响。
摘要:This study investigates the effectiveness of unsupervised outlier detection methods in audit analytics, utilizing USA spending data from the U.S. Department of Health and Human Services (DHHS) as a case example. We employ and compare multiple outlier detection algorithms, including Histogram-based Outlier Score (HBOS), Robust Principal Component Analysis (PCA), Minimum Covariance Determinant (MCD), and K-Nearest Neighbors (KNN) to identify anomalies in federal spending patterns. The research addresses the growing need for efficient and accurate anomaly detection in large-scale governmental datasets, where traditional auditing methods may fall short. Our methodology involves data preparation, algorithm implementation, and performance evaluation using precision, recall, and F1 scores. Results indicate that a hybrid approach, combining multiple detection strategies, enhances the robustness and accuracy of outlier identification in complex financial data. This study contributes to the field of audit analytics by providing insights into the comparative effectiveness of various outlier detection models and demonstrating the potential of unsupervised learning techniques in improving audit quality and efficiency. The findings have implications for auditors, policymakers, and researchers seeking to leverage advanced analytics in governmental financial oversight and risk management.
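摘要中的MCD与KNN两类离群分数,以及简单的混合策略,可用scikit-learn快速示意如下;特征矩阵为随机示意数据,秩平均的混合方式也只是一种常见做法,并非论文的确切流程。

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.neighbors import NearestNeighbors

X = np.random.randn(1000, 8)  # 假想的支出记录特征矩阵

# 最小协方差行列式(MCD):decision_function越小越异常,取负作为离群分数
mcd = EllipticEnvelope(contamination=0.01, random_state=0).fit(X)
mcd_scores = -mcd.decision_function(X)

# KNN:以到第k个近邻(含自身)的距离作为离群分数
nn = NearestNeighbors(n_neighbors=5).fit(X)
dist, _ = nn.kneighbors(X)
knn_scores = dist[:, -1]

# 混合策略示意:两种分数做秩归一化后取平均
def rank01(s):
    return np.argsort(np.argsort(s)) / (len(s) - 1)

hybrid_scores = 0.5 * rank01(mcd_scores) + 0.5 * rank01(knn_scores)
```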
【10】Online Adaptation via Dual-Stage Alignment and Self-Supervision for Fast-Calibration Brain-Computer Interfaces
标题:通过双阶段对齐和自监督实现快速校准脑机接口的在线适应
链接:https://arxiv.org/abs/2509.19403
摘要:脑活动的个体差异阻碍了基于脑电(EEG)的脑机接口(BCI)系统的在线应用。为了克服这一局限,本研究提出了一种通过双阶段对齐和自监督、面向未见受试者的在线适应算法。对齐过程首先在EEG数据空间中应用欧几里得对齐,然后在表示空间中更新批量归一化统计量。此外,设计了一个自监督损失来更新解码器。该损失由解码器导出的软伪标签计算(作为未知真实标签的代理),并用香农熵进行校准,以促进自监督训练。在5个公共数据集和7个解码器上的实验表明,该算法可以无缝集成,而不受BCI范式和解码器架构的影响。在每次迭代中,解码器用单次在线试次进行更新,在稳态视觉诱发电位(SSVEP)上平均准确率提升4.9%,在运动想象上提升3.6%。这些结果支持快速校准操作,并表明该算法在BCI应用中具有很大潜力。
摘要:Individual differences in brain activity hinder the online application of electroencephalogram (EEG)-based brain computer interface (BCI) systems. To overcome this limitation, this study proposes an online adaptation algorithm for unseen subjects via dual-stage alignment and self-supervision. The alignment process begins by applying Euclidean alignment in the EEG data space and then updates batch normalization statistics in the representation space. Moreover, a self-supervised loss is designed to update the decoder. The loss is computed by soft pseudo-labels derived from the decoder as a proxy for the unknown ground truth, and is calibrated by Shannon entropy to facilitate self-supervised training. Experiments across five public datasets and seven decoders show the proposed algorithm can be integrated seamlessly regardless of BCI paradigm and decoder architecture. In each iteration, the decoder is updated with a single online trial, which yields average accuracy gains of 4.9% on steady-state visual evoked potentials (SSVEP) and 3.6% on motor imagery. These results support fast-calibration operation and show that the proposed algorithm has great potential for BCI applications.
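摘要中第一阶段的欧几里得对齐(Euclidean Alignment)有标准的闭式做法:用所有试次协方差的均值R的逆平方根白化每个试次。下面是一个NumPy草图,形状约定为示意,并非论文代码。

```python
import numpy as np

def euclidean_alignment(trials):
    """EA:trials形状为(n_trials, n_channels, n_samples)。"""
    covs = np.stack([t @ t.T / t.shape[1] for t in trials])
    R = covs.mean(axis=0)                     # 参考协方差
    w, V = np.linalg.eigh(R)                  # 特征分解求 R^{-1/2}
    R_inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.clip(w, 1e-10, None))) @ V.T
    return np.stack([R_inv_sqrt @ t for t in trials])
```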
【11】SpellerSSL: Self-Supervised Learning with P300 Aggregation for Speller BCIs
标题:SpellerSSL:Speller BCI的P300聚合自我监督学习
链接:https://arxiv.org/abs/2509.19401
摘要:基于脑电(EEG)的P300拼写脑机接口(BCI)面临三个主要挑战:低信噪比(SNR)、泛化能力差和校准耗时。我们提出了SpellerSSL,一个结合自监督学习(SSL)与P300聚合来解决这些问题的框架。首先,我们引入一种聚合策略来提高信噪比。其次,为了在训练中实现泛化,我们采用定制的1D U-Net主干,并在跨域和域内EEG数据上预训练模型。随后,预训练模型使用轻量级ERP-Head分类器进行微调以用于P300检测,该分类器将学习到的表示适配到受试者特定的数据。我们对校准时间的评估表明,将聚合策略与SSL相结合显著降低了每位受试者的校准负担,并提高了跨受试者的鲁棒性。实验结果表明,SSL在域内和跨域都能学习有效的EEG表示,其中域内SSL仅用7次重复即实现了94%的最先进字符识别率,并在公共II-B数据集上实现了21.86比特/分钟的最高信息传输速率(ITR)。此外,带P300聚合的域内SSL将所需的校准数据量减少了60%,同时保持了相当的字符识别率。据我们所知,这是第一个将SSL应用于P300拼写器的研究,突出了其在提高拼写器BCI效率和泛化方面的潜力,并为面向P300拼写器BCI的EEG基础模型铺平了道路。
摘要:Electroencephalogram (EEG)-based P300 speller brain-computer interfaces (BCIs) face three main challenges: low signal-to-noise ratio (SNR), poor generalization, and time-consuming calibration. We propose SpellerSSL, a framework that combines self-supervised learning (SSL) with P300 aggregation to address these issues. First, we introduce an aggregation strategy to enhance SNR. Second, to achieve generalization in training, we employ a customized 1D U-Net backbone and pretrain the model on both cross-domain and in-domain EEG data. The pretrained model is subsequently fine-tuned with a lightweight ERP-Head classifier for P300 detection, which adapts the learned representations to subject-specific data. Our evaluations on calibration time demonstrate that combining the aggregation strategy with SSL significantly reduces the calibration burden per subject and improves robustness across subjects. Experimental results show that SSL learns effective EEG representations in both in-domain and cross-domain, with in-domain achieving a state-of-the-art character recognition rate of 94% with only 7 repetitions and the highest information transfer rate (ITR) of 21.86 bits/min on the public II-B dataset. Moreover, in-domain SSL with P300 aggregation reduces the required calibration size by 60% while maintaining a comparable character recognition rate. To the best of our knowledge, this is the first study to apply SSL to P300 spellers, highlighting its potential to improve both efficiency and generalization in speller BCIs and paving the way toward an EEG foundation model for P300 speller BCIs.
【12】Self-Alignment Learning to Improve Myocardial Infarction Detection from Single-Lead ECG
标题:自对齐学习改善单导联心电图心肌梗死检测
链接:https://arxiv.org/abs/2509.19397
摘要:心肌梗死是冠状动脉疾病的重要表现,但由于空间信息有限,从单导联心电图(ECG)中检测心肌梗死仍然具有挑战性。一个直观的想法是将单导联ECG转换为多导联ECG,以便通过预训练模型进行分类,但在大多数情况下,在信号水平上优化的生成方法会留下很大的潜在空间间隙,最终降低诊断性能。这自然提出了一个问题,即潜在空间对齐是否有帮助。然而,大多数先前的ECG对齐方法集中于学习变换不变性,这与单导联检测的目标不匹配。为了解决这个问题,我们提出了SelfMIS,一个简单而有效的对齐学习框架,以提高单导联心电图心肌梗死检测。SelfMIS摒弃了手动数据扩充,采用自切割策略将多导联ECG与其对应的单导联段配对,并在潜在空间中直接对齐它们。该设计将学习目标从追求变换不变性转变为丰富单导联表示,明确地驱动单导联ECG编码器学习能够从局部信号推断全局心脏背景的表示。在实验上,SelfMIS在9种心肌梗死类型中实现了优于基线模型的性能,同时保持了更简单的架构和更低的计算开销,从而证实了直接潜在空间对齐的功效。我们的代码和检查点将在接受后公开。
摘要:Myocardial infarction is a critical manifestation of coronary artery disease, yet detecting it from single-lead electrocardiogram (ECG) remains challenging due to limited spatial information. An intuitive idea is to convert single-lead into multiple-lead ECG for classification by pre-trained models, but generative methods optimized at the signal level in most cases leave a large latent space gap, ultimately degrading diagnostic performance. This naturally raises the question of whether latent space alignment could help. However, most prior ECG alignment methods focus on learning transformation invariance, which mismatches the goal of single-lead detection. To address this issue, we propose SelfMIS, a simple yet effective alignment learning framework to improve myocardial infarction detection from single-lead ECG. Discarding manual data augmentations, SelfMIS employs a self-cutting strategy to pair multiple-lead ECG with their corresponding single-lead segments and directly align them in the latent space. This design shifts the learning objective from pursuing transformation invariance to enriching the single-lead representation, explicitly driving the single-lead ECG encoder to learn a representation capable of inferring global cardiac context from the local signal. Experimentally, SelfMIS achieves superior performance over baseline models across nine myocardial infarction types while maintaining a simpler architecture and lower computational overhead, thereby substantiating the efficacy of direct latent space alignment. Our code and checkpoint will be publicly available after acceptance.
迁移|Zero/Few/One-Shot|自适应(14篇)
【1】Video models are zero-shot learners and reasoners
标题:视频模型是Zero-Shot学习者和推理者
链接:https://arxiv.org/abs/2509.20328
备注:Project page: this https URL
摘要:大型语言模型(LLM)卓越的zero-shot能力推动了自然语言处理从特定于任务的模型走向统一的通用基础模型。这种转变源于简单的原语:在网络规模数据上训练的大型生成模型。有趣的是,同样的原语也适用于今天的生成视频模型。视频模型是否会像LLM发展出通用语言理解那样,走向通用视觉理解?我们证明,Veo 3可以解决大量它从未被明确训练过的任务:分割物体、检测边缘、编辑图像、理解物理属性、识别物体的可供性(affordances)、模拟工具使用,等等。这些感知、建模和操纵视觉世界的能力使早期形式的视觉推理成为可能,比如迷宫和对称性求解。Veo涌现的zero-shot能力表明,视频模型正走在成为统一的通用视觉基础模型的道路上。
摘要:The remarkable zero-shot capabilities of Large Language Models (LLMs) have propelled natural language processing from task-specific models to unified, generalist foundation models. This transformation emerged from simple primitives: large, generative models trained on web-scale data. Curiously, the same primitives apply to today's generative video models. Could video models be on a trajectory towards general-purpose vision understanding, much like LLMs developed general-purpose language understanding? We demonstrate that Veo 3 can solve a broad variety of tasks it wasn't explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and more. These abilities to perceive, model, and manipulate the visual world enable early forms of visual reasoning like maze and symmetry solving. Veo's emergent zero-shot capabilities indicate that video models are on a path to becoming unified, generalist vision foundation models.
【2】Predictive Coding-based Deep Neural Network Fine-tuning for Computationally Efficient Domain Adaptation
标题:基于预测编码的深度神经网络微调以实现计算高效的领域自适应
链接:https://arxiv.org/abs/2509.20269
备注:20 pages, 4 figures
摘要:随着深度神经网络越来越多地部署在动态的现实环境中,依赖于单个静态模型往往是不够的。由传感器漂移或照明变化引起的输入数据分布的变化需要持续的模型适应。在本文中,我们提出了一种混合训练方法,通过结合反向传播和预测编码的优势,实现了有效的设备上域自适应。该方法从使用反向传播离线训练的深度神经网络开始,以实现高初始性能。随后,预测编码用于在线自适应,允许模型恢复由于输入数据分布的变化而丢失的准确性。这种方法利用反向传播的鲁棒性进行初始表示学习,并利用预测编码的计算效率进行持续学习,使其特别适合资源受限的边缘设备或未来的神经形态加速器。在MNIST和CIFAR-10数据集上的实验结果表明,这种混合策略可以有效地自适应,减少计算开销,为在动态环境中保持模型性能提供了一种有前途的解决方案。
摘要:As deep neural networks are increasingly deployed in dynamic, real-world environments, relying on a single static model is often insufficient. Changes in input data distributions caused by sensor drift or lighting variations necessitate continual model adaptation. In this paper, we propose a hybrid training methodology that enables efficient on-device domain adaptation by combining the strengths of Backpropagation and Predictive Coding. The method begins with a deep neural network trained offline using Backpropagation to achieve high initial performance. Subsequently, Predictive Coding is employed for online adaptation, allowing the model to recover accuracy lost due to shifts in the input data distribution. This approach leverages the robustness of Backpropagation for initial representation learning and the computational efficiency of Predictive Coding for continual learning, making it particularly well-suited for resource-constrained edge devices or future neuromorphic accelerators. Experimental results on the MNIST and CIFAR-10 datasets demonstrate that this hybrid strategy enables effective adaptation with a reduced computational overhead, offering a promising solution for maintaining model performance in dynamic environments.
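为说明摘要中预测编码在线适应的基本机制(先松弛隐层活动以最小化预测误差,再做局部的Hebbian式权重更新),下面给出一个单隐层网络的NumPy草图;网络结构、学习率与迭代次数均为示意性假设,并非论文的具体实现。

```python
import numpy as np

def pc_adapt_step(W1, W2, x0, y, n_infer=20, lr_x=0.1, lr_w=0.005):
    """预测编码在线适应的单步示意(单隐层,批量输入)。
    x0: (B, d0) 输入(被钳制), y: (B, d2) 目标(被钳制)。"""
    x1 = x0 @ W1.T                      # 用前向预测初始化隐层状态
    for _ in range(n_infer):            # 推理阶段:松弛隐层活动以最小化能量
        e1 = x1 - x0 @ W1.T             # 第1层预测误差
        e2 = y - np.tanh(x1) @ W2.T     # 第2层预测误差(输出被钳制为y)
        x1 += lr_x * (-e1 + (1 - np.tanh(x1) ** 2) * (e2 @ W2))
    # 学习阶段:仅用局部误差做Hebbian式权重更新
    e1 = x1 - x0 @ W1.T
    e2 = y - np.tanh(x1) @ W2.T
    W1 += lr_w * e1.T @ x0
    W2 += lr_w * e2.T @ np.tanh(x1)
    return W1, W2
```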
【3】A HyperGraphMamba-Based Multichannel Adaptive Model for ncRNA Classification
标题:基于HyperGraphMamba的多通道自适应ncRNA分类模型
链接:https://arxiv.org/abs/2509.20240
备注:9 pages, 17 figures (including subfigures), 1 table. Xin An and Ruijie Li contributed equally to this work and should be considered co-first authors
摘要:非编码RNA(non-coding RNA,ncRNA)在基因表达调控和多种疾病的发病机制中起着关键作用。ncRNA的准确分类对于功能注释和疾病诊断至关重要。为了解决特征提取深度和多模态融合方面的现有局限,我们提出了HGMamba-ncRNA,一种基于HyperGraphMamba的多通道自适应模型,它集成了ncRNA的序列、二级结构和可选的表达特征,以提高分类性能。具体来说,ncRNA的序列使用并行多尺度卷积与LSTM架构(MKC-L)来建模,以捕获核苷酸的局部模式和长程依赖性。结构模态采用多尺度图Transformer(MSGraphTransformer)来表示ncRNA二级结构的多层次拓扑特征。表达模态利用基于Chebyshev多项式的Kolmogorov-Arnold网络(CPKAN)来有效地建模和解释高维表达谱。最后,通过引入虚拟节点以促进高效而全面的多模态交互,HyperGraphMamba被提出来自适应地对齐和集成多通道异构模态特征。在三个公共数据集上进行的实验表明,HGMamba-ncRNA在准确率和其他指标方面始终优于最先进的方法。大量的实证研究进一步证实了该模型的鲁棒性、有效性和较强的可迁移性,为复杂的ncRNA功能分类提供了一种新颖可靠的策略。代码和数据集可在https://anonymous.4open.science/r/HGMamba-ncRNA-94D0上获得。
摘要:Non-coding RNAs (ncRNAs) play pivotal roles in gene expression regulation and the pathogenesis of various diseases. Accurate classification of ncRNAs is essential for functional annotation and disease diagnosis. To address existing limitations in feature extraction depth and multimodal fusion, we propose HGMamba-ncRNA, a HyperGraphMamba-based multichannel adaptive model, which integrates sequence, secondary structure, and optionally available expression features of ncRNAs to enhance classification performance. Specifically, the sequence of ncRNA is modeled using a parallel Multi-scale Convolution and LSTM architecture (MKC-L) to capture both local patterns and long-range dependencies of nucleotides. The structure modality employs a multi-scale graph transformer (MSGraphTransformer) to represent the multi-level topological characteristics of ncRNA secondary structures. The expression modality utilizes a Chebyshev Polynomial-based Kolmogorov-Arnold Network (CPKAN) to effectively model and interpret high-dimensional expression profiles. Finally, by incorporating virtual nodes to facilitate efficient and comprehensive multimodal interaction, HyperGraphMamba is proposed to adaptively align and integrate multichannel heterogeneous modality features. Experiments conducted on three public datasets demonstrate that HGMamba-ncRNA consistently outperforms state-of-the-art methods in terms of accuracy and other metrics. Extensive empirical studies further confirm the model's robustness, effectiveness, and strong transferability, offering a novel and reliable strategy for complex ncRNA functional classification. Code and datasets are available at https://anonymous.4open.science/r/HGMamba-ncRNA-94D0.
【4】Time-adaptive HénonNets for separable Hamiltonian systems
标题:可分离哈密顿系统的时间自适应HénonNets
链接:https://arxiv.org/abs/2509.20212
摘要:测量数据通常是不规则采样的,即不在等距时间网格上,这对哈密顿系统同样成立。然而,现有的学习辛积分器的机器学习方法,如SympNets [1]和HénonNets [2],仍然需要以固定步长生成的训练数据。为了学习时间自适应辛积分器,[3]中引入了SympNets的扩展,称为TSympNets。这项工作的目的是为HénonNets做类似的扩展。我们提出了一种新的神经网络架构,称为T-HénonNets,它在设计上是辛的,并且可以处理自适应时间步长。我们还将T-HénonNet架构扩展到非自治哈密顿系统。此外,我们为这两种新架构在可分离哈密顿系统上给出了万能逼近定理,并讨论了为什么用所提出的方法难以处理不可分离的哈密顿系统。为了检验这些理论逼近能力,我们进行了多种数值实验。
摘要:Measurement data is often sampled irregularly, i.e., not on equidistant time grids. This is also true for Hamiltonian systems. However, existing machine learning methods, which learn symplectic integrators, such as SympNets [1] and HénonNets [2] still require training data generated by fixed step sizes. To learn time-adaptive symplectic integrators, an extension to SympNets called TSympNets is introduced in [3]. The aim of this work is to do a similar extension for HénonNets. We propose a novel neural network architecture called T-HénonNets, which is symplectic by design and can handle adaptive time steps. We also extend the T-HénonNet architecture to non-autonomous Hamiltonian systems. Additionally, we provide universal approximation theorems for both new architectures for separable Hamiltonian systems and discuss why it is difficult to handle non-separable Hamiltonian systems with the proposed methods. To investigate these theoretical approximation capabilities, we perform different numerical experiments.
【5】You Only Measure Once: On Designing Single-Shot Quantum Machine Learning Models
标题:你只测量一次:论单次测量量子机器学习模型的设计
链接:https://arxiv.org/abs/2509.20090
摘要:量子机器学习(QML)模型通常依赖于对可观测量的重复测量(shots)来获得可靠的预测。这种对大shot预算的依赖导致高推理成本和时间开销,尤其成问题,因为量子硬件的访问通常按shot数量成比例计费。在这项工作中,我们提出了You Only Measure Once(Yomo),一种简单而有效的设计,可以用少得多的测量次数(甚至低至单次测量)实现精确的推断。Yomo用概率聚合机制取代了泡利期望值输出,并引入了鼓励尖锐预测的损失函数。我们的理论分析表明,Yomo避免了基于期望值的模型固有的shot缩放限制;我们在MNIST和CIFAR-10上的实验证实,在不同的shot预算下以及在去极化信道的模拟中,Yomo始终优于基线。通过实现精确的单次推理,Yomo大大降低了部署QML的财务和计算成本,从而降低了QML实际应用的门槛。
摘要:Quantum machine learning (QML) models conventionally rely on repeated measurements (shots) of observables to obtain reliable predictions. This dependence on large shot budgets leads to high inference cost and time overhead, which is particularly problematic as quantum hardware access is typically priced proportionally to the number of shots. In this work we propose You Only Measure Once (Yomo), a simple yet effective design that achieves accurate inference with dramatically fewer measurements, down to the single-shot regime. Yomo replaces Pauli expectation-value outputs with a probability aggregation mechanism and introduces loss functions that encourage sharp predictions. Our theoretical analysis shows that Yomo avoids the shot-scaling limitations inherent to expectation-based models, and our experiments on MNIST and CIFAR-10 confirm that Yomo consistently outperforms baselines across different shot budgets and under simulations with depolarizing channels. By enabling accurate single-shot inference, Yomo substantially reduces the financial and computational costs of deploying QML, thereby lowering the barrier to practical adoption of QML.
【6】How deep is your network? Deep vs. shallow learning of transfer operators
标题:你的网络有多深?传递算子的深度学习与浅层学习
链接:https://arxiv.org/abs/2509.19930
摘要:我们提出了一种称为RaNNDy的随机神经网络方法,用于从数据中学习传递算子及其谱分解。神经网络隐层的权值是随机选择的,只训练输出层。其主要优点是,在准确性没有明显降低的情况下,这种方法显著减少了训练时间和资源,同时避免了与深度学习相关的常见问题,例如对超参数的敏感性和收敛速度慢。此外,所提出的框架允许我们计算输出层的闭式解,它直接表示算子的本征函数。此外,还可以通过集成学习来估计所计算谱特性的不确定性。我们给出了不同动力学算子的结果,包括在分析复杂动力系统行为中有重要应用的Koopman算子和Perron-Frobenius算子,以及薛定谔算子。数值例子包括若干随机动力系统、蛋白质折叠过程和量子谐振子,它们既突出了所提框架的优点,也揭示了其弱点。
摘要:We propose a randomized neural network approach called RaNNDy for learning transfer operators and their spectral decompositions from data. The weights of the hidden layers of the neural network are randomly selected and only the output layer is trained. The main advantage is that without a noticeable reduction in accuracy, this approach significantly reduces the training time and resources while avoiding common problems associated with deep learning such as sensitivity to hyperparameters and slow convergence. Additionally, the proposed framework allows us to compute a closed-form solution for the output layer which directly represents the eigenfunctions of the operator. Moreover, it is possible to estimate uncertainties associated with the computed spectral properties via ensemble learning. We present results for different dynamical operators, including Koopman and Perron-Frobenius operators, which have important applications in analyzing the behavior of complex dynamical systems, and the Schrödinger operator. The numerical examples, which highlight the strengths but also weaknesses of the proposed framework, include several stochastic dynamical systems, protein folding processes, and the quantum harmonic oscillator.
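RaNNDy的核心思想(随机固定隐层、仅以最小二乘求解输出层,再对学到的算子矩阵做谱分解)可以用几行NumPy示意,形式上类似带随机特征的EDMD;特征数、非线性等均为示意性假设,并非论文的完整方法(例如未包含其集成式不确定性估计)。

```python
import numpy as np

def randomized_operator_spectrum(X, Y, n_features=200, seed=0):
    """X, Y形状(N, d),Y为X沿动力学演化一步后的状态(示意)。"""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    phi = lambda Z: np.tanh(Z @ W + b)                   # 随机非线性特征
    K, *_ = np.linalg.lstsq(phi(X), phi(Y), rcond=None)  # phi(X) @ K ≈ phi(Y)
    eigvals, eigvecs = np.linalg.eig(K)                  # 算子矩阵的谱分解
    return eigvals, eigvecs
```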
【7】MMSE-Calibrated Few-Shot Prompting for Alzheimer's Detection
标题:用于阿尔茨海默病检测的MMSE校准Few-Shot提示
链接:https://arxiv.org/abs/2509.19926
摘要:提示(prompting)大型语言模型是一种无需训练、从语音转录中检测阿尔茨海默病的方法。使用ADReSS数据集,我们重新审视了zero-shot提示,并采用类平衡协议研究了few-shot提示,使用嵌套交织和严格的模式,每类最多扫描20个示例。我们评估了两个取得最先进提示结果的变体。(i)MMSE代理提示:每个few-shot示例都携带一个通过确定性映射锚定到简易精神状态检查(MMSE)分数段的概率,从而能够计算AUC;该方法达到0.82的准确度和0.86的AUC。(ii)推理增强提示:few-shot示例池由多模态LLM(GPT-5)生成,它以Cookie盗窃图片、转录文本和MMSE为输入,输出推理过程和与MMSE对齐的概率;评估仍然仅基于转录文本,达到0.82的准确度和0.83的AUC。据我们所知,这是第一个将诱导出的概率锚定到MMSE、并使用多模态构建来提高可解释性的ADReSS研究。
摘要:Prompting large language models is a training-free method for detecting Alzheimer's disease from speech transcripts. Using the ADReSS dataset, we revisit zero-shot prompting and study few-shot prompting with a class-balanced protocol using nested interleave and a strict schema, sweeping up to 20 examples per class. We evaluate two variants achieving state-of-the-art prompting results. (i) MMSE-Proxy Prompting: each few-shot example carries a probability anchored to Mini-Mental State Examination bands via a deterministic mapping, enabling AUC computing; this reaches 0.82 accuracy and 0.86 AUC (ii) Reasoning-augmented Prompting: few-shot examples pool is generated with a multimodal LLM (GPT-5) that takes as input the Cookie Theft image, transcript, and MMSE to output a reasoning and MMSE-aligned probability; evaluation remains transcript-only and reaches 0.82 accuracy and 0.83 AUC. To our knowledge, this is the first ADReSS study to anchor elicited probabilities to MMSE and to use multimodal construction to improve interpretability.
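摘要中"通过确定性映射将MMSE分数段锚定为概率"的做法,可以用一个简单的分段函数说明。注意:下面的分段边界与概率数值纯属假设,仅作示意,并非论文实际使用的映射。

```python
def mmse_band_to_probability(mmse: int) -> float:
    """假想的MMSE分数段到患病概率的确定性映射(数值仅作示意)。"""
    if mmse >= 27:    # 认知正常段
        return 0.05
    elif mmse >= 24:  # 边界段
        return 0.35
    elif mmse >= 19:  # 轻度受损段
        return 0.70
    else:             # 中重度受损段
        return 0.95
```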
【8】BoreaRL: A Multi-Objective Reinforcement Learning Environment for Climate-Adaptive Boreal Forest Management
标题:BoreaRL:用于气候适应性北方森林管理的多目标强化学习环境
链接:https://arxiv.org/abs/2509.19846
摘要:北方森林储存了30-40%的陆地碳,其中大部分储存在气候脆弱的永久冻土中,这使得它们的管理对减缓气候变化至关重要。然而,同时针对固碳和永久冻土保护优化森林管理,带来了当前工具无法充分解决的复杂权衡。我们介绍BoreaRL,这是第一个用于气候适应性北方森林管理的多目标强化学习环境,带有耦合能量、碳和水通量的物理接地模拟器。BoreaRL支持两种训练范式:用于受控研究的特定地点模式,以及用于在环境随机性下学习稳健策略的通才模式。通过对多目标RL算法的评估,我们揭示了学习难度的根本不对称性:碳目标比解冻(永久冻土保护)目标更容易优化,以解冻为重点的策略在两种范式中都显示出极小的学习进展。在通才设置中,标准的偏好条件方法完全失败,而一个朴素的课程学习方法通过有策略地选择训练回合实现了卓越的性能。对学习策略的分析揭示了不同的管理理念:以碳为重点的策略偏好激进的高密度针叶林,而有效的多目标策略则平衡树种组成和密度,在保持碳收益的同时保护冻土。我们的结果表明,对当前MORL方法而言,稳健的气候适应性森林管理仍然具有挑战性,这使BoreaRL成为开发更有效方法的有价值基准。我们开源了BoreaRL,以加速气候应用中多目标RL的研究。
摘要:Boreal forests store 30-40% of terrestrial carbon, much in climate-vulnerable permafrost soils, making their management critical for climate mitigation. However, optimizing forest management for both carbon sequestration and permafrost preservation presents complex trade-offs that current tools cannot adequately address. We introduce BoreaRL, the first multi-objective reinforcement learning environment for climate-adaptive boreal forest management, featuring a physically-grounded simulator of coupled energy, carbon, and water fluxes. BoreaRL supports two training paradigms: site-specific mode for controlled studies and generalist mode for learning robust policies under environmental stochasticity. Through evaluation of multi-objective RL algorithms, we reveal a fundamental asymmetry in learning difficulty: carbon objectives are significantly easier to optimize than thaw (permafrost preservation) objectives, with thaw-focused policies showing minimal learning progress across both paradigms. In generalist settings, standard preference-conditioned approaches fail entirely, while a naive curriculum learning approach achieves superior performance by strategically selecting training episodes. Analysis of learned strategies reveals distinct management philosophies, where carbon-focused policies favor aggressive high-density coniferous stands, while effective multi-objective policies balance species composition and density to protect permafrost while maintaining carbon gains. Our results demonstrate that robust climate-adaptive forest management remains challenging for current MORL methods, establishing BoreaRL as a valuable benchmark for developing more effective approaches. We open-source BoreaRL to accelerate research in multi-objective RL for climate applications.
【9】EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data
标题:EgoBridge:从以自我为中心的人类数据进行可推广模仿的领域自适应
链接:https://arxiv.org/abs/2509.19626
备注:Accepted at 39th Conference on Neural Information Processing Systems (NeurIPS 2025) and Oral at Conference on Robot Learning (CoRL 2025)
摘要:以自我为中心的人类经验数据为扩展机器人操作的端到端模仿学习提供了巨大的资源。然而,人类和机器人之间在视觉外观、传感器模态和运动学上的显著领域差距阻碍了知识迁移。本文介绍了EgoBridge,这是一个统一的协同训练框架,它使用领域自适应显式地对齐人类和机器人数据之间的策略潜在空间。通过基于最优传输(OT)度量联合策略潜在特征和动作上的差异,我们学习到的观测表示不仅在人类和机器人领域之间保持一致,而且保留了对策略学习至关重要的动作相关信息。在三个真实世界的单臂和双手操作任务中,EgoBridge相比人类数据增强的跨具身基线,绝对策略成功率显著提高了44%。EgoBridge还可以泛化到仅在人类数据中见过的新物体、场景和任务,而基线在这些情况下完全失败。视频和其他信息可以在https://ego-bridge.github.io上找到
摘要:Egocentric human experience data presents a vast resource for scaling up end-to-end imitation learning for robotic manipulation. However, significant domain gaps in visual appearance, sensor modalities, and kinematics between human and robot impede knowledge transfer. This paper presents EgoBridge, a unified co-training framework that explicitly aligns the policy latent spaces between human and robot data using domain adaptation. Through a measure of discrepancy on the joint policy latent features and actions based on Optimal Transport (OT), we learn observation representations that not only align between the human and robot domain but also preserve the action-relevant information critical for policy learning. EgoBridge achieves a significant absolute policy success rate improvement by 44% over human-augmented cross-embodiment baselines in three real-world single-arm and bimanual manipulation tasks. EgoBridge also generalizes to new objects, scenes, and tasks seen only in human data, where baselines fail entirely. Videos and additional information can be found at https://ego-bridge.github.io
【10】A Realistic Evaluation of Cross-Frequency Transfer Learning and Foundation Forecasting Models
标题:跨频率迁移学习与基础预测模型的现实评价
链接:https://arxiv.org/abs/2509.19465
备注:Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025). Recent Advances in Time Series Foundation Models: Have We Reached the 'BERT Moment'?
摘要:跨频率迁移学习(CFTL)已成为一种流行的框架,用于组织大规模时间序列数据集以预训练基础预测模型(FFM)。虽然CFTL已显示出前景,但目前的基准测试实践无法准确评估其性能。这一缺点源于多个因素:过度依赖小规模评估数据集;计算汇总统计时对样本量处理不当;报告次优的统计模型;以及未能考虑预训练和测试数据集之间重叠的不可忽视的风险。为了解决这些限制,我们引入了对广泛采用的神经预测网络的统一重新实现,使其适应CFTL设置;我们只在专有和合成数据上预训练,并小心防止测试泄漏;我们在15个大型、多样化的公共预测竞赛数据集上进行评估。我们的实证分析表明,统计模型的准确性经常被低估。值得注意的是,我们证实,在各数据集上,统计模型及其集成在sCRPS上始终比现有FFM好8.2%以上,在MASE上好20%以上。不过,我们也发现,合成数据集预训练确实能将FFM的准确性提高7%。
摘要:Cross-frequency transfer learning (CFTL) has emerged as a popular framework for curating large-scale time series datasets to pre-train foundation forecasting models (FFMs). Although CFTL has shown promise, current benchmarking practices fall short of accurately assessing its performance. This shortcoming stems from many factors: an over-reliance on small-scale evaluation datasets; inadequate treatment of sample size when computing summary statistics; reporting of suboptimal statistical models; and failing to account for non-negligible risks of overlap between pre-training and test datasets. To address these limitations, we introduce a unified reimplementation of widely-adopted neural forecasting networks, adapting them for the CFTL setup; we pre-train only on proprietary and synthetic data, being careful to prevent test leakage; and we evaluate on 15 large, diverse public forecast competition datasets. Our empirical analysis reveals that statistical models' accuracy is frequently underreported. Notably, we confirm that statistical models and their ensembles consistently outperform existing FFMs by more than 8.2% in sCRPS, and by more than 20% MASE, across datasets. However, we also find that synthetic dataset pre-training does improve the accuracy of a FFM by 7%.
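摘要中用于对比的MASE指标定义明确:以训练集上季节性朴素预测的MAE为尺度,对测试MAE做归一化。下面是一个标准的NumPy实现草图(m=1即一步朴素预测)。

```python
import numpy as np

def mase(y_true, y_pred, y_train, m=1):
    """平均绝对比例误差(MASE)。m为季节周期。"""
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_true - y_pred)) / scale
```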
【11】TimeMosaic: Temporal Heterogeneity Guided Time Series Forecasting via Adaptive Granularity Patch and Segment-wise Decoding
标题:TimeMosaic:基于自适应粒度补丁和分段解码的时间异质性引导的时间序列预测
链接:https://arxiv.org/abs/2509.19406
摘要:多变量时间序列预测在金融、交通、气候和能源等领域至关重要。然而,现有的基于块的方法通常采用固定长度的分割,忽略了局部时间动态的异质性和预测的解码异质性。这种设计在信息密集的区域丢失了细节,在稳定的部分引入了冗余,并且无法捕捉短期和长期视野的独特复杂性。我们提出TimeMosaic,一个预测框架,旨在解决时间异质性。TimeMosaic采用自适应补丁嵌入,根据局部信息密度动态调整粒度,在保持时间连续性的同时平衡模体重用和结构清晰度。此外,它还引入了分段解码,将每个预测范围视为相关的子任务,并适应特定于范围的难度和信息要求,而不是应用单个统一的解码器。对基准数据集的广泛评估表明,TimeMosaic与现有方法相比具有一致的改进,我们在具有3210亿次观察的大规模语料库上训练的模型实现了与最先进的TSFM竞争的性能。
摘要:Multivariate time series forecasting is essential in domains such as finance, transportation, climate, and energy. However, existing patch-based methods typically adopt fixed-length segmentation, overlooking the heterogeneity of local temporal dynamics and the decoding heterogeneity of forecasting. Such designs lose details in information-dense regions, introduce redundancy in stable segments, and fail to capture the distinct complexities of short-term and long-term horizons. We propose TimeMosaic, a forecasting framework that aims to address temporal heterogeneity. TimeMosaic employs adaptive patch embedding to dynamically adjust granularity according to local information density, balancing motif reuse with structural clarity while preserving temporal continuity. In addition, it introduces segment-wise decoding that treats each prediction horizon as a related subtask and adapts to horizon-specific difficulty and information requirements, rather than applying a single uniform decoder. Extensive evaluations on benchmark datasets demonstrate that TimeMosaic delivers consistent improvements over existing methods, and our model trained on the large-scale corpus with 321 billion observations achieves performance competitive with state-of-the-art TSFMs.
【12】TensLoRA: Tensor Alternatives for Low-Rank Adaptation
标题:TensLoRA:低秩适应的张量替代方案
链接:https://arxiv.org/abs/2509.19391
备注:Submitted at ICASSP 2026. 5 pages, 1 figure, 2 tables. Code can be found at this https URL
摘要:低秩适应(LoRA)被广泛用于通过向注意力投影添加可训练的低秩矩阵来高效地适配Transformer。虽然有效,但这些矩阵对每个注意力投影(查询、键和值)和每一层都是相互独立的。最近的扩展考虑了基于张量的联合适应,但形式有限,且缺乏系统的框架。我们引入了TensLoRA,一个统一的框架,它将LoRA更新聚合成高阶张量,并对一大类基于张量的低秩适应进行建模。我们的公式概括了现有的基于张量的方法,并支持模式特定的压缩率,允许根据模态和任务定制参数预算。视觉和语言基准测试的实验表明,张量构造直接影响性能,在相近的参数量下有时优于标准LoRA。
摘要:Low-Rank Adaptation (LoRA) is widely used to efficiently adapt Transformers by adding trainable low-rank matrices to attention projections. While effective, these matrices are considered independent for each attention projection (Query, Key, and Value) and each layer. Recent extensions have considered joint, tensor-based adaptations, but only in limited forms and without a systematic framework. We introduce TensLoRA, a unified framework that aggregates LoRA updates into higher-order tensors and models a broad family of tensor-based low-rank adaptations. Our formulation generalizes existing tensor-based methods and enables mode-specific compression rates, allowing parameter budgets to be tailored according to the modality and task. Experiments on vision and language benchmarks reveal that the tensor construction directly impacts performance, sometimes better than standard LoRA under similar parameter counts.
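作为"将Q/K/V的LoRA更新聚合为高阶张量"这一思想的一个可能实例,下面的PyTorch草图让三个投影共享因子U、V,仅用一个模式因子S区分投影(相当于沿投影模式的CP式分解)。这只是诸多张量化变体中的一种示意,并非论文的确切公式。

```python
import torch
import torch.nn as nn

class TensorizedLoRA(nn.Module):
    """示意:三个注意力投影共享低秩因子的张量化LoRA变体。"""
    def __init__(self, d, r=8, scale=1.0):
        super().__init__()
        self.U = nn.Parameter(torch.randn(d, r) * 0.01)
        self.V = nn.Parameter(torch.zeros(d, r))
        self.S = nn.Parameter(torch.ones(3, r))  # 模式0/1/2对应Q/K/V
        self.scale = scale

    def delta_weight(self, mode):
        # Delta W_mode = U @ diag(S[mode]) @ V^T,Q/K/V共享U与V
        return self.scale * (self.U * self.S[mode]) @ self.V.t()

    def forward(self, x, base_weights):
        # base_weights: 冻结的[W_q, W_k, W_v],每个形状(d, d)
        return [x @ (W + self.delta_weight(m)).t()
                for m, W in enumerate(base_weights)]
```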
【13】Analyzing the Impact of Credit Card Fraud on Economic Fluctuations of American Households Using an Adaptive Neuro-Fuzzy Inference System
标题:使用自适应神经模糊推理系统分析信用卡欺诈对美国家庭经济波动的影响
链接:https://arxiv.org/abs/2509.19363
摘要:信用卡欺诈正日益成为美国家庭财务状况的主要威胁,导致家庭经济行为发生不可预测的变化。为了解决这一问题,本文利用增强型ANFIS提出了一种新的混合分析方法。该模型对传统ANFIS框架提出了若干改进,采用了多分辨率小波分解模块和时间注意力机制。该模型对历史交易数据和宏观经济指标进行离散小波变换,生成局部化的经济冲击信号。转换后的特征随后被送入一个深度模糊规则库,它基于带有自适应高斯隶属函数的Takagi-Sugeno模糊规则。该模型提出了一个时间注意力编码器,自适应地为多尺度经济行为模式分配权重,提高了模糊推理阶段相关性评估的有效性,并增强了对长期时间依赖性和欺诈活动所造成异常的捕获。所提出的方法不同于具有固定输入输出关系的经典ANFIS,因为它通过模块化训练过程将模糊规则激活与小波基选择和时间相关权重相结合。实验结果表明,与局部神经模糊模型和传统LSTM模型相比,RMSE降低了17.8%。
摘要:Credit card fraud is assuming growing proportions as a major threat to the financial position of American households, leading to unpredictable changes in household economic behavior. To solve this problem, in this paper, a new hybrid analysis method is presented by using the Enhanced ANFIS. The model proposes several advances of the conventional ANFIS framework and employs a multi-resolution wavelet decomposition module and a temporal attention mechanism. The model performs discrete wavelet transformations on historical transaction data and macroeconomic indicators to generate localized economic shock signals. The transformed features are then fed into a deep fuzzy rule library which is based on Takagi-Sugeno fuzzy rules with adaptive Gaussian membership functions. The model proposes a temporal attention encoder that adaptively assigns weights to multi-scale economic behavior patterns, increasing the effectiveness of relevance assessment in the fuzzy inference stage and enhancing the capture of long-term temporal dependencies and anomalies caused by fraudulent activities. The proposed method differs from classical ANFIS which has fixed input-output relations since it integrates fuzzy rule activation with the wavelet basis selection and the temporal correlation weights via a modular training procedure. Experimental results show that the RMSE was reduced by 17.8% compared with local neuro-fuzzy models and conventional LSTM models.
【14】Advancing Few-Shot Pediatric Arrhythmia Classification with a Novel Contrastive Loss and Multimodal Learning
标题:利用新型对比损失和多模态学习推进Few-Shot儿科心律失常分类
链接:https://arxiv.org/abs/2509.19315
备注:12 pages, 10 figures
摘要:儿童心律失常是残疾和心源性猝死的主要风险因素,但由于类别不平衡、Few-Shot类别和复杂信号特征,其自动分类仍然具有挑战性,这严重限制了早期筛查和临床干预的效率和可靠性。为了解决这个问题,我们提出了一个多模态端到端深度学习框架,该框架结合了用于ECG和IEGM的双分支卷积编码器,用于跨模态特征对齐的语义注意力,以及用于全局依赖建模的轻量级Transformer编码器。此外,我们引入了一个新的对比损失函数,称为自适应全局类感知对比损失(AGCACL),以提高类内的紧凑性和类间的可分性,通过类原型和全局相似性矩阵。据我们所知,这是第一个基于莱比锡心脏中心儿科/先天性ECG+IEGM数据集的系统性研究,我们还为此提供了一个完整且可重现的预处理管道。实验结果表明,该方法在该数据集上取得了最佳的性能,包括97.76%的Top-1准确率、94.08%的宏精确率、91.97%的宏召回率、92.97%的宏F1和92.36%的宏F2,分别提高了+13.64,+15.96,+19.82,在宏观精确度/召回率/F1/F2中,分别比最强基线高出+19.44个百分点。这些发现表明,该框架显著提高了少数心律失常类别的可检测性和稳健性,为儿科和先天性心脏病人群的心律筛查、术前评估和术后随访提供了潜在的临床价值。
摘要:Pediatric arrhythmias are a major risk factor for disability and sudden cardiac death, yet their automated classification remains challenging due to class imbalance, few-shot categories, and complex signal characteristics, which severely limit the efficiency and reliability of early screening and clinical intervention. To address this problem, we propose a multimodal end-to-end deep learning framework that combines dual-branch convolutional encoders for ECG and IEGM, semantic attention for cross-modal feature alignment, and a lightweight Transformer encoder for global dependency modeling. In addition, we introduce a new contrastive loss function named Adaptive Global Class-Aware Contrastive Loss (AGCACL) to enhance intra-class compactness and inter-class separability through class prototypes and a global similarity matrix. To the best of our knowledge, this is the first systematic study based on the Leipzig Heart Center pediatric/congenital ECG+IEGM dataset, for which we also provide a complete and reproducible preprocessing pipeline. Experimental results demonstrate that the proposed method achieves the overall best performance on this dataset, including 97.76% Top-1 Accuracy, 94.08% Macro Precision, 91.97% Macro Recall, 92.97% Macro F1, and 92.36% Macro F2, with improvements of +13.64, +15.96, +19.82, and +19.44 percentage points over the strongest baseline in Macro Precision/Recall/F1/F2, respectively. These findings indicate that the framework significantly improves the detectability and robustness for minority arrhythmia classes, offering potential clinical value for rhythm screening, pre-procedural assessment, and postoperative follow-up in pediatric and congenital heart disease populations.
强化学习(4篇)
【1】UserRL: Training Interactive User-Centric Agent via Reinforcement Learning
标题:UserRL:通过强化学习训练交互式以用户为中心的代理
链接:https://arxiv.org/abs/2509.19736
备注:28 Pages, 15 Figures, 6 Tables; Built upon latest UserBench release: arXiv:2507.22034
摘要:强化学习(RL)在训练超越静态基准、能够参与动态多回合交互的代理模型方面已展现出潜力。然而,这类代理的最终价值在于其帮助用户的能力,而用户交互的多样性和动态性在这一设置中构成了挑战。在这项工作中,我们提出了UserRL,一个通过与模拟用户配对的标准化gym环境来训练和评估以用户为中心能力的统一框架。我们系统地改变回合级奖励分配和轨迹级分数计算,以分析不同的公式如何影响GRPO算法下的学习。我们在Qwen3模型上的实验揭示了三个关键发现:(i)SFT冷启动对于解锁初始交互能力和实现持续的RL改进至关重要;(ii)精心设计的轨迹评分产生更高效和有效的多回合交互;(iii)虽然更强的模拟用户(例如GPT-4o)有助于训练,但开源模拟器(例如Qwen3-32B)仍然是一个具有成本效益且可迁移的选择。总之,这些结果强调,奖励塑造和用户模拟器选择的精心设计与模型规模同样重要,并将UserRL确立为开发强大的以用户为中心代理模型的实用途径。所有代码和数据均公开,供未来研究使用。
摘要:Reinforcement learning (RL) has shown promise in training agentic models that move beyond static benchmarks to engage in dynamic, multi-turn interactions. Yet, the ultimate value of such agents lies in their ability to assist users, a setting where diversity and dynamics of user interaction pose challenges. In this work, we propose UserRL, a unified framework for training and evaluating user-centric abilities through standardized gym environments paired with simulated users. We systematically vary turn-level reward assignment and trajectory-level score calculation to analyze how different formulations affect learning under the GRPO algorithm. Our experiments across Qwen3 models reveal three key findings: (i) SFT cold start is critical for unlocking initial interaction ability and enabling sustained RL improvements; (ii) deliberate trajectory scoring yields more efficient and effective multi-turn interactions; and (iii) while stronger simulated users (e.g., GPT-4o) facilitates training, open-source simulators (e.g., Qwen3-32B) remain a cost-effective and transferable option. Together, these results highlight that careful design of reward shaping and user simulation choice is as crucial as model scale, and establish UserRL as a practical pathway for developing robust user-centric agentic models. All codes and data are public for future research.
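摘要中"轨迹级分数计算"可以有多种公式,下面列出几个常见候选作为示意;具体取哪种、折扣因子取值等均为假设,并非论文的确切设定。

```python
def trajectory_score(turn_rewards, mode="discounted", gamma=0.95):
    """由回合级奖励聚合出轨迹级分数的几种可选公式(示意)。"""
    if mode == "sum":         # 等权求和
        return sum(turn_rewards)
    if mode == "discounted":  # 折扣求和:靠前的回合权重更大
        return sum(r * gamma ** t for t, r in enumerate(turn_rewards))
    if mode == "final":       # 只取最终回合的结果
        return turn_rewards[-1]
    raise ValueError(f"unknown mode: {mode}")
```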
【2】DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions
标题:DAWM:基于动作推断转移的离线强化学习扩散动作世界模型
链接:https://arxiv.org/abs/2509.19538
备注:ICML2025 workshop Building Physically Plausible World Models
摘要:基于扩散的世界模型在为离线强化学习(RL)合成逼真的长时程轨迹方面表现出强大的能力。然而,许多现有方法并不随状态和奖励一起直接生成动作,限制了它们与依赖一步时间差分(TD)学习的标准基于值的离线RL算法的兼容性。虽然先前的工作探索了状态、奖励和动作的联合建模来解决这个问题,但这种公式通常会增加训练复杂性并降低实际性能。我们提出了DAWM,一个基于扩散的世界模型,它以当前状态、动作和剩余回报(return-to-go)为条件生成未来的状态-奖励轨迹,并与逆动力学模型(IDM)配对以进行高效的动作推断。这种模块化设计产生适合一步TD离线RL的完整合成转移,从而实现有效且计算高效的训练。实验表明,诸如TD3BC和IQL之类的保守离线RL算法从这些增强轨迹的训练中获益匪浅,在D4RL基准的多个任务中始终优于先前基于扩散的基线。
摘要:Diffusion-based world models have demonstrated strong capabilities in synthesizing realistic long-horizon trajectories for offline reinforcement learning (RL). However, many existing methods do not directly generate actions alongside states and rewards, limiting their compatibility with standard value-based offline RL algorithms that rely on one-step temporal difference (TD) learning. While prior work has explored joint modeling of states, rewards, and actions to address this issue, such formulations often lead to increased training complexity and reduced performance in practice. We propose DAWM, a diffusion-based world model that generates future state-reward trajectories conditioned on the current state, action, and return-to-go, paired with an inverse dynamics model (IDM) for efficient action inference. This modular design produces complete synthetic transitions suitable for one-step TD-based offline RL, enabling effective and computationally efficient training. Empirically, we show that conservative offline RL algorithms such as TD3BC and IQL benefit significantly from training on these augmented trajectories, consistently outperforming prior diffusion-based baselines across multiple tasks in the D4RL benchmark.
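摘要中与世界模型配对的逆动力学模型(IDM)在结构上通常很简单:从相邻状态对回归动作。下面是一个最小的PyTorch草图(隐层宽度等为示意取值),训练目标为最小化动作的均方误差;推理时对世界模型生成的状态序列逐步推断动作,即可得到完整的合成转移。

```python
import torch
import torch.nn as nn

class InverseDynamicsModel(nn.Module):
    """IDM:从(s_t, s_{t+1})推断动作a_t(示意)。"""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s_t, s_next):
        return self.net(torch.cat([s_t, s_next], dim=-1))

# 训练示意:loss = torch.nn.functional.mse_loss(idm(s_t, s_next), a_t)
```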
【3】Evaluation-Aware Reinforcement Learning
标题:评估感知强化学习
链接:https://arxiv.org/abs/2509.19464
备注:9 pages, under submission
摘要:策略评估通常是部署安全和性能关键型系统的先决条件。现有的评估方法常常因数据有限和长时域任务而具有高方差,或因支撑不相等或环境模型不准确而具有高偏差。我们认为,这些挑战部分源于在没有明确考虑评估的情况下进行策略学习的标准强化学习(RL)范式。作为替代方案,我们提出了评估感知强化学习(EvA-RL),其中策略被训练为在最大化预期回报的同时,在给定的值预测方案下最小化预期评估误差,换句话说,使策略"容易"被评估。我们形式化了EvA-RL的框架,并设计了一个实例,它能够以在评估环境(可以不同于部署环境)中的少量rollout为条件实现准确的策略评估。然而,我们的理论分析和实证结果表明,在EvA-RL中使用固定的值预测方案时,评估精度和策略性能之间往往存在权衡。为了减轻这种权衡,我们将方法扩展为与策略一起共同学习一个以评估为条件的状态值预测器。在不同的离散和连续动作域上的实证结果表明,EvA-RL可以大幅减少评估误差,同时保持有竞争力的回报。这项工作为广泛的新一类强化学习方法奠定了基础,这些方法将可靠评估视为训练过程中的第一性原则。
摘要:Policy evaluation is often a prerequisite for deploying safety- and performance-critical systems. Existing evaluation approaches frequently suffer from high variance due to limited data and long-horizon tasks, or high bias due to unequal support or inaccurate environmental models. We posit that these challenges arise, in part, from the standard reinforcement learning (RL) paradigm of policy learning without explicit consideration of evaluation. As an alternative, we propose evaluation-aware reinforcement learning (EvA-RL), in which a policy is trained to maximize expected return while simultaneously minimizing expected evaluation error under a given value prediction scheme -- in other words, being "easy" to evaluate. We formalize a framework for EvA-RL and design an instantiation that enables accurate policy evaluation, conditioned on a small number of rollouts in an assessment environment that can be different than the deployment environment. However, our theoretical analysis and empirical results show that there is often a tradeoff between evaluation accuracy and policy performance when using a fixed value-prediction scheme within EvA-RL. To mitigate this tradeoff, we extend our approach to co-learn an assessment-conditioned state-value predictor alongside the policy. Empirical results across diverse discrete and continuous action domains demonstrate that EvA-RL can substantially reduce evaluation error while maintaining competitive returns. This work lays the foundation for a broad new class of RL methods that treat reliable evaluation as a first-class principle during training.
【4】Wavelet Fourier Diffuser: Frequency-Aware Diffusion Model for Reinforcement Learning
标题:子波傅里叶扩散器:强化学习的频率感知扩散模型
链接:https://arxiv.org/abs/2509.19305
摘要:扩散概率模型通过直接对轨迹序列进行建模,在离线强化学习中表现出了显著的前景。然而,根据我们的观察,现有的方法主要关注时域特征而忽略了频域特征,导致频率偏移和性能下降。在本文中,我们从频域这一新的角度研究RL问题。我们首先观察到,仅时域的方法会无意中在频域的低频分量中引入偏移,这导致轨迹不稳定和性能下降。为了解决这个问题,我们提出了小波傅立叶扩散器(WFDiffuser),一种新的基于扩散的RL框架,集成了离散小波变换,将轨迹分解为低频和高频分量。为了进一步增强每个分量的扩散建模,WFDiffuser采用短时傅立叶变换和交叉注意力机制来提取频域特征并促进跨频率交互。在D4RL基准测试上的大量实验结果表明,WFDiffuser有效地减轻了频率偏移,产生更平滑、更稳定的轨迹,并取得了优于现有方法的决策性能。
摘要:Diffusion probability models have shown significant promise in offline reinforcement learning by directly modeling trajectory sequences. However, existing approaches primarily focus on time-domain features while overlooking frequency-domain features, leading to frequency shift and degraded performance according to our observation. In this paper, we investigate the RL problem from a new perspective of the frequency domain. We first observe that time-domain-only approaches inadvertently introduce shifts in the low-frequency components of the frequency domain, which results in trajectory instability and degraded performance. To address this issue, we propose Wavelet Fourier Diffuser (WFDiffuser), a novel diffusion-based RL framework that integrates Discrete Wavelet Transform to decompose trajectories into low- and high-frequency components. To further enhance diffusion modeling for each component, WFDiffuser employs Short-Time Fourier Transform and cross attention mechanisms to extract frequency-domain features and facilitate cross-frequency interaction. Extensive experiment results on the D4RL benchmark demonstrate that WFDiffuser effectively mitigates frequency shift, leading to smoother, more stable trajectories and improved decision-making performance over existing methods.
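下面用PyWavelets给出摘要中"离散小波变换分解轨迹"这一步的极简示意(小波基db4为假设选择,并非论文设定):

```python
import numpy as np
import pywt  # pip install PyWavelets

# 玩具轨迹:T个时间步 x D个状态维度
T, D = 128, 4
traj = np.cumsum(np.random.randn(T, D), axis=0)

# 沿时间轴做单层DWT:cA为低频趋势分量,cD为高频细节分量。
cA, cD = pywt.dwt(traj, wavelet="db4", axis=0)

# 每个分量可分别由各自的扩散分支建模;逆变换把它们重组为完整轨迹。
recon = pywt.idwt(cA, cD, wavelet="db4", axis=0)
assert np.allclose(recon[:T], traj, atol=1e-8)
print(cA.shape, cD.shape)  # 两个约为原长一半的分量序列
```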
元学习(2篇)
【1】Feeding Two Birds or Favoring One? Adequacy-Fluency Tradeoffs in Evaluation and Meta-Evaluation of Machine Translation
标题:喂两只鸟还是偏爱一只?机器翻译评估和元评估中的充分性与流畅性权衡
链接:https://arxiv.org/abs/2509.20287
备注:Accepted by Tenth Conference on Machine Translation (WMT25)
摘要:我们研究了机器翻译中充分性和流畅性之间的权衡。我们在评估层面上展示了这种权衡的严重性,并分析了流行的指标在其中的位置。从本质上讲,当前的指标通常倾向于充分性,这意味着它们的分数与翻译充分性的相关性比与流畅性的相关性更强。更重要的是,我们发现这种权衡也持续存在于元评估层面,并且标准的WMT元评估偏向以充分性为导向的指标,而非以流畅性为导向的指标。我们发现,这种偏差部分归因于元评估数据集中所包含系统的组成。为了控制这种偏差,我们提出了一种在元评估中合成翻译系统的方法。我们的研究结果强调了在元评估中理解这种权衡及其对指标排名的影响的重要性。
摘要:We investigate the tradeoff between adequacy and fluency in machine translation. We show the severity of this tradeoff at the evaluation level and analyze where popular metrics fall within it. Essentially, current metrics generally lean toward adequacy, meaning that their scores correlate more strongly with the adequacy of translations than with fluency. More importantly, we find that this tradeoff also persists at the meta-evaluation level, and that the standard WMT meta-evaluation favors adequacy-oriented metrics over fluency-oriented ones. We show that this bias is partially attributed to the composition of the systems included in the meta-evaluation datasets. To control this bias, we propose a method that synthesizes translation systems in meta-evaluation. Our findings highlight the importance of understanding this tradeoff in meta-evaluation and its impact on metric rankings.
【2】Intelligent Algorithm Selection for Recommender Systems: Meta-Learning via in-depth algorithm feature engineering
标题:推荐系统的智能算法选择:通过深度算法特征工程进行元学习
链接:https://arxiv.org/abs/2509.20134
摘要:"没有免费的午餐"定理表明,没有一个推荐算法对所有用户都是最佳的,这就产生了一个重要的算法选择问题。标准的元学习方法旨在通过基于用户特征选择算法来解决这个问题,但将本质上各不相同的算法本身视为等同的"黑盒"选择。本论文通过构建一个全面刻画算法本身的特征集,研究克服这一限制的影响。我们结合了静态代码度量、抽象语法树属性、行为性能标志特征和高层次的概念特征。我们在五个数据集上评估了两个元学习器:一个只使用用户特征的基线,和一个同时使用用户和算法特征的模型。我们的结果显示,使用算法特征增强的元学习器实现了平均NDCG@10为0.143,比单一最佳算法基线(0.128)有统计上显著的11.7%的提升。然而,我们发现,包含算法特征并没有使整体NDCG@10超过仅使用用户特征的元学习器(0.144)。虽然向元学习器添加算法特征确实提高了其Top-1选择准确率(+16.1%),但这被更低的Top-3准确率(-10.7%)所抵消。我们的结论是,对于推荐系统中的每用户算法选择任务,用户特征的预测能力占压倒性主导地位。虽然算法特征提高了选择精度,但释放其潜力以提升整体性能仍然是一个不小的挑战。
摘要:The "No Free Lunch" theorem dictates that no single recommender algorithm is optimal for all users, creating a significant Algorithm Selection Problem. Standard meta-learning approaches aim to solve this by selecting an algorithm based on user features, but treat the fundamentally diverse algorithms themselves as equivalent, "black-box" choices. This thesis investigates the impact of overcoming this limitation by engineering a comprehensive feature set to explicitly characterize the algorithms themselves. We combine static code metrics, Abstract Syntax Tree properties, behavioral performance landmarks, and high-level conceptual features. We evaluate two meta-learners across five datasets: a baseline using only user features and our proposed model using both user and algorithm features. Our results show that the meta-learner augmented with algorithm features achieves an average NDCG@10 of 0.143, a statistically significant improvement of 11.7% over the Single Best Algorithm baseline (0.128). However, we found that the inclusion of algorithm features did not lead to an improvement in overall NDCG@10 over the meta learner using only user features (0.144). While adding algorithm features to the meta-learner did improve its Top-1 selection accuracy (+16.1%), this was counterbalanced by leading to a lower Top-3 accuracy (-10.7%). We conclude that for the per-user algorithm selection task in recommender systems, the predictive power of user features is overwhelmingly dominant. While algorithm features improve selection precision, unlocking their potential to boost overall performance remains a non-trivial challenge.
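下面是一个极简的sklearn示意(合成数据;"逐(用户,算法)对"的建模方式是为说明目的而作的假设,并非论文的确切设置),展示如何将用户特征与算法特征拼接后训练元学习器并进行算法选择:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_users, n_algos = 500, 5
user_feats = rng.normal(size=(n_users, 8))   # 例如用户活跃度统计
algo_feats = rng.normal(size=(n_algos, 6))   # 例如代码度量、AST属性
best_algo = rng.integers(0, n_algos, size=n_users)  # 玩具标签

# 逐对建模:每个 (用户, 算法) 对一行,标签为该算法是否对该用户最优。
X = np.array([np.concatenate([user_feats[u], algo_feats[a]])
              for u in range(n_users) for a in range(n_algos)])
y = np.array([int(best_algo[u] == a)
              for u in range(n_users) for a in range(n_algos)])

meta = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def select(u):
    """为用户u选择预测概率最高的算法。"""
    pairs = np.array([np.concatenate([user_feats[u], algo_feats[a]])
                      for a in range(n_algos)])
    return int(meta.predict_proba(pairs)[:, 1].argmax())

print(select(0))
```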
符号|符号学习(1篇)
【1】Analyzing Generalization in Pre-Trained Symbolic Regression
标题:分析预训练符号回归中的泛化能力
链接:https://arxiv.org/abs/2509.19849
摘要:符号回归算法在数学表达式的空间中搜索解释给定数据的公式。基于transformer的模型已经成为一种有前途的可扩展方法,将昂贵的组合搜索转移到大规模的预训练阶段。然而,这些模型的成功关键取决于它们的预训练数据。他们对这种预训练分布之外的问题进行概括的能力在很大程度上尚未被探索。在这项工作中,我们进行了系统的实证研究,以评估预训练的,基于transformer的符号回归的泛化能力。我们严格测试了几种最先进方法在预训练分布和一系列分布外挑战中的性能。我们的研究结果揭示了一个重要的二分法:虽然预训练的模型在分布中表现良好,但在分布外的情况下,性能一直在下降。我们的结论是,这种泛化的差距是从业者的一个关键障碍,因为它严重限制了实际使用的预先训练的方法在现实世界中的应用。
摘要:Symbolic regression algorithms search a space of mathematical expressions for formulas that explain given data. Transformer-based models have emerged as a promising, scalable approach shifting the expensive combinatorial search to a large-scale pre-training phase. However, the success of these models is critically dependent on their pre-training data. Their ability to generalize to problems outside of this pre-training distribution remains largely unexplored. In this work, we conduct a systematic empirical study to evaluate the generalization capabilities of pre-trained, transformer-based symbolic regression. We rigorously test performance both within the pre-training distribution and on a series of out-of-distribution challenges for several state of the art approaches. Our findings reveal a significant dichotomy: while pre-trained models perform well in-distribution, the performance consistently degrades in out-of-distribution scenarios. We conclude that this generalization gap is a critical barrier for practitioners, as it severely limits the practical use of pre-trained approaches for real-world applications.
医学相关(4篇)
【1】RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis
标题:RAD:迈向值得信赖的检索增强多模式临床诊断
链接:https://arxiv.org/abs/2509.19980
备注:Accepted to NeurIPS 2025
摘要:临床诊断是一门高度专业化的学科,既需要领域专业知识,又需要严格遵守严谨的指南。虽然目前人工智能驱动的医学研究主要集中在知识图谱或自然文本预训练范式上以纳入医学知识,但这些方法主要依赖于模型参数中隐式编码的知识,忽略了不同下游任务所需的特定于任务的知识。为了解决这一局限,我们提出了检索增强诊断(RAD),一个直接在下游任务上将外部知识显式注入多模态模型的新框架。具体而言,RAD通过三个关键机制运作:从多个医学来源检索和细化以疾病为中心的知识;约束多模态特征和指南知识之间潜在距离的指南增强对比损失;以及采用指南作为查询以引导跨模态融合的双Transformer解码器,使模型与从指南获取到特征提取和决策的临床诊断工作流程相一致。此外,鉴于多模态诊断模型的可解释性缺乏定量评估,我们引入了一套标准,从图像和文本两个角度评估可解释性。对具有不同解剖结构的四个数据集的广泛评估证明了RAD的通用性,实现了最先进的性能。此外,RAD使模型能够更精确地聚焦于异常区域和关键指标,确保基于证据的可靠诊断。我们的代码可在https://github.com/tdlhl/RAD上获得。
摘要:Clinical diagnosis is a highly specialized discipline requiring both domain expertise and strict adherence to rigorous guidelines. While current AI-driven medical research predominantly focuses on knowledge graphs or natural text pretraining paradigms to incorporate medical knowledge, these approaches primarily rely on implicitly encoded knowledge within model parameters, neglecting task-specific knowledge required by diverse downstream tasks. To address this limitation, we propose Retrieval-Augmented Diagnosis (RAD), a novel framework that explicitly injects external knowledge into multimodal models directly on downstream tasks. Specifically, RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss that constrains the latent distance between multi-modal features and guideline knowledge, and the dual transformer decoder that employs guidelines as queries to steer cross-modal fusion, aligning the models with clinical diagnostic workflows from guideline acquisition to feature extraction and decision-making. Moreover, recognizing the lack of quantitative evaluation of interpretability for multimodal diagnostic models, we introduce a set of criteria to assess the interpretability from both image and text perspectives. Extensive evaluations across four datasets with different anatomies demonstrate RAD's generalizability, achieving state-of-the-art performance. Furthermore, RAD enables the model to concentrate more precisely on abnormal regions and critical indicators, ensuring evidence-based, trustworthy diagnosis. Our code is available at https://github.com/tdlhl/RAD.
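下面是摘要中"指南增强对比损失"的一个极简PyTorch示意(假设其为InfoNCE形式;论文的确切公式可能不同):

```python
import torch
import torch.nn.functional as F

def guideline_contrastive_loss(feats, guide, temperature=0.07):
    """指南增强对比损失的示意(InfoNCE形式为假设)。

    feats: (B, d) 融合后的多模态特征
    guide: (B, d) 与之配对的指南文本嵌入
    """
    feats = F.normalize(feats, dim=-1)
    guide = F.normalize(guide, dim=-1)
    logits = feats @ guide.t() / temperature   # (B, B) 相似度矩阵
    targets = torch.arange(feats.size(0))      # 第i个特征对应第i条指南
    return F.cross_entropy(logits, targets)

loss = guideline_contrastive_loss(torch.randn(16, 128), torch.randn(16, 128))
print(loss.item())
```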
【2】Cuffless Blood Pressure Prediction from Speech Sentences using Deep Learning Methods
标题:使用深度学习方法从语音句子中预测无袖带血压
链接:https://arxiv.org/abs/2509.19750
备注:MS Thesis
摘要:本研究提出了一种基于BERT回归模型、利用语音信号进行无创动脉血压(ABP)预测的新方法。动脉血压是心血管健康的重要指标,准确监测对于预防高血压相关并发症至关重要。由于白大衣高血压和隐匿性高血压等因素,传统的袖带方法常常产生不一致的结果。我们的方法利用语音的声学特性,捕获语音特征以建立与血压水平的相关性。借助先进的深度学习技术,我们分析语音信号以提取相关模式,从而在没有传统方法带来的不适的情况下实现实时监测。在我们的研究中,我们采用了包含95名参与者录音的数据集,确保了多样化的代表性。BERT模型在从语音中提取的特征上进行了微调,取得了令人印象深刻的性能指标:收缩压(SBP)的平均绝对误差(MAE)为1.36 mmHg,舒张压(DBP)为1.24 mmHg,R得分分别为0.99和0.94。这些结果表明模型在准确预测血压水平方面具有鲁棒性。此外,训练和验证损失分析证明了有效的学习和极小的过拟合。我们的研究结果表明,将深度学习与语音分析相结合为血压监测提供了可行的替代方案,为远程医疗和远程健康监测的应用改进铺平了道路。通过提供用户友好且准确的血压评估方法,这项研究对加强患者护理和心血管健康的主动管理具有重要意义。
摘要:This research presents a novel method for noninvasive arterial blood pressure (ABP) prediction using speech signals, employing a BERT-based regression model. Arterial blood pressure is a vital indicator of cardiovascular health, and accurate monitoring is essential in preventing hypertension-related complications. Traditional cuff-based methods often yield inconsistent results due to factors like white-coat and masked hypertension. Our approach leverages the acoustic characteristics of speech, capturing voice features to establish correlations with blood pressure levels. Utilizing advanced deep learning techniques, we analyze speech signals to extract relevant patterns, enabling real-time monitoring without the discomfort of conventional methods. In our study, we employed a dataset comprising recordings from 95 participants, ensuring diverse representation. The BERT model was fine-tuned on features extracted from speech, leading to impressive performance metrics: a mean absolute error (MAE) of 1.36 mmHg for systolic blood pressure (SBP) and 1.24 mmHg for diastolic blood pressure (DBP), with R scores of 0.99 and 0.94, respectively. These results indicate the model's robustness in accurately predicting blood pressure levels. Furthermore, the training and validation loss analysis demonstrates effective learning and minimal overfitting. Our findings suggest that integrating deep learning with speech analysis presents a viable alternative for blood pressure monitoring, paving the way for improved applications in telemedicine and remote health monitoring. By providing a user-friendly and accurate method for blood pressure assessment, this research has significant implications for enhancing patient care and proactive management of cardiovascular health.
【3】Revisiting Performance Claims for Chest X-Ray Models Using Clinical Context
标题:使用临床背景重新审视胸部X射线模型的性能声明
链接:https://arxiv.org/abs/2509.19671
摘要:胸部X射线(CXR)的公共医疗数据集长期以来一直是开发医疗保健计算机视觉模型的流行基准。然而,机器学习(ML)模型在这些数据集上的强大平均情况性能不足以证明其临床实用性。在本文中,我们使用先前出院总结所捕获的临床背景,为CXR诊断任务提供对当前"最先进"模型的更全面评估。使用每次CXR之前记录的出院总结,我们得出每个CXR标签的"先验"或"预检验"概率,作为临床医生在解释CXR时可用的现有背景知识的代表。使用这种度量,我们证明了两个关键发现:首先,对于几个诊断标签,CXR模型往往在预检验概率非常低的情况下表现最好,而在预检验概率较高的情况下表现明显更差。其次,我们使用预检验概率来评估强平均情况性能是否反映了真正的诊断信号,而不是将推断预检验概率作为捷径的能力。我们发现,在不存在这种捷径的平衡测试集上,性能急剧下降,这可能表明大部分表观诊断能力来自对这种临床背景的推断。我们认为,这种使用来自临床笔记的上下文的分析风格,是对临床视觉模型进行更严格、更细粒度评估的一个有前途的方向。
摘要:Public healthcare datasets of Chest X-Rays (CXRs) have long been a popular benchmark for developing computer vision models in healthcare. However, strong average-case performance of machine learning (ML) models on these datasets is insufficient to certify their clinical utility. In this paper, we use clinical context, as captured by prior discharge summaries, to provide a more holistic evaluation of current ``state-of-the-art'' models for the task of CXR diagnosis. Using discharge summaries recorded prior to each CXR, we derive a ``prior'' or ``pre-test'' probability of each CXR label, as a proxy for existing contextual knowledge available to clinicians when interpreting CXRs. Using this measure, we demonstrate two key findings: First, for several diagnostic labels, CXR models tend to perform best on cases where the pre-test probability is very low, and substantially worse on cases where the pre-test probability is higher. Second, we use pre-test probability to assess whether strong average-case performance reflects true diagnostic signal, rather than an ability to infer the pre-test probability as a shortcut. We find that performance drops sharply on a balanced test set where this shortcut does not exist, which may indicate that much of the apparent diagnostic power derives from inferring this clinical context. We argue that this style of analysis, using context derived from clinical notes, is a promising direction for more rigorous and fine-grained evaluation of clinical vision models.
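下面用合成数据给出这种分析风格的极简示意(数据与分箱均为假设):按预检验概率分层,分别计算模型的AUROC:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
pre_test = rng.beta(2, 8, size=n)           # 由临床笔记导出的先验概率(合成)
labels = rng.binomial(1, pre_test)          # 玩具真值标签
model_scores = 0.7 * pre_test + 0.3 * rng.random(n)  # 一个依赖先验的模型

# 按预检验概率分箱,分层评估性能。
for lo, hi in [(0.0, 0.1), (0.1, 0.3), (0.3, 1.0)]:
    m = (pre_test >= lo) & (pre_test < hi)
    if m.any() and labels[m].min() != labels[m].max():  # AUROC需要两类样本
        auc = roc_auc_score(labels[m], model_scores[m])
        print(f"pre-test [{lo},{hi}): AUROC={auc:.3f}")
```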
【4】Human Activity Recognition Based on Electrocardiogram Data Only
标题:仅基于心电图数据的人体活动识别
链接:https://arxiv.org/abs/2509.19328
备注:This is a preprint version. Content may change before final publication
摘要:人类活动识别对于早期干预和健康分析等应用至关重要。传统的活动识别依赖于惯性测量单元(IMU),这是资源密集型的,并且需要校准。虽然已经探索了基于心电图(ECG)的方法,但这些方法通常用作IMU的补充,或者局限于宽泛的类别分类,例如跌倒检测或日常活动中的活动与非活动。在本文中,我们首次展示了仅用ECG在六种不同活动上的鲁棒活动识别,从而推进了该领域,这超出了以前工作的范围。我们设计并评估了三种新的深度学习模型,包括一个带有挤压-激励块用于逐通道特征重新校准的CNN分类器、一个带有扩张卷积用于多尺度时间依赖性捕获的ResNet分类器,以及一个新颖的CNN-Transformer混合模型,它将卷积特征提取与注意力机制相结合,用于长程时间关系建模。在54名受试者、六种活动的数据上测试,所有三个模型对见过的受试者的准确率都超过94%,而CNN-Transformer混合模型对未见过的受试者达到72%的最佳准确率,这一结果可以通过扩大训练人群进一步提高。这项研究首次成功地实现了多种身体活动的仅ECG活动分类,为开发能够在没有额外运动传感器的情况下同时进行心脏监测和活动识别的下一代可穿戴设备提供了巨大潜力。
摘要:Human activity recognition is critical for applications such as early intervention and health analytics. Traditional activity recognition relies on inertial measurement units (IMUs), which are resource intensive and require calibration. Although electrocardiogram (ECG)-based methods have been explored, these have typically served as supplements to IMUs or have been limited to broad categorical classification such as fall detection or active vs. inactive in daily activities. In this paper, we advance the field by demonstrating, for the first time, robust recognition of activity with ECG only, across six distinct activities, which is beyond the scope of previous work. We design and evaluate three new deep learning models, including a CNN classifier with Squeeze-and-Excitation blocks for channel-wise feature recalibration, a ResNet classifier with dilated convolutions for multiscale temporal dependency capture, and a novel CNN-Transformer hybrid combining convolutional feature extraction with attention mechanisms for long-range temporal relationship modeling. Tested on data from 54 subjects for six activities, all three models achieve over 94% accuracy for seen subjects, while the CNN-Transformer hybrid reaches the best accuracy of 72% for unseen subjects, a result that can be further improved by increasing the training population. This study demonstrates the first successful ECG-only activity classification in multiple physical activities, offering significant potential for developing next-generation wearables capable of simultaneous cardiac monitoring and activity recognition without additional motion sensors.
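下面是一个CNN-Transformer混合结构的极简PyTorch示意(层数与维度均为假设,并非论文的确切模型):卷积先提取局部ECG形态特征,Transformer再建模长程时间关系:

```python
import torch
import torch.nn as nn

class CNNTransformerHAR(nn.Module):
    """仅ECG活动识别的CNN-Transformer混合示意(结构为假设)。"""
    def __init__(self, n_classes=6, d_model=64):
        super().__init__()
        self.conv = nn.Sequential(                      # 局部形态特征
            nn.Conv1d(1, d_model, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
        )
        enc = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                         batch_first=True)
        self.transformer = nn.TransformerEncoder(enc, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                    # x: (batch, 1, time)
        h = self.conv(x).transpose(1, 2)     # (batch, time', d_model)
        h = self.transformer(h).mean(dim=1)  # 注意力建模长程依赖后做池化
        return self.head(h)

logits = CNNTransformerHAR()(torch.randn(8, 1, 1024))  # 8段ECG窗口
print(logits.shape)  # torch.Size([8, 6])
```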
蒸馏|知识提取(1篇)
【1】Learnable Sampler Distillation for Discrete Diffusion Models
标题:离散扩散模型的可学习采样蒸馏
链接:https://arxiv.org/abs/2509.19962
备注:NeurIPS 2025
摘要:离散扩散模型(DDM)在文本和分子等离散数据模态上已经显示出强大的生成能力。然而,它们的实际应用受到低效采样的阻碍,需要大量的采样步骤。通过使用更大的步长来加速DDM通常会在生成质量上引入显著问题,因为这同时放大了由分解式预测引起的复合解码误差和来自数值近似的离散化误差的影响,从而导致采样质量的显著降低。为了应对这些挑战,我们提出了可学习采样器蒸馏(LSD),一种为DDM训练快速且高保真采样器的新方法。LSD采用了一种蒸馏方法,其中一个步数较少的学生采样器学习将其中间分数轨迹与一个步数很多的高质量教师采样器的中间分数轨迹对齐。这种对齐是通过优化可学习的采样器系数来自适应地调整采样动态而实现的。此外,我们还提出了LSD+,它同时学习非均匀分配步数的时间调度。跨文本生成、图像生成和合成任务的实验表明,我们提出的方法优于现有的DDM采样器,以明显更少的采样步骤实现了显著更高的采样质量。我们的代码可以在\href{https://github.com/feiyangfu/LSD}{https://github.com/feiyangfu/LSD}上找到。
摘要:Discrete diffusion models (DDMs) have shown powerful generation ability for discrete data modalities like text and molecules. However, their practical application is hindered by inefficient sampling, requiring a large number of sampling steps. Accelerating DDMs by using larger step sizes typically introduces significant problems in generation quality, as it amplifies the impact of both the compounding decoding error due to factorized predictions and discretization error from numerical approximations, leading to a significant decrease in sampling quality. To address these challenges, we propose learnable sampler distillation (LSD), a novel approach to train fast and high-fidelity samplers for DDMs. LSD employs a distillation approach where a student sampler with a few steps learns to align its intermediate score trajectory with that of a high-quality teacher sampler with numerous steps. This alignment is achieved by optimizing learnable sampler coefficients that adaptively adjust sampling dynamics. Additionally, we further propose LSD+, which also learns time schedules that allocate steps non-uniformly. Experiments across text generation, image generation, and synthetic tasks demonstrate that our proposed approaches outperform existing samplers for DDMs, achieving substantially higher sampling quality with significantly fewer sampling steps. Our code is available at \href{https://github.com/feiyangfu/LSD}{https://github.com/feiyangfu/LSD}.
聚类(1篇)
【1】Anomaly Detection by Clustering DINO Embeddings using a Dirichlet Process Mixture
标题:使用狄利克雷过程混合模型对DINO嵌入进行聚类以进行异常检测
链接:https://arxiv.org/abs/2509.19997
备注:Paper accepted at MICCAI 2025
摘要:在这项工作中,我们利用基础模型的信息性嵌入在医学成像中进行无监督异常检测。最近的研究表明,对于小数据集,可以直接使用正常(normative)特征的记忆库进行异常检测。然而,这不适合大型医疗数据集,因为计算负担会大幅增加。因此,我们建议用狄利克雷过程混合模型(DPMM)对正常DINOv2嵌入的分布进行建模,这是一种能根据手头数据自动调整混合成分数量的非参数混合模型。我们不使用记忆库,而是使用成分中心和嵌入之间的相似性作为异常评分函数,以创建粗略的异常分割掩模。我们的实验表明,尽管DINOv2是在自然图像上训练的,但通过DPMM对其嵌入建模,在医学成像基准上实现了非常有竞争力的异常检测性能,并且在推理时至少将计算时间减半。我们的分析进一步表明,即使在存在异常的情况下,归一化的DINOv2嵌入通常也比未归一化的特征更符合解剖结构,使其成为异常检测的良好表示。代码可在https://github.com/NicoSchulthess/anomalydino-dpmm上获得。
摘要:In this work, we leverage informative embeddings from foundational models for unsupervised anomaly detection in medical imaging. For small datasets, a memory-bank of normative features can directly be used for anomaly detection which has been demonstrated recently. However, this is unsuitable for large medical datasets as the computational burden increases substantially. Therefore, we propose to model the distribution of normative DINOv2 embeddings with a Dirichlet Process Mixture model (DPMM), a non-parametric mixture model that automatically adjusts the number of mixture components to the data at hand. Rather than using a memory bank, we use the similarity between the component centers and the embeddings as anomaly score function to create a coarse anomaly segmentation mask. Our experiments show that through DPMM embeddings of DINOv2, despite being trained on natural images, achieve very competitive anomaly detection performance on medical imaging benchmarks and can do this while at least halving the computation time at inference. Our analysis further indicates that normalized DINOv2 embeddings are generally more aligned with anatomical structures than unnormalized features, even in the presence of anomalies, making them great representations for anomaly detection. The code is available at https://github.com/NicoSchulthess/anomalydino-dpmm.
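下面用scikit-learn的BayesianGaussianMixture(截断狄利克雷过程混合)给出该思路的极简示意;以到最近成分中心的距离作为异常分数是一个假设性的简化(论文使用与成分中心的相似性,确切形式可能不同):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
normal_emb = rng.normal(size=(2000, 32))    # 代替正常DINOv2嵌入的合成数据
test_emb = rng.normal(size=(10, 32)) + 4.0  # 偏移分布,模拟"异常"样本

# 截断狄利克雷过程混合:有效成分数由数据推断(多余权重收缩至约0)。
dpmm = BayesianGaussianMixture(
    n_components=20,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    max_iter=500,
    random_state=0,
).fit(normal_emb)

def anomaly_score(x):
    """异常分数示意:到最近成分中心的欧氏距离。"""
    d = np.linalg.norm(dpmm.means_[None, :, :] - x[:, None, :], axis=-1)
    return d.min(axis=1)

# 偏移样本的分数应明显高于正常样本
print(anomaly_score(test_emb) > anomaly_score(normal_emb[:10]).max())
```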
自动驾驶|车辆|车道检测等(4篇)
【1】RDAR: Reward-Driven Agent Relevance Estimation for Autonomous Driving
标题:RDAR:自动驾驶的奖励驱动代理相关性估计
链接:https://arxiv.org/abs/2509.19789
备注:10 pages, 6 figures
摘要:人类驾驶员在任何时候都只关注少数几个智能体。另一方面,自动驾驶系统要处理具有众多智能体的复杂场景,无论它们是人行横道上的行人还是停在路边的车辆。虽然注意力机制提供了一种隐式的方式来将输入缩减为影响决策的元素,但现有的用于捕获智能体交互的注意力机制是二次复杂度的,并且通常计算成本高。我们提出了RDAR,一种学习每个智能体相关性(即每个智能体对受控车辆的行为有多大影响)的策略,方法是识别哪些智能体可以从预训练行为模型的输入中排除。我们将掩蔽过程形式化为一个马尔可夫决策过程,其动作由一个表示智能体选择的二进制掩码组成。我们在一个大规模驾驶数据集上评估了RDAR,并证明了它学习准确的数值相关性度量的能力:与最先进的行为模型相比,在处理明显更少的智能体的同时,在整体进度、安全性和性能方面实现了可比的驾驶表现。
摘要:Human drivers focus only on a handful of agents at any one time. On the other hand, autonomous driving systems process complex scenes with numerous agents, regardless of whether they are pedestrians on a crosswalk or vehicles parked on the side of the road. While attention mechanisms offer an implicit way to reduce the input to the elements that affect decisions, existing attention mechanisms for capturing agent interactions are quadratic, and generally computationally expensive. We propose RDAR, a strategy to learn per-agent relevance -- how much each agent influences the behavior of the controlled vehicle -- by identifying which agents can be excluded from the input to a pre-trained behavior model. We formulate the masking procedure as a Markov Decision Process where the action consists of a binary mask indicating agent selection. We evaluate RDAR on a large-scale driving dataset, and demonstrate its ability to learn an accurate numerical measure of relevance by achieving comparable driving performance, in terms of overall progress, safety and performance, while processing significantly fewer agents compared to a state of the art behavior model.
【2】Vision-Based Perception for Autonomous Vehicles in Off-Road Environment Using Deep Learning
标题:使用深度学习在越野环境中实现自动驾驶车辆基于视觉的感知
链接:https://arxiv.org/abs/2509.19378
备注:2022. 117p. Electrical Engineering PhD Thesis - Graduate Program in Electrical and Computer Engineering, Federal University of Bahia, 40210-630, Salvador, Brazil
摘要:在露天矿和发展中国家的非均匀地形上自动驾驶需要低延迟的智能系统。这项工作为未铺装道路和越野环境中的自动驾驶汽车提出了一个感知系统,能够在没有预定义路径的情况下在崎岖地形中导航。我们提出了可配置模块化分割网络(CMSNet)框架,便于不同的架构安排。CMSNet的各种配置经过训练,用于在来自具有不利条件(夜间、下雨、灰尘)的未铺装/越野场景的新图像上分割障碍物和可通行地面。我们研究了应用深度学习来检测没有明确轨道边界的可驾驶区域,考察了能见度受损情况下的算法行为,并通过实时语义分割进行了现场测试评估。我们提出了一个新的数据集Kamino,包含来自一辆装有八个同步相机的作业车辆的近12,000张图像。与类似的公开数据集相比,Kamino数据集具有大量的标注像素,并且包括来自一个模拟不利能见度下矿场的越野试验场的图像。为了实现实时推理,CMSNet的CNN层被有条理地移除,并使用TensorRT、C++和CUDA进行融合。在两个数据集上的实证实验验证了所提系统的有效性。
摘要:Low-latency intelligent systems are required for autonomous driving on non-uniform terrain in open-pit mines and developing countries. This work proposes a perception system for autonomous vehicles on unpaved roads and off-road environments, capable of navigating rough terrain without a predefined trail. The Configurable Modular Segmentation Network (CMSNet) framework is proposed, facilitating different architectural arrangements. CMSNet configurations were trained to segment obstacles and trafficable ground on new images from unpaved/off-road scenarios with adverse conditions (night, rain, dust). We investigated applying deep learning to detect drivable regions without explicit track boundaries, studied algorithm behavior under visibility impairment, and evaluated field tests with real-time semantic segmentation. A new dataset, Kamino, is presented with almost 12,000 images from an operating vehicle with eight synchronized cameras. The Kamino dataset has a high number of labeled pixels compared to similar public collections and includes images from an off-road proving ground emulating a mine under adverse visibility. To achieve real-time inference, CMSNet CNN layers were methodically removed and fused using TensorRT, C++, and CUDA. Empirical experiments on two datasets validated the proposed system's effectiveness.
【3】Stochastic Path Planning in Correlated Obstacle Fields
标题:相关障碍场中的随机路径规划
链接:https://arxiv.org/abs/2509.19559
摘要:我们引入了随机相关障碍场景(SCOS)问题:这是一种导航设定,其中存在阻塞状态不确定的空间相关障碍物,以及提供噪声读数且消歧代价高昂的现实受限传感器。我们用高斯随机场(GRF)对空间相关性进行建模,开发了细化阻塞概率的贝叶斯信念更新,并使用后验来缩减搜索空间以提高效率。为了找到最优的遍历策略,我们提出了一种新的两阶段学习框架。离线阶段通过以信息奖励增强的乐观策略迭代来学习一个鲁棒的基础策略,以鼓励对信息丰富区域的探索;随后是一个在线rollout策略,通过贝叶斯信息自适应机制对基础策略进行周期性更新。该框架同时支持蒙特卡洛点估计和分布式强化学习(RL)来学习完整的成本分布,从而实现更强的不确定性量化。我们建立了相关性感知更新的理论优势以及后验采样下的收敛性质。在不同障碍物密度和传感器能力下的综合经验评估显示出相对于基线的一致性能提升。该框架可应对具有对抗性中断或聚集性自然灾害的环境中的导航挑战。
摘要:We introduce the Stochastic Correlated Obstacle Scene (SCOS) problem, a navigation setting with spatially correlated obstacles of uncertain blockage status, realistically constrained sensors that provide noisy readings and costly disambiguation. Modeling the spatial correlation with Gaussian Random Field (GRF), we develop Bayesian belief updates that refine blockage probabilities, and use the posteriors to reduce search space for efficiency. To find the optimal traversal policy, we propose a novel two-stage learning framework. An offline phase learns a robust base policy via optimistic policy iteration augmented with information bonus to encourage exploration in informative regions, followed by an online rollout policy with periodic base updates via a Bayesian mechanism for information adaptation. This framework supports both Monte Carlo point estimation and distributional reinforcement learning (RL) to learn full cost distributions, leading to stronger uncertainty quantification. We establish theoretical benefits of correlation-aware updating and convergence property under posterior sampling. Comprehensive empirical evaluations across varying obstacle densities, sensor capabilities demonstrate consistent performance gains over baselines. This framework addresses navigation challenges in environments with adversarial interruptions or clustered natural hazards.
【4】Electric Vehicle Identification from Behind Smart Meter Data
标题:智能电表数据背后的电动汽车识别
链接:https://arxiv.org/abs/2509.19316
备注:27 pages,
摘要:从智能电表记录背后识别电动汽车(EV)充电负载是不可或缺的一环,它使能源分销商能够就电网的可靠性做出明智和智能的决策。当电动汽车充电发生在电表后(BTM)时,充电发生在电表的客户侧,而电表测量的是整体用电量。换句话说,电动汽车的充电被视为客户负载的一部分,而不是由配电网络运营商(DNO)单独计量。DNO需要完全了解其网络中的EV存在情况。识别电动汽车充电需求对于更好地规划和管理配电网至关重要。与监督方法不同,本文基于异常检测技术,采用无监督学习方法,以非侵入的方式从低频智能电表数据中解决电动汽车充电负载识别问题。我们的方法不需要事先了解电动汽车充电曲线,只需要非电动汽车用户的真实用电数据,而这些数据在实践中是丰富的。我们提出了一种深度时间卷积编码-解码(TAE)网络。将TAE应用于澳大利亚维多利亚州家庭智能电表(BTM)的用电数据,TAE在识别拥有电动汽车的家庭方面表现出卓越的性能。
摘要:Electric vehicle (EV) charging loads identification from behind smart meter recordings is an indispensable aspect that enables effective decision-making for energy distributors to reach an informed and intelligent decision about the power grid's reliability. When EV charging happens behind the meter (BTM), the charging occurs on the customer side of the meter, which measures the overall electricity consumption. In other words, the charging of the EV is considered part of the customer's load and not separately measured by the Distribution Network Operators (DNOs). DNOs require complete knowledge about the EV presence in their network. Identifying the EV charging demand is essential to better plan and manage the distribution grid. Unlike supervised methods, this paper addresses the problem of EV charging load identification in a non-intrusive manner from low-frequency smart meter data using an unsupervised learning approach based on an anomaly detection technique. Our approach does not require prior knowledge of EV charging profiles. It only requires real power consumption data of non-EV users, which are abundant in practice. We propose a deep temporal convolution encoding decoding (TAE) network. The TAE is applied to power consumption from smart BTM from Victorian households in Australia, and the TAE shows superior performance in identifying households with EVs.
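下面是一个时间卷积自编码器的极简PyTorch示意(结构与维度均为假设,并非论文的TAE):仅在非电动汽车用户的用电数据上训练,然后以重构误差作为异常分数来标记可能存在EV充电的窗口:

```python
import torch
import torch.nn as nn

class TemporalConvAE(nn.Module):
    """时间卷积自编码器示意:重构误差高的窗口疑似含EV充电。"""
    def __init__(self, hidden=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv1d(1, hidden, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, 5, stride=2, padding=2), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose1d(hidden, hidden, 5, stride=2, padding=2,
                               output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(hidden, 1, 5, stride=2, padding=2,
                               output_padding=1),
        )

    def forward(self, x):          # x: (batch, 1, timesteps)
        return self.dec(self.enc(x))

model = TemporalConvAE()
window = torch.randn(4, 1, 48)     # 例如48个半小时粒度的电表读数
err = ((model(window) - window) ** 2).mean(dim=(1, 2))  # 每窗口异常分数
print(err)  # 对该分数设阈值即可标记疑似EV充电的家庭
```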
点云|SLAM|雷达|激光|深度RGBD相关(1篇)
【1】Feature Dynamics as Implicit Data Augmentation: A Depth-Decomposed View on Deep Neural Network Generalization
标题:特征动态作为隐式数据增强:深度神经网络泛化的深度分解视角
链接:https://arxiv.org/abs/2509.20334
摘要:为什么深度网络能很好地泛化?与经典泛化理论不同,我们不仅通过检查输入和输出,还通过内部特征的演变来探讨这个基本问题。我们的研究揭示了一种时间一致性现象:当来自较早检查点的浅层特征与来自较晚检查点的深层特征相结合时,预测保持稳定。这种稳定性不是一个微不足道的收敛伪影,它起到了一种支持泛化的隐式结构化增强的作用。我们表明,时间一致性可以扩展到未见过的和损坏的数据,但当语义结构被破坏时(例如随机标签)就会崩溃。统计检验进一步表明,SGD注入了与少数主方向对齐的各向异性噪声,强化了其作为结构化可变性来源的作用。总之,这些发现提出了一个将特征动态与泛化联系起来的概念性视角,并指向了关于度量时间特征演变的实用替代指标的未来工作。
摘要:Why do deep networks generalize well? In contrast to classical generalization theory, we approach this fundamental question by examining not only inputs and outputs, but the evolution of internal features. Our study suggests a phenomenon of temporal consistency that predictions remain stable when shallow features from earlier checkpoints combine with deeper features from later ones. This stability is not a trivial convergence artifact. It acts as a form of implicit, structured augmentation that supports generalization. We show that temporal consistency extends to unseen and corrupted data, but collapses when semantic structure is destroyed (e.g., random labels). Statistical tests further reveal that SGD injects anisotropic noise aligned with a few principal directions, reinforcing its role as a source of structured variability. Together, these findings suggest a conceptual perspective that links feature dynamics to generalization, pointing toward future work on practical surrogates for measuring temporal feature evolution.
联邦学习|隐私保护|加密(4篇)
【1】FairEquityFL -- A Fair and Equitable Client Selection in Federated Learning for Heterogeneous IoV Networks
标题:FairEquityFL -- 异构车联网(IoV)联邦学习中公平公正的客户端选择
链接:https://arxiv.org/abs/2509.20193
备注:Published in: Advanced Data Mining and Applications (ADMA 2024), Lecture Notes in Computer Science, vol. 15388, pp. 254-269. First online: 13 Dec 2024. DOI: https://doi.org/10.1007/978-981-96-0814-0_17
摘要:联邦学习(FL)已被广泛用于机器学习中的许多应用,这主要是由于其隐私保护性质以及在减轻通信开销方面的效率。车联网(IoV)是其中一个很有前景的应用,其中FL可以用来更有效地训练模型。由于每轮FL训练只有一部分客户端可以参加,客户端选择过程中出现了与公平性有关的挑战。多年来,许多来自学术界和工业界的研究人员提出了众多FL框架。然而,据我们所知,它们都没有在动态和异构的车联网环境中为基于FL的客户端选择采用公平性机制。因此,在本文中,我们设想了一个FairEquityFL框架,以确保所有客户端都有公平的机会参与FL训练过程。特别是,我们在选择器组件中引入了一个采样均衡器模块,以在客户端选择过程中确保所有客户端获得公平的协作机会。选择器还负责监督和控制客户端在每轮FL训练中的参与。此外,我们实施了一个离群值检测机制,基于模型性能(准确率或损失最小化出现相当大的波动)来识别恶意客户端。选择器标记可疑客户端,并暂时中止此类客户端参与FL训练过程。我们进一步在公开数据集FEMNIST上评估了FairEquityFL的性能。我们的仿真结果表明,FairEquityFL在相当程度上优于基线模型。
摘要:Federated Learning (FL) has been extensively employed for a number of applications in machine learning, i.e., primarily owing to its privacy preserving nature and efficiency in mitigating the communication overhead. Internet of Vehicles (IoV) is one of the promising applications, wherein FL can be utilized to train a model more efficiently. Since only a subset of the clients can participate in each FL training round, challenges arise pertinent to fairness in the client selection process. Over the years, a number of researchers from both academia and industry have proposed numerous FL frameworks. However, to the best of our knowledge, none of them have employed fairness for FL-based client selection in a dynamic and heterogeneous IoV environment. Accordingly, in this paper, we envisage a FairEquityFL framework to ensure an equitable opportunity for all the clients to participate in the FL training process. In particular, we have introduced a sampling equalizer module within the selector component for ensuring fairness in terms of fair collaboration opportunity for all the clients in the client selection process. The selector is additionally responsible for both monitoring and controlling the clients' participation in each FL training round. Moreover, an outlier detection mechanism is enforced for identifying malicious clients based on the model performance in terms of considerable fluctuation in either accuracy or loss minimization. The selector flags suspicious clients and temporarily suspend such clients from participating in the FL training process. We further evaluate the performance of FairEquityFL on a publicly available dataset, FEMNIST. Our simulation results depict that FairEquityFL outperforms baseline models to a considerable extent.
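下面是"采样均衡器"这一思路的极简numpy示意(加权公式为假设,论文未给出确切形式):参与次数较少的客户端获得更高的被选概率,从而使参与机会保持均衡:

```python
import numpy as np

def equalized_selection(participation_counts, k, rng=None):
    """假设性的均衡抽样:历史参与越少,被选概率越高。"""
    rng = rng if rng is not None else np.random.default_rng()
    counts = np.asarray(participation_counts, dtype=float)
    weights = 1.0 / (1.0 + counts)        # 参与次数少 -> 权重大
    probs = weights / weights.sum()
    return rng.choice(len(counts), size=k, replace=False, p=probs)

counts = np.zeros(100)                    # 100辆车,初始均未被选过
for rnd in range(50):                     # 50轮FL,每轮选10个客户端
    chosen = equalized_selection(counts, k=10)
    counts[chosen] += 1
print(counts.min(), counts.max())         # 参与次数分布保持紧凑
```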
【2】On the Fragility of Contribution Score Computation in Federated Learning
标题:联邦学习中贡献分数计算的脆弱性
链接:https://arxiv.org/abs/2509.19921
摘要:本文研究了联邦学习中贡献评估的脆弱性,这是确保公平性和激励参与的关键机制。我们认为,贡献分数容易受到显着的扭曲,从两个基本的角度:建筑敏感性和有意操纵。首先,我们探讨不同的模型聚合方法如何影响这些分数。虽然大多数研究假设一个基本的平均方法,我们证明,先进的技术,包括那些旨在处理不可靠或不同的客户端,可以无意中,但显着改变最终的分数。其次,我们探讨了中毒攻击带来的漏洞,恶意参与者策略性地操纵他们的模型更新,以夸大自己的贡献分数或降低其他参与者的重要性。通过在Flower框架内实现的不同数据集和模型架构的广泛实验,我们严格证明了聚合方法的选择和攻击者的存在都是扭曲贡献分数的有效向量,突出了对更强大的评估方案的迫切需求。
摘要:This paper investigates the fragility of contribution evaluation in federated learning, a critical mechanism for ensuring fairness and incentivizing participation. We argue that contribution scores are susceptible to significant distortions from two fundamental perspectives: architectural sensitivity and intentional manipulation. First, we explore how different model aggregation methods impact these scores. While most research assumes a basic averaging approach, we demonstrate that advanced techniques, including those designed to handle unreliable or diverse clients, can unintentionally yet significantly alter the final scores. Second, we explore vulnerabilities posed by poisoning attacks, where malicious participants strategically manipulate their model updates to inflate their own contribution scores or reduce the importance of other participants. Through extensive experiments across diverse datasets and model architectures, implemented within the Flower framework, we rigorously show that both the choice of aggregation method and the presence of attackers are potent vectors for distorting contribution scores, highlighting a critical need for more robust evaluation schemes.
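下面用一个玩具例子(合成的标量"更新"向量;留一法贡献度是常见做法之一,并非论文的确切度量)说明聚合方法的选择如何改变同一客户端的表观贡献分数:

```python
import numpy as np

def loo_contribution(updates, aggregate):
    """留一法贡献示意:以0为玩具最优点,移除客户端i后
    聚合结果范数的变化量作为其贡献分数(>0表示该客户端有益)。"""
    full = np.linalg.norm(aggregate(updates))
    scores = []
    for i in range(len(updates)):
        rest = np.delete(updates, i, axis=0)
        scores.append(np.linalg.norm(aggregate(rest)) - full)
    return np.array(scores)

rng = np.random.default_rng(0)
updates = rng.normal(0, 0.1, size=(9, 16))
updates[0] += 2.0   # 客户端0发送离群/投毒更新

mean_scores = loo_contribution(updates, lambda u: u.mean(axis=0))
median_scores = loo_contribution(updates, lambda u: np.median(u, axis=0))
# 同一客户端的表观贡献随聚合规则(均值 vs 中位数)而显著变化
print(mean_scores[0], median_scores[0])
```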
【3】C${}^2$Prompt: Class-aware Client Knowledge Interaction for Federated Continual Learning
标题:C${}^2$Prompt:用于联邦持续学习的类感知客户端知识交互
链接:https://arxiv.org/abs/2509.19674
备注:Accepted by NeurIPS 2025
摘要:联邦持续学习(FCL)解决从分布式客户端不断涌现的任务数据中学习的场景,其关键挑战在于同时应对随时间的遗忘和空间遗忘。最近,基于提示的FCL方法通过任务级提示通信表现出了先进的性能。在这项研究中,我们强调,现有的基于提示的FCL方法容易受到跨客户端提示间类级知识一致性问题的影响。类级知识一致性包括两个方面:(1)跨客户端的类内分布差距,这会降低跨提示学到的语义;(2)提示间的类级相关性,这突出了跨类知识混淆。在提示通信过程中,类级一致性不足会加剧新提示之间的知识冲突,并导致对旧提示的干扰,从而加剧时空遗忘。为了解决这些问题,我们提出了一种新颖的类感知客户端知识交互(C${}^2$Prompt)方法,在提示通信过程中显式增强类级知识一致性。具体而言,引入了本地类分布补偿机制(LCDC),以减少跨客户端的类内分布差异,从而加强类内知识一致性。此外,设计了类感知提示聚合方案(CPA),通过有选择地加强类相关知识聚合来减轻类间知识混淆。在多个FCL基准测试上的大量实验表明,C${}^2$Prompt实现了最先进的性能。我们的源代码可以在https://github.com/zhoujiahuan1991/NeurIPS2025-C2Prompt上找到。
摘要:Federated continual learning (FCL) tackles scenarios of learning from continuously emerging task data across distributed clients, where the key challenge lies in addressing both temporal forgetting over time and spatial forgetting simultaneously. Recently, prompt-based FCL methods have shown advanced performance through task-wise prompt communication. In this study, we underscore that the existing prompt-based FCL methods are prone to class-wise knowledge coherence between prompts across clients. The class-wise knowledge coherence includes two aspects: (1) intra-class distribution gap across clients, which degrades the learned semantics across prompts, (2) inter-prompt class-wise relevance, which highlights cross-class knowledge confusion. During prompt communication, insufficient class-wise coherence exacerbates knowledge conflicts among new prompts and induces interference with old prompts, intensifying both spatial and temporal forgetting. To address these issues, we propose a novel Class-aware Client Knowledge Interaction (C${}^2$Prompt) method that explicitly enhances class-wise knowledge coherence during prompt communication. Specifically, a local class distribution compensation mechanism (LCDC) is introduced to reduce intra-class distribution disparities across clients, thereby reinforcing intra-class knowledge consistency. Additionally, a class-aware prompt aggregation scheme (CPA) is designed to alleviate inter-class knowledge confusion by selectively strengthening class-relevant knowledge aggregation. Extensive experiments on multiple FCL benchmarks demonstrate that C${}^2$Prompt achieves state-of-the-art performance. Our source code is available at https://github.com/zhoujiahuan1991/NeurIPS2025-C2Prompt
【4】OmniFed: A Modular Framework for Configurable Federated Learning from Edge to HPC
标题:OmniFed:从边缘到IPC的可配置联邦学习的模块化框架
链接:https://arxiv.org/abs/2509.19396
摘要:联合学习(FL)对于边缘和高性能计算(HPC)至关重要,因为数据不是集中的,隐私至关重要。我们提出了OmniFed,一个模块化框架,围绕解耦和明确分离的关注配置,编排,通信和培训逻辑。它的体系结构支持配置驱动的原型设计和代码级覆盖您所需要的定制。我们还支持不同的拓扑结构、单一部署中的混合通信协议以及流行的训练算法。它还提供可选的隐私机制,包括差分隐私(DP),同态加密(HE)和安全聚合(SA)以及压缩策略。这些功能通过定义良好的扩展点公开,允许用户自定义拓扑和编排、学习逻辑和隐私/压缩插件,同时保持核心系统的完整性。我们评估多个模型和算法来衡量各种性能指标。通过在一个堆栈中统一拓扑配置、混合协议通信和可插拔模块,OmniFed简化了异构环境中的FL部署。Github存储库可在https://github.com/at-aaims/OmniFed获得。
摘要:Federated Learning (FL) is critical for edge and High Performance Computing (HPC) where data is not centralized and privacy is crucial. We present OmniFed, a modular framework designed around decoupling and clear separation of concerns for configuration, orchestration, communication, and training logic. Its architecture supports configuration-driven prototyping and code-level override-what-you-need customization. We also support different topologies, mixed communication protocols within a single deployment, and popular training algorithms. It also offers optional privacy mechanisms including Differential Privacy (DP), Homomorphic Encryption (HE), and Secure Aggregation (SA), as well as compression strategies. These capabilities are exposed through well-defined extension points, allowing users to customize topology and orchestration, learning logic, and privacy/compression plugins, all while preserving the integrity of the core system. We evaluate multiple models and algorithms to measure various performance metrics. By unifying topology configuration, mixed-protocol communication, and pluggable modules in one stack, OmniFed streamlines FL deployment across heterogeneous environments. Github repository is available at https://github.com/at-aaims/OmniFed.
推理|分析|理解|解释(8篇)
【1】Energy Use of AI Inference: Efficiency Pathways and Test-Time Compute
标题:人工智能推理的能源使用:效率途径和测试时间计算
链接:https://arxiv.org/abs/2509.20241
备注:A preprint version with DOI is available at Zenodo: this https URL
摘要:随着人工智能推理扩展到数十亿次查询,且新兴的推理和代理工作流增加了令牌需求,对每查询能耗的可靠估计对于容量规划、排放核算和效率优先级排序越来越重要。许多公开的估计是不一致的且夸大了能耗,因为它们是从有限的基准外推出来的,没有反映大规模下可实现的效率提升。在这篇观点文章中,我们引入了一种基于令牌吞吐量、自下而上地估计大规模LLM系统每查询能耗的方法。对于在现实工作负载、GPU利用率和PUE约束下运行于H100节点上的模型,我们估计前沿规模模型(>2000亿参数)的每查询能耗中位数为0.34 Wh(IQR:0.18-0.67)。这些结果与使用生产规模配置的测量结果一致,并表明非生产环境的估计和假设可能将能耗夸大4-20倍。扩展到每个典型查询令牌数增加15倍的测试时扩展场景,能耗中位数增加13倍,达到4.32 Wh,这表明针对这种情形的效率优化将带来最大的全系统节省。我们量化了在模型、服务平台和硬件层面可实现的效率提升,发现每查询能耗各自可降低1.5-3.5倍(中位数),而综合这些进展可以合理地实现8-20倍的降低。为了说明系统级影响,我们估计服务10亿次查询的部署的基线每日能耗为0.8 GWh/天。如果10%是长查询,需求可能增长到1.8 GWh/天。通过有针对性的效率干预,它会下降到0.9 GWh/天,与该规模的网络搜索的能源足迹相当。这与数据中心在互联网和云建设期间通过效率提升来抑制能源增长的历史相呼应。
摘要:As AI inference scales to billions of queries and emerging reasoning and agentic workflows increase token demand, reliable estimates of per-query energy use are increasingly important for capacity planning, emissions accounting, and efficiency prioritization. Many public estimates are inconsistent and overstate energy use, because they extrapolate from limited benchmarks and fail to reflect efficiency gains achievable at scale. In this perspective, we introduce a bottom-up methodology to estimate the per-query energy of large-scale LLM systems based on token throughput. For models running on an H100 node under realistic workloads, GPU utilization and PUE constraints, we estimate a median energy per query of 0.34 Wh (IQR: 0.18-0.67) for frontier-scale models (>200 billion parameters). These results are consistent with measurements using production-scale configurations and show that non-production estimates and assumptions can overstate energy use by 4-20x. Extending to test-time scaling scenarios with 15x more tokens per typical query, the median energy rises 13x to 4.32 Wh, indicating that targeting efficiency in this regime will deliver the largest fleet-wide savings. We quantify achievable efficiency gains at the model, serving platform, and hardware levels, finding individual median reductions of 1.5-3.5x in energy per query, while combined advances can plausibly deliver 8-20x reductions. To illustrate the system-level impact, we estimate the baseline daily energy use of a deployment serving 1 billion queries to be 0.8 GWh/day. If 10% are long queries, demand could grow to 1.8 GWh/day. With targeted efficiency interventions, it falls to 0.9 GWh/day, similar to the energy footprint of web search at that scale. This echoes how data centers historically tempered energy growth through efficiency gains during the internet and cloud build-up.
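按摘要的自下而上方法,下面给出一个简单的算术示意(节点功率、吞吐量等数值均为说明性假设,并非论文披露的参数):

```python
# 自下而上的每查询能耗估算示意;下列数值均为说明性假设。
node_power_w = 10_200      # 满载8xH100节点的功率(假设)
pue = 1.2                  # 数据中心PUE(假设)
throughput_tok_s = 10_000  # 高利用率下的聚合输出令牌吞吐量(假设)
tokens_per_query = 1_000   # 典型回复长度(假设)

joules_per_token = node_power_w * pue / throughput_tok_s
wh_per_query = joules_per_token * tokens_per_query / 3600
print(f"{wh_per_query:.2f} Wh/query")   # 本组假设下约0.34 Wh,与摘要中位数同量级

# 测试时扩展情形:每查询令牌数粗略放大15倍
print(f"{wh_per_query * 15:.2f} Wh/query")  # 朴素线性放大约5.1 Wh;论文报告的中位数为4.32 Wh(13倍)
```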
【2】Practical do-Shapley Explanations with Estimand-Agnostic Causal Inference
标题:具有估计不可知因果推理的实用do-Shapley解释
链接:https://arxiv.org/abs/2509.20211
备注:Accepted for publication at NeurIPS 2025
摘要:在可解释性技术中,SHAP是最受欢迎的技术之一,但往往忽略了问题的因果结构。作为回应,do-SHAP采用介入查询,但其依赖于被估量阻碍了其实际应用。为了解决这个问题,我们提出了使用估计不可知的方法,它允许从一个单一的模型估计任何可识别的查询,使做SHAP在复杂的图形上可行。我们还开发了一种新的算法,大大加快其计算在一个可以忽略不计的成本,以及一种方法来解释无法访问的数据生成过程。我们展示了我们的方法的估计和计算性能,并在两个真实世界的数据集上验证了它,突出了它在获得可靠解释方面的潜力。
摘要:Among explainability techniques, SHAP stands out as one of the most popular, but often overlooks the causal structure of the problem. In response, do-SHAP employs interventional queries, but its reliance on estimands hinders its practical application. To address this problem, we propose the use of estimand-agnostic approaches, which allow for the estimation of any identifiable query from a single model, making do-SHAP feasible on complex graphs. We also develop a novel algorithm to significantly accelerate its computation at a negligible cost, as well as a method to explain inaccessible Data Generating Processes. We demonstrate the estimation and computational performance of our approach, and validate it on two real-world datasets, highlighting its potential in obtaining reliable explanations.
【3】Faster, Smaller, and Smarter: Task-Aware Expert Merging for Online MoE Inference
标题:更快、更小、更智能:任务感知专家合并以实现在线MoE推理
链接:https://arxiv.org/abs/2509.19781
摘要:Sparse Mixture of Experts (SMoE) has become a preferred architecture for scaling Transformer capacity without increasing computational cost, as it activates only a small subset of experts for each input. However, deploying such an approach for \textit{online inference} remains challenging due to the large size of a full SMoE model and the complexity of expert routing, especially in resource-constrained edge networks. Moreover, during the online inference, task information is often unavailable, making the task-level routing error-prone. In this work, we propose a novel tree-structured adaptive neural bandit router, \texttt{Tanbr}, to enable efficient and reliable online MoE inference. Instead of relying on explicit task tags, \texttt{Tanbr} estimates the task distribution over time from historical data and uses it to guide task-aware expert merging within a given pre-trained MoE. To handle the large continuous space of merging weights, \texttt{Tanbr} employs a binary tree to progressively partition the space and generate finer candidate weights. It then applies a neural bandit to learn the non-linear mapping from merging weight to model performance and decides optimal expert merging. We prove that \texttt{Tanbr} achieves a sublinear regret bound of {\small $\mathcal{O}(\sqrt{T} \log(T))$} over {\small $T$} rounds, despite operating over a continuous decision space, matching regret bounds compared to existing methods. Extensive experiments show that \texttt{Tanbr} reduces inference latency by at least {\small $45\%$} and memory usage by up to {\small $25\%$}, while maintaining a high accuracy compared to many state-of-the-art methods.
【4】What Does Your Benchmark Really Measure? A Framework for Robust Inference of AI Capabilities
标题:你的基准真正衡量的是什么?人工智能能力稳健推理的框架
链接:https://arxiv.org/abs/2509.19590
摘要:对基准数据上的生成模型的评估如今无处不在,其结果深刻地影响着公众和科学界对人工智能能力的期望。然而,人们对它们的可靠性越来越怀疑。我们如何知道报告的准确率真正反映了模型的真实性能?评估通常被呈现为简单的测量,但实际上它们是推断:将基准分数视为能力的证据,就已经假设了关于能力是什么以及能力如何在测试中表现的理论。我们通过提出一个"评估即推断"的原则性框架来明确这一步骤:从能力理论开始,然后推导出估计能力的方法。这种观点在心理测量学等领域很常见,但在人工智能评估中尚未普及。作为概念验证,我们解决了一个破坏可靠性的核心挑战:对扰动的敏感性。在形式化了能力模型之后,我们引入了在考虑敏感性和有限样本所带来的不确定性的同时推断能力的方法,包括一种显著降低样本复杂度的自适应算法。总之,这些贡献为更可靠、更值得信赖地估计通过基准测量的人工智能能力奠定了基础。
摘要:Evaluations of generative models on benchmark data are now ubiquitous, and their outcomes critically shape public and scientific expectations of AI's capabilities. Yet growing skepticism surrounds their reliability. How can we know that a reported accuracy genuinely reflects a model's true performance? Evaluations are often presented as simple measurements, but in reality they are inferences: to treat benchmark scores as evidence of capability is already to assume a theory of what capability is and how it manifests in a test. We make this step explicit by proposing a principled framework for evaluation as inference: begin from a theory of capability, and then derive methods for estimating it. This perspective, familiar in fields such as psychometrics, has not yet become commonplace in AI evaluation. As a proof of concept, we address a central challenge that undermines reliability: sensitivity to perturbations. After formulating a model of ability, we introduce methods that infer ability while accounting for uncertainty from sensitivity and finite samples, including an adaptive algorithm that significantly reduces sample complexity. Together, these contributions lay the groundwork for more reliable and trustworthy estimates of AI capabilities as measured through benchmarks.
【5】Learning Dynamics of Deep Learning -- Force Analysis of Deep Neural Networks
标题:深度学习的学习动力学--深度神经网络的力分析
链接:https://arxiv.org/abs/2509.19554
备注:175 pages
摘要:本论文利用受力分析启发的想法,探讨深度学习模型如何随着时间的推移进行学习。具体来说,我们放大模型的训练过程,看看一个训练示例在学习过程中如何影响另一个训练示例,比如分析力如何移动物体。我们将这种影响分为两部分:两个例子有多相似,以及更新力有多强。这个框架帮助我们理解模型在不同真实系统中的广泛行为。例如,它解释了为什么某些示例具有非平凡的学习路径,为什么(以及为什么不)一些LLM微调方法有效,以及为什么更简单,更结构化的模式往往更容易学习。我们将这种方法应用于各种学习任务,并发现了改进模型训练的新策略。虽然该方法仍处于发展阶段,但它为系统地解释模型的行为提供了一种新的方法。
摘要:This thesis explores how deep learning models learn over time, using ideas inspired by force analysis. Specifically, we zoom in on the model's training procedure to see how one training example affects another during learning, like analyzing how forces move objects. We break this influence into two parts: how similar the two examples are, and how strong the updating force is. This framework helps us understand a wide range of the model's behaviors in different real systems. For example, it explains why certain examples have non-trivial learning paths, why (and why not) some LLM finetuning methods work, and why simpler, more structured patterns tend to be learned more easily. We apply this approach to various learning tasks and uncover new strategies for improving model training. While the method is still developing, it offers a new way to interpret models' behaviors systematically.
【6】Constraint-Reduced MILP with Local Outlier Factor Modeling for Plausible Counterfactual Explanations in Credit Approval
标题:具有局部异常值因子建模的约束简化MILP,用于信贷批准中合理的反事实解释
链接:https://arxiv.org/abs/2509.19504
备注:Accepted to NICE-TEAS ASIA 2025 conference
摘要:反事实解释(CE)是一种广泛使用的事后方法,为个体提供可操作的更改,以改变机器学习模型的不利预测。合理的反事实解释方法通过考虑数据分布特性提高了真实性,但其优化模型引入了大量约束,导致计算成本高。在这项工作中,我们重新审视了DACE框架,并提出了一个改进的混合整数线性规划(MILP)公式化,显著减少了局部离群因子(LOF)目标分量中的约束数量。我们还将该方法应用于带有标准缩放的线性SVM分类器。实验结果表明,我们的方法在保持解释质量的同时实现了更快的求解时间。这些结果证明了在反事实解释和数据科学应用中进行更高效LOF建模的前景。
摘要:Counterfactual explanation (CE) is a widely used post-hoc method that provides individuals with actionable changes to alter an unfavorable prediction from a machine learning model. Plausible CE methods improve realism by considering data distribution characteristics, but their optimization models introduce a large number of constraints, leading to high computational cost. In this work, we revisit the DACE framework and propose a refined Mixed-Integer Linear Programming (MILP) formulation that significantly reduces the number of constraints in the local outlier factor (LOF) objective component. We also apply the method to a linear SVM classifier with standard scaler. The experimental results show that our approach achieves faster solving times while maintaining explanation quality. These results demonstrate the promise of more efficient LOF modeling in counterfactual explanation and data science applications.
【7】Statistical Inference Leveraging Synthetic Data with Distribution-Free Guarantees
标题:利用合成数据并具有无分布保证的统计推断
链接:https://arxiv.org/abs/2509.20345
摘要:高质量合成数据(由先进的人工智能模型生成,或作为相关任务的辅助数据收集)的快速增长为统计推断带来了机遇和挑战。本文介绍了一个通用合成数据驱动推断(GESPI)框架,该框架可包裹在任何统计推断过程之外,通过结合合成数据和真实数据来安全地提高样本效率。我们的框架利用高质量的合成数据来提升统计功效,而当合成数据质量较低时,会自适应地退回到仅使用真实数据的标准推断方法。我们方法的误差在对合成数据不作任何分布假设的情况下保持在用户指定的界限之下,并随着合成数据质量的提高而降低。这种灵活性使其能够与保形预测、风险控制、假设检验和多重检验程序无缝集成,而无需修改基础推断方法。我们在标注数据有限的挑战性任务上展示了我们方法的优势,包括AlphaFold蛋白质结构预测,以及在复杂数学问题上比较大型推理模型。
摘要:The rapid proliferation of high-quality synthetic data -- generated by advanced AI models or collected as auxiliary data from related tasks -- presents both opportunities and challenges for statistical inference. This paper introduces a GEneral Synthetic-Powered Inference (GESPI) framework that wraps around any statistical inference procedure to safely enhance sample efficiency by combining synthetic and real data. Our framework leverages high-quality synthetic data to boost statistical power, yet adaptively defaults to the standard inference method using only real data when synthetic data is of low quality. The error of our method remains below a user-specified bound without any distributional assumptions on the synthetic data, and decreases as the quality of the synthetic data improves. This flexibility enables seamless integration with conformal prediction, risk control, hypothesis testing, and multiple testing procedures, all without modifying the base inference method. We demonstrate the benefits of our method on challenging tasks with limited labeled data, including AlphaFold protein structure prediction, and comparing large reasoning models on complex math problems.
【8】Quantum Harmonic Analysis and the Structure in Data: Augmentation
标题:量子调和分析与数据中的结构:增强
链接:https://arxiv.org/abs/2509.19474
备注:13 pages, 2 figures
摘要:在这篇简短的笔记中,我们研究了数据增强对高维数据集主成分平滑度的影响。利用量子调和分析的工具,我们证明了对应于增广数据集的算子的本征函数位于调制空间$M^1(\mathbb{R}^d)$中,保证了光滑性和连续性。合成和音频数据的数值例子证实了理论研究结果。虽然本身很有趣,但结果表明,流形学习和特征提取算法可以从系统和知情的增强原则中受益。
摘要:In this short note, we study the impact of data augmentation on the smoothness of principal components of high-dimensional datasets. Using tools from quantum harmonic analysis, we show that eigenfunctions of operators corresponding to augmented data sets lie in the modulation space $M^1(\mathbb{R}^d)$, guaranteeing smoothness and continuity. Numerical examples on synthetic and audio data confirm the theoretical findings. While interesting in itself, the results suggest that manifold learning and feature extraction algorithms can benefit from systematic and informed augmentation principles.
检测相关(4篇)
【1】An Improved Time Series Anomaly Detection by Applying Structural Similarity
标题:一种改进的基于结构相似性的时间序列异常检测方法
链接:https://arxiv.org/abs/2509.20184
摘要:有效的时间序列异常检测是现代工业应用和金融系统的关键。由于异常标记的稀缺性和人工标记的高成本,基于重建的无监督方法已经获得了相当大的关注。然而,准确的异常检测仍然是一个尚未解决的挑战,因为基于重构的方法的优化目标仅仅依赖于逐点距离测量,忽略了时间序列的潜在结构特征,从而无法解决复杂的模式异常。在本文中,我们提出了StrAD,一种新的结构增强的异常检测方法,以丰富的优化目标,将隐藏在时间序列中的结构信息,并引导数据重建过程,以更好地捕捉这种结构特征。StrAD在重构模型的优化目标中适应趋势、季节性和形状,以学习潜在的结构特征并捕获时间序列的内在模式变化。提出的结构感知优化目标机制可以保证原始数据和重构数据在结构特征上的一致性,从而保持全局波动和局部特征的一致性。该机制是可插拔的,适用于任何基于重构的方法,增强了模型对点式异常和模式式异常的敏感性。实验结果表明,StrAD在五个真实世界的异常检测数据集上提高了最先进的基于重建的模型的性能。
摘要:Effective anomaly detection in time series is pivotal for modern industrial applications and financial systems. Due to the scarcity of anomaly labels and the high cost of manual labeling, reconstruction-based unsupervised approaches have garnered considerable attention. However, accurate anomaly detection remains an unsettled challenge, since the optimization objectives of reconstruction-based methods merely rely on point-by-point distance measures, ignoring the potential structural characteristics of time series and thus failing to tackle complex pattern-wise anomalies. In this paper, we propose StrAD, a novel structure-enhanced anomaly detection approach to enrich the optimization objective by incorporating structural information hidden in the time series and steering the data reconstruction procedure to better capture such structural features. StrAD accommodates the trend, seasonality, and shape in the optimization objective of the reconstruction model to learn latent structural characteristics and capture the intrinsic pattern variation of time series. The proposed structure-aware optimization objective mechanism can assure the alignment between the original data and the reconstructed data in terms of structural features, thereby keeping consistency in global fluctuation and local characteristics. The mechanism is pluggable and applicable to any reconstruction-based methods, enhancing the model sensitivity to both point-wise anomalies and pattern-wise anomalies. Experimental results show that StrAD improves the performance of state-of-the-art reconstruction-based models across five real-world anomaly detection datasets.
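下面是"结构感知优化目标"这一思路的极简PyTorch示意(假设用移动平均提取趋势、按周期相位求均值提取季节性;StrAD的确切分解方式可能不同):

```python
import torch
import torch.nn.functional as F

def structure_aware_loss(x, x_hat, period=24, w_trend=0.5, w_season=0.5):
    """结构增强重构目标的示意。x, x_hat: (batch, time)。"""
    point = F.mse_loss(x_hat, x)            # 逐点重构项

    # 趋势项:用中心化移动平均(1D卷积)提取
    k = 5
    kernel = torch.ones(1, 1, k) / k
    trend = lambda s: F.conv1d(s.unsqueeze(1), kernel, padding=k // 2).squeeze(1)
    trend_loss = F.mse_loss(trend(x_hat), trend(x))

    # 季节项:按周期内相位位置求均值得到季节轮廓
    season = lambda s: s.view(s.size(0), -1, period).mean(dim=1)
    season_loss = F.mse_loss(season(x_hat), season(x))

    return point + w_trend * trend_loss + w_season * season_loss

x = torch.randn(8, 96)    # 8条序列,96步(例如4天的逐小时数据)
print(structure_aware_loss(x, x + 0.1 * torch.randn_like(x)).item())
```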
【2】Solving Freshness in RAG: A Simple Recency Prior and the Limits of Heuristic Trend Detection
标题:解决RAG中的新鲜度:简单的近因先验和启发式趋势检测的局限性
链接:https://arxiv.org/abs/2509.19376
摘要:我们使用两种方法对网络安全数据解决RAG系统中的时间故障。一个简单的近因先验在新鲜度任务上实现了1.00的准确度。相比之下,用于主题进化的聚类启发式算法失败了(0.08 F1分数),这表明趋势检测需要超出简单启发式算法的方法。
摘要:We address temporal failures in RAG systems using two methods on cybersecurity data. A simple recency prior achieved an accuracy of 1.00 on freshness tasks. In contrast, a clustering heuristic for topic evolution failed (0.08 F1-score), showing trend detection requires methods beyond simple heuristics.
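近因先验本身非常简单,下面是一个假设性的实现草图:检索相似度乘以指数时间衰减(半衰期half_life_days为假设参数),即可在新鲜度任务上偏向较新的文档。

```python
from datetime import datetime, timezone

def recency_score(similarity: float, doc_time: datetime,
                  now: datetime, half_life_days: float = 30.0) -> float:
    """Combine retrieval similarity with an exponential recency prior:
    a document loses half its weight every `half_life_days`."""
    age_days = (now - doc_time).total_seconds() / 86400.0
    return similarity * 0.5 ** (age_days / half_life_days)

docs = [
    {"id": "cve-old", "sim": 0.90, "t": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {"id": "cve-new", "sim": 0.80, "t": datetime(2025, 9, 1, tzinfo=timezone.utc)},
]
now = datetime(2025, 9, 25, tzinfo=timezone.utc)
ranked = sorted(docs, key=lambda d: recency_score(d["sim"], d["t"], now), reverse=True)
print([d["id"] for d in ranked])  # fresher document wins despite lower similarity
```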
【3】Deep learning for exoplanet detection and characterization by direct imaging at high contrast
标题:通过高对比度直接成像进行系外行星检测和定性的深度学习
链接:https://arxiv.org/abs/2509.20310
备注:SF2A 2025
摘要:由于对高角分辨率和高对比度的需求,系外行星成像是天体物理学中的一个主要挑战。我们为高对比度下污染多元图像序列的干扰(nuisance)分量提出了一个多尺度统计模型。该模型被集成到一个可学习的架构中,利用了问题的物理原理,并能够以在检测信噪比意义下最优的方式融合同一颗恒星的多次观测。应用于VLT/SPHERE仪器的数据,该方法显著提高了检测灵敏度以及天体测量和光度估计的精度。
摘要:Exoplanet imaging is a major challenge in astrophysics due to the need for high angular resolution and high contrast. We present a multi-scale statistical model for the nuisance component corrupting multivariate image series at high contrast. Integrated into a learnable architecture, it leverages the physics of the problem and enables the fusion of multiple observations of the same star in a way that is optimal in terms of detection signal-to-noise ratio. Applied to data from the VLT/SPHERE instrument, the method significantly improves the detection sensitivity and the accuracy of astrometric and photometric estimation.
【4】Hybrid Pipeline SWD Detection in Long-Term EEG Signals
标题:长期脑电信号中的混合管道SWD检测
链接:https://arxiv.org/abs/2509.19387
备注:11 pages, 8 figures, 4 tables, SABI 2025 CLIC 2025
摘要:棘波放电(SWD)是失神癫痫的脑电图标志,但在多日记录中手动识别它们仍然是劳动密集型和容易出错的。我们提出了一个轻量级的混合管道,将解析特征与浅层人工神经网络(ANN)相结合,用于在长期单极EEG中进行准确的、患者特定的SWD检测。双侧移动平均(MA)滤波器首先抑制正常背景活动的高频分量。然后,通过其正态分布样本的平均值和标准差来总结残差信号,从而为每个20 s窗口产生紧凑的二维特征向量。这些特征被馈送到通过反向传播训练的单隐藏层ANN,以将每个窗口分类为SWD或非SWD。该方法在来自12名患者、以256 Hz采样的780个通道上进行了评估,其中包括392个标注的SWD事件。它正确检测到384个事件(灵敏度:98%),同时达到96.2%的特异性和97.2%的总体准确性。由于特征提取是解析式的,且分类器很小,该流水线可实时运行,无需手动调节阈值。这些结果表明,正态分布描述符结合小型ANN为长时程EEG记录的自动SWD筛查提供了有效且计算成本低廉的解决方案。
摘要:Spike-and-wave discharges (SWDs) are the electroencephalographic hallmark of absence epilepsy, yet their manual identification in multi-day recordings remains labour-intensive and error-prone. We present a lightweight hybrid pipeline that couples analytical features with a shallow artificial neural network (ANN) for accurate, patient-specific SWD detection in long-term, monopolar EEG. A two-sided moving-average (MA) filter first suppresses the high-frequency components of normal background activity. The residual signal is then summarised by the mean and the standard deviation of its normally distributed samples, yielding a compact, two-dimensional feature vector for every 20 s window. These features are fed to a single-hidden-layer ANN trained via back-propagation to classify each window as SWD or non-SWD. The method was evaluated on 780 channels sampled at 256 Hz from 12 patients, comprising 392 annotated SWD events. It correctly detected 384 events (sensitivity: 98%) while achieving a specificity of 96.2% and an overall accuracy of 97.2%. Because feature extraction is analytic, and the classifier is small, the pipeline runs in real-time and requires no manual threshold tuning. These results indicate that normal-distribution descriptors combined with a modest ANN provide an effective and computationally inexpensive solution for automated SWD screening in extended EEG recordings.
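论文的流水线是解析式的,可以用几行代码勾勒其思路(示意性草图,非原始实现;其中移动平均长度ma_len、合成数据与随机标签均为占位假设):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

FS = 256          # sampling rate (Hz), as stated in the paper
WIN = 20 * FS     # 20-second analysis window

def window_features(eeg: np.ndarray, ma_len: int = 13) -> np.ndarray:
    """Two-sided moving-average filter, then summarize each 20 s window of
    the residual by its mean and standard deviation (a 2-D feature vector)."""
    kernel = np.ones(ma_len) / ma_len
    background = np.convolve(eeg, kernel, mode="same")   # smoothed background
    residual = eeg - background                          # high-frequency residual
    n_win = len(residual) // WIN
    wins = residual[: n_win * WIN].reshape(n_win, WIN)
    return np.column_stack([wins.mean(axis=1), wins.std(axis=1)])

# Synthetic stand-in for a labeled recording (the real data is patient EEG).
rng = np.random.default_rng(0)
eeg = rng.normal(size=FS * 60 * 10)          # 10 minutes of "EEG"
X = window_features(eeg)
y = rng.integers(0, 2, size=len(X))          # placeholder SWD / non-SWD labels

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))
```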
分类|识别(5篇)
【1】Thinking While Listening: Simple Test Time Scaling For Audio Classification
标题:边听边思考:音频分类的简单测试时间缩放
链接:https://arxiv.org/abs/2509.19676
备注:6 pages, 3 figures, 2 Tables, ICASSP 2026
摘要:我们提出了一个框架,使神经模型能够对日常声音"边听边思考",从而提高音频分类性能。受大型语言模型推理能力最新进展的启发,我们解决了两个核心问题:(i)如何将思考纳入现有的音频分类管道,以实现类别空间中的推理并提高性能;(ii)能否从头设计一个同时支持思考和测试时间缩放的新架构?我们证明,在这两种设置中,我们的模型都表现出更高的分类精度。利用测试时间缩放,我们观察到随着采样推理轨迹数量的增加,性能持续提升。此外,我们评估了两个开源推理模型GPT-OSS-20B和Qwen3-14B,结果表明,虽然这些模型能够进行zero-shot推理,但一种轻量级方法(仅重新训练GPT-2等冻结小模型的嵌入矩阵)就可以超越十亿参数文本推理模型的性能。
摘要:We propose a framework that enables neural models to "think while listening" to everyday sounds, thereby enhancing audio classification performance. Motivated by recent advances in the reasoning capabilities of large language models, we address two central questions: (i) how can thinking be incorporated into existing audio classification pipelines to enable reasoning in the category space and improve performance, and (ii) can a new architecture be designed from the ground up to support both thinking and test-time scaling? We demonstrate that in both settings, our models exhibit improved classification accuracy. Leveraging test-time scaling, we observe consistent gains as the number of sampled traces increases. Furthermore, we evaluate two open-source reasoning models, GPT-OSS-20B and Qwen3-14B, showing that while such models are capable of zero-shot reasoning, a lightweight approach--retraining only the embedding matrix of a frozen, smaller model like GPT-2--can surpass the performance of billion-parameter text-based reasoning models.
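"测试时间缩放"部分可以用一个极简的多数投票草图来说明(noisy_model为玩具化的占位模型,并非论文的音频分类器):

```python
import random
from collections import Counter

def classify_with_traces(sample_trace, n_traces: int = 8) -> str:
    """Test-time scaling: sample several reasoning traces for one clip
    and return the majority-vote label."""
    votes = [sample_trace() for _ in range(n_traces)]
    return Counter(votes).most_common(1)[0][0]

# Toy stand-in for a stochastic audio classifier, right 70% of the time.
random.seed(0)
def noisy_model() -> str:
    return "dog_bark" if random.random() < 0.7 else "siren"

print(classify_with_traces(noisy_model, n_traces=1))   # single trace: noisy
print(classify_with_traces(noisy_model, n_traces=32))  # more traces: stabler
```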
【2】Efficient Online Large-Margin Classification via Dual Certificates
标题:通过对偶证书实现高效在线大间隔分类
链接:https://arxiv.org/abs/2509.19670
摘要:在线分类是优化、统计学习和数据科学中的核心问题。感知器等经典算法在线性可分数据上提供高效的更新和有限的错误保证,但它们没有利用分类问题潜在的几何结构。我们通过对偶形式研究离线最大间隔问题,并利用由此产生的几何见解,为在线设置设计了一个有原则且高效的算法。我们方法的一个关键特征是继承自离线公式的平移不变性,它在性能分析中起着核心作用。我们的理论分析产生了改进的错误界和间隔界,它们只依赖于平移不变量,在相同假设下的有利设置中提供了比现有算法更强的保证。特别是,我们确定了一个参数区间,在其中我们的算法对每个序列最多犯两个错误,而感知器可以被迫犯任意多的错误。我们在真实数据上的数值研究进一步表明,我们的方法在计算效率上与现有在线算法相当,同时在准确性上显著优于它们。
摘要:Online classification is a central problem in optimization, statistical learning and data science. Classical algorithms such as the perceptron offer efficient updates and finite mistake guarantees on linearly separable data, but they do not exploit the underlying geometric structure of the classification problem. We study the offline maximum margin problem through its dual formulation and use the resulting geometric insights to design a principled and efficient algorithm for the online setting. A key feature of our method is its translation invariance, inherited from the offline formulation, which plays a central role in its performance analysis. Our theoretical analysis yields improved mistake and margin bounds that depend only on translation-invariant quantities, offering stronger guarantees than existing algorithms under the same assumptions in favorable settings. In particular, we identify a parameter regime where our algorithm makes at most two mistakes per sequence, whereas the perceptron can be forced to make arbitrarily many mistakes. Our numerical study on real data further demonstrates that our method matches the computational efficiency of existing online algorithms, while significantly outperforming them in accuracy.
【3】MAGIC: Multi-task Gaussian process for joint imputation and classification in healthcare time series
标题:MAGIC:医疗保健时间序列中联合插补和分类的多任务高斯过程
链接:https://arxiv.org/abs/2509.19577
备注:36 pages, 4 figures
摘要:时间序列分析已成为改善医疗保健应用中患者诊断和管理的重要工具。然而,这些应用通常面临两个关键挑战:时间错位和数据稀疏。传统方法通过先插补、后预测的两步流程来解决这些问题。我们提出了MAGIC(Multi-tAsk Gaussian Process for Imputation and Classification),这是一种新的统一框架,在与函数型逻辑回归耦合的分层多任务高斯过程中,同时执行类别信息引导的缺失值插补和标签预测。为了处理难以计算的似然分量,MAGIC采用带有界误差分析的泰勒展开近似;参数估计则使用带块坐标优化的EM算法,并有收敛性分析支持。我们通过两个医疗保健应用验证MAGIC:预测轻度创伤性脑损伤后创伤后头痛的改善,以及预测ICU入院后48小时内的住院死亡率。在这两种应用中,与现有方法相比,MAGIC均实现了更高的预测精度。利用有限样本生成实时且准确预测的能力有助于早期临床评估和治疗计划,使医疗保健提供者能够做出更明智的治疗决策。
摘要:Time series analysis has emerged as an important tool for improving patient diagnosis and management in healthcare applications. However, these applications commonly face two critical challenges: time misalignment and data sparsity. Traditional approaches address these issues through a two-step process of imputation followed by prediction. We propose MAGIC (Multi-tAsk Gaussian Process for Imputation and Classification), a novel unified framework that simultaneously performs class-informed missing value imputation and label prediction within a hierarchical multi-task Gaussian process coupled with functional logistic regression. To handle intractable likelihood components, MAGIC employs Taylor expansion approximations with bounded error analysis, and parameter estimation is performed using EM algorithm with block coordinate optimization supported by convergence analysis. We validate MAGIC through two healthcare applications: prediction of post-traumatic headache improvement following mild traumatic brain injury and prediction of in-hospital mortality within 48 hours after ICU admission. In both applications, MAGIC achieves superior predictive accuracy compared to existing methods. The ability to generate real-time and accurate predictions with limited samples facilitates early clinical assessment and treatment planning, enabling healthcare providers to make more informed treatment decisions.
【4】Low-Cost Sensor Fusion Framework for Organic Substance Classification and Quality Control Using Classification Methods
标题:使用分类方法进行有机物质分类和质量控制的低成本传感器融合框架
链接:https://arxiv.org/abs/2509.19367
备注:Copyright 2025 IEEE. This is the author's version of the work accepted for publication in FMLDS 2025. The final version will be published by IEEE and available via DOI (to be inserted when available). Accepted at FMLDS 2025, to appear in IEEE Xplore. 8 pages, 17 figures, 3 tables
摘要:我们提出了一个传感器融合框架,用于快速,非破坏性的分类和有机物质的质量控制,建立在一个标准的Arduino Mega 2560微控制器平台上,配备了三个商业环境和气体传感器。本研究中使用的所有数据都是在内部生成的:十个不同类别的传感器输出-包括苹果汁,洋葱,大蒜和生姜的新鲜和过期样品,以及肉桂和豆蔻-使用这种硬件设置系统地收集和标记,从而产生独特的特定于应用的数据集。相关性分析被用作特征选择的预处理流水线的一部分。在预处理和降维(PCA/LDA)之后,多个监督学习模型-包括支持向量机(SVM),决策树(DT)和随机森林(RF),每个模型都具有超参数调整,以及人工神经网络(ANN)和集成投票分类器-在收集的数据集上进行训练和交叉验证。性能最好的模型,包括调整的随机森林,集成和人工神经网络,实现了93%至94%的测试精度。这些结果表明,基于Arduino Mega 2560的低成本多传感器平台,结合先进的机器学习和相关性驱动的特征工程,能够可靠地识别和质量控制有机化合物。
摘要:We present a sensor-fusion framework for rapid, non-destructive classification and quality control of organic substances, built on a standard Arduino Mega 2560 microcontroller platform equipped with three commercial environmental and gas sensors. All data used in this study were generated in-house: sensor outputs for ten distinct classes - including fresh and expired samples of apple juice, onion, garlic, and ginger, as well as cinnamon and cardamom - were systematically collected and labeled using this hardware setup, resulting in a unique, application-specific dataset. Correlation analysis was employed as part of the preprocessing pipeline for feature selection. After preprocessing and dimensionality reduction (PCA/LDA), multiple supervised learning models - including Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), each with hyperparameter tuning, as well as an Artificial Neural Network (ANN) and an ensemble voting classifier - were trained and cross-validated on the collected dataset. The best-performing models, including tuned Random Forest, ensemble, and ANN, achieved test accuracies in the 93 to 94 percent range. These results demonstrate that low-cost, multisensory platforms based on the Arduino Mega 2560, combined with advanced machine learning and correlation-driven feature engineering, enable reliable identification and quality control of organic compounds.
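摘要描述的是相当标准的"标准化 + PCA降维 + 调参分类器"流程,下面用scikit-learn给出一个示意草图(数据为随机占位,传感器通道数、网格取值等均为假设):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder for the in-house dataset: fused readings from three sensors.
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 9))          # 9 sensor readings per sample (assumed)
y = rng.integers(0, 10, size=600)      # 10 substance classes

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("rf", RandomForestClassifier(random_state=42)),
])
grid = GridSearchCV(pipe, {"rf__n_estimators": [100, 300],
                           "rf__max_depth": [None, 10]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```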
【5】LibEMER: A novel benchmark and algorithms library for EEG-based Multimodal Emotion Recognition
标题:LibEMER:基于脑电波的多模式情绪识别的新型基准和算法库
链接:https://arxiv.org/abs/2509.19330
备注:5 pages, 2 figures
摘要:基于EEG的多模态情感识别(EMER)已经得到了广泛的关注并取得了显著的进展,人类神经系统固有的复杂性促使人们对多模态方法进行了大量的努力。然而,这一领域目前受到三个关键限制:(i)缺乏开源实现。(ii)缺乏公平业绩分析的标准化和透明的基准。(iii)关于主要挑战和有希望的研究方向的深入讨论明显不足。为了应对这些挑战,我们引入了LibEMER,这是一个统一的评估框架,它提供了完全可复制的PyTorch实现,以及用于数据预处理、模型实现和实验设置的标准化协议。该框架可以在两个学习任务中对三个广泛使用的公共数据集进行公正的性能评估。开放源代码库可在以下网址公开访问:https://anonymous.4open.science/r/2025ULUIUBUEUMUEUR485384
摘要:EEG-based multimodal emotion recognition(EMER) has gained significant attention and witnessed notable advancements, the inherent complexity of human neural systems has motivated substantial efforts toward multimodal approaches. However, this field currently suffers from three critical limitations: (i) the absence of open-source implementations. (ii) the lack of standardized and transparent benchmarks for fair performance analysis. (iii) in-depth discussion regarding main challenges and promising research directions is a notable scarcity. To address these challenges, we introduce LibEMER, a unified evaluation framework that provides fully reproducible PyTorch implementations of curated deep learning methods alongside standardized protocols for data preprocessing, model realization, and experimental setups. This framework enables unbiased performance assessment on three widely-used public datasets across two learning tasks. The open-source library is publicly accessible at: https://anonymous.4open.science/r/2025ULUIUBUEUMUEUR485384
表征(2篇)
【1】Diffusion-Augmented Contrastive Learning: A Noise-Robust Encoder for Biosignal Representations
标题:扩散增强对比学习:生物信号表示的噪音稳健编码器
链接:https://arxiv.org/abs/2509.20048
摘要:生物信号的鲁棒表示学习通常受到设计有效数据增强的挑战的阻碍。传统方法无法捕获生理数据中固有的复杂变化。在此背景下,我们提出了一种新的混合框架,扩散增强对比学习(DACL),融合了扩散模型和监督对比学习的概念。DACL框架在由轻量级变分自动编码器(VAE)创建的潜在空间上运行,该编码器在我们的新型散射Transformer(ST)特征上训练[12]。它利用扩散前向过程作为原则性的数据增强技术来生成这些潜在嵌入的多个噪声视图。然后,使用监督对比目标训练U-Net风格的编码器,以学习一种表示,该表示在各种扩散时间步长上平衡类别区分与对噪声的鲁棒性。我们在PhysioNet 2017 ECG数据集上评估了这种概念验证方法,获得了0.7815的竞争性AUROC。这项工作通过使用扩散过程本身来驱动对比目标,建立了一种新的表示学习范式,创建了噪声不变的嵌入,为类可分性奠定了坚实的基础。
摘要:Learning robust representations for biosignals is often hampered by the challenge of designing effective data augmentations. Traditional methods can fail to capture the complex variations inherent in physiological data. Within this context, we propose a novel hybrid framework, Diffusion-Augmented Contrastive Learning (DACL), that fuses concepts from diffusion models and supervised contrastive learning. The DACL framework operates on a latent space created by a lightweight Variational Autoencoder (VAE) trained on our novel Scattering Transformer (ST) features [12]. It utilizes the diffusion forward process as a principled data augmentation technique to generate multiple noisy views of these latent embeddings. A U-Net style encoder is then trained with a supervised contrastive objective to learn a representation that balances class discrimination with robustness to noise across various diffusion time steps. We evaluated this proof-of-concept method on the PhysioNet 2017 ECG dataset, achieving a competitive AUROC of 0.7815. This work establishes a new paradigm for representation learning by using the diffusion process itself to drive the contrastive objective, creating noise-invariant embeddings that demonstrate a strong foundation for class separability.
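DACL的核心增强步骤是把扩散前向过程当作数据增强,对VAE潜变量加噪。下面是前向过程q(z_t|z_0)的一个示意实现(线性beta调度为常见假设,论文的具体调度未知):

```python
import torch

def diffusion_views(z0: torch.Tensor, alphas_cumprod: torch.Tensor,
                    timesteps: list[int]) -> list[torch.Tensor]:
    """Forward-process augmentation: noisy views of a latent embedding via
    q(z_t | z_0) = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps."""
    views = []
    for t in timesteps:
        abar = alphas_cumprod[t]
        eps = torch.randn_like(z0)
        views.append(abar.sqrt() * z0 + (1 - abar).sqrt() * eps)
    return views

# Standard linear beta schedule (an assumption; the paper's may differ).
T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

z0 = torch.randn(32, 64)                       # batch of VAE latents
views = diffusion_views(z0, alphas_cumprod, [10, 100, 400])
print([v.shape for v in views])
```

随后即可在这些加噪视图上施加监督对比目标,得到对噪声不变的嵌入。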
【2】Representation-based Broad Hallucination Detectors Fail to Generalize Out of Distribution
标题:基于表象的广泛幻觉检测器未能概括出分布
链接:https://arxiv.org/abs/2509.19372
备注:Accepted in EMNLP 2025 Findings
摘要:我们批判性地评估了当前SOTA在幻觉检测中的有效性,并发现其在RAGTruth数据集上的性能主要是由与数据的虚假相关性驱动的。控制这种效应,最先进的性能并不比监督线性探测器更好,同时需要跨数据集进行广泛的超参数调整。分布外泛化目前还无法实现,所有分析的方法都接近随机。我们提出了一套指导方针的幻觉检测和评估。
摘要:We critically assess the efficacy of the current SOTA in hallucination detection and find that its performance on the RAGTruth dataset is largely driven by a spurious correlation with data. Controlling for this effect, state-of-the-art performs no better than supervised linear probes, while requiring extensive hyperparameter tuning across datasets. Out-of-distribution generalization is currently out of reach, with all of the analyzed methods performing close to random. We propose a set of guidelines for hallucination detection and its evaluation.
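摘要提到"监督线性探针"这一基线,其形式大致如下(隐藏状态用随机矩阵占位;在真实设置中它们取自LLM的中间表示):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder "hidden states": one vector per generated answer, plus binary
# hallucination labels; real features would come from the LLM's activations.
rng = np.random.default_rng(0)
H = rng.normal(size=(2000, 768))
labels = rng.integers(0, 2, size=2000)

Htr, Hte, ytr, yte = train_test_split(H, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(Htr, ytr)
print("probe accuracy:", probe.score(Hte, yte))  # ~0.5 here: the data is random
```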
编码器(1篇)
【1】Geometric Autoencoder Priors for Bayesian Inversion: Learn First Observe Later
标题:几何自动编码器的Bayesian倒置先验:先学习后观察
链接:https://arxiv.org/abs/2509.19929
摘要:不确定性量化(UQ)是工程应用中推理的重要手段。一个常见的推理任务是从少量噪声观测中恢复物理系统的全场信息,这通常是一个高度不适定的问题。重要的是,工程系统通常具有复杂且多变的几何形状,使得标准贝叶斯UQ无法使用。在这项工作中,我们介绍了贝叶斯反演的几何自动编码器(GABI),这是一个用于学习物理响应的几何感知生成模型的框架,这些模型可以作为贝叶斯反演的高度信息化的几何条件先验。遵循"先学习,后观察"的范式,GABI将来自大量不同几何形状系统数据集的信息蒸馏为丰富的潜在先验,而无需掌握控制偏微分方程、边界条件或观测过程的知识。在推理时,该先验与特定观测过程的似然无缝结合,产生几何适应的后验分布。我们提出的框架与架构无关。对近似贝叶斯计算(ABC)采样的创造性使用产生了能够利用现代GPU硬件的高效实现。我们在以下问题上测试了我们的方法:矩形区域上的稳态传热;绕翼型的雷诺平均纳维-斯托克斯(RANS)流动;3D车身上的亥姆霍兹共振和声源定位;地形上的RANS气流。我们发现:在适用监督学习的受限设置中,预测精度与确定性监督学习方法相当;在具有复杂几何形状的挑战性问题上,UQ校准良好且稳健。该方法提供了一种灵活的、几何感知的、一次训练随处使用的基础模型,它独立于任何特定的观测过程。
摘要:Uncertainty Quantification (UQ) is paramount for inference in engineering applications. A common inference task is to recover full-field information of physical systems from a small number of noisy observations, a usually highly ill-posed problem. Critically, engineering systems often have complicated and variable geometries prohibiting the use of standard Bayesian UQ. In this work, we introduce Geometric Autoencoders for Bayesian Inversion (GABI), a framework for learning geometry-aware generative models of physical responses that serve as highly informative geometry-conditioned priors for Bayesian inversion. Following a ''learn first, observe later'' paradigm, GABI distills information from large datasets of systems with varying geometries, without requiring knowledge of governing PDEs, boundary conditions, or observation processes, into a rich latent prior. At inference time, this prior is seamlessly combined with the likelihood of the specific observation process, yielding a geometry-adapted posterior distribution. Our proposed framework is architecture agnostic. A creative use of Approximate Bayesian Computation (ABC) sampling yields an efficient implementation that utilizes modern GPU hardware. We test our method on: steady-state heat over rectangular domains; Reynold-Averaged Navier-Stokes (RANS) flow around airfoils; Helmholtz resonance and source localization on 3D car bodies; RANS airflow over terrain. We find: the predictive accuracy to be comparable to deterministic supervised learning approaches in the restricted setting where supervised learning is applicable; UQ to be well calibrated and robust on challenging problems with complex geometries. The method provides a flexible geometry-aware train-once-use-anywhere foundation model which is independent of any particular observation process.
优化|敛散性(3篇)
【1】Ads that Stick: Near-Optimal Ad Optimization through Psychological Behavior Models
标题:坚持的广告:通过心理行为模型进行近乎最优的广告优化
链接:https://arxiv.org/abs/2509.20304
摘要:优化广告的时间和频率是数字广告的核心问题,具有重大的经济后果。现有的调度策略依赖于统一间隔、频率上限等简单启发式方法,忽略了用户的长期兴趣。然而,众所周知,用户的长期兴趣和参与是几种心理效应相互作用的结果(Curmei,Haupt,Recht,Hadfield-Menell,ACM CRS,2022)。 在这项工作中,我们基于三个关键的心理学原则对展示广告后用户兴趣的变化进行建模:单纯曝光、享乐适应和操作性条件反射。前两种效应用用户兴趣随重复曝光变化的凹函数建模,第三种效应用时间衰减函数建模,它解释了过度曝光导致的用户兴趣下降。在我们的心理行为模型下,我们提出以下问题:给定一个连续的时间区间$T$,应该展示多少广告、在什么时间展示,才能最大化用户对广告的兴趣? 为了回答这个问题,我们首先表明,如果展示的广告数量固定,那么最优广告时间表只取决于操作性条件反射函数。我们的主要结果是一个准线性时间算法,它输出一个接近最优的广告时间表,即我们的时间表与最优时间表的性能差异是指数级小的。我们的算法带来了关于最优广告投放的重要见解,并表明在许多自然设置中,均匀间隔等简单启发式方法是次优的。要展示的广告的最优数量还取决于单纯曝光函数和享乐适应函数,给定上述算法后可通过简单的线性搜索找到。我们进一步用实验结果支持我们的发现,证明我们的策略优于各种基线。
摘要:Optimizing the timing and frequency of ads is a central problem in digital advertising, with significant economic consequences. Existing scheduling policies rely on simple heuristics, such as uniform spacing and frequency caps, that overlook long-term user interest. However, it is well-known that users' long-term interest and engagement result from the interplay of several psychological effects (Curmei, Haupt, Recht, Hadfield-Menell, ACM CRS, 2022). In this work, we model change in user interest upon showing ads based on three key psychological principles: mere exposure, hedonic adaptation, and operant conditioning. The first two effects are modeled using a concave function of user interest with repeated exposure, while the third effect is modeled using a temporal decay function, which explains the decline in user interest due to overexposure. Under our psychological behavior model, we ask the following question: Given a continuous time interval $T$, how many ads should be shown, and at what times, to maximize the user interest towards the ads? Towards answering this question, we first show that, if the number of displayed ads is fixed, then the optimal ad-schedule only depends on the operant conditioning function. Our main result is a quasi-linear time algorithm that outputs a near-optimal ad-schedule, i.e., the difference in the performance of our schedule and the optimal schedule is exponentially small. Our algorithm leads to significant insights about optimal ad placement and shows that simple heuristics such as uniform spacing are sub-optimal under many natural settings. The optimal number of ads to display, which also depends on the mere exposure and hedonistic adaptation functions, can be found through a simple linear search given the above algorithm. We further support our findings with experimental results, demonstrating that our strategy outperforms various baselines.
【2】On the Rate of Convergence of Kolmogorov-Arnold Network Regression Estimators
标题:Kolmogorov-Arnold网络回归估计的收敛速度
链接:https://arxiv.org/abs/2509.19830
摘要:Kolmogorov-Arnold网络(KAN)通过加法或乘法聚合组合单变量变换,为多元函数逼近提供了一个结构化且可解释的框架。本文建立了单变量分量用B样条表示时KAN的理论收敛保证。我们证明了对于光滑度为$r$的Sobolev空间中的函数,加性KAN和混合加乘KAN都能达到极小极大最优收敛速度$O(n^{-2r/(2r+1)})$。我们进一步推导了B样条最优节点数的选择准则。该理论得到了模拟研究的支持,证实了预测的收敛速度。这些结果为在非参数回归中使用KAN提供了理论基础,并突出了它们作为现有方法的结构化替代方案的潜力。
摘要:Kolmogorov-Arnold Networks (KANs) offer a structured and interpretable framework for multivariate function approximation by composing univariate transformations through additive or multiplicative aggregation. This paper establishes theoretical convergence guarantees for KANs when the univariate components are represented by B-splines. We prove that both additive and hybrid additive-multiplicative KANs attain the minimax-optimal convergence rate $O(n^{-2r/(2r+1)})$ for functions in Sobolev spaces of smoothness $r$. We further derive guidelines for selecting the optimal number of knots in the B-splines. The theory is supported by simulation studies that confirm the predicted convergence rates. These results provide a theoretical foundation for using KANs in nonparametric regression and highlight their potential as a structured alternative to existing methods.
【3】BioBO: Biology-informed Bayesian Optimization for Perturbation Design
标题:BioBO:基于生物学的Bayesian优化微扰设计
链接:https://arxiv.org/abs/2509.19988
备注:NeurIPS: Structured Probabilistic Inference & Generative Modeling, 2025
摘要:基因组扰动实验的有效设计对于加速药物发现和治疗靶点识别至关重要,但由于潜在遗传相互作用的巨大搜索空间和实验限制,对人类基因组进行穷举式扰动仍然不可行。贝叶斯优化(BO)已成为选择信息性干预的强大框架,但现有方法往往无法利用特定领域的生物先验知识。我们提出了生物学信息引导的贝叶斯优化(BioBO),一种将贝叶斯优化与多模态基因嵌入和富集分析(生物学中广泛使用的基因优先级排序工具)相结合的方法,以增强代理建模和采集策略。BioBO在一个有原则的框架中将具有生物学依据的先验与采集函数结合起来,使搜索偏向有希望的基因,同时保持探索不确定区域的能力。通过在已有公共基准和数据集上的实验,我们证明BioBO将标记效率提高了25-40%,并且通过更有效地识别表现最佳的扰动,始终优于传统BO。此外,通过整合富集分析,BioBO为所选扰动产生了通路级解释,提供了将设计与生物学上连贯的调控回路联系起来的机制可解释性。
摘要:Efficient design of genomic perturbation experiments is crucial for accelerating drug discovery and therapeutic target identification, yet exhaustive perturbation of the human genome remains infeasible due to the vast search space of potential genetic interactions and experimental constraints. Bayesian optimization (BO) has emerged as a powerful framework for selecting informative interventions, but existing approaches often fail to exploit domain-specific biological prior knowledge. We propose Biology-Informed Bayesian Optimization (BioBO), a method that integrates Bayesian optimization with multimodal gene embeddings and enrichment analysis, a widely used tool for gene prioritization in biology, to enhance surrogate modeling and acquisition strategies. BioBO combines biologically grounded priors with acquisition functions in a principled framework, which biases the search toward promising genes while maintaining the ability to explore uncertain regions. Through experiments on established public benchmarks and datasets, we demonstrate that BioBO improves labeling efficiency by 25-40%, and consistently outperforms conventional BO by identifying top-performing perturbations more effectively. Moreover, by incorporating enrichment analysis, BioBO yields pathway-level explanations for selected perturbations, offering mechanistic interpretability that links designs to biologically coherent regulatory circuits.
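一个可能的理解是把富集分析得分用作采集函数的偏置。下面的草图在期望改进(EI)上做指数重加权(biology_informed_acquisition及权重lam均为假设性示例,并非论文的原始公式):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """Standard EI for maximization, vectorized over candidates."""
    z = (mu - best) / np.maximum(sigma, 1e-9)
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def biology_informed_acquisition(mu, sigma, best, prior_score, lam=1.0):
    """Bias EI toward genes favored by an enrichment-style prior while keeping
    exploration: a simple multiplicative reweighting (one plausible reading)."""
    return expected_improvement(mu, sigma, best) * np.exp(lam * prior_score)

rng = np.random.default_rng(0)
mu, sigma = rng.normal(size=100), rng.uniform(0.1, 1.0, size=100)
prior = rng.uniform(0, 1, size=100)            # e.g., normalized enrichment score
scores = biology_informed_acquisition(mu, sigma, mu.max(), prior)
print("next perturbation:", int(np.argmax(scores)))
```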
预测|估计(13篇)
【1】Process-Informed Forecasting of Complex Thermal Dynamics in Pharmaceutical Manufacturing
标题:制药制造中复杂热动力学的过程知情预测
链接:https://arxiv.org/abs/2509.20349
摘要:对复杂物理系统进行精确的时间序列预测是现代工业监测和控制的基础。虽然深度学习模型擅长捕捉复杂的动态,但由于物理不一致性和鲁棒性不足,它们目前的部署受到限制,因而制约了其在受监管环境中的可靠性。我们提出了用于药物冻干过程温度预测的过程知情预测(PIF)模型。我们研究了多种模型,从经典的自回归综合移动平均模型(ARIMA)和指数平滑模型(ETS),到包括Kolmogorov-Arnold网络(KAN)在内的现代深度学习架构。我们比较了三种集成过程知情轨迹先验的损失函数公式:固定权重损失、基于动态不确定性的损失和基于残差的注意力(RBA)机制。我们不仅评估所有模型的准确性和物理合理性,还评估其对传感器噪声的鲁棒性。此外,我们在一个新过程的迁移学习场景中测试了最佳模型的实际泛化能力。我们的研究结果表明,PIF模型在准确性、物理合理性和抗噪声能力方面优于纯数据驱动的对应模型。这项工作为制药制造领域关键应用的可靠且可推广的预测解决方案提供了路线图。
摘要:Accurate time-series forecasting for complex physical systems is the backbone of modern industrial monitoring and control. While deep learning models excel at capturing complex dynamics, currently, their deployment is limited due to physical inconsistency and robustness, hence constraining their reliability in regulated environments. We introduce process-informed forecasting (PIF) models for temperature in pharmaceutical lyophilization. We investigate a wide range of models, from classical ones such as Autoregressive Integrated Moving Average Model (ARIMA) and Exponential Smoothing Model (ETS), to modern deep learning architectures, including Kolmogorov-Arnold Networks (KANs). We compare three different loss function formulations that integrate a process-informed trajectory prior: a fixed-weight loss, a dynamic uncertainty-based loss, and a Residual-Based Attention (RBA) mechanism. We evaluate all models not only for accuracy and physical consistency but also for robustness to sensor noise. Furthermore, we test the practical generalizability of the best model in a transfer learning scenario on a new process. Our results show that PIF models outperform their data-driven counterparts in terms of accuracy, physical plausibility and noise resilience. This work provides a roadmap for developing reliable and generalizable forecasting solutions for critical applications in the pharmaceutical manufacturing landscape.
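三种损失公式中最简单的"固定权重"形式大致如下(pif_fixed_weight_loss、权重w与零轨迹先验均为假设性示例):

```python
import torch
import torch.nn.functional as F

def pif_fixed_weight_loss(pred, target, prior_traj, w: float = 0.1):
    """Fixed-weight process-informed loss: data fit plus a penalty for
    straying from a physics-based trajectory prior (one of the paper's
    three formulations; the weight w is a hypothetical value)."""
    return F.mse_loss(pred, target) + w * F.mse_loss(pred, prior_traj)

pred = torch.randn(16, 48)                 # predicted temperature trajectories
target = pred + 0.05 * torch.randn_like(pred)
prior = torch.zeros_like(pred)             # stand-in for a process-model output
print(pif_fixed_weight_loss(pred, target, prior).item())
```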
【2】Dynamic Lagging for Time-Series Forecasting in E-Commerce Finance: Mitigating Information Loss with A Hybrid ML Architecture
标题:电子商务金融中时间序列预测的动态滞后:用混合ML架构缓解信息丢失
链接:https://arxiv.org/abs/2509.20244
摘要:由于不规则的发票时间表、付款延迟和用户特定的行为变化,电子商务金融领域的准确预测尤其具有挑战性。这些因素,加上稀疏的数据集和较短的历史窗口,限制了传统时间序列方法的有效性。虽然深度学习和基于Transformer的模型在其他领域表现出了潜力,但在部分可观测和历史数据有限的情况下,它们的性能会下降。为了应对这些挑战,我们提出了一个混合预测框架,将动态滞后特征工程和自适应滚动窗口表示与经典统计模型和集成学习器相结合。我们的方法显式地纳入了发票级行为建模、支持数据的结构化滞后和自定义的稳定性感知损失函数,从而在稀疏和不规则的金融环境中实现稳健的预测。实证结果表明,与基线模型相比,MAPE减少了约5%,转化为可观的财务节省。此外,该框架通过捕获短期和长期模式、利用用户画像属性和模拟即将到来的发票行为,增强了季度范围内的预测稳定性,并加强了特征与目标的相关性。这些发现强调了结合结构化滞后、发票级关闭建模和行为洞察以提高稀疏金融时间序列预测准确性的价值。
摘要:Accurate forecasting in the e-commerce finance domain is particularly challenging due to irregular invoice schedules, payment deferrals, and user-specific behavioral variability. These factors, combined with sparse datasets and short historical windows, limit the effectiveness of conventional time-series methods. While deep learning and Transformer-based models have shown promise in other domains, their performance deteriorates under partial observability and limited historical data. To address these challenges, we propose a hybrid forecasting framework that integrates dynamic lagged feature engineering and adaptive rolling-window representations with classical statistical models and ensemble learners. Our approach explicitly incorporates invoice-level behavioral modeling, structured lag of support data, and custom stability-aware loss functions, enabling robust forecasts in sparse and irregular financial settings. Empirical results demonstrate an approximate 5% reduction in MAPE compared to baseline models, translating into substantial financial savings. Furthermore, the framework enhances forecast stability over quarterly horizons and strengthens feature target correlation by capturing both short- and long-term patterns, leveraging user profile attributes, and simulating upcoming invoice behaviors. These findings underscore the value of combining structured lagging, invoice-level closure modeling, and behavioral insights to advance predictive accuracy in sparse financial time-series forecasting.
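"结构化滞后 + 滚动窗口表示"可以用pandas勾勒如下(列名、滞后阶数与窗口长度均为假设;shift(1)用于避免目标泄漏):

```python
import numpy as np
import pandas as pd

def add_dynamic_lags(df: pd.DataFrame, col: str, lags=(1, 7, 28),
                     roll_windows=(7, 28)) -> pd.DataFrame:
    """Structured lagging plus rolling-window summaries, a sketch of the
    kind of feature engineering the paper describes."""
    out = df.copy()
    for k in lags:
        out[f"{col}_lag{k}"] = out[col].shift(k)
    for w in roll_windows:
        r = out[col].shift(1).rolling(w, min_periods=max(2, w // 2))
        out[f"{col}_rmean{w}"] = r.mean()       # shift(1) avoids target leakage
        out[f"{col}_rstd{w}"] = r.std()
    return out

idx = pd.date_range("2024-01-01", periods=120, freq="D")
df = pd.DataFrame({"invoice_amount": np.random.default_rng(1).gamma(2, 100, 120)},
                  index=idx)
print(add_dynamic_lags(df, "invoice_amount").dropna().head())
```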
【3】Design Insights and Comparative Evaluation of a Hardware-Based Cooperative Perception Architecture for Lane Change Prediction
标题:用于车道变化预测的基于硬件的协作感知架构的设计见解和比较评估
链接:https://arxiv.org/abs/2509.20218
摘要:在过去的几年里,车道变换预测的研究得到了关注。这一领域的大多数现有工作都是在模拟环境中或使用预先记录的数据集进行的,这些工作通常依赖于有关传感,通信和交通行为的简化假设,这些假设在实践中并不总是成立。车道变化预测系统的实际部署相对较少,并且当它们被报告时,实际挑战,限制和经验教训往往记录不足。本研究通过混合交通中的实际硬件部署探索了合作换道预测,并分享了在实施和测试过程中出现的见解。我们强调了我们所面临的实际挑战,包括瓶颈,可靠性问题和操作限制,塑造了系统的行为。通过记录这些经验,这项研究为其他从事类似管道工作的人提供了指导。
摘要:Research on lane change prediction has gained attention in the last few years. Most existing works in this area have been conducted in simulation environments or with pre-recorded datasets; these works often rely on simplified assumptions about sensing, communication, and traffic behavior that do not always hold in practice. Real-world deployments of lane-change prediction systems are relatively rare, and when they are reported, the practical challenges, limitations, and lessons learned are often under-documented. This study explores cooperative lane-change prediction through a real hardware deployment in mixed traffic and shares the insights that emerged during implementation and testing. We highlight the practical challenges we faced, including bottlenecks, reliability issues, and operational constraints that shaped the behavior of the system. By documenting these experiences, the study provides guidance for others working on similar pipelines.
【4】A Novel Short-Term Anomaly Prediction for IIoT with Software Defined Twin Network
标题:利用软件定义的孪生网络进行工业物联网短期异常预测
链接:https://arxiv.org/abs/2509.20068
备注:Accepted by 2025 IEEE Globecom Workshops-TwinNetApp
摘要:IIoT环境中的安全监控和动态控制是当前发展目标的主要要求。我们相信,通过与软件定义网络(SDN)和数字孪生(DT)范式的集成,可以实现对IIoT环境的动态、安全监控。目前的文献缺乏基于SDN的DT和时间感知智能模型训练的实现细节,用于针对IIoT威胁的短期异常检测。因此,我们提出了一种使用基于SDN的DT进行短期异常检测的新型框架。使用全面的数据集,时间感知的特征标记,以及对各种机器学习模型的综合评估,我们提出了一种新的基于SD-TWIN的异常检测算法。根据新的实时SD-TWIN部署的性能,GPU加速的LightGBM模型特别有效,实现了高召回率和强分类性能的平衡。
摘要:Secure monitoring and dynamic control in an IIoT environment are major requirements for current development goals. We believe that dynamic, secure monitoring of the IIoT environment can be achieved through integration with the Software-Defined Network (SDN) and Digital Twin (DT) paradigms. The current literature lacks implementation details for SDN-based DT and time-aware intelligent model training for short-term anomaly detection against IIoT threats. Therefore, we have proposed a novel framework for short-term anomaly detection that uses an SDN-based DT. Using a comprehensive dataset, time-aware labeling of features, and a comprehensive evaluation of various machine learning models, we propose a novel SD-TWIN-based anomaly detection algorithm. According to the performance of a new real-time SD-TWIN deployment, the GPU-accelerated LightGBM model is particularly effective, achieving a balance of high recall and strong classification performance.
【5】One Filters All: A Generalist Filter for State Estimation
标题:一滤万用:状态估计的通用滤波器
链接:https://arxiv.org/abs/2509.20051
备注:NeurIPS 2025
摘要:估计动力系统中的隐藏状态,也称为最优滤波,是科学和工程各个领域中的一个长期存在的问题。在本文中,我们介绍了一个通用的滤波框架LLM-Filter,它通过将噪声观测与文本原型一起嵌入,利用大型语言模型(LLM)进行状态估计。在针对经典动力系统的各种实验中,我们发现,首先,状态估计可以显著受益于预训练LLM中蕴含的推理知识。通过与冻结的LLM实现适当的模态对齐,LLM-Filter优于最先进的基于学习的方法。其次,我们精心设计了提示结构,即系统即提示(SaP),其中包含使LLM能够理解估计任务的任务指令。在这些提示的引导下,LLM-Filter表现出出色的泛化能力,能够在变化的甚至未见过的环境中准确执行滤波任务。我们进一步观察到LLM-Filter中的缩放律行为,其准确性随着模型规模和训练时间的增加而提高。这些发现使LLM-Filter有望成为滤波领域的基础模型。
摘要:Estimating hidden states in dynamical systems, also known as optimal filtering, is a long-standing problem in various fields of science and engineering. In this paper, we introduce a general filtering framework, LLM-Filter, which leverages large language models (LLMs) for state estimation by embedding noisy observations with text prototypes. In various experiments for classical dynamical systems, we find that first, state estimation can significantly benefit from the reasoning knowledge embedded in pre-trained LLMs. By achieving proper modality alignment with the frozen LLM, LLM-Filter outperforms the state-of-the-art learning-based approaches. Second, we carefully design the prompt structure, System-as-Prompt (SaP), incorporating task instructions that enable the LLM to understand the estimation tasks. Guided by these prompts, LLM-Filter exhibits exceptional generalization, capable of performing filtering tasks accurately in changed or even unseen environments. We further observe a scaling-law behavior in LLM-Filter, where accuracy improves with larger model sizes and longer training times. These findings make LLM-Filter a promising foundation model of filtering.
【6】Predictive Quality Assessment for Mobile Secure Graphics
标题:移动安全图形的预测性质量评估
链接:https://arxiv.org/abs/2509.20028
备注:8 pages, to appear at ICCV 2025 MIPI Workshop (IEEE)
摘要:安全图形验证是一种关键的防伪工具,其可靠性因智能手机上的图像采集质量不佳而受到损害。用户对这些高熵图案的非受控拍摄会导致高误拒率,从而产生显著的"可靠性差距"。为了弥合这一差距,我们摒弃了传统的感知IQA,引入了一个预测性估计帧对下游验证任务效用的框架。我们提出了一个轻量级模型来预测视频帧的质量分数,以确定其是否适合交给资源密集型的oracle模型处理。我们的框架在来自105部智能手机的32,000多张图像的大规模数据集上,使用重新语境化的FNMR和ISRR指标进行了验证。此外,对来自不同工业印刷机的图形进行的一项新颖跨域分析揭示了一个关键发现:在冻结的ImageNet预训练网络上的轻量级探针,比完全微调的模型能更好地泛化到未见过的印刷技术。这为现实世界的泛化提供了一个关键见解:对于来自物理制造的域偏移,冻结的通用主干可能比完全微调更稳健,因为后者可能过拟合于源域伪影。
摘要:The reliability of secure graphic verification, a key anti-counterfeiting tool, is undermined by poor image acquisition on smartphones. Uncontrolled user captures of these high-entropy patterns cause high false rejection rates, creating a significant 'reliability gap'. To bridge this gap, we depart from traditional perceptual IQA and introduce a framework that predictively estimates a frame's utility for the downstream verification task. We propose a lightweight model to predict a quality score for a video frame, determining its suitability for a resource-intensive oracle model. Our framework is validated using re-contextualized FNMR and ISRR metrics on a large-scale dataset of 32,000+ images from 105 smartphones. Furthermore, a novel cross-domain analysis on graphics from different industrial printing presses reveals a key finding: a lightweight probe on a frozen, ImageNet-pretrained network generalizes better to an unseen printing technology than a fully fine-tuned model. This provides a key insight for real-world generalization: for domain shifts from physical manufacturing, a frozen general-purpose backbone can be more robust than full fine-tuning, which can overfit to source-domain artifacts.
【7】From Samples to Scenarios: A New Paradigm for Probabilistic Forecasting
标题:从样本到情景:概率预测的新范式
链接:https://arxiv.org/abs/2509.19975
摘要:大多数最先进的概率时间序列预测模型依赖采样来表示未来的不确定性。然而,这种范式存在固有的局限性,如缺乏显式概率、覆盖不足和计算成本高。在这项工作中,我们介绍了概率情景(Probabilistic Scenarios),一种旨在解决采样局限性的替代范式。它通过直接产生一个有限的{场景, 概率}对集合来工作,从而避免了蒙特卡洛式的近似。为了验证这种范式,我们提出了TimePrism,一个仅由三个平行线性层组成的简单模型。令人惊讶的是,TimePrism在五个基准数据集的两个指标上取得了10项最先进结果中的9项。我们范式的有效性来自对学习目标的根本性重构。模型不再对整个连续概率空间进行建模,而是学习表示一组合理的情景及其相应的概率。我们的工作展示了概率情景范式的潜力,为超越采样的预测开辟了一个有前景的研究方向。
摘要:Most state-of-the-art probabilistic time series forecasting models rely on sampling to represent future uncertainty. However, this paradigm suffers from inherent limitations, such as lacking explicit probabilities, inadequate coverage, and high computational costs. In this work, we introduce Probabilistic Scenarios, an alternative paradigm designed to address the limitations of sampling. It operates by directly producing a finite set of {Scenario, Probability} pairs, thus avoiding Monte Carlo-like approximation. To validate this paradigm, we propose TimePrism, a simple model composed of only three parallel linear layers. Surprisingly, TimePrism achieves 9 out of 10 state-of-the-art results across five benchmark datasets on two metrics. The effectiveness of our paradigm comes from a fundamental reframing of the learning objective. Instead of modeling an entire continuous probability space, the model learns to represent a set of plausible scenarios and corresponding probabilities. Our work demonstrates the potential of the Probabilistic Scenarios paradigm, opening a promising research direction in forecasting beyond sampling.
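"三个平行线性层"输出{场景, 概率}对的一种可能形态如下(两层产生场景、一层经softmax产生概率;这只是一种猜测性拆分,论文未给出细节):

```python
import torch
import torch.nn as nn

class TimePrismSketch(nn.Module):
    """Three parallel linear heads mapping a history window to K scenarios
    plus their probabilities (the role split is our assumption; the paper
    only states the model is three parallel linear layers)."""
    def __init__(self, lookback: int, horizon: int, k: int = 8):
        super().__init__()
        self.k, self.horizon = k, horizon
        self.scenario_a = nn.Linear(lookback, k * horizon)
        self.scenario_b = nn.Linear(lookback, k * horizon)
        self.prob_head = nn.Linear(lookback, k)

    def forward(self, x):                       # x: (batch, lookback)
        s = (self.scenario_a(x) + self.scenario_b(x)).view(-1, self.k, self.horizon)
        p = torch.softmax(self.prob_head(x), dim=-1)   # explicit probabilities
        return s, p                             # K scenarios and their weights

model = TimePrismSketch(lookback=96, horizon=24)
scenarios, probs = model(torch.randn(4, 96))
print(scenarios.shape, probs.shape)             # (4, 8, 24) (4, 8)
```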
【8】Advancing Universal Deep Learning for Electronic-Structure Hamiltonian Prediction of Materials
标题:推进材料电子结构Hamilton预测的通用深度学习
链接:https://arxiv.org/abs/2509.19877
摘要:与传统的DFT方法相比,用于电子结构哈密顿量预测的深度学习方法具有显著的计算效率优势,但原子类型、结构模式的多样性和哈密顿量的高维复杂性对泛化性能提出了重大挑战。在这项工作中,我们在方法和数据集两方面做出贡献,以推进用于哈密顿量预测的通用深度学习范式。在方法方面,我们提出了NextHAM,一种神经E(3)对称且具有强表达能力的校正方法,用于高效、可泛化的材料电子结构哈密顿量预测。首先,我们引入了零步哈密顿量,它可以由DFT的初始电荷密度高效构造,在输入端作为神经回归模型的信息性描述符,在输出端作为目标哈密顿量的初始估计,使回归模型直接预测相对目标真值的校正项,从而显著简化了待学习的输入-输出映射。其次,我们提出了一个具有严格E(3)对称性和高度非线性表达能力的神经Transformer架构用于哈密顿量预测。第三,我们提出了一种新的训练目标,以确保哈密顿量在实空间和倒易空间中的准确性,防止重叠矩阵的大条件数导致的误差放大和"鬼态"的出现。在数据集方面,我们整理了一个高质量、覆盖面广的大型基准,即Materials-HAM-SOC,包含17,000个材料结构,涵盖周期表六个周期的68种元素,并显式纳入SOC效应。Materials-HAM-SOC上的实验结果表明,NextHAM在预测哈密顿量和能带结构方面具有出色的准确性和效率。
摘要:Deep learning methods for electronic-structure Hamiltonian prediction has offered significant computational efficiency advantages over traditional DFT methods, yet the diversity of atomic types, structural patterns, and the high-dimensional complexity of Hamiltonians pose substantial challenges to the generalization performance. In this work, we contribute on both the methodology and dataset sides to advance universal deep learning paradigm for Hamiltonian prediction. On the method side, we propose NextHAM, a neural E(3)-symmetry and expressive correction method for efficient and generalizable materials electronic-structure Hamiltonian prediction. First, we introduce the zeroth-step Hamiltonians, which can be efficiently constructed by the initial charge density of DFT, as informative descriptors of neural regression model in the input level and initial estimates of the target Hamiltonian in the output level, so that the regression model directly predicts the correction terms to the target ground truths, thereby significantly simplifying the input-output mapping for learning. Second, we present a neural Transformer architecture with strict E(3)-Symmetry and high non-linear expressiveness for Hamiltonian prediction. Third, we propose a novel training objective to ensure the accuracy performance of Hamiltonians in both real space and reciprocal space, preventing error amplification and the occurrence of "ghost states" caused by the large condition number of the overlap matrix. On the dataset side, we curate a high-quality broad-coverage large benchmark, namely Materials-HAM-SOC, comprising 17,000 material structures spanning 68 elements from six rows of the periodic table and explicitly incorporating SOC effects. Experimental results on Materials-HAM-SOC demonstrate that NextHAM achieves excellent accuracy and efficiency in predicting Hamiltonians and band structures.
【9】Consistent Estimation of Numerical Distributions under Local Differential Privacy by Wavelet Expansion
标题:局部差异保密下数值分布的子波展开一致估计
链接:https://arxiv.org/abs/2509.19661
摘要:局部差分隐私(LDP)下的分布估计是一项基本而具有挑战性的任务。在分类数据方面已经取得了重大进展。然而,由于评估指标不同,这些方法迁移到数值数据时效果不佳。特别是,我们需要防止概率质量被错置到过远的位置。在本文中,我们提出了一种用小波展开表示样本分布的新方法,并在LDP下估计小波级数的系数。我们的方法优先估计低阶系数,以确保宏观层面的估计准确性,从而防止概率质量被错置到离真实值过远的地方。我们为该方法建立了理论保证。实验表明,在Wasserstein距离和KS距离下,我们的小波展开方法显著优于现有方案。
摘要:Distribution estimation under local differential privacy (LDP) is a fundamental and challenging task. Significant progresses have been made on categorical data. However, due to different evaluation metrics, these methods do not work well when transferred to numerical data. In particular, we need to prevent the probability mass from being misplaced far away. In this paper, we propose a new approach that express the sample distribution using wavelet expansions. The coefficients of wavelet series are estimated under LDP. Our method prioritizes the estimation of low-order coefficients, in order to ensure accurate estimation at macroscopic level. Therefore, the probability mass is prevented from being misplaced too far away from its ground truth. We establish theoretical guarantees for our methods. Experiments show that our wavelet expansion method significantly outperforms existing solutions under Wasserstein and KS distances.
【10】Toward Scalable and Structured Global Station Weather Forecasting
标题:迈向可扩展和结构化的全球站天气预报
链接:https://arxiv.org/abs/2509.19648
摘要:全球台站天气预报(Global Station Weather Forecasting, GSWF)是一个重要的气象研究领域,对能源、航空和农业至关重要。现有的时间序列预报方法在进行大规模全球台站预报时,往往忽略空间相关性或仅作单向建模。这与全球天气系统观测的内在本质相矛盾,限制了预报性能。为了解决这个问题,本文提出了一种新颖的空间结构化注意力模块。它将空间图划分为一组子图,并实例化子图内注意力(Intra-subgraph Attention)以学习每个子图内的局部空间相关性,再将节点聚合为子图表示,通过子图间注意力(Inter-subgraph Attention)在子图之间传递消息,同时考虑空间邻近性和全局相关性。在此模块的基础上,我们通过逐步扩大子图尺度,构建了一个多尺度时空预测模型。所得模型具有可扩展性,能够产生结构化的空间相关性,同时易于实现。实验结果表明,它能以较低的运行成本,相对时间序列预测基线实现高达16.8%的性能提升。
摘要:Global Station Weather Forecasting (GSWF) is a key meteorological research area, critical to energy, aviation, and agriculture. Existing time series forecasting methods often ignore or unidirectionally model spatial correlation when conducting large-scale global station forecasting. This contradicts the intrinsic nature underlying observations of the global weather system, limiting forecast performance. To address this, we propose a novel Spatial Structured Attention Block in this paper. It partitions the spatial graph into a set of subgraphs and instantiates Intra-subgraph Attention to learn local spatial correlation within each subgraph, and aggregates nodes into subgraph representations for message passing among the subgraphs via Inter-subgraph Attention -- considering both spatial proximity and global correlation. Building on this block, we develop a multiscale spatiotemporal forecasting model by progressively expanding subgraph scales. The resulting model is both scalable and able to produce structured spatial correlation, and meanwhile, it is easy to implement. The experimental results show that it can achieve performance improvements up to 16.8% over time series forecasting baselines at low running costs.
【11】Enhancing Credit Default Prediction Using Boruta Feature Selection and DBSCAN Algorithm with Different Resampling Techniques
标题:使用Boruta特征选择和DBSCAN算法以及不同的重新分配技术增强信用违约预测
链接:https://arxiv.org/abs/2509.19408
备注:16 pages, 8 figures and 5 tables
摘要:本研究通过比较三种技术,即SMOTE,SMOTE-Tomek和ADASYN,来研究信用违约预测,这三种技术通常用于解决信用违约情况下的类不平衡问题。认识到信用违约数据集通常是倾斜的,违约者所占的比例比非违约者小得多,我们通过在不平衡数据上评估机器学习(ML)模型来开始我们的分析,而没有任何重新评估来建立基线性能。这些基线结果为理解后续平衡方法的影响提供了参考点。除了传统的分类器,如朴素贝叶斯和K-最近邻(KNN),我们的研究还探讨了先进的集成提升算法的适用性,包括极端梯度提升(XGBoost),AdaBoost,梯度提升机(GBM)和Light GBM,用于使用Boruta特征选择和基于DBSCAN的离群值检测进行信用违约预测,无论是之前还是之后。来自克利夫兰大学ML存储库的真实信用违约数据集被用来构建ML分类器,并测试了它们的性能。选择用于测量模型性能的标准是受试者工作特征曲线下面积(ROC-AUC)、精确度-召回率曲线下面积(PR-AUC)、G均值和F1评分。这项实证研究的结果表明,在信用违约背景下,Boruta + DBSCAN + SMOTE-Tomek + GBM分类器的性能优于其他ML模型(F1分数:82.56%,G均值:82.98%,ROC-AUC:90.90%,PR-AUC:91.85%)。这些发现为未来建立更具弹性和适应性的信用违约系统奠定了基础,随着全球基于信用的交易继续增加,这将是必不可少的。
摘要:This study examines credit default prediction by comparing three techniques, namely SMOTE, SMOTE-Tomek, and ADASYN, that are commonly used to address the class imbalance problem in credit default situations. Recognizing that credit default datasets are typically skewed, with defaulters comprising a much smaller proportion than non-defaulters, we began our analysis by evaluating machine learning (ML) models on the imbalanced data without any resampling to establish baseline performance. These baseline results provide a reference point for understanding the impact of subsequent balancing methods. In addition to traditional classifiers such as Naive Bayes and K-Nearest Neighbors (KNN), our study also explores the suitability of advanced ensemble boosting algorithms, including Extreme Gradient Boosting (XGBoost), AdaBoost, Gradient Boosting Machines (GBM), and Light GBM for credit default prediction using Boruta feature selection and DBSCAN-based outlier detection, both before and after resampling. A real-world credit default data set sourced from the University of Cleveland ML Repository was used to build ML classifiers, and their performances were tested. The criteria chosen to measure model performance are the area under the receiver operating characteristic curve (ROC-AUC), area under the precision-recall curve (PR-AUC), G-mean, and F1-scores. The results from this empirical study indicate that the Boruta+DBSCAN+SMOTE-Tomek+GBM classifier outperformed the other ML models (F1-score: 82.56%, G-mean: 82.98%, ROC-AUC: 90.90%, PR-AUC: 91.85%) in a credit default context. The findings establish a foundation for future progress in creating more resilient and adaptive credit default systems, which will be essential as credit-based transactions continue to rise worldwide.
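表现最好的组合中"SMOTE-Tomek + GBM"这一段可以用imbalanced-learn直接搭建(数据为随机占位;Boruta特征选择与DBSCAN离群点清理在此流程之前进行,草图未展示):

```python
import numpy as np
from imblearn.combine import SMOTETomek
from imblearn.pipeline import Pipeline
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Placeholder imbalanced data: roughly 1 defaulter per 9 non-defaulters.
rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 12))
y = (rng.random(1000) < 0.1).astype(int)

pipe = Pipeline([
    ("resample", SMOTETomek(random_state=7)),   # oversample, then clean Tomek links
    ("gbm", GradientBoostingClassifier(random_state=7)),
])
# The imblearn Pipeline applies resampling only on training folds, not test folds.
scores = cross_val_score(pipe, X, y, cv=5, scoring="f1")
print("F1: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```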
【12】Short-Term Regional Electricity Demand Forecasting in Argentina Using LSTM Networks
标题:使用LSTM网络预测阿根廷短期区域电力需求
链接:https://arxiv.org/abs/2509.19374
备注:44 pages, 13 figures
摘要:这项研究提出了基于长短期记忆(LSTM)网络的深度学习模型的开发和优化,以预测阿根廷科尔多瓦的短期小时电力需求。将历史消费数据与外生变量(气候因素、时间周期和人口统计数据)相结合,该模型具有较高的预测精度,平均绝对百分比误差为3.20%,决定系数为0.95。事实证明,纳入周期性时间编码和天气变量对于捕捉季节性模式和极端消费事件至关重要,增强了模型的稳健性和普遍性。除了LSTM架构的设计和超参数优化外,还进行了两项互补分析:(i)使用随机森林回归进行可解释性研究,以量化外部驱动因素的相对重要性,以及(ii)评估模型在预测每日需求最大值和最小值的时间方面的性能,在超过三分之二的测试日内达到精确的小时精度,在超过90%的情况下误差在±1小时以内。总之,这些结果突出了所提出的框架的预测准确性和操作相关性,为电网运营商在不同的需求场景下寻求优化的规划和控制策略提供了有价值的见解。
摘要:This study presents the development and optimization of a deep learning model based on Long Short-Term Memory (LSTM) networks to predict short-term hourly electricity demand in Córdoba, Argentina. Integrating historical consumption data with exogenous variables (climatic factors, temporal cycles, and demographic statistics), the model achieved high predictive precision, with a mean absolute percentage error of 3.20% and a determination coefficient of 0.95. The inclusion of periodic temporal encodings and weather variables proved crucial to capture seasonal patterns and extreme consumption events, enhancing the robustness and generalizability of the model. In addition to the design and hyperparameter optimization of the LSTM architecture, two complementary analyses were carried out: (i) an interpretability study using Random Forest regression to quantify the relative importance of exogenous drivers, and (ii) an evaluation of model performance in predicting the timing of daily demand maxima and minima, achieving exact-hour accuracy in more than two-thirds of the test days and within ±1 hour in over 90% of cases. Together, these results highlight both the predictive accuracy and operational relevance of the proposed framework, providing valuable insights for grid operators seeking optimized planning and control strategies under diverse demand scenarios.
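摘要强调的"周期性时间编码"通常指sin/cos编码,使模型把23点与0点视为相邻。一个最小草图如下:

```python
import numpy as np
import pandas as pd

def cyclical_time_features(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Periodic sin/cos encodings so the model sees hour 23 and hour 0 as
    neighbors, the kind of encoding the study found crucial."""
    hour = index.hour.to_numpy()
    doy = index.dayofyear.to_numpy()
    return pd.DataFrame({
        "hour_sin": np.sin(2 * np.pi * hour / 24),
        "hour_cos": np.cos(2 * np.pi * hour / 24),
        "doy_sin": np.sin(2 * np.pi * doy / 365.25),
        "doy_cos": np.cos(2 * np.pi * doy / 365.25),
    }, index=index)

idx = pd.date_range("2024-01-01", periods=48, freq="h")
print(cyclical_time_features(idx).head())
```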
【13】STL-FFT-STFT-TCN-LSTM: An Effective Wave Height High Accuracy Prediction Model Fusing Time-Frequency Domain Features
标题:STL-FT-STFT-TCN-LSTM:融合时频域特征的有效波高高精度预测模型
链接:https://arxiv.org/abs/2509.19313
备注:17 page, 13 figures; references added
摘要:随着传统能源的消耗加剧及其对环境的不利影响变得更加明显,波浪能因其高能量密度,稳定性,广泛分布和环境友好性而成为可再生能源家族中极具前景的成员。其发展的关键在于有效波高的精确预报。然而,波能信号具有强非线性、突变、多尺度周期性、数据稀疏性和高频噪声干扰等特点,此外,用于波能预测的物理模型会产生极高的计算成本。为了解决这些挑战,本研究提出了一种结合STL-FFT-STFT-TCN-LSTM的混合模型。该模型利用基于Loess的季节趋势分解(STL)、快速傅里叶变换(FFT)、短时傅里叶变换(STFT)、时间卷积网络(TCN)和长短期记忆(LSTM)技术。该模型旨在优化多尺度特征融合,捕捉极端波高,并解决与高频噪声和周期信号相关的问题,从而实现有效波高的高效准确预测。实验使用NOAA 41008站和41047站2019年至2022年的每小时数据进行。结果表明,与其他单一模式和混合模式相比,STL-FFT-STFT-TCN-LSTM模式在捕捉极值波高和抑制高频噪声方面的预报精度显著提高,MAE降低15.8%~40.5%,SMAPE降低8.3%~20.3%,R提高1.31%~2.9%;在消融实验中,该模型也证明了各组成步骤的不可或缺性,验证了其在多尺度特征融合中的优越性。
摘要:As the consumption of traditional energy sources intensifies and their adverse environmental impacts become more pronounced, wave energy stands out as a highly promising member of the renewable energy family due to its high energy density, stability, widespread distribution, and environmental friendliness. The key to its development lies in the precise prediction of Significant Wave Height (WVHT). However, wave energy signals exhibit strong nonlinearity, abrupt changes, multi-scale periodicity, data sparsity, and high-frequency noise interference; additionally, physical models for wave energy prediction incur extremely high computational costs. To address these challenges, this study proposes a hybrid model combining STL-FFT-STFT-TCN-LSTM. This model exploits the Seasonal-Trend Decomposition Procedure based on Loess (STL), Fast Fourier Transform (FFT), Short-Time Fourier Transform (STFT), Temporal Convolutional Network (TCN), and Long Short-Term Memory (LSTM) technologies. The model aims to optimize multi-scale feature fusion, capture extreme wave heights, and address issues related to high-frequency noise and periodic signals, thereby achieving efficient and accurate prediction of significant wave height. Experiments were conducted using hourly data from NOAA Station 41008 and 41047 spanning 2019 to 2022. The results showed that compared with other single models and hybrid models, the STL-FFT-STFT-TCN-LSTM model achieved significantly higher prediction accuracy in capturing extreme wave heights and suppressing high-frequency noise, with MAE reduced by 15.8%-40.5%, SMAPE reduced by 8.3%-20.3%, and R increased by 1.31%-2.9%; in ablation experiments, the model also demonstrated the indispensability of each component step, validating its superiority in multi-scale feature fusion.
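混合模型的第一步STL分解可以用statsmodels直接完成(下面以合成的"波高"序列作占位;分解得到的趋势/季节/残差分量随后才交给FFT/STFT特征与TCN-LSTM预测器,草图未展示后者):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Synthetic hourly "wave height" with daily seasonality plus noise, standing
# in for the NOAA buoy series used in the paper.
rng = np.random.default_rng(3)
t = np.arange(24 * 60)
y = pd.Series(1.5 + 0.4 * np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size),
              index=pd.date_range("2022-01-01", periods=t.size, freq="h"))

# STL splits the series into trend / seasonal / residual components.
res = STL(y, period=24, robust=True).fit()
print(res.trend.iloc[:3])
print(res.seasonal.iloc[:3])
print(res.resid.iloc[:3])
```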
其他神经网络|深度学习|模型|建模(30篇)
【1】A Recovery Guarantee for Sparse Neural Networks
标题:稀疏神经网络的恢复保证
链接:https://arxiv.org/abs/2509.20323
备注:Code is available at this https URL
摘要:我们证明了ReLU神经网络稀疏恢复的首个保证,其中稀疏的网络权重构成待恢复的信号。具体来说,我们研究了两层标量输出网络中稀疏网络权重的结构性质,在这些性质下,一个简单的迭代硬阈值算法可以精确恢复这些权重,所需内存随非零权重数量线性增长。我们通过恢复稀疏植入MLP、MNIST分类和隐式神经表示等简单实验验证了这一理论结果。在实验中,我们发现其性能与基于迭代幅值剪枝的高性能但内存低效的基线相当,且往往更优。
摘要:We prove the first guarantees of sparse recovery for ReLU neural networks, where the sparse network weights constitute the signal to be recovered. Specifically, we study structural properties of the sparse network weights for two-layer, scalar-output networks under which a simple iterative hard thresholding algorithm recovers these weights exactly, using memory that grows linearly in the number of nonzero weights. We validate this theoretical result with simple experiments on recovery of sparse planted MLPs, MNIST classification, and implicit neural representations. Experimentally, we find performance that is competitive with, and often exceeds, a high-performing but memory-inefficient baseline based on iterative magnitude pruning.
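摘要中的迭代硬阈值(IHT)算法的通用形式如下:每次梯度步后只保留幅值最大的k个权重。这里在稀疏线性回归上演示(论文的保证针对两层ReLU网络权重,此处仅为同类算法的示意):

```python
import numpy as np

def iterative_hard_thresholding(grad_fn, w0, k, lr=0.1, steps=200):
    """Generic IHT: take a gradient step, then keep only the k
    largest-magnitude weights and zero out the rest."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * grad_fn(w)
        idx = np.argsort(np.abs(w))[:-k]        # indices of all but the top-k
        w[idx] = 0.0
    return w

rng = np.random.default_rng(0)
n, d, k = 200, 50, 5
w_true = np.zeros(d)
w_true[rng.choice(d, k, replace=False)] = rng.normal(size=k)
X = rng.normal(size=(n, d))
y = X @ w_true

grad = lambda w: X.T @ (X @ w - y) / n          # least-squares gradient
w_hat = iterative_hard_thresholding(grad, np.zeros(d), k)
print("recovery error:", np.linalg.norm(w_hat - w_true))
```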
【2】Extended Low-Rank Approximation Accelerates Learning of Elastic Response in Heterogeneous Materials
标题:扩展低阶逼近加速非均匀材料弹性响应的学习
链接:https://arxiv.org/abs/2509.20276
摘要:预测微观结构如何支配非均质材料的力学响应,对于优化设计和性能至关重要。然而,由于微观结构特征复杂且高维,这项任务仍然很困难。依靠基于物理的模拟来探测微观结构空间,其计算代价高得令人望而却步。这推动了相关计算工具的发展,以高效地学习支配力学行为的结构-性能联系。虽然当代数据驱动方法提供了新的可能性,但它们通常需要大型数据集。为了解决这一挑战,这项工作提出了扩展低秩近似(xLRA),一个采用规范多元(canonical polyadic)张量分解的框架。它通过自适应地纳入高阶项,高效地将高维微观结构信息映射到局部弹性响应。xLRA能精确预测多孔微观结构中的局部弹性应变场,所需最大秩仅为4。xLRA的紧凑形式仅在5%的数据集上训练即可实现准确预测,表现出显著的数据效率。此外,xLRA在代表性材料体系(包括两相复合材料以及单相和双相多晶)中均给出了结果,证明了其可迁移性。尽管结构紧凑,xLRA仍保留了关键的微观结构细节,能够对未见过的微观结构进行准确预测。基准测试表明,xLRA在预测准确性、可推广性和计算效率方面优于当代方法,所需浮点运算少6个数量级。总之,xLRA为从微观结构预测弹性响应提供了一个高效框架,使结构-性能联系的可扩展映射成为可能。
摘要:Predicting how the microstructure governs the mechanical response of heterogeneous materials is essential for optimizing design and performance. Yet this task remains difficult due to the complex, high dimensional nature of microstructural features. Relying on physics based simulations to probe the microstructural space is computationally prohibitive. This motivates the development of computational tools to efficiently learn structure property linkages governing mechanical behavior. While contemporary data driven approaches offer new possibilities, they often require large datasets. To address this challenge, this work presents the Extended Low Rank Approximation (xLRA), a framework that employs canonical polyadic tensor decomposition. It efficiently maps high dimensional microstructural information to the local elastic response by adaptively incorporating higher rank terms. xLRA accurately predicts the local elastic strain fields in porous microstructures, requiring a maximum rank of only 4. The compact formulation of xLRA achieves accurate predictions when trained on just 5% of the dataset, demonstrating significant data efficiency. Moreover, xLRA proves transferability by delivering results across representative material systems, including two phase composites and single and dual phase polycrystals. Despite being compact, xLRA retains essential microstructural details, enabling accurate predictions on unseen microstructures. Benchmarking shows that xLRA outperforms contemporary methods in predictive accuracy, generalizability, and computational efficiency, while requiring 6 orders of magnitude fewer floating point operations. In summary, xLRA provides an efficient framework for predicting the elastic response from microstructures, enabling scalable mapping of structure property linkages.
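xLRA所用的规范多元(CP)分解可以用tensorly演示:对一个加噪的秩4张量做rank=4的parafac分解并检查重构误差(张量三个维度的含义为示意性假设):

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Stand-in for a microstructure-to-strain mapping: a 3-way tensor over
# (microstructure descriptor, spatial location, loading condition).
rng = np.random.default_rng(0)
low_rank = sum(np.einsum("i,j,k->ijk", rng.normal(size=30),
                         rng.normal(size=40), rng.normal(size=8))
               for _ in range(4))
tensor = tl.tensor(low_rank + 0.01 * rng.normal(size=low_rank.shape))

# Canonical polyadic decomposition with the paper's maximum rank of 4.
cp = parafac(tensor, rank=4, n_iter_max=200, tol=1e-8)
approx = tl.cp_to_tensor(cp)
rel_err = np.linalg.norm(approx - low_rank) / np.linalg.norm(low_rank)
print("relative reconstruction error: %.4f" % rel_err)
```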
【3】Generative Model Inversion Through the Lens of the Manifold Hypothesis
标题:流形假设下的生成模型反演
链接:https://arxiv.org/abs/2509.20177
备注:NeurIPS 2025
摘要:模型反演攻击(MIA)旨在从训练的模型中重建具有类别代表性的样本。最近的生成MIA利用生成对抗网络来学习引导反演过程的图像先验,从而产生具有高视觉质量和对私有训练数据的强保真度的重建。为了探索其有效性背后的原因,我们首先检查反转损失相对于合成输入的梯度,并发现这些梯度的噪音令人惊讶。进一步的分析表明,生成反演隐式地通过将这些梯度投影到生成流形的切空间上来消除这些梯度的噪声,过滤掉流形外的分量,同时保留与流形对齐的信息方向。我们的经验测量表明,在使用标准监督训练的模型中,损失梯度通常表现出与数据流形的较大角度偏差,表明与类相关方向的对齐较差。这一观察激发了我们的中心假设:当模型的损失梯度与生成流形更紧密地对齐时,模型变得更容易受到MIA的影响。我们通过设计一个新的训练目标来验证这一假设,该目标明确地促进了这种对齐。基于这一见解,我们进一步引入了一种免训练方法来增强反演过程中的梯度流形对齐,从而使最先进的生成MIA得到一致的改进。
摘要:Model inversion attacks (MIAs) aim to reconstruct class-representative samples from trained models. Recent generative MIAs utilize generative adversarial networks to learn image priors that guide the inversion process, yielding reconstructions with high visual quality and strong fidelity to the private training data. To explore the reason behind their effectiveness, we begin by examining the gradients of inversion loss with respect to synthetic inputs, and find that these gradients are surprisingly noisy. Further analysis reveals that generative inversion implicitly denoises these gradients by projecting them onto the tangent space of the generator manifold, filtering out off-manifold components while preserving informative directions aligned with the manifold. Our empirical measurements show that, in models trained with standard supervision, loss gradients often exhibit large angular deviations from the data manifold, indicating poor alignment with class-relevant directions. This observation motivates our central hypothesis: models become more vulnerable to MIAs when their loss gradients align more closely with the generator manifold. We validate this hypothesis by designing a novel training objective that explicitly promotes such alignment. Building on this insight, we further introduce a training-free approach to enhance gradient-manifold alignment during inversion, leading to consistent improvements over state-of-the-art generative MIAs.
【4】Choose Your Battles: Distributed Learning Over Multiple Tug of War Games
标题:选择您的战斗:多个拔河游戏中的分布式学习
链接:https://arxiv.org/abs/2509.20147
备注:Submitted to IEEE TAC
摘要:假设N个玩家和K个游戏同时进行。这些游戏中的每一个都被建模为拔河(ToW)游戏,其中增加一个玩家的动作会减少所有其他玩家的奖励。每个玩家在任何给定时间只参与一个游戏。在每个时间步,玩家决定他们希望参与的游戏以及在该游戏中采取的动作。他们的奖励取决于同一游戏中所有玩家的动作。这种K游戏系统被称为"Meta Tug-of-War"(Meta-ToW)游戏。这些游戏可以模拟传感器网络中的功率控制、分布式任务分配和激活等场景。我们提出了Meta Tug-of-Peace算法,这是一种分布式算法:动作更新由一个简单的随机近似算法完成,而切换游戏的决策则通过玩家之间不频繁的1比特通信做出。我们证明,在Meta-ToW游戏中,我们的算法收敛到一个满足玩家目标服务质量奖励向量的均衡。随后,我们通过对上述场景的仿真证明了算法的有效性。
摘要:Consider N players and K games taking place simultaneously. Each of these games is modeled as a Tug-of-War (ToW) game where increasing the action of one player decreases the reward for all other players. Each player participates in only one game at any given time. At each time step, a player decides the game in which they wish to participate in and the action they take in that game. Their reward depends on the actions of all players that are in the same game. This system of K games is termed `Meta Tug-of-War' (Meta-ToW) game. These games can model scenarios such as power control, distributed task allocation, and activation in sensor networks. We propose the Meta Tug-of-Peace algorithm, a distributed algorithm where the action updates are done using a simple stochastic approximation algorithm, and the decision to switch games is made using an infrequent 1-bit communication between the players. We prove that in Meta-ToW games, our algorithm converges to an equilibrium that satisfies a target Quality of Service reward vector for the players. We then demonstrate the efficacy of our algorithm through simulations for the scenarios mentioned above.
【5】Hyperspectral Adapter for Semantic Segmentation with Vision Foundation Models
标题:使用Vision Foundation模型进行语义分割的高光谱适配器
链接:https://arxiv.org/abs/2509.20107
摘要:高光谱成像(HSI)捕获空间信息以及跨越许多窄波长带的密集光谱测量。这种丰富的光谱内容有可能促进强大的机器人感知,特别是在具有复杂材料成分、不同照明或其他视觉挑战性条件的环境中。然而,目前的HSI语义分割方法由于依赖于针对RGB输入优化的架构和学习框架而表现不佳。在这项工作中,我们提出了一种新的高光谱适配器,它利用预训练的视觉基础模型来有效地从高光谱数据中学习。我们的架构结合了一个光谱Transformer和一个光谱感知空间先验模块,以提取丰富的空间光谱特征。此外,我们引入了一个模态感知的交互块,通过专用的提取和注入机制,促进高光谱表示和冻结的Vision Transformer功能的有效集成。对三个基准自动驾驶数据集的广泛评估表明,我们的架构在直接使用HSI输入的同时实现了最先进的语义分割性能,优于基于视觉和高光谱分割方法。我们在https://hyperspectraladapter.cs.uni-freiburg.de上提供代码。
摘要:Hyperspectral imaging (HSI) captures spatial information along with dense spectral measurements across numerous narrow wavelength bands. This rich spectral content has the potential to facilitate robust robotic perception, particularly in environments with complex material compositions, varying illumination, or other visually challenging conditions. However, current HSI semantic segmentation methods underperform due to their reliance on architectures and learning frameworks optimized for RGB inputs. In this work, we propose a novel hyperspectral adapter that leverages pretrained vision foundation models to effectively learn from hyperspectral data. Our architecture incorporates a spectral transformer and a spectrum-aware spatial prior module to extract rich spatial-spectral features. Additionally, we introduce a modality-aware interaction block that facilitates effective integration of hyperspectral representations and frozen vision Transformer features through dedicated extraction and injection mechanisms. Extensive evaluations on three benchmark autonomous driving datasets demonstrate that our architecture achieves state-of-the-art semantic segmentation performance while directly using HSI inputs, outperforming both vision-based and hyperspectral segmentation methods. We make the code available at https://hyperspectraladapter.cs.uni-freiburg.de.
【6】Projective Kolmogorov Arnold Neural Networks (P-KANs): Entropy-Driven Functional Space Discovery for Interpretable Machine Learning
标题:投射Kolmogorov Arnold神经网络(P-KANs):可解释机器学习的熵驱动功能空间发现
链接:https://arxiv.org/abs/2509.20049
摘要:Kolmogorov-Arnold网络(KAN)将可学习的非线性从节点重新定位到边,在科学机器学习和可解释建模方面展示了卓越的能力。然而,目前的KAN实现由于高维样条参数空间中的冗余而存在根本性的低效:许多不同的参数化会产生功能上等效的行为。这种冗余在模型的雅可比矩阵中表现为“nuisance空间”,导致模型易于过拟合、泛化能力差。我们提出投影Kolmogorov-Arnold网络(P-KANs),这是一种新的训练框架,通过来自信号分析和稀疏字典学习的熵最小化技术,引导边函数发现可解释的函数表示。我们的方法不是将函数约束到预定的空间,而是保持样条空间的灵活性,同时引入“引力”项,鼓励收敛到最优的函数表示。我们的关键见解是,最优表示可以通过投影系数的熵分析来识别,从而将边函数压缩到较低参数的投影空间(傅立叶、切比雪夫、贝塞尔)。P-KAN在多个领域表现出卓越的性能,在保持表示能力的同时实现了高达80%的参数减少,与标准KAN相比显著提高了对噪声的鲁棒性,并成功应用于工业自动化纤维铺放预测。我们的方法可以自动发现混合函数表示,其中不同的边收敛到不同的最优空间,为科学机器学习应用同时提供压缩优势和增强的可解释性。
摘要:Kolmogorov-Arnold Networks (KANs) relocate learnable nonlinearities from nodes to edges, demonstrating remarkable capabilities in scientific machine learning and interpretable modeling. However, current KAN implementations suffer from fundamental inefficiencies due to redundancy in high-dimensional spline parameter spaces, where numerous distinct parameterisations yield functionally equivalent behaviors. This redundancy manifests as a "nuisance space" in the model's Jacobian, leading to susceptibility to overfitting and poor generalization. We introduce Projective Kolmogorov-Arnold Networks (P-KANs), a novel training framework that guides edge function discovery towards interpretable functional representations through entropy-minimisation techniques from signal analysis and sparse dictionary learning. Rather than constraining functions to predetermined spaces, our approach maintains spline space flexibility while introducing "gravitational" terms that encourage convergence towards optimal functional representations. Our key insight recognizes that optimal representations can be identified through entropy analysis of projection coefficients, compressing edge functions to lower-parameter projective spaces (Fourier, Chebyshev, Bessel). P-KANs demonstrate superior performance across multiple domains, achieving up to 80% parameter reduction while maintaining representational capacity, significantly improved robustness to noise compared to standard KANs, and successful application to industrial automated fiber placement prediction. Our approach enables automatic discovery of mixed functional representations where different edges converge to different optimal spaces, providing both compression benefits and enhanced interpretability for scientific machine learning applications.
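为直观说明“通过投影系数的熵判断边函数可压缩性”这一核心思想,下面给出一个示意性Python片段(非论文官方实现;边函数与基的阶数均为假设):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# 示意:把一条边函数投影到 Chebyshev 基上,并用投影系数的熵
# 衡量其“可压缩性”(熵越低,函数越接近低参数投影空间)。
x = np.linspace(-1, 1, 512)
f = np.sin(3 * x) + 0.1 * x**2          # 假设的边函数样本

coef = C.chebfit(x, f, deg=15)          # 投影系数
p = coef**2 / np.sum(coef**2)           # 归一化为能量分布
entropy = -np.sum(p * np.log(p + 1e-12))
print(f"投影系数熵: {entropy:.3f}")
```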
【7】Learning Robust Penetration-Testing Policies under Partial Observability: A systematic evaluation
标题:部分可观察性下学习稳健的渗透测试策略:系统评估
链接:https://arxiv.org/abs/2509.20008
备注:27 pages, 8 figures
摘要:渗透测试是为识别安全漏洞而对网络攻击进行的模拟,它构成了一个非常适合用强化学习(RL)自动化的序贯决策问题。与RL在许多现实问题中的应用一样,部分可观测性是一个主要挑战,因为它破坏了马尔可夫决策过程(MDP)中的马尔可夫性质。部分可观测MDP需要历史聚合或信念状态估计才能学到成功的策略。我们研究不同规模主机网络上的随机、部分可观测渗透测试场景,旨在通过更具挑战性和代表性的基准更好地反映现实世界的复杂性。这种做法有助于得到更稳健、更可迁移的策略,这对于确保在多样化且不可预测的现实环境中的可靠性能至关重要。以原始的近端策略优化(PPO)为基线,我们比较了一系列旨在缓解部分可观测性的PPO变体,包括帧堆叠、用历史信息增强观测,以及采用基于循环网络或Transformer的架构。我们在不同主机网络规模上对这些算法进行了系统的实证分析,发现这一任务极大地受益于历史聚合,其收敛速度比其他方法快三倍。对各算法学到的策略进行人工检查,可以发现明显的差异,并提供超越定量结果的见解。
摘要:Penetration testing, the simulation of cyberattacks to identify security vulnerabilities, presents a sequential decision-making problem well-suited for reinforcement learning (RL) automation. Like many applications of RL to real-world problems, partial observability presents a major challenge, as it invalidates the Markov property present in Markov Decision Processes (MDPs). Partially Observable MDPs require history aggregation or belief state estimation to learn successful policies. We investigate stochastic, partially observable penetration testing scenarios over host networks of varying size, aiming to better reflect real-world complexity through more challenging and representative benchmarks. This approach leads to the development of more robust and transferable policies, which are crucial for ensuring reliable performance across diverse and unpredictable real-world environments. Using vanilla Proximal Policy Optimization (PPO) as a baseline, we compare a selection of PPO variants designed to mitigate partial observability, including frame-stacking, augmenting observations with historical information, and employing recurrent or transformer-based architectures. We conduct a systematic empirical analysis of these algorithms across different host network sizes. We find that this task greatly benefits from history aggregation, converging three times faster than other approaches. Manual inspection of the policies learned by the algorithms reveals clear distinctions and provides insights that go beyond quantitative results.
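作为文中“帧堆叠”这一历史聚合手段的最小示意(环境接口为假设的gym风格,并非论文所用环境),可以把最近k个观测拼接后再交给PPO策略:

```python
import numpy as np
from collections import deque

# 示意性的帧堆叠包装器(接口为假设的 gym 风格环境):
# 把最近 k 个观测拼接成一个向量,为策略提供历史信息。
class FrameStack:
    def __init__(self, env, k=4):
        self.env, self.k = env, k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(self.frames)

    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.frames.append(obs)
        return np.concatenate(self.frames), reward, done
```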
【8】Exploration with Foundation Models: Capabilities, Limitations, and Hybrid Approaches
标题:基础模型探索:能力、局限性和混合方法
链接:https://arxiv.org/abs/2509.19924
备注:16 pages, 7 figures. Accepted for presentation at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop on the Foundations of Reasoning in Language Models (FoRLM)
摘要:强化学习(RL)中的探索仍然具有挑战性,特别是在稀疏奖励设置中。虽然基础模型具有很强的语义先验,但它们在经典RL基准上作为zero-shot探索智能体的能力还没有得到很好的理解。我们在多臂老虎机、网格世界和稀疏奖励Atari上对LLM和VLM进行基准测试,以检验zero-shot探索能力。我们的调查揭示了一个关键的局限:虽然VLM可以从视觉输入中推断出高层目标,但它们在精确的低层控制上始终失败,即“知行差距”。为了分析弥合这一差距的潜在途径,我们在受控的最佳情况设定下研究了一个简单的同策略(on-policy)混合框架。在这种理想化设定下的结果表明,VLM指导可以显著提高早期阶段的样本效率,并对使用基础模型指导探索(而非端到端控制)的潜力与限制给出了清晰的分析。
摘要:Exploration in reinforcement learning (RL) remains challenging, particularly in sparse-reward settings. While foundation models possess strong semantic priors, their capabilities as zero-shot exploration agents in classic RL benchmarks are not well understood. We benchmark LLMs and VLMs on multi-armed bandits, Gridworlds, and sparse-reward Atari to test zero-shot exploration. Our investigation reveals a key limitation: while VLMs can infer high-level objectives from visual input, they consistently fail at precise low-level control: the "knowing-doing gap". To analyze a potential bridge for this gap, we investigate a simple on-policy hybrid framework in a controlled, best-case scenario. Our results in this idealized setting show that VLM guidance can significantly improve early-stage sample efficiency, providing a clear analysis of the potential and constraints of using foundation models to guide exploration rather than for end-to-end control.
【9】Modeling and Control of Deep Sign-Definite Dynamics with Application to Hybrid Powertrain Control
标题:深度定号动力学的建模和控制及其在混合动力总成控制中的应用
链接:https://arxiv.org/abs/2509.19869
备注:Submitted to Automatica
摘要:深度学习越来越多地用于第一性原理建模困难的复杂大规模系统。然而,标准的深度学习模型通常无法强制执行物理结构,也无法在下游控制中保持凸性,从而导致物理上不一致的预测,并因非凸性而产生不连续的控制输入。我们引入符号约束——对雅可比矩阵元素的符号限制——统一了单调性、正性和符号确定性;此外,我们开发了强制执行这些约束的模型构建方法,以及相应的控制综合流程。特别是,我们设计了满足这些约束的可精确线性化深度模型,并将模型预测控制表述为凸二次规划,从而得到唯一的最优解和Lipschitz连续的控制律。在一个双容水箱系统和一个混合动力总成上,所提出的方法比现有方法提高了预测精度,并产生更平滑的控制输入。
摘要:Deep learning is increasingly used for complex, large-scale systems where first-principles modeling is difficult. However, standard deep learning models often fail to enforce physical structure or preserve convexity in downstream control, leading to physically inconsistent predictions and discontinuous inputs owing to nonconvexity. We introduce sign constraints--sign restrictions on Jacobian entries--that unify monotonicity, positivity, and sign-definiteness; additionally, we develop model-construction methods that enforce them, together with a control-synthesis procedure. In particular, we design exactly linearizable deep models satisfying these constraints and formulate model predictive control as a convex quadratic program, which yields a unique optimizer and a Lipschitz continuous control law. On a two-tank system and a hybrid powertrain, the proposed approach improves prediction accuracy and produces smoother control inputs than existing methods.
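实现雅可比符号约束的一种常见做法是对权重做非负重参数化,使输出对输入单调不减。下面是一个示意性的PyTorch片段(仅为说明思路,并非论文中“可精确线性化模型”的构造):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 示意:通过 softplus 的非负权重重参数化,使网络输出对输入
# 单调不减(雅可比各项非负)。这只是符号约束的一种常见实现
# 方式,并非论文中的精确构造。
class MonotoneLayer(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w = nn.Parameter(torch.randn(d_out, d_in) * 0.1)
        self.b = nn.Parameter(torch.zeros(d_out))

    def forward(self, x):
        return F.linear(x, F.softplus(self.w), self.b)

# 非负权重线性层与单调激活(tanh)的复合仍保持单调性
net = nn.Sequential(MonotoneLayer(3, 16), nn.Tanh(), MonotoneLayer(16, 1))

x = torch.randn(5, 3)
jac = torch.autograd.functional.jacobian(lambda z: net(z).sum(0), x)
print((jac >= 0).all())  # 雅可比各项非负
```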
【10】Sobolev acceleration for neural networks
标题:神经网络的Sobolev加速
链接:https://arxiv.org/abs/2509.19773
摘要:Sobolev训练将目标导数集成到损失函数中,与传统的$L^2$训练相比,它已被证明可以加速收敛并提高泛化能力。然而,这种训练方法的基本机制仍然只有部分了解。在这项工作中,我们提出了第一个严格的理论框架,证明Sobolev训练加速了整流线性单元(ReLU)网络的收敛。在高斯输入和浅架构的学生-教师框架下,我们推导出总体梯度和Hessian矩阵的精确公式,并量化了损失景观条件性和梯度流收敛速率的改善。大量的数值实验验证了我们的理论发现,并表明Sobolev训练的好处可以扩展到现代深度学习任务。
摘要:Sobolev training, which integrates target derivatives into the loss functions, has been shown to accelerate convergence and improve generalization compared to conventional $L^2$ training. However, the underlying mechanisms of this training method remain only partially understood. In this work, we present the first rigorous theoretical framework proving that Sobolev training accelerates the convergence of Rectified Linear Unit (ReLU) networks. Under a student-teacher framework with Gaussian inputs and shallow architectures, we derive exact formulas for population gradients and Hessians, and quantify the improvements in conditioning of the loss landscape and gradient-flow convergence rates. Extensive numerical experiments validate our theoretical findings and show that the benefits of Sobolev training extend to modern deep learning tasks.
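Sobolev训练的损失形式可以用一个简短的PyTorch片段直观说明(教师函数sin(x)与权重lam均为假设):

```python
import torch

# 示意性的 Sobolev 训练损失:在 L2 损失之外,额外匹配目标导数。
def sobolev_loss(model, x, lam=1.0):
    x = x.clone().requires_grad_(True)
    y_true = torch.sin(x)                 # 假设的教师函数
    dy_true = torch.cos(x)                # 其解析导数
    y = model(x)
    dy, = torch.autograd.grad(y.sum(), x, create_graph=True)
    return ((y - y_true)**2).mean() + lam * ((dy - dy_true)**2).mean()

model = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    x = torch.rand(256, 1) * 6 - 3
    loss = sobolev_loss(model, x)
    opt.zero_grad(); loss.backward(); opt.step()
```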
【11】Frictional Q-Learning
标题:摩擦Q学习
链接:https://arxiv.org/abs/2509.19771
摘要:我们在经典力学中的静摩擦与离策略(off-policy)RL中的外推误差之间建立类比,并据此构造一个约束,防止策略漂移到缺乏数据支持的动作上。在这项研究中,我们提出了摩擦Q学习(Frictional Q-learning),一种用于连续控制的深度强化学习算法,它扩展了批约束强化学习。我们的算法约束智能体的动作空间,鼓励其行为与重放缓冲区中的行为相似,同时与标准正交动作空间的流形保持距离。该约束保持了批约束方法的简单性,并为外推误差提供了直观的物理解释。实验上,我们进一步证明了该算法训练稳健,并在标准连续控制基准上取得了有竞争力的性能。
摘要:We draw an analogy between static friction in classical mechanics and extrapolation error in off-policy RL, and use it to formulate a constraint that prevents the policy from drifting toward unsupported actions. In this study, we present Frictional Q-learning, a deep reinforcement learning algorithm for continuous control, which extends batch-constrained reinforcement learning. Our algorithm constrains the agent's action space to encourage behavior similar to that in the replay buffer, while maintaining a distance from the manifold of the orthonormal action space. The constraint preserves the simplicity of batch-constrained, and provides an intuitive physical interpretation of extrapolation error. Empirically, we further demonstrate that our algorithm is robustly trained and achieves competitive performance across standard continuous control benchmarks.
【12】Causal Machine Learning for Surgical Interventions
标题:手术干预的因果机器学习
链接:https://arxiv.org/abs/2509.19705
摘要:手术决策十分复杂,需要理解患者特征、干预措施和结果之间的因果关系。在脊柱融合或脊柱侧凸矫正等高风险场景中,由于依赖难以处理复杂异质数据的传统统计方法,对个体化治疗效果(ITE)的准确估计仍然有限。在这项研究中,我们开发了一个用于ITE估计的多任务元学习框架X-MultiTask,它将每个手术决策(例如,前入路与后入路、手术与非手术)建模为一个独立任务,同时学习跨任务的共享表示。为了加强因果有效性,我们将逆概率加权(IPW)纳入训练目标。我们在两个数据集上评估了我们的方法:(1)公共脊柱融合数据集(1,017例患者),以评估前路与后路手术对并发症严重程度的影响;(2)私有AIS数据集(368例患者),以分析后路脊柱融合(PSF)与非手术治疗对患者报告结局(PRO)的影响。我们的模型在前路组中实现了最高的平均AUC(0.84),并在后路组中保持了竞争力(0.77)。它在治疗效果估计上优于基线,总体$\epsilon_{\text{NN-PEHE}}$(0.2778)和$\epsilon_{\text{ATE}}$(0.0763)最低。同样,在预测AIS中的PRO时,X-MultiTask在所有领域都一致表现出色,$\epsilon_{\text{NN-PEHE}}$ = 0.2551,$\epsilon_{\text{ATE}}$ = 0.0902。通过提供稳健的、针对具体患者的因果估计,X-MultiTask为推进个性化手术护理和改善患者预后提供了有力工具。该代码可在https://github.com/Wizaaard/X-MultiTask上获得。
摘要:Surgical decision-making is complex and requires understanding causal relationships between patient characteristics, interventions, and outcomes. In high-stakes settings like spinal fusion or scoliosis correction, accurate estimation of individualized treatment effects (ITEs) remains limited due to the reliance on traditional statistical methods that struggle with complex, heterogeneous data. In this study, we develop a multi-task meta-learning framework, X-MultiTask, for ITE estimation that models each surgical decision (e.g., anterior vs. posterior approach, surgery vs. no surgery) as a distinct task while learning shared representations across tasks. To strengthen causal validity, we incorporate the inverse probability weighting (IPW) into the training objective. We evaluate our approach on two datasets: (1) a public spinal fusion dataset (1,017 patients) to assess the effect of anterior vs. posterior approaches on complication severity; and (2) a private AIS dataset (368 patients) to analyze the impact of posterior spinal fusion (PSF) vs. non-surgical management on patient-reported outcomes (PROs). Our model achieves the highest average AUC (0.84) in the anterior group and maintains competitive performance in the posterior group (0.77). It outperforms baselines in treatment effect estimation with the lowest overall $\epsilon_{\text{NN-PEHE}}$ (0.2778) and $\epsilon_{\text{ATE}}$ (0.0763). Similarly, when predicting PROs in AIS, X-MultiTask consistently shows superior performance across all domains, with $\epsilon_{\text{NN-PEHE}}$ = 0.2551 and $\epsilon_{\text{ATE}}$ = 0.0902. By providing robust, patient-specific causal estimates, X-MultiTask offers a powerful tool to advance personalized surgical care and improve patient outcomes. The code is available at https://github.com/Wizaaard/X-MultiTask.
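文中提到把逆概率加权(IPW)纳入训练目标;其基本构造可用如下示意性Python片段说明(使用合成数据,并非论文官方实现):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# 示意:用逆概率加权(IPW)构造训练权重——先估计倾向得分
# e(x)=P(T=1|x),再对处理组按 1/e、对照组按 1/(1-e) 加权。
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))     # 处理分配与 x 相关

e = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]
w = np.where(T == 1, 1 / e, 1 / (1 - e))            # IPW 权重
print("权重范围:", w.min().round(2), "~", w.max().round(2))
# 这些权重可直接作为下游损失的 sample_weight 使用
```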
【13】Diffusion-Based Impedance Learning for Contact-Rich Manipulation Tasks
标题:基于扩散的阻抗学习用于丰富接触的操纵任务
链接:https://arxiv.org/abs/2509.19696
备注:15 pages, 12 figures
摘要:学习方法擅长信息域中的运动生成,但其设计初衷并非能量域中的物理交互。阻抗控制可以塑造物理交互,但需要通过选择可行的阻抗参数进行任务感知的调参。我们提出基于扩散的阻抗学习,这是一个结合这两个领域的框架。基于Transformer的扩散模型通过对外部力旋量(wrench)的交叉注意力来重建模拟零力轨迹(sZFT),从而同时捕获平移和旋转的任务空间行为。对于旋转,我们引入了一种新的基于SLERP的四元数噪声调度器,以确保几何一致性。重建的sZFT随后被传递给一个基于能量的估计器,用于更新刚度和阻尼参数;并应用一条方向性规则,在保持任务方向刚性的同时降低非任务轴上的阻抗。训练数据针对跑酷场景和机器人辅助治疗任务,通过Apple Vision Pro遥操作采集。仅用数万个样本,该模型就实现了亚毫米级的位置精度和亚度级的旋转精度。其紧凑的模型规模使KUKA LBR iiwa机器人能够实现实时力矩控制和自主刚度自适应。控制器在力与速度限制内实现了平稳的跑酷穿越,并在训练数据中不含任何特定插销演示的情况下,使圆柱形、方形和星形插销插入的成功率达到30/30。基于Transformer的扩散模型、机器人控制器和Apple Vision Pro遥操作框架的全部代码均已公开。这些结果标志着迈向物理AI的重要一步:将基于模型的物理交互控制与基于学习的轨迹生成方法融合在一起。
摘要:Learning methods excel at motion generation in the information domain but are not primarily designed for physical interaction in the energy domain. Impedance Control shapes physical interaction but requires task-aware tuning by selecting feasible impedance parameters. We present Diffusion-Based Impedance Learning, a framework that combines both domains. A Transformer-based Diffusion Model with cross-attention to external wrenches reconstructs a simulated Zero-Force Trajectory (sZFT). This captures both translational and rotational task-space behavior. For rotations, we introduce a novel SLERP-based quaternion noise scheduler that ensures geometric consistency. The reconstructed sZFT is then passed to an energy-based estimator that updates stiffness and damping parameters. A directional rule is applied that reduces impedance along non task axes while preserving rigidity along task directions. Training data were collected for a parkour scenario and robotic-assisted therapy tasks using teleoperation with Apple Vision Pro. With only tens of thousands of samples, the model achieved sub-millimeter positional accuracy and sub-degree rotational accuracy. Its compact model size enabled real-time torque control and autonomous stiffness adaptation on a KUKA LBR iiwa robot. The controller achieved smooth parkour traversal within force and velocity limits and 30/30 success rates for cylindrical, square, and star peg insertions without any peg-specific demonstrations in the training data set. All code for the Transformer-based Diffusion Model, the robot controller, and the Apple Vision Pro telemanipulation framework is publicly available. These results mark an important step towards Physical AI, fusing model-based control for physical interaction with learning-based methods for trajectory generation.
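文中基于SLERP的四元数噪声调度器旨在保证加噪后的旋转仍是单位四元数。下面是SLERP插值本身的一个示意实现(具体的噪声调度方式为假设,并非论文原版调度器):

```python
import numpy as np

# 示意:用 SLERP 在“干净”四元数与随机单位四元数之间插值,
# 作为保持单位范数(几何一致)的旋转加噪方式;t 对应噪声强度。
def slerp(q0, q1, t):
    dot = np.clip(np.dot(q0, q1), -1.0, 1.0)
    if dot < 0:                      # 取同一半球,走最短路径
        q1, dot = -q1, -dot
    theta = np.arccos(dot)
    if theta < 1e-6:
        return q0
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

rng = np.random.default_rng(0)
q_clean = np.array([1.0, 0.0, 0.0, 0.0])
q_noise = rng.normal(size=4); q_noise /= np.linalg.norm(q_noise)

q_t = slerp(q_clean, q_noise, t=0.3)
print("范数:", np.linalg.norm(q_t))   # 始终为 1
```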
【14】Deep Learning for Clouds and Cloud Shadow Segmentation in Methane Satellite and Airborne Imaging Spectroscopy
标题:甲烷卫星和机载成像光谱中云和云影分割的深度学习
链接:https://arxiv.org/abs/2509.19665
摘要:有效的云和云阴影检测是高光谱遥感中准确反演大气甲烷或其他痕量气体浓度的关键前提。这一挑战对MethaneSAT卫星及其机载伴随任务MethaneAIR尤为重要。在这项研究中,我们使用机器学习方法来解决这类高空间分辨率仪器的云和云阴影检测问题。遥感数据中的云和云阴影需要被有效筛除,因为它们会使遥感图像中的甲烷反演产生偏差,并影响排放量的量化。我们部署并评估了包括迭代逻辑回归(ILR)和多层感知器(MLP)在内的传统技术,以及先进的深度学习架构,即UNet和光谱通道注意力网络(SCAN)方法。结果表明,传统方法在空间一致性和边界刻画方面表现不佳,影响了云和云阴影的检测。深度学习模型大幅提高了检测质量:UNet在保持空间结构方面表现最好,而SCAN在捕获精细边界细节方面表现出色。值得注意的是,SCAN在MethaneSAT数据上超过了UNet,突显了针对卫星特有特征引入光谱注意力的好处。这一对多种机器学习技术的深入评估展示了先进深度学习架构在为云和云阴影筛除提供稳健、可扩展解决方案方面的优势与有效性,有助于增强现有及下一代高光谱任务的甲烷排放量化能力。我们的数据和代码可在https://doi.org/10.7910/DVN/IKLZOJ上公开获取
摘要:Effective cloud and cloud shadow detection is a critical prerequisite for accurate retrieval of concentrations of atmospheric methane or other trace gases in hyperspectral remote sensing. This challenge is especially pertinent for MethaneSAT and for its airborne companion mission, MethaneAIR. In this study, we use machine learning methods to address the cloud and cloud shadow detection problem for sensors with these high spatial resolutions instruments. Cloud and cloud shadows in remote sensing data need to be effectively screened out as they bias methane retrievals in remote sensing imagery and impact the quantification of emissions. We deploy and evaluate conventional techniques including Iterative Logistic Regression (ILR) and Multilayer Perceptron (MLP), with advanced deep learning architectures, namely UNet and a Spectral Channel Attention Network (SCAN) method. Our results show that conventional methods struggle with spatial coherence and boundary definition, affecting the detection of clouds and cloud shadows. Deep learning models substantially improve detection quality: UNet performs best in preserving spatial structure, while SCAN excels at capturing fine boundary details. Notably, SCAN surpasses UNet on MethaneSAT data, underscoring the benefits of incorporating spectral attention for satellite specific features. This in depth assessment of various disparate machine learning techniques demonstrates the strengths and effectiveness of advanced deep learning architectures in providing robust, scalable solutions for clouds and cloud shadow screening towards enhancing methane emission quantification capacity of existing and next generation hyperspectral missions. Our data and code is publicly available at https://doi.org/10.7910/DVN/IKLZOJ
【15】Improved Therapeutic Antibody Reformatting through Multimodal Machine Learning
标题:通过多模式机器学习改进治疗性抗体重组
链接:https://arxiv.org/abs/2509.19604
备注:NeurIPS 2025 AI4Science Workshop and NeurIPS 2025 Multi-modal Foundation Models and Large Language Models for Life Sciences Workshop
摘要:现代治疗性抗体设计通常涉及组成单个功能结构域的多部分组装体,每个功能结构域可以来自不同的来源或独立地工程化。虽然这些复杂的形式可以扩大疾病的适用性和提高安全性,但它们提出了一个重大的工程挑战:在新的形式中,单个结构域的功能和稳定性得不到保证,整个分子可能不再是可合成的。为了应对这些挑战,我们开发了一个机器学习框架来预测“重新格式化成功”--将抗体从一种格式转换为另一种格式是否会成功。我们的框架结合了抗体序列和结构背景,结合了反映现实部署情况的评估协议。在对真实世界抗体重新格式化数据集的实验中,我们发现了令人惊讶的结果,即大型预训练蛋白质语言模型(PLM)未能优于简单的,领域定制的多模态表示。这在最困难的评估设置中尤其明显,我们测试模型对新起始抗体的泛化。在这种具有挑战性的“新抗体,没有数据”的情况下,我们最好的多模式模型实现了高预测准确性,从而能够优先考虑有希望的候选人并减少浪费的实验工作。
摘要:Modern therapeutic antibody design often involves composing multi-part assemblages of individual functional domains, each of which may be derived from a different source or engineered independently. While these complex formats can expand disease applicability and improve safety, they present a significant engineering challenge: the function and stability of individual domains are not guaranteed in the novel format, and the entire molecule may no longer be synthesizable. To address these challenges, we develop a machine learning framework to predict "reformatting success" -- whether converting an antibody from one format to another will succeed or not. Our framework incorporates both antibody sequence and structural context, incorporating an evaluation protocol that reflects realistic deployment scenarios. In experiments on a real-world antibody reformatting dataset, we find the surprising result that large pretrained protein language models (PLMs) fail to outperform simple, domain-tailored, multimodal representations. This is particularly evident in the most difficult evaluation setting, where we test model generalization to a new starting antibody. In this challenging "new antibody, no data" scenario, our best multimodal model achieves high predictive accuracy, enabling prioritization of promising candidates and reducing wasted experimental effort.
【16】Modular Machine Learning with Applications to Genetic Circuit Composition
标题:模块化机器学习及其在遗传电路合成中的应用
链接:https://arxiv.org/abs/2509.19601
摘要:在包括合成生物学在内的一些应用中,我们通常拥有由许多模块组成的系统的输入/输出数据;尽管各模块的输入/输出函数和信号可能未知,但对组合架构的了解可以显著减少学习系统输入/输出映射所需的训练数据量。学习各模块的输入/输出函数对于用不同组合架构设计新系统也是必要的。在这里,我们提出了一个模块化学习框架,它利用系统组合结构的先验知识,(a)从系统的输入/输出数据中识别各组成模块的输入/输出函数,并且(b)与不利用组合结构知识相比,用更少的数据量实现这一点。为此,我们引入了模块可辨识性的概念,它允许从系统输入/输出数据的子集中恢复模块的输入/输出函数,并对一类受遗传电路启发的系统给出理论保证。我们通过计算研究验证了该理论,表明考虑了组合结构的神经网络(NNET)可以学习各组成模块的输入/输出函数,并能对训练集分布之外的输入预测系统输出;相比之下,对结构不可知的神经网络无法对落在训练集分布之外的输入进行预测。通过减少对实验数据的需求并支持模块辨识,该框架有望简化合成生物电路以及更一般的多模块系统的设计。
摘要:In several applications, including in synthetic biology, one often has input/output data on a system composed of many modules, and although the modules' input/output functions and signals may be unknown, knowledge of the composition architecture can significantly reduce the amount of training data required to learn the system's input/output mapping. Learning the modules' input/output functions is also necessary for designing new systems from different composition architectures. Here, we propose a modular learning framework, which incorporates prior knowledge of the system's compositional structure to (a) identify the composing modules' input/output functions from the system's input/output data and (b) achieve this by using a reduced amount of data compared to what would be required without knowledge of the compositional structure. To achieve this, we introduce the notion of modular identifiability, which allows recovery of modules' input/output functions from a subset of the system's input/output data, and provide theoretical guarantees on a class of systems motivated by genetic circuits. We demonstrate the theory on computational studies showing that a neural network (NNET) that accounts for the compositional structure can learn the composing modules' input/output functions and predict the system's output on inputs outside of the training set distribution. By contrast, a neural network that is agnostic of the structure is unable to predict on inputs that fall outside of the training set distribution. By reducing the need for experimental data and allowing module identification, this framework offers the potential to ease the design of synthetic biological circuits and of multi-module systems more generally.
【17】THINNs: Thermodynamically Informed Neural Networks
标题:THINNS:热力学信息神经网络
链接:https://arxiv.org/abs/2509.19467
摘要:物理信息神经网络(PINN)是一类深度学习模型,旨在通过训练神经网络最小化方程的残差来近似PDE的解。聚焦于非平衡涨落系统,我们提出了一种物理上有依据的惩罚项选择,它与由大偏差原理刻画的底层涨落结构保持一致。这种方法产生了一种新的PINN表述,其中惩罚项被选为惩罚不太可能发生的偏差,而不是启发式地选取。由此得到的热力学一致的PINN扩展称为THINNs;我们随后通过建立解析的后验估计对其进行分析,并与既有的惩罚策略进行实证比较。
摘要:Physics-Informed Neural Networks (PINNs) are a class of deep learning models aiming to approximate solutions of PDEs by training neural networks to minimize the residual of the equation. Focusing on non-equilibrium fluctuating systems, we propose a physically informed choice of penalization that is consistent with the underlying fluctuation structure, as characterized by a large deviations principle. This approach yields a novel formulation of PINNs in which the penalty term is chosen to penalize improbable deviations, rather than being selected heuristically. The resulting thermodynamically consistent extension of PINNs, termed THINNs, is subsequently analyzed by establishing analytical a posteriori estimates, and providing empirical comparisons to established penalization strategies.
【18】Probabilistic Runtime Verification, Evaluation and Risk Assessment of Visual Deep Learning Systems
标题:视觉深度学习系统的概率运行时验证、评估和风险评估
链接:https://arxiv.org/abs/2509.19419
摘要:尽管深度神经网络在基准测试中表现出色,但由于对输入数据中微小且通常难以察觉的变化(即分布偏移)十分敏感,它们在现实世界部署中往往表现不佳。这些偏移在实际场景中很常见,但在评估过程中很少被考虑,导致性能指标被高估。为弥补这一差距,我们提出了一种对深度学习系统进行验证、评估和风险评估的新方法。我们的方法通过分布外(OOD)检测器的输出估计分布偏移的概率,从而显式建模其在运行时的发生率。我们将这些估计与网络正确性的条件概率相结合,组织成一棵二叉树;遍历这棵树即可计算出可信且精确的网络精度估计。我们在五个不同的数据集上评估了该方法,并在其中模拟了以不同分布偏移频率为特征的部署条件。我们的方法始终优于传统评估,精度估计误差通常介于0.01与0.1之间。我们还在一个医学分割基准上展示了该方法的潜力:通过将成本与树节点相关联,我们把该方法用于风险评估,为成本效益分析和价值判断提供依据。最终,我们的方法通过提供更准确的性能估计和可操作的风险评估,为提高深度学习系统的可靠性和可信度提供了一个稳健的框架,特别是在安全关键型应用中。
摘要:Despite achieving excellent performance on benchmarks, deep neural networks often underperform in real-world deployment due to sensitivity to minor, often imperceptible shifts in input data, known as distributional shifts. These shifts are common in practical scenarios but are rarely accounted for during evaluation, leading to inflated performance metrics. To address this gap, we propose a novel methodology for the verification, evaluation, and risk assessment of deep learning systems. Our approach explicitly models the incidence of distributional shifts at runtime by estimating their probability from outputs of out-of-distribution detectors. We combine these estimates with conditional probabilities of network correctness, structuring them in a binary tree. By traversing this tree, we can compute credible and precise estimates of network accuracy. We assess our approach on five different datasets, with which we simulate deployment conditions characterized by differing frequencies of distributional shift. Our approach consistently outperforms conventional evaluation, with accuracy estimation errors typically ranging between 0.01 and 0.1. We further showcase the potential of our approach on a medical segmentation benchmark, wherein we apply our methods towards risk assessment by associating costs with tree nodes, informing cost-benefit analyses and value-judgments. Ultimately, our approach offers a robust framework for improving the reliability and trustworthiness of deep learning systems, particularly in safety-critical applications, by providing more accurate performance estimates and actionable risk assessments.
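该方法的核心算术可以用一个极简示例说明(各概率数值均为假设):按“是否发生分布偏移”这两个分支组合条件正确率,即得到部署条件下的精度估计,并可进一步与节点成本相乘做风险评估:

```python
# 示意:按二叉树组合分布偏移概率与条件正确率,
# 得到部署条件下的网络精度估计;数值均为假设。
p_shift = 0.15                 # 由 OOD 检测器输出估计的偏移概率
acc_in, acc_shift = 0.95, 0.60 # 条件正确率:无偏移 / 有偏移

accuracy = (1 - p_shift) * acc_in + p_shift * acc_shift
print(f"部署精度估计: {accuracy:.3f}")   # 0.897,低于基准上的 0.95

# 风险评估:给叶节点关联成本,得到期望成本
cost_error = 100.0
expected_cost = (1 - accuracy) * cost_error
print(f"每次预测的期望成本: {expected_cost:.1f}")
```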
【19】Learning from Observation: A Survey of Recent Advances
标题:从观察中学习:最近进展概览
链接:https://arxiv.org/abs/2509.19379
摘要:模仿学习(IL)算法提供了一种有效的方式,通过模仿专家的行为来训练智能体,而不需要奖励函数。IL算法通常需要访问来自专家演示的状态和动作信息。尽管专家动作可以提供详细的指导,但对于难以获得专家动作的现实应用来说,要求此类动作信息可能不切实际。为了解决这一限制,从观察中学习(LfO)或仅状态模仿学习(SOIL)的概念最近受到关注,其中模仿者只能访问专家的状态访问信息。在本文中,我们提出了一个LfO框架,并用它从轨迹构建、假设和算法设计选择等方面对现有LfO方法进行调查和分类。该调查还梳理了LfO与离线RL、基于模型的RL和分层RL等相关领域之间的联系。最后,我们使用该框架识别开放问题,并给出未来研究方向的建议。
摘要:Imitation Learning (IL) algorithms offer an efficient way to train an agent by mimicking an expert's behavior without requiring a reward function. IL algorithms often necessitate access to state and action information from expert demonstrations. Although expert actions can provide detailed guidance, requiring such action information may prove impractical for real-world applications where expert actions are difficult to obtain. To address this limitation, the concept of learning from observation (LfO) or state-only imitation learning (SOIL) has recently gained attention, wherein the imitator only has access to expert state visitation information. In this paper, we present a framework for LfO and use it to survey and classify existing LfO methods in terms of their trajectory construction, assumptions and algorithm's design choices. This survey also draws connections between several related fields like offline RL, model-based RL and hierarchical RL. Finally, we use our framework to identify open problems and suggest future research directions.
【20】DeepACTIF: Efficient Feature Attribution via Activation Traces in Neural Sequence Models
标题:DeepACTIF:通过神经序列模型中的激活痕迹进行高效特征归因
链接:https://arxiv.org/abs/2509.19362
摘要:特征归因对于解释深度学习模型至关重要,特别是在医疗保健、生物识别和人机交互等时间序列领域。然而,积分梯度(Integrated Gradients)或SHAP等标准归因方法计算量大,并不适合实时应用。我们提出了DeepACTIF,一种轻量级、架构感知的特征归因方法,它利用序列模型的内部激活来高效地估计特征重要性。针对基于LSTM的网络,我们引入了一个逆加权聚合方案,强调激活在各时间步上的稳定性和幅度。我们在三个生物特征凝视数据集上的评估表明,DeepACTIF不仅在严苛的特征削减(仅保留前10%的特征)下保持了预测性能,而且在准确性和统计鲁棒性方面显著优于包括SHAP、IG和DeepLIFT在内的已有方法。使用Wilcoxon符号秩检验和效应量分析,我们证明了DeepACTIF在所有前k条件下(10 - 40%)产生信息量更大的特征排名,并且误差显著降低。我们的实验表明,DeepACTIF不仅将计算时间和内存使用降低了几个数量级,而且在只使用排名靠前的特征时还保持了模型的准确性。这使得DeepACTIF成为在边缘设备(如移动XR头显或嵌入式健康监测器)上实现实时可解释性的可行方案。
摘要:Feature attribution is essential for interpreting deep learning models, particularly in time-series domains such as healthcare, biometrics, and human-AI interaction. However, standard attribution methods, such as Integrated Gradients or SHAP, are computationally intensive and not well-suited for real-time applications. We present DeepACTIF, a lightweight and architecture-aware feature attribution method that leverages internal activations of sequence models to estimate feature importance efficiently. Focusing on LSTM-based networks, we introduce an inverse-weighted aggregation scheme that emphasises stability and magnitude of activations across time steps. Our evaluation across three biometric gaze datasets shows that DeepACTIF not only preserves predictive performance under severe feature reduction (top 10% of features) but also significantly outperforms established methods, including SHAP, IG, and DeepLIFT, in terms of both accuracy and statistical robustness. Using Wilcoxon signed-rank tests and effect size analysis, we demonstrate that DeepACTIF yields more informative feature rankings with significantly lower error across all top-k conditions (10 - 40%). Our experiments demonstrate that DeepACTIF not only reduces computation time and memory usage by orders of magnitude but also preserves model accuracy when using only top-ranked features. That makes DeepACTIF a viable solution for real-time interpretability on edge devices such as mobile XR headsets or embedded health monitors.
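作为“利用激活迹估计特征重要性”这一思路的示意(DeepACTIF的具体逆加权方案未在摘要中给出,以下用幅度除以跨时间步方差的简单形式代替,仅供理解):

```python
import numpy as np

# 示意:基于激活迹的特征重要性(非 DeepACTIF 的官方实现)。
# 假设 acts 形状为 (样本数, 时间步, 特征数),重要性由激活幅度
# 与其跨时间步的稳定性(方差的倒数加权)共同决定。
rng = np.random.default_rng(0)
acts = np.abs(rng.normal(size=(100, 20, 8)))    # 假设的激活迹

mag = acts.mean(axis=(0, 1))                    # 平均幅度
var = acts.var(axis=1).mean(axis=0)             # 跨时间步的不稳定性
importance = mag / (var + 1e-8)                 # 逆方差加权:越稳定越重要

ranking = np.argsort(importance)[::-1]
print("特征重要性排序:", ranking)
```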
【21】Anti-Money Laundering Systems Using Deep Learning
标题:使用深度学习的反洗钱系统
链接:https://arxiv.org/abs/2509.19359
备注:22 pages, 9 figures
摘要:在本文中,我们专注于使用深度学习方法检测金融交易网络中的洗钱行为,以证明它可以作为更常用的基于规则的系统和传统反洗钱(AML)系统的补充或替代。本文探讨了反洗钱(AML)活动在全球金融业中发挥的关键作用,并强调了传统AML系统的缺点:误报率高,且缺乏发现复杂洗钱计划的能力。为应对这些挑战,本文提出了一种先进的AML系统,利用深度学习技术进行链接分析。该系统的核心在于利用度中心性、接近中心性、介数中心性和PageRank等中心性算法;这些算法通过考察金融交易网络内的影响力和相互联系,提高了系统识别可疑活动的能力。结果显示了GCN模型新实现的实用性和优越性;GCN是处理连接结构化数据的优选方法,这意味着交易或账户是在其金融环境的上下文中被分析的。此外,本文还展望了反洗钱(AML)工作的前景,提出整合深度学习和中心性算法等新兴技术,有望通过完善AML系统的能力来提高其效力。
摘要:In this paper, we focused on using deep learning methods for detecting money laundering in financial transaction networks, in order to demonstrate that it can be used as a complement or instead of the more commonly used rule-based systems and conventional Anti-Money Laundering (AML) systems. The paper explores the pivotal role played by Anti-Money Laundering (AML) activities in the global financial industry. It underscores the drawbacks of conventional AML systems, which exhibit high rates of false positives and lack the sophistication to uncover intricate money laundering schemes. To tackle these challenges, the paper proposes an advanced AML system that capitalizes on link analysis using deep learning techniques. At the heart of this system lies the utilization of centrality algorithms like Degree Centrality, Closeness Centrality, Betweenness Centrality, and PageRank. These algorithms enhance the system's capability to identify suspicious activities by examining the influence and interconnections within networks of financial transactions. The significance of Anti-Money Laundering (AML) efforts within the global financial sector is discussed in this paper. It highlights the limitations of traditional AML systems. The results showed the practicality and superiority of the new implementation of the GCN model, which is a preferable method for connectively structured data, meaning that a transaction or account is analyzed in the context of its financial environment. In addition, the paper delves into the prospects of Anti-Money Laundering (AML) efforts, proposing the integration of emerging technologies such as deep learning and centrality algorithms. This integration holds promise for enhancing the effectiveness of AML systems by refining their capabilities.
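文中提到的四种中心性指标都可以用networkx在交易图上直接计算;下面是一个玩具示例(数据为虚构的转账记录):

```python
import networkx as nx

# 示意:在交易网络上计算文中提到的四种中心性指标,
# 作为可疑账户检测的图特征;图为假设的玩具数据。
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("A", "B", 5000), ("B", "C", 4800), ("C", "A", 4700),  # 循环转账
    ("D", "B", 120), ("E", "B", 300), ("B", "F", 9000),
])

features = {
    "degree": nx.degree_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "pagerank": nx.pagerank(G),
}
for name, vals in features.items():
    print(name, {k: round(v, 3) for k, v in vals.items()})
```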
【22】Fine-Grained AI Model Caching and Downloading With Coordinated Multipoint Broadcasting in Multi-Cell Edge Networks
标题:细粒度人工智能模型在多小区边缘网络中通过协调多点广播进行缓存和下载
链接:https://arxiv.org/abs/2509.19341
摘要:预计6G网络将支持按需AI模型下载,以适应最终用户的各种推理需求。通过在边缘节点主动缓存模型,用户可以以低延迟检索所请求的模型,以进行设备上的AI推理。然而,当代人工智能模型的巨大规模对有限存储容量下的边缘缓存以及通过无线信道并发交付异构模型提出了重大挑战。为了解决这些挑战,我们提出了一个细粒度的AI模型缓存和下载系统,该系统利用参数可重用性,这源于从具有冻结参数的共享预训练模型中微调特定任务模型的常见做法。该系统在边缘节点处选择性地缓存模型参数块(PB),消除了跨不同缓存模型的可重用参数的冗余存储。此外,它结合了协作多点(CoMP)广播,以同时向多个用户递送可重用的PB,从而增强下行链路频谱利用率。在这种安排下,我们制定了一个模型下载延迟最小化问题,以共同优化PB缓存,迁移(边缘节点之间),和广播波束成形。为了解决这个棘手的问题,我们开发了一个分布式多智能体学习框架,使边缘节点明确学习他们的行动之间的相互影响,从而促进合作。此外,提出了一种数据增强方法,通过预测模型自适应地生成合成训练样本,提高样本效率并加速策略学习。理论分析和仿真实验都验证了所提出的学习框架的优越收敛性能。
摘要:6G networks are envisioned to support on-demand AI model downloading to accommodate diverse inference requirements of end users. By proactively caching models at edge nodes, users can retrieve the requested models with low latency for on-device AI inference. However, the substantial size of contemporary AI models poses significant challenges for edge caching under limited storage capacity, as well as for the concurrent delivery of heterogeneous models over wireless channels. To address these challenges, we propose a fine-grained AI model caching and downloading system that exploits parameter reusability, stemming from the common practice of fine-tuning task-specific models from a shared pre-trained model with frozen parameters. This system selectively caches model parameter blocks (PBs) at edge nodes, eliminating redundant storage of reusable parameters across different cached models. Additionally, it incorporates coordinated multipoint (CoMP) broadcasting to simultaneously deliver reusable PBs to multiple users, thereby enhancing downlink spectrum utilization. Under this arrangement, we formulate a model downloading delay minimization problem to jointly optimize PB caching, migration (among edge nodes), and broadcasting beamforming. To tackle this intractable problem, we develop a distributed multi-agent learning framework that enables edge nodes to explicitly learn mutual influence among their actions, thereby facilitating cooperation. Furthermore, a data augmentation approach is proposed to adaptively generate synthetic training samples through a predictive model, boosting sample efficiency and accelerating policy learning. Both theoretical analysis and simulation experiments validate the superior convergence performance of the proposed learning framework.
【23】Examining the robustness of Physics-Informed Neural Networks to noise for Inverse Problems
标题:检查物理信息神经网络对逆问题的噪音的鲁棒性
链接:https://arxiv.org/abs/2509.20191
备注:25 pages without appendix, 22 figures, submitted to a journal
摘要:Approximating solutions to partial differential equations (PDEs) is fundamental for the modeling of dynamical systems in science and engineering. Physics-informed neural networks (PINNs) are a recent machine learning-based approach, for which many properties and limitations remain unknown. PINNs are widely accepted as inferior to traditional methods for solving PDEs, such as the finite element method, both with regard to computation time and accuracy. However, PINNs are commonly claimed to show promise in solving inverse problems and handling noisy or incomplete data. We compare the performance of PINNs in solving inverse problems with that of a traditional approach using the finite element method combined with a numerical optimizer. The models are tested on a series of increasingly difficult fluid mechanics problems, with and without noise. We find that while PINNs may require less human effort and specialized knowledge, they are outperformed by the traditional approach. However, the difference appears to decrease with higher dimensions and more data. We identify common failures during training to be addressed if the performance of PINNs on noisy inverse problems is to become more competitive.
【24】High-Dimensional Statistical Process Control via Manifold Fitting and Learning
标题:通过流形拟合与流形学习实现高维统计过程控制
链接:https://arxiv.org/abs/2509.19820
摘要:We address the Statistical Process Control (SPC) of high-dimensional, dynamic industrial processes from two complementary perspectives: manifold fitting and manifold learning, both of which assume data lies on an underlying nonlinear, lower dimensional space. We propose two distinct monitoring frameworks for online or 'phase II' Statistical Process Control (SPC). The first method leverages state-of-the-art techniques in manifold fitting to accurately approximate the manifold where the data resides within the ambient high-dimensional space. It then monitors deviations from this manifold using a novel scalar distribution-free control chart. In contrast, the second method adopts a more traditional approach, akin to those used in linear dimensionality reduction SPC techniques, by first embedding the data into a lower-dimensional space before monitoring the embedded observations. We prove how both methods provide a controllable Type I error probability, after which they are contrasted for their corresponding fault detection ability. Extensive numerical experiments on a synthetic process and on a replicated Tennessee Eastman Process show that the conceptually simpler manifold-fitting approach achieves performance competitive with, and sometimes superior to, the more classical lower-dimensional manifold monitoring methods. In addition, we demonstrate the practical applicability of the proposed manifold-fitting approach by successfully detecting surface anomalies in a real image dataset of electrical commutators.
【25】Discovery of Sustainable Refrigerants through Physics-Informed RL Fine-Tuning of Sequence Models
标题:通过基于物理信息的RL对序列模型进行微调来发现可持续制冷剂
链接:https://arxiv.org/abs/2509.19588
摘要:Most refrigerants currently used in air-conditioning systems, such as hydrofluorocarbons, are potent greenhouse gases and are being phased down. Large-scale molecular screening has been applied to the search for alternatives, but in practice only about 300 refrigerants are known, and only a few additional candidates have been suggested without experimental validation. This scarcity of reliable data limits the effectiveness of purely data-driven methods. We present Refgen, a generative pipeline that integrates machine learning with physics-grounded inductive biases. Alongside fine-tuning for valid molecular generation, Refgen incorporates predictive models for critical properties, equations of state, thermochemical polynomials, and full vapor compression cycle simulations. These models enable reinforcement learning fine-tuning under thermodynamic constraints, enforcing consistency and guiding discovery toward molecules that balance efficiency, safety, and environmental impact. By embedding physics into the learning process, Refgen leverages scarce data effectively and enables de novo refrigerant discovery beyond the known set of compounds.
【26】The Platonic Universe: Do Foundation Models See the Same Sky?
标题:柏拉图宇宙:基础模型看到的是同一片天空吗?
链接:https://arxiv.org/abs/2509.19453
备注:9 pages, 3 tables, 1 figure. Accepted as a workshop paper to Machine Learning and the Physical Sciences at NeurIPS 2025
摘要:We test the Platonic Representation Hypothesis (PRH) in astronomy by measuring representational convergence across a range of foundation models trained on different data types. Using spectroscopic and imaging observations from JWST, HSC, Legacy Survey, and DESI, we compare representations from vision transformers, self-supervised models, and astronomy-specific architectures via mutual $k$-nearest neighbour analysis. We observe consistent scaling: representational alignment generally increases with model capacity across our tested architectures, supporting convergence toward a shared representation of galaxy astrophysics. Our results suggest that astronomical foundation models can use pre-trained general-purpose architectures, allowing us to capitalise on the broader machine learning community's already-spent computational investment.
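摘要中的互k近邻(mutual k-NN)对齐分析可以用如下示意性Python片段理解(数据为合成的两个线性“表征视图”,并非论文所用模型输出):

```python
import numpy as np

# 示意:互 k 近邻对齐度——对每个样本分别在两个表征空间中取
# k 近邻,度量近邻集合的平均重叠比例;重叠越高,表征越一致。
def mutual_knn_alignment(A, B, k=10):
    def knn(X):
        d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        return np.argsort(d, axis=1)[:, :k]
    na, nb = knn(A), knn(B)
    overlap = [len(set(na[i]) & set(nb[i])) / k for i in range(len(A))]
    return float(np.mean(overlap))

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 32))
A = Z @ rng.normal(size=(32, 64))          # 两个“模型”的表征:
B = Z @ rng.normal(size=(32, 48))          # 同一潜变量的不同线性视图
print(f"对齐度: {mutual_knn_alignment(A, B):.3f}")
```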
【27】Neural Network Based Framework for Passive Intermodulation Cancellation in MIMO Systems
标题:基于神经网络的MIMO系统无源互调消除框架
链接:https://arxiv.org/abs/2509.19382
摘要:Passive intermodulation (PIM) has emerged as a critical source of self-interference in modern MIMO-OFDM systems, especially under the stringent requirements of 5G and beyond. Conventional cancellation methods often rely on complex nonlinear models with limited scalability and high computational cost. In this work, we propose a lightweight deep learning framework for PIM cancellation that leverages depthwise separable convolutions and dilated convolutions to efficiently capture nonlinear dependencies across antennas and subcarriers. To further enhance convergence, we adopt a cyclic learning rate schedule and gradient clipping. In a controlled MIMO experimental setup, the method effectively suppresses third-order passive intermodulation (PIM) distortion, achieving up to 29dB of average power error (APE) with only 11k trainable parameters. These results highlight the potential of compact neural architectures for scalable interference mitigation in future wireless communication systems.
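文中“深度可分离卷积 + 空洞卷积”的组合可用一个极简PyTorch模块示意(通道数、膨胀率与输入形状均为假设,并非论文原结构):

```python
import torch
import torch.nn as nn

# 示意:深度可分离卷积 + 空洞卷积的轻量块(非论文官方结构),
# 用于跨天线/子载波捕获非线性依赖;通道数等超参为假设。
class PIMBlock(nn.Module):
    def __init__(self, ch=32, dilation=2):
        super().__init__()
        self.depthwise = nn.Conv1d(ch, ch, kernel_size=3, padding=dilation,
                                   dilation=dilation, groups=ch)  # 逐通道
        self.pointwise = nn.Conv1d(ch, ch, kernel_size=1)         # 1x1 融合
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x))) + x    # 残差

x = torch.randn(4, 32, 256)     # (批, 天线×IQ 通道, 子载波)
print(PIMBlock()(x).shape)      # torch.Size([4, 32, 256])
```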
【28】The Impact of Structural Changes on Learning Capacity in the Fly Olfactory Neural Circuit
标题:苍蝇嗅觉神经回路结构变化对学习能力的影响
链接:https://arxiv.org/abs/2509.19351
摘要:The Drosophila mushroom body (MB) is known to be involved in olfactory learning and memory; the synaptic plasticity of the Kenyon cell (KC) to mushroom body output neuron (MBON) synapses plays a key role in the learning process. Previous research has focused on projection neuron (PN) to Kenyon cell (KC) connectivity within the MB; we examine how perturbations to the mushroom body circuit structure and changes in connectivity, specifically within the KC to mushroom body output neuron (MBON) neural circuit, affect the MBONs' ability to distinguish between odor classes. We constructed a neural network that incorporates the connectivity between PNs, KCs, and MBONs. To train our model, we generated ten artificial input classes, which represent the projection neuron activity in response to different odors. We collected data on the number of KC-to-MBON connections, MBON error rates, and KC-to-MBON synaptic weights, among other metrics. We observed that MBONs with very few presynaptic KCs consistently performed worse than others in the odor classification task. The developmental types of KCs also played a significant role in each MBON's output. We performed random and targeted KC ablation and observed that ablating developmentally mature KCs had a greater negative impact on MBONs' learning capacity than ablating immature KCs. Random and targeted pruning of KC-MBON synaptic connections yielded results largely consistent with the ablation experiments. To further explore the various types of KCs, we also performed rewiring experiments in the PN to KC circuit. Our study furthers our understanding of olfactory neuroplasticity and provides important clues to understanding learning and memory in general. Understanding how the olfactory circuits process and learn can also have potential applications in artificial intelligence and treatments for neurodegenerative diseases.
【29】A Measurement Report Data-Driven Framework for Localized Statistical Channel Modeling
标题:用于局部化统计信道建模的测量报告数据驱动框架
链接:https://arxiv.org/abs/2509.19342
摘要:Localized statistical channel modeling (LSCM) is crucial for effective performance evaluation in digital twin-assisted network optimization. Solely relying on the multi-beam reference signal receiving power (RSRP), LSCM aims to model the localized statistical propagation environment by estimating the channel angular power spectrum (APS). However, existing methods rely heavily on drive test data with high collection costs and limited spatial coverage. In this paper, we propose a measurement report (MR) data-driven framework for LSCM, exploiting the low-cost and extensive collection of MR data. The framework comprises two novel modules. The MR localization module addresses the issue of missing locations in MR data by introducing a semi-supervised method based on hypergraph neural networks, which exploits multi-modal information via distance-aware hypergraph modeling and hypergraph convolution for location extraction. To enhance the computational efficiency and solution robustness, LSCM operates at the grid level. Compared to independently constructing geographically uniform grids and estimating channel APS, the joint grid construction and channel APS estimation module enhances robustness in complex environments with spatially non-uniform data by exploiting their correlation. This module alternately optimizes grid partitioning and APS estimation using clustering and improved sparse recovery for the ill-conditioned measurement matrix and incomplete observations. Through comprehensive experiments on a real-world MR dataset, we demonstrate the superior performance and robustness of our framework in localization and channel modeling.
【30】E2E Learning Massive MIMO for Multimodal Semantic Non-Orthogonal Transmission and Fusion
标题:面向多模态语义非正交传输与融合的E2E学习大规模MIMO
链接:https://arxiv.org/abs/2509.19312
摘要:Massive multiple-input multiple-output (MIMO) promises high spectral efficiency but also leads to high-dimensional downlink channel state information (CSI), which complicates real-time channel acquisition and precoding. To address this, we propose an end-to-end (E2E) uplink-downlink CSI fusion precoding network that jointly models downlink CSI reference signal (CSI-RS) design, CSI feedback, and base-station (BS) precoding within a single E2E neural architecture. Concretely, a projection network built on the MAXIM architecture takes uplink sounding reference signals (SRS) as input and outputs frequency-, beam-, and port-domain projection matrices for designing downlink CSI-RS. User equipment (UE) then compresses/quantizes the resulting CSI-RS observations and feeds back a compact representation. At the base station (BS), two complementary branches produce candidate precoders: one is a feedback-only precoding network driven by quantized downlink observations, and the other is an SRS-only precoding network driven by uplink SRS. These candidate precoders are subsequently combined by a fusion precoding network to yield the final transmit precoder. All the modules are trained with a spectral-efficiency-oriented loss under a three-stage schedule. Simulation results show that the proposed approach effectively harnesses both SRS-derived information and UE feedback, achieving markedly better performance than conventional baselines.
其他(32篇)
【1】Alignment-Sensitive Minimax Rates for Spectral Algorithms with Learned Kernels
标题:具有学习核的谱算法的对齐敏感极小极大速率
链接:https://arxiv.org/abs/2509.20294
摘要:We study spectral algorithms in the setting where kernels are learned from data. We introduce the effective span dimension (ESD), an alignment-sensitive complexity measure that depends jointly on the signal, spectrum, and noise level $\sigma^2$. The ESD is well-defined for arbitrary kernels and signals without requiring eigen-decay conditions or source conditions. We prove that for sequence models whose ESD is at most $K$, the minimax excess risk scales as $\sigma^2 K$. Furthermore, we analyze over-parameterized gradient flow and prove that it can reduce the ESD. This finding establishes a connection between adaptive feature learning and provable improvements in generalization of spectral algorithms. We demonstrate the generality of the ESD framework by extending it to linear models and RKHS regression, and we support the theory with numerical experiments. This framework provides a novel perspective on generalization beyond traditional fixed-kernel theories.
【2】Failure Modes of Maximum Entropy RLHF
标题:最大熵RLHF的失效模式
链接:https://arxiv.org/abs/2509.20265
备注:26 pages, 9 figures
摘要:In this paper, we show that Simple Preference Optimization (SimPO) can be derived as Maximum Entropy Reinforcement Learning with length-normalized temperature, providing a theoretical foundation for this reference-free method. Motivated by SimPO's strong performance in offline preference optimization, we investigate whether Maximum Entropy RL can achieve similar results in online RLHF settings. Our experiments find that Maximum Entropy RL consistently exhibits overoptimization and unstable KL dynamics, even at very low learning rates. Unlike KL-constrained methods that maintain stable training, entropy regularization fails to prevent reward hacking and appears to correlate with overoptimization. Lastly, we discuss possible explanations for why SimPO succeeds in offline settings while Maximum Entropy RL struggles in online scenarios. Our findings suggest that reference-free approaches may face distinct challenges when applied to online or offline preference learning.
【3】ImageNet-trained CNNs are not biased towards texture: Revisiting feature reliance through controlled suppression
标题:ImageNet训练的CNN不会偏向纹理:通过受控抑制重新审视特征依赖
链接:https://arxiv.org/abs/2509.20234
备注:Accepted at NeurIPS 2025 (oral)
摘要:The hypothesis that Convolutional Neural Networks (CNNs) are inherently texture-biased has shaped much of the discourse on feature use in deep learning. We revisit this hypothesis by examining limitations in the cue-conflict experiment by Geirhos et al. To address these limitations, we propose a domain-agnostic framework that quantifies feature reliance through systematic suppression of shape, texture, and color cues, avoiding the confounds of forced-choice conflicts. By evaluating humans and neural networks under controlled suppression conditions, we find that CNNs are not inherently texture-biased but predominantly rely on local shape features. Nonetheless, this reliance can be substantially mitigated through modern training strategies or architectures (ConvNeXt, ViTs). We further extend the analysis across computer vision, medical imaging, and remote sensing, revealing that reliance patterns differ systematically: computer vision models prioritize shape, medical imaging models emphasize color, and remote sensing models exhibit a stronger reliance towards texture. Code is available at https://github.com/tomburgert/feature-reliance.
【4】Staying on the Manifold: Geometry-Aware Noise Injection
标题:留在流形上:几何感知噪声注入
链接:https://arxiv.org/abs/2509.20201
摘要:It has been shown that perturbing the input during training implicitly regularises the gradient of the learnt function, leading to smoother models and enhancing generalisation. However, previous research mostly considered the addition of ambient noise in the input space, without considering the underlying structure of the data. In this work, we propose several methods of adding geometry-aware input noise that accounts for the lower dimensional manifold the input space inhabits. We start by projecting ambient Gaussian noise onto the tangent space of the manifold. In a second step, the noise sample is mapped on the manifold via the associated geodesic curve. We also consider Brownian motion noise, which moves in random steps along the manifold. We show that geometry-aware noise leads to improved generalization and robustness to hyperparameter selection on highly curved manifolds, while performing at least as well as training without noise on simpler manifolds. Our proposed framework extends to learned data manifolds.
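以单位球面为例,摘要描述的“先投影到切空间、再沿测地线映回流形”的加噪流程可写成如下示意性Python片段(仅演示球面这一特例):

```python
import numpy as np

# 示意:在单位球面 S^{n-1} 上做几何感知加噪——先把环境高斯噪声
# 投影到 x 处的切空间,再沿测地线(指数映射)映回流形。
def geodesic_noise(x, sigma, rng):
    eps = rng.normal(scale=sigma, size=x.shape)
    v = eps - np.dot(eps, x) * x          # 投影到切空间 T_x S
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return x
    return np.cos(norm) * x + np.sin(norm) * v / norm  # 指数映射

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 0.0])
x_noisy = geodesic_noise(x, sigma=0.1, rng=rng)
print("仍在流形上:", np.isclose(np.linalg.norm(x_noisy), 1.0))
```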
【5】Thinking Augmented Pre-training
标题:思维强化预训练
链接:https://arxiv.org/abs/2509.20186
备注:19 pages
摘要:This paper introduces a simple and scalable approach to improve the data efficiency of large language model (LLM) training by augmenting existing text data with thinking trajectories. The compute for pre-training LLMs has been growing at an unprecedented rate, while the availability of high-quality data remains limited. Consequently, maximizing the utility of available data constitutes a significant research challenge. A primary impediment is that certain high-quality tokens are difficult to learn given a fixed model capacity, as the underlying rationale for a single token can be exceptionally complex and deep. To address this issue, we propose Thinking augmented Pre-Training (TPT), a universal methodology that augments text with automatically generated thinking trajectories. Such augmentation effectively increases the volume of the training data and makes high-quality tokens more learnable through step-by-step reasoning and decomposition. We apply TPT across diverse training configurations up to $100$B tokens, encompassing pre-training with both constrained and abundant data, as well as mid-training from strong open-source checkpoints. Experimental results indicate that our method substantially improves the performance of LLMs across various model sizes and families. Notably, TPT enhances the data efficiency of LLM pre-training by a factor of $3$. For a $3$B parameter model, it improves the post-training performance by over $10\%$ on several challenging reasoning benchmarks.
【6】Discovering Association Rules in High-Dimensional Small Tabular Data
标题:高维小表格数据中关联规则的发现
链接:https://arxiv.org/abs/2509.20113
备注:This paper was accepted at ECAI 2025 Workshop: 1st International Workshop on Advanced Neuro-Symbolic Applications (ANSyA)
摘要:Association Rule Mining (ARM) aims to discover patterns between features in datasets in the form of propositional rules, supporting both knowledge discovery and interpretable machine learning in high-stakes decision-making. However, in high-dimensional settings, rule explosion and computational overhead render popular algorithmic approaches impractical without effective search space reduction, challenges that propagate to downstream tasks. Neurosymbolic methods, such as Aerial+, have recently been proposed to address the rule explosion in ARM. While they tackle the high dimensionality of the data, they also inherit limitations of neural networks, particularly reduced performance in low-data regimes. This paper makes three key contributions to association rule discovery in high-dimensional tabular data. First, we empirically show that Aerial+ scales one to two orders of magnitude better than state-of-the-art algorithmic and neurosymbolic baselines across five real-world datasets. Second, we introduce the novel problem of ARM in high-dimensional, low-data settings, such as gene expression data from the biomedicine domain with around 18k features and 50 samples. Third, we propose two fine-tuning approaches to Aerial+ using tabular foundation models. Our proposed approaches are shown to significantly improve rule quality on five real-world datasets, demonstrating their effectiveness in low-data, high-dimensional scenarios.
【7】Incomplete Data, Complete Dynamics: A Diffusion Approach
标题:不完整数据,完整动态:扩散方法
链接:https://arxiv.org/abs/2509.20098
摘要:Learning physical dynamics from data is a fundamental challenge in machine learning and scientific modeling. Real-world observational data are inherently incomplete and irregularly sampled, posing significant challenges for existing data-driven approaches. In this work, we propose a principled diffusion-based framework for learning physical systems from incomplete training samples. To this end, our method strategically partitions each such sample into observed context and unobserved query components through a carefully designed splitting strategy, then trains a conditional diffusion model to reconstruct the missing query portions given available contexts. This formulation enables accurate imputation across arbitrary observation patterns without requiring complete data supervision. Specifically, we provide theoretical analysis demonstrating that our diffusion training paradigm on incomplete data achieves asymptotic convergence to the true complete generative process under mild regularity conditions. Empirically, we show that our method significantly outperforms existing baselines on synthetic and real-world physical dynamics benchmarks, including fluid flows and weather systems, with particularly strong performance in limited and irregular observation regimes. These results demonstrate the effectiveness of our theoretically principled approach for learning and imputing partially observed dynamics.
【8】The Syntax and Semantics of einsum
Title: The Syntax and Semantics of einsum
Link: https://arxiv.org/abs/2509.20020
Note: 21 pages, 1 figure. Includes formal definitions, proofs of algebraic properties, and nesting/denesting rules for the einsum notation
Abstract: In 2011, einsum was introduced to NumPy as a practical and convenient notation for tensor expressions in machine learning, quantum circuit simulation, and other fields. It has since been implemented in additional Python frameworks such as PyTorch and TensorFlow, as well as in other programming languages such as Julia. Despite its practical success, the einsum notation still lacks a solid theoretical basis, and is not unified across the different frameworks, limiting opportunities for formal reasoning and systematic optimization. In this work, we discuss the terminology of tensor expressions and provide a formal definition of the einsum language. Based on this definition, we formalize and prove important equivalence rules for tensor expressions and highlight their relevance in practical applications.
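For readers unfamiliar with the notation, a small NumPy example of einsum expressions and one nesting/denesting-style equivalence; the identity shown is a standard property of the notation, not a rule quoted from the paper:

```python
import numpy as np

A = np.random.rand(4, 5)
B = np.random.rand(5, 6)
C = np.random.rand(6, 3)

# Matrix product and trace written as einsum expressions.
AB = np.einsum("ij,jk->ik", A, B)
tr = np.einsum("ii->", A @ A.T)

# Nesting/denesting: an einsum applied to the result of another einsum
# equals a single flattened expression, since summation indices merge.
nested = np.einsum("ik,kl->il", np.einsum("ij,jk->ik", A, B), C)
flat = np.einsum("ij,jk,kl->il", A, B, C)
assert np.allclose(nested, flat)
```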
【9】Faster Than SVD, Smarter Than SGD: The OPLoRA Alternating Update
Title: Faster Than SVD, Smarter Than SGD: The OPLoRA Alternating Update
Link: https://arxiv.org/abs/2509.19977
Note: 12 pages, 2 figures, 1 table. Accepted to the OPT 2025 Workshop
Abstract: Low-Rank Adaptation (LoRA) fine-tunes large models by learning low-rank updates on top of frozen weights, dramatically reducing trainable parameters and memory. However, there is still a gap between full training with low-rank projections (SVDLoRA) and LoRA fine-tuning, indicating that LoRA steps can be further improved. In this study, we propose OPLoRA, a memory-efficient optimizer that closes this gap by casting LoRA optimization as an interpretable sub-problem and solving it efficiently with alternating least squares updates, where 1-2 alternating steps are empirically found to be sufficient to closely match truncated SVD without ever forming the full matrix. We also recover the recently proposed preconditioning methods for LoRA as a special case. OPLoRA supports momentum by maintaining a low-rank estimate using the same subroutine (LoRSum) for computing the step, with a memory budget of 3 times the number of LoRA parameters (i.e., the same as Adam). We also propose an experimental scaled variant that uses the K-FAC metric, which may be of independent interest. Across a linear task, MNIST, CIFAR-100, and RoBERTa-base (MNLI), OPLoRA consistently approaches SVDLoRA's performance using significantly less memory.
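The alternating least-squares idea can be illustrated on the underlying subproblem: approximate a matrix at rank $r$ by alternating closed-form updates of the two factors, never forming a full SVD. A minimal NumPy sketch of that generic ALS subroutine, under our own naming; it is not the paper's exact LoRSum implementation:

```python
import numpy as np

def als_low_rank(M, r, sweeps=2, seed=0):
    """Rank-r approximation of M via alternating least squares.
    A couple of sweeps typically lands close to the truncated SVD."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    A = rng.standard_normal((n, r))
    for _ in range(sweeps):
        # Fix A, solve min_B ||M - B A^T||_F in closed form.
        B = M @ A @ np.linalg.pinv(A.T @ A)
        # Fix B, solve for A the same way.
        A = M.T @ B @ np.linalg.pinv(B.T @ B)
    return B, A  # M is approximated by B @ A.T

M = np.random.default_rng(1).standard_normal((50, 30))
B, A = als_low_rank(M, r=4)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
svd_err = np.linalg.norm(M - (U[:, :4] * s[:4]) @ Vt[:4])
als_err = np.linalg.norm(M - B @ A.T)
print(svd_err, als_err)  # the ALS error approaches the optimal SVD error
```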
【10】TABFAIRGDT: A Fast Fair Tabular Data Generator using Autoregressive Decision Trees
Title: TABFAIRGDT: A Fast Fair Tabular Data Generator Using Autoregressive Decision Trees
Link: https://arxiv.org/abs/2509.19927
Note: Paper accepted at IEEE ICDM 2025: IEEE International Conference on Data Mining 2025, November 12-15, 2025, Washington DC, USA
Abstract: Ensuring fairness in machine learning remains a significant challenge, as models often inherit biases from their training data. Generative models have recently emerged as a promising approach to mitigate bias at the data level while preserving utility. However, many rely on deep architectures, despite evidence that simpler models can be highly effective for tabular data. In this work, we introduce TABFAIRGDT, a novel method for generating fair synthetic tabular data using autoregressive decision trees. To enforce fairness, we propose a soft leaf resampling technique that adjusts decision tree outputs to reduce bias while preserving predictive performance. Our approach is non-parametric, effectively capturing complex relationships between mixed feature types, without relying on assumptions about the underlying data distributions. We evaluate TABFAIRGDT on benchmark fairness datasets and demonstrate that it outperforms state-of-the-art (SOTA) deep generative models, achieving a better fairness-utility trade-off for downstream tasks, as well as higher synthetic data quality. Moreover, our method is lightweight, highly efficient, and CPU-compatible, requiring no data pre-processing. Remarkably, TABFAIRGDT achieves a 72% average speedup over the fastest SOTA baseline across various dataset sizes, and can generate fair synthetic data for medium-sized datasets (10 features, 10K samples) in just one second on a standard CPU, making it an ideal solution for real-world fairness-sensitive applications.
【11】Pure Exploration via Frank-Wolfe Self-Play
Title: Pure Exploration via Frank-Wolfe Self-Play
Link: https://arxiv.org/abs/2509.19901
Abstract: We study pure exploration in structured stochastic multi-armed bandits, aiming to efficiently identify the correct hypothesis from a finite set of alternatives. For a broad class of tasks, asymptotic analyses reduce to a maximin optimization that admits a two-player zero-sum game interpretation between an experimenter and a skeptic: the experimenter allocates measurements to rule out alternatives while the skeptic proposes alternatives. We reformulate the game by allowing the skeptic to adopt a mixed strategy, yielding a concave-convex saddle-point problem. This viewpoint leads to Frank-Wolfe Self-Play (FWSP): a projection-free, regularization-free, tuning-free method whose one-hot updates on both sides match the bandit sampling paradigm. However, structural constraints introduce sharp pathologies that complicate algorithm design and analysis: our linear-bandit case study exhibits nonunique optima, optimal designs with zero mass on the best arm, bilinear objectives, and nonsmoothness at the boundary. We address these challenges via a differential-inclusion argument, proving convergence of the game value for best-arm identification in linear bandits. Our analysis proceeds through a continuous-time limit: a differential inclusion with a Lyapunov function that decays exponentially, implying a vanishing duality gap and convergence to the optimal value. Although Lyapunov analysis requires differentiability of the objective, which is not guaranteed on the boundary, we show that along continuous trajectories the algorithm steers away from pathological nonsmooth points and achieves uniform global convergence to the optimal game value. We then embed the discrete-time updates into a perturbed flow and show that the discrete game value also converges. Building on FWSP, we further propose a learning algorithm based on posterior sampling. Numerical experiments demonstrate a vanishing duality gap.
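As background, the maximin optimization referenced above is typically written as an allocation game over the probability simplex; the following is a standard form from the best-arm identification literature, with our notation, not an equation quoted from the paper:

$$ \max_{w \in \Delta_K} \; \min_{\lambda \in \operatorname{Alt}(\theta)} \; \sum_{a=1}^{K} w_a \, \mathrm{KL}\!\left(\theta_a \,\Vert\, \lambda_a\right), $$

where $w$ is the experimenter's sampling allocation over the $K$ arms and $\operatorname{Alt}(\theta)$ is the set of alternative models that would change the answer. Letting the skeptic mix over $\operatorname{Alt}(\theta)$ turns the inner minimum into an expectation over a distribution on alternatives, which is exactly the concave-convex saddle-point reformulation that FWSP attacks with projection-free one-hot updates.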
【12】MCGrad: Multicalibration at Web Scale
Title: MCGrad: Multicalibration at Web Scale
Link: https://arxiv.org/abs/2509.19884
Note: Under submission
Abstract: We propose MCGrad, a novel and scalable multicalibration algorithm. Multicalibration (calibration in sub-groups of the data) is an important property for the performance of machine learning-based systems. Existing multicalibration methods have thus far received limited traction in industry. We argue that this is because existing methods (1) require such subgroups to be manually specified, which ML practitioners often struggle with, (2) are not scalable, or (3) may harm other notions of model performance such as log loss and Area Under the Precision-Recall Curve (PRAUC). MCGrad does not require explicit specification of protected groups, is scalable, and often improves other ML evaluation metrics instead of harming them. MCGrad has been in production at Meta, and is now part of hundreds of production models. We present results from these deployments as well as results on public datasets.
【13】Oversampling and Downsampling with Core-Boundary Awareness: A Data Quality-Driven Approach
Title: Oversampling and Downsampling with Core-Boundary Awareness: A Data Quality-Driven Approach
Link: https://arxiv.org/abs/2509.19856
Abstract: The effectiveness of machine learning models, particularly in unbalanced classification tasks, is often hindered by the failure to differentiate between critical instances near the decision boundary and redundant samples concentrated in the core of the data distribution. In this paper, we propose a method to systematically identify and differentiate between these two types of data. Through extensive experiments on multiple benchmark datasets, we show that the boundary data oversampling method improves the F1 score by up to 10\% on 96\% of the datasets, whereas our core-aware reduction method compresses datasets by up to 90\% while preserving their accuracy, retaining only a tenth of the original data. Beyond imbalanced classification, our method has broader implications for efficient model training, particularly in computationally expensive domains such as Large Language Model (LLM) training. By prioritizing high-quality, decision-relevant data, our approach can be extended to text, multimodal, and self-supervised learning scenarios, offering a pathway to faster convergence, improved generalization, and significant computational savings. This work paves the way for future research in data-efficient learning, where intelligent sampling replaces brute-force expansion, driving the next generation of AI advancements. Our code is available as a Python package at https://pypi.org/project/adaptive-resampling/ .
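One common way to operationalize the core/boundary distinction is by class disagreement among nearest neighbours; a sketch under that assumption (the linked package may use a different criterion):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def boundary_scores(X, y, k=10):
    """Fraction of each point's k nearest neighbours with a different
    label: near 0 for core points, large near the decision boundary."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    neigh_labels = y[idx[:, 1:]]          # drop the self-neighbour
    return (neigh_labels != y[:, None]).mean(axis=1)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.repeat([0, 1], 200)
s = boundary_scores(X, y)
boundary = X[s > 0.3]   # candidates for oversampling
core = X[s < 0.05]      # candidates for downsampling
```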
【14】An Efficient Conditional Score-based Filter for High Dimensional Nonlinear Filtering Problems
Title: An Efficient Conditional Score-based Filter for High-Dimensional Nonlinear Filtering Problems
Link: https://arxiv.org/abs/2509.19816
Abstract: In many engineering and applied science domains, high-dimensional nonlinear filtering is still a challenging problem. Recent advances in score-based diffusion models offer a promising alternative for posterior sampling but require repeated retraining to track evolving priors, which is impractical in high dimensions. In this work, we propose the Conditional Score-based Filter (CSF), a novel algorithm that leverages a set-transformer encoder and a conditional diffusion model to achieve efficient and accurate posterior sampling without retraining. By decoupling prior modeling and posterior sampling into offline and online stages, CSF enables scalable score-based filtering across diverse nonlinear systems. Extensive experiments on benchmark problems show that CSF achieves superior accuracy, robustness, and efficiency across diverse nonlinear filtering scenarios.
【15】Intuition to Evidence: Measuring AI's True Impact on Developer Productivity
Title: Intuition to Evidence: Measuring AI's True Impact on Developer Productivity
Link: https://arxiv.org/abs/2509.19708
Note: 16 pages, 10 figures, 5 tables
Abstract: We present a comprehensive real-world evaluation of AI-assisted software development tools deployed at enterprise scale. Over one year, 300 engineers across multiple teams integrated an in-house AI platform (DeputyDev) that combines code generation and automated review capabilities into their daily workflows. Through rigorous cohort analysis, our study demonstrates statistically significant productivity improvements, including an overall 31.8% reduction in PR review cycle time. Developer adoption was strong, with 85% satisfaction for code review features and 93% expressing a desire to continue using the platform. Adoption patterns showed systematic scaling from 4% engagement in month 1 to 83% peak usage by month 6, stabilizing at 60% active engagement. Top adopters achieved a 61% increase in code volume pushed to production, contributing to approximately 30 to 40% of code shipped to production through this tool, accounting for an overall 28% increase in code shipment volume. Unlike controlled benchmark evaluations, our longitudinal analysis provides empirical evidence from production environments, revealing both the transformative potential and practical deployment challenges of integrating AI into enterprise software development workflows.
【16】A Unified Noise-Curvature View of Loss of Trainability
Title: A Unified Noise-Curvature View of Loss of Trainability
Link: https://arxiv.org/abs/2509.19698
Abstract: Loss of trainability (LoT) in continual learning occurs when gradient steps no longer yield improvement as tasks evolve, so accuracy stalls or degrades despite adequate capacity and supervision. We analyze LoT incurred with Adam through an optimization lens and find that single indicators such as Hessian rank, sharpness level, weight or gradient norms, gradient-to-parameter ratios, and unit-sign entropy are not reliable predictors. Instead we introduce two complementary criteria: a batch-size-aware gradient-noise bound and a curvature volatility-controlled bound that combine into a per-layer predictive threshold that anticipates trainability behavior. Using this threshold, we build a simple per-layer scheduler that keeps each layer's effective step below a safe limit, stabilizing training and improving accuracy across concatenated ReLU (CReLU), Wasserstein regularization, and L2 weight decay, with learned learning-rate trajectories that mirror canonical decay.
【17】Formal Safety Verification and Refinement for Generative Motion Planners via Certified Local Stabilization
Title: Formal Safety Verification and Refinement for Generative Motion Planners via Certified Local Stabilization
Link: https://arxiv.org/abs/2509.19688
Note: 10 pages, 12 figures
Abstract: We present a method for formal safety verification of learning-based generative motion planners. Generative motion planners (GMPs) offer advantages over traditional planners, but verifying the safety and dynamic feasibility of their outputs is difficult since neural network verification (NNV) tools scale only to a few hundred neurons, while GMPs often contain millions. To preserve GMP expressiveness while enabling verification, our key insight is to imitate the GMP by stabilizing references sampled from the GMP with a small neural tracking controller and then applying NNV to the closed-loop dynamics. This yields reachable sets that rigorously certify closed-loop safety, while the controller enforces dynamic feasibility. Building on this, we construct a library of verified GMP references and deploy them online in a way that imitates the original GMP distribution whenever it is safe to do so, improving safety without retraining. We evaluate across diverse planners, including diffusion, flow matching, and vision-language models, improving safety in simulation (on ground robots and quadcopters) and on hardware (differential-drive robot).
【18】Mamba Modulation: On the Length Generalization of Mamba
Title: Mamba Modulation: On the Length Generalization of Mamba
Link: https://arxiv.org/abs/2509.19633
Note: Accepted to the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS) 2025. First two authors contributed equally
Abstract: The quadratic complexity of the attention mechanism in Transformer models has motivated the development of alternative architectures with sub-quadratic scaling, such as state-space models. Among these, Mamba has emerged as a leading architecture, achieving state-of-the-art results across a range of language modeling tasks. However, Mamba's performance significantly deteriorates when applied to contexts longer than those seen during pre-training, revealing a sharp sensitivity to context length extension. Through detailed analysis, we attribute this limitation to the out-of-distribution behaviour of its state-space dynamics, particularly within the parameterization of the state transition matrix $\mathbf{A}$. Unlike recent works which attribute this sensitivity to the vanished accumulation of discretization time steps, $\exp(-\sum_{t=1}^N\Delta_t)$, we establish a connection between state convergence behavior as the input length approaches infinity and the spectrum of the transition matrix $\mathbf{A}$, offering a well-founded explanation of its role in length extension. Next, to overcome this challenge, we propose an approach that applies spectrum scaling to pre-trained Mamba models to enable robust long-context generalization by selectively modulating the spectrum of $\mathbf{A}$ matrices in each layer. We show that this can significantly improve performance in settings where simply modulating $\Delta_t$ fails, validating our insights and providing avenues for better length generalization of state-space models with structured transition matrices.
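A toy illustration of what spectrum scaling amounts to mechanically, assuming the diagonal parameterization $\mathbf{A} = -\exp(A_{\log})$ used in the public Mamba reference implementation; the layer-wise choice of the scaling factor is the paper's contribution and is not reproduced here:

```python
import torch

def scale_mamba_spectrum(A_log: torch.Tensor, alpha: float) -> torch.Tensor:
    """Diagonal SSM transition A = -exp(A_log); multiplying the spectrum
    of A by alpha > 0 is just a constant additive shift of A_log."""
    return A_log + torch.log(torch.tensor(alpha))

A_log = torch.randn(16)           # one channel's log-parameterization
A = -torch.exp(A_log)
A_scaled = -torch.exp(scale_mamba_spectrum(A_log, alpha=0.5))
assert torch.allclose(A_scaled, 0.5 * A)
# Shrinking |A| slows state decay, stretching the effective memory horizon.
```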
【19】AnySafe: Adapting Latent Safety Filters at Runtime via Safety Constraint Parameterization in the Latent Space
Title: AnySafe: Adapting Latent Safety Filters at Runtime via Safety Constraint Parameterization in the Latent Space
Link: https://arxiv.org/abs/2509.19555
Abstract: Recent works have shown that foundational safe control methods, such as Hamilton-Jacobi (HJ) reachability analysis, can be applied in the latent space of world models. While this enables the synthesis of latent safety filters for hard-to-model vision-based tasks, they assume that the safety constraint is known a priori and remains fixed during deployment, limiting the safety filter's adaptability across scenarios. To address this, we propose constraint-parameterized latent safety filters that can adapt to user-specified safety constraints at runtime. Our key idea is to define safety constraints by conditioning on an encoding of an image that represents a constraint, using a latent-space similarity measure. The notion of similarity to failure is aligned in a principled way through conformal calibration, which controls how closely the system may approach the constraint representation. The parameterized safety filter is trained entirely within the world model's imagination, treating any image seen by the model as a potential test-time constraint, thereby enabling runtime adaptation to arbitrary safety constraints. In simulation and hardware experiments on vision-based control tasks with a Franka manipulator, we show that our method adapts at runtime by conditioning on the encoding of user-specified constraint images, without sacrificing performance. Video results can be found at https://any-safe.github.io
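The conformal step can be illustrated in isolation: given latent similarity scores between known failure states and the constraint encoding, a split-conformal quantile yields a similarity threshold with a finite-sample coverage guarantee. A sketch under those assumptions, with synthetic stand-in scores; this is the generic recipe, not the paper's exact calibration procedure:

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Lower-tail split-conformal quantile: a fresh failure state's
    similarity falls below the returned threshold with prob <= alpha."""
    s = np.sort(np.asarray(cal_scores))
    k = max(int(np.floor(alpha * (len(s) + 1))), 1)  # finite-sample rank
    return s[k - 1]

# Calibration scores: latent similarity of known failure states to the
# user-provided constraint image encoding (synthetic stand-in here).
cal = np.random.default_rng(0).beta(8, 2, size=500)
tau = conformal_threshold(cal, alpha=0.1)
# Runtime rule: treat state z as violating the constraint whenever
# sim(z, z_constraint) >= tau.
print(tau)
```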
【20】Metriplectic Conditional Flow Matching for Dissipative Dynamics
Title: Metriplectic Conditional Flow Matching for Dissipative Dynamics
Link: https://arxiv.org/abs/2509.19526
Abstract: Metriplectic conditional flow matching (MCFM) learns dissipative dynamics without violating first principles. Neural surrogates often inject energy and destabilize long-horizon rollouts; MCFM instead builds the conservative-dissipative split into both the vector field and a structure-preserving sampler. MCFM trains via conditional flow matching on short transitions, avoiding long rollout adjoints. In inference, a Strang-prox scheme alternates a symplectic update with a proximal metric step, ensuring discrete energy decay; an optional projection enforces strict decay when a trusted energy is available. We provide continuous and discrete time guarantees linking this parameterization and sampler to conservation, monotonic dissipation, and stable rollouts. On a controlled mechanical benchmark, MCFM yields phase portraits closer to ground truth and markedly fewer energy-increase and positive energy rate events than an equally expressive unconstrained neural flow, while matching terminal distributional fit.
【21】AIRwaves at CheckThat! 2025: Retrieving Scientific Sources for Implicit Claims on Social Media with Dual Encoders and Neural Re-Ranking
Title: AIRwaves at CheckThat! 2025: Retrieving Scientific Sources for Implicit Claims on Social Media with Dual Encoders and Neural Re-Ranking
Link: https://arxiv.org/abs/2509.19509
Note: CLEF 2025 (Conference and Labs of the Evaluation Forum)
Abstract: Linking implicit scientific claims made on social media to their original publications is crucial for evidence-based fact-checking and scholarly discourse, yet it is hindered by lexical sparsity, very short queries, and domain-specific language. Team AIRwaves ranked second in Subtask 4b of the CLEF-2025 CheckThat! Lab with an evidence-retrieval approach that markedly outperforms the competition baseline. The optimized sparse-retrieval baseline (BM25) achieves MRR@5 = 0.5025 on the gold-label blind test set. To surpass this baseline, a two-stage retrieval pipeline is introduced: (i) a first stage that uses a dual encoder based on E5-large, fine-tuned using in-batch and mined hard negatives and enhanced through chunked tokenization and rich document metadata; and (ii) a neural re-ranking stage using a SciBERT cross-encoder. Replacing purely lexical matching with neural representations lifts performance to MRR@5 = 0.6174, and the complete pipeline further improves to MRR@5 = 0.6828. The findings demonstrate that coupling dense retrieval with neural re-rankers delivers a powerful and efficient solution for tweet-to-study matching and provides a practical blueprint for future evidence-retrieval pipelines.
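A condensed sketch of the retrieve-then-rerank pattern using the sentence-transformers interfaces; the base E5 checkpoint and the public ms-marco cross-encoder below are stand-ins (the team's fine-tuned SciBERT re-ranker is not assumed available), and the "query:"/"passage:" prefixes follow the usual E5 convention:

```python
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

bi_encoder = SentenceTransformer("intfloat/e5-large")                 # stage 1: recall
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")       # stage 2 stand-in

docs = [
    "Randomized trial of mRNA vaccine efficacy against severe disease ...",
    "Observational study linking coffee intake to cardiovascular health ...",
]
doc_emb = bi_encoder.encode(
    [f"passage: {d}" for d in docs], normalize_embeddings=True
)

tweet = "apparently that new vaccine works way better than anyone expected"
q_emb = bi_encoder.encode(f"query: {tweet}", normalize_embeddings=True)

# Dense retrieval: cosine scores (embeddings are normalized), keep top-k.
shortlist = np.argsort(-(doc_emb @ q_emb))[:2]

# Neural re-ranking of the shortlist with the cross-encoder.
scores = reranker.predict([(tweet, docs[i]) for i in shortlist])
ranked = [docs[i] for i in shortlist[np.argsort(-scores)]]
print(ranked[0])
```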
【22】HUNT: High-Speed UAV Navigation and Tracking in Unstructured Environments via Instantaneous Relative Frames
Title: HUNT: High-Speed UAV Navigation and Tracking in Unstructured Environments via Instantaneous Relative Frames
Link: https://arxiv.org/abs/2509.19452
Abstract: Search and rescue operations require unmanned aerial vehicles to both traverse unknown unstructured environments at high speed and track targets once detected. Achieving both capabilities under degraded sensing and without global localization remains an open challenge. Recent works on relative navigation have shown robust tracking by anchoring planning and control to a visible detected object, but cannot address navigation when no target is in the field of view. We present HUNT (High-speed UAV Navigation and Tracking), a real-time framework that unifies traversal, acquisition, and tracking within a single relative formulation. HUNT defines navigation objectives directly from onboard instantaneous observables such as attitude, altitude, and velocity, enabling reactive high-speed flight during search. Once a target is detected, the same perception-control pipeline transitions seamlessly to tracking. Outdoor experiments in dense forests, container compounds, and search-and-rescue operations with vehicles and mannequins demonstrate robust autonomy where global methods fail.
【23】Poster: ChatIYP: Enabling Natural Language Access to the Internet Yellow Pages Database
Title: Poster: ChatIYP: Enabling Natural Language Access to the Internet Yellow Pages Database
Link: https://arxiv.org/abs/2509.19411
Note: ACM Internet Measurement Conference (IMC) 2025
Abstract: The Internet Yellow Pages (IYP) aggregates information from multiple sources about Internet routing into a unified, graph-based knowledge base. However, querying it requires knowledge of the Cypher language and the exact IYP schema, thus limiting usability for non-experts. In this paper, we propose ChatIYP, a domain-specific Retrieval-Augmented Generation (RAG) system that enables users to query IYP through natural language questions. Our evaluation demonstrates solid performance on simple queries, identifies directions for improvement, and provides insights for selecting evaluation metrics that are better suited to AI agents that query the IYP.
【24】ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution
Title: ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution
Link: https://arxiv.org/abs/2509.19349
Note: 52 pages, 14 figures
Abstract: We introduce ShinkaEvolve: a new open-source framework leveraging large language models (LLMs) to advance scientific discovery with state-of-the-art performance and unprecedented efficiency. Recent advances in scaling inference time compute of LLMs have enabled significant progress in generalized scientific discovery. These approaches rely on evolutionary agentic harnesses that leverage LLMs as mutation operators to generate candidate solutions. However, current code evolution methods suffer from critical limitations: they are sample-inefficient, requiring thousands of samples to identify effective solutions, and remain closed-source, hindering broad adoption and extension. ShinkaEvolve addresses these limitations, introducing three key innovations: a parent sampling technique balancing exploration and exploitation, code novelty rejection-sampling for efficient search space exploration, and a bandit-based LLM ensemble selection strategy. We evaluate ShinkaEvolve across diverse tasks, demonstrating consistent improvements in sample efficiency and solution quality. ShinkaEvolve discovers a new state-of-the-art circle packing solution using only 150 samples, designs high-performing agentic harnesses for AIME mathematical reasoning tasks, identifies improvements to ALE-Bench competitive programming solutions, and discovers novel mixture-of-expert load balancing loss functions that illuminate the space of optimization strategies. Our results demonstrate that ShinkaEvolve achieves broad applicability with exceptional sample efficiency. By providing open-source accessibility and cost-efficiency, this work democratizes open-ended discovery across diverse computational problems.
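The bandit-based LLM ensemble selection can be sketched with a standard UCB1 rule over the candidate models; UCB1 is our stand-in here, and the paper's exact bandit strategy and reward definition may differ:

```python
import math
import random

class UCBModelSelector:
    """UCB1 over a pool of LLMs: prefer the model whose mutations have
    yielded the best average reward, plus an exploration bonus."""
    def __init__(self, models):
        self.models = models
        self.counts = {m: 0 for m in models}
        self.means = {m: 0.0 for m in models}
        self.t = 0

    def pick(self):
        self.t += 1
        for m in self.models:              # play each arm once first
            if self.counts[m] == 0:
                return m
        return max(self.models, key=lambda m: self.means[m]
                   + math.sqrt(2 * math.log(self.t) / self.counts[m]))

    def update(self, m, reward):
        self.counts[m] += 1
        self.means[m] += (reward - self.means[m]) / self.counts[m]

sel = UCBModelSelector(["model-a", "model-b", "model-c"])
for _ in range(100):
    m = sel.pick()
    reward = random.random()               # stand-in for fitness improvement
    sel.update(m, reward)
```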
【25】Error Propagation in Dynamic Programming: From Stochastic Control to Option Pricing
Title: Error Propagation in Dynamic Programming: From Stochastic Control to Option Pricing
Link: https://arxiv.org/abs/2509.20239
Abstract: This paper investigates theoretical and methodological foundations for stochastic optimal control (SOC) in discrete time. We start by formulating the control problem in a general dynamic programming framework, introducing the mathematical structure needed for a detailed convergence analysis. The associated value function is estimated through a sequence of approximations combining nonparametric regression methods and Monte Carlo subsampling. The regression step is performed within reproducing kernel Hilbert spaces (RKHSs), exploiting the classical KRR algorithm, while Monte Carlo sampling methods are introduced to estimate the continuation value. To assess the accuracy of our value function estimator, we propose a natural error decomposition and rigorously control the resulting error terms at each time step. We then analyze how this error propagates backward in time, from maturity to the initial stage, a relatively underexplored aspect of the SOC literature. Finally, we illustrate how our analysis naturally applies to a key financial application: the pricing of American options.
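For reference, the KRR estimate used in such a regression step has the standard closed form: given samples $(x_i, y_i)_{i=1}^n$, a kernel $k$, and a ridge parameter $\lambda > 0$,

$$ \hat f(x) = k(x)^\top \left( K + n\lambda I \right)^{-1} y, \qquad K_{ij} = k(x_i, x_j), \quad k(x) = \big(k(x, x_1), \dots, k(x, x_n)\big)^\top. $$

This is the classical estimator (independent of the paper's specific construction) whose per-step errors the proposed decomposition tracks as they propagate backward from maturity.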
【26】First-Extinction Law for Resampling Processes
Title: First-Extinction Law for Resampling Processes
Link: https://arxiv.org/abs/2509.20101
Abstract: Extinction times in resampling processes are fundamental yet often intractable, as previous formulas scale as $2^M$ with the number of states $M$ present in the initial probability distribution. We solve this by treating multinomial updates as independent square-root diffusions with zero drift, yielding a closed-form law for the first-extinction time. We prove that the mean coincides exactly with the Wright-Fisher result of Baxter et al., thereby replacing exponential-cost evaluations with a linear-cost expression, and we validate this result through extensive simulations. Finally, we demonstrate predictive power for model collapse in a simple self-training setup: the onset of collapse coincides with the resampling-driven first-extinction time computed from the model's initial stationary distribution. These results hint at a unified view of resampling extinction dynamics.
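The quantity being characterized is straightforward to simulate, which is how a closed-form law of this kind can be sanity-checked; a minimal Monte Carlo sketch (the closed-form expression itself is in the paper and is not reproduced here):

```python
import numpy as np

def first_extinction_time(p0, n_samples, rng):
    """Iterated multinomial resampling from the empirical frequencies;
    returns the first generation at which any state present in p0 has
    zero count."""
    p = np.asarray(p0, dtype=float)
    alive = p > 0
    t = 0
    while True:
        t += 1
        counts = rng.multinomial(n_samples, p)
        if np.any(alive & (counts == 0)):
            return t
        p = counts / n_samples

rng = np.random.default_rng(0)
times = [first_extinction_time([0.25, 0.25, 0.25, 0.25], 100, rng)
         for _ in range(1000)]
print(np.mean(times))   # compare against the paper's closed-form mean
```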
【27】Convex Regression with a Penalty
Title: Convex Regression with a Penalty
Link: https://arxiv.org/abs/2509.19788
Abstract: A common way to estimate an unknown convex regression function $f_0: \Omega \subset \mathbb{R}^d \rightarrow \mathbb{R}$ from a set of $n$ noisy observations is to fit a convex function that minimizes the sum of squared errors. However, this estimator is known for its tendency to overfit near the boundary of $\Omega$, posing significant challenges in real-world applications. In this paper, we introduce a new estimator of $f_0$ that avoids this overfitting by minimizing a penalty on the subgradient while enforcing an upper bound $s_n$ on the sum of squared errors. The key advantage of this method is that $s_n$ can be directly estimated from the data. We establish the uniform almost sure consistency of the proposed estimator and its subgradient over $\Omega$ as $n \rightarrow \infty$ and derive convergence rates. The effectiveness of our estimator is illustrated through its application to estimating waiting times in a single-server queue.
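Concretely, the abstract describes an estimator of the following shape, where $\xi_i$ denotes a subgradient of the fitted convex function at $x_i$; the squared-norm penalty written here is one illustrative choice (the paper's exact penalty may differ):

$$ \min_{f\ \text{convex},\ \xi_i \in \partial f(x_i)} \; \sum_{i=1}^n \lVert \xi_i \rVert^2 \quad \text{subject to} \quad \sum_{i=1}^n \big(y_i - f(x_i)\big)^2 \le s_n, $$

with the error budget $s_n$ estimated directly from the data rather than tuned.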
【28】Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies
Title: Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies
Link: https://arxiv.org/abs/2509.19707
Note: Preprint
Abstract: Copulas are a fundamental tool for modelling multivariate dependencies in data, forming the method of choice in diverse fields and applications. However, the adoption of existing models for multimodal and high-dimensional dependencies is hindered by restrictive assumptions and poor scaling. In this work, we present methods for modelling copulas based on the principles of diffusions and flows. We design two processes that progressively forget inter-variable dependencies while leaving dimension-wise distributions unaffected, provably defining valid copulas at all times. We show how to obtain copula models by learning to remember the forgotten dependencies from each process, theoretically recovering the true copula at optimality. The first instantiation of our framework focuses on direct density estimation, while the second specialises in expedient sampling. Empirically, we demonstrate the superior performance of our proposed methods over state-of-the-art copula approaches in modelling complex and high-dimensional dependencies from scientific datasets and images. Our work enhances the representational power of copula models, empowering applications and paving the way for their adoption on larger scales and more challenging domains.
【29】Anchored Langevin Algorithms
Title: Anchored Langevin Algorithms
Link: https://arxiv.org/abs/2509.19455
Note: 49 pages, 8 figures, 1 table
Abstract: Standard first-order Langevin algorithms such as the unadjusted Langevin algorithm (ULA) are obtained by discretizing the Langevin diffusion and are widely used for sampling in machine learning because they scale to high dimensions and large datasets. However, they face two key limitations: (i) they require differentiable log-densities, excluding targets with non-differentiable components; and (ii) they generally fail to sample heavy-tailed targets. We propose anchored Langevin dynamics, a unified approach that accommodates non-differentiable targets and certain classes of heavy-tailed distributions. The method replaces the original potential with a smooth reference potential and modifies the Langevin diffusion via multiplicative scaling. We establish non-asymptotic guarantees in the 2-Wasserstein distance to the target distribution and provide an equivalent formulation derived via a random time change of the Langevin diffusion. We provide numerical experiments to illustrate the theory and practical performance of our proposed approach.
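For context, the ULA iteration that anchoring modifies is the Euler-Maruyama discretization of the Langevin diffusion for a target $\pi \propto e^{-U}$ (standard material, not specific to this paper):

$$ x_{k+1} = x_k - \gamma\, \nabla U(x_k) + \sqrt{2\gamma}\,\xi_k, \qquad \xi_k \sim \mathcal{N}(0, I), $$

which requires $\nabla U$ and hence a differentiable log-density; the anchored variant instead works with a smooth reference potential and rescales the dynamics multiplicatively, as described in the abstract.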
【30】The Pareto Frontier of Resilient Jet Tagging
Title: The Pareto Frontier of Resilient Jet Tagging
Link: https://arxiv.org/abs/2509.19431
Note: 6 pages, 2 figures and 2 tables. Preliminary version accepted for the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Machine Learning and the Physical Sciences. 6 or 7 December 2025; San Diego, California, USA
Abstract: Classifying hadronic jets using their constituents' kinematic information is a critical task in modern high-energy collider physics. Often, classifiers are designed by targeting the best performance using metrics such as accuracy, AUC, or rejection rates. However, reliance on a single metric can lead to architectures that are more model-dependent than competitive alternatives, introducing potential uncertainty and bias into the analysis. We explore such trade-offs and demonstrate the consequences of using networks with high performance metrics but low resilience.
【31】A Statistical Mixture-of-Experts Framework for EMG Artifact Removal in EEG: Empirical Insights and a Proof-of-Concept Application
Title: A Statistical Mixture-of-Experts Framework for EMG Artifact Removal in EEG: Empirical Insights and a Proof-of-Concept Application
Link: https://arxiv.org/abs/2509.19385
Abstract: Effective control of neural interfaces is limited by poor signal quality. While neural network-based electroencephalography (EEG) denoising methods for electromyogenic (EMG) artifacts have improved in recent years, current state-of-the-art (SOTA) models perform suboptimally in settings with high noise. To address the shortcomings of current machine learning (ML)-based denoising algorithms, we present a signal filtration algorithm driven by a new mixture-of-experts (MoE) framework. Our algorithm leverages three new statistical insights into the EEG-EMG denoising problem: (1) EMG artifacts can be partitioned into quantifiable subtypes to aid downstream MoE classification, (2) local experts trained on narrower signal-to-noise ratio (SNR) ranges can achieve performance increases through specialization, and (3) correlation-based objective functions, in conjunction with rescaling algorithms, can enable faster convergence in a neural network-based denoising context. We empirically demonstrate these three insights into EMG artifact removal and use our findings to create a new downstream MoE denoising algorithm consisting of convolutional (CNN) and recurrent (RNN) neural networks. We tested all results on a major benchmark dataset (EEGdenoiseNet) collected from 67 subjects. We found that our MoE denoising model achieved competitive overall performance with SOTA ML denoising algorithms and superior lower-bound performance in high noise settings. These preliminary results highlight the promise of our MoE framework for enabling advances in EMG artifact removal for EEG processing, especially in high noise settings. Further research and development will be necessary to assess our MoE framework on a wider range of real-world test cases and explore its downstream potential to unlock more effective neural interfaces.
【32】Data-Driven Reconstruction of Significant Wave Heights from Sparse Observations
Title: Data-Driven Reconstruction of Significant Wave Heights from Sparse Observations
Link: https://arxiv.org/abs/2509.19384
Abstract: Reconstructing high-resolution regional significant wave height fields from sparse and uneven buoy observations remains a core challenge for ocean monitoring and risk-aware operations. We introduce AUWave, a hybrid deep learning framework that fuses a station-wise sequence encoder (MLP) with a multi-scale U-Net enhanced by a bottleneck self-attention layer to recover 32$\times$32 regional SWH fields. A systematic Bayesian hyperparameter search with Optuna identifies the learning rate as the dominant driver of generalization, followed by the scheduler decay and the latent dimension. Using NDBC buoy observations and ERA5 reanalysis over the Hawaii region, AUWave attains a minimum validation loss of 0.043285 and a slightly right-skewed RMSE distribution. Spatial errors are lowest near observation sites and increase with distance, reflecting identifiability limits under sparse sampling. Sensitivity experiments show that AUWave consistently outperforms a representative baseline in data-richer configurations, while the baseline is only marginally competitive in the most underdetermined single-buoy cases. The architecture's multi-scale and attention components translate into accuracy gains when minimal but non-trivial spatial anchoring is available. Error maps and buoy ablations reveal key anchor stations whose removal disproportionately degrades performance, offering actionable guidance for network design. AUWave provides a scalable pathway for gap filling, high-resolution priors for data assimilation, and contingency reconstruction.