Machine Learning Academic Digest [9.4]

cs.LG: 126 papers today

Large language models (11 papers)

【1】Can LLMs Lie? Investigation beyond Hallucination
Link: https://arxiv.org/abs/2509.03518

Authors: an, Mihir Prabhudesai, Mengning Wu, Shantanu Jaiswal, Deepak Pathak
Comments: Website at this https URL
Abstract: Large language models (LLMs) have demonstrated impressive capabilities across a variety of tasks, but their increasing autonomy in real-world applications raises concerns about their trustworthiness. While hallucinations (unintentional falsehoods) have been widely studied, the phenomenon of lying, where an LLM knowingly generates falsehoods to achieve an ulterior objective, remains underexplored. In this work, we systematically investigate the lying behavior of LLMs, differentiating it from hallucinations and testing it in practical scenarios. Through mechanistic interpretability techniques, we uncover the neural mechanisms underlying deception, employing logit lens analysis, causal interventions, and contrastive activation steering to identify and control deceptive behavior. We study real-world lying scenarios and introduce behavioral steering vectors that enable fine-grained manipulation of lying tendencies. Further, we explore the trade-offs between lying and end-task performance, establishing a Pareto frontier where dishonesty can enhance goal optimization. Our findings contribute to the broader discourse on AI ethics, shedding light on the risks and potential safeguards for deploying LLMs in high-stakes environments. Code and more illustrations are available at https://llm-liar.github.io/
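The abstract names contrastive activation steering without detail. As a rough illustration of the generic technique (not the authors' code), a steering vector can be taken as the difference between mean activations on two contrasting prompt sets and then added to a hidden state with a signed strength. The function names and the toy 3-dimensional "activations" below are assumptions made for this sketch.

```python
from statistics import fmean

def steering_vector(pos_acts, neg_acts):
    """Difference of per-dimension mean activations between two prompt sets."""
    dim = len(pos_acts[0])
    return [
        fmean(a[i] for a in pos_acts) - fmean(a[i] for a in neg_acts)
        for i in range(dim)
    ]

def steer(hidden, vector, alpha):
    """Shift a hidden state along the vector; alpha sets strength and sign."""
    return [h + alpha * v for h, v in zip(hidden, vector)]

# Toy activations (made-up numbers, not taken from any model).
lying_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
honest_acts = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]

v = steering_vector(lying_acts, honest_acts)
# A negative alpha pushes the state away from the "lying" direction.
steered = steer([1.0, 1.0, 1.0], v, alpha=-0.5)
```

In a real model the activations would be read from a chosen layer and the shifted state written back during the forward pass; which layer and strength work is an empirical question this sketch does not address.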

【2】Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data
Link: https://arxiv.org/abs/2509.03501

Authors: ou, Xiangyu Peng, Shrikant Kendre, Michael S. Ryoo, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles
Comments: This technical report serves as the archival version of our paper accepted at the ICCV 2025 Workshop. For more information, please visit our project website: this https URL
Abstract: Next-generation AI companions must go beyond general video understanding to resolve spatial and temporal references in dynamic, real-world environments. Existing Video Large Language Models (Video LLMs), while capable of coarse-level comprehension, struggle with fine-grained, spatiotemporal reasoning, especially when user queries rely on time-based event references for temporal anchoring, or gestural cues for spatial anchoring to clarify object references and positions. To bridge this critical gap, we introduce Strefer, a synthetic instruction data generation framework designed to equip Video LLMs with spatiotemporal referring and reasoning capabilities. Strefer produces diverse instruction-tuning data using a data engine that pseudo-annotates temporally dense, fine-grained video metadata, capturing rich spatial and temporal information in a structured manner, including subjects, objects, their locations as masklets, and their action descriptions and timelines. Our approach enhances the ability of Video LLMs to interpret spatial and temporal references, fostering more versatile, space-time-aware reasoning essential for real-world AI companions. Without using proprietary models, costly human annotation, or the need to annotate large volumes of new videos, experimental evaluations show that models trained with data produced by Strefer outperform baselines on tasks requiring spatial and temporal disambiguation. Additionally, these models exhibit enhanced space-time-aware reasoning, establishing a new foundation for perceptually grounded, instruction-tuned Video LLMs.

【3】On Entropy Control in LLM-RL Algorithms
Link: https://arxiv.org/abs/2509.03493

Authors:
Abstract: For RL algorithms, appropriate entropy control is crucial to their effectiveness. To control the policy entropy, a commonly used method is entropy regularization, which is adopted in various popular RL algorithms including PPO, SAC and A3C. Although entropy regularization has conventionally proved effective in robotics and game RL, studies have found that it gives weak to no gains in LLM-RL training. In this work, we study the issues of the entropy bonus in the LLM-RL setting. Specifically, we first argue that conventional entropy regularization suffers from the LLM's extremely large response space and the sparsity of the optimal outputs. As a remedy, we propose AEnt, an entropy control method that utilizes a new clamped entropy bonus with an automatically adjusted coefficient. The clamped entropy is evaluated with the re-normalized policy defined on a certain smaller token space, which encourages exploration within a more compact response set. In addition, the algorithm automatically adjusts the entropy coefficient according to the clamped entropy value, effectively controlling the entropy-induced bias while leveraging the entropy's benefits. AEnt is tested in math-reasoning tasks under different base models and datasets, and it is observed that AEnt outperforms the baselines consistently across multiple benchmarks.
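The abstract does not specify how the smaller token space is chosen or how the coefficient is updated. The sketch below is therefore only one possible reading: it assumes the smaller space is the k most probable tokens, and it uses a hypothetical threshold rule (`low`, `high`, `step`) for the coefficient. None of these choices are taken from the paper.

```python
import math

def clamped_entropy(probs, k):
    """Entropy of the policy re-normalized over its k most probable tokens."""
    top = sorted(probs, reverse=True)[:k]
    total = sum(top)
    return -sum((p / total) * math.log(p / total) for p in top if p > 0)

def adjust_coefficient(coef, entropy, low, high, step=0.1, coef_max=1.0):
    """Hypothetical rule: raise the bonus weight when entropy drops below
    `low`, shrink it when entropy exceeds `high`, otherwise leave it alone."""
    if entropy < low:
        return min(coef_max, coef * (1 + step))
    if entropy > high:
        return coef * (1 - step)
    return coef

probs = [0.5, 0.25, 0.125, 0.125]            # toy next-token distribution
h_full = clamped_entropy(probs, k=len(probs))  # ordinary entropy
h_top2 = clamped_entropy(probs, k=2)           # entropy over the top-2 tokens only
coef = adjust_coefficient(0.01, h_top2, low=0.7, high=1.0)
```

Restricting the support to a few likely tokens means the bonus cannot be increased by spreading mass over the long tail of the vocabulary, which matches the abstract's stated motivation of exploring within a more compact response set.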

【4】EvolveSignal: A Large Language Model Powered Coding Agent for Discovering Traffic Signal Control Algorithms
Link: https://arxiv.org/abs/2509.03335

Authors: ang, Peibo Duan, Hao Wang, Yue Wang, Jian Xu, Nan Zheng, Zhenliang Ma
Abstract: In traffic engineering, fixed-time traffic signal control remains widely used for its low cost, stability, and interpretability. However, its design depends on hand-crafted formulas (e.g., Webster) and manual re-timing by engineers to adapt to demand changes, which is labor-intensive and often yields suboptimal results under heterogeneous or congested conditions. This paper introduces EvolveSignal, a large language model (LLM)-powered coding agent that automatically discovers new traffic signal control algorithms. We formulate the problem as program synthesis, where candidate algorithms are represented as Python functions with fixed input-output structures, and iteratively optimized through external evaluations (e.g., a traffic simulator) and evolutionary search. Experiments on a signalized intersection demonstrate that the discovered algorithms outperform Webster's baseline, reducing average delay by 20.1% and average stops by 47.1%. Beyond performance, ablation and incremental analyses reveal that EvolveSignal modifications, such as adjusting cycle length bounds, incorporating right-turn demand, and rescaling green allocations, can offer practically meaningful insights for traffic engineers. This work opens a new research direction by leveraging AI for algorithm design in traffic signal control, bridging program synthesis with transportation engineering.
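For orientation, the Webster baseline mentioned above can itself be written as a Python function with a fixed input-output structure. The sketch below uses the classical Webster cycle-length formula, C = (1.5L + 5) / (1 - Y), and splits effective green in proportion to each phase's critical flow ratio. The function signature and the default cycle bounds are assumptions for illustration, not the interface used in the paper.

```python
def webster_plan(flow_ratios, lost_time, min_cycle=30.0, max_cycle=180.0):
    """Fixed-time plan from Webster's formula.

    flow_ratios: critical flow ratio (demand / saturation flow) per phase.
    lost_time:   total lost time per cycle, in seconds.
    Returns (cycle_length_s, [effective_green_s per phase]).
    """
    y_total = sum(flow_ratios)
    if not 0 < y_total < 1:
        raise ValueError("sum of flow ratios must lie strictly between 0 and 1")
    cycle = (1.5 * lost_time + 5.0) / (1.0 - y_total)
    cycle = max(min_cycle, min(max_cycle, cycle))  # hypothetical bounds
    effective_green = cycle - lost_time
    greens = [effective_green * y / y_total for y in flow_ratios]
    return cycle, greens

# Two-phase example with made-up demand.
cycle, greens = webster_plan([0.3, 0.2], lost_time=10.0)
```

A search procedure of the kind the abstract describes would rewrite the body of such a function (for example the bounds or the green split) while keeping the inputs and outputs fixed, then score each variant in a simulator.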

【5】TeRA: Vector-based Random Tensor Network for High-Rank Adaptation of Large Language Models
Link: https://arxiv.org/abs/2509.03234

Authors: , Wuyang Zhou, Giorgos Iacovides, Danilo Mandic
Abstract: Parameter-Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), have significantly reduced the number of trainable parameters needed in fine-tuning large language models (LLMs). Subsequent developments of LoRA-style adapters have diverged into two main directions: (1) enhancing model expressivity with high-rank adapters, and (2) pushing for further parameter reduction, as exemplified by vector-based methods. However, these approaches present a trade-off, as achieving the expressivity of high-rank weight updates typically comes at the cost of sacrificing the extreme parameter efficiency offered by vector-based techniques. To address this issue, we propose a vector-based random Tensor network for high-Rank Adaptation (TeRA), a novel PEFT method that achieves high-rank weight updates while retaining the parameter efficiency of vector-based PEFT adapters. This is achieved by parameterizing the tensorized weight update matrix as a Tucker-like tensor network (TN), in which large randomly initialized factors are frozen and shared across layers, while only small layer-specific scaling vectors, formed by entries in diagonal factor matrices, are trained. This design effectively decouples the rank of the weight update matrix from the number of trainable parameters. Comprehensive experiments demonstrate that TeRA matches or even outperforms high-rank adapters, while requiring a trainable parameter count similar to vector-based methods. Theoretical analysis and ablation studies further validate the effectiveness of our approach.

【6】From Evaluation to Defense: Constructing Persistent Edit-Based Fingerprints for Large Language Models
Link: https://arxiv.org/abs/2509.03122

Authors: in Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Xiaoling Wang, Linlin Wang
Comments: preprint
Abstract: The intellectual property (IP) protection of Large Language Models (LLMs) is increasingly critical. Injecting specialized fingerprints into LLMs through instruction tuning is a common IP protection technique. However, this may significantly degrade model performance, requires substantial computational resources, and exhibits poor persistence under model modifications. We argue that knowledge editing offers a lightweight alternative that is more suitable for fingerprint injection. Accordingly, we apply knowledge editing to fingerprint injection for the first time and demonstrate its strong capability. Despite using scrambled text as fingerprints to prevent them from being overwritten during fine-tuning, degradation still occurs under large-scale fine-tuning. To address this, we propose Fingerprint Subspace-aware Fine-Tuning (FSFT), which reduces fingerprint degradation by constraining the update of the fingerprint subspace. The performance of FSFT exceeds fine-tuning by 10% even in the worst-case scenario. Additionally, we observe that the fingerprint-injected models struggle to distinguish between fingerprints and similar texts due to the high similarity of their features. This finding underscores the urgent need for more robust and fine-grained fingerprinting injection methods for LLMs.

【7】Binary Quantization For LLMs Through Dynamic Grouping
Link: https://arxiv.org/abs/2509.03054

Authors: eng, Zhen-Qun Yang, Haoran Xie, S. Joe Qin, Arlene Chen, Fangzhen Lin
Comments: 14 pages, 11 figures
Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of Natural Language Processing (NLP) tasks, but require substantial memory and computational resources. Binary quantization, which compresses model weights from 16-bit Brain Float to 1-bit representations in {-1, 1}, offers significant reductions in storage and inference costs. However, such aggressive quantization often leads to notable performance degradation compared to more conservative 4-bit quantization methods. In this research, we propose a novel optimization objective tailored for binary quantization, along with three algorithms designed to realize it effectively. Our method enhances blocked quantization by dynamically identifying optimal unstructured sub-matrices through adaptive grouping strategies. Experimental results demonstrate that our approach achieves an average bit length of just 1.007 bits, while maintaining high model quality. Specifically, our quantized LLaMA 3.2 3B model attains a perplexity of 8.23, remarkably close to the original 7.81, and surpasses the previous SOTA, BiLLM, which reaches a perplexity of only 123.90. Furthermore, our method is competitive with SOTA 4-bit approaches such as GPTQ in both performance and efficiency. The compression process is highly efficient, requiring only 14 seconds to quantize the full LLaMA 3.2 3B weights on a single CPU core, with the entire process completing in under 100 minutes and exhibiting embarrassingly parallel properties. Code - https://github.com/johnnyzheng0636/WGM_bi_quan
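To see why the choice of groups matters in binary quantization, consider the textbook scheme in which each group of weights is approximated by a single scale times its signs, with the scale set to the mean absolute value (the least-squares optimal choice for that group). The sketch below is that generic scheme on toy numbers, not the paper's objective or grouping algorithms.

```python
def binarize(group):
    """Approximate a weight group as alpha * sign(w), with alpha = mean |w|."""
    alpha = sum(abs(w) for w in group) / len(group)
    return [alpha if w >= 0 else -alpha for w in group]

def sq_error(weights, approx):
    """Sum of squared reconstruction errors."""
    return sum((w - a) ** 2 for w, a in zip(weights, approx))

weights = [0.1, -0.1, 2.0, -2.0]  # toy values with two magnitude clusters

# One shared scale for all four weights.
err_one_group = sq_error(weights, binarize(weights))

# Separate scales for the small- and large-magnitude weights.
small, large = weights[:2], weights[2:]
err_two_groups = (sq_error(small, binarize(small))
                  + sq_error(large, binarize(large)))
```

Here the single group gives an error of about 3.61, while the two magnitude-matched groups reconstruct the weights exactly. Each extra group also stores its own scale, so in this scheme finer grouping trades a small storage overhead above 1 bit per weight for lower error.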

【8】Knowledge Integration for Physics-informed Symbolic Regression Using Pre-trained Large Language Models
Link: https://arxiv.org/abs/2509.03036

Authors: kin, Wenxiong Xie, Teddy Lazebnik
Abstract: Symbolic regression (SR) has emerged as a powerful tool for automated scientific discovery, enabling the derivation of governing equations from experimental data. A growing body of work illustrates the promise of integrating domain knowledge into SR to improve the discovered equation's generality and usefulness. Physics-informed SR (PiSR) addresses this by incorporating domain knowledge, but current methods often require specialized formulations and manual feature engineering, limiting their use to domain experts. In this study, we leverage pre-trained Large Language Models (LLMs) to facilitate knowledge integration in PiSR. By harnessing the contextual understanding of LLMs trained on vast scientific literature, we aim to automate the incorporation of domain knowledge, reducing the need for manual intervention and making the process more accessible to a broader range of scientific problems. Namely, the LLM is integrated into the SR's loss function, adding a term for the LLM's evaluation of the equation produced by the SR. We extensively evaluate our method using three SR algorithms (DEAP, gplearn, and PySR) and three pre-trained LLMs (Falcon, Mistral, and Llama 2) across three physical dynamics (dropping ball, simple harmonic motion, and electromagnetic wave). The results demonstrate that LLM integration consistently improves the reconstruction of physical dynamics from data, enhancing the robustness of SR models to noise and complexity. We further explore the impact of prompt engineering, finding that more informative prompts significantly improve performance.

【9】Mycroft: Tracing Dependencies in Collective Communication Towards Reliable LLM Training
Link: https://arxiv.org/abs/2509.03018

Authors: eng, Lei Zhang, Qinlong Wang, Xiaoyun Zhi, Xinlei Zhang, Zhuo Jiang, Haohan Xu, Lei Wang, Zuquan Song, Gaohong Liu, Yang Bai, Shuguang Wang, Wencong Xiao, Jianxi Ye, Minlan Yu, Hong Xu
Abstract: Reliability is essential for ensuring efficiency in LLM training. However, many real-world reliability issues remain difficult to resolve, resulting in wasted resources and degraded model performance. Unfortunately, today's collective communication libraries operate as black boxes, hiding critical information needed for effective root cause analysis. We propose Mycroft, a lightweight distributed tracing and root cause analysis system designed to address previously hidden reliability issues in collective communication. Mycroft's key idea is to trace collective communication states and leverage internal control and data dependencies to resolve reliability problems in LLM training. Mycroft has been deployed at ByteDance for over six months to debug collective communication related issues at runtime. It detected anomalies within 15 seconds in 90% of cases and identified the root cause within 20 seconds in 60% of cases. We also conducted extensive fault injection experiments to demonstrate Mycroft's capability and efficiency.

【10】Unlearning That Lasts: Utility-Preserving, Robust, and Almost Irreversible Forgetting in LLMs
Link: https://arxiv.org/abs/2509.02820

Authors: p Singh, Maximilian Müller, Francesco Croce, Matthias Hein
Abstract: Unlearning in large language models (LLMs) involves precisely removing specific information from a pre-trained model. This is crucial to ensure the safety of LLMs by deleting private data or harmful knowledge acquired during pre-training. However, existing unlearning methods often fall short when subjected to thorough evaluation. To overcome this, we introduce JensUn, where we leverage the Jensen-Shannon Divergence as the training objective for both forget and retain sets for more stable and effective unlearning dynamics compared to commonly used loss functions. In extensive experiments, JensUn achieves a better forget-utility trade-off than competing methods, and even demonstrates strong resilience to benign relearning. Additionally, for a precise unlearning evaluation, we introduce LKF, a curated dataset of lesser-known facts that provides a realistic unlearning scenario. Finally, to comprehensively test unlearning methods, we propose (i) employing an LLM as semantic judge instead of the standard ROUGE score, and (ii) using worst-case unlearning evaluation over various paraphrases and input formats. Our improved evaluation framework reveals that many existing methods are less effective than previously thought.
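The Jensen-Shannon divergence itself is standard: it averages the KL divergence of each distribution to their midpoint, which makes it symmetric and bounded by ln 2. The sketch below computes it for discrete distributions in plain Python; how the paper applies it to forget and retain sets is not shown here.

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

same = js_divergence([0.7, 0.3], [0.7, 0.3])      # identical distributions
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])  # no shared support
```

Because the value never exceeds ln 2 even for distributions with disjoint support, a loss built on it stays bounded where an unbounded objective could grow without limit, which is one plausible reason for the more stable dynamics the abstract reports.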

【11】Challenges in Understanding Modality Conflict in Vision-Language Models
Link: https://arxiv.org/abs/2509.02805

Authors: yen, Jackson Michaels, Madalina Fiterau, David Jensen
Abstract: This paper highlights the challenge of decomposing conflict detection from conflict resolution in Vision-Language Models (VLMs) and presents potential approaches, including using a supervised metric via linear probes and group-based attention pattern analysis. We conduct a mechanistic investigation of LLaVA-OV-7B, a state-of-the-art VLM that exhibits diverse resolution behaviors when faced with conflicting multimodal inputs. Our results show that a linearly decodable conflict signal emerges in the model's intermediate layers and that attention patterns associated with conflict detection and resolution diverge at different stages of the network. These findings support the hypothesis that detection and resolution are functionally distinct mechanisms. We discuss how such decomposition enables more actionable interpretability and targeted interventions for improving model robustness in challenging multimodal settings.

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (8 papers)

【1】Graph neural networks for learning liquid simulations in dynamic scenes containing kinematic objects
Link: https://arxiv.org/abs/2509.03446

Authors: idlagajni, Constantin A. Rothkopf
Abstract: Simulating particle dynamics with high fidelity is crucial for solving real-world interaction and control tasks involving liquids in design, graphics, and robotics. Recently, data-driven approaches, particularly those based on graph neural networks (GNNs), have shown progress in tackling such problems. However, these approaches are often limited to learning fluid behavior in static free-fall environments or simple manipulation settings involving primitive objects, often overlooking complex interactions with dynamically moving kinematic rigid bodies. Here, we propose a GNN-based framework designed from the ground up to learn the dynamics of liquids under rigid body interactions and active manipulations, where particles are represented as graph nodes and particle-object collisions are handled using surface representations with the bounding volume hierarchy (BVH) algorithm. This approach enables the network to model complex interactions between liquid particles and intricate surface geometries. Our model accurately captures fluid behavior in dynamic settings and can also function as a simulator in static free-fall environments. Despite being trained on a single-object manipulation task of pouring, our model generalizes effectively to environments with unseen objects and novel manipulation tasks such as stirring and scooping. Finally, we show that the learned dynamics can be leveraged to solve control and manipulation tasks using gradient-based optimization methods.

【2】Exploring a Graph-based Approach to Offline Reinforcement Learning for Sepsis Treatment
Link: https://arxiv.org/abs/2509.03393

Authors: hakharova, Lucas Sakizloglou, Leen Lambers
Comments: 18th European Workshop on Reinforcement Learning (EWRL 2025)
Abstract: Sepsis is a serious, life-threatening condition. When treating sepsis, it is challenging to determine the correct amount of intravenous fluids and vasopressors for a given patient. While automated reinforcement learning (RL)-based methods have been used to support these decisions with promising results, previous studies have relied on relational data. Given the complexity of modern healthcare data, representing data as a graph may provide a more natural and effective approach. This study models patient data from the well-known MIMIC-III dataset as a heterogeneous graph that evolves over time. Subsequently, we explore two Graph Neural Network architectures, GraphSAGE and GATv2, for learning patient state representations, adopting the approach of decoupling representation learning from policy learning. The encoders are trained to produce latent state representations, jointly with decoders that predict the next patient state. These representations are then used for policy learning with the dBCQ algorithm. The results of our experimental evaluation confirm the potential of a graph-based approach, while highlighting the complexity of representation learning in this domain.

【3】Temporal social network modeling of mobile connectivity data with graph neural networks
Link: https://arxiv.org/abs/2509.03319

Authors: ari, Chandreyee Roy, Fumiko Ogushi, Mikko Saukkoriipi, Jaakko Sahlsten, Kimmo Kaski
Comments: 22 pages, 7 figures
Abstract: Graph neural networks (GNNs) have emerged as a state-of-the-art data-driven tool for modeling connectivity data of graph-structured complex networks and integrating information of their nodes and edges in space and time. However, as of yet, the analysis of social networks using the time series of people's mobile connectivity data has not been extensively investigated. In the present study, we investigate four snapshot-based temporal GNNs in predicting the phone call and SMS activity between users of a mobile communication network. In addition, we develop a simple non-GNN baseline model using the recently proposed EdgeBank method. Our analysis shows that the ROLAND temporal GNN outperforms the baseline model in most cases, whereas the other three GNNs perform on average worse than the baseline. The results show that GNN-based approaches hold promise in the analysis of temporal social networks through mobile connectivity data. However, due to the relatively small performance margin between ROLAND and the baseline model, further research is required on specialized GNN architectures for temporal social network analysis.
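EdgeBank is a pure memorization baseline: it stores the edges seen so far and predicts an edge as positive only if it has appeared before. The sketch below shows the unlimited-memory variant on toy snapshots. The class layout, the directed-edge convention, and the node names are assumptions for illustration, and the paper's own baseline may differ in details.

```python
class EdgeBank:
    """Memorization baseline for temporal link prediction (unlimited memory)."""

    def __init__(self):
        self.seen = set()

    def update(self, edges):
        """Remember every (source, target) pair observed in a snapshot."""
        self.seen.update(edges)

    def predict(self, edge):
        """Score 1.0 if the edge was observed before, else 0.0."""
        return 1.0 if edge in self.seen else 0.0

# Toy snapshots of directed contacts (made-up node names).
snapshots = [
    [("a", "b"), ("b", "c")],
    [("a", "b"), ("c", "d")],
]

bank = EdgeBank()
for edges in snapshots:
    bank.update(edges)

scores = {e: bank.predict(e) for e in [("a", "b"), ("a", "d")]}
```

Since people tend to call the same contacts repeatedly, a memory of this kind can be hard to beat on communication data, which is consistent with the small margin the abstract reports between ROLAND and the baseline.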

【4】Discrete Functional Geometry of ReLU Networks via ReLU Transition Graphs
Link: https://arxiv.org/abs/2509.03056

Authors: esh Dhayalkar
Comments: 7 pages, 3 figures. Submitted as a conference paper to 2025 5th International Conference on Robotics, Automation, and Artificial Intelligence (RAAI 2025)
Abstract: We extend the ReLU Transition Graph (RTG) framework into a comprehensive graph-theoretic model for understanding deep ReLU networks. In this model, each node represents a linear activation region, and edges connect regions that differ by a single ReLU activation flip, forming a discrete geometric structure over the network's functional behavior. We prove that RTGs at random initialization exhibit strong expansion, binomial degree distributions, and spectral properties that tightly govern generalization. These structural insights enable new bounds on capacity via region entropy and on generalization via spectral gap and edge-wise KL divergence. Empirically, we construct RTGs for small networks, measure their smoothness and connectivity properties, and validate theoretical predictions. Our results show that region entropy saturates under overparameterization, spectral gap correlates with generalization, and KL divergence across adjacent regions reflects functional smoothness. This work provides a unified framework for analyzing ReLU networks through the lens of discrete functional geometry, offering new tools to understand, diagnose, and improve generalization.

【5】RankGraph: Unified Heterogeneous Graph Learning for Cross-Domain Recommendation
Link: https://arxiv.org/abs/2509.02942

Authors: , Junjie Yang, Li Chen, Hong Li, Li Yu, Hong Yan
Comments: RecSys 2025
Abstract: Cross-domain recommendation systems face the challenge of integrating fine-grained user and item relationships across various product domains. To address this, we introduce RankGraph, a scalable graph learning framework designed to serve as a core component in recommendation foundation models (FMs). By constructing and leveraging graphs composed of heterogeneous nodes and edges across multiple products, RankGraph enables the integration of complex relationships between users, posts, ads, and other entities. Our framework employs a GPU-accelerated Graph Neural Network and contrastive learning, allowing for dynamic extraction of subgraphs such as item-item and user-user graphs to support similarity-based retrieval and real-time clustering. Furthermore, RankGraph integrates graph-based pretrained representations as contextual tokens into FM sequence models, enriching them with structured relational knowledge. RankGraph has demonstrated improvements in click (+0.92%) and conversion rates (+2.82%) in online A/B tests, showcasing its effectiveness in cross-domain recommendation scenarios.

【6】Power Grid Control with Graph-Based Distributed Reinforcement Learning
标题：基于图的分布式强化学习的电网控制
链接：https://arxiv.org/abs/2509.02861

作者：rizio, Gianvito Losapio, Marco Mussi, Alberto Maria Metelli, Marcello Restelli
摘要：可再生能源的必要整合，加上电网规模的不断扩大，对现代电网的控制提出了重大挑战。传统的控制系统，这是人为的和优化的基础上，努力适应和规模在这样一个不断变化的背景下，激励探索更动态和分布式控制策略。这项工作提出了一个基于图的分布式强化学习框架，用于实时，可扩展的网格管理。建议的体系结构由一个网络的分布式低级别代理人的行为对个人的电力线和协调的高级别的经理代理。一个图形神经网络（GNN）被用来编码网络的拓扑信息内的单个低级别代理的观察。为了加速收敛和增强学习稳定性，该框架集成了模仿学习和基于潜力的奖励塑造。与传统的分散式方法相比，这种方法只分解动作空间，而依赖于全局观测，这种方法也分解了观测空间。每个低级别代理的行为基于通过GNN构建的环境的结构化和信息化的本地视图。在Grid2Op仿真环境下的实验表明了该方法的有效性，其性能始终优于该领域常用的标准基线。此外，该模型被证明是更有效的计算比基于模拟的专家方法。
摘要：The necessary integration of renewable energy sources, combined with the expanding scale of power networks, presents significant challenges in controlling modern power grids. Traditional control systems, which are human and optimization-based, struggle to adapt and to scale in such an evolving context, motivating the exploration of more dynamic and distributed control strategies. This work advances a graph-based distributed reinforcement learning framework for real-time, scalable grid management. The proposed architecture consists of a network of distributed low-level agents acting on individual power lines and coordinated by a high-level manager agent. A Graph Neural Network (GNN) is employed to encode the network's topological information within the single low-level agent's observation. To accelerate convergence and enhance learning stability, the framework integrates imitation learning and potential-based reward shaping. In contrast to conventional decentralized approaches that decompose only the action space while relying on global observations, this method also decomposes the observation space. Each low-level agent acts based on a structured and informative local view of the environment constructed through the GNN. Experiments on the Grid2Op simulation environment show the effectiveness of the approach, which consistently outperforms the standard baseline commonly adopted in the field. Additionally, the proposed model proves to be much more computationally efficient than the simulation-based Expert method.
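
The abstract above mentions potential-based reward shaping. A minimal sketch of that general technique follows, where the shaping term is F = gamma * phi(s') - phi(s). The potential function, the `line_loadings` state field, and the discount value are illustrative assumptions and are not taken from the paper.

```python
GAMMA = 0.99  # discount factor (assumed value, not from the paper)


def potential(state):
    """Hypothetical potential: larger when the most loaded line is less loaded."""
    return -max(state["line_loadings"])


def shaped_reward(reward, state, next_state, gamma=GAMMA):
    """Return reward + F, with F = gamma * potential(s') - potential(s)."""
    return reward + gamma * potential(next_state) - potential(state)


# Toy transition: the worst line loading drops from 0.95 to 0.80.
s = {"line_loadings": [0.95, 0.60]}
s_next = {"line_loadings": [0.80, 0.65]}
bonus = shaped_reward(0.0, s, s_next)  # positive: the move relieved the worst line
```

Because the shaping term is a difference of potentials, it telescopes along a trajectory, which is why this form of shaping is known to leave the optimal policy unchanged.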

【7】Multi-Scale Deep Learning for Colon Histopathology: A Hybrid Graph-Transformer Approach
标题：结肠组织学的多尺度深度学习：混合图形转换方法
链接：https://arxiv.org/abs/2509.02851

作者：emi, Amirhossein Ahmadkhan Kordbacheh
摘要：结肠癌也被称为结直肠癌，是世界上最恶性的癌症类型之一。结肠癌的早期检测对于防止其恶化至关重要。这项研究提出了一种混合多尺度深度学习架构，该架构协同胶囊网络、图形注意力机制、Transformer模块和残差学习，以推进肺癌和结肠癌组织病理学图像数据集（LC 25000）数据集上的结肠癌分类。本文提出的模型利用了HG-TNet模型，该模型引入了一种混合架构，该架构将Transformers和卷积神经网络中的强度点连接起来，以捕获组织病理学图像中的多尺度特征。主要地，Transformer分支通过基于卷积的补丁嵌入将图像分割成补丁，然后通过Transformer编码器处理这些补丁来提取全局上下文键。类似地，专用的CNN分支通过连续合并这些不同的特征来捕获细粒度的局部细节，结合自监督旋转预测目标，产生在性能上超过标准架构的鲁棒诊断表示。实验结果表明，这些算法不仅在精度和损失函数上有较好的性能，而且利用胶囊网络保持空间有序性，实现了各个元素如何单独组合形成整体结构。
摘要：Colon cancer, also known as colorectal cancer, is one of the most malignant types of cancer worldwide. Early-stage detection of colon cancer is crucial to prevent its deterioration. This research presents a hybrid multi-scale deep learning architecture that synergizes capsule networks, graph attention mechanisms, transformer modules, and residual learning to advance colon cancer classification on the Lung and Colon Cancer Histopathological Image Dataset (LC25000). The proposed model utilizes HG-TNet, a hybrid architecture that joins the strengths of transformers and convolutional neural networks to capture multi-scale features in histopathological images. Mainly, a transformer branch extracts global contextual bonds by partitioning the image into patches via convolution-based patch embedding and then processing these patches through a transformer encoder. Analogously, a dedicated CNN branch captures fine-grained, local details through successive Incorporation these diverse features, combined with a self-supervised rotation prediction objective, produce a robust diagnostic representation that surpasses standard architectures in performance. Results show better performance not only in accuracy or loss function but also in these algorithms by utilizing capsule networks to preserve spatial orders and realize how each element individually combines and forms whole structures.

【8】Learning Laplacian Eigenvectors: a Pre-training Method for Graph Neural Networks
标题：学习拉普拉斯特征载体：图神经网络的预训练方法
链接：https://arxiv.org/abs/2509.02803

作者：i, Nyambura Njenga, Benjamin Whitsett, Catherine Ma, Darwin Deng, Sara de Ángel, Alexandre Van Tassel, Siddharth Viswanath, Ryan Pellico, Ian Adelstein, Smita Krishnaswamy
摘要：我们提出了一种新的框架，通过归纳学习拉普拉斯特征向量来预训练图神经网络（GNNs）。传统的消息传递神经网络（MPNN）通常难以捕获全局和区域图结构，因为随着网络深度的增加，存在过度平滑的风险。由于图拉普拉斯矩阵的低频特征向量编码全局信息，因此预训练GNN来预测这些特征向量会鼓励网络自然地学习每个图上的大规模结构模式。从经验上讲，我们表明，通过我们的框架预训练的模型在各种基于图结构的任务上优于基线模型。虽然大多数现有的预训练方法都专注于特定领域的任务，如节点或边缘特征重建，但我们的自监督预训练框架是基于结构的，非常灵活。特征向量学习可以应用于所有基于图的数据集，并且可以在特定任务数据稀疏时与合成特征一起使用。
摘要：We propose a novel framework for pre-training Graph Neural Networks (GNNs) by inductively learning Laplacian eigenvectors. Traditional Message Passing Neural Networks (MPNNs) often struggle to capture global and regional graph structure due to over-smoothing risk as network depth increases. Because the low-frequency eigenvectors of the graph Laplacian matrix encode global information, pre-training GNNs to predict these eigenvectors encourages the network to naturally learn large-scale structural patterns over each graph. Empirically, we show that models pre-trained via our framework outperform baseline models on a variety of graph structure-based tasks. While most existing pre-training methods focus on domain-specific tasks like node or edge feature reconstruction, our self-supervised pre-training framework is structure-based and highly flexible. Eigenvector-learning can be applied to all graph-based datasets, and can be used with synthetic features when task-specific data is sparse.
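
To make the pre-training target in the abstract above concrete, the sketch below computes the lowest non-trivial Laplacian eigenvector of a small path graph and scores a prediction against it. The power-iteration solver, the sign-invariant loss, and all function names are assumptions chosen for illustration; the abstract does not state how the authors compute targets or which loss they use.

```python
import math


def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph, as a list of rows."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lowest_nontrivial_eigenpair(L, iters=2000):
    """Power iteration on (c*I - L) with the constant vector projected out.

    Assumes a connected graph, so the only zero eigenvector is the constant one.
    """
    n = len(L)
    c = 2.0 * max(L[i][i] for i in range(n))  # upper bound on the largest eigenvalue
    x = [math.sin(i + 1.0) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        Lx = matvec(L, x)
        x = [c * xi - lxi for xi, lxi in zip(x, Lx)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]
    eigenvalue = sum(a * b for a, b in zip(x, matvec(L, x)))
    return eigenvalue, x


def sign_invariant_mse(pred, target):
    """Eigenvectors are defined up to sign, so score against both +target and -target."""
    plus = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    minus = sum((p + t) ** 2 for p, t in zip(pred, target)) / len(target)
    return min(plus, minus)


path_edges = [(i, i + 1) for i in range(5)]  # path graph on 6 nodes
eigenvalue, target = lowest_nontrivial_eigenpair(laplacian(6, path_edges))
```

On the path graph the resulting vector changes sign once from one end to the other, which illustrates the kind of global structure a low-frequency eigenvector encodes.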

GAN | Adversarial | Attack | Generation (1 paper)

【1】On the MIA Vulnerability Gap Between Private GANs and Diffusion Models
标题：私人GAN和扩散模型之间的MIA脆弱性差距
链接：https://arxiv.org/abs/2509.03341

作者：ag, Jean-Yves Franceschi, Alain Rakotomamonjy, Alexandre Allauzen, Jamal Atif
摘要：生成对抗网络（GANs）和扩散模型已成为高质量图像合成的主要方法。虽然两者都可以在差异隐私（DP）下进行训练以保护敏感数据，但它们对成员推断攻击（MIA）的敏感性（数据机密性的关键威胁）仍然知之甚少。在这项工作中，我们提出了第一个统一的理论和实证分析的隐私风险所面临的差异私人生成模型。我们首先通过基于稳定性的分析表明，GANs对数据扰动的敏感性比扩散模型低得多，这表明在抵抗MIA方面具有结构优势。然后，我们使用标准化的MIA管道进行全面的实证研究，以评估数据集和隐私预算之间的隐私泄漏，从而验证了这一见解。我们的结果一致显示，即使在强DP机制中，也存在明显的隐私鲁棒性差距，有利于GANs，这突出表明模型类型本身可以严重影响隐私泄露。
摘要：Generative Adversarial Networks (GANs) and diffusion models have emerged as leading approaches for high-quality image synthesis. While both can be trained under differential privacy (DP) to protect sensitive data, their sensitivity to membership inference attacks (MIAs), a key threat to data confidentiality, remains poorly understood. In this work, we present the first unified theoretical and empirical analysis of the privacy risks faced by differentially private generative models. We begin by showing, through a stability-based analysis, that GANs exhibit fundamentally lower sensitivity to data perturbations than diffusion models, suggesting a structural advantage in resisting MIAs. We then validate this insight with a comprehensive empirical study using a standardized MIA pipeline to evaluate privacy leakage across datasets and privacy budgets. Our results consistently reveal a marked privacy robustness gap in favor of GANs, even in strong DP regimes, highlighting that model type alone can critically shape privacy leakage.

Semi-/Weakly/Un-/Supervised | Uncertainty | Active Learning (7 papers)

【1】Unsupervised Learning based Element Resource Allocation for Reconfigurable Intelligent Surfaces in mmWave Network
标题：毫米波网络中基于无监督学习的可重构智能表面元素资源分配
链接：https://arxiv.org/abs/2509.03241

作者：amillapalli, Yoghitha Ramamoorthi, Abhinav Kumar, Tomoki Murakami, Tomoaki Ogawa, Yasushi Takatori
摘要：无线系统中对高数据速率和无缝连接的需求日益增长，引发了人们对可重构智能表面（RIS）和基于人工智能的无线应用的极大兴趣。RIS通常包括无源反射天线元件，其通过充分调谐反射元件的相位来控制无线传播环境。RIS元素分配给多用户设备（UE）是有效利用RIS的关键。在这项工作中，我们制定了一个联合优化问题，优化RIS阶段配置和资源分配下的$\alpha$-公平的调度框架，并提出了一种有效的方式分配RIS元素。然而，传统的迭代优化方法随着RIS元素数量的增加而遭受指数增加的计算复杂度，并且还使监督学习的训练标签的生成复杂化。为了克服这些挑战，我们提出了一个五层全连接神经网络（FNN）与预处理技术相结合，以显着降低输入维数，降低计算复杂度，并提高可扩展性。仿真结果表明，我们提出的基于神经网络的解决方案减少了计算开销，同时显着提高系统吞吐量的6.8%，现有的RIS元素分配方案相比。此外，所提出的系统实现了更好的性能，同时降低了计算复杂度，使其显着更具可扩展性比迭代优化算法。
摘要：The increasing demand for high data rates and seamless connectivity in wireless systems has sparked significant interest in reconfigurable intelligent surfaces (RIS) and artificial intelligence-based wireless applications. RIS typically comprises passive reflective antenna elements that control the wireless propagation environment by adequately tuning the phase of the reflective elements. The allocation of RIS elements to multiple user equipments (UEs) is crucial for efficiently utilizing RIS. In this work, we formulate a joint optimization problem that optimizes the RIS phase configuration and resource allocation under an $\alpha$-fair scheduling framework and propose an efficient way of allocating RIS elements. Conventional iterative optimization methods, however, suffer from exponentially increasing computational complexity as the number of RIS elements increases and also complicate the generation of training labels for supervised learning. To overcome these challenges, we propose a five-layer fully connected neural network (FNN) combined with a preprocessing technique to significantly reduce input dimensionality, lower computational complexity, and enhance scalability. The simulation results show that our proposed NN-based solution reduces computational overhead while significantly improving system throughput by 6.8% compared to existing RIS element allocation schemes. Furthermore, the proposed system achieves better performance while reducing computational complexity, making it significantly more scalable than the iterative optimization algorithms.
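
The $\alpha$-fair scheduling objective named in the abstract above can be illustrated with the standard $\alpha$-fair utility, which sums r^(1-alpha)/(1-alpha) over user rates and uses log(r) when alpha equals 1. The rate values below are made up for illustration, and the paper's full joint optimization over RIS phases is not reproduced here.

```python
import math


def alpha_fair_utility(rates, alpha):
    """Standard alpha-fair utility over positive per-user rates."""
    if alpha == 1.0:
        return sum(math.log(r) for r in rates)  # proportional fairness
    return sum(r ** (1.0 - alpha) / (1.0 - alpha) for r in rates)


unequal = [10.0, 1.0]  # higher total rate, one user starved (illustrative numbers)
equal = [5.0, 5.0]     # lower total rate, balanced

# alpha = 0 is plain sum-rate, so the unequal allocation scores higher.
prefers_unequal_at_0 = alpha_fair_utility(unequal, 0.0) > alpha_fair_utility(equal, 0.0)
# alpha = 1 is proportional fairness, so the balanced allocation scores higher.
prefers_equal_at_1 = alpha_fair_utility(equal, 1.0) > alpha_fair_utility(unequal, 1.0)
```

Larger values of alpha weight the worst-off user more heavily, which is the knob such a scheduler uses to trade throughput against fairness.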

【2】Uncertainty-driven Adaptive Exploration
标题：不确定性驱动的适应性探索
链接：https://arxiv.org/abs/2509.03219

作者：Bakopoulos, Georgios Chalkiadakis
摘要：自适应探索方法提出了通过探索和利用之间的交替来学习复杂策略的方法。此类方法的一个重要问题是确定在勘探和开采之间切换的适当时机，反之亦然。这在需要学习长而复杂的动作序列的领域中至关重要。在这项工作中，我们提出了一个通用的自适应探索框架，采用不确定性来解决这一重要问题的原则性的方式。我们的框架包括以前的自适应探索方法作为特例。此外，我们可以在我们的框架中纳入任何选择的不确定性测量机制，例如内在动机或基于认知不确定性的探索方法中使用的机制。我们的实验表明，我们的框架产生了自适应的探索策略，在几个MuJoCo环境中优于标准的。
摘要：Adaptive exploration methods propose ways to learn complex policies via alternating between exploration and exploitation. An important question for such methods is to determine the appropriate moment to switch between exploration and exploitation and vice versa. This is critical in domains that require the learning of long and complex sequences of actions. In this work, we present a generic adaptive exploration framework that employs uncertainty to address this important issue in a principled manner. Our framework includes previous adaptive exploration approaches as special cases. Moreover, we can incorporate in our framework any uncertainty-measuring mechanism of choice, for instance mechanisms used in intrinsic motivation or epistemic uncertainty-based exploration methods. We experimentally demonstrate that our framework gives rise to adaptive exploration strategies that outperform standard ones across several MuJoCo environments.

【3】Autonomous Learning From Success and Failure: Goal-Conditioned Supervised Learning with Negative Feedback
标题：成功与失败中的自主学习：负反馈的目标条件监督学习
链接：https://arxiv.org/abs/2509.03206

作者：hang, Fabian Wurzberger, Gerrit Schmid, Sebastian Gottwald, Daniel A. Braun
摘要：强化学习在应用于具有稀疏奖励结构的任务时面临着重大挑战。虽然模仿学习在监督学习领域内提供了更快的收敛速度，但它在很大程度上依赖于人类生成的演示。最近，目标条件监督学习（GCSL）已经成为一个潜在的解决方案，使自主系统的自我模仿学习。通过战略性地重新标记目标，代理人可以从自己的经验中获得政策见解。尽管这一框架取得了成功，但它存在两个明显的局限性：（1）完全从自我生成的经验中学习会加剧代理人的固有偏见;（2）重新标记策略允许代理人只关注成功的结果，阻止他们从错误中学习。为了解决这些问题，我们提出了一个新的模型，将对比学习原则纳入GCSL框架，从成功和失败中学习。通过实证评估，我们证明，我们的算法克服了代理人的初始偏见的限制，从而使更多的探索性行为。这有助于识别和采用有效的策略，从而在各种具有挑战性的环境中实现卓越的性能。
摘要：Reinforcement learning faces significant challenges when applied to tasks characterized by sparse reward structures. Although imitation learning, within the domain of supervised learning, offers faster convergence, it relies heavily on human-generated demonstrations. Recently, Goal-Conditioned Supervised Learning (GCSL) has emerged as a potential solution by enabling self-imitation learning for autonomous systems. By strategically relabelling goals, agents can derive policy insights from their own experiences. Despite the successes of this framework, it presents two notable limitations: (1) Learning exclusively from self-generated experiences can exacerbate the agents' inherent biases; (2) The relabelling strategy allows agents to focus solely on successful outcomes, precluding them from learning from their mistakes. To address these issues, we propose a novel model that integrates contrastive learning principles into the GCSL framework to learn from both success and failure. Through empirical evaluations, we demonstrate that our algorithm overcomes limitations imposed by agents' initial biases and thereby enables more exploratory behavior. This facilitates the identification and adoption of effective policies, leading to superior performance across a variety of challenging environments.
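
The goal relabelling step that GCSL relies on, as described in the abstract above, can be sketched as follows: each sampled (state, action) pair is paired with a state the agent actually reached later in the same trajectory, which then serves as the goal for supervised imitation. The function name and data layout are assumptions, and the paper's contrastive treatment of failures is not shown.

```python
import random


def hindsight_relabel(states, actions, num_samples, rng):
    """Return (state, action, goal) tuples whose goals were reached later on.

    states has one more element than actions: s_0, a_0, s_1, ..., a_{T-1}, s_T.
    """
    assert len(states) == len(actions) + 1
    data = []
    for _ in range(num_samples):
        t = rng.randrange(len(actions))        # step whose action is imitated
        h = rng.randrange(t + 1, len(states))  # strictly later step used as the goal
        data.append((states[t], actions[t], states[h]))
    return data


# Toy trajectory where each state is simply its time index.
toy_states = [0, 1, 2, 3, 4]
toy_actions = ["a0", "a1", "a2", "a3"]
relabelled = hindsight_relabel(toy_states, toy_actions, 20, random.Random(0))
```

Every tuple produced this way is a successful example by construction, which is the limitation the abstract points to when it says agents cannot learn from their mistakes.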

【4】VendiRL: A Framework for Self-Supervised Reinforcement Learning of Diversely Diverse Skills
标题：VendiRL：多样化技能的自我监督强化学习框架
链接：https://arxiv.org/abs/2509.02930

作者：intunen
备注：17 pages including appendices
摘要：在自我监督强化学习（RL）中，一个关键的挑战是学习一组不同的技能，为未知的未来任务做好准备。尽管取得了令人印象深刻的进展，但可扩展性和评估仍然是普遍存在的问题。关于可扩展性，对有意义的技能的搜索可能被高维特征空间所掩盖，其中相关特征可能在下游任务域中有所不同。为了评估技能多样性，界定什么是“多样性”，通常需要对技能多样性意味着什么的具体概念作出严格承诺，这可能导致对技能多样性的理解不一致，使不同方法的结果难以比较，并使许多形式的多样性未被探索。为了解决这些问题，我们采用了样本多样性的衡量标准，将生态学的想法转化为机器学习-Vendi Score -允许用户指定和评估任何所需的多样性形式。我们演示了如何这个指标促进技能评估，并介绍VendiRL，一个统一的框架，学习不同的技能集。考虑到不同的相似性函数，VendiRL激发了不同形式的多样性，这可以在新的和丰富的交互式环境中支持技能多样性预训练，其中可能需要优化各种形式的多样性。
摘要：In self-supervised reinforcement learning (RL), one of the key challenges is learning a diverse set of skills to prepare agents for unknown future tasks. Despite impressive advances, scalability and evaluation remain prevalent issues. Regarding scalability, the search for meaningful skills can be obscured by high-dimensional feature spaces, where relevant features may vary across downstream task domains. For evaluating skill diversity, defining what constitutes "diversity" typically requires a hard commitment to a specific notion of what it means for skills to be diverse, potentially leading to inconsistencies in how skill diversity is understood, making results across different approaches hard to compare, and leaving many forms of diversity unexplored. To address these issues, we adopt a measure of sample diversity that translates ideas from ecology to machine learning -- the Vendi Score -- allowing the user to specify and evaluate any desired form of diversity. We demonstrate how this metric facilitates skill evaluation and introduce VendiRL, a unified framework for learning diversely diverse sets of skills. Given distinct similarity functions, VendiRL motivates distinct forms of diversity, which could support skill-diversity pretraining in new and richly interactive environments where optimising for various forms of diversity may be desirable.
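
For reference, the Vendi Score mentioned in the abstract above is defined as the exponential of the Shannon entropy of the eigenvalues of K/n, where K is an n-by-n similarity matrix with ones on the diagonal. The sketch below computes it with a small Jacobi eigenvalue routine so that it needs only the standard library; the routine and the toy similarity matrices are assumptions for illustration, and this is not the VendiRL training procedure.

```python
import math


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q (A @ J)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q (J.T @ A @ J)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def vendi_score(similarity):
    """exp of the entropy of the eigenvalues of K/n; ranges from 1 to n."""
    n = len(similarity)
    scaled = [[v / n for v in row] for row in similarity]
    entropy = 0.0
    for lam in symmetric_eigenvalues(scaled):
        if lam > 1e-12:  # treat 0 * log(0) as 0
            entropy -= lam * math.log(lam)
    return math.exp(entropy)


all_distinct = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
all_identical = [[1.0] * 3 for _ in range(3)]
two_alike = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

The score can be read as an effective number of distinct items: three mutually dissimilar skills give 3, three identical skills give 1, and swapping in a different similarity function changes which notion of diversity is being counted.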

【5】PDRL: Post-hoc Descriptor-based Residual Learning for Uncertainty-Aware Machine Learning Potentials
标题：PDRL：基于事后描述符的剩余学习，用于不确定性感知机器学习潜力
链接：https://arxiv.org/abs/2509.02927

作者： Huang, Nontawat Charoenphakdee, Yuta Tsuboi, Yong-Bin Zhuang, Wenwen Li
摘要：Entrance方法被认为是机器学习原子间相互作用势（MLIP）不确定度量化（UQ）的金标准。然而，它们的高计算成本会限制其实用性。已经提出了替代技术，如蒙特卡罗丢弃和深度内核学习，以提高计算效率;然而，其中一些方法不能应用于已经训练的模型，并可能影响预测精度。在本文中，我们提出了一个简单而有效的事后框架UQ，利用训练的图神经网络的描述符潜力来估计残差。我们将这种方法称为基于事后干扰器的基于残差的学习（PDRL）。PDRL对MLIP预测和地面实况值之间的差异进行建模，允许这些残差作为预测不确定性的代理。我们探索了PDRL的多种变体，并将其与已建立的UQ方法进行基准测试，评估其有效性和局限性。
摘要：Ensemble methods are considered the gold standard for uncertainty quantification (UQ) for machine learning interatomic potentials (MLIPs). However, their high computational cost can limit their practicality. Alternative techniques, such as Monte Carlo dropout and deep kernel learning, have been proposed to improve computational efficiency; however, some of these methods cannot be applied to already trained models and may affect the prediction accuracy. In this paper, we propose a simple and efficient post-hoc framework for UQ that leverages the descriptor of a trained graph neural network potential to estimate residual errors. We refer to this method as post-hoc descriptor-based residual-based learning (PDRL). PDRL models the discrepancy between MLIP predictions and ground truth values, allowing these residuals to act as proxies for prediction uncertainty. We explore multiple variants of PDRL and benchmark them against established UQ methods, evaluating both their effectiveness and limitations.

【6】Improving Perceptual Audio Aesthetic Assessment via Triplet Loss and Self-Supervised Embeddings
标题：通过三重缺失和自我监督嵌入改善知觉音频审美评估
链接：https://arxiv.org/abs/2509.03292

作者：. G. Wisnu, Ryandhimas E. Zezario, Stefano Rini, Hsin-Min Wang, Yu Tsao
备注：Accepted by IEEE Automatic Speech Recognition and Understanding Workshop(ASRU), 2025
摘要：我们提出了一个用于生成音频的自动多轴感知质量预测的系统，该系统是为AudioMOS Challenge 2025的Track 2开发的。该任务是预测四个音频美学分数-生产质量，生产复杂性，内容享受和内容丰富性-由文本到语音（TTS），文本到音频（TTA）和文本到音乐（TTM）系统生成的音频。一个主要的挑战是自然训练数据和综合评估数据之间的域转移。为了解决这个问题，我们将BEAT（一种预训练的基于变换的音频表示模型）与多分支长短期记忆（LSTM）预测器相结合，并使用基于缓冲区的采样的三重丢失来通过感知相似性来构建嵌入空间。我们的研究结果表明，这提高了嵌入的可辨别性和泛化性，使域鲁棒的音频质量评估没有合成的训练数据。
摘要：We present a system for automatic multi-axis perceptual quality prediction of generative audio, developed for Track 2 of the AudioMOS Challenge 2025. The task is to predict four Audio Aesthetic Scores--Production Quality, Production Complexity, Content Enjoyment, and Content Usefulness--for audio generated by text-to-speech (TTS), text-to-audio (TTA), and text-to-music (TTM) systems. A main challenge is the domain shift between natural training data and synthetic evaluation data. To address this, we combine BEATs, a pretrained transformer-based audio representation model, with a multi-branch long short-term memory (LSTM) predictor and use a triplet loss with buffer-based sampling to structure the embedding space by perceptual similarity. Our results show that this improves embedding discriminability and generalization, enabling domain-robust audio quality assessment without synthetic training data.
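
The triplet loss used in the abstract above has a simple standard form: the anchor should be closer to the positive than to the negative by at least a margin. The sketch below shows that form on plain Python lists. The Euclidean distance, the margin value, and the toy embeddings are assumptions, and the paper's buffer-based sampling of triplets is not reproduced.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
# Well-separated triplet: the positive is much closer, so the loss is zero.
satisfied = triplet_loss(anchor, [0.1, 0.0], [1.0, 0.0])
# Violating triplet: the negative is closer than the positive, so the loss is positive.
violated = triplet_loss(anchor, [1.0, 0.0], [0.5, 0.0])
```

In a perceptual-quality setting, positives would presumably be clips with similar scores and negatives clips with dissimilar scores, though the abstract does not spell out the selection rule.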

【7】Deep Self-knowledge Distillation: A hierarchical supervised learning for coronary artery segmentation
标题：深度自我知识蒸馏：冠状动脉分割的分层监督学习
链接：https://arxiv.org/abs/2509.03173

作者：Lin
摘要：冠状动脉疾病是导致死亡的主要原因，强调了通过X射线血管造影术进行精确诊断的至关重要性。从这些图像中手动分割冠状动脉是耗时且低效的，这促使了自动模型的开发。然而，现有的方法，无论是基于规则的还是深度学习模型，都面临着性能差和泛化能力有限等问题。此外，目前应用于该领域的知识蒸馏方法没有充分利用模型的层次知识，导致一定的信息浪费和不足的增强模型的分割任务的性能能力。为了解决这些问题，本文介绍了深度自我知识蒸馏，一种利用分层输出进行监督的冠状动脉分割的新方法。通过结合深度分布损失和像素级自知识蒸馏损失，我们的方法通过分层学习策略增强了学生模型的分割性能，有效地从教师模型中转移知识。我们的方法结合了松散约束的概率分布向量与严格约束的逐像素监督，为分割模型提供了双重正则化，同时也增强了其泛化性和鲁棒性。在XCAD和DCA 1数据集上的大量实验表明，与其他模型相比，我们的方法在骰子系数，准确性，灵敏度和IoU方面优于其他模型。
摘要：Coronary artery disease is a leading cause of mortality, underscoring the critical importance of precise diagnosis through X-ray angiography. Manual coronary artery segmentation from these images is time-consuming and inefficient, prompting the development of automated models. However, existing methods, whether rule-based or deep learning models, struggle with issues like poor performance and limited generalizability. Moreover, current knowledge distillation methods applied in this field have not fully exploited the hierarchical knowledge of the model, leading to certain information waste and insufficient enhancement of the model's performance capabilities for segmentation tasks. To address these issues, this paper introduces Deep Self-knowledge Distillation, a novel approach for coronary artery segmentation that leverages hierarchical outputs for supervision. By combining Deep Distribution Loss and Pixel-wise Self-knowledge Distillation Loss, our method enhances the student model's segmentation performance through a hierarchical learning strategy, effectively transferring knowledge from the teacher model. Our method combines a loosely constrained probabilistic distribution vector with tightly constrained pixel-wise supervision, providing dual regularization for the segmentation model while also enhancing its generalization and robustness. Extensive experiments on the XCAD and DCA1 datasets demonstrate that our approach outperforms other models in Dice coefficient, accuracy, sensitivity, and IoU in comparative evaluations.

Transfer | Zero/Few/One-Shot | Adaptation (6 papers)

【1】RecBase: Generative Foundation Model Pretraining for Zero-Shot Recommendation
标题：RecBase：针对Zero-Shot推荐的生成性基础模型预训练
链接：https://arxiv.org/abs/2509.03131

作者：hou, Weinan Gan, Qijiong Liu, Ke Lei, Jieming Zhu, Hai Huang, Yan Xia, Ruiming Tang, Zhenhua Dong, Zhou Zhao
摘要：基于LLM的推荐的最新进展已经显示出了希望，但它们的跨领域推广受到以语言为中心的预训练和推荐任务之间的根本不匹配的阻碍。现有的方法，依赖于语言级的知识，无法捕捉动态的，项目级的用户兴趣跨域。为了弥合这一差距，我们提出了RecBase，这是一个与领域无关的基础模型，它预先训练了一个面向解释的目标。RecBase利用具有统一文本表示和特征映射的大规模、异构、跨域语料库来增强跨域泛化。为了进一步调整跨域的项目语义，我们引入了一个统一的项目标记器，将项目编码为分层概念标识符，从而实现结构化表示和高效的词汇共享。该模型使用自回归目标进行训练，以捕获复杂的项目级序列模式。在8个真实数据集上，我们的1. 5B参数模型在zero-shot和跨域推荐任务中匹配或超过了LLM基线的性能，最高可达7B参数。
摘要：Recent advances in LLM-based recommendation have shown promise, yet their cross-domain generalization is hindered by a fundamental mismatch between language-centric pretraining and the recommendation task. Existing methods, relying on language-level knowledge, fail to capture dynamic, item-level user interests across domains. To bridge this gap, we propose RecBase, a domain-agnostic foundational model pretrained with a recommendation-oriented objective. RecBase leverages a large-scale, heterogeneous, cross-domain corpus with unified textual representations and feature mappings to enhance cross-domain generalization. To further align item semantics across domains, we introduce a unified item tokenizer that encodes items into hierarchical concept identifiers, enabling structured representation and efficient vocabulary sharing. The model is trained using an autoregressive objective to capture complex item-level sequential patterns. On eight real-world datasets, our 1.5B-parameter model matches or surpasses the performance of LLM baselines up to 7B parameters in zero-shot and cross-domain recommendation tasks.

【2】StableSleep: Source-Free Test-Time Adaptation for Sleep Staging with Lightweight Safety Rails
标题：StableSleep：具有轻量级安全轨的睡眠阶段无源代码测试时间调整
链接：https://arxiv.org/abs/2509.02982

作者：asu, Faisal R Jahangiri
备注：5 page paper, 8 figures
摘要：当部署在具有不可见生理或记录条件的患者身上时，睡眠分期模型通常会降低。我们提出了一个流，无源测试时间适应（TTA）的配方，结合熵最小化（帐篷）与批量规范统计刷新和两个安全轨道：熵门暂停适应不确定的窗口和基于EMA的复位卷轴回漂移。在Sleep-EDF Expanded上，使用单导联EEG（Fpz-Cz，100 Hz，30 s epochs; R&K到AASM映射），我们在秒级延迟和最小记忆下显示出超过冻结基线的一致增益，报告每个阶段的指标和Cohen的k。该方法与模型无关，不需要源数据或患者校准，并且对于设备上或床边使用是实用的。
摘要：Sleep staging models often degrade when deployed on patients with unseen physiology or recording conditions. We propose a streaming, source-free test-time adaptation (TTA) recipe that combines entropy minimization (Tent) with Batch-Norm statistic refresh and two safety rails: an entropy gate to pause adaptation on uncertain windows and an EMA-based reset to reel back drift. On Sleep-EDF Expanded, using single-lead EEG (Fpz-Cz, 100 Hz, 30 s epochs; R&K to AASM mapping), we show consistent gains over a frozen baseline at seconds-level latency and minimal memory, reporting per-stage metrics and Cohen's kappa. The method is model-agnostic, requires no source data or patient calibration, and is practical for on-device or bedside use.
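
The two safety rails in the abstract above can be sketched in a few lines: an entropy gate that skips adaptation when the prediction is too uncertain, and an exponential moving average (EMA) of the weights to fall back on. The threshold, decay, drift limit, and the reset rule itself are hypothetical choices made for illustration; the abstract does not give the authors' values or trigger, and the gradient step of Tent is omitted.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_adapt(logits, max_entropy_fraction=0.4):
    """Entropy gate: adapt only if entropy is below a fraction of log(num_classes)."""
    return entropy(softmax(logits)) < max_entropy_fraction * math.log(len(logits))


def ema_update(anchor, current, decay=0.99):
    """Track a slow-moving copy of the adapted parameters."""
    return [decay * a + (1.0 - decay) * c for a, c in zip(anchor, current)]


def maybe_reset(params, ema_params, drift_limit):
    """Hypothetical reset rule: return the EMA weights if the drift is too large."""
    drift = math.sqrt(sum((p - e) ** 2 for p, e in zip(params, ema_params)))
    return list(ema_params) if drift > drift_limit else list(params)


confident_window = [8.0, 0.0, 0.0, 0.0, 0.0]  # five sleep stages, one clear winner
uncertain_window = [0.0, 0.0, 0.0, 0.0, 0.0]  # flat logits, maximal entropy
```

With these settings the confident window passes the gate and the flat one does not, so a noisy or ambiguous epoch would leave the model untouched.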

【3】AdaGrad Meets Muon: Adaptive Stepsizes for Orthogonal Updates
标题：AdaGrad会见Muon：自适应的自适应步进
链接：https://arxiv.org/abs/2509.02981

作者：ang, Yuxuan Liu, Hayden Schaeffer
摘要：最近提出的μ子优化器通过正交化动量更新权重矩阵，并在大型语言模型训练中取得了很大的经验成功。然而，目前还不清楚如何确定这种正交更新的学习率。相比之下，AdaGrad是一种广泛使用的自适应方法，它通过累积过去的梯度来缩放随机梯度。我们提出了一种新的算法，AdaGO，它结合了一个规范为基础的AdaGrad型步长与正交化的更新方向，汇集了这两种方法的好处。与Muon的其他自适应变体不同，AdaGO保留了更新方向的正交性，这可以被解释为谱下降方向，同时通过用累积的过去梯度范数缩放方向来调整步长以适应优化景观。AdaGO的实现只需要对μ子进行最小的修改，只需计算一个额外的标量变量，即累积平方梯度范数，使其在计算和内存方面都很高效。在标准平滑性和无偏有界方差噪声假设下，在随机和确定性环境中为非凸函数建立了最佳理论收敛率。CIFAR-10分类和函数回归的实验结果表明，AdaGO优于Muon和Adam。
摘要：The recently proposed Muon optimizer updates weight matrices via orthogonalized momentum and has demonstrated strong empirical success in large language model training. However, it remains unclear how to determine the learning rates for such orthogonalized updates. AdaGrad, by contrast, is a widely used adaptive method that scales stochastic gradients by accumulated past gradients. We propose a new algorithm, AdaGO, which combines a norm-based AdaGrad-type stepsize with an orthogonalized update direction, bringing together the benefits of both approaches. Unlike other adaptive variants of Muon, AdaGO preserves the orthogonality of the update direction, which can be interpreted as a spectral descent direction, while adapting the stepsizes to the optimization landscape by scaling the direction with accumulated past gradient norms. The implementation of AdaGO requires only minimal modification to Muon, with a single additional scalar variable, the accumulated squared gradient norms, to be computed, making it computationally and memory efficient. Optimal theoretical convergence rates are established for nonconvex functions in both stochastic and deterministic settings under standard smoothness and unbiased bounded-variance noise assumptions. Empirical results on CIFAR-10 classification and function regression demonstrate that AdaGO outperforms Muon and Adam.
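
The idea in the abstract above, an orthogonalized update direction scaled by an AdaGrad-style stepsize built from accumulated squared gradient norms, can be sketched as below. Two things are assumptions rather than the paper's method: the stepsize formula lr / sqrt(accumulated norm), and the use of the textbook cubic Newton-Schulz iteration for orthogonalization instead of Muon's tuned variant. All function names are illustrative.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_norm(A):
    return math.sqrt(sum(x * x for row in A for x in row))


def orthogonalize(M, steps=30):
    """Approximate the orthogonal polar factor with X <- 1.5 X - 0.5 X X^T X."""
    norm = frobenius_norm(M)
    if norm == 0.0:
        return [row[:] for row in M]
    X = [[x / norm for x in row] for row in M]  # singular values now at most 1
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
    return X


def adago_like_step(W, momentum, grad, accum, lr=0.1, eps=1e-8):
    """One update: a single extra scalar (accum) sets the stepsize."""
    accum = accum + frobenius_norm(grad) ** 2
    step = lr / math.sqrt(accum + eps)
    direction = orthogonalize(momentum)
    W_new = [[w - step * d for w, d in zip(rw, rd)] for rw, rd in zip(W, direction)]
    return W_new, accum


G = [[3.0, 1.0], [1.0, 2.0]]  # symmetric positive definite, so its polar factor is I
O = orthogonalize(G)
W1, acc1 = adago_like_step([[0.0, 0.0], [0.0, 0.0]], G, G, 0.0)
```

The direction keeps unit singular values no matter how large the gradient is, so only the scalar stepsize carries information about gradient magnitude.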

【4】LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference
标题：LExI：用于高效MoE模型推理的层自适应主动专家
链接：https://arxiv.org/abs/2509.02753

作者：eja Chitty-Venkata, Sandeep Madireddy, Murali Emani, Venkatram Vishwanath
备注：Preprint
摘要：混合专家（MoE）模型通过每个令牌仅激活专家的子集来有效地扩展，从而为密集架构提供计算稀疏的替代方案。虽然之前的训练后优化，如专家间和专家内修剪，减少了内存使用，但它们在推理时间计算效率方面的增益有限。此外，现有的MoE架构通常在所有层上均匀地激活固定数量的专家，导致冗余计算和次优性能。在这项工作中，我们首先证明了MoE修剪策略只提高了内存占用，但没有显着提高GPU上使用优化的框架，如vLLM的推理性能。为了解决这个问题，我们引入了LExI，这是一种无数据优化技术，可以确定预训练的MoE模型中每层活跃专家的最佳数量。LExI仅利用模型权重来估计每层的相对重要性，并相应地自适应地为每层分配活动专家的数量。在最先进的语言和视觉MoE基准测试上的实验表明，LExI在推理效率方面明显优于传统的MoE剪枝方法，而准确性损失可以忽略不计。例如，使用LExI，Qwen1.5-MoE在Nvidia H100 GPU上实现了相同的吞吐量，准确率比传统的专家修剪高10%。
摘要：Mixture-of-Experts (MoE) models scale efficiently by activating only a subset of experts per token, offering a computationally sparse alternative to dense architectures. While prior post-training optimizations, such as inter- and intra-expert pruning, reduce memory usage they provide limited gains in inference-time compute efficiency. Moreover, existing MoE architectures typically activate a fixed number of experts uniformly across all layers, resulting in redundant computation and suboptimal performance. In this work, we first demonstrate that MoE pruning strategies improve only the memory footprint but do not significantly improve inference performance on GPU using optimized frameworks such as vLLM. To address this, we introduce LExI, a data-free optimization technique that determines the optimal number of active experts per layer in a pretrained MoE model. LExI leverages only the model weights to estimate the relative importance of each layer and adaptively assigns the number of active experts accordingly per layer. Experiments on state-of-the-art language and vision MoE benchmarks demonstrate that LExI significantly outperforms traditional MoE pruning approaches in terms of inference efficiency with negligible accuracy loss. For example, using LExI, Qwen1.5-MoE achieves the same throughput on Nvidia H100 GPU with 10% better accuracy than traditional expert pruning.

【5】Scale-Adaptive Generative Flows for Multiscale Scientific Data
标题：多尺度科学数据的规模自适应生成流
链接：https://arxiv.org/abs/2509.02971

作者：n, Eric Vanden-Eijnden
摘要：基于流的生成模型在对具有多尺度傅立叶光谱的科学数据进行建模时可能面临重大挑战，通常会在精细尺度特征中产生较大的误差。我们解决这个问题的框架内的随机插值，通过原则性的设计噪声分布和插值时间表。关键的见解是，噪声不应该比目标数据分布更平滑-通过傅立叶频谱衰减率测量-以确保初始时间附近的有界漂移场。对于高斯和近高斯分布的精细尺度结构是已知的，我们表明，频谱匹配噪声提高了数值效率相比，标准的白噪声方法。对于复杂的非高斯分布，我们开发了尺度自适应插值时间表，解决了由比数据粗糙的噪声引起的数值病态。合成高斯随机场和随机Allen-Cahn和Navier-Stokes方程的解决方案的数值实验验证了我们的方法，并证明了它的能力，以较低的计算成本比传统的方法产生高保真度的样本。
摘要：Flow-based generative models can face significant challenges when modeling scientific data with multiscale Fourier spectra, often producing large errors in fine-scale features. We address this problem within the framework of stochastic interpolants, via principled design of noise distributions and interpolation schedules. The key insight is that the noise should not be smoother than the target data distribution -- measured by Fourier spectrum decay rates -- to ensure bounded drift fields near the initial time. For Gaussian and near-Gaussian distributions whose fine-scale structure is known, we show that spectrum-matched noise improves numerical efficiency compared to standard white-noise approaches. For complex non-Gaussian distributions, we develop scale-adaptive interpolation schedules that address the numerical ill-conditioning arising from rougher-than-data noise. Numerical experiments on synthetic Gaussian random fields and solutions to the stochastic Allen-Cahn and Navier-Stokes equations validate our approach and demonstrate its ability to generate high-fidelity samples at lower computational cost than traditional approaches.

【6】Lessons Learned from Deploying Adaptive Machine Learning Agents with Limited Data for Real-time Cell Culture Process Monitoring
标题：部署具有有限数据的自适应机器学习代理进行实时细胞培养过程监控的经验教训
链接：https://arxiv.org/abs/2509.02606

作者：g Khuat, Johnny Peng, Robert Bassett, Ellen Otte, Bogdan Gabrys
摘要：本研究探讨了三种机器学习（ML）方法的部署，用于实时预测细胞培养过程中的葡萄糖，乳酸盐和铵浓度，使用拉曼光谱作为输入特征。该研究解决了与有限的数据可用性和过程可变性相关的挑战，提供了对预训练模型，即时学习（JITL）和在线学习算法的比较分析。两个工业案例研究，以评估不同的生物过程条件对模型性能的影响。研究结果强调了预训练模型表现出卓越的预测准确性的特定条件，并确定了JITL或在线学习方法对自适应流程监测更有效的场景。本研究还强调了在生物反应器操作期间使用最新的离线分析测量值更新部署的模型/试剂的至关重要性，以在整个生物反应器运行期间保持模型性能以应对细胞生长行为和操作条件的变化。此外，该研究证实了一个简单的专家混合框架在基于拉曼光谱数据实时预测代谢物浓度方面的有效性，提高了准确性和鲁棒性。这些见解有助于开发强大的策略，以便在动态和不断变化的生物制造环境中有效部署ML模型。
摘要：This study explores the deployment of three machine learning (ML) approaches for real-time prediction of glucose, lactate, and ammonium concentrations in cell culture processes, using Raman spectroscopy as input features. The research addresses challenges associated with limited data availability and process variability, providing a comparative analysis of pretrained models, just-in-time learning (JITL), and online learning algorithms. Two industrial case studies are presented to evaluate the impact of varying bioprocess conditions on model performance. The findings highlight the specific conditions under which pretrained models demonstrate superior predictive accuracy and identify scenarios where JITL or online learning approaches are more effective for adaptive process monitoring. This study also highlights the critical importance of updating the deployed models/agents with the latest offline analytical measurements during bioreactor operations to maintain the model performance against the changes in cell growth behaviours and operating conditions throughout the bioreactor run. Additionally, the study confirms the usefulness of a simple mixture-of-experts framework in achieving enhanced accuracy and robustness for real-time predictions of metabolite concentrations based on Raman spectral data. These insights contribute to the development of robust strategies for the efficient deployment of ML models in dynamic and changing biomanufacturing environments.
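
As a rough illustration of the just-in-time learning (JITL) idea compared in the abstract above, the sketch below builds a local predictor at query time from the historical samples most similar to the incoming spectrum. The distance-weighted average, the value of k, and the toy two-channel spectra are assumptions for illustration; the abstract does not say which local model the authors use.

```python
import math


def jitl_predict(query, history, k=3):
    """Predict from the k stored (spectrum, concentration) pairs nearest to query."""
    nearest = sorted(history, key=lambda item: math.dist(query, item[0]))[:k]
    weights = [1.0 / (math.dist(query, x) + 1e-9) for x, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Toy history: two-channel "spectra" paired with a measured concentration.
history = [
    ([0.0, 0.0], 1.0),
    ([0.0, 1.0], 2.0),
    ([10.0, 10.0], 50.0),
    ([1.0, 0.0], 3.0),
]
midpoint_estimate = jitl_predict([0.5, 0.0], history, k=2)
```

Appending each new offline measurement to `history` during a run is one simple way to realise the model updating that the abstract argues is critical.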

Reinforcement Learning (3 papers)

【1】A Hierarchical Deep Reinforcement Learning Framework for Traffic Signal Control with Predictable Cycle Planning
标题：具有可预测循环规划的交通信号控制分层深度强化学习框架
链接：https://arxiv.org/abs/2509.03118

作者：u, Yuli Zhang, Chengming Wang, Ruiyuan Jiang, Ziheng Qiao, Pengfei Fan, Dongyao Jia
摘要：深度强化学习（DRL）由于能够从复杂的交通环境中学习自适应策略，已成为交通信号控制（TSC）中的一种流行方法。在基于DRL的TSC方法中，两个主要的控制范例是“选择相位”和“切换”策略。虽然选择阶段范例中的代理自适应地选择下一个活动阶段，但这种范例可能会导致驾驶员出现意想不到的阶段序列，扰乱他们的预期，并可能危及交叉口的安全。同时，切换范例允许代理决定是否切换到下一个预定义的阶段或延长当前阶段。虽然这种结构保持了更可预测的秩序，但它可能导致不公平和低效的阶段分配，因为某些运动可能会不成比例地延长，而其他运动则被忽视。本文提出了一种DRL模型，称为深度层次周期规划器（DHCP），分层分配交通信号周期持续时间。高级代理首先根据总体交通状态确定南北（NS）和东西（EW）方向之间的总周期时间的划分。然后，低级代理进一步将每个主要方向内的分配持续时间划分为直行和左转运动，从而使两个运动的持续时间更加灵活。我们在真实和合成道路网络上测试我们的模型，以及多组真实和合成交通流。实证结果表明，我们的模型在所有数据集上相对于基线均取得了最佳性能。
摘要：Deep reinforcement learning (DRL) has become a popular approach in traffic signal control (TSC) due to its ability to learn adaptive policies from complex traffic environments. Within DRL-based TSC methods, two primary control paradigms are "choose phase" and "switch" strategies. Although the agent in the choose phase paradigm selects the next active phase adaptively, this paradigm may result in unexpected phase sequences for drivers, disrupting their anticipation and potentially compromising safety at intersections. Meanwhile, the switch paradigm allows the agent to decide whether to switch to the next predefined phase or extend the current phase. While this structure maintains a more predictable order, it can lead to unfair and inefficient phase allocations, as certain movements may be extended disproportionately while others are neglected. In this paper, we propose a DRL model, named Deep Hierarchical Cycle Planner (DHCP), to allocate the traffic signal cycle duration hierarchically. A high-level agent first determines the split of the total cycle time between the North-South (NS) and East-West (EW) directions based on the overall traffic state. Then, a low-level agent further divides the allocated duration within each major direction between straight and left-turn movements, enabling more flexible durations for the two movements. We test our model on both real and synthetic road networks, along with multiple sets of real and synthetic traffic flows. Empirical results show our model achieves the best performance over all datasets against baselines.
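
The two-level allocation described in the abstract above reduces to simple arithmetic once the agents have produced their split ratios. The sketch below shows only that arithmetic: the three share arguments stand in for the outputs of the high-level and low-level agents, and the function name, phase labels, and numbers are illustrative assumptions.

```python
def plan_cycle(cycle_seconds, ns_share, ns_straight_share, ew_straight_share):
    """Split a fixed cycle into four phase durations in a fixed, predictable order."""
    ns = cycle_seconds * ns_share  # high-level decision: NS versus EW
    ew = cycle_seconds - ns
    return {                       # low-level decisions: straight versus left turn
        "NS_straight": ns * ns_straight_share,
        "NS_left": ns * (1.0 - ns_straight_share),
        "EW_straight": ew * ew_straight_share,
        "EW_left": ew * (1.0 - ew_straight_share),
    }


# Example: 120 s cycle, 60% to NS, NS split 75/25, EW split 50/50.
plan = plan_cycle(120.0, 0.6, 0.75, 0.5)
```

Because the phase order is fixed and only the durations vary, drivers see the same sequence every cycle, while every movement still receives a share of green time.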

【2】Population-aware Online Mirror Descent for Mean-Field Games with Common Noise by Deep Reinforcement Learning
标题：基于深度强化学习的含公共噪声平均场博弈的种群感知在线镜像下降
链接：https://arxiv.org/abs/2509.03030

作者：Mathieu Lauriere, Matthieu Geist, Olivier Pietquin, Ankur Mehta
备注：2025 IEEE 64th Conference on Decision and Control (CDC)
摘要：平均场博弈（MFG）为研究大规模多智能体系统提供了一个强大的框架。然而，在MFG中学习纳什均衡仍然是一个具有挑战性的问题，特别是当初始分布未知或种群受到公共噪声影响时。本文受Munchausen RL和在线镜像下降的启发，提出了一种高效的深度强化学习（DRL）算法，用于求得依赖种群的纳什均衡，且不依赖于平均或历史采样。所得策略能够适应各种初始分布和公共噪声源。通过对七个典型例子的数值实验，我们证明该算法的收敛性能优于最先进的算法，特别是用于依赖种群策略的DRL版虚拟博弈（Fictitious Play）。在公共噪声存在下的表现凸显了该方法的鲁棒性和适应性。
摘要：Mean Field Games (MFGs) offer a powerful framework for studying large-scale multi-agent systems. Yet, learning Nash equilibria in MFGs remains a challenging problem, particularly when the initial distribution is unknown or when the population is subject to common noise. In this paper, we introduce an efficient deep reinforcement learning (DRL) algorithm designed to achieve population-dependent Nash equilibria without relying on averaging or historical sampling, inspired by Munchausen RL and Online Mirror Descent. The resulting policy is adaptable to various initial distributions and sources of common noise. Through numerical experiments on seven canonical examples, we demonstrate that our algorithm exhibits superior convergence properties compared to state-of-the-art algorithms, particularly a DRL version of Fictitious Play for population-dependent policies. The performance in the presence of common noise underscores the robustness and adaptability of our approach.

【3】Latent Variable Modeling in Multi-Agent Reinforcement Learning via Expectation-Maximization for UAV-Based Wildlife Protection
标题：基于无人机野生动物保护的期望最大化的多智能体强化学习中的潜在变量建模
链接：https://arxiv.org/abs/2509.02579

作者：ghavi, Rahman Farnoosh
摘要：保护濒危野生动物免遭非法偷猎是一个严峻的挑战，特别是在广阔和部分可观察的环境中，实时响应至关重要。针对野生动物保护中无人机（UAV）协同问题，提出了一种基于期望最大化（EM）的隐变量建模方法，并将其应用于多Agent强化学习（MARL）中。通过隐变量对隐藏的环境因素和代理间动态进行建模，我们的方法增强了不确定性下的探索和协调。我们使用自定义模拟来实现和评估我们的EM-MARL框架，该模拟涉及10架无人机，其任务是巡逻濒危的伊朗豹的保护栖息地。大量的实验结果表明，与标准算法（如最近策略优化（PPO）和深度确定性策略梯度（DDPG））相比，该算法在检测精度，适应性和策略收敛方面具有优异的性能。我们的研究结果强调了EM推理与MARL相结合的潜力，以改善复杂的，高风险的保护方案中的分散决策。完整的实现，模拟环境和培训脚本在GitHub上公开。
摘要：Protecting endangered wildlife from illegal poaching presents a critical challenge, particularly in vast and partially observable environments where real-time response is essential. This paper introduces a novel Expectation-Maximization (EM) based latent variable modeling approach in the context of Multi-Agent Reinforcement Learning (MARL) for Unmanned Aerial Vehicle (UAV) coordination in wildlife protection. By modeling hidden environmental factors and inter-agent dynamics through latent variables, our method enhances exploration and coordination under uncertainty. We implement and evaluate our EM-MARL framework using a custom simulation involving 10 UAVs tasked with patrolling protected habitats of the endangered Iranian leopard. Extensive experimental results demonstrate superior performance in detection accuracy, adaptability, and policy convergence when compared to standard algorithms such as Proximal Policy Optimization (PPO) and Deep Deterministic Policy Gradient (DDPG). Our findings underscore the potential of combining EM inference with MARL to improve decentralized decision-making in complex, high-stakes conservation scenarios. The full implementation, simulation environment, and training scripts are publicly available on GitHub.

元学习(1篇)

【1】Meta-Imputation Balanced (MIB): An Ensemble Approach for Handling Missing Data in Biomedical Machine Learning
标题：元插补平衡（MIB）：生物医学机器学习中处理缺失数据的集成方法
链接：https://arxiv.org/abs/2509.03316

作者：zad, Zoran Bosnić, Matjaž Kukar
摘要：数据缺失是机器学习应用中的一个基本挑战，通常会降低模型的性能和可靠性。这个问题在生物信息学和临床机器学习等领域尤其严重，由于数据生成和数据收集的性质，数据集通常不完整。虽然存在许多插补方法，从简单的统计技术到先进的深度学习模型，但没有一种方法在不同的数据集和缺失机制中始终表现良好。本文提出了一种新的元插补方法，学习组合多个基础插补器的输出，以更准确地预测缺失值。通过在具有已知真实值的合成掩码数据上训练所提出的元插补平衡（MIB）方法，系统学会根据每种方法的行为预测最合适的插补值。我们的工作突出了集成学习在插补中的潜力，并为现实世界机器学习系统中更稳健、模块化和可解释的预处理管道铺平了道路。
摘要：Missing data represents a fundamental challenge in machine learning applications, often reducing model performance and reliability. This problem is particularly acute in fields like bioinformatics and clinical machine learning, where datasets are frequently incomplete due to the nature of both data generation and data collection. While numerous imputation methods exist, from simple statistical techniques to advanced deep learning models, no single method consistently performs well across diverse datasets and missingness mechanisms. This paper proposes a novel Meta-Imputation approach that learns to combine the outputs of multiple base imputers to predict missing values more accurately. By training the proposed method called Meta-Imputation Balanced (MIB) on synthetically masked data with known ground truth, the system learns to predict the most suitable imputed value based on the behavior of each method. Our work highlights the potential of ensemble learning in imputation and paves the way for more robust, modular, and interpretable preprocessing pipelines in real-world machine learning systems.

医学相关(6篇)

【1】Scalable and Loosely-Coupled Multimodal Deep Learning for Breast Cancer Subtyping
标题：用于乳腺癌分型的可扩展、松耦合多模态深度学习
链接：https://arxiv.org/abs/2509.03408

作者：Amer, Mohamed A. Suliman, Tu Bui, Nuria Garcia, Serban Georgescu
摘要：医疗保健应用本质上是多模态的，能从不同数据源的整合中受益匪浅。然而，临床环境中可用的模态可能因地点和患者而异。乳腺癌分子分型是受益于多模态整合的一个关键领域，这是一项重要的临床任务，可以促进个性化治疗并改善患者预后。在这项工作中，我们提出了一个可扩展、松耦合的多模态框架，无缝整合来自多种模态的数据，包括拷贝数变异（CNV）、临床记录和组织病理学图像，以增强乳腺癌分型。虽然我们的主要关注点是乳腺癌，但该框架旨在轻松容纳其他模态，可以以最小的开销灵活增减模态，而无需重新训练现有模态，因此也适用于其他类型的癌症。我们为全切片图像（WSI）引入了一种双重表示，将传统的基于图像的WSI表示与基于图的WSI表示相结合。这种新颖的双重方法带来了显著的性能提升。此外，我们提出了一种新的多模态融合策略，并证明它能够在一系列多模态条件下提高性能。我们的综合结果表明，将双重WSI表示与CNV和临床健康记录相结合，再配合我们的管道和融合策略，在乳腺癌分型方面优于最先进的方法。
摘要：Healthcare applications are inherently multimodal, benefiting greatly from the integration of diverse data sources. However, the modalities available in clinical settings can vary across different locations and patients. A key area that stands to gain from multimodal integration is breast cancer molecular subtyping, an important clinical task that can facilitate personalized treatment and improve patient prognosis. In this work, we propose a scalable and loosely-coupled multimodal framework that seamlessly integrates data from various modalities, including copy number variation (CNV), clinical records, and histopathology images, to enhance breast cancer subtyping. While our primary focus is on breast cancer, our framework is designed to easily accommodate additional modalities, offering the flexibility to scale up or down with minimal overhead without requiring re-training of existing modalities, making it applicable to other types of cancers as well. We introduce a dual-based representation for whole slide images (WSIs), combining traditional image-based and graph-based WSI representations. This novel dual approach results in significant performance improvements. Moreover, we present a new multimodal fusion strategy, demonstrating its ability to enhance performance across a range of multimodal conditions. Our comprehensive results show that integrating our dual-based WSI representation with CNV and clinical health records, along with our pipeline and fusion strategy, outperforms state-of-the-art methods in breast cancer subtyping.

【2】A Narrative Review of Clinical Decision Support Systems in Offloading Footwear for Diabetes-Related Foot Ulcers
标题：糖尿病相关足溃疡减压鞋具临床决策支持系统的叙述性综述
链接：https://arxiv.org/abs/2509.02923

作者：ar, Muhammad Ashad Kabir, Luke Donnan, Sayed Ahmed
备注：44 pages, 2 figures, and 3 tables
摘要：减压鞋具通过降低足底压力（PP）有助于预防和治疗糖尿病足溃疡（DFU），但处方决策仍然分散：特征选择不一，个性化有限，评估实践也各不相同。我们对截至2025年8月发表的45项研究（12项指南/方案，25项基于知识的系统，8项机器学习应用）进行了叙述性综述。我们对知识类型、决策逻辑、评估方法和使能技术进行了主题分析。指南强调PP阈值（<=200 kPa或降低>=25-30%），但很少产生可操作的特征级输出。基于知识的系统使用规则和传感器驱动的逻辑，集成了PP监测、依从性跟踪和可用性测试。ML工作引入了预测、优化和生成模型，具有较高的计算精度，但可解释性和临床验证有限。评估工作仍然支离破碎：方案优先考虑生物力学测试；基于知识的系统评估可用性/依从性；ML研究侧重于技术准确性，与长期结局的联系较弱。基于这一综合，我们提出了一个由五部分组成的CDSS框架：（1）最小可行数据集；（2）结合规则、优化和可解释ML的混合架构；（3）结构化的特征级输出；（4）持续验证和评估；（5）与临床和远程医疗工作流程集成。该框架旨在为DFU护理提供可扩展的、以患者为中心的CDSS；优先考虑可互操作的数据集、可解释的模型和以结局为中心的评估将是临床采用的关键。
摘要：Offloading footwear helps prevent and treat diabetic foot ulcers (DFUs) by lowering plantar pressure (PP), yet prescription decisions remain fragmented: feature selection varies, personalization is limited, and evaluation practices differ. We performed a narrative review of 45 studies (12 guidelines/protocols, 25 knowledge-based systems, 8 machine-learning applications) published to Aug 2025. We thematically analyzed knowledge type, decision logic, evaluation methods, and enabling technologies. Guidelines emphasize PP thresholds (<=200 kPa or >=25-30% reduction) but rarely yield actionable, feature-level outputs. Knowledge-based systems use rule- and sensor-driven logic, integrating PP monitoring, adherence tracking, and usability testing. ML work introduces predictive, optimization, and generative models with high computational accuracy but limited explainability and clinical validation. Evaluation remains fragmented: protocols prioritize biomechanical tests; knowledge-based systems assess usability/adherence; ML studies focus on technical accuracy with weak linkage to long-term outcomes. From this synthesis we propose a five-part CDSS framework: (1) a minimum viable dataset; (2) a hybrid architecture combining rules, optimization, and explainable ML; (3) structured feature-level outputs; (4) continuous validation and evaluation; and (5) integration with clinical and telehealth workflows. This framework aims to enable scalable, patient-centered CDSSs for DFU care; prioritizing interoperable datasets, explainable models, and outcome-focused evaluation will be key to clinical adoption.
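
The plantar-pressure thresholds quoted in this abstract amount to a simple rule, sketched below for illustration only. The function name, its parameters, and the choice of the lower end (25%) of the quoted 25-30% band as the default are assumptions, not something the review specifies, and the sketch is not clinical guidance.

```python
def meets_pressure_target(peak_kpa, baseline_kpa,
                          abs_limit_kpa=200.0, min_reduction=0.25):
    """Return True if peak plantar pressure satisfies either quoted threshold.

    abs_limit_kpa: absolute threshold from the abstract (<=200 kPa).
    min_reduction: relative reduction versus baseline; 0.25 is an assumed
                   default taken from the lower end of the 25-30% band.
    """
    if baseline_kpa <= 0:
        raise ValueError("baseline_kpa must be positive")
    reduction = (baseline_kpa - peak_kpa) / baseline_kpa
    return peak_kpa <= abs_limit_kpa or reduction >= min_reduction


below_absolute_limit = meets_pressure_target(180.0, 190.0)   # passes on the 200 kPa rule
large_reduction = meets_pressure_target(300.0, 420.0)        # about 28.6% lower than baseline
neither = meets_pressure_target(300.0, 320.0)                # 6.25% lower, above 200 kPa
```

A rule of this kind yields only a pass/fail signal, which is consistent with the abstract's point that guidelines rarely produce feature-level outputs.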

【3】Enhancing Machine Learning for Imbalanced Medical Data: A Quantum-Inspired Approach to Synthetic Oversampling (QI-SMOTE)
标题：增强不平衡医疗数据的机器学习：量子启发的合成过采样方法（QI-SMOTE）
链接：https://arxiv.org/abs/2509.02863

作者：htriya, Pardeep Singh
摘要：类别不平衡仍然是机器学习（ML）中的一个关键挑战，特别是在医疗领域，代表性不足的少数类别会导致有偏见的模型和预测性能降低。这项研究介绍了量子启发SMOTE（QI-SMOTE），这是一种新型的数据增强技术，通过利用量子演化和分层纠缠等量子原理，增强了ML分类器的性能，包括随机森林（RF），支持向量机（SVM），逻辑回归（LR），k最近邻（KNN），梯度提升（GB）和神经网络。与传统的过采样方法不同，QI-SMOTE生成的合成实例保留了复杂的数据结构，提高了模型的泛化和分类精度。我们在MIMIC-III和MIMIC-IV数据集上验证QI-SMOTE，由于其临床意义和固有的类别不平衡，使用死亡率检测作为基准任务。我们将我们的方法与传统的过采样技术进行比较，包括Borderline-SMOTE，ADASYN，SMOTE-ENN，SMOTE-TOMEK和SVM-SMOTE，使用关键性能指标，如准确性，F1分数，G-Mean和AUC-ROC。结果表明，QI-SMOTE通过产生更多信息和平衡的训练数据，显着提高了集成方法（RF，GB，ADA），基于内核的模型（SVM）和深度学习方法的有效性。通过将量子启发的转换集成到ML管道中，QI-SMOTE不仅减轻了类别不平衡，还增强了医疗诊断和决策中预测模型的鲁棒性和可靠性。这项研究强调了量子启发的重采样技术在推进最先进的ML方法中的潜力。
摘要：Class imbalance remains a critical challenge in machine learning (ML), particularly in the medical domain, where underrepresented minority classes lead to biased models and reduced predictive performance. This study introduces Quantum-Inspired SMOTE (QI-SMOTE), a novel data augmentation technique that enhances the performance of ML classifiers, including Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), k-Nearest Neighbors (KNN), Gradient Boosting (GB), and Neural Networks, by leveraging quantum principles such as quantum evolution and layered entanglement. Unlike conventional oversampling methods, QI-SMOTE generates synthetic instances that preserve complex data structures, improving model generalization and classification accuracy. We validate QI-SMOTE on the MIMIC-III and MIMIC-IV datasets, using mortality detection as a benchmark task due to their clinical significance and inherent class imbalance. We compare our method against traditional oversampling techniques, including Borderline-SMOTE, ADASYN, SMOTE-ENN, SMOTE-TOMEK, and SVM-SMOTE, using key performance metrics such as Accuracy, F1-score, G-Mean, and AUC-ROC. The results demonstrate that QI-SMOTE significantly improves the effectiveness of ensemble methods (RF, GB, ADA), kernel-based models (SVM), and deep learning approaches by producing more informative and balanced training data. By integrating quantum-inspired transformations into the ML pipeline, QI-SMOTE not only mitigates class imbalance but also enhances the robustness and reliability of predictive models in medical diagnostics and decision-making. This study highlights the potential of quantum-inspired resampling techniques in advancing state-of-the-art ML methodologies.

【4】S2M2ECG: Spatio-temporal bi-directional State Space Model Enabled Multi-branch Mamba for ECG
标题：S2M2ECG：由时空双向状态空间模型驱动的用于心电图的多分支Mamba
链接：https://arxiv.org/abs/2509.03066

作者： Zhang, Ruoxin Wang, Chenlian Zhou, Jiguang Shi, Yue Ge, Zhoutong Li, Sheng Chang, Hao Wang, Jin He, Qijun Huang
摘要：作为心血管疾病（CVD）诊断的最有效方法之一，多导联心电图（ECG）信号提出了一个特征性的多传感器信息融合挑战，在深度学习领域一直受到研究。尽管提出了许多具有不同DL架构的算法，但在性能、计算复杂度和多源ECG特征融合之间保持平衡仍然具有挑战性。最近，状态空间模型（SSM），特别是Mamba，在各个领域都表现出了显着的有效性。其固有的高效计算和线性复杂度设计使其特别适合ECG等低维数据。本文提出了S2M2ECG，一种具有三层融合机制的SSM架构：（1）用于低层信号融合的具有分段标记化的时空双向SSM;（2）用于提高前向和后向识别精度的具有双向扫描的导联内时间信息融合;（3）用于空间信息融合的跨导联特征交互模块。为了充分利用ECG信号中固有的ECG特定多导联机制，采用了多分支设计和导联融合模块，可对每个导联进行单独分析，同时确保与其他导联的无缝集成。实验结果表明，S2M2ECG在节律，形态和临床场景中取得了优异的性能。此外，它的轻量级架构确保它在现有模型中具有几乎最少的参数，使其非常适合高效推理和方便部署。总的来说，S2M2ECG提供了一种有前途的替代方案，在性能、计算复杂性和ECG特定特征之间取得了良好的平衡，为CVD诊断中的高性能、轻量级计算铺平了道路。
摘要：As one of the most effective methods for cardiovascular disease (CVD) diagnosis, multi-lead Electrocardiogram (ECG) signals present a characteristic multi-sensor information fusion challenge that has been continuously researched in deep learning domains. Despite the numerous algorithms proposed with different DL architectures, maintaining a balance among performance, computational complexity, and multi-source ECG feature fusion remains challenging. Recently, state space models (SSMs), particularly Mamba, have demonstrated remarkable effectiveness across various fields. Their inherent design for high-efficiency computation and linear complexity makes them particularly suitable for low-dimensional data like ECGs. This work proposes S2M2ECG, an SSM architecture featuring three-level fusion mechanisms: (1) Spatio-temporal bi-directional SSMs with segment tokenization for low-level signal fusion, (2) Intra-lead temporal information fusion with bi-directional scanning to enhance recognition accuracy in both forward and backward directions, (3) Cross-lead feature interaction modules for spatial information fusion. To fully leverage the ECG-specific multi-lead mechanisms inherent in ECG signals, a multi-branch design and lead fusion modules are incorporated, enabling individual analysis of each lead while ensuring seamless integration with others. Experimental results reveal that S2M2ECG achieves superior performance in the rhythmic, morphological, and clinical scenarios. Moreover, its lightweight architecture ensures it has nearly the fewest parameters among existing models, making it highly suitable for efficient inference and convenient deployment. Collectively, S2M2ECG offers a promising alternative that strikes an excellent balance among performance, computational complexity, and ECG-specific characteristics, paving the way for high-performance, lightweight computations in CVD diagnosis.

【5】Toward a robust lesion detection model in breast DCE-MRI: adapting foundation models to high-risk women
标题：迈向稳健的乳腺DCE-MRI病变检测模型：使基础模型适配高危女性人群
链接：https://arxiv.org/abs/2509.02710

作者：.B. do Nascimento, Vincent Dong, Guilherme J. Cavalcante, Alex Nguyen, Thaís G. do Rêgo, Yuri Malheiros, Telmo M. Silva Filho, Carla R. Zeballos Torrez, James C. Gee, Anne Marie McCarthy, Andrew D. A. Maidment, Bruno Barufaldi
摘要：准确的乳腺MRI病变检测对于早期癌症诊断至关重要，特别是在高危人群中。我们提出了一种分类管道，该管道采用预训练的基础模型Medical Slice Transformer（MST），用于使用动态对比增强MRI（DCE-MRI）进行乳腺病变分类。利用基于DINOv2的自监督预训练，MST生成鲁棒的每片特征嵌入，然后用于训练Kolmogorov-Arnold网络（KAN）分类器。KAN通过自适应B样条激活实现局部非线性变换，为传统卷积网络提供了一种灵活且可解释的替代方案。这增强了模型在不平衡和异质临床数据集中区分良性和恶性病变的能力。实验结果表明，MST+KAN管道优于基线MST分类器，实现AUC = 0.80 ± 0.02，同时通过基于注意力的热图保持可解释性。我们的研究结果强调了基础模型嵌入与高级分类策略相结合的有效性，以构建强大且可推广的乳腺MRI分析工具。
摘要：Accurate breast MRI lesion detection is critical for early cancer diagnosis, especially in high-risk populations. We present a classification pipeline that adapts a pretrained foundation model, the Medical Slice Transformer (MST), for breast lesion classification using dynamic contrast-enhanced MRI (DCE-MRI). Leveraging DINOv2-based self-supervised pretraining, MST generates robust per-slice feature embeddings, which are then used to train a Kolmogorov-Arnold Network (KAN) classifier. The KAN provides a flexible and interpretable alternative to conventional convolutional networks by enabling localized nonlinear transformations via adaptive B-spline activations. This enhances the model's ability to differentiate benign from malignant lesions in imbalanced and heterogeneous clinical datasets. Experimental results demonstrate that the MST+KAN pipeline outperforms the baseline MST classifier, achieving AUC = 0.80 ± 0.02 while preserving interpretability through attention-based heatmaps. Our findings highlight the effectiveness of combining foundation model embeddings with advanced classification strategies for building robust and generalizable breast MRI analysis tools.

【6】Optimizing Prognostic Biomarker Discovery in Pancreatic Cancer Through Hybrid Ensemble Feature Selection and Multi-Omics Data
标题：通过混合集合特征选择和多组学数据优化胰腺癌预后生物标志物发现
链接：https://arxiv.org/abs/2509.02648

作者：las, Anne-Marie George, Alberto López, Sebastian Fischer, Marc Becker, Tero Aittokallio
备注：52 pages, 5 figures, 9 Supplementary Figures, 1 Supplementary Table
摘要：使用高维多组学数据预测患者生存需要系统的特征选择方法，以确保预测性能、稀疏性和预后生物标志物发现的可靠性。我们开发了一种混合集成特征选择（hEFS）方法，该方法将数据子采样与多个预后模型相结合，集成了嵌入式和基于包装器的生存预测策略。组学特征使用投票理论启发的聚合机制跨模型和子样本进行排名，而最佳特征数量通过帕累托前沿选择，在没有任何用户定义阈值的情况下平衡预测准确性和模型稀疏性。当应用于来自三个胰腺癌队列的多组学数据集时，与传统的后期融合CoxLasso模型相比，hEFS识别出显著更少且更稳定的生物标志物，同时保持相当的区分性能。在开源的mlr3fselect R包中实现，hEFS为高维生存环境中的预后建模和生物标志物发现提供了一个强大的，可解释的和有临床价值的工具。
摘要：Prediction of patient survival using high-dimensional multi-omics data requires systematic feature selection methods that ensure predictive performance, sparsity, and reliability for prognostic biomarker discovery. We developed a hybrid ensemble feature selection (hEFS) approach that combines data subsampling with multiple prognostic models, integrating both embedded and wrapper-based strategies for survival prediction. Omics features are ranked using a voting-theory-inspired aggregation mechanism across models and subsamples, while the optimal number of features is selected via a Pareto front, balancing predictive accuracy and model sparsity without any user-defined thresholds. When applied to multi-omics datasets from three pancreatic cancer cohorts, hEFS identifies significantly fewer and more stable biomarkers compared to the conventional, late-fusion CoxLasso models, while maintaining comparable discrimination performance. Implemented within the open-source mlr3fselect R package, hEFS offers a robust, interpretable, and clinically valuable tool for prognostic modelling and biomarker discovery in high-dimensional survival settings.
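
The accuracy-versus-sparsity trade-off mentioned in this abstract can be illustrated by computing a Pareto front over candidate feature counts. This is a generic sketch, not the mlr3fselect implementation; the function name and the candidate scores are invented for illustration, and how hEFS picks a single point from the front is not stated in the abstract.

```python
def pareto_front(candidates):
    """Return the non-dominated (n_features, score) pairs, sorted by n_features.

    A candidate is dominated if another one has no more features and no lower
    score, and is strictly better in at least one of the two.
    """
    front = []
    for n, s in candidates:
        dominated = any(
            (n2 <= n and s2 >= s) and (n2 < n or s2 > s)
            for n2, s2 in candidates
        )
        if not dominated:
            front.append((n, s))
    return sorted(front)


# Hypothetical (number of features, discrimination score) pairs.
candidates = [(5, 0.60), (10, 0.66), (20, 0.66), (40, 0.70), (80, 0.69)]
front = pareto_front(candidates)
```

Here the 20-feature and 80-feature candidates drop out because a smaller model scores at least as well, which is the sense in which no user-defined threshold is needed to discard them.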

推荐(2篇)

【1】Enhancing Interpretability and Effectiveness in Recommendation with Numerical Features via Learning to Contrast the Counterfactual samples
标题：通过学习对比反事实样本来增强具有数字特征的推荐的可解释性和有效性
链接：https://arxiv.org/abs/2509.03187

作者：Xu, Hao Wu, Wenhui Yu, Lantao Hu, Peng Jiang, Kun Gai
备注：Accepted by TheWebConf2024
摘要：我们提出了一个通用的模型不可知的对比学习框架与反事实样本合成（CCSS）建模的神经网络输出和数值特征之间的单调性，这是至关重要的可解释性和有效性的推荐系统。CCSS通过两个阶段的过程来模拟单调性：合成反事实样本和对比反事实样本。这两种技术自然地集成到一个与模型无关的框架中，形成了一个端到端的训练过程。在公开数据集和实际工业数据集上进行了大量的实证检验，结果很好地证明了我们提出的CCSS的有效性。此外，CCSS已经部署在我们真正的大规模工业推荐系统中，成功服务了数亿用户。
摘要：We propose a general model-agnostic Contrastive learning framework with Counterfactual Samples Synthesizing (CCSS) for modeling the monotonicity between the neural network output and numerical features, which is critical for the interpretability and effectiveness of recommender systems. CCSS models the monotonicity via a two-stage process: synthesizing counterfactual samples and contrasting the counterfactual samples. The two techniques are naturally integrated into a model-agnostic framework, forming an end-to-end training process. Abundant empirical tests are conducted on a publicly available dataset and a real industrial dataset, and the results demonstrate the effectiveness of our proposed CCSS. Besides, CCSS has been deployed in our real large-scale industrial recommender, successfully serving hundreds of millions of users.

【2】Efficient Privacy-Preserving Recommendation on Sparse Data using Fully Homomorphic Encryption
标题：利用全同态加密在稀疏数据上实现高效的隐私保护推荐
链接：https://arxiv.org/abs/2509.03024

作者：Nishat Chowdhury, André Bauer, Minxuan Zhou
备注：The paper is accepted at the 21st IEEE International eScience Conference (eScience'25) and will be published soon.
摘要：在当今数据驱动的世界中，推荐系统可以跨行业个性化用户体验，但依赖于敏感数据，从而引发了隐私问题。全同态加密（FHE）可以确保这些系统的安全，但将FHE应用于推荐系统的一个重大挑战是有效地处理固有的大而稀疏的用户项目评级矩阵。FHE操作是计算密集型的，并且在推荐系统中天真地处理各种稀疏矩阵将是非常昂贵的。此外，各方之间的通信开销仍然是加密域中的关键问题。我们提出了一种新的方法相结合的压缩稀疏行（CSR）表示与基于FHE的矩阵分解，有效地处理矩阵稀疏加密域，同时最大限度地减少通信成本。我们的实验结果表明，加密数据的推荐准确率高，同时实现了最低的通信成本，有效地保护用户隐私。
摘要：In today's data-driven world, recommendation systems personalize user experiences across industries but rely on sensitive data, raising privacy concerns. Fully homomorphic encryption (FHE) can secure these systems, but a significant challenge in applying FHE to recommendation systems is efficiently handling the inherently large and sparse user-item rating matrices. FHE operations are computationally intensive, and naively processing various sparse matrices in recommendation systems would be prohibitively expensive. Additionally, the communication overhead between parties remains a critical concern in encrypted domains. We propose a novel approach combining Compressed Sparse Row (CSR) representation with FHE-based matrix factorization that efficiently handles matrix sparsity in the encrypted domain while minimizing communication costs. Our experimental results demonstrate high recommendation accuracy with encrypted data while achieving the lowest communication costs, effectively preserving user privacy.
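
For readers unfamiliar with the Compressed Sparse Row (CSR) layout named in this abstract, the sketch below builds it for a tiny plaintext rating matrix. It shows only the data layout: no encryption is performed, and how the paper maps CSR arrays onto FHE ciphertexts is not described in the abstract. The names `to_csr` and `row_items` are hypothetical.

```python
def to_csr(dense):
    """Convert a dense list-of-lists matrix to CSR arrays (values, col_indices, row_ptr)."""
    values, col_indices, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)       # stored non-zero rating
                col_indices.append(j)  # item (column) index of that rating
        row_ptr.append(len(values))    # where the next user's ratings start
    return values, col_indices, row_ptr


def row_items(values, col_indices, row_ptr, i):
    """Return the (item, rating) pairs stored for user row i."""
    start, end = row_ptr[i], row_ptr[i + 1]
    return list(zip(col_indices[start:end], values[start:end]))


# Three users, four items; zeros mean "not rated".
ratings = [
    [5, 0, 0, 3],
    [0, 0, 0, 0],
    [0, 4, 0, 0],
]
values, cols, row_ptr = to_csr(ratings)
```

Only 3 of the 12 entries are stored, which is the property that makes CSR attractive when every stored value would otherwise incur an expensive encrypted operation.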

聚类(1篇)

【1】Cluster and then Embed: A Modular Approach for Visualization
标题：先聚类再嵌入：一种模块化的可视化方法
链接：https://arxiv.org/abs/2509.03373

作者： Coda, Ery Arias-Castro, Gal Mishne
摘要：t-SNE和UMAP等降维方法是可视化具有潜在聚类结构的数据的常用方法。众所周知，它们在嵌入数据点的同时对其进行分组，从而产生簇间分离良好、局部信息保留较好的可视化结果。然而，t-SNE和UMAP也往往会扭曲底层数据的全局几何结构。我们提出了一种更透明的模块化方法：首先对数据进行聚类，然后嵌入每个簇，最后对齐各簇以获得全局嵌入。我们在多个合成和真实世界数据集上展示了这种方法，并表明它与现有方法相比具有竞争力，同时透明得多。
摘要：Dimensionality reduction methods such as t-SNE and UMAP are popular methods for visualizing data with a potential (latent) clustered structure. They are known to group data points at the same time as they embed them, resulting in visualizations with well-separated clusters that preserve local information well. However, t-SNE and UMAP also tend to distort the global geometry of the underlying data. We propose a more transparent, modular approach consisting of first clustering the data, then embedding each cluster, and finally aligning the clusters to obtain a global embedding. We demonstrate this approach on several synthetic and real-world datasets and show that it is competitive with existing methods, while being much more transparent.

点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】Multimodal learning of melt pool dynamics in laser powder bed fusion
标题：激光粉末床熔融中熔池动力学的多模态学习
链接：https://arxiv.org/abs/2509.03029

作者：Mojumder, Pallock Halder, Tiana Tonge
备注：20 pages, 6 figures, 1 table
摘要：虽然增材制造中使用多种传感器进行实时监控，但并非所有传感器都能提供实用或可靠的过程信息。例如，高速X射线成像提供了关于表面下熔池行为的有价值的空间信息，但对大多数工业环境来说成本高昂且不切实际。相比之下，来自低成本光电二极管的吸收率数据与熔池动态相关，但单独使用时往往噪声过大，无法准确预测。本文提出一种多模态数据融合方法，在激光粉末床熔融（LPBF）过程中将高保真X射线数据与低保真吸收率数据相结合来预测熔池动态。我们的多模态学习框架采用早期融合策略，将用于从X射线数据中提取空间特征的卷积神经网络（CNN）与用于从吸收率信号中提取时间特征的递归神经网络（RNN）集成在一起。多模态模型还被用作迁移学习模型来微调RNN模型，该RNN模型仅凭吸收率即可预测熔池动态，且准确性高于多模态模型。结果表明，与单独使用任何一种模态相比，使用两种模态进行训练显著提高了预测准确性。此外，一旦训练完成，该模型可以仅使用吸收率数据来推断熔池特性，从而无需昂贵的X射线成像。这种多模态融合方法能够实现低成本的实时监控，并在增材制造中具有广泛的适用性。
摘要：While multiple sensors are used for real-time monitoring in additive manufacturing, not all provide practical or reliable process insights. For example, high-speed X-ray imaging offers valuable spatial information about subsurface melt pool behavior but is costly and impractical for most industrial settings. In contrast, absorptivity data from low-cost photodiodes correlate with melt pool dynamics but is often too noisy for accurate prediction when used alone. In this paper, we propose a multimodal data fusion approach for predicting melt pool dynamics by combining high-fidelity X-ray data with low-fidelity absorptivity data in the Laser Powder Bed Fusion (LPBF) process. Our multimodal learning framework integrates convolutional neural networks (CNNs) for spatial feature extraction from X-ray data with recurrent neural networks (RNNs) for temporal feature extraction from absorptivity signals, using an early fusion strategy. The multimodal model is further used as a transfer learning model to fine-tune the RNN model that can predict melt pool dynamics only with absorptivity, with greater accuracy compared to the multimodal model. Results show that training with both modalities significantly improves prediction accuracy compared to using either modality alone. Furthermore, once trained, the model can infer melt pool characteristics using only absorptivity data, eliminating the need for expensive X-ray imaging. This multimodal fusion approach enables cost-effective, real-time monitoring and has broad applicability in additive manufacturing.

联邦学习|隐私保护|加密(1篇)

【1】Delayed Momentum Aggregation: Communication-efficient Byzantine-robust Federated Learning with Partial Participation
标题：延迟动量聚合：具有部分参与的高效通信的拜占庭稳健联邦学习
链接：https://arxiv.org/abs/2509.02970

作者：uka, Yuki Takezawa, Makoto Yamada
摘要：联邦学习（FL）允许跨多个客户端进行分布式模型训练，同时保护数据隐私，但它仍然容易受到表现出恶意行为的拜占庭客户端的攻击。虽然现有的拜占庭鲁棒FL方法在拜占庭攻击下提供了强收敛保证（例如，在期望意义下收敛到驻点），但它们通常假设所有客户端完全参与，而由于通信约束和客户端可用性，这是不现实的。在部分参与下，一旦采样到的客户端中拜占庭客户端占多数，现有方法就会立即失效，这给稀疏通信带来了根本性的挑战。首先，我们引入了延迟动量聚合，这是一种新的原理：服务器将最近一次从未参与客户端接收到的梯度与来自活跃客户端的最新动量一起聚合。我们的优化器D-Byz-SGDM（Delayed Byzantine-robust SGD with Momentum）针对部分参与的拜占庭鲁棒FL实现了这一延迟动量聚合原理。然后，我们建立了收敛保证，它既能恢复此前完全参与情形下的结果，又与我们为部分参与设置证明的基本下界相匹配。深度学习任务上的实验验证了我们的理论发现，在各种拜占庭攻击下表现出稳定而稳健的训练。
摘要：Federated Learning (FL) allows distributed model training across multiple clients while preserving data privacy, but it remains vulnerable to Byzantine clients that exhibit malicious behavior. While existing Byzantine-robust FL methods provide strong convergence guarantees (e.g., to a stationary point in expectation) under Byzantine attacks, they typically assume full client participation, which is unrealistic due to communication constraints and client availability. Under partial participation, existing methods fail immediately after the sampled clients contain a Byzantine majority, creating a fundamental challenge for sparse communication. First, we introduce delayed momentum aggregation, a novel principle where the server aggregates the most recently received gradients from non-participating clients alongside fresh momentum from active clients. Our optimizer D-Byz-SGDM (Delayed Byzantine-robust SGD with Momentum) implements this delayed momentum aggregation principle for Byzantine-robust FL with partial participation. Then, we establish convergence guarantees that recover previous full participation results and match the fundamental lower bounds we prove for the partial participation setting. Experiments on deep learning tasks validated our theoretical findings, showing stable and robust training under various Byzantine attacks.
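
The idea of reusing stale client messages alongside fresh ones can be sketched as a server-side buffer. This is a toy illustration under loud assumptions, not D-Byz-SGDM: the class and method names are hypothetical, clients are assumed to send a ready-made update vector, and the coordinate-wise median stands in for whatever robust aggregator the paper actually uses.

```python
from statistics import median


class DelayedAggregationServer:
    """Keeps the most recent vector from every client and aggregates over all of them."""

    def __init__(self, num_clients):
        # None means the client has never reported.
        self.latest = [None] * num_clients

    def receive(self, client_id, vector):
        # Fresh message from an active client overwrites its stale entry.
        self.latest[client_id] = list(vector)

    def aggregate(self):
        # Stale entries of non-participating clients are reused as-is.
        known = [v for v in self.latest if v is not None]
        # Assumed robust aggregator: coordinate-wise median.
        return [median(column) for column in zip(*known)]


server = DelayedAggregationServer(num_clients=3)
# Round 1: everyone participates; client 2 is Byzantine.
server.receive(0, [1.0, 1.0])
server.receive(1, [1.0, 3.0])
server.receive(2, [100.0, -100.0])
# Round 2: only clients 0 and 2 are sampled, so half the sample is Byzantine.
server.receive(0, [2.0, 2.0])
server.receive(2, [100.0, -100.0])
delayed = server.aggregate()

# Aggregating over the sampled clients alone, for comparison.
sampled_only = [median(column) for column in zip([2.0, 2.0], [100.0, -100.0])]
```

In this toy round the buffered entry from client 1 keeps the honest clients in the majority, whereas aggregating over the two sampled clients alone is pulled far toward the Byzantine vector.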

推理|分析|理解|解释(10篇)

【1】LINKER: Learning Interactions Between Functional Groups and Residues With Chemical Knowledge-Enhanced Reasoning and Explainability
标题：LINKER：利用化学知识增强的推理和可解释性学习官能团与残基之间的相互作用
链接：https://arxiv.org/abs/2509.03425

作者：, Viet Thanh Duy Nguyen, Truong-Son Hy
摘要：蛋白质残基和配体官能团之间相互作用的准确鉴定对于理解分子识别和指导合理的药物设计是必不可少的。现有的用于蛋白质-配体可解释性的深度学习方法通常依赖于3D结构输入或使用基于距离的接触标签，这限制了它们的适用性和生物相关性。我们介绍LINKER，第一个基于序列的模型来预测残基官能团相互作用的生物定义的相互作用类型，仅使用蛋白质序列和配体SMILES作为输入。LINKER使用结构监督注意力进行训练，其中相互作用标签通过基于官能团的基序提取来自3D蛋白质-配体复合物。通过将配体结构抽象为官能团，该模型专注于化学上有意义的子结构，同时预测相互作用类型，而不仅仅是空间接近度。至关重要的是，LINKER在推理时只需要序列级的输入，从而能够在结构数据不可用的情况下实现大规模应用。在LP-PDBBind基准上的实验表明，对官能团抽象的结构知情监督产生与地面实况生化注释密切一致的相互作用预测。
摘要：Accurate identification of interactions between protein residues and ligand functional groups is essential to understand molecular recognition and guide rational drug design. Existing deep learning approaches for protein-ligand interpretability often rely on 3D structural input or use distance-based contact labels, limiting both their applicability and biological relevance. We introduce LINKER, the first sequence-based model to predict residue-functional group interactions in terms of biologically defined interaction types, using only protein sequences and the ligand SMILES as input. LINKER is trained with structure-supervised attention, where interaction labels are derived from 3D protein-ligand complexes via functional group-based motif extraction. By abstracting ligand structures into functional groups, the model focuses on chemically meaningful substructures while predicting interaction types rather than mere spatial proximity. Crucially, LINKER requires only sequence-level input at inference time, enabling large-scale application in settings where structural data is unavailable. Experiments on the LP-PDBBind benchmark demonstrate that structure-informed supervision over functional group abstractions yields interaction predictions closely aligned with ground-truth biochemical annotations.

【2】Some patterns of sleep quality and Daylight Saving Time across countries: a predictive and exploratory analysis
标题：各国睡眠质量和夏令时的一些模式：预测和探索性分析
链接：https://arxiv.org/abs/2509.03358

作者：rma, Eugene Pinsky
备注：16 Pages
摘要：在这项研究中，我们分析了61个国家的平均睡眠时间，以研究夏令时（DST）实践的影响。确定了影响睡眠的关键指标，并应用统计相关分析来探讨这些因素之间的关系。根据DST观察对国家进行分组，并将DST和非DST地区之间的睡眠模式进行可视化比较。结果显示，平均而言，观察DST的国家往往比那些没有观察DST的国家报告更长的睡眠时间。当考虑纬度时，出现了一个更详细的模式：在低纬度，与非DST国家相比，DST观察国家报告的睡眠时间较短，而在高纬度，DST观察国家报告的平均睡眠时间较长。这些发现表明，DST对睡眠的影响可能会受到地理位置的调节。
摘要：In this study we analyzed average sleep durations across 61 countries to examine the impact of Daylight Saving Time (DST) practices. Key metrics influencing sleep were identified, and statistical correlation analysis was applied to explore relationships among these factors. Countries were grouped based on DST observance, and visualizations compared sleep patterns between DST and non-DST regions. Results show that, on average, countries observing DST tend to report longer sleep durations than those that do not. A more detailed pattern emerged when accounting for latitude: at lower latitudes, DST-observing countries reported shorter sleep durations compared to non-DST countries, while at higher latitudes, DST-observing countries reported longer average sleep durations. These findings suggest that the influence of DST on sleep may be moderated by geographical location.

【3】Structure Transfer: an Inference-Based Calculus for the Transformation of Representations
标题：结构转移：一种基于推理的表示转换演算
链接：https://arxiv.org/abs/2509.03249

作者：ggi, Gem Stapleton, Mateja Jamnik, Aaron Stockdill, Grecia Garcia Garcia, Peter C-H. Cheng
摘要：表示的选择对于我们有效沟通和推理的能力至关重要。本文要解决的一个主要未决问题是，如何设计与表示系统（RS）无关的技术来驱动表示的转换和选择。我们提出了一种称为结构转移的新型演算，可以实现不同RS之间的表示转换。具体来说，给定取自源RS的源表示，结构转移规则允许我们为目标RS生成目标表示。结构转移的通用性部分来自于它能够确保源表示和生成的目标表示满足任何指定的关系（如语义等价）。这是通过利用对RS知识进行编码的模式（schema）来完成的。具体来说，模式可以表达任意一对RS之间跨关系的信息保持，结构转移利用这些知识推导出目标表示的结构，以确保所需的关系成立。我们使用表示系统理论对此进行形式化，其基础是构造空间这一关键概念。构造空间的抽象性使其具有对不同类型RS建模的通用性，包括形式语言、几何图形和图表，以及非正式记号。因此，结构转移是一种与系统无关的演算，可用于在广泛的实际场景中识别替代表示。
摘要：Representation choice is of fundamental importance to our ability to communicate and reason effectively. A major unsolved problem, addressed in this paper, is how to devise representational-system (RS) agnostic techniques that drive representation transformation and choice. We present a novel calculus, called structure transfer, that enables representation transformation across diverse RSs. Specifically, given a source representation drawn from a source RS, the rules of structure transfer allow us to generate a target representation for a target RS. The generality of structure transfer comes in part from its ability to ensure that the source representation and the generated target representation satisfy any specified relation (such as semantic equivalence). This is done by exploiting schemas, which encode knowledge about RSs. Specifically, schemas can express preservation of information across relations between any pair of RSs, and this knowledge is used by structure transfer to derive a structure for the target representation which ensures that the desired relation holds. We formalise this using Representational Systems Theory [raggi2022rst], building on the key concept of a construction space. The abstract nature of construction spaces grants them the generality to model RSs of diverse kinds, including formal languages, geometric figures and diagrams, as well as informal notations. Consequently, structure transfer is a system-agnostic calculus that can be used to identify alternative representations in a wide range of practical settings.

【4】Rashomon in the Streets: Explanation Ambiguity in Scene Understanding
标题：街头罗生门：场景理解中的解释模糊性
链接：https://arxiv.org/abs/2509.03169

作者：eker, Jørn Eirik Betten, Arnaud Gotlieb, Nadjib Lazaar, Nassim Belmecheri
备注：AAAI 2025 Fall Symposium: AI Trustworthiness and Risk Assessment for Challenged Contexts (ATRACC)
摘要：可解释AI（XAI）对于在自动驾驶等安全关键型应用中验证和信任模型至关重要。然而，XAI的可靠性受到罗生门效应的挑战，在罗生门效应中，多个同样准确的模型可以为同一预测提供不同的解释。本文提供了第一个经验量化的这种效果的任务，在现实世界的驾驶场景中的动作预测。使用定性可解释图（QXGs）作为符号场景表示，我们训练罗生门集的两个不同的模型类：可解释的，基于对的梯度提升模型和复杂的，基于图的图神经网络（GNNs）。使用特征归因方法，我们测量这些类内和之间的解释的一致性。我们的研究结果揭示了显着的解释分歧。我们的研究结果表明，解释歧义是一个固有的属性的问题，而不仅仅是一个建模工件。
摘要：Explainable AI (XAI) is essential for validating and trusting models in safety-critical applications like autonomous driving. However, the reliability of XAI is challenged by the Rashomon effect, where multiple, equally accurate models can offer divergent explanations for the same prediction. This paper provides the first empirical quantification of this effect for the task of action prediction in real-world driving scenes. Using Qualitative Explainable Graphs (QXGs) as a symbolic scene representation, we train Rashomon sets of two distinct model classes: interpretable, pair-based gradient boosting models and complex, graph-based Graph Neural Networks (GNNs). Using feature attribution methods, we measure the agreement of explanations both within and between these classes. Our results reveal significant explanation disagreement. Our findings suggest that explanation ambiguity is an inherent property of the problem, not just a modeling artifact.
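
One way to picture "agreement of explanations" is to compare which features two models rank as most important for the same prediction. The sketch below is a minimal illustration, assuming top-k Jaccard overlap as the agreement score; the abstract does not say which agreement measure the authors use, and the function name and attribution values are hypothetical.

```python
def top_k_agreement(attr_a, attr_b, k):
    """Jaccard overlap of the k features with the largest |attribution| in each model."""
    def top_k(attr):
        ranked = sorted(range(len(attr)), key=lambda i: abs(attr[i]), reverse=True)
        return set(ranked[:k])

    a, b = top_k(attr_a), top_k(attr_b)
    return len(a & b) / len(a | b)


# Hypothetical attributions from two equally accurate models for one prediction.
model_1 = [0.50, 0.30, 0.05, 0.10, 0.05]
model_2 = [0.05, 0.45, 0.40, 0.05, 0.05]
score = top_k_agreement(model_1, model_2, k=2)
```

A score of 1.0 means both models point at the same top features; values well below 1.0 across a Rashomon set would indicate the kind of disagreement the abstract reports.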

【5】TRELLIS-Enhanced Surface Features for Comprehensive Intracranial Aneurysm Analysis
标题：用于全面颅内动脉瘤分析的TRELLIS增强表面特征
链接：https://arxiv.org/abs/2509.03095

作者：ervé, Paul Garnier, Jonathan Viquerat, Elie Hachem
摘要：颅内动脉瘤具有显著的临床风险，但由于带注释的3D数据有限，难以检测、勾画和建模。我们提出了一种跨域特征迁移方法，利用TRELLIS（一种在大规模非医疗3D数据集上训练的生成模型）学习到的潜在几何嵌入来增强用于动脉瘤分析的神经网络。通过用TRELLIS表面特征替换传统的点法线或网格描述符，我们系统地增强了三个下游任务：（i）在Intra3D数据集中对动脉瘤与健康血管进行分类，（ii）在3D网格上分割动脉瘤和血管区域，以及（iii）在AnXplore数据集上使用图神经网络预测随时间演变的血流场。实验表明，加入这些特征后，准确率、F1分数和分割质量相对于最先进的基线均有明显提升，模拟误差降低了15%。这些结果说明了将3D表示从通用生成模型迁移到专业医疗任务的更广泛潜力。
摘要：Intracranial aneurysms pose a significant clinical risk yet are difficult to detect, delineate and model due to limited annotated 3D data. We propose a cross-domain feature-transfer approach that leverages the latent geometric embeddings learned by TRELLIS, a generative model trained on large-scale non-medical 3D datasets, to augment neural networks for aneurysm analysis. By replacing conventional point normals or mesh descriptors with TRELLIS surface features, we systematically enhance three downstream tasks: (i) classifying aneurysms versus healthy vessels in the Intra3D dataset, (ii) segmenting aneurysm and vessel regions on 3D meshes, and (iii) predicting time-evolving blood-flow fields using a graph neural network on the AnXplore dataset. Our experiments show that the inclusion of these features yields strong gains in accuracy, F1-score and segmentation quality over state-of-the-art baselines, and reduces simulation error by 15%. These results illustrate the broader potential of transferring 3D representations from general-purpose generative models to specialized medical tasks.

【6】Lattice Annotated Temporal (LAT) Logic for Non-Markovian Reasoning
标题：用于非马尔科夫推理的格注释时态（LAT）逻辑
链接：https://arxiv.org/abs/2509.02958

作者：ukherji, Jaikrishna Manojkumar Patil, Dyuman Aditya, Paulo Shakarian, Devendra Parkar, Lahari Pokala, Clark Dorman, Gerardo I. Simari
摘要：我们提出格注释时态（LAT）逻辑，它是广义注释逻辑程序（GAP）的扩展，引入了时态推理，并通过使用下格结构支持开放世界语义。该逻辑将高效的演绎过程与时态逻辑编程相结合，以支持非马尔可夫关系和开放世界推理能力。开放世界特性是使用下格注释结构的副产品，它允许通过Skolem化过程进行高效的实例化（grounding），即使在常量无限或高度多样的域中也是如此。我们给出了一组理论结果，界定了实例化过程的计算复杂度，并证明关于GAP（使用上格）的许多结果在下格和时态扩展下仍然成立（尽管需要不同的证明技术）。我们的开源实现PyReason具有模块化设计、机器级优化以及与强化学习环境的直接集成。在多智能体模拟和知识图谱任务上的实证评估表明，在保持或提高任务性能的同时，速度最多提高三个数量级，内存最多减少五个数量级。此外，我们还评估了LAT逻辑在强化学习环境中作为非马尔可夫模拟器的价值：模拟速度最多提高三个数量级，智能体性能也有所提升，其中由于捕获了更丰富的时间依赖性，胜率提高了26%。这些结果凸显了LAT逻辑作为动态、不确定环境中开放世界时态推理的统一、可扩展框架的潜力。我们的实现可在pyreason.syracuse.edu获取。
摘要：We introduce Lattice Annotated Temporal (LAT) Logic, an extension of Generalized Annotated Logic Programs (GAPs) that incorporates temporal reasoning and supports open-world semantics through the use of a lower lattice structure. This logic combines an efficient deduction process with temporal logic programming to support non-Markovian relationships and open-world reasoning capabilities. The open-world aspect, a by-product of the use of the lower-lattice annotation structure, allows for efficient grounding through a Skolemization process, even in domains with infinite or highly diverse constants. We provide a suite of theoretical results that bound the computational complexity of the grounding process, in addition to showing that many of the results on GAPs (using an upper lattice) still hold with the lower lattice and temporal extensions (though different proof techniques are required). Our open-source implementation, PyReason, features modular design, machine-level optimizations, and direct integration with reinforcement learning environments. Empirical evaluations across multi-agent simulations and knowledge graph tasks demonstrate up to three orders of magnitude speedup and up to five orders of magnitude memory reduction while maintaining or improving task performance. Additionally, we evaluate LAT Logic's value in reinforcement learning environments as a non-Markovian simulator, achieving up to three orders of magnitude faster simulation with improved agent performance, including a 26% increase in win rate due to capturing richer temporal dependencies. These results highlight LAT Logic's potential as a unified, extensible framework for open-world temporal reasoning in dynamic and uncertain environments. Our implementation is available at: pyreason.syracuse.edu.

【7】Improving Generative Methods for Causal Evaluation via Simulation-Based Inference
标题：通过基于模拟的推理改进因果评价的生成方法
链接：https://arxiv.org/abs/2509.02892

作者：Amaranath, Vinitra Muralikrishnan, Amit Sharma, David D. Jensen
备注：12 pages main text, 48 pages total
摘要：生成准确反映真实世界观测数据的合成数据集对于评估因果估计量至关重要，但仍然是一项具有挑战性的任务。现有的生成方法提供了一种解决方案，通过生成锚定在观察数据（源数据）中的合成数据集，同时允许关键参数（如治疗效果和混杂偏倚量）的变化。然而，现有方法通常要求用户提供此类参数的点估计（而不是分布）和固定估计（而不是可以参考源数据改进的估计）。这使用户无法表达参数值的不确定性，并消除了后验推断的可能性，可能导致不可靠的估计量比较。我们引入了基于模拟的因果评估推理（SBICE），这是一个框架，它将生成参数建模为不确定的，并在给定源数据集的情况下推断其后验分布。利用基于模拟的推理技术，SBICE确定了产生与源数据分布密切一致的合成数据集的参数配置。实证结果表明，SBICE通过生成更真实的数据集，支持一个强大的和数据一致的方法，因果基准不确定性下，提高了估计评估的可靠性。
摘要：Generating synthetic datasets that accurately reflect real-world observational data is critical for evaluating causal estimators, but remains a challenging task. Existing generative methods offer a solution by producing synthetic datasets anchored in the observed data (source data) while allowing variation in key parameters such as the treatment effect and amount of confounding bias. However, existing methods typically require users to provide point estimates of such parameters (rather than distributions) and fixed estimates (rather than estimates that can be improved with reference to the source data). This denies users the ability to express uncertainty over parameter values and removes the potential for posterior inference, potentially leading to unreliable estimator comparisons. We introduce simulation-based inference for causal evaluation (SBICE), a framework that models generative parameters as uncertain and infers their posterior distribution given a source dataset. Leveraging techniques in simulation-based inference, SBICE identifies parameter configurations that produce synthetic datasets closely aligned with the source data distribution. Empirical results demonstrate that SBICE improves the reliability of estimator evaluations by generating more realistic datasets, which supports a robust and data-consistent approach to causal benchmarking under uncertainty.

【8】Towards Reasoning for PDE Foundation Models: A Reward-Model-Driven Inference-Time-Scaling Algorithm
标题：迈向PDE基础模型的推理：奖励模型驱动的推理时间缩放算法
链接：https://arxiv.org/abs/2509.02846

作者： Mansingh, James Amarel, Ragib Arnab, Arvind Mohan, Kamaljeet Singh, Gerd J. Kunde, Nicolas Hengartner, Benjamin Migliori, Emily Casleton, Nathan A. Debarledeben, Ayan Biswas, Diane Oyen, Earl Lawrence
摘要：偏微分方程（PDE）是现代计算科学和工程的基石，其求解本身计算代价高昂。虽然PDE基础模型在模拟这类复杂时空现象方面展现出很大潜力，但现有模型仍受预训练数据集的限制，并且在自回归推演（rollout）性能上表现不佳，尤其是在分布外（OOD）情况下。此外，它们对计算资源和训练数据的需求很大，阻碍了其在许多关键应用中的使用。受大型语言模型（LLM）中“思考”策略最新进展的启发，我们提出了首个面向PDE的测试时计算（TTC）策略，该策略在推理过程中利用计算资源，以更少的训练样本和更小的模型实现更准确的预测。我们通过两种奖励模型实现这一点，它们评估基于随机的模型所给预测的时空一致性。我们在PDEGym基准的可压缩欧拉方程模拟上演示了该方法，并表明相对于标准的非自适应自回归推理，TTC能得到更好的预测。该TTC框架标志着朝着更先进的推理算法或PDE建模迈出了基础性的一步，包括构建基于强化学习的方法，有望改变物理和工程中的计算工作流程。
摘要：Partial Differential Equations (PDEs) are the bedrock for modern computational sciences and engineering, and inherently computationally expensive. While PDE foundation models have shown much promise for simulating such complex spatio-temporal phenomena, existing models remain constrained by the pretraining datasets and struggle with auto-regressive rollout performance, especially in out-of-distribution (OOD) cases. Furthermore, they have significant compute and training data requirements which hamper their use in many critical applications. Inspired by recent advances in "thinking" strategies used in large language models (LLMs), we introduce the first test-time computing (TTC) strategy for PDEs that utilizes computational resources during inference to achieve more accurate predictions with fewer training samples and smaller models. We accomplish this with two types of reward models that evaluate predictions of a stochastic-based model for spatio-temporal consistency. We demonstrate this method on compressible Euler-equation simulations from the PDEGym benchmark and show that TTC captures improved predictions relative to standard non-adaptive auto-regressive inference. This TTC framework marks a foundational step towards more advanced reasoning algorithms or PDE modeling, including building reinforcement-learning-based approaches, potentially transforming computational workflows in physics and engineering.

【9】Ensemble Learning for Healthcare: A Comparative Analysis of Hybrid Voting and Ensemble Stacking in Obesity Risk Prediction
标题：面向医疗保健的集成学习：肥胖风险预测中混合投票与集成堆叠的比较分析
链接：https://arxiv.org/abs/2509.02826

作者：Islam, Md Sumon Ali
备注：26 pages, 3 figures, 16 tables
摘要：肥胖是由饮食、生理和环境因素驱动的关键全球健康问题，并且与慢性疾病如糖尿病、心血管疾病和癌症密切相关。机器学习已经成为早期肥胖风险预测的一种有前途的方法，但对集成技术的比较评估-特别是混合多数投票和集成堆叠-仍然有限。这项研究的目的是比较混合多数投票和集成叠加方法预测肥胖风险，确定哪种方法提供更高的准确性和效率。该分析旨在突出这些集成技术在指导医疗保健应用更好的预测模型选择方面的互补优势。利用两个数据集来评估三个集成模型：多数硬投票，加权硬投票和堆叠（多层感知器作为元分类器）。分析了在总共50个超参数配置中评估的9个机器学习（ML）算法，以确定前三个模型作为集成方法的基础学习者。预处理步骤涉及数据集平衡和离群值检测，并使用准确度和F1分数评估模型性能。在Dataset-1上，加权硬投票和堆叠实现了几乎相同的性能（准确度：0.920304，F1：0.920070），优于多数硬投票。在Dataset-2上，与多数硬投票（准确度：0.981707，F1：0.981675）和加权硬投票（表现出最低的性能）相比，堆叠显示出更好的结果（准确度：0.989837，F1：0.989825）。研究结果证实，集成堆叠提供了更强的预测能力，特别是对于复杂的数据分布，而混合多数表决仍然是一个强大的替代方案。
摘要：Obesity is a critical global health issue driven by dietary, physiological, and environmental factors, and is strongly associated with chronic diseases such as diabetes, cardiovascular disorders, and cancer. Machine learning has emerged as a promising approach for early obesity risk prediction, yet a comparative evaluation of ensemble techniques -- particularly hybrid majority voting and ensemble stacking -- remains limited. This study aims to compare hybrid majority voting and ensemble stacking methods for obesity risk prediction, identifying which approach delivers higher accuracy and efficiency. The analysis seeks to highlight the complementary strengths of these ensemble techniques in guiding better predictive model selection for healthcare applications. Two datasets were utilized to evaluate three ensemble models: Majority Hard Voting, Weighted Hard Voting, and Stacking (with a Multi-Layer Perceptron as meta-classifier). A pool of nine Machine Learning (ML) algorithms, evaluated across a total of 50 hyperparameter configurations, was analyzed to identify the top three models to serve as base learners for the ensemble methods. Preprocessing steps involved dataset balancing, and outlier detection, and model performance was evaluated using Accuracy and F1-Score. On Dataset-1, weighted hard voting and stacking achieved nearly identical performance (Accuracy: 0.920304, F1: 0.920070), outperforming majority hard voting. On Dataset-2, stacking demonstrated superior results (Accuracy: 0.989837, F1: 0.989825) compared to majority hard voting (Accuracy: 0.981707, F1: 0.981675) and weighted hard voting, which showed the lowest performance. The findings confirm that ensemble stacking provides stronger predictive capability, particularly for complex data distributions, while hybrid majority voting remains a robust alternative.
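
The two voting schemes named in the abstract differ only in how base-learner votes are counted. The sketch below is a minimal illustration in plain Python, assuming three base learners with hypothetical labels and weights; it is not the authors' pipeline, and stacking (which would instead train a meta-classifier on the base predictions) is not shown.

```python
from collections import Counter, defaultdict


def majority_hard_vote(predictions):
    """Return the label predicted by the largest number of base learners."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_hard_vote(predictions, weights):
    """Return the label whose supporting learners carry the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical predictions for one sample and hypothetical learner weights.
preds = ["obese", "normal", "normal"]
weights = [0.6, 0.25, 0.15]
majority = majority_hard_vote(preds)
weighted = weighted_hard_vote(preds, weights)
```

With these illustrative numbers the two schemes disagree: two of three learners say "normal", but the single heavily weighted learner outvotes them under weighted voting.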

【10】Understanding and Improving the Shampoo Optimizer via Kullback-Leibler Minimization
标题：通过Kullback-Leibler最小化理解和改进Shampoo优化器
链接：https://arxiv.org/abs/2509.03378

作者：cott C. Lowe, Felix Dangel, Runa Eschenhagen, Zikun Xu, Roger B. Grosse
备注：technical report, work in progress
摘要：Shampoo作为一种自适应方法，采用结构化的二阶矩估计，其有效性越来越受到关注。先前的工作主要通过Frobenius范数分析其估计方案。受二阶矩与协方差矩阵之间自然联系的启发，我们提出从Kullback-Leibler（KL）最小化的视角，将Shampoo的估计作为协方差估计来研究。这一视角揭示了一个此前未被发现的局限性，从而启发了对Shampoo设计的改进。基于这一认识，我们开发了一种实用的估计方案，称为KL-Shampoo，它消除了Shampoo对Adam稳定化的依赖，从而去除了Adam引入的额外内存开销。初步结果表明，KL-Shampoo提高了Shampoo的性能，使其无需Adam即可保持稳定，甚至在神经网络预训练中优于其Adam稳定化的变体SOAP。
摘要：As an adaptive method, Shampoo employs a structured second-moment estimation, and its effectiveness has attracted growing attention. Prior work has primarily analyzed its estimation scheme through the Frobenius norm. Motivated by the natural connection between the second moment and a covariance matrix, we propose studying Shampoo's estimation as covariance estimation through the lens of Kullback-Leibler (KL) minimization. This alternative perspective reveals a previously hidden limitation, motivating improvements to Shampoo's design. Building on this insight, we develop a practical estimation scheme, termed KL-Shampoo, that eliminates Shampoo's reliance on Adam for stabilization, thereby removing the additional memory overhead introduced by Adam. Preliminary results show that KL-Shampoo improves Shampoo's performance, enabling it to stabilize without Adam and even outperform its Adam-stabilized variant, SOAP, in neural network pretraining.
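
For reference, the covariance-estimation view rests on the KL divergence between two zero-mean Gaussians, which has a standard closed form. The identity below is textbook material; how KL-Shampoo applies it to Shampoo's structured estimate is not spelled out in the abstract, so the symbols $A$, $B$ and $d$ are illustrative rather than the authors' notation.

```latex
\mathrm{KL}\big(\mathcal{N}(0, A)\,\|\,\mathcal{N}(0, B)\big)
  = \tfrac{1}{2}\Big[\operatorname{tr}\!\big(B^{-1}A\big) - d
      + \log\det B - \log\det A\Big],
\qquad A, B \in \mathbb{R}^{d \times d} \text{ symmetric positive definite.}
```

Unlike a Frobenius-norm distance between $A$ and $B$, this quantity is asymmetric and is only finite when both matrices are positive definite.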

检测相关(4篇)

【1】Machine Learning-Driven Anomaly Detection for 5G O-RAN Performance Metrics
标题：机器学习驱动的5G O-RAN性能指标异常检测
链接：https://arxiv.org/abs/2509.03290

作者：aei, Kishor Chandra Joshi, George Exarchakos
摘要：关键服务对网络基础设施的依赖日益增加，加上超5G/6G网络运营复杂性的上升，使得主动和自动化的网络故障管理成为必需。开放RAN（O-RAN）规范提供了不同无线电接入网络（RAN）元件之间的开放接口，并将AI/ML集成到网络架构中，这为主动的网络健康监测和异常检测带来了新的可能性。在本文中，我们利用这些优势开发了一个异常检测框架，主动检测UE可能出现的吞吐量下降，并尽量减少切换后的故障。我们提出了两种面向实际部署的可操作异常检测算法。第一种算法通过分析资源块利用率和信号质量度量等关键性能指标（KPI）来识别面临严重吞吐量劣化风险的用户设备（UE），从而实现主动的切换发起。第二种算法评估相邻小区的无线电覆盖质量，过滤掉信号强度或干扰水平异常的小区，使切换的候选目标平均减少41.27%。这些方法共同减轻了切换后故障和吞吐量下降，同时运行速度远快于近实时延迟约束的要求，为自我修复的6G网络铺平了道路。
摘要：The ever-increasing reliance of critical services on network infrastructure, coupled with the increased operational complexity of beyond-5G/6G networks, necessitates proactive and automated network fault management. The provision for open interfaces among different radio access network (RAN) elements and the integration of AI/ML into network architecture enabled by the Open RAN (O-RAN) specifications bring new possibilities for active network health monitoring and anomaly detection. In this paper we leverage these advantages and develop an anomaly detection framework that proactively detects possible throughput drops for a UE and minimizes post-handover failures. We propose two actionable anomaly detection algorithms tailored for real-world deployment. The first algorithm identifies user equipment (UE) at risk of severe throughput degradation by analyzing key performance indicators (KPIs) such as resource block utilization and signal quality metrics, enabling proactive handover initiation. The second algorithm evaluates neighbor cell radio coverage quality, filtering out cells with anomalous signal strength or interference levels. This reduces candidate targets for handover by 41.27% on average. Together, these methods mitigate post-handover failures and throughput drops while operating much faster than the near-real-time latency constraints. This paves the way for self-healing 6G networks.

【2】Evaluation of Stress Detection as Time Series Events -- A Novel Window-Based F1-Metric
标题：作为时间序列事件的压力检测评估--一种新型基于窗口的F1指标
链接：https://arxiv.org/abs/2509.03240

作者：lhelm Skat-Rørdam, Sneha Das, Kathrine Sofie Rasmussen, Nicole Nadine Lønfeldt, Line Clemmensen
备注：15 pages, 6 figures
摘要：准确评估时间序列中的事件检测对于可穿戴设备压力监测等应用至关重要，在这些应用中，真实标签通常被标注为单点事件，尽管潜在的现象是渐进的、在时间上弥散的。F1和点调整F1（F1$_{pa}$）等标准指标在这类真实世界的不平衡数据集中常常不能如实反映模型性能。我们引入了一种基于窗口的F1指标（F1$_w$），它引入时间容差，在精确对齐不现实的情况下能够对事件检测进行更稳健的评估。对三个生理数据集（两个真实场景数据集ADARP和Wrist Angel，以及一个实验数据集ROAD）的实证分析表明，F1$_w$揭示了传统指标无法看到的有意义的模型性能模式，同时其窗口大小可以根据领域知识进行调整，以避免高估。我们表明，评估指标的选择会强烈影响对模型性能的解读：使用TimesFM的预测时，只有我们的时间容差指标在两个真实场景用例中显示出相对于随机基线和空基线的统计学显著改进。这项工作弥补了时间序列评估中的关键空白，并为对时间精度要求因情境而异的医疗保健应用提供了实用指导。
摘要：Accurate evaluation of event detection in time series is essential for applications such as stress monitoring with wearable devices, where ground truth is typically annotated as single-point events, even though the underlying phenomena are gradual and temporally diffused. Standard metrics like F1 and point-adjusted F1 (F1$_{pa}$) often misrepresent model performance in such real-world, imbalanced datasets. We introduce a window-based F1 metric (F1$_w$) that incorporates temporal tolerance, enabling a more robust assessment of event detection when exact alignment is unrealistic. Empirical analysis in three physiological datasets, two in-the-wild (ADARP, Wrist Angel) and one experimental (ROAD), indicates that F1$_w$ reveals meaningful model performance patterns invisible to conventional metrics, while its window size can be adapted to domain knowledge to avoid overestimation. We show that the choice of evaluation metric strongly influences the interpretation of model performance: using predictions from TimesFM, only our temporally tolerant metrics reveal statistically significant improvements over random and null baselines in the two in-the-wild use cases. This work addresses key gaps in time series evaluation and provides practical guidance for healthcare applications where requirements for temporal precision vary by context.
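
The idea of temporal tolerance can be made concrete with a small sketch. The code below is one plausible reading, assuming events are integer time indices, a prediction counts as a hit when it falls within `window` steps of a not-yet-matched annotation, and matching is greedy and one-to-one. The paper's exact definition of F1$_w$ may differ; the function name and data are hypothetical.

```python
def window_f1(true_events, pred_events, window):
    """F1 score where a prediction is a true positive if it lies within
    `window` time steps of an unmatched ground-truth event."""
    unmatched = sorted(true_events)
    tp = 0
    for p in sorted(pred_events):
        # Greedy matching: take the earliest unmatched annotation in range.
        hit = next((t for t in unmatched if abs(t - p) <= window), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    fp = len(pred_events) - tp
    fn = len(true_events) - tp
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


# Hypothetical single-point annotations and slightly misaligned predictions.
truth = [10, 50, 90]
preds = [12, 48, 70]
strict = window_f1(truth, preds, window=0)
tolerant = window_f1(truth, preds, window=3)
```

With exact alignment required, none of the predictions count; with a tolerance of three steps, two of the three do. Greedy matching is a simplification and is not guaranteed to find the best possible pairing.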

【3】A Data-Driven RetinaNet Model for Small Object Detection in Aerial Images
标题：航空图像中小目标检测的数据驱动RetinaNet模型
链接：https://arxiv.org/abs/2509.02928

作者：Tang, Jinwen Tang, Yi Shang
摘要：在航空成像领域，检测小物体的能力对于无数应用至关重要，包括环境监测、城市设计和危机管理。利用RetinaNet，这项工作推出了DDR-Net：一种数据驱动的深度学习模型，旨在增强对小型物体的检测。DDR-Net引入了新颖的数据驱动技术来自主确定最佳特征图和锚点估计，在保持精确度的同时培养量身定制且熟练的训练过程。此外，本文提出了一种创新的采样技术，以支持有限的数据训练约束下的模型功效。该模型增强的检测功能支持关键应用，包括野生动物和栖息地监测、交通流量优化以及通过准确识别车辆和行人等小型物体来改善公共安全。DDR-Net显著降低了数据收集和培训所需的成本和时间，即使在数据有限的情况下也能提供高效的性能。对各种航空鸟类图像数据集的经验评估表明，DDR-Net明显优于RetinaNet和其他当代模型。这些创新推进了当前的航空图像分析技术，并有望在农业、安全和考古等多个领域产生广泛的影响。
摘要：In the realm of aerial imaging, the ability to detect small objects is pivotal for a myriad of applications, encompassing environmental surveillance, urban design, and crisis management. Leveraging RetinaNet, this work unveils DDR-Net: a data-driven, deep-learning model devised to enhance the detection of diminutive objects. DDR-Net introduces novel, data-driven techniques to autonomously ascertain optimal feature maps and anchor estimations, cultivating a tailored and proficient training process while maintaining precision. Additionally, this paper presents an innovative sampling technique to bolster model efficacy under limited data training constraints. The model's enhanced detection capabilities support critical applications including wildlife and habitat monitoring, traffic flow optimization, and public safety improvements through accurate identification of small objects like vehicles and pedestrians. DDR-Net significantly reduces the cost and time required for data collection and training, offering efficient performance even with limited data. Empirical assessments over assorted aerial avian imagery datasets demonstrate that DDR-Net markedly surpasses RetinaNet and alternative contemporary models. These innovations advance current aerial image analysis technologies and promise wide-ranging impacts across multiple sectors including agriculture, security, and archaeology.

【4】Event Detection and Classification for Long Range Sensing of Elephants Using Seismic Signal
标题：利用地震信号进行大象远程感知的事件检测和分类
链接：https://arxiv.org/abs/2509.02920

作者： Wijayaraja, Janaka L. Wijekoon, Malitha Wijesundara
备注：This article has been accepted for publication in IEEE Access
摘要：通过地震信号检测大象是一个新兴的研究课题，旨在为人象冲突（HEC）制定解决方案。尽管取得了可喜的成果，但这些解决方案严重依赖于大象足迹的手动分类，这限制了它们在自然环境中实时分类的适用性。为了解决这一限制，并建立在我们以前的工作，这项研究介绍了一个分类框架，针对资源受限的实现，优先考虑准确性和计算效率。作为该框架的一部分，一种新的事件检测技术，称为上下文自定义窗口（CCW），专门为检测大象的脚步，并进行了评估，通过比较它与短期平均/长期平均（STA/LTA）的方法。结果表明，在受控条件下，最大验证检测范围为155.6 m，在自然环境中为140 m。使用具有径向基函数（RBF）内核的支持向量机（SVM）进行大象足迹分类，在多种设置中表现出卓越的性能，在受控环境中达到99%的准确率，在自然大象栖息地中达到73%，在最具挑战性的人类栖息地中达到70%。此外，使用可解释AI的特征影响分析将零交叉和动态时间规整（DTW）对齐成本的数量确定为所有实验中最具影响力的因素，而主导频率在受控设置中表现出显着的影响。
摘要：Detecting elephants through seismic signals is an emerging research topic aimed at developing solutions for Human-Elephant Conflict (HEC). Despite the promising results, such solutions heavily rely on manual classification of elephant footfalls, which limits their applicability for real-time classification in natural settings. To address this limitation and build on our previous work, this study introduces a classification framework targeting resource-constrained implementations, prioritizing both accuracy and computational efficiency. As part of this framework, a novel event detection technique named Contextually Customized Windowing (CCW), tailored specifically for detecting elephant footfalls, was introduced, and evaluations were conducted by comparing it with the Short-Term Average/Long-Term Average (STA/LTA) method. The yielded results show that the maximum validated detection range was 155.6 m in controlled conditions and 140 m in natural environments. Elephant footfall classification using Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel demonstrated superior performance across multiple settings, achieving an accuracy of 99% in controlled environments, 73% in natural elephant habitats, and 70% in HEC-prone human habitats, the most challenging scenario. Furthermore, feature impact analysis using explainable AI identified the number of Zero Crossings and Dynamic Time Warping (DTW) Alignment Cost as the most influential factors in all experiments, while Predominant Frequency exhibited significant influence in controlled settings.
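
The STA/LTA baseline mentioned above is a classic trigger: it compares the average signal energy in a short trailing window with that in a long trailing window and flags samples where the ratio jumps. The sketch below is a minimal plain-Python version on a synthetic trace; the window lengths, threshold, and signal are illustrative assumptions, and the paper's CCW technique is not reproduced here.

```python
def sta_lta(signal, n_sta, n_lta):
    """Classic STA/LTA ratio on squared amplitude using trailing windows.
    Samples before the first full long window are reported as 0.0."""
    energy = [x * x for x in signal]
    ratios = [0.0] * len(signal)
    for i in range(n_lta - 1, len(signal)):
        sta = sum(energy[i - n_sta + 1 : i + 1]) / n_sta
        lta = sum(energy[i - n_lta + 1 : i + 1]) / n_lta
        ratios[i] = sta / lta if lta > 0 else 0.0
    return ratios


# Hypothetical trace: low-level background with a short burst at samples 20-22.
trace = [0.1, -0.1] * 10 + [2.0, -2.0, 2.0] + [0.1, -0.1] * 5
ratios = sta_lta(trace, n_sta=3, n_lta=15)
threshold = 3.0  # illustrative value, not taken from the paper
triggers = [i for i, r in enumerate(ratios) if r > threshold]
```

On stationary background the ratio stays near 1, so the threshold is only crossed once the burst enters the short window.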

分类|识别(3篇)

【1】Invariant Features for Global Crop Type Classification
标题：全球作物类型分类的不变特征
链接：https://arxiv.org/abs/2509.03497

作者：ng, Sherrie Wang
摘要：准确获取全球范围内的作物类型及其空间分布对于粮食安全、农业决策和可持续发展至关重要。遥感为大规模作物分类提供了一种有效的解决办法，但许多地区可靠的地面样本有限，限制了其在不同地理区域的适用性。为了解决地理空间变化下的性能下降，本研究确定了遥感功能，是不变的地理变化，并提出了战略，以提高跨区域的推广。我们构建了CropGlobe，这是一个全球作物类型数据集，包含来自五大洲八个国家的300，000个像素级样本，涵盖六种主要粮食和经济作物（玉米，大豆，水稻，小麦，甘蔗，棉花）。CropGlobe具有广泛的地理覆盖范围，能够在跨国家，跨大陆和跨半球转移的情况下进行系统评估。我们比较了时间多光谱特征（基于Sentinel-2的1D/2D中值特征和谐波系数）和高光谱特征（来自EMIT）的可转移性。为了提高光谱和物候变化下的泛化能力，我们设计了CropNet，这是一种轻量级且鲁棒的CNN，专为像素级作物分类量身定制，再加上时间数据增强（时间偏移，时间尺度和幅度扭曲），模拟真实的跨区域物候。实验表明，Sentinel-2的2D中值时间特征在所有传输场景中始终表现出最强的不变性，增强进一步提高了鲁棒性，特别是在训练数据多样性有限的情况下。总的来说，这项工作确定了更多的不变特征表示，增强了地理可转移性，并提出了一个有前途的道路，可扩展的，低成本的作物类型的应用在全球不同地区。
摘要：Accurately obtaining crop type and its spatial distribution at a global scale is critical for food security, agricultural policy-making, and sustainable development. Remote sensing offers an efficient solution for large-scale crop classification, but the limited availability of reliable ground samples in many regions constrains applicability across geographic areas. To address performance declines under geospatial shifts, this study identifies remote sensing features that are invariant to geographic variation and proposes strategies to enhance cross-regional generalization. We construct CropGlobe, a global crop type dataset with 300,000 pixel-level samples from eight countries across five continents, covering six major food and industrial crops (corn, soybeans, rice, wheat, sugarcane, cotton). With broad geographic coverage, CropGlobe enables a systematic evaluation under cross-country, cross-continent, and cross-hemisphere transfer. We compare the transferability of temporal multi-spectral features (Sentinel-2-based 1D/2D median features and harmonic coefficients) and hyperspectral features (from EMIT). To improve generalization under spectral and phenological shifts, we design CropNet, a lightweight and robust CNN tailored for pixel-level crop classification, coupled with temporal data augmentation (time shift, time scale, and magnitude warping) that simulates realistic cross-regional phenology. Experiments show that 2D median temporal features from Sentinel-2 consistently exhibit the strongest invariance across all transfer scenarios, and augmentation further improves robustness, particularly when training data diversity is limited. Overall, the work identifies more invariant feature representations that enhance geographic transferability and suggests a promising path toward scalable, low-cost crop type applications across globally diverse regions.

【2】epiGPTope: A machine learning-based epitope generator and classifier
标题：epiGPTope：基于机器学习的表位生成器和分类器
链接：https://arxiv.org/abs/2509.03351

作者：lechas Manrique, Alberto Martínez, Elena López-Martínez, Luc Andrea, Román Orus, Aitor Manteca, Aitziber L. Cortajarena, Llorenç Espinosa-Portalés
备注：11 pages, 4 figures. Supplementary Information with 5 pages, 4 figures
摘要：表位是由抗体或免疫细胞受体识别的短抗原肽序列，是免疫疗法、疫苗和诊断学发展的核心。然而，合成表位文库的合理设计具有挑战性：组合序列空间巨大，由n个氨基酸组成的线性表位共有$20^n$种组合，即使使用高通量实验技术，筛选和测试也不可行。在这项研究中，我们提出了一个大型语言模型epiGPTope，它在蛋白质数据上进行了预训练，并专门针对线性表位进行了微调，首次能够直接生成新的类表位序列，这些序列被发现具有与已知表位相似的统计特性。这种生成方法可用于制备表位候选序列文库。我们进一步训练统计分类器来预测表位序列来源于细菌还是病毒，从而缩小候选文库并提高识别特定表位的可能性。我们认为，这种生成模型与预测模型的组合有助于表位发现。该方法仅使用线性表位的一级氨基酸序列，无需几何框架或手工设计的序列特征。通过开发一种生成生物学上可行序列的方法，我们预期合成表位的生成和筛选将更快、更具成本效益，并在新生物技术的开发中得到相关应用。
摘要：Epitopes are short antigenic peptide sequences which are recognized by antibodies or immune cell receptors. These are central to the development of immunotherapies, vaccines, and diagnostics. However, the rational design of synthetic epitope libraries is challenging due to the large combinatorial sequence space, $20^n$ combinations for linear epitopes of n amino acids, making screening and testing unfeasible, even with high throughput experimental techniques. In this study, we present a large language model, epiGPTope, pre-trained on protein data and specifically fine-tuned on linear epitopes, which for the first time can directly generate novel epitope-like sequences, which are found to possess statistical properties analogous to the ones of known epitopes. This generative approach can be used to prepare libraries of epitope candidate sequences. We further train statistical classifiers to predict whether an epitope sequence is of bacterial or viral origin, thus narrowing the candidate library and increasing the likelihood of identifying specific epitopes. We propose that such combination of generative and predictive models can be of assistance in epitope discovery. The approach uses only primary amino acid sequences of linear epitopes, bypassing the need for a geometric framework or hand-crafted features of the sequences. By developing a method to create biologically feasible sequences, we anticipate faster and more cost-effective generation and screening of synthetic epitopes, with relevant applications in the development of new biotechnologies.
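
The combinatorial claim is easy to check with a few lines of arithmetic. The sketch below assumes the 20 standard amino acids and uses illustrative peptide lengths that are not taken from the paper.

```python
AMINO_ACIDS = 20  # size of the standard amino-acid alphabet


def sequence_space(n):
    """Number of distinct linear peptides of length n."""
    return AMINO_ACIDS ** n


# Illustrative lengths only.
sizes = {n: sequence_space(n) for n in (5, 9, 15)}
```

Already at length 9 the space holds 5.12 × 10^11 sequences, which is why exhaustive screening is out of reach and a generative model that proposes candidates becomes attractive.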

【3】Beyond Words: Interjection Classification for Improved Human-Computer Interaction
标题：超越言语：改善人机交互的感叹词分类
链接：https://arxiv.org/abs/2509.03181

作者：en, Yuval Cohen, Alexander Apartsin, Yehudit Aperstein
备注：9 pages
摘要：在人机交互领域，促进人与机器之间的自然对话至关重要。这种对话中一个关键却常被忽视的组成部分，是“mmm”和“hmm”等感叹词的使用。尽管它们经常用于表达同意、犹豫或请求信息，但这些感叹词通常被自动语音识别（ASR）引擎当作“非单词”而忽略。针对这一空白，我们提出了一项专门针对感叹词分类的新任务，据我们所知，这在该领域尚属首次。由于感叹词信号持续时间短，且说话人之间和说话人内部的差异显著，这项任务具有挑战性。在这项工作中，我们提出并发布了一个专门为感叹词分类收集的感叹词信号数据集，并使用该数据集训练和评估一个基线深度学习模型。为了提高性能，我们使用节奏和音高变换等技术扩充训练数据集，这些技术显著提高了分类准确率，使模型更加鲁棒。感叹词数据集、用于数据增强流程的Python库、基线模型和评估脚本均可供研究社区使用。
摘要：In the realm of human-computer interaction, fostering a natural dialogue between humans and machines is paramount. A key, often overlooked, component of this dialogue is the use of interjections such as "mmm" and "hmm". Despite their frequent use to express agreement, hesitation, or requests for information, these interjections are typically dismissed as "non-words" by Automatic Speech Recognition (ASR) engines. Addressing this gap, we introduce a novel task dedicated to interjection classification, a pioneer in the field to our knowledge. This task is challenging due to the short duration of interjection signals and significant inter- and intra-speaker variability. In this work, we present and publish a dataset of interjection signals collected specifically for interjection classification. We employ this dataset to train and evaluate a baseline deep learning model. To enhance performance, we augment the training dataset using techniques such as tempo and pitch transformation, which significantly improve classification accuracy, making models more robust. The interjection dataset, a Python library for the augmentation pipeline, baseline model, and evaluation scripts, are available to the research community.

表征(1篇)

【1】SurGBSA: Learning Representations From Molecular Dynamics Simulations
标题：SurGBSA：从分子动力学模拟中学习表示
链接：https://arxiv.org/abs/2509.03084

作者：es, Yue Yang, Felice C. Lightstone, Niema Moshiri, Jonathan E. Allen, Tajana S. Rosing
摘要：基于类药化合物和蛋白质静态结构的自监督预训练能够得到强大的学习特征表示。学习到的特征在一系列预测任务上展示了最先进的性能，包括分子性质、结构生成和蛋白质-配体相互作用。大多数方法受限于只使用静态结构，如何最好地利用原子级分子动力学（MD）模拟来开发更具泛化性的模型，以提高对新分子结构的预测精度，仍是一个悬而未决的问题。我们提出SURrogate mmGBSA（SurGBSA），这是一种基于MD的表示学习的新建模方法，它学习分子力学广义Born表面积（MMGBSA）的代理函数。我们首次展示了物理信息预训练的好处：在从CASF-2016基准的MD模拟中收集的超过140万条3D轨迹上训练代理MMGBSA模型。与传统的基于物理的单点MMGBSA计算相比，SurGBSA实现了6,497倍的显著加速，同时在识别正确最优姿势的高难度姿势排序问题上几乎达到单点MMGBSA的准确度（相差-0.4%）。我们的工作通过展示在MD模拟上训练带来的模型改进，推进了分子基础模型的发展。模型、代码和训练数据均已公开。
摘要：Self-supervised pretraining from static structures of drug-like compounds and proteins enables powerful learned feature representations. Learned features demonstrate state-of-the-art performance on a range of predictive tasks including molecular properties, structure generation, and protein-ligand interactions. The majority of approaches are limited by their use of static structures, and it remains an open question how best to use atomistic molecular dynamics (MD) simulations to develop more generalized models to improve prediction accuracy for novel molecular structures. We present SURrogate mmGBSA (SurGBSA) as a new modeling approach for MD-based representation learning, which learns a surrogate function of the Molecular Mechanics Generalized Born Surface Area (MMGBSA). We show for the first time the benefits of physics-informed pre-training to train a surrogate MMGBSA model on a collection of over 1.4 million 3D trajectories collected from MD simulations of the CASF-2016 benchmark. SurGBSA demonstrates a dramatic 6,497x speedup versus a traditional physics-based single-point MMGBSA calculation while nearly matching single-point MMGBSA accuracy on the challenging pose ranking problem for identification of the correct top pose (-0.4% difference). Our work advances the development of molecular foundation models by showing model improvements when training on MD simulations. Models, code and training data are made publicly available.

优化|敛散性(5篇)

【1】FoMEMO: Towards Foundation Models for Expensive Multi-objective Optimization
标题：FoMEMO：面向昂贵多目标优化的基础模型
链接：https://arxiv.org/abs/2509.03244

作者：o, Fei Liu, Liang Zhao, Xi Lin, Qingfu Zhang
摘要：昂贵多目标优化是许多现实场景中普遍而关键的问题，由于可用于恢复真实帕累托前沿以支持决策的评估次数有限，样本效率至关重要。现有工作要么需要在遇到的每个新问题中为每个目标从头重建高斯过程代理模型，要么依赖大量过去的领域实验来预训练深度学习模型，这使得它们难以泛化，也难以应对现实世界中各种新兴应用。为了解决这个问题，我们提出了一个名为FoMEMO（Foundation Models for Expensive Multi-objective Optimization）的新范式，它能够建立一个以任意领域轨迹和用户偏好为条件的基础模型，并基于预测的按偏好聚合后验实现快速的上下文内优化。我们无需在现实世界中获取大量领域实验，而是证明了使用数亿条多样化的合成数据对基础模型进行预训练可以带来对未知问题的更强适应性，且在优化过程中不需要任何后续的模型训练或更新。我们在各种合成基准和实际应用上评估了我们的方法，并展示了其相比现有方法更优的通用性和有竞争力的性能。
摘要：Expensive multi-objective optimization is a prevalent and crucial concern in many real-world scenarios, where sample-efficiency is vital due to the limited evaluations to recover the true Pareto front for decision making. Existing works either involve rebuilding Gaussian process surrogates from scratch for each objective in each new problem encountered, or rely on extensive past domain experiments for pre-training deep learning models, making them hard to generalize and impractical to cope with various emerging applications in the real world. To address this issue, we propose a new paradigm named FoMEMO (Foundation Models for Expensive Multi-objective Optimization), which enables the establishment of a foundation model conditioned on any domain trajectory and user preference, and facilitates fast in-context optimization based on the predicted preference-wise aggregation posteriors. Rather than accessing extensive domain experiments in the real world, we demonstrate that pre-training the foundation model with a diverse set of hundreds of millions of synthetic data can lead to superior adaptability to unknown problems, without necessitating any subsequent model training or updates in the optimization process. We evaluate our method across a variety of synthetic benchmarks and real-world applications, and demonstrate its superior generality and competitive performance compared to existing methods.

【2】Imitate Optimal Policy: Prevail and Induce Action Collapse in Policy Gradient
标题：模仿最优策略：策略梯度中普遍存在并可诱导的动作坍缩
链接：https://arxiv.org/abs/2509.02737

作者：Zhou, Yibo Yang, Ziyan Chen, Fengxiang Bie, Haojun Xia, Xiaoxia Wu, Robert Wu, Ben Athiwaratkun, Bernard Ghanem, Shuaiwen Leon Song
备注：18 pages, 4 figures, 2 tables; includes supplementary material; preprint
摘要：强化学习中的策略梯度（PG）方法经常利用深度神经网络（DNN）学习一个共享的特征表示主干，用于在动作选择层中计算似然。关于策略网络的收敛性和全局最优性已有大量研究，但很少有工作分析这些网络的表征结构。在训练最优策略DNN时，我们观察到在某些约束下会出现一种类似神经坍缩的温和结构，我们称之为动作坍缩（AC）。这表明：1）共享相同最优动作的状态-动作激活（即最后一层特征）向这些最优动作各自的平均激活坍缩；2）共享相同最优动作的激活的变异性收敛到零；3）动作选择层的权重和平均激活坍缩为单纯形等角紧框架（ETF）。我们的早期工作表明，上述约束对于这些观察是必要的。由于最优策略DNN的坍缩ETF最大限度地分离了状态-动作空间中所有动作的两两夹角，我们自然提出一个问题：能否使用ETF结构作为动作选择层中的（固定）目标配置来学习最优策略？我们的解析证明表明，以固定ETF作为动作选择层来学习激活会自然导致AC。因此，我们提出动作坍缩策略梯度（ACPG）方法，相应地将一个合成的ETF固定为动作选择层。ACPG诱导策略DNN在动作选择层中产生这种理想配置，同时保持最优。我们在各种OpenAI Gym环境中的实验表明，该技术可以集成到任何离散PG方法中，并更快、更稳健地带来有利的奖励提升。
摘要：Policy gradient (PG) methods in reinforcement learning frequently utilize deep neural networks (DNNs) to learn a shared backbone of feature representations used to compute likelihoods in an action selection layer. Numerous studies have been conducted on the convergence and global optima of policy networks, but few have analyzed representational structures of those underlying networks. While training an optimal policy DNN, we observed that under certain constraints, a gentle structure resembling neural collapse, which we refer to as Action Collapse (AC), emerges. This suggests that 1) the state-action activations (i.e. last-layer features) sharing the same optimal actions collapse towards those optimal actions respective mean activations; 2) the variability of activations sharing the same optimal actions converges to zero; 3) the weights of action selection layer and the mean activations collapse to a simplex equiangular tight frame (ETF). Our early work showed those aforementioned constraints to be necessary for these observations. Since the collapsed ETF of optimal policy DNNs maximally separates the pair-wise angles of all actions in the state-action space, we naturally raise a question: can we learn an optimal policy using an ETF structure as a (fixed) target configuration in the action selection layer? Our analytical proof shows that learning activations with a fixed ETF as action selection layer naturally leads to the AC. We thus propose the Action Collapse Policy Gradient (ACPG) method, which accordingly affixes a synthetic ETF as our action selection layer. ACPG induces the policy DNN to produce such an ideal configuration in the action selection layer while remaining optimal. Our experiments across various OpenAI Gym environments demonstrate that our technique can be integrated into any discrete PG methods and lead to favorable reward improvements more quickly and robustly.
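
The simplex ETF named in the abstract has a standard closed form, so a small sketch may help make the geometry concrete. The code below builds K unit vectors via the textbook construction sqrt(K/(K-1)) * (I - (1/K) 11^T) and checks that every pair meets at cosine -1/(K-1). It is an illustration of the ETF geometry only, not the ACPG method; the function names `simplex_etf` and `cosine` are hypothetical.

```python
import math

def simplex_etf(k):
    """Return k unit vectors in R^k forming a simplex equiangular tight frame.

    Row i is sqrt(k / (k - 1)) * (e_i - (1/k) * ones), the standard construction.
    """
    scale = math.sqrt(k / (k - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
            for i in range(k)]

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Example with 4 actions: each vector has unit norm and all pairwise
# cosines equal -1/(4 - 1), the most negative value k vectors can share.
frame = simplex_etf(4)
pairwise = [cosine(frame[i], frame[j]) for i in range(4) for j in range(i + 1, 4)]
```

In a fixed-ETF action selection layer, rows like these would play the role of the (frozen) per-action weight vectors; how the paper scales or rotates them is not stated in the abstract.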

【3】Off-Policy Learning in Large Action Spaces: Optimization Matters More Than Estimation
标题：大动作空间中的离策略学习：优化比估计更重要
链接：https://arxiv.org/abs/2509.03456

作者：li, Otmane Sakhi
备注：Recsys '25, CONSEQUENCES: Causality, Counterfactuals & Sequential Decision-Making Workshop
摘要：离策略评估（OPE）和离策略学习（OPL）是离线上下文赌博机决策的基础。OPL的最新进展主要是优化具有更好统计性质的OPE估计器，并假设更好的估计器必然产生更优的策略。虽然这在理论上是合理的，但我们认为这种以估计器为中心的方法忽略了一个关键的实际障碍：具有挑战性的优化地形。在本文中，我们提供了理论见解和大量经验证据，表明当前的OPL方法会遇到严重的优化问题，尤其是在动作空间变大时。我们证明，更简单的加权对数似然目标具有明显更好的优化性质，并且仍能得到有竞争力、往往更优的学习策略。我们的研究结果强调，在为大动作空间开发OPL算法时，有必要明确处理优化方面的考虑。
摘要：Off-policy evaluation (OPE) and off-policy learning (OPL) are foundational for decision-making in offline contextual bandits. Recent advances in OPL primarily optimize OPE estimators with improved statistical properties, assuming that better estimators inherently yield superior policies. Although theoretically justified, we argue this estimator-centric approach neglects a critical practical obstacle: challenging optimization landscapes. In this paper, we provide theoretical insights and extensive empirical evidence showing that current OPL methods encounter severe optimization issues, particularly as action spaces become large. We demonstrate that simpler weighted log-likelihood objectives enjoy substantially better optimization properties and still recover competitive, often superior, learned policies. Our findings emphasize the necessity of explicitly addressing optimization considerations in the development of OPL algorithms for large action spaces.

【4】Non-Linear Counterfactual Aggregate Optimization
标题：非线性反事实汇总优化
链接：https://arxiv.org/abs/2509.03438

作者：Heymann, Otmane Sakhi
备注：Recsys '25, CONSEQUENCES: Causality, Counterfactuals & Sequential Decision-Making Workshop
摘要：我们考虑直接优化一个结果的非线性函数的问题，其中这个结果本身是许多小贡献的总和。函数的非线性意味着问题不等同于个体贡献期望的最大化。通过利用单个结果之和的集中特性，我们得到了一个可扩展的下降算法，直接优化我们的既定目标。例如，这允许最大化成功A/B测试的概率，为此，以成功标准为目标可能更明智，例如超过给定的提升，而不是追逐最高的预期收益。
摘要：We consider the problem of directly optimizing a non-linear function of an outcome, where this outcome itself is the sum of many small contributions. The non-linearity of the function means that the problem is not equivalent to the maximization of the expectation of the individual contribution. By leveraging the concentration properties of the sum of individual outcomes, we derive a scalable descent algorithm that directly optimizes for our stated objective. This allows for instance to maximize the probability of successful A/B test, for which it can be wiser to target a success criterion, such as exceeding a given uplift, rather than chasing the highest expected payoff.
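
To see why targeting a success criterion can differ from maximizing the expected payoff, here is a minimal numeric sketch. It assumes the aggregate outcome is approximately normal (a stand-in for the concentration argument mentioned above, not the authors' algorithm), and the two policies and all numbers are invented for illustration.

```python
import math

def success_probability(mean, std, threshold):
    """P(aggregate outcome > threshold) under a normal approximation."""
    z = (mean - threshold) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical policies: A has the higher expected uplift but is much noisier.
mean_a, std_a = 12.0, 10.0
mean_b, std_b = 11.0, 2.0
threshold = 8.0  # the uplift the A/B test must exceed to count as a success

p_a = success_probability(mean_a, std_a, threshold)
p_b = success_probability(mean_b, std_b, threshold)
```

Here policy B is more likely to clear the threshold even though policy A has the larger mean, which is the kind of gap a non-linear objective is meant to capture.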

【5】Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization
标题：高度光滑随机二层优化的快速梯度方法
链接：https://arxiv.org/abs/2509.02937

作者：, Junru Li, Jingzhao Zhang
摘要：This paper studies the complexity of finding an $\epsilon$-stationary point for stochastic bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the first-order method, F${}^2$SA, achieving the $\tilde{\mathcal{O}}(\epsilon^{-6})$ upper complexity bound for first-order smooth problems. This is slower than the optimal $\Omega(\epsilon^{-4})$ complexity lower bound in its single-level counterpart. In this work, we show that faster rates are achievable for higher-order smooth problems. We first reformulate F${}^2$SA as approximating the hyper-gradient with a forward difference. Based on this observation, we propose a class of methods F${}^2$SA-$p$ that uses $p$th-order finite difference for hyper-gradient approximation and improves the upper bound to $\tilde{\mathcal{O}}(p \epsilon^{-4-2/p})$ for $p$th-order smooth problems. Finally, we demonstrate that the $\Omega(\epsilon^{-4})$ lower bound also holds for stochastic bilevel problems when the high-order smoothness holds for the lower-level variable, indicating that the upper bound of F${}^2$SA-$p$ is nearly optimal in the highly smooth region $p = \Omega( \log \epsilon^{-1} / \log \log \epsilon^{-1})$.
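
The role of higher-order finite differences can be illustrated on an ordinary scalar derivative. The sketch below compares a forward difference (first-order accurate) with a central difference (second-order accurate) on sin(x). It is a generic numerical illustration under that simplification, not the F²SA-p hyper-gradient estimator, and the helper names are hypothetical.

```python
import math

def forward_diff(f, x, h):
    """First-order accurate estimate of f'(x); truncation error is O(h)."""
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    """Second-order accurate estimate of f'(x); truncation error is O(h^2)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x, h = 1.0, 1e-3
exact = math.cos(x)  # derivative of sin at x
err_forward = abs(forward_diff(math.sin, x, h) - exact)
err_central = abs(central_diff(math.sin, x, h) - exact)
```

For the same step size the central difference is several orders of magnitude more accurate, which is the intuition behind trading extra smoothness for a better approximation.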

预测|估计(4篇)

【1】CloudFormer: An Attention-based Performance Prediction for Public Clouds with Unknown Workload
标题：CloudFormer：面向未知工作负载的公共云的基于注意力的性能预测
链接：https://arxiv.org/abs/2509.03394

作者：in Shahbazinia, Darong Huang, Luis Costero, David Atienza
摘要：云平台凭借其可扩展性、灵活性和成本效益，越来越多地被用来托管各种资源密集型工作负载。在多租户云环境中，虚拟机被整合到共享的物理服务器上，以提高资源利用率。虽然虚拟化保证了CPU、内存和存储的资源分区，但它无法确保性能隔离。对共享资源（如末级缓存、内存带宽和网络接口）的竞争通常会导致严重的性能下降。现有的管理技术（包括虚拟机调度和资源供应）需要准确的性能预测来减轻干扰。然而，由于虚拟机的黑盒性质和工作负载的高度动态性，这在公共云中仍然具有挑战性。为了解决这些限制，我们提出了CloudFormer，一个基于Transformer的双分支模型，旨在预测黑盒环境中的虚拟机性能下降。CloudFormer联合建模时间动态和系统级交互，在静态和动态场景中以1秒的分辨率利用206个系统指标。这种设计使模型能够捕获瞬态干扰效应，并适应不断变化的工作负载条件，而无需针对特定场景进行调整。作为对方法的补充，我们提供了一个细粒度的数据集，与现有基准相比显著提高了时间分辨率和指标多样性。实验结果表明，CloudFormer在多个评估指标上始终优于最先进的基线，并在各种此前未见过的工作负载上实现了稳健的泛化。值得注意的是，CloudFormer的平均绝对误差（MAE）仅为7.8%，预测准确性有实质性提高，比现有方法至少高出28%。
摘要：Cloud platforms are increasingly relied upon to host diverse, resource-intensive workloads due to their scalability, flexibility, and cost-efficiency. In multi-tenant cloud environments, virtual machines are consolidated on shared physical servers to improve resource utilization. While virtualization guarantees resource partitioning for CPU, memory, and storage, it cannot ensure performance isolation. Competition for shared resources such as last-level cache, memory bandwidth, and network interfaces often leads to severe performance degradation. Existing management techniques, including VM scheduling and resource provisioning, require accurate performance prediction to mitigate interference. However, this remains challenging in public clouds due to the black-box nature of VMs and the highly dynamic nature of workloads. To address these limitations, we propose CloudFormer, a dual-branch Transformer-based model designed to predict VM performance degradation in black-box environments. CloudFormer jointly models temporal dynamics and system-level interactions, leveraging 206 system metrics at one-second resolution across both static and dynamic scenarios. This design enables the model to capture transient interference effects and adapt to varying workload conditions without scenario-specific tuning. Complementing the methodology, we provide a fine-grained dataset that significantly expands the temporal resolution and metric diversity compared to existing benchmarks. Experimental results demonstrate that CloudFormer consistently outperforms state-of-the-art baselines across multiple evaluation metrics, achieving robust generalization across diverse and previously unseen workloads. Notably, CloudFormer attains a mean absolute error (MAE) of just 7.8%, representing a substantial improvement in predictive accuracy and outperforming existing methods at least by 28%.

【2】Count2Density: Crowd Density Estimation without Location-level Annotations
标题：Count2Density：无需位置级标注的人群密度估计
链接：https://arxiv.org/abs/2509.03170

作者：trico, Feng Chen, Michael Pound, Sotirios A Tsaftaris, Sebastiano Battiato, Mario Valerio Giuffrida
摘要：人群密度估计是一项众所周知的计算机视觉任务，旨在估计图像中人群的密度分布。该领域的主要挑战是依赖细粒度的位置级标注（即放置在每个个体上的点）来训练深度网络。收集如此详细的标注既繁琐又耗时，并对现实应用的可扩展性构成重大障碍。为了减轻这一负担，我们提出了Count2Density：一种新颖的流程，在训练期间仅使用计数级标注（即总人数）来预测包含定量空间信息的有意义的密度图。为了实现这一点，Count2Density利用存储在历史图库（Historical Map Bank）中的过去预测生成伪密度图，从而减少确认偏差。该库使用无监督显著性估计器进行初始化，以提供初始空间先验，并使用预测密度图的EMA进行迭代更新。这些伪密度图是通过使用超几何分布从估计的人群区域中采样位置得到的，采样数量由计数级标注确定。为了进一步增强模型的空间感知，我们加入了一个自监督的对比空间正则项，以鼓励拥挤区域内的特征表示相似，同时最大化与背景区域的差异。实验结果表明，我们的方法显著优于跨域自适应方法，并在多个数据集的半监督设置下取得了比近期最先进方法更好的结果。进一步的分析验证了流程中每个组件的有效性，证实了Count2Density能够从计数级标注中有效恢复空间信息，并实现准确的子区域计数。
摘要：Crowd density estimation is a well-known computer vision task aimed at estimating the density distribution of people in an image. The main challenge in this domain is the reliance on fine-grained location-level annotations, (i.e. points placed on top of each individual) to train deep networks. Collecting such detailed annotations is both tedious, time-consuming, and poses a significant barrier to scalability for real-world applications. To alleviate this burden, we present Count2Density: a novel pipeline designed to predict meaningful density maps containing quantitative spatial information using only count-level annotations (i.e., total number of people) during training. To achieve this, Count2Density generates pseudo-density maps leveraging past predictions stored in a Historical Map Bank, thereby reducing confirmation bias. This bank is initialised using an unsupervised saliency estimator to provide an initial spatial prior and is iteratively updated with an EMA of predicted density maps. These pseudo-density maps are obtained by sampling locations from estimated crowd areas using a hypergeometric distribution, with the number of samplings determined by the count-level annotations. To further enhance the spatial awareness of the model, we add a self-supervised contrastive spatial regulariser to encourage similar feature representations within crowded regions while maximising dissimilarity with background regions. Experimental results demonstrate that our approach significantly outperforms cross-domain adaptation methods and achieves better results than recent state-of-the-art approaches in semi-supervised settings across several datasets. Additional analyses validate the effectiveness of each individual component of our pipeline, confirming the ability of Count2Density to effectively retrieve spatial information from count-level annotations and enabling accurate subregion counting.

【3】AR-KAN: Autoregressive-Weight-Enhanced Kolmogorov-Arnold Network for Time Series Forecasting
标题：AR-KAN：用于时间序列预测的自回归加权增强Kolmogorov-Arnold网络
链接：https://arxiv.org/abs/2509.02967

作者：, Tiehang Xu, Qiao Wang
摘要：传统的神经网络在信号的频谱分析中经常面临挑战。为了应对这一挑战，傅立叶神经网络（FNN）和类似的方法将傅立叶级数的分量集成到神经网络的结构中。尽管如此，一个重要的障碍往往被忽视：周期信号的叠加不一定是周期信号。例如，当预测由频率不可公度的信号组成的概周期函数时，自回归积分移动平均（ARIMA）等传统模型通常优于大多数神经网络，包括大型语言模型（LLM）。为了解决这一问题，我们提出了自回归权重增强的AR-KAN，这是一种结合了两种方法优点的混合模型。利用通用近视映射定理（Universal Myopic Mapping Theorem），我们将Kolmogorov-Arnold网络（KAN）用于静态非线性部分，并通过预训练的AR组件引入记忆，这可以解释为在消除冗余的同时保留最有用的信息。实验数据表明，AR-KAN在72%的真实世界数据集上取得了更优的结果。
摘要：Conventional neural networks frequently face challenges in spectral analysis of signals. To address this challenge, Fourier neural networks (FNNs) and similar approaches integrate components of Fourier series into the structure of neural networks. Nonetheless, a significant hurdle is often overlooked: the superposition of periodic signals does not necessarily result in a periodic signal. For example, when forecasting almost periodic functions composed of signals with incommensurate frequencies, traditional models such as Autoregressive Integrated Moving Average (ARIMA) frequently outperform most neural networks including large language models (LLMs). To tackle this goal, we propose Autoregressive-Weight-Enhanced AR-KAN, a hybrid model that combines the benefits of both methods. Using the Universal Myopic Mapping Theorem, we apply a Kolmogorov-Arnold Network (KAN) for the static nonlinear part and include memory through a pre-trained AR component, which can be explained to retain the most useful information while eliminating redundancy. Experimental data indicates that AR-KAN delivers superior results on $72\%$ of real-world datasets.

【4】Conformal Prediction for Time-series Forecasting with Change Points
标题：具有变化点的时间序列预测的保形预测
链接：https://arxiv.org/abs/2509.02844

作者：n, Rose Yu
摘要：共形预测已被探索为一种为时间序列提供不确定性量化的通用且高效的方法。然而，目前的方法难以处理带有变化点（即底层数据生成过程的突然变化）的时间序列数据。在本文中，我们提出了一种新的带变化点的时间序列共形预测（CPTC）算法，通过将预测底层状态的模型与在线共形预测相结合来对非平稳时间序列中的不确定性建模，从而填补这一空白。我们在最少的假设下证明了CPTC在时间序列设置中的有效性和更好的自适应性，并在6个合成和真实世界数据集上展示了CPTC的实际效果，与最先进的基线相比，其有效性和自适应性均有所提高。
摘要：Conformal prediction has been explored as a general and efficient way to provide uncertainty quantification for time series. However, current methods struggle to handle time series data with change points - sudden shifts in the underlying data-generating process. In this paper, we propose a novel Conformal Prediction for Time-series with Change points (CPTC) algorithm, addressing this gap by integrating a model to predict the underlying state with online conformal prediction to model uncertainties in non-stationary time series. We prove CPTC's validity and improved adaptivity in the time series setting under minimum assumptions, and demonstrate CPTC's practical effectiveness on 6 synthetic and real-world datasets, showing improved validity and adaptivity compared to state-of-the-art baselines.
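
For readers new to conformal prediction, the sketch below shows the basic split-conformal recipe that such methods build on: take held-out absolute residuals, pick the ceil((n+1)(1-alpha))-th smallest, and use it as the interval radius. This baseline assumes exchangeable data and is not the CPTC algorithm, which is designed precisely for the change-point setting where that assumption fails. The residuals and forecast value are made up.

```python
import math

def conformal_radius(residuals, alpha):
    """Split-conformal radius: the ceil((n+1)(1-alpha))-th smallest |residual|.

    Returns infinity when the calibration set is too small for the level.
    """
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(abs(r) for r in residuals)[rank - 1]

# Hypothetical calibration residuals (observed value minus forecast).
calibration_residuals = [0.3, -1.2, 0.8, -0.1, 0.5, 2.0, -0.7, 0.4, -0.9, 1.1]
radius = conformal_radius(calibration_residuals, alpha=0.2)

point_forecast = 10.0
interval = (point_forecast - radius, point_forecast + radius)
```

With ten calibration points and alpha = 0.2 the rank is 9, so the radius is the ninth-smallest absolute residual; asking for alpha = 0.05 with the same ten points yields an infinite radius because there is not enough calibration data.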

其他神经网络|深度学习|模型|建模(22篇)

【1】Can the Waymo Open Motion Dataset Support Realistic Behavioral Modeling? A Validation Study with Naturalistic Trajectories
标题：Waymo开放运动数据集可以支持真实行为建模吗？自然主义轨迹的验证研究
链接：https://arxiv.org/abs/2509.03515

作者：ang, Sungyong Chung, Nachuan Li, Dana Monzer, Hani S. Mahmassani, Samer H. Hamdar, Alireza Talebpour
摘要：Waymo开放运动数据集（WOMD）已成为自动驾驶汽车（AV）行为数据驱动建模的热门资源。然而，由于专有的后处理、缺乏误差量化以及将轨迹分割成20秒的片段，其用于行为分析的有效性仍不确定。本研究考察WOMD是否准确捕捉了在真实AV运营中观察到的动态和交互。利用从亚利桑那州凤凰城（PHX）的4级AV运营中独立收集的自然驾驶数据集，我们对三种有代表性的城市驾驶场景进行了比较分析：信号交叉口的车辆放行、跟驰和换道行为。对于放行分析，车头时距是从航拍视频中手动提取的，以确保测量误差可以忽略不计。对于跟驰和换道场景，我们应用模拟外推（SIMEX）方法来处理PHX数据中经验估计的误差，并使用动态时间规整（DTW）距离来量化行为差异。所有场景的结果一致表明，PHX中的行为超出了WOMD的行为包络。值得注意的是，WOMD对短车头时距和急减速的覆盖不足。这些发现表明，仅基于WOMD校准的行为模型可能会系统性地低估自然驾驶的变异性、风险和复杂性。因此，在未用独立收集的数据进行适当验证的情况下，使用WOMD进行行为建模时需要谨慎。
摘要：The Waymo Open Motion Dataset (WOMD) has become a popular resource for data-driven modeling of autonomous vehicles (AVs) behavior. However, its validity for behavioral analysis remains uncertain due to proprietary post-processing, the absence of error quantification, and the segmentation of trajectories into 20-second clips. This study examines whether WOMD accurately captures the dynamics and interactions observed in real-world AV operations. Leveraging an independently collected naturalistic dataset from Level 4 AV operations in Phoenix, Arizona (PHX), we perform comparative analyses across three representative urban driving scenarios: discharging at signalized intersections, car-following, and lane-changing behaviors. For the discharging analysis, headways are manually extracted from aerial video to ensure negligible measurement error. For the car-following and lane-changing cases, we apply the Simulation-Extrapolation (SIMEX) method to account for empirically estimated error in the PHX data and use Dynamic Time Warping (DTW) distances to quantify behavioral differences. Results across all scenarios consistently show that behavior in PHX falls outside the behavioral envelope of WOMD. Notably, WOMD underrepresents short headways and abrupt decelerations. These findings suggest that behavioral models calibrated solely on WOMD may systematically underestimate the variability, risk, and complexity of naturalistic driving. Caution is therefore warranted when using WOMD for behavior modeling without proper validation against independently collected data.
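
Dynamic Time Warping, used above to quantify behavioral differences, can be sketched with the classic dynamic program. The version below uses absolute difference as the local cost and no window constraint; the study's exact DTW configuration is not given in the abstract, and the speed profiles are invented.

```python
import math

def dtw_distance(a, b):
    """Classic DTW: minimum cumulative |a_i - b_j| cost over monotone alignments."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance in a only, in b only, or in both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical speed profiles (m/s): same shape, one sampled with a repeated value.
profile_a = [1.0, 2.0, 3.0]
profile_b = [1.0, 2.0, 2.0, 3.0]
same_shape = dtw_distance(profile_a, profile_b)
shifted = dtw_distance([0.0, 0.0], [1.0, 1.0])
```

Because DTW allows one sample to align with several, the two profiles above are at distance zero, while a constant offset between sequences still accumulates cost.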

【2】LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence
标题：LimiX：释放通才智能的结构数据建模能力
链接：https://arxiv.org/abs/2509.03505

作者：Zhang, Gang Ren, Han Yu, Hao Yuan, Hui Wang, Jiansheng Li, Jiayun Wu, Lang Mo, Li Mao, Mingchao Hao, Ningbo Dai, Renzhe Xu, Shuyang Li, Tianyang Zhang, Yue He, Yuanrui Wang, Yunjia Zhang, Zijing Xu, Dongzhe Li, Fang Gao, Hao Zou, Jiandong Liu, Jiashuo Liu, Jiawei Xu, Kaijie Cheng, Kehan Li, Linjun Zhou, Qing Li, Shaohua Fan, Xiaoyu Lin, Xinyan Han, Xuanyue Li, Yan Lu, Yuan Xue, Yuanyuan Jiang, Zimu Wang, Zhenlei Wang, Peng Cui
备注：56 pages
摘要：我们认为，通用智能的进展需要以语言、物理世界和结构化数据为基础的互补基础模型。本报告介绍了LimiX，这是我们的大型结构化数据模型（LDM）的第一个版本。LimiX将结构化数据视为变量和缺失情况的联合分布，因此能够用单一模型通过基于查询的条件预测来处理各种表格任务。LimiX使用带有情景式、上下文条件目标的掩码联合分布建模进行预训练，其中模型以特定于数据集的上下文为条件对查询子集进行预测，从而支持推理时快速、免训练的适应。我们在10个大型结构化数据基准上评估LimiX，这些基准涵盖了样本量、特征维度、类别数、类别特征与数值特征之比、缺失情况以及样本特征比的广泛范围。凭借单一模型和统一接口，LimiX始终超越强大的基线，包括梯度提升树、深度表格网络、最新的表格基础模型和自动化集成，如图1和图2所示。这种优势在分类、回归、缺失值插补和数据生成等多种任务上都成立，且往往优势明显，同时避免了特定于任务的架构或针对每个任务的定制训练。所有LimiX模型均在Apache 2.0下公开提供。
摘要：We argue that progress toward general intelligence requires complementary foundation models grounded in language, the physical world, and structured data. This report presents LimiX, the first installment of our large structured-data models (LDMs). LimiX treats structured data as a joint distribution over variables and missingness, thus capable of addressing a wide range of tabular tasks through query-based conditional prediction via a single model. LimiX is pretrained using masked joint-distribution modeling with an episodic, context-conditional objective, where the model predicts for query subsets conditioned on dataset-specific contexts, supporting rapid, training-free adaptation at inference. We evaluate LimiX across 10 large structured-data benchmarks with broad regimes of sample size, feature dimensionality, class number, categorical-to-numerical feature ratio, missingness, and sample-to-feature ratios. With a single model and a unified interface, LimiX consistently surpasses strong baselines including gradient-boosting trees, deep tabular networks, recent tabular foundation models, and automated ensembles, as shown in Figure 1 and Figure 2. The superiority holds across a wide range of tasks, such as classification, regression, missing value imputation, and data generation, often by substantial margins, while avoiding task-specific architectures or bespoke training per task. All LimiX models are publicly accessible under Apache 2.0.

【3】SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models
标题：SafeProtein：蛋白质基础模型的红队测试框架和基准
链接：https://arxiv.org/abs/2509.03487

作者：n, Zhenghong Zhou, Ruofan Jin, Le Cong, Mengdi Wang, Zaixi Zhang
摘要：蛋白质在几乎所有的生物过程中都起着至关重要的作用。深度学习的进步大大加速了蛋白质基础模型的发展，在蛋白质理解和设计方面取得了重大成功。然而，这些模型缺乏系统的红队测试，这引起了人们对其可能被滥用的严重担忧，例如生成具有生物安全风险的蛋白质。本文介绍了SafeProtein，据我们所知，这是第一个为蛋白质基础模型设计的红队测试框架。SafeProtein结合了多模态提示工程和启发式束搜索，系统地设计红队测试方法，并对蛋白质基础模型进行测试。我们还整理了SafeProtein-Bench，其中包括手动构建的红队测试基准数据集和全面的评估协议。SafeProtein在最先进的蛋白质基础模型上实现了持续越狱（针对ESM3的攻击成功率高达70%），揭示了当前蛋白质基础模型中潜在的生物安全风险，并为前沿模型开发稳健的安全防护技术提供了见解。代码将在https://github.com/jigang-fan/SafeProtein上公开。
摘要：Proteins play crucial roles in almost all biological processes. The advancement of deep learning has greatly accelerated the development of protein foundation models, leading to significant successes in protein understanding and design. However, the lack of systematic red-teaming for these models has raised serious concerns about their potential misuse, such as generating proteins with biological safety risks. This paper introduces SafeProtein, the first red-teaming framework designed for protein foundation models to the best of our knowledge. SafeProtein combines multimodal prompt engineering and heuristic beam search to systematically design red-teaming methods and conduct tests on protein foundation models. We also curated SafeProtein-Bench, which includes a manually constructed red-teaming benchmark dataset and a comprehensive evaluation protocol. SafeProtein achieved continuous jailbreaks on state-of-the-art protein foundation models (up to 70% attack success rate for ESM3), revealing potential biological safety risks in current protein foundation models and providing insights for the development of robust security protection technologies for frontier models. The codes will be made publicly available at https://github.com/jigang-fan/SafeProtein.

【4】Robult: Leveraging Redundancy and Modality Specific Features for Robust Multimodal Learning
标题：Robult：利用冗余和模式特定特征实现稳健的多模式学习
链接：https://arxiv.org/abs/2509.03477

作者：uyen, Abhi Kamboj, Minh N. Do
备注：Accepted and presented at IJCAI 2025 in Montreal, Canada
摘要：解决缺失的模态和有限的标记数据对于推进鲁棒的多模态学习至关重要。我们提出了Robult，一个可扩展的框架，旨在通过保留特定于模态的信息和利用冗余，通过一种新的信息理论的方法，以减轻这些挑战。Robult优化了两个核心目标：（1）软正未标记（PU）对比损失，最大化任务相关特征对齐，同时有效利用半监督设置中的有限标记数据，以及（2）潜在重建损失，确保保留独特的模态特定信息。这些策略嵌入在模块化设计中，增强了各种下游任务的性能，并确保在推理过程中对不完整模态的恢复能力。在不同数据集上的实验结果验证了Robult在半监督学习和缺失模态上下文中的性能优于现有方法。此外，它的轻量级设计促进了可扩展性和与现有架构的无缝集成，使其适用于现实世界的多模式应用程序。
摘要：Addressing missing modalities and limited labeled data is crucial for advancing robust multimodal learning. We propose Robult, a scalable framework designed to mitigate these challenges by preserving modality-specific information and leveraging redundancy through a novel information-theoretic approach. Robult optimizes two core objectives: (1) a soft Positive-Unlabeled (PU) contrastive loss that maximizes task-relevant feature alignment while effectively utilizing limited labeled data in semi-supervised settings, and (2) a latent reconstruction loss that ensures unique modality-specific information is retained. These strategies, embedded within a modular design, enhance performance across various downstream tasks and ensure resilience to incomplete modalities during inference. Experimental results across diverse datasets validate that Robult achieves superior performance over existing approaches in both semi-supervised learning and missing modality contexts. Furthermore, its lightweight design promotes scalability and seamless integration with existing architectures, making it suitable for real-world multimodal applications.

【5】DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling
标题：DPQuant：通过动态量化调度进行高效且差异私密的模型训练
链接：https://arxiv.org/abs/2509.03472

作者： Renbo Tu, Gennady Pekhimenko, Nandita Vijaykumar
摘要：差分隐私SGD（DP-SGD）是一种在使用敏感数据训练神经网络时保护用户隐私的强大技术。在训练期间，将模型权重和激活转换为低精度格式（即量化）可以大幅减少训练时间、能耗和成本，因此是一种广泛使用的技术。在这项工作中，我们证明了与常规SGD相比，量化在DP-SGD中会导致明显更大的精度下降。我们观察到，这是由DP-SGD中的噪声注入引起的，它放大了量化方差，导致不成比例的大幅精度下降。为了应对这一挑战，我们提出了DPQuant，一个动态量化框架，在每个训练轮次自适应地选择一个不断变化的层子集进行量化。我们的方法结合了两个能有效降低量化方差的关键思想：（i）对层进行概率采样，使每个轮次被量化的层轮换；（ii）损失感知的层优先级排序，它使用差分隐私的损失敏感度估计器来识别量化后对模型质量影响最小的层。该估计器只消耗总隐私预算中可忽略的一小部分，从而保留了DP保证。在一系列数据集上对ResNet18、ResNet50和DenseNet121进行的实证评估表明，DPQuant始终优于静态量化基线，实现了接近帕累托最优的精度-计算权衡，在低精度硬件上实现了高达2.21倍的理论吞吐量提升，验证精度下降不到2%。
摘要：Differentially-Private SGD (DP-SGD) is a powerful technique to protect user privacy when using sensitive data to train neural networks. During training, converting model weights and activations into low-precision formats, i.e., quantization, can drastically reduce training times, energy consumption, and cost, and is thus a widely used technique. In this work, we demonstrate that quantization causes significantly higher accuracy degradation in DP-SGD compared to regular SGD. We observe that this is caused by noise injection in DP-SGD, which amplifies quantization variance, leading to disproportionately large accuracy degradation. To address this challenge, we present DPQuant, a dynamic quantization framework that adaptively selects a changing subset of layers to quantize at each epoch. Our method combines two key ideas that effectively reduce quantization variance: (i) probabilistic sampling of the layers that rotates which layers are quantized every epoch, and (ii) loss-aware layer prioritization, which uses a differentially private loss sensitivity estimator to identify layers that can be quantized with minimal impact on model quality. This estimator consumes a negligible fraction of the overall privacy budget, preserving DP guarantees. Empirical evaluations on ResNet18, ResNet50, and DenseNet121 across a range of datasets demonstrate that DPQuant consistently outperforms static quantization baselines, achieving near Pareto-optimal accuracy-compute trade-offs and up to 2.21x theoretical throughput improvements on low-precision hardware, with less than 2% drop in validation accuracy.

【6】Initialization Schemes for Kolmogorov-Arnold Networks: An Empirical Study
标题：Kolmogorov-Arnold网络的初始化方案：实证研究
链接：https://arxiv.org/abs/2509.03417

作者：gas, Dhruv Verma, Georgios Alexandridis, Yixuan Wang
备注：30 pages, 19 figures
摘要：Kolmogorov-Arnold网络（KAN）是最近推出的一种神经架构，它用可训练的激活函数取代了固定的非线性，提供了增强的灵活性和可解释性。虽然KAN已经成功地应用于科学和机器学习任务，但它们的初始化策略在很大程度上仍未被探索。在这项工作中，我们研究了基于样条的KAN的初始化方案，提出了两种理论驱动的方法，灵感来自LeCun和Glorot，以及一个经验幂律家族可调指数。我们的评估结合了对函数拟合和前向PDE基准的大规模网格搜索，通过神经切线内核的镜头对训练动态的分析，以及对Feynman数据集子集的评估。我们的研究结果表明，Glorot启发的初始化在参数丰富的模型中显着优于基线，而幂律初始化在跨任务和不同大小的架构中实现了最强的整体性能。本手稿附带的所有代码和数据均可在https://github.com/srigas/KAN_Initialization_Schemes上公开获取。
摘要：Kolmogorov-Arnold Networks (KANs) are a recently introduced neural architecture that replace fixed nonlinearities with trainable activation functions, offering enhanced flexibility and interpretability. While KANs have been applied successfully across scientific and machine learning tasks, their initialization strategies remain largely unexplored. In this work, we study initialization schemes for spline-based KANs, proposing two theory-driven approaches inspired by LeCun and Glorot, as well as an empirical power-law family with tunable exponents. Our evaluation combines large-scale grid searches on function fitting and forward PDE benchmarks, an analysis of training dynamics through the lens of the Neural Tangent Kernel, and evaluations on a subset of the Feynman dataset. Our findings indicate that the Glorot-inspired initialization significantly outperforms the baseline in parameter-rich models, while power-law initialization achieves the strongest performance overall, both across tasks and for architectures of varying size. All code and data accompanying this manuscript are publicly available at https://github.com/srigas/KAN_Initialization_Schemes.

【7】Automatic Differentiation of Agent-Based Models
标题：基于Agent模型的自动微分
链接：https://arxiv.org/abs/2509.03303

作者：ra-Bofarull, Nicholas Bishop, Joel Dyer, Daniel Jarne Ornia, Anisoara Calinescu, Doyne Farmer, Michael Wooldridge
摘要：基于代理的模型（ABM）通过捕获构成系统的各个代理之间自下而上的交互来模拟复杂系统。许多令人关注的复杂系统，如流行病或金融市场，涉及数千甚至数百万个代理。因此，ABM往往计算量很大，并且依赖于对众多自由参数的校准，这大大阻碍了它们的广泛采用。在本文中，我们证明了自动微分（AD）技术可以有效减轻这些计算负担。通过将AD应用于ABM，模拟器的梯度变得容易获得，大大方便了校准和灵敏度分析等基本任务。具体来说，我们展示了AD如何使变分推理（VI）技术能够用于高效的参数校准。我们的实验表明，在三个著名的ABM（Axtell的公司模型、Sugarscape和SIR流行病学模型）上使用VI可以带来显著的性能提升和计算节省。因此，我们的方法显著提高了ABM在研究复杂系统时的实用性和可扩展性。
摘要：Agent-based models (ABMs) simulate complex systems by capturing the bottom-up interactions of individual agents comprising the system. Many complex systems of interest, such as epidemics or financial markets, involve thousands or even millions of agents. Consequently, ABMs often become computationally demanding and rely on the calibration of numerous free parameters, which has significantly hindered their widespread adoption. In this paper, we demonstrate that automatic differentiation (AD) techniques can effectively alleviate these computational burdens. By applying AD to ABMs, the gradients of the simulator become readily available, greatly facilitating essential tasks such as calibration and sensitivity analysis. Specifically, we show how AD enables variational inference (VI) techniques for efficient parameter calibration. Our experiments demonstrate substantial performance improvements and computational savings using VI on three prominent ABMs: Axtell's model of firms; Sugarscape; and the SIR epidemiological model. Our approach thus significantly enhances the practicality and scalability of ABMs for studying complex systems.

【8】HyPV-LEAD: Proactive Early-Warning of Cryptocurrency Anomalies through Data-Driven Structural-Temporal Modeling
Link: https://arxiv.org/abs/2509.03260

Authors: ark, Gyuyeon Na, Soyoun Kim, Sunyoung Moon, HyeonJeong Cha, Sangmi Chai
Abstract: Abnormal cryptocurrency transactions, such as mixing services, fraudulent transfers, and pump-and-dump operations, pose escalating risks to financial integrity but remain notoriously difficult to detect due to class imbalance, temporal volatility, and complex network dependencies. Existing approaches are predominantly model-centric and post hoc, flagging anomalies only after they occur and thus offering limited preventive value. This paper introduces HyPV-LEAD (Hyperbolic Peak-Valley Lead-time Enabled Anomaly Detection), a data-driven early-warning framework that explicitly incorporates lead time into anomaly detection. Unlike prior methods, HyPV-LEAD integrates three innovations: (1) window-horizon modeling to guarantee actionable lead-time alerts, (2) Peak-Valley (PV) sampling to mitigate class imbalance while preserving temporal continuity, and (3) hyperbolic embedding to capture the hierarchical and scale-free properties of blockchain transaction networks. Empirical evaluation on large-scale Bitcoin transaction data demonstrates that HyPV-LEAD consistently outperforms state-of-the-art baselines, achieving a PR-AUC of 0.9624 with significant gains in precision and recall. Ablation studies further confirm that each component (PV sampling, hyperbolic embedding, and structural-temporal modeling) provides complementary benefits, with the full framework delivering the highest performance. By shifting anomaly detection from reactive classification to proactive early warning, HyPV-LEAD establishes a robust foundation for real-time risk management, anti-money laundering (AML) compliance, and financial security in dynamic blockchain environments.
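
One plausible reading of "window-horizon modeling" is that each training sample pairs a look-back window with a label describing a future horizon that starts only after a lead-time gap. The sketch below labels samples that way. The function name, its parameters, and the exact labeling rule are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Tuple


def make_lead_time_samples(
    anomaly_flags: List[int], window: int, lead: int, horizon: int
) -> List[Tuple[int, int, int]]:
    """Return (window_start, window_end, label) triples.

    The observed window covers steps [start, end). The label is 1 if any
    anomaly occurs in [end + lead, end + lead + horizon), so a positive
    prediction always leaves a gap of `lead` steps in which to act.
    """
    samples = []
    last_start = len(anomaly_flags) - window - lead - horizon
    for start in range(last_start + 1):
        end = start + window
        future = anomaly_flags[end + lead : end + lead + horizon]
        samples.append((start, end, int(any(future))))
    return samples


# Toy series with a single anomaly at step 5.
flags = [0, 0, 0, 0, 0, 1, 0, 0]
samples = make_lead_time_samples(flags, window=2, lead=1, horizon=2)
# samples == [(0, 2, 0), (1, 3, 1), (2, 4, 1), (3, 5, 0)]
```

Windows ending at steps 3 and 4 are labeled positive even though nothing anomalous has been observed yet, which is what turns classification into early warning.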

【9】NeurStore: Efficient In-database Deep Learning Model Management System
Link: https://arxiv.org/abs/2509.03228

Authors: g, Sheng Wang, Xiaokui Xiao, Cong Yue, Zhanhao Zhao, Beng Chin Ooi
Comments: 15 pages, 14 figures, accepted at SIGMOD 2026
Abstract: With the prevalence of in-database AI-powered analytics, there is an increasing demand for database systems to efficiently manage the ever-expanding number and size of deep learning models. However, existing database systems typically store entire models as monolithic files or apply compression techniques that overlook the structural characteristics of deep learning models, resulting in suboptimal model storage overhead. This paper presents NeurStore, a novel in-database model management system that enables efficient storage and utilization of deep learning models. First, NeurStore employs a tensor-based model storage engine to enable fine-grained model storage within databases. In particular, we enhance the hierarchical navigable small world (HNSW) graph to index tensors, and store only additional deltas for tensors within a predefined similarity threshold to ensure tensor-level deduplication. Second, we propose a delta quantization algorithm that effectively compresses delta tensors, thus achieving a superior compression ratio with controllable model accuracy loss. Finally, we devise a compression-aware model loading mechanism, which improves model utilization performance by enabling direct computation on compressed tensors. Experimental evaluations demonstrate that NeurStore achieves superior compression ratios and competitive model loading throughput compared to state-of-the-art approaches.
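
The idea of storing only a quantized delta against a similar, already-stored tensor can be sketched in a few lines. This toy store is an assumption-laden illustration, not NeurStore's design: the class `DeltaTensorStore` is hypothetical, a linear scan stands in for the HNSW index, and plain uniform min-max quantization stands in for the paper's delta quantization algorithm.

```python
import math
from typing import Dict, List, Tuple


class DeltaTensorStore:
    """Toy tensor store: full "base" tensors plus quantized deltas."""

    def __init__(self, threshold: float, levels: int = 255):
        self.threshold = threshold      # max L2 distance for delta storage
        self.levels = levels            # number of quantization steps
        self.bases: List[List[float]] = []
        self.entries: Dict[str, Tuple] = {}

    def _nearest_base(self, t):
        # Linear scan; a real system would use an ANN index such as HNSW.
        best, best_d = None, math.inf
        for i, b in enumerate(self.bases):
            if len(b) != len(t):
                continue
            d = math.dist(b, t)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def put(self, name: str, t: List[float]) -> str:
        idx, d = self._nearest_base(t)
        if idx is None or d > self.threshold:
            # Nothing similar enough: store the tensor in full.
            self.bases.append(list(t))
            self.entries[name] = ("base", len(self.bases) - 1)
            return "base"
        delta = [x - b for x, b in zip(t, self.bases[idx])]
        lo, hi = min(delta), max(delta)
        scale = (hi - lo) / self.levels or 1.0   # avoid a zero step size
        codes = [round((v - lo) / scale) for v in delta]  # small integers
        self.entries[name] = ("delta", idx, lo, scale, codes)
        return "delta"

    def get(self, name: str) -> List[float]:
        entry = self.entries[name]
        if entry[0] == "base":
            return list(self.bases[entry[1]])
        _, idx, lo, scale, codes = entry
        return [b + lo + c * scale for b, c in zip(self.bases[idx], codes)]


store = DeltaTensorStore(threshold=0.5)
kind_a = store.put("model_a/layer0", [1.0, 2.0, 3.0])
kind_b = store.put("model_b/layer0", [1.01, 2.0, 2.98])   # near-duplicate
kind_c = store.put("model_c/layer0", [9.0, 9.0, 9.0])     # unrelated
restored = store.get("model_b/layer0")
```

The reconstruction error of a delta entry is bounded by half a quantization step, so the similarity threshold and the number of levels together control the accuracy loss.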

【10】Exploring the Design Space of Fair Tree Learning Algorithms
Link: https://arxiv.org/abs/2509.03204

Authors: mpel, Mattia Cerrato, Stefan Kramer
Abstract: Decision trees have been studied extensively in the context of fairness, aiming to maximize prediction performance while ensuring non-discrimination against different groups. Techniques in this space usually focus on imposing constraints at training time, constraining the search space so that solutions with unacceptable values of the relevant metrics are excluded, discarded, or discouraged. If we assume one target variable y and one sensitive attribute s, the design space of tree learning algorithms can be spanned as follows: (i) One can have one tree T that is built using an objective function that is a function of y, s, and T. For instance, one can build a tree based on the weighted information gain regarding y (maximizing) and s (minimizing). (ii) The second option is to have one tree model T that uses an objective function in y and T and a constraint on s and T. Here, s is no longer part of the objective, but part of a constraint. This can be achieved greedily by aborting a further split as soon as the condition that optimizes the objective in y fails to satisfy the constraint on s. A simple way to explore other splits is to backtrack during tree construction once a fairness constraint is violated. (iii) The third option is to have two trees T_y and T_s, one for y and one for s, such that the tree structure for y and s does not have to be shared. In this way, information regarding y and regarding s can be used independently, without having to constrain the choices in tree construction by the mutual information between the two variables. Quite surprisingly, of the three options, only the first one and the greedy variant of the second have been studied in the literature so far. In this paper, we introduce the above two additional options from that design space and characterize them experimentally on multiple datasets.
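
Option (i) above, a single objective that rewards information about y and penalizes information about s, can be sketched as a split score. The linear form `gain(y) - lam * gain(s)` and the names `fair_split_score` and `lam` are assumptions for this example; the abstract only says the gain is "weighted".

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(labels: Sequence[int], go_left: Sequence[bool]) -> float:
    """Entropy reduction obtained by splitting `labels` with a boolean mask."""
    left = [v for v, m in zip(labels, go_left) if m]
    right = [v for v, m in zip(labels, go_left) if not m]
    if not left or not right:
        return 0.0
    n = len(labels)
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - children


def fair_split_score(y, s, go_left, lam: float = 1.0) -> float:
    """Reward information about target y, penalize information about s."""
    return info_gain(y, go_left) - lam * info_gain(s, go_left)


y = [1, 1, 0, 0]
s = [0, 1, 0, 1]
split_on_y = [True, True, False, False]   # separates y, independent of s
split_on_s = [True, False, True, False]   # separates s, independent of y
score_good = fair_split_score(y, s, split_on_y)   # 1 bit gained on y, 0 on s
score_bad = fair_split_score(y, s, split_on_s)    # 0 bits on y, 1 bit on s
```

A greedy learner would pick the candidate split with the highest score at each node; options (ii) and (iii) change the structure of the search rather than this scoring function.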

【11】Tabular foundation model for GEOAI benchmark problems BM/AirportSoilProperties/2/2025
Link: https://arxiv.org/abs/2509.03191

Authors: to, Yu Otake, Stephen Wu
Abstract: This paper presents a novel application of the Tabular Prior-Data Fitted Network (TabPFN), a transformer-based foundation model for tabular data, to geotechnical site characterization problems defined in the GEOAI benchmark BM/AirportSoilProperties/2/2025. Two tasks are addressed: (1) predicting the spatial variation of undrained shear strength (su) across borehole depth profiles, and (2) imputing missing mechanical parameters in a dense-site dataset. We apply TabPFN in a zero-training, few-shot, in-context learning setting, without hyper-parameter tuning, and provide it with additional context from the big indirect database (BID). The study demonstrates that TabPFN, as a general-purpose foundation model, achieved superior accuracy and well-calibrated predictive distributions compared to a conventional hierarchical Bayesian model (HBM) baseline, while also offering significant gains in inference efficiency. In Benchmark Problem #1 (spatial su prediction), TabPFN outperformed the HBM in prediction accuracy and delivered an order-of-magnitude faster runtime. In Benchmark Problem #2 (missing mechanical parameter imputation), TabPFN likewise achieved lower RMSE for all target parameters with well-quantified uncertainties, though its cumulative computation cost was higher than HBM's due to its one-variable-at-a-time inference. These results mark the first successful use of a tabular foundation model in geotechnical modeling, suggesting a potential paradigm shift in probabilistic site characterization.

【12】Temporally-Aware Diffusion Model for Brain Progression Modelling with Bidirectional Temporal Regularisation
Link: https://arxiv.org/abs/2509.03141

Authors: trico, Francesco Guarnera, Mario Valerio Giuffrida, Daniele Ravì, Sebastiano Battiato
Abstract: Generating realistic MRIs to accurately predict future changes in the structure of the brain is an invaluable tool for clinicians in assessing clinical outcomes and analysing disease progression at the patient level. However, existing methods present some limitations: (i) some approaches fail to explicitly capture the relationship between structural changes and time intervals, especially when trained on age-imbalanced datasets; (ii) others rely only on scan interpolation, which lacks clinical utility, as they generate intermediate images between timepoints rather than future pathological progression; and (iii) most approaches rely on 2D slice-based architectures, thereby disregarding full 3D anatomical context, which is essential for accurate longitudinal predictions. We propose a 3D Temporally-Aware Diffusion Model (TADM-3D), which accurately predicts brain progression on MRI volumes. To better model the relationship between time interval and brain changes, TADM-3D uses a pre-trained Brain-Age Estimator (BAE) that guides the diffusion model in the generation of MRIs that accurately reflect the expected age difference between baseline and generated follow-up scans. Additionally, to further improve the temporal awareness of TADM-3D, we propose Back-In-Time Regularisation (BITR), which trains TADM-3D to predict bidirectionally, from the baseline to the follow-up (forward) as well as from the follow-up to the baseline (backward). Although predicting past scans has limited clinical applications, this regularisation helps the model generate temporally more accurate scans. We train and evaluate TADM-3D on the OASIS-3 dataset, and we validate the generalisation performance on an external test set from the NACC dataset. The code will be available upon acceptance.

【13】A Neural Network Approach to Multi-radionuclide TDCR Beta Spectroscopy
Link: https://arxiv.org/abs/2509.03137

Authors: an Yang
Comments: 15 pages, 3 figures
Abstract: Liquid scintillation triple-to-double coincidence ratio (TDCR) spectroscopy is widely adopted as a standard method for radionuclide quantification because of its inherent advantages such as high precision, self-calibrating capability, and independence from radioactive reference sources. However, multiradionuclide analysis via TDCR faces the challenges of limited automation and reliance on mixture-specific standards, which may not be easily available. Here, we present an Artificial Intelligence (AI) framework that combines numerical spectral simulation and deep learning for standard-free automated analysis. $\beta$ spectra for model training were generated using Geant4 simulations coupled with statistically modeled detector response sampling. A tailored neural network architecture, trained on this dataset covering various nuclide mixing ratios and quenching scenarios, enables autonomous resolution of individual radionuclide activities and detection efficiencies through end-to-end learning. The model delivers consistently high accuracy across tasks: activity proportions (mean absolute error = 0.009), detection efficiencies (mean absolute error = 0.002), and spectral reconstruction (Structural Similarity Index = 0.9998), validating its physical plausibility for quenched $\beta$ spectroscopy. This AI-driven methodology exhibits significant potential for automated safety-compliant multiradionuclide analysis with robust generalization, real-time processing capabilities, and engineering feasibility, particularly in scenarios where reference materials are unavailable or rapid field analysis is required.

【14】Structured Basis Function Networks: Loss-Centric Multi-Hypothesis Ensembles with Controllable Diversity
Link: https://arxiv.org/abs/2509.02792

Authors:  Rodriguez Dominguez, Muhammad Shahzad, Xia Hong
Comments: 32 pages, 10 figures, 11 tables
Abstract: Existing approaches to predictive uncertainty rely either on multi-hypothesis prediction, which promotes diversity but lacks principled aggregation, or on ensemble learning, which improves accuracy but rarely captures structured ambiguity. This implicitly means that a unified framework consistent with the loss geometry remains absent. The Structured Basis Function Network addresses this gap by linking multi-hypothesis prediction and ensembling through centroidal aggregation induced by Bregman divergences. The formulation applies across regression and classification by aligning predictions with the geometry of the loss, and supports both a closed-form least-squares estimator and a gradient-based procedure for general objectives. A tunable diversity mechanism provides parametric control of the bias-variance-diversity trade-off, connecting multi-hypothesis generalisation with loss-aware ensemble aggregation. Experiments validate this relation and use the mechanism to study the complexity-capacity-diversity trade-off across datasets of increasing difficulty with deep-learning predictors.

【15】The Transparent Earth: A Multimodal Foundation Model for the Earth's Subsurface
Link: https://arxiv.org/abs/2509.02783

Authors: umder, Javier E. Santos, Noah Hobbs, Mohamed Mehana, Daniel O'Malley
Abstract: We present the Transparent Earth, a transformer-based architecture for reconstructing subsurface properties from heterogeneous datasets that vary in sparsity, resolution, and modality, where each modality represents a distinct type of observation (e.g., stress angle, mantle temperature, tectonic plate type). The model incorporates positional encodings of observations together with modality encodings, derived from a text embedding model applied to a description of each modality. This design enables the model to scale to an arbitrary number of modalities, making it straightforward to add new ones not considered in the initial design. We currently include eight modalities spanning directional angles, categorical classes, and continuous properties such as temperature and thickness. These capabilities support in-context learning, enabling the model to generate predictions either with no inputs or with an arbitrary number of additional observations from any subset of modalities. On validation data, this reduces errors in predicting stress angle by more than a factor of three. The proposed architecture is scalable and demonstrates improved performance with increased parameters. Together, these advances make the Transparent Earth an initial foundation model for the Earth's subsurface that ultimately aims to predict any subsurface property anywhere on Earth.

【16】Mentality: A Mamba-based Approach towards Foundation Models for EEG
Link: https://arxiv.org/abs/2509.02746

Authors: anchavati, Corey Arnold, William Speier
Abstract: This work explores the potential of foundation models, specifically a Mamba-based selective state space model, for enhancing EEG analysis in neurological disorder diagnosis. EEG, crucial for diagnosing conditions like epilepsy, presents significant challenges due to its noisy, high-dimensional, and nonlinear nature. Traditional machine learning methods have made advances in automating EEG analysis but often fail to capture its complex spatio-temporal dynamics. Recent advances in deep learning, particularly in sequence modeling, offer new avenues for creating more generalized and expressive models capable of handling such complexities. By training a Mamba-based model on a large dataset containing seizure and non-seizure EEG recordings through a self-supervised reconstruction task followed by a seizure detection task, we demonstrate the model's effectiveness, achieving an AUROC of 0.72 on a held-out test set. This approach marks a significant step toward developing large-scale, clinically applicable foundation models for EEG data analysis.

【17】Beyond Synthetic Augmentation: Group-Aware Threshold Calibration for Robust Balanced Accuracy in Imbalanced Learning
Link: https://arxiv.org/abs/2509.02592

Authors: ttlin
Comments: Accepted to the AIDEM'25 conference at ECML; to be published in Springer (LNCS)
Abstract: Class imbalance remains a fundamental challenge in machine learning, with traditional solutions often creating as many problems as they solve. We demonstrate that group-aware threshold calibration (setting different decision thresholds for different demographic groups) provides superior robustness compared to synthetic data generation methods. Through extensive experiments, we show that group-specific thresholds achieve 1.5-4% higher balanced accuracy than SMOTE and CT-GAN augmented models while improving worst-group balanced accuracy. Unlike single-threshold approaches that apply one cutoff across all groups, our group-aware method optimizes the Pareto frontier between balanced accuracy and worst-group balanced accuracy, enabling fine-grained control over group-level performance. Critically, we find that applying group thresholds to synthetically augmented data yields minimal additional benefit, suggesting these approaches are fundamentally redundant. Our results span seven model families including linear, tree-based, instance-based, and boosting methods, confirming that group-aware threshold calibration offers a simpler, more interpretable, and more effective solution to class imbalance.
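
A minimal sketch of group-aware threshold calibration, assuming a simple grid search that maximizes balanced accuracy separately within each group. The helper names, the grid, and the toy data are hypothetical, and each group is assumed to contain both classes. In practice the thresholds would be fitted on held-out validation data rather than on the test set.

```python
from typing import Dict, Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the true-positive rate and the true-negative rate."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    tpr = sum(pos) / len(pos)
    tnr = sum(1 - p for p in neg) / len(neg)
    return (tpr + tnr) / 2


def fit_group_thresholds(scores, y_true, groups, grid) -> Dict[str, float]:
    """Pick, per group, the grid threshold with the best balanced accuracy."""
    thresholds = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        ys = [y_true[i] for i in idx]
        best_t, best_ba = grid[0], -1.0
        for t in grid:
            preds = [int(scores[i] >= t) for i in idx]
            ba = balanced_accuracy(ys, preds)
            if ba > best_ba:        # ties keep the smaller threshold
                best_t, best_ba = t, ba
        thresholds[g] = best_t
    return thresholds


# Group "b" receives systematically lower scores than group "a".
scores = [0.9, 0.8, 0.3, 0.2, 0.4, 0.35, 0.1, 0.05]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["a"] * 4 + ["b"] * 4
grid = [0.1, 0.3, 0.5, 0.7]
thresholds = fit_group_thresholds(scores, y_true, groups, grid)
# thresholds == {"a": 0.5, "b": 0.3}
```

With a single cutoff of 0.5, every positive in group "b" would be missed; the per-group cutoff of 0.3 recovers them without touching the model or the training data.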

【18】The Lifecycle Principle: Stabilizing Dynamic Neural Networks with State Memory
Link: https://arxiv.org/abs/2509.02575

Authors: ang
Comments: 8 pages, 1 figure
Abstract: I investigate a stronger form of regularization by deactivating neurons for extended periods, a departure from the temporary changes of methods like Dropout. However, this long-term dynamism introduces a critical challenge: severe training instability when neurons are revived with random weights. To solve this, I propose the Lifecycle (LC) principle, a regularization mechanism centered on a key innovation: state memory. Instead of re-initializing a revived neuron, my method restores its parameters to their last known effective state. This process preserves learned knowledge and avoids destructive optimization shocks. My theoretical analysis reveals that the LC principle smooths the loss landscape, guiding optimization towards flatter minima associated with better generalization. Experiments on image classification benchmarks demonstrate that my method improves generalization and robustness. Crucially, ablation studies confirm that state memory is essential for achieving these gains.
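
The state-memory mechanism can be sketched with a toy layer: when a neuron is deactivated its weights are saved, and on revival they are restored instead of being re-drawn at random. The class `LifecycleLayer` and its methods are hypothetical names for this illustration; how long neurons stay inactive and how they are selected is not specified in the abstract and is left out here.

```python
import random
from typing import Dict, List


class LifecycleLayer:
    """Toy linear layer whose neurons can be deactivated and later revived."""

    def __init__(self, n_neurons: int, n_inputs: int, seed: int = 0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0, 1) for _ in range(n_inputs)]
                        for _ in range(n_neurons)]
        self.active = [True] * n_neurons
        self.memory: Dict[int, List[float]] = {}   # state memory

    def deactivate(self, j: int) -> None:
        self.memory[j] = list(self.weights[j])     # remember last state
        self.active[j] = False

    def revive(self, j: int) -> None:
        # Restore the remembered weights instead of re-initializing.
        self.weights[j] = self.memory.pop(j)
        self.active[j] = True

    def forward(self, x: List[float]) -> List[float]:
        # Inactive neurons output zero for as long as they are switched off.
        return [sum(w * xi for w, xi in zip(ws, x)) if on else 0.0
                for ws, on in zip(self.weights, self.active)]


layer = LifecycleLayer(n_neurons=3, n_inputs=2)
x = [1.0, -2.0]
before = layer.forward(x)
layer.deactivate(1)
during = layer.forward(x)     # neuron 1 contributes nothing
layer.revive(1)
after = layer.forward(x)      # identical to `before`
```

Because the revived neuron resumes from its previous weights, the layer's output is continuous across the revival, which is the "no optimization shock" property the abstract describes.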

【19】Learning AC Power Flow Solutions using a Data-Dependent Variational Quantum Circuit
Link: https://arxiv.org/abs/2509.03495

Authors: t Le, Md Obaidur Rahman, Vassilis Kekatos
Comments: 7 pages, 6 figures, accepted for the IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids 2025
Abstract: Interconnection studies require solving numerous instances of the AC load or power flow (AC PF) problem to simulate diverse scenarios as power systems navigate the ongoing energy transition. To expedite such studies, this work leverages recent advances in quantum computing to find or predict AC PF solutions using a variational quantum circuit (VQC). VQCs are trainable models that run on modern-day noisy intermediate-scale quantum (NISQ) hardware to accomplish elaborate optimization and machine learning (ML) tasks. Our first contribution is to pose a single instance of the AC PF as a nonlinear least-squares fit over the VQC trainable parameters (weights) and solve it using a hybrid classical/quantum computing approach. The second contribution is to feed PF specifications as features into a data-embedded VQC and train the resultant quantum ML (QML) model to predict general PF solutions. The third contribution is to develop a novel protocol to efficiently measure AC-PF quantum observables by exploiting the graph structure of a power network. Preliminary numerical tests indicate that the proposed VQC models attain enhanced prediction performance over a deep neural network despite using far fewer weights. The proposed quantum AC-PF framework sets the foundations for addressing more elaborate grid tasks via quantum computing.

【20】An Effective Strategy for Modeling Score Ordinality and Non-uniform Intervals in Automated Speaking Assessment
Link: https://arxiv.org/abs/2509.03372

Authors:  Lo, Szu-Yu Chen, Yao-Ting Sung, Berlin Chen
Comments: Accepted at ASRU 2025
Abstract: A recent line of research on automated speaking assessment (ASA) has benefited from self-supervised learning (SSL) representations, which capture rich acoustic and linguistic patterns in non-native speech without underlying assumptions of feature curation. However, speech-based SSL models capture acoustic-related traits but overlook linguistic content, while text-based SSL models rely on ASR output and fail to encode prosodic nuances. Moreover, most prior work treats proficiency levels as nominal classes, ignoring their ordinal structure and the non-uniform intervals between proficiency labels. To address these limitations, we propose an effective ASA approach combining SSL with handcrafted indicator features via a novel modeling paradigm. We further introduce a multi-margin ordinal loss that jointly models both the score ordinality and the non-uniform intervals of proficiency labels. Extensive experiments on the TEEMI corpus show that our method consistently outperforms strong baselines and generalizes well to unseen prompts.

【21】Bayesian Additive Regression Trees for functional ANOVA model
Link: https://arxiv.org/abs/2509.03317

Authors: ark, Insung Kong, Yongdai Kim
Abstract: Bayesian Additive Regression Trees (BART) is a powerful statistical model that leverages the strengths of Bayesian inference and regression trees. It has received significant attention for capturing complex non-linear relationships and interactions among predictors. However, the accuracy of BART often comes at the cost of interpretability. To address this limitation, we propose ANOVA Bayesian Additive Regression Trees (ANOVA-BART), a novel extension of BART based on the functional ANOVA decomposition, which is used to decompose the variability of a function into different interactions, each representing the contribution of a different set of covariates or factors. Our proposed ANOVA-BART enhances interpretability, preserves and extends the theoretical guarantees of BART, and achieves superior predictive performance. Specifically, we establish that the posterior concentration rate of ANOVA-BART is nearly minimax optimal, and we further provide the same convergence rates for each interaction, which are not available for BART. Moreover, comprehensive experiments confirm that ANOVA-BART surpasses BART in both accuracy and uncertainty quantification, while also demonstrating its effectiveness in component selection. These results suggest that ANOVA-BART offers a compelling alternative to BART by balancing predictive accuracy, interpretability, and theoretical consistency.
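
For orientation, the functional ANOVA decomposition referred to above writes a function of p covariates as a constant plus main effects plus interactions of increasing order. The notation below (p, f_j, f_{jk}) is the usual textbook convention and is assumed here rather than taken from the paper:

```latex
f(x_1,\dots,x_p)
  = f_{\emptyset}
  + \sum_{j=1}^{p} f_j(x_j)
  + \sum_{1 \le j < k \le p} f_{jk}(x_j, x_k)
  + \cdots
  + f_{1 \cdots p}(x_1,\dots,x_p)
```

Each component depends only on the covariates in its index set, which is what makes the individual terms interpretable as main effects and interactions.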

【22】Fast kernel methods: Sobolev, physics-informed, and additive models
Link: https://arxiv.org/abs/2509.02649

Authors: umèche (LPSM, EDF R&D OSIRIS), Francis Bach (ENS-PSL), Gérard Biau (LPSM, IUF), Claire Boyer (LMO)
Abstract: Kernel methods are powerful tools in statistical learning, but their cubic complexity in the sample size n limits their use on large-scale datasets. In this work, we introduce a scalable framework for kernel regression with O(n log n) complexity, fully leveraging GPU acceleration. The approach is based on a Fourier representation of kernels combined with non-uniform fast Fourier transforms (NUFFT), enabling exact, fast, and memory-efficient computations. We instantiate our framework in three settings: Sobolev kernel regression, physics-informed regression, and additive models. Where minimax rates are known, the proposed estimators are shown to achieve them, consistent with classical kernel theory. Empirical results demonstrate that our methods can process up to tens of billions of samples within minutes, providing both statistical accuracy and computational scalability. These contributions establish a flexible approach, paving the way for the routine application of kernel methods in large-scale learning tasks.

Other (29 papers)

【1】Warming Up for Zeroth-Order Federated Pre-Training with Low Resource Clients
Link: https://arxiv.org/abs/2509.03503

Authors: te, Irina Rish, Eugene Belilovsky
Abstract: Federated learning enables collaborative model training across numerous edge devices without requiring participants to share data; however, memory and communication constraints on these edge devices may preclude their participation in training. We consider a setting in which a subset of edge devices are below a critical memory or communication threshold required to conduct model updates. Under typical federated optimization algorithms, these devices are excluded from training, which renders their data inaccessible and increases system-induced bias. We are inspired by MeZO, a zeroth-order method used for memory-efficient fine-tuning. The increased variance inherent to zeroth-order gradient approximations has relegated previous zeroth-order optimizers exclusively to the domain of fine-tuning, a limitation we seek to correct. We devise a federated, memory-efficient zeroth-order optimizer, ZOWarmUp, that permits zeroth-order training from a random initialization. ZOWarmUp leverages differing client capabilities and careful variance reduction techniques to facilitate participation of under-represented, low-resource clients in model training. Like other federated zeroth-order methods, ZOWarmUp eliminates the need for edge devices to transmit their full gradients to the server and instead relies on only a small set of random seeds, rendering the up-link communication cost negligible. We present experiments using various datasets and model architectures to show that ZOWarmUp is a robust algorithm that can be applied under a wide variety of circumstances. For systems with a high proportion of edge devices that would otherwise be excluded from training, this algorithm provides access to a greater volume and diversity of data, thus improving training outcomes.
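
The seed-based communication pattern of federated zeroth-order methods can be sketched generically. This is not ZOWarmUp itself (its warm-up phase and variance reduction are not described in enough detail here); it is a plain two-point zeroth-order estimate in which each client sends only a seed and one scalar. All function names, the toy quadratic loss, and the hyperparameters are assumptions for the example.

```python
import random
from typing import Callable, List, Tuple


def perturbation(seed: int, dim: int) -> List[float]:
    """Gaussian direction that anyone can regenerate from the seed alone."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def client_step(loss: Callable[[List[float]], float], theta: List[float],
                seed: int, eps: float = 1e-3) -> Tuple[int, float]:
    """Two forward passes, no backpropagation; returns (seed, scalar)."""
    z = perturbation(seed, len(theta))
    plus = loss([t + eps * zi for t, zi in zip(theta, z)])
    minus = loss([t - eps * zi for t, zi in zip(theta, z)])
    return seed, (plus - minus) / (2 * eps)


def server_update(theta: List[float], messages, lr: float) -> List[float]:
    """Rebuild each direction from its seed and average the updates."""
    new = list(theta)
    for seed, g in messages:
        z = perturbation(seed, len(theta))
        for i, zi in enumerate(z):
            new[i] -= lr * g * zi / len(messages)
    return new


def loss(theta):                       # toy quadratic with minimum at 0
    return sum(t * t for t in theta)


theta = [1.0, -1.0, 0.5]
start = loss(theta)
for round_id in range(200):
    # Four simulated clients per round, each with its own seed.
    msgs = [client_step(loss, theta, seed=round_id * 10 + c) for c in range(4)]
    theta = server_update(theta, msgs, lr=0.05)
final = loss(theta)
```

The uplink payload per client is one integer and one float regardless of model size, and the client never stores gradients or activations for backpropagation, which is where the memory saving comes from.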

【2】Geometric Foundations of Tuning without Forgetting in Neural ODEs
标题：神经ODE中不忘记调整的几何基础
链接：https://arxiv.org/abs/2509.03474

作者：ram, Mohamed-Ali Belabbas, Tamer Başar
摘要：在我们早期的工作中，我们介绍了用于神经ODE的顺序训练的Tuning without Forgetting（TwF）原理，其中训练样本被迭代地添加，并且参数在控制函数的子空间内更新，该子空间在一阶近似意义上保留了输出标签流形上先前学习样本的端点映射。本文证明了这个参数子空间在非奇异控制下构成一个有限余维的Banach子流形，并刻画了它的切空间。这表明TwF对应于控制函数沿着这个Banach子流形的切空间的延续/变形，为它在顺序训练期间的映射保持（不忘记）提供了理论基础，超越了一阶近似。
摘要：In our earlier work, we introduced the principle of Tuning without Forgetting (TwF) for sequential training of neural ODEs, where training samples are added iteratively and parameters are updated within the subspace of control functions that preserves the end-point mapping at previously learned samples on the manifold of output labels in the first-order approximation sense. In this letter, we prove that this parameter subspace forms a Banach submanifold of finite codimension under nonsingular controls, and we characterize its tangent space. This reveals that TwF corresponds to a continuation/deformation of the control function along the tangent space of this Banach submanifold, providing a theoretical foundation for its mapping-preserving (not forgetting) during the sequential training exactly, beyond first-order approximation.

【3】Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
标题：超越正确：通过RL训练协调流程和结果奖励
链接：https://arxiv.org/abs/2509.03403

作者：, Zhou Yu, Ziji Zhang, Hao Chen, Narayanan Sadagopan, Jing Huang, Tong Zhang, Anurag Beniwal
摘要：具有可验证奖励的强化学习（RLVR）已成为数学推理任务的主要范式，能够稳定地提升推理能力。然而，RLVR中的结果奖励模型（ORM）粒度过粗，无法区分正确答案中的错误推理或错误答案中的有效推理。这种粒度的缺乏引入了大量带噪声且具有误导性的梯度，阻碍了推理过程质量的进一步提升。虽然过程奖励模型（PRM）为中间步骤提供了细粒度的指导，但它们经常不够准确，并且容易受到奖励黑客（reward hacking）的影响。为了解决这一困境，我们提出了过程一致性过滤器（PROF），这是一种有效的数据筛选方法，可以将带噪声的细粒度过程奖励与准确的粗粒度结果奖励相协调。PROF不是在目标函数中简单地混合PRM和ORM（arXiv:archive/2506.18896），而是通过一致性驱动的样本选择来利用二者的互补优势。我们的方法保留平均过程值较高的正确响应和平均过程值较低的错误响应，同时保持正/负训练样本的平衡。大量实验表明，与混合方法相比，我们的方法不仅使最终准确率稳定提高超过4%，还提升了中间推理步骤的质量。代码和训练配方可在https://github.com/Chenluye99/PROF上获得。
摘要：Reinforcement learning with verifiable rewards (RLVR) has emerged to be a predominant paradigm for mathematical reasoning tasks, offering stable improvements in reasoning ability. However, Outcome Reward Models (ORMs) in RLVR are too coarse-grained to distinguish flawed reasoning within correct answers or valid reasoning within incorrect answers. This lack of granularity introduces noisy and misleading gradients significantly and hinders further progress in reasoning process quality. While Process Reward Models (PRMs) offer fine-grained guidance for intermediate steps, they frequently suffer from inaccuracies and are susceptible to reward hacking. To resolve this dilemma, we introduce PRocess cOnsistency Filter (PROF), an effective data process curation method that harmonizes noisy, fine-grained process rewards with accurate, coarse-grained outcome rewards. Rather than naively blending PRM and ORM in the objective function (arXiv:archive/2506.18896), PROF leverages their complementary strengths through consistency-driven sample selection. Our approach retains correct responses with higher averaged process values and incorrect responses with lower averaged process values, while maintaining positive/negative training sample balance. Extensive experiments demonstrate that our method not only consistently improves the final accuracy over $4\%$ compared to the blending approaches, but also strengthens the quality of intermediate reasoning steps. Codes and training recipes are available at https://github.com/Chenluye99/PROF.
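The selection rule described in the abstract (keep correct responses with high averaged process values and incorrect responses with low ones, while balancing the two groups) might be sketched as follows. This is only an illustration under stated assumptions: the dictionary fields `correct` and `step_scores`, the function name, and the "top-k per class" balancing rule are hypothetical and not taken from the PROF repository.

```python
from statistics import mean

def consistency_filter(samples, keep_per_class):
    """Keep samples whose process scores agree with their outcome label."""
    correct = [s for s in samples if s["correct"]]
    incorrect = [s for s in samples if not s["correct"]]
    # Correct answers: prefer the highest average process reward.
    correct.sort(key=lambda s: mean(s["step_scores"]), reverse=True)
    # Incorrect answers: prefer the lowest average process reward.
    incorrect.sort(key=lambda s: mean(s["step_scores"]))
    # Keep the same number from each group so positives and negatives stay balanced.
    k = min(keep_per_class, len(correct), len(incorrect))
    return correct[:k] + incorrect[:k]

samples = [
    {"id": "a", "correct": True, "step_scores": [0.9, 0.8]},
    {"id": "b", "correct": True, "step_scores": [0.2, 0.3]},   # right answer, weak reasoning
    {"id": "c", "correct": False, "step_scores": [0.1, 0.2]},
    {"id": "d", "correct": False, "step_scores": [0.7, 0.9]},  # wrong answer, strong reasoning
]
kept_ids = [s["id"] for s in consistency_filter(samples, keep_per_class=1)]
```

Samples "b" and "d" are the inconsistent cases the abstract describes, where the outcome label and the process reward disagree, so they are the ones dropped.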

【4】The distribution of calibrated likelihood functions on the probability-likelihood Aitchison simplex
标题：概率-似然Aitchison单形上校准似然函数的分布
链接：https://arxiv.org/abs/2509.03365

作者：hier Noé, Andreas Nautsch, Driss Matrouf, Pierre-Michel Bousquet, Jean-François Bonastre
备注：Preprint. Under review
摘要：虽然概率预测的校准已被广泛研究，但本文讨论的是似然函数的校准。这一问题此前已有讨论，尤其是在生物识别领域，针对只有两个穷尽且互斥的假设（类别）的情形，此时似然函数可以写成对数似然比（LLR）。在定义了LLR的校准及其与证据权重概念的联系之后，我们介绍了幂等性及其对LLR分布的相关约束。虽然这些结果已为人所知数十年，但它们仅限于二元情形。在这里，我们利用单纯形的Aitchison几何将它们推广到两个以上假设的情形，这使我们能够以向量形式恢复贝叶斯规则的加法形式，从而将LLR和证据权重推广到任意数量的假设。特别地，我们将校准的定义、幂等性以及似然函数分布的约束推广到LLR的多假设、多类别对应物：经等距对数比变换的似然函数。这项工作主要是概念性的，但我们仍然给出了一个机器学习应用，提出了一种非线性判别分析，其中判别分量构成类别上的校准似然函数，从而提高了方法的可解释性和可靠性。
摘要：While calibration of probabilistic predictions has been widely studied, this paper rather addresses calibration of likelihood functions. This has been discussed, especially in biometrics, in cases with only two exhaustive and mutually exclusive hypotheses (classes) where likelihood functions can be written as log-likelihood-ratios (LLRs). After defining calibration for LLRs and its connection with the concept of weight-of-evidence, we present the idempotence property and its associated constraint on the distribution of the LLRs. Although these results have been known for decades, they have been limited to the binary case. Here, we extend them to cases with more than two hypotheses by using the Aitchison geometry of the simplex, which allows us to recover, in a vector form, the additive form of the Bayes' rule; extending therefore the LLR and the weight-of-evidence to any number of hypotheses. Especially, we extend the definition of calibration, the idempotence, and the constraint on the distribution of likelihood functions to this multiple hypotheses and multiclass counterpart of the LLR: the isometric-log-ratio transformed likelihood function. This work is mainly conceptual, but we still provide one application to machine learning by presenting a non-linear discriminant analysis where the discriminant components form a calibrated likelihood function over the classes, improving therefore the interpretability and the reliability of the method.

【5】Fair Resource Allocation for Fleet Intelligence
标题：面向车队智能的公平资源分配
链接：https://arxiv.org/abs/2509.03353

作者：aser, Kaan Kale, Po-han Li, Sandeep Chinchali
备注：This paper has been accepted for presentation at the 2025 IEEE Global Communications Conference (GLOBECOM 2025)
摘要：资源分配对云辅助多智能体智能的性能优化至关重要。传统方法往往忽略了智能体多样的计算能力和复杂的运行环境，导致资源分配低效且不公平。为了解决这个问题，我们开源了Fair-Synergy，这是一个算法框架，它利用智能体准确率与系统资源之间的凹关系，确保在车队智能中公平地分配资源。我们扩展了传统的分配方法，以涵盖由模型参数、训练数据量和任务复杂性定义的多维机器学习效用空间。我们使用先进的视觉和语言模型（如BERT、VGG16、MobileNet和ResNets），在包括MNIST、CIFAR-10、CIFAR-100、BDD和GLUE在内的数据集上评估Fair-Synergy。我们证明，Fair-Synergy在多智能体推理设置中比标准基准最多高出25%，在多智能体学习设置中高出11%。此外，我们探讨了公平性水平如何影响最弱势、最优势和平均水平的智能体，为实现公平的车队智能提供了见解。
摘要：Resource allocation is crucial for the performance optimization of cloud-assisted multi-agent intelligence. Traditional methods often overlook agents' diverse computational capabilities and complex operating environments, leading to inefficient and unfair resource distribution. To address this, we open-sourced Fair-Synergy, an algorithmic framework that utilizes the concave relationship between the agents' accuracy and the system resources to ensure fair resource allocation across fleet intelligence. We extend traditional allocation approaches to encompass a multidimensional machine learning utility landscape defined by model parameters, training data volume, and task complexity. We evaluate Fair-Synergy with advanced vision and language models such as BERT, VGG16, MobileNet, and ResNets on datasets including MNIST, CIFAR-10, CIFAR-100, BDD, and GLUE. We demonstrate that Fair-Synergy outperforms standard benchmarks by up to 25% in multi-agent inference and 11% in multi-agent learning settings. Also, we explore how the level of fairness affects the least advantaged, most advantaged, and average agents, providing insights for equitable fleet intelligence.

【6】Generative Auto-Bidding in Large-Scale Competitive Auctions via Diffusion Completer-Aligner
标题：基于扩散补全-对齐器的大规模竞争性拍卖生成式自动出价
链接：https://arxiv.org/abs/2509.03348

作者： Jingtong Gao, Nan Jiang, Shuai Mao, Ruyi An, Fei Pan, Xiangyu Zhao, Bo An, Qingpeng Cai, Peng Jiang
摘要：自动出价是计算广告的核心，通过在经济约束下优化广告主的出价取得了显著的商业成功。最近，大型生成模型显示出彻底改变自动出价的潜力，其生成的出价可以灵活适应复杂的竞争环境。其中，扩散模型（diffuser）脱颖而出：它通过关注轨迹级别的累积奖励来应对稀疏奖励的挑战，并且具有可解释性，即先规划未来的状态轨迹，再据此执行出价。然而，扩散模型受生成不确定性的困扰，特别是相邻状态之间的动态合法性问题，这可能导致糟糕的出价，并在高度竞争的拍卖环境中与其他广告主竞争时进一步造成广告展示机会的重大损失。为了解决这个问题，我们提出了一种基于扩散补全-对齐器框架的因果自动出价方法，称为CBD。首先，我们在扩散训练过程中增加了一个额外的随机变量t，模型观察长度为t的历史序列，目标是补全剩余的序列，从而增强生成序列的动态合法性。然后，我们采用轨迹级别的回报模型来优化生成的轨迹，使其更贴近广告主的目标。不同设置下的实验结果表明，我们的方法不仅在大规模自动出价基准上取得了更优的性能，例如在具有挑战性的稀疏奖励拍卖设置中转化价值提高了29.9%，而且在快手在线广告平台上也带来了显著的改进，包括目标成本增加了2.0%。
摘要：Auto-bidding is central to computational advertising, achieving notable commercial success by optimizing advertisers' bids within economic constraints. Recently, large generative models show potential to revolutionize auto-bidding by generating bids that could flexibly adapt to complex, competitive environments. Among them, diffusers stand out for their ability to address sparse-reward challenges by focusing on trajectory-level accumulated rewards, as well as their explainable capability, i.e., planning a future trajectory of states and executing bids accordingly. However, diffusers struggle with generation uncertainty, particularly regarding dynamic legitimacy between adjacent states, which can lead to poor bids and further cause significant loss of ad impression opportunities when competing with other advertisers in a highly competitive auction environment. To address it, we propose a Causal auto-Bidding method based on a Diffusion completer-aligner framework, termed CBD. Firstly, we augment the diffusion training process with an extra random variable t, where the model observes t-length historical sequences with the goal of completing the remaining sequence, thereby enhancing the generated sequences' dynamic legitimacy. Then, we employ a trajectory-level return model to refine the generated trajectories, aligning more closely with advertisers' objectives. Experimental results across diverse settings demonstrate that our approach not only achieves superior performance on large-scale auto-bidding benchmarks, such as a 29.9% improvement in conversion value in the challenging sparse-reward auction setting, but also delivers significant improvements on the Kuaishou online advertising platform, including a 2.0% increase in target cost.

【7】Equivariant Flow Matching for Symmetry-Breaking Bifurcation Problems
标题：对称性破缺分岔问题的等变流匹配
链接：https://arxiv.org/abs/2509.03340

作者：driks, Ondřej Rokoš, Martin Doškář, Marc G.D. Geers, Vlado Menkovski
备注：12 pages, 7 figures including appendices
摘要：非线性动力系统中的分岔现象往往导致多个稳定解共存，特别是在对称性破缺的情况下。确定性机器学习模型难以捕捉这种多解性，它们会对多个解取平均，无法表示对称性较低的结果。在这项工作中，我们提出了一个基于流匹配的生成框架，用于对分岔结果的完整概率分布进行建模。我们的方法可以直接采样多个有效解，同时通过等变建模保持系统的对称性。我们引入了一种对称匹配策略，在群作用下对齐预测输出和目标输出，从而在等变设置中实现准确的学习。我们在一系列系统上验证了我们的方法，从玩具模型到复杂的物理问题，如屈曲梁和Allen-Cahn方程。结果表明，在捕捉多峰分布和对称性破缺分岔方面，流匹配显著优于非概率方法和变分方法，为高维系统中的多稳态建模提供了一个有原则且可扩展的解决方案。
摘要：Bifurcation phenomena in nonlinear dynamical systems often lead to multiple coexisting stable solutions, particularly in the presence of symmetry breaking. Deterministic machine learning models struggle to capture this multiplicity, averaging over solutions and failing to represent lower-symmetry outcomes. In this work, we propose a generative framework based on flow matching to model the full probability distribution over bifurcation outcomes. Our method enables direct sampling of multiple valid solutions while preserving system symmetries through equivariant modeling. We introduce a symmetric matching strategy that aligns predicted and target outputs under group actions, allowing accurate learning in equivariant settings. We validate our approach on a range of systems, from toy models to complex physical problems such as buckling beams and the Allen-Cahn equation. Our results demonstrate that flow matching significantly outperforms non-probabilistic and variational methods in capturing multimodal distributions and symmetry-breaking bifurcations, offering a principled and scalable solution for modeling multistability in high-dimensional systems.

【8】A Comprehensive Guide to Differential Privacy: From Theory to User Expectations
标题：差分隐私全面指南：从理论到用户期望
链接：https://arxiv.org/abs/2509.03294

作者：mitsa, Antti Airola, Tapio Pahikkala, Tinja Pitkämäki
摘要：个人数据的可用性不断增加，使机器学习、医疗保健和网络安全等领域取得了重大进展。然而，这种数据丰富也引起了严重的隐私问题，特别是考虑到强大的重新识别攻击，以及对负责任地使用数据日益增长的法律和伦理要求。差分隐私（DP）已经成为缓解这些风险的一个有原则、有数学基础的框架。本文对DP进行了全面的综述，涵盖其理论基础、实践机制和现实应用。它探讨了关键的算法工具和特定领域的挑战，特别是在隐私保护机器学习和合成数据生成方面。本文还强调了可用性问题，以及在DP系统中改进沟通和透明度的必要性。总的来说，我们的目标是支持研究人员和从业人员在不断变化的数据隐私环境中明智地采用DP。
摘要：The increasing availability of personal data has enabled significant advances in fields such as machine learning, healthcare, and cybersecurity. However, this data abundance also raises serious privacy concerns, especially in light of powerful re-identification attacks and growing legal and ethical demands for responsible data use. Differential privacy (DP) has emerged as a principled, mathematically grounded framework for mitigating these risks. This review provides a comprehensive survey of DP, covering its theoretical foundations, practical mechanisms, and real-world applications. It explores key algorithmic tools and domain-specific challenges - particularly in privacy-preserving machine learning and synthetic data generation. The report also highlights usability issues and the need for improved communication and transparency in DP systems. Overall, the goal is to support informed adoption of DP by researchers and practitioners navigating the evolving landscape of data privacy.

【9】Estudio de la eficiencia en la escalabilidad de GPUs para el entrenamiento de Inteligencia Artificial
标题：人工智能训练中GPU可扩展性的效率研究
链接：https://arxiv.org/abs/2509.03263

作者：tes, Carlos Juiz, Belen Bermejo
备注：8 pages, in Spanish language, 8 figures, Conference at SARTECO 2025, Spain
摘要：训练大规模的深度学习模型已经成为科学界和工业界的一个关键挑战。虽然大量使用GPU可以显著加快训练时间，但这种方法对效率有负面影响。在本文中，我们详细分析了MLPerf Training v4.1在四种工作负载（BERT、Llama2 LoRA、RetinaNet和Stable Diffusion）上报告的时间，表明存在优化性能、GPU使用率和效率之间关系的配置。结果指向一个盈亏平衡点，允许减少训练时间，同时最大限度地提高效率。
摘要：Training large-scale deep learning models has become a key challenge for the scientific community and industry. While the massive use of GPUs can significantly speed up training times, this approach has a negative impact on efficiency. In this article, we present a detailed analysis of the times reported by MLPerf Training v4.1 on four workloads: BERT, Llama2 LoRA, RetinaNet, and Stable Diffusion, showing that there are configurations that optimise the relationship between performance, GPU usage, and efficiency. The results point to a break-even point that allows training times to be reduced while maximising efficiency.

【10】TopoMap: A Feature-based Semantic Discriminator of the Topographical Regions in the Test Input Space
标题：TopoMap：测试输入空间中地形区域的基于特征的语义判别器
链接：https://arxiv.org/abs/2509.03242

作者： De Vita, Nargiz Humbatova, Paolo Tonella
摘要：测试基于深度学习（DL）的系统是一个开放的挑战。虽然找到导致DL模型行为不当的输入相对容易，但按照使被测DL模型失败的特征对输入进行分组在很大程度上尚未得到探索。现有的DL测试方法引入的扰动可能集中在特定的致错特征上，而忽略了属于特征空间不同区域的其他特征。在本文中，我们为输入特征空间创建了一张显式的地形图。我们的方法名为TopoMap，它既是黑盒的又与模型无关，因为它只依赖于刻画输入空间的特征。为了根据输入共享的特定特征来区分它们，我们首先应用降维来获得输入嵌入，然后对嵌入进行聚类。每个DL模型可能需要特定的嵌入计算和聚类算法，才能将输入有意义地划分为有区分度的组。我们提出了一种新的方法来评估嵌入和聚类技术的不同配置。我们使用深度神经网络（DNN）来近似人类评估者，后者能够根据所含元素的特征判断一对聚类是否可以区分。我们使用这样的DNN，从不同嵌入/聚类配置产生的所有地形图中自动选择最佳的输入地形图。评估结果表明，TopoMap生成的地图由可区分且有意义的区域组成。此外，我们使用变异分析评估TopoMap的有效性。具体来说，我们评估地形图中的聚类是否有助于有效地选择能杀死变异体的输入。实验结果表明，对于可杀死的变异体，我们的方法平均比随机选择高出35%；对于不可杀死的变异体，高出61%。
摘要：Testing Deep Learning (DL)-based systems is an open challenge. Although it is relatively easy to find inputs that cause a DL model to misbehave, the grouping of inputs by features that make the DL model under test fail is largely unexplored. Existing approaches for DL testing introduce perturbations that may focus on specific failure-inducing features, while neglecting others that belong to different regions of the feature space. In this paper, we create an explicit topographical map of the input feature space. Our approach, named TopoMap, is both black-box and model-agnostic as it relies solely on features that characterise the input space. To discriminate the inputs according to the specific features they share, we first apply dimensionality reduction to obtain input embeddings, which are then subjected to clustering. Each DL model might require specific embedding computations and clustering algorithms to achieve a meaningful separation of inputs into discriminative groups. We propose a novel way to evaluate alternative configurations of embedding and clustering techniques. We used a deep neural network (DNN) as an approximation of a human evaluator who could tell whether a pair of clusters can be discriminated based on the features of the included elements. We use such a DNN to automatically select the optimal topographical map of the inputs among all those that are produced by different embedding/clustering configurations. The evaluation results show that the maps generated by TopoMap consist of distinguishable and meaningful regions. In addition, we evaluate the effectiveness of TopoMap using mutation analysis. In particular, we assess whether the clusters in our topographical map allow for an effective selection of mutation-killing inputs. Experimental results show that our approach outperforms random selection by 35% on average on killable mutants; by 61% on non-killable ones.

【11】The Role of Embodiment in Intuitive Whole-Body Teleoperation for Mobile Manipulation
标题：化身在移动操纵的直觉全身遥操作中的作用
链接：https://arxiv.org/abs/2509.03222

作者：anchi Moyen, Rickmer Krohn, Sophie Lueth, Kay Pompetzki, Jan Peters, Vignesh Prasad, Georgia Chalvatzaki
备注：8 pages, 8 figures, Accepted at the IEEE-RAS International Conference on Humanoid Robots (Humanoids) 2025
摘要：直观的遥操作界面对于移动操作机器人至关重要，以确保高质量的数据收集，同时减少操作员的工作量。强烈的体现感与最小的身体和认知需求相结合，不仅可以增强大规模数据收集期间的用户体验，还有助于在较长时间内保持数据质量。这对于需要全身协调的具有挑战性的长期移动操作任务变得尤为重要。我们比较两种不同的机器人控制范例：一个耦合的实施例集成手臂操纵和基地导航功能，和一个解耦的实施例处理这些系统作为单独的控制实体。此外，我们评估两个视觉反馈机制：沉浸式虚拟现实和传统的基于屏幕的可视化机器人的视野。这些配置是在需要综合规划和执行的复杂、多阶段任务序列中进行系统评估的。我们的研究结果表明，使用VR作为一种反馈方式增加任务完成时间，认知工作量，和感知的努力遥操作。耦合操纵和导航导致用户的工作量与解耦的实施例相当，而初步实验表明，通过耦合的遥操作获得的数据导致更好的模仿学习性能。我们对直观的远程操作界面的整体看法为收集高质量，高维度的移动操作数据提供了有价值的见解，同时考虑到人类操作员。项目网址：https://sophiamoyen.github.io/role-embodiment-wbc-moma-teleop/
摘要：Intuitive Teleoperation interfaces are essential for mobile manipulation robots to ensure high quality data collection while reducing operator workload. A strong sense of embodiment combined with minimal physical and cognitive demands not only enhances the user experience during large-scale data collection, but also helps maintain data quality over extended periods. This becomes especially crucial for challenging long-horizon mobile manipulation tasks that require whole-body coordination. We compare two distinct robot control paradigms: a coupled embodiment integrating arm manipulation and base navigation functions, and a decoupled embodiment treating these systems as separate control entities. Additionally, we evaluate two visual feedback mechanisms: immersive virtual reality and conventional screen-based visualization of the robot's field of view. These configurations were systematically assessed across a complex, multi-stage task sequence requiring integrated planning and execution. Our results show that the use of VR as a feedback modality increases task completion time, cognitive workload, and perceived effort of the teleoperator. Coupling manipulation and navigation leads to a comparable workload on the user as decoupling the embodiments, while preliminary experiments suggest that data acquired by coupled teleoperation leads to better imitation learning performance. Our holistic view on intuitive teleoperation interfaces provides valuable insight into collecting high-quality, high-dimensional mobile manipulation data at scale with the human operator in mind. Project website:https://sophiamoyen.github.io/role-embodiment-wbc-moma-teleop/

【12】Systematic Evaluation of Attribution Methods: Eliminating Threshold Bias and Revealing Method-Dependent Performance Patterns
标题：归因方法的系统评估：消除阈值偏差并揭示与方法相关的绩效模式
链接：https://arxiv.org/abs/2509.03176

作者：oy
备注：15 pages, 9 figures
摘要：归因方法通过识别有影响力的输入特征来解释神经网络预测，但它们的评估存在阈值选择偏差，这可能会颠倒方法排名并破坏结论。目前的评估协议在单一阈值上将归因图二值化，仅阈值选择一项就可以使排名变化超过200个百分点。我们通过一个无阈值框架来解决这一缺陷，该框架计算交并比的曲线下面积（AUC-IoU），在整个阈值范围内刻画归因质量。我们在皮肤病成像上评估了七种归因方法，结果表明单阈值指标会产生相互矛盾的结果，而无阈值评估则提供了可靠的区分。XRAI比LIME提高了31%，比原始的Integrated Gradients提高了204%，按尺寸分层的分析显示，在不同病变尺度上性能差异高达269%。这些发现确立了方法学标准，消除了评估伪影，并支持基于证据的方法选择。无阈值框架既提供了对归因行为的理论洞察，也为医学成像及其他领域中的稳健比较提供了实践指导。
摘要：Attribution methods explain neural network predictions by identifying influential input features, but their evaluation suffers from threshold selection bias that can reverse method rankings and undermine conclusions. Current protocols binarize attribution maps at single thresholds, where threshold choice alone can alter rankings by over 200 percentage points. We address this flaw with a threshold-free framework that computes Area Under the Curve for Intersection over Union (AUC-IoU), capturing attribution quality across the full threshold spectrum. Evaluating seven attribution methods on dermatological imaging, we show single-threshold metrics yield contradictory results, while threshold-free evaluation provides reliable differentiation. XRAI achieves 31% improvement over LIME and 204% over vanilla Integrated Gradients, with size-stratified analysis revealing performance variations up to 269% across lesion scales. These findings establish methodological standards that eliminate evaluation artifacts and enable evidence-based method selection. The threshold-free framework provides both theoretical insight into attribution behavior and practical guidance for robust comparison in medical imaging and beyond.
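As a rough illustration of the threshold-free metric named in the abstract, the sketch below binarizes an attribution map at every threshold on a grid, computes Intersection over Union against a ground-truth mask, and integrates the resulting curve. The min-max normalisation, the uniform threshold grid, and the trapezoidal integration are assumptions; the paper's exact protocol may differ.

```python
import numpy as np

def auc_iou(attribution, mask, num_thresholds=101):
    """Area under the IoU-vs-threshold curve for one attribution map."""
    a = attribution.astype(float)
    span = a.max() - a.min()
    # Assumed preprocessing: rescale attributions to [0, 1].
    a = (a - a.min()) / span if span > 0 else np.zeros_like(a)
    mask = mask.astype(bool)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    ious = []
    for t in thresholds:
        pred = a >= t
        union = np.logical_or(pred, mask).sum()
        inter = np.logical_and(pred, mask).sum()
        ious.append(inter / union if union > 0 else 0.0)
    ious = np.asarray(ious)
    # Trapezoidal rule over the threshold axis.
    return float(np.sum((ious[1:] + ious[:-1]) / 2 * np.diff(thresholds)))

mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True                      # hypothetical lesion region
good = mask.astype(float)                  # attribution concentrated on the lesion
bad = 1.0 - good                           # attribution concentrated off the lesion
score_good = auc_iou(good, mask)
score_bad = auc_iou(bad, mask)
```

Because the score aggregates over all thresholds, a method cannot gain or lose rank merely through the choice of a single cut-off, which is the bias the abstract targets.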

【13】LSAM: Asynchronous Distributed Training with Landscape-Smoothed Sharpness-Aware Minimization
标题：LSAM：基于景观平滑锐度感知最小化的异步分布式训练
链接：https://arxiv.org/abs/2509.03110

作者：ng, Sixin Zhang
摘要：虽然尖锐度感知最小化（SAM）通过最小化损失和尖锐度来提高深度神经网络的泛化能力，但它在分布式大批量训练中存在效率低下的问题。我们提出了景观平滑SAM（LSAM），一种新的优化，保留SAM的泛化优势，同时提供卓越的效率。LSAM将SAM的对抗步骤与异步分布式采样策略相结合，生成异步分布式采样方案，产生平滑的锐度感知损失景观以进行优化。与数据并行SAM相比，这种设计消除了同步瓶颈，加速了大批量收敛，并提供了更高的最终精度。
摘要：While Sharpness-Aware Minimization (SAM) improves generalization in deep neural networks by minimizing both loss and sharpness, it suffers from inefficiency in distributed large-batch training. We present Landscape-Smoothed SAM (LSAM), a novel optimizer that preserves SAM's generalization advantages while offering superior efficiency. LSAM integrates SAM's adversarial steps with an asynchronous distributed sampling strategy, generating an asynchronous distributed sampling scheme, producing a smoothed sharpness-aware loss landscape for optimization. This design eliminates synchronization bottlenecks, accelerates large-batch convergence, and delivers higher final accuracy compared to data-parallel SAM.

【14】Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
标题：Loong：通过验证器大规模合成长思维链
链接：https://arxiv.org/abs/2509.03059

作者：uang, Rishabh, Gregor Franke, Ziyi Yang, Jiamu Bai, Weijie Bai, Jinhe Bi, Zifeng Ding, Yiqun Duan, Chengyu Fan, Wendong Fan, Xin Gao, Ruohao Guo, Yuan He, Zhuangzhuang He, Xianglong Hu, Neil Johnson, Bowen Li, Fangru Lin, Siyu Lin, Tong Liu, Yunpu Ma, Hao Shen, Hao Sun, Beibei Wang, Fangyijie Wang, Hao Wang, Haoran Wang, Yang Wang, Yifeng Wang, Zhaowei Wang, Ziyang Wang, Yifan Wu, Zikai Xiao, Chengxing Xie, Fan Yang, Junxiao Yang, Qianshuo Ye, Ziyu Ye, Guangtao Zeng, Yuwen Ebony Zhang, Zeyu Zhang, Zihao Zhu, Bernard Ghanem, Philip Torr, Guohao Li
摘要：大型语言模型（LLM）的最新进展表明，它们的推理能力可以通过具有可验证奖励的强化学习（RLVR）得到显著提高，特别是在数学和编程等可以自动评估答案正确性的领域。然而，由于缺乏高质量、可验证的数据集以及人工监督的高成本，将这一成功推广到其他推理密集型领域仍然具有挑战性。在这项工作中，我们介绍了Loong项目：一个开源框架，用于在各种推理密集型领域中进行可扩展的合成数据生成和验证。该框架由两个关键组件组成：（1）LoongBench，一个精心整理的种子数据集，包含横跨12个领域（例如高等数学、化学、逻辑）的8,729个经人工审核的示例，每个示例都配有可执行代码和丰富的元数据；以及（2）LoongEnv，一个模块化的合成数据生成环境，支持多种提示策略来生成新的问题-答案-代码三元组。这些组件共同构成了一个支持强化学习的智能体-环境循环，其中基于LLM的智能体因生成与代码执行结果一致的思维链（CoT）解答而获得奖励。在实验方面，我们在一系列开源和专有的LLM上对LoongBench进行基准测试，以评估领域覆盖率并揭示性能瓶颈。此外，我们对LoongEnv生成的合成数据进行了全面分析，考察其正确性、难度和多样性。代码和文档可在https://github.com/camel-ai/loong上获得。
摘要：Recent advances in Large Language Models (LLMs) have shown that their reasoning capabilities can be significantly improved through Reinforcement Learning with Verifiable Reward (RLVR), particularly in domains like mathematics and programming, where ground-truth correctness can be automatically evaluated. However, extending this success to other reasoning-intensive domains remains challenging due to the scarcity of high-quality, verifiable datasets and the high cost of human supervision. In this work, we introduce the Loong Project: an open-source framework for scalable synthetic data generation and verification across a diverse range of reasoning-intensive domains. The framework consists of two key components: (1) LoongBench, a curated seed dataset containing 8,729 human-vetted examples across 12 domains (e.g., Advanced Mathematics, Chemistry, Logic), each paired with executable code and rich metadata; and (2) LoongEnv, a modular synthetic data generation environment that supports multiple prompting strategies to produce new question-answer-code triples. Together, these components form an agent-environment loop that enables reinforcement learning, where an LLM-based agent is rewarded for generating Chain-of-Thought (CoT) solutions that align with code-executed answers. Empirically, we benchmark LoongBench on a broad suite of both open-source and proprietary LLMs to evaluate domain coverage and reveal performance bottlenecks. In addition, we conduct a comprehensive analysis of synthetic data generated by LoongEnv, examining correctness, difficulty, and diversity. Code and documentation are available at https://github.com/camel-ai/loong.

【15】Mitigating Data Imbalance in Automated Speaking Assessment
标题：缓解自动口语评估中的数据不平衡
链接：https://arxiv.org/abs/2509.03010

作者： Tsai, Kuan-Tang Huang, Bi-Cheng Yan, Tien-Hong Lo, Berlin Chen
备注：Submitted to APSIPA 2025
摘要：自动口语评估在评估二语学习者的语言水平方面起着至关重要的作用。然而，ASA模型经常遭受类不平衡，导致有偏见的预测。为了解决这个问题，我们引入了一个新的目标来训练ASA模型，称为平衡Logit变异（BLV）损失，它会干扰模型预测，以改善少数类的特征表示，而无需修改数据集。对ICNALE基准数据集的评估表明，将BLV损失集成到著名的基于文本的（BERT）模型中显着提高了分类准确性和公平性，使自动语音评估对不同的学习者更加强大。
摘要：Automated Speaking Assessment (ASA) plays a crucial role in evaluating second-language (L2) learners' proficiency. However, ASA models often suffer from class imbalance, leading to biased predictions. To address this, we introduce a novel objective for training ASA models, dubbed the Balancing Logit Variation (BLV) loss, which perturbs model predictions to improve feature representation for minority classes without modifying the dataset. Evaluations on the ICNALE benchmark dataset show that integrating the BLV loss into a celebrated text-based (BERT) model significantly enhances classification accuracy and fairness, making automated speech evaluation more robust for diverse learners.

【16】Managing Correlations in Data and Privacy Demand
标题：管理数据和隐私需求的相关性
链接：https://arxiv.org/abs/2509.02856

作者： Chaudhuri, Thomas A. Courtade
备注：To appear at ACM CCS, 2025
摘要：以前的作品在差分隐私文献，允许用户选择他们的隐私级别通常在异构差分隐私（HDP）框架下操作，简化假设用户数据和隐私级别不相关。首先，我们证明了标准的HDP框架不足时，用户数据和隐私需求被允许相关。其次，为了解决这个缺点，我们提出了一个替代框架，添加-删除异构差分隐私（AHDP），联合帐户的用户数据和隐私偏好。我们表明，AHDP是强大的数据和隐私之间可能存在的相关性。第三，我们正式提出的AHDP框架的保证，通过操作假设检验的角度来看。假设检验设置也可能对分析其他隐私框架具有独立的兴趣。第四，我们证明了存在非平凡的AHDP机制，特别是不需要先验知识的数据隐私的相关性。我们提出了一些这样的机制，并将它们应用于核心统计任务，如均值估计，频率估计和线性回归。所提出的机制是简单的实现与最小的假设和建模要求，使他们有吸引力的现实世界中使用。最后，我们经验性地评估了所提出的AHDP机制，使用LLM生成的合成数据集突出了它们的权衡，我们将其发布用于未来的研究。
摘要：Previous works in the differential privacy literature that allow users to choose their privacy levels typically operate under the heterogeneous differential privacy (HDP) framework with the simplifying assumption that user data and privacy levels are not correlated. Firstly, we demonstrate that the standard HDP framework falls short when user data and privacy demands are allowed to be correlated. Secondly, to address this shortcoming, we propose an alternate framework, Add-remove Heterogeneous Differential Privacy (AHDP), that jointly accounts for user data and privacy preference. We show that AHDP is robust to possible correlations between data and privacy. Thirdly, we formalize the guarantees of the proposed AHDP framework through an operational hypothesis testing perspective. The hypothesis testing setup may be of independent interest in analyzing other privacy frameworks as well. Fourthly, we show that there exist non-trivial AHDP mechanisms that notably do not require prior knowledge of the data-privacy correlations. We propose some such mechanisms and apply them to core statistical tasks such as mean estimation, frequency estimation, and linear regression. The proposed mechanisms are simple to implement with minimal assumptions and modeling requirements, making them attractive for real-world use. Finally, we empirically evaluate proposed AHDP mechanisms, highlighting their trade-offs using LLM-generated synthetic datasets, which we release for future research.

【17】Fast and Accurate SVD-Type Updating in Streaming Data
标题：流数据中快速准确的SVD型更新
链接：https://arxiv.org/abs/2509.02840

作者：J. Brust, Michael A. Saunders
摘要：对于数据流来说，短时间间隔内的变化通常是低秩的。对于以矩阵形式组织的高吞吐量信息，在每一步之后重新计算最优SVD近似的代价通常高得难以承受。因此，人们转而使用增量和截断更新策略，但这些策略在截断秩较大时可能难以扩展。为此，我们提出了一组高效的新算法，用于更新双对角分解，其精度与SVD方法相当。具体来说，我们开发了一种紧凑的Householder型算法，它将稀疏部分与低秩更新解耦，内存需求约为标准双对角化方法的一半。第二种算法基于Givens旋转，每次旋转只需大约10次浮点运算，其复杂度随问题规模呈二次方增长，而典型方法为三次方增长。因此，该算法能够有效处理高吞吐量的更新，我们在跟踪推荐系统和网络的大型子空间，以及与LAPACK或增量SVD等知名软件的比较中证明了这一点。
摘要：For a datastream, the change over a short interval is often of low rank. For high throughput information arranged in matrix format, recomputing an optimal SVD approximation after each step is typically prohibitive. Instead, incremental and truncated updating strategies are used, which may not scale for large truncation ranks. Therefore, we propose a set of efficient new algorithms that update a bidiagonal factorization, and which are similarly accurate as the SVD methods. In particular, we develop a compact Householder-type algorithm that decouples a sparse part from a low-rank update and has about half the memory requirements of standard bidiagonalization methods. A second algorithm based on Givens rotations has only about 10 flops per rotation and scales quadratically with the problem size, compared to a typical cubic scaling. The algorithm is therefore effective for processing high-throughput updates, as we demonstrate in tracking large subspaces of recommendation systems and networks, and when compared to well known software such as LAPACK or the incremental SVD.

【18】Multi-Embodiment Locomotion at Scale with extreme Embodiment Randomization
标题：基于极端具身随机化的大规模多具身运动
链接：https://arxiv.org/abs/2509.02815

作者：inger, Jan Peters
摘要：我们提出了一个单一的通用运动策略，它在50个各不相同的腿式机器人上训练得到。通过将改进的具身感知架构（URMAv2）与基于性能的课程相结合，以实现极端具身随机化，我们的策略学会了控制数百万种形态变化。我们的策略实现了向未见过的真实世界人形机器人和四足机器人的zero-shot迁移。
摘要：We present a single, general locomotion policy trained on a diverse collection of 50 legged robots. By combining an improved embodiment-aware architecture (URMAv2) with a performance-based curriculum for extreme Embodiment Randomization, our policy learns to control millions of morphological variations. Our policy achieves zero-shot transfer to unseen real-world humanoid and quadruped robots.

【19】Deep Research is the New Analytics System: Towards Building the Runtime for AI-Driven Analytics
标题：深度研究是新的分析系统：迈向构建人工智能驱动分析的运行时
链接：https://arxiv.org/abs/2509.02751

作者：usso, Tim Kraska
备注：6 pages, 2 figures, submitted to CIDR'26
摘要：随着大型语言模型（LLM）的进步，研究人员正在创建新的系统，可以在大型非结构化数据集上执行AI驱动的分析。最近的工作探索了使用语义运算符执行此类分析查询-一组具有自然语言规范的AI驱动的数据转换的声明性集合。然而，即使经过优化，这些操作符在数百万条记录上执行的成本也可能很高，并且它们的迭代器执行语义使它们不适合交互式数据分析任务。在另一项工作中，Deep Research系统已经证明了在大型数据集上回答自然语言问题的能力。这些系统使用一个或多个LLM代理来计划其执行、处理数据集并迭代改进其答案。然而，这些系统没有显式地优化其查询计划，这可能导致计划执行不佳。为了让AI驱动的分析脱颖而出，我们需要一个运行时，它将语义运算符的优化执行与深度研究系统的灵活性和更动态的执行相结合。作为实现这一愿景的第一步，我们构建了一个原型，使深度研究代理能够编写和执行优化的语义运算符程序。我们评估了我们的原型，并证明它可以在两个基本查询上胜过手工制作的语义运算符程序和开放的深度研究系统。与标准的开放Deep Research代理相比，我们的原型的F1分数提高了1.95倍。此外，即使我们让代理访问语义运算符作为工具，我们的原型仍然实现了高达76.8%和72.7%的成本和运行时间节省由于其优化的执行。
摘要：With advances in large language models (LLMs), researchers are creating new systems that can perform AI-driven analytics over large unstructured datasets. Recent work has explored executing such analytics queries using semantic operators -- a declarative set of AI-powered data transformations with natural language specifications. However, even when optimized, these operators can be expensive to execute on millions of records and their iterator execution semantics make them ill-suited for interactive data analytics tasks. In another line of work, Deep Research systems have demonstrated an ability to answer natural language question(s) over large datasets. These systems use one or more LLM agent(s) to plan their execution, process the dataset(s), and iteratively refine their answer. However, these systems do not explicitly optimize their query plans which can lead to poor plan execution. In order for AI-driven analytics to excel, we need a runtime which combines the optimized execution of semantic operators with the flexibility and more dynamic execution of Deep Research systems. As a first step towards this vision, we build a prototype which enables Deep Research agents to write and execute optimized semantic operator programs. We evaluate our prototype and demonstrate that it can outperform a handcrafted semantic operator program and open Deep Research systems on two basic queries. Compared to a standard open Deep Research agent, our prototype achieves up to 1.95x better F1-score. Furthermore, even if we give the agent access to semantic operators as tools, our prototype still achieves cost and runtime savings of up to 76.8% and 72.7% thanks to its optimized execution.

【20】Preference Robustness for DPO with Applications to Public Health
Link: https://arxiv.org/abs/2509.02709

Authors: Kim, Shresth Verma, Mauricio Tec, Milind Tambe
Abstract: We study an LLM fine-tuning task for designing reward functions for sequential resource allocation problems in public health, guided by human preferences expressed in natural language. This setting presents a challenging testbed for alignment due to complex and ambiguous objectives and limited data availability. We propose DPO-PRO, a robust fine-tuning algorithm based on Direct Preference Optimization (DPO), which accounts for uncertainty in the preference distribution using a lightweight Distributionally Robust Optimization (DRO) formulation. Unlike prior DRO-based DPO methods, DPO-PRO is significantly less conservative. We evaluate DPO-PRO on a real-world maternal mobile health program operated by the non-profit organization ARMMAN, as well as on standard alignment benchmarks. Experimental results demonstrate that our method consistently improves robustness to noisy preference signals compared to existing DPO variants. Moreover, DPO-PRO achieves performance comparable to a prior self-reflection-based baseline for reward function design, while requiring significantly lower inference-time cost.
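As a point of reference for the abstract above, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not the DPO-PRO objective, whose DRO formulation the abstract does not spell out. The function and argument names are illustrative assumptions; the inputs are summed log-probabilities of the preferred and rejected responses under the policy and under a frozen reference model.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_w, logp_l: policy log-probabilities of the preferred and rejected responses.
    ref_logp_w, ref_logp_l: the same quantities under the reference model.
    beta: temperature scaling the implicit reward (0.1 is an assumed default).
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) equals softplus(-margin)
    return softplus(-margin)


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
```

A robust variant would replace the single observed preference label with a worst case over an uncertainty set of preference probabilities; that step is deliberately left out here.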

【21】The Future of Artificial Intelligence and the Mathematical and Physical Sciences (AI+MPS)
Link: https://arxiv.org/abs/2509.02661

Authors: rguson, Marisa LaFleur, Lars Ruthotto, Jesse Thaler, Yuan-Sen Ting, Pratyush Tiwary, Soledad Villar, E. Paulo Alves, Jeremy Avigad, Simon Billinge, Camille Bilodeau, Keith Brown, Emmanuel Candes, Arghya Chattopadhyay, Bingqing Cheng, Jonathan Clausen, Connor Coley, Andrew Connolly, Fred Daum, Sijia Dong, Chrisy Xiyu Du, Cora Dvorkin, Cristiano Fanelli, Eric B. Ford, Luis Manuel Frutos, Nicolás García Trillos, Cecilia Garraffo, Robert Ghrist, Rafael Gomez-Bombarelli, Gianluca Guadagni, Sreelekha Guggilam, Sergei Gukov, Juan B. Gutiérrez, Salman Habib, Johannes Hachmann, Boris Hanin, Philip Harris, Murray Holland, Elizabeth Holm, Hsin-Yuan Huang, Shih-Chieh Hsu, Nick Jackson, Olexandr Isayev, Heng Ji, Aggelos Katsaggelos, Jeremy Kepner, Yannis Kevrekidis, Michelle Kuchera, J. Nathan Kutz, Branislava Lalic, Ann Lee, Matt LeBlanc, Josiah Lim, Rebecca Lindsey, Yongmin Liu, Peter Y. Lu, Sudhir Malik, Vuk Mandic, Vidya Manian, Emeka P. Mazi, Pankaj Mehta, Peter Melchior, Brice Ménard, Jennifer Ngadiuba, Stella Offner, Elsa Olivetti, Shyue Ping Ong, Christopher Rackauckas, Philippe Rigollet, Chad Risko, Philip Romero, Grant Rotskoff, Brett Savoie, Uros Seljak, David Shih, Gary Shiu, Dima Shlyakhtenko, Eva Silverstein, Taylor Sparks, Thomas Strohmer, Christopher Stubbs, Stephen Thomas, Suriyanarayanan Vaikuntanathan, Rene Vidal, Francisco Villaescusa-Navarro, Gregory Voth, Benjamin Wandelt, Rachel Ward, Melanie Weber, Risa Wechsler, Stephen Whitelam, Olaf Wiest, Mike Williams, Zhuoran Yang, Yaroslava G. Yingling, Bin Yu, Shuwen Yue, Ann Zabludoff, Huimin Zhao, Tong Zhang
Comments: Community Paper from the Future of NSF AI+MPS Workshop, Cambridge, Massachusetts, March 24-26, 2025, supported by NSF Award Number 2512945
Abstract: This community paper developed out of the NSF Workshop on the Future of Artificial Intelligence (AI) and the Mathematical and Physical Sciences (MPS), which was held in March 2025 with the goal of understanding how the MPS domains (Astronomy, Chemistry, Materials Research, Mathematical Sciences, and Physics) can best capitalize on, and contribute to, the future of AI. We present here a summary and snapshot of the MPS community's perspective, as of Spring/Summer 2025, in a rapidly developing field. The link between AI and MPS is becoming increasingly inextricable; now is a crucial moment to strengthen the link between AI and Science by pursuing a strategy that proactively and thoughtfully leverages the potential of AI for scientific discovery and optimizes opportunities to impact the development of AI by applying concepts from fundamental science. To achieve this, we propose activities and strategic priorities that: (1) enable AI+MPS research in both directions; (2) build up an interdisciplinary community of AI+MPS researchers; and (3) foster education and workforce development in AI for MPS researchers and students. We conclude with a summary of suggested priorities for funding agencies, educational institutions, and individual researchers to help position the MPS community to be a leader in, and take full advantage of, the transformative potential of AI+MPS.

【22】Towards Performatively Stable Equilibria in Decision-Dependent Games for Arbitrary Data Distribution Maps
Link: https://arxiv.org/abs/2509.02619

Authors: g Zhong, Yang Liu, Jiming Liu
Abstract: In decision-dependent games, multiple players optimize their decisions under a data distribution that shifts with their joint actions, creating complex dynamics in applications like market pricing. A practical consequence of these dynamics is the performatively stable equilibrium, where each player's strategy is a best response under the induced distribution. Prior work relies on $\beta$-smoothness, assuming Lipschitz continuity of loss function gradients with respect to the data distribution. This assumption is impractical because the data distribution maps, i.e., the relationships between joint decisions and the resulting distribution shifts, are typically unknown, rendering $\beta$ unobtainable. To overcome this limitation, we propose a gradient-based sensitivity measure that directly quantifies the impact of decision-induced distribution shifts. Leveraging this measure, we derive convergence guarantees for performatively stable equilibria under a practically feasible assumption of strong monotonicity. Accordingly, we develop a sensitivity-informed repeated retraining algorithm that adjusts players' loss functions based on the sensitivity measure, guaranteeing convergence to performatively stable equilibria for arbitrary data distribution maps. Experiments on a prediction error minimization game, Cournot competition, and a revenue maximization game show that our approach outperforms state-of-the-art baselines, achieving lower losses and faster convergence.

【23】From Image Denoisers to Regularizing Imaging Inverse Problems: An Overview
Link: https://arxiv.org/abs/2509.03475

Authors: an, Subhadip Mukherjee, Junqi Tang
Abstract: Inverse problems lie at the heart of modern imaging science, with broad applications in areas such as medical imaging, remote sensing, and microscopy. Recent years have witnessed a paradigm shift in solving imaging inverse problems, where data-driven regularizers are increasingly used, leading to remarkably high-fidelity reconstruction. A particularly notable approach for data-driven regularization is to use learned image denoisers as implicit priors in iterative image reconstruction algorithms. This survey presents a comprehensive overview of this powerful and emerging class of algorithms, commonly referred to as plug-and-play (PnP) methods. We begin by providing a brief background on image denoising and inverse problems, followed by a short review of traditional regularization strategies. We then explore how proximal splitting algorithms, such as the alternating direction method of multipliers (ADMM) and proximal gradient descent (PGD), can naturally accommodate learned denoisers in place of proximal operators, and under what conditions such replacements preserve convergence. The role of Tweedie's formula in connecting optimal Gaussian denoisers and score estimation is discussed, which lays the foundation for regularization-by-denoising (RED) and more recent diffusion-based posterior sampling methods. We discuss theoretical advances regarding the convergence of PnP algorithms, both within the RED and proximal settings, emphasizing the structural assumptions that the denoiser must satisfy for convergence, such as non-expansiveness, Lipschitz continuity, and local homogeneity. We also address practical considerations in algorithm design, including choices of denoiser architecture and acceleration strategies.
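For readers unfamiliar with the two ingredients named in the abstract above, a brief sketch in standard notation may help. The symbols are assumptions of this note, not taken from the survey: $y = x + \sigma n$ with $n \sim \mathcal{N}(0, I)$, $p_\sigma$ is the density of $y$, $D_\sigma$ is a learned denoiser, $f$ is the data-fidelity term, and $\tau$ is a step size. The first line is Tweedie's formula; the second is the usual plug-and-play proximal gradient iteration, in which the denoiser stands in for the proximal operator of the regularizer.

```latex
% Tweedie's formula: the MMSE Gaussian denoiser is a score-corrected identity
\mathbb{E}[x \mid y] = y + \sigma^{2}\, \nabla_{y} \log p_{\sigma}(y)

% PnP proximal gradient descent: denoiser replaces the proximal step
x_{k+1} = D_{\sigma}\!\left( x_{k} - \tau\, \nabla f(x_{k}) \right)
```

Read together, the two lines suggest why a good Gaussian denoiser carries prior information: its residual $D_\sigma(y) - y$ approximates $\sigma^{2}$ times the score of the smoothed data distribution.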

【24】Quantifying the Social Costs of Power Outages and Restoration Disparities Across Four U.S. Hurricanes
Link: https://arxiv.org/abs/2509.02653

Authors: Li, Junwei Ma, Bo Li, Ali Mostafavi
Abstract: The multifaceted nature of disaster impact shows that densely populated areas contribute more to aggregate burden, while sparsely populated but heavily affected regions suffer disproportionately at the individual level. This study introduces a framework for quantifying the societal impacts of power outages by translating customer-weighted outage exposure into deprivation measures and integrating welfare metrics with three recovery indicators (average outage days per customer, restoration duration, and relative restoration rate), computed from sequential EAGLE-I observations and linked to ZIP Code Tabulation Area demographics. Applied to four United States hurricanes (Beryl 2024 Texas, Helene 2024 Florida, Milton 2024 Florida, and Ida 2021 Louisiana), this standardized pipeline provides the first cross-event, fine-scale evaluation of outage impacts and their drivers. Results demonstrate regressive patterns with greater burdens in lower-income areas; mechanistic analysis shows that deprivation increases with longer restoration durations and decreases with faster restoration rates; explainable modeling identifies restoration duration as the dominant driver; and clustering reveals distinct recovery typologies not captured by conventional reliability metrics. This framework delivers a transferable method for assessing outage impacts and equity, comparative cross-event evidence linking restoration dynamics to social outcomes, and actionable spatial analyses that support equity-informed restoration planning and resilience investment.

【25】Quantifying Clinician Bias and its Effects on Schizophrenia Diagnosis in the Emergency Department of the Mount Sinai Health System
Link: https://arxiv.org/abs/2509.02651

Authors: Valentine, Lauren A. Lepow, Lili Chan, Alexander W. Charney, Isotta Landi
Abstract: In the United States, schizophrenia (SCZ) carries a race and sex disparity that may be explained by clinician bias, a belief held by a clinician about a patient that prevents impartial clinical decision making. The emergency department (ED) is marked by higher rates of stress that lead to clinicians relying more on implicit biases during decision making. In this work, we considered a large cohort of psychiatric patients in the ED from the Mount Sinai Health System (MSHS) in New York City to investigate the effects of clinician bias on SCZ diagnosis while controlling for known risk factors and patient sociodemographic information. Clinician bias was quantified as the ratio of negative to total sentences within a patient's first ED note. We utilized a logistic regression to predict SCZ diagnosis given patient race, sex, age, history of trauma or substance use disorder, and the ratio of negative sentences. Our findings showed that an increased ratio of negative sentences is associated with higher odds of obtaining a SCZ diagnosis [OR (95% CI)=1.408 (1.361-1.456)]. Identifying as male [OR (95% CI)=1.112 (1.055-1.173)] or Black [OR (95% CI)=1.081 (1.031-1.133)] increased one's odds of being diagnosed with SCZ. However, from an intersectional lens, Black female patients with high SES have the highest odds of obtaining a SCZ diagnosis [OR (95% CI)=1.629 (1.535-1.729)]. Results such as these suggest that SES does not act as a protective buffer against SCZ diagnosis in all patients, demanding more attention to the quantification of health disparities. Lastly, we demonstrated that clinician bias can be operationalized with real-world data and is related to increased odds of obtaining a stigmatizing diagnosis such as SCZ.
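To make the reported odds ratios easier to read, here is a small sketch of how an odds ratio and its 95% confidence interval relate to a logistic regression coefficient. It assumes the usual Wald interval, exp(beta ± z·SE); the abstract does not state which interval the authors used, and the helper names are hypothetical. The coefficient and standard error below are back-computed from the published interval for the negative-sentence ratio, not taken from the paper.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald confidence interval from a logit coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_reported(or_point, ci_low, ci_high, z=Z_95):
    """Back out (beta, se) from a reported OR and its interval (Wald assumption)."""
    beta = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    return beta, se


# Reported in the abstract: OR (95% CI) = 1.408 (1.361-1.456)
beta, se = coef_from_reported(1.408, 1.361, 1.456)
or_point, low, high = odds_ratio_ci(beta, se)
```

Under this assumption, the recovered interval should closely match the published one, which is a quick consistency check that the interval is roughly symmetric on the log-odds scale.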

【26】Gaussian process surrogate with physical law-corrected prior for multi-coupled PDEs defined on irregular geometry
Link: https://arxiv.org/abs/2509.02617

Authors: ang, Hongqiao Wang, Wenzhou Lin, Qian Chen, Heng Yong
Comments: 40 pages, 16 figures, 7 tables
Abstract: Parametric partial differential equations (PDEs) are fundamental mathematical tools for modeling complex physical systems, yet their numerical evaluation across parameter spaces remains computationally intensive when using conventional high-fidelity solvers. To address this challenge, we propose a novel physical law-corrected prior Gaussian process (LC-prior GP) surrogate modeling framework that effectively integrates data-driven learning with underlying physical constraints to flexibly handle multi-coupled variables defined on complex geometries. The proposed approach leverages proper orthogonal decomposition (POD) to parameterize high-dimensional PDE solutions via their dominant modes and associated coefficients, thereby enabling efficient Gaussian process (GP) surrogate modeling within a reduced-dimensional coefficient space. A key contribution lies in the incorporation of physical laws together with a limited number of parameter samples to correct the GP posterior mean, thus avoiding reliance on computationally expensive numerical solvers. Furthermore, interpolation functions are constructed to describe the mapping from the full parameter space to the physics-based correction term. This mapping is subsequently backpropagated to constrain the original GP surrogate, yielding a more physically consistent conditional prior. To handle irregular geometries, the radial basis function-finite difference (RBF-FD) method is incorporated during training set computation, with its inherent differentiation matrices providing both computational efficiency and numerical accuracy for physical constraint optimization. The effectiveness of the proposed method is demonstrated through numerical experiments involving a reaction-diffusion model, miscible flooding models, and Navier-Stokes equations with multi-physics coupling defined on irregular domains.

【27】Use ADAS Data to Predict Near-Miss Events: A Group-Based Zero-Inflated Poisson Approach
Link: https://arxiv.org/abs/2509.02614

Authors: ng, Montserrat Guillen, Lishuai Li, Xin Li, Youhua Frank Chen
Comments: Preprint. 10 pages, 3 figures, 4 tables. Submitted to 2025 IEEE International Conference on Big Data (IEEE BigData 2025). Corresponding authors: Youhua Frank Chen (youhchen@cityu.this http URL)
Abstract: Driving behavior big data leverages multi-sensor telematics to understand how people drive and powers applications such as risk evaluation, insurance pricing, and targeted intervention. Usage-based insurance (UBI) built on these data has become mainstream. Telematics-captured near-miss events (NMEs) provide a timely alternative to claim-based risk, but weekly NMEs are sparse, highly zero-inflated, and behaviorally heterogeneous even after exposure normalization. Analyzing multi-sensor telematics and ADAS warnings, we show that traditional statistical models underfit the dataset. We address these challenges by proposing a set of zero-inflated Poisson (ZIP) frameworks that learn latent behavior groups and fit offset-based count models via EM to yield calibrated, interpretable weekly risk predictions. Using a naturalistic dataset from a fleet of 354 commercial drivers over a year, during which the drivers completed 287,511 trips and logged 8,142,896 km in total, our results show consistent improvements over baselines and prior telematics models, with lower AIC/BIC values in-sample and better calibration out-of-sample. We also conducted sensitivity analyses on the EM-based grouping for the number of clusters, finding that the gains were robust and interpretable. Practically, this supports context-aware ratemaking on a weekly basis and fairer premiums by recognizing heterogeneous driving styles.
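The zero-inflated Poisson model with an exposure offset mentioned in the abstract above can be sketched as follows. This is only the textbook ZIP probability mass function and log-link offset, not the authors' group-based EM procedure; the function names, the use of kilometres as exposure, and the example parameter values are assumptions for illustration.

```python
import math


def zip_pmf(k, pi_zero, lam):
    """P(Y = k) for a zero-inflated Poisson.

    pi_zero: probability of a structural zero (e.g. a week with no risk exposure).
    lam: Poisson mean for the count component.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the structural-zero part and the Poisson part.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def weekly_rate(exposure_km, linear_predictor):
    """Offset-based mean: log(lambda) = log(exposure) + x'beta."""
    return exposure_km * math.exp(linear_predictor)


# Hypothetical driver-week: 500 km driven, linear predictor of -6.0
lam = weekly_rate(500.0, -6.0)
p_no_event = zip_pmf(0, 0.3, lam)
```

A group-based variant would hold one `(pi_zero, beta)` pair per latent behavior group and use EM to alternate between assigning drivers to groups and refitting each group's parameters.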

【28】Gaussian Process Regression of Steering Vectors With Physics-Aware Deep Composite Kernels for Augmented Listening
Link: https://arxiv.org/abs/2509.02571

Authors: Carlo (RIKEN AIP), Koyama Shoichi (UTokyo), Nugraha Aditya Arie (RIKEN AIP), Fontaine Mathieu (LTCI, S2A), Bando Yoshiaki (AIST), Yoshii Kazuyoshi (RIKEN AIP)
Abstract: This paper investigates continuous representations of steering vectors over frequency and position of microphone and source for augmented listening (e.g., spatial filtering and binaural rendering) with precise control of the sound field perceived by the user. Steering vectors have typically been used for representing the spatial characteristics of the sound field as a function of the listening position. The basic algebraic representation of steering vectors assuming an idealized environment cannot deal with the scattering effect of the sound field. One may thus collect a discrete set of real steering vectors measured in dedicated facilities and super-resolve (i.e., upsample) them. Recently, physics-aware deep learning methods have been effectively used for this purpose. Such deterministic super-resolution, however, suffers from the overfitting problem due to the non-uniform uncertainty over the measurement space. To solve this problem, we integrate an expressive representation based on the neural field (NF) into the principled probabilistic framework based on the Gaussian process (GP). Specifically, we propose a physics-aware composite kernel that models the directional incoming waves and the subsequent scattering effect. Our comprehensive comparative experiment showed the effectiveness of the proposed method under data insufficiency conditions. In downstream tasks such as speech enhancement and binaural rendering using the simulated data of the SPEAR challenge, the oracle performances were attained with less than ten times fewer measurements.

【29】EEG-MSAF: An Interpretable Microstate Framework uncovers Default-Mode Decoherence in Early Neurodegeneration
Link: https://arxiv.org/abs/2509.02568

Authors: Mehedi Hasan, Pedro G. Lind, Hernando Ombao, Anis Yazidi, Rabindra Khadka
Comments: Dementia, EEG, Microstates, Explainable, SHAP
Abstract: Dementia (DEM) is a growing global health challenge, underscoring the need for early and accurate diagnosis. Electroencephalography (EEG) provides a non-invasive window into brain activity, but conventional methods struggle to capture its transient complexity. We present the EEG Microstate Analysis Framework (EEG-MSAF), an end-to-end pipeline that leverages EEG microstates (discrete, quasi-stable topographies) to identify DEM-related biomarkers and distinguish DEM, mild cognitive impairment (MCI), and normal cognition (NC). EEG-MSAF comprises three stages: (1) automated microstate feature extraction, (2) classification with machine learning (ML), and (3) feature ranking using Shapley Additive Explanations (SHAP) to highlight key biomarkers. We evaluate on two EEG datasets: the public Chung-Ang University EEG (CAUEEG) dataset and a clinical cohort from Thessaloniki Hospital. Our framework demonstrates strong performance and generalizability. On CAUEEG, EEG-MSAF-SVM achieves 89% ± 0.01 accuracy, surpassing the deep learning baseline CEEDNET by 19.3%. On the Thessaloniki dataset, it reaches 95% ± 0.01 accuracy, comparable to EEGConvNeXt. SHAP analysis identifies mean correlation and occurrence as the most informative metrics: disruption of microstate C (salience/attention network) dominates DEM prediction, while microstate F, a novel default-mode pattern, emerges as a key early biomarker for both MCI and DEM. By combining accuracy, generalizability, and interpretability, EEG-MSAF advances EEG-based dementia diagnosis and sheds light on brain dynamics across the cognitive spectrum.
