Click "Read the original" to visit arxivdaily.com, covering CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, favorites, and more!
cs.LG: 124 papers in total today
Large models (17 papers)
【1】Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent
Link: https://arxiv.org/abs/2508.14853
Authors: was, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu
Abstract: As large language models (LLMs) are increasingly deployed in critical applications, ensuring their robustness and safety alignment remains a major challenge. Despite the overall success of alignment techniques such as reinforcement learning from human feedback (RLHF) on typical prompts, LLMs remain vulnerable to jailbreak attacks enabled by crafted adversarial triggers appended to user prompts. Most existing jailbreak methods either rely on inefficient searches over discrete token spaces or directly optimize continuous embeddings. While continuous embeddings can be given directly to selected open-source models as input, doing so is not feasible for proprietary models. On the other hand, projecting these embeddings back into valid discrete tokens introduces additional complexity and often reduces attack effectiveness. We propose an intrinsic optimization method which directly optimizes relaxed one-hot encodings of the adversarial suffix tokens using exponentiated gradient descent coupled with Bregman projection, ensuring that the optimized one-hot encoding of each token always remains within the probability simplex. We provide theoretical proof of convergence for our proposed method and implement an efficient algorithm that effectively jailbreaks several widely used LLMs. Our method achieves higher success rates and faster convergence compared to three state-of-the-art baselines, evaluated on five open-source LLMs and four adversarial behavior datasets curated for evaluating jailbreak methods. In addition to individual prompt attacks, we also generate universal adversarial suffixes effective across multiple prompts and demonstrate transferability of optimized suffixes to different LLMs.
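The core update the abstract describes is compact: exponentiated gradient descent on relaxed one-hot token encodings, where the multiplicative update followed by row normalization is exactly the Bregman (KL) projection back onto the probability simplex. A minimal sketch (the adversarial loss gradient here is a random stand-in, not the paper's objective):

```python
import numpy as np

def egd_step(P, grad, lr=0.1):
    """One exponentiated-gradient step on a batch of token distributions.

    P    : (suffix_len, vocab) rows on the probability simplex
    grad : gradient of the adversarial loss w.r.t. P
    The multiplicative update plus row normalization is the Bregman (KL)
    projection back onto the simplex.
    """
    logP = np.log(P + 1e-12) - lr * grad
    logP -= logP.max(axis=1, keepdims=True)   # numerical stability
    P_new = np.exp(logP)
    return P_new / P_new.sum(axis=1, keepdims=True)

# toy usage: 5 suffix positions, vocab of 50, random stand-in gradients
rng = np.random.default_rng(0)
P = np.full((5, 50), 1 / 50)                  # start from the uniform relaxation
for _ in range(100):
    grad = rng.standard_normal(P.shape)       # stand-in for d(loss)/dP
    P = egd_step(P, grad)
assert np.allclose(P.sum(axis=1), 1.0)        # rows stay on the simplex
```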
【2】PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning
Link: https://arxiv.org/abs/2508.14765
Authors: ng, Hang Zhang, Trieu Nguyen, Shasha Feng, Hao-Wei Pang, Xiang Yu, Li Xiao, Peter Zhiping Zhang
Abstract: Designing therapeutic peptides with tailored properties is hindered by the vastness of sequence space, limited experimental data, and poor interpretability of current generative models. To address these challenges, we introduce PepThink-R1, a generative framework that integrates large language models (LLMs) with chain-of-thought (CoT) supervised fine-tuning and reinforcement learning (RL). Unlike prior approaches, PepThink-R1 explicitly reasons about monomer-level modifications during sequence generation, enabling interpretable design choices while optimizing for multiple pharmacological properties. Guided by a tailored reward function balancing chemical validity and property improvements, the model autonomously explores diverse sequence variants. We demonstrate that PepThink-R1 generates cyclic peptides with significantly enhanced lipophilicity, stability, and exposure, outperforming existing general LLMs (e.g., GPT-5) and domain-specific baselines in both optimization success and interpretability. To our knowledge, this is the first LLM-based peptide design framework that combines explicit reasoning with RL-driven property control, marking a step toward reliable and transparent peptide optimization for therapeutic discovery.
【3】HERAKLES: Hierarchical Skill Compilation for Open-ended LLM Agents
Link: https://arxiv.org/abs/2508.14751
Authors: rta, Clément Romac, Loris Gaven, Pierre-Yves Oudeyer, Olivier Sigaud, Sylvain Lamprier
Comments: 42 pages
Abstract: Open-ended AI agents need to be able to efficiently learn goals of increasing complexity, abstraction and heterogeneity over their lifetime. Beyond efficiently sampling their own goals, autotelic agents specifically need to be able to keep the growing complexity of goals under control, limiting the associated growth in sample and computational complexity. To address this challenge, recent approaches have leveraged hierarchical reinforcement learning (HRL) and language, capitalizing on its compositional and combinatorial generalization capabilities to acquire temporally extended reusable behaviours. Existing approaches use expert defined spaces of subgoals over which they instantiate a hierarchy, and often assume pre-trained associated low-level policies. Such designs are inadequate in open-ended scenarios, where goal spaces naturally diversify across a broad spectrum of difficulties. We introduce HERAKLES, a framework that enables a two-level hierarchical autotelic agent to continuously compile mastered goals into the low-level policy, executed by a small, fast neural network, dynamically expanding the set of subgoals available to the high-level policy. We train a Large Language Model (LLM) to serve as the high-level controller, exploiting its strengths in goal decomposition and generalization to operate effectively over this evolving subgoal space. We evaluate HERAKLES in the open-ended Crafter environment and show that it scales effectively with goal complexity, improves sample efficiency through skill compilation, and enables the agent to adapt robustly to novel challenges over time.
【4】Cross-Modality Controlled Molecule Generation with Diffusion Language Model
Link: https://arxiv.org/abs/2508.14748
Authors: ang, Yifei Wang, Khanh Vinh Nguyen, Pengyu Hong
Abstract: Current SMILES-based diffusion models for molecule generation typically support only unimodal constraints. They inject conditioning signals at the start of the training process and require retraining a new model from scratch whenever the constraint changes. However, real-world applications often involve multiple constraints across different modalities, and additional constraints may emerge over the course of a study. This raises a challenge: how to extend a pre-trained diffusion model not only to support cross-modality constraints but also to incorporate new ones without retraining. To tackle this problem, we propose the Cross-Modality Controlled Molecule Generation with Diffusion Language Model (CMCM-DLM), demonstrated by two distinct cross modalities: molecular structure and chemical properties. Our approach builds upon a pre-trained diffusion model, incorporating two trainable modules, the Structure Control Module (SCM) and the Property Control Module (PCM), and operates in two distinct phases during the generation process. In Phase I, we employ the SCM to inject structural constraints during the early diffusion steps, effectively anchoring the molecular backbone. Phase II builds on this by further introducing the PCM to guide the later stages of inference to refine the generated molecules, ensuring their chemical properties match the specified targets. Experimental results on multiple datasets demonstrate the efficiency and adaptability of our approach, highlighting CMCM-DLM's significant advancement in molecular generation for drug discovery applications.
【5】ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine
Link: https://arxiv.org/abs/2508.14706
Authors: hen, Zhenyang Cai, Zhiheng Liu, Yunjin Yang, Rongsheng Wang, Qingying Xiao, Xiangyi Feng, Zhan Su, Jing Guo, Xiang Wan, Guangjun Yu, Haizhou Li, Benyou Wang
Abstract: Despite the success of large language models (LLMs) in various domains, their potential in Traditional Chinese Medicine (TCM) remains largely underexplored due to two critical barriers: (1) the scarcity of high-quality TCM data and (2) the inherently multimodal nature of TCM diagnostics, which involve looking, listening, smelling, and pulse-taking. These sensory-rich modalities are beyond the scope of conventional LLMs. To address these challenges, we present ShizhenGPT, the first multimodal LLM tailored for TCM. To overcome data scarcity, we curate the largest TCM dataset to date, comprising 100GB+ of text and 200GB+ of multimodal data, including 1.2M images, 200 hours of audio, and physiological signals. ShizhenGPT is pretrained and instruction-tuned to achieve deep TCM knowledge and multimodal reasoning. For evaluation, we collect recent national TCM qualification exams and build a visual benchmark for Medicinal Recognition and Visual Diagnosis. Experiments demonstrate that ShizhenGPT outperforms comparable-scale LLMs and competes with larger proprietary models. Moreover, it leads in TCM visual understanding among existing multimodal LLMs and demonstrates unified perception across modalities like sound, pulse, smell, and vision, paving the way toward holistic multimodal perception and diagnosis in TCM. Datasets, models, and code are publicly available. We hope this work will inspire further exploration in this field.
【6】ELATE: Evolutionary Language model for Automated Time-series Engineering
Link: https://arxiv.org/abs/2508.14667
Authors: rray, Danial Dervovic, Michael Cashmore
Comments: 27 pages, 4 figures. Comments welcome
Abstract: Time-series prediction involves forecasting future values using machine learning models. Feature engineering, whereby existing features are transformed to make new ones, is critical for enhancing model performance, but is often manual and time-intensive. Existing automation attempts rely on exhaustive enumeration, which can be computationally costly and lacks domain-specific insights. We introduce ELATE (Evolutionary Language model for Automated Time-series Engineering), which leverages a language model within an evolutionary framework to automate feature engineering for time-series data. ELATE employs time-series statistical measures and feature importance metrics to guide and prune features, while the language model proposes new, contextually relevant feature transformations. Our experiments demonstrate that ELATE improves forecasting accuracy by an average of 8.4% across various domains.
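A rough skeleton of the evolutionary loop described above, with the LLM proposal step stubbed out by a fixed pool of transformations; the correlation-based fitness and all names here are illustrative assumptions, not ELATE's actual implementation:

```python
import numpy as np
import pandas as pd

def propose_transform(step):
    """Stand-in for ELATE's LLM proposal step, which would suggest a
    contextually relevant transformation; here we cycle a fixed pool."""
    pool = [("lag1",  lambda s: s.shift(1)),
            ("diff1", lambda s: s.diff()),
            ("roll3", lambda s: s.rolling(3).mean())]
    return pool[step % len(pool)]

def fitness(series, target):
    """Feature-importance proxy: |correlation| with the next-step target."""
    c = series.corr(target.shift(-1))
    return 0.0 if np.isnan(c) else abs(c)

# toy series to engineer features for
rng = np.random.default_rng(0)
df = pd.DataFrame({"y": np.sin(np.arange(200) / 10) + 0.1 * rng.standard_normal(200)})

population = {"y": df["y"]}
for step in range(6):                       # evolutionary generations
    name, fn = propose_transform(step)
    for base in list(population):
        cand = fn(population[base]).rename(f"{base}_{name}")
        if fitness(cand, df["y"]) > 0.1:    # prune weak candidates
            population[cand.name] = cand
    # keep only the top-5 features by the importance proxy
    ranked = sorted(population.items(), key=lambda kv: -fitness(kv[1], df["y"]))
    population = dict(ranked[:5])

print(list(population))                     # surviving engineered features
```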
【7】Adaptively Robust LLM Inference Optimization under Prediction Uncertainty
Link: https://arxiv.org/abs/2508.14544
Authors: , Yinyu Ye, Zijie Zhou
Abstract: We study the problem of optimizing Large Language Model (LLM) inference scheduling to minimize total latency. LLM inference is an online, multi-task service process that is also heavily energy-consuming, in which a pre-trained LLM processes input requests and generates output tokens sequentially. Therefore, it is vital to improve its scheduling efficiency and reduce power consumption while a great number of prompt requests are arriving. A key challenge in LLM inference scheduling is that while the prompt length is known upon arrival, the output length, which critically impacts memory usage and processing time, is unknown. To address this uncertainty, we propose algorithms that leverage machine learning to predict output lengths, assuming the prediction provides an interval classification (min-max range) for each request. We first design a conservative algorithm, $\mathcal{A}_{\max}$, which schedules requests based on the upper bound of predicted output lengths to prevent memory overflow. However, this approach is overly conservative: as prediction accuracy decreases, performance degrades significantly due to potential overestimation. To overcome this limitation, we propose $\mathcal{A}_{\min}$, an adaptive algorithm that initially treats the predicted lower bound as the output length and dynamically refines this estimate during inference. We prove that $\mathcal{A}_{\min}$ achieves a log-scale competitive ratio. Through numerical simulations, we demonstrate that $\mathcal{A}_{\min}$ often performs nearly as well as the hindsight scheduler, highlighting both its efficiency and robustness in practical scenarios. Moreover, $\mathcal{A}_{\min}$ relies solely on the lower bound of the prediction interval--an advantageous design choice since upper bounds on output length are typically more challenging to predict accurately.
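A toy simulation of the $\mathcal{A}_{\min}$ idea: admit requests under the optimistic lower-bound estimate and enlarge a request's estimate once generation exceeds it. The doubling refinement rule and the memory model below are our own simplifications, not the paper's exact algorithm:

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt_len: int
    lo: int          # predicted lower bound on output length
    hi: int          # predicted upper bound (A_min only needs lo)
    true_len: int    # unknown to the scheduler
    done: int = 0    # tokens generated so far
    est: int = 0     # current working estimate

def a_min_schedule(requests, memory_budget):
    """Greedy admission by the current estimate; when a request outlives
    its estimate, the estimate is doubled (our stand-in refinement)."""
    for r in requests:
        r.est = r.lo
    pending, steps = list(requests), 0
    while pending:
        batch, mem = [], 0
        for r in pending:                  # admit while memory allows
            need = r.prompt_len + r.est
            if mem + need <= memory_budget:
                batch.append(r)
                mem += need
        if not batch:                      # nothing fits: give up (toy)
            break
        for r in batch:                    # one decode step per request
            r.done += 1
            if r.done >= r.true_len:
                pending.remove(r)
            elif r.done >= r.est:          # underestimated: refine
                r.est = min(2 * r.est, r.hi)
        steps += 1
    return steps

reqs = [Request(prompt_len=32, lo=8, hi=128, true_len=40),
        Request(prompt_len=16, lo=64, hi=256, true_len=80)]
print("decode steps used:", a_min_schedule(reqs, memory_budget=256))
```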
【8】Semantic Energy: Detecting LLM Hallucination Beyond Entropy
Link: https://arxiv.org/abs/2508.14496
Authors: Jiadong Pan, Jing Liu, Yan Chen, Joey Tianyi Zhou, Guangyu Wang, Qinghua Hu, Hua Wu, Changqing Zhang, Haifeng Wang
Abstract: Large Language Models (LLMs) are being increasingly deployed in real-world applications, but they remain susceptible to hallucinations, which produce fluent yet incorrect responses and lead to erroneous decision-making. Uncertainty estimation is a feasible approach to detect such hallucinations. For example, semantic entropy estimates uncertainty by considering the semantic diversity across multiple sampled responses, thus identifying hallucinations. However, semantic entropy relies on post-softmax probabilities and fails to capture the model's inherent uncertainty, causing it to be ineffective in certain scenarios. To address this issue, we introduce Semantic Energy, a novel uncertainty estimation framework that leverages the inherent confidence of LLMs by operating directly on the logits of the penultimate layer. By combining semantic clustering with a Boltzmann-inspired energy distribution, our method better captures uncertainty in cases where semantic entropy fails. Experiments across multiple benchmarks show that Semantic Energy significantly improves hallucination detection and uncertainty estimation, offering more reliable signals for downstream applications such as hallucination detection.
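A sketch of the scoring recipe the abstract suggests, under stated assumptions: per-response energy is the negative log-partition of the logits (the standard energy-based confidence score), aggregated within semantic clusters; exact-match grouping stands in for the paper's semantic clustering:

```python
import numpy as np
from collections import defaultdict

def energy(logits, T=1.0):
    """Boltzmann-style energy of one generated answer: negative
    log-partition of its logits. Lower energy = higher model confidence."""
    logits = np.asarray(logits) / T
    m = logits.max()
    return -T * (m + np.log(np.exp(logits - m).sum()))

def semantic_energy(samples, T=1.0):
    """samples: list of (answer_text, logits) for one prompt. Grouping by
    exact text is a stand-in for real semantic clustering (e.g. via
    bidirectional entailment)."""
    clusters = defaultdict(list)
    for text, logits in samples:
        clusters[text.strip().lower()].append(energy(logits, T))
    n = len(samples)  # prompt score: cluster-mass-weighted mean energy
    return sum(len(es) / n * np.mean(es) for es in clusters.values())

rng = np.random.default_rng(0)
samples = [("Paris", rng.standard_normal(100) + 2),
           ("Paris", rng.standard_normal(100) + 2),
           ("Lyon",  rng.standard_normal(100))]
print("semantic energy:", round(semantic_energy(samples), 3))
```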
【9】DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
Link: https://arxiv.org/abs/2508.14460
Authors: She, Yu Bao, Yu Lu, Lu Xu, Tao Li, Wenhao Zhu, Shujian Huang, Shanbo Cheng, Lu Lu, Yuxuan Wang
Comments: 18 pages, 4 figures
Abstract: We present DuPO, a dual learning-based preference optimization framework that generates annotation-free feedback via a generalized duality. DuPO addresses two key limitations: Reinforcement Learning with Verifiable Rewards (RLVR)'s reliance on costly labels and applicability restricted to verifiable tasks, and traditional dual learning's restriction to strictly dual task pairs (e.g., translation and back-translation). Specifically, DuPO decomposes a primal task's input into known and unknown components, then constructs its dual task to reconstruct the unknown part using the primal output and known information (e.g., reversing math solutions to recover hidden variables), broadening applicability to non-invertible tasks. The quality of this reconstruction serves as a self-supervised reward to optimize the primal task, synergizing with LLMs' ability to instantiate both tasks via a single model. Empirically, DuPO achieves substantial gains across diverse tasks: it enhances the average translation quality by 2.13 COMET over 756 directions, boosts the mathematical reasoning accuracy by an average of 6.4 points on three challenge benchmarks, and enhances performance by 9.3 points as an inference-time reranker (trading computation for accuracy). These results position DuPO as a scalable, general, and annotation-free paradigm for LLM optimization.
【10】Organ-Agents: Virtual Human Physiology Simulator via LLMs
Link: https://arxiv.org/abs/2508.14357
Authors: ng, He Jiao, Weizhi Nie, Honglin Guo, Keliang Xie, Zhenhua Wu, Lina Zhao, Yunpeng Bai, Yongtao Ma, Lanjun Wang, Yuting Su, Xi Gao, Weijie Wang, Nicu Sebe, Bruno Lepri, Bingwei Sun
Abstract: Recent advances in large language models (LLMs) have enabled new possibilities in simulating complex physiological systems. We introduce Organ-Agents, a multi-agent framework that simulates human physiology via LLM-driven agents. Each Simulator models a specific system (e.g., cardiovascular, renal, immune). Training consists of supervised fine-tuning on system-specific time-series data, followed by reinforcement-guided coordination using dynamic reference selection and error correction. We curated data from 7,134 sepsis patients and 7,895 controls, generating high-resolution trajectories across 9 systems and 125 variables. Organ-Agents achieved high simulation accuracy on 4,509 held-out patients, with per-system MSEs <0.16 and robustness across SOFA-based severity strata. External validation on 22,689 ICU patients from two hospitals showed moderate degradation under distribution shifts with stable simulation. Organ-Agents faithfully reproduces critical multi-system events (e.g., hypotension, hyperlactatemia, hypoxemia) with coherent timing and phase progression. Evaluation by 15 critical care physicians confirmed realism and physiological plausibility (mean Likert ratings 3.9 and 3.7). Organ-Agents also enables counterfactual simulations under alternative sepsis treatment strategies, generating trajectories and APACHE II scores aligned with matched real-world patients. In downstream early warning tasks, classifiers trained on synthetic data showed minimal AUROC drops (<0.04), indicating preserved decision-relevant patterns. These results position Organ-Agents as a credible, interpretable, and generalizable digital twin for precision diagnosis, treatment simulation, and hypothesis testing in critical care.
【11】Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency
Link: https://arxiv.org/abs/2508.14314
Authors: , Daniel Schwartz, Yanjun Qi
Abstract: Large language models (LLMs) have demonstrated impressive capabilities across diverse tasks, but they remain susceptible to hallucinations--generating content that appears plausible but contains factual inaccuracies. We present Finch-Zk, a black-box framework that leverages FINe-grained Cross-model consistency to detect and mitigate Hallucinations in LLM outputs without requiring external knowledge sources. Finch-Zk introduces two key innovations: 1) a cross-model consistency checking strategy that reveals fine-grained inaccuracies by comparing responses generated by diverse models from semantically-equivalent prompts, and 2) a targeted mitigation technique that applies precise corrections to problematic segments while preserving accurate content. Experiments on the FELM dataset show Finch-Zk improves hallucination detection F1 scores by 6-39% compared to existing approaches. For mitigation, Finch-Zk achieves 7-8 absolute percentage points improvement in answer accuracy on the GPQA-diamond dataset when applied to state-of-the-art models like Llama 4 Maverick and Claude 4 Sonnet. Extensive evaluation across multiple models demonstrates that Finch-Zk provides a practical, deployment-ready safeguard for enhancing factual reliability in production LLM systems.
【12】GLASS: Test-Time Acceleration for LLMs via Global-Local Neural Importance Aggregation
Link: https://arxiv.org/abs/2508.14302
Authors: n Sattarifard, Sepehr Lavasani, Ehsan Imani, Kunlin Zhang, Hanlin Xu, Fengyu Sun, Negar Hassanpour, Chao Gao
Abstract: Deploying Large Language Models (LLMs) on edge hardware demands aggressive, prompt-aware dynamic pruning to reduce computation without degrading quality. Static or predictor-based schemes either lock in a single sparsity pattern or incur extra runtime overhead, and recent zero-shot methods that rely on statistics from a single prompt fail on short prompt and/or long generation scenarios. We introduce A/I-GLASS: Activation- and Impact-based Global-Local neural importance Aggregation for feed-forward network SparSification, two training-free methods that dynamically select FFN units using a rank-aggregation of prompt local and model-intrinsic global neuron statistics. Empirical results across multiple LLMs and benchmarks demonstrate that GLASS significantly outperforms prior training-free methods, particularly in challenging long-form generation scenarios, without relying on auxiliary predictors or adding any inference overhead.
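The rank-aggregation step lends itself to a compact sketch: combine a prompt-local activation statistic with a precomputed model-intrinsic importance score per FFN unit, then keep the units with the best aggregate rank. The simple rank-averaging rule is our assumption, not necessarily GLASS's exact aggregator:

```python
import numpy as np

def glass_select(local_act, global_imp, keep_ratio=0.5):
    """Keep FFN units by aggregated rank of a prompt-local statistic
    (mean |activation| on the current prompt) and a global importance
    score computed offline."""
    local_rank = np.argsort(np.argsort(-local_act))    # 0 = most active
    global_rank = np.argsort(np.argsort(-global_imp))
    agg = (local_rank + global_rank) / 2.0
    k = int(len(local_act) * keep_ratio)
    keep = np.argsort(agg)[:k]                         # best aggregate rank
    mask = np.zeros(len(local_act), dtype=bool)
    mask[keep] = True
    return mask                                        # True = keep unit

rng = np.random.default_rng(0)
acts = np.abs(rng.standard_normal(11008))   # e.g. a Llama-style FFN width
imps = np.abs(rng.standard_normal(11008))   # precomputed global scores
mask = glass_select(acts, imps, keep_ratio=0.3)
print(mask.sum(), "of", mask.size, "FFN units kept")
```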
【13】Amortized Bayesian Meta-Learning for Low-Rank Adaptation of Large Language Models
Link: https://arxiv.org/abs/2508.14285
Authors: g, Jake Snell, Thomas L. Griffiths
Comments: 6 pages, 2 figures
Abstract: Fine-tuning large language models (LLMs) with low-rank adaptation (LoRA) is a cost-effective way to incorporate information from a specific dataset. However, it is often unclear how well the fine-tuned LLM will generalize, i.e., how well it will perform on unseen datasets. Methods have been proposed to improve generalization by optimizing with in-context prompts, or by using meta-learning to fine-tune LLMs. However, these methods are expensive in memory and computation, requiring either long-context prompts or saving copies of parameters and using second-order gradient updates. To address these challenges, we propose Amortized Bayesian Meta-Learning for LoRA (ABMLL). This method builds on amortized Bayesian meta-learning for smaller models, adapting this approach to LLMs while maintaining its computational efficiency. We reframe task-specific and global parameters in the context of LoRA and use a set of new hyperparameters to balance reconstruction accuracy and the fidelity of task-specific parameters to the global ones. ABMLL provides effective generalization and scales to large models such as Llama3-8B. Furthermore, as a result of using a Bayesian framework, ABMLL provides improved uncertainty quantification. We test ABMLL on Unified-QA and CrossFit datasets and find that it outperforms existing methods on these benchmarks in terms of both accuracy and expected calibration error.
【14】Two Birds with One Stone: Multi-Task Detection and Attribution of LLM-Generated Text
Link: https://arxiv.org/abs/2508.14190
Authors: , Youssef Mohamed, Shang Liu, Zeyan Liu
Comments: Securecomm 2025
Abstract: Large Language Models (LLMs), such as GPT-4 and Llama, have demonstrated remarkable abilities in generating natural language. However, they also pose security and integrity challenges. Existing countermeasures primarily focus on distinguishing AI-generated content from human-written text, with most solutions tailored for English. Meanwhile, authorship attribution--determining which specific LLM produced a given text--has received comparatively little attention despite its importance in forensic analysis. In this paper, we present DA-MTL, a multi-task learning framework that simultaneously addresses both text detection and authorship attribution. We evaluate DA-MTL on nine datasets and four backbone models, demonstrating its strong performance across multiple languages and LLM sources. Our framework captures each task's unique characteristics and shares insights between them, which boosts performance in both tasks. Additionally, we conduct a thorough analysis of cross-modal and cross-lingual patterns and assess the framework's robustness against adversarial obfuscation techniques. Our findings offer valuable insights into LLM behavior and the generalization of both detection and authorship attribution.
【15】DPad: Efficient Diffusion Language Models with Suffix Dropout
Link: https://arxiv.org/abs/2508.14148
Authors: en, Sitao Huang, Cong Guo, Chiyue Wei, Yintao He, Jianyi Zhang, Hai "Hellen" Li, Yiran Chen
Abstract: Diffusion-based Large Language Models (dLLMs) parallelize text generation by framing decoding as a denoising process, but suffer from high computational overhead since they predict all future suffix tokens at each step while retaining only a small fraction. We propose Diffusion Scratchpad (DPad), a training-free method that restricts attention to a small set of nearby suffix tokens, preserving fidelity while eliminating redundancy. DPad integrates two strategies: (i) a sliding window, which maintains a fixed-length suffix window, and (ii) distance-decay dropout, which deterministically removes distant suffix tokens before attention computation. This simple design is compatible with existing optimizations such as prefix caching and can be implemented with only a few lines of code. Comprehensive evaluations across multiple benchmarks on LLaDA-1.5 and Dream models demonstrate that DPad delivers up to $\mathbf{61.4\times}$ speedup over vanilla dLLMs while maintaining comparable accuracy, highlighting its potential for efficient and scalable long-sequence inference. Our code is available at https://github.com/Crys-Chen/DPad.
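The two suffix-dropout strategies can be pictured as one deterministic attention mask: a fixed window of nearby suffix tokens plus a distance-decay rule that removes distant ones. A sketch with illustrative constants rather than the paper's actual schedule:

```python
import numpy as np

def dpad_suffix_mask(cur_pos, seq_len, window=32, decay=0.9, thresh=0.05):
    """Boolean attention mask over positions in the spirit of DPad: the
    decoded prefix and a fixed-length window of nearby suffix tokens stay
    visible; beyond the window, a token is kept only if its distance-
    decayed score clears a fixed threshold (deterministic dropout)."""
    mask = np.zeros(seq_len, dtype=bool)
    mask[:cur_pos + 1] = True                 # already-decoded prefix
    dist = np.arange(seq_len) - cur_pos       # distance into the suffix
    in_window = (dist > 0) & (dist <= window)
    decayed = (dist > window) & (decay ** (dist - window) > thresh)
    return mask | in_window | decayed

m = dpad_suffix_mask(cur_pos=10, seq_len=256)
print("visible suffix tokens:", int(m[11:].sum()), "of", 245)
```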
【16】Punctuation and Predicates in Language Models
Link: https://arxiv.org/abs/2508.14067
Authors: Chauhan, Maheep Chaudhary, Koby Choy, Samuel Nellessen, Nandi Schoots
Abstract: In this paper we explore where information is collected and how it is propagated throughout layers in large language models (LLMs). We begin by examining the surprising computational importance of punctuation tokens which previous work has identified as attention sinks and memory aids. Using intervention-based techniques, we evaluate the necessity and sufficiency (for preserving model performance) of punctuation tokens across layers in GPT-2, DeepSeek, and Gemma. Our results show stark model-specific differences: for GPT-2, punctuation is both necessary and sufficient in multiple layers, while this holds far less in DeepSeek and not at all in Gemma. Extending beyond punctuation, we ask whether LLMs process different components of input (e.g., subjects, adjectives, punctuation, full sentences) by forming early static summaries reused across the network, or if the model remains sensitive to changes in these components across layers. We further investigate whether different reasoning rules are processed differently by LLMs. In particular, through interchange intervention and layer-swapping experiments, we find that conditional statements (if, then) and universal quantification (for all) are processed very differently. Our findings offer new insight into the internal mechanisms of punctuation usage and reasoning in LLMs and have implications for interpretability.
【17】EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition
Link: https://arxiv.org/abs/2508.14130
Authors: onier, Antony Perzo, Renaud Seguier
Abstract: Emotion recognition from speech is a challenging task that requires capturing both linguistic and paralinguistic cues, with critical applications in human-computer interaction and mental health monitoring. Recent works have highlighted the ability of Large Language Models (LLMs) to perform tasks outside of the sole natural language area. In particular, recent approaches have investigated coupling LLMs with other data modalities by using pre-trained backbones and different fusion mechanisms. This work proposes a novel approach that fine-tunes an LLM with audio and text representations for emotion prediction. Our method first extracts audio features using an audio feature extractor, which are then mapped into the LLM's representation space via a learnable interfacing module. The LLM takes as input (1) the transformed audio features, (2) additional features in the form of natural language (e.g., the transcript), and (3) a textual prompt describing the emotion prediction task. To efficiently adapt the LLM to this multimodal task, we employ Low-Rank Adaptation (LoRA), enabling parameter-efficient fine-tuning. Experimental results on standard emotion recognition benchmarks demonstrate that our model outperforms all but one of the existing Speech-Text LLMs in the literature, while requiring less than half the parameters of competing approaches. This highlights our approach's effectiveness in integrating multi-modal inputs for speech-based emotion understanding while maintaining significant computational efficiency.
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (17 papers)
【1】Graph Structure Learning with Temporal Graph Information Bottleneck for Inductive Representation Learning
Link: https://arxiv.org/abs/2508.14859
Authors: iong, Rizos Sakellariou
Comments: Accepted in the 28th European Conference on Artificial Intelligence (ECAI), 2025
Abstract: Temporal graph learning is crucial for dynamic networks where nodes and edges evolve over time and new nodes continuously join the system. Inductive representation learning in such settings faces two major challenges: effectively representing unseen nodes and mitigating noisy or redundant graph information. We propose GTGIB, a versatile framework that integrates Graph Structure Learning (GSL) with Temporal Graph Information Bottleneck (TGIB). We design a novel two-step GSL-based structural enhancer to enrich and optimize node neighborhoods and demonstrate its effectiveness and efficiency through theoretical proofs and experiments. The TGIB refines the optimized graph by extending the information bottleneck principle to temporal graphs, regularizing both edges and features based on our derived tractable TGIB objective function via variational approximation, enabling stable and efficient optimization. GTGIB-based models are evaluated to predict links on four real-world datasets; they outperform existing methods in all datasets under the inductive setting, with significant and consistent improvement in the transductive setting.
【2】MissionHD: Data-Driven Refinement of Reasoning Graph Structure through Hyperdimensional Causal Path Encoding and Decoding
Link: https://arxiv.org/abs/2508.14746
Authors: Yun, Raheeb Hassan, Ryozo Masukawa, Mohsen Imani
Abstract: Reasoning graphs from Large Language Models (LLMs) are often misaligned with downstream visual tasks such as video anomaly detection (VAD). Existing Graph Structure Refinement (GSR) methods are ill-suited for these novel, dataset-less graphs. We introduce Data-driven GSR (D-GSR), a new paradigm that directly optimizes graph structure using downstream task data, and propose MissionHD, a hyperdimensional computing (HDC) framework to operationalize it. MissionHD uses an efficient encode-decode process to refine the graph, guided by the downstream task signal. Experiments on challenging VAD and VAR benchmarks show significant performance improvements when using our refined graphs, validating our approach as an effective pre-processing step.
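The encode step of a hyperdimensional computing pipeline like the one named above typically binds node hypervectors to position hypervectors and bundles the results. A generic HDC sketch (node names are hypothetical, and MissionHD's decode/refine loop is omitted):

```python
import numpy as np

DIM = 10_000
rng = np.random.default_rng(0)

def hv():
    """Random bipolar hypervector, the basic HDC symbol."""
    return rng.choice([-1, 1], size=DIM)

# codebooks for node identities and path positions (hypothetical names)
nodes = {name: hv() for name in ["smoke", "alarm", "evacuate"]}
positions = [hv() for _ in range(3)]

def encode_path(path):
    """Encode a causal path by binding (elementwise product) each node
    with its position vector and bundling (sum + sign) the results, the
    standard HDC encode step."""
    bound = [nodes[n] * positions[i] for i, n in enumerate(path)]
    return np.sign(np.sum(bound, axis=0))

p1 = encode_path(["smoke", "alarm", "evacuate"])
p2 = encode_path(["alarm", "smoke", "evacuate"])   # order matters
cos = p1 @ p2 / (np.linalg.norm(p1) * np.linalg.norm(p2))
print("similarity of reordered paths:", round(cos, 3))
```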
【3】Addressing Graph Anomaly Detection via Causal Edge Separation and Spectrum
Link: https://arxiv.org/abs/2508.14684
Authors: , Wenjun Wang, Minglai Shao, Chang Liu, Yumeng Wang, Yueheng Sun
Comments: Proceedings of the 2024 KDD Workshop
Abstract: In the real world, anomalous entities often add more legitimate connections while hiding direct links with other anomalous entities, leading to heterophilic structures in anomalous networks that most GNN-based techniques fail to address. Several works have been proposed to tackle this issue in the spatial domain. However, these methods overlook the complex relationships between node structure encoding, node features, and their contextual environment, and research on solving heterophilic problems in the spectral domain with principled guidance remains limited. This study analyzes the spectral distribution of nodes with different heterophilic degrees and discovers that the heterophily of anomalous nodes causes the spectral energy to shift from low to high frequencies. To address the above challenges, we propose a spectral neural network CES2-GAD based on causal edge separation for anomaly detection on heterophilic graphs. First, CES2-GAD separates the original graph into homophilic and heterophilic edges using causal interventions. Subsequently, various hybrid-spectrum filters are used to capture signals from the segmented graphs. Finally, representations from multiple signals are concatenated and input into a classifier to predict anomalies. Extensive experiments with real-world datasets have proven the effectiveness of the proposed method.
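The motivating observation, that heterophilic (anomalous) connections shift signal energy toward high graph frequencies, can be checked directly with a graph Fourier transform; a minimal illustration on a toy two-cluster graph:

```python
import numpy as np

def spectral_energy(adj, x):
    """Energy of a node signal across graph frequencies: eigendecompose
    the symmetric normalized Laplacian, take squared GFT coefficients."""
    deg = adj.sum(1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    L = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigval, eigvec = np.linalg.eigh(L)        # frequencies in [0, 2]
    coeff = eigvec.T @ x                      # graph Fourier transform
    energy = coeff ** 2 / (coeff ** 2).sum()
    return eigval, energy

# toy graph: two homophilic triangles plus one cross-cluster edge
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
x = np.array([1., 1., 1., -1., -1., -1.])     # cluster-aligned signal
freq, en = spectral_energy(A, x)
print("low-frequency energy share:", en[freq < 1.0].sum().round(3))
```

Adding more cross-cluster ("heterophilic") edges and recomputing shows the energy share migrating toward the high-frequency end, which is the signal the paper's hybrid-spectrum filters are designed to capture.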
【4】Improving Fairness in Graph Neural Networks via Counterfactual Debiasing
Link: https://arxiv.org/abs/2508.14683
Authors: , Chang Liu, Yumeng Wang, Minglai Shao, Wenjun Wang
Comments: Proceedings of the 2024 KDD Workshop
Abstract: Graph Neural Networks (GNNs) have been successful in modeling graph-structured data. However, similar to other machine learning models, GNNs can exhibit bias in predictions based on attributes like race and gender. Moreover, bias in GNNs can be exacerbated by the graph structure and message-passing mechanisms. Recent cutting-edge methods propose mitigating bias by filtering out sensitive information from input or representations, like edge dropping or feature masking. Yet, we argue that such strategies may unintentionally eliminate non-sensitive features, leading to a compromised balance between predictive accuracy and fairness. To tackle this challenge, we present a novel approach utilizing counterfactual data augmentation for bias mitigation. This method involves creating diverse neighborhoods using counterfactuals before message passing, facilitating unbiased node representations learning from the augmented graph. Subsequently, an adversarial discriminator is employed to diminish bias in predictions by conventional GNN classifiers. Our proposed technique, Fair-ICD, ensures the fairness of GNNs under moderate conditions. Experiments on standard datasets using three GNN backbones demonstrate that Fair-ICD notably enhances fairness metrics while preserving high predictive performance.
【5】SBGD: Improving Graph Diffusion Generative Model via Stochastic Block Diffusion
Link: https://arxiv.org/abs/2508.14352
Authors: , Shan Wu
Abstract: Graph diffusion generative models (GDGMs) have emerged as powerful tools for generating high-quality graphs. However, their broader adoption faces challenges in scalability and size generalization. GDGMs struggle to scale to large graphs due to their high memory requirements, as they typically operate in the full graph space, requiring the entire graph to be stored in memory during training and inference. This constraint limits their feasibility for large-scale real-world graphs. GDGMs also exhibit poor size generalization, with limited ability to generate graphs of sizes different from those in the training data, restricting their adaptability across diverse applications. To address these challenges, we propose the stochastic block graph diffusion (SBGD) model, which refines graph representations into a block graph space. This space incorporates structural priors based on real-world graph patterns, significantly reducing memory complexity and enabling scalability to large graphs. The block representation also improves size generalization by capturing fundamental graph structures. Empirical results show that SBGD achieves significant memory improvements (up to 6$\times$) while maintaining comparable or even superior graph generation performance relative to state-of-the-art methods. Furthermore, experiments demonstrate that SBGD better generalizes to unseen graph sizes. The significance of SBGD extends beyond being a scalable and effective GDGM; it also exemplifies the principle of modularization in generative modeling, offering a new avenue for exploring generative models by decomposing complex tasks into more manageable components.
【6】A Non-Asymptotic Convergent Analysis for Scored-Based Graph Generative Model via a System of Stochastic Differential Equations
Link: https://arxiv.org/abs/2508.14351
Authors: , Chuan Wu
Abstract: Score-based graph generative models (SGGMs) have proven effective in critical applications such as drug discovery and protein synthesis. However, their theoretical behavior, particularly regarding convergence, remains underexplored. Unlike common score-based generative models (SGMs), which are governed by a single stochastic differential equation (SDE), SGGMs involve a system of coupled SDEs. In SGGMs, the graph structure and node features are governed by separate but interdependent SDEs. This distinction makes existing convergence analyses from SGMs inapplicable to SGGMs. In this work, we present the first non-asymptotic convergence analysis for SGGMs, focusing on the convergence bound (the risk of generative error) across three key graph generation paradigms: (1) feature generation with a fixed graph structure, (2) graph structure generation with fixed node features, and (3) joint generation of both graph structure and node features. Our analysis reveals several unique factors specific to SGGMs (e.g., the topological properties of the graph structure) which affect the convergence bound. Additionally, we offer theoretical insights into the selection of hyperparameters (e.g., sampling steps and diffusion length) and advocate for techniques like normalization to improve convergence. To validate our theoretical findings, we conduct a controlled empirical study using synthetic graph models, and the results align with our theoretical predictions. This work deepens the theoretical understanding of SGGMs, demonstrates their applicability in critical domains, and provides practical guidance for designing effective models.
【7】On the Interplay between Graph Structure and Learning Algorithms in Graph Neural Networks
Link: https://arxiv.org/abs/2508.14338
Authors: , Chuan Wu
Abstract: This paper studies the interplay between learning algorithms and graph structure for graph neural networks (GNNs). Existing theoretical studies on the learning dynamics of GNNs primarily focus on the convergence rates of learning algorithms under the interpolation regime (noise-free) and offer only a crude connection between these dynamics and the actual graph structure (e.g., maximum degree). This paper aims to bridge this gap by investigating the excessive risk (generalization performance) of learning algorithms in GNNs within the generalization regime (with noise). Specifically, we extend the conventional settings from the learning theory literature to the context of GNNs and examine how graph structure influences the performance of learning algorithms such as stochastic gradient descent (SGD) and Ridge regression. Our study makes several key contributions toward understanding the interplay between graph structure and learning in GNNs. First, we derive the excess risk profiles of SGD and Ridge regression in GNNs and connect these profiles to the graph structure through spectral graph theory. With this established framework, we further explore how different graph structures (regular vs. power-law) impact the performance of these algorithms through comparative analysis. Additionally, we extend our analysis to multi-layer linear GNNs, revealing an increasing non-isotropic effect on the excess risk profile, thereby offering new insights into the over-smoothing issue in GNNs from the perspective of learning algorithms. Our empirical results align with our theoretical predictions, collectively showcasing a coupling relation among graph structure, GNNs and learning algorithms, and providing insights on GNN algorithm design and selection in practice.
【8】Multi-view Graph Condensation via Tensor Decomposition
Link: https://arxiv.org/abs/2508.14330
Authors: oque dos Santos, Dawon Ahn, Diego Minatel, Alneu de Andrade Lopes, Evangelos E. Papalexakis
Abstract: Graph Neural Networks (GNNs) have demonstrated remarkable results in various real-world applications, including drug discovery, object detection, social media analysis, recommender systems, and text classification. In contrast to their vast potential, training them on large-scale graphs presents significant computational challenges due to the resources required for their storage and processing. Graph Condensation has emerged as a promising solution to reduce these demands by learning a synthetic compact graph that preserves the essential information of the original one while maintaining the GNN's predictive performance. Despite their efficacy, current graph condensation approaches frequently rely on a computationally intensive bi-level optimization. Moreover, they fail to maintain a mapping between synthetic and original nodes, limiting the interpretability of the model's decisions. In this sense, a wide range of decomposition techniques have been applied to learn linear or multi-linear functions from graph data, offering a more transparent and less resource-intensive alternative. However, their applicability to graph condensation remains unexplored. This paper addresses this gap and proposes a novel method called Multi-view Graph Condensation via Tensor Decomposition (GCTD) to investigate to what extent such techniques can synthesize an informative smaller graph and achieve comparable downstream task performance. Extensive experiments on six real-world datasets demonstrate that GCTD effectively reduces graph size while preserving GNN performance, achieving up to a 4.0% improvement in accuracy on three out of six datasets and competitive performance on large graphs compared to existing approaches. Our code is available at https://anonymous.4open.science/r/gctd-345A.
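GCTD's exact formulation is not spelled out in the abstract, but the underlying machinery, a CP decomposition fitted by alternating least squares, is standard; a compact generic 3-way CP-ALS for reference (the multi-view stack of adjacency "views" is our illustrative setup):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (I*J, R) from (I, R) and (J, R)."""
    r = A.shape[1]
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, r)

def cp_als(X, rank, iters=200):
    """Plain alternating least squares for a rank-R CP decomposition of
    a 3-way tensor X (I, J, K): X ~ sum_r a_r outer b_r outer c_r."""
    I, J, K = X.shape
    rng = np.random.default_rng(0)
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    X0 = X.reshape(I, -1)                          # mode-0 unfolding
    X1 = np.moveaxis(X, 1, 0).reshape(J, -1)       # mode-1 unfolding
    X2 = np.moveaxis(X, 2, 0).reshape(K, -1)       # mode-2 unfolding
    for _ in range(iters):
        A = X0 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X1 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X2 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# toy multi-view stack: 4 weighted "views" of the same 20-node graph
rng = np.random.default_rng(1)
F = [rng.random((n, 3)) for n in (20, 20, 4)]
X = np.einsum('ir,jr,kr->ijk', *F)                 # exactly rank 3
A, B, C = cp_als(X, rank=3)
recon = np.einsum('ir,jr,kr->ijk', A, B, C)
print("relative error:", np.linalg.norm(X - recon) / np.linalg.norm(X))
```

The factor matrices retain an explicit row-per-node correspondence, which is the interpretability advantage the abstract contrasts with bi-level condensation.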
【9】Graph Concept Bottleneck Models
Link: https://arxiv.org/abs/2508.14255
Authors: u, Tsui-Wei Weng, Lam M. Nguyen, Tengfei Ma
Abstract: Concept Bottleneck Models (CBMs) provide explicit interpretations for deep neural networks through concepts and allow intervention with concepts to adjust final predictions. Existing CBMs assume concepts are conditionally independent given labels and isolated from each other, ignoring the hidden relationships among concepts. However, the set of concepts in CBMs often has an intrinsic structure where concepts are generally correlated: changing one concept will inherently impact its related concepts. To mitigate this limitation, we propose GraphCBMs: a new variant of CBM that facilitates concept relationships by constructing latent concept graphs, which can be combined with CBMs to enhance model performance while retaining their interpretability. Our experiment results on real-world image classification tasks demonstrate that GraphCBMs offer the following benefits: (1) superior in image classification tasks while providing more concept structure information for interpretability; (2) able to utilize latent concept graphs for more effective interventions; and (3) robust in performance across different training and architecture settings.
【10】Accelerating Image Classification with Graph Convolutional Neural Networks using Voronoi Diagrams
Link: https://arxiv.org/abs/2508.14218
Authors: ohammadi Gharasuie, Luis Rueda
Comments: 14 pages, 13 figures
Abstract: Recent advances in image classification have been significantly propelled by the integration of Graph Convolutional Networks (GCNs), offering a novel paradigm for handling complex data structures. This study introduces an innovative framework that employs GCNs in conjunction with Voronoi diagrams to perform image classification, leveraging their exceptional capability to model relational data. Unlike conventional convolutional neural networks, our approach utilizes a graph-based representation of images, where pixels or regions are treated as vertices of a graph, which are then simplified in the form of the corresponding Delaunay triangulations. Our model yields significant improvement in pre-processing time and classification accuracy on several benchmark datasets, surpassing existing state-of-the-art models, especially in scenarios that involve complex scenes and fine-grained categories. The experimental results, validated via cross-validation, underscore the potential of integrating GCNs with Voronoi diagrams in advancing image classification tasks. This research contributes to the field by introducing a novel approach to image classification, while opening new avenues for developing graph-based learning paradigms in other domains of computer vision and non-structured data. In particular, we have proposed a new version of the GCN in this paper, namely normalized Voronoi Graph Convolution Network (NVGCN), which is faster than the regular GCN.
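The graph-construction step the abstract describes, treating regions as vertices and simplifying via the Delaunay triangulation, is straightforward with SciPy; a sketch using random points as stand-in region centroids:

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_adjacency(points):
    """Adjacency matrix from a Delaunay triangulation of 2-D points
    (e.g. superpixel centroids of an image), the graph a model like
    NVGCN would convolve over."""
    tri = Delaunay(points)
    n = len(points)
    A = np.zeros((n, n))
    for simplex in tri.simplices:          # each triangle yields 3 edges
        for i in range(3):
            a, b = simplex[i], simplex[(i + 1) % 3]
            A[a, b] = A[b, a] = 1.0
    return A

rng = np.random.default_rng(0)
pts = rng.random((50, 2))                  # stand-in region centroids
A = delaunay_adjacency(pts)
# symmetric normalization, as in a standard GCN layer: D^-1/2 (A+I) D^-1/2
A_hat = A + np.eye(len(A))
d = A_hat.sum(1) ** -0.5
print("normalized adjacency ready:", (d[:, None] * A_hat * d).shape)
```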
【11】Noise Robust One-Class Intrusion Detection on Dynamic Graphs
Link: https://arxiv.org/abs/2508.14192
Authors: iuliakov, Alexander Schulz, Luca Hermes, Barbara Hammer
Abstract: In the domain of network intrusion detection, robustness against contaminated and noisy data inputs remains a critical challenge. This study introduces a probabilistic version of the Temporal Graph Network Support Vector Data Description (TGN-SVDD) model, designed to enhance detection accuracy in the presence of input noise. By predicting parameters of a Gaussian distribution for each network event, our model is able to naturally address noisy adversarial inputs and improve robustness compared to a baseline model. Our experiments on a modified CIC-IDS2017 dataset with synthetic noise demonstrate significant improvements in detection performance compared to the baseline TGN-SVDD model, especially as noise levels increase.
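A probabilistic head of the kind described, predicting the parameters of a Gaussian per network event and training with the Gaussian negative log-likelihood, can be sketched as follows; the architecture is our illustration, not the authors' code:

```python
import torch
import torch.nn as nn

class GaussianEventHead(nn.Module):
    """Predict mean and log-variance of a Gaussian for each network
    event embedding, so noisy events can be down-weighted by variance."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, 1)
        self.log_var = nn.Linear(hidden, 1)

    def forward(self, event_emb):
        h = self.net(event_emb)
        return self.mu(h), self.log_var(h)

def gaussian_nll(mu, log_var, target):
    # negative log-likelihood of target under N(mu, exp(log_var))
    return 0.5 * (log_var + (target - mu) ** 2 / log_var.exp()).mean()

head = GaussianEventHead(dim=32)
emb = torch.randn(128, 32)          # TGN event embeddings (stand-in)
mu, log_var = head(emb)
loss = gaussian_nll(mu, log_var, torch.zeros_like(mu))
loss.backward()
```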
【12】Beyond Fixed Morphologies: Learning Graph Policies with Trust Region Compensation in Variable Action Spaces
Link: https://arxiv.org/abs/2508.14102
Authors: llien
摘要:基于信赖域的优化方法已经成为基础强化学习算法,在连续控制任务中提供稳定性和强大的经验性能。对可扩展和可重复使用的控制策略的兴趣越来越大,这也转化为对形态学泛化的需求,即控制策略应对不同运动学结构的能力。基于图的策略架构提供了一种自然而有效的机制来编码这种结构差异。然而,虽然这些架构适应可变的形态,信赖域方法在不同的动作空间维度下的行为仍然知之甚少。为此,我们进行了基于信任区域的策略优化方法的理论分析,重点是信任区域策略优化(TRPO)和它的广泛使用的一阶近似,邻近策略优化(PPO)。我们的目标是证明如何不同的动作空间维度影响的优化景观,特别是在KL发散或政策裁剪处罚的约束下。补充的理论见解,形态变化下的实证评估进行了体育馆游泳环境。这个基准提供了一个系统的控制设置,在不改变基本任务的情况下改变运动学结构,使其特别适合于研究形态学概括。
摘要:Trust region-based optimization methods have become foundational reinforcement learning algorithms that offer stability and strong empirical performance in continuous control tasks. Growing interest in scalable and reusable control policies translates also into a demand for morphological generalization, the ability of control policies to cope with different kinematic structures. Graph-based policy architectures provide a natural and effective mechanism to encode such structural differences. However, while these architectures accommodate variable morphologies, the behavior of trust region methods under varying action space dimensionality remains poorly understood. To this end, we conduct a theoretical analysis of trust region-based policy optimization methods, focusing on both Trust Region Policy Optimization (TRPO) and its widely used first-order approximation, Proximal Policy Optimization (PPO). The goal is to demonstrate how varying action space dimensionality influences the optimization landscape, particularly under the constraints imposed by KL-divergence or policy clipping penalties. Complementing the theoretical insights, an empirical evaluation under morphological variation is carried out using the Gymnasium Swimmer environment. This benchmark offers a systematically controlled setting for varying the kinematic structure without altering the underlying task, making it particularly well-suited to study morphological generalization.
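摘要中"动作空间维度如何影响KL散度约束"可以用一个小例子直观说明:对角高斯策略之间的KL散度在各维度上可加,因此在每维更新幅度相同时随维度线性增长;固定的信赖域半径意味着高维形态下每维的有效步长更小。下面是一个极简的数值演示(假设性示例,与论文的推导无关):

# 示意:对角高斯策略之间的 KL 随动作维度线性增长
import numpy as np

def kl_diag_gauss(mu1, s1, mu2, s2):
    """两个对角高斯 N(mu1, diag(s1^2)) 与 N(mu2, diag(s2^2)) 的 KL 散度。"""
    return float(np.sum(np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5))

for d in (2, 4, 8, 16):                            # 不同形态对应的动作维度
    mu_old, mu_new = np.zeros(d), np.full(d, 0.1)  # 每维相同的小幅策略更新
    sigma = np.ones(d)
    print(d, round(kl_diag_gauss(mu_old, sigma, mu_new, sigma), 4))  # 等于 0.005 * d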
【13】Non-Dissipative Graph Propagation for Non-Local Community Detection
标题:用于非局部社区检测的非耗散图传播
链接:https://arxiv.org/abs/2508.14097
作者:eeney, Alessio Gravina, Davide Bacciu
备注:Accepted at IJCNN 2025
摘要:图中的社区检测旨在将节点聚类为有意义的组,这在异配(heterophilic)图中尤其具有挑战性:在此类图中,彼此相似且属于同一社区的节点通常相距较远。当此任务由图神经网络处理时这一点尤为明显,因为它们依赖固有的局部消息传递机制来学习用于将节点聚类到社区的节点表示。在这项工作中,我们认为在消息传递过程中传播长程信息的能力,是在异配图上有效执行社区检测的关键。为此,我们引入了无监督反对称图神经网络(uAGNN),这是一种新颖的无监督社区检测方法,利用非耗散动力系统来确保稳定性并有效传播长程信息。通过采用反对称权重矩阵,uAGNN同时捕获局部和全局图结构,克服了异配场景带来的限制。在10个数据集上的大量实验证明了uAGNN在高度和中度异配设置下的卓越性能,而传统方法在这些设置中无法利用长程依赖。这些结果凸显了uAGNN作为多样图环境中无监督社区检测有力工具的潜力。
摘要:Community detection in graphs aims to cluster nodes into meaningful groups, a task particularly challenging in heterophilic graphs, where nodes sharing similarities and membership to the same community are typically distantly connected. This is particularly evident when this task is tackled by graph neural networks, since they rely on an inherently local message passing scheme to learn the node representations that serve to cluster nodes into communities. In this work, we argue that the ability to propagate long-range information during message passing is key to effectively perform community detection in heterophilic graphs. To this end, we introduce the Unsupervised Antisymmetric Graph Neural Network (uAGNN), a novel unsupervised community detection approach leveraging non-dissipative dynamical systems to ensure stability and to propagate long-range information effectively. By employing antisymmetric weight matrices, uAGNN captures both local and global graph structures, overcoming the limitations posed by heterophilic scenarios. Extensive experiments across ten datasets demonstrate uAGNN's superior performance in high and medium heterophilic settings, where traditional methods fail to exploit long-range dependencies. These results highlight uAGNN's potential as a powerful tool for unsupervised community detection in diverse graph environments.
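摘要中"反对称权重矩阵带来非耗散传播"的直觉是:实反对称矩阵的特征值为纯虚数,因此由它驱动的线性动力系统既不指数衰减也不指数发散,长程信息得以保留。下面是一个极简的数值示意(传播形式为本文假设的欧拉离散化,并非uAGNN的原始实现;原方法可能还包含小的阻尼项以保证离散化稳定):

# 示意:反对称矩阵 W - W^T 的特征值实部为 0,避免信息耗散
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d))
A = W - W.T                                        # 反对称权重矩阵
print(np.max(np.abs(np.linalg.eigvals(A).real)))   # 接近 0:无衰减/爆炸方向

def propagate_step(X, adj, eps=0.05):
    """X: (N, d) 节点特征;adj: (N, N) 归一化邻接矩阵;一步欧拉传播。"""
    return X + eps * np.tanh(X @ A.T + adj @ X)    # 反对称变换 + 邻居聚合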
【14】GeoMAE: Masking Representation Learning for Spatio-Temporal Graph Forecasting with Missing Values
标题:GeoMAE:用于具有缺失值的时空图预测的掩蔽表示学习
链接:https://arxiv.org/abs/2508.14083
作者:, Chenyu Wu, Yuxuan Liang, Xiuwen Yi, Yanping Sun, Junbo Zhang, Yu Zheng
备注:8 pages, incomplete version
摘要:准确获取兴趣点(POI)的人群流量对有效的交通管理、公共服务和城市规划至关重要。尽管如此重要,由于城市感知技术的限制,大多数来源的数据质量不足以监测每个POI的人群流量。这使得从低质量数据中推断准确人群流量成为一项关键且具有挑战性的任务。三个关键因素加剧了其复杂性:1)标记数据的稀缺性;2)POI之间复杂的时空依赖;3)精确人群流量与GPS报告之间的众多关联。为了应对这些挑战,我们将人群流量推断问题重新表述为自监督的属性图表示学习任务,并引入了一种新颖的面向时空数据的对比自学习框架(CSST)。我们的方法首先基于POI及其相互距离构建空间邻接图,然后采用对比学习技术利用大量未标记的时空数据,并采用交换预测方法,从相似实例出发预测目标子图的表示。在预训练阶段之后,模型使用准确的人群流量数据进行微调。我们在两个真实世界数据集上的实验表明,在大量噪声数据上预训练的模型始终优于从头训练的模型。
摘要:Accurate acquisition of crowd flow at Points of Interest (POIs) is pivotal for effective traffic management, public service, and urban planning. Despite this importance, due to the limitations of urban sensing techniques, the data quality from most sources is inadequate for monitoring crowd flow at each POI. This renders the inference of accurate crowd flow from low-quality data a critical and challenging task. The complexity is heightened by three key factors: 1) the scarcity and rarity of labeled data, 2) the intricate spatio-temporal dependencies among POIs, and 3) the myriad correlations between precise crowd flow and GPS reports. To address these challenges, we recast the crowd flow inference problem as a self-supervised attributed graph representation learning task and introduce a novel Contrastive Self-learning framework for Spatio-Temporal data (CSST). Our approach initiates with the construction of a spatial adjacency graph founded on the POIs and their respective distances. We then employ a contrastive learning technique to exploit large volumes of unlabeled spatio-temporal data. We adopt a swapped prediction approach to anticipate the representation of the target subgraph from similar instances. Following the pre-training phase, the model is fine-tuned with accurate crowd flow data. Our experiments, conducted on two real-world datasets, demonstrate that the CSST pre-trained on extensive noisy data consistently outperforms models trained from scratch.
【15】Explainable Graph Spectral Clustering For Text Embeddings
标题:文本嵌入的可解释图谱聚类
链接:https://arxiv.org/abs/2508.14075
作者:w A. Kłopotek, Sławomir T. Wierzchoń, Bartłomiej Starosta, Piotr Borkowski, Dariusz Czerski, Eryk Laskowski
备注:47 pages, 19 tables, 11 figures
摘要:在先前的一篇论文中,我们对文本文档图谱聚类结果的可解释性做了初步探讨,其中文档相似度按词项向量空间中的余弦相似度计算。在本文中,我们通过考虑文档的其他嵌入方式(特别是基于GloVe嵌入思想)来推广这一思路。
摘要:In a previous paper, we proposed an introduction to the explainability of Graph Spectral Clustering results for textual documents, given that document similarity is computed as cosine similarity in term vector space. In this paper, we generalize this idea by considering other embeddings of documents, in particular, based on the GloVe embedding idea.
【16】Graph Neural Network for Product Recommendation on the Amazon Co-purchase Graph
标题:用于亚马逊共同购买图上产品推荐的图神经网络
链接:https://arxiv.org/abs/2508.14059
作者:Cao, Frank F. Yang, Yi Jin, Yijun Yan
备注:15 pages, 5 figures, preprint
摘要:在海量数据中识别相关信息是现代推荐系统面临的一个挑战。图神经网络(GNN)通过基于图的学习利用结构和语义关系,已经展示了巨大的潜力。该研究评估了四种GNN架构,LightGCN,GraphSAGE,GAT和PinSAGE,在链接预测设置下的亚马逊产品共同购买网络上的能力。我们研究了架构,模型性能,可扩展性,训练复杂性和泛化之间的实际权衡。结果展示了每个模型在现实推荐场景中部署GNN的性能特征。
摘要:Identifying relevant information among massive volumes of data is a challenge for modern recommendation systems. Graph Neural Networks (GNNs) have demonstrated significant potential by utilizing structural and semantic relationships through graph-based learning. This study assessed the abilities of four GNN architectures, LightGCN, GraphSAGE, GAT, and PinSAGE, on the Amazon Product Co-purchase Network under link prediction settings. We examined practical trade-offs between architectures, model performance, scalability, training complexity and generalization. The outcomes demonstrated each model's performance characteristics for deploying GNN in real-world recommendation scenarios.
【17】Deep Learning for School Dropout Detection: A Comparison of Tabular and Graph-Based Models for Predicting At-Risk Students
标题:用于学生辍学检测的深度学习:预测高危学生的表格模型与基于图的模型的比较
链接:https://arxiv.org/abs/2508.14057
作者:Almeida, Guilherme A. L. Silva, Valéria Santos, Gladston Moreira, Pedro Silva, Eduardo Luz
备注:12 pages
摘要:学生辍学是世界各地教育系统面临的重大挑战,带来巨大的社会和经济成本。预测有辍学风险的学生可以及时采取干预措施。虽然在表格数据上运行的传统机器学习(ML)模型已展现出潜力,但如果将学生数据组织为图,图神经网络(GNN)能够捕捉其中固有的复杂关系,从而具有潜在优势。本文探讨主要借助聚类技术将表格学生数据转换为图结构,是否能提高辍学预测的准确性。我们使用真实世界的学生数据集,比较了GNN(自定义图卷积网络(GCN)和GraphSAGE)在这些生成的图上与成熟表格模型(随机森林(RF)、XGBoost和TabNet)的性能。我们的实验探索了基于不同聚类算法(K-Means、HDBSCAN)和降维技术(主成分分析(PCA)、均匀流形近似与投影(UMAP))的多种图构建策略。研究结果表明,一种特定的GNN配置,即在PCA-KMeans聚类所得图上的GraphSAGE,取得了更优的性能,相比最强的表格基线(XGBoost),宏观F1分数提高了约7个百分点,准确率提高了近2个百分点。然而,其他GNN配置和图构建方法并不总能超越表格模型,这凸显了图生成策略和GNN架构选择的关键作用,也同时揭示了GNN的潜力以及在该领域为基于图的学习最优地转换表格数据所面临的挑战。
摘要:Student dropout is a significant challenge in educational systems worldwide, leading to substantial social and economic costs. Predicting students at risk of dropout allows for timely interventions. While traditional Machine Learning (ML) models operating on tabular data have shown promise, Graph Neural Networks (GNNs) offer a potential advantage by capturing complex relationships inherent in student data if structured as graphs. This paper investigates whether transforming tabular student data into graph structures, primarily using clustering techniques, enhances dropout prediction accuracy. We compare the performance of GNNs (a custom Graph Convolutional Network (GCN) and GraphSAGE) on these generated graphs against established tabular models (Random Forest (RF), XGBoost, and TabNet) using a real-world student dataset. Our experiments explore various graph construction strategies based on different clustering algorithms (K-Means, HDBSCAN) and dimensionality reduction techniques (Principal Component Analysis (PCA), Uniform Manifold Approximation and Projection (UMAP)). Our findings demonstrate that a specific GNN configuration, GraphSAGE on a graph derived from PCA-KMeans clustering, achieved superior performance, notably improving the macro F1-score by approximately 7 percentage points and accuracy by nearly 2 percentage points over the strongest tabular baseline (XGBoost). However, other GNN configurations and graph construction methods did not consistently surpass tabular models, emphasizing the critical role of the graph generation strategy and GNN architecture selection. This highlights both the potential of GNNs and the challenges in optimally transforming tabular data for graph-based learning in this domain.
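针对摘要中表现最好的"PCA-KMeans聚类生成图 + GraphSAGE"配置,下面给出图构建步骤的一个极简示意(簇内k近邻连边的细节为本文假设,论文可能采用不同的连边规则):

# 示意:表格数据 -> PCA 降维 -> KMeans 聚类 -> 簇内 k 近邻连边
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def tabular_to_graph(X, n_components=8, n_clusters=10, k=5):
    """X: (N, F) 学生表格特征;返回无向边列表,可喂给 GCN/GraphSAGE。"""
    Z = PCA(n_components=n_components).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(Z)
    edges = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        if len(idx) <= k:                               # 小簇内两两全连接
            edges += [(int(i), int(j)) for i in idx for j in idx if i < j]
            continue
        nn = NearestNeighbors(n_neighbors=k + 1).fit(Z[idx])
        _, nbrs = nn.kneighbors(Z[idx])                 # 第 0 列是样本自身
        for row, i in enumerate(idx):
            edges += [tuple(sorted((int(i), int(idx[j])))) for j in nbrs[row][1:]]
    return sorted(set(edges))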
Transformer(3篇)
【1】Multimodal Quantum Vision Transformer for Enzyme Commission Classification from Biochemical Representations
标题:用于从生化表征进行酶委员会分类的多模态量子视觉Transformer
链接:https://arxiv.org/abs/2508.14844
作者:k, Mandeep Kaur Saggi, Humaira Gowher, Sabre Kais
备注:Accepted at IEEE International Conference on Quantum Artificial Intelligence (QAI) 2025
摘要:准确预测酶的功能仍然是计算生物学的主要挑战之一,特别是对于结构注释有限或序列同源性低的酶。我们提出了一种新颖的多模态量子机器学习(QML)框架,通过整合四种互补的生化模态来增强酶委员会(EC)分类:蛋白质序列嵌入、量子衍生的电子描述符、分子图结构和二维分子图像表示。该框架采用Quantum Vision Transformer(QVT)主干,配备模态专用编码器和统一的交叉注意融合模块。通过整合图特征和空间模式,我们的方法捕获了酶功能背后关键的立体电子相互作用。实验结果表明,我们的多模态QVT模型达到了85.1%的top-1准确率,大幅优于仅用序列的基线,并且与其他QML模型相比取得了更好的性能。
摘要:Accurately predicting enzyme functionality remains one of the major challenges in computational biology, particularly for enzymes with limited structural annotations or sequence homology. We present a novel multimodal Quantum Machine Learning (QML) framework that enhances Enzyme Commission (EC) classification by integrating four complementary biochemical modalities: protein sequence embeddings, quantum-derived electronic descriptors, molecular graph structures, and 2D molecular image representations. We use a Quantum Vision Transformer (QVT) backbone equipped with modality-specific encoders and a unified cross-attention fusion module. By integrating graph features and spatial patterns, our method captures key stereoelectronic interactions behind enzyme function. Experimental results demonstrate that our multimodal QVT model achieves a top-1 accuracy of 85.1%, outperforming sequence-only baselines by a substantial margin and achieving better performance results compared to other QML models.
【2】NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
标题:NVIDIA Nemotron Nano 2:准确有效的混合Mamba-Transformer推理模型
链接:https://arxiv.org/abs/2508.14444
作者:arti Basant, Abhijit Khairnar, Abhijit Paithankar, Abhinav Khattar, Adi Renduchintala, Adithya Renduchintala, Aditya Malte, Akhiad Bercovich, Akshay Hazare, Alejandra Rico, Aleksander Ficek, Alex Kondratenko, Alex Shaposhnikov, Ali Taghibakhshi, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amy Shen, Andrew Tao, Ann Guan, Anna Shors, Anubhav Mandarwal, Arham Mehta, Arun Venkatesan, Ashton Sharabiani, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Banghua Zhu, Barnaby Simkin, Bilal Kartal, Bita Darvish Rouhani, Bobby Chen, Boris Ginsburg, Brandon Norick, Brian Yu, Bryan Catanzaro, Charles Wang, Charlie Truong, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Christian Munley, Christopher Parisien, Dan Su, Daniel Afrimi, Daniel Korzekwa, Daniel Rohrer, Daria Gitman, David Mosallanezhad, Deepak Narayanan, Dima Rekesh, Dina Yared, Dmytro Pykhtar, Dong Ahn, Duncan Riach, Eileen Long, Elliott Ning, Eric Chung, Erick Galinkin, Evelina Bakhturina, Gargi Prasad, Gerald Shen, Haim Elisha, Harsh Sharma, Hayley Ross, Helen Ngo, Herman Sahota, Hexin Wang, Hoo Chang Shin, Hua Huang, Iain Cunningham, Igor Gitman, Ivan Moshkov, Jaehun Jung, Jan Kautz, Jane Polak Scowcroft, Jared Casper, Jimmy Zhang, Jinze Xue, Jocelyn Huang, Joey Conway, John Kamalu, Jonathan Cohen, Joseph Jennings, Julien Veron Vialard, Junkeun Yi, Jupinder Parmar, Kari Briski, Katherine Cheung, Katherine Luna, Keith Wyss, Keshav Santhanam, Kezhi Kong, Krzysztof Pawelec, Kumar Anik, Kunlun Li, Kushan Ahmadian, Lawrence McAfee
摘要:我们推出了Nemotron-Nano-9B-v2,这是一种混合Mamba-Transformer语言模型,旨在提高推理工作负载的吞吐量,同时与类似规模的模型相比实现最先进的准确性。Nemotron-Nano-9B-v2建立在Nemotron-H架构之上,该架构将常见Transformer架构中的大多数自注意力层替换为Mamba-2层,以便在生成推理所需的长思维轨迹时获得更快的推理速度。我们首先使用FP8训练配方在20万亿令牌上预训练一个120亿参数模型(Nemotron-Nano-12B-v2-Base),再由其创建Nemotron-Nano-9B-v2。在对齐Nemotron-Nano-12B-v2-Base之后,我们采用Minitron策略压缩并蒸馏该模型,目标是在单个NVIDIA A10G GPU(22GiB内存,bfloat16精度)上支持多达128k令牌的推理。与现有的类似规模模型(例如Qwen3-8B)相比,Nemotron-Nano-9B-v2在推理基准上达到同等或更好的准确性,同时在8k输入、16k输出令牌等推理设置中实现高达6倍的推理吞吐量。我们正在Hugging Face上发布Nemotron-Nano-9B-v2、Nemotron-Nano-12B-v2-Base和Nemotron-Nano-9B-v2-Base检查点,以及我们的大部分预训练和后训练数据集。
摘要:We introduce Nemotron-Nano-9B-v2, a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly-sized models. Nemotron-Nano-9B-v2 builds on the Nemotron-H architecture, in which the majority of the self-attention layers in the common Transformer architecture are replaced with Mamba-2 layers, to achieve improved inference speed when generating the long thinking traces needed for reasoning. We create Nemotron-Nano-9B-v2 by first pre-training a 12-billion-parameter model (Nemotron-Nano-12B-v2-Base) on 20 trillion tokens using an FP8 training recipe. After aligning Nemotron-Nano-12B-v2-Base, we employ the Minitron strategy to compress and distill the model with the goal of enabling inference on up to 128k tokens on a single NVIDIA A10G GPU (22GiB of memory, bfloat16 precision). Compared to existing similarly-sized models (e.g., Qwen3-8B), we show that Nemotron-Nano-9B-v2 achieves on-par or better accuracy on reasoning benchmarks while achieving up to 6x higher inference throughput in reasoning settings like 8k input and 16k output tokens. We are releasing Nemotron-Nano-9B-v2, Nemotron-Nano-12B-v2-Base, and Nemotron-Nano-9B-v2-Base checkpoints along with the majority of our pre- and post-training datasets on Hugging Face.
【3】STAS: Spatio-Temporal Adaptive Computation Time for Spiking Transformers
标题:STAS:尖峰Transformer的时空自适应计算时间
链接:https://arxiv.org/abs/2508.14138
作者:ang, Doohyun Kim, Sang-Ki Ko, Jinkyu Lee, Brent ByungHoon Kang, Hyeongboo Baek
备注:8 pages
摘要:尖峰神经网络(SNN)比人工神经网络(ANN)具有更高的能效,但由于其多时间步的运行特性,存在高延迟和计算开销。虽然已经发展出多种动态计算方法,通过针对空间、时间或特定架构的冗余来缓解这一问题,但这些方法仍然是碎片化的。虽然自适应计算时间(ACT)的原则为统一方法提供了坚实基础,但它在基于SNN的视觉Transformer(ViT)上的应用受到两个核心问题的阻碍:其时间相似性前提被违反,以及静态架构从根本上不适合其原则。为了应对这些挑战,我们提出了STAS(Spatio-Temporal Adaptive computation time for Spiking transformers),一个协同设计静态架构和动态计算策略的框架。STAS引入了集成尖峰补丁分割(I-SPS)模块,通过创建统一的输入表示来建立时间稳定性,从而解决时间不相似这一架构问题。这种稳定性反过来又允许我们的自适应尖峰自注意力(A-SSA)模块在空间和时间两个轴上执行二维令牌修剪。STAS在尖峰Transformer架构上实现,并在CIFAR-10、CIFAR-100和ImageNet上得到验证,能耗分别降低了45.9%、43.8%和30.1%,同时相比SOTA模型提高了准确性。
摘要:Spiking neural networks (SNNs) offer energy efficiency over artificial neural networks (ANNs) but suffer from high latency and computational overhead due to their multi-timestep operational nature. While various dynamic computation methods have been developed to mitigate this by targeting spatial, temporal, or architecture-specific redundancies, they remain fragmented. While the principles of adaptive computation time (ACT) offer a robust foundation for a unified approach, its application to SNN-based vision Transformers (ViTs) is hindered by two core issues: the violation of its temporal similarity prerequisite and a static architecture fundamentally unsuited for its principles. To address these challenges, we propose STAS (Spatio-Temporal Adaptive computation time for Spiking transformers), a framework that co-designs the static architecture and dynamic computation policy. STAS introduces an integrated spike patch splitting (I-SPS) module to establish temporal stability by creating a unified input representation, thereby solving the architectural problem of temporal dissimilarity. This stability, in turn, allows our adaptive spiking self-attention (A-SSA) module to perform two-dimensional token pruning across both spatial and temporal axes. Implemented on spiking Transformer architectures and validated on CIFAR-10, CIFAR-100, and ImageNet, STAS reduces energy consumption by up to 45.9%, 43.8%, and 30.1%, respectively, while simultaneously improving accuracy over SOTA models.
GAN|对抗|攻击|生成相关(5篇)
【1】Personalized Counterfactual Framework: Generating Potential Outcomes from Wearable Data
标题:个性化反事实框架:从可穿戴数据中产生潜在结果
链接:https://arxiv.org/abs/2508.14432
作者:amanian, Amir M. Rahmani
摘要:可穿戴传感器数据为个性化健康监测提供了机会,但从其复杂的纵向数据流中获得可操作的见解颇具挑战。本文介绍了一个从多元可穿戴数据中学习个性化反事实模型的框架,使我们能够探索假设情景,以了解生活方式选择对个体的潜在影响。我们的方法首先通过多模态相似性分析,用相似患者的数据扩充个体数据集;然后使用时间PC(Peter-Clark)算法的一种改编来发现预测关系,建模时刻t-1的变量如何影响时刻t的生理变化;再在这些发现的关系上训练梯度提升机,以量化个体特定的影响。这些模型驱动一个反事实引擎,在假设性干预(例如活动或睡眠变化)下推演生理轨迹。我们通过一步超前预测验证,以及评估干预的合理性和影响,来评价该框架。评估显示了合理的预测准确性(例如平均心率MAE为4.71 bpm)和较高的反事实可信度(中位数0.9643)。至关重要的是,这些干预凸显了个体间对假设性生活方式变化反应的显著差异,展示了该框架在提供个性化见解方面的潜力。这项工作提供了一个探索个性化健康动态、并就个体对生活方式变化的反应生成假设的工具。
摘要:Wearable sensor data offer opportunities for personalized health monitoring, yet deriving actionable insights from their complex, longitudinal data streams is challenging. This paper introduces a framework to learn personalized counterfactual models from multivariate wearable data. This enables exploring what-if scenarios to understand potential individual-specific outcomes of lifestyle choices. Our approach first augments individual datasets with data from similar patients via multi-modal similarity analysis. We then use a temporal PC (Peter-Clark) algorithm adaptation to discover predictive relationships, modeling how variables at time t-1 influence physiological changes at time t. Gradient Boosting Machines are trained on these discovered relationships to quantify individual-specific effects. These models drive a counterfactual engine projecting physiological trajectories under hypothetical interventions (e.g., activity or sleep changes). We evaluate the framework via one-step-ahead predictive validation and by assessing the plausibility and impact of interventions. Evaluation showed reasonable predictive accuracy (e.g., mean heart rate MAE 4.71 bpm) and high counterfactual plausibility (median 0.9643). Crucially, these interventions highlighted significant inter-individual variability in response to hypothetical lifestyle changes, showing the framework's potential for personalized insights. This work provides a tool to explore personalized health dynamics and generate hypotheses on individual responses to lifestyle changes.
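为说明摘要中"用t-1时刻的变量预测t时刻的生理变化,再由反事实引擎在假设干预下外推轨迹"的流程,下面给出一个极简示意(步数、心率等变量与数值均为虚构的演示假设,并非论文数据):

# 示意:滞后特征 + 梯度提升机 + 反事实轨迹外推
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
T = 300
steps = rng.integers(2000, 12000, T).astype(float)    # 假设的每日步数
hr = 70 - 0.0005 * steps + rng.normal(0, 1.0, T)      # 假设的静息心率

X = np.column_stack([steps[:-1], hr[:-1]])            # t-1 时刻的特征
y = hr[1:]                                            # t 时刻的目标
model = GradientBoostingRegressor(random_state=0).fit(X, y)

hr_t, trajectory = hr[-1], []
for _ in range(14):                                   # 反事实:连续两周每日走 12000 步
    hr_t = float(model.predict(np.array([[12000.0, hr_t]]))[0])
    trajectory.append(round(hr_t, 2))
print(trajectory)                                     # 假设干预下的 14 天心率轨迹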
【2】HandCraft: Dynamic Sign Generation for Synthetic Data Augmentation
标题:HandCraft:用于合成数据增强的动态手语生成
链接:https://arxiv.org/abs/2508.14345
作者:stavo Rios
备注:26 pages, 4 figures, 9 tables, code available at this https URL
摘要:手语识别(SLR)模型由于训练数据不足而面临显著的性能限制。在本文中,我们通过引入一种基于CMLPe的新颖轻量级手语生成模型来应对SLR数据有限的挑战。该模型与合成数据预训练方法相结合,能持续提高识别准确率,并使用我们的Mamba-SL和Transformer-SL分类器在LSFB和DiSPLaY数据集上取得了新的最先进结果。我们的研究结果表明,合成数据预训练在某些情况下优于传统的数据增强方法,并且在与其一同使用时能带来互补的收益。我们的方法提供了计算高效、能在不同数据集上带来显著性能提升的方案,使SLR的手语生成和合成数据预训练更加普及。
摘要:Sign Language Recognition (SLR) models face significant performance limitations due to insufficient training data availability. In this article, we address the challenge of limited data in SLR by introducing a novel and lightweight sign generation model based on CMLPe. This model, coupled with a synthetic data pretraining approach, consistently improves recognition accuracy, establishing new state-of-the-art results for the LSFB and DiSPLaY datasets using our Mamba-SL and Transformer-SL classifiers. Our findings reveal that synthetic data pretraining outperforms traditional augmentation methods in some cases and yields complementary benefits when implemented alongside them. Our approach democratizes sign generation and synthetic data pretraining for SLR by providing computationally efficient methods that achieve significant performance improvements across diverse datasets.
【3】GEPD:GAN-Enhanced Generalizable Model for EEG-Based Detection of Parkinson's Disease
标题:GEPD:用于基于EEG的帕金森病检测的GAN增强可推广模型
链接:https://arxiv.org/abs/2508.14074
作者:g, Ruilin Zhang, Biaokai Zhu, Xun Han, Jun Xiao, Yifan Liu, Zhe Wang
备注:Accepted by International Conference on Intelligent Computing(ICIC 2025)
摘要:脑电图(EEG)已被确立为检测帕金森病的有效方法,而帕金森病通常需要早期诊断。目前的帕金森病检测方法在单个数据集上取得了显著成功;然而,不同EEG数据集之间检测方法的差异以及每个数据集规模较小,给训练适用于跨数据集场景的可推广模型带来了挑战。为了解决这些问题,本文提出了一种GAN增强的可推广模型GEPD,专门用于基于EEG的帕金森病跨数据集分类。首先,我们设计了一个生成网络,通过控制生成数据与真实数据之间的分布相似度来生成融合EEG数据;此外,我们还设计了一个EEG信号质量评估模型,以确保生成数据的质量。其次,我们设计了一个分类网络,利用多个卷积神经网络的组合来有效捕获EEG信号的时频特征,同时保持可推广的结构并确保易于收敛。这项工作致力于利用智能方法研究病理表现,旨在促进神经系统疾病的诊断和监测。评估结果表明,我们的模型在跨数据集设置中的表现与最先进的模型相当,达到了84.3%的准确率和84.0%的F1分数,展示了所提模型的泛化能力。
摘要:Electroencephalography has been established as an effective method for detecting Parkinson's disease, which is typically diagnosed early. Current Parkinson's disease detection methods have shown significant success within individual datasets; however, the variability in detection methods across different EEG datasets and the small size of each dataset pose challenges for training a generalizable model for cross-dataset scenarios. To address these issues, this paper proposes a GAN-enhanced generalizable model, named GEPD, specifically for EEG-based cross-dataset classification of Parkinson's disease. First, we design a generative network that creates fusion EEG data by controlling the distribution similarity between generated data and real data. In addition, an EEG signal quality assessment model is designed to ensure the quality of the generated data. Second, we design a classification network that utilizes a combination of multiple convolutional neural networks to effectively capture the time-frequency characteristics of EEG signals, while maintaining a generalizable structure and ensuring easy convergence. This work is dedicated to utilizing intelligent methods to study pathological manifestations, aiming to facilitate the diagnosis and monitoring of neurological diseases. The evaluation results demonstrate that our model performs comparably to state-of-the-art models in cross-dataset settings, achieving an accuracy of 84.3% and an F1-score of 84.0%, showcasing the generalizability of the proposed model.
【4】Distributional Adversarial Attacks and Training in Deep Hedging
标题:分布式对抗攻击和深度对冲训练
链接:https://arxiv.org/abs/2508.14757
作者:e, Tobias Sutter, Lukas Gonon
备注:Preprint. Under review
摘要:在本文中,我们利用对抗攻击的概念,研究经典深度对冲策略在分布偏移下的鲁棒性。我们首先证明,标准深度对冲模型对输入分布中的微小扰动非常脆弱,会导致显著的性能下降。受此启发,我们提出了一个专门用于提高深度对冲策略鲁棒性的对抗训练框架。我们的方法将逐点对抗攻击扩展到分布设置,并针对Wasserstein球上的对抗优化问题引入了一种计算上易于处理的重新表述,从而能够高效地训练对分布扰动有韧性的对冲策略。通过大量数值实验,我们表明经过对抗训练的深度对冲策略,在样本外性能和对模型错误设定的韧性方面始终优于经典对冲策略。我们的研究结果为现实市场不确定性下的鲁棒深度对冲建立了一个实用且有效的框架。
摘要:In this paper, we study the robustness of classical deep hedging strategies under distributional shifts by leveraging the concept of adversarial attacks. We first demonstrate that standard deep hedging models are highly vulnerable to small perturbations in the input distribution, resulting in significant performance degradation. Motivated by this, we propose an adversarial training framework tailored to increase the robustness of deep hedging strategies. Our approach extends pointwise adversarial attacks to the distributional setting and introduces a computationally tractable reformulation of the adversarial optimization problem over a Wasserstein ball. This enables the efficient training of hedging strategies that are resilient to distributional perturbations. Through extensive numerical experiments, we show that adversarially trained deep hedging strategies consistently outperform their classical counterparts in terms of out-of-sample performance and resilience to model misspecification. Our findings establish a practical and effective framework for robust deep hedging under realistic market uncertainties.
【5】3D Cardiac Anatomy Generation Using Mesh Latent Diffusion Models
标题:使用网格潜在扩散模型生成3D心脏解剖结构
链接:https://arxiv.org/abs/2508.14122
作者:ozyrska, Marcel Beetz, Luke Melas-Kyriazi, Vicente Grau, Abhirup Banerjee, Alfonso Bueno-Orovio
摘要:扩散模型最近因其生成能力,特别是合成数据的高质量和多样性,而受到极大关注。然而,它们在3D医学成像中的应用实例仍然很少,在心脏病学领域尤为如此。生成多样且逼真的心脏解剖结构,对于计算机模拟(in silico)试验、机电计算仿真或机器学习模型的数据增强等应用至关重要。在这项工作中,我们研究了应用潜在扩散模型(LDM)生成人体心脏解剖结构3D网格的问题,并为此提出了一种新的LDM架构,即MeshLDM。我们将所提模型应用于急性心肌梗死患者左心室心脏解剖结构的3D网格数据集,并从定性和定量的临床指标及3D网格重建指标两方面评估其性能。所提出的MeshLDM成功捕获了舒张末期(舒张)和收缩末期(收缩)心脏相位的心脏形状特征,生成的网格与金标准相比,群体均值差异仅为2.4%。
摘要:Diffusion models have recently gained immense interest for their generative capabilities, specifically the high quality and diversity of the synthesized data. However, examples of their applications in 3D medical imaging are still scarce, especially in cardiology. Generating diverse realistic cardiac anatomies is crucial for applications such as in silico trials, electromechanical computer simulations, or data augmentations for machine learning models. In this work, we investigate the application of Latent Diffusion Models (LDMs) for generating 3D meshes of human cardiac anatomies. To this end, we propose a novel LDM architecture -- MeshLDM. We apply the proposed model on a dataset of 3D meshes of left ventricular cardiac anatomies from patients with acute myocardial infarction and evaluate its performance in terms of both qualitative and quantitative clinical and 3D mesh reconstruction metrics. The proposed MeshLDM successfully captures characteristics of the cardiac shapes at end-diastolic (relaxation) and end-systolic (contraction) cardiac phases, generating meshes with a 2.4% difference in population mean compared to the gold standard.
半/弱/无/有监督|不确定性|主动学习(3篇)
【1】Topological Data Analysis for Unsupervised Anomaly Detection and Customer Segmentation on Banking Data
标题:用于银行数据无监督异常检测和客户细分的拓扑数据分析
链接:https://arxiv.org/abs/2508.14136
作者:Aldo Alejandro Barberi, Linda Maria De Cave
摘要:本文介绍了用于银行数据无监督异常检测和客户细分的拓扑数据分析(TDA)先进技术。利用Mapper算法和持续同调,我们开发了无监督流程,通过挖掘拓扑信息,在客户银行数据中发现有意义的模式。我们在本文中提出的框架产生了可操作的见解,将拓扑这一抽象数学主题与工业界有用的实际用例结合起来。
摘要:This paper introduces advanced techniques of Topological Data Analysis (TDA) for unsupervised anomaly detection and customer segmentation in banking data. Using the Mapper algorithm and persistent homology, we develop unsupervised procedures that uncover meaningful patterns in customers' banking data by exploiting topological information. The framework we present in this paper yields actionable insights that combine the abstract mathematical subject of topology with real-life use cases that are useful in industry.
【2】Towards Agent-based Test Support Systems: An Unsupervised Environment Design Approach
标题:迈向基于代理的测试支持系统:无监督环境设计方法
链接:https://arxiv.org/abs/2508.14135
作者:.Ogbodo, Timothy J. Rogers, Mattia Dal Borgo, David J. Wagg
备注:17 pages, 11 figures; currently under peer review
摘要:模态测试在结构分析中起着至关重要的作用,为各工程行业的动态行为提供重要见解。在实践中,设计一个有效的模态测试活动涉及复杂的实验规划,包括一系列相互依赖的决策,这些决策会显著影响最终的测试结果。传统的测试设计方法通常是静态的,只关注全局测试,而不考虑不断变化的测试活动参数,也不考虑这些变化对先前已确定决策(如传感器配置)的影响,而这些决策已被发现会显著影响测试结果。这些僵化的方法常常损害测试的准确性和适应性。为了解决这些限制,本研究介绍了一个基于代理的决策支持框架,用于在动态变化的模态测试环境中进行自适应传感器布置。该框架使用欠指定的部分可观察马尔可夫决策过程来形式化该问题,从而可以通过双课程学习策略训练通用型强化学习代理。对钢悬臂结构的详细案例研究证明了所提方法在跨频率段优化传感器位置方面的有效性,验证了其在实验环境中的鲁棒性和真实世界适用性。
摘要:Modal testing plays a critical role in structural analysis by providing essential insights into dynamic behaviour across a wide range of engineering industries. In practice, designing an effective modal test campaign involves complex experimental planning, comprising a series of interdependent decisions that significantly influence the final test outcome. Traditional approaches to test design are typically static, focusing only on global tests without accounting for evolving test campaign parameters or the impact of such changes on previously established decisions, such as sensor configurations, which have been found to significantly influence test outcomes. These rigid methodologies often compromise test accuracy and adaptability. To address these limitations, this study introduces an agent-based decision support framework for adaptive sensor placement across dynamically changing modal test environments. The framework formulates the problem using an underspecified partially observable Markov decision process, enabling the training of a generalist reinforcement learning agent through a dual-curriculum learning strategy. A detailed case study on a steel cantilever structure demonstrates the efficacy of the proposed method in optimising sensor locations across frequency segments, validating its robustness and real-world applicability in experimental settings.
【3】Toward Generalist Semi-supervised Regression via Decoupled Representation Distillation
标题:通过脱钩表示蒸馏实现通才半监督回归
链接:https://arxiv.org/abs/2508.14082
作者:zhe Qiao, Wei Huang, Lin Chen
备注:12 pages
摘要:半监督回归(SSR)旨在预测样本的连续分数,同时减少对大量标记数据的依赖,最近在计算机视觉、自然语言处理以及音频和医学分析等各种应用中受到相当大的关注。现有的半监督方法通常通过生成伪标签,对一般回归任务施加一致性正则化。然而,这些方法严重依赖伪标签的质量,而且直接回归无法学习标签分布,容易导致过拟合。为了应对这些挑战,我们引入了一个专为半监督回归任务设计的端到端解耦表示蒸馏框架(DRILL):我们将一般回归任务转换为多个桶上的离散分布估计(DDE)任务,以更好地捕获底层标签分布,并降低直接回归带来的过拟合风险。然后,我们采用解耦分布对齐(DDA),在桶的分布上对齐教师和学生的目标桶与非目标桶,鼓励学生从教师那里学习更鲁棒、更具泛化性的知识。在不同领域数据集上进行的大量实验表明,所提出的DRILL具有很强的泛化能力,优于竞争方法。
摘要:Semi-supervised regression (SSR), which aims to predict continuous scores of samples while reducing reliance on a large amount of labeled data, has recently received considerable attention across various applications, including computer vision, natural language processing, and audio and medical analysis. Existing semi-supervised methods typically apply consistency regularization on the general regression task by generating pseudo-labels. However, these methods heavily rely on the quality of pseudo-labels, and direct regression fails to learn the label distribution and can easily lead to overfitting. To address these challenges, we introduce an end-to-end Decoupled Representation distillation framework (DRILL) which is specially designed for the semi-supervised regression task where we transform the general regression task into a Discrete Distribution Estimation (DDE) task over multiple buckets to better capture the underlying label distribution and mitigate the risk of overfitting associated with direct regression. Then we employ the Decoupled Distribution Alignment (DDA) to align the target bucket and non-target bucket between teacher and student on the distribution of buckets, encouraging the student to learn more robust and generalized knowledge from the teacher. Extensive experiments conducted on datasets from diverse domains demonstrate that the proposed DRILL has strong generalization and outperforms the competing methods.
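摘要中"把直接回归转化为多个桶上的离散分布估计(DDE)"这一步,可用如下极简示意理解:连续标签被软分配到相邻两个桶,学习桶上分布之后再以期望还原连续预测(桶数与分配方式为本文假设,非论文原始实现):

# 示意:连续标签 -> 相邻两桶的软分布 -> 以期望还原连续值
import numpy as np

def soft_bucketize(y, centers):
    """把标量 y 按线性插值分配到相邻两个桶中心,返回桶上的概率分布。"""
    p = np.zeros(len(centers))
    j = int(np.clip(np.searchsorted(centers, y), 1, len(centers) - 1))
    left, right = centers[j - 1], centers[j]
    w = float(np.clip((y - left) / (right - left), 0.0, 1.0))
    p[j - 1], p[j] = 1 - w, w
    return p

centers = np.linspace(0.0, 100.0, 11)   # 11 个桶中心:0, 10, ..., 100
dist = soft_bucketize(37.0, centers)
print(dist)                              # 质量落在 30 与 40 两个桶上:0.3 / 0.7
print(float(dist @ centers))             # 期望还原连续预测:37.0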
迁移|Zero/Few/One-Shot|自适应(1篇)
【1】Synthetic Adaptive Guided Embeddings (SAGE): A Novel Knowledge Distillation Method
标题:合成自适应引导嵌入(SAGE):一种新颖的知识蒸馏方法
链接:https://arxiv.org/abs/2508.14783
作者:Olcay Polat, Poli A. Nemkova, Mark V. Albert
摘要:模型蒸馏能够将知识从大规模模型迁移到紧凑的学生模型,便于在资源受限的环境中部署。然而,传统的蒸馏方法通常存在计算开销大和泛化能力有限的问题。我们提出了一种新颖的自适应蒸馏框架,在学生模型损失较高的区域动态扩充训练数据。借助基于UMAP的降维和最近邻采样,我们的方法识别嵌入空间中表现不佳的区域,并生成有针对性的合成样本来引导学生学习。为了进一步提高效率,我们引入了一个轻量级的师生接口,绕过教师的输入层,直接在向量化表示上进行蒸馏。在标准NLP基准上的实验表明,我们的66M参数学生模型始终匹配或超越既有基线,在QNLI上达到91.2%,在SST-2上达到92.3%,同时训练轮数更少。这些结果凸显了损失感知数据增强和向量化蒸馏在高效且有效的模型压缩方面的前景。
摘要:Model distillation enables the transfer of knowledge from large-scale models to compact student models, facilitating deployment in resource-constrained environments. However, conventional distillation approaches often suffer from computational overhead and limited generalization. We propose a novel adaptive distillation framework that dynamically augments training data in regions of high student model loss. Using UMAP-based dimensionality reduction and nearest neighbor sampling, our method identifies underperforming regions in the embedding space and generates targeted synthetic examples to guide student learning. To further improve efficiency, we introduce a lightweight teacher-student interface that bypasses the teacher's input layer, enabling direct distillation on vectorized representations. Experiments across standard NLP benchmarks demonstrate that our 66M-parameter student model consistently matches or surpasses established baselines, achieving 91.2% on QNLI and 92.3% on SST-2, while training with fewer epochs. These results highlight the promise of loss-aware data augmentation and vectorized distillation for efficient and effective model compression.
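摘要中"用UMAP降维加最近邻采样定位高损失区域"的步骤可以用如下极简示意理解(假设已安装umap-learn;分位数阈值等参数为本文假设):

# 示意:在低维嵌入中定位学生模型高损失样本的邻域,用于针对性合成增强
import numpy as np
import umap                                   # 来自 umap-learn 包
from sklearn.neighbors import NearestNeighbors

def high_loss_neighborhoods(embeddings, losses, quantile=0.9, k=10):
    """embeddings: (N, d) 样本向量;losses: (N,) 学生模型的逐样本损失。"""
    Z = umap.UMAP(n_components=2, random_state=0).fit_transform(embeddings)
    hard = np.where(losses >= np.quantile(losses, quantile))[0]  # 高损失样本
    nn = NearestNeighbors(n_neighbors=k).fit(Z)
    _, nbrs = nn.kneighbors(Z[hard])
    return np.unique(nbrs)                    # 在这些邻域附近生成合成样本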
强化学习(2篇)
【1】A Comparative Evaluation of Teacher-Guided Reinforcement Learning Techniques for Autonomous Cyber Operations
标题:自主网络操作的教师引导强化学习技术的比较评估
链接:https://arxiv.org/abs/2508.14340
作者:ll, Mariam El Mezouar, Ranwa Al Mallah
摘要:自主网络操作(ACO)依赖强化学习(RL)来训练代理,使其在网络安全领域做出有效决策。然而,现有的ACO应用要求代理从零开始学习,导致收敛缓慢且早期性能不佳。虽然教师引导技术已在其他领域展现出前景,但尚未被应用于ACO。在本研究中,我们在模拟的CybORG环境中实现了四种不同的教师引导技术,并进行了比较评估。结果表明,引入教师可以在早期策略性能和收敛速度方面显著提高训练效率,凸显了其对自主网络安全的潜在益处。
摘要:Autonomous Cyber Operations (ACO) rely on Reinforcement Learning (RL) to train agents to make effective decisions in the cybersecurity domain. However, existing ACO applications require agents to learn from scratch, leading to slow convergence and poor early-stage performance. While teacher-guided techniques have demonstrated promise in other domains, they have not yet been applied to ACO. In this study, we implement four distinct teacher-guided techniques in the simulated CybORG environment and conduct a comparative evaluation. Our results demonstrate that teacher integration can significantly improve training efficiency in terms of early policy performance and convergence speed, highlighting its potential benefits for autonomous cybersecurity.
【2】PersRM-R1: Enhance Personalized Reward Modeling with Reinforcement Learning
标题:PersRM-R1:利用强化学习增强个性化奖励建模
链接:https://arxiv.org/abs/2508.14076
作者:, Guanqiao Chen, Xufeng Zhao, Haochen Wen, Shu Yang, Di Wang
摘要:奖励模型(RM)是现有后训练方法的核心,旨在通过在微调期间提供反馈信号,使LLM输出与人类价值观保持一致。然而,现有的RM难以捕捉细微的、用户特定的偏好,在数据有限和领域多样的情况下尤其如此。因此,我们介绍了PersRM-R1,这是第一个基于推理的奖励建模框架,专门设计用于仅从一个或几个个人样本中识别并表示个人因素。为了解决数据可用性有限和需要强泛化能力等挑战,我们的方法将合成数据生成与两阶段训练管道相结合,该管道由监督微调和随后的强化微调组成。实验结果表明,PersRM-R1优于现有的同等规模模型,并在准确性和泛化能力方面与大得多的模型相当,为更有效的个性化LLM铺平了道路。
摘要:Reward models (RMs), which are central to existing post-training methods, aim to align LLM outputs with human values by providing feedback signals during fine-tuning. However, existing RMs struggle to capture nuanced, user-specific preferences, especially under limited data and across diverse domains. Thus, we introduce PersRM-R1, the first reasoning-based reward modeling framework specifically designed to identify and represent personal factors from only one or a few personal exemplars. To address challenges including limited data availability and the requirement for robust generalization, our approach combines synthetic data generation with a two-stage training pipeline consisting of supervised fine-tuning followed by reinforcement fine-tuning. Experimental results demonstrate that PersRM-R1 outperforms existing models of similar size and matches the performance of much larger models in both accuracy and generalizability, paving the way for more effective personalized LLMs.
元学习(1篇)
【1】Learning to Learn the Macroscopic Fundamental Diagram using Physics-Informed and meta Machine Learning techniques
标题:学会学习宏观基本图:使用物理信息与元机器学习技术
链接:https://arxiv.org/abs/2508.14137
作者:ark, Serio Agriesti, Francisco Camara Pereira, Guido Cantelmo
摘要:宏观基本图(MFD)是一种以聚合方式描述交通动态的常用工具,其应用范围从交通控制到事件分析。然而,估计给定网络的MFD需要大量环路检测器,而这在实践中并不总是可得。本文提出了一个利用元学习的框架来缓解数据稀缺的挑战;元学习是机器学习的一个子类,训练模型自行理解并适应新任务。所开发的模型利用来自多个城市的数据进行训练和测试,并借助这些数据为检测器覆盖率和拓扑结构各不相同的其他城市建模MFD。所提出的元学习框架被应用于一个专门设计用于估计MFD的多任务物理信息神经网络。结果显示,流量预测的平均MSE改善介于约17500到36000之间(取决于所测试的环路检测器子集)。因此,该元学习框架成功地泛化到不同的城市环境,并提高了数据有限城市的性能,展示了在可用检测器数量有限时使用元学习的潜力。最后,该框架与传统迁移学习方法进行了对比验证,并使用FitFun(文献中的一个非参数模型)进行了测试,以证明其可迁移性。
摘要:The Macroscopic Fundamental Diagram is a popular tool used to describe traffic dynamics in an aggregated way, with applications ranging from traffic control to incident analysis. However, estimating the MFD for a given network requires large numbers of loop detectors, which is not always available in practice. This article proposes a framework harnessing meta-learning, a subcategory of machine learning that trains models to understand and adapt to new tasks on their own, to alleviate the data scarcity challenge. The developed model is trained and tested by leveraging data from multiple cities and exploiting it to model the MFD of other cities with different shares of detectors and topological structures. The proposed meta-learning framework is applied to an ad-hoc Multi-Task Physics-Informed Neural Network, specifically designed to estimate the MFD. Results show an average MSE improvement in flow prediction ranging between ~ 17500 and 36000 (depending on the subset of loop detectors tested). The meta-learning framework thus successfully generalizes across diverse urban settings and improves performance on cities with limited data, demonstrating the potential of using meta-learning when a limited number of detectors is available. Finally, the proposed framework is validated against traditional transfer learning approaches and tested with FitFun, a non-parametric model from the literature, to prove its transferability.
符号|符号学习(2篇)
【1】Fast Symbolic Regression Benchmarking
标题:快速符号回归基准测试
链接:https://arxiv.org/abs/2508.14481
作者:rtinek
摘要:符号回归(Symbolic Regression,SR)从数据中发现数学模型。已有多个基准被提出用于比较SR算法的性能。然而,现有的真值重新发现基准过分强调恢复"唯一"的表达形式,或仅依赖计算机代数系统(如SymPy)来评判成功。此外,现有基准即使在表达式已被发现之后仍继续搜索。我们通过引入精选的可接受表达式列表和用于提前终止的回调机制来改进这些问题。作为起点,我们使用Yoshitomo等人提出的科学发现符号回归(SRSD)基准问题,并对SymbolicRegression.jl和TiSR这两个SR软件包进行基准测试。新的基准测试方法将SymbolicRegression.jl的重新发现率从Yoshitomo等人报告的26.7%提高到44.7%,且执行基准测试的计算开销减少了41.2%。TiSR的重新发现率为69.4%,而执行基准测试节省了63%的时间。
摘要:Symbolic regression (SR) uncovers mathematical models from data. Several benchmarks have been proposed to compare the performance of SR algorithms. However, existing ground-truth rediscovery benchmarks overemphasize the recovery of "the one" expression form or rely solely on computer algebra systems (such as SymPy) to assess success. Furthermore, existing benchmarks continue the expression search even after its discovery. We improve upon these issues by introducing curated lists of acceptable expressions, and a callback mechanism for early termination. As a starting point, we use the symbolic regression for scientific discovery (SRSD) benchmark problems proposed by Yoshitomo et al., and benchmark the two SR packages SymbolicRegression.jl and TiSR. The new benchmarking method increases the rediscovery rate of SymbolicRegression.jl from 26.7%, as reported by Yoshitomo et al., to 44.7%. Performing the benchmark takes 41.2% less computational expense. TiSR's rediscovery rate is 69.4%, while performing the benchmark saves 63% time.
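摘要中"可接受表达式的精选列表 + 提前终止回调"的思路可以用SymPy做一个极简示意(回调接口为本文假设,并非这两个SR软件包的真实API):

# 示意:判定候选表达式是否等价于清单中任一可接受形式,等价则提前终止搜索
import sympy as sp

x = sp.symbols("x")
ACCEPTED = [(x + 1)**2, x**2 + 2*x + 1]       # 同一目标的多个可接受形式

def is_rediscovered(candidate):
    """若候选式与任一可接受式符号等价,则视为重新发现。"""
    return any(sp.simplify(candidate - a) == 0 for a in ACCEPTED)

def early_stop_callback(best_expr_str):
    """假设搜索器每次更新最优表达式时调用;返回 True 即提前终止。"""
    return is_rediscovered(sp.sympify(best_expr_str, locals={"x": x}))

print(early_stop_callback("x**2 + 2*x + 1"))  # True:无需继续搜索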
【2】Parameter-Aware Ensemble SINDy for Interpretable Symbolic SGS Closure
标题:用于可解释符号SGS闭合的参数感知集成SINDy
链接:https://arxiv.org/abs/2508.14085
作者:ang, Shervin Karimkashi, Ville Vuorinen
备注:16 pages, 6 figures
摘要:我们提出了一个可扩展的、参数感知的稀疏回归框架,用于从多参数模拟数据中发现可解释的偏微分方程和亚格子尺度(SGS)闭合。基于SINDy(非线性动力学稀疏辨识),我们的方法通过四项创新解决了关键限制:符号参数化,使物理参数可以在统一回归中变化;量纲相似性过滤器,在缩减候选库的同时强制单位一致性;内存高效的Gram矩阵累积,使批处理成为可能;以及带系数稳定性分析的集成共识,用于鲁棒的模型辨识。在典型一维基准上的验证表明,该方法能在参数范围内可靠地恢复控制方程。应用于滤波后的Burgers数据集,该框架发现了SGS闭合$\tau_{\mathrm{SGS}} = 0.1603\cdot\Delta^2\left(\frac{\partial \bar{u}}{\partial x}\right)^2$,对应约0.4004的Smagorinsky常数。这代表了在没有先验理论假设的情况下,从数据中自主发现Smagorinsky型闭合结构。所发现的模型在各过滤尺度上实现了$R^2 = 0.886$,并且与经典闭合相比展示了更高的预测精度。该框架识别具有物理意义的SGS形式并校准系数的能力,为现有湍流建模方法提供了一种补充途径,也为日益发展的数据驱动闭合发现领域做出了贡献。
摘要:We present a scalable, parameter-aware sparse regression framework for discovering interpretable partial differential equations and subgrid-scale closures from multi-parameter simulation data. Building on SINDy (Sparse Identification of Nonlinear Dynamics), our approach addresses key limitations through four innovations: symbolic parameterisation enabling physical parameters to vary within unified regression; Dimensional Similarity Filter enforcing unit-consistency whilst reducing candidate libraries; memory-efficient Gram-matrix accumulation enabling batch processing; and ensemble consensus with coefficient stability analysis for robust model identification. Validation on canonical one-dimensional benchmarks demonstrates reliable recovery of governing equations across parameter ranges. Applied to filtered Burgers datasets, the framework discovers an SGS closure $\tau_{\mathrm{SGS}} = 0.1603\cdot\Delta^2\left(\frac{\partial \bar{u}}{\partial x}\right)^2$, corresponding to a Smagorinsky constant of approximately 0.4004. This represents autonomous discovery of Smagorinsky-type closure structure from data without prior theoretical assumptions. The discovered model achieves $R^2 = 0.886$ across filter scales and demonstrates improved prediction accuracy compared to classical closures. The framework's ability to identify physically meaningful SGS forms and calibrate coefficients offers a complementary approach to existing turbulence modelling methods, contributing to the growing field of data-driven closure discovery.
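摘要中系数0.1603与Smagorinsky常数约0.4004的对应关系,可由一维Smagorinsky形式一步推得(作为理解性补充):由 $\tau_{\mathrm{SGS}}=(C_s\,\Delta)^2\left(\frac{\partial \bar{u}}{\partial x}\right)^2$ 对比所发现的闭合式,可得 $C_s=\sqrt{0.1603}\approx 0.4004$。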
医学相关(2篇)
【1】Clinical semantics for lung cancer prediction
标题:肺癌预测的临床语义
链接:https://arxiv.org/abs/2508.14627
作者:ohn, Jan A. Kors, Jenna M. Reps, Peter R. Rijnbeek, Egill A. Fridgeirsson
摘要:背景:现有的临床预测模型通常使用忽略临床概念之间语义关系的特征来表示患者数据。本研究通过庞加莱嵌入将SNOMED医学术语层次映射到低维双曲空间,以整合领域特定的语义信息,旨在改善肺癌发病预测。方法:基于Optum EHR数据集的回顾性队列,我们从SNOMED分类体系推导出临床知识图,并通过黎曼随机梯度下降生成庞加莱嵌入。然后将这些嵌入整合到两种深度学习架构中:一个ResNet模型和一个Transformer模型。我们评估了模型的区分度(受试者工作特征曲线下面积)和校准(观测概率与预测概率之间的平均绝对差)性能。结果:与使用随机初始化欧几里得嵌入的基线模型相比,纳入预训练的庞加莱嵌入在区分性能上带来了适度且一致的改善。ResNet模型(特别是使用10维庞加莱嵌入的模型)显示出更好的校准,而Transformer模型在各配置下保持稳定的校准。讨论:将临床知识图嵌入双曲空间并将这些表示整合到深度学习模型中,可以通过保留用于预测的临床术语的层次结构来改善肺癌发病预测。该方法展示了一条将数据驱动的特征提取与既有临床知识相结合的可行途径。
摘要:Background: Existing clinical prediction models often represent patient data using features that ignore the semantic relationships between clinical concepts. This study integrates domain-specific semantic information by mapping the SNOMED medical term hierarchy into a low-dimensional hyperbolic space using Poincaré embeddings, with the aim of improving lung cancer onset prediction. Methods: Using a retrospective cohort from the Optum EHR dataset, we derived a clinical knowledge graph from the SNOMED taxonomy and generated Poincaré embeddings via Riemannian stochastic gradient descent. These embeddings were then incorporated into two deep learning architectures, a ResNet and a Transformer model. Models were evaluated for discrimination (area under the receiver operating characteristic curve) and calibration (average absolute difference between observed and predicted probabilities) performance. Results: Incorporating pre-trained Poincaré embeddings resulted in modest and consistent improvements in discrimination performance compared to baseline models using randomly initialized Euclidean embeddings. ResNet models, particularly those using a 10-dimensional Poincaré embedding, showed enhanced calibration, whereas Transformer models maintained stable calibration across configurations. Discussion: Embedding clinical knowledge graphs into hyperbolic space and integrating these representations into deep learning models can improve lung cancer onset prediction by preserving the hierarchical structure of clinical terminologies used for prediction. This approach demonstrates a feasible method for combining data-driven feature extraction with established clinical knowledge.
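摘要中庞加莱嵌入的关键是双曲距离:靠近层次根部的概念可嵌在原点附近,越具体的概念越靠近球面边界,跨分支的距离被几何放大。下面是庞加莱球距离的一个极简实现(示例坐标为假设):

# 示意:庞加莱球模型中的双曲距离
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    uu = min(float(np.dot(u, u)), 1 - eps)
    vv = min(float(np.dot(v, v)), 1 - eps)
    duv = float(np.dot(u - v, u - v))
    return float(np.arccosh(1 + 2 * duv / ((1 - uu) * (1 - vv))))

root = np.array([0.05, 0.0])      # 近原点:层次中的上位概念
leaf_a = np.array([0.90, 0.05])   # 近边界:具体概念
leaf_b = np.array([-0.88, 0.10])
print(poincare_distance(root, leaf_a))    # 相对较小
print(poincare_distance(leaf_a, leaf_b))  # 跨分支距离被双曲几何放大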
【2】On the notion of missingness for path attribution explainability methods in medical settings: Guiding the selection of medically meaningful baselines
标题:医疗环境中路径归因可解释性方法的"缺失性"概念:指导选择具有医学意义的基线
链接:https://arxiv.org/abs/2508.14482
作者: Geiger, Lars Wagner, Daniel Rueckert, Dirk Wilhelm, Alissa Jell
摘要:深度学习模型的可解释性仍然是一个重大挑战,在医疗领域尤其如此,因为可解释的输出对临床信任和透明度至关重要。诸如积分梯度(Integrated Gradients)之类的路径归因方法依赖于一个表示相关特征缺失("缺失性")的基线输入。常用的基线(如全零输入)通常在语义上没有意义,特别是在缺失本身可能蕴含信息的医疗场景中。虽然已有工作探索了其他基线选择,但现有方法缺乏针对每个输入动态选择基线的原则性方案。在本工作中,我们考察医疗环境中"缺失性"的概念,分析其对基线选择的影响,并引入一种反事实引导的方法来解决传统基线的局限。我们认为,一个临床上正常但接近输入的反事实,能更准确地表示医疗数据中"有意义的特征缺失"。为实现这一点,我们使用变分自动编码器生成反事实基线;不过我们的思路与具体生成模型无关,可配合任何合适的反事实方法使用。我们在三个不同的医疗数据集上评估了该方法,并通过实验证明,与标准基线选择相比,反事实基线产生了更忠实且与医学更相关的归因。
摘要:The explainability of deep learning models remains a significant challenge, particularly in the medical domain where interpretable outputs are critical for clinical trust and transparency. Path attribution methods such as Integrated Gradients rely on a baseline input representing the absence of relevant features ("missingness"). Commonly used baselines, such as all-zero inputs, are often semantically meaningless, especially in medical contexts where missingness can itself be informative. While alternative baseline choices have been explored, existing methods lack a principled approach to dynamically select baselines tailored to each input. In this work, we examine the notion of missingness in the medical setting, analyze its implications for baseline selection, and introduce a counterfactual-guided approach to address the limitations of conventional baselines. We argue that a clinically normal but input-close counterfactual represents a more accurate representation of a meaningful absence of features in medical data. To implement this, we use a Variational Autoencoder to generate counterfactual baselines, though our concept is generative-model-agnostic and can be applied with any suitable counterfactual method. We evaluate the approach on three distinct medical data sets and empirically demonstrate that counterfactual baselines yield more faithful and medically relevant attributions compared to standard baseline choices.
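为说明"以反事实样本取代全零输入作为路径归因的基线",下面给出积分梯度的一个极简PyTorch示意(模型与反事实基线均为占位假设;实际基线应由VAE等反事实生成器给出):

# 示意:沿 baseline -> x 的直线路径累积梯度的积分梯度
import torch

def integrated_gradients(model, x, baseline, steps=64):
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)        # (steps, *x.shape)
    path.requires_grad_(True)
    grads = torch.autograd.grad(model(path).sum(), path)[0]
    return (x - baseline) * grads.mean(dim=0)        # 黎曼和近似路径积分

model = torch.nn.Sequential(torch.nn.Linear(4, 1))  # 占位模型
x = torch.randn(4)
cf_baseline = x - 0.5      # 占位:实际应为"临床正常且接近输入"的反事实样本
print(integrated_gradients(model, x, cf_baseline))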
蒸馏|知识提取(1篇)
【1】Federated Distillation on Edge Devices: Efficient Client-Side Filtering for Non-IID Data
标题:边缘设备上的联邦蒸馏:非IID数据的高效客户端过滤
链接:https://arxiv.org/abs/2508.14769
作者:taba, Gleb Radchenko, Radu Prodan, Marc Masana
备注:This paper was accepted at the International Conference on Federated Learning Technologies and Applications, 2025. The final version is available at IEEE Xplore
摘要:联邦蒸馏已成为一种有前景的协作机器学习方法:与传统联邦学习相比,它通过交换模型输出(soft logits)而非完整的模型参数,提供了更强的隐私保护和更少的通信量。然而,现有方法采用复杂的选择性知识共享策略,需要客户端借助计算代价高昂的统计密度比估计器来识别分布内代理数据;此外,服务器端对模糊知识的过滤也给流程带来延迟。为了应对这些挑战,我们提出了一种鲁棒且资源高效的EdgeFD方法,它降低了客户端密度比估计的复杂度,并消除了对服务器端过滤的需求。EdgeFD引入了一个高效的基于KMeans的密度比估计器,用于在客户端有效过滤分布内和分布外代理数据,显著提高知识共享的质量。我们在多种实际场景中评估了EdgeFD,包括客户端上的强非IID、弱非IID和IID数据分布,且无需在服务器上预训练教师模型进行知识蒸馏。实验结果表明,EdgeFD优于最先进的方法,即使在异构且具有挑战性的条件下,也能始终达到接近IID场景的准确率水平。基于KMeans的估计器显著降低了计算开销,适合部署在资源受限的边缘设备上,从而增强了联邦蒸馏的可扩展性和现实适用性。代码已在线公开,便于复现。
摘要:Federated distillation has emerged as a promising collaborative machine learning approach, offering enhanced privacy protection and reduced communication compared to traditional federated learning by exchanging model outputs (soft logits) rather than full model parameters. However, existing methods employ complex selective knowledge-sharing strategies that require clients to identify in-distribution proxy data through computationally expensive statistical density ratio estimators. Additionally, server-side filtering of ambiguous knowledge introduces latency to the process. To address these challenges, we propose a robust, resource-efficient EdgeFD method that reduces the complexity of the client-side density ratio estimation and removes the need for server-side filtering. EdgeFD introduces an efficient KMeans-based density ratio estimator for effectively filtering both in-distribution and out-of-distribution proxy data on clients, significantly improving the quality of knowledge sharing. We evaluate EdgeFD across diverse practical scenarios, including strong non-IID, weak non-IID, and IID data distributions on clients, without requiring a pre-trained teacher model on the server for knowledge distillation. Experimental results demonstrate that EdgeFD outperforms state-of-the-art methods, consistently achieving accuracy levels close to IID scenarios even under heterogeneous and challenging conditions. The significantly reduced computational overhead of the KMeans-based estimator is suitable for deployment on resource-constrained edge devices, thereby enhancing the scalability and real-world applicability of federated distillation. The code is available online for reproducibility.
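摘要中"基于KMeans的密度比估计器"可以用如下极简示意理解:以本地数据拟合KMeans,用到最近质心距离的分位数作阈值,超出阈值的代理样本视为分布外(阈值策略为本文假设,并非论文的完整算法):

# 示意:KMeans 质心距离作为轻量的分布内/分布外过滤器
import numpy as np
from sklearn.cluster import KMeans

class KMeansOODFilter:
    def __init__(self, n_clusters=16, quantile=0.95):
        self.km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        self.quantile, self.tau = quantile, None

    def fit(self, X_local):
        d = self.km.fit_transform(X_local).min(axis=1)  # 到最近质心的距离
        self.tau = float(np.quantile(d, self.quantile)) # 用本地分布校准阈值
        return self

    def in_distribution(self, X_proxy):
        d = self.km.transform(X_proxy).min(axis=1)
        return d <= self.tau                            # True: 参与知识共享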
推荐(1篇)
【1】Personalized Contest Recommendation in Fantasy Sports
标题:梦幻体育中的个性化比赛推荐
链接:https://arxiv.org/abs/2508.14065
作者:Srilakshmi, Kartavya Kothari, Kamlesh Marathe, Vedavyas Chigurupati, Hitesh Kapoor
摘要:在日常梦幻体育(daily fantasy sports)中,玩家参加"竞赛",通过组建运动员队伍相互竞争,队伍根据现实体育比赛中实际发生的情况获得梦幻积分。对于任何一场体育比赛,玩家可选择的竞赛众多,且在三个主要维度上差异很大:报名费、名额数和奖金池分配。由于玩家偏好也相当异质,竞赛个性化是将玩家与竞赛匹配的重要工具。本文提出了一个可扩展的竞赛推荐系统,其核心是宽深交互排序器(Wide and Deep Interaction Ranker,WiDIR)。我们在我们公司(一个每天有数百万场竞赛和数百万玩家的大型梦幻体育平台)将该系统投入生产,在线实验显示,相比其他候选模型,该系统在召回率和其他关键业务指标上均有显著提升。
摘要:In daily fantasy sports, players enter into "contests" where they compete against each other by building teams of athletes that score fantasy points based on what actually occurs in a real-life sports match. For any given sports match, there are a multitude of contests available to players, with substantial variation across 3 main dimensions: entry fee, number of spots, and the prize pool distribution. As player preferences are also quite heterogeneous, contest personalization is an important tool to match players with contests. This paper presents a scalable contest recommendation system, powered by a Wide and Deep Interaction Ranker (WiDIR) at its core. We productionized this system at our company, one of the large fantasy sports platforms with millions of daily contests and millions of players, where online experiments show a marked improvement over other candidate models in terms of recall and other critical business metrics.
自动驾驶|车辆|车道检测等(2篇)
【1】Edge-Selector Model Applied for Local Search Neighborhood for Solving Vehicle Routing Problems
标题:应用于局部搜索邻域的边选择器模型求解车辆路径问题
链接:https://arxiv.org/abs/2508.14071
作者:Herdianto, Romain Billot, Flavien Lucas, Marc Sevaux, Daniele Vigo
备注:29 pages, 12 figures
摘要:本研究提出一种混合机器学习与元启发式机制来求解车辆路径问题(VRP)。我们方法的核心是一个边解选择器模型,它对解中的边进行分类,以确定局部搜索过程中被禁止的移动,从而在元启发式基线内引导搜索过程。我们采用两种基于学习的机制来构建边选择器:一个简单的表格式二元分类器和一个图神经网络(GNN)。表格式分类器采用梯度提升树和前馈神经网络作为基线算法,并通过调整决策阈值来处理问题实例中的类别不平衡。另一种机制采用GNN,利用图结构直接预测解中的边,旨在通过预测被禁止的移动来引导局部搜索。然后将这些混合机制应用于最先进的元启发式基线。我们的方法展示了可扩展性和可推广性,在不同的基线元启发式、各种问题规模和变体(包括带容量约束的车辆路径问题(CVRP)和带时间窗的CVRP(CVRPTW))上均取得性能改进。在多达30,000个客户节点的基准数据集上进行的实验评估,辅以成对统计分析,验证了所观察到的改进。
摘要:This research proposes a hybrid Machine Learning and metaheuristic mechanism that is designed to solve Vehicle Routing Problems (VRPs). The core of our method is an edge solution selector model, which classifies solution edges to identify prohibited moves during the local search, hence guiding the search process within metaheuristic baselines. Two learning-based mechanisms are used to develop the edge selector: a simple tabular binary classifier and a Graph Neural Network (GNN). The tabular classifier employs Gradient Boosting Trees and Feedforward Neural Network as the baseline algorithms. Adjustments to the decision threshold are also applied to handle the class imbalance in the problem instance. An alternative mechanism employs the GNN to utilize graph structure for direct solution edge prediction, with the objective of guiding local search by predicting prohibited moves. These hybrid mechanisms are then applied in state-of-the-art metaheuristic baselines. Our method demonstrates both scalability and generalizability, achieving performance improvements across different baseline metaheuristics, various problem sizes and variants, including the Capacitated Vehicle Routing Problem (CVRP) and CVRP with Time Windows (CVRPTW). Experimental evaluations on benchmark datasets up to 30,000 customer nodes, supported by pair-wise statistical analysis, verify the observed improvements.
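摘要提到的"调整决策阈值以处理类别不平衡"可用如下极简示意理解(数据为虚构;在验证集上选阈值的做法为常规流程假设):

# 示意:梯度提升二元分类器 + 按 F1 选择决策阈值(而非默认 0.5)
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))
y = (X[:, 0] + rng.normal(0, 1, 2000) > 1.8).astype(int)   # 不平衡的"禁止边"标签

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

prob = clf.predict_proba(X_va)[:, 1]
prec, rec, thr = precision_recall_curve(y_va, prob)
f1 = 2 * prec * rec / np.maximum(prec + rec, 1e-12)
best_thr = float(thr[np.argmax(f1[:-1])])                  # F1 最大处的阈值
print("正类占比:", y.mean(), " 选定阈值:", round(best_thr, 3))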
【2】Learning from user's behaviour of some well-known congested traffic networks
标题:从一些知名拥堵交通网络的用户行为中学习
链接:https://arxiv.org/abs/2508.14804
作者:rdoso, Lucas Venturato, Jorgelina Walpen
备注:12 pages, 4 figures, 3 tables
摘要:我们考虑在均衡条件下预测拥堵交通网络中用户行为的问题,即交通分配问题。我们提出了一种将神经网络与不动点算法相耦合的两阶段机器学习方法,并在几个经典的拥堵交通网络上评估了其性能。
摘要:We consider the problem of predicting users' behavior of a congested traffic network under an equilibrium condition, the traffic assignment problem. We propose a two-stage machine learning approach which couples a neural network with a fixed point algorithm, and we evaluate its performance along several classical congested traffic networks.
联邦学习|隐私保护|加密(1篇)
【1】FedEve: On Bridging the Client Drift and Period Drift for Cross-device Federated Learning
标题:FedEve:弥合跨设备联邦学习中的客户端漂移与周期漂移
链接:https://arxiv.org/abs/2508.14539
作者: Zexi Li, Didi Zhu, Ziyu Zhao, Chao Wu, Fei Wu
摘要:联邦学习(FL)是一种允许多个客户端在不暴露其私有数据的情况下协作训练共享模型的机器学习范式。数据异质性是FL中的一个基本挑战,可能导致收敛性差和性能下降。客户端漂移已被认为是造成此问题的因素之一,它源于FedAvg中的多次本地更新。然而,在跨设备FL中,部分客户端参与会引发另一种形式的漂移,而这尚未得到充分研究。我们将这种漂移称为周期漂移:每个通信轮次中参与的客户端所呈现的数据分布,可能偏离全体客户端的整体分布。由于优化目标随每一轮而变化,它可能比客户端漂移更有害。在本文中,我们研究了周期漂移与客户端漂移之间的相互作用,发现随着数据异质性程度的增加,周期漂移会对跨设备FL产生尤为不利的影响。为了解决这些问题,我们提出了一个"预测-观测"框架,并给出了一个实例化方法FedEve,使这两类漂移可以相互补偿,从而减轻其整体影响。我们提供的理论证据表明,我们的方法可以降低模型更新的方差。大量实验表明,在跨设备设置下的非IID数据上,我们的方法优于其他方法。
摘要:Federated learning (FL) is a machine learning paradigm that allows multiple clients to collaboratively train a shared model without exposing their private data. Data heterogeneity is a fundamental challenge in FL, which can result in poor convergence and performance degradation. Client drift has been recognized as one of the factors contributing to this issue resulting from the multiple local updates in FedAvg. However, in cross-device FL, a different form of drift arises due to the partial client participation, but it has not been studied well. This drift, we referred as period drift, occurs as participating clients at each communication round may exhibit distinct data distribution that deviates from that of all clients. It could be more harmful than client drift since the optimization objective shifts with every round. In this paper, we investigate the interaction between period drift and client drift, finding that period drift can have a particularly detrimental effect on cross-device FL as the degree of data heterogeneity increases. To tackle these issues, we propose a predict-observe framework and present an instantiated method, FedEve, where these two types of drift can compensate each other to mitigate their overall impact. We provide theoretical evidence that our approach can reduce the variance of model updates. Extensive experiments demonstrate that our method outperforms alternatives on non-iid data in cross-device settings.
推理|分析|理解|解释(6篇)
【1】Long Chain-of-Thought Reasoning Across Languages
标题:跨语言的长思维链推理
链接:https://arxiv.org/abs/2508.14828
作者:a, Seun Eisape, Kayo Yin, Alane Suhr
备注:Accepted to SCALR @ COLM 2025
摘要:通过长思维链(CoT)扩展推理,已经在大型语言模型(LLM)中解锁了令人印象深刻的推理能力,但推理过程仍然几乎完全以英语为中心。我们构建了两个流行英语推理数据集的翻译版本,微调了Qwen 2.5(7B)和Qwen 3(8B)模型,并对法语、日语、拉脱维亚语和斯瓦希里语的长CoT生成进行了系统研究。我们的实验揭示了三个关键发现。首先,使用英语作为枢轴语言的效果因语言而异:它对法语没有任何好处;用作日语和拉脱维亚语的推理语言时可以提高性能;而对斯瓦希里语来说则不够,其任务理解和推理仍然很差。其次,Qwen 3中广泛的多语言预训练缩小了但并未消除跨语言的性能差距;在斯瓦希里语中,仅使用1k条轨迹的轻量级微调仍将性能提高了30%以上。第三,数据质量与规模的权衡取决于语言:精心策划的小型数据集对英语和法语已经足够,而更大但更嘈杂的语料库对斯瓦希里语和拉脱维亚语更有效。总之,这些结果阐明了长CoT何时以及为何能够跨语言迁移,并提供了翻译数据集以促进公平的多语言推理研究。
摘要:Scaling inference through long chains-of-thought (CoTs) has unlocked impressive reasoning capabilities in large language models (LLMs), yet the reasoning process remains almost exclusively English-centric. We construct translated versions of two popular English reasoning datasets, fine-tune Qwen 2.5 (7B) and Qwen 3 (8B) models, and present a systematic study of long CoT generation across French, Japanese, Latvian, and Swahili. Our experiments reveal three key findings. First, the efficacy of using English as a pivot language varies by language: it provides no benefit for French, improves performance when used as the reasoning language for Japanese and Latvian, and proves insufficient for Swahili where both task comprehension and reasoning remain poor. Second, extensive multilingual pretraining in Qwen 3 narrows but does not eliminate the cross-lingual performance gap. A lightweight fine-tune using only 1k traces still improves performance by over 30\% in Swahili. Third, data quality versus scale trade-offs are language dependent: small, carefully curated datasets suffice for English and French, whereas larger but noisier corpora prove more effective for Swahili and Latvian. Together, these results clarify when and why long CoTs transfer across languages and provide translated datasets to foster equitable multilingual reasoning research.
【2】Assessing the Quality and Security of AI-Generated Code: A Quantitative Analysis
标题:评估人工智能生成代码的质量和安全性:定量分析
链接:https://arxiv.org/abs/2508.14727
作者:ra, Olivier Schmitt, Joseph Tyler
摘要:本研究对五个主流大型语言模型(LLM)生成代码的质量和安全性进行了定量评估:Claude Sonnet 4、Claude 3.7 Sonnet、GPT-4o、Llama 3.2 90B和OpenCoder 8B。虽然此前的研究已经评估了LLM生成代码的功能性能,但本研究通过使用SonarQube进行全面的静态分析,检验了LLM在4,442个Java编码任务上的输出。研究结果表明,尽管LLM能够生成功能正确的代码,但它们也会引入一系列软件缺陷,包括错误、安全漏洞和代码异味。这些缺陷似乎并非孤立存在;相反,它们可能代表源自当前LLM代码生成方法系统性局限的共同弱点。特别是,在多个模型中都观察到了极其严重的问题,例如硬编码密码和路径遍历漏洞。这些结果表明,LLM生成的代码需要经过验证才能被认为是生产就绪的。本研究发现,模型的功能性能(以单元测试的Pass@1率衡量)与其生成代码的整体质量和安全性(以通过功能测试的基准解中SonarQube问题的数量衡量)之间没有直接相关性。这表明功能基准测试得分并不是整体代码质量和安全性的良好指标。本研究的目的不是对LLM的性能进行排名,而是强调所有被评估的模型似乎都存在某些共同弱点。因此,这些发现支持这样一种观点:静态分析可以成为检测潜在缺陷的有价值工具,也是在软件开发中部署人工智能的组织的重要保障。
摘要:This study presents a quantitative evaluation of the code quality and security of five prominent Large Language Models (LLMs): Claude Sonnet 4, Claude 3.7 Sonnet, GPT-4o, Llama 3.2 90B, and OpenCoder 8B. While prior research has assessed the functional performance of LLM-generated code, this research tested LLM output from 4,442 Java coding assignments through comprehensive static analysis using SonarQube. The findings suggest that although LLMs can generate functional code, they also introduce a range of software defects, including bugs, security vulnerabilities, and code smells. These defects do not appear to be isolated; rather, they may represent shared weaknesses stemming from systemic limitations within current LLM code generation methods. In particular, critically severe issues, such as hard-coded passwords and path traversal vulnerabilities, were observed across multiple models. These results indicate that LLM-generated code requires verification in order to be considered production-ready. This study found no direct correlation between a model's functional performance (measured by Pass@1 rate of unit tests) and the overall quality and security of its generated code, measured by the number of SonarQube issues in benchmark solutions that passed the functional tests. This suggests that functional benchmark performance score is not a good indicator of overall code quality and security. The goal of this study is not to rank LLM performance but to highlight that all evaluated models appear to share certain weaknesses. Consequently, these findings support the view that static analysis can be a valuable instrument for detecting latent defects and an important safeguard for organizations that deploy AI in software development.
【3】Understanding Data Influence with Differential Approximation
标题:通过差异逼近理解数据影响
链接:https://arxiv.org/abs/2508.14648
作者:, Sitong Wu, Xiuzhe Wu, Wang Wang, Bo Zhao, Zeke Xie, Gui-Song Xia, Xiaojuan Qi
摘要:数据在人工智能的突破性进展中发挥着关键作用。数据的定量分析大大有助于模型训练,提高数据利用的效率和质量。然而,现有的数据分析工具往往在准确性方面滞后。例如,这些工具中的许多甚至假设神经网络的损失函数是凸的。这些限制使得有效实施当前方法变得具有挑战性。在本文中,我们引入了一个新的配方,以近似一个样本的影响,通过积累连续的学习步骤之间的影响的差异,我们术语的Diff-In。具体来说,我们将样本影响公式化为连续训练迭代中其变化/差异的累积和。通过采用二阶近似,我们近似这些差异项具有高精度,同时消除了现有方法所需的模型凸性的需要。尽管是二阶方法,Diff-In保持与一阶方法相当的计算复杂度,并且保持可扩展性。这种效率是通过计算Hessian和梯度的乘积来实现的,这可以使用一阶梯度的有限差分来有效地近似。我们从理论和经验两方面评估了Diff-In的近似精度。我们的理论分析表明,Diff-In实现了显着较低的近似误差相比,现有的影响估计。大量的实验进一步证实了它在三个以数据为中心的任务:数据清理,数据删除和核心集选择中在多个基准数据集上的卓越性能。值得注意的是,我们对大规模视觉语言预训练的数据修剪实验表明,Diff-In可以扩展到数百万个数据点,并且优于强基线。
摘要:Data plays a pivotal role in the groundbreaking advancements in artificial intelligence. The quantitative analysis of data significantly contributes to model training, enhancing both the efficiency and quality of data utilization. However, existing data analysis tools often lag in accuracy. For instance, many of these tools even assume that the loss function of neural networks is convex. These limitations make it challenging to implement current methods effectively. In this paper, we introduce a new formulation to approximate a sample's influence by accumulating the differences in influence between consecutive learning steps, which we term Diff-In. Specifically, we formulate the sample-wise influence as the cumulative sum of its changes/differences across successive training iterations. By employing second-order approximations, we approximate these difference terms with high accuracy while eliminating the need for model convexity required by existing methods. Despite being a second-order method, Diff-In maintains computational complexity comparable to that of first-order methods and remains scalable. This efficiency is achieved by computing the product of the Hessian and gradient, which can be efficiently approximated using finite differences of first-order gradients. We assess the approximation accuracy of Diff-In both theoretically and empirically. Our theoretical analysis demonstrates that Diff-In achieves significantly lower approximation error compared to existing influence estimators. Extensive experiments further confirm its superior performance across multiple benchmark datasets in three data-centric tasks: data cleaning, data deletion, and coreset selection. Notably, our experiments on data pruning for large-scale vision-language pre-training show that Diff-In can scale to millions of data points and outperforms strong baselines.
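摘要提到,Diff-In 的效率来自"用一阶梯度的有限差分近似 Hessian 与向量的乘积"。下面用 numpy 在一个二次损失上演示这一标准技巧(grad_loss、步长 eps 等均为演示设定,并非论文代码);对二次损失可以与精确的 Hessian-向量积对照验证。

```python
import numpy as np

def grad_loss(w, X, y):
    # 二次损失 0.5*||Xw - y||^2 的梯度(演示用)
    return X.T @ (X @ w - y)

def hvp_finite_diff(w, v, X, y, eps=1e-5):
    """有限差分近似 Hessian-向量积: H v ≈ (g(w+eps*v) - g(w-eps*v)) / (2*eps)。"""
    g_plus = grad_loss(w + eps * v, X, y)
    g_minus = grad_loss(w - eps * v, X, y)
    return (g_plus - g_minus) / (2 * eps)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 5)), rng.normal(size=50)
w = rng.normal(size=5)
v = grad_loss(w, X, y)                    # 取梯度方向作为 v
approx = hvp_finite_diff(w, v, X, y)
exact = X.T @ (X @ v)                     # 二次损失的精确 Hessian-向量积
print(np.max(np.abs(approx - exact)))     # 应接近 0
```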
【4】A Fuzzy-Enhanced Explainable AI Framework for Flight Continuous Descent Operations Classification
标题:用于飞行连续下降操作分类的模糊增强可解释人工智能框架
链接:https://arxiv.org/abs/2508.14618
作者:ozi, Sandaruwan K. Sethunge, Elham Norouzi, Phat T. Phan, Kavinda U. Waduge, Md. Arafatur Rahman
摘要:连续下降操作(CDO)是指平稳的怠速推力下降,避免中途平飞(level-off),在减少燃油消耗、排放和噪音的同时提高效率和乘客舒适度。尽管其具有运营和环境效益,但很少有研究系统地考察影响CDO性能的因素。此外,相关领域(如轨迹优化)的许多现有方法缺乏航空业所需的透明度,而在航空业,可解释性对安全和利益相关者的信任至关重要。本研究通过提出一个模糊增强可解释AI(FEXAI)框架来弥补这些差距,该框架将模糊逻辑与机器学习和SHapley加性解释(SHAP)分析相结合。为此,使用广播式自动相关监视(ADS-B)数据从1,094次飞行中收集了包含29个特征的综合数据集,其中包括11个运行特征和18个天气相关特征。然后应用机器学习模型和SHAP对航班的CDO遵守程度进行分类,并按重要性对特征进行排序。再以SHAP得分最高的三个特征构建基于模糊规则的分类器,从而提取可解释的模糊规则。所有模型的分类准确率都超过了90%,FEXAI能为运行人员提供有意义的、人类可读的规则。结果表明,进场航路内的平均下降率、下降段的数量以及下降过程中航向的平均变化是CDO性能的最强预测因子。本研究提出的FEXAI方法为运行决策支持提供了一条新途径,并可集成到航空工具中,以便在不同运行条件下提供维持CDO遵守度的实时建议。
摘要:Continuous Descent Operations (CDO) involve smooth, idle-thrust descents that avoid level-offs, reducing fuel burn, emissions, and noise while improving efficiency and passenger comfort. Despite its operational and environmental benefits, limited research has systematically examined the factors influencing CDO performance. Moreover, many existing methods in related areas, such as trajectory optimization, lack the transparency required in aviation, where explainability is critical for safety and stakeholder trust. This study addresses these gaps by proposing a Fuzzy-Enhanced Explainable AI (FEXAI) framework that integrates fuzzy logic with machine learning and SHapley Additive exPlanations (SHAP) analysis. For this purpose, a comprehensive dataset of 29 features, including 11 operational and 18 weather-related features, was collected from 1,094 flights using Automatic Dependent Surveillance-Broadcast (ADS-B) data. Machine learning models and SHAP were then applied to classify flights' CDO adherence levels and rank features by importance. The three most influential features, as identified by SHAP scores, were then used to construct a fuzzy rule-based classifier, enabling the extraction of interpretable fuzzy rules. All models achieved classification accuracies above 90%, with FEXAI providing meaningful, human-readable rules for operational users. Results indicated that the average descent rate within the arrival route, the number of descent segments, and the average change in directional heading during descent were the strongest predictors of CDO performance. The FEXAI method proposed in this study presents a novel pathway for operational decision support and could be integrated into aviation tools to enable real-time advisories that maintain CDO adherence under varying operational conditions.
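摘要的流程是先用SHAP选出最重要的三个特征,再在其上构建模糊规则分类器。下面用 numpy 示意"隶属度函数 + 模糊规则"这一步;特征取值区间与规则均为编造的演示设定,并非论文提取的真实规则。

```python
import numpy as np

def tri(x, a, b, c):
    """三角形隶属度函数:在 b 处取 1,在 [a, c] 之外为 0。"""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def cdo_adherence(descent_rate, n_segments, heading_change):
    """演示规则:下降率适中 且 下降段少 且 航向变化小 => 高CDO符合度。"""
    mu_rate = tri(descent_rate, 500, 1000, 1500)   # ft/min,演示区间
    mu_seg = tri(n_segments, 0, 1, 3)
    mu_head = tri(heading_change, 0, 5, 20)        # 单位:度
    return min(mu_rate, mu_seg, mu_head)           # Mamdani 式取最小作为规则强度

print(cdo_adherence(950, 1, 4))    # 接近 1:高符合度
print(cdo_adherence(300, 5, 30))   # 0:低符合度
```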
【5】Beyond Turing: Memory-Amortized Inference as a Foundation for Cognitive Computation
标题:超越图灵:记忆分解推理作为认知计算的基础
链接:https://arxiv.org/abs/2508.14143
作者:
摘要:智能从根本上说是非遍历的:它不是从零开始的均匀采样或优化中产生,而是源于对先前推理轨迹的结构化重用。我们引入记忆摊销推理(MAI)作为一个形式化框架,其中认知被建模为对记忆中潜在周期(latent cycles)的推理,而不是通过梯度下降进行的重新计算。MAI系统通过结构重用编码归纳偏置,最小化熵并实现上下文感知、保持结构的推理。这种方法将认知系统重新定义为:不是遍历采样器,而是在持久拓扑记忆引导下、在受约束的潜在流形上航行的导航者。通过delta同调(delta-homology)的视角,我们表明MAI为Mountcastle的通用皮层算法提供了原则性基础,将每个皮层柱建模为作用于环一致(cycle-consistent)记忆状态的局部推理算子。此外,我们在MAI和强化学习之间建立了时间反转对偶:RL从奖励出发向前传播价值,而MAI从记忆出发向后重建潜在原因。这种反转为节能推理铺平了道路,并有助于解决现代AI面临的计算瓶颈。因此,MAI提供了一个统一的、具有生物学基础的、基于结构、重用和记忆的智能理论。我们还简要讨论了MAI对实现通用人工智能(AGI)的深远影响。
摘要:Intelligence is fundamentally non-ergodic: it emerges not from uniform sampling or optimization from scratch, but from the structured reuse of prior inference trajectories. We introduce Memory-Amortized Inference (MAI) as a formal framework in which cognition is modeled as inference over latent cycles in memory, rather than recomputation through gradient descent. MAI systems encode inductive biases via structural reuse, minimizing entropy and enabling context-aware, structure-preserving inference. This approach reframes cognitive systems not as ergodic samplers, but as navigators over constrained latent manifolds, guided by persistent topological memory. Through the lens of delta-homology, we show that MAI provides a principled foundation for Mountcastle's Universal Cortical Algorithm, modeling each cortical column as a local inference operator over cycle-consistent memory states. Furthermore, we establish a time-reversal duality between MAI and reinforcement learning: whereas RL propagates value forward from reward, MAI reconstructs latent causes backward from memory. This inversion paves a path toward energy-efficient inference and addresses the computational bottlenecks facing modern AI. MAI thus offers a unified, biologically grounded theory of intelligence based on structure, reuse, and memory. We also briefly discuss the profound implications of MAI for achieving artificial general intelligence (AGI).
【6】Logical Expressivity and Explanations for Monotonic GNNs with Scoring Functions
标题:具有评分功能的单调GNN的逻辑表达性和解释
链接:https://arxiv.org/abs/2508.14091
作者:orris, David J. Tena Cucala, Bernardo Cuenca Grau
备注:Full version (with appendices) of paper accepted to KR 2025 (22nd International Conference on Principles of Knowledge Representation and Reasoning)
摘要:图神经网络(GNN)通常用于链接预测任务:预测知识图谱(KG)中缺失的二元事实。为了解决GNN在KG上缺乏可解释性的问题,最近的工作从GNN中提取具有可证明对应保证的Datalog规则。提取的规则可以用来解释GNN的预测;此外,它们还有助于刻画各种GNN模型的表达能力。然而,这些工作只针对一种基于受限的、低表达力的图编码/解码方法的链接预测形式。在本文中,我们考虑一种更通用、更流行的链接预测方法,其中使用评分函数将GNN输出解码为事实预测。我们展示了如何使GNN和评分函数具有单调性,利用单调性提取用于解释预测的可靠(sound)规则,并利用关于评分函数所能捕获的规则类型的现有结果。我们还定义了为某些类别的带评分函数的单调GNN求取等价Datalog程序的过程。我们的实验表明,在链接预测基准测试中,单调GNN和评分函数在实践中表现良好,并产生许多可靠的规则。
摘要:Graph neural networks (GNNs) are often used for the task of link prediction: predicting missing binary facts in knowledge graphs (KGs). To address the lack of explainability of GNNs on KGs, recent works extract Datalog rules from GNNs with provable correspondence guarantees. The extracted rules can be used to explain the GNN's predictions; furthermore, they can help characterise the expressive power of various GNN models. However, these works address only a form of link prediction based on a restricted, low-expressivity graph encoding/decoding method. In this paper, we consider a more general and popular approach for link prediction where a scoring function is used to decode the GNN output into fact predictions. We show how GNNs and scoring functions can be adapted to be monotonic, use the monotonicity to extract sound rules for explaining predictions, and leverage existing results about the kind of rules that scoring functions can capture. We also define procedures for obtaining equivalent Datalog programs for certain classes of monotonic GNNs with scoring functions. Our experiments show that, on link prediction benchmarks, monotonic GNNs and scoring functions perform well in practice and yield many sound rules.
检测相关(4篇)
【1】Artificial Intelligence-Based Multiscale Temporal Modeling for Anomaly Detection in Cloud Services
标题:基于人工智能的云服务异常检测多尺度时态建模
链接:https://arxiv.org/abs/2508.14503
作者:, Yilin Li, Song Han, Renzi Meng, Sibo Wang, Ming Wang
摘要:针对云服务环境中时态建模和尺度感知特征表示的局限性,提出了一种基于Transformer架构的异常检测方法。该方法首先采用一种改进的Transformer模块对高维监测数据进行时态建模,并利用自注意机制捕获长距离依赖关系和上下文语义。然后,引入多尺度特征构造路径,通过下采样和并行编码来提取不同粒度的时间特征。设计了注意力加权融合模块,动态调整各尺度对最终决策的贡献,增强了模型在异常模式建模中的鲁棒性。在输入建模阶段,构建标准化的多维时间序列,覆盖CPU利用率、内存使用率和任务调度状态等核心信号,同时使用位置编码来加强模型的时间感知。设计了系统的实验设置来评估性能,包括比较实验和超参数敏感性分析,重点关注优化器、学习率、异常率和噪声水平的影响。实验结果表明,该方法在查准率、查全率、AUC和F1-score等关键指标上均优于主流基线模型,在各种扰动条件下均保持了较强的稳定性和检测性能,在复杂云环境下表现出了优越的性能。
摘要:This study proposes an anomaly detection method based on the Transformer architecture with integrated multiscale feature perception, aiming to address the limitations of temporal modeling and scale-aware feature representation in cloud service environments. The method first employs an improved Transformer module to perform temporal modeling on high-dimensional monitoring data, using a self-attention mechanism to capture long-range dependencies and contextual semantics. Then, a multiscale feature construction path is introduced to extract temporal features at different granularities through downsampling and parallel encoding. An attention-weighted fusion module is designed to dynamically adjust the contribution of each scale to the final decision, enhancing the model's robustness in anomaly pattern modeling. In the input modeling stage, standardized multidimensional time series are constructed, covering core signals such as CPU utilization, memory usage, and task scheduling states, while positional encoding is used to strengthen the model's temporal awareness. A systematic experimental setup is designed to evaluate performance, including comparative experiments and hyperparameter sensitivity analysis, focusing on the impact of optimizers, learning rates, anomaly ratios, and noise levels. Experimental results show that the proposed method outperforms mainstream baseline models in key metrics, including precision, recall, AUC, and F1-score, and maintains strong stability and detection performance under various perturbation conditions, demonstrating its superior capability in complex cloud environments.
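以下 PyTorch 片段示意摘要中"多尺度下采样 + 注意力加权融合"的一种可能写法(非论文官方实现;模块结构、尺度与维度均为演示假设)。

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """对输入序列按不同步长下采样、分别编码,再用可学习的注意力权重融合各尺度表示。"""
    def __init__(self, d_model=64, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.encoders = nn.ModuleList(
            [nn.GRU(d_model, d_model, batch_first=True) for _ in scales])
        self.score = nn.Linear(d_model, 1)   # 为每个尺度的表示打分

    def forward(self, x):                    # x: (B, T, d_model)
        feats = []
        for s, enc in zip(self.scales, self.encoders):
            _, h = enc(x[:, ::s, :])         # 以步长 s 下采样后编码
            feats.append(h[-1])              # (B, d_model)
        feats = torch.stack(feats, dim=1)    # (B, n_scales, d_model)
        attn = torch.softmax(self.score(feats), dim=1)   # 各尺度的贡献权重
        return (attn * feats).sum(dim=1)     # (B, d_model) 融合后的表示

fused = MultiScaleFusion()(torch.randn(8, 32, 64))
print(fused.shape)  # torch.Size([8, 64])
```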
【2】Reliability comparison of vessel trajectory prediction models via Probability of Detection
标题:通过检测概率进行船舶轨迹预测模型的可靠性比较
链接:https://arxiv.org/abs/2508.14198
作者:tin, Kathrin Donandt, Dirk Söffker
备注:2025 IEEE Intelligent Vehicles Symposium (IV)
摘要:本文研究船舶轨迹预测(VTP),重点评估不同的基于深度学习的方法。我们的目标是评估模型在不同交通复杂度下的性能,并比较各方法的可靠性。以往的VTP模型忽略了具体交通情境的复杂性且缺乏可靠性评估,本研究使用检测概率(Probability of Detection)分析来量化模型在不同交通场景下的可靠性,从而超越常见的误差分布分析。所有模型都在按预测时域内交通情境分类的测试样本上进行评估,并针对每个类别给出性能指标和可靠性估计。这一全面评估的结果使人们更深入地了解不同预测方法的优缺点,以及它们的可靠性,即可以保证安全预测的预测时域长度。这些发现可以为开发更可靠的船舶轨迹预测方法提供参考,提高未来内河航运的安全性和效率。
摘要:This contribution addresses vessel trajectory prediction (VTP), focusing on the evaluation of different deep learning-based approaches. The objective is to assess model performance in diverse traffic complexities and compare the reliability of the approaches. While previous VTP models overlook the specific traffic situation complexity and lack reliability assessments, this research uses a probability of detection analysis to quantify model reliability in varying traffic scenarios, thus going beyond common error distribution analyses. All models are evaluated on test samples categorized according to their traffic situation during the prediction horizon, with performance metrics and reliability estimates obtained for each category. The results of this comprehensive evaluation provide a deeper understanding of the strengths and weaknesses of the different prediction approaches, along with their reliability in terms of the prediction horizon lengths for which safe forecasts can be guaranteed. These findings can inform the development of more reliable vessel trajectory prediction approaches, enhancing safety and efficiency in future inland waterways navigation.
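这里的检测概率(PoD)分析可以理解为:按预测时域内的交通情境对测试样本分类后,统计每类中预测误差不超过给定阈值的样本比例。下面是按此理解编写的最小示意(阈值与类别划分均为演示假设)。

```python
import numpy as np

def probability_of_detection(errors, categories, threshold):
    """按交通情境类别统计"误差 <= 阈值"的样本占比,作为该类别下的可靠性估计。"""
    pod = {}
    for c in np.unique(categories):
        mask = categories == c
        pod[c] = float(np.mean(errors[mask] <= threshold))
    return pod

errors = np.array([12.0, 45.0, 8.0, 30.0, 55.0, 10.0])      # 预测位置误差(米,演示)
cats = np.array(["开阔水域", "会船", "开阔水域", "会船", "会船", "开阔水域"])
print(probability_of_detection(errors, cats, threshold=20.0))
```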
【3】CoBAD: Modeling Collective Behaviors for Human Mobility Anomaly Detection
标题:CoBAD:人类移动异常检测的集体行为建模
链接:https://arxiv.org/abs/2508.14088
作者:n, Shurui Cao, Leman Akoglu
摘要:检测人类移动的异常对于公共安全和城市规划等应用至关重要。虽然传统的异常检测方法主要关注于个体移动模式(例如,孩子晚上应该呆在家里),集体异常检测旨在识别个体之间集体移动行为的不规则性(例如,儿童独自在家,而父母却在别处),这仍然是一个未充分探讨的挑战。与个体异常不同,集体异常需要对个体之间的时空依赖性进行建模,从而引入额外的复杂性。为了解决这一差距,我们提出了CoBAD,这是一种旨在捕获集体行为以进行人类移动异常检测的新型模型。我们首先将问题表述为具有共现事件图的集体事件序列(CES)上的无监督学习,其中CES表示相关个体的事件序列。CoBAD然后采用两阶段注意力机制来模拟个体移动模式和多个个体之间的交互。通过掩蔽事件和链接重建任务对大规模集体行为数据进行预训练,CoBAD能够检测两种类型的集体异常:意外的同现异常和缺席异常,后者在以前的工作中基本上被忽视了。在大规模移动数据集上进行的大量实验表明,CoBAD显著优于现有的异常检测基线,AUCROC和AUCPR分别提高了13%-18%和19%-70%。所有源代码都可以在https://github.com/wenhaomin/CoBAD上找到。
摘要:Detecting anomalies in human mobility is essential for applications such as public safety and urban planning. While traditional anomaly detection methods primarily focus on individual movement patterns (e.g., a child should stay at home at night), collective anomaly detection aims to identify irregularities in collective mobility behaviors across individuals (e.g., a child is at home alone while the parents are elsewhere) and remains an underexplored challenge. Unlike individual anomalies, collective anomalies require modeling spatiotemporal dependencies between individuals, introducing additional complexity. To address this gap, we propose CoBAD, a novel model designed to capture Collective Behaviors for human mobility Anomaly Detection. We first formulate the problem as unsupervised learning over Collective Event Sequences (CES) with a co-occurrence event graph, where CES represents the event sequences of related individuals. CoBAD then employs a two-stage attention mechanism to model both the individual mobility patterns and the interactions across multiple individuals. Pre-trained on large-scale collective behavior data through masked event and link reconstruction tasks, CoBAD is able to detect two types of collective anomalies: unexpected co-occurrence anomalies and absence anomalies, the latter of which has been largely overlooked in prior work. Extensive experiments on large-scale mobility datasets demonstrate that CoBAD significantly outperforms existing anomaly detection baselines, achieving an improvement of 13%-18% in AUCROC and 19%-70% in AUCPR. All source code is available at https://github.com/wenhaomin/CoBAD.
【4】MCLPD:Multi-view Contrastive Learning for EEG-based PD Detection Across Datasets
标题:MCLPD:跨数据集中基于脑电的PD检测的多视图对比学习
链接:https://arxiv.org/abs/2508.14073
作者:ga, Ruilin Zhang, Jun Xiao, Yifan Liu, Zhe Wang
备注:Acccepted by European Conference on Artificial Intelligence(ECAI 2025)
摘要:脑电图已被验证为检测帕金森病(PD)的有效技术,特别是在其早期阶段。然而,脑电数据标注成本高,通常导致数据集规模有限,且数据集之间存在相当大的差异(包括采集协议和受试者人口统计学上的差异),这显著阻碍了模型在跨数据集检测场景中的鲁棒性和泛化能力。为应对这些挑战,本文提出了一种名为MCLPD的半监督学习框架,它将多视图对比预训练与轻量级监督微调相结合,以提高跨数据集PD检测性能。在预训练阶段,MCLPD在未标记的UNM数据集上进行自监督学习;为了构建对比样本对,它在时域和频域上应用双重增强,既丰富了数据,又自然地融合了时频信息。在微调阶段,仅使用来自另外两个数据集(UI和UC)的一小部分标记数据进行监督优化。实验结果表明,仅使用1%的标记数据,MCLPD在UI上的F1得分达到0.91,在UC上达到0.81;当使用5%的标记数据时,二者分别进一步提高到0.97和0.87。与现有方法相比,MCLPD在降低对标记数据依赖的同时显著提升了跨数据集泛化能力,证明了所提框架的有效性。
摘要:Electroencephalography has been validated as an effective technique for detecting Parkinson's disease, particularly in its early stages. However, the high cost of EEG data annotation often results in limited dataset size and considerable discrepancies across datasets, including differences in acquisition protocols and subject demographics, significantly hinder the robustness and generalizability of models in cross-dataset detection scenarios. To address such challenges, this paper proposes a semi-supervised learning framework named MCLPD, which integrates multi-view contrastive pre-training with lightweight supervised fine-tuning to enhance cross-dataset PD detection performance. During pre-training, MCLPD uses self-supervised learning on the unlabeled UNM dataset. To build contrastive pairs, it applies dual augmentations in both time and frequency domains, which enrich the data and naturally fuse time-frequency information. In the fine-tuning phase, only a small proportion of labeled data from another two datasets (UI and UC) is used for supervised optimization. Experimental results show that MCLPD achieves F1 scores of 0.91 on UI and 0.81 on UC using only 1% of labeled data, which further improve to 0.97 and 0.87, respectively, when 5% of labeled data is used. Compared to existing methods, MCLPD substantially improves cross-dataset generalization while reducing the dependency on labeled data, demonstrating the effectiveness of the proposed framework.
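摘要提到在时域与频域同时做增强来构造对比样本对。下面用 numpy 给出一种常见做法的示意(时域抖动 + 频域掩蔽;抖动幅度、掩蔽比例等均为演示假设,并非论文参数)。

```python
import numpy as np

def time_aug(x, sigma=0.05, rng=None):
    """时域增强:加性抖动(jitter)。x: (通道, 时间)"""
    rng = rng or np.random.default_rng()
    return x + rng.normal(0, sigma, size=x.shape)

def freq_aug(x, mask_ratio=0.1, rng=None):
    """频域增强:随机掩蔽一段频带后逆变换回时域。"""
    rng = rng or np.random.default_rng()
    spec = np.fft.rfft(x, axis=-1)
    n = spec.shape[-1]
    width = max(1, int(mask_ratio * n))
    start = rng.integers(0, n - width)
    spec[..., start:start + width] = 0
    return np.fft.irfft(spec, n=x.shape[-1], axis=-1)

eeg = np.random.default_rng(0).normal(size=(8, 256))   # 8 通道、256 个采样点(演示)
view1, view2 = time_aug(eeg), freq_aug(eeg)            # 同一样本的两个增强视图构成正样本对
print(view1.shape, view2.shape)
```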
分类|识别(2篇)
【1】DualNILM: Energy Injection Identification Enabled Disaggregation with Deep Multi-Task Learning
标题:DualNILM:能量注入识别通过深度多任务学习实现分解
链接:https://arxiv.org/abs/2508.14600
作者:ng, Guoming Tang, Junyu Xue, Srinivasan Keshav, Tongxin Li, Chris Ding
备注:Preprint
摘要:非侵入式负载监控(NILM)提供了一种经济高效的方法,可以在智能家居和建筑应用中获得精细的设备级能耗。然而,越来越多地采用诸如太阳能电池板和电池存储的仪表后能源,对仅依赖于仪表数据的传统NILM方法提出了新的挑战。从仪表后面的源注入的能量可以掩盖单个设备的功率特征,导致NILM性能的显著下降。为了应对这一挑战,我们提出了DualNILM,这是一个深度多任务学习框架,专为NILM中的设备状态识别和注入能量识别的双重任务而设计。通过在基于Transformer的架构中集成序列到点和序列到序列策略,DualNILM可以有效地捕获聚合功耗模式中的多尺度时间依赖性,从而实现准确的设备状态识别和能量注入识别。我们使用自我收集和合成的开放NILM数据集进行DualNILM的验证,这些数据集包括设备级能耗和能量注入。大量的实验结果表明,DualNILM保持了良好的性能,在NILM的双重任务,大大优于传统的方法。
摘要:Non-Intrusive Load Monitoring (NILM) offers a cost-effective method to obtain fine-grained appliance-level energy consumption in smart homes and building applications. However, the increasing adoption of behind-the-meter energy sources, such as solar panels and battery storage, poses new challenges for conventional NILM methods that rely solely on at-the-meter data. The injected energy from the behind-the-meter sources can obscure the power signatures of individual appliances, leading to a significant decline in NILM performance. To address this challenge, we present DualNILM, a deep multi-task learning framework designed for the dual tasks of appliance state recognition and injected energy identification in NILM. By integrating sequence-to-point and sequence-to-sequence strategies within a Transformer-based architecture, DualNILM can effectively capture multi-scale temporal dependencies in the aggregate power consumption patterns, allowing for accurate appliance state recognition and energy injection identification. We conduct validation of DualNILM using both self-collected and synthesized open NILM datasets that include both appliance-level energy consumption and energy injection. Extensive experimental results demonstrate that DualNILM maintains an excellent performance for the dual tasks in NILM, much outperforming conventional methods.
【2】ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification
标题:ERIS:一种用于分布外时间序列分类的能量引导特征解纠缠框架
链接:https://arxiv.org/abs/2508.14134
作者:ei Teng, Ji Zhang, Xingwang Li, Yuxuan Liang
备注:conference
摘要:理想的时间序列分类(TSC)应能捕获不变表示,但在分布外(OOD)数据上实现可靠的性能仍然是一个核心障碍。这一障碍源于模型固有地将领域特定特征与标签相关特征纠缠在一起,从而导致虚假相关。虽然特征解纠缠旨在解决这个问题,但目前的方法在很大程度上缺乏引导,缺少隔离真正通用特征所需的语义方向。为此,我们提出了一个端到端的能量正则化移位鲁棒性信息(ERIS)框架,以实现有引导且可靠的特征解纠缠。其核心思想是,有效的解纠缠不仅需要数学约束,还需要语义指导来锚定分离过程。ERIS包含三个关键机制来实现这一目标。具体来说,我们首先引入能量引导的校准机制,为分离提供关键的语义指导,使模型能够自我校准。此外,权重级正交性策略在领域特定特征和标签相关特征之间强制结构独立性,从而减轻它们的相互干扰。另外,辅助对抗训练机制通过注入结构化扰动来增强鲁棒性。实验表明,在四个基准上,ERIS的准确率比最先进的基线平均高出4.04%。
摘要:An ideal time series classification (TSC) should be able to capture invariant representations, but achieving reliable performance on out-of-distribution (OOD) data remains a core obstacle. This obstacle arises from the way models inherently entangle domain-specific and label-relevant features, resulting in spurious correlations. While feature disentanglement aims to solve this, current methods are largely unguided, lacking the semantic direction required to isolate truly universal features. To address this, we propose an end-to-end Energy-Regularized Information for Shift-Robustness (\textbf{ERIS}) framework to enable guided and reliable feature disentanglement. The core idea is that effective disentanglement requires not only mathematical constraints but also semantic guidance to anchor the separation process. ERIS incorporates three key mechanisms to achieve this goal. Specifically, we first introduce an energy-guided calibration mechanism, which provides crucial semantic guidance for the separation, enabling the model to self-calibrate. Additionally, a weight-level orthogonality strategy enforces structural independence between domain-specific and label-relevant features, thereby mitigating their interference. Moreover, an auxiliary adversarial training mechanism enhances robustness by injecting structured perturbations. Experiments demonstrate that ERIS improves upon state-of-the-art baselines by an average of 4.04% accuracy across four benchmarks.
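摘要中的"权重级正交性"可以用两组特征投影权重之间的 Frobenius 范数惩罚来示意。下面是按该思路的猜测性草图(非官方代码;权重形状为演示假设),实际使用时可与任务损失加权后联合优化。

```python
import torch

def weight_orthogonality_penalty(W_domain, W_label):
    """惩罚 ||W_d^T W_l||_F^2,鼓励域特定与标签相关两组投影方向相互正交。"""
    return (W_domain.t() @ W_label).pow(2).sum()

W_d = torch.randn(128, 32, requires_grad=True)   # 域特定分支的投影权重(演示形状)
W_l = torch.randn(128, 32, requires_grad=True)   # 标签相关分支的投影权重
loss = weight_orthogonality_penalty(W_d, W_l)
loss.backward()                                   # 梯度可用于联合训练
print(loss.item())
```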
表征(2篇)
【1】Offline Imitation Learning upon Arbitrary Demonstrations by Pre-Training Dynamics Representations
标题:通过训练前动力学表示进行任意演示的离线模仿学习
链接:https://arxiv.org/abs/2508.14383
作者:a, Bo Dai, Zhaolin Ren, Yebin Wang, Na Li
备注:7 pages, 5 figures
摘要:有限的数据已成为扩展离线模仿学习(IL)的主要瓶颈。在本文中,我们提出通过引入一个预训练阶段来学习动力学表示(源自转移动力学的因子分解),从而在专家数据有限的情况下提升IL性能。我们首先从理论上证明,离线IL的最优决策变量位于表示空间中,从而显著减少下游IL中需要学习的参数。此外,动力学表示可以从以相同动力学收集的任意数据中学习,这允许重用大量非专家数据并缓解数据有限的问题。我们提出了一个受噪声对比估计启发的、易于处理的损失函数,用于在预训练阶段学习动力学表示。在MuJoCo上的实验表明,我们提出的算法仅用一条轨迹即可模仿专家策略。在真实四足机器人上的实验表明,我们可以利用从模拟器数据中预训练的动力学表示,从少量真实世界演示中学会行走。
摘要:Limited data has become a major bottleneck in scaling up offline imitation learning (IL). In this paper, we propose enhancing IL performance under limited expert data by introducing a pre-training stage that learns dynamics representations, derived from factorizations of the transition dynamics. We first theoretically justify that the optimal decision variable of offline IL lies in the representation space, significantly reducing the parameters to learn in the downstream IL. Moreover, the dynamics representations can be learned from arbitrary data collected with the same dynamics, allowing the reuse of massive non-expert data and mitigating the limited data issues. We present a tractable loss function inspired by noise contrastive estimation to learn the dynamics representations at the pre-training stage. Experiments on MuJoCo demonstrate that our proposed algorithm can mimic expert policies with as few as a single trajectory. Experiments on real quadrupeds show that we can leverage pre-trained dynamics representations from simulator data to learn to walk from a few real-world demonstrations.
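摘要提到用受噪声对比估计启发的损失来预训练动力学表示。下面给出一个 InfoNCE 风格的示意:把真实的 (s, a, s') 转移当作正样本、批内其他 s' 当作负样本(编码器结构与温度系数均为演示假设,并非论文的具体损失形式)。

```python
import torch
import torch.nn.functional as F

def nce_dynamics_loss(phi_sa, psi_next, temperature=0.1):
    """phi_sa: (B, d) 状态-动作嵌入; psi_next: (B, d) 下一状态嵌入。
    相似度矩阵的对角线为真实转移(正样本),其余为批内负样本。"""
    logits = phi_sa @ psi_next.t() / temperature
    labels = torch.arange(phi_sa.size(0))
    return F.cross_entropy(logits, labels)

phi = F.normalize(torch.randn(16, 32), dim=-1)
psi = F.normalize(phi + 0.1 * torch.randn(16, 32), dim=-1)  # 模拟相关的下一状态表示
print(nce_dynamics_loss(phi, psi).item())
```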
【2】EEGDM: EEG Representation Learning via Generative Diffusion Model
标题:EEGDM:基于生成扩散模型的EEG表示学习
链接:https://arxiv.org/abs/2508.14086
作者:Puah, Sim Kuan Goh, Ziwei Zhang, Zixuan Ye, Chow Khuen Chan, Kheng Seang Lim, Si Lei Fong, Kok Sin Woon
备注:EEGDM Preprint
摘要:虽然脑电图(EEG)一直是监测大脑和诊断神经系统疾病(例如癫痫)的重要工具,但由于标注有限和信号变异性高,从原始EEG信号中学习有意义的表示仍然具有挑战性。最近,EEG基础模型(FM)展现出可观的潜力:它们采用Transformer架构和源自大型语言模型的自监督预训练方法(例如掩蔽预测),从多样的EEG数据中学习表示,然后在特定的EEG任务上进行微调。尽管如此,这些大型模型在训练和推理过程中通常会产生很高的计算成本,而随着模型规模的增加,性能提升却很有限。在这项工作中,我们提出了基于生成扩散模型的EEG表示学习框架(EEGDM)。具体来说,我们开发了用于扩散预训练的结构化状态空间模型(SSMDP),以更好地捕捉EEG信号的时间动态,并使用去噪扩散概率模型训练该架构。然后,通过我们提出的潜在融合Transformer(LFT),将所得的潜在EEG表示用于下游分类任务。为了评估我们的方法,我们使用了多事件Temple University EEG事件语料库,并将EEGDM与当前最先进的方法(包括EEG FM)进行了比较。实证结果表明,我们的方法优于现有方法,同时约轻量19倍。这些结果表明,EEGDM为当前的FM提供了一个有前景的替代方案。我们的代码可在https://github.com/jhpuah/EEGDM上获取。
摘要:While electroencephalogram (EEG) has been a crucial tool for monitoring the brain and diagnosing neurological disorders (e.g., epilepsy), learning meaningful representations from raw EEG signals remains challenging due to limited annotations and high signal variability. Recently, EEG foundation models (FMs) have shown promising potential by adopting transformer architectures and self-supervised pre-training methods from large language models (e.g., masked prediction) to learn representations from diverse EEG data, followed by fine-tuning on specific EEG tasks. Nonetheless, these large models often incurred high computational costs during both training and inference, with only marginal performance improvements as model size increases. In this work, we proposed EEG representation learning framework building upon Generative Diffusion Model (EEGDM). Specifically, we developed structured state-space model for diffusion pretraining (SSMDP) to better capture the temporal dynamics of EEG signals and trained the architecture using a Denoising Diffusion Probabilistic Model. The resulting latent EEG representations were then used for downstream classification tasks via our proposed latent fusion transformer (LFT). To evaluate our method, we used the multi-event Temple University EEG Event Corpus and compared EEGDM with current state-of-the-art approaches, including EEG FMs. Empirical results showed that our method outperformed existing methods while being approximately 19x more lightweight. These findings suggested that EEGDM offered a promising alternative to current FMs. Our code is available at: https://github.com/jhpuah/EEGDM.
3D|3D重建等相关(1篇)
【1】Pixels to Play: A Foundation Model for 3D Gameplay
标题:可玩像素:3D游戏玩法的基础模型
链接:https://arxiv.org/abs/2508.14295
作者:ue, Chris Green, Samuel Hunt, Irakli Salia, Wenzhe Shi, Jonathan J Hunt
摘要:我们介绍Pixels2Play-0.1(P2P0.1),这是一个学习以可识别的类人行为游玩多种3D视频游戏的基础模型。受新兴的消费者和开发者用例(AI队友、可控NPC、个性化主播、辅助测试员)的驱动,我们认为智能体必须依赖与玩家所见相同的像素流,并以最少的游戏特定工程泛化到新游戏。P2P0.1通过行为克隆进行端到端训练:从带埋点的人类游戏过程中收集的有标签演示,辅以无标签的公开视频,我们通过逆动力学模型为后者推断动作。带自回归动作输出的仅解码器Transformer既能处理较大的动作空间,又能在单张消费级GPU上保持低延迟。我们报告了定性结果,展示了模型在简单的Roblox游戏和经典MS-DOS游戏中的胜任表现,给出了关于无标签数据的消融实验,并概述了达到专家级、文本条件控制所需的扩展与评估步骤。
摘要:We introduce Pixels2Play-0.1 (P2P0.1), a foundation model that learns to play a wide range of 3D video games with recognizable human-like behavior. Motivated by emerging consumer and developer use cases - AI teammates, controllable NPCs, personalized live-streamers, assistive testers - we argue that an agent must rely on the same pixel stream available to players and generalize to new titles with minimal game-specific engineering. P2P0.1 is trained end-to-end with behavior cloning: labeled demonstrations collected from instrumented human game-play are complemented by unlabeled public videos, to which we impute actions via an inverse-dynamics model. A decoder-only transformer with auto-regressive action output handles the large action space while remaining latency-friendly on a single consumer GPU. We report qualitative results showing competent play across simple Roblox and classic MS-DOS titles, ablations on unlabeled data, and outline the scaling and evaluation steps required to reach expert-level, text-conditioned control.
编码器(2篇)
【1】ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signal
标题:ECHO:可变长度信号的频率感知分层编码
链接:https://arxiv.org/abs/2508.14689
作者:ang, Juan Liu, Ming Li
摘要:预训练基础模型在视觉和语言领域取得了显著成功,但其在通用机器信号建模(涵盖声学、振动和其他工业传感器数据)方面的潜力仍有待发掘。现有使用基于子带编码器的方法已取得有竞争力的结果,但受限于固定的输入长度以及缺乏显式的频率位置编码。在这项工作中,我们提出了一种新的基础模型,它将先进的频带分割架构与相对频率位置嵌入相结合,能够在任意采样配置下实现精确的频谱定位。该模型支持任意长度的输入而无需填充或分段,产生同时保留时间与频谱保真度的简洁嵌入。我们在SIREN(https://github.com/yucongzh/SIREN)上评估了我们的方法,这是一个新推出的用于机器信号编码的大规模基准,它统一了多个数据集,包括所有DCASE任务2挑战(2020-2025)和广泛使用的工业信号语料库。实验结果表明,该方法在异常检测和故障识别上取得了一致的最先进性能,证实了所提模型的有效性和泛化能力。我们已在https://github.com/yucongzh/ECHO上开源了ECHO。
摘要:Pre-trained foundation models have demonstrated remarkable success in vision and language, yet their potential for general machine signal modeling-covering acoustic, vibration, and other industrial sensor data-remains under-explored. Existing approach using sub-band-based encoders has achieved competitive results but are limited by fixed input lengths, and the absence of explicit frequency positional encoding. In this work, we propose a novel foundation model that integrates an advanced band-split architecture with relative frequency positional embeddings, enabling precise spectral localization across arbitrary sampling configurations. The model supports inputs of arbitrary length without padding or segmentation, producing a concise embedding that retains both temporal and spectral fidelity. We evaluate our method on SIREN (https://github.com/yucongzh/SIREN), a newly introduced large-scale benchmark for machine signal encoding that unifies multiple datasets, including all DCASE task 2 challenges (2020-2025) and widely-used industrial signal corpora. Experimental results demonstrate consistent state-of-the-art performance in anomaly detection and fault identification, confirming the effectiveness and generalization capability of the proposed model. We open-sourced ECHO on https://github.com/yucongzh/ECHO.
【2】Towards Skeletal and Signer Noise Reduction in Sign Language Production via Quaternion-Based Pose Encoding and Contrastive Learning
标题:通过基于四元数的姿势编码和对比学习降低手语制作中的Skelstival和Signer噪音
链接:https://arxiv.org/abs/2508.14574
作者:auré (MULTISPEECH), Mostafa Sadeghi (MULTISPEECH), Sam Bigeard (MULTISPEECH), Slim Ouni (LORIA, MULTISPEECH)
摘要:神经手语生成(SLP)的主要挑战之一在于手语符号的类内差异很大,这源于训练数据中打手语者的形体差异和风格多样性。为了提高对这种变化的鲁棒性,我们对标准渐进式Transformer(PT)架构(Saunders et al., 2020)提出了两项增强。首先,我们在四元数空间中使用骨骼旋转来编码姿态,并用测地线损失进行训练,以提高关节角度运动的准确性和清晰度。其次,我们引入对比损失,按语义相似性(使用gloss重叠或基于SBERT的句子相似度)来组织解码器嵌入,旨在滤除那些不传达相关语义信息的解剖学和风格特征。在Phoenix14T数据集上,仅对比损失一项就使正确关键点概率(Probability of Correct Keypoint)比PT基线提高了16%。与基于四元数的姿态编码相结合时,模型的平均骨骼角度误差降低了6%。这些结果表明,在基于Transformer的SLP模型训练中,将骨骼结构建模和语义引导的对比目标引入手语姿态表示是有益的。
摘要:One of the main challenges in neural sign language production (SLP) lies in the high intra-class variability of signs, arising from signer morphology and stylistic variety in the training data. To improve robustness to such variations, we propose two enhancements to the standard Progressive Transformers (PT) architecture (Saunders et al., 2020). First, we encode poses using bone rotations in quaternion space and train with a geodesic loss to improve the accuracy and clarity of angular joint movements. Second, we introduce a contrastive loss to structure decoder embeddings by semantic similarity, using either gloss overlap or SBERT-based sentence similarity, aiming to filter out anatomical and stylistic features that do not convey relevant semantic information. On the Phoenix14T dataset, the contrastive loss alone yields a 16% improvement in Probability of Correct Keypoint over the PT baseline. When combined with quaternion-based pose encoding, the model achieves a 6% reduction in Mean Bone Angle Error. These results point to the benefit of incorporating skeletal structure modeling and semantically guided contrastive objectives on sign pose representations into the training of Transformer-based SLP models.
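四元数空间中的测地线损失通常取两个单位四元数的夹角 2*arccos(|<q1, q2>|)(对 q 与 -q 的双覆盖取绝对值)。下面是按此常见定义的最小实现,用于理解摘要所述做法(非论文官方代码)。

```python
import torch

def geodesic_loss(q_pred, q_true, eps=1e-7):
    """q_*: (..., 4) 四元数;返回测地角(弧度)的均值。"""
    q_pred = q_pred / q_pred.norm(dim=-1, keepdim=True)   # 归一化为单位四元数
    q_true = q_true / q_true.norm(dim=-1, keepdim=True)
    dot = (q_pred * q_true).sum(dim=-1).abs().clamp(-1 + eps, 1 - eps)
    return (2.0 * torch.acos(dot)).mean()

q1 = torch.randn(10, 4)
q2 = q1 + 0.01 * torch.randn(10, 4)   # 轻微扰动的"预测"
print(geodesic_loss(q1, q2).item())   # 应接近 0
```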
优化|敛散性(5篇)
【1】Compute-Optimal Scaling for Value-Based Deep RL
标题:基于价值的深度RL的计算最佳缩放
链接:https://arxiv.org/abs/2508.14881
作者:u, Oleh Rybkin, Zhiyuan Zhou, Michal Nauman, Pieter Abbeel, Sergey Levine, Aviral Kumar
摘要:随着模型越来越大、训练成本越来越高,将训练配方不仅扩展到更大的模型和更多的数据、而且以计算最优的方式进行扩展以获得每单位计算的最大性能,变得日益重要。虽然这种扩展在语言建模中已得到充分研究,但强化学习(RL)在这方面受到的关注较少。在本文中,我们研究基于价值的在线深度RL的计算扩展。这些方法为计算分配提供了两个主轴:模型容量和更新-数据(UTD)比率。给定固定的计算预算,我们要问:应如何在这些轴上划分资源,以最大限度地提高样本效率?我们的分析揭示了模型大小、批量大小和UTD之间的微妙相互作用。特别是,我们发现了一种称为TD过拟合的现象:增大批量会迅速损害小型模型的Q函数准确性,但这种效应在大型模型中不存在,从而使大模型能够有效地使用大批量。我们提供了一个理解这种现象的思维模型,并建立了选择批量大小和UTD以优化计算使用的指导原则。我们的研究结果为深度RL中的计算最优扩展提供了一个坚实的起点,呼应了监督学习中的相关研究,但针对TD学习做了调整。
摘要:As models grow larger and training them becomes expensive, it becomes increasingly important to scale training recipes not just to larger models and more data, but to do so in a compute-optimal manner that extracts maximal performance per unit of compute. While such scaling has been well studied for language modeling, reinforcement learning (RL) has received less attention in this regard. In this paper, we investigate compute scaling for online, value-based deep RL. These methods present two primary axes for compute allocation: model capacity and the update-to-data (UTD) ratio. Given a fixed compute budget, we ask: how should resources be partitioned across these axes to maximize sample efficiency? Our analysis reveals a nuanced interplay between model size, batch size, and UTD. In particular, we identify a phenomenon we call TD-overfitting: increasing the batch quickly harms Q-function accuracy for small models, but this effect is absent in large models, enabling effective use of large batch size at scale. We provide a mental model for understanding this phenomenon and build guidelines for choosing batch size and UTD to optimize compute usage. Our findings provide a grounded starting point for compute-optimal scaling in deep RL, mirroring studies in supervised learning but adapted to TD learning.
【2】Optimal Subspace Embeddings: Resolving Nelson-Nguyen Conjecture Up to Sub-Polylogarithmic Factors
标题:最佳子空间嵌入:解决Nelson-Nguyen猜想,直至亚多对数因子
链接:https://arxiv.org/abs/2508.14234
作者: Chenakkod, Michał Dereziński, Xiaoyu Dong
摘要:我们证明了Nelson和Nguyen [FOCS 2013]关于不经意子空间嵌入的最优维度和稀疏性的猜想,至多相差次多对数因子:对于任意$n\geq d$和$\epsilon\geq d^{-O(1)}$,存在一个随机的$\tilde O(d/\epsilon^2)\times n$矩阵$\Pi$,每列只有$\tilde O(\log(d)/\epsilon)$个非零元,使得对于任何$A\in\mathbb{R}^{n\times d}$,以高概率,对所有$x\in\mathbb{R}^d$都有$(1-\epsilon)\|Ax\|\leq\|\Pi Ax\|\leq(1+\epsilon)\|Ax\|$,其中$\tilde O(\cdot)$只隐藏$d$中的次多对数因子。我们的结果尤其意味着,对一大类$n\times d$线性回归任务,可以在低于当前矩阵乘法时间的最快时间内将问题规模归约到$\tilde O(d/\epsilon^2)$。在我们的分析中,一个关键的新颖之处是我们称为迭代解耦(iterative decoupling)的矩阵集中技术,我们用它来微调通过现有随机矩阵普适性工具[Brailovskaya and van Handel, GAFA 2024]可达到的高阶迹矩界。
摘要:We give a proof of the conjecture of Nelson and Nguyen [FOCS 2013] on the optimal dimension and sparsity of oblivious subspace embeddings, up to sub-polylogarithmic factors: For any $n\geq d$ and $\epsilon\geq d^{-O(1)}$, there is a random $\tilde O(d/\epsilon^2)\times n$ matrix $\Pi$ with $\tilde O(\log(d)/\epsilon)$ non-zeros per column such that for any $A\in\mathbb{R}^{n\times d}$, with high probability, $(1-\epsilon)\|Ax\|\leq\|\Pi Ax\|\leq(1+\epsilon)\|Ax\|$ for all $x\in\mathbb{R}^d$, where $\tilde O(\cdot)$ hides only sub-polylogarithmic factors in $d$. Our result in particular implies a new fastest sub-current matrix multiplication time reduction of size $\tilde O(d/\epsilon^2)$ for a broad class of $n\times d$ linear regression tasks. A key novelty in our analysis is a matrix concentration technique we call iterative decoupling, which we use to fine-tune the higher-order trace moment bounds attainable via existing random matrix universality tools [Brailovskaya and van Handel, GAFA 2024].
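为直观理解"每列少量非零的稀疏子空间嵌入",下面构造一个OSNAP风格的随机矩阵(每列s个非零、取值±1/√s),并在随机数据上经验性地检查范数畸变;维度与s的取值仅为演示,并非定理中的最优参数。

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, s = 2000, 10, 300, 4        # m ~ O(d/eps^2)、每列 s 个非零(演示取值)

Pi = np.zeros((m, n))
for j in range(n):                   # 每列随机选 s 行,放置 ±1/sqrt(s)
    rows = rng.choice(m, size=s, replace=False)
    Pi[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)

A = rng.normal(size=(n, d))
x = rng.normal(size=d)
ratio = np.linalg.norm(Pi @ (A @ x)) / np.linalg.norm(A @ x)
print(f"范数畸变 ||Pi A x|| / ||A x|| = {ratio:.3f}")   # 应接近 1
```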
【3】A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy
标题:稳健概括指南:架构、预训练和优化策略的影响
链接:https://arxiv.org/abs/2508.14079
作者:uillet, Rishika Bhagwatkar, Jonas Ngnawé, Yann Pequignot, Alexandre Larouche, Christian Gagné, Irina Rish, Ola Ahmad, Audrey Durand
摘要:在图像域中运行的深度学习模型容易受到微小输入扰动的影响。多年来,人们通过使用专门的损失目标从头(即随机初始化)训练模型来追求对此类扰动的鲁棒性。最近,鲁棒微调成为一种更高效的替代方案:不再从头训练,而是调整预训练模型以最大化预测性能和鲁棒性。为了进行鲁棒微调,从业者需要设计一种优化策略,包括模型更新协议(例如,全量或部分更新)和专门的损失目标。其他设计选择还包括架构类型与规模以及预训练表示。这些设计选择会影响鲁棒泛化,即模型在测试时面对新的、未见过的扰动时保持性能的能力。理解这些设计选择如何影响泛化仍然是一个悬而未决且具有重要实践意义的问题。为此,我们开展了一项实证研究,涵盖6个数据集、40个预训练架构、2种专门损失和3种适配协议,产生了1,440种训练配置以及在五种扰动类型下的7,200次鲁棒性测量。据我们所知,这是迄今为止最多样、最全面的鲁棒微调基准。虽然基于注意力的架构和鲁棒的预训练表示越来越受欢迎,但我们发现,在大型数据集上以监督方式预训练的卷积神经网络往往表现最好。我们的分析既证实也挑战了先前的设计假设,指出了有前景的研究方向,并提供了实用指导。
摘要:Deep learning models operating in the image domain are vulnerable to small input perturbations. For years, robustness to such perturbations was pursued by training models from scratch (i.e., with random initializations) using specialized loss objectives. Recently, robust fine-tuning has emerged as a more efficient alternative: instead of training from scratch, pretrained models are adapted to maximize predictive performance and robustness. To conduct robust fine-tuning, practitioners design an optimization strategy that includes the model update protocol (e.g., full or partial) and the specialized loss objective. Additional design choices include the architecture type and size, and the pretrained representation. These design choices affect robust generalization, which is the model's ability to maintain performance when exposed to new and unseen perturbations at test time. Understanding how these design choices influence generalization remains an open question with significant practical implications. In response, we present an empirical study spanning 6 datasets, 40 pretrained architectures, 2 specialized losses, and 3 adaptation protocols, yielding 1,440 training configurations and 7,200 robustness measurements across five perturbation types. To our knowledge, this is the most diverse and comprehensive benchmark of robust fine-tuning to date. While attention-based architectures and robust pretrained representations are increasingly popular, we find that convolutional neural networks pretrained in a supervised manner on large datasets often perform best. Our analysis both confirms and challenges prior design assumptions, highlighting promising research directions and offering practical guidance.
【4】Multi-Objective Bayesian Optimization with Independent Tanimoto Kernel Gaussian Processes for Diverse Pareto Front Exploration
标题:具有独立Tanimoto核高斯过程的多目标Bayesian优化用于多样化帕累托前沿探索
链接:https://arxiv.org/abs/2508.14072
作者:ng
备注:Masters of Science thesis
摘要:我们提出了GP-MOBO,一种新的多目标贝叶斯优化算法,先进的最先进的分子优化。我们的方法集成了一个快速的最小包精确高斯过程(GP)能够有效地处理稀疏分子指纹的全维,而不需要大量的计算资源。GP-MOBO通过充分利用指纹维度,始终优于GP-BO等传统方法,从而识别出更高质量和有效的SMILES。此外,我们的模型实现了更广泛的探索化学搜索空间,证明了其优越的接近帕累托前在所有测试的情况下。DockSTRING数据集的实证结果表明,GP-MOBO在20次贝叶斯优化迭代中产生了更高的几何平均值,强调了其在以最小计算开销解决复杂多目标优化挑战方面的有效性和效率。
摘要:We present GP-MOBO, a novel multi-objective Bayesian Optimization algorithm that advances the state-of-the-art in molecular optimization. Our approach integrates a fast minimal package for Exact Gaussian Processes (GPs) capable of efficiently handling the full dimensionality of sparse molecular fingerprints without the need for extensive computational resources. GP-MOBO consistently outperforms traditional methods like GP-BO by fully leveraging fingerprint dimensionality, leading to the identification of higher-quality and valid SMILES. Moreover, our model achieves a broader exploration of the chemical search space, as demonstrated by its superior proximity to the Pareto front in all tested scenarios. Empirical results from the DockSTRING dataset reveal that GP-MOBO yields higher geometric mean values across 20 Bayesian optimization iterations, underscoring its effectiveness and efficiency in addressing complex multi-objective optimization challenges with minimal computational overhead.
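对二值分子指纹,Tanimoto核的标准定义是 k(x,y) = <x,y> / (||x||² + ||y||² − <x,y>)。下面给出其最小numpy实现,帮助理解标题中的核函数(指纹为随机生成的演示数据,并非论文实验设置)。

```python
import numpy as np

def tanimoto_kernel(X, Y):
    """X: (n, d), Y: (m, d) 二值指纹;返回 (n, m) 核矩阵。"""
    inner = X @ Y.T
    xx = (X * X).sum(axis=1)[:, None]
    yy = (Y * Y).sum(axis=1)[None, :]
    return inner / (xx + yy - inner)

fps = (np.random.default_rng(0).random((5, 2048)) < 0.05).astype(float)  # 稀疏指纹(演示)
K = tanimoto_kernel(fps, fps)
print(K.shape, np.allclose(np.diag(K), 1.0))   # 自相似度为 1
```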
【5】Evaluation and Optimization of Leave-one-out Cross-validation for the Lasso
标题:Lasso留一交叉验证的评估与优化
链接:https://arxiv.org/abs/2508.14368
作者:
备注:18 pages, 3 figures, 7 tables
摘要:我开发了一种算法,用于生成将lasso的留一交叉验证表示为其超参数函数的分段二次函数。该算法可用于找到使留一交叉验证全局或局部最优的精确超参数,其实用性已在真实数据集上得到验证。
摘要:I develop an algorithm to produce the piecewise quadratic that computes leave-one-out cross-validation for the lasso as a function of its hyperparameter. The algorithm can be used to find exact hyperparameters that optimize leave-one-out cross-validation either globally or locally, and its practicality is demonstrated on real-world data sets.
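论文给出的是把留一交叉验证表示为超参数的分段二次函数的精确算法;作为对照,下面用sklearn写出朴素的逐点留一交叉验证,仅演示所优化的目标本身(数据与alpha取值为演示设定,并非论文的分段二次算法)。

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
y = X @ np.array([2.0, -1.0, 0, 0, 0.5, 0, 0, 0]) + rng.normal(0, 0.1, 40)

def loocv_error(alpha):
    """给定超参数 alpha 的留一交叉验证均方误差(逐点重新拟合的朴素做法)。"""
    errs = []
    for tr, te in LeaveOneOut().split(X):
        model = Lasso(alpha=alpha).fit(X[tr], y[tr])
        errs.append((y[te][0] - model.predict(X[te])[0]) ** 2)
    return float(np.mean(errs))

for alpha in [0.01, 0.1, 1.0]:
    print(alpha, round(loocv_error(alpha), 4))
```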
预测|估计(8篇)
【1】Successive Halving with Learning Curve Prediction via Latent Kronecker Gaussian Processes
标题:通过潜在克罗内克高斯过程进行学习曲线预测的连续减半
链接:https://arxiv.org/abs/2508.14818
作者:reas Lin, Nicolas Mayoraz, Steffen Rendle, Dima Kuzmin, Emil Praun, Berivan Isik
备注:AutoML 2025 Non-Archival Track
摘要:逐次减半是一种流行的超参数优化算法,它将指数级更多的资源分配给有希望的候选者。然而,该算法通常依赖于中间性能值来做出资源分配决策,这可能导致它过早地修剪最终将成为最佳候选者的缓慢启动器。我们研究了基于潜在克罗内克高斯过程的学习曲线预测指导连续减半是否可以克服这一限制。在一项涉及不同神经网络架构和点击预测数据集的大规模实证研究中,我们将这种预测方法与基于当前性能值的标准方法进行了比较。我们的实验表明,虽然预测方法实现了有竞争力的性能,但与向标准方法投入更多资源相比,它不是帕累托最优的,因为它需要完全观察到的学习曲线作为训练数据。但是,可以通过利用现有的学习曲线数据来缓解这种不利因素。
摘要:Successive Halving is a popular algorithm for hyperparameter optimization which allocates exponentially more resources to promising candidates. However, the algorithm typically relies on intermediate performance values to make resource allocation decisions, which can cause it to prematurely prune slow starters that would eventually become the best candidate. We investigate whether guiding Successive Halving with learning curve predictions based on Latent Kronecker Gaussian Processes can overcome this limitation. In a large-scale empirical study involving different neural network architectures and a click prediction dataset, we compare this predictive approach to the standard approach based on current performance values. Our experiments show that, although the predictive approach achieves competitive performance, it is not Pareto optimal compared to investing more resources into the standard approach, because it requires fully observed learning curves as training data. However, this downside could be mitigated by leveraging existing learning curve data.
【2】Enhancing Contrastive Link Prediction With Edge Balancing Augmentation
标题:通过边缘平衡增强增强对比链接预测
链接:https://arxiv.org/abs/2508.14808
作者:Chang, Hui-Ju Hung, Chia-Hsun Lu, Chih-Ya Shen
备注:Accepted by CIKM 2025
摘要:链接预测是图挖掘中最基础的任务之一,这促使近来的研究利用对比学习来提升其性能。然而,我们观察到这些研究存在两个主要缺点:i)缺乏针对链接预测上对比学习的理论分析;ii)对比学习中对节点度的考虑不足。为了解决上述问题,我们首次给出了链接预测上对比学习的形式化理论分析,且我们的分析结果可以推广到带对比学习的基于自编码器的链接预测模型。基于分析结果,我们提出了一种新的图增强方法——边平衡增强(EBA),它通过调整图中的节点度来实现增强。然后,我们提出了一种名为"带边平衡增强的对比链接预测"(CoEBA)的新方法,它集成了所提出的EBA和新的对比损失,以提升模型性能。我们在8个基准数据集上进行了实验。结果表明,我们提出的CoEBA显著优于其他最先进的链接预测模型。
摘要:Link prediction is one of the most fundamental tasks in graph mining, which motivates the recent studies of leveraging contrastive learning to enhance the performance. However, we observe two major weaknesses of these studies: i) the lack of theoretical analysis for contrastive learning on link prediction, and ii) inadequate consideration of node degrees in contrastive learning. To address the above weaknesses, we provide the first formal theoretical analysis for contrastive learning on link prediction, where our analysis results can generalize to the autoencoder-based link prediction models with contrastive learning. Motivated by our analysis results, we propose a new graph augmentation approach, Edge Balancing Augmentation (EBA), which adjusts the node degrees in the graph as the augmentation. We then propose a new approach, named Contrastive Link Prediction with Edge Balancing Augmentation (CoEBA), that integrates the proposed EBA and the proposed new contrastive losses to improve the model performance. We conduct experiments on 8 benchmark datasets. The results demonstrate that our proposed CoEBA significantly outperforms the other state-of-the-art link prediction models.
【3】A Comprehensive Evaluation of the Sensitivity of Density-Ratio Estimation Based Fairness Measurement in Regression
标题:回归中基于密度比估计的公平性衡量敏感性的综合评价
链接:https://arxiv.org/abs/2508.14576
作者:b Almajed, Maryam Tabar, Peyman Najafirad
摘要:机器学习(ML)驱动的方法中算法偏差的普遍存在,激发了越来越多关于度量和缓解ML领域偏差的研究。相应地,先前的研究探讨了如何度量回归中的公平性,这是一个复杂的问题。特别地,近期研究提出将其表述为密度比估计问题,并依赖由Logistic回归驱动的基于概率分类器的方法来求解。然而,估计密度比还有其他多种方法,而据我们所知,先前的工作并未研究此类公平性度量方法对底层密度比估计算法选择的敏感性。为填补这一空白,本文开发了一组采用不同密度比估计内核的公平性度量方法,并深入研究不同内核会如何影响所测得的公平性水平。我们的实验结果表明,密度比估计内核的选择会显著影响公平性度量方法的结果,甚至会在各算法的相对公平性上产生不一致的结论。这些观察揭示了回归中基于密度比估计的公平性度量存在重大问题,需要进一步研究以提升其可靠性。
摘要:The prevalence of algorithmic bias in Machine Learning (ML)-driven approaches has inspired growing research on measuring and mitigating bias in the ML domain. Accordingly, prior research studied how to measure fairness in regression which is a complex problem. In particular, recent research proposed to formulate it as a density-ratio estimation problem and relied on a Logistic Regression-driven probabilistic classifier-based approach to solve it. However, there are several other methods to estimate a density ratio, and to the best of our knowledge, prior work did not study the sensitivity of such fairness measurement methods to the choice of underlying density ratio estimation algorithm. To fill this gap, this paper develops a set of fairness measurement methods with various density-ratio estimation cores and thoroughly investigates how different cores would affect the achieved level of fairness. Our experimental results show that the choice of density-ratio estimation core could significantly affect the outcome of fairness measurement method, and even, generate inconsistent results with respect to the relative fairness of various algorithms. These observations suggest major issues with density-ratio estimation based fairness measurement in regression and a need for further research to enhance their reliability.
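摘要所述"基于Logistic回归的概率分类器"估计密度比的标准做法是:把来自p和q的样本分别标记为1/0训练分类器,再用 π(x)/(1−π(x)) 乘以样本量比得到 p(x)/q(x) 的估计。下面是该思路的最小示意(数据为合成的一维高斯,并非论文实验设置)。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
xp = rng.normal(0.0, 1.0, size=(1000, 1))   # 来自 p 的样本
xq = rng.normal(0.5, 1.2, size=(1500, 1))   # 来自 q 的样本

X = np.vstack([xp, xq])
y = np.concatenate([np.ones(len(xp)), np.zeros(len(xq))])
clf = LogisticRegression().fit(X, y)

def density_ratio(x):
    """r(x) = p(x)/q(x) ≈ (π(x)/(1-π(x))) * (n_q/n_p)"""
    pi = clf.predict_proba(x)[:, 1]
    return pi / (1 - pi) * (len(xq) / len(xp))

print(density_ratio(np.array([[0.0], [2.0]])))
```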
【4】Great GATsBi: Hybrid, Multimodal, Trajectory Forecasting for Bicycles using Anticipation Mechanism
标题:Great GATsbi:使用预期机制对自行车进行混合、多模式、轨迹预测
链接:https://arxiv.org/abs/2508.14523
作者:hl, Shaimaa K. El-Baklish, Anastasios Kouvelas, Michail A. Makridis
摘要:从高级驾驶辅助系统到自动驾驶,越来越多的应用需要准确预测道路使用者的运动,这对道路安全尤为关键。尽管自行车在交通事故死亡中占很大比例,但由于以往工作主要集中在行人和机动车上,自行车很少受到关注。在这项工作中,我们提出Great GATsBi,一个基于领域知识的、混合的、多模态自行车轨迹预测框架。该模型结合了基于物理的建模(受机动车启发)和基于社交的建模(受行人运动启发),以显式地刻画自行车运动的双重性质。社交交互用图注意力网络建模,并借鉴心理学和社会学研究的最新见解,既纳入自行车邻域按时间衰减的历史轨迹数据,也纳入预期的未来轨迹数据。结果表明,所提出的物理模型(在短期预测中表现良好)与社交模型(在长期预测中表现良好)的集成超过了最先进的性能。我们还进行了一项受控的大规模骑行实验,以展示该框架在预测自行车轨迹以及建模与道路使用者的社交交互方面的性能。
摘要:Accurate prediction of road user movement is increasingly required by many applications ranging from advanced driver assistance systems to autonomous driving, and especially crucial for road safety. Even though most traffic accident fatalities account to bicycles, they have received little attention, as previous work focused mainly on pedestrians and motorized vehicles. In this work, we present the Great GATsBi, a domain-knowledge-based, hybrid, multimodal trajectory prediction framework for bicycles. The model incorporates both physics-based modeling (inspired by motorized vehicles) and social-based modeling (inspired by pedestrian movements) to explicitly account for the dual nature of bicycle movement. The social interactions are modeled with a graph attention network, and include decayed historical, but also anticipated, future trajectory data of a bicycles neighborhood, following recent insights from psychological and social studies. The results indicate that the proposed ensemble of physics models -- performing well in the short-term predictions -- and social models -- performing well in the long-term predictions -- exceeds state-of-the-art performance. We also conducted a controlled mass-cycling experiment to demonstrate the framework's performance when forecasting bicycle trajectories and modeling social interactions with road users.
【5】NeRC: Neural Ranging Correction through Differentiable Moving Horizon Location Estimation
标题:NeRC:通过可区分移动地平线位置估计进行神经距离修正
链接:https://arxiv.org/abs/2508.14336
作者:K.V. Ling, Haochen Liu, Bingheng Wang, Kun Cao
摘要:在城市环境中,使用日常移动设备进行GNSS定位颇具挑战:卫星信号的复杂传播和低质量的板载GNSS硬件所造成的测距误差被认为是破坏定位精度的元凶。研究人员将希望寄托在数据驱动的方法上,以从原始测量中回归这种测距误差。然而,繁重的测距误差标注拖慢了研究的步伐。本文提出了一个鲁棒的端到端神经测距校正(NeRC)框架,其中与定位相关的指标充当训练神经模块的任务目标。我们不去寻求不切实际的测距误差标签,而是使用相对容易获得的真值位置来训练神经网络。这一功能由可微分滚动时域位置估计(MHE)支持,它处理一个时间窗(horizon)内的测量以进行定位,并反向传播用于训练的梯度。更进一步,得益于端到端学习,我们提出了一种使用欧氏距离场(EDF)代价图的新训练范式,减轻了对有标签位置的需求。我们在公开基准和自采数据集上评估了所提出的NeRC,证明其在定位精度上的显著提升。我们还将NeRC部署在边缘设备上,以验证其在移动设备上的实时性能。
摘要:GNSS localization using everyday mobile devices is challenging in urban environments, as ranging errors caused by the complex propagation of satellite signals and low-quality onboard GNSS hardware are blamed for undermining positioning accuracy. Researchers have pinned their hopes on data-driven methods to regress such ranging errors from raw measurements. However, the grueling annotation of ranging errors impedes their pace. This paper presents a robust end-to-end Neural Ranging Correction (NeRC) framework, where localization-related metrics serve as the task objective for training the neural modules. Instead of seeking impractical ranging error labels, we train the neural network using ground-truth locations that are relatively easy to obtain. This functionality is supported by differentiable moving horizon location estimation (MHE) that handles a horizon of measurements for positioning and backpropagates the gradients for training. Even better, as a blessing of end-to-end learning, we propose a new training paradigm using Euclidean Distance Field (EDF) cost maps, which alleviates the demands on labeled locations. We evaluate the proposed NeRC on public benchmarks and our collected datasets, demonstrating its distinguished improvement in positioning accuracy. We also deploy NeRC on the edge to verify its real-time performance for mobile devices.
【6】A Cost-Effective Framework for Predicting Parking Availability Using Geospatial Data and Machine Learning
标题:使用地理空间数据和机器学习预测停车可用性的经济高效框架
链接:https://arxiv.org/abs/2508.14125
作者:gosher, Tala Mustafa, Mohammad Alsmirat, Amal Al-Ali, Isam Mashhour Al Jawarneh
摘要:随着城市人口的持续增长,城市在管理停车和确定占用率方面面临着许多挑战。这个问题在大学校园里尤为突出,学生需要在上课时间内快速方便地找到空闲的停车位。校园停车位的有限性强调了实施有效分配空置停车位的有效系统的必要性。我们提出了一个智能框架,它集成了多个数据源,包括街道地图,移动性和气象数据,通过空间连接操作来捕获连续3天的停车行为和车辆移动模式,每小时持续时间为7AM至3PM。该系统将不需要在街道或停车场安装任何传感工具来提供服务,因为所需的所有数据都将使用定位服务来收集。该框架将使用预期的停车场入口和时间来指定合适的停车场。评估了几种预测模型,即线性回归,支持向量回归(SVR),随机森林回归(RFR)和长短期记忆(LSTM)。使用网格搜索进行超参数调整,并使用均方根误差(RMSE)、平均绝对误差(MAE)和决定系数(R2)评估模型性能。随机森林回归实现了0.142的最低RMSE和0.582的最高R2。然而,考虑到任务的时间序列性质,LSTM模型可能会在额外的数据和更长的时间步长下表现得更好。
摘要:As urban populations continue to grow, cities face numerous challenges in managing parking and determining occupancy. This issue is particularly pronounced in university campuses, where students need to find vacant parking spots quickly and conveniently during class timings. The limited availability of parking spaces on campuses underscores the necessity of implementing efficient systems to allocate vacant parking spots effectively. We propose a smart framework that integrates multiple data sources, including street maps, mobility, and meteorological data, through a spatial join operation to capture parking behavior and vehicle movement patterns over the span of 3 consecutive days with an hourly duration between 7AM till 3PM. The system will not require any sensing tools to be installed in the street or in the parking area to provide its services since all the data needed will be collected using location services. The framework will use the expected parking entrance and time to specify a suitable parking area. Several forecasting models, namely, Linear Regression, Support Vector Regression (SVR), Random Forest Regression (RFR), and Long Short-Term Memory (LSTM), are evaluated. Hyperparameter tuning was employed using grid search, and model performance is assessed using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and Coefficient of Determination (R2). Random Forest Regression achieved the lowest RMSE of 0.142 and highest R2 of 0.582. However, given the time-series nature of the task, an LSTM model may perform better with additional data and longer timesteps.
【7】Out-of-Sample Hydrocarbon Production Forecasting: Time Series Machine Learning using Productivity Index-Driven Features and Inductive Conformal Prediction
标题:样本外碳氢化合物产量预测:使用生产率指数驱动特征和归纳共形预测的时间序列机器学习
链接:https://arxiv.org/abs/2508.14078
作者:assan Abdalla Idris, Jakub Marek Cebula, Jebraeel Gholinezhad, Shamsul Masum, Hongjie Ma
摘要:本研究介绍了一种新的ML框架,旨在增强样本外油气产量预测的鲁棒性,特别是针对多变量时间序列分析。所提出的方法集成了生产力指数(PI)驱动的功能选择,一个概念来自油藏工程,与归纳共形预测(ICP)严格的不确定性量化。利用来自Volve(PF14、PF12井)和Norne(E1H井)油田的历史数据,本研究调查了各种预测算法(即长短期记忆(LSTM)、双向LSTM(BiLSTM)、门控递归单元(GRU)和极端梯度提升(XGBoost))在预测历史产油率(OPR_H)方面的有效性。所有模型都实现了对即将到来的未来时间段的“样本外”生产预测。使用传统的误差指标(例如,MAE),辅以预测偏差和预测方向准确度(PDA),以评估偏差和趋势捕捉能力。与传统的数值模拟工作流程相比,基于PI的特征选择有效地降低了输入维度。使用ICP框架解决了不确定性量化问题,ICP框架是一种无分布方法,可保证有效的预测区间(例如,95%的覆盖率),而不依赖于分布假设,提供了一个明显的优势,比传统的置信区间,特别是对于复杂的,非正态数据。结果证明了LSTM模型的卓越性能,实现了PF14井的最低MAE测试(19.468)和真正的样本外预测数据(29.638),随后在Norne E1H井进行了验证。这些发现突出了将特定领域的知识与先进的ML技术相结合以提高碳氢化合物产量预测可靠性的巨大潜力。
摘要:This research introduces a new ML framework designed to enhance the robustness of out-of-sample hydrocarbon production forecasting, specifically addressing multivariate time series analysis. The proposed methodology integrates Productivity Index (PI)-driven feature selection, a concept derived from reservoir engineering, with Inductive Conformal Prediction (ICP) for rigorous uncertainty quantification. Utilizing historical data from the Volve (wells PF14, PF12) and Norne (well E1H) oil fields, this study investigates the efficacy of various predictive algorithms-namely Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and eXtreme Gradient Boosting (XGBoost) - in forecasting historical oil production rates (OPR_H). All the models achieved "out-of-sample" production forecasts for an upcoming future timeframe. Model performance was comprehensively evaluated using traditional error metrics (e.g., MAE) supplemented by Forecast Bias and Prediction Direction Accuracy (PDA) to assess bias and trend-capturing capabilities. The PI-based feature selection effectively reduced input dimensionality compared to conventional numerical simulation workflows. The uncertainty quantification was addressed using the ICP framework, a distribution-free approach that guarantees valid prediction intervals (e.g., 95% coverage) without reliance on distributional assumptions, offering a distinct advantage over traditional confidence intervals, particularly for complex, non-normal data. Results demonstrated the superior performance of the LSTM model, achieving the lowest MAE on test (19.468) and genuine out-of-sample forecast data (29.638) for well PF14, with subsequent validation on Norne well E1H. These findings highlight the significant potential of combining domain-specific knowledge with advanced ML techniques to improve the reliability of hydrocarbon production forecasts.
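归纳共形预测(ICP)的核心流程是:在真训练集上拟合模型,在校准集上计算不一致性分数(如绝对残差),取其(1−α)共形分位数作为预测区间半宽,从而在无分布假设下获得有效覆盖。下面是该流程的最小示意(模型与数据均为演示假设,并非论文中的LSTM工作流)。

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.3, size=300)

X_tr, y_tr = X[:150], y[:150]            # 真训练集
X_cal, y_cal = X[150:250], y[150:250]    # 校准集
X_te = X[250:]                            # 测试集

model = LinearRegression().fit(X_tr, y_tr)
scores = np.abs(y_cal - model.predict(X_cal))          # 不一致性分数
alpha = 0.05
k = int(np.ceil((len(scores) + 1) * (1 - alpha))) - 1  # 共形分位数对应的秩
q = np.sort(scores)[min(k, len(scores) - 1)]

pred = model.predict(X_te)
lower, upper = pred - q, pred + q                      # 95% 预测区间
print(f"区间半宽 q = {q:.3f}")
```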
【8】Load Forecasting on A Highly Sparse Electrical Load Dataset Using Gaussian Interpolation
标题:利用高斯插值在高度稀疏电力负荷数据集中进行负荷预测
链接:https://arxiv.org/abs/2508.14069
作者:iswas, Nafis Faisal, Vivek Chowdhury, Abrar Al-Shadid Abir, Sabir Mahmud, Mithon Rahman, Shaikh Anowarul Fattah, Hafiz Imtiaz
备注:Under review in Elsevier Electric Power Systems Research
摘要:稀疏性,即数据集中存在缺失值或零值,在处理真实数据集时通常会带来重大挑战。训练数据集中特征或目标数据的稀疏性可以用各种插值方法处理,例如线性或多项式插值、样条、移动平均,或者直接填补。插值方法通常对严格意义平稳(SSS)数据表现良好。在这项研究中,我们表明,在假设数据为广义平稳(WSS)的前提下,若辅以高斯插值,一个稀疏度约为62%的发电厂小时负荷数据集仍可用于负荷预测。更具体地说,我们对数据进行了统计分析,并在该数据集上训练了多个机器学习和深度学习模型。通过比较这些模型的性能,我们以实证方式证明高斯插值是处理负荷预测问题的合适选择。此外,我们还证明了基于长短期记忆(LSTM)的神经网络模型在多种经典模型和神经网络模型中性能最佳。
摘要:Sparsity, defined as the presence of missing or zero values in a dataset, often poses a major challenge while operating on real-life datasets. Sparsity in features or target data of the training dataset can be handled using various interpolation methods, such as linear or polynomial interpolation, spline, moving average, or can be simply imputed. Interpolation methods usually perform well with Strict Sense Stationary (SSS) data. In this study, we show that an approximately 62% sparse dataset with hourly load data of a power plant can be utilized for load forecasting, assuming the data is Wide Sense Stationary (WSS), if augmented with Gaussian interpolation. More specifically, we perform statistical analysis on the data, and train multiple machine learning and deep learning models on the dataset. By comparing the performance of these models, we empirically demonstrate that Gaussian interpolation is a suitable option for dealing with load forecasting problems. Additionally, we demonstrate that a Long Short-Term Memory (LSTM)-based neural network model offers the best performance among a diverse set of classical and neural network-based models.
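论文摘要未给出"高斯插值"的实现细节;下面的草图按"基于高斯核的加权插值"这一种可能的理解,演示对小时负荷序列缺失值的填补,变量与数据均为占位假设。

```python
# 最小示意:用高斯核加权平均填补小时负荷序列中的缺失值(NaN)
import numpy as np

def gaussian_interpolate(t, y, sigma=2.0):
    y = y.astype(float)
    miss = np.isnan(y)
    for i in np.where(miss)[0]:
        w = np.exp(-0.5 * ((t[~miss] - t[i]) / sigma) ** 2)  # 时间越近权重越大
        y[i] = np.sum(w * y[~miss]) / np.sum(w)
    return y

t = np.arange(24.0)                      # 小时索引
y = np.sin(t / 24 * 2 * np.pi) + 1.0     # 合成的负荷曲线
y[[3, 4, 10, 17]] = np.nan               # 人为设置的缺失点
print(gaussian_interpolate(t, y))
```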
其他神经网络|深度学习|模型|建模(13篇)
【1】Squeezed Diffusion Models
标题:挤压扩散模型
链接:https://arxiv.org/abs/2508.14871
作者: Singh, Samar Khanna, James Burgess
备注:7 pages, 3 figures
摘要:扩散模型通常注入各向同性高斯噪声,而不考虑数据中的结构。受量子压缩态根据海森堡不确定性原理重新分配不确定性的方式的启发,我们引入了压缩扩散模型(SDM),该模型沿着训练分布的主成分各向异性地缩放噪声。由于压缩在物理学中能增强信噪比,我们假设以数据依赖的方式缩放噪声可以更好地帮助扩散模型学习重要的数据特征。我们研究两种配置:(i)海森堡扩散模型,其利用正交方向上的逆缩放来补偿主轴上的缩放,以及(ii)仅缩放主轴的标准SDM变体。与直觉相反的是,在CIFAR-10/100和CelebA-64上,轻微的反挤压(即增加主轴上的方差)始终可将FID改善多达15%,并将精确率-召回率边界向更高召回率方向移动。我们的研究结果表明,简单的、数据感知的噪声整形可以在不改变架构的情况下带来稳健的生成增益。
摘要:Diffusion models typically inject isotropic Gaussian noise, disregarding structure in the data. Motivated by the way quantum squeezed states redistribute uncertainty according to the Heisenberg uncertainty principle, we introduce Squeezed Diffusion Models (SDM), which scale noise anisotropically along the principal component of the training distribution. As squeezing enhances the signal-to-noise ratio in physics, we hypothesize that scaling noise in a data-dependent manner can better assist diffusion models in learning important data features. We study two configurations: (i) a Heisenberg diffusion model that compensates the scaling on the principal axis with inverse scaling on orthogonal directions and (ii) a standard SDM variant that scales only the principal axis. Counterintuitively, on CIFAR-10/100 and CelebA-64, mild antisqueezing, i.e., increasing variance on the principal axis, consistently improves FID by up to 15% and shifts the precision-recall frontier toward higher recall. Our results demonstrate that simple, data-aware noise shaping can deliver robust generative gains without architectural changes.
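下面的草图展示"沿主成分各向异性缩放噪声"这一核心操作的一种最小实现:先用SVD求训练数据的第一主方向,再仅对各向同性噪声在该方向上的分量做缩放(s>1对应摘要中效果更好的"反挤压")。这只是按摘要思路给出的示意,并非官方代码。

```python
# 最小示意:先求训练数据第一主方向,再仅缩放噪声在该方向上的分量
import numpy as np

def squeezed_noise(X_flat, n_samples, s=1.15, seed=0):
    """s < 1 为挤压,s > 1 为摘要中效果更好的反挤压。"""
    rng = np.random.default_rng(seed)
    Xc = X_flat - X_flat.mean(axis=0)
    v = np.linalg.svd(Xc, full_matrices=False)[2][0]    # 第一主方向(单位向量)
    eps = rng.normal(size=(n_samples, X_flat.shape[1])) # 各向同性高斯噪声
    proj = eps @ v                                      # 噪声在主轴上的分量
    return eps + (s - 1.0) * proj[:, None] * v[None, :]

X = np.random.default_rng(1).normal(size=(256, 32))     # 占位的展平训练样本
noise = squeezed_noise(X, n_samples=8, s=1.15)
print(noise.shape)
```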
【2】Measuring IIA Violations in Similarity Choices with Bayesian Models
标题:基于贝叶斯模型度量相似性选择中的IIA违反
链接:https://arxiv.org/abs/2508.14615
作者:s Corrêa, Suryanarayana Sankagiri, Daniel Ratton Figueiredo, Matthias Grossglauser
备注:26 pages and 34 figures, for associated code and data, see this https URL, poster session in UAI 2025
摘要:相似性选择数据产生于人类根据备选项与目标的相似性在备选项之间做出选择时,例如在信息检索和嵌入学习场景中。经典的基于度量的相似性选择模型假设无关备选项独立性(IIA),该性质可以简化模型的表述。虽然在许多离散选择场景中已经发现了IIA违反,但相似性选择场景却很少受到关注,这是因为选择对目标的依赖性使IIA检验变得复杂。我们提出了两种检验IIA的统计方法:一种经典的拟合优度检验,以及基于后验预测检查(PPC)框架的贝叶斯检验。后者是我们的主要技术贡献,它不仅检验IIA违反的显著性,还量化其程度。我们收集了两个数据集:一个的选择集被有意设计用于引发IIA违反,另一个的选择集则从相同的条目全集中随机生成。我们的检验在两个数据集上都证实了显著的IIA违反,并且值得注意的是,两者的违反程度相当。此外,我们设计了一个新的检验总体同质性的PPC检验。结果表明总体确实是同质的,这说明IIA违反是由上下文效应(特别是选择集内部的相互作用)驱动的。这些结果强调需要能够考虑此类上下文效应的新相似性选择模型。
摘要:Similarity choice data occur when humans make choices among alternatives based on their similarity to a target, e.g., in the context of information retrieval and in embedding learning settings. Classical metric-based models of similarity choice assume independence of irrelevant alternatives (IIA), a property that allows for a simpler formulation. While IIA violations have been detected in many discrete choice settings, the similarity choice setting has received scant attention. This is because the target-dependent nature of the choice complicates IIA testing. We propose two statistical methods to test for IIA: a classical goodness-of-fit test and a Bayesian counterpart based on the framework of Posterior Predictive Checks (PPC). This Bayesian approach, our main technical contribution, quantifies the degree of IIA violation beyond its mere significance. We curate two datasets: one with choice sets designed to elicit IIA violations, and another with randomly generated choice sets from the same item universe. Our tests confirmed significant IIA violations on both datasets, and notably, we find a comparable degree of violation between them. Further, we devise a new PPC test for population homogeneity. Results show that the population is indeed homogenous, suggesting that the IIA violations are driven by context effects -- specifically, interactions within the choice sets. These results highlight the need for new similarity choice models that account for such context effects.
【3】Beyond ReLU: Chebyshev-DQN for Enhanced Deep Q-Networks
标题:超越ReLU:用于增强型深度Q网络的Chebyshev-DQN
链接:https://arxiv.org/abs/2508.14536
作者:dannik, Morteza Tayefi, Shamim Sanisales
摘要:深度Q网络(DQN)的性能关键取决于其底层神经网络准确逼近动作值函数的能力。标准函数近似器,如多层感知器,可能难以有效地表示许多强化学习问题中固有的复杂值景观。本文介绍了一种新的架构,Chebyshev-DQN(Ch-DQN),它集成了Chebyshev多项式基础到DQN框架,以创建一个更有效的特征表示。通过利用Chebyshev多项式强大的函数逼近特性,我们假设Ch-DQN可以更有效地学习并实现更高的性能。我们在CartPole-v1基准上评估了我们提出的模型,并将其与具有可比数量参数的标准DQN进行了比较。我们的研究结果表明,具有中等多项式次数(N=4)的Ch-DQN实现了显着更好的渐近性能,超过基线约39%。然而,我们也发现多项式次数的选择是一个关键的超参数,因为高次数(N=8)可能对学习不利。这项工作验证了在深度强化学习中使用正交多项式基的潜力,同时也强调了模型复杂性所涉及的权衡。
摘要:The performance of Deep Q-Networks (DQN) is critically dependent on the ability of its underlying neural network to accurately approximate the action-value function. Standard function approximators, such as multi-layer perceptrons, may struggle to efficiently represent the complex value landscapes inherent in many reinforcement learning problems. This paper introduces a novel architecture, the Chebyshev-DQN (Ch-DQN), which integrates a Chebyshev polynomial basis into the DQN framework to create a more effective feature representation. By leveraging the powerful function approximation properties of Chebyshev polynomials, we hypothesize that the Ch-DQN can learn more efficiently and achieve higher performance. We evaluate our proposed model on the CartPole-v1 benchmark and compare it against a standard DQN with a comparable number of parameters. Our results demonstrate that the Ch-DQN with a moderate polynomial degree (N=4) achieves significantly better asymptotic performance, outperforming the baseline by approximately 39%. However, we also find that the choice of polynomial degree is a critical hyperparameter, as a high degree (N=8) can be detrimental to learning. This work validates the potential of using orthogonal polynomial bases in deep reinforcement learning while also highlighting the trade-offs involved in model complexity.
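切比雪夫基特征可以用三项递推直接构造。下面是一个最小草图:将(假定已归一化到[-1,1]的)状态映射为各阶切比雪夫多项式特征,再接一个线性Q值头;具体结构为示意性假设,并非论文的官方架构。

```python
# 最小示意:切比雪夫基特征(T0=1, T1=x, Tn=2x*T(n-1)-T(n-2))+ 线性Q值头
import numpy as np

def chebyshev_features(x, degree=4):
    """x: 假定已归一化到[-1,1]的状态向量;返回拼接的各阶特征。"""
    T = [np.ones_like(x), x]
    for _ in range(2, degree + 1):
        T.append(2 * x * T[-1] - T[-2])
    return np.concatenate(T[: degree + 1])

state = np.array([0.1, -0.3, 0.05, 0.2])      # CartPole风格的4维状态(假设已归一化)
phi = chebyshev_features(state, degree=4)      # 形状: (degree+1)*4 = 20
W = np.random.default_rng(0).normal(scale=0.01, size=(phi.size, 2))
q_values = phi @ W                              # 两个动作的Q值(示意性的线性头)
print(q_values)
```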
【4】Disentanglement in T-space for Faster and Distributed Training of Diffusion Models with Fewer Latent-states
标题:T空间中的解纠缠,以更快、更分布式地训练具有更少潜伏状态的扩散模型
链接:https://arxiv.org/abs/2508.14413
作者:upta, Raghudeep Gadde, Rui Chen, Aleix M. Martinez
摘要:我们挑战扩散模型的一个基本假设,即训练需要大量的潜在状态或时间步,以使反向生成过程接近高斯。我们首先表明,通过仔细选择噪声时间表,在少量潜在状态(即$T \sim 32$)上训练的扩散模型可以与在多得多的潜在状态($T \sim 1,000$)上训练的模型性能相当。其次,我们将这一极限(所需的最小潜在状态数量)推至单一潜在状态,我们称之为T空间中的完全解纠缠。我们表明,通过组合多个独立训练的单潜态模型得到的解纠缠模型可以轻松生成高质量样本。我们提供了大量实验表明,在两个不同数据集上按多种指标衡量,所提出的解纠缠模型带来4-6倍更快的收敛速度。
摘要:We challenge a fundamental assumption of diffusion models, namely, that a large number of latent-states or time-steps is required for training so that the reverse generative process is close to a Gaussian. We first show that with careful selection of a noise schedule, diffusion models trained over a small number of latent states (i.e. $T \sim 32$) match the performance of models trained over a much larger number of latent states ($T \sim 1,000$). Second, we push this limit (on the minimum number of latent states required) to a single latent-state, which we refer to as complete disentanglement in T-space. We show that high quality samples can be easily generated by the disentangled model obtained by combining several independently trained single latent-state models. We provide extensive experiments to show that the proposed disentangled model provides 4-6$\times$ faster convergence measured across a variety of metrics on two different datasets.
【5】Online Incident Response Planning under Model Misspecification through Bayesian Learning and Belief Quantization
标题:通过Bayesian学习和信念量化实现模型错误规范下的在线事件响应规划
链接:https://arxiv.org/abs/2508.14385
作者:r, Tao Li
备注:Accepted to ACM CCS AISec2025
摘要:有效应对网络攻击需要快速决策,即使有关攻击的信息不完整或不准确。然而,大多数事件响应的决策支持框架依赖于描述事件的详细系统模型,这限制了它们的实际效用。在本文中,我们解决了这一限制,并提出了一种在模型误指定情况下进行事件响应规划的在线方法,我们称之为MOBAL:误指定在线贝叶斯学习。当新信息可用时,MOBAL通过贝叶斯学习迭代地改进关于模型的推测,这有助于随着事件的展开进行模型适应。为了在线确定有效的响应,我们将推测的模型量化为有限马尔可夫模型,从而通过动态规划实现高效的响应规划。我们证明了贝叶斯学习对于信息反馈是渐近一致的。此外,我们还建立了误指定和量化误差的界限。在CAGE-2基准上的实验表明,MOBAL在对模型误指定的适应性和鲁棒性方面优于现有技术。
摘要:Effective responses to cyberattacks require fast decisions, even when information about the attack is incomplete or inaccurate. However, most decision-support frameworks for incident response rely on a detailed system model that describes the incident, which restricts their practical utility. In this paper, we address this limitation and present an online method for incident response planning under model misspecification, which we call MOBAL: Misspecified Online Bayesian Learning. MOBAL iteratively refines a conjecture about the model through Bayesian learning as new information becomes available, which facilitates model adaptation as the incident unfolds. To determine effective responses online, we quantize the conjectured model into a finite Markov model, which enables efficient response planning through dynamic programming. We prove that Bayesian learning is asymptotically consistent with respect to the information feedback. Additionally, we establish bounds on misspecification and quantization errors. Experiments on the CAGE-2 benchmark show that MOBAL outperforms the state of the art in terms of adaptability and robustness to model misspecification.
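摘要中"将推测模型量化为有限马尔可夫模型后用动态规划求解"的部分,可以用标准的值迭代来说明。下面的草图在一个随机生成的占位有限MDP上运行值迭代并读出贪心策略;它只演示通用的动态规划步骤,并非MOBAL本身。

```python
# 最小示意:在占位的有限马尔可夫模型上做值迭代,读出贪心响应策略
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 3, 0.95
P = rng.dirichlet(np.ones(nS), size=(nA, nS))   # P[a, s] 为转移分布 P(s'|s,a)
R = rng.normal(size=(nS, nA))                   # R[s, a] 为即时奖励

V = np.zeros(nS)
for _ in range(1000):
    Q = R + gamma * np.einsum("asn,n->sa", P, V)  # 对下一状态 n 求期望
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:          # 近似收敛即停止
        V = V_new
        break
    V = V_new
policy = Q.argmax(axis=1)                          # 各状态下的贪心响应动作
print(policy, V)
```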
【6】Action-Constrained Imitation Learning
标题:动作约束模仿学习
链接:https://arxiv.org/abs/2508.14379
作者:Yeh, Tse-Sheng Nan, Risto Vuorio, Wei Hung, Hung-Yen Wu, Shao-Hua Sun, Ping-Chun Hsieh
备注:Published in ICML 2025
摘要:在各种机器人控制和资源分配应用中,动作约束下的策略学习对确保安全行为起着核心作用。在本文中,我们研究了一个称为动作约束模仿学习(ACIL)的新问题设置,其中受动作约束的模仿者旨在向拥有更大动作空间的示范专家学习。ACIL的根本挑战在于动作约束导致专家与模仿者之间不可避免的占用度量不匹配。我们通过轨迹对齐(trajectory alignment)解决这种不匹配,并提出了DTWIL:它用一个代理数据集取代原始专家演示,该数据集遵循相似的状态轨迹,同时满足动作约束。具体来说,我们将轨迹对齐重新表述为一个规划问题,并通过模型预测控制求解,基于动态时间规整(DTW)距离将代理轨迹与专家轨迹对齐。通过大量实验,我们证明从DTWIL生成的数据集中学习可以显著提高多个机器人控制任务的性能,并在样本效率方面优于各种基准模仿学习算法。我们的代码可在https://github.com/NYCU-RL-Bandits-Lab/ACRL-Baselines上公开获取。
摘要:Policy learning under action constraints plays a central role in ensuring safe behaviors in various robot control and resource allocation applications. In this paper, we study a new problem setting termed Action-Constrained Imitation Learning (ACIL), where an action-constrained imitator aims to learn from a demonstrative expert with larger action space. The fundamental challenge of ACIL lies in the unavoidable mismatch of occupancy measure between the expert and the imitator caused by the action constraints. We tackle this mismatch through \textit{trajectory alignment} and propose DTWIL, which replaces the original expert demonstrations with a surrogate dataset that follows similar state trajectories while adhering to the action constraints. Specifically, we recast trajectory alignment as a planning problem and solve it via Model Predictive Control, which aligns the surrogate trajectories with the expert trajectories based on the Dynamic Time Warping (DTW) distance. Through extensive experiments, we demonstrate that learning from the dataset generated by DTWIL significantly enhances performance across multiple robot control tasks and outperforms various benchmark imitation learning algorithms in terms of sample efficiency. Our code is publicly available at https://github.com/NYCU-RL-Bandits-Lab/ACRL-Baselines.
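DTWIL所依赖的动态时间规整(DTW)距离有经典的动态规划定义。下面是一个通用的单变量DTW草图,用于衡量两条时间上有伸缩的轨迹的对齐代价;这只是标准DTW,并非论文的完整对齐流程。

```python
# 最小示意:动态时间规整(DTW)距离的动态规划实现
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

expert = np.sin(np.linspace(0, 3, 50))
imitator = np.sin(np.linspace(0, 3, 40) + 0.1)   # 时间上有所伸缩/偏移的轨迹
print(dtw_distance(expert, imitator))
```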
【7】FedRAIN-Lite: Federated Reinforcement Algorithms for Improving Idealised Numerical Weather and Climate Models
标题:FedRAIN-Lite:改进理想数字天气和气候模型的联邦强化算法
链接:https://arxiv.org/abs/2508.14315
作者:t Nath, Sebastian Schemm, Henry Moss, Peter Haynes, Emily Shuckburgh, Mark Webb
备注:21 pages, 6 figures
摘要:气候模式中的子网格参数化传统上是静态的,离线调整,限制了对不断变化的状态的适应性。这项工作引入了FedRAIN-Lite,这是一个联邦强化学习(FedRL)框架,通过将代理分配给纬度带,反映了大气环流模型(GCM)中使用的空间分解,从而实现了局部参数学习和周期性全局聚合。使用简化的能量平衡气候模型的层次结构,从单代理基线(ebm-v1)到多代理集成(ebm-v2)和GCM类(ebm-v3)设置,我们在不同的FedRL配置下对三种RL算法进行基准测试。结果表明,深度确定性策略梯度(DDPG)始终优于静态和单代理基线,在ebm-v2和ebm-v3设置中,热带和中纬度地区的收敛速度更快,区域加权RMSE更低。DDPG跨超参数传输的能力和低计算成本使其非常适合于地理自适应参数学习。这一能力为高复杂性的GCM提供了一个可扩展的途径,并为物理对齐的在线学习气候模型提供了一个原型,这些模型可以随着气候变化而发展。代码可访问https://github.com/p3jitnath/climate-rl-fedrl。
摘要:Sub-grid parameterisations in climate models are traditionally static and tuned offline, limiting adaptability to evolving states. This work introduces FedRAIN-Lite, a federated reinforcement learning (FedRL) framework that mirrors the spatial decomposition used in general circulation models (GCMs) by assigning agents to latitude bands, enabling local parameter learning with periodic global aggregation. Using a hierarchy of simplified energy-balance climate models, from a single-agent baseline (ebm-v1) to multi-agent ensemble (ebm-v2) and GCM-like (ebm-v3) setups, we benchmark three RL algorithms under different FedRL configurations. Results show that Deep Deterministic Policy Gradient (DDPG) consistently outperforms both static and single-agent baselines, with faster convergence and lower area-weighted RMSE in tropical and mid-latitude zones across both ebm-v2 and ebm-v3 setups. DDPG's ability to transfer across hyperparameters and low computational cost make it well-suited for geographically adaptive parameter learning. This capability offers a scalable pathway towards high-complexity GCMs and provides a prototype for physically aligned, online-learning climate models that can evolve with a changing climate. Code accessible at https://github.com/p3jitnath/climate-rl-fedrl.
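"按纬度带分配智能体、本地更新加周期性全局聚合"的循环骨架可以示意如下。草图以FedAvg式的简单参数平均代替论文的具体聚合规则,本地RL更新也以占位函数表示,均为假设性实现。

```python
# 最小示意:每个纬度带一个智能体,本地更新若干轮后做一次全局平均聚合
import numpy as np

def local_update(params, rng):
    return params - 0.01 * rng.normal(size=params.shape)  # 占位:本地RL更新

rng = np.random.default_rng(0)
n_bands, agg_every = 6, 5
params = [np.zeros(8) for _ in range(n_bands)]
for rnd in range(20):
    params = [local_update(p, rng) for p in params]        # 各带并行的本地学习
    if (rnd + 1) % agg_every == 0:                         # 周期性全局聚合
        global_mean = np.mean(params, axis=0)
        params = [global_mean.copy() for _ in range(n_bands)]
```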
【8】Learning Time-Varying Convexifications of Multiple Fairness Measures
标题:学习多个公平性度量的时变凸化
链接:https://arxiv.org/abs/2508.14311
作者:, Jakub Marecek, Robert Shorten
摘要:越来越多的人认识到,可能需要同时考虑多种公平性度量,例如多种群体公平与个体公平的概念。各公平正则项的相对权重先验未知,可能随时间变化,且需要在线学习。我们考虑在有限的图结构反馈下,学习多个公平性度量的时变凸化(即凸组合)。
摘要:There is an increasing appreciation that one may need to consider multiple measures of fairness, e.g., considering multiple group and individual fairness notions. The relative weights of the fairness regularisers are a priori unknown, may be time varying, and need to be learned on the fly. We consider the learning of time-varying convexifications of multiple fairness measures with limited graph-structured feedback.
【9】RewardRank: Optimizing True Learning-to-Rank Utility
标题:RewardRank:优化真实的排序学习效用
链接:https://arxiv.org/abs/2508.14180
作者:att, Kiran Koshy Thekumparampil, Tanmay Gangwani, Tesi Xiao, Leonid Sigal
摘要:传统的排序系统依赖于代理损失函数,这类函数假设简单的用户行为,例如用户更偏好按人工设计的相关性排序的列表。然而,现实世界中的用户交互受到复杂行为偏差的影响,包括位置偏差、品牌亲和力、诱饵效应和相似性厌恶,这些目标未能捕获。因此,在这种损失上训练的模型往往与实际的用户效用(例如在排序列表中发生任何点击或购买的概率)不一致。在这项工作中,我们提出了一个通过反事实奖励学习来建模用户行为的数据驱动框架。我们的方法RewardRank首先利用日志数据训练一个深度效用模型,估计整个项目排列的用户参与度;然后通过可微的软置换算子优化排序策略以最大化预测效用,从而在事实与反事实排序空间上实现端到端训练。为了解决在没有真实标签的情况下评估未见排列的挑战,我们引入了两个自动化协议:(i)KD-Eval,使用位置感知的oracle进行反事实奖励估计;以及(ii)LLM-Eval,通过大型语言模型模拟用户偏好。在大规模基准(包括Baidu-ULTR和Amazon KDD Cup数据集)上的实验表明,我们的方法始终优于强基线,凸显了为面向效用优化的排序建模用户行为动态的有效性。我们的代码可从以下网址获得:https://github.com/GauravBh1010tt/RewardRank
摘要:Traditional ranking systems rely on proxy loss functions that assume simplistic user behavior, such as users preferring a rank list where items are sorted by hand-crafted relevance. However, real-world user interactions are influenced by complex behavioral biases, including position bias, brand affinity, decoy effects, and similarity aversion, which these objectives fail to capture. As a result, models trained on such losses often misalign with actual user utility, such as the probability of any click or purchase across the ranked list. In this work, we propose a data-driven framework for modeling user behavior through counterfactual reward learning. Our method, RewardRank, first trains a deep utility model to estimate user engagement for entire item permutations using logged data. Then, a ranking policy is optimized to maximize predicted utility via differentiable soft permutation operators, enabling end-to-end training over the space of factual and counterfactual rankings. To address the challenge of evaluation without ground-truth for unseen permutations, we introduce two automated protocols: (i) $\textit{KD-Eval}$, using a position-aware oracle for counterfactual reward estimation, and (ii) $\textit{LLM-Eval}$, which simulates user preferences via large language models. Experiments on large-scale benchmarks, including Baidu-ULTR and the Amazon KDD Cup datasets, demonstrate that our approach consistently outperforms strong baselines, highlighting the effectiveness of modeling user behavior dynamics for utility-optimized ranking. Our code is available at: https://github.com/GauravBh1010tt/RewardRank
【10】Implicit Hypergraph Neural Network
标题:隐式超图神经网络
链接:https://arxiv.org/abs/2508.14101
作者:udhuri, Yongjian Zhong, Bijaya Adhikari
备注:Submitted to ICDM 2025
摘要:超图提供了一个通用的框架,用于捕捉实体之间的高阶关系,并已被广泛应用于各种领域,包括医疗保健,社交网络,和生物信息学。超图神经网络依赖于超边上节点之间的消息传递来学习潜在表示,已成为许多这些领域中预测任务的首选方法。这些方法通常只执行少量的消息传递轮来学习表示,然后用于预测。少量的消息传递轮是有代价的,因为表示只捕获本地信息,而放弃了远程高阶依赖性。然而,正如我们所证明的,盲目地增加消息传递轮数来捕获长程依赖关系也会降低超图神经网络的性能。 最近的研究表明,隐式图神经网络在保持性能的同时,可以捕获标准图中的长程依赖关系。尽管超图神经网络很受欢迎,但之前的工作还没有研究超图神经网络的长程依赖问题。在这里,我们首先证明了现有的超图神经网络在聚合更多信息以捕获长期依赖性时会失去预测能力。然后,我们提出了隐式超图神经网络(IHNN),这是一种新的框架,它以端到端的方式联合学习节点和超边的定点表示,以缓解这个问题。利用隐式微分,我们引入了一个易于处理的投影梯度下降方法来有效地训练模型。对真实超图进行节点分类的广泛实验表明,IHNN在大多数情况下都优于最接近的先前工作,在超图学习中建立了一个新的最先进的技术。
摘要:Hypergraphs offer a generalized framework for capturing high-order relationships between entities and have been widely applied in various domains, including healthcare, social networks, and bioinformatics. Hypergraph neural networks, which rely on message-passing between nodes over hyperedges to learn latent representations, have emerged as the method of choice for predictive tasks in many of these domains. These approaches typically perform only a small number of message-passing rounds to learn the representations, which they then utilize for predictions. The small number of message-passing rounds comes at a cost, as the representations only capture local information and forego long-range high-order dependencies. However, as we demonstrate, blindly increasing the message-passing rounds to capture long-range dependency also degrades the performance of hyper-graph neural networks. Recent works have demonstrated that implicit graph neural networks capture long-range dependencies in standard graphs while maintaining performance. Despite their popularity, prior work has not studied long-range dependency issues on hypergraph neural networks. Here, we first demonstrate that existing hypergraph neural networks lose predictive power when aggregating more information to capture long-range dependency. We then propose Implicit Hypergraph Neural Network (IHNN), a novel framework that jointly learns fixed-point representations for both nodes and hyperedges in an end-to-end manner to alleviate this issue. Leveraging implicit differentiation, we introduce a tractable projected gradient descent approach to train the model efficiently. Extensive experiments on real-world hypergraphs for node classification demonstrate that IHNN outperforms the closest prior works in most settings, establishing a new state-of-the-art in hypergraph learning.
【11】FM4NPP: A Scaling Foundation Model for Nuclear and Particle Physics
标题:FM 4NPP:核和粒子物理的缩放基础模型
链接:https://arxiv.org/abs/2508.14087
作者:k, Shuhang Li, Yi Huang, Xihaier Luo, Haiwang Yu, Yeonju Go, Christopher Pinkenburg, Yuewei Lin, Shinjae Yoo, Joseph Osborn, Jin Huang, Yihui Ren
摘要:大型语言模型通过自监督训练实现了大型可泛化模型,彻底改变了人工智能。这种范式激发了科学基础模型(FM)的发展。然而,将这种能力应用于实验粒子物理学颇具挑战性,因为探测器数据稀疏且空间分布的特性与自然语言有很大不同。这项工作研究粒子物理学的FM能否在不同任务中扩展和泛化。我们引入了一个新的数据集,其中包含超过1100万个粒子碰撞事件,以及一系列下游任务和用于评估的标记数据。我们提出了一种新的探测器数据自监督训练方法,并通过具有多达1.88亿个参数的模型证明了其神经可扩展性。通过冻结权重和特定于任务的适配器,该FM在所有下游任务中始终优于基线模型。其性能还表现出稳健的数据高效适应能力。进一步的分析表明,FM提取的表示是任务无关的,但可以通过单一线性映射为不同的下游任务进行特化。
摘要:Large language models have revolutionized artificial intelligence by enabling large, generalizable models trained through self-supervision. This paradigm has inspired the development of scientific foundation models (FMs). However, applying this capability to experimental particle physics is challenging due to the sparse, spatially distributed nature of detector data, which differs dramatically from natural language. This work addresses if an FM for particle physics can scale and generalize across diverse tasks. We introduce a new dataset with more than 11 million particle collision events and a suite of downstream tasks and labeled data for evaluation. We propose a novel self-supervised training method for detector data and demonstrate its neural scalability with models that feature up to 188 million parameters. With frozen weights and task-specific adapters, this FM consistently outperforms baseline models across all downstream tasks. The performance also exhibits robust data-efficient adaptation. Further analysis reveals that the representations extracted by the FM are task-agnostic but can be specialized via a single linear mapping for different downstream tasks.
【12】Toward Lifelong Learning in Equilibrium Propagation: Sleep-like and Awake Rehearsal for Enhanced Stability
标题:在平衡传播中走向终身学习:睡眠和清醒的排练以增强稳定性
链接:https://arxiv.org/abs/2508.14081
作者: Kubo, Jean Erik Delanois, Maxim Bazhenov
摘要:使用平衡传播(EP)(一种生物学上合理的训练算法)训练的递归神经网络(RNN)在图像分类和强化学习等各种任务中表现出强大的性能。然而,这些网络在持续学习中面临着一个关键挑战:灾难性遗忘,当学习新任务时,先前获得的知识会被覆盖。这种局限性与人类大脑保留和整合新旧知识的能力形成鲜明对比,人类大脑在睡眠期间通过重放学习到的信息来巩固记忆。为了解决RNN中的这一挑战,我们在这里提出了一种用于EP训练的RNN的类似睡眠的重放合并(SRC)算法。我们发现SRC显著提高了RNN在连续学习场景中对灾难性遗忘的恢复能力。在每次新任务训练后实施SRC的类增量学习中,与包含几种成熟正则化技术的前馈网络相比,EP训练的多层RNN模型(MRNN-EP)的表现明显更好。MRNN-EP与使用时间反向传播(BPTT)训练的MRNN表现相当,当两者都在MNIST数据上配备SRC时,并且在Fashion MNIST,Kuzushiji-MNIST,CIFAR 10和ImageNet数据集上超过了基于BPTT的模型。将SRC与排练(也称为“清醒重播”)相结合,进一步提高了网络在继续学习新任务的同时保留长期知识的能力。我们的研究揭示了睡眠重放技术对RNN的适用性,并强调了将类似人类的学习行为整合到人工神经网络(ANN)中的潜力。
摘要:Recurrent neural networks (RNNs) trained using Equilibrium Propagation (EP), a biologically plausible training algorithm, have demonstrated strong performance in various tasks such as image classification and reinforcement learning. However, these networks face a critical challenge in continuous learning: catastrophic forgetting, where previously acquired knowledge is overwritten when new tasks are learned. This limitation contrasts with the human brain's ability to retain and integrate both old and new knowledge, aided by processes like memory consolidation during sleep through the replay of learned information. To address this challenge in RNNs, here we propose a sleep-like replay consolidation (SRC) algorithm for EP-trained RNNs. We found that SRC significantly improves RNN's resilience to catastrophic forgetting in continuous learning scenarios. In class-incremental learning with SRC implemented after each new task training, the EP-trained multilayer RNN model (MRNN-EP) performed significantly better compared to feedforward networks incorporating several well-established regularization techniques. The MRNN-EP performed on par with MRNN trained using Backpropagation Through Time (BPTT) when both were equipped with SRC on MNIST data and surpassed BPTT-based models on the Fashion MNIST, Kuzushiji-MNIST, CIFAR10, and ImageNet datasets. Combining SRC with rehearsal, also known as "awake replay", further boosted the network's ability to retain long-term knowledge while continuing to learn new tasks. Our study reveals the applicability of sleep-like replay techniques to RNNs and highlights the potential for integrating human-like learning behaviors into artificial neural networks (ANNs).
【13】Comparing Model-agnostic Feature Selection Methods through Relative Efficiency
标题:通过相对效率比较模型不可知的特征选择方法
链接:https://arxiv.org/abs/2508.14268
作者:Zheng, Garvesh Raskutti
摘要:在模型不可知的设置中的特征选择和重要性估计是一个持续的挑战的重大利益。包装器方法很常用,因为它们通常是模型不可知的,尽管它们是计算密集型的。在本文中,我们专注于广义协方差测度(GCM)和留一协变量(LOCO)估计相关的特征选择方法,并提供了一个比较的基础上的相对效率。特别是,我们提出了三种模型设置下的理论比较:线性模型,非线性加性模型,和模仿单层神经网络的单指数模型。我们补充这与广泛的模拟和真实数据的例子。我们的理论结果,以及实证研究结果表明,GCM相关的方法一般优于LOCO在适当的规律性条件下。此外,我们量化这些方法的渐近相对效率。我们的模拟和真实数据分析包括广泛使用的机器学习方法,如神经网络和梯度提升树。
摘要:Feature selection and importance estimation in a model-agnostic setting is an ongoing challenge of significant interest. Wrapper methods are commonly used because they are typically model-agnostic, even though they are computationally intensive. In this paper, we focus on feature selection methods related to the Generalized Covariance Measure (GCM) and Leave-One-Covariate-Out (LOCO) estimation, and provide a comparison based on relative efficiency. In particular, we present a theoretical comparison under three model settings: linear models, non-linear additive models, and single index models that mimic a single-layer neural network. We complement this with extensive simulations and real data examples. Our theoretical results, along with empirical findings, demonstrate that GCM-related methods generally outperform LOCO under suitable regularity conditions. Furthermore, we quantify the asymptotic relative efficiency of these approaches. Our simulations and real data analysis include widely used machine learning methods such as neural networks and gradient boosting trees.
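LOCO的定义本身很简单:去掉协变量j重新拟合,看留出误差增加多少。下面的草图在合成数据上用梯度提升树演示这一计算方式;模型与数据均为占位,并非论文中的理论设定。

```python
# 最小示意:留一协变量(LOCO)重要性 = 去掉特征j后留出误差的增加量
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = mean_squared_error(
    y_te, GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr).predict(X_te))
for j in range(X.shape[1]):
    keep = [k for k in range(X.shape[1]) if k != j]      # 移除第 j 个协变量
    err_j = mean_squared_error(
        y_te, GradientBoostingRegressor(random_state=0)
              .fit(X_tr[:, keep], y_tr).predict(X_te[:, keep]))
    print(f"LOCO importance of feature {j}: {err_j - base:.4f}")
```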
其他(23篇)
【1】On Defining Neural Averaging
标题:关于定义神经平均
链接:https://arxiv.org/abs/2508.14832
作者: Lee, Richard Ngo
摘要:平均神经网络意味着什么?我们研究了从一组预训练模型中合成单个神经网络的问题,每个模型都是在不相交的数据碎片上训练的,只使用它们的最终权重,而不访问训练数据。在形成神经平均的定义时,我们从模型汤中获得了见解,它似乎将多个模型聚合成一个单一的模型,同时增强了泛化性能。在这项工作中,我们重新解释了模型汤作为一个更广泛的框架的特殊情况:用于神经平均的摊销模型集成(AME),这是一种无数据的元优化方法,将模型差异视为伪梯度来指导神经权重更新。我们表明,这种观点不仅恢复模型汤,但使更多的表现力和自适应的合奏策略。从经验上讲,AME产生的平均神经解决方案优于单个专家和模型汤基线,特别是在分布外的设置中。我们的研究结果提出了无数据模型权重聚合的原则性和可推广的概念,并在某种意义上定义了如何执行神经平均。
摘要:What does it even mean to average neural networks? We investigate the problem of synthesizing a single neural network from a collection of pretrained models, each trained on disjoint data shards, using only their final weights and no access to training data. In forming a definition of neural averaging, we take insight from model soup, which appears to aggregate multiple models into a singular model while enhancing generalization performance. In this work, we reinterpret model souping as a special case of a broader framework: Amortized Model Ensembling (AME) for neural averaging, a data-free meta-optimization approach that treats model differences as pseudogradients to guide neural weight updates. We show that this perspective not only recovers model soup but enables more expressive and adaptive ensembling strategies. Empirically, AME produces averaged neural solutions that outperform both individual experts and model soup baselines, especially in out-of-distribution settings. Our results suggest a principled and generalizable notion of data-free model weight aggregation and defines, in one sense, how to perform neural averaging.
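摘要中"把模型差异当作伪梯度"的思路可以粗略示意如下:以各专家权重与当前聚合权重之差的平均为伪梯度,再用带动量的更新迭代。这是按摘要描述给出的推测性草图,并非AME的官方实现;若只做一步简单平均,则大致退化为model soup。

```python
# 推测性最小示意:把"当前权重 - 各专家权重"的平均当作伪梯度迭代聚合
import numpy as np

experts = [np.random.default_rng(i).normal(size=16) for i in range(5)]  # 各分片模型的最终权重
w = np.mean(experts, axis=0)                  # 从 model soup(简单平均)初始化
lr, momentum, m = 0.5, 0.9, np.zeros_like(w)
for step in range(20):
    g = np.mean([w - we for we in experts], axis=0)   # 伪梯度:指向各专家的平均差
    m = momentum * m + g                               # 动量累积
    w = w - lr * m                                     # 更新聚合权重
print(w[:4])
```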
【2】Source-Guided Flow Matching
标题:源引导流匹配
链接:https://arxiv.org/abs/2508.14807
作者:g, Alice Harting, Matthieu Barreau, Michael M. Zavlanos, Karl H. Johansson
摘要:生成模型的引导通常通过添加引导场来修改概率流向量场来实现。在本文中,我们提出了Source-Guided Flow Matching(SGFM)框架,该框架直接修改源分布,同时保持预训练的向量场不变。这将引导问题简化为从源分布采样的定义明确的问题。我们从理论上表明,SGFM恢复所需的目标分布准确。此外,我们提供的Wasserstein误差的界限时,使用近似采样器的源分布和近似向量场生成的分布。我们的方法的主要好处是,它允许用户根据他们的具体问题灵活地选择采样方法。为了说明这一点,我们系统地比较不同的采样方法,并讨论渐近精确制导的条件。此外,我们的框架集成了最佳流匹配模型,因为直接的传输图所产生的向量场被保留。合成2D基准测试、图像数据集和物理信息生成任务的实验结果证明了所提出框架的有效性和灵活性。
摘要:Guidance of generative models is typically achieved by modifying the probability flow vector field through the addition of a guidance field. In this paper, we instead propose the Source-Guided Flow Matching (SGFM) framework, which modifies the source distribution directly while keeping the pre-trained vector field intact. This reduces the guidance problem to a well-defined problem of sampling from the source distribution. We theoretically show that SGFM recovers the desired target distribution exactly. Furthermore, we provide bounds on the Wasserstein error for the generated distribution when using an approximate sampler of the source distribution and an approximate vector field. The key benefit of our approach is that it allows the user to flexibly choose the sampling method depending on their specific problem. To illustrate this, we systematically compare different sampling methods and discuss conditions for asymptotically exact guidance. Moreover, our framework integrates well with optimal flow matching models since the straight transport map generated by the vector field is preserved. Experimental results on synthetic 2D benchmarks, image datasets, and physics-informed generative tasks demonstrate the effectiveness and flexibility of the proposed framework.
【3】A Guide for Manual Annotation of Scientific Imagery: How to Prepare for Large Projects
标题:科学图像手动注释指南:如何为大型项目做准备
链接:https://arxiv.org/abs/2508.14801
作者:dzadeh, Rohan Adhyapak, Armin Iraji, Kartik Chaurasiya, V Aparna, Petrus C. Martens
摘要:尽管对手动注释图像数据的需求很高,但管理复杂且昂贵的注释项目仍然没有得到充分讨论。部分原因是,领导此类项目需要应对一系列不同且相互关联的挑战,而这些挑战往往超出了特定领域专家的专业知识范围,因此缺乏实用的指导方针。这些挑战范围很广,从数据收集到资源分配和招聘,从减少偏见到有效培训注释人员。本文为注释项目提供了一个领域不可知的准备指南,重点是科学图像。借鉴作者在管理大型手动注释项目方面的丰富经验,它解决了基本概念,包括成功措施,注释主题,项目目标,数据可用性和基本团队角色。此外,它还讨论了各种人为偏见,并推荐了提高注释质量和效率的工具和技术。其目标是鼓励进一步的研究和框架,以创建一个全面的知识库,以减少跨各个领域的手动注释项目的成本。
摘要:Despite the high demand for manually annotated image data, managing complex and costly annotation projects remains under-discussed. This is partly due to the fact that leading such projects requires dealing with a set of diverse and interconnected challenges which often fall outside the expertise of specific domain experts, leaving practical guidelines scarce. These challenges range widely from data collection to resource allocation and recruitment, from mitigation of biases to effective training of the annotators. This paper provides a domain-agnostic preparation guide for annotation projects, with a focus on scientific imagery. Drawing from the authors' extensive experience in managing a large manual annotation project, it addresses fundamental concepts including success measures, annotation subjects, project goals, data availability, and essential team roles. Additionally, it discusses various human biases and recommends tools and technologies to improve annotation quality and efficiency. The goal is to encourage further research and frameworks for creating a comprehensive knowledge base to reduce the costs of manual annotation projects across various fields.
【4】Context Steering: A New Paradigm for Compression-based Embeddings by Synthesizing Relevant Information Features
标题:上下文引导:综合相关信息特征的基于压缩的嵌入新范式
链接:https://arxiv.org/abs/2508.14780
作者: Sarasa Durán, Ana Granados Fontecha, Francisco de Borja Rodríguez Ortíz
摘要:基于压缩的距离(CD)通过识别数据对象之间的冗余隐含信息,提供了一种灵活且与领域无关的相似性度量方法。然而,由于相似性特征是从数据中导出的,而不是定义为输入,因此通常很难与手头的任务保持一致,特别是在复杂的聚类或分类设置中。为了解决这个问题,我们引入了“上下文转向”,一种新的方法,积极引导功能塑造过程。我们的方法不是被动地接受紧急数据结构(通常是来自聚类CD的层次结构),而是通过系统地分析每个对象如何影响聚类框架内的关系上下文来“引导”过程。这个过程生成了一个定制的嵌入,隔离和放大类的独特信息。我们使用归一化压缩距离(NCD)和相对压缩距离(NRC)与常见的层次聚类验证这种策略的能力,提供了一种有效的替代常见的转导方法。从文本到真实世界的音频,跨异构数据集的实验结果验证了上下文引导的鲁棒性和通用性,标志着其应用的根本转变:从仅仅发现固有的数据结构到主动塑造针对特定目标的特征空间。
摘要:Compression-based distances (CD) offer a flexible and domain-agnostic means of measuring similarity by identifying implicit information through redundancies between data objects. However, as similarity features are derived from the data, rather than defined as an input, it often proves difficult to align with the task at hand, particularly in complex clustering or classification settings. To address this issue, we introduce "context steering," a novel methodology that actively guides the feature-shaping process. Instead of passively accepting the emergent data structure (typically a hierarchy derived from clustering CDs), our approach "steers" the process by systematically analyzing how each object influences the relational context within a clustering framework. This process generates a custom-tailored embedding that isolates and amplifies class-distinctive information. We validate the capabilities of this strategy using Normalized Compression Distance (NCD) and Relative Compression Distance (NRC) with common hierarchical clustering, providing an effective alternative to common transductive methods. Experimental results across heterogeneous datasets-from text to real-world audio-validate the robustness and generality of context steering, marking a fundamental shift in their application: from merely discovering inherent data structures to actively shaping a feature space tailored to a specific objective.
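归一化压缩距离(NCD)有标准的闭式定义,可用任何现成压缩器实现。下面用zlib给出一个最小实现:NCD(x,y) = (C(xy) - min(C(x),C(y))) / max(C(x),C(y)),其中C(·)为压缩后的字节长度;这只演示NCD本身,并非论文的"上下文引导"全流程。

```python
# 最小示意:用zlib计算归一化压缩距离(NCD)
import zlib

def C(b: bytes) -> int:
    return len(zlib.compress(b, 9))          # 压缩后长度近似柯氏复杂度

def ncd(x: bytes, y: bytes) -> float:
    cx, cy = C(x), C(y)
    return (C(x + y) - min(cx, cy)) / max(cx, cy)

print(ncd(b"the cat sat on the mat", b"the cat sat on a mat"))    # 相似文本 -> 较小
print(ncd(b"the cat sat on the mat", b"quantum flux capacitor"))  # 不相关文本 -> 较大
```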
【5】CaTE Data Curation for Trustworthy AI
标题:值得信赖的人工智能的CaTE数据策划
链接:https://arxiv.org/abs/2508.14741
作者:a Clemens-Sewall, Christopher Cervantes, Emma Rafkin, J. Neil Otte, Tom Magelinski, Libby Lewis, Michelle Liu, Dana Udwin, Monique Kirkman-Bey
摘要:该报告为设计或开发支持AI的系统的团队提供了实用指导,以指导他们如何在开发的数据管理阶段提高可信度。在本报告中,作者首先定义了数据、数据策展阶段和可信度。然后,我们描述了开发团队,特别是数据科学家,可以采取的一系列步骤来构建一个值得信赖的AI系统。我们列举了核心步骤的顺序,并跟踪存在替代品的并行路径。对这些步骤的描述包括优势、劣势、前提条件、结果和相关的开源软件工具实现。总的来说,本报告综合了相关学术文献中的数据策展工具和方法,我们的目标是为读者提供一套多样而连贯的实践,以提高人工智能的可信度。
摘要:This report provides practical guidance to teams designing or developing AI-enabled systems for how to promote trustworthiness during the data curation phase of development. In this report, the authors first define data, the data curation phase, and trustworthiness. We then describe a series of steps that the development team, especially data scientists, can take to build a trustworthy AI-enabled system. We enumerate the sequence of core steps and trace parallel paths where alternatives exist. The descriptions of these steps include strengths, weaknesses, preconditions, outcomes, and relevant open-source software tool implementations. In total, this report is a synthesis of data curation tools and approaches from relevant academic literature, and our goal is to equip readers with a diverse yet coherent set of practices for improving AI trustworthiness.
【6】AFABench: A Generic Framework for Benchmarking Active Feature Acquisition
标题:AFABench:用于对活动特征获取进行基准测试的通用框架
链接:https://arxiv.org/abs/2508.14734
作者:hütz, Han Wu, Reza Rezvan, Linus Aronsson, Morteza Haghir Chehreghani
摘要:在许多现实场景中,由于货币成本、延迟或隐私问题,获取数据实例的所有特征可能是昂贵的或不切实际的。主动特征获取(AFA)通过为每个数据实例动态选择信息特征的子集来解决这一挑战,从而在预测性能与获取成本之间进行权衡。虽然已经为AFA提出了许多方法,从贪婪的信息理论策略到非近视强化学习方法,但由于缺乏标准化的基准,这些方法的公平和系统评估受到阻碍。在本文中,我们介绍了AFABench,第一个基准框架AFA。我们的基准测试包括一组不同的合成和真实世界的数据集,支持广泛的采集策略,并提供模块化设计,使新方法和任务的集成变得容易。我们实现和评估了所有主要类别的代表性算法,包括静态,贪婪和基于强化学习的方法。为了测试AFA策略的前瞻能力,我们引入了一个新的合成数据集,AFAContext,旨在暴露贪婪选择的局限性。我们的研究结果突出了不同AFA策略之间的关键权衡,并为未来的研究提供了可操作的见解。基准代码可在https://github.com/Linusaronsson/AFA-Benchmark上获得。
摘要:In many real-world scenarios, acquiring all features of a data instance can be expensive or impractical due to monetary cost, latency, or privacy concerns. Active Feature Acquisition (AFA) addresses this challenge by dynamically selecting a subset of informative features for each data instance, trading predictive performance against acquisition cost. While numerous methods have been proposed for AFA, ranging from greedy information-theoretic strategies to non-myopic reinforcement learning approaches, fair and systematic evaluation of these methods has been hindered by the lack of standardized benchmarks. In this paper, we introduce AFABench, the first benchmark framework for AFA. Our benchmark includes a diverse set of synthetic and real-world datasets, supports a wide range of acquisition policies, and provides a modular design that enables easy integration of new methods and tasks. We implement and evaluate representative algorithms from all major categories, including static, greedy, and reinforcement learning-based approaches. To test the lookahead capabilities of AFA policies, we introduce a novel synthetic dataset, AFAContext, designed to expose the limitations of greedy selection. Our results highlight key trade-offs between different AFA strategies and provide actionable insights for future research. The benchmark code is available at: https://github.com/Linusaronsson/AFA-Benchmark.
【7】Cooperative SGD with Dynamic Mixing Matrices
标题:具有动态混合矩阵的合作新元
链接:https://arxiv.org/abs/2508.14565
作者:rkar, Shweta Jain
备注:Accepted at 28th European Conference on Artificial Intelligence (ECAI-2025) in main paper track
摘要:当今训练机器学习算法的最常见方法之一是随机梯度下降(SGD)。在分布式环境中,基于SGD的算法已被证明在特定情况下理论上收敛。分布式SGD设置中的大量工作假定边缘设备的固定拓扑。这些论文还假设节点对全局模型的贡献是均匀的。然而,实验表明,这样的假设是次优的和一个非均匀的聚合策略,再加上一个动态变化的拓扑结构和客户端的选择,可以显着提高这样的模型的性能。本文详细介绍了一个统一的框架,涵盖了几个本地更新SGD为基础的分布式算法与动态拓扑结构,并提供改进或匹配的理论保证收敛相比,现有的工作。
摘要:One of the most common methods to train machine learning algorithms today is the stochastic gradient descent (SGD). In a distributed setting, SGD-based algorithms have been shown to converge theoretically under specific circumstances. A substantial number of works in the distributed SGD setting assume a fixed topology for the edge devices. These papers also assume that the contribution of nodes to the global model is uniform. However, experiments have shown that such assumptions are suboptimal and a non uniform aggregation strategy coupled with a dynamically shifting topology and client selection can significantly improve the performance of such models. This paper details a unified framework that covers several Local-Update SGD-based distributed algorithms with dynamic topologies and provides improved or matching theoretical guarantees on convergence compared to existing work.
【8】Improving OCR using internal document redundancy
标题:使用内部文档冗余改进OCR
链接:https://arxiv.org/abs/2508.14557
作者:zarena, Seginus Mowlavi, Aitor Artola, Camilo Mariño, Marina Gardella, Ignacio Ramírez, Antoine Tadros, Roy He, Natalia Bottaioli, Boshra Rajaei, Gregory Randall, Jean-Michel Morel
备注:28 pages, 10 figures, including supplementary material. Code: this https URL. Dataset: this https URL
摘要:当前的OCR系统基于在大量数据上训练的深度学习模型。虽然它们已表现出一定的泛化到未见数据的能力,尤其是在检测任务中,但它们可能难以识别低质量数据。这对印刷文档尤为明显:其域内数据可变性通常较低,而域间数据可变性较高。在这种情况下,目前的OCR方法没有充分利用每个文档自身的冗余。我们提出了一种无监督方法,利用单个文档内字符形状的冗余来纠正给定OCR系统的不完美输出,并给出更好的聚类。为此,我们引入了一种扩展的高斯混合模型(GMM),在期望最大化(EM)算法、簇内重对齐过程与正态性统计检验之间交替进行。我们在不同退化程度的文档上展示了改进,包括修复的乌拉圭军事档案以及17世纪至20世纪中期的欧洲报纸。
摘要:Current OCR systems are based on deep learning models trained on large amounts of data. Although they have shown some ability to generalize to unseen data, especially in detection tasks, they can struggle with recognizing low-quality data. This is particularly evident for printed documents, where intra-domain data variability is typically low, but inter-domain data variability is high. In that context, current OCR methods do not fully exploit each document's redundancy. We propose an unsupervised method by leveraging the redundancy of character shapes within a document to correct imperfect outputs of a given OCR system and suggest better clustering. To this aim, we introduce an extended Gaussian Mixture Model (GMM) by alternating an Expectation-Maximization (EM) algorithm with an intra-cluster realignment process and normality statistical testing. We demonstrate improvements in documents with various levels of degradation, including recovered Uruguayan military archives and 17th to mid-20th century European newspapers.
【9】Exact Shapley Attributions in Quadratic-time for FANOVA Gaussian Processes
标题:FANOVA高斯过程的二次Shapley精确属性
链接:https://arxiv.org/abs/2508.14499
作者:ammadi, Krikamol Muandet, Ilaria Tiddi, Annette Ten Teije, Siu Lun Chau
摘要:Shapley值被广泛认为是在机器学习中为输入特征归因重要性的原则性方法。然而,Shapley值的精确计算随特征数量呈指数增长,严重限制了这一强大方法的实际应用。当预测模型是概率性的(如高斯过程(GP))时,挑战进一步加剧:其输出是随机变量而非点估计,对高阶矩建模需要额外的计算工作。在这项工作中,我们证明,对于一类显式建模所有主效应和交互作用的重要GP(称为FANOVA GP),局部和全局解释的*精确*Shapley归因都可以在*二次时间*内计算。对于局部的逐实例解释,我们在函数组件上定义了一个随机合作博弈,并仅用二次时间计算精确的随机Shapley值,同时捕获期望贡献和不确定性。对于全局解释,我们引入了一个确定性的、基于方差的价值函数,并计算精确的Shapley值,以量化每个特征对模型整体灵敏度的贡献。我们的方法利用FANOVA分解的封闭形式(随机)Möbius表示,并引入受牛顿恒等式启发的递归算法,高效地计算Shapley值的均值和方差。正如实证研究所证明的,我们的工作为结构化概率模型的预测提供了更可扩展、公理上合理且具备不确定性意识的解释,增强了可解释人工智能的实用性。
摘要:Shapley values are widely recognized as a principled method for attributing importance to input features in machine learning. However, the exact computation of Shapley values scales exponentially with the number of features, severely limiting the practical application of this powerful approach. The challenge is further compounded when the predictive model is probabilistic - as in Gaussian processes (GPs) - where the outputs are random variables rather than point estimates, necessitating additional computational effort in modeling higher-order moments. In this work, we demonstrate that for an important class of GPs known as FANOVA GP, which explicitly models all main effects and interactions, *exact* Shapley attributions for both local and global explanations can be computed in *quadratic time*. For local, instance-wise explanations, we define a stochastic cooperative game over function components and compute the exact stochastic Shapley value in quadratic time only, capturing both the expected contribution and uncertainty. For global explanations, we introduce a deterministic, variance-based value function and compute exact Shapley values that quantify each feature's contribution to the model's overall sensitivity. Our methods leverage a closed-form (stochastic) M\"{o}bius representation of the FANOVA decomposition and introduce recursive algorithms, inspired by Newton's identities, to efficiently compute the mean and variance of Shapley values. Our work enhances the utility of explainable AI, as demonstrated by empirical studies, by providing more scalable, axiomatically sound, and uncertainty-aware explanations for predictions generated by structured probabilistic models.
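作为参考,Shapley值的标准定义如下;这也是摘要中"随特征数量呈指数增长"的来源,而论文的贡献在于对FANOVA GP将其降至二次时间:

```latex
% Shapley值的标准定义:特征全集 N,|N| = n,价值函数 v(S) 为特征子集 S 的贡献
\phi_j \;=\; \sum_{S \subseteq N \setminus \{j\}}
  \frac{|S|!\,\bigl(n - |S| - 1\bigr)!}{n!}\,
  \Bigl( v\bigl(S \cup \{j\}\bigr) - v(S) \Bigr)
% 对 S 的求和共 2^{n-1} 项,即一般情形下指数复杂度的来源
```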
【10】Hilbert geometry of the symmetric positive-definite bicone: Application to the geometry of the extended Gaussian family
标题:对称正定双锥的赫BERT几何:在扩展高斯族几何中的应用
链接:https://arxiv.org/abs/2508.14369
作者:wowski, Frank Nielsen
备注:21 pages
摘要:扩展高斯族是高斯族的闭包,通过补入由退化协方差矩阵、退化精度矩阵或两种退化混合所诱导的对应元素而得到。扩展高斯族的参数空间构成对称半正定矩阵双锥,即两个部分对称半正定矩阵锥在底部相接。本文研究这类开有界凸的对称正定双锥的Hilbert几何。我们给出了相应Hilbert度量距离的闭式公式,并详尽研究其不变性。我们还简要讨论了这种几何在处理扩展高斯分布方面的潜在应用。
摘要:The extended Gaussian family is the closure of the Gaussian family obtained by completing the Gaussian family with the counterpart elements induced by degenerate covariance or degenerate precision matrices, or a mix of both degeneracies. The parameter space of the extended Gaussian family forms a symmetric positive semi-definite matrix bicone, i.e. two partial symmetric positive semi-definite matrix cones joined at their bases. In this paper, we study the Hilbert geometry of such an open bounded convex symmetric positive-definite bicone. We report the closed-form formula for the corresponding Hilbert metric distance and study exhaustively its invariance properties. We also touch upon potential applications of this geometry for dealing with extended Gaussian distributions.
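作为背景,有界凸集上Hilbert度量的标准定义如下(论文将其具体化到对称正定双锥上;此处给出的是一般形式):

```latex
% 设过 p, q 的直线与边界 \partial\Omega 交于 a, b(a 靠近 p 一侧,b 靠近 q 一侧):
d_{\Omega}(p, q) \;=\; \tfrac{1}{2}\,
  \log \frac{\lVert a - q \rVert \, \lVert b - p \rVert}
            {\lVert a - p \rVert \, \lVert b - q \rVert}
% 即对数交比;当 \Omega 为椭球时退化为(常数倍的)双曲距离
```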
【11】Generative AI Against Poaching: Latent Composite Flow Matching for Wildlife Conservation
标题:反偷猎的生成人工智能:用于野生动物保护的潜在复合流匹配
链接:https://arxiv.org/abs/2508.14342
作者:ong, Haichuan Wang, Charles A. Emogor, Vincent Börsch-Supan, Lily Xu, Milind Tambe
摘要:偷猎对野生动物和生物多样性构成严重威胁。减少偷猎的一个有价值的步骤是预测偷猎者的行为,这可以为巡逻计划和其他保护措施提供信息。现有的偷猎预测方法基于线性模型或决策树,缺乏表达能力,捕捉复杂的,非线性的时空模式。生成式建模的最新进展,特别是流量匹配,提供了一个更灵活的选择。然而,在真实世界的偷猎数据上训练这样的模型面临两个主要障碍:偷猎事件的检测不完善和数据有限。为了解决不完善的检测,我们集成流匹配与基于占用的检测模型和训练的潜在空间中的流来推断潜在的占用状态。为了缓解数据稀缺性,我们采用了从线性模型预测而不是随机噪声(扩散模型中的标准)初始化的复合流,注入先验知识并提高泛化能力。对乌干达两个国家公园数据集的评估显示,预测准确性不断提高。
摘要:Poaching poses significant threats to wildlife and biodiversity. A valuable step in reducing poaching is to forecast poacher behavior, which can inform patrol planning and other conservation interventions. Existing poaching prediction methods based on linear models or decision trees lack the expressivity to capture complex, nonlinear spatiotemporal patterns. Recent advances in generative modeling, particularly flow matching, offer a more flexible alternative. However, training such models on real-world poaching data faces two central obstacles: imperfect detection of poaching events and limited data. To address imperfect detection, we integrate flow matching with an occupancy-based detection model and train the flow in latent space to infer the underlying occupancy state. To mitigate data scarcity, we adopt a composite flow initialized from a linear-model prediction rather than random noise which is the standard in diffusion models, injecting prior knowledge and improving generalization. Evaluations on datasets from two national parks in Uganda show consistent gains in predictive accuracy.
【12】Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS
标题:你的RL奖励函数就是你最好的搜索PRM:统一RL与基于搜索的TTS
链接:https://arxiv.org/abs/2508.14313
作者:Yang Zhou, Qixin Zhang, Hongwu Peng, Di Zhang, Marco Pavone, Ligong Han, Zhang-Wei Hong, Tong Che, Dimitris N. Metaxas
摘要:大型语言模型(LLM)的测试时扩展(TTS)目前主要分为两种彼此分离的范式:(1)强化学习(RL)方法,优化稀疏的基于结果的奖励,但存在不稳定和样本效率低的问题;(2)由独立训练的静态过程奖励模型(PRM)引导的基于搜索的技术,其需要昂贵的人工或LLM生成的标签,且在分布偏移下性能往往下降。在本文中,我们介绍了AIRL-S,首个将基于RL与基于搜索的TTS自然统一起来的方法。AIRL-S的核心洞见是:RL训练期间学习到的奖励函数本质上就是指导下游搜索的理想PRM。具体来说,我们将对抗性逆强化学习(AIRL)与组相对策略优化(GRPO)相结合,直接从正确的推理轨迹中学习密集的动态PRM,完全消除了对标注中间过程数据的需求。在推理时,所得到的PRM既充当RL rollout的critic,又作为有效引导搜索过程的启发式,促进稳健的推理链扩展,减轻奖励黑客攻击,并增强跨任务泛化。在包括数学、科学推理和代码生成在内的八个基准上的实验结果表明,我们的统一方法比基础模型平均提高9%的性能,与GPT-4o相当。此外,当集成到多个搜索算法中时,我们的PRM始终优于所有使用标注数据训练的基线PRM。这些结果强调:你的RL奖励函数确实就是你最好的搜索PRM,为LLM中的复杂推理任务提供了一个强大且具有成本效益的解决方案。
摘要:Test-time scaling (TTS) for large language models (LLMs) has thus far fallen into two largely separate paradigms: (1) reinforcement learning (RL) methods that optimize sparse outcome-based rewards, yet suffer from instability and low sample efficiency; and (2) search-based techniques guided by independently trained, static process reward models (PRMs), which require expensive human- or LLM-generated labels and often degrade under distribution shifts. In this paper, we introduce AIRL-S, the first natural unification of RL-based and search-based TTS. Central to AIRL-S is the insight that the reward function learned during RL training inherently represents the ideal PRM for guiding downstream search. Specifically, we leverage adversarial inverse reinforcement learning (AIRL) combined with group relative policy optimization (GRPO) to learn a dense, dynamic PRM directly from correct reasoning traces, entirely eliminating the need for labeled intermediate process data. At inference, the resulting PRM simultaneously serves as the critic for RL rollouts and as a heuristic to effectively guide search procedures, facilitating robust reasoning chain extension, mitigating reward hacking, and enhancing cross-task generalization. Experimental results across eight benchmarks, including mathematics, scientific reasoning, and code generation, demonstrate that our unified approach improves performance by 9% on average over the base model, matching GPT-4o. Furthermore, when integrated into multiple search algorithms, our PRM consistently outperforms all baseline PRMs trained with labeled data. These results underscore that, indeed, your reward function for RL is your best PRM for search, providing a robust and cost-effective solution to complex reasoning tasks in LLMs.
【13】Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer
标题:具有潜在深度平衡规范化因子的局部规模等方差
链接:https://arxiv.org/abs/2508.14187
作者:r Rahman, Chiao-An Yang, Michael N. Cheng, Lim Jun Hao, Jeremiah Jiang, Teck-Yian Lim, Raymond A. Yeh
摘要:尺度变化是计算机视觉中的一个基本挑战。同一类别的对象可以具有不同的大小,而其感知大小还受到与相机距离的影响。这些变化是对象局部的,即同一图像内不同对象的尺寸可以发生不同的变化。为了有效处理尺度变化,我们提出了一种深度平衡规范化器(DEC)来改善模型的局部尺度等变性。DEC可以很容易地集成到现有网络架构中,也可以适配预训练模型。值得注意的是,我们表明,在竞争性的ImageNet基准测试中,DEC同时提高了ViT、DeiT、Swin和BEiT这四个流行预训练深度网络的模型性能和局部尺度一致性。我们的代码可在https://github.com/ashiq24/local-scale-equivariance上获得。
摘要:Scale variation is a fundamental challenge in computer vision. Objects of the same class can have different sizes, and their perceived size is further affected by the distance from the camera. These variations are local to the objects, i.e., different object sizes may change differently within the same image. To effectively handle scale variations, we present a deep equilibrium canonicalizer (DEC) to improve the local scale equivariance of a model. DEC can be easily incorporated into existing network architectures and can be adapted to a pre-trained model. Notably, we show that on the competitive ImageNet benchmark, DEC improves both model performance and local scale consistency across four popular pre-trained deep-nets, e.g., ViT, DeiT, Swin, and BEiT. Our code is available at https://github.com/ashiq24/local-scale-equivariance.
【14】Neuro-inspired Ensemble-to-Ensemble Communication Primitives for Sparse and Efficient ANNs
标题:面向稀疏高效ANN的神经启发式集群间(ensemble-to-ensemble)通信原语
链接:https://arxiv.org/abs/2508.14140
作者:onstantaropoulos, Stelios Manolis Smirnakis, Maria Papadopouli
摘要:生物神经回路模块化、层次化、稀疏互连的结构,反映了布线成本、功能专业化和鲁棒性之间的有效权衡。这些原则为人工神经网络(ANN)设计提供了有价值的见解,尤其是在网络深度和规模不断增长的背景下。稀疏性尤其已被广泛探索,用于减少内存和计算、提高速度并增强泛化。受系统神经科学发现的启发,我们探索小鼠视觉皮层中的功能连接模式,特别是神经元集群到集群(ensemble-to-ensemble)的通信,如何为ANN设计提供参考。我们介绍了G2GNet,一种在前馈层之间施加稀疏、模块化连接的新架构。尽管与全连接模型相比参数显著减少,G2GNet在标准视觉基准测试中仍实现了更高的准确率。据我们所知,这是第一个将生物学上观察到的功能连接模式作为结构偏置引入ANN设计的架构。我们用动态稀疏训练(DST)机制补充这种静态偏置,该机制在训练过程中修剪并重新生长边。我们还基于激活相关性提出了一个受赫布学习启发的重连规则,借鉴了生物可塑性原则。G2GNet实现了高达75%的稀疏性,同时在Fashion-MNIST、CIFAR-10和CIFAR-100等基准测试上将准确率提高多达4.3%,以更少的计算量超越了稠密基线。
摘要:The structure of biological neural circuits-modular, hierarchical, and sparsely interconnected-reflects an efficient trade-off between wiring cost, functional specialization, and robustness. These principles offer valuable insights for artificial neural network (ANN) design, especially as networks grow in depth and scale. Sparsity, in particular, has been widely explored for reducing memory and computation, improving speed, and enhancing generalization. Motivated by systems neuroscience findings, we explore how patterns of functional connectivity in the mouse visual cortex-specifically, ensemble-to-ensemble communication, can inform ANN design. We introduce G2GNet, a novel architecture that imposes sparse, modular connectivity across feedforward layers. Despite having significantly fewer parameters than fully connected models, G2GNet achieves superior accuracy on standard vision benchmarks. To our knowledge, this is the first architecture to incorporate biologically observed functional connectivity patterns as a structural bias in ANN design. We complement this static bias with a dynamic sparse training (DST) mechanism that prunes and regrows edges during training. We also propose a Hebbian-inspired rewiring rule based on activation correlations, drawing on principles of biological plasticity. G2GNet achieves up to 75% sparsity while improving accuracy by up to 4.3% on benchmarks, including Fashion-MNIST, CIFAR-10, and CIFAR-100, outperforming dense baselines with far fewer computations.
【15】Comparison of derivative-free and gradient-based minimization for multi-objective compositional design of shape memory alloys
标题:形状记忆合金多目标成分设计的无导数和梯度优化方法比较
链接:https://arxiv.org/abs/2508.14127
作者:a, Y. Noiman, E. J. Payton, T. Giovannelli
摘要:设计形状记忆合金(SMA)以满足性能目标,同时保持经济实惠和可持续性是一项复杂的挑战。在这项工作中,我们专注于优化SMA组合物,以实现所需的马氏体开始温度(Ms),同时最大限度地降低成本。为此,我们使用机器学习模型作为替代预测器,并应用数值优化方法来搜索合适的合金组合。我们训练了两种类型的机器学习模型,基于树的集成和神经网络,使用实验表征的合金和物理信息特征的数据集。基于树的模型与无导数优化器(COBYLA)一起使用,而提供梯度信息的神经网络与基于梯度的优化器(TRUST-CONSTR)配对。我们的研究结果表明,虽然两种模型预测Ms的准确性相似,但与神经网络配对的优化器可以更一致地找到更好的解决方案。COBYLA经常收敛到次优结果,特别是当初始猜测远离目标时。TRUST-CONSTR方法表现出更稳定的行为,并且更好地达到满足两个目标的合金成分。这项研究展示了一种实用的方法,通过结合物理信息数据,机器学习模型和优化算法来探索新的SMA组合物。虽然我们的数据集的规模小于基于模拟的努力,但使用实验数据提高了预测的可靠性。该方法可以扩展到其他材料的设计权衡必须与有限的数据。
摘要:Designing shape memory alloys (SMAs) that meet performance targets while remaining affordable and sustainable is a complex challenge. In this work, we focus on optimizing SMA compositions to achieve a desired martensitic start temperature (Ms) while minimizing cost. To do this, we use machine learning models as surrogate predictors and apply numerical optimization methods to search for suitable alloy combinations. We trained two types of machine learning models, a tree-based ensemble and a neural network, using a dataset of experimentally characterized alloys and physics-informed features. The tree-based model was used with a derivative-free optimizer (COBYLA), while the neural network, which provides gradient information, was paired with a gradient-based optimizer (TRUST-CONSTR). Our results show that while both models predict Ms with similar accuracy, the optimizer paired with the neural network finds better solutions more consistently. COBYLA often converged to suboptimal results, especially when the starting guess was far from the target. The TRUST-CONSTR method showed more stable behavior and was better at reaching alloy compositions that met both objectives. This study demonstrates a practical approach to exploring new SMA compositions by combining physics-informed data, machine learning models, and optimization algorithms. Although the scale of our dataset is smaller than simulation-based efforts, the use of experimental data improves the reliability of the predictions. The approach can be extended to other materials where design trade-offs must be made with limited data.
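摘要中的两条优化路线都可以直接用scipy表达:COBYLA为无导数方法,trust-constr可利用梯度。下面的草图在一个占位目标函数上分别调用两者;真实场景中目标应替换为论文中代理模型对Ms偏差与成本的预测,此处仅演示API用法。

```python
# 最小示意:同一占位目标分别用COBYLA(无导数)与trust-constr(梯度法)最小化
import numpy as np
from scipy.optimize import minimize

def objective(x):          # 占位:实际应为代理模型给出的 Ms 偏差 + 成本项
    return (x[0] - 0.3) ** 2 + 0.5 * x[1] ** 2

def gradient(x):           # trust-constr 可利用的解析梯度
    return np.array([2 * (x[0] - 0.3), x[1]])

x0 = np.array([0.9, 0.9])
res_cobyla = minimize(objective, x0, method="COBYLA")                  # 无导数
res_trust = minimize(objective, x0, method="trust-constr", jac=gradient)  # 梯度法
print(res_cobyla.x, res_trust.x)
```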
【16】From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery
标题:从AI for Science到智能体科学(Agentic Science):自主科学发现综述
链接:https://arxiv.org/abs/2508.14111
作者:, Yuejin Yang, Xiang Zhang, Yuhan Chen, Xiang Zhuang, Zhangyang Gao, Dongzhan Zhou, Guangshuai Wang, Zhiqiang Gao, Juntai Cao, Zijie Qiu, Xuming He, Qiang Zhang, Chenyu You, Shuangjia Zheng, Ning Ding, Wanli Ouyang, Nanqing Dong, Yu Cheng, Siqi Sun, Lei Bai, Bowen Zhou
摘要:人工智能(AI)正在重塑科学发现,从专业化的计算工具演变为自主研究伙伴。我们将智能体科学(Agentic Science)定位为更广泛的AI for Science范式中的一个关键阶段,在该阶段,人工智能系统从部分辅助发展为完全的科学自主性。在大型语言模型(LLM)、多模态系统和集成研究平台的支持下,智能体AI在假设生成、实验设计、执行、分析和迭代改进方面展现出能力,这些行为曾被认为是人类独有的。本综述对生命科学、化学、材料科学和物理学中的自主科学发现进行了面向领域的回顾。我们通过一个连接基础能力、核心流程与领域特定实现的综合框架,统一了以往三个相互分离的视角:面向流程、面向自主性和面向机制。在此框架基础上,我们(i)追溯AI for Science的演变,(ii)确定支撑科学自主性的五项核心能力,(iii)将发现建模为动态的四阶段工作流,(iv)回顾上述领域的应用,并(v)综合关键挑战与未来机遇。这项工作建立了面向领域的自主科学发现综合,并将智能体科学定位为推进AI驱动研究的结构化范式。
摘要:Artificial intelligence (AI) is reshaping scientific discovery, evolving from specialized computational tools into autonomous research partners. We position Agentic Science as a pivotal stage within the broader AI for Science paradigm, where AI systems progress from partial assistance to full scientific agency. Enabled by large language models (LLMs), multimodal systems, and integrated research platforms, agentic AI shows capabilities in hypothesis generation, experimental design, execution, analysis, and iterative refinement -- behaviors once regarded as uniquely human. This survey provides a domain-oriented review of autonomous scientific discovery across life sciences, chemistry, materials science, and physics. We unify three previously fragmented perspectives -- process-oriented, autonomy-oriented, and mechanism-oriented -- through a comprehensive framework that connects foundational capabilities, core processes, and domain-specific realizations. Building on this framework, we (i) trace the evolution of AI for Science, (ii) identify five core capabilities underpinning scientific agency, (iii) model discovery as a dynamic four-stage workflow, (iv) review applications across the above domains, and (v) synthesize key challenges and future opportunities. This work establishes a domain-oriented synthesis of autonomous scientific discovery and positions Agentic Science as a structured paradigm for advancing AI-driven research.
【17】Hard Examples Are All You Need: Maximizing GRPO Post-Training Under Annotation Budgets
标题:硬例子就是你所需要的:在注释预算下最大化GRPO后训练
链接:https://arxiv.org/abs/2508.14094
作者:Pikus, Pratyush Ranjan Tiwari, Burton Ye
摘要:为语言模型微调收集高质量的训练样例代价高昂,实际预算限制了可获得的数据量。我们研究资源受限对齐中的一个关键问题:在固定的采集预算下,从业者应优先选择容易、中等、困难还是随机难度的样例?我们研究了在不同模型规模和家族上的组相对策略优化(GRPO)微调,比较了从同一未标记样本池中选取的四种子集选择策略,其难度估计通过对基础模型的多样本评估获得。我们的实验表明,在最难的样例上训练产生最大的性能增益(高达47%),而在简单样例上训练增益最小。分析表明,这种效果源于更难的样例在GRPO训练中提供了更多可学习的机会。这些发现为预算受限的后训练提供了实用指导:使用GRPO时,优先选择困难样例可在推理任务上带来可观的性能提升。
摘要:Collecting high-quality training examples for language model fine-tuning is expensive, with practical budgets limiting the amount of data that can be procured. We investigate a critical question for resource-constrained alignment: under a fixed acquisition budget, should practitioners prioritize examples that are easy, medium, hard, or of random difficulty? We study Group Relative Policy Optimization (GRPO) fine-tuning across different model sizes and families, comparing four subset selection policies chosen from the same unlabeled pool using base-model difficulty estimates obtained via multi-sample evaluation. Our experiments reveal that training on the hardest examples yields the largest performance gains, up to 47%, while training on easy examples yields the smallest gains. Analysis reveals that this effect arises from harder examples providing more learnable opportunities during GRPO training. These findings provide practical guidance for budget-constrained post-training: prioritizing hard examples yields substantial performance gains on reasoning tasks when using GRPO.
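作为参考,下面用一个极简草图示意摘要中的数据选择流程:用基础模型对每道题多次采样、以通过率的反面估计难度,再在固定预算内优先选取最难的样本。其中 sample_fn 的接口、采样次数与预算大小均为假设。

```python
import random

def estimate_difficulty(problem, sample_fn, n_samples=8):
    """用基础模型对同一问题采样 n 次,以通过率的反面作为难度估计。
    sample_fn(problem) -> bool,表示一次采样是否答对(假设的接口)。"""
    passes = sum(sample_fn(problem) for _ in range(n_samples))
    return 1.0 - passes / n_samples  # 通过率越低,难度越高

def select_hardest(pool, sample_fn, budget):
    """在固定标注预算下,从未标注池中选出难度最高的子集。"""
    scored = [(estimate_difficulty(p, sample_fn), p) for p in pool]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [p for _, p in scored[:budget]]

# 用法示例(用随机模拟代替真实模型采样)
pool = [f"problem-{i}" for i in range(100)]
mock_sample = lambda p: random.random() < 0.6
hard_subset = select_hardest(pool, mock_sample, budget=20)
```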
【18】Physics-Informed Reward Machines
标题:物理信息奖励机
链接:https://arxiv.org/abs/2508.14093
作者:eleye, Ashutosh Trivedi, Majid Zamani
备注:20 pages, currently under review in a conference
摘要:奖励机(RM)提供了一种在强化学习(RL)中指定非马尔可夫奖励的结构化方式,从而提升表达能力与可编程性。从更广的视角看,它们把由奖励机制刻画的、关于环境的已知信息,与仍属未知、必须通过采样来发现的信息区分开来。这种分离支持反事实经验生成和奖励塑造等技术,从而降低样本复杂度并加快学习。我们引入物理信息奖励机(pRM),这是一种符号机器,用于表达RL智能体的复杂学习目标与奖励结构,从而实现更可编程、更具表达力和更高效的学习。我们提出了能够通过反事实经验和奖励塑造来利用pRM的RL算法。实验结果表明,这些技术在RL训练阶段加速了奖励的获取。我们通过在有限与连续物理环境中的实验证明了pRM的表达能力和有效性,表明引入pRM能显著提高多个控制任务上的学习效率。
摘要:Reward machines (RMs) provide a structured way to specify non-Markovian rewards in reinforcement learning (RL), thereby improving both expressiveness and programmability. Viewed more broadly, they separate what is known about the environment, captured by the reward mechanism, from what remains unknown and must be discovered through sampling. This separation supports techniques such as counterfactual experience generation and reward shaping, which reduce sample complexity and speed up learning. We introduce physics-informed reward machines (pRMs), a symbolic machine designed to express complex learning objectives and reward structures for RL agents, thereby enabling more programmable, expressive, and efficient learning. We present RL algorithms capable of exploiting pRMs via counterfactual experiences and reward shaping. Our experimental results show that these techniques accelerate reward acquisition during the training phases of RL. We demonstrate the expressiveness and effectiveness of pRMs through experiments in both finite and continuous physical environments, illustrating that incorporating pRMs significantly improves learning efficiency across several control tasks.
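作为背景,下面用几十行代码勾勒"奖励机"的一般形态:一个在高层事件上做状态迁移、并沿迁移输出标量奖励的有限状态机,从而表达非马尔可夫奖励。这只是对 RM 概念的最小示意,并非论文中 pRM 的具体构造;事件集合与奖励数值均为假设。

```python
class RewardMachine:
    """最小化的奖励机示意:状态在观察到的高层事件(标签)上迁移,
    每条迁移附带一个标量奖励,从而表达依赖历史的非马尔可夫奖励。"""

    def __init__(self, transitions, initial_state):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        if (self.state, event) in self.transitions:
            self.state, reward = self.transitions[(self.state, event)]
            return reward
        return 0.0  # 未定义的事件不改变状态、不给奖励

# 假设的任务:必须先到达 A 区,再到达 B 区,才能获得最终奖励
rm = RewardMachine(
    transitions={
        ("u0", "at_A"): ("u1", 0.1),    # 完成子目标,给少量塑造奖励
        ("u1", "at_B"): ("u_acc", 1.0), # 按顺序完成,给最终奖励
    },
    initial_state="u0",
)
print(rm.step("at_B"))  # 0.0:顺序不对,不给奖励
print(rm.step("at_A"))  # 0.1
print(rm.step("at_B"))  # 1.0
```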
【19】Systematic FAIRness Assessment of Open Voice Biomarker Datasets for Mental Health and Neurodegenerative Diseases
标题:面向心理健康和神经退行性疾病的开放语音生物标志物数据集的系统性FAIR评估
链接:https://arxiv.org/abs/2508.14089
作者:hapatra, Nihar R. Mahapatra
备注:To appear in the Proceedings of the 28th International Conference on Text, Speech and Dialogue (TSD 2025), Erlangen, Germany, August 25-28, 2025
摘要:语音生物标志物(人类产生的声学信号,如语音、咳嗽和呼吸)是有前景的工具,可用于对心理健康和神经退行性疾病进行可扩展、非侵入式的检测和监测。然而,公开可用数据集质量参差、可用性有限,仍制约着它们的临床应用。为弥补这一差距,我们对聚焦这些疾病领域的27个公开语音生物标志物数据集开展了首次系统性FAIR(可发现、可访问、可互操作、可重用)评估。我们采用FAIR数据成熟度模型和结构化的优先级加权评分方法,在子原则、原则和综合三个层面评估FAIR程度。分析显示,可发现性普遍较高,但在可访问性、可互操作性和可重用性方面存在明显的差异与薄弱环节。心理健康数据集的FAIR得分波动更大,而神经退行性疾病数据集则略为一致。存储库的选择也显著影响FAIR得分。为提升数据集质量和临床实用性,我们建议采用结构化的领域特定元数据标准,优先选择符合FAIR的存储库,并常规性地应用结构化FAIR评估框架。这些发现为改进数据集的互操作性和重用提供了可操作的指导,从而加速语音生物标志物技术的临床转化。
摘要:Voice biomarkers--human-generated acoustic signals such as speech, coughing, and breathing--are promising tools for scalable, non-invasive detection and monitoring of mental health and neurodegenerative diseases. Yet, their clinical adoption remains constrained by inconsistent quality and limited usability of publicly available datasets. To address this gap, we present the first systematic FAIR (Findable, Accessible, Interoperable, Reusable) evaluation of 27 publicly available voice biomarker datasets focused on these disease areas. Using the FAIR Data Maturity Model and a structured, priority-weighted scoring method, we assessed FAIRness at subprinciple, principle, and composite levels. Our analysis revealed consistently high Findability but substantial variability and weaknesses in Accessibility, Interoperability, and Reusability. Mental health datasets exhibited greater variability in FAIR scores, while neurodegenerative datasets were slightly more consistent. Repository choice also significantly influenced FAIRness scores. To enhance dataset quality and clinical utility, we recommend adopting structured, domain-specific metadata standards, prioritizing FAIR-compliant repositories, and routinely applying structured FAIR evaluation frameworks. These findings provide actionable guidance to improve dataset interoperability and reuse, thereby accelerating the clinical translation of voice biomarker technologies.
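下面的草图演示摘要所述"结构化、按优先级加权"的 FAIR 评分思路:对每条子原则打 0 到 1 分,按优先级权重聚合为原则级得分,再平均得到综合得分。其中子原则条目、权重与分值均为编造的示例,并非论文使用的实际评分表。

```python
# 假设的评分结构:{原则: [(子原则, 优先级权重, 得分0~1), ...]}
assessment = {
    "Findable":      [("F1 持久标识符", 3, 1.0), ("F2 丰富元数据", 2, 0.8)],
    "Accessible":    [("A1 标准协议可获取", 3, 0.6), ("A2 元数据长期可得", 1, 0.4)],
    "Interoperable": [("I1 通用表示格式", 2, 0.5)],
    "Reusable":      [("R1 清晰的许可协议", 3, 0.7)],
}

def principle_score(items):
    # 按优先级权重对子原则得分做加权平均
    total_w = sum(w for _, w, _ in items)
    return sum(w * s for _, w, s in items) / total_w

scores = {p: principle_score(items) for p, items in assessment.items()}
composite = sum(scores.values()) / len(scores)  # 综合层面:各原则得分的均值

for p, s in scores.items():
    print(f"{p}: {s:.2f}")
print(f"综合 FAIR 得分: {composite:.2f}")
```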
【20】KnowDR-REC: A Benchmark for Referring Expression Comprehension with Real-World Knowledge
标题:KnowDR-REC:基于现实世界知识的指代表达理解基准
链接:https://arxiv.org/abs/2508.14080
作者:Jin, Jingpei Wu, Tianpei Guo, Yiyi Niu, Weidong Zhou, Guoyang Liu
摘要:指代表达理解(REC)是一种流行的多模态任务,旨在基于给定的文本表达式在单张图像中准确检测目标对象。然而,受早期模型能力所限,传统REC基准要么仅依赖图像内线索,要么缺乏足够细粒度的实例标注,不足以评估多模态大型语言模型(MLLM)的推理能力。为弥补这一差距,我们提出一个新基准KnowDR-REC,具有三个关键特征:其一,它建立在现实世界知识之上,要求跨文本与图像的细粒度多模态推理;其二,数据集通过细粒度表达编辑精心构造负样本,用于评估模型的鲁棒性与抗幻觉能力;其三,我们引入三个新的评价指标,以系统地探查模型的内部推理过程。我们在KnowDR-REC上评估了16个最先进的多模态模型,实验结果表明现有MLLM在知识驱动的视觉定位任务上仍然吃力。此外,我们观察到MLLM中文本理解与视觉定位相互脱节:许多模型明显受记忆化捷径关联的影响,这严重左右了它们在本基准上的表现,并阻碍了真正的多模态推理。我们期望该基准能启发未来研究,开发更鲁棒、可解释且知识密集的视觉定位框架,推动面向复杂现实场景的更可靠、更稳健的多模态系统的发展。
摘要:Referring Expression Comprehension (REC) is a popular multimodal task that aims to accurately detect target objects within a single image based on a given textual expression. However, due to the limitations of earlier models, traditional REC benchmarks either rely solely on intra-image cues or lack sufficiently fine-grained instance annotations, making them inadequate for evaluating the reasoning capabilities of Multi-modal Large Language Models (MLLMs). To address this gap, we propose a new benchmark, KnowDR-REC, characterized by three key features: Firstly, it is built upon real-world knowledge, requiring fine-grained multimodal reasoning across text and image. Secondly, the dataset includes elaborately constructed negative samples via fine-grained expression editing, designed to evaluate a model's robustness and anti-hallucination ability. Lastly, we introduce three novel evaluation metrics to systematically explore the model's internal reasoning process. We evaluate 16 state-of-the-art multimodal models on KnowDR-REC, with experimental results showing that existing MLLMs still struggle with knowledge-driven visual grounding tasks. Furthermore, we observe a decoupling between textual understanding and visual grounding in MLLMs, where many models are significantly influenced by memorized shortcut correlations, which severely affect their behavior on our benchmark and hinder genuine multimodal reasoning. We anticipate that the proposed benchmark will inspire future research towards developing more robust, interpretable, and knowledge-intensive visual grounding frameworks, driving the development of more reliable and robust multimodal systems for complex real-world scenarios.
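作为背景,REC 任务通常以预测框与标注框的交并比(IoU)是否超过阈值来判定一次预测是否命中。下面给出这一常规评测方式的最小实现(注意:这只是 REC 的通用做法,并非本文提出的三个新指标)。

```python
def iou(box_a, box_b):
    """计算两个 (x1, y1, x2, y2) 边界框的交并比。"""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def rec_accuracy(predictions, ground_truths, threshold=0.5):
    """IoU >= 阈值记为命中,返回命中率。"""
    hits = sum(iou(p, g) >= threshold for p, g in zip(predictions, ground_truths))
    return hits / len(predictions)

print(rec_accuracy([(10, 10, 50, 50)], [(12, 8, 48, 52)]))  # 1.0,IoU 约 0.83
```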
【21】Label Smoothing is a Pragmatic Information Bottleneck
标题:标签平滑是一种实用的信息瓶颈
链接:https://arxiv.org/abs/2508.14077
作者:
备注:18 pages, 8 figures, published in Transactions on Machine Learning Research (TMLR), 2025
摘要:本研究从信息瓶颈的一种形式出发,重新审视标签平滑。在模型具有足够灵活性、且同一输入不存在相互冲突标签的假设下,我们从理论和实验两方面证明,通过标签平滑得到的模型输出会探索信息瓶颈的最优解。据此,标签平滑可以被解释为求解信息瓶颈的一种实用方法,且实现简单。作为一种信息瓶颈方法,我们的实验还表明,标签平滑具有如下性质:对不包含目标信息的因素,以及在以另一变量为条件时不再提供额外信息的因素,均不敏感。
摘要:This study revisits label smoothing via a form of information bottleneck. Under the assumption of sufficient model flexibility and no conflicting labels for the same input, we theoretically and experimentally demonstrate that the model output obtained through label smoothing explores the optimal solution of the information bottleneck. Based on this, label smoothing can be interpreted as a practical approach to the information bottleneck, enabling simple implementation. As an information bottleneck method, we experimentally show that label smoothing also exhibits the property of being insensitive to factors that do not contain information about the target, or to factors that provide no additional information about it when conditioned on another variable.
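作为背景,下面给出标签平滑本身的最小实现,示意其计算形式(与本文的信息瓶颈理论分析无关):把 one-hot 目标与均匀分布按 ε 混合,再计算交叉熵。

```python
import numpy as np

def smooth_labels(one_hot, epsilon=0.1):
    """标签平滑:目标分布 = (1-ε) * one-hot + ε * 均匀分布。"""
    k = one_hot.shape[-1]
    return (1.0 - epsilon) * one_hot + epsilon / k

def cross_entropy(probs, targets):
    return -np.sum(targets * np.log(probs + 1e-12), axis=-1)

one_hot = np.array([0.0, 1.0, 0.0])
probs = np.array([0.1, 0.8, 0.1])
print(cross_entropy(probs, one_hot))                 # 普通交叉熵
print(cross_entropy(probs, smooth_labels(one_hot)))  # 平滑目标下的交叉熵
```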
【22】The C-index Multiverse
标题:C指数多重宇宙
链接:https://arxiv.org/abs/2508.14821
作者: Sierra, Colin McLean, Peter S. Hall, Catalina A. Vallejos
备注:21 pages main text with 6 figures and 3 tables. 19 pages of supplementary material
摘要:量化事件发生时间(time-to-event)结果的样本外判别性能,是预测建模中模型评估与选择的基本步骤。一致性指数(concordance index,即C指数)是用于此目的的常用指标,随着机器学习方法的不断发展尤为如此。除了各种C指数估计量(如Harrell、Uno和Antolini版本)之间的差异之外,我们证明在现有的R和Python软件之间还存在一个"C指数多重宇宙":看似等价的实现可能给出不同的结果。这会损害可重复性,并使模型之间、研究之间的公平比较变得复杂。主要的变异来源包括平局处理和对删失的调整。此外,由于缺乏从生存分布汇总风险的标准化方法,还产生了另一个依赖输入类型的变异来源。我们在公开的乳腺癌数据和半合成示例上量化多种生存模型(从Cox比例风险模型到最近的深度学习方法)的预测性能,展示了C指数多重宇宙带来的后果。我们的工作强调需要更好的报告规范,以提高透明度和可重复性。本文旨在成为一份有用的指南,帮助分析人员在这一多重宇宙中导航,提供统一的文档并指出现有软件的潜在陷阱。所有代码公开于:www.github.com/BBolosSierra/CindexMultiverse。
摘要:Quantifying out-of-sample discrimination performance for time-to-event outcomes is a fundamental step for model evaluation and selection in the context of predictive modelling. The concordance index, or C-index, is a widely used metric for this purpose, particularly with the growing development of machine learning methods. Beyond differences between proposed C-index estimators (e.g. Harrell's, Uno's and Antolini's), we demonstrate the existence of a C-index multiverse among available R and Python software, where seemingly equal implementations can yield different results. This can undermine reproducibility and complicate fair comparisons across models and studies. Key variation sources include tie handling and adjustment to censoring. Additionally, the absence of a standardised approach to summarising risk from survival distributions results in another source of variation dependent on input types. We demonstrate the consequences of the C-index multiverse when quantifying predictive performance for several survival models (from Cox proportional hazards to recent deep learning approaches) on publicly available breast cancer data, and semi-synthetic examples. Our work emphasises the need for better reporting to improve transparency and reproducibility. This article aims to be a useful guideline, helping analysts when navigating the multiverse, providing unified documentation and highlighting potential pitfalls of existing software. All code is publicly available at: www.github.com/BBolosSierra/CindexMultiverse.
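下面的草图给出 Harrell 式 C 指数的一个最小实现,并显式暴露摘要所说的"平局处理"这一变异来源:对预测风险相同的可比对,可计 0.5、计 0 或直接从分母中剔除,不同软件的默认选择不同,得到的 C 指数也随之不同。可比对按右删失数据的常规定义构造,属于本示例的假设,并非对任何具体软件包实现的复刻。

```python
def harrell_c_index(times, events, risks, ties="half"):
    """Harrell C 指数的最小实现。
    times: 随访时间; events: 1=事件发生, 0=删失; risks: 预测风险(越大越危险)。
    ties 控制风险打平的可比对的计分:'half'计0.5、'zero'计0、'drop'从分母剔除。"""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # 可比对:i 发生事件,且 j 的随访时间更长
            if events[i] == 1 and times[i] < times[j]:
                if risks[i] > risks[j]:
                    concordant += 1
                    comparable += 1
                elif risks[i] == risks[j]:
                    if ties == "half":
                        concordant += 0.5
                        comparable += 1
                    elif ties == "zero":
                        comparable += 1
                    # 'drop':打平的对不计入分母
                else:
                    comparable += 1
    return concordant / comparable if comparable else float("nan")

times = [5, 8, 8, 12]; events = [1, 1, 0, 1]; risks = [0.9, 0.5, 0.5, 0.5]
for mode in ("half", "zero", "drop"):
    # 同一份数据,三种平局处理给出 0.875、0.75 和 1.0
    print(mode, harrell_c_index(times, events, risks, ties=mode))
```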
【23】Activity Coefficient-based Channel Selection for Electroencephalogram: A Task-Independent Approach
标题:基于活动系数的脑电图通道选择:一种任务无关的方法
链接:https://arxiv.org/abs/2508.14060
作者:ndey, Arun Balasubramanian, Debasis Samanta
摘要:脑电图(EEG)信号因其无创、低成本且采集过程相对简单,在脑机接口(BCI)应用中得到广泛采用。对更高空间分辨率的需求(尤其是在临床环境中)推动了高密度电极阵列的发展。然而,通道数量的增加带来了跨通道干扰和计算开销等挑战。为解决这些问题,现代BCI系统通常采用通道选择算法;但现有方法通常针对特定任务,每遇到新应用都需要重新优化。本工作提出一种任务无关的通道选择方法:基于活动系数的通道选择(ACCS)。它使用一种称为通道活动系数(CAC)的新指标,依据活动水平量化通道的效用。通过选择CAC排名前16的通道,ACCS在多类分类准确率上实现了最高34.97%的提升。与传统方法不同,ACCS识别出一组独立于下游任务或模型、可复用的信息性通道,因而能高度适配各种基于EEG的应用。
摘要:Electroencephalogram (EEG) signals have gained widespread adoption in brain-computer interface (BCI) applications due to their non-invasive, low-cost, and relatively simple acquisition process. The demand for higher spatial resolution, particularly in clinical settings, has led to the development of high-density electrode arrays. However, increasing the number of channels introduces challenges such as cross-channel interference and computational overhead. To address these issues, modern BCI systems often employ channel selection algorithms. Existing methods, however, are typically task-specific and require re-optimization for each new application. This work proposes a task-agnostic channel selection method, Activity Coefficient-based Channel Selection (ACCS), which uses a novel metric called the Channel Activity Coefficient (CAC) to quantify channel utility based on activity levels. By selecting the top 16 channels ranked by CAC, ACCS achieves up to 34.97% improvement in multi-class classification accuracy. Unlike traditional approaches, ACCS identifies a reusable set of informative channels independent of the downstream task or model, making it highly adaptable for diverse EEG-based applications.
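下面给出一个示意性草图:为每个 EEG 通道计算一个活动性得分并选出得分最高的 16 个通道。摘要未给出 CAC 的具体定义,这里暂以各通道信号方差作为占位指标,仅为说明选择流程的假设性实现。

```python
import numpy as np

def channel_activity_scores(eeg):
    """eeg: (n_channels, n_samples) 数组。
    以方差作为占位的"活动性"指标;真实的 CAC 定义请参考原论文。"""
    return eeg.var(axis=1)

def select_top_channels(eeg, k=16):
    scores = channel_activity_scores(eeg)
    top = np.argsort(scores)[::-1][:k]  # 得分从高到低取前 k 个通道索引
    return np.sort(top)

rng = np.random.default_rng(0)
eeg = rng.standard_normal((64, 1000))  # 模拟 64 通道、1000 个采样点
print(select_top_channels(eeg, k=16))
```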
机器翻译由腾讯交互翻译提供,仅供参考
点击“阅读原文”获取带摘要的学术速递