
机器学习学术速递[8.15]

arXiv每日学术速递

点击阅读原文访问arxivdaily.com,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏等功能!


cs.LG 方向,今日共计140篇


大模型相关(16篇)

【1】A Survey on Diffusion Language Models
标题:扩散语言模型综述
链接:https://arxiv.org/abs/2508.10875

作者:, Mingda Chen, Bowei Guo, Zhiqiang Shen
摘要:扩散语言模型(DLM)正在迅速崛起,成为占主导地位的自回归(AR)范式的一个强大且有前途的替代方案。DLM通过迭代去噪过程并行生成token,在降低推理延迟和捕获双向上下文方面具有固有优势,从而能够对生成过程进行细粒度控制。最近的进展使DLM在实现数倍加速的同时,表现出与自回归模型相当的性能,使其成为各种自然语言处理任务中有吸引力的选择。在本综述中,我们全面概述了当前的DLM格局:追溯其演变及其与自回归、掩码语言模型等其他范式的关系,并涵盖基本原理与最先进的模型。我们的工作提供了一个最新、全面的分类体系,并深入分析了从预训练策略到先进后训练方法的当前技术。本综述的另一个贡献是对DLM推理策略与优化的全面回顾,包括解码并行性、缓存机制和生成质量的改进。我们还重点介绍了DLM多模态扩展的最新方法,并描绘了它们在各种实际场景中的应用。此外,我们讨论了DLM的局限性与挑战,包括效率、长序列处理和基础设施要求,同时概述了未来的研究方向,以维持这一快速发展领域的进展。项目GitHub地址:https://github.com/VILA-Lab/Awesome-DLMs。
摘要:Diffusion Language Models (DLMs) are rapidly emerging as a powerful and promising alternative to the dominant autoregressive (AR) paradigm. By generating tokens in parallel through an iterative denoising process, DLMs possess inherent advantages in reducing inference latency and capturing bidirectional context, thereby enabling fine-grained control over the generation process. While achieving a several-fold speed-up, recent advancements have allowed DLMs to show performance comparable to their autoregressive counterparts, making them a compelling choice for various natural language processing tasks. In this survey, we provide a holistic overview of the current DLM landscape. We trace its evolution and relationship with other paradigms, such as autoregressive and masked language models, and cover both foundational principles and state-of-the-art models. Our work offers an up-to-date, comprehensive taxonomy and an in-depth analysis of current techniques, from pre-training strategies to advanced post-training methods. Another contribution of this survey is a thorough review of DLM inference strategies and optimizations, including improvements in decoding parallelism, caching mechanisms, and generation quality. We also highlight the latest approaches to multimodal extensions of DLMs and delineate their applications across various practical scenarios. Furthermore, our discussion addresses the limitations and challenges of DLMs, including efficiency, long-sequence handling, and infrastructure requirements, while outlining future research directions to sustain progress in this rapidly evolving field. Project GitHub is available at https://github.com/VILA-Lab/Awesome-DLMs.


【2】Reinforced Language Models for Sequential Decision Making
标题:用于顺序决策的强化语言模型
链接:https://arxiv.org/abs/2508.10839

作者:s, Vahid Yazdanpanah, Sebastian Stein
摘要:大型语言模型(LLM)显示出作为顺序决策智能体的潜力,但由于依赖大型、计算昂贵的模型,其应用往往受到限制。这就需要改进更小的模型,然而现有的后训练方法是为单轮交互设计的,无法处理多步智能体任务中的信用分配问题。为了解决这个问题,我们引入了多步组相对策略优化(MS-GRPO),这是一种用于后训练LLM智能体的新算法,基于形式化的文本中介随机博弈(TSMG)和语言-智能体策略(LAP)框架。在信用分配上,MS-GRPO将整条轨迹的累计奖励归因于该轨迹中的每个单独步骤。我们还为该算法补充了一种新的绝对优势加权轨迹采样策略,并证明其能提升训练性能。我们通过在Snake和Frozen Lake上对30亿参数模型进行后训练来评估我们的方法。实验表明,该方法在提高决策性能方面是有效的:经过后训练的3B参数模型在Frozen Lake任务上的性能比72B参数基线高出50%。这项工作表明,在使用LLM构建顺序决策智能体时,有针对性的后训练是依赖模型规模之外的一种实用且高效的替代方案。
摘要:Large Language Models (LLMs) show potential as sequential decision-making agents, but their application is often limited due to a reliance on large, computationally expensive models. This creates a need to improve smaller models, yet existing post-training methods are designed for single-turn interactions and cannot handle credit assignment in multi-step agentic tasks. To address this, we introduce Multi-Step Group-Relative Policy Optimization (MS-GRPO), a new algorithm for post-training LLM agents, grounded in formal Text-Mediated Stochastic Game (TSMG) and Language-Agent Policy (LAP) frameworks. For credit assignment, MS-GRPO attributes the entire cumulative episode reward to each individual episode step. We supplement this algorithm with a novel absolute-advantage-weighted episode sampling strategy that we show improves training performance. We evaluate our approach by post-training a 3-billion parameter model on Snake and Frozen Lake. Our experiments demonstrate that the method is effective in improving decision-making performance: our post-trained 3B parameter model outperforms a 72B parameter baseline by 50% on the Frozen Lake task. This work demonstrates that targeted post-training is a practical and efficient alternative to relying on model scale for creating sequential decision-making agents using LLMs.
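下面给出一个示意性的Python草图(并非论文官方实现,奖励数值与超参数均为假设),用于说明摘要中描述的两个要点:将整条轨迹的累计奖励做组相对归一化后,作为同一优势值分配给该轨迹的每个时间步;以及按绝对优势加权采样轨迹。

```python
# 示意性草图:MS-GRPO 风格的信用分配与绝对优势加权采样(假设性实现)
import numpy as np

def ms_grpo_advantages(episode_rewards, episode_lengths, eps=1e-8):
    """episode_rewards: 同一问题下一组轨迹的累计奖励; episode_lengths: 各轨迹步数。"""
    returns = np.asarray(episode_rewards, dtype=np.float64)
    # 组相对基线: 减去组均值、除以组标准差(GRPO 风格的归一化)
    adv = (returns - returns.mean()) / (returns.std() + eps)
    # 把整条轨迹的优势复制到该轨迹的每个时间步
    per_step = [np.full(int(T), a) for a, T in zip(adv, episode_lengths)]
    return adv, per_step

def sample_episodes(adv, k, rng=np.random.default_rng(0)):
    """按 |优势| 加权无放回地采样 k 条轨迹(摘要提到的采样策略的一种假设形式)。"""
    w = np.abs(adv) + 1e-8
    return rng.choice(len(adv), size=k, replace=False, p=w / w.sum())

adv, step_adv = ms_grpo_advantages([1.0, 0.0, 0.0, 1.0], [5, 7, 6, 4])
picked = sample_episodes(adv, k=2)
```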


【3】Advancing Autonomous Incident Response: Leveraging LLMs and Cyber Threat Intelligence
标题:推进自主事件响应:利用LLM和网络威胁情报
链接:https://arxiv.org/abs/2508.10677

作者:lache, Abdelaziz Amara Korba, Amdjed Mokhtari, Horea Moldovan, Yacine Ghamri-Doudane
摘要:有效的事件响应(IR)对于缓解网络威胁至关重要,但安全团队正被警报疲劳、高误报率和海量非结构化网络威胁情报(CTI)文档所淹没。虽然CTI在丰富安全运营方面潜力巨大,但其内容庞杂且分散,使得人工分析耗时且资源密集。为弥合这一差距,我们引入了一种新的基于检索增强生成(RAG)的框架,利用大型语言模型(LLM)通过动态检索并整合CTI来自动化和增强IR。我们的方法引入了一种混合检索机制,将CTI向量数据库中基于NLP的相似性搜索与对外部CTI平台的标准化查询相结合,实现安全警报的上下文感知富化。随后,由LLM驱动的响应生成模块利用增强后的情报,制定精确、可操作且与上下文相关的事件缓解策略。我们提出了一种双重评估范式,其中使用辅助LLM的自动评估由网络安全专家系统地交叉验证。对真实世界和模拟警报的实证验证表明,我们的方法提高了IR的准确性、情境化程度和效率,减轻了分析人员的工作量并降低了响应延迟。这项工作强调了LLM驱动的CTI融合在推进自主安全运营以及为智能、自适应网络安全框架奠定基础方面的潜力。
摘要 :Effective incident response (IR) is critical for mitigating cyber threats, yet security teams are overwhelmed by alert fatigue, high false-positive rates, and the vast volume of unstructured Cyber Threat Intelligence (CTI) documents. While CTI holds immense potential for enriching security operations, its extensive and fragmented nature makes manual analysis time-consuming and resource-intensive. To bridge this gap, we introduce a novel Retrieval-Augmented Generation (RAG)-based framework that leverages Large Language Models (LLMs) to automate and enhance IR by integrating dynamically retrieved CTI. Our approach introduces a hybrid retrieval mechanism that combines NLP-based similarity searches within a CTI vector database with standardized queries to external CTI platforms, facilitating context-aware enrichment of security alerts. The augmented intelligence is then leveraged by an LLM-powered response generation module, which formulates precise, actionable, and contextually relevant incident mitigation strategies. We propose a dual evaluation paradigm, wherein automated assessment using an auxiliary LLM is systematically cross-validated by cybersecurity experts. Empirical validation on real-world and simulated alerts demonstrates that our approach enhances the accuracy, contextualization, and efficiency of IR, alleviating analyst workload and reducing response latency. This work underscores the potential of LLM-driven CTI fusion in advancing autonomous security operations and establishing a foundation for intelligent, adaptive cybersecurity frameworks.


【4】Technical Report: Facilitating the Adoption of Causal Inference Methods Through LLM-Empowered Co-Pilot
标题:技术报告:通过LLM赋能的Co-Pilot促进因果推理方法的采用
链接:https://arxiv.org/abs/2508.10581

作者:rrevoets, Julianna Piskorz, Robert Davis, Harry Amad, Jim Weatherall, Mihaela van der Schaar
摘要:从观察数据估计治疗效果(TE)在从医疗保健、经济学到公共政策的许多领域中都是一项关键而复杂的任务。虽然机器学习和因果推理的最新进展已经产生了强大的估计技术,但由于需要在因果假设、调整策略和模型选择方面具备深厚的专业知识,它们的采用仍然有限。在本文中,我们介绍了CATE-B,一个开源的副驾驶(co-pilot)系统,它在智能体框架内使用大型语言模型(LLM)来引导用户完成治疗效果估计的端到端过程。CATE-B协助用户:(i)通过因果发现和基于LLM的边定向构建结构因果模型;(ii)通过新颖的最小不确定性调整集准则识别鲁棒的调整集;(iii)选择适合因果结构和数据集特征的回归方法。为了鼓励可重复性和评估,我们发布了一套跨越不同领域和因果复杂度的基准任务。通过将因果推理与智能交互式辅助相结合,CATE-B降低了严格因果分析的门槛,并为自动化治疗效果估计中的一类新基准奠定了基础。
摘要:Estimating treatment effects (TE) from observational data is a critical yet complex task in many fields, from healthcare and economics to public policy. While recent advances in machine learning and causal inference have produced powerful estimation techniques, their adoption remains limited due to the need for deep expertise in causal assumptions, adjustment strategies, and model selection. In this paper, we introduce CATE-B, an open-source co-pilot system that uses large language models (LLMs) within an agentic framework to guide users through the end-to-end process of treatment effect estimation. CATE-B assists in (i) constructing a structural causal model via causal discovery and LLM-based edge orientation, (ii) identifying robust adjustment sets through a novel Minimal Uncertainty Adjustment Set criterion, and (iii) selecting appropriate regression methods tailored to the causal structure and dataset characteristics. To encourage reproducibility and evaluation, we release a suite of benchmark tasks spanning diverse domains and causal complexities. By combining causal inference with intelligent, interactive assistance, CATE-B lowers the barrier to rigorous causal analysis and lays the foundation for a new class of benchmarks in automated treatment effect estimation.


【5】Driving Accurate Allergen Prediction with Protein Language Models and Generalization-Focused Evaluation
标题:利用蛋白质语言模型和以泛化为中心的评估推动准确的过敏原预测
链接:https://arxiv.org/abs/2508.10541

作者:ng-Hei Wong, Joshua Mincheol Kim, Sin-Hang Fung, Qing Xiong, Kelvin Fu-Kiu Ao, Junkang Wei, Ran Wang, Dan Michelle Wang, Jingying Zhou, Bo Feng, Alfred Sze-Lok Cheng, Kevin Y. Yip, Stephen Kwok-Wing Tsui, Qin Cao
备注:59 pages, 5 main figures, 15 supplementary figures, 2 supplementary tables
摘要:过敏原通常是能够引发不良免疫反应的蛋白质,是重大的公共卫生挑战。为了准确识别过敏原蛋白,我们引入了Applm(Allergen Prediction with Protein Language Models),这是一个利用千亿参数xTrimoPGLM蛋白质语言模型的计算框架。我们表明,在一组与困难的真实场景高度相似的多样化任务中,Applm始终优于七种最先进的方法。这些任务包括识别训练集中缺乏相似实例的新型过敏原、在序列相似性很高的同源物中区分过敏原与非过敏原,以及评估仅使蛋白质序列发生微小变化的突变的功能后果。我们的分析证实,最初在一万亿个token上训练以捕获通用蛋白质序列特征的xTrimoPGLM,通过检测蛋白质序列之间的重要差异,对Applm的性能至关重要。除了将Applm作为开源软件提供外,我们还提供精心整理的基准数据集,以促进未来的研究。
摘要:Allergens, typically proteins capable of triggering adverse immune responses, represent a significant public health challenge. To accurately identify allergen proteins, we introduce Applm (Allergen Prediction with Protein Language Models), a computational framework that leverages the 100-billion parameter xTrimoPGLM protein language model. We show that Applm consistently outperforms seven state-of-the-art methods in a diverse set of tasks that closely resemble difficult real-world scenarios. These include identifying novel allergens that lack similar examples in the training set, differentiating between allergens and non-allergens among homologs with high sequence similarity, and assessing functional consequences of mutations that create few changes to the protein sequences. Our analysis confirms that xTrimoPGLM, originally trained on one trillion tokens to capture general protein sequence characteristics, is crucial for Applm's performance by detecting important differences among protein sequences. In addition to providing Applm as open-source software, we also provide our carefully curated benchmark datasets to facilitate future research.


【6】SC2Arena and StarEvolve: Benchmark and Self-Improvement Framework for LLMs in Complex Decision-Making Tasks
标题:SC2Arena和StarEvolve:复杂决策任务中LLM的基准与自我改进框架
链接:https://arxiv.org/abs/2508.10428

作者:en, Yaqing Wang, Ni Mu, Yao Luan, Runpeng Xie, Senhao Yang, Lexiang Wang, Hao Hu, Shuang Xu, Yiqin Yang, Bo Xu
摘要:在复杂决策中评估大型语言模型(LLM)对于提升人工智能的战略规划和实时适应能力至关重要。然而,像《星际争霸2》这样的任务的现有基准测试无法捕捉游戏的全部复杂性,例如完整的游戏上下文、多样化的动作空间和所有可玩种族。为了解决这一差距,我们提出了SC2Arena,这是一个完全支持所有可玩种族和低级动作空间的基准,并优化了基于文本的观测以应对空间推理挑战。作为补充,我们引入了StarEvolve,这是一个将战略规划与战术执行相结合的分层框架,通过在高质量对局数据上微调实现迭代自我纠正和持续改进。其关键组件包括用于分解对局的规划者-执行者-验证者(Planner-Executor-Verifier)结构,以及用于挑选高质量训练样本的评分系统。使用SC2Arena的全面分析为开发通才智能体提供了以往基准无法给出的宝贵见解。实验结果还表明,我们提出的StarEvolve在战略规划方面实现了卓越的性能。我们的代码、环境和算法均已公开。
摘要:Evaluating large language models (LLMs) in complex decision-making is essential for advancing AI's ability for strategic planning and real-time adaptation. However, existing benchmarks for tasks like StarCraft II fail to capture the game's full complexity, such as its complete game context, diverse action spaces, and all playable races. To address this gap, we present SC2Arena, a benchmark that fully supports all playable races, low-level action spaces, and optimizes text-based observations to tackle spatial reasoning challenges. Complementing this, we introduce StarEvolve, a hierarchical framework that integrates strategic planning with tactical execution, featuring iterative self-correction and continuous improvement via fine-tuning on high-quality gameplay data. Its key components include a Planner-Executor-Verifier structure to break down gameplay, and a scoring system for selecting high-quality training samples. Comprehensive analysis using SC2Arena provides valuable insights into developing generalist agents that were not possible with previous benchmarks. Experimental results also demonstrate that our proposed StarEvolve achieves superior performance in strategic planning. Our code, environment, and algorithms are publicly available.


【7】XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization
标题:XQuant:通过KV缓存重新实体化打破LLM推理的记忆墙
链接:https://arxiv.org/abs/2508.10395

作者:mar, Coleman Hooper, Minjae Lee, Haocheng Xi, Rishabh Tiwari, Wonjun Kang, Luca Manolache, Michael W. Mahoney, Kurt Keutzer, Amir Gholami
备注:24 pages
摘要:虽然LLM推理已成为许多下游应用的关键工作负载,但由于巨大的内存占用和带宽需求,高效推理LLM仍具挑战性。与此同时,过去几十年中计算能力的增长稳步超过了内存容量和带宽,这一趋势在现代GPU硬件中依然明显,并加剧了LLM推理的挑战。因此,新的算法不断涌现,以增加的计算换取减少的内存操作。为此,我们提出了XQuant,它利用了这一趋势,通过低位量化实现内存消耗数量级的降低,且相对于最先进的KV缓存量化方法具有显著的精度优势。我们通过量化并缓存层输入激活X(而不是使用标准的KV缓存),然后在推理过程中动态地重新计算Key和Value来实现这一点。与KV缓存相比,这可以立即节省2倍的内存。通过应用XQuant,与FP16基线相比,我们以小于0.1的困惑度退化实现了约7.7倍的内存节省。此外,我们的方法利用了X值在各层之间相似这一事实。基于这一观察,我们引入了XQuant-CL,它利用X嵌入的跨层相似性进行极致压缩。在不同模型上,XQuant-CL相对于FP16基线仅以0.01的困惑度退化实现了高达10倍的内存节省,并以0.1的困惑度退化实现了12.5倍的内存节省。XQuant利用硬件平台快速增长的计算能力来消除内存瓶颈,同时超越最先进的KV缓存量化方法,并在各种模型上实现接近FP16的精度。
摘要:Although LLM inference has emerged as a critical workload for many downstream applications, efficiently inferring LLMs is challenging due to the substantial memory footprint and bandwidth requirements. In parallel, compute capabilities have steadily outpaced both memory capacity and bandwidth over the last few decades, a trend that remains evident in modern GPU hardware and exacerbates the challenge of LLM inference. As such, new algorithms are emerging that trade increased computation for reduced memory operations. To that end, we present XQuant, which takes advantage of this trend, enabling an order-of-magnitude reduction in memory consumption through low-bit quantization with substantial accuracy benefits relative to state-of-the-art KV cache quantization methods. We accomplish this by quantizing and caching the layer input activations X, instead of using standard KV caching, and then rematerializing the Keys and Values on-the-fly during inference. This results in an immediate 2$\times$ memory savings compared to KV caching. By applying XQuant, we achieve up to $\sim 7.7\times$ memory savings with $<0.1$ perplexity degradation compared to the FP16 baseline. Furthermore, our approach leverages the fact that X values are similar across layers. Building on this observation, we introduce XQuant-CL, which exploits the cross-layer similarity in the X embeddings for extreme compression. Across different models, XQuant-CL attains up to 10$\times$ memory savings relative to the FP16 baseline with only 0.01 perplexity degradation, and 12.5$\times$ memory savings with only $0.1$ perplexity degradation. XQuant exploits the rapidly increasing compute capabilities of hardware platforms to eliminate the memory bottleneck, while surpassing state-of-the-art KV cache quantization methods and achieving near-FP16 accuracy across a wide range of models.
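下面是一个示意性的Python草图(并非论文官方实现,量化位宽、张量形状均为假设),用于说明摘要中的核心思路:缓存低位量化后的层输入激活X,解码时用投影权重即时重算Key/Value,而不是直接缓存KV。

```python
# 示意性草图: 缓存量化后的 X,并在推理时重计算 K、V(假设性实现)
import torch

def quantize_sym(x, n_bits=4):
    """简单的对称低位量化: 返回整数张量与缩放因子。"""
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().amax() / qmax + 1e-12
    return torch.clamp((x / scale).round(), -qmax, qmax).to(torch.int8), scale

def dequantize(q, scale):
    return q.float() * scale

def rematerialize_kv(q_x, scale, W_K, W_V):
    """从缓存的量化 X 即时重算 Key/Value; 相比分别缓存 K、V,只需存一份 X。"""
    x = dequantize(q_x, scale)          # [seq, d_model]
    return x @ W_K, x @ W_V

# 用法示例(随机权重仅作演示)
x = torch.randn(128, 512)
W_K, W_V = torch.randn(512, 512), torch.randn(512, 512)
q_x, s = quantize_sym(x)
K, V = rematerialize_kv(q_x, s, W_K, W_V)
```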


【8】A Vision-Language Pre-training Model-Guided Approach for Mitigating Backdoor Attacks in Federated Learning
标题:一种视觉语言预训练模型引导的方法用于缓解联邦学习中的后门攻击
链接:https://arxiv.org/abs/2508.10315

作者: Dongjue Wang, Jing Yu, Liehuang Zhu, Qi Wu
摘要:现有的联邦学习(FL)后门防御方法依赖于客户端数据分布均匀或存在干净服务器数据集的假设,限制了其实用性和有效性。在异构客户端数据分布下防御后门攻击、同时保持模型性能仍然是一个重大挑战。在本文中,我们提出了一个名为CLIP-Fed的FL后门防御框架,它利用视觉-语言预训练模型的zero-shot学习能力。通过整合聚合前和聚合后的防御策略,CLIP-Fed克服了非IID对防御有效性的限制。为了解决隐私问题并提高数据集对多样触发器的覆盖率,我们使用多模态大语言模型和频率分析来构建并增强服务器数据集,而无需任何客户端样本。为了解决由后门样本引起的类原型偏移,并消除触发模式与目标标签之间的相关性,CLIP-Fed使用原型对比损失和Kullback-Leibler散度在增强数据集上对齐全局模型与CLIP的知识。在代表性数据集上的大量实验验证了CLIP-Fed的有效性。与最先进的方法相比,CLIP-Fed在CIFAR-10和CIFAR-10-LT上分别将ASR平均降低2.03%和1.35%,同时将平均MA分别提高7.92%和0.48%。
摘要:Existing backdoor defense methods in Federated Learning (FL) rely on the assumption of homogeneous client data distributions or the availability of a clean serve dataset, which limits the practicality and effectiveness. Defending against backdoor attacks under heterogeneous client data distributions while preserving model performance remains a significant challenge. In this paper, we propose a FL backdoor defense framework named CLIP-Fed, which leverages the zero-shot learning capabilities of vision-language pre-training models. By integrating both pre-aggregation and post-aggregation defense strategies, CLIP-Fed overcomes the limitations of Non-IID imposed on defense effectiveness. To address privacy concerns and enhance the coverage of the dataset against diverse triggers, we construct and augment the server dataset using the multimodal large language model and frequency analysis without any client samples. To address class prototype deviations caused by backdoor samples and eliminate the correlation between trigger patterns and target labels, CLIP-Fed aligns the knowledge of the global model and CLIP on the augmented dataset using prototype contrastive loss and Kullback-Leibler divergence. Extensive experiments on representative datasets validate the effectiveness of CLIP-Fed. Compared to state-of-the-art methods, CLIP-Fed achieves an average reduction in ASR, i.e., 2.03\% on CIFAR-10 and 1.35\% on CIFAR-10-LT, while improving average MA by 7.92\% and 0.48\%, respectively.
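下面给出一个示意性的Python草图(非官方实现,温度等超参数为假设),说明摘要中提到的用KL散度将全局模型的预测分布对齐到CLIP zero-shot预测分布的常见做法。

```python
# 示意性草图: 以 CLIP 的 zero-shot 预测为教师分布的 KL 对齐损失
import torch
import torch.nn.functional as F

def kl_alignment_loss(global_logits, clip_logits, T=2.0):
    """global_logits, clip_logits: [batch, num_classes]; T 为假设的温度。"""
    p_student = F.log_softmax(global_logits / T, dim=-1)
    p_teacher = F.softmax(clip_logits / T, dim=-1)
    # batchmean 约定下的 KL(teacher || student),乘 T^2 保持梯度量级(蒸馏常用做法)
    return F.kl_div(p_student, p_teacher, reduction="batchmean") * (T * T)
```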


【9】Prompt-Response Semantic Divergence Metrics for Faithfulness Hallucination and Misalignment Detection in Large Language Models
标题:用于大型语言模型中忠实性幻觉与失配检测的提示-响应语义发散度量
链接:https://arxiv.org/abs/2508.10192

作者:erin
备注:24 pages, 3 figures
摘要:大型语言模型(LLM)的普及受到幻觉这一关键故障模式的挑战,即模型生成非事实、无意义或不忠实的文本。本文介绍了语义发散度量(SDM),这是一个用于检测忠实性幻觉(即LLM响应严重偏离输入上下文的事件)的新型轻量级框架。我们重点关注这类LLM错误的一种具体表现——虚构(confabulation),其定义为任意的、在语义上与用户查询不一致的响应。现有方法(如语义熵)通过测量单个固定提示下答案的多样性来检验任意性。我们的SDM框架在此基础上更加提示感知:我们通过测量响应在多个答案之间、以及在原始提示的多个语义等价改写之间的一致性,来检验更深层次的任意性。在方法上,我们对句子嵌入进行联合聚类,为提示和答案构建共享的主题空间。提示与响应之间主题共现的热力图可以被视为用户-机器对话的一种量化二维可视化。随后,我们计算一组信息论度量来衡量提示与响应之间的语义发散。我们的实用得分$\mathcal{S}_H$结合了Jensen-Shannon散度和Wasserstein距离来量化这种发散,得分越高表明越可能是忠实性幻觉。此外,我们发现KL散度KL(Answer $||$ Prompt)是语义探索(Semantic Exploration)的有力指标,是区分不同生成行为的关键信号。这些度量被进一步组合成语义盒(Semantic Box),这是一个用于对LLM响应类型进行分类的诊断框架,包括危险的、自信的虚构。
摘要:The proliferation of Large Language Models (LLMs) is challenged by hallucinations, critical failure modes where models generate non-factual, nonsensical or unfaithful text. This paper introduces Semantic Divergence Metrics (SDM), a novel lightweight framework for detecting Faithfulness Hallucinations -- events of severe deviations of LLMs responses from input contexts. We focus on a specific implementation of these LLM errors, confabulations, defined as responses that are arbitrary and semantically misaligned with the user's query. Existing methods like Semantic Entropy test for arbitrariness by measuring the diversity of answers to a single, fixed prompt. Our SDM framework improves upon this by being more prompt-aware: we test for a deeper form of arbitrariness by measuring response consistency not only across multiple answers but also across multiple, semantically-equivalent paraphrases of the original prompt. Methodologically, our approach uses joint clustering on sentence embeddings to create a shared topic space for prompts and answers. A heatmap of topic co-occurrences between prompts and responses can be viewed as a quantified two-dimensional visualization of the user-machine dialogue. We then compute a suite of information-theoretic metrics to measure the semantic divergence between prompts and responses. Our practical score, $\mathcal{S}_H$, combines the Jensen-Shannon divergence and Wasserstein distance to quantify this divergence, with a high score indicating a Faithfulness hallucination. Furthermore, we identify the KL divergence KL(Answer $||$ Prompt) as a powerful indicator of \textbf{Semantic Exploration}, a key signal for distinguishing different generative behaviors. These metrics are further combined into the Semantic Box, a diagnostic framework for classifying LLM response types, including the dangerous, confident confabulation.
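下面是一个示意性的Python草图(并非论文代码,权重alpha与分布输入均为假设),说明摘要中把Jensen-Shannon散度与Wasserstein距离组合成发散得分的基本做法:输入是提示与回答在同一组主题(聚类)上的概率分布。

```python
# 示意性草图: 提示/回答主题分布之间的 JS 散度 + Wasserstein 距离组合得分
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance

def divergence_score(p_prompt, p_answer, alpha=0.5):
    """p_prompt, p_answer: 同一主题空间上的概率分布; alpha 为假设的组合权重。"""
    p, q = np.asarray(p_prompt, float), np.asarray(p_answer, float)
    p, q = p / p.sum(), q / q.sum()
    js = jensenshannon(p, q, base=2) ** 2          # scipy 返回 JS 距离,平方得到散度
    topics = np.arange(len(p))                     # 把主题索引当作一维支撑集
    wd = wasserstein_distance(topics, topics, p, q)
    return alpha * js + (1 - alpha) * wd           # 得分越高,越可能是忠实性幻觉
```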


【10】Nested-ReFT: Efficient Reinforcement Learning for Large Language Model Fine-Tuning via Off-Policy Rollouts
标题:Nested-ReFT:通过离策略Rollout实现大型语言模型微调的高效强化学习
链接:https://arxiv.org/abs/2508.10123

作者:uillet, Yufei Cui, Boxing Chen, Audrey Durand, Prasanna Parthasarathi
摘要:LLM在数学推理等具有挑战性的领域中的高级推理,可以使用基于可验证奖励的强化微调(ReFT)来解决。在标准的ReFT框架中,行为模型为每个问题生成多个带答案的补全,然后由奖励函数对答案进行评分。虽然这类RL后训练方法在具有挑战性的推理领域中表现出显著的性能提升,但在训练期间通过多个推理步骤生成补全的计算成本使得训练开销不容小觑。为了解决这个问题,我们从离策略强化学习和投机解码中汲取灵感,引入了一个名为Nested-ReFT的新ReFT框架,其中目标模型的一部分层充当行为模型,在训练过程中生成离策略补全。行为模型在训练期间按批次动态跳层,相比标准ReFT框架降低了推理成本。我们的理论分析表明,Nested-ReFT产生方差受控的无偏梯度估计。我们的实证分析表明,在多个数学推理基准和不同模型规模上,以token/秒衡量的计算效率得到了提升。此外,我们还探索了三种偏差缓解变体,以尽量减少梯度更新中的离策略性,从而保持与基线ReFT相当的性能。
摘要:Advanced reasoning in LLMs on challenging domains like mathematical reasoning can be tackled using verifiable rewards based reinforced fine-tuning (ReFT). In standard ReFT frameworks, a behavior model generates multiple completions with answers per problem, for the answer to be then scored by a reward function. While such RL post-training methods demonstrate significant performance improvements across challenging reasoning domains, the computational cost of generating completions during training with multiple inference steps makes the training cost non-trivial. To address this, we draw inspiration from off-policy RL, and speculative decoding to introduce a novel ReFT framework, dubbed Nested-ReFT, where a subset of layers of the target model acts as the behavior model to generate off-policy completions during training. The behavior model configured with dynamic layer skipping per batch during training decreases the inference cost compared to the standard ReFT frameworks. Our theoretical analysis shows that Nested-ReFT yields unbiased gradient estimates with controlled variance. Our empirical analysis demonstrates improved computational efficiency measured as tokens/sec across multiple math reasoning benchmarks and model sizes. Additionally, we explore three variants of bias mitigation to minimize the off-policyness in the gradient updates that allows for maintaining performance that matches the baseline ReFT performance.
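下面是一个示意性的Python草图(非官方实现;full_model的embed/layers/head属性结构完全是假设的),仅用于说明"用跳过部分层的同一模型作为行为模型生成离策略补全"这一思路中的跳层前向传播。

```python
# 示意性草图: 按固定间隔保留 Transformer 层的"行为模型"前向传播(假设的模型结构)
import torch.nn as nn

class SkipLayerLM(nn.Module):
    def __init__(self, full_model, keep_every=2):
        super().__init__()
        self.embed = full_model.embed        # 假设目标模型暴露 embed / layers / head
        self.layers = full_model.layers      # 与目标模型共享权重的 nn.ModuleList
        self.head = full_model.head
        self.keep_every = keep_every

    def forward(self, input_ids):
        h = self.embed(input_ids)
        for i, layer in enumerate(self.layers):
            if i % self.keep_every == 0:     # 只经过一部分层,生成补全的推理成本更低
                h = layer(h)
        return self.head(h)
```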


【11】Less is More: Learning Graph Tasks with Just LLMs
标题:少即是多:仅用LLM学习图任务
链接:https://arxiv.org/abs/2508.10115

作者:ai, Kavitha Srinivas, Julian Dolby, Michael Katz, Horst Samulowitz, Shirin Sohrabi
摘要:对于大型语言模型(LLM)而言,在图上进行推理有助于解决许多问题。先前的工作试图通过研究如何最好地将图序列化为文本,以及通过组合GNN与LLM来改进LLM的图推理。然而,这些方法的优劣仍不清楚,因此我们通过实验回答以下研究问题:(1)LLM能否在没有专门图编码模型的情况下学会解决基本的图任务?(2)LLM能否将学到的解法泛化到未见过的图结构或任务?(3)学习图任务的各种竞争方法各有什么优点?我们表明,即使是小型LLM,也可以通过用指导性的思维链解答进行训练来学会解决图任务,并且这种训练无需专门的图编码器即可泛化到新的任务和图结构。
摘要:For large language models (LLMs), reasoning over graphs could help solve many problems. Prior work has tried to improve LLM graph reasoning by examining how best to serialize graphs as text and by combining GNNs and LLMs. However, the merits of such approaches remain unclear, so we empirically answer the following research questions: (1) Can LLMs learn to solve fundamental graph tasks without specialized graph encoding models?, (2) Can LLMs generalize learned solutions to unseen graph structures or tasks?, and (3) What are the merits of competing approaches to learn graph tasks? We show that even small LLMs can learn to solve graph tasks by training them with instructive chain-of-thought solutions, and this training generalizes, without specialized graph encoders, to new tasks and graph structures.


【12】Constrained Decoding of Diffusion LLMs with Context-Free Grammars
标题:具有上下文无关文法的扩散LLM的约束解码
链接:https://arxiv.org/abs/2508.10111

作者:dler, Jasper Dekoninck, Martin Vechev
摘要:大型语言模型(LLM)在不同领域表现出良好的性能。LLM的许多实际应用,如代码完成和结构化数据提取,都需要遵守形式语言指定的语法约束。然而,由于其概率性质,LLM输出不能保证遵守这些形式语言。先前的工作已经提出了约束解码作为一种手段,以限制LLM生成特定的形式语言。然而,当用于实际场景(例如生成形式正确的C++或JSON输出)时,现有的工作不适用于新兴的扩散LLM范式。在本文中,我们解决这一挑战,并提出了第一个约束解码方法的扩散模型,一个可以处理形式语言捕获的上下文无关的语法。我们首先将约束解码简化为更一般的加法填充问题,该问题询问部分输出是否可以完成为目标语言中的有效单词。该问题也自然地包含先前未解决的多区域填充约束解码。然后,我们减少这个问题的任务,决定是否目标语言和一个正规语言的交集是空的,并提出了一个有效的算法来解决它的上下文无关的语言。各种应用程序的实证结果,如C++代码填充和JSON中的结构化数据提取,表明我们的方法实现了近乎完美的语法正确性,同时始终保持或提高功能的正确性。重要的是,我们的效率优化确保了计算开销保持实用。
摘要:Large language models (LLMs) have shown promising performance across diverse domains. Many practical applications of LLMs, such as code completion and structured data extraction, require adherence to syntactic constraints specified by a formal language. Yet, due to their probabilistic nature, LLM output is not guaranteed to adhere to such formal languages. Prior work has proposed constrained decoding as a means to restrict LLM generation to particular formal languages. However, existing works are not applicable to the emerging paradigm of diffusion LLMs, when used in practical scenarios such as the generation of formally correct C++ or JSON output. In this paper we address this challenge and present the first constrained decoding method for diffusion models, one that can handle formal languages captured by context-free grammars. We begin by reducing constrained decoding to the more general additive infilling problem, which asks whether a partial output can be completed to a valid word in the target language. This problem also naturally subsumes the previously unaddressed multi-region infilling constrained decoding. We then reduce this problem to the task of deciding whether the intersection of the target language and a regular language is empty and present an efficient algorithm to solve it for context-free languages. Empirical results on various applications, such as C++ code infilling and structured data extraction in JSON, demonstrate that our method achieves near-perfect syntactic correctness while consistently preserving or improving functional correctness. Importantly, our efficiency optimizations ensure that the computational overhead remains practical.
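下面用一个玩具示例(并非论文算法,语言与词表均为假设)演示摘要中"加法填充"判定问题在最简单情形下的含义:以"合法括号串"这一上下文无关语言为例,判断一个部分输出(前缀)能否补全为合法词,并据此屏蔽会导致不可补全的候选token。

```python
# 玩具示例: 前缀可补全性判定 + 基于它的 token 屏蔽(约束解码的最简化形态)
def prefix_completable(prefix: str) -> bool:
    depth = 0
    for ch in prefix:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # 出现多余的右括号,任何后缀都无法修复
                return False
        else:
            return False           # 字母表之外的符号
    return True                    # 深度从未为负时,总可以用 ")" * depth 补全

def allowed_next_tokens(prefix: str, vocab=("(", ")")) -> list:
    # 约束解码的核心: 只保留使"前缀仍可补全"成立的候选 token
    return [t for t in vocab if prefix_completable(prefix + t)]

print(allowed_next_tokens("(()"))   # ['(', ')']
print(allowed_next_tokens("())"))   # []  -- 该前缀已不可补全
```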


【13】PREF: Reference-Free Evaluation of Personalised Text Generation in LLMs
标题:PREF:LLM中个性化文本生成的无参考评估
链接:https://arxiv.org/abs/2508.10028

作者:Hossein A. Rahmani, Bin Wu, Jerome Ramos, Emine Yilmaz, Aldo Lipani
备注:7 pages
摘要:个性化文本生成对于以用户为中心的信息系统至关重要,但大多数评估方法忽略了用户的个性。我们引入了PREF,一种个性化(Personalised)、无参考(Reference-free)的评估(Evaluation)框架(Framework),它在无需黄金个性化参考的情况下,联合衡量一般输出质量与用户特定的对齐程度。PREF以三步流水线运行:(1)覆盖阶段使用大型语言模型(LLM)生成全面的、特定于查询的指南,涵盖事实性、连贯性和完整性等通用标准;(2)偏好阶段利用目标用户的画像、陈述或推断的偏好以及上下文,对这些因素进行重排序并选择性增强,从而生成个性化的评估准则(rubric);(3)打分阶段应用LLM评判器根据该准则为候选答案评分,在确保基线充分性的同时捕捉主观优先级。这种覆盖与偏好的分离提高了鲁棒性、透明度和可复用性,并使较小的模型能够接近较大模型的个性化评估质量。在PrefEval基准(包括隐式偏好遵循任务)上的实验表明,与强基线相比,PREF实现了更高的准确性、更好的校准以及与人类判断更接近的一致性。通过实现可扩展、可解释且与用户对齐的评估,PREF为更可靠地评估和开发个性化语言生成系统奠定了基础。
摘要 :Personalised text generation is essential for user-centric information systems, yet most evaluation methods overlook the individuality of users. We introduce \textbf{PREF}, a \textbf{P}ersonalised \textbf{R}eference-free \textbf{E}valuation \textbf{F}ramework that jointly measures general output quality and user-specific alignment without requiring gold personalised references. PREF operates in a three-step pipeline: (1) a coverage stage uses a large language model (LLM) to generate a comprehensive, query-specific guideline covering universal criteria such as factuality, coherence, and completeness; (2) a preference stage re-ranks and selectively augments these factors using the target user's profile, stated or inferred preferences, and context, producing a personalised evaluation rubric; and (3) a scoring stage applies an LLM judge to rate candidate answers against this rubric, ensuring baseline adequacy while capturing subjective priorities. This separation of coverage from preference improves robustness, transparency, and reusability, and allows smaller models to approximate the personalised quality of larger ones. Experiments on the PrefEval benchmark, including implicit preference-following tasks, show that PREF achieves higher accuracy, better calibration, and closer alignment with human judgments than strong baselines. By enabling scalable, interpretable, and user-aligned evaluation, PREF lays the groundwork for more reliable assessment and development of personalised language generation systems.
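下面是一个示意性的Python骨架(并非官方实现;call_llm是假设的通用LLM调用函数,不对应任何真实库的API),仅用于说明摘要中"覆盖 -> 偏好 -> 打分"三个阶段的先后关系。

```python
# 示意性草图: PREF 三阶段流程的最小骨架(提示词与接口均为假设)
def pref_evaluate(query, answer, user_profile, call_llm):
    # (1) 覆盖阶段: 生成针对该查询的通用评估要点(事实性、连贯性、完整性等)
    guideline = call_llm(f"为以下查询列出事实性、连贯性、完整性等通用评估要点:\n{query}")
    # (2) 偏好阶段: 用用户画像对要点重排序并补充个性化标准,得到评估准则(rubric)
    rubric = call_llm(f"根据用户画像重排并扩充这些要点:\n画像:{user_profile}\n要点:{guideline}")
    # (3) 打分阶段: LLM 评判器按准则为候选答案打分
    return call_llm(f"按照以下准则为答案打1-10分,只输出数字:\n准则:{rubric}\n答案:{answer}")
```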


【14】SABER: Switchable and Balanced Training for Efficient LLM Reasoning
标题:SABER:可切换且平衡的训练,以实现高效的LLM推理
链接:https://arxiv.org/abs/2508.10026

作者: Yanjun Zhao, Jiaming Song, Shien He, Lusheng Zhang, Qiang Zhang, Tianjiao Li
摘要:由思维链推理赋能的大型语言模型(LLM)在复杂任务上取得了令人印象深刻的准确率,但当统一应用于所有问题时,会产生过高的推理成本和延迟。我们提出了SABER(Switchable and Balanced Training for Efficient LLM Reasoning),这是一个赋予LLM用户可控、受token预算约束推理能力的强化学习框架。SABER首先分析每个训练样本在基础模型下的思维token用量,并将其分配到预定义的预算层级之一。在微调期间,模型由系统提示和长度感知奖励引导,以遵守其分配的预算。与此同时,我们加入无思考(no-think)样例,以确保即使显式推理被关闭,模型也能保持可靠。SABER还支持四种离散推理模式——NoThink、FastThink、CoreThink和DeepThink,从而在延迟和推理深度之间实现灵活的权衡。在数学推理(MATH、GSM8K)、代码生成(MBPP)和逻辑推理(LiveBench-Reasoning)上的广泛评估表明,SABER在紧张预算下实现了高精度、平滑的性能退化以及有效的跨规模和跨领域泛化。特别是,与基础模型相比,SABER-FastThink在MATH基准上将推理长度减少了65.4%,并带来3.6%的准确率提升。
摘要:Large language models (LLMs) empowered by chain-of-thought reasoning have achieved impressive accuracy on complex tasks but suffer from excessive inference costs and latency when applied uniformly to all problems. We propose SABER (Switchable and Balanced Training for Efficient LLM Reasoning), a reinforcement learning framework that endows LLMs with user-controllable, token-budgeted reasoning. SABER first profiles each training example's base-model thinking token usage and assigns it to one of the predefined budget tiers. During fine-tuning, the model is guided by system prompts and length-aware rewards to respect its assigned budget. In parallel, we incorporate no-think examples to ensure the model remains reliable even when explicit reasoning is turned off. SABER further supports four discrete inference modes - NoThink, FastThink, CoreThink, and DeepThink, enabling flexible trade-offs between latency and reasoning depth. Extensive evaluations on math reasoning (MATH, GSM8K), code generation (MBPP), and logical reasoning (LiveBench-Reasoning) demonstrate that SABER achieves high accuracy under tight budgets, graceful degradation, and effective cross-scale and cross-domain generalization. In particular, SABER-FastThink cuts reasoning length by 65.4% and yields a 3.6% accuracy gain compared with the base model on the MATH benchmark.
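下面是一个示意性的Python草图(非官方实现,预算数值与惩罚系数均为假设),说明摘要中"长度感知奖励"的一种最简形式:答案正确得到基础奖励,超出所分配思维token预算的部分按token扣分。

```python
# 示意性草图: 分层预算 + 长度感知奖励(数值均为假设)
BUDGET_TIERS = {"NoThink": 0, "FastThink": 256, "CoreThink": 1024, "DeepThink": 4096}

def length_aware_reward(is_correct: bool, think_tokens: int, mode: str,
                        penalty_per_token: float = 1e-3) -> float:
    budget = BUDGET_TIERS[mode]
    overflow = max(0, think_tokens - budget)          # 只惩罚超出预算的部分
    return (1.0 if is_correct else 0.0) - penalty_per_token * overflow

print(length_aware_reward(True, 300, "FastThink"))    # 1.0 - 0.044 = 0.956
```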


【15】AutoGeTS: Knowledge-based Automated Generation of Text Synthetics for Improving Text Classification
标题:AutoGeTS:基于知识的文本合成自动生成,用于改进文本分类
链接:https://arxiv.org/abs/2508.10000

作者:ue, Yuanzhe Jin, Adrian Carrasco-Revilla, Joyraj Chakraborty, Min Chen
摘要:在为现实世界的应用开发文本分类模型时,一个主要的挑战是难以为所有文本类收集足够的数据。在这项工作中,我们通过利用大型语言模型(LLM)来生成合成数据,并使用这些数据来提高模型的性能,而无需等待收集和标记更多的真实数据来解决这一挑战。作为一个LLM生成不同的合成数据,以响应不同的输入示例,我们制定了一个自动化的工作流程,搜索输入的例子,导致更“有效”的合成数据,以改善有关模型。我们研究了三种搜索策略与广泛的实验,并使用实验结果通知集成算法,选择一个搜索策略,根据类的特点。我们进一步的实验表明,这种集成方法比我们使用LLM改进分类模型的自动化工作流程中的每个单独策略更有效。
摘要:When developing text classification models for real world applications, one major challenge is the difficulty to collect sufficient data for all text classes. In this work, we address this challenge by utilizing large language models (LLMs) to generate synthetic data and using such data to improve the performance of the models without waiting for more real data to be collected and labelled. As an LLM generates different synthetic data in response to different input examples, we formulate an automated workflow, which searches for input examples that lead to more ``effective'' synthetic data for improving the model concerned. We study three search strategies with an extensive set of experiments, and use experiment results to inform an ensemble algorithm that selects a search strategy according to the characteristics of a class. Our further experiments demonstrate that this ensemble approach is more effective than each individual strategy in our automated workflow for improving classification models using LLMs.


【16】XFacta: Contemporary, Real-World Dataset and Evaluation for Multimodal Misinformation Detection with Multimodal LLMs
标题:XFacta:使用多模式LLM进行多模式错误信息检测的当代现实世界数据集和评估
链接:https://arxiv.org/abs/2508.09999

作者:ao, Zeyu Han, Yuhan Wang, Huaizu Jiang
备注:For associated code and dataset, see this https URL
摘要:多模态错误信息在社交媒体上的快速传播需要更有效和更强大的检测方法。利用多模态大型语言模型(MLLM)的最新进展已经显示出解决这一挑战的潜力。然而,目前还不清楚现有方法的瓶颈究竟在哪里(证据检索与推理),阻碍了该领域的进一步发展。在数据集方面,现有的基准包含过时的事件,由于与当代社交媒体场景的差异导致评估偏差,因为MLLM可以简单地记住这些事件,或者人工合成,无法反映真实世界的错误信息模式。此外,它缺乏对基于MLLM的模型设计策略的全面分析。为了解决这些问题,我们引入了XFacta,这是一个当代的真实世界数据集,更适合评估基于MLLM的检测器。我们系统地评估各种基于MLLM的错误信息检测策略,评估不同架构和规模的模型,以及对现有检测方法的基准测试。在这些分析的基础上,我们进一步启用了一个半自动检测的循环框架,不断更新XFacta的新内容,以保持其当代的相关性。我们的分析为推进多模态错误信息检测领域提供了有价值的见解和实践。代码和数据已经发布。
摘要:The rapid spread of multimodal misinformation on social media calls for more effective and robust detection methods. Recent advances leveraging multimodal large language models (MLLMs) have shown the potential in addressing this challenge. However, it remains unclear exactly where the bottleneck of existing approaches lies (evidence retrieval v.s. reasoning), hindering the further advances in this field. On the dataset side, existing benchmarks either contain outdated events, leading to evaluation bias due to discrepancies with contemporary social media scenarios as MLLMs can simply memorize these events, or artificially synthetic, failing to reflect real-world misinformation patterns. Additionally, it lacks comprehensive analyses of MLLM-based model design strategies. To address these issues, we introduce XFacta, a contemporary, real-world dataset that is better suited for evaluating MLLM-based detectors. We systematically evaluate various MLLM-based misinformation detection strategies, assessing models across different architectures and scales, as well as benchmarking against existing detection methods. Building on these analyses, we further enable a semi-automatic detection-in-the-loop framework that continuously updates XFacta with new content to maintain its contemporary relevance. Our analysis provides valuable insights and practices for advancing the field of multimodal misinformation detection. The code and data have been released.


Graph相关(图学习|图神经网络|图优化等)(5篇)

【1】Enhancing Fairness in Autoencoders for Node-Level Graph Anomaly Detection
标题:增强自动编码器的公平性以实现节点级图异常检测
链接:https://arxiv.org/abs/2508.10785

作者:ng, Yuchen Song, Sheng'en Li, Dongmian Zou
备注:Accepted in ECAI-2025
摘要:图异常检测(GAD)已经成为跨多个领域的一项日益重要的任务。随着图神经网络(GNN)的快速发展,GAD方法的性能得到了显著提升。然而,GAD中的公平性考量在很大程度上仍未得到充分探索。事实上,基于GNN的GAD模型可能继承并放大训练数据中存在的偏差,从而导致不公平的结果。虽然现有工作致力于开发公平的GNN,但大多数方法针对节点分类任务,其模型通常依赖于简单的层结构,而非异常检测中使用最广泛的基于自编码器的结构。为了解决基于自编码器的GAD模型中的公平性问题,我们提出了DisEntangled Counterfactual Adversarial Fair(DECAF)-GAD,这是一个在保持GAD性能的同时缓解偏差的框架。具体来说,我们引入结构因果模型(SCM)将敏感属性从学习到的表示中解耦。基于这个因果框架,我们设计了专门的自编码器架构以及公平性引导的损失函数。通过在合成数据集和真实数据集上的广泛实验,我们证明DECAF-GAD不仅实现了有竞争力的异常检测性能,而且与基线GAD方法相比显著提升了公平性指标。我们的代码可在https://github.com/Tlhey/decaf_code上获得。
摘要:Graph anomaly detection (GAD) has become an increasingly important task across various domains. With the rapid development of graph neural networks (GNNs), GAD methods have achieved significant performance improvements. However, fairness considerations in GAD remain largely underexplored. Indeed, GNN-based GAD models can inherit and amplify biases present in training data, potentially leading to unfair outcomes. While existing efforts have focused on developing fair GNNs, most approaches target node classification tasks, where models often rely on simple layer architectures rather than autoencoder-based structures, which are the most widely used architectures for anomaly detection. To address fairness in autoencoder-based GAD models, we propose \textbf{D}is\textbf{E}ntangled \textbf{C}ounterfactual \textbf{A}dversarial \textbf{F}air (DECAF)-GAD, a framework that alleviates bias while preserving GAD performance. Specifically, we introduce a structural causal model (SCM) to disentangle sensitive attributes from learned representations. Based on this causal framework, we formulate a specialized autoencoder architecture along with a fairness-guided loss function. Through extensive experiments on both synthetic and real-world datasets, we demonstrate that DECAF-GAD not only achieves competitive anomaly detection performance but also significantly enhances fairness metrics compared to baseline GAD methods. Our code is available at https://github.com/Tlhey/decaf_code.


【2】Graph Learning via Logic-Based Weisfeiler-Leman Variants and Tabularization
标题:通过基于逻辑的Weisfeiler-Leman变体和表格化的图学习
链接:https://arxiv.org/abs/2508.10651

作者:kkola, Tomi Janhunen, Antti Kuusisto, Magdalena Ortiz, Matias Selin, Mantas Šimkus
摘要:我们提出了一种新的方法,图形分类的基础上,表格化图形数据通过Weisfeiler-Leman算法的变种,然后应用方法的表格数据。我们调查了一个全面的类Weisfeiler-Leman变体通过修改基本的逻辑框架,并建立一个精确的理论表征其表达能力。然后,我们在跨越一系列不同领域的12个基准数据集上测试了两个选定的变体。实验表明,我们的方法与最先进的图神经网络和图内核的准确性相匹配,同时根据数据集的不同,具有更高的时间或内存效率。我们还简要讨论了直接从图数据集中提取可解释的模态逻辑公式。
摘要:We present a novel approach for graph classification based on tabularizing graph data via variants of the Weisfeiler-Leman algorithm and then applying methods for tabular data. We investigate a comprehensive class of Weisfeiler-Leman variants obtained by modifying the underlying logical framework and establish a precise theoretical characterization of their expressive power. We then test two selected variants on twelve benchmark datasets that span a range of different domains. The experiments demonstrate that our approach matches the accuracy of state-of-the-art graph neural networks and graph kernels while being more time or memory efficient, depending on the dataset. We also briefly discuss directly extracting interpretable modal logic formulas from graph datasets.
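下面是一个示意性的Python草图(并非论文实现,轮数与图的表示均为假设),演示"通过WL变体把图表格化"思路中最基础的一种做法:经典1-WL颜色细化,跨图共享颜色表,再把每张图的颜色直方图当作固定长度的表格特征。

```python
# 示意性草图: 1-WL 颜色细化 + 颜色直方图表格化(可交给任意表格数据学习器)
from collections import Counter

def wl_refine(adj, rounds, palette):
    """adj: {节点: [邻居,...]}; palette: 跨图共享的"签名->颜色"映射,保证颜色可比。"""
    colors = {v: 0 for v in adj}                 # 初始颜色(无标签图)
    bag = Counter(colors.values())
    for _ in range(rounds):
        new_colors = {}
        for v in adj:
            sig = (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            if sig not in palette:
                palette[sig] = len(palette) + 1
            new_colors[v] = palette[sig]
        colors = new_colors
        bag += Counter(colors.values())          # 累积每一轮细化后的颜色多重集
    return bag

def tabularize(graphs, rounds=3):
    palette = {}
    bags = [wl_refine(adj, rounds, palette) for adj in graphs]
    dim = max(max(b) for b in bags) + 1
    return [[b.get(c, 0) for c in range(dim)] for b in bags]

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
X = tabularize([triangle, path])                 # 两行等长的直方图特征
```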


【3】FreeGAD: A Training-Free yet Effective Approach for Graph Anomaly Detection
标题:FreeGAD:一种无需训练但有效的图异常检测方法
链接:https://arxiv.org/abs/2508.10594

作者:hao, Yixin Liu, Shiyuan Li, Qingfeng Chen, Yu Zheng, Shirui Pan
摘要:图异常检测(GAD)旨在识别图中偏离大多数的节点,在社交网络和电子商务等应用中起着至关重要的作用。尽管目前基于深度学习的GAD取得了进展,但由于其复杂且资源密集型的训练过程,现有方法通常存在部署成本高和可扩展性差的问题。令人惊讶的是,我们的实证研究结果表明,深度GAD方法的训练阶段,通常被认为是至关重要的,实际上可能对异常检测性能的贡献比预期的要小。受此启发,我们提出了FreeGAD,这是一种新的免训练但有效的GAD方法。具体而言,它利用仿射门控残差编码器来生成异常感知表示。同时,FreeGAD将锚节点识别为伪正常和异常引导,然后通过锚引导的统计偏差计算异常得分。大量的实验表明,FreeGAD在来自不同领域的多个基准数据集上实现了卓越的异常检测性能,效率和可扩展性,无需任何训练或迭代优化。
摘要:Graph Anomaly Detection (GAD) aims to identify nodes that deviate from the majority within a graph, playing a crucial role in applications such as social networks and e-commerce. Despite the current advancements in deep learning-based GAD, existing approaches often suffer from high deployment costs and poor scalability due to their complex and resource-intensive training processes. Surprisingly, our empirical findings suggest that the training phase of deep GAD methods, commonly perceived as crucial, may actually contribute less to anomaly detection performance than expected. Inspired by this, we propose FreeGAD, a novel training-free yet effective GAD method. Specifically, it leverages an affinity-gated residual encoder to generate anomaly-aware representations. Meanwhile, FreeGAD identifies anchor nodes as pseudo-normal and anomalous guides, followed by calculating anomaly scores through anchor-guided statistical deviations. Extensive experiments demonstrate that FreeGAD achieves superior anomaly detection performance, efficiency, and scalability on multiple benchmark datasets from diverse domains, without any training or iterative optimization.


【4】GraphFedMIG: Tackling Class Imbalance in Federated Graph Learning via Mutual Information-Guided Generation
标题:GraphFedMIG:通过互信息引导生成解决联邦图学习中的类不平衡
链接:https://arxiv.org/abs/2508.10471

作者:, Qilin Fan, Tianfu Wang, Kaiwen Wei, Ke Yu, Xu Zhang
摘要:联邦图学习(FGL)使多个客户端能够在不共享其私有、去中心化图数据的情况下协作训练强大的图神经网络。FGL继承自通用联邦学习,面临统计异质性的严峻挑战:跨客户端的非IID数据分布可能严重损害模型性能。其中一种破坏性尤其大的形式是类不平衡,它会导致全局模型偏向多数类,无法识别罕见但关键的事件。这个问题在FGL中更加严重,因为少数类节点往往被有偏的邻域信息包围,阻碍了表达性嵌入的学习。为了应对这一挑战,我们提出了GraphFedMIG,一个将该问题重新表述为联邦生成式数据增强任务的新型FGL框架。GraphFedMIG采用分层生成对抗网络,每个客户端训练一个本地生成器来合成高保真的特征表示。为了提供量身定制的监督,客户端被分组为若干集群,每个集群共享一个专用的判别器。至关重要的是,该框架设计了一种互信息引导机制来引导这些客户端生成器的演化。通过计算每个客户端独特的信息价值,该机制校正本地生成器参数,确保后续轮次的互信息引导生成专注于产生高价值的少数类特征。我们在四个真实世界数据集上进行了广泛实验,结果证明了所提出的GraphFedMIG相对于其他基线的优越性。
摘要 :Federated graph learning (FGL) enables multiple clients to collaboratively train powerful graph neural networks without sharing their private, decentralized graph data. Inherited from generic federated learning, FGL is critically challenged by statistical heterogeneity, where non-IID data distributions across clients can severely impair model performance. A particularly destructive form of this is class imbalance, which causes the global model to become biased towards majority classes and fail at identifying rare but critical events. This issue is exacerbated in FGL, as nodes from a minority class are often surrounded by biased neighborhood information, hindering the learning of expressive embeddings. To grapple with this challenge, we propose GraphFedMIG, a novel FGL framework that reframes the problem as a federated generative data augmentation task. GraphFedMIG employs a hierarchical generative adversarial network where each client trains a local generator to synthesize high-fidelity feature representations. To provide tailored supervision, clients are grouped into clusters, each sharing a dedicated discriminator. Crucially, the framework designs a mutual information-guided mechanism to steer the evolution of these client generators. By calculating each client's unique informational value, this mechanism corrects the local generator parameters, ensuring that subsequent rounds of mutual information-guided generation are focused on producing high-value, minority-class features. We conduct extensive experiments on four real-world datasets, and the results demonstrate the superiority of the proposed GraphFedMIG compared with other baselines.


【5】In silico study on the cytotoxicity against Hela cancer cells of xanthones bioactive compounds from Garcinia cowa: QSAR based on Graph Deep Learning, Network Pharmacology, and Molecular Docking
标题:Garcinia cowa中氧杂蒽酮类生物活性化合物对HeLa癌细胞细胞毒性的计算机模拟研究:基于图深度学习、网络药理学和分子对接的QSAR
链接:https://arxiv.org/abs/2508.10117

作者:nh Son, Pham Huu Vang, Nguyen Thi Dung, Nguyen Manh Ha. Ta Thi Thao, Tran Thi Thu Thuy, Phan Minh Giang
摘要:癌症被认为是一组复杂的疾病,是全球死亡率最高的疾病之一,患病率不断上升,并呈现影响年轻人群的趋势。其特征在于异常细胞的不受控制的增殖、对邻近组织的侵袭以及向远处器官的转移。Garcinia cowa是一种传统药用植物,广泛用于包括越南在内的东南亚地区,用于治疗发烧、咳嗽、消化不良,以及作为泻药和治疗寄生虫病。从该物种中分离出的许多氧杂蒽酮化合物表现出广泛的生物活性,其中一些显示出作为抗癌和抗疟疾药物的前景。网络药理学分析成功地确定了关键生物活性化合物Rubraxanthone、Garcinone D、Norcowanin、Cowanol和Cowaxanthone及其主要蛋白质靶点(TNF、CTNNB1、SRC、NFKB1和MTOR),为其抗癌作用的分子机制提供了重要见解。图注意力网络算法表现出卓越的预测性能,在数据增强后R2达到0.98、RMSE达到0.02,突显了其在预测氧杂蒽酮类化合物pIC50值方面的准确性。此外,分子对接揭示MTOR是Garcinia cowa诱导HeLa癌细胞产生细胞毒性的潜在靶点。
摘要:Cancer is recognized as a complex group of diseases, contributing to the highest global mortality rates, with increasing prevalence and a trend toward affecting younger populations. It is characterized by uncontrolled proliferation of abnormal cells, invasion of adjacent tissues, and metastasis to distant organs. Garcinia cowa, a traditional medicinal plant widely used in Southeast Asia, including Vietnam, is employed to treat fever, cough, indigestion, as a laxative, and for parasitic diseases. Numerous xanthone compounds isolated from this species exhibit a broad spectrum of biological activities, with some showing promise as anti cancer and antimalarial agents. Network pharmacology analysis successfully identified key bioactive compounds Rubraxanthone, Garcinone D, Norcowanin, Cowanol, and Cowaxanthone alongside their primary protein targets (TNF, CTNNB1, SRC, NFKB1, and MTOR), providing critical insights into the molecular mechanisms underlying their anti-cancer effects. The Graph Attention Network algorithm demonstrated superior predictive performance, achieving an R2 of 0.98 and an RMSE of 0.02 after data augmentation, highlighting its accuracy in predicting pIC50 values for xanthone based compounds. Additionally, molecular docking revealed MTOR as a potential target for inducing cytotoxicity in HeLa cancer cells from Garcinia cowa.


Transformer(6篇)

【1】Memory-Augmented Transformers: A Systematic Review from Neuroscience Principles to Technical Solutions
标题:记忆增强Transformer:从神经科学原理到技术解决方案的系统性回顾
链接:https://arxiv.org/abs/2508.10824

作者:di, Xingshuai Huang, Axel Laborieux, Bahareh Nikpour, Tianyu Shi, Armaghan Eshaghi
摘要:记忆是智能的基础,使学习,推理和适应性跨越生物和人工系统。虽然Transformer架构在序列建模方面表现出色,但它们在长期上下文保持、持续学习和知识集成方面面临着严重的限制。这篇综述提出了一个统一的框架,桥接神经科学原理,包括动态多时间尺度记忆,选择性注意和巩固,与工程进展的记忆增强Transformers。我们通过三个分类维度组织最近的进展:功能目标(上下文扩展,推理,知识整合,适应),记忆表示(参数编码,基于状态,明确,混合),和整合机制(注意力融合,门控,联想检索)。我们对核心内存操作(读、写、遗忘和容量管理)的分析揭示了从静态缓存向自适应、测试时学习系统的转变。我们确定了可扩展性和干扰方面的持续挑战,以及新兴的解决方案,包括分层缓冲和门控更新。这种综合为认知启发的终身学习Transformer架构提供了路线图。
摘要:Memory is fundamental to intelligence, enabling learning, reasoning, and adaptability across biological and artificial systems. While Transformer architectures excel at sequence modeling, they face critical limitations in long-range context retention, continual learning, and knowledge integration. This review presents a unified framework bridging neuroscience principles, including dynamic multi-timescale memory, selective attention, and consolidation, with engineering advances in Memory-Augmented Transformers. We organize recent progress through three taxonomic dimensions: functional objectives (context extension, reasoning, knowledge integration, adaptation), memory representations (parameter-encoded, state-based, explicit, hybrid), and integration mechanisms (attention fusion, gated control, associative retrieval). Our analysis of core memory operations (reading, writing, forgetting, and capacity management) reveals a shift from static caches toward adaptive, test-time learning systems. We identify persistent challenges in scalability and interference, alongside emerging solutions including hierarchical buffering and surprise-gated updates. This synthesis provides a roadmap toward cognitively-inspired, lifelong-learning Transformer architectures.


【2】Self-Supervised Temporal Super-Resolution of Energy Data using Generative Adversarial Transformer
标题:使用生成对抗Transformer的能源数据自监督时间超分辨率
链接:https://arxiv.org/abs/2508.10587

作者:u, Gökhan Demirel, Yuzhe Zhang, Jianlei Liu, Thorsten Schlachter, Veit Hagenmeyer
摘要:为了弥补基于能源系统模型的能源网络设计与运行中时间粒度的差距,需要对时间序列进行重采样。虽然传统的上采样方法在计算上是高效的,但它们通常导致显著的信息丢失或噪声增加。时间序列生成模型、超分辨率模型和插补模型等先进模型显示出潜力,但也面临根本性挑战。时间序列生成模型的目标是学习原始数据的分布,以生成具有相似统计特征的高分辨率序列,这与上采样的定义并不完全一致。时间序列超分辨率模型或插补模型可能会降低上采样的精度,因为输入的低分辨率时间序列是稀疏的,上下文可能不足。此外,这类模型通常依赖监督学习范式。这带来了一个根本性的应用悖论:它们的训练需要高分辨率时间序列,而这在上采样应用场景中本质上是缺失的。为了解决上述上采样问题,本文介绍了一种利用生成对抗Transformer(GAT)的新方法,该方法无需访问任何真实高分辨率数据即可训练。与传统插值方法相比,该方法可将上采样任务的均方根误差(RMSE)降低9%,并将模型预测控制(MPC)应用场景的精度提高13%。
摘要 :To bridge the temporal granularity gap in energy network design and operation based on Energy System Models, resampling of time series is required. While conventional upsampling methods are computationally efficient, they often result in significant information loss or increased noise. Advanced models such as time series generation models, Super-Resolution models and imputation models show potential, but also face fundamental challenges. The goal of time series generative models is to learn the distribution of the original data to generate high-resolution series with similar statistical characteristics. This is not entirely consistent with the definition of upsampling. Time series Super-Resolution models or imputation models can degrade the accuracy of upsampling because the input low-resolution time series are sparse and may have insufficient context. Moreover, such models usually rely on supervised learning paradigms. This presents a fundamental application paradox: their training requires the high-resolution time series that is intrinsically absent in upsampling application scenarios. To address the mentioned upsampling issue, this paper introduces a new method utilizing Generative Adversarial Transformers (GATs), which can be trained without access to any ground-truth high-resolution data. Compared with conventional interpolation methods, the introduced method can reduce the root mean square error (RMSE) of upsampling tasks by 9%, and the accuracy of a model predictive control (MPC) application scenario is improved by 13%.


【3】Multi-Label Plant Species Prediction with Metadata-Enhanced Multi-Head Vision Transformers
标题:使用元数据增强型多头视觉转换器进行多标签植物物种预测
链接:https://arxiv.org/abs/2508.10457

作者:asimchyk, Robin Labryga, Tomislav Prusina
备注:Accepted for publication at: LifeCLEF Lab at CLEF 2025 Working Notes, 2025, Madrid, Spain
摘要:我们提出了一种多头Vision Transformer方法,用于植被样方图像中的多标签植物物种预测,以应对PlantCLEF 2025挑战赛。该任务在单物种植物图像上训练模型,而在多物种样方图像上测试,从而产生剧烈的领域偏移。我们的方法利用预训练的DINOv2 Vision Transformer Base(ViT-B/14)主干,并为物种、属和科的预测配备多个分类头,以利用分类学层级结构。主要贡献包括:捕获不同尺度植物的多尺度平铺、基于平均预测长度的动态阈值优化,以及通过装袋(bagging)和Hydra模型架构实现的集成策略。该方法还采用了多种推理技术,包括裁剪图像以去除非植物伪影、用top-n过滤约束预测数量,以及logit阈值策略。实验在覆盖7,806种植物、约140万张训练图像上进行。结果显示出强劲的性能,使我们的提交在私有排行榜上排名第三。我们的代码可在https://github.com/geranium12/plant-clef-2025/tree/v1.0.0上获得。
摘要:We present a multi-head vision transformer approach for multi-label plant species prediction in vegetation plot images, addressing the PlantCLEF 2025 challenge. The task involves training models on single-species plant images while testing on multi-species quadrat images, creating a drastic domain shift. Our methodology leverages a pre-trained DINOv2 Vision Transformer Base (ViT-B/14) backbone with multiple classification heads for species, genus, and family prediction, utilizing taxonomic hierarchies. Key contributions include multi-scale tiling to capture plants at different scales, dynamic threshold optimization based on mean prediction length, and ensemble strategies through bagging and Hydra model architectures. The approach incorporates various inference techniques including image cropping to remove non-plant artifacts, top-n filtering for prediction constraints, and logit thresholding strategies. Experiments were conducted on approximately 1.4 million training images covering 7,806 plant species. Results demonstrate strong performance, making our submission 3rd best on the private leaderboard. Our code is available at https://github.com/geranium12/plant-clef-2025/tree/v1.0.0.
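下面是一个示意性的Python草图(并非参赛代码,网格尺度与聚合方式均为假设),说明摘要中"多尺度平铺"的一种最简实现:把样方图像按多个网格尺度切块,对每块分别预测,再对所有块的物种概率取最大值聚合。

```python
# 示意性草图: 多尺度平铺 + 逐块预测的最大值聚合(predict_tile 为假设的单图分类函数)
from PIL import Image

def multiscale_tiles(image: Image.Image, grid_sizes=(1, 2, 4)):
    w, h = image.size
    for g in grid_sizes:                      # g=1 整图, g=2 切 2x2, g=4 切 4x4
        tw, th = w // g, h // g
        for i in range(g):
            for j in range(g):
                yield image.crop((i * tw, j * th, (i + 1) * tw, (j + 1) * th))

def predict_plot(image, predict_tile, grid_sizes=(1, 2, 4)):
    """predict_tile(tile) 返回 {物种: 概率}; 这里按块取最大概率聚合成样方级预测。"""
    scores = {}
    for tile in multiscale_tiles(image, grid_sizes):
        for species, p in predict_tile(tile).items():
            scores[species] = max(scores.get(species, 0.0), p)
    return scores
```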


【4】Pruning and Malicious Injection: A Retraining-Free Backdoor Attack on Transformer Models
标题:剪枝和恶意注入:一种对Transformer模型的无重训练后门攻击
链接:https://arxiv.org/abs/2508.10243

作者:hao, Mingxuan Sun, Hao Wang, Xiaobing Chen, Xiangwei Zhou
摘要:Transformer模型已表现出卓越的性能,并已成为计算机视觉(CV)和自然语言处理(NLP)任务中不可或缺的工具。然而,最近的研究表明,Transformers容易受到后门攻击。以前的后门攻击方法通常依赖于使用干净数据进行重新训练或更改模型架构,这两种方法都可能是资源密集型和侵入性的。在本文中,我们提出了头部修剪和恶意注入(HPMI),这是一种新的对Transformers的无再训练后门攻击,不会改变模型的架构。我们的方法只需要原始数据的一个小子集和模型架构的基本知识,消除了重新训练目标Transformer的需要。从技术上讲,HPMI的工作原理是修剪最不重要的头部,并注入一个预先训练好的恶意头部来建立后门。我们提供了一个严格的理论依据,证明植入的后门抵抗检测和删除的最先进的防御技术,在合理的假设。在多个数据集上的实验评估进一步验证了HPMI的有效性,表明它1)产生的干净准确性损失可以忽略不计,2)达到至少99.55%的攻击成功率,3)绕过四种高级防御机制。此外,相对于最先进的依赖再训练的攻击,HPMI实现了更大的隐蔽性和对不同防御策略的鲁棒性,同时保持对干净准确性的影响最小。
摘要:Transformer models have demonstrated exceptional performance and have become indispensable in computer vision (CV) and natural language processing (NLP) tasks. However, recent studies reveal that transformers are susceptible to backdoor attacks. Prior backdoor attack methods typically rely on retraining with clean data or altering the model architecture, both of which can be resource-intensive and intrusive. In this paper, we propose Head-wise Pruning and Malicious Injection (HPMI), a novel retraining-free backdoor attack on transformers that does not alter the model's architecture. Our approach requires only a small subset of the original data and basic knowledge of the model architecture, eliminating the need for retraining the target transformer. Technically, HPMI works by pruning the least important head and injecting a pre-trained malicious head to establish the backdoor. We provide a rigorous theoretical justification demonstrating that the implanted backdoor resists detection and removal by state-of-the-art defense techniques, under reasonable assumptions. Experimental evaluations across multiple datasets further validate the effectiveness of HPMI, showing that it 1) incurs negligible clean accuracy loss, 2) achieves at least 99.55% attack success rate, and 3) bypasses four advanced defense mechanisms. Additionally, relative to state-of-the-art retraining-dependent attacks, HPMI achieves greater concealment and robustness against diverse defense strategies, while maintaining minimal impact on clean accuracy.


【5】Can Transformers Break Encryption Schemes via In-Context Learning?
标题:Transformer可以通过上下文学习破解加密方案吗?
链接:https://arxiv.org/abs/2508.10235

作者:rrapati, Patrick Mendoza, Aditya Tomar, Abein Abraham
摘要:上下文学习(ICL)已经成为基于Transformer的语言模型的一项强大能力,使它们能够仅凭推理时提供的少量示例执行任务,而无需任何参数更新。先前的工作表明,Transformer可以纯粹从上下文泛化到简单的函数类,如线性函数、决策树甚至神经网络,侧重于对底层结构良好的函数进行数值或符号推理。与此不同,我们提出将ICL应用于密码函数学习这一新领域,特别关注单表替换密码和Vigenère密码这两类私钥加密方案。这些密码涉及明文字符与密文字符之间固定但隐藏的双射映射。给定一小组(密文,明文)对,目标是让模型推断出底层替换规则并解码新的密文单词。这种设置构成了一个结构化推理挑战,非常适合在ICL范式下评估Transformer的归纳偏置与泛化能力。代码可在https://github.com/adistomar/CS182-project上获得。
摘要:In-context learning (ICL) has emerged as a powerful capability of transformer-based language models, enabling them to perform tasks by conditioning on a small number of examples presented at inference time, without any parameter updates. Prior work has shown that transformers can generalize over simple function classes like linear functions, decision trees, even neural networks, purely from context, focusing on numerical or symbolic reasoning over underlying well-structured functions. Instead, we propose a novel application of ICL into the domain of cryptographic function learning, specifically focusing on ciphers such as mono-alphabetic substitution and Vigen\`ere ciphers, two classes of private-key encryption schemes. These ciphers involve a fixed but hidden bijective mapping between plain text and cipher text characters. Given a small set of (cipher text, plain text) pairs, the goal is for the model to infer the underlying substitution and decode a new cipher text word. This setting poses a structured inference challenge, which is well-suited for evaluating the inductive biases and generalization capabilities of transformers under the ICL paradigm. Code is available at https://github.com/adistomar/CS182-project.
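下面是一个示意性的Python草图(并非论文代码,提示格式为假设),演示摘要描述的任务设置:随机生成一个单表替换密码,构造由若干(密文, 明文)对组成的ICL提示,让模型补全新密文对应的明文。

```python
# 示意性草图: 构造单表替换密码的 ICL 提示
import random
import string

def make_cipher(seed=0):
    rng = random.Random(seed)
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return dict(zip(letters, shuffled))        # 明文字母 -> 密文字母 的双射

def encrypt(word, cipher):
    return "".join(cipher[c] for c in word)

def build_icl_prompt(examples, query_word, cipher):
    lines = [f"{encrypt(w, cipher)} -> {w}" for w in examples]
    lines.append(f"{encrypt(query_word, cipher)} -> ")   # 留给模型补全明文
    return "\n".join(lines)

cipher = make_cipher()
print(build_icl_prompt(["apple", "banana", "grape"], "orange", cipher))
```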


【6】Pre-trained Transformer-models using chronic invasive electrophysiology for symptom decoding without patient-individual training
标题:使用慢性侵入性电生理学的预训练Transformer模型:无需患者个体训练的症状解码
链接 :https://arxiv.org/abs/2508.10160

作者:k, Saeed Salehi, Richard M. Koehler, Qiming Cui, Maria Olaru, Amelia Hahn, Nicole R. Provenza, Simon Little, Reza Abbasi-Asl, Phil A. Starr, Wolf-Julian Neumann
备注:5 pages, 6 figures
摘要:对病理和生理状态的神经解码可以实现针对患者个体的闭环神经调控治疗。预训练大规模基础模型的最新进展为无需患者个体训练的泛化状态估计提供了可能。在这里,我们提出了一个在超过24天的慢性纵向脑深部电刺激记录上训练的基础模型。为适应长时间尺度的症状波动,我们采用了30分钟的扩展上下文窗口。我们提出了一种针对神经电生理数据优化的预训练损失函数,用于校正常见掩码自编码器损失函数因1/f幂律而产生的频率偏差。我们在一个下游任务中展示了在留一受试者交叉验证、且无患者个体训练的情况下对帕金森病症状的解码。
摘要:Neural decoding of pathological and physiological states can enable patient-individualized closed-loop neuromodulation therapy. Recent advances in pre-trained large-scale foundation models offer the potential for generalized state estimation without patient-individual training. Here we present a foundation model trained on chronic longitudinal deep brain stimulation recordings spanning over 24 days. Adhering to long time-scale symptom fluctuations, we highlight the extended context window of 30 minutes. We present an optimized pre-training loss function for neural electrophysiological data that corrects for the frequency bias of common masked auto-encoder loss functions due to the 1-over-f power law. We show in a downstream task the decoding of Parkinson's disease symptoms with leave-one-subject-out cross-validation without patient-individual training.
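下面是一个示意性的Python草图(并非论文的损失函数,加权形式、采样率与补偿指数均为假设),说明"校正1/f幂律导致的频率偏差"的一种常见思路:在频域计算重构误差,并按频率加权,使低频分量不再主导损失。

```python
# 示意性草图: 频率加权的重构损失,用于补偿 1/f 幂律下的低频主导
import torch

def freq_weighted_recon_loss(recon, target, fs=250.0, alpha=1.0):
    """recon, target: [batch, time]; fs 为假设采样率, alpha 为假设的补偿指数。"""
    R = torch.fft.rfft(recon, dim=-1)
    T = torch.fft.rfft(target, dim=-1)
    freqs = torch.fft.rfftfreq(recon.shape[-1], d=1.0 / fs)
    w = freqs.clamp(min=1.0) ** alpha             # 频率越高权重越大,抵消 1/f 能量衰减
    return (w * (R - T).abs() ** 2).mean()

x = torch.randn(4, 1000)
loss = freq_weighted_recon_loss(x + 0.1 * torch.randn_like(x), x)
```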


GAN|对抗|攻击|生成相关(10篇)

【1】IBEX: Information-Bottleneck-EXplored Coarse-to-Fine Molecular Generation under Limited Data
标题:IBEX:有限数据下的信息瓶颈探索从粗到细的分子生成
链接:https://arxiv.org/abs/2508.10775

作者:Zhangfan Yang, Jenna Xinyi Yao, Shuangbao Song, Zexuan Zhu, Junkai Ji
备注:10 pages, 8 figures
摘要:三维生成模型正日益驱动基于结构的药物发现,但它仍然受到公开可用蛋白质-配体复合物稀缺的限制。在这种数据稀缺的情况下,几乎所有现有流程都难以学习可迁移的几何先验,从而过拟合训练集偏差。因此,我们提出了IBEX,这是一个信息瓶颈探索的由粗到细的流程,用以解决基于结构的药物设计中蛋白质-配体复合物数据的长期短缺问题。具体而言,我们使用PAC-贝叶斯信息瓶颈理论来量化每个样本的信息密度。该分析揭示了不同的掩码策略如何影响泛化,并表明与传统的从头生成相比,受约束的骨架跃迁(Scaffold Hopping)任务赋予模型更大的有效容量和更好的迁移性能。IBEX保留了原始TargetDiff的架构和超参数进行训练,以生成与结合口袋兼容的分子;随后应用L-BFGS优化步骤,通过优化五个基于物理的能量项并调整六个平移和旋转自由度,在一秒内精细地优化每个构象。仅凭这些修改,IBEX就将基于CrossDocked2020的CBGBench上的zero-shot对接成功率从53%提高到64%,将平均Vina评分从$-7.41 kcal mol^{-1}$改善到$-8.07 kcal mol^{-1}$,并在100个口袋中的57个中取得最佳的Vina能量中位数,而原始TargetDiff仅有3个。IBEX还将QED提高了25%,实现了最先进的有效性和多样性,并显著降低了外推误差。
摘要:Three-dimensional generative models increasingly drive structure-based drug discovery, yet it remains constrained by the scarce publicly available protein-ligand complexes. Under such data scarcity, almost all existing pipelines struggle to learn transferable geometric priors and consequently overfit to training-set biases. As such, we present IBEX, an Information-Bottleneck-EXplored coarse-to-fine pipeline to tackle the chronic shortage of protein-ligand complex data in structure-based drug design. Specifically, we use PAC-Bayesian information-bottleneck theory to quantify the information density of each sample. This analysis reveals how different masking strategies affect generalization and indicates that, compared with conventional de novo generation, the constrained Scaffold Hopping task endows the model with greater effective capacity and improved transfer performance. IBEX retains the original TargetDiff architecture and hyperparameters for training to generate molecules compatible with the binding pocket; it then applies an L-BFGS optimization step to finely refine each conformation by optimizing five physics-based terms and adjusting six translational and rotational degrees of freedom in under one second. With only these modifications, IBEX raises the zero-shot docking success rate on CBGBench CrossDocked2020-based from 53% to 64%, improves the mean Vina score from $-7.41 kcal mol^{-1}$ to $-8.07 kcal mol^{-1}$, and achieves the best median Vina energy in 57 of 100 pockets versus 3 for the original TargetDiff. IBEX also increases the QED by 25%, achieves state-of-the-art validity and diversity, and markedly reduces extrapolation error.


【2】Video-BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
标题:Video-BLADE:块稀疏关注满足分步蒸馏以实现高效视频生成
链接:https://arxiv.org/abs/2508.10774

作者:u, Xiaolong Li, Yuhao Hu, Bohan Zhuang
备注:Tech report
摘要:扩散Transformers目前在高质量视频生成领域处于领先地位,但其缓慢的迭代去噪过程和长序列的二次注意力成本造成了显着的推理瓶颈。虽然步骤蒸馏和稀疏注意力机制都显示出作为独立加速策略的前景,但有效地结合这些方法提出了关键的挑战-无训练集成产生次优结果,而在步骤蒸馏后单独训练稀疏注意力需要昂贵的高质量视频数据。为了克服这些限制,我们提出了BLADE,这是一种创新的无数据联合训练框架,它引入了:(1)自适应块稀疏注意(ASA)机制,用于动态生成内容感知稀疏掩码以将计算集中在显著时空特征上,以及(2)基于轨迹分布匹配(TDM)的稀疏性感知分步蒸馏范例直接将稀疏性合并到蒸馏过程中,而不是将其视为具有快速收敛的单独压缩步骤。我们在CogVideoX-5 B和Wan2.1-1.3B等文本到视频模型上验证了BLADE。我们的框架在不同规模上展示了显着的效率提升。在Wan2.1-1.3B上,BLADE在50步基线上实现了14.10倍的端到端推理加速。此外,在具有短视频序列长度的CogVideoX-5 B等模型上,我们的框架提供了强大的8.89倍加速。至关重要的是,加速伴随着质量的持续改善。在VBench-2.0基准测试中,BLADE将CogVideoX-5 B的得分提高到0.569(从0.534),将Wan2.1-1.3B的得分提高到0.570(从0.563),这些结果进一步得到了人类评估中的优异评分的证实。我们的代码和模型权重可在http://ziplab.co/BLADE-Homepage/上公开获取。
摘要:Diffusion transformers currently lead the field in high-quality video generation, but their slow iterative denoising process and prohibitive quadratic attention costs for long sequences create significant inference bottlenecks. While both step distillation and sparse attention mechanisms have shown promise as independent acceleration strategies, effectively combining these approaches presents critical challenges -- training-free integration yields suboptimal results, while separately training sparse attention after step distillation requires prohibitively expensive high-quality video data. To overcome these limitations, we propose BLADE, an innovative data-free joint training framework that introduces: (1) an Adaptive Block-Sparse Attention (ASA) mechanism for dynamically generating content-aware sparsity masks to focus computation on salient spatiotemporal features, and (2) a sparsity-aware step distillation paradigm built upon Trajectory Distribution Matching (TDM) that directly incorporates sparsity into the distillation process rather than treating it as a separate compression step, with fast convergence. We validate BLADE on text-to-video models like CogVideoX-5B and Wan2.1-1.3B. Our framework demonstrates remarkable efficiency gains across different scales. On Wan2.1-1.3B, BLADE achieves a 14.10x end-to-end inference acceleration over a 50-step baseline. Moreover, on models such as CogVideoX-5B with short video sequence lengths, our framework delivers a robust 8.89x speedup. Crucially, the acceleration is accompanied by a consistent quality improvement. On the VBench-2.0 benchmark, BLADE boosts the score of CogVideoX-5B to 0.569 (from 0.534) and Wan2.1-1.3B to 0.570 (from 0.563), results that are further corroborated by superior ratings in human evaluations. Our code and model weights are publicly available at: http://ziplab.co/BLADE-Homepage/.
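下面是"块稀疏注意力"机制的一个示意实现(假设性代码,并非 BLADE 的 ASA 原始实现):先把查询/键按固定块大小做平均池化得到块级表示,按块间相似度为每个查询块保留 top-k 个键块,再据此构造掩码并做标准缩放点积注意力。块大小与 top-k 为示意参数,且此处只演示掩码的构造方式,并未实现真实的稀疏计算加速。

```python
import torch

def block_sparse_attention(q, k, v, block=64, topk=4):
    """q, k, v: (batch, heads, seq, dim),seq 需能被 block 整除。"""
    B, H, S, D = q.shape
    nb = S // block
    qb = q.view(B, H, nb, block, D).mean(dim=3)            # 块级查询表示
    kb = k.view(B, H, nb, block, D).mean(dim=3)            # 块级键表示
    block_scores = torch.einsum("bhqd,bhkd->bhqk", qb, kb) / D ** 0.5
    keep = block_scores.topk(topk, dim=-1).indices          # 每个查询块保留的键块
    block_mask = torch.zeros(B, H, nb, nb, dtype=torch.bool, device=q.device)
    block_mask.scatter_(-1, keep, True)
    # 把块级掩码展开成 token 级掩码
    token_mask = block_mask.repeat_interleave(block, dim=2).repeat_interleave(block, dim=3)
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) / D ** 0.5
    scores = scores.masked_fill(~token_mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# 用法示意
q = k = v = torch.randn(1, 2, 256, 32)
print(block_sparse_attention(q, k, v).shape)    # torch.Size([1, 2, 256, 32])
```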


【3】Oops!... They Stole it Again: Attacks on Split Learning
标题:哎呀!..他们又偷走了:对分裂学习的攻击
链接:https://arxiv.org/abs/2508.10598

作者:han, Antonis Michalas
摘要:Split Learning(SL)是一种协作学习方法,通过将数据保存在客户端,同时仅与服务器共享中间输出来提高隐私性。然而,SL的分布式特性带来了新的安全挑战,需要全面探索潜在的攻击。本文系统地回顾了SL上的各种攻击,根据攻击者的角色,隐私风险的类型,何时发生数据泄漏以及漏洞存在的位置等因素对其进行分类。我们还分析了现有的防御方法,包括加密方法,数据修改方法,分布式技术和混合解决方案。我们的发现揭示了安全漏洞,突出了现有防御的有效性和局限性。通过确定开放的挑战和未来的方向,这项工作提供了有价值的信息,以改善SL隐私问题,并指导进一步的研究。
摘要:Split Learning (SL) is a collaborative learning approach that improves privacy by keeping data on the client-side while sharing only the intermediate output with a server. However, the distributed nature of SL introduces new security challenges, necessitating a comprehensive exploration of potential attacks. This paper systematically reviews various attacks on SL, classifying them based on factors such as the attacker's role, the type of privacy risks, when data leaks occur, and where vulnerabilities exist. We also analyze existing defense methods, including cryptographic methods, data modification approaches, distributed techniques, and hybrid solutions. Our findings reveal security gaps, highlighting the effectiveness and limitations of existing defenses. By identifying open challenges and future directions, this work provides valuable information to improve SL privacy issues and guide further research.
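作为背景,下面用 PyTorch 勾勒拆分学习的一个基本训练回合(单机示意,非论文代码):客户端只前向传播到切分层并"上传"中间激活,服务器完成其余前向与损失计算,再把激活的梯度回传给客户端。文中综述的多数攻击(特征重建、标签推断等)正是围绕这份被交换的激活/梯度展开;此处网络结构与标签放在服务器侧均为示意性假设,且未包含任何防御。

```python
import torch
import torch.nn as nn

client_net = nn.Sequential(nn.Linear(20, 64), nn.ReLU())                       # 客户端前半段
server_net = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))     # 服务器后半段
opt_c = torch.optim.SGD(client_net.parameters(), lr=0.1)
opt_s = torch.optim.SGD(server_net.parameters(), lr=0.1)

x, y = torch.randn(16, 20), torch.randint(0, 2, (16,))   # 原始数据始终留在客户端

# 客户端:前向到切分层,只发送中间激活(smashed data)
h = client_net(x)
h_sent = h.detach().requires_grad_(True)                 # 模拟经网络传输后的张量

# 服务器:完成前向、计算损失,并把激活梯度回传
opt_s.zero_grad()
loss = nn.CrossEntropyLoss()(server_net(h_sent), y)
loss.backward()
opt_s.step()
grad_back = h_sent.grad

# 客户端:用收到的梯度继续反向传播并更新本地参数
opt_c.zero_grad()
h.backward(grad_back)
opt_c.step()
print(float(loss))
```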


【4】Projected Coupled Diffusion for Test-Time Constrained Joint Generation
标题:测试时间受限联合生成的投影耦合扩散
链接:https://arxiv.org/abs/2508.10531

作者: Yi Xian Goh, See-Kiong Ng, Chun Kai Ling
备注:37 pages
摘要:对测试时间采样的修改已经成为扩散算法的重要扩展,其目标是使生成过程偏置以实现给定目标,而无需重新训练整个扩散模型。然而,从多个预先训练的扩散模型生成联合相关的样本,同时执行特定于任务的约束,而无需昂贵的再训练,仍然具有挑战性。为此,我们提出了投影耦合扩散(PCD),一种新的测试时间框架约束联合生成。PCD在生成动力学中引入了一个耦合的指导项,以鼓励扩散模型之间的协调,并在每个扩散步骤中引入了一个投影步骤,以执行硬约束。经验上,我们证明了PCD在图像对生成,对象操作和多机器人运动规划的应用场景中的有效性。我们的研究结果表明,改善耦合效应和保证约束的满意度,而不会产生过多的计算成本。
摘要:Modifications to test-time sampling have emerged as an important extension to diffusion algorithms, with the goal of biasing the generative process to achieve a given objective without having to retrain the entire diffusion model. However, generating jointly correlated samples from multiple pre-trained diffusion models while simultaneously enforcing task-specific constraints without costly retraining has remained challenging. To this end, we propose Projected Coupled Diffusion (PCD), a novel test-time framework for constrained joint generation. PCD introduces a coupled guidance term into the generative dynamics to encourage coordination between diffusion models and incorporates a projection step at each diffusion step to enforce hard constraints. Empirically, we demonstrate the effectiveness of PCD in application scenarios of image-pair generation, object manipulation, and multi-robot motion planning. Our results show improved coupling effects and guaranteed constraint satisfaction without incurring excessive computational costs.
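下面是一个玩具级示意(非 PCD 的实现),用于说明"耦合引导项 + 每步投影"的结构:对两个以标准正态为目标的朗之万式采样器加入把两条轨迹拉近的耦合项,并在每步后把样本投影回盒式硬约束。得分函数、耦合系数与约束集均为示意性假设。

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x):                              # 用标准正态的得分函数代替真实的扩散去噪器
    return -x

def project_box(x, lo=-1.0, hi=1.0):       # 欧氏投影到盒式硬约束 [lo, hi]^d
    return np.clip(x, lo, hi)

x1, x2 = rng.normal(size=2), rng.normal(size=2)   # 两个(预训练)生成模型各自的样本
step, lam = 0.05, 0.5                             # 步长与耦合强度(示意值)

for _ in range(200):
    g1 = score(x1) - lam * (x1 - x2)              # 耦合引导项:鼓励两条生成轨迹协调一致
    g2 = score(x2) - lam * (x2 - x1)
    x1 = x1 + step * g1 + np.sqrt(2 * step) * rng.normal(size=2)
    x2 = x2 + step * g2 + np.sqrt(2 * step) * rng.normal(size=2)
    x1, x2 = project_box(x1), project_box(x2)     # 每步投影,保证硬约束始终满足

print(x1, x2)   # 相互耦合且满足约束的两个样本
```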


【5】A Unified Multi-Agent Framework for Universal Multimodal Understanding and Generation
标题:一种通用多模态理解和生成的统一多Agent框架
链接:https://arxiv.org/abs/2508.10494

作者:, Ping Huang, Yexin Li, Shuo Chen, Juewen Hu, Ye Tian
备注:8 pages, 5 figures
摘要:现实世界的多模态应用程序通常需要任意对任意的功能,从而能够跨模态(包括文本、图像、音频和视频)进行理解和生成。然而,整合自回归语言模型(LLM)的推理和扩散模型的高保真生成的优势仍然具有挑战性。现有的方法依赖于严格的管道或紧密耦合的架构,限制了灵活性和可扩展性。我们提出了MAGUS(多智能体引导的统一多模态系统),一个模块化的框架,统一多模态的理解和生成通过两个解耦的阶段:认知和审议。MAGUS支持在共享文本工作空间内进行符号化的多代理协作。在认知阶段,三个角色制约的多模态LLM代理-感知者,规划者和反射者-参与协作对话,以执行结构化的理解和规划。审议阶段包含一个增长感知搜索机制,以相互加强的方式编排基于LLM的推理和基于扩散的生成。MAGUS支持即插即用的可扩展性、可扩展的任意到任意模态转换和语义对齐-所有这些都不需要联合训练。多个基准测试的实验,包括图像、视频和音频生成,以及跨模态指令跟踪,表明MAGUS优于强大的基线和最先进的系统。值得注意的是,在MME基准测试中,MAGUS超过了强大的闭源模型GPT-4 o。
摘要:Real-world multimodal applications often require any-to-any capabilities, enabling both understanding and generation across modalities including text, image, audio, and video. However, integrating the strengths of autoregressive language models (LLMs) for reasoning and diffusion models for high-fidelity generation remains challenging. Existing approaches rely on rigid pipelines or tightly coupled architectures, limiting flexibility and scalability. We propose MAGUS (Multi-Agent Guided Unified Multimodal System), a modular framework that unifies multimodal understanding and generation via two decoupled phases: Cognition and Deliberation. MAGUS enables symbolic multi-agent collaboration within a shared textual workspace. In the Cognition phase, three role-conditioned multimodal LLM agents - Perceiver, Planner, and Reflector - engage in collaborative dialogue to perform structured understanding and planning. The Deliberation phase incorporates a Growth-Aware Search mechanism that orchestrates LLM-based reasoning and diffusion-based generation in a mutually reinforcing manner. MAGUS supports plug-and-play extensibility, scalable any-to-any modality conversion, and semantic alignment - all without the need for joint training. Experiments across multiple benchmarks, including image, video, and audio generation, as well as cross-modal instruction following, demonstrate that MAGUS outperforms strong baselines and state-of-the-art systems. Notably, on the MME benchmark, MAGUS surpasses the powerful closed-source model GPT-4o.


【6】Contrastive ECOC: Learning Output Codes for Adversarial Defense
标题:对比ECOC:对抗性防御的学习输出代码
链接:https://arxiv.org/abs/2508.10491

作者:ou, Hung-Hsuan Chen
摘要:虽然独热编码通常用于多类分类,但它并不总是最有效的编码机制。纠错输出码(ECOC)通过将每个类映射到用作标签的唯一码字来解决多类分类问题。传统的ECOC方法依赖于手动设计或随机生成的码本,这是劳动密集型的,并且可能会产生次优的、与数据集无关的结果。本文介绍了三种基于对比学习的自动码本学习模型,允许码本直接自适应地从数据中学习。在四个数据集上,与两个基线相比,我们提出的模型对对抗性攻击表现出更好的鲁棒性。来源可在https://github.com/YuChou20/Automated-Codebook-Learning-with-Error-Correcting-Output-Code-Technique上获得。
摘要:Although one-hot encoding is commonly used for multiclass classification, it is not always the most effective encoding mechanism. Error Correcting Output Codes (ECOC) address multiclass classification by mapping each class to a unique codeword used as a label. Traditional ECOC methods rely on manually designed or randomly generated codebooks, which are labor-intensive and may yield suboptimal, dataset-agnostic results. This paper introduces three models for automated codebook learning based on contrastive learning, allowing codebooks to be learned directly and adaptively from data. Across four datasets, our proposed models demonstrate superior robustness to adversarial attacks compared to two baselines. The source is available at https://github.com/YuChou20/Automated-Codebook-Learning-with-Error-Correcting-Output-Code-Technique.
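为说明 ECOC 的基本机制,下面给出编码/解码的最小示意(码本用随机二值矩阵代替;论文中的码本是由对比学习从数据中自动学得的):训练时每个类别对应一个码字,预测时把模型输出的逐位概率与各类码字比较,取距离最近者作为类别。

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, code_len = 10, 16
codebook = rng.integers(0, 2, size=(n_classes, code_len)).astype(float)  # 示意用随机码本

def ecoc_encode(y):
    """把类别标签映射为训练目标码字。"""
    return codebook[y]

def ecoc_decode(bit_probs):
    """bit_probs: (batch, code_len) 逐位预测概率,返回最近码字对应的类别。"""
    dists = ((bit_probs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

# 用法示意
print(ecoc_encode(np.array([2, 5, 7])).shape)   # (3, 16):3 个样本的目标码字
print(ecoc_decode(rng.random((3, code_len))))   # 3 个预测类别
```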


【7】A Hierarchical IDS for Zero-Day Attack Detection in Internet of Medical Things Networks
标题 :医疗物联网网络中用于零日攻击检测的分层IDS
链接:https://arxiv.org/abs/2508.10346

作者: Uddin, Nam H. Chu, Reza Rafeh
备注:13 pages, and 4 figures
摘要:医疗物联网(IoMT)正在推动医疗保健革命,但仍然容易受到拒绝服务、勒索软件、数据劫持和欺骗等网络攻击。这些网络包括资源受限的异构设备(例如,可穿戴传感器、智能药丸、可植入物),由于响应延迟、隐私风险和增加的漏洞,使得传统的集中式入侵检测系统(IDS)不适用。集中式IDS要求所有传感器将数据传输到中央服务器,从而在密集环境中造成延迟或网络中断。由于计算有限,在IoMT设备上本地运行IDS通常是不可行的,如果更新的模型延迟,即使是轻量级的IDS组件也会面临风险,使它们暴露于零日攻击,威胁患者健康和数据安全。我们提出了一个多层次的IoMT IDS框架,能够检测零日攻击,区分已知和未知的威胁。第一层(靠近边缘)使用元学习或带有usfAD算法的一类分类(OCC)在粗略级别(攻击或非攻击)过滤流量。后续层(远边缘,云)识别攻击类型和新颖性。在CICIoMT2024数据集上的实验显示了99.77%的准确率和97.8%的F1分数。第一层以高精度检测零日攻击,无需新数据集,确保在IoMT环境中的强大适用性。此外,元学习方法取得了很高的成就。
摘要:The Internet of Medical Things (IoMT) is driving a healthcare revolution but remains vulnerable to cyberattacks such as denial of service, ransomware, data hijacking, and spoofing. These networks comprise resource constrained, heterogeneous devices (e.g., wearable sensors, smart pills, implantables), making traditional centralized Intrusion Detection Systems (IDSs) unsuitable due to response delays, privacy risks, and added vulnerabilities. Centralized IDSs require all sensors to transmit data to a central server, causing delays or network disruptions in dense environments. Running IDSs locally on IoMT devices is often infeasible due to limited computation, and even lightweight IDS components remain at risk if updated models are delayed leaving them exposed to zero-day attacks that threaten patient health and data security. We propose a multi level IoMT IDS framework capable of detecting zero day attacks and distinguishing between known and unknown threats. The first layer (near Edge) filters traffic at a coarse level (attack or not) using meta-learning or One Class Classification (OCC) with the usfAD algorithm. Subsequent layers (far Edge, Cloud) identify attack type and novelty. Experiments on the CICIoMT2024 dataset show 99.77 percentage accuracy and 97.8 percentage F1-score. The first layer detects zero-day attacks with high accuracy without needing new datasets, ensuring strong applicability in IoMT environments. Additionally, the meta-learning approach achieves high.
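文中第一层用元学习或一类分类(OCC,usfAD 算法)做粗粒度的"攻击/非攻击"过滤。下面用 scikit-learn 的 IsolationForest 代替 usfAD 给出示意:只用正常流量训练,推断时把偏离正常分布的样本标为可疑并转交后续层;特征维度、污染率等参数均为假设。

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal_traffic = rng.normal(0, 1, size=(2000, 12))          # 仅正常流量(12 维示意特征)
incoming = np.vstack([rng.normal(0, 1, size=(50, 12)),       # 正常流量
                      rng.normal(4, 1, size=(50, 12))])      # 偏离正常分布的可疑流量(含零日)

# 第一层(近边缘):一类分类只学习"正常"的分布,因此无需任何攻击样本即可拦截未知攻击
occ = IsolationForest(contamination=0.01, random_state=0).fit(normal_traffic)
is_attack = occ.predict(incoming) == -1                      # -1 表示离群,即疑似攻击
print("疑似攻击占比:", is_attack.mean())
# 被标记的样本再转发到远边缘/云端的后续层,判别攻击类型与新颖性
```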


【8】From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation
标题:从意图到执行:多模式思维链强化学习,用于精确生成CAD代码
链接:https://arxiv.org/abs/2508.10118

作者:aiyang Yu, Zhuofan Chen, Mengyang Zhao, Teng Fu, Bin Li, Xiangyang Xue
摘要:计算机辅助设计(CAD)在工程和制造中起着至关重要的作用,但目前的CAD工作流程需要广泛的领域专业知识和手动建模工作。大型语言模型(LLM)的最新进展使得从自然语言生成代码成为可能,为自动化参数化3D建模提供了新的机会。然而,由于需要逻辑推理、语法正确性和数值精度,将人类设计意图直接翻译成可执行的CAD代码仍然具有很大的挑战性。在这项工作中,我们提出了CAD-RL,这是一种多模态思想链(CoT)引导的强化学习后训练框架,用于CAD建模代码生成。我们的方法将基于CoT的冷启动与目标驱动的强化学习后训练相结合,使用三种特定于任务的奖励:可执行性奖励,几何准确性奖励和外部评估奖励。为了确保在稀疏和高方差奖励条件下稳定的策略学习,我们引入了三种有针对性的优化策略:用于改进探索的信任区域拉伸,用于增强维度参数准确性的精确令牌丢失,以及用于减少噪声监督的超长过滤。为了支持培训和基准测试,我们发布了ExeCAD,这是一个新的数据集,包含16,540个真实的CAD示例,带有成对的自然语言和结构化设计语言描述,可执行的CADQuery脚本和渲染的3D模型。实验表明,CAD-RL实现了显着的改进,在推理质量,输出精度和代码的可执行性比现有的VLM。
摘要:Computer-Aided Design (CAD) plays a vital role in engineering and manufacturing, yet current CAD workflows require extensive domain expertise and manual modeling effort. Recent advances in large language models (LLMs) have made it possible to generate code from natural language, opening new opportunities for automating parametric 3D modeling. However, directly translating human design intent into executable CAD code remains highly challenging, due to the need for logical reasoning, syntactic correctness, and numerical precision. In this work, we propose CAD-RL, a multimodal Chain-of-Thought (CoT) guided reinforcement learning post training framework for CAD modeling code generation. Our method combines CoT-based Cold Start with goal-driven reinforcement learning post training using three task-specific rewards: executability reward, geometric accuracy reward, and external evaluation reward. To ensure stable policy learning under sparse and high-variance reward conditions, we introduce three targeted optimization strategies: Trust Region Stretch for improved exploration, Precision Token Loss for enhanced dimensions parameter accuracy, and Overlong Filtering to reduce noisy supervision. To support training and benchmarking, we release ExeCAD, a novel dataset comprising 16,540 real-world CAD examples with paired natural language and structured design language descriptions, executable CADQuery scripts, and rendered 3D models. Experiments demonstrate that CAD-RL achieves significant improvements in reasoning quality, output precision, and code executability over existing VLMs.


【9】Neural Network-Based Detection and Multi-Class Classification of FDI Attacks in Smart Grid Home Energy Systems
标题:基于神经网络的智能电网家庭能源系统中FDI攻击检测和多类分类
链接:https://arxiv.org/abs/2508.10035

作者:n, Biswash Basnet
备注:17 pages, 7 figures
摘要:虚假数据注入攻击(FDIA)对智能电网基础设施构成了重大威胁,特别是高度采用实时监控和控制的家庭区域网络(HAN)。由于HAN相对不那么严格的安全控制和广泛的可用性,攻击者将其视为操纵聚合需求模式的有吸引力的入口点,这最终会传播并破坏更广泛的网格操作。这些攻击破坏了智能电表数据的完整性,使恶意行为者能够在不激活传统警报的情况下操纵消费值,从而在住宅和公用事业规模的基础设施中造成严重的漏洞。本文提出了一种基于机器学习的框架,用于使用住宅能源数据对FDIA进行检测和分类。实时检测由轻量级人工神经网络(ANN)提供,该网络通过使用能耗,成本和时间上下文的最重要特征来工作。对于不同攻击类型的分类,双向LSTM通过学习数据中的顺序依赖关系来训练识别正常,梯形和S形攻击形状。生成了一个合成时间序列数据集,以模拟现实的家庭行为。实验结果表明,所提出的模型是有效的识别和分类FDIA,提供了一个可扩展的解决方案,以提高网格弹性的边缘。这项工作有助于建立智能的、数据驱动的防御机制,从住宅端点加强智能电网的网络安全。
摘要:False Data Injection Attacks (FDIAs) pose a significant threat to smart grid infrastructures, particularly Home Area Networks (HANs), where real-time monitoring and control are highly adopted. Owing to the comparatively less stringent security controls and widespread availability of HANs, attackers view them as an attractive entry point to manipulate aggregated demand patterns, which can ultimately propagate and disrupt broader grid operations. These attacks undermine the integrity of smart meter data, enabling malicious actors to manipulate consumption values without activating conventional alarms, thereby creating serious vulnerabilities across both residential and utility-scale infrastructures. This paper presents a machine learning-based framework for both the detection and classification of FDIAs using residential energy data. A real-time detection is provided by the lightweight Artificial Neural Network (ANN), which works by using the most vital features of energy consumption, cost, and time context. For the classification of different attack types, a Bidirectional LSTM is trained to recognize normal, trapezoidal, and sigmoid attack shapes through learning sequential dependencies in the data. A synthetic time-series dataset was generated to emulate realistic household behaviour. Experimental results demonstrate that the proposed models are effective in identifying and classifying FDIAs, offering a scalable solution for enhancing grid resilience at the edge. This work contributes toward building intelligent, data-driven defence mechanisms that strengthen smart grid cybersecurity from residential endpoints.
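下面给出一个与文中描述一致的双向 LSTM 序列分类器骨架(PyTorch 示意,非原文代码),用于把用电时间序列分为正常、梯形攻击、S 形攻击三类;隐藏维度、层数与窗口长度均为示意参数。

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, n_features=3, hidden=64, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)   # 双向,隐藏维度乘 2

    def forward(self, x):            # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1]) # 取最后时刻的表示做分类

# 用法示意:48 个采样点(如半小时粒度的一天),3 个特征(用电量、成本、时间上下文)
model = BiLSTMClassifier()
x = torch.randn(8, 48, 3)
logits = model(x)                    # (8, 3):正常 / 梯形攻击 / S 形攻击
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 3, (8,)))
print(float(loss))
```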


【10】Whisper Smarter, not Harder: Adversarial Attack on Partial Suppression
标题:低语更聪明,而不是更难:对部分抑制的对抗攻击
链接:https://arxiv.org/abs/2508.09994

作者: Wong, Bingquan Shen
备注:13 pages, 7 figures
摘要:目前,自动语音识别(ASR)模型被部署在广泛的应用中。然而,最近的研究已经证明了对这些模型进行对抗性攻击的可能性,这可能会抑制或破坏模型输出。我们调查和验证这些攻击的鲁棒性,并探讨是否有可能提高其不可感知性。我们还发现,通过放松的优化目标从完全抑制部分抑制,我们可以进一步降低攻击的不可感知性。我们还探讨了针对这些攻击的可能防御措施,并表明低通滤波器防御可能是一种有效的防御措施。
摘要:Currently, Automatic Speech Recognition (ASR) models are deployed in an extensive range of applications. However, recent studies have demonstrated the possibility of adversarial attack on these models which could potentially suppress or disrupt model output. We investigate and verify the robustness of these attacks and explore if it is possible to increase their imperceptibility. We additionally find that by relaxing the optimisation objective from complete suppression to partial suppression, we can further decrease the imperceptibility of the attack. We also explore possible defences against these attacks and show a low-pass filter defence could potentially serve as an effective defence.
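针对文末提到的低通滤波防御,下面给出一个 SciPy 示意:用 Butterworth 低通滤波削弱通常集中在高频的对抗扰动;截止频率与阶数为示意取值,实际效果需结合具体攻击与 ASR 模型评估。

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_defense(audio, sr=16000, cutoff=4000, order=5):
    """对一维音频做零相位低通滤波(示意性防御)。"""
    b, a = butter(order, cutoff, btype="low", fs=sr)
    return filtfilt(b, a, audio)

# 用法示意:低频"语音"成分叠加高频对抗扰动
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 220 * t)               # 低频成分(示意)
perturb = 0.05 * np.sin(2 * np.pi * 7000 * t)     # 高频对抗扰动(示意)
defended = lowpass_defense(clean + perturb, sr=sr)
print(np.abs(defended - clean).max())              # 高频扰动被显著削弱
```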


半/弱/无/有监督|不确定性|主动学习(3篇)

【1】Uncertainty-Aware Prediction of Parkinson's Disease Medication Needs: A Two-Stage Conformal Prediction Approach
标题:帕金森病药物需求的不确定性预测:两阶段保形预测方法
链接:https://arxiv.org/abs/2508.10284

作者:iaz-Rincon, Muxuan Liang, Adolfo Ramirez-Zamora, Benjamin Shickel
备注:Accepted to MLHC 2025
摘要:帕金森病(PD)的药物管理提出了独特的挑战,由于异质性的疾病进展和治疗反应。神经科医生必须根据功能障碍平衡症状控制与最佳多巴胺能剂量,同时尽量减少副作用。这种平衡是至关重要的,因为不充分或突然的变化可能会导致左旋多巴诱导的运动障碍,磨损和神经精神影响,显着降低生活质量。目前的方法依赖于试错决策,而没有系统的预测方法。尽管机器学习取得了进步,但由于依赖于不考虑预测不确定性的点预测,临床应用仍然有限,从而破坏了临床信任和实用性。临床医生不仅需要预测未来的药物需求,还需要可靠的信心措施。在没有量化的不确定性的情况下,调整有可能过早增加到最大剂量或延长症状控制不足的时间。我们开发了一个适形预测框架,可以提前两年预测药物需求,具有可靠的预测间隔和统计保证。我们的方法解决了PD住院患者数据的零膨胀,患者在两次访视之间保持稳定的药物治疗方案。使用佛罗里达大学健康中心(2011-2021)631名住院患者的电子健康记录,我们的两阶段方法确定了可能需要改变药物的患者,然后预测了所需的左旋多巴等效日剂量调整。与传统方法相比,我们的框架实现了边际覆盖,同时减少了预测区间长度,为短期规划提供了精确的预测,为长期预测提供了更大的范围。通过量化不确定性,我们的方法能够对左旋多巴剂量做出基于证据的决策,优化症状控制,同时最大限度地减少副作用并提高生活质量。
摘要:Parkinson's Disease (PD) medication management presents unique challenges due to heterogeneous disease progression and treatment response. Neurologists must balance symptom control with optimal dopaminergic dosing based on functional disability while minimizing side effects. This balance is crucial as inadequate or abrupt changes can cause levodopa-induced dyskinesia, wearing off, and neuropsychiatric effects, significantly reducing quality of life. Current approaches rely on trial-and-error decisions without systematic predictive methods. Despite machine learning advances, clinical adoption remains limited due to reliance on point predictions that do not account for prediction uncertainty, undermining clinical trust and utility. Clinicians require not only predictions of future medication needs but also reliable confidence measures. Without quantified uncertainty, adjustments risk premature escalation to maximum doses or prolonged inadequate symptom control. We developed a conformal prediction framework anticipating medication needs up to two years in advance with reliable prediction intervals and statistical guarantees. Our approach addresses zero-inflation in PD inpatient data, where patients maintain stable medication regimens between visits. Using electronic health records from 631 inpatient admissions at University of Florida Health (2011-2021), our two-stage approach identifies patients likely to need medication changes, then predicts required levodopa equivalent daily dose adjustments. Our framework achieved marginal coverage while reducing prediction interval lengths compared to traditional approaches, providing precise predictions for short-term planning and wider ranges for long-term forecasting. By quantifying uncertainty, our approach enables evidence-based decisions about levodopa dosing, optimizing symptom control while minimizing side effects and improving life quality.
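下面用最小的分裂共形预测(split conformal)示意第二阶段"带边际覆盖保证的剂量调整区间"如何构造:在校准集上取残差分位数,再给新样本的点预测加减该分位数。回归器、特征与覆盖率均为示意;原文中处理零膨胀的第一阶段分类器此处省略。

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))
y = X[:, 0] * 50 + rng.normal(0, 20, size=600)     # 示意目标:左旋多巴等效剂量调整量

X_tr, y_tr = X[:400], y[:400]                       # 训练集
X_cal, y_cal = X[400:], y[400:]                     # 校准集
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

alpha = 0.1                                         # 目标失覆盖率(约 90% 覆盖)
resid = np.abs(y_cal - model.predict(X_cal))
q = np.quantile(resid, np.ceil((1 - alpha) * (len(resid) + 1)) / len(resid))

x_new = rng.normal(size=(1, 8))
pred = model.predict(x_new)[0]
print(f"预测区间: [{pred - q:.1f}, {pred + q:.1f}]")   # 点预测加减校准分位数
```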


【2】rETF-semiSL: Semi-Supervised Learning for Neural Collapse in Temporal Data
标题:rETF-semiSL:时态数据中神经崩溃的半监督学习
链接:https://arxiv.org/abs/2508.10147

作者:, William Cappelletti, Mahsa Shoaran, Pascal Frossard
备注:12 pages, 4 figures
摘要:用于时间序列的深度神经网络必须捕获复杂的时间模式,以有效地表示动态数据。自监督和半监督学习方法在预训练大型模型方面表现出了良好的效果,这些模型经分类微调后往往优于从头训练的模型。尽管如此,前置(pretext)训练任务的选择通常是启发式的,其向下游分类的可迁移性也无法保证。因此,我们提出了一种新的半监督预训练策略,使潜在表示满足在最优训练的神经分类器中观察到的神经坍缩(Neural Collapse)现象。我们使用旋转等角紧框架(ETF)分类器和伪标签,以少量标记样本预训练深度编码器。此外,为了在保证嵌入可分性的同时有效捕获时间动态,我们将生成式前置任务与我们的方法相结合,并定义了一种新的序列增强策略。我们表明,在三个多变量时间序列分类数据集上,将该方法应用于LSTM、Transformer和状态空间模型时,显著优于以往的前置任务。这些结果突出了将预训练目标与有理论依据的嵌入几何对齐的好处。
摘要:Deep neural networks for time series must capture complex temporal patterns, to effectively represent dynamic data. Self- and semi-supervised learning methods show promising results in pre-training large models, which -- when finetuned for classification -- often outperform their counterparts trained from scratch. Still, the choice of pretext training tasks is often heuristic and their transferability to downstream classification is not granted, thus we propose a novel semi-supervised pre-training strategy to enforce latent representations that satisfy the Neural Collapse phenomenon observed in optimally trained neural classifiers. We use a rotational equiangular tight frame-classifier and pseudo-labeling to pre-train deep encoders with few labeled samples. Furthermore, to effectively capture temporal dynamics while enforcing embedding separability, we integrate generative pretext tasks with our method, and we define a novel sequential augmentation strategy. We show that our method significantly outperforms previous pretext tasks when applied to LSTMs, transformers, and state-space models on three multivariate time series classification datasets. These results highlight the benefit of aligning pre-training objectives with theoretically grounded embedding geometry.
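下面给出单纯形等角紧框架(simplex ETF)分类器权重的标准构造(NumPy 示意):K 个类向量单位范数、两两余弦为 -1/(K-1),对应神经坍缩中最优分类器的几何形态;文中还会对其施加旋转并结合伪标签做半监督预训练,此处仅演示固定权重本身。

```python
import numpy as np

def simplex_etf(num_classes, dim, seed=0):
    """返回 (dim, num_classes) 的 ETF 分类器权重,要求 dim >= num_classes。"""
    K = num_classes
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.normal(size=(dim, K)))                  # 列正交的随机嵌入
    M = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)    # K 维标准单纯形 ETF
    return U @ M

W = simplex_etf(num_classes=10, dim=128)
G = W.T @ W
print(np.round(np.diag(G)[:3], 3))    # 对角元为 1:各类向量单位范数
print(np.round(G[0, 1], 3))           # 非对角元为 -1/(K-1) 约 -0.111:两两等角
```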


【3】Layer-Wise Analysis of Self-Supervised Representations for Age and Gender Classification in Children's Speech
标题:儿童言语中年龄和性别分类的自我监督表征的分层分析
链接:https://arxiv.org/abs/2508.10332

作者:inha, Harishankar Kumar, Mohit Joshi, Hemant Kumar Kathania, Shrikanth Narayanan, Sudarsana Reddy Kadiri
备注:Accepted at Workshop on Child Computer Interaction (WOCCI 2025)
摘要:儿童的语音提出了挑战,年龄和性别分类,由于高度的变化,在音高,清晰度和发展特点。虽然自我监督学习(SSL)模型在成人语音任务中表现良好,但它们对儿童说话者特征进行编码的能力仍然未得到充分研究。本文使用PFSTAR和CMU Kids数据集对四种Wav 2 Vec 2变体进行了详细的逐层分析。结果表明,早期层(1-7)比更深的层更有效地捕捉特定于说话者的线索,而更深的层越来越关注语言信息。应用PCA进一步改进了分类,减少了冗余并突出了信息量最大的组件。Wav 2 Vec 2-large-lv 60模型在CMU Kids上达到97.14%(年龄)和98.20%(性别); base-100 h和large-lv 60模型在PFSTAR上达到86.05%和95.00%。这些结果揭示了说话人特征是如何在SSL模型深度上进行结构化的,并支持针对儿童感知语音界面的更有针对性的自适应策略。
摘要 :Children's speech presents challenges for age and gender classification due to high variability in pitch, articulation, and developmental traits. While self-supervised learning (SSL) models perform well on adult speech tasks, their ability to encode speaker traits in children remains underexplored. This paper presents a detailed layer-wise analysis of four Wav2Vec2 variants using the PFSTAR and CMU Kids datasets. Results show that early layers (1-7) capture speaker-specific cues more effectively than deeper layers, which increasingly focus on linguistic information. Applying PCA further improves classification, reducing redundancy and highlighting the most informative components. The Wav2Vec2-large-lv60 model achieves 97.14% (age) and 98.20% (gender) on CMU Kids; base-100h and large-lv60 models reach 86.05% and 95.00% on PFSTAR. These results reveal how speaker traits are structured across SSL model depth and support more targeted, adaptive strategies for child-aware speech interfaces.


迁移|Zero/Few/One-Shot|自适应(7篇)

【1】PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning
标题:通过:用于可解释和自适应胸部X射线推理的概率统计超网采样
链接:https://arxiv.org/abs/2508.10501

作者:g, Junye Du, Yingying Hong, Qifan Wang, Lequan Yu
摘要:现有的工具增强代理系统在现实世界中受到以下限制:(i)黑盒推理步骤,破坏对决策的信任并带来安全风险;(ii)多模态集成能力差,而这对医疗任务至关重要;以及(iii)僵化且计算低效的代理管道。我们提出PASS(Probabilistic Agentic Supernet Sampling),这是第一个在胸部X射线(CXR)推理场景下解决这些挑战的多模态框架。PASS在多工具图上自适应地采样代理工作流,产生带有可解释概率注释的决策路径。面向具有多模态医疗数据的复杂CXR推理任务,PASS利用其在代理超网上学习到的任务条件分布,在每个超网层自适应地选择最合适的工具,为事后审计提供带概率注释的轨迹,并直接增强医疗AI安全性。PASS还不断将重要发现压缩到持续演化的个性化记忆中,同时动态决定是深化推理路径还是提前退出以提高效率。为了优化平衡性能与成本的Pareto前沿,我们设计了一种新的三阶段训练流程,包括专家知识预热、对比路径排序和成本感知强化学习。为便于严格评估,我们引入了CAB-E,一个面向多步骤、安全关键、自由形式CXR推理的综合基准。跨多个基准的实验验证了PASS在多项指标(如准确度、AUC、LLM-J.)上显著优于强基线,同时兼顾计算成本,推动范式向可解释、自适应、多模态的医疗代理系统转变。
摘要:Existing tool-augmented agentic systems are limited in the real world by (i) black-box reasoning steps that undermine trust of decision-making and pose safety risks, (ii) poor multimodal integration, which is inherently critical for healthcare tasks, and (iii) rigid and computationally inefficient agentic pipelines. We introduce PASS (Probabilistic Agentic Supernet Sampling), the first multimodal framework to address these challenges in the context of Chest X-Ray (CXR) reasoning. PASS adaptively samples agentic workflows over a multi-tool graph, yielding decision paths annotated with interpretable probabilities. Given the complex CXR reasoning task with multimodal medical data, PASS leverages its learned task-conditioned distribution over the agentic supernet. Thus, it adaptively selects the most suitable tool at each supernet layer, offering probability-annotated trajectories for post-hoc audits and directly enhancing medical AI safety. PASS also continuously compresses salient findings into an evolving personalized memory, while dynamically deciding whether to deepen its reasoning path or invoke an early exit for efficiency. To optimize a Pareto frontier balancing performance and cost, we design a novel three-stage training procedure, including expert knowledge warm-up, contrastive path-ranking, and cost-aware reinforcement learning. To facilitate rigorous evaluation, we introduce CAB-E, a comprehensive benchmark for multi-step, safety-critical, free-form CXR reasoning. Experiments across various benchmarks validate that PASS significantly outperforms strong baselines in multiple metrics (e.g., accuracy, AUC, LLM-J.) while balancing computational costs, pushing a new paradigm shift towards interpretable, adaptive, and multimodal medical agentic systems.


【2】EDAPT: Towards Calibration-Free BCIs with Continual Online Adaptation
标题:EDAPT:通过持续在线调整实现免校准的BCI
链接:https://arxiv.org/abs/2508.10474

作者:l, Jaivardhan Kapoor, Ulf Ziemann, Jakob H. Macke
备注:Preprint
摘要:脑机接口(BCI)的准确性会下降,因为神经信号会随着时间的推移而漂移,并且在用户之间会发生变化,需要频繁的重新校准,这限制了实际部署。我们介绍EDAPT,一个任务和模型无关的框架,通过不断的模型自适应消除校准。EDAPT首先使用来自多个用户的数据训练基线解码器,然后随着神经模式在使用过程中的演变,通过监督微调不断个性化该模型。我们在涵盖三个BCI任务的九个数据集上测试了EDAPT,发现它始终比传统的静态方法提高了准确性。这些改进主要来自于人口级预训练和在线持续微调的结合,以及无监督的领域自适应在某些数据集上提供进一步的增益。EDAPT高效运行,在消费级硬件上在200毫秒内更新模型。最后,解码准确性与总数据预算有关,而不是其在受试者和试验之间的分配。EDAPT提供了一种实现无校准BCI的实用途径,减少了BCI部署的主要障碍。
摘要:Brain-computer interfaces (BCIs) suffer from accuracy degradation as neural signals drift over time and vary across users, requiring frequent recalibration that limits practical deployment. We introduce EDAPT, a task- and model-agnostic framework that eliminates calibration through continual model adaptation. EDAPT first trains a baseline decoder using data from multiple users, then continually personalizes this model via supervised finetuning as the neural patterns evolve during use. We tested EDAPT across nine datasets covering three BCI tasks, and found that it consistently improved accuracy over conventional, static methods. These improvements primarily stem from combining population-level pretraining and online continual finetuning, with unsupervised domain adaptation providing further gains on some datasets. EDAPT runs efficiently, updating models within 200 milliseconds on consumer-grade hardware. Finally, decoding accuracy scales with total data budget rather than its allocation between subjects and trials. EDAPT provides a practical pathway toward calibration-free BCIs, reducing a major barrier to BCI deployment.


【3】Source Component Shift Adaptation via Offline Decomposition and Online Mixing Approach
标题:通过离线分解和在线混合方法的源成分转移适应
链接:https://arxiv.org/abs/2508.10257

作者:suno
备注:To appear in ECAI 2025
摘要:本文讨论了源分量移位自适应,旨在根据过去的训练数据更新预测,以适应传入数据流的源分量移位。现有的在线学习方法往往无法有效地利用经常性的变化,而基于模型库的方法很难捕捉单个源组件,导致适应性差。在本文中,我们提出了一种通过离线分解和在线混合方法的源分量移位自适应方法。我们从理论上确定,该问题可以分为两个子问题:离线源分量分解和在线混合权重自适应。在此基础上,我们的方法首先确定预测模型,每个模型都通过EM算法离线学习仅基于过去训练数据的源组件。然后,通过在线凸优化更新预测模型的混合权重,实现精确预测。由于我们的理论推导,我们的方法充分利用了移位的特性,实现了优于现有方法的自适应性能。在各种真实回归数据集上进行的实验表明,我们的方法优于基线,将累积测试损失减少了67.4%。
摘要 :This paper addresses source component shift adaptation, aiming to update predictions adapting to source component shifts for incoming data streams based on past training data. Existing online learning methods often fail to utilize recurring shifts effectively, while model-pool-based methods struggle to capture individual source components, leading to poor adaptation. In this paper, we propose a source component shift adaptation method via an offline decomposition and online mixing approach. We theoretically identify that the problem can be divided into two subproblems: offline source component decomposition and online mixing weight adaptation. Based on this, our method first determines prediction models, each of which learns a source component solely based on past training data offline through the EM algorithm. Then, it updates the mixing weight of the prediction models for precise prediction through online convex optimization. Thanks to our theoretical derivation, our method fully leverages the characteristics of the shifts, achieving superior adaptation performance over existing methods. Experiments conducted on various real-world regression datasets demonstrate that our method outperforms baselines, reducing the cumulative test loss by up to 67.4%.
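下面是"离线 EM 分解 + 在线混合权重自适应"两段式思路的一个玩具示意(非原文实现):离线阶段用 EM(此处以 sklearn 的 GaussianMixture 代替)把历史数据分解为若干源成分并分别拟合预测模型;在线阶段用指数梯度(一种在线凸优化方法)更新混合权重。成分数、学习率等均为假设。

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# 离线:历史数据由两个源成分混合而成,用 EM 分解并为每个成分训练一个预测模型
X = rng.normal(size=(1000, 1))
comp = (X[:, 0] > 0).astype(int)
y = np.where(comp == 1, 3 * X[:, 0] + 1, -2 * X[:, 0]) + rng.normal(0, 0.1, 1000)
gmm = GaussianMixture(n_components=2, random_state=0).fit(np.c_[X, y])
labels = gmm.predict(np.c_[X, y])
models = [LinearRegression().fit(X[labels == k], y[labels == k]) for k in range(2)]

# 在线:数据流当前来自其中一个成分,用指数梯度更新混合权重
w, eta = np.ones(2) / 2, 0.05
for _ in range(200):
    x_t = rng.normal(size=(1, 1))
    y_t = 3 * x_t[0, 0] + 1 + rng.normal(0, 0.1)
    preds = np.array([m.predict(x_t)[0] for m in models])
    grad = 2 * (w @ preds - y_t) * preds          # 平方损失对混合权重的梯度
    w = w * np.exp(-eta * grad)
    w = w / w.sum()                                # 投影回概率单纯形

print("在线学到的混合权重:", np.round(w, 3))
```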


【4】Multi-Agent Reinforcement Learning for Adaptive Resource Orchestration in Cloud-Native Clusters
标题:基于多Agent强化学习的云原生集群自适应资源配置
链接:https://arxiv.org/abs/2508.10253

作者:o, Heyao Liu, Linyan Dai
摘要:本文讨论了云原生数据库系统的高资源动态性和调度复杂性的挑战。提出了一种基于多Agent强化学习的自适应资源编排方法。该方法引入了一种基于角色的异构Agent建模机制。这允许不同的资源实体(例如计算节点、存储节点和调度器)采用不同的策略表示。这些代理人能够更好地反映系统内不同的职能责任和当地环境特点。设计了一种奖励形成机制,将局部观察与全局反馈相结合。这有助于减轻由不完整的状态观察引起的策略学习偏差。该机制将实时局部性能信号与全局系统价值估计相结合,提高了代理之间的协调性,增强了策略收敛的稳定性。开发了一个统一的多智能体训练框架,并在一个有代表性的生产调度数据集上进行了评估。实验结果表明,该方法优于传统的方法在多个关键指标。这些包括资源利用率,调度延迟,策略收敛速度,系统稳定性和公平性。结果具有较强的推广性和实用性。在不同的实验场景中,该方法被证明是有效的,在处理高并发,高维状态空间,复杂的依赖关系的编排任务。这证实了它在现实世界中的优势,大规模的调度环境。
摘要:This paper addresses the challenges of high resource dynamism and scheduling complexity in cloud-native database systems. It proposes an adaptive resource orchestration method based on multi-agent reinforcement learning. The method introduces a heterogeneous role-based agent modeling mechanism. This allows different resource entities, such as compute nodes, storage nodes, and schedulers, to adopt distinct policy representations. These agents are better able to reflect diverse functional responsibilities and local environmental characteristics within the system. A reward-shaping mechanism is designed to integrate local observations with global feedback. This helps mitigate policy learning bias caused by incomplete state observations. By combining real-time local performance signals with global system value estimation, the mechanism improves coordination among agents and enhances policy convergence stability. A unified multi-agent training framework is developed and evaluated on a representative production scheduling dataset. Experimental results show that the proposed method outperforms traditional approaches across multiple key metrics. These include resource utilization, scheduling latency, policy convergence speed, system stability, and fairness. The results demonstrate strong generalization and practical utility. Across various experimental scenarios, the method proves effective in handling orchestration tasks with high concurrency, high-dimensional state spaces, and complex dependency relationships. This confirms its advantages in real-world, large-scale scheduling environments.


【5】PakBBQ: A Culturally Adapted Bias Benchmark for QA
标题:PakBBQ:适合QA的文化适应偏见基准
链接:https://arxiv.org/abs/2508.10186

作者:Hashmat, Muhammad Arham Mirza, Agha Ali Raza
备注:8 pages, 7 figures, 2 tables, Submitted to EMNLP 2025
摘要:随着大型语言模型(LLM)在各种应用中的广泛采用,确保其对所有用户群体的公平性至关重要。然而,大多数LLM都是在以西方为中心的数据上进行训练和评估的,很少关注低资源语言和区域背景。为了弥补这一差距,我们引入了PakBBQ,它是原始问答偏见基准(BBQ)数据集在文化与区域上的适配扩展。PakBBQ包含超过214个模板、17180个英语和乌尔都语问答对,覆盖8个类别,涉及与巴基斯坦相关的8个偏见维度,包括年龄、残疾、外貌、性别、社会经济地位、宗教、地域归属和语言正式程度。我们在歧义与明确消歧两种情境、以及否定与非否定两种问题框架下评估了多个多语言LLM。实验表明:(i)消歧带来平均12%的准确率提升,(ii)乌尔都语下的反偏见行为始终强于英语,(iii)存在显著的框架效应,当问题以否定形式提出时,刻板印象式回答减少。这些结果强调了在低资源环境下情境化基准和简单的提示工程策略对缓解偏见的重要性。
摘要:With the widespread adoption of Large Language Models (LLMs) across various applications, it is imperative to ensure their fairness across all user communities. However, most LLMs are trained and evaluated on Western centric data, with little attention paid to low-resource languages and regional contexts. To address this gap, we introduce PakBBQ, a culturally and regionally adapted extension of the original Bias Benchmark for Question Answering (BBQ) dataset. PakBBQ comprises over 214 templates, 17180 QA pairs across 8 categories in both English and Urdu, covering eight bias dimensions including age, disability, appearance, gender, socio-economic status, religious, regional affiliation, and language formality that are relevant in Pakistan. We evaluate multiple multilingual LLMs under both ambiguous and explicitly disambiguated contexts, as well as negative versus non negative question framings. Our experiments reveal (i) an average accuracy gain of 12\% with disambiguation, (ii) consistently stronger counter bias behaviors in Urdu than in English, and (iii) marked framing effects that reduce stereotypical responses when questions are posed negatively. These findings highlight the importance of contextualized benchmarks and simple prompt engineering strategies for bias mitigation in low resource settings.


【6】An Iterative Algorithm for Differentially Private $k$-PCA with Adaptive Noise
标题:带自适应噪声的差分私有k$-PCA迭代算法
链接:https://arxiv.org/abs/2508.10879

作者:üngler, Amartya Sanyal
摘要:给定$n$个独立同分布的随机矩阵$A_i \in \mathbb{R}^{d \times d}$,它们共享共同的期望$\Sigma$,差分隐私随机PCA的目标是识别一个$k$维子空间,捕获$\Sigma$中方差最大的方向,同时保持每个个体$A_i$的差分隐私(DP)。现有方法要么(i)即使在$A_i$满足高斯假设时也要求样本量$n$随维度$d$超线性增长,要么(ii)即使$A_i$内在的随机性很小,也会为DP引入过多噪声。Liu等人(2022a)针对亚高斯数据解决了这些问题,但其算法DP-PCA仅能估计最大特征向量($k=1$)。我们提出了第一个能够对任意$k \leq d$估计前$k$个特征向量的算法,同时克服了上述两个限制。当$k=1$时,我们的算法与DP-PCA的效用保证相匹配,即使在$n = \tilde{O}(d)$时也能达到近最优的统计误差。我们进一步给出了一般$k > 1$情形的下界,与我们的上界最多相差$k$倍,并通过实验展示了我们的算法相对于可比基线的优势。
摘要:Given $n$ i.i.d. random matrices $A_i \in \mathbb{R}^{d \times d}$ that share a common expectation $\Sigma$, the objective of Differentially Private Stochastic PCA is to identify a subspace of dimension $k$ that captures the largest variance directions of $\Sigma$, while preserving differential privacy (DP) of each individual $A_i$. Existing methods either (i) require the sample size $n$ to scale super-linearly with dimension $d$, even under Gaussian assumptions on the $A_i$, or (ii) introduce excessive noise for DP even when the intrinsic randomness within $A_i$ is small. Liu et al. (2022a) addressed these issues for sub-Gaussian data but only for estimating the top eigenvector ($k=1$) using their algorithm DP-PCA. We propose the first algorithm capable of estimating the top $k$ eigenvectors for arbitrary $k \leq d$, whilst overcoming both limitations above. For $k=1$ our algorithm matches the utility guarantees of DP-PCA, achieving near-optimal statistical error even when $n = \tilde{\!O}(d)$. We further provide a lower bound for general $k > 1$, matching our upper bound up to a factor of $k$, and experimentally demonstrate the advantages of our algorithm over comparable baselines.
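作为参照,下面给出经典的"对经验协方差加对称高斯噪声、再取前 k 个特征向量"的 DP PCA 基线示意(即摘要中所说即便 $A_i$ 内在随机性很小也会引入过多噪声的那类做法,并非本文算法);噪声标定仅按高斯机制的常见形式写出,假设 $\|A_i\|_F \le 1$,未经严格的隐私证明,仅作说明用途。

```python
import numpy as np

def dp_pca_gaussian_baseline(A_list, k, epsilon, delta, seed=0):
    """A_list: n 个 d×d 样本矩阵;返回带噪前 k 个特征向量,形状 (d, k)。"""
    rng = np.random.default_rng(seed)
    n, d = len(A_list), A_list[0].shape[0]
    S = sum(A_list) / n                                    # 经验均值(Sigma 的估计)
    # 示意性噪声标定:单个样本对均值的 Frobenius 敏感度约为 2/n(假设 ||A_i||_F <= 1)
    sigma = (2.0 / n) * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    E = rng.normal(0, sigma, size=(d, d))
    S_noisy = (S + S.T) / 2 + (E + E.T) / 2                # 对称化后再加对称噪声
    eigvals, eigvecs = np.linalg.eigh(S_noisy)
    return eigvecs[:, ::-1][:, :k]                         # 按特征值降序取前 k 列

# 用法示意:50 个围绕秩一矩阵波动的观测,真实主方向为 e_1
rng = np.random.default_rng(1)
u = np.zeros(20); u[0] = 1.0
A_list = [np.outer(u, u) + 0.05 * rng.normal(size=(20, 20)) for _ in range(50)]
print(dp_pca_gaussian_baseline(A_list, k=2, epsilon=1.0, delta=1e-5).shape)   # (20, 2)
```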


【7】Dynamical Alignment: A Principle for Adaptive Neural Computation
标题:动态对齐:自适应神经计算的原理
链接:https://arxiv.org/abs/2508.10064

作者
备注:16 pages, 10 figures;
摘要 :神经网络的计算能力被广泛认为是由其静态结构决定的。在这里,我们通过建立一个固定的神经结构可以在根本上不同的计算模式下运行,而不是由其结构驱动,而是由其输入信号的时间动态来挑战这种观点。我们把这个原则称为“动态对齐”。   应用这一原理为长期存在的悖论提供了一个新的解决方案,即为什么大脑启发的尖峰神经网络(SNN)表现不佳。通过将静态输入编码为可控的动态轨迹,我们发现了一个双峰优化景观与相空间体积动力学的关键相变。由收缩动态驱动的“耗散”模式通过稀疏时间代码实现了优异的能量效率。相比之下,由扩展动态驱动的“扩展”模式释放了SNN在不同任务(包括分类、强化学习和认知整合)上匹配甚至超过人工神经网络所需的表征能力。   我们发现这种计算优势出现在输入动态和神经元整合之间的时间尺度对齐。这一原则,反过来,提供了一个统一的,可计算的角度来看,长期观察到的二元性神经科学,从稳定性可塑性的困境,分离整合动态。它表明,生物和人工系统中的计算可以通过“软件”在固定的“硬件”上动态塑造,这表明人工智能研究的潜在范式转变:远离设计复杂的静态架构,转向掌握自适应的动态计算原理。
摘要:The computational capabilities of a neural network are widely assumed to be determined by its static architecture. Here we challenge this view by establishing that a fixed neural structure can operate in fundamentally different computational modes, driven not by its structure but by the temporal dynamics of its input signals. We term this principle 'Dynamical Alignment'.   Applying this principle offers a novel resolution to the long-standing paradox of why brain-inspired spiking neural networks (SNNs) underperform. By encoding static input into controllable dynamical trajectories, we uncover a bimodal optimization landscape with a critical phase transition governed by phase space volume dynamics. A 'dissipative' mode, driven by contracting dynamics, achieves superior energy efficiency through sparse temporal codes. In contrast, an 'expansive' mode, driven by expanding dynamics, unlocks the representational power required for SNNs to match or even exceed their artificial neural network counterparts on diverse tasks, including classification, reinforcement learning, and cognitive integration.   We find this computational advantage emerges from a timescale alignment between input dynamics and neuronal integration. This principle, in turn, offers a unified, computable perspective on long-observed dualities in neuroscience, from stability-plasticity dilemma to segregation-integration dynamic. It demonstrates that computation in both biological and artificial systems can be dynamically sculpted by 'software' on fixed 'hardware', pointing toward a potential paradigm shift for AI research: away from designing complex static architectures and toward mastering adaptive, dynamic computation principles.


强化学习(7篇)

【1】REFN: A Reinforcement-Learning-From-Network Framework against 1-day/n-day Exploitations
标题:REFN:一个针对1天/n天剥削的网络强化学习框架
链接:https://arxiv.org/abs/2508.10701

作者:Yu, Lihong Liu, Ziyi Zhou, Fudu Xing, Kailong Wang, Yang Yang
摘要:1-day或n-day漏洞的利用对联网设备构成严重威胁,原因在于部署规模庞大且补丁延迟严重(平均修补时间超过60天)。现有防御措施(包括基于主机的修补和基于网络的过滤)并不充分:跨不同设备的可扩展性有限;存在兼容性问题,尤其是对嵌入式或遗留系统;部署过程(手动补丁验证)容易出错。为了解决这些问题,我们引入了REFN(Reinforcement Learning From Network),这是一个训练大型语言模型(LLM)自主生成网络过滤器以阻止1-day或n-day漏洞利用的新框架。REFN独特地采用由在线网络奖励驱动的强化学习(RL)而非传统的人类反馈(RLHF)来确保可扩展性;通过在边缘安全网关(Amazon Eero)上的统一部署来保证兼容性;并通过使用真实网络流量的在线验证提供鲁棒性。至关重要的是,REFN解决了训练LLM进行漏洞利用防御的三个核心挑战:1)通过基于Agentic RAG的知识蒸馏扩展当前LLM有限的漏洞修复专业知识;2)通过RL From VNF Pipeline将语言上下文(漏洞描述)转化为网络执法规则,弥合当前LLM的语言能力与网络之间的差距;3)通过惩罚错误输出的在线代理验证来应对LLM的幻觉和非确定性。在22个系列的1-day或n-day漏洞上的评估表明,REFN兼具有效性(准确率比替代方案高21.1%)、效率(平均修补时间为3.65小时)和可扩展性(可轻松扩展到1万台设备)。REFN是训练LLM以快速阻止大规模1-day或n-day漏洞利用的第一步。
摘要:The exploitation of 1 day or n day vulnerabilities poses severe threats to networked devices due to massive deployment scales and delayed patching (average Mean Time To Patch exceeds 60 days). Existing defenses, including host based patching and network based filtering, are inadequate due to limited scalability across diverse devices, compatibility issues especially with embedded or legacy systems, and error prone deployment process (manual patch validation). To address these issues, we introduce REFN (Reinforcement Learning From Network), a novel framework that trains Large Language Models (LLMs) to autonomously generate network filters to prevent 1 day or n day exploitations. REFN ensures scalability by uniquely employs Reinforcement Learning (RL) driven by online network rewards instead of traditional Human Feedback (RLHF). REFN guarantees compatibility via unified deployment on edge security gateways (Amazon Eero). REFN provides robustness via online validation using real network traffic. Crucially, REFN addresses three core challenges in training LLMs for exploit prevention: 1) expanding current LLMs limited vulnerability fixing expertise via Agentic RAG based Knowledge Distillation, 2) bridging current LLMs language to network gaps through an RL From VNF Pipeline that translates language context (vulnerability description) into network enforcement, 3) addressing the LLM hallucination and non determinism via the Online Agentic Validation that penalizes erroneous outputs. Evaluated across 22 families of 1 day or n day exploits, REFN demonstrates effectiveness (21.1 percent higher accuracy than alternatives), efficiency (Mean Time To Patch of 3.65 hours) and scalability (easily scale to 10K devices). REFN serves as an initial step toward training LLMs to rapidly prevent massive scale 1 day or n day exploitations.


【2】Variance Reduced Policy Gradient Method for Multi-Objective Reinforcement Learning
标题:多目标强化学习的方差减少策略梯度方法
链接:https://arxiv.org/abs/2508.10608

作者:idobene, Lorenzo Benedetti, Diego Arapovic
备注:7 pages, 4 figures
摘要:多目标强化学习(MORL)是传统强化学习(RL)的推广,旨在同时优化多个经常相互冲突的目标,而不是专注于单一奖励。这种方法在复杂的决策场景中至关重要:智能体必须在各种目标之间进行权衡,例如在最大化性能的同时尽量降低成本。我们考虑目标通过非线性标量化函数组合的MORL问题。与标准RL一样,策略梯度方法(PGM)是处理MORL中大规模连续状态-动作空间最有效的方法之一。然而,现有的MORL策略梯度方法样本效率低下,需要大量数据才能有效。以往解决该问题的尝试依赖于过于严格的假设,从而丧失了PGM在大状态-动作空间上的可扩展性优势。在这项工作中,我们通过引入方差缩减技术来降低策略梯度的样本复杂度,在保持一般性假设的同时解决样本效率问题。
摘要:Multi-Objective Reinforcement Learning (MORL) is a generalization of traditional Reinforcement Learning (RL) that aims to optimize multiple, often conflicting objectives simultaneously rather than focusing on a single reward. This approach is crucial in complex decision-making scenarios where agents must balance trade-offs between various goals, such as maximizing performance while minimizing costs. We consider the problem of MORL where the objectives are combined using a non-linear scalarization function. Just like in standard RL, policy gradient methods (PGMs) are amongst the most effective for handling large and continuous state-action spaces in MORL. However, existing PGMs for MORL suffer from high sample inefficiency, requiring large amounts of data to be effective. Previous attempts to solve this problem rely on overly strict assumptions, losing PGMs' benefits in scalability to large state-action spaces. In this work, we address the issue of sample efficiency by implementing variance-reduction techniques to reduce the sample complexity of policy gradients while maintaining general assumptions.


【3】Stabilizing Long-term Multi-turn Reinforcement Learning with Gated Rewards
标题:通过门控奖励稳定长期多轮强化学习
链接:https://arxiv.org/abs/2508.10548

作者:n, Dongfang Li, Zhuoen Chen, Yuhuai Qin, Baotian Hu
摘要:长时程强化学习(RL)任务中的奖励稀疏性仍然是一个重大挑战,而现有的基于结果的奖励塑造难以在不引入偏差或不要求显式任务分解的前提下定义有意义的即时奖励。另一种基于验证的奖励塑造使用逐步评审,但即时奖励与长期目标之间的不一致可能导致奖励劫持和次优策略。在这项工作中,我们在软件工程(SWE)任务背景下解决这一问题,这类任务中多轮推理和基于规则的验证至关重要。我们介绍了面向SWE的RL框架,这是一个支持多轮交互、基于docker的执行和可定制奖励函数的统一系统。此外,我们提出了门控奖励累积(G-RA),这是一种仅当高层(长期)奖励达到预定义阈值时才累积即时奖励的新方法,以确保稳定的RL优化。在SWE-bench Verified和kBench上的实验表明,G-RA提高了完成率(47.6%→93.8%与22.0%→86.0%)和修改率(19.6%→23.8%与12.0%→42.0%),同时避免了因奖励错位导致的策略退化。我们的结果强调了长时程RL中平衡奖励累积的重要性,并提供了一个实用的解决方案。
摘要:Reward sparsity in long-horizon reinforcement learning (RL) tasks remains a significant challenge, while existing outcome-based reward shaping struggles to define meaningful immediate rewards without introducing bias or requiring explicit task decomposition. Alternatively, verification-based reward shaping uses stepwise critics, but misalignment between immediate rewards and long-term objectives can lead to reward hacking and suboptimal policies. In this work, we address this problem in the context of software engineering (SWE) tasks, where multi-turn reasoning and rule-based verification are critical. We introduce the SWE-oriented RL Framework, a unified system supporting multi-turn interaction, docker-based execution, and customizable reward functions. Additionally, we propose Gated Reward Accumulation (G-RA), a novel method that accumulates immediate rewards only when high-level (long-term) rewards meet a predefined threshold, ensuring stable RL optimization. Experiments on SWE-bench Verified and kBench demonstrate that G-RA leads to an increase in completion rates (47.6\% \rightarrow 93.8\% and 22.0\% \rightarrow 86.0\%) and modification rates (19.6\% \rightarrow 23.8\% and 12.0\% \rightarrow 42.0\%), while avoiding policy degradation caused by reward misalignment. Our findings highlight the importance of balanced reward accumulation in long-horizon RL and provide a practical solution.
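G-RA 的核心规则可以用几行代码说明:只有当高层(长期)奖励达到预设阈值时,才把各步的即时奖励累计进来,否则即时奖励被门控掉,以避免即时信号与长期目标错位引发的奖励劫持。以下为示意实现,阈值与奖励取值均为假设。

```python
def gated_reward_accumulation(immediate_rewards, long_term_reward, threshold=1.0):
    """immediate_rewards: 轨迹中各步的即时奖励(如格式、可执行性等规则校验分);
    long_term_reward: 该轨迹的高层结果奖励(如补丁是否通过全部测试);
    仅当长期奖励达到阈值时才累计即时奖励。"""
    gate = 1.0 if long_term_reward >= threshold else 0.0
    return long_term_reward + gate * sum(immediate_rewards)

# 用法示意
print(gated_reward_accumulation([0.2, 0.3, 0.1], long_term_reward=1.0))  # 门开:1.6
print(gated_reward_accumulation([0.2, 0.3, 0.1], long_term_reward=0.0))  # 门关:0.0
```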


【4】Nonlocal Monte Carlo via Reinforcement Learning
标题:通过强化学习的非本地蒙特卡洛
链接:https://arxiv.org/abs/2508.10520

作者:obrynin, Masoud Mohseni, John Paul Strachan
摘要:优化或采样组合优化问题的复杂成本函数是跨学科和应用的长期挑战。当采用基于马尔可夫链蒙特卡罗(MCMC)的传统算法家族(例如模拟退火或并行回火)时,通常假设在输入上使用均匀(平衡)的温度分布。这种与实例无关的方法在接近计算相变的最难基准上被证明是无效的,此时所谓的重叠-间隙性质(overlap-gap property)成立。在这些情形下,传统MCMC难以解冻刚性变量、摆脱次优吸引域,也难以采样出高质量且多样化的解。为缓解这些挑战,人们提出了非平衡非局部蒙特卡罗(NMC)算法,它利用不均匀的温度分布加速对构型空间的探索,同时不影响对其的利用。在这里,我们采用深度强化学习(RL)来训练此前以现象学方式设计的NMC非局部转移策略。我们证明,仅以构型空间探索中的能量变化作为RL奖励、以局部极小点附近的能量景观几何作为RL状态,即可训练所得求解器。我们进一步表明,在困难的均匀随机与无标度随机4-SAT基准上,训练得到的策略在残余能量、求解时间和解的多样性指标上均优于基于MCMC的方法和非局部模拟退火。
摘要:Optimizing or sampling complex cost functions of combinatorial optimization problems is a longstanding challenge across disciplines and applications. When employing family of conventional algorithms based on Markov Chain Monte Carlo (MCMC) such as simulated annealing or parallel tempering, one assumes homogeneous (equilibrium) temperature profiles across input. This instance independent approach was shown to be ineffective for the hardest benchmarks near a computational phase transition when the so-called overlap-gap-property holds. In these regimes conventional MCMC struggles to unfreeze rigid variables, escape suboptimal basins of attraction, and sample high-quality and diverse solutions. In order to mitigate these challenges, Nonequilibrium Nonlocal Monte Carlo (NMC) algorithms were proposed that leverage inhomogeneous temperature profiles thereby accelerating exploration of the configuration space without compromising its exploitation. Here, we employ deep reinforcement learning (RL) to train the nonlocal transition policies of NMC which were previously designed phenomenologically. We demonstrate that the resulting solver can be trained solely by observing energy changes of the configuration space exploration as RL rewards and the local minimum energy landscape geometry as RL states. We further show that the trained policies improve upon the standard MCMC-based and nonlocal simulated annealing on hard uniform random and scale-free random 4-SAT benchmarks in terms of residual energy, time-to-solution, and diversity of solutions metrics.


【5】A Curriculum Learning Approach to Reinforcement Learning: Leveraging RAG for Multimodal Question Answering
标题:强化学习的课程学习方法:利用RAG进行多模式问题解答
链接:https://arxiv.org/abs/2508.10337

作者: Zhang, Lin Wang, Yuanyuan Lu, Yusheng Qi, Kexin Wang, Peixu Hou, Wenshi Chen
摘要:本文介绍了大众点评信任安全团队针对Meta CRAG-MM挑战的解决方案。这一挑战需要建立一个全面的检索增强生成系统,能够进行多模态多回合问题回答。比赛包括三个任务:(1)使用从基于图像的模拟知识图中检索到的结构化数据回答问题,(2)从知识图和网络搜索结果中合成信息,以及(3)处理需要上下文理解和来自多个来源的信息聚合的多轮对话。对于任务1,我们的解决方案基于vision大型语言模型,并通过从GPT-4.1中提取的知识进行监督微调来增强。我们进一步应用课程学习策略来指导强化学习,从而提高了答案的准确性并减少了幻觉。对于任务2和任务3,我们还利用Web搜索API来整合外部知识,使系统能够更好地处理复杂的查询和多轮对话。我们的方法在任务1中获得第一名,领先52.38%,在任务3中获得第三名,证明了课程学习与强化学习在我们的培训管道中整合的有效性。
摘要:This paper describes the solutions of the Dianping-Trust-Safety team for the META CRAG-MM challenge. The challenge requires building a comprehensive retrieval-augmented generation system capable for multi-modal multi-turn question answering. The competition consists of three tasks: (1) answering questions using structured data retrieved from an image-based mock knowledge graph, (2) synthesizing information from both knowledge graphs and web search results, and (3) handling multi-turn conversations that require context understanding and information aggregation from multiple sources. For Task 1, our solution is based on the vision large language model, enhanced by supervised fine-tuning with knowledge distilled from GPT-4.1. We further applied curriculum learning strategies to guide reinforcement learning, resulting in improved answer accuracy and reduced hallucination. For Task 2 and Task 3, we additionally leveraged web search APIs to incorporate external knowledge, enabling the system to better handle complex queries and multi-turn conversations. Our approach achieved 1st place in Task 1 with a significant lead of 52.38\%, and 3rd place in Task 3, demonstrating the effectiveness of the integration of curriculum learning with reinforcement learning in our training pipeline.


【6】A Personalized Exercise Assistant using Reinforcement Learning (PEARL): Results from a four-arm Randomized-controlled Trial
标题:使用强化学习(PEARL)的个性化锻炼助手:四臂随机对照试验的结果
链接:https://arxiv.org/abs/2508.10060

作者:to Lee, Narayan Hegde, Nina Deliu, Emily Rosenzweig, Arun Suggala, Sriram Lakshminarasimhan, Qian He, John Hernandez, Martin Seneviratne, Rahul Singh, Pradnesh Kalkar, Karthikeyan Shanmugam, Aravindan Raghuveer, Abhimanyu Singh, My Nguyen, James Taylor, Jatin Alla, Sofia S. Villar, Hulya Emir-Farinas
摘要:持续缺乏身体活动是一个重大的全球健康挑战。移动健康(mHealth)干预措施,特别是即时适应性干预措施(JITAIs),为可扩展的个性化身体活动(PA)推广提供了一个有前途的途径。然而,大规模开发和评估这种干预措施,同时整合强大的行为科学,提出了方法上的障碍。PEARL研究是第一项大规模的四臂随机对照试验,旨在评估强化学习(RL)算法,该算法以健康行为改变理论为基础,通过Fitbit应用程序个性化PA轻推的内容和时间。   我们招募了13,463名Fitbit用户,并将其随机分为四个研究组:对照组、随机组、固定组和RL组。控制臂没有受到轻推。另外三只手臂则从一个基于行为科学原理的155个轻推库中接受轻推。随机臂接受随机选择的轻推。固定臂收到轻推的基础上预先设定的逻辑从PA障碍的调查答复。RL组接收由自适应RL算法选择的轻推。我们在主要分析中纳入了7,711名参与者(平均年龄42.1岁,86.3%为女性,基线步数5,618.2)。   我们观察到,从基线到1个月和2个月,RL组的PA比所有其他组增加。与所有其他组相比,RL组在1个月时的平均每日步数显著增加:对照组(+296步,p=0.0002)、随机组(+218步,p=0.005)和固定组(+238步,p=0.002)。在2个月时,RL组与对照组相比持续显著增加(+210步,p=0.0122)。广义估计方程模型还显示,RL组与对照组相比,每日步数持续增加(+208步,p=0.002)。这些发现表明了一种可扩展的、行为上知情的RL方法的潜力,可以为PA提供个性化的数字健康干预。
摘要 :Consistent physical inactivity poses a major global health challenge. Mobile health (mHealth) interventions, particularly Just-in-Time Adaptive Interventions (JITAIs), offer a promising avenue for scalable, personalized physical activity (PA) promotion. However, developing and evaluating such interventions at scale, while integrating robust behavioral science, presents methodological hurdles. The PEARL study was the first large-scale, four-arm randomized controlled trial to assess a reinforcement learning (RL) algorithm, informed by health behavior change theory, to personalize the content and timing of PA nudges via a Fitbit app.   We enrolled and randomized 13,463 Fitbit users into four study arms: control, random, fixed, and RL. The control arm received no nudges. The other three arms received nudges from a bank of 155 nudges based on behavioral science principles. The random arm received nudges selected at random. The fixed arm received nudges based on a pre-set logic from survey responses about PA barriers. The RL group received nudges selected by an adaptive RL algorithm. We included 7,711 participants in primary analyses (mean age 42.1, 86.3% female, baseline steps 5,618.2).   We observed an increase in PA for the RL group compared to all other groups from baseline to 1 and 2 months. The RL group had significantly increased average daily step count at 1 month compared to all other groups: control (+296 steps, p=0.0002), random (+218 steps, p=0.005), and fixed (+238 steps, p=0.002). At 2 months, the RL group sustained a significant increase compared to the control group (+210 steps, p=0.0122). Generalized estimating equation models also revealed a sustained increase in daily steps in the RL group vs. control (+208 steps, p=0.002). These findings demonstrate the potential of a scalable, behaviorally-informed RL approach to personalize digital health interventions for PA.


【7】SegDAC: Segmentation-Driven Actor-Critic for Visual Reinforcement Learning
标题:SegADC:分段驱动的视觉强化学习演员评论家
链接:https://arxiv.org/abs/2508.09325

作者: Brown, Glen Berseth
摘要:视觉强化学习(RL)具有挑战性,因为需要从高维输入和噪声奖励中同时学习感知和动作。虽然已有大型感知模型,但如何将它们有效地集成到RL中以实现视觉泛化并提高样本效率仍不清楚。我们提出了SegDAC,一种分割驱动的Actor-Critic方法。SegDAC使用Segment Anything(SAM)进行以对象为中心的分解,并使用YOLO-World通过文本提示为分割赋予语义。它包含一种新的基于transformer的架构,在每个时间步支持动态数量的分割片段,并通过在线RL有效学习应关注哪些片段,而无需人工标签。我们在基于ManiSkill3的具有挑战性的视觉泛化基准上评估SegDAC,该基准覆盖强视觉扰动下的多种操作任务;结果表明SegDAC实现了显著更好的视觉泛化,在最困难的设置下性能翻倍,并在所有评估任务的样本效率上匹配或超越以往方法。
摘要:Visual reinforcement learning (RL) is challenging due to the need to learn both perception and actions from high-dimensional inputs and noisy rewards. Although large perception models exist, integrating them effectively into RL for visual generalization and improved sample efficiency remains unclear. We propose SegDAC, a Segmentation-Driven Actor-Critic method. SegDAC uses Segment Anything (SAM) for object-centric decomposition and YOLO-World to ground segments semantically via text prompts. It includes a novel transformer-based architecture that supports a dynamic number of segments at each time step and effectively learns which segments to focus on using online RL, without using human labels. By evaluating SegDAC over a challenging visual generalization benchmark using Maniskill3, which covers diverse manipulation tasks under strong visual perturbations, we demonstrate that SegDAC achieves significantly better visual generalization, doubling prior performance on the hardest setting and matching or surpassing prior methods in sample efficiency across all evaluated tasks.


符号|符号学习(1篇)

【1】Understanding Textual Emotion Through Emoji Prediction
标题:通过拼音预测理解文本情感
链接:https://arxiv.org/abs/2508.10222

作者:don, Nishank Kuppa, Rigved Tummala, Sriram Anasuri
摘要:该项目使用四种深度学习架构探索短文本序列的表情符号预测:前馈网络,CNN,Transformer和BERT。使用TweetEval数据集,我们通过焦点丢失和正则化技术来解决类不平衡问题。结果显示,由于其预训练优势,BERT实现了最高的整体性能,而CNN在罕见的表情符号类上表现出更好的效果。该研究表明了架构选择和超参数调整对于情感感知表情符号预测的重要性,有助于改善人机交互。
摘要:This project explores emoji prediction from short text sequences using four deep learning architectures: a feed-forward network, CNN, transformer, and BERT. Using the TweetEval dataset, we address class imbalance through focal loss and regularization techniques. Results show BERT achieves the highest overall performance due to its pre-training advantage, while CNN demonstrates superior efficacy on rare emoji classes. This research shows the importance of architecture selection and hyperparameter tuning for sentiment-aware emoji prediction, contributing to improved human-computer interaction.
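文中用焦点损失(focal loss)缓解稀有表情符号类别的不平衡。下面是多分类 focal loss 的一个常见 PyTorch 写法(gamma、alpha 为示意取值,并非原文超参):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """logits: (batch, n_classes);targets: (batch,) 类别索引;
    alpha: 可选的 (n_classes,) 类别权重,可进一步补偿稀有类。"""
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # 真实类的对数概率
    pt = log_pt.exp()
    weight = (1 - pt) ** gamma                                      # 易分样本被降权
    if alpha is not None:
        weight = weight * alpha[targets]
    return -(weight * log_pt).mean()

# 用法示意:20 个表情符号类别
logits = torch.randn(8, 20)
targets = torch.randint(0, 20, (8,))
print(focal_loss(logits, targets))
```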


医学相关(5篇)

【1】Mobile-Friendly Deep Learning for Plant Disease Detection: A Lightweight CNN Benchmark Across 101 Classes of 33 Crops
标题:用于植物病害检测的移动友好深度学习:跨越101类(33种作物)的轻量级CNN基准
链接:https://arxiv.org/abs/2508.10817

作者:ar, Harminder Pal Monga, Tapasi Brahma, Satyam Kalra, Navas Sherif
备注:15 pages, 5 figures, 2 tables
摘要:植物病害是全球粮食安全的主要威胁,开发能够准确识别病害的早期检测系统非常重要。计算机视觉技术的进步有望应对这一挑战。我们开发了一种移动端友好的解决方案,可以对33种作物的101类植物病害进行准确分类。我们组合了Plant Doc、PlantVillage和PlantWild这几个用途相同的数据集,构建了一个综合数据集。我们评估了几种专门针对资源受限设备效率而选择的轻量级架构(MobileNetV2、MobileNetV3、MobileNetV3-Large以及EfficientNet-B0、B1)。结果令人鼓舞,EfficientNet-B1以94.7%的分类准确率取得了最佳表现。该架构在准确性和计算效率之间取得了最佳平衡,非常适合在移动设备上实际部署。
摘要:Plant diseases are a major threat to food security globally. It is important to develop early detection systems which can accurately detect. The advancement in computer vision techniques has the potential to solve this challenge. We have developed a mobile-friendly solution which can accurately classify 101 plant diseases across 33 crops. We built a comprehensive dataset by combining different datasets, Plant Doc, PlantVillage, and PlantWild, all of which are for the same purpose. We evaluated performance across several lightweight architectures - MobileNetV2, MobileNetV3, MobileNetV3-Large, and EfficientNet-B0, B1 - specifically chosen for their efficiency on resource-constrained devices. The results were promising, with EfficientNet-B1 delivering our best performance at 94.7% classification accuracy. This architecture struck an optimal balance between accuracy and computational efficiency, making it well-suited for real-world deployment on mobile devices.
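复现这类轻量级基准时,常见做法是加载 ImageNet 预训练权重并把分类头替换为 101 类后做微调。下面用 torchvision 给出示意(输入分辨率、是否冻结主干等细节为假设):

```python
import torch
import torch.nn as nn
from torchvision import models

# 加载 ImageNet 预训练的 EfficientNet-B1,并把分类头换成 101 类植物病害
weights = models.EfficientNet_B1_Weights.IMAGENET1K_V1
model = models.efficientnet_b1(weights=weights)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 101)

# 可选:先冻结主干,只训练新分类头,适合小数据量的移动端微调
for p in model.features.parameters():
    p.requires_grad = False

x = torch.randn(1, 3, 240, 240)      # B1 常用的输入分辨率(示意)
print(model(x).shape)                 # torch.Size([1, 101])
```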


【2】Detecting and explaining postpartum depression in real-time with generative artificial intelligence
标题:利用生成性人工智能实时检测和解释产后抑郁症
链接:https://arxiv.org/abs/2508.10025

作者:rcía-Méndez, Francisco de Arriba-Pérez
摘要:在母亲分娩后面临的许多挑战中,产后抑郁症(PPD)是一种严重的疾病,严重影响她们的身心健康。因此,产后抑郁症及其相关危险因素的快速检测对于及时评估和通过专门的预防程序进行干预至关重要。因此,这项工作解决了帮助从业者利用最新技术进步做出决策的需要,以实现实时筛查和治疗建议。主要是,我们的工作有助于智能PPD筛查系统,该系统结合了自然语言处理,机器学习(ML)和大型语言模型(LLM),以实现负担得起的、实时的和非侵入性的自由言语分析。此外,它还解决了黑箱问题,因为通过将LLM与可解释的ML模型(即基于树的算法)相结合,利用特征重要性和自然语言向最终用户描述预测结果。所有评价指标的PPD检测结果均为90%,优于文献中的竞争解决方案。最终,我们的解决方案有助于快速检测PPD及其相关风险因素,这对于及时和适当的评估和干预至关重要。
摘要:Among the many challenges mothers undergo after childbirth, postpartum depression (PPD) is a severe condition that significantly impacts their mental and physical well-being. Consequently, the rapid detection of ppd and their associated risk factors is critical for in-time assessment and intervention through specialized prevention procedures. Accordingly, this work addresses the need to help practitioners make decisions with the latest technological advancements to enable real-time screening and treatment recommendations. Mainly, our work contributes to an intelligent PPD screening system that combines Natural Language Processing, Machine Learning (ML), and Large Language Models (LLMs) towards an affordable, real-time, and non-invasive free speech analysis. Moreover, it addresses the black box problem since the predictions are described to the end users thanks to the combination of LLMs with interpretable ml models (i.e., tree-based algorithms) using feature importance and natural language. The results obtained are 90 % on ppd detection for all evaluation metrics, outperforming the competing solutions in the literature. Ultimately, our solution contributes to the rapid detection of PPD and their associated risk factors, critical for in-time and proper assessment and intervention.


【3】A Robust Pipeline for Differentially Private Federated Learning on Imbalanced Clinical Data using SMOTETomek and FedProx
标题:使用SMOTETomek和FedProx对不平衡临床数据进行差分隐私联邦学习的稳健管道
链接:https://arxiv.org/abs/2508.10017

作者:ertulino
备注:This is being prepared to be submitted to the Journal of the Brazilian Computer Society (JBCS), which is still under construction
摘要:联邦学习(FL)为协作健康研究提供了一种开创性的方法,允许在分散数据上进行模型训练,同时保护患者隐私。FL在与差分隐私(DP)结合时提供正式的安全保证。然而,这些技术的集成在隐私和临床实用性之间引入了一个重要的权衡,这一挑战由于医疗数据集中经常存在的严重类别不平衡而进一步复杂化。本文提出的研究通过系统的多阶段分析解决这些相互关联的问题。一个FL框架被用于心血管风险预测,最初的实验表明,标准方法难以处理不平衡的数据,导致召回率为零。为了克服这种限制,我们首先在客户端层面将混合合成少数过采样技术与Tomek Links(SMOTETomek)集成在一起,成功开发了一个临床有用的模型。随后,使用调整的FedProx算法针对非IID数据对框架进行了优化。我们的最终结果揭示了隐私预算(epsilon)和模型召回之间的明确的非线性权衡,优化的FedProx始终优于标准的FedAvg。在隐私-效用边界上确定了一个最佳操作区域,在该区域可以实现强隐私保证(epsilon为9.0),同时保持高临床效用(召回率大于77%)。最终,我们的研究提供了一个实用的方法蓝图,用于创建有效,安全和准确的诊断工具,可应用于现实世界的异构医疗数据。
摘要:Federated Learning (FL) presents a groundbreaking approach for collaborative health research, allowing model training on decentralized data while safeguarding patient privacy. FL offers formal security guarantees when combined with Differential Privacy (DP). The integration of these technologies, however, introduces a significant trade-off between privacy and clinical utility, a challenge further complicated by the severe class imbalance often present in medical datasets. The research presented herein addresses these interconnected issues through a systematic, multi-stage analysis. An FL framework was implemented for cardiovascular risk prediction, where initial experiments showed that standard methods struggled with imbalanced data, resulting in a recall of zero. To overcome such a limitation, we first integrated the hybrid Synthetic Minority Over-sampling Technique with Tomek Links (SMOTETomek) at the client level, successfully developing a clinically useful model. Subsequently, the framework was optimized for non-IID data using a tuned FedProx algorithm. Our final results reveal a clear, non-linear trade-off between the privacy budget (epsilon) and model recall, with the optimized FedProx consistently out-performing standard FedAvg. An optimal operational region was identified on the privacy-utility frontier, where strong privacy guarantees (with epsilon 9.0) can be achieved while maintaining high clinical utility (recall greater than 77%). Ultimately, our study provides a practical methodological blueprint for creating effective, secure, and accurate diagnostic tools that can be applied to real-world, heterogeneous healthcare data.
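
下面给出一个极简示意(Python),展示摘要中两个关键组件的常见用法:客户端层面用 imbalanced-learn 的 SMOTETomek 做重采样,以及 FedProx 在本地目标中加入的近端项;数据、mu 等均为假设,差分隐私部分未包含。

from imblearn.combine import SMOTETomek
import numpy as np
import torch

# 客户端本地数据重采样(示意数据)
X_local = np.random.randn(500, 10)
y_local = np.random.randint(0, 2, 500)
X_res, y_res = SMOTETomek(random_state=0).fit_resample(X_local, y_local)

# FedProx 的近端项:在本地经验损失上加 (mu/2) * ||w - w_global||^2
def fedprox_penalty(model, global_params, mu=0.01):
    penalty = 0.0
    for p, g in zip(model.parameters(), global_params):
        penalty = penalty + ((p - g.detach()) ** 2).sum()
    return 0.5 * mu * penalty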


【4】User Perception of Attention Visualizations: Effects on Interpretability Across Evidence-Based Medical Documents
标题:注意力可视化的用户感知:对循证医学文档可解释性的影响
链接:https://arxiv.org/abs/2508.10004

作者:rvallo, Denis Parra, Peter Brusilovsky, Hernan Valdivieso, Gabriel Rada, Ivania Donoso, Vladimir Araujo
摘要:注意力机制是Transformer体系结构的核心组件。除了提高性能之外,注意力还被提议作为一种通过注意力权重进行解释的机制,注意力权重与输入特征(例如,文档中的令牌)相关联。在这种情况下,更大的注意力权重可能意味着对模型预测更相关的特征。在循证医学中,此类解释可以支持医生理解用于对生物医学文献进行分类的人工智能系统并与之互动。然而,对于注意力权重是否提供了有用的解释,仍然没有达成共识。此外,很少有研究探讨注意力的可视化方式如何影响其作为解释辅助工具的有用性。为了弥补这一差距,我们进行了一项用户研究,以评估基于注意力的解释是否支持用户进行生物医学文档分类,以及是否存在首选的可视化方式。这项研究涉及来自不同学科的医学专家,他们根据研究设计(例如,系统综述、广泛综合、随机和非随机试验)对文章进行分类。我们的研究结果表明,Transformer模型(XLNet)能够准确地对文档进行分类;然而,注意力权重并未被认为对解释预测特别有帮助。不过,这种感知会根据注意力的可视化方式而发生显着变化。与Munzner偏好条形长度等精确编码的视觉有效性原则相反,用户更喜欢文本亮度或背景颜色等更直观的格式。虽然我们的研究结果并没有证实注意力权重的整体效用,但它们表明,其被感知到的帮助程度受到视觉呈现方式的影响。
摘要:The attention mechanism is a core component of the Transformer architecture. Beyond improving performance, attention has been proposed as a mechanism for explainability via attention weights, which are associated with input features (e.g., tokens in a document). In this context, larger attention weights may imply more relevant features for the model's prediction. In evidence-based medicine, such explanations could support physicians' understanding and interaction with AI systems used to categorize biomedical literature. However, there is still no consensus on whether attention weights provide helpful explanations. Moreover, little research has explored how visualizing attention affects its usefulness as an explanation aid. To bridge this gap, we conducted a user study to evaluate whether attention-based explanations support users in biomedical document classification and whether there is a preferred way to visualize them. The study involved medical experts from various disciplines who classified articles based on study design (e.g., systematic reviews, broad synthesis, randomized and non-randomized trials). Our findings show that the Transformer model (XLNet) classified documents accurately; however, the attention weights were not perceived as particularly helpful for explaining the predictions. However, this perception varied significantly depending on how attention was visualized. Contrary to Munzner's principle of visual effectiveness, which favors precise encodings like bar length, users preferred more intuitive formats, such as text brightness or background color. While our results do not confirm the overall utility of attention weights for explanation, they suggest that their perceived helpfulness is influenced by how they are visually presented.


【5】Bridging AI Innovation and Healthcare Needs: Lessons Learned from Incorporating Modern NLP at The BC Cancer Registry
标题:弥合人工智能创新与医疗保健需求:从BC癌症登记处推广现代NLP中学到的教训
链接:https://arxiv.org/abs/2508.09991

作者:Gondara, Gregory Arbour, Raymond Ng, Jonathan Simkin, Shebnum Devji
摘要 :从临床文档中自动提取数据为提高医疗保健环境的效率提供了巨大的潜力,但部署自然语言处理(NLP)解决方案带来了实际挑战。根据我们在不列颠哥伦比亚省癌症登记处(BCCR)实施各种NLP模型进行信息提取和分类任务的经验,本文分享了在整个项目生命周期中学到的关键经验教训。我们强调基于明确的业务目标而不仅仅是技术准确性来定义问题的至关重要性,采用迭代的开发方法,并从一开始就促进涉及领域专家,最终用户和ML专家的深度跨学科协作和共同设计。进一步的见解强调了务实的模型选择(包括适当的混合方法和更简单的方法),严格关注数据质量(代表性,漂移,注释),强大的错误缓解策略,包括人在回路验证和持续审计,以及建立组织AI素养。这些实用的考虑因素,可推广到癌症登记之外,为寻求成功实施AI/NLP解决方案的医疗保健组织提供指导,以增强数据管理流程,并最终改善患者护理和公共卫生结果。
摘要:Automating data extraction from clinical documents offers significant potential to improve efficiency in healthcare settings, yet deploying Natural Language Processing (NLP) solutions presents practical challenges. Drawing upon our experience implementing various NLP models for information extraction and classification tasks at the British Columbia Cancer Registry (BCCR), this paper shares key lessons learned throughout the project lifecycle. We emphasize the critical importance of defining problems based on clear business objectives rather than solely technical accuracy, adopting an iterative approach to development, and fostering deep interdisciplinary collaboration and co-design involving domain experts, end-users, and ML specialists from inception. Further insights highlight the need for pragmatic model selection (including hybrid approaches and simpler methods where appropriate), rigorous attention to data quality (representativeness, drift, annotation), robust error mitigation strategies involving human-in-the-loop validation and ongoing audits, and building organizational AI literacy. These practical considerations, generalizable beyond cancer registries, provide guidance for healthcare organizations seeking to successfully implement AI/NLP solutions to enhance data management processes and ultimately improve patient care and public health outcomes.


蒸馏|知识提取(3篇)

【1】A Dataset for Distilling Knowledge Priors from Literature for Therapeutic Design
标题:从治疗设计文献中提取知识先验的数据集
链接:https://arxiv.org/abs/2508.10899

作者:mas Jones, Natalie Maus, Josh Magnus Ludan, Maggie Ziyu Huan, Jiaming Liang, Marcelo Der Torossian Torres, Jiatao Liang, Zachary Ives, Yoseph Barash, Cesar de la Fuente-Nunez, Jacob R. Gardner, Mark Yatskar
摘要:人工智能驱动的发现可以大大缩短设计时间,提高新疗法的有效性。使用模拟器的模型探索了广阔的设计空间,但由于缺乏实验先验,存在违反隐含约束的风险。例如,在我们使用监督分类器对GuacaMol基准上的不同模型集进行的一项新分析中,超过60%的分子具有很高的致突变性概率。在这项工作中,我们介绍了一个从描述实验室环境中所用化合物的文献中提取的设计问题先验数据集。它是用LLM管道构建的,用于在相关段落中发现治疗实体,并以简明的合理使用(fair-use)事实形式总结信息。我们的数据集包含3230万对自然语言事实和相应的实体表示(即SMILES或refseq ID)。为了展示数据的潜力,我们训练LLM、CLIP和LLaVA架构,以联合推理文本和设计目标,并在来自治疗数据共享空间(TDC)的任务上进行评估。该数据集对于创建具有强先验的模型非常有效:在使用我们的数据作为预训练的监督预测问题中,我们具有15M可学习参数的最佳模型在回归和分类TDC任务上的表现均优于更大的2B TxGemma,并且平均而言与9B模型表现相当。使用该数据集构建的模型可以在GuacaMol中优化新分子时用作约束条件,从而产生更安全且几乎同样有效的建议。我们在https://huggingface.co/datasets/medexanon/Medex发布了该数据集,并将随着可用文献的增长提供扩展版本。
摘要:AI-driven discovery can greatly reduce design time and enhance new therapeutics' effectiveness. Models using simulators explore broad design spaces but risk violating implicit constraints due to a lack of experimental priors. For example, in a new analysis we performed on a diverse set of models on the GuacaMol benchmark using supervised classifiers, over 60\% of molecules proposed had high probability of being mutagenic. In this work, we introduce \ourdataset, a dataset of priors for design problems extracted from literature describing compounds used in lab settings. It is constructed with LLM pipelines for discovering therapeutic entities in relevant paragraphs and summarizing information in concise fair-use facts. \ourdataset~ consists of 32.3 million pairs of natural language facts, and appropriate entity representations (i.e. SMILES or refseq IDs). To demonstrate the potential of the data, we train LLM, CLIP, and LLava architectures to reason jointly about text and design targets and evaluate on tasks from the Therapeutic Data Commons (TDC). \ourdataset~is highly effective for creating models with strong priors: in supervised prediction problems that use our data as pretraining, our best models with 15M learnable parameters outperform larger 2B TxGemma on both regression and classification TDC tasks, and perform comparably to 9B models on average. Models built with \ourdataset~can be used as constraints while optimizing for novel molecules in GuacaMol, resulting in proposals that are safer and nearly as effective. We release our dataset at \href{https://huggingface.co/datasets/medexanon/Medex}{huggingface.co/datasets/medexanon/Medex}, and will provide expanded versions as available literature grows.


【2】Reflect then Learn: Active Prompting for Information Extraction Guided by Introspective Confusion
标题:反思然后学习:在内省式困惑引导下的主动提示信息提取
链接:https://arxiv.org/abs/2508.10036

作者:, Yadong Wang, Xiang Chen, Chenxi Wang, Hongliang Dai, Chuanxing Geng, Shengzhong Zhang, Shaoyuan Li, Sheng-Jun Huang
备注:Under Review
摘要:大型语言模型在Few-Shot信息抽取方面表现出巨大的潜力,但其性能对上下文示例的选择高度敏感。传统的选择策略往往无法提供信息的指导,因为他们忽略了模型易犯错误的一个关键来源:不仅来自语义内容的混乱,而且来自IE任务所需的结构良好的格式的生成。为了解决这个问题,我们介绍了主动提示信息提取(APIE),一种新的主动提示框架的指导原则,我们术语内省的混乱。我们的方法使LLM能够通过一个双分量不确定性度量来评估自己的混乱,该度量唯一地量化了格式不确定性(生成正确语法的困难)和内容不确定性(提取的语义不一致)。通过用这个综合分数对未标记的数据进行排名,我们的框架主动选择最具挑战性和信息量最大的样本作为Few-Shot样本。在四个基准上的大量实验表明,我们的方法始终优于强基线,在提取精度和鲁棒性方面都有显着提高。我们的工作强调了在构建有效和可靠的结构化生成系统时,模型不确定性的细粒度,双层视图的至关重要性。
摘要:Large Language Models (LLMs) show remarkable potential for few-shot information extraction (IE), yet their performance is highly sensitive to the choice of in-context examples. Conventional selection strategies often fail to provide informative guidance, as they overlook a key source of model fallibility: confusion stemming not just from semantic content, but also from the generation of well-structured formats required by IE tasks. To address this, we introduce Active Prompting for Information Extraction (APIE), a novel active prompting framework guided by a principle we term introspective confusion. Our method empowers an LLM to assess its own confusion through a dual-component uncertainty metric that uniquely quantifies both Format Uncertainty (difficulty in generating correct syntax) and Content Uncertainty (inconsistency in extracted semantics). By ranking unlabeled data with this comprehensive score, our framework actively selects the most challenging and informative samples to serve as few-shot exemplars. Extensive experiments on four benchmarks show that our approach consistently outperforms strong baselines, yielding significant improvements in both extraction accuracy and robustness. Our work highlights the critical importance of a fine-grained, dual-level view of model uncertainty when it comes to building effective and reliable structured generation systems.


【3】zERExtractor:An Automated Platform for Enzyme-Catalyzed Reaction Data Extraction from Scientific Literature
标题:zERExtractor:从科学文献中提取酶催化反应数据的自动化平台
链接:https://arxiv.org/abs/2508.09995

作者: Haohui Ma, Tianle Xin, Lixin Zou, Qiuyue Hu, Hongxi Cheng, Mingzhi Lin, Jingjing Guo, Sheng Wang, Guoqing Zhang, Yanjie Wei, Liangzhen Zheng
摘要 :酶动力学文献的快速扩张已经超过了主要生化数据库的管理能力,为人工智能驱动的建模和知识发现创造了巨大的障碍。我们推出了zERExtractor,这是一个自动化且可扩展的平台,用于从科学文献中全面提取酶催化反应和活性数据。zERExtractor具有统一的模块化架构,支持最先进的模型(包括大型语言模型(LLM))作为可互换组件的即插即用集成,从而实现系统的持续发展以及AI的进步。我们的管道结合了领域适应性深度学习,高级OCR,语义实体识别和人工驱动的LLM模块,以及人类专家校正,以提取动力学参数(例如,kcat,Km)、酶序列、底物SMILES、实验条件和来自异质文档格式的分子图。通过集成人工智能辅助注释、专家验证和迭代细化的主动学习策略,系统可以快速适应新的数据源。我们还发布了一个大型基准数据集,包括来自270个P450相关酶学出版物的1,000多个注释表和5,000个生物字段。基准测试表明,zERExtractor在表格识别(Acc 89.9%)、分子图像解释(高达99.1%)和关系提取(准确度94.2%)方面始终优于现有基线。zERExtractor通过灵活的插件就绪框架和高保真提取弥合了酶动力学中长期存在的数据差距,为未来AI驱动的酶建模和生物化学知识发现奠定了基础。
摘要:The rapid expansion of enzyme kinetics literature has outpaced the curation capabilities of major biochemical databases, creating a substantial barrier to AI-driven modeling and knowledge discovery. We present zERExtractor, an automated and extensible platform for comprehensive extraction of enzyme-catalyzed reaction and activity data from scientific literature. zERExtractor features a unified, modular architecture that supports plug-and-play integration of state-of-the-art models, including large language models (LLMs), as interchangeable components, enabling continuous system evolution alongside advances in AI. Our pipeline combines domain-adapted deep learning, advanced OCR, semantic entity recognition, and prompt-driven LLM modules, together with human expert corrections, to extract kinetic parameters (e.g., kcat, Km), enzyme sequences, substrate SMILES, experimental conditions, and molecular diagrams from heterogeneous document formats. Through active learning strategies integrating AI-assisted annotation, expert validation, and iterative refinement, the system adapts rapidly to new data sources. We also release a large benchmark dataset comprising over 1,000 annotated tables and 5,000 biological fields from 270 P450-related enzymology publications. Benchmarking demonstrates that zERExtractor consistently outperforms existing baselines in table recognition (Acc 89.9%), molecular image interpretation (up to 99.1%), and relation extraction (accuracy 94.2%). zERExtractor bridges the longstanding data gap in enzyme kinetics with a flexible, plugin-ready framework and high-fidelity extraction, laying the groundwork for future AI-powered enzyme modeling and biochemical knowledge discovery.


推荐(3篇)

【1】Confounding is a Pervasive Problem in Real World Recommender Systems
标题:混杂是现实世界推荐系统中普遍存在的问题
链接:https://arxiv.org/abs/2508.10479

作者: Merkov, David Rohde, Alexandre Gilotte, Benjamin Heymann
备注:12 pages, 4 figures
摘要:当一个未测量的特征影响治疗和结局时,会出现未观察到的混杂,导致有偏倚的因果效应估计。这个问题破坏了经济学、医学、生态学或流行病学等领域的观察性研究。充分利用完整观测数据的推荐系统似乎不易受到这个问题的影响。然而,推荐系统中的许多标准实践导致观察到的特征被忽略,从而实际上导致相同的问题。本文将表明,许多常见的做法,如特征工程,A/B测试和模块化,实际上可以将混杂引入推荐系统,并妨碍其性能。本文提供了该现象的若干示例,并辅以模拟研究,就从业者如何在实际系统中减少或避免混杂的影响提出了实用建议。
摘要:Unobserved confounding arises when an unmeasured feature influences both the treatment and the outcome, leading to biased causal effect estimates. This issue undermines observational studies in fields like economics, medicine, ecology or epidemiology. Recommender systems leveraging fully observed data seem not to be vulnerable to this problem. However many standard practices in recommender systems result in observed features being ignored, resulting in effectively the same problem. This paper will show that numerous common practices such as feature engineering, A/B testing and modularization can in fact introduce confounding into recommendation systems and hamper their performance. Several illustrations of the phenomena are provided, supported by simulation studies with practical suggestions about how practitioners may reduce or avoid the affects of confounding in real systems.


【2】HiRef: Leveraging Hierarchical Ontology and Network Refinement for Robust Medication Recommendation
标题:HiRef:利用分层本体和网络细化来实现稳健的药物推荐
链接:https://arxiv.org/abs/2508.10425

作者:Chok, Soyon Park, Seungheun Baek, Hajung Kim, Junhyun Lee, Jaewoo Kang
摘要:药物推荐是一项重要任务,旨在帮助医生根据患者的纵向医疗记录及时做出决策。然而,由于存在很少观察到的医疗实体和不完整的记录,现实世界的EHR数据存在重大挑战,这些记录可能无法完全捕获临床基础事实。虽然在纵向电子健康记录上训练的数据驱动模型通常具有很强的经验性能,但它们很难在缺失或新的条件下进行泛化,这主要是由于它们依赖于观察到的共现模式。为了解决这些问题,我们提出了面向稳健药物推荐的分层本体和网络细化框架(HiRef),这是一个结合两种互补结构的统一框架:(i)编码在精心构建的医学本体中的层次语义,以及(ii)来自现实世界EHR的细化共现模式。我们将本体实体嵌入到双曲空间中,这自然会捕获树状关系,并通过共享祖先实现知识迁移,从而提高对未见过代码的泛化能力。为了进一步提高鲁棒性,我们引入了一个先验指导的稀疏正则化方案,该方案通过抑制虚假边来细化EHR共现图,同时保留临床上有意义的关联。我们的模型在EHR基准测试(MIMIC-III和MIMIC-IV)上实现了强大的性能,并在模拟的未见代码设置下保持了高精度。通过全面消融研究进行的广泛实验证明了HiRef对未见过的医疗代码的弹性,并得到了对学习到的稀疏图结构和医疗代码嵌入的深入分析的支持。
摘要:Medication recommendation is a crucial task for assisting physicians in making timely decisions from longitudinal patient medical records. However, real-world EHR data present significant challenges due to the presence of rarely observed medical entities and incomplete records that may not fully capture the clinical ground truth. While data-driven models trained on longitudinal Electronic Health Records often achieve strong empirical performance, they struggle to generalize under missing or novel conditions, largely due to their reliance on observed co-occurrence patterns. To address these issues, we propose Hierarchical Ontology and Network Refinement for Robust Medication Recommendation (HiRef), a unified framework that combines two complementary structures: (i) the hierarchical semantics encoded in curated medical ontologies, and (ii) refined co-occurrence patterns derived from real-world EHRs. We embed ontology entities in hyperbolic space, which naturally captures tree-like relationships and enables knowledge transfer through shared ancestors, thereby improving generalizability to unseen codes. To further improve robustness, we introduce a prior-guided sparse regularization scheme that refines the EHR co-occurrence graph by suppressing spurious edges while preserving clinically meaningful associations. Our model achieves strong performance on EHR benchmarks (MIMIC-III and MIMIC-IV) and maintains high accuracy under simulated unseen-code settings. Extensive experiments with comprehensive ablation studies demonstrate HiRef's resilience to unseen medical codes, supported by in-depth analyses of the learned sparsified graph structure and medical code embeddings.
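
下面是庞加莱球模型中双曲距离的一个简短PyTorch示意,用于说明摘要中"将本体实体嵌入双曲空间以捕获树状层级"的思路;这是常用的标准公式,并非论文的具体实现,嵌入维度等均为假设。

import torch

def poincare_distance(u, v, eps=1e-7):
    # d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    sq_u = (u * u).sum(-1).clamp(max=1 - eps)
    sq_v = (v * v).sum(-1).clamp(max=1 - eps)
    sq_dist = ((u - v) ** 2).sum(-1)
    x = 1 + 2 * sq_dist / ((1 - sq_u) * (1 - sq_v))
    return torch.acosh(x.clamp(min=1.0))

u = torch.rand(4, 16) * 0.1   # 假设的本体实体嵌入(位于单位球内)
v = torch.rand(4, 16) * 0.1
print(poincare_distance(u, v))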


【3】Clicks Versus Conversion: Choosing a Recommender's Training Objective in E-Commerce
标题:点击量与转化:选择电子商务中推荐人的训练目标
链接:https://arxiv.org/abs/2508.10377

作者:eiss, Robert Rosenbach, Christian Eggenberger
摘要:对产品推荐进行排名以优化高点击率(CTR)或高转化率,例如添加到购物车率(ACR)和订单提交率(OSR,查看到购买转化率)是电子商务中的标准做法。优化CTR似乎是一个简单的选择:训练数据(即,点击数据)易于收集并且通常可以大量获得。此外,CTR的用途远远超出了电子商务,使其成为一个通用的,易于实现的选项。另一方面,ACR和OSR更直接地与商店的商业目标联系在一起,例如总销售额(GMV)。在本文中,我们比较了使用这些目标中的任何一个使用在线A/B测试的效果。在我们的主要发现中,我们证明了在我们的商店中,优化OSR比优化CTR时产生的GMV提升超过五倍,而不会牺牲新产品的发现。我们的研究结果还提供了不同的功能重要性为每个目标的见解。
摘要 :Ranking product recommendations to optimize for a high click-through rate (CTR) or for high conversion, such as add-to-cart rate (ACR) and Order-Submit-Rate (OSR, view-to-purchase conversion) are standard practices in e-commerce. Optimizing for CTR appears like a straightforward choice: Training data (i.e., click data) are simple to collect and often available in large quantities. Additionally, CTR is used far beyond e-commerce, making it a generalist, easily implemented option. ACR and OSR, on the other hand, are more directly linked to a shop's business goals, such as the Gross Merchandise Value (GMV). In this paper, we compare the effects of using either of these objectives using an online A/B test. Among our key findings, we demonstrate that in our shops, optimizing for OSR produces a GMV uplift more than five times larger than when optimizing for CTR, without sacrificing new product discovery. Our results also provide insights into the different feature importances for each of the objectives.


聚类(2篇)

【1】SPHENIC: Topology-Informed Multi-View Clustering for Spatial Transcriptomics
标题:SPHENIC:用于空间转录组学的拓扑信息引导的多视图聚类
链接:https://arxiv.org/abs/2508.10646

作者:uo, Yikai Zhu, Jing Yangum, Renxiang Guan, Por Lip Yee, Guangdun Peng, Dayu Hu
备注:12 pages, 6 figures, 2 tables
摘要:通过结合空间位置信息,空间转录组学聚类能够为细胞亚群识别提供更全面的见解。尽管最近取得了进展,但现有的方法至少有两个局限性:(i)拓扑学习通常只考虑单个细胞或其相互作用图的表示;然而,空间转录组学图谱通常是嘈杂的,使得这些方法容易受到低质量拓扑信号的影响,以及(ii)空间邻域信息的建模不足导致低质量的空间嵌入。为了解决这些局限性,我们提出了SPHENIC,一种新的空间持久同调增强的邻域综合聚类方法。具体来说,SPHENIC将不变的拓扑特征融入聚类网络,以实现稳定的表示学习。此外,为了构建反映真实细胞分布的高质量空间嵌入,我们设计了空间约束和分布优化模块(SCDOM)。该模块增加了细胞嵌入与其空间邻居嵌入之间的相似性,降低了与非相邻细胞的相似性,从而产生了聚类友好的空间嵌入。在14个基准空间转录组切片上的广泛实验表明,SPHENIC在空间聚类任务上实现了卓越的性能,比现有的最先进的方法高出3.31%-6.54%。
摘要:By incorporating spatial location information, spatial-transcriptomics clustering yields more comprehensive insights into cell subpopulation identification. Despite recent progress, existing methods have at least two limitations: (i) topological learning typically considers only representations of individual cells or their interaction graphs; however, spatial transcriptomic profiles are often noisy, making these approaches vulnerable to low-quality topological signals, and (ii) insufficient modeling of spatial neighborhood information leads to low-quality spatial embeddings. To address these limitations, we propose SPHENIC, a novel Spatial Persistent Homology Enhanced Neighborhood Integrative Clustering method. Specifically, SPHENIC incorporates invariant topological features into the clustering network to achieve stable representation learning. Additionally, to construct high-quality spatial embeddings that reflect the true cellular distribution, we design the Spatial Constraint and Distribution Optimization Module (SCDOM). This module increases the similarity between a cell's embedding and those of its spatial neighbors, decreases similarity with non-neighboring cells, and thereby produces clustering-friendly spatial embeddings. Extensive experiments on 14 benchmark spatial transcriptomic slices demonstrate that SPHENIC achieves superior performance on the spatial clustering task, outperforming existing state-of-the-art methods by 3.31%-6.54% over the best alternative.


【2】Welfare-Centric Clustering
标题:以福利为中心的集群
链接:https://arxiv.org/abs/2508.10345

作者:e Zhang, Seyed A. Esmaeili, Jamie Morgenstern
摘要:公平聚类传统上侧重于确保公平的群体代表性或均衡特定群体的聚类成本。然而,Dickerson等人(2025)最近表明,这些公平概念可能会产生不理想或不直观的聚类结果,并主张采用以福利为中心的聚类方法,对群体的效用进行建模。在这项工作中,我们基于距离和比例代表性对群体效用进行建模,并形式化了两个以福利为中心的聚类优化目标:罗尔斯(平等主义)目标和功利主义目标。我们为这两个目标引入了新算法,并证明了它们的理论保证。在多个真实数据集上的实证评估表明,我们的方法显着优于现有的公平聚类基线。
摘要:Fair clustering has traditionally focused on ensuring equitable group representation or equalizing group-specific clustering costs. However, Dickerson et al. (2025) recently showed that these fairness notions may yield undesirable or unintuitive clustering outcomes and advocated for a welfare-centric clustering approach that models the utilities of the groups. In this work, we model group utilities based on both distances and proportional representation and formalize two optimization objectives based on welfare-centric clustering: the Rawlsian (Egalitarian) objective and the Utilitarian objective. We introduce novel algorithms for both objectives and prove theoretical guarantees for them. Empirical evaluations on multiple real-world datasets demonstrate that our methods significantly outperform existing fair clustering baselines.
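
作为示意,下面的Python片段给出在某个聚类结果上计算这两类目标的一种简化方式:以各群体到所属簇中心的负平均距离作为效用代理,罗尔斯目标取最差群体效用,功利目标取效用之和;论文中的效用还包含比例代表性,此处仅为说明性假设。

import numpy as np

def group_utilities(X, labels, centers, groups):
    # 每个群体的效用:负的"点到所属簇中心"平均距离(简化代理)
    utils = {}
    for g in np.unique(groups):
        idx = groups == g
        d = np.linalg.norm(X[idx] - centers[labels[idx]], axis=1)
        utils[g] = -d.mean()
    return utils

def rawlsian_objective(utils):      # 平等主义:关注最差群体
    return min(utils.values())

def utilitarian_objective(utils):   # 功利主义:关注群体效用总和
    return sum(utils.values())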


超分辨率|去噪|去模糊|去雾(1篇)

【1】CrossDenoise: Denoising Implicit Feedback via a Lightweight Entity-Aware Synergistic Framework
标题:CrossDenoise:通过轻量级实体感知协同框架对隐式反馈去噪
链接:https://arxiv.org/abs/2508.10851

作者:ianquan Wang, Shuochen Liu, Jie Ma, Huibo Xu, Yupeng Han, Zhe Yang, Kai Zhang, Longfei Li, Jun Zhou
摘要:推荐系统严重依赖于隐式反馈,由于假阳性和假阴性,隐式反馈本身就有噪声,严重降低了推荐准确性。现有的去噪策略往往忽视实体感知建模,或具有高计算开销,或需要过多的超参数调整,限制了其现实世界的适用性。我们提出了CrossDenoise,一种新颖的轻量级框架,通过将噪声估计分解为用户、项目和交互特定因素来解决这些挑战。利用用户和项目噪声倾向存在显著异质性这一经验观察,CrossDenoise通过对平均训练损失进行基于排名的线性映射来计算实体声誉因素(用户/项目可靠性)。这些声誉因素与由单条交互损失的经验累积分布函数(ECDF)得到的交互级权重相融合。这种设计与模型无关,计算效率高,只需要两个直观的超参数。跨GMF、NeuMF和CDAE主干对ML-1M、Yelp和Amazon图书数据集进行的广泛实验表明,CrossDenoise的性能始终且显着优于最先进的基线。例如,它在使用NeuMF的Yelp上实现了高达27.01%的NDCG@50增益,同时产生的计算和内存开销可以忽略不计。我们的分析证实,CrossDenoise能有效地将干净样本与噪声样本区分开,并在不同的超参数设置下保持稳健。它为隐式反馈去噪提供了一种实用且可扩展的解决方案。
摘要:Recommender systems heavily rely on implicit feedback, which is inherently noisy due to false positives and negatives, severely degrading recommendation accuracy. Existing denoising strategies often overlook entity-aware modeling, suffer from high computational overhead, or demand excessive hyperparameter tuning, limiting their real-world applicability. We propose CrossDenoise, a novel and lightweight framework that addresses these challenges by disentangling noise estimation into user-, item-, and interaction-specific factors. Leveraging empirical observations that show significant heterogeneity in user and item noise propensities, CrossDenoise computes entity reputation factors (user/item reliability) via a rank-based linear mapping of average training losses. These are fused with interaction-level weights derived from an empirical cumulative distribution function (ECDF) of individual losses. This design is model-agnostic, computationally efficient, and requires only two intuitive hyperparameters. Extensive experiments on ML-1M, Yelp, and Amazon-book datasets, across GMF, NeuMF, and CDAE backbones, demonstrate that CrossDenoise consistently and significantly outperforms state-of-the-art baselines. For instance, it achieves up to 27.01% NDCG@50 gain on Yelp with NeuMF, while incurring negligible computational and memory overhead. Our analysis confirms that CrossDenoise effectively separates clean from noisy samples and remains robust under varied hyperparameter settings. It offers a practical and scalable solution for denoising implicit feedback.
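
下面的NumPy片段示意摘要中描述的两个构件:对各实体平均训练损失做基于排名的线性映射得到声誉因素,以及用单条交互损失的ECDF得到交互级权重;映射区间与最终的融合方式均为假设,并非论文原实现。

import numpy as np

def reputation_factor(avg_losses, low=0.5, high=1.0):
    # 按秩线性映射:平均损失越小,声誉因素越接近 high
    ranks = np.argsort(np.argsort(avg_losses))
    rank01 = ranks / max(len(avg_losses) - 1, 1)
    return high - (high - low) * rank01

def interaction_weight(losses):
    # 基于 ECDF:单条交互损失越大,权重越低
    ecdf = np.argsort(np.argsort(losses)) / max(len(losses) - 1, 1)
    return 1.0 - ecdf

user_rep = reputation_factor(np.random.rand(100))   # 假设的每用户平均损失
item_rep = reputation_factor(np.random.rand(200))   # 假设的每物品平均损失
w_inter = interaction_weight(np.random.rand(5000))  # 假设的单条交互损失
# 一种可能的融合(仅示意):样本权重 = 用户声誉 * 物品声誉 * 交互权重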


点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】Natively Trainable Sparse Attention for Hierarchical Point Cloud Datasets
标题:分层点云数据集的原生可训练稀疏注意力
链接:https://arxiv.org/abs/2508.10758

作者:apautre, Maria Marchenko, Carlos Miguel Patiño, Xin Zhou
摘要:释放Transformers在大型物理系统数据集上的潜力取决于克服注意力机制的二次缩放。这项工作探索将Erwin架构与原生稀疏注意力(NSA)机制相结合,以提高面向大规模物理系统的Transformer模型的效率和感受野,从而应对注意力二次复杂度的挑战。我们将NSA机制适配到非序列数据,实现了Erwin NSA模型,并在来自物理科学的三个数据集(宇宙学模拟、分子动力学和气压建模)上进行评估,实现了匹配或超过原始Erwin模型的性能。此外,我们复现了Erwin论文的实验结果,以验证其实现。
摘要:Unlocking the potential of transformers on datasets of large physical systems depends on overcoming the quadratic scaling of the attention mechanism. This work explores combining the Erwin architecture with the Native Sparse Attention (NSA) mechanism to improve the efficiency and receptive field of transformer models for large-scale physical systems, addressing the challenge of quadratic attention complexity. We adapt the NSA mechanism for non-sequential data, implement the Erwin NSA model, and evaluate it on three datasets from the physical sciences -- cosmology simulations, molecular dynamics, and air pressure modeling -- achieving performance that matches or exceeds that of the original Erwin model. Additionally, we reproduce the experimental results from the Erwin paper to validate their implementation.


联邦学习|隐私保护|加密(3篇)

【1】APFL: Analytic Personalized Federated Learning via Dual-Stream Least Squares
标题:APFL:通过双流最小二乘法的分析个性化联邦学习
链接:https://arxiv.org/abs/2508.10732

作者:, Jianheng Tang, Zhirui Yang, Feijiang Han, Jiaxu Li, Run He, Yajiang Huang, Anfeng Liu, Houbing Herbert Song, Yunhuai Liu, Huiping Zhuang
备注:9 pages, 4 figures, 2 tables
摘要:个性化联邦学习(PFL)面临的一个重大挑战是通过协作训练为各个客户端提供个性化模型。现有的PFL方法通常容易受到非IID数据的影响,这严重阻碍了集体泛化,进而损害了随后的个性化效果。在本文中,为了解决PFL中的这一非IID问题,我们提出了一种基于双流最小二乘的分析个性化联邦学习(APFL)方法。在我们的APFL中,我们使用基础模型作为特征提取的冻结主干。在特征提取器之后,我们开发了双流分析模型来实现集体泛化和个体个性化。具体来说,我们的APFL包含一个共享的主流,用于所有客户端的全局泛化,以及一个专用的细化流,用于每个客户端的本地个性化。APFL的解析解使其具备异质性不变性这一理想性质,理论上意味着无论数据在所有其他客户端之间的分布有多异质,每个个性化模型都保持相同。不同数据集的实证结果也验证了我们的APFL优于最先进的基线,准确率优势至少为1.10%-15.45%。
摘要:Personalized Federated Learning (PFL) has presented a significant challenge to deliver personalized models to individual clients through collaborative training. Existing PFL methods are often vulnerable to non-IID data, which severely hinders collective generalization and then compromises the subsequent personalization efforts. In this paper, to address this non-IID issue in PFL, we propose an Analytic Personalized Federated Learning (APFL) approach via dual-stream least squares. In our APFL, we use a foundation model as a frozen backbone for feature extraction. Subsequent to the feature extractor, we develop dual-stream analytic models to achieve both collective generalization and individual personalization. Specifically, our APFL incorporates a shared primary stream for global generalization across all clients, and a dedicated refinement stream for local personalization of each individual client. The analytical solutions of our APFL enable its ideal property of heterogeneity invariance, theoretically meaning that each personalized model remains identical regardless of how heterogeneous the data are distributed across all other clients. Empirical results across various datasets also validate the superiority of our APFL over state-of-the-art baselines, with advantages of at least 1.10%-15.45% in accuracy.
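
下面是"在冻结骨干特征上求解析分类头"的一个最小NumPy示意(岭回归式的最小二乘闭式解),用来直观说明摘要中"解析/最小二乘"这一思路;它并不是论文中双流APFL的具体算法,正则系数、维度等均为假设。

import numpy as np

def analytic_head(features, targets_onehot, reg=1.0):
    # 闭式解:W = (F^T F + reg * I)^{-1} F^T Y
    d = features.shape[1]
    A = features.T @ features + reg * np.eye(d)
    return np.linalg.solve(A, features.T @ targets_onehot)

F_feat = np.random.randn(256, 64)                    # 假设的冻结骨干特征
Y = np.eye(10)[np.random.randint(0, 10, 256)]        # one-hot 标签
W = analytic_head(F_feat, Y)
pred = (F_feat @ W).argmax(1)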


【2】Flexible Personalized Split Federated Learning for On-Device Fine-Tuning of Foundation Models
标题:灵活的个性化拆分联邦学习,用于基础模型的设备上微调
链接:https://arxiv.org/abs/2508.10349

作者:uan, Jiaxiang Geng, Pengchao Han, Xianhao Chen, Bing Luo
备注:10 pages, Submitted to INFOCOM2026
摘要:与使用预训练的模型相比,微调基础模型对于个性化下游任务的卓越性能至关重要。协作学习可以利用本地客户的数据集进行微调,但有限的客户数据和异构的数据分布阻碍了有效的协作。为了应对这一挑战,我们提出了一个灵活的个性化联邦学习模式,使客户能够参与协作学习,同时保持个性化的目标。鉴于客户端上可用的计算资源有限且异构,我们引入了\textbf{灵活的个性化分裂联邦学习(FlexP-SFL)}。FlexP-SFL基于分割学习,允许每个客户端根据资源约束在本地训练模型的一部分,同时将其余部分卸载到服务器。此外,我们提出了一个对齐策略,以提高个性化的模型在全球数据上的性能。实验结果表明,FlexP-SFL在个性化微调效率和最终准确率方面优于基线模型。
摘要:Fine-tuning foundation models is critical for superior performance on personalized downstream tasks, compared to using pre-trained models. Collaborative learning can leverage local clients' datasets for fine-tuning, but limited client data and heterogeneous data distributions hinder effective collaboration. To address the challenge, we propose a flexible personalized federated learning paradigm that enables clients to engage in collaborative learning while maintaining personalized objectives. Given the limited and heterogeneous computational resources available on clients, we introduce \textbf{flexible personalized split federated learning (FlexP-SFL)}. Based on split learning, FlexP-SFL allows each client to train a portion of the model locally while offloading the rest to a server, according to resource constraints. Additionally, we propose an alignment strategy to improve personalized model performance on global data. Experimental results show that FlexP-SFL outperforms baseline models in personalized fine-tuning efficiency and final accuracy.


【3】Improving Learning of New Diseases through Knowledge-Enhanced Initialization for Federated Adapter Tuning
标题:通过用于联邦适配器调优的知识增强初始化来改善对新疾病的学习
链接:https://arxiv.org/abs/2508.10299

作者:g, Yuan Wang, Kangning Cai, Peiyan Ning, Jiming Xu, Yong Liu, Rick Siow Mong Goh, Qingsong Wei, Huazhu Fu
摘要:在医疗保健领域,联邦学习(FL)是一种被广泛采用的框架,可以实现医疗机构之间的隐私保护协作。随着大型基础模型(FM)展示出令人印象深刻的能力,通过经济高效的适配器调优在FL中使用FM已成为一种流行的方法。鉴于快速发展的医疗环境,个体客户端在借鉴过去经验的同时,通过调优适配器快速适应新的任务或疾病至关重要。在这项工作中,我们介绍了联邦知识增强初始化(FedKEI),这是一个利用过去知识进行跨客户端、跨任务迁移的新框架,为使用适配器学习新任务生成信息充分的初始化。FedKEI首先在服务器上进行全局聚类过程,以概括任务之间的知识,然后优化聚类之间(聚类间权重)和每个聚类内(聚类内权重)的聚合权重,以个性化每个新任务的知识迁移。为了更有效地学习聚类内和聚类间的权重,我们采用了一种双层优化方案,该方案协作学习跨客户端的全局聚类内权重,并针对每个客户端的任务目标优化局部聚类间权重。在三个不同模态的基准数据集(包括皮肤病学、胸部X射线和视网膜OCT)上进行的广泛实验,证明了FedKEI在适应新疾病方面相较于最先进方法的优势。
摘要 :In healthcare, federated learning (FL) is a widely adopted framework that enables privacy-preserving collaboration among medical institutions. With large foundation models (FMs) demonstrating impressive capabilities, using FMs in FL through cost-efficient adapter tuning has become a popular approach. Given the rapidly evolving healthcare environment, it is crucial for individual clients to quickly adapt to new tasks or diseases by tuning adapters while drawing upon past experiences. In this work, we introduce Federated Knowledge-Enhanced Initialization (FedKEI), a novel framework that leverages cross-client and cross-task transfer from past knowledge to generate informed initializations for learning new tasks with adapters. FedKEI begins with a global clustering process at the server to generalize knowledge across tasks, followed by the optimization of aggregation weights across clusters (inter-cluster weights) and within each cluster (intra-cluster weights) to personalize knowledge transfer for each new task. To facilitate more effective learning of the inter- and intra-cluster weights, we adopt a bi-level optimization scheme that collaboratively learns the global intra-cluster weights across clients and optimizes the local inter-cluster weights toward each client's task objective. Extensive experiments on three benchmark datasets of different modalities, including dermatology, chest X-rays, and retinal OCT, demonstrate FedKEI's advantage in adapting to new diseases compared to state-of-the-art methods.


推理|分析|理解|解释(10篇)

【1】Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
标题:Pass@k训练,适应性平衡大型推理模型的探索和利用
链接:https://arxiv.org/abs/2508.10751

作者:hen, Xiaobo Qin, Youbin Wu, Yue Ling, Qinghao Ye, Wayne Xin Zhao, Guang Shi
备注:Technical Report about RLVR: 32 pages, 18 figures, 7 tables
摘要:具有可验证奖励的强化学习(RLVR)通常采用Pass@1作为奖励,面临着平衡探索和利用的问题,导致策略倾向于保守行动,收敛到局部最优。因此,确定适当的奖励指标至关重要。就先前的工作而言,尽管Pass@k已用于评估,但其与RLVR中LLM探索能力的联系在很大程度上仍然被忽视。为了研究这一点,我们首先使用Pass@k作为奖励来训练策略模型(即,$\textbf{Pass@k Training}$),观察到其探索能力的提升。接下来,我们推导出Pass@k Training优势的解析解,从而实现高效且有效的训练过程。在此基础上,我们的分析表明,探索和利用并不是内在冲突的目标,它们可以相互促进。此外,带有解析推导的Pass@k训练本质上涉及直接设计优势函数。受此启发,我们初步探索了RLVR的优势设计,显示了有希望的结果,并指出了一个潜在的未来方向。
摘要:Reinforcement learning with verifiable rewards (RLVR), which typically adopts Pass@1 as the reward, has faced the issues in balancing exploration and exploitation, causing policies to prefer conservative actions, converging to a local optimum. Identifying an appropriate reward metric is therefore crucial. Regarding the prior work, although Pass@k has been used in evaluation, its connection to LLM exploration ability in RLVR remains largely overlooked. To investigate this, we first use Pass@k as the reward to train the policy model (i.e., $\textbf{Pass@k Training}$), and observe the improvement on its exploration ability. Next, we derive an analytical solution for the advantage of Pass@k Training, leading to an efficient and effective process. Building on this, our analysis reveals that exploration and exploitation are not inherently conflicting objectives, while they can mutually enhance each other. Moreover, Pass@k Training with analytical derivation essentially involves directly designing the advantage function. Inspired by this, we preliminarily explore the advantage design for RLVR, showing promising results and highlighting a potential future direction.
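
下面给出Pass@k常用无偏估计的一个简短Python示意:对同一提示采样n个回答、其中c个正确时,Pass@k = 1 - C(n-c, k)/C(n, k);这里仅把它作为一组rollout的奖励信号来演示思路,至于如何由该奖励推导每个回答的优势,论文中给出了解析推导,此处不作还原。

from math import comb

def pass_at_k(n, c, k):
    # 无偏估计:1 - C(n - c, k) / C(n, k);当 n - c < k 时 comb 为 0,结果为 1
    return 1.0 - comb(n - c, k) / comb(n, k)

rollouts_correct = [1, 0, 0, 1, 0, 0, 0, 0]   # 假设的一组可验证结果(1 表示正确)
reward = pass_at_k(n=len(rollouts_correct), c=sum(rollouts_correct), k=4)
print(reward)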


【2】Reproducible Physiological Features in Affective Computing: A Preliminary Analysis on Arousal Modeling
标题:情感计算中可复制的生理特征:觉醒建模的初步分析
链接:https://arxiv.org/abs/2508.10561

作者:rgano, Jasin Machkour, Mimma Nardelli, Enzo Pasquale Scilingo, Michael Muma
备注:Submitted to 2025 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE). 6 pages, 3 figures
摘要:在情感计算中,一个关键的挑战在于可靠地将主观情感体验与客观生理标记联系起来。这项初步研究通过识别与唤醒水平的连续自我报告相关的心血管和皮肤电信号的生理特征来解决再现性问题。使用连续注释的情绪信号数据集,我们分析了从30名参与者的心脏和皮肤电信号中提取的164个特征。使用终止随机实验(T-Rex)方法进行特征选择,该方法系统地执行控制用户定义的目标错误发现率的变量选择。值得注意的是,在所有候选特征中,只有两个电皮肤衍生的特征表现出可重复的和统计学上显著的与唤醒的关联,实现了100%的确认率。这些结果强调了在生理特征选择中严格的再现性评估的必要性,这是情感计算中经常忽视的一个方面。我们的方法是特别有前途的安全关键环境中的应用程序,需要值得信赖和可靠的白盒模型,如精神障碍识别和人机交互系统。
摘要:In Affective Computing, a key challenge lies in reliably linking subjective emotional experiences with objective physiological markers. This preliminary study addresses the issue of reproducibility by identifying physiological features from cardiovascular and electrodermal signals that are associated with continuous self-reports of arousal levels. Using the Continuously Annotated Signal of Emotion dataset, we analyzed 164 features extracted from cardiac and electrodermal signals of 30 participants exposed to short emotion-evoking videos. Feature selection was performed using the Terminating-Random Experiments (T-Rex) method, which performs variable selection systematically controlling a user-defined target False Discovery Rate. Remarkably, among all candidate features, only two electrodermal-derived features exhibited reproducible and statistically significant associations with arousal, achieving a 100\% confirmation rate. These results highlight the necessity of rigorous reproducibility assessments in physiological features selection, an aspect often overlooked in Affective Computing. Our approach is particularly promising for applications in safety-critical environments requiring trustworthy and reliable white box models, such as mental disorder recognition and human-robot interaction systems.


【3】On the Complexity-Faithfulness Trade-off of Gradient-Based Explanations
标题:论基于梯度的解释的复杂性与忠实性权衡
链接:https://arxiv.org/abs/2508.10490

作者:panah, Matteo Gamba, Kevin Smith, Hossein Azizpour
备注:23 pages, 14 figures, to be published in International Conference on Computer Vision 2025
摘要:ReLU网络虽然在视觉数据中很流行,但具有急剧的过渡,有时依赖单个像素进行预测,使得朴素的基于梯度的解释变得嘈杂且难以解读。现有的方法,如GradCAM,通过以忠实性为代价构建代理模型来平滑这些解释。我们引入了一个统一的谱框架,系统地分析和量化解释中的平滑度、忠实度及二者的权衡。使用这个框架,我们量化并正则化了ReLU网络对高频信息的贡献,为识别这种权衡提供了一种原则性的方法。我们的分析刻画了基于代理模型的平滑如何扭曲解释,从而导致我们正式定义并针对不同事后方法加以度量的"解释差距"。最后,我们通过不同的设计选择、数据集和消融实验验证了我们的理论发现。
摘要:ReLU networks, while prevalent for visual data, have sharp transitions, sometimes relying on individual pixels for predictions, making vanilla gradient-based explanations noisy and difficult to interpret. Existing methods, such as GradCAM, smooth these explanations by producing surrogate models at the cost of faithfulness. We introduce a unifying spectral framework to systematically analyze and quantify smoothness, faithfulness, and their trade-off in explanations. Using this framework, we quantify and regularize the contribution of ReLU networks to high-frequency information, providing a principled approach to identifying this trade-off. Our analysis characterizes how surrogate-based smoothing distorts explanations, leading to an ``explanation gap'' that we formally define and measure for different post-hoc methods. Finally, we validate our theoretical findings across different design choices, datasets, and ablations.


【4】RealAC: A Domain-Agnostic Framework for Realistic and Actionable Counterfactual Explanations
标题:RealAC:一个用于生成现实且可操作的反事实解释的领域无关框架
链接:https://arxiv.org/abs/2508.10455

作者 :efeen, Shovito Barua Soumma, Hassan Ghasemzadeh
摘要:反事实解释通过描述可能改变模型预测的输入特征的最小变化,为人工智能做出的决策提供人类可理解的推理。为了在实践中真正有用,这种解释必须是现实和可行的-它们应该尊重基本的数据分布和用户定义的可行性限制。现有的方法通常通过严格的手工约束或特定领域的知识来强制执行特征间的依赖关系,这限制了它们的泛化能力和捕获数据中固有的复杂非线性关系的能力。此外,它们很少适应用户指定的偏好,并提出因果关系不可信或不可行的解释。我们介绍RealAC,一个域不可知的框架,用于生成现实的和可操作的反事实。RealAC自动保留复杂的特征间依赖关系,而不依赖于显式的领域知识-通过对齐事实和反事实实例之间的特征对的联合分布。该框架还允许最终用户通过在优化期间抑制冻结特性的变化来“冻结”他们不能或不希望改变的属性。在三个合成数据集和两个真实数据集上的评估表明,RealAC平衡了现实主义与可操作性。我们的方法在因果边缘分数,依赖保留分数和IM 1现实主义度量方面优于最先进的基线和基于大型语言模型的反事实生成技术,并为以用户为中心的反事实生成提供了一个解决方案。
摘要:Counterfactual explanations provide human-understandable reasoning for AI-made decisions by describing minimal changes to input features that would alter a model's prediction. To be truly useful in practice, such explanations must be realistic and feasible -- they should respect both the underlying data distribution and user-defined feasibility constraints. Existing approaches often enforce inter-feature dependencies through rigid, hand-crafted constraints or domain-specific knowledge, which limits their generalizability and ability to capture complex, nonlinear relations inherent in data. Moreover, they rarely accommodate user-specified preferences and suggest explanations that are causally implausible or infeasible to act upon. We introduce RealAC, a domain-agnostic framework for generating realistic and actionable counterfactuals. RealAC automatically preserves complex inter-feature dependencies without relying on explicit domain knowledge -- by aligning the joint distributions of feature pairs between factual and counterfactual instances. The framework also allows end-users to ``freeze'' attributes they cannot or do not wish to change by suppressing change in frozen features during optimization. Evaluations on three synthetic and two real datasets demonstrate that RealAC balances realism with actionability. Our method outperforms state-of-the-art baselines and Large Language Model-based counterfactual generation techniques in causal edge score, dependency preservation score, and IM1 realism metric and offers a solution for causality-aware and user-centric counterfactual generation.


【5】We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
标题:We-Math 2.0:用于激励视觉数学推理的多功能MathBook系统
链接:https://arxiv.org/abs/2508.10433

作者:o, Qiuna Tan, Peiqing Yang, Yanzi Wang, Xiaowan Wang, Enhui Wan, Sitong Zhou, Guanting Dong, Yuchen Zeng, Yida Xu, Jie Wang, Chong Sun, Chen Li, Honggang Zhang
备注:Working in progress
摘要:多模态大型语言模型(MLLM)在各种任务中表现出了令人印象深刻的能力,但仍然难以进行复杂的数学推理。现有的研究主要集中在数据集构建和方法优化,往往忽略了两个关键方面:全面的知识驱动的设计和以模型为中心的数据空间建模。在本文中,我们介绍了We-Math 2.0,这是一个统一的系统,它集成了结构化的数学知识系统,以模型为中心的数据空间建模和基于强化学习(RL)的训练范式,以全面提高MLLM的数学推理能力。We-Math 2.0的主要贡献有四个方面:(1)MathBook知识体系:我们构建了一个包含491个知识点和1,819个基本原理的五级层次体系。(2)MathBook-Standard & Pro:我们开发了MathBook-Standard,这是一个通过双重扩展确保广泛概念覆盖和灵活性的数据集。此外,我们定义了一个三维难度空间,并为每个问题生成7个渐进变量,以构建MathBook-Pro,这是一个具有挑战性的数据集,用于强大的训练。(3)MathBook-RL:我们提出了一个两阶段的RL框架,包括:(i)冷启动微调,将模型与面向知识的思维链推理相结合;(ii)渐进对齐RL,利用平均奖励学习和动态数据调度来实现跨难度级别的渐进对齐。(4)MathBookEval:我们引入了一个全面的基准测试,涵盖了所有491个知识点,具有不同的推理步骤分布。实验结果表明,MathBook-RL在四个广泛使用的基准测试中与现有基线具有竞争力,并在MathBookEval上取得了很好的结果,这表明在数学推理中具有很好的泛化能力。
摘要:Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various tasks, but still struggle with complex mathematical reasoning. Existing research primarily focuses on dataset construction and method optimization, often overlooking two critical aspects: comprehensive knowledge-driven design and model-centric data space modeling. In this paper, we introduce We-Math 2.0, a unified system that integrates a structured mathematical knowledge system, model-centric data space modeling, and a reinforcement learning (RL)-based training paradigm to comprehensively enhance the mathematical reasoning abilities of MLLMs. The key contributions of We-Math 2.0 are fourfold: (1) MathBook Knowledge System: We construct a five-level hierarchical system encompassing 491 knowledge points and 1,819 fundamental principles. (2) MathBook-Standard & Pro: We develop MathBook-Standard, a dataset that ensures broad conceptual coverage and flexibility through dual expansion. Additionally, we define a three-dimensional difficulty space and generate 7 progressive variants per problem to build MathBook-Pro, a challenging dataset for robust training. (3) MathBook-RL: We propose a two-stage RL framework comprising: (i) Cold-Start Fine-tuning, which aligns the model with knowledge-oriented chain-of-thought reasoning; and (ii) Progressive Alignment RL, leveraging average-reward learning and dynamic data scheduling to achieve progressive alignment across difficulty levels. (4) MathBookEval: We introduce a comprehensive benchmark covering all 491 knowledge points with diverse reasoning step distributions. Experimental results show that MathBook-RL performs competitively with existing baselines on four widely-used benchmarks and achieves strong results on MathBookEval, suggesting promising generalization in mathematical reasoning.


【6】ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning
标题:ComoRAG:一个受认知启发、记忆组织的RAG,用于状态长叙事推理
链接:https://arxiv.org/abs/2508.10419

作者:ng, Rongchen Zhao, Wei Wei, Yufeng Wang, Mo Yu, Jie Zhou, Jin Xu, Liyan Xu
摘要:长篇故事和小说的叙事理解一直是一个具有挑战性的领域,原因在于其复杂的情节以及人物和实体之间错综复杂、且常常不断演变的关系。鉴于LLM在超长上下文中推理能力下降且计算成本高昂,基于检索的方法在实践中仍然发挥着关键作用。然而,传统的RAG方法可能由于其无状态的单步检索过程而不足,这通常忽略了在长距离上下文中捕获相互关联关系的动态性质。在这项工作中,我们提出了ComoRAG,其核心原则是:叙事推理不是一次性的过程,而是新证据获取与既有知识巩固之间动态的、不断演进的相互作用,类似于人类在大脑中借助记忆相关信号进行推理时的认知过程。具体来说,当遇到推理僵局时,ComoRAG会在与动态记忆工作空间交互的同时进行迭代推理循环。在每个周期中,它生成探查式查询来设计新的探索路径,然后将检索到的新方面的证据整合到全局记忆池中,从而支持形成用于解决查询的连贯上下文。在四个具有挑战性的长上下文叙事基准(20万+词元)中,ComoRAG的表现优于强RAG基线,与最强基线相比,相对增益高达11%。进一步的分析表明,ComoRAG对需要全局理解的复杂查询尤为有利,为面向有状态推理的基于检索的长上下文理解提供了一个有原则、受认知启发的范式。我们的代码在https://github.com/EternityJune25/ComoRAG上公开发布
摘要:Narrative comprehension on long stories and novels has been a challenging domain attributed to their intricate plotlines and entangled, often evolving relations among characters and entities. Given the LLM's diminished reasoning over extended context and high computational cost, retrieval-based approaches remain a pivotal role in practice. However, traditional RAG methods can fall short due to their stateless, single-step retrieval process, which often overlooks the dynamic nature of capturing interconnected relations within long-range context. In this work, we propose ComoRAG, holding the principle that narrative reasoning is not a one-shot process, but a dynamic, evolving interplay between new evidence acquisition and past knowledge consolidation, analogous to human cognition when reasoning with memory-related signals in the brain. Specifically, when encountering a reasoning impasse, ComoRAG undergoes iterative reasoning cycles while interacting with a dynamic memory workspace. In each cycle, it generates probing queries to devise new exploratory paths, then integrates the retrieved evidence of new aspects into a global memory pool, thereby supporting the emergence of a coherent context for the query resolution. Across four challenging long-context narrative benchmarks (200K+ tokens), ComoRAG outperforms strong RAG baselines with consistent relative gains up to 11% compared to the strongest baseline. Further analysis reveals that ComoRAG is particularly advantageous for complex queries requiring global comprehension, offering a principled, cognitively motivated paradigm for retrieval-based long context comprehension towards stateful reasoning. Our code is publicly released at https://github.com/EternityJune25/ComoRAG
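
下面用一个极简的Python循环示意摘要所述"迭代推理 + 记忆池"的流程:每轮生成探查式查询、检索证据并入记忆池、再尝试作答;其中 retrieve 和 llm 是假设的可调用接口,提示词与停止条件也均为示意,并非论文实现。

def iterative_retrieval_answer(question, retrieve, llm, max_cycles=3):
    memory_pool = []                                   # 全局记忆池(累积检索到的证据)
    answer = "UNRESOLVED"
    for _ in range(max_cycles):
        probes = llm("为以下问题生成新的探查式子查询:" + question)
        memory_pool.extend(retrieve(probes))           # 检索新证据并并入记忆池
        answer = llm("基于证据回答问题,证据不足则输出 UNRESOLVED。\n问题:"
                     + question + "\n证据:" + str(memory_pool))
        if "UNRESOLVED" not in answer:
            break
    return answer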


【7】Convergence Analysis of Max-Min Exponential Neural Network Operators in Orlicz Space
标题:Orlicz空间中最大-最小指数神经网络算子的收敛性分析
链接:https://arxiv.org/abs/2508.10248

作者:an Pradhan, Madan Mohan Soren
备注:35 pages, 6 figures
摘要:在本工作中,我们提出了一种使用指数神经网络算子逼近函数的最大-最小(Max-Min)方法。我们扩展该框架,构造了最大-最小Kantorovich型指数神经网络算子,并研究其逼近性质。我们研究了一元函数的点态收敛和一致收敛。为了分析收敛阶,我们使用对数连续模并估计了相应的收敛速度。此外,我们还在Orlicz空间框架下考察了最大-最小Kantorovich型指数神经网络算子的收敛行为。我们给出了若干图形展示,以说明在适当的核函数与sigmoid激活函数下对函数的逼近误差。
摘要:In this current work, we propose a Max Min approach for approximating functions using exponential neural network operators. We extend this framework to develop the Max Min Kantorovich-type exponential neural network operators and investigate their approximation properties. We study both pointwise and uniform convergence for univariate functions. To analyze the order of convergence, we use the logarithmic modulus of continuity and estimate the corresponding rate of convergence. Furthermore, we examine the convergence behavior of the Max Min Kantorovich type exponential neural network operators within the Orlicz space setting. We provide some graphical representations to illustrate the approximation error of the function through suitable kernel and sigmoidal activation functions.


【8】AI-Driven Detection and Analysis of Handwriting on Seized Ivory: A Tool to Uncover Criminal Networks in the Illicit Wildlife Trade
标题:人工智能驱动的查获象牙笔迹检测和分析:揭露非法野生动物贸易犯罪网络的工具
链接:https://arxiv.org/abs/2508.10219

作者:, Ryan J. Horwitz, John E. Brown III, Amit Misra, Felipe Oviedo, Kevin White, Juan M. Lavista Ferres, Samuel K. Wasser
备注:Submitted. 13 pages, 5 figures, 4 tables
摘要:跨国象牙贸易继续推动非洲大象数量的下降,贩运网络仍然难以破坏。执法官员缴获的象牙带有负责出口象牙的贩运者的法医信息,包括DNA证据和贩运者手写的标记。20年来,对象牙DNA的分析已经确定了大象被偷猎的地点,并建立了象牙运输之间的联系。虽然利用遗传证据建立的联系是非常确凿的,但遗传数据是昂贵的,有时不可能获得。但是,尽管手写标记很容易拍照,但它们很少被记录或分析。在这里,我们提出了一个人工智能驱动的管道,用于提取和分析缴获的象牙上的手写标记,提供了一个新颖的,可扩展的,低成本的法医证据来源。在6年期间(2014-2019年)从8次大规模象牙缉获中收集了6,085张照片,我们使用对象检测模型提取了17,000多个单独的标记,然后使用最先进的人工智能工具对其进行标记和描述。我们确定了184个重复出现的“签名标记”,这些标记连接着它们出现的象牙。在多次缉获中观察到20个签名标记,通过参与两次运输的贩运者确定了这些缉获之间的法医联系。这项工作补充了其他调查技术,填补了没有其他数据来源的空白。该研究展示了人工智能在野生动物法医学中的变革潜力,并强调了将笔迹分析整合到破坏有组织野生动物犯罪工作中的实际步骤。
摘要:The transnational ivory trade continues to drive the decline of elephant populations across Africa, and trafficking networks remain difficult to disrupt. Tusks seized by law enforcement officials carry forensic information on the traffickers responsible for their export, including DNA evidence and handwritten markings made by traffickers. For 20 years, analyses of tusk DNA have identified where elephants were poached and established connections among shipments of ivory. While the links established using genetic evidence are extremely conclusive, genetic data is expensive and sometimes impossible to obtain. But though handwritten markings are easy to photograph, they are rarely documented or analyzed. Here, we present an AI-driven pipeline for extracting and analyzing handwritten markings on seized elephant tusks, offering a novel, scalable, and low-cost source of forensic evidence. Having collected 6,085 photographs from eight large seizures of ivory over a 6-year period (2014-2019), we used an object detection model to extract over 17,000 individual markings, which were then labeled and described using state-of-the-art AI tools. We identified 184 recurring "signature markings" that connect the tusks on which they appear. 20 signature markings were observed in multiple seizures, establishing forensic links between these seizures through traffickers involved in both shipments. This work complements other investigative techniques by filling in gaps where other data sources are unavailable. The study demonstrates the transformative potential of AI in wildlife forensics and highlights practical steps for integrating handwriting analysis into efforts to disrupt organized wildlife crime.


【9】An Explainable AI based approach for Monitoring Animal Health
标题:一种基于可解释人工智能的动物健康监测方法
链接:https://arxiv.org/abs/2508.10210

作者:aa, Shubham Dixit, Mrityunjay Sharma, Ritesh Kumar
摘要:由于难以跟踪农场中的所有动物,监测牛的健康状况和优化产量是奶农面临的主要挑战。这项工作旨在展示基于可解释机器学习(ML)方法的现代数据驱动农业实践,这些方法可解释奶牛的活动和行为。三轴加速度传感器的连续数据收集以及稳健的ML方法和算法的使用,为农民和研究人员提供了关于牛活动的可操作信息,使农民能够做出明智的决策并采用可持续的做法。该研究利用基于蓝牙的物联网(IoT)设备和4G网络进行无缝数据传输、即时分析和推断生成,并使用可解释性框架解释模型性能。本文特别强调加速度计时间序列数据的预处理,包括统计特征提取、信号处理技术,以及利用滑动窗口技术构造的滞后特征。各种经超参数优化的ML模型在不同的窗口长度上进行评估,以进行活动分类。k-最近邻分类器实现了最佳性能,训练集上的AUC平均值为0.98、标准差为0.0026,测试集上的AUC为0.99。为了确保透明度,使用了基于可解释人工智能的框架(如SHAP)来解释特征重要性,以便从业者理解和使用。对重要特征的详细比较以及所选特征的稳定性分析,为开发面向可持续畜牧管理的可解释且实用的ML模型提供了支持。
摘要:Monitoring cattle health and optimizing yield are key challenges faced by dairy farmers due to difficulties in tracking all animals on the farm. This work aims to showcase modern data-driven farming practices based on explainable machine learning(ML) methods that explain the activity and behaviour of dairy cattle (cows). Continuous data collection of 3-axis accelerometer sensors and usage of robust ML methodologies and algorithms, provide farmers and researchers with actionable information on cattle activity, allowing farmers to make informed decisions and incorporate sustainable practices. This study utilizes Bluetooth-based Internet of Things (IoT) devices and 4G networks for seamless data transmission, immediate analysis, inference generation, and explains the models performance with explainability frameworks. Special emphasis is put on the pre-processing of the accelerometers time series data, including the extraction of statistical characteristics, signal processing techniques, and lag-based features using the sliding window technique. Various hyperparameter-optimized ML models are evaluated across varying window lengths for activity classification. The k-nearest neighbour Classifier achieved the best performance, with AUC of mean 0.98 and standard deviation of 0.0026 on the training set and 0.99 on testing set). In order to ensure transparency, Explainable AI based frameworks such as SHAP is used to interpret feature importance that can be understood and used by practitioners. A detailed comparison of the important features, along with the stability analysis of selected features, supports development of explainable and practical ML models for sustainable livestock management.
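
下面是"滑动窗口统计特征 + k-最近邻分类"的一个最小scikit-learn示意,对应摘要中的预处理与分类流程;窗口长度、特征集合与标签均为假设,并非论文的完整特征工程。

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def window_features(acc_xyz, win=128, step=64):
    # 对三轴加速度序列按滑动窗口提取简单统计特征:均值/标准差/最小/最大
    feats = []
    for start in range(0, len(acc_xyz) - win + 1, step):
        w = acc_xyz[start:start + win]
        feats.append(np.hstack([w.mean(0), w.std(0), w.min(0), w.max(0)]))
    return np.asarray(feats)

acc = np.random.randn(10000, 3)                 # 假设的三轴加速度数据
X = window_features(acc)
y = np.random.randint(0, 3, len(X))             # 假设的活动标签
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(clf.score(X, y))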


【10】Prediction-Powered Inference with Inverse Probability Weighting
标题:具有逆概率加权的预测动力推理
链接:https://arxiv.org/abs/2508.10149

作者: Datta, Nicholas G. Polson
备注:15 pages, 3 figures
摘要:预测动力推理(PPI)是近期提出的一个利用部分标记数据进行有效统计推断的框架,它将大规模未标记集合上的基于模型的预测与来自较小标记子集的偏差校正相结合。我们表明,通过使用经典的Horvitz-Thompson或H\'ajek形式,用逆概率加权(IPW)版本替换其未加权的偏差校正项,PPI可以扩展到处理信息性标注。这种联系将基于设计的调查抽样思想与现代预测辅助推断结合起来,使得当各单元的标注概率不同时,所得估计量仍然有效。我们考虑入样概率未知、但可由正确设定的模型进行估计的常见设定。在模拟中,使用估计倾向得分的IPW调整PPI的表现与概率已知的情形非常接近,既保持了名义覆盖率,又保留了PPI的方差缩减优势。
摘要:Prediction-powered inference (PPI) is a recent framework for valid statistical inference with partially labeled data, combining model-based predictions on a large unlabeled set with bias correction from a smaller labeled subset. We show that PPI can be extended to handle informative labeling by replacing its unweighted bias-correction term with an inverse probability weighted (IPW) version, using the classical Horvitz--Thompson or H\'ajek forms. This connection unites design-based survey sampling ideas with modern prediction-assisted inference, yielding estimators that remain valid when labeling probabilities vary across units. We consider the common setting where the inclusion probabilities are not known but estimated from a correctly specified model. In simulations, the performance of IPW-adjusted PPI with estimated propensities closely matches the known-probability case, retaining both nominal coverage and the variance-reduction benefits of PPI.
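
以均值估计为例,下面的NumPy片段示意"未标记集合上的预测均值 + 带逆概率权重的残差校正"这一IPW调整PPI估计量;Hajek与Horvitz-Thompson两种形式按是否自归一化区分,具体记号与论文中的约定可能略有出入,仅作说明。

import numpy as np

def ppi_mean_ipw(f_unlabeled, f_labeled, y_labeled, pi_labeled, pop_size=None):
    # f_*: 模型预测;y_labeled: 真实标签;pi_labeled: 各标记单元的入样概率
    w = 1.0 / pi_labeled
    resid = y_labeled - f_labeled
    if pop_size is None:
        correction = np.sum(w * resid) / np.sum(w)     # Hajek(自归一化)形式
    else:
        correction = np.sum(w * resid) / pop_size      # Horvitz-Thompson 形式
    return np.mean(f_unlabeled) + correction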


检测相关(6篇)

【1】Lightweight CNNs for Embedded SAR Ship Target Detection and Classification
标题:用于嵌入式SAR船舶目标检测和分类的轻量级CNN
链接:https://arxiv.org/abs/2508.10712

作者:esse, Georgios Pilikos, Mario Azcueta, Nicolas Floury
备注:Accepted at Big Data from Space 2025 (BiDS'25)
摘要:合成孔径雷达(SAR)数据能够对海上船只进行大规模监视。然而,近实时监测目前受到限制,需要下行所有原始数据,执行图像聚焦,并随后在地面上进行分析。生成更高级别产品的板载处理可以减少需要下行的数据量,缓解带宽限制并最大限度地减少延迟。然而,由于卫星的存储器、处理能力和计算资源有限,传统的图像聚焦和处理算法面临挑战。这项工作提出并评估了用于对Sentinel-1在条带(Stripmap)和干涉宽幅(IW)模式下采集的未聚焦SAR数据进行实时推理的神经网络。我们的结果证明了使用我们的其中一个模型在FPGA上进行板上处理和部署的可行性。此外,通过研究船舶和风车之间的二元分类任务,我们证明了目标分类是可行的。
摘要:Synthetic Aperture Radar (SAR) data enables large-scale surveillance of maritime vessels. However, near-real-time monitoring is currently constrained by the need to downlink all raw data, perform image focusing, and subsequently analyze it on the ground. On-board processing to generate higher-level products could reduce the data volume that needs to be downlinked, alleviating bandwidth constraints and minimizing latency. However, traditional image focusing and processing algorithms face challenges due to the satellite's limited memory, processing power, and computational resources. This work proposes and evaluates neural networks designed for real-time inference on unfocused SAR data acquired in Stripmap and Interferometric Wide (IW) modes captured with Sentinel-1. Our results demonstrate the feasibility of using one of our models for on-board processing and deployment on an FPGA. Additionally, by investigating a binary classification task between ships and windmills, we demonstrate that target classification is possible.


【2】Conditional Information Bottleneck for Multimodal Fusion: Overcoming Shortcut Learning in Sarcasm Detection
标题:多模式融合的条件信息瓶颈:克服讽刺检测中的预设学习
链接:https://arxiv.org/abs/2508.10644

作者:g, Qi Jia, Cong Xu, Feiyu Chen, Yuhan Liu, Haotian Zhang, Liang Jin, Lu Liu, Zhichun Wang
摘要:多模态讽刺检测是一项复杂的任务,需要区分模态之间微妙的互补信号,同时过滤掉不相关的信息。许多先进的方法依赖于从数据集中学习捷径,而不是提取预期的与讽刺相关的特征。然而,我们的实验表明,捷径学习削弱了模型在现实世界场景中的泛化能力。此外,我们通过系统实验揭示了当前多模态讽刺检测中模态融合策略的弱点,强调了为复杂情感识别构建有效模态融合的必要性。为了解决这些挑战,我们通过从MUStARD++中删除捷径信号来构建MUStARD++$^{R}$。随后,我们引入了多模态条件信息瓶颈(MCIB)模型,以实现面向讽刺检测的高效多模态融合。实验结果表明,MCIB在不依赖捷径学习的情况下取得了最佳性能。
摘要:Multimodal sarcasm detection is a complex task that requires distinguishing subtle complementary signals across modalities while filtering out irrelevant information. Many advanced methods rely on learning shortcuts from datasets rather than extracting intended sarcasm-related features. However, our experiments show that shortcut learning impairs the model's generalization in real-world scenarios. Furthermore, we reveal the weaknesses of current modality fusion strategies for multimodal sarcasm detection through systematic experiments, highlighting the necessity of focusing on effective modality fusion for complex emotion recognition. To address these challenges, we construct MUStARD++$^{R}$ by removing shortcut signals from MUStARD++. Then, a Multimodal Conditional Information Bottleneck (MCIB) model is introduced to enable efficient multimodal fusion for sarcasm detection. Experimental results show that the MCIB achieves the best performance without relying on shortcut learning.


【3】SkeySpot: Automating Service Key Detection for Digital Electrical Layout Plans in the Construction Industry
标题:SkeSpot:建筑行业数字电气布局计划的自动化服务密钥检测
链接:https://arxiv.org/abs/2508.10449

作者:i, Rohit Meena, Param Rajpura, Yogesh Kumar Meena
备注:6 pages, preprint accepted in IEEE SMC 2025
摘要:传统的平面图通常仅作为扫描文档保存,仍然是建筑行业中建筑设计、城市规划和设施管理的重要资源。然而,由于缺乏机器可读的平面图,大规模解读既费时又容易出错。自动化符号识别通过直接从平面图中识别服务关键符号,支持成本估算、基础设施维护和法规遵从等工作流程,提供了一种可扩展的解决方案。这项工作介绍了一个带标注的数字化电气布局图(DELP)数据集,包括45张扫描的电气布局图,标注了2,450个实例,涵盖34个不同的服务关键类别。我们提出了一个系统的评估框架,使用预训练的目标检测模型在DELP数据集上进行评测。在基准测试的模型中,YOLOv8实现了最高的性能,平均精度均值(mAP)为82.5%。基于YOLOv8,我们开发了SkeySpot,这是一个轻量级的开源工具包,用于电气符号的实时检测、分类和量化。SkeySpot生成结构化、标准化的输出,可以扩展为可互操作的建筑信息工作流程,最终实现与下游应用程序和监管平台的兼容。通过降低对专有CAD系统的依赖并减少手动标注工作,这种方法使电气布局的数字化更容易为建筑行业的中小型企业(SME)所用,同时支持建筑环境中更广泛的标准化、互操作性和可持续性目标。
摘要:Legacy floor plans, often preserved only as scanned documents, remain essential resources for architecture, urban planning, and facility management in the construction industry. However, the lack of machine-readable floor plans render large-scale interpretation both time-consuming and error-prone. Automated symbol spotting offers a scalable solution by enabling the identification of service key symbols directly from floor plans, supporting workflows such as cost estimation, infrastructure maintenance, and regulatory compliance. This work introduces a labelled Digitised Electrical Layout Plans (DELP) dataset comprising 45 scanned electrical layout plans annotated with 2,450 instances across 34 distinct service key classes. A systematic evaluation framework is proposed using pretrained object detection models for DELP dataset. Among the models benchmarked, YOLOv8 achieves the highest performance with a mean Average Precision (mAP) of 82.5\%. Using YOLOv8, we develop SkeySpot, a lightweight, open-source toolkit for real-time detection, classification, and quantification of electrical symbols. SkeySpot produces structured, standardised outputs that can be scaled up for interoperable building information workflows, ultimately enabling compatibility across downstream applications and regulatory platforms. By lowering dependency on proprietary CAD systems and reducing manual annotation effort, this approach makes the digitisation of electrical layouts more accessible to small and medium-sized enterprises (SMEs) in the construction industry, while supporting broader goals of standardisation, interoperability, and sustainability in the built environment.
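
For readers wanting to reproduce the general workflow, a minimal sketch with the ultralytics package is shown below; the dataset config name (delp.yaml), image size and file names are illustrative placeholders rather than the released SkeySpot setup.

```python
# Minimal sketch: train and run a YOLOv8 detector for service-key symbols.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                               # pretrained backbone, nano variant
model.train(data="delp.yaml", epochs=100, imgsz=1280)    # delp.yaml would list the 34 classes

results = model("electrical_layout_scan.png")            # inference on a scanned plan
for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls)]
    print(cls_name, float(box.conf), box.xyxy.tolist())  # symbol class, score, bounding box
```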


【4】Federated Anomaly Detection for Multi-Tenant Cloud Platforms with Personalized Modeling
标题:具有个性化建模的多租户云平台联合异常检测
链接:https://arxiv.org/abs/2508.10255

作者:, Heyao Liu, Nyutian Long, Guanzi Yao
摘要:本文提出了一种基于联邦学习的异常检测方法,以解决多租户云环境中的关键挑战,包括数据隐私泄漏,异构资源行为,以及集中式建模的局限性。该方法建立涉及多个租户的联合训练框架。每个租户使用私有资源使用数据在本地训练模型。通过参数聚合,优化了全局模型,实现了跨租户协同异常检测,同时保护了数据隐私。为了提高对不同资源使用模式的适应性,引入了个性化参数调整机制。这允许模型在共享全局知识的同时保留特定于租户的特征表示。在模型输出阶段,使用马氏距离计算异常分数。这提高了异常检测的准确性和稳定性。实验使用来自云平台的真实遥测数据来构建模拟的多租户环境。该研究评估了不同的参与率和噪声注入水平下的模型的性能。这些比较证明了所提出的方法的鲁棒性和检测精度。实验结果表明,该方法优于现有的主流模型的关键指标,如精度,召回率和F1分数。它还在各种复杂场景中保持稳定的性能。这些发现突出了该方法在云计算环境中的智能资源监控和异常诊断的实际潜力。
摘要:This paper proposes an anomaly detection method based on federated learning to address key challenges in multi-tenant cloud environments, including data privacy leakage, heterogeneous resource behavior, and the limitations of centralized modeling. The method establishes a federated training framework involving multiple tenants. Each tenant trains the model locally using private resource usage data. Through parameter aggregation, a global model is optimized, enabling cross-tenant collaborative anomaly detection while preserving data privacy. To improve adaptability to diverse resource usage patterns, a personalized parameter adjustment mechanism is introduced. This allows the model to retain tenant-specific feature representations while sharing global knowledge. In the model output stage, the Mahalanobis distance is used to compute anomaly scores. This enhances both the accuracy and stability of anomaly detection. The experiments use real telemetry data from a cloud platform to construct a simulated multi-tenant environment. The study evaluates the model's performance under varying participation rates and noise injection levels. These comparisons demonstrate the proposed method's robustness and detection accuracy. Experimental results show that the proposed method outperforms existing mainstream models across key metrics such as Precision, Recall, and F1-Score. It also maintains stable performance in various complex scenarios. These findings highlight the method's practical potential for intelligent resource monitoring and anomaly diagnosis in cloud computing environments.
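
The Mahalanobis scoring step mentioned in the abstract can be sketched as follows; the feature construction, threshold choice and synthetic data are assumptions for illustration, not the paper's federated pipeline.

```python
# Sketch: fit mean/covariance on normal resource-usage vectors and flag points
# whose Mahalanobis distance exceeds a quantile of the normal scores.
import numpy as np

def fit_mahalanobis(normal_data, eps=1e-6):
    mu = normal_data.mean(axis=0)
    cov = np.cov(normal_data, rowvar=False) + eps * np.eye(normal_data.shape[1])
    cov_inv = np.linalg.inv(cov)
    def score(x):
        d = x - mu
        return np.sqrt(np.einsum("...i,ij,...j->...", d, cov_inv, d))
    return score

rng = np.random.default_rng(0)
normal = rng.normal(size=(5000, 8))              # per-tenant resource-usage features
score = fit_mahalanobis(normal)

threshold = np.quantile(score(normal), 0.99)     # e.g. 99th percentile of normal scores
test = np.vstack([rng.normal(size=(5, 8)), rng.normal(loc=6.0, size=(5, 8))])
print(score(test) > threshold)                   # last five points flagged as anomalies
```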


【5】Out-of-Distribution Detection using Counterfactual Distance
标题:使用反事实距离的分布外检测
链接:https://arxiv.org/abs/2508.10148

作者:ica, Francesco Leofante, Alessio Lomuscio
摘要:为了安全地使用机器学习系统,需要准确和可解释的分发外(OOD)检测。以前的工作表明,决策边界的特征距离可以用来有效地识别OOD数据。在本文中,我们建立在这种直觉,并提出了一个事后OOD检测方法,给定一个输入,计算距离决策边界利用反事实的解释。由于计算的解释可能是昂贵的大型架构,我们还提出了战略,以提高可扩展性,直接在嵌入空间中计算反事实。至关重要的是,由于该方法采用了反事实解释,我们可以无缝地使用它们来帮助解释探测器的结果。我们表明,我们的方法与CIFAR-10的最新技术水平一致,实现了93.50%AUROC和25.80%FPR95。在四个OOD数据集上,我们的方法在CIFAR-100上以97.05%AUROC和13.79%FPR95以及ImageNet-200上以92.55%AUROC和33.55%FPR95优于这些方法
摘要:Accurate and explainable out-of-distribution (OOD) detection is required to use machine learning systems safely. Previous work has shown that feature distance to decision boundaries can be used to identify OOD data effectively. In this paper, we build on this intuition and propose a post-hoc OOD detection method that, given an input, calculates the distance to decision boundaries by leveraging counterfactual explanations. Since computing explanations can be expensive for large architectures, we also propose strategies to improve scalability by computing counterfactuals directly in embedding space. Crucially, as the method employs counterfactual explanations, we can seamlessly use them to help interpret the results of our detector. We show that our method is in line with the state of the art on CIFAR-10, achieving 93.50% AUROC and 25.80% FPR95. Our method outperforms these methods on CIFAR-100 with 97.05% AUROC and 13.79% FPR95 and on ImageNet-200 with 92.55% AUROC and 33.55% FPR95 across four OOD datasets.
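
One simple way to realize a counterfactual-distance score in embedding space is to push the embedding toward the decision boundary with gradient steps until the predicted class flips, and use the distance travelled as the score; the sketch below does this for a toy linear head. The actual counterfactual search and scoring rule in the paper may differ.

```python
# Sketch: distance from an embedding to a gradient-found counterfactual.
import torch

def counterfactual_distance(head, z, steps=200, lr=0.05):
    """head: classifier over embeddings; z: (d,) embedding. Returns L2 distance
    to the first perturbed embedding whose predicted class changes (or to the
    last iterate if no flip occurs within `steps`)."""
    orig_class = head(z.unsqueeze(0)).argmax().item()
    z_cf = z.clone().detach().requires_grad_(True)
    for _ in range(steps):
        logits = head(z_cf.unsqueeze(0))
        if logits.argmax().item() != orig_class:     # decision boundary crossed
            break
        loss = logits[0, orig_class]                 # push down the original-class logit
        grad, = torch.autograd.grad(loss, z_cf)
        with torch.no_grad():
            z_cf -= lr * grad
    return torch.norm(z_cf.detach() - z).item()

# Toy linear head over 2-D embeddings: small distances mean the point is close
# to the boundary and hence more OOD-suspicious under this heuristic.
head = torch.nn.Linear(2, 2)
with torch.no_grad():
    head.weight.copy_(torch.tensor([[1.0, 0.0], [-1.0, 0.0]]))
    head.bias.zero_()
print(counterfactual_distance(head, torch.tensor([3.0, 0.0])))   # far from boundary
print(counterfactual_distance(head, torch.tensor([0.2, 0.0])))   # near the boundary
```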


【6】Machine Learning for Cloud Detection in IASI Measurements: A Data-Driven SVM Approach with Physical Constraints
标题:IIAS测量中云检测的机器学习:一种具有物理约束的数据驱动的支持者方法
链接:https://arxiv.org/abs/2508.10120

作者:garini, Cristina Sgattoni, Luca Sgheri
摘要:云检测对于大气反演、气候研究和天气预报至关重要。我们分析了搭载于气象业务(MetOp)卫星上的红外大气探测干涉仪(IASI)的红外辐射,将观测场景分类为晴空或多云。我们采用支持向量机(SVM)方法,该方法基于处理不可分数据的核方法。在本研究中,该方法被实现用于云识别(CISVM),利用辐射或亮度温度对测试集进行分类,并通过主成分分析(PCA)降维和云敏感通道选择来聚焦最具信息量的特征。我们的最佳配置与参考标签的一致性达到88.30%,并与中分辨率成像光谱仪(MODIS)的云掩模高度一致,差异最大的区域是极地,这源于传感器之间的差异。这些结果表明,CISVM是一种鲁棒、灵活且高效的红外辐射自动云分类方法,适用于业务化反演以及未来的任务,例如欧洲空间局第九个地球探索者任务,即远红外出射辐射理解与监测(FORUM)。
摘要:Cloud detection is essential for atmospheric retrievals, climate studies, and weather forecasting. We analyze infrared radiances from the Infrared Atmospheric Sounding Interferometer (IASI) onboard Meteorological Operational (MetOp) satellites to classify scenes as clear or cloudy.   We apply the Support Vector Machine (SVM) approach, based on kernel methods for non-separable data. In this study, the method is implemented for Cloud Identification (CISVM) to classify the test set using radiances or brightness temperatures, with dimensionality reduction through Principal Component Analysis (PCA) and cloud-sensitive channel selection to focus on the most informative features. Our best configuration achieves 88.30 percent agreement with reference labels and shows strong consistency with cloud masks from the Moderate Resolution Imaging Spectroradiometer (MODIS), with the largest discrepancies in polar regions due to sensor differences.   These results demonstrate that CISVM is a robust, flexible, and efficient method for automated cloud classification from infrared radiances, suitable for operational retrievals and future missions such as Far infrared Outgoing Radiation Understanding and Monitoring (FORUM), the ninth European Space Agency Earth Explorer Mission.
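
A compact sklearn sketch of the CISVM-style chain (standardization, PCA, RBF-kernel SVM) on synthetic stand-in radiances follows; channel counts and hyperparameters are placeholders, not the tuned configuration from the paper.

```python
# Sketch: PCA dimensionality reduction followed by an RBF-kernel SVM for
# clear/cloudy classification of (synthetic stand-in) radiances.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_channels = 300                                   # subset of cloud-sensitive channels
X_clear = rng.normal(loc=0.0, scale=1.0, size=(2000, n_channels))
X_cloudy = rng.normal(loc=0.8, scale=1.5, size=(2000, n_channels))
X = np.vstack([X_clear, X_cloudy])
y = np.repeat([0, 1], 2000)                        # 0 = clear, 1 = cloudy

clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=20),                          # keep the leading components
    SVC(kernel="rbf", C=10.0, gamma="scale"),
)
clf.fit(X, y)
print("training agreement:", clf.score(X, y))
```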


分类|识别(1篇)

【1】Multidimensional classification of posts for online course discussion forum curation
标题:在线课程讨论论坛策展的帖子多维分类
链接:https://arxiv.org/abs/2508.10008

作者:eandro Martins Candido, Jose Everardo Bessa Maia
备注:8 pages, 1 figure
摘要:在线课程中讨论论坛的自动管理需要不断更新,这使得频繁重新训练大型语言模型(LLM)成为一个资源密集型过程。为了避免代价高昂的微调,本文提出并评估了贝叶斯融合的使用。该方法将预训练的通用LLM的多维分类分数与在本地数据上训练的分类器的多维分类分数相结合。性能比较表明,与单独使用任一分类器相比,所提出的融合改善了结果,并且与LLM微调方法相比具有竞争力。
摘要 :The automatic curation of discussion forums in online courses requires constant updates, making frequent retraining of Large Language Models (LLMs) a resource-intensive process. To circumvent the need for costly fine-tuning, this paper proposes and evaluates the use of Bayesian fusion. The approach combines the multidimensional classification scores of a pre-trained generic LLM with those of a classifier trained on local data. The performance comparison demonstrated that the proposed fusion improves the results compared to each classifier individually, and is competitive with the LLM fine-tuning approach
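
Under a conditional-independence assumption, Bayesian fusion of two classifiers' class-probability vectors reduces to multiplying the posteriors and dividing by the class prior; a minimal sketch follows. The paper's exact fusion rule and priors may differ.

```python
# Sketch: naive-independence Bayesian fusion of per-dimension class probabilities
# from a generic LLM and a locally trained classifier.
import numpy as np

def bayes_fuse(p_a, p_b, prior):
    """p_a, p_b, prior: arrays of class probabilities with the same shape.
    Returns the normalized fused posterior p(c | a, b) ~ p(c|a) p(c|b) / p(c)."""
    fused = p_a * p_b / prior
    return fused / fused.sum()

prior = np.array([0.5, 0.3, 0.2])                # class prior on this dimension
p_llm = np.array([0.6, 0.3, 0.1])                # generic LLM scores
p_local = np.array([0.2, 0.7, 0.1])              # locally trained classifier scores
print(bayes_fuse(p_llm, p_local, prior))
```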


表征(1篇)

【1】SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilistic Representation Learning
标题:SynBrain:通过概率表示学习增强视觉到fMRI合成
链接:https://arxiv.org/abs/2508.10298

作者:ai, Jiamin Wu, Yu Zhu, Zhouheng Yao, Dongzhan Zhou, Andrew F. Luo, Qihao Zheng, Wanli Ouyang, Chunfeng Song
摘要:解读视觉刺激如何转化为皮层反应是计算神经科学的一个基本挑战。这种视觉到神经映射本质上是一对多的关系,因为相同的视觉输入可靠地引起不同试验、背景和受试者的不同血液动力学反应。然而,现有的确定性方法的斗争,同时捕捉潜在的功能一致性,编码刺激信息的生物变异性建模。为了解决这些局限性,我们提出了SynBrain,一个生成框架,它以概率和生物学可解释的方式模拟从视觉语义到神经反应的转换。SynBrain引入了两个关键组件:(i)BrainVAE通过概率学习将神经表征建模为连续概率分布,同时通过视觉语义约束保持功能一致性;(ii)语义到神经映射器作为语义传输路径,将视觉语义投射到神经响应流形中,以促进高保真fMRI合成。实验结果表明,SynBrain超越国家的最先进的方法,在特定主题的视觉功能磁共振成像编码性能。此外,SynBrain能够有效地适应具有Few-Shot数据的新受试者,并合成高质量的fMRI信号,从而有效地改善数据受限的fMRI到图像解码性能。除此之外,SynBrain还揭示了试验和受试者之间的功能一致性,合成信号捕获了由生物神经变异性形成的可解释模式。该代码将公开发布。
摘要:Deciphering how visual stimuli are transformed into cortical responses is a fundamental challenge in computational neuroscience. This visual-to-neural mapping is inherently a one-to-many relationship, as identical visual inputs reliably evoke variable hemodynamic responses across trials, contexts, and subjects. However, existing deterministic methods struggle to simultaneously model this biological variability while capturing the underlying functional consistency that encodes stimulus information. To address these limitations, we propose SynBrain, a generative framework that simulates the transformation from visual semantics to neural responses in a probabilistic and biologically interpretable manner. SynBrain introduces two key components: (i) BrainVAE models neural representations as continuous probability distributions via probabilistic learning while maintaining functional consistency through visual semantic constraints; (ii) A Semantic-to-Neural Mapper acts as a semantic transmission pathway, projecting visual semantics into the neural response manifold to facilitate high-fidelity fMRI synthesis. Experimental results demonstrate that SynBrain surpasses state-of-the-art methods in subject-specific visual-to-fMRI encoding performance. Furthermore, SynBrain adapts efficiently to new subjects with few-shot data and synthesizes high-quality fMRI signals that are effective in improving data-limited fMRI-to-image decoding performance. Beyond that, SynBrain reveals functional consistency across trials and subjects, with synthesized signals capturing interpretable patterns shaped by biological neural variability. The code will be made publicly available.


3D|3D重建等相关(1篇)

【1】Accelerating exoplanet climate modelling: A machine learning approach to complement 3D GCM grid simulations
标题:加速系外行星气候建模:补充3D GCM网格模拟的机器学习方法
链接:https://arxiv.org/abs/2508.10827

作者: Plaschzug, Amit Reza, Ludmila Carone, Sebastian Gernjak, Christiane Helling
摘要:随着能够更详细、更大规模地观测系外行星大气的望远镜不断改进,人们对增强型3D气候模型的需求越来越大,以支持并帮助解释CHEOPS、TESS、JWST、PLATO和Ariel等太空任务的观测数据。然而,大气环流模式(GCM)计算量大且耗时,这给模拟范围广泛的系外行星大气带来了重大挑战。本研究旨在确定机器学习(ML)算法能否在一系列行星参数范围内预测任意潮汐锁定气态系外行星的3D温度和风结构。我们介绍了一个新的3D GCM网格,其中包含用ExoRad建模的、围绕A、F、G、K和M型主星运行的60颗膨胀热木星。我们在该网格上训练了一个密集神经网络(DNN)和一个决策树算法(XGBoost),以预测局地气体温度以及水平和垂直风。为了确保ML模型预测的可靠性和质量,我们选择了类WASP-121 b、HATS-42 b、NGTS-17 b、WASP-23 b和NGTS-1 b的行星(它们都是PLATO的观测目标),并使用ExoRad和这两种ML方法作为测试用例进行建模。DNN对气体温度的预测达到了这样的精度:除一颗行星外,所有行星计算出的光谱差异都在32 ppm以内,仅有一处HCN特征达到100 ppm的差异。所开发的ML仿真器可以可靠地预测围绕A至M型宿主恒星的、从温暖到超热的膨胀潮汐锁定木星的完整3D温度场。它提供了一个快速工具,用于补充和扩展传统GCM网格,以开展系外行星系综研究。预测质量足够高,预计对气相化学、进而对云的形成和透射光谱没有影响或影响极小。
摘要:With the development of ever-improving telescopes capable of observing exoplanet atmospheres in greater detail and number, there is a growing demand for enhanced 3D climate models to support and help interpret observational data from space missions like CHEOPS, TESS, JWST, PLATO, and Ariel. However, the computationally intensive and time-consuming nature of general circulation models (GCMs) poses significant challenges in simulating a wide range of exoplanetary atmospheres. This study aims to determine whether machine learning (ML) algorithms can be used to predict the 3D temperature and wind structure of arbitrary tidally-locked gaseous exoplanets in a range of planetary parameters. A new 3D GCM grid with 60 inflated hot Jupiters orbiting A, F, G, K, and M-type host stars modelled with Exorad has been introduced. A dense neural network (DNN) and a decision tree algorithm (XGBoost) are trained on this grid to predict local gas temperatures along with horizontal and vertical winds. To ensure the reliability and quality of the ML model predictions, WASP-121 b, HATS-42 b, NGTS-17 b, WASP-23 b, and NGTS-1 b-like planets, which are all targets for PLATO observation, are selected and modelled with ExoRad and the two ML methods as test cases. The DNN predictions for the gas temperatures are to such a degree that the calculated spectra agree within 32 ppm for all but one planet, for which only one single HCN feature reaches a 100 ppm difference. The developed ML emulators can reliably predict the complete 3D temperature field of an inflated warm to ultra-hot tidally locked Jupiter around A to M-type host stars. It provides a fast tool to complement and extend traditional GCM grids for exoplanet ensemble studies. The quality of the predictions is such that no or minimal effects on the gas phase chemistry, hence on the cloud formation and transmission spectra, are to be expected.


优化|敛散性(1篇)

【1】MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control
标题:MDNS:通过随机最优控制的掩蔽扩散神经采样器
链接:https://arxiv.org/abs/2508.10684

作者:u, Wei Guo, Jaemoo Choi, Guan-Horng Liu, Yongxin Chen, Molei Tao
摘要:我们研究学习神经采样器以从离散状态空间生成样本的问题,其中目标概率质量函数$\pi\propto\mathrm{e}^{-U}$仅在相差一个归一化常数的意义下已知;这是统计物理、机器学习、组合优化等领域的重要任务。为了在状态空间基数很大且分布为多模态时更好地解决这一具有挑战性的任务,我们提出了掩蔽扩散神经采样器($\textbf{M}$asked $\textbf{D}$iffusion $\textbf{N}$eural $\textbf{S}$ampler,$\textbf{MDNS}$),这是一种通过一族学习目标对齐两个路径测度来训练离散神经采样器的新框架,其理论基础是连续时间马尔可夫链的随机最优控制。我们通过在具有不同统计特性的各种分布上进行广泛实验来验证MDNS的效率和可扩展性:尽管问题维度极高,MDNS仍能学会从目标分布中准确采样,并大幅超越其他基于学习的基线。我们还提供了关于消融和扩展的综合研究,以证明所提框架的有效性和潜力。
摘要 :We study the problem of learning a neural sampler to generate samples from discrete state spaces where the target probability mass function $\pi\propto\mathrm{e}^{-U}$ is known up to a normalizing constant, which is an important task in fields such as statistical physics, machine learning, combinatorial optimization, etc. To better address this challenging task when the state space has a large cardinality and the distribution is multi-modal, we propose $\textbf{M}$asked $\textbf{D}$iffusion $\textbf{N}$eural $\textbf{S}$ampler ($\textbf{MDNS}$), a novel framework for training discrete neural samplers by aligning two path measures through a family of learning objectives, theoretically grounded in the stochastic optimal control of the continuous-time Markov chains. We validate the efficiency and scalability of MDNS through extensive experiments on various distributions with distinct statistical properties, where MDNS learns to accurately sample from the target distributions despite the extremely high problem dimensions and outperforms other learning-based baselines by a large margin. A comprehensive study of ablations and extensions is also provided to demonstrate the efficacy and potential of the proposed framework.


预测|估计(12篇)

【1】Geospatial Diffusion for Land Cover Imperviousness Change Forecasting
标题:土地覆被不渗透性变化预测的地理空间扩散
链接:https://arxiv.org/abs/2508.10649

作者:arshney, Vibhas Vats, Bhartendu Pandey, Christa Brelsford, Philipe Dias
摘要:目前和未来的土地覆被对若干重要的地球系统过程具有重大影响。例如,不透水的地表会升温,加速地表水径流,减少地下水渗透,从而对区域水文和洪水风险产生影响。虽然区域地球系统模型在未来气候情景中以高分辨率预测水文和大气过程的能力越来越强,但我们预测土地利用和土地覆盖变化(LULC)的能力,这是这些情景风险和后果评估的关键输入,已经落后。在本文中,我们提出了一种利用生成人工智能(GenAI)进行土地覆盖变化预测的新范式,将LULC预测框架为以历史和辅助数据源为条件的数据综合问题。我们讨论了生成模型的理想属性,为我们的研究前提奠定了基础,并通过实验证明了我们的方法的可行性,使用历史数据覆盖整个美国的不渗透性预测。具体来说,我们训练的扩散模型的不渗透性的十年预测和比较其性能的基线,假设没有变化。在12个大都市地区的一年培训期间举行的评估表明,平均分辨率为0.7km^2 $我们的模型产生MAE低于这样的基线。这一发现证实了这样一个生成模型可以从历史数据中捕获时空模式,这对预测未来的变化很重要。最后,我们讨论了未来的研究,以纳入有关地球的物理特性的辅助信息,以及支持模拟不同的情况下,通过驱动变量。
摘要:Land cover, both present and future, has a significant effect on several important Earth system processes. For example, impervious surfaces heat up and speed up surface water runoff and reduce groundwater infiltration, with concomitant effects on regional hydrology and flood risk. While regional Earth System models have increasing skill at forecasting hydrologic and atmospheric processes at high resolution in future climate scenarios, our ability to forecast land-use and land-cover change (LULC), a critical input to risk and consequences assessment for these scenarios, has lagged behind. In this paper, we propose a new paradigm exploiting Generative AI (GenAI) for land cover change forecasting by framing LULC forecasting as a data synthesis problem conditioned on historical and auxiliary data-sources. We discuss desirable properties of generative models that fundament our research premise, and demonstrate the feasibility of our methodology through experiments on imperviousness forecasting using historical data covering the entire conterminous United States. Specifically, we train a diffusion model for decadal forecasting of imperviousness and compare its performance to a baseline that assumes no change at all. Evaluation across 12 metropolitan areas for a year held-out during training indicate that for average resolutions $\geq 0.7\times0.7km^2$ our model yields MAE lower than such a baseline. This finding corroborates that such a generative model can capture spatiotemporal patterns from historical data that are significant for projecting future change. Finally, we discuss future research to incorporate auxiliary information on physical properties about the Earth, as well as supporting simulation of different scenarios by means of driver variables.


【2】Energy-Based Models for Predicting Mutational Effects on Proteins
标题:预测蛋白质突变效应的基于能量的模型
链接:https://arxiv.org/abs/2508.10629

作者:oga, Zhenyu Lei, Yinhan He, Camille Bilodeau, Jundong Li
备注:12 pages
摘要:预测结合自由能($\Delta\Delta G$)的变化是蛋白质工程和蛋白质-蛋白质相互作用(PPI)工程中用于药物发现的重要任务。以前的工作已经观察到$\Delta\Delta G$和熵之间的高度相关性,使用生物学上重要的对象,如侧链角度和残基身份的概率来估计$\Delta\Delta G$。然而,估计蛋白质复合物的完整构象分布通常被认为是棘手的。在这项工作中,我们提出了一种新的方法来$\Delta\Delta G$预测,避免了这个问题,而是利用基于能量的模型来估计复杂的构象的概率。具体而言,我们新颖地分解$\Delta\Delta G$成基于序列的组件估计的逆折叠模型和基于结构的组件估计的能量模型。通过假设束缚态和非束缚态之间的平衡,这种分解变得易于处理,使我们能够简化与每个状态相关的简并度的估计。与以前基于深度学习的方法不同,我们的方法通过将常用的基于序列对数比值比的方法与$\Delta\Delta G$预测和基于统计力学的新$\Delta\Delta E$项相连接,结合了基于能量的物理归纳偏差。我们证明了在$\Delta\Delta G$预测和针对SARS-CoV-2的抗体优化方面优于现有最先进的结构和基于序列的深度学习方法。
摘要:Predicting changes in binding free energy ($\Delta\Delta G$) is a vital task in protein engineering and protein-protein interaction (PPI) engineering for drug discovery. Previous works have observed a high correlation between $\Delta\Delta G$ and entropy, using probabilities of biologically important objects such as side chain angles and residue identities to estimate $\Delta\Delta G$. However, estimating the full conformational distribution of a protein complex is generally considered intractable. In this work, we propose a new approach to $\Delta\Delta G$ prediction that avoids this issue by instead leveraging energy-based models for estimating the probability of a complex's conformation. Specifically, we novelly decompose $\Delta\Delta G$ into a sequence-based component estimated by an inverse folding model and a structure-based component estimated by an energy model. This decomposition is made tractable by assuming equilibrium between the bound and unbound states, allowing us to simplify the estimation of degeneracies associated with each state. Unlike previous deep learning-based methods, our method incorporates an energy-based physical inductive bias by connecting the often-used sequence log-odds ratio-based approach to $\Delta\Delta G$ prediction with a new $\Delta\Delta E$ term grounded in statistical mechanics. We demonstrate superiority over existing state-of-the-art structure and sequence-based deep learning methods in $\Delta\Delta G$ prediction and antibody optimization against SARS-CoV-2.


【3】Learning State-Space Models of Dynamic Systems from Arbitrary Data using Joint Embedding Predictive Architectures
标题:使用联合嵌入预测架构从任意数据学习动态系统的状态空间模型
链接:https://arxiv.org/abs/2508.10489

作者:en, Ganesh Sundaram, Daniel Görges
备注:6 Pages, Published in IFAC Joint Symposia on Mechatronics & Robotics 2025
摘要:联合嵌入预测架构(JEPA)似乎比基于重建的方法更有能力;随着其出现,本文介绍了一种利用连续时间动态系统从任意观测数据构建世界模型的新技术。所提出的方法将序列嵌入与神经常微分方程(神经ODE)相结合。它采用的损失函数在状态转移中强制收缩(contractive)嵌入并约束Lipschitz常数,以构建组织良好的潜在状态空间。通过仅使用图像数据为简单的单摆系统生成结构化的潜在状态空间模型,证明了该方法的有效性。这为开发更通用的控制算法和估计技术开辟了一条新途径,在机器人技术中具有广泛的应用。
摘要:With the advent of Joint Embedding Predictive Architectures (JEPAs), which appear to be more capable than reconstruction-based methods, this paper introduces a novel technique for creating world models using continuous-time dynamic systems from arbitrary observation data. The proposed method integrates sequence embeddings with neural ordinary differential equations (neural ODEs). It employs loss functions that enforce contractive embeddings and Lipschitz constants in state transitions to construct a well-organized latent state space. The approach's effectiveness is demonstrated through the generation of structured latent state-space models for a simple pendulum system using only image data. This opens up a new technique for developing more general control algorithms and estimation techniques with broad applications in robotics.


【4】The Conditional Regret-Capacity Theorem for Batch Universal Prediction
标题:批量普适预测的条件遗憾容量定理
链接:https://arxiv.org/abs/2508.10282

作者:daschi, Michael Gastpar
摘要:我们推导出经典的遗憾-容量(regret-capacity)定理的条件版本。当预测器可以获得成批的训练数据时,这一结果可用于通用预测中,为最小批量遗憾(平均遗憾的一个最近提出的推广)找到下界。作为一个例子,我们将该结果应用于二元无记忆源类。最后,我们将该定理推广到R\'enyi信息测度,揭示了条件R\'enyi散度与条件Sibson互信息之间的深层联系。
摘要:We derive a conditional version of the classical regret-capacity theorem. This result can be used in universal prediction to find lower bounds on the minimal batch regret, which is a recently introduced generalization of the average regret, when batches of training data are available to the predictor. As an example, we apply this result to the class of binary memoryless sources. Finally, we generalize the theorem to R\'enyi information measures, revealing a deep connection between the conditional R\'enyi divergence and the conditional Sibson's mutual information.


【5】Interpretable Machine Learning Model for Early Prediction of Acute Kidney Injury in Critically Ill Patients with Cirrhosis: A Retrospective Study
标题:早期预测重症肝硬化患者急性肾损伤的可解释机器学习模型:一项回顾性研究
链接:https://arxiv.org/abs/2508.10233

作者:huheng Chen, Junyi Fan, Yong Si, Minoo Ahmadi, Elham Pishgar, Kamiar Alaei, Maryam Pishgar
摘要:背景资料:肝硬化是一种进行性肝脏疾病,死亡率高且并发症频发,尤其是急性肾损伤(AKI),其发生于高达50%的住院患者中并使预后恶化。AKI源于复杂的血流动力学、炎症和代谢变化,因此早期检测至关重要。许多预测工具缺乏准确性、可解释性以及与重症监护室(ICU)工作流程的一致性。本研究开发了一种可解释的机器学习模型,用于肝硬化重症患者的早期AKI预测。   研究方法:我们对MIMIC-IV v2.2数据库进行了回顾性分析,确定了1240例肝硬化成人ICU患者,排除了ICU停留时间低于48小时或缺失关键数据的患者。提取前48小时的实验室和生理变量。流水线包括预处理、缺失值过滤、LASSO特征选择和SMOTE类平衡。六种算法(LightGBM、CatBoost、XGBoost、逻辑回归、朴素贝叶斯和神经网络)使用AUROC、准确性、F1评分、灵敏度、特异性和预测值进行训练和评估。   结果:LightGBM获得了最佳性能(AUROC 0.808,95%CI 0.741-0.856;准确度0.704;NPV 0.911)。关键预测因素包括部分凝血活酶时间延长、未在机构外放置20G、pH值低和pO2改变,这与已知的肝硬化-AKI机制一致,并提示了可干预的目标。   结论:基于LightGBM的模型能够使用常规临床变量对ICU肝硬化患者进行准确的早期AKI风险分层。其高阴性预测值支持对低风险患者安全降级处理,其可解释性有助于增进临床医生的信任并实现有针对性的预防。外部验证以及与电子健康记录系统的整合是必要的。
摘要:Background: Cirrhosis is a progressive liver disease with high mortality and frequent complications, notably acute kidney injury (AKI), which occurs in up to 50% of hospitalized patients and worsens outcomes. AKI stems from complex hemodynamic, inflammatory, and metabolic changes, making early detection essential. Many predictive tools lack accuracy, interpretability, and alignment with intensive care unit (ICU) workflows. This study developed an interpretable machine learning model for early AKI prediction in critically ill patients with cirrhosis.   Methods: We conducted a retrospective analysis of the MIMIC-IV v2.2 database, identifying 1240 adult ICU patients with cirrhosis and excluding those with ICU stays under 48 hours or missing key data. Laboratory and physiological variables from the first 48 hours were extracted. The pipeline included preprocessing, missingness filtering, LASSO feature selection, and SMOTE class balancing. Six algorithms-LightGBM, CatBoost, XGBoost, logistic regression, naive Bayes, and neural networks-were trained and evaluated using AUROC, accuracy, F1-score, sensitivity, specificity, and predictive values.   Results: LightGBM achieved the best performance (AUROC 0.808, 95% CI 0.741-0.856; accuracy 0.704; NPV 0.911). Key predictors included prolonged partial thromboplastin time, absence of outside-facility 20G placement, low pH, and altered pO2, consistent with known cirrhosis-AKI mechanisms and suggesting actionable targets.   Conclusion: The LightGBM-based model enables accurate early AKI risk stratification in ICU patients with cirrhosis using routine clinical variables. Its high negative predictive value supports safe de-escalation for low-risk patients, and interpretability fosters clinician trust and targeted prevention. External validation and integration into electronic health record systems are warranted.
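
A sketch of the modelling chain on synthetic data follows: L1-penalized (LASSO-style) feature selection, SMOTE applied to the training split only, and a LightGBM classifier scored by AUROC. Hyperparameters and preprocessing are illustrative; the real study requires access to MIMIC-IV.

```python
# Sketch: feature selection + SMOTE class balancing + LightGBM, scored by AUROC.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import roc_auc_score
from imblearn.over_sampling import SMOTE
from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=1200, n_features=60, n_informative=12,
                           weights=[0.75, 0.25], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

# L1-penalized selection of informative variables (LASSO-style).
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1, max_iter=1000)
).fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

# Oversample the minority (AKI) class on the training split only.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr_sel, y_tr)

model = LGBMClassifier(n_estimators=300, learning_rate=0.05, random_state=0)
model.fit(X_bal, y_bal)
print("AUROC:", roc_auc_score(y_te, model.predict_proba(X_te_sel)[:, 1]))
```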


【6】Characterizing Evolution in Expectation-Maximization Estimates for Overspecified Mixed Linear Regression
标题:超指定混合线性回归期望最大化估计的演化特征
链接:https://arxiv.org/abs/2508.10154

作者:uo, Abolfazl Hashemi
摘要:混合模型由于其实用性和全面的理论基础而受到广泛关注。一个持续存在的挑战是模型误设,即待拟合模型的混合成分多于数据分布中的成分。在本文中,我们针对具有未知$d$维回归参数和混合权重的超指定两成分混合线性回归(2MLR),对期望最大化(EM)算法在这种有针对性的模型误设下的行为建立了理论理解。在定理5.1中,在总体水平上,对于混合权重的不平衡初始猜测,我们建立了回归参数在$O(\log(1/\epsilon))$步内的线性收敛;相反,对于混合权重的平衡初始猜测,我们观察到次线性收敛,需要$O(\epsilon^{-2})$步才能在欧几里得距离下达到$\epsilon$-准确度。在定理6.1中,在有限样本水平上,给定$n$个数据样本,对于固定混合权重充分不平衡的混合,我们证明了$O((d/n)^{1/2})$的统计精度,而对于固定混合权重充分平衡的混合,精度为$O((d/n)^{1/4})$。此外,我们强调总体水平结果与有限样本水平结果之间的联系:通过将定理5.1中的期望最终精度$\epsilon$设置为与定理6.1在有限样本水平上的精度相匹配,即对充分不平衡的固定混合权重取$\epsilon = O((d/n)^{1/2})$,对充分平衡的固定混合权重取$\epsilon = O((d/n)^{1/4})$,我们可以直观地推导出:对于充分不平衡和充分平衡的初始混合权重,有限样本水平上的迭代复杂度界分别为$O(\log(1/\epsilon))=O(\log(n/d))$和$O(\epsilon^{-2})=O((n/d)^{1/2})$。我们进一步将分析扩展到低信噪比条件下的超指定设置。
摘要:Mixture models have attracted significant attention due to practical effectiveness and comprehensive theoretical foundations. A persisting challenge is model misspecification, which occurs when the model to be fitted has more mixture components than those in the data distribution. In this paper, we develop a theoretical understanding of the Expectation-Maximization (EM) algorithm's behavior in the context of targeted model misspecification for overspecified two-component Mixed Linear Regression (2MLR) with unknown $d$-dimensional regression parameters and mixing weights. In Theorem 5.1 at the population level, with an unbalanced initial guess for mixing weights, we establish linear convergence of regression parameters in $O(\log(1/\epsilon))$ steps. Conversely, with a balanced initial guess for mixing weights, we observe sublinear convergence in $O(\epsilon^{-2})$ steps to achieve the $\epsilon$-accuracy at Euclidean distance. In Theorem 6.1 at the finite-sample level, for mixtures with sufficiently unbalanced fixed mixing weights, we demonstrate a statistical accuracy of $O((d/n)^{1/2})$, whereas for those with sufficiently balanced fixed mixing weights, the accuracy is $O((d/n)^{1/4})$ given $n$ data samples. Furthermore, we underscore the connection between our population level and finite-sample level results: by setting the desired final accuracy $\epsilon$ in Theorem 5.1 to match that in Theorem 6.1 at the finite-sample level, namely letting $\epsilon = O((d/n)^{1/2})$ for sufficiently unbalanced fixed mixing weights and $\epsilon = O((d/n)^{1/4})$ for sufficiently balanced fixed mixing weights, we intuitively derive iteration complexity bounds $O(\log (1/\epsilon))=O(\log (n/d))$ and $O(\epsilon^{-2})=O((n/d)^{1/2})$ at the finite-sample level for sufficiently unbalanced and balanced initial mixing weights. We further extend our analysis in overspecified setting to low SNR regime.
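
For readers unfamiliar with the algorithm being analysed, a plain-numpy EM routine for two-component mixed linear regression is sketched below: responsibilities in the E-step, weighted least squares plus a mixing-weight update in the M-step. The data-generating setup and the known noise scale are simplifications for illustration, not the paper's overspecified regime.

```python
# Sketch: EM for two-component mixed linear regression (2MLR) with known noise scale.
import numpy as np

def em_2mlr(X, y, iters=100, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    betas = rng.normal(size=(2, d))           # regression parameters of both components
    pis = np.array([0.5, 0.5])                # mixing weights
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resid = y[:, None] - X @ betas.T                     # (n, 2)
        logp = -0.5 * (resid / sigma) ** 2 + np.log(pis)
        logp -= logp.max(axis=1, keepdims=True)
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per component + mixing-weight update.
        for k in range(2):
            W = gamma[:, k]
            betas[k] = np.linalg.solve((X * W[:, None]).T @ X, (X * W[:, None]).T @ y)
        pis = gamma.mean(axis=0)
    return betas, pis

rng = np.random.default_rng(1)
n, d = 4000, 5
X = rng.normal(size=(n, d))
beta_true = 2 * rng.normal(size=d)
z = rng.random(n) < 0.5                        # balanced, symmetric ground-truth mixture
y = np.where(z, X @ beta_true, -X @ beta_true) + rng.normal(size=n)
betas, pis = em_2mlr(X, y)
print("mixing weights:", pis)
print("recovery error:", min(np.linalg.norm(betas[0] - beta_true),
                             np.linalg.norm(betas[0] + beta_true)))
```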


【7】Next Edit Prediction: Learning to Predict Code Edits from Context and Interaction History
标题:下一步编辑预测:学习根据上下文和交互历史预测代码编辑
链接:https://arxiv.org/abs/2508.10074

作者:, Yintong Huo, Meng Zhang, Yichen Li, Michael R. Lyu
摘要:大型语言模型(LLM)的快速发展使集成到开发环境中的AI编程助手被广泛采用。一方面,低延迟代码补全可以提供补全建议,但从根本上局限于光标当前所在的位置。另一方面,基于聊天的编辑可以执行复杂的修改,却迫使开发人员停下工作、用自然语言描述意图,导致上下文从代码中切换出去。这造成了次优的用户体验,因为这两种范式都没有在一系列相关编辑中主动预测开发人员的下一次编辑。为了弥合这一差距并提供无缝的代码编辑建议,我们引入了下一步编辑预测(Next Edit Prediction)任务,这一新任务旨在从最近的交互历史推断开发人员的意图,并预测后续编辑的位置和内容。具体来说,我们为该任务构建了一个高质量的监督微调数据集和一个评估基准。随后,我们对一系列模型进行了监督微调,并对微调后的模型和其他基线模型进行了综合评估,得出了若干新发现。这项工作为一种新的交互范式奠定了基础:通过预判开发人员的后续行动与其主动协作,而不仅仅是对显式指令做出反应。
摘要:The rapid advancement of large language models (LLMs) has led to the widespread adoption of AI-powered coding assistants integrated into a development environment. On one hand, low-latency code completion offers completion suggestions but is fundamentally constrained to the cursor's current position. On the other hand, chat-based editing can perform complex modifications, yet forces developers to stop their work, describe the intent in natural language, which causes a context-switch away from the code. This creates a suboptimal user experience, as neither paradigm proactively predicts the developer's next edit in a sequence of related edits. To bridge this gap and provide the seamless code edit suggestion, we introduce the task of Next Edit Prediction, a novel task designed to infer developer intent from recent interaction history to predict both the location and content of the subsequent edit. Specifically, we curate a high-quality supervised fine-tuning dataset and an evaluation benchmark for the Next Edit Prediction task. Then, we conduct supervised fine-tuning on a series of models and performed a comprehensive evaluation of both the fine-tuned models and other baseline models, yielding several novel findings. This work lays the foundation for a new interaction paradigm that proactively collaborate with developers by anticipating their following action, rather than merely reacting to explicit instructions.


【8】Measuring Time Series Forecast Stability for Demand Planning
标题:测量需求规划的时间序列预测稳定性
链接:https://arxiv.org/abs/2508.10063

作者:ee, Yuntian Xia
备注:6 pages, 3 figures; KDD '25
摘要:时间序列预测是为供应链制定需求计划的关键第一步。时间序列模型上的实验通常侧重于展示相对于现有/基线解决方案在预测准确性上的改进,并用某种准确性指标加以量化。毫无疑问,预测准确性很重要;然而在生产系统中,需求计划人员通常更看重一致性和稳定性,而非增量式的准确性改进。假设输入没有显著变化,那么从一个规划周期到下一个规划周期变化很大的预测需要大量的人工干预,这会使需求规划者感到沮丧,甚至可能导致他们对ML预测模型失去信任。我们研究模型诱导的随机性,即在输入固定时,由单一模型产生的一组预测的方差。方差越低的模型越稳定。   最近,通过开发用于时间序列预测的深度机器学习模型,预测界在预测准确性方面取得了显著进步。我们进行了一项案例研究,在M5比赛和Favorita杂货店销售的公共数据集上测量最先进预测模型(Chronos、DeepAR、PatchTST、Temporal Fusion Transformer、TiDE和AutoGluon最佳质量集成)的稳定性和准确性。我们发现,集成模型可以提高稳定性,而不会显著降低(甚至还能提高)预测精度。虽然这些结果可能并不令人惊讶,但本文的主要目的是指出:需要进一步研究部署于生产系统中的模型的预测稳定性。
摘要:Time series forecasting is a critical first step in generating demand plans for supply chains. Experiments on time series models typically focus on demonstrating improvements in forecast accuracy over existing/baseline solutions, quantified according to some accuracy metric. There is no doubt that forecast accuracy is important; however in production systems, demand planners often value consistency and stability over incremental accuracy improvements. Assuming that the inputs have not changed significantly, forecasts that vary drastically from one planning cycle to the next require high amounts of human intervention, which frustrates demand planners and can even cause them to lose trust in ML forecasting models. We study model-induced stochasticity, which quantifies the variance of a set of forecasts produced by a single model when the set of inputs is fixed. Models with lower variance are more stable.   Recently the forecasting community has seen significant advances in forecast accuracy through the development of deep machine learning models for time series forecasting. We perform a case study measuring the stability and accuracy of state-of-the-art forecasting models (Chronos, DeepAR, PatchTST, Temporal Fusion Transformer, TiDE, and the AutoGluon best quality ensemble) on public data sets from the M5 competition and Favorita grocery sales. We show that ensemble models improve stability without significantly deteriorating (or even improving) forecast accuracy. While these results may not be surprising, the main point of this paper is to propose the need for further study of forecast stability for models that are being deployed in production systems.
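
Model-induced stochasticity as described here can be estimated by re-running the same model on identical inputs and measuring the per-point variance of its forecasts; a minimal sketch with an invented stability summary follows. The paper's exact metric may be defined differently.

```python
# Sketch: quantify forecast stability as per-point standard deviation across runs.
import numpy as np

def forecast_stability(forecasts):
    """forecasts: array (n_runs, n_series, horizon) produced by the same model
    on identical inputs. Returns the mean per-point standard deviation."""
    return forecasts.std(axis=0).mean()

rng = np.random.default_rng(0)
base = rng.normal(size=(1, 50, 28))                      # one shared "true" forecast shape
runs_a = base + 0.05 * rng.normal(size=(10, 50, 28))     # stable model: low run-to-run variance
runs_b = base + 0.50 * rng.normal(size=(10, 50, 28))     # unstable model
print("model A stability:", forecast_stability(runs_a))
print("model B stability:", forecast_stability(runs_b))
```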


【9】OpenFPL: An open-source forecasting method rivaling state-of-the-art Fantasy Premier League services
标题:OpenFPL:一种与最先进的Fantasy Premier League服务相媲美的开源预测方法
链接:https://arxiv.org/abs/2508.09992

作者:oos
备注:Models and inference code are freely available at this https URL
摘要:梦幻英超联赛让足球界参与选择那些在比赛周中表现最好的英超球员。获得准确的表现预测,通过指导对球员结果的预期和减少球队选择的不确定性,使参与者比竞争对手更具优势。然而,高准确度的预测目前仅限于商业服务,其内部工作未公开,并依赖于专有数据。本文旨在通过介绍OpenFPL,一种完全从公共数据开发的开源梦幻英超联赛预测方法,使对球员表现的高度准确预测民主化。OpenFPL包括根据前四个赛季(2020-21赛季至2023-24赛季)的Fantasy Premier League和Understat数据优化的特定位置集合模型,在对2024-25赛季的数据进行前瞻性测试时,其准确性可与领先的商业服务相媲美。OpenFPL在高回报球员($>2$分)上还超过了商业基准,而这类球员对排名提升影响最大。这些发现适用于一周、两周和三周的预测范围,支持长期的转会和战略规划,同时也为最后一天的决策提供信息。
摘要:Fantasy Premier League engages the football community in selecting the Premier League players who will perform best from gameweek to gameweek. Access to accurate performance forecasts gives participants an edge over competitors by guiding expectations about player outcomes and reducing uncertainty in squad selection. However, high-accuracy forecasts are currently limited to commercial services whose inner workings are undisclosed and that rely on proprietary data. This paper aims to democratize access to highly accurate forecasts of player performance by presenting OpenFPL, an open-source Fantasy Premier League forecasting method developed exclusively from public data. Comprising position-specific ensemble models optimized on Fantasy Premier League and Understat data from four previous seasons (2020-21 to 2023-24), OpenFPL achieves accuracy comparable to a leading commercial service when tested prospectively on data from the 2024-25 season. OpenFPL also surpasses the commercial benchmark for high-return players ($>$ 2 points), which are most influential for rank gains. These findings hold across one-, two-, and three-gameweek forecast horizons, supporting long-term planning of transfers and strategies while also informing final-day decisions.


【10】Symmetry-Constrained Multi-Scale Physics-Informed Neural Networks for Graphene Electronic Band Structure Prediction
标题:用于石墨烯电子带结构预测的对称约束多尺度物理信息神经网络
链接:https://arxiv.org/abs/2508.10718

作者:Lee, I Hang Kwok, Kam Ian Leong, Chi Kiu Althina Chau, Kei Chon Sio
备注:36 pages and 14 figures
摘要:准确预测二维材料中的电子能带结构仍然是一个根本性的挑战,现有的方法难以平衡计算效率和物理精度。我们提出了对称约束多尺度物理信息神经网络(SCMS-PINN)v35,它直接学习石墨烯能带结构,同时通过多头架构严格执行晶体学对称性。我们的方法引入了三条专用的ResNet-6路径(处理Dirac物理的K-head、处理鞍点的M-head以及用于平滑插值的General head),它们对从k点提取的31个物理信息特征进行操作。渐进式狄拉克约束调度将权重参数从5.0系统地增加到25.0,从而实现从全局拓扑到局部关键物理的分层学习。在10,000个k点上训练300个epoch后,训练损失减少了99.99%(从34.597降至0.003),验证损失为0.0085。该模型预测的狄拉克点能隙与理论零值相差不超过30.3 $\mu$eV,并在整个布里渊区实现了53.9 meV(价带)和40.5 meV(导带)的平均误差。所有12个C$_{6v}$操作都通过系统平均来强制执行,从而保证精确的对称性保持。该框架为将物理信息学习扩展到更广泛的二维材料以加速发现奠定了基础。
摘要 :Accurate prediction of electronic band structures in two-dimensional materials remains a fundamental challenge, with existing methods struggling to balance computational efficiency and physical accuracy. We present the Symmetry-Constrained Multi-Scale Physics-Informed Neural Network (SCMS-PINN) v35, which directly learns graphene band structures while rigorously enforcing crystallographic symmetries through a multi-head architecture. Our approach introduces three specialized ResNet-6 pathways -- K-head for Dirac physics, M-head for saddle points, and General head for smooth interpolation -- operating on 31 physics-informed features extracted from k-points. Progressive Dirac constraint scheduling systematically increases the weight parameter from 5.0 to 25.0, enabling hierarchical learning from global topology to local critical physics. Training on 10,000 k-points over 300 epochs achieves 99.99\% reduction in training loss (34.597 to 0.003) with validation loss of 0.0085. The model predicts Dirac point gaps within 30.3 $\mu$eV of theoretical zero and achieves average errors of 53.9 meV (valence) and 40.5 meV (conduction) across the Brillouin zone. All twelve C$_{6v}$ operations are enforced through systematic averaging, guaranteeing exact symmetry preservation. This framework establishes a foundation for extending physics-informed learning to broader two-dimensional materials for accelerated discovery.
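
Exact C6v symmetry can be enforced by averaging a predictor over the group's twelve operations (six rotations and six mirror reflections) applied to the 2-D k-point; the toy sketch below illustrates this with a stand-in predictor rather than the paper's multi-head network and feature construction.

```python
# Sketch: symmetrize a scalar band predictor by averaging over the C6v orbit.
import numpy as np

def c6v_operations():
    ops = []
    for k in range(6):
        theta = k * np.pi / 3
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s], [s, c]])
        ops.append(rot)                                         # pure rotation
        ops.append(rot @ np.array([[1.0, 0.0], [0.0, -1.0]]))   # rotation * mirror
    return ops

def symmetrized(predict, k_point):
    """Average a scalar prediction over the C6v orbit of the k-point."""
    return np.mean([predict(op @ k_point) for op in c6v_operations()])

predict = lambda k: np.cos(k[0]) + 0.3 * k[1]        # toy, non-symmetric predictor
k = np.array([0.4, 0.7])
for op in c6v_operations()[:3]:
    print(symmetrized(predict, op @ k))              # identical values for symmetry-related k-points
```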


【11】CATNet: A geometric deep learning approach for CAT bond spread prediction in the primary market
标题:CATNet:一级市场CAT债券利差预测的几何深度学习方法
链接:https://arxiv.org/abs/2508.10208

作者:feh, Saeid Safarveisi
摘要:传统的巨灾债券定价模型难以捕捉这些工具中固有的复杂的相关数据。本文介绍了CATNet,这是一个新的框架,它应用了几何深度学习架构,关系图卷积网络(R-GCN),将CAT债券一级市场建模为图,利用其底层网络结构进行价差预测。我们的分析表明,CAT债券市场表现出无标度网络的特征,这种结构由几个高度连接和有影响力的枢纽所主导。CATNet表现出很高的预测性能,显著优于强大的随机森林基准。包含拓扑中心性度量作为特征提供了进一步的、显著的准确性提高。可解释性分析证实,这些网络特征不仅仅是统计人工制品,它们是长期持有的行业直觉的定量代理,涉及发行人声誉,承销商影响力和风险集中度。这项研究提供的证据表明,网络连接是价格的关键决定因素,为风险评估提供了一种新的范式,并证明基于图形的模型可以提供最先进的准确性和更深入的、可量化的市场洞察。
摘要:Traditional models for pricing catastrophe (CAT) bonds struggle to capture the complex, relational data inherent in these instruments. This paper introduces CATNet, a novel framework that applies a geometric deep learning architecture, the Relational Graph Convolutional Network (R-GCN), to model the CAT bond primary market as a graph, leveraging its underlying network structure for spread prediction. Our analysis reveals that the CAT bond market exhibits the characteristics of a scale-free network, a structure dominated by a few highly connected and influential hubs. CATNet demonstrates high predictive performance, significantly outperforming a strong Random Forest benchmark. The inclusion of topological centrality measures as features provides a further, significant boost in accuracy. Interpretability analysis confirms that these network features are not mere statistical artifacts; they are quantitative proxies for long-held industry intuition regarding issuer reputation, underwriter influence, and peril concentration. This research provides evidence that network connectivity is a key determinant of price, offering a new paradigm for risk assessment and proving that graph-based models can deliver both state-of-the-art accuracy and deeper, quantifiable market insights.
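
A minimal torch_geometric sketch of a relational GCN for node-level spread regression follows; the node features, relation types (e.g. issuer-bond, underwriter-bond, peril-bond edges) and random graph are invented placeholders, not the paper's CAT-bond dataset.

```python
# Sketch: two-layer R-GCN with a linear regression head over node embeddings.
import torch
from torch_geometric.nn import RGCNConv

class SpreadRGCN(torch.nn.Module):
    def __init__(self, in_dim=16, hidden=32, num_relations=3):
        super().__init__()
        self.conv1 = RGCNConv(in_dim, hidden, num_relations)
        self.conv2 = RGCNConv(hidden, hidden, num_relations)
        self.head = torch.nn.Linear(hidden, 1)       # predicted spread per bond node

    def forward(self, x, edge_index, edge_type):
        h = torch.relu(self.conv1(x, edge_index, edge_type))
        h = torch.relu(self.conv2(h, edge_index, edge_type))
        return self.head(h).squeeze(-1)

# Tiny random graph: 10 nodes, 20 typed edges.
x = torch.randn(10, 16)
edge_index = torch.randint(0, 10, (2, 20))
edge_type = torch.randint(0, 3, (20,))
model = SpreadRGCN()
print(model(x, edge_index, edge_type).shape)         # torch.Size([10])
```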


【12】Bayesian Models for Joint Selection of Features and Auto-Regressive Lags: Theory and Applications in Environmental and Financial Forecasting
标题:特征和自回归滞后联合选择的Bayesian模型:环境和金融预测中的理论和应用
链接:https://arxiv.org/abs/2508.10055

作者:anna, Sujit K. Ghosh
摘要:我们为具有自相关误差的线性回归中的变量选择开发了一个贝叶斯框架,可容纳滞后协变量和自回归结构。这类设置出现在响应依赖于同期或过去的解释变量以及持续性随机冲击的时间序列应用中,包括需要刻画时间依赖性的金融建模、水文预测和气象应用。我们的方法使用带有尖峰-平板(spike-and-slab)先验的层次贝叶斯模型,同时选择相关的协变量和滞后误差项。我们提出了一种高效的两阶段MCMC算法,将变量纳入指示变量与模型参数的采样分离开来,以应对高维计算挑战。理论分析在温和条件下建立了后验选择一致性,即使候选预测变量的数量随样本量呈指数增长(这在具有许多潜在滞后变量的现代时间序列中很常见)也成立。通过模拟和实际应用(地下水深度预测、S&P 500对数收益率建模),我们证明了变量选择准确性和预测性能的显著提升。与现有方法相比,我们的框架实现了更低的MSPE、更好的真实模型成分识别以及在自相关噪声下更强的鲁棒性,突显了其在自回归设置中用于模型解释和预测的实用价值。
摘要:We develop a Bayesian framework for variable selection in linear regression with autocorrelated errors, accommodating lagged covariates and autoregressive structures. This setting occurs in time series applications where responses depend on contemporaneous or past explanatory variables and persistent stochastic shocks, including financial modeling, hydrological forecasting, and meteorological applications requiring temporal dependency capture. Our methodology uses hierarchical Bayesian models with spike-and-slab priors to simultaneously select relevant covariates and lagged error terms. We propose an efficient two-stage MCMC algorithm separating sampling of variable inclusion indicators and model parameters to address high-dimensional computational challenges. Theoretical analysis establishes posterior selection consistency under mild conditions, even when candidate predictors grow exponentially with sample size, common in modern time series with many potential lagged variables. Through simulations and real applications (groundwater depth prediction, S&P 500 log returns modeling), we demonstrate substantial gains in variable selection accuracy and predictive performance. Compared to existing methods, our framework achieves lower MSPE, improved true model component identification, and greater robustness with autocorrelated noise, underscoring practical utility for model interpretation and forecasting in autoregressive settings.


其他神经网络|深度学习|模型|建模(18篇)

【1】Empirical Investigation into Configuring Echo State Networks for Representative Benchmark Problem Domains
标题:为代表性基准问题域配置回声状态网络的实证研究
链接:https://arxiv.org/abs/2508.10887

作者: Weborg, Gursel Serpen
备注:49 pages, 21 figures
摘要:本文使用四个不同的基准问题考察回声状态网络(一种储备池计算机)的性能,进而针对架构配置以及参数及其取值的选择提出适用于同一问题域的经验法则,以帮助填补初入该领域的研究者所需的经验差距。对于没有相关经验的人来说,各种参数选择及其取值调整的影响,以及对回声状态网络(一种被配置为储备池计算机的强大递归神经网络)所做的架构改动,可能难以完全理解;甚至一些超参数优化算法,在没有事先进行适当的手动选择的情况下,也可能难以调好参数值。因此,要成功构建模型,就必须了解参数及其取值选择对回声状态网络架构性能的影响。为此,也为了检视回声状态网络性能如何随架构、设计以及参数选择和取值的变化而变化,本文对一系列代表不同问题域的基准任务(包括时间序列预测、模式生成、混沌系统预测和时间序列分类)进行了建模和实验,以展示其对回声状态网络性能的影响。
摘要 :This paper examines Echo State Network, a reservoir computer, performance using four different benchmark problems, then proposes heuristics or rules of thumb for configuring the architecture, as well as the selection of parameters and their values, which are applicable to problems within the same domain, to help serve to fill the experience gap needed by those entering this field of study. The influence of various parameter selections and their value adjustments, as well as architectural changes made to an Echo State Network, a powerful recurrent neural network configured as a reservoir computer, can be challenging to fully comprehend without experience in the field, and even some hyperparameter optimization algorithms may have difficulty adjusting parameter values without proper manual selections made first. Therefore, it is imperative to understand the effects of parameters and their value selection on Echo State Network architecture performance for a successful build. Thus, to address the requirement for an extensive background in Echo State Network architecture, as well as examine how Echo State Network performance is affected with respect to variations in architecture, design, and parameter selection and values, a series of benchmark tasks representing different problem domains, including time series prediction, pattern generation, chaotic system prediction, and time series classification, were modeled and experimented on to show the impact on the performance of Echo State Network.
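
For readers new to the architecture, a compact echo state network with a ridge-regression readout looks roughly like the sketch below; reservoir size, spectral radius and leak rate here are common starting values, not the heuristics recommended in the paper.

```python
# Sketch: minimal leaky echo state network with ridge-regression readout.
import numpy as np

class ESN:
    def __init__(self, n_in, n_res=300, spectral_radius=0.9, leak=0.3,
                 input_scale=1.0, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = input_scale * rng.uniform(-1, 1, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # echo-state scaling
        self.W, self.leak, self.ridge = W, leak, ridge

    def _states(self, U):
        x = np.zeros(self.W.shape[0])
        states = []
        for u in U:
            pre = np.tanh(self.W_in @ u + self.W @ x)
            x = (1 - self.leak) * x + self.leak * pre     # leaky integration
            states.append(x)
        return np.asarray(states)

    def fit(self, U, Y, washout=100):
        S = self._states(U)[washout:]
        Yw = Y[washout:]
        A = S.T @ S + self.ridge * np.eye(S.shape[1])
        self.W_out = np.linalg.solve(A, S.T @ Yw)         # ridge-regression readout
        return self

    def predict(self, U):
        return self._states(U) @ self.W_out

# One-step-ahead prediction of a sine wave.
t = np.linspace(0, 60, 3000)
u, y = np.sin(t)[:-1, None], np.sin(t)[1:, None]
esn = ESN(n_in=1).fit(u[:2000], y[:2000])
pred = esn.predict(u[2000:])
print("test MSE:", float(np.mean((pred - y[2000:]) ** 2)))
```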


【2】SoK: Data Minimization in Machine Learning
标题:SoK:机器学习中的数据最小化
链接:https://arxiv.org/abs/2508.10836

作者:ab, Nikola Jovanović, Kimberly Mai, Prakhar Ganesh, Martin Vechev, Ferdinando Fioretto, Matthew Jagielski
摘要:数据最小化(DM)描述了只收集给定任务严格必要的数据的原则。这是GDPR和CPRA等主要数据保护法规的基本原则。违反这一原则会产生严重的现实后果,监管行动导致的罚款高达数亿美元。值得注意的是,数据最小化的相关性在机器学习(ML)应用中尤为突出,这些应用通常依赖于大型数据集,从而产生了一个称为机器学习数据最小化(DMML)的新兴研究领域。与此同时,其他ML隐私和安全主题的现有工作通常解决与DMML相关的问题,而没有明确承认这种联系。这种脱节导致从业者之间的混乱,使他们实施DM原则和解释不同研究社区使用的术语,指标和评估标准的努力变得复杂。为了解决这一差距,我们的工作为DMML引入了一个全面的框架,包括统一的数据管道,对手和最小化点。这个框架使我们能够系统地回顾文献的数据最小化和DM相邻的方法,第一次提出了一个结构化的概述,旨在帮助从业者和研究人员有效地应用DM的原则。我们的工作促进了对以DM为中心的统一理解,并在AI/ML中更广泛地采用数据最小化策略。
摘要:Data minimization (DM) describes the principle of collecting only the data strictly necessary for a given task. It is a foundational principle across major data protection regulations like GDPR and CPRA. Violations of this principle have substantial real-world consequences, with regulatory actions resulting in fines reaching hundreds of millions of dollars. Notably, the relevance of data minimization is particularly pronounced in machine learning (ML) applications, which typically rely on large datasets, resulting in an emerging research area known as Data Minimization in Machine Learning (DMML). At the same time, existing work on other ML privacy and security topics often addresses concerns relevant to DMML without explicitly acknowledging the connection. This disconnect leads to confusion among practitioners, complicating their efforts to implement DM principles and interpret the terminology, metrics, and evaluation criteria used across different research communities. To address this gap, our work introduces a comprehensive framework for DMML, including a unified data pipeline, adversaries, and points of minimization. This framework allows us to systematically review the literature on data minimization and \emph{DM-adjacent} methodologies, for the first time presenting a structured overview designed to help practitioners and researchers effectively apply DM principles. Our work facilitates a unified DM-centric understanding and broader adoption of data minimization strategies in AI/ML.


【3】Electromagnetic Simulations of Antennas on GPUs for Machine Learning Applications
标题:用于机器学习应用的图形处理器上天线的电磁模拟
链接:https://arxiv.org/abs/2508.10713

作者:iz, Vemund Bakken
备注:20 pages, 10 figures, 4 tables, journal article
摘要:本研究提出了一种基于开源电磁(EM)仿真软件(gprMax)的图形处理单元(GPU)驱动的天线仿真框架,用于天线设计和优化的机器学习应用。此外,它与通过商业EM软件获得的模拟结果进行了比较。所提出的机器学习和代理模型应用程序的软件框架将使用GPU产生由大量天线仿真结果组成的天线数据集。虽然机器学习方法可以获得许多问题的最佳解决方案,但众所周知,它们需要大量的数据,并且需要大量的样本用于算法的训练阶段。然而,由于EM模拟的高计算复杂性,在有限的时间内在EM应用中产生足够数量的训练样本是具有挑战性的。因此,在这项研究中,利用GPU来模拟大量的天线与预定义的或随机的天线形状参数,以产生数据集。此外,本研究还比较了各种机器学习和深度学习模型在天线参数估计性能方面的差异。这项研究表明,入门级GPU在计算性能方面远远优于高端CPU,而高端游戏GPU的计算性能是高端CPU的18倍左右。此外,它表明,开源的EM仿真软件可以提供类似的结果,通过商业软件在微带天线的仿真时,空间分辨率的模拟是足够精细。
摘要:This study proposes an antenna simulation framework powered by graphics processing units (GPUs) based on an open-source electromagnetic (EM) simulation software (gprMax) for machine learning applications of antenna design and optimization. Furthermore, it compares the simulation results with those obtained through commercial EM software. The proposed software framework for machine learning and surrogate model applications will produce antenna data sets consisting of a large number of antenna simulation results using GPUs. Although machine learning methods can attain the optimum solutions for many problems, they are known to be data-hungry and require a great deal of samples for the training stage of the algorithms. However, producing a sufficient number of training samples in EM applications within a limited time is challenging due to the high computational complexity of EM simulations. Therefore, GPUs are utilized in this study to simulate a large number of antennas with predefined or random antenna shape parameters to produce data sets. Moreover, this study also compares various machine learning and deep learning models in terms of antenna parameter estimation performance. This study demonstrates that an entry-level GPU substantially outperforms a high-end CPU in terms of computational performance, while a high-end gaming GPU can achieve around 18 times more computational performance compared to a high-end CPU. Moreover, it is shown that the open-source EM simulation software can deliver similar results to those obtained via commercial software in the simulation of microstrip antennas when the spatial resolution of the simulations is sufficiently fine.


【4】GNN-based Unified Deep Learning
标题:基于GNN的统一深度学习
链接:https://arxiv.org/abs/2508.10583

作者:la, Islem Rekik
摘要:深度学习模型通常难以在医学成像中保持普遍性,特别是在域断裂场景下,其中分布变化来自不同的成像技术,采集协议,患者人群,人口统计学和设备。在实践中,每个医院可能需要训练不同的模型-学习任务、宽度和深度不同-以匹配本地数据。例如,一家医院可能使用MLP和CNN等欧几里得架构来处理表格或网格状图像数据,而另一家医院可能需要非欧几里得架构,如图形神经网络(GNN)来处理大脑连接体等不规则数据。如何跨数据集连贯地训练这些异构模型,同时增强每个模型的泛化能力,仍然是一个悬而未决的问题。我们提出了统一学习,这是一种新的范式,将每个模型编码为图表示,从而在共享的图学习空间中实现统一。然后,GNN指导这些统一模型的优化。通过解耦各个模型的参数并通过统一的GNN(uGNN)控制它们,我们的方法支持不同架构(MLP,CNN,GNN)和分布的参数共享和知识转移,提高了泛化能力。对MorphoMNIST和两个MedMNIST基准测试(MononiaMNIST和BreastMNIST)的评估表明,当模型在独特的分布上进行训练并在混合分布上进行测试时,统一学习可以提高性能,对具有较大分布变化的不可见数据表现出强大的鲁棒性。代码和基准测试:https://github.com/basiralab/uGNN
摘要 :Deep learning models often struggle to maintain generalizability in medical imaging, particularly under domain-fracture scenarios where distribution shifts arise from varying imaging techniques, acquisition protocols, patient populations, demographics, and equipment. In practice, each hospital may need to train distinct models - differing in learning task, width, and depth - to match local data. For example, one hospital may use Euclidean architectures such as MLPs and CNNs for tabular or grid-like image data, while another may require non-Euclidean architectures such as graph neural networks (GNNs) for irregular data like brain connectomes. How to train such heterogeneous models coherently across datasets, while enhancing each model's generalizability, remains an open problem. We propose unified learning, a new paradigm that encodes each model into a graph representation, enabling unification in a shared graph learning space. A GNN then guides optimization of these unified models. By decoupling parameters of individual models and controlling them through a unified GNN (uGNN), our method supports parameter sharing and knowledge transfer across varying architectures (MLPs, CNNs, GNNs) and distributions, improving generalizability. Evaluations on MorphoMNIST and two MedMNIST benchmarks - PneumoniaMNIST and BreastMNIST - show that unified learning boosts performance when models are trained on unique distributions and tested on mixed ones, demonstrating strong robustness to unseen data with large distribution shifts. Code and benchmarks: https://github.com/basiralab/uGNN


【5】Pinet: Optimizing hard-constrained neural networks with orthogonal projection layers
标题:Pinet:使用垂直投影层优化硬约束神经网络
链接:https://arxiv.org/abs/2508.10480

作者:s D. Grontas, Antonio Terpin, Efe C. Balta, Raffaello D'Andrea, John Lygeros
摘要:我们为神经网络引入了一个输出层,以确保满足凸约束。我们的方法,$\Pi$net,利用算子分裂在向前传递中进行快速可靠的预测,并利用隐函数定理进行反向传播。我们部署$\Pi$net作为参数约束优化问题的可行设计优化代理,在解决单个问题时比传统求解器更快地获得中等精度的解决方案,并且对于一批问题来说速度更快。我们在训练时间、解决方案质量和对超参数调整的鲁棒性方面超越了最先进的学习方法,同时保持了相似的推理时间。最后,我们解决了多车辆运动规划与非凸轨迹偏好,并提供$\Pi$net作为一个GPU就绪的包,在JAX中实现有效的调整算法。
摘要:We introduce an output layer for neural networks that ensures satisfaction of convex constraints. Our approach, $\Pi$net, leverages operator splitting for rapid and reliable projections in the forward pass, and the implicit function theorem for backpropagation. We deploy $\Pi$net as a feasible-by-design optimization proxy for parametric constrained optimization problems and obtain modest-accuracy solutions faster than traditional solvers when solving a single problem, and significantly faster for a batch of problems. We surpass state-of-the-art learning approaches in terms of training time, solution quality, and robustness to hyperparameter tuning, while maintaining similar inference times. Finally, we tackle multi-vehicle motion planning with non-convex trajectory preferences and provide $\Pi$net as a GPU-ready package implemented in JAX with effective tuning heuristics.


【6】SingleStrip: learning skull-stripping from a single labeled example
标题:SingleStrip:从单个已标记的示例中学习头骨剥制
链接:https://arxiv.org/abs/2508.10464

作者:cktor-Fadida, Malte Hoffmann
备注:Accepted as an oral presentation to the MICCAI 2025 Data Engineering in Medical Imaging (DEMI) workshop
摘要:深度学习分割在很大程度上依赖于标注数据,但手动标注既费力又耗时,尤其是对于脑磁共振成像(MRI)等体积图像。虽然最近的域随机化技术通过从标签图合成多样的训练图像来减轻对标注数据的依赖,但当可用的标签图很少时,它们提供的解剖变异性有限。半监督自训练通过迭代地将模型预测纳入训练集来解决标签稀缺问题,使网络能够从未标注数据中学习。在这项工作中,我们将域随机化与自训练相结合,仅使用一个标注样例来训练三维skull-stripping(剥颅)网络。首先,我们自动对体素强度进行分箱,得到的标签用于合成图像,以训练初始的剥颅模型。其次,我们在该标注样例上训练卷积自动编码器(AE),并使用其重建误差来评估为未标注数据预测的大脑掩模的质量。第三,我们选择排名靠前的伪标签来微调网络,在分布外数据上取得的skull-stripping性能接近用更多标注图像训练的模型。我们将基于AE的排序与测试时增强下基于一致性的排序进行比较,发现AE方法与分割精度的相关性更强。我们的结果凸显了将域随机化与基于AE的质量控制相结合、从极其有限的标注数据中实现有效半监督分割的潜力。这一策略有望减轻在涉及新解剖结构或新兴成像技术的研究中拖慢进展的标注负担。
摘要:Deep learning segmentation relies heavily on labeled data, but manual labeling is laborious and time-consuming, especially for volumetric images such as brain magnetic resonance imaging (MRI). While recent domain-randomization techniques alleviate the dependency on labeled data by synthesizing diverse training images from label maps, they offer limited anatomical variability when very few label maps are available. Semi-supervised self-training addresses label scarcity by iteratively incorporating model predictions into the training set, enabling networks to learn from unlabeled data. In this work, we combine domain randomization with self-training to train three-dimensional skull-stripping networks using as little as a single labeled example. First, we automatically bin voxel intensities, yielding labels we use to synthesize images for training an initial skull-stripping model. Second, we train a convolutional autoencoder (AE) on the labeled example and use its reconstruction error to assess the quality of brain masks predicted for unlabeled data. Third, we select the top-ranking pseudo-labels to fine-tune the network, achieving skull-stripping performance on out-of-distribution data that approaches models trained with more labeled images. We compare AE-based ranking to consistency-based ranking under test-time augmentation, finding that the AE approach yields a stronger correlation with segmentation accuracy. Our results highlight the potential of combining domain randomization and AE-based quality control to enable effective semi-supervised segmentation from extremely limited labeled data. This strategy may ease the labeling burden that slows progress in studies involving new anatomical structures or emerging imaging techniques.
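
The AE-based ranking step can be sketched as follows: score each predicted brain mask by the reconstruction error of an autoencoder trained on the labelled example, and keep the best-ranked fraction as pseudo-labels for fine-tuning. The 2-D architecture and sizes below are simplifications of the paper's 3-D setting.

```python
# Sketch: rank pseudo-label masks by autoencoder reconstruction error.
import torch

class MaskAE(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = torch.nn.Sequential(
            torch.nn.Conv2d(1, 8, 3, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(8, 16, 3, stride=2, padding=1), torch.nn.ReLU())
        self.dec = torch.nn.Sequential(
            torch.nn.ConvTranspose2d(16, 8, 2, stride=2), torch.nn.ReLU(),
            torch.nn.ConvTranspose2d(8, 1, 2, stride=2), torch.nn.Sigmoid())

    def forward(self, m):
        return self.dec(self.enc(m))

def rank_pseudo_labels(ae, masks, keep=0.5):
    """masks: (N, 1, H, W) predicted binary masks. Returns indices of the
    fraction `keep` with the lowest AE reconstruction error."""
    with torch.no_grad():
        errors = ((ae(masks) - masks) ** 2).mean(dim=(1, 2, 3))
    order = torch.argsort(errors)
    return order[: int(keep * len(masks))]

ae = MaskAE()                                     # assume already trained on the labelled mask
masks = (torch.rand(20, 1, 64, 64) > 0.5).float()
print(rank_pseudo_labels(ae, masks, keep=0.25))
```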


【7】Alternating Approach-Putt Models for Multi-Stage Speech Enhancement
标题:交替逼近-Putt模型的多级语音增强
链接:https://arxiv.org/abs/2508.10436

作者:ong, Kyung-Joong Kim, Kang-Hun Ahn
备注:This work has been submitted to the IEEE for possible publication
摘要:使用人工神经网络的语音增强旨在从含噪语音信号中去除噪声,同时保留语音内容。然而,语音增强网络经常向语音信号引入失真,称为伪像,这会降低音频质量。在这项工作中,我们提出了一个后处理神经网络,旨在减轻语音增强模型引入的文物。灵感来自于在高尔夫中的“接近”之后进行“推杆”的类比,我们将我们的模型命名为PuttNet。我们证明,交替之间的语音增强模型和Putt模型,导致改善语音质量,感知质量分数(PESQ),客观清晰度(STOI),和背景噪声侵入(CBAK)分数。此外,我们用图形分析说明了为什么这种交替的方法优于单独使用任何一种模型的重复应用。
摘要:Speech enhancement using artificial neural networks aims to remove noise from noisy speech signals while preserving the speech content. However, speech enhancement networks often introduce distortions to the speech signal, referred to as artifacts, which can degrade audio quality. In this work, we propose a post-processing neural network designed to mitigate artifacts introduced by speech enhancement models. Inspired by the analogy of making a `Putt' after an `Approach' in golf, we name our model PuttNet. We demonstrate that alternating between a speech enhancement model and the proposed Putt model leads to improved speech quality, as measured by perceptual quality scores (PESQ), objective intelligibility (STOI), and background noise intrusiveness (CBAK) scores. Furthermore, we illustrate with graphical analysis why this alternating Approach outperforms repeated application of either model alone.


【8】Unpacking the Implicit Norm Dynamics of Sharpness-Aware Minimization in Tensorized Models
标题:张量化模型中敏锐度意识最小化的隐范动力学解包
链接:https://arxiv.org/abs/2508.10435

作者:Cao, Kyohei Atarashi, Hisashi Kashima
摘要:Sharpness-Aware Minimization(SAM)已被证明是一种能提高过参数化模型泛化能力的有效优化技术。虽然先前的工作已经在简单的双核心尺度不变设置中探索了SAM的隐式正则化,但其在更一般的张量化或尺度不变模型中的行为仍未得到充分研究。在这项工作中,我们利用尺度不变性来分析一般张量化模型中SAM的范数动力学。我们引入范数偏差(Norm Deviation)的概念作为核心范数不平衡的全局度量,并使用梯度流分析推导其在SAM下的演化。我们表明,SAM对范数偏差的隐式控制由核心范数与其梯度幅值之间的协方差决定。受这些发现的启发,我们提出了一种简单而有效的方法,即偏差感知缩放(Deviation-Aware Scaling, DAS),它通过以数据自适应的方式缩放核心范数来显式地模仿这种正则化行为。我们在张量补全、含噪训练、模型压缩和参数高效微调上的实验证实,DAS取得了与SAM相当或更优的性能,同时降低了计算开销。
摘要 :Sharpness-Aware Minimization (SAM) has been proven to be an effective optimization technique for improving generalization in overparameterized models. While prior works have explored the implicit regularization of SAM in simple two-core scale-invariant settings, its behavior in more general tensorized or scale-invariant models remains underexplored. In this work, we leverage scale-invariance to analyze the norm dynamics of SAM in general tensorized models. We introduce the notion of \emph{Norm Deviation} as a global measure of core norm imbalance, and derive its evolution under SAM using gradient flow analysis. We show that SAM's implicit control of Norm Deviation is governed by the covariance between core norms and their gradient magnitudes. Motivated by these findings, we propose a simple yet effective method, \emph{Deviation-Aware Scaling (DAS)}, which explicitly mimics this regularization behavior by scaling core norms in a data-adaptive manner. Our experiments across tensor completion, noisy training, model compression, and parameter-efficient fine-tuning confirm that DAS achieves competitive or improved performance over SAM, while offering reduced computational overhead.
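范数偏差与DAS的精确公式摘要中并未给出,下面的草图只按摘要文字给出一个假设性的实现:用核心范数对均值的偏离衡量不平衡,并把各核心范数向均值方向缩放(真实方法应以数据自适应方式确定缩放量):

```python
import torch

def norm_deviation(cores):
    # 假设性的"范数偏差"度量: 各核心范数对均值偏离的平方和(论文的精确定义可能不同)
    norms = torch.stack([c.norm() for c in cores])
    return ((norms - norms.mean()) ** 2).sum()

def das_step(cores, lr=0.5):
    # 示意性的 Deviation-Aware Scaling: 把每个核心的范数向均值方向缩放
    norms = torch.stack([c.norm() for c in cores])
    target = norms.mean()
    return [c * (1 - lr * (n - target) / (n + 1e-8)) for c, n in zip(cores, norms)]

cores = [torch.randn(4, 4) * s for s in (0.5, 1.0, 2.0)]  # 三个范数不平衡的"核心"
print(norm_deviation(cores).item())
print(norm_deviation(das_step(cores)).item())  # 缩放一步后, 范数偏差应减小
```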


【9】A Unified Evaluation Framework for Multi-Annotator Tendency Learning
标题:多注释者倾向学习的统一评估框架
链接:https://arxiv.org/abs/2508.10393

作者:ng, Jingcheng Ke, Shenli Fan, Xuanmeng Sha, Zheng Lian
备注:9 pages
摘要:多标注者学习领域最近的工作正将焦点从把多个标注聚合为单一真实标签预测的面向共识的学习(Consensus-oriented Learning, CoL),转移到对标注者特有的标注行为模式(即倾向)进行建模的个体倾向学习(Individual Tendency Learning, ITL),以提供解释性分析来理解标注者的决策。然而,目前还没有评估框架来检验ITL方法是否真正捕捉到了个体倾向并给出了有意义的行为解释。为了解决这一空白,我们提出了首个统一评估框架,包含两个新指标:(1)标注者间一致性差异(DIC),通过比较预测的标注者间相似性结构与真实结构,量化模型捕获标注者倾向的程度;(2)行为对齐可解释性(BAE),通过多维标度(MDS)将由可解释性导出的标注相似性结构与真实标注相似性结构对齐,评估模型解释在多大程度上反映了标注者行为和决策相关性。大量实验验证了所提评估框架的有效性。
摘要:Recent works have emerged in multi-annotator learning that shift focus from Consensus-oriented Learning (CoL), which aggregates multiple annotations into a single ground-truth prediction, to Individual Tendency Learning (ITL), which models annotator-specific labeling behavior patterns (i.e., tendency) to provide explanation analysis for understanding annotator decisions. However, no evaluation framework currently exists to assess whether ITL methods truly capture individual tendencies and provide meaningful behavioral explanations. To address this gap, we propose the first unified evaluation framework with two novel metrics: (1) Difference of Inter-annotator Consistency (DIC) quantifies how well models capture annotator tendencies by comparing predicted inter-annotator similarity structures with ground-truth; (2) Behavior Alignment Explainability (BAE) evaluates how well model explanations reflect annotator behavior and decision relevance by aligning explainability-derived with ground-truth labeling similarity structures via Multidimensional Scaling (MDS). Extensive experiments validate the effectiveness of our proposed evaluation framework.
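DIC的具体计算方式以论文为准,下面给出一个假设性的最小实现:用两两标注一致率构造标注者间相似性结构,再取预测结构与真实结构之差作为度量:

```python
import numpy as np

def inter_annotator_similarity(labels):
    # labels: (标注者数, 样本数) 的标注矩阵; 相似度用两两标注一致率衡量
    A = labels.shape[0]
    S = np.eye(A)
    for i in range(A):
        for j in range(i + 1, A):
            S[i, j] = S[j, i] = (labels[i] == labels[j]).mean()
    return S

def dic(pred_labels, true_labels):
    # 一个假设性的 DIC: 预测与真实的标注者间相似性结构之差的平均绝对值
    return np.abs(inter_annotator_similarity(pred_labels)
                  - inter_annotator_similarity(true_labels)).mean()

rng = np.random.default_rng(0)
true = rng.integers(0, 3, size=(4, 50))
pred = np.where(rng.random((4, 50)) < 0.8, true, rng.integers(0, 3, size=(4, 50)))
print(round(dic(pred, true), 3))
```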


【10】eMamba: Efficient Acceleration Framework for Mamba Models in Edge Computing
标题:eMamba:边缘计算中Mamba模型的高效加速框架
链接:https://arxiv.org/abs/2508.10370

作者:m, Jaeho Lee, Jiahao Lin, Alish Kanani, Miao Sun, Umit Y. Ogras, Jaehyun Park
备注:Paper accepted at ESWEEK 2025 (CODES+ISSS) conference
摘要:基于状态空间模型(SSM)的机器学习架构最近在处理序列数据方面获得了极大的关注。Mamba是一种最新的序列到序列SSM,与最先进的Transformer模型相比,它提供了具有竞争力的准确性和卓越的计算效率。虽然这一优势使Mamba特别适合资源受限的边缘设备,但目前还没有硬件加速框架针对在此类环境中部署它进行优化。本文介绍了eMamba,这是一个全面的端到端硬件加速框架,专为在边缘平台上部署Mamba模型而设计。eMamba通过将复杂的归一化层替换为轻量级的硬件感知替代方案,并考虑目标应用,近似昂贵的操作(如SiLU激活和取幂),从而最大限度地提高计算效率。然后,它执行近似感知神经架构搜索(NAS)来调整近似过程中使用的可学习参数。使用Fashion-MNIST、CIFAR-10和MARS(一种开源人体姿势估计数据集)进行的评估显示,eMamba使用1.63- 19.9倍的参数实现了与最先进技术相当的准确性。此外,它可以很好地推广到大规模的自然语言任务,在WikiText 2数据集上的不同序列长度上表现出稳定的困惑。我们还使用GlobalFoundries(GF)22纳米技术在AMD ZCU 102 FPGA和ASIC上量化和实现整个eMamba流水线。实验结果表明,与基准解决方案相比,延迟降低了4.95- 5.62倍,吞吐量提高了2.22- 9.95倍,面积减小了4.77倍,功耗降低了9.84倍,能耗降低了48.6倍,同时保持了具有竞争力的准确性。
摘要:State Space Model (SSM)-based machine learning architectures have recently gained significant attention for processing sequential data. Mamba, a recent sequence-to-sequence SSM, offers competitive accuracy with superior computational efficiency compared to state-of-the-art transformer models. While this advantage makes Mamba particularly promising for resource-constrained edge devices, no hardware acceleration frameworks are currently optimized for deploying it in such environments. This paper presents eMamba, a comprehensive end-to-end hardware acceleration framework explicitly designed for deploying Mamba models on edge platforms. eMamba maximizes computational efficiency by replacing complex normalization layers with lightweight hardware-aware alternatives and approximating expensive operations, such as SiLU activation and exponentiation, considering the target applications. Then, it performs an approximation-aware neural architecture search (NAS) to tune the learnable parameters used during approximation. Evaluations with Fashion-MNIST, CIFAR-10, and MARS, an open-source human pose estimation dataset, show eMamba achieves comparable accuracy to state-of-the-art techniques using 1.63-19.9$\times$ fewer parameters. In addition, it generalizes well to large-scale natural language tasks, demonstrating stable perplexity across varying sequence lengths on the WikiText2 dataset. We also quantize and implement the entire eMamba pipeline on an AMD ZCU102 FPGA and ASIC using GlobalFoundries (GF) 22 nm technology. Experimental results show 4.95-5.62$\times$ lower latency and 2.22-9.95$\times$ higher throughput, with 4.77$\times$ smaller area, 9.84$\times$ lower power, and 48.6$\times$ lower energy consumption than baseline solutions while maintaining competitive accuracy.
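摘要提到用硬件友好的方式近似SiLU等昂贵算子;下面是一个与论文实现无关的示意(分段线性查找表近似,区间与分段数均为假设):

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def silu_pwl(x, lo=-4.0, hi=4.0, pieces=8):
    # 在 [lo, hi] 上用查找表 + 线性插值近似 SiLU, 区间外取渐近值(0 或 x)
    knots = np.linspace(lo, hi, pieces + 1)
    y = np.interp(x, knots, silu(knots))
    y = np.where(x < lo, 0.0, y)
    y = np.where(x > hi, x, y)
    return y

x = np.linspace(-6, 6, 7)
print(np.round(silu(x), 3))
print(np.round(silu_pwl(x), 3))
```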


【11】Semantic Communication with Distribution Learning through Sequential Observations
标题:通过顺序观察进行分布学习的语义沟通
链接:https://arxiv.org/abs/2508.10350

作者:oud, Kinda Khawam
摘要:语义通信的目的是传达意义而非比特级的完美复制,代表了对传统通信范式的转变。本文研究语义通信中的分布学习,其中接收端必须通过连续观测来推断潜在的意义分布。虽然语义通信传统上优化单次意义的传输,我们建立了在先验未知时学习信源统计特性的基本条件。我们证明了可学习性要求有效传输矩阵满秩,刻画了分布估计的收敛速度,并量化了估计误差如何转化为语义失真。我们的分析揭示了一个基本权衡:为即时语义性能而优化的编码方案往往会牺牲长期的可学习性。在CIFAR-10上的实验验证了我们的理论框架,表明系统的条件性(conditioning)对学习速率和可达性能都有重要影响。这些结果首次严格刻画了语义通信中的统计学习,并为在即时性能与适应能力之间取得平衡的系统提供了设计原则。
摘要:Semantic communication aims to convey meaning rather than bit-perfect reproduction, representing a paradigm shift from traditional communication. This paper investigates distribution learning in semantic communication where receivers must infer the underlying meaning distribution through sequential observations. While semantic communication traditionally optimizes individual meaning transmission, we establish fundamental conditions for learning source statistics when priors are unknown. We prove that learnability requires full rank of the effective transmission matrix, characterize the convergence rate of distribution estimation, and quantify how estimation errors translate to semantic distortion. Our analysis reveals a fundamental trade-off: encoding schemes optimized for immediate semantic performance often sacrifice long-term learnability. Experiments on CIFAR-10 validate our theoretical framework, demonstrating that system conditioning critically impacts both learning rate and achievable performance. These results provide the first rigorous characterization of statistical learning in semantic communication and offer design principles for systems that balance immediate performance with adaptation capability.
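下面的小例子示意摘要中的核心条件:当有效传输矩阵M满秩时,接收端可以仅凭顺序观测的经验频率反推出意义分布(矩阵、维度与样本量均为虚构):

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4
M = rng.dirichlet(np.ones(K), size=K).T     # 列随机的有效传输矩阵(这里假设接收端已知)
p_true = rng.dirichlet(np.ones(K))          # 未知的意义分布
q = M @ p_true
q = q / q.sum()
obs = rng.choice(K, size=5000, p=q)         # 接收端的顺序观测
q_hat = np.bincount(obs, minlength=K) / obs.size

# M 满秩时可由 q ≈ M p 反解出 p 的估计, 再投影回概率单纯形
p_hat = np.linalg.solve(M, q_hat)
p_hat = np.clip(p_hat, 0, None)
p_hat = p_hat / p_hat.sum()
print(np.round(p_true, 3))
print(np.round(p_hat, 3))
```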


【12】Concepts or Skills? Rethinking Instruction Selection for Multi-modal Models
标题:概念还是技能?重新思考多模式模型的教学选择
链接:https://arxiv.org/abs/2508.10339

作者:i, Justin Cui, Ruochen Wang, Cho-Jui Hsieh
备注:11 pages, 1 figure
摘要:视觉-语言指令微调主要实现两个目的:学习视觉概念和学习视觉技能。在本文中,我们发现视觉-语言基准呈现出一种二分现象:它们主要受益于在具有相似技能或相似视觉概念的指令上进行训练。受此发现启发,我们设计了一种简单的有针对性的训练数据选择方法,以优化给定基准上的性能。我们首先从基准中提取概念/技能,判断该基准主要受益于相似的概念还是相似的技能,最后选择概念/技能最匹配的指令。在10多个基准上的实验验证了这种针对性数据选择方法的有效性:在所有基准的平均值上比最佳现有基线高出0.9%,在以技能为中心的子集上高出1.5%。我们的结果强调了认识指令选择中固有权衡的重要性:需要在概念知识的获取与视觉技能的习得之间取得平衡。
摘要:Vision-language instruction tuning achieves two main purposes: learning visual concepts and learning visual skills. In this paper, we found that vision-language benchmarks fall into the dichotomy of mainly benefiting from training on instructions with similar skills or visual concepts. Inspired by the discovery, we designed a simple targeted training data selection method to optimize the performance of a given benchmark. We first extract the concepts/skills from the benchmark, determine whether the benchmark predominantly benefits from similar concepts or skills, and finally select instructions with the most matching concepts/skills. Experiments on 10+ benchmarks validate the effectiveness of our targeted data selection method, showing +0.9\% over the best existing baseline averaged over all benchmarks and +1.5\% on the skill-focused subset. Our findings underscore the importance of recognizing the inherent trade-off within instruction selection, which requires balancing the acquisition of conceptual knowledge against visual skill.
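下面用集合重叠给出"按概念/技能匹配度挑选指令"的一个极简示意(真实方法中概念/技能的提取与匹配方式以论文为准,示例中的标签均为虚构):

```python
def select_instructions(benchmark_tags, pool, top_k=2):
    # 按概念/技能标签与基准的重叠数排序, 取前 top_k 条指令
    def score(item):
        return len(benchmark_tags & item["tags"])
    return sorted(pool, key=score, reverse=True)[:top_k]

bench = {"counting", "chart"}                      # 从某个基准提取出的概念/技能(虚构)
pool = [
    {"text": "How many bars exceed 10?", "tags": {"counting", "chart"}},
    {"text": "Describe the scene.", "tags": {"caption"}},
    {"text": "What color is the cat?", "tags": {"color", "object"}},
]
for it in select_instructions(bench, pool):
    print(it["text"])
```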


【13】xRFM: Accurate, scalable, and interpretable feature learning models for tabular data
标题:xRFM:针对表格数据的准确、可扩展且可解释的特征学习模型
链接:https://arxiv.org/abs/2508.10053

作者:aglehole, David Holzmüller, Adityanarayanan Radhakrishnan, Mikhail Belkin
摘要:基于表格数据(即组织成矩阵的连续变量与类别变量集合)的推断是现代技术和科学的基础。然而,与人工智能其他领域的爆炸性变化相比,这类预测任务的最佳实践相对没有变化,仍然主要基于梯度提升决策树(GBDT)的各种变体。最近,借助神经网络和特征学习方法的最新进展,人们对开发面向表格数据的最先进方法重新产生了兴趣。在这项工作中,我们引入了xRFM,这是一种将特征学习核机器与树结构相结合的算法,既能适应数据的局部结构,又能扩展到几乎无限量的训练数据。我们表明,与包括最近推出的表格基础模型(TabPFNv2)和GBDT在内的31种其他方法相比,xRFM在100个回归数据集上取得了最佳性能,并在200个分类数据集上与最佳方法相当且优于GBDT。此外,xRFM通过平均梯度外积(Average Gradient Outer Product)原生提供可解释性。
摘要:Inference from tabular data, collections of continuous and categorical variables organized into matrices, is a foundation for modern technology and science. Yet, in contrast to the explosive changes in the rest of AI, the best practice for these predictive tasks has been relatively unchanged and is still primarily based on variations of Gradient Boosted Decision Trees (GBDTs). Very recently, there has been renewed interest in developing state-of-the-art methods for tabular data based on recent developments in neural networks and feature learning methods. In this work, we introduce xRFM, an algorithm that combines feature learning kernel machines with a tree structure to both adapt to the local structure of the data and scale to essentially unlimited amounts of training data.   We show that compared to $31$ other methods, including recently introduced tabular foundation models (TabPFNv2) and GBDTs, xRFM achieves best performance across $100$ regression datasets and is competitive to the best methods across $200$ classification datasets outperforming GBDTs. Additionally, xRFM provides interpretability natively through the Average Gradient Outer Product.
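xRFM通过平均梯度外积(AGOP)提供可解释性;下面用有限差分给出AGOP的一个通用小实现(示例函数为虚构,仅演示对角元如何反映特征重要性):

```python
import numpy as np

def agop(predict, X, eps=1e-4):
    # 平均梯度外积 M = (1/n) * sum_i grad f(x_i) grad f(x_i)^T, 梯度用中心差分估计
    n, d = X.shape
    M = np.zeros((d, d))
    for x in X:
        g = np.array([(predict(x + eps * np.eye(d)[j]) - predict(x - eps * np.eye(d)[j]))
                      / (2 * eps) for j in range(d)])
        M += np.outer(g, g)
    return M / n

f = lambda x: np.sin(3 * x[0]) + 0.1 * x[1]   # 只有第 0 维真正重要的示例函数(虚构)
X = np.random.default_rng(0).normal(size=(200, 2))
print(np.round(np.diag(agop(f, X)), 3))       # 对角元大小反映各特征的重要性
```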


【14】Personalized Product Search Ranking: A Multi-Task Learning Approach with Tabular and Non-Tabular Data
标题:个性化产品搜索排名:具有表格和非表格数据的多任务学习方法
链接:https://arxiv.org/abs/2508.09636

作者:Morishetti, Abhay Kumar, Jonathan Scott, Kaushiki Nag, Gunjan Sharma, Shanu Vashishtha, Rahul Sridhar, Rohit Chatter, Kannan Achan
备注:17 pages, 2 figures, The Pacific Rim International Conference on Artificial Intelligence (PRICAI-2025) Conference
摘要:在本文中,我们提出了一种新的模型架构,优化个性化的产品搜索排名使用多任务学习(MTL)框架。我们的方法独特地集成了表格和非表格数据,利用预先训练的TinyBERT模型进行语义嵌入,并采用新颖的采样技术来捕获不同的客户行为。我们根据几个基线评估我们的模型,包括XGBoost,TabNet,FT-Transformer,DCN-V2和MMoE,重点关注它们处理混合数据类型和优化个性化排名的能力。此外,我们提出了一个可扩展的相关性标记机制的基础上点击率,点击位置和语义相似性,提供了一种替代传统的人类注释的标签。实验结果表明,在多任务学习范式中,将非表格数据与先进的嵌入技术相结合,显著提高了模型的性能。消融研究进一步强调了合并相关性标签、微调TinyBERT层和TinyBERT查询-产品嵌入交互的好处。这些结果表明,我们的方法在实现改进的个性化产品搜索排名的有效性。
摘要:In this paper, we present a novel model architecture for optimizing personalized product search ranking using a multi-task learning (MTL) framework. Our approach uniquely integrates tabular and non-tabular data, leveraging a pre-trained TinyBERT model for semantic embeddings and a novel sampling technique to capture diverse customer behaviors. We evaluate our model against several baselines, including XGBoost, TabNet, FT-Transformer, DCN-V2, and MMoE, focusing on their ability to handle mixed data types and optimize personalized ranking. Additionally, we propose a scalable relevance labeling mechanism based on click-through rates, click positions, and semantic similarity, offering an alternative to traditional human-annotated labels. Experimental results show that combining non-tabular data with advanced embedding techniques in multi-task learning paradigm significantly enhances model performance. Ablation studies further underscore the benefits of incorporating relevance labels, fine-tuning TinyBERT layers, and TinyBERT query-product embedding interactions. These results demonstrate the effectiveness of our approach in achieving improved personalized product search ranking.
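摘要提出的相关性标注机制结合点击率、点击位置与语义相似度;下面是一个假设性的组合方式(权重、位置归一化与等级数均非论文设定):

```python
def relevance_label(ctr, click_pos, sem_sim, max_pos=20, w=(0.5, 0.2, 0.3), n_grades=4):
    # 点击位置越靠前得分越高; 三个信号加权求和后离散成 n_grades 个相关性等级
    pos_score = 1.0 - min(click_pos, max_pos) / max_pos
    s = w[0] * ctr + w[1] * pos_score + w[2] * sem_sim
    return min(int(s * n_grades), n_grades - 1)

print(relevance_label(ctr=0.35, click_pos=2, sem_sim=0.8))   # 强信号 -> 高等级
print(relevance_label(ctr=0.02, click_pos=18, sem_sim=0.3))  # 弱信号 -> 低等级
```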


【15】Memorisation and forgetting in a learning Hopfield neural network: bifurcation mechanisms, attractors and basins
标题:学习型Hopfield神经网络中的记忆与遗忘:分岔机制、吸引子与吸引盆
链接:https://arxiv.org/abs/2508.10765

作者:ssex (1), Natalia B. Janson (1), Rachel A. Norris (1), Alexander G. Balanov (1) ((1) Loughborough University, England)
备注:19 pages, 14 figures. The following article has been submitted to `Chaos: An Interdisciplinary Journal of Nonlinear Science'. After it is published, it will be found at this https URL
摘要:尽管基于人工神经网络(ANN)的人工智能呈爆炸性增长,但它们仍被当作"黑箱"使用,因为目前尚不清楚它们在学习过程中如何形成记忆或发展出不想要的特征,包括虚假记忆和灾难性遗忘。关于学习型ANN的孤立侧面已有很多研究,但由于其高维性和非线性,对它们的全面分析仍然是一个挑战。在ANN中,知识被认为驻留在连接权重或吸引盆中,但这两种范式并没有被明确联系起来。在这里,我们通过揭示导致吸引子及其吸引盆边界形成与破坏的分岔,全面分析了一个进行Hebb学习的81神经元Hopfield网络中的记忆形成机制。我们表明,施加的刺激通过影响连接权重的演化,先诱发叉形分岔,随后引发一连串鞍结分岔,产生带有吸引盆的新吸引子(可编码真实或虚假记忆),并使旧记忆突然消失(灾难性遗忘)。在学习成功的情况下,新类别由新生点吸引子的吸引盆表示,其边界由新鞍点的稳定流形表示。据此,记忆与遗忘是同一机制的两种表现。我们分析高维学习型ANN的策略是通用的,适用于任何形式的递归ANN。所展示的记忆形成与灾难性遗忘机制有助于理解更广泛一类递归ANN的运行方式,并可帮助开发缓解其缺陷的方法。
摘要:Despite explosive expansion of artificial intelligence based on artificial neural networks (ANNs), these are employed as "black boxes'', as it is unclear how, during learning, they form memories or develop unwanted features, including spurious memories and catastrophic forgetting. Much research is available on isolated aspects of learning ANNs, but due to their high dimensionality and non-linearity, their comprehensive analysis remains a challenge. In ANNs, knowledge is thought to reside in connection weights or in attractor basins, but these two paradigms are not linked explicitly. Here we comprehensively analyse mechanisms of memory formation in an 81-neuron Hopfield network undergoing Hebbian learning by revealing bifurcations leading to formation and destruction of attractors and their basin boundaries. We show that, by affecting evolution of connection weights, the applied stimuli induce a pitchfork and then a cascade of saddle-node bifurcations creating new attractors with their basins that can code true or spurious memories, and an abrupt disappearance of old memories (catastrophic forgetting). With successful learning, new categories are represented by the basins of newly born point attractors, and their boundaries by the stable manifolds of new saddles. With this, memorisation and forgetting represent two manifestations of the same mechanism. Our strategy to analyse high-dimensional learning ANNs is universal and applicable to recurrent ANNs of any form. The demonstrated mechanisms of memory formation and of catastrophic forgetting shed light on the operation of a wider class of recurrent ANNs and could aid the development of approaches to mitigate their flaws.
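下面给出标准Hopfield网络的Hebb学习与异步回忆的最小实现(与论文相同的81个神经元规模,模式与噪声为随机生成),可用来直观体会"记忆驻留在吸引盆中"的含义:

```python
import numpy as np

def hebbian_train(patterns):
    # Hebb 规则: W = (1/N) * sum_mu xi_mu xi_mu^T, 对角线置零
    N = patterns.shape[1]
    W = patterns.T @ patterns / N
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, state, steps=20):
    # 异步更新, 状态最终落入某个吸引盆
    s = state.copy()
    for _ in range(steps):
        for i in np.random.permutation(len(s)):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(3, 81))                  # 81 个神经元, 存储 3 个记忆
W = hebbian_train(patterns)
noisy = patterns[0] * np.where(rng.random(81) < 0.1, -1, 1)   # 给第一个记忆加 10% 翻转噪声
print((recall(W, noisy) == patterns[0]).mean())               # 回忆后与原记忆的重合比例
```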


【16】Mitigating Exponential Mixed Frequency Growth through Frequency Selection and Dimensional Separation in Quantum Machine Learning
标题:量子机器学习中通过频率选择和维度分离抑制指数混合频率增长
链接:https://arxiv.org/abs/2508.10533

作者:oppel, David Bucher, Maximilian Zorn, Nico Kraus, Jonas Stein, Claudia Linnhoff-Popien
摘要:为了利用量子计算(QC)的潜在计算加速,量子机器学习(QML)的研究越来越突出。QML模型中的角度编码技术已被证明可以生成截断傅立叶级数,提供渐近通用函数逼近能力。通过在量子电路中选择有效的特征映射(FM),可以利用傅立叶频率的指数增长来改进近似。在多维设置中,额外的输入维度通过混合频率引起进一步的指数缩放。然而,在实践中,量子模型经常在回归任务中失败。通过两个白盒实验,我们表明,即使存在相关频率,由于可训练参数的数量不足,也可能发生此类故障。   为了减轻双指数增长的频率导致的双指数参数增长,我们提出了频率选择和维度分离作为技术来约束参数的数量,从而提高可训练性。通过限制QML模型的基本频率和允许的混合频率之间的特征尺寸与已知的相互依赖性,我们扩大了一套易于处理的问题,在当前的硬件。我们证明了减少参数的要求,通过拟合两个白盒函数与已知的频谱和尺寸的相互依赖性,不能与默认的方法。减少的参数要求使我们能够在嘈杂的量子模拟器上进行训练,并在真实的量子硬件上演示推理。
摘要:To leverage the potential computational speedup of quantum computing (QC), research in quantum machine learning (QML) has gained increasing prominence. Angle encoding techniques in QML models have been shown to generate truncated Fourier series, offering asymptotically universal function approximation capabilities. By selecting efficient feature maps (FMs) within quantum circuits, one can leverage the exponential growth of Fourier frequencies for improved approximation. In multi-dimensional settings, additional input dimensions induce further exponential scaling via mixed frequencies. In practice, however, quantum models frequently fail at regression tasks. Through two white-box experiments, we show that such failures can occur even when the relevant frequencies are present, due to an insufficient number of trainable parameters.   In order to mitigate the double-exponential parameter growth resulting from double-exponentially growing frequencies, we propose frequency selection and dimensional separation as techniques to constrain the number of parameters, thereby improving trainability. By restricting the QML model to essential frequencies and permitting mixed frequencies only among feature dimensions with known interdependence, we expand the set of tractable problems on current hardware. We demonstrate the reduced parameter requirements by fitting two white-box functions with known frequency spectrum and dimensional interdependencies that could not be fitted with the default methods. The reduced parameter requirements permit us to perform training on a noisy quantum simulator and to demonstrate inference on real quantum hardware.
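频率选择的思想可以用一个经典类比来理解:只为已知存在的频率分配可训练参数。下面的最小二乘例子仅为示意,并非量子电路实现:

```python
import numpy as np

def design_matrix(x, freqs):
    # 只为选定频率生成 cos/sin 基函数, 即"频率选择"的经典最小二乘类比
    cols = [np.ones_like(x)]
    for w in freqs:
        cols += [np.cos(w * x), np.sin(w * x)]
    return np.stack(cols, axis=1)

x = np.linspace(0, 2 * np.pi, 200)
y = 1.5 * np.sin(2 * x) + 0.5 * np.cos(5 * x)      # 已知谱只含频率 {2, 5} 的目标函数
A = design_matrix(x, freqs=[2, 5])                 # 仅为这些频率分配参数
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 2))
```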


【17】Estimating carbon pools in the shelf sea environment: reanalysis or model-informed machine learning?
标题:估计大陆架海洋环境中的碳库:重新分析还是模型知情的机器学习?
链接:https://arxiv.org/abs/2508.10178

作者:kala
备注:24 pages, 9 figures (4 in the appendix)
摘要:陆架海对于碳封存和碳循环很重要,但陆架海环境中关于碳库的现场或卫星数据往往稀疏或高度不确定。再分析可以提供替代方案,但其运行成本通常很高。我们建议使用神经网络(NN)集成,从耦合的物理-生物地球化学模式中学习可直接观测的变量与碳库之间的关系。我们以西北欧大陆架(NWES)海洋环境为例证明:将基于模式自由运行模拟训练的NN应用于NWES再分析时,它能够重现再分析给出的碳库结果。此外,与现有的NWES再分析不同,NN集成还能够为碳库提供不确定性信息。我们重点关注结果的可解释性,并展示了NN在未来气候假设情景(what-if)分析中的潜在用途。我们认为,基于模式信息的机器学习为昂贵的再分析提供了一种可行的替代方案,并且可以在观测数据缺失和/或高度不确定的地方对其加以补充。
摘要:Shelf seas are important for carbon sequestration and carbon cycle, but available in situ, or satellite data for carbon pools in the shelf sea environment are often sparse, or highly uncertain. Alternative can be provided by reanalyses, but these are often expensive to run. We propose to use an ensemble of neural networks (NN) to learn from a coupled physics-biogeochemistry model the relationship between the directly observable variables and carbon pools. We demonstrate for North-West European Shelf (NWES) sea environment, that when the NN trained on a model free run simulation is applied to the NWES reanalysis, it is capable to reproduce the reanalysis outputs for carbon pools. Moreover, unlike the existing NWES reanalysis, the NN ensemble is also capable to provide uncertainty information for the pools. We focus on explainability of the results and demonstrate potential use of the NNs for future climate what-if scenarios. We suggest that model-informed machine learning presents a viable alternative to expensive reanalyses and could complement observational data, wherever they are missing and/or highly uncertain.
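下面用scikit-learn给出"神经网络集成回归并用成员间离散度表示不确定性"的极简示意(特征与目标均为随机生成,与真实的NWES数据无关):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))                                  # 可直接观测的变量(虚构)
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.1, 400)    # 模式给出的碳库"真值"(虚构)

ensemble = [MLPRegressor(hidden_layer_sizes=(32,), max_iter=800, random_state=s).fit(X, y)
            for s in range(5)]
X_new = rng.normal(size=(3, 5))
preds = np.stack([m.predict(X_new) for m in ensemble])
print(np.round(preds.mean(axis=0), 2))   # 集成均值作为预测
print(np.round(preds.std(axis=0), 2))    # 成员间离散度作为不确定性
```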


【18】Jet Image Tagging Using Deep Learning: An Ensemble Model
标题:使用深度学习标记Jet图像:一种融合模型
链接:https://arxiv.org/abs/2508.10034

作者:assa, Vidya Manian, Sudhir Malik, Arghya Chattopadhyay
备注:19 Pages. All codes available at this https URL
摘要:高能粒子物理中的喷注分类对于理解基本相互作用和探测标准模型之外的现象非常重要。喷注源自夸克和胶子的碎裂与强子化,由于其复杂的多维结构而难以识别。传统分类方法往往无法捕捉这些复杂性,因此需要先进的机器学习方法。在本文中,我们同时使用两个神经网络组成集成来标记不同类型的喷注。我们将喷注数据转换为二维直方图,而不是将其表示为高维空间中的点。具体来说,这种集成方法(以下称为集成模型,Ensemble Model)被用于把喷注标记为JetNet数据集中的类别,对应于:顶夸克、轻夸克(上夸克或下夸克)以及W和Z玻色子。对于上述喷注类别,我们表明集成模型既可用于二分类也可用于多分类。这种集成方法通过发挥每个组成网络的优势来学习喷注特征,从而取得优于任一单个网络的性能。
摘要 :Jet classification in high-energy particle physics is important for understanding fundamental interactions and probing phenomena beyond the Standard Model. Jets originate from the fragmentation and hadronization of quarks and gluons, and pose a challenge for identification due to their complex, multidimensional structure. Traditional classification methods often fall short in capturing these intricacies, necessitating advanced machine learning approaches. In this paper, we employ two neural networks simultaneously as an ensemble to tag various jet types. We convert the jet data to two-dimensional histograms instead of representing them as points in a higher-dimensional space. Specifically, this ensemble approach, hereafter referred to as Ensemble Model, is used to tag jets into classes from the JetNet dataset, corresponding to: Top Quarks, Light Quarks (up or down), and W and Z bosons. For the jet classes mentioned above, we show that the Ensemble Model can be used for both binary and multi-categorical classification. This ensemble approach learns jet features by leveraging the strengths of each constituent network achieving superior performance compared to either individual network.
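把喷注组分转成二维直方图"图像"的做法可以用几行代码示意(分辨率、范围与pt加权方式均为假设,与论文设置未必一致):

```python
import numpy as np

def jet_to_image(eta, phi, pt, bins=32, extent=0.8):
    # 以 pt 为权重, 把喷注组分的 (eta, phi) 落到二维直方图里并归一化
    H, _, _ = np.histogram2d(eta, phi, bins=bins,
                             range=[[-extent, extent], [-extent, extent]], weights=pt)
    return H / (H.sum() + 1e-12)

rng = np.random.default_rng(0)
img = jet_to_image(rng.normal(0, 0.3, 50), rng.normal(0, 0.3, 50), rng.exponential(1.0, 50))
print(img.shape, round(float(img.sum()), 3))
```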


其他(17篇)

【1】Efficiently Verifiable Proofs of Data Attribution
标题:有效可验证的数据属性证明
链接:https://arxiv.org/abs/2508.10866

作者:mer, Seth Neel, Martin Pawelczyk
摘要:数据归因方法旨在回答有用的反事实问题,例如"如果在不同的数据集上训练ML模型,它的预测会是什么?"然而,通过经验影响或"数据建模(datamodeling)"等技术来估计数据归因模型的计算代价仍然非常高。这导致了一个关键的信任问题:如果只有少数算力雄厚的参与方能够获得数据归因,那么资源受限的参与方如何信任所提供的归因确实是"好的",尤其当这些归因被用于重要的下游应用(例如数据定价)时?在本文中,我们通过为数据归因提出一种交互式验证范式来解决这一信任问题。一个不可信但算力强大的证明者(Prover)学习数据归因,然后与一个资源受限的验证者(Verifier)进行交互式证明。我们的主要结果是一个在概率近似正确(PAC)验证意义下提供形式化完备性、可靠性和效率保证的协议。具体地说,如果证明者和验证者都遵循协议,则验证者以1-δ的概率接受(就均方误差而言)ε-接近最优的数据归因。反之,如果证明者任意偏离协议,即使其拥有无限算力,也会被检测到(或者它仍然向验证者交付了合格的数据归因),失败概率至多为δ。重要的是,我们的协议确保验证者的工作量(以其必须执行的独立模型重训练次数衡量)仅以O(1/ε)的规模增长,即与数据集大小无关。在技术层面上,我们的结果适用于高效验证证明者在布尔超立方体上计算的任何线性函数,因而可广泛用于各种归因任务。
摘要:Data attribution methods aim to answer useful counterfactual questions like "what would a ML model's prediction be if it were trained on a different dataset?" However, estimation of data attribution models through techniques like empirical influence or "datamodeling" remains very computationally expensive. This causes a critical trust issue: if only a few computationally rich parties can obtain data attributions, how can resource-constrained parties trust that the provided attributions are indeed "good," especially when they are used for important downstream applications (e.g., data pricing)? In this paper, we address this trust issue by proposing an interactive verification paradigm for data attribution. An untrusted and computationally powerful Prover learns data attributions, and then engages in an interactive proof with a resource-constrained Verifier. Our main result is a protocol that provides formal completeness, soundness, and efficiency guarantees in the sense of Probably-Approximately-Correct (PAC) verification. Specifically, if both Prover and Verifier follow the protocol, the Verifier accepts data attributions that are {\epsilon}-close to the optimal data attributions (in terms of the Mean Squared Error) with probability 1-{\delta}. Conversely, if the Prover arbitrarily deviates from the protocol, even with infinite compute, then this is detected (or it still yields data attributions to the Verifier) except with probability {\delta}. Importantly, our protocol ensures the Verifier's workload, measured by the number of independent model retrainings it must perform, scales only as O(1/{\epsilon}); i.e., independently of the dataset size. At a technical level, our results apply to efficiently verifying any linear function over the boolean hypercube computed by the Prover, making them broadly applicable to various attribution tasks.


【2】Comparison of Data Reduction Criteria for Online Gaussian Processes
标题:在线高斯过程数据简化标准的比较
链接:https://arxiv.org/abs/2508.10815

作者:tzke, Knut Graichen
备注:12 pages
摘要:高斯过程(GP)由于其灵活性和量化不确定性的能力而被广泛用于回归和系统辨识。然而,其计算复杂度使其只适用于小规模数据集。此外,在流式场景中,数据点不断积累,这即使对稀疏GP也难以处理。在线GP旨在通过设定数据点的最大预算并删除冗余数据点来缓解这一问题。这项工作对若干数据约简(删点)准则进行了统一比较,分析了它们的计算复杂度和约简行为。这些准则在基准函数和真实世界数据集(包括动态系统辨识任务)上进行了评估。此外,还提出了接受准则,以进一步过滤冗余数据点。这项工作为在线GP算法选择合适的准则给出了实用指南。
摘要:Gaussian Processes (GPs) are widely used for regression and system identification due to their flexibility and ability to quantify uncertainty. However, their computational complexity limits their applicability to small datasets. Moreover in a streaming scenario, more and more datapoints accumulate which is intractable even for Sparse GPs. Online GPs aim to alleviate this problem by e.g. defining a maximum budget of datapoints and removing redundant datapoints. This work provides a unified comparison of several reduction criteria, analyzing both their computational complexity and reduction behavior. The criteria are evaluated on benchmark functions and real-world datasets, including dynamic system identification tasks. Additionally, acceptance criteria are proposed to further filter out redundant datapoints. This work yields practical guidelines for choosing a suitable criterion for an online GP algorithm.
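下面是一个预算受限在线GP的示意实现,删点准则取"|alpha|最小的点"这一常见选择之一;论文比较的具体准则请以原文为准:

```python
import numpy as np

def rbf(a, b, ls=0.5):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

class BudgetedGP:
    """预算受限的在线 GP: 超出预算时删除 |alpha| 最小的点(众多删点准则中的一种, 仅示意)。"""
    def __init__(self, budget=15, noise=1e-2):
        self.X, self.y = np.empty(0), np.empty(0)
        self.budget, self.noise = budget, noise

    def add(self, x, y):
        self.X, self.y = np.append(self.X, x), np.append(self.y, y)
        if len(self.X) > self.budget:
            K = rbf(self.X, self.X) + self.noise * np.eye(len(self.X))
            alpha = np.linalg.solve(K, self.y)
            i = int(np.argmin(np.abs(alpha)))      # 对后验影响最小的点
            self.X, self.y = np.delete(self.X, i), np.delete(self.y, i)

    def predict(self, xq):
        K = rbf(self.X, self.X) + self.noise * np.eye(len(self.X))
        return rbf(xq, self.X) @ np.linalg.solve(K, self.y)

gp = BudgetedGP()
for x in np.random.default_rng(0).uniform(0, 6, 60):   # 流式到来的观测
    gp.add(x, np.sin(x))
print(np.round(gp.predict(np.array([1.0, 3.0, 5.0])), 2))
```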


【3】Non-Stationary Restless Multi-Armed Bandits with Provable Guarantee
标题:具有可证明保证的非平稳不安分多臂老虎机(Restless Multi-Armed Bandits)
链接:https://arxiv.org/abs/2508.10804

作者:ung, Ping-Chun Hsieh, Kai Wang
摘要:在线不安分多臂老虎机(RMAB)通常假设每个臂都遵循具有固定状态转移和奖励的平稳马尔可夫决策过程(MDP)。然而,在医疗保健和推荐系统等实际应用中,非平稳动态常常使这些假设不再成立,给传统RMAB算法带来重大挑战。在这项工作中,我们特别考虑非平稳转移受有界变化预算$B$约束的$N$臂RMAB。我们提出的算法(原文记作\rmab)将滑动窗口强化学习(RL)与置信上界(UCB)机制相结合,以同时学习转移动态及其变化。我们进一步证明,在一个放松的遗憾定义下,该算法可达到$\widetilde{\mathcal{O}}(N^2 B^{\frac{1}{4}} T^{\frac{3}{4}})$的遗憾界,首次为非平稳RMAB问题提供了基础性的理论框架。
摘要:Online restless multi-armed bandits (RMABs) typically assume that each arm follows a stationary Markov Decision Process (MDP) with fixed state transitions and rewards. However, in real-world applications like healthcare and recommendation systems, these assumptions often break due to non-stationary dynamics, posing significant challenges for traditional RMAB algorithms. In this work, we specifically consider $N$-armd RMAB with non-stationary transition constrained by bounded variation budgets $B$. Our proposed \rmab\; algorithm integrates sliding window reinforcement learning (RL) with an upper confidence bound (UCB) mechanism to simultaneously learn transition dynamics and their variations. We further establish that \rmab\; achieves $\widetilde{\mathcal{O}}(N^2 B^{\frac{1}{4}} T^{\frac{3}{4}})$ regret bound by leveraging a relaxed definition of regret, providing a foundational theoretical framework for non-stationary RMAB problems for the first time.
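下面给出滑动窗口UCB在最优臂会切换的两臂环境中的最小示例(窗口大小与奖励设定均为虚构),用于直观理解摘要中"滑动窗口RL+UCB"的思路;论文算法本身针对的是RMAB而非普通多臂老虎机:

```python
import numpy as np

def sliding_window_ucb(reward_fn, n_arms, T, window=200, c=1.0):
    # 只用最近 window 步的数据估计每臂均值与置信上界
    hist, total = [], 0.0
    rng = np.random.default_rng(0)
    for t in range(T):
        recent = hist[-window:]
        ucb = np.full(n_arms, np.inf)
        for a in range(n_arms):
            rs = [r for (arm, r) in recent if arm == a]
            if rs:
                ucb[a] = np.mean(rs) + c * np.sqrt(np.log(min(t + 1, window)) / len(rs))
        a = int(np.argmax(ucb))
        r = reward_fn(a, t, rng)
        hist.append((a, r))
        total += r
    return total

# 两臂环境: 第 500 步之后最优臂发生切换
reward = lambda a, t, rng: rng.normal(0.9 if (a == 0) == (t < 500) else 0.1, 0.1)
print(round(sliding_window_ucb(reward, n_arms=2, T=1000), 1))
```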


【4】Agentic Design Review System
标题:大型设计审查系统
链接:https://arxiv.org/abs/2508.10745

作者:, K J Joseph, Koustava Goswami, Vlad I Morariu, Balaji Vasan Srinivasan
摘要:评估平面设计需要从对齐、构图、美学和配色等多个方面加以考量。以整体方式评估设计则需要汇总各个专家评审员的反馈。为此,我们提出了一个智能体式设计评审系统(Agentic Design Review System, AgenticDRS),其中多个智能体在一个元智能体的编排下协作分析设计。基于图匹配的新型上下文示例选择方法和独特的提示扩展方法发挥了核心作用,使每个智能体具备设计感知能力。为了评估这一框架,我们提出了DRS-BENCH基准。针对适配到该问题设置的最先进基线进行的全面实验评估,以及关键的消融实验,共同展示了AgenticDRS在评估平面设计和生成可操作反馈方面的有效性。我们希望这项工作能引起人们对这一务实但尚未充分探索的研究方向的关注。
摘要:Evaluating graphic designs involves assessing it from multiple facets like alignment, composition, aesthetics and color choices. Evaluating designs in a holistic way involves aggregating feedback from individual expert reviewers. Towards this, we propose an Agentic Design Review System (AgenticDRS), where multiple agents collaboratively analyze a design, orchestrated by a meta-agent. A novel in-context exemplar selection approach based on graph matching and a unique prompt expansion method plays central role towards making each agent design aware. Towards evaluating this framework, we propose DRS-BENCH benchmark. Thorough experimental evaluation against state-of-the-art baselines adapted to the problem setup, backed-up with critical ablation experiments brings out the efficacy of Agentic-DRS in evaluating graphic designs and generating actionable feedback. We hope that this work will attract attention to this pragmatic, yet under-explored research direction.


【5】Dissecting Generalized Category Discovery: Multiplex Consensus under Self-Deconstruction
标题:剖析广义类别发现:自我解构下的多重共识
链接:https://arxiv.org/abs/2508.10731

作者:g, Kunze Huang, Chaoqi Chen, Yuxuan Yuan, Chenxin Li, Xiaotong Tu, Xinghao Ding, Yue Huang
备注:Accepted by ICCV 2025 as *** Highlight ***!
摘要:人类感知系统擅长对已知和新类别中的对象进行归纳和识别,这一能力远超当前的机器学习框架。虽然广义类别发现(GCD)旨在弥合这一差距,但现有方法主要集中在优化目标函数上。我们提出了一个正交的解决方案,其灵感来自人类理解新对象的认知过程:将对象分解为视觉原语,并建立跨知识的比较。我们提出了ConGCD,它通过高层语义重构建立面向原语的表示,并通过解构来绑定类内共享属性。正如人类在视觉处理中偏好各异,不同个体会利用主导线索或上下文线索,我们相应地实现了主导共识单元和上下文共识单元,分别用于捕捉类判别模式和固有的分布不变量。共识调度器动态优化激活路径,最终预测通过多重共识整合得到。在粗粒度和细粒度基准上的广泛评估表明,ConGCD作为一种共识感知范式是有效的。代码可在github.com/lytang63/ConGCD获取。
摘要:Human perceptual systems excel at inducing and recognizing objects across both known and novel categories, a capability far beyond current machine learning frameworks. While generalized category discovery (GCD) aims to bridge this gap, existing methods predominantly focus on optimizing objective functions. We present an orthogonal solution, inspired by the human cognitive process for novel object understanding: decomposing objects into visual primitives and establishing cross-knowledge comparisons. We propose ConGCD, which establishes primitive-oriented representations through high-level semantic reconstruction, binding intra-class shared attributes via deconstruction. Mirroring human preference diversity in visual processing, where distinct individuals leverage dominant or contextual cues, we implement dominant and contextual consensus units to capture class-discriminative patterns and inherent distributional invariants, respectively. A consensus scheduler dynamically optimizes activation pathways, with final predictions emerging through multiplex consensus integration. Extensive evaluations across coarse- and fine-grained benchmarks demonstrate ConGCD's effectiveness as a consensus-aware paradigm. Code is available at github.com/lytang63/ConGCD.


【6】Beyond Random Sampling: Instance Quality-Based Data Partitioning via Item Response Theory
标题:超越随机抽样:通过项目响应理论进行基于实例质量的数据分区
链接:https://arxiv.org/abs/2508.10628

作者:doso, Vitor Santos, José Ribeiro Filho, Ricardo Prudêncio, Regiane Kawasaki, Ronnie Alves
备注:12 pages, 8 figures, 1 table, Accepted to the ENIAC 2025 conference
摘要:机器学习(ML)模型的稳健验证必不可少,但传统的数据划分方法往往忽略了每个实例的内在质量。本研究提出使用项目反应理论(IRT)参数来刻画并指导模型验证阶段的数据集划分。我们在四个表格数据集上评估了基于IRT信息的划分策略对若干ML模型性能的影响。结果表明,IRT揭示了实例固有的异质性,并突出了同一数据集内存在信息丰富的实例子组。基于IRT创建的平衡划分始终有助于更好地理解模型偏差与方差之间的权衡。此外,猜测参数被证明是一个决定性因素:使用高猜测实例进行训练会显著损害模型性能,导致准确率低于50%,而同一数据集中的其他划分可达到70%以上。
摘要:Robust validation of Machine Learning (ML) models is essential, but traditional data partitioning approaches often ignore the intrinsic quality of each instance. This study proposes the use of Item Response Theory (IRT) parameters to characterize and guide the partitioning of datasets in the model validation stage. The impact of IRT-informed partitioning strategies on the performance of several ML models in four tabular datasets was evaluated. The results obtained demonstrate that IRT reveals an inherent heterogeneity of the instances and highlights the existence of informative subgroups of instances within the same dataset. Based on IRT, balanced partitions were created that consistently help to better understand the tradeoff between bias and variance of the models. In addition, the guessing parameter proved to be a determining factor: training with high-guessing instances can significantly impair model performance and resulted in cases with accuracy below 50%, while other partitions reached more than 70% in the same dataset.
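假设每个实例的IRT猜测参数已由某个IRT模型估计得到,下面示意如何据此把实例划分成低猜测/高猜测两个验证分区(阈值为假设):

```python
import numpy as np

def partition_by_guessing(guessing, threshold=0.3, seed=0):
    # 按 IRT 猜测参数把实例索引划分为"低猜测 / 高猜测"两个验证分区
    rng = np.random.default_rng(seed)
    low = np.where(guessing <= threshold)[0]
    high = np.where(guessing > threshold)[0]
    return rng.permutation(low), rng.permutation(high)

g = np.random.default_rng(1).beta(2, 6, size=100)   # 假设已由 IRT 模型估得的猜测参数
low_idx, high_idx = partition_by_guessing(g)
print(len(low_idx), len(high_idx))
```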


【7】On Spectral Properties of Gradient-based Explanation Methods
标题:关于基于梯度的解释方法的谱性质
链接:https://arxiv.org/abs/2508.10595

作者:panah, Erik Englesson, Hossein Azizpour
备注:36 pages, 16 figures, published in European Conference on Computer   Vision 2024
摘要:理解深度网络的行为对于增强我们对其结果的信心至关重要。尽管已有大量解释其预测的工作,但研究者仍面临可靠性问题,这可归因于形式化不足。在我们的研究中,我们采用新的概率与谱视角来形式化地分析解释方法。我们的研究揭示了源于使用梯度的一种普遍谱偏差,并阐明了一些此前通过实验发现的常见设计选择,特别是平方梯度和输入扰动的使用。我们进一步刻画了SmoothGrad等解释方法中扰动超参数的选择如何导致不一致的解释,并基于所提出的形式化框架引入两种补救措施:(i)一种确定标准扰动尺度的机制,以及(ii)我们称之为SpectralLens的聚合方法。最后,我们通过定量评估证实了理论结果。
摘要:Understanding the behavior of deep networks is crucial to increase our confidence in their results. Despite an extensive body of work for explaining their predictions, researchers have faced reliability issues, which can be attributed to insufficient formalism. In our research, we adopt novel probabilistic and spectral perspectives to formally analyze explanation methods. Our study reveals a pervasive spectral bias stemming from the use of gradient, and sheds light on some common design choices that have been discovered experimentally, in particular, the use of squared gradient and input perturbation. We further characterize how the choice of perturbation hyperparameters in explanation methods, such as SmoothGrad, can lead to inconsistent explanations and introduce two remedies based on our proposed formalism: (i) a mechanism to determine a standard perturbation scale, and (ii) an aggregation method which we call SpectralLens. Finally, we substantiate our theoretical results through quantitative evaluations.
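摘要讨论的扰动尺度即SmoothGrad中的噪声标准差sigma;下面是SmoothGrad的一个标准最小实现(模型与输入均为虚构),便于理解该超参数的作用:

```python
import torch

def smoothgrad(model, x, sigma=0.1, n=32):
    # 对输入加 n 次标准差为 sigma 的高斯噪声, 平均各次的输入梯度
    grads = torch.zeros_like(x)
    for _ in range(n):
        xi = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        model(xi).sum().backward()
        grads += xi.grad
    return grads / n

model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
x = torch.randn(1, 8)
print(smoothgrad(model, x, sigma=0.2).shape)   # sigma 即摘要中讨论的扰动尺度
```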


【8】X-Node: Self-Explanation is All We Need
标题:X-节点:自我解释就是我们所需要的
链接:https://arxiv.org/abs/2508.10461

作者:ngupta, Islem Rekik
摘要:图神经网络(GNN)通过捕获数据实例之间的结构依赖关系,在计算机视觉和医学图像分类任务中取得了最先进的成果。然而,他们的决策在很大程度上仍然是不透明的,限制了他们在高风险的临床应用中的可信度,其中可解释性是必不可少的。GNN的现有可解释性技术通常是事后和全局的,对单个节点决策或局部推理的洞察有限。我们介绍了X-Node,这是一个自我解释的GNN框架,其中每个节点都会生成自己的解释作为预测过程的一部分。对于每个节点,我们构建了一个结构化的上下文向量编码可解释的线索,如程度,中心性,聚类,特征显着性和标签协议在其本地拓扑结构。轻量级Reasoner模块将该上下文映射到紧凑的解释向量中,该解释向量用于三个目的:(1)经由解码器重构节点的潜在嵌入以强制忠实性,(2)使用预先训练的LLM(例如,Grok或Gemini),以及(3)通过“文本注入”机制引导GNN本身,将解释反馈到消息传递管道中。我们在MedMNIST和MorphoMNIST的两个图形数据集上评估X-Node,将其与GCN,GAT和GIN骨干集成。我们的研究结果表明,X-Node保持了有竞争力的分类精度,同时产生了忠实的每个节点的解释。资料档案库:https://github.com/basiralab/X-Node。
摘要:Graph neural networks (GNNs) have achieved state-of-the-art results in computer vision and medical image classification tasks by capturing structural dependencies across data instances. However, their decision-making remains largely opaque, limiting their trustworthiness in high-stakes clinical applications where interpretability is essential. Existing explainability techniques for GNNs are typically post-hoc and global, offering limited insight into individual node decisions or local reasoning. We introduce X-Node, a self-explaining GNN framework in which each node generates its own explanation as part of the prediction process. For every node, we construct a structured context vector encoding interpretable cues such as degree, centrality, clustering, feature saliency, and label agreement within its local topology. A lightweight Reasoner module maps this context into a compact explanation vector, which serves three purposes: (1) reconstructing the node's latent embedding via a decoder to enforce faithfulness, (2) generating a natural language explanation using a pre-trained LLM (e.g., Grok or Gemini), and (3) guiding the GNN itself via a "text-injection" mechanism that feeds explanations back into the message-passing pipeline. We evaluate X-Node on two graph datasets derived from MedMNIST and MorphoMNIST, integrating it with GCN, GAT, and GIN backbones. Our results show that X-Node maintains competitive classification accuracy while producing faithful, per-node explanations. Repository: https://github.com/basiralab/X-Node.
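下面用networkx示意如何为每个节点构造摘要中提到的结构化上下文向量(度、中心性、聚类系数、邻居标签一致率;特征显著性在此略去,具体构成以论文为准):

```python
import networkx as nx
import numpy as np

def node_context(G, labels):
    # 每个节点的上下文向量: [度, 度中心性, 聚类系数, 邻居标签一致率]
    deg = dict(G.degree())
    cen = nx.degree_centrality(G)
    clu = nx.clustering(G)
    ctx = {}
    for v in G.nodes():
        nbrs = list(G.neighbors(v))
        agree = float(np.mean([labels[u] == labels[v] for u in nbrs])) if nbrs else 0.0
        ctx[v] = np.array([deg[v], cen[v], clu[v], agree])
    return ctx

G = nx.karate_club_graph()
labels = {v: G.nodes[v]["club"] for v in G.nodes()}
print(node_context(G, labels)[0])
```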


【9】Efficient Methods for Accurate Sparse Trajectory Recovery and Map Matching
标题:精确稀疏轨迹恢复和地图匹配的有效方法
链接:https://arxiv.org/abs/2508.10460

作者: Jieming Shi, Man Lung Yiu
备注:13 pages, accepted by 2025 IEEE 41st International Conference on Data Engineering (ICDE)
摘要:真实世界的轨迹通常是稀疏的,具有低采样率(即,连续GPS点之间的间隔较长),并且与道路网络不对齐,但许多应用需要高质量的数据才能实现最佳性能。为了提高稀疏轨迹作为输入的数据质量,我们系统地研究了两个相关的研究问题:道路网络上的轨迹恢复,旨在推断缺失点以恢复高采样轨迹,以及地图匹配,旨在将GPS点映射到路段以确定潜在的路线。在本文中,我们提出了有效的方法TRMMA和MMA精确的轨迹恢复和地图匹配,分别,MMA作为TRMMA的第一步。在MMA中,我们仔细制定了一个分类任务,将GPS点从稀疏轨迹映射到一个小的候选路段集上的路段,而不是整个道路网络。我们在MMA中开发技术来生成有效的嵌入,捕获GPS数据,方向信息和路段的模式,以准确地将稀疏轨迹与路线对齐。对于轨迹恢复,TRMMA将重点放在MMA返回的路线中的路段上,通过路段上的位置比来推断缺失点,通过避免评估所有路段来高效地生成高采样轨迹。具体来说,在TRMMA中,我们设计了一个双变换器编码过程,以内聚地捕获轨迹和路线中的潜在模式,并设计了一种有效的解码技术,以顺序预测缺失点的位置比和路段。我们进行了广泛的实验,比较TRMMA和MMA与现有的许多方法的轨迹恢复和地图匹配,分别在4个大型的真实世界的数据集。TRMMA和MMA始终如一地实现最佳结果质量,通常有很大的优势。
摘要:Real-world trajectories are often sparse with low-sampling rates (i.e., long intervals between consecutive GPS points) and misaligned with road networks, yet many applications demand high-quality data for optimal performance. To improve data quality with sparse trajectories as input, we systematically study two related research problems: trajectory recovery on road network, which aims to infer missing points to recover high-sampling trajectories, and map matching, which aims to map GPS points to road segments to determine underlying routes. In this paper, we present efficient methods TRMMA and MMA for accurate trajectory recovery and map matching, respectively, where MMA serves as the first step of TRMMA. In MMA, we carefully formulate a classification task to map a GPS point from sparse trajectories to a road segment over a small candidate segment set, rather than the entire road network. We develop techniques in MMA to generate effective embeddings that capture the patterns of GPS data, directional information, and road segments, to accurately align sparse trajectories to routes. For trajectory recovery, TRMMA focuses on the segments in the route returned by MMA to infer missing points with position ratios on road segments, producing high-sampling trajectories efficiently by avoiding evaluation of all road segments. Specifically, in TRMMA, we design a dual-transformer encoding process to cohesively capture latent patterns in trajectories and routes, and an effective decoding technique to sequentially predict the position ratios and road segments of missing points. We conduct extensive experiments to compare TRMMA and MMA with numerous existing methods for trajectory recovery and map matching, respectively, on 4 large real-world datasets. TRMMA and MMA consistently achieve the best result quality, often by a significant margin.
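MMA把匹配限定在少量候选路段上;下面用点到线段距离给出候选集构造的一个朴素示意(真实方法还会利用方向与序列信息):

```python
import numpy as np

def point_to_segment_dist(p, a, b):
    # 点 p 到线段 ab 的欧氏距离
    ab, ap = b - a, p - a
    t = np.clip(ap @ ab / (ab @ ab + 1e-12), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def candidate_segments(p, segments, k=3):
    # 只保留离 GPS 点最近的 k 个候选路段, 后续匹配在候选集内完成
    d = [point_to_segment_dist(p, np.array(a, float), np.array(b, float)) for a, b in segments]
    return np.argsort(d)[:k]

segs = [((0, 0), (1, 0)), ((0, 1), (1, 1)), ((0, 0), (0, 1)), ((2, 2), (3, 2))]
print(candidate_segments(np.array([0.4, 0.1]), segs))
```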


【10】Comparison of D-Wave Quantum Annealing and Markov Chain Monte Carlo for Sampling from a Probability Distribution of a Restricted Boltzmann Machine
标题:从受限Boltzmann机的概率分布采样:D-Wave量子退火与马尔可夫链蒙特卡罗的比较
链接:https://arxiv.org/abs/2508.10228

作者:a El Yazizi, Samee U. Khan, Yaroslav Koshka
备注:22 pages, 10 figures
摘要:我们将一种以局部谷(LV)为中心的方法应用于最新一代D-Wave量子退火机,以评估从受限玻尔兹曼机(RBM)采样的质量。在与基于对比散度的RBM学习相关的条件下,从经典训练的RBM中分别获得D-Wave样本和Gibbs样本,并比较样本所属的LV数量以及相应局部极小值的能量。缩短D-Wave退火时间并未带来LV数量的显著(所期望的)增加。在任一训练时期,D-Wave采样得到的状态所属的LV数量都多于Gibbs采样。然而,两种技术发现的许多LV并不相同。对于高概率的采样状态,两种技术(不利地)互补性较低、重叠较多。尽管如此,许多潜在"重要"的局部极小值,即那些概率值处于中等水平(即使不高)的状态,往往只被其中一种采样技术发现而被另一种遗漏。两种技术在较晚训练时期的重叠少于早期,而这正是适度提升采样质量就可能对RBM可训练性产生实质影响的训练阶段。这项工作的结果可以解释为什么以往的研究在使用基于D-Wave的采样时未能取得实质性(甚至任何)改进。不过,结果也显示出一定的改进潜力,例如采用经典与量子相结合的方法。
摘要:A local-valley (LV) centered approach to assessing the quality of sampling from Restricted Boltzmann Machines (RBMs) was applied to the latest generation of the D-Wave quantum annealer. D-Wave and Gibbs samples from a classically trained RBM were obtained at conditions relevant to the contrastive-divergence-based RBM learning. The samples were compared for the number of the LVs to which they belonged and the energy of the corresponding local minima. No significant (desirable) increase in the number of the LVs has been achieved by decreasing the D-Wave annealing time. At any training epoch, the states sampled by the D-Wave belonged to a somewhat higher number of LVs than in the Gibbs sampling. However, many of those LVs found by the two techniques differed. For high-probability sampled states, the two techniques were (unfavorably) less complementary and more overlapping. Nevertheless, many potentially "important" local minima, i.e., those having intermediate, even if not high, probability values, were found by only one of the two sampling techniques while missed by the other. The two techniques overlapped less at later than earlier training epochs, which is precisely the stage of the training when modest improvements to the sampling quality could make meaningful differences for the RBM trainability. The results of this work may explain the failure of previous investigations to achieve substantial (or any) improvement when using D-Wave-based sampling. However, the results reveal some potential for improvement, e.g., using a combined classical-quantum approach.
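作为对照基线的Gibbs采样可以用几行代码实现;下面是RBM块Gibbs采样的标准最小实现(参数为随机初始化,仅演示采样流程):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sample(W, b, c, v0, steps=100, rng=None):
    # RBM 的块 Gibbs 采样: 交替采样隐藏层 h 与可见层 v
    rng = rng or np.random.default_rng(0)
    v = v0.copy()
    for _ in range(steps):
        h = (rng.random(W.shape[1]) < sigmoid(v @ W + c)).astype(float)
        v = (rng.random(W.shape[0]) < sigmoid(W @ h + b)).astype(float)
    return v

rng = np.random.default_rng(0)
nv, nh = 16, 8
W = 0.1 * rng.normal(size=(nv, nh))
b, c = np.zeros(nv), np.zeros(nh)
print(gibbs_sample(W, b, c, v0=rng.integers(0, 2, nv).astype(float)))
```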


【11】Benchmark-Driven Selection of AI: Evidence from DeepSeek-R1
标题:基准驱动的人工智能选择:来自DeepSeek-R1的证据
链接:https://arxiv.org/abs/2508.10173

作者:da, Vit Stritecky
备注:17 pages, 5 figures, 2 tables
摘要:在观察到它们可以将现有的能力结合到任务完成之前的中间步骤的新轨迹中,并且这些轨迹有时可以帮助它们比过去的模型更好地概括之后,对推理语言模型的评估变得非常重要。随着推理成为大型语言模型的下一个扩展维度,需要仔细研究它们在关键任务中的能力。我们表明,更好的性能并不总是由测试时的算法改进或模型大小引起的,而是通过使用有影响力的基准作为学习课程。我们称之为AI的基准驱动选择,并使用我们的人类最后一次考试中的顺序决策问题来展示其对DeepSeek-R1的影响。通过有影响力的基准来指导AI的开发,将评估转换为学习,并使测试任务的新颖性成为衡量推理模型泛化能力的关键。因此,一些基准可被视为培训课程,而不是看不见的成套测试。
摘要:Evaluation of reasoning language models gained importance after it was observed that they can combine their existing capabilities into novel traces of intermediate steps before task completion and that the traces can sometimes help them to generalize better than past models. As reasoning becomes the next scaling dimension of large language models, careful study of their capabilities in critical tasks is needed. We show that better performance is not always caused by test-time algorithmic improvements or model sizes but also by using impactful benchmarks as curricula for learning. We call this benchmark-driven selection of AI and show its effects on DeepSeek-R1 using our sequential decision-making problem from Humanity's Last Exam. Steering development of AI by impactful benchmarks trades evaluation for learning and makes novelty of test tasks key for measuring generalization capabilities of reasoning models. Consequently, some benchmarks could be seen as curricula for training rather than unseen test sets.


【12】DINOv3
标题:DINOv3
链接:https://arxiv.org/abs/2508.10104

作者:méoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julien Mairal, Hervé Jégou, Patrick Labatut, Piotr Bojanowski
摘要:自监督学习有望消除手动数据注释的需要,使模型能够轻松扩展到大规模数据集和更大的架构。由于不针对特定的任务或领域,这种训练范式有可能使用单一算法从不同的来源学习视觉表示,从自然图像到航空图像。本技术报告介绍了DINOv3,它是通过利用简单而有效的策略实现这一愿景的一个重要里程碑。首先,我们通过仔细的数据准备,设计和优化来利用扩展数据集和模型大小的好处。其次,我们引入了一种名为Gram锚定的新方法,它有效地解决了已知但尚未解决的密集特征图在长时间训练过程中退化的问题。最后,我们应用了事后策略,进一步增强了模型在分辨率、模型大小和与文本对齐方面的灵活性。因此,我们提出了一个多功能的视觉基础模型,在广泛的设置中优于专业的最先进的技术,而无需微调。DINOv3生成高质量的密集特征,在各种视觉任务中实现出色的性能,大大超过了以前的自监督和弱监督基础模型。我们还分享了DINOv3视觉模型套件,旨在通过为各种资源限制和部署场景提供可扩展的解决方案,在广泛的任务和数据方面推进最先进的技术。
摘要:Self-supervised learning holds the promise of eliminating the need for manual data annotation, enabling models to scale effortlessly to massive datasets and larger architectures. By not being tailored to specific tasks or domains, this training paradigm has the potential to learn visual representations from diverse sources, ranging from natural to aerial images -- using a single algorithm. This technical report introduces DINOv3, a major milestone toward realizing this vision by leveraging simple yet effective strategies. First, we leverage the benefit of scaling both dataset and model size by careful data preparation, design, and optimization. Second, we introduce a new method called Gram anchoring, which effectively addresses the known yet unsolved issue of dense feature maps degrading during long training schedules. Finally, we apply post-hoc strategies that further enhance our models' flexibility with respect to resolution, model size, and alignment with text. As a result, we present a versatile vision foundation model that outperforms the specialized state of the art across a broad range of settings, without fine-tuning. DINOv3 produces high-quality dense features that achieve outstanding performance on various vision tasks, significantly surpassing previous self- and weakly-supervised foundation models. We also share the DINOv3 suite of vision models, designed to advance the state of the art on a wide spectrum of tasks and data by providing scalable solutions for diverse resource constraints and deployment scenarios.


【13】Performance of universal machine-learned potentials with explicit long-range interactions in biomolecular simulations
标题:生物分子模拟中具有显式远程相互作用的通用机器学习势的性能
链接:https://arxiv.org/abs/2508.10841

作者:verkin, Matheus Ferraz, Francesco Alesiani, Mathias Niepert
摘要:通用机器学习势有望在组成和振动自由度上具有可迁移的精度,但它们在生物分子模拟中的应用仍有待探索。这项工作系统地评估了在SPICE-v2数据集上训练的、带有或不带有显式长程色散与静电作用的等变消息传递架构。我们考察了模型规模、训练数据组成和静电处理方式在分布内与分布外基准数据集上的影响,以及在体相液态水、NaCl水溶液和生物分子(包括丙氨酸三肽、迷你蛋白Trp-cage和Crambin)分子模拟中的表现。虽然更大的模型可以提高在基准数据集上的精度,但这一趋势并不总是延伸到由模拟得到的性质上。预测的性质还取决于训练数据集的组成。长程静电作用在各个体系中未表现出系统性的影响;不过对于Trp-cage,引入长程静电会增加构象变异性。我们的结果表明,不平衡的数据集和不成熟的评估实践目前仍制约着通用机器学习势在生物分子模拟中的适用性。
摘要:Universal machine-learned potentials promise transferable accuracy across compositional and vibrational degrees of freedom, yet their application to biomolecular simulations remains underexplored. This work systematically evaluates equivariant message-passing architectures trained on the SPICE-v2 dataset with and without explicit long-range dispersion and electrostatics. We assess the impact of model size, training data composition, and electrostatic treatment across in- and out-of-distribution benchmark datasets, as well as molecular simulations of bulk liquid water, aqueous NaCl solutions, and biomolecules, including alanine tripeptide, the mini-protein Trp-cage, and Crambin. While larger models improve accuracy on benchmark datasets, this trend does not consistently extend to properties obtained from simulations. Predicted properties also depend on the composition of the training dataset. Long-range electrostatics show no systematic impact across systems. However, for Trp-cage, their inclusion yields increased conformational variability. Our results suggest that imbalanced datasets and immature evaluation practices currently challenge the applicability of universal machine-learned potentials to biomolecular simulations.


【14】Parity Cross-Resonance: A Multiqubit Gate
标题:宇称交叉共振:多量子位门
链接:https://arxiv.org/abs/2508.10807

作者:, Siyu Wang, Radhika Joshi, Rihan Hai, Mohammad H. Ansari
备注:19 pages, 10 figures
摘要:我们提出了一个原生的三量子比特纠缠门,利用工程相互作用,实现控制-控制-目标和控制-目标-目标的操作在一个单一的一致的步骤。与传统的分解成多个双量子比特门不同,我们的混合优化方法选择性地放大了所需的相互作用,同时抑制了不必要的耦合,从而在计算子空间及其他空间中产生了强大的性能。这种新型门可以归类为交叉谐振门。我们可以利用它在几个方面,例如,在GHZ三重态的准备,托夫里类逻辑演示与多体相互作用,并在实现一个受控的ZZ门。后者将两个数据量子比特的奇偶校验直接映射到测量量子比特上,从而在表面编码量子纠错中实现更快和更高保真的稳定器测量。在所有这些例子中,我们表明,三量子比特门的性能在希尔伯特空间大小上仍然是鲁棒的,这一点通过增加总激发数的测试得到了证实。这项工作为共同设计电路架构和控制协议奠定了基础,这些协议利用原生多量子位相互作用作为下一代超导量子处理器的核心元素。
摘要 :We present a native three-qubit entangling gate that exploits engineered interactions to realize control-control-target and control-target-target operations in a single coherent step. Unlike conventional decompositions into multiple two-qubit gates, our hybrid optimization approach selectively amplifies desired interactions while suppressing unwanted couplings, yielding robust performance across the computational subspace and beyond. The new gate can be classified as a cross-resonance gate. We show it can be utilized in several ways, for example, in GHZ triplet state preparation, Toffoli-class logic demonstrations with many-body interactions, and in implementing a controlled-ZZ gate. The latter maps the parity of two data qubits directly onto a measurement qubit, enabling faster and higher-fidelity stabilizer measurements in surface-code quantum error correction. In all these examples, we show that the three-qubit gate performance remains robust across Hilbert space sizes, as confirmed by testing under increasing total excitation numbers. This work lays the foundation for co-designing circuit architectures and control protocols that leverage native multiqubit interactions as core elements of next-generation superconducting quantum processors.


【15】Physics-Informed Deep Contrast Source Inversion: A Unified Framework for Inverse Scattering Problems
标题:基于物理学的深度对比源倒置:逆散射问题的统一框架
链接:https://arxiv.org/abs/2508.10555

作者:n, Daoqi Liu, Hongyu Zhou, Maokun Li, Shenheng Xu, Fan Yang
摘要:逆散射问题在电磁成像和医学诊断中至关重要,但其非线性和不同的测量场景带来了挑战。本文提出了一种基于物理信息的深对比度源反演框架(DeepCSI),用于在各种测量条件下快速准确地重建介质。受对比度源反演(CSI)和神经算子方法的启发,采用残差多层感知器(ResMLP)对不同发射源激励下感兴趣区域的电流分布进行建模,有效地线性化了非线性逆散射问题,显著降低了传统全波形反演的计算成本.通过将介质参数建模为可学习的张量,并利用集成了状态方程损失、数据方程损失和总变分正则化的混合损失函数,DeepCSI建立了一个完全可微的框架,用于网络参数和介质属性的联合优化。与传统方法相比,DeepCSI在简单性和通用建模能力方面具有优势,适用于各种测量场景,包括无相位和多频率观测。仿真和实验表明,DeepCSI在全数据、无相位数据和多频条件下实现了高精度、鲁棒的重建,优于传统CSI方法,为复杂逆散射问题提供了一种高效、通用的解决方案。
摘要:Inverse scattering problems are critical in electromagnetic imaging and medical diagnostics but are challenged by their nonlinearity and diverse measurement scenarios. This paper proposes a physics-informed deep contrast source inversion framework (DeepCSI) for fast and accurate medium reconstruction across various measurement conditions. Inspired by contrast source inversion (CSI) and neural operator methods, a residual multilayer perceptron (ResMLP) is employed to model current distributions in the region of interest under different transmitter excitations, effectively linearizing the nonlinear inverse scattering problem and significantly reducing the computational cost of traditional full-waveform inversion. By modeling medium parameters as learnable tensors and utilizing a hybrid loss function that integrates state equation loss, data equation loss, and total variation regularization, DeepCSI establishes a fully differentiable framework for joint optimization of network parameters and medium properties. Compared with conventional methods, DeepCSI offers advantages in terms of simplicity and universal modeling capabilities for diverse measurement scenarios, including phase-less and multi-frequency observation. Simulations and experiments demonstrate that DeepCSI achieves high-precision, robust reconstruction under full-data, phaseless data, and multifrequency conditions, outperforming traditional CSI methods and providing an efficient and universal solution for complex inverse scattering problems.
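摘要中的混合损失由状态方程损失、数据方程损失和总变分正则组成;下面用PyTorch给出其组合方式的示意(两类残差在此以随机张量代替,权重均为假设):

```python
import torch

def total_variation(eps_map):
    # 二维介质参数图的总变分正则项
    return ((eps_map[1:, :] - eps_map[:-1, :]).abs().sum()
            + (eps_map[:, 1:] - eps_map[:, :-1]).abs().sum())

def hybrid_loss(state_res, data_res, eps_map, w_state=1.0, w_data=1.0, w_tv=1e-3):
    # 状态方程残差 + 数据方程残差 + TV 正则, 权重为示意值
    return (w_state * state_res.pow(2).mean()
            + w_data * data_res.pow(2).mean()
            + w_tv * total_variation(eps_map))

eps_map = torch.rand(32, 32, requires_grad=True)   # 可学习的介质参数张量
loss = hybrid_loss(torch.randn(10), torch.randn(10), eps_map)
loss.backward()
print(float(loss))
```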


【16】Virtual Sensing for Solder Layer Degradation and Temperature Monitoring in IGBT Modules
标题:IGBT模块中焊层退化的虚拟传感和温度监测
链接:https://arxiv.org/abs/2508.10515

作者:golo, Monika Stipsitz, Helios Sanchis-Alepuz
备注:Andrea Urgolo and Monika Stipsitz contributed equally to this work
摘要:监测绝缘栅双极晶体管(IGBT)模块的退化状态对于确保电力电子系统的可靠性和寿命至关重要,尤其是在安全关键型和高性能应用中。然而,由于内部元件的物理不可接近性和恶劣的环境,直接测量关键退化指标(如结温、焊料疲劳或分层)仍然具有挑战性。在这种情况下,基于机器学习的虚拟传感提供了一种很有前途的替代方案,它可以弥合从可行的传感器放置到相关但不可访问的位置之间的差距。本文探讨了基于有限数量的物理传感器估计焊料层退化状态的可行性,以及相应的全温度图。基于一个特定的退化模式的合成数据,我们得到了一个高精度的退化焊料面积的估计(1.17%的平均绝对误差),并能够重现IGBT的表面温度的最大相对误差为4.56%(对应于0.37%的平均相对误差)。
摘要:Monitoring the degradation state of Insulated Gate Bipolar Transistor (IGBT) modules is essential for ensuring the reliability and longevity of power electronic systems, especially in safety-critical and high-performance applications. However, direct measurement of key degradation indicators - such as junction temperature, solder fatigue or delamination - remains challenging due to the physical inaccessibility of internal components and the harsh environment. In this context, machine learning-based virtual sensing offers a promising alternative by bridging the gap from feasible sensor placement to the relevant but inaccessible locations. This paper explores the feasibility of estimating the degradation state of solder layers, and the corresponding full temperature maps based on a limited number of physical sensors. Based on synthetic data of a specific degradation mode, we obtain a high accuracy in the estimation of the degraded solder area (1.17% mean absolute error), and are able to reproduce the surface temperature of the IGBT with a maximum relative error of 4.56% (corresponding to an average relative error of 0.37%).


【17】Mo' Memory, Mo' Problems: Stream-Native Machine Unlearning
标题:Mo“记忆,Mo”问题:流原生机器遗忘
链接:https://arxiv.org/abs/2508.10193

作者:ewart
摘要:Machine unlearning work assumes a static, i.i.d training environment that doesn't truly exist. Modern ML pipelines need to learn, unlearn, and predict continuously on production streams of data. We translate the notion of the batch unlearning scenario to the online setting using notions of regret, sample complexity, and deletion capacity. We further tighten regret bounds to a logarithmic $\mathcal{O}(\ln{T})$, a first for a machine unlearning algorithm. And we swap out an expensive Hessian inversion with online variant of L-BFGS optimization, removing a memory footprint that scales linearly with time. Such changes extend the lifespan of an ML model before expensive retraining, making for a more efficient unlearning process.


机器翻译由腾讯交互翻译提供,仅供参考

点击“阅读原文”获取带摘要的学术速递
