
机器学习学术速递[8.6]

arXiv每日学术速递

点击阅读原文访问arxivdaily.com,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏等功能!


cs.LG 方向,今日共计159篇


大模型相关(18篇)

【1】No LLM Solved Yu Tsumura's 554th Problem
标题:没有任何LLM能解决Yu Tsumura的第554个问题
链接:https://arxiv.org/abs/2508.03685

作者:eder, William Hart
备注:67 pages
摘要:我们表明,与由近期夺得的金牌所激发的对LLM解题能力的乐观情绪相反,存在这样一个问题——Yu Tsumura的第554个问题——它 a) 在证明复杂度上属于IMO题目的范围,b) 不是曾给LLM带来困难的组合数学问题,c) 所需的证明技巧少于典型的IMO难题,d) 有公开可得的解答(很可能已在LLM的训练数据中),并且 e) 任何现有的现成LLM(无论商业还是开源)都无法轻易解决它。
摘要:We show, contrary to the optimism about LLM's problem-solving abilities, fueled by the recent gold medals that were attained, that a problem exists -- Yu Tsumura's 554th problem -- that a) is within the scope of an IMO problem in terms of proof sophistication, b) is not a combinatorics problem which has caused issues for LLMs, c) requires fewer proof techniques than typical hard IMO problems, d) has a publicly available solution (likely in the training data of LLMs), and e) that cannot be readily solved by any existing off-the-shelf LLM (commercial or open-source).


【2】Self-Questioning Language Models
标题:自我质疑语言模型
链接:https://arxiv.org/abs/2508.03682

作者:, Mihir Prabhudesai, Katerina Fragkiadaki, Hao Liu, Deepak Pathak
摘要:大型语言模型能否在没有外部数据的情况下,通过生成自己的问题和答案来自我提升?我们假设,只要给预训练语言模型一个指定主题的提示(例如代数应用题),并要求模型自行生成问题,它就能提高自身的推理能力。为此,我们提出了自我提问语言模型(SQLM):一个非对称自博弈框架,其中提议者(proposer)根据给定主题为求解者(solver)生成问题,求解者则尝试回答。提议者和求解者都通过强化学习进行训练。如果问题不太容易也不太难,提议者会获得奖励;求解者则获得基于多数投票的奖励,在没有真实答案的情况下,多数投票被用作正确性的代理。对于编程任务,提议者可以改为生成用于验证的单元测试。我们在三个基准上研究了这一非对称自博弈框架:三位数乘法、OMEGA基准中的代数问题以及Codeforces编程题。通过不断生成更有趣的问题并尝试解决它们,语言模型可以在不接触任何人工整理的训练数据集的情况下提升下游基准表现。
摘要:Can large language models improve without external data -- by generating their own questions and answers? We hypothesize that a pre-trained language model can improve its reasoning skills given only a single prompt specifying the topic (e.g., algebra word problems) and asking the model to generate its own questions. To do this, we propose Self-Questioning Language Models (SQLM): an asymmetric self-play framework where a proposer is given the topic and generates a question for a solver, who tries to answer it. Both the proposer and solver are trained via reinforcement learning. The proposer receives a reward if the problem is not too easy or too difficult, and the solver receives a reward based on majority voting, a proxy for correctness in the absence of ground-truth answers. For coding, the proposer can instead generate unit tests which are used for verification. We study this asymmetric self-play framework on three benchmarks: three-digit multiplication, algebra problems from the OMEGA benchmark, and programming problems from Codeforces. By continually generating more interesting problems and attempting to solve them, language models can improve on downstream benchmarks without access to any curated training datasets.
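下面给出一段概念性的Python示意,帮助理解摘要中描述的非对称自博弈奖励设计:当题目"不太容易也不太难"时提议者得到奖励,求解者在没有真实答案时以多数投票作为正确性代理。其中的阈值low/high与函数划分均为整理时的假设,并非论文的官方实现。

```python
from collections import Counter

def proposer_reward(solver_answers, low=0.2, high=0.8):
    """若求解者多数答案的占比落在 (low, high) 之间,视为题目难度适中,
    提议者获得奖励 1,否则为 0。阈值为示意用的假设值。"""
    counts = Counter(solver_answers)
    majority_answer, majority_count = counts.most_common(1)[0]
    ratio = majority_count / len(solver_answers)
    return (1.0 if low < ratio < high else 0.0), majority_answer

def solver_reward(answer, majority_answer):
    """在没有真实答案时,以多数投票结果作为正确性的代理。"""
    return 1.0 if answer == majority_answer else 0.0

# 用法示意:对同一道题采样 8 个求解者答案
answers = ["42", "42", "41", "42", "40", "42", "42", "39"]
r_prop, majority = proposer_reward(answers)
r_solv = [solver_reward(a, majority) for a in answers]
print(r_prop, majority, r_solv)
```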


【3】More Than a Score: Probing the Impact of Prompt Specificity on LLM Code Generation
标题:不止一个分数:探究提示具体程度对LLM代码生成的影响
链接:https://arxiv.org/abs/2508.03678

作者:Zi, Harshitha Menon, Arjun Guha
摘要:最先进的大型语言模型(LLM)在HumanEval等通用基准上能取得很高的pass@1,但在ParEval等专业测试集上表现不佳。这是因为LLM缺少领域知识,还是提示中给出的细节不足?为回答这一问题,我们提出了PartialOrderEval,它为任何代码生成基准增加一个从最简到最详细的提示偏序。我们将其应用于HumanEval以及ParEval的串行和OpenMP子集,度量pass@1如何随提示的具体程度变化。我们使用Llama-3.x和Qwen2.5-Coder进行的实验表明,不同任务对提示的敏感度各不相同;定性分析表明,明确的输入/输出规范、边界情况处理和逐步分解是提示细节带来改进的关键因素。
摘要:State-of-the-art Large Language Models (LLMs) achieve high pass@1 on general benchmarks like HumanEval but underperform on specialized suites such as ParEval. Is this due to LLMs missing domain knowledge or insufficient prompt detail is given? To answer this, we introduce PartialOrderEval, which augments any code generation benchmark with a partial order of prompts from minimal to maximally detailed. Applying it to HumanEval and both serial and OpenMP subsets of ParEval, we measure how pass@1 scales with prompt specificity. Our experiments with Llama-3.x and Qwen2.5-Coder demonstrate varying degrees of prompt sensitivity across different tasks, and a qualitative analysis highlights explicit I/O specifications, edge-case handling, and stepwise breakdowns as the key drivers of prompt detail improvement.
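作为参考,下面给出pass@k常用的无偏估计(Chen et al., 2021提出,并非本文的新方法)的最简实现;PartialOrderEval所做的,就是在提示从最简到最详细的偏序上比较这一指标。示例中的采样数与通过数仅为假设。

```python
from math import comb

def pass_at_k(n, c, k):
    """pass@k 的无偏估计:n 为采样总数,c 为通过测试的样本数,k 为预算。"""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 示意:同一道题在"最简提示"与"最详细提示"下各采样 20 次
print(pass_at_k(n=20, c=3, k=1))    # 0.15
print(pass_at_k(n=20, c=12, k=1))   # 0.60
```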


【4】LLMDistill4Ads: Using Cross-Encoders to Distill from LLM Signals for Advertiser Keyphrase Recommendations at eBay
标题:LLMDistill4Ads:使用交叉编码器从LLM信号中蒸馏,用于eBay的广告主关键词推荐
链接:https://arxiv.org/abs/2508.03628

作者:y, Benjamin Braun, Naveen Ravipati, Hansi Wu, Binbin Li
摘要:eBay会向卖家推荐可供竞价的关键词,以提升其广告活动的效果。这些关键词的相关性至关重要,既可避免不相关商品挤占搜索系统,也能维持卖家的积极感受。关键词推荐必须与卖家和搜索系统对拍卖的判断保持一致。由于难以大规模获得负面的人工判断,采用LLM作为评判者(LLM-as-a-judge)来模拟卖家判断已成为多项研究的惯例。本研究提出了一种新颖的两步LLM蒸馏流程,从LLM评判者中蒸馏知识,以消除点击数据中的各种偏差对我们基于嵌入的检索(EBR)模型的影响。我们采用多任务训练方法,通过交叉编码器助手将LLM教师蒸馏到双编码器学生中,最终用学生双编码器检索相关的广告主关键词。结果表明,在多任务训练设置中引入来自LLM的知识蒸馏流程,可以提升双编码器在eBay检索相关广告主关键词的性能。
摘要 :Sellers at eBay are recommended keyphrases to bid on to enhance the performance of their advertising campaigns. The relevance of these keyphrases is crucial in avoiding the overcrowding of search systems with irrelevant items and maintaining a positive seller perception. It is essential that keyphrase recommendations align with both seller and Search judgments regarding auctions. Due to the difficulty in procuring negative human judgment at scale, employing LLM-as-a-judge to mimic seller judgment has been established as the norm in several studies. This study introduces a novel two-step LLM distillation process from a LLM-judge used to debias our Embedding Based Retrieval (EBR) model from the various biases that exist in click-data. We distill from an LLM teacher via a cross-encoder assistant into a bi-encoder student using a multi-task training approach, ultimately employing the student bi-encoder to retrieve relevant advertiser keyphrases. We show that integrating a knowledge distillation process from LLMs in a multi-task training setup enhances bi-encoder performance in retrieving relevant advertiser keyphrases at eBay.


【5】Tackling Distribution Shift in LLM via KILO: Knowledge-Instructed Learning for Continual Adaptation
标题:通过KILO应对LLM中的分布偏移:面向持续适应的知识指导学习
链接:https://arxiv.org/abs/2508.03571

作者:akhiroh, Thomas Fevens
摘要:大型语言模型(LLM)在面对领域转换时经常会出现性能下降,这主要是由于灾难性的遗忘。在这项工作中,我们提出了KILO(知识指导学习持续适应),一种新的持续学习框架,集成了动态知识图与指令调整。通过利用检索到的特定领域的知识作为培训过程中的指导,KILO增强了对新领域的适应性和对以前获得的知识的保留。我们在WikiText-103上预训练我们的模型,并评估四个不同目标域的顺序适应:BioASQ,SciQ,TweetEval和MIND。我们的实验表明,KILO在向后迁移、向前迁移、F1分数、留存率和训练效率方面始终优于强基线,包括持续微调、ERNIE 2.0和CPT。这些结果突出了结构化知识检索和教学相结合的有效性,以克服在持续学习的情况下域转移的挑战。
摘要:Large Language Models (LLMs) often suffer from performance degradation when faced with domain shifts, primarily due to catastrophic forgetting. In this work, we propose KILO (Knowledge-Instructed Learning for Continual Adaptation), a novel continual learning framework that integrates dynamic knowledge graphs with instruction tuning. By leveraging retrieved domain-specific knowledge as guidance during training, KILO enhances both adaptability to new domains and retention of previously acquired knowledge. We pretrain our model on WikiText-103 and evaluate sequential adaptation across four diverse target domains: BioASQ, SciQ, TweetEval, and MIND. Our experiments demonstrate that KILO consistently outperforms strong baselines, including continual fine-tuning, ERNIE 2.0, and CPT, in terms of backward transfer, forward transfer, F1 score, retention rate, and training efficiency. These results highlight the effectiveness of combining structured knowledge retrieval and instruction prompting to overcome domain shift challenges in continual learning scenarios.


【6】BitsAI-Fix: LLM-Driven Approach for Automated Lint Error Resolution in Practice
标题:BitsAI-Fix:LLM驱动的方法,用于在实践中自动修复lint错误
链接:https://arxiv.org/abs/2508.03487

作者:Li, Qi Long, Zhiyuan Yao, Jian Xu, Lintao Xie, Xu He, Lu Geng, Xin Han, Yueyan Chen, Wenbo Duan
摘要:随着企业代码库的规模和复杂性不断增长,lint错误的数量远远超过了工程师的手动修复能力,导致技术债务不断积累,阻碍了开发效率。本文介绍了BitsAI-Fix,这是一种基于大型语言模型(LLM)的自动化lint错误修复工作流,旨在解决工业规模环境中的这一关键挑战。BitsAI-Fix使用tree-sitter进行上下文扩展,并通过专门训练的LLM生成搜索和替换格式补丁,然后进行lint扫描重新验证以输出最终修复结果。此外,我们的方法引入了一种创新的渐进式强化学习(RL)训练策略,可以在项目冷启动阶段自动获取可验证的训练数据,并在系统部署后通过反馈收集在线样本来不断验证模型。此外,我们设计了一个有针对性的基于规则的奖励机制,结合格式奖励和正确性奖励,同时惩罚冗余的修改。我们还提出了一种“代码差异匹配”方法来持续跟踪在线有效性。在字节跳动的生产部署中,我们的解决方案已经支持了超过5,000名工程师,解决了超过12,000个静态分析问题,实现了约85%的修复准确率,每周约有1,000名活跃的采用者。这项工作证明了基于LLM的代码修复解决方案在企业环境中的实际可行性,并为大规模工业场景中的自动代码修复提供了参考。
摘要:As enterprise codebases continue to grow in scale and complexity, the volume of lint errors far exceeds engineers' manual remediation capacity, leading to continuous accumulation of technical debt and hindered development efficiency. This paper presents BitsAI-Fix, an automated lint error remediation workflow based on Large Language Models (LLMs), designed to address this critical challenge in industrial-scale environments. BitsAI-Fix employs tree-sitter for context expansion and generates search-and-replace format patches through specially trained LLMs, followed by lint scan re-verification to output final remediation results. Additionally, our approach introduces an innovative progressive reinforcement learning (RL) training strategy that can automatically acquire verifiable training data during the project cold-start phase and continuously iterate the model by collecting online samples through feedback after system deployment. Furthermore, we designed a targeted rule-based reward mechanism that combines format rewards and correctness rewards while penalizing redundant modifications. We also propose a "code diff matching" methodology to continuously track online effectiveness. In production deployment at ByteDance, our solution has supported over 5,000 engineers, resolved more than 12,000 static analysis issues, achieved approximately 85% remediation accuracy, with around 1,000 weekly active adopters. This work demonstrates the practical feasibility of LLM-based code remediation solutions in enterprise environments and serves as a reference for automated code fix in large-scale industrial scenarios.


【7】R2GenKG: Hierarchical Multi-modal Knowledge Graph for LLM-based Radiology Report Generation
标题:R2GenKG:用于基于LLM的放射学报告生成的分层多模态知识图
链接:https://arxiv.org/abs/2508.03426

作者:ng, Yuhan Qiao, Xiao Wang, Fuling Wang, Yuxiang Zhang, Dengdi Sun
摘要:X射线医疗报告生成是人工智能在医疗保健领域的重要应用之一。在大型基础模型的支持下,医疗报告生成质量显著提升。然而,幻觉和疾病诊断能力薄弱等挑战仍然存在。在本文中,我们首先基于真实医疗报告,使用GPT-4o构建了一个大规模多模态医学知识图(称为M3KG)。针对CheXpert Plus数据集,它包含2477个实体、3种关系、37424个三元组和6943个疾病感知视觉标记。随后,我们对其采样以获得多粒度语义图,并使用R-GCN编码器进行特征提取。对于输入的X射线图像,我们采用Swin-Transformer提取视觉特征,并通过交叉注意力与知识进行交互。视觉标记被送入Q-Former,并利用另一个交叉注意力检索疾病感知视觉标记。最后,我们采用大语言模型将语义知识图、输入X射线图像和疾病感知视觉标记映射为语言描述。在多个数据集上的大量实验充分验证了我们提出的知识图和X射线报告生成框架的有效性。本文的源代码将在https://github.com/Event-AHU/Medical_Image_Analysis上发布。
摘要:X-ray medical report generation is one of the important applications of artificial intelligence in healthcare. With the support of large foundation models, the quality of medical report generation has significantly improved. However, challenges such as hallucination and weak disease diagnostic capability still persist. In this paper, we first construct a large-scale multi-modal medical knowledge graph (termed M3KG) based on the ground truth medical report using the GPT-4o. It contains 2477 entities, 3 kinds of relations, 37424 triples, and 6943 disease-aware vision tokens for the CheXpert Plus dataset. Then, we sample it to obtain multi-granularity semantic graphs and use an R-GCN encoder for feature extraction. For the input X-ray image, we adopt the Swin-Transformer to extract the vision features and interact with the knowledge using cross-attention. The vision tokens are fed into a Q-former and retrieved the disease-aware vision tokens using another cross-attention. Finally, we adopt the large language model to map the semantic knowledge graph, input X-ray image, and disease-aware vision tokens into language descriptions. Extensive experiments on multiple datasets fully validated the effectiveness of our proposed knowledge graph and X-ray report generation framework. The source code of this paper will be released on https://github.com/Event-AHU/Medical_Image_Analysis.


【8】Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models
标题:探索小型语言模型中训练后量化的分层信息有效性
链接:https://arxiv.org/abs/2508.03332

作者:Qingyao Yang, Dirui Xie, Wendong Xu, Wenyong Zhou, Haobo Liu, Zhengwu Liu, Ngai Wong
备注:low-bit quantization
摘要:拥有数十亿参数的大型语言模型通常是过度配置的:许多层贡献的独有信息很少,却在推理过程中占据了大部分内存和能耗。我们提出了LieQ,一个度量驱动的训练后量化框架,用于解决sub-7B模型在极低比特压缩下保持精度这一关键挑战。我们的方法引入了三个互补的逐层诊断指标——困惑度下降(Perplexity Drop)、表示紧凑性(Representational Compactness)和Top-k能量增益(Top-k Energy Gain)——揭示了各层之间的规范分工,从而无需梯度更新即可自动分配位宽。与现有方法在2-3比特精度下精度严重下降不同,LieQ实现了最先进的压缩-精度权衡:在Qwen3-4B上,2.05比特量化可恢复FP16基线性能的95.9%,在七个zero-shot推理任务上平均比GPTQ高19.7%、比AWQ高18.1%。应用于LLaMA3.2-3B时,LieQ在2.07比特精度下保持98.2%的基线精度,同时实现4倍内存缩减,为在资源受限的边缘设备上部署小型语言模型建立了新范式。
摘要:Large language models with billions of parameters are often over-provisioned: many layers contribute little unique information yet dominate the memory and energy footprint during inference. We present LieQ, a metric-driven post-training quantization framework that addresses the critical challenge of maintaining accuracy in sub-7B models under extreme low-bit compression. Our method introduces three complementary layer-wise diagnostics-Perplexity Drop, Representational Compactness, and Top-k Energy Gain -that reveal a canonical division of labour across layers, enabling automatic bit-width allocation without gradient updates. Unlike existing approaches that suffer severe accuracy degradation at 2-3 bits precision, LieQ achieves state-of-the-art compression-accuracy trade-offs: on Qwen3-4B, it recovers 95.9% of FP16 baseline performance at 2.05-bit quantization, outperforming GPTQ by 19.7% and AWQ by 18.1% on average across seven zero-shot reasoning tasks. Applied to LLaMA3.2-3B, LieQ maintains 98.2% of baseline accuracy at 2.07-bit precision while enabling 4x memory reduction, establishing new paradigms for deploying small language models on resource-constrained edge devices.
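下面是一段示意代码,说明"基于逐层诊断分数、在平均位宽预算内自动分配位宽"这一思路;贪心升级策略与分数取值均为整理时的假设,并非LieQ论文的具体算法。

```python
import numpy as np

def allocate_bits(scores, budget_bits=2.5, choices=(2, 3, 4)):
    """按层重要性分数(如困惑度下降)从高到低贪心升级位宽,
    保证平均位宽不超过 budget_bits。仅为概念示意。"""
    bits = np.full(len(scores), min(choices), dtype=float)
    for layer in np.argsort(scores)[::-1]:        # 先升级最敏感的层
        for b in sorted(choices):
            if b <= bits[layer]:
                continue
            trial = bits.copy()
            trial[layer] = b
            if trial.mean() <= budget_bits:
                bits[layer] = b
    return bits

scores = np.array([0.9, 0.1, 0.4, 0.05, 0.7, 0.2])   # 假设的逐层诊断分数
print(allocate_bits(scores))   # 敏感层分到更多比特,平均位宽不超过 2.5
```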


【9】Light-IF: Endowing LLMs with Generalizable Reasoning via Preview and Self-Checking for Complex Instruction Following
标题:Light-IF:通过预览与自我检查实现复杂指令遵循,赋予LLM可泛化的推理能力
链接:https://arxiv.org/abs/2508.03178

作者:Wang, Liang Wen, Shousheng Jia, Xiangzheng Zhang, Liang Xu
备注:12 pages, 10 figures, 7 tables
摘要:尽管LLM推理能力的进步显著提升了其解决数学问题、编程任务和一般谜题的表现,但其准确遵循指令的能力仍不稳定,在面对更复杂的指令时尤其如此。我们的调查发现,思考阶段的惰性推理是导致指令遵循不佳的主要因素。为缓解这一问题,我们提出了一个综合框架,旨在实现包含预览和自我检查的严格推理过程,这对于满足严格的指令约束至关重要。具体来说,我们首先生成带有复杂约束的指令,并通过过滤获得有效提示,得到三个不同的提示数据集,分别归类为困难(hard)、容易(easy)和通过(pass)。然后,我们对pass类提示进行拒绝采样,整理出一个规模小但质量高的数据集,实现模型的冷启动初始化,并促使其适应有效的推理模式。随后,我们采用保熵监督微调(Entropy-SFT)策略,结合由基于规则的密集奖励引导的token级熵自适应强化学习(TEA-RL)。这种方法促使模型转变其推理机制,最终形成包含预览和自我检查的可泛化推理能力。在指令遵循基准上的大量实验表明,各种规模的模型性能均有显著提升。值得注意的是,我们的Light-IF-32B模型超越了DeepSeek-R1等更大的开源模型以及Doubao-1.6等闭源模型。
摘要:While advancements in the reasoning abilities of LLMs have significantly enhanced their performance in solving mathematical problems, coding tasks, and general puzzles, their effectiveness in accurately adhering to instructions remains inconsistent, particularly with more complex directives. Our investigation identifies lazy reasoning during the thinking stage as the primary factor contributing to poor instruction adherence. To mitigate this issue, we propose a comprehensive framework designed to enable rigorous reasoning processes involving preview and self-checking, essential for satisfying strict instruction constraints. Specifically, we first generate instructions with complex constraints and apply a filtering process to obtain valid prompts, resulting in three distinct prompt datasets categorized as hard, easy, and pass. Then, we employ rejection sampling on the pass prompts to curate a small yet high-quality dataset, enabling a cold-start initialization of the model and facilitating its adaptation to effective reasoning patterns. Subsequently, we employ an entropy-preserving supervised fine-tuning (Entropy-SFT) strategy coupled with token-wise entropy-adaptive (TEA-RL) reinforcement learning guided by rule-based dense rewards. This approach encourages the model to transform its reasoning mechanism, ultimately fostering generalizable reasoning abilities that encompass preview and self-checking. Extensive experiments conducted on instruction-following benchmarks demonstrate remarkable performance improvements across various model scales. Notably, our Light-IF-32B model surpasses both larger open-source models such as DeepSeek-R1 and closed-source models like Doubao-1.6.


【10】Estimating Worst-Case Frontier Risks of Open-Weight LLMs
标题:估计开放权重LLM的最坏情况前沿风险
链接:https://arxiv.org/abs/2508.03153

作者:ace, Olivia Watkins, Miles Wang, Kai Chen, Chris Koch
摘要:在本文中,我们研究了发布gpt-oss的最坏情况前沿风险。我们引入了恶意微调(MFT),即尝试通过微调gpt-oss使其在生物学和网络安全两个领域尽可能强大,以激发其最大能力。为了最大化生物风险,我们整理了与威胁制造相关的任务,并在带有网页浏览功能的RL环境中训练gpt-oss。为了最大化网络安全风险,我们在智能体编码环境中训练gpt-oss以解决夺旗(CTF)挑战。我们在前沿风险评估中将这些MFT模型与开放权重和封闭权重的LLM进行比较。与前沿封闭权重模型相比,MFT后的gpt-oss表现不及OpenAI o3,而o3在生物风险和网络安全方面均低于Preparedness High能力水平。与开放权重模型相比,gpt-oss可能会略微提升生物能力,但不会实质性地推进前沿。综合来看,这些结果支持了我们发布该模型的决定,我们希望我们的MFT方法可以作为有用的指导,用于估计未来开放权重模型发布带来的危害。
摘要:In this paper, we study the worst-case frontier risks of releasing gpt-oss. We introduce malicious fine-tuning (MFT), where we attempt to elicit maximum capabilities by fine-tuning gpt-oss to be as capable as possible in two domains: biology and cybersecurity. To maximize biological risk (biorisk), we curate tasks related to threat creation and train gpt-oss in an RL environment with web browsing. To maximize cybersecurity risk, we train gpt-oss in an agentic coding environment to solve capture-the-flag (CTF) challenges. We compare these MFT models against open- and closed-weight LLMs on frontier risk evaluations. Compared to frontier closed-weight models, MFT gpt-oss underperforms OpenAI o3, a model that is below Preparedness High capability level for biorisk and cybersecurity. Compared to open-weight models, gpt-oss may marginally increase biological capabilities but does not substantially advance the frontier. Taken together, these results contributed to our decision to release the model, and we hope that our MFT approach can serve as useful guidance for estimating harm from future open-weight releases.


【11】Frontier: Simulating the Next Generation of LLM Inference Systems
标题:前沿:模拟下一代LLM推理系统
链接:https://arxiv.org/abs/2508.03148

作者:eng, Xin Tan, Kin Hang Sew, Yimin Jiang, Yibo Zhu, Hong Xu
摘要:随着混合专家(MoE)模型以及将预填充/解码(PD)或注意力/FFN(AF)等组件解耦以实现异构扩展的分离式架构的兴起,大型语言模型(LLM)推理正变得日益复杂。现有的模拟器是为同址部署的稠密模型设计的,无法捕捉这些新兴范式复杂的系统动态。我们提出Frontier,一个针对这一新格局从零开始设计的高保真模拟器。Frontier引入了统一框架来对同址部署系统和分离式系统进行建模,并为具有专家并行(EP)的MoE推理提供原生支持。它能够模拟跨集群专家路由等复杂工作流,以及用于延迟隐藏的高级流水线策略。为了确保保真度和可用性,Frontier采用了改进的算子模型以提高准确性。Frontier使社区能够设计和优化大规模LLM推理的未来。
摘要 :Large Language Model (LLM) inference is growing increasingly complex with the rise of Mixture-of-Experts (MoE) models and disaggregated architectures that decouple components like prefill/decode (PD) or attention/FFN (AF) for heterogeneous scaling. Existing simulators, architected for co-located, dense models, are unable to capture the intricate system dynamics of these emerging paradigms. We present Frontier, a high-fidelity simulator designed from the ground up for this new landscape. Frontier introduces a unified framework to model both co-located and disaggregated systems, providing native support for MoE inference with expert parallelism (EP). It enables the simulation of complex workflows like cross-cluster expert routing and advanced pipelining strategies for latency hiding. To ensure fidelity and usability, Frontier incorporates refined operator models for improved accuracy. Frontier empowers the community to design and optimize the future of LLM inference at scale.


【12】Unified Tool Integration for LLMs: A Protocol-Agnostic Approach to Function Calling
标题:LLM的统一工具集成:函数调用的协议不可知方法
链接:https://arxiv.org/abs/2508.02979

作者:, Rick Stevens
备注:arXiv admin note: substantial text overlap with arXiv:2507.10593
摘要:工具增强的大型语言模型(LLM)的激增创建了一个碎片化的生态系统,开发人员必须导航多个协议,手动模式定义和复杂的执行工作流。我们提出了一个统一的方法来解决这一挑战的工具集成,抽象的协议差异,同时优化执行性能。我们的解决方案演示了协议无关设计原则如何通过自动模式生成、双模式并发执行和无缝多源工具管理来显著降低开发开销。实验结果表明,在集成场景中代码减少了60-80%,通过优化并发性能提高了3.1倍,并与现有的函数调用标准完全兼容。这项工作既为工具集成架构提供了理论见解,也为现实世界的LLM应用程序开发提供了实际解决方案。
摘要:The proliferation of tool-augmented Large Language Models (LLMs) has created a fragmented ecosystem where developers must navigate multiple protocols, manual schema definitions, and complex execution workflows. We address this challenge by proposing a unified approach to tool integration that abstracts protocol differences while optimizing execution performance. Our solution demonstrates how protocol-agnostic design principles can significantly reduce development overhead through automated schema generation, dual-mode concurrent execution, and seamless multi-source tool management. Experimental results show 60-80% code reduction across integration scenarios, performance improvements up to 3.1x through optimized concurrency, and full compatibility with existing function calling standards. This work contributes both theoretical insights into tool integration architecture and practical solutions for real-world LLM application development.
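摘要中提到的"自动模式(schema)生成",大致可以理解为从Python函数签名自动推导出OpenAI函数调用风格的JSON Schema。下面是一段通用示意代码(get_weather、类型映射等均为假设),并非该论文或其配套库的真实实现。

```python
import inspect
from typing import get_type_hints

def function_to_schema(func):
    """从函数签名与docstring自动生成函数调用工具的JSON Schema(概念示意)。"""
    hints = get_type_hints(func)
    type_map = {int: "integer", float: "number", str: "string", bool: "boolean"}
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": type_map.get(hints.get(name, str), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {"type": "object",
                           "properties": properties,
                           "required": required},
        },
    }

def get_weather(city: str, unit: str = "celsius") -> str:
    """查询指定城市当前天气(假设的示例工具)。"""
    return f"{city}: 25 {unit}"

print(function_to_schema(get_weather))
```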


【13】LLM-based IR-system for Bank Supervisors
标题:基于LLM的银行监管者IR系统
链接:https://arxiv.org/abs/2508.02945

作者:ab
摘要:银行监管者面临着确保新措施与历史先例保持一致的复杂任务。为应对这一挑战,我们引入了一个新颖的信息检索(IR)系统,专门帮助监管者起草既一致又有效的措施。该系统接收现场调查的结果,然后从综合数据库中检索最相关的历史调查结果及其对应措施,为监管者针对新发现撰写有依据的措施提供坚实基础。该IR系统综合运用词汇、语义以及《资本要求条例》(CRR)模糊集匹配技术,确保检索到的结果与当前案例高度吻合。系统的性能,尤其是在数据仅部分标注的情况下,通过蒙特卡罗方法得到验证,展示了其鲁棒性和准确性。经基于Transformer的去噪自编码器微调后,最终模型的平均精度均值(MAP@100)为0.83,平均倒数排名(MRR@100)为0.92。这些分数超过了BM25等独立词汇模型和BERT类语义模型。
摘要:Bank supervisors face the complex task of ensuring that new measures are consistently aligned with historical precedents. To address this challenge, we introduce a novel Information Retrieval (IR) System tailored to assist supervisors in drafting both consistent and effective measures. This system ingests findings from on-site investigations. It then retrieves the most relevant historical findings and their associated measures from a comprehensive database, providing a solid basis for supervisors to write well-informed measures for new findings. Utilizing a blend of lexical, semantic, and Capital Requirements Regulation (CRR) fuzzy set matching techniques, the IR system ensures the retrieval of findings that closely align with current cases. The performance of this system, particularly in scenarios with partially labeled data, is validated through a Monte Carlo methodology, showcasing its robustness and accuracy. Enhanced by a Transformer-based Denoising AutoEncoder for fine-tuning, the final model achieves a Mean Average Precision (MAP@100) of 0.83 and a Mean Reciprocal Rank (MRR@100) of 0.92. These scores surpass those of both standalone lexical models such as BM25 and semantic BERT-like models.


【14】Context-Adaptive Multi-Prompt LLM Embedding for Vision-Language Alignment
标题:上下文自适应多提示LLM嵌入以实现视觉语言对齐
链接:https://arxiv.org/abs/2508.02762

作者:, Anelia Angelova
摘要:我们提出了上下文自适应多提示嵌入,一种在视觉-语言对比学习中丰富语义表示的新方法。与依赖单一文本嵌入的标准CLIP风格模型不同,我们的方法引入多个结构化提示,每个提示包含一个不同的自适应token,用于捕获输入文本的不同语义方面。我们在一次前向传播中联合处理所有提示,并将得到的提示嵌入组合成统一的文本表示,从而实现与视觉特征在语义上更丰富的对齐。为了进一步促进语义多样性和表示质量,我们引入了多样性正则化损失和否定感知损失,鼓励各提示之间的专业化分工并提升对比判别能力。我们的方法在图像-文本和视频-文本检索基准上均取得了一致的改进。
摘要:We propose Context-Adaptive Multi-Prompt Embedding, a novel approach to enrich semantic representations in vision-language contrastive learning. Unlike standard CLIP-style models that rely on a single text embedding, our method introduces multiple structured prompts, each containing a distinct adaptive token that captures diverse semantic aspects of the input text. We process all prompts jointly in a single forward pass. The resulting prompt embeddings are combined into a unified text representation, enabling semantically richer alignment with visual features. To further promote semantic diversity and representation quality, we incorporate a diversity regularization loss and a negation-aware loss, encouraging specialization across prompts and improving contrastive discrimination. Our method achieves consistent improvements on both image-text and video-text retrieval benchmarks.


【15】SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference
标题:SmallKV:小模型辅助的KV缓存压缩补偿,以实现高效的LLM推理
链接:https://arxiv.org/abs/2508.02751

作者:Yajuan Peng, Cam-Tu Nguyen, Zuchao Li, Xiaoliang Wang, Hai Zhao, Xiaoming Fu
摘要:KV缓存驱逐已成为缓解LLM在长上下文场景中资源约束的有效方案。然而,现有的token级驱逐方法往往忽略两个关键方面:(1)其不可逆的驱逐策略无法适应解码过程中动态变化的注意力模式(显著性漂移问题);(2)它们对边缘重要的token和真正不重要的token一视同仁,而边缘token整体上对模型性能有重要影响(边缘信息过度压缩问题)。为了解决这些问题,我们基于不同规模LLM之间注意力矩阵的高度相似性设计了两种补偿机制,提出了SmallKV,一种由小模型辅助补偿的KV缓存压缩方法。SmallKV可以保持不同规模LLM之间的注意力匹配,从而:1)帮助较大的模型感知注意力中的全局重要信息;2)使用较小模型的注意力分数来近似较大模型中边缘token的注意力分数。在GSM8K、BBH、MT-Bench和LongBench等基准上的大量实验证明了SmallKV的有效性。此外,效率评估表明,SmallKV实现了比基线方法高1.75-2.56倍的吞吐量,凸显了其在资源受限环境中实现高效、高性能LLM推理的潜力。
摘要 :KV cache eviction has emerged as an effective solution to alleviate resource constraints faced by LLMs in long-context scenarios. However, existing token-level eviction methods often overlook two critical aspects: (1) their irreversible eviction strategy fails to adapt to dynamic attention patterns during decoding (the saliency shift problem), and (2) they treat both marginally important tokens and truly unimportant tokens equally, despite the collective significance of marginal tokens to model performance (the marginal information over-compression problem). To address these issues, we design two compensation mechanisms based on the high similarity of attention matrices between LLMs of different scales. We propose SmallKV, a small model assisted compensation method for KV cache compression. SmallKV can maintain attention matching between different-scale LLMs to: 1) assist the larger model in perceiving globally important information of attention; and 2) use the smaller model's attention scores to approximate those of marginal tokens in the larger model. Extensive experiments on benchmarks including GSM8K, BBH, MT-Bench, and LongBench demonstrate the effectiveness of SmallKV. Moreover, efficiency evaluations show that SmallKV achieves 1.75 - 2.56 times higher throughput than baseline methods, highlighting its potential for efficient and performant LLM inference in resource constrained environments.
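下面用一段简化代码示意"用小模型的注意力分数为边缘token排序"这一补偿思想:最重要的token依大模型注意力精确保留,边缘区间则借助小模型的分数近似筛选。数据与保留数量均为假设,并非SmallKV的官方实现。

```python
import numpy as np

def select_kv_tokens(attn_large, attn_small, keep_exact=4, keep_marginal=3):
    """概念示意:按大模型注意力保留 keep_exact 个最重要 token,
    其余 token 按小模型注意力近似排序,再保留 keep_marginal 个边缘 token。"""
    exact = set(int(i) for i in np.argsort(attn_large)[-keep_exact:])
    rest = [i for i in range(len(attn_large)) if i not in exact]
    marginal = sorted(rest, key=lambda i: attn_small[i], reverse=True)[:keep_marginal]
    return sorted(exact | set(marginal))

rng = np.random.default_rng(0)
attn_large = rng.random(16)                               # 大模型对16个缓存token的注意力
attn_small = attn_large + 0.1 * rng.standard_normal(16)   # 小模型给出的近似分数
print(select_kv_tokens(attn_large, attn_small))           # 最终保留的KV缓存token下标
```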


【16】A Bayesian Hybrid Parameter-Efficient Fine-Tuning Method for Large Language Models
标题:大型语言模型的贝叶斯混合参数有效微调方法
链接:https://arxiv.org/abs/2508.02711

作者:ai (1 and 2), Yang Liu (1 and 2), Yonghang Zhou (1 and 2), Jiaheng Xie (3), Daniel Dajun Zeng (4) ((1) School of Management, Hefei University of Technology, Hefei, China, (2) Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei, China, (3) Department of Accounting and MIS, Lerner College of Business and Economics, University of Delaware, Newark, Delaware, U.S., (4) Institute of Automation, Chinese Academy of Sciences, Beijing, China)
摘要:大型语言模型(LLM)在重塑世界方面表现出了变革性的潜力。由于这些模型是在一般语料库上预训练的,因此它们通常需要特定于域的微调,以优化专业业务应用程序的性能。由于其庞大的规模,参数有效的微调(PEFT)方法被广泛用于减少训练成本。其中,结合多种PEFT技术的混合PEFT方法取得了最好的性能。然而,现有的混合PEFT方法在微调LLM用于专业应用时面临两个主要挑战:(1)依赖于点估计,缺乏量化不确定性以进行可靠决策的能力,以及(2)难以动态适应新兴数据,缺乏适应现实世界情况的能力。我们提出了贝叶斯混合参数有效的微调(BH-PEFT),一种新的方法,将贝叶斯学习到混合PEFT。BH-PEFT结合了适配器、LoRA和前缀调整,以微调Transformer的前馈和注意层。通过将可学习参数建模为分布,BH-PEFT实现了不确定性量化。我们进一步提出了一种贝叶斯动态微调方法,其中最后一个后验作为下一轮的先验,从而能够有效地适应新数据。我们评估了BH-PEFT的业务任务,如情感分析,新闻分类和常识推理。结果表明,我们的方法优于现有的PEFT基线,使不确定性量化更可靠的决策,并提高了适应性,在动态场景。这项工作通过提出一种新的BH-PEFT方法和动态微调方法,支持现实世界中的不确定性感知和自适应决策,为业务分析和数据科学做出了贡献。
摘要:Large Language Models (LLMs) have demonstrated transformative potential in reshaping the world. As these models are pretrained on general corpora, they often require domain-specific fine-tuning to optimize performance in specialized business applications. Due to their massive scale, parameter-efficient fine-tuning (PEFT) methods are widely used to reduce training costs. Among them, hybrid PEFT methods that combine multiple PEFT techniques have achieved the best performance. However, existing hybrid PEFT methods face two main challenges when fine-tuning LLMs for specialized applications: (1) relying on point estimates, lacking the ability to quantify uncertainty for reliable decision-making, and (2) struggling to dynamically adapt to emerging data, lacking the ability to suit real-world situations. We propose Bayesian Hybrid Parameter-Efficient Fine-Tuning (BH-PEFT), a novel method that integrates Bayesian learning into hybrid PEFT. BH-PEFT combines Adapter, LoRA, and prefix-tuning to fine-tune feedforward and attention layers of the Transformer. By modeling learnable parameters as distributions, BH-PEFT enables uncertainty quantification. We further propose a Bayesian dynamic fine-tuning approach where the last posterior serves as the prior for the next round, enabling effective adaptation to new data. We evaluated BH-PEFT on business tasks such as sentiment analysis, news categorization, and commonsense reasoning. Results show that our method outperforms existing PEFT baselines, enables uncertainty quantification for more reliable decisions, and improves adaptability in dynamic scenarios. This work contributes to business analytics and data science by proposing a novel BH-PEFT method and dynamic fine-tuning approach that support uncertainty-aware and adaptive decision-making in real-world situations.
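摘要中"上一轮后验作为下一轮先验"的动态贝叶斯更新,可以用一维共轭高斯更新直观说明。这里用标量参数代替LoRA/Adapter的权重分布,数据漂移过程为假设,仅作概念演示,并非BH-PEFT的实现。

```python
import numpy as np

def bayes_update(prior_mean, prior_var, obs, obs_var=1.0):
    """已知观测方差的共轭高斯更新,返回后验均值与方差。"""
    post_var = 1.0 / (1.0 / prior_var + len(obs) / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs.sum() / obs_var)
    return post_mean, post_var

rng = np.random.default_rng(0)
mean, var = 0.0, 10.0                                   # 初始先验
for rnd, true_theta in enumerate([1.0, 1.5, 2.5]):      # 数据分布随时间漂移
    obs = true_theta + rng.standard_normal(50)
    mean, var = bayes_update(mean, var, obs)            # 上一轮后验作为本轮先验
    print(f"round {rnd}: posterior mean={mean:.3f}, var={var:.4f}")
```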


【17】ToolRegistry: A Protocol-Agnostic Tool Management Library for Function-Calling LLMs
标题:ToolRegistry:面向函数调用LLM的协议无关工具管理库
链接:https://arxiv.org/abs/2507.10593

作者
摘要:大型语言模型(LLM)应用程序越来越依赖外部工具来扩展其文本生成之外的能力。然而,当前的工具集成方法存在碎片化、协议限制和实现复杂等问题,导致大量开发开销。本文介绍了ToolRegistry,一个协议无关的工具管理库,通过统一接口简化工具的注册、表示、执行和生命周期管理。我们的评估表明,ToolRegistry可减少60-80%的工具集成代码,通过并发执行实现最高3.1倍的性能提升,并与OpenAI函数调用标准100%兼容。真实案例研究表明,在不同的集成场景中,开发效率和代码可维护性均有显著提升。ToolRegistry是开源的,可在https://github.com/Oaklight/ToolRegistry获取,完整文档见https://toolregistry.readthedocs.io/。
摘要:Large Language Model (LLM) applications are increasingly relying on external tools to extend their capabilities beyond text generation. However, current tool integration approaches suffer from fragmentation, protocol limitations, and implementation complexity, leading to substantial development overhead. This paper presents ToolRegistry, a protocol-agnostic tool management library that simplifies tool registration, representation, execution, and lifecycle management via a unified interface. Our evaluation demonstrates that ToolRegistry achieves 60-80% reduction in tool integration code, up to 3.1x performance improvements through concurrent execution, and 100% compatibility with OpenAI function calling standards. Real-world case studies show significant improvements in development efficiency and code maintainability across diverse integration scenarios. ToolRegistry is open-source and available at https://github.com/Oaklight/ToolRegistry, with comprehensive documentation at https://toolregistry.readthedocs.io/.


【18】Kronos: A Foundation Model for the Language of Financial Markets
标题:克洛诺斯:金融市场语言的基础模型
链接:https://arxiv.org/abs/2508.02739

作者:ongliang Fu, Shuo Chen, Bohan Zhao, Wei Xu, Changshui Zhang, Jian Li
摘要:以大型语言模型(LLM)为代表的大规模预训练范式的成功激发了时间序列基础模型(TSFM)的发展。然而,它们在金融烛台(K线)数据上的应用仍然有限,表现往往不及未经预训练的架构。此外,现有的TSFM往往忽略了波动率预测和合成数据生成等关键下游任务。为了解决这些限制,我们提出了Kronos,一个专为金融K线建模量身定制的统一、可扩展的预训练框架。Kronos引入了一个专门的tokenizer,将连续的市场信息离散化为token序列,同时保留价格动态和交易活动模式。我们在来自45个全球交易所、超过120亿条K线记录的大规模多市场语料上以自回归目标对Kronos进行预训练,使其能够学习细致入微的时间和跨资产表示。Kronos在各类金融任务的zero-shot设置中表现出色。在基准数据集上,Kronos将价格序列预测的RankIC较领先的TSFM提升93%,较最佳非预训练基线提升87%;在波动率预测上实现低9%的MAE,并将合成K线序列的生成保真度提升22%。这些结果确立了Kronos作为端到端金融时间序列分析的强大、通用的基础模型。我们的预训练模型公开于https://github.com/shiyu-coder/Kronos。
摘要 :The success of large-scale pre-training paradigm, exemplified by Large Language Models (LLMs), has inspired the development of Time Series Foundation Models (TSFMs). However, their application to financial candlestick (K-line) data remains limited, often underperforming non-pre-trained architectures. Moreover, existing TSFMs often overlook crucial downstream tasks such as volatility prediction and synthetic data generation. To address these limitations, we propose Kronos, a unified, scalable pre-training framework tailored to financial K-line modeling. Kronos introduces a specialized tokenizer that discretizes continuous market information into token sequences, preserving both price dynamics and trade activity patterns. We pre-train Kronos using an autoregressive objective on a massive, multi-market corpus of over 12 billion K-line records from 45 global exchanges, enabling it to learn nuanced temporal and cross-asset representations. Kronos excels in a zero-shot setting across a diverse set of financial tasks. On benchmark datasets, Kronos boosts price series forecasting RankIC by 93% over the leading TSFM and 87% over the best non-pre-trained baseline. It also achieves a 9% lower MAE in volatility forecasting and a 22% improvement in generative fidelity for synthetic K-line sequences. These results establish Kronos as a robust, versatile foundation model for end-to-end financial time series analysis. Our pre-trained model is publicly available at https://github.com/shiyu-coder/Kronos.
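下面是"把连续K线信息离散为token序列"这一思想的最简示意:对对数收益率和成交量变化做分位数分箱后组合成联合token。分箱数和模拟数据均为假设,Kronos实际使用的是专门训练的tokenizer,而非这种手工分箱。

```python
import numpy as np

def kline_to_tokens(close, volume, n_bins=16):
    """概念示意:分位数分箱 + 联合编码,词表大小为 n_bins * n_bins。"""
    log_ret = np.diff(np.log(close))
    vol_chg = np.diff(np.log(volume))
    def bin_ids(x):
        edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
        return np.digitize(x, edges)
    return bin_ids(log_ret) * n_bins + bin_ids(vol_chg)

rng = np.random.default_rng(1)
close = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal(200)))   # 模拟收盘价
volume = np.exp(10 + 0.3 * rng.standard_normal(200))               # 模拟成交量
tokens = kline_to_tokens(close, volume)
print(tokens[:10], tokens.max() < 16 * 16)
```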


Graph相关(图学习|图神经网络|图优化等)(4篇)

【1】Online Continual Graph Learning
标题:在线持续图学习
链接:https://arxiv.org/abs/2508.03283

作者:Donghi, Luca Pasa, Daniele Zambon, Cesare Alippi, Nicolò Navarin
备注:This work has been submitted to the IEEE for possible publication
摘要:持续学习(CL)的目标是在避免灾难性遗忘的同时逐步学习新任务。在线持续学习(OCL)则专注于从分布不断变化的连续数据流中高效学习。虽然最近的研究探索了利用图神经网络(GNN)在图上进行持续学习,但其中只有少数关注流式设置。然而,许多现实世界中的图会随时间演化,往往需要及时的在线预测。目前的方法与标准OCL设置并不完全一致,部分原因是缺乏对图上在线持续学习的明确定义。在这项工作中,我们为图上的在线持续学习提出了一个通用的形式化表述,强调在图拓扑上进行批处理的效率要求,并为系统化的模型评估提供了一个定义明确的设置。最后,我们引入了一组基准,并报告了CL文献中多种方法在适配到我们设置后的性能。
摘要:The aim of Continual Learning (CL) is to learn new tasks incrementally while avoiding catastrophic forgetting. Online Continual Learning (OCL) specifically focuses on learning efficiently from a continuous stream of data with shifting distribution. While recent studies explore Continual Learning on graphs exploiting Graph Neural Networks (GNNs), only few of them focus on a streaming setting. Yet, many real-world graphs evolve over time, often requiring timely and online predictions. Current approaches, however, are not well aligned with the standard OCL setting, partly due to the lack of a clear definition of online Continual Learning on graphs. In this work, we propose a general formulation for online Continual Learning on graphs, emphasizing the efficiency requirements on batch processing over the graph topology, and providing a well-defined setting for systematic model evaluation. Finally, we introduce a set of benchmarks and report the performance of several methods in the CL literature, adapted to our setting.


【2】Understanding the Embedding Models on Hyper-relational Knowledge Graph
标题:理解超关系知识图上的嵌入模型
链接:https://arxiv.org/abs/2508.03280

作者:, Shimin Di, Zhili Wang, Haoyang Li, Fei Teng, Hao Xin, Lei Chen
备注:Accepted by CIKM 2025
摘要:最近,超关系知识图(HKG)被提出作为传统知识图(KG)的扩展,以便借助额外的限定符更好地表示现实世界的事实。因此,研究人员尝试通过设计额外的限定符处理模块,使经典的知识图嵌入(KGE)模型适配HKG。然而,超关系KGE(HKGE)模型的优越性能究竟来自其基础KGE模型还是专门设计的扩展模块,目前尚不清楚。因此,在本文中,我们在数据层面使用三种分解方法将HKG转换为KG格式,然后评估几种经典KGE模型在HKG上的性能。结果表明,一些KGE模型达到了与HKGE模型相当的性能。进一步分析后我们发现,分解方法会改变原始HKG的拓扑结构,无法完整保留HKG信息。此外,我们观察到,由于信息压缩问题,当前的HKGE模型要么不足以捕捉图的长程依赖,要么难以整合主三元组与限定符信息。为了进一步佐证我们的发现并为未来HKGE研究提供潜在方向,我们提出了FormerGNN框架。该框架采用限定符整合器来保留原始HKG拓扑,并使用基于GNN的图编码器来捕捉图的长程依赖,随后采用一种改进的方法整合主三元组与限定符信息,以缓解压缩问题。实验结果表明,FormerGNN优于现有的HKGE模型。
摘要:Recently, Hyper-relational Knowledge Graphs (HKGs) have been proposed as an extension of traditional Knowledge Graphs (KGs) to better represent real-world facts with additional qualifiers. As a result, researchers have attempted to adapt classical Knowledge Graph Embedding (KGE) models for HKGs by designing extra qualifier processing modules. However, it remains unclear whether the superior performance of Hyper-relational KGE (HKGE) models arises from their base KGE model or the specially designed extension module. Hence, in this paper, we data-wise convert HKGs to KG format using three decomposition methods and then evaluate the performance of several classical KGE models on HKGs. Our results show that some KGE models achieve performance comparable to that of HKGE models. Upon further analysis, we find that the decomposition methods alter the original HKG topology and fail to fully preserve HKG information. Moreover, we observe that current HKGE models are either insufficient in capturing the graph's long-range dependency or struggle to integrate main-triple and qualifier information due to the information compression issue. To further justify our findings and offer a potential direction for future HKGE research, we propose the FormerGNN framework. This framework employs a qualifier integrator to preserve the original HKG topology, and a GNN-based graph encoder to capture the graph's long-range dependencies, followed by an improved approach for integrating main-triple and qualifier information to mitigate compression issues. Our experimental results demonstrate that FormerGNN outperforms existing HKGE models.


【3】GEDAN: Learning the Edit Costs for Graph Edit Distance
标题:GEDAN:学习图编辑距离的编辑成本
链接:https://arxiv.org/abs/2508.03111

作者: Leonardi, Markus Orsi, Jean-Louis Reymond, Kaspar Riesen
摘要:图编辑距离(Graph Edit Distance,GED)定义为将一个图转换为另一个图的最小代价,是度量图之间不相似性时被广泛采用的指标。GED的主要问题在于其计算是NP困难的,这反过来催生了各种近似方法,包括基于神经网络(NN)的方法。大多数基于NN的模型通过假设单位成本的编辑操作来简化GED问题,而这一约束在现实应用中相当不切实际。在这项工作中,我们提出了一种新的图神经网络框架,利用监督与无监督训练来近似GED。在无监督设置中,它采用仅依赖梯度的自组织机制,可以在没有真实距离标注的情况下进行优化。此外,我们架构的一个核心组成部分是集成了广义加性模型,从而能够灵活且可解释地学习上下文感知的编辑成本。实验结果表明,该方法取得了与最先进参考方法相当的结果,同时显著提升了适应性和可解释性。也就是说,学习到的成本函数能为复杂图结构提供洞察,这使其在分子分析和结构模式发现等领域尤其有价值。
摘要:Graph Edit Distance (GED) is defined as the minimum cost transformation of one graph into another and is a widely adopted metric for measuring the dissimilarity between graphs. The major problem of GED is that its computation is NP-hard, which has in turn led to the development of various approximation methods, including approaches based on neural networks (NN). Most of these NN-based models simplify the problem of GED by assuming unit-cost edit operations, a rather unrealistic constraint in real-world applications. In this work, we present a novel Graph Neural Network framework that approximates GED using both supervised and unsupervised training. In the unsupervised setting, it employs a gradient-only self-organizing mechanism that enables optimization without ground-truth distances. Moreover, a core component of our architecture is the integration of a Generalized Additive Model, which allows the flexible and interpretable learning of context-aware edit costs. Experimental results show that the proposed method achieves similar results as state-of-the-art reference methods, yet significantly improves both adaptability and interpretability. That is, the learned cost function offers insights into complex graph structures, making it particularly valuable in domains such as molecular analysis and structural pattern discovery.
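作为对照,下面用networkx演示精确的(默认单位成本)图编辑距离,以及自定义编辑成本的写法;精确计算的复杂度随图规模指数增长,这正是GEDAN这类可学习编辑成本的神经近似方法要解决的瓶颈。示例图为任取的小图。

```python
import networkx as nx

G1 = nx.cycle_graph(4)   # 4节点环
G2 = nx.path_graph(4)    # 4节点链

# 默认单位成本下的精确GED:删除一条边即可,结果为1
print(nx.graph_edit_distance(G1, G2))

# 自定义编辑成本:把边的删除/插入成本设为2,结果变为2
print(nx.graph_edit_distance(G1, G2,
                             edge_del_cost=lambda e: 2,
                             edge_ins_cost=lambda e: 2))
```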


【4】Scalable Varied-Density Clustering via Graph Propagation
标题:通过图传播实现可扩展的变密度聚类
链接:https://arxiv.org/abs/2508.02989

作者:, Yingtao Zheng, Hugo Phibbs
摘要:我们为高维数据的变密度聚类提出了一个新视角,将其表述为在能适应局部密度变化的邻域图上的标签传播过程。我们的方法将基于密度的聚类与图的连通性形式化地联系起来,从而可以利用网络科学中发展起来的高效图传播技术。为确保可扩展性,我们引入了一种密度感知的邻域传播算法,并利用先进的随机投影方法构建近似邻域图。我们的方法在保持聚类质量的同时显著降低了计算成本。实验表明,它可以在几分钟内扩展到包含数百万个点的数据集,并取得与现有基线相当的准确率。
摘要:We propose a novel perspective on varied-density clustering for high-dimensional data by framing it as a label propagation process in neighborhood graphs that adapt to local density variations. Our method formally connects density-based clustering with graph connectivity, enabling the use of efficient graph propagation techniques developed in network science. To ensure scalability, we introduce a density-aware neighborhood propagation algorithm and leverage advanced random projection methods to construct approximate neighborhood graphs. Our approach significantly reduces computational cost while preserving clustering quality. Empirically, it scales to datasets with millions of points in minutes and achieves competitive accuracy compared to existing baselines.
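下面给出"密度感知的邻域标签传播"这一思路的极简示意:以第k近邻距离的倒数作为局部密度,按密度从高到低传播标签。具体规则(继承密度最高的已标注邻居的标签,否则新开一簇)为整理时的假设,并非论文算法本身。

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def density_label_propagation(X, k=10):
    """概念示意:在kNN邻域图上按局部密度从高到低传播聚类标签。"""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)
    density = 1.0 / (dist[:, -1] + 1e-12)          # 第k近邻距离的倒数作为密度
    labels = -np.ones(len(X), dtype=int)
    next_label = 0
    for i in np.argsort(density)[::-1]:            # 密度高的点先确定标签
        neigh = idx[i, 1:]
        labeled = neigh[labels[neigh] >= 0]
        if len(labeled) > 0:
            labels[i] = labels[labeled[np.argmax(density[labeled])]]
        else:
            labels[i] = next_label                 # 没有已标注邻居则新开一簇
            next_label += 1
    return labels

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((200, 2)),
               0.3 * rng.standard_normal((200, 2)) + 5])   # 两个密度不同的簇
print(np.bincount(density_label_propagation(X)))
```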


Transformer(7篇)

【1】VITA: Variational Pretraining of Transformers for Climate-Robust Crop Yield Forecasting
标题:VITA:面向气候鲁棒作物产量预测的Transformer变分预训练
链接:https://arxiv.org/abs/2508.03589

作者:n, Mardavij Roozbehani, Munther Dahleh
摘要:准确的作物产量预测对全球粮食安全至关重要。然而,当产量偏离历史趋势时,当前的人工智能模型会系统性地表现不佳。这一问题源于关键的数据挑战,包括丰富的预训练天气数据集与可用于微调的有限数据之间的严重不对称。我们提出VITA(面向非对称数据的变分推理Transformer),一个解决这种不对称性的变分预训练框架。VITA不依赖输入重建,而是在预训练期间使用详细的天气变量作为代理目标,并通过自监督特征掩码学习预测丰富的大气状态。这使得模型在部署期间仅用基本天气统计量即可完成微调。应用于美国玉米带的763个县,VITA在所有评估情景下预测玉米和大豆产量均达到最先进水平。它在正常条件下始终提供更优的性能,而在极端天气年份优势尤为明显,改进具有统计显著性(配对t检验,p ≈ 0.01)。重要的是,VITA使用更少的数据就优于GNN-RNN等先前框架,使其在现实世界中更加实用,尤其是在数据稀缺的地区。这项工作突显了领域感知的AI设计如何克服数据限制,并在不断变化的气候中支持有韧性的农业预测。
摘要:Accurate crop yield forecasting is essential for global food security. However, current AI models systematically underperform when yields deviate from historical trends. This issue arises from key data challenges, including a major asymmetry between rich pretraining weather datasets and the limited data available for fine-tuning. We introduce VITA (Variational Inference Transformer for Asymmetric data), a variational pretraining framework that addresses this asymmetry. Instead of relying on input reconstruction, VITA uses detailed weather variables as proxy targets during pretraining and learns to predict rich atmospheric states through self-supervised feature masking. This allows the model to be fine-tuned using only basic weather statistics during deployment. Applied to 763 counties in the U.S. Corn Belt, VITA achieves state-of-the-art performance in predicting corn and soybean yields across all evaluation scenarios. While it consistently delivers superior performance under normal conditions, its advantages are particularly pronounced during extreme weather years, with statistically significant improvements (paired t-test, $p \approx 0.01$). Importantly, VITA outperforms prior frameworks like GNN-RNN using less data, making it more practical for real-world use--particularly in data-scarce regions. This work highlights how domain-aware AI design can overcome data limitations and support resilient agricultural forecasting in a changing climate.


【2】MiSTR: Multi-Modal iEEG-to-Speech Synthesis with Transformer-Based Prosody Prediction and Neural Phase Reconstruction
标题:MiSTR:基于变换器的韵律预测和神经相位重建的多模态iEEG到语音合成
链接:https://arxiv.org/abs/2508.03166

作者:Salah Al-Radhi, Géza Németh, Branislav Gerazov
备注:5 pages, 2 figures, 1 table. Accepted for presentation at Interspeech 2025
摘要:基于颅内脑电(iEEG)信号的语音合成为恢复严重言语障碍患者的沟通能力提供了一条有前景的途径。然而,由于特征表示、韵律建模和相位重建方面的限制,合成可懂且自然的语音仍然具有挑战性。我们介绍了MiSTR,一个深度学习框架,它集成了:1)基于小波的特征提取,以捕获iEEG信号的细粒度时间、频谱和神经生理表示;2)基于Transformer的解码器,用于韵律感知的频谱图预测;3)通过自适应频谱校正强制谐波一致性的神经相位声码器。在公开iEEG数据集上的评估表明,MiSTR达到了最先进的语音可懂度,重建Mel频谱图与原始Mel频谱图之间的平均Pearson相关系数为0.91,优于现有的神经语音合成基线。
摘要:Speech synthesis from intracranial EEG (iEEG) signals offers a promising avenue for restoring communication in individuals with severe speech impairments. However, achieving intelligible and natural speech remains challenging due to limitations in feature representation, prosody modeling, and phase reconstruction. We introduce MiSTR, a deep-learning framework that integrates: 1) Wavelet-based feature extraction to capture fine-grained temporal, spectral, and neurophysiological representations of iEEG signals, 2) A Transformer-based decoder for prosody-aware spectrogram prediction, and 3) A neural phase vocoder enforcing harmonic consistency via adaptive spectral correction. Evaluated on a public iEEG dataset, MiSTR achieves state-of-the-art speech intelligibility, with a mean Pearson correlation of 0.91 between reconstructed and original Mel spectrograms, improving over existing neural speech synthesis baselines.


【3】BoostTransformer: Enhancing Transformer Models with Subgrid Selection and Importance Sampling
标题:BoostTransformer:通过子网格token选择和重要性采样增强Transformer模型
链接:https://arxiv.org/abs/2508.02924

作者:, Jean Utke, Truong Vo, Diego Klabjan
备注:10 pages, 5 figures, submitted for review at a major machine learning conference. arXiv admin note: substantial text overlap with arXiv:2203.00761, arXiv:2507.22842
摘要:Transformer架构在现代NLP中占主导地位,但通常需要大量的计算资源和复杂的超参数调整。为了减轻这些挑战,我们提出了一个新的框架,BoostTransformer,增强Transformers的升压原则,通过子网格令牌选择和重要性加权采样。我们的方法将最小二乘提升目标直接集成到Transformer管道中,从而实现更有效的训练和更高的性能。在多个细粒度文本分类基准测试中,BoostTransformer表现出更快的收敛速度和更高的准确性,超越了标准Transformers,同时最大限度地减少了架构搜索开销。
摘要:Transformer architectures dominate modern NLP but often demand heavy computational resources and intricate hyperparameter tuning. To mitigate these challenges, we propose a novel framework, BoostTransformer, that augments transformers with boosting principles through subgrid token selection and importance-weighted sampling. Our method incorporates a least square boosting objective directly into the transformer pipeline, enabling more efficient training and improved performance. Across multiple fine-grained text classification benchmarks, BoostTransformer demonstrates both faster convergence and higher accuracy, surpassing standard transformers while minimizing architectural search overhead.


【4】Beyond Least Squares: Robust Regression Transformer (R2T)
标题:超越最小二乘:鲁棒回归Transformer(R2T)
链接:https://arxiv.org/abs/2508.02874

作者:ierrez, Tony Kai Tang, Isabel Gutierrez
备注:10 pages, 4 figures, 1 table
摘要:鲁棒回归技术依赖于最小二乘优化,这对高斯噪声很有效,但在存在非对称结构噪声时会失效。我们提出了一种神经-符号混合架构:由Transformer编码器处理数值序列,一个压缩神经网络预测符号参数,再由固定的符号方程重建原始序列。使用合成数据,训练目标是在加入非对称结构噪声后恢复原始序列,从而有效地学习由神经参数估计引导的符号拟合。我们的模型在合成可穿戴数据上取得了6e-6至3.5e-5的回归MSE中位数,与普通最小二乘拟合以及Huber损失或SoftL1等鲁棒回归技术相比,提升了10-300倍。
摘要:Robust regression techniques rely on least-squares optimization, which works well for Gaussian noise but fails in the presence of asymmetric structured noise. We propose a hybrid neural-symbolic architecture where a transformer encoder processes numerical sequences, a compression NN predicts symbolic parameters, and a fixed symbolic equation reconstructs the original sequence. Using synthetic data, the training objective is to recover the original sequence after adding asymmetric structured noise, effectively learning a symbolic fit guided by neural parameter estimation. Our model achieves a median regression MSE of 6e-6 to 3.5e-5 on synthetic wearable data, which is a 10-300 times improvement when compared with ordinary least squares fit and robust regression techniques such as Huber loss or SoftL1.
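下面的示意代码构造了带非对称(单边)结构噪声的数据,并比较摘要中提到的两个基线——普通最小二乘与Huber损失;可以看到两者的截距都被单边噪声抬高,这正是R2T试图通过神经-符号参数估计来缓解的问题。数据生成方式为假设,仅作演示。

```python
import numpy as np
from sklearn.linear_model import LinearRegression, HuberRegressor

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 500).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0                          # 真实关系: y = 2x + 1
mask = rng.random(y.size) < 0.3
y = y + rng.exponential(3.0, size=y.size) * mask   # 30%样本叠加单边(非对称)噪声

ols = LinearRegression().fit(x, y)
huber = HuberRegressor().fit(x, y)
print("OLS   斜率/截距:", ols.coef_[0], ols.intercept_)
print("Huber 斜率/截距:", huber.coef_[0], huber.intercept_)
```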


【5】Evaluation and Analysis of Deep Neural Transformers and Convolutional Neural Networks on Modern Remote Sensing Datasets
标题:现代遥感数据集深度神经变换器和卷积神经网络的评估和分析
链接:https://arxiv.org/abs/2508.02871

作者:urt, Trevor M. Bajkowski, Grant J. Scott, Curt H. Davis
摘要:2012年,AlexNet确立了深度卷积神经网络(DCNN)作为计算机视觉(CV)领域的最新技术,此后这类网络很快在包括遥感在内的众多领域的视觉任务中占据主导地位。随着Visual Transformer的发表,我们正在见证计算机视觉的第二次现代飞跃,因此有必要了解各种基于Transformer的神经网络在卫星图像上的表现。尽管Transformer在自然语言处理和CV应用中表现出很高的性能,但它们尚未在现代遥感数据上进行过大规模比较。在本文中,我们探索了使用基于Transformer的神经网络在高分辨率光电卫星图像中进行目标检测,并在多个公开基准数据集上展示了最先进的性能。本研究比较了11种不同的边界框检测与定位算法,其中7种发表于2020年之后,全部11种发表于2015年之后。我们在三个规模和复杂度各异的最先进开源高分辨率遥感影像数据集上,比较了5种基于Transformer的架构与6种卷积网络的性能。在训练和评估了33个深度神经模型之后,我们讨论并分析了各种特征提取方法和检测算法下的模型性能。
摘要:In 2012, AlexNet established deep convolutional neural networks (DCNNs) as the state-of-the-art in CV, as these networks soon led in visual tasks for many domains, including remote sensing. With the publication of Visual Transformers, we are witnessing the second modern leap in computational vision, and as such, it is imperative to understand how various transformer-based neural networks perform on satellite imagery. While transformers have shown high levels of performance in natural language processing and CV applications, they have yet to be compared on a large scale to modern remote sensing data. In this paper, we explore the use of transformer-based neural networks for object detection in high-resolution electro-optical satellite imagery, demonstrating state-of-the-art performance on a variety of publicly available benchmark data sets. We compare eleven distinct bounding-box detection and localization algorithms in this study, of which seven were published since 2020, and all eleven since 2015. The performance of five transformer-based architectures is compared with six convolutional networks on three state-of-the-art opensource high-resolution remote sensing imagery datasets ranging in size and complexity. Following the training and evaluation of thirty-three deep neural models, we then discuss and analyze model performance across various feature extraction methodologies and detection algorithms.


【6】PyCAT4: A Hierarchical Vision Transformer-based Framework for 3D Human Pose Estimation
标题:PyCAT4:基于分层视觉Transformer的3D人体姿态估计框架
链接:https://arxiv.org/abs/2508.02806

作者:ang, Jonathan Loo
备注:10 pages, 20 figures
摘要:最近,通过将卷积神经网络(CNN)与金字塔网格对齐反馈回路相结合,已经实现了3D人体姿态估计准确性的显著提高。此外,通过采用基于Transformer的时序分析架构,在计算机视觉领域取得了创新性突破。鉴于这些进步,本研究旨在深入优化和改进现有的Pymaf网络架构。本文的主要创新点包括:(1)引入基于自注意机制的Transformer特征提取网络层,增强了对低层特征的捕获;(2)通过特征时间融合技术,增强了对视频序列中时间信号的理解和捕获;(3)采用空间金字塔结构实现多尺度特征融合,有效平衡不同尺度特征表示的差异。通过在COCO和3DPW数据集上的实验验证了本研究中获得的新PyCAT4模型。实验结果表明,所提出的改进策略显著提高了网络在人体姿态估计中的检测能力,进一步推动了人体姿态估计技术的发展。
摘要:Recently, a significant improvement in the accuracy of 3D human pose estimation has been achieved by combining convolutional neural networks (CNNs) with pyramid grid alignment feedback loops. Additionally, innovative breakthroughs have been made in the field of computer vision through the adoption of Transformer-based temporal analysis architectures. Given these advancements, this study aims to deeply optimize and improve the existing Pymaf network architecture. The main innovations of this paper include: (1) Introducing a Transformer feature extraction network layer based on self-attention mechanisms to enhance the capture of low-level features; (2) Enhancing the understanding and capture of temporal signals in video sequences through feature temporal fusion techniques; (3) Implementing spatial pyramid structures to achieve multi-scale feature fusion, effectively balancing feature representations differences across different scales. The new PyCAT4 model obtained in this study is validated through experiments on the COCO and 3DPW datasets. The results demonstrate that the proposed improvement strategies significantly enhance the network's detection capability in human pose estimation, further advancing the development of human pose estimation technology.


【7】Forecasting NCAA Basketball Outcomes with Deep Learning: A Comparative Study of LSTM and Transformer Models
标题:利用深度学习预测NCAA篮球成绩:LSTM和Transformer模型的比较研究
链接:https://arxiv.org/abs/2508.02725

作者: Habib
备注:20 page scientific report
摘要:在这项研究中,我探索了先进的深度学习方法来预测2025年NCAA Division 1男子和女子篮球锦标赛的结果。利用历史NCAA比赛数据,我实现了两个复杂的基于序列的模型:长短期记忆(LSTM)和Transformer架构。这些模型的预测能力通过全面的特征工程得到增强,包括从广义线性模型(GLM)、Elo评级、种子差异和聚合框分数统计中获得的团队质量指标。为了评估预测的鲁棒性和可靠性,我使用二进制交叉熵(BCE)和Brier损失函数训练每个模型变量,从而深入了解分类性能和概率校准。我的比较分析显示,虽然使用BCE优化的Transformer架构产生了卓越的区分能力(最高AUC为0.8473),但使用Brier损失训练的LSTM模型表现出卓越的概率校准(最低Brier得分为0.1589)。这些发现强调了根据预测任务的具体要求选择适当的模型架构和损失函数的重要性。本文提供的详细分析流程可作为体育分析及其他领域未来预测建模任务的可复制框架。
摘要 :In this research, I explore advanced deep learning methodologies to forecast the outcomes of the 2025 NCAA Division 1 Men's and Women's Basketball tournaments. Leveraging historical NCAA game data, I implement two sophisticated sequence-based models: Long Short-Term Memory (LSTM) and Transformer architectures. The predictive power of these models is augmented through comprehensive feature engineering, including team quality metrics derived from Generalized Linear Models (GLM), Elo ratings, seed differences, and aggregated box-score statistics. To evaluate the robustness and reliability of predictions, I train each model variant using both Binary Cross-Entropy (BCE) and Brier loss functions, providing insights into classification performance and probability calibration. My comparative analysis reveals that while the Transformer architecture optimized with BCE yields superior discriminative power (highest AUC of 0.8473), the LSTM model trained with Brier loss demonstrates superior probabilistic calibration (lowest Brier score of 0.1589). These findings underscore the importance of selecting appropriate model architectures and loss functions based on the specific requirements of forecasting tasks. The detailed analytical pipeline presented here serves as a reproducible framework for future predictive modeling tasks in sports analytics and beyond.
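文中对比的两种训练目标可以用几行代码直观说明:二元交叉熵(BCE)偏向判别能力,Brier分数直接度量概率校准。示例中的概率与比赛结果为任意假设值。

```python
import numpy as np

def bce_loss(p, y, eps=1e-12):
    """二元交叉熵:强调判别/排序能力。"""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def brier_loss(p, y):
    """Brier分数:预测概率与实际结果的均方误差,度量概率校准。"""
    return np.mean((p - y) ** 2)

p = np.array([0.9, 0.6, 0.2, 0.8])   # 模型给出的获胜概率
y = np.array([1, 0, 0, 1])           # 实际比赛结果
print(bce_loss(p, y), brier_loss(p, y))
```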


GAN|对抗|攻击|生成相关(5篇)

【1】Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation
标题:检索增强生成中知识中毒攻击的防御
链接:https://arxiv.org/abs/2508.02835

作者:demacu, Vinay M. Shashidhar, Micheal Tuape, Dan Abudu, Beakcheol Jang, Jong Wook Kim
备注:Preprint for Submission
摘要:检索增强生成(RAG)已经成为一种强大的方法,通过整合外部的最新知识源来提高大型语言模型(LLM)的能力。然而,这引入了知识中毒攻击的潜在漏洞,其中攻击者可以危害知识源以误导生成模型。其中一种攻击是PoisonedRAG,其中注入的对抗性文本引导模型生成攻击者选择的对目标问题的响应。在这项工作中,我们提出了新的防御方法,FilterRAG和ML-FilterRAG,以减轻PoisonedRAG攻击。首先,我们提出了一个新的属性来揭示不同的属性,以区分知识数据源中的对抗性文本和干净文本。接下来,我们在设计我们提出的方法时,利用这个属性从干净的文本中过滤掉敌对文本。使用基准数据集的这些方法的评估证明了它们的有效性,与原始RAG系统的性能接近。
摘要:Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to boost the capabilities of large language models (LLMs) by incorporating external, up-to-date knowledge sources. However, this introduces a potential vulnerability to knowledge poisoning attacks, where attackers can compromise the knowledge source to mislead the generation model. One such attack is the PoisonedRAG in which the injected adversarial texts steer the model to generate an attacker-chosen response to a target question. In this work, we propose novel defense methods, FilterRAG and ML-FilterRAG, to mitigate the PoisonedRAG attack. First, we propose a new property to uncover distinct properties to differentiate between adversarial and clean texts in the knowledge data source. Next, we employ this property to filter out adversarial texts from clean ones in the design of our proposed approaches. Evaluation of these methods using benchmark datasets demonstrate their effectiveness, with performances close to those of the original RAG systems.


【2】Clinically Grounded Agent-based Report Evaluation: An Interpretable Metric for Radiology Report Generation
标题:基于临床基础的基于代理的报告评估:放射学报告生成的可解释指标
链接:https://arxiv.org/abs/2508.02808

作者:ua, Young Joon (Fred)Kwon, Siddhant Dogra, Daniel Freedman, Diana Ruan, Motaz Nashawaty, Danielle Rigau, Daniel Alexander Alber, Kang Zhang, Kyunghyun Cho, Eric Karl Oermann
摘要:放射成像是诊断、治疗计划和临床决策的核心。视觉语言基础模型激发了人们对自动放射学报告生成(RRG)的兴趣,但安全部署需要对生成的报告进行可靠的临床评估。现有的指标往往依赖于表面水平的相似性或表现为黑箱,缺乏可解释性。我们介绍ICARE(可解释和临床接地代理的报告评估),一个可解释的评估框架,利用大型语言模型代理和动态多项选择题回答(MCQA)。两个代理,每个代理都有地面实况或生成的报告,生成有临床意义的问题并相互提问。答案的一致性捕获了结果的保存和一致性,作为临床精确度和召回率的可解释代理。通过将分数与问答对联系起来,ICARE实现了透明和可解释的评估。临床医生研究表明,ICARE与专家判断的一致性明显高于先前的指标。扰动分析证实了临床内容和再现性的敏感性,而模型比较揭示了可解释的错误模式。
摘要:Radiological imaging is central to diagnosis, treatment planning, and clinical decision-making. Vision-language foundation models have spurred interest in automated radiology report generation (RRG), but safe deployment requires reliable clinical evaluation of generated reports. Existing metrics often rely on surface-level similarity or behave as black boxes, lacking interpretability. We introduce ICARE (Interpretable and Clinically-grounded Agent-based Report Evaluation), an interpretable evaluation framework leveraging large language model agents and dynamic multiple-choice question answering (MCQA). Two agents, each with either the ground-truth or generated report, generate clinically meaningful questions and quiz each other. Agreement on answers captures preservation and consistency of findings, serving as interpretable proxies for clinical precision and recall. By linking scores to question-answer pairs, ICARE enables transparent, and interpretable assessment. Clinician studies show ICARE aligns significantly more with expert judgment than prior metrics. Perturbation analyses confirm sensitivity to clinical content and reproducibility, while model comparisons reveal interpretable error patterns.


【3】Synthetic medical data generation: state of the art and application to trauma mechanism classification
标题:综合医学数据生成:最新技术及其在创伤机制分类中的应用
链接:https://arxiv.org/abs/2508.02771

作者:remus, Ariel Guerra-Adames, Marta Avalos-Fernandez, Vianney Jouhet, Cédric Gil-Jardiné, Emmanuel Lagarde
备注:Accepted to CIBB 2025 as a short paper
摘要:面对患者保密性和科学可重复性的挑战,健康机器学习的研究正转向合成医学数据库的概念。本文简要概述了用于生成合成表格和文本数据的最先进的机器学习方法,重点是它们在创伤机制自动分类中的应用,然后是我们提出的结合表格和非结构化文本数据生成高质量合成病历的方法。
摘要:Faced with the challenges of patient confidentiality and scientific reproducibility, research on machine learning for health is turning towards the conception of synthetic medical databases. This article presents a brief overview of state-of-the-art machine learning methods for generating synthetic tabular and textual data, focusing their application to the automatic classification of trauma mechanisms, followed by our proposed methodology for generating high-quality, synthetic medical records combining tabular and unstructured text data.


【4】ECGTwin: Personalized ECG Generation Using Controllable Diffusion Model
标题:ECGTwin:使用可控扩散模型的个性化心电图生成
链接:https://arxiv.org/abs/2508.02720

作者:ai, Bo Liu, Xinyan Guan, Qinghao Zhao, Hongyan Li, Shenda Hong
摘要:个性化心电图(ECG)生成是模拟患者的ECG数字孪生模型,以适应特定条件。它有可能将传统医疗保健转变为更准确的个性化模式,同时保留传统人群水平ECG合成的关键优势。然而,这个有前途的任务提出了两个基本挑战:提取单个特征而不需要地面实况,以及在不混淆生成模型的情况下注入各种类型的条件。在本文中,我们提出了ECGTwin,一个两阶段的框架,旨在解决这些挑战。在第一阶段中,通过对比学习训练的个体基础提取器从参考ECG中鲁棒地捕获个人特征。在第二阶段,提取的个体特征以及目标心脏状况通过我们的新型AdaX条件注入器集成到基于扩散的生成过程中,AdaX条件注入器通过两个专用和专门的路径注入这些信号。定性和定量实验都表明,我们的模型不仅可以生成高保真度和多样性的心电信号,通过提供一个细粒度的生成可控性,但也保留了个人的具体特征。此外,ECGTwin显示了在下游应用中增强ECG自动诊断的潜力,证实了精确个性化医疗保健解决方案的可能性。
摘要 :Personalized electrocardiogram (ECG) generation is to simulate a patient's ECG digital twins tailored to specific conditions. It has the potential to transform traditional healthcare into a more accurate individualized paradigm, while preserving the key benefits of conventional population-level ECG synthesis. However, this promising task presents two fundamental challenges: extracting individual features without ground truth and injecting various types of conditions without confusing generative model. In this paper, we present ECGTwin, a two-stage framework designed to address these challenges. In the first stage, an Individual Base Extractor trained via contrastive learning robustly captures personal features from a reference ECG. In the second stage, the extracted individual features, along with a target cardiac condition, are integrated into the diffusion-based generation process through our novel AdaX Condition Injector, which injects these signals via two dedicated and specialized pathways. Both qualitative and quantitative experiments have demonstrated that our model can not only generate ECG signals of high fidelity and diversity by offering a fine-grained generation controllability, but also preserving individual-specific features. Furthermore, ECGTwin shows the potential to enhance ECG auto-diagnosis in downstream application, confirming the possibility of precise personalized healthcare solutions.


【5】CTBench: Cryptocurrency Time Series Generation Benchmark
标题:CTBench:加密货币时间序列生成基准
链接:https://arxiv.org/abs/2508.02758

作者:, Qiang Wang, Qiang Huang, Yifan Bao, Xinyu Xi, Anthony K. H. Tung, Chen Jin, Zhiyong Huang
备注:14 pages, 14 figures, and 3 tables
摘要:合成时间序列是定量金融中数据扩充、压力测试和算法原型的重要工具。然而,在以24/7交易、极端波动和快速政权转移为特征的加密货币市场中,现有的时间序列生成(TSG)方法和基准往往不足,危及实际效用。大多数先前的工作(1)针对非金融或传统金融领域,(2)狭隘地关注分类和预测,而忽略了加密货币特有的复杂性,(3)缺乏关键的财务评估,特别是对于交易应用程序。为了解决这些差距,我们引入了\textsf{CTBench},这是为加密货币领域量身定制的第一个全面的TSG基准。\textsf{CTBench}从452个代币中整理了一个开源数据集,并在5个关键维度的13个指标上评估了TSG模型:预测准确性、排名保真度、交易性能、风险评估和计算效率。一个关键的创新是一个双任务评估框架:(1)预测效用任务衡量合成数据如何保持时间和横截面模式进行预测,而(2)统计套利任务评估重建的系列是否支持均值回复信号进行交易。我们对来自五个方法论家族的八个代表性模型在四个不同的市场制度进行了基准测试,揭示了统计保真度和现实世界盈利能力之间的权衡。值得注意的是,\textsf{CTBench}提供了模型排名分析和可操作的指导,用于在加密分析和战略开发中选择和部署TSG模型。
摘要:Synthetic time series are essential tools for data augmentation, stress testing, and algorithmic prototyping in quantitative finance. However, in cryptocurrency markets, characterized by 24/7 trading, extreme volatility, and rapid regime shifts, existing Time Series Generation (TSG) methods and benchmarks often fall short, jeopardizing practical utility. Most prior work (1) targets non-financial or traditional financial domains, (2) focuses narrowly on classification and forecasting while neglecting crypto-specific complexities, and (3) lacks critical financial evaluations, particularly for trading applications. To address these gaps, we introduce \textsf{CTBench}, the first comprehensive TSG benchmark tailored for the cryptocurrency domain. \textsf{CTBench} curates an open-source dataset from 452 tokens and evaluates TSG models across 13 metrics spanning 5 key dimensions: forecasting accuracy, rank fidelity, trading performance, risk assessment, and computational efficiency. A key innovation is a dual-task evaluation framework: (1) the \emph{Predictive Utility} task measures how well synthetic data preserves temporal and cross-sectional patterns for forecasting, while (2) the \emph{Statistical Arbitrage} task assesses whether reconstructed series support mean-reverting signals for trading. We benchmark eight representative models from five methodological families over four distinct market regimes, uncovering trade-offs between statistical fidelity and real-world profitability. Notably, \textsf{CTBench} offers model ranking analysis and actionable guidance for selecting and deploying TSG models in crypto analytics and strategy development.


半/弱/无/有监督|不确定性|主动学习(10篇)

【1】PAC Apprenticeship Learning with Bayesian Active Inverse Reinforcement Learning
标题:采用Bayesian主动反向强化学习的PAC学徒学习
链接:https://arxiv.org/abs/2508.03693

作者:jgar, Dewi S.W. Gould, Jonathon Liu, Alessandro Abate, Konstantinos Gatsis, Michael A. Osborne
备注:Published at RLC 2025
摘要:随着人工智能系统变得越来越自主,将其决策与人类偏好可靠地结合起来至关重要。反向强化学习(IRL)提供了一种很有前途的方法来推断演示的偏好。然后,可以使用这些偏好来生成在所演示的任务上表现良好的学徒策略。然而,在自动驾驶或机器人等领域,错误可能会产生严重后果,我们不仅需要良好的平均性能,还需要有正式保证的可靠政策-然而,获得足够的可靠性保证的人工演示可能代价高昂。Active IRL通过战略性地选择最具信息性的场景进行人类演示来应对这一挑战。我们介绍PAC-EIG,信息理论的采集功能,直接针对可能近似正确(PAC)的保证学习的政策-提供第一个这样的理论保证主动IRL与嘈杂的专家演示。我们的方法最大限度地提高信息增益的学徒政策的遗憾,有效地确定国家需要进一步的示范。当学习奖励本身是主要目标时,我们还提出了奖励EIG作为替代方案。集中在有限的状态-动作空间,我们证明了收敛界,说明失败的模式,以前的启发式方法,并证明我们的方法的优势实验。
摘要:As AI systems become increasingly autonomous, reliably aligning their decision-making to human preferences is essential. Inverse reinforcement learning (IRL) offers a promising approach to infer preferences from demonstrations. These preferences can then be used to produce an apprentice policy that performs well on the demonstrated task. However, in domains like autonomous driving or robotics, where errors can have serious consequences, we need not just good average performance but reliable policies with formal guarantees -- yet obtaining sufficient human demonstrations for reliability guarantees can be costly. Active IRL addresses this challenge by strategically selecting the most informative scenarios for human demonstration. We introduce PAC-EIG, an information-theoretic acquisition function that directly targets probably-approximately-correct (PAC) guarantees for the learned policy -- providing the first such theoretical guarantee for active IRL with noisy expert demonstrations. Our method maximises information gain about the regret of the apprentice policy, efficiently identifying states requiring further demonstration. We also present Reward-EIG as an alternative when learning the reward itself is the primary objective. Focusing on finite state-action spaces, we prove convergence bounds, illustrate failure modes of prior heuristic methods, and demonstrate our method's advantages experimentally.


【2】UPLME: Uncertainty-Aware Probabilistic Language Modelling for Robust Empathy Regression
标题:UPLME:面向鲁棒共情回归的不确定性感知概率语言建模
链接:https://arxiv.org/abs/2508.03520

作者:l Hasan, Md Zakir Hossain, Aneesh Krishna, Shafin Rahman, Tom Gedeon
备注:Code available at this https URL
摘要:共情回归的监督学习受到嘈杂的自我报告共情分数的挑战。虽然已经提出了许多算法用于在文本分类问题中使用噪声标签进行学习,但回归对应的算法相对来说还没有得到充分的探索。我们提出了UPLME,一个不确定性感知的概率语言建模框架,以捕获标签噪声的移情检测的回归设置。UPLME包括一个概率语言模型,该模型预测同理心得分和异方差不确定性,并使用贝叶斯概念和变分模型集成进行训练。我们还引入了两个新的损失分量:一个惩罚退化的不确定性量化(UQ),另一个强制执行的相似性输入对上,我们预测共情。UPLME提供了最先进的性能(皮尔逊相关系数:$0.558\rightarrow0.580$和$0.629\rightarrow0.634$)在文献中报告的性能在两个公共基准,标签噪声。通过合成标签噪声注入,我们表明,UPLME是有效的分离噪声和清洁样本的基础上预测的不确定性。UPLME进一步优于(校准误差:$0.571\rightarrow0.376$)最近变分模型集成为基础的UQ方法设计的回归问题。
摘要 :Supervised learning for empathy regression is challenged by noisy self-reported empathy scores. While many algorithms have been proposed for learning with noisy labels in textual classification problems, the regression counterpart is relatively under-explored. We propose UPLME, an uncertainty-aware probabilistic language modelling framework to capture label noise in the regression setting of empathy detection. UPLME includes a probabilistic language model that predicts both empathy score and heteroscedastic uncertainty and is trained using Bayesian concepts with variational model ensembling. We further introduce two novel loss components: one penalises degenerate Uncertainty Quantification (UQ), and another enforces the similarity between the input pairs on which we predict empathy. UPLME provides state-of-the-art performance (Pearson Correlation Coefficient: $0.558\rightarrow0.580$ and $0.629\rightarrow0.634$) in terms of the performance reported in the literature in two public benchmarks, having label noise. Through synthetic label noise injection, we show that UPLME is effective in separating noisy and clean samples based on the predicted uncertainty. UPLME further outperform (Calibration error: $0.571\rightarrow0.376$) a recent variational model ensembling-based UQ method designed for regression problems.
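下面是一个示意性的异方差不确定性回归头与高斯负对数似然损失的草图(PyTorch,非论文实现),用于说明"同时预测共情分数与其不确定性"的基本思路;变分模型集成以及论文提出的两个新损失项均未包含,隐藏维度等数值为假设。

```python
import torch
import torch.nn as nn

class HeteroscedasticHead(nn.Module):
    """在编码器输出的句向量之上,同时预测均值(共情分数)与对数方差(不确定性)。"""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.mean = nn.Linear(hidden_dim, 1)
        self.log_var = nn.Linear(hidden_dim, 1)

    def forward(self, h):
        return self.mean(h).squeeze(-1), self.log_var(h).squeeze(-1)

def gaussian_nll(mean, log_var, target):
    # 异方差高斯负对数似然:高不确定性样本(可能是噪声标签)的平方误差被自动降权
    return (0.5 * torch.exp(-log_var) * (target - mean) ** 2 + 0.5 * log_var).mean()

# 用法示意(批大小与隐藏维度均为假设值)
h = torch.randn(8, 768)          # 来自语言模型编码器的句向量
head = HeteroscedasticHead(768)
mean, log_var = head(h)
loss = gaussian_nll(mean, log_var, torch.rand(8))
loss.backward()
```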


【3】Cropping outperforms dropout as an augmentation strategy for training self-supervised text embeddings
标题:作为训练自我监督文本嵌入的增强策略,裁剪的性能优于丢弃
链接:https://arxiv.org/abs/2508.03453

作者:ález-Márquez, Philipp Berens, Dmitry Kobak
摘要:文本嵌入,即整个文本的矢量表示,在许多NLP应用中发挥着重要作用,例如检索增强生成,情感分析,聚类或可视化文本集合以进行数据探索。目前,性能最好的嵌入模型是通过使用策划的文本对进行广泛的监督微调,从预先训练的语言模型中获得的。这与计算机视觉形成了鲜明对比,在计算机视觉中,基于数据增强的自我监督训练已经取得了显着的成功。在这里,我们系统地比较了两个最著名的增强策略,积极对生成文本嵌入的对比学习。我们评估嵌入质量MTEB和额外的域内评估,并表明裁剪增强大大优于基于辍学的方法。我们发现,在域外数据上,嵌入结果的质量低于监督SOTA模型,但对于域内数据,自监督微调在非常短的微调后产生高质量的文本嵌入,有时仅略低于监督SOTA。最后,我们表明,表示质量增加对最后的Transformer层,在微调过程中发生最大的变化,微调只有这些最后的层是足以达到类似的嵌入质量。
摘要:Text embeddings, i.e. vector representations of entire texts, play an important role in many NLP applications, such as retrieval-augmented generation, sentiment analysis, clustering, or visualizing collections of texts for data exploration. Currently, top-performing embedding models are derived from pre-trained language models via extensive supervised fine-tuning using curated text pairs. This contrasts with computer vision, where self-supervised training based on data augmentations has demonstrated remarkable success. Here we systematically compare the two most well-known augmentation strategies for positive pair generation in contrastive learning of text embeddings. We assess embedding quality on MTEB and additional in-domain evaluations and show that cropping augmentation strongly outperforms the dropout-based approach. We find that on out-of-domain data, the quality of resulting embeddings is below the supervised SOTA models, but for in-domain data, self-supervised fine-tuning produces high-quality text embeddings after very short fine-tuning, sometimes only marginally below the supervised SOTA. Finally, we show that representation quality increases towards the last transformer layers, which undergo the largest change during fine-tuning; and that fine-tuning only those last layers is sufficient to reach similar embedding quality.
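下面用一个极简草图说明"裁剪式正样本对生成 + 批内 InfoNCE 对比损失"的核心流程(并非论文代码):把同一段文本随机截取两个连续片段作为正样本对,再用温度化的内积对比;编码器、分词与裁剪比例均为假设。

```python
import random
import torch
import torch.nn.functional as F

def crop_two_views(tokens, min_ratio=0.3, max_ratio=0.7):
    """从同一 token 序列中随机裁剪两个连续片段,作为对比学习的正样本对。"""
    views = []
    for _ in range(2):
        length = max(1, int(len(tokens) * random.uniform(min_ratio, max_ratio)))
        start = random.randint(0, len(tokens) - length)
        views.append(tokens[start:start + length])
    return views

def info_nce(z1, z2, temperature=0.05):
    """批内 InfoNCE:同一文本的两个裁剪视图互为正样本,批内其余样本为负样本。"""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

# 用法示意:z1、z2 为编码器对两组裁剪视图输出的嵌入(此处用随机张量代替)
z1, z2 = torch.randn(16, 384), torch.randn(16, 384)
loss = info_nce(z1, z2)
```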


【4】VRPO: Rethinking Value Modeling for Robust RL Training under Noisy Supervision
标题:VRPO:重新思考噪声监督下稳健RL训练的价值建模
链接:https://arxiv.org/abs/2508.03058

作者:hu, Shihan Dou, Zhiheng Xi, Senjie Jin, Guoqiang Zhang, Jiazheng Zhang, Junjie Ye, Mingxu Chai, Enyu Zhou, Ming Zhang, Caishuang Huang, Yunke Zhang, Yuran Wang, Tao Gui
摘要:基于人类反馈的强化学习(RLHF)在现实环境中经常受到噪声或不完美的奖励监督的影响,这会破坏策略的稳定性和泛化能力。这种噪声可能会导致模型在优势估计期间对关键词失去注意力。虽然以前的工作集中在奖励去噪或过滤不良数据,但它往往忽略了价值模型在策略优化中的关键作用。在这项工作中,我们证明了强价值模型对于通过吸收不稳定信号来减轻噪声并实现更可靠的优势估计是必不可少的。我们提出了VRPO,一个以价值为中心的框架,用于在嘈杂的监督下进行强大的PPO训练。VRPO结合了两个核心设计:(1)由冻结语言模型的熵和困惑引导的辅助损失,以及(2)变化的信息瓶颈。这些机制增强了价值模型在优势估计过程中滤除噪声并从上下文中捕获关键词的能力,将其从被动预测器转变为噪声的主动调节器。在基于规则和基于模型的噪声奖励下,数学推理,科学QA和多轮对话的实验表明,VRPO始终优于PPO和GRPO基线。我们的研究结果强调了RLHF中价值模型经常被忽视的重要性,并为在嘈杂的现实环境中进行稳健的策略优化提供了一种原则性和实用性的方法。
摘要:Reinforcement Learning from Human Feedback (RLHF) often suffers from noisy or imperfect reward supervision in real-world settings, which undermines policy stability and generalization. Such noise may cause models to lose attention on key words during advantage estimation. While prior work focuses on reward denoising or filtering poor data, it often overlooks the critical role of the value model in policy optimization. In this work, we show that a strong value model is essential for mitigating noise by absorbing unstable signals and enabling more reliable advantage estimation. We propose VRPO, a value-centric framework for robust PPO training under noisy supervision. VRPO combines two core designs: (1) an auxiliary loss guided by entropy and perplexity from a frozen language model, and (2) a variational information bottleneck. These mechanisms enhance the value model's ability to filter out noise and capture key words from the context during advantage estimation, transforming it from a passive predictor into an active regulator of noise. Experiments on math reasoning, science QA, and multi-turn dialogue, under both rule-based and model-based noisy rewards, show that VRPO consistently outperforms PPO and GRPO baselines. Our findings underscore the often-overlooked importance of the value model in RLHF and offer a principled and practical approach to robust policy optimization in noisy real-world environments.


【5】Uncertainty Sets for Distributionally Robust Bandits Using Structural Equation Models
标题:使用结构方程模型的分布鲁棒多臂老虎机的不确定性集
链接:https://arxiv.org/abs/2508.02812

作者: Avery, Chinmay Pendse, David Jensen
备注:10 pages main text, 28 pages total
摘要:分布鲁棒评估估计最坏情况下的预期回报超过可能的协变量和奖励分布的不确定性集,分布鲁棒学习找到一个政策,最大化最坏情况下的回报在该不确定性集。不幸的是,目前的方法分布鲁棒的评估和学习创建过于保守的评估和政策。在这项工作中,我们提出了一个实用的强盗评估和学习算法,量身定制的不确定性集的具体问题,使用结构方程模型约束的数学程序。此外,我们还展示了如何使用条件独立性测试来检测建模的移位变量。我们发现,结构方程模型(SEM)的方法提供了更准确的评估和学习低方差的政策比传统的方法,特别是对于大的变化。此外,SEM方法学习最优策略,假设模型充分指定。
摘要:Distributionally robust evaluation estimates the worst-case expected return over an uncertainty set of possible covariate and reward distributions, and distributionally robust learning finds a policy that maximizes that worst-case return across that uncertainty set. Unfortunately, current methods for distributionally robust evaluation and learning create overly conservative evaluations and policies. In this work, we propose a practical bandit evaluation and learning algorithm that tailors the uncertainty set to specific problems using mathematical programs constrained by structural equation models. Further, we show how conditional independence testing can be used to detect shifted variables for modeling. We find that the structural equation model (SEM) approach gives more accurate evaluations and learns lower-variance policies than traditional approaches, particularly for large shifts. Further, the SEM approach learns an optimal policy, assuming the model is sufficiently well-specified.


【6】Supervised Dynamic Dimension Reduction with Deep Neural Network
标题:利用深度神经网络进行有监督的动态降维
链接:https://arxiv.org/abs/2508.03546

作者:o, Yuefeng Han, Xiufan Yu
摘要:本文研究了降维问题,旨在提高高维预测因子的时间序列预测能力。我们提出了一种新的监督深度动态主成分分析(SDDP)框架,该框架将目标变量和滞后观测值纳入因子提取过程。在时间神经网络的辅助下,我们通过以监督的方式缩放原始预测器来构建目标感知预测器,并将更大的权重分配给预测能力更强的预测器。然后对目标感知预测器执行主成分分析以提取估计的SDDP因子。这种有监督的因子提取不仅提高了下游预测任务的预测准确性,而且还产生了更多可解释的和特定于目标的潜在因子。SDDP的基础上,我们提出了一个因素增强的非线性动态预测模型,统一了广泛的家庭因素模型为基础的预测方法。为了进一步证明SDDP的更广泛的适用性,我们将我们的研究扩展到一个更具挑战性的场景,当预测因子仅部分可观察时。我们在几个真实世界的公共数据集上验证了所提出的方法的经验性能。结果表明,我们的算法实现了显着的改善,预测精度相比,国家的最先进的方法。
摘要:This paper studies the problem of dimension reduction, tailored to improving time series forecasting with high-dimensional predictors. We propose a novel Supervised Deep Dynamic Principal component analysis (SDDP) framework that incorporates the target variable and lagged observations into the factor extraction process. Assisted by a temporal neural network, we construct target-aware predictors by scaling the original predictors in a supervised manner, with larger weights assigned to predictors with stronger forecasting power. A principal component analysis is then performed on the target-aware predictors to extract the estimated SDDP factors. This supervised factor extraction not only improves predictive accuracy in the downstream forecasting task but also yields more interpretable and target-specific latent factors. Building upon SDDP, we propose a factor-augmented nonlinear dynamic forecasting model that unifies a broad family of factor-model-based forecasting approaches. To further demonstrate the broader applicability of SDDP, we extend our studies to a more challenging scenario when the predictors are only partially observable. We validate the empirical performance of the proposed method on several real-world public datasets. The results show that our algorithm achieves notable improvements in forecasting accuracy compared to state-of-the-art methods.
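以下草图展示"按预测能力有监督地缩放预测变量,再做主成分提取"这一核心步骤的简化版本(论文中的监督权重由时间神经网络学习,此处用与未来目标的滞后相关性代替,属于显式的简化假设)。

```python
import numpy as np

def sddp_factors_sketch(X, y, n_factors=3, lag=1):
    """X: (T, p) 高维预测变量;y: (T,) 预测目标。
    1) 用各预测变量与未来目标的相关性绝对值作为监督权重(论文中由时间神经网络学习);
    2) 按权重缩放标准化后的预测变量,得到 target-aware predictors;
    3) 对缩放后的矩阵做 PCA,提取 SDDP 因子。"""
    T, p = X.shape
    X_past, y_future = X[:-lag], y[lag:]
    weights = np.abs([np.corrcoef(X_past[:, j], y_future)[0, 1] for j in range(p)])
    weights = np.nan_to_num(weights)
    X_scaled = (X - X.mean(0)) / (X.std(0) + 1e-8) * weights   # 预测力越强权重越大
    _, _, Vt = np.linalg.svd(X_scaled - X_scaled.mean(0), full_matrices=False)
    return X_scaled @ Vt[:n_factors].T                          # (T, n_factors) 因子

factors = sddp_factors_sketch(np.random.randn(200, 50), np.random.randn(200))
```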


【7】Model Accuracy and Data Heterogeneity Shape Uncertainty Quantification in Machine Learning Interatomic Potentials
标题:模型准确性和数据异质性塑造机器学习原子间势中的不确定性量化
链接:https://arxiv.org/abs/2508.03405

作者:g, Zixiong Wei, Kai Liu, Wei Gao, Poulumi Dey
摘要:机器学习原子间势(MLIP)可以实现精确的原子建模,但可靠的不确定性量化(UQ)仍然难以捉摸。在这项研究中,我们研究了两个UQ策略,集成学习和D-最优,在原子簇扩展框架。结果表明,更高的模型精度加强了预测的不确定性和实际误差之间的相关性,并提高了新颖性检测,D-最优产生更保守的估计。这两种方法都在同质训练集上提供了校准良好的不确定性,但它们低估了错误,并在异构数据集上表现出较低的新颖性敏感性。为了解决这个问题,我们引入了聚类增强的局部D最优性,它在训练过程中将配置空间划分为集群,并在每个集群中应用D最优性。这种方法大大提高了异构数据集中新原子环境的检测。我们的研究结果阐明了模型保真度和数据异质性在UQ性能中的作用,并为MLIP开发提供了一条实用的途径,以实现强大的主动学习和自适应采样策略。
摘要:Machine learning interatomic potentials (MLIPs) enable accurate atomistic modelling, but reliable uncertainty quantification (UQ) remains elusive. In this study, we investigate two UQ strategies, ensemble learning and D-optimality, within the atomic cluster expansion framework. It is revealed that higher model accuracy strengthens the correlation between predicted uncertainties and actual errors and improves novelty detection, with D-optimality yielding more conservative estimates. Both methods deliver well calibrated uncertainties on homogeneous training sets, yet they underpredict errors and exhibit reduced novelty sensitivity on heterogeneous datasets. To address this limitation, we introduce clustering-enhanced local D-optimality, which partitions configuration space into clusters during training and applies D-optimality within each cluster. This approach substantially improves the detection of novel atomic environments in heterogeneous datasets. Our findings clarify the roles of model fidelity and data heterogeneity in UQ performance and provide a practical route to robust active learning and adaptive sampling strategies for MLIP development.


【8】PatchDSU: Uncertainty Modeling for Out of Distribution Generalization in Keyword Spotting
标题:PatchDSU:面向关键词检测中分布外泛化的不确定性建模
链接:https://arxiv.org/abs/2508.03190

作者:ni Chernyak, Yael Segal, Yosi Shrem, Joseph Keshet
备注:This work has been submitted to the IEEE for possible publication
摘要:深度学习模型擅长许多任务,但依赖于训练和测试数据遵循相同分布的假设。这种假设在现实世界的语音系统中往往不成立,因为在现实世界中,由于环境、录音条件和说话者多样性的变化,分布偏移是常见的。   不确定性域移位(DSU)方法根据输入特征统计来增加各层神经网络的输入。它通过假设特征统计遵循多元高斯分布并使用来自该分布的采样特征替代输入来解决域外泛化问题。虽然对计算机视觉有效,但由于数据的性质,将DSU应用于语音带来了挑战。与静态视觉数据不同,语音是一种时间信号,通常由频谱图表示-频率随时间的变化。这种表示不能被视为一个简单的图像,并且当应用于整个输入时,所产生的稀疏性可能导致倾斜的特征统计。   为了解决关键字定位中的分布问题,我们提出了PatchDSU,它通过将输入拆分为补丁并独立地增强每个补丁来扩展DSU。我们评估了PatchDSU和DSU以及Google Speech Commands,Librispeech和TED-LIUM上的其他方法。此外,我们评估了白高斯和MUSAN音乐噪声条件下的性能。我们还通过分析模型在未经训练的数据集上的性能来探索域外泛化。总的来说,在大多数情况下,PatchDSU和DSU都优于其他方法。值得注意的是,与其他方法相比,PatchDSU在评估的场景中表现出更一致的改进。
摘要:Deep learning models excel at many tasks but rely on the assumption that training and test data follow the same distribution. This assumption often does not hold in real-world speech systems, where distribution shifts are common due to varying environments, recording conditions, and speaker diversity.   The method of Domain Shifts with Uncertainty (DSU) augments the input of each neural network layer based on the input feature statistics. It addresses the problem of out-of-domain generalization by assuming feature statistics follow a multivariate Gaussian distribution and substitutes the input with sampled features from this distribution. While effective for computer vision, applying DSU to speech presents challenges due to the nature of the data. Unlike static visual data, speech is a temporal signal commonly represented by a spectrogram - the change of frequency over time. This representation cannot be treated as a simple image, and the resulting sparsity can lead to skewed feature statistics when applied to the entire input.   To tackle out-of-distribution issues in keyword spotting, we propose PatchDSU, which extends DSU by splitting the input into patches and independently augmenting each patch. We evaluated PatchDSU and DSU alongside other methods on the Google Speech Commands, Librispeech, and TED-LIUM. Additionally, we evaluated performance under white Gaussian and MUSAN music noise conditions. We also explored out-of-domain generalization by analyzing model performance on datasets they were not trained on. Overall, in most cases, both PatchDSU and DSU outperform other methods. Notably, PatchDSU demonstrates more consistent improvements across the evaluated scenarios compared to other approaches.
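下面给出 PatchDSU 核心思想的简化草图(非官方实现):先把语谱图沿时间轴切成若干 patch,再在每个 patch 内按 DSU 的方式对特征统计量(均值、标准差)做高斯扰动重采样;扰动强度用"统计量的批内方差"估计,输入形状与 patch 数均为假设。

```python
import torch

def dsu_perturb(x, eps=1e-6):
    """DSU:对单个 patch 的特征统计量做不确定性重采样。x: (B, C, F, T_patch)"""
    mu = x.mean(dim=(2, 3), keepdim=True)                  # 每个样本的均值
    sig = x.std(dim=(2, 3), keepdim=True) + eps            # 每个样本的标准差
    # 统计量本身的不确定性:用批内方差估计
    sig_mu = mu.std(dim=0, keepdim=True) + eps
    sig_sig = sig.std(dim=0, keepdim=True) + eps
    new_mu = mu + torch.randn_like(mu) * sig_mu
    new_sig = sig + torch.randn_like(sig) * sig_sig
    return (x - mu) / sig * new_sig + new_mu

def patch_dsu(spec, n_patches=4):
    """PatchDSU:沿时间轴切 patch,逐 patch 独立做 DSU,避免整段稀疏语谱扭曲统计量。"""
    chunks = torch.chunk(spec, n_patches, dim=-1)
    return torch.cat([dsu_perturb(c) for c in chunks], dim=-1)

out = patch_dsu(torch.randn(8, 1, 40, 100))   # (batch, channel, mel, time) 为假设的输入形状
```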


【9】Veli: Unsupervised Method and Unified Benchmark for Low-Cost Air Quality Sensor Correction
标题:Veli:低成本空气质量传感器校正的无监督方法和统一基准
链接:https://arxiv.org/abs/2508.02724

作者:bah, Marcel Worring, Yen-Chia Hsu
备注:Main content: 7 pages, 9 Figures, 3 Tables. Appendix: 4 pages, 6 Figures
摘要 :城市空气污染是一个重大的健康危机,每年造成数百万人过早死亡,这突出表明迫切需要对空气质量进行准确和可扩展的监测。虽然低成本传感器(LCS)为昂贵的参考级站提供了可扩展的替代方案,但它们的读数会受到漂移、校准误差和环境干扰的影响。为了应对这些挑战,我们引入了Veli(通过潜在推理进行无参考变分估计),这是一种无监督贝叶斯模型,它利用变分推理来校正LCS读数,而无需与参考站共置,从而消除了主要的部署障碍。具体来说,Veli构建了LCS读数的解纠缠表示,有效地将真正的污染物读数与传感器噪声分离开来。为了建立我们的模型并解决AQ监测中缺乏标准化基准的问题,我们还引入了空气质量传感器数据库(AQ-SDR)。AQ-SDR是迄今为止最大的AQ传感器基准,来自多个地区的23,737个LCS和参考站的读数。Veli在分布内和分布外设置中都表现出很强的泛化能力,有效地处理了传感器漂移和不稳定的传感器行为。模型和数据集的代码将在本文发表时公开。
摘要:Urban air pollution is a major health crisis causing millions of premature deaths annually, underscoring the urgent need for accurate and scalable monitoring of air quality (AQ). While low-cost sensors (LCS) offer a scalable alternative to expensive reference-grade stations, their readings are affected by drift, calibration errors, and environmental interference. To address these challenges, we introduce Veli (Reference-free Variational Estimation via Latent Inference), an unsupervised Bayesian model that leverages variational inference to correct LCS readings without requiring co-location with reference stations, eliminating a major deployment barrier. Specifically, Veli constructs a disentangled representation of the LCS readings, effectively separating the true pollutant reading from the sensor noise. To build our model and address the lack of standardized benchmarks in AQ monitoring, we also introduce the Air Quality Sensor Data Repository (AQ-SDR). AQ-SDR is the largest AQ sensor benchmark to date, with readings from 23,737 LCS and reference stations across multiple regions. Veli demonstrates strong generalization across both in-distribution and out-of-distribution settings, effectively handling sensor drift and erratic sensor behavior. Code for model and dataset will be made public when this paper is published.


【10】Measuring Dependencies between Biological Signals with Temporal Self-supervision, and its Limitations
标题:利用时间自我监督来测量生物信号之间的依赖性及其局限性
链接:https://arxiv.org/abs/2508.02703

作者: Sariyanidi, John D. Herrington, Lisa Yankowitz, Pratik Chaudhari, Theodore D. Satterthwaite, Casey J. Zampella, Robert T. Schultz, Russell T. Shinohara, Birkan Tunc
备注:To be submitted to NeurIPS 2025 AI for Science Workshop
摘要:测量观测信号之间的统计相关性是科学发现的主要工具。然而,生物系统往往表现出复杂的非线性相互作用,目前无法捕捉没有先验知识的依赖性的性质。我们介绍了一种自我监督的方法,并发,这是由观察到的启发,如果两个信号是依赖的,那么应该能够区分时间对齐与未对齐的片段从它们中提取。功能磁共振成像,生理和行为信号的实验表明,据我们所知,并发是第一种可以揭示如此广泛的信号之间的关系并提取科学相关差异的方法,而无需特别的参数调整或依赖先验信息,为跨领域的科学发现提供了一个有力的工具。然而,由外部因素引起的依赖性仍然是一个悬而未决的问题,因此研究者应该验证暴露的关系是否真正属于感兴趣的问题。
摘要:Measuring the statistical dependence between observed signals is a primary tool for scientific discovery. However, biological systems often exhibit complex non-linear interactions that currently cannot be captured without a priori knowledge regarding the nature of dependence. We introduce a self-supervised approach, concurrence, which is inspired by the observation that if two signals are dependent, then one should be able to distinguish between temporally aligned vs. misaligned segments extracted from them. Experiments with fMRI, physiological and behavioral signals show that, to our knowledge, concurrence is the first approach that can expose relationships across such a wide spectrum of signals and extract scientifically relevant differences without ad-hoc parameter tuning or reliance on a priori information, providing a potent tool for scientific discoveries across fields. However, dependencies caused by extraneous factors remain an open problem; thus researchers should validate that exposed relationships truly pertain to the question(s) of interest.
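以下是"并发(concurrence)"思想的一个极简自监督草图(非论文实现):从两条信号中抽取时间对齐与错位的片段对,训练分类器区分二者;若可区分性显著高于随机水平,则提示两条信号存在依赖。分类器、片段长度与采样数量均为假设。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def concurrence_score(x, y, win=50, n_pairs=500, rng=None):
    """x, y: 一维同步采样信号。返回区分"对齐 vs 错位"片段对的交叉验证准确率,
    约 0.5 表示无法区分(无依赖证据),显著高于 0.5 表示存在依赖。"""
    rng = np.random.default_rng(0) if rng is None else rng
    T = min(len(x), len(y)) - win
    feats, labels = [], []
    for _ in range(n_pairs):
        i = rng.integers(0, T)
        j = rng.integers(0, T)                       # 错位片段的起点
        feats.append(np.concatenate([x[i:i + win], y[i:i + win]]))   # 对齐
        feats.append(np.concatenate([x[i:i + win], y[j:j + win]]))   # 错位
        labels += [1, 0]
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, np.array(feats), np.array(labels), cv=5).mean()

# 用法示意:y 是 x 的非线性函数加噪声,应得到明显高于 0.5 的分数
t = np.linspace(0, 100, 5000)
x = np.sin(t) + 0.1 * np.random.randn(5000)
y = x ** 2 + 0.1 * np.random.randn(5000)
print(concurrence_score(x, y))
```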


迁移|Zero/Few/One-Shot|自适应(6篇)

【1】DiWA: Diffusion Policy Adaptation with World Models
标题:DiWA:基于世界模型的扩散策略自适应
链接:https://arxiv.org/abs/2508.03645

作者:Chandra, Iman Nematollahi, Chenguang Huang, Tim Welschehold, Wolfram Burgard, Abhinav Valada
备注:Accepted at the 2025 Conference on Robot Learning (CoRL)
摘要:用强化学习(RL)微调扩散策略提出了重大挑战。每个动作预测的长去噪序列阻碍了有效的奖励传播。此外,标准RL方法需要数百万次真实世界的交互,这对实际的微调构成了主要瓶颈。尽管先前的工作将扩散策略中的去噪过程框架为马尔可夫决策过程以实现基于RL的更新,但其对环境交互的强烈依赖仍然非常低效。为了弥补这一差距,我们引入了DiWA,这是一个新颖的框架,它利用世界模型通过强化学习完全离线地微调基于扩散的机器人技能。与需要数百万次环境交互来微调机器人技能库的无模型方法不同,DiWA使用一个在几十万次离线游戏交互上训练过的世界模型来实现有效的适应。这导致样本效率的显著提高,使该方法在现实世界的机器人学习中更加实用和安全。在具有挑战性的CALVIN基准测试中,DiWA仅使用离线自适应来提高八个任务的性能,同时需要比无模型基线少几个数量级的物理交互。据我们所知,这是第一次使用离线世界模型对现实世界机器人技能的微调扩散政策进行演示。我们在https://diwa.cs.uni-freiburg.de上公开提供代码。
摘要:Fine-tuning diffusion policies with reinforcement learning (RL) presents significant challenges. The long denoising sequence for each action prediction impedes effective reward propagation. Moreover, standard RL methods require millions of real-world interactions, posing a major bottleneck for practical fine-tuning. Although prior work frames the denoising process in diffusion policies as a Markov Decision Process to enable RL-based updates, its strong dependence on environment interaction remains highly inefficient. To bridge this gap, we introduce DiWA, a novel framework that leverages a world model for fine-tuning diffusion-based robotic skills entirely offline with reinforcement learning. Unlike model-free approaches that require millions of environment interactions to fine-tune a repertoire of robot skills, DiWA achieves effective adaptation using a world model trained once on a few hundred thousand offline play interactions. This results in dramatically improved sample efficiency, making the approach significantly more practical and safer for real-world robot learning. On the challenging CALVIN benchmark, DiWA improves performance across eight tasks using only offline adaptation, while requiring orders of magnitude fewer physical interactions than model-free baselines. To our knowledge, this is the first demonstration of fine-tuning diffusion policies for real-world robotic skills using an offline world model. We make the code publicly available at https://diwa.cs.uni-freiburg.de.


【2】Adaptive Sparse Softmax: An Effective and Efficient Softmax Variant
标题:自适应稀疏Softmax:一种有效且高效的Softmax变体
链接:https://arxiv.org/abs/2508.03175

作者:i Geng, Ziqiang Cao, Min Cao, Sujian Li, Wenjie Li, Guohong Fu
备注:Accept by IEEE TASLP (Early accept version)
摘要:具有交叉熵损失的Softmax是当前神经分类模型的标准配置。目标类的gold score应该是1,但在softmax模式下永远无法达到。这样的问题使得训练过程永远持续下去,并导致过拟合。此外,“目标-接近-1”训练目标迫使模型不断学习所有样本,导致在处理一些已经以高置信度正确分类的样本时浪费时间,而测试目标只是要求每个样本的目标类保持最高得分。为了解决上述缺点,我们提出了自适应稀疏softmax(AS-Softmax),它在softmax之上设计了一个合理的测试匹配变换。为了更有目的的学习,我们在训练过程中丢弃了与实际类别相比分数小得多的类别。这样,模型就可以集中精力学习,将目标类与其强对手区分开来,这也是测试中的一大挑战。此外,由于AS-Softmax中容易样本的训练损失会逐渐下降到0,因此我们开发了一种基于掩蔽样本比率的自适应梯度累积策略来加快训练速度。我们在各种文本多类,文本多标签,文本标记分类,图像分类和音频分类任务上验证了所提出的AS-Softmax,类大小范围从5到5000+。结果表明,AS-Softmax的性能始终优于softmax及其变体,并且AS-Softmax的损失与验证中的分类性能显著相关。此外,自适应梯度累积策略可以带来1.2倍的训练加速比相比,标准softmax,同时保持分类效果。
摘要 :Softmax with the cross entropy loss is the standard configuration for current neural classification models. The gold score for a target class is supposed to be 1, but it is never reachable under the softmax schema. Such a problem makes the training process continue forever and leads to overfitting. Moreover, the "target-approach-1" training goal forces the model to continuously learn all samples, leading to a waste of time in handling some samples which have already been classified correctly with high confidence, while the test goal simply requires the target class of each sample to hold the maximum score. To solve the above weaknesses, we propose the Adaptive Sparse softmax (AS-Softmax) which designs a reasonable and test-matching transformation on top of softmax. For more purposeful learning, we discard the classes with far smaller scores compared with the actual class during training. Then the model could focus on learning to distinguish the target class from its strong opponents, which is also the great challenge in test. In addition, since the training losses of easy samples will gradually drop to 0 in AS-Softmax, we develop an adaptive gradient accumulation strategy based on the masked sample ratio to speed up training. We verify the proposed AS-Softmax on a variety of text multi-class, text multi-label, text token classification, image classification and audio classification tasks with class sizes ranging from 5 to 5000+. The results show that AS-Softmax consistently outperforms softmax and its variants, and the loss of AS-Softmax is remarkably correlated with classification performance in validation. Furthermore, adaptive gradient accumulation strategy can bring about 1.2x training speedup comparing with the standard softmax while maintaining classification effectiveness.
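下面是 AS-Softmax 核心思想的一个简化草图(非官方实现):训练时把得分远低于目标类的类别从归一化中剔除,只在目标类与其"强对手"之间计算交叉熵;阈值的具体形式为假设,论文中的自适应梯度累积策略未包含。

```python
import torch
import torch.nn.functional as F

def as_softmax_loss(logits, target, delta=5.0):
    """logits: (B, C);target: (B,)。
    将 logit 比目标类低超过 delta 的类别屏蔽,使模型专注于区分目标类与强对手;
    当所有对手都被屏蔽时该样本损失趋近 0,即不再反复学习已高置信度分对的样本。"""
    target_logit = logits.gather(1, target.unsqueeze(1))            # (B, 1)
    keep = logits >= (target_logit - delta)                          # 仅保留强对手
    keep.scatter_(1, target.unsqueeze(1), True)                      # 目标类始终保留
    masked_logits = logits.masked_fill(~keep, float("-inf"))
    return F.cross_entropy(masked_logits, target)

# 用法示意
logits = torch.randn(4, 100, requires_grad=True)
loss = as_softmax_loss(logits, torch.randint(0, 100, (4,)))
loss.backward()
```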


【3】On the Fast Adaptation of Delayed Clients in Decentralized Federated Learning: A Centroid-Aligned Distillation Approach
标题:去中心化联邦学习中延迟客户端的快速适应:一种质心对齐的蒸馏方法
链接:https://arxiv.org/abs/2508.02993

作者:i, Hai Dong, A. K. Qin
备注:This paper is currently under peer review
摘要:分散式联合学习(DFL)在异步环境中努力适应延迟加入的客户端的缓慢适应和高通信成本。这些限制严重阻碍了整体性能。为了解决这个问题,我们提出了DFedCAD,一个新的框架,通过质心对齐蒸馏快速适应。DFedCAD首先采用加权聚类剪枝(WCP)将模型压缩成代表性的质心,大大减少了通信开销。然后,它使延迟的客户端能够使用一种新的结构距离度量和一个可微分的k-means蒸馏模块来智能地权衡和调整对等知识,从而促进高效的端到端知识转移。在CIFAR-10、CIFAR-100和Tiny-ImageNet上进行的大量实验表明,DFedCAD始终实现了最先进的性能,在所有评估的设置中实现了最高的准确性,同时将通信开销减少了86%以上。我们的框架提供了一个可扩展的和实用的解决方案,在动态的,真实的场景中有效的分散学习。
摘要:Decentralized Federated Learning (DFL) struggles with the slow adaptation of late-joining delayed clients and high communication costs in asynchronous environments. These limitations significantly hinder overall performance. To address this, we propose DFedCAD, a novel framework for rapid adaptation via Centroid-Aligned Distillation. DFedCAD first employs Weighted Cluster Pruning (WCP) to compress models into representative centroids, drastically reducing communication overhead. It then enables delayed clients to intelligently weigh and align with peer knowledge using a novel structural distance metric and a differentiable k-means distillation module, facilitating efficient end-to-end knowledge transfer. Extensive experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet show that DFedCAD consistently achieves state-of-the-art performance, attaining the highest accuracy across all evaluated settings while reducing communication overhead by over 86%. Our framework provides a scalable and practical solution for efficient decentralized learning in dynamic, real-world scenarios.


【4】Learning from B Cell Evolution: Adaptive Multi-Expert Diffusion for Antibody Design via Online Optimization
标题:从B细胞进化中学习:通过在线优化进行抗体设计的自适应多专家扩散
链接:https://arxiv.org/abs/2508.02834

作者:g, Peng Qiu, Mengchun Zhang, Yiran Tao, You Fan, Jingtao Xu, Barnabas Poczos
摘要:扩散模型的最新进展已经显示出抗体设计的显着潜力,但现有的方法应用统一的生成策略,不能适应每个抗原的独特要求。受B细胞亲和力成熟的启发,抗体通过多目标优化平衡亲和力,稳定性和自我回避来进化,我们提出了第一个生物学动机的框架,该框架利用在线元学习系统中基于物理学的领域知识。我们的方法采用了多个专业专家(范德华,分子识别,能量平衡和界面几何),其参数在生成过程中基于迭代反馈进行演变,模仿天然抗体的细化周期。而不是固定的协议,这种自适应指导发现个性化的优化策略,为每个目标。我们的实验表明,这种方法:(1)发现了最佳的SE(3)-等变指导策略,为不同的抗原类别,而无需预先训练,保持分子的对称性,在整个优化;(2)显着提高热点覆盖率和接口质量,通过目标特异性的适应,实现平衡的多目标优化特性的治疗性抗体;(3)建立了迭代优化的范例,其中每个抗体-抗原系统通过在线评估学习其独特的优化概况;(4)有效地概括了从小表位到大蛋白质界面的各种设计挑战,从而能够针对单个靶标进行精确聚焦的活动。
摘要:Recent advances in diffusion models have shown remarkable potential for antibody design, yet existing approaches apply uniform generation strategies that cannot adapt to each antigen's unique requirements. Inspired by B cell affinity maturation, where antibodies evolve through multi-objective optimization balancing affinity, stability, and self-avoidance, we propose the first biologically-motivated framework that leverages physics-based domain knowledge within an online meta-learning system. Our method employs multiple specialized experts (van der Waals, molecular recognition, energy balance, and interface geometry) whose parameters evolve during generation based on iterative feedback, mimicking natural antibody refinement cycles. Instead of fixed protocols, this adaptive guidance discovers personalized optimization strategies for each target. Our experiments demonstrate that this approach: (1) discovers optimal SE(3)-equivariant guidance strategies for different antigen classes without pre-training, preserving molecular symmetries throughout optimization; (2) significantly enhances hotspot coverage and interface quality through target-specific adaptation, achieving balanced multi-objective optimization characteristic of therapeutic antibodies; (3) establishes a paradigm for iterative refinement where each antibody-antigen system learns its unique optimization profile through online evaluation; (4) generalizes effectively across diverse design challenges, from small epitopes to large protein interfaces, enabling precision-focused campaigns for individual targets.


【5】MPCA-based Domain Adaptation for Transfer Learning in Ultrasonic Guided Waves
标题:基于MPCA的领域自适应用于超声导波中的迁移学习
链接:https://arxiv.org/abs/2508.02726

作者:ello, Francesco Cadini, Luca Lomazzi
摘要:超声导波(UGW)是薄壁结构中结构健康监测(SHM)的一种很有前途的诊断工具,其与机器学习(ML)算法的集成越来越多地被采用,以实现实时监测功能。然而,基于UGW的ML方法的大规模部署受到数据稀缺和不同材料和传感器配置的有限推广的限制。为了解决这些限制,这项工作提出了一种新的迁移学习(TL)框架的基础上多线性主成分分析(MPCA)。首先,训练用于回归的卷积神经网络(CNN)以执行针对镀覆结构的损伤定位。然后,将MPCA和微调相结合,以使CNN针对不同的板工作。通过将MPCA联合应用于源域和目标域,该方法提取共享的潜在特征,从而实现有效的域自适应,而无需预先假设维度。在MPCA之后,微调使预训练的CNN能够适应新的领域,而不需要大的训练数据集。建议MPCA为基础的TL方法进行了测试,对12个案例研究,涉及不同的复合材料和传感器阵列。统计指标被用来评估域对齐之前和之后的MPCA,结果表明,在本地化误差大幅减少相比,标准TL技术。因此,所提出的方法出现作为一个强大的,数据高效,基于统计的TL框架UGW为基础的SHM。
摘要 :Ultrasonic Guided Waves (UGWs) represent a promising diagnostic tool for Structural Health Monitoring (SHM) in thin-walled structures, and their integration with machine learning (ML) algorithms is increasingly being adopted to enable real-time monitoring capabilities. However, the large-scale deployment of UGW-based ML methods is constrained by data scarcity and limited generalisation across different materials and sensor configurations. To address these limitations, this work proposes a novel transfer learning (TL) framework based on Multilinear Principal Component Analysis (MPCA). First, a Convolutional Neural Network (CNN) for regression is trained to perform damage localisation for a plated structure. Then, MPCA and fine-tuning are combined to have the CNN work for a different plate. By jointly applying MPCA to the source and target domains, the method extracts shared latent features, enabling effective domain adaptation without requiring prior assumptions about dimensionality. Following MPCA, fine-tuning enables adapting the pre-trained CNN to a new domain without the need for a large training dataset. The proposed MPCA-based TL method was tested against 12 case studies involving different composite materials and sensor arrays. Statistical metrics were used to assess domains alignment both before and after MPCA, and the results demonstrate a substantial reduction in localisation error compared to standard TL techniques. Hence, the proposed approach emerges as a robust, data-efficient, and statistically based TL framework for UGW-based SHM.


【6】Evaluating Transfer Learning Methods on Real-World Data Streams: A Case Study in Financial Fraud Detection
标题:评估现实数据流上的迁移学习方法:金融欺诈检测案例研究
链接:https://arxiv.org/abs/2508.02702

作者:ibeiro Pereira, Jacopo Bono, Hugo Ferreira, Pedro Ribeiro, Carlos Soares, Pedro Bizarro
备注:16 pages, 7 figures, submitted to ECML PKDD 2025
摘要:当目标域的可用数据有限时,迁移学习(TL)方法可以用于在相关的数据丰富的域上开发模型,然后将它们部署到目标域上。然而,这些TL方法通常设计有关于可用的标记和未标记的目标数据的量的特定的静态假设。这与许多现实世界的应用程序相反,在现实世界中,数据和相应标签的可用性随时间而变化。由于TL方法的评价通常也是在相同的静态数据可用性假设下进行的,这将导致对它们在现实世界中的性能产生不切实际的期望。为了支持对TL算法和模型进行更真实的评估和比较,我们提出了一个数据操作框架,该框架(1)模拟随着时间的推移而变化的数据可用性场景,(2)通过对给定数据集进行重新采样来创建多个域,以及(3)引入跨域可变性通过应用现实的域变换,例如,产生各种潜在的时间依赖性协变量和概念转移。这些功能可以模拟大量真实的实验变体,从而提供有关在动态设置中部署算法时的潜在行为的更多信息。我们证明了所提出的框架的实用性,通过对专有的真实世界的卡支付数据集进行案例研究。鉴于案例研究的保密性,我们还说明了使用公开的银行账户欺诈(BAF)数据集的框架。通过提供一种方法来评估TL方法随着时间的推移,在现实的数据可用性的情况下,我们的框架促进了模型和算法的行为的理解。这使得在现实环境中为新领域部署模型时可以做出更好的决策。
摘要:When the available data for a target domain is limited, transfer learning (TL) methods can be used to develop models on related data-rich domains, before deploying them on the target domain. However, these TL methods are typically designed with specific, static assumptions on the amount of available labeled and unlabeled target data. This is in contrast with many real world applications, where the availability of data and corresponding labels varies over time. Since the evaluation of the TL methods is typically also performed under the same static data availability assumptions, this would lead to unrealistic expectations concerning their performance in real world settings. To support a more realistic evaluation and comparison of TL algorithms and models, we propose a data manipulation framework that (1) simulates varying data availability scenarios over time, (2) creates multiple domains through resampling of a given dataset and (3) introduces inter-domain variability by applying realistic domain transformations, e.g., creating a variety of potentially time-dependent covariate and concept shifts. These capabilities enable simulation of a large number of realistic variants of the experiments, in turn providing more information about the potential behavior of algorithms when deployed in dynamic settings. We demonstrate the usefulness of the proposed framework by performing a case study on a proprietary real-world suite of card payment datasets. Given the confidential nature of the case study, we also illustrate the use of the framework on the publicly available Bank Account Fraud (BAF) dataset. By providing a methodology for evaluating TL methods over time and in realistic data availability scenarios, our framework facilitates understanding of the behavior of models and algorithms. This leads to better decision making when deploying models for new domains in real-world environments.


强化学习(4篇)

【1】Agent Lightning: Train ANY AI Agents with Reinforcement Learning
标题:Agent Lightning:通过强化学习训练任何人工智能Agent
链接:https://arxiv.org/abs/2508.03680

作者:o, Yuge Zhang, Zhiyuan He, Zilong Wang, Siyun Zhao, Dongsheng Li, Luna K. Qiu, Yuqing Yang
摘要:我们提出了Agent Lightning,这是一个灵活且可扩展的框架,可以为任何AI Agent提供基于强化学习(RL)的大型语言模型(LLM)训练。与现有的将RL训练与代理紧密耦合或依赖于序列级联与掩蔽的方法不同,Agent Lightning实现了代理执行和训练之间的完全解耦,允许与通过不同方式开发的现有代理无缝集成(例如,使用LangChain,OpenAI Agents SDK,AutoGen等框架,从头开始构建),几乎零代码修改。通过将代理执行制定为马尔可夫决策过程,我们定义了一个统一的数据接口,并提出了一个分层RL算法,LightningRL,其中包含一个信用分配模块,允许我们将任何代理生成的轨迹分解为训练过渡。这使RL能够处理复杂的交互逻辑,例如多代理场景和动态工作流。在系统设计上,引入了训练-Agent分解的体系结构,并将Agent可观测性框架引入Agent运行时,提供了一个标准化的Agent微调接口。跨文本到SQL、检索增强生成和数学工具使用任务的实验表明了稳定、持续的改进,展示了该框架在现实世界代理培训和部署方面的潜力。
摘要:We present Agent Lightning, a flexible and extensible framework that enables Reinforcement Learning (RL)-based training of Large Language Models (LLMs) for any AI agent. Unlike existing methods that tightly couple RL training with agent or rely on sequence concatenation with masking, Agent Lightning achieves complete decoupling between agent execution and training, allowing seamless integration with existing agents developed via diverse ways (e.g., using frameworks like LangChain, OpenAI Agents SDK, AutoGen, and building from scratch) with almost ZERO code modifications. By formulating agent execution as Markov decision process, we define an unified data interface and propose a hierarchical RL algorithm, LightningRL, which contains a credit assignment module, allowing us to decompose trajectories generated by ANY agents into training transition. This enables RL to handle complex interaction logic, such as multi-agent scenarios and dynamic workflows. For the system design, we introduce a Training-Agent Disaggregation architecture, and brings agent observability frameworks into agent runtime, providing a standardized agent finetuning interface. Experiments across text-to-SQL, retrieval-augmented generation, and math tool-use tasks demonstrate stable, continuous improvements, showcasing the framework's potential for real-world agent training and deployment.


【2】SLA-MORL: SLA-Aware Multi-Objective Reinforcement Learning for HPC Resource Optimization
标题:SLA-MORL:SLA感知的多目标强化学习用于高性能计算资源优化
链接:https://arxiv.org/abs/2508.03509

作者:Mahmud Mostafa, Aravind Mohan, Jianwu Wang
摘要:云环境中机器学习工作负载的动态资源分配仍然具有挑战性,因为在满足服务水平协议(SLA)约束的同时,还要最大限度地减少培训时间和运营成本。传统方法采用静态资源分配或单目标优化,导致违反SLA或资源浪费。我们提出了SLA-MORL,这是一个自适应多目标强化学习框架,可以根据用户定义的偏好(时间,成本或平衡)智能分配GPU和CPU资源,同时确保SLA合规性。我们的方法引入了两个关键创新:(1)通过历史学习或有效的基线运行进行智能初始化,消除了冷启动问题,将初始探索开销减少了60%,以及(2)动态权重自适应,根据实时SLA违规严重程度自动调整优化优先级,创建一个自我纠正系统。SLA-MORL构建了一个21维的状态表示,捕获资源利用率,培训进度和SLA合规性,使行动者-批评者网络能够在9个可能的行动中做出明智的分配决策。使用生产HPC基础设施对13种不同的ML工作负载进行了广泛的评估,结果表明,与静态基线相比,SLA-MORL将关键工作的培训时间减少了67.2%,将受限制工作负载的成本降低了68.8%,并将总体SLA合规性提高了73.4%。通过解决冷启动低效率和动态适应挑战,SLA-MORL为云资源管理提供了一个实用的解决方案,可以在现代ML培训环境中平衡性能、成本和可靠性。
摘要 :Dynamic resource allocation for machine learning workloads in cloud environments remains challenging due to competing objectives of minimizing training time and operational costs while meeting Service Level Agreement (SLA) constraints. Traditional approaches employ static resource allocation or single-objective optimization, leading to either SLA violations or resource waste. We present SLA-MORL, an adaptive multi-objective reinforcement learning framework that intelligently allocates GPU and CPU resources based on user-defined preferences (time, cost, or balanced) while ensuring SLA compliance. Our approach introduces two key innovations: (1) intelligent initialization through historical learning or efficient baseline runs that eliminates cold-start problems, reducing initial exploration overhead by 60%, and (2) dynamic weight adaptation that automatically adjusts optimization priorities based on real-time SLA violation severity, creating a self-correcting system. SLA-MORL constructs a 21-dimensional state representation capturing resource utilization, training progress, and SLA compliance, enabling an actor-critic network to make informed allocation decisions across 9 possible actions. Extensive evaluation on 13 diverse ML workloads using production HPC infrastructure demonstrates that SLA-MORL achieves 67.2% reduction in training time for deadline-critical jobs, 68.8% reduction in costs for budget-constrained workloads, and 73.4% improvement in overall SLA compliance compared to static baselines. By addressing both cold-start inefficiency and dynamic adaptation challenges, SLA-MORL provides a practical solution for cloud resource management that balances performance, cost, and reliability in modern ML training environments.
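下面用一个极简草图说明"根据实时 SLA 违约严重程度动态调整多目标权重"的思路;权重更新公式、奖励合成方式与具体指标均为示意性假设,并非论文给出的实现。

```python
def adapt_weights(weights, sla_violation_severity, lr=0.1):
    """weights: 形如 {"time": 0.4, "cost": 0.4, "sla": 0.2} 的偏好权重;
    sla_violation_severity: [0, 1],0 表示完全满足 SLA。
    违约越严重,越把优化优先级向 SLA 合规项倾斜,形成自我纠正。"""
    w = dict(weights)
    w["sla"] += lr * sla_violation_severity
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}          # 归一化

def scalarized_reward(metrics, weights):
    """把(取负的)训练时间、成本与 SLA 合规度按当前权重合成单一标量奖励。"""
    return (weights["time"] * (-metrics["time"])
            + weights["cost"] * (-metrics["cost"])
            + weights["sla"] * metrics["sla_compliance"])

# 用法示意(数值为虚构)
w = {"time": 0.4, "cost": 0.4, "sla": 0.2}
w = adapt_weights(w, sla_violation_severity=0.8)
r = scalarized_reward({"time": 1.2, "cost": 0.9, "sla_compliance": 0.3}, w)
```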


【3】Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
标题:利用强化学习训练长上下文、多回合软件工程代理
链接:https://arxiv.org/abs/2508.03501

作者: Golubev, Maria Trofimova, Sergei Polezhaev, Ibragim Badertdinov, Maksim Nekrashevich, Anton Shevtsov, Simon Karasik, Sergey Abramov, Andrei Andriushchenko, Filipp Fisin, Sergei Skvortsov, Boris Yangel
摘要:强化学习(RL)在大型语言模型(LLM)中的应用研究主要集中在单轮问题上,例如数学推理或单次代码生成。虽然这些问题可以被视为令牌级多回合MDP,但这种观点对应于环境不提供反馈的多回合交互的退化情况。这与许多现实世界的领域形成了鲜明对比,例如软件工程(SWE),它需要与有状态环境进行丰富的多轮交互,并通过非平凡的观察来响应每个动作。为了弥合这一差距,我们展示了RL在这一一般设置下的成功应用。采用一种改进的解耦优势策略优化(DAPO)算法,训练一个基于Qwen2.5-72B-Instruct的Agent,使其能够解决实际的软件工程任务。我们的方法将代理在SWE-bench Verified基准上的成功率从20%的拒绝微调基线提高到39%,而不依赖于任何教师模型。在SWE-rebench上,我们的智能体使用相同的脚手架匹配或优于领先的开放权重模型,如DeepSeek-V3-0324和Qwen3-235B-A22B,为基于开放模型构建更强大的自主智能体解决复杂的现实问题提供了可行的途径。
摘要:Research on applications of Reinforcement Learning (RL) to Large Language Models (LLMs) has mostly been focused on single-turn problems, such as mathematical reasoning or single-shot code generation. While these problems can be viewed as token-level multi-turn MDPs, this view corresponds to a degenerate case of multi-turn interaction where the environment provides no feedback. This contrasts with many real-world domains, such as software engineering (SWE), which require rich multi-turn interactions with a stateful environment that responds to each action with a non-trivial observation.   To bridge this gap, we demonstrate the successful application of RL to this general regime. Using a modified Decoupled Advantage Policy Optimization (DAPO) algorithm, we train an agent based on Qwen2.5-72B-Instruct to solve real-world software engineering tasks. Our approach increases the agent's success rate on the SWE-bench Verified benchmark from a 20% rejection fine-tuned baseline to 39%, without relying on any teacher models. On SWE-rebench, our agent matches or outperforms leading open-weight models such as DeepSeek-V3-0324 and Qwen3-235B-A22B using an identical scaffolding, offering a viable path toward building more capable autonomous agents for complex real-world problems based on open models.


【4】Online Robust Multi-Agent Reinforcement Learning under Model Uncertainties
标题:模型不确定性下的在线鲁棒多智能体强化学习
链接:https://arxiv.org/abs/2508.02948

作者:edeen Farhat, Debamita Ghosh, George K. Atia, Yue Wang
摘要:经过良好训练的多智能体系统在部署到现实环境中时可能会失败,原因是训练和部署环境之间的模型不匹配,这是由环境不确定性(包括噪声或对抗性攻击)造成的。分布鲁棒马尔可夫博弈(DRMG)通过优化给定环境不确定性集合下的最坏情况性能来提高系统的弹性。然而,目前的方法是有限的,它们依赖于模拟器或大型离线数据集,而这些往往是不可用的。本文开创了DRMG在线学习的研究,其中代理直接从环境交互中学习,而无需先验数据。我们介绍了鲁棒乐观纳什值迭代(RONAVI)算法,并为该设置提供了第一个可证明的保证。我们的理论分析表明,该算法实现了低遗憾,并能针对以总变差散度和Kullback-Leibler散度度量的不确定性集高效地找到最优鲁棒策略。这些结果为发展真正鲁棒的多智能体系统建立了一条新的实用路径。
摘要:Well-trained multi-agent systems can fail when deployed in real-world environments due to model mismatches between the training and deployment environments, caused by environment uncertainties including noise or adversarial attacks. Distributionally Robust Markov Games (DRMGs) enhance system resilience by optimizing for worst-case performance over a defined set of environmental uncertainties. However, current methods are limited by their dependence on simulators or large offline datasets, which are often unavailable. This paper pioneers the study of online learning in DRMGs, where agents learn directly from environmental interactions without prior data. We introduce the {\it Robust Optimistic Nash Value Iteration (RONAVI)} algorithm and provide the first provable guarantees for this setting. Our theoretical analysis demonstrates that the algorithm achieves low regret and efficiently finds the optimal robust policy for uncertainty sets measured by Total Variation divergence and Kullback-Leibler divergence. These results establish a new, practical path toward developing truly robust multi-agent systems.


分层学习(1篇)

【1】HiTeC: Hierarchical Contrastive Learning on Text-Attributed Hypergraph with Semantic-Aware Augmentation
标题:HiTeC:具有语义感知增强的文本属性超图的分层对比学习
链接:https://arxiv.org/abs/2508.03104

作者:Pan, Fan Li, Xiaoyang Wang, Wenjie Zhang, Xuemin Lin
备注:12 pages, 18 figures
摘要:对比学习(CL)已经成为自监督超图学习的主要范式,可以在没有昂贵标签的情况下进行有效的训练。然而,在现实世界的超图中,节点实体往往附带丰富的文本信息,这在以往工作中被忽视。直接将现有的基于CL的方法应用于此类文本属性超图(TAHG)会导致三个关键限制:(1)图不可知文本编码器的普遍使用忽略了文本内容和超图拓扑之间的相关性,导致次优表示。(2)它们对随机数据增强的依赖会引入噪声并削弱对比目标。(3)主要关注节点和超边级别的对比信号限制了捕获长程依赖关系的能力,而这对于表达性表征学习至关重要。虽然HyperBERT在TAHG上率先引入了CL,但其协同训练范式的可扩展性较差。为了填补这一研究空白,我们引入了HiTeC,这是一个两阶段的分层对比学习框架,具有语义感知增强功能,可用于TAHG上的可扩展和有效的自监督学习。在第一阶段,我们用结构感知的对比目标来预训练文本编码器,以克服传统方法的图不可知性。在第二阶段,我们引入了两个语义感知的增强策略,包括提示增强的文本增强和语义感知的超边丢弃,以促进信息视图生成。此外,我们提出了一个多尺度对比损失,用基于$s$-walk的子图级对比扩展现有目标,以更好地捕捉长程依赖关系。通过将文本编码器预训练与超图对比学习解耦,这种两阶段设计在不影响表示质量的情况下增强了可扩展性。大量的实验证实了HiTeC的有效性。
摘要 :Contrastive learning (CL) has become a dominant paradigm for self-supervised hypergraph learning, enabling effective training without costly labels. However, node entities in real-world hypergraphs are often associated with rich textual information, which is overlooked in prior works. Directly applying existing CL-based methods to such text-attributed hypergraphs (TAHGs) leads to three key limitations: (1) The common use of graph-agnostic text encoders overlooks the correlations between textual content and hypergraph topology, resulting in suboptimal representations. (2) Their reliance on random data augmentations introduces noise and weakens the contrastive objective. (3) The primary focus on node- and hyperedge-level contrastive signals limits the ability to capture long-range dependencies, which is essential for expressive representation learning. Although HyperBERT pioneers CL on TAHGs, its co-training paradigm suffers from poor scalability. To fill the research gap, we introduce HiTeC, a two-stage hierarchical contrastive learning framework with semantic-aware augmentation for scalable and effective self-supervised learning on TAHGs. In the first stage, we pre-train the text encoder with a structure-aware contrastive objective to overcome the graph-agnostic nature of conventional methods. In the second stage, we introduce two semantic-aware augmentation strategies, including prompt-enhanced text augmentation and semantic-aware hyperedge drop, to facilitate informative view generation. Furthermore, we propose a multi-scale contrastive loss that extends existing objectives with an $s$-walk-based subgraph-level contrast to better capture long-range dependencies. By decoupling text encoder pretraining from hypergraph contrastive learning, this two-stage design enhances scalability without compromising representation quality. Extensive experiments confirm the effectiveness of HiTeC.


医学相关(3篇)

【1】A Novel Multimodal Framework for Early Detection of Alzheimers Disease Using Deep Learning
标题:使用深度学习早期检测阿尔茨海默病的新型多模式框架
链接:https://arxiv.org/abs/2508.03046

作者:hi P Nagarhalli, Sanket Patil, Vishal Pande, Uday Aswalekar, Prafulla Patil
备注:Journal paper, 14 pages
摘要:阿尔茨海默病(AD)是一种进行性神经退行性疾病,其早期诊断面临重大挑战,通常导致患者治疗延迟和预后较差。传统的诊断方法通常依赖于单一的数据模式,无法捕捉疾病的多方面性质。在本文中,我们提出了一种新的多模式框架,早期检测AD,整合了三个主要来源的数据:MRI成像,认知评估和生物标志物。该框架采用卷积神经网络(CNN)分析MRI图像,并采用长短期记忆(LSTM)网络处理认知和生物标志物数据。该系统通过使用加权平均等先进技术汇总这些不同模态的结果,即使在不完整的数据中,也能提高诊断的准确性和可靠性。多模态方法不仅提高了检测过程的鲁棒性,而且能够在最早阶段识别AD,与传统方法相比具有显着优势。生物标志物和认知测试的整合尤其重要,因为这些可以在临床症状出现之前很久就检测到阿尔茨海默氏症,从而促进早期干预并可能改变疾病的进程。这项研究表明,拟议的框架有可能彻底改变AD的早期检测,为更及时、更有效的治疗铺平道路
摘要:Alzheimers Disease (AD) is a progressive neurodegenerative disorder that poses significant challenges in its early diagnosis, often leading to delayed treatment and poorer outcomes for patients. Traditional diagnostic methods, typically reliant on single data modalities, fall short of capturing the multifaceted nature of the disease. In this paper, we propose a novel multimodal framework for the early detection of AD that integrates data from three primary sources: MRI imaging, cognitive assessments, and biomarkers. This framework employs Convolutional Neural Networks (CNN) for analyzing MRI images and Long Short-Term Memory (LSTM) networks for processing cognitive and biomarker data. The system enhances diagnostic accuracy and reliability by aggregating results from these distinct modalities using advanced techniques like weighted averaging, even in incomplete data. The multimodal approach not only improves the robustness of the detection process but also enables the identification of AD at its earliest stages, offering a significant advantage over conventional methods. The integration of biomarkers and cognitive tests is particularly crucial, as these can detect Alzheimer's long before the onset of clinical symptoms, thereby facilitating earlier intervention and potentially altering the course of the disease. This research demonstrates that the proposed framework has the potential to revolutionize the early detection of AD, paving the way for more timely and effective treatments
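以下给出该多模态框架的一个简化结构草图(PyTorch,非论文官方实现):CNN 处理 MRI、LSTM 处理认知评估与生物标志物时间序列,输出以加权平均方式融合;网络尺寸、融合权重与缺失模态处理策略均为假设。

```python
import torch
import torch.nn as nn

class MultimodalAD(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        # MRI 分支:极简 CNN(真实实现通常用 3D CNN,这里以 2D 切片示意)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(), nn.Linear(16 * 4 * 4, n_classes),
        )
        # 认知评估 + 生物标志物分支:LSTM 处理随访时间序列(每次随访 10 维特征为假设)
        self.lstm = nn.LSTM(input_size=10, hidden_size=32, batch_first=True)
        self.tab_head = nn.Linear(32, n_classes)

    def forward(self, mri, tab_seq, w_mri=0.6, w_tab=0.4):
        p_mri = torch.softmax(self.cnn(mri), dim=-1)
        _, (h, _) = self.lstm(tab_seq)
        p_tab = torch.softmax(self.tab_head(h[-1]), dim=-1)
        # 加权平均融合;若某一模态缺失,可将其权重置 0 并重新归一化
        return (w_mri * p_mri + w_tab * p_tab) / (w_mri + w_tab)

model = MultimodalAD()
probs = model(torch.randn(2, 1, 64, 64), torch.randn(2, 5, 10))
```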


【2】A Novel cVAE-Augmented Deep Learning Framework for Pan-Cancer RNA-Seq Classification
标题:用于泛癌症RNA-Seq分类的新型cVAE增强深度学习框架
链接:https://arxiv.org/abs/2508.02743

作者:epalli
摘要:使用转录组学(RNA-Seq)数据的泛癌症分类可以为肿瘤亚型和治疗选择提供信息,但由于极高的维度和有限的样本量而具有挑战性。在这项研究中,我们提出了一种新的深度学习框架,该框架使用类条件变分自动编码器(cVAE)来增强泛癌症基因表达分类的训练数据。使用来自癌症基因组图谱(TCGA)的跨越5种癌症类型的801个肿瘤RNA-Seq样本,我们首先进行特征选择,将20,531个基因表达特征缩减为500个表达变异最大的基因。然后在此数据上训练cVAE,以学习以癌症类型为条件的基因表达潜在表示,从而能够为每个肿瘤类别生成合成基因表达样本。我们使用这些cVAE生成的样本(使数据集大小加倍)来扩充训练集,以减轻过拟合和类别不平衡。随后在增强数据集上训练两层多层感知器(MLP)分类器以预测肿瘤类型。增强的框架在留出测试集上实现了高分类准确率(~98%),大大优于仅在原始数据上训练的分类器。我们给出了详细的实验结果,包括VAE训练曲线、分类器性能指标(ROC曲线和混淆矩阵)以及架构图,以说明该方法。结果表明,基于cVAE的合成增强可以显著提高泛癌症预测性能,特别是对于代表性不足的癌症类别。
摘要:Pan-cancer classification using transcriptomic (RNA-Seq) data can inform tumor subtyping and therapy selection, but is challenging due to extremely high dimensionality and limited sample sizes. In this study, we propose a novel deep learning framework that uses a class-conditional variational autoencoder (cVAE) to augment training data for pan-cancer gene expression classification. Using 801 tumor RNA-Seq samples spanning 5 cancer types from The Cancer Genome Atlas (TCGA), we first perform feature selection to reduce 20,531 gene expression features to the 500 most variably expressed genes. A cVAE is then trained on this data to learn a latent representation of gene expression conditioned on cancer type, enabling the generation of synthetic gene expression samples for each tumor class. We augment the training set with these cVAE-generated samples (doubling the dataset size) to mitigate overfitting and class imbalance. A two-layer multilayer perceptron (MLP) classifier is subsequently trained on the augmented dataset to predict tumor type. The augmented framework achieves high classification accuracy (~98%) on a held-out test set, substantially outperforming a classifier trained on the original data alone. We present detailed experimental results, including VAE training curves, classifier performance metrics (ROC curves and confusion matrix), and architecture diagrams to illustrate the approach. The results demonstrate that cVAE-based synthetic augmentation can significantly improve pan-cancer prediction performance, especially for underrepresented cancer classes.
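以下是"类条件 VAE 生成合成基因表达样本用于数据增强"这一流程的简化草图(非论文代码):以癌症类型为条件训练 cVAE、按类别采样合成样本使训练集翻倍、再训练两层 MLP;网络尺寸与训练细节均为假设,特征筛选步骤(选取变异最大的 500 个基因)假定已完成。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N_GENES, N_CLASSES, LATENT = 500, 5, 32   # 假设:已筛出 500 个表达变异最大的基因、5 种癌症类型

class CVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(N_GENES + N_CLASSES, 256), nn.ReLU())
        self.mu, self.logvar = nn.Linear(256, LATENT), nn.Linear(256, LATENT)
        self.dec = nn.Sequential(
            nn.Linear(LATENT + N_CLASSES, 256), nn.ReLU(), nn.Linear(256, N_GENES))

    def forward(self, x, y_onehot):
        h = self.enc(torch.cat([x, y_onehot], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # 重参数化
        x_hat = self.dec(torch.cat([z, y_onehot], dim=-1))
        return x_hat, mu, logvar

def cvae_loss(x, x_hat, mu, logvar):
    rec = F.mse_loss(x_hat, x, reduction="mean")
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

def sample_synthetic(model, labels):
    """按给定类别标签采样合成样本,用于把训练集扩充一倍、缓解类别不平衡。"""
    y = F.one_hot(labels, N_CLASSES).float()
    z = torch.randn(len(labels), LATENT)
    with torch.no_grad():
        return model.dec(torch.cat([z, y], dim=-1))

synthetic = sample_synthetic(CVAE(), torch.randint(0, N_CLASSES, (16,)))
# 下游分类器:两层 MLP,在"真实 + 合成"的增强数据上训练
mlp = nn.Sequential(nn.Linear(N_GENES, 128), nn.ReLU(), nn.Linear(128, N_CLASSES))
```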


【3】Evaluation of Deep Learning Models for LBBB Classification in ECG Signals
标题:用于ECG信号LBBB分类的深度学习模型的评估
链接:https://arxiv.org/abs/2508.02710

作者:acas Ordóñez, Diego Vinicio Orellana Villavicencio, José Manuel Ferrández, Paula Bonomini
备注:Accepted for presentation in the 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2025)
摘要:本研究探讨了不同的神经网络架构,以评估它们从心电图(ECG)信号中提取空间和时间模式的能力,并将其分为三组:健康受试者、左束支传导阻滞(LBBB)和严格左束支传导阻滞(sLBBB)。   临床相关性:创新技术通过优化左束支传导阻滞(LBBB)受试者的分类,能够筛选心脏再同步治疗(CRT)的候选人。
摘要:This study explores different neural network architectures to evaluate their ability to extract spatial and temporal patterns from electrocardiographic (ECG) signals and classify them into three groups: healthy subjects, Left Bundle Branch Block (LBBB), and Strict Left Bundle Branch Block (sLBBB).   Clinical Relevance, Innovative technologies enable the selection of candidates for Cardiac Resynchronization Therapy (CRT) by optimizing the classification of subjects with Left Bundle Branch Block (LBBB).


蒸馏|知识提取(2篇)

【1】Neural Speech Extraction with Human Feedback
标题:利用人类反馈的神经语音提取
链接:https://arxiv.org/abs/2508.03041

作者:ni, Ashton Graves, Sefik Emre Eskimez, Shyamnath Gollakota
备注:Interspeech 2025
摘要 :我们提出了第一个神经目标语音提取(TSE)系统,使用人的反馈迭代细化。我们的方法允许用户标记TSE输出的特定部分,生成编辑掩码。然后,细化系统改进标记的部分,同时保留未标记的区域。由于人类标记的错误的大规模数据集很难收集,我们使用各种自动掩蔽函数生成合成数据集,并对每个数据集进行训练。评估表明,使用基于噪声功率的掩蔽(以dBFS为单位)和概率阈值训练的模型表现最好,与人类注释保持一致。在一项有22名参与者的研究中,用户显示出对精炼输出的偏好超过基线TSE。我们的研究结果表明,人在环细化是提高神经语音提取性能的一种很有前途的方法。
摘要:We present the first neural target speech extraction (TSE) system that uses human feedback for iterative refinement. Our approach allows users to mark specific segments of the TSE output, generating an edit mask. The refinement system then improves the marked sections while preserving unmarked regions. Since large-scale datasets of human-marked errors are difficult to collect, we generate synthetic datasets using various automated masking functions and train models on each. Evaluations show that models trained with noise power-based masking (in dBFS) and probabilistic thresholding perform best, aligning with human annotations. In a study with 22 participants, users showed a preference for refined outputs over baseline TSE. Our findings demonstrate that human-in-the-loop refinement is a promising approach for improving the performance of neural speech extraction.


【2】Resource-Efficient Automatic Software Vulnerability Assessment via Knowledge Distillation and Particle Swarm Optimization
标题:基于知识蒸馏和粒子群优化的资源高效软件漏洞自动评估
链接:https://arxiv.org/abs/2508.02840

作者:Gao, Xiang Chen, Jiyu Wang, Jibin Wang, Guang Yang
备注:Accepted by Engineering Applications of Artificial Intelligence
摘要:软件系统日益复杂,导致网络安全漏洞激增,需要有效和可扩展的漏洞评估解决方案。然而,在现实世界的场景中部署大型预训练模型受到其大量计算和存储需求的阻碍。为了应对这一挑战,我们提出了一种新的资源高效的框架,集成了知识蒸馏和粒子群优化,使自动化的漏洞评估。我们的框架采用了两个阶段的方法:首先,粒子群优化是用来优化一个紧凑的学生模型的架构,平衡计算效率和模型容量。其次,知识蒸馏应用于关键脆弱性评估知识从一个大的教师模型转移到优化的学生模型。此过程在保持高性能的同时显著减小了模型大小。在包含12,071个CVSS(通用漏洞评分系统)v3注释漏洞的增强MegaVul数据集上的实验结果证明了我们方法的有效性。我们的方法实现了99.4%的模型大小减少,同时保留89.3%的原始模型的准确性。此外,它的精度比最先进的基线高1.7%,参数减少60%。与传统遗传算法相比,该框架还减少了72.1%的训练时间和34.88%的架构搜索时间。
摘要:The increasing complexity of software systems has led to a surge in cybersecurity vulnerabilities, necessitating efficient and scalable solutions for vulnerability assessment. However, the deployment of large pre-trained models in real-world scenarios is hindered by their substantial computational and storage demands. To address this challenge, we propose a novel resource-efficient framework that integrates knowledge distillation and particle swarm optimization to enable automated vulnerability assessment. Our framework employs a two-stage approach: First, particle swarm optimization is utilized to optimize the architecture of a compact student model, balancing computational efficiency and model capacity. Second, knowledge distillation is applied to transfer critical vulnerability assessment knowledge from a large teacher model to the optimized student model. This process significantly reduces the model size while maintaining high performance. Experimental results on an enhanced MegaVul dataset, comprising 12,071 CVSS (Common Vulnerability Scoring System) v3 annotated vulnerabilities, demonstrate the effectiveness of our approach. Our approach achieves a 99.4% reduction in model size while retaining 89.3% of the original model's accuracy. Furthermore, it outperforms state-of-the-art baselines by 1.7% in accuracy with 60% fewer parameters. The framework also reduces training time by 72.1% and architecture search time by 34.88% compared to traditional genetic algorithms.
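下面给出该框架中知识蒸馏部分的标准软标签蒸馏损失示意(温度 T 与权重 alpha 为假设超参数); 论文中学生模型的结构由粒子群优化搜索得到, 此处只演示蒸馏损失本身。

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """经典软标签知识蒸馏损失(示意): 教师软分布 + 真实标签的加权组合。"""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean") * (T * T)             # 温度平滑后的教师分布匹配项
    hard = F.cross_entropy(student_logits, labels)    # 真实标签的监督项
    return alpha * soft + (1 - alpha) * hard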


推荐(3篇)

【1】Personalized Recommendation of Dish and Restaurant Collections on iFood
标题:iFood上个性化推荐菜肴和餐厅精选
链接:https://arxiv.org/abs/2508.03670

作者:F. Granado, Davi A. Bezerra, Iuri Queiroz, Nathan Oliveira, Pedro Fernandes, Bruno Schock
备注:Workshop on Two-sided Marketplace Optimization: Search, Discovery, Matching, Pricing & Growth in conjunction with KDD Conference (KDD 2025) in Toronto, Canada
摘要:食品配送平台面临的挑战是帮助用户浏览大量的餐馆和菜肴目录,找到他们真正喜欢的食物。本文介绍了RED,这是一个为拉丁美洲最大的按需食品交付平台iFood设计的自动推荐系统,可以个性化选择向数百万用户展示的精选食品。我们的方法采用了LightGBM分类器,该分类器基于三个特征组对集合进行评分:集合特征,用户集合相似性和上下文信息。为了解决推荐新创建的集合的冷启动问题,我们使用项嵌入开发基于内容的表示,并实现单调性约束以提高泛化能力。我们通过从类别轮播互动中引导来解决数据稀缺问题,并通过对生产中的印象和购买进行无偏见的抽样来解决可见性偏差。该系统通过对iFood用户群的5-10%进行广泛的A/B测试,证明了其对现实世界的重大影响。我们的A/B测试的在线结果与基于流行度的基线相比,卡转化率提高了97%,整体应用转化率提高了1.4%。值得注意的是,我们的离线准确性指标与在线性能密切相关,从而在部署前实现可靠的影响预测。据我们所知,这是第一个详细介绍在动态商业环境中大规模推荐精选食品的工作。
摘要:Food delivery platforms face the challenge of helping users navigate vast catalogs of restaurants and dishes to find meals they truly enjoy. This paper presents RED, an automated recommendation system designed for iFood, Latin America's largest on-demand food delivery platform, to personalize the selection of curated food collections displayed to millions of users. Our approach employs a LightGBM classifier that scores collections based on three feature groups: collection characteristics, user-collection similarity, and contextual information. To address the cold-start problem of recommending newly created collections, we develop content-based representations using item embeddings and implement monotonicity constraints to improve generalization. We tackle data scarcity by bootstrapping from category carousel interactions and address visibility bias through unbiased sampling of impressions and purchases in production. The system demonstrates significant real-world impact through extensive A/B testing with 5-10% of iFood's user base. Online results of our A/B tests add up to 97% improvement in Card Conversion Rate and 1.4% increase in overall App Conversion Rate compared to popularity-based baselines. Notably, our offline accuracy metrics strongly correlate with online performance, enabling reliable impact prediction before deployment. To our knowledge, this is the first work to detail large-scale recommendation of curated food collections in a dynamic commercial environment.
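下面用 LightGBM 的 sklearn 接口给出合集打分器的一个示意。论文确实使用了 LightGBM 分类器、三组特征与单调性约束, 但此处的特征维度、约束所在列与超参数均为编者假设, 仅用于说明做法。

import numpy as np
from lightgbm import LGBMClassifier

rng = np.random.default_rng(0)
n = 10_000
# 三组特征(维度为假设): 合集自身特征、用户-合集相似度、上下文信息
collection_f = rng.normal(size=(n, 8))
user_sim_f = rng.normal(size=(n, 4))
context_f = rng.normal(size=(n, 3))
X = np.hstack([collection_f, user_sim_f, context_f])
y = rng.integers(0, 2, size=n)                      # 1 = 用户点击/转化(此处为随机占位数据)

# 对"用户-合集相似度"列施加单调性约束(相似度越高, 得分不应降低);
# 将约束放在哪些列上属于编者假设, 论文仅说明使用了单调性约束以改善冷启动泛化
mono = [0] * 8 + [1] * 4 + [0] * 3
model = LGBMClassifier(n_estimators=200, monotone_constraints=mono)
model.fit(X, y)
scores = model.predict_proba(X[:5])[:, 1]           # 作为合集的排序分数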


【2】Parameter-Efficient Single Collaborative Branch for Recommendation
标题:一种参数有效的单协作分支推荐算法
链接:https://arxiv.org/abs/2508.03518

作者:cati, Shah Nawaz, Markus Schedl
备注:5 pages
摘要:推荐系统(RS)通常依赖于用户和项目在联合嵌入空间中的表示以及相似性度量来计算相关性分数。在现代RS中,获取用户和项目表示的模块由两个不同且独立的神经网络(NN)组成。在多模态表示学习中,权重共享已被证明可以有效地减少同一项目的多个模态之间的距离。受这些方法的启发,我们提出了一种新的RS,它利用用户和项目NN模块之间的权重共享来获得共享嵌入空间中的潜在表示。所提出的框架由一个单一的推荐协作分支(CoBraR)构成。我们通过在电子商务和电影推荐上的定量实验对CoBraR进行了评估。实验表明,CoBraR在不损失准确率的前提下减少了参数数量并改善了超越准确率(beyond-accuracy)的指标,因此有潜力被应用并扩展到现实世界场景。
摘要 :Recommender Systems (RS) often rely on representations of users and items in a joint embedding space and on a similarity metric to compute relevance scores. In modern RS, the modules to obtain user and item representations consist of two distinct and separate neural networks (NN). In multimodal representation learning, weight sharing has been proven effective in reducing the distance between multiple modalities of a same item. Inspired by these approaches, we propose a novel RS that leverages weight sharing between the user and item NN modules used to obtain the latent representations in the shared embedding space. The proposed framework consists of a single Collaborative Branch for Recommendation (CoBraR). We evaluate CoBraR by means of quantitative experiments on e-commerce and movie recommendation. Our experiments show that by reducing the number of parameters and improving beyond-accuracy aspects without compromising accuracy, CoBraR has the potential to be applied and extended for real-world scenarios.
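下面给出"用户与物品共用同一分支(权重共享)并以点积作为相关性分数"这一思路的 PyTorch 示意; 嵌入维度与层数为假设值, 并非论文的原始网络配置。

import torch
import torch.nn as nn

class CoBraRSketch(nn.Module):
    """单一共享分支同时编码用户与物品(示意实现)。"""
    def __init__(self, n_users, n_items, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        # 用户与物品共用同一个分支(权重共享), 映射到共同的潜在空间
        self.shared_branch = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, user_ids, item_ids):
        u = self.shared_branch(self.user_emb(user_ids))
        v = self.shared_branch(self.item_emb(item_ids))
        return (u * v).sum(dim=-1)        # 相似度(点积)作为相关性分数

# 用法示意
model = CoBraRSketch(n_users=1000, n_items=5000)
scores = model(torch.tensor([0, 1]), torch.tensor([10, 20]))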


【3】Realizing Scaling Laws in Recommender Systems: A Foundation-Expert Paradigm for Hyperscale Model Deployment
标题:在推荐系统中实现缩放定律:超规模模型部署的基础专家范式
链接:https://arxiv.org/abs/2508.02929

作者:evin Course, Wei Li, Hongwei Li, Jie Hua, Yiqi Chen, Zhao Zhu, Rui Jian, Xuan Cao, Bi Xue, Yu Shi, Jing Qian, Kai Ren, Matt Ma, Qunshu Zhang, Rui Li
摘要:虽然缩放定律承诺显着的推荐系统的性能增益,有效地部署超大规模模型仍然是一个重大的未解决的挑战。与已经广泛采用FM的领域(如自然语言处理和计算机视觉)相比,推荐系统的进展受到独特挑战的阻碍,包括需要在不断变化的数据分布下从在线流数据中学习,需要适应不同的推荐表面,其下游任务和输入分布具有广泛的多样性,以及严格的延迟和计算约束。为了弥合这一差距,我们建议利用基金会专家范式:一个框架,旨在开发和部署超大规模的建议FM。在我们的方法中,中央FM在终身,跨表面,多模态用户数据上进行训练,以学习可概括的知识。然后,这些知识通过目标感知嵌入有效地转移到各种轻量级的、表面特定的"专家”模型,使它们能够以最小的开销适应本地数据分布和优化目标。为了满足我们的培训,推理和开发需求,我们构建了HyperCast,这是一个生产级的基础设施系统,可以重新设计培训,服务,日志记录和迭代,以支持这种解耦的范式。我们的方法现在部署在Meta上,每天服务数百亿用户请求,与我们以前的一阶段生产系统相比,展示了在线指标的改进,同时提高了开发人员的速度并保持了基础设施的效率。据我们所知,这项工作代表了在这种规模下首次成功部署的基础专家范式,提供了一个经过验证的,计算效率高,开发人员友好的蓝图,以实现推荐系统中的缩放定律的承诺。
摘要:While scaling laws promise significant performance gains for recommender systems, efficiently deploying hyperscale models remains a major unsolved challenge. In contrast to fields where FMs are already widely adopted such as natural language processing and computer vision, progress in recommender systems is hindered by unique challenges including the need to learn from online streaming data under shifting data distributions, the need to adapt to different recommendation surfaces with a wide diversity in their downstream tasks and their input distributions, and stringent latency and computational constraints. To bridge this gap, we propose to leverage the Foundation-Expert Paradigm: a framework designed for the development and deployment of hyperscale recommendation FMs. In our approach, a central FM is trained on lifelong, cross-surface, multi-modal user data to learn generalizable knowledge. This knowledge is then efficiently transferred to various lightweight, surface-specific ``expert" models via target-aware embeddings, allowing them to adapt to local data distributions and optimization goals with minimal overhead. To meet our training, inference and development needs, we built HyperCast, a production-grade infrastructure system that re-engineers training, serving, logging and iteration to power this decoupled paradigm. Our approach is now deployed at Meta serving tens of billions of user requests daily, demonstrating online metric improvements over our previous one-stage production system while improving developer velocity and maintaining infrastructure efficiency. To the best of our knowledge, this work represents the first successful deployment of a Foundation-Expert paradigm at this scale, offering a proven, compute-efficient, and developer-friendly blueprint to realize the promise of scaling laws in recommender systems.


聚类(1篇)

【1】Unveiling Location-Specific Price Drivers: A Two-Stage Cluster Analysis for Interpretable House Price Predictions
标题:揭示特定地点的价格驱动因素:可解释房价预测的两阶段集群分析
链接:https://arxiv.org/abs/2508.03156

作者:er, Julian Rosenberger, Mathias Kraus, Patrick Zschech, Nico Hambauer
备注:Accepted at 20th International Conference on Wirtschaftsinformatik (WI25); September 2025, Münster, Germany
摘要:由于当地市场的变化,房价估值仍然具有挑战性。现有的方法通常依赖于缺乏可解释性的黑箱机器学习模型,或者线性回归(LR)等无法捕捉市场异质性的简单方法。为了解决这个问题,我们提出了一种机器学习方法,该方法应用两阶段聚类:首先仅根据少量基于位置的特征对房产进行分组,然后再结合其他特征。随后使用LR或广义加性模型(GAM)对每个聚类建模,在预测性能与可解释性之间取得平衡。我们基于2023年的43,309条德国房产挂牌数据构建并评估了模型,与不进行聚类的模型相比,GAM的平均绝对误差降低了36%,LR降低了58%。此外,图形分析揭示了聚类之间的模式差异。这些发现强调了簇特定见解的重要性,增强了可解释性,并为寻求更可靠房产估值的买家、卖家和房地产分析师提供了实用价值。
摘要:House price valuation remains challenging due to localized market variations. Existing approaches often rely on black-box machine learning models, which lack interpretability, or simplistic methods like linear regression (LR), which fail to capture market heterogeneity. To address this, we propose a machine learning approach that applies two-stage clustering, first grouping properties based on minimal location-based features before incorporating additional features. Each cluster is then modeled using either LR or a generalized additive model (GAM), balancing predictive performance with interpretability. Constructing and evaluating our models on 43,309 German house property listings from 2023, we achieve a 36% improvement for the GAM and 58% for LR in mean absolute error compared to models without clustering. Additionally, graphical analyses unveil pattern shifts between clusters. These findings emphasize the importance of cluster-specific insights, enhancing interpretability and offering practical value for buyers, sellers, and real estate analysts seeking more reliable property valuations.
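下面给出"先按位置特征聚类、再逐簇拟合可解释回归模型"这一两阶段流程的 sklearn 示意。簇数为假设值; 论文中每簇可选 LR 或 GAM, 此处为简化仅用 LinearRegression, GAM 可用 pygam 等库替换。

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

def fit_two_stage(X_loc, X_all, y, n_clusters=8):
    """第一阶段: 仅用位置特征聚类; 第二阶段: 每个簇单独拟合可解释的回归模型。"""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_loc)
    models = {}
    for c in range(n_clusters):
        idx = km.labels_ == c
        models[c] = LinearRegression().fit(X_all[idx], y[idx])
    return km, models

def predict_two_stage(km, models, X_loc, X_all):
    """先用位置特征确定簇, 再用该簇的模型做价格预测。"""
    labels = km.predict(X_loc)
    return np.array([models[c].predict(x[None, :])[0]
                     for c, x in zip(labels, X_all)])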


超分辨率|去噪|去模糊|去雾(1篇)

【1】Ultralight Polarity-Split Neuromorphic SNN for Event-Stream Super-Resolution
标题:用于事件流超分辨率的超轻量极性分裂神经形态SNN
链接:https://arxiv.org/abs/2508.03244

作者:Xu, Haoxian Zhou, Langyi Chen, Yuk Ying Chung, Qiang Qu
摘要:事件相机具有无与伦比的优势,例如高时间分辨率、低延迟和高动态范围。然而,它们有限的空间分辨率对细粒度感知任务提出了挑战。在这项工作中,我们提出了一种超轻量级的、基于流的事件到事件超分辨率方法,其基础是尖峰神经网络(SNN),专为在资源受限设备上的实时部署而设计。为了进一步减小模型大小,我们引入了一种新的双前向极性分裂事件编码策略,将正、负事件解耦到经过共享SNN的两条独立前向路径中。此外,我们提出了一个可学习的时空极性感知损失(LearnSTPLoss),利用基于可学习不确定性的权重自适应地平衡时间、空间和极性一致性。实验结果表明,我们的方法在多个数据集上实现了有竞争力的超分辨率性能,同时显著减少了模型大小和推理时间。轻量级设计使该模块能够嵌入事件相机中,或作为下游视觉任务的高效前端预处理。
摘要:Event cameras offer unparalleled advantages such as high temporal resolution, low latency, and high dynamic range. However, their limited spatial resolution poses challenges for fine-grained perception tasks. In this work, we propose an ultra-lightweight, stream-based event-to-event super-resolution method based on Spiking Neural Networks (SNNs), designed for real-time deployment on resource-constrained devices. To further reduce model size, we introduce a novel Dual-Forward Polarity-Split Event Encoding strategy that decouples positive and negative events into separate forward paths through a shared SNN. Furthermore, we propose a Learnable Spatio-temporal Polarity-aware Loss (LearnSTPLoss) that adaptively balances temporal, spatial, and polarity consistency using learnable uncertainty-based weights. Experimental results demonstrate that our method achieves competitive super-resolution performance on multiple datasets while significantly reducing model size and inference time. The lightweight design enables embedding the module into event cameras or using it as an efficient front-end preprocessing for downstream vision tasks.


联邦学习|隐私保护|加密(1篇)

【1】Heterogeneity-Oblivious Robust Federated Learning
标题:对异质性不敏感的鲁棒联邦学习
链接:https://arxiv.org/abs/2508.03579

作者:ang, Jinyang Li, Qi Song, Miao Wang, Chungang Lin, Haitong Luo, Xuying Meng, Yujun Zhang
备注:Under review
摘要:联邦学习(FL)仍然非常容易受到中毒攻击,特别是在现实世界的超异构性下,客户端在数据分布、通信能力和模型架构方面存在显著差异。这种异质性不仅破坏了聚合策略的有效性,而且使攻击更难以检测。此外,高维模型扩大了攻击面。为了应对这些挑战,我们提出了Horus,一个以低秩适配(LoRA)为中心、对异质性不敏感的鲁棒FL框架。Horus不是聚合完整的模型参数,而是将LoRA插入到经验上稳定的层中,并仅聚合LoRA以减小攻击面。我们发现了一个关键的经验观察:在异质性和中毒情况下,输入投影(LoRA-A)比输出投影(LoRA-B)明显更稳定。利用这一点,我们基于LoRA-A的特征设计了一个对异质性不敏感的中毒分数来过滤中毒客户端。对于剩余的良性客户端,我们提出了投影感知聚合机制,在抑制漂移的同时保留协作信号:该机制根据客户端更新与全局方向的一致性对其重新加权。在不同的数据集、模型架构和攻击上进行的大量实验表明,Horus在鲁棒性和准确性方面始终优于最先进的基线。
摘要:Federated Learning (FL) remains highly vulnerable to poisoning attacks, especially under real-world hyper-heterogeneity, where clients differ significantly in data distributions, communication capabilities, and model architectures. Such heterogeneity not only undermines the effectiveness of aggregation strategies but also makes attacks more difficult to detect. Furthermore, high-dimensional models expand the attack surface. To address these challenges, we propose Horus, a heterogeneity-oblivious robust FL framework centered on low-rank adaptations (LoRAs). Rather than aggregating full model parameters, Horus inserts LoRAs into empirically stable layers and aggregates only LoRAs to reduce the attack surface.We uncover a key empirical observation that the input projection (LoRA-A) is markedly more stable than the output projection (LoRA-B) under heterogeneity and poisoning. Leveraging this, we design a Heterogeneity-Oblivious Poisoning Score using the features from LoRA-A to filter poisoned clients. For the remaining benign clients, we propose projection-aware aggregation mechanism to preserve collaborative signals while suppressing drifts, which reweights client updates by consistency with the global directions. Extensive experiments across diverse datasets, model architectures, and attacks demonstrate that Horus consistently outperforms state-of-the-art baselines in both robustness and accuracy.
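下面给出"基于 LoRA-A 更新一致性进行过滤与重加权聚合"这一思路的示意。其中的评分与加权公式(余弦一致性、按比例丢弃)为编者的示意性构造, 并非论文原始的 Heterogeneity-Oblivious Poisoning Score 与聚合公式。

import torch
import torch.nn.functional as F

def aggregate_lora_a(client_updates, filter_ratio=0.2):
    """对各客户端展平后的 LoRA-A 更新做过滤与一致性重加权聚合(示意)。"""
    U = torch.stack(client_updates)                        # [K, d]
    ref = U.median(dim=0).values                           # 稳健的参考方向(逐坐标中位数)
    cos = F.cosine_similarity(U, ref.unsqueeze(0), dim=1)  # 与参考方向的一致性
    k_drop = int(len(U) * filter_ratio)
    keep = cos.argsort(descending=True)[: len(U) - k_drop]  # 过滤一致性最差的客户端
    w = torch.clamp(cos[keep], min=0)
    w = w / (w.sum() + 1e-12)                               # 按一致性重新加权
    return (w.unsqueeze(1) * U[keep]).sum(dim=0)

# 用法示意: 服务器每轮仅对各客户端上传的 LoRA 参数更新做上述聚合
updates = [torch.randn(4096) for _ in range(10)]
global_delta = aggregate_lora_a(updates)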


推理|分析|理解|解释(10篇)

【1】DeepFaith: A Domain-Free and Model-Agnostic Unified Framework for Highly Faithful Explanations
标题:DeepFaith:一个无领域且模型不可知的统一框架,用于高度忠实的解释
链接:https://arxiv.org/abs/2508.03586

作者:, Lizhong Ding, Shihan Jia, Yanyu Ren, Pengqi Li, Jiarun Fu, Changsheng Li, Ye yuan, Guoren Wang
备注:22 pages
摘要:可解释人工智能(XAI)通过揭示决策原理的模型归因方法在复杂系统中建立信任。然而,由于缺乏统一的最佳解释,现有的XAI方法缺乏客观评估和优化的基础事实。为了解决这个问题,我们提出了基于深度架构的信仰解释器(DeepFaith),一个领域自由和模型不可知的统一解释框架下的忠实性。通过建立一个统一的配方,多个广泛使用和验证的忠诚度指标,我们得到了一个最佳的解释目标,其解决方案同时实现最佳的忠诚度在这些指标,从而提供了一个从理论角度来看地面真相。我们设计了一个解释器学习框架,利用多种现有的解释方法,应用去重和过滤来构建高质量的监督解释信号,并优化模式一致性损失和局部相关性来训练一个忠实的解释器。经过训练后,DeepFaith可以通过一次向前传递生成高度忠实的解释,而无需访问正在解释的模型。在跨越6个模型和6个数据集的12个不同解释任务中,与所有基线方法相比,DeepFaith在10个指标上实现了最高的整体忠实度,突出了其有效性和跨领域的可推广性。
摘要:Explainable AI (XAI) builds trust in complex systems through model attribution methods that reveal the decision rationale. However, due to the absence of a unified optimal explanation, existing XAI methods lack a ground truth for objective evaluation and optimization. To address this issue, we propose Deep architecture-based Faith explainer (DeepFaith), a domain-free and model-agnostic unified explanation framework under the lens of faithfulness. By establishing a unified formulation for multiple widely used and well-validated faithfulness metrics, we derive an optimal explanation objective whose solution simultaneously achieves optimal faithfulness across these metrics, thereby providing a ground truth from a theoretical perspective. We design an explainer learning framework that leverages multiple existing explanation methods, applies deduplicating and filtering to construct high-quality supervised explanation signals, and optimizes both pattern consistency loss and local correlation to train a faithful explainer. Once trained, DeepFaith can generate highly faithful explanations through a single forward pass without accessing the model being explained. On 12 diverse explanation tasks spanning 6 models and 6 datasets, DeepFaith achieves the highest overall faithfulness across 10 metrics compared to all baseline methods, highlighting its effectiveness and cross-domain generalizability.


【2】VRPRM: Process Reward Modeling via Visual Reasoning
标题:VRPRM:通过视觉推理的过程奖励建模
链接:https://arxiv.org/abs/2508.03556

作者:hen, Bangwei Liu, Xuhong Wang
备注:13 pages, 5 figures
摘要:过程奖励模型(Process Reward Model,PRM)由于能够对生成内容的推理步骤进行细粒度评估,被广泛应用于大语言模型(Large Language Model,LLM)的后训练中。然而,大多数PRM缺乏长程推理和深入思考的能力。另一方面,虽然有一些工作尝试将思维链(Chain-of-Thought)能力引入PRM,但CoT-PRM数据的标注成本过于昂贵,难以在各类任务中稳定发挥作用。为了解决上述挑战,我们提出了VRPRM,一种通过视觉推理实现的过程奖励模型,并设计了一种高效的两阶段训练策略。实验结果表明,仅使用3.6K CoT-PRM SFT数据和50K non-CoT PRM RL训练数据,VRPRM就可以超过总数据量为400K的非思考型PRM,并在BoN实验中相对于基础模型取得高达118%的相对性能提升。这一结果证实了所提出的组合训练策略能够以更低的数据标注成本获得更高质量的推理能力,从而为PRM训练提供了一种数据利用更高效的新范式。
摘要:Process Reward Model (PRM) is widely used in the post-training of Large Language Model (LLM) because it can perform fine-grained evaluation of the reasoning steps of generated content. However, most PRMs lack long-term reasoning and deep thinking capabilities. On the other hand, although a few works have tried to introduce Chain-of-Thought capability into PRMs, the annotation cost of CoT-PRM data is too expensive to play a stable role in various tasks. To address the above challenges, we propose VRPRM, a process reward model via visual reasoning, and design an efficient two-stage training strategy. Experimental results show that using only 3.6K CoT-PRM SFT data and 50K non-CoT PRM RL training data, VRPRM can surpass the non-thinking PRM with a total data volume of 400K and achieved a relative performance improvement of up to 118\% over the base model in the BoN experiment. This result confirms that the proposed combined training strategy can achieve higher quality reasoning capabilities at a lower data annotation cost, thus providing a new paradigm for PRM training with more efficient data utilization.


【3】A Comparative Study of Neurosymbolic AI Approaches to Interpretable Logical Reasoning
标题:神经符号人工智能可解释逻辑推理方法的比较研究
链接:https://arxiv.org/abs/2508.03366

作者:. Chen
备注:Accepted to NeSy 2025
摘要:通用逻辑推理,即在领域无关任务上进行演绎推理的能力,对大型语言模型(LLM)而言仍然是一个挑战。当前的LLM无法进行确定性推理,也不具可解释性。因此,近来人们对神经符号AI的兴趣激增,这类方法试图将逻辑融入神经网络。我们首先识别出提升逻辑推理的两类主要神经符号方法:(i)集成式方法,即符号推理包含在神经网络内部的模型;(ii)混合式方法,即由独立于神经网络的符号求解器执行符号推理的模型。两类方法都包含在特定领域逻辑推理基准上取得良好结果的AI系统。然而,它们在领域无关基准上的表现尚缺乏研究。据我们所知,还没有对这两类方法进行比较来回答以下问题:哪种方法更有希望实现通用逻辑推理?为了分析它们的潜力,我们引入了以下两个同类最佳的领域无关模型:采用集成式方法的逻辑神经网络(LNN),以及采用混合式方法的LLM-符号求解器(LLM-SS)。以这两个模型作为案例研究和各自方法的代表,我们的分析表明混合式方法更有希望实现通用逻辑推理,因为(i)其推理链更可解释,(ii)它保留了现有LLM的能力和优势。为了支持未来使用混合式方法的工作,我们提出了一个基于LLM-SS的可泛化框架,该框架在设计上是模块化的、与模型无关、与领域无关,并且几乎不需要人工输入。
摘要:General logical reasoning, defined as the ability to reason deductively on domain-agnostic tasks, continues to be a challenge for large language models (LLMs). Current LLMs fail to reason deterministically and are not interpretable. As such, there has been a recent surge in interest in neurosymbolic AI, which attempts to incorporate logic into neural networks. We first identify two main neurosymbolic approaches to improving logical reasoning: (i) the integrative approach comprising models where symbolic reasoning is contained within the neural network, and (ii) the hybrid approach comprising models where a symbolic solver, separate from the neural network, performs symbolic reasoning. Both contain AI systems with promising results on domain-specific logical reasoning benchmarks. However, their performance on domain-agnostic benchmarks is understudied. To the best of our knowledge, there has not been a comparison of the contrasting approaches that answers the following question: Which approach is more promising for developing general logical reasoning? To analyze their potential, the following best-in-class domain-agnostic models are introduced: Logic Neural Network (LNN), which uses the integrative approach, and LLM-Symbolic Solver (LLM-SS), which uses the hybrid approach. Using both models as case studies and representatives of each approach, our analysis demonstrates that the hybrid approach is more promising for developing general logical reasoning because (i) its reasoning chain is more interpretable, and (ii) it retains the capabilities and advantages of existing LLMs. To support future works using the hybrid approach, we propose a generalizable framework based on LLM-SS that is modular by design, model-agnostic, domain-agnostic, and requires little to no human input.


【4】Towards Trustworthy Multimodal Moderation via Policy-Aligned Reasoning and Hierarchical Labeling
标题:通过政策一致推理和分层标签实现值得信赖的多模式调节
链接:https://arxiv.org/abs/2508.03296

作者:Wenwei Jin, Jintao Tong, Pengda Qin, Weijia Li, Guo Lu
摘要:社交平台彻底改变了信息共享,但也加速了有害和违反政策内容的传播。为了确保大规模的安全与合规,审核系统不能只追求效率,还必须提供准确性和可解释性。然而,目前的方法在很大程度上依赖于有噪声的、标签驱动的学习,与审核规则缺乏对齐,并产生阻碍人工复审的不透明决策。因此,我们提出了分层守卫(Hi-Guard),一个引入了新的政策对齐决策范式的多模态审核框架。术语“分层”反映了我们系统设计的两个关键方面:(1)分层审核管道,其中轻量级二分类模型首先过滤安全内容,更强大的模型处理细粒度风险分类;以及(2)第二阶段的分层分类,其中模型在从粗粒度到细粒度的层次化类别体系上执行基于路径的分类。为了确保与不断演化的审核策略保持一致,Hi-Guard直接将规则定义并入模型提示中。为了进一步增强结构化预测和推理,我们引入了多级软间隔(soft-margin)奖励,并使用组相对策略优化(GRPO)进行优化,惩罚语义相邻的错误分类并提高解释质量。广泛的实验和实际部署表明,Hi-Guard实现了卓越的分类准确性、泛化性和可解释性,为可扩展、透明和值得信赖的内容安全系统铺平了道路。代码可从以下网址获得:https://github.com/lianqi1008/Hi-Guard。
摘要:Social platforms have revolutionized information sharing, but also accelerated the dissemination of harmful and policy-violating content. To ensure safety and compliance at scale, moderation systems must go beyond efficiency and offer accuracy and interpretability. However, current approaches largely rely on noisy, label-driven learning, lacking alignment with moderation rules and producing opaque decisions that hinder human review. Therefore, we propose Hierarchical Guard (Hi-Guard), a multimodal moderation framework that introduces a new policy-aligned decision paradigm. The term "Hierarchical" reflects two key aspects of our system design: (1) a hierarchical moderation pipeline, where a lightweight binary model first filters safe content and a stronger model handles fine-grained risk classification; and (2) a hierarchical taxonomy in the second stage, where the model performs path-based classification over a hierarchical taxonomy ranging from coarse to fine-grained levels. To ensure alignment with evolving moderation policies, Hi-Guard directly incorporates rule definitions into the model prompt. To further enhance structured prediction and reasoning, we introduce a multi-level soft-margin reward and optimize with Group Relative Policy Optimization (GRPO), penalizing semantically adjacent misclassifications and improving explanation quality. Extensive experiments and real-world deployment demonstrate that Hi-Guard achieves superior classification accuracy, generalization, and interpretability, paving the way toward scalable, transparent, and trustworthy content safety systems. Code is available at: https://github.com/lianqi1008/Hi-Guard.


【5】Convergence of Deterministic and Stochastic Diffusion-Model Samplers: A Simple Analysis in Wasserstein Distance
标题:确定性和随机扩散模型采样器的收敛:沃瑟斯坦距离的简单分析
链接:https://arxiv.org/abs/2508.03210

作者:ler (SIERRA), Francis Bach (SIERRA)
摘要:我们为基于扩散的生成模型给出了新的Wasserstein距离收敛保证,涵盖随机(类DDPM)和确定性(类DDIM)采样方法。我们引入了一个简单的框架来分析离散化、初始化和得分估计误差。值得注意的是,我们推导出了Heun采样器的第一个Wasserstein收敛界,并改进了概率流ODE的Euler采样器的现有结果。我们的分析强调了所学得分函数的空间正则性的重要性,并主张相对于真实反向过程来控制得分误差,这与去噪得分匹配的思路一致。我们还结合了关于平滑Wasserstein距离的最新结果来收紧初始化误差界。
摘要:We provide new convergence guarantees in Wasserstein distance for diffusion-based generative models, covering both stochastic (DDPM-like) and deterministic (DDIM-like) sampling methods. We introduce a simple framework to analyze discretization, initialization, and score estimation errors. Notably, we derive the first Wasserstein convergence bound for the Heun sampler and improve existing results for the Euler sampler of the probability flow ODE. Our analysis emphasizes the importance of spatial regularity of the learned score function and argues for controlling the score error with respect to the true reverse process, in line with denoising score matching. We also incorporate recent results on smoothed Wasserstein distances to sharpen initialization error bounds.


【6】Quantum Spectral Reasoning: A Non-Neural Architecture for Interpretable Machine Learning
标题:量子谱推理:可解释机器学习的非神经架构
链接:https://arxiv.org/abs/2508.03170

作者:ruluta
摘要:我们提出了一种新的机器学习架构,它有别于传统的神经网络范式,利用量子谱方法(特别是Pade逼近和Lanczos算法)进行可解释的信号分析和符号推理。我们方法的核心创新在于,它能够在不使用反向传播、高维嵌入或数据密集型黑盒模型的情况下,将原始时域信号转换为稀疏且具有物理意义的谱表示。通过有理谱近似,该系统提取共振结构,再经由核投影函数将其映射为符号谓词,从而通过基于规则的推理引擎进行逻辑推理。这一架构连接了数学物理、稀疏逼近理论和符号人工智能,为深度学习模型提供了一种透明且基于物理的替代方案。我们给出了流程中每个阶段的完整数学形式化,提供了模块化的算法实现,并通过在时间序列异常检测、符号分类和混合推理任务上的比较评估证明了系统的有效性。结果表明,这种谱-符号架构在保持可解释性和数据效率的同时达到了有竞争力的准确率,为具备推理能力、融入物理信息的机器学习指出了一个有前景的新方向。
摘要 :We propose a novel machine learning architecture that departs from conventional neural network paradigms by leveraging quantum spectral methods, specifically Pade approximants and the Lanczos algorithm, for interpretable signal analysis and symbolic reasoning. The core innovation of our approach lies in its ability to transform raw time-domain signals into sparse, physically meaningful spectral representations without the use of backpropagation, high-dimensional embeddings, or data-intensive black-box models. Through rational spectral approximation, the system extracts resonant structures that are then mapped into symbolic predicates via a kernel projection function, enabling logical inference through a rule-based reasoning engine. This architecture bridges mathematical physics, sparse approximation theory, and symbolic artificial intelligence, offering a transparent and physically grounded alternative to deep learning models. We develop the full mathematical formalism underlying each stage of the pipeline, provide a modular algorithmic implementation, and demonstrate the system's effectiveness through comparative evaluations on time-series anomaly detection, symbolic classification, and hybrid reasoning tasks. Our results show that this spectral-symbolic architecture achieves competitive accuracy while maintaining interpretability and data efficiency, suggesting a promising new direction for physically-informed, reasoning-capable machine learning.
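摘要中提到的 Lanczos 算法是该谱表示流程的算法要素之一。下面给出其标准 numpy 实现示意; 矩阵 A 在论文中如何由原始信号构造未在摘要中给出, 属编者假设, 此处仅用随机对称矩阵演示三对角化与谱提取。

import numpy as np

def lanczos(A, v0, m=20):
    """对称矩阵 A 的标准 Lanczos 三对角化(示意), 返回三对角矩阵 T。"""
    n = len(v0)
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    q, q_prev, b_prev = v0 / np.linalg.norm(v0), np.zeros(n), 0.0
    for j in range(m):
        w = A @ q - b_prev * q_prev
        alpha[j] = q @ w
        w = w - alpha[j] * q
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            if beta[j] < 1e-12:                       # 提前收敛: 截断到当前规模
                alpha, beta = alpha[:j + 1], beta[:j]
                break
            q_prev, q, b_prev = q, w / beta[j], beta[j]
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

# 用法示意: T 的特征值(Ritz 值)近似 A 的极端特征值, 对应谱中的共振结构;
# 实际中 A 可由信号构造(例如自相关矩阵, 此为假设), 这里仅用随机对称矩阵演示
rng = np.random.default_rng(0)
M = rng.normal(size=(64, 64))
A = (M + M.T) / 2
ritz = np.linalg.eigvalsh(lanczos(A, rng.normal(size=64)))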


【7】CoTox: Chain-of-Thought-Based Molecular Toxicity Reasoning and Prediction
标题:CoTox:基于思维链的分子毒性推理和预测
链接:https://arxiv.org/abs/2508.03159

作者:k, Yein Park, Minju Song, Soyon Park, Donghyeon Lee, Seungheun Baek, Jaewoo Kang
备注:Under review
摘要:药物毒性仍然是药物开发中的主要挑战。最近的机器学习模型已经改进了计算机模拟(in silico)毒性预测,但它们对标注数据的依赖和缺乏可解释性限制了其适用性,也限制了它们捕获由复杂生物机制驱动的器官特异性毒性的能力。大型语言模型(LLM)通过逐步推理和整合文本数据提供了一种有前途的替代方案,但现有方法缺乏生物学背景和透明的推理依据。为了解决这个问题,我们提出了CoTox,一个将LLM与思维链(CoT)推理相结合用于多毒性预测的新框架。CoTox结合化学结构数据、生物学通路和基因本体(GO)术语,通过逐步推理生成可解释的毒性预测。使用GPT-4o,我们证明CoTox优于传统机器学习和深度学习模型。我们进一步考察了它在各种LLM上的性能,以确定CoTox在哪些场景下最为有效。此外,我们发现,用IUPAC名称表示化学结构(相较于SMILES,LLM更容易理解IUPAC名称)可以增强模型的推理能力并提高预测性能。为了证明其在药物开发中的实用性,我们模拟了用药物处理相关细胞类型的过程,并将所得的生物学背景纳入CoTox框架。如案例研究所示,这种方法使CoTox能够生成与生理反应一致的毒性预测。这一结果突出了基于LLM的框架在提高可解释性和支持早期药物安全性评估方面的潜力。本工作中使用的代码和提示可在https://github.com/dmis-lab/CoTox上获得。
摘要:Drug toxicity remains a major challenge in pharmaceutical development. Recent machine learning models have improved in silico toxicity prediction, but their reliance on annotated data and lack of interpretability limit their applicability. This limits their ability to capture organ-specific toxicities driven by complex biological mechanisms. Large language models (LLMs) offer a promising alternative through step-by-step reasoning and integration of textual data, yet prior approaches lack biological context and transparent rationale. To address this issue, we propose CoTox, a novel framework that integrates LLM with chain-of-thought (CoT) reasoning for multi-toxicity prediction. CoTox combines chemical structure data, biological pathways, and gene ontology (GO) terms to generate interpretable toxicity predictions through step-by-step reasoning. Using GPT-4o, we show that CoTox outperforms both traditional machine learning and deep learning model. We further examine its performance across various LLMs to identify where CoTox is most effective. Additionally, we find that representing chemical structures with IUPAC names, which are easier for LLMs to understand than SMILES, enhances the model's reasoning ability and improves predictive performance. To demonstrate its practical utility in drug development, we simulate the treatment of relevant cell types with drug and incorporated the resulting biological context into the CoTox framework. This approach allow CoTox to generate toxicity predictions aligned with physiological responses, as shown in case study. This result highlights the potential of LLM-based frameworks to improve interpretability and support early-stage drug safety assessment. The code and prompt used in this work are available at https://github.com/dmis-lab/CoTox.


【8】Accelerating SGDM via Learning Rate and Batch Size Schedules: A Lyapunov-Based Analysis
标题:通过学习率与批量大小调度加速SGDM:基于Lyapunov的分析
链接:https://arxiv.org/abs/2508.03105

作者:ndo, Hideaki Iiduka
摘要:通过引入一种新的李雅普诺夫函数,我们分析了带动量的随机梯度下降(SGDM)在动态学习率和批量大小调度下的收敛行为。与现有的李雅普诺夫函数相比,该函数结构更简单,便于完成SGDM颇具挑战性的收敛性分析,并对各种动态调度给出统一分析。具体来说,我们将理论框架扩展到涵盖深度学习中常用的三种实际调度策略:(i)恒定批量大小配合衰减学习率,(ii)递增批量大小配合衰减学习率,以及(iii)递增批量大小配合递增学习率。我们的理论结果揭示了收敛行为的清晰层次:(i)不能保证期望梯度范数的收敛,而(ii)和(iii)都可以。此外,(iii)实现了比(i)和(ii)可证明更快的衰减速率,即使在存在动量的情况下也展示了理论上的加速。实验结果验证了我们的理论,表明动态调度的SGDM在收敛速度上显著优于固定超参数的基线。我们还在实验中评估了一种预热(warm-up)调度,其收敛行为在经验上优于所有其他策略。这些发现为现代深度学习中设计高效稳定的训练过程提供了统一的理论基础和实践指导。
摘要:We analyze the convergence behavior of stochastic gradient descent with momentum (SGDM) under dynamic learning rate and batch size schedules by introducing a novel Lyapunov function. This Lyapunov function has a simpler structure compared with existing ones, facilitating the challenging convergence analysis of SGDM and a unified analysis across various dynamic schedules. Specifically, we extend the theoretical framework to cover three practical scheduling strategies commonly used in deep learning: (i) constant batch size with a decaying learning rate, (ii) increasing batch size with a decaying learning rate, and (iii) increasing batch size with an increasing learning rate. Our theoretical results reveal a clear hierarchy in convergence behavior: while (i) does not guarantee convergence of the expected gradient norm, both (ii) and (iii) do. Moreover, (iii) achieves a provably faster decay rate than (i) and (ii), demonstrating theoretical acceleration even in the presence of momentum. Empirical results validate our theory, showing that dynamically scheduled SGDM significantly outperforms fixed-hyperparameter baselines in convergence speed. We also evaluated a warm-up schedule in experiments, which empirically outperformed all other strategies in convergence behavior. These findings provide a unified theoretical foundation and practical guidance for designing efficient and stable training procedures in modern deep learning.
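下面给出策略 (iii)(批量大小与学习率同时递增)的一个 PyTorch 调度示意; 增长倍率、阶段数与初始值均为假设, 数据加载部分从略。

import torch

def make_schedule(base_bs=64, base_lr=0.05, growth=2.0, stages=4):
    """策略 (iii) 的示意: 逐阶段同时增大 batch size 与学习率(倍率与阶段数为假设值)。"""
    return [(int(base_bs * growth ** k), base_lr * growth ** k) for k in range(stages)]

model = torch.nn.Linear(10, 1)
for stage, (bs, lr) in enumerate(make_schedule()):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    # 每个阶段: 以当前 batch size 重新构建 DataLoader 并训练若干个 epoch(数据部分从略)
    print(f"stage {stage}: batch_size={bs}, lr={lr:.3f}")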


【9】Urban In-Context Learning: Bridging Pretraining and Inference through Masked Diffusion for Urban Profiling
标题:城市上下文学习:通过城市概况的掩蔽扩散连接预训练和推理
链接:https://arxiv.org/abs/2508.03042

作者:hang, Bo Wang, Tongyu Zhu, Leilei Sun, Weifeng Lv
摘要:城市概况旨在预测未知地区的城市概况,在经济和社会普查中发挥着关键作用。现有的方法通常遵循两阶段范式:第一,学习城市区域的表示;第二,通过线性探测执行下游预测,这种做法源自BERT时代。受GPT风格模型发展的启发,最近的研究表明,新的自监督预训练方案可以使模型直接适用于下游任务,从而消除针对特定任务的微调。这在很大程度上是因为GPT通过下一个令牌预测统一了预训练和推理的形式。然而,城市数据表现出与语言根本不同的结构特征,这使得设计一个统一预训练和推理的单阶段模型具有挑战性。在这项工作中,我们提出了城市上下文学习(Urban In-Context Learning),这是一个通过对城市区域进行掩蔽自编码来统一预训练与推理的框架。为了捕捉城市概况的分布,我们引入了城市掩蔽扩散Transformer,它使每个区域的预测能够表示为分布而不是确定性值。此外,为了稳定扩散训练,我们提出了城市表示对齐机制,该机制通过将模型的中间特征与经典城市概况方法得到的特征对齐来对其进行正则化。在两个城市的三个指标上进行的广泛实验表明,我们的一阶段方法始终优于最先进的两阶段方法。消融研究和案例研究进一步验证了每个模块的有效性,特别是扩散建模的使用。
摘要 :Urban profiling aims to predict urban profiles in unknown regions and plays a critical role in economic and social censuses. Existing approaches typically follow a two-stage paradigm: first, learning representations of urban areas; second, performing downstream prediction via linear probing, which originates from the BERT era. Inspired by the development of GPT style models, recent studies have shown that novel self-supervised pretraining schemes can endow models with direct applicability to downstream tasks, thereby eliminating the need for task-specific fine-tuning. This is largely because GPT unifies the form of pretraining and inference through next-token prediction. However, urban data exhibit structural characteristics that differ fundamentally from language, making it challenging to design a one-stage model that unifies both pretraining and inference. In this work, we propose Urban In-Context Learning, a framework that unifies pretraining and inference via a masked autoencoding process over urban regions. To capture the distribution of urban profiles, we introduce the Urban Masked Diffusion Transformer, which enables each region' s prediction to be represented as a distribution rather than a deterministic value. Furthermore, to stabilize diffusion training, we propose the Urban Representation Alignment Mechanism, which regularizes the model's intermediate features by aligning them with those from classical urban profiling methods. Extensive experiments on three indicators across two cities demonstrate that our one-stage method consistently outperforms state-of-the-art two-stage approaches. Ablation studies and case studies further validate the effectiveness of each proposed module, particularly the use of diffusion modeling.


【10】Autonomous Inorganic Materials Discovery via Multi-Agent Physics-Aware Scientific Reasoning
标题:通过多智能体物理感知科学推理自主发现无机材料
链接:https://arxiv.org/abs/2508.02956

作者:hafarollahi, Markus J. Buehler
摘要:传统的机器学习方法通过准确的属性预测和目标材料生成来加速无机材料设计,但它们作为单发模型运行,受到其训练数据中潜在知识的限制。一个核心挑战在于创建一个能够自主执行整个无机材料发现周期的智能系统,从构思和规划到实验和迭代改进。我们介绍了SparksMatter,这是一种用于自动化无机材料设计的多智能体人工智能模型,通过生成想法,设计和执行实验工作流程,不断评估和改进结果,并最终提出符合目标的候选材料来解决用户查询。SparksMatter还批评并改进了自己的回应,确定了研究差距和局限性,并提出了严格的后续验证步骤,包括DFT计算和实验合成和表征,嵌入在结构良好的最终报告中。该模型的性能在热电,半导体和钙钛矿氧化物材料设计的案例研究中进行了评估。结果表明,SparksMatter能够产生针对用户需求的新型稳定无机结构。对前沿模型的基准测试表明,SparksMatter在相关性、新颖性和科学严谨性方面始终获得较高的分数,在多个现实世界的设计任务中,由盲法评估者评估的新颖性有显着提高。这些结果表明SparksMatter具有超越现有材料知识的独特能力,能够产生化学上有效、物理上有意义且富有创造性的无机材料假设。
摘要:Conventional machine learning approaches accelerate inorganic materials design via accurate property prediction and targeted material generation, yet they operate as single-shot models limited by the latent knowledge baked into their training data. A central challenge lies in creating an intelligent system capable of autonomously executing the full inorganic materials discovery cycle, from ideation and planning to experimentation and iterative refinement. We introduce SparksMatter, a multi-agent AI model for automated inorganic materials design that addresses user queries by generating ideas, designing and executing experimental workflows, continuously evaluating and refining results, and ultimately proposing candidate materials that meet the target objectives. SparksMatter also critiques and improves its own responses, identifies research gaps and limitations, and suggests rigorous follow-up validation steps, including DFT calculations and experimental synthesis and characterization, embedded in a well-structured final report. The model's performance is evaluated across case studies in thermoelectrics, semiconductors, and perovskite oxides materials design. The results demonstrate the capacity of SparksMatter to generate novel stable inorganic structures that target the user's needs. Benchmarking against frontier models reveals that SparksMatter consistently achieves higher scores in relevance, novelty, and scientific rigor, with a significant improvement in novelty across multiple real-world design tasks as assessed by a blinded evaluator. These results demonstrate SparksMatter's unique capacity to generate chemically valid, physically meaningful, and creative inorganic materials hypotheses beyond existing materials knowledge.


检测相关(3篇)

【1】AI on the Pulse: Real-Time Health Anomaly Detection with Wearable and Ambient Intelligence
标题:脉搏中的人工智能:利用可穿戴和环境智能进行实时健康异常检测
链接:https://arxiv.org/abs/2508.03436

作者:brielli, Bardh Prenkaj, Paola Velardi, Stefano Faralli
摘要:我们介绍了AI on the Pulse,这是一个可用于真实场景的异常检测系统,它融合可穿戴传感器、环境智能和先进的AI模型来持续监测患者。在最先进的(SoTA)通用时间序列模型UniTS的支持下,我们的框架自主学习每位患者独特的生理和行为模式,检测可能预示健康风险的细微偏差。不同于在现实场景中需要不切实际的持续标注的分类方法,我们的方法利用异常检测提供实时、个性化的警报,以支持响应式的居家护理干预。我们的方法优于12种SoTA异常检测方法,在高保真医疗设备(ECG)和消费级可穿戴设备上都表现出鲁棒性,F1分数提高了约22%。然而,AI on the Pulse的真正影响体现在@HOME中,它已被成功部署用于持续的真实患者监测。通过使用智能手表等非侵入式的轻量级设备,我们的系统证明了无需临床级设备也能实现高质量的健康监测。除了检测之外,我们还通过集成LLM来增强可解释性,将异常分数转化为对医疗专业人员有临床意义的洞察。
摘要:We introduce AI on the Pulse, a real-world-ready anomaly detection system that continuously monitors patients using a fusion of wearable sensors, ambient intelligence, and advanced AI models. Powered by UniTS, a state-of-the-art (SoTA) universal time-series model, our framework autonomously learns each patient's unique physiological and behavioral patterns, detecting subtle deviations that signal potential health risks. Unlike classification methods that require impractical, continuous labeling in real-world scenarios, our approach uses anomaly detection to provide real-time, personalized alerts for reactive home-care interventions. Our approach outperforms 12 SoTA anomaly detection methods, demonstrating robustness across both high-fidelity medical devices (ECG) and consumer wearables, with a ~ 22% improvement in F1 score. However, the true impact of AI on the Pulse lies in @HOME, where it has been successfully deployed for continuous, real-world patient monitoring. By operating with non-invasive, lightweight devices like smartwatches, our system proves that high-quality health monitoring is possible without clinical-grade equipment. Beyond detection, we enhance interpretability by integrating LLMs, translating anomaly scores into clinically meaningful insights for healthcare professionals.


【2】Pseudo-label Induced Subspace Representation Learning for Robust Out-of-Distribution Detection
标题:伪标签诱导子空间表示学习用于鲁棒性分布外检测
链接:https://arxiv.org/abs/2508.03108

作者: Azad, Faizul Rakib Sayem, Shahana Ibrahim
摘要:分布外(OOD)检测是鲁棒人工智能(AI)的核心,旨在识别来自训练集之外新分布的样本。最近的方法利用特征表示作为OOD检测的判别性签名。然而,大多数现有方法依赖于对特征空间的限制性假设,限制了分布内(ID)样本与OOD样本之间的可分性。在这项工作中,我们提出了一种基于伪标签诱导子空间表示的新型OOD检测框架,与现有基于特征的技术相比,它在更宽松、更自然的假设下工作。此外,我们引入了一个简单而有效的学习准则,将基于交叉熵的ID分类损失与基于子空间距离的正则化损失相结合,以增强ID-OOD可分性。大量实验验证了我们框架的有效性。
摘要:Out-of-distribution (OOD) detection lies at the heart of robust artificial intelligence (AI), aiming to identify samples from novel distributions beyond the training set. Recent approaches have exploited feature representations as distinguishing signatures for OOD detection. However, most existing methods rely on restrictive assumptions on the feature space that limit the separability between in-distribution (ID) and OOD samples. In this work, we propose a novel OOD detection framework based on a pseudo-label-induced subspace representation, that works under more relaxed and natural assumptions compared to existing feature-based techniques. In addition, we introduce a simple yet effective learning criterion that integrates a cross-entropy-based ID classification loss with a subspace distance-based regularization loss to enhance ID-OOD separability. Extensive experiments validate the effectiveness of our framework.


【3】Comparative Evaluation of Kolmogorov-Arnold Autoencoders and Orthogonal Autoencoders for Fault Detection with Varying Training Set Sizes
标题:不同训练集规模下用于故障检测的Kolmogorov-Arnold自编码器与正交自编码器的比较评估
链接 :https://arxiv.org/abs/2508.02860

作者:una Villagómez, Vladimir Mahalec
摘要:Kolmogorov-Arnold网络(KAN)最近成为传统神经网络的一种灵活且参数高效的替代方案。与使用固定的基于节点的激活函数的标准架构不同,KAN将可学习函数放置在边上,并由不同的函数族参数化。虽然它们在监督设置中表现出了潜力,但其在无监督故障检测中的实用性在很大程度上尚未被探索。本研究对基于KAN的自编码器(KAN-AE)在化学过程无监督故障检测中的表现进行了比较评估。我们研究了四种KAN-AE变体,每种变体基于不同的KAN实现(EfficientKAN、FastKAN、FourierKAN和WavKAN),并在田纳西-伊士曼过程上与正交自编码器(OAE)进行基准比较。模型在13种不同规模的训练集上使用正常工况数据进行训练,并针对21种故障类型进行评估,以故障检测率(FDR)作为性能指标。WavKAN-AE仅使用4,000个训练样本就实现了最高的整体FDR(≥92%),并且即使其他变体在更大的数据集上训练,它仍然是表现最好的。EfficientKAN-AE仅用500个样本就达到了≥90%的FDR,证明了其在低数据设置中的鲁棒性。FastKAN-AE在更大规模(≥50,000个样本)上具有竞争力,而FourierKAN-AE始终表现不佳。OAE基线逐渐改善,但需要更多的数据才能匹配最优KAN-AE的性能。这些结果突出了KAN-AE将数据效率与强大的故障检测性能相结合的能力。它们对结构化基函数的使用表明其在提高模型透明度方面具有潜力,使其成为在数据受限的工业环境中部署的有希望的候选方案。
摘要:Kolmogorov-Arnold Networks (KANs) have recently emerged as a flexible and parameter-efficient alternative to conventional neural networks. Unlike standard architectures that use fixed node-based activations, KANs place learnable functions on edges, parameterized by different function families. While they have shown promise in supervised settings, their utility in unsupervised fault detection remains largely unexplored. This study presents a comparative evaluation of KAN-based autoencoders (KAN-AEs) for unsupervised fault detection in chemical processes. We investigate four KAN-AE variants, each based on a different KAN implementation (EfficientKAN, FastKAN, FourierKAN, and WavKAN), and benchmark them against an Orthogonal Autoencoder (OAE) on the Tennessee Eastman Process. Models are trained on normal operating data across 13 training set sizes and evaluated on 21 fault types, using Fault Detection Rate (FDR) as the performance metric. WavKAN-AE achieves the highest overall FDR ($\geq$92\%) using just 4,000 training samples and remains the top performer, even as other variants are trained on larger datasets. EfficientKAN-AE reaches $\geq$90\% FDR with only 500 samples, demonstrating robustness in low-data settings. FastKAN-AE becomes competitive at larger scales ($\geq$50,000 samples), while FourierKAN-AE consistently underperforms. The OAE baseline improves gradually but requires substantially more data to match top KAN-AE performance. These results highlight the ability of KAN-AEs to combine data efficiency with strong fault detection performance. Their use of structured basis functions suggests potential for improved model transparency, making them promising candidates for deployment in data-constrained industrial settings.
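下面给出"用正常工况重构误差的分位数确定阈值、进而计算故障检测率(FDR)"的示意; 分位数取值为假设, 重构误差可来自任意自编码器(包括文中的 KAN-AE 或 OAE)。

import numpy as np

def fault_detection_rate(err_normal, err_fault, quantile=0.99):
    """用正常工况重构误差的分位数作为阈值, 计算故障检测率 FDR(分位数取值为假设)。
    err_* 为自编码器在各样本上的重构误差。"""
    threshold = np.quantile(err_normal, quantile)     # 阈值仅由正常数据确定
    return float(np.mean(err_fault > threshold))      # 被判为故障的故障样本比例

# 用法示意(随机数据): 故障样本的重构误差整体偏大, 因此 FDR 接近 1
rng = np.random.default_rng(0)
fdr = fault_detection_rate(rng.normal(1.0, 0.1, 5000), rng.normal(1.6, 0.3, 960))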


分类|识别(3篇)

【1】Cross-patient Seizure Onset Zone Classification by Patient-Dependent Weight
标题:按患者相关体重进行跨患者癫痫发作区分类
链接:https://arxiv.org/abs/2508.03635

作者:ao, Hidenori Sugano, Toshihisa Tanaka
摘要:识别局灶性癫痫患者的癫痫发作区(SOZ)对于手术治疗至关重要,并且由于其依赖于临床专家的视觉判断而仍然具有挑战性。机器学习的发展可以帮助诊断,并取得了可喜的进展。然而,与其他领域的数据不同,医疗数据通常是从单个患者中收集的,每个患者的疾病、身体状况和病史都不同,这导致每个患者的数据分布存在差异。这使得机器学习模型很难在每个新患者数据集中实现一致可靠的性能,我们称之为“跨患者问题”。在本文中,我们提出了一种方法,使用每个新测试患者的患者特定权重来微调预训练模型,以提高诊断性能。首先,使用监督学习方法训练机器学习模型。接下来,利用由测试患者数据得到的训练模型中间特征,定义测试患者数据与每个训练患者数据之间的相似性,以确定每个训练患者在后续微调中使用的权重。最后,我们使用训练数据和患者权重微调预训练模型中的所有参数。在实验中,采用leave-one-patient-out方法评估所提出的方法,结果显示每个测试患者的分类准确率均有提高,平均提高超过10%。
摘要:Identifying the seizure onset zone (SOZ) in patients with focal epilepsy is essential for surgical treatment and remains challenging due to its dependence on visual judgment by clinical experts. The development of machine learning can assist in diagnosis and has made promising progress. However, unlike data in other fields, medical data is usually collected from individual patients, and each patient has different illnesses, physical conditions, and medical histories, which leads to differences in the distribution of each patient's data. This makes it difficult for a machine learning model to achieve consistently reliable performance in every new patient dataset, which we refer to as the "cross-patient problem." In this paper, we propose a method to fine-tune a pretrained model using patient-specific weights for every new test patient to improve diagnostic performance. First, the supervised learning method is used to train a machine learning model. Next, using the intermediate features of the trained model obtained through the test patient data, the similarity between the test patient data and each training patient's data is defined to determine the weight of each training patient to be used in the following fine-tuning. Finally, we fine-tune all parameters in the pretrained model with training data and patient weights. In the experiment, the leave-one-patient-out method is used to evaluate the proposed method, and the results show improved classification accuracy for every test patient, with an average improvement of more than 10%.
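下面给出"用预训练模型的中间特征计算患者相似度, 再据此对各训练患者加权微调"的 PyTorch 示意; 其中余弦相似度与 softmax 归一化是编者的假设性选择, 相似度与权重的具体定义以论文为准。

import torch
import torch.nn.functional as F

@torch.no_grad()
def patient_weights(model_features, test_data, train_sets, temp=1.0):
    """model_features: 返回中间特征 [N, D] 的函数; 返回每位训练患者的权重(示意)。"""
    f_test = model_features(test_data).mean(dim=0)                   # 测试患者的平均特征
    sims = torch.stack([F.cosine_similarity(
        model_features(d).mean(dim=0), f_test, dim=0) for d in train_sets])
    return torch.softmax(sims / temp, dim=0)                         # 每位训练患者一个权重

def finetune_step(model, criterion, optimizer, train_sets, labels, weights):
    """按患者权重加权各患者的损失, 微调预训练模型的全部参数(示意)。"""
    optimizer.zero_grad()
    loss = sum(w * criterion(model(x), y)
               for w, x, y in zip(weights, train_sets, labels))
    loss.backward()
    optimizer.step()
    return loss.item()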


【2】Analyzing German Parliamentary Speeches: A Machine Learning Approach for Topic and Sentiment Classification
标题:分析德国议会演讲:话题和情绪分类的机器学习方法
链接:https://arxiv.org/abs/2508.03181

作者:z, Moritz Beyer, Jannik Späth, Lasse Bohlen, Patrick Zschech, Mathias Kraus, Julian Rosenberger
备注:Accepted at 20th International Conference on Wirtschaftsinformatik (WI25); September 2025, Münster, Germany
摘要:本研究通过分析过去五年中约28,000份议会演讲,调查德国议会(联邦议院)的政治话语。我们开发了两种用于主题和情感分类的机器学习模型,并在手动标注的数据集上进行了训练。这些模型显示出强大的分类性能:主题分类的受试者工作特征曲线下面积(AUROC)为0.94(各主题平均),情感分类为0.89。这两种模型都被用来评估各政党和不同时期的主题趋势与情绪分布。分析揭示了各政党与其在议会中角色之间的显著关系。特别是,可以观察到从执政转为在野的政党在风格上发生了变化。虽然意识形态立场很重要,但执政责任同样塑造着话语。该分析直接回答了有关联邦议院中主题演变、情绪动态以及各党派特有话语策略的关键问题。
摘要:This study investigates political discourse in the German parliament, the Bundestag, by analyzing approximately 28,000 parliamentary speeches from the last five years. Two machine learning models for topic and sentiment classification were developed and trained on a manually labeled dataset. The models showed strong classification performance, achieving an area under the receiver operating characteristic curve (AUROC) of 0.94 for topic classification (average across topics) and 0.89 for sentiment classification. Both models were applied to assess topic trends and sentiment distributions across political parties and over time. The analysis reveals remarkable relationships between parties and their role in parliament. In particular, a change in style can be observed for parties moving from government to opposition. While ideological positions matter, governing responsibilities also shape discourse. The analysis directly addresses key questions about the evolution of topics, sentiment dynamics, and party-specific discourse strategies in the Bundestag.


【3】CauKer: classification time series foundation models can be pretrained on synthetic data only
标题:CauKer:分类时间序列基础模型只能在合成数据上进行预训练
链接:https://arxiv.org/abs/2508.02879

作者 :ie, Vasilii Feofanov, Marius Alonso, Ambroise Odonnat, Jianfeng Zhang, Themis Palpanas, Ievgen Redko
摘要:时间序列基础模型(TSFM)由于其强大的zero-shot能力和广泛的实际应用,近年来受到了广泛的关注。这类模型通常需要在大规模、精心策划的真实世界序列集合上进行计算成本高昂的预训练。为了实现TSFM的样本高效预训练,我们提出了CauKer,这是一种新算法,旨在生成具有真实趋势、季节性和非线性相互作用的多样化、因果一致的合成时间序列。CauKer将高斯过程(GP)核的组合与结构因果模型(SCM)相结合,为具有不同架构并遵循不同预训练方式的最先进分类TSFM生成样本高效预训练所需的数据。此外,我们的实验表明,CauKer生成的数据集在数据集规模(10K到10M样本)和模型容量(1M到783M参数)两方面都表现出清晰的缩放定律,而真实世界数据集则显示出不规则的缩放行为。
摘要:Time series foundation models (TSFMs) have recently gained significant attention due to their strong zero-shot capabilities and widespread real-world applications. Such models typically require a computationally costly pretraining on large-scale, carefully curated collections of real-world sequences. To allow for a sample-efficient pretraining of TSFMs, we propose CauKer, a novel algorithm designed to generate diverse, causally coherent synthetic time series with realistic trends, seasonality, and nonlinear interactions. CauKer combines Gaussian Process (GP) kernel composition with Structural Causal Models (SCM) to produce data for sample-efficient pretraining of state-of-the-art classification TSFMs having different architectures and following different pretraining approaches. Additionally, our experiments reveal that CauKer-generated datasets exhibit clear scaling laws for both dataset size (10K to 10M samples) and model capacity (1M to 783M parameters), unlike real-world datasets, which display irregular scaling behavior.


表征(1篇)

【1】Cross-Model Semantics in Representation Learning
标题:表示学习中的跨模型语义
链接:https://arxiv.org/abs/2508.03649

作者:ooroo, Thomas Engel
摘要:深度网络学习到的内部表示往往对特定于架构的选择很敏感,这引发了关于所学结构在模型之间的稳定性、对齐性和可迁移性的问题。在本文中,我们研究结构约束(如线性整形算子和校正路径)如何影响不同架构之间内部表示的兼容性。在先前关于结构化变换与收敛性研究的见解基础上,我们构建了一个框架,用于度量和分析具有不同但相关架构先验的网络之间的表示对齐。通过结合理论分析、实证探查和受控迁移实验,我们证明结构规律性会诱导出在架构变化下更加稳定的表示几何结构。这表明某些形式的归纳偏置不仅支持模型内的泛化,还能提高所学特征在模型之间的互操作性。最后,我们讨论了表示可迁移性对模型蒸馏、模块化学习以及鲁棒学习系统原则性设计的影响。
摘要:The internal representations learned by deep networks are often sensitive to architecture-specific choices, raising questions about the stability, alignment, and transferability of learned structure across models. In this paper, we investigate how structural constraints--such as linear shaping operators and corrective paths--affect the compatibility of internal representations across different architectures. Building on the insights from prior studies on structured transformations and convergence, we develop a framework for measuring and analyzing representational alignment across networks with distinct but related architectural priors. Through a combination of theoretical insights, empirical probes, and controlled transfer experiments, we demonstrate that structural regularities induce representational geometry that is more stable under architectural variation. This suggests that certain forms of inductive bias not only support generalization within a model, but also improve the interoperability of learned features across models. We conclude with a discussion on the implications of representational transferability for model distillation, modular learning, and the principled design of robust learning systems.


优化|敛散性(5篇)

【1】Clus-UCB: A Near-Optimal Algorithm for Clustered Bandits
标题:Clus-UCB:一种针对聚类多臂赌博机的近优算法
链接:https://arxiv.org/abs/2508.02909

作者:re, Prasanna Chaporkar
摘要:我们研究一种随机多臂赌博机设置,其中臂被划分为已知的簇,使得同一簇内各臂的平均奖励之差至多为一个已知阈值。虽然聚类结构是先验已知的,但各臂的均值未知。该框架模拟了结果取决于多个因素的场景-一些因素影响显著,另一些影响较小-例如在线广告、临床试验和无线通信。我们推导了遗憾(regret)的渐近下界,改进了Lai和Robbins(1985)的经典界。随后,我们提出了Clus-UCB,一种渐近地紧密匹配该下界的高效算法。Clus-UCB旨在利用聚类结构,并引入了一个新的臂评估指标,该指标依赖于同一簇内的其他臂;通过这种方式,各臂之间实现了信息共享。我们给出了该算法的仿真结果,并将其性能与KL-UCB及其他处理相依臂赌博机的知名算法进行了比较。最后,我们讨论了这项工作的一些局限性,并以未来可能的研究方向作结。
摘要:We study a stochastic multi-armed bandit setting where arms are partitioned into known clusters, such that the mean rewards of arms within a cluster differ by at most a known threshold. While the clustering structure is known a priori, the arm means are unknown. This framework models scenarios where outcomes depend on multiple factors -- some with significant and others with minor influence -- such as online advertising, clinical trials, and wireless communication. We derive asymptotic lower bounds on the regret that improve upon the classical bound of Lai & Robbins (1985). We then propose Clus-UCB, an efficient algorithm that closely matches this lower bound asymptotically. Clus-UCB is designed to exploit the clustering structure and introduces a new index to evaluate an arm, which depends on other arms within the cluster. In this way, arms share information among each other. We present simulation results of our algorithm and compare its performance against KL-UCB and other well-known algorithms for bandits with dependent arms. Finally, we address some limitations of this work and conclude by mentioning possible future research.
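下面给出一个利用"簇内均值差不超过 eps"这一先验来收紧置信上界的示意性指标。这是编者为说明思路构造的指标, 并非论文中 Clus-UCB 的原始公式。

import numpy as np

def clustered_ucb_index(means, counts, clusters, eps, t):
    """聚类信息感知的 UCB 指标(示意构造): 同簇臂的均值至多相差 eps,
    故可用簇内其他臂的置信上界来收紧当前臂的指标。"""
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))
    ucb = means + bonus
    index = ucb.copy()
    for c in set(clusters):
        ids = np.flatnonzero(np.asarray(clusters) == c)
        # 对簇内每个臂: 取自身 UCB 与"簇内最小 UCB + eps"中的较小者
        index[ids] = np.minimum(ucb[ids], ucb[ids].min() + eps)
    return index

# 用法示意: 每一轮选 index 最大的臂, 拉动后更新 means/counts
means = np.array([0.40, 0.42, 0.90, 0.88])
counts = np.array([12, 3, 15, 4])
arm = int(np.argmax(clustered_ucb_index(means, counts, [0, 0, 1, 1], eps=0.05, t=34)))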


【2】On the Theory and Practice of GRPO: A Trajectory-Corrected Approach with Fast Convergence
标题:GRPO的理论与实践:一种快速收敛的轨迹修正方法
链接:https://arxiv.org/abs/2508.02833

作者: Ruinan Jin
备注:12 pages
摘要:组相对策略优化(GRPO)是DeepSeek最近提出的一种无评论家(critic-free)的强化学习算法,用于微调大型语言模型。它用组内标准化奖励取代了近端策略优化(PPO)中的值函数,同时保留了基于旧策略的PPO风格令牌级重要性采样。我们证明,GRPO的更新规则实际上估计的是旧策略处的策略梯度,而非当前策略处的梯度。然而,由于旧策略每隔几步就会更新一次,两者之间的差异仍然很小,限制了这种偏差在实践中的影响。我们通过一项消融研究验证了这一点:在该研究中,重要性采样被完全去除,改为在多个优化步骤中使用在固定旧策略处估计的梯度进行更新。值得注意的是,这种简化取得了与标准GRPO相当的性能。   基于这些发现,我们提出了一种新算法:轨迹级重要性校正GRPO(TIC GRPO)。TIC GRPO用单个轨迹级概率比取代令牌级重要性比,在保持无评论家结构的同时给出当前策略梯度的无偏估计。此外,我们给出了GRPO类方法的首个理论收敛性分析,涵盖原始GRPO和我们提出的变体。
摘要:Group Relative Policy Optimization (GRPO), recently proposed by DeepSeek, is a critic-free reinforcement learning algorithm for fine tuning large language models. It replaces the value function in Proximal Policy Optimization (PPO) with group normalized rewards, while retaining PPO style token level importance sampling based on an old policy. We show that GRPO update rule in fact estimates the policy gradient at the old policy rather than the current one. However, since the old policy is refreshed every few steps, the discrepancy between the two remains small limiting the impact of this bias in practice. We validate this through an ablation study in which importance sampling is entirely removed, and updates are instead performed using the gradient estimated at a fixed old policy across multiple optimization steps. Remarkably, this simplification results in performance comparable to standard GRPO.   Motivated by these findings, we propose a new algorithm: Trajectory level Importance Corrected GRPO (TIC GRPO). TIC GRPO replaces token level importance ratios with a single trajectory level probability ratio, yielding an unbiased estimate of the current policy gradient while preserving the critic free structure. Furthermore, we present the first theoretical convergence analysis for GRPO style methods, covering both the original GRPO and our proposed variant.
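下面给出 TIC GRPO 核心思想(轨迹级重要性比 + 组内相对优势)的一个 PyTorch 代理损失示意; 是否裁剪、KL 正则等细节摘要未给出, 此处省略并以论文为准。

import torch

def tic_grpo_loss(logp_new, logp_old, rewards):
    """TIC-GRPO 风格的代理损失(示意)。
    logp_new/logp_old: [G, T] 每条轨迹逐 token 的对数概率; rewards: [G] 每条轨迹的奖励。"""
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)                 # 组相对优势
    ratio = torch.exp(logp_new.sum(dim=1) - logp_old.sum(dim=1).detach())     # 轨迹级概率比
    return -(ratio * adv).mean()

# 用法示意: 同一 prompt 采样 G 条回答, 奖励由规则或奖励模型给出(此处为随机占位)
G, T = 8, 16
logp_new = (torch.randn(G, T) * 0.1 - 2.0).requires_grad_()
loss = tic_grpo_loss(logp_new, logp_new.detach() + 0.01, torch.rand(G))
loss.backward()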


【3】Exponential convergence rate for Iterative Markovian Fitting
标题:迭代马尔可夫拟合的指数收敛速度
链接:https://arxiv.org/abs/2508.02770

作者:kolov, Alexander Korotin
摘要:我们考虑有限状态空间上的离散时间Schrödinger桥问题。虽然已经知道迭代马尔可夫拟合(IMF)算法在Kullback-Leibler散度意义下收敛到真实解,但该收敛的速度此前仍未被量化。在这项工作中,我们首次证明IMF具有指数收敛性,并给出显式的压缩因子。
摘要:We consider the discrete-time Schr\"odinger bridge problem on a finite state space. Although it has been known that the Iterative Markovian Fitting (IMF) algorithm converges in Kullback-Leibler divergence to the ground truth solution, the speed of that convergence remained unquantified. In this work, we establish for the first time that IMF exhibits exponential convergence with an explicit contraction factor.


【4】Overcoming the Loss Conditioning Bottleneck in Optimization-Based PDE Solvers: A Novel Well-Conditioned Loss Function
标题:克服基于优化的PDE求解器中的损失条件数瓶颈:一种新型良态损失函数
链接:https://arxiv.org/abs/2508.02692

作者:, Weiwei Zhang
摘要:近年来,通过最小化标量损失函数来求解偏微分方程的基于优化的求解器受到了越来越多的关注。这些方法要么直接在离散变量上定义损失,如优化离散损失(ODIL);要么通过神经网络代理间接定义损失,如物理信息神经网络(PINN)。然而,尽管前景可观,这类方法的收敛速度往往远慢于经典迭代求解器,通常被认为效率低下。这项工作提供了一个理论见解,将这种低效归因于均方误差(MSE)损失的使用:它隐式地构成法方程,使条件数被平方,从而严重损害优化过程。为了解决这个问题,我们提出了一种新的稳定梯度残差(SGR)损失。通过调节一个权重参数,它可以在原系统与其法方程之间灵活地调节条件数,并在极限情况下退化为MSE损失。我们在ODIL框架和PINN(分别采用数值微分或自动微分)中系统地对SGR损失的收敛行为和优化稳定性进行了基准测试,并将其性能与经典迭代求解器进行了比较。在一系列基准问题上的数值实验表明,在ODIL框架内,所提出的SGR损失比MSE损失的收敛速度快几个数量级。在PINN框架内的进一步验证表明,尽管神经网络具有高度非线性,SGR仍始终优于MSE损失。这些理论与实证结果有助于弥合经典迭代求解器与基于优化的求解器之间的性能差距,突出了损失条件数的核心作用,并为设计更高效的PDE求解器提供了关键见解。
摘要:Optimization-based PDE solvers that minimize scalar loss functions have gained increasing attention in recent years. These methods either define the loss directly over discrete variables, as in Optimizing a Discrete Loss (ODIL), or indirectly through a neural network surrogate, as in Physics-Informed Neural Networks (PINNs). However, despite their promise, such methods often converge much more slowly than classical iterative solvers and are commonly regarded as inefficient. This work provides a theoretical insight, attributing the inefficiency to the use of the mean squared error (MSE) loss, which implicitly forms the normal equations, squares the condition number, and severely impairs optimization. To address this, we propose a novel Stabilized Gradient Residual (SGR) loss. By tuning a weight parameter, it flexibly modulates the condition number between the original system and its normal equations, while reducing to the MSE loss in the limiting case. We systematically benchmark the convergence behavior and optimization stability of the SGR loss within both the ODIL framework and PINNs-employing either numerical or automatic differentiation-and compare its performance against classical iterative solvers. Numerical experiments on a range of benchmark problems demonstrate that, within the ODIL framework, the proposed SGR loss achieves orders-of-magnitude faster convergence than the MSE loss. Further validation within the PINNs framework shows that, despite the high nonlinearity of neural networks, SGR consistently outperforms the MSE loss. These theoretical and empirical findings help bridge the performance gap between classical iterative solvers and optimization-based solvers, highlighting the central role of loss conditioning, and provide key insights for the design of more efficient PDE solvers.
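摘要中的关键论断是: 对残差使用 MSE 损失等价于隐式求解法方程, 谱条件数随之被平方。下面用 numpy 做一个最小演示(示例矩阵为随机构造; SGR 损失的具体形式摘要未给出, 此处不做实现)。

import numpy as np

# 对残差 r = A u - b 使用 MSE 损失时, 其 Hessian 为 A^T A(法方程),
# 谱条件数恰好被平方, 从而拖慢基于梯度的优化
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 200)) + 50 * np.eye(200)
print("cond(A)     =", np.linalg.cond(A))
print("cond(A^T A) =", np.linalg.cond(A.T @ A))   # 数值上等于 cond(A) 的平方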


【5】A Dual Optimization View to Empirical Risk Minimization with f-Divergence Regularization
标题:f-散度正则化经验风险最小化的对偶优化视角
链接:https://arxiv.org/abs/2508.03314

作者: Daunas, Iñaki Esnaola, Samir M. Perlaza
备注:Conference paper to appear in ITW 2025. arXiv admin note: substantial text overlap with arXiv:2502.14544; text overlap with arXiv:2402.00501
摘要:本文介绍了f-散度正则化经验风险最小化(ERM-fDR)的对偶形式。ERM-fDR对偶优化问题的解与作为隐函数引入的归一化函数这一概念相关联。这种对偶方法利用Legendre-Fenchel变换和隐函数定理,为归一化函数给出一个非线性ODE表达式。此外,该非线性ODE表达式及其性质提供了一种在温和条件下计算ERM-fDR解的归一化函数的高效方法。
摘要:The dual formulation of empirical risk minimization with f-divergence regularization (ERM-fDR) is introduced. The solution of the dual optimization problem to the ERM-fDR is connected to the notion of normalization function introduced as an implicit function. This dual approach leverages the Legendre-Fenchel transform and the implicit function theorem to provide a nonlinear ODE expression to the normalization function. Furthermore, the nonlinear ODE expression and its properties provide a computationally efficient method to calculate the normalization function of the ERM-fDR solution under a mild condition.


预测|估计(7篇)

【1】SolarSeer: Ultrafast and accurate 24-hour solar irradiance forecasts outperforming numerical weather prediction across the USA
标题:SolarSeer:超快、准确的24小时太阳辐照度预报,性能优于美国各地的数值天气预报
链接:https://arxiv.org/abs/2508.03590

作者: Bai, Zuliang Fang, Shengyu Tao, Siqi Xiang, Jiang Bian, Yanfei Xiang, Pengcheng Zhao, Weixin Jin, Jonathan A. Weyn, Haiyu Dong, Bin Zhang, Hongyu Sun, Kit Thambiratnam, Qi Zhang, Hongbin Sun, Xuan Zhang, Qiuwei Wu
摘要:准确的24小时太阳辐照度预报对于太阳能光伏系统的安全和经济运行至关重要。传统的数值天气预报(NWP)模式代表了最先进的预报性能,但依赖于计算成本高的数据同化和求解模拟大气物理的复杂偏微分方程(PDE)。在这里,我们介绍SolarSeer,这是一个端到端的大型人工智能(AI)模型,用于预测整个美国大陆(CONUS)的太阳辐照度。SolarSeer旨在将历史卫星观测结果直接映射到未来预报,消除数据同化和偏微分方程求解的计算开销。这种效率使SolarSeer的运行速度比传统的NWP快1,500倍以上,在3秒内以5公里的分辨率为CONUS生成24小时云量和太阳辐照度预报。与CONUS最先进的数值天气预报,即高分辨率快速刷新(HRRR)相比,SolarSeer在再分析数据中将太阳辐照度预测的均方根误差降低了27.28%,在1,800个站点上降低了15.35%。SolarSeer还能有效捕捉太阳辐照度波动,显著提高一阶辐照度差预报精度。SolarSeer超快、准确的24小时太阳辐照度预测为向可持续、净零能源系统的过渡提供了强有力的支持。
摘要 :Accurate 24-hour solar irradiance forecasting is essential for the safe and economic operation of solar photovoltaic systems. Traditional numerical weather prediction (NWP) models represent the state-of-the-art in forecasting performance but rely on computationally costly data assimilation and solving complicated partial differential equations (PDEs) that simulate atmospheric physics. Here, we introduce SolarSeer, an end-to-end large artificial intelligence (AI) model for solar irradiance forecasting across the Contiguous United States (CONUS). SolarSeer is designed to directly map the historical satellite observations to future forecasts, eliminating the computational overhead of data assimilation and PDEs solving. This efficiency allows SolarSeer to operate over 1,500 times faster than traditional NWP, generating 24-hour cloud cover and solar irradiance forecasts for the CONUS at 5-kilometer resolution in under 3 seconds. Compared with the state-of-the-art NWP in the CONUS, i.e., High-Resolution Rapid Refresh (HRRR), SolarSeer significantly reduces the root mean squared error of solar irradiance forecasting by 27.28% in reanalysis data and 15.35% across 1,800 stations. SolarSeer also effectively captures solar irradiance fluctuations and significantly enhances the first-order irradiance difference forecasting accuracy. SolarSeer's ultrafast, accurate 24-hour solar irradiance forecasts provide strong support for the transition to sustainable, net-zero energy systems.


【2】Rethinking Selectivity in State Space Models: A Minimal Predictive Sufficiency Approach
标题:重新思考状态空间模型中的选择性:最小预测充分性方法
链接:https://arxiv.org/abs/2508.03158

作者:, Jian'an Zhang, Hongyi Duan, Haoyang Liu, Qingyang Li
备注:Submitted to AAAI'26
摘要:状态空间模型(SSM),特别是Mamba等最近的选择性变体,已经成为序列建模的领先架构,挑战了Transformer的主导地位。然而,这些最先进模型的成功在很大程度上依赖于启发式设计的选择性机制,缺乏严格的第一性原理推导。这一理论差距引发了关于其最优性以及对虚假相关性鲁棒性的疑问。为了解决这个问题,我们引入了预测充分性原则,这是一种新的信息论准则,规定理想的隐藏状态应是过去对未来进行预测的最小充分统计量。基于这一原则,我们提出了最小预测充分性状态空间模型(MPS-SSM),这是一个新框架,其选择机制由优化从该原则导出的目标函数来指导。这种方法鼓励模型在不损失预测能力的前提下最大限度地压缩历史信息,从而学会忽略非因果噪声和虚假模式。在大量基准数据集上的广泛实验表明,MPS-SSM不仅实现了最先进的性能,在长期预测和噪声场景中显著优于现有模型,而且表现出卓越的鲁棒性。此外,我们表明,MPS原则可以扩展为一个通用的正则化框架来增强其他流行架构,突显了其广泛潜力。
摘要:State Space Models (SSMs), particularly recent selective variants like Mamba, have emerged as a leading architecture for sequence modeling, challenging the dominance of Transformers. However, the success of these state-of-the-art models largely relies on heuristically designed selective mechanisms, which lack a rigorous first-principle derivation. This theoretical gap raises questions about their optimality and robustness against spurious correlations. To address this, we introduce the Principle of Predictive Sufficiency, a novel information-theoretic criterion stipulating that an ideal hidden state should be a minimal sufficient statistic of the past for predicting the future. Based on this principle, we propose the Minimal Predictive Sufficiency State Space Model (MPS-SSM), a new framework where the selective mechanism is guided by optimizing an objective function derived from our principle. This approach encourages the model to maximally compress historical information without losing predictive power, thereby learning to ignore non-causal noise and spurious patterns. Extensive experiments on a wide range of benchmark datasets demonstrate that MPS-SSM not only achieves state-of-the-art performance, significantly outperforming existing models in long-term forecasting and noisy scenarios, but also exhibits superior robustness. Furthermore, we show that the MPS principle can be extended as a general regularization framework to enhance other popular architectures, highlighting its broad potential.


【3】Aerobatic maneuvers in insect-scale flapping-wing aerial robots via deep-learned robust tube model predictive control
标题:通过深度学习的鲁棒管模型预测控制实现昆虫尺度扑翼飞行机器人的特技机动
链接:https://arxiv.org/abs/2508.03043

作者:Hsiao, Andrea Tagliabue, Owen Matteson, Suhan Kim, Tong Zhao, Jonathan P. How, YuFeng Chen
备注:27 pages, 26 supplementary pages, 6 main figures, 16 supplementary figures, 1 table
摘要:飞行昆虫在干扰下能够表现出高度敏捷的机动,如急停、扫视和身体翻转。相比之下,昆虫尺度的飞行机器人仅限于跟踪身体加速度较小的非激进轨迹。这一性能差距源于机器人惯性低、动态快、扑翼空气动力学存在不确定性以及对环境干扰高度敏感等因素的共同作用。执行高动态机动需要生成逼近硬件极限的激进飞行轨迹,以及能够考虑模型与环境不确定性的高速率反馈控制器。在这里,通过设计一个深度学习的鲁棒管模型预测控制器,我们在一个750毫克的扑翼机器人上展示了类似昆虫的飞行敏捷性和鲁棒性。我们的模型预测控制器可以在干扰下跟踪激进的飞行轨迹。为了在计算受限的实时系统中实现高反馈率,我们设计了模仿学习方法来训练一个两层全连接神经网络,其结构类似于由中枢神经系统和运动神经元组成的昆虫飞行控制架构。我们的机器人展示了类似昆虫的扫视运动,横向速度达每秒197厘米、加速度达每二次方秒11.7米,较先前结果分别提升了447%和255%。机器人还能在每秒160厘米的风干扰和较大的命令-力映射误差下执行扫视机动。此外,它在11秒内完成了10次连续的身体翻转,这是亚克级飞行器中最具挑战性的动作。这些结果代表了实现昆虫尺度飞行敏捷性的一个里程碑,并将激励未来关于感知与计算自主性的研究。
摘要:Aerial insects exhibit highly agile maneuvers such as sharp braking, saccades, and body flips under disturbance. In contrast, insect-scale aerial robots are limited to tracking non-aggressive trajectories with small body acceleration. This performance gap is contributed by a combination of low robot inertia, fast dynamics, uncertainty in flapping-wing aerodynamics, and high susceptibility to environmental disturbance. Executing highly dynamic maneuvers requires the generation of aggressive flight trajectories that push against the hardware limit and a high-rate feedback controller that accounts for model and environmental uncertainty. Here, through designing a deep-learned robust tube model predictive controller, we showcase insect-like flight agility and robustness in a 750-millgram flapping-wing robot. Our model predictive controller can track aggressive flight trajectories under disturbance. To achieve a high feedback rate in a compute-constrained real-time system, we design imitation learning methods to train a two-layer, fully connected neural network, which resembles insect flight control architecture consisting of central nervous system and motor neurons. Our robot demonstrates insect-like saccade movements with lateral speed and acceleration of 197 centimeters per second and 11.7 meters per second square, representing 447$\%$ and 255$\%$ improvement over prior results. The robot can also perform saccade maneuvers under 160 centimeters per second wind disturbance and large command-to-force mapping errors. Furthermore, it performs 10 consecutive body flips in 11 seconds - the most challenging maneuver among sub-gram flyers. These results represent a milestone in achieving insect-scale flight agility and inspire future investigations on sensing and compute autonomy.
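
摘要提到用模仿学习训练一个两层全连接网络来逼近鲁棒管MPC控制器,以满足实时反馈率要求。下面给出基于该描述的最小行为克隆示意代码(假设性草图:网络宽度、输入输出维度与训练细节均为示例假设,并非论文实现)。

import torch
import torch.nn as nn

# 假设:状态维度 12(位姿+速度),控制维度 4;专家数据由 MPC 离线生成
class TwoLayerPolicy(nn.Module):
    def __init__(self, state_dim=12, hidden_dim=64, action_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),  # 第一层:类比"中枢神经系统"
            nn.Tanh(),
            nn.Linear(hidden_dim, action_dim), # 第二层:类比"运动神经元"
        )

    def forward(self, state):
        return self.net(state)

def behavior_cloning(policy, expert_states, expert_actions, epochs=100, lr=1e-3):
    """用均方误差模仿专家(MPC)动作的简单行为克隆循环。"""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        pred = policy(expert_states)
        loss = torch.mean((pred - expert_actions) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

# 用随机数据演示接口(实际应替换为 MPC 生成的专家轨迹)
states = torch.randn(1024, 12)
actions = torch.randn(1024, 4)
policy = behavior_cloning(TwoLayerPolicy(), states, actions)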


【4】DMSC: Dynamic Multi-Scale Coordination Framework for Time Series Forecasting
标题:DMSC:时间序列预测的动态多尺度协调框架
链接:https://arxiv.org/abs/2508.02753

作者:ng, Jianchao Tang, Zhuo Li, Long Lan
摘要:时间序列预测(TSF)在跨不同尺度建模复杂时间依赖性方面面临持续挑战。尽管最近的进展利用了不同的分解操作以及基于CNN、MLP或Transformer的新颖架构,但现有方法仍然受制于静态分解策略、碎片化的依赖建模和不灵活的融合机制,限制了其建模复杂时间依赖的能力。为了分别明确解决上述三个问题,我们提出了一种新的动态多尺度协调框架(DMSC),包含多尺度补丁分解块(EMPD)、三元交互块(TIB)和自适应尺度路由MoE块(ASR-MoE)。具体而言,EMPD被设计为内置组件,以指数缩放的粒度将序列动态切分为层次化补丁,并通过输入自适应的补丁调整消除预定义的尺度限制。随后,TIB在每一层的分解表示中联合建模补丁内、补丁间和跨变量依赖关系。EMPD和TIB被联合集成到各层中,构成多层渐进级联架构,其中较早层的粗粒度表示通过门控路径自适应地引导后续层的细粒度特征提取。ASR-MoE则利用具有时间感知权重的专门化全局与局部专家,动态融合多尺度预测。在13个真实基准上的综合实验表明,DMSC在TSF任务上始终保持最先进(SOTA)的性能和卓越的计算效率。代码可从https://github.com/1327679995/DMSC获得。
摘要 :Time Series Forecasting (TSF) faces persistent challenges in modeling intricate temporal dependencies across different scales. Despite recent advances leveraging different decomposition operations and novel architectures based on CNN, MLP or Transformer, existing methods still struggle with static decomposition strategies, fragmented dependency modeling, and inflexible fusion mechanisms, limiting their ability to model intricate temporal dependencies. To explicitly solve the mentioned three problems respectively, we propose a novel Dynamic Multi-Scale Coordination Framework (DMSC) with Multi-Scale Patch Decomposition block (EMPD), Triad Interaction Block (TIB) and Adaptive Scale Routing MoE block (ASR-MoE). Specifically, EMPD is designed as a built-in component to dynamically segment sequences into hierarchical patches with exponentially scaled granularities, eliminating predefined scale constraints through input-adaptive patch adjustment. TIB then jointly models intra-patch, inter-patch, and cross-variable dependencies within each layer's decomposed representations. EMPD and TIB are jointly integrated into layers forming a multi-layer progressive cascade architecture, where coarse-grained representations from earlier layers adaptively guide fine-grained feature extraction in subsequent layers via gated pathways. And ASR-MoE dynamically fuses multi-scale predictions by leveraging specialized global and local experts with temporal-aware weighting. Comprehensive experiments on thirteen real-world benchmarks demonstrate that DMSC consistently maintains state-of-the-art (SOTA) performance and superior computational efficiency for TSF tasks. Code is available at https://github.com/1327679995/DMSC.
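
摘要中EMPD的核心是"以指数缩放的粒度将序列切分为层次化补丁"。下面给出按该描述编写的最小示意函数(假设性草图,补丁大小的具体取法与论文实现未必一致)。

import numpy as np

def exponential_patch_decompose(x, num_levels=3, base_patch=4):
    """将长度为 T 的单变量序列切分为多层补丁:
    第 l 层的补丁长度为 base_patch * 2**l(指数缩放粒度)。
    返回一个列表,每个元素形状为 (num_patches, patch_len)。"""
    T = len(x)
    levels = []
    for l in range(num_levels):
        patch_len = base_patch * (2 ** l)
        num_patches = T // patch_len          # 丢弃无法整除的尾部,仅作示意
        trimmed = x[: num_patches * patch_len]
        levels.append(trimmed.reshape(num_patches, patch_len))
    return levels

series = np.sin(np.linspace(0, 10 * np.pi, 128))
for l, patches in enumerate(exponential_patch_decompose(series)):
    print(f"level {l}: patches shape = {patches.shape}")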


【5】Low-Communication Resilient Distributed Estimation Algorithm Based on Memory Mechanism
标题:基于存储机制的低通信弹性分布式估计算法
链接:https://arxiv.org/abs/2508.02705

作者:imei Hu, Feng Chen, Ye Yao
摘要:在多任务对抗网络中,受攻击的节点或链路会妨碍分布式算法对未知参数的精确估计。为了应对这一挑战,本文提出了一种低通信开销的弹性分布式估计算法。首先,引入了一种基于信誉的节点选择策略,允许节点只与更可靠的邻居子集进行通信。随后,为了辨别可信的中间估计,采用加权支持向量数据描述(W-SVDD)模型对记忆数据进行训练。该训练后的模型有助于增强分布式估计过程对受攻击节点或链路影响的弹性。此外,引入了事件触发机制以尽量减少对W-SVDD模型的无效更新,并基于假设推导出合适的阈值。文中还分析了算法的收敛性。仿真结果表明,与其他算法相比,该算法以更少的通信开销获得了更优的性能。
摘要:In multi-task adversarial networks, the accurate estimation of unknown parameters in a distributed algorithm is hindered by attacked nodes or links. To tackle this challenge, this brief proposes a low-communication resilient distributed estimation algorithm. First, a node selection strategy based on reputation is introduced that allows nodes to communicate with more reliable subset of neighbors. Subsequently, to discern trustworthy intermediate estimates, the Weighted Support Vector Data Description (W-SVDD) model is employed to train the memory data. This trained model contributes to reinforce the resilience of the distributed estimation process against the impact of attacked nodes or links. Additionally, an event-triggered mechanism is introduced to minimize ineffective updates to the W-SVDD model, and a suitable threshold is derived based on assumptions. The convergence of the algorithm is analyzed. Finally, simulation results demonstrate that the proposed algorithm achieves superior performance with less communication cost compared to other algorithms.


【6】A Wireless Foundation Model for Multi-Task Prediction
标题:多任务预测的无线基础模型
链接:https://arxiv.org/abs/2507.05938

作者:heng, Jiacheng Wang, Xingyu Zhou, Le Liang, Hao Ye, Shi Jin, Geoffrey Ye Li
摘要:随着移动通信网络的复杂性和动态性不断增长,准确预测信道状态信息(CSI)、用户位置和网络流量等关键系统参数,对于广泛的物理(PHY)层和介质访问控制(MAC)层任务已变得至关重要。尽管传统的基于深度学习(DL)的方法已被广泛应用于此类预测任务,但它们通常难以在不同场景和任务之间泛化。为此,我们提出了一个支持不同预测区间的无线网络多任务预测统一基础模型。所提出的模型通过单变量分解统一异构任务,对粒度进行编码以实现区间感知,并使用因果Transformer骨干进行准确预测。此外,我们在训练过程中引入了补丁掩蔽策略,以支持任意输入长度。在大规模数据集上训练后,所提出的基础模型对未见场景表现出很强的泛化能力,并在新任务上实现了超越传统全样本(full-shot)基线的zero-shot性能。
摘要:With the growing complexity and dynamics of the mobile communication networks, accurately predicting key system parameters, such as channel state information (CSI), user location, and network traffic, has become essential for a wide range of physical (PHY)-layer and medium access control (MAC)-layer tasks. Although traditional deep learning (DL)-based methods have been widely applied to such prediction tasks, they often struggle to generalize across different scenarios and tasks. In response, we propose a unified foundation model for multi-task prediction in wireless networks that supports diverse prediction intervals. The proposed model enforces univariate decomposition to unify heterogeneous tasks, encodes granularity for interval awareness, and uses a causal Transformer backbone for accurate predictions. Additionally, we introduce a patch masking strategy during training to support arbitrary input lengths. After trained on large-scale datasets, the proposed foundation model demonstrates strong generalization to unseen scenarios and achieves zero-shot performance on new tasks that surpass traditional full-shot baselines.


【7】Benchmarking Classical and Quantum Models for DeFi Yield Prediction on Curve Finance
标题:Curve Finance上DeFi收益率预测的经典与量子模型基准测试
链接:https://arxiv.org/abs/2508.02685

作者: Chen, Aidan Hung-Wen Tsai
摘要:去中心化金融(DeFi)的兴起,对准确的收益率与业绩预测产生了越来越大的需求,以指导流动性分配策略。在这项研究中,我们在来自28个Curve Finance池的一年历史数据上,对XGBoost、Random Forest、LSTM、Transformer、量子神经网络(QNN)以及具有量子特征映射的量子支持向量机(QSVM-QNN)这六种模型进行了基准测试。我们以测试MAE、RMSE和方向准确率来评估模型性能。我们的研究结果表明,经典的集成模型,特别是XGBoost和随机森林,始终优于深度学习和量子模型。XGBoost达到了最高的方向准确率(71.57%),测试MAE为1.80,而随机森林取得了最低的测试MAE 1.77和71.36%的准确率。相比之下,量子模型表现不佳,方向准确率低于50%,误差更高,凸显了当前将量子机器学习应用于真实世界DeFi时间序列数据的局限性。这项工作为DeFi应用的模型适用性提供了可复现的基准和实用的见解,强调了在该领域经典方法相对于新兴量子方法的鲁棒性。
摘要:The rise of decentralized finance (DeFi) has created a growing demand for accurate yield and performance forecasting to guide liquidity allocation strategies. In this study, we benchmark six models, XGBoost, Random Forest, LSTM, Transformer, quantum neural networks (QNN), and quantum support vector machines with quantum feature maps (QSVM-QNN), on one year of historical data from 28 Curve Finance pools. We evaluate model performance on test MAE, RMSE, and directional accuracy. Our results show that classical ensemble models, particularly XGBoost and Random Forest, consistently outperform both deep learning and quantum models. XGBoost achieves the highest directional accuracy (71.57%) with a test MAE of 1.80, while Random Forest attains the lowest test MAE of 1.77 and 71.36% accuracy. In contrast, quantum models underperform with directional accuracy below 50% and higher errors, highlighting current limitations in applying quantum machine learning to real-world DeFi time series data. This work offers a reproducible benchmark and practical insights into model suitability for DeFi applications, emphasizing the robustness of classical methods over emerging quantum approaches in this domain.
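
摘要中的评估指标为测试MAE、RMSE和方向准确率。下面给出这三个指标的最小计算示意(通用写法,非论文代码;方向准确率此处按"预测变化方向与真实变化方向一致的比例"这一常见定义实现,属于假设)。

import numpy as np

def evaluate_forecast(y_true, y_pred):
    """返回 MAE、RMSE 和方向准确率(相邻时间步变化方向一致的比例)。"""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    true_dir = np.sign(np.diff(y_true))
    pred_dir = np.sign(np.diff(y_pred))
    directional_acc = np.mean(true_dir == pred_dir)
    return {"MAE": mae, "RMSE": rmse, "DirectionalAcc": directional_acc}

# 示例:随机游走数据上的用法
rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(100))
print(evaluate_forecast(y, y + rng.normal(0, 0.5, size=100)))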


其他神经网络|深度学习|模型|建模(28篇)

【1】Streaming Generated Gaussian Process Experts for Online Learning and Control
标题:流生成的高斯过程专家用于在线学习和控制
链接:https://arxiv.org/abs/2508.03679

作者:g, Dongfa Zhang, Xiaobing Dai, Fengyi Yu, Chi Zhang, Bingkun Huang, Hamid Sadeghian, Sami Haddadin
摘要:高斯过程(GP)作为一种非参数学习方法,为函数逼近提供了灵活的建模能力和校准的不确定性量化。此外,GP能以多项式时间计算高效地纳入新数据以支持在线学习,非常适合需要快速适应的安全关键动态系统。然而,在处理流数据时,精确GP的推理和在线更新会带来立方级的计算时间和平方级的存储复杂度,限制了其在实时场景下对大规模数据集的可扩展性。在本文中,我们提出了一个流式核诱导、渐进式生成专家的高斯过程框架(SkyGP),它通过维护一个有界的专家集合来应对计算和内存约束,同时继承精确高斯过程的学习性能保证。此外,还引入了两种SkyGP变体,各自针对特定目标:最大化预测精度(SkyGP-Dense)或提升计算效率(SkyGP-Fast)。SkyGP的有效性通过广泛的基准测试和实时控制实验得到验证,表明其性能优于最先进的方法。
摘要:Gaussian Processes (GPs), as a nonparametric learning method, offer flexible modeling capabilities and calibrated uncertainty quantification for function approximations. Additionally, GPs support online learning by efficiently incorporating new data with polynomial-time computation, making them well-suited for safety-critical dynamical systems that require rapid adaptation. However, the inference and online updates of exact GPs, when processing streaming data, incur cubic computation time and quadratic storage memory complexity, limiting their scalability to large datasets in real-time settings. In this paper, we propose a \underline{s}treaming \underline{k}ernel-induced progressivel\underline{y} generated expert framework of \underline{G}aussian \underline{p}rocesses (SkyGP) that addresses both computational and memory constraints by maintaining a bounded set of experts, while inheriting the learning performance guarantees from exact Gaussian processes. Furthermore, two SkyGP variants are introduced, each tailored to a specific objective, either maximizing prediction accuracy (SkyGP-Dense) or improving computational efficiency (SkyGP-Fast). The effectiveness of SkyGP is validated through extensive benchmarks and real-time control experiments demonstrating its superior performance compared to state-of-the-art approaches.


【2】MaLV-OS: Rethinking the Operating System Architecture for Machine Learning in Virtualized Clouds
标题:MaLV-OS:重新思考虚拟化云中机器学习的操作系统架构
链接:https://arxiv.org/abs/2508.03676

作者:tchebe, Oana Balmau
摘要:大量的研究已经使用机器学习(ML)模型来开发学习的操作系统(OS)和内核。后者动态适应作业负载,动态调整资源(CPU、IO、内存、网络带宽)分配,以响应实际用户需求。这项工作的共同点是它利用ML来改进内核决策。到今天为止,据我们所知,还没有任何工作采取相反的方向,即,使用OS来改进ML。虽然一些工作提出将系统级优化应用于ML算法,但它们并没有定制操作系统以适应ML上下文。为了解决这一限制,我们在本文中采用正交方法,利用操作系统来增强ML模型和算法的性能。我们探索了一条通往ML专用操作系统MaLV-OS的道路。MaLV-OS重新考虑了操作系统架构,使其专门针对ML工作负载,特别是在虚拟化云中,现在广泛用于运行ML应用程序。MaLV-OS设想的架构包括(1)一个微内核,Micro-LAKE,它允许内核空间应用程序使用GPU,以及(2)一个MLaaS(ML即服务)子系统,它收集ML模型以帮助Micro-LAKE进行内存管理和CPU调度。MaLV-OS架构还将模型的系统敏感部分卸载到操作系统,以减轻模型的复杂性和编程,并加快其执行速度。最后,MaLV-OS集成了一个开源GPU虚拟化软件,直接合并到hypervisor中。为了获得更大的灵活性,MaLV-OS的愿景是使虚拟机能够动态选择MLaaS策略,以提高用户正在运行的模型的性能。由于MLaaS被设计为Linux内核模块,因此MaLV-OS架构支持向MLaaS子系统动态添加新功能。
摘要:A large body of research has employed Machine Learning (ML) models to develop learned operating systems (OSes) and kernels. The latter dynamically adapts to the job load and dynamically adjusts resources (CPU, IO, memory, network bandwidth) allocation to respond to the actual user demand. What this work has in common is that it utilizes ML to improve kernel decisions. To this day, and to the best of our knowledge, no work has taken the opposite direction, i.e., using OS to improve ML. While some work proposes applying system-level optimizations to ML algorithms, they do not tailor the OS to adapt to the ML context. To address this limitation, we take an orthogonal approach in this paper by leveraging the OS to enhance the performance of ML models and algorithms. We explore the path towards an ML-specialized OS, MaLV-OS. MaLV-OS rethinks the OS architecture to make it specifically tailored to ML workloads, especially in virtualized clouds, which are now widely used to run ML applications. MaLV-OS envisioned architecture includes (1) a micro-kernel, Micro-LAKE, which allows kernel space applications to use the GPU, and (2) an MLaaS (ML as a Service) subsystem that gathers ML models to help Micro-LAKE with memory management and CPU scheduling. MaLV-OS architecture also offloads system-sensitive parts of the models to the OS, to lighten the model complexity and programming, and speed up its execution. Finally, MaLV-OS integrates an open-source GPU virtualization software, merged directly into the hypervisor. For more flexibility, MaLV-OS vision is to enable the virtual machine to dynamically select MLaaS policies that can improve the performance of the model the user is running. Because MLaaS is designed as loadable kernel modules, the MaLV-OS architecture enables the dynamic addition of new capabilities to the MLaaS subsystem.


【3】Minimal Convolutional RNNs Accelerate Spatiotemporal Learning
标题:最小卷积RNN加速时空学习
链接:https://arxiv.org/abs/2508.03614

作者: Horuz, Sebastian Otte, Martin V. Butz, Matthias Karlbauer
备注:Accepted at ICANN 2025
摘要:我们介绍了MinConvLSTM和MinConvGRU,这两个新的时空模型将卷积递归网络的空间归纳偏差与最小的可并行RNN的训练效率相结合。我们的方法将MinLSTM和MinGRU的对数域前缀和公式扩展到卷积架构,实现完全并行的训练,同时保留本地化的空间建模。这消除了在教师强制过程中顺序隐藏状态更新的需要-这是传统ConvRNN模型的主要瓶颈。此外,我们将受xLSTM架构启发的指数门控机制纳入MinConvLSTM,这进一步简化了对数域计算。我们的模型在结构上是最小的,计算效率高,减少了参数数量,提高了可扩展性。我们评估我们的模型在两个时空预测任务:Navier-Stokes动力学和现实世界的地球位势数据。在训练速度方面,我们的架构明显优于标准ConvLSTM和ConvGRU。此外,即使在闭环自回归模式下,我们的模型在这两个域中也实现了较低的预测误差。这些发现表明,最小递归结构与卷积输入聚合相结合时,为时空序列建模提供了一种引人注目的有效替代方案,弥合了递归简单性和空间复杂性之间的差距。
摘要:We introduce MinConvLSTM and MinConvGRU, two novel spatiotemporal models that combine the spatial inductive biases of convolutional recurrent networks with the training efficiency of minimal, parallelizable RNNs. Our approach extends the log-domain prefix-sum formulation of MinLSTM and MinGRU to convolutional architectures, enabling fully parallel training while retaining localized spatial modeling. This eliminates the need for sequential hidden state updates during teacher forcing - a major bottleneck in conventional ConvRNN models. In addition, we incorporate an exponential gating mechanism inspired by the xLSTM architecture into the MinConvLSTM, which further simplifies the log-domain computation. Our models are structurally minimal and computationally efficient, with reduced parameter count and improved scalability. We evaluate our models on two spatiotemporal forecasting tasks: Navier-Stokes dynamics and real-world geopotential data. In terms of training speed, our architectures significantly outperform standard ConvLSTMs and ConvGRUs. Moreover, our models also achieve lower prediction errors in both domains, even in closed-loop autoregressive mode. These findings demonstrate that minimal recurrent structures, when combined with convolutional input aggregation, offer a compelling and efficient alternative for spatiotemporal sequence modeling, bridging the gap between recurrent simplicity and spatial complexity.
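
摘要称MinConvGRU把最小化RNN(门只依赖当前输入、从而可并行训练)与卷积输入聚合结合起来。下面是按这一思路写的一个顺序参考实现示意(假设性草图:门控形式沿用公开的MinGRU公式 h_t = (1-z_t)*h_{t-1} + z_t*h_t_tilde,卷积核大小等均为示例;论文的对数域前缀和并行化与指数门控未在此展示)。

import torch
import torch.nn as nn

class MinConvGRUCellSketch(nn.Module):
    """MinGRU 风格的卷积单元:门 z_t 与候选态只依赖输入 x_t,
    因此跨时间步的递推是逐元素线性的,可用前缀和并行化(此处按顺序实现作参考)。"""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gate = nn.Conv2d(in_ch, hid_ch, k, padding=k // 2)
        self.cand = nn.Conv2d(in_ch, hid_ch, k, padding=k // 2)

    def forward(self, x_seq):
        # x_seq: (T, B, C, H, W)
        T, B, _, H, W = x_seq.shape
        h = torch.zeros(B, self.gate.out_channels, H, W, device=x_seq.device)
        outs = []
        for t in range(T):
            z = torch.sigmoid(self.gate(x_seq[t]))   # 门仅来自输入
            h_tilde = self.cand(x_seq[t])            # 候选态仅来自输入
            h = (1 - z) * h + z * h_tilde            # 逐元素线性递推
            outs.append(h)
        return torch.stack(outs)

x = torch.randn(8, 2, 1, 16, 16)                     # T=8, B=2, 单通道 16x16
print(MinConvGRUCellSketch(1, 4)(x).shape)           # (8, 2, 4, 16, 16)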


【4】DyCAF-Net: Dynamic Class-Aware Fusion Network
标题:DyCAF-Net:动态类感知融合网络
链接:https://arxiv.org/abs/2508.03598

作者:Jahin, Shahriar Soudeep, M. F. Mridha, Nafiz Fahad, Md. Jakir Hossen
备注:Accepted to IEEE DSAA 2025 (10 pages, 5 figures)
摘要:目标检测的最新进展依赖于具有多尺度融合和注意力机制的模块化架构。然而,静态融合启发式和与类别无关的注意力限制了其在存在遮挡、杂乱和类别不平衡的动态场景中的性能。我们提出动态类感知融合网络(DyCAF-Net),通过三项创新来应对这些挑战:(1)基于输入条件均衡的颈部结构,通过隐式不动点建模迭代细化多尺度特征;(2)双重动态注意力机制,利用依赖输入和类别的线索自适应地重新校准通道和空间响应;(3)类感知特征自适应,对特征进行调制以优先关注稀有类别的判别性区域。通过使用YOLOv8及相关架构进行全面的消融研究,并与9个最先进的基线进行对比,DyCAF-Net在13个不同的基准(包括遮挡严重和长尾数据集)上在精度、mAP@50和mAP@50-95方面取得了显著提升。该框架保持了计算效率(约1110万参数)和有竞争力的推理速度,其对尺度变化、语义重叠和类别不平衡的适应性使其成为医学成像、监控和自主系统中现实世界检测任务的鲁棒解决方案。
摘要:Recent advancements in object detection rely on modular architectures with multi-scale fusion and attention mechanisms. However, static fusion heuristics and class-agnostic attention limit performance in dynamic scenes with occlusions, clutter, and class imbalance. We introduce Dynamic Class-Aware Fusion Network (DyCAF-Net) that addresses these challenges through three innovations: (1) an input-conditioned equilibrium-based neck that iteratively refines multi-scale features via implicit fixed-point modeling, (2) a dual dynamic attention mechanism that adaptively recalibrates channel and spatial responses using input- and class-dependent cues, and (3) class-aware feature adaptation that modulates features to prioritize discriminative regions for rare classes. Through comprehensive ablation studies with YOLOv8 and related architectures, alongside benchmarking against nine state-of-the-art baselines, DyCAF-Net achieves significant improvements in precision, mAP@50, and mAP@50-95 across 13 diverse benchmarks, including occlusion-heavy and long-tailed datasets. The framework maintains computational efficiency ($\sim$11.1M parameters) and competitive inference speeds, while its adaptability to scale variance, semantic overlaps, and class imbalance positions it as a robust solution for real-world detection tasks in medical imaging, surveillance, and autonomous systems.


【5】Semantic Mosaicing of Histo-Pathology Image Fragments using Visual Foundation Models
标题:使用视觉基础模型对组织病理学图像片段进行语义马赛克
链接:https://arxiv.org/abs/2508.03524

作者:andstätter, Maximilian Köller, Philipp Seeböck, Alissa Blessing, Felicitas Oberndorfer, Svitlana Pochepnia, Helmut Prosch, Georg Langs
摘要:在组织病理学中,组织样本通常比标准显微镜载玻片更大,因此需要拼接多个片段才能处理诸如肿瘤之类的完整结构。自动拼接是规模化分析的先决条件,但由于制备过程中可能的组织损失、不均匀的形态变形、染色不一致、载玻片上错位导致的区域缺失或破损的组织边缘,这一任务颇具挑战性。这些因素限制了依赖边界形状匹配算法来重建人工全贴装载玻片(WMS)的最先进拼接方法。在此,我们提出SemanticStitcher,利用来自视觉组织病理学基础模型的潜在特征表示来识别不同片段中的相邻区域。基于大量语义匹配候选的鲁棒位姿估计得到多个片段的拼图,从而构成WMS。在三个不同组织病理学数据集上的实验表明,SemanticStitcher能够产生鲁棒的WMS拼接结果,并在正确边界匹配方面始终优于最先进的方法。
摘要:In histopathology, tissue samples are often larger than a standard microscope slide, making stitching of multiple fragments necessary to process entire structures such as tumors. Automated stitching is a prerequisite for scaling analysis, but is challenging due to possible tissue loss during preparation, inhomogeneous morphological distortion, staining inconsistencies, missing regions due to misalignment on the slide, or frayed tissue edges. This limits state-of-the-art stitching methods using boundary shape matching algorithms to reconstruct artificial whole mount slides (WMS). Here, we introduce SemanticStitcher using latent feature representations derived from a visual histopathology foundation model to identify neighboring areas in different fragments. Robust pose estimation based on a large number of semantic matching candidates derives a mosaic of multiple fragments to form the WMS. Experiments on three different histopathology datasets demonstrate that SemanticStitcher yields robust WMS mosaicing and consistently outperforms the state of the art in correct boundary matches.


【6】SCFlow: Implicitly Learning Style and Content Disentanglement with Flow Models
标题:SCFlow:基于流模型隐式学习风格与内容解耦
链接:https://arxiv.org/abs/2508.03402

作者: Ma, Xiaopei Yang, Yusong Li, Ming Gui, Felix Krause, Johannes Schusterbauer, Björn Ommer
备注:ICCV 2025, Project Page: this https URL
摘要:由于视觉模型中风格与内容的语义重叠以及人类感知的主观性,显式地解耦二者仍然具有挑战性。现有方法提出通过生成式或判别式目标进行分离,但在解耦相互交织的概念时仍面临固有的模糊性。与之相反,我们提出这样一个问题:能否通过学习将风格与内容可逆地融合,从而绕过显式分离,让解耦自然涌现?我们提出SCFlow,一个学习纠缠表示与解耦表示之间双向映射的流匹配框架。我们的方法建立在三个关键洞见之上:1)只训练风格与内容的融合这一定义明确的任务,即可在没有显式监督的情况下实现可逆的解耦;2)流匹配可在任意分布之间建立桥接,避免了扩散模型和规范化流所依赖的限制性高斯先验;3)构建了一个包含510,000个样本(51种风格 $\times$ 10,000个内容样本)的合成数据集,通过系统化的风格-内容配对来模拟解耦。除了可控生成任务之外,我们还证明SCFlow在zero-shot设置下可以泛化到ImageNet-1k和WikiArt并取得有竞争力的性能,突显了解耦可以从可逆融合过程中自然涌现。
摘要:Explicitly disentangling style and content in vision models remains challenging due to their semantic overlap and the subjectivity of human perception. Existing methods propose separation through generative or discriminative objectives, but they still face the inherent ambiguity of disentangling intertwined concepts. Instead, we ask: Can we bypass explicit disentanglement by learning to merge style and content invertibly, allowing separation to emerge naturally? We propose SCFlow, a flow-matching framework that learns bidirectional mappings between entangled and disentangled representations. Our approach is built upon three key insights: 1) Training solely to merge style and content, a well-defined task, enables invertible disentanglement without explicit supervision; 2) flow matching bridges on arbitrary distributions, avoiding the restrictive Gaussian priors of diffusion models and normalizing flows; and 3) a synthetic dataset of 510,000 samples (51 styles $\times$ 10,000 content samples) was curated to simulate disentanglement through systematic style-content pairing. Beyond controllable generation tasks, we demonstrate that SCFlow generalizes to ImageNet-1k and WikiArt in zero-shot settings and achieves competitive performance, highlighting that disentanglement naturally emerges from the invertible merging process.
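
摘要中的核心训练机制是流匹配。下面给出标准条件流匹配损失的最小训练示意(通用写法,非SCFlow官方实现;其中把"纠缠表示"与"解耦表示"分别视为源分布 x0 与目标分布 x1 属于示意性假设)。

import torch
import torch.nn as nn

# 速度场网络 v_theta(x_t, t):这里用一个小 MLP 作示意
class VelocityField(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(),
                                 nn.Linear(256, dim))

    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t], dim=-1))

def flow_matching_loss(model, x0, x1):
    """条件流匹配:沿线性插值路径 x_t=(1-t)x0+t x1,回归目标速度 x1-x0。"""
    t = torch.rand(x0.size(0), 1)
    x_t = (1 - t) * x0 + t * x1
    target_v = x1 - x0
    return ((model(x_t, t) - target_v) ** 2).mean()

model = VelocityField()
x0 = torch.randn(32, 64)   # 源分布样本(示意:纠缠表示)
x1 = torch.randn(32, 64)   # 目标分布样本(示意:解耦表示)
loss = flow_matching_loss(model, x0, x1)
loss.backward()
print(float(loss))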


【7】A neural network machine-learning approach for characterising hydrogen trapping parameters from TDS experiments
标题:用于从TDS实验中描述氢捕获参数的神经网络机器学习方法
链接:https://arxiv.org/abs/2508.03371

作者:i, T. Hageman, E. Martínez-Pañeda
摘要:金属合金的氢捕获行为通常使用热脱附谱(TDS)来表征。然而,作为一种间接方法,提取关键参数(陷阱结合能和陷阱密度)仍然是一个重大挑战。为了解决这些限制,本工作提出了一种基于机器学习的方案,用于从TDS谱中识别参数。我们开发了一个多神经网络(NN)模型,并完全在合成数据上进行训练,以直接从实验数据预测捕获参数。该模型由两个采用反向传播训练的多层全连接前馈神经网络组成。第一个网络(分类模型)预测不同陷阱类型的数量;第二个网络(回归模型)随后预测相应的陷阱密度和结合能。我们对NN架构、超参数和数据预处理进行了优化,以最大限度地减少所需训练数据量。在应用于三种不同成分的回火马氏体钢时,该模型表现出较强的预测能力。相关代码免费提供。
摘要:The hydrogen trapping behaviour of metallic alloys is generally characterised using Thermal Desorption Spectroscopy (TDS). However, as an indirect method, extracting key parameters (trap binding energies and densities) remains a significant challenge. To address these limitations, this work introduces a machine learning-based scheme for parameter identification from TDS spectra. A multi-Neural Network (NN) model is developed and trained exclusively on synthetic data to predict trapping parameters directly from experimental data. The model comprises two multi-layer, fully connected, feed-forward NNs trained with backpropagation. The first network (classification model) predicts the number of distinct trap types. The second network (regression model) then predicts the corresponding trap densities and binding energies. The NN architectures, hyperparameters, and data pre-processing were optimised to minimise the amount of training data. The proposed model demonstrated strong predictive capabilities when applied to three tempered martensitic steels of different compositions. The code developed is freely provided.
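
摘要描述了"先分类陷阱类型数量、再回归陷阱密度与结合能"的两阶段多神经网络流程。下面给出该流程的最小结构示意(假设性草图:输入为离散化的TDS谱向量,最大陷阱类型数、网络宽度等均为示例设定,并非论文配置)。

import torch
import torch.nn as nn

SPECTRUM_DIM, MAX_TRAPS = 200, 3   # 假设:谱离散为200个点,最多3类陷阱

class TrapCountClassifier(nn.Module):
    """阶段一:从TDS谱预测陷阱类型数量(1..MAX_TRAPS)。"""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(SPECTRUM_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, MAX_TRAPS))

    def forward(self, spectrum):
        return self.net(spectrum)            # 类别 logits

class TrapParamRegressor(nn.Module):
    """阶段二:预测每类陷阱的 (结合能, 密度),共 2*MAX_TRAPS 个输出。"""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(SPECTRUM_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * MAX_TRAPS))

    def forward(self, spectrum):
        return self.net(spectrum).view(-1, MAX_TRAPS, 2)

spectrum = torch.randn(4, SPECTRUM_DIM)      # 4条合成谱
n_traps = TrapCountClassifier()(spectrum).argmax(dim=-1) + 1
params = TrapParamRegressor()(spectrum)      # (4, MAX_TRAPS, 2)
print(n_traps.shape, params.shape)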


【8】FedPromo: Federated Lightweight Proxy Models at the Edge Bring New Domains to Foundation Models
标题:FedPromo:边缘的联合轻量级代理模型为基础模型带来新领域
链接:https://arxiv.org/abs/2508.03356

作者:ligiuri, Francesco Barbato, Donald Shenaj, Umberto Michieli, Pietro Zanuttigh
备注:7 pages (main document) + 12 pages (appendix), 3 figures (main) + 12 figures (appendix), 5 tables (main) + 6 tables (appendix), submitted to AAAI 2026
摘要:联邦学习(FL)是一种在去中心化数据上训练深度学习模型的既定范式。然而,随着模型规模的增长,传统的FL方法通常需要客户端设备上的大量计算资源,这可能并不可行。我们提出FedPromo,一个新颖的框架,使存储在中央服务器上的大规模基础模型能够高效适配仅由远程客户端接触到的新领域。FedPromo不在客户端设备上直接训练大模型,而是通过FL优化轻量级代理模型,在保护隐私的同时显著降低计算开销。我们的方法遵循两阶段流程:首先,服务器端的知识蒸馏将大规模基础模型(例如Transformer)的表示与紧凑模型(例如CNN)的表示对齐;然后,将紧凑模型的编码器部署到客户端设备,在本地学习可训练的分类器。这些分类器随后被聚合并无缝传回基础模型,从而在无需直接访问用户数据的情况下实现个性化适配。借助新颖的正则化策略,我们的框架实现了去中心化的多域学习,兼顾了性能、隐私和资源效率。在五个图像分类基准上的大量实验表明,在客户端资源受限的假设下,FedPromo优于现有方法。
摘要:Federated Learning (FL) is an established paradigm for training deep learning models on decentralized data. However, as the size of the models grows, conventional FL approaches often require significant computational resources on client devices, which may not be feasible. We introduce FedPromo, a novel framework that enables efficient adaptation of large-scale foundation models stored on a central server to new domains encountered only by remote clients. Instead of directly training the large model on client devices, FedPromo optimizes lightweight proxy models via FL, significantly reducing computational overhead while maintaining privacy. Our method follows a two-stage process: first, server-side knowledge distillation aligns the representations of a large-scale foundation model (e.g., a transformer) with those of a compact counterpart (e.g., a CNN). Then, the compact model encoder is deployed to client devices, where trainable classifiers are learned locally. These classifiers are subsequently aggregated and seamlessly transferred back to the foundation model, facilitating personalized adaptation without requiring direct access to user data. Through novel regularization strategies, our framework enables decentralized multi-domain learning, balancing performance, privacy, and resource efficiency. Extensive experiments on five image classification benchmarks demonstrate that FedPromo outperforms existing methods while assuming limited-resource clients.


【9】Bridging ocean wave physics and deep learning: Physics-informed neural operators for nonlinear wavefield reconstruction in real-time
标题:连接海浪物理与深度学习:用于非线性波场实时重建的物理信息神经算子
链接:https://arxiv.org/abs/2508.03315

作者:lers, Merten Stender, Norbert Hoffmann
备注:13 pages, 7 figures
摘要:相位分辨海浪场的精确实时预测仍然是一个关键但在很大程度上尚未解决的问题,主要原因在于缺乏从稀疏或间接波浪测量中重建初始条件的实用数据同化方法。虽然监督式深度学习的最新进展已显示出这方面的潜力,但它们需要大量带标注的真实波浪数据集,这在现实场景中难以获得。为克服这一局限,我们提出了一个物理信息神经算子(PINO)框架,用于从稀疏测量中重建空间和时间上相位分辨的非线性海浪场,且训练过程无需真实标注数据。这是通过将海洋重力波自由表面边界条件的残差嵌入PINO的损失函数来实现的,从而以软约束的方式限定解空间。训练完成后,我们使用高度逼真的合成波浪数据验证了该方法,并展示了从浮标时间序列和雷达快照中对非线性波场的准确重建。我们的结果表明,PINO能够实现准确的实时重建,并在广泛的波浪条件下稳健泛化,从而为现实海洋环境中可运行的数据驱动波浪重建与预测铺平了道路。
摘要:Accurate real-time prediction of phase-resolved ocean wave fields remains a critical yet largely unsolved problem, primarily due to the absence of practical data assimilation methods for reconstructing initial conditions from sparse or indirect wave measurements. While recent advances in supervised deep learning have shown potential for this purpose, they require large labelled datasets of ground truth wave data, which are infeasible to obtain in real-world scenarios. To overcome this limitation, we propose a Physics-Informed Neural Operator (PINO) framework for reconstructing spatially and temporally phase-resolved, nonlinear ocean wave fields from sparse measurements, without the need for ground truth data during training. This is achieved by embedding residuals of the free surface boundary conditions of ocean gravity waves into the loss function of the PINO, constraining the solution space in a soft manner. After training, we validate our approach using highly realistic synthetic wave data and demonstrate the accurate reconstruction of nonlinear wave fields from both buoy time series and radar snapshots. Our results indicate that PINOs enable accurate, real-time reconstruction and generalize robustly across a wide range of wave conditions, thereby paving the way for operational, data-driven wave reconstruction and prediction in realistic marine environments.
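
摘要的关键做法是把自由表面边界条件的残差作为软约束加入损失。下面用一个一般化的物理信息损失形式作数学示意(记号为示意性假设,具体的边界条件算子与权重取法以论文为准):

$$
\mathcal{L}(\theta) \;=\; \underbrace{\frac{1}{N}\sum_{i=1}^{N}\big\|\,\mathcal{G}_{\theta}(m)(x_i,t_i)-y_i\,\big\|^2}_{\text{稀疏测量的数据项}}
\;+\; \lambda\, \underbrace{\frac{1}{M}\sum_{j=1}^{M}\big\|\,\mathcal{B}\!\left[\mathcal{G}_{\theta}(m)\right](x_j,t_j)\,\big\|^2}_{\text{自由表面边界条件残差}},
$$

其中 $\mathcal{G}_{\theta}$ 为神经算子,$m$ 为稀疏观测输入(如浮标时间序列或雷达快照),$\mathcal{B}[\cdot]$ 表示自由表面边界条件的残差算子,$\lambda$ 为权重。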


【10】Towards Interpretable Concept Learning over Time Series via Temporal Logic Semantics
标题:通过时态逻辑语义实现时间序列上的可解释概念学习
链接:https://arxiv.org/abs/2508.03269

作者:foglia, Simone Silvetti, Gaia Saveri, Laura Nenzi, Luca Bortolussi
摘要:时间序列分类是一项至关重要的任务,因为这类数据经常出现在安全关键型应用中。然而,它通常是用黑盒深度学习方法来处理的,这使得人类很难理解其输出背后的原理。为了应对这一挑战,我们提出了一个神经符号框架,通过将轨迹直接嵌入到信号时序逻辑(STL)概念的空间中来统一分类和解释。通过引入一种新颖的STL启发的内核,将原始时间序列映射到与预定义STL公式的对齐,我们的模型共同优化了准确性和可解释性,因为每个预测都伴随着最相关的逻辑概念,这使得分类基于人类可解释的时间模式,并产生局部和全局符号解释。早期的结果显示了有竞争力的性能,同时为模型决策提供了高质量的逻辑依据。
摘要:Time series classification is a task of paramount importance, as this kind of data often arises in safety-critical applications. However, it is typically tackled with black-box deep learning methods, making it hard for humans to understand the rationale behind their output. To take on this challenge, we propose a neuro-symbolic framework that unifies classification and explanation through direct embedding of trajectories into a space of Signal Temporal Logic (STL) concepts. By introducing a novel STL-inspired kernel that maps raw time series to their alignment with predefined STL formulae, our model jointly optimises for accuracy and interpretability, as each prediction is accompanied by the most relevant logical concepts that characterise it. This enables classification grounded in human-interpretable temporal patterns and produces both local and global symbolic explanations. Early results show competitive performance while offering high-quality logical justifications for model decisions.


【11】HALO: Hindsight-Augmented Learning for Online Auto-Bidding
标题:HALO:用于在线自动出价的后见之明增强学习
链接:https://arxiv.org/abs/2508.03267

作者:g, Chenglong Cao, Xinyu Zhou, Jirong You, Linhe Xu, Feifan Xu, Shuo Yuan
备注:13 pages, 5 figures
摘要:数字广告平台通过实时竞价(RTB)系统进行毫秒级拍卖,广告主以算法出价竞争广告展示机会。这种动态机制可以实现精准的受众定向,但广告主的异质性带来了巨大的运营复杂性:从个体商家到跨国品牌,预算和ROI目标在不同广告主之间跨越多个数量级。这种多样性为多约束出价(MCB)创造了一个严苛的适应环境。传统的自动出价方案在这种环境中会失效,原因在于两个关键缺陷:1)样本效率严重低下,在特定约束下失败的探索无法为新的预算-ROI组合产生可迁移的知识;2)在约束变化下泛化能力有限,因为它们忽略了约束与出价系数之间的物理关系。为此,我们提出HALO:用于在线自动出价的后见之明增强学习。HALO引入了一种有理论依据的后见之明机制,通过轨迹重定向将所有探索重新用作任意约束配置下的训练数据。此外,它采用B样条函数表示,实现跨约束空间的连续、导数感知的出价映射。即使预算/ROI要求与训练场景差异巨大,HALO也能确保稳健的适应性。工业数据集上的评估表明,HALO在处理多尺度约束、减少约束违反的同时提升GMV方面具有优越性。
摘要:Digital advertising platforms operate millisecond-level auctions through Real-Time Bidding (RTB) systems, where advertisers compete for ad impressions through algorithmic bids. This dynamic mechanism enables precise audience targeting but introduces profound operational complexity due to advertiser heterogeneity: budgets and ROI targets span orders of magnitude across advertisers, from individual merchants to multinational brands. This diversity creates a demanding adaptation landscape for Multi-Constraint Bidding (MCB). Traditional auto-bidding solutions fail in this environment due to two critical flaws: 1) severe sample inefficiency, where failed explorations under specific constraints yield no transferable knowledge for new budget-ROI combinations, and 2) limited generalization under constraint shifts, as they ignore physical relationships between constraints and bidding coefficients. To address this, we propose HALO: Hindsight-Augmented Learning for Online Auto-Bidding. HALO introduces a theoretically grounded hindsight mechanism that repurposes all explorations into training data for arbitrary constraint configuration via trajectory reorientation. Further, it employs B-spline functional representation, enabling continuous, derivative-aware bid mapping across constraint spaces. HALO ensures robust adaptation even when budget/ROI requirements differ drastically from training scenarios. Industrial dataset evaluations demonstrate the superiority of HALO in handling multi-scale constraints, reducing constraint violations while improving GMV.


【12】Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies
标题:扩展DRL以进行决策:数据、网络和训练预算策略调查
链接:https://arxiv.org/abs/2508.03194

作者:ngyao Tang, Chenjun Xiao, Yaodong Yang, Wei Wei, Jianye Hao, Jiye Liang
摘要:近年来,神经网络模型和训练数据规模的扩张推动了深度学习的显著进步,尤其是在计算机视觉和自然语言处理领域。这一进展的基础是缩放定律(Scaling Laws)的概念,它表明扩大模型参数和训练数据可以提升学习性能。尽管这些领域已经取得突破,例如GPT-4等大型语言模型和Midjourney等先进视觉模型的发展,但缩放定律在深度强化学习(DRL)中的应用仍相对缺乏探索。尽管缩放具有提升性能的潜力,但将其融入面向决策的DRL尚未得到充分实现。本综述通过从数据、网络和训练预算三个维度系统分析缩放策略来填补这一空白。在数据缩放方面,我们探讨了通过并行采样和数据生成来优化数据效率的方法,并考察数据量与学习效果之间的关系。在网络缩放方面,我们研究了架构增强手段,包括单体扩展、集成与MoE方法以及智能体数量缩放技术,它们共同增强了模型表达能力,同时带来独特的计算挑战。最后,在训练预算缩放方面,我们评估了分布式训练、高回放比、大批量以及辅助训练对训练效率和收敛性的影响。通过综合这些策略,本综述不仅突出了它们在推进面向决策的DRL中的协同作用,还为未来研究提供了路线图。我们强调在可扩展性与计算效率之间取得平衡的重要性,并概述了利用缩放在机器人控制、自动驾驶和LLM训练等各类任务中充分释放DRL潜力的有前景方向。
摘要:In recent years, the expansion of neural network models and training data has driven remarkable progress in deep learning, particularly in computer vision and natural language processing. This advancement is underpinned by the concept of Scaling Laws, which demonstrates that scaling model parameters and training data enhances learning performance. While these fields have witnessed breakthroughs, such as the development of large language models like GPT-4 and advanced vision models like Midjourney, the application of scaling laws in deep reinforcement learning (DRL) remains relatively unexplored. Despite its potential to improve performance, the integration of scaling laws into DRL for decision making has not been fully realized. This review addresses this gap by systematically analyzing scaling strategies in three dimensions: data, network, and training budget. In data scaling, we explore methods to optimize data efficiency through parallel sampling and data generation, examining the relationship between data volume and learning outcomes. For network scaling, we investigate architectural enhancements, including monolithic expansions, ensemble and MoE methods, and agent number scaling techniques, which collectively enhance model expressivity while posing unique computational challenges. Lastly, in training budget scaling, we evaluate the impact of distributed training, high replay ratios, large batch sizes, and auxiliary training on training efficiency and convergence. By synthesizing these strategies, this review not only highlights their synergistic roles in advancing DRL for decision making but also provides a roadmap for future research. We emphasize the importance of balancing scalability with computational efficiency and outline promising directions for leveraging scaling to unlock the full potential of DRL in various tasks such as robot control, autonomous driving and LLM training.


【13】RegMean++: Enhancing Effectiveness and Generalization of Regression Mean for Model Merging
标题:RegMean++:提升回归均值方法在模型合并中的有效性与泛化能力
链接:https://arxiv.org/abs/2508.03121

作者:guyen, Dang Huu-Tien, Takeshi Suzuki, Le-Minh Nguyen
备注:17 pages, 11 figures, 11 tables
摘要:回归均值(RegMean)是一种将模型合并公式化为线性回归问题的方法,旨在通过最小化合并模型和候选模型之间的预测差异来为合并模型中的每个线性层找到最佳权重。RegMean为合并问题提供了一个精确的封闭形式解决方案;因此,它提供了可解释性和计算效率。然而,RegMean独立地合并每个线性层,忽略了早期层中的特征和信息如何通过层传播并影响合并模型中的最终预测。在本文中,我们介绍了RegMean++,一个简单而有效的替代RegMean,明确地将合并模型层之间的层内和跨层依赖关系纳入RegMean的目标。通过考虑这些依赖关系,RegMean++可以更好地捕捉合并模型的行为。大量的实验表明,RegMean++在不同的设置中始终优于RegMean,包括域内(ID)和域外(OOD)泛化,顺序合并,大规模任务以及在几种类型的分布变化下的鲁棒性。此外,RegMean++与各种最新的高级模型合并方法相比,具有竞争力或最先进的性能。我们的代码可在https://github.com/nthehai01/RegMean-plusplus上获得。
摘要:Regression Mean (RegMean), an approach that formulates model merging as a linear regression problem, aims to find the optimal weights for each linear layer in the merge model by minimizing the discrepancy in predictions between the merge and candidate models. RegMean provides a precise closed-form solution for the merging problem; therefore, it offers explainability and computational efficiency. However, RegMean merges each linear layer independently, overlooking how the features and information in the earlier layers propagate through the layers and influence the final prediction in the merge model. In this paper, we introduce RegMean++, a simple yet effective alternative to RegMean, that explicitly incorporates both intra- and cross-layer dependencies between merge models' layers into RegMean's objective. By accounting for these dependencies, RegMean++ better captures the behaviors of the merge model. Extensive experiments demonstrate that RegMean++ consistently outperforms RegMean across diverse settings, including in-domain (ID) and out-of-domain (OOD) generalization, sequential merging, large-scale tasks, and robustness under several types of distribution shifts. Furthermore, RegMean++ achieves competitive or state-of-the-art performance compared to various recent advanced model merging methods. Our code is available at https://github.com/nthehai01/RegMean-plusplus.
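
作为背景,RegMean 对每个线性层的闭式解可以写成如下形式(基于已发表的 RegMean 方法的标准公式作示意;RegMean++ 在此基础上将层内与跨层依赖纳入目标,其具体表达式见原文):

$$
\mathbf{W}^{\ast} \;=\; \Big(\sum_{i=1}^{K} \mathbf{X}_i^{\top}\mathbf{X}_i\Big)^{-1} \sum_{i=1}^{K} \mathbf{X}_i^{\top}\mathbf{X}_i\, \mathbf{W}_i ,
$$

其中 $\mathbf{W}_i$ 是第 $i$ 个候选模型中该线性层的权重,$\mathbf{X}_i$ 是该模型对应任务数据在该层的输入特征,$K$ 为被合并模型的数量;该解最小化合并模型与各候选模型在该层上的预测差异 $\sum_i \|\mathbf{X}_i\mathbf{W} - \mathbf{X}_i\mathbf{W}_i\|_F^2$。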


【14】PLoRA: Efficient LoRA Hyperparameter Tuning for Large Models
标题:PLoRA:大型模型的高效LoRA超参数调整
链接:https://arxiv.org/abs/2508.02932

作者:an, Zhuang Wang, Zhen Jia, Shivaram Venkataraman, Yida Wang
摘要:低秩自适应(LoRA)作为大型语言模型(LLM)的微调方法,由于其低资源需求和良好的性能而受到欢迎。虽然大量的工作已经研究了通过同时服务多个LoRA来提高LoRA服务效率,但现有方法假设各种各样的LoRA适配器可用于服务。在我们的工作中,我们进行了广泛的实证研究,以确定当前的训练范式没有有效地利用硬件资源,并需要高开销来获得高性能的LoRA。利用这些见解,我们提出了PLoRA,它在给定的硬件和模型约束下自动编排并发LoRA微调作业,并开发高性能内核以提高训练效率。我们的实验研究表明,PLoRA在给定的超参数搜索空间上将LoRA微调的完工时间缩短了7.52倍,并在一系列最先进的LLM中将训练吞吐量提高了12.8倍。
摘要 :Low-rank Adaptation (LoRA) has gained popularity as a fine-tuning approach for Large Language Models (LLMs) due to its low resource requirements and good performance. While a plethora of work has investigated improving LoRA serving efficiency by serving multiple LoRAs concurrently, existing methods assume that a wide range of LoRA adapters are available for serving. In our work, we conduct extensive empirical studies to identify that current training paradigms do not utilize hardware resources efficiently and require high overhead to obtain a performant LoRA. Leveraging these insights, we propose PLoRA, which automatically orchestrates concurrent LoRA fine-tuning jobs under given hardware and model constraints and develops performant kernels to improve training efficiency. Our experimental studies show that PLoRA reduces the makespan of LoRA fine-tuning over a given hyperparameter search space by up to 7.52x and improves training throughput by up to 12.8x across a range of state-of-the-art LLMs.


【15】GrandJury: A Collaborative Machine Learning Model Evaluation Protocol for Dynamic Quality Rubrics
标题:GrandJury:动态质量指标的协作机器学习模型评估协议
链接:https://arxiv.org/abs/2508.02926

作者:o
备注:26 pages, 1 table. Open-source implementation available on PyPI (grandjury package) and GitHub. Dataset available on Hugging Face under CC-BY-4.0 license
摘要:生成式机器学习模型已成为现代系统的核心,为创意写作、摘要、多跳推理和上下文感知对话等应用提供动力。这些模型支撑着大规模人工智能助手、工作流自动化和自主决策。在这些领域中,可接受的回复很少是绝对的或静态的,而是多元的且高度依赖上下文。然而,标准的评估机制仍然依赖静态的基准式测试,激励模型去优化排行榜分数,而非与动态的用户需求或不断变化的现实保持一致。GrandJury引入了一个形式化的评估协议,结合了时间衰减聚合、完整可追溯性、对动态且透明的任务评分标准归因的支持,以及多评审者的人类判断。这些要素共同实现了多元且可问责的评估,既能捕捉不断演化的共识,也能揭示分歧。我们提供了一个开源实现(grandjury PyPI包)以及一个公开的大型语言模型(LLM)推理输出集合,用于说明这一需求和方法。GrandJury为人工智能从业者在没有绝对基准事实的情况下评估机器学习输出提供了一种新范式。
摘要:Generative Machine Learning models have become central to modern systems, powering applications in creative writing, summarization, multi-hop reasoning, and context-aware dialogue. These models underpin large-scale AI assistants, workflow automation, and autonomous decision-making. In such domains, acceptable response is rarely absolute or static, but plural and highly context-dependent. Yet standard evaluation regimes still rely on static, benchmark-style tests, incentivizing optimization toward leaderboard scores rather than alignment with dynamic user needs or evolving realities. GrandJury introduces a formal evaluation protocol combining time-decayed aggregation, complete traceability, with the support of dynamic, transparent task rubric attribution, and multi-rater human judgment. Together, these elements enable pluralistic, accountable evaluation that captures evolving consensus and surfaces disagreement. We provide an open-source implementation (grandjury PyPI package) and a public collection of Large Language Model (LLM) inference outputs to illustrate the need and method. GrandJury provides a new paradigm for AI practitioners when evaluating machine learning outputs without absolute ground truth.
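
摘要中提到对多评审者判断进行"时间衰减聚合"。下面给出按字面含义写的一个最小示意实现(假设采用指数衰减权重,这只是对该概念的一种常见解读,并非grandjury包的实际API或算法):

import math
from datetime import datetime, timedelta

def time_decayed_score(ratings, now, half_life_days=30.0):
    """ratings: [(timestamp, score)];按指数衰减加权平均,越新的评审权重越大。"""
    decay = math.log(2) / half_life_days
    num, den = 0.0, 0.0
    for ts, score in ratings:
        age_days = (now - ts).total_seconds() / 86400
        w = math.exp(-decay * age_days)
        num += w * score
        den += w
    return num / den if den > 0 else float("nan")

now = datetime(2025, 8, 1)
ratings = [(now - timedelta(days=90), 0.2),   # 较旧、较低的评分权重小
           (now - timedelta(days=5), 0.9),
           (now - timedelta(days=1), 0.8)]
print(round(time_decayed_score(ratings, now), 3))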


【16】Neural Networks with Orthogonal Jacobian
标题:具有正交雅可比矩阵的神经网络
链接:https://arxiv.org/abs/2508.02882

作者:ucco, Davide Murari, Carola-Bibiane Schönlieb
摘要:极深的神经网络通过提取丰富的分层特征来实现最先进的性能。然而,用反向传播训练它们常常受到梯度消失或爆炸的阻碍。现有的补救措施,如正交或方差保持初始化以及残差架构,可以让梯度更稳定地传播并训练更深的模型。在这项工作中,我们引入了一个统一的数学框架,刻画了一大类非线性前馈与残差网络,其输入到输出的雅可比矩阵几乎处处严格正交。这一约束使所得到的网络实现完美的动态等距,并且尽管非常深也能高效训练。我们的表述不仅将标准架构作为特例加以恢复,还产生了无需依赖传统跳跃连接即可达到残差网络可训练性的新设计。我们提供实验证据表明,初始化时的完美雅可比正交性就足以稳定训练并取得有竞争力的性能。我们将这一策略与通过正则化保持雅可比正交性的网络进行比较,得到了可比的结果。我们进一步将分析扩展到可由正交雅可比网络良好近似的一类网络,并引入了雅可比矩阵表示部分等距的网络。这些广义模型同样保持了良好的可训练性。
摘要:Very deep neural networks achieve state-of-the-art performance by extracting rich, hierarchical features. Yet, training them via backpropagation is often hindered by vanishing or exploding gradients. Existing remedies, such as orthogonal or variance-preserving initialisation and residual architectures, allow for a more stable gradient propagation and the training of deeper models. In this work, we introduce a unified mathematical framework that describes a broad class of nonlinear feedforward and residual networks, whose input-to-output Jacobian matrices are exactly orthogonal almost everywhere. Such a constraint forces the resulting networks to achieve perfect dynamical isometry and train efficiently despite being very deep. Our formulation not only recovers standard architectures as particular cases but also yields new designs that match the trainability of residual networks without relying on conventional skip connections. We provide experimental evidence that perfect Jacobian orthogonality at initialisation is sufficient to stabilise training and achieve competitive performance. We compare this strategy to networks regularised to maintain the Jacobian orthogonality and obtain comparable results. We further extend our analysis to a class of networks well-approximated by those with orthogonal Jacobians and introduce networks with Jacobians representing partial isometries. These generalized models are then showed to maintain the favourable trainability properties.
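
为了说明"输入输出雅可比矩阵正交"这一性质的含义,下面给出一个数值检查示意:对一个网络在给定输入处计算雅可比 J,并检验 J J^T 是否接近单位阵(通用检查方法,示例网络只是随机初始化的占位模型,并不满足该性质):

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 8), nn.Tanh(), nn.Linear(8, 8))
x = torch.randn(8)

# 计算输入到输出的雅可比矩阵 J (8x8)
J = torch.autograd.functional.jacobian(net, x)

# 若 J 正交,则 J @ J.T 应接近单位阵;这里报告其偏差
deviation = torch.linalg.norm(J @ J.T - torch.eye(8))
print(f"||J J^T - I||_F = {deviation.item():.4f}")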


【17】Agentic Privacy-Preserving Machine Learning
标题:智能体化的隐私保护机器学习
链接:https://arxiv.org/abs/2508.02836

作者:ang, Zhuotao Liu, Jingwen Huang, Xuanqi Liu
摘要:隐私保护机器学习(PPML)对于确保人工智能中的数据隐私至关重要。过去几年里,社区提出了大量依赖各种密码学原语、可证明安全的PPML方案。然而,对于具有数十亿参数的大型语言模型(LLM)而言,PPML的效率远不能令人接受。例如,与明文推理相比,机密LLM推理的最先进方案至少要慢10,000倍;当上下文长度增加时,性能差距还会进一步扩大。在这篇立场论文中,我们提出了一个名为Agentic-PPML的新框架,使PPML在LLM场景下变得实用。我们的关键见解是:采用通用LLM进行意图理解,并将密码学安全的推理委托给在垂直领域训练的专用模型。通过将语言意图解析(通常只涉及很少甚至不涉及敏感信息)与隐私关键计算模块化地分离,Agentic-PPML完全消除了让LLM处理加密提示的需要,从而使以隐私保护为中心的LLM服务的实际部署成为可能。
摘要:Privacy-preserving machine learning (PPML) is critical to ensure data privacy in AI. Over the past few years, the community has proposed a wide range of provably secure PPML schemes that rely on various cryptography primitives. However, when it comes to large language models (LLMs) with billions of parameters, the efficiency of PPML is everything but acceptable. For instance, the state-of-the-art solution for confidential LLM inference represents at least 10,000-fold slower performance compared to plaintext inference. The performance gap is even larger when the context length increases. In this position paper, we propose a novel framework named Agentic-PPML to make PPML in LLMs practical. Our key insight is to employ a general-purpose LLM for intent understanding and delegate cryptographically secure inference to specialized models trained on vertical domains. By modularly separating language intent parsing - which typically involves little or no sensitive information - from privacy-critical computation, Agentic-PPML completely eliminates the need for the LLMs to process the encrypted prompts, enabling practical deployment of privacy-preserving LLM-centric services.


【18】Considering Spatial Structure of the Road Network in Pavement Deterioration Modeling
标题:考虑道路网空间结构的路面退化建模
链接:https://arxiv.org/abs/2508.02749

作者:e Yu, Pan Lu
摘要:路面退化建模在提供道路网络未来状态信息以及确定预防性养护或修复处治需求方面十分重要。本研究通过图神经网络(GNN)将道路网络的空间相关性纳入路面退化模型。使用GNN进行路面性能建模的主要动机在于能够便捷、直接地利用网络中丰富的结构信息。本文探讨了考虑道路网络空间结构是否会提升退化模型的预测性能。研究所用数据为一个大型路面状况数据集,包含从德克萨斯州交通部维护的路面管理信息系统(PMIS)中获取的超过50万条观测记录。对比结果表明,考虑空间关系后,路面退化预测模型具有更好的预测效果。
摘要:Pavement deterioration modeling is important in providing information regarding the future state of the road network and in determining the needs of preventive maintenance or rehabilitation treatments. This research incorporated spatial dependence of road network into pavement deterioration modeling through a graph neural network (GNN). The key motivation of using a GNN for pavement performance modeling is the ability to easily and directly exploit the rich structural information in the network. This paper explored if considering spatial structure of the road network will improve the prediction performance of the deterioration models. The data used in this research comprises a large pavement condition data set with more than a half million observations taken from the Pavement Management Information System (PMIS) maintained by the Texas Department of Transportation. The promising comparison results indicates that pavement deterioration prediction models perform better when spatial relationship is considered.


【19】DeepGB-TB: A Risk-Balanced Cross-Attention Gradient-Boosted Convolutional Network for Rapid, Interpretable Tuberculosis Screening
标题:DeepGB-TB:用于快速、可解释结核病筛查的风险平衡交叉注意力梯度提升卷积网络
链接:https://arxiv.org/abs/2508.02741

作者:Lu, Yulong Li, Feilong Tang, Zhengyong Jiang, Chong Li, Mian Zhou, Tenglong Li, Jionglong Su
摘要:大规模结核病筛查受到传统诊断的高成本和操作复杂性的限制,因此需要人工智能解决方案。我们提出了DeepGB-TB,这是一种非侵入性系统,仅使用咳嗽音频和基本人口统计数据即可立即分配结核病风险评分。该模型将用于音频处理的轻量级一维卷积神经网络与用于表格特征的梯度提升决策树相结合。它的主要创新是跨模态双向交叉注意模块(CM-BCA),该模块在模态之间迭代地交换显著线索,模仿临床医生整合症状和风险因素的方式。为了满足最大限度地减少漏诊病例的临床优先级,我们设计了一个结核病风险平衡损失(TRBL),对假阴性预测施加更强的惩罚,从而减少高风险的错误分类。DeepGB-TB在7个国家收集的1,105名患者的多样化数据集上进行了评估,AUROC为0.903,F1得分为0.851,代表了最新的技术水平。其计算效率可以直接在普通移动设备上进行实时离线推理,非常适合低资源环境。重要的是,该系统产生了临床验证的解释,促进了一线卫生工作者的信任和采用。通过将人工智能创新与公共卫生对速度、可负担性和可靠性的要求相结合,DeepGB-TB为推进全球结核病控制提供了一种工具。
摘要:Large-scale tuberculosis (TB) screening is limited by the high cost and operational complexity of traditional diagnostics, creating a need for artificial-intelligence solutions. We propose DeepGB-TB, a non-invasive system that instantly assigns TB risk scores using only cough audio and basic demographic data. The model couples a lightweight one-dimensional convolutional neural network for audio processing with a gradient-boosted decision tree for tabular features. Its principal innovation is a Cross-Modal Bidirectional Cross-Attention module (CM-BCA) that iteratively exchanges salient cues between modalities, emulating the way clinicians integrate symptoms and risk factors. To meet the clinical priority of minimizing missed cases, we design a Tuberculosis Risk-Balanced Loss (TRBL) that places stronger penalties on false-negative predictions, thereby reducing high-risk misclassifications. DeepGB-TB is evaluated on a diverse dataset of 1,105 patients collected across seven countries, achieving an AUROC of 0.903 and an F1-score of 0.851, representing a new state of the art. Its computational efficiency enables real-time, offline inference directly on common mobile devices, making it ideal for low-resource settings. Importantly, the system produces clinically validated explanations that promote trust and adoption by frontline health workers. By coupling AI innovation with public-health requirements for speed, affordability, and reliability, DeepGB-TB offers a tool for advancing global TB control.
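
摘要中的结核病风险平衡损失(TRBL)对假阴性预测施加更强惩罚。下面给出一个体现这一思想的加权二元交叉熵示意(假设性草图:通过加大阳性样本权重来近似"惩罚假阴性",权重取值为示例,并非论文给出的TRBL公式):

import torch
import torch.nn.functional as F

def risk_balanced_bce(logits, labels, fn_weight=4.0):
    """加权BCE:labels==1(结核阳性)的样本权重为 fn_weight,
    使漏诊(假阴性)在损失中代价更高。"""
    weights = torch.where(labels == 1.0,
                          torch.full_like(labels, fn_weight),
                          torch.ones_like(labels))
    return F.binary_cross_entropy_with_logits(
        logits, labels, weight=weights, reduction="mean")

logits = torch.tensor([2.0, -1.0, 0.5, -2.0])
labels = torch.tensor([1.0, 1.0, 0.0, 0.0])   # 第二个样本是被漏判的阳性
print(float(risk_balanced_bce(logits, labels)))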


【20】Embedding-Enhanced Probabilistic Modeling of Ferroelectric Field Effect Transistors (FeFETs)
标题:铁电场效应晶体管(FeFET)的嵌入增强概率建模
链接:https://arxiv.org/abs/2508.02737

作者:bi Afee, Jack Hutchins, Md Mazharul Islam, Thomas Kampfe, Ahmedullah Aziz
备注:15 pages, 6 figures, manuscript yet not submitted anywhere
摘要:FeFET在推进存储器和逻辑技术方面具有强大的潜力,但其由操作循环和制造可变性引起的固有随机性对准确和可靠的建模提出了重大挑战。捕捉这种变化是至关重要的,因为它使设计人员能够预测行为,优化性能,并确保可靠性和鲁棒性,以应对制造和操作条件的变化。现有的确定性和基于机器学习的紧凑模型通常无法捕捉这种可变性的全部范围,或者缺乏稳定电路级集成所需的数学平滑性。在这项工作中,我们提出了一个增强的概率建模框架FeFET,解决这些限制。基于混合密度网络(MDN)的基础上,我们的方法集成了C-无穷大连续激活函数,以实现平滑,稳定的学习和特定于设备的嵌入层,以捕获跨设备的固有物理变化。从学习的嵌入分布中采样使得能够生成用于可变性感知模拟的合成设备实例。在R2为0.92的情况下,该模型在捕获FeFET电流行为的可变性方面表现出高精度。总之,该框架提供了一个可扩展的,数据驱动的解决方案,用于建模FeFET的完整随机行为,并为未来的紧凑型模型开发和电路仿真集成提供了坚实的基础。
摘要:FeFETs hold strong potential for advancing memory and logic technologies, but their inherent randomness arising from both operational cycling and fabrication variability poses significant challenges for accurate and reliable modeling. Capturing this variability is critical, as it enables designers to predict behavior, optimize performance, and ensure reliability and robustness against variations in manufacturing and operating conditions. Existing deterministic and machine learning-based compact models often fail to capture the full extent of this variability or lack the mathematical smoothness required for stable circuit-level integration. In this work, we present an enhanced probabilistic modeling framework for FeFETs that addresses these limitations. Building upon a Mixture Density Network (MDN) foundation, our approach integrates C-infinity continuous activation functions for smooth, stable learning and a device-specific embedding layer to capture intrinsic physical variability across devices. Sampling from the learned embedding distribution enables the generation of synthetic device instances for variability-aware simulation. With an R2 of 0.92, the model demonstrates high accuracy in capturing the variability of FeFET current behavior. Altogether, this framework provides a scalable, data-driven solution for modeling the full stochastic behavior of FeFETs and offers a strong foundation for future compact model development and circuit simulation integration.
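
摘要描述了一个以混合密度网络(MDN)为基础、带有器件专属嵌入层的概率模型。下面给出该结构的最小示意(假设性草图:输入特征维度、混合分量数和嵌入维度均为示例;论文中的C∞连续激活等细节未包含):

import torch
import torch.nn as nn

class DeviceMDN(nn.Module):
    """输入:偏置等特征 + 器件ID;输出:K个高斯分量的 (权重, 均值, 标准差)。"""
    def __init__(self, num_devices=100, feat_dim=4, emb_dim=8, hidden=64, K=5):
        super().__init__()
        self.emb = nn.Embedding(num_devices, emb_dim)   # 器件专属嵌入,刻画个体差异
        self.trunk = nn.Sequential(nn.Linear(feat_dim + emb_dim, hidden), nn.SiLU(),
                                   nn.Linear(hidden, 3 * K))
        self.K = K

    def forward(self, x, device_id):
        h = self.trunk(torch.cat([x, self.emb(device_id)], dim=-1))
        logit_pi, mu, log_sigma = h.chunk(3, dim=-1)
        return torch.softmax(logit_pi, -1), mu, log_sigma.exp()

model = DeviceMDN()
x = torch.randn(16, 4)                       # 例如栅压、漏压等特征
dev = torch.randint(0, 100, (16,))
pi, mu, sigma = model(x, dev)
print(pi.shape, mu.shape, sigma.shape)       # 各为 (16, 5)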


【21】Mathematical Foundations of Geometric Deep Learning
标题:几何深度学习的数学基础
链接:https://arxiv.org/abs/2508.02723

作者:z de Ocáriz Borde, Michael Bronstein
备注:78 pages
摘要:我们回顾了研究几何深度学习所需的关键数学概念。
摘要:We review the key mathematical concepts necessary for studying Geometric Deep Learning.


【22】ZetA: A Riemann Zeta-Scaled Extension of Adam for Deep Learning
标题:ZetA:Adam的Riemann Zeta扩展,用于深度学习
链接:https://arxiv.org/abs/2508.02719

作者:BC
备注 :6 pages, 1 figure, 4 references. This paper introduces a hybrid optimizer combining Adam with Riemann zeta-based scaling
摘要:这项工作介绍了ZetA,一种新型深度学习优化器,它通过基于黎曼zeta函数的动态缩放来扩展Adam。据我们所知,ZetA是第一个在深度学习优化中应用基于zeta函数的梯度缩放的优化器。该方法通过一种混合更新机制提升泛化能力和鲁棒性,该机制集成了自适应阻尼、基于余弦相似度的动量增强、熵正则化损失以及锐度感知最小化(SAM)风格的扰动。在SVHN、CIFAR10、CIFAR100、STL10和带噪CIFAR10上的经验评估一致显示,其测试准确率较Adam有所提升。所有实验均使用一个轻量级全连接网络,在混合精度设置下训练五个epoch。结果表明,ZetA是Adam的一种计算高效且鲁棒的替代方案,在含噪声或高粒度的分类任务中尤为有效。
摘要:This work introduces ZetA, a novel deep learning optimizer that extends Adam by incorporating dynamic scaling based on the Riemann zeta function. To the best of our knowledge, ZetA is the first optimizer to apply zeta-based gradient scaling within deep learning optimization. The method improves generalization and robustness through a hybrid update mechanism that integrates adaptive damping, cosine similarity-based momentum boosting, entropy-regularized loss, and Sharpness-Aware Minimization (SAM)-style perturbations. Empirical evaluations on SVHN, CIFAR10, CIFAR100, STL10, and noisy CIFAR10 consistently show test accuracy improvements over Adam. All experiments employ a lightweight fully connected network trained for five epochs under mixed-precision settings. The results demonstrate that ZetA is a computationally efficient and robust alternative to Adam, particularly effective in noisy or high-granularity classification tasks.


【23】Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws
标题:高维下学习二次神经网络:SGD动力学与缩放律
链接:https://arxiv.org/abs/2508.03688

作者:n Arous, Murat A. Erdogdu, N. Mert Vural, Denny Wu
备注:84 pages
摘要:We study the optimization and sample complexity of gradient-based training of a two-layer neural network with quadratic activation function in the high-dimensional regime, where the data is generated as $y \propto \sum_{j=1}^{r}\lambda_j \sigma\left(\langle \boldsymbol{\theta_j}, \boldsymbol{x}\rangle\right), \boldsymbol{x} \sim N(0,\boldsymbol{I}_d)$, $\sigma$ is the 2nd Hermite polynomial, and $\lbrace\boldsymbol{\theta}_j \rbrace_{j=1}^{r} \subset \mathbb{R}^d$ are orthonormal signal directions. We consider the extensive-width regime $r \asymp d^\beta$ for $\beta \in [0, 1)$, and assume a power-law decay on the (non-negative) second-layer coefficients $\lambda_j\asymp j^{-\alpha}$ for $\alpha \geq 0$. We present a sharp analysis of the SGD dynamics in the feature learning regime, for both the population limit and the finite-sample (online) discretization, and derive scaling laws for the prediction risk that highlight the power-law dependencies on the optimization time, sample size, and model width. Our analysis combines a precise characterization of the associated matrix Riccati differential equation with novel matrix monotonicity arguments to establish convergence guarantees for the infinite-dimensional effective dynamics.


【24】Likelihood Matching for Diffusion Models
标题:扩散模型的可能匹配
链接:https://arxiv.org/abs/2508.03636

作者: Wu Su, Yanqi Huang, Song Xi Chen
摘要:我们提出了一种训练扩散模型的似然匹配(Likelihood Matching)方法,首先建立了目标数据分布的似然与沿反向扩散样本路径的似然之间的等价性。为了高效计算反向样本似然,我们考虑一种拟似然,用条件均值和协方差分别匹配的高斯分布来近似每个反向转移密度。通过最大化拟似然来估计扩散生成所需的得分函数和Hessian函数,从而确保每两个时间点之间前两阶转移矩的一致匹配。我们还引入了一个利用估计得分和Hessian信息的随机采样器,以便利计算。我们建立了拟最大似然估计的一致性,并为所提出的采样器提供了非渐近收敛保证,量化了由得分和Hessian估计、维数以及扩散步数带来的近似误差率。实证和仿真评估证明了所提出的似然匹配方法的有效性,并验证了理论结果。
摘要:We propose a Likelihood Matching approach for training diffusion models by first establishing an equivalence between the likelihood of the target data distribution and a likelihood along the sample path of the reverse diffusion. To efficiently compute the reverse sample likelihood, a quasi-likelihood is considered to approximate each reverse transition density by a Gaussian distribution with matched conditional mean and covariance, respectively. The score and Hessian functions for the diffusion generation are estimated by maximizing the quasi-likelihood, ensuring a consistent matching of both the first two transitional moments between every two time points. A stochastic sampler is introduced to facilitate computation that leverages on both the estimated score and Hessian information. We establish consistency of the quasi-maximum likelihood estimation, and provide non-asymptotic convergence guarantees for the proposed sampler, quantifying the rates of the approximation errors due to the score and Hessian estimation, dimensionality, and the number of diffusion steps. Empirical and simulation evaluations demonstrate the effectiveness of the proposed Likelihood Matching and validate the theoretical results.
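下面给出一个示意性的公式,说明摘要中"用匹配条件均值与协方差的高斯分布近似每个反向转移密度"的拟似然构造;$\boldsymbol{\mu}_\theta$ 与 $\boldsymbol{\Sigma}_\theta$ 如何由估计的得分与Hessian具体参数化,以原文为准:
\[
\log \mathcal{L}(\theta) \;=\; \sum_{t} \log \mathcal{N}\!\big(\boldsymbol{x}_{t-1};\, \boldsymbol{\mu}_\theta(\boldsymbol{x}_t, t),\, \boldsymbol{\Sigma}_\theta(\boldsymbol{x}_t, t)\big),
\]
其中 $\boldsymbol{\mu}_\theta$ 与 $\boldsymbol{\Sigma}_\theta$ 分别与反向转移的条件均值和条件协方差相匹配,即匹配前两阶转移矩。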


【25】Machine Learning Algorithms for Transplanting Accelerometer Observations in Future Satellite Gravimetry Missions
标题:在未来卫星重力测量任务中移植加速度计观测数据的机器学习算法
链接:https://arxiv.org/abs/2508.03522

作者:meshkani, Jürgen Müller, Sahar Ebadi, Alexey Kupriyanov, Annike Knabe, Nina Fletling, Manuel Schilling
摘要:对地球重力场进行准确而持续的监测,对于跟踪与气候变化、水文循环和地球动力学现象相关的质量再分布过程至关重要。虽然GRACE和GRACE后续(GRACE-FO)任务为采用低-低卫星间跟踪(LL-SST)的卫星重力测量设定了基准,但重力场恢复的精度仍然强烈依赖于加速度计(ACC)的性能质量和ACC数据的连续性。传统的静电加速度计(EA)存在可能影响任务成果的局限性,这促使人们探索先进的传感器技术和数据恢复技术。本研究对使用新型加速度计配置(包括冷原子干涉(CAI)加速度计和混合EA-CAI设置)的加速度计数据移植进行了系统评估,并同时应用了解析方法和基于机器学习的方法。通过全面的闭环LL-SST仿真,我们比较了从传统的仅EA设置到理想的双混合配置共四种情形,并特别关注采用不同神经网络方法的基于移植的方案的性能。结果表明,双混合配置可提供最准确的重力场反演;然而,基于移植的混合设置,尤其是在机器学习的支持下,成为一种稳健且具有成本效益的替代方案,只需极少的额外硬件即可实现相当的性能。这些发现凸显了将量子传感器技术与数据驱动移植相结合用于未来卫星重力测量任务的前景,为改进地球动态重力场的全球监测铺平了道路。
摘要 :Accurate and continuous monitoring of Earth's gravity field is essential for tracking mass redistribution processes linked to climate variability, hydrological cycles, and geodynamic phenomena. While the GRACE and GRACE Follow-On (GRACE-FO) missions have set the benchmark for satellite gravimetry using low-low satellite to satellite tracking (LL-SST), the precision of gravity field recovery still strongly depends on the quality of accelerometer (ACC) performance and the continuity of ACC data. Traditional electrostatic accelerometers (EA) face limitations that can hinder mission outcomes, prompting exploration of advanced sensor technologies and data recovery techniques. This study presents a systematic evaluation of accelerometer data transplantation using novel accelerometer configurations, including Cold Atom Interferometry (CAI) accelerometers and hybrid EA-CAI setups, and applying both analytical and machine learning-based methods. Using comprehensive closed-loop LL-SST simulations, we compare four scenarios ranging from the conventional EA-only setup to ideal dual hybrid configurations, with a particular focus on the performance of transplant-based approaches using different neural network approaches. Our results show that the dual hybrid configuration provides the most accurate gravity field retrieval. However, the transplant-based hybrid setup, especially when supported by machine learning, emerges as a robust and cost-effective alternative, achieving comparable performance with minimal extra hardware. These findings highlight the promise of combining quantum sensor technology and data-driven transplantation for future satellite gravimetry missions, paving the way for improved global monitoring of Earth's dynamic gravity field.


【26】Quantum Neural Network applications to Protein Binding Affinity Predictions
标题:量子神经网络应用于蛋白质结合亲和力预测
链接:https://arxiv.org/abs/2508.03446

作者:za Teixeira, Lucas Barros Fernandes, Yara Rodrigues Inácio
备注:16 pages, 7 figures
摘要:结合能是支配分子相互作用的基本热力学性质,在医疗保健和自然科学等领域发挥着至关重要的作用,尤其与药物开发、疫苗设计和其他生物医学应用相关。多年来,人们开发了各种方法来估计蛋白质结合能,从实验技术到计算方法,其中机器学习对这一领域做出了重大贡献。虽然经典计算在构建预测模型方面已展现出强劲的结果,但量子机器学习已成为一种有前途的替代方案。量子神经网络(QNN)已成为研究热点,也引发了其在预测结合能方面是否具有潜在优势的疑问。为了研究这种潜力,本研究提出了三十种基于多层感知器的量子神经网络变体,以探索QNN用于该任务的可行性。这些变体涵盖三种不同的架构,每种架构都采用十种不同的量子电路来配置其量子层。我们将这些量子模型的性能与最先进的基于多层感知器的经典人工神经网络进行比较,评估准确率和训练时间。训练使用一个主数据集,测试则使用两个包含完全未见样本的额外数据集。结果表明,量子模型在其中一个未见数据集上的准确率高出约20%,但在其他数据集上的准确率较低。值得注意的是,量子模型的训练时间比经典模型短几个数量级,突显了其在高效预测蛋白质结合能方面的潜力。
摘要:Binding energy is a fundamental thermodynamic property that governs molecular interactions, playing a crucial role in fields such as healthcare and the natural sciences. It is particularly relevant in drug development, vaccine design, and other biomedical applications. Over the years, various methods have been developed to estimate protein binding energy, ranging from experimental techniques to computational approaches, with machine learning making significant contributions to this field. Although classical computing has demonstrated strong results in constructing predictive models, the variation of quantum computing for machine learning has emerged as a promising alternative. Quantum neural networks (QNNs) have gained traction as a research focus, raising the question of their potential advantages in predicting binding energies. To investigate this potential, this study explored the feasibility of QNNs for this task by proposing thirty variations of multilayer perceptron-based quantum neural networks. These variations span three distinct architectures, each incorporating ten different quantum circuits to configure their quantum layers. The performance of these quantum models was compared with that of a state-of-the-art classical multilayer perceptron-based artificial neural network, evaluating both accuracy and training time. A primary dataset was used for training, while two additional datasets containing entirely unseen samples were employed for testing. Results indicate that the quantum models achieved approximately 20% higher accuracy on one unseen dataset, although their accuracy was lower on the other datasets. Notably, quantum models exhibited training times several orders of magnitude shorter than their classical counterparts, highlighting their potential for efficient protein binding energy prediction.


【27】Hedging with memory: shallow and deep learning with signatures
标题:用记忆对冲:用签名进行浅层和深度学习
链接:https://arxiv.org/abs/2508.02759

作者:bi Jaber, Louis-Amand Gérard
摘要:我们研究了在机器学习环境下使用路径签名对冲非马尔可夫随机波动率模型下的奇异衍生品。在深度学习环境中,我们使用签名作为前馈神经网络的特征,并表明在大多数情况下,它们的性能优于LSTM,训练计算量少了几个数量级。在浅学习设置中,我们比较了两种回归方法:第一种直接从价格过程的预期签名中学习对冲策略;第二种使用签名波动率模型对波动率的动态进行建模,并在波动率的预期签名上进行校准。在校准的签名波动率模型中解决对冲问题,在不同的收益和波动率动态中产生更准确和稳定的结果。
摘要:We investigate the use of path signatures in a machine learning context for hedging exotic derivatives under non-Markovian stochastic volatility models. In a deep learning setting, we use signatures as features in feedforward neural networks and show that they outperform LSTMs in most cases, with orders of magnitude less training compute. In a shallow learning setting, we compare two regression approaches: the first directly learns the hedging strategy from the expected signature of the price process; the second models the dynamics of volatility using a signature volatility model, calibrated on the expected signature of the volatility. Solving the hedging problem in the calibrated signature volatility model yields more accurate and stable results across different payoffs and volatility dynamics.


【28】Physics-guided denoiser network for enhanced additive manufacturing data quality
标题:用于增强增材制造数据质量的物理引导降噪网络
链接:https://arxiv.org/abs/2508.02712

作者:alder, Satyajit Mojumder
备注:28 pages, 13 figures, 5 tables
摘要:现代工程系统越来越多地配备用于实时监控和决策的传感器。然而,这些传感器收集的数据往往含有噪声且难以解释,限制了其在控制和诊断中的实用性。在这项工作中,我们提出了一个物理引导的去噪框架,该框架集成了基于能量的模型和Fisher得分正则化,以在降低数据噪声的同时保持与基于物理的模型的一致性。该方法首先在不同噪声水平下的基准问题上进行验证,包括简谐振子、Burgers方程和拉普拉斯方程。随后,我们将该去噪框架应用于来自激光粉末床熔融(LPBF)增材制造实验的真实热发射数据,并使用经过训练的LPBF过程物理信息神经网络(PINN)代理模型来指导去噪。实验结果表明,该方法的去噪性能优于基线神经网络去噪器,在一系列LPBF工艺条件下有效地降低了噪声。这种物理引导的去噪策略能够对低成本传感器数据进行鲁棒的实时解释,从而促进预测控制并改善增材制造中的缺陷缓解。
摘要:Modern engineering systems are increasingly equipped with sensors for real-time monitoring and decision-making. However, the data collected by these sensors is often noisy and difficult to interpret, limiting its utility for control and diagnostics. In this work, we propose a physics-informed denoising framework that integrates energy-based model and Fisher score regularization to jointly reduce data noise and enforce physical consistency with a physics-based model. The approach is first validated on benchmark problems, including the simple harmonic oscillator, Burgers' equation, and Laplace's equation, across varying noise levels. We then apply the denoising framework to real thermal emission data from laser powder bed fusion (LPBF) additive manufacturing experiments, using a trained Physics-Informed Neural Network (PINN) surrogate model of the LPBF process to guide denoising. Results show that the proposed method outperforms baseline neural network denoisers, effectively reducing noise under a range of LPBF processing conditions. This physics-guided denoising strategy enables robust, real-time interpretation of low-cost sensor data, facilitating predictive control and improved defect mitigation in additive manufacturing.


其他(36篇)

【1】What If, But Privately: Private Counterfactual Retrieval
标题:如果,但私下:私人反事实检索
链接:https://arxiv.org/abs/2508.03681

作者:el, Mohamed Nomeir, Pasan Dissanayake, Sanghamitra Dutta, Sennur Ulukus
备注:arXiv admin note: text overlap with arXiv:2410.13812, arXiv:2411.10429
摘要:在高风险应用中使用黑箱机器学习模型时,透明度和可解释性是需要考虑的两个重要方面,而提供反事实解释是满足这一要求的一种方式。然而,这也对提供解释的机构以及请求解释的用户的隐私构成威胁。在这项工作中,我们主要关注希望在不向机构透露自身特征向量的情况下检索反事实实例的用户的隐私。我们的框架从已被接受的样本点数据库中检索精确的最近邻反事实解释,同时为用户实现完美的信息论隐私。首先,我们提出了私有反事实检索(PCR)问题,并给出一个基线PCR方案,使用户的特征向量对机构保持信息论意义上的私密。在此基础上,我们提出了另外两个方案,与基线方案相比减少了机构数据库向用户泄露的信息量。其次,我们放宽所有特征均可变的假设,考虑不可变PCR(I-PCR)的设定:用户在不改变其特征的一个私有子集(即不可变集)的情况下检索最近的反事实,同时使其特征向量和不可变集对机构保持私密。为此,我们提出了两个方案,它们在信息论意义上保护用户隐私,但保证不同程度的数据库隐私。第三,我们扩展PCR和I-PCR方案,以纳入用户对其属性变换的偏好,从而获得更具可操作性的解释。最后,我们给出数值结果以支持理论发现,并比较所提方案的数据库泄露情况。
摘要:Transparency and explainability are two important aspects to be considered when employing black-box machine learning models in high-stake applications. Providing counterfactual explanations is one way of catering this requirement. However, this also poses a threat to the privacy of the institution that is providing the explanation, as well as the user who is requesting it. In this work, we are primarily concerned with the user's privacy who wants to retrieve a counterfactual instance, without revealing their feature vector to the institution. Our framework retrieves the exact nearest neighbor counterfactual explanation from a database of accepted points while achieving perfect, information-theoretic, privacy for the user. First, we introduce the problem of private counterfactual retrieval (PCR) and propose a baseline PCR scheme that keeps the user's feature vector information-theoretically private from the institution. Building on this, we propose two other schemes that reduce the amount of information leaked about the institution database to the user, compared to the baseline scheme. Second, we relax the assumption of mutability of all features, and consider the setting of immutable PCR (I-PCR). Here, the user retrieves the nearest counterfactual without altering a private subset of their features, which constitutes the immutable set, while keeping their feature vector and immutable set private from the institution. For this, we propose two schemes that preserve the user's privacy information-theoretically, but ensure varying degrees of database privacy. Third, we extend our PCR and I-PCR schemes to incorporate user's preference on transforming their attributes, so that a more actionable explanation can be received. Finally, we present numerical results to support our theoretical findings, and compare the database leakage of the proposed schemes.
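作为补充,下面给出一个最小示意,展示在不考虑隐私协议(即不含信息论私有检索机制)时,"从已接受样本数据库中检索精确最近邻反事实"这一基础功能;函数与变量名均为示例,并非论文中的方案。

import numpy as np

def nearest_counterfactual(query, accepted_db):
    # accepted_db: (n, d) array of feature vectors the model accepted.
    # Returns the accepted point closest (Euclidean) to the rejected query.
    dists = np.linalg.norm(accepted_db - query, axis=1)
    return accepted_db[np.argmin(dists)]

def nearest_counterfactual_immutable(query, accepted_db, immutable_idx):
    # I-PCR-style variant: only consider accepted points that agree with the query
    # on the immutable feature subset (illustrative; the paper keeps this set private).
    mask = np.all(np.isclose(accepted_db[:, immutable_idx], query[immutable_idx]), axis=1)
    candidates = accepted_db[mask]
    return nearest_counterfactual(query, candidates) if len(candidates) else None

# Toy usage
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 5))
q = rng.normal(size=5)
cf = nearest_counterfactual(q, db)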


【2】Morphlux: Programmable chip-to-chip photonic fabrics in multi-accelerator servers for ML
标题:Morphlux:ML的多加速器服务器中的可编程芯片到芯片光结构
链接:https://arxiv.org/abs/2508.03674

作者:Vijaya Kumar, Eric Ding, Arjun Devraj, Rachee Singh
摘要:我们使用新近可行的可编程芯片间光子结构,对计算服务器内的加速器芯片(例如GPU、TPU)进行光学互连。相比之下,如今作为ML主力的商用多加速器计算服务器使用电气互连来连接服务器内的加速器芯片。然而,最近的趋势表明,由于加速器FLOPS的增长速度快于同一服务器内加速器之间互连带宽的增长速度,出现了互连带宽墙。这导致云数据中心中GPU资源的利用不足和闲置。我们开发了Morphlux,一种服务器级可编程光子结构,用于互连服务器内的加速器。我们发现,使用Morphlux增强最先进的以ML为中心的光子数据中心,可以将租户计算分配的带宽提高多达66%,并将计算碎片化减少多达70%。我们开发了一个新型的Morphlux端到端硬件原型来证明这些性能优势,它们可转化为ML模型训练吞吐量1.72倍的提升。通过在我们的硬件测试平台中快速编程服务器规模的光子结构,Morphlux可以在1.2秒内逻辑替换故障的加速器芯片。
摘要:We optically interconnect accelerator chips (e.g., GPUs, TPUs) within compute servers using newly viable programmable chip-to-chip photonic fabrics. In contrast, today, commercial multi-accelerator compute servers that are workhorses of ML, use electrical interconnects to network accelerator chips in the server. However, recent trends have shown an interconnect bandwidth wall caused by accelerator FLOPS scaling at a faster rate than the bandwidth of the interconnect between accelerators in the same server. This has led to under-utilization and idling of GPU resources in cloud datacenters. We develop Morphlux, a server-scale programmable photonic fabric, to interconnect accelerators within servers. We show that augmenting state-of-the-art photonic ML-centric datacenters with Morphlux can improve the bandwidth of tenant compute allocations by up to 66% and reduce compute fragmentation by up to 70%. We develop a novel end-to-end hardware prototype of Morphlux to demonstrate these performance benefits, which translate to 1.72x improvement in training throughput of ML models. By rapidly programming the server-scale fabric in our hardware testbed, Morphlux can logically replace a failed accelerator chip in 1.2 seconds.


【3】A DbC Inspired Neurosymbolic Layer for Trustworthy Agent Design
标题:一种受DBC启发的可信赖代理设计的神经符号层
链接:https://arxiv.org/abs/2508.03665

作者:eoveanu-Condrei
备注:3 pages, 1 figure
摘要:生成模型,特别是大型语言模型(LLM),能够产生流畅的输出,但缺乏可验证的保证。我们借鉴契约式设计(DbC)和类型理论原则,引入一个调解每次LLM调用的契约层。契约规定了对输入和输出的语义与类型要求,并辅以概率性补救,以引导生成结果符合契约。该层揭示了LLM的双重视角:既是语义解析器,又是概率黑盒组件。契约满足是概率性的,而语义验证通过程序员在良类型数据结构上指定的条件来操作性地定义。更广泛地说,这项工作假设:满足相同契约的任意两个代理,就这些契约而言是功能等价的。
摘要:Generative models, particularly Large Language Models (LLMs), produce fluent outputs yet lack verifiable guarantees. We adapt Design by Contract (DbC) and type-theoretic principles to introduce a contract layer that mediates every LLM call. Contracts stipulate semantic and type requirements on inputs and outputs, coupled with probabilistic remediation to steer generation toward compliance. The layer exposes the dual view of LLMs as semantic parsers and probabilistic black-box components. Contract satisfaction is probabilistic and semantic validation is operationally defined through programmer-specified conditions on well-typed data structures. More broadly, this work postulates that any two agents satisfying the same contracts are \emph{functionally equivalent} with respect to those contracts.
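下面是按摘要思路给出的最小示意:用前置/后置条件包裹一次LLM调用,并在验证失败时重试(作为"概率补救"的一种简化形式)。其中 llm_call 为假设的占位函数,具体契约语言与补救机制以原文为准。

import json
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Contract:
    precondition: Callable[[Any], bool]     # semantic/type requirement on the input
    postcondition: Callable[[Any], bool]    # semantic/type requirement on the output

def contracted_call(llm_call, prompt, contract, max_retries=3):
    # llm_call is a hypothetical placeholder: llm_call(prompt) -> str
    if not contract.precondition(prompt):
        raise ValueError("precondition violated")
    for _ in range(max_retries):
        output = llm_call(prompt)
        if contract.postcondition(output):  # probabilistic satisfaction: retry on failure
            return output
    raise RuntimeError("postcondition not satisfied within retry budget")

def is_json(s):
    try:
        json.loads(s)
        return True
    except Exception:
        return False

# Toy usage: a stub "LLM" whose output must parse as JSON.
stub_llm = lambda p: '{"answer": 42}'
c = Contract(precondition=lambda p: isinstance(p, str) and len(p) > 0,
             postcondition=is_json)
print(contracted_call(stub_llm, "question", c))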


【4】Forest vs Tree: The $(N, K)$ Trade-off in Reproducible ML Evaluation
标题:森林与树木:可再现ML评估中的$(N, K)$权衡
链接:https://arxiv.org/abs/2508.03663

作者:ndita, Flip Korn, Chris Welty, Christopher M. Homan
摘要:可复现性是科学验证的基石,也是其结果权威性的基础。机器学习评估的可复现性会带来更大的信任、信心和价值。然而,机器学习中使用的真值响应往往必然来自人类,而人类之间普遍存在分歧;令人惊讶的是,很少有研究考察实际上忽略这些响应中的分歧(这是常见做法)所带来的影响。研究缺乏的一个原因是收集人工标注评估数据的预算有限,而为每个样本从多个标注者处获取更多响应会大大增加每项的标注成本。我们研究了可靠的机器学习评估所需的项目数($N$)与每个项目的响应数($K$)之间的权衡。我们分析了多个每个项目都有多重标注的类别型数据集,以及拟合这些数据集得到的模拟分布,以在固定预算($N \times K$)下确定用于收集评估数据并可靠比较机器学习模型性能的最优$(N, K)$配置。我们的研究结果首先表明,对于每个被测数据集,至少在一个指标上,考虑人类分歧所需的$N \times K$不超过1000(通常要低得多);而且,这一最小的$N \times K$几乎总是出现在$K > 10$时。此外,$K$与$N$之间权衡的性质(乃至这种权衡是否存在)取决于评估指标:对响应完整分布更敏感的指标在更高的$K$下表现更好。我们的方法可以帮助ML从业者找到最优指标以及应收集的项目数和每项目标注数,从而在给定预算下获得最可靠、更有效的测试数据。
摘要:Reproducibility is a cornerstone of scientific validation and of the authority it confers on its results. Reproducibility in machine learning evaluations leads to greater trust, confidence, and value. However, the ground truth responses used in machine learning often necessarily come from humans, among whom disagreement is prevalent, and surprisingly little research has studied the impact of effectively ignoring disagreement in these responses, as is typically the case. One reason for the lack of research is that budgets for collecting human-annotated evaluation data are limited, and obtaining more samples from multiple annotators for each example greatly increases the per-item annotation costs. We investigate the trade-off between the number of items ($N$) and the number of responses per item ($K$) needed for reliable machine learning evaluation. We analyze a diverse collection of categorical datasets for which multiple annotations per item exist, and simulated distributions fit to these datasets, to determine the optimal $(N, K)$ configuration, given a fixed budget ($N \times K$), for collecting evaluation data and reliably comparing the performance of machine learning models. Our findings show, first, that accounting for human disagreement may come with $N \times K$ at no more than 1000 (and often much lower) for every dataset tested on at least one metric. Moreover, this minimal $N \times K$ almost always occurred for $K > 10$. Furthermore, the nature of the tradeoff between $K$ and $N$ -- or if one even existed -- depends on the evaluation metric, with metrics that are more sensitive to the full distribution of responses performing better at higher levels of $K$. Our methods can be used to help ML practitioners get more effective test data by finding the optimal metrics and number of items and annotations per item to collect to get the most reliability for their budget.
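下面是一个与论文思路一致但纯属示意的小型模拟:在固定标注预算 $N \times K$ 下,比较不同 $(N, K)$ 划分时评估结果的稳定性;分布、指标与参数均为假设,并非论文所用的数据或方法。

import numpy as np

rng = np.random.default_rng(0)
BUDGET = 1000   # fixed annotation budget N * K

def simulate_eval(n_items, k_resp, model_acc=0.8, label_noise=0.3):
    # Each item has a latent true label; each annotator flips it with prob. label_noise.
    true = rng.integers(0, 2, size=n_items)
    votes = (rng.random((n_items, k_resp)) < label_noise) ^ true[:, None]   # noisy responses
    majority = (votes.mean(axis=1) > 0.5).astype(int)                        # aggregated label
    preds = (rng.random(n_items) < model_acc) ^ true ^ 1                     # model correct w.p. model_acc
    return (preds == majority).mean()

for k in [1, 5, 10, 20, 50]:
    n = BUDGET // k
    scores = [simulate_eval(n, k) for _ in range(200)]
    print(f"K={k:3d} N={n:4d}  mean acc={np.mean(scores):.3f}  std={np.std(scores):.3f}")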


【5】Efficient Morphology-Aware Policy Transfer to New Embodiments
标题:有效的形态感知策略转移到新实施例
链接:https://arxiv.org/abs/2508.03660

作者:rzystupa, Hongyao Tang, Martin Jagersand, Santiago Miret, Mariano Phielipp, Matthew E. Taylor, Glen Berseth
备注:19 pages, 10 Figures, Published at the 2025 Reinforcement Learning Conference
摘要:形态感知策略学习是一种通过聚合来自多个智能体的数据来提高策略样本效率的手段。此类策略此前已被证明有助于在智能体形态之间对动力学、运动学和肢体配置的变化进行泛化。不幸的是,与部署时在目标形态上进行端到端微调相比,这些策略的zero-shot性能仍然次优。这一限制会影响机器人等实际应用,因为为进行端到端微调而进一步收集数据可能在计算上代价高昂。在这项工作中,我们研究将形态感知预训练与参数高效微调(PEFT)技术相结合,以减少将形态感知策略专门化到目标实体所需的可学习参数。我们比较了直接微调部分模型权重、输入可学习适配器以及前缀微调等在线微调技术。我们的分析表明,与从头开始端到端训练模型相比,PEFT技术与策略预训练相结合通常有助于减少改进策略所需的样本数量。我们进一步发现,仅微调不到1%的总参数,即可使策略性能超过基础预训练策略的zero-shot性能。
摘要:Morphology-aware policy learning is a means of enhancing policy sample efficiency by aggregating data from multiple agents. These types of policies have previously been shown to help generalize over dynamic, kinematic, and limb configuration variations between agent morphologies. Unfortunately, these policies still have sub-optimal zero-shot performance compared to end-to-end finetuning on morphologies at deployment. This limitation has ramifications in practical applications such as robotics because further data collection to perform end-to-end finetuning can be computationally expensive. In this work, we investigate combining morphology-aware pretraining with parameter efficient finetuning (PEFT) techniques to help reduce the learnable parameters necessary to specialize a morphology-aware policy to a target embodiment. We compare directly tuning sub-sets of model weights, input learnable adapters, and prefix tuning techniques for online finetuning. Our analysis reveals that PEFT techniques in conjunction with policy pre-training generally help reduce the number of samples to necessary to improve a policy compared to training models end-to-end from scratch. We further find that tuning as few as less than 1% of total parameters will improve policy performance compared the zero-shot performance of the base pretrained a policy.


【6】Pair Correlation Factor and the Sample Complexity of Gaussian Mixtures
标题:对相关因子和高斯混合的样本复杂度
链接:https://arxiv.org/abs/2508.03633

作者:yan
备注:21 pages, no figures
摘要:我们研究学习高斯混合模型(GMM)的问题,并提出:哪些结构特性决定了其样本复杂度?先前的工作在很大程度上将这种复杂度与各分量之间的最小成对间隔联系在一起,但我们证明这种观点并不完整。   我们引入了成对相关因子(PCF),这是一个捕获各分量均值聚集程度的几何量。与最小间隔不同,PCF能更准确地刻画参数恢复的难度。   在均匀球形情形下,我们给出了一个具有改进样本复杂度界的算法,并说明了何时需要超过通常的$\epsilon^{-2}$个样本。
摘要:We study the problem of learning Gaussian Mixture Models (GMMs) and ask: which structural properties govern their sample complexity? Prior work has largely tied this complexity to the minimum pairwise separation between components, but we demonstrate this view is incomplete.   We introduce the \emph{Pair Correlation Factor} (PCF), a geometric quantity capturing the clustering of component means. Unlike the minimum gap, the PCF more accurately dictates the difficulty of parameter recovery.   In the uniform spherical case, we give an algorithm with improved sample complexity bounds, showing when more than the usual $\epsilon^{-2}$ samples are necessary.


【7】Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
标题:Goedel-Prover-V2:利用支架数据合成和自纠正证明标度形式定理
链接:https://arxiv.org/abs/2508.03613

作者: Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, Jiayun Wu, Jiri Gesi, Ximing Lu, David Acuna, Kaiyu Yang, Hongzhou Lin, Yejin Choi, Danqi Chen, Sanjeev Arora, Chi Jin
备注:24 pages, 10 figures, 4 tables
摘要:我们介绍Goedel-Prover-V2,这是一系列在自动定理证明领域创造新纪录的开源语言模型。基于标准的专家迭代与强化学习流程,我们的方法包含三项关键创新:(1)脚手架式数据合成:我们生成难度递增的合成任务,训练模型掌握日益复杂的定理;(2)验证器引导的自我纠正:我们利用Lean编译器的反馈,使模型能够迭代修改其证明;(3)模型平均:我们合并模型检查点,以缓解训练后期模型输出多样性的下降。我们的小型模型Goedel-Prover-V2-8B在MiniF2F上达到84.6%的pass@32,尽管体积小80倍,但在相同指标下优于DeepSeek-Prover-V2-671B。我们的旗舰模型Goedel-Prover-V2-32B在MiniF2F上,标准模式下pass@32达到88.1%,自我纠正模式下达到90.4%,大幅超越此前的SOTA。此外,我们的旗舰模型在PutnamBench上以pass@184解决了86个问题,在排行榜的开源模型中位列第一,以明显更小的模型规模和计算预算超过了DeepSeek-Prover-V2-671B以pass@1024解决47个问题的记录。在其发布时(2025年7月至8月),Goedel-Prover-V2在所有开源定理证明器中实现了最强的整体性能;在受限的测试时计算预算下,它也是性能最好的模型之一(包括与公开报告性能的闭源系统相比)。我们的模型、代码和数据发布于https://github.com/Goedel-LM/Goedel-Prover-V2。
摘要 :We introduce Goedel-Prover-V2, a series of open-source language models that set a new state-of-the-art in automated theorem proving. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1) Scaffolded data synthesis: We generate synthetic tasks of increasing difficulty to train the model to master increasingly complex theorems; (2) Verifier-guided self-correction: We enable the model to iteratively revise its proofs by leveraging feedback from the Lean compiler; (3) Model averaging: We merge model checkpoints to mitigate the decrease in model output diversity in later stages of training. Our small model, Goedel-Prover-V2-8B, reaches 84.6% pass@32 on MiniF2F and outperforms DeepSeek-Prover-V2-671B under the same metric, despite being 80X smaller. Our flagship model, Goedel-Prover-V2-32B, achieves 88.1% on MiniF2F at pass@32 in standard mode and 90.4% in self-correction mode, outperforming prior SOTA by a large margin. Additionally, our flagship model solves 86 problems on PutnamBench at pass@184, securing the first place among open-source models on the leaderboard, surpassing DeepSeek-Prover-V2-671B's record of solving 47 problems by pass@1024 with a significantly smaller model size and compute budget. At the time of its release (July-August 2025), Goedel-Prover-V2 achieves the strongest overall performance among all open-source theorem provers. It also ranks among the top-performing models--including closed-source systems with publicly reported performance--under a constrained test-time compute budget. Our models, code, and data are released at https://github.com/Goedel-LM/Goedel-Prover-V2.


【8】On the (In)Significance of Feature Selection in High-Dimensional Datasets
标题:论多维数据集中特征选择的(内在)重要性
链接:https://arxiv.org/abs/2508.03593

作者:eekhra, Debayan Gupta, Partha Pratim Chakravarti
备注:submitted to Nature Computational Science (double-blind review in progress). supplementary material included in pdf; anonymized code at: this https URL
摘要:针对高维数据集的特征选择(FS)算法已有大量研究,旨在提高模型性能、降低计算成本并识别感兴趣的特征。我们检验零假设:用随机选择的特征与FS算法选择的特征进行比较,以验证后者的性能。我们的结果表明,在分类任务中,对高维数据集(特别是基因表达数据)进行FS并无用处。我们发现:(1)在随机选择的小特征子集(占全部特征的0.02%-1%)上训练的模型,几乎总能取得与使用全部特征训练的模型相当的性能;(2)"典型"规模的随机子集所提供的性能,与多项已发表研究中选出的前k个特征相当甚至更好。因此,我们的工作对高维数据集上的许多特征选择结果提出了挑战,特别是在计算基因组学领域。它对那些在未经湿实验室进一步验证的情况下,基于计算选择的基因提出药物设计或靶向干预的研究提出了严重关切。
摘要:Extensive research has been done on feature selection (FS) algorithms for high-dimensional datasets aiming to improve model performance, reduce computational cost and identify features of interest. We test the null hypothesis of using randomly selected features to compare against features selected by FS algorithms to validate the performance of the latter. Our results show that FS on high-dimensional datasets (in particular gene expression) in classification tasks is not useful. We find that (1) models trained on small subsets (0.02%-1% of all features) of randomly selected features almost always perform comparably to those trained on all features, and (2) a "typical"- sized random subset provides comparable or superior performance to that of top-k features selected in various published studies. Thus, our work challenges many feature selection results on high dimensional datasets, particularly in computational genomics. It raises serious concerns about studies that propose drug design or targeted interventions based on computationally selected genes, without further validation in a wet lab.


【9】Zero-Variance Gradients for Variational Autoencoders
标题:变分自动编码器的零方差要素
链接:https://arxiv.org/abs/2508.03587

作者:o, Anji Liu, Guy Van den Broeck
摘要:训练变分自动编码器(VAE)等深度生成模型,常常受制于需要通过潜变量的随机采样来反向传播梯度,这一过程本身会引入估计方差,从而减慢收敛并降低性能。在本文中,我们提出一个回避该问题的新视角,称之为静默梯度(Silent Gradients)。我们不是改进随机估计器,而是利用特定的解码器架构解析地计算期望ELBO,从而得到方差为零的梯度。我们首先为该方法提供理论基础,并在线性解码器的受控设置中证明其优于现有估计器。为了将我们的方法推广到使用复杂、富有表现力的解码器的实际场景,我们引入了一种新的训练动态:在编码器训练的早期阶段使用精确的零方差梯度进行引导,随后退火到标准的随机估计器。实验表明,该技术在多个数据集上持续改进了已有基线的性能,包括重参数化、Gumbel-Softmax和REINFORCE。这项工作将解析计算的稳定性与深度非线性架构的表达力相结合,为训练生成模型开辟了新的方向。
摘要:Training deep generative models like Variational Autoencoders (VAEs) is often hindered by the need to backpropagate gradients through the stochastic sampling of their latent variables, a process that inherently introduces estimation variance, which can slow convergence and degrade performance. In this paper, we propose a new perspective that sidesteps this problem, which we call Silent Gradients. Instead of improving stochastic estimators, we leverage specific decoder architectures to analytically compute the expected ELBO, yielding a gradient with zero variance. We first provide a theoretical foundation for this method and demonstrate its superiority over existing estimators in a controlled setting with a linear decoder. To generalize our approach for practical use with complex, expressive decoders, we introduce a novel training dynamic that uses the exact, zero-variance gradient to guide the early stages of encoder training before annealing to a standard stochastic estimator. Our experiments show that this technique consistently improves the performance of established baselines, including reparameterization, Gumbel-Softmax, and REINFORCE, across multiple datasets. This work opens a new direction for training generative models by combining the stability of analytical computation with the expressiveness of deep, nonlinear architecture.
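作为摘要中"线性解码器受控设置"的一个示意:当近似后验为对角高斯 $q(\boldsymbol{z}\mid\boldsymbol{x})=\mathcal{N}(\boldsymbol{\mu},\operatorname{diag}(\boldsymbol{\sigma}^2))$、解码器为线性映射 $\boldsymbol{z}\mapsto W\boldsymbol{z}+\boldsymbol{b}$ 且重构项为平方误差时,期望重构项有如下闭式(这是高斯分布下二次型期望的标准恒等式;论文设置的细节以原文为准),因此无需采样即可得到零方差梯度:
\[
\mathbb{E}_{\boldsymbol{z}\sim\mathcal{N}(\boldsymbol{\mu},\operatorname{diag}(\boldsymbol{\sigma}^2))}\big[\|\boldsymbol{x} - W\boldsymbol{z} - \boldsymbol{b}\|^2\big]
= \|\boldsymbol{x} - W\boldsymbol{\mu} - \boldsymbol{b}\|^2 + \sum_{j}\sigma_j^2\,\|W_{:,j}\|^2 .
\]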


【10】Vision-based Perception System for Automated Delivery Robot-Pedestrians Interactions
标题:用于自动送货机器人与行人互动的基于视觉的感知系统
链接:https://arxiv.org/abs/2508.03541

作者:e, Bilal Farooq
备注:None
摘要:将自动送货机器人(ADR)集成到行人密集的城市空间中,在安全、高效且被社会接受的导航方面带来了独特的挑战。我们开发了基于单一视觉传感器的完整流水线,用于多行人检测与跟踪、姿态估计和单目深度感知。本研究利用真实世界的MOT17数据集序列,展示了整合人体姿态估计与深度线索如何增强行人轨迹预测和身份维持,即使在遮挡和人群密集的情况下也是如此。结果显示出可度量的改进,包括身份保持(IDF1)最高提升10%、多目标跟踪准确度(MOTA)提升7%,并且即使在具有挑战性的场景中检测精度也始终超过85%。值得注意的是,该系统能够识别弱势行人群体,从而支持更具社会意识和包容性的机器人行为。
摘要:The integration of Automated Delivery Robots (ADRs) into pedestrian-heavy urban spaces introduces unique challenges in terms of safe, efficient, and socially acceptable navigation. We develop the complete pipeline for a single vision sensor based multi-pedestrian detection and tracking, pose estimation, and monocular depth perception. Leveraging the real-world MOT17 dataset sequences, this study demonstrates how integrating human-pose estimation and depth cues enhances pedestrian trajectory prediction and identity maintenance, even under occlusions and dense crowds. Results show measurable improvements, including up to a 10% increase in identity preservation (IDF1), a 7% improvement in multiobject tracking accuracy (MOTA), and consistently high detection precision exceeding 85%, even in challenging scenarios. Notably, the system identifies vulnerable pedestrian groups supporting more socially aware and inclusive robot behaviour.


【11】MoKA: Mixture of Kronecker Adapters
标题:MoKA:克罗内克适配器的混合体
链接:https://arxiv.org/abs/2508.03527

作者:eza Sadeghi, Mahsa Ghazvini Nejad, MirHamed Jafarzadeh Asl, Yu Gu, Yuanhao Yu, Masoud Asgharian, Vahid Partovi Nia
摘要:参数高效微调(PEFT)对于降低大型语言模型(LLM)的计算开销至关重要。低秩类适配器常被用于在保持LLM生成能力的同时高效控制参数规模。然而,由于秩约束,其有限的表达能力往往限制了其在复杂任务上的表现。我们提出克罗内克适配器混合(MoKA),这是新一代克罗内克适配器,通过将权重更新建模为克罗内克积的混合来解决这一限制。我们提出的适配器利用门控机制来衡量每个克罗内克因子的重要性,从而实现更具表达力的适配。此外,MoKA具备秩的灵活性,在参数效率与准确率之间提供了更好的权衡。为确保硬件效率,我们使用标准矩阵运算重新表述克罗内克计算,使其能够无缝部署在GPU优化的硬件上。我们使用低比特量化版本的LLaMA2-7B和LLaMA3-8B模型,在指令微调和常识推理任务上进行了大量实验。MoKA不仅优于PEFT基线,还将可训练参数数量减少多达27倍,在性能与参数效率之间实现了最先进的权衡。
摘要:Parameter-efficient fine-tuning (PEFT) is essential for reducing the computational overhead of large language models (LLMs). Low-rank family adapters are commonly used to control the parameter size efficiently while maintaining the generative power of LLMs. However, their limited expressiveness due to the rank constraint often restricts their performance on complex tasks. We propose Mixture of Kronecker Adapters (MoKA), a new generation of Kronecker adapters that addresses this limitation by modeling weight updates as a mixture of Kronecker products. Our proposed adapter leverages a gating mechanism that measures the importance of each Kronecker factor, enabling more expressive adaptation. Moreover, MoKA enables a rank flexibility that provides a better trade-off between parameter efficiency and accuracy. To ensure hardware efficiency, we reformulate Kronecker computations using standard matrix operations, allowing seamless deployment on GPU-optimized hardware. We conduct extensive experiments on instruction-tuning and commonsense reasoning tasks using low-bit quantized versions of LLaMA2-7B and LLaMA3-8B models. MoKA not only outperforms PEFT baselines, but also reduces the number of trainable parameters up to 27x, achieving state-of-the-art trade-offs between performance and parameter efficiency.
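下面是按摘要描述写出的最小示意(PyTorch):将权重更新建模为带门控的克罗内克积混合 $\Delta W=\sum_i g_i\,(A_i\otimes B_i)$,叠加到冻结的基座权重上;因子形状、门控形式与初始化均为示例性假设,并非论文的具体实现。

import torch
import torch.nn as nn

class KroneckerMixtureAdapter(nn.Module):
    # Delta W = sum_i softmax(g)_i * kron(A_i, B_i), added to a frozen base weight.
    def __init__(self, out_features, in_features, num_factors=4, a_rows=8, a_cols=8):
        super().__init__()
        assert out_features % a_rows == 0 and in_features % a_cols == 0
        b_rows, b_cols = out_features // a_rows, in_features // a_cols
        self.A = nn.Parameter(torch.randn(num_factors, a_rows, a_cols) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_factors, b_rows, b_cols))  # zero-init: no change at start
        self.gate = nn.Parameter(torch.zeros(num_factors))               # learned importance per factor

    def delta_weight(self):
        g = torch.softmax(self.gate, dim=0)
        return sum(g[i] * torch.kron(self.A[i], self.B[i]) for i in range(self.A.shape[0]))

    def forward(self, x, base_weight, base_bias=None):
        w = base_weight + self.delta_weight()
        return torch.nn.functional.linear(x, w, base_bias)

# Toy usage on a frozen 64x64 linear layer.
base = torch.randn(64, 64)
adapter = KroneckerMixtureAdapter(64, 64)
y = adapter(torch.randn(2, 64), base)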


【12】An Auditable Agent Platform For Automated Molecular Optimisation
标题:用于自动分子优化的可审核代理平台
链接:https://arxiv.org/abs/2508.03444

作者:lü, Phil Rohr, Ahmet Celebi
摘要:当数据、专业知识和工具分散时,药物发现往往会失去动力,从而拖慢设计周期。为了缩短这一循环,我们构建了一个分层的、会使用工具的代理框架,用于自动化分子优化。首席研究员代理定义每个目标,数据库代理检索靶点信息,AI专家代理使用序列到分子的深度学习模型生成从头(de novo)支架,药物化学家代理在调用对接工具的同时对其进行编辑,排序代理对候选分子进行评分,科学评论家代理负责审查逻辑。每次工具调用都会被总结并存储,使完整的推理路径保持可检查。各代理通过捕获分子谱系的简明溯源记录进行通信,以构建可审计的、以分子为中心的推理轨迹,并通过上下文学习重用成功的变换。我们使用五个大型语言模型,针对AKT1蛋白运行了三轮研究循环。按平均对接得分对模型排名后,我们对两个表现最好的模型各进行了20次独立的扩展运行,随后在仅LLM、单代理和多代理三种配置下比较了领先LLM的结合亲和力结果。我们的结果揭示了一种架构上的权衡:多代理设置擅长聚焦的结合优化,使平均预测结合亲和力提高了31%;相比之下,单代理运行生成的分子具有更优的类药性质,但代价是结合得分较弱。无引导的LLM运行完成得最快,但由于缺乏透明的工具信号,其推理路径的有效性未经验证。这些结果表明,测试时扩展、聚焦的反馈回路和溯源记录可以将通用LLM转化为可审计的分子设计系统,并表明将工具集扩展到ADMET和选择性预测器可以进一步推动研究工作流沿发现管线前进。
摘要:Drug discovery frequently loses momentum when data, expertise, and tools are scattered, slowing design cycles. To shorten this loop we built a hierarchical, tool using agent framework that automates molecular optimisation. A Principal Researcher defines each objective, a Database agent retrieves target information, an AI Expert generates de novo scaffolds with a sequence to molecule deep learning model, a Medicinal Chemist edits them while invoking a docking tool, a Ranking agent scores the candidates, and a Scientific Critic polices the logic. Each tool call is summarised and stored causing the full reasoning path to remain inspectable. The agents communicate through concise provenance records that capture molecular lineage, to build auditable, molecule centered reasoning trajectories and reuse successful transformations via in context learning. Three cycle research loops were run against AKT1 protein using five large language models. After ranking the models by mean docking score, we ran 20 independent scale ups on the two top performers. We then compared the leading LLMs' binding affinity results across three configurations, LLM only, single agent, and multi agent. Our results reveal an architectural trade off, the multi agent setting excelled at focused binding optimization, improving average predicted binding affinity by 31%. In contrast, single agent runs generated molecules with superior drug like properties at the cost of less potent binding scores. Unguided LLM runs finished fastest, yet their lack of transparent tool signals left the validity of their reasoning paths unverified. These results show that test time scaling, focused feedback loops and provenance convert general purpose LLMs into auditable systems for molecular design, and suggest that extending the toolset to ADMET and selectivity predictors could push research workflows further along the discovery pipeline.


【13】Residual Neural Terminal Constraint for MPC-based Collision Avoidance in Dynamic Environments
标题:动态环境下基于MPC的碰撞避免的剩余神经终端约束
链接:https://arxiv.org/abs/2508.03428

作者:ajić, Mohamed-Khalil Bouzidi, Sebastian Bernhard, Wolfgang Hönig
摘要:在本文中,我们提出了一种混合MPC局部规划器,它使用由局部观测得到的、基于学习的时变安全集近似,并将其用作MPC终端约束。该安全集可以表示为通过Hamilton-Jacobi(HJ)可达性分析计算的值函数的零超水平集,而HJ可达性分析在实时条件下是不可行的。我们利用HJ值函数可以表示为相应的符号距离函数(SDF)与一个非负残差函数之差这一性质。残差分量被建模为输出非负的神经网络,并从计算得到的SDF中减去,从而得到一个在设计上至少与SDF一样安全的实时值函数估计。此外,我们用超网络对神经残差进行参数化,以提高实时性能和泛化能力。在仿真和硬件实验中,我们将所提方法与三种最先进的方法进行比较,在计算开销相近的情况下,成功率比最佳基线最高提升30%,并且产生了高质量(低行驶时间)的解。
摘要:In this paper, we propose a hybrid MPC local planner that uses a learning-based approximation of a time-varying safe set, derived from local observations and applied as the MPC terminal constraint. This set can be represented as a zero-superlevel set of the value function computed via Hamilton-Jacobi (HJ) reachability analysis, which is infeasible in real-time. We exploit the property that the HJ value function can be expressed as a difference of the corresponding signed distance function (SDF) and a non-negative residual function. The residual component is modeled as a neural network with non-negative output and subtracted from the computed SDF, resulting in a real-time value function estimate that is at least as safe as the SDF by design. Additionally, we parametrize the neural residual by a hypernetwork to improve real-time performance and generalization properties. The proposed method is compared with three state-of-the-art methods in simulations and hardware experiments, achieving up to 30\% higher success rates compared to the best baseline while requiring a similar computational effort and producing high-quality (low travel-time) solutions.
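下面是按摘要中"值函数 = SDF 减去非负残差"这一构造给出的最小示意(PyTorch):通过非负输出的残差网络保证估计值不超过SDF,因而在设计上至少与SDF一样保守;网络结构为示例性简化,超网络部分省略。

import torch
import torch.nn as nn

class SafeValueEstimate(nn.Module):
    # V_hat(x) = SDF(x) - residual(x), with residual(x) >= 0, so V_hat <= SDF by construction.
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),   # non-negative output
        )

    def forward(self, x, sdf_values):
        # sdf_values: signed distance computed from local observations.
        return sdf_values - self.residual(x).squeeze(-1)

# Toy usage
model = SafeValueEstimate(state_dim=4)
x = torch.randn(10, 4)
sdf = torch.rand(10)
v_hat = model(x, sdf)                      # v_hat <= sdf elementwise
assert torch.all(v_hat <= sdf + 1e-6)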


【14】Sparsity and Total Variation Constrained Multilayer Linear Unmixing for Hyperspectral Imagery
标题:高光谱图像的稀疏性和全变差约束多层线性解混合
链接:https://arxiv.org/abs/2508.03403

作者
摘要:高光谱解混旨在估计物质光谱特征(称为端元)及其相应比例(称为丰度),这是各种高光谱图像应用中的关键预处理步骤。本研究针对高光谱图像提出了一种新方法,称为稀疏与全变差(TV)约束的多层线性解混(STVMLU)。具体而言,该方法基于多层矩阵分解模型,为提高解混精度,引入TV约束以考虑相邻空间相似性;此外,采用L1/2范数稀疏约束来有效刻画丰度矩阵的稀疏性。为优化STVMLU模型,采用交替方向乘子法(ADMM),从而能够同时提取端元及其对应的丰度矩阵。实验结果表明,与其他算法相比,所提出的STVMLU具有更优的性能。
摘要 :Hyperspectral unmixing aims at estimating material signatures (known as endmembers) and the corresponding proportions (referred to abundances), which is a critical preprocessing step in various hyperspectral imagery applications. This study develops a novel approach called sparsity and total variation (TV) constrained multilayer linear unmixing (STVMLU) for hyperspectral imagery. Specifically, based on a multilayer matrix factorization model, to improve the accuracy of unmixing, a TV constraint is incorporated to consider adjacent spatial similarity. Additionally, a L1/2-norm sparse constraint is adopted to effectively characterize the sparsity of the abundance matrix. For optimizing the STVMLU model, the method of alternating direction method of multipliers (ADMM) is employed, which allows for the simultaneous extraction of endmembers and their corresponding abundance matrix. Experimental results illustrate the enhanced performance of the proposed STVMLU when compared to other algorithms.


【15】Software Fairness Dilemma: Is Bias Mitigation a Zero-Sum Game?
标题:软件公平困境:缓解偏见是零和游戏吗?
链接:https://arxiv.org/abs/2508.03323

作者:Chen, Xinyue Li, Jie M. Zhang, Weisong Sun, Ying Xiao, Tianlin Li, Yiling Lou, Yang Liu
备注:Accepted by the ACM International Conference on the Foundations of Software Engineering (FSE 2025)
摘要:公平性是机器学习(ML)软件的一项关键要求,推动了众多偏见缓解方法的发展。先前的研究已在计算机视觉和自然语言处理任务的偏见缓解中发现了"向下拉平"(leveling-down)效应,即通过降低所有群体的表现来实现公平,而弱势群体并未从中受益。然而,这种效应是否也适用于表格数据任务的偏见缓解尚不清楚,而表格数据是公平性研究的一个关键领域,具有重要的现实应用。本研究使用五个真实世界数据集和四种常见的ML模型,在44项任务上评估了八种面向表格数据的偏见缓解方法,既包括广泛使用的方法,也包括前沿方法。与早期发现相反,我们的结果表明,这些方法以零和的方式运作:非特权群体的改善伴随着传统特权群体收益的减少。然而,先前的研究表明,对零和权衡的感知可能会使公平政策的更广泛采用变得复杂。为了探索替代方案,我们研究了一种仅对非特权群体应用最先进偏见缓解方法的做法,结果显示它有潜力在不损害特权群体或整体ML性能的情况下增进非特权群体的利益。我们的研究突出了在没有零和权衡的情况下实现公平性改进的潜在途径,这可能有助于推动偏见缓解方法的采用。
摘要:Fairness is a critical requirement for Machine Learning (ML) software, driving the development of numerous bias mitigation methods. Previous research has identified a leveling-down effect in bias mitigation for computer vision and natural language processing tasks, where fairness is achieved by lowering performance for all groups without benefiting the unprivileged group. However, it remains unclear whether this effect applies to bias mitigation for tabular data tasks, a key area in fairness research with significant real-world applications. This study evaluates eight bias mitigation methods for tabular data, including both widely used and cutting-edge approaches, across 44 tasks using five real-world datasets and four common ML models. Contrary to earlier findings, our results show that these methods operate in a zero-sum fashion, where improvements for unprivileged groups are related to reduced benefits for traditionally privileged groups. However, previous research indicates that the perception of a zero-sum trade-off might complicate the broader adoption of fairness policies. To explore alternatives, we investigate an approach that applies the state-of-the-art bias mitigation method solely to unprivileged groups, showing potential to enhance benefits of unprivileged groups without negatively affecting privileged groups or overall ML performance. Our study highlights potential pathways for achieving fairness improvements without zero-sum trade-offs, which could help advance the adoption of bias mitigation methods.


【16】Strategic Hypothesis Testing
标题:战略假设测试
链接:https://arxiv.org/abs/2508.03289

作者:ssain, Yatong Chen, Yiling Chen
摘要:我们在委托-代理框架内研究假设检验:一个对产品有效性持有私人信念的战略代理人向委托人提交数据,由委托人决定是否批准。委托人采用假设检验规则,旨在选择一个平衡假阳性与假阴性的p值阈值,同时预判代理人最大化预期收益的激励。在先前工作的基础上,我们建立了一个博弈论模型,刻画代理人的参与和报告行为如何响应委托人的统计决策规则。尽管交互复杂,我们证明:当以一个可高效计算的临界p值阈值进行分段时,委托人的错误表现出清晰的单调行为,从而得到其最优p值阈值的可解释刻画。我们使用公开的药物批准数据对模型及这些洞见进行了实证验证。总体而言,我们的工作为假设检验框架内的战略互动提供了一个全面的视角,带来了技术和监管层面的启示。
摘要:We examine hypothesis testing within a principal-agent framework, where a strategic agent, holding private beliefs about the effectiveness of a product, submits data to a principal who decides on approval. The principal employs a hypothesis testing rule, aiming to pick a p-value threshold that balances false positives and false negatives while anticipating the agent's incentive to maximize expected profitability. Building on prior work, we develop a game-theoretic model that captures how the agent's participation and reporting behavior respond to the principal's statistical decision rule. Despite the complexity of the interaction, we show that the principal's errors exhibit clear monotonic behavior when segmented by an efficiently computable critical p-value threshold, leading to an interpretable characterization of their optimal p-value threshold. We empirically validate our model and these insights using publicly available data on drug approvals. Overall, our work offers a comprehensive perspective on strategic interactions within the hypothesis testing framework, providing technical and regulatory insights.


【17】The alpha-beta divergence for real and complex data
标题:真实和复杂数据的阿尔法-贝塔分歧
链接:https://arxiv.org/abs/2508.03272

作者:uces
摘要:Divergences are fundamental to the information criteria that underpin most signal processing algorithms. The alpha-beta family of divergences, designed for non-negative data, offers a versatile framework that parameterizes and continuously interpolates several separable divergences found in existing literature. This work extends the definition of alpha-beta divergences to accommodate complex data, specifically when the arguments of the divergence are complex vectors. This novel formulation is designed in such a way that, by setting the divergence hyperparameters to unity, it particularizes to the well-known Euclidean and Mahalanobis squared distances. Other choices of hyperparameters yield practical separable and non-separable extensions of several classical divergences. In the context of the problem of approximating a complex random vector, the centroid obtained by optimizing the alpha-beta mean distortion has a closed-form expression, which interpretation sheds light on the distinct roles of the divergence hyperparameters. These contributions may have wide potential applicability, as there are many signal processing domains in which the underlying data are inherently complex.


【18】On Conformal Machine Unlearning
标题:论共形机器去学习
链接:https://arxiv.org/abs/2508.03245

作者:hatib, Wee Peng Tay
摘要 :The increasing demand for data privacy, driven by regulations such as GDPR and CCPA, has made Machine Unlearning (MU) essential for removing the influence of specific training samples from machine learning models while preserving performance on retained data. However, most existing MU methods lack rigorous statistical guarantees, rely on heuristic metrics, and often require computationally expensive retraining baselines. To overcome these limitations, we introduce a new definition for MU based on Conformal Prediction (CP), providing statistically sound, uncertainty-aware guarantees without the need for the concept of naive retraining. We formalize conformal criteria that quantify how often forgotten samples are excluded from CP sets, and propose empirical metrics,the Efficiently Covered Frequency (ECF at c) and its complement, the Efficiently Uncovered Frequency (EuCF at d), to measure the effectiveness of unlearning. We further present a practical unlearning method designed to optimize these conformal metrics. Extensive experiments across diverse forgetting scenarios, datasets and models demonstrate the efficacy of our approach in removing targeted data.
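下面是一个示意性的实现草图:用标准的分裂共形预测构造预测集,并统计被遗忘样本的真实标签被排除在预测集之外的频率,作为摘要中ECF/EuCF类指标的一种简化理解;具体定义与阈值细节以原文为准。

import numpy as np

def split_conformal_sets(cal_scores, test_scores, alpha=0.1):
    # cal_scores: (n_cal,) nonconformity scores of true labels on a calibration set.
    # test_scores: (n_test, n_classes) nonconformity scores for each candidate label.
    n = len(cal_scores)
    q = np.quantile(cal_scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return test_scores <= q          # boolean prediction-set membership per class

def exclusion_frequency(forget_scores, forget_labels, cal_scores, alpha=0.1):
    # Fraction of forgotten samples whose true label is *excluded* from the conformal set.
    sets = split_conformal_sets(cal_scores, forget_scores, alpha)
    included = sets[np.arange(len(forget_labels)), forget_labels]
    return 1.0 - included.mean()

# Toy usage with random scores (lower score = more conforming).
rng = np.random.default_rng(0)
cal = rng.random(500)
forget = rng.random((100, 10))
labels = rng.integers(0, 10, size=100)
print(exclusion_frequency(forget, labels, cal))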


【19】Revisiting Deep Information Propagation: Fractal Frontier and Finite-size Effects
标题:重温深度信息传播:分数前沿和伪大小效应
链接:https://arxiv.org/abs/2508.03222

作者:Alessio D'Inverno, Zhiyuan Hu, Leo Davy, Michael Unser, Gianluigi Rozza, Jonathan Dong
备注:17 pages
摘要:Information propagation characterizes how input correlations evolve across layers in deep neural networks. This framework has been well studied using mean-field theory, which assumes infinitely wide networks. However, these assumptions break down for practical, finite-size networks. In this work, we study information propagation in randomly initialized neural networks with finite width and reveal that the boundary between ordered and chaotic regimes exhibits a fractal structure. This shows the fundamental complexity of neural network dynamics, in a setting that is independent of input data and optimization. To extend this analysis beyond multilayer perceptrons, we leverage recently introduced Fourier-based structured transforms, and show that information propagation in convolutional neural networks also follow the same behavior. Our investigation highlights the importance of finite network depth with respect to the tradeoff between separation and robustness.
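作为"有序/混沌区域"概念的直观示意(并非论文计算分形边界的方法):在随机初始化的有限宽度MLP中,沿深度跟踪两个邻近输入之间距离的变化,距离收缩对应有序相、扩张对应混沌相,权重方差 sigma_w 控制所处的相。

import numpy as np

def distance_trajectory(sigma_w, sigma_b=0.05, width=256, depth=30, eps=1e-3, seed=0):
    # Propagate two nearby inputs through a random tanh MLP and record their distance per layer.
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=width)
    x2 = x1 + eps * rng.normal(size=width)
    dists = []
    for _ in range(depth):
        W = rng.normal(scale=sigma_w / np.sqrt(width), size=(width, width))
        b = rng.normal(scale=sigma_b, size=width)
        x1, x2 = np.tanh(W @ x1 + b), np.tanh(W @ x2 + b)
        dists.append(np.linalg.norm(x1 - x2))
    return np.array(dists)

for sw in [0.5, 1.0, 2.0, 4.0]:
    d = distance_trajectory(sw)
    regime = "chaotic (distance grows)" if d[-1] > d[0] else "ordered (distance shrinks)"
    print(f"sigma_w={sw:.1f}: final/initial distance = {d[-1] / d[0]:.3f} -> {regime}")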


【20】Overcoming Algorithm Aversion with Transparency: Can Transparent Predictions Change User Behavior?
标题:用透明度克服算法厌恶:透明预测能改变用户行为吗?
链接:https://arxiv.org/abs/2508.03168

作者:len, Sven Kruschel, Julian Rosenberger, Patrick Zschech, Mathias Kraus
备注:Accepted at 20th International Conference on Wirtschaftsinformatik (WI25); September 2025, Münster, Germany
摘要:Previous work has shown that allowing users to adjust a machine learning (ML) model's predictions can reduce aversion to imperfect algorithmic decisions. However, these results were obtained in situations where users had no information about the model's reasoning. Thus, it remains unclear whether interpretable ML models could further reduce algorithm aversion or even render adjustability obsolete. In this paper, we conceptually replicate a well-known study that examines the effect of adjustable predictions on algorithm aversion and extend it by introducing an interpretable ML model that visually reveals its decision logic. Through a pre-registered user study with 280 participants, we investigate how transparency interacts with adjustability in reducing aversion to algorithmic decision-making. Our results replicate the adjustability effect, showing that allowing users to modify algorithmic predictions mitigates aversion. Transparency's impact appears smaller than expected and was not significant for our sample. Furthermore, the effects of transparency and adjustability appear to be more independent than expected.


【21】Achieving Limited Adaptivity for Multinomial Logistic Bandits
标题:实现多项逻辑盗贼的有限适应能力
链接:https://arxiv.org/abs/2508.03072

作者:rakash Midigeshi, Tanmay Goyal, Gaurav Sinha
备注:Accepted to RLC 2025
摘要:Multinomial Logistic Bandits have recently attracted much attention due to their ability to model problems with multiple outcomes. In this setting, each decision is associated with many possible outcomes, modeled using a multinomial logit function. Several recent works on multinomial logistic bandits have simultaneously achieved optimal regret and computational efficiency. However, motivated by real-world challenges and practicality, there is a need to develop algorithms with limited adaptivity, wherein we are allowed only $M$ policy updates. To address these challenges, we present two algorithms, B-MNL-CB and RS-MNL, that operate in the batched and rarely-switching paradigms, respectively. The batched setting involves choosing the $M$ policy update rounds at the start of the algorithm, while the rarely-switching setting can choose these $M$ policy update rounds in an adaptive fashion. Our first algorithm, B-MNL-CB extends the notion of distributional optimal designs to the multinomial setting and achieves $\tilde{O}(\sqrt{T})$ regret assuming the contexts are generated stochastically when presented with $\Omega(\log \log T)$ update rounds. Our second algorithm, RS-MNL works with adversarially generated contexts and can achieve $\tilde{O}(\sqrt{T})$ regret with $\tilde{O}(\log T)$ policy updates. Further, we conducted experiments that demonstrate that our algorithms (with a fixed number of policy updates) are extremely competitive (and often better) than several state-of-the-art baselines (which update their policy every round), showcasing the applicability of our algorithms in various practical scenarios.
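作为背景,带外部选项的多项logit(MNL)模型下各结果的概率通常写作如下形式(此为常见记号,具体的上下文与奖励定义以原文为准):
\[
\Pr(\text{outcome } k \mid \boldsymbol{x}) \;=\; \frac{\exp(\boldsymbol{\theta}_k^\top \boldsymbol{x})}{1 + \sum_{j=1}^{K}\exp(\boldsymbol{\theta}_j^\top \boldsymbol{x})},\qquad k=1,\dots,K,
\]
其中"无结果"(outcome 0)的概率为 $1/\big(1+\sum_{j=1}^{K} \exp(\boldsymbol{\theta}_j^\top \boldsymbol{x})\big)$。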


【22】TF-MLPNet: Tiny Real-Time Neural Speech Separation
标题:TF-MLPNet:微小的实时神经语音分离
链接:https://arxiv.org/abs/2508.03047

作者:ni, Tuochao Chen, Shyamnath Gollakota
备注:The 6th Clarity Workshop on Improving Speech-in-Noise for Hearing Devices (Clarity 2025)
摘要:Speech separation on hearable devices can enable transformative augmented and enhanced hearing capabilities. However, state-of-the-art speech separation networks cannot run in real-time on tiny, low-power neural accelerators designed for hearables, due to their limited compute capabilities. We present TF-MLPNet, the first speech separation network capable of running in real-time on such low-power accelerators while outperforming existing streaming models for blind speech separation and target speech extraction. Our network operates in the time-frequency domain, processing frequency sequences with stacks of fully connected layers that alternate along the channel and frequency dimensions, and independently processing the time sequence at each frequency bin using convolutional layers. Results show that our mixed-precision quantization-aware trained (QAT) model can process 6 ms audio chunks in real-time on the GAP9 processor, achieving a 3.5-4x runtime reduction compared to prior speech separation models.


【23】Where and How to Enhance: Discovering Bit-Width Contribution for Mixed Precision Quantization
标题:在哪里以及如何增强:发现混合精度量化的比特宽度贡献
链接:https://arxiv.org/abs/2508.03002

作者:ang, Lianbo Ma, Guo Yu, Shangce Gao
摘要:Mixed precision quantization (MPQ) is an effective quantization approach to achieve accuracy-complexity trade-off of neural network, through assigning different bit-widths to network activations and weights in each layer. The typical way of existing MPQ methods is to optimize quantization policies (i.e., bit-width allocation) in a gradient descent manner, termed as Differentiable (DMPQ). At the end of the search, the bit-width associated to the quantization parameters which has the largest value will be selected to form the final mixed precision quantization policy, with the implicit assumption that the values of quantization parameters reflect the operation contribution to the accuracy improvement. While much has been discussed about the MPQ improvement, the bit-width selection process has received little attention. We study this problem and argue that the magnitude of quantization parameters does not necessarily reflect the actual contribution of the bit-width to the task performance. Then, we propose a Shapley-based MPQ (SMPQ) method, which measures the bit-width operation direct contribution on the MPQ task. To reduce computation cost, a Monte Carlo sampling-based approximation strategy is proposed for Shapley computation. Extensive experiments on mainstream benchmarks demonstrate that our SMPQ consistently achieves state-of-the-art performance than gradient-based competitors.
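下面给出一个通用的蒙特卡洛置换式Shapley估计草图,用以说明"通过采样近似各层位宽操作对任务性能的贡献"这一思路;其中 evaluate() 为假设的占位函数(给定各层位宽配置返回验证精度),并非论文的具体实现。

import random

def mc_shapley(layers, evaluate, low_bit=4, high_bit=8, n_perm=50, seed=0):
    # Estimates each layer's contribution to accuracy when upgraded from low_bit to high_bit,
    # via Monte Carlo sampling over permutations (standard Shapley approximation).
    rng = random.Random(seed)
    contrib = {l: 0.0 for l in layers}
    for _ in range(n_perm):
        perm = layers[:]
        rng.shuffle(perm)
        config = {l: low_bit for l in layers}      # start every layer at the low bit-width
        prev = evaluate(config)
        for l in perm:
            config[l] = high_bit                   # upgrade layer l
            cur = evaluate(config)
            contrib[l] += (cur - prev) / n_perm    # marginal gain attributed to l
            prev = cur
    return contrib

# Toy usage with a synthetic evaluate() that favours upgrading "conv1" and "fc".
weights = {"conv1": 0.03, "conv2": 0.01, "fc": 0.02}
evaluate = lambda cfg: 0.7 + sum(w for l, w in weights.items() if cfg[l] == 8)
print(mc_shapley(list(weights), evaluate))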


【24】VCNet: Recreating High-Level Visual Cortex Principles for Robust Artificial Vision
标题:VNet:重建高级视觉皮质原理以实现稳健的人工视觉
链接:https://arxiv.org/abs/2508.02995

作者:. Hill, Zhang Xinyu, Timothy Putra Prasetio
摘要:Despite their success in image classification, modern convolutional neural networks (CNNs) exhibit fundamental limitations, including data inefficiency, poor out-of-distribution generalization, and vulnerability to adversarial perturbations. The primate visual system, in contrast, demonstrates superior efficiency and robustness, suggesting that its architectural principles may offer a blueprint for more capable artificial vision systems. This paper introduces Visual Cortex Network (VCNet), a novel neural network architecture whose design is informed by the macro-scale organization of the primate visual cortex. VCNet emulates key biological mechanisms, including hierarchical processing across distinct cortical areas, dual-stream information segregation, and top-down predictive feedback. We evaluate VCNet on two specialized benchmarks: the Spots-10 animal pattern dataset and a light field image classification task. Our results show that VCNet achieves a classification accuracy of 92.1\% on Spots-10 and 74.4\% on the light field dataset, surpassing contemporary models of comparable size. This work demonstrates that integrating neuroscientific principles into network design can lead to more efficient and robust models, providing a promising direction for addressing long-standing challenges in machine learning.


【25】Injecting Measurement Information Yields a Fast and Noise-Robust Diffusion-Based Inverse Problem Solver
标题:注入测量信息产生快速且具有噪音稳健性的基于扩散的反问题求解器
链接:https://arxiv.org/abs/2508.02964

作者:Patsenker, Henry Li, Myeongseob Ko, Ruoxi Jia, Yuval Kluger
摘要:Diffusion models have been firmly established as principled zero-shot solvers for linear and nonlinear inverse problems, owing to their powerful image prior and iterative sampling algorithm. These approaches often rely on Tweedie's formula, which relates the diffusion variate $\mathbf{x}_t$ to the posterior mean $\mathbb{E} [\mathbf{x}_0 | \mathbf{x}_t]$, in order to guide the diffusion trajectory with an estimate of the final denoised sample $\mathbf{x}_0$. However, this does not consider information from the measurement $\mathbf{y}$, which must then be integrated downstream. In this work, we propose to estimate the conditional posterior mean $\mathbb{E} [\mathbf{x}_0 | \mathbf{x}_t, \mathbf{y}]$, which can be formulated as the solution to a lightweight, single-parameter maximum likelihood estimation problem. The resulting prediction can be integrated into any standard sampler, resulting in a fast and memory-efficient inverse solver. Our optimizer is amenable to a noise-aware likelihood-based stopping criteria that is robust to measurement noise in $\mathbf{y}$. We demonstrate comparable or improved performance against a wide selection of contemporary inverse solvers across multiple datasets and tasks.
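作为摘要所引用的Tweedie公式的一个常见写法(此处采用常见的方差保持参数化 $\boldsymbol{x}_t=\sqrt{\bar\alpha_t}\,\boldsymbol{x}_0+\sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}$;该论文进一步估计的是条件后验均值 $\mathbb{E}[\boldsymbol{x}_0\mid\boldsymbol{x}_t,\boldsymbol{y}]$,其具体形式以原文为准):
\[
\mathbb{E}[\boldsymbol{x}_0 \mid \boldsymbol{x}_t] \;=\; \frac{\boldsymbol{x}_t + (1-\bar\alpha_t)\,\nabla_{\boldsymbol{x}_t}\log p_t(\boldsymbol{x}_t)}{\sqrt{\bar\alpha_t}} .
\]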


【26】Polymath: A Self-Optimizing Agent with Dynamic Hierarchical Workflow
标题:Polymath:具有动态分层工作流程的自优化代理
链接:https://arxiv.org/abs/2508.02959

作者: Ho, Jing Gong, Xufeng Yao, Yunsheng Bai, Abhishek B Akkur, Haoxing Ren
备注:18 pages, 12 figures, under review for AAAI2026
摘要:Large language models (LLMs) excel at solving complex tasks by executing agentic workflows composed of detailed instructions and structured operations. Yet, building general-purpose agents by manually embedding foundation models into agentic systems such as Chain-of-Thought, Self-Reflection, and ReACT through text interfaces limits scalability and efficiency. Recently, many researchers have sought to automate the generation and optimization of these workflows through code-based representations. However, existing methods often rely on labeled datasets to train and optimize workflows, making them ineffective and inflexible for solving real-world, dynamic problems where labeled data is unavailable. To address this challenge, we introduce Polymath, a self-optimizing agent with dynamic hierarchical workflow that leverages the flexibility of task flow graphs and the expressiveness of code-represented workflows to solve a wide range of real-world, dynamic problems. The proposed optimization methodology integrates multi-grid-inspired graph optimization with a self-reflection-guided evolutionary algorithm to refine workflows without labeled data. Experimental results on six benchmark datasets across coding, math, and multi-turn QA tasks show that Polymath achieves 8.1% average improvement over state-of-the-art baselines.


【27】Engineered over Emergent Communication in MARL for Scalable and Sample-Efficient Cooperative Task Allocation in a Partially Observable Grid
标题:在MARL中以工程化通信取代涌现通信,在部分可观察网格中实现可扩展且样本高效的协作任务分配
链接:https://arxiv.org/abs/2508.02912

作者:. Hill, Mant Koh En Wei, Thangavel Jishnuanandh
摘要:We compare the efficacy of learned versus engineered communication strategies in a cooperative multi-agent reinforcement learning (MARL) environment. For the learned approach, we introduce Learned Direct Communication (LDC), where agents generate messages and actions concurrently via a neural network. Our engineered approach, Intention Communication, employs an Imagined Trajectory Generation Module (ITGM) and a Message Generation Network (MGN) to formulate messages based on predicted future states. Both strategies are evaluated on their success rates in cooperative tasks under fully and partially observable conditions. Our findings indicate that while emergent communication is viable, the engineered approach demonstrates superior performance and scalability, particularly as environmental complexity increases.
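A hedged sketch of the engineered (intention-communication) pipeline as described: a forward model imagines a short future trajectory and a message network compresses it into a fixed-size message. Architectures and sizes are illustrative assumptions.

import torch
import torch.nn as nn

class IntentionCommunicator(nn.Module):
    """Stand-ins for the ITGM (a learned forward model rolled out a few steps)
    and the MGN (an MLP that turns the imagined states into a message)."""
    def __init__(self, obs_dim=16, hidden=32, msg_dim=8, horizon=3):
        super().__init__()
        self.horizon = horizon
        self.forward_model = nn.GRUCell(obs_dim, obs_dim)
        self.message_net = nn.Sequential(
            nn.Linear(obs_dim * horizon, hidden), nn.ReLU(), nn.Linear(hidden, msg_dim))

    def forward(self, obs):
        h, imagined = obs, []
        for _ in range(self.horizon):
            h = self.forward_model(obs, h)      # imagine the next latent state
            imagined.append(h)
        return self.message_net(torch.cat(imagined, dim=-1))

msg = IntentionCommunicator()(torch.randn(4, 16))  # one message per agent in the batch
print(msg.shape)  # torch.Size([4, 8])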


【28】Neural Approximators for Low-Thrust Trajectory Transfer Cost and Reachability
标题:低推力轨迹转移成本与可达性的神经逼近器
链接:https://arxiv.org/abs/2508.02911

作者:ng, Francesco Topputo
摘要:In trajectory design, fuel consumption and trajectory reachability are two key performance indicators for low-thrust missions. This paper proposes general-purpose pretrained neural networks to predict these metrics. The contributions of this paper are as follows: Firstly, based on the confirmation of the Scaling Law applicable to low-thrust trajectory approximation, the largest dataset is constructed using the proposed homotopy ray method, which aligns with mission-design-oriented data requirements. Secondly, the data are transformed into a self-similar space, enabling the neural network to adapt to arbitrary semi-major axes, inclinations, and central bodies. This extends the applicability beyond existing studies and can generalize across diverse mission scenarios without retraining. Thirdly, to the best of our knowledge, this work presents the current most general and accurate low-thrust trajectory approximator, with implementations available in C++, Python, and MATLAB. The resulting neural network achieves a relative error of 0.78% in predicting velocity increments and 0.63% in minimum transfer time estimation. The models have also been validated on a third-party dataset, multi-flyby mission design problem, and mission analysis scenario, demonstrating their generalization capability, predictive accuracy, and computational efficiency.
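A minimal sketch of the surrogate idea under stated assumptions: a small MLP maps nondimensionalized boundary features to a velocity-increment prediction; the feature layout, scaling rule, and layer sizes are placeholders rather than the released C++/Python/MATLAB models.

import torch
import torch.nn as nn

surrogate = nn.Sequential(              # transfer-cost regressor (illustrative sizes)
    nn.Linear(10, 128), nn.SiLU(),
    nn.Linear(128, 128), nn.SiLU(),
    nn.Linear(128, 1),
)

def nondimensionalize(features, a_ref):
    """Scale length-like inputs by a reference semi-major axis so one model can serve
    different orbits and central bodies (a simplified 'self-similar space' step)."""
    scaled = features.clone()
    scaled[:, 0] = features[:, 0] / a_ref   # assume column 0 holds a semi-major axis
    return scaled

x = nondimensionalize(torch.rand(32, 10), a_ref=1.5)
dv_pred = surrogate(x)                   # predicted Delta-v, shape (32, 1)
print(dv_pred.shape)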


【29】Physics-Embedded Neural ODEs for Sim2Real Edge Digital Twins of Hybrid Power Electronics Systems
标题:用于混合电力电子系统的Sim2Real Edge数字孪生的物理嵌入式神经ODE
链接:https://arxiv.org/abs/2508.02887

作者:eng, Haoyu Wang, Yangbin Zeng, Di Mou, Xin Zhang, Hong Li, Sergio Vazquez, Leopoldo G. Franquelo
摘要:Edge Digital Twins (EDTs) are crucial for monitoring and control of Power Electronics Systems (PES). However, existing modeling approaches struggle to consistently capture continuously evolving hybrid dynamics that are inherent in PES, degrading Sim-to-Real generalization on resource-constrained edge devices. To address these challenges, this paper proposes a Physics-Embedded Neural ODEs (PENODE) that (i) embeds the hybrid operating mechanism as an event automaton to explicitly govern discrete switching and (ii) injects known governing ODE components directly into the neural parameterization of unmodeled dynamics. This unified design yields a differentiable end-to-end trainable architecture that preserves physical interpretability while reducing redundancy, and it supports a cloud-to-edge toolchain for efficient FPGA deployment. Experimental results demonstrate that PENODE achieves significantly higher accuracy in benchmarks in white-box, gray-box, and black-box scenarios, with a 75% reduction in neuron count, validating that the proposed PENODE maintains physical interpretability, efficient edge deployment, and real-time control enhancement.
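A hedged sketch of the physics-embedding idea: the known per-mode linear ODE is evaluated explicitly and a small network only learns a residual for unmodeled dynamics, with a simple rule standing in for the event automaton. Matrices, the switching rule, and sizes are assumptions, not the paper's converter model.

import torch
import torch.nn as nn

class PENODECellSketch(nn.Module):
    """dx/dt = A[s] x + B[s] u + f_theta(x, u), where s is a discrete switch state."""
    def __init__(self, n=2):
        super().__init__()
        self.A = nn.Parameter(0.1 * torch.randn(2, n, n), requires_grad=False)  # known physics per mode
        self.B = nn.Parameter(0.1 * torch.randn(2, n), requires_grad=False)
        self.residual = nn.Sequential(nn.Linear(n + 1, 16), nn.Tanh(), nn.Linear(16, n))

    def forward(self, x, u, s):
        known = x @ self.A[s].T + self.B[s] * u                 # embedded governing ODE for mode s
        learned = self.residual(torch.cat([x, u], dim=-1))      # NN residual for unmodeled dynamics
        return known + learned

cell, x = PENODECellSketch(), torch.zeros(1, 2)
with torch.no_grad():
    for step in range(100):                                     # explicit Euler rollout
        u = torch.ones(1, 1)                                    # placeholder input (e.g., source voltage)
        s = 1 if (step % 20) >= 10 else 0                       # fixed-duty switching as the event stand-in
        x = x + 1e-3 * cell(x, u, s)
print(x)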


【30】Highlight & Summarize: RAG without the jailbreaks
标题:亮点与总结:没有越狱的RAG
链接:https://arxiv.org/abs/2508.02872

作者:Cherubin, Andrew Paverd
摘要:Preventing jailbreaking and model hijacking of Large Language Models (LLMs) is an important yet challenging task. For example, when interacting with a chatbot, malicious users can input specially crafted prompts to cause the LLM to generate undesirable content or perform a completely different task from its intended purpose. Existing mitigations for such attacks typically rely on hardening the LLM's system prompt or using a content classifier trained to detect undesirable content or off-topic conversations. However, these probabilistic approaches are relatively easy to bypass due to the very large space of possible inputs and undesirable outputs. In this paper, we present and evaluate Highlight & Summarize (H&S), a new design pattern for retrieval-augmented generation (RAG) systems that prevents these attacks by design. The core idea is to perform the same task as a standard RAG pipeline (i.e., to provide natural language answers to questions, based on relevant sources) without ever revealing the user's question to the generative LLM. This is achieved by splitting the pipeline into two components: a highlighter, which takes the user's question and extracts relevant passages ("highlights") from the retrieved documents, and a summarizer, which takes the highlighted passages and summarizes them into a cohesive answer. We describe several possible instantiations of H&S and evaluate their generated responses in terms of correctness, relevance, and response quality. Surprisingly, when using an LLM-based highlighter, the majority of H&S responses are judged to be better than those of a standard RAG pipeline.
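A minimal sketch of the two-stage pattern as described, with a hypothetical llm() completion function and illustrative prompts; only the highlighter sees the user's question, and the generative summarizer never does.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in any chat/completion API here")  # hypothetical placeholder

def highlight(question: str, documents: list[str]) -> list[str]:
    # the highlighter sees the question but only ever emits extracted passages
    return [llm(f"Question: {question}\n\nDocument:\n{doc}\n\n"
                "Copy verbatim the passages relevant to the question, or reply NONE.")
            for doc in documents]

def summarize(highlights: list[str]) -> str:
    # the summarizer (the generative LLM) never receives the user's question
    joined = "\n---\n".join(h for h in highlights if h.strip() and h.strip() != "NONE")
    return llm("Summarize the following passages into one cohesive answer:\n" + joined)

def answer(question: str, retrieved_docs: list[str]) -> str:
    return summarize(highlight(question, retrieved_docs))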


【31】Pulse Shape Discrimination Algorithms: Survey and Benchmark
标题:脉冲形状甄别算法:综述与基准
链接:https://arxiv.org/abs/2508.02750

作者:u, Yihan Zhan, Mingzhe Liu, Yanhua Liu, Peng Li, Zhuo Zuo, Bingqi Liu, Runxi Liu
摘要 :This review presents a comprehensive survey and benchmark of pulse shape discrimination (PSD) algorithms for radiation detection, classifying nearly sixty methods into statistical (time-domain, frequency-domain, neural network-based) and prior-knowledge (machine learning, deep learning) paradigms. We implement and evaluate all algorithms on two standardized datasets: an unlabeled set from a 241Am-9Be source and a time-of-flight labeled set from a 238Pu-9Be source, using metrics including Figure of Merit (FOM), F1-score, ROC-AUC, and inter-method correlations. Our analysis reveals that deep learning models, particularly Multi-Layer Perceptrons (MLPs) and hybrid approaches combining statistical features with neural regression, often outperform traditional methods. We discuss architectural suitabilities, the limitations of FOM, alternative evaluation metrics, and performance across energy thresholds. Accompanying this work, we release an open-source toolbox in Python and MATLAB, along with the datasets, to promote reproducibility and advance PSD research.
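For reference, the Figure of Merit used throughout PSD work is the peak separation divided by the sum of the two distributions' FWHMs; a short numpy sketch (assuming roughly Gaussian discrimination-parameter distributions):

import numpy as np

def figure_of_merit(psd_gamma, psd_neutron):
    """FOM = |mu_n - mu_g| / (FWHM_g + FWHM_n), with FWHM = 2*sqrt(2*ln 2)*sigma
    under a Gaussian approximation of each discrimination-parameter distribution."""
    fwhm_factor = 2.0 * np.sqrt(2.0 * np.log(2.0))
    separation = abs(np.mean(psd_neutron) - np.mean(psd_gamma))
    return separation / (fwhm_factor * (np.std(psd_gamma) + np.std(psd_neutron)))

rng = np.random.default_rng(1)
print(figure_of_merit(rng.normal(0.20, 0.02, 10_000), rng.normal(0.30, 0.03, 10_000)))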


【32】Accelerating Conjugate Gradient Solvers for Homogenization Problems with Unitary Neural Operators
标题:利用酉神经算子加速均匀化问题的共轭梯度求解器
链接:https://arxiv.org/abs/2508.02681

作者:rb, Felix Fritzen
摘要:Rapid and reliable solvers for parametric partial differential equations (PDEs) are needed in many scientific and engineering disciplines. For example, there is a growing demand for composites and architected materials with heterogeneous microstructures. Designing such materials and predicting their behavior in practical applications requires solving homogenization problems for a wide range of material parameters and microstructures. While classical numerical solvers offer reliable and accurate solutions supported by a solid theoretical foundation, their high computational costs and slow convergence remain limiting factors. As a result, scientific machine learning is emerging as a promising alternative. However, such approaches often lack guaranteed accuracy and physical consistency. This raises the question of whether it is possible to develop hybrid approaches that combine the advantages of both data-driven methods and classical solvers. To address this, we introduce UNO-CG, a hybrid solver that accelerates conjugate gradient (CG) solvers using specially designed machine-learned preconditioners, while ensuring convergence by construction. As a preconditioner, we propose Unitary Neural Operators as a modification of Fourier Neural Operators. Our method can be interpreted as a data-driven discovery of Green's functions, which are then used to accelerate iterative solvers. We evaluate UNO-CG on various homogenization problems involving heterogeneous microstructures and millions of degrees of freedom. Our results demonstrate that UNO-CG enables a substantial reduction in the number of iterations and is competitive with handcrafted preconditioners for homogenization problems that involve expert knowledge. Moreover, UNO-CG maintains strong performance across a variety of boundary conditions, where many specialized solvers are not applicable, highlighting its versatility and robustness.
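The CG side of the hybrid is standard; the sketch below is textbook preconditioned CG in numpy with the preconditioner left as a callable, which is where a learned operator would be plugged in (a Jacobi preconditioner stands in here -- the unitary neural operator itself is not reproduced).

import numpy as np

def preconditioned_cg(apply_A, b, apply_M, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradients; apply_M approximates A^{-1} and may be
    any SPD operator, e.g. a machine-learned preconditioner."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = apply_M(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = apply_M(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

rng = np.random.default_rng(2)
Q = rng.normal(size=(200, 200))
A = Q @ Q.T + 200 * np.eye(200)                      # SPD test matrix
b = rng.normal(size=200)
x = preconditioned_cg(lambda v: A @ v, b, lambda r: r / np.diag(A))
print(np.linalg.norm(A @ x - b))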


【33】The Open DAC 2025 Dataset for Sorbent Discovery in Direct Air Capture
标题:用于直接空气捕获中吸附剂发现的开放DAC 2025数据集
链接:https://arxiv.org/abs/2508.03162

作者:riram, Logan M. Brabson, Xiaohan Yu, Sihoon Choi, Kareem Abdelmaqsoud, Elias Moubarak, Pim de Haan, Sindy Löwe, Johann Brehmer, John R. Kitchin, Max Welling, C. Lawrence Zitnick, Zachary Ulissi, Andrew J. Medford, David S. Sholl
摘要:Identifying useful sorbent materials for direct air capture (DAC) from humid air remains a challenge. We present the Open DAC 2025 (ODAC25) dataset, a significant expansion and improvement upon ODAC23 (Sriram et al., ACS Central Science, 10 (2024) 923), comprising nearly 70 million DFT single-point calculations for CO$_2$, H$_2$O, N$_2$, and O$_2$ adsorption in 15,000 MOFs. ODAC25 introduces chemical and configurational diversity through functionalized MOFs, high-energy GCMC-derived placements, and synthetically generated frameworks. ODAC25 also significantly improves upon the accuracy of DFT calculations and the treatment of flexible MOFs in ODAC23. Along with the dataset, we release new state-of-the-art machine-learned interatomic potentials trained on ODAC25 and evaluate them on adsorption energy and Henry's law coefficient predictions.


【34】SpectrumFM: A New Paradigm for Spectrum Cognition
标题:SpectrumFM:频谱认知的新范式
链接:https://arxiv.org/abs/2508.02742

作者:u, Hao Zhang, Wei Wu, Fuhui Zhou, Qihui Wu, Derrick Wing Kwan Ng, Chan-Byoung Chae
备注:This paper has been accepted for presentation at the 2025 IEEE Global Communications Conference (GLOBECOM 2025), Cognitive Radio and AI-Enabled Network Symposium
摘要:The enhancement of spectrum efficiency and the realization of secure spectrum utilization are critically dependent on spectrum cognition. However, existing spectrum cognition methods often exhibit limited generalization and suboptimal accuracy when deployed across diverse spectrum environments and tasks. To overcome these challenges, we propose a spectrum foundation model, termed SpectrumFM, which provides a new paradigm for spectrum cognition. An innovative spectrum encoder that exploits the convolutional neural networks and the multi-head self attention mechanisms is proposed to effectively capture both fine-grained local signal structures and high-level global dependencies in the spectrum data. To enhance its adaptability, two novel self-supervised learning tasks, namely masked reconstruction and next-slot signal prediction, are developed for pre-training SpectrumFM, enabling the model to learn rich and transferable representations. Furthermore, low-rank adaptation (LoRA) parameter-efficient fine-tuning is exploited to enable SpectrumFM to seamlessly adapt to various downstream spectrum cognition tasks, including spectrum sensing (SS), anomaly detection (AD), and wireless technology classification (WTC). Extensive experiments demonstrate the superiority of SpectrumFM over state-of-the-art methods. Specifically, it improves detection probability in the SS task by 30% at -4 dB signal-to-noise ratio (SNR), boosts the area under the curve (AUC) in the AD task by over 10%, and enhances WTC accuracy by 9.6%.
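A hedged sketch of the encoder recipe named above -- 1-D convolutions for fine-grained local structure followed by multi-head self-attention for global dependencies; channel counts and depths are placeholders, not SpectrumFM's actual configuration.

import torch
import torch.nn as nn

class SpectrumEncoderSketch(nn.Module):
    def __init__(self, in_ch=2, d_model=64, n_heads=4):
        super().__init__()
        self.conv = nn.Sequential(                      # local signal structure
            nn.Conv1d(in_ch, d_model, kernel_size=7, padding=3), nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, padding=2), nn.GELU())
        self.attn = nn.TransformerEncoderLayer(         # global dependencies
            d_model, n_heads, dim_feedforward=128, batch_first=True)

    def forward(self, iq):                              # iq: (batch, 2, samples), e.g. I/Q channels
        h = self.conv(iq).transpose(1, 2)               # -> (batch, samples, d_model)
        return self.attn(h)

enc = SpectrumEncoderSketch()
print(enc(torch.randn(8, 2, 256)).shape)                # torch.Size([8, 256, 64])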


【35】CreditARF: A Framework for Corporate Credit Rating with Annual Report and Financial Feature Integration
标题:CreditARF:具有年度报告和财务特征集成的企业信用评级框架
链接:https://arxiv.org/abs/2508.02738

作者:i, Zhongliang Yang, DiYang Lu, Yisi Wang, Yiting Zhou, Linna Zhou
摘要 :Corporate credit rating serves as a crucial intermediary service in the market economy, playing a key role in maintaining economic order. Existing credit rating models rely on financial metrics and deep learning. However, they often overlook insights from non-financial data, such as corporate annual reports. To address this, this paper introduces a corporate credit rating framework that integrates financial data with features extracted from annual reports using FinBERT, aiming to fully leverage the potential value of unstructured text data. In addition, we have developed a large-scale dataset, the Comprehensive Corporate Rating Dataset (CCRD), which combines both traditional financial data and textual data from annual reports. The experimental results show that the proposed method improves the accuracy of the rating predictions by 8-12%, significantly improving the effectiveness and reliability of corporate credit ratings.
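A minimal sketch of the fusion step under stated assumptions: a pooled report embedding (standing in for FinBERT output) is concatenated with tabular financial ratios and fed to an ordinary classifier; the placeholder embedding, feature names, and toy labels are ours.

import zlib
import numpy as np
from sklearn.linear_model import LogisticRegression

def report_embedding(text: str, dim: int = 768) -> np.ndarray:
    # placeholder for a pooled FinBERT vector; a hash-seeded vector keeps the sketch self-contained
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    return rng.normal(size=dim)

reports = ["Revenue grew 12% while leverage declined.", "Liquidity deteriorated sharply."]
financials = np.array([[0.12, 1.8, 0.35], [-0.05, 3.2, 0.10]])   # e.g. growth, leverage, margin
ratings = np.array([1, 0])                                        # toy rating labels

X = np.hstack([financials, np.vstack([report_embedding(r) for r in reports])])
clf = LogisticRegression(max_iter=1000).fit(X, ratings)
print(clf.predict(X))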


【36】On Improving PPG-Based Sleep Staging: A Pilot Study
标题:改进基于PPG的睡眠分期:一项试点研究
链接:https://arxiv.org/abs/2508.02689

作者:ng, Yu Guan, Chen Chen, Ligang Zhou, Laurence T. Yang, Sai Gu
摘要:Sleep monitoring through accessible wearable technology is crucial to improving well-being in ubiquitous computing. Although photoplethysmography (PPG) sensors are widely adopted in consumer devices, achieving consistently reliable sleep staging using PPG alone remains a non-trivial challenge. In this work, we explore multiple strategies to enhance the performance of PPG-based sleep staging. Specifically, we compare a conventional single-stream model with dual-stream cross-attention strategies, through which complementary information can be learned via PPG and PPG-derived modalities such as augmented PPG or synthetic ECG. To study the effectiveness of these approaches in the four-stage sleep monitoring task, we conducted experiments on the world's largest sleep staging dataset, i.e., the Multi-Ethnic Study of Atherosclerosis (MESA). We found that a substantial performance gain can be achieved by combining PPG and its auxiliary information under the dual-stream cross-attention architecture. Source code of this project can be found at https://github.com/DavyWJW/sleep-staging-models
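A hedged sketch of a dual-stream cross-attention fusion block in the spirit of the comparison above: the PPG stream and a PPG-derived stream (e.g., synthetic ECG) attend to each other before a 4-class stage head; dimensions are placeholders, not the paper's architecture (see the linked repository for the real models).

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, d=64, n_heads=4, n_stages=4):
        super().__init__()
        self.ppg_to_aux = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.aux_to_ppg = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.head = nn.Linear(2 * d, n_stages)

    def forward(self, ppg_feats, aux_feats):                       # (batch, time, d) each
        a, _ = self.ppg_to_aux(ppg_feats, aux_feats, aux_feats)    # PPG queries the auxiliary stream
        b, _ = self.aux_to_ppg(aux_feats, ppg_feats, ppg_feats)    # and vice versa
        fused = torch.cat([a.mean(dim=1), b.mean(dim=1)], dim=-1)
        return self.head(fused)                                    # logits over 4 sleep stages

model = CrossAttentionFusion()
print(model(torch.randn(2, 30, 64), torch.randn(2, 30, 64)).shape)  # torch.Size([2, 4])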


机器翻译由腾讯交互翻译提供,仅供参考

点击“阅读原文”获取带摘要的学术速递
