
Machine Learning Academic Digest [8.29]

arXiv Daily Academic Digest



cs.LG: 119 papers in total today


Large Models (14 papers)

【1】OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models
Link: https://arxiv.org/abs/2508.21061

Authors: ia, Shunan Guo, Eunyee Koh, Alex Endert
Note: Accepted to UIST 2025. 18 pages, 9 figures, 2 tables. For a demo video, see this https URL
Abstract: As multi-turn dialogues with large language models (LLMs) grow longer and more complex, how can users better evaluate and review progress on their conversational goals? We present OnGoal, an LLM chat interface that helps users better manage goal progress. OnGoal provides real-time feedback on goal alignment through LLM-assisted evaluation, explanations for evaluation results with examples, and overviews of goal progression over time, enabling users to navigate complex dialogues more effectively. Through a study with 20 participants on a writing task, we evaluate OnGoal against a baseline chat interface without goal tracking. Using OnGoal, participants spent less time and effort to achieve their goals while exploring new prompting strategies to overcome miscommunication, suggesting tracking and visualizing goals can enhance engagement and resilience in LLM dialogues. Our findings inspired design implications for future LLM chat interfaces that improve goal communication, reduce cognitive load, enhance interactivity, and enable feedback to improve LLM performance.


【2】cMALC-D: Contextual Multi-Agent LLM-Guided Curriculum Learning with Diversity-Based Context Blending
Link: https://arxiv.org/abs/2508.20818

Authors: atheesh, Keenan Powell, Hua Wei
Note: A shorter version has been accepted to the 2025 Conference on Information and Knowledge Management
Abstract: Many multi-agent reinforcement learning (MARL) algorithms are trained in fixed simulation environments, making them brittle when deployed in real-world scenarios with more complex and uncertain conditions. Contextual MARL (cMARL) addresses this by parameterizing environments with context variables and training a context-agnostic policy that performs well across all environment configurations. Existing cMARL methods attempt to use curriculum learning to help train and evaluate context-agnostic policies, but they often rely on unreliable proxy signals, such as value estimates or generalized advantage estimates that are noisy and unstable in multi-agent settings due to inter-agent dynamics and partial observability. To address these issues, we propose Contextual Multi-Agent LLM-Guided Curriculum Learning with Diversity-Based Context Blending (cMALC-D), a framework that uses Large Language Models (LLMs) to generate semantically meaningful curricula and provide a more robust evaluation signal. To prevent mode collapse and encourage exploration, we introduce a novel diversity-based context blending mechanism that creates new training scenarios by combining features from prior contexts. Experiments in traffic signal control domains demonstrate that cMALC-D significantly improves both generalization and sample efficiency compared to existing curriculum learning baselines. We provide code at https://github.com/DaRL-LibSignal/cMALC-D.
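
The context-blending mechanism is the concrete novelty here. As a rough illustration only, a diversity-based blend might pick the two most dissimilar prior contexts and interpolate their features; everything below, including the distance criterion and the per-feature weights, is an assumption rather than the paper's actual rule:

import numpy as np

def blend_contexts(prior_contexts, rng):
    """Pick the two most dissimilar prior contexts and mix them feature-wise.
    prior_contexts: (n, d) array of environment-parameter vectors.
    Hypothetical diversity criterion; the paper's exact rule may differ."""
    dists = np.linalg.norm(prior_contexts[:, None, :] - prior_contexts[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(dists), dists.shape)
    w = rng.uniform(size=prior_contexts.shape[1])   # per-feature mixing weights
    return w * prior_contexts[i] + (1.0 - w) * prior_contexts[j]

rng = np.random.default_rng(0)
contexts = rng.uniform(size=(8, 4))   # e.g. traffic-demand and signal-timing parameters
new_context = blend_contexts(contexts, rng)

A convex per-feature combination keeps every blended parameter inside the range spanned by its parents, which is one way such a mechanism could create new yet plausible training scenarios.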


【3】Provable Benefits of In-Tool Learning for Large Language Models
Link: https://arxiv.org/abs/2508.20755

Authors: ston, Ambroise Odonnat, Charles Arnal, Vivien Cabannes
Abstract: Tool-augmented language models, equipped with retrieval, memory, or external APIs, are reshaping AI, yet their theoretical advantages remain underexplored. In this paper, we address this question by demonstrating the benefits of in-tool learning (external retrieval) over in-weight learning (memorization) for factual recall. We show that the number of facts a model can memorize solely in its weights is fundamentally limited by its parameter count. In contrast, we prove that tool-use enables unbounded factual recall via a simple and efficient circuit construction. These results are validated in controlled experiments, where tool-using models consistently outperform memorizing ones. We further show that for pretrained large language models, teaching tool-use and general rules is more effective than finetuning facts into memory. Our work provides both a theoretical and empirical foundation, establishing why tool-augmented workflows are not just practical, but provably more scalable.


【4】Token Buncher: Shielding LLMs from Harmful Reinforcement Learning Fine-Tuning
Link: https://arxiv.org/abs/2508.20697

Authors: ng, Lixu Wang, Tianyi Wei, Jie Zhang, Chongyang Gao, Sinong Zhan, Peizhuo Lv, Wei Dong
Note: Project Homepage: this https URL
Abstract: As large language models (LLMs) continue to grow in capability, so do the risks of harmful misuse through fine-tuning. While most prior studies assume that attackers rely on supervised fine-tuning (SFT) for such misuse, we systematically demonstrate that reinforcement learning (RL) enables adversaries to more effectively break safety alignment and facilitate advanced harmful task assistance, under matched computational budgets. To counter this emerging threat, we propose TokenBuncher, the first effective defense specifically targeting RL-based harmful fine-tuning. TokenBuncher suppresses the foundation on which RL relies: model response uncertainty. By constraining uncertainty, RL-based fine-tuning can no longer exploit distinct reward signals to drive the model toward harmful behaviors. We realize this defense through entropy-as-reward RL and a Token Noiser mechanism designed to prevent the escalation of expert-domain harmful capabilities. Extensive experiments across multiple models and RL algorithms show that TokenBuncher robustly mitigates harmful RL fine-tuning while preserving benign task utility and finetunability. Our results highlight that RL-based harmful fine-tuning poses a greater systemic risk than SFT, and that TokenBuncher provides an effective and general defense.


【5】MERIT: Maximum-normalized Element-wise Ratio for Language Model Large-batch Training
Link: https://arxiv.org/abs/2508.20577

Authors: Zangwei Zheng, Ziheng Qin, Zirui Zhu, Yong Liu, Yang You
Note: ICML 2025
Abstract: Large-batch training has become a cornerstone in accelerating the training of deep neural networks, yet it poses challenges in optimization and generalization. Existing optimizers like AdamW present performance degradation during language models' large-batch training, due to the information bottleneck in attention layers caused by the sharp increase of max attention logit. While the LAMB optimizer partially addresses this issue, some attention layers still face this issue. The reason is that $l_2$-norm-based trust ratios in LAMB are less effective in directly influencing the max value of query/key weights. Furthermore, the weight-wise trust ratio in LAMB is error-prone as it overlooks relationships of weight values within rows or columns. Building on these observations, we propose a novel optimizer, MERIT, which leverages the max-norm to calculate the trust ratio to constrain the max attention logit more effectively. Moreover, we further construct element-wise trust ratios to provide more robust update scaling by focusing on local weight structures. Extensive experiments of large-batch training across various sizes of GPT-2 models demonstrate the superior performance of MERIT. Notably, during the training of GPT-2 Medium, MERIT enables a 6k batch size without any performance degradation compared to the standard batch size (480) with 48B training tokens. This work highlights the importance of considering the max attention logit and finer-granularity trust ratio in large-batch training. It successfully improves the training stability and paves the way for larger batch usage, enabling faster development and iteration of large language models. Code is available at https://github.com/NUS-HPC-AI-Lab/MERIT.
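
The core contrast, a max-norm trust ratio versus LAMB's $l_2$-norm ratio, can be shown in a toy sketch for one weight matrix. This is not the full MERIT optimizer (see the linked repository); the exact update form is an assumption here:

import torch

def trust_ratio_l2(w, update, eps=1e-8):
    # LAMB-style layer-wise trust ratio based on l2 norms
    return w.norm(p=2) / (update.norm(p=2) + eps)

def trust_ratio_max(w, update, eps=1e-8):
    # MERIT-style max-norm ratio: sensitive to the largest weight/update
    # entries, which drive the max attention logit (sketch only)
    return w.abs().max() / (update.abs().max() + eps)

w = torch.randn(64, 64)              # e.g. a query-projection weight matrix
update = 1e-3 * torch.randn(64, 64)  # optimizer's raw update direction
step = trust_ratio_max(w, update) * update  # scaled step, as in layer-wise adaptive methods

Because the max-norm ratio reacts directly to the largest entries, a few outsized query/key weights cannot hide inside an otherwise small $l_2$ norm, which is the failure mode the abstract attributes to LAMB.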


【6】Evaluating Differentially Private Generation of Domain-Specific Text
Link: https://arxiv.org/abs/2508.20452

Authors: , Viktor Schlegel, Srinivasan Nandakumar, Iqra Zahid, Yuping Wu, Warren Del-Pinto, Goran Nenadic, Siew-Kei Lam, Jie Zhang, Anil A Bharath
Abstract: Generative AI offers transformative potential for high-stakes domains such as healthcare and finance, yet privacy and regulatory barriers hinder the use of real-world data. To address this, differentially private synthetic data generation has emerged as a promising alternative. In this work, we introduce a unified benchmark to systematically evaluate the utility and fidelity of text datasets generated under formal Differential Privacy (DP) guarantees. Our benchmark addresses key challenges in domain-specific benchmarking, including choice of representative data and realistic privacy budgets, accounting for pre-training and a variety of evaluation metrics. We assess state-of-the-art privacy-preserving generation methods across five domain-specific datasets, revealing significant utility and fidelity degradation compared to real data, especially under strict privacy constraints. These findings underscore the limitations of current approaches, outline the need for advanced privacy-preserving data sharing methods and set a precedent regarding their evaluation in realistic scenarios.


【7】Towards Mitigating Excessive Forgetting in LLM Unlearning via Entanglement-Aware Unlearning with Proxy Constraint
Link: https://arxiv.org/abs/2508.20443

Authors: u, Jian Lou, Yuke Hu, Xiaochen Li, Tailun Chen, Yitian Chen, Zhan Qin
Abstract: Large language models (LLMs) are trained on massive datasets that may include private or copyrighted content. Due to growing privacy and ownership concerns, data owners may request the removal of their data from trained models. Machine unlearning provides a practical solution by removing the influence of specific data without full retraining. However, most existing methods lack a sound forgetting boundary, causing some samples to be under-forgotten, leaving residual leakage risks, while others remain over-forgotten at the expense of degraded utility. In this work, we propose EAGLE-PC (Entanglement-Awareness Guided Loss Reweighting with Proxy Constraint), a novel unlearning framework that addresses these limitations through two key components. First, entanglement-awareness guided loss reweighting determines the forgetting effort of each sample by measuring its similarity to retain samples in the embedding space, enabling more targeted and effective unlearning. Second, a proxy constraint leveraging ICL (In-Context Learning) generated test data softly regularizes the forgetting process, effectively mitigating over-forgetting. EAGLE-PC is compatible with existing gradient-based objectives and serves as a plug-and-play enhancement. We evaluate EAGLE-PC on the TOFU and MUSE benchmarks, showing consistent improvements in the forgetting-utility trade-off across multiple LLMs. Combined with the NPO+GD optimizer, it approaches full retraining performance, offering a scalable and robust unlearning solution.
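
To make the first component concrete, here is a minimal sketch of similarity-based loss reweighting, assuming cosine similarity in embedding space and a softmax weighting; both choices are assumptions, and the paper's exact functional form may differ:

import torch
import torch.nn.functional as F

def entanglement_weights(forget_emb, retain_emb, tau=0.1):
    """Weight each forget sample by how 'entangled' it is with the retain set:
    samples close to retained data receive a smaller forgetting effort."""
    sim = F.cosine_similarity(forget_emb[:, None, :], retain_emb[None, :, :], dim=-1)
    entanglement = sim.max(dim=1).values           # similarity to nearest retain sample
    return torch.softmax(-entanglement / tau, dim=0)

forget_emb = torch.randn(16, 128)                  # embeddings of forget-set samples
retain_emb = torch.randn(100, 128)                 # embeddings of retain-set samples
w = entanglement_weights(forget_emb, retain_emb)
per_sample_forget_loss = torch.rand(16)            # stand-in for a gradient-ascent forgetting objective
loss = (w * per_sample_forget_loss).sum()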


【8】Revealing Potential Biases in LLM-Based Recommender Systems in the Cold Start Setting
Link: https://arxiv.org/abs/2508.20401

Authors: Andre, Gauthier Roy, Eva Dyer, Kai Wang
Note: In Proceedings of 2nd Workshop on Evaluating and Applying Recommendation Systems with Large Language Models (EARL) at RecSys 2025 (EARL 2025)
Abstract: Large Language Models (LLMs) are increasingly used for recommendation tasks due to their general-purpose capabilities. While LLMs perform well in rich-context settings, their behavior in cold-start scenarios, where only limited signals such as age, gender, or language are available, raises fairness concerns because they may rely on societal biases encoded during pretraining. We introduce a benchmark specifically designed to evaluate fairness in zero-context recommendation. Our modular pipeline supports configurable recommendation domains and sensitive attributes, enabling systematic and flexible audits of any open-source LLM. Through evaluations of state-of-the-art models (Gemma 3 and Llama 3.2), we uncover consistent biases across recommendation domains (music, movies, and colleges) including gendered and cultural stereotypes. We also reveal a non-linear relationship between model size and fairness, highlighting the need for nuanced analysis.


【9】Graph-R1: Unleashing LLM Reasoning with NP-Hard Graph Problems
Link: https://arxiv.org/abs/2508.20373

Authors: g, Bowen Liu, Jianheng Tang, Nuo Chen, Yuhan Li, Qifan Zhang, Jia Li
Abstract: Reasoning Large Language Models (RLLMs) have recently achieved remarkable progress on complex reasoning tasks, largely enabled by their long chain-of-thought (Long CoT) capabilities. However, developing these Long CoT behaviors relies heavily on post-training with high-quality datasets, which are typically costly and human-curated (e.g., mathematics and code), leaving scalable alternatives unexplored. In this work, we introduce NP-hard (NPH) graph problems as a novel synthetic training corpus, as they inherently require deep reasoning, extensive exploration, and reflective strategies, which are core characteristics of Long CoT reasoning. Building on this insight, we develop a two-stage post-training framework: (i) Long CoT Supervised Fine-Tuning (SFT) on rejection-sampled NPH graph instances, which substantially enhances reasoning depth, and (ii) Reinforcement Learning (RL) with a fine-grained reward design, which sharpens reasoning efficiency. Our flagship model, Graph-R1-7B, demonstrates strong generalization across mathematics, coding, STEM, and logic, and surpasses QwQ-32B on NPH graph problems in both accuracy and reasoning efficiency. These results position NPH graph problems as an effective and scalable resource for advancing Long CoT reasoning in LLMs, opening a new frontier for LLM post-training. Our implementation is available at https://github.com/Graph-Reasoner/Graph-R1, with models and datasets hosted in our Hugging Face collection HKUST-DSAIL/Graph-R1.


【10】Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs
Link: https://arxiv.org/abs/2508.20333

Authors: ah Al Mamun, Ihsen Alouani, Nael Abu-Ghazaleh
Abstract: Large Language Models (LLMs) are aligned to meet ethical standards and safety requirements by training them to refuse answering harmful or unsafe prompts. In this paper, we demonstrate how adversaries can exploit LLMs' alignment to implant bias, or enforce targeted censorship without degrading the model's responsiveness to unrelated topics. Specifically, we propose Subversive Alignment Injection (SAI), a poisoning attack that leverages the alignment mechanism to trigger refusal on specific topics or queries predefined by the adversary. Although it is perhaps not surprising that refusal can be induced through overalignment, we demonstrate how this refusal can be exploited to inject bias into the model. Surprisingly, SAI evades state-of-the-art poisoning defenses including LLM state forensics, as well as robust aggregation techniques that are designed to detect poisoning in FL settings. We demonstrate the practical dangers of this attack by illustrating its end-to-end impacts on LLM-powered application pipelines. For chat-based applications such as ChatDoctor, with 1% data poisoning, the system refuses to answer healthcare questions for a targeted racial category, leading to high bias ($\Delta DP$ of 23%). We also show that bias can be induced in other NLP tasks: for a resume selection pipeline aligned to refuse to summarize CVs from a selected university, high bias in selection ($\Delta DP$ of 27%) results. Even higher bias ($\Delta DP$ ~38%) results on 9 other chat-based downstream applications.
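
For reference, $\Delta DP$ here is the standard demographic-parity gap: the difference in positive-outcome rates between protected groups. A minimal computation on toy refusal data (all variable names and values are illustrative):

import numpy as np

def delta_dp(answered, group):
    """Demographic-parity gap: absolute difference in the rate of a positive
    outcome (here: the model answers rather than refuses) across two groups."""
    return abs(answered[group == 0].mean() - answered[group == 1].mean())

answered = np.array([1, 1, 0, 1, 0, 0, 1, 0])  # 1 = answered, 0 = refused
group    = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # protected attribute per query
print(delta_dp(answered, group))               # 0.5 on this toy data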


【11】A Novel Framework for Automated Explain Vision Model Using Vision-Language Models
Link: https://arxiv.org/abs/2508.20227

Authors: Nguyen, Tan-Hanh Pham, Chris Ngo, Truong Son Hy
Abstract: The development of many vision models mainly focuses on improving their performance using metrics such as accuracy, IoU, and mAP, with less attention to explainability due to the complexity of applying xAI methods to provide a meaningful explanation of trained models. Although many existing xAI methods aim to explain vision models sample-by-sample, methods explaining the general behavior of vision models, which can only be captured after running on a large dataset, are still underexplored. Furthermore, understanding the behavior of vision models on general images can be very important to prevent biased judgments and help identify the model's trends and patterns. With the application of Vision-Language Models, this paper proposes a pipeline to explain vision models at both the sample and dataset levels. The proposed pipeline can be used to discover failure cases and gain insights into vision models with minimal effort, thereby integrating vision model development with xAI analysis to advance image analysis.


【12】Grounding Multimodal Large Language Models with Quantitative Skin Attributes: A Retrieval Study
Link: https://arxiv.org/abs/2508.20188

Authors: , Masih Eskandar, Nicholas Kurtansky, Jinyang Liu, Jochen Weber, Octavia Camps, Veronica Rotemberg, Jennifer Dy, Kivanc Kose
Abstract: Artificial Intelligence models have demonstrated significant success in diagnosing skin diseases, including cancer, showing the potential to assist clinicians in their analysis. However, the interpretability of model predictions must be significantly improved before they can be used in practice. To this end, we explore the combination of two promising approaches: Multimodal Large Language Models (MLLMs) and quantitative attribute usage. MLLMs offer a potential avenue for increased interpretability, providing reasoning for diagnosis in natural language through an interactive format. Separately, a number of quantitative attributes that are related to lesion appearance (e.g., lesion area) have recently been found predictive of malignancy with high accuracy. Predictions grounded as a function of such concepts have the potential for improved interpretability. We provide evidence that MLLM embedding spaces can be grounded in such attributes, through fine-tuning to predict their values from images. Concretely, we evaluate this grounding in the embedding space through an attribute-specific content-based image retrieval case study using the SLICE-3D dataset.


【13】Spatio-Temporal Pruning for Compressed Spiking Large Language Models
Link: https://arxiv.org/abs/2508.20122

Authors: Malyaban Bal, Brian Matejek, Susmit Jha, Adam Cobb, Abhronil Sengupta
Abstract: Large Language Models (LLMs) present significant challenges for deployment in energy-constrained environments due to their large model sizes and high inference latency. Spiking Neural Networks (SNNs), inspired by the sparse event-driven neural processing and energy-efficient information transmission in the brain, offer a promising alternative for achieving low-power computing. Integrating the event-driven efficiency of spiking neurons with the advanced capabilities of LLMs represents a promising direction for power-efficient LLMs. This work specifically delves into the design of compressed spiking LLMs. Here, we revisit spatial and temporal pruning from the perspective of SNNs and propose a novel spatio-temporal pruning framework for Spiking LLMs to optimize computational efficiency while preserving high performance. Our spatial pruning technique reduces the number of active neurons and attention heads, effectively lowering the computational complexity of the model. Meanwhile, temporal pruning minimizes inference latency by dynamically adjusting the number of timesteps required for different layers. By combining these approaches with other compression techniques, we present the first work in the domain of Spiking LLMs to jointly explore spatial pruning, temporal pruning, extreme quantization and knowledge distillation strategies. Extensive experimental evaluation of our proposed framework for SpikingBERT on the large-scale GLUE benchmark demonstrates the efficacy of our approach in terms of computational operations and inference latency. Our approach offers a compelling solution for real-time, low-power natural language processing applications, making Spiking LLMs more practical for deployment on edge devices and in power-constrained settings.


【14】Evaluating LLMs on microservice-based applications: how complex is your specification?
Link: https://arxiv.org/abs/2508.20119

Authors: Yellin
Note: 20 pages + 7 pages appendices. 7 Figures. 8 Tables
Abstract: In this paper we evaluate how far LLMs have advanced in generating code for real-world problems. Specifically, we explore code synthesis for microservice-based applications, a widely used architecture pattern. We define a standard template for specifying these applications, and we propose a metric for judging the difficulty level of a specification. The higher the score, the more difficult it is to generate code for the specification. We develop a framework to automate the process of testing LLM-synthesized code for a microservice using unit tests. Our experimental results show that strong LLMs (like GPT-3o-mini) do fairly well on medium difficulty specifications but do very poorly on those of higher difficulty levels. This is due to more intricate business logic, a greater use of external services, database integration and inclusion of non-functional capabilities such as authentication. We analyzed the errors in LLM-synthesized code and report on the key challenges LLMs face in generating code for these specifications, thereby suggesting future research directions to improve code synthesis for real-world problems.


Graph Related (Graph Learning | Graph Neural Networks | Graph Optimization, etc.) (8 papers)

【1】Graph-Based Feature Augmentation for Predictive Tasks on Relational Datasets
Link: https://arxiv.org/abs/2508.20986

Authors: Qiao, Ziqi Cao, Kaiyu Feng, Ye Yuan, Guoren Wang
Abstract: Data has become a foundational asset driving innovation across domains such as finance, healthcare, and e-commerce. In these areas, predictive modeling over relational tables is commonly employed, with increasing emphasis on reducing manual effort through automated machine learning (AutoML) techniques. This raises an interesting question: can feature augmentation itself be automated to identify and utilize task-related relational signals? To address this challenge, we propose an end-to-end automated feature augmentation framework, ReCoGNN, which enhances initial datasets using features extracted from multiple relational tables to support predictive tasks. ReCoGNN first captures semantic dependencies within each table by modeling intra-table attribute relationships, enabling it to partition tables into structured, semantically coherent segments. It then constructs a heterogeneous weighted graph that represents inter-row relationships across all segments. Finally, ReCoGNN leverages message-passing graph neural networks to propagate information through the graph, guiding feature selection and augmenting the original dataset. Extensive experiments conducted on ten real-life and synthetic datasets demonstrate that ReCoGNN consistently outperforms existing methods on both classification and regression tasks.


【2】Turning Tabular Foundation Models into Graph Foundation Models
Link: https://arxiv.org/abs/2508.20906

Authors: emeev, Gleb Bazhenov, Oleg Platonov, Artem Babenko, Liudmila Prokhorenkova
Abstract: While foundation models have revolutionized such fields as natural language processing and computer vision, their application and potential within graph machine learning remain largely unexplored. One of the key challenges in designing graph foundation models (GFMs) is handling diverse node features that can vary across different graph datasets. Although many works on GFMs have been focused exclusively on text-attributed graphs, the problem of handling arbitrary features of other types in GFMs has not been fully addressed. However, this problem is not unique to the graph domain, as it also arises in the field of machine learning for tabular data. In this work, motivated by the recent success of tabular foundation models like TabPFNv2, we propose G2T-FM, a simple graph foundation model that employs TabPFNv2 as a backbone. Specifically, G2T-FM augments the original node features with neighborhood feature aggregation, adds structural embeddings, and then applies TabPFNv2 to the constructed node representations. Even in a fully in-context regime, our model achieves strong results, significantly outperforming publicly available GFMs and performing on par with well-tuned GNNs trained from scratch. Moreover, after finetuning, G2T-FM surpasses well-tuned GNN baselines, highlighting the potential of the proposed approach. More broadly, our paper reveals a previously overlooked direction of utilizing tabular foundation models for graph machine learning tasks.
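
A minimal sketch of the described feature construction: original node features, aggregated neighbor features, and structural embeddings concatenated into one tabular row per node. Mean aggregation and the generic structural features below are assumptions; the paper's exact choices may differ:

import numpy as np

def g2t_style_features(X, adj_list, struct_emb):
    """Build tabular rows for a TabPFNv2-style backbone (sketch).
    X: (n, d) node features; adj_list: list of neighbor-index lists;
    struct_emb: (n, k) structural features, e.g. Laplacian eigenvectors."""
    neigh = np.stack([
        X[nbrs].mean(axis=0) if len(nbrs) else np.zeros(X.shape[1])
        for nbrs in adj_list
    ])
    return np.concatenate([X, neigh, struct_emb], axis=1)

X = np.random.rand(5, 3)
adj = [[1, 2], [0], [0, 3], [2, 4], [3]]
S = np.random.rand(5, 2)
rows = g2t_style_features(X, adj, S)   # (5, 8) table, one row per node

The appeal of this design is that once graph context is flattened into columns, any strong tabular model can be applied to nodes in-context, with no graph-specific training loop.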


【3】ATM-GAD: Adaptive Temporal Motif Graph Anomaly Detection for Financial Transaction Networks
Link: https://arxiv.org/abs/2508.20829

Authors: ng, Lin Song, Erkang Bao, Xiaoling Lv, Xinyue Wang
Abstract: Financial fraud detection is essential to safeguard billions of dollars, yet the intertwined entities and fast-changing transaction behaviors in modern financial systems routinely defeat conventional machine learning models. Recent graph-based detectors make headway by representing transactions as networks, but they still overlook two fraud hallmarks rooted in time: (1) temporal motifs--recurring, telltale subgraphs that reveal suspicious money flows as they unfold--and (2) account-specific intervals of anomalous activity, when fraud surfaces only in short bursts unique to each entity. To exploit both signals, we introduce ATM-GAD, an adaptive graph neural network that leverages temporal motifs for financial anomaly detection. A Temporal Motif Extractor condenses each account's transaction history into the most informative motifs, preserving both topology and temporal patterns. These motifs are then analyzed by dual-attention blocks: IntraA reasons over interactions within a single motif, while InterA aggregates evidence across motifs to expose multi-step fraud schemes. In parallel, a differentiable Adaptive Time-Window Learner tailors the observation window for every node, allowing the model to focus precisely on the most revealing time slices. Experiments on four real-world datasets show that ATM-GAD consistently outperforms seven strong anomaly-detection baselines, uncovering fraud patterns missed by earlier methods.


【4】GDS Agent: A Graph Algorithmic Reasoning Agent
Link: https://arxiv.org/abs/2508.20637

Authors: , Ioannis Panagiotas
Note: Technical report
Abstract: Large language models (LLMs) have shown remarkable multimodal information processing and reasoning ability. When equipped with tools through function calling and enhanced with retrieval-augmented techniques, compound LLM-based systems can access closed data sources and answer questions about them. However, they still struggle to process and reason over large-scale graph-structure data. We introduce the GDS (Graph Data Science) agent in this technical report. The GDS agent introduces a comprehensive set of graph algorithms as tools, together with preprocessing (retrieval) and postprocessing of algorithm results, in a model context protocol (MCP) server. The server can be used with any modern LLM out-of-the-box. The GDS agent allows users to ask any question that implicitly and intrinsically requires graph algorithmic reasoning about their data, and quickly obtain accurate and grounded answers. We also introduce a new benchmark that evaluates intermediate tool calls as well as final responses. The results indicate that the GDS agent is able to solve a wide spectrum of graph tasks. We also provide detailed case studies for more open-ended tasks and study scenarios where the agent struggles. Finally, we discuss the remaining challenges and the future roadmap.


【5】Local Virtual Nodes for Alleviating Over-Squashing in Graph Neural Networks
Link: https://arxiv.org/abs/2508.20597

Authors: san Karabulut, İnci M. Baytaş
Abstract: Over-squashing is a challenge in training graph neural networks for tasks involving long-range dependencies. In such tasks, a GNN's receptive field should be large enough to enable communication between distant nodes. However, gathering information from a wide range of neighborhoods and squashing its content into fixed-size node representations makes message-passing vulnerable to bottlenecks. Graph rewiring and adding virtual nodes are commonly studied remedies that create additional pathways around bottlenecks to mitigate over-squashing. However, these techniques alter the input graph's global topology and disrupt the domain knowledge encoded in the original graph structure, both of which could be essential to specific tasks and domains. This study presents Local Virtual Nodes (LVN) with trainable embeddings to alleviate the effects of over-squashing without significantly corrupting the global structure of the input graph. The position of the LVNs is determined by the node centrality, which indicates the existence of potential bottlenecks. Thus, the proposed approach aims to improve the connectivity in the regions with likely bottlenecks. Furthermore, trainable LVN embeddings shared across selected central regions facilitate communication between distant nodes without adding more layers. Extensive experiments on benchmark datasets demonstrate that LVNs can enhance structural connectivity and significantly improve performance on graph and node classification tasks. The code can be found at https://github.com/ALLab-Boun/LVN/.
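
A toy sketch of the placement idea: attach a virtual node around each of the most central nodes, where bottlenecks are likely. Betweenness centrality and the wire-to-neighborhood rule below are assumptions (the abstract does not fix either), and in the actual model each virtual node carries a trainable embedding:

import networkx as nx

def add_local_virtual_nodes(G, num_virtual=2):
    """Attach one virtual node to the neighborhood of each of the
    top-`num_virtual` most central nodes (sketch of the placement idea)."""
    centrality = nx.betweenness_centrality(G)
    centers = sorted(centrality, key=centrality.get, reverse=True)[:num_virtual]
    for idx, c in enumerate(centers):
        v = f"lvn_{idx}"               # hypothetical virtual-node identifier
        G.add_node(v, virtual=True)    # trainable embedding lives here in the full model
        for u in list(G.neighbors(c)) + [c]:
            G.add_edge(v, u)           # short-cut paths around the bottleneck
    return G

G = add_local_virtual_nodes(nx.barbell_graph(5, 1))

Because only a few local neighborhoods gain extra edges, most of the original topology, and the domain knowledge it encodes, is left intact.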


【6】FORGE: Foundational Optimization Representations from Graph Embeddings
Link: https://arxiv.org/abs/2508.20330

Authors: afi, Serdar Kadioglu
Abstract: Combinatorial optimization problems are ubiquitous in science and engineering, yet learning-based approaches to accelerate their solution often require solving a large number of hard-to-solve optimization instances to collect training data, incurring significant computational overhead. Existing methods require training dedicated models for each problem distribution for each downstream task, severely limiting their scalability and generalization. In this work, we introduce Forge, a method of pre-training a vector-quantized graph autoencoder on a large and diverse collection of mixed-integer programming (MIP) instances in an unsupervised fashion without dependency on their solution. The vector quantization process creates discrete code assignments that act as a vocabulary to represent optimization instances. We evaluate our approach under both supervised and unsupervised settings. For the unsupervised setting, we demonstrate that Forge embeddings effectively differentiate and cluster unseen instances. For the supervised setting, we fine-tune Forge embeddings and show that a single model predicts both the variables for warm-starts and integrality gaps for cut-generation across multiple problem type distributions. Both predictions help improve performance of a state-of-the-art, commercial optimization solver. Finally, we release our code and pre-trained Forge weights to encourage further research and practical use of instance-level MIP embeddings at https://github.com/skadio/forge/
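
The code-assignment step that turns embeddings into a discrete "vocabulary" is standard vector quantization: nearest-codebook lookup with a straight-through gradient. A generic VQ sketch, not Forge's exact architecture:

import torch

def vq_assign(z, codebook):
    """Map each encoder output to its nearest codebook vector.
    z: (n, d) encoder outputs; codebook: (K, d) learned code vectors."""
    d2 = torch.cdist(z, codebook)     # (n, K) pairwise distances
    codes = d2.argmin(dim=1)          # discrete code index per embedding
    z_q = codebook[codes]             # quantized embeddings
    z_q = z + (z_q - z).detach()      # straight-through estimator for gradients
    return codes, z_q

z = torch.randn(10, 16)               # e.g. pooled MIP-instance embeddings
codebook = torch.randn(64, 16)
codes, z_q = vq_assign(z, codebook)   # `codes` act as the instance vocabulary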


【7】Multi-View Graph Convolution Network for Internal Talent Recommendation Based on Enterprise Emails
Link: https://arxiv.org/abs/2508.20328

Authors: Kim, Jang-Hyun Kim
Abstract: Internal talent recommendation is a critical strategy for organizational continuity, yet conventional approaches suffer from structural limitations, often overlooking qualified candidates by relying on the narrow perspective of a few managers. To address this challenge, we propose a novel framework that models two distinct dimensions of an employee's position fit from email data: WHAT they do (semantic similarity of tasks) and HOW they work (structural characteristics of their interactions and collaborations). These dimensions are represented as independent graphs and adaptively fused using a Dual Graph Convolutional Network (GCN) with a gating mechanism. Experiments show that our proposed gating-based fusion model significantly outperforms other fusion strategies and a heuristic baseline, achieving a top performance of 40.9% on Hit@100. Notably, the model demonstrates high interpretability by learning distinct, context-aware fusion strategies for different job families. For example, it learned to prioritize relational (HOW) data for 'sales and marketing' job families while applying a balanced approach for 'research' job families. This research offers a quantitative and comprehensive framework for internal talent discovery, minimizing the risk of candidate omission inherent in traditional methods. Its primary contribution lies in its ability to empirically determine the optimal fusion ratio between task alignment (WHAT) and collaborative patterns (HOW), which is required for employees to succeed in the new positions, thereby offering important practical implications.


【8】Bounds on Perfect Node Classification: A Convex Graph Clustering Perspective
Link: https://arxiv.org/abs/2508.20231

Authors: ahriari-Mehr, Javad Aliakbari, Alexandre Graell i Amat, Ashkan Panahi
Abstract: We present an analysis of the transductive node classification problem, where the underlying graph consists of communities that agree with the node labels and node features. For node classification, we propose a novel optimization problem that incorporates the node-specific information (labels and features) in a spectral graph clustering framework. Studying this problem, we demonstrate a synergy between the graph structure and node-specific information. In particular, we show that suitable node-specific information guarantees that the solution of our optimization problem perfectly recovers the communities, under milder conditions than the bounds on graph clustering alone. We present algorithmic solutions to our optimization problem and numerical experiments that confirm such a synergy.


Transformer (6 papers)

【1】SKGE-SWIN: End-To-End Autonomous Vehicle Waypoint Prediction and Navigation Using Skip Stage Swin Transformer
Link: https://arxiv.org/abs/2508.20762

Authors: jm Noer Kartiman, Rasim, Yaya Wihardi, Nurul Hasanah, Oskar Natan, Bambang Wahono, Taufik Ibnu Salim
Note: keywords-multitask learning, autonomous driving, end-to-end learning, skip connections, swin transformer, self-attention mechanism. 12 pages
Abstract: Focusing on the development of an end-to-end autonomous vehicle model with pixel-to-pixel context awareness, this research proposes the SKGE-Swin architecture. This architecture utilizes the Swin Transformer with a skip-stage mechanism to broaden feature representation globally and at various network levels. This approach enables the model to extract information from distant pixels by leveraging the Swin Transformer's Shifted Window-based Multi-head Self-Attention (SW-MSA) mechanism and to retain critical information from the initial to the final stages of feature extraction, thereby enhancing its capability to comprehend complex patterns in the vehicle's surroundings. The model is evaluated on the CARLA platform using adversarial scenarios to simulate real-world conditions. Experimental results demonstrate that the SKGE-Swin architecture achieves a superior Driving Score compared to previous methods. Furthermore, an ablation study is conducted to evaluate the contribution of each architectural component, including the influence of skip connections and the use of the Swin Transformer, in improving model performance.


【2】Structure-aware Hypergraph Transformer for Diagnosis Prediction in Electronic Health Records
Link: https://arxiv.org/abs/2508.20500

Authors: ng, Ye Yuan
Abstract: Electronic Health Records (EHR) systematically organize patient health data through standardized medical codes, serving as a comprehensive and invaluable source for predictive modeling. Graph neural networks (GNNs) have demonstrated effectiveness in modeling interactions between medical codes within EHR. However, existing GNN-based methods are inadequate due to: a) their reliance on pairwise relations fails to capture the inherent higher-order dependencies in clinical data, and b) the localized message-passing scheme limits representation power. To address these issues, this paper proposes a novel Structure-aware HyperGraph Transformer (SHGT) framework following three-fold ideas: a) employing a hypergraph structural encoder to capture higher-order interactions among medical codes, b) integrating the Transformer architecture to reason over the entire hypergraph, and c) designing a tailored loss function incorporating hypergraph reconstruction to preserve the hypergraph's original structure. Experiments on real-world EHR datasets demonstrate that the proposed SHGT outperforms existing state-of-the-art models on diagnosis prediction.


【3】Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention
Link: https://arxiv.org/abs/2508.20407

Authors: Tang
Abstract: The Transformer architecture has become a cornerstone of modern artificial intelligence, but its core self-attention mechanism suffers from a complexity bottleneck that scales quadratically with sequence length, severely limiting its application in long-sequence tasks. To address this challenge, existing linear attention methods typically sacrifice model performance by relying on data-agnostic kernel approximations or restrictive context selection. This paper returns to the first principles of connectionism, starting from the topological structure of information flow, to introduce a novel linear attention architecture, TLinFormer. By reconfiguring neuron connection patterns, TLinFormer achieves strict linear complexity while computing exact attention scores and ensuring information flow remains aware of the full historical context. This design aims to bridge the performance gap prevalent between existing efficient attention methods and standard attention. Through a series of experiments, we systematically evaluate the performance of TLinFormer against a standard Transformer baseline on long-sequence inference tasks. The results demonstrate that TLinFormer exhibits overwhelming advantages in key metrics such as inference latency, KV cache efficiency, memory footprint, and overall speedup.


【4】TF-TransUNet1D: Time-Frequency Guided Transformer U-Net for Robust ECG Denoising in Digital Twin
Link: https://arxiv.org/abs/2508.20398

Authors: ng, Lei Li
Note: 9 pages, 3 figures. International Workshop on Digital Twin for Healthcare (DT4H) in MICCAI 2025 (Daejeon, Republic of Korea)
Abstract: Electrocardiogram (ECG) signals serve as a foundational data source for cardiac digital twins, yet their diagnostic utility is frequently compromised by noise and artifacts. To address this issue, we propose TF-TransUNet1D, a novel one-dimensional deep neural network that integrates a U-Net-based encoder-decoder architecture with a Transformer encoder, guided by a hybrid time-frequency domain loss. The model is designed to simultaneously capture local morphological features and long-range temporal dependencies, which are critical for preserving the diagnostic integrity of ECG signals. To enhance denoising robustness, we introduce a dual-domain loss function that jointly optimizes waveform reconstruction in the time domain and spectral fidelity in the frequency domain. In particular, the frequency-domain component effectively suppresses high-frequency noise while maintaining the spectral structure of the signal, enabling recovery of subtle but clinically significant waveform components. We evaluate TF-TransUNet1D using synthetically corrupted signals from the MIT-BIH Arrhythmia Database and the Noise Stress Test Database (NSTDB). Comparative experiments against state-of-the-art baselines demonstrate consistent superiority of our model in terms of SNR improvement and error metrics, achieving a mean absolute error of 0.1285 and Pearson correlation coefficient of 0.9540. By delivering high-precision denoising, this work bridges a critical gap in pre-processing pipelines for cardiac digital twins, enabling more reliable real-time monitoring and personalized modeling.
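
A minimal sketch of such a dual-domain objective, assuming an MSE time-domain term plus an L1 penalty on FFT magnitude spectra; the paper's exact spectral term and weighting are assumptions:

import torch

def dual_domain_loss(pred, clean, alpha=0.5):
    """Hybrid time-frequency denoising loss (sketch).
    pred, clean: (batch, length) denoised and reference ECG signals."""
    time_loss = torch.mean((pred - clean) ** 2)
    freq_loss = torch.mean(torch.abs(
        torch.abs(torch.fft.rfft(pred, dim=-1)) -
        torch.abs(torch.fft.rfft(clean, dim=-1))
    ))
    return time_loss + alpha * freq_loss

pred, clean = torch.randn(4, 512), torch.randn(4, 512)
loss = dual_domain_loss(pred, clean)

Penalizing magnitude-spectrum differences pushes the network to match the signal's spectral envelope, which is one way a frequency term can suppress high-frequency noise without flattening genuine waveform detail.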


【5】CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference
Link: https://arxiv.org/abs/2508.20375

Authors: , Zhiwei Hao, Li Shen, Yong Luo, Fuhui Sun, Xiaoyan Wang, Han Hu, Yonggang Wen
Note: Accepted by IEEE Transactions on Computers
Abstract: The impressive performance of transformer models has sparked the deployment of intelligent applications on resource-constrained edge devices. However, ensuring high-quality service for real-time edge systems is a significant challenge due to the considerable computational demands and resource requirements of these models. Existing strategies typically either offload transformer computations to other devices or directly deploy compressed models on individual edge devices. These strategies, however, result in either considerable communication overhead or suboptimal trade-offs between accuracy and efficiency. To tackle these challenges, we propose a collaborative inference system for general transformer models, termed CoFormer. The central idea behind CoFormer is to exploit the divisibility and integrability of transformer. An off-the-shelf large transformer can be decomposed into multiple smaller models for distributed inference, and their intermediate results are aggregated to generate the final output. We formulate an optimization problem to minimize both inference latency and accuracy degradation under heterogeneous hardware constraints. The DeBo algorithm is proposed to first solve the optimization problem to derive the decomposition policy, and then progressively calibrate decomposed models to restore performance. We demonstrate the capability to support a wide range of transformer models on heterogeneous edge devices, achieving up to 3.1x inference speedup with large transformer models. Notably, CoFormer enables the efficient inference of GPT2-XL with 1.6 billion parameters on edge devices, reducing memory requirements by 76.3%. CoFormer can also reduce energy consumption by approximately 40% while maintaining satisfactory inference performance.


【6】What can we learn from signals and systems in a transformer? Insights for probabilistic modeling and inference architecture
Link: https://arxiv.org/abs/2508.20211

Authors: g Chang, Prashant G. Mehta
Note: 21 pages, 5 figures
Abstract: In the 1940s, Wiener introduced a linear predictor, where the future prediction is computed by linearly combining the past data. A transformer generalizes this idea: it is a nonlinear predictor where the next-token prediction is computed by nonlinearly combining the past tokens. In this essay, we present a probabilistic model that interprets transformer signals as surrogates of conditional measures, and layer operations as fixed-point updates. An explicit form of the fixed-point update is described for the special case when the probabilistic model is a hidden Markov model (HMM). In part, this paper is in an attempt to bridge the classical nonlinear filtering theory with modern inference architectures.
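
For context, in the HMM special case the fixed-point update presumably specializes the classical filtering (forward) recursion; in standard notation, with transition probabilities $a_{ij}$, emission densities $b_j(\cdot)$, and $\pi_t$ the conditional distribution of the hidden state given $y_1,\dots,y_t$ (the paper's exact form may differ):

$$\pi_{t+1}(j) \;=\; \frac{b_j(y_{t+1})\,\sum_i a_{ij}\,\pi_t(i)}{\sum_k b_k(y_{t+1})\,\sum_i a_{ik}\,\pi_t(i)}.$$

In this reading, a layer operation plays the role of one such update applied to the surrogate conditional measure.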


GAN | Adversarial | Attacks | Generation (5 papers)

【1】FW-GAN: Frequency-Driven Handwriting Synthesis with Wave-Modulated MLP Generator
Link: https://arxiv.org/abs/2508.21040

Authors: g Dang Khoa, Dang Hoai Nam, Vo Nguyen Le Duy
Abstract: Labeled handwriting data is often scarce, limiting the effectiveness of recognition systems that require diverse, style-consistent training samples. Handwriting synthesis offers a promising solution by generating artificial data to augment training. However, current methods face two major limitations. First, most are built on conventional convolutional architectures, which struggle to model long-range dependencies and complex stroke patterns. Second, they largely ignore the crucial role of frequency information, which is essential for capturing fine-grained stylistic and structural details in handwriting. To address these challenges, we propose FW-GAN, a one-shot handwriting synthesis framework that generates realistic, writer-consistent text from a single example. Our generator integrates a phase-aware Wave-MLP to better capture spatial relationships while preserving subtle stylistic cues. We further introduce a frequency-guided discriminator that leverages high-frequency components to enhance the authenticity detection of generated samples. Additionally, we introduce a novel Frequency Distribution Loss that aligns the frequency characteristics of synthetic and real handwriting, thereby enhancing visual fidelity. Experiments on Vietnamese and English handwriting datasets demonstrate that FW-GAN generates high-quality, style-consistent handwriting, making it a valuable tool for augmenting data in low-resource handwriting recognition (HTR) pipelines. Official implementation is available at https://github.com/DAIR-Group/FW-GAN


【2】CrystalICL: Enabling In-Context Learning for Crystal Generation
标题:CrystalICL:实现晶体生成的上下文学习
链接:https://arxiv.org/abs/2508.20143

作者:ang, Qiaoyu Tan, Yili Wang, Ying Wang, Xin Wang
摘要:设计具有所需物理化学性质的晶体材料仍然是材料科学中的一个基本挑战。虽然大型语言模型(LLM)已经展示出强大的上下文学习(ICL)能力,但现有的基于LLM的晶体生成方法局限于zero-shot场景,无法从Few-Shot场景中获益。相比之下,人类专家通常通过修改相关的已知结构来设计新材料,这与Few-Shot ICL范式高度一致。受此启发,我们提出了CrystalICL,一种专为Few-Shot晶体生成设计的新模型。具体而言,我们引入了一种基于空间群的晶体标记化方法,有效降低了在LLM中建模晶体对称性的复杂度。我们进一步引入了条件-结构感知的混合指令微调框架和多任务指令微调策略,使模型能够从有限数据中捕获结构-性质关系,从而更好地利用ICL。在四个晶体生成基准上的大量实验表明,CrystalICL在有条件和无条件生成任务上均优于领先的基线方法。
摘要:Designing crystal materials with desired physicochemical properties remains a fundamental challenge in materials science. While large language models (LLMs) have demonstrated strong in-context learning (ICL) capabilities, existing LLM-based crystal generation approaches are limited to zero-shot scenarios and are unable to benefit from few-shot scenarios. In contrast, human experts typically design new materials by modifying relevant known structures which aligns closely with the few-shot ICL paradigm. Motivated by this, we propose CrystalICL, a novel model designed for few-shot crystal generation. Specifically, we introduce a space-group based crystal tokenization method, which effectively reduces the complexity of modeling crystal symmetry in LLMs. We further introduce a condition-structure aware hybrid instruction tuning framework and a multi-task instruction tuning strategy, enabling the model to better exploit ICL by capturing structure-property relationships from limited data. Extensive experiments on four crystal generation benchmarks demonstrate the superiority of CrystalICL over the leading baseline methods on conditional and unconditional generation tasks.


【3】ArgRAG: Explainable Retrieval Augmented Generation using Quantitative Bipolar Argumentation
标题:ArgRAG:使用定量双极论证的可解释检索增强生成
链接:https://arxiv.org/abs/2508.20131

作者: Zhu, Nico Potyka, Daniel Hernández, Yuan He, Zifeng Ding, Bo Xiong, Dongzhuoran Zhou, Evgeny Kharlamov, Steffen Staab
摘要:检索增强生成(RAG)通过整合外部知识来增强大型语言模型,但在高风险领域存在严重局限:对噪声或矛盾证据敏感,且决策不透明、具有随机性。我们提出了ArgRAG:一个可解释、可质疑的替代方案,它使用定量双极论证框架(QBAF)以结构化推理取代黑盒推理。ArgRAG从检索到的文档构造QBAF,并在渐进语义(gradual semantics)下执行确定性推理,从而能够忠实地解释并质疑决策。在PubHealth和RAGuard两个事实核查基准上的评估表明,ArgRAG在取得高准确率的同时显著提升了透明度。
摘要:Retrieval-Augmented Generation (RAG) enhances large language models by incorporating external knowledge, yet suffers from critical limitations in high-stakes domains -- namely, sensitivity to noisy or contradictory evidence and opaque, stochastic decision-making. We propose ArgRAG, an explainable, and contestable alternative that replaces black-box reasoning with structured inference using a Quantitative Bipolar Argumentation Framework (QBAF). ArgRAG constructs a QBAF from retrieved documents and performs deterministic reasoning under gradual semantics. This allows faithfully explaining and contesting decisions. Evaluated on two fact verification benchmarks, PubHealth and RAGuard, ArgRAG achieves strong accuracy while significantly improving transparency.
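作为示意,下面用 DF-QuAD 风格的渐进语义给出一个极简的 QBAF 强度计算草图(论文采用的具体渐进语义可能不同,此处仅为说明;论据、基础分数与图结构均为虚构的玩具例子):

```python
def aggregate(vals):
    """DF-QuAD 式聚合:F(x1..xn) = 1 - prod(1 - xi);空集合返回 0。"""
    prod = 1.0
    for v in vals:
        prod *= 1.0 - v
    return 1.0 - prod

def qbaf_strengths(base, attackers, supporters, iters=50):
    """迭代更新每个论据的强度,直至(近似)不动点。"""
    s = dict(base)
    for _ in range(iters):
        new = {}
        for a, tau in base.items():
            va = aggregate([s[x] for x in attackers.get(a, [])])   # 攻击方聚合强度
            vs = aggregate([s[x] for x in supporters.get(a, [])])  # 支持方聚合强度
            if va >= vs:
                new[a] = tau - tau * (va - vs)
            else:
                new[a] = tau + (1 - tau) * (vs - va)
        s = new
    return s

# 玩具例子:结论 claim 被检索文档论据 d1 支持、被 d2 攻击
base = {"claim": 0.5, "d1": 0.8, "d2": 0.4}
print(qbaf_strengths(base, attackers={"claim": ["d2"]},
                     supporters={"claim": ["d1"]}))
```

在这个例子中,claim 的强度从基础分 0.5 升至 0.7:支持论据(0.8)强于攻击论据(0.4),且每一步强度都可以追溯到具体文档,这正是“可解释、可质疑”的来源。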


【4】MicroLad: 2D-to-3D Microstructure Reconstruction and Generation via Latent Diffusion and Score Distillation
标题:MicroLad:通过潜在扩散和分数蒸馏进行2D到3D微结构重建和生成
链接:https://arxiv.org/abs/2508.20138

作者: Lee, Faez Ahmed
摘要:在材料工程中建立可靠的结构-性能(SP)关联的一大障碍是缺乏多样化的3D微观结构数据集。数据集可用性有限以及对分析与设计空间的控制不足,限制了可实现的微观结构形貌的多样性,阻碍了逆向(性能到结构)设计问题的研究进展。为应对这些挑战,我们提出了MicroLad,一个专为从2D数据重建3D微观结构而设计的潜在扩散框架。该框架在2D图像上训练,并在潜在空间中采用多平面去噪扩散采样,能够可靠地生成稳定、连贯且与原始数据统计一致的3D体积。虽然这种重建能力实现了维度扩展(2D到3D),可以从2D数据生成统计上等效的3D样本,但要有效探索微观结构设计,还需要能将生成过程引向特定目标的方法。为此,MicroLad集成了分数蒸馏采样(SDS),将可微的分数损失与微观结构描述符匹配项和性能对齐项相结合。该方法在潜在空间中更新3D体积的编码2D切片,从而实现鲁棒的、逆向可控的2D到3D微观结构生成。因此,该方法有助于在微观结构描述符和材料性能两方面探索更大的3D微观结构分析与设计空间。
摘要:A major obstacle to establishing reliable structure-property (SP) linkages in materials engineering is the scarcity of diverse 3D microstructure datasets. Limited dataset availability and insufficient control over the analysis and design space restrict the variety of achievable microstructure morphologies, hindering progress in solving the inverse (property-to-structure) design problem. To address these challenges, we introduce MicroLad, a latent diffusion framework specifically designed for reconstructing 3D microstructures from 2D data. Trained on 2D images and employing multi-plane denoising diffusion sampling in the latent space, the framework reliably generates stable and coherent 3D volumes that remain statistically consistent with the original data. While this reconstruction capability enables dimensionality expansion (2D-to-3D) for generating statistically equivalent 3D samples from 2D data, effective exploration of microstructure design requires methods to guide the generation process toward specific objectives. To achieve this, MicroLad integrates score distillation sampling (SDS), which combines a differentiable score loss with microstructural descriptor-matching and property-alignment terms. This approach updates encoded 2D slices of the 3D volume in the latent space, enabling robust inverse-controlled 2D-to-3D microstructure generation. Consequently, the method facilitates exploration of an expanded 3D microstructure analysis and design space in terms of both microstructural descriptors and material properties.


【5】Is Audio Spoof Detection Robust to Laundering Attacks?
标题:音频欺骗检测对清洗(laundering)攻击是否鲁棒?
链接:https://arxiv.org/abs/2408.14712

作者:i, Surya Subramani, Shefali Sudhir, Raksha Varahamurthy, Hafiz Malik
备注:Conference Paper
摘要:近年来,语音克隆(VC)系统在合成语音的逼真度上取得了异乎寻常的提升。高质量的合成语音和低成本VC服务的可得性,引发了对该技术的诸多潜在滥用。多年来,人们提出了多种检测方法,能够以相当好的准确率检测语音欺骗。然而,这些方法大多是在干净的音频数据库(例如ASVSpoof 2019)上评估的。本文评估了SOTA音频欺骗检测方法在清洗(laundering)攻击下的表现。为此,我们创建了一个新的清洗攻击数据库,称为ASVSpoof Laundering数据库。该数据库基于ASVSpoof 2019(LA)评估数据库构建,共包含1388.22小时的音频录音。我们在该清洗数据库上评估了七种SOTA音频欺骗检测方法。结果表明,SOTA系统在激进的清洗攻击,特别是混响和加性噪声攻击下表现不佳。这表明需要更鲁棒的音频欺骗检测方法。
摘要:Voice-cloning (VC) systems have seen an exceptional increase in the realism of synthesized speech in recent years. The high quality of synthesized speech and the availability of low-cost VC services have given rise to many potential abuses of this technology. Several detection methodologies have been proposed over the years that can detect voice spoofs with reasonably good accuracy. However, these methodologies are mostly evaluated on clean audio databases, such as ASVSpoof 2019. This paper evaluates SOTA Audio Spoof Detection approaches in the presence of laundering attacks. In that regard, a new laundering attack database, called the ASVSpoof Laundering Database, is created. This database is based on the ASVSpoof 2019 (LA) eval database comprising a total of 1388.22 hours of audio recordings. Seven SOTA audio spoof detection approaches are evaluated on this laundered database. The results indicate that SOTA systems perform poorly in the presence of aggressive laundering attacks, especially reverberation and additive noise attacks. This suggests the need for robust audio spoof detection.
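下面给出一个极简的清洗式扰动草图(加性噪声与合成混响;RT60、信噪比等参数均为示例假设,真实数据库中的清洗操作更为多样):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x, snr_db):
    """按给定信噪比(dB)叠加高斯白噪声。"""
    noise = rng.standard_normal(len(x))
    p_sig = np.mean(x ** 2)
    p_noise = p_sig / (10 ** (snr_db / 10))
    return x + noise * np.sqrt(p_noise / np.mean(noise ** 2))

def add_reverb(x, rt60=0.3, sr=16000):
    """用指数衰减的合成脉冲响应近似混响(并非真实房间响应)。"""
    t = np.arange(int(rt60 * sr)) / sr
    rir = rng.standard_normal(len(t)) * np.exp(-6.9 * t / rt60)
    rir /= np.max(np.abs(rir)) + 1e-9
    return np.convolve(x, rir)[: len(x)]

x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 秒示例信号
laundered = add_noise(add_reverb(x), snr_db=10)          # 先混响后加噪
```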


半/弱/无/有监督|不确定性|主动学习(5篇)

【1】ActLoc: Learning to Localize on the Move via Active Viewpoint Selection
标题:ActLoc:通过主动视点选择学习运动中的定位
链接:https://arxiv.org/abs/2508.20981

作者:, Boyang Sun, Luca Di Giammarino, Hermann Blum, Marc Pollefeys
摘要:可靠的定位是机器人导航的关键,但大多数现有系统隐含地假设同一位置上所有观察方向的信息量是相同的。实际上,当机器人观察到未建图的、有歧义的或缺乏信息的区域时,定位就会变得不可靠。为了解决这个问题,我们提出了ActLoc,一个主动的、视点感知的规划框架,用于提高一般机器人导航任务中的定位精度。其核心是,ActLoc采用一个经过大规模训练的基于注意力的模型进行视点选择。该模型对度量地图以及建图期间使用的相机位姿进行编码,并预测任意3D位置上沿偏航和俯仰方向的定位精度。这些逐点的精度分布被纳入路径规划器,使机器人能够在满足任务和运动约束的同时,主动选择最大化定位鲁棒性的相机朝向。ActLoc在单视点选择上取得了最先进的结果,并能有效地推广到全轨迹规划。其模块化设计使其易于应用于各种机器人导航和巡检任务。
摘要:Reliable localization is critical for robot navigation, yet most existing systems implicitly assume that all viewing directions at a location are equally informative. In practice, localization becomes unreliable when the robot observes unmapped, ambiguous, or uninformative regions. To address this, we present ActLoc, an active viewpoint-aware planning framework for enhancing localization accuracy for general robot navigation tasks. At its core, ActLoc employs a large-scale trained attention-based model for viewpoint selection. The model encodes a metric map and the camera poses used during map construction, and predicts localization accuracy across yaw and pitch directions at arbitrary 3D locations. These per-point accuracy distributions are incorporated into a path planner, enabling the robot to actively select camera orientations that maximize localization robustness while respecting task and motion constraints. ActLoc achieves state-of-the-art results on single-viewpoint selection and generalizes effectively to full-trajectory planning. Its modular design makes it readily applicable to diverse robot navigation and inspection tasks.
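“将逐点精度预测纳入规划并选出满足约束的最优朝向”这一步可用如下极简草图示意(精度网格以随机数代替模型预测,俯仰约束为假设示例):

```python
import numpy as np

def select_viewpoint(acc_grid, yaw_bins, pitch_bins, valid_mask=None):
    """acc_grid: (len(yaw_bins), len(pitch_bins)) 的定位精度预测;
    valid_mask: 任务/运动约束允许的方向(布尔掩码)。"""
    grid = np.where(valid_mask, acc_grid, -np.inf) if valid_mask is not None else acc_grid
    i, j = np.unravel_index(np.argmax(grid), grid.shape)
    return yaw_bins[i], pitch_bins[j]

yaws = np.linspace(-180, 180, 36)
pitches = np.linspace(-30, 30, 7)
acc = np.random.rand(36, 7)              # 占位:应由训练好的注意力模型预测
mask = np.abs(pitches)[None, :] <= 20    # 例:俯仰角受限于 ±20°
print(select_viewpoint(acc, yaws, pitches, np.broadcast_to(mask, acc.shape)))
```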


【2】Unleashing Uncertainty: Efficient Machine Unlearning for Generative AI
标题:释放不确定性:生成性人工智能的高效机器去学习
链接:https://arxiv.org/abs/2508.20773

作者:ros N. Spartalis, Theodoros Semertzidis, Petros Daras, Efstratios Gavves
备注:ICML 2025 workshop on Machine Unlearning for Generative AI
摘要:我们介绍了SAFEMax,一种用于扩散模型机器去学习(Machine Unlearning)的新方法。SAFEMax以信息论原理为基础,最大化生成图像的熵,使模型在以不被允许的类别为条件时生成高斯噪声,并最终停止其去噪过程。此外,我们的方法通过有选择地聚焦于类别特定信息显著的早期扩散步骤,来控制遗忘与保留之间的平衡。我们的结果证明了SAFEMax的有效性,并突显了其相对最先进方法的显著效率优势。
摘要:We introduce SAFEMax, a novel method for Machine Unlearning in diffusion models. Grounded in information-theoretic principles, SAFEMax maximizes the entropy in generated images, causing the model to generate Gaussian noise when conditioned on impermissible classes by ultimately halting its denoising process. Also, our method controls the balance between forgetting and retention by selectively focusing on the early diffusion steps, where class-specific information is prominent. Our results demonstrate the effectiveness of SAFEMax and highlight its substantial efficiency gains over state-of-the-art methods.


【3】Supervised Stochastic Gradient Algorithms for Multi-Trial Source Separation
标题:多试次源分离的有监督随机梯度算法
链接:https://arxiv.org/abs/2508.20618

作者:ta, Mateus Piovezan Otto, Noah Stanis, Azadeh Yazdan-Shahmorad, Zaid Harchaoui
摘要:我们为独立成分分析开发了一种随机算法,该算法纳入了多试次监督信息,这类信息在许多科学场景中都可获得。该方法将可逆矩阵空间中的近端梯度型算法与通过反向传播联合学习的预测模型相结合。我们在合成数据和真实数据实验上演示了所提出的算法。特别地,得益于额外的监督,我们观察到非凸优化的成功率有所提高,独立成分的可解释性也得到改善。
摘要:We develop a stochastic algorithm for independent component analysis that incorporates multi-trial supervision, which is available in many scientific contexts. The method blends a proximal gradient-type algorithm in the space of invertible matrices with joint learning of a prediction model through backpropagation. We illustrate the proposed algorithm on synthetic and real data experiments. In particular, owing to the additional supervision, we observe an increased success rate of the non-convex optimization and the improved interpretability of the independent components.


【4】Particle swarm optimization for online sparse streaming feature selection under uncertainty
标题:不确定性下在线稀疏流特征选择的粒子群优化
链接:https://arxiv.org/abs/2508.20123

作者:u
摘要:在涉及高维流数据的实际应用中,在线流特征选择(OSFS)被广泛采用。然而,由于传感器故障或技术限制,实际部署经常面临数据不完整的问题。虽然在线稀疏流特征选择(OS2FS)借助基于潜在因子分析的插补缓解了这一问题,但现有方法难以处理不确定的特征-标签相关性,导致模型不灵活、性能下降。为弥补这些不足,本工作提出了POS2FS:一个由粒子群优化(PSO)增强的、不确定性感知的在线稀疏流特征选择框架。该方法引入:1)PSO驱动的监督,以降低特征-标签关系中的不确定性;2)三支决策理论,以管理监督学习中的特征模糊性。在六个真实世界数据集上的严格测试证实,POS2FS优于传统的OSFS和OS2FS技术,通过更鲁棒的特征子集选择获得了更高的准确率。
摘要:In real-world applications involving high-dimensional streaming data, online streaming feature selection (OSFS) is widely adopted. Yet, practical deployments frequently face data incompleteness due to sensor failures or technical constraints. While online sparse streaming feature selection (OS2FS) mitigates this issue via latent factor analysis-based imputation, existing methods struggle with uncertain feature-label correlations, leading to inflexible models and degraded performance. To address these gaps, this work proposes POS2FS-an uncertainty-aware online sparse streaming feature selection framework enhanced by particle swarm optimization (PSO). The approach introduces: 1) PSO-driven supervision to reduce uncertainty in feature-label relationships; 2) Three-way decision theory to manage feature fuzziness in supervised learning. Rigorous testing on six real-world datasets confirms POS2FS outperforms conventional OSFS and OS2FS techniques, delivering higher accuracy through more robust feature subset selection.


【5】LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty
标题:LoTUS:带一丝不确定性的大规模机器去学习
链接:https://arxiv.org/abs/2503.18314

作者:ros N. Spartalis, Theodoros Semertzidis, Efstratios Gavves, Petros Daras
备注:Accepted as a main conference paper at CVPR 2025 (this https URL)
摘要:我们提出了LoTUS,一种新颖的机器去学习(MU)方法,它能消除训练样本对预训练模型的影响,而无需从头重新训练。LoTUS将模型的预测概率平滑至一个信息论界限,缓解其因数据记忆而产生的过度自信。我们在Transformer和ResNet18模型上,针对五个公共数据集上的八个基线方法评估了LoTUS。除既有的MU基准之外,我们还在ImageNet1k这一大规模数据集上评估了去学习效果;在该数据集上重新训练并不现实,从而模拟了真实世界条件。此外,我们提出了新颖的免再训练詹森-香农散度(Retrain-Free Jensen-Shannon Divergence,RF-JSD)指标,以支持真实世界条件下的评估。实验结果表明,LoTUS在效率和有效性两方面均优于最先进的方法。代码:https://github.com/cspartalis/LoTUS。
摘要:We present LoTUS, a novel Machine Unlearning (MU) method that eliminates the influence of training samples from pre-trained models, avoiding retraining from scratch. LoTUS smooths the prediction probabilities of the model up to an information-theoretic bound, mitigating its over-confidence stemming from data memorization. We evaluate LoTUS on Transformer and ResNet18 models against eight baselines across five public datasets. Beyond established MU benchmarks, we evaluate unlearning on ImageNet1k, a large-scale dataset, where retraining is impractical, simulating real-world conditions. Moreover, we introduce the novel Retrain-Free Jensen-Shannon Divergence (RF-JSD) metric to enable evaluation under real-world conditions. The experimental results show that LoTUS outperforms state-of-the-art methods in terms of both efficiency and effectiveness. Code: https://github.com/cspartalis/LoTUS.


迁移|Zero/Few/One-Shot|自适应(6篇)

【1】Self-Composing Neural Operators with Depth and Accuracy Scaling via Adaptive Train-and-Unroll Approach
标题:通过自适应训练展开方法进行深度和准确度缩放的自组合神经运算符
链接:https://arxiv.org/abs/2508.20650

作者:, Xinliang Liu, Jinchao Xu
摘要:在这项工作中,我们提出了一个通过自复合(self-composition)来提升神经算子效率和精度的新框架,兼具理论保证和实际收益。受求解数值偏微分方程(PDE)的迭代方法启发,我们通过重复应用同一个神经算子块来设计特定的神经算子:在不显式添加新块的情况下逐步加深模型,从而提升模型容量。为了高效训练这些模型,我们引入了一种自适应的“训练-展开”(train-and-unroll)方法,即在训练过程中逐渐增加神经算子的深度。这种方法揭示了精度随模型深度变化的缩放律,并通过自适应训练策略带来显著的计算节省。我们的架构在标准基准上达到了最先进(SOTA)的性能。我们进一步在具有挑战性的高频超声计算机断层扫描(USCT)问题上验证了其有效性,其中受多重网格启发的骨干网络在解析复杂波动现象方面表现出优越性能。所提出的框架为大规模数据驱动的科学机器学习应用提供了一个计算上易于处理、准确且可扩展的解决方案。
摘要:In this work, we propose a novel framework to enhance the efficiency and accuracy of neural operators through self-composition, offering both theoretical guarantees and practical benefits. Inspired by iterative methods in solving numerical partial differential equations (PDEs), we design a specific neural operator by repeatedly applying a single neural operator block, we progressively deepen the model without explicitly adding new blocks, improving the model's capacity. To train these models efficiently, we introduce an adaptive train-and-unroll approach, where the depth of the neural operator is gradually increased during training. This approach reveals an accuracy scaling law with model depth and offers significant computational savings through our adaptive training strategy. Our architecture achieves state-of-the-art (SOTA) performance on standard benchmarks. We further demonstrate its efficacy on a challenging high-frequency ultrasound computed tomography (USCT) problem, where a multigrid-inspired backbone enables superior performance in resolving complex wave phenomena. The proposed framework provides a computationally tractable, accurate, and scalable solution for large-scale data-driven scientific machine learning applications.
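“重复应用同一算子块并在训练中逐步增加展开深度”的思路可以用下面的极简 PyTorch 草图说明(块结构、深度调度等均为假设示例,并非论文的具体实现):

```python
import torch
import torch.nn as nn

class SelfComposingOperator(nn.Module):
    """同一算子块被重复应用 depth 次(权重共享):加深模型而不增加参数。"""
    def __init__(self, dim=64):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x, depth):
        for _ in range(depth):
            x = x + self.block(x)  # 残差式自复合,类似迭代求解器走一步
        return x

model = SelfComposingOperator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 64), torch.randn(32, 64)
for step in range(300):
    depth = 1 + step // 100  # 假设的调度:训练中每 100 步将展开深度加一
    loss = nn.functional.mse_loss(model(x, depth), y)
    opt.zero_grad(); loss.backward(); opt.step()
print("final depth:", depth, "loss:", float(loss))
```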


【2】Delay-adaptive Control of Nonlinear Systems with Approximate Neural Operator Predictors
标题:基于近似神经算子预测器的非线性系统延迟自适应控制
链接:https://arxiv.org/abs/2508.20367

作者:, Miroslav Krstic, Yuanyuan Shi
备注:9 pages, 1 Figure
摘要:在这项工作中,我们提出了一种严格的方法,用于在具有未知且任意长执行器延迟的非线性系统中实现预测器反馈控制器。为了解决预测器在解析上难以处理的问题,我们使用学习得到的神经算子映射来近似它。该映射只需离线训练一次,随后在线部署,充分利用神经网络的快速推理能力。我们基于神经算子的通用逼近定理和延迟的输运偏微分方程(PDE)表示,给出了理论稳定性分析。随后,我们通过Lyapunov-Krasovskii泛函证明了动力系统的半全局实际收敛性,该结论依赖于预测器的逼近误差和延迟界。最后,我们在一个生物激活剂/抑制剂系统上验证了理论结果,相比传统数值方法实现了15倍的加速。
摘要:In this work, we propose a rigorous method for implementing predictor feedback controllers in nonlinear systems with unknown and arbitrarily long actuator delays. To address the analytically intractable nature of the predictor, we approximate it using a learned neural operator mapping. This mapping is trained once, offline, and then deployed online, leveraging the fast inference capabilities of neural networks. We provide a theoretical stability analysis based on the universal approximation theorem of neural operators and the transport partial differential equation (PDE) representation of the delay. We then prove, via a Lyapunov-Krasovskii functional, semi-global practical convergence of the dynamical system dependent on the approximation error of the predictor and delay bounds. Finally, we validate our theoretical results using a biological activator/repressor system, demonstrating speedups of 15 times compared to traditional numerical methods.


【3】Adaptive Segmentation of EEG for Machine Learning Applications
标题:用于机器学习应用的脑电自适应分割
链接:https://arxiv.org/abs/2508.20336

作者:hou, Joseph West, Krista A. Ehinger, Zhenming Ren, Sam E. John, David B. Grayden
摘要:目的。脑电图(EEG)数据是对连续的神经时间序列信号进行采样得到的。为了将EEG信号用于机器学习,必须把信号划分为可处理的片段。目前的朴素做法使用任意的固定时间切片,其生物学相关性可能有限,因为大脑状态并不局限于固定的时间间隔。我们研究自适应分割方法是否有利于EEG的机器学习分析。   方法。我们提出了一种新的自适应分割方法CTXSEG,它基于EEG数据中的统计差异生成变长片段,并提出了将其与通常要求固定长度输入的现代机器学习方法配合使用的途径。我们使用由新型信号发生器CTXGEN生成的可控合成数据来考察CTXSEG。虽然CTXSEG方法具有通用性,但我们通过将其应用于EEG癫痫发作检测问题,在真实世界用例中对其进行了验证:在典型的EEG机器学习癫痫检测管道的预处理步骤中,比较CTXSEG与固定长度分割的性能。   主要结果。我们发现,在使用标准化框架进行评估时,使用CTXSEG准备EEG数据比固定长度方法提升了癫痫发作检测性能,且无需修改机器学习方法,所需片段也更少。   意义。这项工作表明,CTXSEG自适应分割可以方便地应用于现代机器学习方法,并有望提升性能。它是信号预处理中固定长度分割的一个有前景的替代方案,应被视为EEG机器学习应用标准预处理手段的一部分。
摘要:Objective. Electroencephalography (EEG) data is derived by sampling continuous neurological time series signals. In order to prepare EEG signals for machine learning, the signal must be divided into manageable segments. The current naive approach uses arbitrary fixed time slices, which may have limited biological relevance because brain states are not confined to fixed intervals. We investigate whether adaptive segmentation methods are beneficial for machine learning EEG analysis.   Approach. We introduce a novel adaptive segmentation method, CTXSEG, that creates variable-length segments based on statistical differences in the EEG data and propose ways to use them with modern machine learning approaches that typically require fixed-length input. We assess CTXSEG using controllable synthetic data generated by our novel signal generator CTXGEN. While our CTXSEG method has general utility, we validate it on a real-world use case by applying it to an EEG seizure detection problem. We compare the performance of CTXSEG with fixed-length segmentation in the preprocessing step of a typical EEG machine learning pipeline for seizure detection.   Main results. We found that using CTXSEG to prepare EEG data improves seizure detection performance compared to fixed-length approaches when evaluated using a standardized framework, without modifying the machine learning method, and requires fewer segments.   Significance. This work demonstrates that adaptive segmentation with CTXSEG can be readily applied to modern machine learning approaches, with potential to improve performance. It is a promising alternative to fixed-length segmentation for signal preprocessing and should be considered as part of the standard preprocessing repertoire in EEG machine learning applications.
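“基于统计差异生成变长片段”可以用如下极简的变点检测式切分草图来说明(阈值、窗口长度等均为假设参数,并非 CTXSEG 的原始算法):

```python
import numpy as np

def adaptive_segments(x, win=128, z_thresh=2.0, min_len=256):
    """基于相邻窗口的统计差异切分变长片段(变点检测式示意)。"""
    bounds, t = [0], win
    while t + win <= len(x):
        a, b = x[t - win:t], x[t:t + win]
        pooled = np.sqrt((a.var() + b.var()) / 2) + 1e-9
        z = abs(a.mean() - b.mean()) / pooled        # 标准化的窗口均值差
        if z > z_thresh and t - bounds[-1] >= min_len:
            bounds.append(t)
        t += win // 2
    bounds.append(len(x))
    return list(zip(bounds[:-1], bounds[1:]))

rng = np.random.default_rng(0)
sig = np.concatenate([rng.standard_normal(1000), 5 + rng.standard_normal(800)])
print(adaptive_segments(sig))   # 预期在 1000 附近产生一个分段边界
```

若下游模型要求固定长度输入,可再将每个变长片段重采样或填充到统一长度,这正是摘要中“与要求固定长度输入的方法配合使用”的含义之一。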


【4】Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization
标题:情境世界模型中动态对齐的潜在想象力用于Zero-Shot概括
链接:https://arxiv.org/abs/2508.20294

作者:er, Jan Benad, Manfred Eppe, Pradeep Kr. Banerjee
备注:31 pages, 4 figures
摘要:现实世界的强化学习要求在不进行昂贵再训练的前提下适应未见过的环境条件。上下文马尔可夫决策过程(cMDP)刻画了这一挑战,但现有方法通常需要显式的上下文变量(例如摩擦系数、重力),当上下文是潜在的或难以测量时便难以适用。我们提出了动力学对齐潜在想象(DALI),一个集成在Dreamer架构中的框架,能从智能体与环境的交互中推断潜在的上下文表示。通过训练一个自监督编码器来预测前向动力学,DALI生成可用于调制世界模型和策略的可操作表示,在感知与控制之间架起桥梁。我们从理论上证明了该编码器对于高效的上下文推断和鲁棒的泛化是必要的。DALI的潜在空间实现了反事实一致性:扰动编码重力的维度,会以物理上合理的方式改变想象出的轨迹。在具有挑战性的cMDP基准上,DALI相对上下文无关的基线取得了显著提升,在外推任务中常常超越上下文感知的基线,实现了对未见上下文变化的zero-shot泛化。
摘要 :Real-world reinforcement learning demands adaptation to unseen environmental conditions without costly retraining. Contextual Markov Decision Processes (cMDP) model this challenge, but existing methods often require explicit context variables (e.g., friction, gravity), limiting their use when contexts are latent or hard to measure. We introduce Dynamics-Aligned Latent Imagination (DALI), a framework integrated within the Dreamer architecture that infers latent context representations from agent-environment interactions. By training a self-supervised encoder to predict forward dynamics, DALI generates actionable representations conditioning the world model and policy, bridging perception and control. We theoretically prove this encoder is essential for efficient context inference and robust generalization. DALI's latent space enables counterfactual consistency: Perturbing a gravity-encoding dimension alters imagined rollouts in physically plausible ways. On challenging cMDP benchmarks, DALI achieves significant gains over context-unaware baselines, often surpassing context-aware baselines in extrapolation tasks, enabling zero-shot generalization to unseen contextual variations.
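摘要中“训练自监督编码器预测前向动力学、从交互历史推断潜在上下文”的思路,可以用如下极简 PyTorch 草图示意(网络结构与维度均为假设,并非 DALI 的具体实现):

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """以前向动力学预测为自监督目标,从交互历史推断潜在上下文。"""
    def __init__(self, obs_dim=8, act_dim=2, ctx_dim=4, hidden=64):
        super().__init__()
        self.enc = nn.GRU(obs_dim + act_dim, ctx_dim, batch_first=True)
        self.dyn = nn.Sequential(
            nn.Linear(obs_dim + act_dim + ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, obs_seq, act_seq):
        _, h = self.enc(torch.cat([obs_seq, act_seq], -1))
        ctx = h[-1]                                  # 潜在上下文,可供世界模型/策略调用
        inp = torch.cat([obs_seq[:, -1], act_seq[:, -1], ctx], -1)
        return self.dyn(inp), ctx                    # 预测下一步观测

model = ContextEncoder()
obs, act = torch.randn(32, 10, 8), torch.randn(32, 10, 2)
next_obs = torch.randn(32, 8)
pred, ctx = model(obs, act)
loss = nn.functional.mse_loss(pred, next_obs)  # 前向动力学自监督损失
```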


【5】Plug-in Feedback Self-adaptive Attention in CLIP for Training-free Open-Vocabulary Segmentation
标题:用于免训练开放词汇分割的CLIP插件式反馈自适应注意力
链接:https://arxiv.org/abs/2508.20265

作者:Chi, Yanan Wu, Li Gu, Huan Liu, Ziqiang Wang, Yang Zhang, Yang Wang, Konstantinos N. Plataniotis
备注:ICCV 2025, code:this https URL
摘要:CLIP具有很强的视觉-文本对齐能力,但由于定位能力较差,在开放词汇分割上表现欠佳。先前的方法通过修改中间注意力来增强空间一致性。然而,由于后续操作(如投影),这种一致性并不能稳定地传播到最终输出。此外,中间注意力缺乏与文本表示的直接交互,这种语义差异限制了CLIP的全部潜力。   在这项工作中,我们提出了一个免训练、反馈驱动的自适应框架,将基于输出的补丁级对应关系反馈回中间注意力。输出预测作为模型处理的最终结果,封装了关于每个补丁最全面的视觉和文本语义。我们的方法将模型输出作为更强的空间一致性先验,增强了内部表示与最终预测之间的语义一致性。我们设计了若干关键模块,包括注意力隔离、用于稀疏自适应的基于置信度的修剪,以及自适应集成,以有效地反馈输出一致性线索。我们的方法作为一个插件模块,可无缝集成到采用三种骨干(ViT-B、ViT-L、ViT-H)的四种最先进方法中。我们还在多种注意力类型(Q-K、self-self,以及用MAE、SAM和DINO增强的Proxy)上进一步验证了该框架。我们的方法在八个基准上一致地提升了它们的性能。
摘要:CLIP exhibits strong visual-textual alignment but struggle with open-vocabulary segmentation due to poor localization. Prior methods enhance spatial coherence by modifying intermediate attention. But, this coherence isn't consistently propagated to the final output due to subsequent operations such as projections. Additionally, intermediate attention lacks direct interaction with text representations, such semantic discrepancy limits the full potential of CLIP.   In this work, we propose a training-free, feedback-driven self-adaptive framework that adapts output-based patch-level correspondences back to the intermediate attention. The output predictions, being the culmination of the model's processing, encapsulate the most comprehensive visual and textual semantics about each patch. Our approach enhances semantic consistency between internal representations and final predictions by leveraging the model's outputs as a stronger spatial coherence prior. We design key modules, including attention isolation, confidence-based pruning for sparse adaptation, and adaptation ensemble, to effectively feedback the output coherence cues. Our method functions as a plug-in module, seamlessly integrating into four state-of-the-art approaches with three backbones (ViT-B, ViT-L, ViT-H). We further validate our framework across multiple attention types (Q-K, self-self, and Proxy augmented with MAE, SAM, and DINO). Our approach consistently improves their performance across eight benchmarks.


【6】Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation
标题:决策规则漂移下分类的转移学习及其在最佳个性化治疗规则估计中的应用
链接:https://arxiv.org/abs/2508.20942

作者:ang, Yang Ning
摘要:在本文中,我们将迁移学习分类框架从基于回归函数的方法扩展到决策规则。我们提出了一种通过贝叶斯决策规则来建模后验漂移的新方法。通过利用贝叶斯决策边界的几何变换,我们的方法将问题重新表述为一个低维的经验风险最小化问题。在温和的正则性条件下,我们建立了估计量的一致性并导出了风险界。此外,我们通过将该方法改造用于最优个体化治疗规则的估计,展示了其广泛的适用性。大量模拟研究和真实数据分析进一步证明了我们方法的优越性能和鲁棒性。
摘要:In this paper, we extend the transfer learning classification framework from regression function-based methods to decision rules. We propose a novel methodology for modeling posterior drift through Bayes decision rules. By exploiting the geometric transformation of the Bayes decision boundary, our method reformulates the problem as a low-dimensional empirical risk minimization problem. Under mild regularity conditions, we establish the consistency of our estimators and derive the risk bounds. Moreover, we illustrate the broad applicability of our method by adapting it to the estimation of optimal individualized treatment rules. Extensive simulation studies and analyses of real-world data further demonstrate both superior performance and robustness of our approach.


强化学习(4篇)

【1】Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance
标题:具有强化学习指导的扩散模型推理时对齐控制
链接:https://arxiv.org/abs/2508.21016

作者: Jin, Zijie Qiu, Jie Liu, Zijie Diao, Lifeng Qiao, Ning Ding, Alex Lamb, Xipeng Qiu
摘要:基于去噪的生成模型,特别是扩散和流匹配算法,已经取得了显著的成功。然而,将其输出分布与复杂的下游目标(例如人类偏好、组合准确性或数据可压缩性)对齐仍然具有挑战性。虽然受大型语言模型中人类反馈强化学习(RLHF)进展的启发,强化学习(RL)微调方法已被移植到这些生成框架中,但目前的RL方法对扩散模型而言仍是次优的,并且在微调后控制对齐强度方面灵活性有限。在这项工作中,我们通过随机微分方程和隐式奖励条件化的视角,重新诠释了扩散模型的RL微调。我们引入了强化学习指导(RLG),这是一种推理时方法,它借鉴无分类器指导(CFG)的思路,通过几何平均将基础模型和RL微调模型的输出相结合。我们的理论分析表明,RLG的指导尺度在数学上等价于调整标准RL目标中的KL正则化系数,从而无需进一步训练即可动态控制对齐与质量之间的权衡。大量实验表明,RLG在各种架构、RL算法和下游任务(包括人类偏好、组合控制、可压缩性和文本渲染)中持续提升RL微调模型的性能。此外,RLG同时支持内插和外推,从而在控制生成对齐方面提供了前所未有的灵活性。我们的方法为在推理阶段增强和控制扩散模型对齐提供了一个实用且理论上合理的解决方案。RLG的源代码已在GitHub公开:https://github.com/jinluo12345/Reinforcement-learning-guidance。
摘要:Denoising-based generative models, particularly diffusion and flow matching algorithms, have achieved remarkable success. However, aligning their output distributions with complex downstream objectives, such as human preferences, compositional accuracy, or data compressibility, remains challenging. While reinforcement learning (RL) fine-tuning methods, inspired by advances in RL from human feedback (RLHF) for large language models, have been adapted to these generative frameworks, current RL approaches are suboptimal for diffusion models and offer limited flexibility in controlling alignment strength after fine-tuning. In this work, we reinterpret RL fine-tuning for diffusion models through the lens of stochastic differential equations and implicit reward conditioning. We introduce Reinforcement Learning Guidance (RLG), an inference-time method that adapts Classifier-Free Guidance (CFG) by combining the outputs of the base and RL fine-tuned models via a geometric average. Our theoretical analysis shows that RLG's guidance scale is mathematically equivalent to adjusting the KL-regularization coefficient in standard RL objectives, enabling dynamic control over the alignment-quality trade-off without further training. Extensive experiments demonstrate that RLG consistently improves the performance of RL fine-tuned models across various architectures, RL algorithms, and downstream tasks, including human preferences, compositional control, compressibility, and text rendering. Furthermore, RLG supports both interpolation and extrapolation, thereby offering unprecedented flexibility in controlling generative alignment. Our approach provides a practical and theoretically sound solution for enhancing and controlling diffusion model alignment at inference. The source code for RLG is publicly available at the Github: https://github.com/jinluo12345/Reinforcement-learning-guidance.
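在高斯去噪模型中,两个输出分布的几何平均对应于噪声预测的线性组合,形式上与 CFG 相同。下面给出一个假设性的最小草图(张量形状为示例;w 为指导尺度,0<w<1 为内插,w>1 为外推):

```python
import torch

def rlg_noise_prediction(eps_base: torch.Tensor, eps_rl: torch.Tensor, w: float) -> torch.Tensor:
    """类 CFG 的组合:eps = (1-w)*eps_base + w*eps_rl。
    在高斯情形下等价于两个模型输出分布(重新归一化后)的几何平均。"""
    return (1.0 - w) * eps_base + w * eps_rl

# 用法示意:在每个去噪步中,用组合后的噪声预测代替单一模型的预测
eps_b = torch.randn(2, 4, 32, 32)   # 基础模型的噪声预测(示例张量)
eps_r = torch.randn(2, 4, 32, 32)   # RL 微调模型的噪声预测
eps = rlg_noise_prediction(eps_b, eps_r, w=1.5)   # w>1:外推,增强对齐
```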


【2】Multi-Agent Reinforcement Learning in Intelligent Transportation Systems: A Comprehensive Survey
标题:智能交通系统中的多智能体强化学习:全面调查
链接:https://arxiv.org/abs/2508.20315

作者:s Donatus, Kumater Ter, Ore-Ofe Ajayi, Daniel Udekwe
摘要:城市交通日益复杂,加之对高效、可持续和自适应解决方案的需求,使智能交通系统(ITS)处于现代基础设施创新的前沿。ITS的核心挑战在于动态、大规模且不确定环境中的自主决策:多个智能体(交通信号、自动驾驶车辆或车队单元)必须有效协同。多智能体强化学习(MARL)通过使分布式智能体共同学习在个体目标与系统整体效率之间取得平衡的最优策略,为应对这些挑战提供了一个有前景的范式。本文对MARL在ITS中的应用进行了全面综述。我们引入了一个结构化的分类法,按照协调模型和学习算法对MARL方法进行分类,涵盖基于价值、基于策略、演员-评论家以及通信增强的框架。我们综述了其在关键ITS领域的应用,包括交通信号控制、网联与自动驾驶车辆协同、物流优化和按需出行系统。此外,我们重点介绍了支持MARL实验的常用仿真平台,如SUMO、CARLA和CityFlow,以及新兴的基准。该综述还指出了核心挑战,包括可扩展性、非平稳性、信用分配、通信约束以及仿真到现实(sim-to-real)的迁移差距,这些挑战仍在阻碍现实世界的部署。
摘要:The growing complexity of urban mobility and the demand for efficient, sustainable, and adaptive solutions have positioned Intelligent Transportation Systems (ITS) at the forefront of modern infrastructure innovation. At the core of ITS lies the challenge of autonomous decision-making across dynamic, large scale, and uncertain environments where multiple agents traffic signals, autonomous vehicles, or fleet units must coordinate effectively. Multi Agent Reinforcement Learning (MARL) offers a promising paradigm for addressing these challenges by enabling distributed agents to jointly learn optimal strategies that balance individual objectives with system wide efficiency. This paper presents a comprehensive survey of MARL applications in ITS. We introduce a structured taxonomy that categorizes MARL approaches according to coordination models and learning algorithms, spanning value based, policy based, actor critic, and communication enhanced frameworks. Applications are reviewed across key ITS domains, including traffic signal control, connected and autonomous vehicle coordination, logistics optimization, and mobility on demand systems. Furthermore, we highlight widely used simulation platforms such as SUMO, CARLA, and CityFlow that support MARL experimentation, along with emerging benchmarks. The survey also identifies core challenges, including scalability, non stationarity, credit assignment, communication constraints, and the sim to real transfer gap, which continue to hinder real world deployment.


【3】QTMRL: An Agent for Quantitative Trading Decision-Making Based on Multi-Indicator Guided Reinforcement Learning
标题:QTMRL:基于多指标引导强化学习的量化交易决策代理
链接:https://arxiv.org/abs/2508.20467

作者: Liu, Jiahao Chen
摘要:在高度波动且不确定的全球金融市场中,依赖统计建模或经验规则的传统量化交易模型,由于假设僵化、泛化能力有限,往往难以适应市场的动态变化和黑天鹅事件。为解决这些问题,本文提出了QTMRL(Quantitative Trading Multi-Indicator Reinforcement Learning),一种将多维技术指标与强化学习(RL)相结合、用于自适应且稳定的投资组合管理的智能交易代理。我们首先使用23年的标普500日频OHLCV数据(2000-2022),为5个行业的16只代表性股票构建了一个全面的多指标数据集,在原始数据之上补充趋势、波动率和动量指标,以捕捉整体市场动态。然后,我们基于优势演员-评论家(A2C)算法设计了一个轻量级RL框架,包含数据处理、A2C算法和交易代理三个模块,以支持策略学习和可执行的交易决策。大量实验在多种市场状态下将QTMRL与9个基线(例如ARIMA、LSTM、移动平均策略)进行比较,验证了其在盈利能力、风险调整收益和下行风险控制方面的优势。QTMRL的代码已公开:https://github.com/ChenJiahaoJNU/QTMRL.git
摘要:In the highly volatile and uncertain global financial markets, traditional quantitative trading models relying on statistical modeling or empirical rules often fail to adapt to dynamic market changes and black swan events due to rigid assumptions and limited generalization. To address these issues, this paper proposes QTMRL (Quantitative Trading Multi-Indicator Reinforcement Learning), an intelligent trading agent combining multi-dimensional technical indicators with reinforcement learning (RL) for adaptive and stable portfolio management. We first construct a comprehensive multi-indicator dataset using 23 years of S&P 500 daily OHLCV data (2000-2022) for 16 representative stocks across 5 sectors, enriching raw data with trend, volatility, and momentum indicators to capture holistic market dynamics. Then we design a lightweight RL framework based on the Advantage Actor-Critic (A2C) algorithm, including data processing, A2C algorithm, and trading agent modules to support policy learning and actionable trading decisions. Extensive experiments compare QTMRL with 9 baselines (e.g., ARIMA, LSTM, moving average strategies) across diverse market regimes, verifying its superiority in profitability, risk adjustment, and downside risk control. The code of QTMRL is publicly available at https://github.com/ChenJiahaoJNU/QTMRL.git


【4】Deep Reinforcement Learning for Optimal Asset Allocation Using DDPG with TiDE
标题:使用DDPG和TiDE实现最佳资产配置的深度强化学习
链接:https://arxiv.org/abs/2508.20103

作者:iu, Jin Zheng, John Cartlidge
备注:10 pages, 3 figures, authors accepted manuscript, to appear in 24th International Conference on Modelling and Applied Simulation (MAS), Sep. 2025, Fes, Morocco
摘要:由于金融市场固有的波动性,在风险资产和无风险资产之间进行最优配置是一个长期存在的挑战。传统方法依赖严格的分布假设或非可加的收益比率,限制了其鲁棒性以及对投资目标的适用性。为克服这些约束,本研究将最优双资产配置问题表述为马尔可夫决策过程(MDP)中的序贯决策任务。该框架使得强化学习(RL)机制能够基于模拟的金融场景开发动态策略,而不依赖上述先决假设。我们使用凯利准则来平衡即时奖励信号与长期投资目标,并创新性地将时间序列密集编码器(TiDE)集成到深度确定性策略梯度(DDPG)RL框架中以进行连续决策。我们将DDPG-TiDE与一个简单的离散动作Q学习RL框架以及被动的买入并持有投资策略进行比较。实证结果表明,DDPG-TiDE优于Q学习,并产生比买入并持有更高的风险调整后收益。这些发现表明,通过在DDPG强化学习框架中集成TiDE来求解最优资产配置问题,是值得进一步探索的富有成效的方向。
摘要:The optimal asset allocation between risky and risk-free assets is a persistent challenge due to the inherent volatility in financial markets. Conventional methods rely on strict distributional assumptions or non-additive reward ratios, which limit their robustness and applicability to investment goals. To overcome these constraints, this study formulates the optimal two-asset allocation problem as a sequential decision-making task within a Markov Decision Process (MDP). This framework enables the application of reinforcement learning (RL) mechanisms to develop dynamic policies based on simulated financial scenarios, regardless of prerequisites. We use the Kelly criterion to balance immediate reward signals against long-term investment objectives, and we take the novel step of integrating the Time-series Dense Encoder (TiDE) into the Deep Deterministic Policy Gradient (DDPG) RL framework for continuous decision-making. We compare DDPG-TiDE with a simple discrete-action Q-learning RL framework and a passive buy-and-hold investment strategy. Empirical results show that DDPG-TiDE outperforms Q-learning and generates higher risk adjusted returns than buy-and-hold. These findings suggest that tackling the optimal asset allocation problem by integrating TiDE within a DDPG reinforcement learning framework is a fruitful avenue for further exploration.
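作为背景,凯利准则在对数效用与高斯收益近似下给出的风险资产配置比例为 f* = (mu - r) / sigma^2;这是教科书形式,论文中的具体奖励设计未必与之完全相同。一个极简的数值例子:

```python
# 凯利比例的教科书近似:f* = (mu - r) / sigma^2
mu    = 0.08   # 风险资产的期望年化收益(假设值)
r     = 0.02   # 无风险利率(假设值)
sigma = 0.20   # 风险资产收益的年化波动率(假设值)

f_star = (mu - r) / sigma ** 2
print(f"风险资产配置比例 f* = {f_star:.2f}")  # 1.50:含杠杆;实践中常截断至 [0, 1]
```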


符号|符号学习(2篇)

【1】Compositionality in Time Series: A Proof of Concept using Symbolic Dynamics and Compositional Data Augmentation
标题:时间序列中的组合性:使用符号动力学和组合数据增强的概念证明
链接:https://arxiv.org/abs/2508.20656

作者:agmann, Michael Staniek, Stefan Riezler
备注:None
摘要:这项工作研究自然现象的时间序列是否可以被理解为:由按照系统而有规律的方式排列的潜在状态序列所生成。我们聚焦于临床时间序列,追问临床测量是否可以解释为由有意义的生理状态生成,且这些状态的接续遵循系统性原则。揭示潜在的组合结构将使我们能够创建合成数据,以缓解临床时间序列预测中数据稀疏、资源匮乏这一众所周知的问题,并加深我们对临床数据的理解。我们首先将时间序列的组合性概念化为数据生成过程的一种属性,然后研究能够重建该过程的基本状态和组合规则的数据驱动程序。我们使用源自域适应视角的两个实证检验来评估该方法的成效:两个检验都通过比较按特定方式在原始数据和合成数据上训练与测试的时间序列预测模型的期望风险,来推断原始时间序列分布与合成时间序列分布的相似性。实验结果表明,在组合式合成数据上训练所获得的测试集性能,与在原始临床时间序列数据上训练的性能相当;在组合式合成测试数据上评估模型,也得到与在原始测试数据上评估相似的结果,优于基于随机化的数据增强。对序贯器官衰竭评估(SOFA)评分预测任务的额外下游评估显示,与在原始数据上训练相比,完全基于组合式合成数据训练的模型取得了显著的性能提升。
摘要:This work investigates whether time series of natural phenomena can be understood as being generated by sequences of latent states which are ordered in systematic and regular ways. We focus on clinical time series and ask whether clinical measurements can be interpreted as being generated by meaningful physiological states whose succession follows systematic principles. Uncovering the underlying compositional structure will allow us to create synthetic data to alleviate the notorious problem of sparse and low-resource data settings in clinical time series forecasting, and deepen our understanding of clinical data. We start by conceptualizing compositionality for time series as a property of the data generation process, and then study data-driven procedures that can reconstruct the elementary states and composition rules of this process. We evaluate the success of this methods using two empirical tests originating from a domain adaptation perspective. Both tests infer the similarity of the original time series distribution and the synthetic time series distribution from the similarity of expected risk of time series forecasting models trained and tested on original and synthesized data in specific ways. Our experimental results show that the test set performance achieved by training on compositionally synthesized data is comparable to training on original clinical time series data, and that evaluation of models on compositionally synthesized test data shows similar results to evaluating on original test data, outperforming randomization-based data augmentation. An additional downstream evaluation of the prediction task of sequential organ failure assessment (SOFA) scores shows significant performance gains when model training is entirely based on compositionally synthesized data compared to training on original data.


【2】Discovering equations from data: symbolic regression in dynamical systems
标题:从数据中发现方程:动力系统中的符号回归
链接:https://arxiv.org/abs/2508.20257

作者:. Brum, Luiza Lober, Isolde Previdelli, Francisco A. Rodrigues
摘要:从数据中发现方程的过程是物理学以及包括数学生态学和流行病学在内的许多其他研究领域的核心。近来,被称为符号回归的机器学习方法使这一过程自动化。由于文献中已有多种方法,对它们进行比较十分重要,尤其是针对描述复杂现象的动力系统。本文使用五种符号回归方法从九个动力学过程(包括混沌动力学和流行病模型)中恢复方程,其中PySR方法被证明最适合用于推断方程。基准测试结果显示其预测能力和准确性都很高,部分估计结果与原始解析形式无法区分。这些结果凸显了符号回归作为推断和建模现实世界现象的稳健工具的潜力。
摘要:The process of discovering equations from data lies at the heart of physics and in many other areas of research, including mathematical ecology and epidemiology. Recently, machine learning methods known as symbolic regression have automated this process. As several methods are available in the literature, it is important to compare them, particularly for dynamic systems that describe complex phenomena. In this paper, five symbolic regression methods were used for recovering equations from nine dynamical processes, including chaotic dynamics and epidemic models, with the PySR method proving to be the most suitable for inferring equations. Benchmark results demonstrate its high predictive power and accuracy, with some estimates being indistinguishable from the original analytical forms. These results highlight the potential of symbolic regression as a robust tool for inferring and modelling real-world phenomena.
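作为示意,下面给出 PySR 的一个最小用法草图(需要安装 pysr 包及其 Julia 后端;数据为虚构的玩具动力学,算子集合亦为示例选择,并非论文的实验配置):

```python
import numpy as np
from pysr import PySRRegressor

# 玩具例子:从 (状态, 导数) 数据恢复 Lotka-Volterra 型方程的右端项
X = np.random.rand(200, 2) * 2                  # 状态变量 (x, y)
y = 1.0 * X[:, 0] - 0.5 * X[:, 0] * X[:, 1]     # dx/dt 的真实形式

model = PySRRegressor(
    niterations=40,                  # 搜索迭代次数
    binary_operators=["+", "-", "*"],
    unary_operators=["sin", "exp"],
)
model.fit(X, y)
print(model.sympy())   # 输出搜索到的最优符号表达式
```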


医学相关(6篇)

【1】Unified Multi-task Learning for Voice-Based Detection of Diverse Clinical Conditions
标题:统一多任务学习用于基于语音的各种临床状况检测
链接:https://arxiv.org/abs/2508.20717

作者: Yuan Lu, Hareld Kemps, Tong Xia, Aaqib Saeed
摘要:基于语音的健康评估为可扩展、非侵入式的疾病筛查提供了前所未有的机会,但现有方法通常只关注单一病症,未能利用语音中蕴含的丰富、多方面信息。我们提出了MARVEL(Multi-task Acoustic Representations for Voice-based Health Analysis),一个注重隐私的多任务学习框架,仅使用派生的声学特征即可同时检测九种不同的神经系统、呼吸系统和嗓音疾病,无需传输原始音频。我们的双分支架构采用专用编码器和任务专属的头,共享一个共同的声学骨干,从而实现有效的跨病症知识迁移。在大规模Bridge2AI-Voice v2.0数据集上的评估显示,MARVEL的总体AUROC为0.78,在神经系统疾病上表现突出(AUROC = 0.89),尤其是阿尔茨海默病/轻度认知障碍(AUROC = 0.97)。我们的框架始终比单模态基线高出5-19%,并在9项任务中的7项上超过了最先进的自监督模型;相关性分析表明,学习到的表示与既有声学特征呈现出有意义的相似性,说明模型的内部表示与临床认可的声学模式一致。通过证明单个统一模型可以有效筛查多种病症,这项工作为资源受限和远程医疗环境中可部署的基于语音的诊断奠定了基础。
摘要:Voice-based health assessment offers unprecedented opportunities for scalable, non-invasive disease screening, yet existing approaches typically focus on single conditions and fail to leverage the rich, multi-faceted information embedded in speech. We present MARVEL (Multi-task Acoustic Representations for Voice-based Health Analysis), a privacy-conscious multitask learning framework that simultaneously detects nine distinct neurological, respiratory, and voice disorders using only derived acoustic features, eliminating the need for raw audio transmission. Our dual-branch architecture employs specialized encoders with task-specific heads sharing a common acoustic backbone, enabling effective cross-condition knowledge transfer. Evaluated on the large-scale Bridge2AI-Voice v2.0 dataset, MARVEL achieves an overall AUROC of 0.78, with exceptional performance on neurological disorders (AUROC = 0.89), particularly for Alzheimer's disease/mild cognitive impairment (AUROC = 0.97). Our framework consistently outperforms single-modal baselines by 5-19% and surpasses state-of-the-art self-supervised models on 7 of 9 tasks, while correlation analysis reveals that the learned representations exhibit meaningful similarities with established acoustic features, indicating that the model's internal representations are consistent with clinically recognized acoustic patterns. By demonstrating that a single unified model can effectively screen for diverse conditions, this work establishes a foundation for deployable voice-based diagnostics in resource-constrained and remote healthcare settings.
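“共享声学骨干 + 任务专属头”的多任务结构可以用如下极简 PyTorch 草图示意(层宽、特征维度等均为假设,并非官方实现):

```python
import torch
import torch.nn as nn

class MultiTaskVoiceModel(nn.Module):
    """共享声学骨干,每种病症一个二分类任务头(极简示意)。"""
    def __init__(self, feat_dim=128, hidden=256, num_tasks=9):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(num_tasks)])

    def forward(self, x):
        h = self.backbone(x)                                    # 共享表示
        return torch.cat([head(h) for head in self.heads], -1)  # 每任务一个 logit

model = MultiTaskVoiceModel()
feats = torch.randn(8, 128)                   # 占位:派生声学特征(非原始音频)
labels = torch.randint(0, 2, (8, 9)).float()  # 九个二分类任务的标签
loss = nn.functional.binary_cross_entropy_with_logits(model(feats), labels)
```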


【2】MedGR$^2$: Breaking the Data Barrier for Medical Reasoning via Generative Reward Learning
标题:MedGR$^2$:通过生成式奖励学习打破医学推理的数据壁垒
链接:https://arxiv.org/abs/2508.20549

作者:i, Jiayan Guo, Shangyang Li
备注:8 pages, 5 figures
摘要:视觉语言模型(VLM)在医学中的应用因缺乏高质量、专家标注的数据而受到严重阻碍。在现有数据集上进行监督微调(SFT)通常导致对未见模态和任务的泛化能力较差;而强化学习(RL)这一有前景的替代方案,则因该数据稀缺领域缺乏可靠的奖励信号而受阻。为了打破这一僵局,我们提出了医学推理的生成式奖励学习(MedGR$^2$),一个构建自我改进良性循环的新框架。MedGR$^2$协同开发一个数据生成器和一个奖励模型,能够自动、持续地创建高质量的多模态医学数据,这些数据同时为SFT和RL提供了更优的训练来源。我们的实验表明,仅用MedGR$^2$生成的数据进行SFT,就已经超越了在大规模人工整理数据集上训练的基线。至关重要的是,当通过组相对策略优化(GRPO)将这些数据用于RL时,我们的模型实现了最先进的跨模态和跨任务泛化,显著优于基于RL的专门方法。此外,在MedGR$^2$的加持下,我们的紧凑模型达到了与参数量超过其10倍的基础模型相竞争的性能。MedGR$^2$为高风险领域的数据高效学习提供了一种新范式,将问题从数据稀缺转变为数据生成,并释放了RL在构建真正可泛化医疗AI方面的全部潜力。
摘要 :The application of Vision-Language Models (VLMs) in medicine is critically hampered by the scarcity of high-quality, expert-annotated data. Supervised Fine-Tuning (SFT) on existing datasets often leads to poor generalization on unseen modalities and tasks, while Reinforcement Learning (RL), a promising alternative, is stymied by the lack of reliable reward signals in this data-scarce domain. To break this impasse, we introduce Generative Reward Learning for Medical Reasoning (MedGR$^2$), a novel framework that creates a self-improving virtuous cycle. MedGR$^2$ co-develops a data generator and a reward model, enabling the automated, continuous creation of high-quality, multi-modal medical data that serves as both a superior training source for SFT and RL. Our experiments demonstrate that SFT with MedGR$^2$-produced data already surpasses baselines trained on large-scale, human-curated datasets. Crucially, when leveraging this data for RL via Group Relative Policy Optimization (GRPO), our model achieves state-of-the-art cross-modality and cross-task generalization, significantly outperforming specialized RL-based methods. Furthermore, our compact model, empowered by MedGR$^2$, achieves performance competitive with foundation models possessing over 10 times more parameters. MedGR$^2$ presents a new paradigm for data-efficient learning in high-stakes domains, transforming the problem from data scarcity to data generation and unlocking the full potential of RL for building truly generalizable medical AI.


【3】Enhancing Corpus Callosum Segmentation in Fetal MRI via Pathology-Informed Domain Randomization
标题:通过病理信息引导的域随机化增强胎儿MRI中的胼胝体分割
链接:https://arxiv.org/abs/2508.20475

作者:ifell i Plana, Vladyslav Zalevskyi, Léa Schmidt, Yvan Gomez, Thomas Sanchez, Vincent Dunet, Mériam Koob, Vanessa Siffredi, Meritxell Bach Cuadra
备注:Accepted at the PIPPI Workshop of MICCAI 2025
摘要:准确的胎儿脑分割对于提取生物标志物和评估神经发育至关重要,尤其是在胼胝体发育不全(CCD)这类可能引起剧烈解剖学改变的情况下。然而,CCD的罕见性严重限制了标注数据,阻碍了深度学习模型的泛化。为此,我们提出了一种病理信息引导的域随机化策略,将CCD表现的先验知识嵌入到合成数据生成管道中。仅从健康数据出发模拟多样的大脑改变,我们的方法无需病理标注即可实现鲁棒的分割。   我们在一个包含248例健康胎儿、26例CCD和47例其他脑部病变的队列上验证了该方法,在CCD病例上取得实质性改进,同时保持了在健康胎儿和其他病变上的性能。从预测的分割结果中,我们得出了胼胝体长度(LCC)和体积等临床相关生物标志物,并展示了它们在区分CCD亚型方面的效用。我们的病理信息增强将健康病例的LCC估计误差从1.89 mm降至0.80 mm,将CCD病例的误差从10.9 mm降至0.7 mm。除了这些定量收益之外,相对于现有的真值标注,我们的方法产生的分割具有更好的拓扑一致性,从而支持更可靠的基于形状的分析。总体而言,这项工作表明,将领域特定的解剖学先验纳入合成数据管道,可以有效缓解数据稀缺问题,并加强对罕见但具有临床意义的畸形的分析。
摘要:Accurate fetal brain segmentation is crucial for extracting biomarkers and assessing neurodevelopment, especially in conditions such as corpus callosum dysgenesis (CCD), which can induce drastic anatomical changes. However, the rarity of CCD severely limits annotated data, hindering the generalization of deep learning models. To address this, we propose a pathology-informed domain randomization strategy that embeds prior knowledge of CCD manifestations into a synthetic data generation pipeline. By simulating diverse brain alterations from healthy data alone, our approach enables robust segmentation without requiring pathological annotations.   We validate our method on a cohort comprising 248 healthy fetuses, 26 with CCD, and 47 with other brain pathologies, achieving substantial improvements on CCD cases while maintaining performance on both healthy fetuses and those with other pathologies. From the predicted segmentations, we derive clinically relevant biomarkers, such as corpus callosum length (LCC) and volume, and show their utility in distinguishing CCD subtypes. Our pathology-informed augmentation reduces the LCC estimation error from 1.89 mm to 0.80 mm in healthy cases and from 10.9 mm to 0.7 mm in CCD cases. Beyond these quantitative gains, our approach yields segmentations with improved topological consistency relative to available ground truth, enabling more reliable shape-based analyses. Overall, this work demonstrates that incorporating domain-specific anatomical priors into synthetic data pipelines can effectively mitigate data scarcity and enhance analysis of rare but clinically significant malformations.


【4】Dual-Model Weight Selection and Self-Knowledge Distillation for Medical Image Classification
标题:基于双模型权重选择和自知识蒸馏的医学图像分类
链接:https://arxiv.org/abs/2508.20461

作者:tsumi, Guang Li, Ren Togo, Takahiro Ogawa, Satoshi Kondo, Miki Haseyama
摘要:我们提出了一种将双模型权重选择与自知识蒸馏(SKD)相结合的新型医学图像分类方法。在真实的医疗环境中,大规模模型的部署常常受限于计算资源约束,这对其实际落地构成了重大挑战。因此,开发在保持计算效率的同时达到与大规模模型相当性能的轻量级模型至关重要。为此,我们采用双模型权重选择策略,用来自大型预训练模型的权重初始化两个轻量级模型,从而实现有效的知识迁移。接下来,对这些选定的模型应用SKD,使得在不增加过多额外计算成本的情况下可以利用广泛的初始权重配置,随后再针对目标分类任务进行微调。通过将双模型权重选择与自知识蒸馏相结合,我们的方法克服了传统方法往往无法在紧凑模型中保留关键信息的局限。在公开可用的数据集(胸部X射线图像、肺部CT扫描和脑部MRI扫描)上的大量实验,证明了我们的方法相比现有方法具有优越的性能和鲁棒性。
摘要:We propose a novel medical image classification method that integrates dual-model weight selection with self-knowledge distillation (SKD). In real-world medical settings, deploying large-scale models is often limited by computational resource constraints, which pose significant challenges for their practical implementation. Thus, developing lightweight models that achieve comparable performance to large-scale models while maintaining computational efficiency is crucial. To address this, we employ a dual-model weight selection strategy that initializes two lightweight models with weights derived from a large pretrained model, enabling effective knowledge transfer. Next, SKD is applied to these selected models, allowing the use of a broad range of initial weight configurations without imposing additional excessive computational cost, followed by fine-tuning for the target classification tasks. By combining dual-model weight selection with self-knowledge distillation, our method overcomes the limitations of conventional approaches, which often fail to retain critical information in compact models. Extensive experiments on publicly available datasets-chest X-ray images, lung computed tomography scans, and brain magnetic resonance imaging scans-demonstrate the superior performance and robustness of our approach compared to existing methods.


【5】A Systematic Review on the Generative AI Applications in Human Medical Genomics
标题:生成人工智能在人类医学基因组学中的应用系统回顾
链接:https://arxiv.org/abs/2508.20275

作者:ngalidis, Yury Barbitoff, Yulia Nasykhova, Andrey Glotov
备注:31 pages, 5 figures
摘要:尽管传统统计技术和机器学习方法对遗传学,特别是遗传性疾病诊断做出了重大贡献,但它们往往难以处理复杂的高维数据,而这一挑战如今正由最先进的深度学习模型来应对。基于Transformer架构的大型语言模型(LLM)在需要对非结构化医疗数据进行上下文理解的任务中表现出色。这篇系统综述考察了LLM在罕见病和常见病的遗传研究与诊断中的作用。我们在PubMed、bioRxiv、medRxiv和arXiv中进行了基于关键词的自动检索,针对LLM在遗传学诊断和教育中的应用研究,并剔除不相关或过时的模型。共分析了172项研究,重点涵盖基因组变异识别、注释和解读方面的应用,以及由视觉Transformer带来的医学影像进展。主要发现表明:虽然基于Transformer的模型显著推进了疾病和风险分层、变异解读、医学影像分析和报告生成,但在将多模态数据(基因组序列、影像和临床记录)整合为统一且临床稳健的管道方面仍存在重大挑战,其泛化性和在临床环境中的实际落地也受到限制。本综述对LLM在变革遗传病诊断和支持遗传学教育方面的现有能力与局限进行了全面的分类和评估,可作为驾驭这一快速发展领域的指南。
摘要 :Although traditional statistical techniques and machine learning methods have contributed significantly to genetics and, in particular, inherited disease diagnosis, they often struggle with complex, high-dimensional data, a challenge now addressed by state-of-the-art deep learning models. Large language models (LLMs), based on transformer architectures, have excelled in tasks requiring contextual comprehension of unstructured medical data. This systematic review examines the role of LLMs in the genetic research and diagnostics of both rare and common diseases. Automated keyword-based search in PubMed, bioRxiv, medRxiv, and arXiv was conducted, targeting studies on LLM applications in diagnostics and education within genetics and removing irrelevant or outdated models. A total of 172 studies were analyzed, highlighting applications in genomic variant identification, annotation, and interpretation, as well as medical imaging advancements through vision transformers. Key findings indicate that while transformer-based models significantly advance disease and risk stratification, variant interpretation, medical imaging analysis, and report generation, major challenges persist in integrating multimodal data (genomic sequences, imaging, and clinical records) into unified and clinically robust pipelines, facing limitations in generalizability and practical implementation in clinical settings. This review provides a comprehensive classification and assessment of the current capabilities and limitations of LLMs in transforming hereditary disease diagnostics and supporting genetic education, serving as a guide to navigate this rapidly evolving field.


【6】Is the medical image segmentation problem solved? A survey of current developments and future directions
标题:医学图像分割问题解决了吗?当前发展和未来方向概览
链接:https://arxiv.org/abs/2508.20139

作者:u, Jayaram K. Udupa, Jax Luo, Songlin Zhao, Yajun Yu, Scott B. Raymond, Hao Peng, Lipeng Ning, Yogesh Rathi, Wei Liu, You Zhang
备注:80 pages, 38 figures
摘要:在过去二十年里,医学图像分割取得了快速进展,这在很大程度上由深度学习驱动,使得在不同成像模态中准确而高效地勾画细胞、组织、器官和病变成为可能。这一进展提出了一个根本性问题:当前的模型在多大程度上克服了长期存在的挑战,还有哪些差距?在这项工作中,我们对医学图像分割进行了深入回顾,梳理了其在过去十年间的进展和关键发展。我们考察了分割网络的编码器、瓶颈、跳跃连接和解码器各组件中的核心原则,包括多尺度分析、注意力机制以及先验知识的整合。我们的讨论围绕七个关键维度展开:(1)从监督学习到半监督/无监督学习的转变,(2)从器官分割到以病变为中心任务的转变,(3)多模态融合与域适应的进展,(4)基础模型与迁移学习的作用,(5)从确定性分割到概率分割的转变,(6)从2D到3D和4D分割的演进,以及(7)从模型调用到分割智能体的趋势。这些视角共同提供了基于深度学习的医学图像分割发展轨迹的整体概览,旨在启发未来的创新。为支持后续研究,我们在 https://github.com/apple1986/medicalSegReview 维护了一个持续更新的相关文献和开源资源库。
摘要:Medical image segmentation has advanced rapidly over the past two decades, largely driven by deep learning, which has enabled accurate and efficient delineation of cells, tissues, organs, and pathologies across diverse imaging modalities. This progress raises a fundamental question: to what extent have current models overcome persistent challenges, and what gaps remain? In this work, we provide an in-depth review of medical image segmentation, tracing its progress and key developments over the past decade. We examine core principles, including multiscale analysis, attention mechanisms, and the integration of prior knowledge, across the encoder, bottleneck, skip connections, and decoder components of segmentation networks. Our discussion is organized around seven key dimensions: (1) the shift from supervised to semi-/unsupervised learning, (2) the transition from organ segmentation to lesion-focused tasks, (3) advances in multi-modality integration and domain adaptation, (4) the role of foundation models and transfer learning, (5) the move from deterministic to probabilistic segmentation, (6) the progression from 2D to 3D and 4D segmentation, and (7) the trend from model invocation to segmentation agents. Together, these perspectives provide a holistic overview of the trajectory of deep learning-based medical image segmentation and aim to inspire future innovation. To support ongoing research, we maintain a continually updated repository of relevant literature and open-source resources at https://github.com/apple1986/medicalSegReview


蒸馏|知识提取(3篇)

【1】Learning Robust Spatial Representations from Binaural Audio through Feature Distillation
标题:通过特征蒸馏从双耳音频中学习稳健的空间表示
链接:https://arxiv.org/abs/2508.20914

作者:verin Bovbjerg (1), Jan Østergaard (1), Jesper Jensen (1, 2), Shinji Watanabe (3), Zheng-Hua Tan ((1) Aalborg University (2) Eriksholm Research Centre, (3) Carnegie Mellon University)
备注:To appear in Proc. WASPAA 2025, October 12-15, 2025, Tahoe, US. Copyright (c) 2025 IEEE. 5 pages, 2 figures, 2 tables
摘要:近来,深度表示学习在多项音频任务中展现出强大性能,但其在从多通道音频中学习空间表示方面的应用仍未得到充分探索。我们研究了基于特征蒸馏的预训练阶段,用于在无需数据标签的情况下学习双耳语音的鲁棒空间表示。在该框架中,先从干净的双耳语音样本计算空间特征,作为预测标签;然后用神经网络从相应的增强语音中预测这些干净特征。预训练之后,我们丢弃空间特征预测器,用学到的编码器权重初始化一个到达方向(DoA)估计模型,并针对DoA估计进行微调。实验表明,与全监督模型和经典信号处理方法相比,预训练模型在针对到达方向估计微调后,在含噪和混响环境中表现出更好的性能。
摘要:Recently, deep representation learning has shown strong performance in multiple audio tasks. However, its use for learning spatial representations from multichannel audio is underexplored. We investigate the use of a pretraining stage based on feature distillation to learn a robust spatial representation of binaural speech without the need for data labels. In this framework, spatial features are computed from clean binaural speech samples to form prediction labels. These clean features are then predicted from corresponding augmented speech using a neural network. After pretraining, we throw away the spatial feature predictor and use the learned encoder weights to initialize a DoA estimation model which we fine-tune for DoA estimation. Our experiments demonstrate that the pretrained models show improved performance in noisy and reverberant environments after fine-tuning for direction-of-arrival estimation, when compared to fully supervised models and classic signal processing methods.
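该预训练框架的核心是:从干净双耳语音计算空间特征作为标签,再让编码器从增强语音中预测这些特征。下面用声道间电平差(ILD)作为示例空间特征给出一个极简草图(特征选择与网络结构均为假设,并非论文的具体配置):

```python
import torch
import torch.nn as nn

def ild_features(stereo, eps=1e-8):
    """双耳声道间电平差(ILD,dB)作为示例空间特征;stereo: (B, 2, T)。"""
    e_l = stereo[:, 0].pow(2).mean(-1)
    e_r = stereo[:, 1].pow(2).mean(-1)
    return (10 * torch.log10((e_l + eps) / (e_r + eps))).unsqueeze(-1)

encoder = nn.Sequential(nn.Flatten(), nn.Linear(2 * 1600, 64), nn.ReLU())
head = nn.Linear(64, 1)   # 预训练结束后丢弃,仅保留 encoder 初始化 DoA 模型
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)

clean = torch.randn(16, 2, 1600)                    # 干净双耳片段(随机占位)
augmented = clean + 0.3 * torch.randn_like(clean)   # 噪声/混响等增强的占位
target = ild_features(clean)                        # 干净特征作为蒸馏标签
loss = nn.functional.mse_loss(head(encoder(augmented)), target)
opt.zero_grad(); loss.backward(); opt.step()
```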


【2】VarDiU: A Variational Diffusive Upper Bound for One-Step Diffusion Distillation
标题:VarDiU:一步扩散蒸馏的变分扩散上界
链接:https://arxiv.org/abs/2508.20646

作者:ng, Mingtian Zhang, Zijing Ou, David Barber
备注:Leyang Wang and Mingtian Zhang contributed equally to this work
摘要:最近,扩散蒸馏方法已能将上千步的教师扩散模型压缩为一步的学生生成器,同时保持样本质量。大多数现有方法使用某种扩散散度来训练学生模型,其梯度通过学生的得分函数来近似,而该得分函数由去噪得分匹配(DSM)学习得到。由于DSM训练并不完美,由此得到的梯度估计不可避免地存在偏差,导致次优性能。在本文中,我们提出了VarDiU(发音为/va:rdju:/),一个变分扩散上界,它允许无偏的梯度估计器,并可直接应用于扩散蒸馏。基于这一目标,我们将我们的方法与Diff-Instruct进行比较,证明它实现了更高的生成质量,并为一步扩散蒸馏提供了更高效、更稳定的训练过程。
摘要:Recently, diffusion distillation methods have compressed thousand-step teacher diffusion models into one-step student generators while preserving sample quality. Most existing approaches train the student model using a diffusive divergence whose gradient is approximated via the student's score function, learned through denoising score matching (DSM). Since DSM training is imperfect, the resulting gradient estimate is inevitably biased, leading to sub-optimal performance. In this paper, we propose VarDiU (pronounced /va:rdju:/), a Variational Diffusive Upper Bound that admits an unbiased gradient estimator and can be directly applied to diffusion distillation. Using this objective, we compare our method with Diff-Instruct and demonstrate that it achieves higher generation quality and enables a more efficient and stable training procedure for one-step diffusion distillation.


【3】The Role of Teacher Calibration in Knowledge Distillation
标题:教师校准在知识蒸馏中的作用
链接:https://arxiv.org/abs/2508.20224

作者:im, Seonguk Park, Junhoo Lee, Nojun Kwak
备注:None
摘要:知识蒸馏(KD)已成为深度学习中一种有效的模型压缩技术,能够将知识从大型教师模型迁移到紧凑的学生模型。虽然KD已经取得了显著成功,但哪些因素有助于提升学生的性能尚未被完全理解。在本文中,我们揭示了教师的校准误差与学生的准确率之间存在很强的相关性。因此,我们主张教师模型的校准是有效KD的重要因素。此外,我们证明只需采用一种能降低教师校准误差的校准方法,就可以提升KD的性能。我们的算法具有通用性,在从分类到检测的各种任务上均证明了其有效性。此外,它可以轻松地与现有的最先进方法集成,并持续取得更优的性能。
摘要:Knowledge Distillation (KD) has emerged as an effective model compression technique in deep learning, enabling the transfer of knowledge from a large teacher model to a compact student model. While KD has demonstrated significant success, it is not yet fully understood which factors contribute to improving the student's performance. In this paper, we reveal a strong correlation between the teacher's calibration error and the student's accuracy. Therefore, we claim that the calibration of the teacher model is an important factor for effective KD. Furthermore, we demonstrate that the performance of KD can be improved by simply employing a calibration method that reduces the teacher's calibration error. Our algorithm is versatile, demonstrating effectiveness across various tasks from classification to detection. Moreover, it can be easily integrated with existing state-of-the-art methods, consistently achieving superior performance.
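论文指出“只需采用一种降低教师校准误差的校准方法”即可提升KD。温度缩放(temperature scaling)是最常用的校准方法之一;下面是一个假设性的最小草图(在验证集 logits 上拟合温度 T,再用校准后的软标签做蒸馏;数据均为随机示例,并非论文的具体算法):

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, iters=200, lr=0.01):
    """在验证集上拟合温度 T 以最小化 NLL,从而降低教师的校准误差。"""
    log_T = torch.zeros(1, requires_grad=True)   # 参数化 log T,保证 T > 0
    opt = torch.optim.Adam([log_T], lr=lr)
    for _ in range(iters):
        loss = F.cross_entropy(logits / log_T.exp(), labels)
        opt.zero_grad(); loss.backward(); opt.step()
    return log_T.exp().item()

val_logits = torch.randn(512, 10)                 # 占位:教师在验证集上的 logits
val_labels = torch.randint(0, 10, (512,))
T = fit_temperature(val_logits.detach(), val_labels)
soft_targets = F.softmax(val_logits / T, dim=-1)  # 校准后的软标签,供 KD 使用
print("fitted temperature:", T)
```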


推荐(3篇)

【1】Efficient Large-Scale Cross-Domain Sequential Recommendation with Dynamic State Representations
标题:具有动态状态表示的高效大规模跨域顺序推荐
链接:https://arxiv.org/abs/2508.20945

作者: Loureiro, Steven Derby, Aleksei Medvedev, Alejandro Ariza-Casabona, Gonzalo Fiz Pontiveros, Tri Kurniawan Wijaya
备注:4 pages
摘要:最近,自回归推荐模型(ARM),如Meta的HSTU模型,已成为对传统深度学习推荐模型(DLRM)的重大突破,展现出备受追捧的缩放律行为。然而,当应用于多领域场景时,Transformer架构的注意力图成为计算瓶颈,因为它要对所有领域中的全部物品计算注意力。为应对这一挑战,系统必须高效地平衡域间和域内的知识迁移。在这项工作中,我们提出了一种面向可扩展多领域推荐系统的新方法,用两种创新机制取代完整的域间注意力:1)转换感知位置嵌入(TAPE):我们提出考虑领域转换特有信息的新型位置嵌入。这使注意力可以只聚焦于域内物品,有效降低了关注无关领域带来的不必要计算成本。2)动态领域状态表示(DDSR):我们为每个领域引入一个动态状态表示,在后续的词元预测期间进行存储和访问。这样无需依赖完整的注意力图即可高效传递相关的领域信息。我们的方法为大规模多领域推荐系统所带来的挑战提供了可扩展的解决方案,并通过分别建模并组合域间与域内表示,在检索任务上取得了显著改进。
摘要:Recently, autoregressive recommendation models (ARMs), such as Meta's HSTU model, have emerged as a major breakthrough over traditional Deep Learning Recommendation Models (DLRMs), exhibiting the highly sought-after scaling law behaviour. However, when applied to multi-domain scenarios, the transformer architecture's attention maps become a computational bottleneck, as they attend to all items across every domain. To tackle this challenge, systems must efficiently balance inter and intra-domain knowledge transfer. In this work, we introduce a novel approach for scalable multi-domain recommendation systems by replacing full inter-domain attention with two innovative mechanisms: 1) Transition-Aware Positional Embeddings (TAPE): We propose novel positional embeddings that account for domain-transition specific information. This allows attention to be focused solely on intra-domain items, effectively reducing the unnecessary computational cost associated with attending to irrelevant domains. 2) Dynamic Domain State Representation (DDSR): We introduce a dynamic state representation for each domain, which is stored and accessed during subsequent token predictions. This enables the efficient transfer of relevant domain information without relying on full attention maps. Our method offers a scalable solution to the challenges posed by large-scale, multi-domain recommendation systems and demonstrates significant improvements in retrieval tasks by separately modelling and combining inter- and intra-domain representations.
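
下面用一个示意性草图说明上述两种机制的基本思路(纯属基于摘要的示例性重构,张量形状与超参数均为假设,非论文实现):域内注意力掩码限制注意力只落在同域物品上,而过渡感知位置嵌入把"是否发生域切换"的信息注入输入表示:

```python
# 示意性草图(假设性实现):域内注意力掩码 + 过渡感知位置嵌入(TAPE)的构造思路
import torch
import torch.nn as nn

max_len, dim = 512, 64
pos_emb = nn.Embedding(max_len, dim)          # 常规位置嵌入
trans_emb = nn.Embedding(2, dim)              # 是否发生域切换(0/1)的过渡嵌入

def tape_embeddings(domain_ids):              # domain_ids: (B, L),每个物品所属的域
    B, L = domain_ids.shape
    pos = pos_emb(torch.arange(L).expand(B, L))
    switched = torch.zeros_like(domain_ids)
    switched[:, 1:] = (domain_ids[:, 1:] != domain_ids[:, :-1]).long()
    return pos + trans_emb(switched)           # 把域过渡信息融入位置嵌入

def intra_domain_mask(domain_ids):
    # True 表示允许注意;注意力仅限同域物品,避免跨域全注意力的计算开销
    return domain_ids.unsqueeze(-1) == domain_ids.unsqueeze(-2)   # (B, L, L)

# 跨域信息则由每个域的动态状态表示(DDSR)存储,在预测下一物品时读取,
# 而不依赖完整的跨域注意力图
```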


【2】SemSR: Semantics aware robust Session-based Recommendations
标题:SemSR:语义感知的健壮的基于会话的推荐
链接:https://arxiv.org/abs/2508.20587

作者:wariya, Priyanka Gupta, Muskan Gupta, Jyotsana Khatri, Lovekesh Vig
备注:Accepted at EARL workshop @RecSys'25, Prague, Czech Republic
摘要:基于会话的推荐(SR)模型旨在根据匿名用户在当前会话期间的行为向其推荐物品。虽然文献中的各种SR模型利用物品序列来预测下一个物品,但它们通常无法利用来自物品标题或描述的语义信息,从而妨碍会话意图识别和可解释性。最近的研究已经探索了大语言模型(LLM)作为增强基于会话推荐的有前景的方法,其中基于提示的方法和基于微调的方法都得到了广泛研究。然而,基于提示的方法很难找到能在测试时引发正确推理的最佳提示,并且缺乏针对特定任务的反馈,导致次优推荐;微调方法虽然融入了特定领域知识,但其实施和维护会产生显著的计算成本。在本文中,我们提出了多种利用LLM进行基于会话推荐的方法:(i)上下文内LLM作为推荐代理,(ii)用LLM生成的表示对深度学习SR模型进行语义初始化,以及(iii)LLM与数据驱动SR模型的集成。通过对两个真实世界公开数据集的综合实验,我们证明了基于LLM的方法在粗粒度检索(高召回值)方面表现出色,而传统的数据驱动技术在细粒度排序(高平均倒数排名值)方面表现良好。此外,LLM与数据驱动SR模型的集成在召回率和MRR指标上均显著优于独立的LLM方法、数据驱动的深度学习模型以及基线SR模型。
摘要:Session-based recommendation (SR) models aim to recommend items to anonymous users based on their behavior during the current session. While various SR models in the literature utilize item sequences to predict the next item, they often fail to leverage semantic information from item titles or descriptions, impeding session intent identification and interpretability. Recent research has explored Large Language Models (LLMs) as promising approaches to enhance session-based recommendations, with both prompt-based and fine-tuning based methods being widely investigated. However, prompt-based methods struggle to identify optimal prompts that elicit correct reasoning and lack task-specific feedback at test time, resulting in sub-optimal recommendations. Fine-tuning methods incorporate domain-specific knowledge but incur significant computational costs for implementation and maintenance. In this paper, we present multiple approaches to utilize LLMs for session-based recommendation: (i) in-context LLMs as recommendation agents, (ii) LLM-generated representations for semantic initialization of deep learning SR models, and (iii) integration of LLMs with data-driven SR models. Through comprehensive experiments on two real-world publicly available datasets, we demonstrate that LLM-based methods excel at coarse-level retrieval (high recall values), while traditional data-driven techniques perform well at fine-grained ranking (high Mean Reciprocal Rank values). Furthermore, the integration of LLMs with data-driven SR models significantly outperforms both standalone LLM approaches and data-driven deep learning models, as well as baseline SR models, in terms of both Recall and MRR metrics.


【3】ELIXIR: Efficient and LIghtweight model for eXplaIning Recommendations
标题:ELIXIR:一种高效、简洁的推荐解释模型
链接:https://arxiv.org/abs/2508.20312

作者:go, Vincent Guigue, Pirmin Lemberger
备注:10 pages, 3 figures, 6 Tables
摘要:协同过滤驱动了许多成功的推荐系统,但难以处理细粒度的用户-物品交互,且缺乏可解释性。随着用户越来越多地寻求透明的推荐,通过语言模型生成文本解释已成为一个重要的研究领域。现有方法采用RNN或Transformer。然而,基于RNN的方法无法利用预训练Transformer模型的能力,而基于Transformer的方法通常存在适配欠佳的问题,并忽视了对个性化解释至关重要的方面(aspect)建模。我们提出了ELIXIR(Efficient and LIghtweight model for eXplaIning Recommendations),一个结合评分预测和个性化评论生成的多任务模型。ELIXIR联合学习用户和物品的全局表示与特定方面表示,同时优化总体评分、方面级评分和评论生成,并通过个性化注意力强调各方面的重要性。基于T5-small(60M)模型,我们证明了这种基于方面的架构在个性化背景下引导文本生成的有效性;相比之下,最先进的方法虽然使用大得多的模型,却仍无法同样好地契合用户偏好。在TripAdvisor和RateBeer上的实验结果表明,ELIXIR显著优于强基线模型,特别是在评论生成方面。
摘要:Collaborative filtering drives many successful recommender systems but struggles with fine-grained user-item interactions and explainability. As users increasingly seek transparent recommendations, generating textual explanations through language models has become a critical research area. Existing methods employ either RNNs or Transformers. However, RNN-based approaches fail to leverage the capabilities of pre-trained Transformer models, whereas Transformer-based methods often suffer from suboptimal adaptation and neglect aspect modeling, which is crucial for personalized explanations. We propose ELIXIR (Efficient and LIghtweight model for eXplaIning Recommendations), a multi-task model combining rating prediction with personalized review generation. ELIXIR jointly learns global and aspect-specific representations of users and items, optimizing overall rating, aspect-level ratings, and review generation, with personalized attention to emphasize aspect importance. Based on a T5-small (60M) model, we demonstrate the effectiveness of our aspect-based architecture in guiding text generation in a personalized context, where state-of-the-art approaches exploit much larger models but fail to match user preferences as well. Experimental results on TripAdvisor and RateBeer demonstrate that ELIXIR significantly outperforms strong baseline models, especially in review generation.
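
ELIXIR的多任务目标可以用如下示意性草图表达(损失权重与张量形状均为本文假设,仅用于说明"评分+方面级评分+评论生成"的联合优化形式,非论文官方实现):

```python
# 示意性草图(假设性损失组合):评分预测 + 方面级评分 + 评论生成的多任务目标
import torch
import torch.nn.functional as F

def elixir_loss(pred_rating, true_rating,
                pred_aspect_ratings, true_aspect_ratings,
                lm_logits, review_token_ids,
                w_rating=1.0, w_aspect=0.5, w_review=1.0):
    """权重w_*为假设值;lm_logits形状为(B, L, V),review_token_ids形状为(B, L)。"""
    l_rating = F.mse_loss(pred_rating, true_rating)            # 总体评分
    l_aspect = F.mse_loss(pred_aspect_ratings, true_aspect_ratings)  # 方面级评分
    l_review = F.cross_entropy(lm_logits.reshape(-1, lm_logits.size(-1)),
                               review_token_ids.reshape(-1),
                               ignore_index=-100)              # 评论生成(语言建模)
    return w_rating * l_rating + w_aspect * l_aspect + w_review * l_review
```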


推理|分析|理解|解释(4篇)

【1】ChainReaction! Structured Approach with Causal Chains as Intermediate Representations for Improved and Explainable Causal Video Question Answering
标题:连锁反应!以因果链作为中间表示的结构化方法,用于改进且可解释的因果视频问答
链接:https://arxiv.org/abs/2508.21010

作者:Parmar, Eric Peh, Basura Fernando
备注:Project page: this https URL
摘要:现有的因果类(Causal-Why)视频问答(VideoQA)模型通常难以进行高阶推理,依赖于不透明的单体管道,这些管道将视频理解、因果推理和答案生成纠缠在一起。这些黑盒方法可解释性有限,并且往往依赖浅层启发式方法。我们提出了一种新的模块化框架,显式地将因果推理与答案生成解耦,引入自然语言因果链作为可解释的中间表示。受人类认知模型的启发,这些结构化的因果序列将低级视频内容与高级因果推理联系起来,从而实现透明且逻辑连贯的推理。我们的两阶段架构包括一个因果链提取器(CCE),从视频-问题对生成因果链,以及一个因果链驱动的回答器(CCDA),生成以这些因果链为依据的答案。为了解决缺乏标注推理轨迹的问题,我们引入了一种可扩展的方法,利用大型语言模型从现有数据集生成高质量的因果链。我们还提出了CauCo,一个面向因果描述(captioning)的新评价指标。在三个大规模基准上的实验表明,我们的方法不仅优于最先进的模型,还在可解释性、用户信任和泛化方面带来可观收益,使CCE可以作为跨领域可复用的因果推理引擎。项目页面:https://paritoshparmar.github.io/chainreaction/
摘要:Existing Causal-Why Video Question Answering (VideoQA) models often struggle with higher-order reasoning, relying on opaque, monolithic pipelines that entangle video understanding, causal inference, and answer generation. These black-box approaches offer limited interpretability and tend to depend on shallow heuristics. We propose a novel, modular framework that explicitly decouples causal reasoning from answer generation, introducing natural language causal chains as interpretable intermediate representations. Inspired by human cognitive models, these structured cause-effect sequences bridge low-level video content with high-level causal reasoning, enabling transparent and logically coherent inference. Our two-stage architecture comprises a Causal Chain Extractor (CCE) that generates causal chains from video-question pairs, and a Causal Chain-Driven Answerer (CCDA) that produces answers grounded in these chains. To address the lack of annotated reasoning traces, we introduce a scalable method for generating high-quality causal chains from existing datasets using large language models. We also propose CauCo, a new evaluation metric for causality-oriented captioning. Experiments on three large-scale benchmarks demonstrate that our approach not only outperforms state-of-the-art models, but also yields substantial gains in explainability, user trust, and generalization -- positioning the CCE as a reusable causal reasoning engine across diverse domains. Project page: https://paritoshparmar.github.io/chainreaction/
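
两阶段CCE/CCDA流水线的思路可以用如下示意性草图说明(其中llm为假设的生成接口,提示词仅为示例,非论文官方实现):

```python
# 示意性草图(假设性接口):两阶段因果链VideoQA流水线
def causal_chain_extractor(video_description, question, llm):
    """CCE:从视频-问题对生成自然语言因果链(假设llm为可调用的文本生成接口)。"""
    prompt = (f"基于视频内容,为该问题生成一条'原因->结果'因果链。\n"
              f"问题: {question}\n视频描述: {video_description}")
    return llm(prompt)            # 例如: "球被踢出 -> 撞倒花瓶 -> 花瓶碎裂"

def causal_chain_driven_answerer(causal_chain, question, llm):
    """CCDA:生成以因果链为依据的答案;因果链同时充当可解释的中间表示。"""
    prompt = (f"仅依据下面的因果链回答问题,并引用链中的环节。\n"
              f"因果链: {causal_chain}\n问题: {question}")
    return llm(prompt)

# 用法示意:
# chain = causal_chain_extractor(desc, q, llm)
# answer = causal_chain_driven_answerer(chain, q, llm)
```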


【2】On Identifying Why and When Foundation Models Perform Well on Time-Series Forecasting Using Automated Explanations and Rating
标题:关于使用自动化解释和评级识别基础模型在时间序列预测中表现良好的原因和时机
链接:https://arxiv.org/abs/2508.20437

作者:idener, Kausik Lakkaraju, John Aydin, Biplav Srivastava
备注:8 pages, 5 Tables, 5 Figures, AI Trustworthiness and Risk Assessment for Challenged Contexts (ATRACC), Appendix
摘要:时间序列预测模型(TSFM)已经从经典的统计方法发展到复杂的基础模型,但理解这些模型成功或失败的原因和时机仍然具有挑战性。尽管存在这一已知的局限,时间序列预测模型仍越来越多地用于生成信息,为现实世界的行动提供依据,并产生同样真实的后果。因此,理解这些模型的复杂性、性能可变性和不透明性,对于解决用户应如何与这些模型的输出交互并依赖这些输出这一严肃问题,是一项有价值的工作。这项工作通过将传统的可解释人工智能(XAI)方法与评级驱动解释(RDE)相结合来评估TSFM在不同领域和用例中的性能和可解释性,从而回应上述问题。我们评估了四种不同的模型架构:ARIMA、梯度提升、Chronos(特定于时间序列的基础模型)、Llama(通用;微调和基础模型),涵盖金融、能源、交通和汽车销售领域的四个异构数据集。在此过程中,我们证明了在波动或稀疏的领域(如电力、汽车零件)中,特征工程模型(如梯度提升)始终优于基础模型(如Chronos),同时能提供更可解释的解释;而基础模型仅在稳定或趋势驱动的背景下(如金融)表现出色。
摘要:Time-series forecasting models (TSFM) have evolved from classical statistical methods to sophisticated foundation models, yet understanding why and when these models succeed or fail remains challenging. Despite this known limitation, time series forecasting models are increasingly used to generate information that informs real-world actions with equally real consequences. Understanding the complexity, performance variability, and opaque nature of these models then becomes a valuable endeavor to combat serious concerns about how users should interact with and rely on these models' outputs. This work addresses these concerns by combining traditional explainable AI (XAI) methods with Rating Driven Explanations (RDE) to assess TSFM performance and interpretability across diverse domains and use cases. We evaluate four distinct model architectures: ARIMA, Gradient Boosting, Chronos (time-series specific foundation model), Llama (general-purpose; both fine-tuned and base models) on four heterogeneous datasets spanning finance, energy, transportation, and automotive sales domains. In doing so, we demonstrate that feature-engineered models (e.g., Gradient Boosting) consistently outperform foundation models (e.g., Chronos) in volatile or sparse domains (e.g., power, car parts) while providing more interpretable explanations, whereas foundation models excel only in stable or trend-driven contexts (e.g., finance).


【3】Understanding Incremental Learning with Closed-form Solution to Gradient Flow on Overparameterized Matrix Factorization
标题:通过过度参数化矩阵分解上的梯度流的封闭形式解理解增量学习
链接:https://arxiv.org/abs/2508.20344

作者:Min, René Vidal
备注:Accepted to CDC 2025
摘要:许多关于神经网络的理论研究将其优异的经验性能归因于在某些初始化假设下训练网络时,一阶优化算法引起的隐式偏差或正则化。一个例子是小初始化下过参数化矩阵分解问题上梯度流(GF)的增量学习现象:GF通过随时间按幅值递减的顺序依次学习目标矩阵的奇异值来学习目标矩阵。在本文中,我们利用求解类Riccati矩阵微分方程得到的封闭形式解,对GF在对称矩阵分解问题上的增量学习行为给出了定量理解。我们证明,增量学习源于对应于学习目标矩阵中不同分量的动力学之间的时间尺度分离;通过减小初始化尺度,这些时间尺度分离变得更加显著,从而可以找到目标矩阵的低秩近似。最后,我们讨论了将这一分析扩展到非对称矩阵分解问题的可能途径。
摘要:Many theoretical studies on neural networks attribute their excellent empirical performance to the implicit bias or regularization induced by first-order optimization algorithms when training networks under certain initialization assumptions. One example is the incremental learning phenomenon in gradient flow (GF) on an overparameterized matrix factorization problem with small initialization: GF learns a target matrix by sequentially learning its singular values in decreasing order of magnitude over time. In this paper, we develop a quantitative understanding of this incremental learning behavior for GF on the symmetric matrix factorization problem, using its closed-form solution obtained by solving a Riccati-like matrix differential equation. We show that incremental learning emerges from some time-scale separation among dynamics corresponding to learning different components in the target matrix. By decreasing the initialization scale, these time-scale separations become more prominent, allowing one to find low-rank approximations of the target matrix. Lastly, we discuss the possible avenues for extending this analysis to asymmetric matrix factorization problems.
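
作为背景,对称矩阵分解上的梯度流通常写成如下形式(符号为该方向的通行约定,此处仅作示意,具体设定以论文为准):

```latex
% 对称矩阵分解的梯度流(示意):在小初始化尺度 \alpha 下,
% 对应不同奇异值方向的动力学呈现时间尺度分离
\min_{W \in \mathbb{R}^{n \times r}} \; L(W) = \tfrac{1}{4} \left\lVert W W^{\top} - M \right\rVert_F^2,
\qquad
\dot{W}(t) = -\nabla L\!\left(W(t)\right) = -\left(W(t)\, W(t)^{\top} - M\right) W(t),
\quad W(0) = \alpha\, W_0 .
```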


【4】Artificial Intelligence for CRISPR Guide RNA Design: Explainable Models and Off-Target Safety
标题:CRISPR引导RNA设计的人工智能:可解释模型和脱靶安全性
链接:https://arxiv.org/abs/2508.20130

作者:bbaszadeh, Armita Shahlai
备注:29 pages, 5 figures, 2 tables, 42 cited references
摘要:基于CRISPR的基因组编辑已经彻底改变了生物技术,但优化指导RNA(gRNA)设计以提高效率和安全性仍然是一个关键挑战。最近的进展(2020- 2025年,如果需要,更新以反映当年)表明,人工智能(AI),特别是深度学习,可以显着提高对gRNA靶活性的预测,并识别脱靶风险。与此同时,新兴的可解释人工智能(XAI)技术开始阐明这些模型的黑盒性质,提供对驱动Cas酶性能的序列特征和基因组背景的见解。在这里,我们回顾了最先进的机器学习模型如何增强CRISPR系统的gRNA设计,强调了解释模型预测的策略,并讨论了脱靶预测和安全评估的新发展。我们强调来自顶级期刊的突破,这些期刊强调人工智能和基因组编辑的跨学科融合,以实现更有效,更具体和临床可行的CRISPR应用。
摘要:CRISPR-based genome editing has revolutionized biotechnology, yet optimizing guide RNA (gRNA) design for efficiency and safety remains a critical challenge. Recent advances (2020--2025, updated to reflect current year if needed) demonstrate that artificial intelligence (AI), especially deep learning, can markedly improve the prediction of gRNA on-target activity and identify off-target risks. In parallel, emerging explainable AI (XAI) techniques are beginning to illuminate the black-box nature of these models, offering insights into sequence features and genomic contexts that drive Cas enzyme performance. Here we review how state-of-the-art machine learning models are enhancing gRNA design for CRISPR systems, highlight strategies for interpreting model predictions, and discuss new developments in off-target prediction and safety assessment. We emphasize breakthroughs from top-tier journals that underscore an interdisciplinary convergence of AI and genome editing to enable more efficient, specific, and clinically viable CRISPR applications.


检测相关(1篇)

【1】Multilingual Dataset Integration Strategies for Robust Audio Deepfake Detection: A SAFE Challenge System
标题:用于稳健音频深度伪造检测的多语言数据集集成策略:SAFE挑战系统
链接:https://arxiv.org/abs/2508.20983

作者:i, Surya Subramani, Lekha Bollinani, Nithin Sai Adupa, Sali El-Loh, Hafiz Malik
摘要:SAFE Challenge评估了三个任务的合成语音检测:未修改的音频,带有压缩伪影的处理音频,以及旨在逃避检测的清洗音频。我们系统地探索了自监督学习(SSL)前端、训练数据组成和音频长度配置,以实现强大的深度伪造检测。我们基于AASIST的方法将WavLM大型前端与RawBoost增强相结合,在包含256,600个样本的多语言数据集上进行训练,这些样本涵盖9种语言和来自CodecFake,MLAAD v5,SpoofCeleb,Famous Figures和MAILABS的70多个TTS系统。通过对不同SSL前端、三个训练数据版本和两种音频长度的广泛实验,我们在任务1(未修改音频检测)和任务3(清洗音频检测)中均获得第二名,证明了强大的泛化能力和鲁棒性。
摘要:The SAFE Challenge evaluates synthetic speech detection across three tasks: unmodified audio, processed audio with compression artifacts, and laundered audio designed to evade detection. We systematically explore self-supervised learning (SSL) front-ends, training data compositions, and audio length configurations for robust deepfake detection. Our AASIST-based approach incorporates WavLM large frontend with RawBoost augmentation, trained on a multilingual dataset of 256,600 samples spanning 9 languages and over 70 TTS systems from CodecFake, MLAAD v5, SpoofCeleb, Famous Figures, and MAILABS. Through extensive experimentation with different SSL front-ends, three training data versions, and two audio lengths, we achieved second place in both Task 1 (unmodified audio detection) and Task 3 (laundered audio detection), demonstrating strong generalization and robustness.


分类|识别(1篇)

【1】OLMoASR: Open Models and Data for Training Robust Speech Recognition Models
标题:OLMoASR:用于训练稳健语音识别模型的开放模型和数据
链接:https://arxiv.org/abs/2508.20869

作者:, Matt Deitke, Martijn Bartelds, Sarah Pratt, Josh Gardner, Matt Jordan, Ludwig Schmidt
备注:17 pages, 7 figures
摘要:训练数据规模和质量的改进带来了重大进展,但其在语音识别中的影响仍未得到充分研究。在本文中,我们提出了一个大规模数据集OLMoASR-Pool和一系列模型OLMoASR,用于研究和开发强大的zero-shot语音识别模型。从OLMoASR-Pool(包含300万小时英语音频和1700万份转写)出发,我们设计了文本启发式过滤器来删除低质量或误转写的数据。我们的整理管道产生了一个包含100万小时高质量音频-转写对的新数据集,我们称之为OLMoASR-Mix。我们使用OLMoASR-Mix训练了OLMoASR系列模型,参数规模从39M(tiny.en)到1.5B(large.en)。在所有模型规模上,OLMoASR在短格式和长格式语音识别基准测试中的平均性能与OpenAI的Whisper相当。值得注意的是,OLMoASR-medium.en在短格式和长格式识别上分别达到12.8%和11.0%的词错误率(WER),与Whisper最大的纯英语模型Whisper-medium.en的12.4%和10.5%(在同等参数量下)相当。OLMoASR-Pool、OLMoASR模型以及过滤、训练和评估代码将公开提供,以促进鲁棒语音处理的进一步研究。
摘要:Improvements in training data scale and quality have led to significant advances, yet its influence in speech recognition remains underexplored. In this paper, we present a large-scale dataset, OLMoASR-Pool, and series of models, OLMoASR, to study and develop robust zero-shot speech recognition models. Beginning from OLMoASR-Pool, a collection of 3M hours of English audio and 17M transcripts, we design text heuristic filters to remove low-quality or mistranscribed data. Our curation pipeline produces a new dataset containing 1M hours of high-quality audio-transcript pairs, which we call OLMoASR-Mix. We use OLMoASR-Mix to train the OLMoASR-Mix suite of models, ranging from 39M (tiny.en) to 1.5B (large.en) parameters. Across all model scales, OLMoASR achieves comparable average performance to OpenAI's Whisper on short and long-form speech recognition benchmarks. Notably, OLMoASR-medium.en attains a 12.8% and 11.0% word error rate (WER) that is on par with Whisper's largest English-only model Whisper-medium.en's 12.4% and 10.5% WER for short and long-form recognition respectively (at equivalent parameter count). OLMoASR-Pool, OLMoASR models, and filtering, training and evaluation code will be made publicly available to further research on robust speech processing.
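
摘要中提到的"文本启发式过滤"可以用如下示意性规则草图来理解(规则与阈值均为本文假设的示例,并非OLMoASR实际采用的过滤器):

```python
# 示意性草图(假设性规则):用文本启发式过滤低质量/疑似误转写的文本
import re

def keep_transcript(text: str) -> bool:
    if not text or len(text.split()) < 3:
        return False                       # 过短
    letters = [c for c in text if c.isalpha()]
    if letters and sum(c.isupper() for c in letters) / len(letters) > 0.8:
        return False                       # 几乎全大写,常见于低质量字幕
    if not re.search(r"[.,!?]", text):
        return False                       # 完全无标点,疑似自动转写
    if re.search(r"(\b\w+\b)( \1){4,}", text):
        return False                       # 同一词连续重复多次
    return True

# 用法示意:pairs = [(audio, t) for audio, t in pairs if keep_transcript(t)]
```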


表征(4篇)

【1】EEGDM: Learning EEG Representation with Latent Diffusion Model
标题:EEGDM:使用潜在扩散模型学习脑电表示
链接:https://arxiv.org/abs/2508.20705

作者:Wang, Tong Liu, Ming Li, Minjing Yu, Yong-Jin Liu
摘要:虽然使用深度学习的脑电图(EEG)信号分析显示出巨大的前景,但现有方法在学习可泛化的表示方面仍面临重大挑战,尤其是在训练数据有限时,难以在不同任务中都表现良好。目前的EEG表示学习方法(包括EEGPT和LaBraM)通常依赖简单的掩蔽重建目标,可能无法完全捕获EEG信号中固有的丰富语义信息和复杂模式。在本文中,我们提出了EEGDM,一种基于潜在扩散模型的新型自监督EEG表示学习方法,它将EEG信号生成作为自监督目标,使扩散模型成为能够捕捉EEG语义的强大表示学习器。EEGDM包含一个EEG编码器,将EEG信号及其通道增强蒸馏为紧凑表示,作为条件信息来指导扩散模型生成EEG信号。这种设计赋予EEGDM一个紧凑的潜在空间,不仅为生成过程提供了充分的控制,还可用于下游任务。实验结果表明,EEGDM(1)可以重建高质量的EEG信号,(2)能有效学习鲁棒的表示,并且(3)在不同下游任务中以适中的预训练数据规模取得有竞争力的性能,凸显了其泛化能力和实用价值。
摘要:While electroencephalography (EEG) signal analysis using deep learning has shown great promise, existing approaches still face significant challenges in learning generalizable representations that perform well across diverse tasks, particularly when training data is limited. Current EEG representation learning methods including EEGPT and LaBraM typically rely on simple masked reconstruction objective, which may not fully capture the rich semantic information and complex patterns inherent in EEG signals. In this paper, we propose EEGDM, a novel self-supervised EEG representation learning method based on the latent diffusion model, which leverages EEG signal generation as a self-supervised objective, turning the diffusion model into a strong representation learner capable of capturing EEG semantics. EEGDM incorporates an EEG encoder that distills EEG signals and their channel augmentations into a compact representation, acting as conditional information to guide the diffusion model for generating EEG signals. This design endows EEGDM with a compact latent space, which not only offers ample control over the generative process but also can be leveraged for downstream tasks. Experimental results show that EEGDM (1) can reconstruct high-quality EEG signals, (2) effectively learns robust representations, and (3) achieves competitive performance with modest pre-training data size across diverse downstream tasks, underscoring its generalizability and practical utility.


【2】Masked Autoencoders for Ultrasound Signals: Robust Representation Learning for Downstream Applications
标题:用于超声信号的掩蔽自动编码器:下游应用的稳健表示学习
链接:https://arxiv.org/abs/2508.20622

作者:Roßteutscher, Klaus S. Drese, Thorsten Uphues
备注:Submitted to IEEE Access. This is a preprint version. 14 pages, 6 figures
摘要:我们研究了具有Vision Transformer(ViT)架构的掩蔽自动编码器(MAE)的适应性和性能,用于一维(1D)超声信号的自监督表示学习。尽管MAE在计算机视觉和其他领域取得了巨大的成功,但它们在1D信号分析中的应用,特别是在原始超声数据中的应用,在很大程度上仍未得到探索。超声信号在工业应用中至关重要,例如无损检测(NDT)和结构健康监测(SHM),这些应用中标记的数据通常很少,信号处理具有高度的任务特定性。我们提出了一种利用MAE对未标记的合成超声信号进行预训练的方法,使模型能够学习鲁棒的表示,从而提高下游任务的性能,例如飞行时间(ToF)分类。该研究系统地研究了模型大小、补丁大小和掩蔽比对预训练效率和下游精度的影响。我们的研究结果表明,预训练的模型显著优于从头开始训练的模型和为下游任务优化的强卷积神经网络(CNN)基线。此外,与仅在有限的真实数据集上进行训练相比,在合成数据上进行预训练显示出对真实世界测量信号的卓越可移植性。这项研究强调了MAE通过可扩展的自监督学习推进超声信号分析的潜力。
摘要:We investigated the adaptation and performance of Masked Autoencoders (MAEs) with Vision Transformer (ViT) architectures for self-supervised representation learning on one-dimensional (1D) ultrasound signals. Although MAEs have demonstrated significant success in computer vision and other domains, their use for 1D signal analysis, especially for raw ultrasound data, remains largely unexplored. Ultrasound signals are vital in industrial applications such as non-destructive testing (NDT) and structural health monitoring (SHM), where labeled data are often scarce and signal processing is highly task-specific. We propose an approach that leverages MAE to pre-train on unlabeled synthetic ultrasound signals, enabling the model to learn robust representations that enhance performance in downstream tasks, such as time-of-flight (ToF) classification. This study systematically investigated the impact of model size, patch size, and masking ratio on pre-training efficiency and downstream accuracy. Our results show that pre-trained models significantly outperform models trained from scratch and strong convolutional neural network (CNN) baselines optimized for the downstream task. Additionally, pre-training on synthetic data demonstrates superior transferability to real-world measured signals compared with training solely on limited real datasets. This study underscores the potential of MAEs for advancing ultrasound signal analysis through scalable, self-supervised learning.
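
把MAE思路迁移到一维信号的核心是"分块+随机掩蔽"。下面是一个示意性草图(块大小与掩蔽比正是摘要中系统研究的超参数,这里取假设值,非论文官方实现):

```python
# 示意性草图(假设性实现):一维超声信号的MAE式分块与随机掩蔽
import torch

def patchify_1d(x, patch_size):               # x: (B, T),假设T能被patch_size整除
    B, T = x.shape
    return x.reshape(B, T // patch_size, patch_size)        # (B, N, P)

def random_masking(patches, mask_ratio=0.75):
    B, N, P = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)
    keep_idx = noise.argsort(dim=1)[:, :n_keep]              # 每个样本随机保留的块
    visible = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, P))
    return visible, keep_idx       # 编码器只看visible;解码器负责重建全部块

x = torch.randn(4, 1024)           # 一批合成超声信号(占位数据)
visible, keep_idx = random_masking(patchify_1d(x, patch_size=16))
print(visible.shape)               # torch.Size([4, 16, 16]):75%掩蔽后每样本剩16块
```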


【3】FedReFT: Federated Representation Fine-Tuning with All-But-Me Aggregation
标题:FedReFT:基于All-But-Me聚合的联邦表示微调
链接:https://arxiv.org/abs/2508.20295

作者:ddika, Md Anwar Hossen, J. Pablo Muñoz, Tanya Roosta, Anuj Sharma, Ali Jannesari
摘要:参数高效微调(PEFT)通过修改一小部分参数来适配大型预训练模型,引起了广泛关注。最近,表示微调(ReFT)已成为一种有效的替代方案。ReFT将微调范式从更新模型权重转变为直接操纵捕获丰富语义信息的隐藏表示,并且在独立设置中比最先进的PEFT表现更好。然而,由于客户端在数据分布、模型容量和计算资源上的异构性,其在联邦学习(FL)中的应用仍然具有挑战性。为了应对这些挑战,我们引入了联邦表示微调(FedReFT),一种微调客户端隐藏表示的新方法。FedReFT应用稀疏干预层直接引导隐藏表示,为边缘设备提供了一种轻量级且语义丰富的微调替代方案。然而,表示级更新在任务异构性不同的情况下特别容易受到聚合失配的影响,朴素的平均可能破坏语义对齐。为缓解这一问题,我们提出了All-But-Me(ABM)聚合:每个客户端接收其他客户端的聚合更新并将其部分并入本地更新,通过在本地侧重与全局知识之间取得平衡,实现稳定且个性化的学习。我们在常识推理、算术推理、指令调优和GLUE上评估了FedReFT,它在FL中始终优于最先进的PEFT方法,与领先的基于LoRA的方法相比实现了7倍-15倍的参数效率提升。
摘要:Parameter-efficient fine-tuning (PEFT) has attracted significant attention for adapting large pre-trained models by modifying a small subset of parameters. Recently, Representation Fine-tuning (ReFT) has emerged as an effective alternative. ReFT shifts the fine-tuning paradigm from updating model weights to directly manipulating hidden representations that capture rich semantic information, and performs better than state-of-the-art PEFTs in standalone settings. However, its application in Federated Learning (FL) remains challenging due to heterogeneity in clients' data distributions, model capacities, and computational resources. To address these challenges, we introduce Federated Representation Fine-Tuning (FedReFT), a novel approach to fine-tune the client's hidden representation. FedReFT applies sparse intervention layers to steer hidden representations directly, offering a lightweight and semantically rich fine-tuning alternative ideal for edge devices. However, representation-level updates are especially vulnerable to aggregation mismatch under different task heterogeneity, where naive averaging can corrupt semantic alignment. To mitigate this issue, we propose All-But-Me (ABM) aggregation, where each client receives the aggregated updates of others and partially incorporates them, enabling stable and personalized learning by balancing local focus with global knowledge. We evaluate FedReFT on commonsense reasoning, arithmetic reasoning, instruction-tuning, and GLUE, where it consistently outperforms state-of-the-art PEFT methods in FL, achieving 7x-15x higher parameter efficiency compared to leading LoRA-based approaches.
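
All-But-Me(ABM)聚合的核心逻辑可以用几行代码示意(beta为假设的本地/全局折中系数,并非论文给出的具体取值):

```python
# 示意性草图(假设性实现):All-But-Me(ABM)聚合
import torch

def abm_aggregate(updates, beta=0.5):
    """updates: 各客户端的表示层更新列表;beta为本地/全局折中系数(假设值)。
    每个客户端i收到"除自己之外"其余客户端更新的平均值,并将其部分并入本地更新。"""
    total = torch.stack(updates).sum(dim=0)
    merged = []
    for i, u in enumerate(updates):
        others_avg = (total - u) / (len(updates) - 1)   # "除我之外"的平均更新
        merged.append(beta * u + (1 - beta) * others_avg)
    return merged
```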


【4】Linking heterogeneous microstructure informatics with expert characterization knowledge through customized and hybrid vision-language representations for industrial qualification
标题:通过定制和混合视觉语言表示,将异质微结构信息学与专家表征知识联系起来,以实现工业鉴定
链接:https://arxiv.org/abs/2508.20243

作者:afdar, Gentry Wood, Max Zimmermann, Guy Lamouche, Priti Wanjara, Yaoyao Fiona Zhao
备注:46 pages, 33 figures, Submitted to Advanced Engineering Informatics, under revision
摘要:先进材料的快速可靠鉴定仍然是工业制造中的瓶颈,特别是对于通过非传统增材制造工艺生产的异质结构。本研究介绍了一种新的框架,链接微结构信息学与一系列的专家表征知识,使用定制和混合的视觉语言表示(VLR)。通过将深度语义分割与预训练的多模态模型(CLIP和FLAVA)相结合,我们将视觉微观结构数据和文本专家评估编码为共享表示。为了克服通用嵌入的局限性,我们开发了一种自定义的基于相似性的表示,它结合了来自专家注释图像及其相关文本描述的正面和负面参考。这允许通过净相似性评分方法对先前看不见的微结构进行zero-shot分类。对增材制造金属基复合材料数据集的验证表明,该框架能够在一系列表征标准中区分可接受和有缺陷的样本。比较分析表明,FLAVA模型具有更高的视觉灵敏度,而CLIP模型则与文本标准保持一致。Z-score标准化基于原始单峰和跨模态相似性分数的局部子集驱动分布来调整原始单峰和跨模态相似性分数,从而在混合视觉语言框架中实现更有效的对齐和分类。所提出的方法增强了可追溯性和可解释性的资格管道,使人在循环决策,而无需特定于任务的模型再训练。通过推进原始数据和专家知识之间的语义互操作性,这项工作有助于在工程信息学的可扩展性和域适应性的资格策略。
摘要:Rapid and reliable qualification of advanced materials remains a bottleneck in industrial manufacturing, particularly for heterogeneous structures produced via non-conventional additive manufacturing processes. This study introduces a novel framework that links microstructure informatics with a range of expert characterization knowledge using customized and hybrid vision-language representations (VLRs). By integrating deep semantic segmentation with pre-trained multi-modal models (CLIP and FLAVA), we encode both visual microstructural data and textual expert assessments into shared representations. To overcome limitations in general-purpose embeddings, we develop a customized similarity-based representation that incorporates both positive and negative references from expert-annotated images and their associated textual descriptions. This allows zero-shot classification of previously unseen microstructures through a net similarity scoring approach. Validation on an additively manufactured metal matrix composite dataset demonstrates the framework's ability to distinguish between acceptable and defective samples across a range of characterization criteria. Comparative analysis reveals that FLAVA model offers higher visual sensitivity, while the CLIP model provides consistent alignment with the textual criteria. Z-score normalization adjusts raw unimodal and cross-modal similarity scores based on their local dataset-driven distributions, enabling more effective alignment and classification in the hybrid vision-language framework. The proposed method enhances traceability and interpretability in qualification pipelines by enabling human-in-the-loop decision-making without task-specific model retraining. By advancing semantic interoperability between raw data and expert knowledge, this work contributes toward scalable and domain-adaptable qualification strategies in engineering informatics.
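
"净相似度评分 + Z-score归一化"的基本思路可示意如下(相似度定义与判别规则为基于摘要的假设性重构,非论文官方实现):

```python
# 示意性草图(假设性打分规则):基于正/负参考的净相似度 + Z-score归一化
import numpy as np

def zscore(scores):
    s = np.asarray(scores, dtype=float)
    return (s - s.mean()) / (s.std() + 1e-8)       # 按局部数据集分布归一化

def net_similarity(query_emb, pos_refs, neg_refs):
    """query_emb: 待判微结构的嵌入;pos/neg_refs: 专家标注的合格/缺陷参考嵌入。"""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    pos = np.mean([cos(query_emb, r) for r in pos_refs])
    neg = np.mean([cos(query_emb, r) for r in neg_refs])
    return pos - neg        # 净相似度>0则倾向判为合格,实现zero-shot分类
```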


优化|敛散性(7篇)

【1】Fast Convergence Rates for Subsampled Natural Gradient Algorithms on Quadratic Model Problems
标题:二次模型问题上子采样自然梯度算法的快速收敛速率
链接:https://arxiv.org/abs/2508.21022

作者:hlager, Jiang Hu, Lin Lin
备注:21 pages, 4 figures
摘要:子采样自然梯度下降(SNGD)在科学机器学习的参数优化任务(如神经网络波函数和物理信息神经网络)中取得了令人印象深刻的结果,但一直缺乏理论解释。我们通过分析SNGD及其加速变体SPRING在理想化参数优化问题(模型为线性、损失函数为强凸二次函数)上的收敛性来填补这一空白。在最小二乘损失的特殊情形,即标准线性最小二乘问题中,我们证明SNGD等价于一种正则化Kaczmarz方法,而SPRING等价于一种加速的正则化Kaczmarz方法。因此,借助现有分析,我们在温和条件下获得了:(i)SNGD的第一个快速收敛速率,(ii)SPRING在任何设置下的第一个收敛保证,以及(iii)SPRING能够加速SNGD的第一个证明。在一般强凸二次损失的情形,我们扩展了正则化Kaczmarz方法的分析,在更强条件下得到SNGD的快速收敛速率,首次解释了SNGD在最小二乘设置之外的有效性。总的来说,我们的结果说明了随机线性代数工具如何为子采样与曲率感知优化策略之间的相互作用提供新的见解。
摘要:Subsampled natural gradient descent (SNGD) has shown impressive results for parametric optimization tasks in scientific machine learning, such as neural network wavefunctions and physics-informed neural networks, but it has lacked a theoretical explanation. We address this gap by analyzing the convergence of SNGD and its accelerated variant, SPRING, for idealized parametric optimization problems where the model is linear and the loss function is strongly convex and quadratic. In the special case of a least-squares loss, namely the standard linear least-squares problem, we prove that SNGD is equivalent to a regularized Kaczmarz method while SPRING is equivalent to an accelerated regularized Kaczmarz method. As a result, by leveraging existing analyses we obtain under mild conditions (i) the first fast convergence rate for SNGD, (ii) the first convergence guarantee for SPRING in any setting, and (iii) the first proof that SPRING can accelerate SNGD. In the case of a general strongly convex quadratic loss, we extend the analysis of the regularized Kaczmarz method to obtain a fast convergence rate for SNGD under stronger conditions, providing the first explanation for the effectiveness of SNGD outside of the least-squares setting. Overall, our results illustrate how tools from randomized linear algebra can shed new light on the interplay between subsampling and curvature-aware optimization strategies.
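
以标准线性最小二乘为例,SNGD的单步更新可以示意如下(正则化系数与步长均为假设值):每步在子采样的行上解一个正则化的局部最小二乘,对应摘要中所说的正则化Kaczmarz式更新:

```python
# 示意性草图(假设性实现):线性最小二乘上的子采样自然梯度(SNGD)单步更新
import numpy as np

def sngd_step(theta, A, b, batch_idx, lr=1.0, reg=1e-6):
    """min_theta 0.5*||A theta - b||^2;每步只用子采样的行S,
    用子采样Gram矩阵作为度量求解正则化的局部问题。"""
    A_S, b_S = A[batch_idx], b[batch_idx]
    r_S = A_S @ theta - b_S                          # 子批残差
    G = A_S.T @ A_S + reg * np.eye(A.shape[1])       # 子采样曲率(自然梯度度量)
    return theta - lr * np.linalg.solve(G, A_S.T @ r_S)

rng = np.random.default_rng(0)
A, x_true = rng.normal(size=(500, 20)), rng.normal(size=20)
b, theta = A @ x_true, np.zeros(20)
for _ in range(200):
    theta = sngd_step(theta, A, b, rng.choice(500, size=32, replace=False))
print(np.linalg.norm(theta - x_true))                # 对一致系统数值上应趋近于0
```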


【2】A Hybrid Stochastic Gradient Tracking Method for Distributed Online Optimization Over Time-Varying Directed Networks
标题:时变有向网络分布式在线优化的混合随机梯度跟踪方法
链接:https://arxiv.org/abs/2508.20645

作者:, Xingxing Yuan, Longkang Zhu, Guanghui Wen
摘要:随着数据规模和动态性的增加,分布式在线优化已成为各种应用中实时决策的必要条件。然而,现有算法往往依赖梯度有界假设,并忽视了随机梯度的影响,特别是在时变有向网络中。本研究基于混合随机梯度跟踪和方差缩减机制,提出了一种新的时变混合随机梯度跟踪算法TV-HSGT。具体来说,TV-HSGT在时变有向图上集成了行随机和列随机通信方案,无需估计Perron向量或获知出度信息。通过结合当前与递归的随机梯度,它在准确跟踪全局下降方向的同时有效降低了梯度方差。理论分析表明,TV-HSGT无需假设梯度有界即可获得更优的动态遗憾界。逻辑回归任务上的实验结果证实了TV-HSGT在动态和资源受限环境中的有效性。
摘要:With the increasing scale and dynamics of data, distributed online optimization has become essential for real-time decision-making in various applications. However, existing algorithms often rely on bounded gradient assumptions and overlook the impact of stochastic gradients, especially in time-varying directed networks. This study proposes a novel Time-Varying Hybrid Stochastic Gradient Tracking algorithm named TV-HSGT, based on hybrid stochastic gradient tracking and variance reduction mechanisms. Specifically, TV-HSGT integrates row-stochastic and column-stochastic communication schemes over time-varying digraphs, eliminating the need for Perron vector estimation or out-degree information. By combining current and recursive stochastic gradients, it effectively reduces gradient variance while accurately tracking global descent directions. Theoretical analysis demonstrates that TV-HSGT can achieve improved bounds on dynamic regret without assuming gradient boundedness. Experimental results on logistic regression tasks confirm the effectiveness of TV-HSGT in dynamic and resource-constrained environments.


【3】Unbiased Stochastic Optimization for Gaussian Processes on Finite Dimensional RKHS
标题:有限维RKHS上高斯过程的无偏随机优化
链接:https://arxiv.org/abs/2508.20588

作者:am, Haim Avron
摘要:目前高斯过程(GP)中的随机超参数学习方法依赖于近似,例如计算有偏随机梯度或在随机变分推理中使用诱导点。然而,使用此类方法时,我们无法保证收敛到真实边际似然的稳定点。在这项工作中,我们针对核函数诱导出中等有限维再生核希尔伯特空间(RKHS)的GP,提出了精确随机推断算法。我们的方法也可以以放弃精确性为代价扩展到无限维RKHS。无论对于有限维还是无限维RKHS,当内存资源限制了可行的批量大小和可用诱导点数量时,我们的方法都比现有方法取得了更好的实验结果。
摘要:Current methods for stochastic hyperparameter learning in Gaussian Processes (GPs) rely on approximations, such as computing biased stochastic gradients or using inducing points in stochastic variational inference. However, when using such methods we are not guaranteed to converge to a stationary point of the true marginal likelihood. In this work, we propose algorithms for exact stochastic inference of GPs with kernels that induce a Reproducing Kernel Hilbert Space (RKHS) of moderate finite dimension. Our approach can also be extended to infinite dimensional RKHSs at the cost of forgoing exactness. Both for finite and infinite dimensional RKHSs, our method achieves better experimental results than existing methods when memory resources limit the feasible batch size and the possible number of inducing points.


【4】Theoretical foundations of the integral indicator application in hyperparametric optimization
标题:超参数优化积分指标应用的理论基础
链接:https://arxiv.org/abs/2508.20550

作者:Kulshin, Anatoly A. Sidorov
摘要:本文讨论了推荐算法超参数优化的概念,使用一个将各种性能指标合并为单一统一标准的综合评估。这种方法不同于传统的只设定单一指标的做法,能够在准确性、排序质量、输出多样性和算法资源消耗之间取得平衡。该研究的理论意义在于开发一种通用的多准则优化工具,不仅适用于推荐系统,也适用于广泛的机器学习和数据分析任务。
摘要:The article discusses the concept of hyperparametric optimization of recommendation algorithms using an integral assessment that combines various performance indicators into a single consolidated criterion. This approach is opposed to traditional methods of setting up a single metric and allows you to achieve a balance between accuracy, ranking quality, variety of output and the resource intensity of algorithms. The theoretical significance of the research lies in the development of a universal multi-criteria optimization tool that is applicable not only in recommendation systems, but also in a wide range of machine learning and data analysis tasks.
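
"综合评估"的最简单形式是对归一化后的各项指标加权求和。下面是一个示意性草图(指标名称与权重均为本文假设,并非文中给出的具体积分指标):

```python
# 示意性草图(假设性指标与权重):把多个指标合成单一综合准则用于超参数搜索
def integral_indicator(metrics, weights=None):
    """metrics: 如 {"precision": 0.31, "ndcg": 0.45, "diversity": 0.62, "efficiency": 0.80}
    约定各指标已归一化到[0,1]且方向一致(越大越好);权重为假设值。"""
    weights = weights or {"precision": 0.4, "ndcg": 0.3,
                          "diversity": 0.2, "efficiency": 0.1}
    return sum(weights[k] * metrics[k] for k in weights)

# 用法示意:在网格/贝叶斯超参数搜索中,对每组超参数计算 integral_indicator,
# 并选取综合得分最大的配置
```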


【5】Objective Value Change and Shape-Based Accelerated Optimization for the Neural Network Approximation
标题:神经网络逼近的目标值变化和基于形状的加速优化
链接:https://arxiv.org/abs/2508.20290

作者: Xie, Zihao Zhou, Zijian Zhou
备注:27 pages
摘要:本文针对目标函数f引入了一种新度量,我们称之为VC(value change,值变化),用于衡量神经网络逼近任务的难度和逼近效果;它在数值上支持对神经网络逼近的局部性能与行为进行刻画。神经网络经常遭遇不可预测的局部性能,这可能妨碍其在关键应用中的可靠性。VC通过为网络行为中的局部值变化提供可量化的度量来解决这一问题,为实现神经网络逼近提供了关于稳定性和性能的洞见。我们研究了VC的一些基本理论性质,并在神经网络逼近中发现了两个有趣的现象:VC倾向和少数倾向。这两种趋势分别刻画了逼近过程中逐点误差相对于VC分布的演化规律。此外,本文基于VC提出了一种新度量,从变化(variation)的角度度量两个函数之间的距离;在此度量基础上,我们进一步提出了一个新的神经网络逼近预处理框架。数值结果(包括真实世界实验和与偏微分方程相关的科学问题)支持我们的发现和预处理加速方法。
摘要:This paper introduces a novel metric of an objective function f, which we call VC (value change), to measure the difficulty and approximation effect when conducting a neural network approximation task; it numerically supports characterizing the local performance and behavior of neural network approximation. Neural networks often suffer from unpredictable local performance, which can hinder their reliability in critical applications. VC addresses this issue by providing a quantifiable measure of local value changes in network behavior, offering insights into the stability and performance for achieving the neural-network approximation. We investigate some fundamental theoretical properties of VC and identify two intriguing phenomena in neural network approximation: the VC-tendency and the minority-tendency. These trends respectively characterize how pointwise errors evolve in relation to the distribution of VC during the approximation process. In addition, we propose a novel metric based on VC, which measures the distance between two functions from the perspective of variation. Building upon this metric, we further propose a new preprocessing framework for neural network approximation. Numerical results, including a real-world experiment and a PDE-related scientific problem, support our discovery and the pre-processing acceleration method.


【6】Beyond Optimization: Exploring Novelty Discovery in Autonomous Experiments
标题:超越优化:探索自主实验中的新奇发现
链接:https://arxiv.org/abs/2508.20254

作者:anadi, Jawad Chowdhury, Funakubo Hiroshi, Maxim Ziatdinov, Rama Vasudevan, Arpan Biswas, Yongtao Liu
摘要:自主实验(AE)正在通过将人工智能与自动化实验平台相结合来改变科学研究的开展方式。目前的AE主要集中于预定义目标的优化;虽然这加速了该目标的达成,但此类方法限制了对意外或未知物理现象的发现。在此,我们介绍了一个新框架INS2ANE(Integrated Novelty Score-Strategic Autonomous Non-Smooth Exploration),以增强自主实验中对新现象的发现。我们的方法集成了两个关键组成部分:(1)一个新颖性评分系统,评估实验结果的独特性;(2)一个策略性采样机制,促进对欠采样区域的探索,即使这些区域按传统标准看起来不太有希望。我们在一个带有已知真值、由图像-光谱对组成的预先采集数据集上验证了这种方法,并进一步将该流程应用于自主扫描探针显微镜实验。与传统优化流程相比,INS2ANE显著增加了所探索现象的多样性,提高了发现此前未观察到的现象的可能性。这些结果证明了AE提升科学发现深度的潜力;结合AE所提供的效率,这种方法有望通过在复杂实验空间中同时导航以发现新现象,从而加速科学研究。
摘要:Autonomous experiments (AEs) are transforming how scientific research is conducted by integrating artificial intelligence with automated experimental platforms. Current AEs primarily focus on the optimization of a predefined target; while accelerating this goal, such an approach limits the discovery of unexpected or unknown physical phenomena. Here, we introduce a novel framework, INS2ANE (Integrated Novelty Score-Strategic Autonomous Non-Smooth Exploration), to enhance the discovery of novel phenomena in autonomous experimentation. Our method integrates two key components: (1) a novelty scoring system that evaluates the uniqueness of experimental results, and (2) a strategic sampling mechanism that promotes exploration of under-sampled regions even if they appear less promising by conventional criteria. We validate this approach on a pre-acquired dataset with a known ground truth comprising of image-spectral pairs. We further implement the process on autonomous scanning probe microscopy experiments. INS2ANE significantly increases the diversity of explored phenomena in comparison to conventional optimization routines, enhancing the likelihood of discovering previously unobserved phenomena. These results demonstrate the potential for AE to enhance the depth of scientific discovery; in combination with the efficiency provided by AEs, this approach promises to accelerate scientific research by simultaneously navigating complex experimental spaces to uncover new phenomena.


【7】Multi-Objective Optimization of ReRAM Crossbars for Robust DNN Inferencing under Stochastic Noise
标题:随机噪音下鲁棒DNN推理的ReRAM Crossbar多目标优化
链接:https://arxiv.org/abs/2109.05437

作者:Yang, Syrine Belakaria, Biresh Kumar Joardar, Huanrui Yang, Janardhan Rao Doppa, Partha Pratim Pande, Krishnendu Chakrabarty, Hai Li
备注:To appear in ICCAD 2021
摘要:电阻式随机存取存储器(ReRAM)是一种很有前景的技术,可用于设计深度神经网络(DNN)推理的硬件加速器。然而,ReRAM交叉阵列中的随机噪声会降低DNN的推理精度。我们提出了一种高性能、面积和能源高效的基于ReRAM的硬件加速器的设计与优化方法,以在存在随机噪声的情况下实现鲁棒的DNN推理。我们做出了两项关键技术贡献。首先,我们提出了一种随机噪声感知的训练方法,称为ReSNA,以提高DNN在带有随机噪声的ReRAM交叉阵列上的推理精度。其次,我们提出了一种信息论算法,称为CF-MESMO,用于确定在推理精度、面积开销、执行时间和能耗等多个目标之间权衡的Pareto解集。这里的主要挑战在于,为评估每个候选ReRAM设计而运行ReSNA方法的代价高得令人望而却步。为了应对这一挑战,我们通过改变训练轮数(epoch)来权衡精度与成本,对评估代价高昂的ReRAM设计采用连续保真度评估。CF-MESMO迭代地选择候选ReRAM设计与保真度对,使关于最优Pareto前沿的单位计算成本信息增益最大化。我们在基准DNN上的实验表明,所提出的算法能有效地发现高质量的Pareto前沿。平均而言,相对于基线配置,ReSNA在CIFAR-10数据集上为ResNet20带来了2.57%的推理精度提升。此外,与流行的多目标优化算法NSGA-II相比,CF-MESMO算法在达到NSGA-II的最佳解时减少了90.91%的计算成本。
摘要:Resistive random-access memory (ReRAM) is a promising technology for designing hardware accelerators for deep neural network (DNN) inferencing. However, stochastic noise in ReRAM crossbars can degrade the DNN inferencing accuracy. We propose the design and optimization of a high-performance, area-and energy-efficient ReRAM-based hardware accelerator to achieve robust DNN inferencing in the presence of stochastic noise. We make two key technical contributions. First, we propose a stochastic-noise-aware training method, referred to as ReSNA, to improve the accuracy of DNN inferencing on ReRAM crossbars with stochastic noise. Second, we propose an information-theoretic algorithm, referred to as CF-MESMO, to identify the Pareto set of solutions to trade-off multiple objectives, including inferencing accuracy, area overhead, execution time, and energy consumption. The main challenge in this context is that executing the ReSNA method to evaluate each candidate ReRAM design is prohibitive. To address this challenge, we utilize the continuous-fidelity evaluation of ReRAM designs associated with prohibitive high computation cost by varying the number of training epochs to trade-off accuracy and cost. CF-MESMO iteratively selects the candidate ReRAM design and fidelity pair that maximizes the information gained per unit computation cost about the optimal Pareto front. Our experiments on benchmark DNNs show that the proposed algorithms efficiently uncover high-quality Pareto fronts. On average, ReSNA achieves 2.57% inferencing accuracy improvement for ResNet20 on the CIFAR-10 dataset with respect to the baseline configuration. Moreover, CF-MESMO algorithm achieves 90.91% reduction in computation cost compared to the popular multi-objective optimization algorithm NSGA-II to reach the best solution from NSGA-II.


预测|估计(6篇)

【1】Developing a Multi-Modal Machine Learning Model For Predicting Performance of Automotive Hood Frames
标题:开发用于预测汽车发动机罩框架性能的多模式机器学习模型
链接:https://arxiv.org/abs/2508.20358

作者:Indupally, Satchit Ramnath
摘要:有没有一种方法可以让设计师在不花费大量时间进行模拟设置的情况下评估给定发动机罩框架几何形状的性能?本文旨在通过开发一种多模态机器学习(MMML)架构来解决这一挑战,该架构可以从相同数据的不同模态中学习,以预测性能指标。它还旨在使用MMML架构,通过减少对计算昂贵的仿真的依赖来提高工程设计过程的效率。所提出的架构加速了设计探索,实现了快速迭代,同时保持了高性能标准,特别是在概念设计阶段。该研究还表明,通过结合多种数据模态,MMML优于传统的单模态方法。两个不属于训练数据集的新框架几何形状也被用于通过训练好的MMML模型进行预测,以展示其对未见框架模型的泛化能力。研究结果强调了MMML在补充传统的基于仿真的工作流程方面的潜力,特别是在概念设计阶段,并强调了其在弥合机器学习和现实世界工程应用之间差距方面的作用。这项研究为在工程设计中更广泛地采用机器学习技术铺平了道路,重点是改进多模态方法,以优化结构开发并加快设计周期。
摘要:Is there a way for a designer to evaluate the performance of a given hood frame geometry without spending significant time on simulation setup? This paper seeks to address this challenge by developing a multimodal machine-learning (MMML) architecture that learns from different modalities of the same data to predict performance metrics. It also aims to use the MMML architecture to enhance the efficiency of engineering design processes by reducing reliance on computationally expensive simulations. The proposed architecture accelerates design exploration, enabling rapid iteration while maintaining high-performance standards, especially in the concept design phase. The study also presents results that show that by combining multiple data modalities, MMML outperforms traditional single-modality approaches. Two new frame geometries, not part of the training dataset, are also used for prediction using the trained MMML model to showcase the ability to generalize to unseen frame models. The findings underscore MMML's potential in supplementing traditional simulation-based workflows, particularly in the conceptual design phase, and highlight its role in bridging the gap between machine learning and real-world engineering applications. This research paves the way for the broader adoption of machine learning techniques in engineering design, with a focus on refining multimodal approaches to optimize structural development and accelerate the design cycle.


【2】Dynamic Synthetic Controls vs. Panel-Aware Double Machine Learning for Geo-Level Marketing Impact Estimation
标题:动态合成控制与面板感知双机器学习用于地理层面营销影响估计
链接:https://arxiv.org/abs/2508.20335

作者:ee, Vineeth Loganathan, Vijay Raghavan
备注:Presented at the KDD 2025 Workshop on Causal Inference and Machine Learning in Practice
摘要:在双边市场中准确量化地理层面的营销提升具有挑战性:合成控制法(SCM)通常具有很高的功效,但系统性地低估效应大小,而面板式双机器学习(DML)很少与SCM进行基准比较。我们构建了一个开放的、文档完备的模拟器,模拟典型的大规模地理推广:在发布前跟踪N_unit个区域市场T_pre周,随后再跟踪T_post周的活动窗口,允许用户改变所有关键参数,并在五个程式化压力测试下考察两类方法:1)弯曲的基线趋势,2)异质的响应滞后,3)偏向处理组的冲击,4)非线性的结果链接,5)漂移的对照组趋势。   共评估了七个估计量:三种标准增强SCM(ASC)变体和四种面板DML风格(TWFE、CRE/Mundlak、一阶差分和组内估计)。在每个场景的100次重复中,ASC模型在涉及非线性或外部冲击的挑战性场景中始终表现出严重的偏差和接近零的覆盖率;相比之下,面板DML变体显著降低了这种偏差,并恢复了名义95%置信区间覆盖率,证明其更为稳健。   结果表明,虽然ASC提供了一个简单的基线,但在常见的复杂情形下并不可靠。因此,我们提出了一个"诊断优先"的框架:从业者首先确定主要的业务挑战(例如非线性趋势、响应滞后),然后选择最适合该场景的特定DML模型,为分析地理实验提供更稳健、更可靠的蓝图。
摘要:Accurately quantifying geo-level marketing lift in two-sided marketplaces is challenging: the Synthetic Control Method (SCM) often exhibits high power yet systematically under-estimates effect size, while panel-style Double Machine Learning (DML) is seldom benchmarked against SCM. We build an open, fully documented simulator that mimics a typical large-scale geo roll-out: N_unit regional markets are tracked for T_pre weeks before launch and for a further T_post-week campaign window, allowing all key parameters to be varied by the user and probe both families under five stylized stress tests: 1) curved baseline trends, 2) heterogeneous response lags, 3) treated-biased shocks, 4) a non-linear outcome link, and 5) a drifting control group trend.   Seven estimators are evaluated: three standard Augmented SCM (ASC) variants and four panel-DML flavors (TWFE, CRE/Mundlak, first-difference, and within-group). Across 100 replications per scenario, ASC models consistently demonstrate severe bias and near-zero coverage in challenging scenarios involving nonlinearities or external shocks. By contrast, panel-DML variants dramatically reduce this bias and restore nominal 95%-CI coverage, proving far more robust.   The results indicate that while ASC provides a simple baseline, it is unreliable in common, complex situations. We therefore propose a 'diagnose-first' framework where practitioners first identify the primary business challenge (e.g., nonlinear trends, response lags) and then select the specific DML model best suited for that scenario, providing a more robust and reliable blueprint for analyzing geo-experiments.


【3】Generalizable AI Model for Indoor Temperature Forecasting Across Sub-Saharan Africa
标题:撒哈拉以南非洲地区室内温度预测的可推广人工智能模型
链接:https://arxiv.org/abs/2508.20260

作者:htar, Eunice Jengo, Björn Haßler
摘要:这项研究提出了一个轻量级、融入领域知识的人工智能模型,用于预测撒哈拉以南非洲自然通风学校和家庭的室内温度。该模型扩展了Temp-AI-Estimator框架,在坦桑尼亚学校数据上训练,并在尼日利亚学校和冈比亚家庭上进行评估。它仅使用最少的易获取输入即实现了稳健的跨国家泛化性能,对尼日利亚学校的平均绝对误差为1.45摄氏度,对冈比亚家庭为0.65摄氏度。这些发现凸显了人工智能在资源受限环境中进行热舒适管理的潜力。
摘要:This study presents a lightweight, domain-informed AI model for predicting indoor temperatures in naturally ventilated schools and homes in Sub-Saharan Africa. The model extends the Temp-AI-Estimator framework, trained on Tanzanian school data, and evaluated on Nigerian schools and Gambian homes. It achieves robust cross-country performance using only minimal accessible inputs, with mean absolute errors of 1.45°C for Nigerian schools and 0.65°C for Gambian homes. These findings highlight AI's potential for thermal comfort management in resource-constrained environments.


【4】Latent Variable Modeling for Robust Causal Effect Estimation
标题:稳健因果效应估计的潜在变量建模
链接:https://arxiv.org/abs/2508.20259

作者:orimura, Tatsushi Oka, Yugo Suzuki, Daisuke Moriwaki
备注:Accepted to CIKM 2025. This is the full version including extended appendix
摘要:潜变量模型为在观测数据中纳入和推断未观测到的因素提供了一个强有力的框架。在因果推断中,它们有助于解释影响治疗或结果的隐藏因素,从而解决缺失或不可测量的协变量带来的挑战。本文提出了一种新的框架,将潜在变量建模集成到双机器学习(DML)范式中,以在存在这些隐藏因素的情况下实现稳健的因果效应估计。我们考虑两种情况:一种是潜在变量只影响结果,另一种是潜在变量可能影响治疗和结果。为了确保易处理性,我们只在DML的第二阶段引入了潜在变量,将表征学习与潜在推理分离开来。我们证明了我们的方法的鲁棒性和有效性,通过广泛的实验合成和真实世界的数据集。
摘要:Latent variable models provide a powerful framework for incorporating and inferring unobserved factors in observational data. In causal inference, they help account for hidden factors influencing treatment or outcome, thereby addressing challenges posed by missing or unmeasured covariates. This paper proposes a new framework that integrates latent variable modeling into the double machine learning (DML) paradigm to enable robust causal effect estimation in the presence of such hidden factors. We consider two scenarios: one where a latent variable affects only the outcome, and another where it may influence both treatment and outcome. To ensure tractability, we incorporate latent variables only in the second stage of DML, separating representation learning from latent inference. We demonstrate the robustness and effectiveness of our method through extensive experiments on both synthetic and real-world datasets.


【5】Filter then Attend: Improving attention-based Time Series Forecasting with Spectral Filtering
标题:先滤波后注意:使用谱滤波改进基于注意力的时间序列预测
链接:https://arxiv.org/abs/2508.20206

作者:yag, Nhat Thanh Van Tran, Jack Xin
摘要:基于Transformer的模型在长时间序列预测(LTSF)中处于最前沿。虽然在许多情况下这些模型能够取得最先进的结果,但它们存在偏向数据中低频成分的倾向,以及高计算和内存需求。最近的工作已经确定,通过提高模型的频谱利用率,可学习的频率滤波器可以成为深度预测模型的一个组成部分。这些工作选择使用多层感知机来处理滤波后的信号,因此无法解决基于Transformer的模型所存在的问题。在本文中,我们证明在基于Transformer的模型前端加入滤波器可以提高其在长时间序列预测中的性能。我们为若干基于Transformer的模型添加了可学习滤波器(仅额外增加约1000个参数),并在多个实例中观察到5-10%的预测性能相对提升。此外,我们发现加入滤波器后能够降低模型的嵌入维数,从而得到比未加滤波的基础模型更小且更有效的Transformer架构。我们还进行了合成实验,以分析滤波器如何使基于Transformer的模型更好地利用全频谱进行预测。
摘要:Transformer-based models are at the forefront in long time-series forecasting (LTSF). While in many cases, these models are able to achieve state of the art results, they suffer from a bias toward low-frequencies in the data and high computational and memory requirements. Recent work has established that learnable frequency filters can be an integral part of a deep forecasting model by enhancing the model's spectral utilization. These works choose to use a multilayer perceptron to process their filtered signals and thus do not solve the issues found with transformer-based models. In this paper, we establish that adding a filter to the beginning of transformer-based models enhances their performance in long time-series forecasting. We add learnable filters, which add only approximately 1000 additional parameters to several transformer-based models, and observe in multiple instances 5-10% relative improvement in forecasting performance. Additionally, we find that with filters added, we are able to decrease the embedding dimension of our models, resulting in transformer-based architectures that are both smaller and more effective than their non-filtering base models. We also conduct synthetic experiments to analyze how the filters enable Transformer-based models to better utilize the full spectrum for forecasting.
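
"在Transformer前端加一个可学习频率滤波器"可以用如下示意性模块实现(基于摘要的理解;每个频点一个可学习增益,参数量约为序列长度的一半,与文中"约1000个额外参数"的量级一致,具体结构以论文为准):

```python
# 示意性草图(假设性实现):接在Transformer预测模型前端的可学习频域滤波器
import torch
import torch.nn as nn

class LearnableSpectralFilter(nn.Module):
    def __init__(self, seq_len):
        super().__init__()
        n_freq = seq_len // 2 + 1                        # rfft后的频点数
        self.weight = nn.Parameter(torch.ones(n_freq))   # 每个频率一个可学习增益

    def forward(self, x):                                # x: (B, L, C)
        spec = torch.fft.rfft(x, dim=1)                  # 沿时间维做实FFT
        spec = spec * self.weight.view(1, -1, 1)         # 频域逐点加权,重标定频谱利用
        return torch.fft.irfft(spec, n=x.size(1), dim=1)

# 用法示意:y = transformer(filt(x));滤波器仅增加约 L/2 个参数
filt = LearnableSpectralFilter(seq_len=336)
print(filt(torch.randn(2, 336, 7)).shape)                # torch.Size([2, 336, 7])
```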


【6】Mitigating Distribution Shift in Stock Price Data via Return-Volatility Normalization for Accurate Prediction
标题:通过回报-波动率标准化缓解股价数据的分布变化以实现准确预测
链接:https://arxiv.org/abs/2508.20108

作者:ee, Jihyeong Jeon, Jaemin Hong, U Kang
备注:10 pages, 4 figures, accpeted to CIKM 2025
摘要:我们如何解决股票价格数据的分布偏移,以提高股票价格预测的准确性?股票价格预测已经引起了学术界和工业界的关注,因为它具有揭示复杂市场模式和辅助决策的潜力。然而,现有方法往往无法有效处理分布偏移,只关注缩放或表示自适应,而没有完全解决训练数据和测试数据之间的分布差异和形状错位。我们提出了ReVol(Return-Volatility Normalization for Mitigating Distribution Shift in Stock Price Data),一种显式解决分布偏移问题的鲁棒股票价格预测方法。ReVol利用三个关键策略来缓解这些偏移:(1)对价格特征进行归一化,去除样本特有的特征,包括收益率、波动率和价格规模;(2)采用基于注意力的模块来准确估计这些特征,从而减少市场异常的影响;(3)将样本特征重新整合到预测过程中,恢复归一化过程中丢失的特性。此外,ReVol将用于长期趋势建模的几何布朗运动与用于短期模式识别的神经网络相结合,统一了它们的互补优势。在真实世界数据集上的大量实验表明,ReVol在大多数情况下提升了最先进骨干模型的性能,在各种设置下IC平均提升超过0.03,SR平均提升超过0.7。
摘要:How can we address distribution shifts in stock price data to improve stock price prediction accuracy? Stock price prediction has attracted attention from both academia and industry, driven by its potential to uncover complex market patterns and enhance decision-making. However, existing methods often fail to handle distribution shifts effectively, focusing on scaling or representation adaptation without fully addressing distributional discrepancies and shape misalignments between training and test data. We propose ReVol (Return-Volatility Normalization for Mitigating Distribution Shift in Stock Price Data), a robust method for stock price prediction that explicitly addresses the distribution shift problem. ReVol leverages three key strategies to mitigate these shifts: (1) normalizing price features to remove sample-specific characteristics, including return, volatility, and price scale, (2) employing an attention-based module to estimate these characteristics accurately, thereby reducing the influence of market anomalies, and (3) reintegrating the sample characteristics into the predictive process, restoring the traits lost during normalization. Additionally, ReVol combines geometric Brownian motion for long-term trend modeling with neural networks for short-term pattern recognition, unifying their complementary strengths. Extensive experiments on real-world datasets demonstrate that ReVol enhances the performance of the state-of-the-art backbone models in most cases, achieving an average improvement of more than 0.03 in IC and over 0.7 in SR across various settings.
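
收益-波动率归一化(及预测后的还原)的基本思路可示意如下(特征定义与还原方式为本文假设的简化版本,并非论文中基于注意力模块的完整实现):

```python
# 示意性草图(假设性定义):对价格窗口做"收益-波动率"归一化,再在预测后还原
import numpy as np

def rv_normalize(prices):                  # prices: (L,) 一个样本的价格窗口
    log_p = np.log(prices)
    returns = np.diff(log_p)
    mu, sigma = returns.mean(), returns.std() + 1e-8   # 样本自身的收益均值与波动率
    scale = prices[0]                                   # 价格规模
    norm_returns = (returns - mu) / sigma               # 去除样本特有特征
    return norm_returns, (mu, sigma, scale)

def rv_denormalize(pred_norm_return, stats, last_price):
    mu, sigma, _ = stats                   # 把样本特征重新并入预测过程
    r = pred_norm_return * sigma + mu
    return last_price * np.exp(r)          # 还原为价格

prices = np.array([100.0, 101.2, 100.8, 102.5, 103.0])
x, stats = rv_normalize(prices)
print(rv_denormalize(0.1, stats, prices[-1]))
```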


其他神经网络|深度学习|模型|建模(13篇)

【1】InSQuAD: In-Context Learning for Efficient Retrieval via Submodular Mutual Information to Enforce Quality and Diversity
标题:InSQuAD:通过子模互信息实现高效检索的上下文学习,以加强质量和多样性
链接:https://arxiv.org/abs/2508.21003

作者: Nanda, Anay Majee, Rishabh Iyer
备注:Long Version of paper Accepted to ICDM 2025
摘要:在本文中,我们介绍了InSQuAD,旨在通过子模互信息(SMI)在上下文样本间同时保证质量(Quality)与多样性(Diversity),从而提高上下文学习(ICL)模型的性能。InSQuAD通过两个主要策略实现这一点:首先,我们将ICL任务建模为目标导向的选择问题,并引入一个基于SMI的统一选择策略,挖掘相关且多样、兼顾质量与多样性概念的上下文示例。其次,我们解决了现有检索模型的一个常见缺陷:它们只建模查询相关性,往往忽视对ICL至关重要的多样性。InSQuAD引入了一种组合式训练范式,通过一种新的基于似然的损失来学习SMI函数的参数,使检索模型同时兼顾质量与多样性。为进一步辅助学习过程,我们用合成生成的释义扩充了现有的多跳问答数据集。在九个基准数据集上,采用以这种策略训练的检索模型以及新的ICL目标导向选择公式,取得了显著改进,验证了我们方法的有效性。
摘要:In this paper, we introduce InSQuAD, designed to enhance the performance of In-Context Learning (ICL) models through Submodular Mutual Information (SMI) enforcing Quality and Diversity among in-context exemplars. InSQuAD achieves this through two principal strategies: First, we model the ICL task as a targeted selection problem and introduce a unified selection strategy based on SMIs which mines relevant yet diverse in-context examples encapsulating the notions of quality and diversity. Secondly, we address a common pitfall in existing retrieval models which model query relevance, often overlooking diversity, critical for ICL. InSQuAD introduces a combinatorial training paradigm which learns the parameters of an SMI function to enforce both quality and diversity in the retrieval model through a novel likelihood-based loss. To further aid the learning process we augment an existing multi-hop question answering dataset with synthetically generated paraphrases. Adopting the retrieval model trained using this strategy alongside the novel targeted selection formulation for ICL on nine benchmark datasets shows significant improvements validating the efficacy of our approach.
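
SMI驱动的"质量+多样性"示例选择可以用一个贪心草图来理解(此处用设施选址式的覆盖增益近似多样性项,属于本文的示意性选择,并非论文给出的具体SMI实例):

```python
# 示意性草图(假设性目标函数):贪心挑选兼顾查询相关性(质量)与多样性的上下文示例
import numpy as np

def greedy_select(sim_to_query, sim_matrix, k, lam=0.7):
    """sim_to_query: (N,) 候选与查询的相似度(质量);
    sim_matrix: (N, N) 候选两两相似度;lam为质量/多样性折中(假设值)。"""
    n = len(sim_to_query)
    selected = []
    covered = np.zeros(n)                 # 已选集合对每个候选的覆盖程度
    for _ in range(k):
        best, best_gain = -1, -np.inf
        for i in range(n):
            if i in selected:
                continue
            # 设施选址式的边际覆盖增益,衡量候选i带来的多样性贡献
            diversity_gain = np.maximum(sim_matrix[i], covered).sum() - covered.sum()
            gain = lam * sim_to_query[i] + (1 - lam) * diversity_gain
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
        covered = np.maximum(covered, sim_matrix[best])
    return selected                        # 作为ICL的上下文示例索引
```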


【2】LeMat-Traj: A Scalable and Unified Dataset of Materials Trajectories for Atomistic Modeling
标题:LeMat-Traj:用于原子建模的可扩展、统一的材料轨迹数据集
链接:https://arxiv.org/abs/2508.20875

作者:oui, Martin Siron, Inel Djafar, Joseph Musielewicz, Amandine Rossello, Victor Schmidt, Alexandre Duval
摘要:精确的机器学习原子间相互作用势(MLIP)的发展受限于源自密度泛函理论(DFT)的量子力学轨迹数据集的碎片化可用性和不一致格式。这些数据集的生成成本很高,但由于格式、元数据和可访问性的差异,难以合并。为了解决这个问题,我们引入了LeMat-Traj,这是一个精心策划的数据集,包括从大规模存储库(包括Materials Project、Alexandria和OQMD)聚合的超过1.2亿个原子构型。LeMat-Traj统一了数据表示,对结果进行协调,并针对广泛使用的DFT泛函(PBE、PBESol、SCAN、r2SCAN)筛选出高质量构型。它大大降低了训练可迁移且精确的MLIP的门槛。LeMat-Traj既涵盖弛豫的低能态,也涵盖高能量、大受力的结构,是对分子动力学和主动学习数据集的补充。通过使用LeMat-Traj对在高受力数据上预训练的模型进行微调,我们在弛豫任务上显著降低了力预测误差。我们还介绍了LeMaterial-Fetcher,这是为这项工作开发的模块化、可扩展的开源库,旨在为社区提供一个可复现的框架,以便轻松纳入新的数据源,并确保大规模材料数据集的持续演进。LeMat-Traj和LeMaterial-Fetcher可在https://huggingface.co/datasets/LeMaterial/LeMat-Traj和https://github.com/LeMaterial/lematerial-fetcher上公开获取。
摘要:The development of accurate machine learning interatomic potentials (MLIPs) is limited by the fragmented availability and inconsistent formatting of quantum mechanical trajectory datasets derived from Density Functional Theory (DFT). These datasets are expensive to generate yet difficult to combine due to variations in format, metadata, and accessibility. To address this, we introduce LeMat-Traj, a curated dataset comprising over 120 million atomic configurations aggregated from large-scale repositories, including the Materials Project, Alexandria, and OQMD. LeMat-Traj standardizes data representation, harmonizes results and filters for high-quality configurations across widely used DFT functionals (PBE, PBESol, SCAN, r2SCAN). It significantly lowers the barrier for training transferrable and accurate MLIPs. LeMat-Traj spans both relaxed low-energy states and high-energy, high-force structures, complementing molecular dynamics and active learning datasets. By fine-tuning models pre-trained on high-force data with LeMat-Traj, we achieve a significant reduction in force prediction errors on relaxation tasks. We also present LeMaterial-Fetcher, a modular and extensible open-source library developed for this work, designed to provide a reproducible framework for the community to easily incorporate new data sources and ensure the continued evolution of large-scale materials datasets. LeMat-Traj and LeMaterial-Fetcher are publicly available at https://huggingface.co/datasets/LeMaterial/LeMat-Traj and https://github.com/LeMaterial/lematerial-fetcher.


【3】Practical Physical Layer Authentication for Mobile Scenarios Using a Synthetic Dataset Enhanced Deep Learning Approach
标题:使用合成数据集增强深度学习方法的移动场景实用物理层认证
链接:https://arxiv.org/abs/2508.20861

作者:, Junqing Zhang, Y.-W. Peter Hong
摘要:得益于无线技术的快速发展,物联网(IoT)已无处不在。然而,无线传输的广播特性导致设备认证存在极大的脆弱性。物理层认证通过利用独特的信道特性成为一种有前景的方法,但适用于动态信道变化的实用方案仍然缺失。本文提出了一种面向移动场景、基于深度学习的物理层信道状态信息(CSI)认证方法,并使用IEEE 802.11n进行了全面的仿真和实验评估。具体而言,基于WLAN TGn信道模型以及信道的自相关性和距离相关性生成合成训练数据集,这可以显著减少人工收集实验数据集的开销。利用基于卷积神经网络(CNN)的Siamese(孪生)网络来学习CSI对之间的时间和空间相关性,并输出衡量其相似度的分数。我们采用了仿真与实验评估相结合的协同方法:实验测试平台由WiFi物联网开发工具包组成,并特别考虑了若干典型场景。仿真和实验评估均表明,我们提出的基于深度学习的方法具有出色的泛化性能和认证性能。实际测量结果表明,与基于全连接网络(FCN)的孪生模型相比,我们提出的方案将曲线下面积(AUC)提高了0.03;与基于相关性的基准算法相比提高了0.06。
摘要:The Internet of Things (IoT) is ubiquitous thanks to the rapid development of wireless technologies. However, the broadcast nature of wireless transmissions results in great vulnerability to device authentication. Physical layer authentication emerges as a promising approach by exploiting the unique channel characteristics. However, a practical scheme applicable to dynamic channel variations is still missing. In this paper, we proposed a deep learning-based physical layer channel state information (CSI) authentication for mobile scenarios and carried out comprehensive simulation and experimental evaluation using IEEE 802.11n. Specifically, a synthetic training dataset was generated based on the WLAN TGn channel model and the autocorrelation and the distance correlation of the channel, which can significantly reduce the overhead of manually collecting experimental datasets. A convolutional neural network (CNN)-based Siamese network was exploited to learn the temporal and spatial correlation between the CSI pair and output a score to measure their similarity. We adopted a synergistic methodology involving both simulation and experimental evaluation. The experimental testbed consisted of WiFi IoT development kits and a few typical scenarios were specifically considered. Both simulation and experimental evaluation demonstrated excellent generalization performance of our proposed deep learning-based approach and excellent authentication performance. Demonstrated by our practical measurement results, our proposed scheme improved the area under the curve (AUC) by 0.03 compared to the fully connected network-based (FCN-based) Siamese model and by 0.06 compared to the correlation-based benchmark algorithm.


【4】SEAL: Structure and Element Aware Learning to Improve Long Structured Document Retrieval
标题:SEAL:结构和元素感知学习以改进长结构文档检索
链接:https://arxiv.org/abs/2508.20778

作者:ang, Zhibo Ren, Yipeng Yu, Ying Zhou, Zulong Chen, Zeyi Wen
备注:Accepted at EMNLP 2025 Main Conference
摘要:在长结构化文档检索中,现有方法通常在缺乏显式结构信息的数据集上使用对比学习来微调预训练语言模型(PLM)。这种做法有两个关键问题:1)目前的方法无法有效地利用结构特征和元素级语义;2)缺乏包含结构元数据的数据集。为了弥合这些差距,我们提出了SEAL,一个新的对比学习框架。它利用结构感知学习来保持语义层次,并利用掩码元素对齐实现细粒度的语义判别。此外,我们还发布了一个具有丰富结构注释的长结构化文档检索数据集。在各种现代PLM上对公开和工业数据集进行的广泛实验以及在线A/B测试表明,性能得到了一致的改善,在BGE-M3上将NDCG@10从73.96%提高到77.84%。这些资源可在https://github.com/xinhaoH/SEAL上获得。
摘要:In long structured document retrieval, existing methods typically fine-tune pre-trained language models (PLMs) using contrastive learning on datasets lacking explicit structural information. This practice suffers from two critical issues: 1) current methods fail to leverage structural features and element-level semantics effectively, and 2) the lack of datasets containing structural metadata. To bridge these gaps, we propose SEAL, a novel contrastive learning framework. It leverages structure-aware learning to preserve semantic hierarchies and masked element alignment for fine-grained semantic discrimination. Furthermore, we release a long structured document retrieval dataset with rich structural annotations. Extensive experiments on both released and industrial datasets across various modern PLMs, along with online A/B testing, demonstrate consistent performance improvements, boosting NDCG@10 from 73.96% to 77.84% on BGE-M3. The resources are available at https://github.com/xinhaoH/SEAL.
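
摘要所述的对比学习微调,其核心通常是批内负例的InfoNCE目标。下面是一个通用草图(温度0.05、批大小与维度均为假设;SEAL特有的结构感知项和掩码元素对齐此处未体现):

import torch
import torch.nn.functional as F

q = F.normalize(torch.randn(16, 256), dim=-1)  # 查询嵌入(批大小16、维度256为假设)
d = F.normalize(torch.randn(16, 256), dim=-1)  # 与之配对的正例文档嵌入
logits = q @ d.T / 0.05                        # 同批其它文档充当负例
loss = F.cross_entropy(logits, torch.arange(16))  # 第i个查询应匹配第i个文档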


【5】Physics-Constrained Machine Learning for Chemical Engineering
标题:化学工程的物理约束机器学习
链接:https://arxiv.org/abs/2508.20649

作者:herjee, Victor M. Zavala (Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, USA)
摘要:物理约束机器学习(PCML)将物理模型与数据驱动方法相结合,以提高可靠性、可泛化性和可解释性。虽然PCML在不同的科学和工程领域已显示出显著的好处,但技术与思路上的挑战阻碍了其在复杂化学工程应用中的适用性。主要困难包括确定要嵌入的物理知识的数量和类型、设计与ML的有效融合策略、将模型扩展到大型数据集和模拟器,以及量化预测的不确定性。这篇观点文章总结了最近的进展,并着重指出将PCML应用于化学工程中的挑战与机遇,重点关注闭环实验设计、实时动态与控制,以及多尺度现象的处理。
摘要:Physics-constrained machine learning (PCML) combines physical models with data-driven approaches to improve reliability, generalizability, and interpretability. Although PCML has shown significant benefits in diverse scientific and engineering domains, technical and intellectual challenges hinder its applicability in complex chemical engineering applications. Key difficulties include determining the amount and type of physical knowledge to embed, designing effective fusion strategies with ML, scaling models to large datasets and simulators, and quantifying predictive uncertainty. This perspective summarizes recent developments and highlights challenges/opportunities in applying PCML to chemical engineering, emphasizing on closed-loop experimental design, real-time dynamics and control, and handling of multi-scale phenomena.


【6】Khiops: An End-to-End, Frugal AutoML and XAI Machine Learning Solution for Large, Multi-Table Databases
标题:Khiops:针对大型多表数据库的端到端、节俭的AutoML和XAI机器学习解决方案
链接:https://arxiv.org/abs/2508.20519

作者:lé, Nicolas Voisine, Bruno Guerraz, Carine Hue, Felipe Olmos, Vladimir Popescu, Stéphane Gouache, Stéphane Bouget, Alexis Bondu, Luc Aurelien Gauthier, Yassine Nair Benrekia, Fabrice Clérot, Vincent Lemaire
摘要:Khiops是一个开源机器学习工具,用于挖掘大型多表数据库。Khiops基于一种独特的贝叶斯方法,吸引了学术界的兴趣,发表了20多篇关于变量选择、分类、决策树和联合聚类等主题的论文。它使用离散化模型为数值数据和值聚类为分类数据提供变量重要性的预测度量。建议的分类/回归模型是一个朴素贝叶斯分类器,结合变量选择和权重学习。在多表数据库的情况下,它通过自动构建聚合来提供命题化。Khiops适用于分析包含数百万个个体、数万个变量和次级表中数亿条记录的大型数据库。它可以在许多环境中使用,无论是从Python库还是通过用户界面。
摘要:Khiops is an open source machine learning tool designed for mining large multi-table databases. Khiops is based on a unique Bayesian approach that has attracted academic interest with more than 20 publications on topics such as variable selection, classification, decision trees and co-clustering. It provides a predictive measure of variable importance using discretisation models for numerical data and value clustering for categorical data. The proposed classification/regression model is a naive Bayesian classifier incorporating variable selection and weight learning. In the case of multi-table databases, it provides propositionalisation by automatically constructing aggregates. Khiops is adapted to the analysis of large databases with millions of individuals, tens of thousands of variables and hundreds of millions of records in secondary tables. It is available on many environments, both from a Python library and via a user interface.


【7】Uncovering the Spectral Bias in Diagonal State Space Models
标题:揭开对角状态空间模型中的谱偏差
链接:https://arxiv.org/abs/2508.20441

作者:ozabal, Velibor Bojkovic, Hilal AlQuabeh, Kentaro Inui, Martin Takáč
摘要:当前用于初始化状态空间模型(SSM)参数的方法主要依赖HiPPO框架,其基于正交多项式的在线近似。最近,对角替代方案已被证明能达到类似的性能水平,同时由于核计算的简化而显著更高效。然而,HiPPO框架并没有明确研究其对角变体的作用。在本文中,我们更进一步,从频率视角研究对角SSM初始化方案的作用。我们的工作旨在系统地理解如何对这些模型进行参数化,并揭示这类对角状态空间模型固有的学习偏置。基于我们的观察,我们提出了离散傅里叶域上的对角初始化方法S4D-DFouT。对初始化中极点放置作用的洞察使我们能够进一步扩展模型,在Long Range Arena基准上取得最先进的结果,并能够在PathX-256等非常大的数据集上从头开始训练。
摘要:Current methods for initializing state space models (SSMs) parameters mainly rely on the HiPPO framework, which is based on an online approximation of orthogonal polynomials. Recently, diagonal alternatives have shown to reach a similar level of performance while being significantly more efficient due to the simplification in the kernel computation. However, the HiPPO framework does not explicitly study the role of its diagonal variants. In this paper, we take a further step to investigate the role of diagonal SSM initialization schemes from the frequency perspective. Our work seeks to systematically understand how to parameterize these models and uncover the learning biases inherent in such diagonal state-space models. Based on our observations, we propose a diagonal initialization on the discrete Fourier domain, S4D-DFouT. The insights in the role of pole placing in the initialization enable us to further scale them and achieve state-of-the-art results on the Long Range Arena benchmark, allowing us to train from scratch on very large datasets as PathX-256.
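
对角SSM的高效之处在于卷积核可以按极点逐项求和直接算出。下面的草图按S4D风格的极点放置计算核;S4D-DFouT自身的初始化细节以论文为准,此处的取值仅为常见假设。

import torch

N, L = 32, 128                                  # 状态维数与序列长度(假设值)
dt = 1.0 / L
A = -0.5 + 1j * torch.pi * torch.arange(N, dtype=torch.float32)  # 对角极点(S4D-Lin风格,假设)
B = torch.ones(N, dtype=torch.cfloat)
C = torch.randn(N, dtype=torch.cfloat)
t = torch.arange(L, dtype=torch.float32) * dt
# 核 K[l] = Re( sum_n C_n * B_n * exp(A_n * l * dt) )
K = torch.einsum('n,nl->l', B * C, torch.exp(A[:, None] * t[None, :])).real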


【8】Operator learning meets inverse problems: A probabilistic perspective
标题:算子学习遇到逆问题:概率视角
链接:https://arxiv.org/abs/2508.20207

作者:H. Nelsen, Yunan Yang
备注:87 pages, 5 figures
摘要:算子学习为近似无限维函数空间之间的映射提供了一个强大的框架。它也已成为一个强大的工具,解决反问题的计算科学。本章综述了算子学习和逆问题交叉点的方法和理论发展。它首先总结了逆问题的概率和确定性方法,并特别注意新兴的以测量为中心的公式,将观测数据或未知参数作为概率分布。然后,讨论转向操作员学习,涵盖基本组件,如数据生成,损失函数和广泛使用的架构,用于表示函数到函数的映射。本章的核心集中在端到端逆算子学习范式,其目的是直接将观察到的数据映射到逆问题的解决方案,而不需要明确的前向映射知识。它突出了噪声在这种数据驱动的反演环境中所面临的独特挑战,提出了点预测和后验估计的结构感知架构,并调查了线性和非线性反演问题的相关理论。本章还讨论了先验和正则化的估计,其中算子学习在经典反演算法中更有选择性地使用。
摘要:Operator learning offers a robust framework for approximating mappings between infinite-dimensional function spaces. It has also become a powerful tool for solving inverse problems in the computational sciences. This chapter surveys methodological and theoretical developments at the intersection of operator learning and inverse problems. It begins by summarizing the probabilistic and deterministic approaches to inverse problems, and pays special attention to emerging measure-centric formulations that treat observed data or unknown parameters as probability distributions. The discussion then turns to operator learning by covering essential components such as data generation, loss functions, and widely used architectures for representing function-to-function maps. The core of the chapter centers on the end-to-end inverse operator learning paradigm, which aims to directly map observed data to the solution of the inverse problem without requiring explicit knowledge of the forward map. It highlights the unique challenge that noise plays in this data-driven inversion setting, presents structure-aware architectures for both point predictions and posterior estimates, and surveys relevant theory for linear and nonlinear inverse problems. The chapter also discusses the estimation of priors and regularizers, where operator learning is used more selectively within classical inversion algorithms.


【9】Polynomial Chaos Expansion for Operator Learning
标题:用于算子学习的多项式混沌展开
链接:https://arxiv.org/abs/2508.20886

作者:Sharma, Lukáš Novák, Michael D. Shields
摘要:算子学习(OL)已经成为科学机器学习(SciML)中的一个强大工具,用于近似无限维函数空间之间的映射。其主要应用之一是学习偏微分方程(PDE)的解算子。虽然这一领域的大部分进展都是由基于深度神经网络的方法(如深度运算符网络(DeepONet)和傅立叶神经运算符(FNO))推动的,但最近的工作已经开始探索用于OL的传统机器学习方法。在这项工作中,我们引入多项式混沌展开(PCE)作为OL方法。PCE已被广泛用于不确定性量化(UQ),最近在SciML的背景下获得了关注。对于OL,我们建立了一个数学框架,使PCE在纯粹的数据驱动和物理信息的设置近似运营商。所提出的框架减少了学习运营商的任务,以解决一个系统的方程的PCE系数。此外,该框架通过简单地对PCE系数进行后处理来提供UQ,而没有任何额外的计算成本。我们将所提出的方法应用到一组不同的偏微分方程问题,以证明其能力。数值结果表明,所提出的方法在OL和UQ任务的强大性能,实现了良好的数值精度和计算效率。
摘要:Operator learning (OL) has emerged as a powerful tool in scientific machine learning (SciML) for approximating mappings between infinite-dimensional functional spaces. One of its main applications is learning the solution operator of partial differential equations (PDEs). While much of the progress in this area has been driven by deep neural network-based approaches such as Deep Operator Networks (DeepONet) and Fourier Neural Operator (FNO), recent work has begun to explore traditional machine learning methods for OL. In this work, we introduce polynomial chaos expansion (PCE) as an OL method. PCE has been widely used for uncertainty quantification (UQ) and has recently gained attention in the context of SciML. For OL, we establish a mathematical framework that enables PCE to approximate operators in both purely data-driven and physics-informed settings. The proposed framework reduces the task of learning the operator to solving a system of equations for the PCE coefficients. Moreover, the framework provides UQ by simply post-processing the PCE coefficients, without any additional computational cost. We apply the proposed method to a diverse set of PDE problems to demonstrate its capabilities. Numerical results demonstrate the strong performance of the proposed method in both OL and UQ tasks, achieving excellent numerical accuracy and computational efficiency.
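
摘要称该框架把算子学习归结为求解PCE系数的方程组。下面用一维玩具问题演示这一骨架:以最小二乘回归求系数,再后处理系数得到均值与方差(Hermite基与样本数均为示意;真实场景中u应替换为PDE解的样本)。

import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

rng = np.random.default_rng(0)
xi = rng.standard_normal(200)                 # 随机输入参数 xi ~ N(0,1)
u = np.sin(xi) + 0.1 * xi**2                  # 用解析函数代替PDE解样本(假设)
P = 5                                         # PCE截断阶数
Psi = np.stack([hermeval(xi, np.eye(P + 1)[k]) for k in range(P + 1)], axis=1)
c, *_ = np.linalg.lstsq(Psi, u, rcond=None)   # 求解PCE系数的线性方程组
mean = c[0]                                   # 仅后处理系数即可得到UQ量
var = sum(c[k] ** 2 * math.factorial(k) for k in range(1, P + 1))  # E[He_k^2] = k!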


【10】Towards Trustworthy Amortized Bayesian Model Comparison
标题:迈向值得信赖的摊销贝叶斯模型比较
链接:https://arxiv.org/abs/2508.20614

作者:harský, Aayush Mishra, Daniel Habermann, Stefan T. Radev, Paul-Christian Bürkner
备注:13 pages, 4 figures, submitted to Reliable ML from Unreliable Data Workshop at NeurIPS 2025
摘要:摊销贝叶斯模型比较(BMC)通过基于模拟训练的神经代理实现模型的快速概率排序。然而,当仿真模型被错误设定时,神经代理的可靠性就会恶化,而这正是最需要模型比较的情况。因此,我们用无标签真实数据上的自一致性(SC)损失来补充基于模拟的训练,以改善经验分布偏移下的BMC估计。通过一个数值实验和两个真实数据案例研究,我们将带有和不带SC的摊销证据估计与解析或桥采样基准进行比较。当可以获得解析似然时,SC能改善模型误设定下的校准;但在使用神经代理似然时增益有限。因此,当似然精确可得时,SC对可信的BMC最为实用。
摘要:Amortized Bayesian model comparison (BMC) enables fast probabilistic ranking of models via simulation-based training of neural surrogates. However, the reliability of neural surrogates deteriorates when simulation models are misspecified - the very case where model comparison is most needed. Thus, we supplement simulation-based training with a self-consistency (SC) loss on unlabeled real data to improve BMC estimates under empirical distribution shifts. Using a numerical experiment and two case studies with real data, we compare amortized evidence estimates with and without SC against analytic or bridge sampling benchmarks. SC improves calibration under model misspecification when having access to analytic likelihoods. However, it offers limited gains with neural surrogate likelihoods, making it most practical for trustworthy BMC when likelihoods are exact.


【11】Studying Effective String Theory using deep generative models
标题:使用深度生成模型研究有效弦理论
链接:https://arxiv.org/abs/2508.20610

作者:aselle, Elia Cellini, Alessandro Nada
备注:10 pages, 3 figures, 2 tables, contribution to "The XVIth Quark Confinement and the Hadron Spectrum Conference (QCHSC24)", PoS(QCHSC24)034
摘要:有效弦理论(EST)提供了一个强有力的非微扰框架来描述杨-米尔斯理论中的禁闭,它将静态夸克-反夸克对之间的禁闭通量管视为一根细的振动弦。虽然EST计算通常使用zeta函数正则化进行,但某些问题(如确定通量管宽度)太复杂,无法解析求解。然而,最近的研究表明,可以采用基于生成算法的深度学习技术对EST进行数值探索。在这项工作中,我们简要介绍EST和这种新的数值方法。最后,我们给出Nambu-Goto有效弦理论中通量管宽度的结果。
摘要:Effective String Theory (EST) offers a robust non-perturbative framework for describing confinement in Yang-Mills theory by treating the confining flux tube between a static quark-antiquark pair as a thin, vibrating string. While EST calculations are typically carried out using zeta-function regularization, certain problems, such as determining the flux tube width, are too complex to solve analytically. However, recent studies have demonstrated that EST can be explored numerically by employing deep learning techniques based on generative algorithms. In this work, we provide a brief introduction to EST and this novel numerical approach. Finally, we present results for the width of the Nambu-Goto EST.


【12】Machine-learning based particle-flow algorithm in CMS
标题:CMS中基于机器学习的粒子流算法
链接:https://arxiv.org/abs/2508.20541

作者:khtar
备注:8 pages, 5 figures, European Physical Society Conference on High Energy Physics (EPS-HEP2025)
摘要:粒子流(PF)算法通过重构终态粒子提供全局事件描述,是CMS中事件重构的核心。最近,已经提出了端到端机器学习(ML)方法来直接优化感兴趣的物理量并利用异构计算架构。机器学习粒子流(MLPF)就是这样一种方法,它使用Transformer模型在单次通过中直接从轨迹和簇推断粒子。我们介绍了最近CMS在MLPF中的发展,包括训练数据集,模型架构,重建指标,以及与离线重建软件的集成。
摘要:The particle-flow (PF) algorithm provides a global event description by reconstructing final-state particles and is central to event reconstruction in CMS. Recently, end-to-end machine learning (ML) approaches have been proposed to directly optimize physical quantities of interest and to leverage heterogeneous computing architectures. One such approach, machine-learned particle flow (MLPF), uses a transformer model to infer particles directly from tracks and clusters in a single pass. We present recent CMS developments in MLPF, including training datasets, model architecture, reconstruction metrics, and integration with offline reconstruction software.


【13】Molecular Machine Learning in Chemical Process Design
标题:化学过程设计中的分子机器学习
链接:https://arxiv.org/abs/2508.20527

作者:ttig, Manuel Dahmen, Martin Grohe, Philippe Schwaller, Alexander Mitsos
摘要:我们提出了在化学过程工程领域的分子机器学习(ML)的观点。最近,分子ML在(i)提供纯组分及其混合物的性质的高度准确的预测,以及(ii)探索新分子结构的化学空间方面表现出巨大的潜力。我们回顾了当前最先进的分子ML模型,并讨论了有望进一步发展的研究方向。这包括ML方法,如图神经网络和Transformers,可以通过以混合或物理信息的方式结合物理化学知识来进一步推进。然后,我们考虑在化学过程规模上利用分子ML,这是非常可取的,但尚未探索。我们讨论了如何将分子ML集成到工艺设计和优化配方中,有望加速新分子和工艺的识别。为此,必须建立分子和工艺设计基准,并实际验证拟议的候选方案,可能的话与化学工业合作。
摘要:We present a perspective on molecular machine learning (ML) in the field of chemical process engineering. Recently, molecular ML has demonstrated great potential in (i) providing highly accurate predictions for properties of pure components and their mixtures, and (ii) exploring the chemical space for new molecular structures. We review current state-of-the-art molecular ML models and discuss research directions that promise further advancements. This includes ML methods, such as graph neural networks and transformers, which can be further advanced through the incorporation of physicochemical knowledge in a hybrid or physics-informed fashion. Then, we consider leveraging molecular ML at the chemical process scale, which is highly desirable yet rather unexplored. We discuss how molecular ML can be integrated into process design and optimization formulations, promising to accelerate the identification of novel molecules and processes. To this end, it will be essential to create molecule and process design benchmarks and practically validate proposed candidates, possibly in collaboration with the chemical industry.


其他(21篇)

【1】Dress&Dance: Dress up and Dance as You Like It - Technical Preview
标题:Dress&Dance:随心装扮、随心起舞 - 技术预览
链接:https://arxiv.org/abs/2508.21070

作者:hen, Aayush Bansal, Minh Phuoc Vo, Yu-Xiong Wang
备注:Project Page: this https URL
摘要:我们提出了Dress&Dance,这是一个视频扩散框架,可以生成高质量、5秒长、24 FPS、分辨率1152x720的虚拟试穿视频:用户穿着所需的服装,同时按照给定的参考视频运动。我们的方法只需一张用户图像,支持多种上衣、下装和连体服装,并可在单次推理中同时完成上、下装试穿。我们框架的关键是CondNet,这是一种新颖的条件化网络,它利用注意力机制统一多模态输入(文本、图像和视频),从而增强服装配准和运动保真度。CondNet以多阶段渐进的方式在异构训练数据上训练,将有限的视频数据与更大、更易获得的图像数据集结合起来。Dress&Dance优于现有的开源和商业解决方案,带来高质量且灵活的试穿体验。
摘要:We present Dress&Dance, a video diffusion framework that generates high quality 5-second-long 24 FPS virtual try-on videos at 1152x720 resolution of a user wearing desired garments while moving in accordance with a given reference video. Our approach requires a single user image and supports a range of tops, bottoms, and one-piece garments, as well as simultaneous tops and bottoms try-on in a single pass. Key to our framework is CondNet, a novel conditioning network that leverages attention to unify multi-modal inputs (text, images, and videos), thereby enhancing garment registration and motion fidelity. CondNet is trained on heterogeneous training data, combining limited video data and a larger, more readily available image dataset, in a multistage progressive manner. Dress&Dance outperforms existing open source and commercial solutions and enables a high quality and flexible try-on experience.


【2】On the Theoretical Limitations of Embedding-Based Retrieval
标题:论嵌入式检索的理论局限性
链接:https://arxiv.org/abs/2508.21038

作者:ler, Michael Boratko, Iftekhar Naim, Jinhyuk Lee
摘要:多年来,向量嵌入承担了越来越多的检索任务,将其用于推理、指令遵循、编码等方面的工作也正在兴起。这些新基准要求嵌入能够适用于任何查询和任何可能给定的相关性概念。虽然先前的工作已经指出了向量嵌入的理论局限性,但存在一个共同假设:这些困难完全源于不切实际的查询,而现实的查询可以通过更好的训练数据和更大的模型来克服。在这项工作中,我们证明,在现实场景中即使是极其简单的查询也会遇到这些理论限制。我们联系学习理论中的已知结果,表明能够作为某些查询的结果返回的文档top-k子集的数量受到嵌入维数的限制。我们的实验表明,即使限制到k=2,并直接在测试集上用自由参数化的嵌入进行优化,该结论也依然成立。随后,我们创建了一个名为LIMIT的现实数据集,依据这些理论结果对模型进行压力测试,并观察到尽管任务本身很简单,即使最先进的模型也会在该数据集上失败。我们的工作揭示了现有单向量范式下嵌入模型的局限,并呼吁未来研究开发能够解决这一根本限制的方法。
摘要:Vector embeddings have been tasked with an ever-increasing set of retrieval tasks over the years, with a nascent rise in using them for reasoning, instruction-following, coding, and more. These new benchmarks push embeddings to work for any query and any notion of relevance that could be given. While prior works have pointed out theoretical limitations of vector embeddings, there is a common assumption that these difficulties are exclusively due to unrealistic queries, and those that are not can be overcome with better training data and larger models. In this work, we demonstrate that we may encounter these theoretical limitations in realistic settings with extremely simple queries. We connect known results in learning theory, showing that the number of top-k subsets of documents capable of being returned as the result of some query is limited by the dimension of the embedding. We empirically show that this holds true even if we restrict to k=2, and directly optimize on the test set with free parameterized embeddings. We then create a realistic dataset called LIMIT that stress tests models based on these theoretical results, and observe that even state-of-the-art models fail on this dataset despite the simple nature of the task. Our work shows the limits of embedding models under the existing single vector paradigm and calls for future research to develop methods that can resolve this fundamental limitation.
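
摘要提到"即使限制k=2并直接在测试集上优化自由参数化嵌入"也能复现该极限。下面的草图演示这种自由嵌入实验:n个文档、C(n,2)个查询,每个查询须恰好召回一个不同的文档对;维度与损失形式均为示意假设。

import torch

n, d = 8, 4                                          # 8个文档、嵌入维度4(假设值)
targets = torch.combinations(torch.arange(n), r=2)   # 28个查询,各对应一个不同的文档对
docs = torch.randn(n, d, requires_grad=True)
queries = torch.randn(targets.size(0), d, requires_grad=True)
opt = torch.optim.Adam([docs, queries], lr=0.1)
for _ in range(2000):
    scores = queries @ docs.T
    pos = scores.gather(1, targets)                  # 两个相关文档的得分
    mask = torch.zeros_like(scores, dtype=torch.bool).scatter(1, targets, True)
    neg = scores.masked_fill(mask, float('-inf')).max(dim=1, keepdim=True).values
    loss = torch.relu(1.0 - (pos - neg)).sum()       # 相关文档须以间隔1胜过最强无关文档
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())  # 当d相对n过小时损失收敛不到0,即并非所有top-2子集都可实现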


【3】Train-Once Plan-Anywhere Kinodynamic Motion Planning via Diffusion Trees
标题:通过扩散树实现一次训练、随处规划的运动动力学运动规划
链接:https://arxiv.org/abs/2508.21001

作者:sidof, Tom Jurgenson, Kiril Solovey
备注:Accepted to CoRL 2025. Project page: this https URL
摘要:运动动力学运动规划是指在满足机器人动力学约束的前提下计算无碰撞轨迹。这一关键问题通常使用基于采样的规划器(SBP)来解决,它通过动作传播构建搜索树来探索机器人的高维状态空间。虽然SBP可以提供完备性和解质量方面的全局保证,但由于动作采样缺乏信息,其性能往往受到探索缓慢的拖累。基于学习的方法运行时间显著更快,但无法泛化到分布外(OOD)场景,且缺乏安全性等关键保证,从而限制了它们在实体机器人上的部署。我们提出扩散树(DiTree):一个可证明可泛化的框架,利用扩散策略(DP)作为信息化采样器,高效引导SBP内的状态空间搜索。DiTree将DP以局部观测为条件建模专家轨迹复杂分布的能力,与SBP的完备性相结合,从而在复杂动力系统上只需少量动作传播迭代即可得到可证明安全的解。我们通过将流行的RRT规划器与仅在单一环境上训练的DP动作采样器相结合的实现,展示了DiTree的能力。在OOD场景的综合评估中,DiTree的运行时间与独立DP相当(比经典SBP平均快3倍),并以约30%的更高成功率优于所有其他方法。项目网页:https://sites.google.com/view/ditree
摘要:Kinodynamic motion planning is concerned with computing collision-free trajectories while abiding by the robot's dynamic constraints. This critical problem is often tackled using sampling-based planners (SBPs) that explore the robot's high-dimensional state space by constructing a search tree via action propagations. Although SBPs can offer global guarantees on completeness and solution quality, their performance is often hindered by slow exploration due to uninformed action sampling. Learning-based approaches can yield significantly faster runtimes, yet they fail to generalize to out-of-distribution (OOD) scenarios and lack critical guarantees, e.g., safety, thus limiting their deployment on physical robots. We present Diffusion Tree (DiTree): a provably-generalizable framework leveraging diffusion policies (DPs) as informed samplers to efficiently guide state-space search within SBPs. DiTree combines DP's ability to model complex distributions of expert trajectories, conditioned on local observations, with the completeness of SBPs to yield provably-safe solutions within a few action propagation iterations for complex dynamical systems. We demonstrate DiTree's power with an implementation combining the popular RRT planner with a DP action sampler trained on a single environment. In comprehensive evaluations on OOD scenarios, DiTree has runtimes comparable to a standalone DP, is on average 3x faster than classical SBPs, and outperforms all other approaches by achieving a roughly 30% higher success rate. Project webpage: https://sites.google.com/view/ditree.
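
下面的伪实现勾勒了"以扩散策略为信息化动作采样器的RRT"主循环;dp_sample、propagate、is_valid等接口均为假设,仅用于说明信息化采样如何嵌入基于采样的规划器。

import numpy as np

def ditree_sketch(x0, goal, dp_sample, propagate, is_valid, iters=500, tol=0.1):
    tree = [x0]
    for _ in range(iters):
        # 以小概率朝目标扩展,否则随机采样状态空间
        x_rand = goal if np.random.rand() < 0.1 else np.random.uniform(-1, 1, size=x0.shape)
        x_near = min(tree, key=lambda x: np.linalg.norm(x - x_rand))  # 最近节点
        u = dp_sample(x_near)          # 扩散策略基于局部观测给出动作(接口为假设)
        x_new = propagate(x_near, u)   # 按动力学前向传播
        if is_valid(x_new):            # 碰撞与约束检查
            tree.append(x_new)
            if np.linalg.norm(x_new - goal) < tol:
                break
    return tree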


【4】Finite-Time Guarantees for Multi-Agent Combinatorial Bandits with Nonstationary Rewards
标题:具有非平稳奖励的多智能体组合老虎机的有限时间保证
链接:https://arxiv.org/abs/2508.20923

作者: B. Adams, Justin J. Boutilier, Qinyang He, Yonatan Mintz
备注:41 pages, 8 figures
摘要:我们研究一个序贯资源分配问题:决策者在每个时期选择一个代理子集,以在事先不了解个体层面效应的情况下最大化整体成果。我们的框架适用于社区卫生干预、定向数字广告和劳动力保留计划等环境,其中干预效果动态演变。个体可能表现出习惯化(频繁选择导致响应减弱)或恢复(不频繁选择导致响应增强)。技术挑战集中在非平稳的奖励分布,它导致干预效果随时间不断变化。该问题需要平衡两个相互竞争的关键目标:异质的个体奖励,以及"为改善未来决策而学习"与"最大化即时结果"之间的探索-利用权衡。我们的贡献是在组合多臂老虎机文献中首次引入纳入这种形式非平稳奖励的框架。我们开发了对动态遗憾具有理论保证的算法,并通过糖尿病干预案例研究证明了实际效果。与基线方法相比,我们的个性化社区干预算法在项目注册率上实现了高达三倍的改进,验证了该框架在现实应用中的潜力。这项工作将自适应学习的理论进展与人群层面行为改变干预的实际挑战联系起来。
摘要:We study a sequential resource allocation problem where a decision maker selects subsets of agents at each period to maximize overall outcomes without prior knowledge of individual-level effects. Our framework applies to settings such as community health interventions, targeted digital advertising, and workforce retention programs, where intervention effects evolve dynamically. Agents may exhibit habituation (diminished response from frequent selection) or recovery (enhanced response from infrequent selection). The technical challenge centers on nonstationary reward distributions that lead to changing intervention effects over time. The problem requires balancing two key competing objectives: heterogeneous individual rewards and the exploration-exploitation tradeoff in terms of learning for improved future decisions as opposed to maximizing immediate outcomes. Our contribution introduces the first framework incorporating this form of nonstationary rewards in the combinatorial multi-armed bandit literature. We develop algorithms with theoretical guarantees on dynamic regret and demonstrate practical efficacy through a diabetes intervention case study. Our personalized community intervention algorithm achieved up to three times as much improvement in program enrollment compared to baseline approaches, validating the framework's potential for real-world applications. This work bridges theoretical advances in adaptive learning with practical challenges in population-level behavioral change interventions.
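
作为摘要中"习惯化/恢复"效应的最小示意,下面的循环用依赖于"距上次选择的间隔"的恢复曲线调制UCB得分,每期选取K个个体;恢复曲线形状、更新步长等均为假设,并非论文算法本身。

import numpy as np

n, K, T = 20, 5, 200
rng = np.random.default_rng(0)
last = np.full(n, -10.0)              # 各个体上次被选中的时间
cnt = np.ones(n)                      # 被选次数
est = np.ones(n)                      # 对基础响应的运行估计
for t in range(T):
    tau = t - last
    recovery = 1 - np.exp(-0.3 * tau)     # 恢复曲线:久未被选则响应回升(假设形式)
    ucb = est * recovery + np.sqrt(2 * np.log(t + 2) / cnt)
    chosen = np.argsort(-ucb)[:K]         # 每期选择得分最高的K个个体
    rewards = rng.random(K)               # 观测到的结果(此处用随机数占位)
    est[chosen] += (rewards - est[chosen]) / cnt[chosen]
    cnt[chosen] += 1
    last[chosen] = t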


【5】CoCoL: A Communication Efficient Decentralized Collaborative Method for Multi-Robot Systems
标题:CoCoL:一种用于多机器人系统的通信高效分散协作方法
链接:https://arxiv.org/abs/2508.20898

作者:ng, Yan Huang, Yixian Zhao, Wenchao Meng, Jinming Xu
备注:Accepted by IROS2025
摘要:协作学习提高了多机器人系统在复杂任务中的性能和适应性,但由于多机器人任务中固有的高通信开销和数据异构性,协作学习面临着重大挑战。为此,我们提出了CoCoL,一种通信效率高的分散式协作学习方法,专为具有异构本地数据集的多机器人系统量身定制。利用镜像下降框架,CoCoL通过捕获机器人目标函数之间的相似性,利用近似牛顿型更新实现了显着的通信效率,并通过不精确的子问题解决方案降低了计算成本。此外,梯度跟踪方案的集成确保了其对数据异构性的鲁棒性。三个有代表性的多机器人协作学习任务的实验结果表明,所提出的CoCoL在显着减少通信轮数和总带宽消耗,同时保持国家的最先进的准确性的优越性。这些优势在涉及非IID(非独立同分布)数据分布、流数据和时变网络拓扑的挑战性场景中尤为明显。
摘要:Collaborative learning enhances the performance and adaptability of multi-robot systems in complex tasks but faces significant challenges due to high communication overhead and data heterogeneity inherent in multi-robot tasks. To this end, we propose CoCoL, a Communication efficient decentralized Collaborative Learning method tailored for multi-robot systems with heterogeneous local datasets. Leveraging a mirror descent framework, CoCoL achieves remarkable communication efficiency with approximate Newton-type updates by capturing the similarity between objective functions of robots, and reduces computational costs through inexact sub-problem solutions. Furthermore, the integration of a gradient tracking scheme ensures its robustness against data heterogeneity. Experimental results on three representative multi robot collaborative learning tasks show the superiority of the proposed CoCoL in significantly reducing both the number of communication rounds and total bandwidth consumption while maintaining state-of-the-art accuracy. These benefits are particularly evident in challenging scenarios involving non-IID (non-independent and identically distributed) data distribution, streaming data, and time-varying network topologies.
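
CoCoL所用的梯度跟踪是去中心化优化的标准组件。下面用局部二次目标给出其骨架:每个机器人与邻居按混合矩阵W平均,并用y_i跟踪全局平均梯度(目标、混合矩阵与步长均为示意;CoCoL实际采用的近似牛顿型更新见论文)。

import numpy as np

m, d = 4, 3                                  # 4个机器人、3维参数(假设)
rng = np.random.default_rng(0)
A = [np.eye(d) * (i + 1) for i in range(m)]  # 各机器人的局部二次目标(假设)
b = [rng.standard_normal(d) for _ in range(m)]
grad = lambda i, xi: A[i] @ xi - b[i]
W = np.full((m, m), 1.0 / m)                 # 双随机混合矩阵(完全图)
x = np.zeros((m, d))
g_old = np.array([grad(i, x[i]) for i in range(m)])
y = g_old.copy()                             # y_i跟踪全局平均梯度
for _ in range(200):
    x = W @ x - 0.1 * y                      # 邻居混合 + 沿跟踪方向下降
    g_new = np.array([grad(i, x[i]) for i in range(m)])
    y = W @ y + g_new - g_old                # 梯度跟踪递推
    g_old = g_new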


【6】GPT-FT: An Efficient Automated Feature Transformation Using GPT for Sequence Reconstruction and Performance Enhancement
标题:GPT-FT:使用GPT进行序列重建和性能增强的高效自动特征转换
链接:https://arxiv.org/abs/2508.20824

作者: Dongjie Wang, Scott Piersall, Ye Zhang, Liqiang Wang
备注:17 pages, 9 figures. accepted by APWeb-WAIM 2025
摘要:特征转换通过优化数据表示在增强机器学习模型性能方面起着关键作用。最近的最先进的方法解决这个任务作为一个连续的嵌入优化问题,离散搜索转换成一个可学习的过程。虽然有效,但这些方法通常依赖于顺序编码器-解码器结构,这导致高计算成本和参数要求,限制了可扩展性和效率。为了解决这些局限性,我们提出了一个新的框架,通过四个步骤来实现自动特征转换:转换记录收集,嵌入空间的建设与修订的生成预训练Transformer(GPT)模型,梯度上升搜索和自回归重建。在我们的方法中,修改后的GPT模型提供两个主要功能:(a)特征变换序列重建和(b)通过构建嵌入空间来估计和增强下游任务的模型性能。这种多目标优化框架减小了参数大小并加速了转换过程。基准数据集上的实验结果表明,该框架匹配或超过基线性能,计算效率显着提高。这项工作突出了基于transformer的架构的潜力,可扩展的,高性能的自动化功能转换。
摘要:Feature transformation plays a critical role in enhancing machine learning model performance by optimizing data representations. Recent state-of-the-art approaches address this task as a continuous embedding optimization problem, converting discrete search into a learnable process. Although effective, these methods often rely on sequential encoder-decoder structures that cause high computational costs and parameter requirements, limiting scalability and efficiency. To address these limitations, we propose a novel framework that accomplishes automated feature transformation through four steps: transformation records collection, embedding space construction with a revised Generative Pre-trained Transformer (GPT) model, gradient-ascent search, and autoregressive reconstruction. In our approach, the revised GPT model serves two primary functions: (a) feature transformation sequence reconstruction and (b) model performance estimation and enhancement for downstream tasks by constructing the embedding space. Such a multi-objective optimization framework reduces parameter size and accelerates transformation processes. Experimental results on benchmark datasets show that the proposed framework matches or exceeds baseline performance, with significant gains in computational efficiency. This work highlights the potential of transformer-based architectures for scalable, high-performance automated feature transformation.
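
其中"梯度上升搜索"一步可以极简地示意:在学到的嵌入空间里,沿性能估计器的梯度移动序列嵌入,再交由自回归解码器还原为特征变换序列。下例的评分头与维度均为占位假设。

import torch

z = torch.randn(1, 64, requires_grad=True)  # 某个特征变换序列的连续嵌入(维度为假设)
score_head = torch.nn.Linear(64, 1)         # 下游性能估计器的占位实现
opt = torch.optim.SGD([z], lr=0.5)
for _ in range(50):
    loss = -score_head(z).mean()            # 最大化预测性能 = 最小化其相反数
    opt.zero_grad(); loss.backward(); opt.step()
# 随后把优化得到的z自回归解码回离散的特征变换序列(解码器此处从略)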


【7】Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection
标题:扭转魔咒:通过秩一安全注入进行轻量级对齐放大
链接:https://arxiv.org/abs/2508.20766

作者:Abu Shairah, Hasan Abed Al Kader Hammoud, George Turkiyyah, Bernard Ghanem
备注:Under Review
摘要:大型语言模型(LLM)中的安全对齐通常通过调控内部表示来拒绝有害请求。最近的研究表明,这些安全机制可以通过消融或移除模型中特定的表征方向来绕过。在本文中,我们提出相反的方法:秩一安全注入(ROSI),这是一种白盒方法,通过将模型的激活永久地转向介导拒绝的子空间来放大模型的安全对齐。ROSI是应用于所有残差流写入矩阵的简单、无需微调的秩一权重修改,所需的安全方向可以由一小组有害/无害指令对计算得到。我们表明,ROSI能一致地提高安全拒绝率(由Llama Guard 3评估),同时保留模型在MMLU、HellaSwag和Arc等标准基准上的实用性。此外,ROSI还可以通过放大"未经审查"模型自身的潜在安全方向来重新对齐这些模型,证明了其作为有效的"最后一英里"安全程序的用途。我们的结果表明,有针对性、可解释的权重转向是一种廉价而有效的提升LLM安全性的机制,可以补充更耗费资源的微调范式。
摘要:Safety alignment in Large Language Models (LLMs) often involves mediating internal representations to refuse harmful requests. Recent research has demonstrated that these safety mechanisms can be bypassed by ablating or removing specific representational directions within the model. In this paper, we propose the opposite approach: Rank-One Safety Injection (ROSI), a white-box method that amplifies a model's safety alignment by permanently steering its activations toward the refusal-mediating subspace. ROSI operates as a simple, fine-tuning-free rank-one weight modification applied to all residual stream write matrices. The required safety direction can be computed from a small set of harmful and harmless instruction pairs. We show that ROSI consistently increases safety refusal rates - as evaluated by Llama Guard 3 - while preserving the utility of the model on standard benchmarks such as MMLU, HellaSwag, and Arc. Furthermore, we show that ROSI can also re-align 'uncensored' models by amplifying their own latent safety directions, demonstrating its utility as an effective last-mile safety procedure. Our results suggest that targeted, interpretable weight steering is a cheap and potent mechanism to improve LLM safety, complementing more resource-intensive fine-tuning paradigms.
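
下面几行给出秩一注入思路的示意:用有害/无害指令激活的均值差得到拒绝方向r,再对残差流写入矩阵施加沿r的秩一放大。缩放系数与具体公式均为合理假设,论文的精确构造请以原文为准。

import torch
import torch.nn.functional as F

d_model = 768
W = torch.randn(d_model, d_model)     # 某个残差流写入矩阵(如MLP输出投影)
h_harm = torch.randn(32, d_model)     # 有害指令上的激活(假设已事先采集)
h_safe = torch.randn(32, d_model)     # 无害指令上的激活
r = F.normalize(h_harm.mean(0) - h_safe.mean(0), dim=0)  # 拒绝方向
alpha = 0.1                           # 放大强度(假设值)
W_rosi = W + alpha * torch.outer(r, r @ W)  # W' = W + alpha * r r^T W,放大沿r的输出分量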


【8】Balancing Profit and Traveller Acceptance in Ride-Pooling Personalised Fares
标题:在拼车个性化票价中平衡利润和旅客接受度
链接:https://arxiv.org/abs/2508.20723

作者:jak, Rafal Kucharski
摘要:拼车系统要取得成功,必须提供有吸引力的服务,即以有吸引力的价格补偿感知成本。然而,由于时间价值差异很大,每个旅行者都有自己可接受的价格,而运营商并不知道。在这里,我们表明,个体的接受水平可以由运营商学习得到(10天内对拼车旅客的准确率超过90%),用于优化个性化票价。我们提出了一种自适应定价策略:运营商每天构建一个报价,逐步满足旅客的期望并吸引不断增长的需求。我们的结果表明,运营商通过学习个体旅客的行为特征,不仅能提升旅客的表现(增加效用),也能提升自身的表现(增加利润)。此外,这种知识还使运营商能够去除低效的拼车行程,专注于有吸引力且有利可图的组合。
摘要:Ride-pooling systems, to succeed, must provide an attractive service, namely compensate perceived costs with an appealing price. However, because of a strong heterogeneity in a value-of-time, each traveller has his own acceptable price, unknown to the operator. Here, we show that individual acceptance levels can be learned by the operator (over 90% accuracy for pooled travellers in 10 days) to optimise personalised fares. We propose an adaptive pricing policy, where every day the operator constructs an offer that progressively meets travellers' expectations and attracts a growing demand. Our results suggest that operators, by learning behavioural traits of individual travellers, may improve performance not only for travellers (increased utility) but also for themselves (increased profit). Moreover, such knowledge allows the operator to remove inefficient pooled rides and focus on attractive and profitable combinations.


【9】MobileCLIP2: Improving Multi-Modal Reinforced Training
标题:MobileCLIP2:改进多模态强化训练
链接:https://arxiv.org/abs/2508.20691

作者:aghri, Pavan Kumar Anasosalu Vasu, Cem Koc, Vaishaal Shankar, Alexander Toshev, Oncel Tuzel, Hadi Pouransari
备注:TMLR August 2025
摘要:具有zero-shot能力的CLIP等基础图像-文本模型支持广泛的应用。MobileCLIP是最新的图像-文本模型系列,延迟为3-15ms,参数量为50-150M,具有最先进的zero-shot精度。MobileCLIP的关键要素是其低延迟的轻量架构以及新颖的多模态强化训练,使来自多个字幕生成器和CLIP教师的知识蒸馏变得高效、可扩展且可复现。在本文中,我们通过以下方式改进MobileCLIP的多模态强化训练:1)采用在DFN数据集上训练的更好的CLIP教师集成;2)采用在DFN数据集上训练、并在多种高质量图像-字幕数据集上微调的改进字幕教师。我们通过消融实验获得了新的见解,例如对比知识蒸馏中温度调节的重要性、字幕生成器微调对字幕多样性的有效性,以及组合多个模型生成的合成字幕所带来的叠加改进。我们训练了名为MobileCLIP2的新模型系列,在低延迟下实现了最先进的ImageNet-1k zero-shot精度。特别是,与MobileCLIP-B架构相比,MobileCLIP2-B的ImageNet-1k准确率提高了2.2%。值得注意的是,MobileCLIP2-S4在ImageNet-1k上与SigLIP-SO400M/14的zero-shot准确率持平,同时模型小2倍,并以低2.5倍的延迟优于DFN ViT-L/14。我们发布了预训练模型(https://github.com/apple/ml-mobileclip)和数据生成代码(https://github.com/apple/ml-mobileclip-dr)。数据生成代码可以方便地使用分布式可扩展处理,用任意教师创建新的强化数据集。
摘要:Foundation image-text models such as CLIP with zero-shot capabilities enable a wide array of applications. MobileCLIP is a recent family of image-text models at 3-15ms latency and 50-150M parameters with state-of-the-art zero-shot accuracy. The main ingredients in MobileCLIP were its low-latency and light architectures and a novel multi-modal reinforced training that made knowledge distillation from multiple caption-generators and CLIP teachers efficient, scalable, and reproducible. In this paper, we improve the multi-modal reinforced training of MobileCLIP through: 1) better CLIP teacher ensembles trained on the DFN dataset, 2) improved captioner teachers trained on the DFN dataset and fine-tuned on a diverse selection of high-quality image-caption datasets. We discover new insights through ablations such as the importance of temperature tuning in contrastive knowledge distillation, the effectiveness of caption-generator fine-tuning for caption diversity, and the additive improvement from combining synthetic captions generated by multiple models. We train a new family of models called MobileCLIP2 and achieve state-of-the-art ImageNet-1k zero-shot accuracies at low latencies. In particular, we observe 2.2% improvement in ImageNet-1k accuracy for MobileCLIP2-B compared with MobileCLIP-B architecture. Notably, MobileCLIP2-S4 matches the zero-shot accuracy of SigLIP-SO400M/14 on ImageNet-1k while being 2x smaller and improves on DFN ViT-L/14 at 2.5x lower latency. We release our pretrained models (https://github.com/apple/ml-mobileclip) and the data generation code (https://github.com/apple/ml-mobileclip-dr). The data generation code makes it easy to create new reinforced datasets with arbitrary teachers using distributed scalable processing.


【10】Dimension Agnostic Testing of Survey Data Credibility through the Lens of Regression
标题:从回归角度对调查数据可信度进行维度不可知检验
链接:https://arxiv.org/abs/2508.20616

作者: Basu, Sourav Chakraborty, Debarshi Chanda, Buddha Dev Das, Arijit Ghosh, Arnab Ray
备注:30 pages, 8 figures, 6 Tables
摘要:评估抽样调查是否能充分代表人口是确保下游研究有效性的关键问题。通常,这个问题简化为估计两个高维分布之间的距离,这通常需要随着维度呈指数增长的样本数量。然而,根据用于数据分析的模型,从数据中得出的结论可能在不同的基础分布中保持一致。在这种情况下,我们提出了一个基于任务的方法来评估抽样调查的可信度。具体来说,我们引入了一个特定于模型的距离度量来量化这种可信度的概念。我们还设计了一个算法来验证的背景下,回归模型的调查数据的可信度。值得注意的是,我们算法的样本复杂度与数据维度无关。这种效率源于这样一个事实,即该算法专注于验证调查数据的可信度,而不是重建底层回归模型。此外,我们表明,如果试图通过重建回归模型来验证可信度,则样本复杂性与数据的维度呈线性关系。我们证明了我们的算法理论上的正确性,数值证明了我们的算法的性能。
摘要:Assessing whether a sample survey credibly represents the population is a critical question for ensuring the validity of downstream research. Generally, this problem reduces to estimating the distance between two high-dimensional distributions, which typically requires a number of samples that grows exponentially with the dimension. However, depending on the model used for data analysis, the conclusions drawn from the data may remain consistent across different underlying distributions. In this context, we propose a task-based approach to assess the credibility of sampled surveys. Specifically, we introduce a model-specific distance metric to quantify this notion of credibility. We also design an algorithm to verify the credibility of survey data in the context of regression models. Notably, the sample complexity of our algorithm is independent of the data dimension. This efficiency stems from the fact that the algorithm focuses on verifying the credibility of the survey data rather than reconstructing the underlying regression model. Furthermore, we show that if one attempts to verify credibility by reconstructing the regression model, the sample complexity scales linearly with the dimensionality of the data. We prove the theoretical correctness of our algorithm and numerically demonstrate our algorithm's performance.


【11】Flowing Straighter with Conditional Flow Matching for Accurate Speech Enhancement
标题:通过条件流匹配获得更直的流,以实现准确的语音增强
链接:https://arxiv.org/abs/2508.20584

作者:ross, Anton Ragni
备注:preprint, accepted
摘要:当前基于流的生成式语音增强方法学习弯曲的概率路径,用以建模干净语音与含噪语音之间的映射。尽管性能令人印象深刻,弯曲概率路径的影响尚不清楚。薛定谔桥等方法专注于弯曲路径,其时间相关的梯度和方差并不促成直线路径。机器学习研究的发现表明,像条件流匹配这样的直线路径更易训练且泛化更好。本文量化了路径直线度对语音增强质量的影响。我们报告了薛定谔桥的实验,表明某些配置会带来更直的路径。与之相对,我们提出用于语音增强的独立条件流匹配,建模含噪语音与干净语音之间的直线路径。我们的实验表明,与时间无关的方差比梯度对样本质量的影响更大。虽然条件流匹配改善了多项语音质量指标,但它需要多步推理。我们通过将训练好的基于流的模型当作直接预测模型来推理,以一步求解纠正了这一点。我们的工作表明,与弯曲的时间相关路径相比,笔直且与时间无关的概率路径能够改进生成式语音增强。
摘要:Current flow-based generative speech enhancement methods learn curved probability paths which model a mapping between clean and noisy speech. Despite impressive performance, the implications of curved probability paths are unknown. Methods such as Schrodinger bridges focus on curved paths, where time-dependent gradients and variance do not promote straight paths. Findings in machine learning research suggest that straight paths, such as conditional flow matching, are easier to train and offer better generalisation. In this paper we quantify the effect of path straightness on speech enhancement quality. We report experiments with the Schrodinger bridge, where we show that certain configurations lead to straighter paths. Conversely, we propose independent conditional flow-matching for speech enhancement, which models straight paths between noisy and clean speech. We demonstrate empirically that a time-independent variance has a greater effect on sample quality than the gradient. Although conditional flow matching improves several speech quality metrics, it requires multiple inference steps. We rectify this with a one-step solution by inferring the trained flow-based model as if it was directly predictive. Our work suggests that straighter time-independent probability paths improve generative speech enhancement over curved time-dependent paths.
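
独立条件流匹配的训练目标可以写得非常紧凑:在含噪与干净语音之间取直线插值点,并回归恒定速度场x1 - x0。下面是一个自包含的草图,特征形状与占位网络均为假设。

import torch

def cfm_loss(model, x0, x1):
    # 直线路径 x_t = (1 - t) * x0 + t * x1,目标速度场为 x1 - x0
    t = torch.rand(x0.size(0), 1)
    x_t = (1 - t) * x0 + t * x1
    return ((model(x_t, t) - (x1 - x0)) ** 2).mean()

noisy = torch.randn(16, 257)            # 含噪语音的特征帧(形状为占位假设)
clean = torch.randn(16, 257)
net = lambda x, t: torch.zeros_like(x)  # 速度场网络的占位实现
print(cfm_loss(net, noisy, clean))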


【12】Assessing local deformation and computing scalar curvature with nonlinear conformal regularization of decoders
标题:利用解码器的非线性共形正则化评估局部变形并计算标量曲率
链接:https://arxiv.org/abs/2508.20413

作者:Couéraud, Vikram Sunkara, Christof Schütte
备注:9 pages
摘要:降维的一个目的是发现解释数据的主要因素,因此对许多应用程序至关重要。当处理高维数据时,自动编码器提供了一种简单而有效的方法来学习低维表示。一般自动编码器的两个组件首先包括将观察到的数据映射到潜在空间的编码器;其次是将潜在空间映射回原始观察空间的解码器,这允许学习原始数据的低维流形表示。在这篇文章中,我们介绍了一种新型的几何正则化,用于解码由深度神经网络近似的映射,即非线性共形正则化。这种正则化过程允许解码器映射的局部变化,并带有一个新的标量场,称为共形因子,它作为映射到原始数据空间时潜在空间所承受的局部变形量的定量指标。我们还表明,这种正则化技术允许计算的标量曲率的学习流形。在瑞士卷和CelebA数据集上进行了实现和实验,以说明如何从架构中获得这些量。
摘要:One aim of dimensionality reduction is to discover the main factors that explain the data, and as such is paramount to many applications. When working with high dimensional data, autoencoders offer a simple yet effective approach to learn low-dimensional representations. The two components of a general autoencoder consist first of an encoder that maps the observed data onto a latent space; and second a decoder that maps the latent space back to the original observation space, which allows to learn a low-dimensional manifold representation of the original data. In this article, we introduce a new type of geometric regularization for decoding maps approximated by deep neural networks, namely nonlinear conformal regularization. This regularization procedure permits local variations of the decoder map and comes with a new scalar field called conformal factor which acts as a quantitative indicator of the amount of local deformation sustained by the latent space when mapped into the original data space. We also show that this regularization technique allows the computation of the scalar curvature of the learned manifold. Implementation and experiments on the Swiss roll and CelebA datasets are performed to illustrate how to obtain these quantities from the architecture.
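
共形正则化的核心约束是:解码器的拉回度量近似等于共形因子乘以单位阵,即J^T J ≈ c(z)·I。下面用torch.autograd.functional.jacobian给出单点惩罚项的示意;惩罚的具体形式是合理假设,并非论文的精确定义。

import torch

decoder = torch.nn.Sequential(
    torch.nn.Linear(2, 16), torch.nn.Tanh(), torch.nn.Linear(16, 3)
)

def conformal_penalty(z):
    # 训练时可传入create_graph=True以便对惩罚项反向传播
    J = torch.autograd.functional.jacobian(lambda v: decoder(v), z)  # 形状(3, 2)
    G = J.T @ J                                   # 潜空间上的拉回度量
    c = torch.diagonal(G).mean()                  # 在z处估计共形因子c(z)
    return ((G - c * torch.eye(G.size(0))) ** 2).sum(), c

penalty, c = conformal_penalty(torch.randn(2))
print(penalty.item(), c.item())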


【13】BiListing: Modality Alignment for Listings
标题:BiListing:房源列表的模态对齐
链接:https://arxiv.org/abs/2508.20396

作者: Guy, Mihajlo Grbovic, Chun How Tan, Han Zhao
摘要:Airbnb是提供旅行住宿的领导者。Airbnb一直依赖结构化数据来理解、排名和向客人推荐房源,这是因为从文本和图像中提取有意义的信息的能力有限,而且相关的复杂性也很高。随着表征学习的兴起,利用文本和照片中的丰富信息变得更加容易。一种流行的方法是为文本文档和图像创建嵌入,以支持计算房源之间的相似性,或将嵌入用作ML模型中的特征。   然而,Airbnb房源包含多种非结构化数据:多张图片、各种非结构化文本文档(如标题、描述和评论),这使得这种方法具有挑战性。具体来说,将不同信息片段的多个嵌入组合成单一表示并非易事。   本文提出了BiListing(Bimodal Listing,双模态房源),一种通过利用大语言模型和预训练的语言-图像模型来对齐房源文本和照片的方法。BiListing方法具有几个有利的特性:将非结构化数据捕获到每个房源、每个模态的单个嵌入向量中,启用zero-shot能力以在用户友好的语义下高效搜索库存,克服冷启动问题,以及支持沿单一模态或两种模态的房源到房源搜索。   我们进行了离线和在线测试,在Airbnb搜索排名模型中利用BiListing嵌入,并成功将其部署到生产环境,取得了0.425%的NDCG增益,并带来了数千万美元的增量收入。
摘要:Airbnb is a leader in offering travel accommodations. Airbnb has historically relied on structured data to understand, rank, and recommend listings to guests due to the limited capabilities and associated complexity arising from extracting meaningful information from text and images. With the rise of representation learning, leveraging rich information from text and photos has become easier. A popular approach has been to create embeddings for text documents and images to enable use cases of computing similarities between listings or using embeddings as features in an ML model.   However, an Airbnb listing has diverse unstructured data: multiple images, various unstructured text documents such as title, description, and reviews, making this approach challenging. Specifically, it is a non-trivial task to combine multiple embeddings of different pieces of information to reach a single representation.   This paper proposes BiListing, for Bimodal Listing, an approach to align text and photos of a listing by leveraging large-language models and pretrained language-image models. The BiListing approach has several favorable characteristics: capturing unstructured data into a single embedding vector per listing and modality, enabling zero-shot capability to search inventory efficiently in user-friendly semantics, overcoming the cold start problem, and enabling listing-to-listing search along a single modality, or both.   We conducted offline and online tests to leverage the BiListing embeddings in the Airbnb search ranking model, and successfully deployed it in production, achieved a 0.425% NDCG gain, and drove tens of millions in incremental revenue.


【14】P2C: Path to Counterfactuals
标题:P2C:反事实之路
链接:https://arxiv.org/abs/2508.20371

作者:gupta, Sadaf MD Halim, Joaquín Arias, Elmer Salazar, Gopal Gupta
摘要:机器学习模型越来越多地在金融、法律和招聘等高风险环境中推动决策,从而凸显了透明度的必要性。然而,关键的挑战在于平衡透明度(澄清"为什么"作出某个决定)与追索权(提供关于"如何"从不利结果走向有利结果的可行步骤)。反事实解释揭示了"为什么"会出现不希望的结果,以及"如何"通过有针对性的特征改变(干预)来扭转它。   目前的反事实方法有其局限性:1)它们经常忽略特征之间的因果依赖关系;2)它们通常假设所有干预可以同时发生,而在行动通常按顺序进行的实际场景中,这是一个不切实际的假设。因此,这些反事实在现实世界中往往无法实现。   我们提出P2C(Path-to-Counterfactuals),一个与模型无关的框架,它生成一个计划(有序的行动序列),将不利结果转化为因果一致的有利结果。P2C从两方面解决上述限制:1)显式建模特征之间的因果关系;2)确保计划中的每个中间状态都是可行且因果有效的。P2C使用目标导向的回答集编程系统s(CASP)来生成计划,并考虑由因果依赖关系自动引发的特征变化。此外,P2C只把用户主动做出的更改计入成本(工作量),从而得到现实的成本估计。最后,P2C展示了其因果规划器如何优于缺乏因果知识、因而可能生成非法动作的标准规划器。
摘要:Machine-learning models are increasingly driving decisions in high-stakes settings, such as finance, law, and hiring, thus, highlighting the need for transparency. However, the key challenge is to balance transparency -- clarifying `why' a decision was made -- with recourse: providing actionable steps on `how' to achieve a favourable outcome from an unfavourable outcome. Counterfactual explanations reveal `why' an undesired outcome occurred and `how' to reverse it through targeted feature changes (interventions).   Current counterfactual approaches have limitations: 1) they often ignore causal dependencies between features, and 2) they typically assume all interventions can happen simultaneously, an unrealistic assumption in practical scenarios where actions are typically taken in a sequence. As a result, these counterfactuals are often not achievable in the real world.   We present P2C (Path-to-Counterfactuals), a model-agnostic framework that produces a plan (ordered sequence of actions) converting an unfavourable outcome to a causally consistent favourable outcome. P2C addresses both limitations by 1) Explicitly modelling causal relationships between features and 2) Ensuring that each intermediate state in the plan is feasible and causally valid. P2C uses the goal-directed Answer Set Programming system s(CASP) to generate the plan accounting for feature changes that happen automatically due to causal dependencies. Furthermore, P2C refines cost (effort) computation by only counting changes actively made by the user, resulting in realistic cost estimates. Finally, P2C highlights how its causal planner outperforms standard planners, which lack causal knowledge and thus can generate illegal actions.


【15】DFAMS: Dynamic-flow guided Federated Alignment based Multi-prototype Search
标题:DFAMS:基于动态流引导的联邦对齐的多原型搜索
链接:https://arxiv.org/abs/2508.20353

作者:ang, Xinke Jiang, Rihong Qiu, Ruiqing Li, Yihang Zhang, Yue Fang, Yongxin Xu, Hongxin Ding, Xu Chu, Junfeng Zhao, Yasha Wang
备注:7 pages, 3 figures
摘要:联邦检索(FR)路由查询跨多个外部知识源,以减轻幻觉的LLM,当必要的外部知识分布。然而,现有的方法很难检索高质量和相关的文档的模糊查询,特别是在跨域的情况下,这大大限制了他们的有效性,在支持下游生成任务。受动态信息流(DIF)的启发,我们提出了DFAMS,一个新的框架,利用DIF来识别潜在的查询意图,并构建语义对齐的知识分区跨异构源的准确检索。具体而言,DFAMS通过利用来自一些注释查询的梯度信号并采用基于Shapley值的归因来追踪与意图识别和子域边界检测相关联的神经元激活路径,来探测LLM中的DIF。然后,DFAMS利用DIF通过多原型对比学习来训练对齐模块,从而实现细粒度的源内建模和跨知识库的源间语义对齐。五个基准测试的实验结果表明,DFAMS在知识分类准确率、检索召回率和下游QA准确率方面优于先进的FR方法,分别达到14.37%、5.38%和6.45%,证明了其在复杂FR场景中的有效性。
摘要:Federated Retrieval (FR) routes queries across multiple external knowledge sources, to mitigate hallucinations of LLMs, when necessary external knowledge is distributed. However, existing methods struggle to retrieve high-quality and relevant documents for ambiguous queries, especially in cross-domain scenarios, which significantly limits their effectiveness in supporting downstream generation tasks. Inspired by dynamic information flow (DIF), we propose DFAMS, a novel framework that leverages DIF to identify latent query intents and construct semantically aligned knowledge partitions for accurate retrieval across heterogeneous sources. Specifically, DFAMS probes the DIF in LLMs by leveraging gradient signals from a few annotated queries and employing Shapley value-based attribution to trace neuron activation paths associated with intent recognition and subdomain boundary detection. Then, DFAMS leverages DIF to train an alignment module via multi-prototype contrastive learning, enabling fine-grained intra-source modeling and inter-source semantic alignment across knowledge bases. Experimental results across five benchmarks show that DFAMS outperforms advanced FR methods by up to 14.37% in knowledge classification accuracy, 5.38% in retrieval recall, and 6.45% in downstream QA accuracy, demonstrating its effectiveness in complex FR scenarios.


【16】Beacon: Post-Training Quantization with Integrated Grid Selection
标题:Beacon:具有集成网格选择的训练后量化
链接:https://arxiv.org/abs/2508.20293

作者:ang, Rayan Saab
摘要:量化是一种广泛使用的压缩技术,用于减少大型预训练模型的内存和计算成本。每通道后训练量化(PTQ)中的一个关键挑战是选择适当的缩放因子以用来自缩放量化网格的值来替换权重值。现有的方法通常通过启发式调整或网格搜索在一开始就固定规模。在这篇文章中,我们提出了信标,一个简单而有效的算法,消除了这种手动调整的需要。Beacon直接使用固定的非缩放字母表执行每通道PTQ,并通过利用对称标量量化的几何结构自动确定最佳缩放因子。它支持对称和非对称量化,只需最小的修改,并且不依赖于反向传播或大型校准集。尽管其简单性和免调谐性,Beacon实现了与最先进的方法相比具有竞争力的性能,使其成为高效模型部署的实用解决方案。
摘要:Quantization is a widely used compression technique for reducing the memory and computation costs of large pre-trained models. A key challenge in per-channel post-training quantization (PTQ) is selecting appropriate scaling factors to replace weight values with values from a scaled quantization grid. Existing methods typically fix the scale at the outset via heuristic tuning or grid search. In this note, we propose Beacon, a simple and effective algorithm that eliminates the need for such manual tuning. Beacon performs per-channel PTQ directly using a fixed non-scaled alphabet and automatically determines the optimal scaling factors by exploiting the geometry of symmetric scalar quantization. It supports both symmetric and asymmetric quantization with minimal modifications and does not rely on back-propagation or large calibration sets. Despite its simplicity and tuning-free nature, Beacon achieves competitive performance compared to state-of-the-art methods, making it a practical solution for efficient model deployment.
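
按通道对称量化里"选尺度"这一步可以用一个朴素基线来说明:在固定整数字母表{-Q, ..., Q}下,对每个通道在候选尺度上最小化重构误差。Beacon利用对称标量量化的几何结构直接确定最优尺度,此处的网格搜索只是便于理解的替代写法。

import numpy as np

def quantize_channel(w, Q=7, n_candidates=64):
    # 搜索使 ||w - s * clip(round(w / s), -Q, Q)||^2 最小的尺度s
    s_max = np.abs(w).max() / Q
    best_err, best_s, best_q = np.inf, None, None
    for s in np.linspace(0.2 * s_max, s_max, n_candidates):
        q = np.clip(np.round(w / s), -Q, Q)
        err = np.sum((w - s * q) ** 2)
        if err < best_err:
            best_err, best_s, best_q = err, s, q
    return best_s, best_q

W = np.random.randn(64, 256)  # 权重矩阵,每行对应一个输出通道
per_channel = [quantize_channel(row) for row in W]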


【17】Neural Spline Operators for Risk Quantification in Stochastic Systems
标题:用于随机系统风险量化的神经样条算子
链接:https://arxiv.org/abs/2508.20288

作者:Wang, Raffaele Romagnoli, Kamyar Azizzadenesheli, Yorie Nakahira
摘要:准确量化各类随机系统中的长期风险概率对于安全关键控制至关重要。然而,现有基于采样和基于偏微分方程(PDE)的方法往往难以处理复杂多变的动力学。物理信息神经网络可以从固定且有限维的系统参数中学习风险概率的代理映射,但无法刻画系统动力学中的函数型变化。为了解决这些挑战,我们将物理信息神经算子(PINO)方法引入风险量化问题,学习从变化的函数型系统动力学到相应风险概率的映射。具体来说,我们提出了神经样条算子(NeSO),这是一个利用B样条表示来提高训练效率并更好地施加初始与边界条件的PINO框架,这对准确的风险量化至关重要。我们给出了理论分析,证明了NeSO的万能逼近能力。我们还给出两个案例研究,一个具有变化的函数型动力学,另一个具有高维多智能体动力学,以证明NeSO的有效性及其相对现有方法的显著在线加速。所提出的框架及相应的万能逼近定理有望惠及风险量化之外的其他控制或PDE相关问题。
摘要:Accurately quantifying long-term risk probabilities in diverse stochastic systems is essential for safety-critical control. However, existing sampling-based and partial differential equation (PDE)-based methods often struggle to handle complex varying dynamics. Physics-informed neural networks learn surrogate mappings for risk probabilities from varying system parameters of fixed and finite dimensions, yet can not account for functional variations in system dynamics. To address these challenges, we introduce physics-informed neural operator (PINO) methods to risk quantification problems, to learn mappings from varying functional system dynamics to corresponding risk probabilities. Specifically, we propose Neural Spline Operators (NeSO), a PINO framework that leverages B-spline representations to improve training efficiency and achieve better initial and boundary condition enforcements, which are crucial for accurate risk quantification. We provide theoretical analysis demonstrating the universal approximation capability of NeSO. We also present two case studies, one with varying functional dynamics and another with high-dimensional multi-agent dynamics, to demonstrate the efficacy of NeSO and its significant online speed-up over existing methods. The proposed framework and the accompanying universal approximation theorem are expected to be beneficial for other control or PDE-related problems beyond risk quantification.


【18】Coresets from Trajectories: Selecting Data via Correlation of Loss Differences
标题:来自轨迹的核心集:通过损失差异的相关性选择数据
链接:https://arxiv.org/abs/2508.20230

作者:garaj, Deepak Ravikumar, Kaushik Roy
摘要:深度学习模型在各个领域实现了最先进的性能,但在实时或资源受限的场景中面临可扩展性挑战。为了解决这个问题,我们提出了损失差异相关性(CLD),这是一个简单且可扩展的coreset选择指标,通过测量它们与保留验证集的损失轨迹的对齐来识别最有影响力的训练样本。CLD非常高效,只需要在训练检查点计算每个样本的损失值,并避免了许多现有子集选择方法中使用的昂贵的梯度和曲率计算。我们开发了一个一般的理论框架,建立基于CLD的coresets的收敛保证,证明收敛误差的上限由所选样本的对齐和验证集的代表性。在CIFAR-100和ImageNet-1 k上,基于CLD的核心集通常在子集大小上优于或接近最先进的方法,并且即使在不领先的情况下,也保持在计算成本更高的基线的1%以内。CLD跨架构(ResNet、VGG、DenseNet)有效传输,支持代理到目标选择,性能下降<1%。此外,CLD在仅使用早期检查点时是稳定的,导致可忽略的准确性损失。最后,CLD表现出固有的偏差减少,通过每类验证对齐,避免了额外的分层抽样的需要。总之,这些属性使CLD成为可扩展数据集优化的原则性,高效,稳定和可转移的工具。
摘要:Deep learning models achieve state-of-the-art performance across domains but face scalability challenges in real-time or resource-constrained scenarios. To address this, we propose Correlation of Loss Differences (CLD), a simple and scalable metric for coreset selection that identifies the most impactful training samples by measuring their alignment with the loss trajectories of a held-out validation set. CLD is highly efficient, requiring only per-sample loss values computed at training checkpoints, and avoiding the costly gradient and curvature computations used in many existing subset selection methods. We develop a general theoretical framework that establishes convergence guarantees for CLD-based coresets, demonstrating that the convergence error is upper-bounded by the alignment of the selected samples and the representativeness of the validation set. On CIFAR-100 and ImageNet-1k, CLD-based coresets typically outperform or closely match state-of-the-art methods across subset sizes, and remain within 1% of more computationally expensive baselines even when not leading. CLD transfers effectively across architectures (ResNet, VGG, DenseNet), enabling proxy-to-target selection with <1% degradation. Moreover, CLD is stable when using only early checkpoints, incurring negligible accuracy loss. Finally, CLD exhibits inherent bias reduction via per-class validation alignment, obviating the need for additional stratified sampling. Together, these properties make CLD a principled, efficient, stable, and transferable tool for scalable dataset optimization.
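
CLD指标只需要每个样本在若干检查点上的损失值。下面的草图计算训练样本的损失差分轨迹与验证集平均损失差分的皮尔逊相关,并取相关性最高的样本作为核心集;数据用随机数占位,比例10%为假设。

import numpy as np

rng = np.random.default_rng(0)
train_losses = rng.random((1000, 10))             # (训练样本数, 检查点数)的逐样本损失
val_losses = rng.random((200, 10))
d_train = np.diff(train_losses, axis=1)           # 相邻检查点之间的损失差
d_val = np.diff(val_losses, axis=1).mean(axis=0)  # 验证集的平均损失差轨迹
d_t = d_train - d_train.mean(axis=1, keepdims=True)
d_v = d_val - d_val.mean()
cld = (d_t @ d_v) / (np.linalg.norm(d_t, axis=1) * np.linalg.norm(d_v) + 1e-12)
coreset_idx = np.argsort(-cld)[: int(0.1 * len(cld))]  # 取对齐度最高的10%作为核心集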


【19】Automatic Inspection Based on Switch Sounds of Electric Point Machines
标题:基于电动转辙机开关声音的自动检测
链接:https://arxiv.org/abs/2508.20870

作者:bata, Toshiki Gunji, Mitsuaki Tsuda, Takashi Endo, Kota Dohi, Tomoya Nishida, Satoko Nomoto
备注:Accepted at ASPECT 2025
摘要 :自2018年以来,东日本铁路公司和日立制作所一直致力于用基于物联网的监控取代人工检查。目的是节省设备检查所需的时间,并提供适当的预防性维护。作为目视检查的替代方案,很难替代电气特性监测,并且引入新的高性能传感器成本高昂。2019年,我们在“NS”电动转辙机中安装了摄像头和麦克风,以减少设备故障造成的停机时间,从而实现对锁片状况的远程监控。提出了一种基于声音信息的道岔转辙错误检测方法,并取得了预期的试验结果。所提出的方法将使实时检测设备故障成为可能,从而减少对目视检查的需求。本文介绍了我们的技术研究成果,旨在使用声音自动化检查电子转辙机,特别是从2019年开始关注“开关声音”。
摘要:Since 2018, East Japan Railway Company and Hitachi, Ltd. have been working to replace human inspections with IoT-based monitoring. The purpose is to save the labor required for equipment inspections and to provide appropriate preventive maintenance. Electrical characteristic monitoring has been difficult to substitute for visual inspection, and the introduction of new high-performance sensors has been costly. In 2019, we implemented cameras and microphones in "NS" electric point machines to reduce downtime from equipment failures, allowing for remote monitoring of lock-piece conditions. A method for detecting turnout switching errors based on sound information was proposed, and the expected test results were obtained. The proposed method will make it possible to detect equipment failures in real time, thereby reducing the need for visual inspections. This paper presents the results of our technical studies, begun in 2019, aimed at automating the inspection of electric point machines using sound, specifically focusing on the "switch sound".


【20】Stochastic Gradients under Nuisances
标题:滋扰参数下的随机梯度
链接:https://arxiv.org/abs/2508.20326

作者:u, Ronak Mehta, Alex Luedtke, Zaid Harchaoui
摘要:随机梯度优化是从经典监督学习到现代自监督学习等各种场景中的主导学习范式。我们考虑目标函数依赖于未知滋扰参数的学习问题的随机梯度算法,并建立非渐近收敛保证。我们的结果表明,虽然滋扰的存在可能改变最优解并扰乱优化轨迹,但在适当条件下(如Neyman正交性),经典随机梯度算法仍然可以收敛。此外,即使Neyman正交性不满足,我们也表明采用近似正交化更新(近似正交化梯度预言机)的算法变体可以达到类似的收敛速度。文中讨论了正交统计学习/双重机器学习与因果推理中的例子。
摘要:Stochastic gradient optimization is the dominant learning paradigm for a variety of scenarios, from classical supervised learning to modern self-supervised learning. We consider stochastic gradient algorithms for learning problems whose objectives rely on unknown nuisance parameters, and establish non-asymptotic convergence guarantees. Our results show that, while the presence of a nuisance can alter the optimum and upset the optimization trajectory, the classical stochastic gradient algorithm may still converge under appropriate conditions, such as Neyman orthogonality. Moreover, even when Neyman orthogonality is not satisfied, we show that an algorithm variant with approximately orthogonalized updates (with an approximately orthogonalized gradient oracle) may achieve similar convergence rates. Examples from orthogonal statistical learning/double machine learning and causal inference are discussed.


【21】The Mathematician's Assistant: Integrating AI into Research Practice
标题:数学家的助手:将人工智能融入研究实践
链接:https://arxiv.org/abs/2508.20236

作者:kel
备注:24 pages, 7 figures. Accepted for publication in Mathematische Semesterberichte (to appear in vol. 72, no. 2)
摘要:人工智能(AI)的快速发展,以“AlphaEvolve”和“Gemini Deep Think”等突破为标志,开始提供强大的新工具,这些工具有可能显著改变许多数学领域的研究实践。本文基于截至2025年8月2日的发展,探讨了数学研究背景下可公开访问的大型语言模型(LLM)的现状。我们对最近的基准测试的分析,如MathArena和开放证明语料库(Balunovic等人,2025; Dekoninck等人,2025),揭示了一个复杂的二重性:虽然最先进的模型在解决问题和评估证明方面表现出强大的能力,但它们也表现出系统性的缺陷,包括缺乏自我批判以及最终答案准确性和充分证明有效性之间的模型依赖性差异。   基于这些发现,我们提出了一个持久的框架,将人工智能集成到研究工作流程中,以增强数学家的原则为中心。在这个模型中,人工智能在人类研究人员的严格指导下发挥副驾驶员的作用,这种方法被提炼成五个指导原则,以确保有效和负责任的使用。然后,我们系统地探讨了人工智能在整个研究生命周期中应用的七种基本方式,从创造力和思维到最终的写作过程,展示了这些原则如何转化为具体实践。   我们的结论是,人工智能目前的主要作用是增强而不是自动化。这需要一套新的技能,专注于战略提示,关键验证和方法的严格性,以便有效地使用这些强大的工具。
摘要:The rapid development of artificial intelligence (AI), marked by breakthroughs like 'AlphaEvolve' and 'Gemini Deep Think', is beginning to offer powerful new tools that have the potential to significantly alter the research practice in many areas of mathematics. This paper explores the current landscape of publicly accessible large language models (LLMs) in a mathematical research context, based on developments up to August 2, 2025. Our analysis of recent benchmarks, such as MathArena and the Open Proof Corpus (Balunović et al., 2025; Dekoninck et al., 2025), reveals a complex duality: while state-of-the-art models demonstrate strong abilities in solving problems and evaluating proofs, they also exhibit systematic flaws, including a lack of self-critique and a model-dependent discrepancy between final-answer accuracy and full-proof validity.   Based on these findings, we propose a durable framework for integrating AI into the research workflow, centered on the principle of the augmented mathematician. In this model, the AI functions as a copilot under the critical guidance of the human researcher, an approach distilled into five guiding principles for effective and responsible use. We then systematically explore seven fundamental ways AI can be applied across the research lifecycle, from creativity and ideation to the final writing process, demonstrating how these principles translate into concrete practice.   We conclude that the primary role of AI is currently augmentation rather than automation. This requires a new skill set focused on strategic prompting, critical verification, and methodological rigor in order to effectively use these powerful tools.


机器翻译由腾讯交互翻译提供,仅供参考

点击“阅读原文”获取带摘要的学术速递
